
I2MTC 2008 - IEEE International Instrumentation and Measurement Technology Conference
Victoria, Vancouver Island, Canada, May 12-15, 2008

Stability and Static Noise Margin Analysis of Low-Power SRAM


Rajasekhar Keerthi and Chein-in Henry Chen
Department of Electrical Engineering
Wright State University
Dayton, OH 45435 USA
Abstract- To overcome read data destruction and to gain stability at low VDD, a seven-transistor (7T) SRAM cell is implemented and compared with the conventional six-transistor (6T) SRAM cell. To illustrate the robust performance of an 8-bit SRAM, statistical simulation and data analysis that account for process variations and mismatch were conducted for every operation of the 8-bit SRAM. The measurement results show that the static noise margin (SNM) of the 7T SRAM cell is better than that of the 6T SRAM cell. The stability of the 8-bit 7T SRAM at low VDD is also demonstrated by testing the SRAM at 720 mV.

Keywords - SRAM, static noise margin (SNM), stability.


I. INTRODUCTION

Static RAM plays a key role in modern devices as technology advances and the demand for high speed and performance in very deep sub-micron CMOS designs increases. As SRAM dimensions scale into the nanometer regime, variations in electrical parameters (e.g., threshold voltage, sheet resistance) steadily reduce its stability, owing to the sensitivity to process parameters such as impurity concentration, oxide thickness and diffusion depth [1].
Data retention of the SRAM cell in the hold state and in the read state is an important constraint in advanced CMOS processes. The SRAM cell becomes less stable at low VDD, with increasing leakage currents and increasing variability [2]. Stability is usually defined by the static noise margin (SNM): the maximum DC noise voltage that the SRAM cell can tolerate without altering the stored bits [2]. The read SNM deteriorates as the supply voltage (VDD) decreases [3-6] and as transistor mismatch increases. This mismatch arises from variations in the physical quantities of identically designed devices, i.e., their threshold voltages, body factors and current factors. Besides the SNM decreasing at low VDD, the overall SRAM delay increases. Moreover, a read operation at low VDD can destroy the data stored in the SRAM [3].
The conventional 6T memory cell comprises two cross-coupled CMOS inverters and two pass transistors connected to complementary bitlines. In Fig. 1, the access transistors N1 and N2 are connected to the wordline (WL) so that data can be written to the memory cell from the bitlines (BL). The bitlines act as I/O buses that carry data from the memory cells to the sense amplifier.
The main operations of SRAM cells are write, read and hold. The SNM is an important performance factor for the hold and read operations [7], especially for the read operation, in which the wordline is '1' and the bitlines are precharged to '1'.

1-4244-1541-1/08/$25.00 © 2008 IEEE

Figure 1. Schematic of 6T SRAM

The internal node of the SRAM cell that stores '0' is pulled up through the access transistor because of the voltage division across the access transistor and the driver transistor. This voltage rise severely degrades the SNM during the read operation. Read stability depends mainly on the cell ratio, which is usually greater than 1.2 for an ideal SRAM. A cell ratio of 1.6 is used in this paper.
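The read disturb described above can be estimated to first order. A minimal sketch, assuming long-channel square-law transistors, the access transistor in saturation, the driver in its linear region, and the usual definition CR = (W/L)driver / (W/L)access (the paper does not spell the definition out). Equating the two currents gives ΔV = (VDD - Vt)(1 - √(CR/(1 + CR))). VDD = 1.2 V is the paper's nominal supply; Vt = 0.4 V is a hypothetical value, and body effect and velocity saturation are ignored.

```python
import math


def read_bump(cr, vdd=1.2, vt=0.4):
    """Square-law estimate of the voltage rise (V) on the node storing '0' during a read.

    Access transistor assumed saturated: I = (k_a / 2) * (vdd - V - vt)**2
    Driver transistor assumed linear:    I = k_d * ((vdd - vt) * V - V**2 / 2)
    cr = k_d / k_a (cell ratio). Body effect and velocity saturation are ignored.
    """
    vov = vdd - vt  # gate overdrive of the driver, whose gate sits at vdd
    return vov * (1.0 - math.sqrt(cr / (1.0 + cr)))


def current_mismatch(v, cr, vdd=1.2, vt=0.4):
    """Access-minus-driver current normalized to k_a; it is zero at the read-bump voltage."""
    i_access = 0.5 * (vdd - v - vt) ** 2
    i_driver = cr * ((vdd - vt) * v - 0.5 * v ** 2)
    return i_access - i_driver


for cr in (1.2, 1.6, 2.0):
    print(f"CR = {cr:.1f}: read bump ~ {1000 * read_bump(cr):.0f} mV")
```

The bump shrinks as CR grows, which is why a cell ratio above about 1.2 is chosen; a real design would confirm the value with a circuit simulator rather than this hand model.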
To overcome the degradation of the static noise margin at low VDD, the 7T SRAM cell [2] is implemented as shown in Fig. 2(a). It has one more transistor than the 6T cell but operates more efficiently than the 6T cell at low VDD. The seventh transistor, an nMOS device, is placed between a storage node and its driver transistor. Because of the voltage-dividing effect, the node of the inverter that stores '0' is pulled up during a read. To stop this transition, the seventh transistor at the other node is turned off, so that the node storing '1' cannot be pulled down by the driver transistor; the seventh transistor thus acts as a switch between that node and the driver transistor.
In the data retention period, the SRAM data is not accessed. In this period the wordline signal /WL is '1' and the nMOS transistor N5 is ON. In the read operation, the logical threshold voltage of the CMOS inverter driving node B increases when the data protection transistor N5 is turned OFF [3].
This paper is organized as follows. Section II presents the design of an 8-bit SRAM. Section III explains the measurement of the SNM and of the write and read delays. Section IV describes Monte Carlo simulations that account for process variations and mismatch. A summary and future work are presented in Section V.

Figure 2. (a) Schematic for 7T SRAM cell (b) Waveform for 7T cell during data protection by N5 transistor.

Figure 3. Block Diagram of 8-BIT SRAM

II. 8-BIT SRAM USING 7T SRAM CELLS

An 8-bit SRAM consisting of a 4 x 2 array of memory cells is shown in Fig. 3. In this design, two address bits, A0 and A1, select the row and bit A2 selects the column. Each output of the row decoder controls a wordline WL of the SRAM, and each output of the column decoder controls a pair of bitlines.
The decoder is built from two-input NOR gates, as shown in Fig. 4. The sense amplifier is a latch-type differential sense amplifier. The write switch and the sense amplifier are connected to the I/O buses. The write switch charges one bitline and discharges the other according to the input DI.
In a write operation, the data from the write switch is written into the SRAM cell through the bitlines. In a read operation, both bitlines are precharged and equalized to VDD and the wordline WL is turned ON. The bitline connected through the pass transistor to the node storing '0' discharges, while the other bitline stays high. This bitline differential is then detected by the sense amplifier through the I/O buses.

Figure 4. (a) Row Decoder of 8-Bit SRAM 6T cell (b) Row decoder of 8-bit SRAM 7T cell
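The addressing scheme of Fig. 3 (A0 and A1 select one of four rows, A2 selects one of two columns) can be checked at the logic level. A minimal sketch, assuming A1 is the row MSB and that inverters supply the complemented address bits; the function names are hypothetical and the gate wiring is illustrative rather than copied from Fig. 4.

```python
from typing import List


def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 logic levels."""
    return 1 - (a | b)


def row_decode(a0: int, a1: int) -> List[int]:
    """2-to-4 row decoder made only of two-input NOR gates (a1 assumed to be the MSB).

    WLk is driven by a NOR of the literals that must be 0 for row k, so exactly one
    wordline is high for every address.
    """
    na0, na1 = 1 - a0, 1 - a1  # inverters provide the complemented address literals
    return [
        nor(a1, a0),    # WL0: a1 = 0, a0 = 0
        nor(a1, na0),   # WL1: a1 = 0, a0 = 1
        nor(na1, a0),   # WL2: a1 = 1, a0 = 0
        nor(na1, na0),  # WL3: a1 = 1, a0 = 1
    ]


def column_decode(a2: int) -> List[int]:
    """1-to-2 column select: each output enables one bitline pair."""
    return [1 - a2, a2]


for addr in range(8):
    a0, a1, a2 = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1
    print(addr, row_decode(a0, a1), column_decode(a2))
```

Each of the eight addresses raises exactly one wordline and one column select, so exactly one of the eight cells is accessed.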

The 8-bit 7T SRAM is designed by replacing the 6T cells in Fig. 3 with 7T cells. The row decoder for the 8-bit 7T SRAM is shown in Fig. 4(b). The chip select signal (CS) is used to precharge the bitlines. The Read/Write control switch is activated by the clock φ. When φ goes high, depending on the Read/Write signal, it either activates the sense amplifier or releases the write switch to drive the input data DI. The logic equations are given in [8].

It is important that the Read/Write signal be held constant until the operation completes, for either read (low) or write (high).
In the 7T SRAM cell, the pass transistors are opened (turned off) during precharge to isolate the cell from the bitlines; at this point WWL and WL are low and A0, A1 are high. The complements of A0 and A1 would turn ON another wordline in the SRAM, so a 2X4 decoder is placed in the circuit to avoid this transition. The 2X4 decoder is used as a selector switch whose outputs are connected to the pMOS switches shown in Fig. 2. The select lines S0 and S1 control the decoder, which is built from NAND gates so that its selected output is low.

[Table I. Static Noise Margin (SNM) for 1-Bit SRAM Cell: SNM of the 6T and 7T cells at different VDD and VSS values.]

III. STATIC NOISE MARGIN, WRITE AND READ DELAYS

The static noise margin measured for the 8-bit SRAM is presented in Table I. The SNMs of the 6T and 7T cells are compared by varying the VDD and VSS of the cells [9]: VDD is decreased in steps of 10% of the nominal 1.2 V, and VSS is gradually increased in steps of the same size. Fig. 5 shows the stability of the 6T SRAM cell at VDD = 0.84 V and VSS = 0.36 V; in the precharge phase, when WL is closed, the voltages at nodes A and B are overwritten by the bitlines because of the voltage-dividing effect.
Table II compares the write and read operations of the 8-bit SRAMs built from 6T and 7T cells. The 8-bit SRAM using 7T cells has clearly shorter write and read delays than the one using 6T cells, owing to the additional transistor in the 7T cell. The write delay is measured from the data input DI to node B of the SRAM, and the read delay from the wordline WWL to the sense amplifier output DO.

IV. MONTE CARLO SIMULATIONS FOR STATISTICAL DATA ANALYSIS OF 8-BIT SRAM

Monte Carlo simulation is a method for iteratively evaluating a design that is expressed as a closed-form equation [12]. The goal is to determine how random variation affects the sensitivity, performance or reliability of the design. Statistical data analysis is conducted on both the 6T and the 7T 8-bit SRAMs. Tables III and IV compare the write and read operations of the 6T and 7T 8-bit SRAMs. Here the closed-form equation is the delay equation between input and output. The mean (μ) describes the central tendency, or location, of the delay distribution.

Figure 5. Static Noise Margin of 8-BIT 6T SRAM at VDD = 0.84 V and VSS = 0.36 V; the figure shows that the node voltages are overwritten by the bitlines.

Table II
Comparison between SRAM 8-BIT 6T and 7T write and read operation

SRAM 8-bit | Write operation | Write delay (ps) | Read operation | Read delay (ps)
SRAM 6T | Write '0' | 326 | Read '0' | 389.6
SRAM 7T | Write '0' | 239 | Read '0' | 320.2
SRAM 6T | Write '1' | 303 | Read '1' | 348.2
SRAM 7T | Write '1' | 265 | Read '1' | 229.1
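The σ/μ column of Tables III and IV and the percentage decreases quoted in Section IV are simple ratios of the tabulated means and standard deviations. This sketch recomputes them from the table values; the dictionary layout and helper names are illustrative.

```python
# (6T, 7T) values copied from Tables III and IV: (mean delay, SD), both in ps
STATS_PS = {
    "write '0'": ((267.43, 21.45), (236.37, 18.26)),
    "write '1'": ((299.45, 21.61), (279.48, 17.54)),
    "read '0'": ((352.36, 20.14), (319.9, 19.73)),
    "read '1'": ((391.4, 20.72), (247.45, 18.95)),
}


def cv_percent(mean, sd):
    """Coefficient of variation sigma/mu in percent."""
    return 100.0 * sd / mean


def percent_decrease(old, new):
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


for op, ((mu6, sd6), (mu7, sd7)) in STATS_PS.items():
    print(f"{op:9s}  6T sigma/mu = {cv_percent(mu6, sd6):.2f} %, "
          f"7T sigma/mu = {cv_percent(mu7, sd7):.2f} %, "
          f"delay decrease = {percent_decrease(mu6, mu7):.1f} %")
```

For the four operations the delay decrease comes out at about 11.6%, 6.7%, 9.2% and 36.8%; the text quotes these values truncated to whole percent.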

The standard deviation (σ) describes the spread of the distribution. As Tables III and IV show, the variation in delay is small for both the 7T and the 6T SRAMs, and smaller for the 7T SRAM because of the additional transistor and the modified row decoder design, which has additional wordlines to control. The percentage decrease in delay calculated from Tables III and IV is 11% for the write '0' operation and 6% for the write '1' operation; for the read '0' and read '1' operations it is 9% and 36%, respectively.

Table III
Comparison between statistical analysis of SRAM 8-BIT 6T and 7T Write operation

SRAM 8-bit | Write operation | Mean μ (ps) | SD σ (ps) | σ/μ (%)
SRAM 6T | Write '0' | 267.43 | 21.45 | 8.02
SRAM 7T | Write '0' | 236.37 | 18.26 | 7.75
SRAM 6T | Write '1' | 299.45 | 21.61 | 7.21
SRAM 7T | Write '1' | 279.48 | 17.54 | 6.27

Table IV
Comparison between statistical analysis of SRAM 8-BIT 6T and 7T Read operation

SRAM 8-bit | Read operation | Mean μ (ps) | SD σ (ps) | σ/μ (%)
SRAM 6T | Read '0' | 352.36 | 20.14 | 5.71
SRAM 7T | Read '0' | 319.9 | 19.73 | 6.16
SRAM 6T | Read '1' | 391.4 | 20.72 | 5.29
SRAM 7T | Read '1' | 247.45 | 18.95 | 7.65

Figure 6. Monte Carlo Simulations for SRAM 8-BIT. (a) Simulation result for Read '1' operation using the 7T cell (Delay_8BIT_7T_SRAM_Read_1). (b) Simulation result for Read '1' operation using the 6T cell (Delay_6T_SRAM_8Bit_Read_1).

V. CONCLUSION
The architecture of SRAM varies widely with the application. The SRAM design above can be used in on-chip RAM applications such as the scratch-pad memory of a microprocessor. The experimental results show that the static noise margin (SNM) of the 7T SRAM cell is better than that of the 6T SRAM cell. The stability of the SRAM at low VDD is also demonstrated by testing the SRAM at 720 mV. From the 8-bit 6T SRAM to the 8-bit 7T SRAM design, the average percentage decrease in delay is 6% for the write operation and 20% for the read operation.

REFERENCES

[1] B. Cheng et al., "The impact of random doping effects on CMOS SRAM cell," in Proc. ESSCIRC, Sep. 2004, pp. 219-222.
[2] E. Grossar, M. Stucchi, K. Maex, and W. Dehaene, "Read stability and write-ability analysis of SRAM cells for nanometer technologies," IEEE J. Solid-State Circuits, vol. 41, no. 11, Nov. 2006.
[3] E. Seevinck et al., "Static-noise margin analysis of MOS SRAM cells," IEEE J. Solid-State Circuits, vol. SC-22, no. 5, pp. 748-754, Oct. 1987.
[4] K. Takeda et al., "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications," IEEE J. Solid-State Circuits, vol. 41, no. 1, Jan. 2006.
[5] M. J. M. Pelgrom et al., "Transistor matching in analog CMOS applications," in IEDM Tech. Dig., Dec. 1998, pp. 915-916.
[6] A. J. Bhavnagarwala et al., "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658-665, Apr. 2001.
[7] B. H. Calhoun and A. P. Chandrakasan, "Static noise margin variation for sub-threshold SRAM in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 41, no. 7, Jul. 2006.
[8] N. Wang, Digital MOS Integrated Circuits: Design for Applications. Prentice Hall, 1989.
[9] D.-M. Kwai, "Review of 6T SRAM cell," Intellectual Property Library Company, Jun. 3, 2005.
[10] B. S. Amrutur et al., "A replica technique for wordline and sense control in low-power SRAMs," IEEE J. Solid-State Circuits, vol. 33, no. 8, pp. 1208-1219, Aug. 1998.
[11] B. S. Amrutur, "Design and analysis of fast low power SRAMs," Aug. 1999.
[12] http://www.vertex42.com/ExcelArticles/mc/MonteCarloSimulation.html

Second International Conference on Emerging Trends in Engineering and Technology, ICETET-09

Design and Analysis of a New Loadless 4T SRAM Cell in Deep Submicron CMOS Technologies

Sandeep R(1), Narayan T Deshpande(2), A R Aswatha
Dept of ECE, BMSCE, Bangalore-560019, India
(1) sandeepr90@gmail.com, (2) ntd.bms@gmail.com
Dept of ECE, DSCE, Bangalore-560078, India
aswath.ar@gmail.com
Abstract - The goal of this paper is to reduce the power and area of the Static Random Access Memory (SRAM) array while maintaining competitive performance. Various configurations of the SRAM array are designed using both the six-transistor (6T) SRAM cell and a new loadless four-transistor (4T) SRAM cell in deep submicron (130nm, 90nm and 65nm) CMOS technologies. They are then simulated using HSPICE to check functionality, Static Noise Margin (SNM), power dissipation, area occupancy and access time. Except for the precharge circuits and the basic storage cells, the remaining circuitry is the same for the 6T SRAM array and the new loadless 4T SRAM array. Compared to the conventional 6T SRAM array, the new loadless 4T SRAM array consumes less power and occupies less area in deep submicron CMOS technologies. Also, the SNM of the new loadless 4T SRAM cell is as good as that of the 6T SRAM cell for higher values of Cell Ratio (CR).

Keywords - 6T SRAM cell, new loadless 4T SRAM cell, SNM, low power and low area.

I. INTRODUCTION

The on-chip caches in embedded microprocessors are implemented using arrays of densely packed Static Random Access Memory (SRAM) cells [1]. The transistors devoted to the on-chip caches are often a significant fraction of all the transistors on the chip. According to the International Technology Roadmap for Semiconductors (ITRS) 2005, SRAM is expected to occupy more than 60% of the area of Systems-on-Chip (SoCs) in the future. Reducing the number of transistors in the basic cell therefore reduces the number of transistors in the SRAM array and, in turn, the area occupied by the array. Moreover, the caches consume a large fraction of the total power in embedded microprocessors, so improving the power efficiency of caches is critical to overall system power efficiency [2]. A few critical circuits in a system not only affect the design metrics but may also fail to operate in deep submicron technology. Hence the SRAM arrays are designed, analysed and checked for their design metrics in deep submicron CMOS technologies.
Two types of SRAM cells are considered in this paper: (i) the conventional six-transistor (6T) SRAM cell, shown in Figure 1(a), and (ii) the new loadless four-transistor (4T) SRAM cell [3], shown in Figure 1(b). They are designed and analysed in various configurations with respect to functionality, power dissipation, area occupancy, stability and access time.

978-0-7695-3884-6/09 $26.00 © 2009 IEEE

Figure 1. SRAM Cell. (a) Conventional 6T SRAM Cell. (b) New Loadless 4T SRAM Cell.

A 6T SRAM cell consists of two cross-coupled inverters (M1-M3 and M2-M4) forming a latch, and two access transistors (M5 and M6). In the new loadless 4T SRAM cell, two NMOS transistors (M3 and M4) are used as pass transistors to access the cell and two PMOS transistors (M1 and M2) are used as drivers for the cell.
An SRAM cell must be designed to provide a non-destructive read operation and a reliable write operation. The operation of the new loadless 4T SRAM cell is described in [3], and that of the conventional 6T SRAM cell in [4-9].
This paper is organized as follows. Section II deals with the Static Noise Margin (SNM) of both SRAM cells. The precharge circuits for both SRAM arrays are presented in section III. The Sense Amplifier (SA) for the SRAM arrays is presented in section IV. The decoder and write driver circuits are presented in section V. The simulation environment and results are discussed in section VI. Finally, the conclusion is given in section VII.

II. SNM OF SRAM CELLS

The data stability of the SRAM cell is a prominent topic in SRAM cell design, as it examines the cell's ability to retain its data. SNM is the metric used in this paper to characterize the stability of the SRAM cells. The SNM is defined as the minimum DC noise voltage necessary to flip the state of an SRAM cell [3, 10]. The most critical point for an SRAM cell is during a read, so the read SNM is given more importance than the write SNM.
The same method is used to evaluate the SNM of both types of SRAM cells [10]. The SNM is typically measured while holding the bitlines at the precharge value with the Word Line (WL) asserted. The schematic used for the SNM simulation of the 6T SRAM cell is shown in Figure 2(a), and that of the new loadless 4T SRAM cell in Figure 2(b). The noise sources in the simulation were swept from 0 to 500mV in 200ns, which can be considered slow. The two nodes initially remain stable, but as the noise increases, the margin between the nodes diminishes. At some point the storage nodes flip and the cell settles in its new stable state. The noise voltage at which the storage nodes flip gives the value of the SNM. The results of these simulations are presented in section VI.

Figure 2. SNM simulation Setup. (a) For Conventional 6T SRAM Cell. (b) For New Loadless 4T SRAM Cell.

III. PRECHARGE CIRCUITS

The precharge circuit used for the new loadless 4T SRAM array differs from that of the 6T SRAM array. In the 6T SRAM array, the precharge circuit charges the Bit Line (BL) and Bit Line Bar (BLB) to VDD. In the new loadless 4T SRAM array, the bitlines are precharged to ground instead of VDD, so it consumes less power than the 6T SRAM array. The schematic of the precharge circuit [4] for the 6T SRAM array is shown in Figure 3(a) and that for the new loadless 4T SRAM array in Figure 3(b). The Precharge (PC) signal enables the bitlines to be precharged at all times except during a write or read cycle. Transistors M1 and M2 precharge the bitlines, while transistor M3 equalizes them so that both bitlines of a pair are at the same potential before the cell is read. The same circuit topology is used for local precharge in combination with the SA in the corresponding SRAM arrays (see section IV for details).

Figure 3. Precharge Circuits. (a) Precharge Circuit for 6T SRAM Array. (b) Precharge Circuit for New Loadless 4T SRAM Array.

IV. SENSE AMPLIFIER

One of the major issues in SRAM design is the speed of the read operation. For high-performance SRAMs, read speed must be addressed both in the cell-level design and in the design of the SA. The primary function of an SA in SRAMs is to amplify the small analog differential voltage developed on the bitlines by a read-accessed cell to a full-swing digital output signal, thus greatly reducing the time required for a read operation. The choice and design of the SA define the robustness of bitline sensing, affecting read speed and power. High-density memories commonly come with increased bitline parasitic capacitances. These large capacitances slow down voltage sensing and make bitline voltage swings energy-consuming, resulting in slower and more power-hungry memories. The need for larger memory capacity, higher speed and lower power dissipation imposes trade-offs in the design of the SA. Also, since SRAMs do not feature data refresh after sensing, the sensing operation must be non-destructive.
There are many types of SA; the one used in this paper is the latch-type SA [4, 8]. An SA is present in every column of the SRAM array. Apart from the local precharge circuits, both versions of the SRAM array use the same type of SA. Figure 4(a) shows the latch-type SA with local precharge circuit for the 6T SRAM array, and Figure 4(b) shows the latch-type SA with local precharge circuit for the new loadless 4T SRAM array. Its cross-coupled latch relaxes the gain requirement of the amplifier. The transistors are sized using the same methodology as for the 6T SRAM cell: starting from approximate values, simulations were run and the widths were optimized for the best output. The read operation begins by precharging and equalizing both bitlines while simultaneously biasing the latch-type SA in its high-gain metastable region by precharging and equalizing its inputs. Then, to read a particular word from the SRAM array, the corresponding row is selected by enabling its WL. Once a sufficient voltage difference has built up between the bitlines, the SA is enabled by the read enable (RE) signal. The SA senses which bitline is heading towards high voltage and which is heading towards ground potential, and a full voltage swing is then obtained at the output.
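Why the SA waits for a sufficient bitline difference before RE is asserted can be seen from the usual first-order model of a cross-coupled latch biased at its metastable point: the differential grows as ΔV(t) = ΔV0·e^(t/τ), where τ is roughly the latch node capacitance divided by the transconductance of the cross-coupled pair, so the time to reach a full swing is τ·ln(Vswing/ΔV0). The sketch below evaluates that formula and cross-checks it with a forward-Euler integration; τ = 10 ps, the ΔV0 values and Vswing = 1.2 V are hypothetical, not extracted from the paper's circuit.

```python
import math


def latch_sense_time(dv0, v_swing, tau):
    """Time for a latch differential to regenerate from dv0 to v_swing.

    First-order model of a cross-coupled latch biased at its metastable point:
    dv(t) = dv0 * exp(t / tau), so t = tau * ln(v_swing / dv0).
    """
    if not 0.0 < dv0 < v_swing:
        raise ValueError("expected 0 < dv0 < v_swing")
    return tau * math.log(v_swing / dv0)


def simulate_regeneration(dv0, v_swing, tau, dt):
    """Forward-Euler integration of d(dv)/dt = dv / tau until dv reaches v_swing."""
    t, dv = 0.0, dv0
    while dv < v_swing:
        dv += dv / tau * dt
        t += dt
    return t


tau = 10e-12  # hypothetical latch time constant (10 ps)
for dv0 in (0.05, 0.1, 0.2):  # hypothetical bitline differentials at SA enable (V)
    print(f"dv0 = {dv0 * 1000:.0f} mV -> {latch_sense_time(dv0, 1.2, tau) * 1e12:.1f} ps")
```

Halving the initial differential adds only τ·ln 2 to the sensing time, which is the trade-off between waiting longer for the bitlines and relying on latch gain.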

V. DECODER AND WRITE DRIVER CIRCUITS

The decoder and write driver circuits are the same for both types of SRAM array. The decoder circuit is presented in section V-A and the write driver circuit in section V-B.

A. Decoder Circuit
A decoder decodes the given input address and enables a particular WL. Of the various types of decoders available, the one used in this paper is the dynamic decoder. Dynamic decoders [6] have the following advantages over other types of decoders: (a) fewer transistors are used; (b) the layout of the decoder is simple and less time-consuming; (c) the power consumption is lower; (d) the speed of the decoder is also good.
In particular, a dynamic NAND decoder is used in this paper rather than a dynamic NOR decoder, as the former consumes less area and less power than the latter. For an n-word memory, an m:n dynamic NAND decoder is used, where m = log2(n). The schematic of a 2:4 dynamic NAND decoder is shown in Figure 5(a). Here all the outputs of the decoder are high by default, except for the selected row, which is low. Since the interface between the decoder and the memory often includes a buffer, the buffer can be made inverting to enable the WL.

B. Write Driver Circuit
The function of the SRAM write driver is to write the input data onto the bitlines when the Write Enable (WE) signal is asserted; otherwise no data is written onto the bitlines. Only one write driver is needed per SRAM column, so the area impact of a larger write driver is not multiplied by the number of cells in the column, and the write driver can be sized up if necessary. The schematic of the write driver circuit is shown in Figure 5(b).

Figure 4. Sense Amplifiers. (a) Latch-type SA with Local Precharge Circuit for 6T SRAM Array. (b) Latch-Type SA with Local Precharge Circuit for New Loadless 4T SRAM Array.

Figure 5. Decoder and Write Driver Circuits. (a) 2:4 Dynamic NAND Decoder. (b) Write Driver.
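The precharge/evaluate behaviour of the 2:4 dynamic NAND decoder (all outputs high by default, the selected row pulled low, then inverted by the WL buffer) can be modeled at the logic level. In this sketch the input `clk` stands in for the decoder clock (dpc in the simulation waveforms), under the assumption that `clk = 0` is the precharge phase and `clk = 1` the evaluate phase; the address-bit order (a1 as MSB) and the function names are also hypothetical.

```python
from typing import List


def dynamic_nand_decoder_2to4(a1: int, a0: int, clk: int) -> List[int]:
    """Logic-level model of a 2:4 dynamic NAND decoder with active-low outputs.

    clk = 0: precharge phase, every output node is pulled high.
    clk = 1: evaluate phase, an output discharges only if both nMOS devices in its
    series stack conduct, i.e. both of its address literals are 1.
    """
    if clk == 0:
        return [1, 1, 1, 1]
    literals = [(1 - a1, 1 - a0), (1 - a1, a0), (a1, 1 - a0), (a1, a0)]
    return [0 if (x and y) else 1 for x, y in literals]


def wordlines(a1: int, a0: int, clk: int) -> List[int]:
    """Inverting WL buffers turn the single low decoder output into one active-high WL."""
    return [1 - out for out in dynamic_nand_decoder_2to4(a1, a0, clk)]


print(dynamic_nand_decoder_2to4(1, 0, 1), wordlines(1, 0, 1))
```

Because every output is restored high during precharge, no wordline can be asserted outside the evaluate phase, which is the property that makes the inverting buffer sufficient.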

VI. SIMULATION ENVIRONMENT AND RESULTS

The following configurations of SRAM arrays were designed and analysed using the conventional 6T SRAM cell and the new loadless 4T SRAM cell: (a) 1*1, (b) 16*16, (c) 32*32. The configurations were simulated in HSPICE [11] using the nominal Predictive Technology Model (PTM) for 130nm, 90nm and 65nm CMOS technologies [12]. The functionality of the 1*1 6T SRAM cell is shown in Figure 6 and that of the 1*1 new loadless 4T SRAM cell in Figure 7. The functionality of the 32*32 6T SRAM array is shown in Figure 8 and that of the 32*32 new loadless 4T SRAM array in Figure 9. For the 1K-bit (32*32) configuration, only the relevant input control signals, three input data bits (0th, 16th and 31st), three output data bits (0th, 16th and 31st) and the corresponding storage nodes of the appropriate cells are presented.
The following sequence is used to check the functionality of both types of SRAM cells: write 1, read 1, write 0, read 0. Both cells operated correctly in the different array configurations at a temperature of 27°C, with VDD = 1.5V (130nm CMOS technology), VDD = 1.2V (90nm CMOS technology) and VDD = 1.1V (65nm CMOS technology). Both cells were operated at a frequency of 333.33MHz. Each bitline was assumed to have a capacitance of 20fF, and a 20fF load was connected to each output line of the SA. A similar procedure was used to check the functionality of the other configurations of both types of SRAM array in all the CMOS technologies.

Figure 6. Write-Read Cycle of 1-Bit 6T SRAM. (a) In 130nm CMOS Technology. (b) In 90nm CMOS Technology. (c) In 65nm CMOS Technology.

Figure 7. Write-Read Cycle of 1-Bit New Loadless 4T SRAM. (a) In 130nm CMOS Technology. (b) In 90nm CMOS Technology. (c) In 65nm CMOS Technology.

Figure 8. Write-Read Cycle of 1K-Bit 6T SRAM. (a) In 130nm CMOS Technology. (b) In 90nm CMOS Technology. (c) In 65nm CMOS Technology.

Figure 9. Write-Read Cycle of 1K-Bit New Loadless 4T SRAM. (a) In 130nm CMOS Technology. (b) In 90nm CMOS Technology. (c) In 65nm CMOS Technology.

Following are the signals shown in the simulation results:


pc corresponds to PC signal given to the Precharge
circuits; dpc corresponds to the clock signal given to the
decoder circuits; wl0 corresponds to the WL signal of row
0 in the SRAM array. This is the output of inverting buffer
circuit; we corresponds to the write enable signal given to
the write driver circuits; di0, di16, and di31
correspond to the 0th, 16th and 31st input data bits; lpc
corresponds to the Precharge signal given to the Local
Precharge circuits; re corresponds to the read enable
signal given to the sense amplifier circuits; xm0_0.q,
xm0_16.q, and xm0_31.q correspond to the 0th, 16th and
31st storage node (true node-q) of the SRAM cell in the row
0; dt0, dt16, and dt31 correspond to the 0th, 16th and
31st output data bits of the sense amplifier circuits.
The SNM of both types of SRAM cells for different values of Cell Ratio (CR) was obtained using the setup described in section II. The results are tabulated in Table I for 130nm CMOS technology, in Table II for 90nm CMOS technology and in Table III for 65nm CMOS technology. Even for low values of CR, the 6T SRAM cell is much more stable than the new loadless 4T SRAM cell. To match the stability of the 6T SRAM cell, the CR of the new loadless 4T SRAM cell must be made high.
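The noise-source sweep of Section II (inject noise, increase it until the storage nodes flip, read off the SNM) can be imitated on an idealized latch. This sketch assumes a symmetric cell whose two inverters follow a hypothetical tanh transfer curve and models only the hold condition (no access transistors, so no cell ratio); instead of a 0 to 500 mV ramp it bisects on the noise amplitude. The `gain` parameter is an illustrative stand-in for inverter strength, not a quantity from the paper.

```python
import math


def vtc(v_in, vdd=1.2, gain=10.0):
    """Idealized inverter voltage transfer curve (hypothetical tanh model, not a device model)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (v_in - 0.5 * vdd)))


def holds_state(vn, vdd=1.2, gain=10.0, iters=2000):
    """True if a cell storing Q = '1' survives a static noise voltage vn in series with
    both inverter inputs, with polarities chosen to push the cell toward flipping."""
    q, qb = vdd, 0.0
    for _ in range(iters):
        qb = vtc(q - vn, vdd, gain)  # noise lowers the input of the inverter that drives QB
        q = vtc(qb + vn, vdd, gain)  # noise raises the input of the inverter that drives Q
    return q > 0.5 * vdd


def snm_by_noise_sweep(vdd=1.2, gain=10.0, tol=1e-4):
    """Bisection on the noise amplitude: the largest vn that does not flip the cell is the SNM."""
    lo, hi = 0.0, 0.5 * vdd  # with this tanh model, vn = vdd/2 always flips the cell
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if holds_state(mid, vdd, gain):
            lo = mid
        else:
            hi = mid
    return lo


for g in (5.0, 10.0, 20.0):
    print(f"gain = {g:4.1f}: SNM ~ {1000 * snm_by_noise_sweep(gain=g):.0f} mV")
```

A steeper transfer curve gives a larger margin, mirroring how a stronger cell (higher CR in the real circuits) raises the measured SNM; the numbers themselves say nothing about the 130nm, 90nm or 65nm results.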
The access time was measured for both types of 1Kb SRAM array and the results are tabulated. The read access time is measured from the point at which the RE signal reaches 10% of VDD to the point at which the output signal is within 10% of VDD of the required logic value. The write access time is measured from the point at which WE reaches 50% of VDD to the point at which the storage node of the cell reaches 50% of VDD. The access times for both types of SRAM array, with CR=3 for the 6T SRAM and CR=4 for the new loadless 4T SRAM, are shown in Table IV for 130nm CMOS technology, in Table V for 90nm CMOS technology and in Table VI for 65nm CMOS technology. The read access time of the 1Kb 6T SRAM array is shorter than that of the 1Kb new loadless 4T SRAM array, while the write access time of the 1Kb 6T SRAM array is longer than that of the 1Kb new loadless 4T SRAM array.
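The two access-time definitions above translate directly into threshold-crossing measurements on simulated waveforms. A minimal sketch, assuming the waveforms are available as sampled (time, voltage) lists and using linear interpolation between samples; the function names are hypothetical, and in practice a simulator measurement statement (such as HSPICE's .MEASURE with TRIG/TARG) would normally do this.

```python
from typing import Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float, rising: bool = True) -> float:
    """First time the sampled waveform v(t) crosses `level`, linearly interpolated."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def read_access_time(t, re, out, vdd, out_high=True):
    """RE reaching 10% of VDD -> output within 10% of VDD of its final logic level."""
    start = crossing_time(t, re, 0.1 * vdd, rising=True)
    target = 0.9 * vdd if out_high else 0.1 * vdd
    stop = crossing_time(t, out, target, rising=out_high)
    return stop - start


def write_access_time(t, we, node, vdd, node_rising=True):
    """WE reaching 50% of VDD -> storage node reaching 50% of VDD."""
    start = crossing_time(t, we, 0.5 * vdd, rising=True)
    stop = crossing_time(t, node, 0.5 * vdd, rising=node_rising)
    return stop - start
```

For example, with RE ramping 0 to 1.2 V between 100 ps and 200 ps and the output ramping between 300 ps and 500 ps, `read_access_time` returns about 370 ps (110 ps to 480 ps).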
The total power dissipation (TPD) of various configurations of SRAM arrays using both types of cells was measured and the results are tabulated. The comparison of TPD for different array configurations, with CR=3 for the 6T SRAM and CR=4 for the new loadless 4T SRAM, is shown in Table VII for 130nm CMOS technology, in Table VIII for 90nm CMOS technology and in Table IX for 65nm CMOS technology. For all configurations, the new loadless 4T SRAM arrays consume less power than the 6T SRAM arrays.
The total number of transistors used in the various configurations of SRAM arrays with both types of SRAM cells is tabulated in Table X. For all configurations, the new loadless 4T SRAM arrays use fewer transistors, and hence occupy less area, than the 6T SRAM arrays.
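The reduction percentages in Tables VII and X are relative differences, and the transistor counts also show where the saving comes from. This sketch recomputes both from the tabulated 130nm TPD values and transistor counts; the helper names are illustrative, and "peripheral" here simply means every transistor outside the storage cells.

```python
# (6T array, new loadless 4T array) values copied from Tables VII and X
TPD_130NM_MW = {"1*1": (0.089408, 0.05309), "16*16": (1.4833, 0.8738), "32*32": (3.1688, 1.8397)}
TRANSISTORS = {"1*1": (31, 29), "16*16": (2064, 1552), "32*32": (7232, 5184)}
ARRAY_DIMS = {"1*1": (1, 1), "16*16": (16, 16), "32*32": (32, 32)}


def reduction_pct(baseline, new):
    """Percentage reduction of `new` relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline


def peripheral_count(total, rows, cols, transistors_per_cell):
    """Transistors outside the storage cells (precharge, SA, decoder, write driver, ...)."""
    return total - rows * cols * transistors_per_cell


for cfg, (rows, cols) in ARRAY_DIMS.items():
    p6, p4 = TPD_130NM_MW[cfg]
    n6, n4 = TRANSISTORS[cfg]
    print(f"{cfg:6s} TPD reduction {reduction_pct(p6, p4):.2f} %, "
          f"transistor reduction {reduction_pct(n6, n4):.2f} %, "
          f"peripheral 6T/4T: {peripheral_count(n6, rows, cols, 6)}/{peripheral_count(n4, rows, cols, 4)}")
```

The peripheral counts come out equal for both arrays (25, 528 and 1088 transistors), so the whole transistor saving is the two transistors removed from each cell, and the relative saving grows with array size.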
TABLE I. COMPARISON OF SNM FOR DIFFERENT CRS IN 130NM CMOS TECHNOLOGY

Cell Ratio | SNM-6T (in mV) | SNM-4T (in mV)
 | 290 | 70
 | 310 | 250
 | 320 | 370

TABLE II. COMPARISON OF SNM FOR DIFFERENT CRS IN 90NM CMOS TECHNOLOGY

Cell Ratio | SNM-6T (in mV) | SNM-4T (in mV)
 | 260 | 40
 | 270 | 170
 | 280 | 270

TABLE III. COMPARISON OF SNM FOR DIFFERENT CRS IN 65NM CMOS TECHNOLOGY

Cell Ratio | SNM-6T (in mV) | SNM-4T (in mV)
 | 240 | 30
 | 250 | 150
 | 260 | 230

TABLE IV. ACCESS TIMES FOR BOTH THE TYPES OF SRAM ARRAYS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 130NM CMOS TECHNOLOGY.

Metric | 1K-Bit 6T SRAM | 1K-Bit New Loadless 4T SRAM
Read Access Time | 608ps | 996ps
Write Access Time | 145ps | 118ps

TABLE V. ACCESS TIMES FOR BOTH THE TYPES OF SRAM ARRAYS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 90NM CMOS TECHNOLOGY.

Metric | 1K-Bit 6T SRAM | 1K-Bit New Loadless 4T SRAM
Read Access Time | 671ps | 1290ps
Write Access Time | 145ps | 92.2ps

TABLE VI. ACCESS TIMES FOR BOTH THE TYPES OF SRAM ARRAYS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 65NM CMOS TECHNOLOGY.

Metric | 1K-Bit 6T SRAM | 1K-Bit New Loadless 4T SRAM
Read Access Time | 781ps | 1990ps
Write Access Time | 134ps | 87.6ps

TABLE VII. COMPARISON OF TPD FOR DIFFERENT ARRAY CONFIGURATIONS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 130NM CMOS TECHNOLOGY.

Configuration | TPD (6T) (in mW) | TPD (4T) (in mW) | Reduction in TPD
1*1 | 0.089408 | 0.05309 | 40.62%
16*16 | 1.4833 | 0.8738 | 41.09%
32*32 | 3.1688 | 1.8397 | 41.94%

ACKNOWLEDGMENT
Sandeep R would like to thank I. K. Ravish Kumar,
Project Manager, Intel Technology India Private Limited,
Bangalore, for helpful discussions and providing the tool.
REFERENCES
[1] James S. Caravella, "A low voltage SRAM for embedded applications," IEEE Journal of Solid-State Circuits, vol. 32, no. 3, pp. 428-432, March 1997.
[2] Yen-Jen Chang, Shanq-Jang Ruan, and Feipei Lai, "Design and analysis of low-power cache using two-level filter scheme," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 4, pp. 568-580, August 2003.
[3] Jinshen Yang and Li Chen, "A new loadless 4-transistor SRAM cell with a 0.18µm CMOS technology," Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 538-541, April 2007.
[4] Andrei Pavlov and Manoj Sachdev, CMOS SRAM Circuit Design and Parametric Test in Nano-Scaled Technologies, Springer, 2008.
[5] Nestoras Tzartzanis, High Performance Energy-Efficient Design, pp. 89-119, Springer, 2006.
[6] Jan M. Rabaey, Anantha P. Chandrakasan, and Borivoje Nikolic, Digital Integrated Circuits, PHI, 2003.
[7] Sung-Mo Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits, TMH, 2003.
[8] Mohammad Sharifkhani, Design and Analysis of Low-Power SRAMs, PhD Thesis, University of Waterloo, 2006.
[9] Tegze P. Haraszti, CMOS Memory Circuits, Kluwer Academic Publishers, 2002.
[10] Ingvar Carlson, Design and Evaluation of High Density 5T SRAM Cache for Advanced Microprocessors, Masters Thesis, Linkopings University, 2004.
[11] HSPICE for Windows (Version: Z-2007.03), Inc, 2007 and Star-Hspice Manual, Release 2007.3.
[12] Berkeley Predictive Technology Model website, http://www.eas.asu.edu/%7Eptm/

TABLE VIII.
COMPARISON OF TPD FOR DIFFERENT ARRAY CONFIGURATIONS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 90NM CMOS TECHNOLOGY.
Configuration   TPD (6T) (in mW)   TPD (4T) (in mW)   Reduction in TPD
1*1             0.048379           0.026362           45.51%
16*16           0.82611            0.44497            46.14%
32*32           1.7484             0.91411            47.72%

TABLE IX.
COMPARISON OF TPD FOR DIFFERENT ARRAY CONFIGURATIONS WITH CR=3 FOR 6T SRAM AND CR=4 FOR 4T SRAM IN 65NM CMOS TECHNOLOGY.
Configuration   TPD (6T) (in mW)   TPD (4T) (in mW)   Reduction in TPD
1*1             0.036              0.0189             47.50%
16*16           0.5918             0.30994            47.63%
32*32           1.2478             0.6497             47.93%
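The "Reduction in TPD" column in Tables VII-IX is consistent with a relative saving, (TPD(6T) - TPD(4T)) / TPD(6T). The tables do not state this definition, so treat it as an assumption. As a quick consistency check under that assumption, the Python sketch below recomputes the column from the reported power values:

```python
# Recompute the "Reduction in TPD" column of Tables VII-IX.
# Assumption: reduction = (TPD_6T - TPD_4T) / TPD_6T, expressed in percent.
TPD_MW = {  # node -> configuration -> (TPD 6T, TPD 4T), both in mW
    "130nm": {"1*1": (0.089408, 0.05309), "16*16": (1.4833, 0.8738), "32*32": (3.1688, 1.8397)},
    "90nm": {"1*1": (0.048379, 0.026362), "16*16": (0.82611, 0.44497), "32*32": (1.7484, 0.91411)},
    "65nm": {"1*1": (0.036, 0.0189), "16*16": (0.5918, 0.30994), "32*32": (1.2478, 0.6497)},
}


def reduction_percent(tpd_6t: float, tpd_4t: float) -> float:
    """Relative TPD saving of the 4T array with respect to the 6T array, in percent."""
    return 100.0 * (tpd_6t - tpd_4t) / tpd_6t


if __name__ == "__main__":
    for node, rows in TPD_MW.items():
        for config, (p6, p4) in rows.items():
            print(f"{node:>6} {config:>6}: {reduction_percent(p6, p4):6.2f} %")
```

Every recomputed value rounds to the tabulated percentage. This supports reading the column as a relative saving rather than an absolute difference in mW.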

TABLE X.
COMPARISON OF TOTAL NUMBER OF TRANSISTORS FOR DIFFERENT ARRAY CONFIGURATIONS.
                Total Number of Transistors
Configuration   6T SRAM array   New Loadless 4T SRAM array   Reduction
1*1             31              29                           6.45%
16*16           2064            1552                         24.81%
32*32           7232            5184                         28.32%
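If we subtract the cell transistors (6 per bit for the 6T cell, 4 per bit for the loadless 4T cell) from the totals in Table X, both arrays appear to use the same peripheral circuitry. In that case the whole saving comes from the two transistors removed per cell. The split into cell and peripheral counts is our inference, not something the table states. A quick check under that assumption:

```python
# Split the Table X totals into cell transistors and "everything else"
# (decoders, precharge, sense amplifiers, write drivers, ...).
# Assumption: 6 transistors per 6T cell and 4 per loadless 4T cell.
TOTALS = {1: (31, 29), 16: (2064, 1552), 32: (7232, 5184)}  # N -> (6T array, 4T array)


def periphery(total: int, n: int, transistors_per_cell: int) -> int:
    """Transistors outside the N x N cell array."""
    return total - transistors_per_cell * n * n


def reduction_percent(total_6t: int, total_4t: int) -> float:
    """Relative transistor-count saving of the 4T array, in percent."""
    return 100.0 * (total_6t - total_4t) / total_6t


for n, (t6, t4) in TOTALS.items():
    print(f"{n}x{n}: periphery 6T={periphery(t6, n, 6)}, "
          f"4T={periphery(t4, n, 4)}, reduction={reduction_percent(t6, t4):.2f} %")
```

Under these assumptions the peripheral count is identical for both arrays (25, 528 and 1088 transistors). The saving is therefore 2N² divided by the 6T total, and it tends toward one third for large arrays whose periphery grows more slowly than N².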

Sandeep R received his B.E. degree (Electronics and Communication) from Visvesvaraya Technological University in 2006. He is currently pursuing an M.Tech. degree (Electronics) at Visvesvaraya Technological University. His main research interests include the analysis and design of low-power memories and adders.

VII. CONCLUSION
The New Loadless 4T SRAM cell has been designed and analyzed in deep submicron (130 nm, 90 nm, and 65 nm) CMOS technologies. These results establish the technology independence of the New Loadless 4T SRAM cell and its consistent performance relative to the conventional 6T SRAM cell in the deep submicron regime. The New Loadless 4T SRAM array consumes less power and occupies less area than the conventional 6T SRAM array. The New Loadless 4T SRAM cell operates with high stability at higher values of CR. The most significant feature of this new loadless 4T SRAM cell is that it requires no modification of the fabrication process. It can therefore be used for on-chip caches in embedded microprocessors, for high-density SRAMs embedded in logic devices, and for stand-alone SRAM applications.

Narayan T Deshpande received his B.E. degree from Bangalore University in 1990 and his M.E. degree from Bangalore University in 1996. His main research interests include signal processing.
A R Aswatha received his B.E. degree from Mysore University in 1991, his M.Tech. degree from M.I.T. Manipal in 1996, and his M.S. degree from B.I.T.S. Pilani in 2002. He has submitted his Ph.D. thesis to Dr. M.G.R. University. His main research interests include the analysis and design of low-power VLSI circuits and image processing.


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 7, JULY 2011

Impacts of NBTI/PBTI and Contact Resistance on Power-Gated SRAM With High-k Metal-Gate Devices
Hao-I Yang, Student Member, IEEE, Wei Hwang, Fellow, IEEE, and Ching-Te Chuang, Fellow, IEEE

Abstract: The threshold voltage ($V_{TH}$) drifts induced by negative bias temperature instability (NBTI) and positive bias temperature instability (PBTI) weaken PFETs and high-k metal-gate NFETs, respectively. These long-term $V_{TH}$ drifts degrade SRAM cell stability, margin, and performance, and may lead to functional failure over the life of usage. Meanwhile, the contact resistance of CMOS devices increases sharply with technology scaling, especially in SRAM cells with minimum-size and/or sub-ground-rule devices. The contact resistance, together with NBTI/PBTI, cumulatively worsens SRAM stability and leads to severe SRAM performance degradation. Furthermore, most state-of-the-art SRAMs are designed with power-gating structures to reduce leakage currents in Standby or Sleep mode. The power switches can also suffer NBTI or PBTI degradation and have large contact resistances. This paper presents a comprehensive analysis of the impacts of NBTI and PBTI on power-gated SRAM arrays with high-k metal-gate devices, and of their combined effects with the contact resistance on SRAM cell stability, margin, and performance. NBTI/PBTI-tolerant sense amplifier structures are also discussed.
Index Terms: Contact resistance, negative bias temperature instability (NBTI), positive bias temperature instability (PBTI), power-gated SRAM, reliability.

ACRONYM
SRAM         Static random access memory.
PBTI         Positive bias temperature instability.
NBTI         Negative bias temperature instability.
$R_{ov}$     Overlap resistance.
$R_{ext}$    Extension resistance.
$R_{deep}$   Deep resistance.
$R_{csd}$    Silicon-contact diffusion resistance.
NOTATION
$\rho_s$     The sheet resistance per square of the underlying heavily doped silicon layer, in units of Ω/□.
$\rho_c$     The specific contact resistivity between the metal and the diffusion layer, in units of Ω·cm².

Manuscript received October 16, 2009; revised February 06, 2010; accepted
March 30, 2010. First published May 17, 2010; current version published June
24, 2011. This work was supported by the National Science Council of Taiwan,
under Contract NSC 98-2221-E-009 -112, and Ministry of Economic Affairs,
under the Project MOEA 98-EC-17-A-01-S1-124.
The authors are with the Department of Electronics Engineering
and Institute of Electronics, National Chiao-Tung University, Hsinchu
300, Taiwan (e-mail: haoiyang@gmail.com; hwang@mail.nctu.edu.tw;
chingte.chuang@gmail.com).
Digital Object Identifier 10.1109/TVLSI.2010.2049038

$L_T$        The transfer length.
$l_c$        The length of a contacted silicide.

I. INTRODUCTION

NBTI has long been a concern for scaled PFETs. The long-term $V_{TH}$ drift caused by NBTI has been shown to degrade the stability and performance of SRAM, and may lead to functional failure over the life of usage. Recently, high-k metal-gate technology has been introduced to contain the gate leakage current and to enable scaling of MOSFETs to the 45 nm node and below. With it, PBTI has emerged as a major reliability concern for NFETs due to $V_{TH}$ instability caused by charge trapping at the interface. These long-term $V_{TH}$ drifts degrade MOSFET current drive over time, and their effects become more significant with technology and voltage scaling (see Fig. 1) [1].
The transistor performance also degrades with the ever-increasing device contact resistances and series resistances of the channel/source/drain in scaled technologies [2], [3]. Conventionally, the contact and series resistances are second-order effects on device performance. However, with technology scaling, the contact area and the device width decrease, leading to an increase in contact and series resistances. When the silicide length continuously shrinks and becomes smaller than the transfer length, the contact resistance increases sharply, severely degrading the stability and performance of circuits.
SRAMs in deep sub-100 nm technologies have poor margin and stability due to large leakage and process variation, fundamental limitations such as random dopant fluctuation (RDF), and microscopic effects such as line edge roughness (LER). The combined/cumulative effects of NBTI/PBTI and device contact and series resistance aggravate the already poor margin and stability of SRAMs. Furthermore, many state-of-the-art SRAMs are designed with power-gating structures to reduce static power in Standby or Sleep mode [4]-[7]. The power-gating structures play a vital role in containing leakage current in Standby or Sleep mode and in providing sufficient current for SRAM arrays in Active mode. Unfortunately, power switches also suffer NBTI/PBTI stress and degradation, and become weaker over time. As such, it is crucial to understand the NBTI/PBTI degradation of the power-gating structures, in addition to the cell, and the resulting combined impacts on power-gated SRAMs.
Previous works have shown that the SRAM read static noise margin (RSNM) was degraded by NBTI effects, while the write margin (WM) was improved [8]. RSNM and WM were both degraded

1063-8210/$26.00 © 2010 IEEE

YANG et al.: IMPACTS OF NBTI/PBTI AND CONTACT RESISTANCE ON POWER-GATED SRAM

Fig. 1. $V_{TH}$ drifts induced by NBTI and PBTI using reaction-diffusion framework calibrated with published data [1]. $T_{ox}$ for high-k metal gate (nMOS = 7.5 Å; pMOS = 7.7 Å); for poly gate (nMOS = 16.5 Å; pMOS = 17.5 Å).

when PBTI and NBTI were considered together, and the degradations were more sensitive to PBTI [9]. SRAM degradation over time was also shown in [10]. However, these papers focused only on SRAM cells in standard 6-T SRAM array structures. In this paper, we present a comprehensive analysis of the impacts of NBTI and PBTI on power-gated SRAM arrays. We analyze two types of power-gating structures, header and footer. The resulting impacts on stability, margin, power, performance, virtual supply bounce, wake-up time, etc., are discussed. The effects of contact and series resistances on the SRAM cell, and their combined impacts with NBTI/PBTI on SRAM, are also investigated.
Section II describes our simulation models, which are based on the predictive technology model (PTM) for high-k 32 nm CMOS technology. Section III shows the NBTI/PBTI impacts on power-gated SRAM. Section IV analyzes the impacts of contact resistance on power-gated SRAM and also studies their combined effects with NBTI/PBTI. Section V compares SRAM sensing structures, including the differential sense amplifier and a large-signal sensing scheme. It shows that a judicious choice of sense amplifier structure can mitigate NBTI and PBTI effects. The conclusions of the paper are given in Section VI.
II. ANALYSIS MODELS
This section describes the NBTI/PBTI model and contact resistance model used in our analyses. The power-gated SRAM
structure and its operation in this work are also introduced.
A. NBTI and PBTI Model
NBTI causes the threshold voltage $V_{TH}$ of a PFET to become more negative with time, leading to long-term degradation of current drive. Under negative gate bias (stress phase), holes in the inversion layer interact with and break Si-H bonds at the interface. The H-species diffuse into the oxide and leave interface traps behind, which increases $|V_{TH}|$. When the stress conditions are removed, H-species diffuse back to the interface and passivate dangling Si bonds, so passivation (or recovery) occurs. Thus, the device lifetime under ac stress is longer than that predicted by dc stress measurements. The corresponding effect for NFETs, namely PBTI, is in general quite small and can be neglected for oxide/poly-gate devices. NFETs with high-k gates, however, exhibit significant charge trapping and thus a long-term $V_{TH}$ shift as well. The $V_{TH}$ drift of a PFET (NFET) due to NBTI (PBTI) can be described by the dc reaction-diffusion (RD) framework when the stress signal does not change (i.e., static stress) [8], [11], [12]. If the stress signal changes with time (i.e., alternating stress), the dc RD model can be multiplied by a prefactor. This prefactor accounts for the signal (stress) probability, the frequency and duty cycle of the stress signal, and the recovery mechanism, and the resulting formula is called the ac RD model [8], [11], [12]. However, according to the results of [12] and [13], the impact of the signal frequency on $V_{TH}$ drift is relatively insignificant. Thus, we neglect the effect of signal frequency and analyze cases with various signal (stress) probabilities. In the following analysis, the prefactor of the ac RD model is simplified as a function of signal probability only. The simplified ac RD model is
(1)
where the prefactor is a function of the signal probability $S$, and the remaining coefficient is a technology-dependent constant. Notice also that
the NBTI/PBTI-induced $V_{TH}$ drift depends strongly on the $V_{GS}$ bias and temperature, but barely on $V_{DS}$ [11], [12]. Fig. 1 shows the $V_{TH}$ drifts induced by NBTI and PBTI using the reaction-diffusion framework calibrated with published data [1]. The $V_{TH}$ drifts are incorporated into the PTM 32 nm and PTM high-k 32 nm device models.¹ In the model, $T_{ox}$ of the poly-gate PFET is 17.5 Å, while $T_{ox}$ of the high-k metal-gate PFET is only 7.7 Å. The high-k metal-gate device therefore has a $T_{ox}$ almost 2.3 times smaller than the poly-gate device. These values are consistent with two facts. First, the best (smallest) $T_{ox}$ achievable with a SiON/poly-Si gate is around 17-18 Å, limited by gate tunneling leakage. Second, state-of-the-art 32 nm high-k metal-gate devices have $T_{ox}$ around 7.5-8.0 Å. As such, in our model, the $V_{TH}$ drift of the high-k gate device is more serious than that of the SiON/poly-Si gate device.
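A prefactor-scaled RD model of the kind described above can be sketched as follows. The power-law form α(S)·K_DC·t^n, the exponent n = 1/6, the constant K_DC and the function name are illustrative assumptions. They are not the calibrated model used in this paper. A pure power law also does not capture the saturation visible in Fig. 1.

```python
def delta_vth(t_s: float, prefactor: float, k_dc: float, n: float = 1.0 / 6.0) -> float:
    """Hypothetical prefactor-scaled RD model: dVth = prefactor * k_dc * t**n.

    t_s       : stress time in seconds
    prefactor : alpha(S), the signal-probability-dependent prefactor (1.0 ~ dc stress)
    k_dc      : technology-dependent constant in V / s**n (placeholder, not calibrated)
    n         : time exponent; 1/6 is a common choice for H2-diffusion RD models
    """
    if t_s < 0.0:
        raise ValueError("stress time must be non-negative")
    return prefactor * k_dc * t_s ** n


# The power switches are treated as always stressed (prefactor 1.0); cell
# prefactors would instead come from a table indexed by signal probability.
K_DC = 1e-3  # placeholder constant, chosen only for illustration
for t in (1e4, 1e6, 1e8):
    print(f"t = {t:.0e} s: dVth(switch) = {delta_vth(t, 1.0, K_DC) * 1e3:.1f} mV")
```

In this form the drift scales linearly with the prefactor, so lowering the effective stress probability reduces the drift in proportion.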
B. Contact Resistance Model
As shown in Fig. 2(a), the source/drain (S/D) series resistance can be divided into four components: overlap resistance $R_{ov}$, extension resistance $R_{ext}$, deep resistance $R_{deep}$, and silicon-contact diffusion resistance $R_{csd}$. All resistances are in units of Ω·μm. Conventionally, $R_{ov}$, $R_{ext}$, and $R_{deep}$ are included in the device model, but $R_{csd}$ is not. We model the $R_{csd}$ of a transistor as shown in Fig. 2(b). With technology scaling, the sum of $R_{ov}$, $R_{ext}$, and $R_{deep}$ decreases, but $R_{csd}$ increases. The formula for the silicon-contact diffusion resistance is given by
$$R_{csd} = \sqrt{\rho_s \rho_c}\,\coth\left(\frac{l_c}{L_T}\right) \qquad (2)$$
where $\rho_s$ is the sheet resistance per square of the underlying heavily doped silicon layer, in units of Ω/□, $\rho_c$ is the specific contact resistivity between the metal and the diffusion layer, in units of Ω·cm², $l_c$ is the length of the contacted silicide, and $L_T$ is the transfer length,
¹[Online]. Available: http://www.eas.asu.edu/~ptm/


Fig. 4. 6-T SRAM cell.


Fig. 2. (a) Series resistance components of S/D and (b) schematic of NMOS
with S/D diffusion contact resistances.

Fig. 3. Power-gated SRAM with (a) header and (b) footer.

which is defined as $L_T = \sqrt{\rho_c/\rho_s}$ [3]. When $l_c$ is larger than $L_T$, the contact resistance depends only slightly on the contact region. However, when $l_c$ is smaller than $L_T$, the contact resistance increases sharply as $l_c$ is scaled down further. According to [3], the contact resistance becomes larger than the sum of $R_{ov}$, $R_{ext}$, and $R_{deep}$, and increases sharply beyond the 45 nm technology node. Because the diffusion contact resistance dominates the short-channel device resistance, the following analysis focuses on its impacts on the SRAM array.
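The behavior of the contact resistance can be checked numerically. The Python sketch below assumes the standard transmission-line form R_csd = sqrt(ρ_s·ρ_c)·coth(l_c/L_T) with L_T = sqrt(ρ_c/ρ_s). Here ρ_s is the sheet resistance, ρ_c the specific contact resistivity and l_c the contacted silicide length. The code converts ρ_c from Ω·cm² to Ω·μm² and sweeps l_c. The ρ_s and ρ_c values are placeholders, not the extrapolated 32 nm values used in this paper.

```python
import math

UM2_PER_CM2 = 1e8  # 1 cm^2 = 1e8 um^2


def transfer_length_um(rho_s: float, rho_c_ohm_cm2: float) -> float:
    """Transfer length L_T = sqrt(rho_c / rho_s) in um; rho_s in ohm/sq, rho_c in ohm*cm^2."""
    return math.sqrt(rho_c_ohm_cm2 * UM2_PER_CM2 / rho_s)


def r_csd_ohm_um(rho_s: float, rho_c_ohm_cm2: float, l_c_um: float) -> float:
    """Width-normalized contact resistance sqrt(rho_s * rho_c) * coth(l_c / L_T), in ohm*um."""
    if l_c_um <= 0.0:
        raise ValueError("contact length must be positive")
    rho_c_ohm_um2 = rho_c_ohm_cm2 * UM2_PER_CM2
    l_t = transfer_length_um(rho_s, rho_c_ohm_cm2)
    # coth(x) = 1 / tanh(x): ~1 for l_c >> L_T, ~L_T / l_c for l_c << L_T
    return math.sqrt(rho_s * rho_c_ohm_um2) / math.tanh(l_c_um / l_t)


# Placeholder values (not the paper's 32 nm extrapolation): rho_s = 100 ohm/sq and
# rho_c = 1e-8 ohm*cm^2, which give L_T = 0.1 um.
RHO_S, RHO_C = 100.0, 1e-8
for l_c in (0.4, 0.2, 0.1, 0.05, 0.025):
    print(f"l_c = {l_c:5.3f} um -> R_csd = {r_csd_ohm_um(RHO_S, RHO_C, l_c):6.2f} ohm*um")
```

In this sweep R_csd stays close to sqrt(ρ_s·ρ_c) = 10 Ω·μm while l_c exceeds L_T. Below L_T it roughly doubles each time l_c is halved, which mirrors the sharp increase described above.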
C. Power-Gated SRAM Structures
SRAM power-gating structures can be divided into two basic types: a header power-gating structure and a footer power-gating structure. Fig. 3(a) shows a column-based header-gated SRAM structure. In this structure, PH1 is the header power switch used for leakage reduction, and MH1 is the clamping device that biases the virtual supply (VVDD) for data retention in Standby or Sleep mode. Fig. 3(b) shows a column-based footer-gated SRAM structure. MF1 is the footer power switch, and PF1 is the clamping device that biases the virtual ground (VVSS) for data retention in Standby or Sleep mode. Fig. 4 shows the standard 6T SRAM cell used in our analysis. The sub-array block size is 128 × 128 cells. Our analysis includes the parasitic capacitance, inductance, and resistance of the package. Each power-gated SRAM array is connected through a package model [14] whose parasitic capacitance, inductance, and resistance are 5.32 pF, 8.18 nH, and 0.217 Ω, respectively.

TABLE I
SIGNAL (STRESS) PROBABILITY CASE SUMMARY
In power-gated SRAM arrays, the power switch should provide sufficient supply voltage and current for SRAM cells to maintain adequate margin and performance during Read and Write operations. The power switches are constantly under NBTI or PBTI stress in Active mode, so they have the highest probability of being stressed. Hence, we assume the worst-case scenario that the power switch is always stressed. In contrast, the clamping device (diode) is shunted by the power switch and experiences no NBTI/PBTI stress during Active mode. In Standby or Sleep mode, the stress voltage across the clamping device is only about one $V_{TH}$ (the MOS diode voltage), so the NBTI or PBTI effects on the clamping device are negligible.
The following sections present detailed simulation results based on the BSIM predictive high-k metal-gate model for 32 nm. The supply voltage of the SRAM arrays is 0.9 V. The contact area is assumed to be 0.05 μm, based on scaling from the UMC 65 nm CMOS process technology in accordance with the scaling factor from the ITRS Roadmap.² The ranges of values for sheet resistance and specific contact resistivity come from scaling/extrapolation of the UMC 65 nm CMOS process, published data, and ITRS projections [3].² The contact resistance at the 32 nm node ranges from around 100 to 500 Ω.
In addition, the $V_{TH}$ drifts due to NBTI and PBTI are based on the ac RD framework and calibrated with published data [1]. The prefactors of the ac RD framework for the different cases are taken from [8]. When the entire array under NBTI/PBTI stress is analyzed, all cells of the array are assumed to be stressed with one of three signal probability cases (Cases A, B, and C in Table I): none, 25% (75%), and 50% (50%). When an individual cell is examined, four signal probability (SP) cases are considered instead (Cases D, E, F, and G in Table I): none, 25% (75%), 50% (50%), and 100% (0%).
²[Online]. Available: http://www.itrs.net/


III. IMPACTS OF NBTI AND PBTI ON POWER-GATED SRAM


This section analyzes the impacts of NBTI and PBTI on
power-gated SRAM. Long-term reliability degradations of
header and footer structures are investigated, including RSNM,
WM, and access performance. NBTI and PBTI effects on
power-gated SRAM wake-up transition are also analyzed.
A. Active Mode Virtual Supply
When a header-gated SRAM is stressed, VVDD drifts with time, and the stability of the array is impacted. VVDD is determined by the resistance of the power switch and the equivalent resistance of the SRAM array. Hence, the drift of VVDD also depends on the signal probabilities of the SRAM cells. When only NBTI is present/considered, VVDD decreases with the stress time [see Fig. 5(a)]. This is because the $|V_{TH}|$ and resistance of the PFET header increase due to NBTI, and this effect outweighs the increase in the equivalent resistance of the SRAM array. On the other hand, if only PBTI is present/considered, VVDD increases with the stress time [see Fig. 5(b)]. There is no PBTI effect on the PFET header switch, while the equivalent resistance of the SRAM array increases due to PBTI on the cell NFETs. Similar behavior can be observed for a footer-gated SRAM. When only NBTI is present/considered, VVSS decreases [see Fig. 5(d)]. There is no NBTI effect on the NFET footer switch, while the equivalent resistance of the SRAM array increases due to NBTI on the cell PFETs. In contrast, if only PBTI is present/considered, VVSS increases with time [see Fig. 5(e)]. This is because the equivalent resistance of the NFET footer increases due to PBTI, and this effect outweighs the increase in the equivalent resistance of the SRAM array. Finally, when both NBTI and PBTI are present/considered, the VVDD variation of a header-gated SRAM shows the combined effect of the NBTI-only and PBTI-only cases [see Fig. 5(c)]. The same holds for the VVSS variation of a footer-gated SRAM under combined NBTI and PBTI [see Fig. 5(f)]. These results imply that the $V_{TH}$ drift of the SRAM cell array only partially compensates the virtual supply drift caused by the $V_{TH}$ drift of the header/footer power switch.
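The direction of the Active-mode drift can be reproduced with a lumped resistive divider, VVDD = VDD·R_array/(R_switch + R_array). The sketch below assumes that simplification and ignores the clamp device and package parasitics. The resistance values and percentage increases are hypothetical numbers chosen only to mimic the NBTI-only and PBTI-only cases described above.

```python
def virtual_vdd(vdd: float, r_switch: float, r_array: float) -> float:
    """Lumped estimate of VVDD: header switch in series with the array's equivalent resistance."""
    return vdd * r_array / (r_switch + r_array)


VDD = 0.9  # V, the supply voltage used in this paper
R_SWITCH, R_ARRAY = 1.0, 20.0  # hypothetical resistances (arbitrary units)

fresh = virtual_vdd(VDD, R_SWITCH, R_ARRAY)
# NBTI-only: the PFET header weakens strongly; cell PFETs raise the array resistance slightly.
nbti_only = virtual_vdd(VDD, R_SWITCH * 1.3, R_ARRAY * 1.05)
# PBTI-only: the header is unaffected; cell NFETs raise the array resistance.
pbti_only = virtual_vdd(VDD, R_SWITCH, R_ARRAY * 1.10)
print(f"fresh {fresh:.4f} V, NBTI-only {nbti_only:.4f} V, PBTI-only {pbti_only:.4f} V")
```

VVDD drops when the switch resistance grows faster than the array resistance, and rises when only the array resistance grows. This matches the opposite trends in Fig. 5(a) and (b).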
B. Read Operation
The RSNM of a cell is defined as the voltage difference between the maximum SRAM Read disturb and the minimum trip point of the SRAM inverter pair during Read operation. Therefore, when the PFET loading transistors are degraded by NBTI, the trip points of the SRAM inverter pair decrease and RSNM becomes worse. Degradation of the header (footer) power switch by NBTI (PBTI) increases the $V_{TH}$ and ON resistance of the power switch. This reduces the voltage across the array during Active mode and degrades the RSNM of a cell [see Fig. 6(a) and (d)].
On the other hand, the impact of PBTI degradation of the cell driving (pull-down) NFETs depends on the signal (stress) probability. When both NFET driving transistors are stressed and suffer PBTI-induced $V_{TH}$ drift, the RSNM tends to improve [see Fig. 6(b) and (e)]. The reason is as follows. Due to PBTI, both the Read disturb and the trip points of the SRAM inverter pair increase. However, the Read disturb is determined by the voltage divider formed by the NFET access transistor and the NFET driving transistor during Read operation. As the NFET driving

Fig. 5. Active mode VVDD of header structure impacted by (a) NBTI,


(b) PBTI, and (c) NBTI&PBTI; active mode VVSS of footer structure impacted
by (d) NBTI, (e) PBTI, and (f) NBTI&PBTI.

transistor in the voltage divider is fully on, the PBTI-induced increase of the Read disturb is smaller than the increase of the inverter trip voltage, where the NFET driving transistor is off. Thus, RSNM improves when the $V_{TH}$ of both NFET driving transistors increases. In contrast, if the signal (stress) probability is skewed [e.g., 100% (0%)], the Read disturb dominates RSNM for the worst-case pattern, and RSNM decreases with the stress time [see Fig. 6(b) and (e)]. Furthermore, consider the case where both NBTI and PBTI are present and the signal (stress) probability is not 100% or 0%, so that both NFET driving transistors are stressed. Here the RSNM degradation induced by NBTI can be offset by the RSNM improvement induced by PBTI of the driving NFETs [see Fig. 6(c) and (f)]. Fig. 7(a) and (b) show the relation between RSNM and signal (stress) probability when the stress time is 10 s. 10 s is chosen because the $V_{TH}$ drifts induced by NBTI and PBTI saturate when the stress time is around 10 to 10 s [1]. The $V_{TH}$ drift induced by PBTI saturates earlier than that induced by NBTI. These figures clearly indicate that while NBTI degrades RSNM in general, PBTI can improve RSNM when the signal (stress) probability is between around 25% and 75%. RSNM under PBTI reaches its peak value when the cell signal (stress) probability is around 50%. Both figures also indicate that, in this 25%-75% range, the RSNM improvement induced by PBTI can offset the RSNM degradation induced by NBTI. As such, the SRAM cell array lifetime could be extended if the cell signal (stress) probability were maintained around 50%.
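The 25%-75% window follows from simple bookkeeping. If a cell holds "1" with probability S, one driving NFET is stressed for a fraction S of the time and the other for 1 - S. Both drivers are therefore stressed only when S is away from 0% and 100%, and their drifts are balanced at S = 50%. The sketch below makes this explicit. The prefactor α(x) = x^(1/6) is a placeholder assumption, not the prefactor table from [8].

```python
from typing import Callable, Tuple


def driver_stress(s: float) -> Tuple[float, float]:
    """Fraction of time each pull-down NFET is on (PBTI-stressed) when the cell holds '1' with probability s."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("signal probability must lie in [0, 1]")
    return s, 1.0 - s


def placeholder_alpha(x: float) -> float:
    """Placeholder monotonic prefactor with alpha(0) = 0; not a calibrated model."""
    return x ** (1.0 / 6.0)


def drift_imbalance(s: float, alpha: Callable[[float], float] = placeholder_alpha) -> float:
    """Relative Vth mismatch |alpha(s) - alpha(1 - s)| between the two driving NFETs."""
    a, b = driver_stress(s)
    return abs(alpha(a) - alpha(b))


for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"S = {s:4.2f}: imbalance = {drift_imbalance(s):.3f}")
```

The imbalance is zero at S = 50% and largest at 0% or 100%. In those extreme cases the Read disturb, rather than the raised trip points, dominates RSNM.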
The Read delay is determined by the bitline discharging
time by the access NFET and the driving NFET of the selected
cell. Therefore, Read delays are relatively insensitive to the


Fig. 6. RSNM of header structure impacted by (a) NBTI, (b) PBTI, and
(c) NBTI&PBTI; RSNM of footer structure impacted by (d) NBTI, (e) PBTI,
and (f) NBTI&PBTI.

Fig. 8. Read delay of header structure impacted by (a) NBTI, (b) PBTI, and
(c) NBTI&PBTI; read delay of footer structure impacted by (d) NBTI, (e) PBTI,
and (f) NBTI&PBTI.

$V_{TH}$ drifts of the PFET loading transistors due to NBTI [see Fig. 8(a) and (d)]. In contrast, Read delay becomes worse when PBTI is present/considered [see Fig. 8(b) and (e)], because the driving NFET is weakened. Moreover, after NBTI/PBTI stressing, the current flowing through the power switch and SRAM array decreases, and the Read power of both header- and footer-gated SRAMs decreases with the stress time.
C. Write Operation

Fig. 7. Relation between RSNM and signal (stress) probability when stress
time is 10 s in (a) header and (b) footer structure.

A Write first discharges the logic 1 storage node through the access NFET. Once that node is pulled below the trip voltage of the inverter, the pull-up PFET charges the logic 0 storage node toward 1. The cross-coupled feedback inverter action then completes the Write operation. When a cell is affected by NBTI and the cell signal (stress) probability is not 100% (0%), both PFET loading transistors become weaker. A weaker holding PFET helps the initial discharging of the logic 1 storage node through the access NFET, while a weaker pull-up PFET impedes the subsequent pull-up of the logic 0 storage node. The initial discharging of the logic 1 storage node tends to dominate the Write operation, so the WM improves when both PFETs are weakened. However, when the cell signal (stress) probability is 100% (0%), only one PFET loading transistor becomes weaker. For the worst-case pattern, the PFET holding the original logic 1 storage node is not stressed/weakened, so pulling down the logic 1 storage node does not become any easier. The PFET corresponding to the original logic 0 storage node, however, is fully stressed/weakened. This slows down the charging of its storage node to logic 1 during the Write operation.


Fig. 9. WM of header structure impacted by (a) NBTI, (b) PBTI, and


(c) NBTI&PBTI; WM of footer structure impacted by (d) NBTI, (e) PBTI, and
(f) NBTI&PBTI.

As a result, the WM degrades [see Fig. 9(a) and (d)]. On the other hand, when a cell is affected by PBTI and the cell signal (stress) probability is not 0% or 100%, both NFET driving transistors degrade. Nevertheless, during a Write operation, the pull-down of the logic 1 storage node is dictated initially by the strength of the access NFET and the loading PFET, and then by the cross-coupled feedback inverter action. Weakening both driving NFETs helps the pull-up of the logic 0 storage node. At the same time, it impedes the pull-down of the logic 1 storage node through the cross-coupled feedback inverter action, so the two effects compensate each other. Thus, the WM is relatively insensitive to PBTI. However, if the cell signal (stress) probability is 0% or 100%, only the $V_{TH}$ of one NFET driving transistor increases, raising the trip point of one of the inverters. For the worst-case pattern, the higher trip point of that inverter impedes the final pull-down of its logic 1 storage node through the cross-coupled feedback inverter action, thus degrading the WM [see Fig. 9(b) and (e)]. Finally, the combined impact of NBTI and PBTI on the WM is shown in Fig. 9(c) and (f).
The Write delay of a cell closely tracks its WM: the Write delay decreases when WM improves and becomes worse when WM degrades. The Write power also decreases with stress time due to the lower current flowing through the power-gated SRAM after stressing.
D. Standby/Sleep Mode Virtual Supply and Wake-Up
Transition
During Standby or Sleep mode, the power switch turns off
while the clamping device biases VVDD (for header-gating)

Fig. 10. (a) Change in VVDD of header structure by NBTI&PBTI and


(b) change in VVSS of footer structure by NBTI&PBTI during standby/sleep
mode.

or VVSS (for footer-gating) to an appropriate level for data retention. As explained in Section II, the clamping device is not stressed. VVDD (VVSS) is therefore dominated by the $V_{TH}$ drifts of the SRAM array devices during Standby or Sleep mode. As such, the Standby/Sleep mode VVDD of the header structure increases with the stress time, while the Standby/Sleep mode VVSS of the footer structure decreases with the stress time, as shown in Fig. 10. Additionally, the equivalent OFF resistance of the SRAM array increases, so the leakage of both header and footer structures decreases, as shown in Fig. 11.
During the wake-up transition, the power switch turns on. The resulting large current through the parasitic capacitance, inductance, and resistance of the package and interconnect causes the virtual supply line to bounce. After stressing, the $V_{TH}$ and equivalent resistance of the power switch and SRAM array increase, so the wake-up current of the SRAM array decreases. As a result, the virtual supply line bounce is reduced during the wake-up transition, as shown in Fig. 12. Moreover, when only the power switch is impacted by NBTI or PBTI, the wake-up time increases with stress time due to the higher $V_{TH}$ of the power switch [Case A of Fig. 13(a) and Case A of Fig. 13(b)]. However, if the SRAM cell array also suffers NBTI and/or PBTI stress, the wake-up time decreases, as the Standby/Sleep mode VVDD of the header structure increases


Fig. 11. (a) Leakage of header structure impacted by NBTI&PBTI and


(b) leakage of footer structure impacted by NBTI&PBTI during standby/sleep
mode.

and the Standby/Sleep mode VVSS of the footer structure


decreases with the stress time.
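A first-order L·di/dt + I·R estimate shows why a smaller wake-up current reduces the bounce. This sketch is a simplification under stated assumptions: a linear current ramp, and no package capacitance or on-chip decoupling. It uses the package inductance and resistance quoted in Section II. The wake-up currents and rise time are hypothetical.

```python
def bounce_estimate_v(i_peak_a: float, t_rise_s: float,
                      l_pkg_h: float = 8.18e-9, r_pkg_ohm: float = 0.217) -> float:
    """First-order supply bounce L*di/dt + R*I for a linear current ramp from 0 to i_peak.

    Defaults are the package inductance and resistance quoted in Section II;
    the 5.32 pF package capacitance and any on-chip decoupling are ignored.
    """
    if t_rise_s <= 0.0:
        raise ValueError("rise time must be positive")
    return l_pkg_h * i_peak_a / t_rise_s + r_pkg_ohm * i_peak_a


# Hypothetical wake-up currents: a fresh array vs. a stressed array drawing 20% less.
T_RISE = 1e-9  # s, hypothetical
fresh = bounce_estimate_v(10e-3, T_RISE)
stressed = bounce_estimate_v(8e-3, T_RISE)
print(f"fresh: {fresh * 1e3:.1f} mV, stressed: {stressed * 1e3:.1f} mV")
```

Both terms scale with the peak current, so in this simple picture the bounce falls in proportion to the reduced wake-up current. The inductive term dominates for nanosecond-scale ramps.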
The $V_{TH}$ drifts of pMOS and nMOS devices strongly depend on $V_{GS}$. If $V_{GS}$ increases, the $V_{TH}$ drift rises; if $V_{GS}$ decreases, the $V_{TH}$ drift induced by NBTI/PBTI also decreases. In a power-gated SRAM during Standby/Sleep mode, the voltage across the cell array decreases, so the $V_{TH}$ drift is smaller than during Active mode. If the data are not needed, the SRAM can be shut down completely to facilitate the NBTI/PBTI recovery mechanism. Therefore, the lifetime of an SRAM can be extended by properly placing the power-gated SRAM into Standby/Sleep mode or by aggressively shutting down unused sections.
IV. COMBINED IMPACTS OF CONTACT RESISTANCE
AND NBTI/PBTI
This section investigates the impact of contact resistance on SRAM reliability and analyzes its combined impacts with NBTI/PBTI on power-gated SRAM. Based on the MOS model of Fig. 2(a), an SRAM cell with contact resistances is built as shown in Fig. 14. We assume that the cell in Fig. 14 stores logic 1. The RSNM of the cell is defined as the voltage difference between the trip voltage of INV_1 and the Read disturb voltage induced by M4 and M6 during

Fig. 12. (a) VVDD bounce of header structure impacted by NBTI&PBTI and
(b) VVSS bounce of footer structure impacted by NBTI&PBTI during wake-up
transition.

Read cycles. The WM of the cell is defined as the BL voltage level below which the cell flips during Write cycles. Reference [3] proposed selective device structure scaling and parasitic engineering as a way to improve transistor performance and extend the technology roadmap. That work showed that the contact resistance plays an important role and used various assumed contact resistance values. In our analysis, the contact area is assumed to be 0.05 μm, based on scaling from the UMC 65 nm CMOS process technology in accordance with the scaling factor from the ITRS Roadmap.² The ranges of values for sheet resistance and specific contact resistivity come from scaling/extrapolation of the UMC 65 nm CMOS process, published data, and ITRS projections [3].² The contact resistance at the 32 nm node ranges from around 100 to 500 Ω.
A. Read Operation
Referring to Fig. 14, when the diffusion contact resistances R1, R5, R7, and R9 increase, the trip voltage of INV_1 decreases and RSNM degrades. If the diffusion contact resistance R3 increases, the trip voltage of INV_1 increases and RSNM improves. On the other hand, M4 and M6 form a voltage divider and induce Read disturb during the Read cycle. The Read disturb voltage increases with larger R4 but decreases with larger R2.
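The opposite effects of R2 and R4 can be illustrated with the access/driver divider. This Python sketch assumes, for illustration only, that R2 sits in series with the access transistor M6 and R4 in series with the driving transistor M4. Both transistors are replaced by hypothetical fixed on-resistances, and the 500 Ω contact value is taken from the 100-500 Ω range quoted above.

```python
def read_disturb_v(v_bl: float, r_access: float, r_driver: float,
                   r2: float = 0.0, r4: float = 0.0) -> float:
    """Read disturb at the '0' storage node from the access/driver voltage divider.

    Hypothetical placement: r2 in series with the access path (M6), r4 in series
    with the driving path (M4); both transistors are modelled as fixed on-resistances.
    """
    return v_bl * (r_driver + r4) / (r_access + r2 + r_driver + r4)


V_BL = 0.9  # V, bitline precharged to the supply voltage used in this paper
R_ACC, R_DRV = 20e3, 10e3  # ohm, hypothetical on-resistances
base = read_disturb_v(V_BL, R_ACC, R_DRV)
only_r4 = read_disturb_v(V_BL, R_ACC, R_DRV, r4=500.0)
only_r2 = read_disturb_v(V_BL, R_ACC, R_DRV, r2=500.0)
both = read_disturb_v(V_BL, R_ACC, R_DRV, r2=500.0, r4=500.0)
print(f"base {base:.4f} V, +R4 {only_r4:.4f} V, +R2 {only_r2:.4f} V, both {both:.4f} V")
```

With equal increases in R2 and R4, the two effects largely cancel. In this toy model the Read disturb moves by only a few millivolts.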


Fig. 15. Normalized RSNM versus contact resistance. RSNM is normalized


with respect to the case with no contact resistance. RSNM degradation at 32 nm
node from diffusion contact resistance is around 1% to 3%.

Fig. 13. (a) Wake-up time of header structure impacted by NBTI&PBTI and
(b) wake-up time of footer structure impacted by NBTI&PBTI.

Fig. 16. Normalized read delay versus contact resistance. Read delay is normalized with respect to the case with no contact resistance. Read delay degradation at 32 nm node from diffusion contact resistance is around 1% to 4.5%.

Fig. 14. (a) Wake-up time of header structure impacted by NBTI&PBTI and
(b) wake-up time of footer structure impacted by NBTI&PBTI.

Gate-poly contact resistances affect neither the trip voltage, because they are in series with the (ideally infinite) gate resistance, nor the RSNM, because they are not on the Read current paths.

RSNM decreases with increasing contact resistance, as shown in Fig. 15. When the contact resistance approaches 1 kΩ, RSNM degrades by about 6%. The reason is that R5, R7, and R9 form a series resistance chain that lowers the trip voltage. Although R3 raises the trip voltage, its effect is smaller than that of the R5/R7/R9 chain. Notice that R2 compensates for the Read disturb increase caused by R4, so the Read disturb voltage remains almost unchanged. Fig. 15 also shows that RSNM is not affected by an increase in gate-poly contact resistance, as discussed above.
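The RSNM definition and the normalization used in Figs. 15 and 17 can be written out directly. This is a minimal sketch. The voltages are hypothetical and were chosen only so that a lowered trip voltage, with an unchanged Read disturb voltage, reproduces a 6% degradation of the kind quoted above. They are not simulation results from the paper.

```python
def rsnm(v_trip, v_disturb):
    """RSNM as defined in the text: trip voltage of INV_1 minus Read disturb voltage."""
    return v_trip - v_disturb


def normalized(value, reference):
    """Normalize a metric to the no-contact-resistance reference case."""
    return value / reference


# Hypothetical voltages (V), not taken from the paper's simulations.
rsnm_ref = rsnm(v_trip=0.45, v_disturb=0.15)  # no contact resistance
rsnm_rc = rsnm(v_trip=0.432, v_disturb=0.15)  # trip voltage lowered by R5/R7/R9
ratio = normalized(rsnm_rc, rsnm_ref)
print(f"normalized RSNM = {ratio:.2f}, degradation = {(1 - ratio) * 100:.1f}%")
```

The disturb voltage is held fixed here, reflecting the observation that R2 offsets R4.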
Notice that when the diffusion contact resistance increases, Read delay becomes longer, as shown in Fig. 16. This is due to the increased R2 and R4 on the Read current (bit-line discharge) path. As a result, the discharge time of BLB increases with increasing diffusion contact resistance. Moreover, Read delay is insensitive to the gate-poly contact resistances because they are not on the Read current path of an SRAM cell.
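A constant-current approximation shows why only the diffusion contacts on the discharge path (R2 and R4) enter the Read delay. This is a hypothetical first-order model, not the paper's SPICE setup: the cell current is taken as v_bl divided by the series resistance of M6, R2, M4, and R4. That current discharges a bit-line capacitance c_bl by delta_v. All names and values are illustrative.

```python
def read_delay(c_bl, delta_v, v_bl, r_pg, r_pd, r2=0.0, r4=0.0):
    """Time for the cell current to discharge BLB by delta_v.

    Hypothetical first-order model: a constant cell current
    v_bl / (R_M6 + R2 + R_M4 + R4) discharges the bit-line capacitance c_bl.
    Gate-poly contacts do not appear because they carry no Read current.
    """
    i_cell = v_bl / (r_pg + r2 + r_pd + r4)
    return c_bl * delta_v / i_cell


t0 = read_delay(50e-15, 0.1, 0.9, r_pg=10e3, r_pd=5e3)
t1 = read_delay(50e-15, 0.1, 0.9, r_pg=10e3, r_pd=5e3, r2=250.0, r4=250.0)
print(f"normalized read delay = {t1 / t0:.4f}")  # prints 1.0333
```

In this model the normalized delay is simply the ratio of total path resistances, (R_path + R2 + R4) / R_path.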
When NBTI and PBTI are considered, in the worst case, the threshold-voltage magnitudes |Vth| of M1 and M4 increase while those of M2 and M3 remain unchanged. Because the access transistors, M5 and M6, are stressed


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 7, JULY 2011

Fig. 17. Normalized RSNM under NBTI and PBTI versus contact resistance.
RSNM is normalized with respect to the case with no contact resistance and no
NBTI/PBTI stress. RSNM degradation at 32 nm node caused by the combined
effects of NBTI/PBTI and diffusion contact resistance is around 23% to 26%.

Fig. 18. Normalized read delay under NBTI and PBTI versus contact resistance. Read delay is normalized with respect to the case with no contact
resistance and no NBTI/PBTI stress. Read delay degradation at 32 nm node
caused by the combined effects of NBTI/PBTI and diffusion contact resistance
is around 20% to 24%.

only while WL is turned on, the Vth drifts of the access transistors are negligible. The Vth drift of M1 lowers the trip point of INV_1, and the Vth drift of M4 increases the Read disturb voltage, resulting in RSNM degradation with usage time. Using the ac Reaction-Diffusion model, the Vth drifts of M1 and M4 induced by NBTI and PBTI for 10 s are calculated to be 110 and 125 mV, respectively, in the worst case. This leads to an RSNM degradation of about 22% without considering the contact resistance effect, as shown in Fig. 17. Fig. 17 also shows that RSNM degradation becomes more serious when the cell is impacted by both NBTI/PBTI and the diffusion contact resistance. Furthermore, the Read delay increases when the cell is impacted by NBTI and PBTI, according to Fig. 18. The reason is that M4 is on the BLB discharging path, so a larger Vth of M4 leads to a longer Read delay. Fig. 18 also shows that NBTI/PBTI and larger diffusion contact resistance degrade SRAM Read performance cumulatively.
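Reaction-diffusion models predict a power-law growth of the Vth drift with stress time. The sketch below rescales a known drift to another stress time under that assumption. The exponent n = 1/6 is the commonly quoted long-term value for hydrogen-diffusion-limited R-D and is an assumption here, as are the function name and reference time. The paper's own ac R-D parameters, including duty-cycle and recovery effects, are not reproduced.

```python
def scale_vth_drift(drift_ref, t_ref, t, n=1.0 / 6.0):
    """Scale a known Vth drift to another stress time with a power law.

    Assumes the simplified long-term reaction-diffusion form
    dVth(t) = A * t**n, so dVth(t) = drift_ref * (t / t_ref)**n.
    The default n = 1/6 is an assumption, not a parameter from the paper.
    """
    return drift_ref * (t / t_ref) ** n


# Hypothetical: a 110 mV drift at some reference time, re-evaluated at 64x that time.
print(f"{scale_vth_drift(110e-3, 1.0, 64.0) * 1e3:.1f} mV")  # 64**(1/6) = 2, about 220 mV
```

The weak time dependence (a 64-fold longer stress only doubles the drift) is why lifetime drifts of around 100 mV dominate over short-term variations.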
B. Write Operation
Referring to Fig. 14, larger R5, R7, and R9 reduce the holding
strength of pMOS M1, thus facilitating pull-down of the storage

Fig. 19. Normalized WM versus contact resistance. WM is normalized with respect to the case with no contact resistance. WM improvement at 32 nm node caused by diffusion contact resistance is around 0.3% to 1.2%.

Fig. 20. Normalized WM under NBTI and PBTI versus contact resistance. WM
is normalized with respect to the case with no contact resistance and no NBTI/
PBTI stress. WM degradation at 32 nm node caused by the combined effects of
NBTI/PBTI and diffusion contact resistance is around 10.3% to 10.1%.

node Q. Larger R1 impedes BL from discharging Q through M5. Larger R6, R8, and R10 impede M2 from charging up node QB, and larger R2 also prevents BLB from charging up QB through M6. Thus, larger R5, R7, and R9 improve WM, while larger R1, R2, R6, R8, and R10 degrade WM. Nevertheless, charging up QB is a second-order effect during Write, and WM is mainly impacted by R1, R5, R7, and R9. As shown in Fig. 19, WM is improved by larger diffusion contact resistance but is relatively insensitive to the gate-poly contact resistance, because gate-poly contacts are not on the access paths of Q and QB. When the diffusion contact resistance approaches 1 kΩ, WM improves by about 2.5%.
When the cell is also impacted by NBTI and PBTI, in the worst case, the |Vth| of M2 and M3 increases, while that of M1 and M4 remains unchanged. The Vth drifts of M5 and M6 are negligible. The weakened M2 slows down the charging of QB, and the weakened M3 slightly impedes the discharging of Q. Consequently, the WM of the SRAM cell under NBTI and PBTI degrades in the worst case. Fig. 20 shows the relation between WM and contact resistance when the Vth drifts of M2 and M3 are 110 and 125 mV, respectively. As can be seen, WM degrades by about 10% due to NBTI and PBTI.


Fig. 21. Normalized write delay versus contact resistance. Write delay is normalized with respect to the case with no contact resistance. Write delay degradation at 32 nm node by diffusion contact resistance is around 0.6% to 3.4%.

Fig. 23. Active mode VVDD of a header power-gating structure. Active mode
VVDD degradation at 32 nm node by diffusion contact resistance is around 1.03
to 5.06 mV.

Fig. 22. Normalized write delay under NBTI and PBTI versus contact
resistance, write delay is normalized with respect to the case with no contact
resistance and no NBTI/PBTI stress. Write delay degradation at 32 nm node
caused by the combined effects of NBTI/PBTI and diffusion contact resistance
is around 6.62% to 9.65%.

Fig. 24. Active mode VVSS of a footer power-gating structure. Active mode
VVSS degradation at 32 nm node by diffusion contact resistance is around 1.11
to 2.74 mV.

In contrast with RSNM, larger diffusion contact resistance improves WM slightly (about 0.5%), as the current charging QB is limited by M2 under the NBTI effect.
Write delay is defined as the latency between the time WL rises to half VDD and the time Q and QB cross each other. Write delay normally tracks WM, and a better (higher) WM generally improves Write delay. However, Write delay is also affected by the RC time constant, and larger diffusion contact resistances lead to longer Write delay, as shown in Fig. 21. Additionally, Fig. 22 shows the relation between Write delay and contact resistance when the cell is under NBTI and PBTI stress. The Write delay degrades by about 6% with NBTI and PBTI, and it increases sharply when the diffusion contact resistance exceeds 100 Ω.
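The Write delay definition above can be turned into a measurement on sampled transient waveforms. This is a minimal sketch in which linear interpolation locates the crossings. The helper names and the toy waveform values are hypothetical. In practice, the arrays would come from a circuit simulator's output.

```python
def first_crossing(t, y, level, start=0):
    """Linearly interpolated time at which y first crosses `level`.

    Scans the segments (t[i-1], t[i]) for i >= max(start, 1); returns None
    if no crossing is found.
    """
    for i in range(max(start, 1), len(t)):
        y0, y1 = y[i - 1] - level, y[i] - level
        if y0 == 0:
            return t[i - 1]
        if y0 * y1 < 0 or y1 == 0:
            return t[i - 1] + (t[i] - t[i - 1]) * y0 / (y0 - y1)
    return None


def write_delay(t, wl, q, qb, vdd):
    """Write delay: from WL reaching VDD/2 until Q and QB cross."""
    t_wl = first_crossing(t, wl, vdd / 2)
    if t_wl is None:
        return None
    # Begin the Q/QB search at the segment that contains t_wl.
    start = next(i for i, ti in enumerate(t) if ti >= t_wl)
    diff = [a - b for a, b in zip(q, qb)]
    t_cross = first_crossing(t, diff, 0.0, start)
    return None if t_cross is None else t_cross - t_wl


# Toy waveforms (time in ns, voltages in V); hypothetical values.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
wl = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
q = [1.0, 1.0, 1.0, 0.8, 0.4, 0.0]
qb = [0.0, 0.0, 0.0, 0.2, 0.6, 1.0]
print(f"{write_delay(t, wl, q, qb, vdd=1.0):.2f} ns")  # prints 2.25 ns
```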
C. SRAM Power-Gating Structure
In power-gated SRAM, when diffusion contact resistances increase, the equivalent resistance between VVDD and VDD (header-gated structure) or between VVSS and VSS (footer-gated structure) also increases. This lowers VVDD in a header-gated structure and raises VVSS in a footer-gated structure, as shown in Figs. 23 and 24, respectively. Consequently, the voltage across the SRAM array is reduced, and RSNM degrades while WM improves. On the other hand, larger diffusion contact resistances reduce the leakage during Standby mode. They also reduce the Standby VVDD of the header-gated structure and increase the Standby VVSS of the footer-gated structure, as shown in Figs. 25 and 26, respectively. However, the changes in virtual supply/GND voltage during Standby mode are smaller than those during Active mode, since the current flowing through the SRAM during Standby is significantly smaller than that during Active mode.
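The Active-versus-Standby difference follows from a simple IR-drop view of the gating path. This sketch treats the power switch and the diffusion contacts as series resistors carrying the array current. It holds that current fixed, so it ignores the leakage reduction that larger contact resistance also produces. The function name and all values are hypothetical.

```python
def gated_rail_shift(i_array, r_switch, r_contact):
    """IR drop across a power-gating path (switch + diffusion contacts).

    In a header design this lowers VVDD below VDD; in a footer design it
    raises VVSS above VSS. Currents are held fixed here, which ignores the
    leakage reduction that larger contact resistance also causes.
    """
    return i_array * (r_switch + r_contact)


VDD = 0.9
# Hypothetical currents (A) and resistances (ohm).
for mode, current in (("active", 10e-3), ("standby", 0.5e-3)):
    shift = gated_rail_shift(current, r_switch=2.0, r_contact=0.5)
    print(f"{mode}: VVDD = {VDD - shift:.4f} V (header), VVSS = {shift:.5f} V (footer)")
```

Because the shift scales with current, the much smaller Standby current yields a proportionally smaller rail shift for the same contact resistance.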
When the power switch turns on during the wake-up transition, a large wake-up current flows through the package parasitic capacitors, inductors, and resistors, resulting in VVDD bounce in the header-gated structure or VVSS bounce in the footer-gated structure. As the diffusion contact resistance increases, the wake-up current decreases and the virtual supply/GND bounce is mitigated. However, because of the reduced wake-up current, the wake-up time becomes longer.
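The current-versus-time trade-off above can be seen in a first-order RC picture of the virtual rail charging through the power switch and contacts. This sketch deliberately ignores the package inductance, so it shows the trend rather than the bounce waveform. The function name, rail capacitance, standby voltage, and 95% settling target are hypothetical.

```python
import math


def wakeup(vdd, v_standby, r_eq, c_rail, target_frac=0.95):
    """First-order RC estimate of wake-up for a header-gated virtual rail.

    Hypothetical model: the rail capacitance c_rail charges from v_standby
    toward vdd through r_eq (power switch plus diffusion contacts). Package
    inductance is ignored, so this captures the trend, not the bounce shape.
    Returns (peak inrush current, time to reach target_frac * vdd).
    """
    v_target = target_frac * vdd
    if not v_standby < v_target < vdd:
        raise ValueError("need v_standby < target_frac * vdd < vdd")
    i_peak = (vdd - v_standby) / r_eq
    t_wake = r_eq * c_rail * math.log((vdd - v_standby) / (vdd - v_target))
    return i_peak, t_wake


for r in (1.0, 2.0):  # hypothetical r_eq values in ohms
    i_pk, t_w = wakeup(vdd=0.9, v_standby=0.3, r_eq=r, c_rail=1e-9)
    print(f"r_eq = {r} ohm: peak current {i_pk:.2f} A, wake-up {t_w * 1e9:.2f} ns")
```

Doubling r_eq halves the peak inrush current, which reduces bounce, but doubles the wake-up time. A higher Standby VVDD shortens the wake-up time in the same model.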
V. NBTI/PBTI ON SRAM SENSING STRUCTURE
Two commonly used differential sense amplifier structures, AMP_A [see Fig. 27(a)] and AMP_B [see Fig. 27(b)], are compared. These two amplifiers would have similar performance if


Fig. 25. Standby mode VVDD of header power-gating structure. Standby mode VVDD degradation at 32 nm node by diffusion contact resistance is around 0.15 to 0.61 mV.

Fig. 27. Two commonly used differential sensing amplifier structures: (a) AMP_A and (b) AMP_B.

Fig. 26. Standby mode VVSS of footer power-gating structure. Standby mode VVSS degradation at 32 nm node by diffusion contact resistance is around 0.08 to 0.5 mV.

Fig. 28. Vth drifts of sense amplifiers due to NBTI and PBTI.

they were not impacted by NBTI and/or PBTI. In reality, the pMOS pair and nMOS pair of AMP_A's latch are under NBTI and PBTI stress, respectively, all the time, even when AMP_A is in Standby mode. In contrast, when AMP_B is in Standby, the signal SAE turns off, so the VGS of the pMOS pair and nMOS pair is zero and they suffer no NBTI and PBTI stress, respectively. The pMOS pair and nMOS pair of AMP_B are stressed only when it senses data. Therefore, the pMOS and nMOS degradation in AMP_A is much more serious than in AMP_B (see Fig. 28). Moreover, if the signal (stress) probabilities of the amplifier are not close to 50% (50%), serious Vth mismatch can occur in AMP_A. As a result, the performance degradation of AMP_A is much more significant than that of AMP_B.

Recently, large-signal single-ended sensing schemes have become popular due to their tolerance of process variation. NAND gates [15], [16], inverters [17], and PFET pull-ups [18] are often used in single-ended sensing structures. Because BL pairs are precharged to VDD, the sensing PFETs of these single-ended schemes are not stressed during Standby and suffer NBTI stress only during SRAM Read cycles. Furthermore, single-ended sensing schemes do not have the Vth mismatch problem that plagues small-signal differential amplifiers. Thus, their long-term degradation is very small. Hence, large-signal single-ended sensing schemes have better NBTI/PBTI tolerance than differential sensing schemes.
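The link between stress probability and latch mismatch can be illustrated with a crude model. Each device of a cross-coupled pair drifts as a power law of its effective stress time. One device is stressed a fraction p of the time and its partner 1 - p, and an active_fraction factor scales both for an amplifier that is biased only while sensing. This is an assumption-laden sketch (recovery is ignored, and the prefactor, exponent, and times are hypothetical), not the model used to produce Fig. 28.

```python
def pair_mismatch(a, t, p, n=1.0 / 6.0, active_fraction=1.0):
    """Vth mismatch of a cross-coupled latch pair under asymmetric stress.

    Crude, hypothetical model: one device of the pair is stressed a fraction p
    of the time and its partner (1 - p); each drift follows
    a * (effective stress time) ** n, ignoring recovery. active_fraction is
    the share of time the amplifier is powered (1.0 for an always-biased
    latch such as AMP_A, small for an SAE-gated latch such as AMP_B).
    """
    d1 = a * (active_fraction * p * t) ** n
    d2 = a * (active_fraction * (1.0 - p) * t) ** n
    return abs(d1 - d2)


A, T = 1e-3, 1e8  # hypothetical prefactor (V) and stress time (s)
for p in (0.5, 0.7, 0.9):
    always_on = pair_mismatch(A, T, p)
    gated = pair_mismatch(A, T, p, active_fraction=0.01)
    print(f"p = {p}: AMP_A-like {always_on * 1e3:.2f} mV, AMP_B-like {gated * 1e3:.2f} mV")
```

The mismatch vanishes at p = 0.5 and grows as p departs from 50%. Gating the bias (AMP_B-like) shrinks it by the factor active_fraction**n.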
VI. CONCLUSION
In this paper, we presented a comprehensive analysis of the impacts of NBTI and PBTI on power-gated SRAM stability, margin, and performance based on the BSIM 32 nm high-k metal-gate Predictive Model. The combined effects of NBTI and PBTI with contact resistance on the SRAM array were also investigated. A differential sensing scheme and a large-signal single-ended sensing structure were also compared when NBTI/PBTI were considered.
We showed that the header/footer structure played an important role in determining VVDD and VVSS. In the worst case, contact resistance and NBTI/PBTI jointly degraded SRAM RSNM and Read performance. Nevertheless, if an SRAM cell suffered PBTI only, the impact on RSNM would depend on the signal (stress) probability. When both NBTI and PBTI were present and both NFET driving transistors were stressed, the RSNM degradation induced by NBTI could be partially mitigated by PBTI. The WM could improve or degrade, depending on the presence of NBTI, PBTI, or both, and on the signal (stress) probability. With contact resistance, the Write delay increased
but WM improved. The SRAM active power and standby/sleep power decreased with stress time. After the power switch was stressed, the virtual supply bounce decreased. The wake-up time increased if only the power switch was stressed, due to the higher Vth of the power switch, and decreased if the SRAM array was also stressed, due to a higher Standby/Sleep mode VVDD or a lower Standby/Sleep mode VVSS. When contact resistance impacts were also considered, virtual supply/GND bounce during the wake-up transition was reduced with increasing diffusion contact resistance, but the wake-up time became longer. Finally, we showed that by judiciously choosing the sense amplifier structure, the performance degradation induced by NBTI and PBTI could be significantly reduced.
REFERENCES
[1] S. Zafar, Y. H. Kim, V. Narayanan, C. Cabral, V. Paruchuri, B. Doris, J. Stathis, A. Callegari, and M. Chudzik, "A comparative study of NBTI and PBTI (charge trapping) in SiO2/HfO2 stacks with FUSI, TiN, Re gates," in IEEE Symp. VLSI Technol. Dig. Tech. Papers, 2006, pp. 23–25.
[2] F. Babarada, M. D. Profirescu, and C. Dunare, "MOSFET distorsion analysis including series resistance modelling aspects," in Proc. IEEE Int. Semicond. Conf., Oct. 2004, pp. 307–310.
[3] L. Wei, J. Deng, L. Chang, K. Kim, C.-T. Chuang, and H.-S. P. Wong, "Selective device structure scaling and parasitic engineering: A way to extend the technology roadmap," IEEE Trans. Electron Devices, vol. 56, no. 2, pp. 312–330, Feb. 2009.
[4] F. Hamzaoglu, K. Zhang, Y. Wang, H. J. Ahn, U. Bhattacharya, Z. Chen, Y.-G. Ng, A. Pavlov, K. Smits, and M. Bohr, "A 3.8 GHz 153 Mb SRAM design with dynamic stability enhancement and leakage reduction in 45 nm high-k metal gate CMOS technology," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 148–154, Jan. 2009.
[5] Y. Wang, H. J. Ahn, U. Bhattacharya, Z. Chen, T. Coan, F. Hamzaoglu, W. M. Hafez, C.-H. Jan, P. Kolar, S. H. Kulkarni, J.-F. Lin, Y.-G. Ng, I. Post, L. Wei, Y. Zhang, K. Zhang, and M. Bohr, "A 1.1 GHz 12 μA/Mb-leakage SRAM design in 65 nm ultra-low-power CMOS technology with integrated leakage reduction for mobile applications," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 172–179, Jan. 2008.
[6] M. Sharifkhani and M. Sachdev, "Segmented virtual ground architecture for low-power embedded SRAM," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 2, pp. 196–205, Feb. 2007.
[7] Y. Wang, U. Bhattacharya, F. Hamzaoglu, P. Kolar, Y. Ng, L. Wei, Y. Zhang, K. Zhang, and M. Bohr, "A 4.0 GHz 291 Mb voltage-scalable SRAM design in 32 nm high-k metal-gate CMOS with integrated power management," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2009, pp. 456–458.
[8] K. Kang, H. Kufluoglu, K. Roy, and M. A. Alam, "Impact of negative-bias temperature instability in nanoscale SRAM array: Modeling and analysis," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 26, no. 10, pp. 1770–1781, Oct. 2007.
[9] A. Bansal, R. Rao, J.-J. Kim, S. Zafar, J. H. Stathis, and C.-T. Chuang, "Impacts of NBTI and PBTI on SRAM static/dynamic noise margins and cell failure probability," Microelectron. Reliab., vol. 49, no. 6, pp. 642–649, Jun. 2009.
[10] J. C. Lin, A. S. Oates, and C. H. Yu, "Time dependent Vccmin degradation of SRAM fabricated with high-k gate dielectrics," in Proc. IEEE Int. Reliab. Phys. Symp., 2007, pp. 439–444.
[11] R. Vattikonda, W. Wang, and Y. Cao, "Modeling and minimization of PMOS NBTI effect for robust nanometer design," in Proc. IEEE Des. Autom. Conf., 2006, pp. 1047–1052.
[12] S. Bhardwaj, W. Wang, R. Vattikonda, Y. Cao, and S. Vrudhula, "Predictive modeling of the NBTI effect for reliable design," in Proc. IEEE Custom Integr. Circuits Conf., 2006, pp. 189–192.
[13] R. Fernandez, B. Kaczer, A. Nackaerts, S. Demuynck, R. Rodriguez, M. Nafría, and G. Groeseneken, "AC NBTI studied in the 1 Hz–2 GHz range on dedicated on-chip CMOS circuits," in Proc. IEEE Int. Electron Devices Meet., Dec. 2006, pp. 1–4.
[14] S. Kim, S. V. Kosonocky, and D. R. Knebel, "Understanding and minimizing ground bounce during mode transition of power gating structures," in Proc. Int. Symp. Low Power Electron. Des., 2003, pp. 22–25.
[15] A. Bhavnagarwala, S. Kosonocky, Y. Chan, K. Stawiasz, U. Srinivasan, S. Kowalczyk, and M. Ziegler, "A Sub-600 mV fluctuation tolerant 65 nm CMOS SRAM array with dynamic cell biasing," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, 2007, pp. 78–79.
[16] J. Pille, C. Adams, T. Christensen, S. Cottier, S. Ehrenreich, F. Kono, D. Nelson, O. Takahashi, S. Tokito, O. Torreiter, O. Wagner, and D. Wendel, "Implementation of the CELL broadband engine in a 65 nm SOI technology featuring dual-supply SRAM arrays supporting 6 GHz at 1.3 V," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2007, pp. 322–606.
[17] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, "A stable 2-port SRAM cell design against simultaneously read/write-disturbed accesses," IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 2109–2119, Sep. 2008.
[18] R. Joshi, R. Houle, D. Rodko, P. Patel, W. Huott, R. Franch, Y. Chan, D. Plass, S. Wilson, S. Wu, and R. Kanj, "A high performance 2.4 Mb L1 and L2 cache compatible 45 nm SRAM with yield improvement capabilities," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, 2008, pp. 208–209.

Hao-I Yang (S'09) received the B.S. and M.S. degrees in electrical engineering from National Cheng Kung University, Tainan, Taiwan, in 2003 and 2005, respectively. He is currently pursuing the Ph.D. degree in electronic engineering at National Chiao Tung University, Hsinchu, Taiwan.

Wei Hwang (S'68–M'69–SM'90–F'01) received the B.Sc. degree from National Cheng Kung University, Tainan, Taiwan, the M.Sc. degree from National Chiao Tung University, Hsinchu, Taiwan, and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Manitoba, Winnipeg, MB, Canada, in 1970 and 1974, respectively.
From 1975 to 1978, he was an Assistant Professor with the Department of Electrical Engineering, Concordia University, Montreal, QC, Canada. From 1979 to 1984, he was an Associate Professor with the Department of Electrical Engineering, Columbia University, New York, NY. From 1984 to 2002, he was a Research Staff Member with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, where he worked on high-performance DRAM and microprocessor design. In 2002, he joined National Chiao Tung University (NCTU), Hsinchu, Taiwan, as the Director of the Microelectronics and Information Systems Research Center, a position he held until 2008. Currently, he holds a Chair Professorship with the Department of Electronics Engineering, where he is engaged in teaching and research on circuit technology for ultra-low power, memory-centric NoC design, and 3-D integration technology and systems. During 2003–2007, he served as Co-Principal Investigator of the National System-on-Chip (NSoC) Program in Taiwan. From 2005 to 2007, he also served as a Senior Vice President and Acting President of NCTU, respectively. He is the coauthor of the book Electrical Transports in Solids: With Particular Reference to Organic Semiconductors (Pergamon Press, 1981), which has been translated into Russian and Chinese. He has authored or coauthored over 200 technical papers in renowned international journals and conferences, and holds over 150 international patents (including 65 U.S. patents).
Prof. Hwang was a recipient of several IBM Awards, including 16 IBM Invention Plateau Invention Achievement Awards and four IBM Research Division Technical Awards, as well as the CIEE Outstanding Electrical Engineering Professor Award in 2004 and the Outstanding Scholar Award from the Foundation for the Advancement of Outstanding Scholarship for 2005 to 2010. He was named an IBM Master Inventor. He was President, Board Director, and Chairman of the Board of Directors of the Chinese American Academic and Professional Society (CAAPS) from 1986 to 1999. He is a member of the New York Academy of Science, Sigma Xi, and the Phi Tau Phi Society. He has served several times on the Technical Program Committees of ISLPED, SOCC, and A-SSCC. He served as the General Chair of the 2007 IEEE SoC Conference (SOCC 2007) and the General Chair of the 2007 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT 2007). Currently, he is serving as Founding Director of the Center for Advanced Information Systems and Electronics Research (CAISER) of the University System of Taiwan (UST) and Director of the ITRI and NCTU Joint Research Center. He is also serving as a Supervisor of the IEEE Taipei Section.

Ching-Te Chuang (S'78–M'82–SM'91–F'94) received the B.S.E.E. degree from National Taiwan University, Taipei, Taiwan, in 1975 and the Ph.D. degree in electrical engineering from the University of California, Berkeley, in 1982.
From 1977 to 1982, he was a Research Assistant with the Electronics Research Laboratory, University of California, Berkeley, where he worked on bulk and surface acoustic wave devices. He joined the IBM T. J. Watson Research Center, Yorktown Heights, NY, in 1982. From 1982 to 1986, he worked on scaled bipolar devices, technology, and circuits. He studied the scaling properties of epitaxial Schottky barrier diodes, did pioneering work on the perimeter effects of advanced double-poly self-aligned bipolar transistors, and designed the first sub-nanosecond 5-kb bipolar ECL SRAM. From 1986 to 1988, he was Manager of the Bipolar VLSI Design Group, working on low-power bipolar circuits, high-speed high-density bipolar SRAMs, multi-Gb/s fiber-optic data-link circuits, and scaling issues for bipolar/BiCMOS devices and circuits. Since 1988, he has managed the High Performance Circuit Group, investigating high-performance logic and memory circuits. Since 1993, his group has been primarily responsible for the circuit design of IBM's high-performance CMOS microprocessors for enterprise servers, PowerPC workstations, and game/media processors. Since 1996, he has been leading the efforts in evaluating and exploring scaled/emerging technologies, such as PD/SOI, UTB/SOI, strained-Si devices, hybrid orientation technology, and multi-gate/FinFET devices, for high-performance logic and SRAM applications. Since 1998, he has been responsible for the Research VLSI Technology Circuit Co-design strategy and execution. His group has also been very active and visible in leakage/variation/degradation-tolerant circuit and SRAM design techniques. He took early retirement from IBM to join National Chiao-Tung University, Hsinchu, Taiwan, as a Chair Professor in the Department of Electronics Engineering in February 2008. He is currently the Director of the Intelligent Memory and SoC Laboratory at National Chiao-Tung University. He has authored many invited papers in international journals such as International Journal of High Speed Electronics, PROCEEDINGS OF THE IEEE, IEEE CIRCUITS AND DEVICES MAGAZINE, and Microelectronics Journal. He holds 31 U.S. patents with another 11 pending. He has authored or coauthored over 290 papers.
Dr. Chuang was a recipient of an Outstanding Technical Achievement Award, a Research Division Outstanding Contribution Award, 5 Research Division Awards, and 12 Invention Achievement Awards from IBM, and the Outstanding Scholar Award from Taiwan's Foundation for the Advancement of Outstanding Scholarship for 2008 to 2013. He was the co-recipient of the Best Paper Award at the 2000 IEEE International SOI Conference. He served on the Device Technology Program Committee for IEDM in 1986 and 1987, and the Program Committee for the Symposium on VLSI Circuits from 1992 to 2006. He was the Publication/Publicity Chairman for the Symposium on VLSI Technology and the Symposium on VLSI Circuits in 1993 and 1994, and the Best Student Paper Award Sub-Committee Chairman for the Symposium on VLSI Circuits from 2004 to 2006. He was elected an IEEE Fellow in 1994 for "contributions to high-performance bipolar devices, circuits, and technology." He has presented numerous plenary, invited, or tutorial papers/talks at international conferences such as the International SOI Conference, DAC, VLSI-TSA, ISSCC Microprocessor Design Workshop, VLSI Circuit Symposium Short Course, ISQED, ICCAD, APMC, VLSI-DAT, ISCAS, MTDT, WSEAS, and VLSI Design/CAD Symposium.

IEEE TRANSACTIONS ON ELECTRON DEVICES, VOL. 57, NO. 11, NOVEMBER 2010

2785

FinFET SRAM Optimization With Fin Thickness and Surface Orientation
Mingu Kang, S. C. Song, S. H. Woo, H. K. Park, Student Member, IEEE, M. H. Abu-Rahma, L. Ge, B. M. Han,
J. Wang, G. Yeap, and S. O. Jung, Senior Member, IEEE

Abstract: In this paper, the design space, including fin thickness (Tfin), fin height (Hfin), fin ratio of bit-cell transistors, and surface orientation, is explored to optimize the stability, leakage current, array dynamic energy, and read/write delay of the FinFET SRAM under layout area constraints. The simulation results, which consider the variations of both Tfin and threshold voltage (Vth), show that most FinFET SRAM configurations achieve a superior read/write noise margin when compared with planar SRAMs. However, when two fins are used as pass-gate transistors (PG) in FinFET SRAMs, enormous array dynamic energy is required due to the increased effective gate and drain capacitance. On the other hand, a FinFET SRAM with a one-fin PG in the (110) plane shows a smaller write noise margin than the planar SRAM. Thus, the one-fin PG in the (100) plane is suitable for FinFET SRAM design. The one-fin PG FinFET SRAM with Tfin = 10 nm and Hfin = 40 nm in the (100) plane achieves a three times larger noise margin when compared with the planar SRAM and consumes 17% less bit-line toggling array energy at the cost of 22% more word-line toggling energy. It also achieves a 2.3 times smaller read delay and a 30% smaller write delay when compared with the planar SRAM.
Index Terms: Cell current, FinFET, leakage current, read stability, SRAM, surface orientation, write stability.

I. INTRODUCTION

FinFET technology is one of the leading candidates for an alternative device structure to replace planar CMOS in the ultradeep-submicrometer regime. The FinFET has superior scalability due to the stronger electrostatic control of the channel, which suppresses the short-channel effect (SCE) [1], [2]. The FinFET also has a potentially higher layout density, owing to its vertical structure, and a higher ON current than planar devices [3]. In addition, the thin body of a double-gate device is typically undoped or lightly doped. Thus, random dopant fluctuation (RDF) is significantly decreased, which reduces the threshold voltage (Vth) variation [4], [5]. The SRAM bit-cell is considered the first functional block in a system on a chip to be implemented using the FinFET, because the critical issues of SRAM bit-cell scaling, such as the demand for continuous bit-cell size scaling and electrical stability problems, can be resolved [6]. Thus, the optimal design of the SRAM bit-cell with the FinFET is analyzed in this paper.

Manuscript received February 19, 2010; revised July 23, 2010; accepted July 23, 2010. Date of publication August 30, 2010; date of current version November 5, 2010. The review of this paper was arranged by Editor C. Jungemann.
M. Kang was with the School of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, Korea. He is now with the Memory Division, Samsung Electronics, Hwaseong 445-701, Korea.
S. C. Song, M. H. Abu-Rahma, L. Ge, B. M. Han, J. Wang, and G. Yeap are with Qualcomm Inc., San Diego, CA 92121 USA.
S. H. Woo, H. K. Park, and S. O. Jung are with the School of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, Korea (e-mail: sjung@yonsei.ac.kr).
Digital Object Identifier 10.1109/TED.2010.2065170
For standard planar CMOS, (100) silicon substrates have generally been used because electron mobility is higher in the (100) plane than in the (110) plane. However, hole mobility in the (100) plane is lower than that in the (110) plane. In planar device technology, devices with a (110) surface orientation have to be fabricated on silicon substrates with a (110) crystalline orientation, which are not generally used. However, for the FinFET, both the (100) and (110) orientations can be achieved on a (100) wafer because of its vertical structure. As shown in Fig. 1, a (110)-oriented FinFET can be obtained simply by rotating the transistor layout by 45° in the plane of a (100) wafer [7]. However, there is an inevitable area penalty for multiorientation, where both (100)- and (110)-oriented FinFETs are used on the same wafer, because the angle between the (100)- and (110)-oriented FinFETs has to be 45° [8]. In addition, multiorientation is not practical due to its complex fabrication process. Thus, single-oriented FinFET SRAM designs (all (110)- or all (100)-oriented FinFETs on a (100) wafer) are considered rather than multioriented ones. The p-n mobility ratios used in this paper are chosen to represent the general characteristics of the (100) and (110) orientations, considering the sensitivity to a modest amount of process-induced strain, based on [7] and [9]. The differences between the (100) and (110) results in this paper are caused mainly by the p-n mobility ratio. This ratio depends largely on the surface orientation but can also be affected by other factors, such as materials and process-induced strain. Thus, the models for the (100) and (110) orientations in this paper are one possible choice for showing the general trend between the orientations rather than a representation of their absolute characteristics.
The Tfin variation of the FinFET is one of the major sources of Vth variation, along with RDF. Thus, both RDF and Tfin variation are considered for the FinFET SRAM. While reducing the fin thickness (Tfin) suppresses the SCE, fabricating a thin FinFET is challenging and increases the variation of Tfin, which results in a large Vth variation [10]. Thus, Tfin-dependent variations are applied for the FinFET SRAM.
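One common way to combine two variation sources is to sample them independently and add their Vth contributions. This is a minimal Monte Carlo sketch under stated assumptions: RDF and Tfin variation are taken as independent Gaussians, and Tfin enters Vth through a linear sensitivity k_tfin. All parameter values and names are hypothetical, and the paper's actual variation model is not reproduced. A thinner fin would be modeled by passing a larger sigma_tfin.

```python
import math
import random
import statistics


def sample_vth(vth0, sigma_rdf, k_tfin, sigma_tfin, n, seed=0):
    """Monte Carlo Vth samples with independent RDF and Tfin contributions.

    Hypothetical model: Vth = vth0 + N(0, sigma_rdf) + k_tfin * dTfin, with
    dTfin ~ N(0, sigma_tfin). A thinner fin would use a larger sigma_tfin.
    """
    rng = random.Random(seed)
    return [vth0 + rng.gauss(0.0, sigma_rdf) + k_tfin * rng.gauss(0.0, sigma_tfin)
            for _ in range(n)]


# Hypothetical values: sigma_rdf in V, k_tfin in V/nm, sigma_tfin in nm.
samples = sample_vth(0.3, 0.010, 0.005, 1.0, n=100_000)
analytic = math.sqrt(0.010 ** 2 + (0.005 * 1.0) ** 2)
print(f"sampled sigma = {statistics.pstdev(samples) * 1e3:.2f} mV, "
      f"analytic = {analytic * 1e3:.2f} mV")
```

For independent sources, the sampled spread should agree with the root-sum-square value sqrt(sigma_rdf^2 + (k_tfin * sigma_tfin)^2).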
Because of the vertical nature of the device structure, the FinFET can achieve a higher effective channel width (hence, a higher driving strength) per unit planar area than a planar device by increasing the fin height (Hfin). However, Hfin is limited by Tfin, the source/drain implantation angle, and the planar area when multiple fins are used. Therefore, determining the appropriate combination of Hfin and Tfin is critical to optimizing the FinFET SRAM bit-cell under the same layout area constraints. In this paper, the top portion of the fin is also used for gate control.

0018-9383/$26.00 © 2010 IEEE

Fig. 1. (a) FinFET with different Tfin value. (b) Multioriented FinFETs on (100) wafer.

This paper is organized as follows. The planar and FinFET SRAM bit-cell designs are described in Section II. The static noise margins in various design spaces are presented in Section III. The SRAM bit-cell current and leakage currents are provided in Section IV. The read/write delay and dynamic array energy, including both the SRAM bit-cell and the read/write drivers, are described in Section V. The evaluation of the FinFET SRAMs is presented in Section VI. Finally, conclusions are drawn in Section VII. In this paper, a 32-nm foundry-compatible planar model parameter set and a Berkeley common gate FinFET model parameter set are used with a fixed 0.9-V supply voltage (VDD) for both the SRAM bit-cell and the logic area [11].

II. SRAM BIT-CELL DESIGN

Since Tfin fluctuation in the FinFET results in Vth variation, it can cause SRAM device mismatch. Furthermore, the saturation current (Idsat) per unit width decreases as Tfin is reduced, due to mobility degradation by scattering [12]. Thus, the characteristics of the FinFET SRAM need to be evaluated with the Tfin variation. In a planar SRAM bit-cell, the width of a transistor is the same as the horizontal dimension of the transistor in the layout, as shown in Fig. 2(a). However, the geometric effective width of the FinFET is calculated as (2Hfin + Tfin), since the top portion of the fin is used for gate control, and this differs from the horizontal dimension. Thus, the horizontal dimension of the transistors in the FinFET SRAM bit-cell is defined as a 2-D width, as shown in Fig. 2(b). Even though the top portion of the fin is used for gate control, the FinFET behaves as a double-gate device as Tfin scales, due to the diminished channel controllability at the top portion. As Tfin scales further, the FinFET suffers from more serious current crowding at the corners of the fin. Therefore, the electrical effective widths of FinFETs with the same geometric effective width can differ depending on the fin geometry.

Fig. 2. (a) Planar SRAM bit-cell layout. (b) FinFET SRAM bit-cell layout.
Under the same layout area constraints for the planar and
FinFET SRAM bit-cells, the stability, performance, and array
dynamic energy are analyzed for the planar SRAM and the
variant configurations of the FinFET SRAM. For the same layout area, the 2-D width of a pull-down transistor (PD) in the FinFET SRAM bit-cell is chosen to be the same as the width (70 nm) of a PD in the planar SRAM bit-cell. To stay within the area limitation, the maximum number of fins is limited to two. Therefore, the fin numbers (NU and ND) of a pull-up transistor (PU) and the PD are set to one and two, respectively, for a larger static noise margin. As shown in Fig. 2(b), if the fin number (NG) of a pass-gate transistor (PG) is not larger than ND, NG does not affect the bit-cell area, because the 2-D width of the PG is not larger than that of the PD [13]. Thus, NG values of one and two require no additional layout area.
KANG et al.: FinFET SRAM OPTIMIZATION

Fig. 3. FinFET diagram.

TABLE I
FinFET SRAM DESIGN SPACES

Fig. 4. (a) Idsat per unit width and (b) IOFF per unit width when gate length (Lg) = 35 nm and oxide thickness (Tox) = 1 nm.

When NU is one in the FinFET SRAM bit-cell, the 2-D width of the PU is determined by the minimum contact pad width (50 nm), which is similar to the PU width (45 nm) in the planar SRAM bit-cell [14]. On the other hand, as shown in Fig. 3, the 2-D width of the PD in the FinFET SRAM bit-cell is determined by the following equation:

$$\text{2-D width of PD} = 2\,\mathrm{MTP} + \mathrm{Fin\#}\cdot T_{\mathrm{fin}} + \tan(\theta)\,H_{\mathrm{fin}}\,(\mathrm{Fin\#} - 1) \tag{1}$$

where θ is the source/drain implantation angle, which depends on the process technology, and MTP is the minimum margin from the edge of the contact to the contact landing pad. The Hfin/Tfin ratio depends on technology constraints and is generally chosen to be between 1 and 5 [15]. With a 70-nm 2-D PD width in the FinFET SRAM bit-cell, and assuming MTP = 5 nm and θ = 45°, the following three combinations of Tfin and Hfin are obtained: (Tfin, Hfin) = (10 nm, 40 nm), (15 nm, 30 nm), and (20 nm, 20 nm). The FinFET SRAM design spaces according to surface orientation, fin ratio, and (Tfin, Hfin) combination are listed in Table I. To isolate the effect of the FinFET geometry, identical implant doping is assumed for all FinFET cases, and no separate implant step is used for the SRAM bit-cell.
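As a quick check of Eq. (1), the Python sketch below evaluates the 2-D PD width and the per-fin geometric effective width (2Hfin + Tfin) for the three (Tfin, Hfin) combinations. It uses the values stated above: MTP = 5 nm, θ = 45°, and two PD fins. The helper names (`pd_2d_width`, `h_fin_for`, `geometric_width`) are illustrative and do not come from the paper. `h_fin_for` inverts Eq. (1) for a 70-nm target width.

```python
import math

# Eq. (1): 2-D PD width = 2*MTP + Fin# * Tfin + tan(theta) * Hfin * (Fin# - 1)
# Defaults follow the text: MTP = 5 nm, theta = 45 degrees, Fin# = ND = 2.

def pd_2d_width(t_fin, h_fin, n_fin=2, mtp=5.0, theta_deg=45.0):
    """2-D layout width of the PD in nm, per Eq. (1)."""
    tan_theta = math.tan(math.radians(theta_deg))
    return 2 * mtp + n_fin * t_fin + tan_theta * h_fin * (n_fin - 1)

def h_fin_for(t_fin, target=70.0, n_fin=2, mtp=5.0, theta_deg=45.0):
    """Invert Eq. (1): fin height (nm) that makes the 2-D PD width equal `target`."""
    tan_theta = math.tan(math.radians(theta_deg))
    return (target - 2 * mtp - n_fin * t_fin) / (tan_theta * (n_fin - 1))

def geometric_width(t_fin, h_fin):
    """Geometric effective width of one fin in nm (the fin top is also gated)."""
    return 2 * h_fin + t_fin

combos = [(10, 40), (15, 30), (20, 20)]
for t_fin, h_fin in combos:
    print(f"(Tfin, Hfin) = ({t_fin}, {h_fin}) nm: "
          f"2-D PD width = {pd_2d_width(t_fin, h_fin):.1f} nm, "
          f"Hfin from Eq. (1) = {h_fin_for(t_fin):.1f} nm, "
          f"geometric width per fin = {geometric_width(t_fin, h_fin)} nm")
```

All three combinations give a 70-nm 2-D width. Their geometric widths come out to 90, 75, and 60 nm, which are the values quoted in Section III.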
III. SRAM BIT-CELL STABILITY

Fig. 5. Idsat–Vgs curves of planar and FinFET devices with Lg = 35 nm and Tox = 1 nm in the (100) plane.

Fig. 4 shows Idsat and the OFF current IOFF per unit width of the FinFET and planar devices. As shown in Fig. 4(a), the FinFETs with all three (Tfin, Hfin) combinations achieve a larger Idsat per unit width than the planar device. In the standard (100) plane, the Idsat of the NMOS is larger than that of the PMOS, whereas in the (110) plane the Idsat of the PMOS is larger than that of the NMOS. Fig. 5 shows the Idsat–Vgs curves of the NMOS and PMOS for the three (Tfin, Hfin) combinations in the (100) plane. Vth is higher in a narrower fin, because better short-channel-effect (SCE) control suppresses Vth roll-off and DIBL. Thus, when Tfin is reduced, the Idsat and IOFF per unit width of the FinFET become smaller, as shown in Fig. 4. The FinFET with Tfin = 20 nm, which suffers from the SCE, has an even larger IOFF than the planar device. The geometric effective widths of the (10 nm, 40 nm), (15 nm, 30 nm), and (20 nm, 20 nm) combinations are 90, 75, and 60 nm, respectively. Thus, as shown in Fig. 5, Idsat is largest for the (10 nm, 40 nm) combination and smallest for the (20 nm, 20 nm) combination.


Fig. 6. RSNM and WNM of planar and FinFET SRAM bit-cells.

To measure the read static noise margin (RSNM) and the write noise margin (WNM), statistical (Monte Carlo) simulations are performed with the Vth variation caused by the RDF and the Tfin variation. The thinner the fin, the larger the Vth variation. Yu et al. [16] show that, when Tfin is 10, 15, and 20 nm, the standard deviation of Vth caused by the Tfin variation alone (σVt,Tfin) is 15.5, 14, and 12.5 mV, respectively. The standard deviation of the Vth variation caused by the RDF (σVt,RDF) is given by

$$\sigma_{V_t,\mathrm{RDF}} = \frac{A_{vt}}{\sqrt{\mathrm{Width}\,(= 2H_{\mathrm{fin}} + T_{\mathrm{fin}}) \times \mathrm{Length}}} \tag{2}$$

where Avt is a technology constant proportional to the oxide thickness and the channel doping. In this paper, Avt values of 2.5 and 1.76 mV·μm are used for the planar device and the FinFET, respectively [17]. As (2) shows, σVt,RDF is not independent of the Tfin variation, but this effect is negligible. Thus, assuming that σVt,RDF and σVt,Tfin are independent, the total σVt, which includes both components, can be expressed as

$$\sigma_{V_t} = \sqrt{\sigma_{V_t,\mathrm{RDF}}^{2} + \sigma_{V_t,T_{\mathrm{fin}}}^{2}}. \tag{3}$$
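The Python sketch below evaluates Eqs. (2) and (3) for a single fin in each (Tfin, Hfin) combination. It uses the Avt and σVt,Tfin values quoted above. The text does not say what "Length" is in Eq. (2). This sketch assumes it equals the 35-nm gate length used for Figs. 4 and 5. The function names are hypothetical.

```python
import math

# Assumption (not stated in the text): "Length" in Eq. (2) is the 35-nm
# gate length used for Figs. 4 and 5.
A_VT_FINFET = 1.76   # mV*um, FinFET [17]
A_VT_PLANAR = 2.5    # mV*um, planar [17]
LENGTH_UM = 0.035

# sigma(Vt) caused by the Tfin variation alone, from Yu et al. [16], in mV.
SIGMA_VT_TFIN = {10: 15.5, 15: 14.0, 20: 12.5}

def sigma_vt_rdf(a_vt, width_um, length_um=LENGTH_UM):
    """Eq. (2): RDF-induced sigma(Vt) in mV."""
    return a_vt / math.sqrt(width_um * length_um)

def sigma_vt_total(sigma_rdf, sigma_tfin):
    """Eq. (3): root-sum-square of two independent components."""
    return math.hypot(sigma_rdf, sigma_tfin)

for t_fin, h_fin in [(10, 40), (15, 30), (20, 20)]:
    width_um = (2 * h_fin + t_fin) / 1000.0   # single fin, nm -> um
    s_rdf = sigma_vt_rdf(A_VT_FINFET, width_um)
    s_tot = sigma_vt_total(s_rdf, SIGMA_VT_TFIN[t_fin])
    print(f"Tfin = {t_fin} nm: sigma_RDF = {s_rdf:.1f} mV, total sigma = {s_tot:.1f} mV")

# Planar PD (70-nm width), RDF only, for comparison
print(f"planar PD: sigma_RDF = {sigma_vt_rdf(A_VT_PLANAR, 0.070):.1f} mV")
```

Under this length assumption, the RDF term of a single fin is larger than the Tfin term but still smaller than the planar PD's RDF term. This is consistent with the observation that the FinFET's smaller RDF outweighs its Tfin variation.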
Fig. 6 shows the RSNM and WNM of the planar and FinFET SRAM bit-cells, measured with the methods described in [18] and [19], respectively. The RSNM and WNM on the y-axis of Fig. 6 are presented as μ/σ to normalize each result. In general, the RSNM and WNM of the FinFET SRAM are larger than those of the planar SRAM, because the benefit of the smaller RDF outweighs the penalty of the Tfin variation.
The driving strength of the PG divided by that of the PU
is defined as the alpha ratio. The driving strength of the PD
divided by that of the PG is defined as the beta ratio. The RSNM increases with the beta ratio, while the WNM increases with the alpha ratio [20]. In addition, when the driving strength of the PU relative to that of the PD is high, the trip point of the inverter becomes high. Thus, the RSNM also increases with the driving strength of the PU divided by that of the PD.
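To make the fin-count argument concrete, the Python sketch below computes the alpha and beta ratios under a simplifying assumption that is not from the paper. It assumes each transistor's driving strength scales linearly with its fin count. NMOS per-fin drive is normalized to 1, and a hypothetical per-fin PMOS/NMOS ratio `p_to_n` stands in for the surface-orientation effect. The PD and PG are both NMOS, so the orientation factor cancels in the beta ratio.

```python
from dataclasses import dataclass

@dataclass
class FinCounts:
    n_u: int        # pull-up fins (NU)
    n_d: int        # pull-down fins (ND)
    n_g: int        # pass-gate fins (NG)
    p_to_n: float   # hypothetical per-fin PMOS/NMOS drive ratio (orientation dependent)

    def alpha(self) -> float:
        """Alpha ratio: PG strength / PU strength (a larger value favors WNM)."""
        return self.n_g / (self.n_u * self.p_to_n)

    def beta(self) -> float:
        """Beta ratio: PD strength / PG strength (a larger value favors RSNM)."""
        return self.n_d / self.n_g

# Illustrative p_to_n values only; a stronger PMOS stands in for the (110) plane.
for label, p_to_n in [("(100)", 0.5), ("(110)", 1.2)]:
    for n_g in (1, 2):
        cell = FinCounts(n_u=1, n_d=2, n_g=n_g, p_to_n=p_to_n)
        print(f"{label} NG={n_g}: alpha={cell.alpha():.2f}, beta={cell.beta():.2f}")
```

Raising NG from one to two halves the beta ratio and doubles the alpha ratio. A stronger PU (larger `p_to_n`) lowers the alpha ratio. These are the trends used below to explain the RSNM and WNM results.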

Fig. 7. (a) Currents in SRAM bit-cell during read operation. (b) Butterfly
curves of cases 1 and 3.

A schematic of the SRAM bit-cell and the butterfly curves of cases 1 and 3 are shown in Fig. 7(a) and (b), respectively. In Fig. 7(b), the curved point (C-point) is the point where the gradient is −1. At the C-point, the PU and the PG in Fig. 7(a) are in the linear region, and the PD is in the saturation region. Thus, the currents through the PU, PG, and PD (IPU, IPG, and IPD) can be expressed by the following equations:

$$I_{PU} \approx K_{PU}\,(V_{DD} - V_{IN} - |V_{tp}|)\,(V_{DD} - V_{OUT}) \tag{4}$$

$$I_{PG} \approx K_{PG}\,(V_{DD} - V_{OUT} - V_{tnG})\,(V_{DD} - V_{OUT}) \tag{5}$$

$$I_{PD} = \frac{1}{2}K_{PD}\,(V_{IN} - V_{tnD})^{\alpha} \tag{6}$$

$$K_{PU} = \mu_P C_{OX}\left(\frac{W}{L}\right)_{PU},\quad K_{PG} = \mu_N C_{OX}\left(\frac{W}{L}\right)_{PG},\quad K_{PD} = \mu_N C_{OX}\left(\frac{W}{L}\right)_{PD} \tag{7}$$

where μP and μN are the mobilities of the PMOS and NMOS transistors, respectively; Vtp, VtnG, and VtnD are the threshold voltages of the PU, PG, and PD, respectively; (W/L)PU, (W/L)PG, and (W/L)PD are the width/length ratios of the PU, PG, and PD, respectively; COX is the oxide capacitance; and α is a number between one and two. To measure the RSNM, the voltages of the word-line and the bit-line_b are assumed to be VDD. During the read operation, the sum of IPU and IPG equals IPD. Thus, the following equations are derived from (4)–(6):

$$V_{OUT} = V_{DD} - \frac{1}{2K_{PG}}\left[-K_{PU}(V_{DD} - V_{IN} - |V_{tp}|) + K_{PG}V_{tnG} + \sqrt{X}\right] \tag{8}$$

$$\frac{dV_{OUT}}{dV_{IN}} = -\frac{1}{2K_{PG}}\left[K_{PU} + \frac{1}{2\sqrt{X}}\,\frac{dX}{dV_{IN}}\right] \tag{9}$$

where

$$X = \left[K_{PU}(V_{DD} - V_{IN} - |V_{tp}|) - K_{PG}V_{tnG}\right]^{2} + 2K_{PG}K_{PD}(V_{IN} - V_{tnD})^{\alpha} \tag{10}$$

$$\frac{dX}{dV_{IN}} = -2K_{PU}^{2}(V_{DD} - V_{IN} - |V_{tp}|) + 2K_{PU}K_{PG}V_{tnG} + 2\alpha K_{PG}K_{PD}(V_{IN} - V_{tnD})^{\alpha-1}. \tag{11}$$

Because dVOUT/dVIN is −1 at the C-point, numerical evaluation of (9) shows that the C-point occurs at a higher VIN when |Vtp|, VtnG, and VtnD are higher. As a result, the C-point of the butterfly curve is sharper in case 1 than in case 3, as shown in Fig. 7(b), which results in a larger SNM in case 1. Thus, as shown in Fig. 6, the RSNM generally becomes higher when Tfin is smaller, owing to the higher Vth of the transistors. With NG = 2, the RSNM decreases and the WNM increases compared with NG = 1. This is because the enhanced driving strength of the PG lowers the beta ratio and raises the alpha ratio. In the (110) plane, the RSNM increases and the WNM decreases compared with the (100) plane, owing to the enhanced driving strength of the PU.
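Eqs. (8)–(11) can be evaluated numerically to locate the C-point. The Python sketch below computes VOUT from (8) and its analytic slope from (9) and (11). It then bisects for the VIN at which the slope equals −1, searching between PD turn-on (VIN = VtnD) and PU turn-off (VIN = VDD − |Vtp|). Only VDD = 0.9 V comes from the paper. All device parameters (the K values, the thresholds, and α = 1.9) are hypothetical normalized numbers chosen only to illustrate the trend; they were not extracted from the paper.

```python
import math

VDD = 0.9  # V, supply voltage used in the paper

# Hypothetical, normalized parameters (NOT extracted from the paper).
LOW_VT = dict(k_pu=1.0, k_pg=2.0, k_pd=4.0, vtp=0.30, vtng=0.30, vtnd=0.30, alpha=1.9)
HIGH_VT = dict(LOW_VT, vtp=0.35, vtng=0.35, vtnd=0.35)

def _terms(v_in, p):
    a = VDD - v_in - p["vtp"]                 # PU overdrive in Eq. (4)
    b = max(v_in - p["vtnd"], 0.0)            # PD overdrive in Eq. (6)
    c = p["k_pu"] * a - p["k_pg"] * p["vtng"]
    x = c * c + 2 * p["k_pg"] * p["k_pd"] * b ** p["alpha"]   # Eq. (10)
    return a, b, x

def v_out(v_in, p):
    """Eq. (8): read-mode inverter output from I_PU + I_PG = I_PD."""
    a, _, x = _terms(v_in, p)
    return VDD - (-p["k_pu"] * a + p["k_pg"] * p["vtng"] + math.sqrt(x)) / (2 * p["k_pg"])

def slope(v_in, p):
    """Eqs. (9) and (11): analytic dV_OUT/dV_IN."""
    a, b, x = _terms(v_in, p)
    dx = (-2 * p["k_pu"] ** 2 * a + 2 * p["k_pu"] * p["k_pg"] * p["vtng"]
          + 2 * p["alpha"] * p["k_pg"] * p["k_pd"] * b ** (p["alpha"] - 1))
    return -(p["k_pu"] + dx / (2 * math.sqrt(x))) / (2 * p["k_pg"])

def c_point(p, iters=80):
    """Bisect for the V_IN where the slope is -1 (the C-point)."""
    lo, hi = p["vtnd"], VDD - p["vtp"]

    def g(v):
        return slope(v, p) + 1.0

    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("no slope = -1 crossing in the search interval")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:   # slope still shallower than -1: move right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for name, p in [("low Vt", LOW_VT), ("high Vt", HIGH_VT)]:
    v = c_point(p)
    print(f"{name}: C-point at V_IN = {v:.3f} V, V_OUT = {v_out(v, p):.3f} V")
```

With these illustrative parameters, raising |Vtp|, VtnG, and VtnD together from 0.30 to 0.35 V moves the C-point to a higher VIN, which matches the direction stated above. The sketch only locates the C-point; it does not compute the SNM itself.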
Fig. 8 shows how much the standard deviations of the RSNM and WNM (σRSNM and σWNM) increase because of the Tfin variation, compared with the cases where only the RDF is applied. The Tfin variation can increase σRSNM and σWNM by over 18%. When Tfin is smaller, the Tfin variation is larger, which results in a larger increase in σRSNM and σWNM. Thus, the increase in σRSNM and σWNM caused by the Tfin variation is three times larger in the (10 nm, 40 nm) combination than in the (20 nm, 20 nm) combination.
As shown in Fig. 5, the Vth of a planar device is smaller than that of the FinFET with Tfin = 10 and 15 nm. In addition, the RDF has less effect on the FinFET than on the planar device because of the FinFET's lightly doped channel and larger effective width, which results in a smaller Vth variation. Because the FinFET shows less Vth variation, the FinFET SRAM bit-cell generally achieves a larger RSNM and WNM than the planar SRAM bit-cell despite the Tfin variation. The WNMs of the FinFET SRAM bit-cells in cases 7, 8, and 9 are smaller than that of the planar SRAM bit-cell, owing to the strong PU in the (110) plane. When the RSNM and WNM of the planar SRAM bit-cell are taken as the stability criteria, the FinFET SRAM bit-cells in cases 7, 8, and 9 are not acceptable. Thus, cases 7, 8, and 9 are not analyzed further in the following sections.

Fig. 8. Effect of Tfin variation on RSNM and WNM.

Fig. 9. Icell and IOFF per one SRAM bit-cell.

IV. LEAKAGE AND CELL CURRENT OF SRAM BIT-CELL


Fig. 9 shows the cell current (Icell) and IOFF per SRAM bit-cell for all cases. Within the same layout area, the effective width of the transistor is larger in the FinFET SRAM than in the planar SRAM. Thus, the FinFET SRAM bit-cell achieves a larger Icell. Among the (Tfin, Hfin) configurations, the (Tfin = 10 nm, Hfin = 40 nm) configuration achieves the largest Icell owing to its largest effective width, despite its highest Vth. When Tfin is smaller, the FinFET SRAM bit-cell achieves a smaller IOFF owing to superior electrostatic channel controllability. When NG = 2, Icell and IOFF become larger because of the larger driving strength of the PG transistor. In the (110) plane, the driving strength of the NMOS decreases, which results in smaller Icell and IOFF than in the (100) plane.


Fig. 11. SRAM array configuration.

Fig. 10. (a) Definition of effective gate capacitance (Cge ) and effective
drain capacitance (Cde ). (b) Cge and Cde per unit width for FinFET and
planar devices normalized to planar NMOS Cge .

V. DELAY AND ENERGY CONSUMPTION DURING READ AND WRITE OPERATIONS
The effective gate capacitance (Cge) and the effective drain capacitance (Cde) are defined in Fig. 10(a). Fig. 10(b) shows the capacitance per unit width relative to the Cge of the planar NMOS. Each capacitance is extracted from the delay required for the same inverter to charge Cge or Cde from 0 V to VDD. Cge includes not only the gate oxide capacitance but also the fringing and overlap capacitances, because the fringing and overlap capacitances are connected across the gate and the drain. Cde likewise includes not only the junction capacitance but also the fringing and overlap capacitances. As shown in Fig. 10(b), the Cge per unit width of the FinFET is about 70% of that of a planar device. This is because the FinFET's strong electrostatic channel controllability allows a thicker gate dielectric. In addition, owing to the low channel doping concentration and vertical structure, the Cde per unit width of the FinFET is only 45% of that of a planar device. The reduced Cge and Cde per unit width lower not only the bit-line toggling power but also, potentially, the delay of the read and write operations.
Fig. 11 shows the SRAM array configuration used to measure the delay and power consumption during the read and write operations. A wiring capacitance of 0.2 fF/μm is assumed. The word-line and bit-line are driven by a four-stage inverter chain. In the (100) plane, the PMOS width in the inverter chain is set to twice the NMOS width to balance the driving strengths. On the other hand, in the (110) plane, the PMOS width is set equal to the NMOS width, because the driving strengths of the NMOS and PMOS are almost the same, as shown in Fig. 4(a). The SRAM array consists of 128 columns and 256 rows. The Tfin of the inverter chain is the same as that of the SRAM bit-cell.

Fig. 12. Word-line and bit-line toggling energies normalized to the bit-line toggling energy of the planar SRAM.
Fig. 12 shows the energies required to toggle one word-line and one bit-line, respectively, from 0 V to VDD, normalized to the bit-line toggling energy of the planar SRAM. The energies are measured at the last-stage inverter of the word-line and bit-line drivers. The word-line capacitance is proportional to Cge of the PG × NG, whereas the bit-line capacitance is proportional to Cde of the PG × NG. As shown in Fig. 10(b), the Cge per unit width of the FinFET is smaller than that of the planar device. However, the effective width of a FinFET PG is larger than that of a planar PG in all cases. Thus, in all cases except case 3, the Cge of the FinFET PG is larger than that of the planar PG, which gives the FinFET SRAM a larger word-line capacitance than the planar SRAM. As a result, the word-line toggling power of the FinFET SRAM is larger than that of the planar SRAM. On the other hand, as shown in Fig. 10(b), the FinFET achieves a larger reduction per unit width in Cde (55%) than in Cge (40%) compared with the planar device. Thus, the FinFET SRAM in cases 1, 2, and 3 consumes less bit-line energy than the planar SRAM. In an SRAM array, many bit-lines (several tens to hundreds) toggle whenever one word-line toggles during read and write operations. The bit-line toggling energy is therefore the dominant factor for most SRAM arrays. Thus, despite the increased word-line toggling energy, the total energy consumption of the FinFET SRAM array would be much smaller than that of the planar SRAM in cases 1, 2, and 3. The word-line and bit-line toggling energies are largest in the (10 nm, 40 nm) combination, owing to the largest effective width of the PG. When NG = 2, the word-line and bit-line toggling energies are larger than when NG = 1, because of the larger effective width of the PG.
The read delay consists of two components: 1) the delay from In_word triggering to word-line enabling and 2) the delay from word-line enabling to the time when the bit-line voltage becomes 20% of VDD. The write delay is measured assuming that valid data are already loaded on the bit-line before the write operation begins. Thus, the write delay also consists of two components: 1) the delay from In_word triggering to word-line enabling and 2) the delay from word-line enabling to the time when the voltages of the VR and VL nodes become equal (the VR–VL crossing point).
Fig. 13 shows the read and write delays. The FinFET SRAM has smaller read and write delays than the planar SRAM in all cases, because its superior driving capability outweighs the increase in Cge and Cde caused by its larger effective width. As shown in Fig. 5, Vth becomes higher when Tfin is smaller, which results in a smaller driving strength. As a result, with a smaller Tfin, the first part of the read and write delays (the delay of the inverter chain) becomes larger.
When NG = 2, the first part of the read and write delays is
larger than when NG = 1, due to the larger capacitance on the
word-line. In case 1, the second part of the read delay is smaller
than in case 3, due to the superior driving strength of the PG by
the aid of its greater effective width. When NG = 2, the second
part of the read and write delays is smaller than when NG = 1,
due to the larger driving strength of the PG. In the (110) plane,
the same widths are used for the NMOS and the PMOS in the
inverter chain while the PMOS width is twice that of the NMOS
in the (100) plane. Thus, the capacitance of the inverter chain is
generally smaller in the (110) plane than that in the (100) plane.
As a result, the first part of the read and write delays becomes
smaller in the (110) plane. On the other hand, the second part
of the read delay becomes larger, owing to the smaller driving
strength of the PG.
VI. EVALUATION OF FinFET SRAM
As shown in the previous sections, the FinFET SRAM bit-cell with NG = 2 achieves a large WNM and Icell and short read/write delays. Thus, if performance or stability is the main design target, the cases with NG = 2 are the proper choices. Case 12 shows outstanding speed performance and moderate stability, whereas case 4 or 10 achieves excellent write stability. However, owing to the greatly increased Cge and Cde, the cases with NG = 2 consume enormous word-line and bit-line toggling energies. If low power consumption is the major goal of the SRAM, the FinFET SRAM bit-cell with NG = 2 is not desirable. Because of the large IOFF with Tfin = 20 nm, case 3 is also not recommended. This leaves cases 1 and 2 as candidates for low-power design.

Fig. 13. SRAM operation delay. (a) Read delay. (b) Write delay.
Case 2 has a larger IOFF than case 1 because of its smaller Vth. IOFF significantly affects the total power consumption, since most bit-cells are in the hold state. Thus, case 1 is more suitable than case 2 for low-power applications. Even with the Tfin variation, case 1 achieves 3.3 and 2.8 times larger RSNM and WNM, respectively, than the planar SRAM bit-cell, because the benefit of the small RDF and high Vth outweighs the Tfin variation. In addition, case 1 achieves 2.6 times larger Icell and 103.7 times smaller IOFF than the planar SRAM bit-cell. These gains come from the larger effective width provided by the vertical structure and from the superior electrostatic channel control, respectively. Because of its larger driving strength and its smaller Cge and Cde per unit width compared with a planar device, the FinFET SRAM in case 1 achieves a 2.3 times smaller read delay and a 30% smaller write delay. The FinFET SRAM in case 1 also achieves 17% lower bit-line toggling power than the planar SRAM, at the cost of 22% higher word-line toggling power.
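The energy trade-off above can be checked with simple arithmetic. The Python sketch below models the energy of one access as one word-line toggle plus N bit-line toggles. It applies the case-1 figures quoted above (17% lower bit-line and 22% higher word-line toggling power), treating them as per-toggle energy ratios. The planar word-line-to-bit-line energy ratio w is not given in the text, so it is an explicit, hypothetical input.

```python
import math

# Case-1 figures from the text, treated here as per-toggle energy ratios.
BL_SCALE = 1.0 - 0.17   # FinFET / planar bit-line toggling energy
WL_SCALE = 1.0 + 0.22   # FinFET / planar word-line toggling energy

def access_energy(wl_energy, bl_energy, n_bitlines, wl_scale=1.0, bl_scale=1.0):
    """Energy of one access: one word-line toggle plus n_bitlines bit-line toggles."""
    return wl_scale * wl_energy + n_bitlines * bl_scale * bl_energy

def min_bitlines_for_savings(w):
    """Smallest bit-line count N for which the FinFET array uses less energy.

    w is the planar word-line energy divided by the planar bit-line energy.
    Savings require (1 - BL_SCALE) * N > (WL_SCALE - 1) * w.
    """
    return math.floor((WL_SCALE - 1.0) * w / (1.0 - BL_SCALE)) + 1

w = 5.0        # hypothetical planar word-line/bit-line energy ratio
n_cols = 128   # columns in the Fig. 11 array, assuming all of them toggle per access
planar = access_energy(w, 1.0, n_cols)
finfet = access_energy(w, 1.0, n_cols, WL_SCALE, BL_SCALE)
print(f"FinFET/planar access energy: {finfet / planar:.3f}")
print(f"minimum toggling bit-lines for savings at w = {w}: {min_bitlines_for_savings(w)}")
print(f"largest w that still favors the FinFET at N = {n_cols}: "
      f"{n_cols * (1.0 - BL_SCALE) / (WL_SCALE - 1.0):.1f}")
```

The FinFET array saves energy whenever 0.17N > 0.22w. For the 128-column array, and assuming all columns toggle, this holds unless one word-line toggle costs roughly 99 times as much as one bit-line toggle.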

VII. CONCLUSION
In this paper, FinFET SRAMs with the possible (Tfin, Hfin) combinations, fin ratios, and surface orientations have been investigated with regard to speed, stability, leakage current, and array dynamic energy under area constraints. Despite the Tfin variation, the FinFET SRAM achieves larger read/write noise margins than the planar SRAM, owing to its small RDF. In addition, owing to the strong driving capability provided by the vertical structure, most FinFET SRAM configurations show superior speed compared with the planar SRAM. However, the cases with NG = 2 consume too much word-line and bit-line toggling energy because of the greatly increased Cge and Cde. Moreover, the cases with NG = 1 in the (110) plane show very poor write stability. For NG = 1 in the (100) plane, the (Tfin = 15 nm, Hfin = 30 nm) and (Tfin = 20 nm, Hfin = 20 nm) configurations show excessive IOFF. Compared with the planar SRAM bit-cell, the optimal configuration, (Tfin = 10 nm, Hfin = 40 nm) with NG = 1, shows 103.7 times smaller IOFF owing to its high Vth. It also shows about three times larger read and write noise margins despite the Tfin variation. It also achieves a 2.3 times smaller read delay and a 30% smaller write delay than the planar SRAM.

REFERENCES
[1] K. Kim, K. K. Das, R. V. Joshi, and C.-T. Chuang, "Leakage power analysis of 25-nm double-gate CMOS devices and circuits," IEEE Trans. Electron Devices, vol. 52, no. 5, pp. 980–986, May 2005.
[2] J. Kavalieros, B. Doyle, S. Datta, G. Dewey, M. Doczy, B. Jin, D. Lionberger, M. Metz, W. Rachmady, M. Radosavljevic, U. Shah, N. Zelick, and R. Chau, "Tri-gate transistor architecture with high-k gate dielectrics, metal gate and strain engineering," in VLSI Symp. Tech. Dig., Jun. 2006, pp. 50–51.
[3] H. Shang, L. Chang, X. Wang, M. Rooks, Y. Zhang, B. To, K. Babich, G. Totir, Y. Sun, E. Kiewra, M. Ieong, and W. Haensch, "Investigation of FinFET devices for 32 nm technologies and beyond," in VLSI Symp. Tech. Dig., Jun. 2006, pp. 54–55.
[4] S. A. Tawfik and V. Kursun, "Low-power and compact sequential circuits with independent-gate FinFETs," IEEE Trans. Electron Devices, vol. 55, no. 1, pp. 60–70, Jan. 2008.
[5] D. J. Frank, Y. Taur, M. Ieong, and H.-S. P. Wong, "Monte Carlo modeling of threshold variation due to dopant fluctuations," in VLSI Symp. Tech. Dig., Jun. 1999, pp. 171–172.
[6] A. Bansal, S. Mukhopadhyay, and K. Roy, "Device-optimization technique for robust and low-power FinFET SRAM design in nanoscale era," IEEE Trans. Electron Devices, vol. 54, no. 6, pp. 1409–1419, Jun. 2007.
[7] L. Chang, M. Ieong, and M. Yang, "CMOS circuit performance enhancement by surface orientation optimization," IEEE Trans. Electron Devices, vol. 51, no. 10, pp. 1621–1627, Oct. 2004.
[8] S. Gangwal, S. Mukhopadhyay, and K. Roy, "Optimization of surface orientation for high-performance, Low-power and Robust FinFET SRAM," in Proc. IEEE CICC, 2006, pp. 433–436.
[9] A. Borges, V. Moroz, and X. Xu, "Strain engineering and layout context variability at 45 nm," Semiconductor International, Nov. 2007.
[10] S. Xiong and J. Bokor, "Sensitivity of double-gate and FinFET devices to process variations," IEEE Trans. Electron Devices, vol. 50, no. 11, pp. 2255–2261, Nov. 2003.
[11] M. V. Dunga, C.-H. Lin, D. D. Lu, W. Xiong, C. R. Cleavelin, P. Patruno, J.-R. Hwang, F.-L. Yang, and A. M. Niknejad, "BSIM-MG: A versatile multi-gate FET model for mixed-signal design," in VLSI Symp. Tech. Dig., Jun. 2007, pp. 60–61.
[12] D. Esseni, A. Abramo, L. Selmi, and E. Sangiorgi, "Physically based modeling of low field electron mobility in ultrathin single- and double-gate SOI n-MOSFETs," IEEE Trans. Electron Devices, vol. 50, no. 12, pp. 2445–2455, Dec. 2003.
[13] D. Lekshmanan, A. Bansal, and K. Roy, "FinFET SRAM: Optimizing silicon fin thickness and fin ratio to improve stability at ISO area," in Proc. IEEE CICC, 2007, pp. 623–626.
[14] X. Huang, W.-C. Lee, C. Kuo, D. Hisamoto, L. Chang, J. Kedzierski, E. Anderson, H. Takeuchi, Y.-K. Choi, K. Asano, V. Subramanian, T.-J. King, and J. Bokor, "Sub 50-nm FinFET: PMOS," in IEDM Tech. Dig., Dec. 1999, pp. 67–70.
[15] B. Yu, L. Chang, S. Ahmed, H. Wang, S. Bell, C.-Y. Yang, C. Tabery, C. Ho, Q. Xiang, T.-J. King, J. Bokor, C. Hu, M.-R. Lin, and D. Kyser, "FinFET scaling to 10 nm gate length," in IEDM Tech. Dig., Dec. 2002, pp. 251–254.
[16] S. Yu, Y. Zhao, L. Zeng, G. Du, J. Kang, R. Han, and X. Liu, "Impact of line-edge roughness on double-gate Schottky-barrier field-effect transistors," IEEE Trans. Electron Devices, vol. 56, no. 6, pp. 1211–1219, Jun. 2009.
[17] K. J. Kuhn, "Reducing variation in advanced logic technologies: Approaches to process and design for manufacturability of nanoscale CMOS," in IEDM Tech. Dig., Dec. 2007, pp. 471–474.
[18] E. Seevinck, F. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," IEEE J. Solid-State Circuits, vol. SSC-22, no. 5, pp. 748–754, Oct. 1987.
[19] A. Bhavnagarwala, S. Kosonocky, C. Radens, K. Stawiasz, R. Mann, Q. Ye, and K. Chin, "Fluctuation limits & scaling opportunities for CMOS SRAM cells," in IEDM Tech. Dig., Dec. 2005, pp. 659–662.
[20] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, "Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscale CMOS," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 12, pp. 1859–1880, Dec. 2005.

Mingu Kang was born in Changwon-Si, Gyeongsangnam-Do, Korea, in 1981. He received the B.S. and M.S. degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2007 and 2009, respectively.
Since 2009, he has been with the Memory Division, Samsung Electronics, Yongin, Korea, where
he has been engaged in the design of PRAM. His
research interests include low-power PRAM and
SRAM and FinFET application for memory circuits.

S. C. Song received the Ph.D. degree in solid state electronics from The University of Texas at Austin, Austin, in 2000.
Since 2000, he has been in engineering and management positions in various organizations, including Motorola, Samsung, and SEMATECH, working
on advanced CMOS process/device technology development. He is currently with Qualcomm Inc.,
San Diego, CA, where he leads the 28-nm HK/MG
technology development with leading foundries. He
has contributed several key papers to high-profile
journals and conferences on various topics of CMOS technology, including
SiON, HK/MG, and FinFET. He is the holder of six U.S. patents.


S. H. Woo was born in Seoul, Korea, in 1983. He received the B.S. degree in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2009, where he is currently working toward the M.S. degree.
His current research interests include the analysis
of offset voltage, sensing the dead zone of a sense
amplifier, PVT variation sensing, and compensation
circuit design.

B. M. Han
From 1989 to 1997, he was with Samsung Electronics, where he worked on the DRAM/eDRAM/GDRAM layout. From 1997 to 2000, he was with
AAC, where he worked on a graphic memory layout. From 2000 to 2004, he was with IDT, where
he worked on the SRAM and CAM layout. Since
2004, he has been with the digital mask design team
of Qualcomm Inc., San Diego, CA. He has more
than 20 years of memory layout experience and is
currently interested in new device technologies.

H. K. Park (S'10) was born in Iksan, Jeollabukdo, Korea, in 1982. He received the B.S. degree
in electrical and electronic engineering from Yonsei
University, Seoul, Korea, in 2008, where he is currently working toward the M.S. degree.
His current research interests include SRAM stability, subthreshold SRAM bit-cell design, FinFET
SRAM bit-cell design, and FinFET peripheral circuit
design.

J. Wang received the B.S. and M.S. degrees in physics from Peking University, Beijing, China, in
1990 and 1993, respectively, while doing research
on e-beam lithography and a high-temperature superconductor Josephson junction device, and the
M.S.E.E. degree from the University of Washington, Seattle, in 1996. His graduate research focused
on an in situ temperature control system for wafer
processing.
In 1997, he was with Micron Technology, Boise,
ID, where he worked on the process development of
many advanced memories, including stand-alone high-speed SRAM from 0.35- to 0.1-μm technology, 90-nm DRAM development, and 70-nm NAND Flash
process development. Since 2004, he has been with Qualcomm Inc., San Diego,
CA, where he is working on embedded memory solutions and is currently the
Manager for the memory technology group responsible for the enablement of
embedded SRAM, ROM, eDRAM, fuse, and multichip package DRAM.

M. H. Abu-Rahma received the B.Sc. degree (with honors) in electronics and communication engineering from Ain Shams University, Cairo, Egypt, the
M.Sc. degree in electronics and communication engineering from Cairo University, Giza, Egypt, and
the Ph.D. degree in electrical engineering from the
University of Waterloo, Waterloo, ON, Canada.
From 2001 to 2004, he was with Mentor Graphics, Egypt, where he worked on MOSFET compact
model development and extraction. Since 2005, he
has been with Qualcomm Inc., San Diego, CA,
where he has been engaged in the research and development of low-power
embedded SRAM and CMOS circuits. He has authored and coauthored several
technical papers in refereed international conferences and journals. He is
the holder of one patent with ten more patents filed (pending). His research
interests include low-power digital circuits, variation-tolerant memory design,
and statistical design methodologies.

L. Ge received the B.Eng. degree from Southeast University, Nanjing, China, the M.Sc. degree from
the National University of Singapore, Singapore, and
the Ph.D. degree from the Department of Electrical
and Computer Engineering, University of Florida,
Gainesville, in 2002. His doctoral research was focused on the modeling and design of DG and SOI
CMOS devices and circuits.
From 2002 to 2008, he was with Freescale
Semiconductor CMOS Next Generation Design
Foundations, Austin, TX, where he worked on the
characterization and modeling of advanced CMOS technologies, including uniaxial stress effects. Since 2008, he has been with Qualcomm Inc.,
San Diego, CA, where he is working on 28-nm SPICE modeling and technology
enablement. His current research and development interests include the modeling and analysis of advanced CMOS devices and circuits, characterization and
modeling of uniaxial stress effects and layout effects of CMOS devices, and
modeling of nonclassical CMOS devices and circuits.

G. Yeap received the B.S.E.E. (with honors), M.S.E.E., and Ph.D. degrees in microelectronics from The University of Texas at Austin, Austin.
He has close to 20 years of semiconductor experience in both wireless and CPU technology. From
1995 to 1998, he was with AMD, where he worked
as a Strategic Technologist. From 1997 to 2004, he
was with Motorola, where he worked on wireless
technology. Since 2004, he has been with Qualcomm
Inc., San Diego, CA, where he is currently the Vice
President of Technology.

S. O. Jung (M'00–SM'03) received the B.S. and M.S. degrees in electronic engineering from Yonsei University, Seoul, Korea, in 1987 and 1989, respectively. He received the Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign, Urbana, in 2002.
From 1989 to 1998, he was with Samsung Electronics, where he worked on specialty memories,
such as video RAM, graphic RAM, and window
RAM, and merged memory logic. From 2001 to
2003, he was with T-RAM Inc., where he was the
Leader of the thyristor-based memory design team. From 2003 to 2006, he was
with Qualcomm Inc., San Diego, CA, where he worked on high-performance
low-power embedded memories, process variation tolerant circuit design, and
low power circuit techniques. Since 2006, he has been an Associate Professor
with Yonsei University. His research interests include process variation tolerant
circuit design, low-power circuit design, mixed-mode circuit design, and future
generation memory and technology.
Dr. Jung is currently a board member of the IEEE SSCS Seoul Chapter.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 1, JANUARY 2011

A 32 nm High-k Metal Gate SRAM With Adaptive Dynamic Stability Enhancement
for Low-Voltage Operation
Pramod Kolar, Member, IEEE, Eric Karl, Member, IEEE, Uddalak Bhattacharya, Member, IEEE,
Fatih Hamzaoglu, Member, IEEE, Henry Nho, Yong-Gee Ng, Yih Wang, Member, IEEE, and
Kevin Zhang, Senior Member, IEEE

Abstract: SRAM bitcell design margin continues to shrink due
to random and systematic process variation in scaled technologies,
and conventional SRAM faces a challenge in realizing the power
and density benefits of technology scaling. Smart and adaptive
assist circuits can improve design margins while satisfying SRAM
power and performance requirements in scaled technologies.
This paper introduces an adaptive, dynamic SRAM word-line
under-drive (ADWLUD) scheme that uses a bitcell-based sensor
to dynamically optimize the strength of WLUD for each die. The
ADWLUD sensor enables a 130 mV reduction in SRAM Vccmin
while increasing frequency yield by 9% over conventional SRAM
without WLUD. The sensor area overhead is limited to 0.02% and
the power overhead is 2% for a 3.4 Mb SRAM array.
Index Terms: CMOS memory integrated circuits, high-k+metal-gate, process variations, read assist, SRAM sensors, static
random-access memory (SRAM), systematic variation, Vccmin,
word-line under-drive, 32 nm.
Fig. 1. Technology scaling trend for SRAMs in high-performance CPUs [5],
[25]–[27].

I. INTRODUCTION

Moore's law continues to drive technology scaling to
deliver increased density and integration in CMOS technology (see Fig. 1). The ever-shrinking cost per transistor and
recent enhancements in off-state power gating and circuit-level
sleep features have enabled unprecedented integration in the
pursuit of enhanced functionality and performance in future
products [1], [2]. Active power reduction is a critical challenge
to translating increasing levels of integration to architectural
performance enhancement. Supply voltage scaling remains
one of the most effective methods to reduce active power
consumption. As the active supply voltage approaches the
threshold voltage of transistors in CMOS technology, energy
efficiency nears the optimal point, balanced between leakage
energy and active energy consumption [3], [4]. In the pursuit of
power reduction and maximal energy efficiency, active Vccmin
(defined as the minimum active operating voltage) is a critical
metric for SRAM memories in current and future applications.
Manuscript received April 13, 2010; revised June 26, 2010; accepted August
08, 2010. Date of publication November 09, 2010; date of current version December 27, 2010. This paper was approved by Guest Editor Ken Takeuchi.
P. Kolar, E. Karl, U. Bhattacharya, F. Hamzaoglu, Y.-G. Ng, Y. Wang,
and K. Zhang are with Intel Corporation, Hillsboro, OR 97124 USA (e-mail:
pramod.kolar@intel.com).
H. Nho is with LG Electronics, Seoul, South Korea.
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSSC.2010.2084490

Despite advances in process technology and the ability to
produce ever-smaller feature sizes (Fig. 1), manufacturing
process variation increasingly constrains the density, active
Vccmin, performance and leakage of conventional SRAM.
Conventional SRAM active Vccmin is bounded by opposing
constraints on read stability and write margin [5]. With the
increasing impact of process variation in highly scaled transistors, wide-ranging operating temperatures and transistor aging
effects, setting a fixed design point for SRAM that provides
expected density scaling and active Vccmin is challenging.
The increasing impact of intra-die and die-to-die (D2D)
process variation on SRAM is well documented in recent
literature [5]–[8]. Fig. 2 shows simulated active Vccmin versus
nMOS and pMOS threshold voltage from −10 °C to 95 °C for a 6 MB SRAM array in a 32 nm high-k metal gate
technology, considering intra-die variation. Die-to-die variation can
be depicted graphically as a shift along the X or Y axis to a different
VTN/VTP point for a given die. The inscribed circle in the
worst-case active Vccmin diagram, representing an expected
range of D2D variation, shows that the process margin between
read- and write-limited dice is not sufficient to meet the Vccmin
target.
Another challenge in reducing the active Vccmin of scaled
SRAM is designing the bitcell to function over a wide range
of operating temperatures. At lower voltages, cell stability
and write margin are increasingly sensitive to bitcell transistor
threshold voltages. A cell designed for optimal Vccmin at cold
temperatures may be constrained by read stability at higher
temperatures due to the temperature dependence of threshold
voltages. Fig. 2(b) and (c) show that the intrinsic read/write
process margin is larger if the bitcell can be optimized for
operation at a given temperature (−10 °C or 95 °C) rather than
across a range of temperatures (−10 °C to 95 °C).

0018-9200/$26.00 © 2010 IEEE

KOLAR et al.: A 32 nm HIGH-k METAL GATE SRAM WITH ADAPTIVE DYNAMIC STABILITY ENHANCEMENT FOR LOW-VOLTAGE OPERATION

Fig. 2. Active Vccmin versus VTN/VTP simulated across a temperature range from −10 °C to 95 °C. Active Vccmin at 95 °C and at −10 °C is also shown separately to show the shift in the optimal operating point with temperature.
Transistor aging effects can also affect the Vccmin of SRAM
bitcells. Post burn-in SRAM tests exhibit elevated read Vccmin
and improved write Vccmin due to a decrease in the effective
P/N ratio of bitcell transistors [28], [29]. Continuous innovation and process adjustments add a layer of uncertainty around
the effects of aging. End of Life (EOL) aging effects reduce the
effective process margin between read and write limited Vccmin during process development.
Recent work in the literature focuses on improving intrinsic
read/write process margin to offset increasing intra-die variation with technology scaling. Circuit techniques proposed to
expand design margins and enable SRAMVCC scaling either increase read stability, by reducing
wordline (WL) voltages [9]–[17], [24] or increasing SRAMVCC
[9], [19], [21], [22], or improve bitcell write margin, by reducing SRAMVCC
during write operations [9]–[11], [13], [18], [19], [23] or using
negative bitline (BL) voltages during writes [12], [20], [24]. These improvements in read/write
margin often come at the cost of significant design complexity,
array area overhead and array design restrictions (column multiplexing, etc.). Die-by-die programming (DBDP) of the aforementioned assist circuits is a supplemental approach to reducing
the impact of die-to-die variation, but its test cost overhead is
not acceptable for all applications.


An analog closed-loop feedback scheme integrated with a
sensor for WLUD control has been proposed earlier [16]. The
ability to alter WLUD to rebalance read and write margins with
process corners and temperature shifts has been demonstrated.
The sensor system proposed in this work reduces area overhead relative to the optimal wordline
bias system described in [16] because it transmits digital control signals to a large amount of memory rather than using a local
op-amp in each memory sub-array. This work also demonstrates
and quantifies the Vccmin benefit and frequency yield improvement of an adaptive control system subject to process variation
over a significant quantity of 32 nm high-k metal gate material.
This paper introduces an adaptive SRAM wordline
under-drive (WLUD) scheme [9]–[17], [24] that uses a
bitcell-based sensor to dynamically optimize the strength of
WLUD for each die. A fixed application of WLUD improves
Vccmin for read-limited dice yet also significantly degrades
Vccmin and performance for write-limited dice. By selectively
applying WLUD, the Vccmin benefit of WLUD is substantially
improved. A bitcell-based on-die sensor classifies the die as
read or write-limited and a programmable switch applies the
optimal WLUD strength for each individual die. The sensor
allows performance benefits similar to optimal die-by-die programming (DBDP) of the WLUD assist circuit without the test
cost overhead that limits DBDP adoption for most products.
Additionally, the adaptive sensor can address temperature and
aging effects on Vccmin by tuning the assist circuit as dice
shift between read and write-limited in operation. This allows a
substantial reduction in the time-0 guard-band for aging effects
and allows a WLUD setting that is near optimal for the current
temperature of the die instead of a compromised setting that is
optimized across the entire temperature range. By addressing
inter-die variation, temperature and aging effects with the
on-die sensor and WLUD circuit, Vccmin is reduced by 130
mV and frequency-constrained yield is increased by 9%.
The rest of the paper is organized as follows: Section II discusses the concept of adaptive dynamic WLUD. Section III describes the circuit implementation details. Section IV reviews
silicon measurement results from a high volume test-chip. The
final section concludes the paper.
II. CONCEPT OF ADAPTIVE DYNAMIC WLUD
The first premise of the adaptive, dynamic WLUD (ADWLUD) technique is to actively track process skew corners to
determine if active Vccmin of each die is constrained by write
margin limitations (write limited). If the die is not write limited,
the ADWLUD circuit will apply WLUD. The second premise
is to dynamically track P/N shift with temperature and identify
write limited dice that become read stability limited (read
limited). If the dice are no longer write-limited, the ADWLUD
circuit applies WLUD for read stability enhancement.
Fig. 3. The effect of WLUD strength on read and write Vccmin for a slow-fast (SF) and a fast-slow (FS) corner die is shown. The optimal WLUD strength is determined by where the read and write Vccmin curves intersect, and it differs for the two dice. Without the ability to track process corners, a compromise WLUD setting that is optimal for neither the SF nor the FS die must be chosen. Data is measured at −10 °C.

Fig. 3 illustrates the concept of process tracking with the
adaptive WLUD system. The silicon measurements are for two
dice on a single wafer: one at the SF corner and the other at the
FS corner. The read and write Vccmin for each die is plotted as a
function of WLUD strength. The x-coordinate of the intersection
of the read and write curves is the optimal WLUD setting for the
die. As can be seen from the graph, this value is 5% of Vcc
for the SF die. Now consider the FS die. The optimal WLUD for
this die is about 17% of Vcc. Clearly, the WLUD that is optimal for
the FS die is not optimal for the SF die, and vice versa. If there
is no means of determining whether a die is in the SF or FS corner, the
WLUD setting is determined by the intersection of the read
Vccmin curve for the FS die and the write Vccmin curve for the SF die. This
value of WLUD is 9%, which is not the optimal selection for
either the SF die or the FS die. The die active Vccmin obtained
using this compromise WLUD setting is higher than what could
be achieved if WLUD strength were chosen independently for
each die. The ADWLUD circuit uses the sensor input to track the
process skew corner at the die level and selects an optimized
WLUD setting from a fixed set of WLUD strength options for
each die. This yields a substantial Vccmin improvement for that
die, compared to a globally selected optimal setting.
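To make the intersection rule concrete, the following Python sketch locates the crossing of two sampled Vccmin curves by linear interpolation. The curve values are hypothetical, linearised numbers chosen only for illustration (they are not the measured Fig. 3 data), and `crossing_point` is an illustrative helper, not part of the ADWLUD design. It assumes read Vccmin falls and write Vccmin rises monotonically with WLUD strength.

```python
import numpy as np


def crossing_point(wlud, read_vccmin, write_vccmin):
    """Return the WLUD strength at which the read and write Vccmin curves cross.

    Assumes read Vccmin falls and write Vccmin rises with WLUD strength, so
    (read - write) changes sign once; the crossing is found by linear
    interpolation between the two bracketing samples.
    """
    wlud = np.asarray(wlud, dtype=float)
    diff = np.asarray(read_vccmin, dtype=float) - np.asarray(write_vccmin, dtype=float)
    idx = np.nonzero(np.diff(np.sign(diff)))[0]
    if idx.size == 0:
        raise ValueError("curves do not cross in the sampled WLUD range")
    i = idx[0]
    return wlud[i] + (wlud[i + 1] - wlud[i]) * diff[i] / (diff[i] - diff[i + 1])


# Hypothetical, linearised curves: WLUD in % of Vcc, Vccmin in volts.
w = np.arange(0, 30, 5)
sf_read, sf_write = 0.80 - 0.010 * w, 0.70 + 0.010 * w
fs_read, fs_write = 0.90 - 0.010 * w, 0.56 + 0.010 * w

opt_sf = crossing_point(w, sf_read, sf_write)  # per-die optimum, SF die
opt_fs = crossing_point(w, fs_read, fs_write)  # per-die optimum, FS die
# Without corner tracking: the FS read curve meets the SF write curve.
compromise = crossing_point(w, fs_read, sf_write)
print(f"SF {opt_sf:.1f}%, FS {opt_fs:.1f}%, compromise {compromise:.1f}%")
```

With these made-up curves the per-die optima sit at 5% and 17% and the corner-blind compromise falls between them. Because a die's active Vccmin at any setting is the higher of its two curves, the compromise costs Vccmin on both dice.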
Fig. 4 depicts the concept of temperature tracking with the
ADWLUD system. The silicon measurements are for a single
die at two different temperatures. The read and write Vccmin at
each temperature is plotted as a function of WLUD strength. The x-coordinate of the intersection of the read and write Vccmin curves
is the optimal WLUD setting for this die at a given temperature. As can be seen from the graph, this value is 12% of
Vcc at −10 °C. The optimal WLUD strength at 95 °C is quite
different: about 24% of Vcc. The WLUD setting that is optimal for this die at high temperature is suboptimal for the die
at low temperature. Without a dynamic system for tracking and
responding to this temperature shift, a compromise WLUD of
16% must be used (determined by the intersection of the read
curve at 95 °C and the write curve at −10 °C). Active Vccmin with
the fixed, single-point WLUD setting is higher than the Vccmin
possible with multiple settings tuned for different temperature
ranges. The ADWLUD sensor is able to track temperature shifts
and dynamically select the optimal WLUD strength setting for
each die across a range of temperatures. This selection process
yields improved active Vccmin for this die at both temperatures,
as seen in Fig. 4.
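The temperature case can be phrased as a minimax choice: a single fixed setting must minimise the worst Vccmin over both temperatures, whereas a tracked setting is chosen per temperature. The sketch below assumes one die, two temperatures and hypothetical linear curves picked so that the optima fall at 12% and 24% and the fixed compromise at 16%, mirroring the shape (not the data) of Fig. 4; the names `curves`, `vccmin` and `settings` are illustrative.

```python
import numpy as np

# Hypothetical, linearised read/write Vccmin (V) for ONE die versus WLUD
# strength (% of Vcc), at a cold and a hot temperature. Illustrative only.
settings = np.arange(0, 30, 2)
curves = {
    "cold": (0.82 - 0.010 * settings, 0.58 + 0.010 * settings),  # (read, write)
    "hot": (0.90 - 0.010 * settings, 0.42 + 0.010 * settings),
}


def vccmin(temp, k):
    """Active Vccmin at setting index k: the higher of the read and write limits."""
    read, write = curves[temp]
    return max(read[k], write[k])


indices = range(len(settings))
# Fixed WLUD: one setting must cover both temperatures, so minimise the worst case.
fixed_k = min(indices, key=lambda k: max(vccmin(t, k) for t in curves))
# Temperature-tracked WLUD: choose the best setting separately at each temperature.
tracked_k = {t: min(indices, key=lambda k: vccmin(t, k)) for t in curves}

print("fixed:", settings[fixed_k], "% ->",
      round(max(vccmin(t, fixed_k) for t in curves), 3), "V")
for t, k in tracked_k.items():
    print(f"{t}: {settings[k]}% -> {vccmin(t, k):.3f} V")
```

Under these assumptions the fixed 16% setting leaves the die at a higher worst-case Vccmin than either tracked setting achieves at its own temperature, which is the gap the temperature-tracking sensor is meant to close.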

Fig. 4. The effect of WLUD strength on read and write Vccmin for a die at 95 °C and −10 °C is shown. The optimal WLUD strength, determined by where the read and write Vccmin curves intersect, differs for the die at the two temperatures. Without the ability to track this temperature shift, a compromise WLUD setting that is optimal at neither −10 °C nor 95 °C must be chosen. Data is measured for a typical die.

III. CIRCUIT IMPLEMENTATION

The circuit schematic of the adaptive dynamic wordline
under-drive scheme is shown in Fig. 5. It consists of three
modules: the local WLUD circuit in each SRAM subarray, the
ADWLUD controller and the SRAM bitcell-based sensor. First,
a local WLUD circuit is embedded in each 16 kB subarray with
three selectable strength settings. A strong WLUD setting is
accomplished by turning on a small pMOS (P3) distributed
in each wordline driver. These devices are connected
in parallel across all 256 wordline drivers to provide sufficient drive strength to discharge the wordline driver supply
voltage (WLVCC). In addition, using 256 devices significantly diminishes the impact of random variation
on WLVCC. Through opportunistic use of layout space, the distributed P3 transistors incur no area overhead. A single, large
pMOS device (P4) generates a weak WLUD setting. Since
the P4 device is shared by the entire 16 kB subarray, the area
overhead is kept to a minimum (0.1%). Turning on both P3 and P4
at the same time provides the strongest WLUD.
At the core of the WLUD controller is the 6T SRAM cell-based
Vtp/Vtn sensor. Minor modifications to the layout of the
6T SRAM cell are made to produce the sensor cell: the PD (pull-down)
transistor is disconnected, the input of the inverter is tied
to Vss, and the cross coupling is broken. The internal nodes
are shorted and connected to Vsensor. The gate of the PG (pass-gate) transistor is
tied to Vcc while the bitline node is grounded. It is important to
preserve the bitcell layout as much as possible to capture layout-dependent
effects on SRAM transistor performance. The ratio
of pull-up pMOS and pass-gate nMOS strength determines the
sensor output voltage, Vsensor, which has a strong correlation
to write margin for the SRAM. The controller logic uses this analog Vsensor voltage to turn WLUD
on or off. Forty-eight sensor cells are connected in parallel to
reduce the effects of random variation.
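The benefit of paralleling devices (the 256 P3 transistors and the 48 sensor cells) can be illustrated with a first-order Monte Carlo model. The sketch assumes each device has an independent Gaussian threshold-voltage offset with a hypothetical 30 mV sigma, and that the parallel combination behaves like the average of those offsets; under that assumption the spread shrinks as sigma/sqrt(N). Real circuits deviate from this idealisation, so the numbers show a trend, not the paper's variation figures.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
SIGMA_VT = 0.030  # hypothetical per-device Vt mismatch, 1 sigma, in volts
TRIALS = 10_000   # Monte Carlo samples per device count


def averaged_offset_sigma(n_devices):
    """Std. dev. of the mean Vt offset of n independent devices.

    First-order model: n identical devices (or cells) in parallel behave like
    one device whose offset is the average of the individual offsets.
    """
    offsets = rng.normal(0.0, SIGMA_VT, size=(TRIALS, n_devices))
    return offsets.mean(axis=1).std()


for n in (1, 48, 256):  # one device, 48 sensor cells, 256 WL-driver pMOS
    simulated = averaged_offset_sigma(n)
    predicted = SIGMA_VT / np.sqrt(n)
    print(f"N={n:3d}: simulated {1e3 * simulated:5.2f} mV, "
          f"sigma/sqrt(N) {1e3 * predicted:5.2f} mV")
```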
Fig. 5. The adaptive dynamic word-line under-drive (ADWLUD) circuit consists of the WLUD module, controller and 6T-SRAM bitcell based sensor.

The final component is the controller module, which consists of comparators and a reference voltage generation circuit.
One input to each comparator is the sensor output voltage and the
other is a reference voltage. If Vsensor is less than Vref1, the
comparator output signal activates WLUD pMOS P3, applying
a strong WLUD. Similarly, the other comparator enables
P4 when Vsensor is less than Vref2. Depending on the magnitude of Vsensor, the controller applies either weak or strong
WLUD. Comparator offset directly contributes to the overall
error in WLUD setting assignment and reduces the Vccmin benefit. The impact of non-idealities in the controller is explored in
the measurement results section. The reference voltage generation circuit consists of a resistive divider with a multiplexer for
controller calibration: one of four different nodes of the resistive divider can be chosen as the reference voltage. Since this
is a ratioed circuit composed of uniform elements, it provides
reference voltages that are independent of process corner and
temperature. The optimal values of Vref1 and Vref2 are determined empirically by characterizing a statistically significant quantity of silicon material, and these settings
are then used across all silicon material. The choice of Vref1 and
Vref2 is discussed in more detail in Section IV in the context of
actual silicon results.
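The controller behaviour described above can be summarised in a few lines of Python. This is a behavioural sketch, not the authors' circuit: the four divider tap ratios in `TAP_RATIOS`, the function names and the example voltages are assumptions made for illustration. It models the calibration mux selecting one divider node and the two comparators driving P3 (strong) and P4 (weak).

```python
from dataclasses import dataclass

# Hypothetical divider taps as fractions of Vcc. A ratio of matched resistors
# tracks Vcc and is, to first order, insensitive to process corner and temperature.
TAP_RATIOS = (0.30, 0.35, 0.40, 0.45)  # four selectable nodes (values assumed)


def reference_voltage(vcc, mux_select):
    """Reference voltage picked by the 2-bit calibration mux (0..3)."""
    return vcc * TAP_RATIOS[mux_select]


@dataclass
class WludEnables:
    p3_on: bool  # distributed pMOS in every WL driver -> strong WLUD
    p4_on: bool  # single pMOS shared per 16 kB subarray -> weak WLUD


def controller(v_sensor, vref1, vref2):
    """Two comparators: Vsensor < Vref1 enables P3, Vsensor < Vref2 enables P4."""
    return WludEnables(p3_on=v_sensor < vref1, p4_on=v_sensor < vref2)


def wlud_level(en):
    """Map the enable pair to the WLUD strength it produces."""
    if en.p3_on and en.p4_on:
        return "strongest"
    if en.p3_on:
        return "strong"
    if en.p4_on:
        return "weak"
    return "off"


vref1 = reference_voltage(1.0, 0)  # 0.30 V with the assumed taps
vref2 = reference_voltage(1.0, 2)  # 0.40 V with the assumed taps
for vs in (0.45, 0.35, 0.25):
    print(vs, wlud_level(controller(vs, vref1, vref2)))
```

With Vref2 set above Vref1, as assumed here, a die with a moderately low Vsensor receives only the weak setting, while a die with a very low Vsensor turns on both devices and receives the strongest WLUD.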
Fig. 6. Simulation waveforms of WLUD in wake-up condition.

Fig. 6 shows simulated waveforms of the ADWLUD system
at 5 GHz with a 1 V supply voltage at 95 °C. The diagram shows the timing of the WL-driver wake-up signal (wl_slpen) and
WLVCC with different WLUD settings. The wake-up signal arrives one cycle before WL is asserted, and WLVCC is restored
to the voltage level set by the WLUD strength setting, ranging
from 0% to 20% below Vcc.
Simulations were performed to assess the effect of systematic
and random variation on the WLUD circuit itself. At a given
temperature, the variation in WLVCC across process corners is on the order of 6 mV, and the
random variation component is 10 mV. These variation components are not significant enough to impact the efficacy of ADWLUD, which is confirmed by the silicon measurements presented in the next section. Even considering variation in the
WLUD circuitry and comparator, significant improvement in
overall Vccmin is observed.
An analysis of area is critical, since silicon area can be used
to increase bitcell size or to implement alternative assist circuits
in pursuit of improved active Vccmin. In this work, since the
sensor and controller are shared across all SRAM array macros
on the die, the area overhead is 0.02% for a 3.4 Mb core
(Fig. 13). Power overhead is also a critical factor in pursuing
lower active Vccmin, as the purpose of lower active Vccmin is
typically to achieve lower system active power. Power dissipation of the ADWLUD system is dominated by the bitcell-based
Vtn-Vtp sensor; the power dissipated by the reference voltage
generator and comparators is negligible in comparison. Based on simulations, the power overhead of the sensor-controller module
during active mode for a 3.4 Mb, 64-bit IO SRAM array is
2%. Enabling power gating on the sensor
module allows further power reduction.

Fig. 7. The ability of the sensor to track process corners (a) and temperature shifts (b) is shown. On the x-axis, the difference between read and write Vccmin obtained from Si measurements is plotted. This value is well correlated with the sensor output voltage Vsensor. Vsensor (measured at nominal voltage for all dice) can be used to determine whether a die is read or write limited by comparing Vsensor to the reference voltage.

Fig. 8. Vccmin is measured on a large sample of dice ( 300) with different reference voltage settings for Vref and Vref_s during silicon characterization. The setting sensor_mode_ref0 gives the best Vccmin and is therefore chosen as the optimal setting. For several settings, Vccmin is within 20–40 mV of the optimal Vccmin, showing that Vccmin is not very sensitive to the choice of Vref.
IV. MEASUREMENT RESULTS
The ability of the sensor to track process corners and temperature shifts is shown in Fig. 7. The y axis shows the Vsensor voltage
(measured at the nominal voltage) for each die and the x axis shows
the difference between read and write Vccmin for that die. The
dice with read − write Vccmin > 0 are read limited, while those
with read − write Vccmin < 0 are write limited. Data measured
at −10 °C is shown in Fig. 7(a). A clear correlation exists between Vsensor and read − write Vccmin, with a slope of 0.69
V/V, so Vsensor is a good indicator of whether each die is read
or write limited. The controller module described in Section III
identifies read- or write-limited dice based on Vsensor and
Vref.
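The 0.69 V/V figure is a slope of Vsensor against read − write Vccmin, and a least-squares line fit is one way to obtain such a number. In the sketch below the per-die data are synthetic, generated around an assumed slope of −0.69 V/V with 10 mV of noise. The negative sign is an inference from the quadrant description that follows, in which write-limited dice sit above Vref; it is not stated explicitly in the text.

```python
import numpy as np

# Synthetic per-die data (illustration only):
#   delta    = read Vccmin - write Vccmin, in volts (> 0 means read limited)
#   v_sensor = sensor output at nominal voltage, in volts
rng = np.random.default_rng(seed=7)
delta = rng.uniform(-0.15, 0.15, size=200)
v_sensor = 0.40 - 0.69 * delta + rng.normal(0.0, 0.010, size=delta.size)

# First-order least-squares fit: v_sensor ~ slope * delta + intercept.
slope, intercept = np.polyfit(delta, v_sensor, 1)
r = np.corrcoef(delta, v_sensor)[0, 1]
print(f"slope {slope:+.2f} V/V, intercept {intercept:.3f} V, r = {r:.3f}")
```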
The controller determines whether a die is read or write limited
based on whether Vsensor is above or below Vref (Fig. 7). All
dice above Vref are identified as write limited and WLUD is not
applied. All dice below the Vref line are deemed read limited and
WLUD is applied. For the dice in the upper left quadrant (above
Vref and read − write Vccmin < 0) and the lower right quadrant (below Vref and read − write Vccmin > 0), the controller correctly
categorizes the dice as write limited and read limited,
respectively. For the dice in the upper right quadrant (above Vref
and read − write Vccmin > 0), the controller incorrectly determines them to be write limited. The controller does not apply
WLUD to these dice even though it would benefit their active Vccmin. The dice in the lower left quadrant
(below Vref and read − write Vccmin < 0) are also incorrectly
categorized, as read limited. The controller applies WLUD to
these dice, increasing their Vccmin. The sensor-based
controller makes correct decisions for 70% of the dice. The
incorrect decisions are of two varieties: one where the
controller wrongly applies WLUD to a write-limited die (false
positive) and one where the controller does not apply WLUD to a
read-limited die (false negative). While false positives are detrimental to overall Vccmin, false negatives are less so, as they do
not make the die any worse. The overall rate of false positives in
this set is less than 5% of the dice. False positives and false negatives occur for dice that are very close to being read-write balanced, and these dice do not typically constrain the Vccmin distribution. By adding a second reference voltage, Vref_s, strong
WLUD can be turned on for dice that are strongly read limited as
determined by Vsensor.
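The quadrant bookkeeping amounts to comparing the controller's decision (Vsensor < Vref) with the sign of read − write Vccmin. The sketch below uses eight hypothetical dice and an assumed Vref of 0.40 V, not the measured population, and it counts a die with read − write Vccmin of exactly zero as write limited, which is an arbitrary tie-break.

```python
import numpy as np


def classify(delta, v_sensor, vref):
    """Score sensor decisions against each die's true limiter.

    delta    : read Vccmin - write Vccmin per die (> 0 means read limited;
               a value of exactly 0 is counted as write limited here)
    v_sensor : sensor output per die
    The controller applies WLUD when v_sensor < vref (die judged read limited).
    """
    apply_wlud = np.asarray(v_sensor) < vref
    read_limited = np.asarray(delta) > 0
    return {
        "correct": np.mean(apply_wlud == read_limited),
        # WLUD applied to a write-limited die: raises that die's Vccmin.
        "false_positive": np.mean(apply_wlud & ~read_limited),
        # WLUD withheld from a read-limited die: only a missed improvement.
        "false_negative": np.mean(~apply_wlud & read_limited),
    }


# Eight hypothetical dice (volts); Vref = 0.40 V is also an assumption.
delta = [-0.10, -0.05, 0.02, 0.08, 0.12, -0.01, 0.01, 0.06]
v_sensor = [0.50, 0.46, 0.43, 0.36, 0.33, 0.39, 0.42, 0.37]
print(classify(delta, v_sensor, vref=0.40))
```

The three rates always sum to one, so tracking false positives separately, as the text does, isolates the only error class that can raise a die's Vccmin.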
The same die data at 95 °C are plotted in Fig. 7(b).
Again, the y axis shows the Vsensor voltage for each die and the x
axis shows the difference between read and write Vccmin for
that die. Most of the dice that were write limited at −10 °C are
read limited at 95 °C. Vsensor tracks this shift and, based on
Vref, the controller correctly decides to apply WLUD to most of
these dice. Nearly 30% more dice are found to be read limited
at 95 °C, and the Vsensor shift triggers application of WLUD to
these dice.
In this work, silicon characterization is used to empirically
determine the optimal Vref settings. Data was collected for different settings across a large sample of dice ( 300 dice), and the
optimal Vref was chosen to maximize the Vccmin benefit. Vccmin is not very sensitive to Vref: over a wide region of settings,
changing Vref does not alter the active Vccmin significantly (see Fig. 8).
Characterizing the Vref selection space is a one-time cost during
process characterization. Post-characterization, the Vref values are fixed for a particular process
flow and all dice are programmed with the optimal Vref setting.
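The one-time Vref characterization can be sketched as a sweep. For each candidate reference, the sketch decides per die whether WLUD is applied and then scores the population by a high percentile of the resulting Vccmin. The function names, the six-die data set and the choice of the 90th percentile as the score are assumptions for illustration; the paper states only that Vref was chosen to maximize the Vccmin benefit.

```python
import numpy as np


def population_vccmin(vref, v_sensor, vccmin_off, vccmin_on, pct=90):
    """Percentile Vccmin of the population when WLUD is applied only where
    v_sensor < vref; vccmin_on/off are each die's Vccmin with/without WLUD."""
    per_die = np.where(np.asarray(v_sensor) < vref, vccmin_on, vccmin_off)
    return np.percentile(per_die, pct)


def pick_vref(candidates, v_sensor, vccmin_off, vccmin_on):
    """Return the candidate reference with the lowest population Vccmin."""
    scores = [population_vccmin(v, v_sensor, vccmin_off, vccmin_on) for v in candidates]
    return candidates[int(np.argmin(scores))], scores


# Six hypothetical dice (volts): WLUD helps low-Vsensor dice, hurts high-Vsensor ones.
v_sensor = [0.30, 0.35, 0.38, 0.42, 0.45, 0.50]
vccmin_off = [0.80, 0.78, 0.72, 0.70, 0.68, 0.66]
vccmin_on = [0.68, 0.69, 0.70, 0.74, 0.80, 0.84]
candidates = [0.33, 0.37, 0.40, 0.47]
best, scores = pick_vref(candidates, v_sensor, vccmin_off, vccmin_on)
print(best, [round(float(s), 3) for s in scores])
```

A reference set too low leaves read-limited dice unassisted, and one set too high applies WLUD to write-limited dice; the sweep simply keeps the setting between those two failure modes.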


Fig. 9. The shift in measured Vsensor value when temperature changes from −10 °C to 95 °C is shown. For the dice of interest, the median shift is 80 mV with a 1σ variation of 6 mV.

The temperature sensitivity of the sensor is shown in Fig. 9.
The dice in the boxed region are determined to be write limited
by the sensor at −10 °C. The Vsensor shift at 95 °C is particularly important for these dice, as they become read limited
at 95 °C. If the sensor voltage does not change enough to track
this shift, the controller will make additional incorrect decisions
at 95 °C. In the region of interest, the Vsensor shift
between −10 °C and 95 °C is 80 mV on average, with a 1σ value
of 6 mV. An 80 mV shift in Vsensor and a μ/σ ratio of about 13 (80 mV/6 mV)
indicate that the signal is sensitive to temperature changes and
robust to variations.
The advantages of sensor-based WLUD over
conventional fixed WLUD are shown graphically in Fig. 10. The
plot in Fig. 10(a) shows the difference between read and write
Vccmin on the x axis and Vccmin on the y axis for a collection of dice. The dice with read − write Vccmin < 0 are write
limited, while the dice with read − write Vccmin > 0 are read limited. The maximum Vccmin for this distribution is set by a read-limited
die. Next, the same dice are shown with a strong fixed
WLUD turned on in Fig. 10(b). Read-limited dice
improve while write-limited dice degrade with fixed WLUD.
The maximum Vccmin for this population is worse than without
WLUD, and the overall distribution is limited by a write-limited
die. In the sensor-controlled mode (Fig. 10(c)), the read-limited
dice clearly improve and the write-limited dice do not degrade.
This is the key advantage of the ADWLUD technique.
Fig. 10. The distribution of read-write Vccmin for dice measured on a wafer with no WLUD (a). Applying fixed strong WLUD (b) improves read-limited dice while degrading the write-limited dice significantly. In the sensor-controlled mode (ADWLUD), read-limited dice improve but the write-limited dice do not degrade significantly (c).

The measured Vccmin distributions with no WLUD, conventional weak WLUD, conventional strong WLUD and
ADWLUD are shown in Fig. 11. Applying weak, fixed WLUD
improves the overall distribution compared to conventional
SRAM without WLUD. Even though weak, fixed WLUD
degrades write-limited dice, the benefits to read-limited dice
outweigh that degradation. However,
applying a strong, fixed WLUD introduces a sharp tail in the
distribution, with 20% of the dice failing due to degradation of
write-limited dice. In this case, the degradation of write-limited
dice with strong WLUD dominates any gains obtained on read-limited
dice. In the ADWLUD mode, the Vccmin improvement
of strong, fixed WLUD up to the 80%-tile is preserved,
and the tail of the distribution is not degraded as in the strong,
fixed WLUD case, because the ADWLUD assist scheme does not
apply WLUD to the write-limited dice that do not require it.
ADWLUD enables a 130 mV Vccmin improvement at the
90%-tile of the Vccmin distribution compared to the reference
distribution. The ADWLUD system not only improves the
90%-tile Vccmin but also tightens the distribution by 42%
(measured as the spread between the 10% and 90% distribution points),
from 190 mV to 110 mV.
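The 42% tightening follows directly from the reported spreads: (190 − 110)/190 ≈ 0.42. The helpers below compute a 10%–90% spread from raw samples using NumPy's default linear-interpolation percentiles, and the fractional reduction between two spreads; the function names are illustrative, and the uniform 0..100 sample is only a sanity check, not measured data.

```python
import numpy as np


def spread_10_90(samples):
    """Width between the 10th and 90th percentiles (NumPy linear interpolation)."""
    p10, p90 = np.percentile(samples, [10, 90])
    return p90 - p10


def tightening(spread_before, spread_after):
    """Fractional reduction of the 10%-90% spread."""
    return 1.0 - spread_after / spread_before


print(spread_10_90(np.arange(101)))       # uniform 0..100: spread of about 80
print(f"{tightening(190.0, 110.0):.0%}")  # reported spreads in mV -> 42%
```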
The improvement in die yield at the target operating frequency is shown in Fig. 12. The plot shows Vccmin
improvement as a function of measured frequency yield, where
frequency yield is defined as the percentage of dice meeting the
frequency target. The x-axis intercept shows that 87% of dice
meet the target voltage/frequency point with no WLUD. When
weak, fixed WLUD is turned on, it lowers Vccmin and increases
frequency yield because many of the failing dice on this wafer are read limited. However, turning on fixed, strong WLUD
degrades frequency yield and Vccmin because strong WLUD
degrades write-limited dice, reducing the number of working
dice at the target Vcc and, in turn, the percentage of dice
that meet the frequency target. ADWLUD increases the frequency
yield by 9% while lowering Vccmin by 130 mV. The number of
yielding dice at a given target frequency improves because ADWLUD prevents application of WLUD to write-limited dice and
therefore reduces the number of dice not meeting the Vccmin target.
However, it must be noted that the frequency of a given die degrades
when WLUD is applied. This is a feature of the
WLUD technique itself and is not unique to ADWLUD.

Fig. 11. Cumulative Vccmin distribution measured on Si with various magnitudes of WLUD. In the sensor-controlled mode, the benefits of strong fixed WLUD are retained without the tail degradation. An improvement of 130 mV is obtained at the 90%-tile for ADWLUD compared to no WLUD.

Fig. 12. Comparison of Vccmin change and improvement in yield at target frequency for various styles of WLUD.

Fig. 13. Die photo of the testchip showing the 3.4 Mb macro that is controlled by the sensor.

V. CONCLUSIONS

An adaptive dynamic word-line under-drive (ADWLUD)
scheme with an integrated SRAM bitcell-based sensor has been
developed and successfully implemented in a 32 nm high-k
metal gate process. The sensor tracks process corners and
temperature shifts, allowing WLUD to be adjusted dynamically.
The area overhead of the ADWLUD circuit is 0.02% for a 3.4 Mb
SRAM macro. Si results show a 130 mV Vccmin improvement
with ADWLUD compared to no WLUD at the 90%-tile of
the Vccmin distribution. The frequency yield (defined as the
percentage of dice working at the target frequency at the desired Vcc)
increases by 9% in the ADWLUD sensor-controlled mode compared to conventional SRAM. The improvements in Vccmin
and yield at target frequency are achieved with minimal area
and power overhead.

ACKNOWLEDGMENT
The authors gratefully acknowledge members of the Advanced Design team and Portland Technology Development
technical staff for their contributions to this work.

REFERENCES

[1] S. Rusu, S. Tam, H. Muljono, J. Stinson, D. Ayers, R. V. J. Chang, M. Ratta, and S. Kottapalli, “A 45 nm 8-core enterprise Xeon processor,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2009.
[2] D. Wendel, R. Kalla, R. Cargoni et al., “The implementation of POWER7: A highly parallel and scalable multi-core high-end server processor,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2010, pp. 102–103.
[3] A. Bhavnagarwala, B. Austin, K. Bowman, and J. Meindl, “A minimum total power methodology for projecting limits on CMOS GSI,” IEEE Trans. VLSI Syst., vol. 8, no. 3, pp. 235–251, Jun. 2000.
[4] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, “Theoretical and practical limits of dynamic voltage scaling,” in Proc. DAC, 2004, pp. 868–873.
[5] Y. Wang, U. Bhattacharya, F. Hamzaoglu et al., “A 4.0 GHz 291 Mb voltage-scalable SRAM design in a 32 nm high-k+ metal-gate CMOS technology with integrated power management,” IEEE J. Solid-State Circuits, vol. 45, pp. 103–110, 2010.
[6] A. Bhavnagarwala, X. Tang, and J. Meindl, “The impact of intrinsic device fluctuations on CMOS SRAM cell stability,” IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658–665, Apr. 2001.
[7] B. Calhoun and A. Chandrakasan, “Static noise margin variation for subthreshold SRAM in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 41, no. 7, pp. 1673–1679, Jul. 2006.
[8] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, “Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscale CMOS,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 24, no. 12, pp. 1859–1880, Dec. 2005.
[9] M. Khellah, N. S. Kim, Y. Ye, D. Somasekhar, T. Karnik, N. Borkar, G. Pandya, F. Hamzaoglu, T. Coan, Y. Wang, K. Zhang, C. Webb, and V. De, “Process, temperature, and supply-noise tolerant 45 nm dense cache arrays with Diffusion-Notch-Free (DNF) 6T SRAM cells and dynamic multi-Vcc circuits,” IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1199–1208, Apr. 2009.
[10] S. Ohbayashi, M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Imaoka, Y. Oda, T. Yoshihara, M. Igarashi, M. Takeuchi, H. Kawashima, Y. Yamaguchi, K. Tsukamoto, M. Inuishi, H. Makino, K. Ishibashi, and H. Shinohara, “A 65-nm SoC embedded 6T-SRAM designed for manufacturability with read and write operation stabilizing circuits,” IEEE J. Solid-State Circuits, vol. 42, no. 4, pp. 820–829, 2007.
[11] O. Hirabayashi, A. Kawasumi, A. Suzuki, Y. Takeyama, K. Kushida, T. Sasaki, A. Katayama, G. Fukano, Y. Fujimura, T. Nakazato, Y. Shizuki, N. Kushiyama, and T. Yabe, “A process-variation-tolerant dual-power-supply SRAM with 0.179 μm² cell in 40 nm CMOS using level-programmable wordline driver,” in IEEE Int. Solid-State Circuits Conf. (ISSCC 2009) Dig. Tech. Papers, 2009, pp. 458–459, 459a.

KOLAR et al.: A 32 nm HIGH-k METAL GATE SRAM WITH ADAPTIVE DYNAMIC STABILITY ENHANCEMENT FOR LOW-VOLTAGE OPERATION

[12] D. P. Wang, H. J. Liao, H. Yamauchi, Y. H. Chen, Y. L. Lin, S. H. Lin,
D. C. Liu, H. C. Chang, and W. Hwang, "A 45 nm dual-port SRAM
with write and read capability enhancement at low voltage," in Proc.
IEEE Int. SOC Conf., 2007, pp. 211–214.
[13] B. Mohammad, M. Saint-Laurent, P. Bassett, and J. Abraham, "Cache
design for low power and high yield," in Proc. 9th Int. Symp. Quality
Electronic Design (ISQED 2008), 2008, pp. 103–107.
[14] K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, Y. Oda, K. Usui, T.
Kawamura, N. Tsuboi, T. Iwasaki, K. Hashimoto, H. Makino, and H.
Shinohara, "A 45-nm single-port and dual-port SRAM family with robust
read/write stabilizing circuitry under DVFS environment," in IEEE
Symp. VLSI Circuits Dig., 2008, pp. 212–213.
[15] K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H.
Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto,
A. Sebe, S. Okazaki, K. Satomi, H. Akamatsu, and H. Shinohara, "A
45-nm bulk CMOS embedded SRAM with improved immunity against
process and temperature variations," IEEE J. Solid-State Circuits, vol.
43, no. 1, pp. 180–191, Jan. 2008.
[16] A. Carlson, Z. Guo, L.-T. Pang, T.-J. King Liu, and B. Nikolic,
"Compensation of systematic variations through optimal biasing of
SRAM wordlines," in Proc. IEEE Custom Integrated Circuits Conf.
2008 (CICC 2008), Sep. 21–24, 2008, pp. 411–414.
[17] H. Pilo, C. Barwin, G. Braceras, C. Browning, S. Lamphier, and F.
Towler, "An SRAM design in 65-nm technology node featuring read
and write-assist circuits to expand operating voltage," IEEE J. Solid-State
Circuits, vol. 42, no. 4, pp. 813–819, Apr. 2007.
[18] M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S.
Shimada, K. Yanagisawa, and T. Kawahara, "Low-power embedded
SRAM modules with expanded margins for writing," in IEEE Int.
Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2005, pp.
480–611.
[19] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N.
Vallepalli, Y. Wang, B. Zheng, and M. Bohr, "A 3-GHz 70-Mb SRAM
in 65-nm CMOS technology with integrated column-based dynamic
power supply," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 146–151,
2006.
[20] N. Shibata, H. Kiya, S. Kurita, H. Okamoto, M. Tanno, and T.
Douseki, "A 0.5-V 25-MHz 1-mW 256-kb MTCMOS/SOI SRAM for
solar-power-operated portable personal digital equipment: Sure write
operation by using step-down negatively overdriven bitline scheme,"
IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 728–742, 2006.
[21] Y. H. Chen, W. M. Chan, S. Y. Chou, H. J. Liao, H. Y. Pan, J. J. Wu,
C. H. Lee, S. M. Yang, Y. C. Liu, and H. Yamauchi, "A 0.6 V 45 nm
adaptive dual-rail SRAM compiler circuit design for lower VDDmin
VLSIs," in IEEE Symp. VLSI Circuits Dig., 2008, pp. 210–211.
[22] M. Yamaoka, K. Osada, and K. Ishibashi, "0.4-V logic-library-friendly
SRAM array using rectangular-diffusion cell and delta-boosted-array
voltage scheme," IEEE J. Solid-State Circuits, vol. 39, no. 6, pp.
934–940, 2004.
[23] H. Pilo, J. Barwin, G. Braceras, C. Browning, S. Burns, J. Gabric, S.
Lamphier, M. Miller, A. Roberts, and F. Towler, "An SRAM design in
65 nm and 45 nm technology nodes featuring read and write-assist
circuits to expand operating voltage," in IEEE Symp. VLSI Circuits Dig.,
2006, pp. 15–16.
[24] Y. Fujimura et al., "A configurable SRAM with constant-negative-level
write buffer for low-voltage operation with 0.149 µm² cell in 32 nm
high-k metal-gate CMOS," in IEEE ISSCC Dig. Tech. Papers, Feb.
2010, vol. 53, pp. 348–349.
[25] F. Hamzaoglu, K. Zhang, Y. Wang, H. J. Ann, U. Bhattacharya, Z.
Chen, Y.-G. Ng, A. Pavlov, K. Smits, and M. Bohr, "A 153 Mb-SRAM
design with dynamic stability enhancement and leakage reduction in 45
nm high-K metal-gate CMOS technology," in IEEE ISSCC Dig. Tech.
Papers, Feb. 3–7, 2008, pp. 376–621.
[26] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N.
Vallepalli, Y. Wang, B. Zheng, and M. Bohr, "SRAM design on 65-nm
CMOS technology with dynamic sleep transistor for leakage reduction,"
IEEE J. Solid-State Circuits, pp. 895–901, Apr. 2005.
[27] K. Zhang, U. Bhattacharya, L. Ma, Y. Ng, B. Zheng, M. Bohr, and
S. Thompson, "A fully synchronized, pipelined, and re-configurable
50 Mb SRAM on 90 nm CMOS technology for logic applications," in
Symp. VLSI Circuits Dig., Jun. 2003, pp. 253–254.
[28] A. T. Krishnan, V. Reddy, D. Aldrich, J. Raval, K. Christensen, J.
Rosal, C. O'Brien, R. Khamankar, A. Marshall, W.-K. Loh, R. McKee,
and S. Krishnan, "SRAM cell static noise margin and VMIN sensitivity
to transistor degradation," in IEDM '06, Dec. 11–13, 2006, pp. 1–4.
[29] S. Pae, J. Maiz, C. Prasad, and B. Woolery, "Effect of BTI degradation
on transistor variability in advanced semiconductor technologies," IEEE
Trans. Device and Materials Reliability, vol. 8, no. 3, pp.
519–525, Sep. 2008.


Pramod Kolar (S'01–M'04) received the B.E. degree in electronics and communication engineering
from the National Institute of Technology, Surathkal,
India, in 1998 and the M.S. and Ph.D. degrees in electrical engineering from Duke University, Durham,
NC, in 2002 and 2005, respectively.
He has been with Advanced Design, Logic Technology Development, Intel Corporation, since 2005,
where he works on SRAM bitcell development,
statistical circuit design, and yield analysis. He has
published 12 papers in international conferences
and technical journals and holds two U.S. patents. He was a graduate intern at
Qualcomm in 2004.
Dr. Kolar received the Inventor Recognition Award from Semiconductor Research Corporation in 2005.

Eric Karl (S'03–M'08) received the B.S.E., M.S.E., and
Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, in 2002, 2004, and
2008, respectively. He was an undergraduate intern in
1999 and 2000 at the Electrical Center, General Motors Corporation, and in 2001 with the Sun Microsystems
SPARC microprocessor development teams. In 2004,
he was a graduate intern at the IBM T. J. Watson Research Center, and in 2005, he was a graduate intern
at the Circuits Research Lab, Intel Corporation.
After finishing the Ph.D. in March 2008, he joined
Logic Technology Development, Intel Corporation, as a Senior Design Engineer. Since then, he has been working on low-power, high-performance SRAM
cache design and technology development for CPU and SoC applications. He is
the author or coauthor of 12 technical papers.
Dr. Karl has reviewed papers for conferences and journals including the International Symposium on Computer Architecture, Solid-State Electronics, the ACM/IEEE Design Automation Conference, and IEEE TRANSACTIONS
ON ELECTRON DEVICES.

Uddalak Bhattacharya (S'93–M'97) received the
B.Tech. degree in electronics and electrical communication engineering from the Indian Institute of
Technology, Kharagpur, India, in 1991, and the M.S.
and Ph.D. degrees in electrical engineering from the
University of California at Santa Barbara in 1993
and 1996, respectively.
He is a principal engineer in the Logic Technology
Development Organization at Intel Corporation. His
interests are in high-speed digital circuit design, test,
measurement, and yield analysis.

Fatih Hamzaoglu (S'97–M'02) received the B.Sc.
degree from the Middle East Technical University,
Ankara, Turkey, in 1996, the M.S. degree from
Clemson University, Clemson, SC, in 1998, and
the Ph.D. degree from the University of Virginia,
Charlottesville, in 2002, all in electrical engineering.
After finishing the Ph.D. in September 2002, he
joined Portland Technology Development, Intel Corporation, as a Senior Design Engineer. Since then,
he has been working on low-power high-performance
memory design, including SRAM, DRAM, and nonvolatile memories. He is the author or coauthor of more than 20 papers and inventor or co-inventor of six patents.
Dr. Hamzaoglu has reviewed papers for IEEE TRANSACTIONS
ON CIRCUITS AND SYSTEMS, IEEE TRANSACTIONS ON VLSI, the ACM Design
Automation Conference, and the IEEE International Symposium on Circuits and
Systems.


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 46, NO. 1, JANUARY 2011

Henry (Hyunwoo) Nho received the B.S. degree in electrical engineering from the Korea Advanced Institute of
Science and Technology, Daejeon, Korea, in 2003,
and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in
2005 and 2008, respectively.
After finishing the Ph.D. in June 2008, he joined the
Advanced Design Group, Logic Technology Development, Intel Corporation, where he worked on the
lead vehicle design for future process technology development and on low-power high-performance SRAM
designs for CPU and mobile applications. In January 2010, he joined the Mobile
Platform Architecture Group, LG Electronics, as a Senior Research Engineer.
Since then, he has been working on architecture for mobile application processors, optimization of system architecture for mobile devices, and adoption of
innovative technologies into mobile devices.

Yong-Gee Ng was born in Singapore. He received
the bachelor's and master's degrees in electrical engineering from Arizona State University in 1989 and
1991, respectively.
He joined Intel Corporation in September 1991 and
is currently a senior design engineer in the Portland Technology Development group. His interests are in low-power high-performance SRAM cache design.

Yih Wang (M'02) received the B.S.E.E. degree from
National Tsing Hua University, Hsinchu, Taiwan, and
the M.S. and Ph.D. degrees in electrical and computer
engineering from the University of Florida, Gainesville,
in 1998 and 2000, respectively.
From 1996 to 2001, he worked at the Florida Solid-State Electronics Lab as a Research Assistant and was
the Pittman-Eminent Scholar Predoctoral and Postdoctoral Fellow. He has been with the Advanced Design
Group, Logic Technology Development, Intel Corporation, since 2001, where he has been engaged in the
development of low-power and high-performance SRAM designs for CPU and
SoC applications.

Kevin Zhang (SM'07) received the B.S. and Ph.D.
degrees in electrical engineering from Tsinghua
University, Beijing, China, and Duke University,
Durham, NC, respectively.
He is an Intel Fellow and Director of Advanced
Design at Logic Technology Development, Intel,
where he is responsible for developing advanced
design collaterals, including design rules, digital
circuit library, analog/mixed-signal/RF circuits,
and embedded memories for Intel's future process
development and product applications. Prior to this
role, he led embedded memory technology development from the 90 nm to the 32 nm
node at Intel. Dr. Zhang has published over 40 papers at international conferences
and in technical journals. He holds over 45 U.S. patents in the area of integrated
circuit technology. Currently, he chairs the Memory Subcommittee of the ISSCC
Technical Program Committee.

A Dense 45nm Half-differential SRAM with Lower Minimum Operating Voltage
Gregory Chen, Michael Wieckowski, Daeyeon Kim, David Blaauw, Dennis Sylvester
University of Michigan, Ann Arbor, Michigan
{grgkchen, wieckows, daeyeonk, blaauw, dmcs}@umich.edu
Abstract- We present a 45nm half-differential 6T SRAM (HD-SRAM) with differential write and single-ended read, enabling
asymmetric sizing and VTH selection. The HD-SRAM bitcell uses
SRAM physical design rules to achieve the same area as a
commercial differential 6T SRAM (D-SRAM). We record
measurements from eighty 32kb SRAM arrays. HD-SRAM has 18%
lower access energy and 14% lower leakage than D-SRAM. It also has a
72mV-lower VMIN, demonstrating higher stability.

I. INTRODUCTION

Process variations such as random dopant fluctuation and
line edge roughness degrade SRAM operating margins [1].
Since designs commonly contain large SRAMs, each bitcell must
be extremely robust to achieve high chip yield. In differential
6T SRAM (D-SRAM), read stability is improved by making
the pull-down device (PD) strong relative to the pass-gate
device (PG). This reduces the probability of read upsets, or
destructive read failures, when bitcells are subject to process
variation. Write stability is improved by making PG strong
relative to the pull-up device (PU). SRAM designs are
commonly read-stability limited.
Many SRAM designs achieve higher read stability by
increasing PD width and PG length. In addition, PG typically
has a higher threshold voltage (VTH) than PD. However,
larger device dimensions increase bitcell area,
and increasing PG length and VTH reduces performance and
degrades write margins. These two-sided read and write criteria
create an upper bound on the overall stability margin. 8T
bitcells separate the read and write circuitry to increase stability at
the expense of area and leakage [2]. The proposed half-differential 6T SRAM (HD-SRAM) improves voltage
scalability and operating margin with no increase in bitcell
size or leakage.
VDD       HD-SRAM   D-SRAM
800mV     0         0
750mV     0         1
700mV     0         1
650mV     0         5
600mV     1         10
550mV     19        115

Figure 1. HD-SRAM operates with differential write and single-ended read,
enabling asymmetric sizing and VTH selection for higher robustness.

II. HALF-DIFFERENTIAL SRAM METHOD

A. Operation, Sizing and VTH Selection
HD-SRAM performs a differential write access in the
same manner as D-SRAM, but reads the bitcell from only one
side (Fig. 1). This enables asymmetric sizing and VTH-selection optimizations that improve stability margins. During a
write operation, both wordlines (WLs) are asserted, both PGs
turn on, and the differential value on the bitlines (BLs)
overwrites the cell value. During a read operation, only the read-and-write WL (WLRW) is asserted, turning on the read-and-write PG (PGRW). This device selectively discharges its
associated BL (BLRW) based on the bitcell's stored value.

978-1-4244-9474-3/11/$26.00 ©2011 IEEE

Figure 2. HD-SRAM is the same size as a commercial differential 6T design
(D-SRAM). Both designs exceed logic design rules for higher density.

Asymmetric sizing and VTH-selection optimizations
increase bitcell stability without increasing area or energy. Because
the read is single-ended, the write-only pull-down device (PDW)
does not strongly impact read stability, so we reduce its width
to the minimum size. This significantly reduces bitcell area,
because in D-SRAM the PDs are made large to enhance read
stability. We apply the resulting area savings to increase the
read-and-write pull-down (PDRW) width and pass-gate (PGRW) length,
improving read margin. Since the length of PGRW is increased,
we can also increase the lengths of PDW and the write-only pull-up
device (PUW) with no area penalty. This increases write-one margin
and also improves read stability by decreasing the positive
feedback between the cross-coupled inverters.
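The sizing argument above rests on the strength of PD relative to PG. A common first-order proxy for this (not a metric reported in this paper) is the cell ratio, (W/L)_PD / (W/L)_PG. The sketch below computes it for hypothetical dimensions to show why a wider PDRW and a longer PGRW both raise it; the helper names and all dimensions are illustrative assumptions, and the proxy ignores VTH, mobility, and the feedback effects described above.

```python
# First-order cell-ratio proxy for read stability.
# All dimensions are hypothetical, not the paper's layout values.

def aspect(width_nm: float, length_nm: float) -> float:
    """W/L aspect ratio of a MOSFET."""
    return width_nm / length_nm

def cell_ratio(pd_w: float, pd_l: float, pg_w: float, pg_l: float) -> float:
    """CR = (W/L)_PD / (W/L)_PG. A larger CR means a stronger pull-down
    relative to the pass gate, i.e. a smaller read-disturb bump on the '0' node."""
    return aspect(pd_w, pd_l) / aspect(pg_w, pg_l)

# Hypothetical symmetric D-SRAM-like sizing (nm)
cr_sym = cell_ratio(pd_w=150, pd_l=45, pg_w=100, pg_l=45)
# Hypothetical HD-SRAM read side: wider PD_RW and longer PG_RW (nm)
cr_rw = cell_ratio(pd_w=180, pd_l=45, pg_w=100, pg_l=55)
print(f"symmetric CR = {cr_sym:.2f}, read-side CR = {cr_rw:.2f}")
```

Widening PD and lengthening PG both move the ratio in the same direction, which is why the area freed by shrinking PDW can be spent on either device.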
In many D-SRAM designs, two NMOS VTHs are available
in the process: a higher VTH for PGs and a lower VTH for PDs.
For each HD-SRAM device, we optimally select from the
existing VTHs to improve stability margins. The higher NMOS
VTH usually reserved for PGs is used for PDW to help prevent
read upsets. Using the low-VTH device for the write-only pass
gate (PGW) would further increase write-one margin, but it
decreased the overall simulated robustness and increased
leakage, so the higher VTH is selected for this device. Previous
asymmetric SRAMs do not provide silicon results and either
decrease robustness, increase bitcell area, and/or do not
consider the physical design of SRAM [3][4][5].
B. Physical Design
The HD-SRAM bitcell has the same area (0.374µm²) as
the commercial D-SRAM bitcell in this 45nm process, allowing
an accurate comparison. The layout violates logic design
rules to achieve higher density, which is typical for
commercial SRAM but uncommon in research efforts [3] (Fig.
2). We implemented the design with feedback from the
foundry regarding design, lithography, and design-for-manufacturing
(DFM) rules. The two WLs are on Metal 4,
grounds are on Metal 3, and the BLs and VDD are on Metal 2.
All polysilicon is linear and unidirectional to enable double
patterning. Unlike most D-SRAM designs, PDW and PGW have the same
width in HD-SRAM, eliminating a notch in the source-drain
region and improving DFM.
C. Simulated Results

Figure 3. HD-SRAM has an 85mV-higher simulated SNM than D-SRAM at
nominal VDD. SNM remains higher as VDD scales below 500mV.
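Fig. 3 compares simulated SNM values, but the extraction procedure is not described here. A widely used method is Seevinck's largest-square construction on the butterfly plot: rotate the two mirrored VTCs by 45 degrees, take the largest gap between them along the diagonal, and divide by √2. The sketch below applies it to a hypothetical logistic VTC (`logistic_vtc` and its `gain` parameter are illustrative assumptions, not a device model). Feeding it two identical unloaded inverter VTCs gives a hold-style SNM; a read SNM would instead use VTCs simulated with the pass gate turned on.

```python
import numpy as np

def logistic_vtc(vin, vdd=1.0, vm=0.5, gain=20.0):
    """Hypothetical inverter VTC: a smooth step from VDD down to 0 around vm."""
    return vdd / (1.0 + np.exp(gain * (vin - vm)))

def butterfly_snm(vtc, vdd=1.0, n=4001):
    """Largest-square SNM of a butterfly plot built from two identical VTCs.

    The plot is rotated by 45 degrees; at each position u along the
    anti-diagonal, the gap between the curves along the diagonal is the
    diagonal of an axis-aligned square. SNM = largest diagonal / sqrt(2),
    taking the smaller of the two lobes.
    """
    s = np.linspace(0.0, vdd, n)
    xa, ya = s, vtc(s)          # curve A: V2 = vtc(V1)
    xb, yb = vtc(s), s          # curve B (mirrored): V1 = vtc(V2)
    r = np.sqrt(2.0)
    ua, va = (xa - ya) / r, (xa + ya) / r
    ub, vb = (xb - yb) / r, (xb + yb) / r
    ub, vb = ub[::-1], vb[::-1]  # make ub increasing for np.interp
    u = np.linspace(max(ua[0], ub[0]), min(ua[-1], ub[-1]), n)
    gap = np.interp(u, ua, va) - np.interp(u, ub, vb)
    upper_left = gap[u < 0].max()       # lobe around (0, VDD)
    lower_right = (-gap[u > 0]).max()   # lobe around (VDD, 0)
    return min(upper_left, lower_right) / r

for g in (10.0, 50.0, 200.0):
    snm = butterfly_snm(lambda v: logistic_vtc(v, gain=g))
    print(f"gain={g:5.0f}  SNM={snm * 1000:.0f} mV (VDD = 1 V)")
```

As the VTC sharpens, the SNM approaches VDD/2, the ideal-inverter limit; any effect that flattens the VTC (lower VDD, read loading) shrinks it.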

HD-SRAM achieves higher robustness than D-SRAM,
even when peripheral assist circuits and optimal technology
selection are applied only to D-SRAM. HD-SRAM has an 85mV-higher
simulated static noise margin (SNM) than D-SRAM at the nominal VDD
of 1.1V (Fig. 3a), and the HD-SRAM SNM remains higher as VDD
scales to below 500mV (Fig. 3b).
Since SRAM is typically read-stability limited at nominal
VDD, one read-assist technique reduces the WL voltage (VWL)
to increase read margin [6]. As a measure of robustness, we
simulate the maximum VTH variation that a typical bitcell can
tolerate without functional failure for read, write, and hold
operations. We simulate the designs in SPICE using
importance sampling and normalize the robustness to a typical
45nm VTH distribution with σ=40mV [7]. As D-SRAM VWL
decreases from 1.1V to 1.02V, read stability and total
robustness increase from 4.2σ to 4.8σ (Fig. 4a). However, as
VWL decreases further, write margin degrades overall
robustness and latency becomes prohibitive. Separate voltages
can be used for write and read, but this requires pre-decoding
and additional complexity. HD-SRAM without read assistance
is more robust than D-SRAM at any VWL, and HD-SRAM
robustness improves further with read assistance.

Figure 4. D-SRAM robustness improves with assist techniques such as WL
voltage selection (a) and technology VTH selection (b). However, neither of
these techniques achieves as high robustness as HD-SRAM.
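Because robustness is normalized to σ = 40mV, each reported value maps directly to a tolerable VTH shift. To give a feel for why 4.8σ versus 6.1σ matters, the sketch below also prints the one-sided Gaussian tail beyond each margin. Treating failure as a single Gaussian parameter exceeding the margin is a simplifying assumption made here for intuition only; real bitcell failure regions involve several correlated devices.

```python
import math

SIGMA_VTH_MV = 40.0  # VTH sigma used in the text for normalization

def robustness_sigma(max_dvth_mv: float) -> float:
    """Express a tolerable VTH shift (mV) in multiples of sigma."""
    return max_dvth_mv / SIGMA_VTH_MV

def one_sided_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z (single-parameter simplification)."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

for label, k in [("D-SRAM, VWL = 1.1 V", 4.2), ("D-SRAM, VWL = 1.02 V", 4.8),
                 ("HD-SRAM, nominal", 6.1), ("HD-SRAM, maximum", 7.0)]:
    print(f"{label:22s} {k:.1f} sigma = {k * SIGMA_VTH_MV:.0f} mV, "
          f"tail ~ {one_sided_tail(k):.1e}")
```

Each additional σ of margin shrinks the tail by roughly an order of magnitude or more in this range, which is why modest-looking gains in σ translate into large yield differences for big arrays.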
The optimal selection of technology parameters, such as
VTH, also improves robustness. In typical SRAM processes,
these parameters are carefully tuned to optimize the design;
however, the nominal VTH selections may trade off robustness
for improved performance. We simulate bitcell robustness in
SPICE using importance sampling for theoretical selections of
technology parameters, with reasonable selections of PD, PG,
and PU VTHs. The maximum D-SRAM robustness of 4.8σ is
achieved by reducing PD VTH and increasing PG VTH (Fig.
4b). This robustness is lower than both the nominal and the
maximum HD-SRAM robustness of 6.1σ and 7.0σ, respectively.
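Importance sampling makes rare-failure estimates tractable: plain Monte Carlo would need on the order of 10^8 samples to see a handful of 4.8σ events. The text does not say which importance-sampling variant the authors used; the sketch below shows the common mean-shift variant on a one-dimensional standard normal, standing in for a full SPICE-based flow. Samples are drawn centred on the failure boundary and re-weighted by the likelihood ratio; the function names are illustrative.

```python
import math
import random

def tail_prob_exact(t: float) -> float:
    """P(Z > t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def tail_prob_is(t: float, n: int = 20000, seed: int = 1) -> float:
    """Mean-shift importance sampling estimate of P(Z > t), Z ~ N(0, 1).

    Samples come from N(t, 1), centred on the failure boundary, and each
    failing sample is weighted by the likelihood ratio
    phi(x) / phi(x - t) = exp(-t*x + t*t/2).
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)
        if x > t:  # "failure": normalized VTH shift exceeds the margin
            acc += math.exp(-t * x + 0.5 * t * t)
    return acc / n

t = 4.8  # e.g., a 4.8-sigma margin
print(f"exact={tail_prob_exact(t):.3e}  IS={tail_prob_is(t):.3e}")
```

With 20,000 biased samples, the estimate typically lands within a few percent of the exact tail, whereas the same number of unbiased samples would almost certainly observe zero failures.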
Figure 5. HD-SRAM and D-SRAM arrays use nearly identical peripheral
circuits. A BIST performs march and speed tests on both designs.

III. MEASUREMENT RESULTS

A. Test Chips
We fabricated test chips including 32kb banks of HD-SRAM and commercial D-SRAM in a 45nm CMOS process
with 1.1V nominal VDD (Figs. 5 and 6). Each bank uses
identical address decoders, WL and BL drivers, and sense
amplifiers (SAs). HD-SRAM adds gating logic and an
additional WL driver to support two WLs per row, slightly
decreasing array efficiency. We tie one HD-SRAM SA input
to a reference voltage to accommodate the single-ended read. The
test chips do not include assist circuits, error correction coding
(ECC), or redundancy, any of which could be applied to either
design. A BIST performs functionality and performance tests
on each design. Functionality is assessed by performing march
tests with solid, checkerboard, and stripe test patterns.

                  HD-SRAM      D-SRAM
Process           45nm CMOS    45nm CMOS
Bitcell Area      0.37 µm²     0.37 µm²
VMIN              639 mV       711 mV
Sim. SNM          353 mV       268 mV
Performance       550 MHz      650 MHz
Energy / bit      43 fJ        53 fJ
Leakage / bit     55 pW        64 pW

Figure 6. Chip micrograph and results summary.

B. Performance, Power and Leakage

HD-SRAM is 15% slower than D-SRAM. Performance is
defined as the speed at which every bitcell in the array is
functional. It is read-limited and includes WL, bitcell, and BL
delays for each design (Fig. 7). In a microprocessor, this delay
is amortized over register, interconnect, decoder, sense-amplifier,
and multiplexer delays. HD-SRAM has larger read devices
that exhibit less timing sensitivity to process variation,
which reduces the array latency dictated by the slowest cells.
HD-SRAM has 18%-lower access energy than D-SRAM
(Fig. 8a). The HD-SRAM write energy is slightly higher
because of the higher total WL capacitance caused by
higher routing capacitance. However, the HD-SRAM read
energy is significantly lower, since only WLRW switches and
the capacitance on this WL is lower than the total D-SRAM WL
capacitance. Also, in the read-one case neither BL discharges,
whereas one BL always discharges during a D-SRAM read.
HD-SRAM has 14%-lower leakage power than D-SRAM (Fig. 8b). The leakage improvements result from the
longer gate lengths selected for PGRW, PDW, and PUW. In
addition, PDW has a higher VTH than in D-SRAM. An 8T SRAM
would have significantly higher leakage than D-SRAM
because of the additional devices in the bitcell.

Figure 7. Measured performance results show that the HD-SRAM array is
15% slower than D-SRAM at nominal VDD. Array performance is dictated by
the slowest cells, and HD-SRAM exhibits less timing variation.

Figure 8. HD-SRAM has an 18%-lower measured access energy and
a 14%-lower measured leakage than D-SRAM.
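The percentages quoted above can be cross-checked against the results summary table. The sketch below recomputes each HD-SRAM versus D-SRAM difference from the rounded table entries. The energy entries give about 18.9% rather than exactly 18%, which is plausibly a rounding effect in the table; this is an observation, not a claim about the underlying measurements.

```python
# Values transcribed from the results summary table (HD-SRAM, D-SRAM).
SUMMARY = {
    "Performance (MHz)": (550.0, 650.0),
    "Energy / bit (fJ)": (43.0, 53.0),
    "Leakage / bit (pW)": (55.0, 64.0),
    "VMIN (mV)": (639.0, 711.0),
    "Sim. SNM (mV)": (353.0, 268.0),
}

def pct_change(hd: float, d: float) -> float:
    """Relative difference of HD-SRAM with respect to D-SRAM, in percent."""
    return 100.0 * (hd - d) / d

for name, (hd, d) in SUMMARY.items():
    print(f"{name:20s} HD={hd:6.1f}  D={d:6.1f}  "
          f"diff={hd - d:+6.1f}  ({pct_change(hd, d):+5.1f}%)")
```

Relative to D-SRAM, the 550MHz versus 650MHz entries give a 15.4% slowdown for HD-SRAM, while the absolute VMIN and SNM differences reproduce the 72mV and 85mV figures quoted in the text.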
C. Minimum Operating Voltage
We record the minimum operating voltage (VMIN) for 80
test chips. VMIN is defined as the lowest VDD at which all bitcells in the
SRAM bank are functional. A lower VMIN indicates higher
bitcell stability, since the bitcell is more tolerant to the
decreased noise margins and the greater effect of process
variations at low voltage. Error maps from one chip show that
VMIN is 800mV for D-SRAM, with one bitcell failing below
this voltage. Meanwhile, every HD-SRAM bitcell functions at
650mV, demonstrating a lower VMIN for this array (Fig. 9).
Across all 80 test chips, the average HD-SRAM VMIN is
639mV, whereas the average D-SRAM VMIN is 72 mV higher
at 711mV (Fig. 10). HD-SRAM also exhibits decreased
variation in VMIN among test chips because the devices that
limit stability are larger and less affected by process variations
such as random dopant fluctuation and line edge roughness.
Only 4 HD-SRAM arrays have VMIN above 700mV, whereas
35 D-SRAM arrays fail this criterion. This demonstrates
higher HD-SRAM yield at a given VDD.
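The VMIN definition and the 700mV yield comparison can be made concrete with a short sketch. The per-VDD failure counts used below are illustrative: they coincide with the VDD table printed alongside Fig. 1, which is read here, as an assumption, as counts of failing bitcells. They reproduce the error-map example in the text (D-SRAM VMIN 800mV, HD-SRAM VMIN 650mV). The 700mV yield calculation uses the 4 and 35 out-of-80 array counts stated above and assumes that an array with VMIN above 700mV fails at 700mV.

```python
def vmin(fail_counts):
    """Lowest tested VDD (mV) such that it and every higher tested VDD
    have zero failing bitcells; None if even the highest VDD fails."""
    result = None
    for vdd in sorted(fail_counts, reverse=True):  # sweep VDD downward
        if fail_counts[vdd] > 0:
            break
        result = vdd
    return result

# Illustrative failing-bitcell counts per VDD (mV) for one chip (assumed)
hd_counts = {800: 0, 750: 0, 700: 0, 650: 0, 600: 1, 550: 19}
d_counts = {800: 0, 750: 1, 700: 1, 650: 5, 600: 10, 550: 115}
print("VMIN HD-SRAM:", vmin(hd_counts), "mV; D-SRAM:", vmin(d_counts), "mV")

# Arrays functional at 700mV, assuming VMIN > 700mV means failing at 700mV
N_CHIPS = 80
for name, n_above in (("HD-SRAM", 4), ("D-SRAM", 35)):
    ok = N_CHIPS - n_above
    print(f"{name}: {ok}/{N_CHIPS} arrays ({100.0 * ok / N_CHIPS:.2f}%) at 700mV")
```

On these numbers, 95% of HD-SRAM arrays but only about 56% of D-SRAM arrays would pass at 700mV, which is the yield argument made in the paragraph above.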
Bitcell failure rates are recorded for all 80 test chips. At
nominal VDD, SRAM failures are rare. Therefore, to observe a
significant number of errors, VWL is raised by 50mV to
aggravate read failures. Since these cells are typically read-stability
limited, this emphasizes variation and emulates cells
at the tails of the process variation distributions. In larger
SRAM arrays, these tail bitcells are more likely to
appear. Under this condition, HD-SRAM has a 100× lower
failure rate than D-SRAM at nominal VDD (Fig. 11).

Figure 9. Failure maps show bitcell failure locations as VDD is scaled down.
For this test array VMIN is 650mV for HD-SRAM and 800mV for D-SRAM.

ACKNOWLEDGEMENTS
The authors thank STMicroelectronics for fabrication
and support of this project.
Figure 10. A histogram of measured VMIN for 80 test chips shows that HD-SRAM has a 72mV-lower average VMIN and fewer arrays with high VMIN.

REFERENCES
[1] R. Aitken and S. Idgunji, "Worst-Case Design and Margin for Embedded
SRAM," Design, Automation & Test in Europe Conference, pp. 1-6,
Apr. 2007.
[2] L. Chang et al., "An 8T-SRAM for Variability Tolerance and Low-Voltage
Operation in High-Performance Caches," IEEE Journal of
Solid-State Circuits, vol. 43, no. 4, pp. 956-963, Apr. 2008.
[3] N. Azizi, F. N. Najm, and A. Moshovos, "Low-leakage Asymmetric-cell
SRAM," IEEE Transactions on VLSI, vol. 11, no. 4, pp. 701-715, Aug.
2003.
[4] B. S. Gill, C. Papachristou, and F. G. Wolff, "A New Asymmetric SRAM
Cell to Reduce Soft Errors and Leakage Power in FPGA," Design,
Automation & Test in Europe, pp. 1-6, Apr. 2007.
[5] K. Kim, J.-J. Kim, and C.-T. Chuang, "Asymmetrical SRAM Cells with
Enhanced Read and Write Margins," International Symposium on VLSI
Technology, Systems and Applications, pp. 1-2, 23-25 Apr. 2007.
[6] K. Nii et al., "A 45-nm Bulk CMOS Embedded SRAM with Improved
Immunity Against Process and Temperature Variations," IEEE Journal
of Solid-State Circuits, vol. 43, no. 1, pp. 180-191, Jan. 2008.
[7] G. K. Chen, D. Blaauw, T. Mudge, D. Sylvester, and N. S. Kim, "Yield-driven
Near-threshold SRAM Design," IEEE/ACM International
Conference on Computer-Aided Design, pp. 660-666, 4-8 Nov. 2007.

Figure 11. At nominal VDD, HD-SRAM has a 100×-lower bitcell failure rate
than D-SRAM. Read failures dominate cell stability at nominal VDD and are
aggravated only in this plot to observe a significant number of failures.

A Novel 9T SRAM Design in Sub-Threshold Region
Arun Ramnath Ramani and Ken Choi
Department of Electrical and Computer Engineering
Illinois Institute of Technology
Chicago, Illinois 60616
aramani3@iit.edu, kchoi@ece.iit.edu

Abstract -- With technology scaling, low-power operation
has become one of the key areas of importance in VLSI
design. Lowering the supply voltage is an effective
technique for power reduction, and scaling it into
the sub-threshold region for low-power operation is possible.
Reducing the power of memory circuits with little compromise
in performance is very useful, because memories form a major part of a
digital chip. In this paper, the operation of various SRAM designs
in the sub-threshold region is examined, and the designs that
overcome the challenges of sub-threshold operation are explained. Among the designs chosen
for performance evaluation, the successful ones were those
that performed proper reads and writes. The 7T SRAM
and the 8T SRAM were the best of the selected designs for
write and read, respectively. A new 9T SRAM
design is proposed that combines the advantages of these two and
hence gives better overall performance. For write, the PDP of the
proposed 9T SRAM design is 2.80% less than that of the 7T SRAM,
4.48% less than the 8T SRAM, 5.64% less than the existing 9T SRAM design,
and 8.5% less than the 11T SRAM. For read, its PDP
is 44.8% less than that of the 7T SRAM and 66.18% less
than that of the existing 9T SRAM, and almost the same as that of the 8T design. Although the
PDP of the proposed design is greater than that of the 11T
design, the reduced read static noise margin (RSNM) of the 11T design makes it inferior to the
proposed design when operated in the sub-threshold region.
Keywords: SRAM, Sub-threshold Region, Sub-threshold SRAM
Design, Voltage Scaling, Read Static Noise Margin

I. INTRODUCTION
Technology scaling has made present-day ICs faster and
denser. However, the faster the circuits, the higher the power
consumption, which shortens the battery life of many
portable devices. Power consumption can be reduced
by many techniques: logic optimization,
pipelining and parallelism, voltage scaling, etc. As
technology scales down, leakage power starts to
dominate, and this becomes critically important
with further scaling. Memory circuits such as SRAM
occupy a considerable area in any digital IC. To
maximize power savings, designs that operate in the sub-threshold
region have been proposed [1] [2]. It has been
shown that sub-threshold operation reduces the
operational energy of logic [3], and this
possibility has led to further research on sub-threshold SRAM
designs. At technology nodes as small as 45nm, SRAM designs
intended for above-threshold operation may
also operate successfully in the sub-threshold region,
thereby entering ultra-low-power operation. This paper
examines that possibility: several above-threshold low-power
SRAM designs are pushed into the sub-threshold
region and compared with respect to
speed, power consumption, and average power-delay
product (PDP). The ability of a cell to write properly and to maintain an
adequate read noise margin is very important in the sub-threshold
region, and this paper examines these requirements
for successful operation. A new 9T SRAM
combining the advantages of these circuits is also proposed.

Fig. 1 Traditional 6T SRAM Schematic [4]
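The designs are ranked by power-delay product, and the abstract reports PDP savings as percentages. The sketch below shows how such a figure is computed: PDP is average power multiplied by delay (an energy per operation), and the saving is expressed relative to the reference design. The power and delay values are hypothetical placeholders, not results from this paper.

```python
def pdp(avg_power_w: float, delay_s: float) -> float:
    """Power-delay product (energy per operation), in joules."""
    return avg_power_w * delay_s

def pct_less(new: float, ref: float) -> float:
    """How much smaller `new` is than `ref`, as a percentage of `ref`."""
    return 100.0 * (ref - new) / ref

# Hypothetical sub-threshold numbers, for illustration only
ref = pdp(avg_power_w=2.0e-9, delay_s=50e-9)   # reference design
new = pdp(avg_power_w=1.8e-9, delay_s=52e-9)   # proposed design
print(f"PDP ref = {ref:.3e} J, new = {new:.3e} J, saving = {pct_less(new, ref):.2f}%")
```

A design can be slightly slower yet still win on PDP when its power drops by more than its delay grows, which is the trade-off these comparisons capture.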
The paper is organized as follows: Section 2 explains the
challenges faced by the standard 6T SRAM in
the sub-threshold region. Section 3 details the above-threshold
SRAM designs chosen for comparison. Section 4
explains the simulation setup. Section 5 compares the
results and presents the new design derived from
the analysis, and Section 6 summarizes
the paper.

Fig. 2 6T Single-ended SRAM [6]
Fig. 3 7T SRAM as in [7]
Fig. 4 7T SRAM as in [8]
Fig. 5 7T SRAM as in [9]
Fig. 6 8T SRAM as in [10]
Fig. 7 9T SRAM as in [11]
Fig. 8 9T SRAM as in [12]
Fig. 9 11T SRAM as in [13]

II. PROBLEMS FACED BY 6T SRAM IN THE SUB-THRESHOLD REGION

The 6T SRAM cell, shown in Figure 1, has two cross-coupled inverters ((M1, M2) and (M4, M3)) connected to the
bit lines through the access transistors (M5, M6) [4].
During write, the bit lines are driven with the value to be
written into the cell; the word line WL goes high and
the values are stored in the cell. During read, the cell drives
the respective bit lines with its stored value. This
cell has already been studied in the sub-threshold region
[5], and the results show that its write ability fails
because of decreased signal levels and increased variations.
A write depends on the NMOS access transistor winning the ratioed fight with the
PMOS pull-up. Because an iso-sized PMOS is stronger than an NMOS in the
sub-VT region, this becomes more difficult and the write fails [5].
Similarly, the read SNM (RSNM) degrades sharply because of
interference from the bit lines, making the cell more prone to
flipping its state. The SNM is the amount of
noise that the cell can tolerate before its state flips.
In [1], the authors show that, at 6σ, the hold SNM at a
given supply (e.g., 0.3 V) equals the RSNM at
twice that supply (0.6 V). Therefore, the 6T SRAM
cell could operate at very low voltages if the RSNM problem
were removed. Many SRAM designs have been proposed
for sub-threshold operation. Above-threshold
SRAMs that satisfy this read condition and can also write
at low voltages may succeed in the sub-threshold region.
Several such SRAM designs were chosen and pushed into sub-threshold
operation, and their performance was
observed and compared.

III. ABOVE-THRESHOLD SRAM DESIGNS FOR COMPARISON

The first cell observed is the 6T single-ended SRAM cell
[6] shown in Figure 2. This design uses two
assist transistors: one for read (MRA) and another for
write (MWA). During a write, with the BL
precharged to the required value, WWL is held high and MWA
is turned off to weaken the cross-coupled inverter pair,
enabling a successful write. During read, RWL is held high,
and the read occurs through M6 and MRA depending on the
value stored at node QB.
The second cell used for comparison is the 7T SRAM [7]
shown in Figure 3. This design uses two virtual
ground rails. During a write, the cell is
disconnected from the ground rail by turning off MW,
which weakens the feedback in the cell and speeds up the
write. During hold, MW is held ON so that strong
feedback exists. During read, MW and RWL are high, and
node Q is read through the MR transistor. Because it reads
through a separate read bit line, this design has a very good
RSNM.
The third design used for comparison is the 7T SRAM [8]
shown in Figure 4. It writes in the same way as a
traditional 4T SRAM: when the word line is low, the
values on the bit lines are written into the respective
nodes. For reading, it uses a separate set of transistors (N3, N4,
and N5), and a read occurs when WLR goes high. The
main problem with this design is leakage from the storage
nodes. This design also has a higher bit line parasitic
capacitance, since a pass transistor is connected to the
bit line for reading [8]. The read path also slows
down the read process, as the read bit line must discharge
through three stacked transistors.

The fourth cell that we used for comparison is the 7T


SRAM cell [9] as shown in the Figure 5. Read operation
here occurs exactly similar to that of the 6T SRAM.
Transistor N5 is kept ON and hence read occurs in the usual
way as the standard 6T SRAM. During write, the feedback
is disconnected by switching OFF N5 and hence the design
behaves as cascaded inverters using single ended write
operation. The BLB has the value to be written into QB and
this in turn activates the output to the other storage node Q.

The eighth cell used for comparison is the 11T SRAM


[13] as shown in the Figure 9.Here the cell disconnects the
node to which a logical HIGH to be written from the
ground and hence felicitates the write faster. In case of
reading, it uses a separate read bit line and N5, N2, N8 and
N9 are ON and hence N9 and N8 give low effective
resistance to the read operation and hence faster read.

The fifth cell for comparison is the 8T SRAM cell [10] as


shown in the Figure 6. This cell uses the same 6T SRAM
structure for the writing operation. For read it uses a
separate bit line, RBL with RWL as its control signal.
During write, the PMOS and NMOS transistors of the
inverters can be maintained of the minimum width as the
read operation is separated. The RBL is read according to
the value stored at the storage nodes when RWL is high.

The comparison was done in single cell level using the


typical corner Predictive technology model [14] at 45nm
technology. The experiments were conducted using 0.3 V
supply voltage where the threshold of the NMOS transistor
was 0.47 V. All the simulations were done at a temperature
of 25C. Three measurements were considered in the
evaluation: speed, power and average power delay product.
The results can be used to generally understand how
various above-threshold SRAM designs work when pushed
into the sub-threshold region. The delay is measured from
the time where the control signal is 0.5*Vdd to the time
where the storage node or bit line reaches 0.5*Vdd. The
power was calculated as the average power over the period
of time. Power for write, read and hold modes were
calculated. The average power delay product is the average
PDP combining both the reads and both the writes
operations. All the designs were sized appropriately for
comparison. The results for delay and power are shown in
the Table I and Table II respectively. To obtain the overall
performance aspect, the average power delay products of
write and read operations of the cells which give the better
performance alone is shown below in Table III.

IV SIMULATION SETUP

The sixth cell used for comparison is the 9T SRAM [11]


as shown in the Figure 7. Write occurs just as in the 6T
SRAM cell. Reading occurs separately through N5, N6 and
N7 controlled by the Read Signal RD going high. This
design has the problem of the high bit line capacitance with
more pass transistors on the bit line as mentioned in [8].
The seventh cell used for comparison is the 9T SRAM
cell [12] as shown in the Figure 8. Here the operation of the
write is similar to that of the 6T SRAM cell. For read, there
are three transistors used with two controlled by the RWL.
This setup can reduce the bit line leakage by making use of
the stack effect. But the disadvantage is that it can slow
TABLE I
TIMING RESULTS OF SRAMS AT 45nm TECHNOLOGY WITH 0.3 V VDD
TIME IN PICOSECONDS (ps)

Cell              Twrite01   Twrite10   Tread1   Tread0
6T [6]            Fails      321        -        0.494
7T [7]            1015.5     198.15     367      -
7T [8]            2501       1495       721      721
7T [9]            Fails      Fails      316      240
8T [10]           1035       200        -        210
9T [11]           1131       201        466      466
9T [12]           1039.5     202.5      -        562
11T [13]          1059       202.5      -        203
10T [1]           2611.5     460.7      -        0.605
Proposed Design   1018.3     193.19     -        209.6

TABLE II
POWER RESULTS OF SRAMS AT 45nm TECHNOLOGY WITH 0.3 V VDD
POWER IN NANOWATTS (nW)

Cell              Pwrite01   Pwrite10   Pread1   Pread0   PHOLD
6T [6]            Fails      3.146      0.171    0.665    0.0768
7T [7]            2.32       2.25       0.456    0.235    0.161
7T [8]            2.02       2.02       0.862    0.862    0.251
7T [9]            Fails      Fails      0.640    0.659    0.201
8T [10]           2.28       2.29       0.241    0.379    0.158
9T [11]           2.5        2.5        0.998    0.998    0.177
9T [12]           2.32       2.28       0.253    0.439    0.156
11T [13]          2.34       2.33       0.195    0.424    0.158
10T [1]           6.48       6.18       0.338    0.482    0.217
Proposed Design   2.21       2.23       0.244    0.382    0.161
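The delay definition in the simulation setup (control signal at 0.5*Vdd to storage node or bit line at 0.5*Vdd) can be applied to sampled transient waveforms as in the Python sketch below. It is an illustration under assumptions: the function names are hypothetical, the time and voltage arrays stand in for data exported from a SPICE run, and the linear ramps are synthetic, not results from this study.

```python
import numpy as np

def crossing_time(t, v, level, rising):
    """First time waveform v(t) crosses `level` in the given
    direction, linearly interpolated between samples."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    for i in range(1, len(v)):
        a, b = v[i - 1], v[i]
        hit = (a < level <= b) if rising else (a > level >= b)
        if hit:
            return t[i - 1] + (level - a) * (t[i] - t[i - 1]) / (b - a)
    raise ValueError("waveform never crosses the requested level")

def delay_50(t, ctrl, node, vdd, ctrl_rising=True, node_rising=False):
    """Delay from the control signal reaching 0.5*Vdd to the observed
    node (storage node or bit line) reaching 0.5*Vdd."""
    half = 0.5 * vdd
    return (crossing_time(t, node, half, node_rising)
            - crossing_time(t, ctrl, half, ctrl_rising))

# Hypothetical waveforms at a 0.3 V supply: a word line rising over
# 50 ps from t = 100 ps, and a bit line discharging over 400 ps.
vdd = 0.3
t = np.linspace(0.0, 1e-9, 1001)                      # 1 ps steps
ctrl = vdd * np.clip((t - 100e-12) / 50e-12, 0.0, 1.0)
node = vdd * (1.0 - np.clip((t - 125e-12) / 400e-12, 0.0, 1.0))
print(f"delay = {delay_50(t, ctrl, node, vdd) * 1e12:.1f} ps")
```

For a write, the observed node would be the storage node and `node_rising` would follow the value being written; for a read it is the bit line, which falls.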

TABLE III
AVERAGE POWER-DELAY PRODUCT OF SRAMS (W·s)

Avg PDP    7T [7]       8T [10]      9T [12]      11T [13]     Proposed Design
Writing    1.3866e-18   1.4109e-18   1.4283e-18   1.4728e-18   1.3477e-18
Reading    1.267e-19    0.651e-19    1.944e-19    0.6282e-19   0.6573e-19
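For the four reference cells, the Table III entries can be reproduced from Tables I and II if the write PDP is taken as the mean write power times the mean write delay, and the read PDP as the mean read power times the single read delay reported in Table I. That recipe is not stated in the text; it is an assumption inferred by recomputing the table, so treat the Python sketch below as a consistency check rather than the authors' procedure.

```python
# Values copied from Table I (ps) and Table II (nW).
cells = {
    # name: (Twrite01, Twrite10, Tread, Pwrite01, Pwrite10, Pread1, Pread0)
    "7T [7]":   (1015.5, 198.15, 367, 2.32, 2.25, 0.456, 0.235),
    "8T [10]":  (1035.0, 200.0, 210, 2.28, 2.29, 0.241, 0.379),
    "9T [12]":  (1039.5, 202.5, 562, 2.32, 2.28, 0.253, 0.439),
    "11T [13]": (1059.0, 202.5, 203, 2.34, 2.33, 0.195, 0.424),
}

PS, NW = 1e-12, 1e-9  # convert ps -> s and nW -> W

def avg_pdp(tw01, tw10, tread, pw01, pw10, pr1, pr0):
    """Assumed recipe: mean power x mean delay, returned in W*s."""
    write = ((pw01 + pw10) / 2 * NW) * ((tw01 + tw10) / 2 * PS)
    read = ((pr1 + pr0) / 2 * NW) * (tread * PS)
    return write, read

for name, row in cells.items():
    w, r = avg_pdp(*row)
    print(f"{name:9s} write {w:.4e} W*s   read {r:.4e} W*s")
```

Each recomputed value agrees with the corresponding Table III entry to within about 0.1%, which is consistent with rounding of the tabulated inputs.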

V. ANALYSIS OF RESULTS
All the SRAM designs were compared for power and delay
when operated in the sub-threshold region. Some of the designs
worked well in the sub-threshold region; those that did not
work were excluded from the performance comparison. The rest
of this section explains why each design does or does not
work.
A. Designs that fail
Of the designs compared, the standard 6T design, the 7T
SRAM proposed in [9] and the single-ended 6T SRAM proposed
in [6] do not work in the sub-threshold region. The standard
6T design fails for the reasons mentioned earlier. The
single-ended 6T SRAM fails because inverter 1 cannot drive
inverter 2 at such a low supply voltage. The 7T SRAM from [9]
has the same problem. In addition, it uses the traditional
read method, which reduces the RSNM of the circuit and makes
it more prone to failure.
B. Designs that work
Among the designs used for comparison, the 7T SRAMs
proposed in [7] and [8], the 8T SRAM proposed in [10], the 9T
SRAMs proposed in [11] and [12], and the 11T SRAM proposed in
[13] work well. All of these designs except the 11T SRAM [13]
use separate read and write parts in the cell.
The sub-threshold SRAM proposed in [1] was also included
in the comparison. When this circuit was simulated in 45nm
technology, the PMOS header to the supply had to be upsized
for proper functionality. Compared with it, all the other
working designs, which use separate read and write
mechanisms, performed better. This is because the design in
[1] uses a floating Vdd and therefore takes longer to write;
similarly, its read path passes through more transistors, so
its read is slower. The compared designs write correctly,
but, as expected at such a low supply, they are slower than in
above-threshold operation. From the results in Table I and
Table II, the 7T SRAM from [8] and the 9T SRAM from [11] have
the worst performance in terms of power and speed, so they
were not considered further in the search for the best
performer. Their extra pass transistors connected to the bit
lines give them a high bit line capacitance, and hence higher
delay and power [8].
C. SRAM Write
Among the designs studied, those that write successfully
give almost identical results, because they all use the same
6T structure for writing. Since the inverters are not
ratioed, a write 1 can succeed even at such a low supply with
typical-corner transistor models [2]. Combining the write 1
and write 0 results in Table I, the 7T SRAM in [7] is the
fastest. This is because it weakens the feedback by
disconnecting the 6T write part from ground, so the storage
nodes can be loaded easily [6]. It is closely followed by the
9T SRAM [12], the 11T SRAM [13] and the 8T SRAM [10]. From
Table II, the write power of these designs differs according
to their transistor count, with the 11T SRAM design consuming
the most. The difference in speed between these designs is
almost cancelled by the difference in power consumption, since
the faster ones use more transistors. This can be seen in
Table III, which lists the average power-delay products: the
results for all the selected designs are almost the same, with
minor differences, because the write mechanism is similar in
all the compared cases.
D. SRAM Read
When the results are compared, the designs with separate
read ports and a dedicated read bit line emerge as winners in
terms of speed and power. In terms of speed (Table I), the
11T SRAM [13] is the fastest at reading, followed by the 8T
SRAM [10] and the 7T SRAM [7]. The 11T SRAM is fastest because
the read discharge through the storage node has two paths. The
8T SRAM is very close in speed because the read has to pass
through only two transistors. Although the 7T SRAM has a
similar read set-up, its bit line is connected to a transistor
controlled by the storage node rather than by a control
signal, so it is slightly slower. In terms of power
consumption (Table II), the 8T and 11T designs are on par with
each other when power is averaged over the three operating
modes. These two designs also have the best average
power-delay product, owing to their good speed and power
results, as Table III shows.
E. RSNM
Although the 11T SRAM operates well for both read and write,
it has a problem with the read noise margin. Because the read
accesses the storage node directly, disturbances degrade the
RSNM of this cell, whereas the 8T SRAM [10] has the best
possible SNM when the inverter transfer curves are plotted, as
shown in Figure 10 and Figure 11. The 8T design has the best
SNM because its storage nodes are decoupled from the bit lines
and therefore hold their values firmly; its read SNM was
calculated to be 85 mV. In the 11T design [13], the discharge
has to occur through the storage node, which raises the
steady-state voltage of that node when it stores a 0 and makes
it more vulnerable to noise; its SNM was found to be only
17 mV. The read circuit of the 8T SRAM therefore has the best
performance among the designs selected for comparison.

Fig 10 RSNM of 11T SRAM from [13]

Fig 11 RSNM of 8T SRAM [10] and Proposed 9T SRAM Design

F. Proposed 9T SRAM Design

Fig 12 Proposed 9T Design

From the results shown in Table I, Table II and Table III,
the 7T SRAM set-up shown in Figure 3 has the best write
performance, and the 8T SRAM has the best read performance,
among the designs in the final stage of comparison. To obtain
the benefits of both, the two set-ups can be combined into a
new SRAM design that performs well in terms of both speed and
power consumption. This proposed 9T SRAM design is shown in
Figure 12. During a write, the bit lines BL and BLB are driven
with the data to be stored in the nodes, and the footer M9 is
switched off by driving the signal WR low. This weakens the
feedback and hence speeds up the write. During a read, WR is
kept high so that the data at the nodes stays stable, WL is
kept low, and RBL is pre-charged high; a stored 0 at Q is read
when RWL goes high. During hold, WL and RWL are kept low and
WR is kept high to retain the stored data.
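The per-mode control levels of the proposed 9T cell can be summarized as a small lookup table. The Python sketch below is hypothetical: the text fixes WR, WL and RWL for read and hold and WR low for write, but WL high and RWL low during write, and a read stack gated by QB as in the 8T cell, are assumptions made here for illustration.

```python
from enum import Enum

class Mode(Enum):
    WRITE = "write"
    READ = "read"
    HOLD = "hold"

# Control levels (1 = high, 0 = low). From the text: WR is low only
# during write (footer M9 off); WL is low in read and hold; RWL is
# high only in read. ASSUMED: WL = 1 and RWL = 0 during write.
CONTROLS = {
    Mode.WRITE: {"WL": 1, "WR": 0, "RWL": 0},
    Mode.READ:  {"WL": 0, "WR": 1, "RWL": 1},
    Mode.HOLD:  {"WL": 0, "WR": 1, "RWL": 0},
}

def footer_on(mode):
    """Footer M9 conducts when WR is high, keeping strong feedback."""
    return CONTROLS[mode]["WR"] == 1

def rbl_after_read(q):
    """Hypothetical behavioural read: RBL is pre-charged high and is
    pulled low through the read port when Q stores 0 (QB = 1).
    ASSUMED: the read stack is gated by QB, as in the 8T cell."""
    rbl = 1                                   # pre-charge phase
    qb = 1 - q
    if CONTROLS[Mode.READ]["RWL"] == 1 and qb == 1:
        rbl = 0                               # discharge through read port
    return rbl

for mode, levels in CONTROLS.items():
    print(mode.value, levels, "footer on" if footer_on(mode) else "footer off")
```

Keeping WL low whenever RWL is high means the storage nodes never see the bit lines during a read, which is the property the proposed design borrows from the 8T cell.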
The simulation results for this new design are listed in
Table I, Table II and Table III under the Proposed Design
column. As Figure 11 shows, the proposed design has an RSNM
similar to that of the 8T SRAM design. Table IV shows the
percentage savings in average power-delay product of the new
9T design. The new design gives the largest savings when
writing. When reading, it is inferior to the 11T design;
however, the 11T design has a poor noise margin, which
threatens its viability in the sub-threshold region, so this
difference can be set aside. Although the proposed design uses
the same read set-up as the 8T design [10], its PDP is
marginally higher than that of the 8T design because of the
extra footer transistor.
TABLE IV
PERCENTAGE SAVINGS IN PDP OF THE PROPOSED 9T DESIGN
COMPARED TO REFERENCE DESIGNS

PDP savings of proposed 9T SRAM (%)   7T [7]   8T [10]   9T [12]   11T [13]
Writing                               2.80     4.48      5.64      8.5
Reading                               44.8     -0.9      66.18     -4.6
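Table IV can be cross-checked against Table III with the usual relative-saving formula, 100 × (PDP_ref − PDP_new) / PDP_ref, where a negative value means the proposed design has the larger PDP. The Python sketch below assumes that this is the formula behind the table. It uses only the rounded Table III values.

```python
# Table III average PDP values (W*s): (writing, reading).
pdp = {
    "7T [7]":   (1.3866e-18, 1.267e-19),
    "8T [10]":  (1.4109e-18, 0.651e-19),
    "9T [12]":  (1.4283e-18, 1.944e-19),
    "11T [13]": (1.4728e-18, 0.6282e-19),
}
proposed = (1.3477e-18, 0.6573e-19)

def saving(ref, new):
    """Percentage PDP saving of `new` relative to `ref` (assumed
    definition); negative means `new` has the larger PDP."""
    return 100.0 * (ref - new) / ref

for name, (w_ref, r_ref) in pdp.items():
    print(f"{name:9s} write {saving(w_ref, proposed[0]):6.2f} %"
          f"   read {saving(r_ref, proposed[1]):6.2f} %")
```

Recomputed this way, the write savings and the read savings against [10], [12] and [13] closely match Table IV. The read saving against [7] comes out near 48% rather than 44.8%, so that entry may have been derived from unrounded simulation data.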

VI. CONCLUSION
This paper proposes a novel sub-threshold SRAM circuit,
along with a study of various SRAM designs in the sub-threshold
region at 45nm technology using HSPICE simulations and
typical-corner transistor models. Operating an SRAM in the
sub-threshold region requires sufficient write ability and a
good static noise margin. Among the designs considered, the
successful ones use the 6T set-up for writing, which needs no
ratioed inverters with typical-corner transistor models, and a
separate set-up for reading. The results show that these
successful designs perform better than the sub-threshold SRAM
proposed in [1], that the 7T SRAM proposed in [7] has the best
write performance, and that the 8T SRAM [10] has the best read
performance. The 11T SRAM from [13] is good in speed and
power, but it has a reduced SNM in read mode. A new 9T SRAM
design combining the advantages of these designs was proposed.
For write, the PDP of the proposed 9T SRAM design is 2.80%
lower than that of the 7T SRAM from [7], 4.48% lower than the
8T SRAM from [10], 5.64% lower than the 9T SRAM [12] and 8.5%
lower than the 11T SRAM [13]. Similarly, its read PDP is 44.8%
lower than that of the 7T SRAM in [7] and 66.18% lower than
that of the 9T SRAM in [12], and it is almost the same as that
of the 8T design. Although the PDP of the proposed design is
greater than that of the 11T design, the reduced RSNM of the
11T design makes it inferior to the proposed design.

REFERENCES
[1] Calhoun, B.H.; Chandrakasan, A., "A 256kb Sub-threshold SRAM in
65nm CMOS," Solid-State Circuits Conference, 2006. ISSCC 2006. Digest
of Technical Papers. IEEE International, pp. 2592-2601, 6-9 Feb. 2006.
[2] Moradi, F.; Wisland, D.T.; Aunet, S.; Mahmoodi, H.; Tuan Vu Cao,
"65NM sub-threshold 11T-SRAM for ultra low voltage applications," SOC
Conference, 2008 IEEE International, pp. 113-118, 17-20 Sept. 2008.
[3] Wang, A.; Chandrakasan, A.P.; Kosonocky, S.V., "Optimal supply
and threshold scaling for subthreshold CMOS circuits," VLSI, 2002.
Proceedings. IEEE Computer Society Annual Symposium on, pp. 5-9, 2002.
[4] Rabaey, J.M.; Chandrakasan, A.; Nikolic, B., Digital Integrated
Circuits: A Design Perspective, 2nd ed., Pearson Education, Inc., 2003.
[5] Calhoun, B.H.; Chandrakasan, A.P., "A 256-kb 65-nm Sub-threshold
SRAM Design for Ultra-Low-Voltage Operation," IEEE Journal of
Solid-State Circuits, vol. 42, no. 3, pp. 680-688, March 2007.
[6] Singh, J.; Pradhan, D.K.; Hollis, S.; Mohanty, S.P.; Mathew, J.,
"Single ended 6T SRAM with isolated read-port for low-power embedded
systems," Design, Automation & Test in Europe Conference & Exhibition,
2009. DATE '09, pp. 917-922, 20-24 April 2009.
[7] Azam, T.; Cheng, B.; Cumming, D.R.S., "Variability resilient
low-power 7T-SRAM design for nano-scaled technologies," Quality
Electronic Design (ISQED), 2010 11th International Symposium on,
pp. 9-14, 22-24 March 2010.
[8] Tseng, Yen Hsiang; Zhang, Yimeng; Okamura, Leona; Yoshihara,
Tsutomu, "A new 7-transistor SRAM cell design with high read
stability," Electronic Devices, Systems and Applications (ICEDSA), 2010
Intl Conf on, pp. 43-47, 11-14 April 2010.
[9] Aly, R.E.; Faisal, M.I.; Bayoumi, M.A., "Novel 7T sram cell for low
power cache design," SOC Conference, 2005. Proceedings. IEEE
International, pp. 171-174, 19-23 Sept. 2005.
[10] Chen, G.; Sylvester, D.; Blaauw, D.; Mudge, T., "Yield-Driven
Near-Threshold SRAM Design," IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 18, no. 11, pp. 1590-1598, Nov. 2010.
[11] Zhiyu Liu; Kursun, V., "Characterization of a Novel Nine-Transistor
SRAM Cell," IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 16, no. 4, pp. 488-492, April 2008.
[12] Sheng Lin; Yong-Bin Kim; Lombardi, F., "A 32nm SRAM design
for low power and high stability," Circuits and Systems, 2008. MWSCAS
2008. 51st Midwest Symposium on, pp. 422-425, 10-13 Aug. 2008.
[13] Singh, A.K.; Prabhu, C.M.R.; Soo Wei Pin; Ting Chik Hou, "A
proposed symmetric and balanced 11-T SRAM cell for lower power
consumption," TENCON 2009 - 2009 IEEE Region 10 Conference, pp. 1-4,
23-26 Jan. 2009.
[14] Berkeley Predictive Technology Model website,
http://www.eas.asu.edu/~ptm/.

2011 International Conference on Recent Trends in Information Systems

Low Power Single Bitline 6T SRAM Cell With High Read Stability

Budhaditya Majumdar
PG Student, School of VLSI Technology
Bengal Engineering & Science University, Shibpur
West Bengal, India
budhadityamajumdar@gmail.com

Sumana Basu
PG Student, Dept. of Computer Sc. & Engg.
Jadavpur University
Kolkata, West Bengal, India
sumana.basu21@gmail.com

Abstract- This paper presents a novel CMOS 6-transistor
SRAM cell for different purposes, including low-power embedded
SRAM applications and stand-alone SRAM applications. The cell
retains data with the help of leakage current and positive
feedback, and does not use any refresh cycle. The size of the
new cell is comparable to that of the conventional
six-transistor cell in the same technology and design rules.
The proposed cell also uses a single bit-line for both read and
write. It consumes less dynamic power and has higher read
stability than the standard cell. In the conventional
six-transistor (6T) SRAM cell, read stability is very low
because of the voltage division between the access and driver
transistors during a read. Existing SRAM topologies with 8, 9
or more transistors increase the read static noise margin
(SNM), but at the cost of a larger cell and higher power
consumption. In the proposed technique, the SRAM cell operates
by charging and discharging a single bit-line (BL) during read
and write, which reduces dynamic power consumption to only 40%
to 60% (best case / worst case) of that of a conventional 6T
SRAM cell. The power consumption decreases further if the
switching voltage of the bit-line lies between 0.25VDD and
0.5VDD. All simulations are done in 0.18 µm technology.

Keywords-single bit-line, low power, SRAM, 6T Cell, read stable

978-1-4577-0792-6/11/$26.00 ©2011 IEEE

I. INTRODUCTION

Exponential progress in VLSI fabrication processes has
increased the density of integrated circuits by shrinking device
geometries. However, devices with such high densities are
susceptible to high power consumption and run-time failures.
Beyond these concerns, a growing class of portable devices such
as PDAs, cellular phones and portable multimedia devices has
motivated designers to focus on low-power design. Today, not
only device geometries are a technology focus: simplifying
existing topologies while keeping their functionality intact is
also a major area of work.

Memories are an integral part of most digital devices, so
reducing both the power consumption and the area of memories is
very important for improving system performance, efficiency and
reliability. Most embedded and portable devices use SRAM cells
because of their ease of use and low standby leakage.
A six-transistor SRAM cell (6T SRAM cell) is conventionally
used as the memory cell. However, the 6T SRAM cell is larger
than a DRAM cell, resulting in low memory density. Conventional
SRAMs built from the 6T cell therefore have difficulty meeting
the growing demand for larger memory capacity in mobile
applications.

The conventional 6T SRAM cell also shows poor stability at
very small feature sizes with a low power supply. During a read,
its stability decreases drastically because of the voltage
division between the access and driver transistors.
Considerable research over the past several years on low-power
SRAM cells has also resulted in significant degradation of SRAM
cell data stability. With each technology generation, the
scaling of CMOS devices causes random variations in the number
of dopant atoms in the channel region of the device. This causes
random variations in device parameters such as the threshold
voltage (Vt), usually known as random dopant fluctuation (RDF)
[1].

Since SRAM cells rely on delicately balanced transistors, and
the conventional 6T SRAM cell shows poor stability during read,
it is very important to consider these issues when designing new
memory cells.

Designs proposed in various papers provide cell topologies
with higher read stability at the cost of increased area due to
higher transistor counts. Isolated read bit-lines (separate from
the write bit-lines) also increase dynamic power consumption and
the complexity of designing the overall memory unit.

II. BACKGROUND

Low-power, low-voltage SRAM cells have developed rapidly in
recent years, driven by increasing demand from embedded devices,
notebooks, laptops, handheld communication devices and IC memory
cards. Limiting power consumption is therefore a must, and new
techniques are being developed to improve energy efficiency at
all levels of design. In this paper, an overall analysis of a
novel SRAM cell is carried out with respect to stability and
switching power consumption.

III. PROPOSED 6T SRAM CELL

The proposed design increases the read stability and SNM
without increasing the size or power consumption of a standard
6-transistor SRAM cell.
A. Minimization of a 6T Standard Cell

Figure 1. Traditional SRAM Cell with Dual Bit Line

In the traditional 6T SRAM (Fig. 1), cells are expected to be
both read-stable and writable, and this functionality must be
preserved for each cell under worst-case variation. At the cell
level, both the static noise margin and the write margin are set
by the choice of transistor strength ratios, which imposes
conflicting constraints on the cell transistor strengths. For
cell stability during a read, the storage inverters are usually
made strong and the pass-gates weak, whereas write-ability
requires the opposite: weak storage inverters and strong
pass-gates. Device variations common in nano-scale fabrication
can severely disturb this delicate balance of transistor
strength ratios, dramatically degrading stability and write
margins. These problems are further exacerbated by low-voltage
operation, since threshold voltage variation consumes a larger
fraction of the voltage margins. Variability can thus limit the
minimum operating voltage of SRAM [2].

Figure 2. 6 Transistor Standard SRAM Cell

The schematic of Fig. 2 again shows the 6-transistor SRAM
cell, which uses two bit-lines and one word-line (tied to the
two access transistors).

Many design techniques that circumvent variability problems
have been proposed to enable low-voltage operation of 6T cells.
One such method adds a second, higher supply voltage dedicated
to the SRAM array; this is very effective and ensures sufficient
margins as the logic supply voltage scales. In such cases the
SRAM voltage does not scale with technology and could even be
increased as variability intensifies. Instead of a fixed higher
supply, dynamically modulated supplies can be used, with the
SRAM array pulsed to different levels when a read or write
occurs. To an extent, this decouples read and write events from
the standby condition, so that the optimum bias conditions can
be used in each case. Such techniques add design complexity but
can improve cell stability, write-ability or standby leakage.
They can reduce the trade-offs between optimizing the cell for
read and for write, but cannot eliminate them entirely.

The conflicting needs of read stability and write-ability
compromise the variability tolerance of the standard 6T cell.
The same pass-gate devices are used, under balanced conditions,
for both reading and writing the cell, so the two conditions
inevitably cannot be optimized simultaneously. Just as
dynamically modulated power supplies separate the requirements
for reading and writing, an SRAM cell can achieve the same
separation by modifying its own structure. A write operation to
an unselected column of a standard 6T SRAM cell can also cause
stability issues, because the word line is activated while both
bit lines are held high, a bias condition similar to a read
operation. Such problems have led to different SRAM cell
topologies that improve data stability and leakage power
consumption [3].

Figure 3. 5 Transistor (Single Ended) SRAM Cell

Removing transistor M6 gives the schematic of Fig. 3, which
still functions like the 6T SRAM but has a smaller cell area and
lower power consumption. The cell area decreases by one
transistor and one bit line. The power spent charging the bit
line drops by roughly a factor of 2: only one bit line is
charged during a read instead of two, and, assuming equal
probability of writing 0 and 1, the bit line is charged during
only about half of the write operations instead of every one.


Figure 4. 4 Transistor (Single Ended) SRAM Cell

Further removing transistor M1 gives the schematic of Fig. 4,
which still functions as an SRAM. The main advantage of this
design is a further reduction in power consumption. Other
advantages include a significantly larger write margin, a
smaller delay for writing 1, and a slightly smaller cell area.

B. The Proposed 6T New SRAM Cell

Figure 5. 6 Transistor (Single Ended) SRAM Cell (Proposed)

The proposed 6-transistor SRAM cell is created by adding two
transistors, MRA (Read Access Transistor) and MRD (Read Driver
Transistor), which work independently during the read operation
and do not affect the cell SNM.
C. Memory Cell Operations

Hold: If the cell content is a 1 (Q=VDD, Q'=0), both
memory nodes lock each other at their respective
voltages. However, if the cell content is a 0 (Q=0,
Q'=VDD), Q is floating. Referring to Fig. 5, the leakage
current through M5 must be greater than that through M2
to keep Q at 0. Fortunately, since the NMOS (M5) is a
stronger current driver than the PMOS (M2), this
condition is satisfied.

Write: The word-line WL is charged to VDD as in the
standard 6T SRAM. Since NMOS is a stronger driver than
PMOS, writing a 0 into the cell causes no problem. The
absence of a pull-down NMOS at memory node Q makes it
easy to write a 1 into the cell. A 1 is written by
pre-charging bit-line BL to VDD. To write a 0, bit-line
BL is discharged and word-line WL is then charged to
VDD, as in the standard 6T SRAM.

Read: Consider reading Q=0. Before a value is read from
the storage nodes, bit line BL is pre-charged to VDD.
The read word line RL is then asserted to VDD. Storage
node Q', which stores a 1, is statically connected to
the gate of MRA (Read Access Transistor), so the charge
on the bit line drains through MRD to GND while RL is 1,
and the bit line reads a 0. Conversely, when Q=1, Q' is
0, MRA is in cutoff, bit line BL cannot discharge
through MRD to GND, and it reads a 1.

Figure 6. Schematic of Proposed 6T SRAM Cell in Multisim 11

IV. SIMULATION RESULTS

A. Dynamic Power Consumption

In an SRAM operation, power is consumed in two phases:
the setup phase and the operation phase.
Energy consumed during the setup phase is dominated by
pre-charging and discharging buses such as the bit lines and
word lines. Each charging or discharging transition of a line
dissipates Eline = 0.5·Cline·Vline², where Cline is the line
capacitance and Vline is the change in line voltage. From this,
the average power of an SRAM operation is obtained by dividing
by the clock period, assuming that each SRAM cell performs only
one operation per clock period and that all word lines and bit
lines are discharged to 0 after each operation. The clock has a
50% duty cycle. The simulation uses a clock period of 40 ns
(equivalent to a 25 MHz clock frequency) and a bit-line
capacitance of 1 pF.

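Since each line is charged and discharged once per clock period, the consumed power is Pline = 2·Eline·F = Cline·Vline²·F, where F is the clock frequency.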


Dynamic power dissipation can be lowered by reducing the
switching activity and the clock frequency, but this affects
performance [9]. Reducing the supply voltage degrades the cell
data stability. Dynamic power dissipation can therefore be
lowered, without degrading performance, by reducing the
bit-line capacitance of the SRAM cell.

same as switching the bit line. Note that for read operation,
since the bit line is pre-charged to VDD, there is no significant
current flow and voltage changes across the access transistor if
the cell contains a 1. Therefore, read 1 delay is not defined.

Power consumed during the operation phase is dominated


by active power and leakage power. Active power is the power
consumed when both pull-up and pull-down networks are
active, creating a direct current path from VDD to ground.
Leakage power is the power consumed when charges leak
through a transistor that is off.

The proposed 6T SRAM cell has the smallest write 1 delay,
because there is no pull-down NMOS to keep the memory node from
being pulled up to VDD. For the same reason, it also has the
worst write 0 delay, because there is no pull-down NMOS to help
bring the memory node to 0.
TABLE I. CALCULATED SWITCHING POWER
(Charging and discharging of the bit-line only)

Operation   6T Standard SRAM Cell   Proposed 6T SRAM Cell
WRITE 0     162 µW                  0 µW
WRITE 1     162 µW                  81 µW
READ 0      243 µW                  162 µW
READ 1      243 µW                  81 µW

TABLE II. DELAY ANALYSIS RESULTS
(Simulation results from Multisim 11)

Operation   6T Standard SRAM Cell   Proposed 6T SRAM Cell
WRITE 0     5.1 ns                  6 ns
WRITE 1     5.5 ns                  0 ns
READ 0      7.5 ns                  8.5 ns @ 600 nm, 6.5 ns @ 800 nm (MRD/MRA)

C. Static Noise Margin

Modern embedded systems and memory modules demand high
integration density, which places a stringent constraint on
SRAM cell area. Choosing minimal width-to-length ratios for the
SRAM cell transistors is the first step toward such a design. As
mentioned earlier, variations in the threshold voltage Vth
increase steadily as dimensions scale down to the nanometer
regime, because of random dopant density fluctuations in the
channel, source and drain [11]. Two closely placed transistors
that are supposed to be identical therefore commonly differ,
mainly in electrical parameters such as Vth, which makes SRAM
design less predictable and controllable. Moreover, the
stability of the SRAM cell is seriously affected by increasing
variability and by decreasing supply voltage Vdd. Considerable
research has been done on understanding and modeling SRAM cell
stability, and several analytical models of the static noise
margin (SNM) have been developed. These works aimed to optimize
the cell design, to forecast the effect of parameter changes on
the SNM [12], and to estimate the impact of intrinsic parameter
variations on cell stability [13]. Further, cell stability has
been maximized in new SRAM cell circuits for future technology
nodes [14].

For every write operation of a standard 6T cell, the complementary data is placed on both bit-lines and then the pre-charge circuit is activated. Only one of the bit-lines gets charged, depending on the data value. Once the write is completed, the bit-line capacitance is assumed to be discharged, so power is dissipated twice during a write phase. During a read cycle, both bit-lines are again charged. One is discharged while reading a zero, and the other is discharged after the operation is complete.
For the proposed 6T SRAM cell, only a single bit-line is involved. It is either charged (for a 1) or not charged at all, assuming that the data was already present before the pre-charge circuit was activated. After the write 1 operation, the bit-line is assumed to be discharged. During a read 0 cycle, the bit-line capacitance is pre-charged and discharged through the cell, whereas in a read 1 cycle the bit-line is pre-charged and assumed to be discharged after the read process is over.
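A rough cross-check of Table I can use the usual dynamic-power relation P = n·C·VDD²·f. This is a sketch under stated assumptions. The 1 pF bit-line capacitance comes from the text, and the 1.8 V swing comes from the delay example below. The 25 MHz access rate `F_CLK` is our assumption and is not stated in the paper. It is chosen because it makes one full bit-line charge per cycle equal 81 µW, the smallest nonzero entry in Table I. The script then shows that every Table I entry is an integer multiple (0 to 3) of that unit.

```python
C_BITLINE = 1e-12   # bit-line capacitance assumed in the paper (1 pF)
VDD = 1.8           # supply swing used in the paper's delay example (V)
F_CLK = 25e6        # hypothetical access rate (Hz); the text does not state one

def switching_power(n_charges, c=C_BITLINE, vdd=VDD, f=F_CLK):
    """Average supply power when a bit-line is fully charged n_charges times per cycle.
    Each full charge draws C*VDD^2 from the supply: half is dissipated in the pull-up
    while charging, the other half when the line is later discharged."""
    return n_charges * c * vdd ** 2 * f

unit = switching_power(1)
print(f"one bit-line charge per cycle: {unit * 1e6:.1f} uW")

# Table I entries (uW) expressed as multiples of that unit.
table_i = {
    "6T standard": {"WRITE 0": 162, "WRITE 1": 162, "READ 0": 243, "READ 1": 243},
    "proposed 6T": {"WRITE 0": 0, "WRITE 1": 81, "READ 0": 162, "READ 1": 81},
}
for cell, ops in table_i.items():
    print(cell, {op: round(p_uw * 1e-6 / unit, 2) for op, p_uw in ops.items()})
```

The multiples only restate the table in units of C·VDD²·f. They are not a derivation of how many charges each operation actually performs.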
B. Delay Calculation
SRAM delays are usually defined as the time it takes to read or write a value from an SRAM cell. When a node is switching, the delay is measured as the time difference between the 10% and 90% points of the voltage swing. For example, if node A switches from 0 V to 1.8 V, the delay is the time node A takes to go from 0.18 V to 1.62 V.
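The 10%-90% measurement can be applied to any sampled transient waveform, for example a time/voltage table exported from a circuit simulator. The sketch below is illustrative only. It assumes linear interpolation between samples, and the function names are our own rather than part of any simulator's API.

```python
def crossing_time(times, volts, level):
    """First time the sampled waveform crosses `level`, using linear interpolation."""
    for (t0, v0), (t1, v1) in zip(zip(times, volts), zip(times[1:], volts[1:])):
        if (v0 - level) * (v1 - level) <= 0 and v0 != v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError(f"waveform never crosses {level} V")

def delay_10_90(times, volts, v_start, v_end):
    """10%-90% delay of a transition from v_start to v_end (rising or falling)."""
    swing = v_end - v_start
    t10 = crossing_time(times, volts, v_start + 0.1 * swing)
    t90 = crossing_time(times, volts, v_start + 0.9 * swing)
    return t90 - t10

# Ideal 0 V -> 1.8 V ramp lasting 10 ns, sampled every 0.1 ns.
times = [i * 0.1e-9 for i in range(101)]
volts = [1.8 * i / 100 for i in range(101)]
print(f"{delay_10_90(times, volts, 0.0, 1.8) * 1e9:.2f} ns")
```

For the linear ramp, the 0.18 V and 1.62 V crossings fall at 1 ns and 9 ns. A falling edge is handled by swapping `v_start` and `v_end`.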

The data retention of the standard 6T SRAM cell in the hold state and the read state are important constraints in advanced technologies. The cell becomes less stable at low VDD, with increasing leakage currents and increasing variability. The stability is usually defined by the static noise margin as the maximum value of the DC noise voltage that can be tolerated by the SRAM cell without altering the stored bits [15].
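The static noise margin defined above is commonly extracted from the butterfly plot of the two cross-coupled inverter transfer curves. It is the side of the largest square that fits inside the smaller of the two lobes, following Seevinck et al. [12]. The sketch below computes that square numerically. It is an illustration, not the authors' extraction flow. The logistic curve `logistic_vtc` and its gain and switching-threshold values are hypothetical stand-ins for simulated transfer curves.

```python
import math

def logistic_vtc(vdd, vm, gain):
    """Hypothetical smooth inverter transfer curve Vout(Vin) that switches at Vin = vm."""
    def vout(vin):
        return vdd / (1.0 + math.exp(gain * (vin - vm)))
    return vout

def _square_side(f1, x0, y0, vdd, iters=60):
    """Largest s >= 0 whose top-right corner (x0 + s, y0 + s) stays below y = f1(x)."""
    if f1(x0) - y0 <= 0.0:
        return 0.0
    lo, hi = 0.0, vdd                      # gap > 0 at lo, gap < 0 at hi because f1 < vdd
    for _ in range(iters):                 # bisection on the decreasing gap f1(x0+s) - (y0+s)
        mid = 0.5 * (lo + hi)
        if f1(x0 + mid) - (y0 + mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo

def lobe_snm(f1, f2, vdd, steps=2000):
    """Largest axis-aligned square in the lobe bounded above by y = f1(x) and on the
    left by x = f2(y) (f1 maps node x to node y, f2 maps node y back to node x)."""
    best = 0.0
    for k in range(steps + 1):
        y0 = vdd * k / steps               # bottom edge of the candidate square
        x0 = f2(y0)                        # leftmost admissible position for that edge
        best = max(best, _square_side(f1, x0, y0, vdd))
    return best

def static_noise_margin(f1, f2, vdd):
    """Cell SNM: the smaller of the two lobes' largest squares."""
    return min(lobe_snm(f1, f2, vdd), lobe_snm(f2, f1, vdd))

VDD = 1.8
matched = logistic_vtc(VDD, 0.9, 20.0)
shifted = logistic_vtc(VDD, 1.0, 20.0)     # hypothetical +100 mV threshold shift in one inverter
print(f"matched:  {static_noise_margin(matched, matched, VDD):.3f} V")
print(f"mismatch: {static_noise_margin(matched, shifted, VDD):.3f} V")
```

Shifting one inverter's switching threshold shrinks one lobe. This is how the mismatch discussed in this section lowers the SNM.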

In the simulation, the bit-line(s) are assumed to have a capacitance of 1 pF, which is much higher than the node capacitance at VNode1. Therefore, it takes much less effort to switch the memory nodes than to switch a bit-line. This is why, in general, write delays are smaller than read delays in SRAMs. Writing into a cell is the same as switching the memory node, and reading from a cell is the same as switching the bit-line.
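The capacitance argument can be made concrete with the first-order relation t = C·ΔV/I for a node driven by a constant current. The 1 pF bit-line value is from the text. The 2 fF memory-node capacitance and the 100 µA drive current are hypothetical round numbers. Because real transistor currents are not constant, only the ratio is meaningful here, and the absolute times should not be compared with Table II.

```python
def slew_time(c_farads, dv_volts, i_amps):
    """First-order time to move a capacitance by dv_volts with a constant current."""
    return c_farads * dv_volts / i_amps

I_DRIVE = 100e-6   # hypothetical constant drive current (A)
SWING = 1.8        # full voltage swing (V)

t_bitline = slew_time(1e-12, SWING, I_DRIVE)   # 1 pF bit-line, as assumed in the text
t_node = slew_time(2e-15, SWING, I_DRIVE)      # hypothetical 2 fF internal memory node
print(f"bit-line {t_bitline * 1e9:.1f} ns, node {t_node * 1e12:.0f} ps, "
      f"ratio {t_bitline / t_node:.0f}x")
```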

In the standard 6T SRAM cell, the read static noise margin (SNM) is strongly affected by a decrease in supply voltage (VDD) and by transistor mismatch [12], [16]. This mismatch arises from variations in the physical quantities of the devices
designed to be identical. Commonly known physical quantities are threshold voltage, body factor, and current factor. At low VDD, the SNM decreases, the overall delay of the SRAM increases, and data destruction can occur during low-VDD read operations in SRAM cells [12]. In the proposed SRAM cell, however, reading from the cell has no effect on the static noise margin, because the data-retention and data-output blocks are isolated.

The main operations of the SRAM cell are write, read, and hold. The static noise margin is most important in the hold and read operations [6], particularly during a read, when the word-line is 1 and the bit-lines are pre-charged to 1. The internal node that stores 0 is then pulled up through the access transistor, to a level set by the voltage division across the access transistor and the driver transistor. This increase in voltage severely degrades the SNM during the read operation.

During read/hold operation, the SRAM cell must be as robust as possible so that a sudden disturbance will not change the content of the memory nodes. Consider a read noise margin of 200 mV. If one of the memory nodes (Q or Q') changes by less than 200 mV during a read operation, the contents of Q and Q' are guaranteed to remain the same after the read, and any disturbance to the voltage in the cell will be eliminated. Therefore, a larger read/hold noise margin is preferred. During a write operation, the situation is reversed: the requirement is to switch the contents of Q and Q' easily. Therefore, the write noise margin (more commonly referred to as the write margin) is defined as the range of voltage disturbances that will flip the content of the memory nodes. For example, if the write margin is 500 mV, then a disturbance of at least 500 mV in the memory nodes will cause their content to flip, thus achieving the write operation.

It can be deduced from Fig 7 that Q is very stable at 0, whereas it is much less stable at 1. The risk is that a mere 100 mV of noise at Q when Q is at 1 can easily flip the state, whereas it is very unlikely that noise can flip Q from 0 to 1.

Figure 7. Hold State SNM Curve for the proposed 6T Cell

D. Improving Noise Margin: Dual Vt

The static noise margin in the hold state is very low, as seen in Fig 7, and a disturbance as small as 100 mV at Q' can flip the state of the cell. To prevent this from happening, a higher Vt can be used for M2 of Fig 5.

Figure 8. The SNM curve plotted for VBS varying from 0.0 V to 1.8 V in steps of 300 mV

The risk of Q = 1 being flipped by a small amount of noise decreases with increasing VBS for M2 of Fig 5.

V. CONCLUSION

Continuing technology scaling puts a limit on how much the supply voltage can be scaled. Therefore, limiting power consumption through new architectures has become a design requirement in recent integrated circuits. In the case of SRAM, one seemingly counterintuitive approach is to use only a single bit-line without jeopardizing read stability, which leads to the development of a Single Ended 6T SRAM. The new SRAM operating scheme gives a significant power reduction by reducing the amount of switching on the bit-lines. Extending this operating scheme also allows us to propose a single bit-line design that achieves a relatively smaller area while retaining all of the power-saving advantages. For a small penalty in delay, Single Ended 6T SRAMs are attractive alternatives as memory storage for applications that do not require a high clock frequency. A higher operating frequency may still be obtained by lowering the bit-line capacitance to the order of femtofarads, instead of the 1 pF assumed in this paper.

REFERENCES
[1] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits & microarchitecture," in Proc. Design Automation Conf., Jun. 2003, pp. 338-342.
[2] J. Singh, D. K. Pradhan, et al., "A single ended 6T SRAM cell design for ultra low voltage applications," IEICE Electronics Express, 2008, pp. 750-755.
[3] Y. Chen, G. Chen, et al., "A 0.6V dual rail compiler SRAM design on 45nm CMOS technology with adaptive SRAM power for lower Vdd_min VLSIs," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1209-1214, Apr. 2009.
[4] K. Takeda et al., "A read static noise margin free SRAM cell for low Vdd and high speed applications," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 113-121, Jan. 2006.
[5] R. E. Aly and M. A. Bayoumi, "Low-power cache design using 7T SRAM cell," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 4, pp. 318-322, Apr. 2007.
[6] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," IEEE J. Solid-State Circuits, vol. 42, no. 3, pp. 680-688, Mar. 2007.
[7] P. Geens and W. Dehaene, "A dual port dual width 90nm SRAM with guaranteed data retention at minimal standby supply voltage," in Proc. 34th European Solid-State Circuits Conf. (ESSCIRC 2008), 2008, pp. 290-293.
[8] F. Moradi et al., "65nm sub threshold 11T SRAM for ultra low voltage application," IEEE Xplore, 2008, pp. 113-117.
[9] D. Liu and C. Svensson, "Power consumption estimation in CMOS VLSI chips," IEEE J. Solid-State Circuits, vol. 29, no. 6, pp. 663-670, Jun. 1994.
[10] K. Itoh et al., "Trends in low-power RAM circuit technologies," Proc. IEEE, pp. 524-543, Apr. 1995.
[11] B. Cheng et al., "The impact of random doping effects on CMOS SRAM cell," in Proc. ESSCIRC, Sep. 2004, pp. 219-222.
[12] E. Seevinck et al., "Static-noise margin analysis of MOS SRAM cells," IEEE J. Solid-State Circuits, vol. SC-22, no. 5, pp. 748-754, Oct. 1987.
[13] J. Bhavnagarwala et al., "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658-665, Apr. 2001.
[14] L. Chang et al., "Stable SRAM cell design for the 32 nm node and beyond," in Symp. VLSI Technology Dig. Tech. Papers, Jun. 2005, pp. 128-129.
[15] E. Grossar, M. Stucchi, K. Maex, and W. Dehaene, "Read stability and write-ability analysis of SRAM cells for nanometer technologies," IEEE J. Solid-State Circuits, vol. 41, no. 11, Nov. 2006.
[16] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications," in IEEE ISSCC Dig. Tech. Papers, Feb. 2005, pp. 478-479.
[17] K. Itoh, M. Horiguchi, and H. Tanaka, Eds., Ultra-Low Voltage Nano-Scale Memories. Springer.
[18] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Prentice-Hall/Pearson.
[19] S. Birla, N. Kr. Shukla, M. Pattanaik, and R. K. Singh, "Device and circuit design challenges for low leakage SRAM for ultra low power applications," Canadian Journal on Electrical & Electronics Engineering, vol. 1, no. 7, Dec. 2010.
[20] S. Birla, N. Kr. Shukla, M. Pattanaik, and R. K. Singh, "Analysis of the data stability and leakage power in the various SRAM cells topologies," International Journal of Engineering Science and Technology, vol. 2, no. 7, pp. 2936-2944, 2010.
[21] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," IEEE J. Solid-State Circuits, vol. 42, no. 3, Mar. 2007.
[22] P. Upadhyay, R. Mehra, and N. Thakur, "Low power design of an SRAM cell for portable devices," in Int. Conf. on Computer & Communication Technology (ICCCT'10).
[23] Y. H. Tseng, Y. Zhang, L. Okamura, and T. Yoshihara, "A new 7-transistor SRAM cell design with high read stability," in 2010 Int. Conf. on Electronic Devices, Systems and Applications (ICEDSA2010).
[24] Sandeep R, N. T. Deshpande, and A. R. Aswatha, "Design and analysis of a new loadless 4T SRAM cell in deep submicron CMOS technologies," in Second Int. Conf. on Emerging Trends in Engineering and Technology (ICETET-09).
[25] P. Athe and S. Dasgupta, "A comparative study of 6T, 8T and 9T decanano SRAM cell," in 2009 IEEE Symp. on Industrial Electronics and Applications (ISIEA 2009), Kuala Lumpur, Malaysia, Oct. 4-6, 2009.
[26] A. Sil, S. Ghosh, N. Gogineni, and M. Bayoumi, "A novel high write speed, low power, read-SNM-free 6T SRAM cell," The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, Louisiana 70504.
[27] R. E. Aly and M. A. Bayoumi, "Low-power cache design using 7T SRAM cell," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 4, Apr. 2007.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS, VOL. 59, NO. 10, OCTOBER 2012

2275

Heterogeneous SRAM Cell Sizing for Low-Power H.264 Applications
Jinmo Kwon, Student Member, IEEE, Ik Joon Chang, Insoo Lee, Heemin Park, and Jongsun Park, Member, IEEE

Abstract- In low-voltage operation, static random-access memory (SRAM) bit-cells suffer from large failure probabilities with technology scaling. Despite the increasing failures, conventional SRAM memory is still designed without considering the importance differences found among the data stored in the SRAM bit-cells. This paper presents a heterogeneous SRAM sizing approach for the embedded memory of an H.264 video processor. In this approach, the more important higher order data bits are stored in relatively larger SRAM bit-cells and the less important bits are stored in smaller ones. As a result, the failure probabilities significantly decrease for the SRAM cells storing the more important bits, which allows us to obtain better video quality even in lower voltage operation. In order to find the SRAM bit-cell sizes that achieve the best video quality under an SRAM area constraint, we propose a heterogeneous SRAM sizing algorithm based on dynamic programming. Compared to a brute-force search, the proposed algorithm greatly reduces the computation time needed to select the SRAM bit-cell sizes of an 8-bit pixel. Experimental results show that under the iso-area condition, the heterogeneous SRAM array achieves significant PSNR improvements (average 4.49 dB at 900-mV operation) compared to the conventional one with identical cell sizing.
Index Terms- H.264 systems, low-power static random-access
memory (SRAM), SRAM bit-cell.

I. INTRODUCTION

RECENTLY, portable devices such as smart-phones, cellular phones, and video cameras are gaining popularity
as well as making changes in every aspect of our daily lives.
Multimedia data processing including image/video applications is one of the key factors enhancing the ever-increasing
portable device market. However, image/video applications are
very computationally intensive and require a large amount of
embedded memory access, which results in significant power
consumption and thus limits the battery lifetime of portable
devices.

Manuscript received March 01, 2011; revised July 12, 2011 and October 12,
2011; accepted November 24, 2011. Date of publication February 14, 2012; date
of current version September 25, 2012. This work was supported by the Basic
Science Research Program through the National Research Foundation of Korea
(NRF) funded by the Ministry of Education, Science, and Technology (20100004484). This paper was recommended by Associate Editor C.-C. Wang.
J. Kwon, I. Lee, and J. Park are with the School of Electrical Engineering,
Korea University, Seoul 136-701, Korea (e-mail: jongsun@korea.ac.kr).
I. J. Chang is with the School of Electronics and Information, Kyung Hee
University, Suwon 446-701, Korea.
H. Park is with the Department of Computer Science, Yonsei University,
Seoul 140-749, Korea.
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSI.2012.2185335

Many previous research efforts have focused on reducing the power consumption of portable multimedia applications. In multimedia video processing, the embedded memory accesses used for motion estimation and buffering are among the primary sources of power consumption [1]. As a key approach for reducing power, supply voltage scaling has been widely used in CMOS VLSI systems [2]. Unfortunately, in low-voltage operation the failure probabilities of SRAM bit-cells increase significantly. The problem becomes even worse because of the serious process-voltage-temperature (PVT) variations that accompany technology scaling [3]. The increasing failures in the embedded SRAM memories within video processors ultimately give rise to considerable video quality degradation.
In order to overcome the increasing SRAM failures, SRAM structures with eight or ten transistors [4]-[6] have been proposed. Using extra transistors, read stability is remarkably improved; however, these alternative SRAM bit-cells suffer from large area penalties. In [7], a hybrid 6T/8T SRAM structure is proposed to mitigate the area overhead of the 8T bit-cell. The approach facilitates aggressive voltage scaling in SRAM memory; however, the hybrid structure results in several design challenges for peripheral circuitry such as write drivers and row decoders. Moreover, the hybrid SRAM should be implemented using single-ended structures, where a small number of bit-cells are used per bit-line, thereby degrading area efficiency. Error correction codes (ECC) are also used to reduce the failure probabilities in SRAM arrays [8]. The ECC approach suffers from a delay penalty to encode/decode the syndromes of the data stream, as well as from area overhead for both the ECC circuitry and the redundant data.
Previous literature [9], [10] shows that most SRAM failures are due to random transistor threshold voltage variations, which are caused by random dopant fluctuation (RDF) [11]. Since the RDF effect can be alleviated by increasing transistor size, SRAM failures can be reduced by using bit-cells with larger transistors. In video data processing, the human visual system is more sensitive to the higher order bits (HOBs) of the luma and chroma pixels than to the low-order bits (LOBs). However, the importance differences found among the data stored in SRAM have not been considered in conventional embedded memory design. In this work, we propose a heterogeneous SRAM cell sizing scheme, where the HOBs are stored in relatively larger SRAM bit-cells and the LOBs are stored in smaller ones. In order to find the appropriate sizes of the SRAM bit-cells, we also propose a low-complexity algorithm based on dynamic programming. Our proposed heterogeneous SRAM sizing approach is applied to the embedded memories in the H.264 baseline profile level 1.3 encoder [12], which allows low operation frequencies and possibly low-voltage operation. Using the proposed SRAM, the failure probabilities of the HOBs are remarkably reduced, thereby improving the video quality even in low-voltage operation. Compared to the previous work [7], the proposed approach offers a simple yet efficient voltage-scalable SRAM architecture in which the conventional 6T SRAM array structure is not changed. Thus, the layout of the proposed SRAM is neat and straightforward compared to other SRAM architectures [5], [7] supporting low-voltage operation. Fig. 1 shows an overview of the proposed SRAM architecture.

Fig. 1. Overview of the heterogeneous SRAM architecture.

1549-8328/$31.00 © 2012 IEEE
The rest of the paper is organized as follows. Section II describes SRAM cell failures and the resulting video quality degradation in the H.264 system. Section III presents algorithmic approaches used to obtain the SRAM bit-cell sizes that give rise to minimum video quality degradation in low-voltage operation. Numerical results obtained from implementations are presented in Section IV, which also discusses the layout issues of the proposed SRAM architecture. Finally, conclusions are drawn in Section V.
II. ANALYSIS OF SRAM BIT-CELL FAILURE AND ITS
RESULTING VIDEO QUALITY DEGRADATION
IN H.264 PROCESSORS
In this section, we briefly discuss the SRAM failure mechanism. In an H.264 processor, failures occurring in the embedded SRAM array give rise to video quality degradation. The locations of the SRAM failures are the main factor affecting the actual amplitude of that degradation. We also present a quantitative analysis of the relationship between the amplitude of the video quality degradation and the failure locations.
A. SRAM Bit-Cell Failure Analysis
Fig. 2 shows a schematic of 6T SRAM bit-cell. In the
SRAM bit-cells, possible failures under process variations can
be broadly categorized as delay and functional failures. As
mentioned earlier, since the target application is the H.264
baseline profile that allows low operation frequency of 20 MHz
or below, the delay constraint can be easily satisfied using
90-nm technology even with 850-mV supply voltage.
Fig. 2. Schematic of 6T SRAM bit-cell.

Fig. 3. Comparison between the write failure probability in the SF corner and the read failure probability in the FS corner for minimum size SRAM. The transistor sizes of the minimum (1.0×) SRAM are … nm, … nm, and … nm.

In low-voltage operation with process variations, the SRAM bit-cell arrays still suffer from functional failures due to negative read static noise margin (SNM) [3], [13] and negative write margin [14]. Under process variations, the worst process corners of the SRAM bit-cell for read and write operations are the Fast-NMOS/Slow-PMOS (FS) and Slow-NMOS/Fast-PMOS (SF) corners, respectively [7]. The SRAM read failure probability at the FS corner and the write failure probability at the SF corner are simulated using IBM 90-nm CMOS technology, and the results are presented in Fig. 3. Note that the write failure probability at the SF corner is much smaller than the read failure probability at the FS corner. This implies that in the conventional 6T SRAM bit-cell, the worst SRAM failure probability occurs at the FS corner. In the following, "SRAM failure" refers to the read failure at the FS corner, which means that we consider the worst process corner of the 6T SRAM.
Fig. 4. SRAM failure probabilities for different supply voltages and bit-cell sizes. The transistor sizes of the minimum (1×) SRAM are … nm, … nm, and … nm.

Fig. 5. Block diagram of H.264 video encoder and its embedded SRAMs.

As mentioned, most SRAM failures are due to random transistor threshold voltage (Vt) variations [15]. The Vt variation is usually caused by the random dopant fluctuation (RDF) effect [16]. Since the effect of RDF is expected to increase with technology scaling [16], [17], the probability of an SRAM cell failure will also grow as processes scale down. The RDF-based Vt variation can be expressed with a simple model in which the standard deviation scales with the inverse square root of the gate area [18]:

$$\sigma_{V_t} = \sigma_{V_{t0}} \sqrt{\frac{L_{\min} W_{\min}}{L\,W}} \qquad (1)$$

where σVt0 is the standard deviation of the Vt variations for a minimum-sized transistor, which has a channel length Lmin and a channel width Wmin. Equation (1) shows that σVt, and therefore the SRAM failure probability, decreases as the transistor size increases.
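Assuming the RDF model (1) has the common Pelgrom form σVt = σVt0·sqrt(Lmin·Wmin/(L·W)), the effect of widening the bit-cell transistors can be tabulated directly. The 30 mV σVt0 and the 120 nm × 100 nm minimum device below are hypothetical values, not IBM 90-nm data.

```python
import math

def sigma_vt(sigma_vt0, w, l, w_min, l_min):
    """Vt standard deviation of a W x L device, given sigma_vt0 for a minimum device
    (assumed inverse-square-root dependence on gate area)."""
    return sigma_vt0 * math.sqrt((l_min * w_min) / (l * w))

SIGMA_VT0 = 0.030               # hypothetical sigma of a minimum-size device (V)
W_MIN, L_MIN = 120e-9, 100e-9   # hypothetical minimum width and length (m)

for scale in (1.0, 1.35, 1.8):  # width multipliers, channel length unchanged
    s = sigma_vt(SIGMA_VT0, scale * W_MIN, L_MIN, W_MIN, L_MIN)
    print(f"{scale:.2f}x width -> sigma_Vt = {s * 1e3:.1f} mV")
```

Doubling the gate area reduces σVt by a factor of sqrt(2). A 1.8× width therefore buys only about a 25% reduction, which is consistent with the saturation of the sizing benefit mentioned later in the text.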
The failure probabilities for different supply voltages and transistor widths in an SRAM bit-cell are presented in Fig. 4. The numerical results are obtained using extensive Monte Carlo simulations with IBM 90-nm technology. The number of destructive read operations in the SRAM is counted using Monte Carlo simulations of 100 000 samples with local intra-die threshold voltage variations (RDF effects) at the worst global process corner (FS corner). As the supply voltage and transistor width become smaller, the probability of SRAM failure increases abruptly. Note that the increase in failures caused by voltage scaling can be compensated with a larger transistor width. For example, an SRAM bit-cell with the minimum transistor width (1.0×) at a 925-mV supply voltage has a failure probability of 1.07%, the same failure probability as a 1.35×-width bit-cell at an 850-mV supply voltage.
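The counting procedure described above (sample local Vt variations, count destructive reads) can be sketched as follows. Every number in this sketch is hypothetical. It uses a one-parameter toy model in which the read SNM at the FS corner is a nominal 120 mV minus the Vt shift of a single device. Here σVt is 50 mV for the 1.0× cell, scaled by 1/sqrt(width) as in the RDF model (1). The sketch only illustrates why the counted failure rate falls with cell width and does not reproduce Fig. 4.

```python
import math
import random

def failure_probability_mc(width_scale, n_samples=100_000, seed=0,
                           sigma_vt0=0.050, nominal_snm=0.120, sensitivity=1.0):
    """Fraction of Monte Carlo samples whose (hypothetical) read SNM is negative.

    Toy model: SNM = nominal_snm - sensitivity * dVt, dVt ~ N(0, sigma_vt), with
    sigma_vt = sigma_vt0 / sqrt(width_scale) (channel length unchanged).
    """
    rng = random.Random(seed)                     # fixed seed -> reproducible counts
    sigma_vt = sigma_vt0 / math.sqrt(width_scale)
    failures = 0
    for _ in range(n_samples):
        dvt = rng.gauss(0.0, sigma_vt)            # local (intra-die) Vt shift of one sample
        if nominal_snm - sensitivity * dvt < 0.0: # destructive read in this toy model
            failures += 1
    return failures / n_samples

for scale in (1.0, 1.35, 1.8):
    print(f"{scale:.2f}x cell: estimated failure probability {failure_probability_mc(scale):.4%}")
```

With these made-up numbers, the 1.0× estimate should land near the Gaussian tail value of about 0.8%.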
B. Video Quality Degradation for Bit Position
For the embedded SRAM used inside an H.264 system, an increase in SRAM failures leads to video quality degradation. The key observation here is that the amplitude of the quality degradation depends strongly on the bit-cell failure locations. In other words, failures in the SRAM bit-cells storing the HOBs of luma/chroma pixels result in considerable video quality degradation. In contrast, the LOB data, if affected, do not significantly deteriorate the output video quality. In this section, we quantitatively analyze the effect of the SRAM failure positions on the video quality degradation in the H.264 system.
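As a video quality measure, we use the PSNR (peak signal-to-noise ratio), which for 8-bit pixel data is

$$\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}} \qquad (2)$$

MSE is the mean square error between the original videos and the impaired videos, whose quality is degraded by the failures in embedded memory. MSE is expressed as

$$\mathrm{MSE} = \frac{1}{RC}\sum_{r=1}^{R}\sum_{c=1}^{C}\bigl(X(r,c) - Y(r,c)\bigr)^2 \qquad (3)$$

where X and Y are the original and impaired frames of R × C pixels.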
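Assuming (2) and (3) are the standard 8-bit definitions (peak value 255, MSE averaged over all pixels), the sketch below evaluates them. It also shows why bit position matters. Inverting bit i of one pixel changes its value by exactly 2^i, so its squared error is 4^i. A single flipped MSB therefore costs 4^7 = 16384 times the squared error of a flipped LSB, about 42 dB of PSNR. The 4×4 frame is a hypothetical example, and `flip_bit` is our own helper that mimics a single bit-cell failure.

```python
import math

def mse(original, impaired):
    """Mean squared error between two equally sized frames (lists of pixel rows)."""
    diffs = [(x - y) ** 2
             for row_x, row_y in zip(original, impaired)
             for x, y in zip(row_x, row_y)]
    return sum(diffs) / len(diffs)

def psnr(original, impaired, peak=255):
    """PSNR in dB for 8-bit data; identical frames give infinity."""
    err = mse(original, impaired)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)

def flip_bit(frame, row, col, bit):
    """Copy of `frame` with one stored bit inverted (a single SRAM bit-cell failure)."""
    out = [list(r) for r in frame]
    out[row][col] ^= 1 << bit
    return out

frame = [[100] * 4 for _ in range(4)]      # hypothetical flat 4x4 luma block
for bit in (0, 7):
    bad = flip_bit(frame, 0, 0, bit)
    print(f"bit {bit}: MSE = {mse(frame, bad)}, PSNR = {psnr(frame, bad):.2f} dB")
```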
Fig. 5 presents the H.264 encoder architecture [19]. The motion estimation (ME) block records the displacement with the
reference frame as a motion vector. The motion compensation
(MC) reconstructs the temporal domain frame by using the motion vectors. Here, the difference between the original data and
the reconstructed data is computed and stored in residual frame
buffer. This residual frame data is transformed to the frequency
domain and further quantized in order to reduce the spatial redundancy. The final bit-stream is generated by using the motion
vectors and quantized transform coefficients. The target application in this work is the H.264 baseline profile level 1.3, which supports the common intermediate format (CIF). Since this format has a 352 × 288 resolution under a 0.5-Mbps constant bitrate (CBR) constraint, it allows low clock frequencies with a low supply voltage.
During the H.264 encoding process, we assume that the
six embedded memories (the residual frame buffer, the reconstructed frame buffer, the reference frame buffer, the
inter-prediction buffer, the pipelined buffer for DCT, and the
quantization buffer) are utilized as buffers storing the intermediate results of the frame data, which are highlighted in
Fig. 5. In the following discussions, we assume that the SRAM
failures are equally distributed in those six different buffers,
and the video quality degradation is measured on the H.264
decoder side.
As a measure of the video quality degradation, we define ΔPSNR as the PSNR difference between the original video and the video impaired by the SRAM failures:

$$\Delta\mathrm{PSNR} = \mathrm{PSNR}_{\text{original}} - \mathrm{PSNR}_{\text{impaired}} \qquad (4)$$

Fig. 6 shows the ΔPSNR plots when the SRAM failure occurs only at one particular bit position of the 8-bit luma pixel and the 4-bit chroma pixel during the encoding process of H.264 systems. During the encoding process, the level 1.3 baseline profile encoder of the JM reference software [20], version 16.0, is used. It is configured with CIF video format, 30 fps, and a 0.5-Mbps bit-rate with adaptive quantization. More than ten reference video samples are used to obtain the average ΔPSNR. For failures at the HOBs, ΔPSNR lies in the range of about 3 dB to 24 dB as the failure probability increases. On the other hand, ΔPSNR stays between about 0 dB and 1 dB for failures in the LOBs. When the SRAM failure occurs only at the i-th order bit of an 8-bit pixel, the video quality degradation is denoted ΔPSNR_i, i = 0, 1, ..., 7. As the failure position approaches i = 7, ΔPSNR_i increases abruptly.

The analysis in Fig. 6 shows that the amplitude of the video quality degradation depends strongly on the bit-cell failure positions and the supply voltage. Therefore, the SRAM bit-cells storing the luma/chroma pixel data can be carefully sized so that the cells storing important video data are less affected by voltage scaling or process variations. Of course, the failures of the SRAM cells storing the LOB pixel data may increase; however, the resulting video quality degradation is not significant. In order to select the appropriate size for each SRAM bit-cell, we propose a priority-based SRAM sizing algorithm in the following section.

Fig. 6. ΔPSNR for different failure locations in embedded SRAM. (a) ΔPSNR changes for 8-bit luma pixel data. (b) ΔPSNR changes for 4-bit chroma pixel data.

III. AN ALGORITHM TO DETERMINE THE BEST CANDIDATE SRAM SIZING FOR H.264 MULTIMEDIA SYSTEMS

In this section, we describe an SRAM bit-cell size selection algorithm. Using the proposed algorithm, appropriate sizes of the embedded SRAM bit-cells are determined to guarantee a significant improvement in the video quality of the H.264 system in low-voltage operation. SRAM memory storing 8-bit luma pixels is considered as a case study. The proposed algorithm is also applicable to chroma pixel data or even to super-resolution video systems.

A. Problem Definition

The problem is formulated as follows: given an SRAM area constraint and a target supply voltage, determine each of the SRAM bit-cell sizes so that the video quality (in terms of PSNR or MSE) is maximized. For the SRAM storing an 8-bit luma pixel, the set of bit-cell sizes is represented as S = {s_7, s_6, ..., s_0}. The specifications of our problem are as follows.

The given area constraint is A, which means that s_7 + s_6 + ... + s_0 ≤ A must be satisfied.

There is a maximum size limit for a bit-cell. In our experiment at a target voltage of 900 mV, the maximum size of an SRAM bit-cell is 1.8 times the minimum size bit-cell. This is because the SRAM failure reduction obtained by sizing up is almost saturated beyond the 1.8× bit-cell size.

Starting from the minimum size (1.0× cell), the SRAM bit-cell is extended in minimum steps, where the minimum step size is the minimum permissible grid size of the given process. In the IBM 90-nm technology used in our experiment, the minimum permissible grid size is 5 nm. Thus, each bit-cell size is chosen from 17 different sizes between 1.0× and 1.8×.

A look-up table is also needed. The table gives the MSE contribution of a bit-cell when its size and bit location are specified.

As shown in (2), PSNR is a direct function of MSE. In this algorithmic approach, we use MSE to decide the optimal cell sizing, since a smaller MSE guarantees better video quality (PSNR). The SRAM bit-cell size search problem can therefore be considered as the problem of finding a bit-cell size set S that gives rise to the minimum MSE. Consider the hypothetical scenario of the previous section, where an SRAM failure occurs only in one bit of the 8-bit luma pixel. If the failure occurs at the i-th bit, the MSE for that specific bit is expressed as MSE_i. The amplitude of MSE_i increases as i grows, since failures in the HOBs give rise to a relatively larger video quality degradation. In our heterogeneous SRAM sizing approach, each bit-cell of the 8-bit luma pixel has a different cell size s_i, so each bit-cell has a different failure probability. We can obtain the total, MSE_total, by summing up all the MSE_i(s_i) values, i = 0, ..., 7.¹ For every possible set of the 8 bit-cell sizes, the pseudo-code for the brute-force search algorithm is shown in Algorithm 1.

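Algorithm 1: Brute-force search algorithm
Input: Area constraint A, Voltage
Output: Optimal SRAM cell sizing
Initialize the minimum MSE_total to infinity
for all cell sizings S that satisfy the area constraint do
    if MSE_total(S) is smaller than the current minimum then
        store S as the current optimal cell sizing

Algorithm 2: Cell sizing algorithm based on dynamic programming approach
Input: Area constraint A, Voltage
Output: Optimal SRAM cell sizing
Initial: k = 0
Initialize other sizing subproblems
for k = 1 to 7 do
    for each intermediate area a do
        for each candidate size s of the newly added bit-cell do
            if the candidate gives a smaller sum of MSE than the stored entry then
                update the table entry for sizing(k, a)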
However, this methodology requires a huge computational overhead, since it needs to evaluate all possible bit-cell combinations. For calculating the optimal cell sizes of the 8-bit luma pixels, the complexity is O(n^8), where n is the number of all possible bit-cell sizes from minimum to maximum. In our experiment, 17 bit-cell sizes can be chosen; therefore, a total of 17^8 sizing cases need to be considered. In the following subsection, we propose an efficient approach to find a set of bit-cell sizes that guarantees the optimal video quality with a reasonable time complexity.
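The brute-force search of Algorithm 1 can be written directly with `itertools.product`, which enumerates every assignment of a candidate size to each bit. This is a sketch, not the authors' code. The caller supplies the per-bit MSE look-up table, and cell areas are expressed in integer grid units. The 3-bit table is hypothetical: each entry is a bit weight squared (4^i) times a made-up relative failure rate. For the real problem with 8 bits and 17 sizes, the loop body would run 17^8 = 6 975 757 441 times, which is why exhaustive search is impractical.

```python
import itertools
import math

def optimal_sizing_bruteforce(mse_table, sizes, area_budget):
    """Exhaustive search over every per-bit size assignment.

    mse_table[i][j]: MSE contribution of bit i (row 0 = MSB) when it uses sizes[j].
    sizes: candidate cell areas in integer grid units. Returns (min MSE, sizes per bit).
    """
    best_cost, best_sizing = math.inf, None
    for choice in itertools.product(range(len(sizes)), repeat=len(mse_table)):
        if sum(sizes[j] for j in choice) > area_budget:
            continue                                     # violates the area constraint
        cost = sum(mse_table[i][j] for i, j in enumerate(choice))
        if cost < best_cost:
            best_cost, best_sizing = cost, tuple(sizes[j] for j in choice)
    return best_cost, best_sizing

# Hypothetical 3-bit example: sizes 1.0x, 1.5x, 2.0x expressed as 2, 3, 4 half-units.
table = [[16.0, 4.0, 1.0],      # MSB
         [4.0, 1.0, 0.25],
         [1.0, 0.25, 0.0625]]   # LSB
print(optimal_sizing_bruteforce(table, [2, 3, 4], 9))   # (3.0, (4, 3, 2))
print(17 ** 8)                                          # 6975757441 sizings for 8 bits
```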

B. Proposed Dynamic Programming-Based Algorithm


The brute-force algorithm that is presented in Section III-A
is not feasible due to its large complexity. Instead, we can use
a dynamic programming technique [21] to design a low-complexity algorithm that guarantees the same cell sizing output.
The dynamic programming technique generally solves a complex problem by breaking it down into simpler subproblems in a
bottom-up manner. At each step, it solves every subproblem just
once and then saves the answer in a table, thereby avoiding the
work of recomputing the answer every time that the subproblem
is encountered.
Let us now discuss the dynamic programming approach. A subproblem sizing is defined as follows: under a given intermediate area constraint, find the optimal cell sizing that has the minimum sum of the per-bit MSE for the partial upper bits starting from the MSB. As solutions of the subproblem, the partial optimal cell sizing and the corresponding minimum sum are obtained, where the latter denotes the minimum sum of the per-bit MSE for the upper bits that have the partial optimal cell sizes. The solutions of the subproblems are stored in a table to avoid recomputing them in the next step of the iterations.
¹Generally, the summed per-bit MSE is not the same as the total MSE of the video due to the nonlinear processing inside the H.264 system. However, we used it as an optimization metric since the SRAM cell sizing that has the minimum summed per-bit MSE generally shows better video quality (PSNR).


In the initialization step, when the bit index k is equal to zero (only the MSB is considered), the minimum sum S(0, A) can be obtained by a simple table look-up. As k becomes larger, S(k, A) is computed by accumulating the per-bit MSE value of each bit-cell having the partial optimal cell sizes. S(k, A) can be calculated by using the previously computed values S(k-1, A - a), where a is the newly added area of the bit-cell. In other words, in the iteration to compute S(k, A), the partial cell sizing that has the minimum sum of the per-bit MSE is determined by varying a, where S(k-1, A - a) is obtained from the previously calculated results. The recursive relation is expressed as the following equation:

S(k, A) = \min_{a} \left[ S(k-1, A - a) + M_k(a) \right]     (5)

where S(k, A) is the minimum sum of the per-bit MSE for the upper bits 0, ..., k (counted from the MSB) under the area constraint A, and M_k(a) means the per-bit MSE of the k-th order bit with size a. Fig. 7 shows an example of a subproblem sizing (2, 3.8). To solve the subproblem sizing (2, 3.8), the candidates from the previous iteration are checked to find the partial cell sizing that shows the minimum sum of the per-bit MSE. The resulting partial optimal cell sizing and minimum sum are stored in the table, and those values will be used in the next step of the iteration.
Algorithm 2 shows the overall flow of the optimal cell sizing selection process for a given area. As the first step of the iteration, when the bit index is zero, the table entries are initialized directly, since there is only one candidate. All other table contents (the minimum sums) are initialized to infinity, and all the partial cell sizing elements are set to zero. As mentioned, in order to find the optimal solution of a


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSI: REGULAR PAPERS, VOL. 59, NO. 10, OCTOBER 2012

Fig. 7. An example of sub-problem sizing (2, 3.8) in the proposed dynamic programming approach.
TABLE I
OPTIMAL SRAM BIT-CELL SIZES AND THEIR CORRESPONDING FAILURE PROBABILITIES (WITH THE ISO-AREA CONDITION OF 1.3× IDENTICAL SIZE SRAM UNDER 900-mV SUPPLY VOLTAGE)

sub-problem with the given area and the partial bits, the minimum sum of the per-bit MSE needs to be searched iteratively by calculating (5) in a bottom-up manner. The candidates for the minimum sum are obtained by adding the precalculated subsolutions for the previously processed upper bits and the per-bit MSE values of the newly added bit. Among the candidates, the minimum value is selected as the optimal solution. The table storing the partial optimal cell sizes is continuously updated during the iterations.
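Because the pseudo-code of Algorithm 2 did not survive extraction, the following is only a minimal sketch of a knapsack-style dynamic program that follows the description above; it is not the authors' implementation. It assumes a hypothetical table `mse[b][s]` of per-bit MSE values (b = 0 is the MSB) and integer area units; `dp_sizing`, `best` and `choice` are hypothetical names. `best[k][a]` plays the role of the stored subproblem solutions, and `choice[k][a]` keeps the partial optimal sizes for backtracking.

```python
import math
from typing import Sequence


def dp_sizing(mse: Sequence[Sequence[float]],
              area: Sequence[int],
              area_budget: int) -> tuple[float, list[int]]:
    """Bottom-up DP over (bit index, area used); hypothetical helper."""
    n_bits = len(mse)
    INF = math.inf
    # best[k][a]: min summed MSE of bits 0..k using exactly `a` area units
    best = [[INF] * (area_budget + 1) for _ in range(n_bits)]
    choice = [[-1] * (area_budget + 1) for _ in range(n_bits)]

    # Initialization: with only the MSB, each area value has a single candidate.
    for s, a_s in enumerate(area):
        if a_s <= area_budget and mse[0][s] < best[0][a_s]:
            best[0][a_s] = mse[0][s]
            choice[0][a_s] = s

    # Recursion: add bit k on top of the stored solutions for bits 0..k-1.
    for k in range(1, n_bits):
        for a in range(area_budget + 1):
            for s, a_s in enumerate(area):
                if a_s <= a:
                    cand = best[k - 1][a - a_s] + mse[k][s]
                    if cand < best[k][a]:
                        best[k][a] = cand
                        choice[k][a] = s

    # The optimum may use any total area up to the budget.
    a = min(range(area_budget + 1), key=lambda x: best[-1][x])
    total = best[-1][a]
    if total == INF:
        raise ValueError("no sizing fits the area budget")

    # Backtrack from the LSB up to the MSB using the stored choices.
    sizes = [0] * n_bits
    for k in range(n_bits - 1, -1, -1):
        sizes[k] = choice[k][a]
        a -= area[sizes[k]]
    return total, sizes


print(dp_sizing([[9.0, 3.0, 1.0], [4.0, 2.0, 1.5], [1.0, 0.8, 0.7]], [2, 3, 4], 9))
# (4.0, [2, 1, 0])
```

Each (bit, area, size) triple is visited once, so the work grows as n_bits × area_budget × n_sizes instead of n_sizes^n_bits, while the result matches an exhaustive search on the same inputs.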
As mentioned above, the brute-force search must examine every combination of the eight bit-cell sizes. Using the proposed dynamic programming approach, the time complexity is greatly reduced, and the proposed approach always produces solutions identical to those of the brute-force search algorithm. We compared the CPU computation time for the case of 17 candidate sizes. The brute-force search took more than 3 hours to get the optimal solution, whereas our proposed dynamic programming approach took only 0.2 seconds.
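The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (17 sizes, 8 bits, more than 3 hours versus 0.2 seconds); since the brute-force run took more than 3 hours, the computed ratio is a lower bound on the speedup.

```python
n_sizes, n_bits = 17, 8                   # values stated in the text
brute_force_cases = n_sizes ** n_bits     # every size choice for every bit
speedup = (3 * 60 * 60) / 0.2             # ">3 hours" versus 0.2 seconds

print(f"{brute_force_cases:,}")           # 6,975,757,441
print(f"{speedup:,.0f}x")                 # 54,000x
```

This ratio is consistent with the "approximately 50 000 times faster" figure given in the conclusion.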
IV. EXPERIMENTAL RESULTS AND LAYOUT ISSUES
A. Experimental Results
In this subsection, we present numerical comparisons of the video quality degradations when SRAMs with identical sizing and heterogeneous sizing are used for the embedded memories of an H.264 system with supply voltage scaling. For the identically sized SRAM, every bit-cell has the same failure probability. However, in the SRAM with heterogeneous sizing, which is generated by the proposed Algorithm 2, the failure

Fig. 8. Still shot images and PSNR comparisons between the identical and the proposed approach for two different videos with the iso-area condition of 1.3× under 900-mV supply voltage. (a) An image from Foreman video with the identical sizing SRAM. (b) An image from Foreman video with heterogeneous sizing SRAM. (c) An image from City video with the identical sizing SRAM. (d) An image from City video with heterogeneous sizing SRAM.

Fig. 9. Average PSNR comparisons for ten sample videos under different area constraints at 900-mV supply voltage.

occurrences in the HOBs are suppressed at the expense of increased failure probabilities in the LOBs, whereby the overall video quality is significantly improved even with very low-voltage operation. In the following, PSNR is used as the measure of video quality. Table I shows each of the optimal SRAM bit-cell sizes and its corresponding SRAM failure probability when the supply voltage is 900 mV. Our proposed algorithm is used to obtain the results under the iso-area condition of 1.3× the identical SRAM sizing. We can notice from the table that a large portion of the SRAM area is occupied by the HOBs and that the failure probabilities of the HOBs are noticeably reduced.
Fig. 10. PSNR curves per frame for sample videos. (a) PSNR curves of the 60-frame Soccer video under 900-mV (heterogeneous and identical) and 1.2-V (identical) supply voltage. (b) PSNR curves of the 60-frame Crew video under 900-mV (heterogeneous and identical) and 1.2-V (identical) supply voltage. (c) PSNR curves of 1000-frame videos (Harbour, Canoo, Mobile, Football) under 900-mV (heterogeneous and identical) and 1.2-V (identical) supply voltage. (d) PSNR curves of 1000-frame videos (Soccer, Crew, Foreman, Bus) under 900-mV (heterogeneous and identical) and 1.2-V (identical) supply voltage.

In order to observe the video quality degradations during low-voltage operation, extensive Monte Carlo simulations are first performed in the FS corner of IBM 90-nm technology with 900-mV supply voltage and a temperature of 25 °C, and the failure probabilities of the SRAM bit-cells for the two disparate SRAM schemes are obtained. Using the failure probabilities, the H.264 encoding simulations are performed with the JM reference tool [20]. In the JM reference tool, an error-generation function is inserted to emulate the SRAM failure occurrences. During the simulations, 1000 frames of ten benchmark videos [22] are processed with a 0.5-Mbps encoder output rate.
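The error-generation function inserted into the JM tool is not described in detail here. A minimal stand-in, assuming each stored bit fails independently and a failure simply flips the bit, could look like the sketch below; `inject_sram_faults` and `p_fail` are hypothetical names. A flip in bit b changes an 8-bit pixel by 2^b, so a failure in a higher-order bit costs up to (2^b)^2 in squared error, which is why suppressing HOB failures matters most.

```python
import random
from typing import Sequence


def inject_sram_faults(pixels: Sequence[int], p_fail: Sequence[float],
                       rng: random.Random) -> list[int]:
    """Emulate bit-cell failures on 8-bit pixels stored in SRAM (hypothetical).

    p_fail[b] is an assumed failure probability of the cell holding bit b,
    with b = 0 the LSB and b = 7 the MSB. Each failing cell flips its bit.
    """
    corrupted = []
    for px in pixels:
        for b in range(8):
            if rng.random() < p_fail[b]:
                px ^= 1 << b  # flipping bit b shifts the value by 2**b
        corrupted.append(px)
    return corrupted


rng = random.Random(0)
frame = [12, 200, 255, 0]
print(inject_sram_faults(frame, [0.0] * 8, rng))  # [12, 200, 255, 0]
print(inject_sram_faults(frame, [1.0] * 8, rng))  # [243, 55, 0, 255]
```

With heterogeneous sizing, `p_fail` would be small for the upper bits and larger for the lower bits; with identical sizing all eight entries would be equal.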
Fig. 8 shows still images of example videos and their corresponding PSNRs for both the identical sizing and heterogeneous sizing approaches. Under the 1.3× iso-area conditions of the two disparate SRAM structures, the proposed heterogeneous sizing scheme shows 6.37 and 6.60 dB improvements over the identical SRAM sizing for the Foreman and City videos, respectively. Under the iso-area condition, for ten 1000-frame benchmark videos [22], the average PSNR improvement of the proposed SRAM structure at 900-mV supply voltage is 4.49 dB.
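To give a feel for what a PSNR gain in decibels means, the sketch below uses the standard PSNR definition for 8-bit samples (peak value 255). The helper names are hypothetical, and because the paper averages PSNR over frames and videos, translating the 4.49-dB average into a single MSE factor is only a rough reading.

```python
import math


def psnr_8bit(mse: float) -> float:
    """PSNR in dB for 8-bit samples, whose peak value is 255."""
    return 10.0 * math.log10(255.0 ** 2 / mse)


def mse_reduction_factor(gain_db: float) -> float:
    """Factor by which the MSE must shrink to raise PSNR by gain_db."""
    return 10.0 ** (gain_db / 10.0)


print(round(psnr_8bit(1.0), 2))              # 48.13
print(round(mse_reduction_factor(4.49), 2))  # 2.81
```

In other words, a 4.49-dB improvement corresponds to roughly a 2.8-fold lower mean squared error for a frame.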
In Fig. 9, the PSNRs of the ten sample videos under various
SRAM iso-area conditions are also presented. The figure shows
that our proposed scheme shows better PSNR for different area
constraints. In Fig. 10, the PSNR changes for various sample
videos up to 1000 frames under three conditions (heterogeneous
sizing at 900 mV, identical sizing at 900 mV, and identical sizing
at 1.2 V) are also demonstrated. Even at the low 900-mV supply voltage, the proposed SRAM structure provides noticeably better video quality. Table II also shows the video quality degradations with more aggressive voltage scaling.

TABLE II
PSNR FOR FOREMAN VIDEO IN VARIOUS LOW-VOLTAGE OPERATIONS (UNDER THE ISO-AREA CONDITION OF 1.3×)
In order to show the video quality improvements for different output bit-rates, we changed the output bit-rate from 256 kbps to 768 kbps and compared the video qualities for the Foreman and Soccer videos. Fig. 11 shows the experimental results. The heterogeneous sizing approach was also applied to higher-resolution videos. 4CIF resolution (704 × 576) videos are used as an example, and Fig. 12 shows the comparisons of four different sample videos between the conventional and heterogeneous SRAM sizing approaches.


Fig. 13. Comparison between the proposed heterogeneous SRAM sizing scheme and hybrid 8T/6T SRAM architecture [7].
TABLE III
POWER COMPARISON FOR THE EQUI-PSNR CONDITIONS WITH VARIOUS VIDEOS (UNDER THE ISO-AREA CONDITION OF 1.3×)

Fig. 11. Video quality (PSNR) versus output bit-rate graphs for (a) Foreman and (b) Soccer videos.

Fig. 12. PSNR comparison for 4CIF video sequences under 1.3× iso-area condition at 900-mV supply voltage.

The proposed approach is compared with the hybrid SRAM sizing scheme [7], where the upper 4-bit HOBs are designed with minimum-sized 8T SRAM and the lower 4-bit LOBs with minimum-sized 6T SRAM. The different SRAM architectures are compared under the iso-area condition of 1.15×. Worst-case failures of the SRAM bit-cells are assumed; for the 8T SRAM, the worst case appears at the SF (slow-NMOS and fast-PMOS) corner with write failures. Fig. 13 shows the PSNR comparisons between the proposed SRAM architecture and the hybrid 8T/6T SRAM when they are used for the embedded memories of H.264 processors. Even though the hybrid SRAM scheme shows better video quality than the identical sizing under the iso-area condition, our proposed heterogeneous SRAM sizing presents the best PSNR.
The heterogeneously sized SRAM can also operate at a lower supply voltage than the conventional SRAM while maintaining the same PSNR levels of videos. To measure the power consumption in low-voltage operation, the proposed 64-kb SRAM was implemented using IBM 90-nm technology, and Table III shows the numerical results. With 810-mV supply voltage, the proposed SRAM approach presents approximately the same PSNR values as the conventional SRAM operating at 900 mV. For the numerical results presented in Table III, the PSNR numbers are obtained using the JM reference tool with the error-generation function, and the power consumption is measured with SPICE-level simulations at 20 MHz using the sample video sequences. As a result, the proposed SRAM structure shows an average power saving of 38.4% over the conventional SRAM architecture.
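For comparison, the sketch below computes what a first-order dynamic-power model alone would predict for the 900-mV to 810-mV step, assuming P = αCV²f with activity, capacitance and frequency held fixed; `dynamic_power_ratio` is a hypothetical helper. The measured average saving (38.4%) is larger than this estimate; the simple CV²f model ignores, for example, leakage, which also depends strongly on supply voltage.

```python
def dynamic_power_ratio(v_new: float, v_ref: float) -> float:
    """P_new / P_ref for P = alpha * C * V**2 * f with alpha, C and f unchanged."""
    return (v_new / v_ref) ** 2


saving = 1.0 - dynamic_power_ratio(0.81, 0.90)
print(f"{saving:.1%}")  # 19.0%
```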
In smaller process technologies, the heterogeneous sizing scheme shows even larger advantages. Fig. 14 presents PSNR comparisons for four sample videos when the 32-nm predictive technology model (PTM) [24] is used. Under 750-mV supply voltage, the proposed approach shows an average PSNR improvement of 5.65 dB over the identical sizing.

Fig. 14. PSNR comparisons for four sample videos when the 32-nm PTM model is used with two different supply voltages (750 mV and 800 mV) under the 1.3× iso-area condition. The transistor sizes (in nm) of the minimum SRAM are specified, and both inter-die threshold voltage variation (in mV, for both NMOS and PMOS) and intra-die threshold voltage variation (in mV, for NMOS and PMOS) are considered [25] for the simulations.

B. Layout Example and Layout Issues
A layout example of the heterogeneous SRAM for an 8-bit pixel is presented in Fig. 15. Eight heterogeneously sized bit-cells are placed on one word-line, and all bit-cells have the same height. The width of each bit-cell varies with the bit position, as decided by the proposed Algorithm 2. As mentioned, only simple modifications are needed in the SRAM bit-cells and peripheral circuitry to adopt the proposed heterogeneous SRAM architecture. Other SRAM architectures supporting low-voltage operation, such as the 8T SRAM [4] and the priority-based 8T/6T hybrid SRAM [7], require complex circuitry since two separate word-lines are used for the read and write operations. Therefore, combining the single-ended 8T SRAM structure with the double-ended 6T SRAM becomes a rather cumbersome design challenge. Moreover, in 8T SRAM architectures, only a small number of bit-cells can share a common bit-line due to the single-ended structure [4], which significantly degrades the area efficiency. Our proposed heterogeneous SRAM approach provides an easy-to-design architecture and is applicable to various embedded memories where the data stored in the memory differ greatly in importance.

Fig. 15. Layout of heterogeneous size SRAM for 8-bit pixel.

V. CONCLUSION
Supply voltage scaling is widely used to reduce the power consumption of CMOS VLSI systems. In the embedded SRAM memory of an H.264 system, supply voltage scaling leads to SRAM failures, which give rise to serious video quality degradation. In this paper, we propose a heterogeneous SRAM cell sizing architecture, where the more important HOB data is stored in relatively larger SRAM bit-cells to reduce SRAM failures, and the less important LOBs are stored in smaller bit-cells. In order to find the optimal size for each of the SRAM bit-cells, we also propose a low-complexity algorithm based on dynamic programming, which is approximately 50 000 times faster than the brute-force search approach in selecting the SRAM sizes for 8-bit pixel data. In the H.264 system, our proposed heterogeneous embedded SRAM achieves an average PSNR improvement of 4.49 dB compared to the identical SRAM cell sizing under the 1.3× iso-area condition at 900-mV supply voltage. The proposed SRAM sizing approach offers attractive design techniques, and the ideas presented in this paper can assist the design of embedded SRAM and its low-power implementation with low-voltage operation.

ACKNOWLEDGMENT
The authors would like to thank the IC Design Education Center (IDEC) for its software assistance.

REFERENCES
[1] C. P. Lin et al., "A 5 mW MPEG4 SP encoder with 2D bandwidth-sharing motion estimation for mobile applications," in Proc. ISSCC Dig. Tech. Papers, Feb. 2006, pp. 1626-1635.
[2] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484, Apr. 1992.
[3] I. J. Chang et al., "Fast and accurate estimation of SRAM read and hold failure probability using critical point sampling," IET Circuits, Devices, Syst., vol. 4, no. 6, pp. 469-478, Nov. 2010.
[4] L. Chang et al., "A 5.3 GHz 8T-SRAM with operation down to 0.41 V in 65 nm CMOS," in Symp. VLSI Circuits Dig., Jun. 2007, pp. 252-253.
[5] I. J. Chang et al., "A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 2, pp. 650-658, Feb. 2009.
[6] A.-T. Do et al., "An 8T differential SRAM with improved noise margin for bit-interleaving in 65 nm CMOS," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 6, pp. 1252-1263, Jun. 2011.
[7] I. Chang, D. Mohapatra, and K. Roy, "A priority-based 6T/8T hybrid SRAM architecture for aggressive voltage scaling in video applications," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 2, pp. 101-112, Feb. 2011.
[8] K. Osada et al., "16.7 fA/cell tunnel-leakage-suppressed 16 Mb SRAM for handling cosmic-ray-induced multi-errors," in ISSCC Dig. Tech. Papers, Feb. 2003, pp. 302-303.
[9] A. K. Agarwal and S. Nassif, "The impact of random device variation on SRAM cell stability in sub-90-nm CMOS technologies," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 1, pp. 86-97, Jan. 2008.
[10] A. Bhavnagarwala, X. Tang, and J. D. Meindl, "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658-665, Apr. 2001.
[11] Y. Taur and T. H. Ning, Fundamentals of Modern VLSI Devices. Cambridge, U.K.: Cambridge Univ. Press, 2002.
[12] "H.264: Advanced video coding for generic audiovisual services," Telecommunication Standardization Sector (ITU-T) [Online]. Available: http://www.itu.int/rec/T-REC-H.264-201003-I/en
[13] E. Seevinck, F. J. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," IEEE J. Solid-State Circuits, vol. 22, no. 5, pp. 748-754, Oct. 1987.
[14] K. Takeda et al., "Redefinition of write-margin for next generation SRAM and write-margin monitoring circuit," in ISSCC Dig. Tech. Papers, Feb. 2006, pp. 630-631.
[15] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, "Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 24, no. 12, pp. 1859-1880, Dec. 2005.
[16] A. J. Bhavnagarwala, X. Tang, and J. D. Meindl, "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658-665, Apr. 2001.
[17] X. Tang, V. De, and J. D. Meindl, "Intrinsic MOSFET parameter fluctuations due to random dopant placement," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 4, pp. 369-376, Dec. 1997.
[18] V. Gupta and M. Anis, "Statistical design of the 6T SRAM bit cell," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 1, pp. 93-104, Jan. 2010.
[19] S. Chien et al., "Hardware architecture design of video compression for multimedia communication systems," IEEE Commun. Mag., vol. 43, no. 8, pp. 122-131, Aug. 2005.
[20] H.264/AVC Reference Software JM 16.0, Joint Video Team (JVT) [Online]. Available: http://iphome.hhi.de/suehring/tml
[21] T. Cormen et al., "Dynamic programming," in Introduction to Algorithms, 3rd ed. Cambridge, MA: MIT Press, 2009, pp. 323-369.
[22] Video Test Sequence Database [Online]. Available: ftp.tnt.uni-hannover.de/pub/svc/testsequences/
[23] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards, 2nd ed. Norwell, MA: Kluwer, 1997.
[24] Nanoscale Integration and Modeling (NIMO) Group, Predictive Technology Model (PTM) Website [Online]. Available: http://ptm.asu.edu
[25] S.-T. Zhou, S. Katariya, H. Ghasemi, S. Draper, and N. Kim, "Total area optimization for ultra-low voltage SRAM with sizing, redundancy, and error correction codes," in Proc. IEEE Int. Conf. Comput. Design, Oct. 2010, pp. 112-117.

Jinmo Kwon (S'11) received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Korea, in 2009 and 2011, respectively.
He joined the System IC Division of LG Electronics Corporation, Seoul, as a Research Engineer in 2011. His research interests include low-power circuits and systems for digital signal processing and video signal processing.

Ik Joon Chang received the B.S. degree (summa cum laude) in electrical engineering from Seoul National University, Seoul, Korea, and the M.S. and Ph.D. degrees from the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, in 2005 and 2009, respectively.
After receiving the Ph.D. degree, he joined the NAND Flash Design Team, Samsung Electronics, Hwaseong-shi, Korea.
Dr. Chang was awarded by the Samsung Scholarship Foundation in 2005.

Insoo Lee received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Korea, in 2009 and 2011, respectively.
He joined the Research and Development Division, Hynix Semiconductor Inc., Gyeonggi-do, Korea, as a Research Engineer in 2011. His research interests include low-power VLSI digital signal processing design for phase change memory.

Heemin Park received the B.S. and M.S. degrees in computer science from Sogang University, Seoul, Korea, in 1993 and 1995, respectively, and the Ph.D. degree in electrical engineering from the University of California, Los Angeles, in 2006.
He is an Assistant Professor in the Department of Multimedia Science, Sookmyung Women's University, Seoul. Before joining Sookmyung Women's University in 2010, he was with Samsung Electronics in Korea as a Principal Engineer and Technical Leader of the Image Processor Design Group for mobile phone cameras. His research interests include a broad range of multimedia, web science, ubiquitous and entertainment computing with a focus on imaging and sensing technologies.

Jongsun Park (M'05) received the B.S. degree in electronics engineering from Korea University, Seoul, Korea, in 1998 and the M.S. and Ph.D. degrees in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2000 and 2005, respectively.
He joined the Electrical Engineering Faculty of Korea University, Seoul, in 2008. From 2005 to 2008, he was with the Signal Processing Technology Group, Marvell Semiconductor Inc., Santa Clara, CA. He was also with the Digital Radio Processor System Design Group, Texas Instruments, Dallas, TX, in the summer of 2002. His research interests focus on variation-tolerant, low-power and high-performance VLSI architectures and circuit designs for digital signal processing and digital communications.

Design and Implementation of 32nm FINFET based 4x4 SRAM Cell Array using 1-bit 6T SRAM

Lourts Deepak A., Student, M.Sc in VLSI System Design, M.S. Ramaiah School of Advanced Studies, Bangalore, India. lourdu.lourdu@gmail.com
Likhitha Dhulipalla, Student, M.Sc in VLSI System Design, M.S. Ramaiah School of Advanced Studies, Bangalore, India. likhitha.ec@gmail.com

Abstract-

Static Random Access Memory (SRAM) is designed to plug two needs: i) the SRAM serves as cache memory, communicating between the central processing unit and Dynamic Random Access Memory (DRAM); ii) SRAM technology acts as a driving force for low-power applications, since SRAM is portable compared to DRAM and SRAM doesn't require any refresh current. In this paper, we've illustrated the design and implementation of a FINFET based 4x4 SRAM cell array by means of a one-bit 6T SRAM. It has been carried out by FINFET HSPICE modeling with read and write operations of the SRAM memory.

Keywords- SRAM; FINFET; 1-BIT 6T SRAM; 4X4 6 Transistor SRAM Cell

978-1-4673-0074-2/11/$26.00 ©2011 IEEE

I. INTRODUCTION
For the past two to three decades, CMOS IC technologies have been scaled down continuously and have forcefully entered the nanometer region. In many designs, the need for memory has increased vastly, from consumer goods to industrial applications. This increases the necessity of improving memories in a single chip with the help of nanometer technologies. There are lots of applications, and integrated memories, especially SRAM cells, are being improved using nanotechnology.

The SRAM cell is very useful for storing a single bit of data. The static random access memory (SRAM) can preserve its data as long as power is applied. Moreover, an SRAM cell doesn't need to be refreshed and is faster than Dynamic RAM; its operation is simple because the row and column address signals are loaded concurrently, and it has a low data retention current. This property differentiates SRAM from Dynamic RAM (DRAM), which requires periodic refreshes to preserve its data. In an array of cells, each cell can be read or written in any order, as the term "random access" implies, no matter which cell was accessed last. An SRAM cell is made up of two cross-coupled inverters. A low operating voltage is sufficient to operate an SRAM, which the design associates with increased cell stability and a long memory life.

Continuous shrinking of the channel length has enabled high-speed devices and very-large-scale circuits. This steady miniaturization of transistors with each new generation of bulk CMOS technology has yielded continual improvement in the performance of digital circuits. The scaling of bulk CMOS, however, faces significant challenges in the future due to fundamental material and process technology limits. 32 nm FINFET based transistors are used as an alternative solution to the Si-MOSFET with scaled device geometry. In these device structures, short-channel effects can be controlled, which limits the off-state leakage.

II. BACKGROUND THEORY

A. 6T SRAM Cell
A static RAM cell is capable of holding a data bit as long as power is applied to the circuit. It consists of a central storage cell made up of two cross-coupled inverters and two access transistors which provide the read and write operations. The conducting state of the access transistors is controlled by the word-line control signal. Whenever the word line is high, both access transistors conduct and provide the ability to write or read a data bit. When the word line is low, the access transistors isolate the storage cell. The access transistors connect the storage cell input/output nodes to the data lines (bit lines), which are complementary to each other. In a read operation, the bit lines start pre-charged to some reference voltage, usually close to the positive supply voltage [1].

B. FINFET
The continuous down-scaling of bulk CMOS creates major issues due to its base material. The primary obstacles to the scaling of bulk CMOS to 32nm gate lengths include short-channel effects, sub-threshold leakage, gate-dielectric leakage and device-to-device variations. FINFET based designs, however, offer better control over short-channel effects, low leakage and better yield [2] at 32nm, which helps to overcome the obstacles in scaling.

Double-Gate devices have been used in a variety of innovative ways in digital and analog circuit designs. A parallel transistor pair consists of two transistors with their source and drain terminals tied together. In Double-Gate (DG) FINFETs, the second gate is added opposite to the traditional gate; DG FINFETs have been recognized for their potential to better control short-channel effects as well as leakage current. The operating modes of the FINFET are identified as the short-gate (SG) mode, with the transistor gates tied together; the independent-gate (IG) mode, where independent digital signals are used to drive the two device gates; the low-power (LP) mode, where the back gate is tied to a reverse-bias voltage to reduce leakage power; and the hybrid (IG/LP) mode, which employs a combination of the low-power and independent-gate modes. Here, independent control of the front and back gates in a DG FINFET can be effectively used to improve performance and reduce power consumption. Independent gate control can be used to merge parallel transistors in non-critical paths.
III. DESIGN OF 1-BIT 6T SRAM MEMORY CELL [3]
The 6T SRAM cell that stores one bit of information is shown in Figure 1. The cell consists of two CMOS inverters where the output of each is fed as input to the other; this loop stabilizes the inverters in their respective states.

Figure 1: 6T SRAM cell using FINFET

The access transistors and the word and bit lines, WL and BL, are used to write to and read from the cell. In standby mode, the access transistors are turned off by making the word line low, and the two inverters hold complementary states: for example, when the PMOS of the left inverter is on, its output potential is high and the PMOS of the second inverter is off. The gates of the access transistors that connect the bit lines to the inverter nodes are driven by the word line; if the word line is kept low, the cell is disconnected from the bit lines.

A. Read and Write Operation of SRAM Cell
A high word line makes the NMOS access transistors conduct and connects the inverter inputs and outputs to the two vertical bit lines. The inverters drive the value stored inside the memory cell onto the bit line and its inverse onto the inverted bit line; this data forms the output value of the SRAM cell during a read operation.

In a write operation, the bit lines are strongly driven by the input drivers to write the data into the memory. Depending on the current stored value, there might be a short-circuit condition before the value in the SRAM is overwritten. Figure 2 shows the simulation output of the read and write operations of the 1-bit 6T SRAM cell.

Figure 2: Read and Write operation of 1-bit SRAM Memory cell

B. Design of 2:4 Decoder [4]
A decoder is a digital logic circuit which takes multiple coded inputs and converts them into multiple coded outputs, where the input and output codes are different. Decoding is necessary in applications like data multiplexing, 7-segment displays and memory address decoding. The simplest example of a decoder is an AND gate: its output is high only when all the inputs are high, which is also known as an active-high output. A slightly more complex decoder is the n-to-2^n binary decoder, which converts n coded inputs into 2^n unique outputs.

Figure 3: 2:4 decoder designs using NOT and AND gate

Figure 3 illustrates the decoder circuit for 2 coded inputs to 4 coded outputs. It consists of 2 NOT logic gates and 4 AND logic gates. The operation of the 2:4 decoder is further clarified by the relationship between its inputs S0, S1 and outputs Y0, Y1, Y2, Y3, given in Table I below:


Table I: 2:4 Decoder truth table

S1  S0  |  Y0  Y1  Y2  Y3
 0   0  |   1   0   0   0
 0   1  |   0   1   0   0
 1   0  |   0   0   1   0
 1   1  |   0   0   0   1

From this truth table, we can observe that the output variables are mutually exclusive, because only one output can be equal to 1 at a particular time. The output line whose value is equal to 1 represents the minterm equivalent of the binary number presently on the input lines. Figure 4 illustrates the output result of the 2:4 decoder circuit.
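The NOT/AND structure of Figure 3 can be checked with a small gate-level sketch. It assumes, as in Table I above, that output Yi goes high for the binary input S1S0 = i; the function name `decoder_2to4` is hypothetical, and the transistor count uses the per-gate figures given in Section IV (6 per AND gate, 2 per NOT gate).

```python
def decoder_2to4(s1: int, s0: int) -> tuple[int, int, int, int]:
    """Gate-level 2:4 decoder: two NOT gates feeding four 2-input AND gates."""
    n1, n0 = 1 - s1, 1 - s0  # the two NOT gates
    y0 = n1 & n0             # minterm S1'S0'
    y1 = n1 & s0             # minterm S1'S0
    y2 = s1 & n0             # minterm S1 S0'
    y3 = s1 & s0             # minterm S1 S0
    return y0, y1, y2, y3


# 4 AND gates x 6 transistors + 2 NOT gates x 2 transistors
DECODER_TRANSISTORS = 4 * 6 + 2 * 2

# Print one row per input combination, in the order of Table I.
for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, *decoder_2to4(s1, s0))
```

Exactly one AND gate sees both of its inputs high for any input pair, which is the mutual-exclusivity property noted above.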

Pll DJll]]lJVVlLJlLJll
I
jll
,

fl

:!

____

L___

i-I

OOTI'H!
_

___

l_

__

r
I I
II II
I I ________

J ,,

__

Figure 5 : Designed 4x4 SRAM cell array

Input-output buffers are also required for each column

r-'
I I
I I
I I
, I
J \.

as the decoder selects only one row of the array, the other
cells may generate glitch, this can be nullified by the buffers.
Also a 4-bit OR can be used to combine all the output of single
SRAM cells of each column to make a single output data.
The figure 5 shows the 4x4 SRAM cell array design

111 r, /n r 1_._.
=
J 11;:::==::;:
:::;:
=:::;:
:
: ","UT l
>-

__1> ___

t:.

...,J12

___

r-,!

___

... )
__

\...

consists of 6T one-bit SRAM cells, decoders, buffers and 4-input OR logic gates. The total number of transistors used in this 4x4 SRAM cell array is 156.

Figure 4: 2:4 decoder simulation result

Component | No. of transistors
SRAM cell array | 96
2:4 decoders | 28
Input buffers | 16
Output buffers | 16
Total | 156

IV. 4x4 SRAM CELL ARRAY DESIGN [5]

This section describes the design of a 4x4 SRAM cell array of 4 rows and 4 columns. Each block of the array is a 6T SRAM cell, and the 4 rows and 4 columns together form the 4x4 SRAM cell array. A decoder placed ahead of the array addresses its rows. Since each row consists of 4 cells, it holds half a byte. An AND-based 2:4 decoder generates the address lines. The decoder circuit uses 28 transistors: four AND gates of 6 transistors each and two NOT gates of 2 transistors each.

The address lines that form the outputs of the decoder are connected to the rows of the array. The input and output data control consists of write and read circuitry. The decoder selects an address in the array, and 4 bits of data are written or read in parallel on D0 to D3.

A. Read and Write Operation of the 4x4 SRAM Cell Array

At the start of a read operation, the decoder is in inactive mode. As soon as the decoder is enabled, it is precharged first, which drives all of its outputs high for a short time. This transient address is invalid. The address then settles according to the decoder inputs, and one particular SRAM cell is activated. Activating the read enable (RE) signal enables the read buffer, and the data of the selected SRAM cell travel towards the read buffer. Thus the data bit is read from the memory cell. To continue the read operation, the address bits are changed to address the next memory cell.

During a write operation, the address is selected and the data are applied to the write circuit as input. When the write enable (WE) signal is activated, the write buffer output changes according to the input. The feedback action in the SRAM cell then stabilizes the stored data. The WE signal is then disabled to complete the write safely and to avoid writing further spurious data. To write to other cells, the address bits are changed and the same procedure is repeated as many times as required. Figure 6 shows the simulation output of the 4x4 SRAM cell array.
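The Python sketch below is a rough, logic-level illustration of the decoder and array behaviour described above. It models the AND-based 2:4 decoder and a 4x4 array whose column outputs pass through 4-input OR gates. It rests on three assumptions. First, p is taken as the decoder MSB. Second, each OR gate combines the word-line-gated cell outputs of its column. Third, precharge, timing and transistor-level effects are ignored. The function and class names are hypothetical.

```python
from typing import List


def decode_2to4(p: int, q: int) -> List[int]:
    """AND-based 2:4 decoder. Returns word lines indexed by row: [WL0, WL1, WL2, WL3]."""
    not_p, not_q = 1 - p, 1 - q  # the two NOT gates
    return [not_p & not_q,       # WL0 = p'q'
            not_p & q,           # WL1 = p'q
            p & not_q,           # WL2 = pq'
            p & q]               # WL3 = pq


def msb_first(word_lines: List[int]) -> str:
    """Format [WL0..WL3] as the string WL3 WL2 WL1 WL0, e.g. [0, 1, 0, 0] -> '0010'."""
    return "".join(str(b) for b in reversed(word_lines))


class SramArray4x4:
    """Logic-level model: 4 rows selected by WL0..WL3, 4 columns bl3..bl0."""

    def __init__(self) -> None:
        # cells[row] holds the 4-bit word of that row, ordered [bl3, bl2, bl1, bl0]
        self.cells = [[0, 0, 0, 0] for _ in range(4)]

    def write(self, p: int, q: int, word: List[int], write_enable: bool = True) -> None:
        """With WE active, store `word` in the row selected by the decoder."""
        if not write_enable:
            return
        for row, selected in enumerate(decode_2to4(p, q)):
            if selected:
                self.cells[row] = list(word)

    def read(self, p: int, q: int, read_enable: bool = True) -> List[int]:
        """With RE active, each output is a 4-input OR of (WL_row AND cell) down a column."""
        if not read_enable:
            return [0, 0, 0, 0]
        wl = decode_2to4(p, q)
        return [int(any(wl[row] & self.cells[row][col] for row in range(4)))
                for col in range(4)]


if __name__ == "__main__":
    array = SramArray4x4()
    for (p, q), word in (((0, 1), [1, 1, 0, 1]), ((1, 0), [0, 1, 1, 1])):
        array.write(p, q, word)
        out = "".join(str(b) for b in array.read(p, q))
        print(f"p,q = {p},{q}  WL = {msb_first(decode_2to4(p, q))}  D = {out}")
```

For p, q = 0, 1 the model selects WL = 0010 and reads back D = 1101. For p, q = 1, 0 it selects WL = 0100 and reads back D = 0111. These are the same patterns reported in the results discussion of this paper.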

V. RESULTS AND DISCUSSION

The output of the decoder is used as the word line WL of the SRAM. The bit lines BL, from bl3 to bl0, are common to the SRAM cells of each column. Selecting a WL and a BL activates one cell of the SRAM. That cell then performs either a read or a write operation, depending on which enable signal is active. Figure 6 shows the overall inputs and results of the 4x4 SRAM cell array.

This simulation result shows the following behaviour:
- For decoder inputs p, q of 0, 1 and 1, 0, the decoder outputs (word lines WL) are 0010 and 0100.
- For bit-line inputs BL of 1101 and 0111, the outputs of the SRAM array after the OR gates are 1101 and 0111.

The outputs therefore follow the BL inputs, which confirms the read operation of the 4x4 SRAM array.

Figure 6: 4x4 SRAM cell array simulation result

VI. CONCLUSION

A FinFET-based 1-bit SRAM was designed and simulated first. A 2:4 decoder was then designed using the 32 nm FinFET, and its simulated output was verified. All sub-blocks, such as the 1-bit 6-transistor SRAM and the 2:4 decoder, were designed in HSPICE for the FinFET. These sub-blocks were then integrated into a 4x4 SRAM cell array. Finally, the output of the SRAM array was verified, as shown in Figure 6.

REFERENCES

[1] S. M. Kang and Y. Leblebici, "CMOS Digital Integrated Circuits," TMH Publishing Company Limited, 2007.
[2] Y. Omura, S. Cristoloveanu, F. Gamiz, and B.-Y. Nguyen, "Silicon-on-Insulator Technology and Devices 14, Issue 4 - Advanced FinFET Devices for sub-32nm Technology Nodes: Characteristics and Integration Challenges," 2009, pp. 45-54.
[3] F. Wang, Y. Xie, K. Bernstein, and Y. Luo, "Dependability Analysis of Nano-Scale FinFET Circuits," in Proc. IEEE Emerging VLSI Technologies and Architectures, 2006.
[4] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, "Digital Integrated Circuits: A Design Perspective," Pearson Education, Third edition, 2005.
[5] Sedra and Smith, "Microelectronic Circuits," Oxford University Press, Fifth edition, 2004.

High Density and Low Leakage Current Based 5T SRAM Cell Using 45 nm Technology

Shyam Akashe
Assistant Professor
Institute of Technology and Management, Gwalior
vlsi.shyam@gmail.com

Sushil Bhushan
M-TECH VLSI
Institute of Technology and Management, Gwalior, M.P., INDIA
er.sushil.bhushan@gmail.com

Sanjay Sharma
Department of Electronics & Communication
Thapar University
Patiala, Punjab, INDIA.
sanjay.sharma@thapar.edu
Abstract- This paper is based on the observation of a CMOS five-transistor SRAM cell (5T SRAM cell) for very high density and low power applications. This cell retains its data with leakage current and positive feedback, without a refresh cycle. The 5T SRAM cell uses one word-line, one bit-line and an extra read-line for control. The new cell is 21.66% smaller than a conventional six-transistor SRAM cell using the same design rules, with no performance degradation. Simulation and analytical results show that the proposed cell operates correctly during read/write. They also show that the delay of the new cell is 70.15% smaller than that of a six-transistor SRAM cell. In Cadence 45 nm technology, the new 5T SRAM cell has 72.10% less leakage current than the 6T SRAM memory cell.

Keywords- 5T SRAM cell, cell delay, cell leakage, cell area, power consumption.

I. INTRODUCTION

Fast low-power SRAMs have become a critical component of many VLSI chips. This is especially true for microprocessors, where on-chip memory sizes grow with each generation to bridge the widening gap between processor and main-memory speeds. Power dissipation has become an important consideration because of increased integration, higher operating speeds and the explosive growth of battery-operated appliances. The leakage current of the memory increases with its capacity, so more power is consumed even in standby mode. These on-chip memories are usually implemented as arrays of densely packed SRAM cells for high performance [1]. A six-transistor SRAM cell (6T SRAM cell) is conventionally used as the memory cell [2]. However, the 6T SRAM cell is an order of magnitude larger than a DRAM cell, which results in low memory density [2]. Therefore, conventional SRAMs that use the 6T SRAM cell have difficulty meeting the growing demand for larger memory capacity in mobile applications [2].

Studies show that the power dissipated by the cells is usually a significant part of the total chip power [1]. Cell accesses consume a significant fraction (30-60%) of the total power dissipation in modern microprocessors [3]. A large portion of the cell energy is dissipated in driving the bit-lines, which are heavily loaded with multiple storage cells [3]. Clearly, the memory cells are the most attractive targets for power reduction [1]. Moreover, in cell accesses an overwhelming majority of the written and read bits are '0'. In the conventional SRAM cell, however, one of the two bit-lines must be discharged to low regardless of the written value. The power consumption for writing '0' and for writing '1' is therefore generally the same [1]. The conventional SRAM cell also uses a differential read bit-line pair during the read operation, so one of the two bit-lines must be discharged regardless of the stored data value [3]. Bit-line transitions therefore always occur when writing '0' and reading '0'. Since an overwhelming majority of the written and read bits are '0', these transitions cause high dynamic power consumption during read/write operations in the conventional SRAM cell.

The read static noise margin (SNM) is an important parameter of the SRAM cell. The read SNM indicates the stability of the cell during the read operation. It is further degraded by supply-voltage scaling and transistor mismatch. Read operations at low read SNM levels destroy the stored data in SRAM cells [4].

In response to these challenges in the conventional SRAM cell, our objective is a read-static-noise-margin-free SRAM cell with five transistors. This cell should reduce the cell area while improving performance and power consumption. In designing this new cell, we exploit the strong bias towards zero at the bit level that the memory value stream of ordinary programs exhibits.

978-1-4673-0074-2/11/$26.00 ©2011 IEEE

II. READ STATIC NOISE MARGIN AND SRAM CELL CURRENT IN CONVENTIONAL SRAM CELLS

The SRAM cell current and the read static noise margin (SNM) are two important parameters of an SRAM cell. The read SNM indicates the stability of the cell during the read operation, and the SRAM cell current determines the delay time of the SRAM cell [4]. Fig. 1 shows the SRAM cell current in the conventional SRAM cell. Degradation of the SRAM cell current only increases the bit-line (BL) delay time. Read SNM degradation, however, results in data destruction during read operations [4].

Figure 1. SRAM cell current in 6T SRAM cell.

Both the read SNM and the SRAM cell current depend strongly on the driving capability of the access NMOS transistor. The read SNM decreases as the driving capability increases, while the SRAM cell current increases [4]. In other words, the two are inversely correlated [4]. Thus, in the conventional SRAM cell, the read SNM and the cell current cannot be adjusted separately.

One strategy for resolving this inverse correlation is to separate the data-retention element from the data-output element. With this separation, there is no correlation between the read SNM and the SRAM cell current. Based on this strategy, [5] presents a dual-port SRAM cell. That cell, however, is composed of eight transistors and has 30% greater area than a conventional 6T SRAM cell [4]. Another strategy is loop-cutting during the read operation. Based on this strategy, [4] presents a read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications. To avoid the inverse correlation between SRAM cell current and read SNM, we propose a new five-transistor SRAM cell. The proposed cell is based on the loop-cutting strategy. It also exploits the observation that, in ordinary programs, most of the bits in memory cells are zeroes for both the data and instruction streams. This new cell makes it possible to achieve both low-VDD and high-speed operation with no area overhead.

III. CELL DESIGN CONCEPT

Fig. 2 shows a circuit equivalent to the developed 5T SRAM cell, using a supply voltage of 1.1 V in the 45-nm technology node. In the idle mode of the cell, no read or write operation is performed on it. The feedback-cutting transistor (M5) is then ON and pulls the N node to VDD.

When '1' is stored in the cell, M3 and M2 are ON and there is positive feedback between the ST node and the STB node. M2 therefore pulls the ST node to VDD, and M3 pulls the STB node to GND.

When '0' is stored in the cell, M4 is ON. Since M5 maintains the N node at VDD, the STB node is pulled to VDD, and M2 and M3 are OFF. For data retention without a refresh cycle, the following condition must then be satisfied:

$$I_{DS,M1} > I_{SD,M2} + I_{gate,M4} + I_{gate,M3}$$

To satisfy this condition when '0' is stored in the cell, we use the leakage current of the access transistor (M1), especially its sub-threshold current. For this purpose, in the idle mode of the cell the bit-line is maintained at GND and the word-line at VIdle. Fig. 3 shows the leakage current of the cell during idle mode for data retention when '0' is stored. Most of the leakage current of the access transistor (M1) is sub-threshold current, since this transistor is maintained in the sub-threshold region.

Figure 2. New 5T SRAM cell in 45-nm technology node.
Figure 3. Leakage current in idle mode when '0' stored in cell.
Figure 4. Waveform of new cell during write cycle.

A Cadence Virtuoso simulation with VDD = 1.1 V gives the following result. If, in the idle mode of the cell, the bit-line is maintained at GND and VIdle = 0.2 V, then '0' data remain stored in the cell without a refresh cycle. The above condition is therefore satisfied in idle mode. The Cadence Virtuoso parameters are obtained from the latest predictive technology model for the 45-nm technology node [6].
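The retention condition above can be checked numerically. The Python sketch below is a minimal illustration that assumes the standard textbook sub-threshold current model for the access transistor M1. All parameter values are hypothetical placeholders, not the paper's 45-nm PTM parameters or simulated currents. This covers the threshold voltage, I0, the slope factor n, the 50 mV ST-node voltage, and the M2, M3 and M4 leakage values. Only VIdle = 0.2 V comes from the text.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q at about 300 K, in volts


def subthreshold_current(v_gs: float, v_ds: float,
                         v_th: float = 0.45, i_0: float = 1e-7, n: float = 1.5) -> float:
    """Textbook sub-threshold drain current of an NMOS transistor, in amperes.

    I = I0 * exp((Vgs - Vth) / (n * Vt)) * (1 - exp(-Vds / Vt))
    The defaults for v_th, i_0 and n are placeholders, not 45-nm PTM values.
    """
    vt = THERMAL_VOLTAGE
    return i_0 * math.exp((v_gs - v_th) / (n * vt)) * (1.0 - math.exp(-v_ds / vt))


def retains_zero(i_ds_m1: float, i_sd_m2: float,
                 i_gate_m4: float, i_gate_m3: float) -> bool:
    """Retention condition for a stored '0': I_DS,M1 > I_SD,M2 + I_gate,M4 + I_gate,M3."""
    return i_ds_m1 > i_sd_m2 + i_gate_m4 + i_gate_m3


if __name__ == "__main__":
    # Idle mode: bit line (source of M1) at GND, word line (gate of M1) at VIdle = 0.2 V.
    # The ST node is assumed to sit 50 mV above ground (hypothetical value).
    i_m1 = subthreshold_current(v_gs=0.2, v_ds=0.05)
    # Hypothetical leakage values for M2, M4 and M3, chosen only for illustration.
    ok = retains_zero(i_m1, i_sd_m2=1e-11, i_gate_m4=1e-12, i_gate_m3=1e-12)
    print(f"I_DS(M1) = {i_m1:.2e} A, '0' retained: {ok}")
```

In this model, M1's current grows exponentially with its gate voltage. A small positive idle word-line voltage therefore strengthens the pull-down that holds the stored '0' against the leakage flowing into the ST node.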

IV. READ AND WRITE OPERATION

During a write operation, the feedback-cutting transistor is ON and pulls the N node to VDD, so the read-line is maintained at GND throughout the write. When a write operation is issued, the memory cell goes through the following steps.

A. Bit-line driving
For a write, the data are driven onto the bit-line (BL), and then the word-line (WL) is asserted to VDD.

B. Cell flipping
This step includes two states:
1) Data is zero: the NMOS access transistor (M1) pulls the ST node down to GND. The load transistor (M4) therefore turns ON and pulls the STB node up to VDD.
2) Data is one: the NMOS access transistor (M1) pulls the ST node up to VDD-VTN. The drive transistor (M3) therefore turns ON and pulls the STB node down to GND. The load transistor (M2) then turns ON, and M2 and M3 create positive feedback.

C. Idle mode
At the end of the write operation, the cell goes to idle mode, and the word-line and bit-line are set to VIdle and GND, respectively.

Figure 5. Possible circuit schematic of sense amplifier.
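The Python sketch below walks through write steps A to C as an idealised, logic-level illustration, not a circuit simulation. It assumes instantaneous transitions and ideal rail voltages. The class name FiveTCell and the VTN value are hypothetical. Only VDD = 1.1 V and VIdle = 0.2 V are taken from the text.

```python
from dataclasses import dataclass
from typing import List

VDD = 1.1     # supply voltage of the 5T cell, from the text (V)
V_IDLE = 0.2  # idle word-line voltage, from the Cadence result in the text (V)
VTN = 0.4     # hypothetical NMOS threshold voltage, for illustration only (V)


@dataclass
class FiveTCell:
    """Idealised, logic-level view of the 5T cell nodes (not a circuit simulation)."""
    wl: float = V_IDLE      # word-line
    bl: float = 0.0         # bit-line
    read_line: float = 0.0  # read-line; GND keeps the feedback-cutting transistor ON
    st: float = 0.0         # ST storage node
    stb: float = VDD        # STB storage node

    @property
    def stored_bit(self) -> int:
        return int(self.st > VDD / 2)

    def write(self, bit: int) -> List[str]:
        """Apply write steps A to C and return a short log of each phase."""
        log: List[str] = []
        # A. Bit-line driving: data onto BL first, then WL asserted to VDD.
        self.read_line = 0.0
        self.bl = VDD if bit else 0.0
        self.wl = VDD
        log.append(f"A: BL = {self.bl:.2f} V, WL = {self.wl:.2f} V")
        # B. Cell flipping through the NMOS access transistor M1.
        if bit:
            self.st = VDD - VTN  # M1 passes only VDD - VTN ...
            self.stb = 0.0       # ... enough to turn M3 ON and pull STB to GND;
            self.st = VDD        # M2 then turns ON and the M2/M3 feedback restores ST to VDD
        else:
            self.st = 0.0        # M1 pulls ST to GND,
            self.stb = VDD       # so M4 turns ON and pulls STB to VDD
        log.append(f"B: ST = {self.st:.2f} V, STB = {self.stb:.2f} V")
        # C. Idle mode: WL back to VIdle and BL back to GND.
        self.wl, self.bl = V_IDLE, 0.0
        log.append(f"C: WL = {self.wl:.2f} V, BL = {self.bl:.2f} V")
        return log


if __name__ == "__main__":
    cell = FiveTCell()
    for step in cell.write(1):
        print(step)
    print("stored:", cell.stored_bit)
```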


When a read operation is issued, the memory cell goes through the following steps.

a) Bit-line discharging: for a read, the bit-line is discharged to GND and then floated. The read-line is maintained at VDD during the read operation, so the feedback-cutting transistor (M5) is OFF.
b) Word-line activation: in this step the word-line is asserted to VDD, and two states can be considered.
c) Voltage of ST node is high: the NMOS access transistor pulls the bit-line up to a high voltage. We refer to this bit-line voltage as VBL-high.
d) Voltage of ST node is low: the voltages of the bit-line and the ST node equalize.
e) Sensing: the word-line is deactivated to VIdle and the read-line returns to GND. The sense amplifier is then turned on to read the data on the bit-line. Fig. 5 shows a possible circuit schematic of the sense amplifier used for reading data from the new cell.
f) Idle mode: at the end of the read operation, the cell goes to idle mode and the bit-line is set to GND.

Figure 6. Waveform of new cell during read cycle.

V. CELL AREA

Fig. 7 shows the layout of the 6T SRAM cell and Fig. 8 shows the layout of the 5T SRAM cell, both in scalable CMOS design rules. The 6T SRAM cell has the conventional layout topology and is as compact as possible. The 6T SRAM cell requires an area of 3.438 µm², whereas the 5T SRAM cell requires 2.69 µm². These numbers do not take into account the potential area reduction from sharing with neighbouring cells. Therefore the new cell is 21.66% smaller than a conventional six-transistor cell using the same design rules.

Figure 7. Layout of 6T cell using 45 nm technology.
Figure 8. Layout of 5T cell using 45 nm technology.

VI. LEAKAGE CURRENT

The novel 5T SRAM cell retains its data in two different ways. When zero is stored, it relies on the leakage current of the access transistor. When one is stored, it relies on positive feedback. Thus in idle mode, when '1' is stored in the cell, there is positive feedback. M2, M3 and the feedback-cutting transistor (M5) are ON, and the access transistor is held in the sub-threshold region. In this state there is a path from the supply voltage to ground, and power is dissipated. Fig. 9 shows this path when '1' is stored in the cell.

Figure 9. Path from supply voltage to ground when '1' stored in cell.

In ordinary programs, most of the bits in memory cells are zeroes for both the data and instruction streams. This behavior has been shown to persist for a variety of programs, under different assumptions about memory cell sizes, organization and instruction set architectures [7] [8]. Thus most of the bit values resident in the data and instruction memory cells are zero. Based on these observations, we simulated the average idle-mode leakage current of the 5T SRAM cell and of the conventional 6T SRAM cell using 45 nm technology.

TABLE I. COMPARISON BETWEEN 5T & 6T LEAKAGE CURRENT

S. No. | Parameter | Leakage in 5T | Leakage in 6T | Better performance
1 | For write 1 in STB node | (illegible) | (illegible) | 6T
2 | For write 0 in STB node | (illegible) | (illegible) | 5T
3 | For write 1 in ST node | (illegible) | (illegible) | 5T
4 | For write 0 in ST node | (illegible) | (illegible) | 5T

Legible value fragments in the source table: 9.84 nA, 94.11 nA, 551 nA, 920 nA.

The above table shows that the 5T SRAM cell has lower leakage than the 6T SRAM cell for write 0 at the STB node, write 1 at the ST node and write 0 at the ST node. The 5T cell is therefore better than the 6T cell for writing data into the SRAM cell.

VII. CELL DELAY

The delay of a cell is the time taken from its input (BL) to its output [10]. The cell delays of the 5T and 6T cells are compared in the table below.

TABLE II. COMPARISON BETWEEN 5T & 6T CELL DELAY

S. No. | Parameter | Delay of 5T | Delay of 6T | Better performance
1 | Delay at STB | 14.24 ps | 47.72 ps | 5T
2 | Delay at ST | 2.453 ns | 0.839 ns | 6T

This table shows that the 5T cell delay at the STB node is less than the 6T cell delay at the STB node. Because the output is taken from the STB node, the 5T cell is better than the 6T cell.

VIII. POWER CONSUMPTION

The power consumption of the SRAM memory cell depends on the power consumed by the transistors used for the operation [9]. Table III compares the power consumption of the 5T and 6T cells.
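The 70.15% delay reduction quoted in the abstract can be traced to the STB-node entries of Table II. The Python helper below performs the relative-difference arithmetic, with the Table II values entered by hand. The helper name is our own.

```python
def relative_reduction(new: float, reference: float) -> float:
    """Return how much smaller `new` is than `reference`, in percent.

    A negative result means `new` is larger (worse, for a delay).
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 * (reference - new) / reference


# Entries of Table II (5T versus 6T cell delay).
DELAY_STB_PS = {"5T": 14.24, "6T": 47.72}  # picoseconds
DELAY_ST_NS = {"5T": 2.453, "6T": 0.839}   # nanoseconds

if __name__ == "__main__":
    stb = relative_reduction(DELAY_STB_PS["5T"], DELAY_STB_PS["6T"])
    st = relative_reduction(DELAY_ST_NS["5T"], DELAY_ST_NS["6T"])
    print(f"STB node: 5T delay is {stb:.2f}% lower than 6T")  # 70.16%
    print(f"ST node: 5T delay changes by {st:.2f}% relative to 6T")
```

At the STB node the result is about 70.16%, which the abstract reports as 70.15%. At the ST node the result is negative, reflecting that the 6T cell is faster there.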

TABLE III. COMPARISON BETWEEN 5T & 6T POWER CONSUMPTION

S. No. | Parameter | Power consumption of 5T | Power consumption of 6T
1 | Power consumed at STB node for writing 0 | 9.8 nW | 0.011 pW
2 | Power consumed at ST node for writing 1 | (illegible) | (illegible)
3 | Power consumed at STB node for writing 1 | 85.01 nW | (illegible)
4 | Power consumed at ST node for writing 0 | 9.8 nW | 28 pW

IX. CONCLUSION

With the aim of achieving a high-density, low-leakage memory cell, we developed a 5T SRAM cell. The key observation behind our design is that the cell leakage is determined by the node whose transistor is off. Under the same design rules, the proposed cell is 21.66% smaller than the 6T SRAM cell, with a 28.57% speed improvement. The leakage current of the new cell during memory access is 72.10% lower than that of the 6T SRAM cell. However, the proposed cell incurs a power consumption penalty.

ACKNOWLEDGEMENTS

This work was supported by ITM University Gwalior, in collaboration with Cadence Design Systems, Bangalore. The authors would also like to thank Professor S. Akashe for enlightening technical advice.

REFERENCES

[1] Y. J. Chang, F. Lai, and C. L. Yang, "Zero-Aware Asymmetric SRAM Cell for Reducing Cache Power in Writing Zero," IEEE Transactions on Very Large Scale Integration Systems, vol. 12, no. 8, pp. 827-836, 2004.
[2] A. Kotabe, K. Osada, N. Kitai, M. Fujioka, S. Kamohara, M. Moniwa, S. Morita, and Y. Saitoh, "A Low-Power Four-Transistor SRAM Cell With a Stacked Vertical Poly-Silicon PMOS and a Dual-Word-Voltage Scheme," IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 870-876, 2005.
[3] L. Villa, M. Zhang, and K. Asanovic, "Dynamic zero compression for cache energy reduction," in Proc. 33rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 214-220, 2000.
[4] K. Takeda et al., "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications," IEEE Journal of Solid-State Circuits, vol. 41, no. 1, pp. 113-121, 2006.
[5] L. Chang et al., "Stable SRAM cell design for the 32 nm node and beyond," in Symp. VLSI Technology Dig., pp. 128-129, Jun. 2005.
[6] http://www.eas.asu.edu/~ptm & W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45nm design exploration," IEEE Transactions on Electron Devices, vol. 53, no. 11, pp. 2816-2823, 2006.
[7] N. Azizi, F. Najm, and A. Moshovos, "Low-leakage asymmetric-cell SRAM," IEEE Transactions on Very Large Scale Integration Systems, vol. 11, no. 4, pp. 701-715, 2003.
[8] A. Moshovos, B. Falsafi, F. N. Najm, and N. Azizi, "A Case for Asymmetric-Cell Cache Memories," IEEE Transactions on Very Large Scale Integration Systems, vol. 13, no. 7, pp. 877-881, 2005.
[9] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, Prentice Hall, 2002.
[10] K. Martin, Digital Integrated Circuit Design, Oxford University Press, New York, 2000.

Specific Power Illustration of Proposed 7T SRAM with 6T SRAM Using 45 nm Technology

Shyam Akashe
Assistant Professor
Institute of Technology and Management, Gwalior, M.P.
vlsi.shyam@gmail.com

Shishir Rastogi
M-TECH VLSI
Institute of Technology and Management, Gwalior, M.P.
shishir.vlsi@gmail.com

Sanjay Sharma
Department of Electronics & Communication
Thapar University
Patiala, Punjab, INDIA.
sanjay.sharma@thapar.edu
Abstract- This paper is based on the observation of various CMOS seven-transistor SRAM cells for very high density and low power applications. These cells retain their data with leakage current and positive feedback, without a refresh cycle. The 7T SRAM cells use one word-line, one bit-line and an NMOS transistor for control. Simulation and analytical results show that the proposed cells operate correctly during read/write. They also show that the delay of the new cell is 70.15% smaller than that of a six-transistor SRAM cell. In Cadence 45 nm technology, the new 7T SRAM cells have 72.10% less leakage current than the 6T SRAM memory cell. Their power consumption during read and write operations is approximately 20.34% lower than that of the conventional 6T SRAM memory cell.

Keywords- Various 7T SRAM cell, cell delay, cell leakage, cell area, power consumption

I. INTRODUCTION

Fast low-power SRAMs have become a critical component of many VLSI chips. This is especially true for microprocessors, where on-chip memory sizes grow with each generation to bridge the widening gap between processor and main-memory speeds. Power dissipation has become an important consideration because of increased integration, higher operating speeds and the explosive growth of battery-operated appliances. The leakage current of the memory increases with its capacity, so more power is consumed even in standby mode. These on-chip memories are usually implemented as arrays of densely packed SRAM cells for high performance [1]. A six-transistor SRAM cell (6T SRAM cell) is conventionally used as the memory cell [2]. However, the 6T SRAM cell is an order of magnitude larger than a DRAM cell, which results in low memory density [2]. Therefore, conventional SRAMs that use the 6T SRAM cell have difficulty meeting the growing demand for larger memory capacity in mobile applications [2].

Studies show that the power dissipated by the cells is usually a significant part of the total chip power [1]. Cell accesses consume a significant fraction (30-60%) of the total power dissipation in modern microprocessors [3]. A large portion of the cell energy is dissipated in driving the bit-lines, which are heavily loaded with multiple storage cells [3]. Clearly, the memory cells are the most attractive targets for power reduction [1]. Moreover, in cell accesses an overwhelming majority of the written and read bits are '0'. In the conventional SRAM cell, however, one of the two bit-lines must be discharged to low regardless of the written value. The power consumption for writing '0' and for writing '1' is therefore generally the same [1]. The conventional SRAM cell also uses a differential read bit-line pair during the read operation, so one of the two bit-lines must be discharged regardless of the stored data value [3]. Bit-line transitions therefore always occur when writing '0' and reading '0'. Since an overwhelming majority of the written and read bits are '0', these transitions cause high dynamic power consumption during read/write operations in the conventional SRAM cell.

The read static noise margin (SNM) is an important parameter of the SRAM cell. The read SNM indicates the stability of the cell during the read operation. It is further degraded by supply-voltage scaling and transistor mismatch. Read operations at low read SNM levels destroy the stored data in SRAM cells [4].

In response to these challenges in the conventional SRAM cell, our objective is a read-static-noise-margin-free SRAM cell with seven transistors. This cell should reduce the cell area while improving performance and power consumption. In designing this new cell, we exploit the strong bias towards zero at the bit level that the memory value stream of ordinary programs exhibits.

II. READ STATIC NOISE MARGIN AND SRAM CELL CURRENT IN CONVENTIONAL SRAM CELLS

The SRAM cell current and the read static noise margin (SNM) are two important parameters of an SRAM cell. The read SNM indicates the stability of the cell during the read operation, and the SRAM cell current determines the delay time of the SRAM cell [4]. Fig. 1 shows the SRAM cell current in the conventional SRAM cell. Degradation of the SRAM cell current only increases the bit-line (BL) delay time. Read SNM degradation, however, results in data destruction during read operations [4].

Both the read SNM and the SRAM cell current depend strongly on the driving capability of the access NMOS transistor. The read SNM decreases as the driving capability increases, while the SRAM cell current increases [4]. In other words, the two are inversely correlated [4]. Thus, in the conventional SRAM cell, the read SNM and the cell current cannot be adjusted separately.

One strategy for resolving this inverse correlation is to separate the data-retention element from the data-output element. With this separation, there is no correlation between the read SNM and the SRAM cell current. Based on this strategy, [5] presents a dual-port SRAM cell. That cell, however, is composed of eight transistors and has 30% greater area than a conventional 6T SRAM cell [4]. Another strategy is loop-cutting during the read operation. Based on this strategy, [4] presents a read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications. To avoid the inverse correlation between SRAM cell current and read SNM, we propose a new seven-transistor SRAM cell. The proposed cell is based on the loop-cutting strategy. It also exploits the observation that, in ordinary programs, most of the bits in memory cells are zeroes for both the data and instruction streams. This new cell makes it possible to achieve both low-VDD and high-speed operation with no area overhead.

Figure 1. SRAM cell current in 6T SRAM cell.

III. CELL DESIGN CONCEPT

Fig. 2 shows a circuit equivalent to the developed 7T SRAM cell, using a supply voltage of 1 V in the 45-nm technology node. In the idle mode of the cell, no read or write operation is performed on it, and the word-line transistors are held at VDD by the 1 V supply.

When '1' is stored in the cell, M3 and M2 are ON and there is positive feedback between the ST node and the STB node. M2 therefore pulls the ST node to VDD, and M3 pulls the STB node to GND.

When '0' is stored in the cell, M4 is ON. Since M5 maintains the N node at VDD, the STB node is pulled to VDD, and M2 and M3 are OFF. Data retention without a refresh cycle then requires a leakage-current condition to be satisfied. To satisfy this condition when '0' is stored, we use the leakage current of the access transistor (M1), especially its sub-threshold current. For this purpose, in the idle mode of the cell the bit-line is maintained at GND and the word-line at VIdle.

Fig. 3 shows the new proposed 7T2 SRAM cell and Fig. 4 shows the new proposed 7T3 SRAM cell.

Figure 2. New 7T1 SRAM cell in 45-nm technology node.
Figure 3. New proposed 7T2 SRAM cell.
Figure 4. New proposed 7T3 SRAM cell.
Figure 7. Waveform of new proposed 7T3 during write cycle.

A Cadence Virtuoso simulation with VDD = 1 V gives the following result. If, in the idle mode of the cell, the bit-line is maintained at GND, then '0' data remain stored in the cell without a refresh cycle. The retention condition is therefore satisfied in idle mode. The Cadence Virtuoso parameters are obtained from the latest predictive technology model for the 45-nm technology node [6].
IV. READ AND WRITE OPERATION

During a write operation, the feedback-cutting transistor is ON and pulls the N node to VDD, so the read-line is maintained at GND throughout the write. When a write operation is issued, the memory cell goes through the following steps.
1) Bit-line driving: for a write, the data are driven onto the bit-line (BL), and then the word-line (WL) is asserted to VDD.
2) Cell flipping: this step includes two states.
a) Data is zero: the NMOS access transistor (M1) pulls the ST node down to GND. The load transistor (M4) therefore turns ON and pulls the STB node up to VDD.
b) Data is one: the NMOS access transistor (M1) pulls the ST node up to VDD-VTN. The drive transistor (M3) therefore turns ON and pulls the STB node down to GND. The load transistor (M2) then turns ON, and M2 and M3 create positive feedback.
3) Idle mode: at the end of the write operation, the cell goes to idle mode, and the word-line and bit-line are set to VIdle and GND, respectively.

Figure 5. Waveform of new proposed 7T1 during write cycle.
When a read operation is issued, the memory cell goes through the following steps.
1) Bit-line discharging: for a read, the bit-line is discharged to GND and then floated. The read-line is maintained at VDD during the read operation, so the feedback-cutting transistor is OFF.
2) Word-line activation: in this step the word-line is asserted to VDD, and two states can be considered.
a) Voltage of ST node is high: the NMOS access transistor pulls the bit-line up to a high voltage. We refer to this bit-line voltage as VBL-High.
b) Voltage of ST node is low: the voltages of the bit-line and the ST node equalize.
3) Sensing: the word-line is deactivated to VIdle and the read-line returns to GND. The sense amplifier is then turned on to read the data on the bit-line. Fig. 8 shows a possible circuit schematic of the sense amplifier used for reading data from the new cell.
4) Idle mode: at the end of the read operation, the cell goes to idle mode and the bit-line is set to GND.

Figure 6. Waveform of new proposed 7T2 during write cycle.
Figure 8. Possible circuit schematic of sense amplifier.

V. CELL AREA

Fig. 10 shows the layout of the 6T SRAM cell and Fig. 11 shows the layout of the 7T SRAM cell, both in scalable CMOS design rules. The 6T SRAM cell has the conventional layout topology and is as compact as possible. The 6T SRAM cell requires an area of 3.438 µm², whereas the 7T SRAM cell requires 4.41 µm². These numbers do not take into account the potential area reduction from sharing with neighboring cells. Therefore the 6T cell size is 21.66% smaller than the conventional six-transistor cell using the same design rules.

Figure 10. Layout of 6T cell using 45 nm technology.
Figure 11. Layout of 7T cell using 45 nm technology.

VI.

LEAKAGE CURRENT

In one state, novel 7T SRAM cell must retains its data


using the leakage current of the access transistor (when zero
stored) and in the other state the 7T SRAM cell must retains its
data using positive feedback (when one stored). Thus in idle

time (n

Figure 9.

mode when ' l' stored in cell, there is positive feedback and M2,

Waveform of new 7T cell during read cycle.

367

COMPARISON AMONG VARIOUS 7T & 6T CELL

TABLE II.

M3 and feedback cutting (M5) transistors are ON and access

DELAY

transistor maintained in sub-threshold region. In this state there


is a path from supply voltage to ground and power dissipated.
Fig. 12 shows this path when '1' stored in cell.

SRAM

S.No.

Delay

cell

STB

ST

6T

47.72 ps

50.54 ps

7Tl

39.18 ps

39.18 ps

7T2

34.53 ps

28.96 ps

7T3

31.48 ps

31.48 ps

Table II shows that the delay of the 7T3 cell at the STB node (31.48 ps) is lower than that of the 6T cell at the STB node (47.72 ps). Because the output is taken from the STB node, this makes 7T3 faster than the 6T cell.
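The comparison above can be quantified directly from the Table II entries. The minimal sketch below computes the fractional STB-node delay reduction of each 7T variant relative to the 6T cell; the dictionary simply transcribes Table II, and the helper names are illustrative, not from the paper.

```python
"""Relative STB-node delay reduction of the 7T cells versus the 6T cell (Table II)."""

# Delays transcribed from Table II, in picoseconds: (STB node, ST node).
DELAYS_PS = {
    "6T": (47.72, 50.54),
    "7T1": (39.18, 39.18),
    "7T2": (34.53, 28.96),
    "7T3": (31.48, 31.48),
}


def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is smaller than `baseline` (0.25 means 25% smaller)."""
    return (baseline - value) / baseline


def stb_reduction_vs_6t(delays: dict) -> dict:
    """STB-node delay reduction of every non-6T cell relative to the 6T cell."""
    baseline_stb = delays["6T"][0]
    return {
        cell: relative_reduction(baseline_stb, stb)
        for cell, (stb, _st) in delays.items()
        if cell != "6T"
    }


if __name__ == "__main__":
    for cell, reduction in stb_reduction_vs_6t(DELAYS_PS).items():
        print(f"{cell}: STB delay {100 * reduction:.1f}% lower than 6T")
```

With the tabulated values, the 7T3 reduction evaluates to roughly 34%, and the ordering 7T3 > 7T2 > 7T1 follows directly from the STB column.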
VIII.

POWER CONSUMPTION

The power consumption of an SRAM memory cell depends on the power consumed by the transistors used for each operation [9]. The power consumption of each cell is given in Table III, which shows that the power
Figure 12. Path from supply voltage to ground when '1' stored in cell.
In ordinary programs, most of the bits in memory cells are zeroes for both the data and instruction streams. It has been shown that this behavior persists for a variety of programs under different assumptions about memory cell sizes, organization and instruction set architectures [7] [8]. Thus, most of the bit values resident in the data and instruction memory cells are zero. Based on these observations, we simulated the average leakage current in idle mode of the 7T SRAM cells and the conventional 6T SRAM cell using 45nm technology.

TABLE I. COMPARISON BETWEEN VARIOUS 7T & 6T LEAKAGE CURRENT

Transistor   Leakage Current   Better
6T           0.9067 µA         7T3
7T1          0.7092 µA         7T3
7T2          0.3155 µA         7T3
7T3          0.1189 µA         7T3

TABLE III. COMPARISON AMONG VARIOUS 7T & 6T POWER CONSUMPTION

Cell   STB '1'    STB '0'    ST '1'     ST '0'
6T     101 nW     101.3 nW   101.5 nW   102.3 nW
7T1    58.8 nW    59.4 nW    56.3 nW    52.56 nW
7T2    71.2 nW    74.3 nW    78.56 nW   81.23 nW
7T3    90.4 nW    95.6 nW    98.91 nW   93.93 nW
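The argument that most stored bits are zero can be made concrete with an expected-value calculation: a cell's average idle leakage is the leakage in each state weighted by how often that state occurs. The sketch below assumes, as the leakage discussion suggests, that the '1' state (which has a supply-to-ground path) leaks more than the '0' state; the per-state values are hypothetical placeholders, not data from this paper.

```python
"""Expected idle-mode leakage of an asymmetric cell, weighted by how often it stores '0'.

Per-state leakage values below are hypothetical placeholders, not data from this paper.
"""


def expected_cell_leakage(p_zero: float, leak_zero: float, leak_one: float) -> float:
    """Average leakage of one cell when it stores '0' with probability p_zero.

    leak_zero / leak_one: idle leakage (any consistent unit) with '0' / '1' stored.
    """
    if not 0.0 <= p_zero <= 1.0:
        raise ValueError("p_zero must lie in [0, 1]")
    return p_zero * leak_zero + (1.0 - p_zero) * leak_one


def expected_array_leakage(n_cells: int, p_zero: float, leak_zero: float, leak_one: float) -> float:
    """Total expected idle leakage of n_cells cells, assuming identical statistics per cell."""
    return n_cells * expected_cell_leakage(p_zero, leak_zero, leak_one)


if __name__ == "__main__":
    # Hypothetical per-state leakage in uA; the '1' state has a supply-to-ground path.
    LEAK_ZERO, LEAK_ONE = 0.05, 0.40
    for p in (0.5, 0.8, 0.9):
        print(f"p_zero={p}: {expected_cell_leakage(p, LEAK_ZERO, LEAK_ONE):.3f} uA per cell")
```

The more skewed the bit distribution is toward zeros, the more an asymmetric cell with a low-leakage '0' state benefits.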

consumption of 7T1 is lower than that of all the other proposed SRAM cells, and that all proposed 7T SRAM cells consume less power during write and read operations than the 6T SRAM cell.
The above tables show that the leakage of the 7T3 SRAM cell for write 0 at the STB node, write 1 at the ST node and write 0 at the ST node is lower than that of the 6T SRAM cell. This shows that 7T3 is better than 6T for writing data into the SRAM cell.

VII.

CELL DELAY

The cell delay is the time taken for a signal to propagate through the cell from its input (BL) to its output. A comparison of the cell delay of the 7T and 6T cells is given in Table II.

IX.

CONCLUSION

The key observation behind our design is that the cell leakage is determined by the node at which the transistor is off. Under the same design rules, the proposed cell area is 21.66% larger than that of the 6T SRAM cell, with a 28.57% speed improvement. The leakage current of the new cell during memory cell access is 72.10% lower than that of the 6T SRAM cell, and its power consumption during read and write operations is approximately 20.34% lower than that of the conventional 6T SRAM memory cell.

REFERENCES
[1] Y. J. Chang, F. Lai, and C. L. Yang, "Zero-Aware Asymmetric SRAM Cell for Reducing Cache Power in Writing Zero," IEEE Transactions on Very Large Scale Integration Systems, vol. 12, no. 8, pp. 827-836, 2004.
[2] A. Kotabe, K. Osada, N. Kitai, M. Fujioka, S. Kamohara, M. Moniwa, S. Morita, and Y. Saitoh, "A Low-Power Four-Transistor SRAM Cell With a Stacked Vertical Poly-Silicon PMOS and a Dual-Word-Voltage Scheme," IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 870-876, 2005.
[3] L. Villa, M. Zhang, and K. Asanovic, "Dynamic zero compression for cache energy reduction," in Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 214-220, 2000.
[4] K. Takeda et al., "A read-static-noise-margin-free SRAM cell for low VDD and high-speed applications," IEEE Journal of Solid-State Circuits, vol. 41, no. 1, pp. 113-121, 2006.
[5] L. Chang et al., "Stable SRAM cell design for the 32 nm node and beyond," in Symp. VLSI Technology Dig., pp. 128-129, Jun. 2005.
[6] http://www.eas.asu.edu/~ptm and W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45nm design exploration," IEEE Transactions on Electron Devices, vol. 53, no. 11, pp. 2816-2823, 2006.
[7] N. Azizi, F. Najm, and A. Moshovos, "Low-leakage asymmetric-cell SRAM," IEEE Transactions on Very Large Scale Integration Systems, vol. 11, no. 4, pp. 701-715, 2003.
[8] A. Moshovos, B. Falsafi, F. N. Najm, and N. Azizi, "A Case for Asymmetric-Cell Cache Memories," IEEE Transactions on Very Large Scale Integration Systems, vol. 13, no. 7, pp. 877-881, 2005.
[9] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, Prentice Hall, 2002.

JOURNAL OF COMPUTERS, VOL. 4, NO. 7, JULY 2009, 2009 ACADEMY PUBLISHER.

2012 Second International Conference on Advanced Computing & Communication Technologies

Impact of Design Parameter on SRAM Bit Cell

Jayram Shrivas

Shyam Akashe

M-Tech VLSI Design


Institute of Technology and Management
Gwalior (M.P) INDIA
Email: jrm.shrivas@gmail.com

Associate Professor
Electronics & Instrument Engineering Department
Institute of Technology and Management
Gwalior (M.P), INDIA
Email: vlsi.shyam@gmail.com

Abstract- The SRAM bit-cell sleep technique is widely used in processors to reduce SRAM leakage power. However, the significance of the leakage power savings from the SRAM bit-cell sleep technique depends on process technology and various design parameters. This paper evaluates the effects of design parameters such as inverse temperature dependence (ITD), dynamic voltage scaling (DVS) and data retention voltage (VDCMIN_RET) on the performance of the 7T SRAM bit-cell sleep technique. The impact of process technology on the performance of the SRAM bit-cell sleep technique, due to the transition from silicon dioxide (SiO2) to a hafnium-based high-K gate dielectric material, is also discussed. Hafnium is a chemical element found in zirconium minerals; its atomic number is 72. Silicon measurement results of a 3MB SRAM array designed in a 45nm high-K CMOS process are used to demonstrate the reduced effectiveness of the SRAM bit-cell sleep technique.

leakage power in 7T SRAM bit-cells. A sleep transistor is used in the SRAM bit-cell sleep technique to scale (or lower) the voltage across the SRAM bit-cell, thus reducing SRAM bit-cell leakage power. Previous work [3], [4] on SRAM bit-cell sleep describes the merits of different bit-cell sleep implementation schemes. Zhang et al. [5] quantify the leakage power savings from a bit-cell sleep scheme based on a discrete SRAM chip designed in a 65nm process technology. In this paper, we analyze the effect of design parameters and process technology on the effectiveness of the SRAM bit-cell sleep scheme.

Keywords-SRAM, CMOS, 7T SRAM bit cell, SOC

I.

INTRODUCTION

SRAM is an important part of modern microprocessor design, taking a large portion of the total chip area and power. With the advantages of high speed and ease of use, static random access memory (SRAM) has been widely used in systems-on-chip (SoC). To achieve higher reliability and longer battery life for portable applications, a low-power SRAM array is a necessity. The sum of the power consumption in the decoders, bit lines, data lines, sense amplifiers and periphery circuits represents the active power consumption. The 7T SRAM cell reduces the activity factor of discharging the bit-line pair to perform a write operation [1]. Over the last few years, devices at 180nm have been manufactured, and the deep sub-micron/nano range of 45nm is foreseen to be reached in the very near future. A seven-transistor (7T) SRAM cell configuration, amenable to the small feature sizes encountered in the deep sub-micron/nano CMOS ranges, is proposed in this paper. The schematic and layout of the proposed 7T SRAM cell are shown in Fig.1 and Fig.2, respectively.

Fig.1 Schematic of proposed 7T SRAM Cell

In this era of green computing, there is increased attention on power reduction, since power is constraining products from realizing their full potential. Supply voltage scaling is the most useful technique to reduce leakage power. Scaling (or reducing) the voltage has a significant impact on leakage power, since transistor leakage current decreases exponentially with supply voltage [2]. The SRAM bit-cell sleep technique extends this concept of voltage scaling to reduce
978-0-7695-4640-7/12 $26.00 2012 IEEE
DOI 10.1109/ACCT.2012.63

Fig.2 Layout of proposed 7T SRAM Cell


II.

SRAM BIT CELL OVERVIEW

SRAM bit-cell sleep techniques use a sleep transistor to lower the voltage across the 7T SRAM bit-cell. A sleep transistor is usually a high-threshold-voltage (Vt) PMOS transistor [6] inserted between the regular power supply and the SRAM bit-cell, as illustrated in Fig-3. In a typical PMOS-based sleep scheme, the voltage across the SRAM bit-cell, and the resulting bit-cell leakage, depend on the sleep transistor width (Z). The voltage across the SRAM bit-cell (VDC_SRAM) decreases as the sleep transistor width decreases, because a narrower sleep transistor has a higher resistance and therefore a larger voltage drop across it. Hence, a smaller-width (Z) sleep transistor is used to reduce bit-cell leakage, since a lower VDC_SRAM results in lower SRAM bit-cell leakage. In summary, SRAM bit-cell sleep effectiveness increases as VDC_SRAM or the sleep transistor width decreases. A wake-up transistor is connected in parallel with the sleep transistor to minimize the exit latency from the sleep state and to mitigate performance degradation due to the voltage drop across the sleep transistor.

Fig.4 Output waveform of Bit Cell

IV.

IMPACT OF PROCESS TECHNOLOGY SCALING

Gate leakage (Igate) was a major contributor to leakage power in the 180/45nm process technology nodes due to aggressive scaling of the gate oxide thickness (Tox). Hence, SRAM bit-cells designed in the 180nm and 45nm process technology nodes relied extensively upon the SRAM bit-cell sleep technique to minimize gate leakage current. The SRAM bit-cell sleep scheme minimized Igate by lowering the voltage (VDC_SRAM) across the SRAM bit-cell using the sleep transistor. Scaling VDC_SRAM resulted in an exponential decrease in Igate, as described by Equation-1:

$$
I_{gate} = A \left( \frac{V_{ox}}{T_{ox}} \right)^{2} e^{-B\,T_{ox}/V_{ox}} \qquad (1)
$$

where A and B are constants, Tox is the gate oxide thickness and Vox is the voltage (VDC_SRAM) across the gate of the transistors in the 7T SRAM bit-cell. As Equation-1 shows, Igate can also be reduced by increasing the gate oxide thickness (Tox). Because Tox was being scaled aggressively at these nodes, the preferred option to reduce Igate in the 180nm and 45nm process technology nodes was instead to lower VDC_SRAM using the SRAM bit-cell sleep technique. However, the effectiveness of SRAM bit-cell sleep in minimizing Igate [7] reduced significantly in 45nm technology due to the adoption of a hafnium-based high-K gate dielectric material. The transition to a high-K dielectric was essential in 45nm technology, since the thickness of SiO2 (the gate oxide dielectric used in 65nm technology) could not be reduced further without an excessive increase in Igate.
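To see why lowering VDC_SRAM was so attractive, the Equation-1 form can be evaluated numerically. The sketch below compares Igate at reduced Vox with its value at 1.0 V, and also shows the effect of a thicker oxide. A and B are hypothetical fitting constants chosen only for illustration, and the 1.2 nm oxide thickness is likewise an assumption.

```python
"""Sensitivity of gate leakage to Vox and Tox under the Equation-1 form.

A and B are hypothetical fitting constants chosen only for illustration.
"""
import math


def gate_leakage(v_ox: float, t_ox_nm: float, a: float = 1.0, b_v_per_nm: float = 5.0) -> float:
    """Igate = A * (Vox / Tox)**2 * exp(-B * Tox / Vox), in the units implied by A."""
    if v_ox <= 0 or t_ox_nm <= 0:
        raise ValueError("Vox and Tox must be positive")
    return a * (v_ox / t_ox_nm) ** 2 * math.exp(-b_v_per_nm * t_ox_nm / v_ox)


if __name__ == "__main__":
    t_ox = 1.2  # nm, hypothetical
    full = gate_leakage(1.0, t_ox)
    # Lowering VDC_SRAM (and hence Vox) with the sleep transistor:
    for v in (0.9, 0.8, 0.7):
        print(f"Vox={v} V: Igate is {gate_leakage(v, t_ox) / full:.3f} of its value at 1.0 V")
    # Alternatively, a thicker oxide at the same Vox:
    print(f"Tox=1.5 nm: {gate_leakage(1.0, 1.5) / full:.3f} of the 1.2 nm value")
```

Because Vox appears in both the prefactor and the exponent, even a modest reduction in VDC_SRAM shrinks Igate by a large factor, which is the behaviour the sleep scheme relied on before high-K dielectrics.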

Fig.3 Bit-Cell Sleep implementation using PMOS sleep transistor

III.

READ AND WRITE OPERATION OF CIRCUIT

The proposed write concept depends on cutting off the feedback connection between the two inverters, inv1 and inv2, before the write operation. The feedback connection and disconnection are performed through an extra NMOS transistor, N5, as shown in Fig. 3, and the cell depends only on BLB (bit line bar) to perform a write operation. The write operation starts by turning N5 off to cut off the feedback connection. BLB carries the complement of the input data, N3 is turned on, and N4 is kept off. The SRAM cell then behaves as two cascaded inverters, inv2 followed by inv1: BLB transfers the complement of the input data to the node that drives inv2 (P2 and N2), which develops the cell data at its output; the cell data in turn drives inv1, which develops QB. The output waveforms of the write and read operations are shown in Fig.4.

Fig.5 Waveform of leakage power in Bit Cell


V.

IMPACT OF RETENTION VOLTAGE VDCMIN_RET

maintain the equilibrium condition described in Equation-3 increases. Since a higher supply voltage floor has implications for other design metrics such as power, microprocessors instead increase the width (Z) of the sleep transistor. A large Z, however, reduces SRAM bit-cell sleep effectiveness.

The data retention voltage (VDCMIN_RET) is the minimum voltage required by an SRAM bit-cell to retain stored data. This implies that the power savings from the SRAM bit-cell sleep scheme are maximal when the voltage (VDC_SRAM) across the SRAM bit-cell equals VDCMIN_RET. However, VDCMIN_RET for SRAM arrays is a distribution, due to Vt variation in the transistors of the SRAM bit-cell. Because of this distribution, the VDC_SRAM floor increases, since the worst-case VDCMIN_RET value has to be chosen. A high VDC_SRAM floor increases bit-cell leakage because of the larger voltage across the bit-cell. Bit-cell leakage increases further in large SRAM arrays, as they have a wide VDCMIN_RET distribution, as illustrated in Fig-4, since the worst-case Vt deviation grows as the bit-cell count increases. In summary, the VDCMIN_RET distribution results in a higher VDC_SRAM floor, which reduces SRAM bit-cell sleep effectiveness.
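The effect of array size on the worst-case retention voltage can be illustrated with a small Monte Carlo experiment. The sketch below assumes each cell's VDCMIN_RET is an independent Gaussian random variable; the mean (0.35 V) and sigma (30 mV) are hypothetical. Because the floor must cover the worst cell, it is the running maximum over the array that matters.

```python
"""Worst-case data-retention voltage versus array size (Monte Carlo sketch).

Assumes each cell's VDCMIN_RET is an independent Gaussian; mean and sigma are hypothetical.
"""
import random
from itertools import accumulate


def retention_floor_by_size(sizes, mean_v=0.35, sigma_v=0.03, seed=1):
    """Return {n: worst (max) VDCMIN_RET among the first n sampled cells}."""
    if not sizes or min(sizes) < 1:
        raise ValueError("sizes must be positive integers")
    rng = random.Random(seed)
    samples = [rng.gauss(mean_v, sigma_v) for _ in range(max(sizes))]
    running_worst = list(accumulate(samples, max))  # running maximum over cells
    return {n: running_worst[n - 1] for n in sorted(sizes)}


if __name__ == "__main__":
    for n, v in retention_floor_by_size([1_000, 10_000, 100_000]).items():
        print(f"{n:>7} cells: VDC_SRAM floor >= {v:.3f} V")
```

Since a running maximum can only grow, larger arrays necessarily push the VDC_SRAM floor upward, which is the mechanism the text describes.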

VIII.

SIMULATION RESULT

In this paper, the sleep transistor technique is used to reduce the voltage across the bit-cell and thereby its leakage power. The leakage current and power consumption obtained with the Cadence tool are given in Table I.

TABLE I. SIMULATION RESULT OF THE 7T SRAM BIT CELL

Process Technology     45 nm
Power Supply Voltage   0.7 V
Leakage Current        680.53 fA
Power Consumption      95.28 pW

VI.

IMPACT OF DYNAMIC VOLTAGE SCALING (DVS)

The supply voltage (VDC) floor for SRAM arrays with the SRAM bit-cell sleep feature depends upon the data retention voltage (VDCMIN_RET) of the SRAM bit-cell and the voltage drop across the PMOS sleep transistor. This dependence is given in Equation-2:

$$
V_{DC} = I_{leak}\,R_{sleep}(Z) + V_{DCMIN\_RET} \qquad (2)
$$

IX.

A major reason for the decreasing effectiveness of the SRAM bit-cell sleep technique is the inherent goodness of the high-K gate dielectric used in modern CMOS processes, together with processor design constraints such as ITD and the VDCMIN_RET distribution. The decreasing effectiveness of SRAM bit-cell sleep is evident from the fact that only 10-50mW of leakage power savings is attained from the SRAM bit-cell sleep scheme. The leakage power savings can be increased, for example, by disabling bit-cell sleep at lower VDC in processors supporting the DVS feature. Disabling SRAM bit-cell sleep at lower VDC levels enables the sleep design to be optimized for higher VDC. Another factor contributing to bit-cell sleep ineffectiveness is the wide spread in data-retention voltage, since the spread in VDCMIN_RET is a function of array size. With a narrower VDCMIN_RET distribution, the sleep transistor resistance could be better optimized.

In Equation-2, VDCMIN_RET is the bit-cell data retention voltage, VDC is the supply voltage, Z is the sleep transistor width (Z ∝ 1/Rsleep), Rsleep is the sleep transistor resistance and Ileak is the bit-cell leakage current at VDCMIN_RET. Equation-2 is critical since it establishes the relationship between the sleep transistor width and the supply voltage floor. It can be concluded from Equation-2 that a high Rsleep (or small Z) increases the voltage drop across the sleep transistor, resulting in a higher VDC floor. For example, it can be observed from Figure-3 that a Z1 sleep transistor width results in a VDC floor of 0.7V, whereas the much wider Z3 sleep transistor (Z3 > Z2 > Z1) results in a lower VDC floor than Z2 and Z1. However, a large Z (or low Rsleep) reduces bit-cell sleep effectiveness.
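Equation-2 can be evaluated directly to see the width/floor trade-off, and inverted to find the narrowest sleep transistor that still meets a target floor. The sketch below assumes the sleep transistor's on-resistance scales as Rsleep = r_unit / Z; the leakage current, VDCMIN_RET and r_unit values are hypothetical, and the function names are illustrative only.

```python
"""Equation-2 supply-voltage floor as a function of sleep-transistor width Z.

Assumes Rsleep = r_unit / Z (on-resistance inversely proportional to width);
r_unit, the leakage current and VDCMIN_RET below are hypothetical.
"""


def vdc_floor(z_um: float, i_leak_a: float, vdcmin_ret_v: float, r_unit_ohm_um: float) -> float:
    """VDC = Ileak * Rsleep(Z) + VDCMIN_RET, with Rsleep(Z) = r_unit / Z."""
    if z_um <= 0:
        raise ValueError("sleep transistor width must be positive")
    return i_leak_a * (r_unit_ohm_um / z_um) + vdcmin_ret_v


def min_width_for_floor(target_vdc_v: float, i_leak_a: float, vdcmin_ret_v: float, r_unit_ohm_um: float) -> float:
    """Smallest Z whose VDC floor does not exceed target_vdc_v (Equation-2 inverted)."""
    headroom = target_vdc_v - vdcmin_ret_v
    if headroom <= 0:
        raise ValueError("target VDC must exceed VDCMIN_RET")
    return i_leak_a * r_unit_ohm_um / headroom


if __name__ == "__main__":
    I_LEAK, VDCMIN_RET, R_UNIT = 0.01, 0.5, 100.0  # A, V, ohm*um (hypothetical)
    for z in (10.0, 20.0, 40.0):
        print(f"Z={z:>4} um -> VDC floor {vdc_floor(z, I_LEAK, VDCMIN_RET, R_UNIT):.3f} V")
    print(f"Z needed for a 0.55 V floor: {min_width_for_floor(0.55, I_LEAK, VDCMIN_RET, R_UNIT):.1f} um")
```

Doubling Z halves the Ileak·Rsleep term, lowering the floor, but the same wider device lets more voltage reach the bit-cell in sleep, which is the loss of sleep effectiveness noted above.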
VII. IMPACT OF INVERSE TEMPERATURE DEPENDENCE
Optimizing the SRAM bit-cell sleep design over a wide temperature range is challenging, since both the VDCMIN_RET of the bit-cell and Rsleep depend upon temperature. The relationship between the supply voltage (VDC), Rsleep and VDCMIN_RET is shown in Equation-3:
$$
V_{DC} = I_{leak}(T)\,R_{sleep}(T) + V_{DCMIN\_RET}(T) \qquad (3)
$$

CONCLUSION

ACKNOWLEDGMENT
This work was supported by ITM University Gwalior, in collaboration with Cadence Design Systems, Bangalore.

Here, T is the junction temperature. Rsleep(T) and VDCMIN_RET(T) have an inverse dependence on temperature (ITD), since the threshold voltage (Vt) of the sleep transistor and of the SRAM bit-cell transistors increases with decreasing temperature. As a result of this increase in Vt, the supply voltage (VDC) floor required to

REFERENCES
[1] Y. Hsiang Tseng, Y. Zhang, L. Okamura and T. Yoshihara, "A new 7-transistor SRAM cell design with high stability," IEEE International Conference on Electronic Device, System and Application, 2010.
[2] V. Venkatachalam, "Power Reduction Techniques in Microprocessor Systems," ACM Computing Surveys, Vol 3, pp. 195-237, September 2005.
[3] M. Powell, "Gated Vdd: A circuit technique to reduce leakage power in deep-submicron cache memories," Proceedings of ISLPED, pp. 90-95, 2000.
[4] N. Kim, "Circuit and Microarchitectural Techniques for reducing cache leakage power," IEEE Transactions on VLSI, Vol 12, No. 2, pp. 167-182, Feb 2004.
[5] K. Zhang, "SRAM Design on 65nm CMOS Technology with Dynamic Sleep Transistor for leakage reduction," IEEE JSSC, Vol 40, pp. 895-900, April 2005.
[6] F. Hamzaoglu, K. Zhang, "A 3.8 GHz 153 Mb SRAM Design in 45nm High-K Metal Gate CMOS Technology," IEEE JSSC, Vol 44, pp. 148-154, January 2009.
[7] K. Kuhn, "Intel's 45nm CMOS Process Technology," Intel Technology Journal, Vol 12, Issue 2, June 2008.

ISSCC 2012 / SESSION 13 / HIGH-PERFORMANCE EMBEDDED SRAM / 13.4


13.4

A 28nm 360ps-Access-Time Two-Port SRAM


with a Time-Sharing Scheme to Circumvent
Read Disturbs

Yuichiro Ishii1, Yasumasa Tsukamoto1, Koji Nii1,
Hidehiro Fujiwara1, Makoto Yabuuchi1, Koji Tanaka2,
Shinji Tanaka1, Yasuhisa Shimazaki1
1Renesas Electronics, Tokyo, Japan
2Renesas Electronics, Itami, Japan

With the rapid growth in the market for mobile information terminals such as
smart phones and tablets, the performance of image processing engines (e.g.,
operation speed, accuracy in digital images) has improved remarkably. In these
processors, 2-port SRAM (2P-SRAM) macros [1], in which a read port and a
write port are operated synchronously in a single clock cycle, are widely used.
As shown in Fig. 13.4.1, since the 2P-SRAM is placed in front of large scale logic
circuitry for image processing, a faster access time (e.g., <1ns) is required. In
general, the read-out operation in 2P-SRAM utilizes full-swing of the single read
bitline (BL), so a drastic improvement of the access time is not expected. On the
other hand, the dual-port SRAM (DP-SRAM) makes use of the voltage difference
between BL pair in the read-out operation, which is suitable for the high-speed
operation. In this study, we present a time-sharing scheme using a DP-SRAM
cell to achieve high-speed access in 2P-SRAM macros in such image processors.
There are several conventional methods to realize 2P-SRAM operation within a single clock cycle. A single-port SRAM (SP-SRAM) can be double-clocked with consecutive read and write operations, but such a 2× operating frequency is generally hard to achieve. Alternatively, the read and write ports of a DP-SRAM can be operated in parallel; however, this causes a read-disturb issue [2] (Fig. 13.4.2), in which the cell current (Iread) is degraded when the read and write ports access the same row simultaneously. To achieve <1ns access time, such Iread degradation is not acceptable. Our method, using a DP-SRAM cell, performs consecutive read and write operations in a single clock cycle with a small delay between the WL pulses. The two operations effectively share time in the overall clock cycle, as the rising edge of WLW is delayed until the sense amplifier (SA) has completed the read operation. Our method realizes (i) high-frequency operation (or reduced cycle time) compared with conv. 1 (of Fig. 13.4.1), since the read and write operations are executed by independent peripheral circuits for each port, and (ii) high-speed access time due to prevention of the read-disturb issue in conv. 2.
Figure 13.4.2 illustrates a situation in which data is read out from the left memory cell while data is written to the right cell. Note that both cells are in the same row. Focusing on the left cell, since the read wordline (WLRn) is activated, the internal node MT discharges BLR (Iread). On the other hand, the write wordline (WLWn) is also activated to write data to the right cell, so WLWn behaves as a dummy read operation for the left cell. The MT node in the left cell (storing 0) is ramped up via the precharged BLW, reducing the ability of MT to discharge BLR. Thus, the cell current (Iread) decreases; this is referred to as the read-disturb issue. Figure 13.4.2 shows the simulated Iread with and without the read-disturb issue. To take 6σ variation into account, we introduce the worst combination of local Vth variation, referring to the worst-vector method in [3]. Due to the read-disturb issue, Iread is decreased by 48%, which means that it takes about twice as long to achieve the same amount of BL swing as without the read disturb. Therefore, by circumventing the read-disturb issue, we can accelerate the time for BL swing by 50% compared with conventional DP-SRAM.
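The "twice as long" and "50%" figures follow from a first-order constant-current model of the bitline, t = C_BL·ΔV/Iread. The sketch below assumes a linear discharge; the bitline capacitance, required swing and nominal Iread are hypothetical, and only the 48% reduction comes from the text.

```python
"""Constant-current estimate of bitline-swing time with and without read disturb.

Assumes the BL discharges linearly, t = C_BL * dV / Iread; C_BL, dV and Iread are hypothetical.
"""


def swing_time_s(c_bl_f: float, delta_v: float, i_read_a: float) -> float:
    """Time for a constant cell current to develop delta_v on a bitline of capacitance c_bl_f."""
    return c_bl_f * delta_v / i_read_a


if __name__ == "__main__":
    C_BL, DELTA_V, I_READ = 50e-15, 0.1, 20e-6   # 50 fF, 100 mV, 20 uA (hypothetical)
    DISTURB_LOSS = 0.48                          # Iread reduced by 48% (from the text)
    t_clean = swing_time_s(C_BL, DELTA_V, I_READ)
    t_disturbed = swing_time_s(C_BL, DELTA_V, I_READ * (1 - DISTURB_LOSS))
    print(f"clean: {t_clean * 1e12:.0f} ps, disturbed: {t_disturbed * 1e12:.0f} ps")
    print(f"slow-down: {t_disturbed / t_clean:.2f}x, time saved: {1 - t_clean / t_disturbed:.0%}")
```

In this model the slow-down is 1/(1 - 0.48) ≈ 1.92 regardless of C_BL and ΔV, and removing the disturb saves 48% of the swing time, consistent with the rounded "twice as long" and "50%" in the text.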
Another merit of our implementation is shown in Fig. 13.4.3, in which the left cell is written and read simultaneously. The unselected right cell in the same row is half-selected due to activation of both WLRn and WLWn and thus sees a strong dummy read operation. In this case, MT is raised significantly above ground by BLR and BLW. If MT becomes higher than the threshold voltage of the inverter in the cell, the storage nodes are flipped, causing data destruction. Figure 13.4.3 compares the waveforms of the WLs and the storage nodes for the proposed and conventional circuits. In the conventional method, the longer the simultaneous-WL-activation period (indicated by the dashed arrow) becomes, the more easily the storage nodes are flipped, indicating low cell stability. This period is shorter for the time-sharing method than for the conventional parallel read/write DP-SRAM, because the write WL is activated only when the read BL swing is complete. Thus, this implementation realizes high cell stability. In this study, we applied the conventional 8T DP-SRAM cell layout to our circuit; however, owing to this advantage, the proposed circuit enables the use of a smaller cell.

2012 IEEE International Solid-State Circuits Conference
Figure 13.4.4 shows the circuitry and corresponding waveforms of our implementation. When the TDEC signal generated by CLK is activated, one read wordline, WLRn, is selected to begin the read operation. Note that in the conventional design, this TDEC also activates WLWn, which results in the read-disturb issue mentioned earlier. To circumvent this, we introduce a new BACK signal, which is activated by TDEC through a delay element. This delay is designed so that the SA can detect the worst-case BL swing including the local Vth variation. The BACK signal activates the SA, and the read-out data is transferred to the output Q. At the same time, the BACK signal also selects WLWn so that the write operation is executed after the read-out is completed. In this way, the time-sharing scheme is achieved and the read-disturb issue is prevented. Note that WTE, which activates the write driver (WD), is synchronized with TDEC; however, the data to be written is transferred to the WD in advance, so the peak current due to concurrent activation of the SA and WD is effectively avoided. Simulated waveforms show that WLW1 activation is delayed until SAE is enabled, so that the read BLs (BLR0 and /BLR0) are discharged without disturbance from WLW1. In addition, WTE is enabled prior to SAE activation, which helps to reduce peak current as previously mentioned. Simulation results show a cycle frequency of 1GHz and an access time of 360ps.
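The ordering rules of the time-sharing scheme can be written down as a small schedule check: WLRn and WTE follow TDEC, BACK follows TDEC after a fixed delay and fires both SAE and WLWn, and that delay must cover the worst-case BL swing. The sketch below encodes only these rules; all delay values and class/function names are hypothetical.

```python
"""Event-order check for the time-sharing read/write sequence.

Delay values are hypothetical; only the ordering rules follow the description above.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleSchedule:
    t_tdec_ps: float        # TDEC rising edge (starts WLRn and WTE)
    back_delay_ps: float    # delay element from TDEC to BACK
    worst_swing_ps: float   # worst-case BL swing time the SA needs (incl. local Vth variation)

    @property
    def t_wlr(self) -> float:   # read wordline selected by TDEC
        return self.t_tdec_ps

    @property
    def t_wte(self) -> float:   # write-driver enable, synchronized with TDEC
        return self.t_tdec_ps

    @property
    def t_back(self) -> float:  # BACK fires SAE and selects WLWn at the same time
        return self.t_tdec_ps + self.back_delay_ps


def schedule_violations(s: CycleSchedule) -> list[str]:
    """Return the ordering rules of the time-sharing scheme that s violates."""
    problems = []
    if s.t_back - s.t_wlr < s.worst_swing_ps:
        problems.append("SAE/WLW fire before the worst-case BL swing is developed")
    if s.t_wte >= s.t_back:
        problems.append("WTE is not enabled before SAE")
    return problems


if __name__ == "__main__":
    print(schedule_violations(CycleSchedule(0.0, 250.0, 200.0)))  # []
    print(schedule_violations(CycleSchedule(0.0, 150.0, 200.0)))
```

Because WLWn and SAE share the BACK edge in this model, the write can never start before sensing; the only design freedom is sizing the delay element against the worst-case swing.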
The table in Fig. 13.4.5 summarizes the features of the circuit as designed in a test chip in a 28nm high-k metal-gate process. The graph is a Shmoo plot at 25°C, showing the relationship between the minimum operating voltage (Vmin) and the read access time. The lowest Vmin is 0.56V, while the access time at 1.0V is 500ps. This access time is a 5× speed-up compared with our previous data (2.5ns at 1.0V using the same DP-SRAM in a 28nm process [4]). Figure 13.4.6 compares the data for this scheme with that for the conventional method (conv. 2 in Fig. 13.4.1). The simulation result (solid line) is in good accordance with the measurement data over a wide range of supply voltage VDD. A 100mV reduction in Vmin is observed due to the improved cell stability, while the access time at the worst-case temperature was improved by 13% to 360ps at 1.2V due to elimination of the read-disturb issue.

Acknowledgements:
We would like to thank Y. Ouchi, O. Kuromiya and M. Tanaka for their helpful technical support, and also Y. Kihara, T. Sato and T. Takeda for their management.
References:
[1] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, "A stable 2-port SRAM cell design against simultaneously read/write-disturbed accesses," IEEE J. Solid-State Circuits, pp. 2109-2119, Sep. 2008.
[2] Y. Ishii, H. Fujiwara, S. Tanaka, Y. Tsukamoto, K. Nii, Y. Kihara, and K. Yanagisawa, "A 28 nm dual-port SRAM macro with screening circuitry against write-read disturb failure issues," IEEE J. Solid-State Circuits, pp. 2535-2544, Nov. 2011.
[3] Y. Tsukamoto, T. Kida, T. Yamaki, Y. Ishii, K. Nii, K. Tanaka, S. Tanaka, and Y. Kihara, "Dynamic stability in minimum operating voltage Vmin for single-port and dual-port SRAMs," CICC Dig. Tech. Papers, pp. 1-4, Sep. 2011.
[4] Y. Ishii, H. Fujiwara, K. Nii, H. Chigasaki, O. Kuromiya, T. Saiki, A. Miyanishi, and Y. Kihara, "A 28-nm dual-port SRAM macro with active bitline equalizing circuitry against write disturb issue," Symp. VLSI Circuits Dig. Tech. Papers, 2010, pp. 99-100.

978-1-4673-0377-4/12/$31.00 2012 IEEE

ISSCC 2012 / February 21, 2012 / 2:45 PM

Figure 13.4.1: 2P-SRAM for image processing unit.

Figure 13.4.2: Advantage: Fast read access time.


Figure 13.4.3: Advantage: Cell stability improved.

Figure 13.4.4: Circuit and simulated waveforms.

Figure 13.4.5: Measured Shmoo plot and features of the test-chip.

Figure 13.4.6: Measured FBC and read access time.

DIGEST OF TECHNICAL PAPERS


ISSCC 2012 PAPER CONTINUATIONS

Figure 13.4.7: Microphotograph of the test chip.


Analyzing the Optimal Ratio of


SRAM Banks in Hybrid Caches
Alejandro Valero, Julio Sahuquillo, Salvador Petit, Pedro Lopez, and Jose Duato
Department of Computer Engineering
Universitat Politècnica de València
Valencia, Spain
alvabre@gap.upv.es, {jsahuqui, spetit, plopez, jduato}@disca.upv.es

Abstract- Cache memories have typically been implemented with Static Random Access Memory (SRAM) technology. This technology presents a fast access time but high energy consumption and low density. In contrast, the recently introduced embedded Dynamic RAM (eDRAM) technology allows caches to be built with lower energy and area, although with a slower access time. The eDRAM technology provides important leakage and area savings, especially in huge Last-Level Caches (LLCs), which occupy almost half the silicon area in some recent microprocessors.
This paper proposes a novel hybrid LLC, which combines SRAM and eDRAM banks to address the trade-off among performance, energy, and area. To this end, we explore the optimal percentage of SRAM and eDRAM banks that achieves the best target trade-off. Architectural mechanisms have been devised to keep the most likely accessed blocks in fast SRAM banks as well as to avoid unnecessary destructive reads. Experimental results show that, compared to a conventional SRAM LLC with the same storage capacity, the average performance degradation does not exceed 2.9% (even with only 12.5% of banks built with SRAM technology), whereas area savings can be as high as 46% for a 1MB-16way LLC. For a 45nm technology node, the energy-delay squared product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless of the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is a quarter or an eighth of the cache banks.

I. INTRODUCTION
Multilevel on-chip cache hierarchies have typically been built using Static Random Access Memory (SRAM) technology, which is the fastest electronic memory technology. Nowadays, alternative technologies are being used and explored, since SRAM technology presents important shortcomings such as low density and high leakage energy, which is proportional to the number of transistors. These shortcomings have become important design challenges, to the extent that future cache hierarchies are unlikely to be implemented with SRAM technology alone, especially in the context of Chip Multi-Processors (CMPs).
New advances in technology make it possible to build caches with other technologies, like embedded Dynamic RAM (eDRAM) or Magnetic RAM (MRAM). Table I summarizes some properties of these technologies. The former (eDRAM) presents high density and low leakage power and has already been used to build large L2 caches or Last-Level Caches (LLCs) in commercial processors such as some IBM POWER processors [1] [2] [3] [4]. This capacitor-based memory integrates trench DRAM storage cells into a logic-circuit technology [5], which significantly reduces area compared with the typical 6-transistor bit cells used in SRAM technology. More precisely, compared to SRAM, the eDRAM technology increases the storage capacity by a factor of 3 for a given silicon area, thus giving important area savings, especially in large LLCs.
An important design issue is that eDRAM can be manufactured with logic technologies with no or minimal changes in the manufacturing process. That is, SRAM and eDRAM technologies can be mingled in the same die, as already addressed by some companies [6]. Although other technologies like MRAM present less leakage than eDRAM, manufacturing constraints prevent them from being mingled in conventional two-dimensional (2D) chips. In addition, the low speed of MRAM, in particular for write operations, suggests the use of this technology for main memory storage instead of caches.
Since eDRAM technology is slower than SRAM technology, SRAM-based organizations are commonly designed as
first-level (L1) caches since this level is especially important
for performance; while eDRAM caches are designed for
capacity and energy savings (mainly leakage) working as huge
LLCs [7].
Because each technology presents its own advantages and shortcomings, previous works have focused on hybrid designs aimed at taking the best of each technology in different components of the processor, like L1 caches [8] [9], Non-Uniform Cache Architectures (NUCAs) [10] [11], and register files [12]. In [8] it is shown that in hybrid SRAM/eDRAM

This work was supported by the Spanish MICINN, Consolider Programme


and Plan E funds, as well as European Commission FEDER funds, under
Grants CSD2006-00046 and TIN2009-14475-C04-01.

978-1-4673-3052-7/12/$31.00 2012 IEEE


Table I
FEATURES OF SRAM, eDRAM, AND MRAM TECHNOLOGIES

Technology   Speed                                  Density   Leakage
SRAM         fast                                   low       high
eDRAM        slow                                   high      low
MRAM         slow for reads, very slow for writes   high      very low

II. MOTIVATION
First-level data caches concentrate most of their hits (e.g.,
more than 90%) in the Most Recently Used (MRU) block [13].
Therefore, in hybrid SRAM/eDRAM L1 caches, it is enough
to build a single cache way with the fastest SRAM technology
and force this cache way to store the MRU block for performance purposes [8]. However, it is widely known that data
locality in L2 caches is much poorer than in L1 caches. Thus
this implementation might yield unacceptable performance
in L2 caches.
The two extremes of the design space exploration of hybrid caches are defined by caches implemented with a single technology. We will refer to these caches as pure SRAM and pure eDRAM caches. The pure SRAM cache provides the maximum performance but higher energy consumption and area, whereas the pure eDRAM cache provides poorer performance but lower leakage energy consumption and area. Between these points, we can vary the ratio between SRAM and eDRAM banks in order to favor either performance or energy and area savings.
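One way to reason about the SRAM/eDRAM ratio is to ask how many MRU-side positions of the LRU stack must reside in fast banks to capture a target share of hits, which is the kind of distribution shown in Figure 1. The sketch below assumes, purely for illustration, that SRAM banks hold the MRU-side positions; the per-position hit counts are hypothetical, not measurements from this paper.

```python
"""Estimate how many MRU positions an SRAM portion must cover to capture a target share of hits.

The per-position hit counts below are hypothetical, not measurements from this paper.
"""
from itertools import accumulate


def cumulative_hit_share(hits_per_position: list[int]) -> list[float]:
    """Fraction of all hits captured by LRU-stack positions 0..k, for each k."""
    total = sum(hits_per_position)
    if total <= 0:
        raise ValueError("need at least one hit")
    return [c / total for c in accumulate(hits_per_position)]


def positions_for_coverage(hits_per_position: list[int], target: float) -> int:
    """Smallest number of MRU-side positions whose hits reach `target` (e.g. 0.9)."""
    for k, share in enumerate(cumulative_hit_share(hits_per_position), start=1):
        if share >= target:
            return k
    return len(hits_per_position)


if __name__ == "__main__":
    # Hypothetical hit counts for a 16-way LLC, MRU (loc-0) first.
    hits = [600, 150, 80, 50, 30, 25, 20, 15, 10, 8, 5, 3, 2, 1, 1, 0]
    k = positions_for_coverage(hits, 0.9)
    print(f"{k} of {len(hits)} ways ({k / len(hits):.1%}) capture >= 90% of hits")
```

A flatter distribution, as expected in LLCs with poor locality, pushes the required number of SRAM positions up, which is why a single SRAM way suffices for L1 but not for the LLC.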
In this design space, the optimal hybrid SRAM/eDRAM cache design must be found. That is, the optimal design

Figure 1. Cache hit distribution (0% to 100%) across LRU-stack locations loc-0, loc-1, loc-{2-3}, loc-{4-7} and loc-{8-15} for applu, mgrid, art, facerec, wupwise, lucas, swim, apsi, twolf, bzip2, mcf, vpr and AMean.

L1 caches, due to the high locality exhibited by data in this


level of the cache hierarchy, performance can be sustained by
implementing only a cache way with SRAM technology. This
scheme does not work well for high-associativity LLCs, since
data locality is much less predictable.
In this paper, we propose a novel hybrid Last-Level Cache
design and explore the trade-off among performance, energy,
and area. To this end, the data array of the cache is implemented in banks with different technologies, each one aimed
at storing specific cache blocks. The design pursues three
main goals with respect to a conventional SRAM cache with
the same capacity: i) to minimize performance losses, ii) to
reduce silicon area, and iii) to increase energy savings. The
first goal is achieved by storing the most likely accessed
blocks in SRAM banks, while the two latter are achieved
by building a significant percentage of banks with eDRAM
technology. Experimental results show that, compared to a
conventional SRAM cache with the same storage capacity,
performance degradation never exceeds, on average, 2.9%,
whereas area savings are by 46%. For a 45nm technology
node, the energy-delay squared product shows that a hybrid
cache is a better design than a conventional SRAM cache
regardless the percentage of eDRAM banks, and also better
than a conventional eDRAM cache when implementing the
fourth or the eighth part of its banks with SRAM technology.
The remainder of this paper is organized as follows. Section II shows the distribution of cache hits across the locations
of the LRU stack to estimate the ratio of SRAM and eDRAM
banks. Section III presents the design of the proposed hybrid Last-Level Cache. Section IV analyzes the performance,
energy-delay squared product, and area savings provided by
the proposal. Section V summarizes the related work, and
finally, concluding remarks are given in Section VI.

Percentage of cache hits across the locations of the LRU stack.

should provide the best trade-off among performance, energy


consumption, and area.
For analysis purposes, we will assume a 1MB-16way setassociative LLC (L2 cache) implementing the LRU replacement algorithm. With the aim to serve as a guide to estimate
how many ways should be implemented in SRAM banks
and how many in eDRAM banks, this section analyzes the
distribution of cache hits.
This distribution has been been obtained for the LRU stack
with the aim of analyzing if hits concentrate in a few blocks
at the top of the stack. On such a case, these blocks should
be stored in fast SRAM banks by implementing a swap
mechanism to move blocks among banks built with different
technologies. Figure 1 shows the results for the SPEC2000
benchmark suite [14]1 . Label loc-0 refers to the location
storing the MRU block, whereas loc-15 is the location storing
the Least Recently Used (LRU) block. Label loc-{x-y} denotes
hits in the positions falling in between x and y in the stack,
both inclusive.
As observed, unlike L1 caches, where more than 90% of
cache hits concentrate in the MRU block, hits are distributed
among different locations of the LRU stack in L2 caches.
Although the distribution is clearly skewed to blocks at the
top of the stack, the cache would require more than half its
size (8 ways) to cover by 90% of cache hits. Notice that
in 8 of 12 applications the MRU way captures only around
50% of cache hits. Thus, implementing only that way with
SRAM technology would yield to unacceptable performance.
Finally, notice that there is no need to maintain a bidirectional
relationship between cache ways and positions of the LRU
stack (e.g., way 4 stores the block in position 4 of the stack)
but only to find out which percentage of blocks should be
stored in fast SRAM ways.
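The sketch below gives a rough idea of how a hit distribution like the one in Figure 1 translates into a number of SRAM ways. It finds the smallest number of top LRU-stack positions that capture a target fraction of hits, then converts that count into banks, assuming two ways per bank as in Section III. The hit counts are hypothetical and are not read from Figure 1. The function names are illustrative only.

```python
from itertools import accumulate
from math import ceil


def ways_to_cover(hits_per_position, target):
    """Smallest number of top LRU-stack positions (position 0 = MRU) whose
    hits reach `target`, a fraction in (0, 1], of all hits."""
    total = sum(hits_per_position)
    if total == 0:
        return 0
    for ways, covered in enumerate(accumulate(hits_per_position), start=1):
        if covered >= target * total:
            return ways
    return len(hits_per_position)


def sram_banks_for(ways, ways_per_bank=2):
    """Banks needed to hold `ways` ways (two ways per bank assumed)."""
    return ceil(ways / ways_per_bank)


# Hypothetical per-position hit counts for one 16-way LLC (not Figure 1 data).
hits = [50, 8, 6, 5, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1]
ways = ways_to_cover(hits, 0.90)
print(ways, sram_banks_for(ways))  # 11 6
print(ways_to_cover(hits, 0.50))   # 1: the MRU position alone holds 50%
```

With this skewed but long-tailed distribution, covering 50% of hits needs only the MRU way, while covering 90% needs 11 of the 16 ways.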
III. HYBRID LAST-LEVEL CACHE DESIGN
This section discusses the architectural mechanisms used to avoid unnecessary destructive reads in eDRAM banks and to keep the MRU data in SRAM banks. In addition, a distributed refresh mechanism that prevents data loss due to capacitor discharge is presented.
¹ Applications exhibiting an L2 hit ratio higher than 95% were excluded from this study. Architectural parameters are described in Section IV.

We assume that each bank stores two ways, which results in an LLC with 8 banks for the studied 16-way cache. This number of banks is acceptable, since designs with more banks are common in the literature [15].
Each bank of the data array can be implemented with either SRAM or eDRAM technology, so several design choices can be analyzed. Table II summarizes the studied choices. For each cache scheme, it specifies the number of SRAM and eDRAM ways and banks and the ratio of SRAM banks. The conventional schemes at both extremes of the table are the pure SRAM (16S) and the pure eDRAM (16D) caches, which have all their banks implemented with SRAM and eDRAM technology, respectively. The tag array is built with SRAM cells regardless of the cache scheme, since it is much smaller than the data array. Much lower energy and area benefits can be obtained in this structure, and implementing it with eDRAM technology would negatively affect the access time.

Table II
PURE AND HYBRID CACHES WITH THE CORRESPONDING NUMBER OF WAYS, BANKS, AND RATIO (%) OF SRAM BANKS
Cache scheme   SRAM ways   eDRAM ways   SRAM banks   eDRAM banks   SRAM ratio (%)
16S            16          0            8            0             100
8S-8D          8           8            4            4             50
4S-12D         4           12           2            6             25
2S-14D         2           14           1            7             12.5
16D            0           16           0            8             0

A. Accessing the Hybrid Cache
To reduce access time, conventional caches usually overlap the access to the tag array with the access to the data array of all cache ways, finally selecting the target way on a hit. However, this might waste energy in hybrid caches, since reads in eDRAM cells are destructive and thus require rewriting their contents. Hence, the access in a hybrid cache is split into two stages, as shown in Figure 2. In the first stage, all the tags (which are built with SRAM technology) and all the SRAM banks (SRAM data array) are accessed in parallel. If the requested data is stored in an SRAM way, the access time of the hybrid cache is as fast as a hit in a conventional SRAM cache. In that case, the second stage is not performed (i.e., no eDRAM way is accessed). On a miss in the SRAM data array but a hit in a tag associated with an eDRAM way, the second stage is performed and only the target eDRAM way is accessed. In this case, the access time includes the tag comparison plus the access to the eDRAM data. On a cache miss, no eDRAM way is accessed and the requested data is fetched from main memory.

Figure 2. Diagram of the hybrid cache access. Dark boxes represent the accessed parts of the cache. The second stage is performed only on a hit in an eDRAM way detected in the first stage.
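The two-stage access can be sketched as a simple latency model. This is a minimal sketch built on several assumptions:

- It uses the Table III latencies: 2-cycle tags, 6-cycle SRAM data, 9-cycle eDRAM data and 100-cycle memory.
- It ignores bank contention.
- A first-stage hit costs the slower of the parallel tag and SRAM reads.
- A miss costs the tag check plus the memory latency.

The function names are hypothetical. The sketch also counts how many eDRAM ways each access reads, which is what the two-stage split is meant to minimize.

```python
# Latencies in processor cycles, taken from Table III (tag array, SRAM and
# eDRAM data arrays, main memory). Bank contention is ignored.
LATENCY = {"tag": 2, "sram": 6, "edram": 9, "memory": 100}


def two_stage_access(outcome, lat=LATENCY):
    """Return (cycles, eDRAM ways read) for one access to the hybrid LLC.

    outcome is "sram_hit", "edram_hit" or "miss".
    """
    if outcome == "sram_hit":
        # Stage 1 only: tags and all SRAM banks are read in parallel.
        return max(lat["tag"], lat["sram"]), 0
    if outcome == "edram_hit":
        # Stage 2: after the tag comparison, only the matching eDRAM way is read.
        return lat["tag"] + lat["edram"], 1
    if outcome == "miss":
        # No eDRAM way is touched; the block is fetched from main memory.
        return lat["tag"] + lat["memory"], 0
    raise ValueError(f"unknown outcome: {outcome!r}")


def overlapped_access_edram_reads(edram_ways):
    """A conventional overlapped access reads every data way, so every
    eDRAM way is read (destructively) on each access."""
    return edram_ways


for outcome in ("sram_hit", "edram_hit", "miss"):
    print(outcome, two_stage_access(outcome))
# sram_hit (6, 0)
# edram_hit (11, 1)
# miss (102, 0)
print(overlapped_access_edram_reads(14))  # 14 for a 2S-14D cache
```

Under these assumptions, an eDRAM hit costs 11 cycles against 6 for an SRAM hit. The overlapped alternative would destructively read all 14 eDRAM ways of a 2S-14D cache on every access.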

B. Keeping Blocks at the Top of the LRU Stack in SRAM Banks


To keep the MRU data in fast SRAM banks, the cache controller manages a swap operation between SRAM and eDRAM banks, similarly to [11]. To properly select the blocks to be transferred, it also has to maintain a separate LRU stack order for the SRAM and eDRAM data arrays.
The mechanism works as follows. On an eDRAM hit, the requested eDRAM block is transferred from its eDRAM bank to the SRAM bank that contains the LRU block of the SRAM data array, which in turn is moved to the eDRAM bank. After this swap operation, both involved blocks become the MRU blocks of their respective data arrays. Since the SRAM data array has its own LRU stack, a block leaves that array and moves to the eDRAM array only when it becomes the LRU block and is selected for the swap operation. Notice, however, that a block resides in the same SRAM bank from its placement until its eviction from the SRAM array. No bank movement is performed while a block stays in a given data array (SRAM or eDRAM). On a cache miss, the LRU block of the eDRAM data array is selected for replacement. The incoming block fetched from main memory is allocated in an SRAM bank. This triggers a unidirectional transfer of the SRAM LRU block to the eDRAM bank that contains the victim block. Of course, the control bits must be updated accordingly. Notice also that splitting the LRU stack reduces the required number of control bits.
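The swap policy above can be sketched for a single cache set as two MRU-first lists, one per data array. This is a minimal model with three assumptions:

- It tracks tags only.
- A block demoted from SRAM on a miss becomes the eDRAM MRU block. The text states this only for the eDRAM-hit swap.
- It ignores the physical bank slots that the real controller exchanges.

The class and method names are hypothetical.

```python
class HybridSet:
    """One set of a hybrid LLC with separate LRU stacks for its SRAM and
    eDRAM ways. Tags are kept MRU-first; block contents are not modelled."""

    def __init__(self, sram_ways, edram_ways):
        if sram_ways < 1:
            raise ValueError("the swap mechanism needs at least one SRAM way")
        self.sram_ways = sram_ways
        self.edram_ways = edram_ways
        self.sram = []   # tags in SRAM ways, index 0 = SRAM MRU block
        self.edram = []  # tags in eDRAM ways, index 0 = eDRAM MRU block

    def _demote_sram_lru(self):
        """Move the SRAM LRU block into eDRAM as its MRU block. Returns the
        tag that leaves the cache, or None if nothing is evicted."""
        moved = self.sram.pop()
        if self.edram_ways == 0:
            return moved                   # pure SRAM: the block just leaves
        victim = None
        if len(self.edram) == self.edram_ways:
            victim = self.edram.pop()      # eDRAM LRU block is replaced
        self.edram.insert(0, moved)
        return victim

    def access(self, tag):
        """Return 'sram_hit', 'edram_hit' or 'miss' and update both stacks."""
        if tag in self.sram:               # SRAM hit: only the SRAM stack changes
            self.sram.remove(tag)
            self.sram.insert(0, tag)
            return "sram_hit"
        if tag in self.edram:              # eDRAM hit: swap with the SRAM LRU block
            self.edram.remove(tag)
            if len(self.sram) == self.sram_ways:
                self._demote_sram_lru()    # a slot was just freed: no eviction
            self.sram.insert(0, tag)
            return "edram_hit"
        if len(self.sram) == self.sram_ways:
            self._demote_sram_lru()        # miss: eDRAM LRU evicted, SRAM LRU demoted
        self.sram.insert(0, tag)           # incoming block goes to SRAM
        return "miss"


s = HybridSet(sram_ways=2, edram_ways=2)
print([s.access(t) for t in "ABCA"])  # ['miss', 'miss', 'miss', 'edram_hit']
print(s.sram, s.edram)                # ['A', 'C'] ['B']
```

The per-array lists mirror the split LRU stack described above. Each array only needs ordering information for its own ways, which is why splitting the stack reduces the number of control bits.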
C. Distributed Refresh
Swapping eDRAM and SRAM data on an eDRAM hit avoids refreshing the accessed eDRAM contents. However, data in eDRAM banks that are not accessed for a long time may be lost, since eDRAM capacitors lose their charge over time. Losing eDRAM contents hurts performance, because this data, if required again, must be fetched from main memory. To avoid such situations, a refresh operation must be performed on all eDRAM blocks, both in hybrid caches and in the pure eDRAM 16D scheme. It must happen before the capacitors lose the stored value (i.e., before the retention time of the capacitors expires).
Retention time depends on eDRAM capacitance. In this work, we consider eDRAM cells implemented with trench capacitors [16] with a 10fF capacitance. This corresponds to a retention time of 190K processor cycles at a 3GHz processor frequency [17]. To mitigate the refresh penalty, we assume a distributed refresh interleaved among banks, where each eDRAM block is regularly refreshed. The period between two consecutive refresh operations is set to the retention time divided by the number of eDRAM blocks.
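The refresh period can be worked out directly from the numbers above. The sketch assumes, as the text does, that each refresh operation handles one block. It uses the 1MB, 16-way, 64B-line LLC of Table III. The constant and function names are illustrative.

```python
CACHE_BYTES = 1 << 20        # 1MB LLC
LINE_BYTES = 64              # 64B lines (Table III)
TOTAL_WAYS = 16
RETENTION_CYCLES = 190_000   # 10fF trench capacitor at 3GHz [17]
CLOCK_HZ = 3e9


def edram_blocks(edram_ways):
    """Number of cache blocks stored in eDRAM ways."""
    return CACHE_BYTES // LINE_BYTES * edram_ways // TOTAL_WAYS


def refresh_period_cycles(edram_ways):
    """Cycles between two consecutive single-block refresh operations, so that
    every eDRAM block is refreshed once per retention time."""
    return RETENTION_CYCLES / edram_blocks(edram_ways)


print(f"retention time: {RETENTION_CYCLES / CLOCK_HZ * 1e6:.1f} us")  # 63.3 us
for name, ways in (("8S-8D", 8), ("4S-12D", 12), ("2S-14D", 14), ("16D", 16)):
    print(f"{name:7s} {edram_blocks(ways):6d} blocks, one refresh every "
          f"{refresh_period_cycles(ways):.2f} cycles")
# 8S-8D: 8192 blocks, 23.19 cycles; 4S-12D: 12288, 15.46;
# 2S-14D: 14336, 13.25; 16D: 16384, 11.60
```

For example, the 2S-14D cache holds 14,336 eDRAM blocks, so one block must be refreshed roughly every 13 cycles. The interval shrinks to about 11.6 cycles in the 16D cache. Section IV links this shortening of the refresh interval to bank contention.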

This interleaved refresh schedule guarantees that every eDRAM block is refreshed before its retention time expires.

IV. EXPERIMENTAL RESULTS
This section presents the simulation environment used to evaluate performance, energy, and area of the studied schemes. The hybrid caches have been modeled on top of an extensively modified version of the SimpleScalar simulation framework [18]. The simulation results include the execution time of the applications and the generated memory events (i.e., cache hits, misses, writebacks, swaps², and refreshes). These are required to estimate leakage and dynamic energy, respectively. We also model bank contention due to all these memory events in hybrid and pure eDRAM caches. In other words, an access to a given bank must wait until the previous access to the same bank finishes and the cache controller releases it, whereas accesses to different banks can be performed concurrently. Leakage, dynamic energy per access type, and area were estimated with CACTI 5.3 [19] for a 45nm technology node. The overall energy was calculated by combining the results of both simulators.
Experiments were performed by configuring SimpleScalar for the Alpha ISA and running the SPEC2000 benchmarks with the ref input set. Statistics were collected by simulating 500M instructions after skipping the initial 1B instructions. Table III summarizes the main architectural parameters used throughout the experiments. Cache access times were obtained with the CACTI tool assuming a 3GHz processor frequency.

Table III
MACHINE PARAMETERS
Microprocessor core
  Issue policy                 Out of order
  Branch predictor type        Hybrid gshare/bimodal: gshare has 14-bit global history plus 16K 2-bit counters, bimodal has 4K 2-bit counters, and choice predictor has 4K 2-bit counters
  Branch predictor penalty     10 cycles
  Fetch, issue, commit width   4 instructions/cycle
  ROB size (entries)           128
  # Int/FP ALUs                4/4
Memory hierarchy
  L1 instruction cache         64B-line, 16KB, 2-way, 2-cycle
  L1 data cache                64B-line, 16KB, 2-way, 2-cycle
  L2 unified cache             64B-line, 1MB, 16-way
  L2 access latency            Tag array: 2-cycle; SRAM data array: 6-cycle; eDRAM data array: 9-cycle
  Memory access latency        100-cycle

² The dynamic energy consumption due to the swap operation has been considered as the sum of the expenses of a read access to the SRAM banks, a read access to an eDRAM bank, a write access to an eDRAM bank, and a write access to an SRAM bank.

A. Performance Evaluation
To provide insight into cache performance, we first quantify the hit ratio in the different cache banks, since they work at different speeds. Remember that the design does not allow information loss due to capacitor discharges. Thus, the total hit ratio matches that obtained in pure caches. Figure 3 depicts the results.

Figure 3. Hit ratio (%) split into hits in SRAM and eDRAM banks. [Stacked bars for the 8S-8D, 4S-12D, and 2S-14D schemes per benchmark (vpr, mcf, twolf, bzip2, apsi, swim, lucas, art, facerec, wupwise, mgrid, applu) and Amean; y-axis: Hit Ratio, 0%-100%.]

As expected, the hit ratio in eDRAM banks (eDRAM hit ratio) increases with the number of eDRAM ways. Nevertheless, this is not the case in a few applications. For instance, the eDRAM hit ratio in art is almost the same regardless of the cache scheme. This behavior can be explained by looking at Figure 1. In this benchmark, the MRU location and the following one capture around 50% of cache hits, whereas locations 8 to 15 in the LRU stack capture almost all the remaining hits. The eDRAM hit ratio is, on average, 8%, 13%, and 17% for the 8S-8D, 4S-12D, and 2S-14D hybrid schemes, respectively.
To achieve good performance, the percentage of eDRAM hits should remain as low as possible, since an access to an eDRAM way takes more cycles than an access to an SRAM way. Periodic refresh operations also add bank contention, which causes further performance losses. These losses differ among the studied caches, since the time between two consecutive periodic refreshes becomes shorter as the number of eDRAM blocks increases. In addition, in the pure eDRAM 16D cache, reads are destructive and therefore require the data to be refreshed, which also induces bank contention. In contrast, the bank contention on an eDRAM hit in hybrid caches is induced by the swap operation between the involved banks.
Figure 4 shows the slowdown of the analyzed cache schemes with respect to the pure SRAM cache (lower is better). As expected, the slowdown increases with the number of eDRAM ways. In general, the performance loss is higher in those applications with a high hit ratio in eDRAM banks. For instance, in twolf, the eDRAM hit ratio can be as high as 34% in the 2S-14D approach, which leads to a 9% slowdown. The pure eDRAM architecture is strongly impacted both by its slower access time and by bank contention. For instance, in twolf and apsi, the slowdown is around 16.2% and 15.9%, respectively, resulting in very poor performance. The performance loss is, on average, 1.8%, 2.2%, and 2.9% in the 8S-8D, 4S-12D, and 2S-14D hybrid approaches, respectively. In the 16D scheme, the average slowdown grows to 5.2%. Finally, note that 8 banks are enough to obtain a reasonable slowdown for hybrid LLCs.

Figure 4. Slowdown (%) of the analyzed schemes with respect to the pure SRAM scheme. [Bars for 8S-8D, 4S-12D, 2S-14D, and 16D per benchmark and Hmean; y-axis: Slowdown, 0%-18%.]
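The sketch below models the bank-contention rule used in the simulations: an access waits for the previous access to the same bank, while different banks work concurrently. The trace is hypothetical, and so is the 9-cycle occupancy, including treating a refresh as occupying a bank for as long as an eDRAM data access. These are choices for illustration, not parameters of the authors' simulator.

```python
def serve(requests, n_banks):
    """Serve bank accesses in arrival order.

    requests: (arrival_cycle, bank, busy_cycles) tuples sorted by arrival.
    Returns one (start, finish) pair per request: an access waits until the
    previous access to the same bank has finished, while accesses to
    different banks proceed concurrently.
    """
    free_at = [0] * n_banks  # cycle at which each bank becomes free
    schedule = []
    for arrival, bank, busy in requests:
        start = max(arrival, free_at[bank])
        free_at[bank] = start + busy
        schedule.append((start, start + busy))
    return schedule


# Hypothetical trace: a refresh occupies bank 3 from cycle 0; two reads arrive
# at cycle 2, one to the busy bank 3 and one to the idle bank 5.
trace = [(0, 3, 9), (2, 3, 9), (2, 5, 9)]
for (arrival, bank, _), (start, finish) in zip(trace, serve(trace, n_banks=8)):
    print(f"bank {bank}: arrives {arrival}, starts {start}, stalls {start - arrival}")
# bank 3: arrives 0, starts 0, stalls 0
# bank 3: arrives 2, starts 9, stalls 7
# bank 5: arrives 2, starts 2, stalls 0
```

The more often refreshes and swaps occupy eDRAM banks, the more often a demand access finds its bank busy. This is the mechanism behind the growing losses of the eDRAM-heavy schemes.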
B. Energy-Delay Squared Product and Area
This section evaluates the trade-off between performance and energy consumption using the energy-delay squared product (ED²P) metric, since it shows whether the hybrid design is a cost-effective cache design for near-future technologies. Area savings are then estimated.
To compute the ED²P, the energy consumption has been obtained as the sum of leakage and dynamic energy after running 500M instructions. The results include the consumption of both tag and data arrays. Notice that the impact of the swap operation has been taken into account. Depending on the hybrid scheme, it represents between 11% and 13% of the total dynamic consumption (not shown due to space restrictions). Figure 5 shows the normalized ED²P with respect to the 16S cache (lower is better).
Compared to the pure SRAM cache, all the studied schemes reduce the ED²P despite their lower performance. This is mainly because hybrid and pure eDRAM schemes significantly reduce leakage currents by design. This points out the importance of eDRAM-based Last-Level Cache designs, and especially of hybrid designs, since 4S-12D and 2S-14D achieve, on average, a larger ED²P reduction than the 16D scheme. In particular, the 4S-12D approach reduces ED²P with respect to the 16D cache in 6 of 12 benchmarks, while the 2S-14D does so in 10 workloads. The pure eDRAM approach has the lowest leakage energy of all schemes. Compared to the hybrid schemes, however, it increases both the execution time (see Figure 4) and the dynamic energy. The reasons are that the pure eDRAM scheme accesses all cache ways of the data array, its reads are destructive, and it refreshes more eDRAM blocks. Notice that the 16D scheme presents better ED²P results than the 8S-8D scheme, since the latter, being half SRAM-based, consumes a large amount of leakage energy.

Figure 5. Normalized energy-delay squared product (%) with respect to the pure SRAM scheme. [Bars for 16S, 8S-8D, 4S-12D, 2S-14D, and 16D per benchmark and Amean; y-axis: Normalized Energy-Delay² Product, 0%-100%.]

ED²P is reduced, on average, by 57% and 66% in the 4S-12D and 2S-14D caches, respectively, and by 50% in the 16D approach. Therefore, a hybrid cache design with 12.5% (2S-14D) or 25% (4S-12D) of its banks built with SRAM technology is a better design option than a pure eDRAM cache.
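The ED²P normalization used in Figure 5 can be written out explicitly. Because delay enters squared, a scheme lowers ED²P only if its energy ratio stays below 1/(1 + slowdown)². The sketch below computes both the normalized value and that break-even energy ratio. The 60% energy reduction in the example is hypothetical; only the 2.9% and 5.2% average slowdowns come from Section IV-A.

```python
def ed2p(energy, delay):
    """Energy-delay squared product E * D**2 (any consistent units)."""
    return energy * delay ** 2


def normalized_ed2p(energy, delay, ref_energy, ref_delay):
    """ED2P of a scheme relative to a reference scheme (e.g. 16S)."""
    return ed2p(energy, delay) / ed2p(ref_energy, ref_delay)


def break_even_energy_ratio(slowdown):
    """Largest energy ratio E/E_ref that still lowers ED2P when the scheme
    is slower by the fraction `slowdown` (D/D_ref = 1 + slowdown)."""
    return 1.0 / (1.0 + slowdown) ** 2


# Hypothetical scheme: 60% less energy than the reference and 2.9% slower.
print(f"{normalized_ed2p(0.4, 1.029, 1.0, 1.0):.3f}")  # 0.424
# With a 5.2% slowdown, energy must stay below this fraction of the reference:
print(f"{break_even_energy_ratio(0.052):.3f}")           # 0.904
```

Small slowdowns barely move the break-even point. This is why large leakage savings dominate the ED²P comparison against the pure SRAM cache.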
The hybrid caches and the pure eDRAM cache require less area than the pure SRAM cache, since eDRAM cells have higher density than SRAM ones. CACTI provides SRAM and eDRAM cell areas of 0.296 µm² and 0.062 µm², respectively, for a 45nm technology node. With these values, the 16D scheme is the one that most reduces the data array area (by 56%), closely followed by the 2S-14D hybrid cache (49%). Since the tag array is built with SRAM technology, no area benefits come from this small structure. Thus, the total cache area savings are 53% and 46% for the pure eDRAM and 2S-14D schemes, respectively.
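A quick cell-level estimate shows why the eDRAM-heavy schemes save so much area. The sketch counts only storage-cell area, using the CACTI cell sizes of 0.296 µm² (SRAM) and 0.062 µm² (eDRAM). It ignores decoders, sense amplifiers, and wiring. Its figures are therefore larger than the 49% and 56% data-array savings reported from CACTI, presumably because those peripheral structures do not shrink with the cells. The names are illustrative.

```python
SRAM_CELL_UM2 = 0.296   # CACTI cell area at 45nm, as quoted in the text
EDRAM_CELL_UM2 = 0.062


def cell_area_savings(sram_ways, edram_ways):
    """Fraction of storage-cell area saved versus an all-SRAM data array with
    the same number of ways. Peripheral circuitry is NOT modelled."""
    hybrid = sram_ways * SRAM_CELL_UM2 + edram_ways * EDRAM_CELL_UM2
    pure_sram = (sram_ways + edram_ways) * SRAM_CELL_UM2
    return 1.0 - hybrid / pure_sram


for name, s, d in (("8S-8D", 8, 8), ("4S-12D", 4, 12), ("2S-14D", 2, 14), ("16D", 0, 16)):
    print(f"{name:7s} {cell_area_savings(s, d):.1%}")
# 8S-8D 39.5%, 4S-12D 59.3%, 2S-14D 69.2%, 16D 79.1% (cell area only)
```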
V. RELATED WORK
To take advantage of the properties that each technology
offers, previous works have focused on hybrid architectures
in different structures of the microprocessor such as first-level
caches, Non-Uniform Cache Architectures, and multi-threaded
register files.
Valero et al. [8] [9] proposed a hybrid n-bit cell, called macrocell,
which consists of one SRAM cell, n-1 eDRAM
cells, and n-1 bridge transistors that allow internal movements
between SRAM and eDRAM cells. The macrocell is used
to implement n-way set-associative first-level data caches,
so that one cache way is built with SRAM cells and the
remaining n-1 ways are implemented with eDRAM cells. Due
to the highly-predictable data locality in L1, the single way
built with SRAM technology is used to store the MRU data.
Unfortunately, the data locality widely differs in LLCs, so a
significant number of accesses would be performed in slow
eDRAM cells.
In [10], Wu et al. proposed two hybrid designs: LHCA and RHCA. The former implements the L3 cache with eDRAM, MRAM, or PRAM, while both the L1 and L2 levels are built with SRAM technology. In the latter, the L2 and L3 caches are flattened into a pair of regions that form a single level. One region is SRAM-based and the other is eDRAM-, MRAM-, or PRAM-based, whereas the L1 is SRAM-based. The RHCA design requires much more hardware complexity than our proposal to manage data movements between regions. It requires not only the LRU stack of all lines in a set, but also an additional sticky bit for the SRAM lines and a 2-bit saturating counter per eDRAM line. Unlike this work, [10] does not explore the design space by varying the size of the SRAM region, which is fixed to 256KB across all experiments.
Lira et al. [11] proposed two different architectures (homogeneous and heterogeneous) for a hybrid SRAM/eDRAM
NUCA. In the homogeneous organization, the fast SRAM
banks store the frequently accessed blocks and they are
placed close to the cores, whereas the eDRAM banks are
located in the center of the NUCA. However, this approach
is penalized by the shared data, since it is usually located in
slow eDRAM banks. On the other hand, the heterogeneous architecture balances the number of SRAM and eDRAM banks with their location (close to the cores or in the center of the NUCA). The authors argue that the same number of SRAM and eDRAM banks provides the best trade-off among performance, power, and area in this organization.
In [12], Yu et al. presented an augmented 1-bit SRAM
cell with several eDRAM cells, resulting in a multiple-bit
SRAM/eDRAM cell to implement register files. The fast
SRAM cell is aimed at storing the active context, whereas each pair of eDRAM cells stores a dormant context. An additional pair of eDRAM cells is used as a replica of the active context. A dormant context becomes active by transferring the data from the pair of eDRAM cells to the SRAM one. Performance, energy, and area are evaluated using this hybrid cell.
VI. CONCLUSIONS
Cache memories have typically been built with SRAM technology to achieve high-speed accesses. However, this technology presents important drawbacks such as high leakage currents and low density. In contrast, new advances in technology allow cache memories to be implemented with eDRAM technology. eDRAM offers low leakage and high density at the expense of slower accesses than SRAM. Since both technologies are CMOS compatible, they can be integrated on the same die during manufacturing. The eDRAM technology has been used in Last-Level Caches (LLCs), where energy is an important design concern.
In this paper, SRAM and eDRAM technologies have been combined in the LLC, resulting in a novel hybrid cache design consisting of SRAM and eDRAM banks. The optimal percentage of SRAM banks has been explored to achieve the best trade-off among performance, energy, and area. Architectural mechanisms have been considered to keep the most likely accessed data in SRAM banks and to avoid unnecessary destructive reads in eDRAM banks.
Experimental results have been obtained for a 1MB-16way hybrid cache and compared to a conventional SRAM LLC with the same storage capacity. The average performance degradation never exceeds 2.9%, whereas area savings reach up to 46%. For a 45nm technology node, the energy-delay squared product confirms that, on average, a hybrid cache is a better design than a pure SRAM cache regardless of the number of eDRAM banks. Moreover, contrary to what might be expected, results have shown that it is a better design than a pure eDRAM cache when the percentage of SRAM banks is 12.5% or 25%.
Finally, evaluating the potential of the hybrid architecture in CMP systems, where different threads share the LLC and data locality may change, is planned as future work.
REFERENCES
[1] J. M. Tendler, J. S. Dodson, J. S. Fields, H. Le, and B. Sinharoy, "POWER4 system microarchitecture," IBM J. Research and Development, vol. 46, no. 1, pp. 5–25, 2002.
[2] B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, and J. B. Joyner, "POWER5 system microarchitecture," IBM J. Research and Development, vol. 49, no. 4/5, pp. 505–521, 2005.
[3] H. Q. Le, W. J. Starke, J. S. Fields, F. P. O'Connell, D. Q. Nguyen, B. J. Ronchetti, W. M. Sauer, E. M. Schwarz, and M. T. Vaden, "IBM POWER6 microarchitecture," IBM J. Research and Development, vol. 51, no. 6, pp. 639–662, 2007.
[4] B. Sinharoy, R. Kalla, W. J. Starke, H. Le, R. Cargnoni, J. A. Van Norstrand, B. J. Ronchetti, J. Stuecheli, J. Leenstra, G. L. Guthrie, D. Q. Nguyen, B. Blaner, C. F. Marino, E. Retter, and P. Williams, "IBM POWER7 multicore server processor," IBM J. Research and Development, vol. 55, no. 3, 2011.
[5] R. E. Matick and S. E. Schuster, "Logic-Based eDRAM: Origins and Rationale for Use," IBM J. Research and Development, vol. 49, no. 1, pp. 145–165, 2005.
[6] http://www.uniramtech.com/embedded dram.php.
[7] X. Jiang, N. Madan, L. Zhao, M. Upton, R. Iyer, S. Makineni, and D. Newell, "CHOP: Adaptive Filter-Based DRAM Caching for CMP Server Platforms," in Proc. 16th Int'l Symp. High-Performance Computer Architecture, 2010, pp. 1–12.
[8] A. Valero, J. Sahuquillo, S. Petit, V. Lorente, R. Canal, P. Lopez, and J. Duato, "An Hybrid eDRAM/SRAM Macrocell to Implement First-Level Data Caches," in Proc. 42nd Ann. IEEE/ACM Int'l Symp. Microarchitecture, 2009, pp. 213–221.
[9] A. Valero, S. Petit, J. Sahuquillo, P. Lopez, and J. Duato, "Design, Performance, and Energy Consumption of eDRAM/SRAM Macrocells for L1 Data Caches," IEEE Trans. Computers, vol. 61, no. 9, pp. 1231–1242, 2012.
[10] X. Wu, J. Li, L. Zhang, E. Speight, R. Rajamony, and Y. Xie, "Hybrid Cache Architecture with Disparate Memory Technologies," in Proc. 36th Ann. Int'l Symp. Computer Architecture, 2009, pp. 34–45.
[11] J. Lira, C. Molina, D. Brooks, and A. Gonzalez, "Implementing a hybrid SRAM / eDRAM NUCA architecture," in Proc. 18th Int'l Conf. High Performance Computing, 2011, pp. 1–10.
[12] W.-k. S. Yu, R. Huang, S. Q. Xu, S.-E. Wang, E. Kan, and G. E. Suh, "SRAM-DRAM Hybrid Memory with Applications to Efficient Register Files in Fine-Grained Multi-Threading," in Proc. 38th Ann. Int'l Symp. Computer Architecture, 2011, pp. 247–258.
[13] S. Petit, J. Sahuquillo, J. M. Such, and D. Kaeli, "Exploiting Temporal Locality in Drowsy Cache Policies," in Proc. 2nd Conf. Computing Frontiers, 2005, pp. 371–377.
[14] Standard Performance Evaluation Corporation, available online at http://www.spec.org/cpu2000.
[15] T. Kirihata, P. Parries, D. R. Hanson, H. Kim, J. Golz, G. Fredeman, R. Rajeevakumar, J. Griesemer, N. Robson, A. Cestero, B. A. Khan, G. Wang, M. Wordeman, and S. S. Iyer, "An 800-MHz Embedded DRAM with a Concurrent Refresh Mode," IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1377–1387, 2005.
[16] B. Keeth, R. J. Baker, B. Johnson, and F. Lin, DRAM Circuit Design: Fundamental and High-Speed Topics. John Wiley and Sons, Inc., Hoboken, New Jersey, 2008.
[17] A. Valero, J. Sahuquillo, V. Lorente, S. Petit, P. Lopez, and J. Duato, "Impact on Performance and Energy of the Retention Time and Processor Frequency in L1 Macrocell-Based Data Caches," IEEE Trans. Very Large Scale Integration Systems, vol. 20, no. 6, pp. 1108–1117, 2012.
[18] D. Burger and T. Austin, "The SimpleScalar tool set, version 2.0," Computer Architecture News, vol. 25, no. 3, pp. 13–25, 1997.
[19] S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi, "CACTI 5.1," Hewlett-Packard Laboratories, Palo Alto, Technical Report, 2008.

2013 International Conference on Computer Communication and Informatics (ICCCI-2013), Jan. 04–06, 2013, Coimbatore, INDIA

Low power reconfigurable FPGA based on SRAM

A. Priadarshini
P.G. Scholar - M.E. VLSI Design
Sri Ramakrishna Engineering College
Coimbatore, India
priaasokan123@gmail.com

Dr. M. Jagadeeswari
Professor and Head - M.E. VLSI Design
Sri Ramakrishna Engineering College
Coimbatore, India
jagadee_raj@rediffmail.com

Abstract- This paper presents a low-power Field Programmable Gate Array (FPGA) architecture based on CMOS Static Random Access Memory (SRAM). The architecture uses CMOS logic and CMOS SRAMs for on-chip dynamic reconfiguration. It employs fast, low-power SRAM blocks built from 10T SRAM cells. These blocks provide fast access to the configuration bits by means of the shadow SRAM technique, whose shadow SRAM cells hide the dynamic reconfiguration delay behind the computation delay. Together, the 10T SRAM memory cells and the shadow SRAM scheme reduce both delay and power consumption. For the 10T SRAM memory cell, experimental results show a delay of about 8.035ns and a power consumption of about 0.015W. Both are lower than those of the 4T and 6T SRAM cells, at the cost of an area overhead. For the logic block (LB) of the FPGA architecture, which employs CMOS SRAMs built with 10T SRAM memory cells, the results show a delay of about 8.979ns and a power consumption of about 0.052W.
Keywords- Static Random Access Memory (SRAM), Field-programmable gate arrays (FPGAs), Shadow SRAM, Fast access.

978-1-4673-2907-1/13/$31.00 ©2013 IEEE

I. INTRODUCTION
NATURE, a hybrid CMOS/nanotechnology reconfigurable architecture that facilitates run-time reconfigurability, was presented earlier. NATURE was based on CMOS logic and nano RAMs. It used temporal logic folding and fine-grain dynamic reconfiguration to increase logic density. The main drawback of this design is that it requires a fine-grained distribution of nano RAMs throughout the field-programmable gate array (FPGA) architecture. Since the fabrication process of nano RAMs is not yet mature, NATURE cannot be exploited immediately [1]. FPGAs are future-oriented building blocks which allow full customization of the hardware at an attractive price, even in low quantities. FPGA components available today have usable sizes at an acceptable price. This makes them effective for reducing cost and time-to-market when making individual configurations of standard products. A time-consuming and expensive board redesign can often be avoided through application-specific integration of IP cores in the FPGA. This is an alternative for the future, especially for very specialized applications with only small or medium quantities [7]. Another important aspect is long-term
availability. The advantage of FPGAs and their nearly
unlimited availability lies in the fact that even if the device
migrates to the next generation the code remains
unchanged. FPGAs contain programmable logic gate
components called logic blocks and a hierarchy of
reconfigurable interconnects that allow the blocks to be
wired together. The logic blocks also include memory
elements, which may be simple flip-flops or an array of
complete blocks of memory. The drawback of FPGAs is
that, the area, power consumption and delay parameters are
high compared to the application-specific integrated
circuits (ASICs) [2]. This drawback is primarily due to the
overheads introduced for reconfigurability. In order to
overcome the drawbacks of the current FPGA, a hybrid
CMOS/nanotechnology reconfigurable architecture, called
NATURE [3], was proposed previously to solve two main
problems: logic density and efficiency of run-time
reconfiguration. This NATURE technology is based on
CMOS logic and nano RAMs. These nano RAMs were
advantageous due to the fact that they provide high-speed
and high density, which enables the concept of temporal
logic folding that is similar to the temporal pipelining
concept [4].Due to the fact that, nano RAM fabrication
techniques are not mature yet, and the demand for the use
of fine-grained distributed nano RAMs incurs extra design
complexity and cost, marks a drawback of the nano RAMs.
Therefore, in order to avoid the use of nano RAMs, this
paper presents SRAM-based FPGA architecture. This
architecture overcomes the disadvantages of the nano
RAMs by employing CMOS logic and CMOS devices.
Reduced power consumption can be achieved by
employing low-power 10T non-precharge SRAM blocks.
These blocks are used for storage of configuration bits [5],
which save the charge/precharge power on bitlines during
read operation [1]. The proposed 10T SRAM-based FPGA
architecture achieves reduced reconfiguration delay and
power consumption. Simulation results show significant
improvements in performance due to the reduced delay
being achieved at competitive power consumption. The
remainder of this paper is organized as follows. Section II
presents fundamental facts of the 4T, 6T and 10T SRAM
cells and the previous design of NATURE. Section III

2013 International Conference on Computer Communication and Informatics (ICCCI -2013), Jan. 04 06, 2013, Coimbatore, INDIA

describes the proposed SRAM-based architecture. Section IV presents experimental results on the delay and power consumption of the proposed architecture, which employs the 10T SRAM cell, and compares the 4T, 6T and 10T SRAM cells. Section V concludes this paper.
II. FUNDAMENTAL FACTS
This section provides the fundamental material necessary for understanding the CMOS SRAM-based NATURE. In Section II-A, we discuss the structure and characteristics of the 4T SRAM cell. Next, in Section II-B, we discuss the structure and characteristics of the 6T SRAM cell, and in Section II-C, the design and characteristics of the 10T SRAM cell proposed in [4]. These 10T SRAM cells are the elementary blocks of the memory block in the logic block (LB) of the FPGA, which enables reduced power consumption. Finally, in Section II-D, we present the concept behind NATURE technology.
A. 4T SRAM

This design consists of four NMOS transistors plus two polysilicon load resistors, as shown in Fig.1. Two of the NMOS transistors are pass transistors: their gates are tied to the word line, and they connect the cell to the columns. The other two NMOS transistors are the pull-downs of the flip-flop inverters, and the load of each inverter is a very-high-resistance polysilicon resistor. The cell therefore needs room only for the four NMOS transistors [12]. Although the 4T SRAM cell can be smaller than the 6T and 10T cells, it is slower and consumes more power.
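To see why the resistive load costs power, the following sketch estimates the standby current of a 4T cell. It is a first-order model under our own assumptions, not taken from the paper: the pull-down on the side storing 0 is fully on and its on-resistance is negligible next to the polysilicon load, so that load always carries roughly VDD / R_load. The function names and example values are hypothetical.

```python
def standby_current_4t(vdd: float, r_load: float) -> float:
    """Approximate standby current (A) of one resistive-load 4T cell.

    Assumption: the pull-down on the side storing 0 is on and its
    on-resistance is negligible, so that load carries VDD / R_load,
    while the load on the '1' side carries (almost) no current.
    """
    return vdd / r_load


def array_standby_power(vdd: float, r_load: float, n_cells: int) -> float:
    """Static power (W) of n_cells identical 4T cells: P = VDD * I * n."""
    return vdd * standby_current_4t(vdd, r_load) * n_cells


if __name__ == "__main__":
    # Illustrative values only (not from the paper): 1.2 V, 10 Gohm loads, 1 Mbit.
    print(f"{standby_current_4t(1.2, 1e10):.2e} A per cell")
    print(f"{array_standby_power(1.2, 1e10, 2**20):.2e} W for the array")
```

In a CMOS 6T cell the PMOS load on the '0' side is off, so this resistive term disappears and only transistor leakage remains.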

B. 6T SRAM

A different cell design that eliminates the above limitations uses a CMOS flip-flop, in which each load resistor is replaced by a PMOS transistor. This SRAM cell is composed of six transistors: one NMOS and one PMOS transistor for each inverter, plus two NMOS transistors connected to the row line, as shown in Fig.2. This cell offers better speed and power than the 4T structure [9]. Although it occupies less area than the 10T cell, its delay and power consumption are higher because it lacks the readout inverter.
C. 10T SRAM

The proposed architecture employs the low-power non-precharge 10T SRAM cell in each memory cell. As shown in Fig. 3, a 10T SRAM cell consists of a conventional 6T SRAM cell, a readout inverter and a transmission gate for the read port. The 10T SRAM cell supports both read and write operations, and the write operation is the same as in a conventional 6T SRAM cell. For the read operation, the 10T SRAM cell uses a non-precharge scheme [1]: since the readout inverter can fully charge and discharge the read bitline, no precharge is required. The voltage on the bitline therefore does not switch until the readout datum changes, which saves readout power and also improves the delay compared with conventional 4T and 6T SRAM cells, since no time is spent on precharging. However, the area overhead of the cell relative to the conventional 6T and 4T SRAM cells is quite high. Because the 10T SRAM design avoids high switching activity on the memory read bitlines and thus saves most of the charge/precharge power, it is a promising candidate for low-power applications.
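The power argument above rests on how often the read bitline switches. The sketch below counts bitline transitions for a stream of read data under two simplified, assumed models: a single-ended bitline that is precharged high and discharged whenever a 0 is read (two transitions per 0 read), and a non-precharge bitline driven by the readout inverter, which toggles only when the datum changes. The function names are hypothetical, and capacitance and voltage are ignored; switching energy grows with these counts.

```python
from typing import Sequence


def precharge_transitions(reads: Sequence[int]) -> int:
    """Bitline toggles with a precharged, single-ended read bitline.

    Assumed model: the bitline starts each read precharged high; reading
    a 0 discharges it and the next precharge restores it (2 toggles),
    while reading a 1 leaves it untouched.
    """
    return 2 * sum(1 for bit in reads if bit == 0)


def non_precharge_transitions(reads: Sequence[int], initial: int = 1) -> int:
    """Bitline toggles when a readout inverter drives the bitline fully.

    The bitline simply follows the read datum, so it switches only
    when consecutive read values differ.
    """
    toggles = 0
    previous = initial
    for bit in reads:
        if bit != previous:
            toggles += 1
        previous = bit
    return toggles


if __name__ == "__main__":
    correlated = [0, 0, 0, 0, 1, 1, 1, 1]   # e.g. similar neighbouring values
    alternating = [0, 1, 0, 1, 0, 1, 0, 1]
    for name, data in (("correlated", correlated), ("alternating", alternating)):
        print(name, precharge_transitions(data), non_precharge_transitions(data))
```

For the correlated stream the non-precharge bitline toggles 2 times instead of 8, while for the alternating stream both schemes give 8, so under this model the saving depends on how correlated the read data are.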

Fig. 1. 4T SRAM cell.

Fig. 2. 6T SRAM cell.

Fig. 3. 10T SRAM cell.

D. NATURE

A hybrid CMOS/nanotechnology reconfigurable architecture, referred to as the NATURE technology in previous papers, can facilitate run-time reconfiguration [2]. It relies on accessing the reconfiguration bits from nano RAMs.


Fig. 4. High-level view of the SRAM-based architecture.

The concept of temporal logic folding is employed because of the small latency of accessing reconfiguration bits from nano RAMs. Logic folding can be performed on large circuits by partitioning them into a sequence of small folding stages. The on-chip local nano RAMs store the configuration bits for these folding stages, and every cycle a new configuration is loaded from the nano RAMs into the SRAM cells to reconfigure the logic and interconnects. NATURE achieves an order-of-magnitude improvement in logic density, better power consumption and competitive delays for large applications. However, to realize NATURE technology, the nano RAMs have to be replaced with SRAMs or embedded DRAMs, and this work analyzes the impact on delay and power when SRAMs are used. Every block in the NATURE architecture relies on nano RAMs, which make it fast and power efficient, but their fabrication process is not yet mature, which makes the design process difficult. Hence, instead of nano RAMs, we use CMOS devices.
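Logic folding can be illustrated with a simplified, depth-based partition: each LUT is assigned to a folding stage according to its logic depth, with `folding_level` LUT levels per stage, and each stage then needs its own configuration. This is an assumed illustration of the idea only; the function names are hypothetical, and the actual NATURE mapping flow is more elaborate than this.

```python
import math
from typing import Dict, List, Tuple


def logic_depths(fanins: Dict[str, List[str]]) -> Dict[str, int]:
    """Logic depth of every node in a combinational DAG.

    `fanins` maps each node to its fan-in nodes; nodes with no fan-ins are
    primary inputs (depth 0). The graph is assumed to be acyclic.
    """
    depth: Dict[str, int] = {}

    def visit(node: str) -> int:
        if node not in depth:
            preds = fanins.get(node, [])
            depth[node] = 0 if not preds else 1 + max(visit(p) for p in preds)
        return depth[node]

    for node in fanins:
        visit(node)
    return depth


def fold(fanins: Dict[str, List[str]],
         folding_level: int) -> Tuple[Dict[str, int], int]:
    """Assign each LUT to a folding stage of `folding_level` LUT levels.

    Returns (stage index of each LUT, number of stages = configurations needed).
    """
    depth = logic_depths(fanins)
    stages = {n: (d - 1) // folding_level for n, d in depth.items() if d > 0}
    n_stages = math.ceil(max(depth.values(), default=0) / folding_level)
    return stages, n_stages


if __name__ == "__main__":
    # Hypothetical 4-level chain x -> a -> b -> c -> d plus a side LUT e.
    circuit = {"x": [], "a": ["x"], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["x", "b"]}
    print(fold(circuit, folding_level=2))
```

A smaller folding level yields more stages and therefore more configurations to load per circuit, which is why fast access to the stored configuration bits matters.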
III. SRAM-BASED ARCHITECTURE
In this section, we present the proposed CMOS SRAM-based NATURE. Fig. 4 shows the high-level view of the architecture [1]. It contains island-style LBs connected by various levels of interconnect. Switch blocks (SBs) connect the wire segments, while connection blocks (CBs) connect the input and output ports of each LB to the interconnection network. To support fast run-time reconfiguration, each LB is associated with a local 10T SRAM block that stores the configuration copies. In the following subsections, we describe the LB, the interconnect and the support for reconfiguration in detail.
A. LB

The FPGA architecture consists of an array of LBs. Each LB has eight logic elements (LEs) and a local switch matrix, as shown in Fig. 5. The LEs in the LB perform the necessary logic computations.

Fig. 5. Design of an LB.

Fig. 6. Design of an LE.

The local switch matrix in the LB is designed to facilitate high-speed local communication among LEs. An LE has six inputs, which are selected through the switch matrix. The local switch matrix acts as a set of 56-to-1 multiplexers (MUXs), selecting signals from the 40 common inputs of the LB and 16 feedback signals (2 feedback signals per LE) [1]. The 40 common inputs arrive from the interconnection network outside the LB and connect to the local switch matrix through CBs. CBs and the interconnection network are described in detail in Section III-C.
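The numbers above fit together as follows: 8 LEs × 2 feedback signals gives the 16 feedback inputs, so every LE input chooses among 40 + 16 = 56 signals. The sketch below derives the MUX width and, assuming binary-encoded select lines and one 56-to-1 MUX per LE input (our assumptions, not stated in the text), the number of configuration bits the local switch matrix would need.

```python
import math


def switch_matrix_summary(n_les: int = 8, inputs_per_le: int = 6,
                          common_inputs: int = 40,
                          feedback_per_le: int = 2) -> dict:
    """Size the local switch matrix of one LB (defaults taken from the text)."""
    mux_width = common_inputs + n_les * feedback_per_le   # 40 + 8*2 = 56
    n_muxes = n_les * inputs_per_le                       # assumed: one MUX per LE input
    select_bits = math.ceil(math.log2(mux_width))         # assumed: binary-encoded select
    return {
        "mux_width": mux_width,
        "n_muxes": n_muxes,
        "select_bits_per_mux": select_bits,
        "config_bits": n_muxes * select_bits,
    }


if __name__ == "__main__":
    print(switch_matrix_summary())
    # {'mux_width': 56, 'n_muxes': 48, 'select_bits_per_mux': 6, 'config_bits': 288}
```

A one-hot or decoded select scheme would change the bit count, so the 288 figure holds only under the binary-select assumption.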
B. LE
Each LB consists of eight LEs, which perform the necessary computations. Fig. 6 shows the design of an LE, which contains a four-input LUT, four D flip-flops (DFFs) and several crossbars. Conventional FPGA designs have only one DFF in each LE, but our design includes four. These DFFs are used when logic folding is performed, to store multiple temporary computation results for interstage communication. The 3-to-1 crossbars select the signals to be latched, and the 5-to-1 crossbars determine the output signals of the LE from the LUT or the DFFs. Both the 3-to-1 and the 5-to-1 crossbars act as switches that select signals according to their select lines. Among the four LE outputs [1], two are connected to the interconnection network through the CB and form part of the LB outputs, while the other two are fed back through the local switch matrix to the local LEs.
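A behavioral sketch of the LE datapath may help: a four-input LUT is a 16-entry truth table indexed by its inputs, and the 5-to-1 output crossbar picks either the LUT output or one of the four DFFs. The truth-table bit ordering and the select encoding are our assumptions, the names are hypothetical, and the 3-to-1 latch crossbars are omitted because the text does not list their inputs.

```python
from typing import Sequence


def lut4(truth_table: int, inputs: Sequence[int]) -> int:
    """Evaluate a 4-input LUT.

    Bit i of the 16-bit `truth_table` is the output for the input
    combination whose binary value is i; inputs[0] is the LSB (assumed).
    """
    if len(inputs) != 4:
        raise ValueError("a 4-input LUT needs exactly four inputs")
    index = sum((bit & 1) << pos for pos, bit in enumerate(inputs))
    return (truth_table >> index) & 1


def le_output(lut_out: int, dffs: Sequence[int], sel: int) -> int:
    """5-to-1 output crossbar: sel 0 selects the LUT, 1-4 select a DFF."""
    if len(dffs) != 4:
        raise ValueError("the LE has four DFFs")
    if not 0 <= sel <= 4:
        raise ValueError("sel must be in 0..4")
    candidates = [lut_out, *dffs]
    return candidates[sel]


if __name__ == "__main__":
    AND4 = 1 << 15  # only input pattern 1111 (index 15) outputs 1
    out = lut4(AND4, [1, 1, 1, 1])
    print(out, le_output(out, [0, 0, 1, 0], sel=3))
```

Selecting a DFF instead of the LUT is what lets a value computed in one folding stage be consumed in a later stage.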
C. Interconnects

The interconnect and routing among the elements of the LB are designed to facilitate interstage communication.

Fig. 7. Architecture of a switch block. (a) Switch block. (b) Design of a switch.

Fig. 8. Read circuit in the memory block.

The routing architecture is required to facilitate logic folding and efficient local communication while maintaining good area efficiency. The interconnect design consists of CBs, SBs and various levels of wire segments. The 40 common inputs and the 16 outputs of an LB connect to the interconnection network through transmission gates. Each pin connects to a switch block, shown in Fig. 7(a), which is composed of 32 switches. Fig. 7(b) shows the switch design explained in [1]. Connections between pairs of tracks are based on transmission gates, and an output buffer is placed at each SB output; six connections share four buffers for better area efficiency [11]. Transmission gates, acting as buffers, allow each track to connect to a perpendicular track, and these connections enable communication between the horizontal and vertical directions. In this architecture, the switch matrix simply acts as a multiplexer that selects inputs according to the select lines of each logic element in the logic block of the FPGA [12]. This enables efficient reconfiguration with reduced delay and average power.
D. Support for reducing reconfiguration delay

Dynamic reconfigurability of the architecture depends on fast access to the reconfiguration bits stored in the local embedded memory [6]. The proposed architecture is based on low-power, high-speed 10T SRAMs [4] that support fast read operations and a large bit-width. However, loading reconfiguration bits from the local memory blocks every cycle results in a delay overhead. Hence, we added a shadow SRAM cell to each reconfigurable element to further improve performance, allowing the reconfiguration delay to be hidden. As shown in Fig. 4, each LB is associated with a 10T SRAM block that stores the configuration copies for the LB, CB and SB. The memory is designed to support 32 configurations for a circuit mapped to the FPGA, so 32 × (number of configuration bits) must be stored in the memory block. As shown in Fig. 8, 32 DFFs are serially connected to implement a shift register, and the 32 wordlines (each 1438 bits wide) are activated by the shift register row by row. The read bit lines (RBL) shown in Fig. 8 then provide the configuration data to the shadow SRAMs. Conventional FPGAs use only a single SRAM cell to control each reconfigurable switch.

Fig. 9. Shadow SRAM.
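The row-by-row activation can be sketched as a one-hot token moving through the 32 serially connected DFFs. The sketch assumes the last DFF feeds back into the first (a ring counter), so the 32 configurations repeat; the text does not say how the sequence restarts, and the names below are hypothetical.

```python
from itertools import islice
from typing import Iterator, List

N_CONFIGS = 32            # wordlines (configurations) per LB, from the text
BITS_PER_WORDLINE = 1438  # configuration bits per wordline, from the text


def wordline_sequence(n_rows: int = N_CONFIGS) -> Iterator[List[int]]:
    """Yield the one-hot wordline vector for each successive cycle.

    A single 1 shifts through n_rows serially connected DFFs; wrapping
    from the last DFF back to the first is an assumption.
    """
    state = [1] + [0] * (n_rows - 1)
    while True:
        yield list(state)
        state = [state[-1]] + state[:-1]  # move the token one row further


def active_row(wordlines: List[int]) -> int:
    """Index of the wordline currently driven high."""
    return wordlines.index(1)


if __name__ == "__main__":
    rows = [active_row(wl) for wl in islice(wordline_sequence(), 34)]
    print(rows[:3], rows[-2:])  # wraps from row 31 back to row 0
    print(N_CONFIGS * BITS_PER_WORDLINE, "bits stored in one LB memory block")
```

Because only one wordline is high per cycle, each cycle delivers exactly one 1438-bit configuration onto the read bit lines.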


The reconfiguration requires access to the local 10T SRAM block, which leads to a delay overhead. To hide this delay, we use the shadow SRAM scheme introduced in [8]. Fig. 9 shows how this scheme controls a reconfigurable switch. It consists of two 6T SRAM cells and a 2-to-1 MUX. The WBL and WBL_N signal lines from the read circuit are fed into the write1 and write2 signals of the SRAM1 and SRAM2 cells. The sel signal selects which of the two cells controls the switch in the current cycle. While one SRAM cell is in use, the other can be reconfigured for the computation of the upcoming cycle, or remain in a wait state if the first cell is needed in the next cycle as well. In the next cycle, the two SRAM cells switch roles. Hence, if the memory access time is smaller than the computation delay, the configuration delay can be completely hidden.
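The benefit of the shadow cell can be estimated with a simple timing model. Under our assumptions (not the paper's measurements), without a shadow cell every stage waits for its configuration read and then computes; with the two-cell ping-pong scheme the read for the next stage overlaps the current computation, so only the first read and any excess of access time over compute time remain visible. Names and numbers are hypothetical.

```python
from typing import Sequence


def total_time(compute_ns: Sequence[float], access_ns: float, shadow: bool) -> float:
    """Execution time of a folded circuit under a simple timing model.

    compute_ns: computation delay of each folding stage (cycle).
    access_ns:  time to read one configuration from the local SRAM block.
    """
    if not compute_ns:
        return 0.0
    if not shadow:
        # Read the configuration, then compute, for every stage.
        return sum(access_ns + c for c in compute_ns)
    # Shadow SRAM: while stage i computes with one cell, the other cell is
    # loaded for stage i + 1, so each overlapped step costs max(c, access).
    overlapped = sum(max(c, access_ns) for c in compute_ns[:-1])
    return access_ns + overlapped + compute_ns[-1]


if __name__ == "__main__":
    stages = [5.0, 5.0, 5.0]
    print(total_time(stages, 2.0, shadow=False))  # 21.0
    print(total_time(stages, 2.0, shadow=True))   # 17.0: reads hidden behind compute
```

Once `access_ns` exceeds a stage's compute time, the excess appears in every overlapped step, which matches the condition stated above for fully hiding the configuration delay.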
IV. EXPERIMENTAL RESULTS

In this section, we present simulation results for the proposed architecture, in which every memory cell in the read circuit of the memory block associated with the LB is a 10T SRAM cell. Consider first the area-delay tradeoffs. Generally, as the folding level increases, the number of LBs increases and the delay decreases. However, since only a single logic block of the FPGA architecture is designed here, we show how the shadow SRAM scheme reduces the delay during reconfiguration.
Compared with designs based on 4T and 6T CMOS SRAM cells, the area, delay and power of the proposed architecture are affected by other factors. First, the shadow SRAM cells occupy more area than a conventional single SRAM cell, but with the shadow SRAM the configuration delay is hidden behind the computation delay. Second, the CMOS 10T SRAM cell reduces delay and power at the expense of a larger area overhead; compared with nano RAMs, the memory area overhead increases when 10T SRAMs are used [14]. In conventional FPGAs, the interconnect dominates the power consumption. When logic folding is performed, the leakage power is reduced, since much less interconnect, whose power grows with its amount, is required. As the folding level increases, the number of global interconnects does not increase much, while the extra routing complexity for intrastage communication remains small. Designers must choose techniques that satisfy application and product needs in order to balance the tradeoffs among power, delay and area. The reduced power consumption is attributed to the non-precharge scheme of the 10T SRAM cell. Comparing the 4T, 6T and 10T SRAM cells, each has its own characteristics with respect to area, delay and power consumption. It is observed that the delay can be reduced by fast access to the configuration bits in the memory block during reconfiguration, and that lower delay and power can be achieved at the cost of a slight area overhead.

TABLE I
Area, delay and power comparison between 4T, 6T and 10T SRAM cells

SRAM cell structure   Logic utilization   Delay      Power
4T SRAM               -                   9.049 ns   0.089 W
6T SRAM               -                   8.924 ns   0.081 W
10T SRAM              -                   8.035 ns   0.015 W

TABLE II
Area, delay and power trade-offs of the LB employing 10T SRAM cells in the memory block

SRAM cell   Area   Delay      Power
10T SRAM    239    8.979 ns   0.052 W
The memory area overhead increases from 10.6% when nano RAMs are used [2] to 26.3% when 10T SRAMs are used, which enables a delay-power tradeoff. However, we observe that when logic folding is performed, the power-delay product follows the same trend as the delay across folding levels, because the change in power is relatively small compared with the change in delay. The simulations were performed with Xilinx ISE Design Suite 14.2 for the Spartan-3E family, using the XC3S250E device in the FT256 package [13]. Table I compares the 4T, 6T and 10T SRAM cells that can be employed in the architecture. The results show that the delay and average power are lower for the 6T and 10T cells than for the 4T cell, and that the 10T SRAM cell reduces delay and power much further than the other two designs, at the cost of area overhead. The disadvantages of the 4T and 6T cells can therefore be overcome by using fast, low-power 10T SRAM cells in the LB of the proposed architecture, which helps reduce the delay during reconfiguration. Hence, employing the 10T SRAM cell in the proposed architecture provides efficient performance with good design flexibility.
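The relative improvements implied by Table I can be checked directly. The sketch below only re-derives percentages from the reported numbers; it adds no new measurements.

```python
TABLE_I = {  # values as reported in Table I
    "4T": {"delay_ns": 9.049, "power_w": 0.089},
    "6T": {"delay_ns": 8.924, "power_w": 0.081},
    "10T": {"delay_ns": 8.035, "power_w": 0.015},
}


def reduction_pct(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    for ref in ("4T", "6T"):
        for metric in ("delay_ns", "power_w"):
            pct = reduction_pct(TABLE_I[ref][metric], TABLE_I["10T"][metric])
            print(f"10T vs {ref}, {metric}: {pct:.1f}% lower")
```

The 10T cell's advantage is mainly in power (about 83% lower than 4T and 81.5% lower than 6T), while the delay improves by roughly 10-11%.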
Table II shows the area, delay and power trade-offs of the LB in the proposed FPGA architecture when every memory cell in the LB is a 10T SRAM cell. Comparing the parameters of the 4T, 6T and 10T SRAM cells, the 10T SRAM cell performs much better in reducing both delay and power consumption [9]. Because of its readout inverter, the 10T SRAM needs no precharge circuit, and because the shadow SRAM employs two SRAM cells, the delay of loading configuration bits during reconfiguration is much reduced. Since the 10T SRAM design avoids high switching activity on the memory read bitlines and thus saves most of the charge/precharge power, it is a promising candidate for wide on-chip memory in low-power applications [5]. Hence, we employ the 10T SRAM cell in every memory cell of the LB to evaluate the performance of the LB; the results are tabulated in Table II.
V. CONCLUSION
This paper presents a logic block of an FPGA architecture that combines CMOS logic and CMOS SRAM. The architecture uses a low-power 10T SRAM block as the storage element for the configuration bits. The non-precharge scheme of the 10T SRAM cell reduces power consumption and delay, since no precharge time is needed, at the cost of area overhead compared with the conventional 4T and 6T SRAM cells. When the memory access time is smaller than the computation delays, the configuration delays can be completely hidden by the shadow SRAM technique. The LB design employing the 10T SRAM cell was simulated using Xilinx ISE Design Suite 14.2 with the Spartan-3E family and the XC3S250E target device, and the power analysis for the 4T, 6T and 10T cells was obtained using the Tanner tool. Simulation results show that the performance of the LB is improved: power consumption is reduced from 0.089 W (4T) and 0.081 W (6T) to 0.015 W (10T), and the reconfiguration delay is reduced from 9.049 ns (4T) and 8.924 ns (6T) to 8.035 ns (10T). The architecture also allows various tradeoffs among area, delay and power consumption, providing good design flexibility.
In future work, the memory cells in the LB could be designed with a 12T SRAM cell to reduce the delay further, and implemented with the low-power C2MOS logic technique to reduce power consumption.

ACKNOWLEDGEMENT
The authors would like to thank the All India Council for Technical Education (AICTE), India, for financially supporting this work under grant 8023/BOR/RID/RPS-48/200910. The authors would also like to thank the Management and Principal of Sri Ramakrishna Engineering College, Coimbatore, for providing excellent computing facilities and encouragement.
REFERENCES
[1] Ting-Jung Lin, Wei Zhang, and Niraj K. Jha, "SRAM-Based NATURE: A Dynamically Reconfigurable FPGA Based on 10T Low-Power SRAMs," IEEE Trans. VLSI Systems, accepted for future inclusion in IEEE journal.
[2] I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, pp. 203-215, Feb. 2007.
[3] W. Zhang, N. K. Jha, and L. Shang, "A hybrid nano/CMOS dynamically reconfigurable system, Part I: Architecture," ACM J. Emerg. Technol. Comput. Syst., vol. 5, no. 4, pp. 16.1-16.30, Nov. 2009.
[4] A. DeHon, "Dynamically programmable gate arrays: A step toward increased computational density," in Proc. Canadian Wkshp. Field-Program. Devices, 1996, pp. 47-54.
[5] H. Noguchi, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, and M. Yoshimoto, "A 10T non-precharge two-port SRAM for 74% power reduction in video processing," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, 2007, pp. 107-112.
[6] Md. A. Khan, N. Miyamoto, R. Pantonial, K. Kotani, S. Sugawa, and T. Ohmi, "Improving multi-context execution speed on DRFPGAs," in Proc. IEEE Asian Solid-State Circuits Conf., 2006, pp. 275-278.
[7] S. Trimberger, D. Carberry, A. Johnson, and J. Wong, "A time-multiplexed FPGA," in Proc. IEEE Symp. FPGAs for Custom Comput. Mach., 1997, pp. 22-28.
[8] T. Fujii, K.-I. Furuta, M. Motomura, M. Nomura, M. Mizuno, K.-I. Anjo, K. Wakabayashi, Y. Hirota, Y.-E. Nakazawa, H. Ito, and M. Yamashina, "A dynamically reconfigurable logic engine with a multi-context/multi-mode unified-cell architecture," in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 364-365.
[9] Sapna Singh, Neha Arora, Meenakshi Suthar, and Neha Gupta, "Performance Evaluation of Different SRAM Cell Structures at Different Technologies," International Journal of VLSI Design & Communication Systems (VLSICS), vol. 3, 2012.
[10] W. Zhang, L. Shang, and N. K. Jha, "A hybrid nano/CMOS dynamically reconfigurable system, Part II: Design optimization flow," ACM J. Emerg. Technol. Comput. Syst., vol. 5, no. 3, pp. 13.1-13.31, Aug. 2009.
[11] G. Lemieux and D. Lewis, "Circuit design of routing switches," in Proc. Int. Symp. FPGA, 2002, pp. 19-28.
[12] Dr. Sanjay Sharma and Shyam Akashe, "High Density Four-Transistor SRAM Cell With Low Power Consumption," Int. J. Comp. Tech. Appl., vol. 2, pp. 1275-1282, 2011.
[13] Spartan-3E, ver. 3.4, Xilinx, 2006. [Online]. Available: http://direct.xilinx.com/bvdos/publications/ds312.pdf
[14] Narender Gujran and Praveen Kaushik, "A Comparative Study of 6T, 8T and 9T SRAM Cell," International Journal of Latest Trends in Engineering and Technology (IJLTET), vol. 1, 2012.

2012 2nd International Conference on Power, Control and Embedded Systems

Simulation and stability analysis of 6T and 9T SRAM cell in 45 nm era
Shyam Akashe, Nitesh Kumar Tiwari, Rajeev Sharma
Abstract- Advancement of technology greatly affects the leakage current and leakage power of the SRAM cell. Leakage current is the dominant factor in an SRAM cell and mainly determines its power consumption. This paper presents the design and evaluation of a new SRAM cell made of nine transistors (9T). The 9T SRAM cell achieves improvements in leakage current, power dissipation and read stability compared with the 6T SRAM cell for low-power operation. This paper compares the performance of two SRAM cell topologies, the conventional 6T cell and the 9T cell. In particular, the leakage current, leakage power and static noise margin (SNM) of each cell are examined. Compared with a conventional 6T SRAM cell, the proposed 9T SRAM cell reduces the power consumption by 62.45% and enhances the read stability by 43.37%.
Keywords: SRAM, leakage current, leakage power, static noise margin (SNM).

I. INTRODUCTION

SRAM stands for static random access memory. SRAM is volatile: it holds its data only as long as the power supply is on. Semiconductor memories, particularly SRAMs, are widely used in electronic systems [1-3]. A significant percentage of the total area and power of many digital chips is due to SRAMs, and for these chips SRAM leakage dominates the total chip leakage. Lowering the supply voltage (VDD) of SRAMs may reduce both leakage and switching power consumption [4]. SRAM is used in many kinds of portable devices and systems, and it plays an important role in modern mobile phones, microprocessors, microcontrollers, computers, etc. SRAM (static RAM) and DRAM (dynamic RAM) both hold data, but in different manners. DRAM must be refreshed periodically to retain its data, whereas SRAM needs no refresh, because its transistors continue to hold the data as long as the power supply is on. The additional circuitry and timing needed to refresh DRAM periodically make it slower and less desirable than SRAM. A further complication is the much higher power used by DRAM.
With the advantages of high speed and ease of use, static
random access memory (SRAM) has been widely used in
system-on-chips (SoC). According to the International
Technology Roadmap for Semiconductors (ITRS) forecast,
memory is going to occupy 90% of the SoC area by 2013

978-1-4673-1049-9/12/$31.00 2012 IEEE

[5]. At the same time, technology scaling reduces the size of transistors and SRAM bit cells, but it also makes it even more challenging to maintain a sufficient cell stability margin while keeping the same scaling pace of access time and cell size, because the threshold-voltage (Vt) mismatch between the cross-coupled inverter pairs becomes larger and larger [6-8]. Over the last few years, devices at 45 nm have been manufactured, and the 45 nm range is foreseen to be reached in the very near future. Advances in CMOS chip design have made it possible to design chips with high integration, fast performance, low power consumption, low leakage current and a large static noise margin (SNM). At low supply voltages, however, the SRAM cell becomes less stable, meaning a lower SNM, and its leakage current increases. When the conventional six-transistor SRAM cell operates at a low supply voltage, it shows poor stability, because its hold and read static noise margins are small. This paper proposes a 9T SRAM cell and examines its low power consumption and improved stability.
The remainder of this paper is organized as follows. Section 2 introduces the conventional 6T SRAM cell and the newly proposed 9T SRAM cell. Section 3 discusses the leakage current and leakage power of the SRAM cells. Section 4 covers the static noise margin. Section 5 presents the simulation results of the 6T and 9T SRAM cells in terms of leakage current, leakage power and static noise margin, and Section 6 concludes the paper.
II. CONVENTIONAL 6T SRAM CELL

The 6T SRAM cell is built from two PMOS transistors and four NMOS transistors. The conventional 6T SRAM cell shown in Fig.1 consists of two cross-coupled inverters and two access transistors, N3 and N4. The two inverters are connected back to back, and the two n-channel metal oxide semiconductor (NMOS) access transistors are used to access the data for either a read or a write operation [9]. The cross-coupled inverters store one bit of information at a time (either 0 or 1). To guarantee a non-destructive read operation in the 6T SRAM cell, the NMOS driver transistors N1 and N2 must be 1.5-2.5 times larger than the NMOS access transistors N3 and N4 [10].
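The sizing rule above is usually expressed as the cell (beta) ratio, the W/L of a driver transistor divided by the W/L of an access transistor. The sketch computes it and checks it against the lower end of the quoted 1.5-2.5 range; a larger ratio keeps the '0' node lower during a read at the cost of area. The device dimensions in the example are hypothetical.

```python
def cell_ratio(w_driver: float, l_driver: float,
               w_access: float, l_access: float) -> float:
    """Cell (beta) ratio: (W/L) of driver N1/N2 divided by (W/L) of access N3/N4."""
    return (w_driver / l_driver) / (w_access / l_access)


def meets_read_guideline(ratio: float, minimum: float = 1.5) -> bool:
    """True if the ratio reaches the lower end of the 1.5-2.5 range quoted above."""
    return ratio >= minimum


if __name__ == "__main__":
    # Hypothetical 45 nm sizing: L = 45 nm for all devices.
    cr = cell_ratio(w_driver=180e-9, l_driver=45e-9, w_access=90e-9, l_access=45e-9)
    print(f"cell ratio = {cr:.2f}, read-safe: {meets_read_guideline(cr)}")
```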
As shown in Fig.1, the 6T SRAM cell requires two p-channel metal oxide semiconductor (PMOS) transistors, P1 and P2, and four NMOS transistors, N1, N2, N3 and N4. Fig.2 shows the 6T SRAM cell write waveform, Fig.3 shows the conventional 6T SRAM cell for the read operation, and Fig.4 shows the 6T SRAM cell read waveform.
A conventional 6T SRAM cell performs write, read and standby operations. These operations are performed with the help of the bit-line pair (BL and BLB) and the word line (WL). When data are written, BL and BLB carry inverted values: when BL is high, BLB is low, and vice versa.

During a write operation, two signals are produced from the input on BL and BLB: BL is the input of N3 and BLB is the input of N4. If BL = 0, then BLB = 1, and if BL = 1, then BLB = 0. Simultaneously, the word line goes high, turning on the access transistors so that the data are written into the SRAM cell.

To perform a read operation on the 6T SRAM cell, both bit lines BL and BLB are precharged and the word line is driven high. A sense amplifier is used to sense the value stored in the cell. Since the cell is already in either state 1 or state 0, depending on what was written during the write operation, only one of the bit lines discharges toward ground, creating a voltage difference between the two lines. The sense amplifier senses this difference, and the stored bit is decoded and presented at its output.

Fig.1 Schematic of conventional 6T SRAM cell
Fig.2 6T SRAM cell write waveforms
Fig.3 Conventional 6T SRAM cell for read operation
Fig.4 6T SRAM cell read waveform

III. PROPOSED 9T SRAM CELL

The proposed 9T SRAM cell is introduced in this section. The schematic in Fig.5 shows the proposed 9T SRAM cell with transistor sizing in 45 nm CMOS technology [11]. The 9T SRAM cell is built from two PMOS transistors and seven NMOS transistors: as shown in Fig.5, it requires two PMOS transistors, P1 and P2, and seven NMOS transistors, N1, N2, N3, N4, N5, N6 and N7. Fig.6 shows the 9T SRAM cell write operation waveform, and Fig.7 shows the 9T SRAM cell read operation waveform.
A 9T SRAM cell has been proposed to improve the SNM [12]. The new 9T SRAM cell proposed here aims to achieve better stability, reduce the bit-line leakage problem and provide low leakage current during read and write operations, thus achieving lower power consumption than the conventional 6T SRAM cell. In the proposed 9T SRAM cell, bit-line leakage is significantly reduced by the stack effect when transistors N5 and N7 are off.

The left sub-circuit of the proposed 9T SRAM cell, formed by N1 to N4 and P1 to P2, is unchanged from the conventional 6T SRAM cell. The read SNM is maintained by retaining the write access circuit and adding a read buffer to the conventional 6T SRAM cell. For a write operation, the two write access transistors (N3 and N4) are controlled by the write word line (WR). The right sub-circuit, formed by N5 to N7, performs the read operation; transistor N7 is controlled by a separate read word line.

Fig.5 Proposed 9T SRAM cell
Fig.6 9T SRAM cell write operation
Fig.7 9T SRAM cell read waveform

During a write operation, the write word line goes high while transistor N7 is cut off, turning on the two write access transistors N3 and N4. To write a 0 to Node Vb, BL and BLB are discharged and charged, respectively, and the 0 is stored into the cell through N3. Conversely, to write a 0 to Node V, BL and BLB are charged and discharged, respectively, and the 0 is stored into Node V through N4.

During a read operation, the read word line goes high while WR is held low, activating N7. If Node Vb stores 1, the bit line BL is discharged through N5 and N7. Alternatively, if Node V stores 1, the complementary bit line BLB is discharged through N6 and N7. Since N3 and N4 are cut off, the storage nodes Node Vb and Node V are completely isolated from the bit lines during a read operation. Unlike in the 6T SRAM cell, the voltage of the node that stores 0 is strictly maintained at the ground level during a read with the proposed circuit technique. The read stability of the 9T SRAM cell is, therefore, enhanced compared with the standard 6T SRAM cell.
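The write and read sequences above can be captured in a logic-level sketch. It abstracts away all analog behaviour, assumes both bit lines are precharged high before a read (implied by the discharge description but not stated), and uses hypothetical names; its point is that a read only acts on the bit lines through N5-N7 and never changes the storage nodes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NineTCell:
    """Logic-level model of the 9T cell; v and vb are Node V and Node Vb."""
    v: int = 1
    vb: int = 0

    def write(self, bl: int, blb: int, wr: int) -> None:
        """WR high turns on N3/N4: Node Vb follows BL and Node V follows BLB."""
        if not wr:
            return  # write access transistors off: the state is held
        if bl == blb:
            raise ValueError("BL and BLB must be complementary during a write")
        self.vb, self.v = bl, blb

    def read(self, rwl: int) -> Tuple[int, int]:
        """Return (BL, BLB) after a read, both precharged high (assumed).

        With the read word line high, N7 conducts; BL is discharged through
        N5 if Node Vb stores 1, and BLB through N6 if Node V stores 1.
        The storage nodes are not modified.
        """
        bl, blb = 1, 1
        if rwl:
            if self.vb:
                bl = 0
            if self.v:
                blb = 0
        return bl, blb


if __name__ == "__main__":
    cell = NineTCell()
    cell.write(bl=0, blb=1, wr=1)   # write 0 to Node Vb via N3
    print(cell.read(rwl=1), cell)   # (1, 0) and the stored values are unchanged
```

In a 6T cell, by contrast, the read current flows through the access transistor into the storage node, which is what raises the '0' node and erodes the read margin.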

IV. LEAKAGE CURRENT AND LEAKAGE POWER

Leakage current is the current that continues to flow through a circuit even when it is not in an active state. It is calculated at the nodes of transistors that are turned off. It has been shown that leakage current can be reduced effectively by lowering the node voltages of the SRAM cell, for example the bit-line voltages, the word-line voltages and the transistor node voltages [13-14].
The leakage current of CMOS transistor consists of three
main components: junction tunneling leakage, sub threshold
leakage and Gate tunneling leakage.

Junction tunneling leakage: the reverse biased p-n junction


leakage has two main components; one is minority carriers
diffusion near the edge of the depletion region and other is
due to electron hole pair generation in the depletion region of
the reverse biased junction [15].
Sub threshold leakage: sub threshold leakage is the drainsource current of a transistor when the gate- source voltage is
less than the threshold voltage happens when the transistor is
operating in the weak inversion region. To reduce the sub
threshold leakage of an SRAM cell one can increase the
threshold voltage of all or some of the transistors in the cell.
Gate tunneling leakage: electrons (holes) tunneling from the bulk silicon through the gate oxide into the gate give rise to the gate tunneling current in NMOS (PMOS) transistors.
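The subthreshold remedy mentioned above (raising the threshold voltage) can be illustrated with the common textbook weak-inversion model I_sub = I_0 exp((V_GS - V_th)/(n V_T)) (1 - exp(-V_DS/V_T)), with V_T = kT/q. The Python sketch below is only illustrative: the prefactor I_0, the slope factor n = 1.5, the 300 K temperature, the 0.3 V baseline threshold, and V_DS = 0.7 V are assumptions, not values from this paper's 45 nm simulations. Because the prefactor cancels, only the ratio between two threshold voltages is meaningful.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C

def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE

def subthreshold_current(vgs: float, vth: float, vds: float,
                         i0: float = 1e-7, n: float = 1.5,
                         temp_k: float = 300.0) -> float:
    """Textbook weak-inversion drain current (all parameters illustrative)."""
    vt = thermal_voltage(temp_k)
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))

def leakage_ratio(delta_vth: float, n: float = 1.5, temp_k: float = 300.0) -> float:
    """Factor by which off-state leakage changes when Vth rises by delta_vth."""
    # Transistor held off (VGS = 0) with VDS at an assumed 0.7 V supply.
    off = dict(vgs=0.0, vds=0.7, n=n, temp_k=temp_k)
    base = subthreshold_current(vth=0.3, **off)      # assumed baseline Vth
    raised = subthreshold_current(vth=0.3 + delta_vth, **off)
    return raised / base

if __name__ == "__main__":
    print(f"leakage ratio for +100 mV Vth: {leakage_ratio(0.1):.4f}")
```

Under these assumptions a 100 mV increase in V_th lowers the off-state subthreshold current by roughly a factor of 13, consistent with the suggestion above to raise the threshold voltage of some or all cell transistors.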
Leakage power: leakage power dissipation is roughly proportional to the area of the circuit, and it is expected to become a significant fraction of the overall chip power dissipation in nanometer CMOS design processes. In many processors, caches occupy about 50% of the chip area, so cache leakage power is one of the major sources of power consumption in high-performance microprocessors.

The stability of an SRAM cell is usually defined by the SNM, the maximum DC noise voltage that the cell can tolerate without changing the stored bit [16]. The SNM, which affects both the read and the write margins, depends on the threshold voltages of the PMOS and NMOS devices in the SRAM cell [18].
A. Read static noise margin (RSNM)
Read static noise margin is a measure of the voltage required at the node storing 0 to flip the state of an SRAM cell during a read cycle. Fig. 12 and Fig. 13 show the RSNM waveforms of the 6T and the proposed 9T SRAM cells, respectively. The 9T RSNM waveform shows a 29.88% improvement in read static noise margin compared with the RSNM of the 6T SRAM cell. These curves are obtained from the test schematic. The static noise margin (SNM) is the standard metric for the read stability of an SRAM cell, and it can be measured graphically from the butterfly graph, which plots the voltage transfer characteristics of the two cross-coupled inverters of the SRAM cell [17].
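The graphical butterfly measurement can be sketched numerically following the 45-degree diagonal construction of Seevinck et al. [16]. For each line of slope 1 that crosses a lobe, its intersections with the curve y = f1(x) and the mirrored curve x = f2(y) are opposite corners of an axis-aligned square, so the square's side is their horizontal offset; the largest such side in each lobe is that lobe's margin, and the SNM is the smaller of the two. The tanh-shaped transfer curves, gain values, normalized 1 V supply, and all function names below are hypothetical choices made only to demonstrate the method; they do not model the 6T or 9T cells of this paper.

```python
import math
from typing import Callable

Curve = Callable[[float], float]

def make_vtc(gain: float, vdd: float = 1.0) -> Curve:
    """Hypothetical smooth inverter transfer curve (output falls from VDD to 0)."""
    def vtc(vin: float) -> float:
        return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - 0.5 * vdd) / vdd))
    return vtc

def _root_increasing(g: Callable[[float], float], lo: float, hi: float,
                     iters: int = 60) -> float:
    """Bisection for the zero of a monotonically increasing function on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def square_side(f1: Curve, f2: Curve, c: float, vdd: float) -> float:
    """Side of the square whose opposite corners lie on the line y = x + c.

    Curve A is y = f1(x); curve B is the mirrored curve x = f2(y).
    c > 0 probes the upper-left lobe, c < 0 the lower-right lobe.
    A negative result means the line does not cross an open lobe.
    """
    # Corner on A: f1(x) = x + c, i.e. x - f1(x) + c = 0 (increasing in x).
    x_a = _root_increasing(lambda x: x - f1(x) + c, 0.0, vdd)
    # Corner on B: y = f2(y) + c, i.e. y - f2(y) - c = 0 (increasing in y).
    y_b = _root_increasing(lambda y: y - f2(y) - c, 0.0, vdd)
    x_b = f2(y_b)
    # Both corners lie on the same slope-1 line, so the horizontal offset is the side.
    return x_a - x_b if c > 0 else x_b - x_a

def butterfly_snm(f1: Curve, f2: Curve, vdd: float = 1.0, steps: int = 400) -> float:
    """SNM as the smaller of the two lobes' largest embedded squares."""
    # Diagonal offsets up to 0.9*VDD, assumed wide enough to span both lobes.
    offsets = [0.9 * vdd * k / steps for k in range(1, steps + 1)]
    upper = max(square_side(f1, f2, c, vdd) for c in offsets)
    lower = max(square_side(f1, f2, -c, vdd) for c in offsets)
    return max(0.0, min(upper, lower))
```

With two identical gain-10 curves the result is about 0.303 on the normalized 1 V scale. Lowering the gain flattens the curves and shrinks both lobes; below a gain of 2 (transfer-slope magnitude below 1 everywhere in this model) the cell is monostable and the function returns 0.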

The read SNM deteriorates as the supply voltage decreases [18-19] and as transistor mismatch increases. This mismatch arises from variations in the physical parameters of identically designed devices, namely their body factor, current factor, and threshold voltage. As the supply voltage is lowered beyond a certain limit, the static noise margin decreases and the overall delay of the SRAM cell increases. Moreover, read operations at low voltage can destroy the data stored in the SRAM [14].
As shown in the figure, NMH and NML are defined by equations (1) and (2), respectively:

$$\mathrm{NMH} = V_{OH} - V_{IH} \qquad (1)$$

$$\mathrm{NML} = V_{IL} - V_{OL} \qquad (2)$$

Fig.8 leakage current waveform of 6T SRAM

Here, VIL is the maximum input voltage recognized as a logical 0, VIH is the minimum input voltage recognized as a logical 1, VOL is the maximum logical-0 output voltage, and VOH is the minimum logical-1 output voltage. The SNM can then be expressed as

$$\mathrm{SNM} = \sqrt{\mathrm{NMH}^2 + \mathrm{NML}^2} \qquad (3)$$

Fig.9 leakage current waveform of 9T SRAM cell

V. STATIC NOISE MARGIN
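Equations (1) to (3) can be evaluated once VIL, VIH, VOL, and VOH are extracted from an inverter transfer curve. The Python sketch below assumes the common convention that VIL and VIH are the unity-gain points (dVout/dVin = -1), with VOH and VOL the outputs at those inputs, and it reads Eq. (3) as the root-sum-square of NMH and NML. The tanh-shaped curve, the 1 V supply, and the gain of 10 are illustrative assumptions, not data from this paper.

```python
import math

VDD = 1.0    # normalized supply voltage (assumption)
GAIN = 10.0  # steepness of the hypothetical transfer curve (assumption)

def vtc(vin: float) -> float:
    """Hypothetical inverter transfer curve: VDD for low input, 0 for high input."""
    return 0.5 * VDD * (1.0 - math.tanh(GAIN * (vin - 0.5 * VDD) / VDD))

def slope(vin: float, h: float = 1e-6) -> float:
    """Central-difference estimate of dVout/dVin."""
    return (vtc(vin + h) - vtc(vin - h)) / (2.0 * h)

def bisect_root(g, lo: float, hi: float, iters: int = 60) -> float:
    """Bisection for a root of g on [lo, hi]; g(lo) and g(hi) must differ in sign."""
    lo_negative = g(lo) < 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) < 0.0) == lo_negative:
            lo = mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return 0.5 * (lo + hi)

def noise_margins():
    """Return (NMH, NML, SNM) using the unity-gain convention and Eq. (3)."""
    unity = lambda v: slope(v) + 1.0          # zero where dVout/dVin = -1
    vil = bisect_root(unity, 0.0, 0.5 * VDD)  # low-side unity-gain point
    vih = bisect_root(unity, 0.5 * VDD, VDD)  # high-side unity-gain point
    voh, vol = vtc(vil), vtc(vih)
    nmh = voh - vih                           # Eq. (1)
    nml = vil - vol                           # Eq. (2)
    snm = math.sqrt(nmh ** 2 + nml ** 2)      # Eq. (3), read as root-sum-square
    return nmh, nml, snm
```

For this symmetric curve NMH = NML ≈ 0.303 V and Eq. (3) gives about 0.428 V, i.e. √2 times either margin. That is larger than the side of the largest square that fits in the butterfly lobes of the same curve, so SNM values from Eq. (3) and from the graphical butterfly method are not directly interchangeable.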

As shown in Fig. 10 and Fig. 11, the SNM of the 6T SRAM cell during the read operation is 415 mV, whereas that of the 9T SRAM cell is 595 mV, a 43.37% improvement for the 9T SRAM cell.

Table (1). Leakage Current and Leakage Power of 6T and 9T SRAM cell

SRAM Cell   Leakage Current (pA)   Leakage Power (pW)
6T          13.31                  4.82
9T          4.57                   1.81

Fig.10 RSNM of 6T SRAM cell

Table (2). Read SNM of 6T and 9T SRAM cell.

SRAM   6T       9T       % Increment
RSNM   415 mV   595 mV   43.37%

Table (2) shows that the RSNM is 415 mV for the 6T cell and 595 mV for the 9T cell. Hence, the proposed 9T SRAM cell improves read stability by 43.37% compared with the conventional 6T SRAM cell. These simulation results show the better stability, lower leakage, and robustness of the 9T cell.
CONCLUSION

Fig.11 RSNM of 9T SRAM cell

B. Write static noise margin (WSNM)


WSNM characterizes the write stability of an SRAM cell during a write operation. It measures the ability of the cell to pull the node storing 1 down to a voltage below the switching threshold of the inverter storing 0. In other words, WSNM is measured while the stored 1 is being overwritten.
VI. SIMULATION RESULTS

The 6T and 9T SRAM cells were simulated with the Cadence simulation tool in 45 nm CMOS technology. As shown in Table (1), the leakage current of the conventional 6T SRAM cell is 13.31 pA, whereas that of the proposed 9T SRAM cell decreases to 4.57 pA. Similarly, the leakage power decreases from 4.82 pW for the conventional 6T SRAM cell to 1.81 pW for the 9T SRAM cell. The simulation results in Table (1) therefore show that the proposed 9T SRAM cell reduces leakage power consumption by 62.45%.
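The percentages quoted above follow directly from Tables (1) and (2) as relative changes with respect to the 6T baseline. The short Python check below reproduces them; the helper name and dictionary layout are illustrative only.

```python
def percent_change(baseline: float, proposed: float) -> float:
    """Relative change of `proposed` with respect to `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0

# Values transcribed from Table (1) and Table (2): (6T baseline, proposed 9T).
TABLE_VALUES = {
    "leakage current (pA)": (13.31, 4.57),
    "leakage power (pW)": (4.82, 1.81),
    "read SNM (mV)": (415.0, 595.0),
}

if __name__ == "__main__":
    for metric, (six_t, nine_t) in TABLE_VALUES.items():
        print(f"{metric}: {percent_change(six_t, nine_t):+.2f}%")
```

Running it prints -65.66% for leakage current, -62.45% for leakage power, and +43.37% for read SNM, matching the 62.45% and 43.37% figures in the text.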

In this paper, a new 9T SRAM cell has been proposed to achieve better read stability, mitigate the bit-line leakage problem, and provide low leakage current, thus achieving lower power consumption than the conventional 6T SRAM cell. Simulation results in 45 nm CMOS technology show that the proposed 9T SRAM cell achieves about 62.45% power saving and improves read stability by 43.37% compared with the conventional 6T SRAM cell. An optimal supply voltage of 0.7 V is chosen for the proposed 9T SRAM cell in 45 nm technology, taking into consideration all three requirements of stability, leakage current, and power consumption. At this supply voltage, the proposed 9T SRAM cell achieves better performance in 45 nm CMOS technology.
ACKNOWLEDGEMENT
This work was supported by ITM University Gwalior in collaboration with Cadence System Design, Bangalore.
REFERENCES
[1]. B. Prince, Semiconductor Memories. New York: Wiley, 1991.
[2]. K. Takeda, Y. Aimoto, N. Nakamura, H. Toyoshima, T. Iwasaki, K. Noda, K. Matsui, S. Itoh, S. Masuoka, T. Horiushi, A. Nakagawa, K. Shimogawa, and H. Takahashi, "A 16-Mb 400-MHz loadless CMOS four-transistor SRAM macro," IEEE J. Solid-State Circuits, vol. 35, pp. 1631-1640, Nov. 2000.
[3]. S.-M. Yoo, J. M. Han, E. Hag, S. S. Yoon, S.-J. Jeong, B. C. Kim, J.-H. Lee, T.-S. Jang, H. D. Kim, C. J. Park, D. H. Seo, C. S. Choi, S.-I. Cho, and C. G. Hwang, "A 256M DRAM with simplified register control for low power self refresh and rapid burn-in," in Symp. VLSI Circuits Dig. Tech. Papers, 1994, pp. 85-86.
[4]. B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," IEEE J. Solid-State Circuits, vol. 42, no. 3, pp. 680-688, Mar. 2007.
[5]. International Technology Roadmap for Semiconductors, 2005. WWW.ITRS.NET/LINKS/2005ITRS/HOME2005.HTM
[6]. E. Grossar, M. Stucchi, and K. Maex, "Read stability and write-ability analysis of SRAM cells for nanometer technologies," IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 2577-2588, Nov. 2006.
[7]. B. H. Calhoun and A. P. Chandrakasan, "Static noise margin variation for sub-threshold SRAM in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 41, no. 7, pp. 1673-1679, Jan. 2006.
[8]. Y. Chung and S.-H. Song, "Implementation of low-voltage static RAM with enhanced data stability and circuit speed," Microelectronics Journal, vol. 40, no. 6, pp. 944-951, June 2009.
[9]. J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2003.
[10]. A. Pavlov and M. Sachdev, CMOS SRAM Circuit Design and Parametric Test in Nano-Scaled Technologies. Springer, 2008.
[11]. Z. Liu and V. Kursun, "Characterization of a novel nine-transistor SRAM cell," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 4, pp. 488-492, Apr. 2008.
[12]. S. Lin, Y.-B. Kim, and F. Lombardi, "A low leakage 9T SRAM cell for ultra-low power operation," in ACM Great Lakes Symp. VLSI, May 2008, pp. 123-126.
[13]. A. Chandrakasan, W. J. Bowhill, and F. Fox, Design of High-Performance Microprocessor Circuits. IEEE Press, 2000.
[14]. O. Thomas, "Impact of CMOS technology scaling on SRAM standby leakage reduction techniques," in ICICDT, May 2006.
[15]. V. De et al., "Techniques for leakage power reduction," in Design of High-Performance Microprocessor Circuits, A. Chandrakasan, W. J. Bowhill, and F. Fox, Eds. Piscataway, NJ: IEEE, 2001, pp. 285-308.
[16]. E. Seevinck, F. J. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," IEEE J. Solid-State Circuits, vol. 22, no. 5, pp. 748-754, Oct. 1987.
[17]. J. Yang and L. Chen, "A new loadless 4-transistor SRAM cell with a 0.18 µm CMOS technology," in Canadian Conf. Electrical and Computer Engineering (CCECE 2007), 2007, pp. 538-541.
[18]. A. P. Chandrakasan, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, pp. 473-484, Apr. 1992.
[19]. A. J. Bhavnagarwala, "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658-665, Apr. 2001.

Relocatable and Resizable SRAM Synthesis for Via Configurable Structured ASIC
Hsin-Hung Liu, Rung-Bin Lin¹, I-Lun Tseng¹
Department of Computer Science and Engineering
Yuan Ze University
Chung-Li, Taiwan
¹E-mail: csrlin@cs.yzu.edu.tw, iltseng@saturn.yzu.edu.tw
memory block realization using six-transistor (6T) memory
cells. Nevertheless, we still adapt a logic-oriented VCLB to
realize an SRAM block.
A typical ASIC may contain a varying number of
memory blocks of various sizes. The memory blocks,
mainly SRAMs, can be placed at any legal locations. They
can be used for cache memories, FIFOs, stacks, etc. They
are customized for area, power, and performance
optimization using a highly crafted SRAM cell and
peripheral circuits. In contrast, a typical structured ASIC can have only a fixed number of pre-diffused memory blocks whose locations and sizes cannot be changed, as shown in Figure 1(a); i.e., they are neither relocatable nor resizable. Figure 1(b) shows another implementation of a structured ASIC memory block, which employs a customized VCLB different from the one used for realizing logic blocks. If these VCLB instances are not used for a memory block, they may still be configured into logic blocks; otherwise, they are wasted. The implementations in both Figure 1(a) and 1(b) face the same problem of having to determine the number of memory blocks, their sizes, and their locations beforehand. Hence, a designer may encounter insufficient memory blocks, improper memory block sizes and locations, etc. These limitations severely discourage the adoption of structured ASICs. Clearly, they must be removed to make structured ASIC a prevailing design technology.

Abstract
Memory blocks in a structured ASIC are normally pre-customized with fixed sizes and placed at predefined locations. The number of memory blocks is also predetermined. This imposes a stringent limitation on the use of memory blocks, often creating a situation of either insufficient capacity or considerable waste. To remove this limitation, in this paper we propose a method to create relocatable and resizable SRAM blocks using the same via-configurable logic block to implement both logic gates and 6T SRAM cells. We develop an SRAM compiler to synthesize SRAM blocks of this sort. Our single-port SRAM array uses only 1/3 the area taken by a flip-flop based SRAM array; for dual-port SRAM arrays, this ratio is 2/3. We demonstrate for the first time the feasibility of deploying a varying number of relocatable and resizable SRAM blocks on a structured ASIC.

Keywords
Structured ASIC, SRAM, via configurable, regular fabric

1. Introduction
Structured ASIC is a design style that can bridge the
performance, power, area, and design cost gaps between
ASIC and FPGA. It contains arrays of via-configurable logic
blocks (VCLB) with prefabricated transistors and possibly
arrays of routing fabric blocks each formed by predefined
yet via-configurable metal wires [1-4]. A via-configurable
structured ASIC technology employs only customizable via
layers [5-9] and thus has a much lower non-recurring
engineering (NRE) cost than standard ASIC. Its regular
layout structure also enables higher and more predictable
manufacturing yields. As technology scales, layout
regularity becomes even more important. It is encouraging
to see that Jhaveri et al. [10-11] are able to achieve
comparable timing and area utilization for an ARM926EJ
implementation exploiting only a small set of layout
primitives for forming regular layouts, with respect to an
implementation based on a commercial 65-nm standard cell
library.
Structured ASIC research mainly focuses on exploring VCLB architectures; only a few works address routing architecture [6,8,9]. VCLBs are usually handcrafted to enable standard-cell-like designs that leverage existing standard ASIC design tools [7-8]. Although significant progress in VCLB architecture design has been made so far, VCLBs are mainly optimized for logic block realization rather than memory block implementation. To the best of our knowledge, we are the first to design a VCLB for
978-1-4673-4953-6/13/$31.00 ©2013 IEEE


14th Int'l Symposium on Quality Electronic Design

A solution to the above problem is to use the same


VCLB to realize both logic and memory blocks as shown in
Figure 1(c). Clearly, a structured ASIC formed by VCLB
instances of this sort can have a varying number of memory
blocks which are relocatable and resizable even though the
structured ASIC still has a pre-diffused substrate. Here, we
have three memory blocks, each of which can be moved
around within the chip and resized as needed. However, as can be expected, the timing performance, power dissipation, and area of such a structured ASIC will be compromised in exchange for this design freedom. Nevertheless, the design
freedom brought about by employing a single VCLB is a
valuable asset to designers. Hence, in this paper we propose
a method to create relocatable and resizable SRAM blocks
for via-configurable structured ASIC. Two approaches are
possible. One is to assemble flip-flops into a memory block.
This approach is simple but will use more area. A few
patents based on this approach have been granted [12,13].
The other is to configure VCLB instances into an array of
six-transistor (6T) SRAM bits. This approach is more
complicated but needs less area. Our work adopts this
approach. We re-engineer the VCLB in [7] to create a
VCLB which can also be configured into a 6T SRAM cell.
We implement an SRAM compiler to generate SRAM
blocks of this sort. We test our methodology using an
industrial design flow. Based on TSMC 0.18um process
technology, our single-port SRAM arrays use only 1/3 the
area needed by flip-flop based SRAM arrays. This area ratio
is 2/3 for dual-port SRAM arrays. Our single-port SRAM
blocks, from small to large ones, have an area of 1.5 to 14.4 times the area of their non-structured ASIC counterparts generated by ARM's SRAM compiler. They have an access time of 1.46 to 2.25, a read internal power of 0.37 to 3.29, and a write internal power of 0.5 to 3.05 times that of their non-structured ASIC counterparts. These performance figures may not seem outstanding when compared with those of state-of-the-art SRAMs. However, given that the ratio of our structured ASIC memory cell's area to a non-structured ASIC memory cell's area is 18.8, our SRAM blocks are of good quality. In particular, our small SRAM blocks achieve good area usage, timing performance, and power dissipation. Achieving the above results involves meticulous circuit design and layout planning within a VCLB, among VCLBs, and within the whole chip. Our main contributions are as follows:
• Design a VCLB for implementing logic blocks, all the peripheral circuits (including sense amplifier, tri-state output, etc.), and 6T memory cells for SRAM blocks.
• Develop an SRAM compiler to generate SRAM blocks using our VCLB and predefined routing fabrics.
• Demonstrate for the first time the feasibility of deploying a varying number of relocatable and resizable SRAM blocks on a structured ASIC by implementing an SoC platform based on the OR1200 32-bit CPU from OpenCores.
The rest of this article is organized as follows. Section II briefly introduces some basics of SRAM and standard-cell-like structured ASIC. Section III describes how to design a relocatable and resizable SRAM block. Section IV presents our SRAM compiler. Section V gives experimental results. The last section draws conclusions.

2. Preliminary
2.1. VCLB for standard-cell-like structured ASIC
In this work we will employ a standard-cell-like
structured ASIC technology presented in [7]. Such a
technology is featured by having a VCLB layout similar to
that of conventional standard cells. The work in [7] presents
a VCLB with 5 pairs of P/N transistors laid over three
diffusion strips. We can use vias between Metal 1 (M1 for
short) and Metal 2 (M2 for short) to configure the VCLB
into various logic gates. We can also abut several VCLB
instances to realize a more complex logic gate or a flip-flop.
However, this VCLB, without any modification, cannot be
used directly to implement a 6T memory cell. Besides a
VCLB, a via-configurable structured ASIC has a predefined
yet via-configurable routing fabric for connecting logic
gates together. We need a structured ASIC router [8]
specifically designed to deal with a predefined routing fabric.
In this work, we use the routing fabric in [8] for our SRAM
implementation. This routing fabric contains repetitive wire
segments on M3 through M5.

2.2. Single and dual-port SRAM blocks


A synchronous single-port SRAM block consists of a
memory array, bit line conditioning circuitry, row decoder,
column multiplexer, column circuitry, clock generator, input
register, and data output tri-state buffer [14]. Figure 2 shows
an m×N single-port SRAM block. A word line from the row
decoder connects all the memory cells in a row and a bit line
to the column multiplexer connects all the cells in a column.
The column circuitry has write drivers and sense amplifiers.
A dual-port SRAM block has two read ports per memory
cell. Its structure is similar to that of a single-port SRAM
block. The main difference is that a dual-port SRAM block
can simultaneously read two memory words at different
locations. Hence, each port has its own row decoder, column
multiplexer, column circuitry, I/O buffer, and clock
generator.

Figure 2: A synchronous single-port SRAM block

3. Relocatable and Resizable SRAM Block Design


The key to enabling a relocatable and resizable SRAM
block for via-configurable structured ASIC is to use the
same VCLB for implementing logic blocks and memory
blocks. In this section we will present such a VCLB and
show how to use it to implement single-port and dual-port SRAM blocks.

difficult due to the lack of short M2 jumpers on the top and bottom boundaries. Hence, we develop a special router for routing signal nets on a predefined routing fabric.

3.1. A VCLB for SRAM cell


As mentioned earlier, the VCLB in [7] cannot be used
directly to implement a 6T memory cell. However, we can
modify it for this purpose. First, we increase the VCLB size
to provide more horizontal and vertical tracks for
implementing more complex cells such as a scan flip-flop.
The sizes of transistors in the VCLB are increased accordingly. This VCLB, shown in Figure 3, is called S-VCLB. Since S-VCLB was originally designed for implementing logic gates and flip-flops, its transistor sizes are not suitable for a workable 6T memory cell. Hence, we modify its transistor sizes to form a new VCLB that can implement a workable 6T memory cell. The new VCLB, called M-VCLB, is shown in Figure 4. Currently, S-VCLB and M-VCLB are both implemented in TSMC 0.18um technology; owing to their layout regularity and simplicity, they can easily be migrated to more advanced technology nodes. We did not design a completely new VCLB because we wanted to see whether a VCLB tailored for realizing logic gates could be successfully tuned to implement a 6T memory cell. Note that all the logic gates implemented using S-VCLB can also be implemented using M-VCLB, albeit with degraded performance. Hence, we use M-VCLB to implement not only a 6T memory cell but also the various logic gates that realize the peripheral circuits and all the logic elements outside an SRAM block. In other words, M-VCLB enables the realization of a relocatable and resizable SRAM block. Note that the layout of M-VCLB also supports a standard-cell design style.

Figure 3: S-VCLB with 10 transistors designed for realizing


logic gates and flip-flops. Vertical wires on different layers
are stacked but shown here with a slight displacement for
visibility. The short Metal 2 wires (jumper wires) on the left
and right boundaries are used for joining signals between
two adjacent VCLBs which are used to implement more
complex logic gates or flip-flops.

3.2. Implementation of single-port SRAM block


Here we describe how to configure M-VCLB into a 6T
memory cell and various kinds of pitch-matched peripheral
circuits. We then show how to assemble memory cells into
an SRAM array, together with pitch-matched peripheral
circuits to form a single-port SRAM block.
M-cell and SRAM array: We can configure M-VCLB to
realize a 6T memory cell in Figure 5 using some vias
between M1 and M2. This is shown in Figure 6. We call this
memory cell M-cell. We waste four transistors per M-cell,
two at the upper right and two in the middle of the diffusions
on the left. Forming an SRAM array requires seamlessly connecting the bit lines and word lines of the M-cells in the same column and the same row, respectively. We can use the two horizontal M2 lines shown in Figure 6 to form a wider word line. The word lines of neighboring M-cells are connected automatically via the jumper wires on the left and right boundaries, so word line routing is simple. The two bit lines, bit and bitb, run vertically on the right part of the cell. Two vias are used to connect the upper and lower segments of a bit line together. However, joining the bit lines in the vertically stacked M-cells is more

Figure 4: M-VCLB tailored from S-VCLB for forming an


SRAM cell.

Figure 5: A 6T memory cell.

Figure 6: Stick diagram of a 6T memory cell (M-cell).


Pitch-matched peripheral circuits: The major issue in
designing a peripheral circuit using M-VCLB is pitch-matching the circuit's layout with an SRAM array. In this
work we have implemented all the peripheral circuits with
pitch-matching. For example, Figure 7 shows a 4:1 column
multiplexer whose inputs (four pairs of bit lines) come from
four columns of M-cells. We use four VCLBs lined
horizontally to implement this multiplexer. The layout of the
four VCLBs is pitch-matched with the four M-cell columns.
A column circuitry contains a write driver and a current
mirror sense amplifier [15-17]. Both circuits each need two
M-VCLBs joined horizontally. The large-sized transistors in
M-VCLB are used here to build a sense amplifier. The
layout of a 1-bit column circuitry containing four M-VCLBs
is also pitch-matched with the four M-cell columns.
However, we must place the column circuitry below the
column multiplexer to smooth bit-line routing and signal
flow. As we will see later, we perform placement to
assemble all the circuits together and perform routing for
completing an SRAM block design. Figure 8 shows the
layout of a single-port SRAM64X4_Mux4.

access. We use hierarchical bit lines to mitigate this problem


[18-19]. We chop bit lines into global bit lines and local bit
lines. We divide a memory array into multiple banks. A
local bit line is located inside a bank and connected to a
global bit line via a transmission gate. A bank selection
signal generated by a bank decoder determines which bank
should be accessed. To improve bit-line pre-charging speed,
we also employ a local bit line conditioning circuitry. In our
SRAM blocks, 16 M-cells are connected to a local bit line.
A memory array has at most eight banks.
A large SRAM block has large word line capacitance.
Hence, we divide a word line into global word lines and
local word lines [20]. A local word line is connected to a
global word line via a buffer. Since loading on each word
line is reduced, access time is improved. In our SRAM
blocks, 16 memory cells are connected to a local word line.
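The bank and word-line segmentation described above reduces to simple index arithmetic. The Python sketch below takes the figures stated in the text (16 M-cells per local bit line, 16 cells per local word line, at most eight banks) and additionally assumes that each bank holds exactly one local bit line's worth of rows and that rows and columns are numbered from zero; the function names are hypothetical and not part of the authors' compiler.

```python
import math

CELLS_PER_LOCAL_BIT_LINE = 16   # from the text: 16 M-cells share a local bit line
CELLS_PER_LOCAL_WORD_LINE = 16  # from the text: 16 cells share a local word line
MAX_BANKS = 8                   # from the text: at most eight banks per array

def partition(rows: int, cols: int) -> tuple:
    """Return (banks, local word-line segments per row) for a rows x cols array.

    Assumes each bank holds exactly one local bit line's worth of rows.
    """
    banks = math.ceil(rows / CELLS_PER_LOCAL_BIT_LINE)
    if banks > MAX_BANKS:
        raise ValueError(f"{rows} rows would need {banks} banks (max {MAX_BANKS})")
    segments = math.ceil(cols / CELLS_PER_LOCAL_WORD_LINE)
    return banks, segments

def locate(row: int, col: int) -> tuple:
    """Map a cell to (bank, row within bank, local word-line segment)."""
    return (row // CELLS_PER_LOCAL_BIT_LINE,
            row % CELLS_PER_LOCAL_BIT_LINE,
            col // CELLS_PER_LOCAL_WORD_LINE)
```

Under these assumptions a 128-row by 256-column array fills exactly eight banks with 16 local word-line segments per row, while an array with more than 128 rows would exceed the eight-bank limit.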

Figure 7: 4:1 Column multiplexer with pre-decoding.

3.3. Implementation of dual-port SRAM block


To support two read ports, a dual-port memory cell as
shown in Figure 9 can be used. It has two word lines and
two bit-line pairs. Although a dual-port memory cell needs
only eight transistors, a single M-VCLB instance cannot
implement a dual-port memory cell. We could certainly use
two M-VCLBs to implement one dual-port memory bit, but
this would waste too many transistors. Hence, we use three
M-VCLBs to implement two dual-port memory bits.
In addition, a dual-port SRAM block needs two row decoders, two column multiplexers, two column circuitries, and two clock generators, as well as additional logic for controlling access to the second port. Similarly, the layouts of the peripheral circuits should be designed to match the pitch of a dual-port SRAM array. Figure 10 shows the layout of a dual-port SRAM64X4_Mux4.

Figure 8: Layout of SRAM64X4_Mux4.

3.4. Performance enhancement


As the size of an SRAM block becomes large, bit-line
capacitance also grows fast. Large bit-line capacitance
increases power consumption and slows down memory

Figure 9: A dual-port memory cell with eight transistors.

Figure 10: Layout of a dual-port SRAM64X4_Mux4.

4. SRAM Compiler for Structured ASIC


Our SRAM compiler can generate single-port and dual-port SRAM blocks. It has two major components [18,21,22].
One is the layout generator for producing a GDSII file of an
SRAM block. The other is the SPICE netlist generator that
creates a SPICE netlist for LVS check. The inputs to our
SRAM compiler include design specification such as
number of words, number of bits, multiplexer width,
distance between two vertical power/ground straps, etc.
Multiplexer width specifies the number of bit lines
multiplexed into a data line. The width can be 4, 8 or 16.
The one shown in Figure 7 has a width of 4. Besides, inputs
also include a technology file, GDSII and LEF files of leaf
cells, routing fabrics, etc. Since SPICE netlist generation is a
simple task, here we only present the layout generator.
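The role of the multiplexer width can be illustrated with the usual column-folding organization, in which a block of W words by B bits with multiplexer width M is laid out as W/M rows by B·M columns, and the low-order address bits drive the column multiplexer. This folding and the address split are assumptions made for illustration (the text does not spell out the compiler's exact mapping), and the function names are hypothetical.

```python
ALLOWED_MUX_WIDTHS = (4, 8, 16)  # from the text: the width can be 4, 8, or 16

def array_shape(words: int, bits: int, mux: int) -> tuple:
    """Rows and columns of the cell array under the assumed column folding."""
    if mux not in ALLOWED_MUX_WIDTHS:
        raise ValueError(f"multiplexer width must be one of {ALLOWED_MUX_WIDTHS}")
    if words % mux:
        raise ValueError("number of words must be a multiple of the multiplexer width")
    return words // mux, bits * mux

def decode(address: int, words: int, mux: int) -> tuple:
    """Split a word address into (row, column select) under the assumed folding."""
    if not 0 <= address < words:
        raise ValueError("address out of range")
    return address // mux, address % mux
```

Under these assumptions a 64-word by 4-bit block with width 4, as in SRAM64X4_Mux4, becomes a 16 by 16 cell array, and word address 37 maps to row 9 and column select 1.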
As shown in Figure 11, our layout generator has a placement phase and a routing phase. The placement phase assembles leaf cells and determines their locations to form specific logic blocks and the memory array. Leaf cells such as a 1-bit memory cell, 1-bit bit-line conditioning circuitry, and 1-bit sense amplifier are the basic circuit elements for forming an SRAM block. Each leaf cell is implemented using the same M-VCLB. Peripheral circuits are placed with their pitch matched to the memory array. The output of the placement phase is a DEF file, which becomes an input to the routing phase. In the output DEF file, we assign each net a weight ranging from 0 to 10; the larger the weight, the more important the net. Hence, we assign the largest weight to bit lines and word lines.

Figure 11: SRAM layout generator.

The inputs to the routing phase include a DEF file from the placement phase, a LEF file, and a routing fabric. Routing an SRAM block is different from routing a common circuit. We develop a router that performs routing on a predefined routing fabric. Nets with larger weights are routed first, so word lines and bit lines get routed while more routing resources are still available. We route each two-pin net using Dijkstra's shortest path algorithm. If a net is a multi-pin net, we use FLUTE [23] to decompose it into several two-pin nets, i.e., forming a Steiner tree. The wire closest to a Steiner point is designated as a Steiner wire, and the connections originally routed to the Steiner point are instead routed to the Steiner wire. After all the nets are routed, we output a GDSII file. Finally, we obtain the layout of an SRAM block by combining the GDSII file generated by placement and the one generated by routing.
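The weight-ordered, Dijkstra-based routing step can be sketched on a toy model. In the Python sketch below, a small grid graph stands in for the predefined M3-M5 routing fabric; unit cost per wire segment, one net per grid point, the (name, weight, source, target) net format, and all function names are assumptions for illustration, not the authors' router. Multi-pin nets are assumed to have been decomposed into two-pin nets beforehand (the text uses FLUTE [23] for that), and rip-up and reroute is not modeled.

```python
import heapq
from typing import Dict, Iterator, List, Optional, Set, Tuple

Node = Tuple[int, int]  # (x, y) track intersection on a hypothetical grid fabric

def neighbors(node: Node, width: int, height: int) -> Iterator[Node]:
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

def shortest_path(src: Node, dst: Node, width: int, height: int,
                  blocked: Set[Node]) -> Optional[List[Node]]:
    """Dijkstra's algorithm with unit cost per wire segment (an assumption)."""
    dist: Dict[Node, int] = {src: 0}
    prev: Dict[Node, Node] = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [node]
            while node in prev:  # walk back to the source
                node = prev[node]
                path.append(node)
            return path[::-1]
        if d > dist[node]:
            continue  # stale heap entry
        for nxt in neighbors(node, width, height):
            if nxt in blocked:
                continue
            nd = d + 1
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None  # no free path on the fabric

def route_nets(nets, width: int, height: int) -> Dict[str, Optional[List[Node]]]:
    """Route two-pin nets (name, weight, src, dst), larger weight first."""
    pins = {p for _, _, s, t in nets for p in (s, t)}  # pins are never crossed
    used: Set[Node] = set()
    routes: Dict[str, Optional[List[Node]]] = {}
    for name, _weight, src, dst in sorted(nets, key=lambda net: net[1], reverse=True):
        blocked = (used | pins) - {src, dst}
        path = shortest_path(src, dst, width, height, blocked)
        routes[name] = path
        if path is not None:
            used.update(path)  # each grid point carries at most one net
    return routes
```

On a 5 by 5 grid, a heavily weighted "word line" running across the middle row takes the straight track, and a lighter net that must cross it then finds no free path; with the weights swapped, the outcome reverses. This illustrates why routing bit lines and word lines first preserves resources for them.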
Power/ground (P/G) distribution is important for an
SRAM block layout design. An SRAM block for standard
ASIC is normally surrounded by a P/G ring. However, we
have difficulty placing a P/G ring around a relocatable and
resizable SRAM block for structured ASIC. What we do is
to add regular P/G straps over a memory array and
peripheral circuits. P/G straps can be added just like those added for logic blocks because there is virtually no difference between a relocatable and resizable SRAM block and a logic block. Note that I/O pins are on M2 to M5 wires and are located inside an SRAM block rather than at its boundaries to facilitate wiring optimization.

5. Experimental Results
We employ M-VCLB to create a structured ASIC library
with more than 100 cells (combinational and sequential cells
together) based on TSMC 0.18um technology. We also use
M-VCLB to create 6T memory cells (M-cell) and leaf cells
employed by our SRAM compiler to produce a relocatable
and resizable SRAM block. We use a commercial tool, MemChar [24], to characterize the access time and power dissipation of our SRAM blocks. For comparison, we also use Artisan's (ARM's) memory compiler to generate customized yet non-structured ASIC SRAM block counterparts. Our M-cell has an area of 94.248 µm², whereas a typical 0.18um 6T memory cell has an area of about 5 µm² [25]; the area ratio is about 18.85. Table I shows the area and access time of our single-port SRAM blocks with a multiplexer width of four. Area is normalized to that of Artisan's non-structured ASIC SRAM block counterparts. Our SRAM blocks have an area of 1.5 to 14.4 times and an access time of 1.46 to 2.25 times that of their counterparts. The timing performance of our SRAM blocks is competitive. The area gap between our SRAM blocks and Artisan's increases with memory size. However, given that the ratio of our M-cell's area to a non-structured ASIC SRAM cell's area is 18.85, the area efficiency of our SRAM blocks is relatively good, especially for small SRAM blocks. Clearly, reducing the M-cell's size would further improve area efficiency. Compared with flip-flop based SRAM blocks, the area of our SRAM array is only 1/3 that of a memory array formed by flip-flops (each flip-flop using three M-VCLBs, based on the work in [7]). Note that Table I also shows the number of nets routed by our router and the routing runtime. Table II shows the internal power of single-port SRAM blocks. For a small SRAM block, power dissipation is dominated by peripheral circuits, whereas for a large SRAM block it is dominated by memory array access. It is still unclear to us why Artisan's SRAM blocks have such a small standby power that is independent of their sizes.
Table III shows area and access time of dual-port SRAM
blocks. Note that MemChar [24] fails to characterize our
dual-port SRAM blocks because we use 3 M-VCLBs to
implement two memory bits. Hence, we do not have internal
power data. The access times of our SRAM blocks are
obtained from HSPICE simulation. The array area is 2/3 the
area of a memory array formed by flip-flops. The area usage
and timing performance are as good as those for single-port
SRAM blocks.
Since we derive M-VCLB by shrinking the transistor
sizes of S-VCLB originally optimized for logic gates, we
would like to see the extent of the impact on timing
performance due to transistor downsizing. We perform the
following experiments. We also use S-VCLB to create a
structured ASIC cell library. Along with Artisans 0.18um
non-structured ASIC standard cell library (STDL) and the
cell library created using M-VCLB, we have three cell
libraries now. We synthesize some ITC99 (b14~b22) and
ISCAS89 benchmark circuits using these libraries. We use a
method presented in [7] to push the delay envelope
(minimizing the longest path delay as much as possible) of a
circuit synthesized by Synopsys Design Compiler using
each individual cell library. The smallest longest path delay
obtained for each circuit is used as the achievable clock
period. Table IV shows the achievable clock period for each
circuit and their corresponding power dissipation and total
cell area. Columns denoted by M-VCLB (S-VCLB) give data obtained by employing the cell library based on M-VCLB (S-VCLB). With M-VCLB, chip performance is degraded by 64% (= (10.8 - 6.6)/6.6 × 100%). Correspondingly, power dissipation is also reduced. Despite such degradation, the achievable clock speed with M-VCLB is only slightly less than half the clock speed achieved by STDL. Note that all the benchmark circuits in Table IV do
not contain SRAM blocks.
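The percentages quoted above can be reproduced from the Mean row of Table IV. The short Python sketch below performs only that arithmetic. The helper names are ours, and clock speed is taken as the reciprocal of the clock period.

```python
# Mean values taken from the last row of Table IV.
MEAN_CLOCK_PERIOD_NS = {"STDL": 4.8, "S-VCLB": 6.6, "M-VCLB": 10.8}
MEAN_POWER_MW = {"STDL": 29, "S-VCLB": 65, "M-VCLB": 49}


def degradation_pct(period_ns: float, base_period_ns: float) -> float:
    """Percentage increase of a clock period over a baseline clock period."""
    return (period_ns - base_period_ns) / base_period_ns * 100.0


def clock_speed_ratio(lib: str, ref: str) -> float:
    """Clock speed of `lib` relative to `ref` (clock speed = 1 / clock period)."""
    return MEAN_CLOCK_PERIOD_NS[ref] / MEAN_CLOCK_PERIOD_NS[lib]


if __name__ == "__main__":
    perf = degradation_pct(MEAN_CLOCK_PERIOD_NS["M-VCLB"], MEAN_CLOCK_PERIOD_NS["S-VCLB"])
    speed = clock_speed_ratio("M-VCLB", "STDL")
    power_cut = (MEAN_POWER_MW["S-VCLB"] - MEAN_POWER_MW["M-VCLB"]) / MEAN_POWER_MW["S-VCLB"] * 100.0
    print(f"M-VCLB vs S-VCLB clock period: +{perf:.0f}%")      # +64%
    print(f"M-VCLB clock speed relative to STDL: {speed:.2f}")  # 0.44
    print(f"M-VCLB vs S-VCLB mean power: -{power_cut:.0f}%")    # -25%
```

A speed ratio of about 0.44 is what "slightly less than half the clock speed of STDL" means numerically.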
To illustrate how we use our relocatable and resizable SRAM blocks in a design, we implement the SoC platform ORPSoC [26], based on OpenRISC 1200, a 32-bit CPU from OpenCores. ORPSoC has eight memory blocks; the largest has 4K bytes, whereas the smallest has 112 bytes. We have three implementations:
STDL(A): using Artisan's non-structured ASIC standard cell library and Artisan's non-structured ASIC SRAM blocks.
S-VCLB(A): using the structured ASIC cell library based on S-VCLB and Artisan's non-structured ASIC SRAM blocks.
M-VCLB(O): using our structured ASIC cell library and relocatable and resizable SRAM blocks based on M-VCLB.

Table I: Area and access time of single-port SRAM blocks.

SRAM Block      Area              Access time (ns)    Routing
                Artisan   Ours    Artisan   Ours      Nets     Runtime (m)
16X4 (8B)       1         1.5     1.15      1.68      300      0
32X8 (32B)      1         2.5     1.17      1.87      800      0
256X16 (512B)   1         8.3     1.20      2.56      10800    2
256X64 (2KB)    1         11.6    1.33      2.82      42700    16
512X64 (4KB)    1         14.4    1.34      3.01      83100    42

Table II: Internal power of single-port SRAM blocks (mW).

SRAM Block   Artisan                   Ours
             Read   Write   Standby    Read   Write   Standby
16X4         59     60      0.04       22     30      5
32X8         70     73      0.04       45     40      10
256X16       95     103     0.04       144    154     54
256X64       233    274     0.04       585    547     225
512X64       237    282     0.04       780    859     366

Table III: Area and access time of dual-port SRAM blocks.

SRAM Block   Area              Access time (ns)    Routing
             Artisan   Ours    Artisan   Ours      Nets      Runtime (m)
16X4         1         1.6     1.19      1.80      600       0
32X8         1         2.3     1.21      1.80      1700      0
256X16       1         6.8     1.28      2.40      22300     8
256X64       1         9.1     1.46      2.90      87000     72
512X64       1         11.1    1.51      3.10      169300    188

Table IV: Performance impact on pure logic circuits.

          Clock period (ns)        Power (mW)              Cell area (mm2)
Circuit   STDL  S-VCLB  M-VCLB     STDL  S-VCLB  M-VCLB    STDL  S-VCLB  M-VCLB
s35932    1.3   1.8     2.6        10    22      21        0.25  0.81    0.90
s38417    2.9   3.4     5.2        11    33      30        0.24  1.02    1.13
s38584    2.4   3.0     4.4        12    32      36        0.23  0.95    1.01
b14       4.7   6.7     9.1        10    20      13        0.16  0.83    0.84
b15       4.2   4.9     8.8        8     19      11        0.15  0.71    0.61
b17       4.9   5.8     12.5       26    58      42        0.44  2.09    2.28
b18       7.2   12.1    21.9       68    148     101       1.23  5.35    5.49
b19       9.7   13.3    22.5       114   260     182       2.27  8.77    9.05
b20       5.2   7.0     9.7        22    30      28        0.37  1.17    1.74
b21       4.8   6.9     10.2       19    36      28        0.35  1.48    1.75
b22       5.3   7.7     11.7       24    60      41        0.45  2.42    2.47
Mean      4.8   6.6     10.8       29    65      49        0.56  2.33    2.48

The same delay-envelope pushing method [7] is used to optimize the timing performance of ORPSoC. We employ Synopsys Design Compiler for logic synthesis and use Cadence SOC Encounter to perform placement, routing, power analysis, and timing analysis. Table V shows the performance, power, and area data. Surprisingly, with M-VCLB, the performance degradation shown in this table is much smaller than the degradation presented in Table IV. As expected, the core area for M-VCLB(O) is much larger than that for S-VCLB(A), because the SoC design with S-VCLB(A) uses customized non-structured ASIC SRAM blocks. Figure 12 shows a layout of the SoC platform designed based on M-VCLB. If the SRAM blocks in this design were not relocatable and resizable, we would need eight pre-diffused, fixed-size SRAM blocks at fixed locations. This places a great limitation on the applications of via-configurable structured ASIC technology: chip area is wasted if the pre-diffused memory blocks are not used. It is certainly possible, though rare, that the pre-diffused memory blocks of a structured ASIC fit the needs of a design in both size and quantity. In view of the above, our relocatable and resizable SRAM blocks could still be relatively efficient in area usage and power dissipation for most applications.
Table V: Performance profile of ORPSoC.

                    STDL(A)   S-VCLB(A)   M-VCLB(O)
Clock period (ns)   15.00     17.97       19.74
Power (mW)          68.74     247.85      320.63
Core area (mm2)     3.00      7.86        18.36
Cell area (mm2)     1.28      6.65        6.70
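The comparison between the two structured-ASIC implementations can be made concrete with a few lines of arithmetic on Table V. This Python sketch computes signed percentage changes from S-VCLB(A) to M-VCLB(O). The dictionary layout and helper name are ours.

```python
# Table V values for the two structured-ASIC implementations of ORPSoC.
ORPSOC = {
    "S-VCLB(A)": {"clock_ns": 17.97, "power_mw": 247.85, "core_mm2": 7.86, "cell_mm2": 6.65},
    "M-VCLB(O)": {"clock_ns": 19.74, "power_mw": 320.63, "core_mm2": 18.36, "cell_mm2": 6.70},
}


def relative_change_pct(new: float, old: float) -> float:
    """Signed percentage change from `old` to `new`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    old, new = ORPSOC["S-VCLB(A)"], ORPSOC["M-VCLB(O)"]
    for metric in ("clock_ns", "power_mw", "core_mm2", "cell_mm2"):
        print(f"{metric:>9}: {relative_change_pct(new[metric], old[metric]):+.1f}%")
```

The clock period grows by only about 10% here, compared with the 64% increase for the pure logic circuits of Table IV. The core area, by contrast, more than doubles.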

Figure 12: An SoC platform ORPSoC with eight SRAM blocks, marked by the eight rectangles.

6. Conclusions
In this paper we propose a method to create a relocatable
and resizable SRAM block for via-configurable structured
ASIC. This structured ASIC technology is enabled by using
the same VCLB to implement logic blocks and SRAM
blocks. We develop such a VCLB by properly sizing a
VCLB originally optimized for logic gate implementation.
We develop an SRAM compiler to generate relocatable and
resizable SRAM blocks using this VCLB. Our SRAM
blocks, especially the smaller ones, achieve acceptable
timing performance, area usage, and power dissipation. We
also demonstrate an application of our relocatable and
resizable SRAM blocks to designing an SoC platform. In the future, we will develop a VCLB optimized for memory cell implementation that remains viable for logic gate implementation. We will also continue to search for VCLBs
that will further narrow the area, power, and performance
gaps between structured ASIC and standard ASIC.

7. References
[1] B. Zahiri, "Structured ASICs: opportunities and challenges," ICCD, pp. 404-4093, 2003.
[2] K. C. Wu and Y. W. Tsai, "Structured ASIC, evolution or revolution?" ISPD, pp. 103-106, 2004.
[3] L. Pileggi, H. Schmit, A. J. Strojwas, P. Gopalakrishnan, V. Kheterpal, A. Koorapaty, C. Patel, V. Rovner, and K. Y. Tong, "Exploring regular fabrics to optimize the performance-cost trade-off," DAC, pp. 782-787, 2003.
[4] Z. Or-Bach, "Paradigm shift in ASIC technology: in-stand metal out-stand cell," eASIC, 2006.
[5] C. Patel, A. Cozzie, H. Schmit, and L. Pileggi, "An architectural exploration of via patterned gate arrays," ISPD, pp. 184-189, 2003.
[6] Y. Ran and M. Marek-Sadowska, "Designing via-configurable logic blocks for regular fabric," IEEE Trans. on VLSI Systems, Vol. 14, No. 1, pp. 1-14, Jan. 2006.
[7] M. C. Li, H. H. Tung, C. C. Lai, and R. B. Lin, "Standard cell like via-configurable logic block design for structured ASICs," ISVLSI, pp. 381-386, 2008.
[8] L. C. Lai, H. H. Chang, and R. B. Lin, "Rover: Routing on via-configurable fabrics for standard-cell-like structured ASICs," GLSVLSI, pp. 37-42, 2011.
[9] V. Kheterpal, A. J. Strojwas, and L. Pileggi, "Routing architecture exploration for regular fabrics," DAC, pp. 204-207, 2004.
[10] T. Jhaveri, V. Rovner, L. Pileggi, A. J. Strojwas, D. Motiani, V. Kheterpal, K. Y. Tong, T. Hersan, and D. Pandini, "Maximization of layout printability/manufacturability by extreme layout regularity," J. Micro/Nanolith. MEMS MOEMS, 6(3), pp. 031011-1 to 031011-15, Jul.-Sep. 2007.
[11] T. Jhaveri, V. Rovner, L. Liebmann, L. Pileggi, A. J. Strojwas, and J. D. Hibbeler, "Co-optimization of circuits, layout and lithography for predictive technology scaling beyond gratings," IEEE Trans. on CAD, Vol. 29, No. 4, pp. 509-527, Apr. 2010.
[12] K. H. Choe and K. K. Chua, "Distributed memory circuitry on structured application-specific integrated circuit devices," US Patent 7,586,327 B1, Sep. 8, 2009.
[13] D. Lewis, "Variable sized soft memory macros in structured cell arrays, and related methods," US Patent 7,768,819 B2, Aug. 3, 2010.
[14] A. Pavlov and M. Sachdev, CMOS SRAM Circuit Design and Parametric Test in Nano-Scaled Technologies: Process-Aware SRAM Design and Test, Springer-Verlag, 2008.
[15] T. Kobayashi, K. Nogami, T. Shirotori, and Y. Fujimoto, "A current-controlled latch sense amplifier and a static power saving input buffer for low-power architecture," IEEE J. Solid-State Circuits, vol. 28, no. 4, pp. 523-527, Apr. 1993.
[16] Y. C. Lai and S. Y. Huang, "A resilient and power-efficient automatic-power-down sense amplifier for SRAM design," IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 55, no. 10, pp. 1031-1035, Oct. 2008.
[17] B. S. Amrutur and M. A. Horowitz, "A replica technique for wordline and sense control in low-power SRAMs," IEEE J. Solid-State Circuits, vol. 33, no. 8, pp. 1208-1219, Aug. 1998.
[18] Y. Xu, Z. Gao, and X. He, "A flexible embedded SRAM IP compiler," ISCAS, pp. 3756-3759, 2007.
[19] A. Karandikar and K. K. Parhi, "Low power SRAM design using hierarchical divided bit-line approach," ICCD, pp. 82-88, 1998.
[20] J. H. Oppold, M. R. Ouellette, and M. J. Sullivan, "Performance optimizing compiler for building a compiled SRAM," United States Patent 6,002,633, 1999.
[21] Z. Wu, Z. Gao, and X. He, "Development of a deep submicrometer embedded SRAM compiler," ICECS, pp. 707-710, 2003.
[22] J. C. Tou, P. Gee, J. Duh, and R. Eesley, "A submicrometer CMOS embedded SRAM compiler," IEEE J. Solid-State Circuits, vol. 27, no. 3, pp. 417-424, Mar. 1992.
[23] FLUTE: Fast lookup table based technique for RSMT construction and wirelength estimation. [Online]. http://home.eng.iastate.edu/~cnchu/flute.html
[24] MemChar, Legend Design Technology Inc. [Online]. http://www.legenddesign.com/products/memchar.shtml
[25] C. H. Hsiao and D. M. Kwai, "Measurement and characterization of 6T SRAM cell," Int. Workshop Memory Technology, Design, and Testing (MTDT), Taipei, Taiwan, Aug. 2005.
[26] ORPSoC, http://opencores.org/project,orpsoc.

Performance and Cache Access Time of SRAM-eDRAM Hybrid Caches Considering Wire Delay
Young-Ho Gong, Hyung Beom Jang, and Sung Woo Chung
Dept. of Computer and Radio Communications Engineering, Korea University
Seoul, Korea
E-mail: {kyh555, kuphy01, swchung}@korea.ac.kr
Abstract
Most modern microprocessors have multi-level on-chip caches with a multi-megabyte shared last-level cache (LLC). With a multi-level cache hierarchy, the total size of the on-chip caches becomes larger, which increases their leakage power and area. Recently, the SRAM-eDRAM hybrid cache was proposed to reduce the leakage power and area of SRAM based caches. For SRAM-eDRAM hybrid caches, however, no study has analyzed the effects of the reduced area on wire delay, cache access time, and performance. Replacing half (or three fourths) of the SRAM cells with small eDRAM cells shortens the wires, which eventually reduces wire delay and cache access time. In this paper, we evaluate the SRAM-eDRAM hybrid caches in terms of energy, area, wire delay, access time, and performance. We show that the SRAM-eDRAM hybrid cache reduces the energy consumption, area, wire delay, and SRAM array access time by up to 53.9%, 49.9%, 50.4%, and 38.7%, respectively, compared to the SRAM based cache.

Keywords
SRAM-eDRAM hybrid cache, wire delay, access time

1. Introduction
Recent microprocessors have a multi-level cache hierarchy composed of the first-level (L1), second-level (L2), and last-level cache (LLC). Since cache access time is crucial for performance, SRAM cells, which have faster access times than DRAM cells, are suitable for cache memory. However, SRAM cells occupy a larger area and consume much more leakage power than DRAM cells, since they have more transistors. The total size of the on-chip caches, including L1, L2, and LLC, reaches multiple megabytes. Importantly, the cache access time increases with the cache size, since the enlarged area lengthens the wires that signals must travel.
Cache access time is composed of the following delay components, based on [7]: i) h-tree wire delay, ii) decoder delay, iii) wordline delay, iv) bitline delay, and v) sense amplifier delay. These delay components can be classified into two categories: i) area-sensitive and ii) process-technology-sensitive. Wire length increases with the cache area, so the signal transfer time through the wires inevitably increases. In particular, the h-tree wire delay is the most area-sensitive component; note that it accounts for a substantial portion of the cache access time [7]. On the other hand, the signal transfer delay depends largely on electrical resistance, and is thus reduced as process technology advances. If a delay component depends more on area than on process technology, it is an area-sensitive delay model; otherwise, it is a process-technology-sensitive delay model.
To reduce the cache area, SRAM-eDRAM hybrid caches were proposed, which substitute eDRAM cells for some of the SRAM cells. One of them is the SRAM-eDRAM hybrid macrocell, proposed for L1 data caches [5], which reduces the area and energy consumption of the L1 data cache. However, that work did not consider the effect of the reduced area on the cache access time. Since the h-tree wire delay accounts for a considerable portion of the cache access time and is shortened by the reduced area, the SRAM-eDRAM hybrid cache is expected to further improve the cache access time through its lower wire delay.
To the best of our knowledge, no study has examined the impact of the SRAM-eDRAM hybrid cache on performance while considering area, wire delay, and cache access time. In this paper, we adopt the SRAM-eDRAM hybrid cache to investigate the effect of the reduced area on wire delay, cache access time, and performance. We apply the SRAM-eDRAM hybrid cache to the L1 and L2 caches; in our paper, the L2 cache is the last-level cache (LLC). Since the L1 cache has the smallest area, the wire delay improvement from the area reduction of the SRAM-eDRAM hybrid cache is not significant compared to that of the LLC. However, when the critical path is on the L1 cache access, the clock cycle time is reduced, which directly improves performance. Unlike the L1 cache, the LLC has a capacity of multiple megabytes, so its area reduction is obviously larger than the whole area of the L1 cache. Therefore, for the LLC, the cache access time is expected to improve further with the SRAM-eDRAM hybrid cache, as shown in Section 4. As a result, the SRAM-eDRAM hybrid cache reduces the number of clock cycles needed to access the LLC, which leads to better performance.
The rest of this paper is organized as follows. Section 2 presents related work on the SRAM-eDRAM hybrid cache structure. Section 3 demonstrates the impact of the SRAM-eDRAM hybrid cache on area, wire delay, and cache access time. Section 4 provides our evaluation environment and results in terms of energy consumption, cache access time, and performance. Section 5 concludes our paper.

978-1-4673-4953-6/13/$31.00 ©2013 IEEE

2. Related Work
2.1. SRAM-eDRAM Hybrid Cache
14th Int'l Symposium on Quality Electronic Design

Figure 1: Block diagram of an N-bit macrocell [5]

2.1.1. Macrocell Architecture


A macrocell [5], which is composed of one SRAM cell and multiple eDRAM cells, was proposed for the L1 data cache, as shown in Fig. 1. Although the eDRAM cell consists of only one transistor, it stores the same amount of data as the SRAM cell. However, the eDRAM cell has a slower cell access time than the SRAM cell. Since cache access time is determined by the slowest access, the multiple eDRAM cells in the macrocell could cause significant performance degradation. To alleviate the performance loss caused by the eDRAM cells, the SRAM-eDRAM hybrid caches based on the macrocell use different access cycles depending on the cell type. In the macrocell architecture, the SRAM cell is accessed in the first cycle. When a miss occurs in the SRAM cell, the eDRAM cells are subsequently accessed in the second cycle. To exploit temporal locality, the Most Recently Used (MRU) data is kept in the SRAM cell.
Fig. 1 shows the implementation of the N-bit macrocell. Compared to the SRAM based cache, only the bridges and an intermediate buffer are added to communicate between the SRAM cell and the N-1 eDRAM cells. The bridges provide unidirectional data transfer from the SRAM cell to the eDRAM cells; they are not used to transfer data from the eDRAM cells to the SRAM cell. The intermediate buffer works as follows: i) on a cache hit to an eDRAM cell, the eDRAM cell data is transferred to the intermediate buffer; ii) through the s2d (SRAM to eDRAM) bridge, the SRAM cell data is transferred to the eDRAM cell; and iii) the SRAM cell receives the hit data from the intermediate buffer. In other words, the intermediate buffer is used only when a cache hit to an eDRAM cell occurs.
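The following Python sketch makes the three-step exchange concrete. It models one cache set built from macrocells at block granularity: position 0 is held in the SRAM cells, and the remaining positions are held in the eDRAM cells. The class name and the block-level abstraction are our assumptions, and miss handling is deliberately left out. The cycle counts simply follow the first-cycle/second-cycle description above.

```python
from typing import List, Optional


class MacrocellSet:
    """Block-level model of one set built from N-bit macrocells (hypothetical abstraction).

    Position 0 is held in the SRAM cells; positions 1..N-1 are held in the eDRAM cells.
    """

    def __init__(self, sram_block: str, edram_blocks: List[str]) -> None:
        self.sram = sram_block
        self.edram = list(edram_blocks)
        self.buffer: Optional[str] = None  # intermediate buffer, used only on eDRAM hits

    def access(self, tag: str) -> Optional[int]:
        """Return the number of access cycles on a hit, or None on a miss.

        Miss handling (fill and replacement) is not modeled.
        """
        if self.sram == tag:
            return 1  # served by the SRAM cell in the first cycle
        if tag in self.edram:
            i = self.edram.index(tag)
            self.buffer = self.edram[i]  # i) eDRAM data -> intermediate buffer
            self.edram[i] = self.sram    # ii) s2d bridge: SRAM data -> eDRAM cell
            self.sram = self.buffer      # iii) SRAM cell takes the hit data from the buffer
            self.buffer = None
            return 2  # eDRAM cells are accessed in the second cycle
        return None


if __name__ == "__main__":
    s = MacrocellSet("A", ["B", "C", "D"])
    print(s.access("C"), s.sram, s.edram)  # 2 C ['B', 'A', 'D']
    print(s.access("C"))                   # 1
```

After an eDRAM hit, the hit block becomes the MRU block in the SRAM cell. A repeated access to it is therefore served in one cycle.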
In [5], Valero et al. evaluated their proposed scheme with a focus on energy consumption, area, and performance. Their experimental results showed that the energy consumption and area are reduced by 55% and 29% on average, respectively, while the performance degradation is less than 2% in 45nm process technology [6]. However, they did not consider the effect of the reduced area on the cache access time. Since the area of the eDRAM cells is much smaller than that of the SRAM cells, the area reduction by the SRAM-eDRAM hybrid cache leads to a wire delay reduction. However, the L1 cache size is generally 64KB or less, so the reductions in wire delay and cache access time are insignificant. The multi-megabyte LLC is a more appropriate target for evaluating the effect of the wire delay reduction. In this paper, we analyze the wire delay reduction when we apply the SRAM-eDRAM hybrid cache to the L1 cache and the L2 (last-level) cache. In addition, we evaluate the impact of this wire delay reduction on access time and performance.

Figure 2: Operations of SRAM based cache and hybrid cache: (a) SRAM based cache, (b) hybrid cache (1S3D)

2.1.2. Macrocell Based SRAM-eDRAM Hybrid Cache Organization and Policy
A cache is composed of tag arrays and data arrays. Valero et al. used the macrocell to implement the data arrays [5], while the tag arrays are the same as in the SRAM based cache. Fig. 2 shows the operations of the SRAM based cache compared to the SRAM-eDRAM hybrid cache (SRAM:eDRAM=1:3), assuming 4-way set associative caches. In the SRAM based cache (Fig. 2(a)), all tag arrays and data arrays are accessed simultaneously, which requires one cycle. On the other hand, as shown in Fig. 2(b), the SRAM-eDRAM hybrid cache accesses all the tag arrays and the SRAM data array in the first cycle. If the SRAM data array holds the desired data, the cache works the same as the SRAM based cache (static hit). If the SRAM array does not contain the desired data, the eDRAM data array is accessed, which takes a second cycle (dynamic hit). Thus, cache hits to the eDRAM data array incur a performance overhead. To mitigate the performance degradation, the MRU blocks are kept in the SRAM array [5].
An eDRAM array loses its data when the retention time expires. When a dirty block is in an eDRAM array, the data should be written back to the next-level cache or be accessed within the predefined retention time. Since exceeding the retention time may cause incorrect execution, the retention time should be carefully considered. Liang et al. proposed a global binary counter to prevent the eDRAM array from discharging [6]. For all blocks in the eDRAM array, the global binary counter is used to check whether the retention deadline is imminent. When the retention deadline arrives, the processor checks the dirty bit of each block, and only blocks whose dirty bit is set to 1 are written back to the next-level cache. The global binary counter thus prevents incorrect execution [5].
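Below is a minimal Python sketch of the counter-based check described above. It assumes a single global counter that expires once per retention period and a dirty bit per eDRAM block. The text does not say what happens to clean blocks at the deadline (for example, refresh or invalidation), so the sketch models only the write-back of dirty blocks. The class and parameter names are hypothetical, and a plain integer stands in for the hardware binary counter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdramBlock:
    tag: str
    dirty: bool = False


class RetentionCounter:
    """Hypothetical global counter that signals when the eDRAM retention deadline arrives."""

    def __init__(self, retention_cycles: int, blocks: List[EdramBlock]) -> None:
        if retention_cycles <= 0:
            raise ValueError("retention_cycles must be positive")
        self.retention_cycles = retention_cycles
        self.count = 0
        self.blocks = blocks

    def tick(self) -> List[str]:
        """Advance one cycle; at the deadline, return the tags of dirty blocks written back."""
        self.count += 1
        if self.count < self.retention_cycles:
            return []
        self.count = 0  # start the next retention period
        written_back = []
        for blk in self.blocks:
            if blk.dirty:  # only blocks whose dirty bit is 1 are written back
                written_back.append(blk.tag)
                blk.dirty = False  # the next-level cache now holds the latest copy
        return written_back
```

For example, with `retention_cycles=4` and blocks `A` (dirty), `B` (clean) and `C` (dirty), the first three calls to `tick()` return empty lists and the fourth returns `["A", "C"]`.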

2.2. Non-Uniform Cache Architecture (NUCA)


As process technology shrinks, more bits can be stored in the same area. Nevertheless, cache area keeps growing, since high-performance microprocessors require larger on-chip caches. In the uniform cache architecture (UCA), the cache access time is determined by the time to access the farthest sub-bank, since all sub-banks must be accessed in the same number of clock cycles through the h-tree network. In addition, the h-tree wire delay makes up a considerable portion of the cache access time. Therefore, the cache access time deteriorates as the cache size grows. To reduce the h-tree wire delay, Kim et al. proposed the non-uniform cache architecture (NUCA) [1]. In the UCA model, all sub-banks are accessed with the same number of clock cycles; in the NUCA model, on the other hand, each sub-bank can be accessed with a different number of clock cycles.
The main idea of NUCA is as follows: i) accesses to banks closer to the cache controller have shorter access times than accesses to banks farther from it, ii) different banks have different access times, and iii) each bank has a private transmission channel to serve requests. The private per-bank channel allows each bank to be accessed at its maximum speed. However, the private per-bank channels increase the cache area as the number of banks grows. In addition, the cache controller needs more power to support the private per-bank channels. These overheads restrict the cache size and the number of banks.
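A toy Python example illustrates the difference between the two access models. Under UCA, every bank is charged the latency of the farthest bank; under NUCA, each bank keeps its own latency. The bank names and cycle counts are hypothetical. The averaging helper also assumes that every bank is accessed equally often.

```python
from typing import Dict


def uca_latency(bank_cycles: Dict[str, int]) -> Dict[str, int]:
    """UCA: all banks take the same number of cycles, set by the farthest bank."""
    worst = max(bank_cycles.values())
    return {bank: worst for bank in bank_cycles}


def nuca_latency(bank_cycles: Dict[str, int]) -> Dict[str, int]:
    """NUCA: each bank is accessed with its own number of cycles."""
    return dict(bank_cycles)


def average(latencies: Dict[str, int]) -> float:
    """Mean latency, assuming each bank is accessed equally often."""
    return sum(latencies.values()) / len(latencies)


if __name__ == "__main__":
    banks = {"near": 4, "middle": 7, "far": 12}  # hypothetical cycles per bank
    print("UCA :", average(uca_latency(banks)))  # 12.0
    print("NUCA:", average(nuca_latency(banks)))
```

The sketch deliberately ignores the per-bank channel cost that the paragraph above identifies as the main overhead of NUCA.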
In contrast, the SRAM-eDRAM hybrid cache reduces energy consumption, since eDRAM cells consume less leakage power than SRAM cells. In addition, the SRAM-eDRAM hybrid cache has a much smaller area than the SRAM based cache, so it reduces both the wire length and the wire delay.

Figure 3: Two different SRAM-eDRAM hybrid caches, depending on the eDRAM array ratio: (a) 1S1D hybrid cache, (b) 1S3D hybrid cache

3. Impacts of the SRAM-eDRAM Hybrid Cache on Wire Delay
We adopt the SRAM-eDRAM hybrid cache based on the macrocell architecture [5], as described in Subsection 2.1. To investigate the impact of the SRAM-eDRAM hybrid cache on wire delay, we first analyze its area, since the h-tree wire delay is significantly affected by area. Then, we analyze the wire delay and cache access time.

3.1. Target Hybrid Cache


Fig. 3 shows the configurations of the SRAM-eDRAM hybrid cache used for the analysis. The 1S1D (Fig. 3(a)) denotes the SRAM-eDRAM hybrid cache with equal numbers of SRAM arrays and eDRAM arrays. The 1S3D (Fig. 3(b)) denotes the hybrid cache in which the ratio of SRAM arrays to eDRAM arrays is one to three. For example, if the 1S3D hybrid cache is applied to an 8-way set associative cache, the first two ways (way-0 and way-1) are composed of SRAM arrays and the remaining ways (way-2 to way-7) are composed of eDRAM arrays. Since cache hit time is one of the most significant factors determining microprocessor performance, at least one SRAM array should be used in SRAM-eDRAM hybrid caches; in our paper, way-0 is always composed of an SRAM array. Unlike the data arrays, which are composed of SRAM and eDRAM arrays, all tag arrays are composed of SRAM arrays. Since a cache hit or miss is determined after accessing the tag arrays, the tag array access time is crucial for performance. If the tag arrays were composed of eDRAM arrays, tag comparison would take much longer, resulting in severe performance degradation.
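The way-to-array assignment described above can be written as a small Python function. This is a sketch under the stated rules: the first ways are SRAM, way-0 is always SRAM, and tag arrays (always SRAM) are not modeled. The function name and its argument names are ours.

```python
from typing import List


def data_way_types(num_ways: int, sram_part: int, edram_part: int) -> List[str]:
    """Cell type of each data-array way for an SRAM:eDRAM split such as 1S1D (1:1) or 1S3D (1:3).

    The lowest-numbered ways are SRAM, so way-0 is always SRAM.
    Tag arrays are not modeled here because they are always SRAM.
    """
    total = sram_part + edram_part
    if num_ways % total != 0:
        raise ValueError("number of ways must be divisible by the SRAM:eDRAM ratio")
    sram_ways = num_ways * sram_part // total
    if sram_ways < 1:
        raise ValueError("at least one SRAM way is required")
    return ["SRAM" if way < sram_ways else "eDRAM" for way in range(num_ways)]


if __name__ == "__main__":
    print(data_way_types(8, 1, 3))  # 8-way 1S3D: way-0 and way-1 SRAM, way-2..way-7 eDRAM
    print(data_way_types(4, 1, 3))  # 4-way 1S3D: only way-0 is SRAM
```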

3.2. Area

To analyze the h-tree wire delay, we estimate the area of the SRAM-eDRAM hybrid cache and compare it with that of the SRAM based cache. Using CACTI 6.5 [7], we obtain the areas of a 64KB L1 cache and of 8MB and 16MB LLCs. CACTI 6.5 does not support hybrid cache configurations; it supports only homogeneous configurations (SRAM-only or eDRAM-only). Thus, we calculate the area of the 1S1D and 1S3D hybrid caches as follows: i) we obtain the areas of the SRAM arrays and of the eDRAM arrays separately, according to the cache size and configuration (1S1D or 1S3D); ii) we then add up the two areas. Table I shows the resulting data array area for each cache size.
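The two-step estimate can be sketched in Python as follows. The per-array areas themselves come from separate CACTI 6.5 runs and are not reproduced here. The helpers only split the capacity between SRAM and eDRAM arrays, add the two areas, and normalize to the SRAM based baseline. The function names are ours, and capacity is assumed to split in proportion to the SRAM:eDRAM ratio.

```python
from typing import Tuple

MB = 1 << 20


def split_capacity(total_bytes: int, sram_part: int, edram_part: int) -> Tuple[int, int]:
    """Step i) helper: bytes held by SRAM arrays and by eDRAM arrays for a 1S1D/1S3D split."""
    sram_bytes = total_bytes * sram_part // (sram_part + edram_part)
    return sram_bytes, total_bytes - sram_bytes


def hybrid_data_array_area(sram_area_mm2: float, edram_area_mm2: float) -> float:
    """Step ii): the hybrid area is the sum of the separately estimated array areas."""
    return sram_area_mm2 + edram_area_mm2


def normalized_pct(area_mm2: float, baseline_mm2: float) -> float:
    """Area as a percentage of the SRAM based cache (the percentages in Table I)."""
    return 100.0 * area_mm2 / baseline_mm2


if __name__ == "__main__":
    # A 16MB 1S3D cache splits into 4MB of SRAM arrays and 12MB of eDRAM arrays.
    print([b // MB for b in split_capacity(16 * MB, 1, 3)])  # [4, 12]
    # Table I, 8MB LLC: 9.80 mm2 (1S1D) against 13.93 mm2 (SRAM).
    print(round(normalized_pct(9.80, 13.93), 1))              # 70.4
```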

Table I. Data array area comparison (mm2)

Cache size    SRAM            1S1D           1S3D
64KB (L1)     0.57 (100.0%)   0.57 (99.3%)   0.55 (95.6%)
8MB (LLC)     13.93 (100.0%)  9.80 (70.4%)   7.88 (56.6%)
16MB (LLC)    28.64 (100.0%)  19.51 (68.1%)  14.49 (50.1%)

When we apply the SRAM-eDRAM hybrid cache to the 64KB L1 cache, the area reduction is not significant, since the area of the L1 cache is very small. On the other hand, the area reduction is noticeable when we apply the SRAM-eDRAM hybrid cache to the L2 cache. For the 16MB 1S3D hybrid cache, almost half of the cache area is saved compared to the SRAM based cache.

3.3. Delay Model Breakdown
To analyze the cache access time, we use CACTI 6.5 to break the cache access time down into delay components based on the delay models. Fig. 4 shows the share of each delay component for the 64KB SRAM based cache. The h-tree wire delay accounts for 23.7% of the cache access time. Although the share of each component varies with the cache configuration, the h-tree wire delay still accounts for about 20% in the 8MB and 16MB SRAM based caches. The h-tree wire delay is significantly reduced when we apply the SRAM-eDRAM hybrid cache to a multi-megabyte cache. We show the reduced cache access time and performance in Section 4.

3.4. H-Tree Wire Delay
We investigate the impact of the cache area reduction on the h-tree wire delay of the SRAM-eDRAM hybrid caches. To do so, we evaluate the h-tree wire delay reduction for two different hybrid caches (1S1D and 1S3D). Among the delay components of the cache access time described in Section 1, the h-tree wire delay is the area-sensitive one. As shown in Eq. (1) [7], the h-tree wire delay is determined by the property of the wires, the number of banks, and the height and width of the banks:

$$\mathrm{Delay}_{\text{h-tree}} = f\left(\mathrm{Property}_{\mathrm{wire}},\ \#\mathrm{bank},\ \mathrm{Height}_{\mathrm{bank}},\ \mathrm{Width}_{\mathrm{bank}}\right) \tag{1}$$

In Eq. (1), the property of the wires depends on their power and delay characteristics. In CACTI 6.5, the wire property is specified by the delay penalty: low-swing, 30%, 20%, 10%, and delay optimal. Among these wire properties, the delay optimal wires have the best performance, since repeater size and spacing are optimized to minimize wire delay [4]. In this paper, we conservatively choose the delay optimal wire property. We assume that the cache has 8 banks. Thus, with the wire property and the number of banks fixed, the h-tree wire delay is determined by the bank height and width. To calculate the h-tree wire delay of the SRAM-eDRAM hybrid cache, we extract the h-tree wire delay as a function of the cache area.
Fig. 5 shows the h-tree wire delay for each cache configuration. For the 64KB 1S3D hybrid cache, the h-tree wire delay is reduced by only 1.2%. Since the area of the 64KB cache is very small compared to the 8MB and 16MB caches, the h-tree wire delay reduction by the SRAM-eDRAM hybrid cache is small. On the other hand, for the 8MB and 16MB 1S3D hybrid caches, the h-tree wire delay is reduced by as much as 17.8% and 50.4%, respectively, compared to the SRAM based cache.

Figure 5: H-tree wire delay

Figure 4: Breakdown of the delay components for the 64KB SRAM based cache.

4. Evaluation Methodology and Results

4.1. Evaluation Environments

For evaluation, we use three different caches depending on the ratio of eDRAM arrays: the SRAM based cache, the 1S1D hybrid cache, and the 1S3D hybrid cache. We evaluate these three caches in terms of energy consumption, cache access time, and performance. Additionally, we compare the SRAM-eDRAM hybrid caches with NUCA. Table II shows the specifications of the two cache configurations (Config.1 and Config.2) used in our evaluation.
For a fair comparison, we use a baseline that has only SRAM based caches; the term SRAM denotes this baseline.
Table II. Cache Configuration

                     Config.1                  Config.2
Technology           32nm                      32nm
Line size            64B                       64B
L1 cache (private)   64KB, 4-way set assoc.    64KB, 4-way set assoc.
LLC (shared)         8MB, 8-way set assoc.     16MB, 8-way set assoc.

We use the M-sim 3.0 simulator [3], which is derived from the SimpleScalar toolset [8], to investigate the architectural impacts of the SRAM-eDRAM hybrid cache. Table III describes the system parameters for our simulation. We execute 500 million instructions after fast-forwarding 10 million instructions.


Table III. System Parameters

Processor             2 cores (1 application per core)
# Instructions        500 million
Issue policy          Out of Order
Issue width           4
Pipeline depth        6 stages
Pipeline width        4
Functional units      4 integer ALU, 4 FP ALU
Branch predictor      Bimodal
Cache configuration   Described in Table II

For benchmark applications, we select seven applications from the SPEC CPU 2006 benchmark suite [9] that have higher misses per kilo instructions (MPKI) than the other applications [3]. We then form six groups, each composed of two randomly selected applications. Since we assume that each core executes one application at a time, each application is bound to one core. Table IV shows the groups of applications.

Table IV. Groups of Applications

          Core 1   Core 2
Group 1   bzip2    astar
Group 2   gcc      lbm
Group 3   lbm      gobmk
Group 4   gobmk    mcf
Group 5   sjeng    sjeng
Group 6   mcf      gcc

4.2. Energy Consumption


Using eDRAM cells instead of SRAM cells significantly reduces the total energy consumption, since eDRAM cells consume much less leakage power than SRAM cells [4]. To evaluate the total energy consumption, we investigate the leakage power and the dynamic energy separately, based on CACTI 6.5 [7].

Figure 6: (a) SRAM array hit ratio and (b) normalized energy consumption in the L2 cache for each group of applications.
We first evaluate the leakage power of the SRAM based cache and of the 1S1D and 1S3D hybrid caches. For the 8MB 1S1D and 1S3D hybrid caches, the leakage power is reduced by 42.9% and 61.2% on average, respectively, compared to the SRAM based cache. To calculate the dynamic energy consumption, we investigate the number of accesses to the SRAM array and the dynamic energy per access. Fig. 6(a) shows the hit ratio of the SRAM array for each ratio of eDRAM arrays: the SRAM based cache, the 1S1D hybrid cache, and the 1S3D hybrid cache. As shown in Fig. 6(a), the SRAM array hit ratio exceeds 95% in both the 1S1D and 1S3D hybrid caches. Therefore, although the dynamic energy per access of the eDRAM array is slightly larger than that of the SRAM array, the dynamic energy consumption of the SRAM-eDRAM hybrid cache is almost the same as that of the SRAM based cache; the difference is less than 1% on average.
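The reasoning above (leakage plus per-access dynamic energy, dominated by SRAM array hits) can be sketched in Python. All numbers in the usage lines are hypothetical placeholders, not values from this evaluation. They only show why a slightly higher eDRAM energy per access barely moves the dynamic total when more than 95% of hits are served by the SRAM array. The function names are ours.

```python
def dynamic_energy_j(accesses: int, sram_share: float,
                     sram_energy_j: float, edram_energy_j: float) -> float:
    """Dynamic energy when a fraction `sram_share` of accesses is served by the SRAM array."""
    sram_accesses = accesses * sram_share
    edram_accesses = accesses - sram_accesses
    return sram_accesses * sram_energy_j + edram_accesses * edram_energy_j


def total_energy_j(leakage_w: float, runtime_s: float, dynamic_j: float) -> float:
    """Total energy = leakage power x runtime + dynamic energy."""
    return leakage_w * runtime_s + dynamic_j


if __name__ == "__main__":
    # Hypothetical: 1M accesses, 1.0 nJ per SRAM access, 1.1 nJ per eDRAM access.
    base = dynamic_energy_j(1_000_000, 1.0, 1.0e-9, 1.1e-9)
    hybrid = dynamic_energy_j(1_000_000, 0.96, 1.0e-9, 1.1e-9)
    print(f"dynamic energy ratio (hybrid / SRAM): {hybrid / base:.3f}")
```

Under these assumptions, the total energy gap between the caches comes almost entirely from the leakage term.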
Based on the leakage power and dynamic energy consumption, we calculate the total energy consumption. Fig. 6(b) shows the total energy consumption normalized to the SRAM based cache. In Fig. 6(b), we show only the total energy consumption of Config.1, described in Table II; the results for Config.2 are similar. In our evaluation of energy consumption, the 1S1D and 1S3D hybrid caches and NUCA are applied only to the LLC; the SRAM based cache is used for the L1 cache. The total energy consumption of the 1S1D and 1S3D hybrid caches is reduced by 36.9% and 53.9% on average, respectively, compared to the SRAM based cache. NUCA consumes more energy than both the SRAM based cache and the hybrid caches due to its routing overhead [2].

Figure 7: Analysis of the cache access time: (a) 64KB L1 cache, (b) 8MB L2 cache, (c) 16MB L2 cache

4.3. Cache access time


When we adopt the SRAM-eDRAM hybrid cache instead of the SRAM based cache, the access time is reduced by two factors: i) the h-tree wire delay reduction due to the smaller area of the SRAM-eDRAM hybrid cache, as shown in Section 3.2, and ii) the different number of cache access cycles depending on the cell type (SRAM or eDRAM). Note that the SRAM array and the eDRAM array are not accessed at the same time in the SRAM-eDRAM hybrid cache; the SRAM array is accessed first, and the eDRAM array is accessed subsequently only if an SRAM array miss occurs. In addition, the bitline delay is shortened by the SRAM-eDRAM hybrid cache. Since the bitline delay is proportional to the bitline length, the number of cycles for the SRAM array access is reduced when the cache hit occurs in the SRAM array.
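Because the two arrays are probed sequentially, the average access time of the hybrid cache can be sketched as the SRAM-array time plus the eDRAM-array time weighted by the SRAM-array miss ratio. This is a simplified model assumed for illustration (it ignores tag checks, queuing and bank conflicts), and `hybrid_access_time` and the latencies used are hypothetical.

```python
def hybrid_access_time(t_sram, t_edram, sram_hit_ratio):
    """Average access time of a sequentially probed SRAM-eDRAM hybrid cache.

    Hypothetical model: every access pays the SRAM-array time; only SRAM-array
    misses additionally pay the eDRAM-array time.
    """
    if not 0.0 <= sram_hit_ratio <= 1.0:
        raise ValueError("hit ratio must be in [0, 1]")
    return t_sram + (1.0 - sram_hit_ratio) * t_edram


# Placeholder latencies in cycles; 95% is the lower bound of the SRAM-array
# hit ratio reported for the 1S1D and 1S3D hybrid caches.
print(hybrid_access_time(t_sram=14, t_edram=20, sram_hit_ratio=0.95))
```

With a hit ratio above 95%, the eDRAM term contributes only a small fraction of the average, so shortening the SRAM-array access dominates the result.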
Fig. 7 shows the cache access time depending on the cache size (64KB, 8MB and 16MB). In addition, Table V presents the number of clock cycles for the SRAM array access in the SRAM-eDRAM hybrid cache depending on the cache size and the ratio of the eDRAM array. We assume that the clock frequency is 3GHz.

Table V. Number of clock cycles for the SRAM array access depending on the ratio of eDRAM array (cycles)

Cache size | SRAM | 1S1D | 1S3D
64KB (L1) | 3 | 3 | 3
8MB (LLC) | 21 | 16 | 14
16MB (LLC) | 25 | 23 | 16

For the 64KB 1S1D and 1S3D hybrid caches, Fig. 7(a) shows that the SRAM array access time is reduced by 19.1% and 21.5%, on average, respectively, compared to the SRAM based cache. However, this reduction is too small to lower the number of clock cycles for the L1 cache access, which remains the same as for the SRAM based L1 cache, as shown in Table V. Instead, when the L1 cache access lies on the critical path, the clock cycle time can be reduced. As shown in Eq. (2), the execution time (T_Execution) is the product of the cycles per instruction (CPI), the number of instructions (#Instructions), and the clock cycle time. In our evaluation, the number of instructions and the CPI are not changed. Thus, the execution time is improved by the reduced clock cycle time.
$T_{\text{Execution}} = \text{CPI} \times \#\text{Instructions} \times \text{Clock cycle time}$ (2)
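The cycle counts in Table V follow from rounding an access time up to whole clock periods at 3GHz, and Eq. (2) then gives the execution time. A short sketch of both steps; the helper names and the 0.9 ns L1 access time are hypothetical, while the 19.1% reduction is the 1S1D figure quoted above.

```python
import math


def access_cycles(access_time_ns, clock_ghz=3.0):
    """Whole clock cycles needed to cover an access time (3 GHz by default)."""
    # Subtract a tiny epsilon so that e.g. 7.0 ns * 3 GHz = 21.0 is not pushed
    # to 22 cycles by floating-point noise.
    return math.ceil(access_time_ns * clock_ghz - 1e-9)


def execution_time_s(cpi, n_instructions, clock_cycle_time_s):
    """Eq. (2): T_Execution = CPI x #Instructions x clock cycle time."""
    return cpi * n_instructions * clock_cycle_time_s


# Hypothetical 0.9 ns L1 access, then 19.1% faster.
before, after = 0.9, 0.9 * (1 - 0.191)
print(access_cycles(before), access_cycles(after))  # both round up to 3 cycles
```

This illustrates why a faster L1 array may leave the cycle count unchanged while still permitting a shorter clock cycle time when the L1 access is on the critical path.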
On the other hand, for the LLC, the SRAM array access time is significantly reduced by the SRAM-eDRAM hybrid cache. Fig. 7(b) and Fig. 7(c) show that the SRAM array access time is reduced by 38.7% and 36.9%, on average, in the 8MB and 16MB LLC, respectively. The ratio of the eDRAM arrays in the SRAM-eDRAM hybrid cache also affects the SRAM array access time. As shown in Table V, the number of clock cycles to access the SRAM array decreases significantly as the ratio of the eDRAM arrays increases. The reason is that the h-tree wire delay decreases significantly as the ratio of eDRAM arrays increases, since the area of an eDRAM array is much smaller than that of an SRAM array. Thus, the number of clock cycles for the SRAM array access decreases, although the access time is longer than that of the SRAM based cache when a miss occurs in the SRAM array. Consequently, the execution time is shortened by the reduced CPI.

4.4. Performance
While the SRAM array access time decreases further with the SRAM-eDRAM hybrid cache as the cache size increases, as shown in Section 4.3, performance might be degraded by the access time of the eDRAM array. However, the shortened SRAM array access time compensates for the increased average cache access time caused by the eDRAM cells. When the SRAM-eDRAM hybrid cache is applied to the L1 cache, the performance loss is 2%, on average. On the other hand, when the SRAM-eDRAM hybrid cache is applied to the LLC, the performance loss caused by the eDRAM cells is negligible, since in the LLC the SRAM array access time accounting for the reduced wire delay (as shown in Fig. 7) is around 30% shorter than when the wire delay reduction is not considered.
Fig. 8 shows the IPC normalized to the SRAM based LLC for the different groups of applications. For evaluation, we use the cache configuration described in Table II. The SRAM-eDRAM hybrid cache and NUCA are applied to the LLC. In our evaluation, we concentrate on the LLC since the impact of the SRAM-eDRAM hybrid cache on the L1 cache was already investigated in [6]. As shown in Fig. 8, the impact of the SRAM-eDRAM hybrid cache on performance is not prominent, since the number of L2 cache accesses is much smaller than that of the L1 cache. For groups 1 and 6, the performance improvement is slight (1.0% on average). Although the performance improvement by the SRAM-eDRAM hybrid cache is not significant, the energy consumption is still significantly reduced (as shown in Fig. 6(b)) with a relatively small area. Moreover, the performance benefit of the SRAM-eDRAM hybrid cache is expected to grow as the cache size increases.

Figure 8: Performance of each group of applications. The 1S1D/1S3D hybrid caches and NUCA are applied to the L2 cache.

5. Conclusion
The SRAM-eDRAM hybrid cache was originally proposed to reduce the leakage power and area of SRAM cells while maintaining performance. However, previous studies did not consider the h-tree wire delay reduction. In this paper, we analyze the impacts of the SRAM-eDRAM hybrid cache on area, h-tree wire delay, cache access time, performance, and energy consumption. When the 1S3D hybrid cache is applied to the 16MB LLC, the h-tree wire delay is reduced by up to 50.4%. Thus, the SRAM array access time is reduced by up to 38.7% compared to the SRAM based cache. Moreover, the energy consumption and area are substantially decreased with negligible performance degradation. As cache sizes increase in the future, the benefits of the SRAM-eDRAM hybrid cache are expected to become even more significant.

6. Acknowledgements
This work was supported by the Center for Integrated Smart Sensors funded by the Ministry of Education, Science and Technology as Global Frontier Project (CISS-2012054194). This work was also supported by the Ministry of Knowledge Economy (MKE), Korea, under the Information Technology Research Center (ITRC) support program supervised by the National IT Industry Promotion Agency (NIPA-2012H0301-12-2006).

7. References
[1] C. Kim, D. Burger, and S. W. Keckler, "An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches," Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), vol. 30, pp. 211-222, Dec. 2002.
[2] N. Muralimanohar and R. Balasubramonian, "Interconnect Design Considerations for Large NUCA Caches," Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA '07), vol. 35, pp. 369-380, May 2007.
[3] J. J. Sharkey, D. Ponomarev, and K. Ghose, "M-Sim: A Flexible, Multithreaded Architectural Simulation Environment," Technical Report CS-TR-05-DP01, Department of Computer Science, State University of New York at Binghamton, Oct. 2005.
[4] S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi, "CACTI 5.1," Technical Report HPL-2008-20, HP Laboratories, Apr. 2008.
[5] A. Valero, J. Sahuquillo, S. Petit, V. Lorente, R. Canal, P. Lopez, and J. Duato, "An hybrid eDRAM/SRAM macrocell to implement first-level data caches," Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 213-221, Dec. 2009.
[6] A. Valero, J. Sahuquillo, V. Lorente, S. Petit, P. Lopez, and J. Duato, "Impact on performance and energy of the retention time and processor frequency in L1 macrocell-based data caches," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 1-10, May 2011.
[7] S. J. E. Wilton and N. P. Jouppi, "CACTI: An enhanced cache access and cycle time model," IEEE Journal of Solid-State Circuits, vol. 31, no. 5, May 1996.
[8] SimpleScalar toolset, http://www.simplescalar.com/
[9] SPEC CPU2006, http://www.spec.org/cpu2006/

Single Bit-line 7T SRAM cell for Low Power and High SNM

Basavaraj Madiwalar
MTech Student (VLSI Design & Embedded Systems)
RVCE Bengaluru, INDIA
basavaraj.madiwalar@gmail.com

Dr. Kariyappa B.S
Professor, Dept. of ECE
RVCE Bengaluru, INDIA
kariyappabs@rvce.edu.in

Abstract:- Memories are integral parts of most digital devices, and hence reducing the power consumption of memory is very important in improving system performance, efficiency and stability. Most embedded and portable devices use SRAM cells because of their ease of use as well as low standby leakage. The standard CMOS 6T SRAM cell uses two bit-lines and a word line for both read and write operations. This 6T SRAM cell consumes more power and shows poor stability at small feature sizes with a low power supply. During the read operation, the stability decreases drastically due to the voltage division between the access and driver transistors. In this paper, a new 7T SRAM cell is proposed, which uses a single bit-line for both read and write operations. Power consumption is reduced because of the single bit-line usage, and read stability is very high compared to the conventional 6T SRAM cell. The proposed cell also provides high static noise margins (SNMs). The proposed 7T SRAM cell is compared with the conventional 6T SRAM cell in terms of power consumption, delay and SNMs. The proposed 7T SRAM cell consumes 22.03% less power for the write 0 operation, 17.33% less power for the write 1 operation, 17.52% less power for the read 0 operation and 21.36% less power for the read 1 operation compared to the conventional 6T SRAM cell. The proposed cell has 2.64 times the SNM in the read state, 1.082 times the SNM in the hold state and 1.064 times the SNM in the write 0 state compared to the conventional 6T SRAM cell. Schematics are drawn using Virtuoso ADE of Cadence, and all simulations are carried out using the Cadence Spectre Analyzer with a 90nm technology library at 1.8V VDD.
Keywords:- Single bit line, 7T SRAM cell, low power, read stable, SRAM (Static Random Access Memory), SNM (Static Noise Margin), RNM (Read Noise Margin), WNM (Write Noise Margin), HSNM (Hold State Noise Margin).

I. INTRODUCTION
In recent microprocessors, the capacity of on-chip memory is rapidly increasing to improve overall performance. As larger cache memory is demanded, SRAM plays an increasingly critical role in modern microprocessor systems [1] and in portable devices such as PDAs, cellular phones and portable multimedia devices. To attain higher speed, SRAM based cache memories and systems-on-chip (SoCs) are commonly used [2]. Due to device scaling, SRAM design faces several challenges such as power consumption, stability and area. The six-transistor SRAM cell is conventionally used as the memory cell [3] [4]. Substantial problems have already been encountered with the conventional six-transistor (6T) SRAM cell configuration. This cell shows poor stability [3]: it has small hold and read static noise margins. During the read operation, the stability decreases drastically due to the voltage division between the access and driver transistors. Since conventional 6T SRAM cells rely on delicately balanced transistors, the cell shows poor stability during the read operation.
The basic cell of a conventional static RAM consists of six transistors: two pass gates and two cross-coupled inverters that store the data. The pass transistors, which are activated by the word line, select the cell and pass the data to be read from or written to the cross-coupled inverters. The circuit diagram of the 6T bit cell is shown in Fig 1.

Figure 1: Conventional six transistor (6T) SRAM cell

II. PROPOSED DESIGN (7T SRAM cell)
The main objective of the proposed 7T SRAM cell is to achieve good read stability and static noise margins (SNMs). The proposed 7T SRAM cell is shown in Fig 2. The cell consists of seven transistors and uses a single bit-line (BL), a word line (WL) and a read line (RL).
While writing into the cell, the bit-line (BL) and word line (WL) are used and the read line (RL) is inactive. While reading from the cell, the bit-line (BL) and read line (RL) are used and the word line (WL) is inactive. For any operation, the proposed 7T SRAM cell uses the single bit line and either the word line or the read line (that is, only two lines per operation), whereas the conventional 6T SRAM cell needs all three lines (two bit lines, BL and BLb, and a word line, WL) for every operation and hence consumes more power than the proposed 7T SRAM cell.
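The line usage above, together with the read mechanism described for this cell (BL precharged, then RL raised; BL discharges through N4 and N5 only when the cell stores 0), can be captured in a logic-level behavioral model. This is a hypothetical sketch, not an electrical model: `SevenTCell` is an illustrative name, and treating N4 as conducting whenever Qb = 1 is inferred from the read-operation description.

```python
from dataclasses import dataclass


@dataclass
class SevenTCell:
    """Logic-level (not electrical) model of the single bit-line 7T cell.

    q is the value at node Q; node Qb always holds the complement.
    """
    q: int = 0

    def write(self, bl: int, wl: int = 1, rl: int = 0) -> None:
        # A write uses BL and WL only; RL stays inactive (logic 0).
        assert rl == 0, "RL is inactive during a write"
        if wl:
            self.q = bl  # strong access transistor N3 lets BL overpower the cell

    def read(self, wl: int = 0, rl: int = 1) -> int:
        # A read uses BL and RL only; WL stays inactive (logic 0).
        assert wl == 0, "WL is inactive during a read"
        bl = 1                          # BL is precharged to VDD first
        qb = 1 - self.q
        n4_on = qb == 1                 # assumed: N4 conducts when Qb = 1
        n5_on = rl == 1                 # N5 conducts when RL is at VDD
        if n4_on and n5_on:
            bl = 0                      # BL discharges to ground via N4 and N5
        return bl                       # BL high -> stored 1, BL low -> stored 0


cell = SevenTCell()
cell.write(bl=1)
print(cell.read())  # 1: BL keeps its precharge
```

Each method touches exactly two lines (BL plus WL, or BL plus RL), mirroring the two-lines-per-operation argument made above.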

978-1-4673-5090-7/13/$31.00 2013 IEEE


Figure 2: Proposed 7T SRAM cell

Since the 7T SRAM cell uses only one bit line, the power required for charging and discharging a second bit line is saved. Using only one bit line reduces the power required to charge and discharge the bit lines to approximately half, because only one bit line is charged during a read operation instead of two. During a write operation, the bit line is charged only about half of the time rather than on every write, assuming an equal probability of writing 0 and 1. The proposed 7T SRAM cell uses two transistors, N4 and N5, with the read line (RL) for the read operation.

A. Write operation of 7T SRAM cell
While writing, the data to be written is loaded on the bit-line (BL) and then the word line (WL) is activated. The strong access transistor N3 allows the bit line to overpower the cell, so that the required data is written into the cell. To write 1 into the cell, the bit-line (BL) is charged to VDD. If the data to be written is 0, the bit-line is held at logic low, and then the word line (WL) is pulled to VDD. In write mode, the read line (RL) is inactive (i.e. at logic 0).

B. Read operation of 7T SRAM cell
To read data from the cell, the bit line (BL) is first precharged to VDD. After precharging the bit line, the read line (RL) is activated. The data stored in the 7T SRAM cell is determined by whether the bit line (BL) discharges or retains its charge. If BL discharges after the read line is pulled to VDD, the 7T SRAM cell is storing 0. If the bit line retains its charge, the stored data is 1. In read mode, WL is inactive (i.e. at logic 0).
Assume that the 7T SRAM cell is initially storing 0 (Q=0 and Qb=1). When the bit line (BL) is precharged to VDD and RL is pulled to VDD, transistors N4 and N5 are ON: N4 is ON because the stored data is 0 (i.e. Q=0 and Qb=1), and N5 is ON because RL is at VDD. BL now has a path to ground through N4 and N5 and hence discharges to logic 0, indicating that the stored data is 0. Now assume that the cell is initially storing 1 (Q=1 and Qb=0). When BL is precharged to VDD and RL is pulled to VDD, N4 is OFF and N5 is ON: N5 is ON because RL is at VDD, and N4 is OFF because Qb=0. BL now has no path to ground and retains its charge, indicating that the stored data is 1.

C. Hold operation of 7T SRAM cell
During the hold state, the Q and Qb nodes of the 7T SRAM cell maintain the stored data as long as power is available from the power supply. If the stored data is 1, then Q=VDD and Qb=0V. If the stored data is 0, then Q=0V and Qb=VDD.

III. SIMULATION RESULTS
In this section, the conventional 6T and proposed 7T SRAM cells are compared on the basis of read delay, write delay, RNM, WNM, HSNM and total power.

A. Total power calculation
The total power consumed by an SRAM cell is the sum of the power drawn from the main supply source, the sources used to charge and discharge the bit lines (BL and BLb), the sources used for the word line (WL) and, for the proposed 7T SRAM cell, the sources used for the read line (RL). The power consumed by a particular source is calculated by multiplying the average current drawn from the source by the voltage provided by the source.

Table 1: Percentage power improvement in the proposed 7T SRAM cell compared to conventional 6T SRAM cell.

Operation | Conventional 6T SRAM cell (µW) | Proposed 7T SRAM cell (µW) | Improvement (%)
WRITE 0 | 9.35 | 7.29 | 22.03
WRITE 1 | 9.35 | 7.73 | 17.33
READ 0 | 8.62 | 7.11 | 17.52
READ 1 | 8.52 | 6.70 | 21.36

Figure 3: Power improvement in the proposed 7T SRAM cell compared to conventional 6T SRAM cell (power per operation, WRITE 0, WRITE 1, READ 0 and READ 1, for the conventional 6T and proposed 7T SRAM cells).

Table 1 shows the total power consumption of the conventional 6T SRAM cell and the proposed 7T SRAM cell, and the improvements in the proposed design. The total power includes the power consumed by the bit lines (BL and BLb), the transistors of the cell, the word line (WL) and, in the proposed design, the read line (RL). For every operation, the proposed 7T SRAM cell consumes less power than the conventional 6T SRAM cell.
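The total-power rule of Section III.A (sum over sources of supplied voltage times average current) and the Improvement column of Table 1 can both be checked with a few lines of arithmetic. A minimal sketch; the helper names and the two example sources are hypothetical, while the µW values are taken from Table 1.

```python
def total_power_w(sources):
    """Sum of V x I_avg over all sources (supply, bit lines, WL, and RL for 7T)."""
    return sum(voltage_v * avg_current_a for voltage_v, avg_current_a in sources)


def improvement_pct(p_6t, p_7t):
    """Percentage power reduction of the 7T cell relative to the 6T cell."""
    return (p_6t - p_7t) / p_6t * 100.0


# Hypothetical sources: 1.8 V supply drawing 2 uA, bit-line driver drawing 1 uA.
print(total_power_w([(1.8, 2e-6), (1.8, 1e-6)]))

# Powers in uW from Table 1: (conventional 6T, proposed 7T).
table1 = {
    "WRITE 0": (9.35, 7.29),
    "WRITE 1": (9.35, 7.73),
    "READ 0": (8.62, 7.11),
    "READ 1": (8.52, 6.70),
}
for op, (p6, p7) in table1.items():
    # Reproduces the Improvement column: 22.03, 17.33, 17.52, 21.36
    print(f"{op}: {improvement_pct(p6, p7):.2f} %")
```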
In the write operation of the conventional 6T SRAM cell, both bit lines are loaded with complementary data and then the charged values are floated. Depending on the value loaded, one of the bit lines is charged. Once the write operation is finished, the charged value is assumed to be discharged. Assuming equal probabilities of write 0 and write 1 operations, power dissipation therefore occurs twice during each write operation (one charge and one discharge). In the write operation of the proposed 7T SRAM cell, the single bit-line is charged if the data to be written is 1 and is not charged if the data to be written is 0. After the write 1 operation, the charged value is assumed to be discharged. Assuming equal probabilities for write 0 and write 1 operations, the power consumed is almost half of that of the conventional 6T SRAM cell.
During the read operation of the 6T SRAM cell, both bit lines are charged to VDD and the charged values are floated. One of the bit lines discharges depending on the data stored in the cell, while the other bit line is discharged after the operation is complete. So in the read operation, power dissipation occurs four times. The single bit line of the proposed 7T SRAM cell is charged to VDD during the read operation. If the stored data is 0, the bit line discharges; otherwise, the bit line is assumed to be discharged after the read operation. So in the read operation, power dissipation occurs two times. Hence the bit-line power consumed during the read operation is half that of the conventional 6T SRAM cell.
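The event accounting above can be reproduced by counting bit-line charge and discharge events over random, equiprobable data. This sketch counts events only (not energy), and `bitline_events` and `average_events` are hypothetical helpers that encode exactly the assumptions stated in the text: every charged bit line is later discharged.

```python
import random


def bitline_events(cell, op, data):
    """Bit-line charge + discharge events for one operation.

    Every bit line that gets charged is assumed to be discharged afterwards,
    so each charged line contributes two events.
    """
    if op not in ("write", "read"):
        raise ValueError(f"unknown operation: {op}")
    if cell == "6T":
        # write: one of BL/BLb is charged; read: both BL and BLb are precharged.
        charged = 1 if op == "write" else 2
    elif cell == "7T":
        # write: the single BL is charged only when writing 1;
        # read: the single BL is precharged.
        charged = (1 if data == 1 else 0) if op == "write" else 1
    else:
        raise ValueError(f"unknown cell: {cell}")
    return 2 * charged


def average_events(cell, op, n=10_000, seed=0):
    """Average events per operation with equiprobable random data bits."""
    rng = random.Random(seed)
    return sum(bitline_events(cell, op, rng.randint(0, 1)) for _ in range(n)) / n


for cell in ("6T", "7T"):
    print(cell, {op: average_events(cell, op) for op in ("write", "read")})
```

The 6T cell averages 2 events per write and 4 per read, while the 7T cell averages about 1 per write and exactly 2 per read, which is the "about half" argument in numeric form.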
B. Delay calculation
SRAM delay is usually defined as the time taken to switch the data nodes of the SRAM cell from one logic level to the other. Delay is measured as the time difference between the 10% and 90% points of the voltage swing.
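Measuring the 10%-90% delay from a simulated waveform amounts to finding the two level crossings. A minimal sketch using linear interpolation between samples; the function names and the ramp waveforms are hypothetical (in practice the samples would come from a Spectre transient export), and the same code handles rising and falling transitions because the levels are measured from `v_start` toward `v_end`.

```python
def crossing_time(times, values, level):
    """First time the sampled waveform crosses `level` (linear interpolation)."""
    for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]):
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def delay_10_90(times, values, v_start, v_end):
    """Time between the 10% and 90% points of the swing v_start -> v_end."""
    swing = v_end - v_start
    t10 = crossing_time(times, values, v_start + 0.1 * swing)
    t90 = crossing_time(times, values, v_start + 0.9 * swing)
    return t90 - t10


# Hypothetical linear ramps over 100 ps at VDD = 1.8 V.
print(delay_10_90([0.0, 100.0], [0.0, 1.8], 0.0, 1.8))  # rising node
print(delay_10_90([0.0, 100.0], [1.8, 0.0], 1.8, 0.0))  # falling node
```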
Table 2: Delay comparison between proposed 7T SRAM cell and conventional 6T SRAM cell

Operation | Conventional 6T SRAM cell (ps) | Proposed 7T SRAM cell (ps)
WRITE 0 | 72.8 | 85.7
WRITE 1 | 104.1 | 139.3
READ 0 | 25.5 | 35
READ 1 | 22.5 | 0

Table 2 shows the delays of the different operations for the conventional 6T SRAM cell and the proposed 7T SRAM cell. During the read 1 operation the bit line does not discharge, so the time required to read 1 in the proposed design is almost 0 s. For the remaining operations, the proposed design needs more time than the conventional 6T SRAM cell (for example, 139.3 ps versus 104.1 ps for write 1).

C. Static Noise Margin (SNM)
The immunity of an SRAM cell to static noise is measured in terms of the SNM, which quantifies the maximum amount of voltage noise that can be tolerated at the outputs of the cross-coupled inverters without flipping the cell [8] [9]. The graphical method to determine the SNM uses the static voltage transfer characteristics of the SRAM cell inverters.

Figure 4: Example butterfly curve (VTC curves) showing the SNM.

Fig 4 superimposes the voltage transfer characteristic (VTC) of one inverter on the inverse VTC of the other inverter of the SRAM cell. The resulting graph is called a "butterfly" curve and is used to determine the SNM [10] [13]. Its value is defined as the side length of the largest square that can be fitted inside the lobes of the "butterfly" curve; when the two lobes differ, the smaller lobe sets the SNM. SNM is a key performance factor during hold and read operations, and its value changes significantly depending on the mode of operation of the SRAM cell.
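The graphical method can be turned into a numerical estimate. The sketch below assumes idealized tanh-shaped inverter VTCs (`make_vtc` is a hypothetical stand-in for VTCs obtained from Spectre DC sweeps) and, for each lobe, slides the lower-left corner of an axis-aligned square along one curve while bisecting for the largest side whose upper-right corner still reaches the other curve. The SNM is the smaller of the two lobe results. This is one straightforward way to implement the largest-square definition, not the authors' procedure.

```python
import math

VDD = 1.8  # supply voltage used for the simulations in this paper (V)


def make_vtc(gain, vdd=VDD):
    """Hypothetical smooth inverter VTC: VDD/2 * (1 - tanh(gain*(Vin - VDD/2)))."""
    def vtc(vin):
        return vdd / 2 * (1 - math.tanh(gain * (vin - vdd / 2)))
    return vtc


def lobe_square(fa, fb, vdd=VDD, steps=1000, iters=40):
    """Side of the largest axis-aligned square in one butterfly lobe.

    The square's lower-left corner lies on one curve (at (fb(t), t), or with
    axes swapped for the other lobe); its upper-right corner must reach the
    other curve, i.e. t + s = fa(fb(t) + s), solved for s by bisection.
    """
    best = 0.0
    for k in range(steps + 1):
        t = vdd * k / steps
        base = fb(t)

        def gap(s):
            # Positive while the upper-right corner is still inside the lobe.
            return fa(base + s) - (t + s)

        if gap(0.0) <= 0.0:
            continue  # the lower-left corner is not inside this lobe
        lo, hi = 0.0, vdd  # invariant: gap(lo) > 0 >= gap(hi)
        for _ in range(iters):
            mid = (lo + hi) / 2
            if gap(mid) > 0:
                lo = mid
            else:
                hi = mid
        best = max(best, lo)
    return best


def butterfly_snm(f1, f2, vdd=VDD):
    """SNM = side of the largest square in the smaller butterfly lobe.

    f1: VTC of inverter 1 (Q -> Qb); f2: VTC of inverter 2 (Qb -> Q).
    """
    return min(lobe_square(f1, f2, vdd), lobe_square(f2, f1, vdd))


sharp, soft = make_vtc(gain=8.0), make_vtc(gain=3.0)
print(f"SNM (gain 8): {butterfly_snm(sharp, sharp):.3f} V")
print(f"SNM (gain 3): {butterfly_snm(soft, soft):.3f} V")
```

Steeper VTCs open wider lobes and therefore give a larger SNM, bounded above by VDD/2 for ideal switching. To apply the method to real data, `make_vtc` would be replaced by an interpolated DC-sweep table for each inverter, with the read-state VTC including the effect of the access or read path.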
SRAM cell design has to achieve high integration density, which has led to a stringent constraint on cell area in modern embedded systems and memory modules. Choosing minimal width-to-length ratios for the SRAM cell transistors is the first step toward such a design. Variations in the threshold voltage Vth increase steadily due to random dopant density fluctuations in the channel, source and drain as dimensions scale down to the nanometer regime. Therefore, differences are common between two closely placed transistors that were supposed to be identical. These differences lie mainly in electrical parameters such as Vth and make the design of the SRAM less predictable and controllable.
Graphs (Fig 5, 6, 7, 8) show the noise margins for the conventional 6T SRAM cell and the proposed 7T SRAM cell.

Table 3: Comparison between SNMs of conventional 6T SRAM cell and proposed 7T SRAM cell

SNM | Conventional 6T SRAM cell (V) | Proposed 7T SRAM cell (V) | Improvement (X times)
RSNM | 0.22 | 0.580 | 2.640 X
HSNM | 0.55 | 0.595 | 1.082 X
WSNM for 0 | 0.66 | 0.702 | 1.064 X
WSNM for 1 | 0.66 | 0.580 | No improvement

Figure 5(a): Read state Static Noise Margins (RSNMs) of conventional 6T SRAM cell. RSNM for Conventional SRAM cell = max {0.22V, 0.044V}
Figure 5(b): Read state Static Noise Margins (RSNMs) of proposed 7T SRAM cell. RSNM for Proposed SRAM cell = max {0.58V, 0.0435V}
Figure 6(a): Write 0 state Static Noise Margins (WSNMs) of conventional 6T SRAM cell (0.66 V)
Figure 6(b): Write 0 state Static Noise Margins (WSNMs) of proposed 7T SRAM cell (0.702 V)
Figure 7(a): Write 1 state Static Noise Margins (WSNMs) of conventional 6T SRAM cell (0.66 V)
Figure 7(b): Write 1 state Static Noise Margins (WSNMs) of proposed 7T SRAM cell (0.58 V)
Figure 8(a): Hold state Static Noise Margins (HSNMs) of conventional 6T SRAM cell (0.55 V)

Figure 8(b): Hold state Static Noise Margins (HSNMs) of proposed 7T SRAM cell (0.595 V)
Figure 9: Comparison between SNMs (RSNM, HSNM, WSNM0, WSNM1) of conventional 6T SRAM cell and proposed 7T SRAM cell

Table 3 compares the SNMs of the conventional 6T SRAM cell and the proposed 7T SRAM cell, and Fig 9 provides a graphical view of the comparison. In the proposed 7T SRAM cell, the data storage nodes Q and Qb are isolated from the bit line (BL) by transistors N4 and N5. Owing to this arrangement, during the read operation the charge on the bit line cannot disturb the data on the storage nodes Q and Qb. Because of this isolated read operation, the RSNM of the proposed design increases drastically. In the hold state, the strong nMOS N3 (OFF during the hold state) between BL and node Q isolates BL from node Q very well, hence the improvement in HSNM.
When the stored data is 1, P1 is ON. In this situation, if the data to be written is 0, the strong N3 easily overrides the pull-up function of P1; therefore the WSNM for 0 is improved. When the stored data is 0, N1 is ON. In this situation, if the data to be written is 1, the strong N3 struggles slightly to override the pull-down function of N1. In the conventional 6T SRAM cell, by contrast, the write 1 operation is carried out from the BLb side, where N4 easily overrides the pull-up function of P2 while storing 0. Hence the WSNM for 1 of the proposed design is slightly lower. Since the proposed design provides very high RSNM and HSNM, a compromise in WSNM is acceptable: at the end of the write operation the proposed 7T SRAM cell successfully writes the required data, and after the write operation the high HSNM protects the stored data. The write 1 state static noise margin of the proposed 7T SRAM cell can be increased by increasing the size of transistor N3, at the cost of a larger area.

D. Cell Area Comparison
Even though the proposed 7T SRAM cell uses seven transistors, its area is almost the same as that of the conventional 6T SRAM cell. The cell areas must be compared by considering both the number of transistors used and the sizes of the transistors. All transistors in the proposed 7T SRAM cell are small except transistor N3, whereas the conventional 6T SRAM cell uses two large pull-down transistors; hence the proposed 7T SRAM cell has no area overhead compared to the conventional 6T SRAM cell.
IV. CONCLUSIONS
With the aim of a low power, high SNM SRAM cell, this 7T SRAM cell is designed. The proposed 7T SRAM cell consumes 22.03% less power for the write 0 operation, 17.33% less power for the write 1 operation, 17.52% less power for the read 0 operation and 21.36% less power for the read 1 operation compared to the conventional 6T SRAM cell. The proposed 7T SRAM cell also has very high SNMs: 2.64 times the SNM in the read state, 1.082 times the SNM in the hold state and 1.064 times the SNM in the write 0 state compared to the conventional 6T SRAM cell.
REFERENCES
[1]. Hyoungjun Na and Tetsuo Endoh, "A New Compact SRAM Cell by Vertical MOSFET for Low-power and Stable Operation," 978-1-45770226-6/11, 2011 IEEE.
[2]. Sanjeev K. Jain and Pankaj Agarwal, "A Low Leakage and SNM Free SRAM Cell Design in Deep Sub-micron CMOS Technology," 19th International Conference on VLSI Design (VLSID'06), 1063-9667, 2006 IEEE.
[3]. Paridhi Athe and S. Dasgupta, "A Comparative Study of 6T, 8T and 9T Decanano SRAM cell," 978-1-4244-4683-4/09, 2009 IEEE.
[4]. Arash Azizi Mazreah, Mohammad Reza Sahebi, Mohammad Taghi Manzuri, and S. Javad Hosseini, "A Novel Zero-Aware Four-Transistor SRAM Cell for High Density and Low Power Cache Application," 978-0-76953489-3/08 $25.00, 2008 IEEE, DOI 10.1109/ICACTE.2008.
[5]. Sheng Lin, Yong-Bin Kim and Fabrizio Lombardi, Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA, "A 32nm SRAM Design for Low Power and High Stability," unpublished.
[6]. Prashant Upadhyay, Rajesh Mehra, and Niveditta Thakur, "Low Power Design of an SRAM Cell for Portable Devices," 978-1-4244-9034, 2010 IEEE.
[7]. Ming-Hsien Tu, Jihi-Yu Lin, Ming-Chien Tsai, and Shyh-Jye Jou, "Single-Ended Sub-threshold SRAM With Asymmetrical Write/Read-Assist," 1549-8328, 2010 IEEE.
[8]. B. Alorda, G. Torrens, S. Bota and J. Segura, "Static-Noise Margin Analysis during Read Operation of 6T SRAM Cells," unpublished.
[9]. E. Seevinck, F. J. List, and J. Lohstroh, "Static-Noise Margin Analysis of MOS SRAM Cells," IEEE Journal of Solid-State Circuits, vol. SC-22, no. 5, pp. 748-754, Oct. 1987.
[10]. Benton H. Calhoun and Anantha, "Analyzing Static Noise Margin for Sub-threshold SRAM in 65nm CMOS," MIT, 50 Vassar St 38-107, Cambridge, MA, 02139 USA, unpublished.
[11]. Zheng Guo, Andrew Carlson, Liang-Teck Pang, Kenneth T. Duong, Tsu-Jae King Liu, and Borivoje Nikolic, "Large-Scale SRAM Variability Characterization in 45 nm CMOS," 0018-9200, 2009 IEEE.
[12]. Benton Highsmith Calhoun and Anantha P. Chandrakasan, "A 256-kb 65-nm Sub-threshold SRAM Design for Ultra-Low-Voltage Operation," 00189200, 2007 IEEE.
[13]. Koichi Takeda, Yasuhiko Hagihara, Yoshiharu Aimoto, Masahiro Nomura, Yoetsu Nakazawa, Toshio Ishii, and Hiroyuki Kobatake, "A Read-Static-Noise-Margin-Free SRAM Cell for Low-VDD and High-Speed Applications," 0018-9200, 2006 IEEE.
[14]. Shilpi Birla, Neeraj Kr. Shukla, Manisha Pattanaik, and R. K. Singh, "Device and Circuit Design Challenges for Low Leakage SRAM for Ultra Low Power Applications," Canadian Journal on Electrical & Electronics Engineering, Vol. 1, No. 7, December 2010.

2012 International Symposium on Electronic System Design

A New Assist Technique to Enhance the Read and Write Margins of Low Voltage SRAM cell

Santhosh Keshavarapu
VLSI Design Lab, ABV-IIITM
Gwalior (MP), INDIA
santhosh.459ece@gmail.com

Saumya Jain
VLSI Design Lab, ABV-IIITM
Gwalior (MP), INDIA
saumyajain.sati@gmail.com

Manisha Pattanaik
VLSI Design Lab, ABV-IIITM
Gwalior (MP), INDIA
manishapattanaik@iiitm.ac.in

Abstract- Improving the noise margin is one of the important challenges in every state-of-the-art SRAM design. Due to process variations such as threshold voltage variations and supply voltage variations in scaled technologies, stable operation of the bit cell is critical to obtaining high yield in low-voltage SRAM. In this paper, a new assist technique (read assist and write assist) is proposed to enhance the read and write margins of the 6T SRAM bit cell, and the same write assist circuit is applicable to enhance the write margin of the 8T SRAM bit cell. The simulations are performed in the 90nm TSMC process technology node, and the read and write margin simulation results are compared with different SRAM circuits: the 6T SRAM bit cell with cell ratios of 1, 2 and 3, the dynamic word line swing technique, and the 8T SRAM bit cell. The effects of temperature and threshold voltage on the read and write margins are observed. Using the proposed read assist technique, the read margin is improved by 2.375 times for the 6T cell, and with the write assist technique the write margin is improved by 1.89 times for the 6T and 8T cells.

Keywords- Process variations; Read assist; Write assist; Low voltage SRAM; Read and Write margins

I. INTRODUCTION
As CMOS technologies continue to scale down to deep sub-micrometer levels, devices are becoming more sensitive to noise sources. SRAM is the most common embedded memory option for CMOS ICs. Due to increasing threshold voltage fluctuations caused by global and local process variations, on-chip nanometer SRAMs suffer from instability in write and read operations at lower supply voltages. To achieve low voltage operation, researchers have proposed several new cell topologies and sizing methods as alternatives to the conventional 6T SRAM cell.
In a read-decoupled 8T SRAM cell [1][2], the read path is completely isolated from the original 6T structure, so the read operation has no effect on the storage nodes. With this structure, the read margin is improved and the cell can operate even at low supply voltages. However, the 8T cell does not significantly increase the write margin. Several read-decoupled 9T [3][4][5] and 10T [6] SRAM cells have achieved low voltage operation with improved read stability, but they incur more area overhead than the conventional 6T cell and have a limited write margin. Using larger transistors in the 6T cell improves cell stability, but this increases the chip area and cannot meet the requirements for further reducing VDDmin. The dynamic voltage swing technique [11] improves the read margin by dynamically changing the word line voltage during the read and write operations. However, this technique gives no improvement in write margin, which stays the same as in the 6T cell, and it also consumes more area. Thus these SRAM cells still do not satisfy the need to improve both read and write stability with limited area overhead for low power applications. Since SRAM stability degrades at lower supply voltages, a proper analysis of SRAM read/write margins is essential for all low voltage SRAM designs, and a good metric for read/write margin is critically important for all types of SRAM designs.
The stability of an SRAM cell is conventionally described in terms of the static noise margin (SNM); in this paper, stability is measured in terms of the read SNM and write SNM of the SRAM cells. In this work, a new assist technique is proposed which enhances the read and write margins of the 6T cell, and the same technique is applicable to enhance the write margin of the 8T cell. Conventionally, increasing one margin (either read or write) degrades the other correspondingly; with the proposed technique, the read and write margins are enhanced without degrading the other margin.
This paper is organized as follows: Section II covers the proposed technique, Section III describes the simulation results and Section IV concludes the paper.

978-0-7695-4902-6/12 $26.00 2012 IEEE
DOI 10.1109/ISED.2012.55

II.

PROPOSED METHOD

The cell ratio and the pull-up ratio are the two important
factors that decide the read and write margins of SRAM
bit cells. The cell ratio is the ratio of the width of the
pull-down NMOS transistor to the width of the access NMOS
transistor. The pull-up ratio is the ratio of the width of the
pull-up PMOS transistor to the width of the access NMOS
transistor. A higher cell ratio increases the read margin of
the cell, and a lower pull-up ratio increases its write margin.
These two requirements conflict: adjusting the sizing to
improve one margin degrades the other. Hence we propose a
mechanism to enhance the read or write margin of the SRAM
bit cell without degrading the other margin.
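The two ratios can be written as a small helper. This is a minimal sketch, assuming the usual W/L form of the ratios, which reduces to the width ratio used above when all channel lengths are equal; the `Transistor` class and the 240/120 nm sizes are hypothetical, not values from this paper.

```python
from dataclasses import dataclass


@dataclass
class Transistor:
    """Channel width and length in nanometres (illustrative values only)."""
    width_nm: float
    length_nm: float

    @property
    def aspect(self) -> float:
        return self.width_nm / self.length_nm


def cell_ratio(pull_down: Transistor, access: Transistor) -> float:
    """CR = (W/L) of the pull-down NMOS divided by (W/L) of the access NMOS."""
    return pull_down.aspect / access.aspect


def pullup_ratio(pull_up: Transistor, access: Transistor) -> float:
    """PR = (W/L) of the pull-up PMOS divided by (W/L) of the access NMOS."""
    return pull_up.aspect / access.aspect


# Hypothetical sizing: all channels 100 nm long, so the ratios reduce to width ratios.
pd = Transistor(width_nm=240, length_nm=100)
ax = Transistor(width_nm=120, length_nm=100)
pu = Transistor(width_nm=120, length_nm=100)
print(cell_ratio(pd, ax), pullup_ratio(pu, ax))
```

Raising CR (wider pull-down) favors reads, while lowering PR (narrower pull-up) favors writes; this is the sizing conflict the proposed mechanism avoids.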
During a read operation, if the cell supply voltage is
higher than the word-line voltage, the read margin of the
cell increases, and vice versa. Similarly, during a write
operation, if the word-line voltage is higher than the cell
supply voltage, the write margin of the cell increases,
and vice versa. These changes in the supply and word-line
voltages effectively modify the cell and pull-up ratios, which
increases the read and write margins of the bit cell.
Another way to increase the read and write margins is to use
different supply voltages, but providing multiple supply
voltages requires on-chip DC-DC converters and hence
increases area.
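To see why a lowered word line acts like a larger cell ratio, and a lowered cell supply like a smaller pull-up ratio, one can use the long-channel square-law model, I_D proportional to (W/L)(V_GS - V_t)^2. The sketch below is a rough, first-order estimate under that assumption only (saturation, no velocity saturation or sub-threshold conduction, mobility folded into the ratio); the 0.3 V thresholds are hypothetical.

```python
def effective_cell_ratio(cr: float, vdd: float, v_wl: float, vtn: float) -> float:
    """Square-law estimate for a read: the pull-down gate sits at VDD while the
    access gate sits at the word-line voltage V_WL."""
    if v_wl <= vtn:
        raise ValueError("access transistor must be above threshold")
    return cr * ((vdd - vtn) / (v_wl - vtn)) ** 2


def effective_pullup_ratio(pr: float, v_cell: float, v_wl: float,
                           vtn: float, vtp_abs: float) -> float:
    """Same estimate for a write: the pull-up overdrive is set by the cell
    supply V_cell, the access overdrive by V_WL."""
    if v_wl <= vtn or v_cell <= vtp_abs:
        raise ValueError("transistors must be above threshold")
    return pr * ((v_cell - vtp_abs) / (v_wl - vtn)) ** 2


# Hypothetical 0.3 V thresholds; 0.6 V is the supply used in the simulations.
print(effective_cell_ratio(cr=1.0, vdd=0.6, v_wl=0.5, vtn=0.3))        # word line lowered
print(effective_pullup_ratio(pr=1.0, v_cell=0.5, v_wl=0.6,
                             vtn=0.3, vtp_abs=0.3))                  # cell supply lowered
```

With V_WL = VDD the first function returns the sized ratio unchanged; any reduction of V_WL below VDD raises the effective cell ratio, which is the read-margin effect described above.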
An NMOS transistor passes a strong zero and a PMOS
transistor passes a strong one. When the input of an NMOS
pass transistor is low (GND), it pulls the output node all the
way down to GND, but when the input is high (VDD), it fails
to raise the output node up to VDD: the output only reaches
about VDD - Vtn. Similarly, a PMOS pass transistor pulls
the output node all the way up to VDD when its input is high
(VDD), i.e. it passes a strong one, but when its input is low
(GND) it only lowers the output node to about |Vtp| instead
of exactly GND. We use these properties of NMOS and PMOS
transistors in our mechanism to implement a circuit that
improves the read (or write) margin without degrading the
other margin.
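The degraded levels described above can be estimated with the standard first-order threshold model including the body effect, V_t = V_t0 + gamma*(sqrt(2phi_F + V_SB) - sqrt(2phi_F)). This is a sketch under that assumption; gamma, 2phi_F and the 0.3 V thresholds are illustrative placeholders, not 90 nm process data.

```python
import math


def nmos_pass_high(vdd: float, vtn0: float, gamma: float = 0.0,
                   two_phi_f: float = 0.7, iters: int = 50) -> float:
    """Final output level when an NMOS pass transistor (gate at VDD) passes VDD.
    The output stops rising once V_GS = VDD - V_out equals the threshold, which
    grows with V_SB = V_out through the body effect (solved by fixed-point iteration)."""
    v_out = vdd - vtn0
    for _ in range(iters):
        vt = vtn0 + gamma * (math.sqrt(two_phi_f + v_out) - math.sqrt(two_phi_f))
        v_out = vdd - vt
    return v_out


def pmos_pass_low(vdd: float, vtp0_abs: float, gamma: float = 0.0,
                  two_phi_f: float = 0.7, iters: int = 50) -> float:
    """Final output level when a PMOS pass transistor (gate at GND, body at VDD)
    passes GND. The output stops falling at |V_tp|, evaluated at V_SB = VDD - V_out."""
    v_out = vtp0_abs
    for _ in range(iters):
        v_out = vtp0_abs + gamma * (math.sqrt(two_phi_f + vdd - v_out)
                                    - math.sqrt(two_phi_f))
    return v_out


print(nmos_pass_high(0.6, 0.3), nmos_pass_high(0.6, 0.3, gamma=0.4))
print(pmos_pass_low(0.6, 0.3), pmos_pass_low(0.6, 0.3, gamma=0.4))
```

With gamma = 0 the NMOS output is exactly VDD - V_tn and the PMOS output exactly |V_tp|; a nonzero gamma makes both levels worse, which only strengthens the voltage reduction the mechanism relies on.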

A. Circuit Operation

Because the 6T SRAM cell suffers from reduced read and write
margins at low supply voltages, the proposed mechanism is
applied to the 6T cell so that the read margin is improved
without any degradation of the write margin. Figure 3 shows
the 6T cell with the read assist circuit.

B. Read Assist technique

Figure 2 shows the proposed read assist technique. In the
read operation, read enable is driven high, so the circuit
produces a reduced voltage. Applying this reduced voltage to
the word line of the SRAM cell increases the read margin, as
described earlier. In the write operation, read enable is driven
low, and the circuit provides the full VDD without any
degradation. Since the word-line voltage then remains at
VDD, equal to the cell supply voltage, the write margin is
not degraded.

Figure 2. Read assist circuit

Figure 1. Operation of the proposed technique

When the enable signal is high, NMOS transistor M1 is on,
with VDD connected to one of its terminals. Because an
NMOS transistor does not pass a strong one, M1 provides a
voltage lower than VDD. Transistor M2, in the same path, is a
diode-connected load (its gate is connected to its drain), so it
always operates in saturation, i.e. it is always on. Its
on-resistance Ron causes a further voltage drop, so M2 also
contributes to the voltage reduction. This voltage is then
applied to one of the terminals of NMOS transistor M3, which
produces the reduced output voltage. When the enable is low,
transistors M1 and M3 are off, so there is no path from VDD
through them. On the other branch, PMOS transistor M4 is on,
and since a PMOS passes a strong one, the output is the full
VDD without degradation. The aspect ratios of transistors M1,
M3 and M4 are taken as nominal values. Transistors M1 and
M3 are responsible for the voltage reduction; with only one of
them (either M1 or M3), the output would not be reduced to
the voltage obtained with both transistors.
Figure 3. Read assist circuit with 6T cell

Applying this circuit to a single bit cell costs an extra four
transistors, but at the array level the overhead is much
smaller, since a whole row or column is usually read or
written at the same time.
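A rough way to see this amortization is to count transistors. This sketch assumes transistor count is a proxy for area (real layout area also depends on device sizing and routing) and uses hypothetical row widths; only the four-transistor assist circuit and the six-transistor cell come from the text.

```python
def assist_overhead(cells_sharing: int, assist_transistors: int = 4,
                    cell_transistors: int = 6) -> float:
    """Extra transistors per cell, as a fraction of the cell's own transistor
    count, when one assist circuit is shared by `cells_sharing` cells."""
    if cells_sharing < 1:
        raise ValueError("at least one cell must share the assist circuit")
    return assist_transistors / (cells_sharing * cell_transistors)


for n in (1, 64, 128, 256):          # hypothetical numbers of cells per shared line
    print(n, f"{assist_overhead(n):.2%}")
```

For a single cell the overhead is four extra transistors on six; shared across a row of 128 cells it falls well below one percent in this count.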
C. Write Assist technique
In the write operation, write enable is driven high and the
circuit provides the reduced voltage, which is applied to the
SRAM cell supply. Since the word-line voltage is then higher
than the supply voltage, the write margin of the SRAM cell
increases. In the read operation, write enable is driven low,
so the circuit provides the full VDD without any degradation.
As the supply voltage is not degraded, the read margin of the
cell is not affected.
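The two assists can be summarized by which rail each one lowers in each operation. This is a behavioral sketch, not a circuit model: the 0.45 V assist output and the word line being at 0 V in hold are assumptions, and 0.6 V is the simulation supply.

```python
VDD = 0.6          # supply used in the simulations
V_REDUCED = 0.45   # hypothetical output of the assist circuit when enabled


def rails(operation: str, assist: str, vdd: float = VDD, v_reduced: float = V_REDUCED):
    """Return (word-line voltage, cell supply voltage) seen by an accessed cell.
    operation: 'hold', 'read' or 'write'; assist: 'read' or 'write'."""
    if operation not in ("hold", "read", "write") or assist not in ("read", "write"):
        raise ValueError("unknown operation or assist type")
    wl_high = vdd     # enable low: the PMOS branch delivers the full VDD
    supply = vdd
    if assist == "read" and operation == "read":
        wl_high = v_reduced   # read enable high: V_WL < V_cell
    if assist == "write" and operation == "write":
        supply = v_reduced    # write enable high: V_WL > V_cell
    wl = 0.0 if operation == "hold" else wl_high   # assumed: word line off in hold
    return wl, supply


for assist in ("read", "write"):
    for op in ("hold", "read", "write"):
        print(assist, "assist,", op, "->", rails(op, assist))
```

In every case the cell supply stays at VDD outside the assisted operation, which is why neither assist touches the hold state.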

Figure 6. Read Noise Margin curve for proposed read assist technique

Temperature and threshold voltages (Vtn, Vtp) are the
main parameters that affect the read noise margin of the
SRAM cell. The effect of temperature on the read margin is
therefore analyzed and shown in Figure 7.

Figure 4. Write assist circuit

The write margin is the same for the 6T and 8T SRAM cells
and is low at low supply voltages, so this mechanism can be
used for both 6T and 8T cells. Figure 5 shows the proposed
write assist with the 8T cell.

Figure 7. Effect of temperature variation on read margin curves

As the temperature varies from -40°C to 70°C, the read
margin of the proposed read assist technique decreases, as
shown in Figure 8.

Figure 5. Write assist circuit with 8T cell

These read and write assist techniques do not change the
supply voltage during the hold state, so the hold margin of
the cell is not affected. Consequently, there is no impact on
the data retention voltage (DRV), which is the minimum
voltage required to retain the data in the cell.
III. SIMULATION RESULTS

Figure 8. Temperature effect result on read margin

Simulations of the proposed read and write assist
techniques are performed in a 90nm technology node with a
0.6V supply voltage. Stability parameters such as the read and
write margins are measured and compared with those of
different SRAM techniques. The effects of temperature and
threshold voltages (process corners) on the read and write
margin values are also observed.

[A]: -40°C [B]: -20°C [C]: 0°C [D]: 20°C [E]: 40°C [F]: 60°C [G]: 70°C.

The read margin of the proposed read assist technique is
measured for the different process corners: TT (typical NMOS,
typical PMOS), FF (fast NMOS, fast PMOS), SS (slow NMOS,
slow PMOS), FS (fast NMOS, slow PMOS) and SF (slow
NMOS, fast PMOS), as shown in Figure 9. The read margin
is highest in the SS corner and lowest in the FS corner.

A. Read Margin
The read margin is measured with the conventional static
noise margin (butterfly curve) approach; the read noise
margin curve for the proposed read assist technique is
shown in Figure 6. Q and QB are the storage nodes. The
butterfly curve is obtained by sweeping the voltage of one
storage node and measuring the voltage of the other.
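The largest-square SNM can be extracted numerically from such a butterfly curve by sliding a 45-degree diagonal across each lobe, in the spirit of Seevinck's method [13]. This is a sketch under stated assumptions: both inverters share one smooth, hypothetical tanh-shaped VTC, and a raised low level (`v_low`) is only a crude stand-in for the disturbance the access transistor causes during a read; real curves would come from circuit simulation.

```python
import math


def tanh_vtc(vdd: float, gain: float = 8.0, v_low: float = 0.0):
    """Hypothetical smooth inverter VTC falling from vdd to v_low around vdd/2."""
    def vtc(v_in: float) -> float:
        s = 0.5 * (1.0 - math.tanh(gain * (v_in - vdd / 2) / vdd))
        return v_low + (vdd - v_low) * s
    return vtc


def _bisect(f, lo: float, hi: float, iters: int = 80) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def butterfly_snm(vtc, vdd: float, steps: int = 400) -> float:
    """Side of the largest axis-aligned square in each lobe of the butterfly
    formed by QB = vtc(Q) and Q = vtc(QB); returns the smaller lobe."""
    upper_left = lower_right = 0.0
    for k in range(1, steps):
        c = -vdd + 2.0 * vdd * k / steps          # diagonal y = x + c
        # Corner on curve 1, QB = vtc(Q): vtc(x) = x + c
        x1 = _bisect(lambda x: vtc(x) - x - c, -2 * vdd, 2 * vdd)
        # Corner on curve 2, Q = vtc(QB): y = vtc(y) + c, then x = y - c
        y2 = _bisect(lambda y: y - vtc(y) - c, -2 * vdd, 2 * vdd)
        x2 = y2 - c
        # Opposite corners on a 45-degree line: the square's side is their x-distance.
        if c > 0:
            upper_left = max(upper_left, x1 - x2)
        elif c < 0:
            lower_right = max(lower_right, x2 - x1)
    return min(upper_left, lower_right)


VDD = 0.6
hold = butterfly_snm(tanh_vtc(VDD), VDD)
read = butterfly_snm(tanh_vtc(VDD, v_low=0.1), VDD)
print(f"hold-like SNM ~ {hold * 1e3:.0f} mV, read-like SNM ~ {read * 1e3:.0f} mV")
```

Taking the smaller lobe follows the usual convention that the cell is only as stable as its weaker state.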

Figure 9. Read Margin with different process corners


The read margin of the proposed assist circuit is compared
with that of different SRAM techniques in Figure 10.

The effect of temperature on the write margin curves is
analyzed and shown in Figure 12.

Figure 10. Comparison of read margin with different techniques


[A]: 6T (β=1) [B]: 6T (β=2) [C]: 6T (β=3) [D]: Proposed read assist with
6T (β=1) [E]: Dynamic word line voltage with 6T [F]: 8T cell
[G]: Proposed write assist with 8T.

As the temperature varies from -40°C to 70°C, the write
margin values increase, as shown in Figure 13.

Figure 12. Write margin curves with temperature variation

The read margin values increase as the cell ratio (β)
increases, since a larger cell ratio increases the drive strength
of the pull-down NMOS transistor, which improves the read
stability of the SRAM cell. The dynamic word line voltage
technique increases the read margin considerably compared
to the 6T cell with different cell ratios, but it requires more
transistors, and hence more area, than the proposed read
assist technique. The proposed read assist technique enhances
the read margin of the 6T cell as much as the dynamic word
line voltage technique does, with less area. The 8T cell read
margin is superior to all of the techniques described above:
since the read path of the 8T cell is completely isolated from
the original 6T structure, a read does not disturb the storage
nodes. The proposed write assist technique does not disturb
the supply voltage during the read operation, so it has no
effect on the read margin; it provides the same read margin
as the original 8T cell.

Figure 13. Temperature effect on write margin values


[A]: -40°C [B]: -20°C [C]: 0°C [D]: 20°C [E]: 40°C [F]: 60°C
[G]: 70°C.

Threshold voltage (Vtn, Vtp) variations have a
significant effect on the write margin of the SRAM cell.
A Monte Carlo simulation with 100 runs in the TT corner is
performed for the proposed write assist technique, and its
result is shown in Figure 14. This graph shows the spread of
write margin values for different threshold voltage
variations.
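The structure of such a 100-run Monte Carlo analysis can be sketched as follows. The Gaussian threshold spread (sigma = 20 mV around 0.3 V) and the linear `placeholder_write_margin` function are hypothetical stand-ins: in the actual flow each run would be a circuit simulation, and the sign of each sensitivity in the placeholder is an assumption.

```python
import random
import statistics


def monte_carlo(metric, n_runs: int = 100, vtn0: float = 0.30, vtp0: float = 0.30,
                sigma_vt: float = 0.02, seed: int = 2012):
    """Draw independent Gaussian thresholds for NMOS and PMOS and evaluate
    metric(vtn, vtp) once per run. Returns (mean, standard deviation, samples)."""
    rng = random.Random(seed)
    samples = [metric(rng.gauss(vtn0, sigma_vt), rng.gauss(vtp0, sigma_vt))
               for _ in range(n_runs)]
    return statistics.mean(samples), statistics.stdev(samples), samples


def placeholder_write_margin(vtn: float, vtp: float) -> float:
    """Made-up linear model in volts: a higher Vtn (weaker access device) lowers,
    and a higher |Vtp| (weaker pull-up) raises, the write margin."""
    return 0.25 - 0.5 * (vtn - 0.30) + 0.3 * (vtp - 0.30)


mean, std, runs = monte_carlo(placeholder_write_margin)
print(f"{len(runs)} runs: mean {mean * 1e3:.1f} mV, sigma {std * 1e3:.1f} mV")
```

A fixed seed makes the run reproducible; the histogram of `runs` corresponds to the kind of spread plotted in Figure 14.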

B. Write Margin
The write margin is measured by sweeping the right word
line voltage (WLR) [7][8] of the SRAM cell, with the storage
node voltage on the Y axis and the swept right word line
voltage on the X axis. The write margin curve for the
proposed write assist technique is shown in Figure 11.
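One common way to turn such a word-line sweep into a number is to take VDD minus the word-line voltage at which the storage nodes flip. This sketch assumes that definition and a write of 0 into a node Q that initially holds 1; the sweep data below is synthetic, not taken from Figure 11.

```python
def wl_sweep_write_margin(v_wl, v_q, v_qb, vdd):
    """VDD minus the word-line voltage at which Q falls below QB.
    Returns None if the cell never flips within the sweep (a write failure)."""
    for i in range(1, len(v_wl)):
        d_prev = v_q[i - 1] - v_qb[i - 1]
        d_curr = v_q[i] - v_qb[i]
        if d_prev > 0 >= d_curr:
            # Linear interpolation of the crossing point between samples.
            t = d_prev / (d_prev - d_curr)
            v_flip = v_wl[i - 1] + t * (v_wl[i] - v_wl[i - 1])
            return vdd - v_flip
    return None


# Synthetic sweep: the cell flips between WL = 0.34 V and 0.35 V.
VDD = 0.6
wl = [i * 0.01 for i in range(61)]
q = [VDD if v < 0.35 else 0.0 for v in wl]
qb = [VDD - v for v in q]
print(wl_sweep_write_margin(wl, q, qb, VDD))
```

A larger result means the cell flips at a lower word-line voltage, i.e. it is easier to write.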

Figure 14. Monte Carlo simulation results for write margin

The write margin values of the proposed write assist
technique for the different process corners (TT, FF, SS, FS
and SF) are analyzed and shown in Figure 15. The write
margin is lowest in the SF (slow NMOS, fast PMOS) corner
and highest in the FS (fast NMOS, slow PMOS) corner.

Figure 11. Write margin curve for the proposed write assist technique

Temperature and threshold voltages (Vtn, Vtp) also affect
the write margin of the SRAM cell.

Figure 15. Write margin curves with different process corners

The write margin values of the proposed write assist
technique are compared with those of different SRAM
techniques in Figure 16.

Figure 16. Comparison of write margin with different techniques
[A]: 6T (β=1) [B]: 6T (β=2) [C]: 6T (β=3) [D]: Proposed read assist with
6T (β=1) [E]: Dynamic word line voltage with 6T [F]: 8T cell [G]: Proposed
write assist with 8T.

The write margin values decrease as the cell ratio increases
(β = 1, 2, 3), since β affects the drivability of the access
transistor, which in turn affects the write ability of the cell.
The write margin of the proposed read assist technique is the
same as that of the 6T cell, since the word-line voltage is not
degraded during a write and stays equal to the cell supply
voltage. The write margins of the dynamic word line voltage
technique and of the 8T cell are also the same as that of the
6T cell. The write margin of the proposed write assist
technique is the highest among all the techniques described
above.

IV. CONCLUSION

In this paper we proposed a new assist technique (read
and write assist) to enhance the read and write margins of
low-voltage SRAM cells. Applying the read assist technique
to the 6T SRAM cell improves the read margin by 2.375
times, and applying the write assist technique to the 8T cell
improves the write margin by 1.89 times over the original 8T
cell. The effects of temperature and threshold voltage
variations on the read and write margins of the proposed
techniques are observed. Applying this technique to a single
bit cell may incur an area overhead, but at the array level the
complete row requires a single assist circuit, so the area
overhead is very small.

REFERENCES

[1] H. Fujiwara, K. Nii, H. Noguchi, J. Miyakoshi, Y. Murachi, Y. Morita, H. Kawaguchi, M. Yoshimoto, "Novel Video Memory Reduces 45% of Bitline Power Using Majority Logic and Data-Bit Reordering," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 16, no. 6, pp. 620-627, June 2008.
[2] L. Chang, D.M. Fried, J. Hergenrother, J.W. Sleight, R.H. Dennard, R.K. Montoye, L. Sekaric, S.J. McNab, A.W. Topol, C.D. Adams, K.W. Guarini, W. Haensch, "Stable SRAM cell design for the 32 nm node and beyond," VLSI Technology, 2005. Digest of Technical Papers. 2005 Symposium on, pp. 128-129, 14-16 June 2005.
[3] Shilpi Birla, Rakesh Kumar Singh, Manisha Pattanaik, "Stability and Leakage Analysis of a Novel PP Based 9T SRAM Cell Using N Curve at Deep Submicron Technology for Multimedia Applications," Circuits and Systems, 2011, 2, 274-280.
[4] Liu Zhiyu, V. Kursun, "Characterization of a Novel Nine-Transistor SRAM Cell," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 16, no. 4, pp. 488-492, April 2008.
[5] R.K. Singh, Shilpi Birla, Manisha Pattanaik, "Characterization of PNN Stack SRAM Cell at Deep Sub-Micron Technology with High Stability and Low Leakage for Multimedia Applications," International Journal of Computer Applications (0975-8887), vol. 33, no. 1, November 2011.
[6] H. Noguchi, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, M. Yoshimoto, "A 10T Non-Precharge Two-Port SRAM for 74% Power Reduction in Video Processing," VLSI, 2007. ISVLSI '07. IEEE Computer Society Annual Symposium on, pp. 107-112, 9-11 March 2007.
[7] K. Takeda, H. Ikeda, Y. Hagihara, M. Nomura, H. Kobatake, "Redefinition of Write Margin for Next-Generation SRAM and Write-Margin Monitoring Circuit," Solid-State Circuits Conference, 2006. ISSCC 2006. Digest of Technical Papers. IEEE International, pp. 2602-2611, 6-9 Feb. 2006.
[8] Wang Jiajing, S. Nalam, B.H. Calhoun, "Analyzing static and dynamic write margin for nanometer SRAMs," Low Power Electronics and Design (ISLPED), 2008 ACM/IEEE International Symposium on, pp. 129-134, 11-13 Aug. 2008.
[9] A. Kawasumi, T. Yabe, Y. Takeyama, O. Hirabayashi, K. Kushida, A. Tohata, T. Sasaki, A. Katayama, G. Fukano, Y. Fujimura, N. Otsuka, "A Single-Power-Supply 0.7V 1GHz 45nm SRAM with An Asymmetrical Unit-β-ratio Memory Cell," Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International, pp. 382-622, 3-7 Feb. 2008.
[10] B.H. Calhoun, A.P. Chandrakasan, "Static noise margin variation for sub-threshold SRAM in 65-nm CMOS," Solid-State Circuits, IEEE Journal of, vol. 41, no. 7, pp. 1673-1679, July 2006.
[11] S.A. Tawfik, V. Kursun, "Dynamic wordline voltage swing for low leakage and stable static memory banks," Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on, pp. 1894-1897, 18-21 May 2008.
[12] Joon Chang Ik, Kang Kunhyuk, S. Mukhopadhyay, C.H. Kim, K. Roy, "Fast and accurate estimation of nano-scaled SRAM read failure probability using critical point sampling," Custom Integrated Circuits Conference, 2005. Proceedings of the IEEE 2005, pp. 439-442, 18-21 Sept. 2005.
[13] E. Seevinck, F.J. List, J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," Solid-State Circuits, IEEE Journal of, vol. 22, no. 5, pp. 748-754, Oct. 1987.

A 32kb 90nm 9T-cell Sub-threshold SRAM with
Improved Read and Write SNM
Milad Zamani, Sina Hassanzadeh, Khosrow Hajsadeghi and Roghayeh Saeidi
Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran
Miladzamani@ee.sharif.edu, S_hassanzadeh@ee.sharif.edu, Ksadeghi@sharif.edu, rosaeidi@ee.sharif.edu

Abstract-The fast growth of battery-operated devices has made
low-power SRAM designs a necessity in recent years. Moreover,
embedded SRAM units have become an important block in
modern SoCs. SRAM performance is limited by cell stability
during the different operations. By adding extra transistors to
the conventional 6T cell, the hold, read and write static noise
margins (SNM) can be improved in sub-threshold SRAM. In this
paper we propose a new 9T-cell SRAM that shows 80% and
50% improvement in read and write SNM, respectively, in
comparison to the conventional 6T-cell SRAM. Using stack
transistors in the leakage current path, the new structure shows
lower bitline leakage, assisting the sense amplifier to easily read
the bitline current. Post-layout simulation of the 0.3V
sub-threshold SRAM using the 90nm TSMC CMOS model
confirms the performance of the proposed 32k SRAM.

Keywords- Sub-threshold; SRAM; Stability; Static Noise Margin
I. INTRODUCTION

In recent years, sub-threshold design for low-power applications
has been at the center of attention as a low-energy solution.
Various sub-threshold designs have shown that a low-power SRAM
is possible. However, memory circuits that operate successfully
at such low voltages are more challenging, since SRAM yield
decreases considerably at these voltages [1]. Many other effects,
such as process variation, bitline leakage and transistor
mismatch, challenge the proper operation of SRAMs, which
therefore need precise design. The main SRAM unit, the cell,
plays a key role in practical sub-threshold design [2]. A robust
cell that resists process variation and bitline leakage augments
the total SRAM performance.

In the sub-threshold region, the conventional 6T-cell SRAM
has poor read and write ability. With threshold voltage
fluctuations, the SNM experiences a significant reduction. The
noise margin becomes worse during read and write operations
than during the hold operation, in which the internal feedback
operates independently of the other transistors [3]. Fig. 1 shows
the conventional 6T-cell SRAM and the butterfly curve for hold
and read SNM; the 6T cell shows poor stability in read SNM.
The large variation of the internal nodes during the read
operation and the internal feedback effect during the write
operation lead to the weak stability of the 6T cell [4]. To avoid
this disturbance in the read operation and to suppress the
feedback during the write operation, many designers modify the
cell and add extra transistors. 8T, 9T and 10T cells are different
configurations designed to improve stability [5-7]. These
structures use a separate read mechanism, feedback cutting,
asymmetric schemes and one-sided access to augment the
performance and stability of the cell, and hence the
functionality of the SRAM and its peripheral circuits.

Figure 1. a) Conventional 6T-cell SRAM and b) butterfly curve for hold and read SNM.

However, a practical, high-performance sub-threshold SRAM
requires a fully differential scheme that provides differential
outputs to the sense amplifier and improves stability during the
hold and read phases. In this paper, a novel scheme is proposed
that uses a dynamic mechanism to cut the feedback, which
improves the write SNM and lowers the write time. Sacrificing
read time and eliminating one side of the data from the read
path in the new scheme augments the read SNM. Although the
area consumption is 1.6 times that of the 6T cell, the proposed
structure shows robust stability.

978-1-4673-6040-1/13/$31.00 2013 IEEE

Figure 2. Butterfly curve showing a) hold SNM reduction and b) read SNM reduction by lowering VDD.

The stack transistors employed in the scheme decrease the
leakage current and improve the total performance of the SRAM
in the read and hold phases. A 32Kb 9T-cell SRAM is simulated
by post-layout simulation using 90nm TSMC technology to
confirm the memory performance.
The paper is organized as follows: Section II explains the
challenges faced by the conventional 6T and 9T cell SRAMs
operating in the sub-threshold region. Section III presents the
new 9T cell structure and its functionality. Section IV shows the
simulations confirming the proper operation of the proposed
structure, and Section V summarizes the paper.

Figure 3. The two structures of a) conventional 9T cell and b) proposed 9T cell with all controlling signals. c) Layout of two columns of the proposed 9T cell SRAM.

II. SUBTHRESHOLD SRAM DESIGN CHALLENGES

A. 6T cell failure

The 6T cell shown in Fig. 1(a) fails to operate in the
sub-threshold region because of process variation and the
reduced voltage level. Noise sources of value VN are introduced
at each of the internal nodes of the bitcell; as VN increases, the
stability of the cell changes. The butterfly curve, or voltage
transfer characteristic (VTC), is the most common way to
calculate the SNM graphically. Fig. 1(b) plots the VTC of
inverter 2 from Fig. 1(a) and the inverse VTC of inverter 1. The
SNM is defined as the length of the side of the largest square
that can be embedded inside the lobes of the butterfly curve.
This SNM is extracted when the access transistors are off and
the cell contains only the two inverters that regenerate the data;
this VTC corresponds to the hold phase. The SNM is also used
to measure cell stability during the read operation. During a
read, the two access transistors interact with the inverters and
deteriorate the SNM. Fig. 1(b) shows that the read VTC contains
a smaller square than the hold VTC.

The impact of decreasing VDD is shown in Fig. 2. Even for
ideal VTCs, the SNM is limited to VDD/2 because of the two
sides of the butterfly curve. An upper limit on the change of
SNM with VDD is thus 1/2. As VDD decreases, the hold and
read SNM are suppressed, so in the sub-threshold region the
6T-cell SRAM faces a difficult situation.

B. Conventional 9T cell SRAM

Fig. 3(a) shows the conventional 9T cell [8]. The conventional
9T cell uses two different paths for the read and write
operations: WBL and WBLB perform the write operation, and
the single RBL is used in the read phase. Thus, the area and the
power consumption increase. Using stack transistors (M9 and
M7), the leakage current is suppressed and the read operation is
improved. The basis of this structure is the conventional 8T cell
[9], which uses an inverter as a buffer to separate the storage
nodes from the read path so that they remain unchanged. M9 is
added to decrease the leakage current and to eliminate the need
for a charge pump that changes the ground voltage to suppress
the leakage current. These extra transistors increase the read
time, and because the cell is asymmetric, its area consumption
is high.

2013 8th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS)

III. PROPOSED 9T CELL

Fig. 3(a) and (b) show the conventional 9T cell and the
proposed 9T cell. The proposed 9T cell uses two inverters for
holding data. In the hold state, the S1, S2 and S3 signals are
active (S3=0, S1=S2=1), so the storage nodes (Q and QB) retain
the data through the feedback. Thus, the hold SNM of the
proposed 9T cell is similar to that of the 6T cell (with some
reduction due to the stacked transistors). WLR and WLL attach
the bitlines to the storage nodes Q and QB. S1, S2 and S3
control the read and write operations; Figs. 4 and 5 show the
phases of the controlling signals for the read and write
operations. Fig. 3(c) shows the layout of two columns of the
32Kb 9T SRAM; the total area is 0.2 mm².

A. Read operation

Fig. 4 shows the simulation results of the proposed 9T-cell
read operation. During the read operation, WLL, S2 and S3 are
inactive (S3=1 and S2=WLL=0). Thus, the storage node Q is cut
from the read path while being indirectly affected by M2 in the
read path. Activating WLR and S1 starts the read phase through
transistors M2, M6 and M8. When reading 0, BLB is not
affected; when reading 1, BLB is discharged through the read
path. This event increases the voltage of storage node QB. To
improve the read SNM, signals S2 and S3 should be changed a
short time after WLR, so that storage node QB is regenerated
through storage node Q. This problem occurs during the 0 read
operation, and because an NMOS transfers a 0 well, the delay
between the controlling signals is small. As the read operation
deteriorates one of the storage nodes, the other node can
regenerate it. Due to stacking transistors M6 and M2, the bitline
leakage decreases significantly.

Figure 4. Proposed 9T-cell read operation simulation results.

B. Write operation

Fig. 5 shows the simulation results of the proposed 9T-cell
write (0) operation. In the write operation, the two signals WLL
and WLR are connected to VDD to provide a path from the
internal nodes to the bitlines. The controlling signals S1 and S3
are the same as in the hold state, and S2 is connected to ground,
which cuts the path from Q to ground and opens the feedback
loop. Thus, M8 charges the internal node QB through BLB with
greater strength, which improves the write SNM and decreases
the write time. The main reason for the fast write operation is
that the data reaches the internal node unobstructed while the
feedback is cut off. In the conventional 9T cell, the VDD node
floats during the write operation. In contrast, the proposed
scheme uses VDD to strengthen the charging of Q. Thus, the
write SNM is higher and the write time is lower in the proposed
scheme.

Figure 5. Proposed 9T-cell write operation simulation results.

IV. SIMULATION RESULTS AND COMPARISON

All simulations are done in 90nm TSMC CMOS technology.
The default cell VDD is 0.3 V at room temperature. Fig. 6
shows the hold SNM for various VDD and temperatures. The
hold SNM changes significantly with VDD reduction; operating
around 0.3 V, temperature variation has a small effect on the
hold SNM. Fig. 7 shows the read SNM for various VDD and
temperatures. For normal temperature variation, the read SNM
is acceptable.

Figure 6. Hold SNM for various VDD and temperatures.
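The control-signal levels given in Section III for the proposed 9T cell can be collected into a small lookup table. This is a sketch with stated assumptions: 1 means VDD and 0 means GND, WLL = WLR = 0 in hold is assumed (the text only gives S1 to S3 for hold), and the brief change of S2 and S3 shortly after WLR rises during a read is not modelled.

```python
# Control-signal levels (1 = VDD, 0 = GND) for the proposed 9T cell.
# Assumption: both word lines are low in hold; read-phase S2/S3 timing not modelled.
CONTROL = {
    "hold":  {"WLL": 0, "WLR": 0, "S1": 1, "S2": 1, "S3": 0},
    "read":  {"WLL": 0, "WLR": 1, "S1": 1, "S2": 0, "S3": 1},
    "write": {"WLL": 1, "WLR": 1, "S1": 1, "S2": 0, "S3": 0},
}


def feedback_intact(mode: str) -> bool:
    """True when S1 = S2 = 1 and S3 = 0, the hold-state setting in which both
    inverters regenerate the stored data."""
    s = CONTROL[mode]
    return (s["S1"], s["S2"], s["S3"]) == (1, 1, 0)


def asserted_word_lines(mode: str) -> list:
    """Word lines driven high in the given mode."""
    return [wl for wl in ("WLL", "WLR") if CONTROL[mode][wl]]


for mode in CONTROL:
    print(mode, "feedback intact:", feedback_intact(mode),
          "word lines:", asserted_word_lines(mode))
```

The table makes the two key differences visible: only the write mode asserts both word lines, and both access modes break the hold-state feedback setting through S2.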

TABLE

ReadSNM

COMPARISION WITH OTHER STRUCTURES


Different Structures

TABLE II

ConvellIional 9T
[8J

Read SNM (mV)

45

91

82

Write SNM (mv)

90

90

135

Hold SNM (mv)

91

91

90

33

33

39

4. 5

4. 5

3. 6

differential

single

single

1. 8A

I. 7A

PERATION

i
:>
z
00

50

Write Energy for


one cell

, -"'5 ----,;C;o 2 -----;o !;z. ;- 52 -----;,


03,-------o -; 35
----0';'-;; -"'5 ----,!05
"!';C;
;
Figure 7. Read SNM for various Voo and temperatures.

Read method

WriteSNM

50

(fi)

Write time (ns)

VOO(V)

Area

,--------r=---<>=;:=c;= T .=;
.2<) "'c:il
-B-T=20C
----'O'-T=80C

Proposed
9T

Conventional
6T

Using stack transistors in the leakage current path, the new


structure shows lower bitline leakage assisting the sense

50

i:> _loa

amplifier to read the bitline current. Moreover, by cutting the

ill

feedback of latch during the write operation, write time


improves 20% in comparison to the conventional 6T cell

-150

SRAM. The 32 Kb 0.3V sub-threshold SRAM post-layout


2<)0

simulations using 90nm TSMC CMOS model confirm the


proposed 9T cell SRAM performance. Moreover, the read,

_25 L
,

05
"\-c
5 -
o ,-
3 --035 fc
0
-;; 25-
02-----0"'
5 ----fo
-"
VDO(V)

write and hold SNM variation with temperature and VDD

Figure 8. Write SNM for various Voo and temperatures

Fig.

shows

the

write

SNM

for

various

simulated

to show the
2
consumption is 0.2mm .

VDD

of

cell.

Table I

shows various

[I]

stack transistors in the reading path decrease the leakage


current of bitline and improve the reading operation speed. By

[4]

However, reference [3] proposed a new read procedure using


read

accesses

that

improve

the

readability

[6]

by

and

50%

in

read

Computer

ICCD

2005.

Bo Zhai; Blaauw, D. ; Sylvester, D. ; Hanson, S. ; , "A Sub-200mV 6T


SRAM in 0.13m CMOS," Solid-State Circuits Conference, 2007.

Saeidi, R.; Sharitkhani, M. ; Hajsadeghi,

K;

, "A subthreshold dynamic

read SRAM (DRSRAM) based on dynamic stability criteria," Circuits

Calhoun, RH. ; Chandrakasan, AP. ; , "Static noise margin variation for


sub-threshold SRAM in 65-nm CMOS," Solid-State Circuits, IEEE
Ming-Hung Chang; Yi-Te Chiu; Shu-Lin Lai; Wei Hwang; , "A Ikb 9T
subthreshold SRAM with bit-interleaving scheme in 65mn CMOS," Low

Ik Joon Chang; Jae-Joon Kim; Park, S.P.; Roy,

K;

, "A 32 kb lOT Sub

Threshold SRAM Array With Bit-Interleaving and Differential Read


Scheme in 90 nm CMOS," Solid-State Circuits, IEEE Journal of ,
voL44, no.2, pp.650-658, Feb. 2009.

[7 ]

Meng-Fan Chang; Shi-Wei Chang; Po-Wei Chou; Wei-Cheng Wu; , "A


130

mV

SRAM

With

Expanded

Write

and

Read

Margins

for

voL46, n02, pp. 520-529, Feb. 2011.

CONCLUSION

improvement

2005.

Subthreshold Applications," Solid-State Circuits, IEEE Journal of ,

In this paper we propose a new 9T-cell SRAM that shows


80%

Processors,

Power Electronics and Design (ISLPED) 2011 International Symposium

proposed structure addresses this improvement.


II.

and

on , vol. , no , pp.291-296, 1-3 Aug. 20II.

dynamic behavior of the cell. But for write operation, we have


stability problem such as before and need improvement. The

Computers

Journal of, voL41, no.7, pp.1673-1679, July 2006.


[5]

with the write SNM improvement. The conventional SNM


more important than write.

, "A feasibility study

and Systems (ISCAS), 2011 IEEE International Symposium on , vol. , no. ,

structure are the area

reduction of proposed structure is negligible in comparison

VLSI in

K;

across technology generations,"

pp. 61-64, 15-18 May 2011.

SNM and write time improved 50% and 20% in comparison to

consumption and higher controlling signals. The read SNM

SRAM

pp. 332-606, 11-15 Feb. 2007.


[3]

cutting the feedback of latch during the write operation, write


SRAM respectively. The most

subthreshold

ISSCC 2007. Digest of Technical Papers. IEEE International, vol. , no. ,

structure, the area consumption shows 6% improvement. The

multiple

area

417- 422, 2-5 Oct 2005.


[2]

regenerating the other. With one BL reduction in the new

shows that read stability is

total

Proceedings. 2005 IEEE International Conference on , vol., no. , pp.

in comparison to the conventional 6T cell SRAM due to use of


cutting one side of storage nodes from reading path and

Raychowdhury, A, Mukhopadhyay, S. , Roy,


of

Design:

read SNM of proposed 9T cell shows about 80% improvement

the conventional 6T cell

the

REFERENCES

comparisons of proposed scheme with other structures. The

important drawbacks of proposed

and

and

temperatures. Again, the write SNM around 0.3V is acceptable


for proper functionality

performance

and

write

[8]

2008. 51st Midwest Symposium on, vol. , no. , pp.422-425, 10-13 Aug.

SNM

respectively in comparison to the conventional 6T-cell SRAM.

Sheng Lin; Yong-Bin Kim; Lombardi, F.; , "A 32nm SRAM design for
low power and high stability," Circuits and Systems, 2008. MWSCAS
2008.

[9]

Verma, N. ; Chandrakasan, AP. ; , "A 256 kb 65 nm 8T Subthreshold


SRAM Employing Sense-Amplifier Redundancy," Solid-State Circuits,
IEEE Journal of, voL43, no. I, pp. 141-149, Jan. 2008.

2013 8th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS)

107

Statistical (M-C) and Static noise margin analysis of the SRAM cells
Govind Prasad and Rambabu kusuma

Abstract- For the first time, this paper compares Static Random Access Memories built from 9T, 8T and 6T SRAM cells using N-curve and statistical analysis, demonstrating a multi-fold performance enhancement. The 9T SRAM cell, which has extra transistors compared with the 8T and 6T SRAM cells, gives higher stability (SVNM, SINM, WTV and WTI) than the conventional SRAM cells. The paper analyzes a variety of parameters such as stability (SVNM, SINM, WTV and WTI), area and leakage power consumption. A comparative study of the Cell Ratio (CR) and the Pull-up Ratio (PR) against SVNM is presented. A statistical model has been developed that displays the power histogram during the write and read cycles of the 9T SRAM cell. The 9T SRAM cell shows much better stability, lower standby power consumption and a larger area than its 6T and 8T SRAM counterparts. The design is based on 90 nm CMOS process technology.

Index Terms- Cell Ratio (CR), Pull-up-ratio (PR), Static voltage noise margin (SVNM), Static current noise margin (SINM), Write trip voltage (WTV), Write trip current (WTI).

I. INTRODUCTION

For stability of the SRAM cell a good SVNM is required, so SVNM is the most important parameter for memory design. A higher SVNM of the cell is also associated with a higher SRAM speed.
This paper shows how the static voltage noise margin (SVNM), write trip voltage (WTV) and write trip current (WTI) of an SRAM cell depend on the cell ratio and the pull-up ratio. New SRAM cells have been introduced in order to obtain a high noise margin and lower power dissipation.
PR (pull-up ratio), CR (cell ratio) and supply voltage are important parameters because they are the only parameters in the hands of the design engineer. As technology becomes more complex, these parameters must be selected carefully when designing the memory cell, and a number of design criteria must be taken into consideration.
The two basic criteria are:
Data read operation should not be destructive.
Static noise margin should be in an acceptable range.

In this paper, we analyze the noise margins of different types of SRAM cells (6T, 8T and 9T), compare their layout area, compare SVNM against CR, and finally evaluate the SRAM cells with Monte Carlo analysis.
A. THE STATISTICAL SIMULATION
Manufacturing variations in components affect the production yield of any design that includes them. Statistical analysis allows you to study this relationship in detail. In general, Monte Carlo simulation is a technique used to understand the impact of risk [1].
1) How Statistical Analysis Works
To prepare for a statistical analysis, you create a design that includes device models [7] that are assigned statistically varying parameter values. The shape of each statistical distribution represents the manufacturing tolerances of a device or devices. During the analysis, the statistical analysis option performs multiple simulations, each using different parameter values for the devices based on the assigned statistical distributions. When the simulations finish, you can use the data analysis features of the statistical analysis option to examine how manufacturing tolerances affect the overall production yield of your design. If necessary, you can then switch to different components or change the design to improve the yield [2].
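The workflow above can be illustrated with a minimal Python sketch. It is not the tool flow of [1]: `simulated_snm` is a hypothetical stand-in for one circuit simulation, and the threshold-voltage mean and sigma, the SNM limit and the mismatch sensitivity are assumed values chosen only for illustration.

```python
import random
import statistics


def simulated_snm(vt_left, vt_right, snm_nominal=0.27, sensitivity=0.5):
    """Hypothetical surrogate for one circuit simulation: the SNM (in volts)
    shrinks as the threshold voltages of the two driver transistors mismatch."""
    return snm_nominal - sensitivity * abs(vt_left - vt_right)


def monte_carlo_yield(n_runs=10_000, vt_mean=0.30, vt_sigma=0.02,
                      snm_min=0.25, seed=1):
    """Run n_runs simulations with randomly drawn threshold voltages and
    return (yield fraction, mean SNM, standard deviation of SNM)."""
    rng = random.Random(seed)
    snms = []
    for _ in range(n_runs):
        # Each run draws fresh parameter values from the assigned distribution.
        vt_left = rng.gauss(vt_mean, vt_sigma)
        vt_right = rng.gauss(vt_mean, vt_sigma)
        snms.append(simulated_snm(vt_left, vt_right))
    passing = sum(1 for s in snms if s >= snm_min)
    return passing / n_runs, statistics.mean(snms), statistics.stdev(snms)


yield_fraction, mean_snm, sd_snm = monte_carlo_yield()
print(f"yield = {yield_fraction:.3f}, mean SNM = {mean_snm * 1e3:.1f} mV")
```

Each run draws new parameter values, and the fraction of runs that meet the assumed SNM limit is the estimated yield; in a real flow, each call to the surrogate would be a full SPICE simulation of the cell.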
B. NOISE MARGIN
Noise margin is the maximum noise voltage that can be added at the input of a logic gate without affecting its output. For stability of the SRAM cell a good SVNM is required, and the SVNM depends on the values of CR and PR.
CELL RATIO (CR) is the ratio between the size of the driver transistor and that of the access transistor during the read operation.
CR = (W1/L1)/(W5/L5)

Govind Prasad is with the Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, India -769008 (e-mail: govindp317@gmail.com)
Rambabu kusuma is with the Department of Electronics and Communication Engineering, National Institute of Technology, Jamshedpur, India -831014 (e-mail: rambabukusuma@gmail.com)

978-1-4673-5630-5/13/$31.00 ©2013 IEEE

PULL-UP RATIO (PR) is the ratio between the size of the load transistor and that of the access transistor during the write operation.
PR = (W4/L4)/(W6/L6)
SVNM affects both WTV and WTI, and it is also related to the threshold voltages of the NMOS and PMOS devices in SRAM cells [3]. For a high SVNM, the threshold voltages of the NMOS and PMOS devices need to be increased. However, the increase in threshold voltage of the PMOS and NMOS devices is limited, because SRAM cells whose MOS devices have very high threshold voltages are difficult to operate, as it is hard to switch the MOS devices.
Changing the cell ratio changes the speed of the SRAM cell. If the cell ratio increases, the size of the driver transistor increases, so its current also increases, and as the current increases, the speed of the SRAM cell increases as well. Changing the cell ratio also changes the SVNM: for different values of CR, we obtained different values of SVNM in the different SRAM cells.
II. DESIGN STRATEGY FOR HIGH SVNM
To determine the (W/L) ratios of the transistors in an SRAM cell, two basic requirements must be taken into consideration [4]:
The data read operation on cell should not destroy the
stored information in the SRAM cell.
The SRAM cell should allow modification of the
stored information during data-write phase.

For these conditions to hold, M3 must be kept in saturation and M1 in the linear region, and the current I3 should be less than or equal to I1:
$I_3 \le I_1$
Only then can we obtain a CR in the range of 1 to 1.25, which gives the maximum range of SVNM [4]. Here, I3 and I1 are the currents through transistors M3 and M1, respectively.

III. CONSTRUCTION OF 6T SRAM CELL
The conventional SRAM cell (6T-SRAM) is shown in Fig. 2. The 6T-SRAM cell combines six transistors: four transistors, N0, P0, N1 and P1, form back-to-back connected inverters that store a single bit, either 0 or 1. For reading (writing) data from (to) the bit lines, two transistors, N2 and N3, are used as access transistors. The word line (WL) is used to turn the access transistors ON and OFF. BL and BLB are the bit lines [5].

A. Data Read Operation

Fig. 2. Schematics of conventional 6T SRAM cell

Fig. 1. 6T SRAM Cell

When M3 and M4 are turned on, the voltage level of column BLB will not show any significant variation, since no current flows through M4. On the other hand, M1 and M3 conduct a nonzero current, so the voltage level of column BL begins to drop slightly and the voltage V1 rises from its initial value of 0 V, where V1 is the voltage at node 1.
If the W/L ratio of access transistor M3 is large compared to that of M1, the node voltage V1 may exceed the threshold voltage of M2 during this process, forcing an unintended change of the stored state.
The key design issue for the data read operation is then to guarantee that the voltage V1 does not exceed the threshold voltage of M2, so that M2 remains turned off during the read phase, i.e.,
$(V_1)_{\max} \le (V_T)_2$
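Under the conditions above, the read limit can be worked out in closed form. The sketch below assumes long-channel square-law transistors, the same threshold voltage VT for all NMOS devices, no body effect, a bit line held at VDD, and CR taken as the ratio k1/k3 of the driver and access transconductance parameters (proportional to their W/L ratios in the same process). Equating the saturated access current I3 = (k3/2)(VDD - V1 - VT)^2 with the linear driver current I1 = (k1/2)[2(VDD - VT)V1 - V1^2] gives V1 = (VDD - VT)(1 - sqrt(CR/(1 + CR))), and requiring V1 <= VT gives the minimum CR. The supply and threshold values used below are assumptions, not values from this paper.

```python
import math


def read_disturb_voltage(cr, vdd, vt):
    """Node voltage V1 during a read, from I3 (access, saturated) = I1 (driver,
    linear) under the square-law model; cr = k1/k3 (driver over access)."""
    overdrive = vdd - vt
    return overdrive * (1.0 - math.sqrt(cr / (1.0 + cr)))


def min_cell_ratio(vdd, vt):
    """Smallest cr for which V1 stays at or below VT, so M2 remains off."""
    return (vdd - 2.0 * vt) ** 2 / (2.0 * vt * (vdd - 1.5 * vt))


VDD, VT = 1.0, 0.3  # assumed supply and threshold voltages, not from the paper
for cr in (min_cell_ratio(VDD, VT), 1.0, 1.25):
    v1 = read_disturb_voltage(cr, VDD, VT)
    print(f"CR = {cr:.2f}: V1 = {v1:.3f} V, read-safe = {v1 <= VT + 1e-12}")
```

With these assumed values the minimum CR is about 0.48, and CR = 1 keeps V1 at about 0.21 V, below VT; a larger CR lowers V1 further, which is why a CR of at least 1 is a comfortable choice in this simple model.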

B. Operation
The conventional 6T-SRAM cell has three different modes: standby, write and read. In standby mode no read or write operation is performed and the circuit is idle; in read mode the stored data is read from the output nodes onto the bit lines; and in write mode the stored data is updated. To operate correctly, the SRAM must have readability in read mode and write-ability in write mode. The three modes work as follows.
Standby: If the word line WL is low (0), the access transistors M3 and M4 are turned off and the bit lines are disconnected from the cell. The cross-coupled inverters continue to reinforce each other; the current drawn from the supply voltage in this mode is called the standby or leakage current.
Reading: In read mode, the word line WL is driven high (1), which turns on the transistors N3 and N4; when both transistors are on, the values of Q and QB are transferred to bit lines BL and BLB, respectively. Before WL is raised, the bit lines BL and BLB must be precharged to VDD. Assume that a 1 is stored at Q and a 0 at QB: no current flows through N4, while current flows through N3, which discharges BLB through N3 and N0. The resulting voltage difference between the bit lines is read by a sense amplifier, which pulls the data and produces the output. The row and column decoders are used to select the appropriate cell.
Writing: In write mode, suppose we want to write 0 V at output Q. We apply 0 V to the bit line, i.e., we set BL to 0 V and BLB to 1 V. After the bit lines are set, WL is asserted. For proper operation of the SRAM cell, the sizing of the transistors is very important.
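The three modes can be summarized with a logic-level Python sketch. It abstracts away all analog behavior: the bit line on the side storing 0 is shown fully discharged (in a real read it drops only slightly before the sense amplifier resolves it), and the class `SixTCell` and its methods are hypothetical names, not part of any tool.

```python
from dataclasses import dataclass

VDD = 1.0  # assumed supply voltage for this logic-level sketch


@dataclass
class SixTCell:
    """Logic-level model of one 6T cell; q is the bit at node Q (QB = not q)."""
    q: int = 0

    def standby(self):
        # WL low: access transistors off, the latch simply holds its value.
        return self.q

    def read(self):
        # Precharge BL and BLB to VDD before raising WL.
        bl, blb = VDD, VDD
        # With WL high, the side holding 0 pulls its bit line down
        # (shown as a full discharge here; in practice it drops only slightly).
        if self.q == 1:
            blb = 0.0
        else:
            bl = 0.0
        # The sense amplifier turns the bit-line difference into a bit.
        return 1 if bl > blb else 0

    def write(self, bit):
        # Set the bit lines first (BL = bit, BLB = not bit), then assert WL;
        # correctly sized transistors let the bit lines overpower the latch.
        bl = VDD if bit else 0.0
        blb = 0.0 if bit else VDD
        self.q = 1 if bl > blb else 0


cell = SixTCell()
cell.write(1)
print(cell.read(), cell.standby())
```

The read method leaves `q` untouched, reflecting the requirement that a read must not be destructive; in the real cell this holds only when the transistor sizing keeps the disturbed node below the switching threshold.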

curve gives both voltage and current information, as shown in Figs. 5, 6 and 7, from which the following SVNM-related metrics are extracted. The curve crosses zero at three points, E, G and H, from left to right. The voltage difference between points E and G is called the static voltage noise margin (SVNM). The part of the curve between points G and H shows the write ability, so the voltage difference between points G and H is called the write trip voltage (WTV). The peak current between points E and G, located at point F, is called the static current noise margin (SINM), and the negative current peak between points G and H is called the write trip current (WTI). Thus, N-curve analysis gives additional information compared with butterfly-curve analysis.
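Extracting the four N-curve metrics from a sampled curve is mechanical once the three zero crossings are located. The Python sketch below assumes the curve is given as lists of node voltages `v` (ascending) and injected currents `i`, with exactly three zero crossings. The interpolation scheme and function names are illustrative assumptions, and the cubic test curve is synthetic, not data from this paper.

```python
def zero_crossings(v, i):
    """Voltages where the sampled current is zero or changes sign
    (linear interpolation between neighbouring samples)."""
    crossings = []
    for k in range(len(v) - 1):
        i0, i1 = i[k], i[k + 1]
        if i0 == 0.0:
            crossings.append(v[k])
        elif i0 * i1 < 0:
            crossings.append(v[k] - i0 * (v[k + 1] - v[k]) / (i1 - i0))
    if i[-1] == 0.0:
        crossings.append(v[-1])
    return crossings


def n_curve_metrics(v, i):
    """SVNM, WTV, SINM and WTI from an N-curve sampled at ascending voltages v."""
    crossings = zero_crossings(v, i)
    if len(crossings) < 3:
        raise ValueError("expected three zero crossings (E, G, H)")
    e, g, h = crossings[:3]
    eg = [cur for volt, cur in zip(v, i) if e <= volt <= g]
    gh = [cur for volt, cur in zip(v, i) if g <= volt <= h]
    return {
        "SVNM": g - e,    # voltage from E to G
        "WTV": h - g,     # voltage from G to H
        "SINM": max(eg),  # positive current peak (point F)
        "WTI": min(gh),   # negative current peak between G and H
    }


# Synthetic N-shaped test curve (not measured data): zeros at 0, 0.5 and 1.0 V.
v = [k / 100 for k in range(101)]
i = [1e-4 * x * (x - 0.5) * (x - 1.0) for x in v]
print(n_curve_metrics(v, i))
```

For this antisymmetric test curve, SVNM and WTV both come out as 0.5 V, and SINM and WTI are equal in magnitude with opposite signs.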
TABLE I
SNM IMPROVEMENT OF THE 9T SRAM CELLS COMPARED
TO THAT OF THE CONVENTIONAL SRAM CELL

SRAM cell         CMOS process   SVNM       SINM       WTV        WTI
Conventional 6T   90nm / 1V      273.9 mV   18.40 uA   446.6 mV   -13.53 uA
8T                90nm / 1V      273.9 mV   18.42 uA   446.6 mV   -13.55 uA
9T                90nm / 1V      529.5 mV   32.85 uA   598 mV     -18.27 uA

Fig. 3. Schematics of 8T SRAM cell

Fig. 4. Schematics of 9T SRAM cell

IV. SIMULATION RESULTS AND DISCUSSION
A. Stability Analysis using N-curve Analysis
Normally, the stability of the SRAM cell is defined by the static voltage noise margin (SVNM) and the write trip point (WTP). The SVNM is defined as the maximum value of DC noise voltage that can be tolerated by the cell without changing the stored bit. The cell stability therefore depends on the supply voltage: as the supply voltage is reduced, the cell becomes less stable. The most common static approach for measuring the SVNM uses butterfly (VTC) curves, which are obtained from a DC simulation. The disadvantage of the butterfly-curve approach is that the SVNM cannot be measured with automatic inline testers: after the butterfly curves are measured, the static noise margin still has to be derived by mathematical calculation from the measured data of the SRAM cell. Therefore, N-curve analysis, which is suited to inline testers, is used here to analyze the SRAM cells [6], [5]. The N-

Fig. 5. Conventional 6T SRAM cell N-curve.

Fig. 6. 8T SRAM cell N-curve.


Fig. 7. 9T SRAM cell N-curve.
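For comparison, the butterfly-curve SNM discussed in this section has to be derived from the two transfer curves by calculation. A common way is Seevinck's method: sweep a line of slope 1 across one lobe of the butterfly plot and take the largest gap between the two curves along it, which equals the side of the largest square that fits inside the lobe. The Python sketch below applies this to one lobe. The tanh-shaped `vtc` is a hypothetical smooth inverter curve, not a transistor model, and the bisection solver and step count are illustrative choices.

```python
import math


def vtc(vin, vdd=1.0, vm=0.5, gain=10.0):
    """Hypothetical smooth inverter transfer curve (tanh shape, not a device model)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - vm)))


def bisect_root(fn, lo, hi, iters=60):
    """Root of fn on [lo, hi]; assumes fn(lo) and fn(hi) have opposite signs."""
    f_lo = fn(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def butterfly_snm(f1, f2, vdd=1.0, steps=400):
    """SNM of the upper-left lobe formed by V2 = f1(V1) and V1 = f2(V2).

    A line V2 = V1 + c is swept across the lobe; along each line the horizontal
    gap between the two curves equals the side of the square whose diagonal
    joins them, and the largest gap is the SNM (Seevinck's 45-degree method).
    """
    best = 0.0
    for k in range(1, steps):
        c = vdd * k / steps
        # Intersection with inverter 1's curve V2 = f1(V1).
        x_a = bisect_root(lambda x: f1(x) - x - c, -c, vdd)
        # Intersection with inverter 2's mirrored curve V1 = f2(V2).
        x_b = bisect_root(lambda x: x - f2(x + c), -c, vdd)
        best = max(best, x_a - x_b)
    return best


snm_steep = butterfly_snm(vtc, vtc)
snm_soft = butterfly_snm(lambda x: vtc(x, gain=4.0), lambda x: vtc(x, gain=4.0))
print(f"SNM (gain 10) = {snm_steep:.3f} V, SNM (gain 4) = {snm_soft:.3f} V")
```

A steeper (higher-gain) transfer curve opens the lobes and gives a larger SNM. With measured curves, `f1` and `f2` would be interpolating functions over the measured data, which is exactly the extra calculation step that the N-curve method avoids.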

B. Area and Power calculation

Fig. 9. Layout of 8T SRAM cell

TABLE II
AREA AND POWER IMPROVEMENT OF THE 9T SRAM CELLS
COMPARED TO THAT OF THE CONVENTIONAL SRAM CELL

SRAM cell   Leakage power consumption (nW)   Area (µm2)
6T          56.23                            26.1756
8T          55.10                            32.6710
9T          48.43                            34.1381

Fig. 10. Layout of 9T SRAM cell

The graph in Fig. 11 shows that the SVNM increases when the write trip voltage (WTV) increases and when the cell ratio (CR) increases. Thus SVNM, WTV and CR increase together [3].

Fig. 8. Layout of 6T SRAM cell

Fig. 11. SVNM variations with WTV (Write trip voltage) and CR for the
6T SRAM Cell.

V. STATISTICAL ANALYSIS OF SRAM CELL

Fig. 15. The histogram of reading power of 6T SRAM cell from 10k point
M-C simulation.

Fig. 12. Monte Carlo simulation of 8T SRAM cell.

In this paper, we present detailed simulation results for all the SRAM cells in 90 nm technology at different (W/L) ratios, together with their static power dissipation. First, we present the area and power savings obtained with the 9T SRAM design compared with the conventional 6T and 8T SRAM cells. Second, we present the SVNM, SINM, WTV and WTI of all the SRAM designs using N-curve analysis. A histogram-based analysis of power during read and write operations was also carried out. We have analyzed how SVNM, SINM, WTV and WTI depend on CR and PR. The percentage saving in static power of the 9T SRAM is 13.9% compared with a conventional 6T SRAM cell and 12.10% compared with the 8T SRAM cell. The SVNM, SINM, WTV and WTI of the 9T SRAM cell were also improved by 48.27%, 43.926%, 25.3% and 25.831%, respectively, compared with the 6T SRAM cell. The total area increased by 23.32% and 4.23% compared with the 6T and 8T SRAM cells, respectively.
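The percentages above can be recomputed from Tables I and II. The helper below divides the difference by the larger of the two values. That convention is inferred from the numbers rather than stated in the paper; with it, the SVNM (48.27%), WTV (25.3%), leakage-power (13.9% and 12.10%) and 6T-area (23.32%) figures are reproduced to within a few hundredths of a percentage point.

```python
def percent_difference(a, b):
    """|a - b| as a percentage of the larger of the two values (inferred convention)."""
    return abs(a - b) / max(abs(a), abs(b)) * 100.0


# Values from Tables I and II (6T/8T reference first, 9T second).
checks = {
    "SVNM vs 6T (mV)": (273.9, 529.5),
    "WTV vs 6T (mV)": (446.6, 598.0),
    "leakage vs 6T (nW)": (56.23, 48.43),
    "leakage vs 8T (nW)": (55.10, 48.43),
    "area vs 6T": (26.1756, 34.1381),
}
for label, (ref, nine_t) in checks.items():
    print(f"{label}: {percent_difference(ref, nine_t):.2f}%")
```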

VI. CONCLUSION

Fig. 13. Monte Carlo simulation of 9T SRAM cell.

A. The histogram of power during writing of data

Fig. 14. The histogram of write power of 6T SRAM cell from 10k point M-C
simulation.

The power distribution shows that the power value changes from one Monte Carlo iteration to the next.
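A power histogram like those in Figs. 14 and 15 is built by binning one power value per Monte Carlo iteration. The Python sketch below uses synthetic, normally distributed power samples and equal-width bins; the 12 µW mean and 0.8 µW sigma are assumptions for illustration, not simulation results from this paper.

```python
import random


def histogram(samples, n_bins=10):
    """Equal-width histogram; returns (bin edges, counts per bin)."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        idx = int((s - lo) / width) if width > 0 else 0
        counts[min(idx, n_bins - 1)] += 1  # the maximum sample goes in the last bin
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


rng = random.Random(7)
# Synthetic write-power samples in microwatts, one per Monte Carlo iteration.
power_uw = [rng.gauss(12.0, 0.8) for _ in range(10_000)]
edges, counts = histogram(power_uw, n_bins=20)
for left, count in zip(edges, counts):
    print(f"{left:6.2f} uW | {'#' * (count // 50)}")
```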
B. The histogram of power during reading of the data

VII. REFERENCES
[1] Virtuoso Advanced Analysis Tools User Guide, product version 5.1.41, 2007.
[2] Jiajing Wang, Satyanand Nalam, "Analyzing static and dynamic write margin for nanometer SRAMs," ISLPED'08, August 11-13, 2008.
[3] Debasis Mukherjee, Hemanta Kr. Mondal and B. V. R. Reddy, "Static noise margin analysis of SRAM cell for high speed application," IJCSI International Journal of Computer Science Issues, vol. 7, issue 5, September 2010.
[4] Sung-Mo Kang, Yusuf Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, 2003 edition, pp. 402-519.
[5] K. Dhanumjaya, M. Sudha, M. N. Giri Prasad, K. Padmaraju, "Cell stability analysis of conventional 6T dynamic 8T SRAM cell in 45nm technology," International Journal of VLSI Design & Communication Systems (VLSICS), vol. 3, no. 2, April 2012.
[6] E. Grossar et al., "Read stability and write-ability analysis of SRAM cells for nanometer technologies," IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 2577-2588, Nov. 2006.
[7] Jiajing Wang and Amith Singhee, "Statistical modeling for the minimum standby supply voltage of a full SRAM array," IEEE Trans., 2007.
[8] S. Dasgupta and Paridhi ate, "A comparative study of 6T, 8T and 9T SRAM cell," IEEE Trans., October 4-6, 2009.
[9] Vikas Nehra and Rajesh Singh, "Simulation of 8T SRAM cell stability at CMOS technology for multimedia application," Canadian Journal on Electronics Engineering, vol. 3, no. 1, January 2012.
[10] Shilpi Birla, R. K. Singh, and Manisha Pattnaik, "Static Noise Margin Analysis of Various SRAM Topologies," IACSIT International Journal of Engineering and Technology, vol. 3, no. 3, June 2011.
[10] Shilpi birla, R.K.Singh, and Manisha Pattnaik, Static Noise Margin
Analysis of Various SRAM Topologies , IACSIT International Journal of
Engineering and Technology,Volume 3, No.3, June 2011.

Proceedings of 2013 IEEE Conference on Information and Communication Technologies (ICT 2013)

Built in Self Test Architecture for Testing SRAM Using Transient Current Testing
Anumol K A #1, N.M.Siva Mangai #2, P.Karthigai Kumar #3
#1 PG Scholar, School of Electrical Science, Karunya University, Coimbatore-641114, India
#2 Associate Professor, School of Electrical Science, Karunya University, Coimbatore-641114, India
#3 Associate Professor, School of Electrical Science, Karunya University, Coimbatore-641114, India

Abstract- Semiconductor memories, most specifically Static Random Access Memories (SRAMs), are becoming very popular in today's Systems-On-Chip (SOCs). Memories become more susceptible to faults as their complexity increases and the technology shrinks. March algorithms have been widely used to detect these faults, but this detection of faults in SRAM is a time-consuming process. Hence, transient current testing (IDDT) methods are used. This paper implements a transient current testing method to detect faults in Complementary MOSFET (CMOS) SRAM cells. By monitoring a transient current pulse during a write or a read operation, faults can be detected. To detect the fault, a Built-In Self-Test (BIST) circuit is developed. Simulations are carried out on a 6T SRAM circuit to detect the difference in amplitude of the IDDT waveform. Simulations are also carried out on a 4x4 SRAM array to detect the occurrence of faults. The SRAM circuit, array circuit and sensor circuits are designed in 180nm CMOS technology.
Keywords - SRAM, Memory testing, March algorithm, IDDT, Current sensor circuit, BIST.

(IDDQ) is also used [5] [6]. However, some defects in SRAM cells may not be detected using IDDQ. This paper proposes a transient current testing method to detect open defects in CMOS SRAM cells. By monitoring a transient current pulse during a write or a read operation, faults can be detected.
II. BACKGROUND
A. SRAM Cell
Transient current testing is more efficient for circuits with a repeating structure, which makes the method well suited to SRAM cells. The 6T CMOS SRAM [7] cell is the most widely used SRAM cell in today's SOCs and microprocessors. Accordingly, the 6T CMOS cell given in Figure 1 is considered. It consists of two cross-coupled inverters formed by the complementary transistor pairs M1-M3 and M2-M4, and two access transistors, M5 and M6, usually NMOS, which enable read and write operations in the cell.

I. INTRODUCTION
Embedded memories are widely used in the realization of today's complex systems, known as systems on chip (SOCs). The forecast for 2013 from the International Technology Roadmap for Semiconductors (ITRS) [1] [2] states that 90% of the area of SOCs will be made up of memories, most specifically static random access memories (SRAMs). Large arrays of fast SRAM help to improve system performance, but they also increase the chip cost. Thus, for area-cost optimization, the size of SRAM cells is minimized and small SRAM cells are placed closely together, making SRAM arrays the densest circuitry on a chip. Such areas of the chip can be vulnerable to manufacturing defects and process variations.
This implies that the test cost of memories will have a large impact on the test cost of SOCs. Faults in memories reduce yield, and in critical systems they may cause system failure. Thus, adequate test methods must be employed to minimize cost while maintaining efficiency, thereby increasing the quality of the product. In SRAM testing, various fault models such as stuck-at, transition and coupling faults are used. March tests [3] [4] have been widely used to detect these faults, but these detection processes are time consuming. Testing using quiescent current

Figure 1: 6T CMOS SRAM cell

The writing operation of an SRAM can be classified as transition writing and non-transition writing. Transition writing will cause transient current to flow from the power supply to

978-1-4673-5758-6/13/$31.00 ©2013 IEEE

331

ground. Thus, detecting a transient current during a non-transition write, or detecting no transient current during a transition write, confirms the presence of a fault in the cell. The peak of the transient current also varies with the presence of different faults, which can be sensed to predict the occurrence of faults.
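The decision rule above reduces to comparing the expected and observed behavior of each write. A minimal Python sketch, assuming a fixed detection threshold on the peak supply current (the threshold value and the function name are hypothetical):

```python
def write_indicates_fault(old_bit, new_bit, peak_current, threshold):
    """Flag a fault when the presence of a transient current disagrees with
    whether the write actually flipped the cell."""
    transition_write = old_bit != new_bit
    transient_seen = abs(peak_current) >= threshold
    return transition_write != transient_seen


THRESHOLD = 10e-6  # hypothetical detection threshold in amperes
print(write_indicates_fault(0, 1, 55e-6, THRESHOLD))  # flip with current: no fault
print(write_indicates_fault(0, 1, 0.0, THRESHOLD))    # flip without current: fault
print(write_indicates_fault(1, 1, 40e-6, THRESHOLD))  # no flip but current: fault
```

Because the two conditions are mirror images, the rule is a simple exclusive-or of "the write flips the cell" and "a transient current was seen".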
B. Faults introduced
Faults occur in SRAM due to logical or electrical design errors, manufacturing defects, aging of components, destruction of components (due to exposure to radiation) or process variations. Manufacturing defects are defects that were not intended; a manufacturing defect can occur despite careful design.
The size of the memory makes physical examination of the SRAM impossible. Thus, the testing mechanism is based on comparing the logical behaviour of a faulty memory against a good memory. To compare the logical behaviour of faulty memories against good ones, the physical failure mechanisms must be modelled as logic fault models. Failures in SRAM occur due to open and bridging faults, as shown in Figure 2.

TABLE I
RESISTANCE INTRODUCED TO MODEL FAULTS

Resistance   Resistance value (Ω)   Nature of fault     Fault model
R1           1M                     Open defect 1       TF
R2           1M                     Open defect 2       DRF
R3           1M                     Open defect 3       S-a-1
R4           1M                     Open defect 4       DRF
R5           1M                     Open defect 5       SOF
R6           1M                     Open defect 6       DRF
R7           10                     Bridging defect 1   SAF
R8           10                     Bridging defect 2   S-a-1
R9           10                     Bridging defect 3   S-a-0
R10          10                     Bridging defect 4   CF
R11          10                     Bridging defect 5   CF

SAF: A memory cell has a stuck-at fault when it always contains either zero or one, regardless of the value written to it. Stuck-at 1 is represented as S-a-1 and stuck-at 0 as S-a-0.
TF: A memory cell has a transition fault when it makes a transition 0→1 or 1→0 but fails to make the transition in the other direction. A transition fault is a special case of stuck-at fault.
SOF: A memory cell with unexpected open faults that make it unable to be accessed is considered to have a stuck-open fault. Neither a read nor a write operation to the cell is possible when it has a stuck-open fault.
CF: A memory cell is considered to have a coupling fault if a write operation to the cell affects the contents of a neighbouring cell, or vice versa.
DRF: A memory cell has a data retention fault when it loses its content after a certain period of time.
III. BIST ARCHITECTURE

Figure 2: Open and bridging faults in SRAM

An open fault [8] [9] occurs where two nodes that are supposed to be connected are left open; it can be modelled as a high resistance connected between those nodes. A bridging fault [10] [11] is modelled as a low-valued resistance connected across two nodes, representing the shorting of nodes that are supposed to be isolated. Thus, any of the resistances shown in Figure 2 can be introduced into an SRAM to make it faulty. The resistance values used are given in Table I, which also states the nature of each fault. The established logic fault models in SRAM are listed below. The model predicts five functional fault classes: stuck-at fault (SAF), transition fault (TF), stuck-open fault (SOF), coupling fault (CF) and data retention fault (DRF).
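The resistive fault models of Table I can be injected by adding one resistor card per defect to the SPICE netlist of the cell. The Python sketch below generates such cards. The node names are hypothetical placeholders (the actual positions are those of Figure 2), and the value strings use standard SPICE scale factors: SPICE reads the suffix M as milli, so the 1 MΩ open-defect resistance has to be written as 1MEG.

```python
OPEN_VALUE = "1MEG"  # 1 Mohm; in SPICE a bare "1M" would mean 1 milliohm
BRIDGE_VALUE = "10"  # 10 ohm


def fault_card(name, node_a, node_b, kind):
    """One SPICE resistor line modelling an open (series) or bridging (shunt) defect."""
    if kind == "open":
        value = OPEN_VALUE    # inserted in series where a connection should be
    elif kind == "bridge":
        value = BRIDGE_VALUE  # placed across two nodes that should be isolated
    else:
        raise ValueError(f"unknown defect kind: {kind!r}")
    return f"{name} {node_a} {node_b} {value}"


# Hypothetical node names: for an open, the original net is split into two
# nodes (here q and q_split) and the resistor reconnects them.
cards = [
    fault_card("R1", "q", "q_split", "open"),
    fault_card("R7", "q", "0", "bridge"),
]
print("\n".join(cards))
```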


A. VDDT Sensor
The IDDT current [12] [13] is a very fast event, so it is extremely difficult to sense and process directly. Especially in low-power technologies, processing the dynamic supply current is nearly intractable. A possible solution is to transform the current into a voltage and then process the resulting voltage waveform [14]. A VDDT sensor is shown in Figure 3. The output voltage keeps the shape of the dynamic current, but it is stretched in time, which is useful for further processing. The main advantage is test speed: only two write operations are required, compared with March tests, which require a minimum of 4 operations to detect the fault.
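The time stretching described above can be illustrated with a first-order model in which the sensing node behaves like a resistor R in parallel with the capacitor C, driven by the supply-current spike. This is a simplification assumed for the sketch, not the transistor-level sensor of Figure 3, and the pulse amplitude, duration, R and C values are hypothetical.

```python
def sense_voltage(current_a, dt_s, r_ohm, c_f):
    """Node voltage for a current waveform driving R in parallel with C."""
    v = 0.0
    out = []
    for i_a in current_a:
        # C dv/dt = i - v/R, integrated with a forward-Euler step
        v += dt_s * (i_a - v / r_ohm) / c_f
        out.append(v)
    return out


def width_above(samples, fraction=0.1):
    """Number of samples at or above a fraction of the waveform's peak."""
    peak = max(samples)
    return sum(1 for s in samples if s >= fraction * peak)


dt = 10e-12                          # 10 ps time step
pulse = [60e-6] * 20 + [0.0] * 480   # 0.2 ns, 60 uA current spike, then idle
v_sense = sense_voltage(pulse, dt, r_ohm=10e3, c_f=100e-15)
print(width_above(pulse), width_above(v_sense))
```

With an RC time constant of 1 ns, the 20-sample current spike produces a voltage pulse that stays above 10% of its peak for well over 200 samples, which is the kind of stretching that makes the event easier to process.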

332

obtained at the output of the comparator. Instead of a single SRAM cell, an array of cells can also be tested with the same circuitry.

Figure 3: VDDT Sensor

M2 and M5 act as two source followers with the same DC conditions. Since M1 is large compared with the transistors in the cell and its gate is grounded, the voltage drop across it is nearly zero. Thus, the gate of M2 can be considered as connected to VDD, just like the gate of M5. The currents through M5 and M2 are also identical, and hence the two transistors have the same DC conditions. In this way, stable DC conditions are also ensured on the differential pair (M6-M7). The role of capacitor C is to stretch the voltage waveform at node Vsense. The main requirement for this circuitry is good device matching. The output of the differential amplifier is connected to a high-gain opamp, so the transient voltage in the µV range can be transformed to millivolts and volts.
A fault-free SRAM cell draws no significant supply current in the steady state, i.e., its static current is quasi zero. Substantial supply current flows only when the cell changes its state. Under this condition, a temporary current path is formed between the voltage supply and ground. Together with the charging and discharging of node capacitances, this causes a current flow for the duration of the switching, which is called the dynamic or transient current. This current is sensed by the pMOS M1, and the sensed voltage is given as V(SENSE). Small spikes in the µV range appear in the waveform V(SENSE). This transient voltage is sent to the differential pair. With good device matching of the differential pair, approximately equal outputs are produced at its output nodes. These two outputs are given to a high-gain opamp, whose output voltage is in the volt range and can be processed easily.

Figure 4: BIST Circuitry

C. Test for faulty SRAM array.


A block-level representation of the entire memory system [15] is shown in Figure 6. SRAMs in the same row share the same word line, and SRAMs in the same column share common bit lines. A 4x4 array is built, so 16 six-transistor SRAMs are employed.

B. Testing circuitry for SRAM cell.


A fault-free SRAM cell with its VDDT circuitry, a faulty SRAM cell, an opamp and a comparator are shown in Figure 4. The faults introduced here can be open faults or bridging faults. The outputs of the opamps of the faulty and fault-free circuits are compared using the comparator. If there is any fault in the SRAM, the opamp outputs differ because of the change in transient current, and hence a pulse is


Figure 6: SRAM array

333

The column decoder is used to activate a particular pair of BL and BLBAR lines, and the row decoder is used to activate the word lines.
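The row and column selection can be sketched as an address decoder for the 4x4 array. The Python below assumes a row-major address mapping (upper address bits select the row, lower bits the column); this mapping and the function name are illustrative, not taken from the paper.

```python
def decode(address, rows=4, cols=4):
    """One-hot word-line and column-select vectors for a row-major address."""
    if not 0 <= address < rows * cols:
        raise ValueError("address out of range")
    row, col = divmod(address, cols)
    word_lines = [1 if r == row else 0 for r in range(rows)]
    column_select = [1 if c == col else 0 for c in range(cols)]
    return word_lines, column_select


for addr in (0, 6, 15):
    print(addr, decode(addr))
```

Exactly one word line and one BL/BLBAR pair are active for any address, so a single cell of the 16 is accessed at a time.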
The SRAM array testing circuitry is the same as that of an SRAM cell, shown in the previous section; the difference is that an SRAM array is placed in place of a single cell. A faulty array is compared with a good array, and where their transient voltages differ, a pulse is produced at the output of the comparator. Hence, a pulsed output at the comparator indicates the occurrence of a fault in the SRAM array.
IV. SIMULATION RESULTS
A. Sensor Output
A write operation that flips the data in the SRAM will cause transient current to flow through the SRAM. The presence of a fault in an SRAM changes the amount of transient current flowing through it. Figure 6 shows the difference in transient current between a faulty and a fault-free SRAM. Here, a write operation is performed every 50ns. The difference in transient current within an individual memory cell is due to 1-to-0 or 0-to-1 writing into the SRAM cell.

Figure 7. Output of VDDT sensor implementation

B. Test for SRAM with single fault.


The test is performed on an SRAM cell with a single fault. All open and bridging faults are introduced individually. Table II shows the difference in transient current between a faulty and a fault-free SRAM cell; this difference helps in identifying single faults in a cell.
TABLE II
TRANSIENT CURRENT IN SRAM CELL WITH SINGLE FAULT

Figure 6. Difference in transient current

The transient current is sensed by the pMOS M1, and the sensed voltage is given as V(SENSE) in Figure 7. Spikes occur every 50ns because a write operation is performed every 50ns, and transient current flows during each write operation. The transient current at every write operation is converted to a voltage, shown as V(OP), which is the sensor output.


Resistance introduced   Fault-free transient current (µA)   Faulty transient current (µA)
R1                      -58.767                              -11.730
R2                      -58.767                              -39.201
R3                      -58.767                              -70.524
R4                      -58.767                              -85.688
R5                      -58.767                              -75.570
R6                      -58.767                              -183.740
R7                      -58.767                              -82.716
R8                      -58.767                              -377.510
R9                      -58.767                              -210.926
R10                     -58.767                              -0.427
R11                     -58.767                              -28.773

The difference in transient current leads to a different VDDT sensor output for each fault. When the sensor outputs of a faulty and a fault-free memory are compared by a comparator, a pulse is obtained at the output of the circuit. This is shown in Figure 8. The pulse V(MAINOUT) confirms

334

the presence of a fault in the SRAM, and a zero DC output at V(MAINOUT) is obtained for a fault-free SRAM cell.
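Using the single-fault currents listed in Table II, the comparator decision can be sketched as a threshold on the difference between the faulty and fault-free values. The threshold is a hypothetical parameter (the paper does not give one), and the values are in the same units as the table.

```python
FAULT_FREE = -58.767
FAULTY = {
    "R1": -11.730, "R2": -39.201, "R3": -70.524, "R4": -85.688,
    "R5": -75.570, "R6": -183.740, "R7": -82.716, "R8": -377.510,
    "R9": -210.926, "R10": -0.427, "R11": -28.773,
}


def comparator_pulse(reference, device, threshold):
    """True when the two sensor inputs differ enough to produce an output pulse."""
    return abs(device - reference) > threshold


def detected_faults(threshold):
    """Names of the Table II faults that would produce a comparator pulse."""
    return sorted(name for name, current in FAULTY.items()
                  if comparator_pulse(FAULT_FREE, current, threshold))


print(len(detected_faults(5.0)), "of", len(FAULTY), "faults detected")
```

With a small threshold every fault in Table II is flagged; raising the threshold above about 11.8 (the R3 difference) is the first point at which a single fault escapes detection.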

V. CONCLUSIONS
In this paper, a BIST using a transient current approach for fault detection is implemented, and its effectiveness has been tested using a simple memory architecture with single and multiple faults. The presence of a fault is shown as a pulse at the output of the circuit, and the absence of a fault is indicated by a zero DC output. The BIST is also effective for detecting faults in a 4x4 SRAM array.
This BIST architecture may be extended to detect faults in larger SRAM arrays, and a novel high-performance BIST, meaning one with high speed, low power and area efficiency, can be designed.
Figure 8: Output of BIST circuit

C. Test for SRAM with multiple faults
Multiple faults are added to a single SRAM cell. These can be multiple open faults only, multiple bridging faults only, or a combination of open and bridging faults. A random combination of faults in an SRAM cell also gives a different transient current when compared with a fault-free cell. This comparison is shown in Table III.

TABLE III
TRANSIENT CURRENT IN SRAM CELL WITH MULTIPLE FAULTS

Resistances introduced   Fault-free SRAM transient current (µA)   Faulty SRAM transient current (µA)
R1, R2                   -58.767                                   -21.563
R1, R4                   -58.767                                   -89.001
R4, R5                   -58.767                                   -159.897
R4, R5, R10              -58.767                                   -47.688
R5, R11                  -58.767                                   -161.580
R6, R8                   -58.767                                   -12.890
R7, R9, R4               -58.767                                   -19.716

REFERENCES
[1] Semiconductor Industry Association, "International Technology Roadmap for Semiconductors," 2005.
[2] Semiconductor Industry Association, "International Technology Roadmap for Semiconductors," 2011.
[3] Van de Goor, "Using March Tests to Test SRAMs," IEEE Design and Test of Computers, pp. 8-14, March 1993.
[4] Luigi Dilillo, Patrick Girard, "Dynamic Read Destructive in Embedded-SRAMs: Analysis and March Test Solution," 9th IEEE European Test Symposium, Congress Center, Ajaccio, Corsica, France, May 2004.
[5] Rubio, J. Figueras, J. Segura, "Quiescent current sensor circuits in digital VLSI CMOS testing," Electronics Letters, vol. 26, pp. 1204-120, 1990.
[6] Balachandran, "Improvement of SRAM-based failure analysis using calibrated Iddq testing," Proceedings of the 14th VLSI Test Symposium, IEEE, pp. 130-136, 1996.
[7] Sunil Jadav, Vikrant, Munish Vashisath, "Design and performance analysis of ultra low power 6T SRAM using adiabatic technique," International Journal of VLSI Design & Communication Systems (VLSICS), vol. 3, no. 3, June 2012.
[8] R. Rodriquez, "Resistance Characterization of Interconnect Weak and Strong Open Defects," IEEE Design & Test of Computers, vol. 19, no. 5, pp. 18-26, Sept. 2002.
[9] Luigi Dilillo, Patrick Girard, "Resistive-Open Defects in Embedded-SRAM Core Cells: Analysis and March Test Solution," 13th Asian Test Symposium, Kenting, Taiwan, Nov. 15-17, 2004.
[10] Fonseca, "Analysis of resistive-bridging defects in SRAM core-cells: A comparative study from 90nm down to 40nm technology nodes," 15th IEEE European Test Symposium (ETS), 2010.
[11] S. Hamdioui and A. J. Van De Goor, "An Experimental Analysis of Spot Defects in SRAMs: Realistic Fault Models and Tests," Proc. of IEEE Asian Test Symposium, pp. 131-138, 2000.
[12] Suriya A. Kumar, Rafic Z. Makki, and David Binkley, "iDDT Testing of CMOS Embedded SRAMs," Design, Automation and Test in Europe Conference and Exhibition, Proceedings, 2002.
[13] Doe Yoon, Hong-Sik Kim, and Sungho Kang, "Dynamic Power Supply Current Testing for Open Defects in CMOS SRAMs," ETRI Journal, vol. 23, no. 2, June 2001.
[14] Gábor Gyepes, Daniel Arbet, Juraj Brenkus and Viera Stopjaková, "Application of IDDT test towards increasing SRAM reliability in nanometer technologies," 2012 IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), April 2012.
[15] Kiyoo Itoh, VLSI Memory Chip Design, Springer Series in Advanced Microelectronics, vol. 5, 2010.


Integrating Embedded Test Infrastructure in SRAM Cores to Detect Aging

W. Prates, L. Bolzani, G. Harutyunyan, A. Davtyan, F. Vargas, Y. Zorian

Electrical Engineering Dept., Catholic University PUCRS, Porto Alegre, Brazil
vargas@pucrs.br, leticia@poehls.com

Synopsys, Yerevan, Armenia
Synopsys, CA, USA
Gurgen.Harutyunyan@synopsys.com, Arman.Davtyan@synopsys.com, Yervant.Zorian@synopsys.com

Abstract: One of the most important phenomena degrading nanoscale Static Random Access Memory (SRAM) reliability is Negative-Bias Temperature Instability (NBTI). This paper presents the integration of the OCAS (On-Chip Aging Sensor) approach in the design methodology of […] nm single-port SRAM cores. The goal is to enhance the current on-chip test and repair infrastructure so that it detects SRAM aging during system lifetime. OCAS is able to detect the aging state of a cell in the SRAM array. The strategy is based on connecting one OCAS per SRAM column. Each OCAS periodically performs off-line testing by monitoring write operations into the SRAM cells to detect aging. The approach is application-transparent, since it does not change the SRAM content after testing. SPICE simulations allowed us to analyze the OCAS sensitivity to early aging states in this very deep-submicron technology. They also allowed us to analyze the area, power, and performance penalties due to sensor insertion.

Keywords: SRAM, NBTI, Aging Monitoring, On-Chip Aging Sensor, SRAM reliability

I. INTRODUCTION

In Very Deep Sub-Micron (VDSM) technology, lifetime reliability has become one of the key design factors for guaranteeing the robustness of Static Random Access Memories (SRAMs). In this context, Negative Bias Temperature Instability (NBTI) has become the most prominent effect, because it creates interface states along the whole silicon-oxide interface. NBTI has been shown to be the major reliability-limiting factor when the gate oxide is thinner than […] nm […][…].

In this paper we present the integration of the OCAS approach in the design methodology of […] nm single-port SRAM cores. The goal is to enhance the current Synopsys on-chip test and repair infrastructure (STAR Self-Test and Repair Solution […]) so that it detects SRAM aging during system lifetime. The STAR Memory System solution was developed within Synopsys DesignWare, allowing users to create, integrate and verify embedded memory test and repair infrastructure in systems-on-chip. STAR detects a wide range of realistic faults: resistive, performance, static and dynamic, linked and unlinked, and process-variation faults. Based on SPICE simulations, we investigated two situations: a) the aging faults and their impact on […] nm SRAMs, and b) the OCAS sensitivity to such faults in the target […] nm technology.

Several previous works have addressed the NBTI problem. Among them, […] proposed an approach for alleviating NBTI-induced aging effects. In particular, those authors demonstrate how intelligent software-directed data allocation strategies can extend the lifetime of partitioned scratchpad memories by distributing idleness across memory sub-banks. However, it must be observed that they do not detect aging; they simply minimize its effect on the SRAM.

Another approach […] uses an experimentally verified NBTI model to study DC noise margins in conventional […]T SRAM cells as a function of NBTI degradation in the presence of process variations. This approach is interesting, but it has three drawbacks:
a) the presented design-for-testability (DFT) block is mostly analog, so it may be very sensitive to process variations and aging effects;
b) the approach requires modifications of the row decoder block and write circuitry, which increases design complexity more than simply adding an on-chip sensor;
c) to perform the SRAM aging test in the field, the whole memory context must first be saved in a spare memory, because the test procedure overwrites the application data stored in memory.
For high-availability applications, long downtime periods may be a serious restriction.

Another approach […] presents a compact on-chip sensor design that tracks NBTI for SRAMs. The sensor is embedded in the SRAM array and takes the form of a […]T SRAM cell. This approach typically requires hundreds or thousands of sensors to achieve decent sensing precision. For example, for a […] Mbit SRAM, a system of one thousand sensors was designed, each monitoring a small subset of cells, which results in a large area overhead. Additionally, if the sensor is driven into a fault state (i.e., it turns into an aged cell), this does not mean that the neighboring cells are defective as well. Conversely, neighboring cells may have aged while the sensor has not, which is even worse. Both conditions degrade the approach's reliability.

In […] the authors presented the first version of the On-Chip Aging Sensor (OCAS) approach. That sensor was designed in a commercial […] nm technology.

978-1-4799-0664-2/13/$31.00 ©2013 IEEE

II. OCAS APPROACH

Fig. […] depicts the general block diagram of the OCAS approach, indicating the connection between the aging sensor and a SRAM column. As observed, transistor TT is connected between the real VDD and the virtual VDD node, VDD(Virtual). This node feeds the positive bias to the cells of a SRAM column.

During Normal Operating Mode, TT is on and OCAS is powered off by transistors TPG and TNG (Fig. […]). During Testing Mode, TPG and TNG are turned on, which powers OCAS on, and TT is switched off. At this moment, a write operation (or a sequence of write operations) is performed on the specific memory cell whose aging state we want to measure. OCAS then compares the VDD node voltage at the end of the write operation with a Reference Voltage value previously adjusted inside the sensor, and makes the pass/fail decision. At the end of this process, the OCAS output (OUT) yields a logic […] for a fault-free (fresh) SRAM cell. It yields a logic […] for a fault state, i.e., the cell is no longer reliable because of its advanced aged state.

Figure […]. General block diagram of the proposed approach.

Fig. […] presents the detailed OCAS schematics. The control signal CTRL is set to […] during the precharge phase of the Testing Mode, and to […] during the evaluation phase.

During the precharge phase, the CTRL signal drives transistors TC[…], T[…], T[…], T[…] and T[…] to the ON state. The signals to be checked are driven into the voltage comparator formed by transistors M[…] to M[…].

In the evaluation phase that follows, CTRL is set to […]. This turns TC[…] and T[…] to T[…] off, and TC[…], T[…] and T[…] on. M[…] to M[…] can then evaluate the input signal (the voltage at VDD) against the Reference Voltage value generated by resistors R[…] and R[…]. If the sensed value coming from VDD is lower than the Reference Voltage, the cell is still fresh. Otherwise, the cell is considered aged and the OCAS output (OUT) is set to […].

This process is executed for each cell in the array whose aging state we want to measure. In general form, the following steps are carried out to measure the aging state of a given SRAM cell:
1) Select the desired cell's address and read the cell;
2) Change the Testing Mode signal from […] to […] along with the column whose cell is to be tested;
3) Drive the Power Gating and CTRL signals to […] (Pre-Charge Phase);
4) Write the opposite of the value read in step 1);
5) Write the value read in step 1);
6) Drive the CTRL signal to […] (Evaluation Phase) and observe the OCAS output for a pass/fail decision;
7) Return the Power Gating signal to […] and the Testing Mode signal to […], along with the column whose cell was tested;
8) If there are more cells to be checked, repeat the process from step 1); otherwise, stop testing.

OCAS also embeds a small circuit that self-tests the sensor before it is activated to monitor the SRAM cells. This circuit (not shown in Fig. […]) is formed by two resistors, R[…] and R[…] in series. They are arranged in the same configuration as resistors R[…] and R[…], but connected to the drain of transistors M[…] and M[…]. The voltage produced at this node equals the one produced at VDD after a sequence of two write operations into a faulty (aged) cell during the Testing Mode. So, when the OCAS self-test is activated, OUT is expected to indicate an error (logic level […]).

Footnote: We use the Power Gating Technique to switch transistors TPG and TNG off during the Normal Operating Mode, so as to avoid aging the OCAS circuitry.

III. INTEGRATING OCAS AND SRAM CORES



This section presents the integration of the OCAS approach in the design methodology of […] nm single-port SRAM cores. Fig. […] summarizes the block diagram of the OCAS instantiation in the Synopsys on-chip test and repair infrastructure (STAR Self-Test and Repair Solution), which is used to detect SRAM aging during system lifetime. The STAR Memory System solution was developed within Synopsys DesignWare, allowing users to create, integrate and verify embedded memory test and repair infrastructure in systems-on-chip. The BIST block detects a wide range of realistic faults: resistive, performance, static and dynamic, linked and unlinked, and process-variation faults. For a detailed description of this block, readers should refer to existing documentation […]. Numbers […] to […] identify the following control signals: Run_BIST; Power Gating; Reference Voltage (generated outside the core); Testing Mode; CTRL; SensorOut_[…]; SensorOut_[…] (see also Fig. […]).

The OCAS integration in the STAR Memory System is not yet implemented, although some prototyping has already been done for this purpose. For our experiments, a dedicated SRAM containing […] […]-bit words was therefore designed as a case study to integrate OCAS in the SRAM design methodology. In more detail, the memory consisted of […] columns, each column consisting of […] cells. For this purpose, the following changes were made to the […] nm SRAM core architecture:
• The sensor is integrated within the memory, and its pins are brought out so that it can be controlled from outside the memory;
• The VDD node of the column connected to the sensor was separated from the VDD node of the other columns by adding transistor TT (Figs. […] and […]);
• The separated VDD (i.e., VDD(Virtual)) is connected to the sensor;
• The length of all transistors in the sensor was scaled to L = […] µm. The sensor was originally proposed, designed and validated in […] nm technology […] with L = […] µm. The widths of all transistors in the sensor were scaled by the same ratio as the lengths (ratio […]). For example, the width of transistors T[…] and T[…] went from W = […] µm to W = […] µm.

2013 IEEE 19th International On-Line Testing Symposium (IOLTS)

Figure […]. Schematics of the proposed approach.

IV. EXPERIMENTS

This section describes the SPICE simulations performed to investigate the OCAS sensitivity to aging faults in the target […] nm technology. It also covers the area, power and performance penalties due to sensor insertion. For this purpose, the following actions were carried out:
• OCAS was integrated in the SRAM core. An aging fault was injected into one of the cells of the column to which the sensor is connected. The fault was injected by adding a voltage source in series with the gate of the pMOS transistors of that cell. For details on how to connect the voltage source, please refer to […].
• Two, four, six and eight write operations were performed under different technology corner cases (TT, SS, FF, FS, SF).

Figure […]. Block diagram of the OCAS integration in a SRAM core.

• The memory write operations were performed at […] MHz. The memory and the sensor can operate at even higher frequencies; we simply selected a middle value from the operating range.
• To evaluate the sensor sensitivity with respect to time, the duration of the OCAS precharge phase was set to […] ns. This value was defined arbitrarily. We varied it from […] ns to […] ns and saw no impact on the sensor operation.
• In this target technology, VDD = […] V and Vthp = […] V (pMOS threshold voltage). Vthp shifts on average by […]% per year. Hence, […]-year aging corresponds to an increase in Vthp of […] mV. For […] years, Vthp is shifted by […] mV, and so on.
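The aging arithmetic in the last item is linear bookkeeping: each year adds the same percentage of the fresh Vthp. The actual VDD, Vthp and per-year percentage did not survive in this copy, so the sketch below takes them as parameters. The example values (0.4 V, 1% per year) are hypothetical and only show the calculation. The sketch mirrors the text's linear progression. NBTI degradation is more commonly modeled as a sub-linear power law in stress time, so the linear rule should be read as a first-order convenience.

```python
def vthp_shift_mv(vthp_v: float, pct_per_year: float, years: float) -> float:
    """Increase of |Vthp| in mV after `years`, assuming a constant percentage of
    the fresh threshold voltage is added each year (linear in time)."""
    return vthp_v * (pct_per_year / 100.0) * years * 1000.0


def aged_vthp_v(vthp_v: float, pct_per_year: float, years: float) -> float:
    """Threshold-voltage magnitude after aging, in volts (same linear assumption)."""
    return vthp_v + vthp_shift_mv(vthp_v, pct_per_year, years) / 1000.0


if __name__ == "__main__":
    VTHP_V = 0.4        # hypothetical fresh |Vthp|, V
    PCT_PER_YEAR = 1.0  # hypothetical average shift per year, percent
    for years in range(1, 5):
        shift = vthp_shift_mv(VTHP_V, PCT_PER_YEAR, years)
        print(f"{years} year(s): +{shift:.3f} mV")
```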
Table I illustrates some of the obtained results. Consider, for instance, the leftmost column with Corner Case equal to FF (T = […] °C, VDD = […] V), where "Aging fault detection by […] Writes" reads […]. This means that OCAS does not detect a […]-year aging fault after […] write operations (which corresponds to Vthp = […] V). It does, however, detect a […]-year aging fault (Vthp = […] V) after the same sequence of write operations.

Table I. OCAS sensitivity as a function of the number of write operations into the SRAM cell.

Fig. […] presents the OCAS sensitivity for the value […] displayed in the second column, second line of Table I (surrounded by a circle). As observed in Fig. […]a, OCAS did not detect the […]-year aging fault after […] write operations into the cell. With the same sequence, it did detect a […]-year aging fault (Fig. […]b). For this simulation set, the reference voltage at the VDD(Virtual) node was set to […] V. This reference value was computed in a previous simulation by observing the voltage at the virtual VDD node after […] write operations into the target cell.

Fig. […] illustrates the OCAS sensitivity for the value […] displayed in the fourth column, third line of Table I (surrounded by a square). OCAS did not detect the […]-year aging fault (Fig. […]a) after […] write operations into the cell. With the same sequence, it did detect a […]-year aging fault (Fig. […]b). For this simulation set, the reference voltage was set to […] V. This reference value was computed in a previous simulation by observing the voltage at the virtual VDD node after […] write operations into a SRAM cell of the same column.

It is assumed that the power gating logic (transistors TPG and TNG) prevents the OCAS circuitry from aging, since it is turned ON only during the short testing periods. However, the decoupling transistor TT is always ON. This transistor decouples the power supply line of the monitored SRAM column by creating a virtual VDD node, and it is turned OFF only during the short testing periods. This functional behavior makes the transistor very sensitive to aging effects, because aging is accelerated in pMOS devices that are turned ON.

To better understand this situation, we performed a set of simulations aging not only the SRAM cell (as in Table I) but also TT. We resimulated the first line of that table (TT, T = […] °C, VDD = […] V), assuming TT aged by […] years. Table II illustrates this situation. Even in the presence of TT wear-out, OCAS performs properly: the first and second lines are equal. The one exception is the result in the fourth column, second line (surrounded by a square). In this case, OCAS could not detect a […]-year aging fault after […] write operations into the cell, but it did detect a […]-year aging fault with the same sequence.

Footnote: The shift value of […]% was selected by considering literature works that suggest typical Vthp changes from […]% to […]% per year […].

Figure […]. SPICE simulation for FF corner (T = […] °C, VDD = […] V): (a) OCAS fails to detect a […]-year aging fault injected in a SRAM cell after a sequence of […] write operations; (b) OCAS succeeds in detecting a […]-year aging fault after the same sequence of write operations.

Figure […]. SPICE simulation for SS corner (T = […] °C, VDD = […] V): (a) OCAS fails to detect a […]-year aging fault injected in a SRAM cell after a sequence of […] write operations; (b) OCAS succeeds in detecting a […]-year aging fault after the same sequence of write operations.

Table II. OCAS sensitivity for TT getting old during system lifetime.
Corner case | Aging fault detection by […] Writes | Aging fault detection by […] Writes | Aging fault detection by […] Writes | Aging fault detection by […] Writes
TT (T = […] °C, Vdd = […] V), TT fresh | […] | […] | […] | […]
TT (T = […] °C, Vdd = […] V), TT aged by […] years | […] | […] | […] | […]

A. Discussions
We now discuss the parameters that directly affect OCAS sensitivity when scaling down from […] to […] nm technology. OCAS sensitivity is a function of the number of write operations that have to be performed during the precharge phase of the Testing Mode. That number depends on two factors:
i) The actual VDD node capacitance, which in turn is a function of the number of cells connected to the column. The larger the VDD node capacitance (many cells connected to the column), the longer the write sequences OCAS needs to check the aging state of a given cell. Conversely, the smaller the capacitance (few cells connected to the column), the shorter the write sequences needed.
ii) The aging state of the cell to be monitored. The older a memory cell is, the weaker its ability to discharge the virtual VDD node during a write operation. This is because its pMOS transistors have an increased threshold voltage and therefore a reduced current drive. Consequently, the older a cell is, the easier it is for OCAS to detect the fault condition.
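The dependence on node capacitance and cell age can be made concrete with a first-order charge-balance model. This is our simplification, not the paper's circuit model. It assumes that:
• the virtual VDD node (capacitance C) starts at V0 once TT is off;
• each write into the monitored cell draws a fixed charge from that node;
• an aged cell draws less charge per write than a fresh one.
After n writes the node sits at V0 − n·Q/C, so fresh and aged cells separate by n·(Qfresh − Qaged)/C. The shortest even sequence that opens a gap at least as large as the sensor sensitivity therefore grows linearly with C. The decision rule follows the text: below Vref means fresh, otherwise aged. All numeric values in the example are hypothetical.

```python
import math


def node_voltage(v0: float, c_node: float, q_per_write: float, writes: int) -> float:
    """Virtual-VDD voltage (V) after `writes` writes, each drawing q_per_write (C)
    from a node of capacitance c_node (F). First-order model, no recharge."""
    return v0 - writes * q_per_write / c_node


def flagged_as_aged(v_node: float, v_ref: float) -> bool:
    """Rule described in the text: below Vref -> still fresh; otherwise -> aged."""
    return v_node >= v_ref


def min_even_writes(c_node: float, q_fresh: float, q_aged: float,
                    sensitivity: float) -> int:
    """Smallest even number of writes whose fresh/aged voltage gap reaches
    `sensitivity` (V). Kept even so the cell's original value is restored."""
    if q_fresh <= q_aged:
        raise ValueError("an aged cell is assumed to draw less charge per write")
    gap_per_write = (q_fresh - q_aged) / c_node
    n = max(1, math.ceil(sensitivity / gap_per_write))
    return n + (n % 2)


if __name__ == "__main__":
    # Hypothetical values: 5 fC (fresh) vs 4.5 fC (aged) per write, 12 mV sensitivity.
    for c in (100e-15, 200e-15):
        n = min_even_writes(c, 5e-15, 4.5e-15, 0.012)
        print(f"C = {c * 1e15:.0f} fF -> {n} writes")
```

Doubling the node capacitance in the example raises the required sequence from 4 to 6 writes. This is the qualitative trend described in item i). The absolute numbers mean nothing beyond the assumed inputs.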

Recent SPICE simulations of a SRAM case study designed in a commercial […] nm technology […] adjusted the OCAS sensitivity to […] millivolts. In such an SRAM, a column contains […] cells. During the second write operation, a cell aged by at least […] year (at a pace of […]% Vthp increase per year) produces a variation larger than […] millivolts at the VDD node. OCAS was therefore able to detect any cell aged by […] year or more after the second write operation.

To guarantee a similar […] millivolt OCAS sensitivity after scaling down to […] nm technology, the number of write operations had to be increased. For example, consider Table I. With short write sequences ([…] writes, Fig. […]), OCAS detected cells aged by at least […] years, but it failed to identify cells aged by […] years. With longer sequences ([…] writes, rightmost column of Table I), OCAS easily detected cells aged by […] year or more.

Note that resistors R[…] and R[…] were included in the design only for simulation purposes. To ensure the […] millivolt OCAS sensitivity, the reference voltage will be generated outside the chip, avoiding variability problems and lack of accuracy.

To connect OCAS to a column containing more than […] cells, it will probably be necessary to use longer write sequences than those in Table I. These would be combined with a longer OCAS precharge phase (longer than […] ns, at least […]) to obtain a variation of at least […] millivolts at the VDD node. The approach's reliability is thus unchanged. The only parameter that must be adjusted is the test duration, since a longer write sequence at a lower application frequency (lower than […] MHz) is needed.

Finally, the number of write operations should be even. This guarantees that the value stored in the memory cell before the test is restored before the memory returns to the Normal Operating Mode. The property is important for periodic off-line testing in the field, because it ensures that application memory content is not lost after the test.
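The eight-step procedure of Section II, together with the even-write rule above, can be sketched as a small behavioral model. Everything in it is hypothetical scaffolding:
• `Column` is a toy list of stored bits;
• the mode and power-gating steps are reduced to comments;
• `sense` stands in for the OCAS comparator output.
The model shows that alternating opposite/original writes leaves the cell holding its original value whenever the write count is even. This is what makes the test application-transparent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stub: sense(row, writes) -> True when the OCAS output signals "aged".
SenseFn = Callable[[int, int], bool]


@dataclass
class Column:
    """Toy model of one SRAM column: one stored bit per row."""
    cells: List[int]


def check_cell(col: Column, row: int, writes: int, sense: SenseFn) -> bool:
    """Behavioral model of the per-cell procedure; returns True if flagged aged."""
    if writes < 2 or writes % 2:
        raise ValueError("use an even number (>= 2) of writes to restore the cell")
    original = col.cells[row]        # step 1: read the selected cell
    # steps 2-3: enter Testing Mode, power OCAS on, precharge (not modelled)
    value = original
    for _ in range(writes):          # steps 4-5, repeated: opposite value, then original
        value ^= 1
        col.cells[row] = value
    aged = sense(row, writes)        # step 6: evaluation phase, read the OCAS output
    # step 7: return to Normal Operating Mode (not modelled)
    if col.cells[row] != original:
        raise RuntimeError("cell content was not restored")
    return aged


def check_column(col: Column, writes: int, sense: SenseFn) -> Dict[int, bool]:
    """Step 8: repeat the procedure for every cell of the column."""
    return {row: check_cell(col, row, writes, sense) for row in range(len(col.cells))}


if __name__ == "__main__":
    column = Column([1, 0, 1, 1, 0])
    print(check_column(column, 4, lambda row, n: row == 3))  # pretend row 3 is aged
    print(column.cells)  # unchanged: [1, 0, 1, 1, 0]
```

An odd write count would leave every tested cell inverted, which is why the sketch rejects it outright rather than silently corrupting application data.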
B. Area Overhead
Consider the case study described in Section III and simulated in Section IV: a SRAM of […] columns, each containing […] cells, plus […] OCAS. For this memory, the transistor-count area overhead due to sensor insertion is […], which is almost negligible.

C. Power Consumption
The power consumed per OCAS insertion is computed in two parts: the static power and the dynamic power. The dynamic power was computed for the sensor operating at […] MHz. In more detail:
Static power consumption (VDD = […] V, leakage current = […] pA): PS = […] pW;
Dynamic power consumption (VDD = […] V) is composed of three components:
a) Switching power on (average current i = […] µA): PD1 = […] µW. This is the power consumed to switch the sensor on when setting POWER_GATING = […];
b) Evaluation Phase (average current i = […] µA): PD2 = […] µW;
c) Switching power off (average current i = […] µA): PD3 = […] µW. This is the power consumed to switch the sensor off when setting POWER_GATING = […].
The final dynamic power consumption (PD = PD1 + PD2 + PD3) is then […] µW. The total power consumed by an OCAS (PT = PS + PD) at the considered technology and frequency is […] pW + […] µW.
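The power bookkeeping above reduces to three relations: the power of each component, PD as the sum of the three dynamic components, and PT = PS + PD. The sketch assumes that each component is the supply voltage times the reported average current. The text lists currents and powers but not the formula, and the actual values are missing from this copy. The numbers in the example are hypothetical.

```python
from typing import Sequence


def static_power_w(vdd_v: float, leakage_a: float) -> float:
    """PS = VDD * leakage current."""
    return vdd_v * leakage_a


def dynamic_power_w(vdd_v: float, avg_currents_a: Sequence[float]) -> float:
    """PD = PD1 + PD2 + PD3, each taken as VDD * average current of its phase
    (switch-on, evaluation, switch-off)."""
    return sum(vdd_v * i for i in avg_currents_a)


def total_power_w(vdd_v: float, leakage_a: float,
                  avg_currents_a: Sequence[float]) -> float:
    """PT = PS + PD."""
    return static_power_w(vdd_v, leakage_a) + dynamic_power_w(vdd_v, avg_currents_a)


if __name__ == "__main__":
    VDD = 1.0                       # hypothetical supply, V
    LEAK = 100e-12                  # hypothetical leakage current, A
    PHASES = (20e-6, 50e-6, 10e-6)  # hypothetical average currents per phase, A
    print(f"PS = {static_power_w(VDD, LEAK) * 1e12:.1f} pW")
    print(f"PD = {dynamic_power_w(VDD, PHASES) * 1e6:.1f} uW")
    print(f"PT = {total_power_w(VDD, LEAK, PHASES) * 1e6:.4f} uW")
```

In the text, PS is reported in picowatts and PD in microwatts, so PT is dominated by the dynamic term.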

D. Performance Degradation
Another issue is the delay increase (performance degradation) that integrating the OCAS circuitry into a SRAM memory could cause. Transistor TT may limit the power supply current (IDD) that flows between VDD and Gnd, and from VDD to the bit node, when a cell is accessed for a read or a write operation. We therefore performed a set of simulations with and without transistor TT connected to the power supply line, and measured the delay of a read and a write operation into the cell. Fig. […] depicts this situation. The simulations showed that the delays incurred during read and write operations in the presence of TT are negligible ([…] s for reads and […] s for writes).

Figure […]. Performance degradation due to OCAS insertion: (a) read operations; (b) write operations. Each panel compares the delay with and without TT.

V. FINAL CONSIDERATIONS
This paper presented the integration of the On-Chip Aging Sensor (OCAS) approach in the design methodology of […] nm single-port SRAM cores. The final goal is to allow the Synopsys on-chip test and repair infrastructure (STAR Self-Test and Repair Solution) to detect SRAM aging during system lifetime. SPICE simulations were performed to understand the aging impact on SRAM cores. They were also used to analyze the OCAS sensitivity to early aging states in this VDSM technology.

The OCAS sensitivity can be adjusted by combining, to different degrees, two or more of the following operational parameters:
• the number of write operations into the cell to be monitored;
• the number of cells connected per SRAM column;
• the duration of the OCAS precharge phase, which directly affects the clock frequency used for the memory test.

Moreover, the estimated area overhead due to sensor insertion was on the order of […]%, and power consumption was […] µW per sensor.

ACKNOWLEDGMENT
This work has been partially funded by CNPq (Science and Technology Foundation, Brazil) under contracts no. […] (PQ) and no. […] (Universal).

REFERENCES

[1] S. Mahapatra, D. Saha, D. Varghese, P. B. Kumar, "On the Generation and Recovery of Interface Traps in MOSFETs Subjected to NBTI, FN, and HCI Stress," IEEE Trans. Electron Dev., […].
[2] Ing-Chao Lin, Chin-Hong Lin, Kuan-Hui Li, "Leakage and Aging Optimization Using Transmission Gate-Based Technique," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. […], No. […], Jan. […], pp. […].
[3] C. Ferri, D. Papagiannopoulou, R. Iris Bahar, A. Calimera, "NBTI-Aware Data Allocation Strategies for Scratchpad Memory Based Embedded Systems," […]th IEEE Latin American Test Workshop (LATW), […].
[4] F. Ahmed, L. Milor, "Reliable Cache Design with On-Chip Monitoring of NBTI Degradation in SRAM Cells using BIST," […]th IEEE VLSI Test Symposium (VTS), pp. […].
[5] Z. Qi, J. Wang, A. Cabe, S. Wooters, T. Blalock, B. Calhoun, M. Stan, "SRAM-Based NBTI/PBTI Sensor System Design," Design Automation Conference (DAC), […].
[6] A. Ceratti, T. Copetti, L. Bolzani, F. Vargas, "Investigating the Use of an On-Chip Sensor to Monitor NBTI Effect in SRAM," […]th IEEE Latin American Test Workshop (LATW), […].
[7] J. Hicks, D. Bergstrom, M. Hattendorf, J. Jopling, J. Maiz, S. Pae, C. Prasad, J. Wiedemer, "[…] nm Transistor Reliability," Intel Technology Journal, Vol. […], Issue […], June […], ISSN […]X, DOI […].
[8] K. Darbinyan, G. Harutyunyan, S. Shoukourian, V. Vardanian, and Y. Zorian, "A robust solution for embedded memory test and repair," IEEE Asian Test Symposium (ATS), pp. […].



IEEE ELECTRON DEVICE LETTERS, VOL. 35, NO. 3, MARCH 2014


Impact of Single Trap Random Telegraph Noise on Heterojunction TFET SRAM Stability
Rahul Pandey, Student Member, IEEE, Vinay Saripalli, Jaydeep P. Kulkarni,
Vijaykrishnan Narayanan, Fellow, IEEE, and Suman Datta, Fellow, IEEE
Abstract: We investigate the degradation induced by single-charge-trap random telegraph noise (RTN) in III-V heterojunction tunnel FET (HTFET)-based SRAM. Our analysis focuses on a variation-tolerant ten-transistor SRAM based on the Schmitt trigger (ST) mechanism. We compare iso-area SRAM cell configurations in Si-FinFET and HTFET. Our results show that HTFET ST SRAMs provide significant energy/performance enhancements even in the presence of RTN. For sub-0.2 V operation (Vcc), HTFET ST SRAM offers a 15% improvement in read-write noise margins over Si-FinFET ST SRAM, along with better immunity to RTN-induced variation. A comparison with iso-area 6T Si-FinFET SRAM using wider transistors shows a 43% improvement in read noise margin for 10T HTFET ST SRAM at Vcc = 0.175 V. In addition, HTFET ST SRAM exhibits 48X lower read access delay and 1.5X lower power consumption than Si-FinFET ST SRAM, each operating at its respective Vcc-min.
Index Terms: Heterojunction TFET, random telegraph noise (RTN), trap, electrical noise, TCAD simulation, SRAM.

I. INTRODUCTION

Random Telegraph Noise (RTN) is a prominent source of threshold voltage fluctuation, ΔVTh, in MOSFETs [1]. For sub-14 nm technology nodes, ΔVTh from RTN is expected to exceed that from random dopant fluctuation [2], which has so far been the dominant source of variation for sub-threshold MOSFETs. RTN scales inversely with device footprint, so SRAM, which uses minimum-sized transistors in the cell, is the design most vulnerable to it. Hence, it is of great significance to explore the RTN immunity of SRAM designs using CMOS and post-CMOS device technologies.
RTN in SOI-based Tunnel FETs has been studied through simulations in [3] and experimentally in [4]. At very low supply voltage, Vcc, the compound semiconductor (III-V) Heterojunction Tunnel FET (HTFET) has emerged as an alternative to the conventional subthreshold MOSFET, owing to its high on-current and sub-60 mV/decade subthreshold slope [5], [6]. HTFET-based Schmitt-Trigger SRAM (ST2 SRAM topology [7]) has been shown to offer improved read/write noise margins, with sufficient variation tolerance, compared to Si-FinFET ST2 SRAM [8]. Here we specifically analyze the RTN tolerance of both 6-transistor (6T) SRAM and 10T ST2 SRAM, for both Si-FinFET and HTFET.

Manuscript received December 20, 2013; accepted January 12, 2014. Date of publication January 31, 2014; date of current version February 20, 2014. This work was supported by the National Science Foundation ASSIST Nanosystems ERC under Award EEC-1160483. The review of this letter was arranged by Editor D. Ha.
R. Pandey, V. Saripalli, V. Narayanan, and S. Datta are with The Pennsylvania State University, University Park, PA 16802 USA (e-mail: rop5090@psu.edu).
J. P. Kulkarni is with the Circuit Research Laboratory, Intel Corporation, Hillsboro, OR 97124 USA.
Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/LED.2014.2300193
II. SIMULATION METHODOLOGY
Each transistor exhibits a threshold voltage variation, ΔVTh, caused by the trapping and de-trapping of charge carriers at the interface trap site. Since each transistor's trap is either empty or occupied, RTN in an n-transistor SRAM cell produces 2^n unique RTN cell variants [9]. We analyze all 2^n combinations to identify the impact on SRAM read/write noise margins and to examine the cell's immunity to RTN-induced variation. A limitation of this approach [9] is that it does not capture the time evolution of RTN. However, since we are quantifying the worst-case RTN-induced degradation across two different technologies, the approach still serves as a useful indicator.
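The 2^n enumeration can be sketched in a few lines. Each transistor's trap is either empty or occupied, so an n-transistor cell has 2^n RTN variants, and the worst case is the variant with the lowest noise margin. In this sketch the transistor names, the per-device ΔVTh values and the margin function are hypothetical placeholders. In the letter, each variant is evaluated by circuit simulation, not by a closed-form function.

```python
from itertools import product
from typing import Callable, Dict, Mapping, Sequence, Tuple

States = Tuple[int, ...]  # 1 = trap occupied (its ΔVTh applied), 0 = empty


def enumerate_variants(names: Sequence[str], dvth: Mapping[str, float],
                       margin: Callable[[Dict[str, float]], float]) -> Dict[States, float]:
    """Evaluate `margin` for all 2**len(names) trap-occupancy patterns."""
    results: Dict[States, float] = {}
    for states in product((0, 1), repeat=len(names)):
        shifts = {n: s * dvth[n] for n, s in zip(names, states)}
        results[states] = margin(shifts)
    return results


def worst_case(results: Dict[States, float]) -> Tuple[States, float]:
    """Variant with the smallest noise margin."""
    states = min(results, key=results.__getitem__)
    return states, results[states]


if __name__ == "__main__":
    names = [f"M{i}" for i in range(1, 11)]   # hypothetical 10T cell
    dvth = {n: 0.002 for n in names}          # hypothetical 2 mV shift per trap

    def toy_margin(shifts: Dict[str, float]) -> float:
        """Placeholder margin in mV: every occupied trap costs 2 mV."""
        return 80.0 - 1000.0 * sum(shifts.values())

    res = enumerate_variants(names, dvth, toy_margin)
    print(len(res), worst_case(res))
```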
The device parameters and the calibration methodology for both the Si-FinFET and the HTFET used in the SRAM simulations are provided in [10]. The drain current fluctuation from RTN, ΔID, is transformed into ΔVTh through the transconductance, gm, at each bias point [11]. For Si-FinFET, ΔID was modeled following [12]. For HTFET we use the following analytical expression, calibrated against TCAD simulations over a Vcc range of 0.1 V to 0.5 V [10]:


$$\frac{\Delta I_D}{I_D} = \left(\frac{B}{F^{2}} + \frac{2}{F}\right)\frac{\alpha\, q}{\varepsilon_{ch} W L} \qquad (1)$$
where F is the electric field at the source-channel tunneling junction, and L (the tunneling distance of carriers) and the constant B are as defined in [10]. ε_ch is the channel electrical permittivity, and α = 0.5 is an empirical parameter.

In the transistor-level RTN models, we assumed trap locations that give rise to worst-case device-level RTN in the 0.1 V-0.5 V operating range [10]. For HTFET, the trap is located 2 nm from the tunnel junction; for Si-FinFET, it is positioned near the mid-channel region. For HTFET, the impact on SRAM performance of a trap located at the tunnel junction is also discussed at the end of Section III. RTN from that trap location does not produce worst-case noise margins over the practical Vcc range for SRAM operation. For that trap location, ΔID/ID extracted from TCAD simulation is used directly in a Verilog-A lookup-table-based transistor model.

Details of the circuit simulation methodology are presented in [8]. The RTN-induced fluctuation in VTh is integrated into the SRAM design for circuit-level simulation in the Cadence Spectre circuit simulator [13].
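Eq. (1) and the gm conversion can be evaluated numerically. The sketch assumes Eq. (1) has the form ΔID/ID = α(B/F² + 2/F)·q/(ε_ch·W·L). The bracketed factor is d(ln I)/dF for a Kane-type tunneling current I ∝ F²·exp(−B/F). We read q/(ε_ch·W·L) as the field perturbation from one trapped charge. W is not defined in the text and is taken here to be the device width. The conversion ΔVTh = ΔID/gm follows the text. All numeric inputs in the example are hypothetical, not the calibrated values of [10].

```python
import math

Q_E = 1.602176634e-19     # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m


def dln_current_dfield(F: float, B: float) -> float:
    """d(ln I)/dF for I = A * F**2 * exp(-B / F), i.e. 2/F + B/F**2 (in m/V)."""
    return 2.0 / F + B / F**2


def rel_current_fluctuation(F: float, B: float, eps_ch: float, W: float, L: float,
                            alpha: float = 0.5) -> float:
    """ΔI_D / I_D according to the assumed form of Eq. (1)."""
    return dln_current_dfield(F, B) * alpha * Q_E / (eps_ch * W * L)


def delta_vth(delta_id: float, gm: float) -> float:
    """ΔV_Th = ΔI_D / g_m at the given bias point (A / (A/V) = V)."""
    return delta_id / gm


if __name__ == "__main__":
    # Hypothetical: F = 5 MV/cm, B = 20 MV/cm, eps_ch = 15*eps0, W = 20 nm, L = 5 nm.
    r = rel_current_fluctuation(5e8, 2e9, 15 * EPS0, 20e-9, 5e-9)
    i_d, g_m = 1e-6, 1e-5  # hypothetical drain current (A) and transconductance (S)
    print(f"dI/I = {r:.3e}, dVth = {delta_vth(r * i_d, g_m) * 1e3:.3f} mV")
```

A finite-difference check of `dln_current_dfield` against ln(F²·exp(−B/F)) is a quick way to confirm the 2/F + B/F² factor before trusting the rest.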
III. RESULTS
Because of the unidirectional conduction in HTFET, which results from its asymmetric source-drain architecture, 6T HTFET SRAM cannot perform simultaneous read and write operations [8]. 6T SRAM shows significant degradation in Read Noise Margin (RNM) due to RTN, as shown later [Fig. 4(a)]. Hence, we explore the RTN tolerance of the ST2 SRAM topology, which is based on the Schmitt Trigger (ST) mechanism. This topology has been shown to exhibit variation immunity and to be suitable for ultralow-Vcc operation [7]. We use the same transistor sizing scheme as [8] in order to compare the RTN performance of HTFET-based and Si-FinFET-based ST2 SRAM.

0741-3106 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Fig. 1. 10T ST2 SRAM (a) read schematic (b) RNM in presence of RTN in 1024 possible cell types (c) (d) Worst case RTN RNM, FinFET and HTFET.
A 10T ST2 SRAM cell in read mode is shown in Fig. 1(a). With RTN in each transistor, a total of 2^10 = 1024 cell combinations are possible. Fig. 1(b) depicts the RNM distribution at Vcc = 0.25 V. The intrinsic (RTN-free) RNM for HTFET and FinFET ST2 SRAM is 83.9 mV and 81.7 mV, respectively. The higher Ion and Ion/Ioff ratio of HTFET improve the intrinsic RNM of HTFET ST2 SRAM over Si-FinFET. However, the worst-case RNM degradation due to RTN is still comparable: 12.7% for HTFET and 11% for Si-FinFET. The percentage RNM degradation is very sensitive to Vcc, as discussed later. The worst-case RNM [Fig. 1(c) and (d)] results from RTN in pull-up transistor PL and pass-gate transistor NFL, together with RTN in pull-down transistor NR2.
Fig. 2 depicts the Write Noise Margin (WNM) of the ST2 SRAM cell at Vcc = 0.25 V. The intrinsic (no-RTN) WNM of HTFET, 183.6 mV, is lower than the 189.2 mV of Si-FinFET. Even so, the worst-case percentage WNM degradation due to RTN is higher in Si-FinFET: 5.11%, against 4.57% for HTFET. The worst-case WNM is caused by RTN in pull-up transistor PL and pass-gate transistor NFL, together with RTN in pull-down transistor NR2 and pass-gate transistor AXRWR.
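The quoted percentages can be turned into absolute worst-case margins at Vcc = 0.25 V with a one-line calculation. This assumes each percentage is relative to the corresponding RTN-free margin. The inputs below are the values stated in the two paragraphs above; the outputs are derived arithmetic, not numbers reported in the letter.

```python
def worst_case_margin(intrinsic_mv: float, degradation_pct: float) -> float:
    """Worst-case margin (mV) after a relative degradation given in percent."""
    return intrinsic_mv * (1.0 - degradation_pct / 100.0)


# (intrinsic margin in mV, worst-case RTN degradation in %) at Vcc = 0.25 V
RNM = {"HTFET": (83.9, 12.7), "Si-FinFET": (81.7, 11.0)}
WNM = {"HTFET": (183.6, 4.57), "Si-FinFET": (189.2, 5.11)}

if __name__ == "__main__":
    for label, table in (("RNM", RNM), ("WNM", WNM)):
        for tech, (mv, pct) in table.items():
            print(f"{label} {tech:9s}: {worst_case_margin(mv, pct):7.2f} mV")
```

With these inputs, the worst-case RNM is about 73.2 mV (HTFET) versus 72.7 mV (Si-FinFET). The worst-case WNM is about 175.2 mV versus 179.5 mV. At this supply, HTFET's smaller percentage WNM degradation does not fully offset its lower intrinsic WNM.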
The effect of Vcc scaling on the RTN impact on
RNM/WNM is depicted in Fig. 3(a) and (b). Both worst case
and best case changes in the SRAM noise margin due to
RTN are shown in Fig. 3, which is essential to capture RTN
tolerance of SRAM cell. Intrinsic read/write noise margins
of ST2 SRAM improve in HTFET design for sub-0.225 V,
over subthreshold Si-FinFET. This is a direct consequence
of higher Ion/Ioff ratio coupled with increased on-current in
HTFET at ultra-low Vcc, which both reduces the influence of
trap on channel carriers by screening, as well as improves the

IEEE ELECTRON DEVICE LETTERS, VOL. 35, NO. 3, MARCH 2014

Fig. 2. 10T ST2 SRAM (a) write schematic (b) WNM in presence of RTN in
1024 possible cell types (c) (d) Worst case RTN WNM, FinFET and HTFET.

Fig. 3. 10T ST2 SRAM (a) RNM, and (b) WNM trend with Vcc scaling
in presence of RTN. Percent change in RNM (c) and WNM (d) indicates
HTFET ST2 SRAM is more immune to RTN induced variation.

efficiency of Schmitt feedback action [8] (thereby benefitting


the noise margin). Fig. 3(c) and (d) display the percentage
variation in RNM/WNM of ST2 SRAM with Vcc scaling.
HTFET ST2 SRAM displays a symmetric change in noise
margin (best-case/worst-case RTN) for both read and
write operations, whereas Si-FinFET SRAM shows significant
RNM degradation (>30%) below 0.2 V due to the extremely low
on-currents in RTN-affected subthreshold devices (which also
deteriorate the Schmitt feedback mechanism and hence the noise
margins). Hence, HTFET ST2 SRAM exhibits overall better
immunity against RTN-induced variation in noise margin than
subthreshold Si-FinFET ST2 SRAM at ultra-low
Vcc. At Vcc = 0.15 V, with worst-case RTN, HTFET ST2 SRAM
offers 15.8% and 17.2% improvement in RNM and WNM
over Si-FinFET.
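The "1024 possible cell types" of Fig. 2 equals 2^10, which is consistent with (we assume) one two-state trap per transistor of the 10T cell. The sketch below enumerates every trap-occupancy pattern and reports the best- and worst-case percentage change of a noise margin relative to its no-RTN value, the quantity plotted in Fig. 3(c) and (d). The `margin` callable is a hypothetical stand-in for a circuit simulation, and `toy_margin` is an illustrative model only.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def rtn_extremes(
    margin: Callable[[Sequence[int]], float],
    nominal: float,
    n_devices: int = 10,
) -> Tuple[int, float, float]:
    """Sweep all trap-occupancy patterns and return (count, best %, worst %).

    Each device is modelled as holding one trap that is either empty (0) or
    filled (1), giving 2**n_devices patterns. Percent change is measured
    against `nominal`, the no-RTN noise margin.
    """
    count = 0
    best = float("-inf")
    worst = float("inf")
    for states in product((0, 1), repeat=n_devices):
        change = 100.0 * (margin(states) - nominal) / nominal
        best = max(best, change)
        worst = min(worst, change)
        count += 1
    return count, best, worst


def toy_margin(states: Sequence[int]) -> float:
    """Hypothetical model: each filled trap costs 1 mV from a 100 mV margin."""
    return 100.0 - sum(states)


print(rtn_extremes(toy_margin, nominal=100.0))  # (1024, 0.0, -10.0)
```

The toy model only ever degrades the margin; in a real cell some trap patterns raise it, which is why Fig. 3 reports a best case as well as a worst case.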
It is important to compare the RTN performance of iso-area 6T Si-FinFET SRAM against 10T ST2 HTFET SRAM,
as the influence of RTN diminishes in upsized transistors.

PANDEY et al.: IMPACT OF SINGLE TRAP RTN ON HTFET SRAM STABILITY

Fig. 4. (a) RNM of 10T ST2 SRAM compared against 6T SRAM (b) Average power consumption of 256×256 SRAM array with 5% activity factor
(c) Read-access delay. For HTFET SRAM, plots for 2 different trap locations: trap at tunnel junction and at 2 nm away from tunnel junction are also shown.

TABLE I
NORMALIZED PERFORMANCE METRICS WITH RTN, AT VCC-MIN


To meet the iso-area condition, the 6T Si-FinFET
SRAM uses 4X sized transistors [8], which results in improved
RNM over 1X sized 6T SRAM, as depicted in Fig. 4(a).
Still, the RNM of the 4X sized 6T Si-FinFET SRAM is 43%
less than that of the 10T ST2 HTFET SRAM (with trap distance
Xt = 2 nm from the tunnel junction) at Vcc = 0.175 V; hence,
the ST SRAM design remains superior owing to the
Schmitt feedback action. The RNM of the 10T ST2 Si-FinFET SRAM
improves at higher Vcc as Si-FinFET gains on-current while it
transitions out of the subthreshold operation regime (VTh ~ 0.4 V).
The average power consumed by a 256×256 SRAM array with
an activity factor of 5%, computed using an approach similar to [8], is
shown in Fig. 4(b), along with the read-access delay in Fig. 4(c).
At Vcc = 0.175 V, HTFET ST2 SRAM (Xt = 2 nm) exhibits
75X and 21X faster read-access times compared to FinFET
ST2 SRAM and FinFET 6T-4X sized SRAM, respectively.
In HTFET, RTN from a trap at the tunnel junction is prominent only for VGS < 0.1 V [10]. Hence, as Vcc scales down
to 0.13 V and below [Fig. 4(a)], RTN from the trap at the tunnel
junction produces worse SRAM RNM than RTN from the trap
at Xt = 2 nm. However, the lower limit on Vcc (Vcc-min) is set by
a minimum RNM requirement of 26 mV (kBT/q, T = 300 K).
This Vcc-min exceeds 0.13 V for all SRAM designs discussed
in this work (see Table I). Hence, the effect of the trap at the
tunnel junction is not pronounced over the practical SRAM Vcc
range, and consequently the trap at Xt = 2 nm gives rise to the
worst-case RTN. Although the trap at the tunnel junction makes the
devices leakier (higher average power than with the Xt = 2 nm trap
[Fig. 4(b)], though still comparable to Si-FinFET ST2 SRAM at its Vcc-min),
it also enables marginally faster read access [Fig. 4(c)]
through the higher drain current. The average power consumption and
read-access delay of Si-FinFET ST2 and 6T-4X sized SRAM,
normalized against HTFET ST2 SRAM at their respective
Vcc-min for worst-case RTN, are shown in Table I, indicating
power savings in the HTFET design at ultra-low Vcc.
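The 26 mV criterion is the thermal voltage kBT/q at 300 K. The sketch below checks that number from the exact SI constants and then shows one simple way to read Vcc-min off a simulated RNM-versus-Vcc sweep, by linear interpolation between sweep points. The interpolation approach, the function names, and the sample sweep are assumptions for illustration; they are not data from Fig. 4 or Table I.

```python
from typing import List, Optional, Tuple

K_B = 1.380649e-23      # Boltzmann constant, J/K (exact SI value)
Q_E = 1.602176634e-19   # elementary charge, C (exact SI value)


def thermal_voltage(t_kelvin: float) -> float:
    """kB*T/q in volts."""
    return K_B * t_kelvin / Q_E


def vcc_min(sweep: List[Tuple[float, float]], rnm_min: float) -> Optional[float]:
    """Lowest Vcc at which RNM reaches rnm_min.

    `sweep` holds (vcc, rnm) pairs in volts, sorted by ascending Vcc, with RNM
    assumed to increase monotonically with Vcc. Returns None if the
    criterion is never met within the sweep.
    """
    if sweep and sweep[0][1] >= rnm_min:
        return sweep[0][0]
    for (v0, r0), (v1, r1) in zip(sweep, sweep[1:]):
        if r1 >= rnm_min:
            # Linear interpolation inside the bracketing interval (r0 < rnm_min <= r1).
            return v0 + (rnm_min - r0) * (v1 - v0) / (r1 - r0)
    return None


print(f"kT/q at 300 K = {thermal_voltage(300.0) * 1e3:.1f} mV")  # 25.9 mV

# Hypothetical (Vcc, RNM) sweep in volts, for illustration only.
sweep = [(0.10, 0.010), (0.15, 0.020), (0.20, 0.035)]
print(f"Vcc-min ~ {vcc_min(sweep, 0.026):.3f} V")  # 0.170 V
```

A finer Vcc sweep near the crossing would make the linear interpolation less sensitive to curvature in the RNM curve.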
IV. CONCLUSION
RTN in HTFET-based SRAM is analyzed for the first time.
6T HTFET SRAM shows significant degradation of RNM
compared to Si-FinFET 6T SRAM due to delayed saturation in the HTFET output characteristics. A 10T ST2 SRAM
using a Schmitt Trigger feedback mechanism to suppress variation is examined to explore its RTN immunity. For sub-0.225 V operation, HTFET ST2 SRAM outperforms Si-FinFET
ST2 SRAM due to the high Ion and Ion/Ioff
ratio of HTFET (which improves the effectiveness of the Schmitt
feedback [8]). At 0.15 V, HTFET ST2 SRAM offers 15.8%
and 17.2% improvement in RNM and WNM, respectively, over
Si-FinFET ST2 SRAM, besides exhibiting better tolerance
against RTN-induced variation and faster operation with competitive power dissipation. Thus, HTFET ST2 SRAM meets the
performance and power requirements of ultra-low-Vcc SRAM
applications.
REFERENCES
[1] M. Agostinelli, J. Hicks, J. Xu, et al., "Erratic fluctuations of SRAM cache vmin at the 90nm process technology node," in IEEE IEDM Tech. Dig., Dec. 2005, pp. 655-658.
[2] N. Tega, H. Miki, R. Zhibin, et al., "Impact of HK / MG stacks and future device scaling on RTN," in Proc. IEEE IRPS, Apr. 2011, pp. 6A.5.1-6A.5.6.
[3] M. Fan, V. P. Hu, Y. Chen, et al., "Analysis of single-trap-induced random telegraph noise and its interaction with work function variation for tunnel FET," IEEE Trans. Electron Devices, vol. 60, no. 6, pp. 2038-2044, Jun. 2013.
[4] J. Wan, C. Le Royer, A. Zaslavsky, et al., "Low-frequency noise behavior of tunneling field effect transistors," Appl. Phys. Lett., vol. 97, no. 24, pp. 243503-1-243503-3, 2010.
[5] D. K. Mohata, R. Bijesh, S. Mujumdar, et al., "Demonstration of MOSFET-like on-current performance in arsenide/antimonide tunnel FETs with staggered hetero-junctions for 300mV logic applications," in Proc. IEEE IEDM, vol. 5, Dec. 2011, pp. 33.5.1-33.5.4.
[6] G. Dewey, B. Chu-Kung, J. Boardman, et al., "Fabrication, characterization, and physics of III-V heterojunction tunneling field effect transistors for steep sub-threshold swing," in Proc. IEEE IEDM, vol. 3, Dec. 2011, pp. 33.6.1-33.6.4.
[7] J. P. Kulkarni, K. Kim, S. P. Park, et al., "Process variation tolerant SRAM array for ultra low voltage applications," in Proc. 45th ACM/IEEE DAC, Jun. 2008, pp. 108-113.
[8] V. Saripalli, S. Datta, V. Narayanan, et al., "Variation-tolerant ultra low-power heterojunction tunnel FET SRAM design," in Proc. IEEE/ACM Int. Symp. Nanoscale Archit., vol. 1, Jun. 2011, pp. 45-52.
[9] M.-L. Fan, V. P.-H. Hu, Y.-N. Chen, et al., "Impacts of single trap induced random telegraph noise on FinFET devices and SRAM cell stability," in Proc. IEEE Int. SOI Conf., Oct. 2011, pp. 1-2.
[10] R. Pandey, B. Rajamohanan, H. Liu, et al., "Electrical noise in heterojunction interband tunnel FETs," IEEE Trans. Electron Devices, vol. 61, no. 2, Feb. 2014, to be published.
[11] N. Tega, H. Miki, M. Yamaoka, et al., "Impact of threshold voltage fluctuation due to random telegraph noise on scaled-down SRAM," in Proc. IEEE Int. Rel. Phys. Symp., Apr./May 2008, pp. 541-546.
[12] C. Leyris, S. Pilorget, M. Marin, et al., "Random telegraph signal noise SPICE modeling for circuit simulators," in Proc. 37th Eur. Solid State Device Res. Conf., Sep. 2007, pp. 187-190.
[13] (2009). Cadence Virtuoso Spectre Circuit Simulator [Online]. Available: http://www.cadence.com/products/rf/spectre_circuit/pages/default.aspx

ISSCC 2014 / SESSION 13 / ADVANCED EMBEDDED MEMORY / 13.4


13.4

A 7ns-Access-Time 25µW/MHz 128kb SRAM for Low-Power Fast Wake-Up MCU in 65nm CMOS with 27fA/b Retention Current

Toshikazu Fukuda¹, Koji Kohara¹, Toshiaki Dozaka¹, Yasuhisa Takeyama¹, Tsuyoshi Midorikawa¹, Kenji Hashimoto², Ichiro Wakiyama², Shinji Miyano¹, Takehiko Hojo¹
¹Toshiba, Kawasaki, Japan, ²Toshiba Microelectronics, Kawasaki, Japan

Battery lifetime is the key feature in the growing markets of sensor networks and
energy-management systems (EMS). Low-power MCUs are widely used in these
systems. For these applications, standby power, as well as active power, is an
important contributor to the total energy consumption, because the active sensing or
computing phases are much shorter than the standby state. Figure 13.4.1 shows
a typical power profile of low-power MCU applications. To achieve many years
of battery lifetime, the power consumption of the chip must be kept below 1µA
during deep-sleep mode. Another key feature of a low-power MCU for such
applications is fast wake-up from deep-sleep mode, which is important for low
application latency and for keeping wake-up energy minimal. For fast wake-up, the
system must retain its state and logged information during sleep mode, because
several hundred microseconds are needed to reload such data into memories.
Conventional SRAM consumes a much higher retention current than the required
deep-sleep-mode current, as shown in Fig. 13.4.1. Embedded Flash memories
have limited write endurance, on the order of 10⁵ cycles, making them difficult to
use in applications that frequently power down. Embedded FRAM [1,2] has been
used for this purpose, and it can serve as a random-access memory as well
as a nonvolatile memory. However, as a random-access memory, its slow
operation and high energy consumption [1,2] limit the performance of the MCU
and the battery lifetime. Furthermore, the additional process steps for fabricating FRAM
memory cells increase the cost of the MCU. SRAM can operate at higher speed with
lower energy and without additional process steps, but its high retention current makes
it difficult to sustain data in deep-sleep mode. To solve this problem, we have developed a
low-leakage-current SRAM (XLL SRAM) that reduces retention current by 1000×
compared to conventional SRAM and operates with less than 10ns access time.
The retention current of XLL SRAM is negligible in deep-sleep mode because
it is much smaller than the deep-sleep-mode current of the MCU,
which is dominated by the active current of the real-time clock and control-logic
circuits. By using XLL SRAM, the store and reload process during mode transitions can be eliminated, and the wake-up time of the MCU from deep-sleep mode is
reduced to a few microseconds. This paper describes a 128kb SRAM with 3.5nA
(27fA/b) retention current, 7ns access time, and 25µW/MHz active energy
consumption. Its low retention current, high speed, and low-power operation
allow the SRAM to remain active in deep-sleep mode, and also provide fast
wake-up, low active energy consumption, and high performance to the MCU.
Since the complexity and required performance of MCUs have been increasing, a more
advanced process has to be used. As the process geometry becomes smaller, the
leakage current of transistors increases. In addition to channel leakage, other
leakage mechanisms become significant, such as gate-oxide leakage and GIDL.
To realize XLL SRAM, all three types of leakage current must be suppressed.
Low-leakage transistors with long gate length and thick gate oxide have been
developed for the SRAM memory cells. Their GIDL is further decreased by introducing a
lightly doped region in the active layer, as shown in Fig. 13.4.2(c). Generally,
adopting long-gate-length and thick-gate-oxide transistors for memory cells
increases memory macro area and power dissipation. We use
several techniques to shrink the memory cell size while avoiding an increase in active
power dissipation. The memory cell layout is shown in Figs. 13.4.2(a) and (b).
Since the supply voltage of the memory cells is 1.2V, the spacing between the p-type and
n-type wells and between the gate poly and the adjacent diffusion area is shrunk
compared to the original design rule for transistors with the same gate-oxide
thickness. As a result, the memory cell size is reduced by about 20%. The cell
height in the vertical direction is extended by adopting long-gate-length
transistors. This allows four wordlines to be routed over the memory cell, as
shown in Fig. 13.4.2(b). A block diagram of the developed 128kb SRAM is
shown in Fig. 13.4.3(a). The peripheral circuits consist of conventional transistors to
achieve high performance and to reduce SRAM size. The supply voltage of the
peripheral circuits is cut off, and the NMOS source node of the memory cells
(VSSB) is reverse biased via source-bias circuits in retention mode, as shown in
Fig. 13.4.3(b). As a result, a leakage current of 27fA/b is achieved with the
fabricated XLL SRAM at room temperature, as shown in Fig. 13.4.4(a). The leakage
current of a conventional SRAM is also shown for comparison. The XLL SRAM
achieves 1000× lower leakage current at room temperature than the
conventional SRAM due to lower gate leakage current and a larger back-gate-bias
effect. A low-power MCU in deep-sleep mode consumes little energy, so the
retention current at room temperature determines the battery lifetime. The
leakage current of XLL SRAM is lower than the required deep-sleep-mode
current of a low-power MCU, even when the memory capacity is increased to
several Mb. Therefore, all SRAMs in a low-power MCU can remain active in deep-sleep mode, and the state and logged information can be retained in them.
A comparison of our SRAM leakage current with published leakage for SRAMs in
65nm and smaller processes is shown in Fig. 13.4.4(b). It shows that XLL SRAM
reduces leakage current by more than 10× compared to FD-SOI SRAMs.

2014 IEEE International Solid-State Circuits Conference
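The per-bit figure and the "several Mb" claim follow from simple scaling. The sketch below assumes the measured room-temperature retention current scales linearly with capacity (our assumption) and compares the SRAM share against the 1 µA chip-level deep-sleep budget given in the introduction. It deliberately ignores the real-time clock and control logic, which dominate the actual deep-sleep current.

```python
MACRO_BITS = 128 * 1024          # 128kb macro = 131072 bits
MACRO_RETENTION_A = 3.5e-9       # measured retention current of one macro (A)
DEEP_SLEEP_BUDGET_A = 1e-6       # chip-level deep-sleep target (A)

per_bit_a = MACRO_RETENTION_A / MACRO_BITS
print(f"per-bit retention: {per_bit_a * 1e15:.1f} fA/b")  # 26.7 fA/b (~27 fA/b)

for mbit in (1, 4, 8):
    bits = mbit * 1024 * 1024
    current_a = bits * per_bit_a   # linear-scaling assumption
    share = current_a / DEEP_SLEEP_BUDGET_A
    print(f"{mbit} Mb: {current_a * 1e9:.0f} nA ({share:.1%} of the 1 uA budget)")
```

Under this assumption even 8 Mb of XLL SRAM draws roughly 224 nA, which is consistent with the statement that capacities of several Mb stay below the deep-sleep requirement.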
To compensate for the increase in active power due to the relatively large SRAM
area, several low-power techniques are adopted. Bitline-charging current is the
dominant portion of the active power consumption of SRAM. To reduce the bitline-charging current, we adopt a quarter array activation scheme (QAAS) and a charge-shared hierarchical bitline (CSHBL) [3,4]. Four wordlines are routed over a
memory cell, and one of the four wordlines connects to a memory cell in every 4
columns, as shown in Fig. 13.4.5. An SRAM architecture in which two wordlines
are routed over a memory cell has been reported [3]. By taking advantage of the
extended memory cell height, the number of wordlines passed over a memory
cell is doubled. As a result, 3/4 of the bitlines remain inactive in active cycles, and the bitline-charging current is reduced. The SRAM also employs CSHBL. It has been
reported that CSHBL is effective for reducing the active power increase due to
random variation of transistors [4]. We find that CSHBL is also effective for
reducing the active power increase due to process and temperature variations.
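The quarter-array activation can be pictured as an interleaved wordline-to-column mapping. The sketch below assumes column c connects to wordline c mod 4, one arrangement consistent with "one of the four wordlines connects to a memory cell in every 4 columns"; the actual assignment in Fig. 13.4.5 may differ, and `active_columns` is a hypothetical helper. Firing a single wordline then exercises only one quarter of the bitlines.

```python
from typing import List


def active_columns(n_columns: int, selected_wl: int, n_wordlines: int = 4) -> List[int]:
    """Columns whose bitlines are exercised when one interleaved wordline fires.

    Assumes column c is connected to wordline c % n_wordlines (hypothetical
    mapping chosen for illustration).
    """
    if not 0 <= selected_wl < n_wordlines:
        raise ValueError("selected_wl out of range")
    return [c for c in range(n_columns) if c % n_wordlines == selected_wl]


cols = active_columns(n_columns=32, selected_wl=1)
print(cols)             # [1, 5, 9, 13, 17, 21, 25, 29]
print(len(cols) / 32)   # 0.25 -> 3/4 of the bitlines stay idle
```

With `n_wordlines=2`, the two-wordline arrangement of [3], half of the bitlines would be active instead of a quarter.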
Signal waveforms in CSHBL operation are shown on the right side of
Fig. 13.4.5. The local bitlines are fully swung when the corresponding wordline
is selected. Before the pass transistors are turned on, the selected wordline falls.
Then the charge stored on the local bitlines is transferred to the global bitlines by
charge sharing. Since the amount of charge stored on the local bitlines is
determined by the capacitance of the local bitlines and the supply voltage, the
bitline-charging current of the SRAM stays constant regardless of temperature and
process condition. In a conventional SRAM, the bitline level varies substantially with
changing temperature or process condition. Designing for the minimum bitline
swing that can be sensed by the sense amplifiers in the slowest condition causes
the bitline swing to become excessive in fast conditions, which in turn causes excessive
bitline-charging power dissipation. Figures 13.4.6(a) and (b) show the bitline-charging current, measured as the current into the memory-cell ground, in several
process conditions. The charging current of the XLL SRAM is more than
40% lower than that of an SRAM with conventional bitline architecture and timing-control circuits. Figure 13.4.6(c) shows the measured active energy of the XLL
SRAM. Active energy of the XLL SRAM is reduced by about 40% by adopting QAAS
and CSHBL, and an active energy of 25µW/MHz at 1.2V is achieved. The achieved
active energy is only 9% larger than that of the conventional SRAM.
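The argument for CSHBL's corner insensitivity can be made concrete with a first-order charge model. The sketch below is our simplification, not the circuit of Fig. 13.4.5: in the conventional case the wordline pulse is sized so that the slowest cell just develops the minimum sensable swing, so faster cells over-swing the bitline in proportion to their read current; in the CSHBL case the charge drawn per access is the local-bitline capacitance times VDD. All capacitances, currents, and function names are hypothetical.

```python
def conventional_charge(i_cell: float, i_cell_slow: float, c_bl: float,
                        dv_min: float, vdd: float) -> float:
    """Bitline charge (C) per access with a pulse sized for the slowest cell."""
    t_pulse = dv_min * c_bl / i_cell_slow          # slow cell just reaches dv_min
    swing = min(i_cell * t_pulse / c_bl, vdd)      # faster cells swing further
    return c_bl * swing


def cshbl_charge(c_lbl: float, vdd: float) -> float:
    """Local bitline fully swung: charge set by capacitance and VDD only."""
    return c_lbl * vdd


VDD = 1.2
C_BL, C_LBL = 100e-15, 10e-15                               # hypothetical capacitances (F)
DV_MIN = 0.1                                                # hypothetical sense margin (V)
CORNERS = {"slow": 10e-6, "typical": 20e-6, "fast": 40e-6}  # hypothetical read currents (A)

for corner, i_cell in CORNERS.items():
    q_conv = conventional_charge(i_cell, CORNERS["slow"], C_BL, DV_MIN, VDD)
    q_cshbl = cshbl_charge(C_LBL, VDD)
    print(f"{corner:8s} conventional {q_conv * 1e15:5.1f} fC   CSHBL {q_cshbl * 1e15:5.1f} fC")
```

The absolute numbers carry no meaning; the point is that the conventional column scales with the corner while the CSHBL column does not.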
Figure 13.4.7 shows the chip micrograph and key features of the test chip
fabricated in a 65nm CMOS process. The memory cell size is 2.159µm², and the macro
area of the 128kb XLL SRAM is 0.443mm². The retention current and active energy
at 1.2V are 3.5nA (27fA/b) and 25µW/MHz, respectively. The leakage current is
negligible compared to the deep-sleep-mode current of a low-power MCU. Because of
this, all the SRAMs in the MCU can stay awake in deep-sleep mode, shortening the
wake-up time from deep-sleep mode and reducing the energy consumption
during the mode transition.
References:
[1] A. Baumann et al., "A MCU platform with embedded FRAM achieving 350nA current consumption in real-time clock mode with full state retention and 6.5µs system wakeup time," VLSI Cir. Symp., pp. 202-203, 2013.
[2] M. Zwerg et al., "An 82µA/MHz Microcontroller with Embedded FeRAM for Energy-Harvesting Applications," ISSCC, pp. 334-335, 2011.
[3] H. Fujiwara et al., "A 20nm 0.6V 2.1µW/MHz 128kb SRAM with No Half Select Issue by Interleave Wordline and Hierarchical Bitline Scheme," VLSI Cir. Symp., pp. 118-119, 2013.
[4] S. Miyano et al., "Highly Energy-Efficient SRAM With Hierarchical Bit Line Charge-Sharing Method Using Non-Selected Bit Line Charges," JSSC, vol. 48, pp. 924-931, April 2013.

978-1-4799-0920-9/14/$31.00 ©2014 IEEE

ISSCC 2014 / February 11, 2014 / 2:45 PM

Figure 13.4.1: Typical power profile of low power MCU.

Figure 13.4.2: XLL SRAM bitcell and MOSFET leakage path.


Figure 13.4.3: Block diagram and source bias circuit.

Figure 13.4.4: XLL SRAM leakage current measured results.

Figure 13.4.5: QAAS and CSHBL circuit and timing chart.

Figure 13.4.6: Bitline-charging current and active power.

DIGEST OF TECHNICAL PAPERS


ISSCC 2014 PAPER CONTINUATIONS

Figure 13.4.7: XLL SRAM test-chip micrograph and key features.


ISSCC 2014 / SESSION 13 / ADVANCED EMBEDDED MEMORY / 13.5


13.5

A 16nm 128Mb SRAM in High-κ Metal-Gate FinFET Technology with Write-Assist Circuitry for Low-VMIN Applications

Yen-Huei Chen, Wei-Min Chan, Wei-Cheng Wu, Hung-Jen Liao, Kuo-Hua Pan, Jhon-Jhy Liaw, Tang-Hsuan Chung, Quincy Li, George H. Chang, Chih-Yung Lin, Mu-Chi Chiang, Shien-Yang Wu, Sreedhar Natarajan, Jonathan Chang
TSMC, Hsinchu, Taiwan
FinFET technology has become a mainstream technology solution for post-20nm
CMOS technology [1], since it offers superior short-channel control, a better subthreshold slope, and reduced random dopant fluctuation. It is therefore expected
to achieve better performance with lower SRAM VDDMIN. However, the quantized
sizing of the channel width and length has drawbacks for conventional 6T-SRAM
bitcell scaling. To minimize the area of the high-density SRAM bitcell, the
numbers of fins (setting the channel width, W) of the pull-up PMOS (PU), pass-gate NMOS (PG), and pull-down NMOS (PD) transistors must be selected as
1:1:1. Since PU, PG, and PD have the same channel length (L), the geometric ratio
between the PU transistor and the PG transistor is equal to one. Under
process variations, the PU transistor can be much stronger than
the PG transistor. A stronger PU transistor increases the read stability of the SRAM
bitcell, but it degrades the write margin significantly and results in a worse write-VDDMIN. Figure 13.5.1(a) shows the contention between the PU and PG
transistors of a 6T-SRAM bitcell during the write operation. During the write
operation, the PU transistor impedes the ability of the PG transistor to pull the
storage node (S) from VDD to ground. The bitcell may suffer a write failure under the
strong-PU, weak-PG condition caused by device variations. Two
techniques have been proposed to improve the write VDDMIN of high-density SRAM
bitcells: 1) a negative bitline voltage (NBL) to increase the strength of the PG transistor
and 2) a lower cell VDD (LCV) to weaken the PU transistor [1-5]. Compared to
these conventional techniques, this work develops a suppressed-coupling-signal
negative bitline (SCS-NBL) scheme and a write-recovery-enhancement lower-cell-VDD (WRE-LCV) scheme for write assist without reliability concerns in the
higher-VDD operating region. The effectiveness of the two design
techniques is also compared. Figure 13.5.1(b) shows the layout view of the
high-density 6T-SRAM bitcell with 0.07µm² area in a 16nm high-k metal-gate
FinFET technology. To minimize area, we set the geometric ratio of the PU, PG, and
PD transistors all equal to one. With the two developed write-assist circuits, the
overall VDDMIN improvement can exceed 300mV in a 128Mb SRAM test-chip.
Figure 13.5.2 shows simulated results of the required negative bitline bias (blue
curve) and the coupled NBL voltage levels with and without the negative-bias-suppression (clamping) circuit for the high-density SRAM bitcell. Since
an aggressive negative bias is needed to achieve the VDDMIN target, the coupled
negative-bias level has to be more negative to provide the required bitline write
voltage. With the coupling technique, the negative-bias voltage level is
proportional to the voltage level of the coupling signal. In the SRAM write
operation, the negative bitline bias is applied to the source terminal of the PG
transistor, and the wordline activation pulse is applied to the gate terminal of
the PG transistor in the selected SRAM bitcell. The larger negative bias in the higher-VDD region puts more stress on the SRAM PG transistor, so reliability is
a concern for SRAM operation at higher VDD. Therefore, a suppressed-coupling-signal
negative-bitline (SCS-NBL) bias-generation circuit is applied to suppress the
coupling-signal voltage and reduce the generated negative-bias level at higher VDD.
With the SCS-NBL scheme, the reliability concern is mitigated without degrading
the negative-bias level generation in the lower-VDD region.
Figure 13.5.3 illustrates the SRAM design equipped with the SCS-NBL write-assist
scheme. The SCS-NBL scheme is implemented directly in the read/write block
with a small area penalty. During the write operation, the write signal pulse
triggers the replica write buffer to pull the replica bitline (RBL) low, generating a
negative-bitline enable signal (ENB_NBL). The ENB_NBL signal then propagates
to become the coupling signal (NBL_FIRE). The coupling-signal voltage-suppression
block is composed of stacked NMOS transistors that suppress the coupling
signal (NBL_FIRE) to a designed suppressed voltage level, which can be below
the power rail (VDD) in the higher-VDD region. The falling edge of the suppressed
NBL_FIRE signal couples through a capacitor (C1) to generate a negative coupling
signal (NVSS). With the lower NBL_FIRE voltage level, the coupled negative-bias
signal (NVSS) is limited and kept constant at higher VDD, as shown in the waveforms of
Fig. 13.5.3. The negative bias is then propagated to the selected bitline
(BL[n]) and transferred into the selected SRAM bitcell through the write driver
(WD1) and the write MUX (N1).
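A first-order model makes the proportionality and the effect of the suppression block explicit. The sketch below assumes the falling edge of NBL_FIRE couples through C1 onto a floating node of total capacitance C_BL (bitline plus NVSS node), so the coupled level is a capacitive divider of the NBL_FIRE swing, and it models the stacked-NMOS suppression block as an ideal clamp at a fixed level V_SUP. Both simplifications, the function name, and all numeric values are assumptions for illustration.

```python
from typing import Optional


def nbl_level(vdd: float, c1: float, c_bl: float,
              v_sup: Optional[float] = None) -> float:
    """Negative bitline level (V) produced by coupling through C1.

    Capacitive-divider model: V_NVSS = -V_FIRE * C1 / (C1 + C_BL).
    v_sup: ideal clamp on the NBL_FIRE high level (None = no suppression).
    """
    v_fire = vdd if v_sup is None else min(vdd, v_sup)
    return -v_fire * c1 / (c1 + c_bl)


C1, C_BL = 20e-15, 80e-15   # hypothetical capacitances (F): divider ratio 0.2
V_SUP = 0.7                 # hypothetical suppressed NBL_FIRE level (V)

for vdd in (0.5, 0.7, 0.9, 1.1):
    plain = nbl_level(vdd, C1, C_BL)
    clamped = nbl_level(vdd, C1, C_BL, V_SUP)
    print(f"VDD={vdd:.1f} V  NBL={plain:+.3f} V  SCS-NBL={clamped:+.3f} V")
```

Below V_SUP both columns agree, so the low-VDD assist strength is unchanged; above it the suppressed level stays constant, matching the qualitative behaviour described for Fig. 13.5.3.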
Figure 13.5.4 shows the write-recovery-enhancement lower-cell-VDD (WRE-LCV)
write-assist scheme. To save area, a shared WRE-LCV voltage generator
is placed in each read/write block. During the write operation, the generated LCV
voltage (lower than the power rail VDD) from the shared WRE-LCV voltage
generator is propagated to the selected column power track (CVDD[n]) and to
the selected bitcell through a transmission gate (T1) that is controlled by the LCV
enable (LCV_EN) and column-selection (Y[n]) signals. For unselected
columns, CVDD[n-1] remains at VDD, since the transmission gate (T2) is
off and the pre-charging PMOS (P2) turns on to prevent the half-select read-disturbance issue. Unlike previous work [1,4] that uses voltage collapse as a write
assist, this work applies a slightly lower cell VDD (LCV) with a write-recovery-enhancement
LCV pulse (WRE-LCV) to prevent the transient data-retention issue. The WRE-LCV
pulse has to recover before the wordline pulse turns off to prevent write-recovery
failure, as shown in the waveforms of Fig. 13.5.4. The write-recovery
timing control of the WRE-LCV pulse is centralized in the WRE-LCV controller for
recovery-timing adjustment. WRE-LCV pulse voltage-level options (75%, 50%, and 25% of VDD)
are also offered for post-fabrication tuning to mitigate process variations.
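The column-level behaviour can be summarised as a small selection function. In the sketch below (hypothetical names; the real circuit uses transmission gates T1/T2 and the precharge PMOS P2), the selected column's CVDD follows the LCV level when LCV_EN is asserted, every other column stays at VDD, and a simple check encodes the requirement that the LCV pulse recover before the wordline pulse ends. The 75%/50%/25% levels are the tuning options quoted in the text; the idealised switches and the timing values in the example are assumptions.

```python
from typing import List

LCV_OPTIONS = (0.75, 0.50, 0.25)   # LCV level as a fraction of VDD (from the text)


def column_supplies(vdd: float, n_columns: int, selected: int,
                    lcv_en: bool, lcv_fraction: float = 0.75) -> List[float]:
    """CVDD of each column during a write cycle (idealised switches)."""
    if lcv_fraction not in LCV_OPTIONS:
        raise ValueError(f"lcv_fraction must be one of {LCV_OPTIONS}")
    lcv = vdd * lcv_fraction
    return [lcv if (lcv_en and c == selected) else vdd for c in range(n_columns)]


def recovers_in_time(lcv_recover_ns: float, wl_off_ns: float) -> bool:
    """True if CVDD is restored before the wordline pulse turns off."""
    return lcv_recover_ns < wl_off_ns


print(column_supplies(vdd=1.0, n_columns=4, selected=2, lcv_en=True, lcv_fraction=0.5))
print(recovers_in_time(lcv_recover_ns=0.6, wl_off_ns=0.8))
```

Keeping the level options as a fixed tuple mirrors the fuse-selectable settings: a tuning flow can only pick among the offered levels, not arbitrary voltages.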
Figure 13.5.5 shows the floorplan and area of the 128kb SRAM macro with the
0.07µm² SRAM bitcell. The WRE-LCV scheme is placed at the boundary of the
SRAM array and the read/write block (RWBLK). To reduce the area penalty,
the WRE-LCV write-recovery timing-control block is placed in the main control
(CTRL) block. The SCS-NBL scheme is located at the bottom of the read/write block
(RWBLK). The area overheads of the WRE-LCV and SCS-NBL schemes are 3%
and 2%, respectively.
Figure 13.5.6 shows the cumulative distribution plot of the overall VDDMIN
improvement measured on a 128Mb test-chip at 25°C. SCS-NBL and WRE-LCV
improve the overall VDDMIN by over 300mV at the 95th percentile for the 0.07µm² SRAM
bitcell in a 128Mb test-chip. Lowering the LCV voltage level pushes VDDMIN
even lower, but the improvement saturates at the 50% LCV voltage level.
Figure 13.5.7 shows the micrograph of the 128Mb SRAM test-chip, which is
equipped with electrically programmable fuses for post-silicon tuning of the
write-assist options. The test-chip is built from 1024 128kb (4096×32) SRAM
macros. The die area of the test-chip is 42.6mm².
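The organisation figures are easy to cross-check. The sketch below verifies that a 4096×32 macro holds 128kb and that 1024 such macros make 128Mb, then multiplies by the 0.07µm² bitcell to get the raw bitcell area. That last number is our derived estimate, not a figure reported in the paper: it excludes periphery, assist circuits, and routing, so it is only a lower bound on array area.

```python
MACRO_DIM_A, MACRO_DIM_B = 4096, 32   # the 4096x32 macro organisation from the text
N_MACROS = 1024
BITCELL_UM2 = 0.07                    # bitcell area from the text (um^2)
DIE_MM2 = 42.6                        # die area from the text (mm^2)

macro_bits = MACRO_DIM_A * MACRO_DIM_B
chip_bits = N_MACROS * macro_bits
assert macro_bits == 128 * 1024            # 128kb per macro
assert chip_bits == 128 * 1024 * 1024      # 128Mb per chip

bitcell_mm2 = chip_bits * BITCELL_UM2 * 1e-6   # um^2 -> mm^2
print(f"raw bitcell area: {bitcell_mm2:.2f} mm^2 "
      f"({bitcell_mm2 / DIE_MM2:.0%} of the die)")
# raw bitcell area: 9.40 mm^2 (22% of the die)
```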
Acknowledgements:
The authors thank the physical design team John Hung, R.S. Chen, Hanson Hsu,
and L.J. Tyan for layout and chip implementation; the R&D team for wafer
manufacturing; and the test department for chip measurements in this work.
References:
[1] Y. Wang et al., "Dynamic Behavior of SRAM Data Retention and a Novel Transient Voltage Collapse Technique for 0.6V 32nm LP SRAM," IEDM Dig. Tech. Papers, Dec. 2011, pp. 32.1.1-32.1.4.
[2] Y. Fujimura et al., "A Configurable SRAM with Constant-Negative-Level Write Buffer for Low-Voltage Operation with 0.149µm² Cell in 32nm High-K Metal-Gate CMOS," ISSCC Digest of Technical Papers, Feb. 2010, pp. 348-349.
[3] H. Pilo et al., "A 64Mb SRAM in 32nm High-k Metal Gate SOI Technology with 0.7V Operation Enabled by Stability, Write-Ability and Read-Ability Enhancements," ISSCC Digest of Technical Papers, Feb. 2011, pp. 254-256.
[4] Eric Karl et al., "A 4.6GHz 162Mb SRAM Design in 22nm Tri-Gate CMOS Technology with Integrated Active Vmin Enhanced Assist Circuitry," ISSCC Digest of Technical Papers, Feb. 2012, pp. 230-231.
[5] Jonathan Chang et al., "A 20nm 112Mb SRAM in High-k Metal-Gate with Assist Circuitry for Low-Leakage and Low-Vmin Applications," ISSCC Digest of Technical Papers, Feb. 2013, pp. 316-317.


ISSCC 2014 / February 11, 2014 / 3:15 PM

Figure 13.5.1: (a) Conventional 6T-SRAM bitcell and (b) the layout view of the
high-density SRAM bitcell with an area of 0.07µm².

Figure 13.5.2: Negative bitline voltage versus required write bitline voltage.


Figure 13.5.3: SRAM design equipped with SCS-NBL write assist scheme.

Figure 13.5.4: SRAM design equipped with WRE-LCV write assist scheme.

Figure 13.5.5: Floorplan of the WRE-LCV and SCS-NBL blocks for write-assist
techniques.

Figure 13.5.6: Cumulative distribution plot of overall VDDMIN improvement for WRE-LCV and SCS-NBL write-assist techniques.


Figure 13.5.7: Die micrograph of 128Mb SRAM test-chip.

2014 IEEE International Solid-State Circuits Conference

978-1-4799-0920-9/14/$31.00 2014 IEEE

ISSCC 2014 / SESSION 13 / ADVANCED EMBEDDED MEMORY / 13.5


13.5

A 16nm 128Mb SRAM in High- Metal-Gate FinFET


Technology with Write-Assist Circuitry for Low-VMIN
Applications

Yen-Huei Chen, Wei-Min Chan, Wei-Cheng Wu, Hung-Jen Liao,


Kuo-Hua Pan, Jhon-Jhy Liaw, Tang-Hsuan Chung, Quincy Li,
George H. Chang, Chih-Yung Lin, Mu-Chi Chiang,
Shien-Yang Wu, Sreedhar Natarajan, Jonathan Chang
TSMC, Hsinchu, Taiwan
FinFET technology has become a mainstream technology solution for post-20nm
CMOS technology [1], since it has superior short-channel effects, better subthreshold slope and reduced random dopant fluctuation. Therefore, it is expected
to achieve better performance with lower SRAM VDDMIN. However, the quantized
sizing of the channel width and length has drawbacks for conventional 6T-SRAM
bitcell scaling. To minimize the bitcell area of the high-density SRAM bitcell, the
number of fins (setting the channel width, W) of the pull-up PMOS (PU), passgate NMOS (PG) and pull-down NMOS (PD) transistors must be selected as
1:1:1. Since PU, PG, and PD have the same channel length (L), the ratio in
geometry between the PU transistor and the PG transistor is equal to one. With
the process variations, the strength of PU transistor can be much stronger than
the PG transistor. A stronger PU transistor increases read stability of the SRAM
bitcell but it degrades the write margin significantly and results in worse writeVDDMIN issue. Figure 13.5.1(a) shows a contention condition between PU and PG
transistors of a 6T-SRAM bitcell for the write operation. During the write
operation, the PU transistor impedes the ability of the PG transistor to pull the
storage node (S) from VDD to ground. The bitcell may suffer a write failure at the
stronger PU with weaker PG condition caused by the device variations. Two
techniques have been proposed to improve the high density SRAM bitcell write
VDDMIN: 1) negative bit-line voltage (NBL) to increase the strength of PG transistor
and 2) lower cell VDD (LCV) to weaken PU transistor strength [1-5]. Compared to
the conventional techniques, this work develops a suppressed-coupling-signal
negative bitline (SCS-NBL) scheme and a write-recovery-enhancement lowercell-VDD (WRE-LCV) scheme for write assist without the concern of reliability at
higher VDD operating region. A comparison of the effectiveness of the two design
techniques is also performed. Figure 13.5.1(b) shows the layout view of the
high-density 6T-SRAM bit-cell with 0.07m2 area in a 16nm high-k metal-gate
FinFET technology. To minimize area, we set the geometric ratio of PU, PG, and
PD transistors all equal to one. With the two developed write-assist circuits, the
overall VDDMIN improvement can be over 300mV in a 128Mb SRAM test-chip.
Figure 13.5.2 shows simulated results of the required negative bitline bias (blue
curve) and coupling NBL voltage levels with and without negative-biassuppression (clamping) circuit for the high-density SRAM bitcell. Since
aggressive negative bias is needed to achieve the VDDMIN target, the coupled
negative-bias level has to be more negative to provide the required bitline write
voltage. Due to the coupling technique, the negative-bias voltage level is
proportional to the voltage level of the coupling signal. In the SRAM write
operation, the negative bitline bias is applied to the source terminal of the PG
transistor and wordline activation pulse signal is applied to the gate terminal of
the PG transistor in the selected SRAM bitcell. The greater negative bias at higher VDD region leads to more stress on the SRAM PG transistor, thus reliability is
a concern for the SRAM operation at higher VDD. Therefore, a suppressed
coupling signal at high VDD region for negative-bit-line bias (SCS-NBL)
generation circuitry is applied to suppress the coupling signal voltage and
reduce the generated negative bias level at higher VDD. With the SCS-NLB
scheme, the reliability concern is mitigated without degrading the negative-bias
level generation at the lower VDD region.
2014 IEEE International Solid-State Circuits Conference

Figure 13.5.3 illustrates the SRAM design equipped with the SCS-NBL write-assist
scheme. The SCS-NBL scheme is implemented directly in the read/write block
with a small area penalty. During the write operation, the write signal pulse
triggers the replica write buffer to pull the replica bitline (RBL) low,
generating a negative-bitline enable signal (ENB_NBL). The ENB_NBL signal then
propagates to become the coupling signal (NBL_FIRE). The coupling-signal voltage
suppression block is composed of stacked NMOS transistors that suppress the
NBL_FIRE voltage to a designed level below the power rail (VDD) in the
higher-VDD region. The falling edge of the suppressed NBL_FIRE signal couples
through a capacitor (C1) to generate a negative coupling signal (NVSS). Because
the NBL_FIRE voltage level is lower, the coupled negative-bias signal (NVSS) is
limited and held constant in the higher-VDD region, as shown in the waveforms of
Fig. 13.5.3. The negative bias is then propagated to the selected bitline
(BL[n]) and transferred into the selected SRAM bitcell through the write driver
(WD1) and the write MUX (N1).
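The proportionality between the coupling-signal swing and the negative bias, and the effect of clamping that swing at high VDD, can be illustrated with a first-order charge-sharing estimate. This is a minimal sketch, not the paper's circuit: it assumes NVSS starts at 0 V and floats when NBL_FIRE falls, that C1 couples the full NBL_FIRE swing, and that a single lumped capacitance `c_par` (hypothetical) stands for the bitline and write-driver load on NVSS. The clamp value `v_suppress` and the capacitances in the usage lines are illustrative numbers only.

```python
from typing import Optional


def coupling_swing(vdd: float, v_suppress: Optional[float]) -> float:
    """Voltage swing of NBL_FIRE on its falling edge.

    Without suppression the swing is the full VDD; with the (hypothetical)
    suppression clamp it never exceeds v_suppress.
    """
    if v_suppress is None:
        return vdd
    return min(vdd, v_suppress)


def negative_bias(vdd: float, c1: float, c_par: float,
                  v_suppress: Optional[float] = None) -> float:
    """First-order NVSS level from charge sharing between C1 and c_par.

    Assumes NVSS floats at 0 V before the edge, so
    V_NVSS = -V_fire * C1 / (C1 + c_par).
    """
    v_fire = coupling_swing(vdd, v_suppress)
    return -v_fire * c1 / (c1 + c_par)


if __name__ == "__main__":
    C1, C_PAR = 20e-15, 60e-15  # illustrative values, not from the paper
    for vdd in (0.6, 0.8, 1.0, 1.2):
        plain = negative_bias(vdd, C1, C_PAR)
        clamped = negative_bias(vdd, C1, C_PAR, v_suppress=0.8)
        print(f"VDD={vdd:.1f} V  conventional={plain:+.3f} V  SCS-NBL={clamped:+.3f} V")
```

In this model the conventional bias keeps growing with VDD, while the clamped version stops growing once VDD exceeds `v_suppress` and is unchanged at low VDD, which matches the qualitative behavior described above.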
Figure 13.5.4 shows the write-recovery-enhancement lower-cell-VDD (WRE-LCV)
write-assist scheme. To save area, a shared WRE-LCV voltage generator is placed
in each read/write block. During a write operation, the LCV voltage signal
(lower than the power rail VDD) from the shared WRE-LCV voltage generator is
propagated to the selected column power track (CVDD[n]) and to the selected
bitcell through a transmission gate (T1) controlled by the LCV enable (LCV_EN)
and column-selection (Y[n]) signals. For unselected columns, CVDD[n-1] remains
at VDD because the transmission gate (T2) is off and the pre-charging PMOS (P2)
turns on, preventing the half-selection read-disturbance issue. Unlike previous
work [1,4] that uses voltage collapse as a write assist, this work applies a
slightly lower cell VDD (LCV) with a write-recovery-enhancement LCV pulse
(WRE-LCV) technique to prevent the transient data-retention issue. The WRE-LCV
pulse has to recover before the wordline pulse signal turns off to prevent
write-recovery failure, as shown in the waveforms of Fig. 13.5.4. The
write-recovery timing of the WRE-LCV pulse is centralized in the WRE-LCV
controller for recovery-timing adjustment. WRE-LCV pulse voltage level options
(75%, 50%, and 25% of VDD) are also offered for post-fabrication tuning to
mitigate process variations.
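The two knobs described above, an LCV level chosen from a fixed set of fractions of VDD and the requirement that the LCV pulse recover before the wordline turns off, can be captured in a small checking routine. This is a hypothetical sketch: the names `LCV_OPTIONS`, `WriteTiming`, `lcv_level`, and `recovery_ok`, and the optional timing margin, are assumptions for illustration, not the controller's actual interface.

```python
from dataclasses import dataclass

# Fractions of VDD offered as WRE-LCV levels (75%, 50%, 25% of VDD).
LCV_OPTIONS = (0.75, 0.50, 0.25)


@dataclass(frozen=True)
class WriteTiming:
    """Hypothetical event times (ns) within one write cycle."""
    lcv_recover: float  # CVDD[n] restored to VDD (end of the LCV pulse)
    wl_off: float       # wordline pulse turns off


def lcv_level(vdd: float, fraction: float) -> float:
    """Column supply level applied to the selected column during the write."""
    if fraction not in LCV_OPTIONS:
        raise ValueError(f"unsupported LCV option {fraction}; choose from {LCV_OPTIONS}")
    return vdd * fraction


def recovery_ok(t: WriteTiming, margin: float = 0.0) -> bool:
    """True if the LCV pulse recovers at least `margin` ns before wordline turn-off."""
    return t.lcv_recover + margin <= t.wl_off


if __name__ == "__main__":
    print(lcv_level(0.8, 0.50))                                   # half of VDD
    print(recovery_ok(WriteTiming(lcv_recover=0.8, wl_off=1.0)))  # recovers in time
    print(recovery_ok(WriteTiming(lcv_recover=1.1, wl_off=1.0)))  # write-recovery failure
```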
Figure 13.5.5 shows the floorplan and area of the 128kb SRAM macro with the
0.07µm² SRAM bitcell. The WRE-LCV scheme is placed at the boundary between the
SRAM array and the read/write block (RWBLK). To reduce the area penalty, the
WRE-LCV write-recovery timing-control block is placed in the main control
(CTRL) block. The SCS-NBL scheme is located at the bottom of the read/write
block (RWBLK). The area overheads of the WRE-LCV and SCS-NBL schemes are 3%
and 2%, respectively.
Figure 13.5.6 shows the cumulative distribution plot of the overall VDDMIN
improvement from a 128Mb test-chip at 25°C. The SCS-NBL and WRE-LCV schemes
improve the overall VDDMIN by over 300mV at the 95th percentile for the
0.07µm² SRAM bitcell in a 128Mb test-chip. Reducing the LCV voltage level pushes
VDDMIN even lower, but the improvement saturates at the 50% LCV voltage level.
Figure 13.5.7 shows the micrograph of the 128Mb SRAM test-chip, which is
equipped with electrically programmable fuses for post-silicon tuning of the
write-assist options. The test-chip is built from 1024 128kb (4096×32) SRAM
macros. The die area of the test-chip is 42.6mm².
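The capacity figures are consistent with each other. Assuming the usual binary convention for SRAM capacity (1kb = 1024 bits, 1Mb = 1024kb), a 4096×32 macro holds 128kb and 1024 such macros give 128Mb. The short check below simply redoes that arithmetic.

```python
WORDS_PER_MACRO = 4096   # words in one 128kb macro (4096x32 organization)
BITS_PER_WORD = 32
NUM_MACROS = 1024

bits_per_macro = WORDS_PER_MACRO * BITS_PER_WORD
total_bits = NUM_MACROS * bits_per_macro

print(bits_per_macro // 1024, "kb per macro")                # 128 kb
print(total_bits // (1024 * 1024), "Mb on the test-chip")    # 128 Mb
```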
Acknowledgements:
The authors thank the physical design team John Hung, R.S. Chen, Hanson Hsu,
and L.J. Tyan for layout and chip implementation; the R&D team for wafer
manufacturing; and the test department for chip measurements for this work.
References:
[1] Y. Wang et al., "Dynamic Behavior of SRAM Data Retention and a Novel
Transient Voltage Collapse technique for 0.6V 32nm LP SRAM," IEDM Dig. Tech.
Papers, Dec. 2011, pp. 32.1.1-32.1.4.
[2] Y. Fujimura et al., "A Configurable SRAM with Constant-Negative-Level Write
Buffer for Low-Voltage Operation with 0.149µm² Cell in 32nm High-K Metal-Gate
CMOS," ISSCC Dig. Tech. Papers, Feb. 2010, pp. 348-349.
[3] H. Pilo et al., "A 64Mb SRAM in 32nm High-k Metal Gate SOI Technology
with 0.7V Operation Enabled by Stability, Write-Ability and Read-Ability
Enhancements," ISSCC Dig. Tech. Papers, Feb. 2011, pp. 254-256.
[4] E. Karl et al., "A 4.6GHz 162Mb SRAM Design in 22nm Tri-Gate CMOS
Technology with Integrated Active Vmin Enhanced Assist Circuitry," ISSCC
Dig. Tech. Papers, Feb. 2012, pp. 230-231.
[5] J. Chang et al., "A 20nm 112Mb SRAM in High-k Metal-Gate with
Assist Circuitry for Low-Leakage and Low-Vmin Applications," ISSCC Dig.
Tech. Papers, Feb. 2013, pp. 316-317.

978-1-4799-0920-9/14/$31.00 ©2014 IEEE

ISSCC 2014 / February 11, 2014 / 3:15 PM

Figure 13.5.1: (a) Conventional 6T-SRAM bitcell and (b) the layout view of the
high-density SRAM bit-cell with an area of 0.07µm².

Figure 13.5.2: Negative bitline voltage versus required write bitline voltage.


Figure 13.5.3: SRAM design equipped with SCS-NBL write assist scheme.

Figure 13.5.4: SRAM design equipped with WRE-LCV write assist scheme.

Figure 13.5.5: Floorplan of the WRE-LCV and SCS-NBL blocks for write-assist
techniques.

Figure 13.5.6: Cumulative distribution plot of overall VDDMIN improvement for
WRE-LCV and SCS-NBL write-assist techniques.

DIGEST OF TECHNICAL PAPERS


ISSCC 2014 PAPER CONTINUATIONS

Figure 13.5.7: Die micrograph of 128Mb SRAM test-chip.


IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 4, APRIL 2014


Reachability-Based Robustness Verification and Optimization of SRAM Dynamic
Stability Under Process Variations
Yang Song, Hao Yu, Senior Member, IEEE, and Sai Manoj Pudukotai DinakarRao, Student Member, IEEE

Abstract: The dynamic stability margin of SRAM is largely suppressed at
nanoscale due to not only dynamic noise but also process variation. This paper
introduces an analog verification method for SRAM dynamic stability under
threshold-voltage variations. A zonotope-based reachability analysis using the
backward Euler method is deployed for SRAM dynamic stability in state space,
taking SRAM nonlinear dynamics into account. It can consider multiple SRAM
variation sources simultaneously without repeated computations. Moreover, a
sensitivity analysis is developed for the zonotope to optimize SRAM designs away
from unsafe regions by tuning multiple SRAM device parameters simultaneously.
Compared with SRAM optimization by single-parameter small-signal sensitivity,
the proposed method converges faster with higher accuracy. As shown by numerical
experiments, the proposed optimization method achieves a 600× speedup on
average compared with repeated Monte Carlo simulations at similar accuracy.
Index Terms: Design for manufacturability, memory, mixed-mode, performance optimization, simulation, transistor sizing.

I. Introduction

Robustness verification and optimization have become an emerging need for
integrated circuit (IC) designs at nanoscale, such as SRAMs. Static noise
margin (SNM) [1], [2] is traditionally used for SRAM stability characterization
because of its simple interpretation and measurement. Because SNM may
overestimate read failures and underestimate write failures, the dynamic
stability margin [3], which uses the critical word-line pulse width, is
increasingly adopted since it gives a better estimate of failures. However,
verifying the SRAM stability margin becomes even harder at nanoscale. First, due
to its nonlinear dynamics, SRAM behavior becomes less digital and more analog.
Moreover, process variations such as threshold-voltage variations [4]–[13] can
further suppress the SRAM stability margin significantly, resulting in higher
failure rates during read/write operations. Robustness verification and
subsequent optimization of SRAMs have thereby become necessary to give designers
close scrutiny of potential hazards, such as threshold-voltage variations from
all transistors. The primary challenge for traditional approaches to robustness
verification and optimization is the complexity of handling multiple dimensions
of variation sources and device parameters. Whether deterministic corner
analysis or statistical Monte Carlo analysis is used, the fundamental complexity
comes from the many repeated computations performed for each combination of
variation source and parameter. There is a need to efficiently report failure
status under multiple variation sources simultaneously in a single verification.
Moreover, a verification-driven optimization is also required, which can guide
designs away from unsafe regions by tuning multiple device parameters at the
same time.

Manuscript received June 22, 2013; revised November 13, 2013 and January
17, 2014; accepted January 23, 2014. Date of current version March 17, 2014.
The work was supported by the Singapore MOE Tier-1 funding RG 26/10. The
preliminary result was published in ISPD'13. This paper was recommended
by Associate Editor C. Sze.
The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail:
haoyu@ntu.edu.sg).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2014.2304704
Many works take a statistical perspective [14], [15]. Based on the dc
characteristics of inverters, stability analysis is performed in [14] by
modeling failure with a normal distribution, even when failures occur in the
tail of the normal distribution. In [15], accurate estimation is achieved
without assuming a normal distribution for the failure probability, but it
relies on searching for the most probable failure point. Moreover, importance
sampling is used to avoid prohibitive Monte Carlo simulation by capturing only
the relevant rare events. A number of recent works take a deterministic
perspective as well [16]–[20]. For example, Euler–Newton curve tracing [17] is
used to find the boundary between the safe and unsafe regions without
brute-force exploration. The work in [20] further formulates a dynamic stability
margin to characterize the stability boundary, namely, the separatrix [18].
However, the boundary search is limited to two parameters, and the computational
cost can be prohibitive when parameter variations from all transistors are
considered. Moreover, it is unclear how to adjust parameters for SRAM robustness
optimization so that designs move away from unsafe regions.
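To see why the rare-event techniques cited above matter, consider estimating a small tail probability P(Z > t) for a standard normal Z, which here stands in for a normalized Vth shift crossing a hypothetical failure threshold t. This is a generic mean-shift importance-sampling sketch, not the specific algorithm of [14] or [15]: plain Monte Carlo almost never observes the event with a modest sample count, whereas sampling from N(t, 1) and reweighting each hit by the likelihood ratio gives a usable estimate with the same number of samples.

```python
import math
import random


def tail_exact(t: float) -> float:
    """P(Z > t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_monte_carlo(t: float, n: int, rng: random.Random) -> float:
    """Plain Monte Carlo: fraction of N(0, 1) samples beyond t."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > t)
    return hits / n


def tail_importance(t: float, n: int, rng: random.Random) -> float:
    """Mean-shift importance sampling: draw from N(t, 1), reweight by phi(z)/phi(z - t)."""
    total = 0.0
    for _ in range(n):
        z = rng.gauss(t, 1.0)
        if z > t:
            # Likelihood ratio of N(0, 1) to N(t, 1): exp(-t*z + t^2/2)
            total += math.exp(-t * z + 0.5 * t * t)
    return total / n


if __name__ == "__main__":
    t, n = 4.0, 20_000
    print("exact      ", tail_exact(t))
    print("monte carlo", tail_monte_carlo(t, n, random.Random(1)))
    print("importance ", tail_importance(t, n, random.Random(1)))
```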
Reachability analysis has been widely deployed in the verification of system
dynamics by exploring potential trajectories of operating points in state space.
It can conveniently provide an accurately predicted boundary of multiple
trajectories under uncertain inputs and/or interval parameters in a one-time
computation, in contrast to simulating and exploring different trajectories one
by one. Reachability analysis has been deployed for a number of hard analog
circuit verification problems [21]–[26]. Starting from a set of uncertain
inputs, the set of system trajectories in state space can be bounded by
zonotope-based over-approximation [27], [28]. One can perform time-interval
integrated reachability analysis with the resulting zonotope, which can
distinguish safe and unsafe regions at the final set. As such, one can verify
failure of the system trajectory in state space without multiple repeated
simulations. Moreover, one can also formulate the set when adjusting multiple
device parameters by zonotope approximation, and thereby further optimize the
system trajectory so that it moves from the unsafe region to the safe region.
However, previous zonotope-based reachability analysis has two main
limitations. First, explicit time-interval integration is computationally
expensive when considering nonlinearity during SRAM verification. It is unknown
how to develop a SPICE-like implicit time-interval integration for
zonotope-based reachability analysis that updates the linearization error.
Second, zonotope-based sensitivity analysis is different from traditional
single-parameter small-signal sensitivity. One needs to explore set-based
sensitivity in terms of distance to the safe/unsafe region within an
optimization driven by a sequence of verifications.

© 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
0278-0070
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
In this paper, we introduce a zonotope-based reachability analysis for both
verification and optimization of SRAM dynamic stability. The formulation of the
zonotope is based on an over-approximation defined by a hypercube. In addition,
sensitivity analysis of the zonotope with respect to multiple device parameters
is derived to guide SRAM optimization away from unsafe regions. In summary,
this paper makes two primary contributions. First, to consider SRAM nonlinear
dynamics, a backward Euler method with linearization error control is developed
for zonotope-based reachability analysis, which can efficiently consider
multiple variation sources for SRAM robustness verification. Second, a
zonotope-based sensitivity analysis of the safety distance is introduced, which
generates multiparameter large-signal sensitivity for SRAM robustness
optimization. The proposed verification and optimization procedures are both
implemented in a SPICE-like simulator with nonlinear transistor device models.
Moreover, because multiparameter large-signal sensitivity is generated for the
safety distance, the proposed method converges faster with higher accuracy than
traditional single-parameter small-signal sensitivity-based optimization.
Compared with Monte Carlo optimization, the proposed method achieves speedups
of up to 600× with similar accuracy.
The rest of this paper is organized as follows. Section II reviews the SRAM
failure mechanisms under threshold-voltage variations. Section III describes
the zonotope-based nonlinear reachability analysis, which is further deployed
in Section IV for robustness optimization of SRAM dynamic stability with
safety-distance sensitivity calculation. The proposed method is validated by
experiments in Section V for different SRAM malfunctions, including write and
read failures. Conclusions are drawn in Section VI.
II. Problem Formulation of SRAM Failure Verification and Optimization
Similar to [16]–[20], this paper focuses on transistor-level analytical
approaches for SRAM dynamic stability verification and optimization under
threshold-voltage (Vth) variations. When the statistical distribution of Vth
variation is known, yield statistics can be efficiently generated from the
transistor-level verification results. Vth variation is a serious concern for
the 6T-SRAM [5], [6]. The resulting mismatch among transistors can lead to SRAM
functional failures during read and write operations. Moreover, although
transistor sizing may compensate for the negative impact of Vth variations, it
is unclear how to adjust transistor sizes to optimize the robustness of SRAM
dynamic stability.
In this section, we introduce a definition that quantitatively describes the
robustness of SRAM dynamic stability.
Definition 1: Safety distance is the Euclidean distance $\|p_{\mathrm{safe}} - x\|_2$
in the state space between the operating point $x$ and the safe state $p_{\mathrm{safe}}$.
The operating point $x$ and the safe state $p_{\mathrm{safe}}$ each depend on both
the time instant $t$ and the input stimulus $u$. Hence, the safety distance
$\|p_{\mathrm{safe}} - x\|_2$ is a function of both $t$ and $u$. The input corners of
the state space are employed to ensure safety for all possible input stimuli.
The idea of zonotope-based reachability verification in the state space rests on
the fact that instability hazards can be visualized as unsafe regions, so the
Euclidean distance to the safe region becomes a measure of system safety. From
this perspective, this paper is the first in the literature to use Euclidean
distance as the safety distance for verification and optimization.
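Definition 1 translates directly into code. The sketch below assumes a two-dimensional state (v1, v2) and a hypothetical target state at (0 V, 1.0 V); the helper `worst_case_distance` (also hypothetical) evaluates the largest safety distance over a set of candidate final points, such as sampled points of a final reachable set, which is the quantity a verifier would compare against zero.

```python
import math
from typing import Iterable, Sequence


def safety_distance(p_safe: Sequence[float], x: Sequence[float]) -> float:
    """Euclidean distance ||p_safe - x||_2 between the safe state and an operating point."""
    if len(p_safe) != len(x):
        raise ValueError("p_safe and x must have the same dimension")
    return math.sqrt(sum((p - xi) ** 2 for p, xi in zip(p_safe, x)))


def worst_case_distance(p_safe: Sequence[float],
                        final_points: Iterable[Sequence[float]]) -> float:
    """Largest safety distance over a collection of final operating points."""
    return max(safety_distance(p_safe, x) for x in final_points)


if __name__ == "__main__":
    target = (0.0, 1.0)                                  # hypothetical (v1, v2) target
    finals = [(0.02, 0.97), (0.05, 0.95), (1.0, 0.0)]    # last point: a failed write
    print(safety_distance(target, finals[0]))
    print(worst_case_distance(target, finals))
```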
Note that, unlike separatrix-based approaches [18], [20], the safety distance
indicates the direction in which the trajectory should be optimized. As such, it
can be conveniently leveraged within reachability analysis to consider parameter
and input variations at the same time in a one-time transient simulation.
A. Failure Mechanisms
In the following, we describe the physical mechanisms of SRAM failures,
including read and write failures, in terms of safety distance in the state
space. Recall that there exist two convergent regions in the state space of the
SRAM [18]. Operating points in either region converge to the nearest
equilibrium state.
1) Write Failure: A write failure refers to the inability to write data properly
into the SRAM cell. During a write operation, both access transistors should be
strong enough to change the voltage level at the internal nodes. As shown in
Fig. 1, a write operation can be described in the state space as pulling the
operating point from the initial state (bottom-right corner) to the target
state (top-left corner). Thus, the safety distance is the distance between an
operating point and the target state at the top-left corner. Given enough time,
the operating point converges to the nearest stable equilibrium state, at
either the top-left or the bottom-right corner. The write operation aims to pull
the operating point into the same region as the target state so that the
safety distance converges to zero, as shown by point B in Fig. 1.
Vth variations, however, may cause write failure. An increase in Vth reduces the
strength of a transistor. For example, an increase of Vth in M6 along with a
decrease of Vth in M4 can make it more difficult to pull up v2. In the state
space, the operating point then moves slowly toward the target state. If the
operating point cannot reach the other convergent region before the access
transistors are turned off, it moves back to the initial state, which means a
write failure occurs. To resolve the failure, widening M6 while narrowing M4 can
help reduce the safety distance and hence mitigate the side effect of Vth
variations.

SONG et al.: REACHABILITY-BASED ROBUSTNESS VERIFICATION AND OPTIMIZATION OF SRAM DYNAMIC STABILITY

Fig. 1. Illustration of write failure by safety distance.
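The role of pulse width and access-transistor strength can be reproduced qualitatively with a one-dimensional bistable toy model, dx/dt = x - x³ + u(t), whose stable states x = -1 and x = +1 play the roles of the initial and target states and whose unstable point x = 0 plays the role of the boundary between the two convergent regions. This is an illustrative abstraction, not the 6T-SRAM equations of this paper: `drive` is a hypothetical stand-in for the pull of the access transistors (a weaker drive mimics a raised Vth), and `pulse_width` stands for the wordline pulse.

```python
def simulate(x0: float, drive: float, pulse_width: float,
             t_end: float = 20.0, dt: float = 1e-3) -> float:
    """Forward-Euler integration of dx/dt = x - x**3 + u(t).

    u(t) = drive while the (hypothetical) wordline pulse is on, i.e. for
    t < pulse_width, and 0 afterwards. Returns the state at t_end.
    """
    x = x0
    for k in range(int(round(t_end / dt))):
        u = drive if k * dt < pulse_width else 0.0
        x += dt * (x - x ** 3 + u)
    return x


def write_safety_distance(drive: float, pulse_width: float) -> float:
    """Distance from the final state to the target x = +1, starting from x = -1."""
    return abs(1.0 - simulate(-1.0, drive, pulse_width))


if __name__ == "__main__":
    # Same pulse width, nominal vs. weakened access drive.
    print("nominal drive :", write_safety_distance(drive=1.0, pulse_width=2.0))
    print("weakened drive:", write_safety_distance(drive=0.6, pulse_width=2.0))
```

In this toy model the nominal drive crosses x = 0 before the pulse ends and the safety distance settles near zero, while the weakened drive is still on the initial side when the pulse ends, falls back to x = -1, and leaves a safety distance near 2. Lengthening the pulse (or strengthening the drive) restores the write.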
2) Read Failure: A read failure refers to the loss of previously stored data
when the SRAM flips to the other state during a read operation. Access
transistors need careful sizing so that their pull-up strengths are not strong
enough to pull a digital 0 to 1, or vice versa, during a read operation. In the
state space, the operating point of the SRAM is inevitably perturbed and pulled
toward the other convergent region. In this situation, the safety distance is
measured from the operating point to its initial state. If the read operation
does not last too long, the access transistors can be shut off before the
operating point converges to the other region. The safety distance then
converges to zero as the operating point finally returns to the initial state,
as shown by point A in Fig. 2.
Vth variations may also cause read failure. For example, variations that cause
mismatch between M4 and M6 can result in unbalanced pulling strengths, so that
v2 is pulled up more quickly. As a result, the operating point crosses into the
other region before the read operation ends, and the read fails, as shown by
point B in Fig. 2. To resolve the read failure, the width of M6 needs to be
scaled down to avoid excessive pulling strength. However, this may lead to write
failure, as illustrated in the previous subsection. In addition, Vth variations
in M1–M4 affect the locations of the convergent regions on the state-variable
plane. As the opposite convergent region migrates closer to the initial state,
read failure becomes more likely. Therefore, a full and fast verification that
considers Vth variations from all transistors is needed to cover all potential
hazards, which is what our reachability-based method provides.
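For a one-dimensional bistable toy model, dx/dt = x - x³ + u, the read case can be analyzed without simulation. The cell starts at x = -1, a constant read disturbance `disturb` (hypothetical, standing in for the unbalanced pull of the access transistor) acts for `read_time`, and x = 0 plays the role of the boundary between the convergent regions. This sketch is illustrative only and is not the SRAM model of this paper. In it, a disturbance at or below 2/(3√3) leaves a stable equilibrium on the initial side, so the cell never flips; a stronger disturbance flips the cell only if the read outlasts the time needed to reach x = 0, after which the state relaxes to x = +1 and the safety distance to the initial state ends near 2 instead of 0.

```python
import math

# Largest constant disturbance for which the toy model dx/dt = x - x**3 + u
# still has a stable equilibrium on the initial (x < 0) side: 2 / (3*sqrt(3)).
U_FOLD = 2.0 / (3.0 * math.sqrt(3.0))


def critical_read_time(disturb: float, n: int = 100_000) -> float:
    """Time for the toy state to travel from x = -1 to the boundary x = 0.

    Evaluates the integral of dx / (x - x**3 + disturb) over [-1, 0] with the
    midpoint rule. Returns math.inf if the disturbance is too weak for the
    state ever to reach the boundary.
    """
    if disturb <= U_FOLD:
        return math.inf
    h = 1.0 / n
    return sum(h / (x - x ** 3 + disturb)
               for x in (-1.0 + (i + 0.5) * h for i in range(n)))


def read_fails(disturb: float, read_time: float) -> bool:
    """In the toy model the cell flips if the read lasts longer than the critical time."""
    return read_time > critical_read_time(disturb)


if __name__ == "__main__":
    for disturb in (0.3, 0.6, 0.8):
        print(f"disturb={disturb}: critical read time = {critical_read_time(disturb):.3f}")
    print("short read, disturb=0.6:", read_fails(0.6, 1.0))
    print("long read,  disturb=0.6:", read_fails(0.6, 5.0))
```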
Another problem addressed in this paper is to find an appropriate combination of
sizes for all transistors that optimizes the robustness of SRAM dynamic
stability by circumventing potential hazards caused by Vth variations. In
addition, one needs to balance read and write operations during the optimization
of SRAM dynamic stability.
Fig. 2. Illustration of read failure by safety distance.

B. SRAM Dynamics
1) Nonlinearity: One primary challenge for SRAM dynamic stability verification
and optimization is its nonlinear dynamic behavior. The time evolution of the
safety distance depends on the nonlinear dynamics of the SRAM, which can be
described by the differential algebraic equation (DAE)

$$\frac{d}{dt}\,q(x(t), t) + f(x(t), t) + u(t) = 0 \tag{1}$$

with state variable vector $x(t)$ and input vector $u(t)$. Taking the SRAM
dynamics as an example, $f(x(t), t)$ includes the transistor drain currents;
$q(x(t), t)$ is the charge accumulated on the gate or parasitic capacitors;
$u(t)$ is the input current as well as the noise current in (6); and the vector
$x$ includes node voltages and branch currents.
After Newton iteration is performed at a selected operating point (or nominal
point) $x^*$, the SRAM nonlinear dynamics $f(t)$ can be linearized by
$\left.\partial f/\partial x\right|_{x=x^*}$. Based on the mean-value theorem, the
dynamic equation at any neighboring operating point $x$ can be expressed by

$$\frac{d}{dt}\,q(x(t), t) + f(x^*, t) + u + \left.\frac{\partial f}{\partial x}\right|_{x=x^*}(x - x^*) + \frac{1}{2}\,(x - x^*)^{T} \left.\frac{\partial^2 f}{\partial x^2}\right|_{x=\xi} \otimes (x - x^*) = 0, \quad \xi \in \{x^* + \alpha (x - x^*) \mid 0 \le \alpha \le 1\} \tag{2}$$

where $x^*$ is the nominal point, $x$ is a neighboring point near $x^*$, and
$\otimes$ represents tensor multiplication. The second-order remainder in (2),
i.e., the difference between the nonlinear $f(t)$ and its linear approximation,
is called the linearization error and is denoted by $L$.
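The SRAM dynamic equation can thereby be written in the simplified form

$$\frac{d}{dt}\left(q(x^*, t) + C\,\Delta x\right) + f(x^*, t) + u^*(t) + G\,\Delta x + \Delta u + L = 0 \tag{3}$$

in which

$$C = \left.\frac{\partial q}{\partial x}\right|_{x=x^*}, \quad G = \left.\frac{\partial f}{\partial x}\right|_{x=x^*}; \qquad \Delta x = x - x^*, \quad \Delta u = u - u^*. \tag{4}$$

Here, $u$ is decomposed into $u^*$ and $\Delta u$, in which $u^*$ is the noiseless
input and $\Delta u$ is the input noise, independent of $x$. Assume that $q(x, t)$
can be further decomposed into $q(x^*, t)$ and $C\,\Delta x$. Thus, one obtains

$$\frac{d}{dt}\,q(x^*, t) + f(x^*, t) + u^*(t) = 0 \tag{5a}$$

$$\frac{d}{dt}\left(C\,\Delta x\right) + G\,\Delta x + \Delta u + L = 0 \tag{5b}$$

in which (5a) is the nonlinear differential equation for the nominal point $x^*$
and (5b) is the linear equation for the deviation $\Delta x$ from the nominal
point $x^*$ to the neighboring point $x$.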
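The split into a nominal part and a deviation part with linearization error L can be checked numerically on a scalar stand-in. The sketch below uses the hypothetical nonlinearity f(x) = x - x³ (not a transistor model), for which the remainder has the closed form L = -3x*Δx² - Δx³. It confirms that L is second order in Δx (halving Δx roughly quarters L) and recovers the mean-value point ξ of (2), which for this cubic lies at x* + Δx/3 on the segment between x* and x.

```python
def f(x: float) -> float:
    """Hypothetical scalar nonlinearity standing in for f(x, t) in (1)."""
    return x - x ** 3


def df(x: float) -> float:
    """Its derivative, the 1x1 analogue of G in (4)."""
    return 1.0 - 3.0 * x ** 2


def linearization_error(x_star: float, x: float) -> float:
    """L = f(x) - f(x*) - G*(x - x*), the remainder dropped by linearization."""
    return f(x) - f(x_star) - df(x_star) * (x - x_star)


def mean_value_point(x_star: float, x: float) -> float:
    """xi such that L = 0.5 * f''(xi) * dx**2, using f''(xi) = -6*xi."""
    dx = x - x_star
    return -linearization_error(x_star, x) / (3.0 * dx ** 2)


if __name__ == "__main__":
    x_star = 0.5
    for dx in (0.1, 0.05, 0.025):
        err = linearization_error(x_star, x_star + dx)
        xi = mean_value_point(x_star, x_star + dx)
        print(f"dx={dx:<6} L={err:.3e}  xi={xi:.4f}")
```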

On the basis of (5), reachability analysis can be deployed for SRAM dynamic
stability verification and optimization in the state space, as discussed later.
Note that by using $L$, nonlinearity is taken into account, so reachability
analysis can be performed on nonlinear trajectories with high accuracy.
2) Multiple Variation Sources and Multiple Device Parameters: We now discuss how
to introduce multiple variation sources and multiple device parameters into the
state equation. First, multiple threshold-voltage variation sources in the SRAM
can be introduced at the input $u$ as additional noise current sources, which
are added to the drain current of each transistor in SPICE, as shown in Fig. 3.

Fig. 3. SRAM with threshold-voltage variations modeled by additional current sources for all transistors.

Based on the first-order Taylor approximation, the drain current of a transistor
operating in the saturation region is given in (6). For simplicity of
presentation, we show only the drain current equation for the saturation mode.
In the experiments, we employ drain current equations that depend on the
transistor operation mode. The operation mode of each transistor is derived
after Newton iterations at the present time instant, similar to SPICE. Note that
the threshold-voltage variation is modeled as an ad hoc noise drain current that
is computed afterward based on the operation mode of the transistor. An
operation-mode transition can happen at the next time instant if the noise
current is large enough to cause such a change.

$$I_d + \Delta I_d = \frac{\beta}{2}\left[V_{gs} - (V_{th} + \Delta V_{th})\right]^2, \qquad \Delta I_d \approx -\beta\,(V_{gs} - V_{th})\,\Delta V_{th}. \tag{6}$$

The threshold-voltage variation of each transistor, $\Delta I_d$, can be included
in $\Delta u$ of (5b) as an independent current source:

$$\Delta u = u - u^* = [0, \ldots, \underbrace{\Delta I_d^{j}}_{\text{node } a}, \ldots, \underbrace{-\Delta I_d^{j}}_{\text{node } b}, \ldots, 0]^T \tag{7}$$

where $\Delta u$ represents the $j$th variation current source connected between
nodes $a$ and $b$. Other process variations can also be conveniently considered
in a similar way.
Moreover, perturbations of multiple device parameters can be considered as well.
Suppose that each transistor in the SRAM has a width perturbation $\Delta W$ that
affects its transconductance $g_m$, namely $\Delta g_m$. One then has

$$\Delta g_m = \frac{\partial g_m}{\partial W}\,\Delta W. \tag{8}$$

On the basis of $\Delta g_m$, the device parameter perturbations can be included
in the conductance matrix through $\Delta G$ as follows:

$$\Delta G = \begin{bmatrix} \ddots & & & \\ & \frac{\partial g_m}{\partial W}\Delta W & -\frac{\partial g_m}{\partial W}\Delta W & \\ & -\frac{\partial g_m}{\partial W}\Delta W & \frac{\partial g_m}{\partial W}\Delta W & \\ & & & \ddots \end{bmatrix}. \tag{9}$$

Based on the above discussion of how to include multiple variation sources and
multiple device parameters, one can deploy a zonotope to form a set covering
multiple variation sources and multiple device parameters. With the further
development of linear-multistep-based integration for the zonotope and its
corresponding sensitivity, one can develop reachability-based robustness
verification and optimization, as discussed in the later part of this paper.
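Equations (6)–(9) amount to adding entries to the right-hand-side vector and the conductance matrix of the linearized system (5b). The following sketch shows that bookkeeping with plain Python lists. It assumes the square-law form of (6), a Δu entry of +ΔI_d at node a and -ΔI_d at node b, and the symmetric two-node pattern of (9); the node indices, the use of `None` for ground, and all function names are hypothetical choices for illustration, not the simulator's interface.

```python
from typing import List, Optional

Matrix = List[List[float]]


def id_sat(beta: float, vgs: float, vth: float) -> float:
    """Square-law saturation current (beta/2)*(Vgs - Vth)**2, as in (6)."""
    return 0.5 * beta * (vgs - vth) ** 2


def delta_id(beta: float, vgs: float, vth: float, dvth: float) -> float:
    """First-order drain-current change from (6): -beta*(Vgs - Vth)*dVth."""
    return -beta * (vgs - vth) * dvth


def stamp_delta_u(n: int, node_a: Optional[int], node_b: Optional[int],
                  d_id: float) -> List[float]:
    """Delta-u vector of (7): +d_id at node a, -d_id at node b (None = ground)."""
    du = [0.0] * n
    if node_a is not None:
        du[node_a] += d_id
    if node_b is not None:
        du[node_b] -= d_id
    return du


def stamp_delta_g(n: int, node_a: Optional[int], node_b: Optional[int],
                  dgm_dw: float, dw: float) -> Matrix:
    """Delta-G of (9) built from Delta-gm = (dgm/dW)*dW of (8)."""
    dgm = dgm_dw * dw
    dG = [[0.0] * n for _ in range(n)]
    terminals = [(node_a, 1.0), (node_b, -1.0)]
    for i, si in terminals:
        for j, sj in terminals:
            if i is not None and j is not None:
                dG[i][j] += si * sj * dgm
    return dG


if __name__ == "__main__":
    beta, vgs, vth, dvth = 1e-4, 0.9, 0.4, 0.01   # illustrative values
    exact = id_sat(beta, vgs, vth + dvth) - id_sat(beta, vgs, vth)
    print("exact dId :", exact)
    print("linear dId:", delta_id(beta, vgs, vth, dvth))
    print(stamp_delta_u(3, 0, 1, delta_id(beta, vgs, vth, dvth)))
    print(stamp_delta_g(2, 0, 1, dgm_dw=3.0, dw=0.5))
```

The gap between the exact and linear current changes is exactly (β/2)ΔVth², which is why the first-order model of (6) is adequate for small threshold shifts.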
C. Problem Formulation
Based on the aforementioned SRAM failure mechanisms and dynamic analysis, the
objective of SRAM dynamic stability verification is to examine whether the
safety distance can be reduced at the final operating point when interval
values of threshold-voltage variations from all transistors are considered. If
the safety distance fails to converge to zero, a robust optimization of SRAM
dynamic stability is performed in terms of the safety distance with respect to
the most adverse combination of Vth variations. We call this approach
verification-oriented robustness optimization.
SRAM Robustness Optimization: To ensure SRAM dynamic stability, one needs to
minimize the safety distance measured at the final state of the system
trajectory as follows:

$$\min_{w}\; F(w) \quad \text{s.t.} \quad W_{\min} < w_i < W_{\max}, \quad i = 1, 2, \ldots, m. \tag{10}$$

Here, $w$ is the parameter (sizing) vector for all transistors, with a defined
range $[W_{\min}, W_{\max}]$. The number of process variations is represented by
$m$. In this paper, the weighted sum of safety distances for both read and write
operations is used as the objective function:

$$F(w) = \begin{cases} D_w(w, t_w) + D_r(w, t_r), & \text{write and read failures} \\ D_w(w, t_w), & \text{write failure only} \\ D_r(w, t_r), & \text{read failure only} \end{cases} \tag{11}$$

where $D(w, t)$ is the safety distance and $t$ is the pulse width of the read or
write operation. Due to the symmetrical structure, three transistor pairs are
used to represent the 6T-SRAM. Thus, the robustness optimization task reduces to
a three-dimensional parameter space, where a parameter-state point is denoted by
$w \in \mathbb{R}^{3 \times 1}$.
In the following, we introduce a solution to the above problem by zonotope-based
reachability analysis. After the reachability analysis is performed to verify
the safety distance, the sensitivity of the safety distance is also obtained to
guide the optimization routine, which can eventually mitigate or even eliminate
failures caused by Vth variations and improve SRAM dynamic stability.
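Problem (10)–(11) is a box-constrained minimization over the three pair widths. The sketch below is a generic projected-gradient illustration under stated assumptions, not the paper's zonotope-sensitivity optimizer: gradients are forward finite differences, the open box of (10) is treated as the closed box [W_min, W_max] for projection, and `d_write`/`d_read` are hypothetical smooth surrogates that pull the access-transistor width in opposite directions, mimicking the read/write trade-off described in Section II-A.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Distance = Callable[[Sequence[float]], float]


def objective(w: Sequence[float], d_write: Optional[Distance] = None,
              d_read: Optional[Distance] = None) -> float:
    """F(w) of (11): sum of the safety distances of the active failure modes."""
    if d_write is None and d_read is None:
        raise ValueError("at least one failure mode must be considered")
    total = 0.0
    if d_write is not None:
        total += d_write(w)
    if d_read is not None:
        total += d_read(w)
    return total


def project(w: Sequence[float], w_min: float, w_max: float) -> Vector:
    """Clip every width into [w_min, w_max]."""
    return [min(max(wi, w_min), w_max) for wi in w]


def minimize_box(f: Distance, w0: Sequence[float], w_min: float, w_max: float,
                 step: float = 0.1, h: float = 1e-6, iters: int = 200) -> Vector:
    """Projected gradient descent with forward-difference gradients."""
    w = project(w0, w_min, w_max)
    for _ in range(iters):
        fw = f(w)
        grad = []
        for i in range(len(w)):
            wp = list(w)
            wp[i] += h
            grad.append((f(wp) - fw) / h)
        w = project([wi - step * gi for wi, gi in zip(w, grad)], w_min, w_max)
    return w


# Hypothetical surrogates over w = (pull-down, pull-up, access) widths.
def d_write(w: Sequence[float]) -> float:
    return (w[2] - 1.5) ** 2 + 0.5 * (w[1] - 0.5) ** 2


def d_read(w: Sequence[float]) -> float:
    return (w[2] - 0.9) ** 2 + 0.5 * (w[0] - 2.0) ** 2


if __name__ == "__main__":
    def F(w: Sequence[float]) -> float:
        return objective(w, d_write, d_read)

    w_opt = minimize_box(F, [1.0, 1.0, 1.0], w_min=0.2, w_max=3.0)
    print("widths:", [round(v, 3) for v in w_opt], "F =", round(F(w_opt), 4))
```

With both failure modes active, the access width settles between the write-preferred and read-preferred values, which is the balancing behavior the combined objective in (11) is meant to produce.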

III. Zonotope-Based Reachability Analysis for Verification
In this section, we show how to deploy zonotope-based reachability analysis for
SRAM dynamic stability verification in terms of safety distance, and we also
discuss how to consider nonlinearity during SRAM dynamic stability verification.
Reachability analysis [23], [24], [27], [28] can efficiently determine the
region that a dynamic system can reach when it evolves from a range of states.
As such, one can perform a one-time reachability analysis covering all potential
system trajectories, with the safety distance determined from the final state
set, as shown in Fig. 4. The complete flow of reachability analysis for SRAM
verification is shown in Algorithm 1. The notations used in this section and
their definitions are listed in Table I. Here, $X_0$ is the initial reachable set
in zonotope form, and $X_N$ is the final reachable set used to calculate the
safety distance and its sensitivity. With a linear multistep implementation of
the integration, the runtime cost (complexity) of zonotope-based reachability
analysis is similar to that of transient analysis in SPICE. The details of each
step are described below.

Fig. 4. System trajectory and safety distance with zonotopes.

TABLE I
Parameters Used in Reachability-Based Verification

A. Reachable Set and Zonotope
Interval values have been applied in [4] to model uncertainties of state
variables, such as variation sources and device parameters. For example, if
$\Delta x_1$ and $\Delta x_2$ model uncertainties in two different dimensions of the
state variable $x$ with $c$ as the interval center, a neighboring point including
these variations can be modeled as $x = c + [-1, 1]\,\Delta x_1 + [-1, 1]\,\Delta x_2$.
However, no formal and efficient verification method has been developed to deal
with the multidimensional interval-value problem.
In this paper, we show that the multidimensional interval-value problem can be
modeled by a zonotope, which is a convex polytope that models multiple variation
sources and multiple device parameters. Before defining the zonotope, we
introduce an important concept for reachability analysis: the reachable set.
Definition 2: The reachable set is the collection of all possible operating
points or states in the state space that a system may visit; it can be
approximated by an enclosing polytope.
One simple and symmetrical type of polytope is the zonotope [27], defined as
follows.
Definition 3: A zonotope $X$ is defined by

$$X = \left\{ x \in \mathbb{R}^{n \times 1} : x = c + \sum_{i=1}^{q} [-1, 1]\, g^{(i)} \right\} = (c, g^{(1)}, g^{(2)}, \ldots) \tag{12}$$

where $c \in \mathbb{R}^{n \times 1}$ is the zonotope center and
$g^{(i)} \in \mathbb{R}^{n \times 1}$ is called a zonotope generator.
As shown in (12), a zonotope is essentially a multidimensional interval in
affine form, or a hypercube with each generator representing a variation in a
different direction. Note that ellipsoid-modeled uncertainties are not
considered in the reachability analysis in this paper; they will be addressed in
future work.
Mathematically, the summation in a zonotope needs to be interpreted as the
Minkowski summation [28] of two finite sets, so that the merged set preserves
convexity. Given two zonotopes $P$ and $Q$, the Minkowski summation is performed
by adding their centers and concatenating their generators as

$$P \oplus Q = \{p + q \mid p \in P,\ q \in Q\} = (c_1 + c_2,\ g_1^{(1)}, \ldots, g_1^{(e)},\ g_2^{(1)}, \ldots, g_2^{(u)}). \tag{13}$$

Here, $c_1$ and $c_2$ are the centers of zonotopes $P$ and $Q$, respectively, and
the generators of $P$ and $Q$ are represented by $g_1^{(i)}$ and $g_2^{(i)}$,
respectively. A tight zonotope enclosing the convex hull of two zonotopes,
$CH(P, Q)$, can be found as follows:

$$CH(P, Q) = \frac{1}{2}\left(c_1 + c_2,\ g_1^{(1)} + g_2^{(1)}, \ldots, g_1^{(e)} + g_2^{(e)},\ c_1 - c_2,\ g_1^{(1)} - g_2^{(1)}, \ldots, g_1^{(e)} - g_2^{(e)}\right). \tag{14}$$

As all the generators are enclosed within the zonotope, this forms a convex set.
Note that this property applies to the summations in (12) and (13).

Fig. 5. Construction of zonotope. (a) $c + g^{(1)}$. (b) $c + g^{(1)} + g^{(2)}$. (c) $c + g^{(1)} + g^{(2)} + g^{(3)}$.
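Operations (12)–(14) are straightforward to implement on the (center, generators) representation. The sketch below is a minimal, dependency-free illustration using tuples; the class and method names are hypothetical. `interval_hull` (not defined in the text) returns the axis-aligned bounds c ± Σ|g^(i)|, a cheap over-approximation that can be compared against a safe region, and `points` enumerates all ±1 combinations of the generators, which include every vertex of the zonotope (2^q points, so only practical for small q).

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List, Tuple

Vec = Tuple[float, ...]


def _add(a: Vec, b: Vec) -> Vec:
    return tuple(x + y for x, y in zip(a, b))


def _sub(a: Vec, b: Vec) -> Vec:
    return tuple(x - y for x, y in zip(a, b))


def _scale(s: float, a: Vec) -> Vec:
    return tuple(s * x for x in a)


@dataclass(frozen=True)
class Zonotope:
    """X = {c + sum_i beta_i * g_i : beta_i in [-1, 1]} as in (12)."""
    center: Vec
    generators: Tuple[Vec, ...]

    def minkowski_sum(self, other: "Zonotope") -> "Zonotope":
        """(13): add the centers and concatenate the generator lists."""
        return Zonotope(_add(self.center, other.center),
                        self.generators + other.generators)

    def interval_hull(self) -> Tuple[List[float], List[float]]:
        """Per-dimension bounds c_k -/+ sum_i |g_k^(i)|."""
        radius = [sum(abs(g[k]) for g in self.generators)
                  for k in range(len(self.center))]
        lower = [c - r for c, r in zip(self.center, radius)]
        upper = [c + r for c, r in zip(self.center, radius)]
        return lower, upper

    def points(self) -> Iterator[Vec]:
        """All +/-1 generator combinations; includes every vertex (2**q points)."""
        for signs in product((-1.0, 1.0), repeat=len(self.generators)):
            p = self.center
            for s, g in zip(signs, self.generators):
                p = _add(p, _scale(s, g))
            yield p


def convex_hull_enclosure(p: Zonotope, q: Zonotope) -> Zonotope:
    """(14): zonotope enclosing CH(P, Q); assumes both have e generators."""
    if len(p.generators) != len(q.generators):
        raise ValueError("(14) pairs generators, so P and Q need the same count")
    center = _scale(0.5, _add(p.center, q.center))
    gens = [_scale(0.5, _add(g1, g2)) for g1, g2 in zip(p.generators, q.generators)]
    gens.append(_scale(0.5, _sub(p.center, q.center)))
    gens += [_scale(0.5, _sub(g1, g2)) for g1, g2 in zip(p.generators, q.generators)]
    return Zonotope(center, tuple(gens))


if __name__ == "__main__":
    P = Zonotope((0.0, 0.0), ((1.0, 0.0),))
    Q = Zonotope((1.0, 1.0), ((0.0, 0.5),))
    S = P.minkowski_sum(Q)
    print(S)
    print(S.interval_hull())
    print(sorted(S.points()))
```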
One example of construction of zonotope with the addition
of generator vectors is shown in Fig. 5. Here, c is the center
of zonotope and generator vectors are represented as g(1) , g(2)
and g(3) . We perform addition of zonotope vectors to preserve
the convexity. In Fig. 5, initially a zonotope with a center c
and generator g(1) is presented. Further to perform Minkowski
summation, g(2) and its negative vector is added to g(1) , which

590

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 4, APRIL 2014

results in $g^{(2)}$ being spanned in two directions, forming a convex zonotope. The same procedure is followed to perform the Minkowski summation for additional generators.
Physically, a zonotope spans a polytope in the state space that covers all trajectories caused by uncertain initial sets, as shown in Fig. 4. A zonotope with its center $c$ and generator vectors $g^{(i)}$ is shown in Fig. 6, which indicates that the nominal point $x^*$ corresponds to the zonotope center $c$, and the deviation $(x - x^*)$ of the state variable $x$ corresponds to the generators $g$.
Note that the scaling factors of the generators are allowed to range from $-1$ to $1$. Thus, the difference vector from the nominal point to other points within a reachable set varies in two directions. One can use bidirectional zonotopes to include all possible threshold variations during SRAM verification. However, for the calculation of sensitivity during robustness optimization, the scaling factor is restricted to $[0, 1]$ to obtain a single-direction difference vector from the nominal point to any other neighboring point in the reachable set. We call the modified zonotope the unidirectional zonotope (Fig. 7), which is determined by

$$X_{\mathrm{uni}} = \Big\{x \in \mathbb{R}^{n \times 1} : x = x^* + \sum_{i=1}^{q} [0, 1]\, g^{(i)}\Big\}. \qquad (15)$$

Moreover, similar to the zonotope for the state variable vector, one can model the interval values of state matrices by a zonotope as well [28]. As such, the matrix zonotope is derived as

$$\mathcal{M} = \Big\{M \in \mathbb{R}^{n \times n} : M \in M^{(0)} + \sum_{i=1}^{q} [0, 1]\, M^{(i)}\Big\}. \qquad (16)$$

Similar to the zonotope of the state variable vector, the matrix $M^{(0)}$ is the center matrix, and the matrix $M^{(i)}$ is called the matrix generator, which contains the variation ranges of the perturbed device parameters. Addition and multiplication rules for matrix zonotopes are defined similarly to those for vector zonotopes [28].

B. Reachability Analysis by Backward Euler Method

Fig. 6. Nonlinear SRAM dynamics and zonotope.
Fig. 7. Bidirectional zonotope and unidirectional zonotope.

On the basis of the nonlinear dynamics of SRAM discussed in (5), the zonotope-based reachability analysis is performed as follows. A detailed explanation of explicit integration can be found in [28]; it is much more sophisticated and expensive than the numerical integration method developed in this paper. Here, a SPICE-like zonotope evolution is developed based on the backward Euler method [29].
First, note that (5) can be solved with discretized time step $h$ at the $k$th iteration by

$$\Delta x_k^{(i)} = A^{-1}\Big(\frac{C}{h}\, \Delta x_{k-1}^{(i)} - \Delta u_k^{(j)} - L_k\Big), \quad k = 1, \ldots, N;\; i = 1, \ldots, q;\; j = 1, \ldots, m \qquad (17)$$

where $A = C/h + G$ is the Jacobian matrix, $N$ represents the number of time steps, $m$ represents the number of process variations, and $q$ is the number of zonotope generators.
Let the zonotope center be a nominal operating point $x_k^*$. Meanwhile, the zonotope generators $\Delta x_k^{(i)}$ are the Euclidean distances (4) from the nominal point $x_k^*$ to neighboring points $x_k$. As such, one obtains a zonotope of the state variable vector by

$$X_k = \Big\{x_k \in \mathbb{R}^{n \times 1} : x_k = x_k^* + \sum_{i=1}^{q} [-1, 1]\, \Delta x_k^{(i)}\Big\}.$$

Moreover, multiple threshold-voltage variation sources are included in the form of zonotope generators based on (7) as

$$U_k = \Big\{u_k \in \mathbb{R}^{n \times 1} : u_k = u_k^* + \sum_{j=1}^{m} [-1, 1]\, \Delta u_k^{(j)}\Big\}.$$

The corresponding iteration equation for zonotope-based verification is then built by substituting the generator $\Delta x_k^{(i)}$ with the generator matrix $X_k^g = [\Delta x_k^{(1)}, \ldots, \Delta x_k^{(q)}]$, $\Delta u_k^{(j)}$ with $U_k^g = [\Delta u_k^{(1)}, \ldots, \Delta u_k^{(m)}]$, the Jacobian matrix $A$ with the matrix zonotope $\mathcal{A}$, and the capacitance matrix $C$ with the matrix zonotope $\mathcal{C}$. As such

$$X_k^g = \mathcal{A}^{-1}\Big(\frac{\mathcal{C}}{h}\, X_{k-1}^g \oplus U_k^g \oplus L_k\Big), \quad k = 1, \ldots, N. \qquad (18)$$

Moreover, for robustness optimization, the matrix zonotopes $\mathcal{A}$ and $\mathcal{C}$ can be built to consider perturbations from multiple device parameters, such as transistor width sizings in the case of SRAMs. In $\mathcal{A}$, the interval conductance matrix $G$ can be computed using the interval values of transistor widths, similar to (9). As such, the matrix zonotope can be further interpreted in terms of interval-valued matrices by

$$\mathcal{A} \subseteq \Big[A^{(0)} - \sum_i |A^{(i)}|,\; A^{(0)} + \sum_i |A^{(i)}|\Big]. \qquad (19)$$

The matrix generator can be formed as follows:

$$A^{(i)} = \frac{\partial A^{(0)}}{\partial W}\, \Delta W^{(i)} = \Delta G^{(i)}. \qquad (20)$$

Here, $A^{(0)}$ is the nominal state matrix without variations, and $A^{(i)}$ is the variation of the state matrix caused by the perturbation of the $i$th transistor width $W^{(i)}$.
In addition, note that in (18), the inverse of the matrix zonotope $\mathcal{A} = (A^{(0)}, \ldots, A^{(m)})$ can be evaluated in two steps

SONG et al.: REACHABILITY-BASED ROBUSTNESS VERIFICATION AND OPTIMIZATION OF SRAM DYNAMIC STABILITY


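without explicit inversion. The first step is to calculate $(A^{(0)})^{-1}$ by LU decomposition

$$(A^{(0)})^{-1} = U^{-1} L^{-1} P^T I \qquad (21)$$

where $I$ is the identity matrix and $P$ is the permutation matrix (with $A^{(0)} = P L U$). The second step is the approximated first-order expansion of $\mathcal{A}^{-1}$:

$$\mathcal{A}^{-1} \approx \Big((A^{(0)})^{-1},\; -(A^{(0)})^{-1} A^{(1)} (A^{(0)})^{-1},\; \ldots,\; -(A^{(0)})^{-1} A^{(m)} (A^{(0)})^{-1}\Big). \qquad (22)$$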

Recall that $m$ represents the number of process variations. This approach enables an implementation of reachability analysis inside a SPICE-like simulator. However, such an approximated inversion may introduce an additional source of error.
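The two-step inversion (21) and (22) can be sketched in NumPy as follows. This is an illustration under the stated first-order expansion, not the authors' simulator code: the matrix zonotope is passed as a center matrix plus a list of generator matrices, and `np.linalg.solve` is used because it performs an LU factorization with partial pivoting (LAPACK gesv) instead of forming an explicit inverse formula.

```python
import numpy as np

def approx_inverse_matrix_zonotope(A0, A_gens):
    """First-order inverse of the matrix zonotope (A0, A_gens), per (21)-(22).

    Returns the center (A0)^{-1} and the generators -(A0)^{-1} A_i (A0)^{-1}.
    The neglected higher-order terms are the extra error source noted in the text.
    """
    A0 = np.asarray(A0, dtype=float)
    n = A0.shape[0]
    # (21): solve A0 X = I via LU factorization.
    A0_inv = np.linalg.solve(A0, np.eye(n))
    # (22): (A0 + dA)^{-1} ~ A0^{-1} - A0^{-1} dA A0^{-1} for small dA.
    gens = [-A0_inv @ np.asarray(Ai, dtype=float) @ A0_inv for Ai in A_gens]
    return A0_inv, gens
```

Evaluating the center plus a scaled generator gives an approximate inverse of the correspondingly perturbed matrix; the error grows quadratically with the size of the perturbation.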
C. New Set Formulation by Minkowski Summation
The superposition principle allows the solution of (18) to be separated into two parts: the homogeneous solution $X_h^g$ with respect to the initial state $X_k^g$ when there is no input $U_k^g$, and the inhomogeneous solution $X_i^g$ accounting for the system input $U_k^g$, assuming that the initial state $X_k^g$ is the origin. Note that the linearization error is also treated as an input during the update at each time step.
As such, the inhomogeneous solution can be further divided into a part due to the input vector ($X_i^g$) and a part due to the linearization error ($X_e^g$):

$$X_h^g = \mathcal{A}^{-1}\frac{\mathcal{C}}{h}\, X_{k-1}^g, \qquad X_i^g = \mathcal{A}^{-1} U_k^g, \qquad X_e^g = \mathcal{A}^{-1} L_k. \qquad (23)$$
Given an initial set $X_k$ at the current time step, three solution sets are computed. Multiplication of matrix zonotopes [28] can be used to solve (23). Because the number of generators may grow after zonotope multiplications, an upper bound has to be set on the number of generators, and the generators with the smallest magnitudes can be discarded in this process. The three sets are then combined by the aforementioned Minkowski summation [28], which preserves convexity, to form a convex zonotope.
Therefore, a new reachable set $X_k$ is obtained by combining the zonotope center $x_k^*$ and the generator matrix $X_k^g$ as

$$X_k^g = X_h^g \oplus X_i^g \oplus X_e^g, \qquad X_k = (x_k^*, X_k^g) \qquad (24)$$

where $\oplus$ represents the Minkowski summation.
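One generator update of (23) and (24), including the cap on the number of generators, is sketched below. The simplifying assumption here is that $\mathcal{A}$ and $\mathcal{C}$ are point (nominal) matrices, so each matrix-zonotope product reduces to an ordinary matrix product; the center $x_k^*$ would come from the nominal simulation (line 3 of Algorithm 1) and is not computed here. Function names are illustrative.

```python
import numpy as np

def prune_generators(G, max_gens):
    """Keep the max_gens columns of G with the largest 2-norm, as described in
    the text. Discarding columns shrinks the set rather than over-approximating it."""
    if G.shape[1] <= max_gens:
        return G
    keep = np.argsort(np.linalg.norm(G, axis=0))[::-1][:max_gens]
    return G[:, np.sort(keep)]  # preserve the original column order

def set_update(A, C, h, X_prev_g, U_g, L_radius, max_gens):
    """Generator matrix X_k^g from (23)-(24) with point matrices A and C."""
    X_h = np.linalg.solve(A, (C / h) @ X_prev_g)  # homogeneous part
    X_i = np.linalg.solve(A, U_g)                 # input (Vth) variations
    # Interval vector L_k = [-r, r] viewed as a zero-centred zonotope
    # with axis-aligned generators diag(r).
    X_e = np.linalg.solve(A, np.diag(L_radius))   # linearization error
    # Minkowski sum of zero-centred zonotopes = concatenation of generators.
    X_k_g = np.hstack([X_h, X_i, X_e])
    return prune_generators(X_k_g, max_gens)
```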
D. Linearization Error Control
Approximating the linearization error $L_k$ in line 5 of Algorithm 1 is a critical step in each iteration cycle. $L_k$ is a vector with interval values; it can be viewed as a zonotope with 0 as the center and the interval ranges as generators. The linearization error accounts for the nonlinearity of SRAM dynamics.
Here, the nominal point $x_k^*$ is the zonotope center for the current iteration, and $x_k$ varies within the zonotope $X_k$. As such, $L_k$ cannot be calculated exactly but is approximated for $x_k \in X_k$. A detailed approximation of $L_k$ can be found in [28]:

$$L_k = \frac{1}{2}\,(x_k - x_k^*)^T\, \frac{\partial^2 f}{\partial x^2}\bigg|_{\xi_k} (x_k - x_k^*), \qquad \xi_k \in \{x_k^* + \alpha (x_k - x_k^*) \mid 0 \le \alpha \le 1\}. \qquad (25)$$
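A simple interval bound on (25) is sketched below; it is an illustration, not the procedure of [28]. It assumes entrywise bounds on the Hessian magnitudes of each output component of $f$ over the set are available, and uses $|\Delta x^T H \Delta x| \le r^T |H|\, r$, where $r$ is the half-width of the box enclosing $x_k - x_k^*$. The scalar example $f(x) = x^3$ with $x^* = 1$ and $r = 0.1$ (so $|f''| \le 6.6$) is hypothetical.

```python
import numpy as np

def linearization_error_bound(radius, hessian_abs_bounds):
    """Componentwise bound b with L_k in [-b, b], following (25).

    radius: (n,) half-widths of the box enclosing x_k - x_k^*.
    hessian_abs_bounds: (n_out, n, n) entrywise bounds on |d^2 f_j / dx^2|
        over the segment between x_k^* and x_k.
    """
    r = np.asarray(radius, dtype=float)
    H = np.asarray(hessian_abs_bounds, dtype=float)
    # 0.5 * r^T |H_j| r for every output component j
    return 0.5 * np.einsum("a,jab,b->j", r, H, r)

# f(x) = x^3 around x* = 1 with |x - x*| <= 0.1: |f''(xi)| = |6 xi| <= 6.6
bound = linearization_error_bound([0.1], [[[6.6]]])
```

For this example the bound is 0.033, while the actual linearization error at $x = 1.1$ is $(0.1)^2 (1.1 + 2) = 0.031$, so the bound is conservative but close.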

Fig. 8. Reachable sets with or without splitting.

By dynamically updating the approximated $L_k$, the convergence of the zonotope-based reachability analysis can be guaranteed. Moreover, one can further develop a local-truncation-error control scheme similar to that of SPICE.
Next, the nonlinearity of SRAM is rather prominent in the transition area between the two convergent regions, where the linearization error expands rapidly. Over-expanded reachable sets, as in Fig. 8, may be too rough to be meaningful. Based on (25), if the deviations $(x_k - x_k^*)$ between states are small, the second-order derivatives are adequate to approximate the linearization error. However, under strong nonlinearity, set-splitting needs to be performed to limit the deviations $(x_k - x_k^*)$ by cutting the set in half and creating two nominal points, $x_k^* + |x_k - x_k^*|/2$ and $x_k^* - |x_k - x_k^*|/2$. After splitting, new zonotopes are formed, but they usually overlap each other. Avoiding the overlap can reduce unnecessary computations. One possible solution is to cancel the reachable sets that have already been reached, which can be performed by a difference operation between the subsets centered at $x_k^* + |x_k - x_k^*|/2$ and $x_k^* - |x_k - x_k^*|/2$.
A judgement condition for set splitting is as follows: the current reachable set is propagated without splitting only if

$$\mathrm{IH}(L_k) \subseteq [-\epsilon, \epsilon] \qquad (26)$$

in which $\mathrm{IH}(\cdot)$ is the interval hull operation that converts a zonotope to a multidimensional interval, and $\epsilon$ is a user-defined limit vector. Otherwise, the current reachable set is divided into two subsets, a new trajectory is created, and the zonotope-based reachability analysis is repeated at the current time point for the new subsets.
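The check (26) and one possible halving step are sketched below. Splitting along the longest generator is an assumption made for illustration (a common choice in zonotope tools); the two pieces have centers $c \mp g/2$ and carry $g/2$ in place of $g$, so together they cover the original set. Function names are hypothetical.

```python
import numpy as np

def interval_hull(c, G):
    """IH(.): tightest axis-aligned box [lo, hi] around the zonotope (c, G)."""
    r = np.abs(np.asarray(G, dtype=float)).sum(axis=1)
    c = np.asarray(c, dtype=float)
    return c - r, c + r

def needs_split(L_center, L_gens, eps):
    """True when IH(L_k) is NOT contained in [-eps, eps], i.e. (26) fails."""
    lo, hi = interval_hull(L_center, L_gens)
    return not (np.all(lo >= -eps) and np.all(hi <= eps))

def split_zonotope(c, G):
    """Halve the zonotope (c, G) along its longest generator."""
    c = np.asarray(c, dtype=float)
    G = np.asarray(G, dtype=float)
    j = int(np.argmax(np.linalg.norm(G, axis=0)))
    half = G[:, j] / 2.0
    G_half = G.copy()
    G_half[:, j] = half
    return (c - half, G_half), (c + half, G_half.copy())
```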
IV. Sensitivity of Safety Distance for Optimization
In this section, we first introduce the definition of the safety distance for a zonotope and then discuss the corresponding sensitivity calculation, which is applied to SRAM dynamic stability optimization by tuning multiple SRAM device parameters simultaneously. The notations used in this section and their definitions are listed in Table II.
A. Safety Distance
With the use of zonotopes, the safety distance in the state space can be obtained as follows. Assume that a safe state is located at $p_{\mathrm{safe}}$ in the state space. For any zonotope in the form of (12), the safety distance for the reachable set can be expressed as

$$\mathcal{D} = \Big\{d \in \mathbb{R}^{n \times 1} : d = p_{\mathrm{safe}} - c - \sum_{i=1}^{q} [0, 1]\, g^{(i)}\Big\}. \qquad (27)$$

TABLE II
Parameters Used in Robustness Optimization

Algorithm 1: Reachability Analysis
Input: System equation, input vector $I_{1,2,\ldots,N}$, initial set $X_0$, simulation interval $h$, and the maximum number of time steps $N$.
Output: $X_N$
1: for (k = 1; k < N + 1; k++) do
2:     $X_{k-1} \leftarrow (x_{k-1}^*, X_{k-1}^g)$
3:     compute $x_k^*$ and linearized matrices $C_k$, $G_k$
4:     compute system matrix zonotopes $\mathcal{A}$ and $\mathcal{C}$
5:     approximate linearization error $L_k$
6:     if $\mathrm{IH}(L_k) \subseteq [-\epsilon, \epsilon]$ then
7:         $X_h^g = \mathcal{A}^{-1} \frac{\mathcal{C}}{h} X_{k-1}^g$
8:         $X_i^g = \mathcal{A}^{-1} U_k^g$
9:         $X_e^g = \mathcal{A}^{-1} L_k$
10:        $X_k^g = X_h^g \oplus X_i^g \oplus X_e^g$
11:        $X_k = (x_k^*, X_k^g)$
12:    else
13:        $X_{k-1} = \mathrm{split}(X_{k-1})$
14:        continue
15:    end if
16: end for


As shown in Fig. 9, for one specific point inside the reachable set, a particular safety distance can be determined as

$$D = \Big\| p_{\mathrm{safe}} - c - \sum_{i=1}^{q} \beta^{(i)} g^{(i)} \Big\|_2, \qquad 0 \le \beta^{(i)} \le 1 \qquad (28)$$

where $\beta^{(i)}$, $i = 1, \ldots, q$, are the generator coefficients that determine the relative position of the point within the zonotope. Note that the safety distance reduces to zero if the zonotope settles in the safe region. As such, one can use it to verify the dynamic stability of SRAM.
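A small sketch of (28) is given below, with hypothetical inputs. Besides evaluating $D$ for given coefficients, it computes the largest safety distance over the unidirectional zonotope by enumerating $\beta \in \{0, 1\}^q$: because $D$ is a convex function of $\beta$, its maximum over the box $[0, 1]^q$ is attained at a vertex. The enumeration grows as $2^q$, so it only suits small examples.

```python
import itertools
import numpy as np

def safety_distance(p_safe, c, G, beta):
    """D = || p_safe - c - G @ beta ||_2 with every beta_i in [0, 1], per (28)."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0.0) or np.any(beta > 1.0):
        raise ValueError("unidirectional coefficients must lie in [0, 1]")
    point = np.asarray(c, dtype=float) + np.asarray(G, dtype=float) @ beta
    return float(np.linalg.norm(np.asarray(p_safe, dtype=float) - point))

def max_safety_distance(p_safe, c, G):
    """Largest D over the zonotope, found at a vertex beta in {0, 1}^q."""
    q = np.asarray(G).shape[1]
    return max(safety_distance(p_safe, c, G, b)
               for b in itertools.product((0.0, 1.0), repeat=q))
```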
B. Sensitivity of Safety Distance
With the use of reachability analysis by zonotope, the trajectory of the SRAM is obtained, and the sensitivity of the safety distance $D$ at the final reachable set

$$x_{\mathrm{final}} = c_{\mathrm{final}} + \sum_{i=1}^{q} [0, 1]\, g_{\mathrm{final}}^{(i)}$$

can be calculated afterward. Note that the safety distance $D$ for a reachable set can vary within a certain range, as the perturbation of device parameters (20) can result in different nearby operating points.
The perturbation range of device parameters $[0, \Delta W]$ takes the form of interval entries of the matrix zonotope (16) and is included in $\mathcal{A}$ in (23). By zonotope multiplication, the perturbation is transferred to the generator $\Delta x_k$ (namely, $g_k$) in $X_k^g$. After running the reachability iterations, the generators of the final state ($g_{\mathrm{final}}$) are used to derive the perturbation of the safety distance as follows:

$$\Delta D = \sum_{i=1}^{q} \frac{(p_{\mathrm{safe}} - c_{\mathrm{final}})^T}{\|p_{\mathrm{safe}} - c_{\mathrm{final}}\|_2}\, g_{\mathrm{final}}^{(i)}. \qquad (29)$$
As shown in Fig. 9, the perturbation $\Delta D$ of the safety distance at the final reachable set $x_{\mathrm{final}}$ is obtained by projecting the zonotope generators $g_{\mathrm{final}}^{(i)}$ onto the normal vector $(p_{\mathrm{safe}} - c_{\mathrm{final}})^T / \|p_{\mathrm{safe}} - c_{\mathrm{final}}\|_2$, which points from the zonotope center $c_{\mathrm{final}}$ to the safe region $p_{\mathrm{safe}}$.
Fig. 9. Safety distance and its sensitivity in reachability analysis.
As such, one can calculate the large-signal sensitivity $S(D, w)$ of the safety distance $D$ with respect to device parameter $w$ by

$$S(D, w) := \frac{\Delta D}{\Delta w} \qquad (30)$$

which is the ratio between the increments $\Delta D$ and $\Delta w$ and is evaluated for multiple device parameters simultaneously.
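The projection (29) and the ratio (30) are sketched below. The sketch assumes one reachability run per device parameter, each producing its own final generator matrix and hence its own $\Delta D$ (Section V states that reachability analysis is performed three times for three transistor pairs); the numbers in the example are hypothetical.

```python
import numpy as np

def safety_distance_perturbation(p_safe, c_final, G_final):
    """Delta D from (29): final generators projected onto the unit vector
    pointing from the zonotope center c_final to the safe state p_safe."""
    d = np.asarray(p_safe, dtype=float) - np.asarray(c_final, dtype=float)
    unit = d / np.linalg.norm(d)
    return float(np.sum(unit @ np.asarray(G_final, dtype=float)))

def large_signal_sensitivity(delta_D, delta_w):
    """S(D, w) from (30): elementwise Delta D / Delta w, one entry per parameter."""
    return np.asarray(delta_D, dtype=float) / np.asarray(delta_w, dtype=float)

# Hypothetical: unit vector (0.6, 0.8) and identity generators give Delta D = 1.4.
dD = safety_distance_perturbation([3.0, 4.0], [0.0, 0.0], np.eye(2))
S = large_signal_sensitivity([dD, 0.7], [2.0, 1.0])
```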
Note that this differs from the single-parameter small-signal sensitivity of a state variable, obtained by differentiating (2) with respect to $w$:

$$s = \frac{\partial x_k}{\partial w} = -\Big(\frac{C}{h} + G\Big)^{-1} \frac{\partial G}{\partial w} \Big(\frac{C}{h} + G\Big)^{-1} \Big(\frac{C}{h}\, x_{k-1} - u_k\Big) \qquad (31)$$

in which the linearization error $L_k$ of (2) is omitted and the state variable $x_{k-1}$ of the last time step is assumed to be constant. (For simplicity of presentation, derivatives of the capacitance matrices are omitted.) As such, one can observe that although the single-parameter small-signal sensitivity is easy to obtain, compared to the large-signal sensitivity $S(D, w)$ calculated from reachability analysis in (30), $s$ may fail to capture the variation accumulated from previous states by multiple parameters. Moreover, because it does not consider nonlinearity, the small-signal sensitivity may fail to provide an accurate direction during the global optimization of the system trajectory. In contrast, the proposed multiparameter large-signal sensitivity of the safety distance in reachability analysis can be effectively used for SRAM dynamic stability optimization, converging faster and with higher accuracy, as demonstrated by the numerical experiment results.
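For comparison, the small-signal sensitivity (31) can be computed with two linear solves, since $(C/h + G)^{-1}(C x_{k-1}/h - u_k)$ is just the backward Euler state $x_k$ without $L_k$. The sketch below follows (31) as written, including the omission of $\partial C/\partial w$; the small matrices used to exercise it are made up.

```python
import numpy as np

def small_signal_sensitivity(C, G, dG_dw, h, x_prev, u_k):
    """s = dx_k/dw from (31), for one parameter w that enters through G only.

    x_k = (C/h + G)^{-1} (C/h x_prev - u_k), then
    s = -(C/h + G)^{-1} (dG/dw) x_k.
    """
    A = C / h + G
    x_k = np.linalg.solve(A, C @ x_prev / h - u_k)
    return -np.linalg.solve(A, dG_dw @ x_k)
```

A central finite difference on the same backward Euler step reproduces $s$, which is a convenient sanity check; note that $s$ only reflects the current step, which is the limitation discussed in the text.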


Fig. 10. Flow-chart of robust stability optimization of SRAMs.

C. Safety Distance Optimization by Sensitivity
The sensitivity of the safety distance derived from the reachability analysis can guide the optimization in a direction that departs from the unsafe region. It can effectively shorten the safety distance by tuning multiple device parameters. As such, one can embed it within any gradient-based optimization algorithm to achieve a robust SRAM design under process variations. Note that dynamic stability optimization by the sensitivity of the safety distance is convenient and general: because no specific knowledge of the circuit type is required, it can be applied to any circuit for which a safety distance is properly defined. In the case of robust stability optimization for SRAMs, transistor widths can be automatically sized to improve the dynamic stability. The complete flow of SRAM optimization is shown in Fig. 10, with the following detailed steps.
First, in each search step, the increment $\Delta w_k$ of the parameter vector $w_k$ is taken along the sensitivity of the safety distance, in the descent direction:

$$\Delta w_k = -\lambda_k \nabla_k \qquad (32)$$

where $\lambda_k > 0$ is a scaling factor and $\nabla_k$ is the gradient of the objective function, i.e., $S_k(F(w, t), w)$.
The parameter increment $\Delta w_k$ for the next step is estimated by

$$F(w_k, t) + \Delta w_k^T \nabla_k = 0. \qquad (33)$$

Combining (32) with (33) yields

$$\lambda_k = \frac{F(w_k, t)}{\nabla_k^T \nabla_k} \qquad (34)$$

from which the increment of the parameter vector $\Delta w_k$ in (32) is obtained.
Note that the objective function $F(w, t)$ changes nonlinearly in the parameter space, but its gradient $\partial F(w, t)/\partial w$ becomes small in magnitude around the safe region. As such, an empirical scaling factor $\eta < 1$ can be used to resize the estimated increment of the parameter vector

$$\lambda_k = \eta\, \frac{F(w_k, t)}{\nabla_k^T \nabla_k}, \quad 0 < \eta < 1 \qquad (35)$$

such that the convergence of the optimization can be improved. Moreover, to further improve convergence, initial-value stepping can be used when the search is stuck in a deadlock or leaves the feasible range of device parameters ($W_{\min} < w_{1,2,3} < W_{\max}$).

V. Experimental Results
With the use of zonotope-based reachability analysis, the robustness verification and optimization for SRAM dynamic stability are implemented inside a SPICE-like simulator in MATLAB. Manipulations of zonotopes are performed by a MATLAB toolbox named Multiparametric Toolbox (MPT) [30]. BSIM3 is used as the MOSFET transistor model. The threshold-voltage variation in each transistor is introduced as a noise current source in (6). Its center value is 0, and its variation is $|k \frac{W}{L}(V_{gs} - V_{th})\,\delta V_{th}|$, where $\delta$ is the variation range. Experimental data are collected on a desktop with an Intel Core i5 3.2 GHz processor and 8 GB memory.
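The noise-current amplitude above is the first-order change of the drain current for a threshold shift $\delta V_{th}$. The sketch below evaluates it, assuming a long-channel square-law model $I_D = \frac{k}{2}\frac{W}{L}(V_{gs} - V_{th})^2$, whose derivative with respect to $V_{th}$ gives the expression; the simulations in this paper use BSIM3, and all numerical values here are hypothetical.

```python
def vth_noise_current_amplitude(k, w_over_l, v_gs, v_th, delta):
    """|k (W/L) (Vgs - Vth) * delta * Vth| in amperes.

    k: process transconductance (A/V^2); delta: relative Vth variation,
    e.g. 0.3 for 30%. First-order term of the square-law current change.
    """
    return abs(k * w_over_l * (v_gs - v_th) * delta * v_th)

def square_law_current(k, w_over_l, v_gs, v_th):
    """I_D = (k/2)(W/L)(Vgs - Vth)^2 for Vgs > Vth (saturation, no CLM)."""
    return 0.5 * k * w_over_l * (v_gs - v_th) ** 2

# Hypothetical device: k = 200 uA/V^2, W/L = 2, Vgs = 1.0 V, Vth = 0.4 V, 30%.
amp = vth_noise_current_amplitude(200e-6, 2.0, 1.0, 0.4, 0.3)
exact = (square_law_current(200e-6, 2.0, 1.0, 0.4 - 0.3 * 0.4)
         - square_law_current(200e-6, 2.0, 1.0, 0.4))
```

For this 30% shift, the first-order amplitude is 28.8 uA while the square-law difference is about 31.7 uA; the gap is the second-order term, which illustrates the limited accuracy of a first-order noise model at large variations.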
We first demonstrate zonotope-based reachability analysis for SRAM dynamic stability verification under threshold-voltage variations. Then, we show robustness optimization on the basis of zonotope-based sensitivity calculation. Further, we compare our approach with Monte Carlo-based verification and with single-parameter small-signal sensitivity-based optimization. For SRAM stability verification, we used 1000/2000 samples to keep the comparison within a reasonable runtime. For SRAM stability optimization, we used 100 000 samples when measuring the yield rate before and after optimization, as shown in Fig. 16 (not all 100 000 curves can be shown in one figure). Moreover, for both verification and optimization, we set the threshold-voltage variation up to 30%.
In addition, note that during the read operation, two charged external capacitors are connected to the outputs of the SRAM, and the data in the SRAM is read after one of the external capacitors is discharged through the SRAM. By comparison, during the write operation, the internal capacitors in the SRAM are pulled down/up. Since the internal capacitors are much smaller than the external capacitors used for the read operation, the write operation is observed to be faster than the read operation in the experimental results.
A. Dynamic Stability Verification Results
The 40 nm node is used in our experiment, and 1 V is chosen as the supply voltage. Moreover, the equilibrium state of an SRAM usually does not settle at exactly vdd or 0. Thus, we start the reachability analysis with an initial state set of $v_1 \in [0.98, 1.00]$ V and $v_2 \in [0, 0.02]$ V.
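The initial state set above is an axis-aligned box, which is itself a zonotope with one generator per state variable. A short sketch of that conversion, assuming the state order $[v_1, v_2]$ (the function name is illustrative):

```python
import numpy as np

def box_to_zonotope(lower, upper):
    """Interval box [lower, upper] -> zonotope (center, diagonal generators)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    center = (lower + upper) / 2.0
    generators = np.diag((upper - lower) / 2.0)  # one axis-aligned generator per state
    return center, generators

# v1 in [0.98, 1.00] V and v2 in [0, 0.02] V
c0, G0 = box_to_zonotope([0.98, 0.0], [1.00, 0.02])
```

This yields center $(0.99, 0.01)$ V with two generators of half-width 0.01 V, one along each state axis.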
1) Verification of Write Operation: The write operation
is first verified by reachability analysis with consideration of threshold-voltage variations. For comparison, Monte Carlo verification is performed to demonstrate the accuracy of reachability analysis. The duration of the write signal is varied to examine SRAM behaviors under different conditions.
Verification results of the write operation are shown in Fig. 11 with the threshold-voltage variation range set to 10%. A larger variation range can be considered for verification when a high-order noise model is available. The curves simulated by Monte Carlo verification are plotted in light purple, and the trajectories of reachability analysis are drawn in dark blue. Three different durations of the write signal are tested: 0.025 ns, 0.029 ns, and 0.050 ns.
In Fig. 11(a), the write signal lasts for 0.025 ns. At the beginning, the trajectories move toward the other corner of the variable plane as data is being written into the SRAM. Later, the turning point of the trajectories is generated when the write signal flips to 0. Afterward, the trajectories return to the initial states. As such, the data fails to be written into the SRAM, which means that a write failure happens.
Fig. 11. Verification of write operation with threshold variation range of 10%. (a) Write operation fails with 0.025 ns writing pulse. (b) Write operation fails with 0.029 ns writing pulse. (c) Write operation succeeds with 0.050 ns writing pulse.
Fig. 12. Verification of read operation with threshold variation range of 30%. (a) Read operation succeeds with 6 ns pulse. (b) Read operation fails with 11 ns pulse.
When the write pulse increases to 0.029 ns in Fig. 11(b), the trajectories of the reachable sets split around the center of the state space. This happens when the write signal shuts down. Some of the new trajectories move back to the initial states, which means that some states still fail in the write operation. To limit the computational cost, the number of trajectories needs to be constrained; for simplicity of presentation, we show two trajectories in Fig. 11(b). Note that at the end of the trajectory departing from the failure region, the Monte Carlo curves do not settle within the reachable sets, i.e., a mismatch between the Monte Carlo curves and the reachable sets occurs. This is because, after some trajectories of reachable sets are truncated, the remaining trajectories may not cover all possible curves of the Monte Carlo verification. Thus, the number of trajectories is a tradeoff between runtime and accuracy. An ideal set-splitting strategy minimizes the overlap between new reachable sets, so that the new trajectories cover as many Monte Carlo curves as possible.
Finally, when the duration increases to 0.050 ns in Fig. 11(c), all possible states complete the write operation without failure. As shown in Fig. 11, the curves of Monte Carlo verification remain within the reachable sets computed by reachability analysis, with similar accuracy. This indicates that reachability analysis can successfully approximate the trajectories of SRAM for failure verification.
2) Verification of Read Operation: Next, the read operation can also be verified by reachability analysis. The verification results of the read operation are compared for different durations of the input signal, with the Vth variations set to 30%. The duration of the read signal is set to 6 ns and 11 ns.
As shown in Fig. 12, the Monte Carlo curves are plotted in light purple, and the enclosing trajectories computed by reachability analysis are drawn in dark blue. When the signal duration is 6 ns [Fig. 12(a)], all reachable sets recover back to the initial state after the read operation finishes. But when the signal duration rises to 11 ns, most reachable sets head for the opposite state, which means that a read failure happens [Fig. 12(b)]. Due to the limited accuracy of the first-order noise current model in (6), differences between Monte Carlo and the reachability analysis can be observed. Yet reachability analysis is still able to capture most of the possible trajectories obtained by Monte Carlo simulations. Note that in the optimization experiment, the threshold-voltage variation is predefined, without using the noise current equation in (6).
B. Stability Optimization Results
The setup of the SRAM circuit in optimization is as follows. 40 nm CMOS is used as the technology node for our optimization experiment. The SRAM supply voltage vdd is set to 1 V. The initial states of the SRAM are set as $v_1 = 1$ V and $v_2 = 0$. The transistor widths can change in the range [100 nm, 600 nm] with a step of 1 nm. Threshold variations of 30% are considered in both verification and optimization. Note that interval values of threshold-voltage variations are considered by the reachability analysis as input sources. For the robustness optimization, the interval values of transistor widths are further considered in the matrix zonotope to derive the sensitivity.
As mentioned in Section II-C, the dynamic stability of SRAM can be improved by shortening the safety distance and converging to the safe region. Although the yield rate cannot be calculated from the safety distance, it can be optimized by improving the stability, i.e., reducing the failure rate of SRAMs. In this way, the safety distances of a number of SRAMs with different Vth deviations can be shortened as a whole. As such, fewer SRAMs end up outside the safe region, and the yield rate $Y := 1 - N_{\mathrm{failure}}/N_{\mathrm{total}}$ increases as $N_{\mathrm{failure}}$ decreases. For the write operation, strong pulling strength of M1 and M4 and weak strength of the other transistors lead to a high probability of write failure. Thus, negative threshold-voltage variations are assumed for M1 and M4, while positive threshold-voltage variations are assumed for the other transistors. For the same reason, negative threshold-voltage variations in M2, M3, and M6 and positive threshold-voltage variations in M1, M4, and M5 are used for the read operation. The threshold-voltage variations are held constant at their standard deviations during optimization.
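The width updates in the experiments below follow the step rule (32) to (35) of Section IV-C. A minimal sketch of one update is given here, assuming the objective value $F$ and its gradient are already available (in the paper they come from the reachability-based sensitivity (30)). The toy objective $F(w) = \|w - w_{\mathrm{target}}\|_2$ is hypothetical, and the initial-value stepping used when the search leaves $[W_{\min}, W_{\max}]$ is not shown.

```python
import numpy as np

def parameter_step(w, F_value, grad, eta=1.0):
    """One update following (32)-(35).

    lambda_k = eta * F / (grad^T grad) with 0 < eta <= 1 (eta = 1 gives (34)),
    dw_k = -lambda_k * grad, which satisfies F + dw_k^T grad = 0 when eta = 1.
    """
    w = np.asarray(w, dtype=float)
    grad = np.asarray(grad, dtype=float)
    g2 = float(grad @ grad)
    if g2 == 0.0:
        raise ValueError("zero gradient: no descent direction")
    lam = eta * F_value / g2
    return w - lam * grad

# Toy objective: distance to a hypothetical target point.
target = np.array([3.0, 4.0])
F = lambda w: float(np.linalg.norm(w - target))
grad_F = lambda w: (w - target) / np.linalg.norm(w - target)
w0 = np.array([0.0, 0.0])
w_damped = parameter_step(w0, F(w0), grad_F(w0), eta=0.5)
w_full = parameter_step(w0, F(w0), grad_F(w0), eta=1.0)
```

With $\eta = 1$ the step solves the linearized condition (33) exactly, which for this toy objective lands on the target; $\eta = 0.5$ covers half the distance, the kind of damping (35) uses when the gradient shrinks near the safe region.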
1) Optimization of Read or Write Failure: To start with, we perform dynamic stability optimization for the read operation only. The initial widths of the three transistor pairs are [W1, W3, W5] = [200 nm, 300 nm, 300 nm], and the pulse width is 9 ns. The process of stability optimization is shown in Fig. 13, in which trajectories are plotted in light purple and reachable sets (i.e., zonotopes) due to parameter changes are drawn in dark blue. Unlike the situation in the previous section, the zonotopes for SRAM optimization are quite small. This is because transistor widths are varied in steps of 1 nm, and thus the resulting variation range of the trajectory is limited. Note that the sensitivity calculated here is the multiparameter large-signal sensitivity with respect to one zonotope set, which is different from the classic single-parameter small-signal sensitivity in (31). As demonstrated later, the multiparameter large-signal sensitivity is more stable and accurate.
Three reachable sets are generated at each nominal point with different transistor widths. The final sets are used to derive large-signal sensitivities (Fig. 9) by sensitivity-based reachability analysis. The initial trajectory fails to converge to the safe region. After three iterations, the optimized trajectory recovers from read failure. The optimized widths are [148 nm, 343 nm, 217 nm].
Fig. 13. Optimization of read operation only.
Then, we perform stability optimization for the write operation only. We set the initial pair widths as [W1, W3, W5] = [400 nm, 500 nm, 350 nm] and reduce the pulse width to 0.050 ns. Stability optimization by the large-signal sensitivity calculated from reachability analysis guides the system trajectory to converge to the safe region within four iterations (Fig. 14). The optimized widths are [381 nm, 440 nm, 497 nm].
2) Optimization of Read and Write Failure: To optimize read and write failure simultaneously, the initial transistor pair widths are randomly chosen as W1 = 200 nm, W3 = 400 nm, and W5 = 400 nm. The pulse width is 9 ns for the read operation and 0.024 ns for the write operation. The process of stability optimization is shown in Fig. 15.
The optimization directions of the trajectories for the read and write operations are shown in Fig. 15(a) and (b), respectively. The trajectory simulated with the initial set of transistor widths is labeled as initial. From Fig. 15(b), one can observe that at the beginning, write failure happens as the trajectory converges to the initial state. With the proposed sensitivity-based reachability analysis for dynamic stability optimization, the trajectory of the write operation moves away from the wrongly converged region and finally reaches the target state after six iterations of tuning the transistor pair sizes. Meanwhile, consider the read operation in Fig. 15(a), where read failure did not happen at the beginning. As the write operation is optimized, the trajectory for the read operation deviates upward too. As such, the safety distance to the top-left corner (in this case) is decreased. In other words, the write operation is optimized at the expense of the read operation to achieve a lower failure rate for both cases.
The optimized transistor widths obtained by our approach are finally W1 = 192 nm, W3 = 330 nm, and W5 = 586 nm. The yield rate ($Y := 1 - N_{\mathrm{failure}}/N_{\mathrm{total}}$) considering both read and write functions is improved from 6.8% to 99.957%. Further improvement of the yield rate can be achieved by introducing larger threshold variations during the optimization.
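Assuming the reported yields are measured on the 100 000 samples mentioned at the start of this section, the yield definition translates them into failure counts as computed below. These counts are back-calculated from the stated percentages, not separately reported.

```python
def yield_rate(n_failure, n_total):
    """Y := 1 - N_failure / N_total."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 1.0 - n_failure / n_total

N_TOTAL = 100_000  # samples used for the yield measurement (Fig. 16)
# Failure counts implied by the reported yields of 6.8% and 99.957%.
failures_before = round(N_TOTAL * (1.0 - 0.068))
failures_after = round(N_TOTAL * (1.0 - 0.99957))
```

That is, 93 200 failing samples before optimization versus 43 after.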
Fig. 14. Optimization of write operation only.
Fig. 15. Optimization procedure for SRAM dynamic stability. (a) Optimization of read operation. (b) Optimization of write operation.

C. Comparisons
1) SRAM Dynamic Stability Verification: A detailed comparison between zonotope-based reachability analysis and the Monte Carlo method is made for the write operation. Because the Monte Carlo method is very time-consuming, we use 1000 samples, which in our experiments usually take more than one hour for a single round of verification. Different durations of the write signal are considered, as well as different threshold-voltage variations in all transistors. Detailed experimental results are listed in Table III, in which pulse refers to the duration of the input signal and acceleration is the ratio of the runtime of Monte Carlo to that of reachability analysis.
As shown in Table III, compared with Monte Carlo, reachability analysis can achieve a speedup of more than 800 for 1000 samples. When the write signal duration is set to 0.025 ns [Fig. 11(a)] or 0.050 ns [Fig. 11(c)], only one trajectory is generated by reachability analysis. Linearization around this one nominal trajectory takes up most of the simulation time. Thus, the runtime of reachability analysis is only slightly higher than the simulation of one Monte Carlo sample. When the signal duration is set to 0.029 ns, the reachable sets are split into different parts and two trajectories are generated. Therefore, the runtime of reachability verification doubles and the speedup ratio is halved when the signal lasts 0.029 ns and 10% Vth variations are introduced [Fig. 11(b)]. For all experiment cases listed in Table III, reachability analysis achieves accuracy similar to that of the Monte Carlo method in reporting the failure region.
TABLE III
Time Consumption of SRAM Verification
2) SRAM Dynamic Stability Optimization: The runtime of optimization at each iteration is listed in Table IV, which shows that our approach achieves a runtime speedup of more than 600. The optimized transistor widths [W1, W3, W5] for read and write stability optimization are also given in Table IV. In Table IV, Iter denotes the iteration number of the optimization; the sensitivity-based RA column lists the runtime (in seconds) of optimization with the proposed reachability-based sensitivity analysis; the MC column lists the runtime (in seconds) of the traditional Monte Carlo-based method; and the speedup column lists the corresponding speedup of the proposed method. For example, for the first optimization step, the proposed optimization takes about 9 s, while the Monte Carlo-based method needs nearly 2 h. The runtime of reachability analysis is roughly the same as that of one transient simulation, since most of the computation is spent on simulating the nominal trajectory. Table IV also shows how the transistor widths vary at each iteration. As discussed previously, the initial transistor widths are set to [200 nm, 400 nm, 400 nm], and the optimized transistor widths obtained by our approach are [192 nm, 330 nm, 586 nm]. In our case, to derive the large-signal sensitivity with respect to the three transistor pairs, reachability analysis is performed three times.
Fig. 16. Statistical yield calculation (a) before and (b) after optimization.
TABLE IV
Runtime Comparison of SRAM Stability Optimization
Furthermore, we compare our approach with another optimization routine based on the single-parameter small-signal sensitivity (31). For the same test case, the optimization result obtained with the small-signal sensitivity is shown in Fig. 17. Unlike in Fig. 15(b), the optimization routine based on single-parameter small-signal sensitivity fails to find a feasible solution and results in a negative width after three iterations. The transistor pair widths [W1, W3, W5] are shown in Fig. 17. Note that W5 fails to be tuned during optimization, because the small-signal sensitivity with respect to W5 is much smaller than the others. Since the single-parameter small-signal sensitivity depends only on the location of the final state, the resulting gradient has only local accuracy and changes irregularly as the trajectory moves. As a result, the small-signal sensitivity does not lead to convergence. The proposed large-signal sensitivity calculation in reachability analysis achieves much higher accuracy and thus a faster-converging SRAM optimization.
Fig. 17. Optimization of write operation by small-signal sensitivity.
VI. Conclusion
In this paper, we are the first to develop reachability analysis for
the robustness verification and optimization of SRAM dynamic
stability in the presence of multiple variation sources and device
parameters from all transistors. By quantitatively describing SRAM
robustness with a defined safety distance, our approach efficiently
provides not only stability verification but also optimization
during the zonotope-based reachability analysis. By modeling
variations as uncertain input currents added to the input range, the
zonotope-based reachability analysis provides the system performance
boundary used to estimate the SRAM dynamic stability region. We are
also the first to develop backward Euler-based zonotope evolution
with linearization-error update and control. Furthermore, a
multiparameter large-signal sensitivity calculation is developed in
terms of zonotopes and applied to the robustness optimization of
SRAM dynamic stability. By simultaneously tuning multiple SRAM
transistor widths, the resulting sensitivity of the safety distance
obtained during reachability analysis can be used in sequential
optimizations to guide the SRAM design so that its operation departs
from the unsafe region and converges in the safe region. In
addition, compared with the traditional optimization based on
single-parameter small-signal sensitivity, our method converges
faster and with higher accuracy. Compared with Monte Carlo-based
optimization, our method achieves speedups of up to 600× with
similar accuracy.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 4, APRIL 2014
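As background for the zonotope-based reachability summarized in the conclusion, the following sketch implements the standard zonotope operations: the generator representation, exact linear map, exact Minkowski sum, and the exact minimum of a linear function over a zonotope. It propagates a set through backward-Euler steps of a linear 2-state system ẋ = Ax + u with a bounded input u. It is an illustration only. The matrix `A`, step `h`, input set, and the half-space used as a "safety margin" are hypothetical, the margin is a stand-in for the paper's own safety distance, and the authors' linearization-error update and control are not modeled.

```python
from dataclasses import dataclass
from typing import List

Vec = List[float]
Mat = List[List[float]]


@dataclass
class Zonotope:
    """Z = {center + sum_j beta_j * g_j : beta_j in [-1, 1]}."""
    center: Vec
    generators: List[Vec]


def dot(u: Vec, v: Vec) -> float:
    return sum(p * q for p, q in zip(u, v))


def mat_vec(m: Mat, v: Vec) -> Vec:
    return [dot(row, v) for row in m]


def linear_map(m: Mat, z: Zonotope) -> Zonotope:
    # Exact: map the center and every generator.
    return Zonotope(mat_vec(m, z.center), [mat_vec(m, g) for g in z.generators])


def minkowski_sum(a: Zonotope, b: Zonotope) -> Zonotope:
    # Exact: add the centers and concatenate the generator lists.
    return Zonotope([x + y for x, y in zip(a.center, b.center)],
                    a.generators + b.generators)


def scale(s: float, z: Zonotope) -> Zonotope:
    return Zonotope([s * x for x in z.center],
                    [[s * x for x in g] for g in z.generators])


def min_along(d: Vec, z: Zonotope) -> float:
    """Exact minimum of d . x over Z: d . c - sum_j |d . g_j|."""
    return dot(d, z.center) - sum(abs(dot(d, g)) for g in z.generators)


def inv2(m: Mat) -> Mat:
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def backward_euler_step(a: Mat, h: float, z: Zonotope, u: Zonotope) -> Zonotope:
    """x_{k+1} = (I - hA)^-1 (x_k + h u_k), u_k in U, for xdot = A x + u
    with two states; no linearization error is modeled here."""
    i_minus_ha = [[1.0 - h * a[0][0], -h * a[0][1]],
                  [-h * a[1][0], 1.0 - h * a[1][1]]]
    return linear_map(inv2(i_minus_ha), minkowski_sum(z, scale(h, u)))


if __name__ == "__main__":
    A = [[-2.0, 0.5], [0.5, -2.0]]                        # hypothetical system
    U = Zonotope([0.1, 0.1], [[0.05, 0.0], [0.0, 0.05]])  # bounded input set
    Z = Zonotope([1.0, 0.2], [[0.1, 0.0], [0.0, 0.1]])    # initial state set
    for _ in range(50):
        Z = backward_euler_step(A, 0.05, Z, U)
    # Margin to a hypothetical unsafe half-space {x : x0 + x1 < 0}; > 0 is safe.
    print(len(Z.generators), round(min_along([1.0, 1.0], Z), 3))
```

The generator list grows by the input's generator count at every step. Practical reachability tools therefore add an order-reduction step, which is omitted here to keep every operation exact.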
References
[1] E. Seevinck, F. J. List, and J. Lohstroh, Static-noise margin analysis
of MOS SRAM cells, IEEE J. Solid State Circuits, vol. 22, no. 5,
pp. 748754, Oct. 1987.
[2] E. Grossar, M. Stucchi, K. Maex and W. Dehane, Read stability and
write-ability analysis of SRAM cells for nanometer technologies, IEEE
J. Solid State Circuits, vol. 41, no. 11, pp. 25772588, Nov. 2006.
[3] S. O. Toh, Z. Guo, and B. Nikolic, Dynamic SRAM stability characterization in 45 nm cmos, in Proc. VLSIC, Jun. 2010, pp. 3536.
[4] A. Singhee, C. F. Yang, J. D. Ma, R. A. Rutenbar, Probabilistic intervalvalued computation: Toward a practical surrogate for statistics inside
CAD tools, IEEE Trans. Comput. Aided Design Integr. Circuits Syst.,
vol. 27, no. 12, pp. 23172330, Nov. 2008.
[5] S. Yaldiz, U. Arslan, X. Li and L. Pileggi, Efficient statistical analysis
of read timing failures in SRAM circuits, in Proc. ISQED, 2009,
pp. 617, 621.
[6] C. Dong and X. Li, Efficient SRAM failure rate prediction via Gibbs
sampling, in Proc. DAC, Jun. 2011, pp. 200205.
[7] H. Yu and S. X.-D. Tan, Recent advance in computational prototyping
for analysis of high-performance analog/RF ICs, in Proc. ASICON,
Oct. 2009, pp. 760764.
[8] F. Gong, H. Yu, Y. Shi, D. Kim, J. Ren and L. He, Quickyield: An efficient global-search based parametric yield estimation with performance
constraints, in Proc. DAC, Jun. 2010, pp. 392397.
[9] F. Gong, H. Yu, and L. He, Fast non-Monte-Carlo transient noise
analysis for high-precision analog/RF circuits by stochastic orthogonal
polynomials, in Proc. DAC, Jun. 2011, pp. 298303.
[10] F. Gong, X. Liu, H. Yu, S. X-D. Tan, J. Ren and L. He, A fast nonMonte-Carlo yield analysis and optimization by stochastic orthogonal
polynomials, ACM Trans. Design Autom. Electron. Syst., vol. 17, no. 1,
pp. 10:110:23, Jan. 2012.
[11] H. Wang, H. Yu, and S. X.-D. Tan, Fast timing analysis of clock
networks considering environmental uncertainty, VLSI J. Integr., vol. 45,
no. 4, pp. 376387, Sep. 2012.
[12] W. Wu, F. Gong, R. Krishnan, H. Yu, and L. He, Exploiting parallelism
by data dependency elimination: A case study of circuit simulation algorithms, IEEE Design Test Comput., vol. 30, no. 1, pp. 2635, Feb. 2013.
[13] F. Gong, S. B. Kazeruni, L. He and H. Yu, Stochastic behavioral modeling analysis of analog/mixed-signal circuits, IEEE Trans. Comput.
Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 2433, Jan. 2013.
[14] K. Agarwal and S. Nassif, Statistical analysis of SRAM cell stability,
in Proc. DAC, 2006, pp. 5762.
[15] D. E. Khalil, M. Khellah, N-S. Kim, Y. Ismail, T. Karnik and V. K. De,
Accurate estimation of SRAM dynamic stability, IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 16, no. 12, pp. 16391647, Dec.
2008.
[16] B. Zhang, A. Arapostathis, S. Nassif and M. Orshansky, Analytical
modeling of SRAM dynamic stability, in Proc. ICCAD, Nov. 2006,
pp. 315322.
[17] S. Srivastava and J. Roychowdhury, Rapid estimation of the probability
of SRAM failure due to MOS threshold variations, in Proc. CICC,
Sep. 2007, pp. 229232.
[18] G. M. Huang, W. Dong, Y. Ho and P. Li, Tracing SRAM separatrix
for dynamic noise margin analysis under device mismatch, in Proc.
BMAS, Sep. 2007, pp. 610.
[19] C. J. Gu and J. Roychowdhury, An efficient, fully nonlinear, variabilityaware non-Monte-Carlo yield estimation procedure with applications
to SRAM cells and ring oscillators, in Proc. ASPDAC, Mar. 2008,
pp. 754761.
[20] W. Dong, P. Li, and G. M. Huang, SRAM dynamic stability: Theory,
variability and analysis, in Proc. ICCAD, Nov. 2008, pp. 378385.
[21] S. Gupta, B. H. Korigh and R. A. Rutenbar, Towards formal verification
of analog designs, in Proc. ICCAD, Nov. 2004, pp. 210217.
[22] G. Frehse, B. H. Krogh, and R. A. Rutenbar, Verifying analog
oscillator circuits using forward/backward abstraction refinement, in
Proc. DATE, Mar. 2006, pp. 257262.

[23] D. Walter, S. Little, C. Myers, N. Seegmiller, and T. Yoneda,


Verification of analog/mixed-signal circuits using symbolic methods,
IEEE Trans. Comput. Aided Design Integr. Circuits Syst., vol. 27,
no. 12, pp. 22232235, Dec. 2008.
[24] M. Althoff, S. Yaldiz, A. Rajhans, X. Li, B. H. Krogh, and L. Pileggi,
Formal verification of phase-locked loops using reachability analysis
and continuization, in Proc. ICCAD, Nov. 2011, pp. 659666.
[25] Y. Song, H. Fu, H. Yu and G. Shi, Stable backward reachability
correction for PLL verification with consideration of environmental
noise induced jitter, in Proc. ASPDAC, Jan. 2013, pp. 755760.
[26] Y. Song, H. Yu, S. Manoj P. D. and G. Shi, SRAM dynamic stability
verification by reachability analysis with consideration of threshold
voltage variations, in Proc. ISPD, 2013, pp. 4349.
[27] A. Girard, Reachability of uncertain linear systems using zonotopes,
in Proc. HSCC, 2005, pp. 291305.
[28] M. Althoff, Reachability analysis and its application to the safety
assessment of autonomous cars, Ph.D. dissertation, Dept. Electr. Eng.,
TUM, Munich, 2010.
[29] U. M. Ascher and L. R. Petzold, Computer Methods for Ordinary
Differential Equations and Differential-Algebraic Equations.
Philadelphia, PA, USA: Society Ind. Appl. Math., 1998.
[30] M. Kvasnica, P. Grieder, and M. Baotic. (2013, Jul.).
Multi-parametric toolbox (MPT). MPT 2.6.3 [Online]. Available:
http://control.ee.ethz.ch/mpt/

Yang Song received the B.S. and M.S. degrees in microelectronics
from Shanghai Jiao Tong University, Shanghai, China, in 2006 and
2013, respectively, and is currently pursuing the Ph.D. degree at
the University of California, San Diego, CA, USA.
From 2012 to 2013, he was a Project Officer with the VIRTUS IC
Design Center of Excellence, Nanyang Technological University (NTU),
Singapore, where he was a member of the NTU CMOS Emerging Technology
group. His current research interests include applications of
reachability analysis to circuit-level verification and
optimization.

Hao Yu (M'06–SM'13) received the B.S. degree from Fudan University,
Shanghai, China, in 1999 and the Ph.D. degree from the Electrical
Engineering Department, University of California, San Diego, CA,
USA, in 2007.
He was a Senior Research Staff member with Berkeley Design
Automation, Berkeley, CA, USA. Since 2009, he has been an Assistant
Professor with the School of Electrical and Electronic Engineering,
Nanyang Technological University, Singapore. His current research
interests include 3-D-IC and RF-IC at nano-tera scale. He has
authored 115 peer-reviewed IEEE/ACM publications.
Dr. Yu was a recipient of the Best Paper Award from ACM TODAES'10,
Best Paper Award nominations at DAC'06, ICCAD'06, and ASP-DAC'12,
Best Student Paper (advisor) Finalist at SiRF'13 and RFIC'13, and
the Inventor Award'08 from the Semiconductor Research Corporation.
He is an Associate Editor and Technical Program Committee Member for
a number of IEEE/ACM journals and conferences.

Sai Manoj Pudukotai DinakarRao (S'13) received the M.Tech. degree
from IIIT Bangalore, India, in 2012, and is currently pursuing the
Ph.D. degree at the School of Electrical and Electronic Engineering,
Nanyang Technological University, Singapore.
His current research interests include 3-D-IC I/O modeling, thermal,
and power management.
Mr. Manoj P. D. was a recipient of the A. Richard Newton Young
Research Fellow Award at DAC 2013.

2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing

A Radiation Hardened SRAM in 180-nm RHBD Technology
CHEN Nan, WEI Tingcun, WEI Xiaomin and Chen Xiao
School of Computer Science and Technology, Northwestern Polytechnical University
Xi'an, China

chennan0929@126.com

Abstract: A 24 kB radiation-hardened static random access memory
(SRAM) using a 180-nm commercial CMOS process, suitable for embedded
system-on-a-chip integrated circuits, is presented. Radiation-hardened
design is applied at the system, circuit, and layout levels to improve
tolerance to radiation effects. Prototype SRAM chips were fabricated
and tested: their electrical properties were measured, and total
ionizing dose (TID) experiments were carried out using a Co-60 gamma
radiation source. The experimental results show that the SRAM chips
tolerate a TID of more than 300 krad(Si) with correct electrical
function, but as the total ionizing dose increases, the static and
dynamic currents of the SRAM increase significantly and the write and
read times increase slowly. Furthermore, our results verify that CMOS
transistor layout with ring gates and P-type guard rings can greatly
enhance the TID tolerance of the SRAM.

Keywords: SRAM; Radiation Hardened by Design; Total Ionizing Dose
Effect; Ring-gate; Guard Ring

I. INTRODUCTION
SRAM fabricated in CMOS processes is widely used in space exploration
and high-energy physics experiments. Because these application
environments contain large numbers of cosmic rays and charged
particles (α particles, high-energy protons, neutrons, and so on),
SRAM is especially sensitive to the ionizing radiation they create,
which easily causes stored data bits to upset and leads to a lower
data access rate, timing disorder, large parameter shifts, and
increased power consumption [1]. Radiation effects seriously shorten
the lifetime and degrade the stability of SRAM [2]. Therefore,
research on and design of radiation-hardened SRAM are necessary.

According to the damage mechanism, ionizing radiation effects are
divided into SEE (Single Event Effect) and TID (Total Ionizing Dose)
[3]. SEU (Single Event Upset) and SEL (Single Event Latch-up) are the
most important SEE damage mechanisms in SRAM. TID creates trapped
charges in the silicon dioxide, which shift the threshold voltage,
reduce the mobility, and increase the leakage current of CMOS
devices; this changes the SRAM timing and noise margin, increases
power consumption, and can even cause permanent damage [4]. TID
damage accumulates over time. Therefore, TID immunity is one of the
most important parameters for SRAM lifetime and reliability [5].

In this paper, a commercial 180-nm CMOS process and RHBD (radiation
hardening by design) technology are adopted to realize
radiation-hardened design at the system, circuit, and layout levels.
Prototype SRAM chips are fabricated and tested, and their electrical
properties are measured. On that basis, a TID experiment using a
Co-60 gamma radiation source is performed, and the results are
analyzed.

This paper is organized as follows: Sect. II describes the design of
the SRAM for improved radiation hardness, Sect. III presents the
electrical test results, Sect. IV presents the TID experiment and
analyzes the results, and the conclusion is given in Sect. V.

This work is supported by the National Key Scientific Instrument and
Equipment Development Project number 2011YQ040082.

978-1-4799-3381-5/13 $31.00 © 2013 IEEE
DOI 10.1109/DASC.2013.55

II. THE DESIGN OF RADIATION-HARDENED SRAM CHIP
The SRAM system, circuit, and layout are hardened using RHBD
techniques, mainly targeting TID, SEU, and SEL.

A. System Design
The system structure of the designed radiation-hardened SRAM
prototype chip is shown in Figure 1. To compensate for the limited
SEU tolerance of the commercial CMOS process, EDAC (Error Detection
and Correction) is integrated in the SRAM to process stored data in
real time. The encoding algorithm of the EDAC module is a (13, 8)
Hamming code, i.e., 8 data bits and 5 redundancy bits (13 bits in
total). When data are written, the EDAC writing circuit generates the
5 redundancy bits, which are stored in the SRAM together with the
data. When data are read, the data and the 5 redundancy bits enter
the EDAC decoding and error-correction circuit together. The circuit
first decides whether the data are correct. If they are, they are
output directly. If one bit is wrong, it is corrected and then
output. If two bits are wrong, the error is reported to the system
through the error flag. The error flag consists of two bits: one
shows whether an error was corrected, and the other shows whether the
data are valid. When two or more bits are in error, the data are
unusable. Therefore, the built-in EDAC module can correct or flag
upsets of the stored data and improves the SEU tolerance of the SRAM.

Fig. 1. The system block diagram of the Rad-Hard SRAM (24Kb×8bit;
blocks: a 24Kb×13bit SRAM with memory array, replication array,
address decoding, precharge and amplifier, and sequential control
driver; EDAC writing (encode), reading (decode), and test circuits;
8-bit data path, 13-bit data-plus-redundancy path, and a 2-bit
reading error flag).

Meanwhile, the SRAM storage array is arranged with bit interleaving,
which reduces the rate of multi-bit upsets and improves the SEU
tolerance of the SRAM. In addition, a low-power design strategy is
applied by dividing the storage array into modules.

The common timing control method of SRAM is based on the clock signal
and sequence signals [6]. Delay units are used to produce the
sequence signals that control the input and output data, word lines,
and bit lines. However, transistor gate capacitance increases in a
radiation environment, which changes device speed and degrades the
accuracy of the delay units, easily resulting in malfunction. Instead
of delay units, dedicated radiation-hardened sequential circuits are
used to produce the timing signals in the radiation-hardened SRAM
designed in this paper. A row and a column of replica cells are added
to the storage array to track its key signal lines and feed back to
the sequential control unit, which then controls the writing and
reading of data according to the feedback signal.

B. Circuit and Layout Design
The 180-nm commercial CMOS process is used to design and realize the
radiation-hardened prototype SRAM chips in this paper. The gate oxide
is thinner than 4 nm, so the radiation-induced threshold voltage
shift is essentially eliminated [7]. However, the leakage path around
the edge of the MOS transistor, created by the self-aligned gate
process of commercial CMOS, becomes the main TID failure mechanism in
deep-submicron technology [8].

The storage cell is the most important part of the SRAM and accounts
for about 70% of the area, so its design is especially important.
However, radiation-hardened design usually increases the chip area;
for example, the classic DICE cell structure increases the chip area
by more than four times. To save area, this paper proposes an
optimized radiation-hardened version of the classic 6T cell. As
Figure 2 shows, it consists of 4 PMOS transistors and 2 NMOS
transistors. Because positive charge accumulates between the gate and
the insulating layer of PMOS transistors, no leakage path can form in
them, so the NMOS transfer transistors of the classic 6T cell are
replaced by PMOS transistors to reduce the impact of leakage and thus
improve the TID tolerance of the circuit. In the layout, a ring-gate
structure is adopted for the 2 NMOS transistors to eliminate the
leakage path around the edge between the active area and the field
area and to reduce edge parasitic effects.

Fig. 2. 6T cell design of SRAM: (a) schematic (PM1-PM4, NM1, NM2; WL,
BL, BLN); (b) layout with ring-gate NMOS (VDD, BL, GND, BLN, VDD).

In the layout of the digital standard cell library, two
radiation-hardening methods are adopted to improve the SEE tolerance
of the SRAM: adding guard rings and increasing the spacing between
PMOS and NMOS transistors. Radiation-induced charge at the storage
node is absorbed by the guard ring, which raises the noise margin and
improves SEU tolerance. The guard ring also guides the transient
current and reduces the voltage pulse at the base of the parasitic
transistor, improving SEL tolerance. Moreover, the guard ring cuts
the leakage path between the well and the diffusion region, which
improves TID tolerance.
Increasing the PMOS-to-NMOS spacing widens the base region of the
parasitic transistor and decreases its current gain, reducing the
probability of SEL. The wider the spacing between PMOS and NMOS, the
higher the SEL tolerance of the circuit; at a spacing of 5 µm, the
circuit is essentially immune to SEL [9]. In this design, as a
compromise between silicon cost and SEL tolerance, the PMOS and NMOS
transistors are spaced 2 µm apart, so that the SEL threshold of the
circuit exceeds 40 MeV·cm²/mg [9].
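The EDAC behaviour described in Section II.A matches a standard extended-Hamming SEC-DED code: 5 check bits protect 8 data bits, single-bit errors are corrected, and double-bit errors are reported through a two-bit flag. The following Python sketch shows one such code together with bit interleaving of several words. The bit ordering, the use of index 0 for the overall-parity bit, the `(corrected, valid)` flag encoding, and the interleaving depth are assumptions for illustration, not the chip's actual circuit.

```python
from typing import List, Tuple

DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]  # Hamming positions that hold data bits
CHECK_POS = [1, 2, 4, 8]                # Hamming check-bit positions
WORD_BITS = 13                          # index 0 = overall parity, 1..12 = Hamming(12,8)


def encode(byte: int) -> List[int]:
    """Encode 8 data bits into a 13-bit SEC-DED word (8 data + 5 check bits)."""
    word = [0] * WORD_BITS
    for i, pos in enumerate(DATA_POS):
        word[pos] = (byte >> i) & 1
    for p in CHECK_POS:
        # Check bit p makes the parity over all positions k with (k & p) even.
        word[p] = sum(word[k] for k in range(1, WORD_BITS) if k & p and k != p) % 2
    word[0] = sum(word[1:]) % 2         # overall parity bit
    return word


def decode(word: List[int]) -> Tuple[int, bool, bool]:
    """Return (data, corrected, valid), mirroring the two-bit error flag."""
    w = list(word)
    syndrome = 0
    for k in range(1, WORD_BITS):
        if w[k]:
            syndrome ^= k
    odd = sum(w) % 2 == 1               # parity over all 13 bits
    corrected, valid = False, True
    if odd:                             # odd number of flips: treat as one error
        if syndrome == 0:
            w[0] ^= 1                   # the overall parity bit itself flipped
            corrected = True
        elif syndrome < WORD_BITS:
            w[syndrome] ^= 1
            corrected = True
        else:
            valid = False               # syndrome outside the word: uncorrectable
    elif syndrome != 0:
        valid = False                   # even flips, nonzero syndrome: double error
    data = sum(w[pos] << i for i, pos in enumerate(DATA_POS))
    return data, corrected, valid


def interleave(words: List[List[int]]) -> List[int]:
    """Physical cell j*n + w stores bit j of word w (n interleaved words)."""
    n = len(words)
    return [words[i % n][i // n] for i in range(n * WORD_BITS)]


def deinterleave(cells: List[int], n: int) -> List[List[int]]:
    return [[cells[j * n + w] for j in range(WORD_BITS)] for w in range(n)]


if __name__ == "__main__":
    data = [0x3C, 0xA5, 0x00, 0xFF]
    cells = interleave([encode(b) for b in data])
    for i in range(20, 24):             # burst of 4 adjacent physical upsets
        cells[i] ^= 1
    for expected, word in zip(data, deinterleave(cells, len(data))):
        value, corrected, valid = decode(word)
        print(hex(expected), hex(value), corrected, valid)
```

With four words interleaved, a burst of up to four adjacent cell upsets lands in four different words as single-bit errors, and each one is corrected. This is why bit interleaving and SEC-DED complement each other against multi-bit upsets.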

III. THE REALIZATION AND VERIFICATION OF RADIATION-HARDENED SRAM PROTOTYPE CHIP

In this paper, a 24 kB radiation-hardened SRAM prototype chip is
designed using a 180-nm commercial CMOS process. To verify and
compare the radiation-hardened design methods above, 4 different
versions of the SRAM chip are designed, named Chip A, Chip B, Chip C,
and Chip D. The radiation-hardening measures adopted in each version
are listed in Table I, and micrographs of the SRAM chips are shown in
Figure 3.

TABLE I. THE RAD-HARD METHODS OF SRAM CHIPS

RHBD method        Chip A             Chip B             Chip C             Chip D
EDAC               YES                YES                YES                YES
Time control       Replication array  Replication array  Replication array  Replication array
Store cell (NMOS)  Ring-gate          Ring-gate          Ring-gate          Bar gate
Guard ring         No                 N-type             P-type             No
Distance (a)       2 µm               2 µm               2 µm               0 µm

a. Distance means the spacing between the PMOS and NMOS transistors.

Fig. 3. Micrographs of Rad-Hard SRAM chips (Chips A-D; labeled
regions: filler, (e) EDAC writing circuit, (f) EDAC reading circuit,
(g) EDAC test circuit).

In the SRAM chip test, an NI PXI-1033 is used to generate and collect
data, and LabVIEW is used for software programming. Under normal
conditions (VDD = 1.8 V, temperature = 25 °C), the most important
electrical properties of the 4 versions of the SRAM chip are tested
and compared. The results are summarized in Table II.
As shown in Tables I and II, RHBD techniques are used in Chips A, B,
and C: an EDAC module and redundant storage cells are added, the
storage-cell NMOS transistors use a ring-gate layout, and the PMOS and
NMOS transistors in the digital standard cell library are spaced
2 µm apart. As a result, the chip area is 2.8 times that of the
ordinary Chip D. The read and write time roughly doubles because of
the delay of the EDAC module. In addition, because the EDAC and the
redundant storage cells enlarge the storage array, the dynamic power
consumption of the chip increases by a factor of about 1.4. Thus,
adopting RHBD techniques improves the radiation tolerance of the chip
while degrading its other properties. The electrical properties of
Chips A, B, and C are the same, which shows that the type of guard
ring has no impact on the normal electrical properties of the SRAM.

TABLE II. THE MEASURED ELECTRICAL PROPERTIES OF SRAM CHIPS

Electrical property  Chip A     Chip B     Chip C     Chip D
Chip area            5.24 mm²   5.24 mm²   5.24 mm²   1.86 mm²
Frequency            50 MHz     50 MHz     50 MHz     50 MHz
Time (a)             7.6 ns     7.6 ns     7.6 ns     3.9 ns
Static power         0.49 µW    0.49 µW    0.49 µW    0.41 µW
Dynamic power        7.38 mW    7.38 mW    7.38 mW    5.40 mW

a. Time means the write and read time.

IV. TID EXPERIMENT AND RESULT ANALYSIS

A 4000-curie Co-60 gamma radiation source is used as the irradiation
source. The dose rate is 50 rad(Si)/s, calibrated with a UNIDOS
dosimeter. During the experiment, the SRAM chip under test is powered
(VDD = 1.8 V) but held in a static state with no data stored. The
experiment mainly analyzes the static current, the dynamic current,
and the read and write time of the SRAM as the total ionizing dose
increases. The chosen total ionizing doses are 65, 125, 162, 200, and
300 krad(Si). When the dynamic current and the read and write time
are tested, the radiation source is temporarily cut off, and
irradiation continues after the measurement. The average of 6 chip
samples is taken as the final result to reduce chip-to-chip
variation.

Figure 4 shows how the static current increases with the total
ionizing dose. As the total dose accumulates, the static power
consumption increases rapidly, because the TID effect significantly
increases the leakage current of CMOS devices. At a total dose of
65 krad(Si), the SRAM static current shows no noticeable change,
which shows that the process itself can withstand the TID effect in
this range.

At a total dose of 300 krad(Si), compared with Chip D, the static
power consumption of Chip A is reduced by 97%, which results from the
ring-gate NMOS in the storage cell of Chip A. Thus, the ring-gate
layout effectively suppresses the increase of the leakage current and
reduces TID damage. The curve of Chip A almost coincides with that of
Chip B, which shows that adding an N-type guard ring has no obvious
effect on suppressing the leakage current. However, compared with
Chip D, the static current of Chip C is reduced by 99%, the combined
outcome of the ring gate and the P-type guard ring. Compared with
Chip A, the static current of Chip C is reduced by 51%, showing that
the P-type guard ring is remarkably effective in suppressing the TID
effect.

Figure 5 shows that the dynamic current increases with the total
ionizing dose, with almost the same trend as the static current.
Compared with Chip D, the dynamic current of Chip C is reduced by
24%, showing that the ring-gate layout and the P-type guard ring
effectively suppress the effect of TID on dynamic power consumption.
The dynamic current of Chip B is reduced by only 14%, which shows
that the N-type guard ring plays only a weak role in suppressing the
TID effect. In addition, unlike the leakage current, the dynamic
current is already large because of data bit switching, so the
radiation-induced leakage current is only a small proportion of it;
therefore, TID has a smaller effect on the dynamic current.

Figure 6 shows how the read and write time of the SRAM increases with
the total ionizing dose. As the dose accumulates, the SRAM read and
write speed slows down, because the mobility decreases. However,
because the gate oxide is very thin, the mobility does not change
much, and since all 4 chips have timing protection circuits, their
read and write times follow basically the same trend with small
increments.

Fig. 6. The write and read time increases with the total ionizing
dose.

V. CONCLUSION

In this paper, a 24 kB radiation-hardened SRAM prototype chip is
designed using a 180-nm commercial CMOS process. TID experiments are
performed using a Co-60 gamma radiation source. The experimental
results show that the TID effect has a greater influence on static
power consumption than on dynamic power consumption, and less
influence on the read and write time. The ring gate and the P-type
guard ring can effectively mitigate TID damage.
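As a quick arithmetic check of the figures quoted in Sections III and IV, the short script below does two things. It converts the test doses into cumulative irradiation time at the stated 50 rad(Si)/s, and it recomputes the Table II ratios between the hardened chips and Chip D. It assumes a constant dose rate and ignores the pauses for dynamic measurements, so the times are beam-on time only, not wall-clock time.

```python
DOSE_RATE = 50.0                         # rad(Si)/s, Co-60 source (Section IV)
DOSES_KRAD = [65, 125, 162, 200, 300]    # krad(Si) test points

# Table II: Chips A-C share identical values; Chip D is the ordinary design.
TABLE_II = {
    "chip area (mm^2)": (5.24, 1.86),
    "write/read time (ns)": (7.6, 3.9),
    "dynamic power (mW)": (7.38, 5.40),
}


def exposure_time_s(dose_krad: float, rate: float = DOSE_RATE) -> float:
    """Beam-on time needed to accumulate dose_krad at a constant dose rate."""
    return dose_krad * 1000.0 / rate


def hardened_over_ordinary(metric: str) -> float:
    """Ratio of a Table II value for Chips A-C to the Chip D value."""
    hardened, ordinary = TABLE_II[metric]
    return hardened / ordinary


if __name__ == "__main__":
    for dose in DOSES_KRAD:
        t = exposure_time_s(dose)
        print(f"{dose:>3} krad(Si): {t:6.0f} s ({t / 60:5.1f} min)")
    for metric in TABLE_II:
        print(f"{metric}: x{hardened_over_ordinary(metric):.2f}")
```

The final 300 krad(Si) point corresponds to 6000 s (100 min) of irradiation. The ratios come out at about 2.82 for area, 1.95 for write/read time, and 1.37 for dynamic power, which agree with the "2.8 times", "doubled", and "1.4 times" statements in Section III.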

Fig. 4. The static current increases with the total ionizing dose.

Fig. 5. The dynamic current increases with the total ionizing dose.

REFERENCES
[1] A. Cester and S. Gerardin, "Drain current decrease in MOSFETs after heavy ion irradiation," IEEE Trans. Nuclear Science, vol. 51, pp. 3150-3157, May 2004.
[2] P. Jarron, G. Anelli, T. Calin, et al., "Deep submicron CMOS technologies for the LHC experiments," Nucl. Phys. B-Proc. Sup., vol. 78, pp. 625-634, 1999.
[3] M. Gadlage et al., "Single event transient pulsewidths in digital microcircuits," IEEE Trans. Nuclear Science, vol. 51, no. 6, pp. 3285-3290, Dec. 2004.
[4] K. C. Mohr, L. T. Clark, and K. E. Holbert, "A 130-nm RHBD SRAM with high speed SET and area efficient TID mitigation," IEEE Trans. Nuclear Science, vol. 54, no. 6, Dec. 2007.
[5] J. Knudsen and L. Clark, "An area and power efficient radiation hardened by design flip-flop," IEEE Trans. Nuclear Science, vol. 53, no. 6, pp. 3392-3399, Dec. 2006.
[6] D. A. Hodges, H. G. Jackson, and R. A. Saleh, Analysis and Design of Digital Integrated Circuits: In Deep Submicron Technology. McGraw-Hill, 2004.
[7] C. Hafer, D. Slocum, J. Mabra, and S. Tyson, "Commercially fabricated radiation hardened 4Mbit SRAM," IEEE Aerospace Applications Conference Proceedings, 2004.
[8] H. L. Hughes and J. M. Benedetto, "Radiation effects and hardening of MOS technology: devices and circuits," IEEE Trans. Nuclear Science, vol. 50, pp. 500-521, 2003.
[9] W. Dulinski, "Ultra-thin tracking detectors for ILC and other applications," [EB/OL]. http://www.in2p3.fr/actions/formation/microelectronique07/detecteurs_ultra_mince.pdf, IPHC, 2007.

Study and Mechanism of Static Scanning Laser Fault Isolation on Embed SRAM Function Fail

Changqing Chen, Huipeng Ng, Ghinboon Ang, J. Lam, Zhihong Mai
GLOBALFOUNDRIES Pte Ltd
Woodlands Industrial Park D Street, Singapore
Changqing.CHEN@globalfoundries.com

Abstract
As technology keeps scaling down and IC designs become more and more
complex, failure analysis (FA) becomes more and more challenging,
especially static laser analysis. For foundry FA and process
monitoring, SRAM analysis is becoming increasingly critical, for two
reasons. First, the SRAM circuit is relatively simple and well known,
and it is also used by the fab as a monitoring structure. Second, the
SRAM share of the chip keeps increasing and can occupy a large part
of the chip area in most logic products; this is another reason we
use SRAM to monitor our process. SRAM analysis with a bit map is
relatively easy for FA. But as DFT becomes more popular, BIST is
applied in the SRAM and frequently no bit map is provided, so a global
fault isolation methodology must be employed in SRAM FA. In this
paper, a static scanning laser methodology was applied to SRAM FA for
which no bit map was provided. A hot spot was observed at the SRAM
block edge for some failed units but not for others. Combined with
SRAM schematic and GDS analysis, the defect was successfully found and
the failure mechanism was studied, successfully linking the
electrical phenomenon to the physical defect. We also identified the
process issue from the FA result.

Background information
Since the SRAM share of the chip keeps increasing, and SRAM is also
the key structure for process development and process monitoring,
failure analysis on SRAM is quite critical. FA on SRAM with a bit map
is quite common and much more straightforward. But as technology
moves forward, DFT and BIST have been widely applied in IC design, and
failure analysis on SRAM without a bit map becomes more and more
challenging. A global fault isolation methodology must be employed for
this kind of failure mode. However, most of the time DC bias does not
show any significant difference between good and failed units,
because either the defect location cannot be accessed by DC bias, or
the defect can induce only a small current change that is concealed
by the overall current. Based on our study, global fault isolation can
still be applied in the second situation. In this paper, an embedded
SRAM BIST failure was handled. Although comparable I-V curves were
observed, global fault isolation (TIVA analysis) was still applied.

Experiment and Discussion
An embedded SRAM BIST failure was observed in one of our … µm
products, for which no bit map capability was built. The SRAM is a
conventional 6T SRAM. DC I-V measurements were performed on Vdd and
Vss, and no significant difference was observed. TIVA analysis was
performed on several failed units and compared with a good unit. A
distinct TIVA spot was observed on some failed units compared with the
reference unit. Another observation is that all of the TIVA spots are
located at the SRAM block edge (see figure). Based on our experience,
this kind of solid spot should be real: it is either the defect
location or has some relation to the defect. We selected one unit for
top-down PFA. Nothing abnormal was observed from the top metal down to
metal 1, and PVC also shows no anomaly at the hot spot location on
every layer.

Figure: Hot spot of the fault isolation.

Further deprocessing was performed, and the poly was exposed by BOE
(buffered oxide etch). Gross W extrusion was observed at the spot
location (see figure). Meanwhile, gross W extrusion was also observed
within the SRAM block.

Figure: PFA shows a W solid short at the SRAM edge.

978-1-4799-3929-9/14/$31.00 ©2014 IEEE

One more PFA observation is that all of the W extrusion locations sit
at a specific location: a bit-line contact to bit-line contact short.
From the process point of view, the root cause of this defect is quite
easy to understand, which means we can easily link the defect to our
process. But how can we link the defect to our electrical results?
Why is the hot spot always located at the SRAM block edge? Why do only
some of the failed units have the hot spot?
To answer these questions, an in-depth analysis of the circuit and
layout of this device was employed. Before that, we selected one
failed unit without a hot spot for random PFA on the SRAM region. As
expected, nothing abnormal was observed in BEOL, but gross W extrusion
was also observed in the SRAM block. Then one more question arises:
why can this gross W extrusion induce a hot spot in some units but not
in others? What is the reason behind it? SEM top-down inspection was
compared between the samples with and without a hot spot. For the
sample with a hot spot, there is a solid W short between the bit-line
contact and the neighboring contact at the SRAM block edge, while for
the sample without a hot spot, although there is W extrusion, no solid
W short occurs at the SRAM block edge.

Figure: PFA shows no W solid short at the SRAM edge for the sample
without a hot spot.

In-depth circuit and layout analysis shows that the die-edge contact
that is shorted to the bit-line contact is connected to Vss. The Vss at
the block edge is highlighted in the GDS layout figure.

Figure: GDS layout of the failed location and analysis.

In the center of the SRAM block, bit-line to bit-line shorts were
observed, but these shorts cannot be accessed by DC bias, since we can
only bias Vdd and Vss. But what about the edge bit line, which is
shorted to Vss according to our GDS layout analysis? Can this kind of
short be accessed by normal DC bias, and can it induce a hot spot?
In order to answer this question, we made a detailed study of the SRAM
peripheral schematic. There is an equalizer PMOS whose source is
connected to Vdd, as highlighted in the figure. Normally this PMOS is
turned off, so the bit line cannot be directly accessed from Vdd. But
after detailed analysis of this circuit, combined with the GDS layout,
we can confirm that this PMOS is turned on.

The reason is as follows: under the DC bias condition (Vdd and Vss),
the source of this PMOS is connected to Vdd while its gate is
floating. But we must bear in mind that this PMOS sits in the NWELL,
and the NWELL is connected to Vdd. So even when the gate is floating
or connected elsewhere, the PMOS is still turned on. For a bit-line to
bit-line short there is no current flow, because both bit lines are
connected to Vdd. Hence, for the center cells there is no current
flow, so no spot can be triggered.


)LJXUHWUDGLWLRQDO65$0FLUFXLWDQDO\VLV

%XWWKHIRUWKHHGJHELWOLQHWKHVLWXDWLRQLVGLIIHUHQWLWLV
ELWOLQHVKRUWWR9VV

2014 IEEE 21st International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA)

$VVKRZQLIWKH)LJXUWKHUHLVEDODQFH3026ZKLFKLV
FRQQHFWHGZLWKWKHELWOLQH,IZHSXOORXWDVLQJOH65$0IRU

VHOHFWLRQLVYHU\LPSRUWDQW$VVKRZQLQWKLVSDSHUQRWDOOWKH
VDPSOHKDVKRWVSRW
)RUWKLVSDSHUWKH%,67IXQFWLRQDOIDLO 65$0ZDVDQDO\VLV
6LQFHQR ELW PDS SURYLGHG WKHQRUPDO JOREDO IDXOWLVRODWLRQ
PHWKRGZDVDSSOLHGLQWKHDQDO\VLV7KHGHIHFWDQGURRWFDXVH
ZDV VXFFHVVIXOO\ IRXQG 7KLV LV D JRRG UHIHUHQFH IRU VRPH
NLQGRIIXQFWLRQDOIDLOXUHDQDO\VLV


5HIHUHQFHV
 >@ /DZUHQFH & :DJQHU )DLOXUH $QDO\VLV RI ,QWHJUDWHG
&LUFXLWV7RROVDQG7HFKQLTXHVSS,6%1
  $6,1    ,6%1  ($1

>@-&+3KDQJ'6+&KDQO03DODQLDSSDQO-0&KLQ$
5HYLHZRI/DVHU,QGXFHG7HFKQLTXHVIRU0LFURHOHFWURQLF
)DLOXUHSURFHHGLQJVRIWK,3)$ 7DLZDQ

 >@-&+3KDQJ'6+&KDQO6/7$1:%/HQ$UHYLHZRI
)LJXUH65$0VLQJOHELWDQGDQDO\VLV
QHDU LQIUDUHG SKRWRQ HPLVVLRQ PLFURVFRS\ DQG
FLUFXLW PHFKDQLVPDQDO\VLV LW LV VKRZQ LQ )LJXUH  :H FDQ
VSHFWURVFRS\3URFHHGLQJRIWK,3)$6LQJDSRUH
HDVLO\ ILQGWKHUHLVD FXUUHQWIORZ IURPWUDQVLVWRU0SWR9VV
ZKHQWKHJDWH30260SLVIORDWLQJ7KDWLVWKHUHDVRQZK\
WKHKRWVSRWV DOZD\V ORFDWH LQ WKH 65$0 HGJH ,I WKHUH LVQR
VROLG VKRUWKDSSHQHG LQ WKH 65$0 HGJH WKHUH LVQR FXUUHQW
IORZSDWK6RKRWVSRWFDQQRWEHWULJJHUHG7KDWLVWKHUHDVRQ
ZK\WKHUHLVQRKRWVSRWLQVRPHRIWKHIDLOHGGLH
)XUWKHU FURVV VHFWLRQ DQDO\VLV VKRZ : H[WUXVLRQ VKRUW

)LJXUHFURVVVHFWLRQUHVXOWVKRZ:VKRUWKDSSHQHG

KDSSHQHG LQWKH FHQWHU ORFDWLRQ WKH FRQWDFWKHLJKW )LJXUH


%DVHGRQRXUSURFHVVDQDO\VLVZH LGHQWLILHGWKDWWKHVKRUWLV
GXHWKHILOPGHSRVLWLRQSURFHVVGULIW7KLVGULIWLQGXFHVVRPH
ILOPLQWHUIDFHLQWHUDFWLRQHIIHFW

&RQFOXVLRQ
6WDWLF IDXOW LVRODWLRQ LV VWLOO FDSDEOH IRU VRPH NLQGV RI
IXQFWLRQDO IDLO LI WKH GHIHFW ORFDWLRQ FDQ EH DFFHVVHG E\ WKH
'&ELDV6RPHWLPHVHYHQWKH'&,9FXUYHLVFRPSDUDEOHLW
LV VWLOO SRVVLEOH WR ILQG WKH GHIHFW E\ VWDWLF IDXOW LVRODWLRQ
EHFDXVH WKH '& ,9 FXUYH LV DQ DFFXPXODWLRQ UHVXOW 6RPH
NLQGV RI GHIHFW LQGXFHG FKDQJH PD\EH FRQFHDOHG E\ WKH
RYHUDOO UHVXOW %XW IRU WKHVH NLQGV RI DQDO\VLV WKH VDPSOH
2014 IEEE 21st International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA)

41

Robust Subthreshold 7T-SRAM Cell for Low-Power Applications
Farshad Moradi and Jens K. Madsen, Department of Engineering, Aarhus University, Denmark

Abstract: In this paper, a novel 7T-SRAM cell for ultra-low-power applications is proposed. The proposed SRAM cell is fully functional at subthreshold voltages down to VDDmin=200mV. In this technique, separate read/write bitlines and wordlines are used, which makes the read and write operations independent. Compared with the standard 6T-SRAM cell, the proposed 7T-SRAM cell improves the static read noise margin by 2.2X, the write margin by 27%, and the write time by 6%. It also improves the write margin of the conventional 7T-SRAM cell. The proposed 7T-SRAM cell is designed in 65nm CMOS technology.
Index Terms: SRAM, Write Margin, Read Static Noise Margin, CMOS

I. INTRODUCTION
SRAM memories take up to 80% of the total die area and up to 70% of the total power consumption of high-performance processors [1]. Therefore, there is a crucial need for high-performance, low-leakage, and robust SRAMs. Unfortunately, process variations affect the read and write stability of SRAMs in scaled technologies, particularly under scaled supply voltages. A memory array contains a large number of small-geometry transistors, so process variations have a significant impact. They can lead to read, write, and access failures, particularly at lower supply voltages. Furthermore, in the conventional 6T SRAM, the conflict between read and write stability is an unavoidable design constraint. This conflict aggravates the effect of process variations on SRAM stability and performance.
Several solutions have been proposed to improve SRAM cell functionality, from the device level to the architecture level. For instance, new devices such as FinFETs lead to a significant performance improvement [2-5]. At the cell level, new cells such as 7T, 8T, 9T, 10T, and 11T [6-12] have been proposed. At the architecture level, read and write assist techniques proposed in the literature can improve SRAM robustness and performance. These assist techniques occupy less area than cell-level techniques such as 8T and 10T, and they can be used with any type of SRAM cell.
The standard 6T-SRAM cell, shown in Fig.1, consists of two back-to-back inverters and two NMOS access transistors. The inverters include two pull-up PMOS transistors and two pull-down NMOS transistors. The access transistors connect to the bitlines, and their gates connect to the wordline. During read, the wordline is asserted and a sense amplifier senses the voltage difference between the bitlines. The read cycle is performed via the access transistors and the pull-down transistors. Stronger pull-down transistors (PDL and PDR) and weaker access transistors improve the read static noise margin (RSNM). On the other hand, stronger access transistors and weaker pull-up transistors improve the write margin (WM) of the bit-cell. Upsizing the transistors minimizes threshold-voltage variation, so the SRAM cell can operate at lower supply voltages (i.e., a lower VDDmin), at the penalty of increased area.

978-1-4799-4132-2/14/$31.00 2014 IEEE


The main contribution of this work is the design of an SRAM cell that can operate in the subthreshold region. Its read and write margins are significantly improved in comparison to the standard 6T-SRAM cell. The proposed 7T-SRAM cell uses separate read and write wordlines and a single-ended bitline (BL) for read. With the proposed 7T-SRAM cell, both read and write margins are improved, while VDDmin can be scaled down to very low voltages, less than 200mV.

Fig.1. 6T-SRAM cell

The rest of this paper is organized as follows. Section II describes the proposed 7T-SRAM and compares it with previously proposed SRAM cells in terms of area, power, and robustness. Section III discusses the simulation results. Finally, conclusions are drawn in Section IV.

II. 7T-SRAM CELL

Fig.2. a) Conventional 7T-SRAM, b) standard 7T-SRAM during read (VQ=1)

The conventional 7T-SRAM cell is shown in Fig.2-a [13]. In this 7T cell, transistor M7 is added to break the butterfly loop during write and read operations. The gate of this transistor is controlled by the inverted wordline signal (WLB). WLB is set to 0 when data on a cell/row is accessed (i.e., WL=1). This design improves the read SNM significantly, but it does not help WM.

During read, when a cell is selected (i.e., WL=1 and WLB=0), the WWL signal is set to 0. Assuming a stored value of 0 at node Q, RBL starts to discharge the bitline capacitance, CBL, and a sense amplifier reads the data. In this mode, M7 is OFF, which significantly improves the read noise margin. However, when the bit value on node QB is zero, the read static noise margin of the cell is similar to that of the standard 6T-SRAM cell. To clarify this, assume that the value stored on node Q is 1. In this case, when QB rises due to leakage or noise sources, transistor M2 conducts and starts to discharge node Q. This feedback degrades the read noise margin of the cell. The main disadvantage of this circuit in comparison to the standard 6T-SRAM cell is the disconnected path from node QB to ground. As illustrated in Fig.2-b, this disconnected path makes the data on node QB flip faster than in the standard 6T-SRAM cell.

During write, WWL is asserted and the data on the bitlines is written to the storage nodes. For simplicity and clarity, we rename nodes Q and QB of the conventional 7T-SRAM cell to match the standard configurations of the 6T-SRAM and the proposed 7T-SRAM. Fig.3 shows the equivalent circuits of the 6T-SRAM and 7T-SRAM when Q stores 1. Let us consider both cases for the storage nodes. First, consider the case in which the bit stored on node Q is 1 and the data on RBL is 0, as illustrated in Fig.3.

In this case, node Q of the 7T-SRAM cell starts to discharge through access transistor M5. The contention between M3 and M5 determines the speed of data flipping. As Fig.3 shows, the equivalent circuits of the 7T-SRAM and 6T-SRAM in this mode are the same, so the write times of both circuits are expected to be similar. However, the voltage on node QB rises through PUR in the 6T-SRAM and through M4 in the 7T-SRAM. In the 6T-SRAM cell, transistor PDL then turns on and helps discharge node Q to ground. In the 7T-SRAM, the only available discharging path is through transistor M5. Therefore, the write time of the 7T-SRAM is expected to degrade when Q holds 1.

On the other hand, Fig.4 shows the equivalent circuits of both the 6T-SRAM and 7T-SRAM when the stored bit on node Q is 0. In this case, the write margin improves because the discharging path of node QB is disconnected from ground (M7 and M1). This leads to a faster write time and an improved write margin. Overall, the conventional 7T-SRAM improves the read margin in a manner similar to the 8T-SRAM, while its write margin improvement is asymmetric and negligible. Furthermore, the conventional 7T-SRAM cell uses an extra inverted WL (i.e., WLB), which increases the area of the SRAM cell compared to the standard 6T-SRAM cell. Therefore, we consider two improvements in our proposed SRAM cell: first, lowering the total area, and second, improving the write margin compared to the conventional 7T-SRAM cell.

Fig.3. a) 6T-SRAM, b) conventional 7T-SRAM cell during write 0 when Q value discharges to Vdd-Vtp

Fig.4. a) 6T-SRAM, b) conventional 7T-SRAM cell during write 1 when QB voltage becomes lower than Vdd-Vtp

A. Proposed 7T-SRAM cell

The proposed 7T-SRAM cell is shown in Fig. 5-a. The circuit works as follows. During read, WL is asserted while WWL is low, and the stored data is read through a single-ended bitline. Any suitable sense-amplifier scheme can be used to sense the stored bit. Before read, RBL is precharged to 1. If node Q holds 1, no discharging happens and the access transistor is kept OFF. In this case, transistor NF is OFF and there is no discharging path to ground. Due to the stacking effect, leakage through the pull-down transistors is greatly reduced. In the other case, when Q holds 0, transistor NF is ON, as explained below.

Fig.5. Proposed 7T-SRAM: a) cell, b) during read 0

Read cycle: Fig.5-b illustrates the behavior of the cell in read mode, assuming a 0 on node Q. When the cell is accessed and the voltage on node Q increases, the active feedback is expected to make transistor PDR conduct. This decreases the noise immunity of the cell, as in the standard 6T-SRAM cell. However, the raised voltage on node Q lowers the level of QB. Transistor NF then becomes weaker and neutralizes the effect of the raised voltage on node Q. In the other part of the cell, however, NF weakens the discharging branch of node Q as well. Furthermore, in our design, transistor PDR is sized smaller than PDL, which significantly improves the read margin. As discussed before, when storage node Q holds a 1, the achieved noise margin is very high. This is because transistors PDL and NF are turned off, which significantly improves the read noise margin. Overall, the simulation results show the largest read margin compared to the conventional 7T-SRAM and 6T-SRAM cells; these results are discussed later.

Fig.6. Proposed 7T-SRAM during write when a) Q=1, b) Q=0
Write cycle: During write, when the cell is accessed, both WWL and WL are set high. In this case, as in the 6T-SRAM cell, data can be written to the cell through the left or right access transistor. To clarify the write mode of the 7T-SRAM cell, the write cycle is explained step by step in comparison to the 6T-SRAM and conventional 7T-SRAM cells.

Assume a 1 on storage node Q, as shown in Fig.6. The small capacitance on node Q starts to discharge to ground through ACL. It fights with the PUL transistor, which is trying to keep the storage node high. On the other side of the SRAM cell, ACR turns on and starts to charge node QB. Because there is no discharging path from node QB to ground, ACR quickly charges node QB to Vdd. Node Q then drops below Vdd-Vtp, where Vtp is the threshold voltage of the PMOS transistor PUR. PUR turns on and helps the stored data flip faster. This leads to an intrinsic improvement in write margin and enables the SRAM cell to work properly at ultra-scaled, near/sub-threshold supply voltages.

However, as shown in Fig.3, the conventional 7T-SRAM cell behaves differently. M5 and M3 contend to discharge and charge storage node Q, respectively, while M2 keeps node QB low and fights with M4 and M6. This increases the short-circuit power consumption of the conventional 7T-SRAM during write 0.

Now assume a 0 on storage node Q. As shown in Section II, the conventional 7T-SRAM then improves the write margin significantly, through a mechanism similar to that of the proposed 7T-SRAM cell in the previous case. However, its minimum write noise margin is set by the previous case (i.e., when node Q holds a 1), which was slightly worse than that of the standard 6T-SRAM cell. For the proposed 7T-SRAM cell, in contrast, the write margin improvement is limited by the write margin of the circuit when Q holds a 0. The equivalent circuit of the cell is shown in Fig.6-b. When the voltage on node QB is lower than Vdd-Vtp, both transistors PDL and NF become weaker, which results in a faster data flip. Note that improving the write margin of the 7T-SRAM cell requires careful sizing, as discussed in the following sub-section.

B. SRAM cell sizing

So far, we have considered the behavior of the proposed SRAM cell in different modes of operation. This sub-section explains the considerations for transistor sizing.
Due to the asymmetric nature of the proposed 7T-SRAM cell, asymmetric sizing is optimal for achieving maximum robustness. The main advantage of the proposed SRAM cell is its significantly improved write margin when the bitcell content is 1. When Q holds 0, the write margin is lower. To improve the write margin in that case, ACR can be made smaller, which improves the write margin without affecting other parameters such as read margin or access time. Since the write margin of the cell when Q holds 1 is large enough, the size of PUL can be increased while the size of PUR is decreased. Enlarging ACL also improves the write margin, so its size is an important design factor. During read, larger PDL and NF improve the read noise margin, while a stronger ACL deteriorates the RSNM.

Fig.7. Write margin comparison for the proposed 7T-SRAM when (a) writing 0 and (b) writing 1 compared to the standard 6T-SRAM cell

Fig.8. Read static noise margin of the proposed 7T-SRAM and the conventional 7T-SRAM

Fig.9. Write time comparison for the proposed SRAM cell, 6T-SRAM, and conventional 7T-SRAM: a) writing 1, b) writing 0

III. SIMULATION RESULTS

This section discusses the simulation results of the proposed 7T-SRAM in 65nm CMOS technology. To this end, we present different features of the proposed 7T-SRAM compared to the standard 6T-SRAM cell.

a. Write Margin
Different methods have been used to find the WM of an SRAM cell [14]. For the WM simulations, we chose the word-line (WL) voltage sweep method. In this method, the bitlines are connected to the voltages needed to flip the data on the storage node. Then WL and WLB are swept from 0V to 1V and from 1V to 0V, respectively. WM is calculated as the difference between VDD and the WL voltage at which the data stored in the cell flips. Fig.7 compares the write margin of the proposed 7T-SRAM cell with that of the standard 6T-SRAM cell for write 1 and write 0. When writing 0, the proposed 7T-SRAM improves the write margin by 28%-74% at different supply voltages compared to the standard 6T-SRAM; when writing 1, the improvement is 11%-21%. As explained previously, in this mode the conventional 7T-SRAM has a lower write margin than the standard 6T-SRAM cell.
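The WL-sweep definition of WM can be turned into a small post-processing step. The sketch below assumes a simulator has already produced the storage-node voltage Q as a function of an increasing WL voltage. In the usage example, a synthetic logistic curve stands in for that simulator output. The helper `write_margin` is hypothetical. We also assume the flip point is where Q first crosses VDD/2, since the text does not state its flip criterion. The simultaneous WLB sweep used for the conventional 7T cell is not modeled.

```python
import math
from typing import Sequence


def write_margin(vdd: float, v_wl: Sequence[float], v_q: Sequence[float],
                 q_starts_high: bool = True) -> float:
    """Return WM = VDD - V_WL(flip) for an increasing WL sweep.

    Assumption: the flip point is the WL voltage at which the storage node Q
    first crosses VDD/2, linearly interpolated between sweep samples.
    Returns 0.0 if the cell never flips within the sweep.
    """
    threshold = vdd / 2.0
    for i in range(1, len(v_wl)):
        prev, cur = v_q[i - 1], v_q[i]
        if q_starts_high:
            crossed = prev >= threshold > cur   # writing 0 into a node holding 1
        else:
            crossed = prev <= threshold < cur   # writing 1 into a node holding 0
        if crossed:
            t = (prev - threshold) / (prev - cur)
            v_flip = v_wl[i - 1] + t * (v_wl[i] - v_wl[i - 1])
            return vdd - v_flip
    return 0.0


if __name__ == "__main__":
    VDD = 1.0
    wl = [i / 1000 for i in range(1001)]  # WL swept 0 V -> 1 V in 1 mV steps
    # Synthetic stand-in for a simulated Q(V_WL) curve that flips at V_WL = 0.6 V.
    q = [VDD / (1.0 + math.exp(40.0 * (w - 0.6))) for w in wl]
    print(f"WM = {write_margin(VDD, wl, q):.3f} V")  # WM = 0.400 V
```

Interpolating between sweep points keeps the result from being quantized to the sweep step. This matters when the margin differences being compared are only tens of millivolts.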
Table 1. The sizing of SRAM cells (6T, 7T, proposed 7T)

6T-SRAM cell       Conventional 7T-SRAM    Proposed 7T-SRAM
ACL      180n      M1,M2    230n           ACL       180n
ACR      180n      M3       150n           ACR       180n
PUL      150n      M4       150n           PUL       150n
PUR      150n      M5       180n           PUR       120n
PDL      230n      M6       180n           PDL,NF    230n
PDR      230n      M7       230n           PDR       200n

Read static noise margin: The Read Static Noise Margin (RSNM) is a metric for evaluating the read stability of an SRAM cell. It is defined as the side length of the largest square that fits inside the smaller of the two lobes of the butterfly curve. The butterfly curve is obtained by drawing the two inverter characteristics and mirroring one of them, while the access transistors are ON and the bitlines are precharged to VDD [15].

Fig.8-a shows the RSNM of the proposed 7T-SRAM compared to the 6T-SRAM cell. The RSNM of the proposed 7T-SRAM cell is at least 1.9X that of the 6T-SRAM cell, with the smallest ratio at Vdd=500 mV. This enables designers to scale the supply voltage into the subthreshold region. For instance, the proposed 7T-SRAM cell achieves an 8X improvement in RSNM at Vdd=150mV (RSNM=32 mV). Furthermore, the proposed cell improves RSNM in comparison to the conventional 7T-SRAM cell, especially at lower supply voltages. These results are shown in Fig.8-b.

To clarify the reason behind the RSNM improvement, consider the following scenario. Assume that the voltage on node Q in Fig.5-a increases during read. In this case, transistor PDR starts to turn on and discharge node QB. The gate of transistor NF is connected to node QB, so lowering QB weakens NF. This negative feedback prevents QB from going lower and slows the data flip, which significantly improves the read noise margin. Now assume that node QB holds a 0 and a voltage bump occurs at this node. The discharging path from node Q to ground is then weak, because the gate of transistor NF is connected to node QB. Therefore, the RSNM of the proposed circuit improves significantly.
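The largest-square definition can be evaluated numerically with the 45-degree rotation method associated with Seevinck et al. [15]. The method has three steps:
1. Rotate the butterfly plot so that the square diagonals become vertical.
2. Take the largest separation between the two curves in each lobe and divide it by √2.
3. Keep the smaller of the two lobe values.

The sketch below uses a hypothetical logistic inverter curve in place of SPICE-extracted read-mode VTCs. The helpers `logistic_vtc` and `butterfly_snm` are illustrative names, and the resulting numbers only demonstrate the method, not the cells in this paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vtc = Callable[[float], float]


def logistic_vtc(vdd: float = 1.0, vm: float = 0.5, gain: float = 20.0) -> Vtc:
    """Hypothetical inverter curve: Vout = VDD / (1 + exp(gain * (Vin - VM) / VDD))."""
    return lambda vin: vdd / (1.0 + math.exp(gain * (vin - vm) / vdd))


def _interp(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation on sorted xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    lo, hi = 0, len(xs) - 1
    while hi - lo > 1:  # invariant: xs[lo] <= x < xs[hi]
        mid = (lo + hi) // 2
        if xs[mid] <= x:
            lo = mid
        else:
            hi = mid
    t = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + t * (ys[hi] - ys[lo])


def butterfly_snm(f1: Vtc, f2: Vtc, vdd: float = 1.0, n: int = 2001) -> float:
    """Side of the largest square in the smaller butterfly lobe (45-degree rotation)."""
    s = math.sqrt(2.0)
    grid = [vdd * i / (n - 1) for i in range(n)]
    # Curve A: y = f1(x). Curve B, the mirrored second inverter: x = f2(y).
    # u = (x - y)/sqrt2 runs across the lobes; v = (x + y)/sqrt2 runs along square diagonals.
    a: List[Tuple[float, float]] = sorted(((x - f1(x)) / s, (x + f1(x)) / s) for x in grid)
    b: List[Tuple[float, float]] = sorted(((f2(y) - y) / s, (f2(y) + y) / s) for y in grid)
    ua, va = zip(*a)
    ub, vb = zip(*b)
    lo, hi = max(ua[0], ub[0]), min(ua[-1], ub[-1])
    d_max = d_min = 0.0
    for i in range(n):
        u = lo + (hi - lo) * i / (n - 1)
        d = _interp(ua, va, u) - _interp(ub, vb, u)  # diagonal length between curves
        d_max, d_min = max(d_max, d), min(d_min, d)
    # d > 0 inside one lobe and d < 0 inside the other; the weaker lobe sets the SNM.
    return min(d_max, -d_min) / s


if __name__ == "__main__":
    sym = butterfly_snm(logistic_vtc(), logistic_vtc())
    skew = butterfly_snm(logistic_vtc(), logistic_vtc(vm=0.4))
    print(f"symmetric SNM = {sym * 1e3:.0f} mV, skewed SNM = {skew * 1e3:.0f} mV")
```

Shifting one inverter's trip point shrinks one lobe. The reported SNM then falls, which is why the minimum over the two lobes, not the average, is taken.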
Retention (Hold) mode: In hold mode, both WWL and WL are set to 0. In this mode, the proposed 7T-SRAM circuit degrades the hold static noise margin (HSNM) by at least 20.2% at VDD=0.8V compared to the conventional 7T-SRAM. The HSNM degradation comes from the case in which Q holds 1; in the other case (Q=0), the maximum noise margin is achieved.
Write time: The write time is measured as the time between asserting WL and the storage node reaching 80% of its final value. As shown in Fig.9, the proposed SRAM cell improves the write time compared to the conventional 7T-SRAM at VDD=1V. The improvement is at least 10% when writing 1 and at least 2% when writing 0. The maximum improvements over the conventional 7T-SRAM are 81% when writing 1 (at VDD=0.4V) and 37% when writing 0 (at VDD=0.3V). Compared with the 6T-SRAM cell, the proposed 7T-SRAM cell improves the write time by at least 3% when writing 1 (at VDD=0.3V) and 8% when writing 0 (at VDD=1V). The maximum improvements over the 6T-SRAM occur at VDD=0.4V: 17% for writing 1 and 71% for writing 0.
As seen in Fig. 9, the conventional 7T-SRAM has a longer write time when writing 1. Our simulations neglect the effect of wordline capacitance on write time. The conventional 7T-SRAM cell suffers from a larger wordline capacitance, so including it would degrade its write time further. In contrast, the proposed 7T-SRAM cell has a wordline capacitance similar to that of the 6T-SRAM cell.
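The write-time definition above can be measured on a transient waveform. It is the time from WL assertion until the storage node reaches 80% of its final value. The sketch below reads "80% of its final value" as 80% of the swing from the node's level at WL assertion to its settled level. This is our assumption, chosen so that the same rule works for writing 0, where the final value is near 0 V. The helper `write_time`, the 100 ps time constant, and the exponential waveform are all hypothetical.

```python
import math
from typing import Optional, Sequence


def write_time(t: Sequence[float], v: Sequence[float], t_wl: float,
               fraction: float = 0.8) -> Optional[float]:
    """Time from WL assertion until the node completes `fraction` of its swing.

    The swing runs from the node voltage at the first sample at or after t_wl
    to the last sample, which is taken as the final value. Returns None if the
    target level is never reached.
    """
    start = next(i for i, ti in enumerate(t) if ti >= t_wl)
    v0, vf = v[start], v[-1]
    target = v0 + fraction * (vf - v0)
    rising = vf > v0
    for i in range(start + 1, len(t)):
        if (v[i] >= target) if rising else (v[i] <= target):
            # Interpolate between the bracketing samples for sub-step resolution.
            dv = v[i] - v[i - 1]
            frac = 1.0 if dv == 0 else (target - v[i - 1]) / dv
            return t[i - 1] + frac * (t[i] - t[i - 1]) - t_wl
    return None


if __name__ == "__main__":
    TAU = 100e-12                         # hypothetical node time constant
    t = [i * 1e-12 for i in range(3001)]  # 0 to 3 ns in 1 ps steps
    t_wl = t[1000]                        # WL asserted at 1 ns
    v = [0.0 if ti <= t_wl else 1.0 - math.exp(-(ti - t_wl) / TAU) for ti in t]
    print(f"write time = {write_time(t, v, t_wl) * 1e12:.0f} ps")  # write time = 161 ps
```

For a single-pole response, the 80% point is TAU·ln(5), about 1.61 TAU. This gives a quick sanity check on the extraction.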
Sizing: In this part, we investigate how transistor sizing affects different parameters of the proposed cell, such as RSNM, WM, and HSNM. Table 1 lists the sizing of the 6T-SRAM, the conventional 7T-SRAM, and the proposed 7T-SRAM cells. For the proposed 7T-SRAM cell, as for the 6T-SRAM cell, increasing the size of transistors NF, PDR, and PDL improves the RSNM and HSNM. Increasing the size of the access transistors degrades the RSNM of the cell.
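The sizing rules above can be summarized by two standard width ratios computed from Table 1:
- Cell ratio: pull-down width over access width. A larger value favors RSNM.
- Pull-up ratio: pull-up width over access width. A smaller value favors WM.

The sketch assumes all transistors share the same channel length, which Table 1 does not state, so width ratios stand in for strength ratios. `SIZES`, `cell_ratio`, and `pull_up_ratio` are hypothetical helpers. The text suggests NF is stacked with PDL on the Q side, so the left-side cell ratio of the proposed cell overstates its pull-down strength. The conventional 7T cell is omitted because the text does not map its M-numbered devices to these roles.

```python
# Transistor widths from Table 1 (nm); equal channel lengths are assumed.
SIZES = {
    "6T": {"ACL": 180, "ACR": 180, "PUL": 150, "PUR": 150, "PDL": 230, "PDR": 230},
    "proposed 7T": {"ACL": 180, "ACR": 180, "PUL": 150, "PUR": 120,
                    "PDL": 230, "NF": 230, "PDR": 200},
}


def cell_ratio(w_pd: float, w_ac: float) -> float:
    """Cell ratio: pull-down width over access width (larger helps read stability)."""
    return w_pd / w_ac


def pull_up_ratio(w_pu: float, w_ac: float) -> float:
    """Pull-up ratio: pull-up width over access width (smaller helps writability)."""
    return w_pu / w_ac


for name, w in SIZES.items():
    for side in ("L", "R"):
        # Note: for the proposed cell, side L ignores the series NF device.
        cr = cell_ratio(w["PD" + side], w["AC" + side])
        pr = pull_up_ratio(w["PU" + side], w["AC" + side])
        print(f"{name:12s} side {side}: cell ratio = {cr:.2f}, pull-up ratio = {pr:.2f}")
```

On the right side, the proposed cell's smaller PDR gives a cell ratio of 1.11 versus 1.28 for the 6T cell. Its smaller PUR lowers the pull-up ratio from 0.83 to 0.67. This matches the text's trade of the weaker (QB) side toward writability.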

IV. CONCLUSIONS
In this paper, a novel 7T-SRAM cell was proposed that improves the write and read noise margins as well as the write time of the cell. The proposed circuit can employ any type of write assist technique to improve the write margin further. The proposed cell improves on the write margin of the 6T-SRAM cell by 27% when writing 0 and by 14% when writing 1. It improves the RSNM by 2.2X compared to the standard 6T-SRAM cell and by 2.5% compared to the conventional 7T-SRAM cell.

REFERENCES
[1] M. Horowitz, "Scaling, power, and the future of MOS," in IEDM Tech. Dig., pp. 9-15, Dec. 2005.
[2] F. Moradi, S.K. Gupta, G. Panagopoulos, D.T. Wisland, H. Mahmoodi, and K. Roy, "Asymmetrically doped FinFETs for low-power robust SRAMs," IEEE Trans. Electron Devices, vol. 58, no. 12, pp. 4241-4249, Dec. 2011.
[3] N. Collaert, A. De Keersgieter, A. Dixit, I. Ferain, L.-S. Lai, D. Lenoble, A. Mercha, A. Nackaerts, B. J. Pawlak, R. Rooyackers, T. Schulz, K. T. Sar, N. J. Son, M. J. H. Van Dal, P. Verheyen, K. von Arnim, L. Witters, M. De, S. Biesemans, and M. Jurczak, "Multi-gate devices for the 32 nm technology node and beyond," in Proc. 37th ESSDERC, Sep. 11-13, 2007, pp. 143-146.
[4] A. B. Sachid and C. Hu, "Denser and more stable SRAM using FinFETs with multiple fin heights," IEEE Trans. Electron Devices, vol. 59, no. 8, pp. 2037-2041, Aug. 2012.
[5] S. A. Tawfik and V. Kursun, "Multi-threshold voltage FinFET sequential circuits," IEEE Trans. VLSI Systems, vol. 19, no. 1, pp. 151-156, Jan. 2011.
[6] N. Verma and A. Chandrakasan, "A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 141-149, Jan. 2008.
[7] R. E. Aly and M. A. Bayoumi, "Low-power cache design using 7T SRAM cell," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 4, pp. 318-322, Apr. 2007.
[8] B. Madiwalar and B.S. Kariyappa, "Single bit-line 7T SRAM cell for low power and high SNM," International Multi-Conference on iMac4s, pp. 223-228, Mar. 2013.
[9] L. Wen, Z. Li, and Y. Li, "Differential-read 8T SRAM cell with tunable access and pull-down transistors," Electronics Letters, vol. 48, no. 20, pp. 1260-1261, Sep. 2012.
[10] M. H. Tu, J.-Y. Lin, M.-C. Tsai, C.-Y. Lu, Y.-J. Lin, M.-H. Wang, H.-S. Huang, K.-D. Lee, W.-C. Shih, S.-J. Jou, and C.-T. Chuang, "A single-ended disturb-free 9T subthreshold SRAM with cross-point data-aware write word-line structure, negative bit-line, and adaptive read operation timing tracing," IEEE J. Solid-State Circuits, vol. 47, no. 6, pp. 1469-1482, Jun. 2012.
[11] I. J. Chang, J. Kim, S. P. Park, and K. Roy, "A 32 kb 10T subthreshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 2, pp. 650-658, Feb. 2009.
[12] F. Moradi, D. T. Wisland, S. Aunet, H. Mahmoodi, and C. Tuan Vu, "65NM sub-threshold 11T-SRAM for ultra-low voltage applications," in SOC Conference, IEEE International, 2008, pp. 113-118.
[13] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, "A read-static-noise-margin-free SRAM cell for low-Vdd and high-speed applications," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 113-121, Jan. 2006.
[14] J. Wang, S. Nalam, and B. H. Calhoun, "Analyzing static and dynamic write margin for nanometer SRAMs," in Proc. Int. Symp. Low Power Electron. Design, New York, 2008, pp. 129-134.
[15] E. Seevinck, F. J. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," IEEE J. Solid-State Circuits, vol. SC-22, pp. 748-754, Oct. 1987.

Comparison of 130 nm Technology 6T and 8T SRAM Cell Designs for Near-Threshold Operation
Topic Category: Digital Integrated Circuits, SoC and NoC
Mika Kutila, Ari Paasio and Teijo Lehtonen
University of Turku, Technology Research Center (TRC)
20014 Turun yliopisto, Finland
{mika.kutila, ari.paasio, teijo.lehtonen}@utu.fi

Abstract: Power consumption is an important aspect of almost any electrical device design. Near-Threshold Computing (NTC) is a voltage scaling technique that makes it possible to reduce the power consumption of CMOS devices at the cost of speed and reliability. We are using NTC to design a low-power cache memory circuit for a low-performance sensor-based system. Caches consume noteworthy portions of the power and area of this kind of system, so reducing their power consumption has a meaningful impact on the power consumption of the whole system. In this paper, 8T SRAM and 6T SRAM memory cells are compared in order to establish guidelines for choosing SRAM cell constructions for NTC systems. 8T SRAM is traditionally regarded as the more reliable memory cell. However, we have managed to design a 6T SRAM that executes the read operation with acceptable reliability, read being the most vulnerable operation of the conventional 6T SRAM cell. Also, our 6T SRAM cell has a 31 % smaller area and lower power consumption.
Index Terms: 6T SRAM, 8T SRAM, Energy Efficiency, Near-Threshold Computing.

I. INTRODUCTION
Lowering power consumption is a continuous task as IC technologies keep advancing. Dynamic power consumption of CMOS circuits is mainly a consequence of charging and discharging the nodes of the circuit. Static power consumption, on the other hand, is always present when there are voltage differences across components of the circuit. Both can be reduced by lowering the supply voltage (Vdd), and voltage scaling has become an effective method for reducing power consumption in commercial CMOS devices [1]. If Vdd is lowered from the nominal value towards the transistor threshold voltage (Vth), transistors keep operating normally. However, the currents through the devices become smaller, so state switching becomes slower. In Near-Threshold Computing (NTC), Vdd is lowered from the nominal value but kept above Vth. This keeps the Ion/Ioff ratio of the transistors large enough for reliable operation.
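As a rough illustration of why NTC pays off, the textbook first-order model of dynamic power, P_dyn = α·C·Vdd²·f, can be evaluated at the 1.2 V nominal and 0.6 V NTC supplies used in this paper, at the same 12 MHz clock. The activity factor and switched capacitance below are hypothetical placeholders, and leakage (static) power is ignored. Only the quadratic Vdd dependence carries over to a real design.

```python
def dynamic_power(alpha: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS dynamic (switching) power in watts: alpha * C * Vdd^2 * f."""
    return alpha * cap_f * vdd ** 2 * freq_hz


# Hypothetical placeholders; only the supply voltages and the clock come from the text.
ALPHA = 0.1      # switching activity factor
CAP_F = 10e-12   # total switched capacitance, 10 pF
F_CLK = 12e6     # 12 MHz target clock

p_nominal = dynamic_power(ALPHA, CAP_F, 1.2, F_CLK)
p_ntc = dynamic_power(ALPHA, CAP_F, 0.6, F_CLK)
print(f"ratio = {p_nominal / p_ntc:.1f}")  # ratio = 4.0
```

Halving Vdd cuts dynamic power by 4x at a fixed clock, independent of the placeholder α and C. The cost, as the text notes, is slower switching.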
If NTC is applied to CMOS technologies with a low price point, reasonably priced ASIC devices can be built for low-performance purposes. Also, if the target application can operate at a low clock frequency, voltage scaling can provide significant power benefits.
The 8T SRAM is by nature a more reliable memory cell structure than the 6T SRAM [2]. Therefore, it is a reasonable choice when memory is designed for a low supply voltage, as in the NTC technique. However, memory arrays usually consume a considerable amount of SoC area. The smaller 6T SRAM cell would therefore be an appealing alternative if its possible reliability issues are solved. For this reason, the 6T SRAM and 8T SRAM cells have been chosen for comparison.
Conventionally, the 8T SRAM cell is read from one side only, and the 6T SRAM cell is read simultaneously from both sides. This exposes both inner nodes of the 6T SRAM cell to the precharged bitlines, which may make the 6T SRAM change its state in an unwanted way. This is the main reason the 6T SRAM cell is more vulnerable than the 8T SRAM cell.
By applying the 8T SRAM read technique to the 6T SRAM, we were able to make the 6T SRAM work reliably. It also allowed us to make reasonable comparisons between the 6T SRAM and 8T SRAM cells. The single-sided 6T SRAM read discussed in [3] and [4] is suitable for low-performance and low-power systems. Our goal is to use a 12 MHz clock in our system, so a slower read operation fits our needs.
We have been simulating 6T SRAM and 8T SRAM devices at NTC voltages. Power and area consumption are compared, but reliability issues are considered the top priority. We concentrate on the memory cells because they tend to use significant portions of the area and energy of a common SoC. The goal throughout this paper is a low-performance 12 MHz SoC operating at an NTC supply voltage of 0.6 V. The 130 nm CMOS technology used was originally designed for a nominal voltage of 1.2 V.
This paper is structured as follows. Section II presents the memory cells and notable peripheral circuits. Section III discusses the simulation arrangements and the methods for evaluating the results. Section IV presents the simulation results. Section V compares area, and Section VI draws conclusions.
II. CIRCUITS
The 6T SRAM and 8T SRAM cells are illustrated in Fig. 1 (a) and (b), together with the names of their components, nodes, and control signals used in this paper. The dimensions of each transistor are listed in Table I.

925

design

P1 & P2

N1 & N2

A1 & A2

R1 & R2

The single-sided read circuit is illustrated in Fig. 2 (b);


it takes less area and does not rely on precise timings of
control signals, which is the case with the differential read
of conventional sense amplifier circuit.
Write circuit is illustrated in Fig. 2 (a). There are one of
these circuits for BL and one for BL; the other takes the
data bit and the other takes its inversion as an input. PMOS
transistors are wider than NMOS transistors; this makes pullup and pull-down currents symmetrical, and therefore write
circuit is capable of driving 0 and 1 to the bitlines similarly.
Transistor dimensions are determined by considering mainly
reliability and static energy consumption. The used dimensions
are presented in Table I. P 1 and P 2 have the minimum
width allowed by the technology. This minimizes leak currents
from Vdd . In 6T SRAM, the pull-down NMOS transistors
N 1 and N 2 are wider than the access transistors A1 and
A2. This makes the cell more stable during read operation,
which is when 6T SRAM is the most prone to unwanted state
change. 8T SRAM has the widths of pull-down and access
transistors the opposite way, because A1 and A2 are not used
during read operation; this makes write operation faster and
more reliable. R1 and R2 of 8T SRAM have a width that
minimizes the leakage through them. Leak current through
NMOS depends on the width of the channel, and the minimum
leakage is achieved with 300 nm width [5]. The lengths of all
transistors are set to 200 nm, which is larger than the nominal
130 nm; this makes the leakage smaller, and bigger size gives
protection against manufacturing inaccuracies and therefore
reduces variations in the threshold voltages.

6T SRAM
8T SRAM

150/200
150/200

350/200
250/200

250/200
350/200

300/200

III. S IMULATION M ETHODS

W R/RD

Vdd

WR

P1

P2
BL

BL

A2

A1
N1

N2

(a)
Vdd

WR

WR
BL

P1

A2

P2
Q

R2

N2

R1

BL
A1

N1

RD

(b)
Fig. 1. 6T (a) and 8T (b) SRAM memory cells
TABLE I
SRAM CELL TRANSISTOR DIMENSIONS [ WIDTH / LENGTH (nm)].

The read operation starts with precharging BL. Reading then begins by setting WR/RD (6T SRAM) or RD (8T SRAM) to Vdd. If bit one (1) is stored in the cell, Q is at Vdd and Q̅ is at ground. During the read operation, the precharged BL starts to discharge through A2 and N2 (6T SRAM) or through R2 and R1 (8T SRAM). This discharge must be fast compared with the rate at which BL leaks through all the unaccessed SRAM cells connected to it. Only then can the read circuit identify the data correctly.
If bit zero (0) is stored in the cell, BL should not discharge and should stay at or near Vdd. In this case, the read circuit should identify the value of BL before leakage causes it to drop too much. We have placed a weak pull-up connection in the read circuit to help keep the value one on the bitline; this connection is illustrated in Fig. 2(b). Also, our bitlines are only 64 bits long, and therefore leakage does not appear to interfere with the read operation. Short bitlines have low capacitance, which makes it easier for the SRAM cells and peripheral circuits to charge and discharge them. Together, these properties allow small transistor sizes, low leakage and better reliability.
Read operations are executed from only one bitline (BL in Fig. 1(a) and (b)). This makes the 6T SRAM and 8T SRAM designs, and their simulation results, more comparable.
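The requirement that the accessed cell discharge BL faster than the unaccessed cells leak it can be made concrete with a first-order constant-current estimate, t = C·ΔV/I. The sketch below is illustrative only. Three things in it are hypothetical and are not taken from the simulations:

- the linear bitline-capacitance model;
- the placeholder per-cell currents and capacitances;
- the worst-case assumption that every unaccessed cell leaks onto BL.

```python
def bitline_capacitance(n_cells: int, c_per_cell: float, c_wire: float = 0.0) -> float:
    """Hypothetical linear model: every attached cell adds c_per_cell farads."""
    return n_cells * c_per_cell + c_wire


def time_to_swing(c_bl: float, delta_v: float, current: float) -> float:
    """First-order constant-current estimate t = C * dV / I, in seconds."""
    return c_bl * delta_v / current


def leak_to_read_time_ratio(n_cells: int, i_read: float, i_leak_per_cell: float) -> float:
    """How many times longer the (n_cells - 1) unaccessed cells need to droop BL
    by dV through leakage than the accessed cell needs to discharge it by dV.

    Worst case assumed: every unaccessed cell leaks onto BL. C and dV cancel."""
    return i_read / ((n_cells - 1) * i_leak_per_cell)


if __name__ == "__main__":
    # Placeholder values for illustration only; they are not simulation results.
    c_bl = bitline_capacitance(64, c_per_cell=0.1e-15, c_wire=2e-15)
    t_read = time_to_swing(c_bl, delta_v=0.1, current=1e-6)
    print(f"C_BL = {c_bl:.2e} F, read swing time = {t_read:.2e} s")
    for n in (64, 256):
        print(n, "cells: leak/read time ratio =", round(leak_to_read_time_ratio(n, 1e-6, 1e-9), 1))
```

Because C and ΔV cancel in the ratio, the margin scales roughly as 1/(N − 1). Under this model, a 64-cell bitline has about four times the margin of a 256-cell one.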

Extensive Monte Carlo simulations were run to find out how the 6T SRAM and 8T SRAM differ at different temperatures and with different device mismatch setups. An NTC Vdd of 0.6 V was used throughout these simulations, while the nominal Vdd of the technology would have been 1.2 V. Each simulation used two kinds of parameter sets:

- all corner parameters with device mismatch (SS, SF, FS, FF);
- typical performance with Gaussian-distributed chip-wide mismatch together with device mismatch (stat).

3σ variations were used in the corner parameters to find the worst-case behavior. Temperatures between −35 °C and 90 °C were observed.
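The stat runs use Gaussian mismatch, so the rarity of a 3σ corner can be quantified. The sketch assumes a single Gaussian-distributed parameter. Under that assumption, a ±3σ corner lies outside all but about 0.27% of samples, so a batch of 10 000 runs would contain only about 27 samples that extreme in that one parameter. This is why corner runs complement the stat runs. The sketch evaluates only the Gaussian tail; it does not model the actual device parameters.

```python
import math


def two_sided_tail(k_sigma: float) -> float:
    """P(|X - mu| > k_sigma * sigma) for a Gaussian random variable X."""
    return math.erfc(k_sigma / math.sqrt(2.0))


def expected_beyond(n_runs: int, k_sigma: float) -> float:
    """Expected number of Monte Carlo samples falling outside +/- k_sigma."""
    return n_runs * two_sided_tail(k_sigma)


if __name__ == "__main__":
    for k in (1.0, 2.0, 3.0):
        print(f"{k:.0f} sigma: tail = {two_sided_tail(k):.5f}, "
              f"expected in 10 000 runs = {expected_beyond(10_000, k):.1f}")
```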
The worst-case corner parameters for delay are SS, and delays increase as Vdd decreases. However, this increase does not restrict the speed of the memory device until the clock frequency exceeds 10 MHz. Therefore, the speed reduction at NTC voltages is not considered a critical issue in this paper.
A. Reliability Estimation
The reliabilities of the 6T and 8T SRAM cells were estimated by executing read and write operations on them. A write was interpreted as successful if the state of the memory cell after the operation matched the value written to it. A read was interpreted as successful if the output of the memory column was correct and the state of the memory cell had not changed during the operation. The worst-case corner parameters for 6T SRAM reliability are FS [6]; fast N1 and N2 make the 6T SRAM cell vulnerable.
B. Leak Measurements
Leakage current measurements were executed by writing bit 1 to the memory cell and then waiting for the system to settle to a stable state. After that, the currents leaking into the cell from Vdd were measured. The same procedure was repeated by writing bit 0 to the cell and then measuring the leakage. Half of the other memory cells in the same column stored 0 and half stored 1, which simulated an average bitline condition.
As the peripheral circuits of our 6T SRAM and 8T SRAM are almost the same, the leakage comparison is made only between single memory cells. The worst-case corner parameters for leakage are FF (fast NMOS, fast PMOS). However, stat parameters were used to better estimate the leakage behavior of a large SRAM array.
C. Dynamic Power Measurements
The dynamic power of the memory cells was measured by executing write and read operations on them. All the different operations were executed:
- write 0, when 0 was stored in the cell before,
- write 1, when 0 was stored in the cell before,
- write 0, when 1 was stored in the cell before,
- write 1, when 1 was stored in the cell before,
- a read operation after each write,
and the mean power was calculated from them. The dynamic power consumption of a single operation was estimated from the equation

$$P = \frac{E_{\mathrm{operation}}}{t},$$

where t is the period of the clock cycle, which was 1 µs in our case. E_operation is the energy consumption of the SRAM cell during an active clock cycle in which some operation is executed. E_operation was estimated from the equation

$$E_{\mathrm{operation}} = V_{dd} \int_{t_0}^{t_0+t} \left(i_{P1} + i_{P2}\right)\,dt,$$

where i_P1 and i_P2 are the currents through transistors P1 and P2; these values were measured from the simulations. t0 is the start time of the clock cycle in which the operation was executed.

Fig. 2. (a) Write and (b) read circuits used for both SRAM designs [width/length (nm)].
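The two equations above can be evaluated numerically from sampled supply currents. The sketch below is a minimal illustration. It assumes the simulator exports i_P1 and i_P2 as sample lists on a shared time axis covering one clock cycle [t0, t0 + t]. The trapezoidal rule and the function names are choices made here, not the authors' tool flow.

```python
from statistics import mean
from typing import Sequence


def trapezoid(y: Sequence[float], x: Sequence[float]) -> float:
    """Trapezoidal-rule integral of samples y taken at points x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matching samples")
    return sum(0.5 * (y[k] + y[k - 1]) * (x[k] - x[k - 1]) for k in range(1, len(x)))


def operation_energy(vdd: float, times: Sequence[float],
                     i_p1: Sequence[float], i_p2: Sequence[float]) -> float:
    """E_operation = Vdd * integral over the cycle of (i_P1 + i_P2) dt, in joules."""
    supply = [a + b for a, b in zip(i_p1, i_p2)]
    return vdd * trapezoid(supply, times)


def operation_power(energy: float, period: float) -> float:
    """P = E_operation / t for one clock period t (seconds)."""
    return energy / period


def mean_dynamic_power(energies: Sequence[float], period: float) -> float:
    """Mean power over the measured operations (writes and reads)."""
    return mean(operation_power(e, period) for e in energies)


if __name__ == "__main__":
    # Synthetic 1 us cycle with a short current pulse (placeholder data).
    t = [k * 1e-8 for k in range(101)]                    # 0 .. 1 us in 10 ns steps
    pulse = [2e-6 if 10 <= k <= 20 else 1e-9 for k in range(101)]
    e = operation_energy(0.6, t, pulse, [1e-9] * 101)
    print(f"E = {e:.3e} J, P = {operation_power(e, 1e-6):.3e} W")
```

In practice, one energy value per operation type from the list above would be passed to mean_dynamic_power.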

IV. SIMULATION RESULTS

A. Reliability Results
10 000 Monte Carlo reliability simulations were conducted at temperatures from −35 °C to 90 °C, with stat and all the corner parameters SS, SF, FS and FF. No errors were produced in either the 6T SRAM or the 8T SRAM case. This indicates that our 6T SRAM with one-sided read operation has acceptable stability for NTC use. Also, the 64-bit-long bitline helps keep bitline leakage small, which further helps read reliability. The write buffers and their sizes are illustrated in Fig. 2(a). They were capable of driving the bitlines and changing the state of the SRAM cell correctly.
B. Static Power Results
Leakage currents are an important component of the total power consumption of a memory system. In an NTC system, which operates at low frequency, leakage power dominates the total power consumption. The results of 10 000 Monte Carlo simulations with stat parameters are presented in Fig. 3(a) and (b). The 6T SRAM leaks from 4.55% to 10.95% less than the 8T SRAM, depending on the temperature. This indicates that 6T SRAM would be a beneficial choice over 8T SRAM.
C. Dynamic Power Results
Dynamic power results are presented in Fig. 3(c) and (d). The distributions are positively skewed, and the 6T SRAM distribution is more skewed. Even though each write operation is executed in a similar manner, the variation in write power consumption is more significant than in read operation power consumption. One reason for this is the relatively slow write circuits (Fig. 2(a)); faster drivers might make the power consumption of write operations more consistent.
The results in Fig. 3(c) and (d) are measured from a single SRAM cell. They represent the theoretical maximum power consumed by one SRAM cell that is read or written. This is the case in which a processor executes a memory access command on every clock cycle.
Median and mean values of the leakage and dynamic power measurements are illustrated in Fig. 4(a) and (b). As an example, Fig. 4(c) and (d) illustrate the active power consumption of a 128 B memory page and of an 8 kB memory of 64 pages (64 × 128 B). The dynamic power consumption of a single SRAM cell has little impact on the total power consumption of typically large memory arrays. In such arrays, a large number of memory cells are in a static state just retaining their data, and only a few cells are executing an operation at any given time.
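A rough array-level model makes the last point concrete. The sketch takes two inputs from the Fig. 4 example: a 128 B page (1024 cells) and a 16-bit access every cycle (100% activity). It further assumes, hypothetically, that the accessed cells draw the per-operation dynamic power and every other cell draws only leakage power. The per-cell numbers in the usage lines are placeholders, not the measured values.

```python
PAGE_BITS = 128 * 8   # 128 B page -> 1024 cells
WORD_BITS = 16        # one 16-bit read or write per clock cycle


def array_power(pages: int, p_dyn: float, p_leak: float,
                word_bits: int = WORD_BITS, page_bits: int = PAGE_BITS) -> float:
    """Total power (W): word_bits active cells at p_dyn, all other cells at p_leak."""
    n_cells = pages * page_bits
    if not 0 <= word_bits <= n_cells:
        raise ValueError("word_bits must fit in the array")
    return word_bits * p_dyn + (n_cells - word_bits) * p_leak


def ratio_6t_to_8t(pages: int, dyn6: float, leak6: float,
                   dyn8: float, leak8: float) -> float:
    """Total power of the 6T array relative to the 8T array."""
    return array_power(pages, dyn6, leak6) / array_power(pages, dyn8, leak8)


if __name__ == "__main__":
    # Placeholder per-cell values (W), for illustration only.
    dyn6, leak6, dyn8, leak8 = 1.2e-9, 0.9e-9, 1.0e-9, 1.0e-9
    for pages in (1, 64):
        print(pages, "page(s):", round(ratio_6t_to_8t(pages, dyn6, leak6, dyn8, leak8), 4))
```

As the number of pages grows, the idle cells dominate and the ratio approaches leak6/leak8.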

[Fig. 3 panels: leak power (nW) in (a) and (b) and dynamic power (nW) in (c) and (d), each versus temperature (°C) from −35 to 90.]
Fig. 3. Power consumption of a single SRAM cell, stat parameters, distributions of 10 000 Monte Carlo simulations. Leakage power of 6T SRAM (a) and 8T SRAM (b), each measuring two cases: 0 stored and 1 stored in the cell. Dynamic power of 6T SRAM (c) and 8T SRAM (d), covering all initial values and read/write operations. Box edges are at the 25th and 75th percentiles; whiskers have a maximum length of 1.5 times the box height (≈99.3% coverage).
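The ≈99.3% coverage quoted for the whiskers is the coverage of Tukey's 1.5 × IQR fences under a normal distribution. The sketch below reproduces it with Python's standard statistics.NormalDist. It assumes normality, whereas the measured power distributions are skewed, so their actual whisker coverage may differ.

```python
from statistics import NormalDist


def tukey_whisker_coverage(k: float = 1.5) -> float:
    """Probability mass of a normal distribution inside [Q1 - k*IQR, Q3 + k*IQR].

    The result does not depend on the mean or standard deviation, so the
    standard normal distribution is used."""
    std = NormalDist()
    q1, q3 = std.inv_cdf(0.25), std.inv_cdf(0.75)
    iqr = q3 - q1
    return std.cdf(q3 + k * iqr) - std.cdf(q1 - k * iqr)


if __name__ == "__main__":
    print(f"{tukey_whisker_coverage():.4f}")
```

With the default k = 1.5, the fences sit at about ±2.698σ, which gives a coverage of about 0.993.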

[Fig. 4 panels: (a) leak power (nW) and (b) dynamic power (nW) versus temperature (°C), 6T and 8T mean and median; (c), (d) total power of the 6T array relative to the 8T array versus temperature (°C), for 1 page and 64 pages.]
Fig. 4. Leakage (a) and dynamic (b) power consumption of a single SRAM cell, 10 000 Monte Carlo simulations. Total power of the 6T SRAM array relative to the 8T SRAM array, calculated from the means (c) and medians (d) of the simulation results; example page of 128 B, 16-bit read or write, activity 100%.

V. AREA COMPARISON
Our 6T SRAM layout area is 4.371 25 µm² (3.250 µm × 1.345 µm), and the 8T SRAM layout area is 6.362 4 µm² (3.615 µm × 1.76 µm). Moving from the 8T SRAM to the 6T SRAM cell therefore saves over 31% of the area. This is a significant difference in a small SoC that has a considerable amount of cache.
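The area figures can be checked directly from the layout dimensions given above. The snippet below only multiplies the stated width and height values and computes the fractional saving.

```python
def rect_area(width_um: float, height_um: float) -> float:
    """Layout area in square micrometres."""
    return width_um * height_um


area_6t = rect_area(3.250, 1.345)   # 4.371 25 um^2
area_8t = rect_area(3.615, 1.76)    # 6.362 4 um^2
saving = 1.0 - area_6t / area_8t    # fraction of the 8T area saved by the 6T cell

if __name__ == "__main__":
    print(f"6T: {area_6t:.5f} um^2, 8T: {area_8t:.4f} um^2, saving: {saving:.1%}")
```

The computed saving is about 31.3%, consistent with "over 31%".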
VI. CONCLUSION
We have executed large numbers of Monte Carlo simulations. They indicate that, with a single-sided read operation, 6T SRAM can be designed to work reliably at NTC voltages. The 6T SRAM cell also leaks at least 4.5% less, and a single cell is over 30% smaller in area than an 8T SRAM cell. These results, together with the acceptable reliability, make the 6T SRAM cell an appealing choice for an NTC SoC.
In the future, it should be possible to reduce leakage even more; unaccessed memory cells could retain their state with an even lower Vdd. Some leakage savings could also be achieved by making the transistors a little longer, and perhaps by using PMOS transistors as access transistors, as they leak less than NMOS transistors. There also seems to be a possibility for dynamic power savings by adjusting the write circuit strength. However, together with reliability, leakage is the most important aspect of low-power memory design, as most of the memory cells are idle at any given time.
REFERENCES
[1] R. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-threshold computing: Reclaiming Moore's law through energy efficient integrated circuits," Proc. IEEE, vol. 98, no. 2, pp. 253-266, Feb. 2010.
[2] P. Athe and S. Dasgupta, "A comparative study of 6T, 8T and 9T decanano SRAM cell," in Proc. ISIEA, vol. 2, Oct. 2009, pp. 889-894.
[3] G. Chen, M. Wieckowski, K. Daeyeon, D. Blaauw, and D. Sylvester, "A dense 45nm half-differential SRAM with lower minimum operating voltage," in Proc. ISCAS, May 2011, pp. 57-60.
[4] C. Piguet, J.-M. Masgonty, S. Cserveny, C. Arm, and P. D. Pfister, "Low-power low-voltage library cells and memories," in Proc. ICECS, vol. 3, Sep. 2001, pp. 1521-1524.
[5] M. Kutila, A. Paasio, and T. Lehtonen, "Simulations on 130 nm technology 6T SRAM cell for near-threshold operation," in Proc. ISCAS, June 2014.
[6] A. F. Yeknami, "Design and evaluation of a low-voltage, process-variation-tolerant SRAM cache in 90nm CMOS technology," Master's thesis, Dept. Elect. Eng., Linköpings Universitet, Linköping, Sweden, 2008. [Online]. Available: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-12260


IEEE-ICSE2014 Proc. 2014, Kuala Lumpur, Malaysia

Low power and low voltage SRAM design for LDPC codes hardware applications
Rosalind Deena Kumari Selvam, C. Senthilpari, Lee Lini
Faculty of Computing & Informatics,
Faculty of Engineering
Multimedia University
Cyberjaya, Malaysia
rosalind@mmu.edu.my
Abstract: The Low Voltage Low Power (LVLP) 8T, 11T, 13T and ZA SRAM cells are designed using dynamic logic. The SRAM cells are implemented with the pass transistor logic technique, focusing mainly on the read and write operations. The circuits are designed with the DSCH2 circuit editor, and their layouts are generated by the MICROWIND3 layout editor. The Layout Versus Simulation (LVS) design has been verified using BSIM4 with 65 nm technology at a supply voltage of 0.7 V. The simulated SRAM layouts are verified and analyzed. The 8T SRAM gives a power dissipation of 0.145 µW, a propagation delay of 37.2 ps, an area of 14 × 8 µm and a throughput of 4.037 ns.
Keywords: power dissipation, delay, throughput, SRAM cell

I. INTRODUCTION
Low-power operation has become crucially important in VLSI design. One effective way of reducing power is to lower the power supply voltage. A key goal of IC design techniques is to lower the power of memory circuits with a minimum tradeoff in performance. The VLSI industry constantly strives to achieve high-density, high-speed and low-power devices in CMOS technology. With each new technology, the transistor size is reduced to about 70% of the earlier version. This has increased the density of devices on chip and reduced delay time, satisfying the demand for high performance.
Memory circuits such as SRAM cover a considerable area in the design of digital ICs. Arun Ramnath Ramani and Ken Choi [1] have shown that it is possible to push the design of low-power SRAMs into the subthreshold region. They compared the design with others on parameters such as speed, power consumption and average power-delay product. Yashwant Singh and D. Bhoolchandani [2] focused on the design of an SRAM cell with dynamic Vt and dynamic standby voltage to mitigate leakage power dissipation. Their simulation results show a significant reduction in power dissipation in the standby mode of the SRAM cell. Rajiv V. Joshi et al. [3] designed an 8T-CDC column-decoupled SRAM with a half-select-free design. This design enhanced voltage scaling capabilities and gave a 30% to 40% power reduction compared with standard 6T techniques. Takahiko et al. [4] designed a 10T SRAM cell circuit considering the static noise margin (SNM). This circuit was ratio-less and prevented the loss of information. Kumkum Verma et al. [5] proposed a design of a 6T SRAM cell at 180 nm using two types of architecture, namely bank partitioning and matrix array architecture, with a focus on optimizing power and delay. They found that, in order to lower the power in SRAMs, it was necessary to reduce the capacitance of the bit and word lines. This reduced the power dissipation by 78% and the speed by 23%, at the cost of a 6.93% higher transistor count. Xuebei Yang and Kartik Mohanram [6] studied SRAM cells based on tunneling field-effect transistors (TFETs). They proposed a new 6T TFET SRAM circuit utilizing ground-lowering read assist (RA), which proved to have better performance and reliability. It also occupied about 10% to 15% less area and consumed at least 4 orders of magnitude less static power, making it a good choice for low-power, high-density SRAM applications.
Our proposed dynamic design is a tradeoff between the existing challenges highlighted by these authors. It gives better performance in terms of power dissipation, propagation delay and speed. In the present paper, a modified SRAM cell is proposed to implement LDPC decoders, which are used in DSP algorithms and cryptography. The proposed SRAM cell is compared with other types of SRAM cell designs. These SRAM cells have been designed using CAD tools: DSCH2 for logic design and Microwind 3 for layout design and timing simulation. The SRAM cell is simulated, and the results are compared with other published results in terms of power dissipation, propagation delay and total chip area. The simulation results show that the proposed circuit performs better in terms of propagation delay and throughput.

978-1-4799-5760-6/14/$31.00 2014 IEEE

II. DESIGN METHOD

The circuit design follows dynamic CMOS logic, also known as precharge-evaluate logic. This design substantially reduces the number of transistors needed to implement a logic function. In this method, the output node capacitance is first precharged, and then the output voltage is evaluated based on the applied inputs. Both operations are driven by a clock signal that drives one NMOS and one PMOS transistor in each dynamic stage.



When the clock signal is low, the PMOS precharge transistor conducts while the complementary NMOS transistor is off. The output capacitance is charged through the conducting PMOS transistor to a logic high level of Vout = VDD. At this stage, the input voltages have no effect on the output because one of the transistors is turned off. When the clock signal becomes high, the precharge transistor turns off and the complementary NMOS transistor turns on. The output node voltage then either stays high or drops low, depending on the input voltage levels. If the inputs create a conducting path between the output node and ground, the output capacitance discharges to 0.
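The precharge/evaluate sequence can be captured in a small behavioural model. The sketch below is an idealized, switch-level illustration; it ignores leakage, charge sharing and timing. The two-input NAND pull-down network is a hypothetical example, not one of the SRAM cells in this paper.

```python
from typing import Callable, Sequence

PullDown = Callable[[Sequence[int]], bool]


def dynamic_stage(clk: int, inputs: Sequence[int], pulldown: PullDown, prev_out: int) -> int:
    """One step of an ideal dynamic (precharge-evaluate) gate."""
    if clk == 0:
        return 1          # precharge: PMOS on, evaluation NMOS off, output pulled to VDD
    if pulldown(inputs):
        return 0          # evaluate: a conducting path to ground discharges the node
    return prev_out       # evaluate, no path: the node keeps its stored charge


def nand2_pulldown(inputs: Sequence[int]) -> bool:
    """Hypothetical example network: two series NMOS transistors."""
    return bool(inputs[0] and inputs[1])


if __name__ == "__main__":
    out = 0
    trace = [(0, (1, 1)), (1, (1, 1)), (0, (1, 0)), (1, (1, 0))]
    for clk, ins in trace:
        out = dynamic_stage(clk, ins, nand2_pulldown, out)
        print(f"clk={clk} inputs={ins} -> out={out}")
```

The trace gives out = 1, 0, 1, 1. An evaluate phase can only keep or discharge the precharged value. A discharged node therefore stays low until the next precharge, even if the inputs change.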
This paper focuses mainly on the design of an SRAM cell using dynamic logic. A fundamental NMOS dynamic logic circuit consists of an NMOS pass transistor driving the gate of another NMOS transistor. A periodic clock signal drives the pass transistor, which charges a capacitance up or down depending on the input signal. When the clock is high, the pass transistor conducts and the capacitance is charged up (logic 1) or down (logic 0) according to the input. When the clock is low, the pass transistor is off and the stored charge is retained. The output of the inverter is logic 1 or logic 0 depending on the voltage at the capacitance.

Fig. 2(b): 11T SRAM
Logic 1 transfer: Assume that the voltage at the node is initially 0 and a logic 1 level is applied to the input terminal. The clock signal at the gate rises from 0 to VDD at t = 0, and the pass transistor starts to conduct as soon as the clock becomes active. The pass transistor operates in saturation and charges up the capacitor. It turns off when the node voltage reaches its maximum value, at which point the gate-source voltage equals the threshold voltage, i.e., Vmax = VDD − VT.

Fig. 2(c) 13T SRAM
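Because the node being charged is the source of the pass transistor, the body effect raises VT as the node rises, so Vmax must be found self-consistently. The sketch uses the standard long-channel body-effect expression VT = VT0 + γ(√(2φF + VSB) − √(2φF)) with the body at ground. VT0, γ and 2φF are hypothetical inputs, not values from the 65 nm models used here.

```python
import math


def vt_body(vsb: float, vt0: float, gamma: float, two_phi_f: float) -> float:
    """Threshold voltage with body effect (long-channel model), in volts."""
    return vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


def pass_gate_vmax(vdd: float, vt0: float, gamma: float, two_phi_f: float,
                   tol: float = 1e-9, max_iter: int = 200) -> float:
    """Solve Vmax = VDD - VT(Vsb = Vmax) by fixed-point iteration."""
    v = vdd - vt0                      # first guess: ignore the body effect
    for _ in range(max_iter):
        v_next = vdd - vt_body(v, vt0, gamma, two_phi_f)
        if abs(v_next - v) < tol:
            return v_next
        v = v_next
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    print(round(pass_gate_vmax(vdd=0.7, vt0=0.3, gamma=0.4, two_phi_f=0.8), 4))
```

Fixed-point iteration suits this problem because the slope of the update, γ/(2√(2φF + V)), is well below 1 for typical parameters. With γ = 0, the function returns VDD − VT0 exactly.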
Logic 0 transfer: Assume that the voltage at the node is initially 1 and logic 0 is applied at the input terminal. The clock signal at the gate of the pass transistor changes from 0 to VDD. The pass transistor starts to conduct as soon as the clock becomes active, with current flowing in the direction opposite to that during charge-up. The pass transistor, operating in the linear region, discharges the capacitor, and the node voltage falls to 0.
Our design uses 65 nm technology for the 8T, 11T, 13T and ZA circuits, applying logic 0 and logic 1 transitions with WL = 1 and BL inputs of 0 and 1. The circuits below show the basic setup for the simulation. The 8T, 11T, 13T and ZA circuits, built with the dynamic logic technique and pass transistor logic, are shown in Fig. 2(a), Fig. 2(b), Fig. 2(c) and Fig. 2(d), respectively.

Fig. 2(a): 8T SRAM
Fig. 2(d): ZA SRAM

Timing diagrams:
Fig. 3(a): 8T


The circuits shown in Fig. 2(a), 2(b), 2(c) and 2(d) have been designed using dynamic logic. The control input divides the operation of each circuit into two distinct phases: the precharge and evaluate intervals. A control input of 0 defines the precharge phase, in which the PMOS transistor conducts while the NMOS transistor is cut off. This is visible in the timing diagrams in Fig. 3(a), 3(b), 3(c) and 3(d), which also show how switching the control input between 0 and 1 affects the 8T, 11T, 13T and ZA circuits. As Section III shows, these techniques achieve low power dissipation, less delay and better speed.

Fig. 3(b) Icular stack.
Fig. 3(c) 13T
Fig. 3(d) ZA

III. RESULTS AND DISCUSSION
A VLSI circuit can be regarded as a complex maze of paths that influence the whole circuit function. If the design is not appropriately simulated, errors can ruin the whole circuit operation. There are a few ways to minimize these effects and ensure that the circuit is correct. One method is to apply analysis data at certain probe points and follow the data flow through the circuit. The output points can later be observed to determine whether the circuit has handled the data appropriately. Observing these data as they flow through the computer representation of a circuit is called simulation, and the selection of input data is known as test-vector generation. Our SRAM cell is designed in the context of dynamic logic. The results are evaluated in terms of logic 0 and logic 1 for pre-evaluation charges, and the outputs and leakage are evaluated according to pre-evaluation charge logic.
Table I shows the logic 0 simulation results of the dynamic 8T SRAM cell, measured with WL = 1 and BL = 0. The function of the SRAM cell is implemented by an NMOS tree. When the input is 1, the current through the drain is low, so the output voltage stays close to VDD, in line with the CMOS 65 nm technology. According to NFET standards, the output voltage and the dissipation are lower than in other existing results. The component parameters extracted from the layout are analyzed using the BSIM4 analyzer. The dynamic logic brings the output delay and fall time close to their best values. The dynamic logic cells act as a push-pull device, which maintains the flow of charges in a regular manner. Therefore, our dynamic-logic-based 8T SRAM cells give lower delay than other logic styles.
The area A is calculated for the whole 8T SRAM cell, including the input and output pads. Any memory logic needs to place a bit of data at the correct location in the memory. The cell location is selected properly only if the logic is given sufficient energy, which can be calculated from CV². The throughput of the memory logic always depends on the number of stages in the stack. Our 8T SRAM cell has a proper load capacitance, which consumes sufficient energy to place the logic in the part

Table I: Logic 0 simulation result of dynamic SRAM cell

Type  VO (V)  ID (mA)  PD (W)        Propagation delay (s)  PDP (J)        A = W×H (µm)  Throughput
8T    0.7     0.06     0.145×10^-6   37.16×10^-12           5.388×10^-18   14×8          4.372 ns
11T   0.69    0.06     0.101×10^-6   8.70×10^-11            8.787×10^-18   17×8          4.087 ns
13T   0.67    0.25     1×10^-3       8.41×10^-10            8.41×10^-13    20×8          6.841 ns
ZA    0.67    0.03     6×10^-9       8.50×10^-10            5.1×10^-18     16×8          8.85 ns

8T    0.7     0.06     0.145×10^-6   3.72×10^-11            5.388×10^-18   14×8          4.037 ns
11T   0.68    0.06     0.101×10^-6   7.00×10^-11            7.07×10^-18    17×8          4.07 ns
13T   0.69    0.25     1×10^-3       9.25×10^-10            9.25×10^-13    20×8          6.925 ns
ZA    0.66    0.03     606×10^-9     8.50×10^-10            5.151×10^-16   16×8          8.85 ns

Table II shows the logic 1 simulation results of the dynamic SRAM cell, measured with WL = 1 and BL = 0, and with WL = 1 and BL = 1. Here the NMOS tree determines the function of the SRAM cell. When the input is 1, the output node voltage either remains at the logic 1 high level or drops to a logic low level, depending on the input voltage levels. The output voltage reaches its correct logic level after a certain time delay. The voltage falls to a value where the output voltage almost equals VDD. The component parameters extracted from the layout are analyzed using the BSIM4 analyzer. Because of the dynamic logic, the output delay and fall time are brought almost to their best values.

Table II: Logic 1 simulation result of dynamic SRAM cell

Type  VO (V)  ID (mA)  PD (W)        Propagation delay (s)  PDP (J)        A = W×H (µm)  Throughput
8T    0.56    0.056    7.817×10^-6   3.72×10^-11            2.907×10^-16   14×8          4.0372 ns
11T   0.66    0.056    2.341×10^-6   3.60×10^-11            8.427×10^-17   17×8          4.036 ns
13T   0.68    0.249    7×10^-6       9.09×10^-10            6.363×10^-15   20×8          6.909 ns
ZA    0.67    0.249    906×10^-9     9.00×10^-10            8.154×10^-16   16×8          8.9 ns

8T    0.56    0.056    7.817×10^-6   3.72×10^-11            2.907×10^-16   14×8          4.0372 ns
11T   0.54    0.056    9.061×10^-6   1.04×10^-10            9.423×10^-16   17×8          4.104 ns
13T   0.68    0.249    7×10^-6       9.42×10^-10            6.594×10^-15   20×8          6.942 ns
ZA    0.67    0.249    1.306×10^-6   9.84×10^-10            1.285×10^-15   16×8          8.984 ns
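The PDP column is the product of the PD and propagation delay columns (J = W × s). The short check below recomputes it for a few rows copied from Tables I and II. It only confirms the arithmetic relationship and says nothing about the accuracy of the underlying simulations.

```python
# Rows copied from Tables I and II: (label, PD in W, delay in s, reported PDP in J)
ROWS = [
    ("Table I 8T", 0.145e-6, 37.16e-12, 5.388e-18),
    ("Table I 11T", 0.101e-6, 8.70e-11, 8.787e-18),
    ("Table I 13T", 1e-3, 8.41e-10, 8.41e-13),
    ("Table I ZA", 6e-9, 8.50e-10, 5.1e-18),
    ("Table II 8T", 7.817e-6, 3.72e-11, 2.907e-16),
    ("Table II ZA", 906e-9, 9.00e-10, 8.154e-16),
]


def pdp(power_w: float, delay_s: float) -> float:
    """Power-delay product in joules."""
    return power_w * delay_s


def relative_error(computed: float, reported: float) -> float:
    """Relative difference between a recomputed and a reported value."""
    return abs(computed - reported) / abs(reported)


if __name__ == "__main__":
    for name, p, d, reported in ROWS:
        c = pdp(p, d)
        print(f"{name}: computed {c:.4e} J, reported {reported:.4e} J, "
              f"rel. error {relative_error(c, reported):.2%}")
```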


Table III: Comparison of power dissipation, delay and PDP
Parameter
Power
Dissipation
Delay
PDP

Proposed circuit
0.145 x 10 -6

Ref [1]
-

Ref [2]
0.0687 x 10 -6

37.2 x 10 -12
2.907 x 10 -16

1018.3 x 10 -12
1.347 x 10 -18

Table III compares the simulation results of our 8T SRAM cell with the SRAM circuits of Arun et al. [1] and Yashwant Singh et al. [2]. Our proposed dynamic SRAM circuit has a shorter critical path due to its lower switching threshold, so it gives a 96.34% improvement in delay. However, our circuit consumes more power than those of [1] and [2] because of logic transitions in the dynamic cell. For logic 0, the power dissipated changes only slightly, whereas for logic 1 a significant difference can be observed. The voltage-versus-time simulations in Microwind 2 show the performance of the 8T, 11T, 13T and ZA circuits for a logic 1 control input with write logic WL = 1 and bit logic BL = 0. The simulation diagrams show the output obtained for an input voltage of 0.7 V, together with the propagation delay in terms of rise time tr and fall time tf. The rise and fall times are small, which keeps the propagation delay to a minimum. The output voltage almost reaches VDD. This indicates that our proposed circuit performs better than other existing circuits.
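The quoted delay improvement follows from the two delays listed in Table III: 37.2 ps for the proposed circuit and 1018.3 ps for the comparison circuit. The calculation is a plain fractional reduction.

```python
def relative_improvement(new: float, reference: float) -> float:
    """Fractional reduction of `new` with respect to `reference`."""
    return 1.0 - new / reference


if __name__ == "__main__":
    delay_proposed = 37.2e-12     # s, Table III
    delay_reference = 1018.3e-12  # s, Table III
    print(f"{relative_improvement(delay_proposed, delay_reference):.2%}")
```

This prints 96.35%, consistent with the quoted 96.34% up to rounding.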
Fig. 4(a) 8T
Fig. 4(b) 11T
Fig. 4(c) 13T
Fig. 4(d) ZA
Fig. 4(a), (b), (c), (d): Voltage vs. time for the 8T, 11T, 13T and ZA circuits, respectively.

CONCLUSION
This design has been implemented for low-voltage, low-power applications, aiming to reduce the power consumption of the SRAM cell circuit. The simulation output does not fully give the expected results, due to the many problems faced during the simulation process. This shows that all aspects stated in the manual must be followed as guidelines during the design process in order to design a better circuit. Following the dynamic design concept, this SRAM cell achieved lower power consumption, lower delay, higher speed and higher throughput than existing SRAM cells.

REFERENCES
[1] Arun Ramnath Ramani and Ken Choi, "A Novel 9T SRAM Design in Sub-Threshold Region," International IEEE Conference on Electro/Information Technology (EIT), 2011.
[2] Yashwant Singh and D. Boolchandani, "SRAM Design for Nanoscale Technology with Dynamic Vth and Dynamic Standby Voltage for Leakage Reduction," International IEEE Conference on Signal Processing and Communication, 2013.
[3] Rajiv V. Joshi, Rouwaida Kanj, and Vinod Ramadurai, "A Novel Column-Decoupled 8T Cell for Low-Power Differential and Domino-Based SRAM Design," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 5, May 2011.
[4] Takahiko Saito, Hitoshi Okamura, Hiromasa Yamamoto, and Kazuyuki Nakamura, "A Ratio-less 10-Transistor Cell and Static Column Retention Loop Structure for Fully Digital SRAM Design," IEEE International Memory Workshop (IMW), 2012.
[5] Kumkum Verma, Sanjay Kumar Jaiswal, Dheeraj Jain, and Vijendra Maurya, "Design & Analysis of 1-Kb 6T SRAM Using Different Architecture," Fourth International Conference on Computational Intelligence and Communication Networks, 2012.
[6] Xuebei Yang and Kartik Mohanram, "Robust 6T Si tunneling transistor SRAM design," Design, Automation and Test in Europe (DATE), IEEE Conference, 2011.

Comparative Study of FinFETs versus 22nm Bulk CMOS technologies: SRAM Design Perspective
Hooman Farkhani¹, Ali Peiravi¹, Jens Madsen Kargaard², Farshad Moradi²
¹Ferdowsi University of Mashhad, Iran
²Integrated Circuits and Electronics Lab, Aarhus University, Denmark

Abstract: In this paper, FinFET devices are compared to bulk CMOS technology by examining the characteristics of both devices and their challenges in nanoscale regimes. The effects of process variations on these devices, along with the effect of device parameters on their characteristics, are explored. Both FinFET and CMOS devices are used in 6T and 8T SRAM cells. Simulation results show significant improvements for FinFET-based SRAMs compared to bulk CMOS-based SRAM cells. Compared to its CMOS counterpart, the FinFET-based 6T SRAM cell shows:

- a 39% improvement in read static noise margin;
- a 54% higher write margin;
- a 54% lower minimum applicable supply voltage;
- 7.3X less leakage power.

Compared to the bulk CMOS 8T SRAM, the FinFET 8T SRAM improves read static noise margin by 7%, write margin by 64%, minimum supply voltage by 50% and leakage power consumption by 3.1X.
Keywords: FinFET; SRAM; CMOS

I. INTRODUCTION

Bulk CMOS technologies have been the cornerstone of semiconductor devices for years. Moore's law motivates technology scaling to improve features such as speed, power consumption and area. Circuits and systems take advantage of the inevitable technology scaling. However, undesired effects such as short channel effects (SCEs) and sensitivity to process variations have increased [1]-[3]. The SCEs include limits on electron drift characteristics in the channel and threshold voltage variation. Together with ION/IOFF reduction and increased leakage current, these effects have made the use of bulk CMOS transistors in sub-22 nm technologies impossible. ION/IOFF reduction causes instability and limits subthreshold circuit design, while the increased leakage current raises static power consumption.

SCEs can be reduced by using a thinner gate oxide. However, this leads to an exponentially higher gate leakage current due to tunneling, which increases total power consumption and reduces device reliability. Thus, new transistor structures have been proposed to overcome the SCEs [4]-[8]. Among the different proposed devices, the FinFET is one of the best candidates to overcome the restrictions of bulk CMOS technologies toward deep nanoscale technologies. What makes the FinFET different from the bulk CMOS transistor is the thin silicon fin that acts as the channel and conducts carriers between source and drain. Fig. 1 shows the different structures of FinFET. As shown, the gate surrounds the channel on three sides, which results in superior control of the channel. Furthermore, the fully depleted channel decreases the SCEs and eliminates the random dopant fluctuation (RDF) effect, which makes the device less sensitive to process variations [9].

Fig. 1. Different structures of FinFET [8]

One class of circuits critically affected by scaling issues is static random access memory (SRAM) [10]. SRAM structures use minimum-size transistors to minimize the area overhead, which increases their sensitivity to process variations. Besides, since most cells in an SRAM array are inactive, leakage currents contribute a large part of the total power consumption.
In this paper, we compare different features of bulk CMOS transistors and FinFETs. We then explore how using these two types of transistors affects the reliability and power consumption of SRAM structures. The rest of this article is arranged as follows. In Section II, the features of the bulk CMOS transistor are compared with the FinFET. In Section III, the reliability and power consumption of 6T and 8T SRAM cells using FinFET and bulk CMOS in 22 nm technology are explored. Finally, the results are summarized in Section IV.

978-1-4799-3378-5/14/$31.00 2014 IEEE

II. BULK CMOS VERSUS FINFET

[Fig. 2 plot: ID versus VDS (V) for VGS = 0.4V to 0.9V at Temp = 25°C; panel (a) FinFET, panel (b) bulk MOSFET]
Fig. 2. The I-V characteristic of (a) FinFET and (b) bulk CMOS transistors

In this section, different features of a bulk CMOS transistor in 22nm technology are compared with those of a FinFET transistor in 20nm technology. A channel length of 24nm is used for both transistors. The width of the bulk CMOS transistor is 71nm. The fin height and fin thickness of the FinFET are 28nm and 15nm, respectively, which makes its effective width equal to that of the bulk CMOS transistor. Table I lists the characteristics of the FinFET transistor used. All simulations have been performed in HSPICE using Predictive Technology Models (PTMs) [11].
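The width matching above follows from the usual tri-gate approximation. The gate covers both fin sidewalls and the fin top, so the effective width is taken as twice the fin height plus the fin thickness. The text does not state this formula explicitly, but it reproduces the 71nm figure:

$$
W_{\mathrm{eff}} \approx 2H_{\mathrm{fin}} + T_{\mathrm{fin}} = 2(28\,\mathrm{nm}) + 15\,\mathrm{nm} = 71\,\mathrm{nm} = W_{\mathrm{bulk}}
$$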

TABLE I. FINFET PARAMETERS [11]

Parameter            Value
Technology (nm)      20
Lg (nm)              24
EOT (nm)             1.1
Tfin (nm)            15
Hfin (nm)            28
NSD (m^-3)           3e26
VDD (V)              0.9
SS (mV/decade)       71
DIBL (mV/V)          58

A. I-V Characteristic
Fig. 2 shows ID versus VDS for bulk CMOS and FinFET transistors as VGS is stepped from 0.4V to 0.9V. Two features can be read from the strong inversion region: the level of ON current and the output resistance. The FinFET has a higher ON current and a higher output resistance (less channel length modulation). This is because the channel of the FinFET is surrounded on three sides by the gate, which gives better gate control.
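Subsection A compares output resistance only qualitatively. The sketch below shows how that quantity is typically read from an ID-VDS curve like those in Fig. 2. It estimates the small-signal output resistance r_o and the channel-length-modulation parameter λ of the simple model ID = ID0(1 + λVDS) from two saturation-region points. This is a hedged illustration only: the function names and the sample currents are hypothetical and are not values read from Fig. 2.

```python
def output_resistance(vds1: float, id1: float, vds2: float, id2: float) -> float:
    """Estimate r_o = dVDS/dID (ohms) from two saturation points of one ID-VDS curve."""
    if vds1 == vds2:
        raise ValueError("need two distinct VDS points")
    delta_id = id2 - id1
    if delta_id == 0:
        return float("inf")  # perfectly flat curve: no channel length modulation
    return (vds2 - vds1) / delta_id


def clm_lambda(vds1: float, id1: float, vds2: float, id2: float) -> float:
    """Solve ID = ID0 * (1 + lam * VDS) for lam (1/V) through two saturation points."""
    denom = id1 * vds2 - id2 * vds1
    if denom == 0:
        raise ValueError("points are not consistent with the linear CLM model")
    return (id2 - id1) / denom


# Hypothetical saturation points (VDS in V, ID in A); not read from Fig. 2.
r_flat = output_resistance(0.5, 20.0e-6, 0.9, 20.4e-6)   # 2% current rise over 0.4 V
r_steep = output_resistance(0.5, 20.0e-6, 0.9, 22.0e-6)  # 10% current rise over 0.4 V
print(f"r_o flat curve:  {r_flat / 1e3:.0f} kOhm")
print(f"r_o steep curve: {r_steep / 1e3:.0f} kOhm")
```

The flatter curve gives the larger r_o. This is the sense in which a higher output resistance corresponds to weaker channel length modulation.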

B. Drain Induced Barrier Lowering
Fig. 4 shows how the drain current of bulk CMOS and FinFET transistors changes with gate-source voltage at VDS=0.1V and VDS=1.1V. The drain induced barrier lowering (DIBL) of the bulk CMOS transistor is 124mV/V, much higher than that of the FinFET (58mV/V). This indicates a smaller threshold voltage shift due to short channel effects in the FinFET. The figure also shows that the FinFET has a lower threshold voltage (0.36V) than bulk CMOS (0.55V), which is one of the reasons behind its higher ION/IOFF ratio.
Fig. 3 shows the ION/IOFF ratio versus supply voltage for both devices. As illustrated, the ION/IOFF ratio is higher for the FinFET at low supply voltages, while it is higher for bulk CMOS at high supply voltages (above 0.72V). This is because bulk CMOS has a lower IOFF than the FinFET, whereas the FinFET has a higher ION. At low supply voltages, the OFF current of bulk CMOS is lower than, but close to, that of the FinFET, while the ON current of the FinFET is much higher than that of bulk CMOS. As a result, the ION/IOFF ratio is higher for the FinFET. At high supply voltages (above 0.72V), however, the ON current of bulk CMOS approaches that of the FinFET, and the ION/IOFF ratios of the two devices become close to each other.

Fig. 3. ION/IOFF ratio versus supply voltage for FinFET and bulk CMOS transistors
Fig. 4. Drain current versus gate-source voltage for FinFET and bulk CMOS while VDS is 0.1V and 1.1V
Fig. 5. Subthreshold swing versus temperature for FinFET and bulk CMOS
Fig. 7. Drain current versus gate-source voltage, while drain voltage is VDD, for channel lengths from 24nm to 54nm

C. Subthreshold Swing
Fig. 4 also shows that the subthreshold swing (SS) of the FinFET is 21% lower than that of the bulk CMOS transistor at room temperature. In other words, the drain current of the FinFET depends more strongly on VGS. Since drain current changes exponentially with VGS in the subthreshold region [12], a lower SS means that the FinFET drain current rises faster with VGS than that of bulk CMOS.
Fig. 5 shows the effect of temperature on SS. Temperature is varied from -40°C to 125°C and SS is calculated for both devices. As illustrated, SS increases linearly with temperature for both devices. However, SS rises faster for bulk CMOS than for the FinFET, which shows that SS is more sensitive to temperature in bulk CMOS.
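Both DIBL (subsection B) and SS (subsection C) are read off transfer curves like those in Fig. 4. The sketch below shows one common way to compute them. DIBL is taken as the threshold-voltage drop per volt of VDS between the 0.1V and 1.1V bias points, and SS as the VGS change per decade of subthreshold current. The `ideal_ss` helper uses the textbook relation SS = n(kT/q)ln10, which is linear in absolute temperature and therefore consistent with the linear trend described for Fig. 5. The function names, the two-point finite-difference extraction, and the sample inputs are illustrative assumptions, not the authors' extraction procedure.

```python
import math

BOLTZMANN_OVER_Q = 8.617333262e-5  # k/q in V/K


def dibl_mv_per_v(vth_low_vds: float, vth_high_vds: float,
                  vds_low: float = 0.1, vds_high: float = 1.1) -> float:
    """DIBL in mV/V: threshold-voltage drop per volt of VDS (bias points of Fig. 4)."""
    return 1000.0 * (vth_low_vds - vth_high_vds) / (vds_high - vds_low)


def subthreshold_swing(vgs1: float, id1: float, vgs2: float, id2: float) -> float:
    """SS in mV/decade: VGS change needed for a tenfold change in subthreshold ID."""
    decades = math.log10(id2) - math.log10(id1)
    if decades == 0:
        raise ValueError("drain currents must differ")
    return 1000.0 * (vgs2 - vgs1) / decades


def ideal_ss(temp_c: float, n: float = 1.0) -> float:
    """Textbook SS = n * (kT/q) * ln(10) in mV/decade; n >= 1 is the body factor."""
    return 1000.0 * n * BOLTZMANN_OVER_Q * (temp_c + 273.15) * math.log(10)


# Hypothetical inputs for illustration; not read from Fig. 4 or Fig. 5.
print(round(dibl_mv_per_v(0.55, 0.426), 1))                   # Vth falls 124 mV over 1 V of VDS
print(round(subthreshold_swing(0.10, 1e-9, 0.171, 1e-8), 1))  # one decade of ID over 71 mV
for t in (-40, 25, 125):
    print(t, round(ideal_ss(t), 1))
```

Suppose the 71 mV/decade listed in Table I is the room-temperature FinFET value behind the 21% comparison. The bulk device's SS would then be about 71/0.79 ≈ 90 mV/decade.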

D. Gate Induced Drain Leakage
Gate leakage is one of the most important concerns in nano-scale bulk CMOS transistors. To calculate the gate induced drain leakage (GIDL), VGS has to be swept from negative to positive voltages, as shown in Fig. 6. As the figure shows, the FinFET behaves differently from bulk CMOS for negative VGS and exhibits better GIDL. Applying a negative VGS decreases the subthreshold current in both devices. In bulk CMOS, however, the current stays constant for VGS < -0.1V and starts to increase rapidly for VGS < -0.3V, which is attributed to increased gate leakage.
Fig. 6. Drain current versus gate-source voltage while drain voltage is VDD

E. Channel Length Effect
Fig. 7 shows drain current versus VGS for channel lengths (LCH) from 24nm to 54nm. As LCH increases, IOFF decreases faster for the FinFET than for bulk CMOS. Meanwhile, ION stays roughly constant for the FinFET but decreases for bulk CMOS. Longer channels can therefore be used to reduce leakage power consumption, at the cost of increased area and lower ION. Fig. 7 also shows that the threshold voltage of the FinFET depends less on LCH than that of bulk CMOS. This is highlighted in Fig. 8, where the threshold voltage versus LCH is shown for both devices. Increasing LCH from 24nm to 54nm raises the threshold voltage by 14% for bulk CMOS and by 7% for the FinFET. This confirms that FinFET characteristics are less sensitive to process variation.
Fig. 8. Threshold voltage versus channel length for bulk CMOS and FinFET

F. Silicon Thickness Variation
Fig. 9 shows the variation of the ION/IOFF ratio as the silicon thickness (TSI) varies from 7nm to 15nm. As illustrated, the ION/IOFF ratio increases dramatically as TSI decreases. For the FinFET, IOFF decreases by 50% with a 1nm reduction in TSI, while ION degrades by only 1.5%. This feature can be used to reduce the leakage current of FinFETs. However, each technology has a minimum applicable thickness due to physical stability issues.
Fig. 9. ION/IOFF ratio for different silicon thickness
Fig. 10. ION and IOFF versus fin height

[Fig. 11 plot: ID (log scale, 10f to 10u) versus VGS (V) for FinFET and bulk MOSFET at T = -40°C to 125°C]
Fig. 11. Drain current versus gate-source voltage for different temperatures for FinFET and bulk CMOS

G. Fin Height Variation
Fig. 10 shows ION and IOFF versus fin height (HFIN). As illustrated, an 11% increase in HFIN raises ION by 6% and IOFF by 29%. Therefore, increasing the fin height is not a good way to achieve a higher ION in low power applications, because a taller fin increases IOFF more than ION. On the other hand, increasing HFIN in a FinFET costs very little area compared with increasing the width of a bulk CMOS transistor. The only area penalty is that a taller fin needs a greater thickness for physical stability. Increasing the ON current through the fin height therefore causes a negligible area penalty compared with widening a bulk CMOS transistor. This feature is very important in high-density applications such as SRAMs, which use a very large number of bit cells, so the area occupied by each bit cell has to be kept to a minimum.
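The fin-height and silicon-thickness results in subsections F and G are both statements about how ION/IOFF moves when ION and IOFF change by different percentages. The short sketch below converts the quoted percentages into the resulting multiplicative change in ION/IOFF. The helper name is hypothetical, and the only inputs are the percentages stated in the text.

```python
def on_off_ratio_factor(ion_change_pct: float, ioff_change_pct: float) -> float:
    """Multiplicative change in ION/IOFF given percentage changes in ION and IOFF.

    A +6% ION and +29% IOFF change is passed as (6.0, 29.0);
    a 1.5% ION degradation with a 50% IOFF reduction as (-1.5, -50.0).
    """
    ion_scale = 1.0 + ion_change_pct / 100.0
    ioff_scale = 1.0 + ioff_change_pct / 100.0
    if ioff_scale <= 0:
        raise ValueError("IOFF cannot shrink by 100% or more")
    return ion_scale / ioff_scale


# Subsection G: an 11% taller fin gives +6% ION and +29% IOFF.
fin_height_factor = on_off_ratio_factor(6.0, 29.0)
# Subsection F: a 1 nm thinner fin gives -1.5% ION and -50% IOFF.
tsi_factor = on_off_ratio_factor(-1.5, -50.0)

print(f"HFIN +11%: ION/IOFF x {fin_height_factor:.3f}")  # prints x 0.822
print(f"TSI -1 nm: ION/IOFF x {tsi_factor:.3f}")         # prints x 1.970
```

The taller fin lowers ION/IOFF by roughly 18%, while the thinner fin nearly doubles it. This quantifies why subsection G calls fin height a poor lever for low-power designs and subsection F calls thickness a useful one.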

H. Temperature
Fig. 11 shows ID versus VGS for temperatures (T) from -40°C to 125°C. As shown, T variation in bulk CMOS changes both the performance (ION) and the leakage power consumption (IOFF), whereas in the FinFET it changes only IOFF. However, the OFF current variation in the FinFET is more severe. Temperature variation also affects the threshold voltage. Increasing the temperature from -40°C to 125°C decreases the threshold voltage by 10% for bulk CMOS and by 16% for the FinFET, which shows a stronger temperature dependence of the threshold voltage in the FinFET.

III. 6T AND 8T SRAM CELL

Several SRAM cells have been proposed to reach different design goals such as low power consumption, high density, lower supply voltage, and high reliability [13]-[14]. Among them, 6T and 8T SRAM cells are the most commonly used. Fig. 12 shows the 6T and 8T SRAM cell structures. As shown, the 6T SRAM cell (a) consists of two back-to-back inverters (PU1-PD1 and PU2-PD2) that hold the data and its inverse in nodes Q and QB, respectively. The access transistors (AC1-AC2) are used to perform read and write operations. Because read and write operations share a common path, the transistor strengths in the 6T SRAM cell involve design trade-offs. During a read, pull-down transistors (PD1-PD2) that are stronger than the access transistors (AC1-AC2) increase the reliability of the cell. Conversely, during a write, access transistors that are stronger than the pull-down and pull-up transistors ease the operation. In hold mode, equal strength of the pull-up and pull-down transistors ensures the highest reliability.
Fig. 12. (a) 6T and (b) 8T SRAM cells [15]

Because the 6T SRAM is highly sensitive to process variation and has a low noise margin, the 8T SRAM cell, which has separate lines for read and write, is used [15]-[17]. Stronger access transistors are required to improve the write margin, while weaker access transistors are needed to improve the read margin. Separating the read and write paths resolves this conflict and removes the trade-off between read and write cycles. Fig. 12(b) shows the 8T SRAM cell structure. It consists of a 6T SRAM cell together with a read circuit (transistors R1-R2 and the RBL line). The write operation is performed through the BL and BLB lines via the access transistors (AC1-AC2). Transistors R1 and R2 transfer the data stored in node Q onto the RBL line during a read operation. In this way, the structure removes the trade-offs between read and write.

Since most SRAM cells are in standby mode, leakage current is one of the main concerns in SRAM design. In addition, minimum-width transistors are used to reach maximum density in an SRAM memory. Minimum-size transistors, together with the precise sizing ratios the cell requires, make SRAM cell reliability very sensitive to process variations. In this section, the effect of using FinFET and bulk CMOS transistors on the power consumption and robustness of 6T and 8T SRAM cells is explored. Process variation due to RDF is included in the RSNM, WM, and supply voltage scalability simulations. The values used are σ(VTH) = 25mV and 28mV for n-channel and p-channel FinFETs, and σ(VTH) = 45mV and 49mV for bulk NMOS and PMOS [18]. A Monte Carlo analysis with 1000 iterations is performed. All simulations use 25°C and VDD=0.9V, and they are run in HSPICE with the PTM 22nm LP and 20nm LSTP models for bulk CMOS and FinFET transistors, respectively.
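The Monte Carlo setup above amounts to drawing an independent threshold-voltage shift for every transistor in each of the 1000 iterations. The sketch below shows that sampling step with Python's standard `random` module, under the following assumptions. Shifts are zero-mean Gaussians with the σ(VTH) values quoted above. The function name, the seed, and the dictionary layout are illustrative. The transistor names follow Fig. 12(a). The circuit evaluation itself (HSPICE in the paper) is not reproduced; each sampled set would be applied to one simulation run.

```python
import random
import statistics

# sigma(VTH) in volts, as quoted above, keyed by (device type, channel polarity).
SIGMA_VTH = {
    ("finfet", "n"): 0.025,
    ("finfet", "p"): 0.028,
    ("bulk", "n"): 0.045,
    ("bulk", "p"): 0.049,
}

# Transistors of the 6T cell in Fig. 12(a) and their polarity.
CELL_6T = {"PU1": "p", "PU2": "p", "PD1": "n", "PD2": "n", "AC1": "n", "AC2": "n"}


def sample_vth_shifts(device: str, iterations: int = 1000, seed: int = 1):
    """Return one dict per Monte Carlo iteration mapping transistor -> delta VTH (V).

    Shifts are independent, zero-mean Gaussians (an assumption of this sketch).
    """
    rng = random.Random(seed)  # private generator so runs are reproducible
    samples = []
    for _ in range(iterations):
        samples.append({
            name: rng.gauss(0.0, SIGMA_VTH[(device, polarity)])
            for name, polarity in CELL_6T.items()
        })
    return samples


runs = sample_vth_shifts("finfet")
pd1 = [run["PD1"] for run in runs]
print(len(runs), round(statistics.stdev(pd1), 4))
```

Drawing every transistor independently models uncorrelated local mismatch, which is the RDF effect the section targets. Global (die-to-die) variation would need an additional shared shift per iteration.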

FinFET structure.
3) Supply Voltage Scalability: To evaluate supply voltage scalability for low-voltage applications such as biomedical applications, a Monte Carlo analysis is performed while VDD is swept from 0.95V to 0.2V, and RSNM is calculated. As expected, RSNM decreases with VDD scaling. Assuming that the minimum allowable RSNM is 15% of VDD, the minimum operational supply voltage is 0.3V for the FinFET and 0.65V for bulk CMOS. This shows that the FinFET is a better candidate for low-voltage applications.
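The minimum-VDD criterion above (RSNM of at least 15% of VDD) can be applied mechanically to a VDD sweep. In the sketch below, the function name is hypothetical and the sweep values are made up purely for illustration; they are not the paper's RSNM results. The sketch also assumes that the 1000 Monte Carlo runs at each VDD have already been reduced to a single RSNM value. The text does not say whether that value is the mean, the worst case, or the mean minus a multiple of σ.

```python
from typing import Iterable, Optional, Tuple


def min_operating_vdd(sweep: Iterable[Tuple[float, float]],
                      fraction: float = 0.15) -> Optional[float]:
    """Lowest swept VDD from which RSNM >= fraction * VDD holds at every higher VDD.

    `sweep` holds (vdd, rsnm) pairs in volts. Returns None if even the
    highest swept VDD fails the criterion.
    """
    lowest_ok = None
    for vdd, rsnm in sorted(sweep, reverse=True):  # walk down from the highest VDD
        if rsnm < fraction * vdd:
            break  # first failing point ends the usable range
        lowest_ok = vdd
    return lowest_ok


# Made-up sweep for illustration only; these are NOT the paper's RSNM values.
example_sweep = [(0.2, 0.02), (0.9, 0.20), (0.4, 0.07), (0.6, 0.12), (0.3, 0.05)]
print(min_operating_vdd(example_sweep))  # prints 0.3
```

The sketch walks down from the highest VDD and stops at the first failure, so it reports a contiguous operating range. A plain minimum over all passing points would instead accept an isolated low-VDD point lying below a failing one.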

A. 6T SRAM cell
1) RSNM: The read static noise margin (RSNM) is a metric of the read stability of an SRAM cell. It is defined as the side length of the largest square that fits inside the lobes of the butterfly curve. The butterfly curve is obtained by plotting the two inverter characteristics, one of them mirrored, while the access transistors are ON and the bitlines are precharged to VDD [19]. The transistor sizes are shown in Fig. 12. The parameter X is the W/L ratio for bulk CMOS transistors, which is
71nm/24nm (e.g., the size o