

4, APRIL 2017 1193

10T SRAM Using Half-VDD Precharge and Row-Wise Dynamically Powered Read Port for Low Switching Power and Ultralow RBL Leakage
Naeem Maroof, Member, IEEE, and Bai-Sun Kong, Member, IEEE

Abstract— We present, in this paper, a new 10T static random access memory cell having a single-ended decoupled read-bitline (RBL) with a 4T read port for low power operation and leakage reduction. The RBL is precharged at half the cell’s supply voltage, and is allowed to charge and discharge according to the stored data bit. An inverter, driven by the complementary data node (QB), connects the RBL to the virtual power rails through a transmission gate during the read operation. RBL increases toward the VDD level for a read-1, and discharges toward the ground level for a read-0. Virtual power rails have the same value as the RBL precharging level during the write and the hold mode, and are connected to true supply levels only during the read operation. Dynamic control of virtual rails substantially reduces the RBL leakage. The proposed 10T cell in a commercial 65 nm technology is 2.47× the size of 6T with β = 2, provides 2.3× read static noise margin, and reduces the read power dissipation by 50% compared with that of 6T. The value of RBL leakage is reduced by more than 3 orders of magnitude and (ION/IOFF) is greatly improved compared with the 6T BL leakage. The overall leakage characteristics of 6T and 10T are similar, and competitive performance is achieved.

Index Terms— 10T, charge recycling, leakage reduction, low power, precharging, single ended (SE) read bitline (RBL), static random access memory (SRAM), virtual rails.

Manuscript received August 30, 2016; revised November 6, 2016; accepted December 6, 2016. Date of publication December 29, 2016; date of current version March 20, 2017. This work was supported in part by the Basic Research Program through the National Research Foundation of Korea funded by the Ministry of Education under Grant NRF-2016R1D1A1B03933605, and in part by the Industrial Strategic Technology Development Program (10052653) funded by the Ministry of Trade, Industry & Energy, Korea. Design tools were supported by IDEC, KAIST.
The authors are with the Department of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, South Korea.
Color versions of one or more of the figures in this paper are available online.
Digital Object Identifier 10.1109/TVLSI.2016.2637918
1063-8210 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

I. INTRODUCTION

POWER dissipation has become a first class design constraint [1], [2], as we have hit the utilization wall, and low power circuit, architecture, and system level techniques are sought out [3], [4]. In addition, the static random access memory (SRAM) is the most important digital macro and its portion on a system-on-chip (SoC) is ever-increasing [5]. Decreasing the power dissipation of SRAM will not only lower the overall system power dissipation, but will also increase the yield and improve the SoC reliability. Although the six transistor (6T) SRAM cell is a widely used standard in industry, it has its own limitations. 6T SRAM not only has conflicting read and write requirements, it also has read static noise margin (RSNM) degradation. The most important factors to consider in the design of SRAM in modern nanometer technologies are: 1) read stability; 2) write stability; 3) cell supply reduction; 4) power dissipation; 5) leakage currents; 6) bitline (BL) ION to IOFF ratio; and 7) variability [6]. With increasing process variations, achieving a specific yield is getting difficult, and novel designs and techniques, including read and write assist circuits, are adopted at the cost of area, power dissipation, or speed to improve the read/write stability and increase the number of cells in a single column [7].

Reduction of the supply voltage is the most straightforward technique to reduce the active power dissipation. However, the 6T SRAM power supply cannot be reduced aggressively due to its RSNM degradation. Many SRAM cells have been proposed that improve RSNM, including single ended (SE) 8T [8], 9T [9]–[11], 10T [12], [13] and differential 7T [14], 8T [15], 9T [16], [17], and 10T [18]. Also, numerous SRAM assist techniques have been described in the literature as a cost-effective method to increase the write margin, and lower the leakage power dissipation compared to bitcell transistor upsizing or operating the memory array at a higher supply voltage [19]–[21]. A 10T cell in [22] uses virtual ground rail for read port to achieve lower BL leakage and differential, while Kanda et al. [23] used row-by-row dynamic control of cell supply voltage and negative wordline voltage for 2 orders of magnitude reduction in leakage currents.

In this paper, we present our half VDD precharge and charge recycling technique for low power read operation. A 4T read port is designed to employ the proposed technique. The read BL (RBL) is charged and discharged through the read port according to the state of the stored bit. The read port is powered by virtual power rails that run horizontally and are shared by the cells of a word. The dynamic control of read port power rails reduces the RBL leakage substantially. The rest of this paper is organized as follows. In Section II, we review the conventional and the state-of-the-art cell designs. Section III presents the proposed cell and its associated scheme. The

postlayout results are gathered in Section IV, and Section V presents some important discussion. Finally, a brief conclusion is presented in Section VI.

II. CONVENTIONAL AND THE STATE-OF-THE-ART SRAM DESIGNS

An SRAM cell must operate robustly in the hold, read, and write modes. An SRAM cell uses the positive feedback of cross-coupled inverters (INVs) to store a single bit of information in a complementary fashion. Access transistors provide the mechanism for the read and write operation. Before every access, the column BL pair (BL and BLB) is precharged to the supply voltage. For the write operation, one of the precharged BLs is discharged through the write driver.

Fig. 1(a) shows a single column of M 6T SRAM cells, where one cell is accessed in read mode with data = 0 (Qa = 0), while the other M − 1 cells are in the hold mode. Leakage components are labeled, and for the worst case leakage, all M − 1 cells store data = 1 (Qu = 1). Iread flows from BL to VSS through AL and NL of the accessed cell, and the BL voltage is decreased. The unaccessed cell on the BL exhibits BL leakage. IuLeak0 is the main component of BL leakage while IuLeak1 is negligible, as VDS of AR of the unaccessed cell is large, while VDS of its AL is very small (varies from 0 to ΔVBL). These leakage components decrease the differential BL voltage development. As there are a large number of cells in a single column, the worst case BL leakage can decrease the BLB voltage enough to cause an erroneous read. Thus, Iread must be greater than (M − 1) × IuLeak0, where M is the number of cells in a single column.

Fig. 1. Conventional 6T SRAM read. (a) Column of M bit-cells during read. (b) Top: hold and read SNM butterfly curve (with worst case noise polarity during hold). Bottom: transient behavior showing read disturbance.

During read operation, the internal node of the 6T cell storing a zero (Qa) lies in the read current path and its voltage increases during the read operation. This increase in voltage (ΔV) is dependent upon transistor sizing. For a successful read operation, the β ratio, defined as (W/L)_N / (W/L)_A, must be larger than 1 (typically 2 to 3) [5]. The vulnerability of the internal nodes of an SRAM cell is captured through the metrics HSNM, RSNM, and WNM/write trip point (WTP) during hold, read, and write mode, respectively. Fig. 1(b) shows the HSNM and RSNM butterfly curve (top) [24] and the internal state disturbance of node QB for a slow rising WL signal (bottom). The increase in node Qa voltage not only decreases the cell stability, but also increases the short-circuit current from VDD to VSS and allows a higher leakage current to pass from BLB (IaLeak1). This decreases the differential BL voltage (ΔVBL) and requires an increase in WL pulse duration. A wider read pulse can cause dynamic instability and increases the power dissipation [25]. In addition, for a successful write operation, access transistors of the 6T must be strong enough to overpower the pull-up pMOS transistors. Thus, the γ ratio, defined as (W/L)_P / (W/L)_A, must be smaller than 1. However, a stronger pMOS is beneficial for read operation (to decrease ΔV), and a weaker pull-down nMOS is beneficial for write operation (to let access transistors have more strength for injecting current). Thus, proper sizing is required for specific conditions and application.

The conventional 6T SRAM has two BLs (BL and BLB), and for each read operation one of the BLs is decreased. We can model 6T SRAM by a single BL with an activity factor of 1. Thus, the dynamic read power dissipation of 6T is given as

$$P_{d6T} = N \times C_{BL} \times V_{DD} \times \Delta V_{BL} \times f \tag{1}$$

where N is the number of cells connected in a wordline (known as word size), CBL is the total switching BL capacitance, VDD is the precharging supply voltage, and f is the frequency of operation. CBL is not decreasing gracefully with technology, and a minimum of around 100 mV of ΔVBL is required for proper sensing considering noise, leakage, and the process parameter fluctuations. Leakage power dissipation is highly data dependent. As the subthreshold leakage current is a dominant part of the overall transistor leakage, the worst case leakage current of 6T SRAM during the hold can be estimated as

$$I_{l6T} = M \times N \times I_0 \times \frac{W}{L} \times e^{\frac{-V_{th}}{\eta v_t}} \times \left(1 - e^{\frac{-V_{DD}}{v_t}}\right) \tag{2}$$

where M × N is the size of the SRAM array, Vth is the threshold voltage of the access transistor, vt is the thermal voltage (≈26 mV at 27 °C), I0 is the technology-dependent coefficient and is the leakage current at VGS = Vth of a minimum sized transistor, and η is the technology-dependent subthreshold factor [26]. In Fig. 1(a), AR of the unaccessed cell is OFF (having VGS = 0 V), while its VDS = VDD, which exhibits large leakage.

In essence, 6T SRAM has conflicting read and write requirements [5] and transistor sizing cannot be done independently. Also, 6T has an inherent RSNM problem as the read current passes through the cell internal node [27], and it further degrades with VDD scaling [28]. Also, being considered as the baseline design, 6T has overall a higher power dissipation and higher BL leakages, as the low power techniques employ a certain mechanism to lower the dynamic power dissipation, e.g., charge sharing [29], [30] and hierarchical BL [31], and the leakages (by employing virtual rails) [22], [23]. The read port of the 6T SRAM cell is shown in Fig. 2(a), which highlights the internal node Q in the read current path. Many alternative bitcells and techniques have been proposed in the literature to improve SRAM cell stability, reduce the leakage currents, and achieve low power operation compared with the conventional 6T design.

Fig. 2. SRAM read ports (a) 6T. (b) 8T. (c) 9T [11]. (d) 9T [10]. (e) 10T [12].

An 8T SRAM cell adds a separate 2T read port, shown in Fig. 2(b), and thereby solves the problem of read stability. Internal nodes are isolated from the read current path, and thus a high RSNM is achieved. Also, sizing of the 8T read port can be done independently without affecting the write operation.

In 6T SRAM read operation, one of the BLs stays at VDD while the other decreases by ΔVBL. However, in the case of 8T SRAM, there is only one BL (RBL) and it either decreases or stays at the VDD level depending on the bit read. Now, the sensing of an SE BL can be done using different circuits such as: 1) domino sensing that requires full VDD swing on the local-BL; 2) pseudo-differential sensing that requires a reference signal; and 3) ac coupled sensing that requires the use of capacitors [32]. Using a reference-based sense amplifier, only a small voltage difference is required. In the case of 6T, one of the BLs always serves as a reference, while in the case of 8T, the reference is set at VDD − ΔVBL. Thus, for a read-1, RBL stays at VDD and Vref − VRBL = −ΔVBL is sensed. For a read-0, RBL must decrease to produce +ΔVBL of the sensing margin, which means that now RBL must decrease by 2 × ΔVBL, such that Vref − VRBL = VDD − ΔVBL − (VDD − 2 × ΔVBL) = +ΔVBL.

Now, for the sensing of the proposed cell, Vref is set at VDD/2, and for each read operation only ΔVBL of the sensing margin is required because for each read operation, RBL of the LP10T either decreases or increases from VDD/2, depending on the data read.

Thus, though the 8T has an activity factor of 0.5, it still has the same dynamic power dissipation as 6T SRAM due to the higher differential voltage requirement. Also, the differential voltage needs to be developed in the same time (Tread) to provide similar performance. Thus, the read port of 8T is sized wider to provide higher Iread, which exhibits higher RBL leakages.

SE 9T SRAM [11] uses a 3T read port shown in Fig. 2(c). It effectively stacks M2 between M1 and M3 to reduce the RBL leakage. Write performance and dynamic power dissipation of this cell are the same as 8T; however, speed is degraded compared with the 6T and 8T cells due to the 3T read port.

Another state-of-the-art 9T SRAM cell [10] uses a 3T read port shown in Fig. 2(d). It provides the leakage current to the RBL through M3, to compensate the BL leakage when the RBL is to stay at a higher level but decreases due to the leakage currents (of unaccessed cells). The cell improves the sensing margin, and provides better performance due to the 2T (M1-M2) read current path. However, its dynamic power is the same as 6T, and overall static power is increased.
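The swing argument of this section (6T versus SE 8T, and the VDD/2 reference of the proposed cell) can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' simulation setup: the function names `read_power` and `sense_margins` and all numeric values are illustrative assumptions, and the power expression simply follows the form of (1) with an explicit activity factor.

```python
import math

def read_power(n_bits, c_bl, v_pre, swing, freq, activity):
    """Dynamic read power in the form of (1): activity x N x C_BL x V_pre x swing x f."""
    return activity * n_bits * c_bl * v_pre * swing * freq

def sense_margins(v_ref, rbl_read1, rbl_read0):
    """Return (Vref - VRBL) for a read-1 and for a read-0."""
    return v_ref - rbl_read1, v_ref - rbl_read0

# Illustrative values (assumptions, not taken from the paper's tables).
N = 8            # bits per word
C_BL = 100e-15   # bitline capacitance in farads
VDD = 1.0        # supply voltage in volts
DV = 0.1         # required sensing margin in volts
F = 1e9          # read frequency in hertz

# 6T: one of BL/BLB always falls by DV, so the activity factor is 1.
p_6t = read_power(N, C_BL, VDD, DV, F, activity=1.0)

# SE 8T: RBL falls only for a read-0 (activity 0.5) but must fall by 2 x DV.
p_8t = read_power(N, C_BL, VDD, 2 * DV, F, activity=0.5)

# 8T margins with Vref = VDD - DV: RBL stays at VDD (read-1) or drops by 2 x DV (read-0).
m8_read1, m8_read0 = sense_margins(VDD - DV, VDD, VDD - 2 * DV)

# Proposed cell: Vref = VDD/2 and RBL moves by +/- DV around VDD/2.
m10_read1, m10_read0 = sense_margins(VDD / 2, VDD / 2 + DV, VDD / 2 - DV)

print(f"6T read power: {p_6t * 1e6:.1f} uW, 8T read power: {p_8t * 1e6:.1f} uW")
print(f"8T margins: {m8_read1:+.2f} V / {m8_read0:+.2f} V")
print(f"VDD/2-referenced margins: {m10_read1:+.2f} V / {m10_read0:+.2f} V")
```

Under these assumed numbers the halved activity factor of the 8T is cancelled by its doubled swing, while a VDD/2-referenced bitline reaches the same ±ΔVBL margin with a single ΔVBL excursion in either direction.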

Fig. 3. Proposed 10T SRAM cell with row-wise read port dynamic power

A 10T cell proposed in [12] adds a 4T read port for an

SE read shown in Fig. 2(e). The cell was designed to provide
ultralow leakage for subthreshold operation. RBL leakage is
substantially reduced at the cost of area and performance.
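The leakage estimate (2) and the bitline constraint Iread > (M − 1) × IuLeak0 from earlier in this section can be sketched numerically as follows. This is only an illustration under stated assumptions: the device parameters `I0`, `ETA`, and `VTH` and the read current are hypothetical placeholders rather than values from the 65 nm process, and the helper names are invented for this example.

```python
import math

def subthreshold_leakage(i0, w_over_l, v_th, v_ds, eta, v_t=0.026):
    """Off-state (VGS = 0) subthreshold current of one transistor, in the form used by (2)."""
    return i0 * w_over_l * math.exp(-v_th / (eta * v_t)) * (1.0 - math.exp(-v_ds / v_t))

def array_hold_leakage(m_rows, n_cols, i_cell):
    """Worst-case hold estimate of (2): M x N cells, one dominant leaking device per cell."""
    return m_rows * n_cols * i_cell

def read_is_safe(i_read, m_rows, i_leak_unaccessed):
    """Bitline constraint from the text: Iread must exceed (M - 1) x IuLeak0."""
    return i_read > (m_rows - 1) * i_leak_unaccessed

# Hypothetical device parameters (assumptions for illustration only).
I0 = 1e-7        # current at VGS = Vth for a minimum-sized device, in amperes
W_OVER_L = 2.0   # 120 nm / 60 nm minimum-sized transistor
VTH = 0.4        # threshold voltage in volts
ETA = 1.5        # subthreshold factor
VDD = 1.0        # supply voltage in volts

i_cell = subthreshold_leakage(I0, W_OVER_L, VTH, VDD, ETA)
i_array = array_hold_leakage(128, 8, i_cell)   # 128 x 8 bit organization

print(f"per-cell leakage: {i_cell:.3e} A, array leakage: {i_array:.3e} A")
print("20 uA read current is safe for M = 128:", read_is_safe(20e-6, 128, i_cell))
```

The exponential dependence on Vth in `subthreshold_leakage` is why the column length M, rather than the read current alone, ends up bounding how many cells can share one bitline.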


A. 10T Cell
The proposed 10T SRAM cell with SE RBL is shown in Fig. 3. We have added a 4T read port to the 6T cell to decouple the internal nodes during the read operation. The read port consists of an INV P1-N1 driven by node QB, and a transmission gate (TG) P2-N2. The output (Z) of the INV is connected to RBL during the read operation through the TG, which is controlled by (read) control signals. Furthermore, the read port is powered by virtual power rails, VVDD and VVSS, which are dynamically controlled. These virtual power rails (control signals) run horizontally, and have the true rail values only during the read operation. For the RBL leakage reduction, both the virtual rails have the same level as the precharge level of RBL.

The 10T SRAM cell using an INV and a TG has been proposed earlier [33]. However, our proposed 10T scheme is different from the previous design in the following aspects.
1) The previous INV+TG-based 10T cell was application specific, while our proposed design is generic.
2) We have used the dynamically controlled power rails for the read port.
3) We precharge RBL at VDD/2, while the previous 10T design eliminated the precharge phase, and used the INV to fully charge or discharge the RBL.
4) The basic read technique of both the designs is completely different. The main idea of the proposed design is “the charging or the discharging of the read BL from VDD/2 for every read operation.” The previous design either discharges from VDD to VSS, or charges from VSS to VDD.
5) A powerful INV was used previously to produce full VDD swing on the RBL. In the proposed design, RBL is precharged at VDD/2, and only a small voltage difference (comparable with 6T) is produced for every read cycle.
6) In the proposed design, for every read cycle the RBL will exhibit some change (positive or negative) from its precharged value of VDD/2. However, in [33], the RBL would not change for consecutive similar bit reads. RBL would change only if consecutive read bits are different.

Fig. 4. Control signals. (a) Levels of control signals during read and otherwise. (b) Internal node Z is charged at VDD/2 during nonread mode to reduce the RBL leakage.

B. Precharging and Read Operation
The proposed 10T SRAM (hereafter referred to as LP10T) is precharged by the VP supply, which has a value half that of the supply voltage (i.e., VP = VDD/2). For the read operation, R goes high and RB goes low, and thus the TG is activated to connect RBL to the node Z. If QB is 0, then N1 is OFF and P1 connects node Z to VVDD, which is high for the read operation. Thus, the read current flows from VVDD (having value of VDD) to RBL (which has value of VDD/2) through P1-TG. Hence, the RBL voltage increases toward the VDD level. Now, for a read-0 operation (i.e., QB = 1), P1 is turned OFF and N1 connects node Z to VVSS, which is low (0 V) during the read operation. Thus, the read current flows from RBL (having value VDD/2) to GND through TG-N1, and hence the RBL voltage decreases toward 0 V. For an efficient read operation, we have used boosted read (R and RB) signals, which are 1.2× the nominal signal levels. This allows higher current to flow in both directions. Boosted signals are used to offset the performance degradation caused by the half-VDD swing available for each read operation under the proposed precharging scheme. The levels of control signals and the change in RBL voltage are shown in Fig. 4(a) during read, precharge, and hold mode.

8T and LP10T are both SE, and their output is sensed by a sense amplifier with a reference voltage. The increase and decrease of the RBL level of LP10T is differentiated by the sense amplifier referenced at the VDD/2 level. LP10T provides a voltage differential (ΔVBL) at RBL for both the read-0 and read-1, which relaxes the performance constraints compared with the SE 8T. Use of TG in LP10T improves the efficiency of

read-1 operation, as a single nMOS could not charge the RBL well through P1. Furthermore, sizing of the read port is important in terms of area and performance.

C. Dynamic Power
RBL is precharged by VP to the VDD/2 level. For the read-0 operation (i.e., QB = 1), RBL is discharged by ΔVBL through TG-N1. Thus for the next precharge, CBL × ΔVBL of charge is transferred to the RBL through the VP supply. As the value of VP is VDD/2, the dynamic power dissipation of an LP10T cell due to read-0 is given as

$$P_0 = \alpha_0 \times C_{BL} \times \frac{V_{DD}}{2} \times \Delta V_{BL} \times f \tag{3}$$

where α0 is the activity factor of read-0. Now, for a read-1 (i.e., QB = 0), RBL is charged from the VDD/2 level toward the VDD level by the amount of ΔVBL through VVDD. The charge equal to CBL × ΔVBL is transferred from VVDD to RBL. For the next precharge, RBL is discharged to the VDD/2 level through VP. The charge CBL × ΔVBL is recycled from RBL to VP, and can be used for future precharging intervals. As the value of VP is VDD/2, and the value of VVDD is VDD during the read operation, the read-1 dynamic power dissipation is given as

$$P_1 = \alpha_1 \times C_{BL} \times \left(V_{DD} - \frac{V_{DD}}{2}\right) \times \Delta V_{BL} \times f = \alpha_1 \times C_{BL} \times \frac{V_{DD}}{2} \times \Delta V_{BL} \times f \tag{4}$$

where α1 is the activity factor of read-1 operation. Hence, assuming equal probability of read-1 and read-0, the dynamic power dissipation of LP10T is given as

$$P_{dLP10T} = N \times C_{BL} \times \frac{V_{DD}}{2} \times \Delta V_{BL} \times f. \tag{5}$$

Comparing (5) and (1) shows that the proposed half-VDD precharging scheme and charge recycling mechanism, due to charging/discharging of RBL during the read interval using the proposed 4T read port, reduces the average dynamic read power dissipation by 50% compared with the 6T SRAM.

However, LP10T incurs area and power overheads due to control signal complexity. There are two virtual rails, and each one is modulated by VDD/2 from its nominal value of VDD/2 when the read control signal is asserted. VVSS is generated by passing signal “R” through an INV between VSS and VDD/2. Now, the capacitance of the VVSS rail is much lower than the BL capacitance, because the number of bitcells in a word is very small compared with the number of bitcells in a single column. Considering an M × N bit organization of SRAM, N ≪ M, where N is the number of bits per word and M is the number of words in the bank. Here, we can safely assume that Cvvss is 1/8th of CBL (128 × 8 bit SRAM organization). The power dissipation due to the virtual rail VVSS is Pvvss = 1 × Cvvss × (VDD/2) × (VDD/2) × f. As there are two virtual power rails that modulate by VDD/2 for every read cycle, the total power penalty is

$$P_v = C_{BL} \times V_{DD}^2 \times \frac{1}{16} \times f \tag{6}$$

which is 6.25% of the 6T read power dissipation.

TABLE I
TRANSISTOR SIZING INFORMATION (W μm). Lmin OF 60 nm IS USED

D. Leakage Reduction
Virtual power rails run horizontally and are shared by the cells of a row. These rails are activated during read operation (i.e., VVDD is connected to VDD, and VVSS is connected to ground). During the hold and write mode, these virtual rails have a value of VDD/2. These control signals are shown in Fig. 4(a) for read and hold/write mode. Fig. 4(b) shows the state of read port transistors in the hold/write mode. As both the virtual rails have voltage level VDD/2 during nonread, the voltage of node Z stays near VDD/2. As RBL is precharged at the VDD/2 level, and read signals are not activated (TG is OFF), RBL leakage is reduced substantially due to near zero VDS of the TG. Also, the boosted read signals help reduce the leakage currents, as the VGS of pMOS (P2) becomes more positive.

E. Transistor Sizing and Layout
The β and γ ratios of 6T must be considered for proper read and write operation [5]. Thus, for a 6T a stronger pull-down nMOS, a medium strength access-nMOS, and a weaker pull-up pMOS are used. Due to mobility difference, access and pull-up transistors are sized minimum. Pull-down nMOS are sized 2 × Wmin to make a β ratio of 2 for a 6T cell. For 8T and LP10T, minimum sized transistors are used for the cross-coupled INVs and for the write access transistors. This achieves relatively low write BL leakage currents and a higher write noise margin. Read port transistors of 8T are sized 2 × Wmin. The read port of LP10T is sized as pMOS 2.5 × Wmin, nMOS 1.5 × Wmin. Sizing information of 6T, 8T, and LP10T is shown in Table I. Although the wider transistors used for the read port may exhibit higher leakage, the RBL precharging level and dynamic control of read port power rails of LP10T substantially reduce the RBL leakage current.

Layouts have been produced in commercial 65 nm technology, and are shown for both the 6T and LP10T in Fig. 5. Metal-2 runs vertically, and connects the cells in a single column. Control signals run horizontally on M3. 6T has only one control signal (WL), while LP10T has five control signals (R, RB, W, VVDD, VVSS). The layout of 6T shown in Fig. 5(a) stacks AL and NL transistors. Thin layout to avoid lithographic defects [34] increases the cell area of 6T by 39%. For LP10T, however, it was necessary to choose a taller structure to allow five M3 rails to run horizontally. As AL/AR and NL/NR are sized minimum for LP10T, such a taller layout structure allows their vertical stacking.

In the used technology, Wmin of 120 nm is used, while Lmin of 60 nm is used. Poly–poly spacing of 130 nm (and 150 for polycontacts), along with increased sizes of vias and contacts

increases the cell layout area. Certain intralayer spacings in nanometers are mentioned in the layout. The relative area of the LP10T cell is 2.47× the cell area of 6T. The large area penalty is due to the increased number of control signals, and increased spacing due to the larger number of contacts and n-well spacing. However, the RSNM improvement of 2.3×, dynamic power dissipation reduction of ∼2×, and substantial RBL leakage reduction by LP10T offset the area cost.

Fig. 5. Layouts in commercial 65 nm technology node. (a) 6T (area 1×). (b) LP10T (area 2.47×). x = via1 = via2.

IV. RESULTS AND DISCUSSION
A. Simulation Conditions
We performed detailed simulations at 65 nm with a 100 fF of BL capacitance. A distributed model for BL capacitance is adopted. Results are presented for 6T, 8T, and LP10T, with 6T being the baseline. Designs are evaluated at five different process corners (NN, FF, SS, FS, and SF) and at different temperatures while operating at 1 V of VDD and 1 ns of read period. Results are also presented for four different voltage/frequency pairs for a typical corner at room temperature.

The development of ΔVBL, and thus the power dissipation, is dependent on the word-line pulse duration. The wider the pulse duration used, the higher the amount of differential BL voltage achieved and the higher will be the power dissipation in (1) and (5). Thus, for a fair comparison, we fixed the word-line pulse duration, and combined the resultant ΔVBL and power dissipation into a new metric called “performance” (Pf), defined as

$$P_f = \frac{\Delta V_{BL}}{P_{diss}} \tag{7}$$

measured in (mV/μW). Pf is the measure of power efficiency, and tells us how well a design uses the power dissipated to produce the output differential voltage. Higher Pf means that the design produces a higher amount of ΔVBL for each unit of power dissipated. In other words, it tells us that, to produce the same amount of ΔVBL, the design requires a smaller amount of power.

For the analysis of leakage currents, a 128-bit column is constructed, and an extra 100 fF of capacitance is added to the BL. Leakage currents for different data states are analyzed. Worst case leakage currents with full BL precharged levels are reported for the VDD leakage, BL/BLB leakage, and RBL leakage (where applicable).

Fig. 6. Transient waveforms: read and write operation of 6T and LP10T. Waveforms are shown for read-write-read operation (VDD = 1 V, Tperiod = 1 ns). ΔVBL of LP10T is different for read-0 and read-1.

B. Results
1) Transient Waveforms: Working operation is presented in Fig. 6, which shows the transient waveforms of both 6T and LP10T during the read, write, and precharge phase. For read-1, RBL of LP10T increases, while for read-0 operation it decreases. For 6T, the BLB and the BL decrease for read-1 and read-0 operation, respectively. There is no read stability problem for LP10T due to decoupling of the internal nodes. However, internal nodes of 6T receive disturbances during the read operation, as Iread flows through the access and pull-down transistors. 6T was designed for proper read and write operation, and β = 2 is used. Q/QB node disturbances received for 6T during read operation can be as large as 100 mV and decrease the read SNM of 6T.

LP10T achieves competitive or even better BL differential voltage. However, as can be seen from Fig. 6, read-1 Tread can be larger than read-0 Tread. Delay performance of LP10T is constrained by the read-1 operation, as the pMOS (P1) strength is lower than that of the nMOS (N1), even though a wider P1 has been used. The read-1 operation is also impacted by the low available swing in LP10T, which is used to provide the lower power dissipation benefit. This decreases the slew rate of the RBL voltage, and delay is increased for low supply voltages. Nevertheless, as the cells are designed to provide about 100 mV of ΔVBL, LP10T achieves a better delay at 1 V and competitive delay performance as VDD is reduced.

Write performance of LP10T is improved compared to 6T due to better sizing, and the internal node flipping during the write operation happens earlier than in 6T (as shown during the write phase of Fig. 6). Both designs produce full swing on BLs for the write operation, thus the write power dissipation of 6T, LP10T, and 8T (not shown in the figure) is

similar. However, as LP10T provides better write performance and stability, more leakage optimized sizing of LP10T write access transistors could be done.

2) Performance (Pf): In Fig. 7, Pf for all the three designs is shown against five process corners, and for 3 different temperatures for VDD = 1 V and Tperiod = 1 ns. The data are shown for average read power dissipation (that is, assuming that data 0 and data 1 are read with 50% probability). As explained earlier, 6T and 8T have similar read power dissipation, while LP10T dissipates about half the power dissipation of 6T. It can be readily seen in the figure that the performance of LP10T is about 2× the performance of 6T at all the data points. Global average Pf of LP10T, 8T, and 6T is 14.9, 9.2, and 8.9 (mV/μW), respectively, which demonstrates the effectiveness of the proposed scheme. The benefit is a direct result of the half-VDD precharging and charge recycling technique.

Fig. 7. Performance metric: Pf of 6T, 8T, and LP10T evaluated at different design corners and temperatures. Average Pf of LP10T is ∼1.7× that of 6T.

3) Read Delay: In Fig. 8 the read delay time (Tread) is measured for all the designs in different process corners and temperatures at VDD = 1 V and Tperiod = 1 ns. Tread is the time from the word-line approaching half-VDD to the point where the desired ΔVBL has been developed on BL. These results are based on 100 mV of ΔVBL (and thus, 200 mV for 8T as it does not produce any differential voltage when Q = 1). All designs produce 100 mV of ΔVBL on average for a read operation. Delay is the maximum of read-0 delay and read-1 delay. LP10T has smaller delay than the 6T and 8T at all process corners and temperature levels. The global average delay of LP10T is 10% and 32% smaller than 6T and 8T read delay, respectively. In LP10T, read-1 operation is performed through P1 pMOS and TG, and read-0 operation is performed through N1 nMOS and TG. The TG itself contains P2 pMOS and N2 nMOS. Thus, the read-0 and read-1 delays move in opposite directions across process corners. For example, in the FS corner, fast nMOS helps reduce the read-0 delay, while slow pMOS increases the read-1 delay. As the delay is the maximum of read-0 and read-1 delays, overall read delay of LP10T is relatively increased in the FS corner compared with 6T. In the SF corner, fast pMOS decreases the delay, which overall decreases the LP10T delay compared with 6T at this corner. In fact, read delay of LP10T is constrained by read-1 delay, because P1 (pMOS) and TG cannot charge the RBL as efficiently as N1 (nMOS) and TG discharge it.

Fig. 8. Read delay (Tread) of 6T, 8T, and LP10T evaluated at different process corners and temperatures. LP10T achieves better or competitive speed.

4) Leakage Currents: Leakage currents are measured at different process corners and temperature values for a 128 bit column of 6T, 8T, and LP10T. As leakage is highly data dependent, worst case leakages are reported. We can divide static power dissipation into different categories as follows.
1) Cell leakage (Ilcell) is the leakage current of the cross-coupled INVs, from VDD to VSS. Ilcell is dependent upon sizing and threshold voltage. The Vth of all the transistors of all designs is the same; however, the pull-down of 6T has higher strength. Thus, Ilcell of 6T is relatively higher than 8T and LP10T, both of which have minimum sized INVs for latching.
2) BL/BLB leakage IlBL is dependent upon the size of the access transistor and the BL voltage level. As BL/BLB of all the designs is charged to VDD, IlBL of 6T is marginally higher than IlBL of 8T and LP10T.
3) RBL leakage IlRBL is only present in 8T and LP10T. IlRBL is dependent on the sizes of the read port transistors and the RBL voltage level. As RBL of 8T is precharged to VDD, and N1/N2 of 8T are sized wider, IlRBL of 8T is higher than IlBL. For LP10T, RBL is precharged at VDD/2, and VVDD and VVSS also have a value of VDD/2, so the RBL leakage is essentially reduced.

Total leakage current of all designs is shown on log-scale in Fig. 9(a). The 8T has a higher total leakage than the 6T and LP10T. Total leakage current of LP10T is slightly better than 6T due to the better sizing of LP10T. RBL leakage of 8T and LP10T is shown in Fig. 9(b). Dynamic control of power rails of LP10T provides substantial reduction of IlRBL. Compared with the 8T, RBL leakage of the LP10T is more than 3 orders of magnitude reduced at typical corner and room temperature, and more than 3.3 orders of magnitude reduced on average over all the process corners. At all values of temperature and different process corners, more than 2 orders of magnitude leakage reduction is observed. The overall leakage power of 6T and LP10T is similar. Nevertheless, the (ION/IOFF) of RBL of LP10T is thousands of times better than the 6T and 8T ratios.

Fig. 9. Leakage current (on log scale) in nA. (a) Total leakage current of 6T, 8T, and LP10T. (b) RBL leakage of 8T and LP10T.

5) Power and Performance at Low Supply Voltage: Results for different values of supply voltage are shown in Fig. 10

for 6T, 8T and LP10T. These results are obtained at typical
process corner and at room temperature. The left bar
graph shows the normalized average read power dissipation
of designs at different supply voltages, i.e., the average read
power of read-0 and read-1 and normalized to the 6T read
power at 1 V. Read pulsewidth is set at 400 ps, and the
VBL is noted against different supply levels. Although the
pulsewidth is fixed, frequency of read operation is reduced
with the reduction of supply voltage.
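The reason for comparing the average over read-0 and read-1 can be illustrated with a first-order charge-accounting sketch. The Python below is not taken from the paper. It assumes the same bitline capacitance for all three cells, counts only the charge drawn from the VDD supply, ignores word-line and control-signal power, and treats the half-VDD precharge supply VP as an ideal reservoir whose net charge over an equal mix of read-0 and read-1 is zero (the charge recycling described for LP10T). The names `ReadEnergy`, `energy_6t`, `energy_8t`, and `energy_lp10t`, and the 100 fF capacitance, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadEnergy:
    """Energy drawn from VDD (joules) for one read of each data value."""
    read0: float
    read1: float

    @property
    def average(self) -> float:
        # Data 0 and data 1 are assumed to be read with 50% probability each.
        return 0.5 * (self.read0 + self.read1)


def energy_6t(c_bl: float, vdd: float, dv: float) -> ReadEnergy:
    # One of BL/BLB drops by dv on every read; precharge restores it from VDD.
    e = c_bl * dv * vdd
    return ReadEnergy(read0=e, read1=e)


def energy_8t(c_bl: float, vdd: float, dv: float) -> ReadEnergy:
    # RBL discharges only for read-0. Its swing is taken as 2*dv so that
    # the average developed voltage equals dv, as in the text.
    return ReadEnergy(read0=c_bl * (2.0 * dv) * vdd, read1=0.0)


def energy_lp10t(c_bl: float, vdd: float, dv: float) -> ReadEnergy:
    # read-1: RBL rises by dv above VDD/2 using charge from the VDD rail.
    # read-0: RBL falls by dv toward ground; nothing is drawn from VDD.
    # Assumption: VP supplies charge after a read-0 and receives the same
    # charge after a read-1, so precharging costs no net energy on average.
    return ReadEnergy(read0=0.0, read1=c_bl * dv * vdd)


if __name__ == "__main__":
    c_bl, vdd, dv = 100e-15, 1.0, 0.1  # hypothetical: 100 fF, 1 V, 100 mV
    for name, model in (("6T", energy_6t), ("8T", energy_8t),
                        ("LP10T", energy_lp10t)):
        e = model(c_bl, vdd, dv)
        print(f"{name}: average read energy = {e.average * 1e15:.1f} fJ")
```

Under these assumptions the sketch gives equal average energy for 6T and 8T and half of that for LP10T, in line with the comparison reported in the text. It says nothing about the fixed-pulsewidth measurements at reduced supply, where the developed voltage itself changes.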

Fig. 10. Performance comparison at different supply voltages (TT at 27 °C).

Middle bar graph shows the normalized average VBL
produced for all designs at different supply levels, i.e., average
differential voltage of read-0 and read-1 and normalized to the
VBL of 6T at 1 V. Both these bar graphs show that the LP10T
requires a smaller amount of power dissipation to produce
approximately the same average VBL compared to the 6T.
Power dissipation of 8T is smaller than 6T as it produces
a lower amount of VBL. 6T produces the same VBL for
read-0 and read-1, while 8T produces 0 V for read-1 and
some VBL for read-0. Thus, to have the same average VBL
for both the read cases, 8T needs to produce a higher VBL
for read-0 operation. LP10T also produces different VBL for
read-0 and read-1 operations. RBL voltage of LP10T increases
by a few tens of millivolts due to capacitive coupling as the
precharge signal is turned OFF. This increase in RBL level
helps in read-1 operation (as the RBL is increasing). However,
this increase needs to be overcome for read-0 operation as the
RBL needs to discharge. This is beneficial in terms of delay
for read-1, and disadvantageous for the power dissipation of
read-0 operation.
The right bar graph shows the normalized performance of
all the designs, i.e., respective performance normalized by
performance of 6T at that supply level. LP10T achieves about
1.83× the Pf of 6T at 1 V, and 1.84× on average at different
supply levels.
6) Delay at Low Supply Voltage: Changes in temperature
affect speed, power, and reliability of SRAM cells by

Fig. 11. Read delay time (on log scale) of all designs at different supply
voltage and temperature values for typical process corner.

altering the threshold voltage, electron mobility, and saturation
velocity [35], [36]. Two temperature dependencies exist for
MOSFETs: the normal dependence (ND) region, where drain current (ID)
decreases with increasing temperature, and the reverse dependence
region, where ID increases with increasing temperature [37].
Between these two regions, there is a supply voltage where
the impact of temperature on delay is minimized, referred
to as the temperature-insensitive voltage (VINS ). Both nMOS
and pMOS devices have a different value of VINS [38].

Fig. 12. Write noise margin for 6T and LP10T at two different supply levels.
(a) Write butterfly curves. (b) WTP.

At nominal voltage, the current is influenced greatly by the
change in saturation velocity, and thus at nominal voltages,
read time increases with increase in temperature. However,
for lower supplies, change in threshold voltage highly impacts
the read current, and thus, at lower supply voltages, read delay
decreases as temperature increases [39]. Read delay times
of all designs against different supply values and different
temperature values for a typical process corner are shown in
the line graph on log scale in Fig. 11. Low supply values
and lower temperatures result in a higher delay for LP10T,
because read-1 time increases considerably as P1 is not
able to charge the RBL efficiently. Also, producing a VBL from a
full swing is more efficient than from the half swing available,
which is the case with LP10T.
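The reversal of temperature dependence described above can be reproduced with a toy alpha-power-law model. The sketch below is an illustration under stated assumptions, not the paper's simulation setup: the drain current is taken as mobility times overdrive raised to a power, mobility falls as T^-1.5, and the threshold voltage drops linearly with temperature. All parameter values (0.4 V threshold at 300 K, 1 mV/K slope, exponent 1.3) and the function names are hypothetical, and the falling mobility stands in for all carrier-transport degradation, including the saturation velocity effect.

```python
T_REF = 300.0  # reference temperature in kelvin (assumed)


def drain_current(vdd: float, temp_k: float, vth0: float = 0.4,
                  kvt: float = 1e-3, alpha: float = 1.3,
                  mu_exp: float = 1.5) -> float:
    """Relative on-current of a toy MOSFET (arbitrary units)."""
    # Threshold voltage falls linearly as temperature rises.
    vth = vth0 - kvt * (temp_k - T_REF)
    # Carrier mobility degrades as temperature rises.
    mobility = (temp_k / T_REF) ** (-mu_exp)
    overdrive = max(vdd - vth, 0.0)
    return mobility * overdrive ** alpha


def hot_to_cold_ratio(vdd: float, t_cold: float = 300.0,
                      t_hot: float = 400.0) -> float:
    """Greater than 1 means the current rises with temperature."""
    return drain_current(vdd, t_hot) / drain_current(vdd, t_cold)


def find_vins(lo: float = 0.5, hi: float = 1.0, iters: int = 60) -> float:
    """Bisect for the supply where hot and cold currents are equal."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hot_to_cold_ratio(mid) > 1.0:
            lo = mid  # still in the reverse dependence region
        else:
            hi = mid  # already in the normal dependence region
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for vdd in (1.0, 0.5):
        print(f"VDD = {vdd:.1f} V: I(400 K) / I(300 K) = "
              f"{hot_to_cold_ratio(vdd):.2f}")
    print(f"toy-model VINS between 300 K and 400 K: {find_vins():.3f} V")
```

In this toy model the current falls with temperature at 1 V and rises with temperature at 0.5 V, so read delay would move in opposite directions at the two supplies. The crossover value it reports depends entirely on the assumed parameters and should not be read as the VINS of the 65 nm devices.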
7) Write Noise Margin: Owing to the better (though smaller)
sizing of its cross-coupled inverters compared with the
6T sizing, LP10T achieves a better write noise margin. Fig. 12(a)
shows the write SNM butterfly curves for both 6T and LP10T
at 1 V and 0.7 V. WTP for both LP10T and 6T is shown
in Fig. 12(b). LP10T achieves a higher WTP compared with
6T at all supply levels. As WNM of LP10T is relatively
higher than 6T WNM and the write time of LP10T is smaller
than 6T, a longer channel length could be used for write
access transistors of LP10T to further minimize the BL/BLB
leakage current, while not impacting the write performance
and stability compared with 6T.

Fig. 13. Read stability of 6T and LP10T versus supply voltage. (a) RSNM
bar graph evaluated at typical process corner. (b) Line graph evaluated at
room temperature.

8) Read SNM: Due to data node decoupling, LP10T solves
the RSNM problem of 6T. RSNM results are obtained for both
designs at different supply levels. Fig. 13(a) shows the RSNM
of 6T and LP10T at different temperatures for a typical process
corner. At 0.7 V and 125 °C, RSNM of LP10T becomes
2.5× RSNM of the 6T. Fig. 13(b) shows the RSNM of 6T
and LP10T evaluated at different process corners at room
temperature. At FS corner, LP10T has the best RSNM of 2.8×
RSNM of 6T at 0.7 V. On average, LP10T has the RSNM of
2.57× of 6T at different supply levels at FS corner. Global
average RSNM of LP10T is ∼2.3× RSNM of 6T.
The proposed LP10T uses 4T read port and a separate RBL
for the read operation, while write and hold mechanism of
LP10T is the same as in 6T. Thus, overall BL, BLB, and VDD

leakage power of 6T, 8T, and LP10T is the same. Nevertheless,
with the use of row-by-row dynamic control of read-port
power rails, LP10T substantially lowers the RBL leakage,
and thus many more cells can be added in a single column.
8T worst case leakage is higher than 6T as it uses wider read
port transistors to achieve 2 × VBL for a read-0 operation.
Virtual rail with 8T has also been used to reduce the RBL
leakage. However, 8T requires the same (average dynamic) power
as does the 6T. LP10T with the proposed precharging scheme
and charge recycling mechanism only uses half of the (average
dynamic) power of 6T. LP10T achieves 2.3× RSNM of 6T,
and reduces read delay by 10% and 32% compared with 6T
and 8T, respectively. As LP10T uses two nMOS and two pMOS
transistors for the read port, read-1 and read-0 performance is
not the same. In addition, a single read operation can be affected
differently with the change in temperature. LP10T requires 5 control
signals, and an area of 2.47× the area of 6T (with β = 2).
Furthermore, LP10T requires a VP supply voltage with a value of
VDD/2 for precharging. RBL precharging transistors for LP10T
are wider than BL/BLB precharging transistors. Yet, low dynamic
power, high ION/IOFF, read stability, and competitive read time are
salient features of the proposed LP10T SRAM.

V. CONCLUSION
In this paper, we have presented our 10T SRAM cell that
uses a 4T read port and SE RBL. RBL is precharged at half the
supply voltage and, during the read operation, is charged or
discharged according to the bit stored. For a read-0 operation,
RBL discharges through TG and nMOS transistor, and for
the next precharge, RBL is supplied current by VP. For a
read-1 operation, RBL is charged from VDD/2 to VDD by the virtual
read port. For the next precharge, RBL level is decreased and
current flows from RBL to VP. By precharging through VP
(which is half VDD) and the charge recycling mechanism, LP10T
only dissipates half the average read dynamic power compared
with 6T. In 65 nm, a performance figure (mV/μW) of 1.83×
of 6T is achieved at 1 V, and 1.84× on average at different
supply levels. Due to decoupling of internal nodes, RSNM is
increased by 2.3× compared with 6T. Overall leakage power
of LP10T is similar to 6T; however, RBL leakage is reduced
by more than 3 orders of magnitude, and thus a higher number
of cells could be integrated on a single column.

REFERENCES
[1] T. Mudge, “Power: A first-class architectural design constraint,” Computer, vol. 34, no. 4, pp. 52–58, Apr. 2001.
[2] N. S. Kim et al., “Leakage current: Moore’s law meets static power,” Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.
[3] G. Venkatesh et al., “Conservation cores: Reducing the energy of mature computations,” ACM SIGARCH Comput. Archit. News, vol. 38, no. 1, p. 205, 2010.
[4] N. Goulding-Hotta et al., “The GreenDroid mobile application processor: An architecture for silicon’s dark future,” IEEE Micro, vol. 31, no. 2, pp. 86–95, Mar./Apr. 2011.
[5] A. Pavlov and M. Sachdev, CMOS SRAM Circuit Design and Parametric Test in NANO-Scaled Technologies: Process-Aware SRAM Design and Test, vol. 40. The Netherlands: Springer, 2008.
[6] J. Singh, S. P. Mohanty, and D. Pradhan, Robust SRAM Designs and Analysis. New York, NY, USA: Springer-Verlag, 2012.
[7] M. A. Rahma and M. Anis, Nanometer Variation-Tolerant SRAM: Circuits and Statistical Design for Yield. New York, NY, USA: Springer-Verlag, 2012.
[8] L. Chang et al., “Stable SRAM cell design for the 32 nm node and beyond,” in Symp. VLSI Technol. Dig. Tech. Papers., Jun. 2005, pp. 128–129.
[9] M.-H. Tu et al., “A single-ended disturb-free 9T subthreshold SRAM with cross-point data-aware write word-line structure, negative bit-line, and adaptive read operation timing tracing,” IEEE J. Solid-State Circuits, vol. 47, no. 6, pp. 1469–1482, Jun. 2012.
[10] B. Wang, T. Q. Nguyen, A. T. Do, J. Zhou, M. Je, and T. T. H. Kim, “Design of an ultra-low voltage 9T SRAM with equalized bitline leakage and CAM-assisted energy efficiency improvement,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 2, pp. 441–448, Feb. 2015.
[11] S. Lin, Y.-B. Kim, and F. Lombardi, “A low leakage 9t SRAM cell for ultra-low power operation,” in Proc. 18th ACM Great Lakes Symp. VLSI, 2008, pp. 123–126.
[12] B. H. Calhoun and A. P. Chandrakasan, “A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation,” IEEE J. Solid-State Circuits, vol. 42, no. 3, pp. 680–688, Mar. 2007.
[13] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim, “A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing,” IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 518–529, Feb. 2008.
[14] K. Takeda et al., “A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 113–121, Jan. 2006.
[15] R. Saeidi, M. Sharifkhani, and K. Hajsadeghi, “A subthreshold symmetric SRAM cell with high read stability,” IEEE Trans. Circuits Syst. II, Express Briefs, vol. 61, no. 1, pp. 26–30, Jan. 2014.
[16] Z. Liu and V. Kursun, “Characterization of a novel nine-transistor SRAM cell,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 4, pp. 488–492, Apr. 2008.
[17] S. Lutkemeier, T. Jungeblut, H. K. O. Berge, S. Aunet, M. Porrmann, and U. Ruckert, “A 65 nm 32 b subthreshold processor with 9T multi-Vt SRAM and adaptive supply voltage control,” IEEE J. Solid-State Circuits, vol. 48, no. 1, pp. 8–19, Jan. 2013.
[18] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy, “A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS,” IEEE J. Solid-State Circuits, vol. 44, no. 2, pp. 650–658, Feb. 2009.
[19] E. Karl et al., “A 0.6 V, 1.5 GHz 84 Mb SRAM in 14 nm FinFET CMOS technology with capacitive charge-sharing write assist circuitry,” IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 222–229, Jan. 2016.
[20] K. Nii et al., “A 45-nm single-port and dual-port SRAM family with robust read/write stabilizing circuitry under DVFS environment,” in Proc. IEEE Symp. VLSI Circuits, Jun. 2008, pp. 212–213.
[21] F. Hamzaoglu et al., “A 3.8 GHz 153 Mb SRAM design with dynamic stability enhancement and leakage reduction in 45 nm high-k metal gate CMOS technology,” IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 148–154, Jan. 2009.
[22] T. Song, S. Kim, K. Lim, and J. Laskar, “Fully-gated ground 10T-SRAM bitcell in 45 nm SOI technology,” Electron. Lett., vol. 46, no. 7, pp. 515–516, Apr. 2010.
[23] K. Kanda, T. Miyazaki, M. K. Sik, H. Kawaguchi, and T. Sakurai, “Two orders of magnitude leakage power reduction of low voltage SRAM’s by row-by-row dynamic VDD control (RRDV) scheme,” in Proc. 15th Annu. IEEE Int. ASIC/SoC Conf., Sep. 2002, pp. 381–385.
[24] E. Seevinck, F. J. List, and J. Lohstroh, “Static-noise margin analysis of MOS SRAM cells,” IEEE J. Solid-State Circuits, vol. 22, no. 5, pp. 748–754, Oct. 1987.
[25] M. H. Abu-Rahma, M. Anis, and S. S. Yoon, “Reducing SRAM power using fine-grained wordline pulsewidth control,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 3, pp. 356–364, Mar. 2010.
[26] M. Alioto, “Ultra-low power VLSI circuit design demystified and explained: A tutorial,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 1, pp. 3–29, Jan. 2012.
[27] G. Pasandi and S. M. Fakhraie, “A 256-kb 9T near-threshold SRAM with 1k cells per bitline and enhanced write and read operations,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 11, pp. 2438–2446, Nov. 2015.
[28] S. Ahmad, M. K. Gupta, N. Alam, and M. Hasan, “Single-ended Schmitt-trigger-based robust low-power SRAM cell,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 8, pp. 2634–2642, Aug. 2016.
[29] B. D. Yang, “A low-power SRAM using bit-line charge-recycling for read and write operations,” IEEE J. Solid-State Circuits, vol. 45, no. 10, pp. 2173–2183, Oct. 2010.
[30] K. Kim, H. Mahmoodi, and K. Roy, “A low-power SRAM using bit-line charge-recycling technique,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2007, pp. 177–182.
[31] B.-D. Yang and L.-S. Kim, “A low-power SRAM using hierarchical bit line and local sense amplifiers,” IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1366–1376, Jun. 2005.
[32] H. Jeong, T. Kim, T. Song, G. Kim, and S. O. Jung, “Trip-point bit-line precharge sensing scheme for single-ended SRAM,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 7, pp. 1370–1374, Jul. 2015.
[33] H. Noguchi et al., “Which is the best dual-port SRAM in 45-nm process technology?—8T, 10T single end, and 10T differential,” in Proc. IEEE Int. Conf. Integr. Circuit Design Technol. Tut., Jun. 2008, pp. 55–58.
[34] R. W. Mann and B. H. Calhoun, “New category of ultra-thin notchless 6T SRAM cell layout topologies for sub-22nm,” in Proc. 12th Int. Symp. Quality Electron. Design (ISQED), Mar. 2011, pp. 1–6.
[35] I. M. Filanovsky and A. Allam, “Mutual compensation of mobility and threshold voltage temperature effects with applications in CMOS circuits,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 48, no. 7, pp. 876–884, Jul. 2001.
[36] J. C. Ku and Y. Ismail, “On the scaling of temperature-dependent effects,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 10, pp. 1882–1888, Oct. 2007.
[37] C. Park et al., “Reversal of temperature dependence of integrated circuits operating at very low voltages,” in Proc. Int. Electron Devices Meeting (IEDM), Dec. 1995, pp. 71–74.
[38] A. Bellaouar, A. Fridi, M. I. Elmasry, and K. Itoh, “Supply voltage scaling for temperature insensitive CMOS circuit operation,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 45, no. 3, pp. 415–417, Mar. 1998.
[39] D. Wolpert and P. Ampadu, “Temperature effects in semiconductors,” in Managing Temperature Effects in Nanoscale Adaptive Systems. New York, NY, USA: Springer-Verlag, 2012, pp. 15–33.

Naeem Maroof (S’14–M’16) received the B.Sc. degree in computer
engineering from the COMSATS Institute of Information Technology (CIIT),
Islamabad, Pakistan, in 2006, the M.Sc. degree in electronic communications
and computer engineering from the University of Nottingham, Nottingham,
U.K., in 2007, and the Ph.D. degree in electronics engineering from Hanyang
University, Seoul, South Korea, in 2016.
From 2007 to 2012, he was a full-time Faculty Member with the Department
of Electrical Engineering, CIIT. He is currently a Post-Doctoral Research
Fellow with the Integrated System Design Laboratory, College of Information
and Communication Engineering, Sungkyunkwan University, Suwon, South Korea.
His current research interests include low power and reliable integrated
circuit design, memory architecture (SRAM, DRAM, NVM, RRAM, and hybrids),
and the power management ICs.

Bai-Sun Kong (S’94–M’00) received the B.S. degree in electronics
engineering from Yonsei University, Seoul, South Korea, in 1990, and the
M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced
Institute of Science and Technology, Daejon, South Korea, in 1992 and 1996,
respectively.
From 1996 to 1999, he was with LG Semicon Company Ltd., Seoul, where he
was involved in the design of high-bandwidth DRAMs including 18 M Concurrent
RDRAM, 72 M Concurrent RDRAM, and 128 M Direct RDRAM. From 2000 to 2005, he
was with Korea Aerospace University, Goyang, South Korea, where he was
an Associate Professor at the School of Electronics, Telecommunication,
and Computer Engineering. In 2005, he joined Sungkyunkwan University,
Suwon, South Korea, where he is currently a Professor at the College of
Information and Communication Engineering. His current research interests
include microprocessor, and memory architecture and circuit design, and VLSI
circuit and system design for low-power and/or high-speed applications.