


Spin-Transfer Torque
Memories: Devices, Circuits,
and Systems
This paper reviews the fundamentals of spin-transfer torque memories and discusses
key experimental breakthroughs before covering the state of the art and the device
design concepts and challenges.
By Xuanyao Fong, Member IEEE, Yusung Kim, Rangharajan Venkatesan, Member IEEE,
Sri Harsha Choday, Student Member IEEE, Anand Raghunathan, Fellow IEEE, and
Kaushik Roy, Fellow IEEE

ABSTRACT | Spin-transfer torque magnetic memory (STT-MRAM) has gained significant research interest due to its nonvolatility and zero standby leakage, near unlimited endurance, excellent integration density, acceptable read and write performance, and compatibility with CMOS process technology. However, several obstacles need to be overcome for STT-MRAM to become the universal memory technology. This paper first reviews the fundamentals of STT-MRAM and discusses key experimental breakthroughs. The state of the art in STT-MRAM is then discussed, beginning with the device design concepts and challenges. The corresponding bit-cell design solutions are also presented, followed by the STT-MRAM cache architectures suitable for on-chip applications.

KEYWORDS | Design of spin-transfer torque MRAM; emerging nonvolatile memory technology; magnetic tunnel junction; nonvolatile on-chip caches

Manuscript received January 22, 2015; revised October 16, 2015 and January 13, 2016; accepted January 15, 2016. Date of publication April 7, 2016; date of current version June 16, 2016. This work was supported in part by C-SPIN, one of six centers of StarNET, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, by the Semiconductor Research Corporation, by the National Science Foundation, NSSEFF, by Intel Corporation, and by Qualcomm.
X. Fong is with the Institute of Microelectronics, Agency for Science Technology and Research (A*STAR), Singapore 138634, Singapore.
Y. Kim is with Advanced Design, Logic Technology Development, Intel Corporation, Hillsboro, OR 97124 USA.
R. Venkatesan is with NVIDIA Research, Santa Clara, CA 95050 USA.
S. H. Choday is with Intel Labs, Intel Corporation, Hillsboro, OR 97124 USA.
A. Raghunathan and K. Roy are with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA.
Digital Object Identifier: 10.1109/JPROC.2016.2521712
0018-9219 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
Vol. 104, No. 7, July 2016 | Proceedings of the IEEE 1449
Fong et al.: Spin-Transfer Torque Memories: Devices, Circuits, and Systems

I. INTRODUCTION

The increasing leakage power in on-chip memories as a result of CMOS technology scaling is a critical problem that needs to be overcome in future microprocessors. Proposals for nonvolatile memory technologies that enable powered-down operation during “idle mode” can potentially alleviate the leakage problem. Although several nonvolatile memory technologies have been proposed, most are not suitable for on-chip cache applications for a variety of reasons (e.g., poor performance and/or endurance, high energy consumption for read/write, poor reliability, or incompatibility with CMOS fabrication processes). Furthermore, the integration density of dynamic RAM (DRAM) and performance of static RAM (SRAM) are desired for the nonvolatile memory technology to implement highly energy-efficient on-chip caches. A large on-chip cache capacity reduces the number of off-chip memory accesses and improves the overall system performance. However, there is a tradeoff between the access latencies of the on-chip cache and the capacity of the cache [1], [2]. Hence, nonvolatile memory technologies proposed for on-chip cache applications need sufficient performance when the cache capacity is large.

Of the nonvolatile memory technologies that have been proposed, spin-transfer torque magnetic random access memories (STT-MRAM) have emerged as the leading candidate for on-chip cache applications. STT-MRAM can achieve the integration density of DRAM and potentially match the performance of SRAM. Compared to competing nonvolatile memory technologies, STT-MRAM is compatible with the CMOS fabrication process and has a better combination of retention time,
endurance, and reliability. Although spin-transfer torque
(STT)-based memory is an attractive candidate for uni-
versal memory technology, much research effort is
needed to overcome the key design issues that prevent
widespread adoption of the technology. The key thrust
in the research effort has been to reduce the relatively
high write energy (compared to CMOS on-chip caches)
of STT-MRAM, and to improve their reliability and read performance.
This paper serves as a review of the state of the art in
STT-MRAM research. Key design challenges that need to
be overcome in STT-MRAM will be presented along with
possible solutions that have been proposed in the litera-
ture. Our discussion begins in Section II with an intro-
duction to the magnetoresistance effect, which was
experimentally observed by Jullière in 1975 [3]. The rest
of Section II discusses the basic device structures that ex-
ploit the giant and tunneling magnetoresistance effects,
the core technology for STT-based memory applications.
We then shift the focus of our discussion to other aspects
of STT-based magnetic devices for memory applications
in Section III. Specifically, we discuss how the device may be programmed and the ability of the device to retain the data stored in it. The concept of energy barriers and the spin-transfer torque phenomenon are discussed in this section as well. The basic design of an STT-MRAM bit-cell is presented in Section IV to show the design issues and where the inefficiencies in the design come from. Section V focuses on various device-level design techniques for improving STT-MRAMs. We first consider two-terminal magnetic tunnel junction (MTJ)-based device structures. Their drawbacks and limitations are discussed to motivate the discussion on multiterminal devices. As we will see in Section V, the high write energy in STT-MRAM poses a significant design challenge. Several alternative physical phenomena, such as spin-orbit torque and voltage-controlled magnetic anisotropy (VCMA), may be exploited to lower the write energy in STT-MRAMs. The associated device structures are also presented in Section V. The details of the circuit-level design techniques (or the bit-cells) are presented in Section VI. The STT-MRAM array/architecture level design techniques along with error correction techniques to improve reliability and energy consumption are described in Section VII. Finally, Section VIII concludes this paper.

The spin valve structure, as shown in Fig. 1, is the basic building block of spin-transfer torque memories. The magnetization of a magnetic layer in the absence of external stimulus is engineered to point along a particular direction called its easy axis. Note that the easy axes of the magnetic layers in the spin valve are collinear. There are two types of spin valves depending on the type of spacer layer used. Spin valves using an insulator spacer layer exploit the tunneling magneto-resistance (TMR) effect discovered by Jullière in 1975 [3]. On the other hand, spin valves using a nonmagnetic metal spacer layer exploit the giant magneto-resistance (GMR) effect discovered separately by Fert and Grünberg [4], [5], for which they were awarded the Nobel Prize [6]. The key characteristic of spin valves is that their conductance depends on the relative magnetization directions of the magnetic layers (see Fig. 1). The conductance is high when the magnetization directions are parallel (P) and low when they are anti-parallel (AP). A key metric to describe spin valves is the magneto-resistance (MR) ratio given by

$$\mathrm{MR} = \frac{G_P - G_{AP}}{G_{AP}} \times 100\%. \tag{1}$$

In the first experiments, Jullière and Fert measured MR ratios of 14% and 80% in Fe/Ge-O/Co tunnel junctions [3] and in Fe/Cr superlattices [4], respectively, at $T \approx 4.2$ K ($T$ is the temperature). Since then, many studies have been done to understand GMR and TMR effects [7]–[18].

Fig. 1. Physical structure of a vertical spin valve. The type of magnetoresistance depends on the spacer layer used. The free layer has an anisotropy energy barrier so that the spin valve is in either parallel or anti-parallel configuration. The direction of current flow to switch the spin valve configuration is also shown on top.


The GMR effect may be understood by considering the scattering of spin-polarized electrons as they pass through the spin valve [19]. When the spin valve is in the P configuration, the electrons flowing through the device experience little scattering. However, electrons experience a lot of spin scattering when the spin valve is in the AP configuration.

The TMR effect may be explained by the spin filtering effect that arises due to the bandstructures of the constituent layers of the spin valve [19]. As a result, the probability of electrons tunneling across the insulator spacer layer depends on the configuration of the TMR-based spin valve. Hence, the TMR-based spin valve can be engineered to achieve MR ratios far higher than that achievable by GMR-based spin valves at room temperature. Together with the fact that the resistance of TMR spin valves is much higher than that of GMR spin valves, the magnetic configuration of the TMR spin valve may be electrically sensed by CMOS circuits more easily than in the GMR spin valve. These differences are summarized in Table 1, showing that TMR spin valves are more desirable than GMR spin valves for on-chip memory applications. The spin filtering effect may also be exploited for manipulating the magnetic configuration of the TMR-based spin valve as we will discuss later on.

Table 1 Comparison of Metrics for GMR and TMR Spin Valves

The alignment of the magnetization of a nano-magnet, $\mathbf{m}$, along its easy axis is retained in the presence of thermal agitations by an energy barrier ($E_B$) as shown in Fig. 1. Note that higher-order effects have been ignored: We are assuming that the $E_B$ that needs to be overcome for switching from 0° to 180° ($E_{B,1}$) is identical to that for switching from 180° to 0° ($E_{B,2}$). Higher-order effects may result in $E_{B,1} \neq E_{B,2}$. When this is the case, the effective $E_B$ is the smaller of $E_{B,1}$ and $E_{B,2}$. The $E_B$ in nano-magnets can be created with uniaxial anisotropy, and it is related to the volume, $V$, of the nano-magnet by [20]

$$E_B = K_{u2} V = \frac{\mu_0 M_S H_C V}{2} \tag{2}$$

where $K_{u2}$ is the second-order uniaxial anisotropy constant, $M_S$ is the saturation magnetization of the nano-magnet, and $H_C$ is the magnetic anisotropy field. In the presence of thermal agitations, the stability or lifetime ($t_{LIFE}$) of the nano-magnet state is related to $E_B$ by [21]

$$t_{LIFE} = t_0 \exp\left(\frac{E_B}{k_B T}\right) \tag{3}$$

where $k_B$ is the Boltzmann constant, $T$ is the temperature, and $t_0$ is the inverse attempt frequency ($\approx 1$ ns). For a nano-magnet with $E_B \approx 40 k_B T$, $t_{LIFE} \approx 7.4$ years.

An MTJ consists of two ferromagnetic layers sandwiching an insulator spacer layer (tunnel oxide layer) as illustrated in Fig. 1. The electrical resistance of the MTJ, $R_{MTJ}$, is low when the MTJ is in the P configuration ($R_{MTJ} = R_P = R_L$ or “0” state). On the other hand, $R_{MTJ}$ is high when the MTJ is in the AP configuration (“1” state or $R_{MTJ} = R_{AP} = R_H$). In order to ensure proper operation of the device, one of the ferromagnetic layers is used as the reference layer and is magnetically pinned (called the pinned layer or PL). The magnetization of the other ferromagnetic layer (called the free layer or FL) is free to rotate and is used to store spin information. As mentioned earlier, uniaxial anisotropy is engineered into the FL to create an energy barrier that ensures thermal stability.

The MTJ may be switched between the P and the AP configurations using a magnetic field, also known as field-induced magnetization switching (FIMS). For on-chip applications, the magnetic field is generated using current carrying wires as in field-switched magnetic RAM (MRAM) [22]. The magnetic field, $H$, generated by the current, $I$, in the wire is given by

$$H = \frac{I}{2 \pi r} \tag{4}$$

where $r$ is the distance between the wire and the nano-magnet. This field needs to overcome the critical field needed to program the magnetic state of the MTJ, which is given by

$$|H| > H_C = \frac{2 E_B}{\mu_0 M_S V}. \tag{5}$$

Hence, the critical current, $I_C$, required to generate the magnetic field for switching is

$$I_C > \frac{4 \pi E_B r}{\mu_0 M_S V}. \tag{6}$$

The volume of FL scales as $r^2$ as the MTJ size is scaled down. As a result, the iso-$E_B$ switching current increases.
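The retention and field-switching relations above can be checked numerically. The sketch below is a minimal illustration, not a device model: it evaluates the lifetime for a given $E_B / k_B T$ with an assumed $t_0$ of 1 ns, and evaluates the lower bound on the FIMS wire current as every length is scaled down with the volume taken to scale as $r^2$, as stated in the text. All function names and material values are hypothetical.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant (J/K)
MU_0 = 4.0e-7 * math.pi    # vacuum permeability (H/m)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_time_years(delta: float, t0: float = 1e-9) -> float:
    """Lifetime t_LIFE = t0 * exp(delta), in years, with delta = E_B / (k_B T)."""
    return t0 * math.exp(delta) / SECONDS_PER_YEAR


def fims_critical_current(e_b: float, m_s: float, volume: float, r: float) -> float:
    """Lower bound on the wire current: 4*pi*E_B*r / (mu_0 * M_S * V)."""
    return 4.0 * math.pi * e_b * r / (MU_0 * m_s * volume)


# Retention for E_B = 40 k_B T with t0 = 1 ns.
print(f"t_LIFE ~ {retention_time_years(40.0):.2f} years")

# Hypothetical nano-magnet at T = 300 K; every value below is illustrative.
e_b = 40.0 * K_B * 300.0   # energy barrier (J)
m_s = 1.0e6                # saturation magnetization (A/m)
volume = 1.0e-23           # free-layer volume (m^3)
r = 100e-9                 # wire-to-magnet distance (m)

i_full = fims_critical_current(e_b, m_s, volume, r)

# Scale every length by s: r -> s*r and, following the text, V -> s^2 * V,
# while holding E_B fixed (iso-E_B scaling).
s = 0.5
i_scaled = fims_critical_current(e_b, m_s, s**2 * volume, s * r)
print(f"I_C grows by a factor of {i_scaled / i_full:.1f} when dimensions are halved")
```

Because the bound is proportional to $r / V$, shrinking all dimensions by a factor $s$ at fixed $E_B$ raises the required current by $1/s$, which is the scalability problem of field switching in a single line of arithmetic.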


Furthermore, the applied magnetic field to switch the MTJ is not well confined and can affect MTJs in close proximity to the current carrying wires. The magnetic field from the current carrying wires in FIMS-based MRAM may accidentally switch the states of these MTJs. Based on the above considerations, it is widely accepted that FIMS-based MRAM is not scalable [23].

STT provides an alternative switching mechanism [14], [24]–[27] that overcomes the scalability issues in field-switched MRAM. Consider the case when electrons flow from PL to FL in an MTJ. Since magnetic layers are polarizers of electron spin, PL strongly polarizes the spin of incoming electrons in the direction of PL magnetization, $\mathbf{m}_P$. As the electrons tunnel across the barrier into FL, they exert a torque that aligns FL magnetization, $\mathbf{m}$, with $\mathbf{m}_P$. On the other hand, when electrons tunnel from FL to PL, the FL tries to polarize the incoming electrons with spins in the direction of the FL magnetization, $\mathbf{m}$. However, electrons with spin in the direction of $\mathbf{m}_P$ tunnel across the barrier easily and are removed from FL. Electrons with spin opposite to $\mathbf{m}_P$ cannot tunnel across the barrier easily and remain in FL, exerting a torque on $\mathbf{m}$. The torque exerted aligns $\mathbf{m}$ opposite to $\mathbf{m}_P$. Thus, the direction of current flow determines the switching direction (Fig. 1). Also, the $I_C$ for parallelizing FL with PL is lower since the spins of electrons injected into FL are mostly in the direction of $\mathbf{m}_P$. Electrons entering FL when anti-parallelizing FL with PL are not as well polarized and, hence, a larger $I_C$ is needed to anti-parallelize FL with PL.

A few words are in order so the scalability of STT may be understood. The magnetization dynamics of a mono-domain FL in an MTJ may be modeled using the Landau–Lifshitz–Gilbert–Slonczewski (LLGS) equation [28]

$$\frac{\partial \mathbf{m}}{\partial t} = -|\gamma|\, \mathbf{m} \times \mathbf{H}_{EFF} + \alpha\, \mathbf{m} \times \frac{\partial \mathbf{m}}{\partial t} + \boldsymbol{\tau}_{STT} \tag{7}$$

where $\gamma$ is the gyromagnetic ratio, $\alpha$ is Gilbert’s damping factor, and $\mathbf{H}_{EFF}$ is the effective magnetic field acting on the nano-magnet. Uniaxial anisotropy and demagnetizing fields are included in $\mathbf{H}_{EFF}$. Generally, the spin-transfer torque is given by [24], [25], [28]–[30]

$$\boldsymbol{\tau}_{STT} = |\gamma| \frac{\hbar}{2 e \mu_0 M_S V} I\, \mathbf{m} \times \left[ \eta\, (\mathbf{m}_P \times \mathbf{m}) - \eta'\, \mathbf{m}_P \right] \tag{8}$$

where $\hbar$ is the reduced Planck constant, and $e$ is the electronic charge. Also, $\eta$ and $\eta'$ model the Slonczewski-like and field-like torques, respectively, and may be functions of $\mathbf{m}$ and $\mathbf{m}_P$ [29].

Let us first consider the case where FL has only uniaxial magnetic anisotropy. Let us also simplify (8) by assuming $\eta$ to be constant and $\eta' = 0$. $\eta$ describes the degree to which the current flowing through the MTJ is spin-polarized and depends on the spin filtering effect discussed in Section II. To determine the condition for STT to switch the MTJ configuration, (8) is substituted into (7) and rearranged. Since only the damping torque needs to be considered to obtain the condition for STT to switch the MTJ configuration, the following condition is obtained:

$$\frac{\hbar \eta}{2 e \mu_0 M_S V} I \geq \alpha H_C. \tag{9}$$

Using (5) in (9) and rearranging, the critical current, $I_C$, for STT switching is given by

$$I_C = \frac{4 e \alpha E_B}{\hbar \eta}. \tag{10}$$

Hence, at iso-$E_B$, $I_C$ in STT-based switching can be constant when the MTJ size is scaled down, whereas it increases for FIMS-based MRAM. As such, STT-MRAM is more scalable as compared to FIMS-based MRAM.

The magnetization of the ferromagnetic layers in an MTJ can be made to have in-plane anisotropy (IMA) or perpendicular magnetic anisotropy (PMA) [14], [23], [27], [31]–[37]. As Fig. 2 shows, MTJs with IMA have thin film ferromagnetic layers whose uniaxial anisotropy easy axes are in the plane of the thin film layers [14], [31], [32]. The uniaxial anisotropy easy axes of thin film ferromagnetic layers in MTJs with PMA are perpendicular to the plane of the thin film layers [27], [33]–[35]. The shape of an MTJ with IMA determines its uniaxial anisotropy barrier. Magnetocrystalline anisotropy dominates in MTJs with PMA, and hence, the shape of the MTJ has less effect on the anisotropy energy. Thus, MTJs with PMA can have circular cross sections and are preferred to achieve maximum integration density. Furthermore, the uniaxial anisotropy field and demagnetizing field are collinear in MTJs with PMA, and the effective field STT needs to overcome during switching is given by (5). In MTJs with IMA, STT needs to overcome an effective magnetic field given by [28]

$$|\mathbf{H}_{EFF}| = \frac{2 E_B}{\mu_0 M_S V} + \frac{H_{Demag}}{2} \tag{11}$$

where the last term on the right-hand side models the demagnetizing field in the out-of-plane direction [28]. Since the effective field that STT needs to overcome during switching in MTJs with IMA is higher than that in MTJs with PMA, the $I_C$ of MTJs with IMA is higher
than that of MTJs with PMA having the same model parameters as summarized in Table 2. Hence, MTJs with PMA are expected to be able to achieve lower switching energy as compared to MTJs with IMA. Note that AR can be 1 for MTJs with PMA, making them suitable for higher integration density.

Fig. 2. Geometries of MTJ with IMA and with PMA. The aspect ratio (AR $= L_Y / L_X$, where $L_X$ and $L_Y$ are the lengths of the FL in the x- and y-directions, respectively) of the IMA-based MTJ can be used to engineer the shape anisotropy of the FL that determines $E_B$, whereas the thickness of the FL can be used to determine $E_B$ in PMA-based MTJ and the AR can be 1, which is better for integration density.

Table 2 Comparison of $I_C$ for MTJs With PMA and With IMA

A. Scalability of MTJs

STT-MRAM implemented using IMA and PMA-based MTJs were studied in the literature to understand the impact of scalability of STT-MRAM bit-cells [37], [38]. The STT-MRAM bit-cells compared in [37] and [38] were designed with the same retention time and $E_B$ and bit-cell layout styles. The bit-cells were also designed such that their read and write failure rates were approximately the same. The minimum lateral feature size of the MTJ is matched to that of the technology of the access transistor. The design specifications were also matched to those listed in the ITRS roadmap [39].

The scaling analysis performed in [37] and [38] matched the bit-cell write failure rates by ensuring that the voltages needed to switch IMA and PMA MTJs are the same. The resistance-area (RA) product of the MTJ, which corresponds to the thickness of the MTJ tunneling oxide and the reliability of the MTJ, needed to meet the design specifications was then determined. It was then shown that under the assumed design constraints, STT-MRAM with PMA-based MTJs outperforms STT-MRAM with IMA-based MTJs when the minimum lateral feature size is larger than approximately 22 nm. Below that, STT-MRAM with IMA-based MTJs was shown to be better than its PMA-based counterpart. In [38], the poor scalability of MTJs based on interfacial PMA is attributed to the difficulty in engineering the barrier height to meet thermal stability requirements. Furthermore, the $I_C$ of such PMA-based MTJs is unaffected by the scaling down of the physical dimensions of the MTJ [37], [40], [41], whereas that of IMA-based MTJs scales down with the physical dimensions of the MTJ [37]. Also, due to the larger cross-sectional area of MTJ with IMA compared to that with PMA, the resistance of the MTJ with IMA is sufficiently small such that the CMOS access transistors are able to meet the write current requirements at the corresponding supply voltage.

However, it was shown in [42] that when IMA-based MTJs are scaled down at iso-$E_B$, the induced shape anisotropy causes the easy axis of the FL to align perpendicular to the substrate plane, just like in PMA-based MTJ. Although the scaled IMA-based MTJs seem to be PMA, the thickness of the FL is too large. The mono-domain assumption may no longer be valid in this case. When electron current flows in MTJs with very thick FL, STT occurs mostly at the interface between the FL and the tunnel oxide. The charge current may completely lose spin-polarization in the FL, and as a result, STT may not be exerted on the magnetization of the rest of FL. Hence, PMA-based MTJs may be the only option for high-density STT-MRAM with large $E_B$ or retention time.

B. Thermal Effects in MTJs

Strictly speaking, $I_C$ calculated from (9) is the minimum current required to switch the FL in an MTJ. However, switching has been observed at currents less than $I_C$ because the switching process is stochastic [23], [43]–[46]. The randomness of the switching process is due to thermal effects that may be modeled as an effective magnetic field given by [47]–[49]

$$\mathbf{H}_{Thermal} = \mathbf{X} \sqrt{\frac{2 \alpha k_B T}{|\gamma| M_S V \Delta t}} \tag{12}$$

where $\mathbf{X}$ is a vector with components that are independent Gaussian random variables, $k_B$ is the Boltzmann
constant, $T$ is the temperature, and $\Delta t$ is the constant time-step used in the numerical simulation of the switching process. $\mathbf{H}_{Thermal}$ affects switching dynamics of MTJs in two ways. First, it affects the switching process because it is always present. Second, it affects the initial relative angle between $\mathbf{m}$ and $\mathbf{m}_P$. Depending on the operating regime, the impact of each on the switching process is different.

In the precessional switching regime (switching delay, $t_{SW} \lesssim 3$ ns), the distribution of $t_{SW}$ depends mostly on the initial angle, $\theta$, between $\mathbf{m}$ and $\mathbf{m}_P$. This may be described by [44]

$$t_{SW} \propto \frac{\ln\left(\frac{\pi}{2\theta}\right)}{I - I_{C0}} \tag{13}$$

where $I_{C0}$ is the critical switching current at the characteristic switching delay (usually 1 ns). The thermal agitations in $\theta$ may be modeled as a Boltzmann distribution

$$P(\theta) \propto \exp\left(\frac{E_B \cos^2\theta}{k_B T}\right). \tag{14}$$

The switching probability, $P_{SW}$, of the MTJ in the precessional switching regime may be estimated by combining (13) and (14) to yield [44]

$$P_{SW} \propto \exp\left[-\Delta\left(1 - \cos^2\phi\right)\right] \left(I - I_{C0}\right) \sin 2\phi \tag{15}$$

where $\Delta = E_B / k_B T$, $\phi = (\pi/2) \exp\left(-(2 \eta \mu_B / e M_S V)(I - I_{C0})\, t_{SW}\right)$, $\eta$ is the effective spin polarization of the charge current, and $\mu_B$ is the Bohr magneton. However, in the thermal activation regime ($t_{SW} \gtrsim 10$ ns), $P_{SW}$ is instead given by [46], [53]

$$P_{SW} = 1 - \exp\left(-\frac{t}{\tau} \exp\left[-\Delta\left(1 - \frac{I}{I_{C0}}\right)\right]\right) \tag{16}$$

where $\tau$ is the characteristic switching time (usually 1 ns). When current flows through the MTJ, Joule heating in the MTJ effectively reduces $\Delta$. Hence, thermally assisted MRAM that uses write current to heat the MTJ and operate STT-MRAM in the thermal activation regime has been proposed [54]–[56]. However, thermally assisted MRAM may be unsuitable for memory applications requiring write latency shorter than 10 ns.

IV. DESIGN OF SPIN-TRANSFER TORQUE MRAM BIT-CELLS

In this section, we will discuss the design of an STT-MRAM bit-cell. Its basic operations are presented along with the design issues that need to be overcome.

A. Basic Bit-Cell Design

The structure of the standard STT-MRAM bit-cell consists of an MTJ and an access transistor (ATx) connected as shown in Fig. 3 (known as 1T-1MTJ or 1T-1R STT-MRAM bit-cell) [32], [57]–[61]. The ATx’s allow selective access to the MTJs, the storage elements in the memory array, as Fig. 3 shows.

Fig. 3. (a) STT-MRAM bit-cell may be connected in the “top-pinned” or “bottom-pinned” configuration. (b) 2 × 2 array of bit-cells is shown. A row of cells is selected by charging the WL along that row to $V_{DD}$. The MTJs are represented as variable resistances. (c) Regardless of bit-cell configuration, the access transistor may become source degenerated during one of the write operations due to the need for bidirectional write current flow.

Let us consider the write operation being performed on the STT-MRAM bit-cell. The word line (WL) connected to the bit-cell is charged
to $V_{DD}$ so that the ATx of the bit-cell can pass current between the bit line (BL) and source line (SL) through the MTJ. If a “0” (or low resistance state, LRS) is to be written into the bit-cell, a current is passed from FL to PL so that the MTJ is programmed into P configuration. BL is charged to $V_{DD}$, and SL is discharged to GND so the MTJ is parallelized to P configuration. If instead a “1” (or high resistance state, HRS) is to be written into the bit-cell, the direction of current flow is reversed so that the MTJ is anti-parallelized into AP configuration. SL is charged to $V_{DD}$, and BL is discharged to GND so that the current flows from PL to FL. Note that $V_{DD}$ must be set such that the write current flowing through the MTJ ($I_{WR}$) is at least the critical currents, $I_C$(“0”) and $I_C$(“1”), needed to write a “0” and a “1,” respectively.

For read operation [62]–[65], the WL connected to the bit-cell that is selected for reading is charged to $V_{DD}$. In the voltage sensing scheme, a read current ($I_{READ}$) is passed through the bit-cell so that a voltage develops between BL and SL ($V_{CELL}$). A voltage sense amplifier (VSA) compares $V_{CELL}$ to a reference voltage, $V_{REF}$. The bit-cell stores HRS if $V_{CELL} > V_{REF}$, and the VSA outputs a “1.” Otherwise, the bit-cell stores LRS and the VSA outputs a “0.” In the current sensing scheme, the voltage across the BL and SL is fixed at a read voltage, $V_{READ}$. The current flowing through the bit-cell ($I_{CELL}$) is compared to a reference current, $I_{REF}$, using a current sense amplifier (CSA). The bit-cell stores an LRS if $I_{CELL} > I_{REF}$, and the CSA outputs a “0.” Otherwise, the bit-cell stores HRS and the CSA outputs a “1.” The sensing schemes just discussed are single-ended in nature. Also, note that current flows through the MTJ during read operations regardless of the sensing scheme used. Since write operations also occur by passing current through the MTJ, the current through the MTJ during read operations needs to be kept small to ensure that the bit-cell is not accidentally overwritten during read operations.

B. Design Issues of 1T-1MTJ STT-MRAM Bit-Cell

Just as in FIMS-based MRAM, the endurance and reliability of the MTJ are crucial for the functionality and performance of STT-MRAM. The important reliability issues in the MTJ are mainly due to time-dependent dielectric breakdown (TDDB) and time-dependent resistance drift (TDRD) [66]–[69]. Dielectric breakdown results in a sharp decrease in MTJ resistance and loss of magneto-resistance. TDDB manifests as an abrupt decrease in the MTJ resistance, whereas TDRD exhibits a gradual decrease in the MTJ resistance [66]. The occurrence of either failure leads to a decrease in the MTJ resistance, which in turn manifests as stuck-at-P faults. These reliability issues are mitigated by ensuring that the MTJ is not overly stressed. Hence, reducing the $I_C$ of the MTJ is desirable for reducing the stress on the MTJ and also for reducing the write energy consumption of STT-MRAM. This is also desirable for mitigating electromigration in the interconnect layers supplying current to the MTJ.

As the MTJ and the CMOS technology are scaled down, the magnetic layers of the MTJs are spaced closer together. Asymmetry may be introduced into the energy landscape of the free layer of an MTJ due to stray magnetic fields from the neighboring MTJs, as mentioned earlier in Section III. Note that from the perspective of the FL of the MTJ, the stray field from the PL of the MTJ may be stronger than stray fields from the neighboring MTJs due to the close proximity of the PL to the FL in the MTJ. The effective $E_B$ may reduce due to these stray fields, resulting in increased retention failures, which occur when thermal agitations flip the configuration of the MTJ in the absence of any input stimulus. The presence of stray magnetic field may also increase the asymmetry in $I_C$ that we discussed in Section III.

Failures in operation of the STT-MRAM bit-cell may also occur due to process variations in the MTJ and the ATx. Note that the thickness of the tunnel oxide ($t_{MgO}$) and the dimensions of FL may vary due to process variations. These variations lead to variations in the static and dynamic behavior of the MTJ and contribute to failures in STT-MRAM bit-cell operation. These failures may be classified as write failure, disturb failure, and decision failure [62], [63]. Write failures occur when $I_{WR} < I_C$. This may occur when the ATx resistance is too large or when $R_{MTJ}$ is too large because $t_{MgO}$ is too thick, or when variations in the MTJ result in a higher-than-expected $I_C$. Since current flows through the STT-MRAM bit-cell during the read operation, thermal effects may lead to disturb failures. Moreover, disturb failure may occur when the current flowing through the MTJ, $I_{MTJ}$, is more than $I_C$ during the read operation. This may occur due to a lower-than-expected $R_{MTJ}$ because $t_{MgO}$ is too thin, the variations in the MTJ causing $I_C$ to be lower than expected, or a lower ATx resistance due to process variations. Decision failures occur when the sense amplifier incorrectly senses $R_{MTJ}$, i.e., LRS is sensed as HRS or HRS is sensed as LRS. In the current sensing scheme described in Section IV-A, decision failure occurs when $I_{CELL} > I_{REF}$ for a bit-cell storing HRS and $I_{CELL} < I_{REF}$ for a bit-cell storing LRS. Decision failure in the voltage sensing scheme occurs when $V_{CELL} > V_{REF}$ for a bit-cell storing an LRS and $V_{CELL} < V_{REF}$ for a bit-cell storing an HRS. Generally, decision failures occur because process variations in the bit-cell lead to lower-than-expected bit-cell resistance ($R_{CELL}$) when it is storing an HRS, and higher-than-expected $R_{CELL}$ when it is storing an LRS. Thus, bit-cell parameters ($t_{MgO}$, FL dimensions, $V_{DD}$, $V_{READ}$ and $I_{REF}$ or $I_{READ}$ and $V_{REF}$, ATx width, etc.) must be carefully chosen to minimize bit-cell failures and to meet design targets.

However, there are many conflicting design requirements that STT-MRAM bit-cell designers face. For example, if $t_{MgO}$ is increased to improve the distinguishability between $R_P$ and $R_{AP}$ (MR ratio of an MTJ was
Vol. 104, No. 7, July 2016 | Proceedings of the IEEE 1455

Fong et al.: Spin-Transfer Torque Memories: Devices, Circuits, and Systems

experimentally observed to increase with t_MgO in [14], [70], and [71]), decision failures would reduce. However, R_MTJ would increase, and as a result, I_WR would reduce. Hence, if V_DD is not increased correspondingly, write failures would increase. Similar tradeoffs in varying the other bit-cell design parameters mean that the design space for STT-MRAM bit-cells may become extremely limited when designing for multiple constraints (power, performance, reliability, etc.).
The stochastic nature of spin-transfer torque also poses a design challenge. A relatively large I_C is required to achieve low write error rates even in the absence of process variations. The need for bidirectional write current in standard STT-MRAM bit-cells further exacerbates the problem. Consider the bit-cell bias for writing a “0” into the “bottom-pinned” bit-cell as shown in Fig. 3. Note that the terminal of ATx that is connected to SL is the source terminal of ATx and, hence, V_GS = V_DD for ATx. Now, consider the bit-cell bias for writing a “1” into the same bit-cell. Since the direction of I_WR is reversed, the terminal of ATx that is connected to the MTJ is the source terminal of ATx. Due to the voltage dividing action of the channel resistance of ATx (R_ATx) and R_MTJ, the voltage of the source terminal of ATx (V_S) is in the range 0 < V_S < V_DD. Hence, V_GS < V_DD, leading to a source degenerated ATx [72]. For proper memory operation, the bit-cell must be designed to satisfy write failure requirements for both writing a “0” and writing a “1.” When the “bottom-pinned” STT-MRAM bit-cell is designed to satisfy the write failure rate for writing a “1,” there may be excessive I_WR flowing through the bit-cell when writing a “0.” Hence, the reliability of the tunnel oxide in the MTJ may severely degrade, and excessive write power may dissipate. Furthermore, the source degeneration problem may be exacerbated by the asymmetry in I_C of the MTJ. Asymmetry in I_C has been observed in several experiments [33], [58], [73]–[75] and may be explained in terms of the dependence of spin-polarization efficiency on the direction of current flow [76], [77]. A “top-pinned” or “reverse-connected” STT-MRAM bit-cell, shown in Fig. 3, was proposed to mitigate the source degeneration problem [34], [58]–[60]. However, source degeneration is still unavoidable as Fig. 3 illustrates. Furthermore, the source degeneration effect becomes more pronounced when the switching voltage for the MTJ (I_C × R_MTJ) is increased. Consequently, the source degeneration effect can become a serious issue when designing high-performance STT-MRAM with large R_MTJ (such as when the MTJ cross-sectional area is scaled down).
Various research efforts have addressed the aforementioned design challenges at various levels of design abstraction. In the following sections, we present an extensive survey of the research work that addresses STT-MRAM design issues at the device, circuit, and memory architecture levels.

V. IMPROVING STT-MRAM AS THE STORAGE DEVICE
The MTJ stack may be designed to mitigate the design issues in STT-MRAM, which were discussed earlier. In this section, we will first look at the proposed improvements to the two-terminal MTJ stacks and discuss their advantages and limitations. This motivates our discussion on multiterminal MTJ structures, which have most of the advantages of two-terminal MTJ stacks while overcoming most of their limitations. We then present several physical phenomena that may be exploited to improve STT-MRAM performance. We will also discuss multilevel cells (MLCs), which potentially increase integration density while amortizing the cost per bit. Design issues of MLCs are also discussed.

A. Improving the Conventional MTJ Stack
As we have discussed earlier, the stray field from the PL of the MTJ may give rise to asymmetry in E_B and I_C of the FL in the MTJ. Note that the PL of the MTJ can be replaced with a synthetic anti-ferromagnet (SAF) structure [78]–[80] to reduce the stray field from the PL.
Reducing I_C is also highly desirable for improving the endurance, the reliability, and the write energy consumption. From (10), it is clear that I_C may be reduced by increasing the spin-polarization efficiency η or by reducing the damping factor α or the barrier height E_B. E_B is determined by application requirements and is expected to be equivalent to 10 years retention time for most memory applications. η may be improved by replacing the PL and FL of the MTJ with half-metallic Heusler alloys in which all conduction electrons are spin-polarized in one direction. Improved η may also lead to improved MR ratio [16]. However, it was found that interfacial states play an important role and may degrade the MR ratio [17].
An alternative approach to reduce I_C is to reduce the damping factor, α, of the FL of the MTJ [23]. Furthermore, the variance of H_Thermal is proportional to α as (12) shows. Thus, reducing α also has the desirable effect of reducing the variation in I_C due to thermal fluctuations in the FL magnetization. Experimental measurements [81]–[84] suggest that α of the FL depends on the thickness of the layer. Note, however, it is challenging to optimize the thickness of the FL layer to reduce α while ensuring that other magnetic properties are maintained. It has been shown that α can be enhanced with spin-pumping effect at the interface of FL with the neighboring capping/seed layer [85]. Insertion of a spin barrier between FL and the neighboring capping/seed layer suppresses the spin-pumping effect and may reduce α [86]. In the CoFeB/MgO/CoFeB-based MTJ with interfacial PMA, sandwiching a CoFeB free layer between MgO layers also increases the strength of interfacial PMA [86], [87]. The desirable effect in such a stack is that thermal stability can be increased without increase in I_C [86].
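As a rough illustration of the retention requirement mentioned above, the following sketch converts a target retention time into a thermal stability factor Δ = E_B/(k_B T) using the Néel–Arrhenius relation τ = τ0 exp(Δ). The relation itself is standard, but the attempt time τ0 = 1 ns, the 300 K operating temperature, and the function names are assumptions made for this example and are not taken from the paper. The estimate is for a single bit; meeting the same target across a large array would need a larger Δ.

```python
import math

K_B = 1.380649e-23                   # Boltzmann constant, J/K
ELECTRON_VOLT = 1.602176634e-19      # J per eV
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def required_delta(retention_s, tau0_s=1e-9):
    """Thermal stability factor from tau = tau0 * exp(delta).

    tau0_s (attempt time) is an assumed value, not a number from the paper.
    """
    return math.log(retention_s / tau0_s)

def barrier_height_ev(delta, temperature_k=300.0):
    """Energy barrier E_B = delta * k_B * T, expressed in eV."""
    return delta * K_B * temperature_k / ELECTRON_VOLT

if __name__ == "__main__":
    delta = required_delta(10 * SECONDS_PER_YEAR)
    print(f"delta for 10-year single-bit retention: {delta:.1f}")
    print(f"E_B at 300 K: {barrier_height_ev(delta):.2f} eV")
```

Because Δ enters the retention time exponentially, a modest change in E_B shifts retention by orders of magnitude, which is why E_B is treated as fixed by the application while η and α are the parameters to optimize.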
Other improvements to the MTJ stack have also been conceptually proposed in the literature. Some of these structures have been experimentally demonstrated as well. The following sections present several alternative MTJ structures that are suitable for memory applications.

B. Alternative Two-Terminal Devices

Four types of two-terminal MTJ structures for improving the standard STT-MRAM bit-cell were investigated in [88]. These structures may be classified as single-barrier and double-barrier structures, where the double-barrier structure consists of two PLs and tunneling barriers as shown in Fig. 4. The double-barrier structure enhances the spin filtering effect and results in a lower I_C as compared to the single-barrier MTJ structure [88]. Note that the PLs in double-barrier structures are anti-parallel to each other, and since the easy axis of the FL is collinear with that of the PL, the FL will be parallel to one of the PLs. If the thicknesses of both tunneling barriers are the same, the resistance of the double-barrier structure will be the same regardless of whether FL is parallel to the top PL or to the bottom PL. Hence, the thicknesses of the tunneling barriers must be different so that the state of the FL may be electrically sensed.
The device structures in Fig. 4 may be further classified into structures with and without synthetic anti-ferromagnetic FL (SAFF). The SAFF structure consists of a pair of magnetically coupled ferromagnetic (FM) layers sandwiching a thin paramagnetic (Ru) layer. The anti-ferromagnetic (AFM) coupling between FM layers in SAFF increases the immunity of each FM layer to thermal fluctuations. Furthermore, the I_C of SAFF may not increase even though its thermal stability is increased because each FM layer in the SAFF stack acts as a spin polarizer for the other FM layer [88]. Consider the flow of electrons from PL through the tunneling barrier and then through the SAFF structure. The spin direction of electrons is initially aligned with the magnetization of PL. When they enter FL1, they exert torque on the magnetization of FL1 and their spin directions are flipped to the magnetization direction of FL1. The torque exerted aligns the magnetization of FL1 with that of PL. The electrons then enter FL2 through the Ru layer and exert torque on the magnetization of FL2. The torque exerted on the magnetization of FL2 aligns the magnetization of FL2 with that of FL1. Hence, the spin torque exerted by the electrons on one FL is opposite of that exerted on the other FL.

Fig. 4. Two terminal MTJ stacks that have been proposed in the literature include (a) the single- and double-barrier stacks, which may have ferromagnetic free layer or synthetic anti-ferromagnetic free layer, and (b) the MTJ with tilted magnetic anisotropy or the orthogonal MTJ stack.

1) MTJ With Tilted Magnetic Anisotropy: It was observed that the STT-MRAM switching delay distribution in the precession regime is determined by the initial angle due to thermal fluctuations [23], [89], [90]. When the PL and FL easy axes are close to collinear, the initial torque exerted on the FL during switching, as given in (8), is very small. The FL magnetization has to deviate sufficiently away from the easy axis before the spin-transfer torque is able to switch it. To overcome this, MTJ with tilted magnetic anisotropy (TMA, Fig. 4), in which the easy axis of the PL is not collinear with that of the FL, was proposed [90]. Thus, the initial torque exerted on the FL during switching is significantly increased, and the I_C is reduced for a fixed switching time. Hence, the write energy per bit is also reduced. However, the technique may degrade the MR ratio and, hence, the readability of the MTJ structure. R_MTJ is given by [91]

\[ R_{MTJ} = \frac{2 R_P R_{AP}}{(R_{AP} + R_P) + (R_{AP} - R_P)\cos\theta} \tag{17} \]


where θ is the angle between the FL and the PL magnetizations. Consider the magnetization of the FL (m) as the reference, which may be in +ẑ- or −ẑ-directions. In MTJs without TMA, the magnetization of PL (m_P) is fixed at say +ẑ. Hence, m · m_P = cos θ = +1 or −1, and either R_MTJ = R_P or R_MTJ = R_AP. In MTJs with TMA, m_P is not fixed at +ẑ or at −ẑ with respect to the FL magnetization. Consider if m_P is close to but not at +ẑ. Hence, m · m_P < 1 when m = +ẑ, and m · m_P > −1 when m = −ẑ. This means that in the MTJ with TMA, since R_MTJ > R_P when m = +ẑ and R_MTJ < R_AP when m = −ẑ, the MR ratio is lower than that in MTJs without TMA. Furthermore, the lower I_C tightens the constraints on read current to meet sensing speed and read-disturb failure requirements.

2) Orthogonal MTJ Stack: The orthogonal MTJ stack is another two-terminal structure that was proposed to reduce the I_C of the MTJ stack without sacrificing its readability [92]–[95]. The structure, as shown in Fig. 4, consists of a free layer sandwiched between a tunneling oxide layer and a metallic spacer layer (Cu). The easy axis of PL1 is collinear with the easy axis of the FL, whereas that of PL2 is perpendicular to the easy axis of the FL. Two sources of torque are exerted on the FL when current flows through the orthogonal MTJ stack. Switching of the FL is initiated by the torque acting on the FL due to PL2, which is at its maximum initially. Since PL2 is separated from the FL by a metallic spacer, electrons passing between FL and PL2 are not strongly spin-polarized. Hence, the torque exerted due to PL2 is only able to destabilize the FL to initiate switching, but unable to fully align the FL magnetization with that of PL2. Once the FL is destabilized, the torque exerted by PL1 increases to a point it is able to align the FL magnetization with that of PL1. Hence, the incubation and switching delays of the orthogonal MTJ are shorter than those of MTJs with IMA or PMA [94]. The resistance of the orthogonal MTJ stack is dominated by the tunneling resistance across the tunneling oxide barrier. Since the easy axes of FL and PL1 are collinear, R_MTJ is determined by the relative magnetizations of FL and PL1. Hence, R_MTJ ≈ R_P when m = +ẑ and R_MTJ ≈ R_AP when m = −ẑ, and the degradation in MR ratio is negligible, unlike in MTJs with TMA. However, just like in MTJs with TMA, the reduced I_C tightens the constraints on read current to meet sensing speed and read-disturb failure requirements.

Fig. 5. Sample of read currents through 10^4 bit-cells where the voltage across each memory cell is V_READ. Bit-cell parameters are shown in the table. Each point shows the read current for a bit-cell when it is storing AP (I_READ,AP) and when it is storing P (I_READ,P) normalized to the reference current, I_REF. Bit-cells falling below the red line and to the right of the magenta line cannot be correctly sensed by the sense amplifier. Self-referencing read schemes that leverage the correlation of I_READ,AP and I_READ,P are needed to reduce sensing failures.

3) Design Issues of STT-MRAM Based on Two-Terminal Devices: The design of two-terminal spin devices requires the bit-cell designer to trade off between write, retention, read-decision, and read-disturb failures. For example, we may want to reduce I_C by reducing E_B to reduce write failures. However, lowering E_B also increases retention failures and reduces the allowable I_RD to meet read-disturb failure requirements. A lower I_RD may limit the maximum achievable sensing performance. Alternatively, t_MgO can be reduced to achieve higher I_WR. However, reduction in t_MgO reduces the MR ratio of the MTJ and, hence, degrades the distinguishability of the bit-cell as well as the reliability of the tunneling oxide [14], [37]. These design tradeoffs arise because the read and write current paths are shared in two-terminal STT-MRAM devices.
The two-terminal nature of the aforementioned STT-MRAM devices also leads to challenging design issues at the circuit level. One of the design issues is the source degeneration problem discussed in Section IV-B. The single-ended nature of sensing operations is another critical design issue in STT-MRAM. Consider the scatter plot shown in Fig. 5, where each point corresponds to the read current of an STT-MRAM bit-cell when it stores LRS and when it stores HRS. In single-ended current sensing, data stored in the STT-MRAM bit-cell is sensed by comparing the read current through the bit-cell to I_REF as discussed earlier. The cells falling to the right of the vertical magenta line will always be sensed as LRS or “0,” and those cells falling below the horizontal red line will always be sensed as HRS or “1.” Although I_REF may be carefully chosen to minimize sensing failures, sensing failures may not be completely eliminated unless the read currents between “0” and “1” are sufficiently separated (i.e., a larger MR ratio is needed to compensate for higher process variations). One way to mitigate this problem in single-ended sensing is to employ self-referencing [31], [96]. Self-referencing allows the sensing scheme to exploit correlations in the bit-cell read currents to reduce sensing failures. However, as we will discuss in Section VI-G, self-referenced single-ended sensing may become ineffective in future STT-MRAM.
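To make the link between the MR ratio and read-current separation concrete, the sketch below evaluates (17) for a PL whose magnetization is tilted away from the FL easy axis, as in the MTJ with TMA, and reports the resulting MR ratio and the two read currents at a fixed read voltage. The resistance values, the read voltage, the tilt angles, and all function names are hypothetical choices for illustration only.

```python
import math

def r_mtj(theta_rad, r_p, r_ap):
    """Angular dependence of the MTJ resistance, following (17)."""
    return 2.0 * r_p * r_ap / ((r_ap + r_p) + (r_ap - r_p) * math.cos(theta_rad))

def tilted_states(tilt_deg, r_p, r_ap):
    """Resistances of the two stored states when the PL magnetization is
    tilted by tilt_deg from the FL easy axis (theta = tilt or 180 deg - tilt)."""
    tilt = math.radians(tilt_deg)
    return r_mtj(tilt, r_p, r_ap), r_mtj(math.pi - tilt, r_p, r_ap)

def mr_ratio(r_low, r_high):
    """MR ratio defined as (R_high - R_low) / R_low."""
    return (r_high - r_low) / r_low

if __name__ == "__main__":
    R_P, R_AP, V_READ = 2e3, 5e3, 0.1   # hypothetical values (ohm, ohm, volt)
    for tilt_deg in (0, 10, 20, 30):
        r_low, r_high = tilted_states(tilt_deg, R_P, R_AP)
        gap_ua = (V_READ / r_low - V_READ / r_high) * 1e6
        print(f"tilt={tilt_deg:2d} deg  MR={mr_ratio(r_low, r_high):.2f}  "
              f"read-current gap={gap_ua:.1f} uA")
```

With zero tilt the two states reduce to R_P and R_AP. Any nonzero tilt raises the low resistance and lowers the high resistance, so the window available for placing I_REF between the two read currents shrinks, which is the readability penalty described for TMA.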


Fig. 6. DPMTJ-based STT-MRAM bit-cell proposed (a) in [97] and (b) in [98]. (c) Write operations require write current (I_WR) through the write port, whereas (d) read operations occur by passing read current (I_RD) through the read port.

Furthermore, complementary data are not available in every bit-cell and, hence, differential sensing cannot be used. Unlike single-ended sensing, differential sensing is able to achieve faster read operation due to larger sensing margins enabled by storing complementary data. An STT-MRAM design that enables differential sensing is presented in Section VI-F. However, such designs are not self-referencing since separate MTJs are used to store complementary data. As a result, process variations can lead to significant read-decision failures. Hence, the lack of variation-tolerant self-referenced differential sensing limits the readability of two-terminal STT-MRAM devices.
Since the key obstacle is the two-terminal nature of the STT-MRAM devices, multiterminal devices that are able to mitigate the read/write stability conflicts have been proposed. As we will see in the following sections, STT-MRAM bit-cells based on multiterminal devices may require multiple access transistors. However, the requirements on the sizing of the access transistors for multiterminal devices are significantly relaxed. Hence, the bit-cell layout area for multiterminal devices can be smaller than those based on two-terminal devices despite the need for multiple access transistors per bit-cell. In Section V-C, we explore different genres of multiterminal devices and show how they are able to mitigate or completely avoid the design issues in two-terminal STT-MRAM devices.

C. Multiterminal Devices
Multiterminal devices mitigate the aforementioned design issues by decoupling read/write current paths [97]–[101], avoiding the source degeneration problem and enabling self-referenced differential sensing operations [102]–[105].

1) Dual-Pillar MTJ Structure (DPMTJ): Fig. 6 illustrates the DPMTJ proposed in [97]–[99]. A key feature of the proposed DPMTJs is that they have separate ports for read and write.
The DPMTJ proposed in [97] looks like a spin-valve and a tunneling junction with a shared free layer as Fig. 6(a) shows. It has a single free layer in contact with two metallic contacts (one of which is Cu while the other is Cr/Au) on the top and a tunneling oxide barrier on the bottom. A pinned layer sits below the tunneling oxide, and another pinned layer sits on top of the Cu contact. Write operations require sufficient current through the low-resistance path between the Cr/Au electrode and the spin-valve structure. Read operations involve sensing current through the high-resistance path between the Cr/Au electrode and the tunneling junction.
Since the DPMTJ in [97] uses a spin-valve in the write current path, the spin-polarization efficiency of the write current is low. Hence, an alternative DPMTJ structure, which uses a tunnel junction in the write current path, shown in Fig. 6(b), was proposed [98], [99]. The read and write operations occur instead by passing current between BL and RSL and between BL and WSL, respectively.
A key advantage of DPMTJ structures is that their read and write current paths are separate. Hence, the DPMTJs may be optimized for read and for write independently. The resistance of the write current path can be reduced to mitigate the source degeneration problem discussed in Section IV-B. This is achieved using a Cu spacer in [97], and a thin tunnel oxide layer for the write port in [98]. The distinguishability of the stored states in the DPMTJ can be improved by increasing the thickness of the tunnel oxide layer in the read ports shown in Fig. 6. This may also reduce the amount of read current needed for sensing the stored state, which helps reduce read disturb failures. Another advantage of DPMTJ is that the high write current never flows through the tunnel oxide used for sensing the DPMTJ state. Hence, the reliability of the tunnel oxide for sensing the stored state is significantly improved.
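The benefit of a low-resistance write path can be illustrated with the voltage-divider picture used for source degeneration in Section IV-B. The sketch below is a first-order estimate only: it treats the access transistor as a fixed channel resistance R_ATx in series with the write path, whereas in a real device R_ATx itself rises as V_GS falls. The function name and all numerical values are hypothetical.

```python
def source_degenerated_bias(v_dd, r_atx, r_path):
    """Bias of an access transistor whose source terminal faces the storage
    element, with gate and drain at v_dd and the far end of the write path
    grounded.

    Simplifying assumption (not from the paper): the channel behaves as a
    fixed resistance r_atx, independent of V_GS.
    Returns (V_S, V_GS, I_WR).
    """
    v_s = v_dd * r_path / (r_atx + r_path)   # voltage-divider action
    v_gs = v_dd - v_s                        # gate overdrive lost to V_S
    i_wr = v_dd / (r_atx + r_path)           # resulting write current
    return v_s, v_gs, i_wr

if __name__ == "__main__":
    V_DD, R_ATX = 1.0, 2e3                   # hypothetical values
    for label, r_path in (("tunnel-junction write path", 4e3),
                          ("low-resistance write path", 5e2)):
        v_s, v_gs, i_wr = source_degenerated_bias(V_DD, R_ATX, r_path)
        print(f"{label}: V_S={v_s:.2f} V, V_GS={v_gs:.2f} V, "
              f"I_WR={i_wr * 1e6:.0f} uA")
```

Under these assumptions, lowering the write-path resistance raises V_GS toward V_DD and increases the available write current for the same access transistor, which is consistent with the relaxed transistor sizing noted for multiterminal devices.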


Fig. 7. DWMTJ state is (a) represented by the magnetization of the region just under the read port. (b) Read and write current paths are decoupled just like in the DPMTJ.

2) Domain-Wall-Based MTJ (DWMTJ) as Storage Element: The DWMTJ structure is another multiterminal MTJ structure that has been proposed (Fig. 7) [100], [101]. It consists of a domain-wall stripe with complementary polarized pinned layers at the ends (i.e., the pinned layer magnetizations are completely opposite) and a free region between the pinned layers. A tunnel oxide is deposited on top of the free region followed by another pinned layer.
The current paths for read and write operations are shown in Fig. 7(b). Write operations require sufficient current between BL and WSL, whereas read operations occur by passing current between RSL and BL through the tunnel oxide. DWMTJ has all the advantages of the DPMTJ: separation of read and write current paths, low resistance in the write current path to mitigate source degeneration, and improved distinguishability, read-disturb failure rate, and tunnel oxide reliability. However, the read operations of DPMTJ and DWMTJ are single-ended in nature. Single-ended read operations sense data stored in multiple cells by comparing with a common reference. The reference needs to be carefully chosen to minimize sensing failures in the presence of process variations. Read operations that use self-referenced differential sensing do not require a common reference, and hence are more robust than read operations using single-ended sensing. A multiterminal MTJ structure that enables self-referenced differential sensing for read operations while preserving most advantages of DPMTJ and DWMTJ is presented next.

Fig. 8. CPMTJ-based STT-MRAM bit-cell using PMA ferromagnetic layers proposed (a) in [102] and (b) in [103]. (c) Write operations occur by steering write current through the complementary pinned layers via either SLL or SLR. (d) Read operation requires read currents, I_READ, through both pinned layers and comparing them to determine the state of the stored data.

3) Complementary Polarizer MTJ (CPMTJ): Another flavor of multiterminal MTJ structure is the CPMTJ structure shown in Fig. 8 [102]–[105]. CPMTJ is similar to DPMTJ in that they look like two tunnel junctions with a shared free layer. In CPMTJ, the magnetizations of the pinned layers are complementary. As Fig. 8(c) shows, write operations require sufficient write current from BL to either SLL or SLR. Hence, the free-layer magnetization is always parallelized with the magnetization of one of the pinned layers. Read operations occur by first applying a common voltage to SLL and SLR and another voltage to BL. The respective currents flowing through SLL and SLR are different since the free-layer magnetization is parallel to one pinned layer and anti-parallel to the other. Hence, the FL magnetization may be sensed by comparing the respective currents flowing through SLL and SLR.
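The contrast between sensing against a common reference and comparing the two branch currents of one cell can be sketched with a small Monte Carlo experiment. The model below is a deliberate idealization and not the authors' method: each cell has one parallel and one anti-parallel junction whose resistances share a single multiplicative variation factor, on the assumption that two junctions built on the same free layer vary together. The lognormal spread, nominal resistances, read voltage, and function names are hypothetical.

```python
import random

def branch_currents(r_p, r_ap, v_read, scale):
    """Read currents of the P and AP junctions of one cell. Both junctions
    share one multiplicative resistance variation `scale` (an assumption)."""
    return v_read / (r_p * scale), v_read / (r_ap * scale)

def single_ended_errors(cells, i_ref):
    """Misreads when each stored state is compared with a common I_REF."""
    errors = 0
    for i_p, i_ap in cells:
        errors += int(i_p <= i_ref)    # P (LRS) should exceed the reference
        errors += int(i_ap >= i_ref)   # AP (HRS) should fall below it
    return errors

def differential_errors(cells):
    """Misreads when the two branch currents of a cell are compared directly."""
    return sum(int(i_p <= i_ap) for i_p, i_ap in cells)

if __name__ == "__main__":
    R_P, R_AP, V_READ = 2e3, 4e3, 0.1        # hypothetical nominal values
    rng = random.Random(0)                   # fixed seed for repeatability
    scales = [rng.lognormvariate(0.0, 0.3) for _ in range(10_000)]
    cells = [branch_currents(R_P, R_AP, V_READ, s) for s in scales]
    i_ref = 0.5 * (V_READ / R_P + V_READ / R_AP)   # midpoint reference
    print("single-ended misreads:", single_ended_errors(cells, i_ref))
    print("differential misreads:", differential_errors(cells))
```

In this idealized model the differential comparison never fails because the shared variation cancels, while the common reference misreads cells whose resistance drifts far from nominal. Real devices also have uncorrelated variation between the two junctions, so the differential scheme would reduce rather than eliminate failures.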
CPMTJ completely avoids source degeneration and uses only the parallelizing of FL with PL for write operations. Note that such a parallelizing operation usually requires lower I_C. Thus, the write power is reduced. The read operation is also variation-tolerant due to the self-referencing and differential nature of sensing. Furthermore, latch-based sense amplifiers may be used for sensing the state of CPMTJ, which improves sensing delay. Reduction in sensing delay also reduces read-disturb failures.
Although the multiterminal devices presented in this section mitigate design issues in STT-MRAM by decoupling read and write current paths, or avoiding source degeneration and enabling self-referencing differential read operations, the write currents are still driven through a tunneling oxide. Hence, the voltage required to supply the required I_C may still be quite high. This may lead to reliability issues with the tunnel oxide barrier in the write current path. Also, the inability to reduce the voltage needed for write means the write energy consumption remains relatively high. In the following sections, we will present some of the new physical mechanisms for STT-MRAM write operations, which may achieve much lower write energy either by reducing the resistance along the write current path or by eliminating the need for write current.

D. Alternate Physics
As mentioned earlier, the high write energy of two-terminal STT-MRAM poses a significant design challenge. Several experiments have revealed new physics that may be exploited to overcome this design issue. In this section, we will discuss three of these experiments and consider STT-MRAM bit-cells that exploit these physical phenomena.

Fig. 9. (a) Nonlocal spin torque effect was observed in the LSVs. Injection of charge current via the injection port modulates the open circuit voltage at the detector port. (b) Nonlocal spin-transfer torque (NLSTT) MRAM is based on the LSV. The read current path (I_READ) is separate from the write current (I_CHARGE) path.

1) Nonlocal Spin-Transfer Torque: When current flows between an injector ferromagnetic contact to ground in a lateral spin valve (LSV) as shown in Fig. 9, one would expect that the open-circuit voltage, V_O, on the detector ferromagnetic contact is the same as the channel since no electrical current is being driven through the detector. However, the experiments in [106] and [107] show that V_O depends on the magnetization direction of the detector relative to that of the injector. It was further shown in [106] that at low temperatures, the magnetization of the detector may be switched if sufficient current is injected via the injector. However, the magnitude of switching current in [106] is insufficient to generate magnetic fields large enough to switch the detector magnetization. It has been theorized that the injection of charge current into the channel results in a spin accumulation in the channel underneath the injector [108]. The ferromagnetic nature of the injector filters the spins of electrons that make up the charge current. When electrons are injected into the channel via the injector, there is an accumulation of electrons with spin in the same direction as the injector magnetization. Similarly, when the direction of electron flow is reversed, electrons with spin in the same direction as the injector are removed from the channel, leaving an excess of electrons with opposite spin. Along the channel, the difference in spin population decays when measured further and further away from the injector, which may be modeled as a spin potential. Valet and Fert modeled the transport of up-spin and down-spin electrons using the drift-diffusion equation [108]. It can then be shown that the aforementioned phenomenon leads to the injection of spin currents (I_SPIN) in the channel, which do not have to coincide with the direction of charge current flow. The spin current interacts with the detector and can exert a nonlocal spin-transfer torque on its magnetization.
The nonlocal spin-transfer torque MRAM (NLSTT-MRAM) shown in Fig. 9(b) exploits nonlocal spin-transfer torque for programming the free layer [109], [110]. Like DPMTJ, the read and write current paths are separate as illustrated by Fig. 9(c). The write current flows between PL0 and PL1 through the metallic channel. The metallic nature of the write current path effectively mitigates the source degeneration of the write access transistor. Consider the case when the write current is flowing from PL1 to the PL0. Majority of the electrons


injected into the metallic channel are spin-aligned with the magnetization of PL0. Furthermore, electrons that have spin-aligned with the magnetization of PL1 are easily removed from the metallic channel. As a result, there is a large concentration of electrons with spins that are aligned with the magnetization of PL0. This results in a large gradient in the concentration of each spin species if the FL magnetization is aligned with that of PL1. As discussed previously, this concentration gradient leads to spin diffusion that has zero charge current flow but a net spin current flow. The spin current is absorbed by FL, and a nonlocal spin torque is exerted on FL that aligns its magnetization with that of PL0. Reversal of the current direction between PL0 and PL1 reverses the direction in which the magnetization of FL switches. The magnetization of FL is sensed by passing a small read current through the tunnel junction formed by FL and PL2, as shown by Fig. 9(d), just as in the read operation of the 1T-1MTJ STT-MRAM bit-cell.
In order to achieve low write energy in NLSTT-MRAM, one must ensure that the charge to spin current conversion is sufficiently strong. This requires sufficiently strong spin filtering effect at the interfaces between PL1 and PL0 with the nonmagnetic metallic channel. Also, spin scattering effects in the nonmagnetic metallic channel need to be minimized. These two issues are key in ensuring that the ratio of majority electron spin to minority electron spin in the nonmagnetic metallic channel is high enough, which is required for efficient charge to spin current conversion and low write energy.

Fig. 10. (a) Spin Hall effect, where the electron spin and magnetization directions of magnetic layers are shown. (b) Basic SOT-MRAM cell with the magnetization arrows drawn for clarity. Note that the read and write current paths are also decoupled. (c) SOT-MRAM bit-cell uses a pair of MTJs to store complementary data and enable self-referenced differential read operations.

2) Spin-Orbit Torque: Unlike the spin torques discussed earlier, spin-orbit interaction (SOI) can be much more efficient at generating spin current from charge current. For example, it was recently found that spin-orbit torque (SOT) is exerted on a ferromagnet (FM) deposited on a heavy metal (HM) when charge current is passed through the HM layer [111]–[113]. Two possible origins of SOT are the Rashba field due to structural inversion asymmetry in FM/HM interfaces [114]–[120] and the spin Hall effect (SHE) in HM [116], [120], [121]. In the case of SHE, the spin current flow generated is orthogonal to the current flow, and the spin direction of the electrons is orthogonal to the direction of spin current and charge current flow. The spin current may be injected into FM just like in NLSTT. HMs such as β-Ta [111], β-W [122], and Pt [123] have been identified as candidate materials that may be used to generate spin current via SHE.
Spin current due to SHE may be understood by considering the illustration in Fig. 10(a). When charge current flows in HM, SOI causes the accumulation of one type of electron spin on one surface of HM, whereas the electrons of the opposite spin accumulate on the opposite surface. Note that the accumulation is orthogonal to charge current flow. SHE generates a spin current that can be absorbed by the FL and exerts a torque on the FL magnetization. However, electrons that enter HM from FL experience strong spin-scattering upon entry. Being close to the FM/HM interface, these electrons may enter FL after scattering. Fig. 10(a) shows that an electron from FL may spin-scatter into the opposite spin and go back into FL. Furthermore, the process can repeat many times along the direction of charge current flow, and hence, the same electron can transfer many units of angular momentum to FL. Thus, the ratio of spin current injected into FM to the charge current flowing through HM may be significantly larger than 1 (this ratio is equivalent to the spin-polarization efficiency in the spin torques exploited in the structures discussed in Section V-B and C). Hence, materials with large SOI provide a means to reduce the write energy of STT-MRAM by enhancing the efficiency of spin current generation.
The basic structure of the spin-orbit torque MRAM (SOT-MRAM) is shown in Fig. 10(b) [111], [124]. It consists of an MTJ with its free layer in contact with the HM. Note that the FM has IMA so that it may be switched using SOT. However, techniques have been proposed to enable switching of MTJs with PMA using SOT [125]–[127]. In Fig. 10, the current flows in the x-direction. The direction of spins injected into FL is either in +y- or −y-direction depending on the sign of spin Hall angle and direction of current flow in the HM. Hence, write operations are performed by passing current through the HM. The state of the MTJ is sensed by the read current through the MTJ as shown in Fig. 10(b). Note that the reliability of the MTJ is improved since small read current flows across the tunneling oxide only during read operations.
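The observation that the spin-to-charge current ratio can exceed 1 may be illustrated with a commonly used first-order geometric estimate, which is an assumption of this sketch and is not stated in the paper: the spin current density is the spin Hall angle times the charge current density, the spin current is collected over the FL footprint, and the charge current flows through the HM cross section. When the FL and HM have the same width, the ratio reduces to θ_SH × L / t, where L is the FL length along the current direction and t is the HM thickness. The spin Hall angle and dimensions below are hypothetical.

```python
def spin_to_charge_ratio(spin_hall_angle, fl_length_m, hm_thickness_m):
    """First-order estimate I_S / I_C = theta_SH * (A_FL / A_HM).

    Assumes (for illustration only) that the FL and the HM strip share the
    same width, so the area ratio reduces to fl_length_m / hm_thickness_m.
    """
    return spin_hall_angle * fl_length_m / hm_thickness_m

if __name__ == "__main__":
    theta_sh = 0.3     # hypothetical spin Hall angle
    fl_length = 60e-9  # hypothetical FL length along the current, m
    for hm_thickness in (2e-9, 3e-9, 6e-9):
        ratio = spin_to_charge_ratio(theta_sh, fl_length, hm_thickness)
        print(f"t_HM = {hm_thickness * 1e9:.0f} nm -> I_S/I_C = {ratio:.1f}")
```

Under this estimate, a thin HM strip beneath a long FL yields a ratio well above 1 even for a spin Hall angle below 1, which matches the qualitative picture of one electron transferring angular momentum to the FL many times as it travels along the HM.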


Since electron spins accumulate on both sides of the HM, a pair of MTJs may be switched by the same write current. Fig. 10(c) shows a memory bit-cell with two MTJs. In such a bit-cell, the read operation compares I_RD,R to I_RD,L so as to determine the stored state, thus enabling differential read operation.
Note that SOT may also be used to reduce the write current of other MTJ structures, such as the DWMTJ structure discussed in Section V-C.2. A Co domain wall stripe grown on Pt and capped with AlOx was investigated in [113]. The results corroborate the theory that the current density needed to move the domain wall in such multilayers with an HM underlayer is much lower than without the HM layer [128]–[130]. Furthermore, the dependence of the domain wall velocity on the direction of current flow depends on the choice of HM [131].

magnetic material but not the alignment of its magnetization along the easy axis [134]–[138]. A separate mechanism is required to switch the MTJ between P and AP configurations. Several proposals that exploit this mechanism for memory applications achieve the desired switching of the MTJ state by using the strain as an assist for spin-transfer torque [138] or for current-induced domain wall motion [136]. However, it remains unclear whether the integration of such MTJ structures into the back-end-of-line (BEOL) of the CMOS fabrication process severely degrades the efficacy of strain transfer between the piezoelectric and magnetic materials. Nevertheless, the control of magnetization using voltage-controlled strain is promising for achieving ultra-low write energy STT-MRAM.
E. Multilevel Cells
3) Voltage-Controlled Magnetic Anisotropy (VCMA): It The integration density of STT-MRAM may be in-
was experimentally observed that the free layer in a creased by storing multiple bits in each memory bit-cell.
Ta/CoFeB/MgO/CoFeB/Ta MTJ stack may be switched This may be achieved by including two or more MTJs in
by an applied voltage in the presence of a magnetic field each bit-cell [139]–[142]. The single MTJ in the single-
[132]. Furthermore, the current density through the MTJ level cell (SLC)-type STT-MRAM may be replaced by se-
at the switching voltage is about 1:2  102 A=cm2 , which ries or parallel connections of two MTJs, as shown in
is much smaller than predicted by the Slonczewski Fig. 11, to implement MLC. Each MTJ stores 1 bit of
model of spin-transfer torque. Further investigation data, and the structures shown in Fig. 11 enable each
found that the magnetic anisotropy field of the MTJ is MLC to store 2 bits. However, the dimensions of the
modulated by the electric field due to the applied voltage MTJs are deliberately different so that four resistance
[133]. Note that the effect was only observed in MTJs in states may be achieved. Just as in the SLC, either voltage
which the PMA effect is due to interfacial effects at the or current sensing may be used to determine its resis-
oxide-ferromagnet interface [132]. Furthermore, since tance state.
the applied electric field only modulates the magnetic an- The write operation of MLC STT-MRAM is more
isotropy of the ferromagnet, a magnetic field is needed to complicated than its SLC counterpart. First, the MTJs in
deterministically switch the MTJ. This may pose a prob- MLC are engineered such that they have different IC ’s.
lem in dense STT-MRAM arrays where stray fields from Consider the operation to store “01.” The MTJ with the
nearby ferromagnetic layers may affect the switching. larger IC is programmed first (storing “0” in Fig. 11).
The MTJ with the smaller IC is then programmed in
4) Strain-Induced Magnetization Change: An alternative the second step. Note that the write latency of MLC is
scheme to achieve voltage control of magnetization data-dependent and can be larger than that in SLC.
without the need for an applied magnetic field uses When the MTJ with the larger IC is being written into,
strain-induced magnetization reversal [134]–[138]. Some the other MTJ is also written. The write operation may
magnetic materials exhibit the inverse magnetostriction be terminated after the first step if the final state of both
effect [134], whereby stresses applied to the material MTJs matches the data that is to be stored into them.
change its atomic structure, which determines the easy Otherwise, a second write step is needed to correctly
axes of the magnetic material. Since the lattice structure program the MTJ with the smaller IC .
of piezoelectric materials such as lead-zirconate-titanate The MLC bit-cells suffer from the same failures as
(PZT) may be modulated by an electric field, direct cou- their SLC counterpart. In addition, write disturb failure
pling of piezoelectric materials to magnetic materials ex- may occur when the direction of write current needs to
hibit inverse magnetostriction effect, enabling voltage be reversed to program the MLC bit-cell. When the MTJ
control of magnetization without the need of an applied with the smaller IC is being programmed in the second
magnetic field. Thus, a magnetic storage element exploit- write step, current also flows through the other MTJ and
ing this mechanism may be implemented as an MTJ with may overwrite its state, which was programmed in the
its FL adjacent to a piezoelectric material. The FL mate- first write step. Hence, MLC STT-MRAM needs to be
rial needs to exhibit the inverse magnetostriction effect, carefully designed to mitigate write disturb failures.
and the interface of FL with the tunneling oxide barrier Moreover, the resistance difference between states needs
needs to be engineered to provide good MR ratio. How- to be large enough so that the sensing circuitry is able to
ever, the induced strain only rotates the easy axis of the distinguish all four states.
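
The two-step write sequence and the four resistance states of a series-connected MLC can be sketched with a simple behavioral model. This is only an illustration under stated assumptions: switching is treated as deterministic (an MTJ flips whenever the current magnitude reaches its IC), positive current is assumed to write “1,” and all names and numeric values (`MTJ`, `write_s_mlc`, the critical currents and resistances in arbitrary units) are hypothetical rather than taken from [139]–[142].

```python
from dataclasses import dataclass


@dataclass
class MTJ:
    ic: float     # critical switching current magnitude (arbitrary units)
    r_p: float    # resistance in the parallel state, storing 0
    r_ap: float   # resistance in the antiparallel state, storing 1
    bit: int = 0

    def drive(self, current):
        # Assumed polarity: positive current writes 1, negative writes 0.
        if abs(current) >= self.ic:
            self.bit = 1 if current > 0 else 0

    def resistance(self):
        return self.r_ap if self.bit else self.r_p


def write_s_mlc(hard, soft, hard_bit, soft_bit):
    """Two-step write of a series MLC; returns the number of steps used."""
    assert hard.ic > soft.ic, "MTJs must have distinct critical currents"

    def signed(magnitude, bit):
        return magnitude if bit else -magnitude

    # Step 1: a large current exceeds both IC's, so both MTJs take hard_bit.
    for mtj in (hard, soft):
        mtj.drive(signed(1.2 * hard.ic, hard_bit))
    steps = 1
    # Step 2 (only if needed): a current between the two IC's flips the soft MTJ.
    if soft.bit != soft_bit:
        i_small = 0.5 * (soft.ic + hard.ic)
        for mtj in (hard, soft):
            mtj.drive(signed(i_small, soft_bit))
        steps = 2
    return steps


hard = MTJ(ic=100.0, r_p=1.0, r_ap=2.0)
soft = MTJ(ic=50.0, r_p=2.0, r_ap=4.0)
levels = {}
for hard_bit in (0, 1):
    for soft_bit in (0, 1):
        n_steps = write_s_mlc(hard, soft, hard_bit, soft_bit)
        levels[(hard_bit, soft_bit)] = (n_steps, hard.resistance() + soft.resistance())
```

In this toy model the data-dependent latency shows up as one step for “00” and “11” and two steps for “01” and “10.” A write disturb would correspond to the second-step current reaching the larger IC, which the model avoids only because the small current is chosen strictly between the two critical currents.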


Fig. 11. MLC STT-MRAM bit-cells based on (a) series connection of MTJs, called S-MLC, and on (b) parallel connection of MTJs, called
P-MLC. In the first write step, a large write current (IWR) is driven through the bit-cell so that identical data are written to both MTJs.
If the data to be written to each bit are different, the second write step is initiated, driving a small write current to write the MTJ
with the smaller IC.

1) Racetrack Memory: Racetrack memory (RTM), proposed in 2008 by IBM, is an MLC based on the
domain wall (DW) physics discussed in Section V-C.2 [143]. Its key advantage is extreme
integration density. It consists of a write port, a read port, a shift port, and a long domain
wall stripe, in which DWs are formed to separate magnetic domains of opposite magnetizations
(Fig. 12). The domain wall stripe (DWS) is segmented so that each segment has a uniform
magnetization, which represents the data bit stored. A DW is present between neighboring
segments with opposite magnetization directions. Notches may be used to create regions of
strong pinning potential along the DWS so that DWs can be stabilized [144]. Neighboring
segments store data bits that are logically opposite (identical) if a DW is present (absent)
between them.

Fig. 12. Schematic diagrams of racetrack memory programmed using (a) magnetic field and (b) STT.
The magnetization of each segment stores 1 bit of data, and DWs form between segments with
opposite magnetization. Segments with different colors have opposite magnetization. SHFT port
allows bits to be shifted in and out of the read and write ports, which are accessed via RSL
and RWL, and WSL, WWL, and WBL, respectively. An MTJ in the read port allows data to be read out.

The write port of the racetrack memory consists of a nano-wire that overlaps and runs
perpendicular to the domain wall stripe. A current pulse injected through the write port
generates a localized magnetic field, HWR, that manipulates the magnetization of the segment
of the DWS overlapping the write port. Alternatively, an MTJ may also be used to inject
spin-polarized electrons into the DWS so as to manipulate the magnetization of the segment
using STT. A DW may be created or annihilated at the boundaries of the segment during the
writing process. On the other hand, the read port consists of an MTJ formed using a PL, a
tunnel oxide, and one segment of the DWS as shown in Fig. 12. Hence, the resistance of the
read port depends on the magnetic configuration of the MTJ.

In RTM, the memory address of the data stored in each segment of the DWS is variable. A shift
port placed at the ends of the DWS controls the sequential shifting of data bits into and out
of the read and write ports. Current injected through the shift port moves DWs along the DWS
using the STT effect. The number of data bits that can be stored in the DWS depends on its
length. If vertical integration is possible, large numbers of bits may be stored in each DWS.
Furthermore, only three ATx's are needed per RTM bit-cell, and extreme integration density may
be achieved.

An important design issue in RTM is that under process variations, it may be very difficult to
uniformly control the motion of all DWs in the DWS [143]. The functionality of RTM depends on
the ability to shift DWs in unison. Consider the case when a DW is being shifted in a PMA-type
DWS as shown in Fig. 13(a). Assume that the pinning potential at B is increased due to process
variations. Hence, the shift current needed to dislodge DW from B is much larger than usual.
When DWs are being shifted in the direction shown in Fig. 13(a), the separation between DWs is
decreased.


Hence, 1 bit of data in the DWS is annihilated. Now consider the case when the direction of DW
shifting is reversed as shown in Fig. 13(b). The DW at B is too strongly pinned and remains
pinned while the other domain wall shifts to A. Since the separation between the DWs has
increased, a logical data bit has been unintentionally created in the DWS. Since the creation
and annihilation process can occur simultaneously at multiple sites within a DWS, the data
stored in an RTM bit-cell may be corrupted during shift operations. Furthermore, the problem
is exacerbated by the fact that the relationship between DW velocity and shift current depends
on many factors [144]–[147]. Moreover, access to individual memory locations requires the
shifting of DWs so that the corresponding memory location is shifted into the read port. This
requires additional hardware and also introduces variable access latency to RTM accesses.

Fig. 13. Shifting of DWs along a DWS can result in unintentional (a) annihilation and (b)
creation of stored data bits.

2) Spin-Orbit Torque-Based MLC: Just as in the SLC case discussed earlier, spin-orbit torque
may be exploited to reduce the write energy in MLC STT-MRAM. Two flavors of MLC SOT-MRAM were
proposed in [148]. Similar to the MLC STT-MRAM discussed earlier, the MLC SOT-MRAM may be
designed with parallel or series connected MTJs (called P-MSOT and S-MSOT, respectively) as
shown in Fig. 14. Also, the MTJs are designed with different IC's such that the stored states
may be sensed as distinct resistances [148]. A multistep sensing operation may be needed to
read out the data stored in MLC SOT-MRAM.

Write operations in MLC SOT-MRAM are multistep in nature just like in MLC STT-MRAM. In the
P-MSOT design, write current is passed through the heavy metal (HM). Hence, low write energy
may be achieved because: 1) the IC's are low due to efficient spin-orbit torque generation;
and 2) the voltage needed to induce the write current in HM is very low due to the high
conductivity of the write current path. In the S-MSOT design, HM is in contact with the FL of
only one of the two MTJs. Write current needs to flow through the stack of MTJs and through HM
to program both MTJs. As a result, the write energy may be significantly higher compared to
the P-MSOT design. However, the write current may be passed through both current paths
simultaneously to achieve single-cycle latency write operation. Hence, the P-MSOT design is
more suitable for cache applications in which low write energy is crucial, whereas the S-MSOT
design is more suitable for cache applications that require fast write operations [148].
Furthermore, large write currents do not flow through the tunnel oxides in the P-MSOT design,
which improves the reliability of the MTJs.

Fig. 14. (a) MLC storage elements using SOT, (b) equivalent circuit network modeling of the storage elements, (c) bit-cell schematics,
and (d) bit-cell bias conditions for read and write operations.

3) Multilevel Domain Wall Memory: MLC based on DWMTJ may be implemented as illustrated in
Fig. 15(a) [149]. Compared to RTM, DWMTJ-based MLC uses two read ports. Conceptually, its
structure consists of three


pinned magnetic regions along the stripe: one at each end of the stripe and one in the middle.
The magnetizations of the pinned region at the ends are the same but opposite to that of the
pinned region in the middle of the stripe. Note that this may be implemented by connecting
together the ends of two separate DWMTJs as shown in Fig. 15(b). Data are stored as the
location of the DWs along the stripe(s), similar to the domain wall memory (DWM) discussed in
Section V-C.2. The geometry of the domain wall stripes is engineered such that the IC's to
move the DW in each stripe are different. Also, the stripe may be deposited on HM to reduce IC
by exploiting SOT.

Fig. 15. (a) Conceptual design of MLC-type DWMTJ and its corresponding write current path, and
(b) a possible implementation of MLC-type DWM and its corresponding write current path.

Data stored in MLC-DWM is read out as the resistance formed by the read ports of the domain
wall stripes connected in parallel. Hence, the resistance of each read port needs to be
designed such that the stored states in the MLC-DWM may be sensed as distinct resistances
[149]. The write operations of the MLC-DWM are also multistep in nature. The cell is
programmed by first writing the bit with the larger IC followed by writing the bit with the
lower IC. The write steps are similar to those in MLC STT-MRAM and MLC SOT-MRAM.

Compared to RTM, the integration density of MLC-DWM is much lower, but the access latencies in
MLC-DWM are much more predictable. Furthermore, additional peripheral circuitry is needed to
track the address of the memory bit in the read and write ports of RTM. This is not needed in
MLC-DWM since both bits can be accessed simultaneously.

Before discussing circuit and architecture level design techniques for improving STT-MRAM, let
us briefly summarize the STT-MRAM storage device design techniques presented in this section.
Table 3 summarizes the key advantages and disadvantages of the design techniques for improving
the two-terminal MTJ stack. The two-terminal nature of the MTJ stack is desirable for
achieving arrays with high bit density. The interfacial PMA-based MTJ with dual MgO/FL
interface is promising due to its enhanced interfacial anisotropy and reduced α, which reduces
write energy as well as the randomness of spin-transfer torque switching. However, the
two-terminal nature of the MTJ does not resolve the conflicting requirements for improving
read and write operations. Multiterminal STT-MRAM storage device structures such as the
DPMTJ, DWMTJ and CPMTJ were proposed to alleviate this problem. However, the intrinsic IC may
still be high. Alternative physical mechanisms for programming the MTJ, such as nonlocal
spin-transfer torque effect, spin-orbit torque effect, voltage-controlled magnetic anisotropy,
and strain-induced magnetization reversal, may be promising solutions for overcoming the
STT-MRAM storage device design issues. Finally, we discussed multilevel cells, which amortize
the implementation overhead among the total number of bits stored per bit-cell. Large bit
densities may be achievable by integrating RTM in three dimensions. Note that some of these
techniques may be used in conjunction with others (such as spin-orbit torque with
current-induced domain wall motion) to get improved performance.

VI. CIRCUIT TECHNIQUES FOR IMPROVING STT-MRAM
In Section V, various device design techniques were discussed to address the challenges
associated with STT-MRAM. However, it may be costly and difficult to incorporate some of these
device design modifications into the fabrication process. Alternatively, circuit-level design
techniques may be used to alleviate some of the problems described earlier. In this section,
we will present several circuit-level STT-MRAM design techniques. Some of these techniques
improve STT-MRAM by embedding new functionality in the STT-MRAM array at near-zero cost. The
other design techniques discussed in this section optimize the STT-MRAM bit-cell topology
and/or peripheral circuitry to improve reliability. The attractiveness of these techniques is
that they may be cost-effective since the basic MTJ structure is not modified. Let us first
consider some possibilities of unique functionality that can be easily added to the STT-MRAM
array.

A. Embedding New Functionality in STT-MRAM Arrays
A unique characteristic of resistive RAM technology, and especially STT-MRAM, is that the
layout of the memory array can be exploited to embed new functionality in the array. For
example, an additional bit line can be added into the STT-MRAM array at no layout


area cost. The extra bit line can be utilized to embed read-only memory (ROM) [150], [151]. In
the following sections, we will briefly discuss three proposals whereby additional
functionality is embedded in the STT-MRAM array so that it may be used simultaneously as RAM
and as a functional block. The attractiveness of such an approach is that by implementing the
additional functionality in the STT-MRAM array, the system-level cost of adding the function
may be significantly lowered.

Table 3 Summary of STT-MRAM Storage Device Design Techniques

1) Embedding Read-Only Memory in STT-MRAM: ROM functionality may be embedded in an STT-MRAM
array by utilizing an additional bit line [150], [151]. RAM data are stored as the magnetic
configuration of the MTJ in the bit-cell, whereas ROM data are stored as the connection of the
bit-cell to one of the two available bit lines (BL0 and BL1, see Fig. 16). If the bit-cell is
connected to BL0, the ROM data stored in it is “0,” and if the bit-cell is connected to BL1,
the ROM data stored in it is “1.” The peripheral circuitry needs to be modified to support
both RAM and ROM functionality as shown in Fig. 16. Referring to Fig. 17(a), a pair of pass
transistors allow BL0 and BL1 to be electrically connected for RAM operations, and
electrically decoupled for ROM read operations. Also, a latch-based sense amplifier may be
used to sense the ROM data as Fig. 17(b) shows.

Note that the additional circuits do not need to consume a significant amount of area. The
pass transistors


need to be sized so that sufficient current may flow through the bit-cells during RAM
operations. Also, the sense amplifier for ROM read operations can be very small due to the
large sensing margin for ROM sensing. As Fig. 17(b) shows, when ROM data are being read, one
of BL0 and BL1 will be floating while the other is electrically connected to SL via the
selected bit-cell. Hence, the transistors in the ROM sense amplifier may be minimum-sized.
Since the array area increase due to the additional transistors is amortized over the
bit-cells connected to the same column, the area penalty is negligible.

Fig. 16. Column of STT-MRAM array that has ROM embedded in it. Additional peripheral circuits
needed to support the ROM and RAM functionality are also shown.

Fig. 17. Current paths during (a) RAM operations and during (b) ROM operations are controlled
by the pass transistors as shown.

2) STT-MRAM-Based Physically Unclonable Function: The variations in performance between
STT-MRAM bit-cells may also be exploited to implement a memory-based physically unclonable
function (MemPUF) [152]–[154]. The input and output of the MemPUF are called its challenge and
response, respectively. When a MemPUF is presented with a challenge, it produces a chip-unique
response and hence may be used for hardware security applications. A simple STT-MRAM-based
MemPUF may be implemented as follows. Ordinarily, the STT-MRAM array functions as RAM. When
MemPUF mode is required, data in the bit-cells selected by the challenge to generate the
response are first copied to a buffer storage. The data in the bit-cells are then overwritten
with identical values. Next, the bit-cells are grouped in pairs. The bits constituting the
response are then generated by comparing the resistances of the bit-cells in each group. Due
to process variations, the resistances of the bit-cells differ from one another, and if
pairing is done properly, the result of the comparison in each group is random between groups
(also termed the randomness criterion). Since the process variation is unique for each
STT-MRAM array, the response to the challenge is unique to the array (also termed the
uniqueness criterion), and the response to the same challenge is expected to be repeatable
(also termed the stability criterion). Fig. 18(a) shows the schematic of an STT-MRAM array
that can also function as a MemPUF. The flowchart in Fig. 18(b) illustrates the operation flow
when the MemPUF functionality is required.

Several design issues need to be overcome to fulfill the randomness, uniqueness, and stability
criteria for STT-MRAM-based MemPUFs as discussed in [152]–[154]. For example, read failures
may cause stability issues in an STT-MRAM-based MemPUF. An automatic writeback scheme was
proposed in [153] to improve the stability of the STT-MRAM-based MemPUF. Interestingly, the
proposed MemPUF may also be implemented in other nonvolatile memory technologies [152].

3) Generating Random Numbers Using STT-MRAM: The stochastic nature of spin-transfer torque
switching in the MTJ may be used to generate random numbers [155]–[159]. This is extremely
useful not just as a secure hardware primitive for cryptography and other hardware security
applications, but also for high performance computing applications such as Monte Carlo
analysis. Fig. 19 shows the flowchart for an STT-MRAM-based true random number generator
(TRNG). In the STT-MRAM array that can function as a TRNG, the write drivers are designed so
that the currents supplied to the bit-cells during the RAM and the TRNG modes are different.
In the RAM mode, the current supplied is large enough so that the state of the MTJ is
deterministically programmed. When a random number is needed from the array, the memory
switches to the TRNG mode. Data in a row of STT-MRAM bit-cells is first copied into a buffer
storage. Next, bit-cells in the row storing “0” and “1” are stochastically overwritten with
“1” and “0,” respectively.
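
The TRNG-mode steps (buffer the row, stochastically overwrite it, then sense it) can be mimicked in software with a small behavioral sketch. It is not a device model: the stochastic write is replaced by a pseudo-random draw, the per-cell switching probability `p_switch` is an assumed parameter, restoring the buffered data afterwards is an assumption about how the buffer is used, and the function name `trng_read_row` is hypothetical.

```python
import random


def trng_read_row(row, p_switch=0.5, rng=None):
    """Behavioral sketch of one TRNG-mode access to a row of bit-cells.

    row      -- list of 0/1 values currently stored in the row
    p_switch -- assumed probability that the weak write flips a cell
    rng      -- optional random.Random instance (stands in for STT noise)
    Returns (random_bits, restored_row).
    """
    rng = rng or random.Random()
    buffer = list(row)                      # 1) copy the row to buffer storage
    cells = [bit ^ 1 if rng.random() < p_switch else bit
             for bit in row]                # 2) stochastic overwrite attempt
    random_bits = list(cells)               # 3) sense the resulting states
    restored_row = list(buffer)             # 4) write the buffered data back (assumed)
    return random_bits, restored_row


stored = [0, 1, 1, 0, 1, 0, 0, 1]
bits, restored = trng_read_row(stored, rng=random.Random(2016))
```

With `p_switch` equal to 0.5 each sensed bit is equally likely to be 0 or 1 regardless of the stored data; any deviation of the real per-cell switching probability from that value would bias the output, which a software model of this kind cannot capture.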


The write current supplied during this time is such that the probability of overwriting each
MTJ is 50%. After the stochastic write process has completed, the state of the bit-cells is
sensed to determine the random number. Note that the write current giving exactly 50%
probability of switching may be different for every MTJ under process variations. Strong
post-processing [160]–[162] may be required to relax the design requirements on the write
drivers while ensuring that the random numbers generated by the STT-MRAM are of sufficiently
high quality.

Fig. 18. (a) Schematic showing how an STT-MRAM array consisting of a data array and a
reference array, which are identical 1T-1M bit-cell arrays, may be used as a physically
unclonable function (PUF). (b) Flowchart shows how the PUF functionality may be used without
overwriting the data stored in the STT-MRAM array.

Fig. 19. Flowchart illustrating how TRNG functionality may be realized in an STT-MRAM array.

B. Write Assist Techniques
As was shown in [63], the minimum achievable layout area of STT-MRAM bit-cells may be limited
by the need to meet requirements for successful write operations. Hence, there is a need for
write assist techniques to improve the write-ability of STT-MRAM with minimal impact on its
readability. Write assist techniques with minimal impact on read failures have been proposed
[64]. These techniques either increase the current flowing through the bit-cell during write
operations (write current, IWR), or reduce the critical switching current (IC) of the STT-MRAM
bit-cell. IWR may be increased by: 1) increasing the voltage applied to the word line;
2) increasing the voltage difference between the bit line and the source line of the bit-cell
being written into; and 3) forward biasing the body of the NMOS access transistor (Tx) to
reduce its threshold voltage, VTN [64]. On-chip current carrying wires may also be used to
generate magnetic fields that can reduce the IC of the MTJ [64].

1) Word Line Voltage Boosting: In some transistor technologies, the gate voltage may be
boosted such that VGS > VDD. For bit-cells implemented with such transistor technologies, the
word line voltage may be boosted during write operations to lower the Tx resistance and
increase IWR [163]. In the “bottom-pinned” bit-cell shown in Fig. 3, the MTJ in AP may have
such a large resistance that IWR through the bit-cell falls below IC. By boosting the word
line voltage, IWR can be increased. This technique is also used to improve the write-ability
of high-performance SRAMs.

Boosting the word line voltage may degrade the reliability of the access Tx. However, since
write operations in memory occur infrequently, boosting the word line voltage during write may
have little impact on the reliability of the access Tx. An issue with word line boosting in
SRAM design is that SRAM cells in the unselected columns along the same row are usually put in
the half-select mode [164] and may be susceptible to disturb failures if word line boosting is
implemented. However, STT-MRAMs do not have such an issue since the bit and source lines of
unselected columns may be discharged to GND.
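
The three ways of raising IWR listed above can be compared with a first-order resistive sketch. It assumes the bit-cell is an MTJ resistance in series with an access transistor whose on-resistance is approximated as 1/(k·(VWL − VT)), and it ignores source degeneration and all second-order device effects. The function name and every numeric value (k, VT, resistances, voltages, IC) are hypothetical and chosen only to make the trend visible.

```python
def write_current(v_bl, v_sl, v_wl, r_mtj, k=1e-3, v_t=0.4):
    """First-order write current through a 1T-1MTJ bit-cell (toy model).

    v_bl, v_sl -- bit line and source line voltages (V)
    v_wl       -- word line voltage applied to the access Tx gate (V)
    r_mtj      -- MTJ resistance in its present state (ohm)
    k          -- assumed transconductance parameter of the Tx (A/V^2)
    v_t        -- assumed threshold voltage of the Tx (V)
    """
    overdrive = v_wl - v_t
    if overdrive <= 0:
        return 0.0                      # access transistor is off
    r_tx = 1.0 / (k * overdrive)        # linear-region on-resistance
    return abs(v_bl - v_sl) / (r_mtj + r_tx)


R_AP = 4000.0        # hypothetical antiparallel resistance (ohm)
I_C = 190e-6         # hypothetical critical switching current (A)

i_nominal = write_current(1.0, 0.0, v_wl=1.0, r_mtj=R_AP)
i_wl_boost = write_current(1.0, 0.0, v_wl=1.4, r_mtj=R_AP)            # word line boosting
i_body_bias = write_current(1.0, 0.0, v_wl=1.0, r_mtj=R_AP, v_t=0.3)  # lower VT
i_v_boost = write_current(1.2, 0.0, v_wl=1.0, r_mtj=R_AP)             # larger |VBL - VSL|
```

With these assumed numbers the nominal current falls short of IC while the boosted word line brings it above IC. The same function shows that lowering VT or raising |VBL − VSL| also increases the current, although by how much depends entirely on the assumed parameters.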


2) Transistor Body-Biasing: IWR may be increased without increasing the width of the access Tx
by lowering the threshold voltage (VT) of the Tx. This may be achieved by applying a voltage
to the body of the access Tx. When interdie variation is the dominant component of variation,
a single bias to the NMOS body can be sufficient in improving the failure probability of
bit-cells on the die. Forward biasing the Tx body may increase the leakage currents in the
cell, resulting in increased power dissipation. The area penalty for implementing this
technique comes only from circuitry for generation of body bias and not from the bit-cells.
Note, however, if the body bias is also applied during read operations, the read-disturb
failures may increase. Finally, the effectiveness of body-biasing to manipulate transistor
threshold voltage has decreased considerably in scaled CMOS technologies [165].

3) Write Voltage Boosting: As mentioned earlier, write operations in STT-MRAM bit-cells
require voltages to be applied to the bit, source, and word lines. The word line controls the
gate of the access Tx and controls the current flow through the bit-cell. When the access Tx
is turned on, the bit and source line voltages determine the amount and direction of current
flow through the bit-cell. Hence, IWR may also be increased by increasing |VBL − VSL|.

In order to implement the boosted |VBL − VSL|, an additional voltage plane may be required
along with voltage level converters in the bit and source line drivers. Furthermore, higher
write voltage increases electrical stress on the MgO barrier and may lead to reliability
issues [69], [166], [167]. As was mentioned in Section VI-B.1, write operations might not
occur frequently and, hence, |VBL − VSL| may be boosted without significantly degrading the
MgO reliability.

4) Applied External Magnetic Field: It was shown that variations in switching delay of an MTJ
may be due to the thermal fluctuations that cause the magnetization of the FL to become
noncollinear with the magnetization of the PL [89], [90]. When the FL and PL magnetizations
are almost collinear, very little spin-transfer torque is generated. Thermal fluctuations must
perturb the FL magnetization such that FL and PL magnetizations are less collinear in order
for larger spin-transfer torque to be generated when current flows through the MTJ. Hence,
there is an incubation delay before the spin torque exerted on the FL becomes large enough to
switch the FL magnetization.

The incubation period may be reduced by applying a small magnetic field that tilts the FL
magnetization towards its hard axis, as was proposed in [168]–[170]. Compared to Tx body
biasing, word line voltage boosting, and write voltage boosting techniques, where IWR is
increased and may degrade the reliability of the MTJ, the applied magnetic field effectively
reduces the IC of the MTJ. However, the strength of the magnetic field required depends on the
critical field of the FL (HC), which is given in (5) and (11) for MTJs with PMA and IMA,
respectively. Moreover, on-chip current carrying wires are required to generate the assist
magnetic field, which may disturb other cells near the wire. Also, as was found in [64], the
timing of the assist magnetic field and IWR is crucial in reducing IC.

C. Write Current Optimization in STT-MRAMs
As discussed in Section IV, an important design issue to overcome in STT-MRAMs is the large
IC. The stochastic nature of spin-transfer torque switching and the source degeneration
problem exacerbate the problem. Due to the need for bidirectional write currents, the width of
the access Tx needs to be sized such that the currents flowing through the MTJ (whether
flowing from BL to SL or from SL to BL) during write operations meet the corresponding IC. A
larger write current needs to be supplied to overcome the stochastic nature of spin-transfer
torque switching. The source degeneration issue worsens write energy dissipation. Larger write
current also degrades the reliability of the tunneling oxide in the MTJ, which is crucial for
achieving high MR ratio.

One proposed circuit technique to optimize STT-MRAM write current pulse is to turn off the
pulse once the write operation is complete [171], [172]. A “self-enable” circuitry tracks the
resistance of the STT-MRAM bit-cell during write operations. Once the write operation to the
bit-cell is detected to be complete, the circuitry turns off the write driver. Hence, “just
enough” write current is supplied to write data into the STT-MRAM bit-cell.

Alternatively, the write current flowing through the STT-MRAM bit-cell may be optimized by
controlling VBL − VSL so that just enough current flows from BL to SL and from SL to BL during
write operations. However, an additional voltage plane may be required. We next discuss two
alternative circuit design techniques that were proposed in [173] to optimize the write
current in STT-MRAMs.

1) Bit Line Voltage Clamping: The conventional write drivers in STT-MRAM may use transmission
gates to drive the bit and source lines as shown in Fig. 20. Transmission gates allow the bit
and source lines to be driven close to VDD and to GND. However, write operations require only
the source line to be strongly driven to VDD or GND. The bit line may be driven to a voltage
smaller than VDD to save write energy. Since VSL is driven to GND when current flows from BL
to SL, the current flow may be limited by ensuring that VBL is not fully driven to VDD. This
may be achieved by driving the bit line using a pass Tx instead of a transmission gate
(Fig. 20). Source degeneration of the pass Tx is exploited to limit the current flow from BL
to SL [173]. When current flows from

1470 Proceedings of the IEEE | Vol. 104, No. 7, July 2016

Fong et al.: Spin-Transfer Torque Memories: Devices, Circuits, and Systems

SL to BL, the pass Tx is not source degenerated, and maximum write current flows.

Fig. 20. By eliminating the PMOS transistor in the transmission gate within the bit-line write driver, (a) the bit-line voltage may still be pulled to GND when current is being passed from SL to BL. (b) Source degeneration of the pass transistor within the bit-line write driver may be used to limit the write current and save write energy.
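
As a rough first-order illustration of bit-line voltage clamping, the sketch below treats the write path (access Tx plus MTJ) as a single series resistance and assumes that an NMOS pass Tx can raise the bit line only to about VDD − Vth, whereas a transmission gate passes the full VDD. The function names and all supply, threshold, and resistance values are hypothetical placeholders chosen for illustration; they are not taken from [173].

```python
def bl_high_level(vdd, vth, driver):
    """First-order estimate of the highest bit-line voltage the driver delivers."""
    if driver == "tgate":
        # Transmission gate: passes the full rail.
        return vdd
    if driver == "pass":
        # NMOS pass Tx (assumption): output clamps near VDD - Vth.
        return max(vdd - vth, 0.0)
    raise ValueError(f"unknown driver: {driver}")


def write_current(vdd, vth, r_path, direction, driver):
    """Write current through a lumped series path resistance r_path (ohms)."""
    if direction == "BL_to_SL":
        # SL is at GND, so the drive voltage is whatever reaches BL.
        v_drive = bl_high_level(vdd, vth, driver)
    elif direction == "SL_to_BL":
        # SL is strongly driven to VDD and BL is pulled to GND by the NMOS,
        # so the pass Tx does not limit the current in this direction.
        v_drive = vdd
    else:
        raise ValueError(f"unknown direction: {direction}")
    return v_drive / r_path


if __name__ == "__main__":
    VDD, VTH, R_PATH = 1.0, 0.3, 5e3  # hypothetical values
    for driver in ("tgate", "pass"):
        for direction in ("BL_to_SL", "SL_to_BL"):
            i_wr = write_current(VDD, VTH, R_PATH, direction, driver)
            print(f"{driver:5s} {direction}: {i_wr * 1e6:.0f} uA")
```

In this simplified model only the BL-to-SL current is reduced by the pass Tx, which mirrors the asymmetry described in the text; a real design would need circuit simulation to capture the actual degeneration behavior.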

2) 2T-1MTJ STT-MRAM Bit-Cell With Dual Source Lines: Another method to limit the write current flowing through the MTJ is to use a 2T-1MTJ STT-MRAM bit-cell topology with dual source lines (Fig. 21). The voltages of BL, SL1, and WL for write operations are identical to those of BL, SL, and WL, respectively, of the 1T-1MTJ STT-MRAM bit-cell topology. When maximum write current is required to flow through the MTJ, the voltage of SL2 is the same as that of SL1. When the current flowing through the MTJ needs to be limited, the voltage of SL2 is complementary to that of SL1 as Fig. 21 shows. As [173] explains, the transistor M2 serves as a bypass transistor to limit the write current flowing through the MTJ.

Fig. 21. Using the 2T-1MTJ dual source-line technique, (a) both transistors are turned on to maximize current flowing through the MTJ during write operations. (b) M2 allows current to bypass the MTJ when current is being sunk through SL1.

Fig. 22. (a) STT-MRAM array is rearranged in (b) the common-source-line scheme so that every column only has its own bit line. (c) Timing of the write operation needs to be modified to occur in two steps. Low and high voltage levels correspond to GND and VDD, respectively.

D. Common-Source-Line and Balanced Write Design
A 1T-1MTJ STT-MRAM array is usually designed such that the word line used to control the access Tx of the bit-cells is shared across the row of bit-cells [174], [175]. Since there are only two voltage planes (VDD and GND), every column of bit-cells needs its own set of bit line and source line drivers so that the data may be independently written into the bit-cells in the row. Hence, additional spacing is needed to separate neighboring columns, which in turn degrades the achievable integration density of STT-MRAMs. A common-source-line (CSL) 1T-1MTJ STT-MRAM bit-cell design (Fig. 22) in which all source lines in the array are electrically connected is proposed in [176] to overcome this limitation. The CSL design is enabled by the key insight that for read operations, all source lines in the array are driven to the same voltage. Furthermore, write operations occur by applying voltages on the bit line and source line to pass current through the selected bit-cells. In the CSL design proposed in [176], read operations are the same as those in the STT-MRAM design with separate source lines. The write operation is modified into two steps as follows (see Fig. 22). During the entire write operation, the word line is driven to VDD and the bit lines are driven to VDD or GND depending on the data to be written into the corresponding column. During the first step of the write operation, the common source line is driven
to GND. Current flows through the bit-cells that are connected to bit lines that are driven to VDD, and the corresponding MTJs in these bit-cells are programmed. Note that no current flows through the bit-cells connected to bit lines that are driven to GND. In the second step of the write operation, the common source line is driven to VDD instead. During this time, current stops flowing through the bit-cells that are connected to the bit lines that are driven to VDD, while it starts flowing through those that are connected to bit lines that are driven to GND. Hence, as shown in Fig. 22, the write operation of the CSL design consists of two steps: First, only the “0”s are written, followed by a step in which only the “1”s are written. Note also that the time for each write step is designed to ensure that all bit-cells are successfully written into.

The area wasted to separate neighboring columns is significantly reduced in the CSL design as shown in Fig. 23. Using λ-based [177] SCMOS layout rules [178] and a commercially available 45-nm CMOS technology for analysis, the CSL design can achieve 40% reduction in bit-cell layout area if the minimum access Tx width is used to maximize integration density. However, the voltage levels may not be optimized for energy efficiency.

Fig. 23. Array layouts using (a) the conventional and (b) the CSL schemes are illustrated. The bit-cell area for the CSL scheme will be smaller if the width of the access transistor is less than 10λ.

Balanced Write proposed in [179] optimizes write operation voltages used in CSL design. By using a negative voltage plane, the balanced write scheme allows the common source line to be directly connected to the GND plane. Hence, the long source lines in the STT-MRAM array may be completely eliminated and the integration density of STT-MRAM can be improved. The voltages of the word line and bit line for writing “0” and for writing “1” are then optimized (see Fig. 24). The analysis in [179] shows that the balanced write scheme may be very effective in mitigating the effects of transistor source degeneration, discussed in Section IV-B. The analysis done in [179] minimizes the STT-MRAM write energy with the constraint that the MTJ switching delays for writing “0” and “1” are equal (hence, balanced write). For the STT-MRAM technology parameters used in [179], it was found that the balanced write scheme may achieve 43% and 67% reduction in write delay and write energy, respectively.

Fig. 24. Voltages on the control lines of the conventional 1T-1MTJ STT-MRAM array (top) and the 1T-1MTJ STT-MRAM array with balanced write scheme (bottom), for writing “1” (left bit-cell) and “0” (right bit-cell) into two “bottom-pinned” bit-cells in the same selected row. Differences in voltages are highlighted in red in the tables on the right.

Fig. 25. 2T-1MTJ bit-cell topology and corresponding layout for achieving maximum integration density. Note that the word lines for M1 and M2 may be separately driven without bit-cell area penalty.

E. Two-Transistor One-MTJ STT-MRAM Bit-Cell
As discussed earlier, the current flowing through the STT-MRAM bit-cell has to be smaller than a critical value during read operations. On the other hand, the bit-cell current needs to be sufficiently large during write operations to prevent write failures. For on-chip applications where the number of supply planes may be limited, the conflicting requirements on bit-cell currents for read and write operations place a constraint on the width of the access Tx. The two-transistor one-MTJ (2T-1MTJ) bit-cell structure (Fig. 25) consisting of two access Tx’s connected in parallel but with separate gates was proposed to overcome the conflicting design requirement for read and write operations [62], [180]. During write operations,
both access Tx’s are turned on to maximize current flowing through the bit-cell. During read operations, only one access Tx is turned on to limit the current flow. Analysis of the bit-cell layouts in [63] and [174] shows that the layout area penalty is minimal. Note that write, disturb, and decision failures can be simultaneously minimized if the access Tx’s are sized differently.

F. Nonvolatile SRAM (NV-SRAM)
Several alternative STT-MRAM bit-cell structures (Fig. 26) have been proposed in the literature to target applications that require fast access times [181]–[183]. These bit-cell structures are similar to the conventional 6T SRAM bit-cell. The difference is that a pair of MTJs is used to skew the cross-coupled inverters in the SRAM cell to implement an NV-SRAM bit-cell. When the NV-SRAM cell is powered off, the MTJs are first programmed with the state of the cell. Note that MTJ0 and MTJ1 in Fig. 26 store complementary data (i.e., one stores “0” and the other stores “1”). When the NV-SRAM is powered back on, the MTJs skew the cross-coupled inverters so that the state the NV-SRAM returns to is the state before it was powered off. The advantage of NV-SRAM is that since the SRAM bit-cell topology is preserved, only the array peripheral circuitry for write operations needs to be modified to meet write requirements. Furthermore, the differential nature of the bit-cell structure enables self-referenced differential read operations for fast read access. Self-referenced read operations do not require a global reference for sensing operations, which significantly improves variation tolerance.

Although NV-SRAMs can achieve fast read operations, their restore operations depend on the ability of the MTJs to skew the cross-coupled inverters. Hence, the proposed NV-SRAM bit-cells may be sensitive to mismatches in MTJ characteristics. Furthermore, the sizes of the transistors in the bit-cells shown in Fig. 26 may need to be enlarged to meet the write current requirements. Also, NV-SRAM bit-cells illustrated in Fig. 26 need a significant number of extra transistors as compared to the 1T-1MTJ STT-MRAM bit-cell. Hence, bit-cells based on some of the device structures presented in Section V may present a more area efficient approach to enabling self-referenced differential read operations.

Fig. 26. Structures of NV-SRAMs proposed (a) in [181], (b) in [182], and (c) in [183]. The structure of (d) a volatile 6T SRAM is also shown for comparison.

G. Robust STT-MRAM Sensing Schemes
Two circuit-level STT-MRAM sensing techniques have been proposed to improve the robustness of read operations in STT-MRAM [31], [96]. The basic concept underlying these schemes is to exploit specific characteristics of the STT-MRAM bit-cell so as to enable self-referencing.

Fig. 27. Peripheral circuit for the destructive sensing scheme from [31].

1) Destructive Self-Referenced Sensing: The self-referenced sensing scheme proposed in [31] achieves self-referencing by comparing the resistance of the bit-cell to a known resistance generated from the same bit-cell. The sensing scheme (Fig. 27) can be illustrated by considering “1” to be represented by an MTJ in the AP state, while an MTJ in the P state represents “0.” In the first step of the sensing operation, a small read current, IRD, is passed through the bit-cell to charge the positive input terminal of a voltage sense amplifier through the switch shown in Fig. 27. The bit-cell being sensed is then overwritten with “0.” Next, IRD is passed through the bit-cell again to charge the negative terminal of the voltage sense amplifier through the switch. After the voltages on the input terminals of the sense amplifier have settled, the difference between them is large if the
bit-cell originally stored a “1,” and small if it originally stored a “0.” The voltage output of the sense amplifier is “1” (VDD) if the voltage difference between its inputs is large and “0” (GND) if the difference is small. Since the data in the bit-cell was overwritten with “0” during sensing, it needs to be written back with “1” if the voltage sense amplifier senses that it originally stored a “1” prior to the overwriting step. This sensing scheme is destructive in that the stored data are overwritten during sensing. Furthermore, the average energy dissipated by read operations is larger than the average write energy since every read operation consists of at least one write operation. Moreover, read accesses require long latencies since they occur over four steps.

2) Nondestructive Self-Referenced Sensing: The energy dissipation and destructive nature of the self-referenced sensing scheme discussed earlier may be improved by eliminating the need to overwrite and restore the data in the bit-cell being sensed. This may be achieved by exploiting a different characteristic of the MTJ [96]. It was observed that the voltage dependence of the MTJ resistance is very strong when the MTJ is in the AP state, but very weak when the MTJ is in the P state. Hence, data may be sensed by sensing the voltage dependence of the MTJ resistance using the circuit shown in Fig. 28. The modified sensing operation proceeds as follows. The voltage on BL is first biased at VBL1, and the current flowing through the bit-cell is passed into the input of a 1 : 1 current mirror with SEL1 and SEL2 charged to VDD and GND, respectively [96]. The output of the current mirror charges the negative terminal of a voltage sense amplifier. Next, SEL1 and SEL2 are set to GND, and the voltage of BL is charged to VBL2. The current flowing through the bit-cell during this time is passed into the input of a 1 : 2 current mirror with SEL1 and SEL2 charged to GND and VDD, respectively [96]. The output of this current mirror charges the positive terminal of the voltage sense amplifier. If the voltage difference between the input terminals of the voltage sense amplifier is large, the MTJ in the bit-cell being sensed is in the AP state, and the voltage sense amplifier outputs “1” (VDD). If the voltage difference between the input terminals of the voltage sense amplifier is small, the MTJ in the bit-cell being sensed is in the P state, and the voltage sense amplifier outputs “0” (GND).

Fig. 28. Peripheral circuit for the nondestructive sensing scheme from [96].

The nondestructive sensing scheme greatly reduces the energy dissipated by the read operation since the MTJ does not need to be programmed. However, read accesses still consist of multiple steps and may hence require more than one cycle. Furthermore, the voltage dependence of the MTJ resistance may diminish as the size of the MTJ is scaled down as shown in [27]. Hence, the nondestructive self-referencing sensing may become ineffective in scaled MTJ technologies.

The circuit-level design techniques presented in this section for improving STT-MRAM are summarized in Table 4. First, techniques for embedding functionality such as ROM, MemPUF, and TRNG in STT-MRAM arrays with little or no overhead to the bit-cell area were presented. This leads to significantly lower cost compared to the case where such functionality is added separately. Design techniques to optimize write energy efficiency of STT-MRAM were then presented. Word-line and write-voltage boosting, transistor body-biasing, and an external magnetic field may be used to assist spin-transfer torque during write operations. Techniques such as bit-line voltage clamping and the common source line with balanced write scheme optimize the write energy efficiency of STT-MRAM. Alternative bit-cell topologies such as the 2T-1MTJ bit-cell and nonvolatile SRAM bit-cells were also discussed. The 2T-1MTJ bit-cell enables separate optimization of STT-MRAM for read and write operations. The second access transistor may be used to limit or increase the write current flowing through the MTJ, depending on the requirements. However, these circuit-level design techniques may not satisfy requirements of applications that require low write latency. NV-SRAM bit-cells were proposed for such applications. When power is supplied to NV-SRAM bit-cells, they function and perform as 6T SRAM. The data stored in the NV-SRAM are backed up into the MTJs within the bit-cell before power is removed so as to save leakage power when the NV-SRAM is idle. When the data in NV-SRAM are required, power and the state stored in the MTJs are restored to the bit-cell. Thus, the NV-SRAM can approach the performance of 6T SRAM while providing nonvolatility. Although the circuit-level design techniques presented here are promising for optimizing STT-MRAM, further improvements can be achieved by considering the architectural aspects of STT-MRAM. In Section VII, we will discuss architecture-level design techniques for STT-MRAM that exploit its unique characteristics.
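
The four-step destructive self-referenced read of Section VI-G can be sketched behaviorally as follows. This is only a minimal illustration of the sequence described in the text (sample, overwrite with “0,” sample again, compare and write back); the `BitCell` class, the resistance values, the read current, and the comparison threshold are hypothetical and are not taken from [31].

```python
from dataclasses import dataclass

R_P = 2000.0   # hypothetical P-state resistance (ohms), represents "0"
R_AP = 4000.0  # hypothetical AP-state resistance (ohms), represents "1"


@dataclass
class BitCell:
    state: int        # 1 = AP ("1"), 0 = P ("0")
    writes: int = 0   # counts write operations, to expose the read energy cost

    def resistance(self):
        return R_AP if self.state else R_P

    def write(self, value):
        self.state = value
        self.writes += 1


def destructive_read(cell, i_rd=20e-6, v_threshold=0.02):
    """Return the stored bit using the cell itself as the reference."""
    # Step 1: sample the voltage of the stored state on the positive input.
    v_pos = i_rd * cell.resistance()
    # Step 2: overwrite the cell with a known "0" (this destroys the data).
    cell.write(0)
    # Step 3: sample the voltage of the known "0" on the negative input.
    v_neg = i_rd * cell.resistance()
    # Sense amplifier: a large difference means the cell originally held "1".
    data = 1 if (v_pos - v_neg) > v_threshold else 0
    # Step 4: restore the original data if it was a "1".
    if data == 1:
        cell.write(1)
    return data


if __name__ == "__main__":
    for stored in (0, 1):
        cell = BitCell(stored)
        bit = destructive_read(cell)
        print(f"stored={stored} read={bit} restored={cell.state} writes={cell.writes}")
```

The write counter makes the drawback noted in the text visible: every read incurs at least one write, and reading a “1” incurs two.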

Table 4 Summary of Circuit-Level Design Techniques to Improve STT-MRAM

VII. IMPROVING STT-MRAM ARRAY ARCHITECTURE
The unique characteristics of STT-MRAM lead to a number of benefits while also posing some key challenges in the design of memory arrays. For example, just using STT-MRAM as a drop-in replacement for SRAM in on-chip caches may give some benefits due to the increase in cache capacity and nonvolatility. However, the performance may suffer due to longer access times. Moreover, error rates in STT-MRAM may be prohibitively high and require array level design techniques to handle them, thus reducing the benefits. In this section, we first discuss the various array-level design considerations and techniques for STT-MRAM. Thereafter, various architectural optimizations to maximize the benefits of STT-MRAM are presented.

A. STT-MRAM Array Design
Fig. 29 shows the design of a basic STT-MRAM memory array. Similar to a traditional SRAM array, the STT-MRAM array consists of row and column decoders that are used to select the required bit-cells for access, sense amplifiers that are used to sense the data during the read operation, and write drivers to charge the bit and source lines to appropriate voltages during the write operation. Furthermore, large STT-MRAM memory arrays can be organized into banks, sub-banks, mats, and subarrays in a manner similar to SRAMs in order to reduce the delay and energy consumption from longer interconnects. A key difference between SRAMs and STT-MRAMs is in the idle state. Due to the nonvolatile nature of STT-MRAMs, the bit and source lines can be grounded in the idle state. As a result, the leakage energy consumption is limited to only the peripheral circuits. However, when the array is accessed to perform a read/write operation, charging the bit lines or source lines introduces leakage paths in the unselected cells of the column, leading to active leakage power consumption. Another major difference is in the sensing mechanism used for performing the read operation. The basic STT-MRAM array employs single-ended sensing, and therefore requires reference bit-cells as shown in Fig. 29. Sensing is performed by comparing the current/voltage through/across the selected bit-cells to that of the reference bit-cells. As explained earlier, this can lead to significant reduction in the sensing margin under variations. This is further degraded by the active leakage current through the unselected cells. For example, consider an array consisting of
m bit-cells per column. The net current passing through the bit line during read operation from a bit-cell storing “0” is given by $I_{RD,0} = I_{on,0} + (m-1)I_{off}$. Similarly, for a bit-cell storing “1,” $I_{RD,1} = I_{on,1} + (m-1)I_{off}$. The array MR ratio ($TMR_{array}$) is given by

$$
TMR_{array} = \frac{I_{RD,0} - I_{RD,1}}{I_{RD,1}}
= \frac{I_{on,0} - I_{on,1}}{I_{on,1} + (m-1)I_{off}}
= \frac{R_{on,1} - R_{on,0}}{R_{on,0} + (m-1)\dfrac{R_{on,0}R_{on,1}}{R_{off}}} < TMR_{bitcell}. \tag{18}
$$

Note that $TMR_{array}$ is less than the bit-cell-level MR ratio ($TMR_{bitcell}$), and this leads to increased read failures in the STT-MRAM array [184], especially in on-chip caches (due to their fast access time requirement).

Fig. 29. Conventional STT-MRAM array design.

One approach to address this problem of reduced read margins is adaptive array design [185], [186] illustrated in Fig. 30, which uses one or more bit-cells to store a single bit of data depending on the read speed and capacity requirements. When capacity is critical, then each bit is stored in 1T-1MTJ configuration to maximize array capacity. Next, in order to achieve higher read speed, adjacent columns are utilized to store a bit and its complement. The read operation is then performed by comparing the resistances of these bit-cells to each other using a differential sensing scheme, thereby enabling faster read operations. Storing data in a complementary fashion increases the effective signal difference that needs to be sensed to read out the stored data while reducing the effective array capacity. To further improve the read speeds, the effective signal difference may be further increased by combining adjacent rows as shown in Fig. 30, thereby utilizing four bit-cells to store a bit of data. Conceptually, the adaptive array improves sensing performance of STT-MRAM by sacrificing capacity. This may also be achieved in the MLC storage device [187]. For example, an array of MLC bit-cells may be operated in SLC mode to improve sensing operations. In the MLC mode, the possible data stored in bit-cells are “00,” “01,” “10,” and “11.” In the SLC mode, the bit-cells store either “00” or “11.” The difference in resistance of the MLC bit-cell in the SLC mode can be larger than that in the SLC bit-cells, making it easier to distinguish between stored resistance states at the expense of fewer stored data bits.

Fig. 30. Adaptive STT-MRAM array design: (a) single cell storing 1 bit per 1T-1MTJ, (b) dual cell storing 1 bit in 2T-2MTJ, (c) quadruple cell storing 1 bit in 4T-4MTJ.

Another design consideration at the array-level is Error Correction Codes or ECC, which can be used to tackle bit-level errors [188], [189]. As described in Section III-B, write errors may be introduced during the operation of STT-MRAM due to thermal effects and process variations. This presents a tradeoff between write current versus error rate, which in turn translates to area and energy versus error tradeoff. Therefore, the use of multibit ECC can improve the energy and density of spin memories by relaxing the write current requirement [189]. Furthermore, multilevel ECC can also be used to mitigate the increase in write current and corresponding increase in area due to write current variations. In multilevel ECC, data words that are significantly affected by variations are provisioned with a second-level ECC to correct the potential errors in large numbers of bits [188].

Another major concern with STT-MRAM is its write energy as discussed earlier. The nonvolatile nature of STT-MRAM presents a unique opportunity to reduce the write energy required at the array level. For example, the STT-MRAM array does not suffer from half-select problem as mentioned earlier, and therefore, the unselected columns of the array can remain in the idle state [190]. As a result, energy is only dissipated in selected bit-cells connected to the selected columns. Note that although the write energy for individual STT-MRAM bit-cells may be high, the write energy consumption at the array-level may not increase as much since the half-selected bit-cells do not dissipate energy.

Although the aforementioned array design techniques may be able to overcome some of the deficiencies of STT-MRAM, more can be achieved if the architecture is
optimized considering unique characteristics associated with STT-MRAMs. For example, STT-MRAM has high write energy and latency compared to its CMOS counterparts (i.e., SRAM and DRAM). In addition, the asymmetric nature of write operations in STT-MRAM further worsens the inefficiency in write operations. Some benefits of STT-MRAM may be recovered when MLCs are used, but doing so may lead to variable write latencies as we have shown in Section V-E. Therefore, the array architecture of STT-MRAM needs to be optimized with careful consideration of the different design tradeoffs. In the following sections, we briefly survey some of the key architectural optimizations that have been proposed for STT-MRAM.

Fig. 31. Way-based intra-level hybrid cache architecture.
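
As a rough numerical illustration of the array MR ratio in (18) of Section VII-A, the sketch below evaluates the read currents for a column of m bit-cells and compares the resulting array-level ratio with the bit-cell-level ratio. It assumes, purely for illustration, that the selected cell and every unselected cell see the same read voltage, so that each current is simply V/R; the function names and all resistance and voltage values are hypothetical.

```python
def tmr_bitcell(r_on0, r_on1):
    """Bit-cell-level MR ratio from the on-path resistances for "0" and "1"."""
    return (r_on1 - r_on0) / r_on0


def tmr_array(r_on0, r_on1, r_off, m, v_read=0.1):
    """Array-level MR ratio for a column of m cells, following (18)."""
    i_on0 = v_read / r_on0   # selected cell storing "0"
    i_on1 = v_read / r_on1   # selected cell storing "1"
    i_off = v_read / r_off   # leakage of one unselected cell (assumption)
    i_rd0 = i_on0 + (m - 1) * i_off
    i_rd1 = i_on1 + (m - 1) * i_off
    return (i_rd0 - i_rd1) / i_rd1


if __name__ == "__main__":
    R_ON0, R_ON1, R_OFF = 3e3, 5e3, 1e7  # hypothetical resistances (ohms)
    print(f"bit-cell ratio: {tmr_bitcell(R_ON0, R_ON1):.3f}")
    for m in (1, 64, 256, 1024):
        print(f"m={m:5d}  array ratio: {tmr_array(R_ON0, R_ON1, R_OFF, m):.3f}")
```

With m = 1 the two ratios coincide, and the array-level ratio shrinks as more unselected cells contribute leakage to the bit line, which is the degradation in read margin that the adaptive array design aims to recover.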

B. Hybrid Cache
The motivation for hybrid cache stems from both device and architectural considerations [191]–[206]. From the device point of view, different technologies (STT-MRAM, CMOS memories) have complementary characteristics (i.e., the strengths and weaknesses of STT-MRAM and CMOS memories are mutually exclusive). For instance, SRAM has high leakage and area but low write energy/latency, whereas STT-MRAM has very low leakage and high density at the cost of high write energy/latency. From an architecture perspective, different levels in the memory hierarchy have different design constraints: 1) L1 cache must have lower access latency and can have small capacity; 2) Last-Level Cache (LLC) is typically much larger in capacity and, therefore, requires high density and low leakage power. In addition, different cache blocks have different read/write characteristics (read-intensive versus write-intensive). Based on the above observations, two different hybrid cache architectures can be implemented viz., Inter-level and Intra-level.

1) Inter-Level Hybrid Cache: Inter-level hybrid cache architecture uses different memory technologies in different levels of the cache hierarchy to match the design requirements of each cache level to the memory device characteristics [191]–[194]. In this approach, the latency-sensitive L1 cache is designed using bit-cells with fast access times like SRAM, whereas capacity-sensitive LLCs are designed with high-density memory technologies like STT-MRAM. Furthermore, the inefficiency in write operations of STT-MRAM-based LLC can be addressed by modifying the cache management policy to reduce the number of writes to LLC, e.g., by allocating a cache block only if its reuse count is predicted to be higher than a certain threshold [205]. In this way, an inter-level hybrid cache incorporates the benefits of different technologies in the design of memory hierarchy.

2) Intra-Level Hybrid Cache: The intra-level hybrid cache architecture goes further than the inter-level hybrid cache architecture by mixing different memory technologies within each level of cache. This is done in order to match the most suitable technology to the access characteristics of different cache blocks. In order to illustrate an intra-level hybrid cache, let us consider a design with STT-MRAM and SRAM [191]–[193], [195], [202]–[204]. Recall that STT-MRAMs have relatively efficient reads, but have higher write energy and latency than SRAMs. Therefore, STT-MRAMs are most suitable for storing read-intensive blocks, while SRAMs are suitable for storing write-intensive blocks. Note that in practice, most of the cache blocks are read-intensive in nature, which enables us to design a large portion of the cache using STT-MRAMs, thereby exploiting their high density and low leakage power.

Fig. 31 shows the design of a way-based intra-level hybrid cache, in which a few cache ways are designed using SRAM and the remaining cache ways are designed using STT-MRAM. In order to classify the cache blocks as read-intensive or write-intensive and direct them to the appropriate way in the cache, suitable cache management policies are used. The tag array stores additional bits (that constitute a saturation counter) to determine if the cache block is read- or write-intensive. When a cache block is first written, its saturation counter is reset and is incremented on every write operation. During a write operation to STT-MRAM, its saturation counter is checked to see if it has reached a threshold. If it has, the saturation counters in the SRAM ways are checked for the smallest value. If the value in the saturation counter for the STT-MRAM way is larger than that for the SRAM way (i.e., the SRAM way in that cache block is less write-intensive than the STT-MRAM way that was being written to), then the cache blocks are swapped. A swap buffer is added as shown in Fig. 31 to enable swap operations across ways. In this way, SRAM ways are used to store cache blocks with high write intensity and mitigate the impact of inefficient writes in STT-MRAM ways.

However, due to the small number of SRAM ways, the aforementioned policy can lead to high conflict misses for write-intensive workloads. An adaptive cache management policy [206] can be used to address this issue. In this policy, a cache block is evicted to lower-level memory only if it is predicted to be dead (i.e., the cache
block is unlikely to be reused). Otherwise, the cache block is migrated to an STT-MRAM way to eliminate the overheads associated with conflict misses. In addition, the cache block allocation policy can be adapted based on the nature of cache write operations. If the cache block being written is either unlikely to be reused or only reused in the near future, then it is stored in SRAM ways. On the other hand, if the cache block being written is not a dead block and is unlikely to be written in the near future, it is stored in STT-MRAM ways. This leads to lower conflict misses in SRAM ways, and the cache performance is improved.

C. Reducing Redundant Writes
Two kinds of write operations occur in caches: 1) write on dirty cache block eviction from higher-level caches to lower-level caches, and 2) write on cache block replacement in which a new cache block from lower-level memory replaces the existing cache block. In the former, a cache block is marked dirty even if only one of the words in the cache block is modified. As such, the write operation is performed on every bit in the cache block upon eviction in traditional cache architectures. This may result in a significant amount of redundant writes when very few words in the cache block are actually dirty. Even when we consider the cache block replacement, not all the bits of the cache block are modified during the write operation. An analysis of the writes to a 16 MB L2 cache incurred during the execution of SPEC 2006 benchmarks (Fig. 32) shows that 88% of the bit-level writes are redundant [207]. These redundant write operations can lead to significant unnecessary energy consumption in STT-MRAM. In the following sections, we will discuss different techniques to address this issue [190], [207]–[212].

Fig. 32. The large amount of redundant bit writes to L2 in SPEC 2006 benchmark indicates that substantial write energy savings may be achieved by avoiding redundant bit writes [207].

1) Early Write Termination: Early write termination [207] was proposed to reduce the write energy wasted in STT-MRAM. It takes advantage of the fact that read operations are significantly faster than write operations in STT-MRAM. Hence, read operations are performed first to quickly determine which bits in the word need to be overwritten. Only the bits that need to be modified are written into after the read operation. The redundant write operations are eliminated as a result, and significant write energy savings are achieved.

2) Partial Line Update: Another approach to reduce the redundant writes on cache block eviction is partial line update [190]. This scheme is illustrated in Fig. 33 for a two-level cache hierarchy. Additional dirty bits in the tag array are used to track the dirty blocks at sub-block granularity in the L1 cache. When a cache block is evicted from L1 cache, only the dirty portion of the cache block is written to L2, thereby eliminating the energy consumption from redundant writes of other sub-blocks.

Fig. 33. Partial line update scheme.

3) Write Biasing: As discussed earlier, a significant number of redundant write operations may occur if an evicted cache block contains very few dirty blocks. Hence, significant write energy savings can be achieved if the replacement policy is modified to increase the residency of dirty blocks in the higher level caches, thereby reducing the number of dirty block evictions [208]. The objective here is to accommodate all the writes to a cache block and avoid premature eviction. In a typical Least Recently Used (LRU) replacement policy, the most recently accessed block is placed at the top of stack (TOS) for replacement. This policy is agnostic to loads and stores as well as dirty and clean blocks. In order to reduce the eviction of dirty blocks, the insertion, eviction, and promotion steps of the cache block replacement policy are modified as follows:
• Insertion policy for Loads: Target cache blocks are inserted at distance K from TOS if all positions higher than K are dirty. Otherwise, they are inserted at TOS similar to LRU.
• Promotion policy for Loads: Target cache blocks are promoted to position K if all stack positions higher than K are dirty. Otherwise, they are promoted to TOS similar to LRU.


• Insertion policy for Stores: Target cache blocks are always inserted at position K. This prevents one-time stores from occupying positions higher than K for very long periods unless replaced by another store operation.
• Promotion policy for Stores: Target cache blocks of store hits to dirty lines are promoted to TOS. Store hits to clean blocks are promoted to position K.
• Eviction policy: The cache block at the bottom of the stack is evicted.
The parameter K determines the relative preference given to the residency of dirty blocks over clean blocks and is determined based on the configuration and characteristics of the cache hierarchy.

D. Volatile STT-MRAM
As Fig. 34 shows, the lifetime of the data in the cache is typically short and, therefore, the cache does not require a very long retention time. Hence, the energy barrier required for STT-MRAM-based on-chip caches can be lowered, which reduces both the retention time and the write latency (see Table 5) [213]–[218]. Furthermore, different levels in the cache hierarchy may also be designed with different retention times depending on the design requirements [218]. Even within a single cache level, bit-cells with different retention times can be used to store cache blocks with different access characteristics [218]. In order to ensure that data with relatively long lifetime (large interval between writes) are stored correctly, refresh mechanisms, similar to those used for DRAM, are employed in volatile STT-MRAM designs [213]–[218].

Fig. 34. As shown in [213], the lifetime of data in L2 cache can be less than 1 s and, hence, 10-year retention time is probably excessive for on-chip cache applications.

E. Asymmetry-Aware STT-MRAM Cache Architecture
As explained earlier, the write operation in STT-MRAM is asymmetric, i.e., writing from “0” → “1” (P → AP) requires higher latency/energy compared to “1” → “0” (AP → P). As a result, the cache write latency is determined by the slower “0” → “1” switching time, resulting in significantly higher latency and energy. Asymmetry-aware cache architecture [219], [220] is aimed at addressing this issue. This technique involves the introduction of redundant blocks (RBLs), which are preset to “1” (AP state) as shown in Fig. 35 [219]. During a write operation, the new cache block is written to one of the Clean RBLs by writing the appropriate bits from “1” → “0,” resulting in a “fast write” operation. The other location, which initially stores the cache block (before the write operation), is marked as a Dirty RBL. If a Clean RBL is not available during a write operation, then a higher access latency is incurred (slow write), and the write operation is performed on the current cache location. Simultaneously, the Dirty RBLs are also programmed to the “1” state and marked as Clean RBLs. Note that this eliminates the additional latency penalty for the preset operation by overlapping it with the slow writes. Thus, asymmetry-aware architecture reduces the effective latency of write operations in STT-MRAM caches by increasing the frequency of fast write operations.

Fig. 35. Conceptual description of the asymmetry-aware write scheme for STT-MRAM cache.
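The fast/slow write bookkeeping of the asymmetry-aware scheme can be sketched in a few lines of Python. This is a minimal toy model under stated assumptions, not the implementation of [219], [220]: the class name AsymmetryAwareSet, the two latency constants, the 8-bit block width, and the choice to preset every Dirty RBL during a single slow write are all hypothetical and chosen only for illustration.

```python
FAST_WRITE = 1  # hypothetical latency units: only "1" -> "0" switching is needed
SLOW_WRITE = 2  # hypothetical latency units: "0" -> "1" switching is on the critical path


class AsymmetryAwareSet:
    """Toy model of one cache set with redundant blocks (RBLs)."""

    def __init__(self, tags, num_rbls, block_bits=8):
        self.all_ones = (1 << block_bits) - 1
        self.storage = {}   # physical location -> stored bit pattern
        self.mapping = {}   # block tag -> physical location holding its data
        self.clean = []     # RBLs already preset to all "1"s
        self.dirty = []     # stale locations that still need the preset
        for loc, tag in enumerate(tags):
            self.storage[loc] = 0
            self.mapping[tag] = loc
        for loc in range(len(tags), len(tags) + num_rbls):
            self.storage[loc] = self.all_ones
            self.clean.append(loc)

    def write(self, tag, value):
        """Store value for tag and return the latency class incurred."""
        if self.clean:
            # Fast write: the target holds all "1"s, so only "1" -> "0" flips occur.
            loc = self.clean.pop()
            self.storage[loc] = value
            self.dirty.append(self.mapping[tag])  # the old copy becomes a Dirty RBL
            self.mapping[tag] = loc
            return FAST_WRITE
        # Slow write in place; the preset of the Dirty RBLs overlaps with it.
        self.storage[self.mapping[tag]] = value
        for loc in self.dirty:
            self.storage[loc] = self.all_ones
        self.clean.extend(self.dirty)
        self.dirty.clear()
        return SLOW_WRITE

    def read(self, tag):
        return self.storage[self.mapping[tag]]


cache_set = AsymmetryAwareSet(tags=["A", "B"], num_rbls=1)
latencies = [
    cache_set.write("A", 0b1010),  # a Clean RBL is available
    cache_set.write("B", 0b0110),  # no Clean RBL left, so the write is in place
    cache_set.write("A", 0b0001),  # the RBL preset during the slow write is reused
]
```

In this toy run, the second write finds no Clean RBL and falls back to a slow write, which also restores a Clean RBL for the third write. With more RBLs per set, a larger share of writes would take the fast path, at the cost of extra storage.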

Table 5 Retention Time Versus Write Latency for STT-MRAM in [213]
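The values of Table 5 are not reproduced here. As a rough illustration of why a shorter retention target permits a lower energy barrier, the sketch below assumes the standard thermal-activation relation t_ret = tau0 * exp(Delta), where Delta = EB/(kB*T) is the thermal stability factor and tau0 is an attempt time of about 1 ns. The relation is applied to a single bit-cell, the value of tau0 is an assumption, the function names are hypothetical, and none of the numbers are taken from [213].

```python
import math

TAU0_S = 1e-9                         # assumed attempt time of 1 ns (not from Table 5)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_time_s(delta):
    """Mean retention time in seconds for thermal stability factor delta = EB/(kB*T)."""
    return TAU0_S * math.exp(delta)


def required_delta(retention_s):
    """Thermal stability factor needed to reach a given mean retention time."""
    return math.log(retention_s / TAU0_S)


delta_10_years = required_delta(10 * SECONDS_PER_YEAR)
delta_1_second = required_delta(1.0)
```

Under these assumptions, a 10-year target needs a thermal stability factor of roughly 40, whereas a 1 s target needs roughly 21, about half the energy barrier. A real design would also have to account for the number of bits in the array and the acceptable failure rate, which this single-bit estimate ignores.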

F. Obstruction-Aware Cache Architecture
The long latency of write operations in STT-MRAM-
based last-level caches (LLCs) can obstruct successive
cache accesses and lead to significant performance over-
heads, especially in the context of multicore systems.
Obstruction-aware cache architecture proposes suitable
cache management schemes to address this issue


Table 6 Summary of Architecture-Level Design Techniques to Improve STT-MRAM

[209], [221]. It introduces an obstruction-aware monitor (OAM), which tracks the request-response characteristics of the STT-MRAM cache in terms of miss rate, read/write latency, etc., for each core. OAM uses these statistics to estimate the effective latencies of the STT-MRAM cache if the cache block is written to the cache (T1) and not written to the cache (T2). If T1 > T2, this implies that the process running on that core is write-intensive and can degrade the performance through cache obstruction. Such a process is labeled as an obstruction process. Subsequent write requests from this process are forwarded to lower-level memory and not written to the STT-MRAM cache. This eliminates obstruction of the cache from long-latency writes and improves the overall system performance.

Table 6 summarizes the key architectural-level design techniques for improving STT-MRAM that we reviewed in this section. We initially discussed the design of the conventional STT-MRAM array and the adaptive STT-MRAM array design, which is able to increase read margins and performance. We then discussed research proposing ECC schemes for mitigating the errors, which may arise due to retention failures, stochastic write process, etc. An interesting aspect of the conventional STT-MRAM array is that, unlike its SRAM counterpart, power is only supplied to the bit-cells being accessed. Thus, there is no half-select issue in STT-MRAM, and the read operation can be more energy-efficient than in SRAM. Next, we discussed hybrid STT-MRAM cache architectures. The strategy employed in the inter-level hybrid cache is to use SRAM for latency-sensitive levels of the cache hierarchy, and STT-MRAM in the capacity-sensitive portions. In the intra-level hybrid cache design, SRAM and STT-MRAM are implemented in the same level of cache: some ways are implemented using SRAM, while the rest are implemented using STT-MRAM. The hybrid cache design approach mitigates the impact of the long-latency write operations in STT-MRAM. Our discussion subsequently focused on several techniques that reduce redundant write operations in STT-MRAM to save write energy. The early write termination technique achieves this by performing a read operation prior to the write so as to determine the bits that need to be changed in the cache. The partial line update technique uses history bits to track sub-blocks that have been modified in the cache blocks. When the cache block is evicted from the higher level of cache to the lower level, only the sub-blocks that have been modified are updated in the lower-level cache. The write biasing technique uses a modified replacement policy to avoid premature eviction of cache blocks from the higher levels of cache. The objective is to ensure that as many bits as possible in the cache block have been modified before it gets evicted to the lower-level cache. The next technique discussed to reduce STT-MRAM write energy is the concept of volatile STT-MRAM. This technique exploits the fact that the lifetime of data in cache is usually short. Thus, EB may be lowered to save write energy. In applications that need a longer retention time than that supported by the lowered EB, refresh mechanisms similar to those used in DRAM may be used. Asymmetry-aware and obstruction-aware cache architectures, which mitigate the impact of the long latency


write operations, were also presented. The asymmetry-aware cache architecture exploits the asymmetry in STT-MRAM write operations to reduce the effective write latency. Finally, the obstruction-aware cache architecture avoids obstruction of cache accesses by long-latency write operations.

VIII. CONCLUSION AND OUTLOOK
It has been four decades since the discovery of the tunneling magnetoresistance effect by Jullière. Since then, many improvements in fabrication techniques as well as new discoveries have pushed spin-transfer-torque-based memory technology to become a leading candidate for future nonvolatile on-chip memory technology. As we have discussed earlier, STT-MRAM offers significant benefits over modern SRAMs for last-level cache applications due to its nonvolatility and high integration density. However, the benefits could be greater were it not for the conflicting design requirements that STT-MRAM designers need to overcome to meet read, write, and reliability design targets. The problem is further exacerbated by design issues that limit the write energy efficiency, read performance, and robustness of STT-MRAM (such as source degeneration of the access transistor, requirement for bidirectional write currents, asymmetry in critical write currents, shared read and write current paths, and single-ended sensing read operation).

Much research effort has since been placed on overcoming the challenges associated with STT-MRAM, pushing it closer towards fulfilling its promise of being a high-performance, ultra-low-power, nonvolatile, universal memory technology. The STT-MRAM design techniques found in the literature may be broadly classified based on the level of design abstraction that they address. At the device level, alternative structures for the STT-MRAM storage element, the MTJ, have been proposed to give it the desired characteristics. Two-terminal MTJ structures, for example, have been studied to understand the design space available for achieving maximum STT-MRAM integration density for last-level cache applications. An important goal in the study of new materials and two-terminal stacks is to reduce the high write energy of STT-MRAM. In this respect, the MTJ with dual MgO/free layer interfaces is promising. However, the two-terminal MTJ may be insufficient for achieving STT-MRAM that performs as well as SRAM. The multiterminal storage elements that were discussed may be needed for STT-MRAM to be feasible for higher-level cache applications. Also, new physical phenomena observed in experiments (e.g., domain wall motion and spin-orbit torque effect) and new materials (e.g., topological insulators and better magnetic materials) provide new avenues to overcome the design challenges in STT-MRAM. It is increasingly likely that the STT-MRAM storage device will depend on the target application. The two-terminal MTJ stacks are more suitable for high-density STT-MRAM with low performance requirements, whereas multiterminal devices or devices using new physical phenomena for writing are more suited for high-performance STT-MRAM. However, experimental data are needed to verify our understanding of spin-transfer torque (such as size effects discussed in [40]) and to validate the models used to study the multiterminal devices discussed in Section V.

At the circuit level, STT-MRAM bit-cell design techniques have been developed to extract maximum benefit from the various device structures. One unique characteristic of STT-MRAM, for example, is that new functionality (e.g., read-only memory, true random number generator, and physically unclonable functions) may be embedded within the STT-MRAM array (i.e., the array functions as nonvolatile RAM as well as the embedded function) at near-zero cost/penalty. It was also found that the integration density of STT-MRAM may be improved by sharing the source lines. Some proposed circuit-level design techniques seek to improve write energy efficiency (e.g., “self-enable” write scheme, balanced write scheme, bit-line clamping, and 2T-1MTJ dual-source-line bit-cell), while some optimize the design by decoupling design knobs (e.g., write assist schemes, and 2T-1MTJ bit-cell with independent word lines). Novel sensing schemes (e.g., destructive and nondestructive sensing) have also been proposed to improve the robustness of STT-MRAM read operations.

Analysis of last-level caches shows that the nonvolatility and higher integration density of STT-MRAM make it more suitable than 6T SRAM. Techniques to optimize the STT-MRAM array design and architecture have also been explored to extract more benefits out of STT-MRAM. For example, the cache line update scheme may be optimized to minimize write energy consumption and further improve energy efficiency of STT-MRAM. Conventional ECC schemes have been investigated to mitigate stochastic effects and process variations in STT-MRAM. Results show that stronger ECC may be needed to improve the robustness of STT-MRAM as the size is scaled down. Since the write energy in STT-MRAM is significantly higher than that in SRAM, the energy efficiency of STT-MRAM on-chip cache can be improved if write operations to STT-MRAM bit-cells are avoided (e.g., through hybrid caches, early write termination, partial line update, and write biasing to avoid premature eviction of cache blocks). The architecture may also be designed to increase the frequency of fast write operations, such as in the asymmetry-aware STT-MRAM cache.

It is important to note that device/circuit/array-architecture co-design techniques have been very effective in optimizing STT-MRAM. For example, volatile STT-MRAM exploits the fact that the lifetime of data in


on-chip cache is much shorter than the 10-year lifetime standard used in the mass storage industry. By relaxing the retention time requirement, the same write current programs the STT-MRAM bit-cell in a much shorter time and gives an opportunity to reduce write latency and energy consumption. Furthermore, the retention time requirement is different for different cache levels and also for different cache blocks within the same cache level. Hence, the STT-MRAM bit-cells for different cache blocks can utilize MTJs with different retention times. The array architecture must also be designed to support the partitioning of the caches, and ECC schemes may need to be implemented to ensure reliability of the volatile STT-MRAM cache. It is possible that stronger ECC schemes may be able to push the limit of energy efficiency in STT-MRAM and warrant further investigation. As interest in STT-MRAM continues to grow, we expect to see a myriad of STT-MRAM storage devices suitable for different application requirements as well as corresponding device/circuit/array-architecture co-optimization techniques.

Many other challenges in STT-MRAM, including several that are beyond the scope of this paper, remain to be solved. Several material layers are needed to form MTJs with desirable characteristics [33], [78], [222]. The synthetic anti-ferromagnetic pinned layer structure, which is preferred for its small stray field, consists of as many as 20 layers [223]. The deposited layers may be damaged during patterning to define the MTJ pillars [224]. The refinement of MTJ fabrication steps compatible with the CMOS fabrication process will be crucial for realizing the STT-MRAM devices discussed in Section V.

Advances in STT-MRAM are also crucial to emerging nonvolatile computation systems [225]–[228]. STT-MRAM devices have also been explored to realize Boolean logic. While it is an active area of research with many open challenges [227], [240], [241], we focus on the prospects and challenges of using STT devices to realize memory in this work. In Section VI, we showed how the STT-MRAM memory device may be used as a true random number generator. Moreover, it has been shown that the functionality of STT devices is similar to that of artificial neurons and synapses, and thus, these devices can enable new computation paradigms [228]. Thus, advances in the fabrication of STT-MRAM memory devices with low switching energy also aid the development of energy-efficient neuro-inspired computational systems.

Finally, the ability of STT-MRAM to achieve extreme integration density in multilevel bit-cells and racetrack memory is an attractive characteristic that has not been fully exploited. It is worth noting that several studies have explored the system-level benefits of multilevel STT-MRAM bit-cells [229]–[239]. It has been found that the variable access latency in multilevel STT-MRAM bit-cells may be made manageable by suitably engineering the storage device [230]–[234]. At the time of writing, the variability in racetrack memory remains the biggest factor limiting the achievable integration density of STT-MRAM.

Research has begun in earnest to study new material systems and new phenomena that address the challenges facing STT-MRAM. These efforts give reason for hope, and may prove to be the boost required for STT-MRAM to fulfill its potential as the truly universal next-generation memory technology.

REFERENCES

[1] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, Revised Fourth Edition: The Hardware/Software Interface, Amsterdam, The Netherlands: Elsevier, 2011, vol. 2011.
[2] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, Upper Saddle River, NJ, USA: Pearson, 2003.
[3] M. Julliere, “Tunneling between ferromagnetic films,” Phys. Lett. A, vol. 54, no. 3, pp. 225–226, Sep. 1975.
[4] M. N. Baibich, J. M. Broto, A. Fert, F. N. Van Dau, and F. Petroff, “Giant magnetoresistance of (001)Fe/(001)Cr magnetic superlattices,” Phys. Rev. Lett., vol. 61, no. 21, pp. 2472–2475, Nov. 1988.
[5] G. Binasch, P. Grünberg, F. Saurenbach, and W. Zinn, “Enhanced magnetoresistance in layered magnetic structures with antiferromagnetic interlayer exchange,” Phys. Rev. B, vol. 39, no. 7, pp. 4828–4830, Mar. 1989.
[6] “The Nobel Prize in Physics 2007.” [Online]. Available: nobel_prizes/physics/laureates/2007/.
[7] S. Parkin, R. Bhadra, and K. Roche, “Oscillatory magnetic exchange coupling through thin copper layers,” Phys. Rev. Lett., vol. 66, no. 16, pp. 2152–2155, Apr. 1991.
[8] T. Miyazaki, T. Yaoi, and S. Ishio, “Large magnetoresistance effect in 82Ni-Fe/Al-Al2O3/Co magnetic tunneling junction,” J. Magn. Magn. Mater., vol. 98, no. 1–2, pp. L7–L9, Jul. 1991.
[9] R. Schad et al., “Giant magnetoresistance in Fe/Cr superlattices with very thin Fe layers,” Appl. Phys. Lett., vol. 64, no. 25, p. 3500, 1994.
[10] T. Miyazaki and N. Tezuka, “Giant magnetic tunneling effect in Fe/Al2O3/Fe junction,” J. Magn. Magn. Mater., vol. 139, no. 3, pp. L231–L234, Jan. 1995.
[11] W. Butler, X.-G. Zhang, T. Schulthess, and J. MacLaren, “Spin-dependent tunneling conductance of Fe-MgO-Fe sandwiches,” Phys. Rev. B, vol. 63, no. 5, Jan. 2001, Art. no. 054416.
[12] S. S. P. Parkin, “Applications of magnetic nanostructures,” in Spin Dependent Transport in Magnetic Nanostructures, S. Maekawa and T. Shinjo, Eds., London, U.K.: Taylor & Francis, 2002, ch. 5.
[13] S. S. P. Parkin et al., “Giant tunnelling magnetoresistance at room temperature with MgO (100) tunnel barriers,” Nature Mater., vol. 3, no. 12, pp. 862–867, Dec. 2004.
[14] S. Yuasa, T. Nagahama, A. Fukushima, Y. Suzuki, and K. Ando, “Giant room-temperature magnetoresistance in single-crystal Fe/MgO/Fe magnetic tunnel junctions,” Nature Mater., vol. 3, no. 12, pp. 868–871, Dec. 2004.
[15] S. Ikeda et al., “Tunnel magnetoresistance of 604% at 300 K by suppression of Ta diffusion in CoFeB/MgO/CoFeB pseudo-spin-valves annealed at high temperature,” Appl. Phys. Lett., vol. 93, no. 8, 2008, Art. no. 082508.
[16] M. Bowen et al., “Nearly total spin polarization in La2/3Sr1/3MnO3 from tunneling experiments,” Appl. Phys. Lett., vol. 82, no. 2, p. 233, 2003.
[17] P. Mavropoulos, M. Ležaić, and S. Blügel, “Half-metallic ferromagnets for magnetic tunnel junctions by ab initio calculations,” Phys. Rev. B, vol. 72, no. 17, Nov. 2005, Art. no. 174428.
[18] G. Autès, J. Mathon, and A. Umerski, “Strong enhancement of the tunneling magnetoresistance by electron filtering in an Fe/MgO/Fe/GaAs(001) junction,” Phys. Rev. Lett., vol. 104, no. 21, May 2010, Art. no. 217202.
[19] J.-I. Inoue, “GMR, TMR and BMR,” in Nanomagnetism Spintron, T. Shinjo, Ed., 2nd, Amsterdam, The Netherlands: Elsevier, 2009, ch. 2, pp. 15–92.
[20] B. Behin-Aein, S. Salahuddin, and S. Datta, “Switching Energy of Ferromagnetic Logic Bits,” IEEE Trans. Nanotechnol., vol. 8, no. 4, pp. 505–514, Jul. 2009.


[21] N. Strikos, V. Kontorinis, X. Dong, tunnel junctions using a physics-based [55] I. L. Prejbeanu et al., “Thermally assisted
H. Homayoun, and D. Tullsen, model,” in Proc. 72nd IEEE Device MRAM,” J. Phys. Condens. Matter, vol. 19,
“Low-current probabilistic writes for Res. Conf., Jun. 2014, pp. 155–156. no. 16, Apr. 2007, Art. no. 165218.
power-efficient STT-RAM caches,” in [39] “ITRS Roadmap,” 2014. [Online]. [56] B. Dieny et al., “Extended scalability and
Proc. 31st IEEE Int. Conf. Comput. Des., Available: functionalities of MRAM based on
Oct. 2013, pp. 511–514. thermally assisted writing,” in Proc. IEEE
[40] J. Z. Sun et al., “Effect of subvolume
[22] B. Engel et al., “A 4-Mb toggle MRAM excitation and spin-torque efficiency on Int. Electron Devices Meet., vol. 33,
based on a novel bit and switching magnetic switching,” Phys. Rev. B, vol. 84, pp. 1.3.1–1.3.4.
method,” IEEE Trans. Magn., vol. 41, no. 1, no. 6, Aug. 2011, Art. no. 064413. [57] K. Ono et al., “A disturbance-free read
pp. 132–136, Jan. 2005. scheme and a compact stochastic-spin-
[41] J. Z. Sun et al., “Spin-torque switching
[23] Y. Huai, “Spin-transfer torque MRAM efficiency in CoFeB-MgO based tunnel dynamics-based MTJ circuit model for
(STT-MRAM): Challenges and prospects,” junctions,” Phys. Rev. B, vol. 88, no. 10, Gb-scale SPRAM,” in Proc. IEEE Int.
AAPPS Bull., vol. 18, no. 6, pp. 33–40, Sep. 2013, Art. no. 104426. Electron Devices Meet., Dec. 2009, pp. 1–4.
2008. [58] C. Lin et al., “45 nm low power CMOS
[42] T. Devolder, “Scalability of magnetic random
[24] J. Slonczewski, “Current-driven excitation access memories based on an in-plane logic compatible embedded STT MRAM
of magnetic multilayers,” J. Magn. Magn. magnetized free layer,” Appl. Phys. Exp., utilizing a reverse-connection 1T/1MTJ
Mater., vol. 159, no. 1–2, pp. L1–L7, vol. 4, no. 9, Aug. 2011, Art. no. 093001. cell,” in Proc. IEEE Int. Electron Devices
Jun. 1996. Meet., Dec. 2009, pp. 1–4.
[43] M. Pakala, Y. Huai, T. Valet, Y. Ding, and
[25] L. Berger, “Emission of spin waves by a Z. Diao, “Critical current distribution in [59] Y. M. Lee et al., “Highly scalable STT-MRAM
magnetic multilayer traversed by a spin-transfer-switched magnetic tunnel with MTJs of top-pinned structure in
current,” Phys. Rev. B, vol. 54, no. 13, junctions,” J. Appl. Phys., vol. 98, no. 5, 1T/1MTJ cell,” in Proc. IEEE Symp. VLSI
pp. 9353–9358, Oct. 1996. 2005, Art. no. 056107. Technol., Jun. 2010, pp. 49–50.
[26] Y. Huai, F. Albert, P. Nguyen, M. Pakala, [44] Z. Diao et al., “Spin-transfer torque [60] A. Driskill-Smith et al., “Non-volatile
and T. Valet, “Observation of spin-transfer switching in magnetic tunnel junctions and spin-transfer torque RAM (STT-RAM): An
switching in deep submicron-sized and spin-transfer torque random access analysis of chip data, thermal stability and
low-resistance magnetic tunnel junctions,” memory,” J. Phys. Condens. Matter, vol. 19, scalability,” in Proc. IEEE IMW, vol. 1,
Appl. Phys. Lett., vol. 84, no. 16, p. 3118, no. 16, Apr. 2007, Art. no. 165209. no. 408, 2010, pp. 1–3.
2004. [61] T. Kawahara, “Challenges toward gigabit-scale
[45] T. Devolder et al., “Single-shot time-resolved
[27] M. Gajek et al., “Spin torque switching of measurements of nanosecond-scale spin-transfer torque random access
20 nm magnetic tunnel junctions with spin-transfer induced switching: stochastic memory and beyond for normally off, green
perpendicular anisotropy,” Appl. Phys. Lett., versus deterministic aspects,” Phys. Rev. information technology infrastructure
vol. 100, no. 13, 2012, Art. no. 132408. Lett., vol. 100, no. 5, Feb. 2008, (Invited),” J. Appl. Phys., vol. 109, no. 7,
[28] J. Sun, “Spin-current interaction with a Art. no. 057206. 2011, Art. no. 07D325.
monodomain magnetic body: A model [46] T. Seki et al., “Switching-probability [62] J. Li, P. Ndai, A. Goel, S. Salahuddin, and
study,” Phys. Rev. B, vol. 62, no. 1, distribution of spin-torque switching in K. Roy, “Design paradigm for robust
pp. 570–578, Jul. 2000. MgO-based magnetic tunnel junctions,” spin-torque transfer magnetic RAM
[29] J. Xiao, A. Zangwill, and M. Stiles, Appl. Phys. Lett., vol. 99, no. 11, (STT MRAM) from circuit/architecture
“Boltzmann test of Slonczewski’s theory of Art. no. 112504, 2011. perspective,” IEEE Trans. Very Large Scale
spin-transfer torque,” Phys. Rev. B, vol. 70, Integr. Syst., vol. 18, no. 12, pp. 1710–1723,
[47] W. Brown, “Thermal fluctuations of a
no. 17, Nov. 2004, Art. no. 172405. Dec. 2010.
single-domain particle,” Phys. Rev.,
[30] NIST, “OOMMF,” 2006. [Online]. vol. 130, no. 5, pp. 1677–1686, Jun. 1963. [63] X. Fong, S. H. Choday, and K. Roy,
Available: “Bit-cell level optimization for non-volatile
[48] W. Scholz, T. Schrefl, and J. Fidler,
memories using magnetic tunnel junctions
[31] G. Jeong et al., “A 0.24- m 2.0-V “Micromagnetic simulation of thermally
and spin-transfer torque switching,” IEEE
1T1MTTJ 16-kb nonvolatile activated switching in fine particles,”
Trans. Nanotechnol., vol. 11, no. 1,
magnetoresistance RAM with self-reference J. Magn. Magn. Mater., vol. 233, no. 3,
pp. 172–181, Jan. 2012.
sensing scheme,” IEEE J. Solid-State pp. 296–304, Aug. 2001.
Circuits, vol. 38, no. 11, pp. 1906–1910, [64] X. Fong, Y. Kim, S. H. Choday, and
[49] D. Pinna, A. D. Kent, and D. L. Stein,
Nov. 2003. K. Roy, “Failure mitigation techniques for
“Spin-transfer torque magnetization
1T-1MTJ spin-transfer torque MRAM
[32] T. Kawahara et al., “2 Mb SPRAM reversal in uniaxial nanomagnets with
bit-cells,” IEEE Trans. Very Large Scale
(spin-transfer torque RAM) with bit-by-bit thermal noise,” J. Appl. Phys., vol. 114,
Integr. Syst., vol. 22, no. 2, pp. 384–395,
bi-directional current write and no. 3, 2013, Art. no. 033901.
Feb. 2014.
parallelizing-direction current read,” IEEE [50] A. Raychowdhury, D. Somasekhar, T. Karnik,
J. Solid-State Circuits, vol. 43, no. 1, [65] W. Kang et al., “Variation-tolerant and
and V. De, “Design space and scalability
pp. 109–120, Jan. 2008. disturbance-free sensing circuit for
exploration of 1T-1STT MTJ memory arrays
deep nanometer STT-MRAM,” IEEE
[33] S. Ikeda et al., “A perpendicular-anisotropy in the presence of variability and
Trans. Nanotechnol., vol. 13, no. 6,
CoFeB-MgO magnetic tunnel junction,” disturbances,” in Proc. IEEE Int. Electron
pp. 1088–1092, Nov. 2014.
Nature Mater., vol. 9, no. 9, pp. 721–724, Devices Meet., Aug. 2009, pp. 1–4.
Sep. 2010. [66] J. Akerman et al., “Reliability of 4MBIT
[51] A. Nigam et al., “Delivering on the promise
MRAM,” in Proc. 43rd IEEE Int. Rel. Phys.
[34] T. Kishi et al., “Lower-current and fast of universal memory for spin-transfer torque
Symp., 2005, pp. 163–167.
switching of a perpendicular TMR for high RAM (STT-RAM),” in Proc. IEEE/ACM Int.
speed and high density spin-transfer-torque Symp. Low Power Electron. Des., vol. 1, [67] K. Hosotani et al., “Resistance drift of
MRAM,” in Proc. IEEE Int. Electron Devices pp. 121–126, Aug. 2011. MgO magnetic tunnel junctions by
Meet., Dec. 2008, pp. 1–4. trapping and degradation of coherent
[52] W. H. Butler et al., “Switching distributions
tunneling,” in Proc. IEEE Int. Rel. Phys.
[35] J. H. Jeong et al., “Extended scalability of for perpendicular spin-torque devices
Symp., vol. 8, Apr. 2008, pp. 703–704.
perpendicular STT-MRAM towards sub-20 nm within the macrospin approximation,”
MTJ node,” in Proc. Int. Electron Devices IEEE Trans. Magn., vol. 48, no. 12, [68] C. Yoshida et al., “A study of dielectric
Meet., Dec. 2011, pp. 24.1.1–24.1.4. pp. 4684–4700, Dec. 2012. breakdown mechanism in CoFeB/MgO/
CoFeB magnetic tunnel junction,” in Proc.
[36] A. Driskill-Smith et al., “Latest Advances and [53] P. K. Amiri et al., “Spin-transfer torque
IEEE Int. Rel. Phys. Symp., Apr. 2009,
Roadmap for In-Plane and Perpendicular switching above ambient temperature,”
pp. 139–142.
STT-RAM,” in Proc. 3rd IEEE IMW, vol. 1, IEEE Magn. Lett., vol. 3, 2012,
no. 408, 2011, pp. 1–3. Art. no. 3000304. [69] C. Yoshida and T. Sugii, “Reliability study of
magnetic tunnel junction with naturally
[37] K. C. Chun et al., “A scaling roadmap and [54] N. D. Rizzo et al., “Thermally activated
oxidized MgO barrier,” in Proc. IEEE Int. Rel.
performance evaluation of in-plane and magnetization reversal in submicron
Phys. Symp., Apr. 2012, pp. 2A.3.1–2A.3.5.
perpendicular MTJ Based STT-MRAMs for magnetic tunnel junctions for
high-density cache memory,” IEEE J. magnetoresistive random access memory,” [70] J. Hayakawa, S. Ikeda, F. Matsukura,
Solid-State Circuits, vol. 48, no. 2, Appl. Phys. Lett., vol. 80, no. 13, 2002, H. Takahashi, and H. Ohno, “Dependence
pp. 598–610, Feb. 2013. Art. no. 2335. of giant tunnel magnetoresistance of
sputtered CoFeB/MgO/CoFeB magnetic
[38] J. Kim et al., “Scaling analysis of in-plane
tunnel junctions on MgO barrier thickness
and perpendicular anisotropy magnetic
and annealing temperature,” Jpn. J. Appl.


Phys., vol. 44, no. 19, pp. L587–L589, Apr. 2005.
[71] P. Khalili Amiri et al., “Low write-energy magnetic tunnel junctions for high-speed spin-transfer-torque MRAM,” IEEE Electron Device Lett., vol. 32, no. 1, pp. 57–59, Jan. 2011.
[72] B. Razavi, Design of Analog CMOS Integrated Circuits, New York, NY, USA: McGraw-Hill, 2002, vol. 4.
[73] P. Khalili Amiri et al., “Switching current reduction using perpendicular anisotropy in CoFeB-MgO magnetic tunnel junctions,” Appl. Phys. Lett., vol. 98, no. 11, 2011, Art. no. 112507.
[74] Y. Huai et al., “Spin transfer switching and spin polarization in magnetic tunnel junctions with MgO and AlOₓ barriers,” Appl. Phys. Lett., vol. 87, no. 23, 2005, Art. no. 232502.
[75] X. Yao, H. Meng, Y. Zhang, and J.-P. Wang, “Improved current switching symmetry of magnetic tunneling junction and giant magnetoresistance devices with nano-current-channel structure,” J. Appl. Phys., vol. 103, no. 7, 2008, Art. no. 07A717.
[76] D. Datta, B. Behin-Aein, S. Datta, and S. Salahuddin, “Voltage asymmetry of spin-transfer torques,” IEEE Trans. Nanotechnol., vol. 11, no. 2, pp. 261–272, Mar. 2012.
[77] J. Slonczewski and J. Sun, “Theory of voltage-driven current and torque in magnetic tunnel junctions,” J. Magn. Magn. Mater., vol. 310, no. 2, pp. 169–175, Mar. 2007.
[78] H. Sato et al., “MgO/CoFeB/Ta/CoFeB/MgO recording structure in magnetic tunnel junctions with perpendicular easy axis,” IEEE Trans. Magn., vol. 49, no. 7, pp. 4437–4440, Jul. 2013.
[79] H. Sato et al., “Comprehensive study of CoFeB-MgO magnetic tunnel junction characteristics with single- and double-interface scaling down to 1X nm,” in Proc. IEEE Int. Electron Devices Meet., Dec. 2013, pp. 3.2.1–3.2.4.
[80] H. Sato et al., “Co/Pt multilayer based reference layers in magnetic tunnel junctions for nonvolatile spintronics VLSIs,” Jpn. J. Appl. Phys., vol. 53, no. 4S, Jan. 2014, Art. no. 04EM02.
[81] F. Schreiber, J. Pflaum, Z. Frait, T. Mühge, and J. Pelzl, “Gilbert damping and g-factor in FeₓCo₁₋ₓ alloy films,” Solid State Commun., vol. 93, no. 12, pp. 965–968, Mar. 1995.
[82] J. Katine, F. Albert, R. Buhrman, E. Myers, and D. Ralph, “Current-driven magnetization reversal and spin-wave excitations in Co/Cu/Co pillars,” Phys. Rev. Lett., vol. 84, no. 14, pp. 3149–3152, Apr. 2000.
[83] X. Liu, W. Zhang, M. J. Carter, and G. Xiao, “Ferromagnetic resonance and damping properties of CoFeB thin films as free layers in MgO-based magnetic tunnel junctions,” J. Appl. Phys., vol. 110, no. 3, 2011, Art. no. 033910.
[84] T. Devolder et al., “Damping of CoₓFe₈₀₋ₓB₂₀ ultrathin films with perpendicular magnetic anisotropy,” Appl. Phys. Lett., vol. 102, no. 2, 2013, Art. no. 022407.
[85] Y. Tserkovnyak, A. Brataas, and G. E. W. Bauer, “Enhanced Gilbert damping in thin ferromagnetic films,” Phys. Rev. Lett., vol. 88, no. 11, Feb. 2002, Art. no. 117601.
[86] H. Sato et al., “Perpendicular-anisotropy CoFeB-MgO magnetic tunnel junctions with a MgO/CoFeB/Ta/CoFeB/MgO recording structure,” Appl. Phys. Lett., vol. 101, no. 2, 2012, Art. no. 022414.
[87] J.-H. Park et al., “Enhancement of data retention and write current scaling for sub-20 nm STT-MRAM by utilizing dual interfaces for perpendicular magnetic anisotropy,” in Proc. Symp. VLSI Technol., 2012, pp. 57–58.
[88] C. Augustine et al., “Numerical analysis of typical STT-MTJ stacks for 1T-1R memory arrays,” in Proc. IEEE Int. Electron Devices Meet., Dec. 2010, pp. 22.7.1–22.7.4.
[89] T. Devolder, C. Chappert, J. Katine, M. Carey, and K. Ito, “Distribution of the magnetization reversal duration in subnanosecond spin-transfer switching,” Phys. Rev. B, vol. 75, no. 6, pp. 1–5, Feb. 2007.
[90] N. N. Mojumder and K. Roy, “Proposal for switching current reduction using reference layer with tilted magnetic anisotropy in magnetic tunnel junctions for spin-transfer torque (STT) MRAM,” IEEE Trans. Electron Devices, vol. 59, no. 11, pp. 3054–3060, Nov. 2012.
[91] X. Fong et al., “KNACK: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells,” in Proc. IEEE Int. Conf. Simul. Semicond. Process. Devices, Sep. 2011, pp. 51–54.
[92] A. D. Kent, B. Özyilmaz, and E. del Barco, “Spin-transfer-induced precessional magnetization reversal,” Appl. Phys. Lett., vol. 84, no. 19, p. 3897, Apr. 2004.
[93] R. Law, E.-L. Tan, R. Sbiaa, T. Liew, and T. C. Chong, “Reduction in critical current for spin transfer switching in perpendicular anisotropy spin valves using an in-plane spin polarizer,” Appl. Phys. Lett., vol. 94, no. 6, Feb. 2009, Art. no. 062516.
[94] H. Liu, D. Bedau, D. Backes, J. A. Katine, and A. D. Kent, “Precessional reversal in orthogonal spin transfer magnetic random access memory devices,” Appl. Phys. Lett., vol. 101, no. 3, 2012, Art. no. 032403.
[95] R. Sbiaa et al., “Reduction of switching current by spin transfer torque effect in perpendicular anisotropy magnetoresistive devices (Invited),” J. Appl. Phys., vol. 109, no. 7, Mar. 2011, Art. no. 07C707.
[96] Z. Sun, H. Li, Y. Chen, and X. Wang, “Voltage driven nondestructive self-reference sensing scheme of spin-transfer torque memory,” IEEE Trans. Very Large Scale Integr. Syst., vol. 20, no. 11, pp. 2020–2030, Nov. 2012.
[97] P. Braganca et al., “A three-terminal approach to developing spin-torque written magnetic random access memory cells,” IEEE Trans. Nanotechnol., vol. 8, no. 2, pp. 190–195, Mar. 2009.
[98] N. N. Mojumder et al., “Dual pillar spin-transfer torque MRAMs for low power applications,” ACM J. Emerg. Technol. Comput. Syst., vol. 9, no. 2, pp. 1–17, May 2013.
[99] N. N. Mojumder, S. K. Gupta, and K. Roy, “Dual pillar spin transfer torque MRAM with tilted magnetic anisotropy for fast and error-free switching and near-disturb-free read operations,” in Proc. 69th IEEE Device Res. Conf., vol. 54, no. 2010, pp. 67–68.
[100] S. Fukami et al., “Low-current perpendicular domain wall motion cell for scalable high-speed MRAM,” in Proc. Symp. VLSI Technol., Jun. 2009, pp. 230–231.
[101] S. Fukami et al., “20-nm magnetic domain wall motion memory with ultralow-power operation,” in Proc. IEEE Int. Electron Devices Meet., Dec. 2013, pp. 3.5.1–3.5.4.
[102] S. Huda and A. Sheikholeslami, “A novel STT-MRAM cell with disturbance-free read operation,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, pp. 1534–1547, Jun. 2013.
[103] X. Fong and K. Roy, “Complementary polarizers STT-MRAM (CPSTT) for on-chip caches,” IEEE Electron Device Lett., vol. 34, no. 2, pp. 232–234, Feb. 2013.
[104] X. Fong and K. Roy, “Low-power robust complementary polarizer STT-MRAM (CPSTT) for on-chip caches,” in Proc. 5th IEEE Int. Memory Workshop, vol. 34, no. 2, May 2013, pp. 88–91.
[105] X. Fong, R. Venkatesan, A. Raghunathan, and K. Roy, “Non-volatile complementary polarizer spin-transfer torque (CPSTT) on-chip caches: A device/circuit/systems perspective,” IEEE Trans. Magn., vol. 50, no. 10, Oct. 2014, Art. no. 3400611.
[106] T. Yang, T. Kimura, and Y. Otani, “Giant spin-accumulation signal and pure spin-current-induced reversible magnetization switching,” Nature Phys., vol. 4, no. 11, pp. 851–854, Oct. 2008.
[107] Y. Ji, A. Hoffmann, J. S. Jiang, J. E. Pearson, and S. D. Bader, “Non-local spin injection in lateral spin valves,” J. Phys. D. Appl. Phys., vol. 40, no. 5, pp. 1280–1284, Mar. 2007.
[108] T. Valet and A. Fert, “Theory of the perpendicular magnetoresistance in magnetic multilayers,” Phys. Rev. B, vol. 48, no. 10, pp. 7099–7113, Sep. 1993.
[109] L. Xue et al., “Resonance measurement of nonlocal spin torque in a three-terminal magnetic device,” Phys. Rev. Lett., vol. 108, no. 14, Apr. 2012, Art. no. 147201.
[110] M. Sharad, G. Panagopoulos, C. Augustine, and K. Roy, “NLSTT-MRAM: Robust spin transfer torque MRAM using non-local spin injection for write,” in Proc. 70th IEEE Device Res. Conf., Jun. 2012, pp. 97–98.
[111] L. Liu et al., “Spin-torque switching with the giant spin Hall effect of tantalum,” Science, vol. 336, no. 6081, pp. 555–558, May 2012.
[112] I. M. Miron et al., “Perpendicular switching of a single ferromagnetic layer induced by in-plane current injection,” Nature, vol. 476, no. 7359, pp. 189–193, Aug. 2011.
[113] I. M. Miron et al., “Current-driven spin torque induced by the Rashba effect in a ferromagnetic metal layer,” Nature Mater., vol. 9, no. 3, pp. 230–234, Mar. 2010.
[114] V. Edelstein, “Spin polarization of conduction electrons induced by electric current in two-dimensional asymmetric electron systems,” Solid State Commun., vol. 73, no. 3, pp. 233–235, Jan. 1990.
[115] A. Chernyshov et al., “Evidence for reversible control of magnetization in a ferromagnetic material by means of spin-orbit magnetic field,” Nature Phys., vol. 5, no. 9, pp. 656–659, Aug. 2009.
[116] M. Dyakonov and V. Perel, “Current-induced spin orientation of electrons in semiconductors,” Phys. Lett. A, vol. 35, no. 6, pp. 459–460, Jul. 1971.
[117] J. Hirsch, “Spin Hall effect,” Phys. Rev. Lett., vol. 83, no. 9, pp. 1834–1837, Aug. 1999.
[118] G. Yu et al., “Magnetization switching through spin-Hall-effect-induced chiral domain wall propagation,” Phys. Rev. B, vol. 89, no. 10, Mar. 2014, Art. no. 104421.
[119] Y. Yoshimura et al., “Effect of spin Hall torque on current-induced precessional domain wall motion,” Appl. Phys. Exp., vol. 7, no. 3, Mar. 2014, Art. no. 033005.
[120] P. Gambardella and I. M. Miron, “Current-induced spin-orbit torques,” Philos. Trans. A. Math. Phys. Eng. Sci., vol. 369, no. 1948, pp. 3175–3197, Aug. 2011.
[121] J. Sinova et al., “Universal intrinsic Spin Hall effect,” Phys. Rev. Lett., vol. 92, no. 12, Mar. 2004, Art. no. 126603.
[122] C. F. Pai et al., “Spin transfer torque devices utilizing the giant spin Hall effect of tungsten,” Appl. Phys. Lett., vol. 101, no. 2012, pp. 1–5, 2012.
[123] T. Kimura, Y. Otani, T. Sato, S. Takahashi, and S. Maekawa, “Room-temperature reversible Spin Hall effect,” Phys. Rev. Lett., vol. 98, no. 15, pp. 1–4, Apr. 2007.
[124] Y. Kim, S. H. Choday, and K. Roy, “DSH-MRAM: Differential Spin Hall MRAM for on-chip memories,” IEEE Electron Device Lett., vol. 34, no. 10, pp. 1259–1261, Oct. 2013.
[125] L. Liu, O. J. Lee, T. J. Gudmundsen, D. C. Ralph, and R. A. Buhrman, “Current-induced switching of perpendicularly magnetized magnetic layers using spin torque from the Spin Hall effect,” Phys. Rev. Lett., vol. 109, no. 9, Aug. 2012, Art. no. 096602.
[126] G. Yu et al., “Current-driven perpendicular magnetization switching in Ta/CoFeB/[TaOx or MgO/TaOx] films with lateral structural asymmetry,” Appl. Phys. Lett., vol. 105, no. 10, Sep. 2014, Art. no. 102411.
[127] L. You et al., “Switching of perpendicularly polarized nanomagnets with spin orbit torque without an external magnetic field by engineering a tilted anisotropy,” in Proc. National Academy Sci., vol. 112, no. 33, Aug. 2015, pp. 10310–10315.
[128] J. Ryu, K.-J. Lee, and H.-W. Lee, “Current-driven domain wall motion with spin Hall effect: Reduction of threshold current density,” Appl. Phys. Lett., vol. 102, no. 17, 2013, Art. no. 172404.
[129] E. Martinez, S. Emori, and G. S. D. Beach, “Current-driven domain wall motion along high perpendicular anisotropy multilayers: The role of the Rashba field, the spin Hall effect, and the Dzyaloshinskii-Moriya interaction,” Appl. Phys. Lett., vol. 103, no. 7, 2013, Art. no. 072406.
[130] A. V. Khvalkovskiy et al., “Matching domain-wall configuration and spin-orbit torques for efficient domain-wall motion,” Phys. Rev. B, vol. 87, no. 2, Jan. 2013, Art. no. 020402.
[131] S. Emori, U. Bauer, S.-M. Ahn, E. Martinez, and G. S. D. Beach, “Current-driven dynamics of chiral ferromagnetic domain walls,” Nature Mater., vol. 12, no. 7, pp. 611–616, Jul. 2013.
[132] W.-G. Wang, M. Li, S. Hageman, and C. L. Chien, “Electric-field-assisted switching in magnetic tunnel junctions,” Nature Mater., vol. 11, no. 1, pp. 64–68, Jan. 2012.
[133] N. A. Pertsev, “Origin of easy magnetization switching in magnetic tunnel junctions with voltage-controlled interfacial anisotropy,” Sci. Rep., vol. 3, Jan. 2013, Art. no. 2757.
[134] S. Kim, S. Shin, and K. No, “Voltage control of magnetization easy-axes: A potential candidate for spin switching in future ultrahigh-density nonvolatile magnetic random access memory,” IEEE Trans. Magn., vol. 40, no. 4, pp. 2637–2639, Jul. 2004.
[135] K. Roy, S. Bandyopadhyay, and J. Atulasimha, “Hybrid spintronics and straintronics: A magnetic technology for ultra low energy computing and signal processing,” Appl. Phys. Lett., vol. 99, no. 6, 2011, Art. no. 063108.
[136] N. Lei et al., “Strain-controlled magnetic domain wall propagation in hybrid piezoelectric/ferromagnetic structures,” Nature Commun., vol. 4, Jan. 2013, Art. no. 1378.
[137] M. Barangi and P. Mazumder, “Straintronics-based magnetic tunneling junction: Dynamic and static behavior analysis and material investigation,” Appl. Phys. Lett., vol. 104, no. 16, Apr. 2014, Art. no. 162403.
[138] A. Khan, D. E. Nikonov, S. Manipatruni, T. Ghani, and I. A. Young, “Voltage induced magnetostrictive switching of nanomagnets: Strain assisted strain transfer torque random access memory,” Appl. Phys. Lett., vol. 104, no. 26, Jun. 2014, Art. no. 262407.
[139] X. Lou, Z. Gao, D. V. Dimitrov, and M. X. Tang, “Demonstration of multilevel cell spin transfer switching in MgO magnetic tunnel junctions,” Appl. Phys. Lett., vol. 93, no. 24, 2008, Art. no. 242502.
[140] T. Ishigaki et al., “A multi-level-cell spin-transfer torque memory with series-stacked magnetotunnel junctions,” in Proc. IEEE Symp. VLSI Technol., Jun. 2010, pp. 47–48.
[141] R. Sbiaa et al., “Spin transfer torque switching for multi-bit per cell magnetic memory with perpendicular anisotropy,” Appl. Phys. Lett., vol. 99, no. 9, 2011, Art. no. 092506.
[142] G. Panagopoulos, C. Augustine, X. Fong, and K. Roy, “Exploring variability and reliability of multi-level STT-MRAM cells,” in Proc. 70th IEEE Device Res. Conf., Jun. 2012, pp. 139–140.
[143] S. S. P. Parkin, M. Hayashi, and L. Thomas, “Magnetic domain-wall racetrack memory,” Science, vol. 320, no. 5873, pp. 190–194, Apr. 2008.
[144] M.-Y. Im, L. Bocklage, P. Fischer, and G. Meier, “Direct observation of stochastic domain-wall depinning in magnetic nanowires,” Phys. Rev. Lett., vol. 102, no. 14, Apr. 2009, Art. no. 147204.
[145] G. Beach, C. Knutson, C. Nistor, M. Tsoi, and J. Erskine, “Nonlinear domain-wall velocity enhancement by spin-polarized electric current,” Phys. Rev. Lett., vol. 97, no. 5, Aug. 2006, Art. no. 057203.
[146] M. Hayashi, L. Thomas, R. Moriya, C. Rettner, and S. S. P. Parkin, “Current-controlled magnetic domain-wall nanowire shift register,” Science, vol. 320, no. 5873, pp. 209–211, Apr. 2008.
[147] S. Ghosh, “Design methodologies for high density domain wall memory,” in Proc. IEEE/ACM Int. Symp. Nanoscale Archit., vol. 2, Jun. 2013, pp. 30–31.
[148] Y. Kim, X. Fong, K.-W. Kwon, M.-C. Chen, and K. Roy, “Multilevel spin-orbit torque MRAMs,” IEEE Trans. Electron Devices, vol. 62, no. 2, pp. 561–568, Feb. 2015.
[149] M. Sharad, R. Venkatesan, A. Raghunathan, and K. Roy, “Multi-level magnetic RAM using domain wall shift for energy-efficient, high-density caches,” in Proc. IEEE Int. Symp. Low Power Electron. Des., Sep. 2013, pp. 64–69.
[150] D. Lee, X. Fong, and K. Roy, “R-MRAM: A ROM-embedded STT MRAM cache,” IEEE Electron Device Lett., vol. 34, no. 10, pp. 1256–1258, Oct. 2013.
[151] X. Fong, R. Venkatesan, D. Lee, A. Raghunathan, and K. Roy, “Embedding read-only memory in spin-transfer torque MRAM based on-chip caches,” IEEE Trans. Very Large Scale Integr. Syst., vol. 24, no. 3, pp. 992–1002, Mar. 2016.
[152] L. Zhang, X. Fong, C.-H. Chang, Z. H. Kong, and K. Roy, “Feasibility study of emerging non-volatile memory based physical unclonable functions,” in Proc. 6th IEEE Int. Memory Workshop, May 2014, pp. 1–4.
[153] L. Zhang, X. Fong, C.-H. Chang, Z. H. Kong, and K. Roy, “Highly reliable memory-based physical unclonable function using spin-transfer torque MRAM,” in Proc. IEEE Int. Symp. Circuits Syst., Jun. 2014, pp. 2169–2172.
[154] L. Zhang, X. Fong, C.-H. Chang, Z. H. Kong, and K. Roy, “Optimizating emerging nonvolatile memories for dual-mode applications: data storage and key generator,” IEEE Trans. Comput. Des. Integr. Circuits Syst., vol. 34, no. 7, pp. 1176–1187, Jul. 2015.
[155] A. Fukushima et al., “Statistical variance in switching probability of spin-torque switching in MgO-MTJ,” IEEE Trans. Magn., vol. 48, no. 11, pp. 4344–4346, Nov. 2012.
[156] A. Fukushima et al., “Spin dice: A scalable truly random number generator based on spintronics,” Appl. Phys. Exp., vol. 7, no. 8, Aug. 2014, Art. no. 083001.
[157] A. Fukushima, K. Yakushiji, H. Kubota, and S. Yuasa, “Spin dice (physical random number generator using spin torque switching) and its thermal response,” in Proc. IEEE Magn. Conf., 2015, p. 1.
[158] W. H. Choi et al., “A magnetic tunnel junction based true random number generator with conditional perturb and real-time output probability tracking,” in IEEE Int. Electron Devices Meet. (IEDM) Tech. Dig., 2014, pp. 12.5.1–12.5.4.
[159] X. Fong, M.-C. Chen, and K. Roy, “Generating true random numbers using on-chip complementary polarizer spin-transfer torque magnetic tunnel junctions,” in Proc. 72nd IEEE Device Res. Conf., vol. 11, no. 1, pp. 103–104.
[160] M. Dichtl, “Bad and good ways of post-processing biased physical random numbers,” in Fast Software Encryption, Berlin, Germany: Springer, pp. 137–152.
[161] P. Lacharme, “Post-processing functions for a biased physical random number generator,” in Fast Software Encryption, Berlin, Germany: Springer, pp. 334–342.
[162] J. Von Neumann, “Various techniques used in connection with random digit,” Nat. Bur. Stand. Appl. Math. Ser. 12, pp. 36–38, 1951.
[163] R. Nebashi et al., “A 90 nm 12 ns 32 Mb 2T1MTJ MRAM,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2009, pp. 462–463.
[164] S. P. Park et al., “Column-selection-enabled 8T SRAM array with 1R/1W multi-port operation for DVFS-enabled processors,” in Proc. IEEE/ACM Int. Symp. Low Power Electron. Des., Aug. 2011, pp. 303–308.
[165] A. Hokazono et al., “Forward body biasing as a bulk-Si CMOS technology scaling strategy,” IEEE Trans. Electron Devices, vol. 55, no. 10, pp. 2657–2664, Oct. 2008.
[166] G. Panagopoulos, C. Augustine, and K. Roy, “Modeling of dielectric breakdown-induced
time-dependent STT-MRAM performance degradation,” in Proc. 69th IEEE Device Res. Conf., Jun. 2011, pp. 125–126.
[167] C.-H. Ho et al., “A physical model to predict STT-MRAM performance degradation induced by TDDB,” in Proc. 71st IEEE Device Res. Conf., vol. 45, no. 765, pp. 59–60.
[168] T. Devolder et al., “Magnetization switching by spin torque using subnanosecond current pulses assisted by hard axis magnetic fields,” Appl. Phys. Lett., vol. 88, no. 15, 2006, Art. no. 152502.
[169] K. Ito, T. Devolder, C. Chappert, M. J. Carey, and J. A. Katine, “Micromagnetic simulation of spin transfer torque switching combined with precessional motion from a hard axis magnetic field,” Appl. Phys. Lett., vol. 89, no. 25, 2006, Art. no. 252509.
[170] K. Ito, T. Devolder, C. Chappert, M. J. Carey, and J. A. Katine, “Micromagnetic simulation on effect of oersted field and hard axis field in spin transfer torque switching,” J. Phys. D. Appl. Phys., vol. 40, no. 5, pp. 1261–1267, Mar. 2007.
[171] Y. Lakys et al., “Self-enabled ‘error-free’ switching circuit for spin transfer torque MRAM and logic,” IEEE Trans. Magn., vol. 48, no. 9, pp. 2403–2406, Sep. 2012.
[172] W. Kang et al., “Spintronics: Emerging ultra-low power circuits and systems beyond MOS technology,” ACM J. Emerg. Technol. Comput. Syst., vol. 12, no. 2, pp. 1–42, Sep. 2015.
[173] Y. Kim, S. K. Gupta, S. P. Park, G. Panagopoulos, and K. Roy, “Write-optimized reliable design of STT MRAM,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), 2012, p. 3.
[174] S. K. Gupta, N. N. Mojumder, and K. Roy, “Layout-aware optimization of STT MRAMs,” in Proc. IEEE Design Autom. Test Eur. Conf. Exhib., Mar. 2012, pp. 1455–1458.
[175] C. Augustine et al., “Spin-transfer torque MRAMs for low power memories: Perspective and prospective,” IEEE Sensors J., vol. 12, no. 4, pp. 756–766, Apr. 2012.
[176] B. Zhao, J. Yang, Y. Zhang, Y. Chen, and H. Li, “Architecting a common-source-line array for bipolar non-volatile memory devices,” in Proc. IEEE Design Autom. Test Eur. Conf. Exhib., Mar. 2012, pp. 1451–1454.
[177] C. Mead and L. Conway, Introduction to VLSI Systems, Reading, MA, USA: Addison-Wesley, 1980.
[178] The MOSIS Service, “MOSIS scalable CMOS (SCMOS) design rules.” [Online]. Available: design/rules/index.
[179] D. Lee, S. K. Gupta, and K. Roy, “High-performance low-energy STT MRAM based on balanced write scheme,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), 2012, p. 9.
[180] N. Sakimura et al., “A 250-MHz 1-Mbit embedded MRAM macro using 2T1MTJ cell with bitline separation and half-pitch shift architecture,” in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), 2007, pp. 216–219.
[181] K. Abe, S. Fujita, and T. H. Lee, “Architecture of three-dimensional circuit using nanoscale memory devices,” in Proc. Eur. Micro Nano Syst., 2004, pp. 225–229.
[182] T. Ohsawa et al., “1Mb 4T-2MTJ nonvolatile STT-RAM for embedded memories using 32b fine-grained power gating technique with 1.0 ns/200 ps wake-up/power-off times,” in Proc. IEEE Symp. VLSI Circuits, Jun. 2012, pp. 46–47.
[183] S. Yamamoto and S. Sugahara, “Nonvolatile static random access memory using magnetic tunnel junctions with current-induced magnetization switching architecture,” Jpn. J. Appl. Phys., vol. 48, no. 4, Apr. 2009, Art. no. 043001.
[184] S. Chatterjee, M. Rasquinha, S. Yalamanchili, and S. Mukhopadhyay, “A methodology for robust, energy efficient design of Spin-Torque-Transfer RAM arrays at scaled technologies,” in Proc. Int. Conf. Comput. Design (ICCAD), 2009, p. 474.
[185] H. Noguchi et al., “Variable nonvolatile memory arrays for adaptive computing systems,” in Proc. IEEE Int. Electron Devices Meet., Dec. 2013, pp. 25.4.1–25.4.4.
[186] W. Kang et al., “Reconfigurable codesign of STT-MRAM under process variations in deeply scaled technology,” IEEE Trans. Electron Devices, vol. 62, no. 6, pp. 1769–1777, Jun. 2015.
[187] W. Kang et al., “DFSTT-MRAM: Dual functional STT-MRAM cell structure for reliability enhancement and 3-D MLC functionality,” IEEE Trans. Magn., vol. 50, no. 6, pp. 1–7, Jun. 2014.
[188] W. Xu, Y. Chen, X. Wang, and T. Zhang, “Improving STT MRAM storage density through smaller-than-worst-case transistor sizing,” in Proc. Design Autom. Conf., 2009, pp. 87–90.
[189] B. D. Bel, J. Kim, C. H. Kim, and S. S. Sapatnekar, “Improving STT-MRAM density through multibit error correction,” in Proc. IEEE Design Autom. Test Eur. Conf. Exhib., 2014, pp. 1–6.
[190] S. P. Park, S. Gupta, N. Mojumder, A. Raghunathan, and K. Roy, “Future cache design using STT MRAMs for improved energy efficiency,” in Proc. 49th Annu. Design Autom. Conf. (DAC), 2012, p. 492.
[191] X. Wu et al., “Design exploration of hybrid caches with disparate memory technologies,” ACM Trans. Archit. Code Optim., vol. 7, no. 3, pp. 1–34, Dec. 2010.
[192] X. Wu et al., “Hybrid cache architecture with disparate memory technologies,” ACM SIGARCH Comput. Archit. News, vol. 37, no. 3, p. 34, Jun. 2009.
[193] X. Wu, J. Li, L. Zhang, E. Speight, and Y. Xie, “Power and performance of read-write aware Hybrid Caches with non-volatile memories,” in Proc. IEEE Design Autom. Test Eur. Conf. Exhib., Apr. 2009, pp. 737–742.
[194] J. Zhao, C. Xu, and Y. Xie, “Bandwidth-aware reconfigurable cache design with hybrid memory technologies,” in Proc. IEEE/ACM Int. Conf. Comput. Design, Nov. 2011, pp. 48–55.
[195] A. Jadidi, M. Arjomand, and H. Sarbazi-Azad, “High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement,” in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design, Aug. 2011, pp. 79–84.
[196] Y. Li, Y. Chen, and A. K. Jones, “A software approach for combating asymmetries of non-volatile memories,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), 2012, p. 191.
[197] S.-M. Syu, Y.-H. Shao, and I.-C. Lin, “High-endurance hybrid cache design in CMP architecture with cache partitioning and access-aware policy,” in Proc. 23rd ACM Int. Conf. Great Lakes Symp. VLSI-GLSVLSI, 2013, p. 19.
[198] Q. Li, J. Li, L. Shi, C. J. Xue, and Y. He, “MAC: Migration-aware compilation for STT-RAM based hybrid cache in embedded systems,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), 2012, p. 351.
[199] Y.-T. Chen et al., “Static and dynamic co-optimizations for blocks mapping in hybrid caches,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), 2012, p. 237.
[200] J. Hu, C. J. Xue, Q. Zhuge, W.-C. Tseng, and E. H.-M. Sha, “Towards energy efficient hybrid on-chip scratch pad memory with non-volatile memory,” in Proc. IEEE Design Autom. Test Eur., Mar. 2011, pp. 1–6.
[201] J. Hu, C. J. Xue, Q. Zhuge, W.-C. Tseng, and E. H.-M. Sha, “Data allocation optimization for hybrid scratch pad memory with SRAM and nonvolatile memory,” IEEE Trans. Very Large Scale Integr. Syst., vol. 21, no. 6, pp. 1094–1102, Jun. 2013.
[202] J. Cong et al., “An energy-efficient adaptive hybrid cache,” in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design, Aug. 2011, pp. 67–72.
[203] J. Wang et al., “A coherent hybrid SRAM and STT-RAM L1 cache architecture for shared memory multicores,” in Proc. 19th IEEE Asia South Pacific Des. Autom. Conf., Jan. 2014, pp. 610–615.
[204] G. Sun, X. Dong, Y. Xie, J. Li, and Y. Chen, “A novel architecture of the 3D stacked MRAM L2 cache for CMPs,” in Proc. 15th IEEE Int. Symp. High Perform. Comput. Archit., Feb. 2009, pp. 239–249.
[205] C. Zhang et al., “SBAC,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 2014, pp. 345–350.
[206] Z. Wang, D. A. Jimenez, C. Xu, G. Sun, and Y. Xie, “Adaptive placement and migration policy for an STT-RAM-based hybrid cache,” in Proc. 20th IEEE Int. Symp. High Perform. Comput. Archit., Feb. 2014, pp. 13–24.
[207] P. Zhou, B. Zhao, J. Yang, and Y. Zhang, “Energy reduction for STT-RAM using early write termination,” in IEEE/ACM Int. Conf. Comput. Design Dig. Tech. Papers (ICCAD), 2009, pp. 264–268.
[208] M. Rasquinha, D. Choudhary, S. Chatterjee, S. Mukhopadhyay, and S. Yalamanchili, “An energy efficient cache design using spin torque transfer (STT) RAM,” in Proc. 16th ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), 2010, p. 389.
[209] J. Wang, X. Dong, and Y. Xie, “OAP: An obstruction-aware cache management policy for STT-RAM last-level caches,” in Proc. IEEE Design Autom. Test Eur. Conf. Exhib. (DATE), 2013, pp. 847–852.
[210] J. Jung, Y. Nakata, M. Yoshimoto, and H. Kawaguchi, “Energy-efficient spin-transfer torque RAM cache exploiting additional all-zero-data flags,” in Proc. IEEE Int. Symp. Qual. Electron. Design, Mar. 2013, pp. 216–222.
[211] J. Ahn, S. Yoo, and K. Choi, “Write intensity prediction for energy-efficient non-volatile caches,” in Proc. IEEE Int. Symp. Low Power Electron. Design, Sep. 2013, pp. 223–228.
[212] M. Mao, G. Sun, Y. Li, A. K. Jones, and Y. Chen, “Prefetching techniques for STT-RAM based last-level cache in CMP systems,” in Proc. 19th IEEE Asia South Pacific Design Autom. Conf., Jan. 2014, pp. 67–72.
[213] A. Jog et al., “Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs,” in Proc. 49th Annu. Design Autom. Conf. (DAC), Jun. 2012, p. 243.
[214] C. W. Smullen, V. Mohan, A. Nigam, S. Gurumurthi, and M. R. Stan, “Relaxing non-volatility for fast and energy-efficient STT-RAM caches,” in Proc. 17th IEEE Int. Symp. High Perform. Comput. Archit., Feb. 2011, pp. 50–61.
[215] Q. Li et al., “Compiler-assisted refresh minimization for volatile STT-RAM cache,” in Proc. 18th IEEE Asia South Pacific Design Autom. Conf., Jan. 2013, pp. 273–278.
[216] J. Li et al., “Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh,” ACM Trans. Design Autom. Electron. Syst., vol. 19, no. 1, pp. 1–23, Dec. 2013.
[217] Z. Sun, X. Bi, H. Li, W.-F. Wong, and X. Zhu, “STT-RAM cache hierarchy with multiretention MTJ designs,” IEEE Trans. Very Large Scale Integr. Syst., vol. 22, no. 6, pp. 1281–1293, Jun. 2014.
[218] Z. Sun et al., “Multi retention level STT-RAM cache designs with a dynamic refresh scheme,” in Proc. 44th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO-44), 2011, p. 329.
[219] K.-W. Kwon, S. H. Choday, Y. Kim, and K. Roy, “AWARE (Asymmetric Write Architecture with REdundant blocks): A high write speed STT-MRAM cache architecture,” IEEE Trans. Very Large Scale Integr. Syst., vol. 22, no. 4, pp. 712–720, Apr. 2013.
[220] G. Sun, Y. Zhang, Y. Wang, and Y. Chen, “Improving energy efficiency of write-asymmetric memories by log style write,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), 2012, pp. 173–178.
[221] J. Wang, X. Dong, and Y. Xie, “Preventing STT-RAM last-level caches from port obstruction,” ACM Trans. Archit. Code Optim., vol. 11, no. 3, pp. 1–19, Jul. 2014.
[222] K. Miura et al., “CoFeB/MgO based perpendicular magnetic tunnel junctions with stepped structure for symmetrizing different retention times of ‘0’ and ‘1’ information,” in Proc. IEEE Symp. VLSI Technol., 2011, pp. 214–215.
[223] G. S. Kar et al., “Co/Ni based p-MTJ stack for sub-20 nm high density stand alone and high performance embedded memory application,” in Proc. IEEE Int. Electron Devices Meet., Dec. 2014, pp. 19.1.1–19.1.4.
[224] M. H. Jeon et al., “Selective etching of magnetic tunnel junction materials using CO/NH₃ gas mixture in radio frequency pulse-biased inductively coupled plasmas,” Jpn. J. Appl. Phys., vol. 52, no. 5S2, May 2013, Art. no. 05EB03.
[225] W. Kang et al., “A radiation hardened hybrid spintronic/CMOS nonvolatile unit using magnetic tunnel junctions,” J. Phys. D. Appl. Phys., vol. 47, no. 40, Oct. 2014, Art. no. 405003.
[226] N. Locatelli et al., “Spintronic Devices as Key Elements for Energy-Efficient Neuroinspired Architectures,” in Proc. Design Autom. Test Eur. Conf. Exhib., 2015, vol. 1, pp. 994–999.
[227] J. Kim et al., “Spin-based computing: device concepts, current status, and a case study on a microprocessor,” Proc. IEEE, vol. 103, no. 1, pp. 106–130, Jan. 2015.
[228] X. Fong et al., “Spin-transfer torque devices for logic and memory: Prospects and perspectives,” IEEE Trans. Comput. Design Integr. Circuits Syst., vol. 35, no. 1, pp. 1–22, Jan. 2016.
[229] R. Venkatesan, V. K. Chippa, C. Augustine, K. Roy, and A. Raghunathan, “Energy efficient many-core processor for recognition and mining using spin-based memory,” in Proc. IEEE/ACM Int. Symp. Nanoscale Archit., Jun. 2011, pp. 122–128.
[230] R. Venkatesan et al., “TapeCache: A high density, energy efficient cache based on domain wall memory,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED), 2012, p. 185.
[231] R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, “DWM-TAPESTRI - An energy efficient all-spin cache using domain wall shift based writes,” in Proc. IEEE Design Autom. Test Eur. Conf. Exhib. (DATE), 2013, pp. 1825–1830.
[232] R. Venkatesan, S. G. Ramasubramanian, S. Venkataramani, K. Roy, and A. Raghunathan, “STAG: Spintronic-tape architecture for GPGPU cache hierarchies,” ACM SIGARCH Comput. Archit. News, vol. 42, no. 3, pp. 253–264, Oct. 2014.
[233] Z. Sun, W. Wu, and H. H. Li, “Cross-layer racetrack memory design for ultra high density and low power consumption,” in Proc. 50th Annu. Design Autom. Conf. (DAC), 2013, p. 1.
[234] Z. Sun, X. Bi, A. K. Jones, and H. Li, “Design exploration of racetrack lower-level caches,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 2014, pp. 263–266.
[235] M. Mao, W. Wen, Y. Zhang, Y. Chen, and H. H. Li, “Exploration of GPGPU register file architecture using domain-wall-shift-write based racetrack memory,” in Proc. 51st Annu. Design Autom. Conf. (DAC), 2014, pp. 1–6.
[236] S. Motaman, A. Iyengar, and S. Ghosh, “Synergistic circuit and system design for energy-efficient and robust domain wall caches,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 2014, pp. 195–200.
[237] R. Venkatesan, V. K. Chippa, C. Augustine, K. Roy, and A. Raghunathan, “Domain-specific many-core computing using spin-based memory,” IEEE Trans. Nanotechnol., vol. 13, no. 5, pp. 881–894, Sep. 2014.
[238] R. Venkatesan et al., “Cache design with domain wall memories,” IEEE Trans. Comput., 2015, to be published.
[239] A. Ranjan et al., “DyReCTape: A dynamically reconfigurable cache using domain wall memory tapes,” in Proc. Design Autom. Test Eur. Conf. Exhib. (DATE), Grenoble, France, 2015, pp. 181–186.
[240] B. Behin-Aein, D. Datta, S. Salahuddin, and S. Datta, “Proposal for an all-spin logic device with built-in memory,” Nature Nanotechnol., vol. 5, pp. 266–270, Feb. 2010.
[241] Z. Pajouhi, S. Venkataramani, K. Yogendra, A. Raghunathan, and K. Roy, “Exploring spin-transfer-torque devices for logic applications,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 9, pp. 1441–1454, Sep. 2015.


Xuanyao Fong (Member, IEEE) received the B.Sc. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, USA, in 2006 and 2014, respectively.
From January to August 2007, he was an Intern Engineer with Advanced Micro Devices, Inc., Boston Design Center, Boxboro, MA, USA. He was a Research Assistant and then Postdoctoral Research Assistant to Prof. Kaushik Roy with the Nanoelectronics Research Laboratory, Purdue University, from August 2007 to May 2015. Currently, he is a Research Scientist with the Institute of Microelectronics, Agency for Science, Technology and Research (A*STAR), Singapore. His research interests include device/circuit/architecture co-design methodologies for Si and non-Si nanoelectronics; design of high-performance and ultralow power logic and memory systems using spintronic devices, circuits, and architectures; and non-Boolean and analog computing paradigms using emerging technologies.
Dr. Fong received the “AMD Design Excellence Award” at Purdue in 2008, and the best paper award at the 2006 International Symposium on Low Power Electronics and Design.

Yusung Kim received the B.S. degree in electrical and computer engineering from Rutgers University, Piscataway, NJ, USA, in 2007, and the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, USA, in 2010 and 2015, respectively.
He is currently a Component Design Engineer with Intel Corporation, Hillsboro, OR, USA. He was a Research Assistant to Prof. Kaushik Roy with the Nanoelectronics Research Laboratory, Purdue University, until 2015. He was a Summer Intern with the Analog and CMOS Embedded Memory Group, Texas Instruments, Dallas, TX, USA, in 2012 and 2013. His current research interests include device circuit codesign for emerging nonvolatile memories.
Dr. Kim was a recipient of the Magoon Award for Excellence in Teaching from Purdue University in 2010 and the Purdue Research Foundation Fellowship in Summer 2011.


Rangharajan Venkatesan (Member, IEEE) received [...] communication engineering from Indian Institute of Technology, Roorkee, India, in 2009, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, USA, in 2014.
During his Ph.D. studies, he was a research intern with Intel Corporation, Hillsboro, OR, USA, from May to September 2012 and June to September 2013, where he worked on developing low-power design methodologies for graphics processors and designing circuits for enabling fine-grained power gating. He is currently a Research Scientist with NVIDIA Corporation, Santa Clara, CA, USA. His research interests include circuit-architecture codesign for emerging technologies, neuromorphic hardware architectures, approximate computing, and variation-aware design methodologies.
Dr. Venkatesan was awarded the Ross Fellowship for the year 2009–2010 and the Bilsland Dissertation Fellowship for the year 2013–2014 by the Graduate School, Purdue University. He has received the Best Paper Award in the International Symposium on Low Power Electronics and Design (ISLPED) 2012 and a Best Paper Nomination in Design Automation Test in Europe (DATE) 2015.

Sri Harsha Choday (Student Member, IEEE) received the B.E. degree in electronics and telecommunication engineering from the University of Pune, Pune, India, in 2006, the M.S. degree in electrical engineering from Oklahoma State University, Stillwater, OK, USA, in 2009, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 2014.
He is currently a Research Scientist with Intel Labs, Intel Corporation, Hillsboro, OR, USA. He was a Senior Engineer with Qualcomm, Inc., San Diego, CA, USA, and a summer intern with the Technology Development Group, Open-Silicon, Inc., Milpitas, CA, USA, in 2014 and 2008, respectively. His current research interests include variation-aware low-power memory design, spintronics, and thermo-electric device modeling.
Dr. Choday was a student member of the Academic Appeals Panel. He was a recipient of a Ph.D. Fellowship from Oklahoma State University [...]

Anand Raghunathan (Fellow, IEEE) received the B.Tech. degree in electrical and electronics engineering from the Indian Institute of Technology, Madras, India, and the M.A. and Ph.D. degrees in electrical engineering from Princeton University, Princeton, NJ, USA.
He is a Professor and Chair of VLSI with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA, where he leads the Integrated Systems Laboratory. He was previously a Senior Research Staff Member with NEC Laboratories America, Princeton, NJ, USA, and has held the Gopalakrishnan Distinguished Chair with the Indian Institute of Technology, Madras. He has coauthored a book, High-Level Power Analysis and Optimization (Kluwer, 1998), eight book chapters, 21 U.S. patents, and over 220 refereed journal and conference papers. His research explores domain-specific architecture, system-on-chip design, embedded systems, and heterogeneous parallel computing.
Prof. Raghunathan is a Golden Core Member of the IEEE Computer Society. He has served on the technical program and organizing committees of several leading conferences and workshops. He has chaired the ACM/IEEE International Symposium on Low Power Electronics and Design, the ACM/IEEE International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, the IEEE VLSI Test Symposium, and the IEEE International Conference on VLSI Design. He has served as Associate Editor of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, ACM Transactions on Design Automation of Electronic Systems, IEEE TRANSACTIONS ON MOBILE COMPUTING, ACM Transactions on Embedded Computing Systems, IEEE Design & Test of Computers, and the Journal of Low Power Electronics. His publications have been recognized with eight best paper awards and four best paper nominations. He received the Patent of the Year Award (recognizing the invention with the highest impact) and two Technology Commercialization Awards from NEC. He was chosen by MIT’s Technology Review among the TR35 (top 35 innovators under 35 years, across various disciplines of science and technology) in 2006, for his work on “making mobile secure.” He was a recipient of the IEEE Meritorious Service Award in 2001 and Outstanding Service Award in 2004.

Kaushik Roy (Fellow, IEEE) received the B.Tech. degree in electronics and electrical communications engineering from the Indian Institute of Technology (IIT), Kharagpur, India, in 1983, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, USA, in 1990.
He was with the Semiconductor Process and Design Center of Texas Instruments, Dallas, TX, USA, where he worked on FPGA architecture development and low-power circuit design. He joined the Electrical and Computer Engineering faculty at Purdue University, West Lafayette, IN, USA, in 1993, where he is currently the Edward G. Tiedemann Jr. Distinguished Professor. He has published more than 600 papers in refereed journals and conferences, holds 15 patents, has supervised 70 Ph.D. dissertations, and is coauthor of two books on Low Power CMOS VLSI Design (Wiley & McGraw-Hill). His research interests include spintronics, device-circuit co-design for nano-scale Silicon and non-Silicon technologies, low-power electronics for portable computing and wireless communications, and new computing models enabled by emerging technologies.
Dr. Roy was a Research Visionary Board Member of Motorola Labs in 2002 and held the M.K. Gandhi Distinguished Visiting faculty at IIT Bombay. He has been on the Editorial Board of IEEE Design & Test, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and IEEE TRANSACTIONS ON ELECTRON DEVICES. He was Guest Editor for the Special Issue on Low-Power VLSI in the IEEE Design & Test (1994) and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (June 2000), IEE Proceedings Computers and Digital Techniques (July 2002), and IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS (2011). He received the National Science Foundation Career Development Award in 1995, the IBM faculty partnership award, the ATT/Lucent Foundation award, the 2005 SRC Technical Excellence Award, the SRC Inventors Award, the Purdue College of Engineering Research Excellence Award, the Humboldt Research Award in 2010, the 2010 IEEE Circuits and Systems Society Technical Achievement Award, the Distinguished Alumnus Award from IIT Kharagpur, the Fulbright-Nehru Distinguished Chair, the DoD National Security Science and Engineering Faculty Fellow (2014–2019), the Semiconductor Research Corporation Aristotle award in 2015; best paper awards at the 1997 International Test Conference, IEEE 2000 International Symposium on Quality of IC Design, the 2003 IEEE Latin American Test Workshop, the 2003 IEEE Nano, the 2004 IEEE International Conference on Computer Design, and the 2006 IEEE/ACM International Symposium on Low Power Electronics & Design; the 2005 IEEE Circuits and Systems Society Outstanding Young Author Award (Chris Kim), the 2006 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS Best Paper Award, the 2012 ACM/IEEE International Symposium on Low Power Electronics and Design Best Paper Award, and the 2013 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS Best Paper Award. He was a Purdue University Faculty Scholar from 1998 to 2003.

