Conference ETS 2017
Nour Sayed Fabian Oboril Azadeh Shirvanian Rajendra Bishnoi Mehdi B. Tahoori
Karlsruhe Institute of Technology (KIT) Karlsruhe, Germany
E-mail: {nour.sayed, fabian.oboril, azadeh.shirvanian, rajendra.bishnoi, mehdi.tahoori}@kit.edu
Abstract: Spin Transfer Torque Magnetic RAM (STT-MRAM) is an emerging non-volatile memory technology and a potential candidate to replace SRAM in processor caches. However, STT-MRAM suffers from a high write latency and high write energy consumption, which have to be addressed for energy-efficient on-chip caches. The non-volatility property of STT-MRAM can be relaxed by reducing the thermal stability factor to improve both the write latency and write energy of STT-MRAM. However, this leads to an increase in retention failure and read disturb rates, resulting in erroneous data stored in the cache. This problem can naturally be mitigated in the scope of approximate computing, in which such errors can be tolerated at the application level. In this paper, we show how STT-MRAM technology can effectively be used for approximate computing by tuning technology and application parameters to achieve an acceptable level of correctness with significant gains. Results show that using our proposed approximate computing framework, the per-access write latency and energy can be improved up to 25% and 70%, respectively.

I. INTRODUCTION

Energy consumption has emerged to be a major design constraint for modern integrated circuits, in particular in low-power application domains such as the “Internet of Things”. The static power, which is the dominating component in the total energy consumption, is the leading roadblock in these fields [1]. Therefore, the stringent energy targets cannot be achieved using conventional SRAMs because of their high leakage. As pointed out by the International Technology Roadmap for Semiconductors [2], one of the most promising solutions to overcome this power challenge is the integration of non-volatile memories, where the data can be retained without any power supply.

Among the emerging non-volatile memory technologies, Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) is one of the most promising technologies for future embedded memories. STT-MRAM is a novel and unique storage technology based on a magnetic device (called Magnetic Tunnel Junction or simply MTJ) that exploits not only the charge of electrons but also their spin to store digital content. As a result, STT-MRAM promises almost zero static power, fast access latencies (almost as fast as SRAM) and high integration density (as good as DRAM) [3, 4]. Thus, STT-MRAM has the potential to replace SRAM in the memory hierarchy of the computing system.

However, in order to employ STT-MRAM as a promising low-power solution, several fundamental challenges still have to be resolved. In particular, it is of decisive importance to reduce the write energy as well as the write latency of STT-MRAM [1, 2]. In fact, the MTJ cells require a rather high current for a long duration to switch their magnetization. Additionally, the stochastic switching nature of the MTJ cells necessitates a longer timing margin, which further extends the total write period. The overall write latency is significantly increased due to this long write period, and the write current continuously flows for the same duration, resulting in a high energy [5].

Combining fast writing speed and low write energy is not trivial. Consequently, there are a variety of different approaches at circuit and architecture level to improve the characteristics of STT-MRAM [6–10]. An interesting solution in that regard is to relax the non-volatility property of STT-MRAM, as this improves both writing speed and energy. On the other hand, this increases the error rate due to retention failures as well as read disturb [11].

In this paper, we show the potential of the STT-MRAM technology in the scope of approximate computing, which naturally fits with the aforementioned constraints. In general, approximate computing relaxes the bounds of accurate computing for applications which are inherently error resilient and, in return, improves the efficiency of the system by other means such as performance and energy improvements [12, 13]. Many applications, for instance in audio or image processing domains, possess an intrinsic fault tolerance, and thus can deal with noisy (i.e., erroneous) data. For example, when the output of multimedia (images, videos, etc.) processing algorithms is left to human perception, strict exactness may not be required and imprecise results may often be sufficient. Hence, with approximate computing, an increasing error rate can be tolerated to improve the write energy/latency of STT-MRAM with minimal cost. Moreover, even the overall system energy efficiency can further be improved due to shorter execution times thanks to the improved cache access latencies.

Our simulation results show that using the proposed approximate computing framework, the per-access write energy of STT-MRAM can be reduced by 70%. Applying this to the data cache of a microprocessor improves the overall system performance by 10% and 25% for 2 GHz and 1 GHz core frequencies, respectively.

The remainder of this paper is organized as follows. In Section II, the basics of STT-MRAM are presented and the idea of approximate computing is introduced. Afterwards, in Section III, the proposed framework using approximate computing for STT-MRAM is explained, followed by a comprehensive experimental study in Section IV. Finally, Section V concludes the paper.

II. PRELIMINARIES

A. Basics of STT-MRAM

In STT-MRAM, data is stored in a Magnetic Tunnel Junction (MTJ) cell, which consists of two independent ferromagnetic layers separated by a thin oxide layer. The magnetic orientation of one layer, the free layer, can be freely rotated, while the magnetization of the other layer, the reference layer, is fixed. Thus, the magnetization of the free layer can be in parallel or anti-parallel to the reference layer. As a result, the electric resistance of the MTJ changes to high for anti-parallel
and low for parallel. These two states are used to represent a logic '0' and '1', respectively (see Fig. 1).

Fig. 1: Typical STT-MRAM bit-cell structure. [Panels: Write-0 and Write-1, each showing bit-line, word-line, source-line, free layer and reference layer, with I_W > I_C.]

Fig. 2: Effects of thermal stability factor on (a) write energy and latency, (b) failure rates (for setup see Sec. IV-A). [Axes: (a) write energy [J] and latency versus thermal stability factor (∆); (b) retention failure and read disturb probability versus thermal stability factor (∆); ∆ from 20 to 60.]

In order to read data in an STT-MRAM cell, a low current flows through the MTJ to sense the resistance state of the cell. Also, to write data into an STT-MRAM cell, current flows through the MTJ. However, the write current (I_w) is much higher (tens of µA) than the critical current (I_c) (the minimum current required to flow for a considerable amount of time to switch magnetization at a certain write error rate). The final magnetization can be controlled by the write current direction (see Fig. 1). As a result of this high write current, the dynamic write energy in STT-MRAM is very high [5]. This issue is further pronounced due to the stochastic nature of the writing (switching) process as well as the high sensitivity to process variation, which requires considerable timing margins [7].

B. Retention Time in STT-MRAM

In general, the retention time in non-volatile memories refers to the data retaining capability of their bit-cell regardless of their powered-on or powered-off conditions. In STT-MRAM, this retention time depends on the thermal stability factor (∆) value of the MTJ cell. The higher the ∆ value, the longer the data can stay in the bit-cell. For a value of 60, it can retain the bit-cell content for 10 years [14]. The ∆ value [...]

C. Failure Rate Dependency on Thermal Stability Factor

With a low ∆ value, the STT-MRAM access latency and energy can be significantly reduced. However, it increases the retention failure rates and the possibility of read disturb, which are explained next.

Retention Failure: The retention failures in STT-MRAM happen due to the inherent thermal fluctuation of the MTJ cell, which can switch its magnetic orientation. This can occur regardless of whether a memory access is performed or not. The retention failure probability (P_RF) for a given time period (t) can be computed according to [14] as:

$P_{RF} = 1 - \exp\left[-\dfrac{t}{\tau \cdot e^{\Delta}}\right]$    (3)

where τ is a constant equal to 1 ns. According to this equation, a relatively high ∆ can significantly improve the reliability.

Read Disturb: Since both read and write currents share the same path in the STT-MRAM bit-cell, the read current can accidentally switch the bit-cell content during the read operation time (t_r). This phenomenon is known as read disturb. Its probability (P_RD) is strongly dependent on ∆ according to [16]:

$P_{RD} = 1 - \exp\left[-\dfrac{t_r}{\tau \cdot e^{\Delta\left(1 - I_r/I_c\right)}}\right]$    (4)
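
As a rough numerical illustration of Eqs. (3) and (4), the following Python sketch evaluates both probabilities. Only τ = 1 ns is taken from the text; the function names, the 1 s observation window, the 1 ns read time and the I_r/I_c ratio of 0.3 are illustrative assumptions, and I_r is assumed to denote the read current.

```python
import math

TAU_S = 1e-9  # tau = 1 ns, as stated for Eq. (3)


def retention_failure_probability(t_s, delta, tau_s=TAU_S):
    """Eq. (3): P_RF = 1 - exp(-t / (tau * e^delta)) for a time window t_s in seconds."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is very small.
    return -math.expm1(-t_s / (tau_s * math.exp(delta)))


def read_disturb_probability(t_read_s, delta, ir_over_ic, tau_s=TAU_S):
    """Eq. (4): P_RD = 1 - exp(-t_r / (tau * e^(delta * (1 - I_r/I_c))))."""
    effective_barrier = delta * (1.0 - ir_over_ic)
    return -math.expm1(-t_read_s / (tau_s * math.exp(effective_barrier)))


if __name__ == "__main__":
    # Hypothetical operating point, chosen only to show the trend over delta.
    for delta in (20, 32, 40, 60):
        p_rf = retention_failure_probability(t_s=1.0, delta=delta)
        p_rd = read_disturb_probability(t_read_s=1e-9, delta=delta, ir_over_ic=0.3)
        print(f"delta={delta:2d}  P_RF(1 s)={p_rf:.3e}  P_RD(1 ns read)={p_rd:.3e}")
```

Both probabilities fall steeply as ∆ grows, which is the dependency the two equations express; a larger I_r/I_c ratio lowers the effective barrier and therefore raises the read disturb probability.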
D. Approximate Computing

The Approximate Computing concept has recently emerged as a promising approach which relies on relaxing the bounds of precise computing to improve the energy and performance efficiency by orders of magnitude. This is done by leveraging the applications resilience to the errors and producing an [...]

[Figure residue, caption not recovered. Device level: ∆ value (device parameter dependent on MTJ size), failure rates, switching latency, switching energy. Architecture level: fault injection, cache latencies (in cycles), cache energy (per access).]
TABLE I: The overall mean time to failure (MTTF) for various ∆ values (for setup see Sec. IV-A)

∆    MTTF [s]         ∆    MTTF [s]
20   9.22 × 10^-7     34   1.11
25   1.37 × 10^-4     35   3.01
31   5.52 × 10^-2     40   447.59
32   1.50 × 10^-1     45   6.7 × 10^4
33   4.08 × 10^-1     60   2.17 × 10^11

[...] application requirements to achieve an acceptable level of quality while maximizing energy and performance gains. For that purpose, we employ image processing applications; in particular, as a case study, we deal with the JPEG format. This is a lossy compression method for digital images to store or transmit data in an efficient form without losing the ability to re-extract an acceptable version of the image. For this reason, the JPEG format provides some inherent level of fault tolerance, which means that it fits very well to the idea of approximate computing.

B. Adaptation of Thermal Stability Factor

As discussed in Sec. II-C, the thermal stability factor has a strong influence on the reliability of the STT-MRAM cell at both idle and access time because of the reduction in I_c. Consequently, various critical optimizations for STT-MRAM design can be performed, namely i) performance optimization, which is obtained due to the reduction of the required time to switch the MTJ resistance state (i.e., write latency t_w), ii) as the write access power of STT-MRAM depends on I_c as P_w = I_c^2 · R_MTJ, where the resistance state of the MTJ cell is fixed, a low I_c value leads to a reduction in power of the STT-MRAM cell, and iii) the energy consumption (E_w = P_w · t_w) can also be dramatically reduced because of the small t_w and P_w values, since they are the outcomes of a low ∆ value. Fig. 2 shows the effects of ∆ value reduction on failure probabilities, write latency and write energy. According to this figure, as the value of ∆ decreases, both the write access latency and write energy decrease at the cost of higher retention failure. In our framework, for the ∆ reduction from 60 to 20, the improvements of the write latency and energy per access reach up to 25% and 70%, respectively. However, the retention failure probability increases significantly and becomes unacceptable for ∆ values below 32 (see Fig. 2).

C. Fault Injection Approach

There are retention failures and read disturb failures associated with low ∆ values for the STT-MRAM memory. Therefore, an accurate fault injection model has to be built based on the combined failure rates, i.e., Failures In Time (FIT). The combined FIT rate has to be converted into an access-based failure probability. The accesses in memory (e.g., L1 data cache) are performed by read and write requests. However, we perform the fault injection at read accesses only, since they are more frequent than write accesses. We compute the Fault Injection Rate (FIR) per read access based on the FIT value of the adopted ∆ and the Read Rate (RR) as follows: FIR = FIT/RR. Table I provides the combined FIT rate corresponding to ∆ values from 20 to 60.

D. Tolerate Acceptable Error Rate Using Approximate Computing

To define the accepted value of the lowered ∆ for cache memories, we have to estimate the acceptance quality of the extracted image by observing the influence of the ∆ value on the error rate, which will be translated into the read-access-based injected failures. The metric is based on the computed Signal to Noise Ratio (SNR) between the approximated image (faulty image) and the golden image (fault-free image). In other words, SNR can serve as a quality measure for the approximated image, which is calculated based on the variance of the signal (i.e., the golden image) (σ_S^2) and the variance of the noise (i.e., the faulty image) (σ_N^2) as in Equation (6), and is expressed in decibel (dB).

$\mathrm{SNR} = 20 \log_{10} \dfrac{\sqrt{\sigma_S^2}}{\sqrt{\sigma_N^2}}$    (6)

The value of SNR directly impacts the performance and energy consumption of the proposed scheme. However, the minimum acceptable SNR value depends mainly on the application requirements. For instance, in image processing applications, the minimum required SNR value of the faulty image, the SNR Threshold (SNR_th), is set at the level where the human brain and eyes can differentiate between a faulty image and a golden one. This means that if the SNR value of the output is less than SNR_th, it is considered as an unacceptable output quality. See Fig. 4, where the SNR for an acceptable JPEG image quality has to be more than SNR_th = 50. However, for more performance and energy improvements, other smaller SNR values can be considered at the cost of less but still acceptable output quality. The corresponding pseudo-codes for generation of golden and faulty images are presented in Algorithm 1 and Algorithm 2.

Algorithm 1 Golden image generation
Input: Error-free output constraint {∆max}
Output: Error-free image
1: Set the corresponding STT-MRAM based data cache configurations
2: Execute the simulation without fault injection mode
3: return Error-free image

Algorithm 2 Faulty image generation
Input: Approximate computing constraints {∆min, ∆max, SNR_th} and the maximum number of experiments (N)
Output: Acceptable and non-acceptable percentage of image quality
1: For ∆ from ∆min to ∆max do
2:   Set the corresponding STT-MRAM based data cache configurations
3:   Calculate the fault injection rate per read access (FIR)
4:   For n from 1 to N do
5:     Execute the simulation with fault injection mode
7:     Calculate SNR
8:     if SNR > SNR_th then
9:       Increase the acceptable quality percentage
10:    else
11:      Increase the non-acceptable quality percentage
12:    end if
13:  end for
14: end for
15: return Acceptable and non-acceptable percentage of image quality

E. Protection of Critical Data

A major challenge of using an approximate storage is that most applications which are highly amenable to the approximate computing paradigm have a mixture of control data, the so-called critical data, which cannot tolerate any errors, and data that may be approximated or unreliable. Since the workloads in our case study are image processing applications (JPEG), which use a lossy compression technique to produce a similar coloured image with reduced size and an acceptable quality, any imprecision in the region of code that stores the meta-data (the header of the image) totally corrupts the output image. This means header data is controlling the image data and should not be approximated. In contrast, the compressed image data (i.e., quantization) has tolerance to imprecision and any errors in it may only lead to some degree
of quality loss in the output image. Therefore, the stored data has to be classified into reliable and unreliable data, which are header and quantization, respectively. There are various solutions to protect the critical part of the data.

The critical data size is very small compared to that of the non-critical data. To make critical data error resilient, one way is to design a heterogeneous memory array for the data cache which has high and low ∆ values. In this design, the critical data has to be stored in the cells of high ∆ to guarantee error-free operation. However, this requires changes to the fabrication parameters of the two arrays as well as a complex cache controller to allocate data to the different arrays. A less complicated way to protect the critical data is either to use multiple copies of the content of this data, such as dual or triple modular redundancy, or to use Error Correction Code (ECC), which protects the data by adding check bits. Since the size of critical data is significantly smaller than that of approximate-able data, the overhead due to extra bit-cells of either the repeated bits of dual or triple modular redundancy or the check bits of the ECC approach is minimal.

Fig. 4: Images of various ∆ and SNR values for STT-MRAM data cache for “jpegtran” workload. [Panels: (a) Golden image; (b) Acceptable image quality, ∆ = 32 & SNR = 50; (c) Non-acceptable image quality, ∆ = 26 & SNR = 23; (d) Non-acceptable image quality, ∆ = 27 & SNR = 19.]

[Figure residue, caption not recovered: two panels of image quality in % versus thermal stability factor ∆ (10 to 70) for djpeg, jpegtran and cjpeg.]

IV. SIMULATION RESULTS

In this section, we present the experimental analysis to show the benefits of using STT-MRAM in approximate computing by using image processing applications (JPEG) as a case study.

A. Simulation Setup

For the bit-cell characterization, we extracted the read and write latencies for STT-MRAM by SPICE simulation using TSMC 65nm general purpose transistor models and the perpendicular STT-MRAM model presented in [25]. For the bit-cell in the L1 data cache, we used ∆ from 10 up to 70 in order to determine the minimum value that can be adopted for optimal results. For our evaluations, we employed the gem5 simulator [26]. gem5 is a full-system, cycle-accurate performance simulator that supports all levels of the memory hierarchy with various configurations such as capacity, associativity, latency, and block size. We extended the simulator with a fault injection framework in order to support the retention time of each particular value of ∆ adopted for an STT-MRAM based data cache. In contrast, the instruction cache has to be fault-free. This is why we assumed ∆ = 60 for the instruction cache to guarantee a high retention time. This assumption is reasonable, as the number of write accesses to the instruction cache is considerably lower. Therefore, the performance and energy overheads of adopting a higher ∆ for this cache would be negligible. Table II summarizes the set-up for our experiments. The evaluations are performed by running 3 image processing applications from the MiBench suite 100 times in order to make our fault injection model non-deterministic. Each output for each experiment has a different level of quality loss according to the error rate of the used ∆. We use SNR as the metric to evaluate the quality of the applications' output.

TABLE II: Simulation setup in gem5

gem5 configuration           ISA x86
Processor                    Single-core, 0.5/1/2 GHz, out-of-order, 4-issue
L1 data-cache                64 KB, 2-way set associative, 64B line size;
                             STT-MRAM, different write latencies and ∆ values
L1 instruction-cache         32 KB, 2-way set associative, 64B line size;
                             STT-MRAM, ∆ = 60
MiBench Applications [27]    djpeg, cjpeg, jpegtran

B. SNR Evaluations

The SNR measurements reveal the degree to which an application can produce satisfactory and reasonable output when the relaxed thermal stability factor of STT-MRAM is used in the data cache for approximate applications. This determines the achievable limits of the performance and energy improvements for STT-MRAM technology. Our analysis depends on extracting 100 different values of SNR for each ∆ value up to 70 for all three image applications. Fig. 4 shows acceptable and non-acceptable faulty image quality along with the original one. After observing the images along with the related SNR, we can define the values of SNR_th, which are 50, 50 and 60 for djpeg, jpegtran and cjpeg, respectively. This means that the faulty image can be considered as acceptable if its SNR value is greater than SNR_th (as explained in Algorithm 2).
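
The acceptance test described here (an output counts as acceptable when its SNR from Eq. (6) exceeds SNR_th, repeated over N runs as in Algorithm 2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's gem5-based framework: the noise is assumed to be the pixel-wise difference between the faulty and golden images, images are flat lists of 8-bit values, and `inject_bit_flips` is a hypothetical stand-in for the simulator's fault injection. All function names are illustrative.

```python
import math
import random


def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def fault_injection_rate(fit, read_rate):
    """FIR = FIT / RR (Sec. III-C); both inputs must use consistent units (assumption)."""
    return fit / read_rate


def snr_db(golden, faulty):
    """Eq. (6): SNR = 20 * log10(sigma_S / sigma_N), in dB.

    Assumption: the noise is the pixel-wise difference faulty - golden.
    """
    var_signal = variance(golden)
    var_noise = variance([f - g for g, f in zip(golden, faulty)])
    if var_signal == 0.0:
        raise ValueError("golden image has zero variance; SNR is undefined")
    if var_noise == 0.0:
        return math.inf  # no varying noise component at all
    return 20.0 * math.log10(math.sqrt(var_signal) / math.sqrt(var_noise))


def inject_bit_flips(pixels, flip_probability, rng):
    """Hypothetical fault model: flip one random bit of an 8-bit pixel per faulty read."""
    out = []
    for p in pixels:
        if rng.random() < flip_probability:
            p ^= 1 << rng.randrange(8)
        out.append(p)
    return out


def acceptable_percentage(golden, run_faulty, n_runs, snr_threshold_db):
    """Inner loop of Algorithm 2: share of runs whose SNR exceeds SNR_th."""
    accepted = sum(
        1 for n in range(n_runs) if snr_db(golden, run_faulty(n)) > snr_threshold_db
    )
    return 100.0 * accepted / n_runs


if __name__ == "__main__":
    rng = random.Random(0)
    golden = [rng.randrange(256) for _ in range(4096)]  # toy "image"
    for p_flip in (1e-5, 1e-3, 1e-1):  # illustrative per-read fault probabilities
        share = acceptable_percentage(
            golden, lambda n: inject_bit_flips(golden, p_flip, rng), 100, 50.0
        )
        print(f"flip probability {p_flip:g}: {share:.0f}% acceptable")
```

In the actual framework the faulty output would come from a full simulation run at a given ∆ rather than from a direct bit-flip model on the pixel data, so the percentages printed by this toy example are not comparable to the reported results.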
Fig. 6: Relation between image quality, performance (runtime) and write energy for an STT-MRAM based data cache. [Two panels: relative changes in % versus thermal stability factor ∆ (10 to 70) for write energy, performance and image quality, with the approximate computing region marked.]

According to the SNR measurements, we count the acceptable images from the entire 100 faulty images related to each ∆ value for all the image applications (see Fig. 5). This figure shows that for ∆ ≤ 25, the percentage of acceptable quality is zero for the three image applications, whereas for ∆ ≥ 40, all the images are acceptable. For ∆ values equal to 32, 33 and 34, our results show that the percentage of acceptable quality is more than 95%. Therefore, these values of ∆ can be considered for approximate computing, with an output quality loss of less than 5%. Please note that in our framework, the retention failure rate dominates the read disturb rate for the SNR measurements (see Fig. 2 (b)). For applications with faster read requirements, higher read currents are necessary, and consequently, the read disturb rate will be higher. In such cases, the SNR_th evaluation criteria can be changed, and accordingly, the percentage of acceptable quality can be altered.

C. Performance and Energy Analysis of Optimal ∆ for Approximate Computing

The performance influence of the lower ∆ value for the STT-MRAM data cache is closely associated with the processor frequency. On one hand, for a given frequency, the write latency is reduced by 0.5 ns when ∆ is lowered to 32, as illustrated in Fig. 2 (a). On the other hand, the performance gain further increases with a lower clock frequency of the processor. For instance, it reaches 25% and 10% at clock frequencies of 1 GHz and 2 GHz, respectively (see Fig. 6). Furthermore, according to our experiments, 66% of the total energy of STT-MRAM for the L1 cache is consumed by write operations in the data cache. Therefore, any improvement in the energy consumption of the STT-MRAM cell in the data cache improves the overall system energy efficiency. Results show that we are able to improve the energy consumption per access by up to 70% with a ∆ value of 32.

Based on the performance and energy improvements of the low ∆ value along with the output quality, we can define the optimal ∆ value for a particular system setup. Fig. 6 illustrates this trilateral relationship (i.e., energy, performance, quality). According to this figure, the optimal configuration for approximate computing applications can be obtained by relaxing the thermal stability factor from the usually applied 60 down to 32, where the overall system energy reduction reaches around 46% and the performance gains are 25% and 10% at clock frequencies of 1 GHz and 2 GHz, respectively.

V. CONCLUSIONS

We have developed a cross-layer framework, from technology parameters to architecture and application levels, to evaluate the applicability of STT-MRAM technology for approximate computing. We have also provided a methodology to determine the optimal thermal stability factor in order to find the right balance between acceptable accuracy and energy and performance gains. Our results show that an energy reduction of 46% along with a performance gain of up to 25% can be achieved at system level with a reasonable output quality.

VI. ACKNOWLEDGEMENT

This work was partly supported by the European Commission under the Horizon-2020 Program as part of the GREAT project (http://www.great-research.eu/) and by ANR/DFG as part of the MASTA project.

REFERENCES

[1] N. Kim, et al., “Leakage current: Moore’s law meets static power,” Computer, pp. 68–75, 2003.
[2] International Technology Roadmap for Semiconductors, http://www.itrs.net, 2013.
[3] K. Wang, et al., “Low-power non-volatile spintronic memory: STT-RAM and beyond,” Journal: Applied Physics, 2013.
[4] Driskill-Smith, et al., “Latest advances and roadmap for in-plane and perpendicular STT-RAM,” in International Memory Workshop, 2011, pp. 1–3.
[5] S. Fujita, et al., “Technology Trends and Near-Future Applications of Embedded STT-MRAM,” in 2015 IEEE International Memory Workshop, May 2015, pp. 1–5.
[6] Z. Sun, et al., “Multi Retention Level STT-RAM Cache Designs with a Dynamic Refresh Scheme,” in Micro, 2011, pp. 329–338.
[7] A. Ahari, et al., “Improving Reliability, Performance, and Energy Efficiency of STT-MRAM with Dynamic Write Latency,” in ICCD, Oct. 2015, pp. 109–116.
[8] R. Venkatasan, et al., “Energy-Efficient All-Spin Cache Hierarchy Using Shift-Based Writes and Multilevel Storage,” ACM JETC, pp. 4:1–4:24, 2015.
[9] R. Bishnoi, et al., “Avoiding Unnecessary Write Operations in STT-MRAM for Low Power Implementation,” in ISQED, 2014, pp. 548–553.
[10] P. Zhou, et al., “Energy Reduction for STT-RAM Using Early Write Termination,” pp. 264–268, 2009.
[11] H. Naeimi, et al., “STT-RAM scaling and retention failure,” Intel Technology Journal, pp. 54–75, 2013.
[12] J. Lucas, et al., “Sparkk: Quality-scalable approximate storage in DRAM,” in The Memory Forum, 2014, pp. 1–9.
[13] S. Liu, et al., “Flikker: saving DRAM refresh-power through critical data partitioning,” ACM SIGPLAN Notices, pp. 213–224, 2012.
[14] C. W. Smullen, et al., “Relaxing non-volatility for fast and energy-efficient STT-RAM caches,” in 2011 IEEE 17th HPCA, 2011, pp. 50–61.
[15] Y. Jin, et al., “Area, Power, and Latency Considerations of STT-MRAM to Substitute for Main Memory,” in Proc. ISCA, 2014.
[16] D. Apalkov, et al., “Spin-transfer torque magnetic random access memory (STT-MRAM),” JETC, p. 13, 2013.
[17] K. Munira, et al., “A Quasi-analytical model for energy-delay-reliability tradeoff studies during write operations in STT-RAM,” Electron Devices, 2012.
[18] T. Zheng, et al., “Variable-energy write STT-RAM architecture with bit-wise write-completion monitoring,” in ISLPED, 2013, pp. 229–234.
[19] D. Suzuki, et al., “Cost-Efficient Self-Terminated Write Driver for Spin-Transfer-Torque RAM and Logic,” Magnetics, pp. 1–4, 2014.
[20] Y. Emre, et al., “Enhancing the reliability of STT-RAM through circuit and system level techniques,” in 2012 Workshop on SiPS, 2012, pp. 125–130.
[21] X. Bi, et al., “Probabilistic design methodology to improve run-time stability and performance of stt-ram caches,” in ICCAD, 2012, pp. 88–94.
[22] A. Sampson, et al., “Approximate storage in solid-state memories,” TOCS, p. 9, 2014.
[23] A. Ranjan, et al., “Approximate storage for energy efficient spintronic memories,” in 2015 52nd DAC, 2015, pp. 1–6.
[24] F. Oboril, et al., “Fault tolerant approximate computing using emerging non-volatile spintronic memories,” in VTS, 2016, pp. 1–1.
[25] A. Mejdoubi, et al., “A compact model of precessional spin-transfer switching for MTJ with a perpendicular polarizer,” in MIEL, 2012, pp. 225–228.
[26] N. Binkert, et al., “The gem5 simulator,” ACM SIGARCH Computer Architecture News, pp. 1–7, 2011.
[27] M. R. Guthaus, et al., “MiBench: A free, commercially representative embedded benchmark suite,” in WWC-4. 2001 International Workshop on, pp. 3–14.