RRAM for In-Memory Computing Systems
static random access memory (SRAM),[8–10] flash memory,[11,12] magnetic random access memory (MRAM),[13,14] racetrack memory,[15,16] phase change memory (PCM),[3,17–19] and resistive random access memory (RRAM).[20–22] All these memory types, except SRAM, are nonvolatile, i.e., a memory device retains the stored data even without a power supply. SRAM features the fastest read/write speed (sub-nanosecond) and the most mature fabrication process among the existing memory technologies.[8–10] Besides persistent data storage, nonvolatile memory technologies generally offer a higher density, i.e., a cell size below 100 F², where F represents the technology feature size. NAND flash memory, a commercial nonvolatile memory technology, supports sequential data accesses and requires complicated techniques to maintain robustness and reliability. MRAM highlights fast write speed (in the same order as SRAM) and stochastic programming.[23–26] Racetrack memory is known for its extremely high density (20 nm-wide nanowires) and sequential access along tracks.[27] PCM provides a linear conductance update characteristic.[18] RRAM offers versatility, including high resistivity (MΩ order of magnitude), support for 3D integration, stochastic programming, and multilevel cells (up to 6 bits).[28–31]

The unique characteristics of different memory technologies make them useful in different application scenarios. Several recently published review articles summarized the exploitation of these memory technologies for in-memory computing from distinct aspects or layers. Xia and Yang described the physical mechanisms and fabrication of several state-of-the-art memristor devices and the principles to implement neural network hardware with them.[31] Jeong et al. reviewed how memristors progress from concepts to real-world devices aimed at energy-efficient computing paradigms.[32] Tsai et al. introduced the principal applications of analog memory devices in building deep learning accelerators, including prospective candidates among resistive, capacitive, and photonic devices.[30] Jeong and Hwang provided insights into the usage of nonvolatile memory materials to build machine learning computing hardware.[33] In this paper, we broaden the view beyond a focus solely on the device layer or the circuit layer, following device/circuit/computer-architecture cross-layer codesign methodologies. Taking the RRAM device family as a representative technology, we review the latest progress of in-memory computing from the perspectives of state-of-the-art nanoscale devices and large-scale integrated computing systems for artificial intelligence (AI) acceleration. We will start with the implementation of RRAM devices in weight matrices and in realizing activation functions, followed by the circuit and architecture approaches of in-memory computing. The reliability issue, a new challenge in developing large-scale systems, will also be discussed. In the end, we conclude this paper.

Bonan Yan received his bachelor's degree from Beihang University, China, and master's degree from the Department of Electrical and Computer Engineering, University of Pittsburgh, USA. He is pursuing his doctoral degree at Duke University, USA. His research interests include macromodeling and circuit design for emerging nonvolatile memories, brain-inspired computing systems, and hardware/software codesign for machine learning acceleration.

Meng-Fan Chang received his M.S. and Ph.D. degrees from Penn State University, USA, and National Chiao Tung University, Taiwan, respectively. He is a distinguished professor at National Tsing Hua University, Taiwan. Before 2006, he worked in the semiconductor industry for over 10 years. His research interests include circuit designs for volatile and nonvolatile memory, computing-in-memory, neuromorphic computing, circuit–device interaction beyond CMOS technologies, and software–hardware codesign for AI devices. He is an IEEE Fellow.

Hai Li received her bachelor's and master's degrees from Tsinghua University, China, and doctoral degree from the Department of Electrical and Computer Engineering, Purdue University, USA. She is currently the Clare Boothe Luce Associate Professor with the Department of Electrical and Computer Engineering at Duke University, USA. Her current research interests include hardware/software codesign for machine learning acceleration and security, brain-inspired computing systems, and memory architecture and optimization. Dr. Li is a distinguished speaker of ACM (2017–2020) and a distinguished lecturer of the IEEE CAS Society (2018–2019). Dr. Li is a fellow of IEEE and a distinguished member of ACM.

2. RRAM Array for DNN Weighting Function

2.1. RRAM Basics and RRAM Array for Inference

A resistive memory (RRAM, a.k.a. memristor) device generally represents any two-terminal electronic device whose resistance value can be programmed by applying external voltage/current with an appropriate configuration.[34] A single RRAM device contains a resistive layer sandwiched by two electrodes. The resistive layer is typically a transition metal oxide, such as HfOx,[35–38] NbOx,[39,40] TiOx,[34,41,42] or TaOx.[43,44] Other materials, such as SrTiO3,[45] PrCaMnO3,[45] and Ag:a-Si,[21] also show memristivity. According to the resistance switching mechanism, mainstream RRAM technologies can be classified into two categories:[29] filamentary RRAM, which relies on the formation and dissolution of conductive filaments or channels of metal ions or oxygen vacancies in its insulating layer, and interfacial RRAM, which redistributes oxygen vacancies at a heterogeneous interface to change the overall resistance. During programming, a "SET" operation increases the device conductance (or decreases the resistance), whereas a "RESET" operation decreases the conductance.
Figure 1. a) Schemes of von Neumann architecture versus in-memory computing.[2] b) Diagram of the von Neumann bottleneck. c) Analogy of an RRAM array to VMM, where the synaptic weights are stored as the conductance matrix of the RRAM array.
In some types of RRAM technologies, often within the filamentary category, a "forming" operation is required.[46] It applies a much higher voltage than the programming one to generate initial filaments before normal use. According to the available stable resistance states, RRAM can be categorized into two types: analog RRAM denotes devices whose resistance can be programmed to any value between the highest resistance state (HRS) and the lowest resistance state (LRS), whereas binary RRAM behaves as a normal memory device with a stable HRS and LRS. RRAM technology is attractive for its high data storage density as well as its compatibility with the complementary metal–oxide–semiconductor (CMOS) process. As early as 2009, a 1 kb RRAM array in a one-transistor-one-RRAM (1T1R) cell structure was successfully integrated with CMOS read/write circuits.[20] In this design, the HfO2-based RRAM with TiN electrodes is fabricated above the transistors, and the devices achieved four separated resistance levels. Such a high storage density makes RRAM a strong replacement for conventional memory technology. The latest 3.6 Mb embedded binary RRAM array, fabricated in a 22 nm low-power process, is composed of 0.3 Mb subarrays together with read/write logic circuitry.[47]

Following classical Boolean logic to construct computing automata, logic gate implementation with memory devices is the first attempt to exceed the function of memory and head toward computation.[48–54] Binary inputs are stored into RRAM devices as resistances (conductances) before performing computation.[55,56] With parallel or serial connection setups, a sensing voltage is applied to the RRAM devices with data stored as resistances, and the amplitude of the output current represents the logic operation result. More specifically, RRAM devices enable logic-in-memory to perform stateful logic operations. Xu et al. demonstrated a time-efficient implementation with dual-bit memristors for 12 different basic single-step logic operations, including TRUE, FALSE, NOT, AND, OR, and, more importantly, "material implication" (IMP).[50] A large-scale array of stateful logic RRAM is the aggregation of these basic logic gates, similar to transistor-based arithmetic units.[57] By weaving complex logic with stateful RRAM in a dataflow automation scheme, a fast and energy-efficient computer can be realized to execute heavy logical workloads (e.g., arithmetic operations).[53] The realization with stateful logic RRAM often requires special routing techniques to attain logic gate functionality and high accuracy against IR drop. Thanks to nonvolatility, RRAM-based logic gate implementations target normally-off computing systems, leading to a dramatic reduction in the power consumption of computing systems.[55,57]

The analogy between memristors and biological synapses originated shortly after Hewlett Packard Labs identified nanoscale RRAM devices as memristors in 2008 (Figure 2a).[34] Since then, there have been extensive studies on developing electric synapses with RRAM devices and exploiting them for DNN acceleration.[29,32,33] In 2012, Hu et al. described how to conduct VMM on an RRAM crossbar array based on Kirchhoff's law.[58]
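As a minimal numerical sketch (not tied to any specific chip in the cited works), the crossbar VMM can be summarized as follows: each column current is the sum of the input voltages weighted by the cell conductances, I_j = Σ_i V_i·G_ij, so a stored conductance matrix multiplies an applied voltage vector in a single read step. The array size, conductance range, and differential-pair mapping below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synaptic weight matrix of one layer (4 inputs x 3 outputs), arbitrary values.
W = rng.uniform(-1.0, 1.0, size=(4, 3))

# Map signed weights onto non-negative conductances using a differential pair
# of devices per weight: W ~ (G_pos - G_neg) / scale.
g_min, g_max = 1e-6, 1e-4           # siemens, illustrative conductance range
scale = (g_max - g_min) / 1.0        # 1.0 = assumed maximum |weight|
G_pos = g_min + scale * np.clip(W, 0, None)
G_neg = g_min + scale * np.clip(-W, 0, None)

# Input activations encoded as read voltages on the rows.
V = rng.uniform(0.0, 0.2, size=4)    # volts

# Kirchhoff's current law: each column collects I_j = sum_i V_i * G_ij.
I_pos = V @ G_pos
I_neg = V @ G_neg

# The differential column current is proportional to the ideal VMM result.
y_analog = (I_pos - I_neg) / scale
print(np.allclose(y_analog, V @ W))  # True for ideal, noise-free devices
```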
Figure 2. a) Biological synapse versus RRAM synapse.[5] b) Different segmentation schemes for fully connected layers (DNN), convolutional layers (CNN), and LSTM layers (LSTM network), all of which are converted into VMM using the RRAM array.
Furthermore, they analyzed the necessary design considerations in real circuit implementations by taking a brain-state-in-a-box (BSB) computing model as an example. A spiking neural network (SNN) represents another association between RRAM devices and synapses.[59] Applying voltage pulses to an RRAM device can gradually change its conductance. With careful designs, SET voltage pulses can be used to realize potentiation (excitatory), whereas RESET voltage pulses enable depression (inhibitory) functions. Such an RRAM synapse, together with an external voltage spike generator, can be adopted with SNN learning rules to implement unsupervised learning applications.[60]

Central to VMM, RRAM arrays can implement DNN inference functions very efficiently. A DNN model contains many layers of matrices, each of which can be deployed on one or a few RRAM arrays. The same voltage–current mechanism of matrix multiplication has been widely adopted in research studies, whereas the weight representation can be realized in an analog or digital manner.[61] In the analog scheme, the synaptic weights map directly to conductance values. Such an "analog synapse" requires that the device can be programmed to any value in a certain conductance range. In contrast, a digital scheme can accommodate only a limited number of resistance states, denoting different digital levels of the weights.[62] Often, a column of RRAM cells carries the same digit-wise significance, and the computational results from different columns are summed after applying the corresponding significances.

Moreover, RRAM arrays have been used for various neural network models with the assistance of weight segmentation techniques (Figure 2b).[61,63,64] For example, a fully connected layer can be directly mapped onto one RRAM array, or partitioned and implemented onto a few smaller arrays. A convolutional layer in convolutional neural networks (CNNs) contains many convolutional kernels (or filters), which need to be unrolled into the VMM format first.[63,65,66] For long short-term memory (LSTM) networks, the synaptic weights of an LSTM layer toward the input gate, output gate, and forget gate can be deployed on different RRAM arrays.[64] In this design, the input vector and the feedback of the output vector are encoded in the voltage format and fed into the RRAM arrays in parallel. The intermediate computation results of the different gates come from different columns of the RRAM array simultaneously. Regardless of the network model type, this execution parallelism can dramatically improve throughput.

2.2. RRAM-Based Designs in Training

Most RRAM-based DNN accelerator designs target Internet of Things (IoT) and edge computing applications.[67–70] In such energy-efficiency-oriented embedded systems, the necessity of on-chip or online training is arguable. The latest progress in decentralized learning and data privacy demands the realization of local training capability in edge devices. However, the RRAM device uniformity and the large overhead of training circuits remain major challenges.[29,71]

Considering the implementation complexity of the training circuitry, devices featuring linear and symmetric synaptic weight updates are preferred. Here, a symmetric weight update (Figure 3a) denotes the scenario where identical electrical excitations result in the same amount of weight change Δw during both SET and RESET programming. A linear weight update means that the weight change Δw has a linear dependency on the electrical excitation, regardless of the current resistance state. In reality, however, the weight update of RRAM devices is asymmetric and nonlinear (Figure 3a).[72,77] Some theoretical memristor models reason that the nonlinearity is natural and derivable; nevertheless, the nonideal update can be mitigated by relaxing the strong requirements of training.[72] For example, Chen et al. presented a self-rectifying TaOx/TiO2 RRAM that is only 3% away from linearity during programming.[72] They modified the online training with voltage spikes and mitigated the nonlinearity by fine-tuning the excitation spike widths.
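To make the linearity and symmetry notions concrete, the sketch below uses a common behavioral abstraction (not the measured model of any device in Figure 3a; the saturation parameters are assumptions): the conductance change per identical pulse shrinks as the device approaches its range limit, and it does so with different strength for SET and RESET.

```python
import numpy as np

def pulse_update(g, direction, g_min=1e-6, g_max=1e-4, nl_set=3.0, nl_reset=5.0):
    """Behavioral conductance update for one identical programming pulse.

    direction = +1 (SET) or -1 (RESET). nl_* control how strongly the update
    saturates near the range limits; nl = 0 would give a linear, symmetric
    device. All parameter values are illustrative assumptions.
    """
    x = (g - g_min) / (g_max - g_min)          # normalized state in [0, 1]
    step = 0.02 * (g_max - g_min)              # ideal (linear) step size
    if direction > 0:                          # SET: harder to increase when already high
        step *= np.exp(-nl_set * x)
    else:                                      # RESET: harder to decrease when already low
        step *= -np.exp(-nl_reset * (1.0 - x))
    return float(np.clip(g + step, g_min, g_max))

# Apply 50 SET pulses followed by 50 RESET pulses and record the trace.
g = 1e-6
trace = []
for k in range(100):
    g = pulse_update(g, +1 if k < 50 else -1)
    trace.append(g)
print(f"after SET phase: {trace[49]:.3e} S, after RESET phase: {trace[-1]:.3e} S")
```

Pulse-width or voltage modulation, as discussed above, effectively flattens this curve so that a desired Δw maps onto a device-dependent pulse count or width.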
Figure 3. a) Measured weight update by bidirectional identical pulses of Ta2O5, TaOx/TiO2, Ag:a-Si, and praseodymium calcium manganese oxide (PCMO) memristors.[72,73] All conductance values are normalized to the same scale. b) Measured change of synaptic connections as a function of the relative timing of pre- and postsynaptic spikes using Al2O3/TiO2−x, second-order, PCMO, and tunnel junction memristors.[31,55,74,75] All conductance change values are normalized to the same scale. c) Training scheme of HD computing using RRAM associative memory.[76] d) Two schemes of 3D stack RRAM configurations.[77–79]
As shown in Figure 3a, an approximation of linear and symmetrical programming is achieved by modulating the pulse width or voltage potentials. Moreover, Yu et al. demonstrated online training by fabricating quasi-linear devices and integrating them into the 1T1R cell.[80] The weight updates, calculated by a gradient-descent method, are translated into the number of identical programming pulses applied to the RRAM devices. The accuracy on the modified National Institute of Standards and Technology (MNIST)[81] dataset reaches 96.5%, which is very close to the one obtained at the software level.

The backpropagation algorithm commonly adopted for neural network training consists of some complex computing operations, such as partial derivatives and outer products.[76,83] In-memory computing modules are the most convenient for inner products but do not support outer products. There exist studies on on-chip learning of RRAM-based designs that leverage their advantages in matrix-multiplication operations.[83] However, it is not feasible to implement the entire backpropagation procedure within in-memory computing modules; the assistance of extra circuitry or graphics processing unit (GPU)/CPU cores is necessary.

Backpropagation, however, is not the only choice. With additional controlling and peripheral circuitry, it is feasible to realize other training schemes with RRAM synapses, such as spike-timing-dependent plasticity (STDP).[84] STDP is a biologically plausible training method that translates time-related information into weight updates.[21] "Synaptic plasticity," similar to its definition in neuroscience, refers to synaptic weights that can be strengthened (increased) or weakened (decreased) over time. In the STDP configuration, the weight of a target RRAM synapse changes according to the delay of the spikes from the presynaptic neuron (the neuron before the synapse) to the postsynaptic neuron (the neuron after it), as shown in Figure 2a. The "delay" can also be negative, meaning that the postsynaptic spike fires before the presynaptic spike. The delay, or more accurately, the time difference, serves as the input to the learning function, and the output of the learning function is the weight update of the target synapse. Voltage pulses are generated based on the difference of the presynaptic and postsynaptic spike timing for RRAM synapse updating (Figure 3b). To achieve weight updating according to spike timing, the shape of the spikes applied to the RRAM synapse is critical. Figure 3b summarizes some examples. Most of the current designs produce such spike shapes directly from the testing equipment or by an analog CMOS neuron circuit design. A recent research study shows that RRAM (memristor) devices can also be used to build spike-based neurons, the details of which will be elaborated in Section 3.2.
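A common way to express the STDP learning function described above is an exponential window over the pre/post spike-time difference. The sketch below uses that textbook form with assumed amplitudes and time constants; it is not the measured curve of any device in Figure 3b.

```python
import numpy as np

def stdp_dw(delta_t_ms, a_plus=0.8, a_minus=0.4, tau_plus=20.0, tau_minus=20.0):
    """Weight update as a function of t_post - t_pre (milliseconds).

    Positive delta_t (pre fires before post) potentiates the synapse;
    negative delta_t depresses it. Amplitudes/time constants are assumptions.
    """
    dt = np.asarray(delta_t_ms, dtype=float)
    dw = np.where(dt >= 0.0,
                  a_plus * np.exp(-dt / tau_plus),
                  -a_minus * np.exp(dt / tau_minus))
    return dw

# The closer the spike pair, the larger the requested conductance change,
# which the peripheral circuit would translate into SET/RESET pulse shapes.
for dt in (-40.0, -5.0, 5.0, 40.0):
    print(f"delta_t = {dt:+.0f} ms -> dw = {float(stdp_dw(dt)):+.3f}")
```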
As an alternative way to train RRAM synapses, hyperdimensional (HD) computing eases the computation complexity and enhances training efficiency with interpretability (Figure 3c).[85–87] HD computing encodes the input data into query vectors and compares them with a set of hypervectors trained from various classes. A hypervector is a representative of the characteristics of a specific class. The encoding method reshapes multidimensional input data into a series of scalars and lines them up to form query vectors. Both hypervectors and query vectors are often binary and sparse. At inference, the associative memory performs a "search" operation that matches a query vector to its most similar hypervector.[76] The hypervector with the least Hamming distance to the query vector indicates the classification or regression result. There are two advantages when implementing such a comparison process with RRAM arrays. First, the associative memory can be easily built on RRAM arrays.[88] Second, an RRAM array, together with a winner-takes-all (WTA) circuit, can directly deliver the Hamming distance calculation. A demonstration of HD computing was presented by Wu et al., which monolithically integrated carbon-nanotube field-effect transistors (CNFETs) and RRAM synapses.[87] The training of such a scheme can be simply realized by updating the hypervectors stored in an RRAM-based associative memory and aligning their Hamming distances to the demanded results.
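The associative-memory search at the heart of this scheme reduces to a nearest-neighbor lookup under the Hamming distance, as in the minimal sketch below. The vector length, class count, and the random thresholding encoder are placeholders, not the design of Wu et al.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, N_CLASSES = 1024, 4                       # hypervector length, classes (illustrative)

# Class hypervectors stored in the RRAM associative memory (binary, one row per class).
class_hvs = rng.integers(0, 2, size=(N_CLASSES, DIM), dtype=np.uint8)

def encode(sample):
    """Toy encoder: reshape and threshold an input into a binary query vector."""
    flat = np.resize(np.asarray(sample, dtype=float), DIM)
    return (flat > flat.mean()).astype(np.uint8)

def classify(sample):
    """Winner-takes-all over Hamming distances between the query and all rows."""
    query = encode(sample)
    distances = np.count_nonzero(class_hvs != query, axis=1)   # Hamming distances
    return int(np.argmin(distances)), distances

sample = rng.normal(size=(16, 16))             # a dummy multidimensional input
label, dists = classify(sample)
print("predicted class:", label, "distances:", dists)
```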
2.3. 3D Stacking for Scalability

The capacity of RRAM arrays grows by increasing the array size.[22,67] Furthermore, 3D stacking techniques can expand the array layers vertically with a minimal impact on the die area.[77–79,89] Figure 3d shows the 3D implementation of the RRAM array above the transistors. For example, Adam et al. demonstrated a two-layer stacked RRAM array,[78] where the TiO2−x RRAM is fabricated on a Si wafer coated with SiO2. The fabrication process requires a low-temperature profile (<175 °C) to prevent damage to the already fabricated horizontal RRAM layer, as shown in Figure 3d. The chip consists of two layers of 10 × 10 arrays, i.e., 200 RRAM devices in total. The two stacked layers share the middle electrodes. Under statistical measurement of the conductance and programming hysteresis I–V curves, the RRAM devices of the two layers present similar characteristics.[78] The fabrication on the Si wafer indicates its potential for CMOS-compatible monolithic back-end-of-line (BEOL) integration. Though the design adopts a simple RRAM-only cell structure without a selector device, it shows the possibility of integrating multiple RRAM arrays vertically with shared middle electrodes.

Li et al. first reported a four-layer stacked RRAM array monolithically integrated with fin field-effect transistors (FinFETs).[79] Each layer of the HfOx RRAM array has a size of 32 × 32 devices. Different from the aforementioned two-layer 3D RRAM by Adam et al.,[78] this four-layer 3D RRAM array has a vertically integrated RRAM configuration: one TiN/Ti top electrode is shared by the corresponding cells of the four adjacent layers, whereas TiN bottom electrodes are separately assigned to each specific layer. As such, a middle electrode is internally connected to the drain/source of the selecting FinFET, forming a one-transistor-four-RRAM (1T4R) high-density cell. Luo et al. showed an example of a vertically integrated four-layer 3D RRAM.[90] These devices naturally have a switching-on threshold to mitigate the sneak path impact caused by the lack of cell-selecting switches. In this design, the top electrodes determine the input vector dimension; the bottom electrodes and the number of stacked layers decide the output vector dimension.

The 3D stacked RRAM can further advance in-memory computing with larger storage capacity, more efficient local data processing, and higher bandwidth and throughput.[38,89,91] The new concept of the 3D synapse has also emerged, using the multiple layers of 3D RRAM arrays to store the synaptic weights and conduct matrix multiplication.[92] It collects the output currents vertically from RRAM devices of different layers.[78] In such a configuration, the number of available layers limits the dimension of the input vectors. From the perspective of circuit design, the ports and topology of the top/middle/bottom electrodes should be constructed in a similar way as a 2D RRAM array, with shared electrodes serving as the VMM outputs. In this way, the analogy to VMM is established. The support of parallel or partially parallel operations among different layers is crucial, because this is the only way to enhance the bandwidth of the 3D stacked RRAM. Meanwhile, careful calibration of the selecting device connection is necessary to avoid sneak paths in programming and sensing.[93]

Joule heat dispersion is now a concern in developing the 3D stacked RRAM. Sun et al. revealed this "thermal crosstalk" in a 3D RRAM array and modeled it quantitatively.[94] The Joule heat generated by one RRAM device may heat its neighboring cells and cause unexpected failures, especially during the RESET operation, which requires a relatively large programming current (i.e., more severe ohmic heating) compared with the SET operation. The dense alignment of RRAM devices in a 3D crossbar array deteriorates heat dispersion because the memory cells are placed very close to each other.

3. RRAM Devices for Neuron Activation

Neuron circuits of a certain neural network layer interface with the neighboring neural network layers. Owing to the analog signal processing nature, the CMOS neuron design in analog circuits has lasted for decades.[95–99] In SNNs, the integrate-and-fire circuit (IFC), built upon capacitors and transistors, is usually adopted, showing stable performance and robustness. Some recent research works also present the use of emerging nanoscale devices, especially RRAM (or memristor) devices, in emulating biological neuron functions.

Wang et al. proposed using diffusive memristors featuring stochastic dynamics to construct neuron circuits.[100] A diffusive memristor sandwiches a SiOxNy or SiOx layer doped with Ag nanoclusters between two metal electrodes. During SET operations, a field-induced Ag mass transport forms between the electrodes, and thus the device gradually changes to an LRS. During RESET operations, the Ag diffusive dynamics dissolves the nanoparticle bridge after a certain characteristic time, and hence the device relaxes to an HRS. This conductive process is a combination of the Ag mass transport induced by an external electrical field and conductive filament formation. The particular choice of Ag results in a dedicated delay in response to a train of programming spikes. The amount of this delay can be well controlled through careful selection of the external shunt capacitor.
This delay of response is the key feature for emulating the "threshold firing" operation in spiking neurons. More specifically, only inputs larger than a certain neuron threshold lead to output spikes. The "threshold firing" of diffusive memristors behaves in a similar way as ion channel formation in biological neuron cell membranes (Figure 4a). Hence, such a resistive SiOxNy/SiOx:Ag memristor, with minimal additional circuit elements, is better suited than CMOS analog neuron circuits to mimic the IFC function. Compared with the analog IFC circuit, the diffusive memristor neuron circuit occupies a much smaller area and consumes much less power.[100,103] Together with RRAM synapses (Section 2.1), it is possible to realize a "fully memristive network," in which both the synapses and the neurons are RRAM devices.

The Hodgkin–Huxley neuron model is used to explain the dynamics of biological axons via electrical elements.[104] Biological neurons process signals by mediating the sodium and potassium ion channels, and this procedure is abstracted as conductances varying over time in the Hodgkin–Huxley neuron model.[104] The model can be emulated physically by combining the Mott memristor with capacitors, which forms a set of analog neuron circuits named "neuristors."[40] The Mott memristor is a type of nanoscale RRAM device whose hysteresis (memory) loop is formed based on the Mott transition, a reversible insulator–metal phase transition.[40,101] This transition, often activated by certain thermal conditions, also causes the negative differential resistance (NDR) phenomenon, in which the induced current decreases as the applied voltage increases. This uncommon nonlinearity has been leveraged in developing a relaxation oscillation circuit. When a Mott memristor in the NDR region is connected to a resistance–capacitor (RC) charging circuit (Figure 4b), the Mott memristor can force charge to flow toward (away from) the capacitor even as the capacitor discharges (charges). The reciprocating charge flow between the Mott memristor and the capacitor results in a sawtooth-shaped current oscillation, even though the capacitor is supposed to be charged only under the excitation of a direct current (DC) voltage source. The Mott memristor-based neuristor circuit exploits these oscillating dynamics to generate output spikes. It consists of two sets of Mott memristor-RC circuits coupled with each other (Figure 4c).[40] The design demonstrates neuron behavior similar to that described by the Hodgkin–Huxley neuron model and presents the "threshold firing" function. Compared with diffusive memristors, which process spikes only, a Mott memristor-RC circuit can spawn spikes under DC excitation. In other words, Mott memristor-based neuristors integrate both "threshold firing" and spike generation capability, which advances beyond the diffusive memristor at the cost of more complex circuit connections. Experiments show that the NbO2 Mott memristor-based neuristor achieves rapid spike generation (≤1 ns), very low switching energy (<100 fJ), and a much more compact design (110 × 110 nm²).[40]
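The "threshold firing" behavior that both device families target can be abstracted as a leaky integrate-and-fire rule: charge accumulates from input spikes, leaks over time, and an output spike is emitted only when a threshold is crossed. The sketch below is a behavioral abstraction with assumed constants, not a physical model of the diffusive or Mott devices.

```python
import numpy as np

def integrate_and_fire(input_spikes, dt=1e-3, tau=20e-3, threshold=1.0, gain=0.4):
    """Leaky integrate-and-fire abstraction of 'threshold firing'.

    input_spikes: binary array, one entry per time step.
    Returns the membrane-potential trace and the output spike train.
    Constants (tau, threshold, gain) are illustrative assumptions.
    """
    v, v_trace, out = 0.0, [], []
    for s in input_spikes:
        v = v * (1.0 - dt / tau) + gain * s   # leak, then integrate the input
        fired = v >= threshold
        out.append(int(fired))
        if fired:
            v = 0.0                           # reset after emitting a spike
        v_trace.append(v)
    return np.array(v_trace), np.array(out)

# Sparse inputs (below threshold) produce no output; a dense burst fires the neuron.
inputs = np.zeros(60, dtype=int)
inputs[5] = 1                      # isolated spike: sub-threshold
inputs[30:36] = 1                  # burst of six spikes: crosses the threshold
_, spikes = integrate_and_fire(inputs)
print("output spike count:", spikes.sum(), "at steps:", np.flatnonzero(spikes))
```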
Figure 4. a) The SiOxNy/SiOx:Ag diffusive memristor is similar to the neuron cell membrane in the conductive channel formation.[100] A diffusive memristor-based neuron circuit demonstrates the "threshold firing" function.[100] b) Mott memristor characteristics showing the Mott transition and NDR.[101,102] The step response of the Mott memristor-RC circuit is sawtooth-shaped spikes. c) The neuristor circuit contains two groups of Mott memristor-RC circuits.[40]
To reduce the routing complexity of the neuristor, Yan et al. proposed to integrate both the "threshold firing" and spike generation functions within a single group of Mott memristor-RC circuits.[102] With an appropriate post-amplifier to fit the voltage range of CMOS logic, a single Mott memristor-RC oscillation circuit can replace the analog IFC in conventional CMOS technology (Figure 4c). Additionally, the Mott memristor shows quasi-chaotic behavior, i.e., intrinsic pseudo-randomness. Adding this random noise with limited amplitude to the outputs helps jump out of local minima in the backpropagation process and thus improves the training speed.[101,102] The experiment showed that online training with the Mott memristor neuron circuit is 1.8× faster on average than the design with the analog CMOS IFC. For a fully connected layer benchmark, the RRAM in-memory computing macro saves 27% area and reduces power consumption by 36%.[102]

In addition to developing high-density and large volumes of synaptic connections, the neuron design is another key to the efficient implementation of in-memory computing accelerators. In recent years, RRAM-based neurons have gained substantial attention, as RRAM devices feature high density, low power consumption, and ease of emulating complex neuron dynamics. The intrinsic connection between biological nervous systems and memristors has been proven and explained in theory.[105,106] Meanwhile, new types of RRAM neuristors are being brought to this emerging field.

4. Large-Scale System Integration

Neural network model sizes surge as deep learning methodologies prevail in solving recognition and regression problems. How to implement large-scale in-memory computing systems therefore becomes very important. For RRAM-based large-scale systems, the peripheral circuitry together with the RRAM arrays often dominates the overall power, chip area, and energy consumption. Thus, new timing control and data conversion circuitries are expected. Furthermore, more complicated neural network topologies require considerable data management and scheduling to better exploit in-memory computing systems. In this section, we present both in-memory computing macro designs with different data conversion interface circuits and state-of-the-art microarchitectures of RRAM-based in-memory computing.

4.1. RRAM-Based In-Memory Computing Macro Design

4.1.1. Analog/Digital Converter-Based Design

An in-memory computing macro is the basic processing core for VMM operations. RRAM-based in-memory computing operates in an analog format and requires data conversion for such a macro to interface with its surrounding digital systems. Mature digital/analog converter (DAC) and analog/digital converter (ADC) designs become the first choice. Hu et al. presented a pure analog dot-product engine (DPE) using a 128 × 64 RRAM array.[61] The digital input vectors are converted into queues of analog voltages, and the DPE performs the computation. The output current is converted to a voltage by a transimpedance amplifier and subsequently translated into a digital output vector by the ADC. The DPE works at a frequency of 10 MHz, limited by the multiple conversion paths and the high parasitics of the 2 μm transistor technology. The Ta/HfO2 RRAM provides 6 bit precision, and a single-layer perceptron using such a DPE for performing MNIST dataset recognition yields an accuracy of 89.9%.

4.1.2. Level Sense-Amplifier-Based Design

The pure analog approach of the DPE using the RRAM array requires substantial efforts to fine-tune the analog RRAM cell resistance.[67] In contrast, digital approaches can simplify the data conversion inside the in-memory computing macro. A closer look at the RRAM-based in-memory computing macro reveals three parts of digitalization around the vector multiplication: the two operands G and V and the computational result I. To digitalize V, a multibit digital input is applied to the RRAM array bit by bit, with each bit applied as an identical voltage. The results are subsequently weighted by their significance and summed. In this way, the DAC at the input terminal is removed at the cost of increased computing cycles. This method is often used to decouple the input drivability from the input data; otherwise, a DAC with a special drivability requirement consumes much more area.
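The DAC-free input scheme described above can be stated compactly: apply the input one bit-plane per cycle at a fixed read voltage, digitize each cycle's column result, and shift-and-add the per-bit results according to their binary significance. The sketch below shows that this bit-serial accumulation reproduces the multibit VMM; the array size, input precision, and conductance values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N_BITS = 4                                           # input precision (illustrative)
G = rng.integers(0, 3, size=(8, 4)).astype(float)    # cell conductance levels (arbitrary units)
x = rng.integers(0, 2**N_BITS, size=8)               # multibit digital input vector

def bit_serial_vmm(x_digital, conductance, n_bits):
    """Apply one input bit-plane per cycle, then weight each cycle by 2**b."""
    acc = np.zeros(conductance.shape[1])
    for b in range(n_bits):
        bit_plane = (x_digital >> b) & 1             # 0/1 word-line pattern for this cycle
        partial = bit_plane @ conductance            # one analog MAC cycle (idealized)
        acc += partial * (1 << b)                    # shift-and-add by bit significance
    return acc

print(np.allclose(bit_serial_vmm(x, G, N_BITS), x @ G))   # True in the ideal case
```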
The quantization of G determines the mapping from a floating-point weight in the neural network to an RRAM conductance with a limited number of levels. The state-of-the-art monolithic integration of RRAM on the order of kilobits and beyond onto the CMOS logic platform relies on binary RRAM devices, which demands the adaptation of floating-point synaptic weights to binary or ternary ones. Wang et al. presented a few schemes, including distribution-aware quantization, quantization regularization, and bias tuning, to adapt synaptic weights during training to fit into an RRAM in-memory computing macro.[107]

The lowered requirement of up to 4 bit output precision (a.k.a. activation precision) primarily simplifies the ADC design. Instead of conventional ADC architectures (such as pipelined ADC and successive-approximation ADC), low-precision ADCs can be built from multiple binary sensing amplifiers with different reference thresholds.[108] Mochida et al. presented a binary-input binary-output RRAM-based in-memory computing macro.[67] The RRAM array is composed of 1T1R cells. Assisted by a read-verify programming scheme, an RRAM device can be programmed to any value between its HRS and LRS.[67] The output sensing amplifier is binary. For neural networks requiring multibit activation precision, the neuron computation is then realized by accumulating the 1 bit MAC results, followed by additional digital nonlinear activation function circuits. Such a simplified design scheme without an ADC module reduces the area and power consumption at the cost of increased latency/operation overhead.

Chen et al. demonstrated a binary-input 3 bit output RRAM in-memory computing macro based on the single-level cell (SLC) RRAM.[109] Figure 5a shows its architecture. The binary input signal is determined by turning on the word line, and the weights are stored in the memory array. There are two states in the RRAM cells, HRS and LRS, representing the weights of 0 (HRS) and +1 (LRS), respectively. Two RRAM in-memory computing macros respectively provide the positive and negative weights to reach the ternary weight. This RRAM in-memory computing macro yields a latency of 14.8 and 15.6 ns to compute a convolutional layer and a fully connected layer, respectively. The precision of the sensing circuit is 3 bits. Moreover, it uses an input-aware reference current generation to increase the read margin, and a small-offset multilevel current sense amplifier improves the sensing yield.
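A hedged sketch of the ternary-weight idea above: a signed ternary matrix is split into a "positive" and a "negative" binary array (each mappable to HRS/LRS cells), the two macros produce their own MAC sums, and a subtraction recovers the signed output. The shapes and values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
W_ternary = rng.integers(-1, 2, size=(6, 5))          # weights in {-1, 0, +1}
x = rng.integers(0, 2, size=6)                        # binary input vector

# Split the signed weights into two binary arrays, one per macro.
W_pos = (W_ternary == 1).astype(int)                  # +1 weights -> LRS cells in macro 1
W_neg = (W_ternary == -1).astype(int)                 # -1 weights -> LRS cells in macro 2

# Each macro accumulates its own MAC count; a subtractor combines them.
y = x @ W_pos - x @ W_neg
print(np.array_equal(y, x @ W_ternary))               # True
```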
Figure 5. a) Architecture of binary-in ternary-weight RRAM in-memory computing macro. Reproduced with permission.[109] Copyright 2018, IEEE. b) Architecture of serial-input nonweighted product RRAM in-memory computing macro. Reproduced with permission.[68] Copyright 2019, IEEE.
Xue et al. further optimized the sensing circuit to a 4 bit precision with a 14.6 ns latency.[68] Figure 5b shows the structure of this RRAM in-memory computing macro using a 1T1R SLC cell array, including a serial-input unweighted product array structure, a read-path current reduction module, and a multilevel current-mode sense amplifier. This work allocates positive and negative weights to different columns of the same array, which is realized by a current subtractor. Besides, multiple binary RRAM devices together represent a multibit synaptic weight. For an N × N CNN kernel, the N² weights are stored in N² consecutive rows.

In-memory computing appears to be a promising approach, using the large internal memory bandwidth and enabling parallel data processing in the local memory. The in-memory computing structure also has several advantages over the conventional approach. First, in-memory computing reduces the amount of data that must be transferred between the CPU and memory. Second, it reduces the amount of intermediate data, which decreases the memory capacity requirements, reduces energy consumption, lowers latency, and improves overall performance. To accommodate the higher precision requirement of heavy DNN applications, the RRAM in-memory computing macro has to support multibit inputs and weights to maximize the accuracy of the MAC output.

4.1.3. Spike-Based Design

Although there are many algorithm-layer studies on developing purely binary neural networks (binary input, binary output, and binary weights),[110,111] complicated datasets and applications, e.g., ImageNet,[112] still need a certain level of data precision (e.g., 8 bit) to satisfy the accuracy requirement.[113] Considering the limited sensing margin in the voltage representation of data,[68,109] Yan et al. proposed using spikes for data representation and demonstrated a compact RRAM-based nonvolatile in-memory computing processing engine (PE).[114] The 1T1R array is 64 kb (256 × 256) with RRAM devices in binary states. The PE has the dual function of memory and computation. In the memory mode, the read/write logic, drivers, and amplifiers realize data programming and sensing. In the computing mode, the RRAM array performs matrix multiplication, and the in situ nonlinear activation (ISNA) circuit converts the output currents to spikes. In this in-memory computing PE design, the ISNA executes the activation function computation on the fly, obviating additional circuits to calculate the activation function and reducing the design overhead.[115] The spike-based ISNA takes a different approach to enhance the energy efficiency by lowering the power consumption at the cost of a 200 ns latency. Instead of using multiple sensing amplifiers with different thresholds,[68,88] the IFC-like ISNA circuit performs data conversion by continuously charging and discharging a capacitor. Such a biologically inspired spike generation needs only approximately ten transistors and a capacitor. Because of the small footprint of the ISNA, more spike-based ISNA sensing circuits can be included, leading to higher execution parallelism and larger throughput. This RRAM in-memory computing PE reaches a peak energy efficiency of 16 trillion operations per second per watt (TOPS/W) and provides the flexibility of configuring the activation precision between 1 bit and 8 bits.
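An idealized way to view the IFC-like conversion is as current-to-frequency conversion: within a fixed evaluation window, a larger column current charges the capacitor to the firing threshold more often, so the spike count itself is the digitized (and saturating, hence nonlinear) activation. The constants below are assumptions for illustration, not the parameters of the ISNA circuit.

```python
import numpy as np

def spikes_from_current(i_col, t_window=200e-9, dt=1e-9, c=50e-15, v_th=0.5):
    """Count output spikes produced by integrating a column current on a capacitor.

    Each time the capacitor voltage reaches v_th it is reset and one spike is
    emitted; the count over the window serves as the digital activation value.
    All component values are illustrative assumptions.
    """
    v, count = 0.0, 0
    for _ in range(int(t_window / dt)):
        v += i_col * dt / c                  # dV = I*dt/C
        if v >= v_th:
            v -= v_th                        # reset and emit one spike
            count += 1
    return count

# Larger MAC currents yield more spikes, up to saturation within the window.
for i in (0.2e-6, 1e-6, 5e-6, 20e-6):
    print(f"I = {i*1e6:4.1f} uA -> {spikes_from_current(i):3d} spikes")
```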
4.2. RRAM-Based In-Memory Computing Microarchitecture

According to von Neumann, a computer can be divided into basic arithmetic operations and logic flow.[1] In-memory computing macro designs provide the function of the basic arithmetic operations, whereas microarchitecture studies focus on effective control to utilize the in-memory computing macros.

ISAAC is a crossbar-based accelerator tailored for CNN benchmarks (Figure 6).[116] It is organized in a hierarchy of chips/tiles/in situ multiply accumulators (IMAs)/arrays. A dedicated on-chip network bridges the tiles within the chip for data transmission. Within each tile, embedded dynamic random access memory (eDRAM) buffers are used for result aggregation, the IMAs are composed of a group of RRAM arrays together with the data conversion interface and conduct the VMM, and the output registers store the aggregated results. Additional digital components perform the pooling and activation operations of the neural networks.[116] ISAAC exploits the characteristics of the networks and proposes a pipeline design. The pipeline is applied within IMAs and tiles and enables the overlap of data accesses and computations. ISAAC is also equipped with a data encoding and allocation scheme to lower the overhead induced by the high-precision DAC/ADCs.
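Because one physical crossbar is much smaller than a typical layer, architectures like the one above slice a large weight matrix into crossbar-sized tiles, run the tiles in parallel, and aggregate the partial sums (in ISAAC's case via eDRAM buffers). The sketch below shows only the tiling and aggregation arithmetic; the tile size and buffer naming are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
TILE = 128                                        # assumed crossbar dimension
W = rng.standard_normal((300, 500))               # a layer larger than one crossbar
x = rng.standard_normal(300)

def tiled_vmm(x_vec, weights, tile=TILE):
    """Split the weight matrix into tile x tile blocks and aggregate partial sums."""
    rows, cols = weights.shape
    y = np.zeros(cols)                            # plays the role of the result buffer
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            block = weights[r:r + tile, c:c + tile]   # one crossbar's worth of weights
            y[c:c + block.shape[1]] += x_vec[r:r + block.shape[0]] @ block
    return y

print(np.allclose(tiled_vmm(x, W), x @ W))        # True: tiling preserves the result
```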
In the same year, another neural network accelerator, PRIME, was published by the research group from the University of California, Santa Barbara (UCSB).[117] Different from the hierarchical structure introduced by ISAAC, PRIME was built upon the traditional main memory architecture, so the overhead of the design modification is minimal. In this design, the peripheral circuits of a portion of the RRAM arrays are enhanced to support computing functions. These arrays can alternate between the memory and computation modes in a time-multiplexed manner. A large amount of data can reside in the RRAM arrays instead of in external memory, which reduces the overhead of memory and data accesses. Furthermore, PRIME provides a set of software and hardware interfaces, such that the RRAM arrays are configured into memory or computing units according to the application demand.

PipeLayer enhances the execution parallelism across two levels, i.e., intra-layer parallelism and inter-layer parallelism.[118] Furthermore, it removes the high-cost ADC/DAC components and replaces them with spiking-based read/output circuits. More importantly, PipeLayer implements the customized processes for training, i.e., error backward propagation and weight update. Integrating the aforementioned designs, PipeLayer significantly improves the energy efficiency, the computing throughput, as well as the area efficiency. Based on the observation that the existing RRAM accelerators overlooked the data reuse opportunity underlying the network layers, Qiao et al. proposed a universal accelerator, AtomLayer, which integrates a unique filter mapping scheme and a dataflow design to maximize the utilization of input data and the execution throughput.[119] The performance and power efficiency of AtomLayer exceed the previous works in both training and inference.

Recently, a few RRAM-based in-memory computing designs specialized for diverse neural networks have emerged. LerGAN and ZARA are tailored for accelerating an unsupervised machine learning application, the generative adversarial network (GAN).[120,121] The challenge of training a GAN comes from two aspects: 1) the complex data dependency between the discriminator network and the generator network and 2) the untraditional computing patterns within the layers of the generator network. To address these problems, LerGAN derives opportunities to improve the computing efficiency by skipping the ineffective computations in the generator network.[121] Meanwhile, a 3D-based layer connection is developed to optimize the efficiency of data transmission among the layers of the discriminator and generator. Regarding the same problems, ZARA emphasizes the computing efficiency optimization.[120] It first decomposes the convolution in the generator into several submatrix multiplications and then balances their computation latency through weight mapping and execution scheduling designs. By eliminating the zero-related ineffective computation, ZARA achieves almost 2.1× the performance of the previous RRAM-based in-memory computing accelerators.

Furthermore, to enable general-purpose application of the RRAM-based microarchitecture, Ankit et al. proposed the programmable ultra-efficient memristor-based accelerator (PUMA) architecture with an instruction set architecture (ISA) and compiler for a wide variety of machine learning workloads.[122] The PUMA ISA accommodates the hardware design configuration and provides an interface for the upper-level compiler.
Figure 6. Hierarchy of the RRAM in-memory computing microarchitecture: from the top level to the bottom level are the processor, PE, macro, RRAM array, 1T1R cell, and RRAM device. The data conversion shown is implemented with DAC/ADC. Reproduced with permission.[51] Copyright 2019, ACM, Inc.
Computing instructions include VMM, vector arithmetic, and scalar arithmetic. Each of the three-level memory hierarchies has its own controlling instructions, which, from low to high, are set/copy, load/store, and send/receive, respectively. The argument lists of all the aforementioned operations are directly encoded in the instructions. The gap between the machine code and high-level machine learning (ML) model descriptions is bridged by the PUMA compiler. It directly compiles ML models written in popular frameworks (Caffe2, PyTorch, etc.) to executable PUMA instructions.[123–125] Note that PUMA is a spatial architecture, for which each core has its own sequence of instructions. The first stage of compiling is, therefore, to derive the code for each core. A machine learning model is described as a computation graph, where a node represents an operation and an edge represents a communication. A heuristic-based graph partition is performed to assign graph nodes to PUMA cores and to replace edges with load/store/send/receive operations. Next, the compiler schedules the instructions for each PUMA core. Dataflow analysis techniques are applied to reduce register pressure, capture instruction-level parallelism, and avoid deadlocks. Finally, register allocation is performed to fit the actual hardware.
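To illustrate only the first compilation stage (and not the actual PUMA compiler, whose heuristics and instruction encoding are more involved), the toy sketch below assigns the nodes of a small computation graph to cores and turns every cross-core edge into a send/receive pair. The node names, round-robin placement, and pseudo-instruction strings are all hypothetical.

```python
# A toy model of the graph-partitioning step described above.
graph = {                      # node -> list of consumer nodes
    "vmm0": ["relu0"],
    "vmm1": ["relu1"],
    "relu0": ["add0"],
    "relu1": ["add0"],
    "add0": [],
}

def partition(nodes, n_cores):
    """Assign nodes to cores (round-robin stand-in for the real heuristic)."""
    return {node: i % n_cores for i, node in enumerate(nodes)}

def lower(graph, placement):
    """Emit per-core pseudo-instructions; cross-core edges become send/receive."""
    program = {core: [] for core in set(placement.values())}
    for node, consumers in graph.items():
        core = placement[node]
        program[core].append(f"exec {node}")
        for consumer in consumers:
            dst = placement[consumer]
            if dst != core:                       # communication only across cores
                program[core].append(f"send {node} -> core{dst}")
                program[dst].append(f"receive {node} from core{core}")
    return program

placement = partition(list(graph), n_cores=2)
for core, instrs in lower(graph, placement).items():
    print(f"core{core}: {instrs}")
```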
5. Reliability of RRAM-Based In-Memory Computing

Reliability becomes a major concern for large-scale integration. For RRAM-based in-memory computing systems, the reliability issues are induced not only by device fabrication but also by the computing process. First, process variations cause the devices across a single chip (within the same array or on different arrays) to behave differently in terms of the conductance range, device programmability, retention, and endurance. The heterogeneity of RRAM array fabrication potentially increases faults and reduces the yield. The variations across different chips are even more severe. Moreover, nonoptimized operational behaviors, such as repeatedly rewriting a small portion of the devices, could result in overall system performance deterioration.[84] These nonideal properties are summarized in Table 1. Some nonideal properties can be exploited for special purposes, such as developing physically unclonable functions (PUFs) with the statistical variance of RRAM devices.[129] In general, the robustness concerns of RRAM-based in-memory computing obscure the accuracy of both storage and computation. Comparing RRAM in-memory computing systems with conventional storage-purpose RRAM designs, the inaccuracy of the devices affects not only the data storage but also the computing accuracy of the analog matrix multiplication. Due to the highly parallel operations in an RRAM in-memory computing macro, conventional techniques, like error-correcting codes (ECC), are not sufficient to tolerate faults without forfeiting the throughput bonus brought by parallelism.

Table 1. Toy model for RRAM nonideal properties.

Nonideal behavior | Analog RRAM | Binary RRAM
Endurance[22,80] | 500 k cycles | 95.5% maintain resistance after >10⁶ cycles
Variability a)[126] | 0.03 | 0.04 @ LRS, 0.4 @ HRS
Yield[61,80] | 89.9% | >99%
Bit error rate before ECC[47] | N/A | <10⁻⁵
Thermal-activated fluctuation variability a)[94] | 0.03 | 0.03
Read disturbance | Refer to Yan et al.[127] | Refer to Ho et al.[128]

a) The variability is defined as the ratio of the standard deviation over the mean of the measured resistance.

When addressing the reliability concerns, fault models are used to understand the persistent and nonpersistent errors of RRAM arrays. For example, Ambrogio et al. used 1/f noise and telegraph noise models to describe the low-frequency noise in binary RRAM devices.[130] Huang et al. presented an analytic model of RRAM retention properties.[131] Chen et al. described the endurance of RRAM devices under repeated writing.[132] There is no universal model that can cover all types of RRAM devices, due to the large variety of materials and structures. Nevertheless, the investigations on fault models for different types of RRAM devices unveil some common characteristics,[133] which are helpful for circuit and system designers in understanding and exploring reliability enhancement technologies for RRAM in-memory computing systems.

Specifically, one of the prominent problems is the low yield of RRAM devices. As shown in Figure 7a, many devices in an RRAM array are always in LRS ("stuck-on") or in HRS ("stuck-off"). These devices cannot be mapped to arbitrary values when deploying an algorithm on an RRAM array. This issue is especially vital to large-scale RRAM arrays, because a small proportion of devices in a large RRAM array still covers a considerable number of MAC operations.
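A quick way to see why stuck cells matter at the array level is to inject them into the ideal VMM and measure the output error, as in the hedged sketch below. The fault rates, array size, and uniform weight distribution are illustrative assumptions, not data from the cited works.

```python
import numpy as np

rng = np.random.default_rng(5)
SIZE = 64
g_lrs, g_hrs = 1e-4, 1e-6                      # illustrative conductance bounds

G_ideal = rng.uniform(g_hrs, g_lrs, size=(SIZE, SIZE))   # target conductances
V = rng.uniform(0.0, 0.2, size=SIZE)                      # read voltages

def inject_stuck_faults(g, p_on=0.02, p_off=0.02):
    """Force a random fraction of cells to stay at LRS (stuck-on) or HRS (stuck-off)."""
    faulty = g.copy()
    mask = rng.random(g.shape)
    faulty[mask < p_on] = g_lrs
    faulty[(mask >= p_on) & (mask < p_on + p_off)] = g_hrs
    return faulty

I_ideal = V @ G_ideal
I_faulty = V @ inject_stuck_faults(G_ideal)
rel_err = np.abs(I_faulty - I_ideal) / np.abs(I_ideal)
print(f"mean relative column-current error: {rel_err.mean():.2%}")
```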
Figure 7. a) Yield problem. Left: conductance heat map in a programming example (64 × 64 arrays). Right: measured stuck-on (+1) and stuck-off (−1) cells. Reproduced with permission.[134] Copyright 2017, IEEE. b) Read disturbance causes weight drift (simulated). A synaptic weight is represented by the conductance difference of two RRAM devices. The two cases shown cause the combined weight to decrease. Reproduced with permission.[127] Copyright 2017, IEEE.
To mitigate the low yield issue, Liu et al. introduced the concept of "weight significance," which evaluates how severe the impact of the unexpected deviation of these weights is on the final computational accuracy.[134] By assigning less weight significance to stuck-on or stuck-off cells, the computational accuracy can be largely recovered when deploying neural networks on RRAM arrays with lower yield. Xia et al. presented an online training scheme that can detect and remap the erroneous cells.[135] The detection uses an adaptive threshold voltage to locate the stubborn stuck cells, and the remapping technique exchanges the neuron positions to bypass the dead cells recognized by the detection.

Faults also arise in the process of data movement and computation. The read-verify programming scheme effectively lowers the bit error rate to the order of 10⁻⁵ for binary RRAM and suppresses the programming conductance error to under 2.95% for analog RRAM.[47,67] However, RRAM sensing is far more frequent than programming (Figure 7b). Long-term use of an RRAM in-memory computing macro repeatedly applies voltages to the cells, which likely causes read disturbance, i.e., an unexpected weight drift from the original well-trained values. Considering that RRAM is programmed with bipolar voltages/currents, adaptively alternating the sensing direction can effectively mitigate the read disturbance.[127] Such a task can be completed by a feedback controller.[127] With the sensing direction determined by mimicking the training backpropagation feedback, the weight stability improves by 14.9× on average.[127]

6. Conclusion and Remarks

In this paper, we summarize the state-of-the-art progress in developing RRAM-based in-memory computing systems, from the device to the system layer. The major focus at the device level is to realize electric synapses and neurons with a single RRAM device and/or through simple circuitry that leverages novel device structures. Further enhancing the density and scalability, e.g., by taking advantage of 3D integration, will continue to be an important trend. At the circuit and architecture levels, RRAM-based in-memory computing that naturally integrates data storage and processing operations is widely investigated. Substantial research efforts focus on improving the energy efficiency and reducing the design cost while satisfying the system requirements, e.g., operation speed, throughput, and accuracy. New ISAs and compilers emerge as a new topic to generalize in-memory computing modules to various applications. Finally, the robustness issue emerges and could be further aggravated as the dimension of RRAM devices scales down and the density tops out.

The adoption of the emerging RRAM technology demonstrates great potential in realizing more efficient computing systems, particularly for cognitive applications. To enable large-scale integrated systems for real-world applications, however, there are still many key challenges to be solved, such as the imbalance between the limited device number and the increasing size of neural network models,[136] the difficulty in realizing online training schemes, the automation flow to transfer an algorithm to a given hardware platform, as well as the device robustness and system reliability issues.
Table 2. Representative RRAM-based in-memory computing works at different layers.

Work | Layer of major contribution | Data conversion | Hierarchical level a) | Computation model | Targeting application/dataset
Balatti et al.[55] | Device | Sense amplifier | Cell/macro | Logic | Logic gate
Pedro et al.[60] | Device | IFC | Cell | SNN | Image clustering
Hu et al.[61] | Device | DAC/ADC | Cell/macro | DNN | DCT,b) MNIST[81]
Wang et al.[100] | Device | Memristive neuron | Cell | SNN | Pattern classification
Li et al.[64] | Device | DAC/ADC | Macro | LSTM | Gait recognition
Kumar et al.[101] | Device | Memristive neuron | Cell | SNN | Backpropagation
Hu et al.[58] | Circuit/microarchitecture | DAC/ADC | PE | BSB | Digit recognition
Wu et al.[137] | Device/circuit | Sense amplifier | PE | HD computing | Language recognition
Imani et al.[76] | Circuit | Sense amplifier | PE/microarchitecture | HD computing | Language recognition
Chen et al.[109] | Circuit | Sense amplifier | Macro | DNN, CNN | MNIST
Xue et al.[68] | Circuit | Sense amplifier | Macro/PE | DNN, CNN | CIFAR-10[138]
Yan et al.[114] | Circuit | ISNA | PE | DNN, CNN | MNIST, CIFAR-10
Shafiee et al.[116] | Microarchitecture | DAC/ADC | Processor/macro | DNN, CNN | ImageNet[139]
Song et al.[118] | Microarchitecture | IFC | Processor | DNN, CNN | Image recognition, NNc) training
Chen et al.[120] | Microarchitecture | DAC/ADC | Processor | GAN | Image synthesis
Mao et al.[121] | Microarchitecture | DAC/ADC | Processor | GAN | Image synthesis
Ankit et al.[122] | Microarchitecture | DAC/ADC | Processor/compiler | DNN/LSTM/CNN | Object detection, neural machine translation, language modeling, image recognition

a) Hierarchical level refers to Figure 6; b) DCT: discrete cosine transform; c) NN: neural network.
RRAM-based in-memory computing has demonstrated great potential, particularly for cognitive applications. To enable large-scale integrated systems for real-world applications, however, many key challenges still need to be solved, such as the imbalance between the limited number of devices and the ever-increasing size of neural network models,[136] the difficulty of realizing online training schemes, the lack of an automation flow for mapping an algorithm onto a given hardware platform, and device robustness and system reliability issues. These challenges cannot be overcome at the device level alone; mitigation that spans circuits, systems, and even algorithms may be the only solution.

Nowadays, developing a highly efficient computing system requires all the hierarchical layers, including device processing, circuit components, the microarchitecture, and application algorithms, to be considered together, because they are heavily correlated. The same philosophy is well represented by the research on RRAM-based in-memory computing, as shown by the examples at different layers in Table 2. As an interdisciplinary area, in-memory computing is approached differently by researchers with different knowledge backgrounds. For instance, the expansion of computing units drives revisions of the classical von Neumann architecture for higher efficiency and performance, whereas unprecedented biologically plausible computing schemes are emerging in response to the ever-increasing demands of AI computing. RRAM-based in-memory computing has attracted attention from industry and will certainly support the further development of computing power.
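To make the model-size-versus-device-count imbalance concrete, the following minimal Python sketch shows how a weight matrix larger than a single crossbar can be partitioned across several fixed-size arrays, with the partial sums accumulated digitally. It is an illustration only, not a mapping scheme from any cited work; the 128×128 tile size, the ideal noise-free arithmetic, and the name tile_matvec are assumptions.

# Illustrative sketch: mapping a weight matrix larger than one RRAM crossbar
# onto multiple fixed-size tiles and accumulating the partial results digitally.
# The 128x128 tile size and the ideal (noise-free) analog model are assumptions.
import numpy as np

TILE = 128  # assumed crossbar dimension (rows x columns)

def tile_matvec(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute weights @ x by splitting 'weights' into TILE x TILE crossbar tiles."""
    rows, cols = weights.shape
    y = np.zeros(rows)
    for r in range(0, rows, TILE):
        for c in range(0, cols, TILE):
            sub_w = weights[r:r + TILE, c:c + TILE]   # one crossbar's conductances
            sub_x = x[c:c + TILE]                     # input slice driven on the word lines
            y[r:r + TILE] += sub_w @ sub_x            # digital accumulation of partial sums
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((512, 384))  # a layer larger than a single 128x128 array
    x = rng.standard_normal(384)
    assert np.allclose(tile_matvec(W, x), W @ x)
    print("tiled mapping uses", (512 // TILE) * (384 // TILE), "crossbars")

Even this idealized partitioning shows how quickly the crossbar count grows with model size, before accounting for redundancy, analog-to-digital conversion, or device non-idealities.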
Acknowledgements
This work was supported by Air Force Research Laboratory (AFRL) FA8750-18-2-0121 and National Science Foundation (NSF) CSR-1717885.

Conflict of Interest
The authors declare no conflict of interest.

Keywords
accelerators, in-memory computing, neural networks, process-in-memory, resistive memory

Received: June 30, 2019
Revised: July 25, 2019
Published online: September 20, 2019

[1] J. Von Neumann, The Computer and the Brain, Yale University Press, New Haven 2012.
[2] J. L. Hennessy, D. A. Patterson, Computer Architecture: A Quantitative Approach, Elsevier, Burlington, MA 2011.
[3] D. Kuzum, R. G. D. Jeyasingh, B. Lee, H.-S. P. Wong, Nano Lett. 2011, 12, 2179.
[4] B. Reagen, P. Whatmough, R. Adolf, S. Rama, H. Lee, S. K. Lee, J. M. Hernández-Lobato, G.-Y. Wei, D. Brooks, ACM/IEEE Int. Symp. Comput. Archit., IEEE, Piscataway, NJ 2016, pp. 267–278.
[5] T. Potok, C. Schuman, R. Patton, T. Hylton, H. Li, R. Pino, Neuromorphic Computing, Architectures, Models, and Applications: A Beyond-CMOS Approach to Future Computing, The Department of Energy (DOE) Office of Scientific and Technical Information (OSTI), Oak Ridge, TN, June 29–July 1, 2016.
[6] H. Zhang, G. Chen, B. C. Ooi, K.-L. Tan, M. Zhang, IEEE Trans. Knowl. Data Eng. 2015, 27, 1920.
[7] J. Fowers, K. Ovtcharov, M. Papamichael, T. Massengill, M. Liu, D. Lo, S. Alkalay, M. Haselman, L. Adams, M. Ghandi, S. Heil, P. Patel, A. Sapek, G. Weisz, L. Woods, S. Lanka, S. K. Reinhardt, A. M. Caulfield, E. S. Chung, D. Burger, Int. Symp. Comput. Archit. (ISCA), Los Angeles, CA 2018, pp. 1–14.
[8] J. Zhang, Z. Wang, N. Verma, IEEE J. Solid-State Circuits 2017, 52, 915.
[9] W.-S. Khwa, J.-J. Chen, J.-F. Li, X. Si, E.-Y. Yang, X. Sun, R. Liu, P.-Y. Chen, Q. Li, S. Yu, M. F. Chang, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2018, pp. 496–498.
[10] M.-F. Chang, C.-F. Chen, T.-H. Chang, C.-C. Shuai, Y.-Y. Wang, H. Yamauchi, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2015, pp. 1–3.
[11] F. Merrikh-Bayat, X. Guo, M. Klachko, M. Prezioso, K. K. Likharev, D. B. Strukov, IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 4782.
[12] D. Fick, A Peek Into Software Engineering at Mythic, https://medium.com/mythic-ai/a-peek-into-software-engineering-at-mythic-1b0ca5522868 (accessed: June 2019).
[13] S. Jain, A. Ranjan, K. Roy, A. Raghunathan, IEEE Trans. Very Large Scale Integr. Syst. 2017, 26, 470.
[14] D. Fan, S. Angizi, Z. He, IEEE Comput. Soc. Annu. Symp. VLSI, IEEE, Kyoto 2017, pp. 683–688.
[15] Y. Wang, H. Yu, L. Ni, G.-B. Huang, M. Yan, C. Weng, W. Yang, J. Zhao, IEEE Trans. Nanotechnol. 2015, 14, 998.
[16] R. Venkatesan, M. Sharad, K. Roy, A. Raghunathan, Des. Autom. Test Eur. Conf. Exhib., IEEE, Grenoble 2013, pp. 1825–1830.
[17] S. Kim, N. Sosa, M. BrightSky, D. Mori, W. Kim, Y. Zhu, K. Suu, C. Lam, IEEE Int. Electron Devices Meet., IEEE, Washington, DC 2013, pp. 30–37.
[18] M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni, E. Eleftheriou, Nat. Electron. 2018, 1, 246.
[19] D. Ielmini, H.-S. P. Wong, Nat. Electron. 2018, 1, 333.
[20] S.-S. Sheu, P.-C. Chiang, W.-P. Lin, H.-Y. Lee, P.-S. Chen, Y.-S. Chen, T.-Y. Wu, F. T. Chen, K.-L. Su, M.-J. Kao, K. H. Cheng, IEEE Symp. VLSI Circuits, IEEE, Kyoto 2009, pp. 82–83.
[21] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, W. Lu, Nano Lett. 2010, 10, 1297.
[22] W.-H. Chen, W.-J. Lin, L.-Y. Lai, S. Li, C.-H. Hsu, H.-T. Lin, H.-Y. Lee, J.-W. Su, Y. Xie, S.-S. Sheu, M. F. Chang, IEEE Int. Electron Devices Meet., IEEE, San Francisco, CA 2017, pp. 22–28.
[23] A. Sengupta, P. Panda, P. Wijesinghe, Y. Kim, K. Roy, Sci. Rep. 2016, 6, 30039.
[24] X. Zhang, W. Cai, X. Zhang, Z. Wang, Z. Li, Y. Zhang, K. Cao, N. Lei, W. Kang, Y. Zhang, H. Yu, ACS Appl. Mater. Interfaces 2018, 10, 16887.
[25] Y. Huang, W. Kang, X. Zhang, Y. Zhou, W. Zhao, Nanotechnology 2017, 28, 08LT02.
[26] K. Cao, W. Cai, Y. Liu, H. Li, J. Wei, H. Cui, X. He, J. Li, C. Zhao, W. Zhao, Nanoscale 2018, 10, 21225.
[27] W. Kang, Y. Huang, C. Zheng, W. Lv, N. Lei, Y. Zhang, X. Zhang, Y. Zhou, W. Zhao, Sci. Rep. 2016, 6, 23164.
[28] J. Y. Seok, S. J. Song, J. H. Yoon, K. J. Yoon, T. H. Park, D. E. Kwon, H. Lim, G. H. Kim, D. S. Jeong, C. S. Hwang, Adv. Funct. Mater. 2014, 24, 5316.
[29] S. Yu, Proc. IEEE 2018, 106, 260.
[30] H. Tsai, S. Ambrogio, P. Narayanan, R. M. Shelby, G. W. Burr, J. Phys. D: Appl. Phys. 2018, 51, 283001.
[31] Q. Xia, J. J. Yang, Nat. Mater. 2019, 18, 309.
[32] D. S. Jeong, K. M. Kim, S. Kim, B. J. Choi, C. S. Hwang, Adv. Electron. Mater. 2016, 2, 1600090.
[33] D. S. Jeong, C. S. Hwang, Adv. Mater. 2018, 30, 1704729.
[34] D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, Nature 2008, 453, 80.
[35] L. Goux, Y.-Y. Chen, L. Pantisano, X.-P. Wang, G. Groeseneken, M. Jurczak, D. J. Wouters, Electrochem. Solid-State Lett. 2010, 13, G54.
[36] S. Balatti, S. Larentis, D. C. Gilmer, D. Ielmini, Adv. Mater. 2013, 25, 1474.
[37] S. Privitera, G. Bersuker, S. Lombardo, C. Bongiorno, D. C. Gilmer, Solid-State Electron. 2015, 111, 161.
[38] S. Yu, H.-Y. Chen, B. Gao, J. Kang, H.-S. P. Wong, ACS Nano 2013, 7, 2320.
[39] W. R. Hiatt, T. W. Hickmott, Appl. Phys. Lett. 1965, 6, 106.
[40] M. D. Pickett, G. Medeiros-Ribeiro, R. S. Williams, Nat. Mater. 2013, 12, 114.
[41] D. S. Jeong, H. Schroeder, R. Waser, Phys. Rev. B 2009, 79, 195317.
[42] J. J. Yang, M. D. Pickett, X. Li, D. A. A. Ohlberg, D. R. Stewart, R. S. Williams, Nat. Nanotechnol. 2008, 3, 429.
[43] M.-J. Lee, C. B. Lee, D. Lee, S. R. Lee, M. Chang, J. H. Hur, Y.-B. Kim, C.-J. Kim, D. H. Seo, S. Seo, U. I. Chung, Nat. Mater. 2011, 10, 625.
[44] F. Miao, J. P. Strachan, J. J. Yang, M.-X. Zhang, I. Goldfarb, A. C. Torrezan, P. Eschbach, R. D. Kelley, G. Medeiros-Ribeiro, R. S. Williams, Adv. Mater. 2011, 23, 5633.
[45] R. Soni, A. Petraru, P. Meuffels, O. Vavra, M. Ziegler, S. K. Kim, D. S. Jeong, N. A. Pertsev, H. Kohlstedt, Nat. Commun. 2014, 5, 5414.
[46] B. J. Choi, J. Zhang, K. Norris, G. Gibson, K. M. Kim, W. Jackson, M.-X. M. Zhang, Z. Li, J. J. Yang, R. S. Williams, Adv. Mater. 2016, 28, 356.
[47] P. Jain, U. Arslan, M. Sekhar, B. C. Lin, L. Wei, T. Sahu, J. Alzate-Vinasco, A. Vangapaty, M. Meterelliyoz, N. Strutt, A. B. Chen, P. Hentges, P. A. Quintero, C. Connor, O. Golonzka, K. Fischer, F. Hamzaoglu, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2019, pp. 212–214.
[48] W. Zhao, M. Moreau, E. Deng, Y. Zhang, J.-M. Portal, J.-O. Klein, M. Bocquet, H. Aziza, D. Deleruyelle, C. Muller, D. Querlioz, N. B. Romdhane, D. Ravelosona, C. Chappert, IEEE Trans. Circuits Syst. I Regul. Pap. 2013, 61, 443.
[49] N. Xu, K. J. Yoon, K. M. Kim, L. Fang, C. S. Hwang, Adv. Electron. Mater. 2018, 4, 1800189.
[50] N. Xu, L. Fang, K. M. Kim, C. S. Hwang, Phys. Status Solidi (RRL) 2019, 13, 1900033.
[51] B. Li, B. Yan, H. Li, Proc. Great Lakes Symp. VLSI, ACM, Tysons Corner, VA 2019, pp. 381–386.
[52] K. M. Kim, N. Xu, X. Shao, K. J. Yoon, H. J. Kim, R. S. Williams, C. S. Hwang, Phys. Status Solidi (RRL) 2019, 13, 1800629.
[53] J. Borghetti, G. S. Snider, P. J. Kuekes, J. J. Yang, D. R. Stewart, R. S. Williams, Nature 2010, 464, 873.
[54] P. Huang, J. Kang, Y. Zhao, S. Chen, R. Han, Z. Zhou, Z. Chen, W. Ma, M. Li, L. Liu, X. Liu, Adv. Mater. 2016, 28, 9758.
[55] S. Balatti, S. Ambrogio, D. Ielmini, IEEE Trans. Electron Devices 2015, 62, 1831.
[56] S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, U. C. Weiser, IEEE Trans. Circuits Syst. II Express Briefs 2014, 61, 895.
[57] S. Balatti, S. Ambrogio, D. Ielmini, IEEE Trans. Electron Devices 2015, 62, 1839.
[58] M. Hu, H. Li, Q. Wu, G. S. Rose, Des. Autom. Conf., ACM, San Francisco, CA 2012, pp. 498–503.
[59] M. Prezioso, F. M. Bayat, B. Hoskins, K. Likharev, D. Strukov, Sci. Rep. 2016, 6, 21331.
[60] M. Pedro, J. Martin-Martinez, E. Miranda, R. Rodriguez, M. Nafria, M. B. Gonzalez, F. Campabadal, IEEE Int. Reliab. Phys. Symp., IEEE, Burlingame, CA 2018, p. P–CR.
[61] M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang, Q. Xia, Adv. Mater. 2018, 30, 1705914.
[62] B. Yan, C. Liu, X. Liu, Y. Chen, H. Li, IEEE Int. Electron Devices Meet., IEEE, San Francisco, CA 2017, pp. 11–14.
[63] C. Yakopcic, M. Z. Alom, T. M. Taha, Int. Jt. Conf. Neural Networks, IEEE, Anchorage, AK 2016, pp. 963–970.
[64] C. Li, Z. Wang, M. Rao, D. Belkin, W. Song, H. Jiang, P. Yan, Y. Li, P. Lin, M. Hu, N. Ge, Nat. Mach. Intell. 2019, 1, 49.
[65] L. Gao, P.-Y. Chen, S. Yu, IEEE Electron Device Lett. 2016, 37, 870.
[66] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Dávila, C. E. Graves, Z. Li, Nat. Electron. 2018, 1, 52.
[67] R. Mochida, K. Kouno, Y. Hayata, M. Nakayama, T. Ono, H. Suwa, R. Yasuhara, K. Katayama, T. Mikawa, Y. Gohou, IEEE Symp. VLSI Technol., IEEE, Honolulu, HI 2018, pp. 175–176.
[68] C.-X. Xue, W.-H. Chen, J.-S. Liu, J.-F. Li, W.-Y. Lin, W.-E. Lin, J.-H. Wang, W.-C. Wei, T.-W. Chang, T.-C. Chang, T.-Y. Huang, H.-Y. Kao, S.-Y. Wei, Y.-C. Chiu, C.-Y. Lee, C.-C. Lo, Y.-C. King, C.-J. Lin, R.-S. Liu, C.-C. Hsieh, K.-T. Tang, M.-F. Chang, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2019, pp. 388–390.
[69] W.-H. Chen, K.-X. Li, W.-Y. Lin, K.-H. Hsu, P.-Y. Li, C.-H. Yang, C.-X. Xue, E.-Y. Yang, Y.-K. Chen, Y.-S. Chang, T.-H. Hsu, Y.-C. King, C.-J. Lin, R.-S. Liu, C.-C. Hsieh, K.-T. Tang, M.-F. Chang, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2018, pp. 494–496.
[70] F. Su, W.-H. Chen, L. Xia, C.-P. Lo, T. Tang, Z. Wang, K.-H. Hsu, M. Cheng, J.-Y. Li, Y. Xie, Y. Wang, M.-F. Chang, H. Yang, Y. Liu, IEEE Symp. VLSI Technol., IEEE, Kyoto 2017, pp. T260–T261.
[71] H.-J. Yoo, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2019, pp. 20–26.
[72] P.-Y. Chen, B. Lin, I.-T. Wang, T.-H. Hou, J. Ye, S. Vrudhula, J. Seo, Y. Cao, S. Yu, IEEE/ACM Int. Conf. Comput. Des., IEEE Press, Piscataway, NJ 2015, pp. 194–199.
[73] S. Choi, J. H. Shin, J. Lee, P. Sheridan, W. D. Lu, Nano Lett. 2017, 17, 3113.
[74] S. Lashkare, N. Panwar, P. Kumbhare, B. Das, U. Ganguly, IEEE Electron Device Lett. 2017, 38, 1212.
[75] H. Tan, S. Majumdar, Q. Qin, J. Lahtinen, S. van Dijken, Adv. Intell. Syst. 2019, 1, 1900036.
[76] M. Imani, Y. Kim, T. Worley, S. Gupta, T. Rosing, Des. Autom. Test Eur. Conf. Exhib., IEEE, Florence 2019, pp. 1591–1594.
[77] I.-T. Wang, C.-C. Chang, L.-W. Chiu, T. Chou, T.-H. Hou, Nanotechnology 2016, 27, 365204.
[78] G. C. Adam, B. D. Hoskins, M. Prezioso, F. Merrikh-Bayat, B. Chakrabarti, D. B. Strukov, IEEE Trans. Electron Devices 2016, 64, 312.
[79] H. Li, K.-S. Li, C.-H. Lin, J.-L. Hsu, W.-C. Chiu, M.-C. Chen, T.-T. Wu, J. Sohn, S. B. Eryilmaz, J.-M. Shieh, W.-K. Yeh, H.-S. Philip Wong, IEEE Symp. VLSI Technol., IEEE, Honolulu, HI 2016, pp. 1–2.
[80] S. Yu, Z. Li, P.-Y. Chen, H. Wu, B. Gao, D. Wang, W. Wu, H. Qian, IEEE Int. Electron Devices Meet., IEEE, San Francisco, CA 2016, pp. 12–16.
[81] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Proc. IEEE 1998, 86, 2278.
[82] S. Li, N. Xiao, P. Wang, G. Sun, X. Wang, Y. Chen, H. H. Li, J. Cong, T. Zhang, IEEE Trans. Comput. 2018, 68, 239.
[83] A. M. Hassan, C. Yang, C. Liu, H. H. Li, Y. Chen, Des. Autom. Test Eur. Conf. Exhib., ACM, Lausanne 2017, pp. 776–781.
[84] S. Park, H. Kim, M. Choo, J. Noh, A. Sheri, S. Jung, K. Seo, J. Park, S. Kim, W. Lee, J. Shin, D. Lee, G. Choi, J. Woo, E. Cha, J. Jang, C. Park, M. Jeon, B. Lee, B. H. Lee, H. Hwang, IEEE Int. Electron Devices Meet., IEEE, San Francisco, CA 2012, pp. 10–12.
[85] M. Imani, Y. Kim, S. Riazi, J. Merssely, P. Liu, F. Koushanfar, T. Rosing, IEEE Int. Conf. Cloud Comput. (CLOUD), IEEE, Milan 2019.
[86] M. Imani, A. Rahimi, D. Kong, T. Rosing, J. M. Rabaey, IEEE Int. Symp. High Perform. Comput. Archit., IEEE, Austin, TX 2017, pp. 445–456.
[87] T. F. Wu, H. Li, P.-C. Huang, A. Rahimi, J. M. Rabaey, H.-S. P. Wong, M. M. Shulaker, S. Mitra, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2018, pp. 492–494.
[88] L. Zheng, S. Shin, S. Lloyd, M. Gokhale, K. Kim, S.-M. Kang, IEEE Int. Symp. Circuits Syst., IEEE, Montreal 2016, pp. 1382–1385.
[89] B. Chakrabarti, M. A. Lastras-Montaño, G. Adam, M. Prezioso, B. Hoskins, M. Payvand, A. Madhavan, A. Ghofrani, L. Theogarajan, K.-T. Cheng, D. B. Strukov, Sci. Rep. 2017, 7, 42429.
[90] Q. Luo, X. Xu, H. Liu, H. Lv, T. Gong, S. Long, Q. Liu, H. Sun, W. Banerjee, L. Li, J. Gao, Nanoscale 2016, 8, 15629.
[91] B. Gao, Y. Bi, H.-Y. Chen, R. Liu, P. Huang, B. Chen, L. Liu, X. Liu, S. Yu, H.-S. P. Wong, J. Kang, ACS Nano 2014, 8, 6998.
[92] W. Hwang, M. M. S. Aly, Y. H. Malviya, M. Gao, T. F. Wu, C. Kozyrakis, H.-S. P. Wong, S. Mitra, Int. Conf. Hardware/Software Codesign Syst. Synth. (CODES+ISSS), ACM, Seoul 2017, pp. 1–2.
[93] M. Yu, Y. Cai, Z. Wang, Y. Fang, Y. Liu, Z. Yu, Y. Pan, Z. Zhang, J. Tan, X. Yang, M. Li, R. Huang, Sci. Rep. 2016, 6, 21020.
[94] P. Sun, N. Lu, L. Li, Y. Li, H. Wang, H. Lv, Q. Liu, S. Long, S. Liu, M. Liu, Sci. Rep. 2015, 5, 13504.
[95] P. W. Hollis, J. J. Paulos, IEEE J. Solid-State Circuits 1990, 25, 849.
[96] P. Häfliger, M. Mahowald, L. Watts, Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Denver 1997, pp. 692–698.
[97] A. Joubert, B. Belhadj, R. Héliot, Int. New Circuits Syst. Conf., IEEE, Seoul 2011, pp. 9–12.
[98] J. Schemmel, J. Fieres, K. Meier, IEEE Int. Jt. Conf. Neural Networks (IEEE World Congr. Comput. Intell.), IEEE, Hong Kong 2008, pp. 431–438.
[99] S.-I. Amari, IEEE Trans. Syst. Man Cybern. 1972, 643.
[100] Z. Wang, S. Joshi, S. Savel'ev, W. Song, R. Midya, Y. Li, M. Rao, P. Yan, S. Asapu, Y. Zhuo, H. Jiang, P. Lin, C. Li, J. H. Yoon, N. K. Upadhyay, J. Zhang, M. Hu, J. P. Strachan, M. Barnell, Q. Wu, H. Wu, R. S. Williams, Q. Xia, J. J. Yang, Nat. Electron. 2018, 1, 137.
[101] S. Kumar, J. P. Strachan, R. S. Williams, Nature 2017, 548, 318.
[102] B. Yan, X. Cao, H. (Helen) Li, Des. Autom. Conf., San Francisco, CA 2018.
[103] N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska, G. Indiveri, Front. Neurosci. 2015, 9, 141.
[104] L. F. Abbott, T. B. Kepler, Statistical Mechanics of Neural Networks, Springer, Berlin, Heidelberg 1990, pp. 5–18.
[105] L. O. Chua, Int. J. Bifurc. Chaos 2005, 15, 3435.
[106] L. Chua, V. Sbitnev, H. Kim, Int. J. Bifurc. Chaos 2012, 22, 1230011.
[107] Y. Wang, W. Wen, L. Song, H. H. Li, Asia South Pacific Des. Autom. Conf., IEEE, Chiba 2017, pp. 776–781.
[108] X. Peng, S. Yu, IEEE Asia Pacific Conf. Circuits Syst., IEEE, Chengdu 2018, pp. 378–381.
[109] W. H. Chen, K. X. Li, W. Y. Lin, K. H. Hsu, P. Y. Li, C. H. Yang, C. X. Xue, E. Y. Yang, Y. K. Chen, Y. S. Chang, T. H. Hsu, Y. C. King, C. J. Lin, R. S. Liu, C. C. Hsieh, K. T. Tang, M. F. Chang, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2018, Vol. 61, p. 494.
[110] M. Courbariaux, Y. Bengio, J.-P. David, Adv. Neural Inf. Process. Syst. 2015, 3123.
[111] M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, Eur. Conf. Comput. Vis., Springer, Cham 2016, pp. 525–542.
[112] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, IEEE Conf. Comput. Vis. Pattern Recognit., IEEE, Miami, FL 2009, pp. 248–255.
[113] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, ACM/IEEE Int. Symp. Comput. Archit., IEEE, Toronto, Ontario 2017, pp. 1–12.
[114] B. Yan, Q. Yang, W. H. Chen, K. T. Chang, J. W. Su, C. H. Hsu, S. H. Li, H. Y. Lee, S. S. Sheu, M. S. Ho, Q. Wu, M. F. Chang, Y. Chen, H. Li, IEEE Symp. VLSI Technol., IEEE, Kyoto 2019, p. T86.
[115] S. Shukla, B. Fleischer, M. Ziegler, J. Silberman, J. Oh, V. Srinivasan, J. Choi, S. Mueller, A. Agrawal, T. Babinsky, N. Cao, C.-Y. Chen, P. Chuang, T. Fox, G. Gristede, M. Guillorn, H. Haynie, M. Klaiber, D. Lee, S.-H. Lo, G. Maier, M. Scheuermann, S. Venkataramani, C. Vezyrtzis, N. Wang, F. Yee, C. Zhou, P.-F. Lu, B. Curran, L. Chang, K. Gopalakrishnan, IEEE Solid-State Circuits Lett. 2018, 1, 217.
[116] A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, V. Srikumar, ACM SIGARCH Comput. Archit. News 2016, 44, 14.
[117] P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, Y. Xie, ACM SIGARCH Comput. Archit. News 2016, pp. 27–39.
[118] L. Song, X. Qian, H. Li, Y. Chen, IEEE Int. Symp. High Perform. Comput. Archit., IEEE, Vienna 2017, pp. 541–552.
[119] X. Qiao, X. Cao, H. Yang, L. Song, H. Li, Proc. 55th Annu. Des. Autom. Conf., IEEE, San Francisco, CA 2018, p. 103.
[120] F. Chen, L. Song, H. H. Li, Y. Chen, Des. Autom. Conf., IEEE, Las Vegas 2019, p. 133.
[121] H. Mao, M. Song, T. Li, Y. Dai, J. Shu, IEEE/ACM Int. Symp. Microarchitecture, IEEE, Fukuoka 2018, pp. 669–681.
[122] A. Ankit, I. El Hajj, S. R. Chalamalasetti, G. Ndu, M. Foltin, R. S. Williams, P. Faraboschi, W. W. Hwu, J. P. Strachan, K. Roy, D. S. Milojicic, Proc. Twenty-Fourth Int. Conf. Archit. Support Program. Lang. Oper. Syst., ACM, Providence, RI 2019, pp. 715–731.
[123] Caffe2, https://caffe2.ai/ (accessed: June 2019).
[124] PyTorch, https://pytorch.org/ (accessed: June 2019).
[125] ONNX, https://onnx.ai/ (accessed: June 2019).
[126] A. Grossi, E. Nowak, C. Zambelli, C. Pellissier, S. Bernasconi, G. Cibrario, K. El Hajjam, R. Crochemore, J. F. Nodin, P. Olivo, L. Perniola, IEEE Int. Electron Devices Meet., IEEE, San Francisco, CA 2016, pp. 4–7.
[127] B. Yan, J. Yang, Q. Wu, Y. Chen, H. Li, IEEE/ACM Int. Conf. Comput. Des., IEEE, Irvine, CA 2017.
[128] C. Ho, T. Y. Shen, P. Y. Hsu, S. C. Chang, S. Y. Wen, M. H. Lin, P. K. Wang, S. C. Liao, C. S. Chou, K. M. Peng, C. M. Wu, W. H. Chang, Y. H. Chen, F. Chen, L. W. Lin, T. H. Tsai, S. F. Lim, C. J. Yang, M. H. Shieh, H. H. Liao, C. H. Lin, P. L. Pai, T. Y. Chan, Y. C. Chiao, IEEE Symp. VLSI Technol., IEEE, Honolulu, HI 2016, pp. 1–2.
[129] Y. Pang, B. Gao, D. Wu, S. Yi, Q. Liu, W.-H. Chen, T.-W. Chang, W.-E. Lin, X. Sun, S. Yu, H. Qian, M.-F. Chang, H. Wu, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2019, pp. 402–404.
[130] S. Ambrogio, S. Balatti, V. McCaffrey, D. Wang, D. Ielmini, IEEE Int. Electron Devices Meet., IEEE, San Francisco, CA 2014, p. 14.
[131] P. Huang, Y. C. Xiang, Y. D. Zhao, C. Liu, B. Gao, H. Q. Wu, H. Qian, X. Y. Liu, J. F. Kang, IEEE Int. Electron Devices Meet., IEEE, San Francisco, CA 2018, pp. 40–44.
[132] C.-Y. Chen, H.-C. Shih, C.-W. Wu, C.-H. Lin, P.-F. Chiu, S.-S. Sheu, F. T. Chen, IEEE Trans. Comput. 2014, 64, 180.
[133] L. Chua, IEEE Micro 2018, 38, 7.
[134] C. Liu, M. Hu, J. P. Strachan, H. Li, Des. Autom. Conf., ACM, Austin, TX 2017, pp. 1–6.
[135] L. Xia, M. Liu, X. Ning, K. Chakrabarty, Y. Wang, Des. Autom. Conf., ACM, Austin, TX 2017, p. 33.
[136] X. Xu, Y. Ding, S. X. Hu, M. Niemier, J. Cong, Y. Hu, Y. Shi, Nat. Electron. 2018, 1, 216.
[137] T. F. Wu, B. Q. Le, R. Radway, A. Bartolo, W. Hwang, S. Jeong, H. Li, P. Tandon, E. Vianello, P. Vivet, E. Nowak, IEEE Int. Solid-State Circuits Conf., IEEE, San Francisco, CA 2019, pp. 226–228.
[138] A. Krizhevsky, G. Hinton, Learning Multiple Layers of Features from Tiny Images, Vol. 1, No. 4, Technical Report, University of Toronto, Toronto 2009.
[139] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, Int. J. Comput. Vis. 2015, 115, 211.