0 Up votes0 Down votes

59 views13 pagesLearning in Multilayer Neural Networks (MNNs)
relies on continuous updating of large matrices of synaptic
weights by local rules. Such locality can be exploited for massive
parallelism when implementing MNNs in hardware. However,
these update rules require a multiply and accumulate operation
for each synaptic weight, which is challenging to implement
compactly using CMOS. In this paper, a method for performing
these update operations simultaneously (“incremental outer
products”) using memristor-based arrays is proposed. The
method is based on the fact that, approximately, given a voltage
pulse, the conductivity of a memristor will increment proportionally
to the pulse duration multiplied by the pulse magnitude,
if the increment is sufficiently small. The proposed method
uses a synaptic circuit composed of a small number of components
per synapse: one memristor and two CMOS transistors.
This circuit is expected to consume between 2% to 8% of the
area and static power of previous CMOS-only hardware alternatives.
Such a circuit can compactly implement scalable training
algorithms based on online gradient descent (e.g., Backpropagation).
The utility and robustness of the proposed memristorbased
circuit is demonstrated on standard supervised learning
tasks.

Nov 25, 2015

© © All Rights Reserved

PDF, TXT or read online from Scribd

Learning in Multilayer Neural Networks (MNNs)
relies on continuous updating of large matrices of synaptic
weights by local rules. Such locality can be exploited for massive
parallelism when implementing MNNs in hardware. However,
these update rules require a multiply and accumulate operation
for each synaptic weight, which is challenging to implement
compactly using CMOS. In this paper, a method for performing
these update operations simultaneously (“incremental outer
products”) using memristor-based arrays is proposed. The
method is based on the fact that, approximately, given a voltage
pulse, the conductivity of a memristor will increment proportionally
to the pulse duration multiplied by the pulse magnitude,
if the increment is sufficiently small. The proposed method
uses a synaptic circuit composed of a small number of components
per synapse: one memristor and two CMOS transistors.
This circuit is expected to consume between 2% to 8% of the
area and static power of previous CMOS-only hardware alternatives.
Such a circuit can compactly implement scalable training
algorithms based on online gradient descent (e.g., Backpropagation).
The utility and robustness of the proposed memristorbased
circuit is demonstrated on standard supervised learning
tasks.

© All Rights Reserved

59 views

Learning in Multilayer Neural Networks (MNNs)
relies on continuous updating of large matrices of synaptic
weights by local rules. Such locality can be exploited for massive
parallelism when implementing MNNs in hardware. However,
these update rules require a multiply and accumulate operation
for each synaptic weight, which is challenging to implement
compactly using CMOS. In this paper, a method for performing
these update operations simultaneously (“incremental outer
products”) using memristor-based arrays is proposed. The
method is based on the fact that, approximately, given a voltage
pulse, the conductivity of a memristor will increment proportionally
to the pulse duration multiplied by the pulse magnitude,
if the increment is sufficiently small. The proposed method
uses a synaptic circuit composed of a small number of components
per synapse: one memristor and two CMOS transistors.
This circuit is expected to consume between 2% to 8% of the
area and static power of previous CMOS-only hardware alternatives.
Such a circuit can compactly implement scalable training
algorithms based on online gradient descent (e.g., Backpropagation).
The utility and robustness of the proposed memristorbased
circuit is demonstrated on standard supervised learning
tasks.

© All Rights Reserved

- Song Abbott Neuron 2001
- 3-D Memristor Chip Debuts
- 00343548
- Classification of MRI Brain Images Using GLCM, Neural Network, Fuzzy Logic & Genetic Algorithm
- CS231n Convolutional Neural Networks for Visual Recognition Part I.
- Evanna i
- NeuralNetworks
- NeuralNetworks-Assign1
- Scope for Artificial Neural Network in Textiles
- Machine Hints
- Study of Deep Learning Algorithms for Automatic License Plate Recognition (ALPR)
- sensors-16-01429 (1)
- Local Training for Radial Basis Function Networks_ Towards Solving the Hidden Unit Problem
- A Directional Feature with Energy based Offline Signature Verification Network
- Lab Work
- 09 Artificial Neural Networks and Classification
- Dean I. Radin- Searching for "Signatures" in Anomalous Human-Machine Interaction Data: A Neural Network Approach
- art-3A10.1007-2Fs11269-013-0506-x
- Feature Recognition
- Bangla Optical Character Recognition

You are on page 1of 13

with online gradient descent training

Daniel Soudry, Dotan Di Castro, Asaf Gal, Avinoam Kolodny, and Shahar Kvatinsky

AbstractLearning in Multilayer Neural Networks (MNNs)

relies on continuous updating of large matrices of synaptic

weights by local rules. Such locality can be exploited for massive

parallelism when implementing MNNs in hardware. However,

these update rules require a multiply and accumulate operation for each synaptic weight, which is challenging to implement

compactly using CMOS. In this paper, a method for performing these update operations simultaneously (incremental outer

products) using memristor-based arrays is proposed. The

method is based on the fact that, approximately, given a voltage pulse, the conductivity of a memristor will increment proportionally to the pulse duration multiplied by the pulse magnitude, if the increment is sufficiently small. The proposed method

uses a synaptic circuit composed of a small number of components per synapse: one memristor and two CMOS transistors.

This circuit is expected to consume between 2% to 8% of the

area and static power of previous CMOS-only hardware alternatives. Such a circuit can compactly implement scalable training algorithms based on online gradient descent (e.g., Backpropagation). The utility and robustness of the proposed memristorbased circuit is demonstrated on standard supervised learning

tasks.

Stochastic Gradient Descent, Backpropagation,

Synapse, Hardware, Multilayer neural networks.

I. I NTRODUCTION

been recently incorporated into numerous commercial products and services such as mobile devices

and cloud computing. For realistic large scale learning tasks MNNs can perform impressively well and

produce state-of-the-art results, when massive computational power is available (e.g., [13], and see [4] for

related press). However, such computational intensity

limits their usability, due to area and power requirements. New dedicated hardware design approach must

therefore be developed to overcome these limitations.

In fact, it was recently suggested that such new types

of specialized hardware are essential for real progress

towards building intelligent machines [5].

MNNs utilize large matrices of values termed

synaptic weights. These matrices are continuously updated during the operation of the system, and are conD. Soudry is with the Department of Statistics, Center for Theoretical Neuroscience, Columbia University, New York, NY 10027,

USA (email: daniel.soudry@gmail.com)

D. Di Castro, A. Gal, A. Kolodny and S. Kvatinsky are with the

Department of Electrical Engineering, Technion - Israel institute of

Technology, Haifa 32000, Israel

MNNs mainly stems from the learning rules used for

updating the weights. These rules are usually local, in

the sense that they depend only on information available at the site of the synapse. A canonical example

for a local learning rule is the Backpropagation algorithm, an efficient implementation of online gradient

descent, which is commonly used to train MNNs [6].

The locality of Backpropgation stems from the chain

rule used to calculate the gradients. Similar locality

appears in many other learning rules used to train neural networks and various Machine Learning (ML) algorithms.

Implementing ML algorithms such as Backpropagation on conventional general-purpose digital hardware

(i.e., von Neumann architecture) is highly inefficient.

A prime reason for this is the physical separation between the memory arrays used to store the values of

the synaptic weights and the arithmetic module used to

compute the update rules. General-purpose architecture actually eliminates the advantage of these learning rules - their locality. This locality allows highly

efficient parallel computation, as demonstrated by biological brains.

To overcome the inefficiency of general-purpose

hardware, numerous dedicated hardware designs,

based on CMOS technology, have been proposed in

the past two decades [7, and references therein]. These

designs perform online learning tasks in MNNs using massively parallel synaptic arrays, where each

synapse stores a synaptic weight and updates it locally.

However, so far, these designs are not commonly used

for practical large scale applications, and it is not clear

whether they could be scaled up, since each synapse

requires too much power and area (see section VII).

This issue of scalability possibly casts doubt on the

entire field [8].

Recently, it has been suggested [9, 10] that scalable

hardware implementations of neural networks may become possible if a novel device, the memristor [11

13], is used. A memristor is a resistor with a varying, history-dependent resistance. It is a passive analog device with activation-dependent dynamics, which

makes it ideal for registering and updating of synaptic weights. Furthermore, its relatively small size en-

ables integration of memory with the computing circuit [14] and allows a compact and efficient architecture for learning algorithms. Specifically, it was recently shown [15] that memristors could be used to

implement MNNs trained by backpropgation in a chipin-a-loop setting (where the weight update calculation

is performed using a host computer). However, it remained an open question whether the learning itself,

which is a major computational bottleneck, could also

be done online (completely implemented using hardware) with massively parallel memristor arrays.

To date, no circuit has been suggested to utilize

memristors for implementing scalable learning systems. Most previous implementations of learning rules

using memristor arrays have been limited to spiking

neurons and focused on Spike Time Dependent Plasticity (STDP) [1623]. Applications of STDP are

usually aimed to explain biological neuroscience results. At this point, however, it is not clear how useful is STDP algorithmically. For example, the convergence of STDP-based learning is not guaranteed

for general inputs [24]. Other learning systems [25

27] that rely on memristor arrays, are limited to single

layers with a few inputs per neuron. To the best of

the authors knowledge, the learning rules used in the

above memristor-based designs are not yet competitive and have not been used for large scale problems

- which is precisely where dedicated hardware is required. In contrast, online gradient descent is considered to be very effective in large scale problems [28].

Also, there are guarantees that this optimization procedure converges to some local minimum. Finally,

it can achieve state-of-the-art-results when executed

with massive computational power (e.g., [3, 29]).

The main challenge for general synaptic array circuit design arises from the nature of learning rules:

practically all of them contain a multiplicative term

[30], which is hard to implement in compact and scalable hardware. In this paper, a novel and general

scheme to design hardware for online gradient descent

learning rules is presented. The proposed scheme

uses a memristor as a memory element to store the

weight and temporal encoding as a mechanism to perform a multiplication operation. The proposed design

uses a single memristor and two CMOS transistors per

synapse, and requires therefore 2% to 8% of the area

and static power of previously proposed CMOS-only

circuits. The functionality of a circuit utilizing the

memristive synapse array is demonstrated numerically

on standard supervised learning tasks. On all datasets

the circuit performs as well as the software algorithm.

Introducing noise levels of about 10% and parame-

performance of the circuit, due to the inherent robustness of online gradient descent. The proposed design

may therefore allow the use of specialized hardware

for MNNs, as well as other ML algorithms, rather than

the currently used general-purpose architecture.

The remainder of the paper is organized as follows.

In section II, basic background on memristors and online gradient descent learning is given. In section III,

the proposed circuit is described, for efficient implementation of Single layer Neural Networks (SNNs) in

hardware. In section IV a modification of the proposed

circuit is used to implement a general MNN. In section V the sources of noise and variation in the circuit are estimated. In section VI the circuit operation

and learning capabilities are evaluated numerically,

demonstrating it can be used to implement MNNs

trainable with online gradient descent. In section VII

the novelty of the proposed circuit is discussed, as well

as possible generalizations; and in section VIII the paper is summarized. The supplementary material [31]

includes detailed circuit schematics, code, and an appendix with additional technicalities.

II. P RELIMINARIES

For convenience, basic background information on

memristors and online gradient descent learning is

given in this section. For simplicity, the second part

is focused on a simple example of the Adaline algorithm - a linear SNN trained using Mean Square Error

(MSE).

A. The memristor

A memristor [11, 12], originally proposed by Chua

in 1971 as the missing fourth fundamental element,

is a passive electrical element. Memristors are basically resistors with varying resistance, where their resistance changes according to time integral of the current through the device, or alternatively, the integrated

voltage upon the device. In the classical representation the conductance of a memristor G depends directly on the integral over time of the voltage upon the

device, sometimes referred to as Flux. Formally, a

memristor obeys the following equations

i (t) = G (s (t)) v (t) ,

s (t) = v (t) .

(1)

(2)

A generalization of this model , which is called a memristive system [32], was proposed in 1976 by Chua and

Kang. In memristive devices, s is a general state variable, rather than an integral of the voltage. Memristive

SOUDRY ET AL: MEMRISTOR-BASED MULTILAYER NEURAL NETWORKS WITH ONLINE GRADIENT DESCENT TRAINING3

sections it is assumed that the variations in the value of

s (t) are restricted to be small, so that G (s (t)) can be

linearized around some point s and the conductivity

of the memristor is given, to first order, by

As explained in the introduction, a reasonable iterative algorithm for minimizing objective (7) (i.e., updating W, where initially W is arbitrarily chosen) is the

following online gradient descent (also called stochastic Gradient Descent ) iteration

G (s (t)) = g + g s (t) ,

2

1

W(k) = W(k1) W
y(k)
,

2

(3)

B. Online gradient descent learning

The field of Machine Learning (ML) is dedicated to

the construction and study of systems that can learn

from data. For example, consider the following supervised learning task. Assume a learning system that operates on K discrete presentations of inputs (trials),

indexed by k = 1, 2, . . . , K. For brevity, the indexing

of iteration number is sometimes suppressed, where it

is clear from the context. On each trial k, the system

receives empirical data, a pair of two real column vectors of sizes M and N : a pattern x(k) RM and a

desired label d(k) RN , with all pairs

sharing the

same desired relation d(k) = f x(k) . The objective

of the system is to estimate (learn) the function f ()

using the empirical data.

As a simple example, suppose W is a tunable N

M matrix of parameters, and consider the estimator

r(k) = W(k) x(k) ,

(4)

or

rn(k) =

(k) (k)

Wnm

xm .

(5)

should aim to predict the right desired labels d =

f (x) for new unseen patterns x. To solve this problem, W is tuned to minimize some measure of error

between the estimated and desired labels, over a K0 long subset of the empirical data, called the training

set (for which k = 1, ..., K0 ). For example, if we

define the error vector

y(k) , d(k) r(k) ,

(6)

MSE ,

K0

X

(k)
2

y
.

(7)

k=1

tested over a different subset, called the test set

(k = K0 + 1, ..., K).

(8)

convenience, is the learning rate, a (usually small)

positive constant, and at each iteration k a single empirical sample is chosen randomly and presented at the

input of the system. The gradient can be calculated by

differentiation using the chain rule, (6) and (4). Defin>

ing W(k) , W(k) W(k1) , and () to be the

transpose operation, we obtain the outer product

>

W(k) = y(k) x(k)

,

(9)

or

(k)

(k1)

(k)

Wnm

= Wnm

+ x(k)

m yn .

(10)

algorithm [33], used in adaptive signal processing and

control [34]. The parameters of more complicated estimators can also be similarly tuned (trained), using

online gradient descent or similar methods. Specifically, MNNs (section (IV)) are commonly being

trained using Backpropgation, which is an efficient

form of online gradient descent [6]. Importantly, note

that the update rule in (10) is local- i.e., the change in

(k)

only

on therelated

the synaptic weight Wnm depends

(k)

(k)

components of input xm and error yn . This

local update, which ubiquitously appears in neural networks training (e.g., Backpropagation, and the Perceptron learning rule [35]) and other ML algorithms (e.g.,

[3638]), enables a massively parallel hardware design, as explained in section III.

Such massively parallel designs are needed, since

for large N and M , learning systems usually become

computationally prohibitive in both time and memory space. For example, in the simple Adaline algorithm the main computational burden, in each iteration, comes from (5) and (10), where the number

of operations (addition and multiplication) is of order O (M N ). Commonly, these steps have become

the main computational bottleneck in executing MNNs

(and related ML algorithms) in software. Other algorithmic steps, such as (6) here, are usually linear in

either M or N , and therefore have a negligible computational complexity.

column

input

interface

row

input

interface

row

input

interface

Synaptic Grid circuit executing (4) and (9), which are the

main computational bottlenecks in the algorithm.

Next, dedicated analog hardware for implementing

online gradient descent learning is described. For simplicity, this section is focused on simple linear SNNs

trained using Adaline, as described in section II. Later,

in section IV, the circuit is modified to implement

general MNNs, trained using Backpropagation. The

derivations are rigorously done using a single controlled approximation (13), and no heuristics (which

can damage performance) - thus creating a precisely

defined mapping between a mathematical learning

system and a hardware learning system. Readers who

do not wish to go through the detailed derivations, can

get the general idea from the following overview (section A) together with Figs. 1-4.

A. Circuit overview

To implement these learning systems, a grid of artificial synapses is constructed, where each synapse

stores a single synaptic weight Wnm . The grid is a

large N M array of synapses, where the synapses

operate simultaneously, each performing a simple local operation. This synaptic grid circuit carries the

main computational load in ML algorithms by implementing the two computational bottlenecks, (5) and

(10), in a massively parallel way. This matrixvector

product in (5) is done using a resistive grid (of memristors), implementing multiplication through Ohms

law and addition through current summation. The

vectorvector outer product in (10) is done by using

the fact that, given a voltage pulse, the conductivity of

a memristor will increment proportionally to the pulse

duration multiplied by the pulse magnitude. Using this

method, multiplication requires only two transistors

per synapse. Thus, together with a negligible amount

of O (M + N ) additional operations, these arrays can

be used to construct efficient learning systems.

column

input

interface

row

output

interface

row

output

interface

Every (n, m) node in the grid is a memristor-based synapse

that receives voltage input from the shared um , u

m and the

en lines and outputs Inm current on the on lines. These output lines receive total current arriving from all the synapses

on the n-th row, and are grounded.

Similarly to the Adaline algorithm described in section II, the circuit operates on discrete presentations of

inputs (trials) - see Fig. 1. On each trial k, the cirM

cuit receives an input vector x(k) [A, A] and an

N

error vector y(k) [A, A] (where A is a bound on

both the input and error) and produces a result output

vector r(k) RN , which depends on the input by the

relation (4), where the matrix W(k) RN M , called

the synaptic weight matrix, is stored in the system.

Additionally, on each step, the circuit updates W(k)

according to (9). This circuit can be used to implement

ML algorithms. For example, as depicted in Fig. 1 the

simple Adaline algorithm can be implemented using

the circuit, with training enabled on k = 1, . . . , K0 .

The implementation of the Backpropgation algorithm,

is shown in section IV.

B. The circuit architecture

B.1 The synaptic grid

The synaptic grid system described in Fig. 1 is implemented by the circuit shown in 2, where components of all vectors are shown as individual signals.

Each gray cell in Fig. 2 is a synaptic circuit (artificial synapse) using a memristor (described in Fig.

3a). The synapses are arranged in a two dimensional

N M grid array as shown in Fig. 2, where each

synapse is indexed by (n, m), with m {1..M } and

n {1..N }. Each (n, m) synapse receives two inputs um , um , an enable signal en , and produces an

output current Inm . Each column of synapses in the

SOUDRY ET AL: MEMRISTOR-BASED MULTILAYER NEURAL NETWORKS WITH ONLINE GRADIENT DESCENT TRAINING5

um and um , both connected to a column input interface. The voltage signals um and um (m) are generated by the column input interfaces from the components of the input signal x, upon presentation. Each

row of synapses in the array (the n-th row) shares the

horizontal enable line en and output line on , where

en is connected to a row input interface and on is

connected to a row output interface. The voltage

(pulse) signal on the enable line en (n) is generated

by the row input interfaces from the error signal y,

upon presentation. The row output interfaces keep the

o lines grounded (V = 0) and convert the total current

from

P all the synapses in the row going to the ground,

m Inm , into the output signal r.

B.2 The artificial synapse

The proposed memristive synapse is composed of

a single memristor, connected to a shared terminal

of two MOSFET transistors (P-type and N-type), as

shown schematically in Fig. 3a (without the n, m indices). These terminals act as drain or source, interchangeably, depending on the input, similarly to the

CMOS transistors in transmission gates. Recall the

memristor dynamics are given by (1-3), with s (t) being the state variable of the memristor and G (s (t)) =

g + gs (t) its conductivity. Also, the current of the

N-type transistor in the linear region is, ideally,

1 2

,

(11)

I = K (VGS VT ) VDS VDS

2

where VGS is the gate-source voltage, VT is the voltage threshold, VDS is the drain-source voltage, and K

is the conduction parameter of the transistors. The current equation for the P-type transistor is symmetrical

to (11), and for simplicity we assume that K and |VT |

are equal for both transistors.

The synapse receives three voltage input signals: u

and u

= u are connected, respectively, to a terminal

of the N-type and P-type transistors and an enable signal e is connected to the gate of both transistors. The

enable signal can have a value of 0, VDD , or VDD

(with VDD > |VT |) and have a pulse shape of varying duration, as explained below. The output of the

synapse is a current I to the grounded line o. The

magnitude of the input signal u (t) and the circuit parameters are set so they fulfill the following

1. We assume (somewhat unusually) that

|u (t)| < |VT |

(12)

transistors are non-conducting (in the cut-off region).

2. We assume that

K (VDD |u (t)| |VT |) G (s (t))

(13)

both transistors have relatively high conductivity

as compared to the conductivity of the memristor.

To satisfy (12) and (13), the proper value of u(t) is

chosen. Note (13) is a reasonable assumption, as

shown in [39]. If not (e.g., if the memristor conductivity is very high), instead one can use an alternative

design, as described in appendix B [31]. Under these

assumptions, when e = 0 then I = 0 in the output and

the voltage across the memristor is zero. In this case,

the state variable does not change. If e = VDD , from

(13), the voltage on the memristor is approximately

u . When e = VDD the N-type transistor is conducting in the linear region while the P-type transistor is

cut off. When e = VDD the P-type transistor is conducting in the linear region while the N-type transistor

is cut off.

C. Circuit operation

The operation of the circuit in each trial (a single

presentation of a specific input) is composed of two

phases. First, in the computing phase (read), the

output current from all the synapses is summed and

adjusted to produce an arithmetic operation r = Wx

from (4). Second, in the updating phase (write), the

synaptic weights are incremented according to the update rule W = yx> from (9). In the proposed design, for each synapse, the synaptic weight, Wnm , is

stored using snm , the memristor state variable of the

(n, m) synapse. The parallel read and write operations are achieved by applying simultaneous voltage signals on the inputs um and enable signals en

(n, m). The signals and their effect on the state variable are shown in Fig. 3b.

C.1 Computation phase (read)

During each read phase, a vector x is given, and

component-wise by the column

encoded in u and u

input interfaces, for a duration of Trd , m : um (t) =

axm =

um (t), where a is a positive constant converting xm , a unit-less number, to voltage. Recall that

A is the maximal value of |xm |, so we require that

aA < |VT |, as required in (12). Additionally, the row

input interfaces produce voltage signal on the en lines,

n :

(

VDD

, if 0 t < 0.5Trd

en (t) =

. (14)

VDD , if 0.5Trd t Trd

to a unit-less number rn , and

X

oref = a

g

xm .

(19)

m

Defining

Wnm = ac

g snm ,

(20)

r = Wx ,

(21)

we obtain

as desired.

C.2 Update phase (write)

memristive synapse (without the n, m indices). The synapse

receives input voltages u and u

= u, an enable signal e,

and output current I. (b) The read and write protocols incoming signals in a single synapse and the increments in the

synaptic weight s as determined by (24). T = Twr + Trd .

is therefore, n, m :

0.5Trd

snm =

Trd

(axm ) dt+

0

(axm ) dt = 0 ,

0.5Trd

(15)

The zero net change in the value of snm between

the times of 0 and Trd implements a non-destructive

read, as common in many memory devices. To minimize inaccuracies due to changes in the conductance

of the memristor during the read phase, the row output interface samples the output current at time zero.

Using (3), the output current of the synapse to the on

line at the time is thus

Inm = a(

g + gsnm )xm .

(16)

equals to the sum of the individual currents produced

by the synapses driving that line, i.e.,

X

X

on =

Inm = a

(

g + gsnm ) xm . (17)

m

maintain their values from the read phase, while the

signal e changes. In this phase the row input interfaces

encode e component-wise, n : w

(

sign (yn ) VDD , if 0 t Trd b |yn |

en (t) =

,

0

, if b |yn | < t Trd < Twr

(22)

The interpretation of (22) is that en is a pulse with

magnitude VDD , the same sign as yn , and a duration

b |yn | (where b is a constant converting yn , a unit-less

number, to time units). Recall that A is the maximal

value of |yn |, so we require that Twr > bA. The total

change in the internal state is therefore

on , and outputs

rn = c (on oref )

(18)

Trd +b|yn |

snm =

Trd

= abxm yn

(24)

weights is therefore obtained

W = yx> ,

(25)

where = a2 bc

g.

IV. A MULTILAYER N EURAL NETWORK CIRCUIT

So far, the circuit operation was exemplified on a

SNN with the simple Adaline algorithm. In this section it is explained how, with a few adjustments, the

proposed circuit can be used to implement Backpropgation on a general MNN.

Recall the context of the supervised learning

setup detailed in section II. Consider an estimator of

the form

r = W2 (W1 x) ,

with W1 and W2 being the two parameter matrices.

Such a double layer MNN can approximate any target function with arbitrary precision [40]. Denote by

SOUDRY ET AL: MEMRISTOR-BASED MULTILAYER NEURAL NETWORKS WITH ONLINE GRADIENT DESCENT TRAINING7

Fig. 4 Implementing the Backpropagation for a MNN with MSE error, using a modified version of the original circuit. The function

boxes denote the operation of either () or 0 () on the input, and the box denotes a component-wise product. For detailed

circuit schematics, see [31].

layer. Suppose we again use MSE to quantify the error

between d and r. In the Backpropagation algorithm,

each update of the parameter matrices is given by an

online gradient decent step, from (8),

>

W1 = y1 x>

1 ; W2 = y2 x2 ,

with x2 , (r

1 ) , y2 , d r2 , x1 , x and

y1 , W2> y2 0 (r1 ), where here a b ,

(a1 b1 , . . . , aM bM ), a component-wise product. Implementing such an algorithm requires a minor modification of the proposed circuit - it should have an additional inverted output , W> y. Once this modification is made to the circuit, by cascading such circuits, it is straightforward to implement the Backpropagation algorithm for two layers or more, as shown in

Fig. 4.

The additional output = W> y can be generated

by the circuit in an additional read phase with duration Trd , between the original read and write

phases, in which the original role of the input and output lines are inverted. In this phase the NMOS transistor is on, i.e., n, en = VDD , and the former output

on lines are given the following voltage signal (again,

used for a non-destructive read)

(

ayn

, if Trd t < 1.5Trd

on =

.

ayn , if 1.5Trd t 2Trd

The Inm current now exits through the (original input) um terminal, and the sum of all the currents is

measured at time Trd by the column interface (before

it goes into ground)

X

X

um =

Inm = a

(

g + gsnm ) yn . (26)

n

m = c (um uref ) ,

(27)

where uref = a

g

m =

Wnm yn ,

as required.

Lastly, note that it is straightforward to use a different error function instead of MSE. For example, in the

2

kd rk .

two-layer example, recall that y2 = 12 r

If a different error function E (r, d) is required, one

can re-define

y2 ,

E (r, d)

r

(28)

learning rate. To implement this change in the MNN

circuit (Fig. 4), the subtractor should be simply replaced with some other module.

V. S OURCES OF NOISE AND VARIABILITY

Usually, analog computation suffers from reduced

robustness to noise as compared to digital computation

[41]. ML algorithms are, however, inherently robust

to noise, which is a key element in the set of problems

they are designed to solve. This suggests that the effects of intrinsic noise on the performance of the analog circuit are relatively small. These effects largely

depend on the specific circuit implementation (e.g., the

CMOS process). Particularly, memristor technology

is not mature yet and memristors have not been fully

characterized. Therefore, to check the robustness of

the circuit, crude estimation of the magnitude of noise

and variability has been used in this section. This estimation is based on known sources of noise and variability, which are less dependent on specific implementation. In section VI, the alleged robustness of the

circuit would be evaluated by simulating the circuit in

the presence of these noise and variation sources.

A. Noise

When one of the transistors is enabled (e (t) =

VDD ), then the current is affected by intrinsic thermal noise sources in the transistors and memristor of

each synapse. Current fluctuations on a device due to

thermal origin can be approximated by a white noise

signal I (t) with zero mean (hI (t)i = 0) and autocorrelation hI (t) I (t0 )i = 2 (t t0 ) where ()

is Diracs delta function and 2 = 2k Tg (where k

is Boltzmann constant, T is the temperature, and g

is the conductance of the device). For 65 nm transistors (parameters taken from IBMs 10LPe/10RFe

process [42]) the characteristic conductivity is g1

104 1 , and therefore for I1 (t), the thermal current source of the transistors are 12 1024 A2 sec.

Assume that the memristor characteristic conductivity g2 = g1 , so for the thermal current source of the

memristor, 22 = 12 . Note that from (13) we have

1, the resistance of the transistor is much smaller

than that of the memristor. The total voltage on the

memristor is thus

g1

1

u (t) + (g1 + g2 ) (I1 (t) I2 (t))

VM (t) =

g1 + g2

1

u (t) + (t)

=

1+

where (t) = g11 (I1 (t) I2 (t)) and h (t) (t0 )i

2 (t t0 ) with 2 2k Tg11 1016 V2 sec. The

equivalent circuit, including sources of noise, is shown

in Fig. (5). Assuming the circuit minimal trial duration is T = 10nsec, the maximum root mean square

error due to thermal noise would be about ET

T 1/2 104 V .

Noise in the inputs u, u

and e also exists. According

to [43], the relative noise in the power supply of the

u/

u inputs is approximately 10% in the worst case.

Applying u = ax, effectively gives an input of ax +

ET + Eu , where |Eu | 0.1a |x|. The absolute noise

min

level in duration of e should be smaller than Tclk

2

1010 sec, assuming a digital implementation of pulsemin

width modulation with Tclk

being the shortest clock

cycle currently available. On every write cycle e =

VDD is therefore applied for a duration of b |y| + Ee

(instead of b |y|), where |Ee | < Tclk .

B. Parameter variability

A common estimation of the variability in memristor parameters is a Coefficient of Variation (CV =

(standard deviation)/mean) of a few percent [44]. In

this paper, the circuit is also evaluated with considerably larger variability (CV 30%), in addition to

the noise sources as described in section A. This is

Fig. 5 Noise model for artificial synapse. (a) During operation, only one transistor is conducting (assume it is the N-type

transistor). (b) Thermal noise in a small signal model, the

transistor is converted to a resistor (g1 ) in parallel to current

source (I1 ), the memristor is converted to a resistor (g2 ) in

parallel to current source (I2 ), and the effects of the sources

are summed linearly.

done by assuming that the variability in the parameters of the memristors are random variables independently sampled from a uniform distribution g

U [0.5

g , 1.5

g ]. When running the algorithm in software, these variations are equivalent to corresponding

changes in the synaptic weights W or the learning rate

in the algorithm. Note that variability in the transistor parameters is not considered, since these can affect the circuit operation only if (12) or (13) are invalidated. This can happen, however, only if the values

of K or VT vary in orders of magnitude, which is unlikely.

VI. C IRCUIT E VALUATION

In this section, the proposed circuit is implemented

(section A) in a simulated physical model. Then (section B), using standard supervised learning datasets,

the implementation of SNNs and MNNs using the proposed circuit is demonstrated, including its robustness

to noise and variation.

A. Basic functionality

First, the basic operation of the proposed design is

examined numerically. A small 2 2 synaptic grid

circuit is simulated for time 10T with simple inputs

(x1 , x2 ) = (1, 0.5) sign (t 5T)

(y1 , y2 ) = (1, 0.5) 0.1 .

(29)

(30)

in SPICE (appendix D [31]) and SimElectronics [45]

(Fig. 6). Both are software tools which enable physical circuit simulation of memristors and MOSFET devices. In these simulations the correct basic operation of the circuit is verified. For example, notice that

the conductance of the (n, m) memristor indeed incre-

SOUDRY ET AL: MEMRISTOR-BASED MULTILAYER NEURAL NETWORKS WITH ONLINE GRADIENT DESCENT TRAINING9

Type

Parameter

Power source

VDD

10 V

5 AV2

NMOS

PMOS

VT

1.7 V

5 AV2

VT

1.4 V

Memristor

Timing

Value

1 S

180 S/ (V sec)

0.1 sec

Twr

0.21 T

0.1 V

Twr

c1

10 nA

Scaling

Parameter

Fig. 7a

Fig. 7b

Network

30 1

443

#Training samples

284

75

#Test samples

285

75

0.1

0.1

to the inputs from the previous layer, each neuron has constant

one input (a bias).

24.

Implementation details. The SPICE model is

described in appendix D. Next, the SimElectronics

synaptic grid circuit model (which was used to construct model hardware MNNs in the section B) is described. The circuit parameters appear in Table I, with

the parameters of the (ideal) transistors kept at their

defaults. The memristor model is implemented using

(1-3) with parameters taken from experimental data

[46, Fig. 2]. As depicted schematically in Fig. 2, the

circuit implementation consists of a synapse-grid and

the interface blocks. The interface units were implemented using a few standard CMOS and logic components, as can be see in the detailed schematics [31]

(HTML, open using Firefox). The circuit operated

synchronously, using global control signals, supplied

externally. The input and output to the circuit were

kept constant in each trial using sample and hold

units.

B. Learning performance

The synaptic grid circuit model is used to implement a SNN and a MNN, trainable by the online gradient descent algorithm. In order to demonstrate that the

algorithm is indeed implemented correctly by the pro-

posed circuit, the circuit performance has been compared to an algorithmic implementation of the MNNs

in (Matlab) software. Two standard tasks are used:

1. The Wisconsin Breast Cancer diagnosis task [47]

(linearly separable).

2. The iris classification task [47] (not linearly separable).

The first task was evaluated using an SNN circuit, similarly to Fig. 1. The second task is evaluated on a 2layer MNN circuit, similarly to Fig. 4. Fig. 7 shows

the training phase performance of

1. The algorithm.

2. The proposed circuit (implementing the algorithm).

3. The circuit, with about 10% noise and 30% variability (section V).

Also, Table III shows the generalization errors after

training was over. On each of the two tasks, the performance of the circuit and the algorithm similarly improves, finally reaching a similar generalization error.

These results indicate that the proposed circuit design

can precisely implement MNNs trainable with online

gradient descent, as expected, from the derivations in

section III. Moreover, the circuit exhibits considerable

robustness, as its performance is only mildly affected

by significant levels of noise and variation.

Implementation details. The SNN and the MNN

were implemented using the (SimElectronics) synaptic grid circuit model which was described in section

A. The detailed schematics of the 2-layer MNN model

are again given in [31] (HTML, open using Firefox),

and also the code. All simulations are done using a

desktop computer with core i7 930 processor, a Windows 8.1 operating system, and a Matlab 2013b environment. Each circuit simulation takes about one day

to complete (note the software implementation of the

circuit does not exploit the parallel operation of the

circuit hardware). For each k sample from the dataset,

the attributes are converted to a M -long vector x(k) ,

and each label is converted to a N -long binary (1)

vector d(k) . The training set is repeatedly presented,

with samples at random order. The inputs are centralized and normalized as recommended by [6], with

additional rescaling made done in dataset 1, to insure

that the inputs fall within the circuit specifications 12.

The performance was averaged over 10 repetitions and

the training error was also averaged over 20 samples.

Cross entropy error was used instead of MSE, since it

reduces classification error [48]. The initial weights

were set as sampled independently from a symmetric uniform distribution. The neuronal activation functions were set as (x) = 1.7159 tanh (2x/3), as rec-

10

r1

r2

r1

r2

0.4

0.6

5

1

0.8

0.5

0

0.2

0.4

t [sec]

0.4

0.6

gs11 [S]

V12 [mV]

V11 [mV]

0.5

0.2

0.5

1

0.8

0.2

50

0.1

50

100

0

0.1

0.2

0.4

0.8

50

0.1

0.6

50

0.2

1

0.8

V22 [mV]

0.1

gs21 [S]

V21 [mV]

0.2

50

0.4

0.6

0.2

1

t [sec]

100

0.2

5

1

100

t [sec]

100

0

0.8

t [sec]

100

100

0

0.6

gs12 [S]

0.2

0.1

50

0

0.2

t [sec]

0.4

0.6

0.8

gs22 [S]

1

0

0.5

x2

x1

0.1

1

t [sec]

Training Error

Fig. 6 Synaptic 2 2 grid circuit simulation, during ten operation cycles. Top: Circuit Inputs (x1 , x2 ) and result outputs (r1 , r2 ).

Middle and bottom: voltage (solid black) and conductance (dashed red) change for each memristor in the grid. Notice that the

conductance of the (n, m) memristor indeed changes proportionally to xm yn , following 24 . Simulation was done using SimElectronics, with inputs as in (29-30), and circuit parameters as in Table I. Similar results were obtained using SPICE with linear ion drift

memristor model and CMOS 0.18m process, as shown in appendix D [31].

Fig. 7a

0.8

Algorithm

Circuit

Noisy+variable circuit

0.6

0.4

0.2

0

0

200

400

600

800

1000

1200

Training Error

samples

0.8

Algorithm

Circuit

Noisy+variable circuit

0.6

0.4

(a)

(b)

(c)

Fig. 7b

1.3% 0.7%

2.9% 2%

1.5% 0.7%

2.8% 1.9%

1.5% 0.7%

4.7% 2.4%

(a) Algorithm (b) Circuit (c) Circuit with noise and

variability.

The standard deviation was calculated as

p

Pe (1 Pe ) /Ntest , with Ntest being the number of samples in the test set and Pe being the generalization error.

0.2

0

0

200

400

600

800

1000

1200

samples

Fig. 7 Circuit evaluation - training error for two datasets: (a)

Wisconsin breast cancer diagnosis (b) Iris classification.

VII. D ISCUSSION

As explained in section II and IV, two major computational bottlenecks of Multilayer Neurnal Networks

(MNNs), and many Machine Learning (ML) algorithms, are given by matrixvector product operation (4) and a vectorvector outer product operation (9). Both are of order O (M N ), where M

and N are the sizes of the input and output vectors.

In this paper, the proposed circuit is designed specifically to deal with these bottlenecks using memristor

arrays. This design has a relatively small number of

components in each array element - one memristor

and two transistors. The physical grid-like structure

of the arrays implements the matrixvector product operation (4) using analog summation of currents,

while the memristor dynamics enable us to perform

the vectorvector outer product operation (10), using timevoltage encoding paradigm. The idea to

use a resistive grid to perform matrixvector product operation is not new (e.g., [49]). The main novelty of this paper is the use of memristors together

with timevoltage encoding, which allows us to

perform a mathematically accurate vectorvector

outer product operation in the learning rule using a

small number of components.

SOUDRY ET AL: MEMRISTOR-BASED MULTILAYER NEURAL NETWORKS WITH ONLINE GRADIENT DESCENT TRAINING11

TABLE IV Hardware designs of artificial synapses implementing scalable online learning algorithms

Design

#Transistors

Comments

Proposed design

2 (+1 memristor)

[50]

[51]

[52, 53]

39

[54]

52

[55]

92

[56]

83

[57]

150

Weights decay ~ minutes

Weights only

increase (unusable)

Must keep training

As mentioned in the introduction, CMOS hardware

designs that specifically implement online learning algorithms remain an unfulfilled promise at this point.

The main incentive for existing hardware solutions

is the inherent inefficiency in implementing these algorithms in software running on general purpose hardware (e.g., CPUs, DSPs and GPUs). However,

squeezing the required circuit for both the computation

and the update phases (two configurable multipliers

and a memory element) into an array cell has proven to

be a hard task, using currently available CMOS technology. Off-chip or chip-in-the-loop design architectures (e.g., [58], Table 1) have been suggested in many

cases as a way around this design barrier. These designs, however, generally deal with the computational

bottleneck of the matrixvector product operation

in the computation phase, rather than the computational bottleneck of the vectorvector outer product

operation in the update phase. Additionally, these solutions are only useful in cases where the training is

not continuous and is done in a pre-deployment phase

or on special reconfiguration phases during the operation. Other designs implement non-standard (e.g.,

perturbation algorithms [59]) or specifically tailored

learning algorithms (e.g., modified Backpropagation

for spiking neurons [60]). However, it remains to be

seen whether such algorithms are indeed scalable.

Hardware designs of artificial synaptic arrays that

are capable of implementing common (scalable) online gradient-descent based learning, are listed in Table IV. For large arrays (i.e., large M and N ) the effective transistor count per synapse (where resistors and

capacitors were also counted as transistors) is approximately proportional to the required area and average

static power usage of the circuit.

transistors, similarly to our design, but requires the

(rather unusual) use of UV illumination during its operation, and has the disadvantage of having volatile

weights, decaying within minutes. The next device

[51] includes six transistors per synapse, but the update rule can only increase the synaptic weights, which

makes the device unusable for practical purposes. The

next device [53, 61] suggested a grid design using

CMOS Gilbert multipliers, resulting in 39 transistors

per synapse. In a similar design, [54], 52 transistors

are used. Both these devices use capacitive elements

for analog memory and suffer from the limitation of

having volatile weights, vanishing after training has

stopped. They require therefore constant re-training

(practically acting as refresh). Such retraining is required also on each start-up, or, alternatively, reading

out the weights into an auxiliary memory - a solution which requires a mechanism for reading out the

synaptic weights. The larger design in [55] (92 transistors) also has weight decay, but with a slow hourslong timescale. The device in [56] (83 transistors +

an unspecified weight unit which stores the weights)

does not report to have weight decay, apparently since

digital storage is used. This is also true for [57] (150

transistors).

The proposed memristor-based design should resolve these obstacles, and provide a compact, nonvolatile circuit. Area and power consumption are expected to be reduced by a factor between 13 to 50, in

comparison to standard CMOS technology, if a memristor is counted as an additional transistor (although

it is actually more compact). Maybe the most convincing evidence for the limitations of these designs

is the fact that, although most of these designs are

two decades old, they have not been incorporated into

commercial products. It is only fair to mention at

this point, that while our design is purely theoretical

and based on speculative technology, the above reviewed designs are based on mature technology, and

have overcome obstacles all the way to manufacturing.

B. Circuit modification and generalizations

The specific parameters that were used for the circuit evaluation I, are not strictly necessary for proper

execution of the proposed design, and are only used

to demonstrate its applicability. For example, it is

straightforward to show that K, VT and g have little

effect (as long as (12-13) hold), and different values of

g can be adjusted for by rescaling c. This is important

since the feasible range of parameters for the memristive devices is still not well characterized, and it seems

12

to be quite broad. For example, the values of the memristive timescales range from picoseconds [62] to milliseconds [46]. Here the parameters were taken from

a millisecond-timescale memristor [46]. The actual

speed of the proposed circuit would critically depend

on the timescales of commercially available memristor

devices.

Additional important modifications of the proposed

circuit are straightforward. In appendix A it is explained how to modify the circuit to work with more

realistic memristive devices [13, 32], instead of the

classical memristor model [11], given a few conditions. In appendix C, it is shown that it is possible

to reduce the transistor count from two to one, at the

price of doubling the duration of the write phase.

Other useful modifications of circuit are also possible.

For example, the input x may be allowed to receive

different values during the read and write operations. Also, it is straightforward to replace the simple

outer product update rule in (10) by more general update rules of the form

X

(k)

(k1)

Wnm

= Wnm

+

fi yn(k) gj x(k)

,

m

R EFERENCES

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

i,j

to adaptively modify the learning rate during training

(e.g., by modifying in Eq. 28).

[14]

[15]

VIII. C ONCLUSIONS

A novel method to implement scalable online gradient descent learning in multilayer neural networks

through local update rules is proposed, based on

the emerging memristor technology. The proposed

method is based on an artificial synapse using one

memristor, to store the synaptic weight, and two

CMOS transistors to control the circuit. The correctness of the proposed synapse structure exhibits similar accuracy to its equivalent software implementation, while the proposed structure shows high robustness and immunity to noise and parameter variability.

Such circuits may be used to implement large-scale

online learning in multilayer neural networks, as well

as other and other learning systems. The circuit is estimated to be significantly smaller than existing CMOSonly designs, opening the opportunity for massive parallelism with millions of adaptive synapses on a single

integrated circuit, operating with low static power and

good robustness. Such brain-like properties may give

a significant boost to the field of neural networks and

learning systems.

[16]

[17]

[18]

[19]

[20]

[21]

[22]

high-level features using large scale unsupervised learning,

in ICML 12, 2012, pp. 8188.

D. Ciresan, Multi-column deep neural networks for image

classification, in CVPR 12, 2012, pp. 36423649.

J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V.

Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang,

and A. Y. Ng, Large scale distributed deep networks, NIPS,

2012.

R. D. Hof, Deep Learning, MIT Technology Review, 2013.

D. Hernandez, Now You Can Build Googles 1M Artificial

Brain on the Cheap, Wired, 2013.

Y. LeCun, L. Bottou, G. B. Orr, and K. R. Mller, Efficient

backprop, in Neural networks: Tricks of the Trade, 2012.

J. Misra and I. Saha, Artificial neural networks in hardware:

A survey of two decades of progress, Neurocomputing,

vol. 74, no. 1-3, pp. 239255, Dec. 2010.

A. R. Omondi, Neurocomputers: a dead end? International

journal of neural systems, vol. 10, no. 6, pp. 475481, 2000.

M. Versace and B. Chandler, The brain of a new machine,

Spectrum, IEEE, vol. 47, no. 12, pp. 3037, 2010.

M. M. Waldrop, Neuroelectronics: Smart connections.

Nature, vol. 503, no. 7474, pp. 224, Nov. 2013.

L. Chua, Memristor-the missing circuit element, Circuit

Theory, IEEE Transactions on, vol. 18, no. 5, pp. 507519,

1971.

D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams,

The missing memristor found, Nature, vol. 453, no. 7191,

pp. 8083, 2008.

S. Kvatinsky, E. G. Friedman, A. Kolodny, and U. C.

Weiser, TEAM: ThrEshold Adaptive Memristor Model,

IEEE Transactions on Circuits and Systems I: Regular

Papers, vol. 60, no. 1, pp. 211221, 2013.

S. Kvatinsky, Y. H. Nacson, Y. Etsion, E. G. Friedman,

A. Kolodny, and U. C. Weiser, Memristor-based Multithreading, Computer Architecture Letters, vol. PP, no. 99, p. 1,

2013.

S. P. Adhikari, C. Yang, H. Kim, and L. Chua, Memristor

Bridge Synapse-Based Neural Network and Its Learning,

IEEE Transactions on Neural Networks and Learning

Systems., vol. 23, no. 9, pp. 14261435, Sep. 2012.

D. Querlioz and O. Bichler, Simulation of a memristor-based

spiking neural network immune to device variations, Neural

Networks (IJCNN), The 2011 International Joint Conference

on. IEEE, pp. 17751781, 2011.

C. Zamarreo Ramos, L. A. Camuas Mesa, J. A.

Prez-Carrasco, T. Masquelier, T. Serrano-Gotarredona, and

B. Linares-Barranco, On spike-timing-dependent-plasticity,

memristive devices, and building a self-learning visual

cortex, Frontiers in neuroscience, vol. 5, p. 26, Jan. 2011.

O. Kavehei, S. Al-Sarawi, K.-R. Cho, N. Iannella, S.-J. Kim,

K. Eshraghian, and D. Abbott, Memristor-based synaptic

networks and logical operations using in-situ computing,

2011 Seventh International Conference on Intelligent Sensors,

Sensor Networks and Information Processing, pp. 137142,

Dec. 2011.

A. Nere, U. Olcese, D. Balduzzi, and G. Tononi, A

neuromorphic architecture for object recognition and motion

anticipation using burst-STDP. PloS one, vol. 7, no. 5, p.

e36958, Jan. 2012.

Y. Kim, Y. Zhang, and P. Li, A digital neuromorphic

VLSI architecture with memristor crossbar synaptic array for

machine learning, SOC Conference (SOCC), 2012 IEEE . . . ,

pp. 328333, 2012.

W. Chan and J. Lohn, Spike timing dependent plasticity

with memristive synapse in neuromorphic systems, The 2012

International Joint Conference on Neural Networks (IJCNN),

pp. 16, Jun. 2012.

T. Serrano-Gotarredona, T. Masquelier, T. Prodromakis,

SOUDRY ET AL: MEMRISTOR-BASED MULTILAYER NEURAL NETWORKS WITH ONLINE GRADIENT DESCENT TRAINING13

[23]

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

[34]

[35]

[36]

[37]

[38]

[39]

[40]

[41]

[42]

[43]

variations with memristors for spiking neuromorphic learning

systems. Frontiers in neuroscience, vol. 7, no. February, p. 2,

Jan. 2013.

D. Querlioz, O. Bichler, P. Dollfus, and C. Gamrat,

Immunity to Device Variations in a Spiking Neural Network

With Memristive Nanodevices, IEEE Transactions on

Nanotechnology, vol. 12, no. 3, pp. 288295, May 2013.

R. Legenstein, C. Naeger, and W. Maass, What can a

neuron learn with spike-timing-dependent plasticity? Neural

computation, vol. 17, no. 11, pp. 233782, 2005.

D. Chabi and W. Zhao, Robust neural logic block (NLB)

based on memristor crossbar array, Nanoscale Architectures

(NANOARCH), 2011 IEEE/ACM International Symposium

on. IEEE, pp. 137143, Jun. 2011.

H. Manem, J. Rajendran, and G. S. Rose, Stochastic Gradient

Descent Inspired Training Technique for a CMOS/Nano

Memristive Trainable Threshold Gate Array, Circuits and

Systems I: Regular Papers, vol. 59, no. 5, pp. 10511060,

2012.

M. Soltiz, D. Kudithipudi, C. Merkel, G. Rose, and

R. Pino, Memristor-based neural logic blocks for nonlinearly separable functions, Computers, IEEE Transactions

on, vol. 62, no. 8, pp. 15971606, 2013.

L. Bottou and O. Bousquet, The Tradeoffs of Large-Scale

Learning, in Optimization for Machine Learning, 2011, p.

351.

D. Ciresan and U. Meier, Deep, big, simple neural nets for

handwritten digit recognition, Neural Computation, vol. 22,

no. 12, pp. 32073220, 2010.

Z. Vasilkoski, H. Ames, B. Chandler, A. Gorchetchnikov,

J. Leveille, G. Livitz, E. Mingolla, and M. Versace,

Review of stability properties of neural plasticity rules for

implementation on memristive neuromorphic hardware, The

2011 International Joint Conference on Neural Networks, pp.

25632569, Jul. 2011.

See Supplemental Material.

L. Chua, Memristive devices and systems, Proceedings of

the IEEE, vol. 64, no. 2, 1976.

B. Widrow and M. Hoff, Adaptive switching circuits,

STANFORD ELECTRONICS LABS, Stanford, CA, Tech.

Rep., 1960.

B. Widrow and S. Stearns, Adaptive signal processing. Englewood Cliffs, NJ: Prentice-Hall, inc, 1985.

F. Rosenblatt, The perceptron: a probabilistic model

for information storage and organization in the brain.

Psychological review, vol. 65, no. 6, p. 386, 1958.

E. Oja, Simplified neuron model as a principal component

analyzer, Journal of mathematical biology, vol. 15, no. 3, pp.

267273, 1982.

C. M. Bishop, Pattern recognition and machine learning.

Singapore: Springer, 2006.

S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter,

Pegasos: primal estimated sub-gradient solver for SVM,

Mathematical Programming, vol. 127, no. 1, pp. 330, Oct.

2010.

S. Kvatinsky, E. G. Friedman, A. Kolodny, and U. C. Weiser,

Memristor-based Material Implication (IMPLY) Logic: Design Principles and Methodologies, submitted to IEEE Transactions on Very Large Scale Integration (VLSI), 2013.

G. Cybenko, Approximation by superpositions of a sigmoidal

function, Mathematics of Control, Signals, and Systems

(MCSS), vol. 2, pp. 303314, 1989.

R. Sarpeshkar, Analog versus digital: extrapolating from

electronics to neurobiology, Neural Computation, vol. 10,

no. 7, pp. 16011638, 1998.

http://www.mosis.com.

G. Huang, D. C. Sekar, A. Naeemi, K. Shakeri, and J. D.

Meindl, Compact Physical Models for Power Supply Noise

and Chip/Package Co-Design of Gigascale Integration,

[44]

[45]

[46]

[47]

[48]

[49]

[50]

[51]

[52]

[53]

[54]

[55]

[56]

[57]

[58]

[59]

[60]

[61]

[62]

Technology Conference. Ieee, 2007, pp. 16591666.

M. Hu, H. Li, Y. Chen, X. Wang, and R. E. Pino, Geometry

variations analysis of TiO2 thin-film and spintronic memristors, in 16th Asia and South Pacific Design Automation

Conference (ASP-DAC 2011). Ieee, Jan. 2011, pp. 2530.

http://www.mathworks.com/products/simelectronics/.

T. Chang, S.-H. Jo, K.-H. Kim, P. Sheridan, S. Gaba, and

W. Lu, Synaptic behaviors and modeling of a metal oxide

memristive device, Applied Physics A, vol. 102, no. 4, pp.

857863, Feb. 2011.

K. Bache and M. Lichman, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, 2013.

P. Simard, D. Steinkraus, and J. Platt, Best practices for

convolutional neural networks applied to visual document

analysis, Seventh International Conference on Document

Analysis and Recognition, 2003. Proceedings., vol. 1, no.

Icdar, pp. 958963, 2003.

D. B. Strukov and K. K. Likharev, Reconfigurable nanocrossbar architectures, in Nanoelectronics and Information

Technology, 2012, pp. 543562.

G. Cauwenberghs, Analysis and verification of an analog

VLSI incremental outer-product learning system, IEEE

transactions on neural networks, 1992.

H. Card, C. R. Schneider, and W. R. Moore, Hebbian

plasticity in mos synapses, IEEE Transactions on Audio and

Electroacoustics, vol. 138, no. 1, p. 13, 1991.

C. Schneider and H. Card, Analogue CMOS Hebbian

Synapses, Electronics letters, pp. 19901991, 1991.

H. Card, C. R. Schneider, and R. S. Schneider, Learning

capacitive weights in analog CMOS neural networks, Journal

of VLSI Signal Processing, vol. 8, no. 3, pp. 209225, Oct.

1994.

M. Valle, D. D. Caviglia, and G. M. Bisio, An experimental

analog VLSI neural network with on-chip back-propagation

learning, Analog Integrated Circuits and Signal Processing,

vol. 9, no. 3, pp. 231245, 1996.

T. Morie and Y. Amemiya, An all-analog expandable neural

network LSI with on-chip backpropagation learning, IEEE

Journal of Solid-State Circuits, vol. 29, no. 9, pp. 10861093,

1994.

C. Lu, B. Shi, and L. Chen, An on-chip BP learning

neural network with ideal neuron characteristics and learning

rate adaptation, Analog Integrated Circuits and Signal

Processing, pp. 5562, 2002.

T. Shima and T. Kimura, Neuro chips with on-chip

back-propagation and/or Hebbian learning, IEEE Journal of

Solid-State Circuits, vol. 27, no. 12, 1992.

C. S. Lindsey and T. Lindblad, Survey of neural network

hardware, in Proc. SPIE, Applications and Science of Artificial Neural Networks, vol. 2492, 1995, pp. 11941205.

G. Cauwenberghs, A learning analog neural network chip

with continuous-time recurrent dynamics, NIPS, 1994.

H. Eguchi, T. Furuta, and H. Horiguchi, Neural network LSI

chip with on-chip learning, Neural Networks, pp. 453456,

1991.

C. Schneider and H. Card, CMOS implementation of

analog Hebbian synaptic learning circuits, IJCNN-91-Seattle

International Joint Conference on Neural Networks, vol. i,

pp. 437442, 1991.

A. C. Torrezan, J. P. Strachan, G. Medeiros-Ribeiro, and R. S.

Williams, Sub-nanosecond switching of a tantalum oxide

memristor. Nanotechnology, vol. 22, no. 48, p. 485203, Dec.

2011.

- Song Abbott Neuron 2001Uploaded byEileen Fu
- 3-D Memristor Chip DebutsUploaded bybiolek
- 00343548Uploaded bySangeeta Inginshetty
- Classification of MRI Brain Images Using GLCM, Neural Network, Fuzzy Logic & Genetic AlgorithmUploaded byEditor IJRITCC
- CS231n Convolutional Neural Networks for Visual Recognition Part I.Uploaded byJose Garcia
- Evanna iUploaded byLuis Moncayo
- NeuralNetworksUploaded byRijas Rasheed
- NeuralNetworks-Assign1Uploaded bySundeep Chopra
- Scope for Artificial Neural Network in TextilesUploaded byInternational Organization of Scientific Research (IOSR)
- Machine HintsUploaded byDenno Stein
- Study of Deep Learning Algorithms for Automatic License Plate Recognition (ALPR)Uploaded byIJRASETPublications
- sensors-16-01429 (1)Uploaded byMiguel Angel Gualito
- Local Training for Radial Basis Function Networks_ Towards Solving the Hidden Unit ProblemUploaded byNguyen Trong Tai
- A Directional Feature with Energy based Offline Signature Verification NetworkUploaded byijsc
- Lab WorkUploaded bypavan
- 09 Artificial Neural Networks and ClassificationUploaded bysiddu
- Dean I. Radin- Searching for "Signatures" in Anomalous Human-Machine Interaction Data: A Neural Network ApproachUploaded byDominos021
- art-3A10.1007-2Fs11269-013-0506-xUploaded byJithinRaj
- Feature RecognitionUploaded byapt917228393
- Bangla Optical Character RecognitionUploaded byselotmani
- 05 Adaptive Resonance Theory ART - CSE TUBEUploaded byRaja Ram
- Zan Huanga, Hsinchun Chena, Chia-Jung Hsua, Wun-Hwa Chenb, Soushan Wu - Credit Rating Analysis With Support Vector Machines and Neural NetworksUploaded byhenrique_oliv
- PSR_gameUploaded byAhmed Khan
- 05502348Uploaded byRajasekhar Nv
- Fuzzy-A Review ShortenedUploaded byYatheshth Anand
- CUploaded byPrithireddy Theneti
- scimakelatex.2634.Michael+JacksonUploaded bymaxxflyy
- A PREDICTIVE MODEL FOR MAPPING CRIME USING BIG DATA ANALYTICS.pdfUploaded byesatjournals
- 11_MultiattributeUploaded byanima1982
- me4547Uploaded byAndy Coello

- Escalator Instruction-manual KindmoverUploaded byengkankw
- Week 4Uploaded byrafiqueuddin93
- Unit Weight of Materials Used at Construction Site – Online Civil.pdfUploaded byKeshav Varpe
- Netapp Interview Questions - Q&AUploaded byRajesh Natarajan
- Configuring DHCP Relay on ComwareUploaded byMohammedAbulmakarim
- Vip Conv HelpUploaded byahmed_497959294
- broadcastUploaded byyadi1982
- Manual Electrode DesignUploaded byfranciscoval
- GO.Ms.103.T_R_B_22-05-1996Uploaded bymrraee4729
- AA_HVAC_101Uploaded byChan Sean
- DO_137_s2016.pdfUploaded bykhraieric16
- Oki c301Uploaded bypaolocollector
- Book 1Uploaded byramuce04
- 0-courseIntro-SP16Uploaded byAli Mir
- Modeling and Simulation of a Dynamic Model for Road Networks using Mobile AgentsUploaded byJournalofICT
- Net QuestionUploaded byvaithiyanathanh
- Laboratory Rev a-ModelUploaded byEng Hinji Rudge
- switch_4800g_24port.pdfUploaded byPachoBernal
- Error Number Mentor GraphicsUploaded byMendes
- DFS Step by Step GuideUploaded byనరహరికంఠంనేని
- ReadmeUploaded byMansoor Ahmed
- Universal Wireless ConfigUploaded bykmikmi
- Thermal Analysis of Long Buildings for Elimination of Exp JointUploaded byShirke Sanjay
- Carpentry Roof Framing SyllabusUploaded byKasozi Charles Bukenya
- Group 4 Final Paper 1-3-2012Uploaded byTrong Nhan Do
- clmgrUploaded byAnvesh Reddy
- Styro LutionUploaded byGanesan T
- EN(1084)Uploaded byreacharunk
- Using GNS3 for Switching LabsUploaded bykiny
- Leach 2017 Architectural DesignUploaded byPatricia Takamatsu

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.