
micromachines

Article
A High-Precision Implementation of the Sigmoid Activation
Function for Computing-in-Memory Architecture
Siqiu Xu 1,2, Xi Li 1,*, Chenchen Xie 1, Houpeng Chen 1, Cheng Chen 1,2 and Zhitang Song 1

1 The State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and
Information Technology, Chinese Academy of Sciences, Shanghai 200050, China;
sqxu1022@mail.sim.ac.cn (S.X.); xcc@mail.sim.ac.cn (C.X.); chp6468@mail.sim.ac.cn (H.C.);
chencheng1@shanghaitech.edu.cn (C.C.); sztztsong@mail.sim.ac.cn (Z.S.)
2 The University of Chinese Academy of Sciences, Beijing 100049, China
* Correspondence: ituluck@mail.sim.ac.cn

Abstract: Computing-In-Memory (CIM), based on a non-von Neumann architecture, has lately received significant attention due to its lower delay overhead and higher energy efficiency in convolutional and fully-connected neural network computing. A growing number of works have prioritized the memory array and peripheral circuits that achieve the multiply-and-accumulate (MAC) operation, but not enough attention has been paid so far to high-precision hardware implementations of the non-linear layers, which still cause time overhead and power consumption. Sigmoid is a widely used non-linear activation function, and most studies of it provide an approximation of the function rather than an exact match, inevitably leading to considerable error. To address this issue, we propose a high-precision circuit implementation of the sigmoid, matching the expression exactly for the first time. The simulation results with the SMIC 40 nm process suggest
that the proposed circuit-implemented high-precision sigmoid achieves the properties of the ideal sigmoid, with a maximum error of 2.74% and an average error of 0.21% between the proposed simulated sigmoid and the ideal sigmoid. In addition, a multi-layer convolutional neural network based on the CIM architecture employing the simulated high-precision sigmoid activation function verifies similar recognition accuracy on the handwritten-digit test database compared to utilizing the ideal sigmoid in software, achieving 97.06% with online training and 97.74% with offline training.

Citation: Xu, S.; Li, X.; Xie, C.; Chen, H.; Chen, C.; Song, Z. A High-Precision Implementation of the Sigmoid Activation Function for Computing-in-Memory Architecture. Micromachines 2021, 12, 1183. https://doi.org/10.3390/mi12101183

Keywords: computing-in-memory; circuit implementation; diode; high-precision sigmoid; non-linear activation function; neural networks

Academic Editor: Jung Ho Yoon

Received: 16 August 2021
Accepted: 27 September 2021
Published: 29 September 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Convolutional Neural Network (CNN) [1] models have shown satisfying performance on recognition and classification tasks owing to their weight sharing, multi-core convolution, and local perception [2]. Figure 1 presents a typical basic architecture, LeNet-5, which consists of two convolutional layers (C1, C2), two max-pooling layers (S1, S2), and three fully connected layers (F1, F2, F3); moreover, there is an activation function after F3. CNN plays a crucial role in the artificial intelligence (AI) world, but the massive amount of data transferred back and forth between the CPU and memory causes high power consumption in conventional all-digital CNN computation, the so-called "memory bottleneck" [3-5] of the von Neumann computing architecture. A variety of works aim to solve this problem. Among them, the Computing-In-Memory (CIM) [4-7] architecture has been spotlighted for two extraordinary advantages: a great reduction of data transmission and substantially improved parallelism. Plenty of previous works have demonstrated the design methods and results of CIM, showing the superiority of the CIM architecture [8-13]. For instance, Ref. [9], adopting a 1-bit-input/3-bit-weight mode, reports that a MAC operation takes 11.75 ns per


cycle when using 3 × 3 kernels, with the system achieving an 88.52% inference accuracy on the CIFAR-10 dataset. The measured peak energy efficiency is 53.17 TOPS/W in the binary mode (1-bit input, 3-bit weight, 4-bit MAC output) and 21.9 TOPS/W in the multi-bit mode (2-bit input, 3-bit weight, 4-bit MAC output), using CIM peripheral circuits and a reference generator. Ref. [12] proposed a hybrid-training method to deal with device imperfections, which proved effective and fast: recognition accuracy on the MNIST dataset exceeded 96% with a five-layer memristor-based CNN. Ref. [13] illustrated the in situ training of a five-level CNN that self-adapts to the non-idealities of the memristor array to complete classification tasks and avoids the "memory bottleneck"; by sharing weights it reduces the trainable parameters by about 75% while ensuring classification accuracy similar to that of a memristor-based multilayer perceptron. A recent work experimentally proved functionality (>98% accuracy) on the MNIST dataset [14] using 6-bit inputs/outputs, and achieved similar or better power efficiency than full-digital implementations by reducing data transfers. In that paper, the adopted network structure is LeNet-5, involving two fully connected layers, F5 and F6, and a non-linear ReLU layer between them. Thus, the data must still move between memory and the CPU to compute the ReLU layer, which also leads to a considerable amount of power dissipation and time overhead. Only by implementing the MAC operation and the nonlinear activation function in circuits together can we obtain minimum latency and power consumption. Whereas most CIM works used the on-chip array to implement convolutional/fully-connected (CONV/FC) layers, few studies have investigated the non-linear layers, which are still implemented in software; little is known about the hardware implementation of activation functions in CIM.

Figure 1. Architecture of the LeNet-5, showing the seven layers of CNN, the size of feature maps, and the size of filters.
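The figure itself is not reproduced in this extraction, but the layer sequence it describes can be sketched as a shape walk-through. The 32 × 32 input, 5 × 5 kernels, and 2 × 2 pooling windows are assumptions taken from the classic LeNet-5 design, since this excerpt does not state them:

```python
def conv_out(n, k, stride=1):
    # Spatial size after a "valid" convolution of an n x n map with a k x k kernel.
    return (n - k) // stride + 1

def pool_out(n, p=2):
    # Non-overlapping p x p max pooling.
    return n // p

n = 32                # assumed 32 x 32 input, as in the classic LeNet-5
n = conv_out(n, 5)    # C1: 5 x 5 kernels -> 28 x 28
n = pool_out(n)       # S1: 2 x 2 pooling -> 14 x 14
n = conv_out(n, 5)    # C2: 5 x 5 kernels -> 10 x 10
n = pool_out(n)       # S2: 2 x 2 pooling -> 5 x 5
print(n)              # spatial size feeding the fully connected layers F1-F3
```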
An activation layer is essential for a neural network: it adds the nonlinear factors that improve the expressive ability of an otherwise linear model, and a high-precision activation function is crucial for a neural network model to converge quickly. There are various kinds of activation functions, such as ReLU, softmax, and sigmoid [15]. Sigmoid is popular because of its smoothness and its ability to constrain the output of a network layer between 0 and 1. In addition, since the function is continuously differentiable, it is convenient to calculate the gradient during backpropagation. Hence, it is of great significance to design a high-precision circuit implementation of the sigmoid activation function.

In this paper, we focus on designing, for the first time, a high-precision circuit implementation of the sigmoid activation function for the CIM architecture, composed mainly of diodes. The sigmoid function expression can be obtained by configuring the connection of the diodes based on the I-V equation of a diode. It is demonstrated that the circuit-implemented high-precision sigmoid has all of the key properties of the conventional software-implemented sigmoid, with insignificant error. In other words, high-precision activation function layers can also be implemented on-chip, which contributes to lower delay overhead and power consumption. To further substantiate the performance of the proposed circuit, we first export a large number of simulated points of the proposed circuit and encapsulate them as a function in software; then we construct a multi-layer CNN based on the CIM architecture employing this new sigmoid function in place of the original sigmoid expression to complete the recognition task on the MNIST handwritten dataset, and compare it with the software-implemented sigmoid in the CIM architecture.
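The two properties that make the sigmoid attractive here - outputs bounded in (0, 1) and a derivative that is cheap to evaluate during backpropagation - can be checked with a minimal sketch; the helper names are ours, not from the paper:

```python
import math

def sigmoid(x):
    # The logistic function of Equation (1): 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # The derivative reuses the forward value, sigma'(x) = sigma(x) * (1 - sigma(x)),
    # which is what makes the gradient cheap to compute during backpropagation.
    s = sigmoid(x)
    return s * (1.0 - s)

# Smooth, bounded in (0, 1), and exactly 0.5 at the origin.
print(sigmoid(0.0))
```

A finite-difference check confirms that `sigmoid_grad` matches the analytic derivative.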
This paper is organized as follows. Section 2 introduces the overall detailed structure of the high-precision sigmoid function circuit. Section 3 shows the simulation results of the designed circuit implementation of the sigmoid activation function. Section 4 presents the application of the circuit implementing a high-precision sigmoid function to a CNN on the MNIST dataset. Finally, the summary of the research is presented in Section 5.

2. Circuit Architecture Design

2.1. Core Circuit

Sigmoid is a common activation function, whose typical expression is:

y = 1 / (1 + e^{-x})    (1)
For a diode, when a forward voltage is applied across the diode and V_BE > (5~10) V_T, the current-voltage formula of the diode is:

I_C = I_S e^{V_BE / V_T}    (2)

where I_S is the reverse saturation current of the diode, V_BE is the forward voltage across the diode, and V_T is the voltage equivalent of temperature, which is 26 mV at room temperature (T = 300 K). The behavior described by Equation (2) is depicted in Figure 2, and it becomes a critical step in constructing the sigmoid function.

Figure 2. The I-V characteristic of the diode; I_S is the reverse saturation current of the diode and V_ON is the turn-on voltage of the diode.
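A small numeric sketch of Equation (2) shows how steep the exponential is; the reverse saturation current I_S = 1e-14 A is an assumed value, since the paper does not give one:

```python
import math

V_T = 0.026    # thermal voltage at T = 300 K [V]
I_S = 1e-14    # assumed reverse saturation current [A]; not stated in the paper

def diode_current(v_be):
    # Equation (2): valid once V_BE exceeds several V_T, so the "-1" term of the
    # full Shockley equation is negligible and omitted here.
    return I_S * math.exp(v_be / V_T)

# Roughly 60 mV of extra forward bias multiplies the current by about 10x.
ratio = diode_current(0.76) / diode_current(0.70)
print(round(ratio, 2))
```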

To obtain the expression of the sigmoid, we use three identical diodes connected simply, as shown in Figure 3. According to (2), the current of diode D1 is

I_C1 = I_S1 e^{V_BE1 / V_T}    (3)

and, similarly, the I-V formula of diode D2 is

I_C2 = I_S2 e^{V_BE2 / V_T}    (4)

The reverse saturation currents satisfy I_S1 = I_S2 = I_S because we use three identical diodes. So, the current of diode D3 is

I_C3 = I_C1 + I_C2 = I_S (e^{V_BE1 / V_T} + e^{V_BE2 / V_T})    (5)

Now we calculate the current ratio of diodes D1 and D3:

I_C1 / I_C3 = I_S1 e^{V_BE1 / V_T} / [I_S (e^{V_BE1 / V_T} + e^{V_BE2 / V_T})] = 1 / (1 + e^{(V_BE2 - V_BE1) / V_T})    (6)

It is noted that we have successfully constructed a sigmoid-like form.

Figure 3. The connection method of the diodes; the current of D3 is the sum of the current of D1 and the current of D2.
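The derivation of Equations (3)-(6) can be checked numerically; the component values below are assumptions for illustration only:

```python
import math

V_T = 0.026    # thermal voltage [V]
I_S = 1e-14    # identical reverse saturation current for D1, D2, D3 [A] (assumed)

def current_ratio(v_be1, v_be2):
    i_c1 = I_S * math.exp(v_be1 / V_T)    # Equation (3)
    i_c2 = I_S * math.exp(v_be2 / V_T)    # Equation (4)
    i_c3 = i_c1 + i_c2                    # Equation (5): D3 carries the summed current
    return i_c1 / i_c3                    # Equation (6)

# I_S cancels, leaving the sigmoid-shaped 1 / (1 + exp((V_BE2 - V_BE1) / V_T)).
d = 0.01  # a 10 mV difference between the two forward voltages
print(abs(current_ratio(0.70, 0.70 + d) - 1.0 / (1.0 + math.exp(d / V_T))))
```

Note that equal forward voltages give a ratio of exactly 1/2, as the symmetry of Equation (6) requires.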

2.2. Peripheral Circuits

Using the same three identical diodes as before, we design peripheral circuits to copy currents and convert the ratio of currents into a ratio of voltages. The overall architecture of the sigmoid circuit is shown in Figure 4a. In our approach, the current of D3 is copied to MOSFET M5 and the current of D1 is copied to M3, owing to the current mirrors. Moreover, the forward voltage of diode D2 minus the forward voltage of diode D1, V_BE2 - V_BE1, is equal to -V_in because of the virtual-short characteristic of the operational amplifier. Consequently, Equation (6) can be further simplified as (7):

I_C1 / I_C3 = 1 / (1 + e^{(V_BE2 - V_BE1) / V_T}) = 1 / (1 + e^{-V_in / V_T})    (7)

Converting the currents to voltages is straightforward as long as the two resistors in the circuit have the same resistance. After that, we must consider how to design the divider in Figure 4. In this paper we combine a Gilbert unit with a feedback loop to perform the voltage division, as shown in Figure 4b; in fact, there are several kinds of circuit units besides the Gilbert cell that can achieve division. We therefore obtain the expression for the output of the whole circuit, as shown in (8):

V_out = V_A / V_B = (R1 * I_C1) / (R2 * I_C3) = 1 / (1 + e^{-V_in / V_T})    (8)

which successfully converts the current ratio into a voltage ratio through the two equal resistors mentioned above.

Figure 4. (a) The overall circuit architecture of the high-precision sigmoid, with R1 = R2; the size of M4 is equal to the size of M5, and likewise M2 and M3. (b) Details of the structure of the analog divider.

Although expression (8) matches the sigmoid expression well, the limitations of the actual circuit should be discussed. It can be observed that (8) holds only when the forward voltage of the diode satisfies V_BE >> V_T, whereas the three diodes are not always on as the input changes from -2 V to 2 V in this paper. To facilitate analysis, we denote the voltage at the cathode of diodes D1/D2 as V_X0 when the input is -2 V, and the measured turn-on voltage of the diode is 0.7 V. The operation divides into three phases according to the regions of the three diodes. In the first phase, the forward voltage of D1 is V_in - V_X0 < 0 V; D2, D3, and M4 combine as a pathway, so D1 is off while D2 and D3 are on, meaning that the output voltage is 0. The voltage across D1 increases with the input voltage until V_in - V_X0 = 0.7 V, and during this period V_X0 remains constant. In phase 2, the voltage across D1 reaches the turn-on voltage; thus, all three diodes are on, indicating that the output follows (8). Phase 3 begins when V_in = -(V_X0 + 0.7 V), by the symmetry of the sigmoid, and the forward voltage of D2 falls below 0.7 V. So D2 is off while D1 and D3 are on, meaning that the output voltage is 1 V. According to this analysis, the turning points of the output are closely related to V_X0, and V_X0 is in turn related to the current of D3, which demonstrates that the slope of the simulated circuit is adjustable. The above analysis is summarized in Table 1. Thus, the output of the circuit corresponds precisely to the ideal sigmoid.

Table 1. The three phases of the diode operating regions.

V_in [V]                        D1    D2    D3    V_out [V]
-2 ~ (V_X0 + 0.7)               OFF   ON    ON    0
(V_X0 + 0.7) ~ -(V_X0 + 0.7)    ON    ON    ON    Formula (8)
-(V_X0 + 0.7) ~ 2               ON    OFF   ON    1
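The three-phase behavior of Table 1 can be sketched as a piecewise model. Here V_X0 = -0.95 V is taken from the value reported in Section 3, and phase 2 uses the ideal Formula (8) with slope 1/V_T; the fitted slope of the actual circuit (Equation (9)) is smaller, so this is an idealization:

```python
import math

V_T = 0.026     # thermal voltage [V]
V_X0 = -0.95    # cathode voltage of D1/D2 at Vin = -2 V; -950 mV per Section 3
V_ON = 0.7      # measured diode turn-on voltage [V]

def circuit_sigmoid(v_in):
    # Table 1: phase 1 clamps to 0 V (D1 off), phase 2 follows Formula (8)
    # with all three diodes on, and phase 3 clamps to 1 V (D2 off).
    lo = V_X0 + V_ON        # -0.25 V with the values above
    hi = -(V_X0 + V_ON)     # +0.25 V, by the symmetry of the sigmoid
    if v_in <= lo:
        return 0.0
    if v_in >= hi:
        return 1.0
    return 1.0 / (1.0 + math.exp(-v_in / V_T))

print(circuit_sigmoid(-2.0), circuit_sigmoid(0.0), circuit_sigmoid(2.0))  # 0.0 0.5 1.0
```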

3. Simulation Results and Discussions

We use Cadence Virtuoso to simulate the proposed circuit with the SMIC 40 nm process at a supply voltage of 3.3 V. A simple simulated sigmoid curve with a marked point is shown in Figure 5, illustrating that the output voltage is 0.5 V when the input voltage is 0 V. The cathode voltage of diodes D1/D2, V_X0, is -950 mV in this configuration (W/L of M4 is 10/1). Figure 6 displays the DC-simulated output voltage (V_out) over an input voltage (V_in) range

from -2 V to 2 V for seven different W/L values of the MOSFET labeled M4. The result demonstrates that we can change the current flowing through the diode D3 branch to derive sigmoid functions with different slopes, per the analysis of Section 2. Furthermore, the smaller the current flowing through the diode D3 branch, the greater the slope of the sigmoid function. Additionally, the outputs of all seven curves are 0.5 V when the input voltage is 0, and each output increases monotonically between 0 and 1, corresponding well to the behavior of the sigmoid function.

Figure 5. Simulated high-precision sigmoid.

Figure 6. Simulation results of the high-precision sigmoid with different W/L of M4. The slope of the function increases as the W/L of M4 gets smaller.
To evaluate the error between the simulated high-precision curve and the ideal function, a function expression is necessary for this experiment; it is fitted in MATLAB from a large number of simulated data points (about 4000) exported from the simulated curve, as depicted in Figure 7a. The fitted curve has a sigmoid expression with fitting coefficient:

y = 1 / (1 + e^{-19.58x})    (9)

The curve is simulated with an appropriate W/L of M4 (W/L = 10/2); the orange curve is the ideal one and the blue one represents the simulation. It is clear that there is a slight difference between the two curves, which can be attributed to the inaccuracy of the current replication of the MOSFETs. Both of them are smooth, but the orange one is more symmetrical than the blue one. As mentioned in Section 1, the sigmoid has two critical characteristics: it is continuously differentiable, and its output value lies between 0 and 1. Both are fully retained by the simulated high-precision sigmoid.

Figure 7. (a) The fitting function is based on numerous simulated data points exported from the implementation circuit of the sigmoid and fitted with MATLAB (W/L of M4 is 10/2). The orange curve represents the fitting sigmoid, and the blue one is the simulated data. (b) Error between the fitting expression and the simulated data.

The error, defined as the numerical difference between the fitting curve and the simulated curve, is calculated to evaluate the fitting accuracy, as shown in Figure 7b. According to the definition of error in (10) and Figure 7b, the maximum error between the fitting sigmoid and the simulated sigmoid, 2.74%, occurs at an input voltage of -0.05 V, and the average error is 0.21%, which can be considered negligible. Therefore, the unremarkable difference between the simulation result and the fitting curve should not affect the recognition accuracy of a network in theory; furthermore, the circuit can reduce energy dissipation and delay overhead by reducing data transfer between the CPU and memory, according to our reasoning.

Error = (V_out - Sigmoid function) / Amplitude    (10)
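The fitting and error evaluation can be sketched end-to-end. The data below are a synthetic stand-in for the roughly 4000 exported Cadence points, which are not available here, and a simple grid search stands in for the MATLAB fit that produced Equation (9):

```python
import numpy as np

# Synthetic stand-in for the exported simulation points: the ideal curve with
# slope 19.58 plus a small assumed deviation playing the role of circuit error.
x = np.linspace(-2.0, 2.0, 401)
ideal = 1.0 / (1.0 + np.exp(-19.58 * x))
v_out = ideal + 0.002 * np.cos(3.0 * x)

def fit_slope(x, y, ks=np.linspace(10.0, 30.0, 2001)):
    # One-parameter least-squares fit of y = 1 / (1 + exp(-k * x)) by grid search.
    preds = 1.0 / (1.0 + np.exp(-np.outer(ks, x)))
    return ks[np.argmin(((preds - y) ** 2).sum(axis=1))]

def error_curve(v_out, reference, amplitude=1.0):
    # Equation (10): pointwise difference normalized by the 1 V output amplitude.
    return (v_out - reference) / amplitude

k = fit_slope(x, v_out)
err = error_curve(v_out, 1.0 / (1.0 + np.exp(-k * x)))
print(k, np.abs(err).max(), np.abs(err).mean())  # fitted slope, max and mean error
```

With the real exported points, the maximum and average of `np.abs(err)` correspond to the 2.74% and 0.21% figures reported above.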
Corner analysis is simulated with SS (Slow NMOS, Slow PMOS), FF (Fast NMOS, Fast PMOS), SF (Slow NMOS, Fast PMOS), and FS (Fast NMOS, Slow PMOS) corners for the proposed high-precision circuit implementation of the sigmoid function (the size of M4 is the same as in Figure 7, i.e., W/L = 10/2 in the corner analysis), combined with supply voltage variation (±10%, i.e., 3 V, 3.3 V, and 3.6 V), giving the 15 curves of Figure 8a. We can easily observe that only five curves can be distinguished clearly, illustrating that the change of supply voltage has almost no impact on our simulated results. For process corners, the maximum output voltage of the third phase, 1.06 V, occurs in the SS corner with a supply voltage of 3.6 V; the minimum output voltage of the third phase, 0.93 V, occurs in the FF corner with a supply voltage of 3 V; so the maximum difference between Figure 8a and the ideal curve is 70 mV in phase 3. In phases 1 and 2, the 15 curves almost coincide. All in all, our circuit is robust to process corner and supply voltage changes. Moreover, by calculation, the maximum error and the average error are 7% and 3.39%, respectively, in the worst case (the FF corner at 3 V).

Figure 8. (a) DC scanning of the proposed circuit for various process corners and supply voltages. (b) The Monte Carlo simulation of the output. (c) The output of the simulations for changing temperatures with an increment of 10 °C.

Additionally, we explore process and mismatch effects using Monte Carlo simulation. It is performed with 40 nm mc models over 1000 trials, as shown in Figure 8b. Mismatch and process variation are seen to have only a slight effect on the output of the circuit. When the diode D1 is closed in the first phase, the outputs of the curves range from 19 mV to −20 mV. In the second phase, with the three diodes turned on, the 1 k curves show little difference and are close to the ideal curve. In phase 3, the error is slightly larger, varying from 1.04 V to 0.95 V, mainly due to the mismatch of the current mirrors.
Considering that the output of the simulation, i.e., Equation (8), could be affected by temperature, we test temperatures varying from 10 °C to 60 °C with a step of 10 °C to minimize the error caused by the formula. The results are shown in Figure 8c, indicating that temperature has little effect on the output result within this temperature range.
The Voltage Input–Voltage Output form of this circuit corresponds to the input and output signals of a linear layer in most CIM architectures, meaning that it can reduce energy dissipation by eliminating the conversion between current and voltage, compared with other forms such as I–I, I–V, and so on. The comparison of this work with former works is summarized in Table 2. All of the references [16–19] constructed a sigmoid by dividing the ideal expression into several parts to approximate it, contributing to perceptible error. Compared to [16], we achieve less than 1/2 of the maximum error rate, less than 1/5.5 of the corner maximum error rate, less than 1/19 of the average error rate, and less than 1/11 of the corner average error rate, because our expression matches the ideal sigmoid expression exactly. We achieve simulated accuracy similar to [17] while using fewer MOSFETs in the core circuit; in addition, [17] cannot be applied in practice because its sigmoid output range lies between 0 and 1 µA. Compared to [18], we achieve higher simulated accuracy with fewer MOSFETs in the core function circuit. As for [19], although its core-circuit MOSFET count and maximum error are similar to our work, its input and output ranges are on the order of millivolts, making it difficult to apply in real work.

Table 2. Comparison of this work and former works.

Ref        Supply Voltage (V)  Tech (nm)  MOS Number of Core    Maximum    Average    Form of        Corner Analysis
                                          Function Circuit      Error (%)  Error (%)  Input–Output   Maximum Error (%)
[16]       1.2                 90         6                     7.67       4.12       I–V            37.42
[17]       1.8                 180        9                     1.76       /          I–I            /
[18]       3.3                 350        12                    5          /          V–V            /
[19]       1.2                 90         6                     3          /          V–V            /
This work  3.3                 40         7                     2.74       0.21       V–V            7

4. Applying the Proposed Circuit to CNN on the MNIST Dataset

To evaluate the performance of the hardware-implemented sigmoid activation function in a typical situation, we apply it (the size of M4 is W/L = 10/2, and the corner is the TT condition) to a multi-layer CNN in the CIM architecture to complete a handwritten recognition task on the MNIST dataset [20]. First, the construction of the CNN is chosen, as shown in Table 3.

Table 3. The architecture of CNN for the experiment.

CNN Layers
1  Conv                   8  Conv
2  Activation (sigmoid)   9  BN
3  Sub_sampling           10 Flatten
4  BN                     11 Dense
5  Conv                   12 BN
6  Sub_sampling           13 Dense
7  BN                     14 BN
                          15 Activation (softmax)
BN = Batch_Normalization.
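For bookkeeping, the layer ordering of Table 3 can be written as a simple configuration list; the names below mirror the table exactly, while filter counts, kernel sizes, and feature-map shapes are not specified in the table and are therefore deliberately omitted here:

```python
# Layer sequence of the CNN in Table 3 (names only; hyperparameters
# such as filter counts and kernel sizes are not given in the table).
cnn_layers = [
    "Conv", "Activation(sigmoid)", "Sub_sampling", "BN",
    "Conv", "Sub_sampling", "BN",
    "Conv", "BN", "Flatten", "Dense", "BN", "Dense", "BN",
    "Activation(softmax)",
]

# Sanity checks against the composition described in the text.
assert len(cnn_layers) == 15
assert cnn_layers.count("Conv") == 3
assert cnn_layers.count("BN") == 5
```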

The CNN we used consists of three convolution layers, two sub_sampling layers, three fully-connected layers, five batch_normalization layers, and two activation layers. In a binary-weight network (BWN) [21], the filter weights are trained to +1/−1, which is needed in most CIM operations owing to the limitations of the memory storage principle; this has fueled an active interest in choosing a BWN to complete this handwritten recognition task. A BWN is different from both a full-precision neural network and a binary neural network (BNN). In a BNN, not only the weights but also the inputs are binary, which leads to a much smaller storage requirement than a BWN, but the recognition accuracy is not good enough. In order to achieve a good compromise, we use a BWN in the CIM architecture to reduce the parameters of the network while ensuring the recognition accuracy of a small network in this experiment. Because a BWN only binarizes the weights and does not change the input of the network or the intermediate values between layers, it retains the original accuracy, and it is still critical to make use of a full-precision activation function. Additionally, there are two kinds of training patterns in the CIM architecture: online training and offline training. Online training means that not only inference but also training of the model occurs on the chip; for offline training, the parameters of the model, such as the weights, are trained in software, and when the training of the model converges, the trained parameters can be stored on the chip for the inference step, which means that the weights are not changed on the chip. In this paper, we accomplish the circuit implementation of the sigmoid activation but not the hardware implementation of the other network layers, so we use the method of exporting the circuit simulation curve into a function in the software to simplify this experiment.
For the experiment.
training in this experiment, we train the ideal model with ideal activa-
tionFor the online
function withtraining
𝑒𝑝𝑜𝑐ℎ =in6,this experiment,
as for we train
the simulated curve,the ideal
it can bemodel
trainedwith
using ideal
the acti-
fitting
vation function with epoch = 6, as for the simulated curve, it can be trained
function (9) rather than exported multi-point from the simulated circuit due to backward using the
fitting functionis(9)needed
propagation rather for
than exported
online multi-point
training from the simulated
and the insignificant error ofcircuit due to back-
(9) analyzed above.
ward
Thus, the function expression (9) should be packaged into a new activation analyzed
propagation is needed for online training and the insignificant error of (9) function in
above. Thus, the function expression (9) should be packaged into a new activation function
software for easy calling. The change in accuracy of the training process is shown in Figure
in software for easy calling. The change in accuracy of the training process is shown
9a. It could be noted that the accuracy of recognition of the model with ideal activation is
in Figure 9a. It could be noted that the accuracy of recognition of the model with ideal
approximated to the model with fitting one at the beginning with 𝑒𝑝𝑜𝑐ℎ ≤ 2, as the in-
activation is approximated to the model with fitting one at the beginning with epoch ≤ 2,
creased number of the epoch, the gap of training accuracy between the two models grad-
as the increased number of the epoch, the gap of training accuracy between the two models
ually increases, but when the 𝑒𝑝𝑜𝑐ℎ = 6, the simulated one achieving 97.04% and the
gradually increases, but when the epoch = 6, the simulated one achieving 97.04% and the
ideal one achieving 96.06% which illustrate that using our simulated activation function
ideal one achieving 96.06% which illustrate that using our simulated activation function
almost has no effect on the training recognition results for on-line training on the MNIST
almost has no effect on the training recognition results for on-line training on the MNIST
training dataset. In addition, the comparison of the accuracy of the two models in the in-
training dataset. In addition, the comparison of the accuracy of the two models in the
ference period for online training in the MNIST test dataset is shown in Figure 9b, i.e., the
inference period for online training in the MNIST test dataset is shown in Figure 9b, i.e.,
ideal one is 97.32%, and the simulated one is 97.06%.
the ideal one is 97.32%, and the simulated one is 97.06%.
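Packaging the activation expression as a software function for online training also requires its derivative, since backward propagation is needed. The sketch below uses the ideal sigmoid as a stand-in for the fitted expression (9), which is not reproduced in this excerpt; for the real experiment, `act` would instead evaluate (9):

```python
import numpy as np

def act(x):
    """Forward pass: stand-in for the fitted sigmoid expression (9)."""
    return 1.0 / (1.0 + np.exp(-x))

def act_grad(x):
    """Backward pass: for the sigmoid, f'(x) = f(x) * (1 - f(x))."""
    s = act(x)
    return s * (1.0 - s)
```

Both functions are vectorized, so they can be called on whole layer activations at once by the training loop.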

Figure 9. (a) The measured training accuracy of recognition on the MNIST training dataset, with the ideal sigmoid and with the simulated sigmoid. (b) The measured inference accuracy of recognition on the MNIST test dataset, with the ideal sigmoid and with the simulated sigmoid, setting epoch = 6.

For the offline training, during the training stage we train the two models using the ideal sigmoid activation function equally, and then the weight information of the models is saved to a file. After this, the inference phase of the ideal model is the same as in online training, except that the weight information must be loaded into memory, and then inference is accomplished on-chip. The simulated one must be inferred on a new model, which only replaces the former activation layer with a simulated activation layer encapsulating 4 k simulated data points exported from hardware. The results are likewise shown in Figure 9b. The inference accuracies of offline training for the two functions are 97.32% and 97.74%, respectively. These results are consistent with the theoretical inferences for the application of the simulated function in Section 3. It is easily noted that the inference accuracy between the two training types with the ideal sigmoid is equal, because the error caused by the MAC operation on a chip is ignored to make the comparison of results more intuitive in this experiment. Whether with online training or with offline training, the experimental results show consistently high performance on the recognition task with the MNIST test dataset.
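The simulated activation layer used for offline inference can be emulated in software by piecewise-linear interpolation over the exported data points. In the sketch below, the 4 k points are generated from the ideal sigmoid as a placeholder for the hardware export; the voltage range and point count are assumptions for illustration:

```python
import numpy as np

# Placeholder for the 4 k (input voltage, output voltage) pairs that
# would be exported from the circuit simulation; the ideal sigmoid
# stands in for the measured curve here.
xs = np.linspace(-5.0, 5.0, 4000)
ys = 1.0 / (1.0 + np.exp(-xs))

def simulated_activation(x):
    """Lookup-table activation: linear interpolation over the exported
    points. No derivative is needed, since it is used only for inference."""
    return np.interp(x, xs, ys)
```

With 4000 points over this range, the interpolation error against the underlying curve is far below the circuit's own 0.21% average error, so the lookup table does not add meaningful inaccuracy.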
To test the robustness of our circuit, we select the three curves with the largest errors in the simulation results of Figure 8: one is the simulation curve under a supply voltage of 3.6 V at the SS corner, another is the simulation curve under 3 V at the FF corner, and the last is the Monte Carlo curve that deviates furthest from the fitted curve. For online training, there are only slight differences from the typical case, namely 97.28%, 97.01%, and 97.11%, respectively, because of the minimal differences between these curves and the fitted curve. For offline training, the classification accuracies are 96.46%, 97.28%, and 97.19%. It is easy to see that our actual circuit structure is sufficiently robust.

5. Conclusions
The high-precision sigmoid activation function, implemented with a circuit that uses the exponential I–V formula of diodes to address the existing problem of implementing the non-linear activation function in the CIM architecture, is presented in this paper. We demonstrated the features of a sigmoid function with adjustable slope by simulating the high-precision sigmoid circuit with the SMIC 40 nm process, achieving output values close to the ideal sigmoid, which means the simulated sigmoid function is high-precision and will theoretically not reduce the classification accuracy of a neural network. To provide further support for this theory, the proposed simulated high-precision sigmoid circuit was applied to the task of recognizing the MNIST handwritten dataset using a multi-layer CNN based on the CIM architecture. The results demonstrate that using the proposed high-precision circuit-implemented sigmoid has almost no effect on the accuracy of the network, in accordance with the theory. In addition, compared to prior CIM works in which the sigmoid activation function is implemented in software, our approach can dramatically reduce back-and-forth data transfer and latency by putting the sigmoid circuit into the CIM architecture. This is of great significance for the hardware-implemented sigmoid to realize fully customized hardware neural network accelerators with CIM architecture for specific applications. To sum up, the proposed high-precision circuit implementation of the sigmoid shows promising results and can be applied to ultra-low-power and high-accuracy AI products because of its further reduction of energy consumption in the CIM architecture.

Author Contributions: Conceptualization, S.X. and X.L.; methodology, S.X. and C.X.; software, S.X.;
validation, C.X. and X.L.; formal analysis, S.X. and C.C.; investigation, S.X. and X.L.; resources, Z.S.,
H.C. and X.L.; data curation, S.X.; writing—original draft preparation, S.X.; writing—review and
editing, S.X., X.L. and C.X.; visualization, S.X.; supervision, X.L. and Z.S. All authors have read and
agreed to the published version of the manuscript.
Funding: This research was funded by the National Key Research and Development Program
of China (grant number 2017YFA0206101, 2017YFB0701703, 2017YFA0206104, 2017YFB0405601,
2018YFB0407500), National Natural Science Foundation of China (grant number 91964204, 61874178,
61874129, 61904186, 61904189), Strategic Priority Research Program of the Chinese Academy of
Sciences (grant number XDB44010200), Science and Technology Council of Shanghai (grant number
17DZ2291300, 19JC1416801), and in part by the Shanghai Research and Innovation Functional
Program under Grant 17DZ2260900.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551. [CrossRef]
2. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86,
2278–2324. [CrossRef]
3. Fang, S.; Zhibo, W.; Jinyang, L.; Chang, M.; Liu, Y. Design of nonvolatile processors and applications. In Proceedings of the 2016
IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Tallinn, Estonia, 26–28 September 2016; pp. 1–6.
4. Dou, C.M.; Chen, W.H.; Xue, C.X.; Lin, W.Y.; Chang, M.F. Nonvolatile Circuits-Devices Interaction for Memory, Logic and
Artificial Intelligence. In Proceedings of the 2018 IEEE Symposium on VLSI Technology, Honolulu, HI, USA, 18–22 June 2018.
5. Dou, C.; Chen, W.; Chen, Y.; Lin, H.; Lin, W.; Ho, M.; Chang, M. Challenges of emerging memory and memristor based circuits:
Nonvolatile logics, IoT security, deep learning and neuromorphic computing. In Proceedings of the 2017 IEEE 12th International
Conference on ASIC (ASICON), Guiyang, China, 25–28 October 2017; pp. 140–143.
6. Chen, W.; Lin, W.; Lai, L.; Li, S.; Hsu, C.; Lin, H.; Lee, H.; Su, J.; Xie, Y.; Sheu, S.; et al. A 16Mb dual-mode ReRAM macro with
sub-14ns computing-in-memory and memory functions enabled by self-write termination scheme. In Proceedings of the 2017
IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 2–6 December 2017; pp. 28.22.21–28.22.24.
7. Zhang, Z.; Si, X.; Srinivasa, S.; Ramanathan, A.K.; Chang, M.F. Recent Advances in Compute-in-Memory Support for SRAM Using Monolithic 3-D Integration. IEEE Micro 2019, 39, 28–37. [CrossRef]
8. Chen, W.H.; Li, K.X.; Lin, W.Y.; Hsu, K.H.; Li, P.Y.; Yang, C.H.; Xue, C.X.; Yang, E.Y.; Chen, Y.K.; Chang, Y.S.; et al. A 65nm
1Mb Nonvolatile Computing-in-Memory ReRAM Macro with Sub-16ns Multiply-and-Accumulate for Binary DNN AI Edge
Processors. In Proceedings of the 2018 IEEE International Solid State Circuits Conference, San Francisco, CA, USA, 11–15 February
2018; pp. 494–496.
9. Xue, C.X.; Chen, W.H.; Liu, J.S.; Li, J.F.; Lin, W.Y.; Lin, W.E.; Wang, J.H.; Wei, W.C.; Chang, T.W.; Chang, T.C.; et al. A 1Mb Multibit
ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN-Based AI Edge Processors. In
Proceedings of the 2019 IEEE International Solid-State Circuits Conference; Fujino, L.C., Anderson, J.H., Belostotski, L., Dunwell, D., Gaudet, V., Gulak, G., Haslett, J.W., Halupka, D., Smith, K.C., Eds.; IEEE: Piscataway, NJ, USA, 2019; Volume 62, pp. 388–390.
10. Jain, S.; Ranjan, A.; Roy, K.; Raghunathan, A. Computing in Memory With Spin-Transfer Torque Magnetic RAM. In IEEE
Transactions on Very Large Scale Integration (VLSI) Systems; IEEE: Piscataway, NJ, USA, 2018; pp. 470–483.
11. Yue, J.; Yuan, Z.; Feng, X.; He, Y.; Zhang, Z.; Si, X.; Liu, R.; Chang, M.; Li, X.; Yang, H.; et al. 14.3 A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling
Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse. In Proceedings of the 2020 IEEE International Solid- State
Circuits Conference (ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 234–236.
12. Yao, P.; Wu, H.; Gao, B.; Tang, J.; Zhang, Q.; Zhang, W.; Yang, J.J.; Qian, H. Fully hardware-implemented memristor convolutional
neural network. Nature 2020, 577, 641–646. [CrossRef] [PubMed]
13. Wang, Z.; Li, C.; Lin, P.; Rao, M.; Nie, Y.; Song, W.; Qiu, Q.; Li, Y.; Yan, P.; Strachan, J.P.; et al. In situ training of feed-forward and
recurrent convolutional memristor networks. Nat. Mach. Intell. 2019, 1, 434–442. [CrossRef]
14. Biswas, A.; Chandrakasan, A.P. CONV-SRAM: An Energy-Efficient SRAM With In-Memory Dot-Product Computation for
Low-Power Convolutional Neural Networks. IEEE J. Solid-State Circuits 2018, 54, 217–230. [CrossRef]
15. Yeo, I.; Gi, S.; Lee, B.; Chu, M. Stochastic implementation of the activation function for artificial neural networks. In Proceedings
of the 2016 IEEE Biomedical Circuits and Systems Conference (BioCAS), Shanghai, China, 17–19 October 2016; pp. 440–443.
16. Khodabandehloo, G.; Mirhassani, M.; Ahmadi, M. Analog Implementation of a Novel Resistive-Type Sigmoidal Neuron. IEEE
Trans. Very Large Scale Integr. (VLSI) Syst. 2012, 20, 750–754. [CrossRef]
17. Xing, S.; Wu, C. Implementation of A Neuron Using Sigmoid Activation Function with CMOS. In Proceedings of 2020 IEEE 5th
International Conference on Integrated Circuits and Microsystems (ICICM), Nanjing, China, 23–25 October 2020; pp. 201–204.
18. Chible, H.; Ghandour, A. CMOS VLSI Hyperbolic Tangent Function & its Derivative Circuits for Neuron Implementation. Int. J.
Electron. Comput. Sci. Eng. 2013, 2, 1162–1170.
19. Babu, V.S.; Biju, M. Novel circuit realizations of neuron activation function and its derivative with continuously programmable
characteristics and low power consumption. Int. J. Adv. Res. Eng. Technol. 2014, 5, 185–200.
20. Lecun, Y.; Cortes, C. The Mnist Database of Handwritten Digits. 2021. Available online: http://yann.lecun.com/exdb/mnist/
(accessed on 23 July 2021).
21. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural
Networks. In Proceedings of Computer Vision—ECCV 2016, Cham, Switzerland, 8–16 October 2016; pp. 525–542.
