HPBDIS 2021 Paper 144
Abstract— This paper proposes a rate-encoding binarized spiking neural network (B-SNN) model suited for in-memory computing (IMC) implementation. In most of the prior art, due to the unipolar format of the spike representation, B-SNNs were implemented using either complex or non-regular logic that is not suited for IMC and/or makes the network inflexible. In this work, we propose a B-SNN model that permits the direct adoption of unipolar-format spikes on the XNOR array, i.e., it fully exploits IMC's potential benefit of a highly regular and simple array structure. Also, instead of indirectly deriving the B-SNN model from a trained BNN, we propose to train the B-SNN directly using the surrogate gradient method with residual connections. System simulation shows that our trained network achieves reasonably good accuracy (59.11%) on CIFAR-100 with very low inference latency (an 8-time-step B-SNN).

Keywords—In-memory computing, Binary Spiking Neural Network, surrogate gradient

I. INTRODUCTION

Multiply-addition-accumulation (MAC) has been the dominant computational workload for deep-neural-network (DNN) processors. This type of computation is not only compute-intensive but also memory-intensive. Therefore, the conventional computer architecture, with its limited memory bandwidth and sequential computing nature, is not ideal for AI applications. It is a real challenge to accommodate DNNs in edge artificial-intelligence (AI) devices like Internet-of-Things (IoT) or mobile systems, which have strict resource and power consumption constraints.

The BNN, first introduced by M. Courbariaux et al. [1], is a type of network whose weights and inputs are represented by the bipolar values ±1, simplifying multiplication to a bitwise function. The BNN is hence particularly suited for resource- and area-constrained devices. Nonetheless, the BNN network size, compared to the conventional one, needs to be increased accordingly to compensate for the low precision of the bipolar representation. The latter could lead to a surge in the number of memory accesses. In-memory computing (IMC) is one of the recent revolutionary approaches to solving the current computer architecture's memory bottleneck [2]. This approach proposes to partially shift processing from the central processing unit into the memory itself, which greatly reduces memory accesses and increases performance and energy efficiency. Nonetheless, the accuracy of BNNs strongly depends on process variations, which could be quite severe in finer technologies [3].

The spiking neural network has recently been introduced in [4], [5] with resilience, robustness, and fault-tolerance capability against process variation. Furthermore, the binarized spiking neural network (B-SNN), with binary weights and binary spikes, was introduced in [6], [7], enabling MAC computation in memory. The authors of these works convert a trained BNN model into a B-SNN, where the ReLU activation is normalized to IF spiking activity. Their proposed B-SNNs achieve accuracy comparable to BNNs with equivalent network topologies. In [8], the authors present a directly supervised learning algorithm to train a B-SNN with the temporal encoding method. Their evaluation obtained 97.0% and 87.3% classification accuracy on MNIST and Fashion-MNIST, respectively. In [9], the authors train a residual stochastic binary convolutional spiking neural network with hybrid spike-timing-dependent plasticity to achieve 66% on the CIFAR-10 dataset. However, these techniques require a large number of time-steps (~100-300) for inference, which leads to high latency and limited energy efficiency in actual hardware implementations.

The surrogate gradient method was recently introduced to mitigate the inherent non-differentiability of spiking during the backpropagation training process. Studies in [10], [11] demonstrated that training conventional spiking neural networks using surrogate gradients achieves reasonable accuracy with fewer time-steps. In this work, we propose a novel method for B-SNN training that achieves an accuracy of 59.11% on CIFAR-100 with only 8 time-steps, approximately 13X fewer than the previous art (105 time-steps) in [6]. Furthermore, we propose a novel MAC model using only the XNOR array. In the implementation in [12], the possible product output belongs to the set (0, +1, -1); the fundamental multiplication unit is therefore realized using a non-standard dual-output gate, i.e., one more complex than an XNOR gate. In contrast, our proposed model permits the full B-SNN model to be mapped to an existing, common in-memory architecture based on the XNOR array.

The remainder of this paper is structured as follows. Section II presents training the B-SNN with the surrogate gradient. Section III details the B-SNN inference IMC model. The experimental evaluation and results comparison are presented in Section IV. Finally, Section V concludes the paper.
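The bitwise-multiplication property of BNNs mentioned above can be illustrated concretely. The following minimal sketch (our illustration, not code from this paper) encodes bipolar values as bits (-1 → 0, +1 → 1), so that each ±1 product becomes an XNOR and a dot product reduces to a popcount:

```python
# Sketch: in a BNN both weights and activations are bipolar {-1, +1}.
# Encoding -1 -> 0 and +1 -> 1 turns each product into an XNOR, so
# a dot product becomes popcount arithmetic: 2 * popcount(XNOR) - N.

def bipolar_dot(w, x):
    """Reference dot product over bipolar {-1, +1} vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def xnor_dot(w_bits, x_bits):
    """Same dot product computed with XNOR on the {0, 1} encodings."""
    n = len(w_bits)
    # XNOR = NOT XOR: count positions where the bits match
    matches = sum(1 - (wb ^ xb) for wb, xb in zip(w_bits, x_bits))
    return 2 * matches - n

w = [+1, -1, -1, +1, +1]
x = [-1, -1, +1, +1, -1]
to_bit = lambda v: (v + 1) // 2          # -1 -> 0, +1 -> 1

assert bipolar_dot(w, x) == xnor_dot([to_bit(v) for v in w],
                                     [to_bit(v) for v in x])
```

This is precisely why the XNOR gate is the natural primitive for in-memory BNN arrays; the complication addressed later in this paper is that spikes are unipolar {0, 1}, not bipolar.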
II. TRAINING B-SNN WITH THE SURROGATE GRADIENT

In this section, we propose directly training the B-SNN with the surrogate gradient method. In our model, the weights of the B-SNN are represented in bipolar format (i.e., ±1) [13]. In the general SNN model using the Integrate-and-Fire (IF) model, the membrane potential $u_i^{t,l}$ at the $i$-th neuron, time-step $t$, and intermediate layer $l$ is defined as follows:

$$u_i^{t,l} = u_i^{t-1,l} + \sum_{j=1}^{M} w_{ij}\, o_j^{t,l-1} \quad (1)$$

where $M$ denotes the number of pre-synaptic neurons, $o_j^{t,l-1}$ is the pre-synaptic spike of the $j$-th neuron, and $w_{ij}$ is the weight linking the pre- and post-neurons. If the membrane potential $u_i^{t,l}$ surpasses the firing threshold $\theta$, the IF model generates a binary spike output $o_i^{t,l}$. In this work, we use a soft reset, i.e., the membrane potential is reduced by the threshold when a neuron fires. In the output layer $L$, the membrane potential $u_i^{T,L}$ at the final time-step $T$ is accumulated without firing and yields a probability distribution after the softmax function without information loss [10]. From the accumulated membrane potential, the cross-entropy loss for the B-SNN is defined as

$$L = -\sum_{i=1}^{C} y_i \log\!\left(\frac{e^{u_i^{T,L}}}{\sum_{k=1}^{C} e^{u_k^{T,L}}}\right) \quad (2)$$

Here, $Y = (y_1, y_2, \ldots, y_C)$ is the label vector and $T$ is the total number of time-steps. The partial derivative of the loss function with respect to the membrane potential $u_i^{t,l}$ at layer $l$ is defined as follows:

$$\frac{\partial L}{\partial u_i^{t,l}} = \frac{\partial L}{\partial o_i^{t,l}}\frac{\partial o_i^{t,l}}{\partial u_i^{t,l}} + \frac{\partial L}{\partial u_i^{t+1,l}}\frac{\partial u_i^{t+1,l}}{\partial u_i^{t,l}} \quad (3)$$

Due to the non-differentiable spiking activity, $\partial o_i^{t,l}/\partial u_i^{t,l}$ does not exist. To deal with this problem, the authors in [10] introduce an approximate gradient (i.e., surrogate gradient) for SNN training, expressed as follows:

$$\frac{\partial o_i^{t,l}}{\partial u_i^{t,l}} = \delta \max\left\{0,\; 1 - \left|\frac{u_i^{t,l} - \theta}{\theta}\right|\right\} \quad (4)$$

Here, $\delta$ is a damping factor for back-propagated gradients, empirically set to 0.3 for stable training [10]. Algorithm 1 below demonstrates our procedure for training the B-SNN with binary weights. Note that our B-SNN does not binarize the first and last layers, as in some previous works (BNN [13], B-SNN [6]). Except for the full-precision weights $w^1$ in the first layer and $w^L$ in the last layer, the dot-product computation $x^{t,l}$ at layer $l$ (in the Conv block shown in Fig. 1) between the weights and the pre-synaptic spikes is equal to

$$x^{t,l} = \alpha\, w_{ij}^{b,l} \left\{ o_j^{t,l-1} + f\, o_j^{t,l-2} \right\} \quad (5)$$

where $\alpha$ is a scaling factor and $w_{ij}^{b,l}$ are the binary weights linking layer $l-1$ to layer $l$. In (5), we also apply the identity-mapping technique for residual connections from [9] to our model. The convolutional layer $l$ receives residual connections from the pre-synaptic spikes in layers $l-1$ and $l-2$. The index $f$ represents the residual connection (i.e., $f = 1$ implies the presence of the residual connection; otherwise $f = 0$). The input data is encoded using rate encoding, supported by the Poisson generator function [10]. Batch normalization (BN) is set with two parameters, the variance $\sigma$ and the learned parameter $\mu$, which are updated during the training process.

Fig. 1. Residual connections from layers $l-2$, $l-1$ to layer $l$ [9]. (Residual block: Conv → BN → IF stages with an additive skip connection.)

Algorithm 1: The direct B-SNN training algorithm using the surrogate gradient
Input: input $X$, label vector $Y$
Output: trained parameters $w^1$, $w^L$, $\alpha$, $w_{ij}^{b,l}$, $\theta$

 1: function IF($u$, $I$)
 2:   for $t = 1$ to $T$ do
 3:     $u^t = u^{t-1} + I(t)$
 4:   end for
 5:   if $u^t \le \theta$ then
 6:     $o_i^t = 0$
 7:   else
 8:     $o_i^t = 1$
 9:     $u^t = u^t - \theta$
10:   end if
11: end function

Training: % Forward:
 1: for $l = 1$ do
 2:   for $t = 1$ to $T$ do
 3:     $o^t$ = PoissonGenerator($X$)
 4:     $x^{t,1} = w^1 o^t$
 5:     $y^{t,1} = (x^{t,1} - \mu)/\sigma$
 6:   end for
 7:   $o_i^{t,1} \leftarrow$ IF($u^1$, $y^{t,1}$)
 8: end for
 9: for $l = 2$ to $L-1$ do
10:   for $t = 1$ to $T$ do
11:     $x^{t,l} = \alpha\, w_{ij}^{b,l}\{o_j^{t,l-1} + f\, o_j^{t,l-2}\}$
12:     $y^{t,l} = (x^{t,l} - \mu)/\sigma$
13:   end for
14:   $o_i^{t,l} \leftarrow$ IF($u^l$, $y^{t,l}$)
15: end for
16: for $l = L$ do
17:   for $t = 1$ to $T$ do
18:     $x^{t,L} = w^L o^{t,L-1}$
19:     $y^{t,L} = (x^{t,L} - \mu)/\sigma$
20:     $u^{t,L} = u^{t-1,L} + y^{t,L}$
21:   end for
22: end for
23: % Calculate the loss and back-propagate
24: $L \leftarrow$ ComputeLoss($Y$, $u^{T,L}$)
25: $\partial L/\partial u_i^{t,l}$, $\partial L/\partial o_i^{t,l} \leftarrow$ Autograd
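The IF dynamics with soft reset and the triangular surrogate gradient can be sketched in a few lines of NumPy. This is our illustration of eqs. (1) and (4), not the authors' code; the constant-drive input at the end is a made-up example:

```python
import numpy as np

# Sketch of IF dynamics with soft reset (integrate, fire, subtract
# threshold) and the triangular surrogate gradient of the spike
# function, with damping factor delta = 0.3.

def if_forward(inputs, theta=1.0):
    """Run IF dynamics over T time-steps.

    inputs: array of shape (T, N) -- per-step synaptic drive sum(w*o).
    Returns (spikes, potentials), each of shape (T, N).
    """
    T, N = inputs.shape
    u = np.zeros(N)
    spikes = np.zeros((T, N))
    potentials = np.zeros((T, N))
    for t in range(T):
        u = u + inputs[t]                 # integration step
        fired = u > theta                 # fire when u exceeds theta
        spikes[t] = fired.astype(float)
        u = u - theta * fired             # soft reset: subtract theta
        potentials[t] = u
    return spikes, potentials

def surrogate_grad(u, theta=1.0, delta=0.3):
    """Triangular surrogate for d(spike)/d(u): peaks at u = theta,
    zero outside |u - theta| > theta."""
    return delta * np.maximum(0.0, 1.0 - np.abs((u - theta) / theta))

# A constant drive of 0.6 with theta = 1.0 fires every other step.
spikes, _ = if_forward(np.full((4, 1), 0.6))
```

The soft reset (subtracting $\theta$ rather than zeroing $u$) preserves the residual potential above threshold, which is what allows the accumulated membrane to track the rate of the underlying analog activation.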
III. PROPOSED BINARIZED SPIKING NEURAL NETWORK MODEL BASED ON THE IN-MEMORY XNOR ARRAY

This section proposes a MAC model for the accumulated bitwise product in B-SNN inference using the XNOR cell circuit, which is common in IMC, as shown in Algorithm 2.

Algorithm 2: Proposed B-SNN inference model for IMC implementation
 1: PARAMETERS: $w_{ij}^{b,l}$, $\hat{\theta}$, $M$, $M_1$, $\mu$, $\alpha$
 2: INPUT: $o_j^{t,l-1}$
 3: OUTPUT: $o_i^{t,l}$
Δ In-memory MAC computation
 4: for $t \leftarrow 1$ to $T$ do
 5:   for $j \leftarrow 1$ to $M$ do
 6:     $dot_i^{t,l} = \sum_{j=1}^{M} w_{ij}^{b,l} \oplus o_j^{t,l-1}$
 7:   end for
 8: end for
Δ Accumulation phase
 9: for $t \leftarrow 1$ to $T$ do
10:   $\hat{u}_i^{t,l} = \hat{u}_i^{t-1,l} + dot_i^{t,l} - [M_1 + \mu/\alpha]$
11: end for
Δ Firing & reset phase
12: for $t \leftarrow 1$ to $T$ do
13:   if $\hat{u}_i^{t,l} > \hat{\theta}$ then
14:     $o_i^{t,l} \leftarrow 1$
15:     $\hat{u}_i^{t,l} \leftarrow \hat{u}_i^{t,l} - \hat{\theta}$
16:   else
17:     $o_i^{t,l} \leftarrow 0$
18:     $\hat{u}_i^{t,l} = \hat{u}_i^{t-1,l} + dot_i^{t,l} - [M_1 + \mu/\alpha]$
19:   end if
20: end for

First, we analyze the case $f = 0$ in steps 11-12 of Algorithm 1. The IF model is now expressed as

$$\textbf{Integration: } u_i^{t,l} = u_i^{t-1,l} + \frac{\alpha}{\sigma}\left\{\sum_{j=1}^{M} w_{ij}^{b,l}\, o_j^{t,l-1} - \frac{\mu}{\alpha}\right\}$$
$$\textbf{Firing: } o_i^{t,l} = \begin{cases} 1, & \text{if } u_i^{t,l} > \theta \\ 0, & \text{otherwise} \end{cases} \quad (6)$$
$$\textbf{Reset: } u_i^{t,l} = u_i^{t,l} - \theta$$

To summarize (6): at every time-step, the membrane potential $u_i^{t,l}$ is accumulated with $\frac{\alpha}{\sigma}\{\sum_{j=1}^{M} w_{ij}^{b,l} o_j^{t,l-1} - \frac{\mu}{\alpha}\}$ and then compared with the threshold $\theta$ for the firing decision. After firing, the membrane potential is reduced by $\theta$ in the reset phase.

We can reformulate the integration phase in (6) as

$$\hat{u}_i^{t,l} = \hat{u}_i^{t-1,l} + \sum_{j=1}^{M} w_{ij}^{b,l}\, o_j^{t,l-1} - \frac{\mu}{\alpha} \quad (7)$$

In (7), $\hat{u}_i^{t,l}$ is the transformed membrane potential, and $\hat{\theta} = (\sigma \cdot \theta)/\alpha$ is the transformed threshold.

To compute the MAC operation $\sum_{j=1}^{M} w_{ij}^{b,l} o_j^{t,l-1}$, prior works [14] proposed separating the calculation into negative and positive weight phases. In detail, the $M$ weights $w_{ij}^{b,l}$ are divided into $M_1$ negative weights $w_{ij}^{-b}$ ($-1$) and $M_2$ positive weights $w_{ij}^{+b}$ ($+1$). Then, the MAC is separated into two sub-MAC operations, $\sum_{j=1}^{M_1} w_{ij}^{-b} o_j^{t,l-1}$ and $\sum_{j=1}^{M_2} w_{ij}^{+b} o_j^{t,l-1}$. The MAC product $s_i^{t,l}$ in a B-SNN synapse can be derived as follows, where $\oplus$ denotes the XNOR of the bit-encoded operands:

$$\begin{cases} -\displaystyle\sum_{j=1}^{M_1} |w_{ij}^{-b}| \wedge o_j^{t,l-1} = \sum_{j=1}^{M_1} \left(1 - o_j^{t,l-1}\right) - M_1 = \sum_{j=1}^{M_1} \left(1 + w_{ij}^{-b}\right) \oplus o_j^{t,l-1} - M_1 \\[1ex] \displaystyle\sum_{j=1}^{M_2} w_{ij}^{+b} \wedge o_j^{t,l-1} = \sum_{j=1}^{M_2} w_{ij}^{+b} \oplus o_j^{t,l-1} \end{cases} \quad (9)$$
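The XNOR-only MAC underlying Algorithm 2 can be checked numerically. The sketch below is our illustration (not the authors' code) of the identity that makes the accumulation phase work: with bipolar weights bit-encoded as $-1 \to 0$, $+1 \to 1$ and unipolar spikes $o_j \in \{0, 1\}$, the signed MAC equals the XNOR popcount minus the number of negative weights $M_1$, which is exactly the $M_1$ offset subtracted in step 10:

```python
import random

# Check numerically:  sum_j w_j * o_j  ==  sum_j XNOR(w_bit_j, o_j) - M1
# with bipolar weights w_j in {-1, +1} bit-encoded as -1 -> 0, +1 -> 1,
# unipolar spikes o_j in {0, 1}, and M1 = number of negative weights.

random.seed(0)
M = 64
w = [random.choice([-1, +1]) for _ in range(M)]   # bipolar weights
o = [random.choice([0, 1]) for _ in range(M)]     # unipolar spikes

# Reference: signed MAC as in eq. (7)
signed_mac = sum(wj * oj for wj, oj in zip(w, o))

# XNOR-array version as in Algorithm 2, steps 6 and 10
w_bits = [(wj + 1) // 2 for wj in w]              # -1 -> 0, +1 -> 1
dot = sum(1 - (wb ^ oj) for wb, oj in zip(w_bits, o))  # XNOR popcount
M1 = w.count(-1)

assert signed_mac == dot - M1
```

Intuitively, each negative weight contributes $1 - o_j$ matches instead of $-o_j$, inflating the popcount by exactly one per negative weight, hence the constant $M_1$ correction; this is why a plain XNOR array suffices even though spikes are unipolar.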