
SUBURAAJ R 18MVD0072

PROJECT TITLE: An SRAM-based implementation of a convolutional neural network

MOTIVATION:

The Extreme Learning Machine (ELM) is an efficient analytical technique for solving classification and
regression tasks, and ELM techniques are used in face recognition algorithms. However, ELM networks are
difficult to implement in hardware because of the all-to-all connectivity between the input and hidden
layer neurons, which drives up the hardware resource requirement and does so increasingly as the
hidden layer size grows.

The concept of using receptive fields (RF) for classification tasks originates from biology, in which
sensory neurons respond to a limited spatial range of the input stimulus. Applying this methodology
to a classification system improves its performance.

Since SRAM cells occupy less area than equivalent logic gates, an SRAM-based implementation is efficient in terms of
hardware resources.

LITERATURE SURVEY:

[1] Extreme learning machine: Theory and applications (ref-1)


 The Extreme Learning Machine (ELM) is an analytical technique for solving
classification and regression tasks in which the output layer weights are solved using
a single-step pseudo-inverse approach
 Algorithm ELM: Given a training set {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, …, N}, an activation
function g(x), and a hidden node number M:
 Step 1: Randomly assign input weights w_i and biases b_i, i = 1, …, M.
 Step 2: Calculate the hidden layer output matrix H.
 Step 3: Calculate the output weights β = H†T, where H† is the Moore–Penrose pseudo-inverse of H and T is the target matrix.
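
As an illustration of this single-step training, the following is a minimal Python/NumPy sketch (not taken from the surveyed paper; the tanh activation and all variable names are assumptions made for this example):

    import numpy as np

    def elm_train(X, T, M, rng=np.random.default_rng(0)):
        """Basic ELM training: fixed random input weights, pseudo-inverse output weights.
        X : (N, n) training inputs, T : (N, m) targets, M : number of hidden nodes."""
        n = X.shape[1]
        W = rng.standard_normal((n, M))   # Step 1: random input weights (never trained)
        b = rng.standard_normal(M)        # Step 1: random hidden biases
        H = np.tanh(X @ W + b)            # Step 2: hidden layer output matrix (N, M)
        beta = np.linalg.pinv(H) @ T      # Step 3: output weights via the pseudo-inverse
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta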
[2] Online and adaptive pseudo-inverse solutions for ELM weights (ref-7)
 Iterative methods for calculating the output weights in ELM networks, such as
the Online Pseudo-inverse Update Method (OPIUM), avoid the problem of learning
from very large datasets, for which the memory needed to perform the batch pseudo-
inverse operation would otherwise become too large.
 These online methods allow the application of ELMs to non-stationary datasets.
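
A minimal sketch of such an online update, written here as a recursive-least-squares style rule of the kind OPIUM belongs to (the exact formulation and any regularisation used in the paper are not reproduced; names are assumptions):

    import numpy as np

    class OnlineELMWeights:
        """Update the ELM output weights one sample at a time, so the full
        hidden-layer matrix H never has to be stored or pseudo-inverted."""

        def __init__(self, M, m, c=1e3):
            self.beta = np.zeros((M, m))   # output weights
            self.P = np.eye(M) * c         # running estimate of (H^T H)^-1

        def update(self, h, t):
            """h : (M,) hidden activations, t : (m,) target for one sample."""
            Ph = self.P @ h
            k = Ph / (1.0 + h @ Ph)                       # gain vector
            self.beta += np.outer(k, t - h @ self.beta)   # correct the prediction error
            self.P -= np.outer(k, Ph)                     # shrink the inverse-correlation estimate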
[3] A Low Power Trainable Neuromorphic Integrated Circuit That Is Tolerant to Device Mismatch (ref-8)
 The proposed Trainable Analogue Block (TAB) harnesses device mismatch as a means of
randomly projecting the input to a higher-dimensional space.
 The TAB framework is inspired by the principles of neural population coding
operating in the biological nervous system.
 Architecture of the TAB framework:
 The connections from the input layer neurons/nodes to the non-linear hidden
neurons are via random weights and controllable offsets, O1 to OM .
 The hidden layer neurons are connected linearly to the output layer neurons via
trainable weights.
 The output neurons compute a linearly weighted sum of the hidden layer values.
 TAB is tolerant to random device mismatch (fixed-pattern mismatch) and variability
in the fabrication process.
 Hidden Neuron Block: Schematic of the hidden neuron block that implements the
sigmoid nonlinear activation function for the TAB framework.
 These transistors operate in the weak-inversion region, with
the slope factor (n) ranging from 1.1 to 1.5.
 The current I1 is copied to Ihid via a current mirror, which
acts as a sigmoid tuning curve for a hidden neuron. The
voltage Vb at transistor M3 sets the bias current
IM3 (a few nanoamperes), which is equal to the sum of I1 and I2.
 In the TAB, each neuron has a distinct tuning curve
depending on the process variations such as offset mismatch
between the transistors in the differential pairs, bias current
mismatch due to variability in M3 and current mirror
mismatch.

 Output Weight Block: Schematic of the output weight block, comprising a splitter
circuit wherein MR and the two M2R transistors form an R2R network, which gets
repeated 13 times in the block. The octave splitter is
terminated with a single MR transistor.

 The output from the hidden neuron block, Ihid, is the input
current for the output weight block.
 Ihid is divided successively to form a geometrically-spaced
series of smaller currents.
 There are a total of N stages in the splitter circuit. The
current at the kth stage is given by Ihid / 2^k.
 The master bias voltage Vgbias is the reference voltage for the p-FET gates in the
splitter.
 Two transistor switches in the lower half of the circuit route the branch current to
either useful current, Igood, or to current that goes to ground, Idump.
 Igood is mirrored to generate a current, Iout, which is further routed to currents
Ipos (positive current) or Ineg (negative current), as determined by the signW signal.
 The signW signal, stored in a flip-flop, indicates the polarity of the output weight
connected between the hidden neuron and the output neuron.
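
A minimal behavioural model of this signed splitter weight (the bit-per-stage switch encoding used below is an assumption for illustration; the real block operates on analogue currents):

    def splitter_weight_current(i_hid, switch_bits, sign_w):
        """Model an N-stage binary current splitter driving a signed output weight.
        i_hid       : input current from the hidden neuron block
        switch_bits : 0/1 per stage; stage k carries i_hid / 2^k and is routed
                      either to Igood (1) or to Idump, i.e. ground (0)
        sign_w      : +1 or -1, the stored signW polarity of the weight"""
        i_good = sum(b * i_hid / 2 ** k for k, b in enumerate(switch_bits, start=1))
        return sign_w * i_good   # Igood is mirrored to Iout, then steered to Ipos or Ineg

    # Example: a 13-stage splitter with only the first two branches enabled
    # passes (1/2 + 1/4) of Ihid as a negative weight contribution.
    print(splitter_weight_current(1.0, [1, 1] + [0] * 11, -1))   # -0.75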
[4] A neuromorphic hardware framework based on population coding (ref-9)
 Biological neurons encode input stimuli such as motion, position, colours, and sound
into neuronal firing. The encoded information is represented by a set of neurons in a
collective and distributed manner, referred to as population coding.
 A Trainable Analogue Block (TAB) is designed that encodes a given input stimulus using a
large population of neurons with heterogeneous tuning curve profiles.
 Heterogeneity of tuning curves is achieved using random device mismatches and by
adding a systematic offset to each hidden neuron.

 Hidden Neuron: the currents I1 and I2 are a function of the differential input voltage
between Vin (ramp input) and Vref (constant input), and their difference follows the
mathematical tanh function.
 Heterogeneity improves the information encoded in the neuronal population activity.
 The systematic offset ensures that all tuning curves are distinct and independent, thus
improving the encoding capacity of the system.
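
A minimal numerical sketch of such a heterogeneous tanh tuning-curve population (the gain and offset distributions below are assumptions chosen for illustration):

    import numpy as np

    def population_activity(v_in, n_neurons=64, rng=np.random.default_rng(1)):
        """tanh tuning curves whose gains and offsets differ per neuron:
        random device mismatch plus a systematic per-neuron offset."""
        gain = rng.normal(1.0, 0.2, n_neurons)          # mismatch in the differential pair
        offset = np.linspace(-1.0, 1.0, n_neurons)      # systematic offsets O1..OM
        offset += rng.normal(0.0, 0.05, n_neurons)      # additional random offset mismatch
        # Each column is one neuron's tuning curve over the input sweep.
        return np.tanh(gain * (v_in[:, None] - offset))

    v_sweep = np.linspace(-2.0, 2.0, 200)   # ramp input Vin relative to Vref
    H = population_activity(v_sweep)        # (200, 64) heterogeneous population response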
[5] Local Receptive Fields Based Extreme Learning Machine (ref-10)
 The local receptive fields based ELM (ELM-LRF) is used in image processing applications.
 The connections between input and hidden nodes are sparse and bounded by a receptive field,
which is sampled from a continuous probability distribution.

 Combinatorial nodes are used to provide translational invariance to the network by
combining several hidden nodes together.
 The method involves no gradient-descent steps and the training is remarkably efficient.

[6] Fast, simple and accurate handwritten digit classification by training shallow neural network
classifiers with the `extreme learning machine' algorithm (ref-11)
 The main innovation is to ensure each hidden-unit operates only on a randomly sized
and positioned patch of each image.
 This form of random `receptive field' sampling of the input ensures the input weight
matrix is sparse, with about 90% of weights equal to zero.
 The algorithm for generating these `receptive fields' is as follows (a minimal sketch follows the list):
 Generate a random input weight matrix W.
 For each of the M hidden units, select two pairs of distinct random integers from {1, 2, …, √L} to
form the co-ordinates of the rectangular mask.
 If any mask has a total area smaller than some value q, discard it and repeat.
 Set the entries of a √L × √L square matrix that are defined by the two pairs of integers
to 1, and all other entries to zero.
 Flatten each receptive field matrix into a length-L vector where each entry corresponds
to the same pixel as the entry in the data vectors Xtest or Xtrain.
 Concatenate the resulting M vectors into a receptive field matrix F of size M × L.
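
A minimal Python/NumPy sketch of this mask-generation procedure (the minimum area q and the image size are assumptions for illustration):

    import numpy as np

    def receptive_field_matrix(M, L, q=10, rng=np.random.default_rng(2)):
        """Build the M x L matrix F of random rectangular receptive-field masks
        for sqrt(L) x sqrt(L) images."""
        side = int(np.sqrt(L))
        F = np.zeros((M, L))
        for i in range(M):
            while True:
                r1, r2 = sorted(rng.choice(side, size=2, replace=False))   # row extent
                c1, c2 = sorted(rng.choice(side, size=2, replace=False))   # column extent
                if (r2 - r1 + 1) * (c2 - c1 + 1) >= q:                     # reject masks smaller than q
                    break
            mask = np.zeros((side, side))
            mask[r1:r2 + 1, c1:c2 + 1] = 1.0
            F[i] = mask.ravel()          # flatten so entries line up with the flattened pixels
        return F

    # Sparse input weights: a dense random W masked elementwise by F.
    M, L = 1000, 28 * 28
    W_sparse = np.random.default_rng(3).standard_normal((M, L)) * receptive_field_matrix(M, L)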
[7] Gradient-Based Learning Applied to Document Recognition:
 Given an appropriate network architecture, gradient-based learning algorithms can be
used to classify high-dimensional patterns, such as handwritten characters, with minimal
pre-processing.
 This paper reviews various methods applied to handwritten character recognition and
compares them on a standard handwritten digit recognition task.
[8] Convolutional neural networks are specifically designed to deal with the variability of
two-dimensional (2-D) shapes.
[9] A neuromorphic hardware architecture using the Neural Engineering Framework for pattern recognition (ref-13)
 System Topology: The inputs are the
pixels; they are connected to a higher-
dimensional hidden layer with 8k neurons,
using randomly weighted connections.
 The output layer consists of linear
neurons and the output layer weights are
solved analytically using the pseudo-inverse
operation.

 A Typical NEF Network: The stimulus X(t) is encoded into a large number N of nonlinear
hidden layer neurons using randomly initialised connection weights.
 The output of the system, Y(t), is the linear
sum of the weighted spike trains from the
hidden neurons.
 FPGA Implementation of Architecture: a time-multiplexing approach provides a high-speed
digital circuit (a behavioural sketch is given at the end of this entry).
 The encoder and the hidden layer are both
implemented with the time multiplexing
approach
 The global counter processes the time-multiplexed (TM) encoders and neurons
sequentially.
 The decoding weights of the physical neuron are stored in the weight buffer.
 In every clock cycle, a TM encoder will generate the stimulus for an input digit, and
the corresponding TM neuron will generate a firing rate with that stimulus and then
multiply it with the decoding weights.
 Pipe-lined Design
 Physical encoder:
 It consists of an input buffer, a global
counter, 49 random weight (RW) generators
(each implemented with a 20-bit LFSR), 196 2-
input multiplexers and an accumulator.
 The accumulator module is a 2-stage pipeline, which consists of fourteen 14-input 5-
bit parallel adders and one 14-input 9-bit parallel adder.
 Pipe-lined Design
 Physical Neuron:
 The neuron is implemented with three identical 9-bit multipliers.
 The multiplier’s inputs A and B are 9 bits wide and the output result is 18 bits wide.
 All three multipliers need four clock cycles to process the algorithm.
 Since it is a pipelined design, the output of each TM neuron
is updated only once in its time slot (with a latency of four clock
cycles).
 The memory required by the decoding weights is linearly proportional to the size of the
hidden layer.
 The pixels of the input digits were converted to binary values in software, and a
Python-based front-end client sent the selected test digit to the FPGA via the JTAG interface.
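
The behavioural sketch referred to above, modelling the time-multiplexed encoder and neuron (the LFSR tap positions, the ±1 pixel weighting, and the rectified-linear rate function are all assumptions; the paper only specifies a 20-bit LFSR per random-weight generator and NEF-style neurons):

    LFSR_MASK = (1 << 20) - 1

    def lfsr20_step(state):
        """One step of a 20-bit Fibonacci LFSR (assumed taps 20 and 17)."""
        feedback = ((state >> 19) ^ (state >> 16)) & 1
        return ((state << 1) | feedback) & LFSR_MASK

    def tm_pass(pixels, decoding_weights, seed=0x000B5):
        """One pass of the time-multiplexed pipeline: each virtual neuron encodes
        the binary pixels with LFSR-derived random weights, produces a firing rate,
        multiplies it by its decoding weight and adds it to the output."""
        state, output = seed, 0.0
        for w in decoding_weights:          # one iteration per virtual (TM) neuron
            drive = 0
            for p in pixels:                # encoder: random +/-1 weight per pixel
                state = lfsr20_step(state)
                drive += p if (state & 1) else -p
            rate = max(drive, 0)            # TM neuron: stand-in rate function
            output += rate * w              # multiply by the stored decoding weight
        return output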

[10] Human face recognition based on multidimensional PCA and extreme learning machine:
 A new human face recognition algorithm based on bidirectional two dimensional
principal component analysis
(B2DPCA) and extreme learning
machine (ELM) is introduced.
 Images from each database are converted into grayscale images.
 Curvelet transform is used
to generate initial feature vectors.
 The subband that exhibits the highest standard deviation is selected as the initial feature
vector of size U × V.
 B2DPCA is used to generate unique feature sets and to minimize computational
complexity.
 Dimensionally reduced curvelet feature sets are randomly selected for training of an
ELM.
 The remaining features of the same dataset are used to evaluate the learned classifier
(a sketch of this pipeline follows).
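
A minimal sketch of the feature-reduction step in this pipeline (the curvelet decomposition itself is assumed to be available from an external library and is not reproduced; dimensions and names are assumptions):

    import numpy as np

    def b2dpca(subbands, u, v):
        """Bidirectional 2D PCA: reduce each U0 x V0 subband image to a u x v feature.
        subbands : (N, U0, V0) array holding, for each sample, the curvelet subband
                   with the highest standard deviation."""
        A = subbands - subbands.mean(axis=0)                  # centre the images
        G_col = np.einsum('nij,nik->jk', A, A) / len(A)       # column-direction covariance (V0 x V0)
        G_row = np.einsum('nji,nki->jk', A, A) / len(A)       # row-direction covariance (U0 x U0)
        Vc = np.linalg.eigh(G_col)[1][:, -v:]                 # top v column eigenvectors
        Ur = np.linalg.eigh(G_row)[1][:, -u:]                 # top u row eigenvectors
        return np.einsum('ju,njk,kv->nuv', Ur, subbands, Vc)  # features Ur^T A_i Vc

    # The flattened u*v features are then split: a random subset trains the ELM
    # (e.g. with elm_train from the earlier sketch) and the rest evaluates it.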
[11] A compact neural core for digital implementation of
the Neural Engineering Framework:
 The digital neural core consists of 64 neurons that are instantiated by a single
physical neuron using a time-multiplexing approach.
 Inputs A and B are 9 bits wide and the output result is 18 bits wide.
 Result is obtained after four clock cycles.
 Summing the results of all 64 virtual neurons reconstructs the input stimulus (see the sketch below).
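
A minimal fixed-point sketch of this shared multiplier and the final summation (the signed 9-bit operand range is an assumption consistent with the stated widths):

    def q9_multiply(a, b):
        """Multiply two signed 9-bit operands; the product always fits in 18 bits."""
        assert -256 <= a <= 255 and -256 <= b <= 255    # signed 9-bit range
        result = a * b
        assert -(1 << 17) <= result < (1 << 17)         # signed 18-bit range
        return result

    def decode_stimulus(activities, decoding_weights):
        """Time-multiplex one physical multiplier over the 64 virtual neurons and
        sum the weighted results to recover the represented input stimulus."""
        return sum(q9_multiply(a, w) for a, w in zip(activities, decoding_weights))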

Tools used for implementation of work:

 Quartus II 13.0sp1
 Modelsim-Altera 10.1d
Timeline Chart:
05.09.2019 – 30.11.2019 Implementation of Existing Architecture
1.12.2019 – 22.3.2020 Implementation of Modified Architecture
23.3.2020 – 31.1.2020 Documentation
