
Classification through

Artificial Neural Networks


Overview of the presentation
• Brief History & Applications of Artificial Neural Networks (ANN)
• Feed-forward ANN
• Learning the parameters of an ANN through Backpropagation
Artificial Neural Networks
• Artificial Neural Networks (ANN) were developed in the 1980s
• The back-propagation algorithm was popularized in the mid-1980s by Geoffrey Hinton and his collaborators
• The ANN formulation is inspired by the human brain
• Presently, neural networks (particularly deep feedforward networks) are an active area of research
• ANNs find applications in Natural Language Processing, image processing, training of self-driving cars, etc.
Structure of Feed Forward ANN
Feed Forward ANN
• A feedforward neural network is an artificial neural network wherein
connections between the units do not form a cycle.
• The figure on the right depicts the calculations done at each node of the ANN; a code sketch of this forward computation is given below
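As a supplement to the figure, here is a minimal sketch (not part of the original slides) of how an input is propagated through a feed-forward network: each layer computes a net input as a weighted sum plus bias and passes it through an activation function. The names `forward_pass`, `weights`, and `biases` are illustrative.

```python
import numpy as np

def forward_pass(x, weights, biases, activation):
    """Propagate input vector x through a feed-forward network.

    weights[l] is the weight matrix of layer l, biases[l] its bias vector.
    Because connections never form a cycle, a single left-to-right sweep
    over the layers suffices.
    """
    o = x
    for W, b in zip(weights, biases):
        net = W @ o + b       # net input of every unit in this layer
        o = activation(net)   # unit outputs after the activation function
    return o
```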
Activation Function
• Each unit in the hidden and output layers takes its net input and then
applies an activation function to it.
• The function symbolizes the activation of the neuron represented by
the unit.
• The sigmoid function $\sigma(x) = \dfrac{1}{1 + e^{-x}}$ is one of the most commonly used activation functions.
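A short sketch of the sigmoid and its derivative, assuming NumPy; the derivative $\sigma(x)(1 - \sigma(x))$ is what later appears as the $O_j(1 - O_j)$ factor in the error formulas.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation: 1 / (1 + exp(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """sigma'(x) = sigma(x) * (1 - sigma(x)), i.e. O * (1 - O) for output O."""
    s = sigmoid(x)
    return s * (1.0 - s)
```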
Feed Forward ANN
• Mathematically, an ANN is a function f(x) that is a composition of other functions (linear and non-linear), all of which are continuous and differentiable
• So, how do we find adequate values of the parameters of the ANN?
• One such technique is the back-propagation algorithm
• The continuity and differentiability of these functions is required by the back-propagation algorithm
Intuition of Back Propagation Algorithm
• Generally, the loss function is defined as the sum of squared errors over the output units:
$$L(\bar{x}) = \frac{1}{2} \sum_j (T_j - O_j)^2$$
where $T_j$ is the target value and $O_j$ the output of unit $j$ for the training instance $\bar{x}$
• The partial derivatives of the loss function with respect to the weight parameters are calculated using the chain rule:
$$\Delta w_i = \frac{\partial L}{\partial w_i}$$
• Then the weight parameters are updated to reduce the loss function
$$w_i^{t+1} = w_i^{t} - \alpha \, \Delta w_i^{t}$$
where α is called the learning rate.
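Purely as an illustration (the slide gives no numbers), one gradient-descent step for a single sigmoid unit under the squared-error loss; all values below are made up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -1.0])   # one training instance (illustrative values)
T = 1.0                      # target output
w = np.array([0.1, 0.2])     # current weights
alpha = 0.5                  # learning rate

O = sigmoid(w @ x)                      # network output for this instance
L = 0.5 * (T - O) ** 2                  # squared-error loss
delta_w = -(T - O) * O * (1 - O) * x    # chain rule: dL/dw
w = w - alpha * delta_w                 # update step; reduces L for small alpha
```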
Back Propagation Algorithm Details
• Given a training instance and some initial weights and bias parameters, the output of the ANN is computed
• Then, for a unit j in the output layer, the error $Err_j$ is computed as
$$Err_j = O_j (1 - O_j)(T_j - O_j)$$
where $O_j$ is the actual output of unit j and $T_j$ is the known target value for the given training tuple. Note that $O_j(1 - O_j)$ is the derivative of the logistic function.
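A small sketch of that computation for a whole output layer at once, assuming `O` and `T` are NumPy arrays of unit outputs and target values.

```python
import numpy as np

def output_layer_errors(O, T):
    """Err_j = O_j * (1 - O_j) * (T_j - O_j) for every output unit j."""
    return O * (1.0 - O) * (T - O)

# Illustrative values only
O = np.array([0.8, 0.3])                 # actual outputs
T = np.array([1.0, 0.0])                 # known targets
print(output_layer_errors(O, T))         # [ 0.032 -0.063]
```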
Back Propagation Algorithm
• The error of a hidden-layer unit j is
$$Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}$$
where $w_{jk}$ is the weight of the connection from unit j to a unit k in the next higher layer, and $Err_k$ is the error of unit k.
• Weights are updated by the following equations, where $\Delta w_{ij}$ is the change in weight $w_{ij}$:
$$\Delta w_{ij} = \alpha \, Err_j \, O_i$$
$$w_{ij} = w_{ij} + \Delta w_{ij}$$
where $\alpha$ is the learning rate, a constant typically having a value between 0 and 1.
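A sketch of these two rules in NumPy, under the assumption that `W_next[j, k]` stores the weight $w_{jk}$ from hidden unit j to unit k of the next higher layer and `W[i, j]` the weight from unit i to unit j.

```python
import numpy as np

def hidden_layer_errors(O_hidden, Err_next, W_next):
    """Err_j = O_j * (1 - O_j) * sum_k Err_k * w_jk, with W_next[j, k] = w_jk."""
    return O_hidden * (1.0 - O_hidden) * (W_next @ Err_next)

def update_weights(W, O_prev, Err, alpha):
    """Delta w_ij = alpha * Err_j * O_i, then w_ij += Delta w_ij."""
    return W + alpha * np.outer(O_prev, Err)
```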
Back Propagation Algorithm
• Biases are updated by the following equations, where $\Delta\theta_j$ is the change in bias $\theta_j$:
$$\Delta\theta_j = \alpha \, Err_j$$
$$\theta_j = \theta_j + \Delta\theta_j$$
• If the learning rate is too small, then learning will occur at a very slow
pace. If the learning rate is too large, then oscillation between
inadequate solutions may occur. A rule of thumb is to set the learning
rate to 1/t, where t is the number of iterations through the training
set so far.
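A sketch of the bias update together with the 1/t rule of thumb for the learning rate; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def update_biases(theta, Err, alpha):
    """Delta theta_j = alpha * Err_j, then theta_j += Delta theta_j."""
    return theta + alpha * Err

def learning_rate(t):
    """Rule of thumb: alpha = 1/t, where t is the number of iterations
    through the training set made so far."""
    return 1.0 / max(t, 1)

theta = np.array([0.1, -0.2])            # illustrative biases
Err = np.array([0.032, -0.063])          # errors of the corresponding units
theta = update_biases(theta, Err, learning_rate(3))
```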
Back Propagation Algorithm
• Alternatively, the weight and bias increments could be accumulated
in variables, so that the weights and biases are updated after all of
the tuples in the training set have been presented. This latter strategy
is called epoch updating, where one iteration through the training set
is an epoch.
• In theory, the mathematical derivation of back-propagation employs epoch updating, yet in practice case updating (updating the weights and biases after each training tuple) is more common because it tends to yield more accurate results.
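A toy training loop contrasting the two strategies; `grad_fn` is a hypothetical callable standing in for the per-tuple back-propagation increments described above.

```python
import numpy as np

def train(examples, grad_fn, params, alpha, epochs, epoch_updating=False):
    """Sketch of case updating vs. epoch updating over a flat parameter vector."""
    for _ in range(epochs):
        accumulated = np.zeros_like(params)
        for x, t in examples:
            increment = alpha * grad_fn(params, x, t)
            if epoch_updating:
                accumulated += increment     # defer: accumulate over the epoch
            else:
                params = params + increment  # case updating: apply per tuple
        if epoch_updating:
            params = params + accumulated    # one combined update per epoch
    return params
```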
Back Propagation Algorithm
• Terminating condition: Training stops when
1. All ∆wij in the previous epoch were so small as to be below some
specified threshold
2. The percentage of training instances misclassified in the previous
epoch is below some threshold
3. A pre-specified number of epochs has expired.
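One way (a sketch, not from the slides) to combine the three stopping criteria; the threshold values are placeholders.

```python
def should_stop(delta_ws, misclassified_frac, epoch,
                w_threshold=1e-4, err_threshold=0.01, max_epochs=500):
    """Stop when all weight changes are tiny, the training error is low
    enough, or a pre-specified number of epochs has been reached."""
    all_small = all(abs(dw) < w_threshold for dw in delta_ws)
    return all_small or misclassified_frac < err_threshold or epoch >= max_epochs
```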
Example
• Recognition of handwritten digits of the kind shown in the figure below.
• The grey level at each mini-square of the 20x20 pixel image of a digit is fed into the ANN.
• The number of units in the single hidden layer is taken to be around 500.
• The ANN classifies the digit images with very high accuracy.
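A minimal sketch of such an experiment using scikit-learn's MLPClassifier on the MNIST data (referenced below); the hidden-layer size, activation, and iteration count are assumptions for illustration, not the exact setup behind the slide.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# MNIST images are 28x28 grey-level pixels (the digit itself fits a 20x20 box).
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0                                   # scale grey levels to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000)

# Single hidden layer of about 500 units with sigmoid ("logistic") activations.
clf = MLPClassifier(hidden_layer_sizes=(500,), activation="logistic", max_iter=20)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```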
References
• Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber
• The MNIST data of handwritten digits can be downloaded from
http://yann.lecun.com/exdb/mnist/
