Introduction, or how the brain works
• A neural network can be defined as a model of
reasoning based on the human brain.
• The brain consists of a densely interconnected set of
nerve cells, or basic information-processing units,
called neurons.
• The human brain incorporates nearly 10 billion neurons
and 60 trillion connections, synapses, between them.
• By using multiple neurons simultaneously, the brain
can perform its functions much faster than the fastest
computers in existence today.
• Each neuron has a very simple structure, but an army
of such elements constitutes tremendous processing
power.
Biological neural network
Figure: Each neuron has a soma (cell body), dendrites that receive incoming signals, and an axon that carries the outgoing signal; synapses connect the axon of one neuron to the dendrites of others.
• Our brain can be considered a highly complex, non-linear
and parallel information-processing system.
• Information is stored and processed in a neural
network simultaneously throughout the whole
network, rather than at specific locations. In other
words, in neural networks, both data and its processing
are global rather than local.
• Learning is a fundamental and essential characteristic
of biological neural networks. The ease with which
they can learn led to attempts to emulate a biological
neural network in a computer.
• An artificial neural network consists of a number of very
simple processors, also called neurons, which are
analogous to the biological neurons in the brain.
• The neurons are connected by weighted links passing
signals from one neuron to another.
• The output signal is transmitted through the neuron’s
outgoing connection.
• The outgoing connection splits into a number of
branches that transmit the same signal. The outgoing
branches terminate at the incoming connections of other
neurons in the network.
How do Neural Networks Work?
• Neural networks learn by adjusting the weights
of connections between neurons in response to
input data.
• They can be trained using a variety of learning
algorithms, including backpropagation,
reinforcement learning, and unsupervised
learning.
Figure: Architecture of a typical artificial neural network: input signals enter the input layer, pass through a middle layer, and leave as output signals from the output layer.
Analogy between biological and artificial neural networks

| Biological neural network | Artificial neural network |
|---|---|
| Soma | Neuron |
| Dendrite | Input |
| Axon | Output |
| Synapse | Weight |
Figure: Model of a single neuron: inputs x1, x2, …, xn arrive through weighted links w1, w2, …, wn, and the neuron produces the output signal Y.
The neuron computes the weighted sum of the input signals and
compares the result with a threshold value, θ. If the net input is
less than the threshold, the neuron output is -1. But if the net
input is greater than or equal to the threshold, the neuron
becomes activated, and its output attains the value +1.

The neuron uses the following transfer or activation function:

$$X = \sum_{i=1}^{n} x_i w_i, \qquad Y = \begin{cases} +1, & \text{if } X \ge \theta \\ -1, & \text{if } X < \theta \end{cases}$$
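In Python, this computation can be sketched as follows (the input values, weights, and threshold below are illustrative, not taken from the text):

```python
def sign_neuron(inputs, weights, theta):
    """Weighted sum of the inputs compared against the threshold theta:
    output +1 if the net input X reaches theta, otherwise -1."""
    X = sum(x * w for x, w in zip(inputs, weights))  # net input
    return 1 if X >= theta else -1

# Illustrative values, not from the text
print(sign_neuron([1, 1], [0.3, 0.1], 0.2))  # X = 0.4 >= 0.2 -> 1
print(sign_neuron([0, 1], [0.3, 0.1], 0.2))  # X = 0.1 <  0.2 -> -1
```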
Activation functions of a neuron
Figure: Four common activation functions, each plotting the output Y against the net input X on a scale from -1 to +1: the step, sign, sigmoid, and linear functions.
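A small Python sketch of these four functions (the definitions follow the standard step, sign, sigmoid, and linear forms assumed from the figure):

```python
import math

def step(x):    return 1 if x >= 0 else 0         # hard limiter: 0 or 1
def sign(x):    return 1 if x >= 0 else -1        # bipolar hard limiter: -1 or +1
def sigmoid(x): return 1 / (1 + math.exp(-x))     # smooth curve in (0, 1)
def linear(x):  return x                          # output equals the net input

for f in (step, sign, sigmoid, linear):
    print(f.__name__, [round(f(x), 3) for x in (-2, 0, 2)])
```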
Can a single neuron learn a task?
Single-layer two-input perceptron
Figure: Inputs x1 and x2, with weights w1 and w2, feed a linear combiner; a hard limiter with threshold θ then produces the output Y.
The Perceptron
The aim of the perceptron is to classify inputs,
x1, x2, . . ., xn, into one of two classes, say
A1 and A2.
Linear separability in the perceptrons
Figure: The perceptron divides the input space (x1, x2) with a straight line into two decision regions, Class A1 and Class A2; the dividing line is defined by x1w1 + x2w2 - θ = 0.
If at iteration p, the actual output is Y(p) and the desired output
is Yd(p), then the error is given by:

$$e(p) = Y_d(p) - Y(p) \qquad \text{where } p = 1, 2, 3, \ldots$$
The perceptron learning rule

$$w_i(p+1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p)$$

where p = 1, 2, 3, . . .
α is the learning rate, a positive constant less than unity.
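Expressed as code, a single application of the rule might look like this (function and variable names are illustrative):

```python
def update_weights(weights, inputs, error, alpha):
    """One application of the perceptron learning rule:
    w_i(p+1) = w_i(p) + alpha * x_i(p) * e(p)."""
    return [w + alpha * x * error for w, x in zip(weights, inputs)]
```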
Perceptron’s training algorithm
Step 1: Initialisation
Set initial weights w1, w2, …, wn and threshold θ
to random numbers in the range [-0.5, 0.5].
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p)
and desired output Yd(p). Calculate the actual output at
iteration p = 1:

$$Y(p) = \mathrm{step}\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right]$$

where n is the number of the perceptron inputs, and step is a
step activation function.
Step 3: Weight training
Update the weights of the perceptron

$$w_i(p+1) = w_i(p) + \Delta w_i(p)$$

where Δwi(p) is the weight correction at iteration p.
The weight correction is computed by the delta
rule:

$$\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)$$
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and
repeat the process until convergence.
Example of perceptron learning: the logical operation AND

| Epoch | Inputs x1 x2 | Desired output Yd | Initial weights w1, w2 | Actual output Y | Error e | Final weights w1, w2 |
|---|---|---|---|---|---|---|
| 1 | 0 0 | 0 | 0.3, -0.1 | 0 | 0 | 0.3, -0.1 |
|   | 0 1 | 0 | 0.3, -0.1 | 0 | 0 | 0.3, -0.1 |
|   | 1 0 | 0 | 0.3, -0.1 | 1 | -1 | 0.2, -0.1 |
|   | 1 1 | 1 | 0.2, -0.1 | 0 | 1 | 0.3, 0.0 |
| 2 | 0 0 | 0 | 0.3, 0.0 | 0 | 0 | 0.3, 0.0 |
|   | 0 1 | 0 | 0.3, 0.0 | 0 | 0 | 0.3, 0.0 |
|   | 1 0 | 0 | 0.3, 0.0 | 1 | -1 | 0.2, 0.0 |
|   | 1 1 | 1 | 0.2, 0.0 | 1 | 0 | 0.2, 0.0 |
| 3 | 0 0 | 0 | 0.2, 0.0 | 0 | 0 | 0.2, 0.0 |
|   | 0 1 | 0 | 0.2, 0.0 | 0 | 0 | 0.2, 0.0 |
|   | 1 0 | 0 | 0.2, 0.0 | 1 | -1 | 0.1, 0.0 |
|   | 1 1 | 1 | 0.1, 0.0 | 0 | 1 | 0.2, 0.1 |
| 4 | 0 0 | 0 | 0.2, 0.1 | 0 | 0 | 0.2, 0.1 |
|   | 0 1 | 0 | 0.2, 0.1 | 0 | 0 | 0.2, 0.1 |
|   | 1 0 | 0 | 0.2, 0.1 | 1 | -1 | 0.1, 0.1 |
|   | 1 1 | 1 | 0.1, 0.1 | 1 | 0 | 0.1, 0.1 |
| 5 | 0 0 | 0 | 0.1, 0.1 | 0 | 0 | 0.1, 0.1 |
|   | 0 1 | 0 | 0.1, 0.1 | 0 | 0 | 0.1, 0.1 |
|   | 1 0 | 0 | 0.1, 0.1 | 0 | 0 | 0.1, 0.1 |
|   | 1 1 | 1 | 0.1, 0.1 | 1 | 0 | 0.1, 0.1 |

Threshold: θ = 0.2; learning rate: α = 0.1
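The following Python sketch reproduces this training run; it assumes the step activation used in the table (output 1 when the net input reaches the threshold θ, else 0) and the initial weights w1 = 0.3, w2 = -0.1:

```python
def train_perceptron(samples, w, theta=0.2, alpha=0.1, max_epochs=100):
    """Train a two-input perceptron with the delta rule until an epoch
    passes with no errors. `samples` holds (x1, x2, desired) triples."""
    for epoch in range(1, max_epochs + 1):
        converged = True
        for x1, x2, yd in samples:
            y = 1 if x1 * w[0] + x2 * w[1] >= theta else 0  # step activation
            e = yd - y                                      # error e(p)
            if e != 0:
                converged = False
                # delta rule; rounding avoids float drift so the run matches the table
                w = [round(w[0] + alpha * x1 * e, 10),
                     round(w[1] + alpha * x2 * e, 10)]
        if converged:
            return w, epoch
    return w, max_epochs

# Logical AND with the initial weights from the table above
and_samples = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
weights, epochs = train_perceptron(and_samples, [0.3, -0.1])
print(weights, epochs)  # converges to w1 = 0.1, w2 = 0.1 in epoch 5
```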
Two-dimensional plots of basic logical operations
Figure: Input space (x1, x2) for the basic logical operations, with points at the corners (0, 0), (0, 1), (1, 0) and (1, 1); AND and OR can each be separated by a single straight line, while Exclusive-OR cannot.
Multilayer perceptron with two hidden layers
Figure: Network with an input layer, a first hidden layer, a second hidden layer, and an output layer.
What does the middle layer hide?
A hidden layer “hides” its desired output. Neurons in the
hidden layer cannot be observed through the input/output
behaviour of the network.
In a back-propagation neural network, the learning
algorithm has two phases.
First, a training input pattern is presented to the
network input layer. The network propagates the
input pattern from layer to layer until the output
pattern is generated by the output layer.
If this pattern is different from the desired output,
an error is calculated and then propagated
backwards through the network from the output
layer to the input layer. The weights are modified
as the error is propagated.
Three-layer back-propagation neural network
Figure: Inputs x1, x2, …, xn enter the input layer; weights wij connect input neuron i to hidden neuron j, and weights wjk connect hidden neuron j to output neuron k, producing outputs y1, y2, …, yl. Input signals propagate forward through the network, and error signals propagate backward.
The back-propagation training algorithm
• Step 1: Initialisation
• Set all the weights and threshold levels of the
network to random numbers uniformly
distributed inside a small range:
$$\left(-\frac{2.4}{F_i},\ +\frac{2.4}{F_i}\right)$$
• where Fi is the total number of inputs of
neuron i in the network. The weight
initialisation is done on a neuron-by-neuron
basis.
Step 2: Activation
Activate the back-propagation neural network by
applying inputs x1(p), x2(p),…, xn(p) and desired
outputs yd,1(p), yd,2(p),…, yd,n(p).
(a) Calculate the actual outputs of the neurons in
the hidden layer:
$$y_j(p) = \mathrm{sigmoid}\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right]$$
where n is the number of inputs of neuron j in the
hidden layer, and sigmoid is the sigmoid activation
function.
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in
the output layer:
$$y_k(p) = \mathrm{sigmoid}\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right]$$
where m is the number of inputs of neuron k in the
output layer.
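A Python sketch of this activation phase (layer sizes, weights, and names such as `layer_outputs` are illustrative placeholders):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def layer_outputs(inputs, weights, thresholds):
    """Each neuron j outputs sigmoid(sum_i inputs[i] * weights[i][j] - thresholds[j])."""
    return [
        sigmoid(sum(x * weights[i][j] for i, x in enumerate(inputs)) - thresholds[j])
        for j in range(len(thresholds))
    ]

def forward(x, w_ih, theta_h, w_ho, theta_o):
    """Step 2: propagate an input pattern through the network."""
    hidden = layer_outputs(x, w_ih, theta_h)     # Step 2(a): hidden activations y_j
    return layer_outputs(hidden, w_ho, theta_o)  # Step 2(b): output activations y_k

# Illustrative 2-2-1 network with made-up weights
w_ih = [[0.2, -0.4], [0.7, 0.1]]   # w_ih[i][j]: input i to hidden j
w_ho = [[0.5], [-0.6]]             # w_ho[j][k]: hidden j to output k
print(forward([1, 0], w_ih, [0.1, -0.3], w_ho, [0.2]))
```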
Step 3: Weight training
Update the weights in the back-propagation network
propagating backward the errors associated with
output neurons.
(a) Calculate the error gradient for the neurons in the
output layer:
$$\delta_k(p) = y_k(p)\left[1 - y_k(p)\right] e_k(p) \qquad \text{where } e_k(p) = y_{d,k}(p) - y_k(p)$$
(b) Update the weights between the hidden and output layers:

$$w_{jk}(p+1) = w_{jk}(p) + \alpha \cdot y_j(p) \cdot \delta_k(p)$$

(c) Calculate the error gradient for the neurons in the hidden
layer, and update the input-to-hidden weights in the same way:

$$\delta_j(p) = y_j(p)\left[1 - y_j(p)\right] \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)$$

where l is the number of neurons in the output layer.
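A hypothetical Python sketch of the weight-training phase for the output layer (names follow the formulas above; the hidden-layer update proceeds analogously):

```python
def output_layer_update(hidden, y, y_desired, w_ho, theta_o, alpha=0.1):
    """Step 3: compute output-layer error gradients delta_k and apply the
    corrections to the hidden-to-output weights and output thresholds."""
    deltas = []
    for k in range(len(y)):
        e_k = y_desired[k] - y[k]              # e_k(p) = y_d,k(p) - y_k(p)
        delta_k = y[k] * (1 - y[k]) * e_k      # delta_k(p) = y_k (1 - y_k) e_k
        deltas.append(delta_k)
        for j in range(len(hidden)):
            # w_jk(p+1) = w_jk(p) + alpha * y_j(p) * delta_k(p)
            w_ho[j][k] += alpha * hidden[j] * delta_k
        # threshold treated as a weight on a fixed input of -1
        theta_o[k] += alpha * (-1) * delta_k
    return deltas
```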
Three-layer network for solving the Exclusive-OR operation
Figure: Inputs x1 and x2 connect to hidden neurons 3 and 4 through weights w13, w14, w23 and w24; hidden neurons 3 and 4 connect to output neuron 5 through weights w35 and w45. Each neuron in the hidden and output layers also receives a fixed input of -1 weighted by its threshold θ.
• The effect of the threshold applied to a neuron in the
hidden or output layer is represented by its weight,
θ, connected to a fixed input equal to -1.
• The initial weights and threshold levels are set
randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2,
w45 = 1.1, θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.
• We consider a training set where inputs x1 and x2 are
equal to 1 and desired output yd,5 is 0. The actual
outputs of neurons 3 and 4 in the hidden layer are
calculated as
$$y_3 = \mathrm{sigmoid}(x_1 w_{13} + x_2 w_{23} - \theta_3) = \frac{1}{1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)}} = 0.5250$$
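These values can be verified directly in code; the snippet below evaluates the same formula with the stated weights, and the y4 and y5 lines simply apply it to the remaining neurons:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

x1 = x2 = 1                              # training inputs
w13, w14, w23, w24 = 0.5, 0.9, 0.4, 1.0  # input-to-hidden weights
w35, w45 = -1.2, 1.1                     # hidden-to-output weights
t3, t4, t5 = 0.8, -0.1, 0.3              # thresholds theta3, theta4, theta5

y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # 0.5250, as on the slide
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # 0.5097, so the error e = 0 - 0.5097
print(round(y3, 4), round(y4, 4), round(y5, 4))
```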
Figure: Learning curve for the XOR network: the sum-squared error, on a logarithmic scale from 10^0 down to 10^-4, plotted against the training epoch (0 to 200).
• A typical CNN architecture consists of a series of
convolutional layers, pooling layers, and fully connected
layers.
• Let’s assume that the input is a color image, which is made up
of a 3-D matrix of pixels.
• This means that the input has three dimensions (height, width,
and depth), which correspond to the RGB channels of an image.
• The feature detector (also called a kernel or filter) is a
two-dimensional (2-D) array of weights that is applied to one
region of the image at a time.
Pooling Layer
• Pooling layers, also known as downsampling layers, conduct
dimensionality reduction, reducing the number of
parameters in the input.
• Similar to the convolutional layer, the pooling operation
sweeps a filter across the entire input, but the difference
is that this filter does not have any weights.
• Instead, the kernel applies an aggregation function to the
values within the receptive field, populating the output
array. There are two main types of pooling:
• Max pooling: As the filter moves across the input, it
selects the pixel with the maximum value to send to the
output array. As an aside, this approach tends to be used
more often than average pooling.
• Average pooling: As the filter moves across the input, it
calculates the average value within the receptive field to
send to the output array.
• Max pooling also acts as a noise suppressant: it discards
the noisy activations altogether, performing de-noising
along with dimensionality reduction. A minimal sketch of
max pooling follows below.
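A minimal max-pooling sketch in plain Python, assuming a 2 x 2 window with stride 2 and an even-sized input:

```python
def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map with a 2x2 window and stride 2,
    keeping the maximum value in each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 8]]
print(max_pool_2x2(fm))  # [[4, 2], [2, 8]]
```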
Fully-Connected Layer
• The name of the fully-connected layer aptly describes itself. As
mentioned earlier, the pixel values of the input image are not
directly connected to the output layer in partially connected
layers. However, in the fully-connected layer, each node in the
output layer connects directly to a node in the previous layer.
• This layer performs the task of classification based on the
features extracted through the previous layers and their
different filters. While convolutional and pooling layers tend to
use ReLU functions, FC layers usually leverage a softmax
activation function to classify inputs appropriately, producing a
probability from 0 to 1; a sketch of softmax follows below.
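A small sketch of the softmax step (the class scores are illustrative):

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))  # the highest score gets the highest probability
```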
Example of CNN
• Example of a CNN that is trained to recognize
handwritten digits in the MNIST dataset:
• This data set was created by modifying an original
database of handwritten digits. It contains
60,000 training images and 10,000 testing images.
• Each image is a scan of a handwritten digit from 0 to 9,
and the differences between different images are a
result of the differences in the handwriting of different
individuals.
Steps:
1. Input Image: The input to the CNN is a 28 x 28 grayscale
image of a handwritten digit, so the depth is 1 (a single
intensity channel rather than three RGB channels).
2. Convolutional Layers: The first layer in the CNN applies a set
of 32 filters, each of size 5x5, to the input image. With
zero-padding to preserve the spatial size, the output of this
layer is a set of 28x28 feature maps, each representing a
different local feature of the input image.
3. Activation Function: After each convolutional layer, an
activation function such as ReLU is applied to introduce non-
linearity in the network.
Figure: (a) The convolution between an input layer of size 32 × 32 × 3 and a filter of
size 5 × 5 × 3 produces an output layer with spatial dimensions 28 × 28. The depth of
the resulting output depends on the number of distinct filters, not on the
dimensions of the input layer or filter. (b) Sliding a filter around the image looks
for a particular feature in various windows of the image.
4. Pooling Layers: The second layer in the CNN applies a
max-pooling operation to each of the 32 feature maps,
reducing their dimensionality by a factor of 2 (a compact
code sketch of steps 1 to 4 follows below).
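A compact NumPy sketch of steps 1 through 4, assuming zero-padding so the 5 x 5 convolution preserves the 28 x 28 size, with random filters standing in for learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))               # step 1: one grayscale digit image
filters = rng.standard_normal((32, 5, 5))  # step 2: 32 filters of size 5x5

# Zero-pad by 2 so a 5x5 filter keeps the 28x28 spatial size
padded = np.pad(image, 2)
feature_maps = np.empty((32, 28, 28))
for k in range(32):
    for r in range(28):
        for c in range(28):
            window = padded[r:r + 5, c:c + 5]
            feature_maps[k, r, c] = np.sum(window * filters[k])

feature_maps = np.maximum(feature_maps, 0)  # step 3: ReLU non-linearity

# Step 4: 2x2 max pooling halves each spatial dimension to 14x14
pooled = feature_maps.reshape(32, 14, 2, 14, 2).max(axis=(2, 4))
print(pooled.shape)                         # (32, 14, 14)
```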
Applications of Neural Networks
• Neural networks are used in a wide range of applications, including:
• Image and speech recognition
• Natural language processing
• Predictive analytics
• Robotics and automation
• Financial forecasting
• Drug discovery