Introduction, or how the brain works
• A neural network can be defined as a model of
reasoning based on the human brain.
• The brain consists of a densely interconnected set of
nerve cells, or basic information-processing units,
called neurons.
• The human brain incorporates nearly 10 billion neurons
and 60 trillion connections, synapses, between them.
• By using multiple neurons simultaneously, the brain
can perform its functions much faster than the fastest
computers in existence today.
• Each neuron has a very simple structure, but an army
of such elements constitutes tremendous processing
power.
Biological neural network
Figure: Each neuron has a soma (cell body), dendrites that receive incoming signals, and an axon that carries the outgoing signal; synapses connect the axon of one neuron to the dendrites of others.
• Our brain can be considered a highly complex, non-linear
and parallel information-processing system.
• Information is stored and processed in a neural
network simultaneously throughout the whole
network, rather than at specific locations. In other
words, in neural networks, both data and its processing
are global rather than local.
• Learning is a fundamental and essential characteristic
of biological neural networks. The ease with which
they can learn led to attempts to emulate a biological
neural network in a computer.
• An artificial neural network consists of a number of very
simple processors, also called neurons, which are
analogous to the biological neurons in the brain.
• The neurons are connected by weighted links passing
signals from one neuron to another.
• The output signal is transmitted through the neuron’s
outgoing connection.
• The outgoing connection splits into a number of
branches that transmit the same signal. The outgoing
branches terminate at the incoming connections of other
neurons in the network.
How do Neural Networks Work?
• Neural networks learn by adjusting the weights
of connections between neurons in response to
input data.
• They can be trained using a variety of learning
algorithms, including backpropagation,
reinforcement learning, and unsupervised
learning.
Figure: Architecture of a typical artificial neural network: input signals enter the input layer, pass through a middle layer, and leave as output signals from the output layer.
Analogy between biological and artificial neural networks

| Biological neural network | Artificial neural network |
|---|---|
| Soma | Neuron |
| Dendrite | Input |
| Axon | Output |
| Synapse | Weight |
Figure: Model of a single neuron: inputs x1, x2, …, xn arrive through weighted links w1, w2, …, wn, and the neuron produces the output signal Y.
The neuron computes the weighted sum of the input signals and
compares the result with a threshold value, θ. If the net input is
less than the threshold, the neuron output is -1. But if the net
input is greater than or equal to the threshold, the neuron
becomes activated, and its output attains the value +1.

The neuron uses the following transfer or activation function:

$$X = \sum_{i=1}^{n} x_i w_i, \qquad Y = \begin{cases} +1, & \text{if } X \ge \theta \\ -1, & \text{if } X < \theta \end{cases}$$
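In Python, this computation can be sketched as follows (the input values, weights, and threshold below are illustrative, not taken from the text):

```python
def sign_neuron(inputs, weights, theta):
    """Weighted sum of the inputs compared against the threshold theta:
    output +1 if the net input X reaches theta, otherwise -1."""
    X = sum(x * w for x, w in zip(inputs, weights))  # net input
    return 1 if X >= theta else -1

# Illustrative values, not from the text
print(sign_neuron([1, 1], [0.3, 0.1], 0.2))  # X = 0.4 >= 0.2 -> 1
print(sign_neuron([0, 1], [0.3, 0.1], 0.2))  # X = 0.1 <  0.2 -> -1
```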
Activation functions of a neuron
Figure: Four common activation functions, each plotting the output Y against the net input X on a scale from -1 to +1: the step, sign, sigmoid, and linear functions.
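A small Python sketch of these four functions (the definitions follow the standard step, sign, sigmoid, and linear forms assumed from the figure):

```python
import math

def step(x):    return 1 if x >= 0 else 0         # hard limiter: 0 or 1
def sign(x):    return 1 if x >= 0 else -1        # bipolar hard limiter: -1 or +1
def sigmoid(x): return 1 / (1 + math.exp(-x))     # smooth curve in (0, 1)
def linear(x):  return x                          # output equals the net input

for f in (step, sign, sigmoid, linear):
    print(f.__name__, [round(f(x), 3) for x in (-2, 0, 2)])
```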
Can a single neuron learn a task?
Single-layer two-input perceptron
Figure: Inputs x1 and x2, with weights w1 and w2, feed a linear combiner; a hard limiter with threshold θ then produces the output Y.
The Perceptron
The aim of the perceptron is to classify inputs,
x1, x2, . . ., xn, into one of two classes, say
A1 and A2.
Linear separability in the perceptrons
Figure: The perceptron divides the input space (x1, x2) with a straight line into two decision regions, Class A1 and Class A2; the dividing line is defined by x1w1 + x2w2 - θ = 0.
If at iteration p, the actual output is Y(p) and the desired output
is Yd(p), then the error is given by:

$$e(p) = Y_d(p) - Y(p) \qquad \text{where } p = 1, 2, 3, \ldots$$
The perceptron learning rule

$$w_i(p+1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p)$$

where p = 1, 2, 3, . . .
α is the learning rate, a positive constant less than unity.
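Expressed as code, a single application of the rule might look like this (function and variable names are illustrative):

```python
def update_weights(weights, inputs, error, alpha):
    """One application of the perceptron learning rule:
    w_i(p+1) = w_i(p) + alpha * x_i(p) * e(p)."""
    return [w + alpha * x * error for w, x in zip(weights, inputs)]
```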
Perceptron’s training algorithm
Step 1: Initialisation
Set initial weights w1, w2, …, wn and threshold θ
to random numbers in the range [-0.5, 0.5].
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p)
and desired output Yd(p). Calculate the actual output at
iteration p = 1:

$$Y(p) = \mathrm{step}\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right]$$

where n is the number of the perceptron inputs, and step is a
step activation function.
Step 3: Weight training
Update the weights of the perceptron

$$w_i(p+1) = w_i(p) + \Delta w_i(p)$$

where Δwi(p) is the weight correction at iteration p.
The weight correction is computed by the delta
rule:

$$\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)$$
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and
repeat the process until convergence.
Example of perceptron learning: the logical operation AND

| Epoch | Inputs x1 x2 | Desired output Yd | Initial weights w1, w2 | Actual output Y | Error e | Final weights w1, w2 |
|---|---|---|---|---|---|---|
| 1 | 0 0 | 0 | 0.3, -0.1 | 0 | 0 | 0.3, -0.1 |
|   | 0 1 | 0 | 0.3, -0.1 | 0 | 0 | 0.3, -0.1 |
|   | 1 0 | 0 | 0.3, -0.1 | 1 | -1 | 0.2, -0.1 |
|   | 1 1 | 1 | 0.2, -0.1 | 0 | 1 | 0.3, 0.0 |
| 2 | 0 0 | 0 | 0.3, 0.0 | 0 | 0 | 0.3, 0.0 |
|   | 0 1 | 0 | 0.3, 0.0 | 0 | 0 | 0.3, 0.0 |
|   | 1 0 | 0 | 0.3, 0.0 | 1 | -1 | 0.2, 0.0 |
|   | 1 1 | 1 | 0.2, 0.0 | 1 | 0 | 0.2, 0.0 |
| 3 | 0 0 | 0 | 0.2, 0.0 | 0 | 0 | 0.2, 0.0 |
|   | 0 1 | 0 | 0.2, 0.0 | 0 | 0 | 0.2, 0.0 |
|   | 1 0 | 0 | 0.2, 0.0 | 1 | -1 | 0.1, 0.0 |
|   | 1 1 | 1 | 0.1, 0.0 | 0 | 1 | 0.2, 0.1 |
| 4 | 0 0 | 0 | 0.2, 0.1 | 0 | 0 | 0.2, 0.1 |
|   | 0 1 | 0 | 0.2, 0.1 | 0 | 0 | 0.2, 0.1 |
|   | 1 0 | 0 | 0.2, 0.1 | 1 | -1 | 0.1, 0.1 |
|   | 1 1 | 1 | 0.1, 0.1 | 1 | 0 | 0.1, 0.1 |
| 5 | 0 0 | 0 | 0.1, 0.1 | 0 | 0 | 0.1, 0.1 |
|   | 0 1 | 0 | 0.1, 0.1 | 0 | 0 | 0.1, 0.1 |
|   | 1 0 | 0 | 0.1, 0.1 | 0 | 0 | 0.1, 0.1 |
|   | 1 1 | 1 | 0.1, 0.1 | 1 | 0 | 0.1, 0.1 |

Threshold: θ = 0.2; learning rate: α = 0.1
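The following Python sketch reproduces this training run; it assumes the step activation used in the table (output 1 when the net input reaches the threshold θ, else 0) and the initial weights w1 = 0.3, w2 = -0.1:

```python
def train_perceptron(samples, w, theta=0.2, alpha=0.1, max_epochs=100):
    """Train a two-input perceptron with the delta rule until an epoch
    passes with no errors. `samples` holds (x1, x2, desired) triples."""
    for epoch in range(1, max_epochs + 1):
        converged = True
        for x1, x2, yd in samples:
            y = 1 if x1 * w[0] + x2 * w[1] >= theta else 0  # step activation
            e = yd - y                                      # error e(p)
            if e != 0:
                converged = False
                # delta rule; rounding avoids float drift so the run matches the table
                w = [round(w[0] + alpha * x1 * e, 10),
                     round(w[1] + alpha * x2 * e, 10)]
        if converged:
            return w, epoch
    return w, max_epochs

# Logical AND with the initial weights from the table above
and_samples = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
weights, epochs = train_perceptron(and_samples, [0.3, -0.1])
print(weights, epochs)  # converges to w1 = 0.1, w2 = 0.1 in epoch 5
```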
Two-dimensional plots of basic logical operations
Figure: Input space (x1, x2) for the basic logical operations, with points at the corners (0, 0), (0, 1), (1, 0) and (1, 1); AND and OR can each be separated by a single straight line, while Exclusive-OR cannot.
Multilayer perceptron with two hidden layers
Figure: Network with an input layer, a first hidden layer, a second hidden layer, and an output layer.
What does the middle layer hide?
A hidden layer “hides” its desired output. Neurons in the
hidden layer cannot be observed through the input/output
behaviour of the network.
In a back-propagation neural network, the learning
algorithm has two phases.
First, a training input pattern is presented to the
network input layer. The network propagates the
input pattern from layer to layer until the output
pattern is generated by the output layer.
If this pattern is different from the desired output,
an error is calculated and then propagated
backwards through the network from the output
layer to the input layer. The weights are modified
as the error is propagated.
Three-layer back-propagation neural network
Figure: Inputs x1, x2, …, xn enter the input layer; weights wij connect input neuron i to hidden neuron j, and weights wjk connect hidden neuron j to output neuron k, producing outputs y1, y2, …, yl. Input signals propagate forward through the network, and error signals propagate backward.
The back-propagation training algorithm
• Step 1: Initialisation
• Set all the weights and threshold levels of the
network to random numbers uniformly
distributed inside a small range:
$$\left(-\frac{2.4}{F_i},\ +\frac{2.4}{F_i}\right)$$
• where Fi is the total number of inputs of
neuron i in the network. The weight
initialisation is done on a neuron-by-neuron
basis.
Step 2: Activation
Activate the back-propagation neural network by
applying inputs x1(p), x2(p),…, xn(p) and desired
outputs yd,1(p), yd,2(p),…, yd,n(p).
(a) Calculate the actual outputs of the neurons in
the hidden layer:
$$y_j(p) = \mathrm{sigmoid}\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right]$$
where n is the number of inputs of neuron j in the
hidden layer, and sigmoid is the sigmoid activation
function.
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in
the output layer:
$$y_k(p) = \mathrm{sigmoid}\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right]$$
where m is the number of inputs of neuron k in the
output layer.
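A Python sketch of this activation phase (layer sizes, weights, and names such as `layer_outputs` are illustrative placeholders):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def layer_outputs(inputs, weights, thresholds):
    """Each neuron j outputs sigmoid(sum_i inputs[i] * weights[i][j] - thresholds[j])."""
    return [
        sigmoid(sum(x * weights[i][j] for i, x in enumerate(inputs)) - thresholds[j])
        for j in range(len(thresholds))
    ]

def forward(x, w_ih, theta_h, w_ho, theta_o):
    """Step 2: propagate an input pattern through the network."""
    hidden = layer_outputs(x, w_ih, theta_h)     # Step 2(a): hidden activations y_j
    return layer_outputs(hidden, w_ho, theta_o)  # Step 2(b): output activations y_k

# Illustrative 2-2-1 network with made-up weights
w_ih = [[0.2, -0.4], [0.7, 0.1]]   # w_ih[i][j]: input i to hidden j
w_ho = [[0.5], [-0.6]]             # w_ho[j][k]: hidden j to output k
print(forward([1, 0], w_ih, [0.1, -0.3], w_ho, [0.2]))
```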
Step 3: Weight training
Update the weights in the back-propagation network
propagating backward the errors associated with
output neurons.
(a) Calculate the error gradient for the neurons in the
output layer:
$$\delta_k(p) = y_k(p)\left[1 - y_k(p)\right] e_k(p) \qquad \text{where } e_k(p) = y_{d,k}(p) - y_k(p)$$
(b) Update the weights between the hidden and output layers:

$$w_{jk}(p+1) = w_{jk}(p) + \alpha \cdot y_j(p) \cdot \delta_k(p)$$

(c) Calculate the error gradient for the neurons in the hidden
layer, and update the input-to-hidden weights in the same way:

$$\delta_j(p) = y_j(p)\left[1 - y_j(p)\right] \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)$$

where l is the number of neurons in the output layer.
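A hypothetical Python sketch of the weight-training phase for the output layer (names follow the formulas above; the hidden-layer update proceeds analogously):

```python
def output_layer_update(hidden, y, y_desired, w_ho, theta_o, alpha=0.1):
    """Step 3: compute output-layer error gradients delta_k and apply the
    corrections to the hidden-to-output weights and output thresholds."""
    deltas = []
    for k in range(len(y)):
        e_k = y_desired[k] - y[k]              # e_k(p) = y_d,k(p) - y_k(p)
        delta_k = y[k] * (1 - y[k]) * e_k      # delta_k(p) = y_k (1 - y_k) e_k
        deltas.append(delta_k)
        for j in range(len(hidden)):
            # w_jk(p+1) = w_jk(p) + alpha * y_j(p) * delta_k(p)
            w_ho[j][k] += alpha * hidden[j] * delta_k
        # threshold treated as a weight on a fixed input of -1
        theta_o[k] += alpha * (-1) * delta_k
    return deltas
```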
Three-layer network for solving the Exclusive-OR operation
Figure: Inputs x1 and x2 connect to hidden neurons 3 and 4 through weights w13, w14, w23 and w24; hidden neurons 3 and 4 connect to output neuron 5 through weights w35 and w45. Each neuron in the hidden and output layers also receives a fixed input of -1 weighted by its threshold θ.
• The effect of the threshold applied to a neuron in the
hidden or output layer is represented by its weight,
θ, connected to a fixed input equal to -1.
• The initial weights and threshold levels are set
randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2,
w45 = 1.1, θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.
• We consider a training set where inputs x1 and x2 are
equal to 1 and desired output yd,5 is 0. The actual
outputs of neurons 3 and 4 in the hidden layer are
calculated as
$$y_3 = \mathrm{sigmoid}(x_1 w_{13} + x_2 w_{23} - \theta_3) = \frac{1}{1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)}} = 0.5250$$
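These values can be verified directly in code; the snippet below evaluates the same formula with the stated weights, and the y4 and y5 lines simply apply it to the remaining neurons:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

x1 = x2 = 1                              # training inputs
w13, w14, w23, w24 = 0.5, 0.9, 0.4, 1.0  # input-to-hidden weights
w35, w45 = -1.2, 1.1                     # hidden-to-output weights
t3, t4, t5 = 0.8, -0.1, 0.3              # thresholds theta3, theta4, theta5

y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # 0.5250, as on the slide
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # 0.5097, so the error e = 0 - 0.5097
print(round(y3, 4), round(y4, 4), round(y5, 4))
```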
Figure: Learning curve for the XOR network: the sum-squared error, on a logarithmic scale from 10^0 down to 10^-4, plotted against the training epoch (0 to 200).
• A typical CNN architecture consists of a series of
convolutional layers, pooling layers, and fully connected
layers.
• Let’s assume that the input is a color image, which is made up
of a 3-D matrix of pixels.
• This means that the input has three dimensions (height, width,
and depth), which correspond to the RGB channels of an image.
• The feature detector (also called a kernel or filter) is a
two-dimensional (2-D) array of weights that is applied to one
region of the image at a time.
Pooling Layer
• Pooling layers, also known as downsampling layers, conduct
dimensionality reduction, reducing the number of
parameters in the input.
• Similar to the convolutional layer, the pooling operation
sweeps a filter across the entire input, but the difference
is that this filter does not have any weights.
• Instead, the kernel applies an aggregation function to the
values within the receptive field, populating the output
array. There are two main types of pooling:
• Max pooling: As the filter moves across the input, it
selects the pixel with the maximum value to send to the
output array. As an aside, this approach tends to be used
more often than average pooling.
• Average pooling: As the filter moves across the input, it
calculates the average value within the receptive field to
send to the output array.
• Max pooling also acts as a noise suppressant: it discards
the noisy activations altogether, performing de-noising
along with dimensionality reduction. A minimal sketch of
max pooling follows below.
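A minimal max-pooling sketch in plain Python, assuming a 2 x 2 window with stride 2 and an even-sized input:

```python
def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map with a 2x2 window and stride 2,
    keeping the maximum value in each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 8]]
print(max_pool_2x2(fm))  # [[4, 2], [2, 8]]
```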
Fully-Connected Layer
• The name of the fully-connected layer aptly describes itself. As
mentioned earlier, the pixel values of the input image are not
directly connected to the output layer in partially connected
layers. However, in the fully-connected layer, each node in the
output layer connects directly to a node in the previous layer.
• This layer performs the task of classification based on the
features extracted through the previous layers and their
different filters. While convolutional and pooling layers tend to
use ReLU functions, FC layers usually leverage a softmax
activation function to classify inputs appropriately, producing a
probability from 0 to 1; a sketch of softmax follows below.
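A small sketch of the softmax step (the class scores are illustrative):

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))  # the highest score gets the highest probability
```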
Example of CNN
• Example of a CNN that is trained to recognize
handwritten digits in the MNIST dataset:
• This data set was created by modifying an original
database of handwritten digits. It contains
60,000 training images and 10,000 testing images.
• Each image is a scan of a handwritten digit from 0 to 9,
and the differences between different images are a
result of the differences in the handwriting of different
individuals.
Steps:
1. Input Image: The input to the CNN is a 28 x 28 grayscale
image of a handwritten digit, so the depth is 1 (a single
intensity channel rather than three RGB channels).
2. Convolutional Layers: The first layer in the CNN applies a set
of 32 filters, each of size 5x5, to the input image. With
zero-padding to preserve the spatial size, the output of this
layer is a set of 28x28 feature maps, each representing a
different local feature of the input image.
3. Activation Function: After each convolutional layer, an
activation function such as ReLU is applied to introduce non-
linearity in the network.
Figure: (a) The convolution between an input layer of size 32 × 32 × 3 and a filter of
size 5 × 5 × 3 produces an output layer with spatial dimensions 28 × 28. The depth of
the resulting output depends on the number of distinct filters, not on the
dimensions of the input layer or filter. (b) Sliding a filter around the image looks
for a particular feature in various windows of the image.
4. Pooling Layers: The second layer in the CNN applies a
max-pooling operation to each of the 32 feature maps,
reducing their dimensionality by a factor of 2 (a compact
code sketch of steps 1 to 4 follows below).
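A compact NumPy sketch of steps 1 through 4, assuming zero-padding so the 5 x 5 convolution preserves the 28 x 28 size, with random filters standing in for learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))               # step 1: one grayscale digit image
filters = rng.standard_normal((32, 5, 5))  # step 2: 32 filters of size 5x5

# Zero-pad by 2 so a 5x5 filter keeps the 28x28 spatial size
padded = np.pad(image, 2)
feature_maps = np.empty((32, 28, 28))
for k in range(32):
    for r in range(28):
        for c in range(28):
            window = padded[r:r + 5, c:c + 5]
            feature_maps[k, r, c] = np.sum(window * filters[k])

feature_maps = np.maximum(feature_maps, 0)  # step 3: ReLU non-linearity

# Step 4: 2x2 max pooling halves each spatial dimension to 14x14
pooled = feature_maps.reshape(32, 14, 2, 14, 2).max(axis=(2, 4))
print(pooled.shape)                         # (32, 14, 14)
```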
Applications of Neural Networks
• Neural networks are used in a wide range of applications, including:
• Image and speech recognition
• Natural language processing
• Predictive analytics
• Robotics and automation
• Financial forecasting
• Drug discovery