Backpropagation
Application of ANN
Introduction
One of the most popular approaches to machine learning is neural
networks (ANN, CNN, RNN).
A neural network can be defined as a model of reasoning based on
the human brain. The brain consists of a densely interconnected set of
nerve cells or basic information-processing units, called neurons.
The area of Neural Networks was originally inspired
by the goal of modeling biological neural systems, but has since
diverged and become a matter of engineering and achieving good
results in Machine/Deep learning and computer vision tasks.
Biological neural network (BNN)
Our brain can be considered as a highly complex, non-linear and
parallel information-processing system.
The basic computational unit of the brain is a neuron.
A nerve cell (neuron) is a special biological cell that processes
information. Approximately 86 billion neurons can be found in the
human nervous system and they are connected with approximately 10^14
to 10^15 synapses.
A neuron consists of a cell body (soma), connection links called
synapses, a number of fibers called dendrites, and a single long fiber
called the axon.
Each neuron receives input signals from its dendrites and produces
output signals along its (single) axon.
A synapse is the connection between the axon of one neuron and the
dendrites of another; it passes signals from one neuron to the
next.
The axon eventually branches out and connects via synapses to
dendrites of other neurons.
The diagram below shows a Biological Neuron Network (left, a)
and a common mathematical model (right, b) of the BNN.
Figure 3.1 a) Biological Neuron Network(BNN) b) Mathematical model of the BNN
Learning is a fundamental and essential characteristic of biological
neural networks.
The ease with which they can learn led to attempts to emulate a
biological neural network in a computer.
Historically, a common choice of activation function is the sigmoid
function σ, since it takes a real-valued input (the signal strength after the
sum) and squashes it to range between 0 and 1.
Artificial Neural Network (ANN)
ANN is a computational model based on the structure and functions of
biological neural networks that simulate some properties of the human brain.
It is one of the main supervised learning methods used in machine learning.
An ANN consists of a number of very simple processors, also called
neurons, which are analogous to the biological neurons in the brain.
The neurons are connected by weighted links passing signals from one
neuron to another.
The output signal is transmitted through the neuron’s outgoing connection.
The outgoing connection splits into a number of branches that transmit the
same signal. The outgoing branches terminate at the incoming connections of
other neurons in the network.
An ANN consists of input, hidden (optional) and output layers; the hidden layers transform
the input into something that the output layer can use.
An ANN receives an input (a single vector) and transforms it through a series
of hidden layers.
Each hidden layer is made up of a set of neurons, where each neuron is fully
connected to all neurons in the previous layer, and where neurons in a single
layer function completely independently and do not share any connections.
In classification, the output layer represents the class scores.
Architecture of a typical ANN
Biological Neural Network Artificial Neural Networks
Model of Artificial Neural Network
A given node (neuron) takes the weighted sum of its inputs and passes it through a linear
or non-linear activation function (in a single layer perceptron (SLP) and a multilayer
perceptron (MLP), respectively). This is the output of the node, which then becomes the
input of another node in the next layer. The signal flows from left to right, and the final
output is calculated by performing this procedure for all the nodes.
Training this neural network means learning the weights and bias (optional) associated
with all the edges.
An ANN has three basic components: weights,
thresholds, and a single activation function.
Weight factors: W1, W2, W3, …, Wn determine the strength of the
associated inputs.
The threshold θ is the magnitude offset that affects the activation of the
node output y.
Activation function: a linear or non-linear function
that performs a mathematical operation on the signal output.
ANN sample
The equation for a given node is the weighted sum of its
inputs passed through a linear/non-linear activation function. It can be
represented as a vector dot product, where n is the number of inputs for the
node.
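The node equation described above can be sketched in Python. This is a minimal illustration, not the slides' own formula: the names `neuron_output`, `bias` and `sigmoid`, and the sample numbers, are assumptions made for the example. The bias plays the role of the negated threshold (bias = −θ).

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used here only as an example non-linear activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias=0.0, activation=None):
    """Return f(w . x + b) for a single node with n inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the n inputs, written as a dot product, plus the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # With no activation supplied, the node is linear: f(z) = z.
    return z if activation is None else activation(z)


# Hypothetical inputs and weights, chosen only for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
linear_out = neuron_output(x, w, bias=0.5)                       # linear node
sigmoid_out = neuron_output(x, w, bias=0.5, activation=sigmoid)  # non-linear node
```

The same function covers both cases in the text: leaving `activation` unset gives a linear node, and passing a function such as `sigmoid` gives a non-linear one.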
Types of ANN
There are two Artificial Neural Network topologies − Feedforward and
Feedback.
1. FeedForward ANN
In this ANN, the information flow is unidirectional. A unit sends information to
another unit from which it does not receive any information. There are no
feedback loops. They are used in pattern generation/recognition/classification.
They have fixed inputs and outputs.
2. FeedBack ANN
Here, feedback loops are allowed. They are used in content-addressable
memories.
Activation Functions and Their Types
What is an Activation Function?
It is simply a function that you use to get the output of a node. It is
also known as a Transfer Function.
Activation functions are really important for an Artificial Neural
Network to learn and make sense of really complicated, non-linear
functional mappings between the inputs and the response variable.
They introduce non-linear properties to our Network. Their main
purpose is to convert an input signal of a node in an ANN to an output
signal.
That output signal now is used as an input in the next layer in the stack.
Specifically, in an ANN we take the sum of products of the inputs (X) and their
corresponding weights (W), and apply an activation function to the result.
Types of Activation Functions
An activation function is used to determine the output of a neural network, such as yes or no.
It maps the resulting values into a range such as 0 to 1 or -1 to 1 (depending
upon the function).
Activation functions can be divided into 2 types:
1. Linear Activation Function
2. Non-linear Activation Functions
1. Linear Activation Function
The function is a straight line (linear). Therefore, the output of the
function is not confined to any range.
Equation: f(x) = x
Range: (-infinity to infinity)
It doesn’t help with the complexity of various parameters of usual data that is fed to
the neural networks.
2. Non-linear Activation Function
The Nonlinear Activation Functions are the most used activation functions.
Nonlinearity makes the graph of the function a curve rather than a straight line.
It makes it easy for the model to generalize or adapt to a variety of data and
to differentiate between the outputs.
Types of Non-linear Activation Functions
Sigmoid Activation Function:
The Sigmoid Function curve looks like an S-shape.
The main reason why we use the sigmoid function is that its output lies between 0 and 1. Therefore, it is
especially used for models where we have to predict a probability as an output. Since the probability of
anything exists only in the range of 0 to 1, sigmoid is the right choice.
The SoftMax function is a more generalized logistic activation function that is used for multiclass classification.
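The sigmoid and SoftMax functions mentioned above can be sketched as follows. This is an illustrative sketch under assumptions: the function names and the sample scores are made up for the example, and the subtraction of the maximum score in `softmax` is a common numerical-stability trick rather than something taken from the slides.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps a real value z to a number between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids very large exponentials
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


p = sigmoid(0.0)                  # an input of 0 maps to the midpoint of the range
probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
```

The largest score receives the largest probability, and for two classes SoftMax reduces to the sigmoid of the score difference.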
Tanh Activation Function
tanh is similar to the logistic sigmoid, but its output is centered on zero. The range of the tanh function is
(-1 to 1). tanh is also sigmoidal (s-shaped).
The advantage is that the negative inputs will be mapped strongly negative and
the zero inputs will be mapped near zero in the tanh graph.
The tanh function is mainly used for classification between two classes.
Both tanh and logistic sigmoid activation functions are used in feed-forward nets.
ReLU (Rectified Linear Unit) Activation Function
ReLU is currently one of the most widely used activation functions, since
it is used in almost all convolutional neural networks and deep learning models.
The ReLU is half rectified (from the bottom).
f(z) is zero when z is less than zero and f(z) is equal to z when z is above or
equal to zero.
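The tanh and ReLU definitions above translate directly into code. The following is a small sketch; the function name `relu` and the sample inputs are assumptions made for illustration.

```python
import math


def relu(z):
    """Rectified Linear Unit: zero for negative z, z itself otherwise."""
    return z if z >= 0 else 0.0


# Evaluate both activations at a negative, a zero and a positive input.
samples = {z: (math.tanh(z), relu(z)) for z in (-2.0, 0.0, 2.0)}

for z, (t, r) in samples.items():
    print(f"z={z:+.1f}  tanh={t:+.4f}  relu={r:.1f}")
```

The negative input is mapped to a negative value by tanh but to zero by ReLU, while the zero input is mapped to zero by both.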
Topology of ANN
An Artificial Neural Network (ANN) is a computational model
that can actually learn from observing data sets.
A neural network architecture is the arrangement of a network
Single layer Perceptron
The simplest type of feedforward neural network is the perceptron.
It has no hidden layers. Thus, a perceptron has only an input layer and an output layer.
The output nodes are computed directly from the sum of the product of their weights
with the corresponding input nodes, plus bias.
The perceptron's output is binary, either 0 or 1. This is achieved by passing the
aforementioned product sum into the step function H(x). This is defined as
H(x) = 1 if x ≥ 0, and H(x) = 0 otherwise.
The perceptron learning rule:
The perceptron learning rule was first proposed by Rosenblatt in 1960. Using
this rule we can derive the perceptron training algorithm for classification
tasks.
The perceptron is used as a building block of ANN systems.
Perceptron’s training algorithm:
Step 1: Initialization
Set initial weights w1, w2, …, wn and threshold θ to random numbers in the range [−0.5,
0.5].
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Yd(p).
Calculate the actual output at iteration p = 1:
Y(p) = step[ x1(p) w1(p) + x2(p) w2(p) + … + xn(p) wn(p) − θ ]
where n is the number of the perceptron inputs, and step is a step activation function.
If the error, e(p) = Yd(p) − Y(p), is positive, we need to increase perceptron output Y(p), but if it is
negative, we need to decrease Y(p).
Step 3: Weight training
Update the weights of the perceptron
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
An epoch is one complete pass through all the training data; it consists of as many
iterations as are needed to present every training example to the model once.
A perceptron can learn the operations AND and OR, but not Exclusive-OR.
Example of perceptron learning: the logical operation AND
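The four training steps can be sketched in Python for the AND operation. This is a minimal sketch, not the slides' own worked example: the learning rate `alpha`, the weight update w_i ← w_i + alpha · x_i · e, the treatment of the threshold as a weight on a constant input of −1, the random `seed`, and all function names are assumptions made for illustration.

```python
import random


def step(x):
    """Step activation: 1 when x is non-negative, otherwise 0."""
    return 1 if x >= 0 else 0


def predict(x, weights, theta):
    """Actual output Y = step(sum(x_i * w_i) - theta)."""
    return step(sum(xi * wi for xi, wi in zip(x, weights)) - theta)


def train_perceptron(samples, alpha=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Step 1: initialise weights and threshold in [-0.5, 0.5].
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    theta = rng.uniform(-0.5, 0.5)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y_desired in samples:
            # Step 2: activation.
            y = predict(x, weights, theta)
            error = y_desired - y
            if error != 0:
                mistakes += 1
                # Step 3: weight training (assumed update rule, see lead-in).
                weights = [wi + alpha * xi * error for xi, wi in zip(x, weights)]
                theta -= alpha * error  # threshold acts as a weight on input -1
        # Step 4: repeat until a full epoch produces no errors.
        if mistakes == 0:
            return weights, theta, epoch
    return weights, theta, max_epochs


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, theta, epochs = train_perceptron(AND_DATA)
predictions = [predict(x, weights, theta) for x, _ in AND_DATA]
```

Training stops at the first epoch in which every example is classified correctly. Because AND is linearly separable, that point is reached; with Exclusive-OR data the loop would instead run until `max_epochs`.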
Two-dimensional plots of basic logical operations
Multi layer neural networks (Multi layer Perceptron)
A multilayer perceptron is a feedforward neural network with one or
more hidden layers.
The network consists of an input layer of source neurons, at least one
middle or hidden layer of computational neurons, and an output layer
of computational neurons.
The input signals are propagated in a forward direction on a layer-
by-layer basis.
A hidden layer “hides” its desired output. Neurons in the hidden
layer cannot be observed through the input/output behavior of the
network. There is no obvious way to know what the desired output of
the hidden layer should be.
Example : Design a feed forward neural network topology : three nodes
in the input layer, 2 hidden layers and each has 4 nodes, and one node in
the output layer .
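One way to express this topology in code is as a list of layer sizes, [3, 4, 4, 1], with one weight matrix per pair of adjacent layers. The sketch below is an assumption-laden illustration: the sigmoid activation, the random initial weights, the zero biases, and the names `build_network` and `forward` are not from the slides.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair for each layer after the input layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # One row of n_in weights per neuron: every neuron is fully connected
        # to all neurons in the previous layer.
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate the input signal forward, layer by layer."""
    activations = list(x)
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


topology = [3, 4, 4, 1]  # 3 inputs, two hidden layers of 4 nodes, 1 output
network = build_network(topology)
output = forward(network, [0.2, -0.1, 0.7])  # hypothetical input vector
n_weights = sum(len(w) * len(w[0]) for w, _ in network)  # 3*4 + 4*4 + 4*1
```

Counting the connections this way gives 32 weights for the 3-4-4-1 design, which is the number of edges one would draw in the network diagram.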
• Step_1: Gather Datasets
• A data set is a collection of data.
• Neural networks depend heavily on data; without data, it is
impossible to train an ANN or any other neural network.
• Overall, to gather an accurate dataset:
• Articulate the problem early: classification, clustering, regression,
recommendation, etc.
• Establish data collection mechanisms:
• Questionnaires, interviews, observation, social media, workshops, etc.
• Store the datasets in reliable storage and back them up, since they may be suddenly lost or
modified.
• Step_2: Preprocess the collected datasets
• Data preprocessing transforms raw data into an understandable data format. Real-
world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is
likely to contain many errors. Preprocessing therefore protects against “garbage in, garbage out”.
• Overall, preprocessing the dataset includes:
• Get the dataset and import the libraries.
• Convert to a valid data format (e.g. DICOM to PNG or JPEG), which will be input to the network
model.
• Data Cleaning: Data is cleansed through processes such as filling in missing values,
smoothing the noisy data (noise removal), extracting the region of interest (ROI), and resolving
the inconsistencies in the data.
• Data Integration: Data with different representations are put together, aggregated and
generalized, and conflicts within the data are resolved.
• Data Transformation: Data is transformed and normalized.
• Encode categorical data.
• Split the dataset into the training set, validation set and test set.
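Three of the preprocessing steps listed above (normalization, encoding categorical data, and splitting) can be sketched with the Python standard library. This is only an illustration: the min-max scaling choice, the 70/15/15 split ratios, the function names, and the sample values are assumptions, not the slides' prescription.

```python
import random


def min_max_normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 vectors, one position per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def split_dataset(rows, train=0.7, val=0.15, seed=0):
    """Shuffle the rows, then cut them into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# Hypothetical sample values, used only to exercise the helpers.
scaled_ages = min_max_normalize([25, 40, 55])
encoded = one_hot(["good", "bad", "good"])  # categories sorted as: bad, good
train_set, val_set, test_set = split_dataset(range(20))
```

Shuffling before the split keeps the three sets from inheriting any ordering in the collected data, and fixing the seed makes the split reproducible.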
Activation function
Use the activation function best suited to the problem.
Stopping rule
Error rate
Number of epochs
Step_5 : Optimize the neural network model
To produce an optimal NN topology
Minimum error rate
Fewer hidden layers and fewer nodes per hidden layer
Step_6 : Testing the neural network model
Without input noise
With input noise
Step_7 : Using the Neural Network model
If the error rate is within the acceptable error range, the network is
ready for use
General Rules
Learning is done by making small adjustments in the weights to reduce the difference
between the actual and desired outputs.
Confusion Matrix in Machine Learning
The confusion matrix is a matrix used to determine the performance of a
classification model.
The matrix itself can be easily understood, but the related terminologies
may be confusing.
It shows the errors in the model performance in the form of a matrix.
The matrix has two dimensions: predicted values and actual values. Predicted values are
those given by the model, and actual values are the true values for the given observations.
True Negative: The model has predicted No, and the actual value was also No.
True Positive: The model has predicted Yes, and the actual value was also Yes.
False Negative: The model has predicted No, but the actual value was Yes.
False Positive: The model has predicted Yes, but the actual value was No.
Suppose we are trying to create a model that predicts whether or not a
person has a particular disease. The confusion matrix for this model records how
many predictions fall into each combination of predicted and actual value.
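For a Yes/No problem such as this disease model, the four cells of the confusion matrix can be counted directly from paired actual and predicted labels. The labels below are made-up illustration data, not results from the slides, and the function name `confusion_counts` is an assumption.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted Yes, actually Yes
        elif p == positive:
            fp += 1  # predicted Yes, actually No
        elif a == positive:
            fn += 1  # predicted No, actually Yes
        else:
            tn += 1  # predicted No, actually No
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels for eight patients (illustration only).
actual = ["Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]
predicted = ["Yes", "No", "No", "No", "Yes", "Yes", "No", "No"]
counts = confusion_counts(actual, predicted)
```

The four counts always add up to the number of observations, which is a quick sanity check on any confusion matrix.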
Calculations using Confusion Matrix:
We can perform various calculations for the model using this matrix.
Precision: Out of all the observations that the model predicted as positive, how many
were actually positive. It can be calculated using the formula below:
Precision = TP / (TP + FP), where TP and FP are the numbers of true positives and false positives.
Recall: Out of all the actual positive observations, how many our model
predicted correctly. It can be calculated as:
Recall = TP / (TP + FN), where FN is the number of false negatives.
F-measure: If two models have low precision and high recall or vice
versa, it is difficult to compare these models. So, for this purpose, we can
use F-score.
This score helps us to evaluate the recall and precision at the same time.
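The measures discussed above can be computed from the four confusion-matrix counts. This sketch assumes that "F-score" means the usual F1 score, the harmonic mean of precision and recall; the function name and the example counts are likewise assumptions made for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1: harmonic mean of precision and recall (assumed meaning of "F-score").
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=2, fp=1, fn=1, tn=4)
```

Because the F-score combines precision and recall into one number, two models with opposite precision/recall trade-offs can be compared on a single scale.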
Example : Determine the characteristics of the customers of a given business institution
(Bank) in terms of credit risks (low credit risk , or high credit risk)
The data for this problem is all about the attributes of the customers
Age
Income
Debt
Payment record
Exercise
1. Why is one model better than others?
2. The core mathematics behind the Machine Learning algorithms are
Statistics
Calculus
Linear Algebra
Probability
Explain in detail their roles in Machine Learning algorithms and give examples for each.
3. Explain differentiable and continuous functions, and write the difference
4. Explain the following terms (from the perspective of ANNs)
Epoch
Iteration
Learning rate
Momentum coefficient
5. Explain sampling techniques used to determine the training, validation, and testing datasets in the training of ANNs.
6. Compare ANN with
Logistic regression