
Introduction to Machine Learning

Chapter 4 - Artificial Neural Networks (ANN)


Content
Introduction
Biological Analogy
Artificial Neural Network (ANN) models
Activation Function

ANN Architecture (Network Topology)

Designing an ANN based system

Backpropagation

Application of ANN

Introduction
Among the most popular approaches to machine learning are neural networks (ANN, CNN, RNN).
A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a massive interconnected set of nerve cells, or basic information-processing units, called neurons.
The area of neural networks was originally inspired primarily by the goal of modeling biological neural systems, but it has since diverged and become a matter of engineering: achieving good results in machine/deep learning and computer vision tasks.

Biological neural network (BNN)
Our brain can be considered a highly complex, non-linear, parallel information-processing system.
The basic computational unit of the brain is the neuron.
A nerve cell (neuron) is a special biological cell that processes information. Approximately 86 billion neurons can be found in the human nervous system, connected by approximately 10^14 to 10^15 synapses.
A neuron consists of a cell body (soma), connection links called synapses, a number of fibers called dendrites, and a single long fiber called the axon.
Each neuron receives input signals from its dendrites and produces output signals along its (single) axon.
Biological neural network (BNN) …
A synapse is the connection between an axon and the dendrites of other neurons; it passes electrical signals from one neuron to the next.
The axon eventually branches out and connects via synapses to the dendrites of other neurons.
The diagram below shows a biological neural network (left, a) and a common mathematical model (right, b) of the BNN.

Figure 3.1: a) Biological neural network (BNN)  b) Mathematical model of the BNN
Biological neural network (BNN) …
Learning is a fundamental and essential characteristic of biological neural networks.
The ease with which they can learn led to attempts to emulate biological neural networks in computers.
Historically, a common choice of activation function is the sigmoid function σ, since it takes a real-valued input (the signal strength after the sum) and squashes it into the range between 0 and 1.

Artificial Neural Network (ANN)
ANN is a computational model based on the structure and functions of biological neural networks; it simulates some properties of the human brain. It is one of the main supervised learning methods used in machine learning.
An ANN consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.
The neurons are connected by weighted links that pass signals from one neuron to another.
The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.

Artificial Neural Network (ANN) …
An ANN consists of input, hidden (optional) and output layers that transform the input into something the output layer can use.
An ANN receives an input (a single vector) and transforms it through a series of hidden layers.
Each hidden layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the previous layer, and where neurons in a single layer function completely independently and do not share any connections.
In classification, the output layer represents the class scores.
Architecture of a typical ANN

Figure: Biological neural network (left) vs. artificial neural network (right)
Artificial Neural Network (ANN) …

Analogy between biological and artificial neural networks:

  Biological Neural Network    Artificial Neural Network
  Cell body (soma)             Neuron
  Dendrite                     Input
  Axon                         Output
  Synapse                      Weight or interconnection
Model of Artificial Neural Network

Cont …

 A given node (neuron) takes the weighted sum of its inputs and passes it through a linear or non-linear activation function (in a single layer perceptron (SLP) or a multilayer perceptron (MLP), respectively). This is the output of the node, which then becomes the input of another node in the next layer. The signal flows from left to right, and the final output is calculated by performing this procedure for all the nodes.
 Training this neural network means learning the weights and the (optional) bias associated with all the edges.
Cont …
An ANN consists of three basic components: weights, thresholds, and a single activation function.
Weight factors: w1, w2, w3, …, wn determine the strength of the associated inputs.
The threshold θ is the magnitude offset that affects the activation of the node output y.
Activation function: it may be a linear or non-linear function that performs a mathematical operation on the signal output.

ANN sample

Cont. …

Cont …
The equation for a given node is as follows: the weighted sum of its inputs is passed through a linear or non-linear activation function f. It can be represented as a vector dot product, where n is the number of inputs for the node:

y = f(x · w + b) = f(x1 w1 + x2 w2 + … + xn wn + b)
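As a concrete illustration, a minimal sketch of this node computation in Python with NumPy; the input values, weights, bias, and the sigmoid choice are illustrative assumptions, not values from the slides:

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation: squashes z into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def node_output(x, w, b):
    # Weighted sum of inputs as a vector dot product, then the activation
    return sigmoid(np.dot(x, w) + b)

x = np.array([0.5, -1.0, 2.0])   # n = 3 inputs (example values)
w = np.array([0.4, 0.6, -0.1])   # one weight per input
b = 0.1                          # optional bias
print(node_output(x, w, b))      # the node's output, fed to the next layer
```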

Type of ANN
There are two Artificial Neural Network topologies: feedforward and feedback.
1. FeedForward ANN
In this ANN, the information flow is unidirectional. A unit sends information to another unit from which it does not receive any information. There are no feedback loops. Feedforward ANNs are used in pattern generation, recognition, and classification. They have fixed inputs and outputs.

Type of ANN
2. FeedBack ANN
Here, feedback loops are allowed. They are used in content-addressable
memories.

Activation Functions and Their Types
What is an activation function?
It is simply the function used to compute the output of a node. It is also known as a transfer function.
Activation functions are really important for an artificial neural network to learn and make sense of really complicated, non-linear complex functional mappings between the inputs and the response variable.
They introduce non-linear properties to the network. Their main purpose is to convert the input signal of a node in an ANN to an output signal.
That output signal is then used as an input to the next layer in the stack.

Activation Functions and Their Types …
Specifically, in an ANN we compute the sum of the products of the inputs (X) and their corresponding weights (W), and apply an activation function f to the result to get the output of that layer, which is then fed as an input to the next layer.

Types of Activation Functions
An activation function is used to determine the output of the neural network, e.g. yes or no.
It maps the resulting values into a range such as 0 to 1 or −1 to 1 (depending on the function).
The activation functions can be divided into two types:
1. Linear Activation Function
2. Non-linear Activation Functions

Types of Activation Functions
1. Linear Activation Function
The function is a line, i.e. linear. Therefore, the output of the function is not confined to any range.

 Equation: f(x) = x
 Range: (−infinity, infinity)
 It doesn't help with the complexity of the various parameters of the usual data that is fed to the neural networks.
Types of Activation Functions
2. Non-linear Activation Function
Non-linear activation functions are the most used activation functions.
Non-linearity makes it easy for the model to generalize or adapt to a variety of data and to differentiate between the outputs.

Different Types of Non-linear Activation Functions
Sigmoid Activation Function:
The sigmoid function curve looks like an S-shape.

The main reason we use the sigmoid function is that its output lies between 0 and 1. It is therefore especially used for models where we have to predict a probability as the output: since the probability of anything exists only between 0 and 1, sigmoid is the right choice. The softmax function is a more generalized logistic activation function that is used for multiclass classification.

Different Types of Non-linear Activation Functions
Tanh Activation Function
tanh is similar to the logistic sigmoid, but often works better. The range of the tanh function is (−1, 1), and it is also sigmoidal (S-shaped).

The advantage is that negative inputs are mapped strongly negative and zero inputs are mapped near zero in the tanh graph.
The tanh function is mainly used for classification between two classes.
Both tanh and logistic sigmoid activation functions are used in feed-forward nets.

Different Types of Non-linear Activation Functions
ReLU (Rectified Linear Unit) Activation Function
The ReLU is currently the most used activation function, since it appears in almost all convolutional neural networks and other deep learning models.

The ReLU is half-rectified (from the bottom): f(z) is zero when z is less than zero, and f(z) is equal to z when z is greater than or equal to zero (see the sketch below).
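To make the definitions above concrete, a small NumPy sketch of the linear, sigmoid, tanh, and ReLU functions; the sample inputs are arbitrary example values:

```python
import numpy as np

def linear(z):
    return z                          # f(z) = z, range (-inf, inf)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # S-shaped, range (0, 1)

def tanh(z):
    return np.tanh(z)                 # S-shaped, range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # 0 for z < 0, z for z >= 0

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (linear, sigmoid, tanh, relu):
    print(f.__name__.ljust(7), f(z))
```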
Topology of ANN
An Artificial Neural Network (ANN) is a computational model used in machine learning that works similarly to biological neurons.
An ANN has several advantages, but one of the most recognized is that it can actually learn from observed data sets.
A neural network architecture is the arrangement of a network along with its neurons (nodes) and connecting links. According to topology, ANNs can be classified generally into feedforward and feedback (recurrent) neural networks.
Feedforward Neural Network
Feed-forward neural networks are the first and simplest type. They are primarily used for supervised learning in cases where the data to be learned is neither sequential nor time-dependent.
Links can only go in one direction.
The neurons (nodes) are arranged in separate layers.
Each node is linked only to nodes in the next layer.
There are no connections between neurons in the same layer, back to a previous layer, or skipping a layer.
The neurons in one layer receive inputs from the previous layer.
The neurons in one layer deliver their outputs to the next layer.
The connections are unidirectional (hierarchical).
Feed-forward networks do not have internal states.
Cont…

There are two types of feedforward neural network:

Single layer perceptron
Multilayer perceptron (multilayer network)

Single layer Perceptron
 The simplest type of feedforward neural network is the perceptron.
 It has no hidden layers; a perceptron has only an input layer and an output layer. The output nodes are computed directly from the sum of the products of their weights with the corresponding input nodes, plus a bias.
 The perceptron's output is binary, either 0 or 1. This is achieved by passing the aforementioned product sum into the step function H(x), defined as H(x) = 1 if x ≥ 0, and H(x) = 0 otherwise.
 Single-layer perceptrons are linear classifiers.
 Single layer networks are not capable of classifying data sets that are not linearly separable.
 One way to tackle this problem is to use a multilayer perceptron.
Cont…
In 1958, Frank Rosenblatt introduced a training algorithm that provided the first
procedure for training a simple ANN: a perceptron.
The perceptron is the simplest form of a neural network. It consists of a single
neuron with adjustable synaptic weights and a hard limiter.

Cont..
The perceptron learning rule:

wi(p + 1) = wi(p) + α × xi(p) × e(p)

where p = 1, 2, 3, . . ., α is the learning rate (a positive constant less than 1), and e(p) = Yd(p) − Y(p) is the error at iteration p.

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
The perceptron is the basic unit used to build ANN systems.

Cont..
Perceptron training algorithm:
Step 1: Initialization
 Set initial weights w1, w2, …, wn and threshold θ to random numbers in the range [−0.5, 0.5].
Step 2: Activation
 Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

Y(p) = step[ x1(p)w1(p) + x2(p)w2(p) + … + xn(p)wn(p) − θ ]

where n is the number of the perceptron inputs, and step is a step activation function.
 If the error, e(p) = Yd(p) − Y(p), is positive, we need to increase the perceptron output Y(p); if it is negative, we need to decrease Y(p).
Cont..
 Step 3: Weight training
 Update the weights of the perceptron:

wi(p + 1) = wi(p) + ∆wi(p)

where ∆wi(p) is the weight correction at iteration p.
 The weight correction is computed by the delta rule:

∆wi(p) = α × xi(p) × e(p)

 Step 4: Iteration
 Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
 An epoch is one complete pass through all of the training data, i.e. the total number of iterations over all the training data in one cycle of training the machine learning model.
 A perceptron can learn the operations AND and OR, but not Exclusive-OR.
 Example of perceptron learning: the logical operation AND
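A minimal sketch of the training algorithm above applied to the AND example; the initial-weight range follows the slides, while the learning rate value (0.1) and the epoch cap are illustrative assumptions:

```python
import random

def step(z):
    # Hard limiter: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

# AND training data: inputs x and desired outputs Yd
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

alpha = 0.1                                        # learning rate, 0 < alpha < 1
w = [random.uniform(-0.5, 0.5) for _ in range(2)]  # Step 1: initialization
theta = random.uniform(-0.5, 0.5)                  # threshold in [-0.5, 0.5]

for epoch in range(100):                           # Step 4: iterate until convergence
    errors = 0
    for x, yd in data:
        # Step 2: activation, Y = step(sum(x_i * w_i) - theta)
        y = step(sum(xi * wi for xi, wi in zip(x, w)) - theta)
        e = yd - y                                 # error at this iteration
        if e != 0:
            errors += 1
            # Step 3: weight training by the delta rule
            w = [wi + alpha * xi * e for wi, xi in zip(w, x)]
            theta -= alpha * e                     # threshold updated like a bias weight
    if errors == 0:                                # converged: a full error-free epoch
        break

print("weights:", w, "threshold:", theta)
for x, yd in data:
    print(x, "->", step(sum(xi * wi for xi, wi in zip(x, w)) - theta))
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop terminates; the same code cannot converge for Exclusive-OR.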

 Two-dimensional plots of basic logical operations


Multilayer neural networks (Multilayer Perceptron)
A multilayer perceptron is a feedforward neural network with one or
more hidden layers.
The network consists of an input layer of source neurons, at least one
middle or hidden layer of computational neurons, and an output layer
of computational neurons.
The input signals are propagated in a forward direction on a layer-
by-layer basis.
A hidden layer “hides” its desired output. Neurons in the hidden
layer cannot be observed through the input/output behavior of the
network. There is no obvious way to know what the desired output of the hidden layer should be.
Cont..

Cont…
Example: Design a feedforward neural network topology with three nodes in the input layer, two hidden layers of four nodes each, and one node in the output layer (a sketch follows below).

Figure: Multilayer Perceptron
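A minimal sketch of this 3-4-4-1 topology, assuming TensorFlow/Keras is available; the ReLU and sigmoid activation choices are illustrative assumptions, since the slides do not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Three input nodes -> two hidden layers of 4 nodes each -> one output node
model = tf.keras.Sequential([
    layers.Dense(4, activation="relu", input_shape=(3,)),  # hidden layer 1: 4 nodes
    layers.Dense(4, activation="relu"),                    # hidden layer 2: 4 nodes
    layers.Dense(1, activation="sigmoid"),                 # output layer: 1 node
])
model.summary()  # prints the layer-by-layer topology and parameter counts
```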


Cont…
Single layer perceptron vs. multilayer perceptron:
Single Layer Perceptron:
• Has no hidden layers of nodes (neurons)
• Can only learn linear functions
• Less intelligent: unable to learn complex data
• Used only for linear classification tasks
Multilayer Perceptron:
• Has one or more hidden layers of nodes (neurons)
• Can learn both linear and non-linear functions
• More intelligent: can learn complex data
• Can be used for both linear and non-linear classification and regression tasks
Associative Neural Network
There is no hierarchical arrangement
The connections can be bidirectional

Designing an ANN based system
• Step_1: Gather Datasets
• A data set is a collection of data.
• Neural networks depend heavily on data; without data, it is impossible to train an ANN or any other neural network.
• Overall, to gather an accurate dataset:
• Articulate the problem early: classification, clustering, regression, recommendation, etc.
• Establish data collection mechanisms: questionnaires, interviews, observation, social media, workshops, etc.
• Store the datasets in valid storage and back them up; they may suddenly be lost or modified.
Cont..
• Step_2: Preprocess the collected datasets
• Data preprocessing transforms raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Preprocessing therefore protects against "garbage in, garbage out".
• Overall, preprocessing the dataset includes:
Get the dataset and import the libraries.
Convert to a valid data format (e.g. DICOM to PNG or JPEG) that will be the input to the network model.
Data cleaning: data is cleansed through processes such as filling in missing values, smoothing noisy data (noise removal), extracting the ROI, and resolving inconsistencies in the data.
Data integration: data with different representations are put together, aggregated and generalized, and conflicts within the data are resolved.
Data transformation: data is transformed and normalized.
Encode categorical data.
Split the dataset into the training set, validation set and test set (a sketch follows below).
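A minimal sketch of the final splitting step, assuming scikit-learn; the placeholder data and the 70/15/15 ratios are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 3)          # placeholder feature matrix (100 samples, 3 features)
y = np.random.randint(0, 2, 100)    # placeholder binary labels

# First carve out 15% as the test set, then split the rest into train/validation
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.15 / 0.85, random_state=42)
print(len(X_train), len(X_val), len(X_test))   # roughly 70 / 15 / 15
```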
Cont..

• Step_3: Determine the neural network architecture/topology

Number of nodes in the input layer
Equal to the number of input parameters (independent variables)

Number of nodes and layers in the hidden layer(s)
There is no clear-cut rule (optimization governs):
o Minimum error rate, and
o A small number of hidden nodes, which helps to simplify the hardware representation of the network

Number of nodes in the output layer
Equal to the number of output parameters (dependent variables)
o e.g. the number of class labels
Possible to optimize the number of output nodes as per the nature of the output
Cont..
• Step_4: Train the neural network model
Train the neural network model using the training dataset, and adjust/tune it using the validation dataset (a sketch follows below).

Activation function
Use the activation function best suited to the problem.
Stopping rule
Error rate
Number of epochs
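A hedged sketch of this training step in Keras, reusing the model and the train/validation splits sketched earlier; the loss, optimizer, patience, and epoch cap are illustrative assumptions:

```python
import tensorflow as tf

# Stopping rule: halt when the validation error stops improving (error rate),
# with a hard cap on the number of epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),   # adjust/tune with the validation set
          epochs=100, callbacks=[early_stop])
```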

Cont..
Step_5: Optimize the neural network model
To produce an optimal NN topology:
Minimum error rate
A small number of nodes and layers in the hidden layer(s)
Step_6: Test the neural network model
Without input noise
With input noise
Step_7: Use the neural network model
If the error rate is within the acceptable error range, the network is ready for use.

Cont..

General Rules

 The initial network has randomly assigned weights.

 Learning is done by making small adjustments to the weights to reduce the difference between the observed and predicted values.

 The main difference from logical algorithms is the need to repeat the update phase several times in order to achieve convergence.

 The updating process is divided into epochs; each epoch updates all the weights of the network.

Confusion Matrix in Machine Learning
The confusion matrix is a matrix used to determine the performance of classification models for a given set of test data.
 It can only be determined if the true values for the test data are known.
The matrix itself is easy to understand, but the related terminology may be confusing.
Since it shows the errors in the model's performance in the form of a matrix, it is also known as an error matrix.

Confusion Matrix in Machine Learning cont..
The matrix is divided into two dimensions, predicted values and actual values, along with the total number of predictions.
Predicted values are the values predicted by the model, and actual values are the true values for the given observations.

Confusion Matrix in Machine Learning cont..
True Negative (TN): The model has predicted No, and the real or actual value was also No.
True Positive (TP): The model has predicted Yes, and the actual value was also Yes.
False Negative (FN): The model has predicted No, but the actual value was Yes. It is also called a Type-II error.
False Positive (FP): The model has predicted Yes, but the actual value was No. It is also called a Type-I error.

Confusion Matrix in Machine Learning cont..
Example: We can understand the confusion matrix using an example. Suppose we are trying to create a model that can predict the result for a disease, i.e. whether a person has that disease or not. The confusion matrix for this is given as:
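A small sketch of building such a matrix with scikit-learn; the label vectors below are made-up example values for a disease/no-disease classifier:

```python
from sklearn.metrics import confusion_matrix

# 1 = has the disease (Yes), 0 = does not (No); made-up example values
y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# scikit-learn returns rows as actual values and columns as predicted values
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)   # here: TP=4, TN=4, FP=1, FN=1
```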

Calculations using Confusion Matrix:
We can perform various calculations for the model, such as the model's accuracy, using this matrix. These calculations are given below.
Classification Accuracy: It is one of the important parameters for determining the accuracy of classification problems. It defines how often the model predicts the correct output, and is calculated as the ratio of the number of correct predictions made by the classifier to the total number of predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Calculations using Confusion Matrix: cont..
Misclassification rate: Also termed the error rate, it defines how often the model gives wrong predictions. It is calculated as the ratio of the number of incorrect predictions to the total number of predictions made by the classifier:

Error rate = (FP + FN) / (TP + TN + FP + FN)

Calculations using Confusion Matrix: cont..
Precision: Out of all the positive classes predicted by the model, how many were actually positive. It is calculated as:

Precision = TP / (TP + FP)

Recall: Out of all the actual positive classes, how many the model predicted correctly. The recall should be as high as possible:

Recall = TP / (TP + FN)
Calculations using Confusion Matrix: cont..
F-measure: If two models have low precision and high recall or vice versa, it is difficult to compare them; for this purpose we can use the F-score, which evaluates recall and precision at the same time. The F-score is at its maximum when recall equals precision. It is calculated as:

F-measure = (2 × Precision × Recall) / (Precision + Recall)
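Putting the four formulas together, a minimal sketch that computes them from confusion-matrix counts; the counts reuse the made-up disease example above:

```python
def classification_metrics(tp, tn, fp, fn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    error     = (fp + fn) / (tp + tn + fp + fn)   # misclassification rate
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, error, precision, recall, f_measure

# Counts from the hypothetical disease example: TP=4, TN=4, FP=1, FN=1
print(classification_metrics(tp=4, tn=4, fp=1, fn=1))
# -> accuracy 0.8, error 0.2, precision 0.8, recall 0.8, F-measure 0.8
```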

Cont..
Example: Determine the characteristics of the customers of a given business institution (a bank) in terms of credit risk (low credit risk or high credit risk).

The data for this problem is all about the attributes of the customers:
 Age
 Income
 Debt
 Payment record

Cont..
Example

Exercise
1. Why is one model better than others?
2. The core mathematics behind machine learning algorithms are:
 Statistics
 Calculus
 Linear Algebra
 Probability
Explain their roles in machine learning algorithms in detail and give examples for each.
3. Explain differentiable and continuous functions, and describe the difference between them.
4. Explain the following terms (from the perspective of ANNs):
 Epoch
 Iteration
 Learning rate
 Momentum coefficient
5. Explain the sampling techniques used to determine the training, validation, and testing datasets in the training of ANNs.
6. Compare ANN with:
 Logistic regression

