
Greater Noida Institute of Technology
Department of CSE
Machine Learning Techniques (KCS-055)
Unit-IV Notes

ARTIFICIAL NEURAL NETWORKS – Perceptrons, Multilayer perceptron, Gradient descent and the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm, Generalization, Unsupervised Learning – SOM Algorithm and its variant;

DEEP LEARNING – Introduction, concept of convolutional neural network, Types of layers (Convolutional layers, Activation function, Pooling, Fully connected), Concept of Convolution (1D and 2D) layers, Training of network, Case study of CNN, e.g., on Diabetic Retinopathy, Building a smart speaker, Self-driving car, etc.

Artificial Neural Networks (ANNs), or neural networks, are computational algorithms intended to simulate the behaviour of biological systems composed of neurons. ANNs are computational models inspired by an animal's central nervous system, capable of machine learning as well as pattern recognition. A neural network is an oriented graph. It consists of nodes, which in the biological analogy represent neurons, connected by arcs, which correspond to dendrites and synapses. Each arc is associated with a weight. A neural network is a machine learning algorithm based on the model of a human neuron. The human brain consists of billions of neurons, which send and process signals in electrical and chemical form. These neurons are connected through special structures known as synapses, which allow neurons to pass signals. An Artificial Neural Network is an information processing technique that works the way the human brain processes information. An ANN includes a large number of connected processing units that work together to process information and generate meaningful results from it.
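The processing unit described above can be expressed directly in code. Below is a minimal sketch in Python with NumPy of a single artificial neuron that forms a weighted sum of its inputs and passes it through an activation function (the names `step` and `neuron_output` are illustrative assumptions, not from the notes):

```python
import numpy as np

def step(z):
    """Hard-limiter activation: +1 for positive input, -1 otherwise."""
    return 1 if z > 0 else -1

def neuron_output(x, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = np.dot(x, weights) + bias
    return step(z)

# Example: a neuron with two inputs
x = np.array([0.5, -1.0])        # input signals (dendrites)
weights = np.array([0.8, 0.3])   # connection weights (synapses)
print(neuron_output(x, weights, bias=0.1))
```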

Advantages of Artificial Neural Networks (ANN) :

1. Problems in ANNs are represented by attribute–value pairs.

2. ANNs are used for problems where the target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.

3. ANN learning methods are quite robust to noise in the training data: the training examples may contain errors without these affecting the final output.

4. ANNs are used where fast evaluation of the learned target function is required.

5. ANNs can tolerate long training times, the duration depending on factors such as the number of weights in the network, the number of training examples considered, and the settings of various learning algorithm parameters.

Disadvantages of Artificial Neural Networks (ANN) :

1. Hardware dependence : By their structure, artificial neural networks require processors with parallel processing power; their realization is therefore dependent on suitable hardware.

2. Unexplained functioning of the network : This is the most important problem of ANNs. When an ANN produces a solution, it gives no clue as to why or how that solution was reached, which reduces trust in the network.

3. Assurance of proper network structure : There is no specific rule for determining the structure of artificial neural networks. The appropriate network structure is achieved through experience and trial and error.

4. The difficulty of showing the problem to the network : ANNs can work only with numerical information, so problems have to be translated into numerical values before being introduced to the network. The representation chosen directly influences the performance of the network and depends on the user's skill.

5. The duration of training is unknown : Training is considered complete when the error on the sample falls to a certain value, but this value does not guarantee optimal results.

Characteristics : An ANN contains a large number of interconnected processing elements called neurons to do all the operations. Information is stored in the weighted links between neurons. The input signals arrive at the processing elements through connections and connecting weights. An ANN has the ability to learn, recall and generalize from the given data by suitable assignment and adjustment of weights. The collective behaviour of the neurons describes its computational power, and no single neuron carries specific information.

Application areas of ANN : Artificial neural networks are applied in areas such as :

1. Speech recognition : Speech occupies a prominent role in human-human interaction. Therefore, it is natural for people to expect speech interfaces with computers. In the present era, for communication with machines, humans still need sophisticated languages which are difficult to learn and use. To ease this communication barrier, a simple solution could be communication in a spoken language that is possible for the machine to understand. Hence, ANNs are playing a major role in speech recognition.

2. Character recognition : It is a problem which falls under the general area of Pattern Recognition.
Many neural networks have been developed for automatic recognition of handwritten characters, either
letters or digits.

3. Signature verification application : Signatures are useful ways to authorize and authenticate a person in legal transactions. Signature verification is a non-vision-based technique. For this application, the first step is to extract the feature set, or rather the geometrical feature set, representing the signature. With these feature sets, we train the neural network using an efficient neural network algorithm. The trained neural network then classifies the signature as genuine or forged at the verification stage.

4. Human face recognition : It is one of the biometric methods used to identify a given face. It is a difficult task because of the need to characterize “non-face” images. However, if a neural network is well trained, it can divide images into two classes: images having faces and images that do not have faces.

Perceptron:
The operation of Rosenblatt’s perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter. The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative. The aim of the perceptron is to classify inputs, or in other words externally applied stimuli $x_1, x_2, \ldots, x_n$, into one of two classes, say $A_1$ and $A_2$. Thus, in the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function

$$\sum_{i=1}^{n} x_i w_i - \theta = 0,$$

where $w_i$ are the weights and $\theta$ is the threshold.

Learning by the perceptron : The perceptron learns by making small adjustments to the weights so as to reduce the difference between the actual and desired outputs. Starting from random initial weights, at iteration $p$ the weights are updated according to the perceptron learning rule

$$w_i(p+1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p),$$

where $e(p) = Y_d(p) - Y(p)$ is the error between the desired output $Y_d(p)$ and the actual output $Y(p)$, and $\alpha$ is the learning rate.
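This rule translates almost line-for-line into code. Below is a minimal sketch in Python with NumPy (the function name `train_perceptron` and the example data are illustrative assumptions, not from the notes) of training an elementary perceptron on a linearly separable problem:

```python
import numpy as np

def train_perceptron(X, y_d, alpha=0.1, epochs=50):
    """Train an elementary perceptron with a hard limiter.

    X   : (samples, n) array of input patterns
    y_d : desired outputs, +1 or -1
    """
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, X.shape[1])   # random initial weights
    theta = rng.uniform(-0.5, 0.5)           # threshold
    for _ in range(epochs):
        for x, d in zip(X, y_d):
            y = 1 if np.dot(x, w) - theta > 0 else -1  # hard-limiter output
            e = d - y                                  # error e(p)
            w += alpha * x * e                         # weight update rule
            theta -= alpha * e                         # threshold update
    return w, theta

# Example: the logical AND function with bipolar (+1/-1) targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_d = np.array([-1, -1, -1, 1])
w, theta = train_perceptron(X, y_d)
```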

Gradient descent:

Gradient descent is an iterative optimization algorithm for finding the minimum of a function. To find
the minimum of a function using gradient descent, one takes steps proportional to the negative of the
gradient of the function at the current point. The “gradient” in gradient descent refers to an error
gradient. The model with a given set of weights is used to make predictions and the error for those
predictions is calculated.

The gradient is given by the slope of the tangent to the cost curve at the current weight value (say w = 0.2), and the magnitude of the step is controlled by a parameter called the learning rate. The larger the learning rate, the bigger the step we take; the smaller the learning rate, the smaller the step. We then take the step and move to the next weight value w1. When choosing the learning rate we have to be careful, as a large learning rate can lead to big steps that overshoot and miss the minimum. On the other hand, a small learning rate results in very small steps, causing the algorithm to take a long time to find the minimum point.

1. Gradient descent is an optimization algorithm used to minimize some function by iteratively moving
in the direction of steepest descent as defined by the negative of the gradient.

2. A gradient is the slope of a function, the degree of change of a parameter with the amount of change
in another parameter.

3. Mathematically, the gradient can be described as the set of partial derivatives of the cost function with respect to its parameters. The greater the gradient, the steeper the slope.

4. Gradient descent is guaranteed to find the global minimum when the cost function is convex.

5. Gradient descent can be described as an iterative method which is used to find the values of the parameters of a function that minimize the cost function as much as possible.

6. The parameters are initially assigned particular values; from these, gradient descent runs in an iterative fashion, using calculus to find the parameter values that give the minimum possible value of the given cost function. A minimal sketch of this loop is shown below.
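As a concrete illustration of this loop, here is a minimal sketch in Python that minimizes a simple quadratic cost starting from w = 0.2 (the cost function and names are illustrative assumptions, not from the notes):

```python
def cost(w):
    """A simple convex cost function: J(w) = (w - 3)^2."""
    return (w - 3.0) ** 2

def grad(w):
    """Its derivative: dJ/dw = 2(w - 3)."""
    return 2.0 * (w - 3.0)

w = 0.2        # starting point
alpha = 0.1    # learning rate
for step in range(50):
    w -= alpha * grad(w)   # move in the direction of the negative gradient

print(w, cost(w))  # w approaches 3.0, the minimum of the cost
```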

Different types of gradient descent are :

1. Batch gradient descent :



This is a type of gradient descent which processes all the training examples for each iteration of gradient
descent. When the number of training examples is large, then batch gradient descent is computationally
very expensive. So, it is not preferred. Instead, we prefer to use stochastic gradient descent or mini-
batch gradient descent.

Advantages of batch gradient descent :

1. Fewer oscillations and less noisy steps are taken towards the global minimum of the loss function, since the parameters are updated using the average over all the training samples rather than the value of a single sample.

2. It can benefit from the vectorization which increases the speed of processing all training samples
together.

3. It produces a more stable gradient descent convergence and stable error gradient than stochastic
gradient descent.

4. It is computationally efficient, as computing resources are used to process all training samples together rather than a single sample at a time.

Disadvantages of batch gradient descent :

1. A stable error gradient can sometimes lead to convergence at a local minimum, and unlike stochastic gradient descent there are no noisy steps to help escape it.

2. The entire training set can be too large to process in memory, so additional memory might be needed.

3. Depending on available computing resources, it can take too long to process all the training samples as a batch.

2. Stochastic gradient descent :

This is a type of gradient descent which processes a single training example per iteration. Hence, the parameters are updated after every example is processed, which makes it faster than batch gradient descent. However, when the number of training examples is large, it still processes only one example at a time, which can add overhead for the system because the number of iterations will be large.

Advantages of stochastic gradient descent :

1. It is easier to fit into memory due to a single training sample being processed by the network.

2. It is computationally fast as only one sample is processed at a time.

3. For larger datasets it can converge faster as it causes updates to the parameters more frequently.

4. Due to frequent updates, the steps taken towards the minima of the loss function have oscillations, which can help the search escape local minima of the loss function (in case the current position turns out to be a local minimum).

Disadvantages of stochastic gradient descent :

1. Due to frequent updates, the steps taken towards the minima are very noisy. This can often lead the descent in other directions.

2. Also, due to the noisy steps, it may take longer to achieve convergence to the minima of the loss function.

3. Frequent updates are computationally expensive, as all resources are devoted to processing one training sample at a time.

3. Mini-batch gradient descent : This is a mixture of both stochastic and batch gradient descent. The training set is divided into multiple groups called batches, each containing a number of training samples. One batch at a time is passed through the network, which computes the loss of every sample in the batch and uses their average to update the parameters of the neural network. A sketch contrasting the three variants is given below.
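To make the contrast concrete, here is a minimal sketch in Python with NumPy of one training pass under each variant, for a linear model with mean-squared-error loss (all function and variable names are illustrative assumptions, not from the notes):

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of the mean squared error for a linear model y_hat = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def batch_gd(w, X, y, alpha):
    # One update computed from ALL training examples at once.
    return w - alpha * gradient(w, X, y)

def stochastic_gd(w, X, y, alpha):
    # One update per single training example.
    for i in np.random.permutation(len(y)):
        w = w - alpha * gradient(w, X[i:i+1], y[i:i+1])
    return w

def minibatch_gd(w, X, y, alpha, batch_size=32):
    # One update per batch; the loss gradient is averaged within each batch.
    idx = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        w = w - alpha * gradient(w, X[b], y[b])
    return w

# Example: fit y = 3x with mini-batch gradient descent
X = np.random.default_rng(0).random((100, 1))
y = 3 * X[:, 0]
w = np.zeros(1)
for epoch in range(200):
    w = minibatch_gd(w, X, y, alpha=0.5, batch_size=16)
print(w)  # approaches [3.]
```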

Delta rule and generalized delta learning rule (error backpropagation learning rule) :

Delta rule :

1. The delta rule is a specialized version of the backpropagation learning rule for single-layer neural networks.

2. It calculates the error between the calculated output and the sample output data, and uses this to create a modification to the weights, thus implementing a form of gradient descent (a minimal sketch is given below).
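A minimal sketch of the delta rule for a single linear neuron, in Python with NumPy (the function name `delta_rule` and the example data are illustrative assumptions, not from the notes):

```python
import numpy as np

def delta_rule(X, y_d, alpha=0.05, epochs=100):
    """Single-layer delta rule: w <- w + alpha * (d - y) * x."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, d in zip(X, y_d):
            y = np.dot(w, x)          # linear neuron output
            e = d - y                 # error between desired and actual output
            w += alpha * e * x        # gradient-descent weight change
    return w

# Example: learn y = 2*x1 - x2 from a few samples
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_d = np.array([2.0, -1.0, 1.0, 3.0])
print(delta_rule(X, y_d))   # approaches [2, -1]
```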

Generalized delta learning rule (error backpropagation learning) : The generalized delta learning rule (error backpropagation learning) proceeds in the following steps.

Step 1: Initialisation : Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range $\left(-\frac{2.4}{F_i},\ +\frac{2.4}{F_i}\right)$, where $F_i$ is the total number of inputs of neuron $i$ in the network. The weight initialisation is done on a neuron-by-neuron basis.

Step 2: Activation : Activate the back-propagation neural network by applying inputs $x_1(p), x_2(p), \ldots, x_n(p)$ and desired outputs $y_{d,1}(p), y_{d,2}(p), \ldots, y_{d,n}(p)$.

(a) Calculate the actual outputs of the neurons in the hidden layer:

$$y_j(p) = \text{sigmoid}\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right],$$

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.

(b) Calculate the actual outputs of the neurons in the output layer:

$$y_k(p) = \text{sigmoid}\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right],$$

where m is the number of inputs of neuron k in the output layer.

Step 3: Weight training : Update the weights in the back-propagation network, propagating backward the errors associated with the output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

$$\delta_k(p) = y_k(p)\,[1 - y_k(p)]\, e_k(p), \quad \text{where } e_k(p) = y_{d,k}(p) - y_k(p),$$

and update the weights: $w_{jk}(p+1) = w_{jk}(p) + \alpha \cdot y_j(p) \cdot \delta_k(p)$.

(b) Calculate the error gradient for the neurons in the hidden layer:

$$\delta_j(p) = y_j(p)\,[1 - y_j(p)] \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p),$$

where $l$ is the number of neurons in the output layer, and update the weights: $w_{ij}(p+1) = w_{ij}(p) + \alpha \cdot x_i(p) \cdot \delta_j(p)$.

Step 4: Iteration : Increase iteration $p$ by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied. A compact sketch of these steps in code is given below.
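The four steps can be condensed into a short program. Below is a minimal sketch in Python with NumPy for one hidden layer with sigmoid activations (the array names `W1`, `W2` and the XOR example are illustrative assumptions; thresholds are folded into bias terms, which flips their sign relative to the equations above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, h, l = 2, 3, 1                          # input, hidden, output sizes
W1 = rng.uniform(-2.4/n, 2.4/n, (n, h))    # Step 1: small random weights
W2 = rng.uniform(-2.4/h, 2.4/h, (h, l))
b1, b2 = np.zeros(h), np.zeros(l)          # biases (folded-in thresholds)
alpha = 0.5                                # learning rate

X  = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
Yd = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

for p in range(5000):                      # Step 4: iterate over epochs
    for x, yd in zip(X, Yd):
        yj = sigmoid(x @ W1 + b1)          # Step 2(a): hidden-layer outputs
        yk = sigmoid(yj @ W2 + b2)         # Step 2(b): output-layer outputs
        ek = yd - yk                       # output error e_k(p)
        dk = yk * (1 - yk) * ek            # Step 3(a): output error gradient
        dj = yj * (1 - yj) * (W2 @ dk)     # Step 3(b): hidden error gradient
        W2 += alpha * np.outer(yj, dk)     # weight updates
        b2 += alpha * dk
        W1 += alpha * np.outer(x, dj)
        b1 += alpha * dj
```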

Self Organizing Map :

Another popular type of unsupervised learning is competitive learning. In competitive learning, neurons
compete among themselves to become active. The output neuron that wins the ‘competition’ is called

the winner-takes-all neuron. Although competitive learning was proposed in the early 1970s, it was
largely ignored until the late 1980s, when Teuvo Kohonen introduced a special class of artificial neural
networks called self-organising feature maps. He also formulated the principle of topographic map
formation which states that the spatial location of an output neuron in the topographic map
corresponds to a particular feature of the input pattern. The Kohonen network consists of a single layer
of computation neurons, but it has two different types of connections. There are forward connections
from the neurons in the input layer to the neurons in the output layer, and lateral connections between
neurons in the output layer. The lateral connections are used to create a competition between neurons.
In the Kohonen network, a neuron learns by shifting its weights from inactive connections to active
ones. Only the winning neuron and its neighbourhood are allowed to learn. If a neuron does not
respond to a given input pattern, then learning does not occur in that neuron.
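The competitive learning procedure described above can be sketched compactly. Below is a minimal Kohonen SOM training loop in Python with NumPy (the grid size, neighbourhood width `sigma`, and function names are illustrative assumptions, not from the notes):

```python
import numpy as np

def train_som(X, grid=(10, 10), alpha=0.1, sigma=2.0, epochs=100):
    """Kohonen SOM: move the winner and its neighbours toward each input."""
    rng = np.random.default_rng(0)
    W = rng.random((grid[0] * grid[1], X.shape[1]))      # neuron weight vectors
    # (row, col) coordinates of each output neuron on the 2-D map
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    for _ in range(epochs):
        for x in X:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))  # competition
            # Gaussian neighbourhood around the winner on the map
            d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))
            W += alpha * h[:, None] * (x - W)  # winner & neighbours learn most
    return W

# Example: organise the map over random 3-D colour vectors
X = np.random.default_rng(1).random((200, 3))
W = train_som(X)
```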

Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and
hundreds of billions of synapses. The cortex is neither uniform nor homogeneous. It includes areas,
identified by the thickness of their layers and the types of neurons within them, that are responsible for
different human activities (motor, visual, auditory, somatosensory, etc.), and thus associated with
different sensory inputs. We can say that each sensory input is mapped into a corresponding area of the
cerebral cortex; in other words, the cortex is a self-organising computational map in the human brain.

Deep Learning :

Deep learning is a subfield of machine learning, which is itself a subset of Artificial Intelligence, concerned with algorithms inspired by the structure and function of the brain called artificial neural networks. It is useful in processing Big Data and can extract important patterns that provide valuable insight for decision making. Deep-learning architectures such as deep neural networks, deep belief networks, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, machine vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs.

Working:

1. First, we need to identify the actual problem in order to get the right solution, and the feasibility of deep learning for it should be checked (whether the problem fits deep learning or not).

2. Second, we need to identify the relevant data, which should correspond to the actual problem and be prepared accordingly.

3. Third, choose the deep learning algorithm appropriately.

4. Fourth, use the chosen algorithm to train the model on the dataset.

5. Fifth, perform final testing on held-out data. A minimal end-to-end sketch of steps 3–5 is given below.
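As an illustration of steps 3–5, here is a minimal sketch using the Keras API to define, train, and test a small convolutional network on the MNIST digits bundled with Keras (the architecture and hyperparameters are illustrative assumptions, not from the notes):

```python
from tensorflow import keras

# Step 2: prepare data (here, the MNIST digit images bundled with Keras)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # scale to [0, 1], add channel axis
x_test = x_test[..., None] / 255.0

# Step 3: choose an algorithm/architecture - a small CNN
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(16, 3, activation="relu"),   # convolutional layer
    keras.layers.MaxPooling2D(),                     # pooling layer
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),    # fully connected layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 4: train on the dataset
model.fit(x_train, y_train, epochs=1, batch_size=64)

# Step 5: final testing
print(model.evaluate(x_test, y_test))
```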

Applications:

1. Automatic Text Generation : A corpus of text is learned, and from this model new text is generated, word-by-word or character-by-character.

2. Healthcare : Helps in diagnosing various diseases and treating them.

3. Automatic Machine Translation : Certain words, sentences or phrases in one language are transformed into another language (deep learning is achieving top results in the areas of text and images).

4. Image Recognition : Recognizes and identifies people and objects in images, as well as understanding content and context. This area is already being used in gaming, retail, tourism, etc.

5. Predicting Earthquakes : Teaches a computer to perform viscoelastic computations, which are used in predicting earthquakes.