Tuesday (10am-11am)
Wednesday (10am-11am & 3pm-4pm)
Friday (9am-10am, 11am-12pm, 2pm-3pm)
Dr. Srinivasa L. Chakravarthy
&
Smt. Jyotsna Rani Thota
Department of CSE
GITAM Institute of Technology (GIT)
Visakhapatnam – 530045
Email: slade@gitam.edu & jthota@gitam.edu
EID 403: Machine Learning
Course objectives
• Explore the various disciplines connected with ML.
• Explore the efficiency of learning with inductive bias.
• Explore ML algorithms such as decision tree learning.
• Explore algorithms such as artificial neural networks, genetic programming, Bayesian algorithms, the nearest neighbor algorithm, and hidden Markov models.
Chapter 4 - Artificial Neural Network Learning
• Perceptrons
• Backpropagation algorithm
&
Chapter 9 - Genetic Algorithms
• An ANN is built from an interconnected set of simple units, where each unit takes a number of real-valued inputs and produces a real-valued output. These inputs may be the outputs of other units, and these outputs may be the inputs of other units.
Introduction-
• ANN provides a method for learning real-valued, discrete-valued, and vector-valued functions from examples.
• In general, an ANN can be represented by graphs of many types: acyclic or cyclic, directed or undirected.
Facts about the human brain-
• Number of neurons ≈ 10^10
• Connections per neuron ≈ 10^4
• Scene recognition time ≈ 10^-1 seconds
• Neuron switching time ≈ 10^-3 seconds
The neuron switching time is slow compared to that of an ANN (about 10^-10 seconds), but humans can still make complex decisions quickly.
Comparison-
From this discussion, ANN research has two goals:
1. The goal of modeling and understanding biological learning processes.
2. The goal of defining effective ML algorithms, independent of whether they mirror biological processes.
Neural network representation-
For example, ALVINN- a network that learns to steer an autonomous vehicle driving at normal speeds on public highways.
Input = a 30 x 32 grid of pixel intensities obtained from a forward-pointing camera mounted on the vehicle.
In this picture-
- The large matrix of black & white boxes shows the weights from the 30 x 32 pixel inputs to one hidden unit.
- A small white box is a positive weight.
- A small black box is a negative weight.
- The size of a box indicates the weight magnitude.
- The smaller rectangle on top shows the weights from the hidden units to the output units.
The perceptron computes a linear combination of its inputs and generates-
- the value 1 if the result is greater than the threshold
- the value -1 otherwise
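A minimal sketch of this computation in Python (the names perceptron_output, w, and x are ours; the threshold is folded into a bias weight w0 paired with a constant input x0 = 1):

```python
import numpy as np

def perceptron_output(w, x):
    """Return 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1."""
    x = np.concatenate(([1.0], x))        # constant input x0 = 1 pairs with bias w0
    return 1 if np.dot(w, x) > 0 else -1
```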
Perceptron representation-
- Inputs are fed into multiple units, and the outputs of these units are then inputs to the second and final stage.
Calculating the weight vector-
The goal is to determine a weight vector that makes the perceptron give the correct output for every training example.
- Begin with random weights.
- Apply the perceptron to each training example, modifying the weights whenever it misclassifies an example.
- Repeat the above steps until the perceptron classifies all training examples correctly.
Calculating the weight vector-(cont.)
Perceptron Rule
wi ← wi + Δwi, where Δwi = η(t − o)xi
Here t is the target output, o is the perceptron output, and η is the learning rate.
Case 1- If the training example is correctly classified, (t − o) = 0 and no weights are updated.
Case 2- If the perceptron outputs -1 when the target is +1, the weights must be altered to increase the weighted sum.
Case 3- If input xi > 0, then increasing wi will make the perceptron classify the example correctly.
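As an illustrative sketch of this rule (not the textbook's code; eta and max_epochs are assumed hyperparameters), the loop below repeats the update wi ← wi + η(t − o)xi until every example is classified correctly:

```python
import numpy as np

def train_perceptron(examples, eta=0.1, max_epochs=100):
    """examples: list of (x, t) pairs with target t in {-1, +1}."""
    n = len(examples[0][0])
    w = np.zeros(n + 1)                        # weights, including bias w0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in examples:
            x1 = np.concatenate(([1.0], x))    # constant input x0 = 1
            o = 1 if np.dot(w, x1) > 0 else -1
            if o != t:
                w += eta * (t - o) * x1        # perceptron rule update
                mistakes += 1
        if mistakes == 0:                      # all examples classified correctly
            return w
    return w
```

Note that the loop terminates with zero mistakes only when the training examples are linearly separable.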
Note- The delta rule and the perceptron rule appear to be the same, but they differ: in the perceptron rule, o refers to the thresholded output, whereas in the delta rule, o refers to the linear unit output.
Gradient descent-
E(w) = 1/2 Σd∈D (td − od)^2
where D is the set of training examples, td is the target output for example d, and od is the output of the linear unit for example d.
Note- The error is defined as half the squared difference between the target output and the linear unit output, summed over all training examples.
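A sketch of batch gradient descent for this error (the names and hyperparameters are ours; X carries one training example per row, including a bias column):

```python
import numpy as np

def train_linear_unit(X, t, eta=0.01, steps=1000):
    """Minimize E(w) = 1/2 * sum_d (t_d - o_d)^2 for a linear unit o = w . x."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        o = X @ w                   # linear unit outputs o_d for all examples
        w += eta * X.T @ (t - o)    # Delta w_i = eta * sum_d (t_d - o_d) * x_id
    return w
```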
To understand gradient descent, it is helpful to visualize the hypothesis space.
Practical difficulties in applying gradient descent-
1. Converging to a local minimum can be quite slow, requiring many gradient descent steps.
2. If there are multiple local minima in the error surface, then there is no guarantee that the procedure will find the global minimum error.
Standard gradient descent vs. stochastic gradient descent-
- Step size: standard gradient descent uses a larger step size per weight update; stochastic gradient descent uses a smaller step size per weight update.
- Computation: standard gradient descent requires more computation per weight-update step, since it sums over all training examples; stochastic gradient descent takes less time per update.
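For contrast, a sketch of the stochastic (incremental) version, which updates the weights after each individual example instead of summing over all of D (same assumed names as above):

```python
import numpy as np

def train_linear_unit_sgd(X, t, eta=0.01, epochs=100):
    """Incremental delta rule: one small update per training example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_d, t_d in zip(X, t):
            o_d = x_d @ w                   # output for this one example
            w += eta * (t_d - o_d) * x_d    # incremental delta rule update
    return w
```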
Linear programming
• Other than the perceptron rule & the delta rule, there exists another algorithm for calculating the weight vector, i.e., linear programming.
MultiLayer Network
Let's consider a speech recognition task that involves distinguishing among 10 possible vowels, all spoken in the context "h_d" (i.e., hid, had, hood, head, etc.).
The decision surface for this task is highly nonlinear and is represented with a multilayer network.
MultiLayer Network(cont.)
Figure (a) represents a multilayer network consisting of-
1. Two input parameters, F1 & F2.
2. Ten outputs corresponding to the 10 possible vowel sounds in "h_d".
HOW is such a network trained?
A differentiable threshold unit
For that, we need a unit whose output is a nonlinear, differentiable function of its inputs.
• Like the perceptron, the sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result; for the sigmoid unit, however, the threshold output is a continuous function of its input: σ(y) = 1/(1 + e^-y).
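A minimal sketch of a sigmoid unit (the names are ours), showing the smooth threshold applied to the linear combination:

```python
import numpy as np

def sigmoid(y):
    """sigma(y) = 1 / (1 + e^-y), a continuous, differentiable threshold."""
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_unit_output(w, x):
    x = np.concatenate(([1.0], x))   # constant input x0 = 1 for bias weight w0
    return sigmoid(np.dot(w, x))
```

A property exploited by backpropagation is that the derivative has a simple form: σ'(y) = σ(y)(1 − σ(y)).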
BackPropagation Algorithm
• It uses gradient descent to minimize the squared error between the network output and the target output.
• As we are dealing with multiple output units rather than a single unit, the error is redefined as-
E(w) = 1/2 Σd∈D Σk∈outputs (tkd − okd)^2
where outputs is the set of output units, and tkd and okd are the target and output values of the kth output unit for training example d.
• But gradient descent can be guaranteed only to converge toward some local minimum, not necessarily the global minimum error.
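As an illustrative sketch (the two-layer shape and all names are our assumptions, not the textbook's code), one stochastic-gradient backpropagation step uses the error terms δk = ok(1 − ok)(tk − ok) for output units and δh = oh(1 − oh) Σk wkh δk for hidden units:

```python
import numpy as np

def backprop_step(x, t, W_hidden, W_out, eta=0.05):
    """One weight update for a two-layer sigmoid network with multiple outputs."""
    sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))
    x1 = np.concatenate(([1.0], x))          # input plus constant x0 = 1
    h = sigmoid(W_hidden @ x1)               # hidden-unit outputs
    h1 = np.concatenate(([1.0], h))
    o = sigmoid(W_out @ h1)                  # one output o_k per output unit

    delta_o = o * (1 - o) * (t - o)                      # output error terms
    delta_h = h * (1 - h) * (W_out[:, 1:].T @ delta_o)   # hidden error terms

    W_out += eta * np.outer(delta_o, h1)     # w_kj += eta * delta_k * h_j
    W_hidden += eta * np.outer(delta_h, x1)  # w_ji += eta * delta_j * x_i
    return W_hidden, W_out
```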
BackPropagation Algorithm(cont.)
• Adding momentum: picture gradient descent as a ball rolling down the error surface. Momentum keeps the ball rolling in the same direction from one iteration to the next, so it can roll through flat regions where, without momentum, it would stop.
• The weight-update rule is altered so that the update at the nth iteration depends partially on the update that occurred during the (n-1)th iteration, represented as follows-
Δwji(n) = η δj xji + α Δwji(n-1)
where α (0 ≤ α < 1) is the momentum term.
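A small sketch of this momentum-augmented search (grad_E is any function returning the gradient; eta, alpha, and steps are assumed values):

```python
import numpy as np

def gd_with_momentum(grad_E, w, eta=0.05, alpha=0.9, steps=100):
    """Each weight change keeps a fraction alpha of the previous change."""
    delta = np.zeros_like(w)
    for _ in range(steps):
        delta = -eta * grad_E(w) + alpha * delta  # Delta_w(n) depends on Delta_w(n-1)
        w = w + delta                             # momentum carries it through flat regions
    return w
```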
An alternative approach is to add a penalty term to the error function corresponding to the magnitude of the weight vector, in such a way that it leads the gradient descent search to seek weight vectors with small magnitudes, thereby reducing the risk of overfitting.
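A sketch of one such penalized step (gamma is an assumed decay coefficient): minimizing E(w) + γ Σ wi^2 adds the gradient term 2γw, which pulls every weight toward zero.

```python
def weight_decay_step(w, grad_E, eta=0.05, gamma=1e-4):
    """One gradient step on E(w) + gamma * sum(w_i^2).

    grad_E: the gradient vector of the original error E at w.
    """
    return w - eta * (grad_E + 2 * gamma * w)   # 2*gamma*w shrinks the magnitudes
```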
Advanced topics in ANN(cont.)
Gradient descent is one of the most general search methods for finding a hypothesis to minimize the error function, but it is not always efficient.
So, a number of alternative weight-optimization algorithms have been proposed, differing in how they make two decisions for each weight update-
1. Choosing a direction in which to alter the current weight vector.
2. Choosing a distance to move.
Some of these optimization methods are-
1. Line search
2. Conjugate gradient method
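As an illustration (using scipy for convenience; the linear-unit error is just an example objective), the conjugate gradient method picks a search direction and then line-searches along it for the step size:

```python
import numpy as np
from scipy.optimize import minimize

def fit_linear_unit_cg(X, t):
    """Minimize E(w) = 1/2 * ||t - X w||^2 with conjugate gradient."""
    E = lambda w: 0.5 * np.sum((t - X @ w) ** 2)
    grad_E = lambda w: -X.T @ (t - X @ w)   # gradient supplied for the line search
    return minimize(E, np.zeros(X.shape[1]), jac=grad_E, method="CG").x
```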
Advanced topics in ANN(cont.)
Recurrent networks-
• These are ANNs that apply to time-series data and that use the outputs of network units at time t as inputs to other units at time t+1.
• They are more difficult to train than networks with no loops, but they remain important because of their greater representational power.
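A minimal sketch of this feedback (an Elman-style loop; all shapes and names are our assumptions): the hidden state computed at time t is fed back in as an extra input at time t+1.

```python
import numpy as np

def recurrent_forward(xs, W_in, W_rec, w_out):
    """Run a simple recurrent unit over a sequence of input vectors xs."""
    sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))
    h = np.zeros(W_rec.shape[0])           # hidden state, zero at time t = 0
    outputs = []
    for x in xs:
        h = sigmoid(W_in @ x + W_rec @ h)  # current input + state from time t-1
        outputs.append(w_out @ h)          # network output at time t
    return outputs
```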
Advanced topics in ANN(cont.)
Dynamically modifying network structure-
A variety of methods have been proposed to dynamically grow or shrink the number of network units in order to improve accuracy and training efficiency. Such as,
1. Cascade-correlation algorithm: starts with a network with no hidden units, then grows the network by adding hidden units as needed to reduce the training error.
2. Optimal brain damage approach: removes the least useful connections, reducing the number of weights in a large network while improving accuracy and training efficiency.
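As a rough sketch of the shrinking idea only (optimal brain damage proper ranks connections by a second-derivative saliency measure; the magnitude-based pruning below is a simplified stand-in):

```python
import numpy as np

def prune_smallest_weights(W, fraction=0.1):
    """Zero out the smallest-magnitude fraction of the weights in W."""
    flat = np.abs(W).ravel()
    k = max(1, int(fraction * flat.size))
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(W) <= threshold, 0.0, W)
```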
END OF CHAPTER-4