
------------------------------------------------------------------ Lecture 1:
------------------------------------------------------------------

All exams are MCQs. (T1: 9 questions, 5 marks each; T2, T3, Compre.)

LAB: solve programming exercises.

Design automated programs to solve particular tasks.

############################################################### Lecture Starts
###############################################################

Speech recognition process (a code sketch follows the list):

1. Record the speech.
2. Segment the speech: extract the various features of the speech signal, e.g.
frequency, Mel-Frequency Cepstral Coefficients (MFCCs) and other multi-scale features.
3. Use an ML model for classification into the different types, say: angry, happy,
neutral, etc.
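
A minimal sketch of this pipeline in Python. librosa and scikit-learn are assumed
library choices (the lecture names neither), and the file names, labels and SVM
classifier are purely illustrative.

import numpy as np
import librosa
from sklearn.svm import SVC

def extract_features(wav_path):
    # Load the recorded speech and compute Mel-Frequency Cepstral Coefficients.
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    # Average over time so each recording becomes one fixed-length feature vector.
    return mfcc.mean(axis=1)

# Hypothetical labelled recordings: 0 = angry, 1 = happy, 2 = neutral.
X = np.array([extract_features(p) for p in ["a.wav", "b.wav", "c.wav"]])
y = np.array([0, 1, 2])

clf = SVC()                # any classifier fits step 3; an SVM is just one choice
clf.fit(X, y)
print(clf.predict(X[:1]))  # predicted emotion class for the first recording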

ML is a subset of a pattern recognition system.

The role of ML is to predict or classify the signal/model from the extracted features.

ML is a learning algorithm.

An example of pattern recognition input is sensor data, i.e. signals recorded using sensors.

Components of Pattern Recognition System:


1. Data Acquisition.
2. Preprocessing: filtering out information we do not need.
3. Feature Extraction.
4. Feature Selection.
5. Model Selection.
6. Training.

IoT: Multi-Feature Recognition. More robust than normal neural networks due to a
sturdy security system.

ML focuses on the development of computer programs that can access data and create a
hypothesis to learn for themselves.

ML is an application of AI that provides systems the ability to automatically learn
and improve from experience without being explicitly programmed.

ML is classified into three categories:


1. Supervised Learning.
2. Unsupervised Learning.
3. Semi-Supervised Learning.

SUPERVISED LEARNING:

1. Presence of a supervisor.

2. The output (labels) is already available, so the created model can be compared against it.

UNSUPERVISED LEARNING:

1. Training of the machine using information that is neither classified nor labeled,
and allowing the algorithm to act on that information without guidance.

2. Clustering: the problem of discovering the inherent groups in the data (a sketch follows below).
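
A minimal clustering sketch; scikit-learn's KMeans is an assumed choice, as the
lecture names no specific algorithm.

import numpy as np
from sklearn.cluster import KMeans

# Toy unlabelled data: two obvious groups, but no labels are given.
X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 8.8]])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: the inherent groups, found without supervision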

SEMI-SUPERVISED LEARNING:

1. Has both labeled and unlabeled data.

2. Machines which use this are able to considerably improve learning accuracy.

3. Usually chosen when acquiring labeled data requires skilled and relevant
resources in order to train on it/learn from it.

Supervised learning is categorised into two:

1. Classification: class values (output) are discrete (non-continuous).
2. Regression: class values (output) are continuous.

We use Linear Regression to create a hypothesis using the training data set and then
evaluate it using the test data set.

----------See slide for Batch Gradient Descent derivation. Easy to understand.
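
For reference, the standard least-squares cost and batch update rule that the slide's
derivation arrives at (the notation here is the usual one and an assumption, since the
slide itself is not reproduced):

J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2

w_j \leftarrow w_j - \alpha \frac{\partial J}{\partial w_j}
    = w_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

Batch gradient descent uses all m training instances in every update.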


############################################################### Lecture 1 Ends
###############################################################

------------------------------------------------------------------ Lecture 2:
------------------------------------------------------------------

The alpha value can be carefully chosen using grid search or nested cross-validation.

Grid Search:

1. Fix a range of alpha values and compute the mean-square error for each alpha in the
given range.

2. The alpha with the lowest error is the optimal value of alpha (see the sketch below).
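
A self-contained sketch of this grid search in Python; the toy data, the alpha range
and the use of alpha as a gradient-descent learning rate are all assumptions for
illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # toy training data
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def mse_after_descent(alpha, n_iters=200):
    # Run batch gradient descent with this alpha, then report the MSE.
    w = np.zeros(3)
    for _ in range(n_iters):
        w -= alpha / len(y) * X.T @ (X @ w - y)
    return np.mean((X @ w - y) ** 2)

alphas = [0.001, 0.01, 0.1, 0.5]               # 1. fix a range of alpha values
best = min(alphas, key=mse_after_descent)      # 2. lowest MSE wins
print("optimal alpha:", best)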

The batch gradient descent method is suitable for a small data set, but not for a
large one.

For large data sets, we use mini batch gradient descent.

Flow of Mini Batch Gradient Descent:

1. Randomly shuffle the data set.
2. Select 50 instances.
3. Shuffle the data set again.
4. Select the next 50 instances.

Do the same until the whole data set has been processed.

This reduces the computational complexity, as we have only 50 instances to compute
from at a time. However, we may only reach a local minimum.
Hence, the cost graph of Batch Gradient Descent will be much smoother than that of
Mini Batch Gradient Descent (see the sketch below).
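
A sketch of one pass of mini-batch gradient descent with batch size 50, following the
flow above; the linear-regression gradient is an assumed choice of cost.

import numpy as np

def minibatch_epoch(w, X, y, alpha, batch_size=50):
    idx = np.random.permutation(len(y))        # randomly shuffle the data set
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]      # select the next 50 instances
        Xb, yb = X[b], y[b]
        grad = Xb.T @ (Xb @ w - yb) / len(b)   # gradient on this mini-batch only
        w = w - alpha * grad                   # weight update
    return w                                   # repeat epochs until converged

One detail worth noting: this sketch shuffles once and then walks through the data in
chunks, which is the common way to implement the "shuffle, take 50, repeat" flow.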

Stochastic Gradient Descent:

Processes only one instance at a time.
The algorithm is in the lecture slides.

Stochastic Gradient Descent process (sketched below):

1. Shuffle all the data.
2. Select one instance.
3. Evaluate the cost function.
4. Repeat steps 1 through 3 a fixed number of times.
5. The alpha value is updated every time; the minimum value of alpha is taken as the
final alpha.
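
A sketch of the stochastic variant: the same loop, but every weight update uses a
single instance (step 2 above). The linear-regression gradient is again an assumption.

import numpy as np

def sgd_epoch(w, X, y, alpha):
    for i in np.random.permutation(len(y)):    # shuffle all the data
        xi, yi = X[i], y[i]                    # select one instance
        w = w - alpha * (xi @ w - yi) * xi     # cost gradient and weight update
    return w                                   # repeat for a fixed number of passes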

Regularized Linear Regression (Ridge Regression):

Used to deal with the bias-variance problem of Linear Regression.

Bias-Variance Problem:

We create multiple models so as to fit the training data set. Some models fit the
training set perfectly, some fit it with moderate error and some with maximum error.
We do not know what the test data set is going to be, so it is harmful to choose the
perfectly fitting model, as the test data set may be entirely different from the
training set.
We also cannot choose the least fitting model, as it may not fit the test data set
either.
It is advisable to choose the moderately fitting model, as it generalizes: it fits
the data set with moderate error.

A model that fits the training data set perfectly is said to be overfitting.
A model that barely fits the training data set is said to be underfitting.

Hence the Regularized Linear Regression model is better than Linear Regression, as it
uses regularization to avoid overfitting.

Comparison of the Batch Gradient Descent, Mini Batch Gradient Descent and Stochastic
Gradient Descent graphs.

----------See slides for the derivation of Mini Batch Gradient Descent and Stochastic
Gradient Descent. Easy to understand.

Algorithm for finding Gradient Descent (all types; a sketch follows below):

1. Initialize the weights Wj using a uniform or Gaussian distribution.
2. Input the data set.
3. Evaluate the hypothesis.
4. Evaluate the cost function (different for different types of regression).
5. Update the weights.
6. Repeat steps 3 to 5 until we get the optimal cost function or until the decided
number of iterations is reached.
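
A minimal end-to-end sketch of these six steps for plain linear regression; the
initialization range, learning rate and stopping tolerance are assumptions.

import numpy as np

def gradient_descent(X, y, alpha=0.01, max_iters=1000, tol=1e-8):
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, X.shape[1])    # 1. declare Wj (uniform distribution)
    prev_cost = np.inf                        # 2. the data set is the X, y inputs
    for _ in range(max_iters):
        h = X @ w                             # 3. evaluate the hypothesis
        cost = np.mean((h - y) ** 2) / 2      # 4. evaluate the cost function
        w -= alpha / len(y) * X.T @ (h - y)   # 5. weight update
        if abs(prev_cost - cost) < tol:       # 6. stop at (near-)optimal cost
            break                             #    or after the decided iterations
        prev_cost = cost
    return w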

Test 1:
9 MCQ questions (5 marks each, in Google Quiz).
Types of Questions:
1. All equations are to be remembered.
2. Given test vectors, evaluate the cost function, weights, etc. up to a given
iteration.

############################################################### Lecture 2 Ends
###############################################################
------------------------------------------------------------------ Lecture 3:
------------------------------------------------------------------

Important Note: Remember this or it will be confusing.

The L2 norm (Ridge Regression / Regularized Gradient Descent) is the one with the
penalty lambda * (summation of Wj squared).
The L1 norm is the one with the penalty lambda * (summation of |Wj|).
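
Written out (these are the standard forms; the 1/2m scaling on the error term is a
common convention and an assumption here):

\text{L2 (Ridge):}\quad J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j} w_j^2

\text{L1 (Lasso):}\quad J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j} |w_j|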

https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
Refer to the above link for more information
----------See Slide for Equation

Vectorization Based Linear Regression:

Jibber jabber about slide no. 24.

Finally some important thing came out of his mouth.

Limitation of VBLR (abbr.):

1. The evaluation of W, i.e. the weight vector, is computationally very complex and
very time consuming if we consider a large data set.

The same problem occurs with vectorization-based Ridge Regression.
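
For reference, the closed-form (vectorized) solutions this limitation refers to,
presumably what the slides derive; inverting the n x n matrix costs roughly O(n^3),
which is what becomes expensive on large data sets:

w = (X^T X)^{-1} X^T y \qquad \text{(linear regression)}

w = (X^T X + \lambda I)^{-1} X^T y \qquad \text{(ridge regression)}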

Can't make notes till 27:56. All derivation-based explanation.

After this, a MATLAB demonstration is done.

############################################################### Lecture 3 Ends
###############################################################

------------------------------------------------------------------ Lecture 4:
------------------------------------------------------------------

Logistic Regression and Classification problems

Logistic regression is generally for binary classification.

In Logistic Regression,
P(y) -> prior
P(x|y) -> likelihood (modelled as a Gaussian distribution)
P(x) -> evidence
P(y|x) -> a posteriori (posterior) distribution
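
These four quantities combine via Bayes' rule; the sigmoid hypothesis used by binary
logistic regression is added for reference:

P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}

h_w(x) = \sigma(w^T x) = \frac{1}{1 + e^{-w^T x}}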

############################################################### Lecture 4 Ends
###############################################################

------------------------------------------------------------------ Lecture 5:
------------------------------------------------------------------

In the one-vs-all algorithm for Multiclass Logistic Regression, we have a major
problem of class imbalance.

Overfitting may happen here.


Options to overcome this problem:

1. Use random sampling:

pick random instances from the other classes so that the number of instances
selected is the same for both the '0' and '1' classes.

2. one vs one Multiclass Logistic Regression Algorithm

Here, number of models = c(c-1)/2, where c = total number of classes.

For 4 classes: number of models = (4*3)/2 = 6.
Namely: 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4.
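
A quick check of the pair count in Python (illustrative only):

from itertools import combinations

c = 4
pairs = list(combinations(range(1, c + 1), 2))  # every one-vs-one model
print(pairs)       # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(len(pairs))  # 6 == c * (c - 1) // 2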

If there is a tie in "one vs all" or "one vs one", we consider the maximum value of
the likelihood function among the tied classes and assign that class to the output.

############################################################### Lecture 5 Ends
###############################################################

------------------------------------------------------------------ Lecture 6:
------------------------------------------------------------------

Underfitting is:
High Bias and Low Variance

Overfitting is:
Low Bias and High Variance

We can say that a first-order polynomial will underfit, and a higher-order polynomial
will overfit.
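
A small sketch illustrating this; scikit-learn is an assumed library choice, and the
degrees and toy data are illustrative.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + 0.2 * rng.normal(size=20)  # noisy toy data

for degree in (1, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y)
    print(degree, "training MSE:", np.mean((model.predict(x) - y) ** 2))
# Degree 1 keeps a high training error (high bias, underfitting); degree 9 drives
# training error near zero but oscillates between points (high variance, overfitting).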
