
Course: Elective III (Machine Learning) (404191)

Credit: 3

Teaching Scheme: Lecture: 03 Hr/Week
Examination Scheme: In-Sem: 30 Marks; End-Sem: 70 Marks

Course: Lab Practice IV (404194)

Credit: 1

Teaching Scheme: Practical: 02 Hr/Week
Examination Scheme: PR: 50 Marks; TW: 50 Marks

Academic Year: 2020- 2021, Semester: II


Course Coordinator
Dr. Mrs. R. S. Kamathe
Course Contents

Unit-wise contents table (Units I–VI): not reproduced in this extract.
Course (Theory) Objectives:

1. To introduce basic concepts in Machine Learning and its types. (Unit I)
2. To explain regression and classification techniques for supervised learning problems. (Unit II)
3. To discuss dimensionality reduction and clustering algorithms. (Unit III)
4. To elaborate Artificial Neural Networks for classification. (Unit IV and V)
5. To explain Deep Learning and Convolutional Neural Networks. (Unit VI)
Course (Theory) Outcomes:

At the end of the course, the student will be able to:

1. Explain basic concepts in Machine Learning and its types. (BT Level 2, Understand) (Unit I)
2. Apply regression and classification techniques to supervised learning problems. (BTL-3, Apply) (Unit II)
3. Illustrate dimensionality reduction and clustering algorithms. (BTL-3, Apply) (Unit III)
4. Use Artificial Neural Networks for classification. (BTL-3, Apply) (Unit IV and V)
5. Explain Deep Learning and Convolutional Neural Networks. (BTL-2, Understand) (Unit VI)
What Is Machine Learning?

• A subset of artificial intelligence (AI), machine learning (ML) is the area of computational science that focuses on analyzing and interpreting patterns and structures in data to enable learning, reasoning, and decision making outside of human interaction.

• Simply put, machine learning allows the user to feed a computer algorithm an immense amount of data and have the computer analyze and make data-driven recommendations and decisions based only on the input data.

• If any corrections are identified, the algorithm can incorporate that information to improve its future decision making.
The main goal of machine learning is:

• to study, engineer, and improve mathematical models,

• which can be trained (once or continuously) with context-related data (provided by a generic environment),

• to infer the future and to make decisions without complete knowledge of all influencing elements (external factors).

In other words, an agent (which is a software entity that receives information from an environment, picks the best action to reach a specific goal, and observes the results of it) adopts a statistical learning approach, trying to determine the right probability distributions and use them to compute the action (value or decision) that is most likely to be successful (with the least error).

- Ref. Machine Learning Algorithms by Giuseppe Bonaccorso


Few definitions of Machine Learning (ML):

Nvidia defines it as “the practice of using algorithms to parse (analyze) data, learn from it, and then make a determination or prediction about something in the world.”

McKinsey & Company agrees with Nvidia, saying that ML is “based on algorithms that can learn from data without relying on rules-based programming.”

Stanford suggests that ML is “the science of getting computers to act without being explicitly programmed.”

Carnegie Mellon’s definition states that “the field of Machine Learning seeks to answer the question ‘How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?’”
The goal of Machine Learning:

Regardless of the definition you choose, at its most basic level, the goal of machine learning is:

• To adapt to new data independently and make decisions and recommendations based on thousands of calculations and analyses.

• This is done by feeding data to artificial intelligence machines or deep learning business applications, which learn from the data they are fed.

• The systems learn, identify patterns, and make decisions with minimal intervention from humans.

• Ideally, machines increase accuracy and efficiency and remove (or greatly reduce) the possibility of human error.
How Does Machine Learning Work?

Machine learning is made up of three parts:

• The computational algorithm at the core of making determinations.

• Variables and features that make up the decision.

• Base knowledge for which the answer is known, which enables (trains) the system to learn.

Initially, the model is fed parameter data for which the answer is known. The algorithm is then run, and adjustments are made until the algorithm’s output (learning) agrees with the known answer. At this point, increasing amounts of data are input to help the system learn and process higher computational decisions.
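As a minimal sketch of this loop (not part of the original slides; it assumes scikit-learn and its built-in Iris dataset), the "base knowledge" is a labelled training split, fit() performs the adjustment until outputs agree with the known answers, and a held-out split checks the learning:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Labelled "base knowledge": inputs X with known answers y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # adjust parameters until outputs agree with known answers

# Check the learned mapping on data the model has not seen
print(accuracy_score(y_test, model.predict(X_test)))
```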
Why Is Machine Learning Important?

• The nearly limitless quantity of available data, affordable data storage, and the growth of less expensive and more powerful processing have propelled the growth of ML.

• Now many industries are developing more robust models capable of analyzing bigger and more complex data while delivering faster, more accurate results on vast scales.

• ML tools enable organizations to more quickly identify profitable opportunities and potential risks.

• The practical applications of machine learning drive business results which can dramatically affect a company’s bottom line.

..........continued
Why Is Machine Learning Important?

• New techniques in the field are evolving rapidly and have expanded the application of ML to nearly limitless possibilities.

• Industries that depend on vast quantities of data, and need a system to analyze it efficiently and accurately, have embraced ML as the best way to build models, strategize, and plan.
Machine Learning Use Cases:

Machine learning has applications in all types of industries, including manufacturing, retail, healthcare and life sciences, travel and hospitality, financial services, and energy, feedstock, and utilities.

Use cases include:

Manufacturing: Predictive maintenance and condition monitoring
Retail: Upselling and cross-channel marketing
Healthcare and life sciences: Disease identification and risk stratification
Travel and hospitality: Dynamic pricing
Financial services: Risk analytics and regulation
Energy: Energy demand and supply optimization

Quick tour of what ML is and its types:
https://www.youtube.com/watch?v=ukzFI9rgwfU
Supervised learning: develop a predictive model based on both input and output data.
Unsupervised learning: group and interpret data based on input data only.
• In supervised learning, the machine is taught by example.

• The operator provides the machine learning algorithm with a known dataset that includes desired inputs and outputs, and the algorithm must find a method to determine how to arrive at those outputs from the inputs.

• While the operator knows the correct answers to the problem, the algorithm identifies patterns in data, learns from observations and makes predictions.

• The algorithm makes predictions and is corrected by the operator, and this process continues until the algorithm achieves a high level of accuracy/performance.

• Supervised learning is the most popular paradigm for performing machine learning operations.

• It is widely used for data where there is a precise mapping between input and output data.

• The dataset, in this case, is labeled, meaning that the algorithm identifies the features explicitly and carries out predictions or classification accordingly.

• As the training period progresses, the algorithm is able to identify the relationships between the two variables, such that we can predict a new outcome.

• The resulting supervised learning algorithms are task-oriented.

• As we provide it with more and more examples, it is able to learn better, so that it can undertake the task and yield the output more accurately.

Under the umbrella of supervised learning fall: classification, regression and forecasting.

Classification:

• In classification tasks, the machine learning program must draw a conclusion from observed values and determine to what category new observations belong.

• For example, when filtering emails as ‘spam’ or ‘not spam’, the program must look at existing observational data and filter the emails accordingly.

Regression:

• In regression tasks, the machine learning program must estimate, and understand, the relationships among variables.

• Regression analysis focuses on one dependent variable and a series of other changing variables, making it particularly useful for prediction and forecasting.

Forecasting:

Forecasting is the process of making predictions about the future based on past and present data, and is commonly used to analyse trends.
Some of the algorithms that come under supervised learning are as follows (a short code sketch comparing several of them follows this list):

• Linear Regression

In linear regression, we measure the linear relationship between two or more variables. Based on this relationship, we perform predictions that follow this linear pattern.

• Random Forest

Random Forests are an ensemble learning method for performing classification, regression and other tasks through the construction of decision trees, providing as output the class which is the mode (or mean) of the underlying individual trees.

........continued

• Gradient Boosting

Gradient Boosting is an ensemble learning method that combines several weak decision trees into a powerful classifier.

• Support Vector Machine

SVMs are powerful classifiers that are used for classifying a binary dataset into two classes with the help of hyperplanes.

• Logistic Regression

It makes use of an S-shaped curve, generated with the help of the logit function, to categorize the data into their respective classes.

• Artificial Neural Networks

ANNs are modeled after the human brain and they learn from the data over time. They form the basis of a much larger area of machine learning called Deep Learning.
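As a rough, hedged illustration (not from the slides), the sketch below trains several of the listed algorithms on a synthetic labeled dataset with scikit-learn and compares cross-validated accuracy; the dataset and all parameters are arbitrary choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic labeled dataset standing in for any supervised problem
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "support vector machine": SVC(),
}
for name, clf in models.items():
    # 5-fold cross-validated accuracy for each classifier
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```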
Supervised Learning Use Case:

• Facial recognition is one of the most popular applications of supervised learning, more specifically of Artificial Neural Networks.

• Convolutional Neural Networks (CNNs) are a type of ANN used for identifying the faces of people.

• These models are able to draw features from the image through various filters. Finally, if there is a high similarity score between the input image and an image in the database, a positive match is provided.
Semi-supervised learning:

• Semi-supervised learning is similar to supervised learning, but instead uses both labelled and unlabelled data.

• Labelled data is essentially information that has meaningful tags so that the algorithm can understand the data, whilst unlabelled data lacks that information.

• By using this combination, machine learning algorithms can learn to label unlabelled data.
Unsupervised learning:

• Here, the machine learning algorithm studies data to identify patterns.

• There is no answer key or human operator to provide instruction.

• Instead, the machine determines the correlations and relationships by analysing available data.

• In an unsupervised learning process, the machine learning algorithm is left to interpret large data sets and address that data accordingly.

• The algorithm tries to organise that data in some way to describe its structure.

• This might mean grouping the data into clusters or arranging it in a way that looks more organised.
Unsupervised learning:

• As it assesses more data, its ability to make decisions on that data gradually improves and becomes more refined.

Under the umbrella of unsupervised learning falls:

Clustering:

• Clustering involves grouping sets of similar data (based on defined criteria).

• It’s useful for segmenting data into several groups and performing analysis on each data set to find patterns (see the sketch below).
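As a small illustrative sketch (not from the slides, using scikit-learn's KMeans on made-up 2-D points), clustering assigns each point to a group without any labels being provided:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points with two visible groups; no labels are given
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [8.0, 8.0], [9.0, 11.0], [8.5, 9.5]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment discovered for each point
print(km.cluster_centers_)  # center of each discovered group
```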
Reinforcement learning:

• Reinforcement learning focuses on regimented learning processes, where a machine learning algorithm is provided with a set of actions, parameters and end values.

Reinforcement learning (RL) is concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.
Reinforcement learning:

• By defining the rules, the machine learning algorithm then tries to:

o explore different options and possibilities,
o monitor and evaluate each result to determine which one is optimal.

• Reinforcement learning teaches the machine by trial and error.

• It learns from past experiences and begins to adapt its approach in response to the situation to achieve the best possible result.
Examples of Machine Learning Algorithms:
Classification of Machine learning algorithms:

Learning a Function:

Machine learning can be summarized as learning a function (f) that maps input variables (X) to output variables (Y):

Y = f(X)

• An algorithm learns this target mapping function from training data.

• The form of the function is unknown, so our job as machine learning practitioners is to evaluate different machine learning algorithms and see which is better at approximating the underlying function.

• Different algorithms make different assumptions or biases about the form of the function and how it can be learned.
Classification of Machine learning algorithms:

Machine learning algorithms are classified into two distinct groups: parametric and nonparametric models.

Parametric Machine Learning Algorithms:

• Assumptions can greatly simplify the learning process, but can also limit what can be learned.

• Algorithms that simplify the function to a known form are called parametric machine learning algorithms.

“A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model. No matter how much data you throw at a parametric model, it won’t change its mind about how many parameters it needs.”
— Artificial Intelligence: A Modern Approach
Parametric Machine Learning Algorithms: ...continued

The algorithms involve two steps:

1. Select a form for the function.
2. Learn the coefficients for the function from the training data.

Example:
An easy-to-understand functional form for the mapping function is a line, as is used in linear regression:

b0 + b1*x1 + b2*x2 = 0

where b0, b1 and b2 are the coefficients of the line that control the intercept and slope, and x1 and x2 are two input variables.
Parametric Machine Learning Algorithms: ...Example continued

• Assuming the functional form of a line greatly simplifies the learning process.

• Now, all we need to do is estimate the coefficients of the line equation and we have a predictive model for the problem.

• Often the assumed functional form is a linear combination of the input variables, and as such parametric machine learning algorithms are often also called “linear machine learning algorithms”.

The problem is, the actual unknown underlying function may not be a linear function like a line. It could be almost a line and require some minor transformation of the input data to work right. Or it could be nothing like a line, in which case the assumption is wrong and the approach will produce poor results.
Parametric Machine Learning Algorithms: ...continued

Some more examples of parametric machine learning algorithms include:

o Logistic Regression
o Linear Discriminant Analysis
o Perceptron
o Naive Bayes
o Simple Neural Networks
Parametric Machine Learning Algorithms: ...continued

Benefits of Parametric Machine Learning Algorithms:

Simpler: These methods are easier to understand, and their results are easier to interpret.
Speed: Parametric models are very fast to learn from data.
Less Data: They do not require as much training data and can work well even if the fit to the data is not perfect.

Limitations of Parametric Machine Learning Algorithms:

Constrained: By choosing a functional form, these methods are highly constrained to the specified form.
Limited Complexity: The methods are more suited to simpler problems.
Poor Fit: In practice, the methods are unlikely to match the underlying mapping function.
Nonparametric Machine Learning Algorithms:

Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms.

By not making assumptions, they are free to learn any functional form from the training data.

“Nonparametric methods are good when you have a lot of data and no prior knowledge, and when you don’t want to worry too much about choosing just the right features.”
— Artificial Intelligence: A Modern Approach, page 757
Nonparametric Machine Learning Algorithms:

• Nonparametric methods seek to best fit the training data in constructing the mapping function, whilst maintaining some ability to generalize to unseen data. As such, they are able to fit a large number of functional forms.

• An easy-to-understand nonparametric model is the k-nearest neighbors algorithm, which makes predictions for a new data instance based on the k most similar training patterns.

• The method does not assume anything about the form of the mapping function, other than that patterns which are close are likely to have a similar output variable.

Some more examples of popular nonparametric machine learning algorithms are: k-Nearest Neighbors, Decision Trees and Support Vector Machines.
Benefits of Nonparametric Machine Learning Algorithms:

Flexibility: Capable of fitting a large number of functional forms.
Power: No assumptions (or weak assumptions) about the underlying function.
Performance: Can result in higher-performance models for prediction.

Limitations of Nonparametric Machine Learning Algorithms:

More Data: Require a lot more training data to estimate the mapping function.
Slower: A lot slower to train, as they often have far more parameters to train.
Overfitting: More of a risk of overfitting the training data, and it is harder to explain why specific predictions are made.
Regression Analysis in Machine Learning:

• Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables.

• More specifically, regression analysis helps us to understand how the value of the dependent variable changes corresponding to one independent variable when the other independent variables are held fixed.

• It predicts continuous/real values such as temperature, age, salary, price, etc.

Example: Suppose there is a marketing company A, which runs various advertisements every year and gets sales in return. The accompanying table (not reproduced in this extract) shows the advertisement spend by the company in the last 5 years and the corresponding sales.
Now, the company wants to spend $200 on advertisement in the year 2019 and wants to know the prediction about the sales for this year.

To solve such prediction problems in machine learning, we need regression analysis.

Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict the continuous output variable based on one or more predictor variables.

• It is mainly used for prediction, forecasting, time series modeling, and determining the causal-effect relationship between variables.

• In regression, we plot a graph between the variables which best fits the given datapoints; using this plot, the machine learning model can make predictions about the data.

• In simple words, "Regression shows a line or curve that passes through all the datapoints on the target-predictor graph in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells whether a model has captured a strong relationship or not.
Some examples of regression:

• Prediction of rain using temperature and other factors
• Determining market trends
• Prediction of road accidents due to rash driving

Quick Check:

Which of the following is a regression task?

• Predicting the age of a person
• Predicting the nationality of a person
• Predicting whether the stock price of a company will increase tomorrow
• Predicting whether a document is related to sighting of UFOs

Solution: Predicting the age of a person (because it is a real value; predicting nationality is categorical, whether the stock price will increase is a discrete yes/no answer, and predicting whether a document is related to UFOs is again a discrete yes/no answer).
Terminologies Related to Regression Analysis:

• Dependent Variable: The main factor in regression analysis which we want to predict or understand is called the dependent variable. It is also called the target variable.

• Independent Variable: The factors which affect the dependent variable, or which are used to predict its values, are called independent variables, also called predictors.

• Outliers: An outlier is an observation which contains either a very low or a very high value in comparison to the other observed values. An outlier may hamper the result, so it should be avoided.
Terminologies Related to Regression Analysis: ...continued

• Multicollinearity: If the independent variables are highly correlated with each other, then such a condition is called multicollinearity. It should not be present in the dataset, because it creates a problem while ranking the most affecting variable.

• Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting. And if our algorithm does not perform well even with the training dataset, the problem is called underfitting.
Why do we use Regression Analysis?

• Regression analysis helps in the prediction of a continuous variable (real/continuous values).

• Regression estimates the relationship between the target and the independent variables.

• It is used to find the trends in data.

• By performing regression, we can confidently determine the most important factor, the least important factor, and how each factor affects the other factors.
Linear Regression:

• Linear regression is a statistical regression method which is used for predictive analysis.

• It is one of the simplest and easiest algorithms; it works on regression and shows the relationship between continuous variables.

• It is used for solving regression problems in machine learning.

• Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.

• If there is only one input variable (x), such linear regression is called simple linear regression. And if there is more than one input variable, it is called multiple linear regression.
Example: Here we are predicting the salary of an employee on the basis of years of experience.

Mathematical equation for linear regression: Y = aX + b

Here, Y = dependent variable (target variable), X = independent variable (predictor variable), and a and b are the linear coefficients.

Some popular applications of linear regression:

• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic
Logistic Regression:

• Logistic regression is another supervised learning algorithm, which is used to solve classification problems.

• In classification problems, we have dependent variables in a binary or discrete format, such as 0 or 1.

• The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.

• It is a predictive analysis algorithm which works on the concept of probability.

• Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.

• Logistic regression uses the sigmoid function or logistic function, which is a more complex cost function. This sigmoid function is used to model the data. When we provide the input values (data) to the function, it gives an S-shaped curve:

f(x) = 1 / (1 + e^(-x))

It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.

There are three types of logistic regression (a short code sketch follows):

• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)
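A hedged sketch of binary logistic regression (not from the slides; the hours-studied vs. pass/fail data is invented for illustration): predict_proba returns the sigmoid output, and predict applies the default 0.5 threshold described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: hours studied -> pass (1) / fail (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2.2]]))  # sigmoid output: probability of each class
print(clf.predict([[2.2]]))        # class after applying the default 0.5 threshold
```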
Polynomial Regression:

• Polynomial regression is a type of regression which models a non-linear dataset using a linear model.

• It is similar to multiple linear regression, but it fits a non-linear curve between the values of x and the corresponding conditional values of y.

• Suppose there is a dataset consisting of datapoints which are distributed in a non-linear fashion. In such a case, linear regression will not best fit those datapoints. To cover such datapoints, we need polynomial regression.

• In polynomial regression, the original features are transformed into polynomial features of a given degree and then modeled using a linear model, which means the datapoints are best fitted using a polynomial line.
The equation for polynomial regression is also derived from the linear regression equation: the linear regression equation Y = b0 + b1x is transformed into the polynomial regression equation

Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ

Here Y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is our independent/input variable.

The model is still linear, because the coefficients enter linearly even though the features (x², x³, ...) are non-linear in x.
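A small sketch of this idea (not from the slides; the cubic dataset and all parameters are synthetic): PolynomialFeatures builds the transformed features [x, x², x³], and an ordinary LinearRegression is then fit on them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * X.ravel() ** 3 - X.ravel() + rng.normal(scale=1.0, size=50)  # cubic data + noise

# Transform x into [x, x^2, x^3], then fit an ordinary linear model on those features
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X, y)
print(model.predict([[2.0]]))  # close to 0.5*8 - 2 = 2.0
```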
Example: Linear regression; identify errors of prediction in a scatter plot with a regression line

• In simple linear regression, we predict scores on one variable from the scores on a second variable.

• The variable we are predicting is called the criterion variable and is referred to as Y.

• The variable we are basing our predictions on is called the predictor variable and is referred to as X.

• When there is only one predictor variable, the prediction method is called simple regression.

• In simple linear regression, the topic of this section, the predictions of Y, when plotted as a function of X, form a straight line.
Example: Linear regression; identify errors of prediction in a scatter plot with a regression line ...continued

The example data in Table 1 are plotted in Figure 1. One can see that there is a positive relationship between X and Y. If you were going to predict Y from X, the higher the value of X, the higher your prediction of Y.

Table 1. Example data.

X     Y
1.00  1.00
2.00  2.00
3.00  1.30
4.00  3.75
5.00  2.25

Figure 1. A scatter plot of the example data.


Example: Linear regression; identify errors of prediction in a scatter plot with a regression line ...continued

Linear regression consists of finding the best-fitting straight line through the points. The best-fitting line is called a regression line.

The black diagonal line in Figure 2 is the regression line and consists of the predicted score on Y for each possible value of X.

The vertical lines from the points to the regression line represent the errors of prediction.

• The red point is very near the regression line; its error of prediction is small.

• By contrast, the yellow point is much higher than the regression line, and therefore its error of prediction is large.

Figure 2. A scatter plot of the example data. The black line consists of the predictions, the points are the actual data, and the vertical lines between the points and the black line represent errors of prediction.
Example: Linear regression ...continued

The error of prediction for a point is the value of the point minus the predicted value (the value on the line).
If the predicted values are denoted Y’, then the errors of prediction are Y − Y’.

Here the predicted score is Y’ = mX + C,
where m is the slope of the line and C is the Y intercept.

Let Mx: mean of X,
My: mean of Y,
Sx: standard deviation of X,
Sy: standard deviation of Y, and
r: correlation between X and Y.

Quick review of Measures of Variability:

WHAT IS VARIABILITY?
Variability refers to how "spread out" a group of scores is.

VARIANCE:
• Variability can also be defined in terms of how close the scores in the distribution are to the middle of the distribution.

• Using the mean as the measure of the middle of the distribution, the variance is defined as the average squared difference of the scores from the mean.

• The formula for the sample variance (the form used in the worked examples below) is:

S² = Σ(X − M)² / (N − 1)

where S² is the variance, M is the sample mean, and N is the number of scores in the sample.
STANDARD DEVIATION:
The standard deviation is simply the square root of the variance.

PEARSON’S CORRELATION:
• Compute the deviation scores x = X − Mx and y = Y − My.

• The formula for Pearson’s correlation (r) is:

r = Σxy / √(Σx² · Σy²)

• Now the slope m can be calculated as m = r · Sy / Sx,
and the intercept as C = My − m·Mx.

Then the formula for the regression line is Y’ = mX + C, where Y’ is the predicted score.
Example: Linear regression ...continued

X     Y     x = X−Mx  x²  y = Y−My  y²      xy
1.00  1.00  −2        4   −1.06     1.123   2.12
2.00  2.00  −1        1   −0.06     0.0036  0.06
3.00  1.30   0        0   −0.76     0.5776  0
4.00  3.75   1        1    1.69     2.856   1.69
5.00  2.25   2        4    0.19     0.036   0.38

Mx = 3, My = 2.06, Σx² = 10, Σy² = 4.5968, Σxy = 4.25
Sx = 1.581, Sy = 1.07, r = 0.626

Now, m = r·Sy/Sx = 0.425 and C = My − m·Mx = 0.785, so

Y’ = 0.425X + 0.785

Then
For X = 1, Y’ = (0.425)(1) + 0.785 = 1.21
For X = 2, Y’ = (0.425)(2) + 0.785 = 1.64
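The hand calculation above can be checked in a few lines of NumPy (a sketch, not part of the slides); np.corrcoef and the sample standard deviations (ddof=1) reproduce m ≈ 0.425 and C ≈ 0.785:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([1.0, 2.0, 1.3, 3.75, 2.25])

r = np.corrcoef(X, Y)[0, 1]            # Pearson's correlation, ~0.627
m = r * Y.std(ddof=1) / X.std(ddof=1)  # slope from r, Sy, Sx -> ~0.425
C = Y.mean() - m * X.mean()            # intercept -> ~0.785
print(m, C)

# np.polyfit fits the same least-squares line directly
print(np.polyfit(X, Y, 1))             # [~0.425, ~0.785]
```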


Q. Using regression analysis, fit the linear model to the given data

X: 17  13  12  15  16  14  16  17  18  19
Y: 94  70  59  80  92  65  87  95  99  105

and predict the output for X = 12.5. [5M]

Solution:

X   Y    x = X−Mx  x²     y = Y−My  y²      xy
17  94    1.3       1.69    9.4      88.36   12.22
13  70   −2.7       7.29  −14.6     213.16   39.42
12  59   −3.7      13.69  −25.6     655.36   94.72
15  80   −0.7       0.49   −4.6      21.16    3.22
16  92    0.3       0.09    7.4      54.76    2.22
14  65   −1.7       2.89  −19.6     384.16   33.32
16  87    0.3       0.09    2.4       5.76    0.72
17  95    1.3       1.69   10.4     108.16   13.52
18  99    2.3       5.29   14.4     207.36   33.12
19  105   3.3      10.89   20.4     416.16   67.32

Mx = 15.7, My = 84.6, Σx² = 44.1, Σy² = 2154.4, Σxy = 299.8
Sx = 2.213, Sy = 15.475, r = 0.9726

Now, m = r·Sy/Sx = 6.79 and C = My − m·Mx = −22.003, so

Y’ = 6.79X − 22

Then, for X = 12.5, Y’ = (6.79)(12.5) − 22 = 62.875 is the predicted output.
Non-linear regression:

The simplest statistical relationship between a dependent variable Y and one or more independent or predictor variables X1, X2, ... is

Y = B0 + B1X1 + B2X2 + ... + ε

where ε represents a random deviation from the mean relationship represented by the rest of the model.

With a single predictor, the model is a straight line (linear regression).

With more than one predictor, the model is a plane or hyperplane.

While such models are adequate for representing many relationships (at least over a limited range of the predictors), there are many cases when a more complicated model is required.

Linear regression relates two variables with a straight line; nonlinear regression relates the variables using a curve.

Nonlinear regression model:

Y = f(X, β) + ϵ

where X is a vector of p predictors,
β is a vector of k parameters,
f(·) is some known regression function,
and ϵ is an error term whose distribution may or may not be normal.
(The error term represents random sampling noise or the effect of variables not included in the model.)
Notice that the dimension of the parameter vector is no longer necessarily one greater than the number of predictors.

Some examples of nonlinear regression models (illustrative, commonly cited forms; the original figure is not reproduced here) are the exponential model Y = β0·e^(β1·X) + ϵ and the Michaelis–Menten model Y = β1·X / (β2 + X) + ϵ. A code sketch for fitting such a model follows.
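As a minimal sketch (not from the slides), scipy.optimize.curve_fit estimates the parameters of a chosen nonlinear function by least squares; the exponential form and all numbers here are made up for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, b0, b1):
    # Exponential growth: a common nonlinear regression form
    return b0 * np.exp(b1 * x)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(0.8 * x) + rng.normal(scale=0.1, size=x.size)  # noisy synthetic data

params, _ = curve_fit(model, x, y, p0=[1.0, 0.5])  # p0: initial parameter guess
print(params)  # estimated (b0, b1), close to (2.0, 0.8)
```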
Classification in Machine Learning:

• Classification in ML and statistics is a supervised learning approach in which the computer program learns from the data given to it and makes new observations or classifications.

• A classification model attempts to draw some conclusion from observed values.

• Given one or more inputs, a classification model will try to predict the value of one or more outcomes.

• A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.
Classification in Machine Learning:

For example,
• when filtering emails: “spam” or “not spam”;
• when looking at transaction data: “fraudulent” or “authorized”.

• In short, classification either predicts categorical class labels or classifies data (constructs a model) based on the training set and the values (class labels) in the classifying attributes, and uses it in classifying new data.

• There are a number of classification models.

• Classification models include logistic regression, decision tree, random forest, gradient-boosted tree, multilayer perceptron, one-vs-rest, and Naive Bayes.
Classification in Machine Learning:

Quick Check:

Which of the following is/are classification problem(s)?

• Predicting the gender of a person by his/her handwriting style
• Predicting house price based on area
• Predicting whether the monsoon will be normal next year
• Predicting the number of copies of a music album that will be sold next month

Solution: Predicting the gender of a person, and predicting whether the monsoon will be normal next year. The other two are regression.
Classification Models

Naive Bayes:

• Naive Bayes is a classification algorithm that assumes that the predictors in a dataset are independent.

• This means that it assumes the features are unrelated to each other.

• For example, if given a banana, the classifier will see that the fruit is of yellow color, oblong shaped, and long and tapered. All of these features contribute independently to the probability of it being a banana and are not dependent on each other.

Naive Bayes is based on Bayes’ theorem, which is given as:

P(A | B) = P(B | A) · P(A) / P(B)

where
P(A | B) = how often A happens given that B happens,
P(A) = how likely A is to happen,
P(B) = how likely B is to happen,
P(B | A) = how often B happens given that A happens.
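A small hedged sketch of the banana example (not from the slides; the features and all numbers are invented for illustration), using scikit-learn's GaussianNB, which treats each feature as contributing independently:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Invented fruit features: [length_cm, width_cm, yellowness in 0..1]
X = np.array([[18.0, 3.0, 0.9], [17.0, 3.5, 0.8],
              [8.0, 7.0, 0.3], [7.0, 8.0, 0.2]])
y = np.array(["banana", "banana", "apple", "apple"])

clf = GaussianNB().fit(X, y)             # each feature contributes independently
print(clf.predict([[16.0, 4.0, 0.85]]))  # -> ['banana']
```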
Decision Trees:

• A Decision Tree is an algorithm that is used to visually represent decision making.

• A Decision Tree can be made by asking a yes/no question and splitting the answer to lead to another decision.

• The question is at the node, and the resulting decisions are placed below at the leaves.

• The tree in the example (figure not reproduced here) is used to decide if we can play tennis.

• In that example, depending on the weather conditions, the humidity and the wind, we can systematically decide if we should play tennis or not.

• Knowing this, we can make a tree which has the features at the nodes and the resulting classes at the leaves.

❏ Data is continuously split according to a certain parameter.

❏ Two entities:
Decision nodes ~ where the dataset is split
Leaves ~ where decisions are taken
Attribute-based representations:

• Examples are described by attribute values (Boolean, discrete, continuous).

• E.g., situations where I will/won't wait for a table at a restaurant (example table not reproduced here).

• Classification of examples is positive (T) or negative (F).

• A decision tree is one possible representation for hypotheses; e.g., the "true" tree for deciding whether to wait (figure not reproduced here).
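A hedged sketch of the play-tennis idea (not from the slides; the numeric encoding and the labels below are hypothetical), using scikit-learn's DecisionTreeClassifier; export_text prints the learned splits at the nodes and the classes at the leaves:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoding of play-tennis-style data:
# outlook: 0=sunny, 1=overcast, 2=rain; humidity: 0=normal, 1=high; wind: 0=weak, 1=strong
X = np.array([[0, 1, 0], [0, 1, 1], [1, 1, 0], [2, 1, 0],
              [2, 0, 0], [2, 0, 1], [1, 0, 1], [0, 0, 0]])
y = np.array([0, 0, 1, 1, 1, 0, 1, 1])  # 1 = play, 0 = don't play

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["outlook", "humidity", "wind"]))
```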
Dimensionality Reduction in ML:

• The number of input variables or features for a dataset is referred to as its dimensionality.

• Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset.

• More input features often make a predictive modeling task more challenging to model, a difficulty more generally referred to as the curse of dimensionality.

• High-dimensionality statistics and dimensionality reduction techniques are often used for data visualization. Nevertheless, these techniques can be used in applied machine learning to simplify a classification or regression dataset in order to better fit a predictive model.
Why is Dimensionality Reduction so Important?

• In machine learning, to catch useful indicators and obtain a more accurate result, we tend to add as many features as possible at first.

• However, after a certain point, the performance of the model will decrease with the increasing number of features. This phenomenon is often referred to as “The Curse of Dimensionality”.

• The curse of dimensionality occurs because the sample density decreases exponentially with the increase of the dimensionality.

• When we keep adding features without increasing the number of training samples as well, the dimensionality of the feature space grows, and the data in it becomes sparser and sparser.
Why is Dimensionality Reduction so Important? ...continued

• Due to this sparsity, it becomes much easier to find a “perfect” solution for the machine learning model, which very likely leads to overfitting.

• Overfitting happens when the model corresponds too closely to a particular set of data and doesn’t generalize well.

• An overfitted model works too well on the training dataset, so it fails on future data and makes predictions unreliable.

• So, to overcome the curse of dimensionality and avoid overfitting, especially when we have many features and comparatively few training samples, one popular approach is dimensionality reduction.
Dimensionality Reduction:

• Dimensionality reduction is the process of reducing the dimensionality of the feature space by obtaining a set of principal features.

• Dimensionality reduction can be further broken into feature selection and feature extraction.

• Feature selection:
Feature selection tries to select a subset of the original features for use in the machine learning model. In this way, we can remove redundant and irrelevant features without incurring much loss of information.
Dimensionality Reduction: ...continued

Feature extraction:
• Feature extraction is also called feature projection. Whereas feature selection returns a subset of the original features, feature extraction creates new features by projecting the data in the high-dimensional space to a space of fewer dimensions. This approach can also derive informative and non-redundant features.

• We can use feature selection and feature extraction together. Feature extraction may be performed on selected elements containing relevant information rather than on the original features.

• In addition to avoiding overfitting and redundancy, dimensionality reduction also leads to better human interpretation and lower computational cost through simplification of models.
Example of dimensionality reduction:

• Consider a simple e-mail classification problem, where we need to classify whether the e-mail is spam or not.

• This can involve a large number of features, such as whether or not the e-mail has a generic title, the content of the e-mail, whether the e-mail uses a template, etc.

• However, some of these features may overlap.

• In another situation, a classification problem that relies on both humidity and rainfall can be collapsed into just one underlying feature, since both of the aforementioned are correlated to a high degree.
Example of dimensionality reduction: ...continued

• Hence, we can reduce the number of features in such problems.

• A 3-D classification problem can be hard to visualize, whereas a 2-D one can be mapped to a simple 2-dimensional space, and a 1-D problem to a simple line.

• The figure referenced here (not reproduced) illustrates this concept, where a 3-D feature space is split into two 1-D feature spaces, and later, if found to be correlated, the number of features can be reduced even further.
There are two components of dimensionality reduction:

Feature selection:
In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem. It usually involves three ways:
• Filter
• Wrapper
• Embedded

Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a space with a smaller number of dimensions.

Methods of Dimensionality Reduction

The various methods used for dimensionality reduction include:
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Generalized Discriminant Analysis (GDA)
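As a minimal sketch of the first of these methods (not from the slides, using scikit-learn's PCA on the built-in Iris data), feature extraction projects the 4 original features onto 2 principal components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)     # 4 original features

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)             # project onto the 2 principal components
print(X2.shape)                       # (150, 2)
print(pca.explained_variance_ratio_)  # variance retained by each component
```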
Overfitting and Techniques to Prevent Overfitting:

Overfitting occurs when the model performs well on training data but generalizes poorly to unseen data.

Eight simple approaches to alleviate overfitting:

1. Hold-out (data)

• Rather than using all of our data for training, we can simply split our dataset into two sets: training and testing.

• A common split ratio is 80% for training and 20% for testing.

• We train our model until it performs well not only on the training set but also on the testing set.

• This indicates good generalization capability, since the testing set represents unseen data that were not used for training.

• However, this approach requires a sufficiently large dataset to train on even after splitting. (A sketch follows.)
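A hedged sketch of the hold-out idea (not in the slides; the dataset and the unpruned tree are arbitrary choices): the gap between training and testing scores is the overfitting signal.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 80/20 hold-out split: the test set stands in for unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # ~1.0 for an unpruned tree
print("test accuracy :", model.score(X_test, y_test))    # lower; the gap signals overfitting
```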
Eight simple approaches to alleviate overfitting: ...continued

2. Cross-validation (data)

• We can split our dataset into k groups (k-fold cross-validation).

• We let one of the groups be the testing set (see the hold-out explanation) and the others the training set, and repeat this process until each individual group has been used as the testing set (i.e., k repeats).

• Unlike hold-out, cross-validation allows all data to be eventually used for training, but it is also more computationally expensive than hold-out. (A sketch follows.)
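A short sketch of k-fold cross-validation (not in the slides; k = 5 and the model are arbitrary), where each fold serves as the test set exactly once:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: each fold is held out once while the rest train the model
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores, scores.mean())
```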
Eight simple approaches to alleviate overfitting: ...continued

3. Data augmentation (data)

• A larger dataset would reduce overfitting.

• If we cannot gather more data and are constrained to the data we have in our current dataset, we can apply data augmentation to artificially increase the size of our dataset.

• For example, if we are training for an image classification task, we can perform various image transformations on our image dataset (e.g., flipping, rotating, rescaling, shifting), as in the sketch below.
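A tiny NumPy sketch of such transformations (not in the slides; the 4x4 array is a made-up stand-in for an image):

```python
import numpy as np

img = np.arange(16.0).reshape(4, 4)      # stand-in for a small grayscale image

flipped = np.fliplr(img)                 # horizontal flip
rotated = np.rot90(img)                  # 90-degree rotation
shifted = np.roll(img, shift=1, axis=1)  # crude horizontal shift

augmented = [img, flipped, rotated, shifted]  # 4x the original "dataset"
```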
Eight simple approaches to alleviate overfitting: ...continued

4. Feature selection (data)

• If we have only a limited number of training samples, each with a large number of features, we should select only the most important features for training, so that our model doesn’t need to learn from so many features and eventually overfit.

• We can simply test out different features, train individual models for these features, and evaluate generalization capabilities, or use one of the various widely used feature selection methods (see the sketch below).
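One such widely used method, sketched here with scikit-learn's SelectKBest (not in the slides; the dataset and k = 2 are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep only the 2 features with the highest ANOVA F-score against the labels
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(selector.get_support())      # boolean mask of the selected features
X_reduced = selector.transform(X)  # dataset restricted to those features
print(X_reduced.shape)
```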
Eight simple approaches to alleviate overfitting: ...continued

5. L1 / L2 regularization (learning algorithm)

• Regularization is a technique to constrain our network from learning a model that is too complex, which may therefore overfit.

• In L1 or L2 regularization, we add a penalty term to the cost function to push the estimated coefficients towards zero (so that they do not take extreme values).

• L2 regularization allows weights to decay towards zero but not to zero, while L1 regularization allows weights to decay all the way to zero. (A sketch follows.)
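A hedged sketch of that contrast (not in the slides; the synthetic data and alpha values are arbitrary), using Ridge (L2) and Lasso (L1) from scikit-learn:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=50)  # only feature 0 actually matters

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all weights towards zero
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: drives irrelevant weights exactly to zero
print(np.round(ridge.coef_, 2))     # small but non-zero weights on the noise features
print(np.round(lasso.coef_, 2))     # mostly exact zeros apart from feature 0
```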

6. Remove layers / number of units per layer (model)

• As mentioned under L1 / L2 regularization, an over-complex model is more likely to overfit.

Eight simple approaches to alleviate overfitting: ...continued

• Therefore, we can directly reduce the model’s complexity by removing layers and reducing the size of our model.

• We may further reduce complexity by decreasing the number of neurons in the fully-connected layers.

• We should have a model whose complexity sufficiently balances underfitting and overfitting for our task.
Eight simple approaches to alleviate overfitting: ...continued

7. Dropout (model)

• By applying dropout, which is a form of regularization, to our layers, we ignore a subset of the units of our network with a set probability.

• Using dropout, we can reduce interdependent learning among units, which may have led to overfitting.

• However, with dropout, we need more epochs for our model to converge. (A sketch follows.)
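A minimal NumPy sketch of the mechanism (not in the slides; this is the standard "inverted dropout" formulation, with made-up activations):

```python
import numpy as np

def dropout(activations, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale survivors."""
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones((2, 8))                # stand-in for a layer's activations
print(dropout(h, p=0.5, rng=rng))  # applied during training only, not at test time
```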
Eight simple approaches to alleviate overfitting: ...continued

8. Early stopping (model)

• We can first train our model for an arbitrarily large number of epochs and plot the validation loss graph (e.g., using hold-out).

• Once the validation loss begins to degrade (e.g., it stops decreasing and instead begins increasing), we stop the training and save the current model.

• We can implement this either by monitoring the loss graph or by setting an early stopping trigger.

The saved model would be the optimal model for generalization among the different training epoch values. (A sketch follows.)
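A hedged sketch of an early stopping trigger (not in the slides; the network size, patience, and dataset are arbitrary), using scikit-learn's MLPClassifier, which holds out an internal validation split and stops once its score stops improving:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)

# Training halts once the internal validation score stops improving
clf = MLPClassifier(hidden_layer_sizes=(32,), early_stopping=True,
                    validation_fraction=0.1, n_iter_no_change=10,
                    max_iter=500, random_state=0).fit(X, y)
print(clf.n_iter_)  # epochs actually run before the early-stopping trigger fired
```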
Thank you!
Assignment No. 1 [20 M]

1. Explain the Decision Tree algorithm with an example. [6M]

2. What do you mean by linear regression? With a suitable example, describe how linear regression is used to predict the output for a test example/input sample. What is non-linear regression? [8M]
OR
2. Describe parametric and non-parametric learning with their advantages and limitations. State any four applications where machine learning is used. [8M]

3. What is overfitting in machine learning? What are the different methods to overcome the overfitting problem? Describe in brief. [6M]

Email a scanned handwritten copy of the assignment to: hodentc@moderncoe.edu.in

Write the email subject as: ML_2020-21_Assignment1_First Name_Surname_XXXX(roll no.)

You might also like