
Chapter 6 Supervised Learning

6a Classification
The classification algorithm is a supervised learning technique used to identify
the category of new observations on the basis of training data. In
classification, a program learns from a given dataset of observations and then
assigns each new observation to one of a number of classes or groups, such as Yes or No,
0 or 1, Spam or Not Spam, cat or dog. Classes are also called targets, labels, or
categories.

Classification takes labeled input data, meaning each input comes with a corresponding
output. A classification algorithm learns a mapping from the input
variable (x) to a discrete output (y).
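
A minimal sketch of this setup, assuming scikit-learn is installed; the labeled <income, age> data points are hypothetical.

from sklearn.tree import DecisionTreeClassifier

# Labeled input data: each row is an observation <income, age>,
# each label is the corresponding output class (+1 = buys, -1 = does not buy).
X = [[30000, 25], [80000, 45], [20000, 30], [90000, 50]]
y = [-1, +1, -1, +1]

clf = DecisionTreeClassifier()
clf.fit(X, y)                       # learn the mapping from the labeled data
print(clf.predict([[60000, 40]]))   # classify a new observation -> [1]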

Consider the case of a mall owner selling computers. The customers span all ages
and incomes, as shown below. The goal is to predict whether a customer will buy a
computer or not. Each data point has two attributes: age and income.

[Figure: four scatter plots of Age vs. Income, where 'o' marks customers who do not buy and 'x' marks customers who buy.
Panel 1: a decision boundary with only an intercept – this is simple.
Panel 2: a boundary defined by an intercept and a slope – better performance.
Panel 3: a quadratic boundary with three parameters a, b and c – much better performance, but a more complex classifier.
Panel 4: a highly irregular boundary – not right, too complex, too many parameters; it may be fitting a glitch in the data.]
The goal is to get a mapping from the attributes (income and age) to the class label.
There are many ways to do this – the natural thing is to fit a curve: a complex
quadratic equation, or a spline (a special function defined piecewise by polynomials,
which joins curves together to make a continuous but irregular curve).
So there is a tradeoff between complexity and accuracy.
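
The tradeoff can be seen by training classifiers of increasing polynomial degree and comparing accuracy on training data versus held-out data. This is a sketch under assumptions, using scikit-learn and synthetic buy/not-buy data generated from a hypothetical rule with some label noise.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform([20000, 18], [100000, 70], size=(200, 2))  # <income, age>
y = np.where(X[:, 0] + 1500 * X[:, 1] > 100000, 1, -1)     # hypothetical true rule
y[rng.random(200) < 0.1] *= -1                             # add label noise (glitches)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 2, 5):  # more parameters = a more complex decision boundary
    model = make_pipeline(StandardScaler(),
                          PolynomialFeatures(degree),
                          LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(degree,
          model.score(X_train, y_train),   # accuracy on seen data
          model.score(X_val, y_val))       # accuracy on unseen data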
The Process
Each training example pairs an input vector with a label, for instance:
X1 = <30000, 25>, Y1 = o = -1
X2 = <80000, 45>, Y2 = x = +1
[Diagram: the training set of labeled pairs (X1, Y1), (X2, Y2), ... is fed to a training algorithm, which outputs a classifier; the classifier is tuned against a validation set and finally evaluated on a held-out test set.]
Normalize the values, for example to:
X1 = <0.15, 0.25>, Y1 = o = -1
X2 = <0.4, 0.45>, Y2 = x = +1
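
A minimal sketch of this normalization step, using NumPy. The divisors (a maximum income of 200000 and a maximum age of 100) are assumptions chosen so that the outputs match the example above.

import numpy as np

# Raw examples: <income, age>
X = np.array([[30000, 25],
              [80000, 45]], dtype=float)

upper = np.array([200000.0, 100.0])  # assumed maximum income and age
X_norm = X / upper                   # scale each attribute into [0, 1]
print(X_norm)                        # [[0.15 0.25]
                                     #  [0.4  0.45]]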
Inductive bias refers to a set of (explicit or implicit) assumptions made by a learning
algorithm in order to perform induction, that is, to generalize a finite set of
observations (training data) into a general model of the domain. The inductive bias
(also known as learning bias) of a learning algorithm is the set of assumptions that
the learner uses to predict outputs for inputs that it has not encountered.

[Diagram: the agent receives an input X and produces an output Yhat; during testing, Yhat is compared with the target Y to compute the error.]
Applications
 Credit card fraud detection – valid transaction or not
 Sentiment analysis – opinion mining
 Medical diagnosis – risk analysis
 Churn prediction – employee loyalty
Models are based on Artificial Neural Networks (ANN), Support Vector Machines (SVM),
Decision Trees, and Bayesian Networks.
6b Regression (Prediction)
The output is no longer discrete, but continuous. A regression problem is when the
output variable is a real or continuous value, such as “salary” or “weight”.
Example: Temperature at different times of day and night
[Figure: three plots of Temperature vs. Time of day, showing the same data points ('x') spanning day and night, fitted by curves of increasing complexity; the final curve passes through every point.]
The last curve fits all the noise in the data – overfitting, which has to be avoided.
Overfitting occurs when our machine learning model tries to cover all the data
points, or more than the required data points, present in the given dataset. Because of
this, the model starts capturing the noise and inaccurate values present in the dataset,
and all these factors reduce the efficiency and accuracy of the model.
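
The effect can be reproduced with a short sketch, assuming NumPy: fitting the temperature data with a high-degree polynomial drives the error on the seen points toward zero while predictions between points get worse. The data here are synthetic.

import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 16)                                   # time of day, scaled to [0, 1]
temp = 20 + 8 * np.sin(np.pi * t) + rng.normal(0, 0.5, t.size)  # noisy temperatures

t_new = np.linspace(0.0, 1.0, 101)                              # unseen times
temp_new = 20 + 8 * np.sin(np.pi * t_new)                       # true (noise-free) temperatures

for degree in (2, 12):
    coeffs = np.polyfit(t, temp, degree)                        # fit a degree-d polynomial
    seen = np.mean((np.polyval(coeffs, t) - temp) ** 2)
    unseen = np.mean((np.polyval(coeffs, t_new) - temp_new) ** 2)
    print(degree, round(seen, 3), round(unseen, 3))  # high degree: typically low seen, higher unseen error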
Many different models can be used for regression. The simplest is linear
regression, which tries to fit the data with the best hyperplane passing through the
points.
Types of Regression

Linear Regression

Linear regression is one of the easiest and most popular Machine Learning
algorithms. It is a statistical method that is used for predictive analysis. Linear
regression makes predictions for continuous/real or numeric variables such as sales,
salary, age, product price, etc.

A linear regression algorithm models a linear relationship between a dependent variable (y) and
one or more independent variables (x), hence the name linear regression. Because
the relationship is linear, the model describes how the value of
the dependent variable changes with the value of the independent
variable.

The linear regression model provides a sloped straight line representing the
relationship between the variables.
Mathematically, we can represent a linear regression by the (hypothesis) function

y = a0 + a1x, where
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)

Finding the best fit line:

When working with linear regression, our main goal is to find the best fit line, which
means the error between predicted values and actual values should be minimized.
The best fit line will have the least error.

Different values for the weights or coefficients of the line (a0, a1) give different
regression lines, so we need to calculate the best values for a0 and a1 to find the best fit
line. To calculate these we use a cost function.

Cost function-

o Different values for the weights or coefficients of the line (a0, a1) give different
regression lines, and the cost function is used to estimate the values of the
coefficients for the best fit line.
o The cost function optimizes the regression coefficients or weights. It measures
how well a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function,
which maps the input variable to the output variable. This mapping function is
also known as the hypothesis function.

For linear regression, we use the Mean Squared Error (MSE) cost function, which
is the average of the squared errors between the predicted values and the actual
values. It can be written as:

MSE = (1/N) * Σi (Yi − (a1xi + a0))², summed over i = 1, ..., N

Here,
N = total number of observations
Yi = actual value
(a1xi + a0) = predicted value

We want to minimize this MSE. It is simple with sufficient data points.
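
A minimal sketch of this cost and its minimization for a single input variable, using NumPy; the data points are hypothetical. With enough data points, the least-squares formulas below give the a0 and a1 that minimize the MSE directly.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # actual values

def mse(a0, a1):
    # average squared error between predicted (a1*x + a0) and actual values
    return np.mean((y - (a1 * x + a0)) ** 2)

# Closed-form least-squares estimates minimize the MSE:
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()
print(a0, a1, mse(a0, a1))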


With many dimensions, we need to avoid overfitting, which calls for regularization.
Regularization is a technique used to reduce error by fitting the function
appropriately on the given training set and avoiding overfitting.
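
One common form is L2 regularization (ridge regression), sketched below with scikit-learn on synthetic data: the penalty shrinks the coefficients, which helps when there are many dimensions relative to the number of points. The data and the alpha value are assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 15))              # few points, many dimensions
y = X[:, 0] + rng.normal(0, 0.1, size=20)  # only the first feature truly matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)         # alpha controls the penalty strength
print(np.abs(plain.coef_).sum())           # unregularized coefficients
print(np.abs(ridge.coef_).sum())           # typically smaller under the L2 penalty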
Applications
 Time series prediction – rainfall in a certain region, spend on voice calls
 Trend analysis – linear or exponential, not actual values
 Risk factor analysis – factors contributing most to the output
Examples:
1. Marks scored by students based on the number of hours studied
(ideally) – here, marks scored in exams are dependent and the number of
hours studied is independent.
2. Predicting crop yields based on the amount of rainfall – yield is a
dependent variable while the measure of precipitation is an independent
variable.
3. Predicting the salary of a person based on years of
experience – experience becomes the independent variable while salary is the
dependent variable.
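
As an illustration of example 3, this is a minimal sketch using NumPy; the salary figures are hypothetical.

import numpy as np

experience = np.array([1, 3, 5, 7, 10], dtype=float)   # independent variable (years)
salary = np.array([40, 55, 72, 88, 110], dtype=float)  # dependent variable ($1000s)

a1, a0 = np.polyfit(experience, salary, 1)  # fit salary = a0 + a1 * experience
print(a0 + a1 * 6)                          # predicted salary for 6 years of experience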
