
Artificial Intelligence

LEARNING: REGRESSION VS CLASSIFICATION

Week-15 Lecture-30
Agenda

 Regression

   Multiple Linear Regression

   Polynomial Regression

 Classification

   Logistic Regression

   k-nearest neighbors

   Tree-based methods


Regression
Recap: Linear Regression

ŷ = b0 + b1x

  y  : dependent variable
  x  : independent variable
  b0 : constant (intercept)
  b1 : coefficient (slope)

(Figure: scatter plot of the dependent variable y against the independent variable x, with the fitted regression line.)
Example: Linear Regression

(Figure: scatter plot of the Salary data.)
Example: Linear Regression (Ordinary Least Squares)

Ordinary least squares chooses b0 and b1 to minimize the sum of squared residuals, Σi (yi − ŷi)².
Multiple features (variables)

• Example: Price of a House

  Size (feet²)    Price ($1000)
  2104            460
  1416            232
  1534            315
  852             178
  …               …

ŷ = b0 + b1x
Multiple features (variables)

  Size (feet²)   Number of   Number of   Age of home   Price
                 bedrooms    floors      (years)       ($1000)
  2104           5           1           45            460
  1416           3           2           40            232
  1534           3           2           30            315
  852            2           1           36            178
  …              …           …           …             …

Notation:
n       = number of features
x^(i)   = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
Hypothesis:

Previously:
ŷ = b0 + b1x   (no more)

Now:

ŷ = b0 + b1x1 + b2x2 + b3x3 + b4x4


11

For convenience of notation, define x0 = 1, so that

ŷ = b0x0 + b1x1 + b2x2 + … + bnxn = bᵀx

This is multivariate linear regression.
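As a minimal sketch of this vectorized hypothesis in Python (assuming NumPy; the coefficient and feature values below are made up for illustration):

    import numpy as np

    # Hypothetical coefficients b = [b0, b1, b2, b3, b4] and one example's
    # features x = [x0 = 1, size, bedrooms, floors, age]; values are made up.
    b = np.array([80.0, 0.1, 10.0, 3.0, -2.0])
    x = np.array([1.0, 2104.0, 5.0, 1.0, 45.0])

    # With x0 = 1 absorbed into x, the hypothesis is a single dot product.
    y_hat = b @ x
    print(y_hat)  # predicted price in $1000s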


Building a Multivariate Linear Regression Model

 There are five methods for building such models:

 All-in

 Backward Elimination (a sketch follows below)

 Forward Selection

 Bidirectional Elimination

 Score Comparison
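A minimal sketch of Backward Elimination, assuming statsmodels and a numeric NumPy feature matrix X (the 0.05 significance level is the conventional default, not one fixed by the slides):

    import numpy as np
    import statsmodels.api as sm

    def backward_elimination(X, y, sl=0.05):
        """Drop the predictor with the highest p-value, one at a time,
        until every remaining p-value is at or below sl."""
        X = sm.add_constant(X)            # prepend the intercept column (x0 = 1)
        cols = list(range(X.shape[1]))    # indices of columns still in the model
        while len(cols) > 1:
            model = sm.OLS(y, X[:, cols]).fit()
            worst = int(np.argmax(model.pvalues))
            if model.pvalues[worst] <= sl:
                return model, cols        # every remaining predictor is significant
            del cols[worst]               # eliminate the least significant one
        return sm.OLS(y, X[:, cols]).fit(), cols

Forward Selection runs the same idea in reverse (start with no predictors and add the most significant one at each step), and Bidirectional Elimination alternates the two.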
Housing prices prediction

y = b0 + b1 × width + b2 × length,  where x1 = width and x2 = length

Instead of using width and length separately, we can define a new feature,

area = width × length

and fit  y = b0 + b1 × area
Polynomial regression

(Figure: price (y) vs. size (x), comparing a quadratic fit, b0 + b1x + b2x², with a cubic fit, b0 + b1x + b2x² + b3x³.)

ŷ = b0 + b1x + b2x² + b3x³
  = b0 + b1(size) + b2(size)² + b3(size)³
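A minimal sketch of fitting such a cubic model with scikit-learn, where the polynomial terms are generated as new features (the data values below are made up):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    # Made-up training data: house size (feet^2) vs. price ($1000s).
    X = np.array([[852], [1416], [1534], [2104]])
    y = np.array([178, 232, 315, 460])

    # Expand size into [size, size^2, size^3]; the model is still linear in b.
    poly = PolynomialFeatures(degree=3, include_bias=False)
    X_poly = poly.fit_transform(X)

    reg = LinearRegression().fit(X_poly, y)
    print(reg.predict(poly.transform([[1800]])))  # price estimate for 1800 ft^2

Note that this is still linear regression in the coefficients b; only the features are polynomial.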
Other Types of Regression

Support Vector Regression

Decision Tree Regression

Random Forest Regression

Artificial neural network regression

Deep learning based regression


Example: Linear Regression (Implementation)

Importing Libraries

Data Import

Train and Test Split


Example: Simple Linear Regression (Implementation)

Fitting a linear regressor to the Salary data

Plotting the Results
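Continuing the sketch above with the same assumed variables:

    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    # Fitting a linear regressor to the training data
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)

    # Plotting the results: training points and the fitted line
    plt.scatter(X_train, y_train, color='red')
    plt.plot(X_train, regressor.predict(X_train), color='blue')
    plt.title('Salary vs Experience (training set)')
    plt.xlabel('Years of experience')
    plt.ylabel('Salary')
    plt.show()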


Example: Simple Linear Regression (Results)

(Figures: the fitted line plotted over the training-set and test-set points.)
Classification
Classification

 A classification problem is one where the output variable is a category,
e.g., "red" or "blue", or "disease" or "no disease".

 A classification model attempts to draw some conclusion
from observed values.

 Given one or more inputs, a classification model will try to
predict the value of one or more outcomes.

Which of the following is/are classification problem(s)?

 Predicting the gender of a person by his/her handwriting style

 Predicting house price based on area

 Predicting whether the monsoon will be normal next year

 Predicting the number of copies of a music album that will be sold next month

(Image source: Kaggle)
Classification:
Logistic Regression
Logistic Regression: Overview

 Definition: a model for predicting one variable from other variable(s).

 Variables: the independent variable(s) (IVs) are continuous or categorical;
the dependent variable (DV) is dichotomous (a contrast between two outcomes).

 Relationship: prediction of group membership.

 Example: can we predict whether students pass from GPA, etc.?

 Assumptions: absence of multicollinearity (linearity and normality are not required).


Comparison to Linear Regression

 Because the outcome is dichotomous, we cannot use linear
regression: the relationship is not linear.

 With a dichotomous outcome, we are now talking
about probabilities (of 0 or 1).

 So logistic regression is about predicting the probability of
the outcome occurring.
Logistic Regression basics

• Logistic regression is based on the "odds ratio",

• which is the probability of an event divided by the probability of the non-event.

• For example, if Exp(b) = 2, then a one-unit change in the predictor would make the event twice as likely (.67/.33) to occur.

Exp(b) = (odds after a unit change in the predictor) / (odds before a unit change in the predictor)
Logistic Regression basics

 Single predictor:

P(Y) = 1 / (1 + e^−(b0 + b1X1 + εi))

 Multiple predictors:

P(Y) = 1 / (1 + e^−(b0 + b1X1 + b2X2 + … + bnXn + εi))

 Notice the linear regression equation inside the exponent.

 e is the base of the natural logarithm (about 2.718).
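A minimal numeric sketch of the single-predictor formula (the coefficient values below are made up):

    import math

    def logistic_probability(x, b0=-4.0, b1=1.5):
        # P(Y) = 1 / (1 + e^-(b0 + b1*x)); the coefficients here are
        # illustrative, not fitted values.
        return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

    # The linear part b0 + b1*x can be any real number, but the
    # output is always squashed into the interval (0, 1).
    for x in (0, 2, 4):
        print(x, round(logistic_probability(x), 3))   # 0.018, 0.269, 0.881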
k-nearest neighbor: Intuition
 k-nearest neighbors (KNN) is a simple algorithm that stores all available
cases and classifies new cases based on a similarity measure
(e.g., a distance function).

 KNN has been used in statistical estimation and pattern
recognition as a non-parametric technique since the early 1970s.

Hasan, M.J.; Kim, J.-M. Fault Detection of a Spherical Tank Using a Genetic Algorithm-Based Hybrid Feature Pool and k-Nearest Neighbor Algorithm. Energies 2019, 12, 991.
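A minimal classification sketch with scikit-learn's KNN implementation (the toy points and the choice k = 3 are illustrative):

    from sklearn.neighbors import KNeighborsClassifier

    # Toy 2-D points with two classes (values are made up).
    X_train = [[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [6, 6]]
    y_train = [0, 0, 0, 1, 1, 1]

    # Store the training cases; classify new points by majority vote
    # among the k = 3 nearest neighbors under Euclidean distance.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)

    print(knn.predict([[2, 2], [6, 7]]))   # -> [0 1]
    print(knn.predict_proba([[5, 5]]))     # proportion of neighbor votes per class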
Tree-Based Methods

Decision Tree Complexity

* Slides from Seo Hui (LG Electronics), "Gradient Boosting Model"
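As a minimal sketch of a tree-based classifier, where max_depth caps the tree's complexity (the depth limit and the Iris dataset are illustrative choices):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    # Limiting max_depth controls complexity: deeper trees fit the
    # training data more closely but are more prone to overfitting.
    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X, y)

    print(tree.get_depth(), tree.score(X, y))   # depth and training accuracy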
Implementation: Logistic Regression
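A minimal sketch of such an implementation with scikit-learn, using a synthetic dataset as a stand-in for the lecture's data (every name and parameter below is illustrative):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Synthetic two-feature, two-class data as a stand-in dataset.
    X, y = make_classification(n_samples=400, n_features=2, n_redundant=0,
                               n_informative=2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Feature scaling keeps the features comparable and helps the solver.
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Fit the classifier, then evaluate on both splits.
    clf = LogisticRegression(random_state=0).fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(confusion_matrix(y_test, y_pred))
    print('train acc:', clf.score(X_train, y_train),
          'test acc:', accuracy_score(y_test, y_pred))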
Logistic Regression: Training and Testing Results

(Figures: classifier results on the training and test sets.)
Other Classification Algorithms
Support vector machine
Random Forest
Naïve Bayes
Neural Network (MLP)

References

 Y. Bengio, I. Goodfellow, and A. Courville, Deep Learning, vol. 1. Citeseer, 2017.

 ai.berkeley.edu

 SuperDataScience

 towardsdatascience.com
