
Linear Regression and its nitty-gritty

Instructor
Dr. Hiren Kumar Thakkar

30 July 2019
Public Notice regarding Use of Content & Images

• This document contains content and images obtained by routine Google
and image searches. Some of this content and some of these images may be
under copyright. They are included here for educational and
non-commercial purposes and are considered to be covered by the doctrine
of Fair Use. In any event, they are easily available from Google search
or Google Images.

• It is not feasible to give full scholarly credit to the creators of
this content and these images. We hope they can be satisfied with the
positive role they are playing in the educational process.
Types of Learning

Algorithms
• Supervised learning
  • Prediction
  • Classification (discrete labels), Regression (real values)

• Unsupervised learning
  • Clustering
  • Probability distribution estimation
  • Finding association (in features)
  • Dimension reduction

• Reinforcement learning
  • Decision making (robot, chess machine)

Machine learning structure
• Supervised learning

Machine learning structure
• Unsupervised learning

Training and testing

[Figure: data acquisition draws the training set (observed) from the universal set (unobserved); practical usage corresponds to the testing set (unobserved).]
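The testing set is unobserved by definition, so in practice it is commonly imitated by holding back part of the observed data. The following is a minimal sketch of that idea, not something taken from the slides: the function name `train_test_split`, the 25% test fraction, the fixed seed, and the (weight, size) pairs are all assumptions made for illustration.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)                # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical observed data: (weight, size) pairs.
observed = [(1.0, 1.4), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8),
            (5.0, 5.1), (6.0, 5.7), (7.0, 7.2), (8.0, 7.9)]

train, test = train_test_split(observed)
print(len(train), "training samples,", len(test), "held-out samples")
```

The model is fitted on `train` only; `test` stands in for data the model has never seen.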
Learning techniques
• Supervised learning categories and techniques
  • Linear classifier (numerical functions)
  • Parametric (probabilistic functions)
    • Naïve Bayes, Gaussian discriminant analysis (GDA), hidden Markov models (HMM), probabilistic graphical models
  • Non-parametric (instance-based functions)
    • K-nearest neighbors, kernel regression, kernel density estimation, local regression
  • Non-metric (symbolic functions)
    • Classification and regression tree (CART), decision tree
  • Aggregation
    • Bagging (bootstrap + aggregation), AdaBoost, random forest

Definition

Linear Regression

“It is a linear approximation of a relationship between two or more variables”
Types of Relationships

[Figure: scatter plots of relationship types. Panel titles: Strong Linear Negative, Non Linear, Strong Linear Positive, Non Linear, No relationship, Casual Linear Negative.]
Linear Regression

➢ Use least squares to fit a line to the data

➢ Calculate R²

➢ Calculate a p-value for R² (Adjusted R²)
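The slide names these quantities without defining them. As a hedged reference, the standard textbook definitions can be written as below. The symbols are assumptions introduced here, not notation from the slides: SS_mean and SS_fit are the sums of squared residuals around the mean of y and around the fitted line, n is the number of data points, k is the number of predictors, and p_fit and p_mean are the parameter counts of the fitted line (2) and of the mean-only model (1).

```latex
% Share of the variation around the mean that the fitted line removes
\[
R^2 = \frac{SS_{\text{mean}} - SS_{\text{fit}}}{SS_{\text{mean}}}
\]

% Adjusted R^2 penalises extra predictors (k = 1 for a single straight line)
\[
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
\]

% F statistic used to obtain a p-value for R^2
\[
F = \frac{\left(SS_{\text{mean}} - SS_{\text{fit}}\right) / \left(p_{\text{fit}} - p_{\text{mean}}\right)}
         {SS_{\text{fit}} / \left(n - p_{\text{fit}}\right)}
\]
```

The p-value is the upper-tail probability of F under an F distribution with (p_fit − p_mean, n − p_fit) degrees of freedom.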

Example – Mouse Size Vs Weight

Fitting a regression line

How is the regression line obtained?

First, draw a line through the data.

➢ Second, measure the distance from the line to each data point. Each distance is a residual.
➢ Square the distances.
➢ Add the squared distances.
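The measure, square, and add steps can be sketched as a small function. This is an illustrative sketch under stated assumptions: the "distance" is taken to be the vertical gap between a point and the line (the usual least-squares convention), the name `sum_squared_residuals` is hypothetical, and the four (weight, size) points are made up rather than taken from the mouse example.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up the squared vertical distances between the points and a line."""
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - (slope * x + intercept)  # measure the distance
        total += residual ** 2                  # square it, then add it
    return total


# Hypothetical data, not the values from the slides.
weights = [1.0, 2.0, 3.0, 4.0]
sizes = [1.0, 3.0, 2.0, 4.0]

# A first candidate line: horizontal, drawn through the mean size (2.5).
ssr = sum_squared_residuals(weights, sizes, slope=0.0, intercept=2.5)
print("sum of squared residuals:", ssr)
```

A smaller total means the candidate line sits closer to the data overall.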
Third, rotate the line a little bit

With the new line:

➢ Measure the residuals.
➢ Square the residuals.
➢ Add the squared residuals.

Rotate the line a bit more…

Sum up the squared residuals
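Repeating "rotate, then sum the squared residuals" amounts to scanning candidate slopes and keeping the one with the smallest total. Here is a rough sketch of that search, with several assumptions of its own: the line is rotated about the mean point of the data (the least-squares line always passes through it), slopes are tried from 0 to 2 in steps of 0.01, and the data points are hypothetical.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not the values from the slides.
weights = [1.0, 2.0, 3.0, 4.0]
sizes = [1.0, 3.0, 2.0, 4.0]

# Pivot point for the rotation: the mean of the data.
x_mean = sum(weights) / len(weights)
y_mean = sum(sizes) / len(sizes)

best_slope, best_ssr = None, float("inf")
for step in range(0, 201):                  # slopes 0.00, 0.01, ..., 2.00
    slope = step / 100
    intercept = y_mean - slope * x_mean     # keeps the line on the pivot
    ssr = sum_squared_residuals(weights, sizes, slope, intercept)
    if ssr < best_ssr:                      # keep the best rotation so far
        best_slope, best_ssr = slope, ssr

print("best slope:", best_slope, "with SSR:", round(best_ssr, 6))
```

For this toy data the scan settles on a slope of 0.8. A grid search like this only illustrates the idea; the least-squares slope can also be computed directly without any search.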

Regression line as Y = mx + c

Mathematics of Regression Line
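The formulas on this slide did not survive extraction. As a hedged reconstruction, the standard least-squares derivation for the line Y = mx + c is sketched below. The notation (S for the sum of squared residuals, x̄ and ȳ for the sample means, i for the data-point index) is an assumption and may differ from what the original slide showed.

```latex
% Quantity to minimise: the sum of squared residuals
\[
S(m, c) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + c)\bigr)^2
\]

% Setting the derivative with respect to c to zero gives the intercept
\[
\frac{\partial S}{\partial c} = -2 \sum_{i=1}^{n} \bigl(y_i - m x_i - c\bigr) = 0
\quad\Longrightarrow\quad
c = \bar{y} - m\,\bar{x}
\]

% Setting the derivative with respect to m to zero, and substituting c, gives the slope
\[
\frac{\partial S}{\partial m} = -2 \sum_{i=1}^{n} x_i \bigl(y_i - m x_i - c\bigr) = 0
\quad\Longrightarrow\quad
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]
```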


Slope = 0.78 ≠ 0

Because the slope is nonzero, “mouse weight” is helpful for making a guess about “mouse size”.

Calculate R²: how good is the guess?


News of the day in ML
Birthweight, height can predict infants’ future health: Study

Courtesy: https://www.thehealthsite.com/news/birthweight-height-can-predict-infants-future-health-study-679744/
Coefficient of determination (r²)

“How much variance is explained by the regression line?”
Explanation of r²
0 ≤ r² ≤ 1
r² = 0 → 0% of the variance is explained

r² = 1 → 100% of the variance is explained

Size vs. weight: r² = 0.6 → 60% of the variance is explained

r² = 0.54 → 54% of the variance in “size” can be explained by “weight”.

Size may also depend on age, food intake, gender, ...
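To make "variance explained" concrete, here is a minimal sketch that fits a line with the standard least-squares formulas for slope and intercept and then computes r² as the share of the variation around the mean that the line removes. The function names `fit_line` and `r_squared` and the data are hypothetical; they do not reproduce the 0.6 or 0.54 values quoted above.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variation around the mean explained by the line."""
    y_mean = sum(ys) / len(ys)
    ss_mean = sum((y - y_mean) ** 2 for y in ys)
    ss_fit = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_fit / ss_mean


# Hypothetical data, not the values from the slides.
weights = [1.0, 2.0, 3.0, 4.0]
sizes = [1.0, 3.0, 2.0, 4.0]

m, c = fit_line(weights, sizes)
r2 = r_squared(weights, sizes, m, c)
print(f"slope={m:.2f}, intercept={c:.2f}, r2={r2:.2f}")
```

For this toy data the line explains 64% of the variation in size; the remainder would have to come from factors other than weight.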


Thank you
