You are on page 1of 26

Topic: Multiple Linear Regression

with a Case Study in Python

Course: Applied Statistics


By: 科瓦吉 -214718003
Why Multiple Regression? References
The General Idea Benefits of Multiple Regression

1 2 2 3 3 4 5 6 7 8

Equations
Example of Multiple Regression
Case Study: Student Grade Prediction
The General Idea:
Simple Regression Considers the relation
between a single independent (or explanatory)
variable and a dependent (or response) variable.

Example: Does height have an influence on the


weight of a person?

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 3
The General Idea:
Multiple regression simultaneously considers
the influence of multiple independent (or
explanatory) variables on a dependent (or
response) variable Y.

Example: Do height and gender have


an influence on the weight of a person?

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 4
Example of Multiple Regression
A researcher decides to study students’ performance
in a school over a period of time. He observed that as
the lectures proceed to operate online, the
performance of students started to decline as well.
The parameters for the dependent variable “decrease
in performance” are various independent variables
like “lack of attention, more internet addiction,
neglecting studies” and much more. 

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 5
Why Multiple Regression?
Simple linear regression is limited to just one
independent and one dependent variable. To tackle
this limitation, we use multiple regression and it is
allowing for analyzing more than one independent
variable

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 6
Equations:
We will start the discussion by first taking a look at the
linear regression equation:
Ŷ = bx + a
Where,
Ŷ: dependent variable
x: independent variable
a: y-intercept
b: slope or coefficient

But according to our definition, as the multiple regression


takes several independent variables (x), so for the
equation we will have multiple ‘x’ values too:
Ŷ = b1x1 + b2x2 + … bnxn + a
科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 7
Equations:
Formula to find the slope:

Formula to find the Y-intercept:

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 8
Equations:
Coefficient of Determination (R²):
R-squared, often written R2, is the proportion of the variance in
the response variable that can be explained by the predictor
variables in a linear regression model. The value for R-squared can
range from 0 to 1 where:
• 0 indicates that the response variable cannot be explained by
the predictor variable at all.
• 1 indicates that the response variable can be perfectly
explained without error by the predictor variables.

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 9
Equations:
How to find Coefficient of Determination (R²):

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 10
Equations:
Adjusted R-Squared:
• The adjusted R-squared is a modified version of R-squared that
adjusts for the number of predictors in a regression model.
• It only increases if the newly added predictor improves the model’s
predicting power. Adding independent and irrelevant predictors to a
regression model results in a decrease in the adjusted R-squared.

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 11
Benefits of Multiple Regression Analysis:
• Multiple regression analysis helps us to better study the
various predictor variables at hand.

• It increases reliability by avoiding dependency on just


one variable and having more than one independent
variable to support the event. 

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 12
Case Study: Student Grade Prediction
• The data set is from the UCL Machine Learning Repository.

• This data approach student achievement in mathematics


subjects in secondary education in a Portuguese school. The
data attributes include student grades, demographic, and
social and school-related features.

• 395 entries, 33 attributes

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 13
Case Study: Student Grade Prediction
Resources Used:
• Python 3.7.2
• Jupyter Notebook

Libraries:
• Pandas
• Matplotlib
• Scikit-Learn

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 14
Case Study: Student Grade Prediction

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 15
Case Study: Student Grade Prediction
Correlation Matrix:

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 16
Case Study: Student Grade Prediction
Correlation Matrix:

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 17
Case Study: Student Grade Prediction

Correlation using
Matplotlib

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 18
Case Study: Student Grade Prediction

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 19
Case Study: Student Grade Prediction

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 20
Case Study: Student Grade Prediction

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 21
Case Study: Student Grade Prediction
What we have found?

• The target attribute G3 has a strong correlation with


attributes G2 and G1.
• This occurs because G3 is the final year grade (issued at the
3rd period), while G1 and G2 correspond to the 1st and 2nd-
period grades.
• It is more difficult to predict G3 without G2 and G1.

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 22
Case Study: Student Grade Prediction
Results in Excel:

科瓦吉
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 23
References

• https://datatab.net/tutorial/linear-regression

https://www.sciencedirect.com/topics/social-sciences/multiple-regression#:~:te
xt=Multiple%20regression%20is%20a%20statistical,of%20the%20single%20dep
endent%20value

https://www.voxco.com/blog/multiple-regression-analysis-definition-example-a
nd-equation/

https://www.analyticsvidhya.com/blog/2022/03/multiple-linear-regression-usin
g-python/

• http://faculty.cas.usf.edu/mbrannick/regression/Part3/Reg2.html
• https://www.statology.org/adjusted-r-squared-in-python/
• https://www.statology.org/r-squared-in-python/
自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 24
References

• https://medium.com/swlh/multi-linear-regression-using-python-44bd0d10082
d
• https://www.kaggle.com/datasets/dipam7/student-grade-prediction

自 强 不 息 厚 德 载 物 11/29/2022 T s i nC
g he un at r U
a ln iSv oe ur st iht yU n
o fi v C
e hr si ni tay 25
THANKS FOR
Listening

You might also like