Multiple Linear Regression
(Multiple Regression Analysis)
What is MLR?
Multiple Linear Regression is a statistical
method for estimating the relationship
between a dependent variable and two or
more independent (or predictor) variables.
Purposes:
◦ Prediction
◦ Explanation
◦ Theory building
Y = β0 + β1X1 + β2X2 + … + βnXn + ε
where β0 = y-intercept (constant), β1 … βn = regression coefficients, and ε = error term
Design Requirements
One dependent variable (criterion)
Two or more independent variables
(predictor or explanatory variables).
Sample size: >= 50 (at least 10 times as
many cases as independent variables)
Variables
Dependent variable should be measured on a
continuous scale (interval or ratio variable).
Examples: revision time (hours), intelligence
(IQ score), weight (kg), etc.
Two or more independent variables, which
can be either continuous (interval or ratio variables)
or categorical (ordinal or nominal variables)
Ordinal variables include Likert items
Nominal variables include gender (2 groups:
male and female), ethnicity (3 groups:
Caucasian, African American and Hispanic)
Variations
Total variation in Y can be partitioned into two parts:
◦ Variation predictable from the combination of independent variables
◦ Unpredictable (residual) variation
MLR Model: Basic Assumptions
Independence
Normality
Homoscedasticity
Linearity
No or Little Multicollinearity
Independence
Independence of observations.
Assumption: values of the residuals (errors) are
independent across all observations
Check using the Durbin-Watson statistic
The Durbin-Watson statistic lies in the range
0-4. An acceptable range is 1.50 - 2.50.
If the Durbin-Watson statistic is low (less than
1.50), this indicates the presence of positive
autocorrelation
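(The same check can be run outside SPSS; below is a minimal Python/statsmodels sketch on hypothetical data — the variables and values are made up for illustration.)

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical data: 100 cases, two predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()   # ordinary least squares fit
dw = durbin_watson(model.resid)               # Durbin-Watson statistic on the residuals
print(f"Durbin-Watson = {dw:.2f}")            # roughly 1.50 - 2.50 is acceptable
```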
Normality of Residuals
- Check that the residuals (errors) are
approximately normally distributed
- Two common methods to check
(a) Histogram (with a superimposed normal curve)
(b) a Normal Q-Q Plot.
Residuals should follow a normal distribution
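(A minimal Python sketch of the two checks, refitting the same hypothetical model used in the Durbin-Watson sketch above.)

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Same hypothetical model as in the earlier sketch
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)
model = sm.OLS(y, sm.add_constant(X)).fit()
resid = model.resid

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# (a) Histogram of residuals with a superimposed normal curve
axes[0].hist(resid, bins=20, density=True)
grid = np.linspace(resid.min(), resid.max(), 200)
axes[0].plot(grid, stats.norm.pdf(grid, resid.mean(), resid.std()))
axes[0].set_title("Histogram of residuals")
# (b) Normal Q-Q plot: points should fall close to the diagonal line
stats.probplot(resid, dist="norm", plot=axes[1])
plt.show()
```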
Homoscedasticity
Homoscedasticity, or homogeneity of
variances, is an assumption of equal or
similar variances of residuals across all levels
of the independent variables
If the assumption holds, a plot of the residuals against
the fitted values shows a random scatter of
points without any discernible pattern.
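(A residuals-versus-fitted sketch in Python, again on the same hypothetical model as above.)

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Same hypothetical model as in the earlier sketches
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)
model = sm.OLS(y, sm.add_constant(X)).fit()

# Homoscedasticity check: residuals against fitted values should show a random scatter
plt.scatter(model.fittedvalues, model.resid, alpha=0.7)
plt.axhline(0, color="grey", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```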
Linear relationship
Relationship between dependent variable and
independent variables must be linear
To check this in SPSS:
◦ Open the Plots dialog box: click the Plots button.
◦ Set up partial plots: check the "Produce all partial plots" option.
◦ Click OK to run the regression analysis and generate the plots.
If the relationship is linear, the data points in
each partial plot will be randomly dispersed
around a straight line
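(In Python, statsmodels can draw comparable partial regression plots; a sketch on the same hypothetical model as above — this is not the SPSS procedure itself.)

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Same hypothetical model as in the earlier sketches
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)
model = sm.OLS(y, sm.add_constant(X)).fit()

# One partial regression (added-variable) plot per predictor;
# roughly linear bands of points support the linearity assumption
fig = sm.graphics.plot_partregress_grid(model)
fig.tight_layout()
plt.show()
```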
Multicollinearity
When two or more independent variables are highly
correlated, they explain much of the same information.
The model is then unable to determine which
variable is actually responsible for a change
in the dependent variable
Correlation Matrix: The first step is often to
look at the correlation matrix for all
independent variables. High correlation
coefficients (typically above 0.7 or 0.8)
between two or more predictors indicate
potential multicollinearity.
Multicollinearity
Detect multicollinearity through an
inspection of correlation coefficients and
Tolerance/VIF (Variance Inflation Factor):
Tolerance < 0.1 (VIF > 10): significant
multicollinearity
Tolerance < 0.25 (VIF > 4): possible
multicollinearity
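(Tolerance and VIF can also be computed directly; a minimal Python/statsmodels sketch with hypothetical, somewhat correlated predictors and made-up column names.)

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical, somewhat correlated predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=100)
X_df = pd.DataFrame({"calorie_intake": x1, "people_who_read": x2})  # hypothetical names

X_const = add_constant(X_df)          # VIF should be computed with an intercept included
for i, name in enumerate(X_const.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X_const.values, i)
    # VIF > 10 / Tolerance < 0.1 signals serious multicollinearity
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")
```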
Simple vs. Multiple Regression
Simple regression:
◦ One dependent variable Y predicted from one independent variable X
◦ One regression coefficient
◦ R2: proportion of variation in dependent variable Y predictable from X
Multiple regression:
◦ One dependent variable Y predicted from a set of independent variables (X1, X2, …, Xk)
◦ One regression coefficient for each independent variable
◦ R2: proportion of variation in dependent variable Y predictable by the set of independent variables (X's)
MLR Equation
Ypred = a + b1X1 + b2X2 + … + bnXn
Ypred = the dependent variable, or the variable to be predicted
X = the independent or predictor variables
a = a constant (the intercept on the Y axis: the value of Ypred when all X = 0)
b = weights, or partial regression coefficients: the relative contribution of each IV to the DV when controlling for the effects of the other predictors
MLR Output
R2, adjusted R2, constant, b coefficient, beta, F-
test, t-test
R2 assesses the strength of the overall
relationship between the dependent variable and the set of predictors
The adjusted R2 adjusts for the inflation in R2
caused by the number of variables in the
equation
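(For illustration, the same output quantities can be read off a Python/statsmodels fit on hypothetical data, as in the earlier sketches.)

```python
import numpy as np
import statsmodels.api as sm

# Same hypothetical data as in the earlier sketches
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.summary())                                 # constant, b coefficients with t-tests, overall F-test
print("R2 =", round(results.rsquared, 3))                # strength of the overall relationship
print("Adjusted R2 =", round(results.rsquared_adj, 3))   # adjusted for the number of predictors
```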
B coefficient (Regression Weights)
The regression weights (regression coefficients)
measure the amount of increase or decrease in the
dependent variable for a one-unit difference in the
independent variable
Example: Y = 2 + bX with b = 4, i.e., Y = 2 + 4X. For every one-unit
increase in X, Y increases by 4:
if X = 1, Y = 6
if X = 2, Y = 10
X increases by 1, Y increases by 4
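(The arithmetic of this example, spelled out in a few lines of Python.)

```python
a, b = 2, 4                                # Y = 2 + 4X
for x in (1, 2, 3):
    print(f"X = {x}, Y = {a + b * x}")     # 6, 10, 14: each unit increase in X raises Y by 4
```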
Different Ways of Building
Regression Models
Simultaneous: all independent variables
entered together
Stepwise: independent variables entered
according to some order
◦ By size of their correlation with the dependent variable
◦ In order of significance
Hierarchical: independent variables entered in
stages
F-test
The F-test is used as a general indicator of
the probability that any of the predictor
variables contribute to the variance in the
dependent variable within the population.
The null hypothesis is that the predictors’
weights are all effectively equal to zero.
t-test
t-test is used to test the significance of each
predictor in the equation.
The null hypothesis is that a predictor’s
weight is effectively equal to zero when the
effects of the other predictors are taken into
account.
That is, the predictor does not contribute to the variance in
the dependent variable within the population.
Example
Examine the factors that predict the length of
hospitalization following spinal surgery in children
(dependent continuous variable).
The available variables in the dataset are
hematocrit, estimated blood loss, cell saver,
operating time, age at surgery, and packed red
blood cells.
Dependent and independent variables are
measured on continuous scales
◦ Select appropriate variables (theory-based and statistical
approach), and determine the effect of estimated blood loss
while controlling for hematocrit, packed red blood cells, age
at surgery, cell saver, and operating time (duration of surgery).
SPSS: 1) Analyze, 2) Regression, 3) Linear
SPSS screen and output (screenshots; the key tables are reproduced in the following slides)
Interpreting Your SPSS Multiple
Regression Output
First let’s look at the zero-order (pairwise) correlations
between Average Female Life Expectancy (Y), Daily
Calorie Intake (X1) and People who Read (X2). Note that
these are .776 for Y with X1, .869 for Y with X2, and .682
for X1 with X2
Correlations (Pearson r; N = 74 for all variables; all correlations significant at p < .001, 1-tailed)

                                   Average female      Daily calorie   People who
                                   life expectancy     intake          read (%)
Average female life expectancy     1.000               .776            .869
Daily calorie intake               .776                1.000           .682
People who read (%)                .869                .682            1.000

(r YX1 = .776, r YX2 = .869, r X1X2 = .682)
Examining the Regression Weights
Coefficients (dependent variable: Average female life expectancy)

                        B        Std. Error   Beta    t       Sig.   95% CI for B      Zero-order   Partial   Part   Tolerance   VIF
(Constant)              25.838   2.882                8.964   .000   20.090 - 31.585
People who read (%)     .315     .034         .636    9.202   .000   .247 - .383       .869         .738      .465   .535        1.868
Daily calorie intake    .007     .001         .342    4.949   .000   .004 - .010       .776         .506      .250   .535        1.868
Regression of female life expectancy on daily calorie intake and
percentage of people who read.
The standardized beta for daily calorie intake is .342; the beta for people
who read is much larger, .636.
For every one standard deviation increase in the percentage of people who read,
Y (female life expectancy) is predicted to increase by .636 standard deviations.
Both beta coefficients are significant at p < .001
R, R Square, and the SEE
Model Summary (predictors: (Constant), People who read (%), Daily calorie intake)

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .905   .818       .813                4.948                        .818              159.922    2     71    .000
R is .905, which is a very high correlation.
R2 indicates that 81.8% of the variation in female life
expectancy is explained by the two predictors
F -Test for the Significance of the
Regression Equation
ANOVA (dependent variable: Average female life expectancy; predictors: (Constant), People who read (%), Daily calorie intake)

Model        Sum of Squares   df   Mean Square   F         Sig.
Regression   7829.451         2    3914.726      159.922   .000
Residual     1738.008         71   24.479
Total        9567.459         73
Unstandardized equation: Y = 25.838 + .007 X1 + .315 X2 (standardized form: ZY = .342 ZX1 + .636 ZX2), where X1 = daily calorie intake and X2 = percentage of people who read.
The F value is very large and significant at p < .001, indicating that this
linear regression model provides a better fit to the data than a
model that contains no independent variables.
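(As a check, F and R2 can be recomputed from the sums of squares reported in the ANOVA table above — plain arithmetic, nothing assumed beyond the table values.)

```python
ss_regression, df_regression = 7829.451, 2
ss_residual, df_residual = 1738.008, 71

ms_regression = ss_regression / df_regression               # 3914.726
ms_residual = ss_residual / df_residual                     # about 24.479
f_stat = ms_regression / ms_residual                        # about 159.92
r_squared = ss_regression / (ss_regression + ss_residual)   # about .818
print(f"F = {f_stat:.3f}, R^2 = {r_squared:.3f}")
```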
Multicollinearity, cont’d
As a rule of thumb, bivariate zero-order
correlations between predictors should not
exceed .80
The best prediction occurs when the predictors are
moderately independent of each other, but each is
highly correlated with the dependent (criterion)
variable Y
Multicollinearity Issues in our Current
SPSS Problem
The correlation between Daily Calorie Intake (X1) and People who Read (X2) is .682. This is a
fairly high correlation for two predictors that are to be interpreted
independently.
Note also that the correlation of average female life expectancy with the
percentage of people who read is quite high, .869 (see the Correlations table shown earlier).
Multicollinearity Issues in our Current
SPSS Problem, cont’d
In the case of our two predictors,
there is some indication of
multicollinearity but not enough to
throw out one of the variables
Thank you