
# Multiple Regression

## Associate Professor Prapon Sahapattana, Ph.D.

GSPA, NIDA
## Topics Covered

- Concepts of simple regression
- Multiple linear regression
- Assumptions of multiple regression
- Questions for analysis in a multiple linear regression model
- Methods to estimate the regression model
- Assessing overall model fit

Charts and examples in these slides come from Charles M. Friel, Ph.D., Criminal Justice Center, Sam Houston State University.
## Concepts of Simple Regression

The formula of a straight line:

Y = a + bX

- Y = dependent variable
- X = independent variable
- a = the intercept, the point at which the line intersects the Y-axis
- b = the slope of the line, the rate of increase or decrease in Y as a function of a unit change in X

When X changes by 1 unit, Y changes by b units.
## An Example of a Straight Line

[Chart: a straight line Y = a + bX]

## An Example of a Bivariate Regression

[Scatterplot with best-fit line; r = +0.94]
## How to Get the Slope (b) and the Constant (a)

The slope of the best-fit line, also called the regression coefficient:

b = [N ΣXY − (ΣX)(ΣY)] / [N ΣX² − (ΣX)²]

The intercept of the best-fit line, also called the regression constant:

a = Ȳ − (b)(X̄)
## The Regression Equation

| Case | Priors (X) | Sentence (Y) | X² | Y² | XY |
|---|---|---|---|---|---|
| A | 2 | 2 | 4 | 4 | 4 |
| B | 3 | 3 | 9 | 9 | 9 |
| C | 0 | 2 | 0 | 4 | 0 |
| D | 4 | 8 | 16 | 64 | 32 |
| E | 5 | 10 | 25 | 100 | 50 |
| F | 1 | 2 | 1 | 4 | 2 |
| G | 6 | 15 | 36 | 225 | 90 |
| H | 3 | 5 | 9 | 25 | 15 |
| I | 7 | 18 | 49 | 324 | 126 |
| J | 5 | 10 | 25 | 100 | 50 |
| Σ | 36 | 75 | 174 | 859 | 378 |

Slope: b = [N ΣXY − (ΣX)(ΣY)] / [N ΣX² − (ΣX)²]

b = [10(378) − (36)(75)] / [10(174) − (36)²] = 1080 / 444 = 2.432

Intercept: a = Ȳ − (b)(X̄)

a = (75 / 10) − 2.432 (36 / 10) = −1.26

Regression equation:

Sentence = −1.26 + 2.432 (prior convictions)
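The arithmetic above can be checked with a short script. This is a sketch using only Python's standard library, with the ten cases from the table:

```python
# Ordinary least squares for one predictor, computed from raw sums.
X = [2, 3, 0, 4, 5, 1, 6, 3, 7, 5]      # prior convictions
Y = [2, 3, 2, 8, 10, 2, 15, 5, 18, 10]  # sentence (years)

N = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))

# b = [N*Sum(XY) - (Sum X)(Sum Y)] / [N*Sum(X^2) - (Sum X)^2]
b = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
# a = Ybar - b * Xbar
a = sum_y / N - b * sum_x / N

print(round(b, 3), round(a, 2))  # 2.432 -1.26
```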
## Residual Sum of Squares (Error)

From the equation sentence = −1.26 + 2.432 (priors):

| Case | Priors (X) | Sentence (Y) | Prediction (Y′) | Error e = (Y′ − Y) | e² |
|---|---|---|---|---|---|
| A | 2 | 2 | +3.604 | +1.604 | 2.573 |
| B | 3 | 3 | +6.036 | +3.036 | 9.217 |
| C | 0 | 2 | −1.260 | −3.260 | 10.628 |
| D | 4 | 8 | +8.468 | +0.468 | 0.219 |
| E | 5 | 10 | +10.900 | +0.900 | 0.810 |
| F | 1 | 2 | +1.172 | −0.828 | 0.686 |
| G | 6 | 15 | +13.332 | −1.668 | 2.782 |
| H | 3 | 5 | +6.036 | +1.036 | 1.073 |
| I | 7 | 18 | +15.764 | −2.236 | 5.000 |
| J | 5 | 10 | +10.900 | +0.900 | 0.810 |

Σ(Y′ − Y) = Σe = 0.0

Σe² = SS_error (residual sum of squares) = 33.8
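As a quick check, the residual column can be reproduced in Python (standard library only); the errors sum to zero and their squares sum to SS_error:

```python
# Residuals e = Y' - Y for sentence = a + b*priors.
X = [2, 3, 0, 4, 5, 1, 6, 3, 7, 5]
Y = [2, 3, 2, 8, 10, 2, 15, 5, 18, 10]

b = (10 * 378 - 36 * 75) / (10 * 174 - 36 ** 2)  # slope from the sums
a = 75 / 10 - b * 36 / 10                        # intercept

pred = [a + b * x for x in X]                    # Y'
resid = [yp - y for yp, y in zip(pred, Y)]       # e = Y' - Y

ss_error = sum(e * e for e in resid)
print(round(abs(sum(resid)), 6), round(ss_error, 1))  # 0.0 33.8
```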
## Partitioning the Sums of Squares

SS_total = SS_regression + SS_error

- SS_total = Σ(Y − Ȳ)². Without any IV, the best prediction of Y is its mean.
- SS_regression = Σ(Y′ − Ȳ)². If we know the relationship between the IV and the DV, this is the squared improvement in prediction.
- SS_error = Σ(Y − Y′)². The error that remains.
## Sums of Squares in Demonstration

Let's look at the same example, sentence = −1.26 + 2.432 (priors), with Ȳ = 7.5 years:

| Case | Priors (X) | Sentence (Y) | Total (Y − Ȳ)² | Regression (Y′ − Ȳ)² | Error (Y − Y′)² |
|---|---|---|---|---|---|
| A | 2 | 2 | 30.25 | 15.18 | 2.573 |
| B | 3 | 3 | 20.25 | 2.14 | 9.217 |
| C | 0 | 2 | 30.25 | 76.74 | 10.628 |
| D | 4 | 8 | 0.25 | 0.94 | 0.219 |
| E | 5 | 10 | 6.25 | 11.56 | 0.810 |
| F | 1 | 2 | 30.25 | 40.04 | 0.686 |
| G | 6 | 15 | 56.25 | 34.01 | 2.782 |
| H | 3 | 5 | 6.25 | 2.14 | 1.073 |
| I | 7 | 18 | 110.25 | 68.29 | 5.000 |
| J | 5 | 10 | 6.25 | 11.56 | 0.810 |

SS_total = SS_regression + SS_error

296.5 = 262.7 + 33.8
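The partition can be verified in a few lines of Python (a sketch; standard library only):

```python
# SS_total = SS_regression + SS_error for the ten-case example.
X = [2, 3, 0, 4, 5, 1, 6, 3, 7, 5]
Y = [2, 3, 2, 8, 10, 2, 15, 5, 18, 10]

b = (10 * 378 - 36 * 75) / (10 * 174 - 36 ** 2)
a = 75 / 10 - b * 36 / 10
ybar = sum(Y) / len(Y)                 # 7.5
pred = [a + b * x for x in X]

ss_total = sum((y - ybar) ** 2 for y in Y)
ss_reg = sum((yp - ybar) ** 2 for yp in pred)
ss_err = sum((y - yp) ** 2 for y, yp in zip(Y, pred))

print(round(ss_total, 1), round(ss_reg, 1), round(ss_err, 1))  # 296.5 262.7 33.8
```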
## What is the Relationship between Linear Regression and Correlation?

Coefficient of determination (r²):

r² = (SS_total − SS_error) / SS_total
   = SS_regression / SS_total
   = 262.7 / 296.5 = +0.886

r = √0.886 = 0.94
## The Significance of the Regression Coefficient (b)

Test the significance with the t statistic.

Null hypothesis: the sample comes from a population in which the value of the regression coefficient = 0.0.

t = (b) / SE_b

To generalize the regression coefficient (b) to the population:

b in the population = b ± t (SE_b)

For example, for b = 2.43 in the equation:

2.43 ± 1.94

Thus, the confidence interval runs from 0.49 to 4.37.
## The Goodness of Fit of a Regression Model

To answer the question "how well does the regression model fit the data?", several questions must be addressed:

- Is the correlation (r) significantly different from 0.0?
- If significant, how much of the variance in Y can be accounted for by X? The coefficient of determination (r²).
- How much of the variance in Y cannot be accounted for by X? The coefficient of non-determination (1 − r²).
- Are the prediction errors distributed randomly? Residual analysis.
## Residual Analysis

A residual (an error) is the difference between a prediction (Y′) and the actual value of the dependent variable (Y): Residual (e) = (Y′ − Y).

If the data fit the assumptions of the regression model, the residuals will be randomly distributed. Determine this by:

- A histogram of the residuals (e) with a normal-curve overlay
- A normal probability plot of the residuals (e)
- A plot of the residuals (e) against the predictions (Y′)
## Multiple Linear Regression

The model:

Y = a + b₁X₁ + b₂X₂ + ... + b_kX_k + e

- A dependency technique
- The dependent variable (Y) is metric
- The independent variables (X_k) can be metric and/or nonmetric
- The DV is explained by a function of two or more IVs (X_k)
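A model of this form can be estimated by ordinary least squares. The sketch below uses NumPy on made-up data; the coefficients and variable count are illustrative, not the slides' dataset:

```python
import numpy as np

# Simulate n cases with two predictors and known coefficients.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column; solve for [a, b1, b2].
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef.round(2))  # close to [1.0, 2.0, -0.5]
```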
## Assumptions of Multiple Regression

- There is a linear relationship between Y and the X_k.
- The errors (e) are normally distributed, homoskedastic over the levels of the predictions (Y′) and the levels of X_k, and neither autocorrelated nor correlated with X_k.
- The variance of Y is homoskedastic over the various levels of X_k.
- The X_k are not collinear, i.e., not related to each other.
## Questions for Analysis in a Multiple Linear Regression Model

Example of a model (N = 70):

sentence = a + b₁ (dr_score) + b₂ (pr_conv) + b₃ (tm_disp) + b₄ (jail_tm)

Questions for analysis:

- What is the overall relationship between the length of sentence and the predictor variables?
- How much of the variance in sentence is accounted for by the predictor variables? How much is not accounted for?
- What is the direction and magnitude of the effect of each predictor variable on the length of sentence?
- How accurate is the model in predicting sentences?
## Statistics on the DV and IVs

| Variable | Mean | Std Dev |
|---|---|---|
| SENTENCE | 5.957 | 4.953 |
| DR_SCORE | 6.186 | 2.661 |
| PR_CONV | 1.843 | 1.656 |
| JAIL_TM | 42.914 | 45.198 |
| TM_DISP | 88.971 | 24.405 |

N of cases = 70

## Intercorrelation Matrix

[Table: intercorrelations among sentence and the four predictor variables]

## What is the Overall Relationship between the Length of Sentence and the Predictor Variables?

From the printout:

- Multiple R = 0.65420
- R Square = 0.42797
- Standard Error = 3.85979

The overall correlation: R = 0.65420.

What is the probability that the multiple correlation in the population is zero, and that the obtained R = 0.654 is the result of sampling error?

The null hypothesis is H₀: ρ (rho) = 0.00.
## What is the Overall Relationship between the Length of Sentence and the Predictor Variables?

| Source | DF | Sum of Squares | Mean Square |
|---|---|---|---|
| Regression | 4 | 724.50354 | 181.12588 |
| Residual | 65 | 968.36789 | 14.89797 |
| Total | 69 | 1692.87100 | |

F = 12.15776, Signif F = .0000

Statistical decision: the probability (p) that the sample came from a population where ρ = 0.00 is less than 1 in 1,000. H₀ is rejected, and it is concluded that ρ ≠ 0.00.
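The F ratio in the printout can be reproduced from the sums of squares (Python, standard library only; the p-value is read from the printout rather than recomputed):

```python
# F = MS_regression / MS_residual from the ANOVA table.
ss_reg, df_reg = 724.50354, 4
ss_res, df_res = 968.36789, 65

ms_reg = ss_reg / df_reg   # 181.126
ms_res = ss_res / df_res   # 14.898
F = ms_reg / ms_res
print(round(F, 2))  # 12.16
```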
## How Much of the Variance in Sentence is Accounted for by the Predictor Variables? How Much is Not?

- Multiple R = 0.65420
- R Square = 0.42797
- Standard Error = 3.85979

Coefficient of determination: R² = 0.428. That is, 42.8% of the variance in sentence is explained by the predictor variables.

(1 − R²) = (1 − 0.428) = 0.572. So 57.2% of the variance in sentence is not explained by the predictor variables; it must be accounted for by variables not included in the model.
## What is the Direction and Magnitude of the Effect of Each Predictor Variable on the Length of Sentence?

| Variable | B | SE B | Beta | T | Sig T |
|---|---|---|---|---|---|
| DR_SCORE | .542484 | .175487 | .291438 | 3.091 | .0029 |
| PR_CONV | .687719 | .300261 | .229954 | 2.290 | .0253 |
| JAIL_TM | .048639 | .011398 | .443834 | 4.267 | .0001 |
| TM_DISP | −.035828 | .019993 | −.176532 | −1.792 | .0778 |
| (Constant) | 2.434490 | 2.183390 | | 1.115 | .2690 |

The regression model:

Sentence = 2.434 + 0.542 (dr_score) − 0.0358 (tm_disp) + 0.04864 (jail_tm) + 0.688 (pr_conv)

The value of the DV (sentence) can be predicted from the values of the IVs.
## Testing the Significance of the Regression Coefficients (b_k)

What is the probability that the coefficients in the population (β_k) are equal to zero, and that the only reason the obtained coefficients (b_k) are not zero is sampling error?

H₀: β_k = 0.00

A t-test is used to test these hypotheses.

| Variable | B | SE B | Beta | T | Sig T |
|---|---|---|---|---|---|
| DR_SCORE | .542484 | .175487 | .291438 | 3.091 | .0029 |
| PR_CONV | .687719 | .300261 | .229954 | 2.290 | .0253 |
| JAIL_TM | .048639 | .011398 | .443834 | 4.267 | .0001 |
| TM_DISP | −.035828 | .019993 | −.176532 | −1.792 | .0778 |
| (Constant) | 2.434490 | 2.183390 | | 1.115 | .2690 |

For the predictor variable dr_score:

b = 0.542, standard error = 0.175

t = 0.542 / 0.175 = 3.091, p = 0.003 (df = N − k − 1 = 65)

All the predictor variables are significantly related to sentence except tm_disp.
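Each T in the table is simply B divided by SE B; for example, for dr_score (a quick check in Python):

```python
# t = b / SE_b for DR_SCORE, with df = N - k - 1.
b, se_b = 0.542484, 0.175487
t = b / se_b
df = 70 - 4 - 1
print(round(t, 3), df)  # 3.091 65
```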
## Equation for the Standard Error of a Regression Coefficient

What makes the standard error of b_k bigger?

- The worse the fit of the model to the data, the larger the SS_residual and the larger the SE.
- The smaller the ratio of N to k, the larger the SE.
- The smaller the variance of the predictor (Σx_k²), the larger the SE.
- The greater the collinearity (R²_k) of predictor k with the other predictors in the model, the larger the SE.
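The four factors above are captured by the standard formula for the standard error of a coefficient (stated here for reference; R_k² denotes the R² from regressing X_k on the other predictors, the complement of its tolerance):

```latex
SE(b_k) \;=\; \sqrt{\dfrac{SS_{residual}/(N - k - 1)}
                          {\left(\sum_i (X_{ik} - \bar{X}_k)^2\right)\,\left(1 - R_k^2\right)}}
```

A poor fit inflates the numerator; a small N-to-k ratio, a low-variance predictor, or high collinearity shrinks the denominator; each makes SE(b_k) larger.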
## Interpretation of the Regression Coefficients

The model:

Sentence = 2.434 + 0.542 (dr_score) − 0.0358 (tm_disp) + 0.04864 (jail_tm) + 0.688 (pr_conv)

- For dr_score: for every one-point increase in an offender's drug score, the sentence increases by 0.542 years, or about 198 days.
- For tm_disp: for every one-day increase in the time it takes the court to dispose of the case, the offender's sentence decreases by 0.0358 years, or about 13 days. However, tm_disp is not statistically significant (p = .078).
- How about the variables jail_tm and pr_conv?
## How to Generalize Regression Coefficients to the Population?

From the model, b for dr_score = 0.542. What is the value of the dr_score coefficient in the population?

The 95% confidence interval for dr_score:

= b ± t₉₅% (standard error of b)

| Variable | 95% Confidence Interval for B |
|---|---|
| DR_SCORE | .192012 to .892956 |
| TM_DISP | −.075758 to .004101 |
| JAIL_TM | .025876 to .071403 |
| PR_CONV | .088057 to 1.287382 |
| (Constant) | −1.926041 to 6.795020 |

The range for dr_score can be obtained either by calculation or from the table above: dr_score = b ± (t)(SE_b).

By calculation: b = 0.542, standard error of b = 0.175487, t = 1.9971 for df = 65.

Then dr_score = 0.542 ± 1.9971 (0.175487)

This matches the table: 0.192 and 0.893. Thus, we are 95% confident that the population parameter for dr_score lies between 0.192 and 0.893.
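The interval can be reproduced in Python (standard library only), using the critical t = 1.9971 quoted on the slide:

```python
# 95% CI for b(dr_score): b +/- t * SE_b.
b, se_b = 0.542484, 0.175487
t_crit = 1.9971          # t for df = 65, from the slide
lo, hi = b - t_crit * se_b, b + t_crit * se_b
print(round(lo, 3), round(hi, 3))  # 0.192 0.893
```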
## Confidence Interval for a Predicted Sentence (Y′)

For an offender with a drug score of 8, 92 days to disposition, 23 days in jail pretrial, and 2 priors, how long would the sentence be?

Sentence = 2.434 + 0.542 (8) − 0.0358 (92) + 0.04864 (23) + 0.688 (2) = 5.97 years

However, how big is the range of the 95% confidence interval for this prediction?

df = (N − k − 1) = (70 − 4 − 1) = 65

t = 1.9971 for df = 65, and S_Y·Xk = 3.85979, the standard error of estimate (from the printout)

Confidence interval for Y′ = 5.97 years ± (1.9971)(3.85979)

If we want more confidence, say 99%, will the interval be bigger or smaller?
## How to Compare the Magnitude of Regression Coefficients?

Can we compare the magnitude of the impacts among the IVs? Yes and no: not directly when they are on different scales of measurement.

Sentence = 2.434 + 0.542 (dr_score) − 0.0358 (tm_disp) + 0.04864 (jail_tm) + 0.688 (pr_conv)

- dr_score is scaled from 1 to 10
- tm_disp is measured in days
- jail_tm is measured in days
- pr_conv can range from zero on up

Which IV has the biggest impact on sentence?
## How to Compare the Magnitude of Regression Coefficients?

To compare the impact of the IVs, the coefficients must be standardized into beta weights (β_k):

β_k = (b_k) [ (S_Xk) / (S_Y) ]

where S_Xk and S_Y are the standard deviations of X_k and Y.

For example, for the variable dr_score:

- b(dr_score) = 0.542
- S(dr_score) = 2.661
- S(sentence) = 4.9532

β(dr_score) = (0.542) (2.661 / 4.9532) = +0.291
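The beta-weight calculation for dr_score, as a quick check (Python, standard library only; standard deviations from the descriptive-statistics printout):

```python
# beta_k = b_k * (S_Xk / S_Y)
b_dr = 0.542484
s_dr, s_y = 2.661, 4.953   # SD of dr_score and of sentence
beta_dr = b_dr * (s_dr / s_y)
print(round(beta_dr, 3))  # 0.291
```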
## How to Compare the Magnitude of Regression Coefficients?

| Variable | B | SE B | Beta | T | Sig T |
|---|---|---|---|---|---|
| DR_SCORE | .542484 | .175487 | .291438 | 3.091 | .0029 |
| PR_CONV | .687719 | .300261 | .229954 | 2.290 | .0253 |
| JAIL_TM | .048639 | .011398 | .443834 | 4.267 | .0001 |
| TM_DISP | −.035828 | .019993 | −.176532 | −1.792 | .0778 |
| (Constant) | 2.434490 | 2.183390 | | 1.115 | .2690 |

How do we order the impact of the IVs on sentence? By the absolute size of the beta weights:

1. JAIL_TM (.4438)
2. DR_SCORE (.2914)
3. PR_CONV (.2300)
4. TM_DISP (.1765)
## Methods to Estimate the Regression Model

Each method differs in the steps and statistical criteria used in the estimation process:

- Forced Entry
- Backward Elimination
- Forward Selection
- Stepwise Regression
- All Possible Sub-Sets
- Blockwise Selection
## Methods to Estimate the Regression Model

The various methods make no difference if there is no collinearity among the IVs, that is, if Tolerance = 1.0 or VIF = 1.0 for every IV. Usually, however, there is some collinearity among the IVs (Tolerance < 1.0 or VIF > 1.0), and then the estimated regression model will differ from method to method.
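Tolerance and VIF can be computed by regressing one IV on the others. Below is a sketch on made-up data (NumPy; the numbers are illustrative, not the slides' dataset):

```python
import numpy as np

# Two predictors, deliberately correlated.
rng = np.random.default_rng(1)
n = 70
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)

# Regress x1 on x2, take R^2; Tolerance = 1 - R^2, VIF = 1/Tolerance.
A = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(A, x1, rcond=None)
resid = x1 - A @ coef
r2 = 1 - (resid @ resid) / ((x1 - x1.mean()) @ (x1 - x1.mean()))
tolerance = 1 - r2
vif = 1 / tolerance
print(round(tolerance, 2), round(vif, 2))
```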
Intercorrelation among the IVs
## Forced Entry

- This method is called Enter in SPSS.
- All the IVs are entered into the model at the same time.
- The resulting model will differ when the IVs' relationships with the DV differ and when the collinearity among the IVs differs.
- It is possible to find IVs in the model whose relationship with the DV is not significant.

Sentence = 50.79 − 0.77 (dr_score) − 0.18 (tm_disp) + 0.19 (jail_tm) − 1.14 (pr_conv)
## Backward Elimination

- This method is called Backward in SPSS.
- It starts with the Enter method, then eliminates the IV with the smallest non-significant partial correlation with the DV.
- In SPSS, the default probability for eliminating a variable is called POUT (probability out): p ≥ 0.10.
- It repeats, rechecking the model so that the IVs in the model are all significantly related to the DV and the IVs out of the model are all non-significant.
Backward Elimination

Sentence = 43.28 − 0.17 (tm_disp) + 0.17 (jail_tm)

## Forward Selection

- This method is called Forward in SPSS.
- It begins by calculating an intercorrelation matrix that includes the DV and all the IVs.
- Select the IV with the highest significant correlation with the DV and estimate an equation.
- Use the partial correlations and F-values associated with each remaining IV and the DV to select the remaining IV with the highest significant correlation with the DV, and add it to the equation.
- Repeat until all the IVs in the equation are significant and all those left outside the equation are non-significant.
Forward Selection

Sentence = 29.41 + 0.14 (jail_tm)

## Stepwise Regression

- This method is called Stepwise in SPSS.
- The process is similar to Forward selection, but after each IV is entered it rechecks the significance of every IV already in the model. If any IV in the model has turned non-significant, it is removed.
- In SPSS, if a variable's significance level becomes greater than the default POUT of p ≥ 0.10, the variable is removed.
- It is the most conservative of these methods.
- The SPSS printout for this method is the same as for Forward selection.
## All Possible Sub-Sets

- Another method: estimate a model from every possible combination of IVs and keep the one with the highest significant R².
- However, it is not recommended: if the number of IVs is large, the number of possible subsets becomes very large (2^k − 1 models for k IVs).
## Blockwise Selection

- This method uses theory or previous research to group the IVs into blocks.
- The DV is then regressed on the IVs in each block, and the best predictor from each block is selected.
- The process continues for each block; the final model is composed of the best predictors from each theoretical block.
- Knowledge of the research context is important; otherwise the model might have high predictive accuracy but no managerial or theoretical relevance.
## Shrinkage and Adjusted R²

After a model is estimated, the value of R² can be derived. If the estimated model is then used to predict Y (Y′) in another sample of the same size drawn randomly from the same population, the R² from the correlation between Y and Y′ in the second sample will always be smaller than the first R²; the drop is called shrinkage, and the adjusted R² estimates this smaller value. The reason the adjusted R² is smaller is that the estimation process for the first sample did not take the sampling error in the correlations among the IVs into account.
## Assessing Overall Model Fit

After the model is estimated, the researcher needs to ensure that R and the parameters of the model are significant. Then examine the residuals to see how well the model fits the data:

- Run the data through the model and save the predicted value of Y for each case (Y′).
- Compute and save the errors of prediction, which are called the residuals (e) and equal Y′ᵢ − Yᵢ.
## Examining the Residuals (e)

Ideally, all the residuals (Y′ᵢ − Yᵢ) would equal zero; however, that is very rare in real data. The residuals should be:

- Normally distributed
- Homoskedastic over the various levels of the predicted values (Y′)
- Not autocorrelated, i.e., not correlated with each other (check with the Durbin-Watson statistic)
- Not correlated with any of the predictor variables
- Free of outliers (residual plots can also reveal outliers)
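Of the checks above, the Durbin-Watson statistic is easy to compute directly from its definition; values near 2 suggest no autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation. The sketch below (Python, standard library only) applies it to the residuals from the simple-regression example earlier in these slides, taken in case order:

```python
# Durbin-Watson: sum of squared successive differences over sum of squares.
def durbin_watson(e):
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den

# Residuals (Y' - Y) from the ten-case example.
e = [1.604, 3.036, -3.260, 0.468, 0.900, -0.828, -1.668, 1.036, -2.236, 0.900]
print(round(durbin_watson(e), 2))  # 2.58
```

Note that the statistic depends on the ordering of the cases, so it is most meaningful for time-ordered data.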