Lesson 5 10 Linear Regression Residuals

How to do a linear regression?
Tom Broekel
Diagnostics: Residuals
1 © TBroekel
Diagnostics of linear regression: Residuals
Regression (residual) diagnostics show whether the requirements for tests of statistical significance are met
If the requirements are not met, inference on the statistical significance of regression parameters (i.e., p-values) is not valid
Requirements:
Normally distributed residuals
Homoscedasticity (constant variance of residuals, i.e., no heteroscedasticity)
No autocorrelation (independence of residuals)

First impression of residuals: look at their scatterplot
Regression on NUTS2 regions: GDP ~ Pop_Den + Patents
[Figure: residuals (y-axis: Residual) plotted against observation index (x-axis: Index)]
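A minimal R sketch of this first visual check. The NUTS2 data set is not included here, so the data are simulated and the variable names (gdp, pop_den, patents) are hypothetical stand-ins.

```r
# Simulated stand-in for the regional data (hypothetical values, not the NUTS2 data)
set.seed(1)
n <- 250
region_data <- data.frame(
  pop_den = rexp(n, rate = 1 / 300),
  patents = rpois(n, lambda = 50)
)
region_data$gdp <- 20000 + 10 * region_data$pop_den +
  150 * region_data$patents + rnorm(n, sd = 8000)

# Fit the regression GDP ~ Pop_Den + Patents
model <- lm(gdp ~ pop_den + patents, data = region_data)

# Residuals in data order: a patternless cloud around 0 is what we hope to see
res <- residuals(model)
plot(res, xlab = "Index", ylab = "Residual")
abline(h = 0, lty = 2)
```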

Test whether residuals are normally distributed
Normally distributed residuals: no systematic biases, just random noise
Explicit test: the Shapiro-Wilk test compares the distribution of the residuals with a normal distribution
Significant result (p-value below the chosen level of significance): reject the normal distribution hypothesis
Insignificant result (p-value above the chosen level of significance): the normal distribution hypothesis cannot be rejected
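The decision rule can be sketched with base R's shapiro.test(). The data are simulated (one model with normal errors, one with skewed errors), and the helper name check_normality is hypothetical; the 0.01 level mirrors the one used in these slides.

```r
# Simulated example (hypothetical data): normal vs. strongly skewed errors
set.seed(42)
n <- 200
x <- runif(n)
y_normal <- 1 + 2 * x + rnorm(n)
y_skewed <- 1 + 2 * x + rexp(n)

alpha <- 0.01  # chosen level of significance

check_normality <- function(model, alpha) {
  sw <- shapiro.test(residuals(model))  # H0: residuals are normally distributed
  list(p_value = sw$p.value,
       reject_normality = sw$p.value < alpha)
}

check_normality(lm(y_normal ~ x), alpha)
check_normality(lm(y_skewed ~ x), alpha)
```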
Rejection of normal distribution hypothesis
Coefficients are still correct (unbiased)
But: significance tests of the coefficients are not reliable
Their results (p-values) cannot be interpreted
Rejection of normal distribution hypothesis: what to do?
Wrong functional form? (Non-linearity?)
Missing variables?
Inherent characteristics of data ➡ different empirical approach (e.g., bootstrapping)
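Bootstrapping is named above as one alternative. Below is a minimal case-resampling (pairs) bootstrap in base R that yields a 95% percentile interval for a slope without relying on normal residuals. It is a sketch that assumes independent observations; the data and the name boot_slope are hypothetical, and other variants (residual or wild bootstrap, or the boot package) exist.

```r
set.seed(7)
n <- 150
x <- runif(n)
y <- 1 + 2 * x + rexp(n)          # skewed (non-normal) errors, hypothetical data
d <- data.frame(x = x, y = y)

boot_slope <- function(data, B = 1000) {
  slopes <- numeric(B)
  for (b in seq_len(B)) {
    idx <- sample.int(nrow(data), replace = TRUE)  # resample whole rows (cases)
    slopes[b] <- coef(lm(y ~ x, data = data[idx, ]))[["x"]]
  }
  slopes
}

slopes <- boot_slope(d)
ci <- quantile(slopes, c(0.025, 0.975))   # 95% percentile interval
ci
```

If 0 lies outside the interval, the slope can be considered significantly different from 0 at roughly the 5% level.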

Testing for normal distribution of regression residuals in R
Function ols_test_normality() in package olsrr is directly applicable to the regression results object
The function also reports additional tests, which usually differ little and are interpreted the same way
In the example output, the normal distribution hypothesis is to be rejected because the p-value is below 0.01
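A usage sketch, assuming the olsrr package is installed. R's built-in mtcars data stand in for the regional data, so the model itself is only illustrative.

```r
# Requires the olsrr package: install.packages("olsrr")
library(olsrr)

# Stand-in model on R's built-in mtcars data
model <- lm(mpg ~ wt + hp, data = mtcars)

# Prints Shapiro-Wilk and further normality tests of the model's residuals
normality <- ols_test_normality(model)
normality
```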
Test for homoscedastic residuals
Tests of significance are based on the assumption that the variance of the residuals is constant across their distribution
“No heteroscedasticity”
No relationship between residuals and explanatory variables
Variance of residuals should be constant across the distribution of fitted values of the dependent variable
Comparison of fitted values of dependent variable with residuals
Source: https://clevertap.com/blog/a-brief-primer-on-linear-regression-part-ii/
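Such a comparison can be sketched in base R. The data are simulated (hypothetical) so that the error spread grows with x, which produces the typical funnel shape of heteroscedastic residuals.

```r
set.seed(3)
n <- 300
x <- runif(n, 1, 10)
# Hypothetical data whose error spread grows with x (heteroscedastic)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)
model <- lm(y ~ x)

fit <- fitted(model)
res <- residuals(model)

# A funnel shape (spread widening with fitted values) signals heteroscedasticity
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Base R's plot(model, which = 1) draws the same residuals-vs-fitted plot with an added smoother.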
Breusch-Pagan test of heteroscedasticity
Regression of the squared residuals on the same explanatory variables
If this regression explains “too” much variance, the residuals are not independent of the explanatory variables
A significant result of the BP test suggests rejecting the homoscedasticity assumption
More tests are available
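The idea can be written out by hand. This sketch implements the studentized (Koenker) form of the Breusch-Pagan statistic, n·R² of the auxiliary regression, compared with a χ² distribution whose degrees of freedom equal the number of explanatory variables. It is an illustration, not the olsrr implementation (lmtest::bptest() computes this variant by default); the name bp_test and the simulated data are hypothetical.

```r
# Studentized Breusch-Pagan (Koenker) test, computed by hand
bp_test <- function(model) {
  e2 <- residuals(model)^2
  X <- model.matrix(model)                 # same regressors, incl. intercept column
  aux <- lm.fit(X, e2)                     # auxiliary regression of squared residuals
  r2 <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)  # centred R^2
  n <- length(e2)
  df <- ncol(X) - 1                        # number of explanatory variables
  stat <- n * r2
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(3)
n <- 300
x <- runif(n, 1, 10)
y_hom <- 2 + 3 * x + rnorm(n)                 # constant error variance
y_het <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)   # variance grows with x
bp_test(lm(y_hom ~ x))
bp_test(lm(y_het ~ x))
```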
Rejection of homoscedasticity: implications
Coefficients are still correct (unbiased)
Tests of significance are not reliable
Their results (p-values) cannot be interpreted
Rejection of homoscedasticity: causes and consequences
Wrong functional form? (Test for non-linearities)
Missing variables?
Inherent characteristics of data ➡ different empirical approach (e.g., bootstrapping)
Testing for heteroscedasticity in R
Function ols_test_breusch_pagan() in package olsrr is directly applicable to the regression results object
Prob > Chi2 = p-value
In the example output, the p-value is above 0.01, so homoscedasticity cannot be rejected
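A usage sketch, assuming the olsrr package is installed; mtcars is only a stand-in data set. Which variables enter the auxiliary regression by default is worth checking in the olsrr documentation before comparing its numbers with other implementations.

```r
# Requires the olsrr package: install.packages("olsrr")
library(olsrr)

model <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model

# Printed output includes "Prob > Chi2", the p-value of the test
bp <- ols_test_breusch_pagan(model)
bp
```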

Autocorrelation: observations correlate with themselves
Observations are in some kind of order
Temporal: observations' values at t correlate with their values at t-1 (temporal autocorrelation) -> panel and time-series analysis
Spatial: observations correlate with others in geographical proximity (spatial autocorrelation)
Autocorrelation implies correlated residuals, i.e., residuals are not independent of each other and hence contain structural biases rather than just random noise
Tests of significance are not reliable
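For the temporal case, a first check is the correlation of each residual with its predecessor. The sketch simulates hypothetical AR(1) errors (each error carries over 80% of the previous one) so that the residuals are clearly autocorrelated.

```r
set.seed(11)
n <- 500
x <- rnorm(n)
# Hypothetical AR(1) errors with coefficient 0.8
u <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))
y <- 1 + 2 * x + u
res <- residuals(lm(y ~ x))

# Lag-1 autocorrelation: residual at t vs. residual at t-1
lag1_cor <- cor(res[-1], res[-n])
lag1_cor
```

acf(res) plots the autocorrelations at several lags; a formal test is, for example, the Durbin-Watson test in lmtest::dwtest().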
[Map: high and low GDP values are geographically clustered]
Spatial autocorrelation: residuals of observation i (region i) correlate with those of its neighbouring regions
Almost always a problem in the case of spatial data
Spatial autocorrelation hints at similarities across regions or strong relations between them
Unobserved regional characteristics or relations
Regions are part of “larger” regions: regional borders are not optimal

Spatially autocorrelated residuals: mapping regression residuals
Residuals clearly geographically structured / clustered
Northern Europe: model under-estimates GDP from population density & patents
Eastern Europe: model over-estimates GDP from population density & patents
[Map: regression residuals by region, Residual = $Y - \hat{Y}$]
Exact test of spatial autocorrelation: Moran's I
Extension of Pearson's correlation coefficient to spatial structures
Comparison of the value for region i with those of its (direct) neighbouring regions
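Moran's correlation coefficient I:
$$I = \frac{n \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
where $w_{ij}$ is the spatial weight linking regions $i$ and $j$, and $\bar{x}$ is the mean of $x$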
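Moran's I as defined above takes only a few lines of base R. The function name morans_i and the toy example of four regions on a line (binary weights, 1 for direct neighbours) are hypothetical. A smooth trend gives a positive value, an alternating pattern a negative one.

```r
# x = values (e.g., residuals), W = n x n spatial weights matrix (w_ij)
morans_i <- function(x, W) {
  n <- length(x)
  z <- x - mean(x)                        # deviations (x_i - x_bar)
  num <- n * sum(W * outer(z, z))         # n * sum_ij w_ij z_i z_j
  den <- sum(W) * sum(z^2)                # (sum_ij w_ij) * sum_i z_i^2
  num / den
}

# Four regions on a line: 1-2, 2-3, 3-4 are direct neighbours
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)                             # neighbourhood is symmetric

morans_i(c(1, 2, 3, 4), W)    # 0.3333333 (smooth trend, positive)
morans_i(c(1, -1, 1, -1), W)  # -1 (alternating, negative)
```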

Example: spatial relations reflected by direct neighbourhood
Region A's neighbours: B, C, D
Region D's neighbours: A, E
Region E's neighbours: D, F, G, H
Arranging residuals according to spatial neighbourhood:
Region      Neighbour
ResidualA   ResidualB
ResidualA   ResidualC
ResidualA   ResidualD
ResidualD   ResidualA
ResidualD   ResidualE
ResidualE   ResidualD
ResidualE   ResidualF
ResidualE   ResidualG
ResidualE   ResidualH
Estimation of the correlation coefficient = Moran's I
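The neighbour lists of this example can be turned into a binary weights matrix, the input Moran's I needs. A base-R sketch: regions A to H are the example's regions, only the listed relations are used, and they are made symmetric, so the resulting pair list is the full two-way version of the table.

```r
# Neighbour lists from the example (direct neighbourhood)
neighbours <- list(A = c("B", "C", "D"),
                   D = c("A", "E"),
                   E = c("D", "F", "G", "H"))
regions <- LETTERS[1:8]                     # regions A ... H

# Binary weights: w_ij = 1 for neighbours, 0 otherwise
W <- matrix(0, 8, 8, dimnames = list(regions, regions))
for (i in names(neighbours)) W[i, neighbours[[i]]] <- 1
W <- ((W + t(W)) > 0) * 1                   # make the relation symmetric

# Each 1 in W is one (Residual_i, Residual_j) pair
pairs <- which(W == 1, arr.ind = TRUE)
pair_table <- data.frame(region = regions[pairs[, "row"]],
                         neighbour = regions[pairs[, "col"]])
pair_table
```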
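Moran's I test of spatial autocorrelation
Values typically range from -1 (negative autocorrelation) to 1 (positive autocorrelation); without autocorrelation the expected value is -1/(n-1), i.e., close to 0
A significant result indicates the presence of spatial autocorrelation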
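One way to judge significance is a permutation test: shuffle the values across regions many times and check how often the shuffled I reaches the observed one. This base-R sketch (one-sided, for positive autocorrelation) uses hypothetical names and a simulated line of 20 regions; moran.test() in spdep instead relies on an analytical approximation.

```r
morans_i <- function(x, W) {
  z <- x - mean(x)
  length(x) * sum(W * outer(z, z)) / (sum(W) * sum(z^2))
}

# H0: no spatial autocorrelation, i.e., values could be shuffled freely
moran_perm_test <- function(x, W, n_perm = 999) {
  observed <- morans_i(x, W)
  permuted <- replicate(n_perm, morans_i(sample(x), W))
  p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
  list(I = observed, expected = -1 / (length(x) - 1), p_value = p_value)
}

# 20 regions on a line, each neighbouring the next; values follow a smooth trend
set.seed(5)
n <- 20
W <- matrix(0, n, n)
W[cbind(1:(n - 1), 2:n)] <- 1
W <- W + t(W)
result <- moran_perm_test(1:n + rnorm(n, sd = 0.5), W)
result
```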
Problem with Moran's I
Different ways to define “neighbourhood”
Direct neighbourhood (weight of neighbouring values = 1, all others = 0)
Weighting based on distance (growing distance implies less weight in region i's estimation)
The neighbourhood definition affects the estimation results
Motivate the choice from theory: what type of dependencies are relevant?
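Both definitions can be sketched in base R from region centroids. The coordinates and the cut-off distance of 2 below are hypothetical; the inverse-distance weights are row-standardised so that each region's weights sum to 1.

```r
# Hypothetical region centroids (x/y coordinates, e.g. in km)
coords <- cbind(x = c(0, 1, 2, 10, 11),
                y = c(0, 0, 1, 10, 10))

D <- as.matrix(dist(coords))          # pairwise Euclidean distances

# Distance-based: inverse distance, so farther regions get less weight
W_dist <- 1 / D
diag(W_dist) <- 0                     # a region is not its own neighbour
W_dist <- W_dist / rowSums(W_dist)    # row-standardise: weights of region i sum to 1

# Direct-neighbourhood alternative: 1 if within the cut-off distance, else 0
W_bin <- (D > 0 & D <= 2) * 1
```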
When spatial autocorrelation is present
Use different spatial units (definition of regions)
Consider spatial characteristics, e.g., urban vs. rural
Model spatial dependencies with dummy variables (e.g., country dummies)
Use spatial regression models (not in this class)
Multi-level regression (not in this class)

How to test for spatial autocorrelation in R
Load spatial information on the geographical locations of observations
Usually, a “map”
Maps = so-called “shapefiles” that link geographical information (latitude and longitude data) to empirical observations, e.g., regions
R has excellent capabilities for handling spatial information via the package sf
The sf (simple features) library makes working with such data easy
Full compatibility with the tidyverse and all its features
Easy integration with ggplot
How to test for spatial autocorrelation in R
Load the shapefile with read_sf() from the sf library
Add the regression residuals to the original data set using add_residuals() from the modelr library (arguments: the data, the regression object, and the name of the new column)
Merge the extended data set with the shapefile to obtain “geolocated” observations, matching on the ID columns in the shapefile and in region.data
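A sketch of these three steps, assuming sf, dplyr and modelr are installed. The NUTS2 shapefile is not available here, so the North Carolina counties shapefile that ships with sf stands in, with FIPS as the shared ID column; the regression SID74 ~ BIR74 + NWBIR74 is purely illustrative.

```r
# Requires: sf, dplyr, modelr
library(sf)
library(dplyr)
library(modelr)

# Stand-in shapefile shipped with sf (North Carolina counties), not the NUTS2 map
shape <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Stand-in for region.data: attribute table without geometry, keyed by FIPS
region.data <- st_drop_geometry(shape) %>% select(FIPS, BIR74, SID74, NWBIR74)

model <- lm(SID74 ~ BIR74 + NWBIR74, data = region.data)

# Add the residuals as a new column named "resid"
region.data <- add_residuals(region.data, model, var = "resid")

# Merge back onto the polygons via the shared ID column
shape_merged <- shape %>% select(FIPS) %>% left_join(region.data, by = "FIPS")
```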
The shapefile (sf) object now includes a data.frame with the merged columns of the regional data
Before calculating Moran's I, create information on spatial relations (who is neighbour of whom?)
Extract neighbourhood information from the shapefile with poly2nb() and transform it into a spatial dependency object (spatial weights) with nb2listw(), both from the spdep library
Some regions have no neighbours (islands)
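A sketch of this step, assuming sf and spdep are installed; the North Carolina counties shapefile shipped with sf again stands in for the NUTS2 map. Neighbours are defined by shared borders (queen contiguity), and the weights are row-standardised.

```r
library(sf)
library(spdep)

shape <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Who is neighbour of whom? Polygons sharing a border (queen contiguity)
nb <- poly2nb(shape, queen = TRUE)

# Row-standardised spatial weights; zero.policy = TRUE allows regions
# without neighbours (e.g. islands) instead of stopping with an error
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)
summary(nb)
```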
Moran's I test implemented in the spdep library with the function moran.test()
Highly significant spatial autocorrelation!
Neighbours' residuals correlate with 0.65 (Moran's I as correlation coefficient)
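An end-to-end sketch of the test, assuming sf and spdep are installed. It uses the stand-in North Carolina data and an illustrative regression, so its numbers are unrelated to the 0.65 reported for the NUTS2 model.

```r
library(sf)
library(spdep)

shape <- read_sf(system.file("shape/nc.shp", package = "sf"))
model <- lm(SID74 ~ BIR74 + NWBIR74, data = shape)   # illustrative model
shape$resid <- residuals(model)

lw <- nb2listw(poly2nb(shape), style = "W", zero.policy = TRUE)

# H0: no spatial autocorrelation in the residuals
mt <- moran.test(shape$resid, lw, zero.policy = TRUE)
mt$estimate[["Moran I statistic"]]
mt$p.value
```

spdep also offers lm.morantest(model, lw), which is designed for regression residuals and accounts for their being estimated.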
Estimating the regression model with the full set of country dummies (almost) solves the issue
Only slightly significant: weak spatial autocorrelation!
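Why country dummies help can be sketched in base R with simulated data: unobserved national levels (hypothetical country codes and effects) leave a country pattern in the residuals of the plain model, while the dummies absorb it. The sketch checks residual means per country rather than reproducing the Moran's I result above.

```r
set.seed(9)
n_per <- 30
countries <- c("DE", "FR", "PL", "SE")               # hypothetical country codes
d <- data.frame(country = rep(countries, each = n_per),
                x = runif(4 * n_per, 0, 10))
country_effect <- c(DE = 5, FR = 3, PL = -4, SE = 6)  # simulated national levels
d$y <- 10 + 2 * d$x + unname(country_effect[d$country]) + rnorm(nrow(d))

m_plain   <- lm(y ~ x, data = d)
m_dummies <- lm(y ~ x + factor(country), data = d)   # one dummy per country (minus reference)

# Plain model: residuals still carry the country pattern; with dummies they average 0
tapply(residuals(m_plain), d$country, mean)
tapply(residuals(m_dummies), d$country, mean)
```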
How to do linear regressions!

Ex-ante checks
Number of observations
Type of dependent variable
Linearity
Ex-post checks, with potential model refinement
Multicollinearity
Outliers
Normal distribution of residuals
Heteroscedasticity
Autocorrelation (spatial/temporal)