
© Andrew R Marshall

Statistics and Quantitative Methods Module

Extraordinary Regression: Non-Normal, Non-Parametric & Non-Straight Relationships
Andy Marshall
Practical 7 (Introductory Lecture)
Spring 2013

Practical 7 files are on the VLE:


Statistics and Quantitative Methods > Course Materials 2012-13 > SPRING WEEK 3 - Prac 7 - Extraordinary Regression

Choosing a Statistical Test


From Prac 6: (d) Effect Size of Trends
i) Two datasets with normal error, without causation?
Pearson Correlation
ii) Two datasets without normal error or causation?
Spearman Rank Correlation
iii) Normal predictor and normal response?
Ordinary Least Squares Linear Regression
iv) Single normal response vs many normal predictors?
Multiple Linear Regression


Help on the VLE


(Statistics & Quantitative Methods > STATISTICS FORUM 2012-13)

[Figure: Statistics Forum summary statistics 2009-10 - posts by time, date, and day]

One-to-one Help
Remaining Help Sessions

• OPEN HELP SESSION: Fri 25th Jan 09:15-12:14 (Steve LFA/015)
  (wrap up any incomplete practicals and/or get assignment help)
• TUTORIAL: Weds 30th Jan 09:15-12:14 (Andy LFA/015)
  (guided tutorial covering practicals 4-7 including informal test)
• OPEN HELP SESSION: Weds 6th Feb 09:15-12:14 (Andy LFA/015)
  (final chance to get help on the assignment data)
• ASSIGNMENT DEADLINE: Mon 11th Feb 12:00 noon

What is Extraordinary Regression?


Refers to regression where:
• Non-parametric - response data error distribution undefinable
• Non-normal - response data residual error not normal
• Non-straight - predictor-response model is not a straight line

Non-normal Methods
Identifying a non-normal response variable:

1) Data distribution
- Count data with few values, low mean or low sample size
- Binary (0/1) data

2) Data exploration
- Skew
- Non-straight

[Figure: Distance from Low Tide (m) shown as histogram, kernel density (N = 30, bandwidth = 0.4001), normal Q-Q plot, and boxplot]

Non-normal Methods
Identifying a non-normal response variable:

3) Diagnostic plots
- Residual plots show curvature or heteroscedasticity
- Normality
- Skew/outliers

4) Tests (not essential)
- Kolmogorov-Smirnov, etc.

[Figure: regression diagnostic plots - Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage]
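
A minimal sketch of steps 3) and 4) in R, assuming a fitted model and hypothetical variable names (note that estimating the mean and sd from the data makes the K-S p-value approximate):

model <- lm(distance ~ tide_height)      # hypothetical variables
par(mfrow = c(2, 2))
plot(model)                              # the four diagnostic plots above
ks.test(residuals(model), "pnorm",
        mean = mean(residuals(model)),
        sd = sd(residuals(model)))       # approximate normality check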



Non-Parametric Regression
(i.e. no defined error distribution)

Non-parametric Regression
Kendall’s robust line method:
• “Robust”, i.e. few assumptions
• Slope (z) = median of all possible slopes (for every pair of points)
• Median slope then used to get an intercept for each point, and the median intercept is used
• Simple (!)

z[i,j] = (yj - yi) / (xj - xi)

Cleveland (2006)
[Figure: scatterplot of y against x with fitted robust line]
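
A minimal R sketch of this estimator (also known as the Theil-Sen line), assuming numeric vectors x and y of equal length; this is an illustration, not the practical’s official code:

kendall_line <- function(x, y) {
  p <- combn(length(x), 2)                        # every pair of points i < j
  s <- (y[p[2, ]] - y[p[1, ]]) / (x[p[2, ]] - x[p[1, ]])
  z <- median(s[is.finite(s)])                    # median pairwise slope
  a <- median(y - z * x)                          # median intercept
  c(intercept = a, slope = z)
}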

Non-parametric Regression
Kendall’s robust line method: H0: slope (mu; μ) = 0 (i.e. W– = W+)
• Less influenced by outliers than OLS regression
• Wilcoxon Signed-Rank (one-sample inference):
- Compares median of slopes μ1-i to a single value (μ = 0)
- Critical value W (subtract μ from μ1-i, rank disregarding signs, then restore signs and sum the –ves [W–] and +ves [W+])

wilcox.test(data, mu = 0)


Non-parametric Regression
Kendall’s robust line method:
• Cleveland (2006) gives other non-parametric alternatives
• 3 drawbacks:
- Proportion variance explained not known
- Can ignore outliers
- Can’t deal with more than one variable


Non-Normal Parametric Regression
(These models are still parametric, i.e. the response variable has a defined distribution)

Generalised Linear Models

GLMs or “glims” (function glm() in base R)

• Regression where the response variable does not necessarily require normally distributed errors
• ≥1 predictors
• Predictor distribution unimportant (except skew)
• Based on maximum likelihood rather than minimising the squared residual error:
“… iterative weighted linear regression…” (Nelder & Wedderburn 1972 J. R. Statist. Soc. A)

Generalised Linear Models

GLMs require an “Error function”
• This modifies the regression to match the data type
• Adjusts the random component (the probability distribution)…

Gaussian (Normal) Error Family

glm(y ~ x, family = gaussian)

• Result very similar to OLS linear regression (general linear model)
• Normal errors

[Figure source: Wikipedia]


Binomial Error Family

glm(y ~ x, family = binomial)

• “Logistic regression”
• Probability distribution of two alternative outcomes
• E.g. presence/absence, 0/1, categorical data
• Output: probability of getting result 1

[Figure source: Wikipedia]
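
A minimal sketch of that output in R, with hypothetical variable names:

m <- glm(presence ~ rainfall, family = binomial)  # presence coded 0/1
predict(m, type = "response")                     # fitted P(presence = 1), on the 0-1 scale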


Poisson Error Family

glm(y ~ x, family = poisson)

• “Poisson regression”
• Random results from a distribution of counts
• Dispersion is set to 1 (mean = variance)
• e.g. phone calls / roadkill

[Figure source: Wikipedia]

Negative Binomial Error Family

glm.nb(y ~ x) (function glm.nb() from package MASS)

• Alternative distribution for count data (mean ≠ variance)
• Example in practical and Crawley (2007)

[Figure source: Wikimedia Commons]
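
A minimal sketch reusing the clusters.txt variables from the later slides (assumes MASS is installed and the data are attached):

library(MASS)                     # provides glm.nb()
m <- glm.nb(Cancers ~ Distance)   # theta (the dispersion parameter) is estimated from the data
summary(m)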

Generalised Linear Models

GLMs require a “Link function”
• Links the expected value of y (μ) to the predictors (i.e. adjusting for the error function)
• Common link functions g(μ):
- Identity link: g(μ) = μ (normal)
- Log link: g(μ) = log(μ) (Poisson / negative binomial)
- Logit link: g(μ) = log[μ/(1 – μ)] (binomial)

E.g. Poisson regression model: log(μ) = β0 + β1x1 + β2x2 + β3x3 + ...
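
A sketch of the two prediction scales this implies, for a hypothetical fitted Poisson GLM m:

predict(m, type = "link")       # the linear predictor, i.e. log(μ)
predict(m, type = "response")   # back-transformed to μ, the expected count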

Generalised Linear Models

Stats: (1) Deviance
Deviance calculation varies according to error function
• Normal: Σ(y – µ)² (= sum of squares)
• Poisson: 2 Σ(y × ln[y/µ] – [y – µ])
• Binomial: 2 Σ{(y × ln[y/µ]) + ([n – y] × ln[(n – y)/(n – µ)])}
(µ = fitted values of y from the maximum likelihood model; Crawley 2007 The R Book)

Proportion deviance explained = 1 – Residual Deviance / Null Deviance
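
In R, a one-line sketch for any fitted glm object (here assumed to be called model):

1 - model$deviance / model$null.deviance   # proportion deviance explained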

Generalised Linear Models

Stats: (2) AIC (not just for multiple models)
• Typical selection criterion for stepwise modelling
• Trade-off between model simplicity and fit (penalty of two added for each extra parameter):

AIC = (-2 × log-likelihood) + (2 × no. of parameters)

• Lower AIC is better than higher AIC
• Rule of thumb: AIC within 2 suggests models equivalent
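
A sketch in R, with hypothetical fitted GLMs full and reduced:

AIC(full, reduced)   # lower is better; within ~2 = roughly equivalent
step(full)           # automated stepwise reduction using AIC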

Generalised Linear Models

Stats: (3) Test Statistics
• Marginal statistics used to test the significance of slope beta (i.e. H0: β = 0) – t or z
• Analysis of Deviance (likelihood ratio) test to determine whether reduced model “a” explains significantly less deviance than full model “b” - anova(a,b):

F = [(Da – Db) / (νa – νb)] / (Db / νb)

(D = deviance; ν = residual degrees of freedom)
“a” must be nested within “b”
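
A minimal sketch with hypothetical Poisson models (a nested within b):

b <- glm(y ~ x1 + x2, family = poisson)   # full model
a <- update(b, . ~ . - x2)                # reduced model, nested in b
anova(a, b, test = "Chisq")               # analysis of deviance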

Generalised Linear Models

Simple univariate example:
• Crawley (2005) expected cancer patients clusters.txt

> model <- glm(Cancers ~ Distance, family = poisson)
> xv <- seq(0, 100, 0.1)
> yv <- predict(model, list(Distance = xv))
> plot(Distance, Cancers)
> lines(xv, exp(yv))

Plotting a line: (1) predict points, (2) adjust for link function

[Figure: Cancers against Distance (0-100) with fitted Poisson curve]

IMPORTANT
One more step is needed if the data continue to defy the distribution

Overdispersion…
(mean = variance → dispersion = 1)

Quasi-likelihood in GLMs
Overdispersion
• Used where there is greater variability than expected
• Poisson and binomial (logistic) regression only
• Variance > mean (i.e. dispersion > 1)
• Rule of thumb: overdispersion concerning if dispersion > 1.5
(Residual Deviance > 1.5 × residual degrees of freedom)
(often shows a funnel shape in residual diagnostic plots)
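
A quick check in R (sketch, assuming a fitted glm called model):

deviance(model) / df.residual(model)   # rough dispersion; > 1.5 is concerning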

Quasi-likelihood in GLMs
Example of overdispersion:
• Crawley (2005) clusters.txt

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.186865   0.188728   0.990   0.3221
Distance    -0.006138   0.003667  -1.674   0.0941 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 149.48 on 93 degrees of freedom
Residual deviance: 146.64 on 92 degrees of freedom
AIC: 262.41

RD (146.64) > 1.5 × rdf (1.5 × 92 = 138), so overdispersed

Quasi-likelihood in GLMs
Overdispersion
• Deviance is scaled by the overdispersion coefficient (D/df)
• Binomial → quasibinomial [uses a scaling parameter ≈ Pearson chi-sq / df to scale the deviance]
• Poisson → quasipoisson
• Problems:
- Generally reduces power of the test
- Can’t use automated stepwise reduction…

Quasi-likelihood in GLMs
Alternatives to quasi-likelihood:
• Remove more intercollinearity
• Poisson (p372 Q&K):
- Adjust parameters: √(χ²/ν)
- Negative binomial GLM (see prac)
• Binomial (& proportions):
- arcsine/probit/logit transformation
- Beta-binomial distribution

See also final slide…

Multiple GLM Final Steps

(1) Diagnostic Plots

(2) Analysis of Deviance (likelihood ratio test):
Gaussian: anova(a, b, test = "F")
Poisson/binomial: anova(a, b, test = "Chisq")
Quasi-likelihood: anova(a, b, test = "F")

[Figure: GLM diagnostic plots - Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage]

Poisson GLM Example

What are the key stats here?

TABLE II. Predictors of Resident Monkey Species Richness in 21 Udzungwa Forest Fragments at the 95% (and 90%) Level
[Table not reproduced] Marshall et al. (2010)


Recap: GLM Steps

(1) Remove intercorrelation (if multiple predictors)
(2) Determine distribution
(3) Run full GLM (incl. correct error and link)
(4) Stepwise reduction → minimum adequate model
(5) Check for over-dispersion (→ quasi-likelihood)
(6) Check model diagnostics
(7) Analysis of deviance (minimum vs. full model)
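
The whole sequence as a minimal R sketch, assuming a hypothetical data frame dat with count response y and predictors x1-x3:

full <- glm(y ~ x1 + x2 + x3, family = poisson, data = dat)  # step 3
mam <- step(full)                                            # step 4: minimum adequate model
deviance(mam) / df.residual(mam)                             # step 5: > 1.5 = overdispersed?
par(mfrow = c(2, 2)); plot(mam)                              # step 6: diagnostics
anova(mam, full, test = "Chisq")                             # step 7: analysis of deviance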

Non-straight Models
Clues for non-linearity
• Data exploration:
- Curve [Figure: carbon against prop90_ba]
- Hump [Figure: ht_dbh against elevation]
- Complex relationship

Non-straight Linear Models

Polynomial regression
• Works in the same way as ordinary least squares regression:

lm(response ~ predictor + I(predictor^2))

• E.g. simple relationships for x = -4 to +4 (see prac):
y = a + bx + cx²
y = a + bx + cx² + dx³

[Figure: quadratic and cubic curves plotted for x = -4 to 4]
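
A sketch of testing whether the curve is warranted, with hypothetical x and y:

m1 <- lm(y ~ x)             # straight line
m2 <- lm(y ~ x + I(x^2))    # quadratic
anova(m1, m2)               # significant = the quadratic term improves the fit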

Non-linear Models
Generalised Additive Models (GAMs):
• Wiggly relationships
• Non-parametric version of GLM
• Local scoring algorithm iteratively fits a smoothing function, e.g. LOESS

y = β0 + f1(x1) + f2(x2) + f3(x3) + ...

Zuur et al. (2007) p99; Quinn & Keough p372-4

Non-linear Models

GAMs:
• Use deviance and AIC, as for GLM, then use analysis of deviance: anova(simple, complex, test = "F")
• Like GLM, an error probability distribution is specified:
gam(y ~ s(x1) + s(x2) + s(x3), family = xxx)
• Can even mix linear and wiggly terms (semi-parametric):
gam(y ~ x1 + s(x2))
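
A minimal runnable sketch with package mgcv, assuming a hypothetical data frame dat:

library(mgcv)
m <- gam(y ~ s(x1) + s(x2), family = poisson, data = dat)
summary(m)           # effective df and deviance explained
plot(m, pages = 1)   # the fitted smooths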

Non-linear Models
Over-fitting in GAMs:
• A GAM can have perfect fit
• BUT fit ≠ explanatory power (we want to represent the “parent population”)
• GAM can overfit so need to adjust effective degrees of freedom
• Use parsimony to decide between models (e.g. quadratic = 3 df)

[Figure: s(elevation) smooths, with very high vs adjusted (3.84) effective degrees of freedom]

The genus Acacia in East Africa: distribution, biodiversity and the protected area network
Marshall et al. (2012) Plant Ecology and Evolution

Predicted Acacia Biodiversity

[Figure not reproduced]

What Next?
Some extensions to methods covered (all possible in R):
• Weighted GLM (under-dispersion; e.g. Ridout & Besbeas 2004)
• Zero-inflated binomial GLM (0s ~2× 1s)
• GLS (e.g. RD/df ≥ 15, i.e. high heteroscedasticity)
• GLMM/GAMM (mixed models for spatial/temporal bias)
• Non-linear regression (e.g. exponential; Crawley p148)
• Multi-model averaging (Burnham & Anderson 1998)
• Multivariate methods (>1 response variable)

Take Home Messages
i. There are several sound methods for distributions other than normal
ii. Diagnostic checks (plots and tests) are vital but not included in some statistical software!
iii. Don’t be afraid to try non-linear methods, but beware of over-fitting

Homework
i. Reading as shown on slides
ii. Assignment data analysis
iii. Complete all practical exercises
iv. Add requests for the two tutorials onto the Stats Forum

Some Additional Slides For Interest


Poisson Error Family

Features of a Poisson GLM
• Predictions must be positive: the log link function ensures this (unlike linear regression)
• Responses are counts (non-negative integers): the Poisson error function models this (unlike linear regression)

Generalized Linear Models

Simple univariate example:
• Crawley (2005) expected cancer patients clusters.txt

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.186865   0.188728   0.990   0.3221
Distance    -0.006138   0.003667  -1.674   0.0941 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 149.48 on 93 degrees of freedom
Residual deviance: 146.64 on 92 degrees of freedom
AIC: 262.41

Proportion deviance explained = 1 – RD/ND = 1 – 146.64/149.48 = 0.019 ≈ 2% (i.e. Distance explains very little)

Generalized Linear Models

Simple univariate example:
• Diagnostics: plot(modelname)
• This example has a weak funnel shape suggesting that transformation or another method may be required

[Figure: diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage) with outlying points IT3265, BW2832 and LU3988 labelled]

Generalized Linear Models

Multiple GLM: (same as MLR)
• Tests each predictor, weighting by intercorrelation:
glm(y ~ x1 + x2 + x3, family = xxx)
• Interactions:
glm(y ~ x1*x2*x3, family = xxx)
glm(y ~ x1 + x2 + x3 + x1:x2:x3, family = xxx)
• More parameters lead to better fit but less explanatory power (as for MLR)
→ minimum adequate model (see last lecture)

Multiple GLM
Model Reduction: (same as MLR)
• Before running the first model, deal with intercorrelations:
1) Correlation between predictors: Pearson ≥ 0.7
2) Variance Inflation Factors (VIF) (code in prac 5):
Tolerance = 1 – r² [for xi vs. all other predictors]
VIF = 1/Tolerance [VIF > 5 suggests collinearity]
• Remove the intercorrelated predictors least correlated with the response (unless important)
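
A minimal sketch of the VIF calculation for one predictor, assuming hypothetical predictors x1-x3 in a data frame dat:

r2 <- summary(lm(x1 ~ x2 + x3, data = dat))$r.squared   # x1 vs all other predictors
1 / (1 - r2)                                            # VIF; > 5 suggests collinearity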

Multiple GLM
Model Reduction: (same as MLR)
• Next run the full model and remove any non-significant interactions (3-way > 2-way …)
• Then stepwise selection using AIC (as last prac):
- Related to deviance (but penalised for lack of parsimony): -2 × log-likelihood + 2 × (parameters + 1)
- Or BIC: -2 × log-likelihood + ln(n) × (parameters + 1)
(Bayes Information Criterion penalises parameters even more)

Quasi-likelihood in GLMs
Example of overdispersion:

model <- glm(Cancers ~ Distance, family = quasipoisson)

Coefficients: (note the decreased power)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.186865   0.235364   0.794    0.429
Distance    -0.006138   0.004573  -1.342    0.183

(Dispersion parameter for quasipoisson family taken to be 1.555271)

Null deviance: 149.48 on 93 degrees of freedom
Residual deviance: 146.64 on 92 degrees of freedom
AIC: NA

→ anova(simple, complex, test = "F")

Non-straight Models
Clues for non-linearity
• Diagnostics example (see practical)
• Significance remains (despite curvature)

[Figure: diagnostic plots with outlying points PSP1, PSP9 and PSP12 labelled]

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.352888   1.539830   9.971 2.86e-08 ***
elevation   -0.002892   0.001227  -2.358   0.0314 *

Non-linear Models
Non-linear (least squares) regression:
• Polynomial regression essentially transforms the data, but if we can’t transform…
• Works in the same way as ordinary least squares regression, but now we have to tell R the equation:
nls(y ~ a - b*exp(-c*x)) [e.g. Crawley 2005]
anova(nonlinear, linear)
• Example: exponential (Crawley 2005 jaws.txt shows how nls() helps if a quadratic gives an erroneous hump)
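
A sketch showing the starting values nls() needs in practice; the values here are hypothetical guesses of the kind you would choose by eye from a plot:

m <- nls(y ~ a - b * exp(-c * x),
         start = list(a = 100, b = 80, c = 0.5))   # guessed starting values
summary(m)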

Non-straight Linear Models

General notes
• Alternative code for plotting the line of a model:
y_plot <- predict(xy, list(x = x_plot))
lines(x_plot, y_plot)
• If unsure, always try a curve to be sure that the linear method is correct
• Use analysis of deviance to test line vs curve:
anova(nonlinear, linear)

Non-linear Models
Degrees of Freedom in GAMs:
• Df are not integers (“effective df”)
• Try various dfs using package gam - use s(variable, x.xx)

Non-linear Models
GAM with binary data
• Output is the probability of success (i.e. 1 rather than 0):

model <- gam(sp ~ ., family = binomial)

• Example: species distribution modelling (e.g. Guisan & Thuiller 2005)…

Non-linear Models
Multiple GAM variables
• Smoother determined from points either side, so increased error at extremes
• Interactions: s(x1, x2)
- More complicated than lm/glm
- E.g. Crawley 2005 contour plot ozone.data.txt, p617