
Linear Statistical Models

Patient No.   Response to Drug A   Response to Drug B
     1               1.9                  0.7
     2               0.8                 -1.0
     3               1.1                 -0.2
     4               0.1                 -1.2
     5              -0.1                 -0.1
     6               4.4                  3.4
     7               4.6                  0.0
     8               1.6                  0.8
     9               5.5                  3.7
    10               3.4                  2.0
Fitting a straight line
Galapagos Islands Plant Species Richness
Island          Area (km²)   Species   log10(Area)   log10(Species)
Albemarle           5824.9       325         3.765            2.512
Charles              165.8       319         2.220            2.504
Chatham              505.1       306         2.703            2.486
James                525.8       224         2.721            2.350
Indefatigable       1007.5       193         3.003            2.286
Abingdon              51.8       119         1.714            2.076
Duncan                18.4       103         1.265            2.013
Narborough           634.6        80         2.803            1.903
Hood                  46.6        79         1.668            1.898
Seymour                2.6        52         0.415            1.716
Barrington            19.4        48         1.288            1.681
Gardner                0.5        48        -0.301            1.681
Bindloe              116.6        47         2.067            1.672
Jervis                 4.8        42         0.681            1.623
Tower                 11.4        22         1.057            1.342
Wenman                 4.7        14         0.672            1.146
Culpepper              2.3         7         0.362            0.845
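
The straight-line fit whose output follows can be reproduced from the Area and Species columns of the table. The sketch below is one way to do it; the data frame name `galapagos` and the object name `fit` are illustrative choices, while `Log.Area` and `Log.Species` are the variable names that appear in the lm() call in the output.

```r
# Area (km^2) and species counts, typed in from the table above
galapagos <- data.frame(
  Island  = c("Albemarle", "Charles", "Chatham", "James", "Indefatigable",
              "Abingdon", "Duncan", "Narborough", "Hood", "Seymour",
              "Barrington", "Gardner", "Bindloe", "Jervis", "Tower",
              "Wenman", "Culpepper"),
  Area    = c(5824.9, 165.8, 505.1, 525.8, 1007.5, 51.8, 18.4, 634.6, 46.6,
              2.6, 19.4, 0.5, 116.6, 4.8, 11.4, 4.7, 2.3),
  Species = c(325, 319, 306, 224, 193, 119, 103, 80, 79, 52, 48, 48, 47,
              42, 22, 14, 7)
)

# Work on the log10 scale, as in the last two columns of the table
Log.Area    <- log10(galapagos$Area)
Log.Species <- log10(galapagos$Species)

# Least-squares straight line: log10(Species) = a + b * log10(Area)
fit <- lm(Log.Species ~ Log.Area)

print(fit)            # call and coefficients
print(summary(fit))   # residual quantiles, standard errors, R-squared, F test
```

Because the logs are recomputed from the raw columns rather than taken from the rounded table entries, the last printed digit may differ slightly from the output shown.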
Call: lm(formula = Log.Species ~ Log.Area)
Coefficients:
(Intercept)     Log.Area
     1.3216       0.3298
Residuals:
     Min       1Q   Median       3Q      Max
-0.59575 -0.32767  0.02589  0.25759  0.45894

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.32157    0.14203   9.305 1.28e-07 ***
Log.Area     0.32976    0.07188   4.587 0.000356 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3208 on 15 degrees of freedom
Multiple R-squared:  0.5838,  Adjusted R-squared:  0.5561
F-statistic: 21.04 on 1 and 15 DF,  p-value: 0.0003558
Influence and the Hat Matrix
If the vector of response values is denoted by y and the vector of
fitted values by ŷ, then

ŷ = H y, where H is the hat matrix

Called the "hat" matrix because it puts a hat on y
(also called the influence matrix or projection matrix)

The formula for the vector of residuals is therefore:

r = y - ŷ = y - H y = (I - H) y
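
As a numerical check of the two identities above, the sketch below builds H explicitly and compares the results with what lm() reports. Two things here are assumptions not taken from the slides: the expression H = X (X'X)^-1 X', which is the standard least-squares form of the hat matrix with X the design matrix, and the use of R's built-in `cars` data set purely as an illustration.

```r
# Illustrative straight-line fit on the built-in 'cars' data (dist vs speed)
fit <- lm(dist ~ speed, data = cars)

X <- model.matrix(fit)   # design matrix: a column of 1s and the speed column
y <- cars$dist           # response vector

# Standard least-squares form of the hat matrix (not stated in the slides)
H <- X %*% solve(t(X) %*% X) %*% t(X)

# Fitted values and residuals obtained through H
y.hat <- drop(H %*% y)
r     <- drop((diag(nrow(X)) - H) %*% y)

# Both should agree with lm() up to floating-point error
print(all.equal(unname(y.hat), unname(fitted(fit))))
print(all.equal(unname(r), unname(resid(fit))))
```

Forming H explicitly is fine for a small data set like this one; for larger problems hatvalues() returns the diagonal without building the full n-by-n matrix.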
Moreover, the element in the ith row and jth column of
H is equal to the covariance between the jth response
value and the ith fitted value, divided by the variance
of the former:

h_ij = Cov(ŷ_i, y_j) / Var(y_j)

The diagonal elements h_ii of the hat matrix are the leverages.

The leverage of a data point measures the impact that y_i has on its
own fitted value ŷ_i.

The further x_i is from the mean of x, the greater the leverage h_ii,
and the more sensitive the fitted regression is to changes in y_i.
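
The link between leverage and distance from the mean of x can be checked numerically. For a straight-line fit with an intercept, the leverages have the closed form h_ii = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2); that formula is a standard result rather than something stated in the slides, and the built-in `cars` data set is used here only as an illustration.

```r
# Illustrative straight-line fit on the built-in 'cars' data
fit <- lm(dist ~ speed, data = cars)
x   <- cars$speed
n   <- length(x)

# Leverages as reported by R (the diagonal of the hat matrix)
h <- hatvalues(fit)

# Closed form for simple linear regression (standard result, assumed here)
h.formula <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

print(all.equal(unname(h), h.formula))

# Leverages sum to the number of fitted coefficients (intercept + slope)
print(sum(h))

# The highest-leverage point is the one whose x is furthest from mean(x)
print(x[which.max(h)])
print(mean(x))
```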
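# Standardized residuals paired with theoretical normal quantiles,
# ready for a normal quantile-quantile plot.
compute.sr.tq <- function(lm.obj)
{
  raw.res <- resid(lm.obj)
  N <- length(raw.res)
  h.ii <- influence(lm.obj)$hat    # leverages (diagonal of the hat matrix)

  # Residual mean square; df.residual() equals N - 2 for a straight-line fit
  mse.res <- sum(raw.res^2) / df.residual(lm.obj)
  std.s <- sqrt(mse.res)

  # Divide each residual by its estimated standard deviation, then sort
  stdd.res <- sort(raw.res / (std.s * sqrt(1 - h.ii)))

  # Theoretical quantiles at plotting positions (i - 0.5)/N
  index <- 1:N
  pn <- (index - 0.5) / N
  qn <- qnorm(pn)

  return(data.frame(TQ = qn, STD.Res = stdd.res))
}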
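
The quantities computed by hand in the function above have base-R counterparts, which gives a quick cross-check. The sketch below assumes a straight-line fit on the built-in `cars` data set (an illustrative choice) and uses rstandard() for the standardized residuals and ppoints(n, a = 0.5) for the plotting positions (i - 0.5)/n.

```r
# Illustrative straight-line fit on the built-in 'cars' data
fit <- lm(dist ~ speed, data = cars)
n   <- nobs(fit)

# Hand computation, following the same steps as the function above
s      <- sqrt(sum(resid(fit)^2) / df.residual(fit))
manual <- resid(fit) / (s * sqrt(1 - hatvalues(fit)))

# Built-in standardized residuals should match the hand computation
print(all.equal(manual, rstandard(fit)))

# ppoints(n, a = 0.5) returns (1:n - 0.5) / n
print(all.equal(ppoints(n, a = 0.5), (1:n - 0.5) / n))

# Same two columns the function returns
qq <- data.frame(TQ      = qnorm(ppoints(n, a = 0.5)),
                 STD.Res = sort(unname(rstandard(fit))))
print(head(qq))
```

Plotting `qq$STD.Res` against `qq$TQ` and adding `abline(0, 1)` gives the normal quantile-quantile plot; points that stay close to the line are consistent with normally distributed errors.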
Cook’s Distance
Cook's distance measures the effect of deleting a given
observation on the fitted regression. Data points with large
residuals (outliers) and/or high leverage may distort the outcome
and accuracy of a regression. Points with a large Cook's
distance are considered to merit closer examination
in the analysis.
There are different opinions regarding what cut-off
values to use for spotting highly influential points.
A simple operational guideline of D_i > 1 has been
suggested. Others have indicated that D_i > 4/n,
where n is the number of observations, might be used.
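
Both cut-offs are easy to apply with cooks.distance(). The sketch below again uses the built-in `cars` data set as an illustration. It also checks R's values against the usual closed form D_i = r_i^2 * h_ii / (p * (1 - h_ii)), where r_i is the standardized residual, h_ii the leverage and p the number of fitted coefficients; that formula is a standard result and is not given in the slides.

```r
# Illustrative straight-line fit on the built-in 'cars' data
fit <- lm(dist ~ speed, data = cars)
n   <- nobs(fit)
p   <- length(coef(fit))   # number of fitted coefficients (2 here)
h   <- hatvalues(fit)

# Cook's distance as reported by R
D <- cooks.distance(fit)

# Closed form from standardized residuals and leverages (assumed formula)
D.manual <- rstandard(fit)^2 * h / (p * (1 - h))
print(all.equal(D.manual, D))

# Observations flagged by each guideline
flag.1   <- which(D > 1)       # stricter guideline
flag.4n  <- which(D > 4 / n)   # looser guideline, depends on sample size
print(flag.1)
print(flag.4n)
```

Since 4/n is below 1 whenever n > 4, every point flagged by D_i > 1 is also flagged by D_i > 4/n; the second rule simply casts a wider net.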
