Exercise 4.38
[Figure: normal Q-Q plot of the data, Sample Quantiles against Theoretical Quantiles]
The data is clearly not Gaussian, and one approach is to use a data transformation. We start with an exploratory approach using the Box-Cox transformation, $y^{(\lambda)} = (y^{\lambda} - 1)/\lambda$.
## Exploratory
par(mfrow=c(2,2))
qqnorm(data)
qqnorm((sqrt(data)-1)/0.5) ## lambda=0.5
qqnorm(log(data)) ## lambda=0
qqnorm((1/sqrt(data)-1)/(-0.5)) ## lambda=-0.5
[Figure: 2x2 panel of normal Q-Q plots (Sample Quantiles against Theoretical Quantiles) for the untransformed data and for lambda = 0.5, 0 and -0.5]
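The four transformations above are all instances of one formula, with the log as the limiting case at lambda = 0. A minimal sketch of this, assuming a hypothetical helper `boxcox.tr` and simulated stand-in data (neither is from the exercise), could look like the following. The correlation of the Q-Q points is used here only as a rough numerical measure of how straight each Q-Q plot is.

```r
## Hypothetical helper; the name boxcox.tr is an assumption, not from the source
boxcox.tr <- function(y, lambda) {
  ## lambda = 0 is the limiting case, which is the log transformation
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

## Simulated positive, right-skewed stand-in data (not the exercise data)
set.seed(1)
y.sim <- rexp(50, rate = 1 / 30)

## Correlation of the Q-Q points for each lambda; plot.it = FALSE returns
## the coordinates without drawing anything
lambdas <- c(1, 0.5, 0, -0.5)
qq.cor <- sapply(lambdas, function(lam) {
  q <- qqnorm(boxcox.tr(y.sim, lam), plot.it = FALSE)
  cor(q$x, q$y)
})
names(qq.cor) <- lambdas
qq.cor
```

A value closer to 1 suggests a straighter Q-Q plot, but this is only a screening device and does not replace looking at the plots.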
[Figure: profile log-likelihood (log-Likelihood axis) over the range -2 to 2, with the 95% level marked]
where $p$ is the density (here the normal density) and the last term comes from the change of variables, i.e.

$$f(y) = p(y^{(\lambda)}) \left| \frac{\partial y^{(\lambda)}}{\partial y} \right| = p(y^{(\lambda)})\, y^{\lambda - 1} \tag{2}$$

Taking the log and summing over all observations gives the log-likelihood. For each fixed $\lambda$ we can find the optimal values of $\mu$ and $\sigma^2$ by optimising the likelihood with respect to $\mu$ and $\sigma^2$. These are given by

$$\hat{\mu}(\lambda) = \frac{1}{n} \sum_i y_i^{(\lambda)} \tag{3}$$

$$\hat{\sigma}^2(\lambda) = \frac{1}{n} \sum_i \left( y_i^{(\lambda)} - \hat{\mu}(\lambda) \right)^2 \tag{4}$$

Hence we do not need the usual inner optimisation, and the Box-Cox profile likelihood can be implemented by
## profile likelihood
ll <- sapply(lambda, ll.lambda, y=data)
## Plot
plot(lambda, ll-max(ll),type="l",ylim=c(-5,0))
lines(range(lambda), -qchisq(0.95,df=1)/2 * c(1,1),lty=2)
[Figure: ll − max(ll) against lambda, with the 95% cutoff drawn as a dashed line]
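The definition of `ll.lambda` is not shown in this section. A minimal sketch of what such a function might look like, assuming it plugs the closed-form estimates (3) and (4) into the normal log-likelihood and adds the Jacobian term from (2), is given below. The name `ll.lambda.sketch`, the grid and the simulated log-normal data are assumptions for illustration only.

```r
## Hypothetical stand-in for ll.lambda; same argument order as in the sapply call
ll.lambda.sketch <- function(lambda, y) {
  n <- length(y)
  y.l <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  sigma2 <- mean((y.l - mean(y.l))^2)          # eq. (4); mean(y.l) is eq. (3)
  ## normal log-likelihood at (mu.hat, sigma2.hat) plus the log-Jacobian
  -n / 2 * log(2 * pi * sigma2) - n / 2 + (lambda - 1) * sum(log(y))
}

## Simulated log-normal data, so a lambda near 0 is expected (not the exercise data)
set.seed(1)
y.sim <- exp(rnorm(60, mean = 3, sd = 0.5))

lambda.grid <- seq(-2, 2, by = 0.01)
ll.sim <- sapply(lambda.grid, ll.lambda.sketch, y = y.sim)

lambda.hat <- lambda.grid[which.max(ll.sim)]
## approximate 95% likelihood interval: grid points above the chi-square cutoff
ci.lambda <- range(lambda.grid[ll.sim - max(ll.sim) > -qchisq(0.95, df = 1) / 2])
c(lambda.hat, ci.lambda)
```

The interval read off the grid is only as fine as the grid spacing; a root finder such as `uniroot` could be used on each side if more precision is needed.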
Exercise 6.5
[Figure: scatter plot of y against x (dose)]
It is quite clear that a linear model in x (dose) will not work, but let's have a
look at the residuals anyway
plot(fitted(fit1),fit1$residuals)
[Figure: two panels for the linear fit: Normal Q-Q Plot of the residuals (Sample Quantiles), and fit1$residuals against the fitted values]
Even though the distributional assumption looks fine, we see quite clear patterns in the residuals vs. fitted values. We now implement the non-linear model proposed in the exercise, which can be formulated as

$$Y_i \sim N(\mu_i, \sigma^2) \tag{5}$$

with $\mu_i$ given by

$$\mu_i(x) = \frac{\beta_0}{1 + e^{-\beta_1 (x_i - \mu)}} \tag{6}$$

The mean value function is implemented by
nll <- function(theta,data){
m <- mu.i(theta[1],theta[2],theta[3],data$x)
- sum(dnorm(data$y, mean = m, sd = sqrt(theta[4]),
log = TRUE))
}
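The function `nll` above relies on `mu.i`, whose definition is not shown here, and the fitted object `opt` is used later without its creating call. A minimal sketch of both, assuming `mu.i` takes its arguments in the order (beta0, beta1, mu, x) as in the call inside `nll`, and assuming `opt` comes from `nlminb`, might look like this. The simulated data, true values and starting values are illustrative assumptions, not the exercise data.

```r
## Hypothetical mean function, eq. (6); argument order matches the call in nll
mu.i <- function(beta0, beta1, mu, x) {
  beta0 / (1 + exp(-beta1 * (x - mu)))
}

## Negative log-likelihood as in the text; theta = (beta0, beta1, mu, sigma^2)
nll <- function(theta, data) {
  m <- mu.i(theta[1], theta[2], theta[3], data$x)
  -sum(dnorm(data$y, mean = m, sd = sqrt(theta[4]), log = TRUE))
}

## Simulated stand-in data (not the exercise data)
set.seed(1)
sim <- data.frame(x = seq(0, 100, by = 10))
sim$y <- mu.i(10, 0.05, 50, sim$x) + rnorm(nrow(sim), sd = 0.3)

## Starting values are rough guesses; the variance is kept away from 0
start <- c(max(sim$y), 0.1, 50, 1)
opt.sim <- nlminb(start, nll,
                  lower = c(-Inf, -Inf, -Inf, 1e-6), data = sim)
opt.sim$par        # estimates of beta0, beta1, mu, sigma^2
opt.sim$objective  # negative log-likelihood at the optimum
```

With the negative log-likelihood as objective, the AIC is `2 * opt.sim$objective + 2 * 4`, since four parameters are estimated.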
We can compare the two models by calculating the AIC for each
AIC(fit1)
## [1] 23.40353
2 * opt$objective + 2 * 4
## [1] 19.20323
library(numDeriv)
H <- hessian(nll, opt$par, data=data)
(se.theta <- sqrt(diag(solve(H)))) ## Standard errors
opt$par ## Parameters
We see that the 95% confidence intervals would not cover 0 for any of the parameters.
For the likelihood-based confidence interval of $\beta_1$ we need the (negative) profile
log-likelihood
nllp <- function(beta1,data){
fun.tmp <- function(theta,beta1,data){
nll(c(theta[1],beta1,theta[2:3]),data)
}
nlminb(c(max(data$y), 60, 1), fun.tmp,
lower = c(-Inf, -Inf, 0),
beta1 = beta1, data = data)$objective
}
[Figure: profile likelihood exp(−(nllprofile − opt$objective)) against beta1]
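The step from `nllp` to the plotted curve and an interval is not shown. A sketch of how it could be done is given below, assuming the profile is evaluated on a grid of beta1 values and the 95% interval is read off where the relative likelihood exceeds exp(-qchisq(0.95, 1)/2). The model functions are restated so the sketch runs on its own. The simulated data, the grid, the starting values and the small positive lower bound on the variance are assumptions.

```r
## Model pieces restated (mu.i is a hypothetical definition of eq. (6))
mu.i <- function(beta0, beta1, mu, x) beta0 / (1 + exp(-beta1 * (x - mu)))
nll <- function(theta, data) {
  m <- mu.i(theta[1], theta[2], theta[3], data$x)
  -sum(dnorm(data$y, mean = m, sd = sqrt(theta[4]), log = TRUE))
}
## Profile: minimise over (beta0, mu, sigma^2) for a fixed beta1
nllp <- function(beta1, data) {
  fun.tmp <- function(theta, beta1, data) {
    nll(c(theta[1], beta1, theta[2:3]), data)
  }
  nlminb(c(max(data$y), 50, 1), fun.tmp,
         lower = c(-Inf, -Inf, 1e-6),
         beta1 = beta1, data = data)$objective
}

## Simulated stand-in data (not the exercise data)
set.seed(1)
sim <- data.frame(x = seq(0, 100, by = 10))
sim$y <- mu.i(10, 0.05, 50, sim$x) + rnorm(nrow(sim), sd = 0.3)
opt.sim <- nlminb(c(max(sim$y), 0.1, 50, 1), nll,
                  lower = c(-Inf, -Inf, -Inf, 1e-6), data = sim)

## Relative profile likelihood on a grid and the approximate 95% interval
beta1.grid <- seq(0.02, 0.10, length.out = 41)
nllprofile <- sapply(beta1.grid, nllp, data = sim)
rel.lik <- exp(-(nllprofile - opt.sim$objective))
ci.beta1 <- range(beta1.grid[rel.lik > exp(-qchisq(0.95, df = 1) / 2)])
ci.beta1
```

If the interval touches either end of the grid, the grid should be widened before the interval is reported.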
x <- seq(0,150)
plot(x, mu.i(opt$par[1], opt$par[2], opt$par[3], x),
lty = 2, col = 2, type = "l")
points(data)
[Figure: fitted mean mu.i(opt$par[1], opt$par[2], opt$par[3], x) against x, with the data points overlaid]
par(mfrow=c(1,2))
res <- data$y-mu.i(opt$par[1],opt$par[2],opt$par[3],data$x)
qqnorm(res)
qqline(res)
plot(mu.i(opt$par[1],opt$par[2],opt$par[3],data$x),res)
[Figure: two panels for the non-linear fit: Normal Q-Q Plot of res (Sample Quantiles), and res against the fitted mean]
There are still some systematic effects that could be investigated further, but
the magnitude of the residuals is smaller in the non-linear model.
Exercise 6.14
[Figure: number of accidents (accident) against month]
From the data we see that there are fewer accidents in later years, and
there is a clear seasonal variation.
a)
Based on the plot above it is reasonable to try a model with year as a linear
effect, but month as a factor (i.e. a parameter for each month).
##
## Call:
## glm(formula = accident ~ year + factor(month), family = poisson,
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.93674 -0.49986 0.03137 0.65552 2.25070
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 269.25450 78.92569 3.411 0.000646 ***
## year -0.13474 0.04005 -3.365 0.000766 ***
## factor(month)2 -0.34484 0.14177 -2.432 0.014998 *
## factor(month)3 -0.21278 0.13654 -1.558 0.119138
## factor(month)4 -0.39304 0.14380 -2.733 0.006272 **
## factor(month)5 -0.31015 0.14035 -2.210 0.027110 *
## factor(month)6 -0.47000 0.14720 -3.193 0.001408 **
## factor(month)7 -0.23361 0.13733 -1.701 0.088921 .
## factor(month)8 -0.35667 0.14226 -2.507 0.012169 *
## factor(month)9 -0.14310 0.13397 -1.068 0.285460
## factor(month)10 0.19877 0.13515 1.471 0.141358
## factor(month)11 0.13935 0.13731 1.015 0.310183
## factor(month)12 0.18911 0.13549 1.396 0.162805
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 120.4 on 32 degrees of freedom
## Residual deviance: 40.9 on 20 degrees of freedom
## AIC: 242.72
##
## Number of Fisher Scoring iterations: 4
From the coefficients we see that the number of accidents decreases with
time (year), but also that there is considerable variation over the year,
with fewer accidents than in January from Feb-Aug and more accidents in
Oct-Dec.
b)
In order to predict the outcomes in the last 3 months of 1973 we need the
coefficients for Oct-Dec, the slope with respect to year, and the link function (log),
i.e.
coef(fit)
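With a log link the expected count is the exponential of the linear predictor. A minimal sketch of the calculation follows, using the coefficients as printed in the summary output and assuming that `year` is coded as the four-digit calendar year (the size of the intercept suggests this, but it is an assumption). Because the printed coefficients are rounded, the results are approximate.

```r
## Coefficients copied from the printed summary (rounded to 5 decimals)
b0      <- 269.25450                      # (Intercept), January baseline
b.year  <- -0.13474                       # slope with respect to year
b.month <- c(Oct = 0.19877, Nov = 0.13935, Dec = 0.18911)

## log link: expected count = exp(intercept + slope * year + month effect)
## year = 1973 assumes four-digit year coding
pred.1973 <- exp(b0 + b.year * 1973 + b.month)
pred.1973
```

With the fitted object at hand, `predict(fit, newdata = data.frame(year = 1973, month = 10:12), type = "response")` should give the same quantities without the rounding error, again assuming the same coding of `year` and `month`.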
c)
In order to assess the goodness of fit we need the expected count in each month and
year
e <- predict(fit,type="response")
## [1] 39.83656
and finally we can calculate the p-value relating to the goodness of fit by
## [1] 0.005238489
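The calls that produced the two numbers above are not shown. A sketch of one way to obtain such a statistic and p-value is given below, assuming the statistic is Pearson's chi-square compared with a chi-square distribution on the residual degrees of freedom. The simulated data set, including the years 1971 to 1973 and the Poisson mean of 35, is a stand-in with the same layout (33 months) and is not the exercise data.

```r
## Simulated stand-in: 3 years of monthly counts with Oct-Dec of the last year missing
set.seed(1)
sim <- expand.grid(month = 1:12, year = 1971:1973)[1:33, ]
sim$accident <- rpois(nrow(sim), lambda = 35)

fit.sim <- glm(accident ~ year + factor(month), family = poisson, data = sim)

## Expected counts and Pearson's goodness-of-fit statistic
e.sim <- predict(fit.sim, type = "response")
X2.sim <- sum((sim$accident - e.sim)^2 / e.sim)
p.sim <- pchisq(X2.sim, df = df.residual(fit.sim), lower.tail = FALSE)
c(X2 = X2.sim, df = df.residual(fit.sim), p = p.sim)

## Tail probability for the statistic printed in the text, on 20 residual df
p.text <- pchisq(39.83656, df = 20, lower.tail = FALSE)
p.text
```

A small p-value indicates more variation around the fitted means than a Poisson model would produce, which points to overdispersion or a missing term in the linear predictor.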