
02418 Week 2, solution

Exercise 3.2

a) $X_i \sim U(0, \theta)$, i.e.

$$p_\theta(x_1, \dots, x_n) = \frac{1}{\theta^n}\prod_{i=1}^n I(x_i > 0) I(x_i < \theta) = \frac{1}{\theta^n} I(x_{(1)} > 0) I(x_{(n)} < \theta) = L(\theta)$$

Hence $t(x_1, \dots, x_n) = x_{(n)}$.
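A small numerical sketch in R makes the factorization concrete (made-up data; `lik.unif` is a hypothetical helper, not from the exercise). Evaluating the product of `dunif` densities directly, two samples with the same maximum give the same likelihood function, and both agree with $\theta^{-n} I(x_{(n)} < \theta)$.

```r
## Likelihood of theta for X_i ~ U(0, theta): product of 1/theta * I(0 < x_i < theta)
lik.unif <- function(theta, x) {
  sapply(theta, function(th) prod(dunif(x, min = 0, max = th)))
}

## Two made-up samples that share the same maximum x_(n) = 2.9
x1 <- c(0.2, 1.5, 0.7, 2.9)
x2 <- c(2.9, 0.1, 0.1, 0.1)

theta <- seq(0.5, 6, by = 0.5)
L1 <- lik.unif(theta, x1)
L2 <- lik.unif(theta, x2)

## Factorized form: theta^(-n) * I(x_(n) < theta)
L.suff <- theta^(-length(x1)) * (max(x1) < theta)

all.equal(L1, L2)      # same likelihood function for both samples
all.equal(L1, L.suff)  # agrees with the factorized form
```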

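c) $X_i \sim U(\theta_1, \theta_2)$, i.e.

$$p_\theta(x_1, \dots, x_n) = \frac{1}{(\theta_2 - \theta_1)^n}\prod_{i=1}^n I(x_i > \theta_1) I(x_i < \theta_2) = \frac{1}{(\theta_2 - \theta_1)^n} I(x_{(1)} > \theta_1) I(x_{(n)} < \theta_2) = L(\theta_1, \theta_2)$$

Hence $t(x_1, \dots, x_n) = [x_{(1)}, x_{(n)}]^T$.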
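The same kind of check for the two-parameter case, again a sketch with made-up data and hypothetical helper names: the likelihood computed from all observations coincides with the one computed from $x_{(1)}$ and $x_{(n)}$ alone, on a grid of $(\theta_1, \theta_2)$ values with $\theta_1 < \theta_2$.

```r
## Likelihood of (theta1, theta2) for X_i ~ U(theta1, theta2), evaluated from all data
lik.direct <- function(th1, th2, x) prod(dunif(x, min = th1, max = th2))

## The same likelihood written through t(x) = [x_(1), x_(n)] only
lik.suff <- function(th1, th2, x) {
  (th2 - th1)^(-length(x)) * (min(x) > th1) * (max(x) < th2)
}

x <- c(1.2, 3.4, 2.2, 1.9)   # made-up sample: x_(1) = 1.2, x_(n) = 3.4

## Grid with theta1 < theta2 (dunif requires min <= max)
grid <- expand.grid(th1 = c(0, 0.5, 1, 1.5), th2 = c(3, 3.5, 4, 5))
direct <- mapply(lik.direct, grid$th1, grid$th2, MoreArgs = list(x = x))
suff <- mapply(lik.suff, grid$th1, grid$th2, MoreArgs = list(x = x))

all.equal(direct, suff)
```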

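f) $X_i = [X_{1,i}, X_{2,i}]^T$ uniform in the circle with radius $\theta$ and $E[X_i] = (0, 0)^T$, i.e.

$$p_\theta(x_1, \dots, x_n) = \frac{1}{(\pi\theta^2)^n}\prod_{i=1}^n I\left(\sqrt{x_{1,i}^2 + x_{2,i}^2} < \theta\right) = \frac{1}{(\pi\theta^2)^n}\prod_{i=1}^n I(r_i < \theta) = \frac{1}{(\pi\theta^2)^n} I(r_{(n)} < \theta) = L(\theta)$$

Hence $t(x_1, \dots, x_n) = r_{(n)}$, with $r_i = \sqrt{x_{1,i}^2 + x_{2,i}^2}$.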
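For the circle the only computation needed is the largest radius. A sketch with made-up points (rows of the hypothetical matrix `X`) computes $r_i$ and $r_{(n)}$ and compares the direct product of indicators with the factorized form:

```r
## Made-up sample of n = 4 points (rows) in the plane
X <- rbind(c(0.3, -0.4), c(-1.2, 0.5), c(0.9, 0.9), c(0, -0.2))

## Radii r_i = sqrt(x_{1,i}^2 + x_{2,i}^2) and the sufficient statistic r_(n)
r <- sqrt(rowSums(X^2))
r.max <- max(r)

## Likelihood as the product over points of 1/(pi theta^2) * I(r_i < theta)
lik.disc <- function(theta) prod((r < theta) / (pi * theta^2))

theta <- c(0.5, 1, 1.5, 2, 3)
direct <- sapply(theta, lik.disc)
suff <- (pi * theta^2)^(-nrow(X)) * (r.max < theta)

all.equal(direct, suff)
```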
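h) $X_i \sim \text{Exp}(\theta)$, $(E[X_i] = \theta)$, $f_\theta(x) = \frac{1}{\theta} e^{-x/\theta}$

$$p_\theta(x_1, \dots, x_n) = \prod_{i=1}^n \frac{1}{\theta} e^{-x_i/\theta} = \frac{1}{\theta^n} e^{-\frac{1}{\theta}\sum_i x_i} = L(\theta)$$

Hence $t(x_1, \dots, x_n) = \sum_i x_i = n\bar{x}$.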
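A sketch of the exponential case using `dexp` with rate $1/\theta$ (R parametrizes by the rate, while the exercise uses the mean $\theta$). The two made-up samples have the same sum and therefore the same log-likelihood, which matches $-n\log\theta - \sum_i x_i/\theta$:

```r
## Log-likelihood of theta for X_i ~ Exp with mean theta (rate 1/theta)
loglik.exp <- function(theta, x) sum(dexp(x, rate = 1 / theta, log = TRUE))

## Two made-up samples with the same sum (4) but different values
x1 <- c(0.5, 2.0, 1.5)
x2 <- c(3.0, 0.2, 0.8)

theta <- seq(0.5, 5, by = 0.5)
l1 <- sapply(theta, loglik.exp, x = x1)
l2 <- sapply(theta, loglik.exp, x = x2)

## Closed form: -n log(theta) - sum(x) / theta
l.closed <- -length(x1) * log(theta) - sum(x1) / theta

all.equal(l1, l2)
all.equal(l1, l.closed)
```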

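i) $X_i \sim G(\alpha, \lambda)$, $f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x}$

$$p_\theta(x_1, \dots, x_n) = \prod_{i=1}^n \frac{1}{\Gamma(\alpha)} \lambda^\alpha x_i^{\alpha-1} e^{-\lambda x_i} = \frac{\lambda^{n\alpha}}{\Gamma(\alpha)^n}\left(\prod_i x_i\right)^{\alpha-1} e^{-\lambda\sum_i x_i}$$

$\alpha$ known: $t(x_1, \dots, x_n) = \sum_i x_i$
$\lambda$ known: $t(x_1, \dots, x_n) = \prod_i x_i$
$\alpha, \lambda$ unknown: $t(x_1, \dots, x_n) = [\sum_i x_i, \prod_i x_i]^T$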
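When both parameters are unknown, both statistics are needed. The sketch below uses `dgamma` with `shape = alpha` and `rate = lambda` (the parametrization above) and three made-up samples: `x1` and `x2` have equal sum and equal product, while `x3` has the same sum but a different product.

```r
## Log-likelihood for X_i ~ G(alpha, lambda) with shape alpha and rate lambda
loglik.gamma <- function(alpha, lambda, x) {
  sum(dgamma(x, shape = alpha, rate = lambda, log = TRUE))
}

x1 <- c(1, 1, 6, 6)   # sum 14, product 36
x2 <- c(1, 2, 2, 9)   # sum 14, product 36
x3 <- c(2, 3, 4, 5)   # sum 14, product 120

grid <- expand.grid(alpha = c(0.5, 1, 2, 3), lambda = c(0.1, 0.5, 1))
l1 <- mapply(loglik.gamma, grid$alpha, grid$lambda, MoreArgs = list(x = x1))
l2 <- mapply(loglik.gamma, grid$alpha, grid$lambda, MoreArgs = list(x = x2))
l3 <- mapply(loglik.gamma, grid$alpha, grid$lambda, MoreArgs = list(x = x3))

all.equal(l1, l2)           # identical likelihood surfaces
isTRUE(all.equal(l1, l3))   # same sum alone is not enough
```

The third sample agrees with the first only on the rows with $\alpha = 1$, where the product term drops out of the likelihood.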

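j) $X_i \sim \text{Wei}(\alpha, \lambda)$, $f(x) = \alpha\lambda^\alpha x^{\alpha-1} e^{-(\lambda x)^\alpha}$

$$p_\theta(x_1, \dots, x_n) = \prod_{i=1}^n \alpha\lambda^\alpha x_i^{\alpha-1} e^{-(\lambda x_i)^\alpha} = \alpha^n \lambda^{n\alpha}\left(\prod_{i=1}^n x_i\right)^{\alpha-1} e^{-\lambda^\alpha \sum_i x_i^\alpha}$$

$\alpha$ known: $t(x_1, \dots, x_n) = \sum_i x_i^\alpha$
$\alpha$ unknown: $t(x_1, \dots, x_n) = [x_{(1)}, \dots, x_{(n)}]$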
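For the Weibull model with $\alpha$ known, the sketch below takes the hypothetical value $\alpha = 2$ and made-up data, and uses R's `dweibull` with `shape = alpha` and `scale = 1 / lambda`, which gives the same density as above. Two samples with equal $\sum_i x_i^\alpha$ have log-likelihoods in $\lambda$ that differ only by a constant, so they carry the same information about $\lambda$:

```r
alpha <- 2   # shape assumed known for this illustration

## Log-likelihood of lambda; dweibull(shape = alpha, scale = 1/lambda) equals
## alpha * lambda^alpha * x^(alpha - 1) * exp(-(lambda * x)^alpha)
loglik.wei <- function(lambda, x) {
  sum(dweibull(x, shape = alpha, scale = 1 / lambda, log = TRUE))
}

## Two made-up samples with the same sum(x^alpha) = 50
x1 <- c(1, 7)
x2 <- c(5, 5)

lambda <- seq(0.05, 1, by = 0.05)
d <- sapply(lambda, loglik.wei, x = x1) - sapply(lambda, loglik.wei, x = x2)

## The difference is constant in lambda: (alpha - 1) * log(prod(x1) / prod(x2))
expected <- rep((alpha - 1) * log(prod(x1) / prod(x2)), length(lambda))
all.equal(d, expected)
```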

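k) $X_i \sim \text{Beta}(\alpha, \beta)$, $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}$

$$\begin{aligned}
p_\theta(x_1, \dots, x_n) &= \prod_{i=1}^n \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x_i^{\alpha-1}(1-x_i)^{\beta-1} \\
&= \frac{\Gamma(\alpha+\beta)^n}{\Gamma(\alpha)^n\Gamma(\beta)^n}\prod_{i=1}^n x_i^{\alpha-1}(1-x_i)^{\beta-1} \\
&= \frac{\Gamma(\alpha+\beta)^n}{\Gamma(\alpha)^n\Gamma(\beta)^n}\left(\prod_{i=1}^n x_i\right)^{\alpha-1}\left(\prod_{i=1}^n (1-x_i)\right)^{\beta-1}
\end{aligned}$$

Both unknown: $t(x_1, \dots, x_n) = [\prod_{i=1}^n x_i, \prod_{i=1}^n (1-x_i)]^T$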
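For the Beta model, a sketch showing that the log-likelihood can be evaluated from $n$, $\prod_i x_i$ and $\prod_i (1 - x_i)$ alone, and agrees with summing `dbeta` log-densities (made-up data; `loglik.suff` is a hypothetical helper):

```r
## Log-likelihood for X_i ~ Beta(a, b), computed from the full sample
loglik.beta <- function(a, b, x) sum(dbeta(x, shape1 = a, shape2 = b, log = TRUE))

## The same log-likelihood computed from t(x) = [prod(x), prod(1 - x)] and n only
loglik.suff <- function(a, b, t1, t2, n) {
  n * (lgamma(a + b) - lgamma(a) - lgamma(b)) + (a - 1) * log(t1) + (b - 1) * log(t2)
}

x <- c(0.12, 0.45, 0.33, 0.81, 0.60)   # made-up sample in (0, 1)
t1 <- prod(x)
t2 <- prod(1 - x)

grid <- expand.grid(a = c(0.5, 1, 2, 4), b = c(0.5, 1, 3))
full <- mapply(loglik.beta, grid$a, grid$b, MoreArgs = list(x = x))
suff <- mapply(loglik.suff, grid$a, grid$b,
               MoreArgs = list(t1 = t1, t2 = t2, n = length(x)))

all.equal(full, suff)
```

For larger samples the products can underflow, so in practice one would store the equivalent statistics $\sum_i \log x_i$ and $\sum_i \log(1 - x_i)$ instead.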

Exercise 3.17

Referring to Example 3.10, the likelihood of $\theta = [\mu, \sigma^2]$ is

$$L(\theta) = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2} \tag{1}$$

and hence the log-likelihood (omitting additive constants) is

$$l(\theta) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2 \tag{2}$$

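In order to find the profile likelihood for $\sigma^2$ we need to find the optimal $\mu$ for each $\sigma^2$, i.e.

$$\begin{aligned}
0 &= l_\mu(\theta) && (3)\\
&= \frac{1}{\sigma^2}\sum_i (x_i - \mu) && (4)\\
&= \frac{1}{\sigma^2}(n\bar{x} - n\mu) && (5)
\end{aligned}$$

which is solved by $\mu = \bar{x}$. Hence the profile log-likelihood for $\sigma^2$ is

$$\begin{aligned}
l(\sigma^2) &= -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_i (x_i - \bar{x})^2 && (6)\\
&= -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2} n\hat{\sigma}^2 && (7)
\end{aligned}$$

with $\hat{\sigma}^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2$ being the MLE of $\sigma^2$.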
The profile likelihood can be implemented in R by

Lp <- function(sigma2, x){
  ## profile likelihood of sigma^2, i.e. eq. (7) on the natural scale
  n <- length(x)
  sigma.hat.sq <- sum((x - mean(x))^2) / n   # MLE of sigma^2
  exp(- sigma.hat.sq * n / (2 * sigma2)) * sigma2^(- n / 2)
}
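To check equation (7) numerically, a sketch (not part of the original solution) can maximise the log-likelihood (2) over $\mu$ with `optimize` for a few fixed values of $\sigma^2$ and compare with the closed form. The helper names `l.full`, `l.prof.num` and `l.prof` are hypothetical; the data are those used later in this exercise.

```r
x <- c(0.88, 1.07, 1.27, 1.54, 1.91, 2.27, 3.84, 4.5, 4.64, 9.41)

## Log-likelihood (2), additive constants omitted
l.full <- function(mu, sigma2, x) {
  -length(x) / 2 * log(sigma2) - sum((x - mu)^2) / (2 * sigma2)
}

## Profile log-likelihood by numerical inner maximisation over mu
l.prof.num <- function(sigma2, x) {
  optimize(l.full, interval = range(x), sigma2 = sigma2, x = x,
           maximum = TRUE, tol = 1e-10)$objective
}

## Closed form (7)
l.prof <- function(sigma2, x) {
  n <- length(x)
  sigma.hat.sq <- sum((x - mean(x))^2) / n
  -n / 2 * log(sigma2) - n * sigma.hat.sq / (2 * sigma2)
}

sigma2 <- c(1, 5, 10, 20)
num <- sapply(sigma2, l.prof.num, x = x)
closed <- l.prof(sigma2, x)
all.equal(num, closed)
```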

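For the second part we define

$$Z = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$

where $S^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2$, and since $Z \sim \chi^2_{n-1}$,

$$f_Z(z) = \frac{1}{2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)} z^{\frac{n-1}{2}-1} e^{-\frac{z}{2}}.$$

Of course we also have $S^2\frac{n-1}{\sigma^2} = Z$ and therefore

$$\begin{aligned}
f_{S^2}(s^2) &= f_Z(z)\left|\frac{\partial z}{\partial s^2}\right| \\
&= \frac{1}{2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)} z^{\frac{n-1}{2}-1} e^{-\frac{z}{2}}\,\frac{n-1}{\sigma^2} \\
&= \frac{1}{2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)} \left(\frac{(n-1)s^2}{\sigma^2}\right)^{\frac{n-1}{2}-1} e^{-\frac{(n-1)s^2}{2\sigma^2}}\,\frac{n-1}{\sigma^2} \\
&= \frac{n-1}{\sigma^2}\, f_Z\left(\frac{(n-1)s^2}{\sigma^2}\right) = L(\sigma^2)
\end{aligned}$$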

The second stage likelihood can be implemented in R by

Ls <- function(sigma2, x){
  ## second stage likelihood: density of S^2 = var(x) viewed as a function of sigma^2
  n <- length(x)
  s2 <- var(x)
  (n - 1) / sigma2 * dchisq((n - 1) * s2 / sigma2, df = n - 1)
}
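A quick numerical sanity check of the density derived above, as a sketch with hypothetical values $n = 10$, $\sigma^2 = 4$ and $s^2 = 2.5$ (not from the exercise): as a function of $s^2$ it should integrate to one and have mean $\sigma^2$, and as a likelihood in $\sigma^2$ it should peak at $\sigma^2 = s^2$. This is consistent with comparing `Ls` to the unbiased estimate `var(x)` in the script below.

```r
## Density of S^2 derived above: (n - 1) / sigma2 * f_Z((n - 1) s2 / sigma2)
f.S2 <- function(s2, sigma2, n) {
  (n - 1) / sigma2 * dchisq((n - 1) * s2 / sigma2, df = n - 1)
}

n <- 10        # hypothetical sample size
sigma2 <- 4    # hypothetical true variance

## As a density in s2: total mass 1 and mean sigma2 (S^2 is unbiased)
total <- integrate(f.S2, lower = 0, upper = Inf, sigma2 = sigma2, n = n)$value
mean.S2 <- integrate(function(s2) s2 * f.S2(s2, sigma2, n),
                     lower = 0, upper = Inf)$value

## As a likelihood in sigma2 (s2 fixed): maximised at sigma2 = s2
s2.obs <- 2.5  # hypothetical observed sample variance
sig2.max <- optimize(function(sig2) f.S2(s2.obs, sig2, n),
                     interval = c(0.01, 50), maximum = TRUE)$maximum

c(total = total, mean = mean.S2, argmax = sig2.max)
```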

The two likelihoods are compared by the script below.

## Data
x <- c(0.88,1.07,1.27,1.54,1.91,2.27,3.84,4.5,4.64,9.41)

## Plot profile likelihood (normalized to a maximum of 1)
sigma2 <- seq(0.01,30,by=0.01)
plot(sigma2,Lp(sigma2,x)/max(Lp(sigma2,x)), type="l")

## Plot second stage likelihood
lines(sigma2,Ls(sigma2,x)/max(Ls(sigma2,x)),col=2)

## Compare with the likelihood estimate of sigma^2 and the unbiased estimate
n <- length(x)
lines(var(x) * c(1, 1), c(0, 1), col = 2) ## Unbiased estimate
lines((n - 1)/ n * var(x) * c(1, 1), c(0, 1)) ## likelihood estimate
[Figure: Lp(sigma2, x)/max(Lp(sigma2, x)) and the normalized second stage likelihood against sigma2 (0 to 30), with vertical lines at the two estimates.]

Exercise 3.21

Likelihood

$$L(\theta) = p_\theta(x_1, \dots, x_n) = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2}$$

and

$$l(\theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_i (x_i - \mu)^2$$

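hence

$$\begin{aligned}
\frac{\partial l}{\partial \mu} &= \frac{1}{\sigma^2}\sum_i (x_i - \mu) \\
\frac{\partial l}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_i (x_i - \mu)^2
\end{aligned}$$

and the estimates are $\hat\mu = \bar{x}$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (x_i - \hat\mu)^2$. The second-order derivatives are

$$\begin{aligned}
\frac{\partial^2 l}{\partial \mu^2} &= -\frac{n}{\sigma^2} \\
\frac{\partial^2 l}{\partial (\sigma^2)^2} &= \frac{n}{2(\sigma^2)^2} - \frac{1}{(\sigma^2)^3}\sum_i (x_i - \mu)^2 \\
\frac{\partial^2 l}{\partial \mu\,\partial \sigma^2} &= -\frac{1}{(\sigma^2)^2}\sum_i (x_i - \mu)
\end{aligned}$$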

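Inserting the estimates we get

$$\begin{aligned}
\left.\frac{\partial^2 l}{\partial \mu^2}\right|_{\theta=\hat\theta} &= -\frac{n}{\hat\sigma^2} \\
\left.\frac{\partial^2 l}{\partial (\sigma^2)^2}\right|_{\theta=\hat\theta} &= \frac{n}{2\hat\sigma^4} - \frac{1}{\hat\sigma^6}\sum_i (x_i - \hat\mu)^2 = \frac{n}{2\hat\sigma^4} - \frac{1}{\hat\sigma^6}\, n\hat\sigma^2 = -\frac{n}{2\hat\sigma^4} \\
\left.\frac{\partial^2 l}{\partial \mu\,\partial \sigma^2}\right|_{\theta=\hat\theta} &= -\frac{1}{(\hat\sigma^2)^2}\sum_i (x_i - \hat\mu) = -\frac{1}{(\hat\sigma^2)^2}(n\bar{x} - n\bar{x}) = 0
\end{aligned}$$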

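Hence we have the observed Hessian

$$H(\hat\theta) = \begin{bmatrix} -\frac{n}{\hat\sigma^2} & 0 \\ 0 & -\frac{n}{2\hat\sigma^4} \end{bmatrix}$$

and the observed Fisher information matrix

$$I(\hat\theta) = -H(\hat\theta) = \begin{bmatrix} \frac{n}{\hat\sigma^2} & 0 \\ 0 & \frac{n}{2\hat\sigma^4} \end{bmatrix}$$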
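The observed information can also be checked without numDeriv. The sketch below is an illustration, not the method used later in this exercise: it applies a plain central finite-difference Hessian (the hypothetical helper `fd.hessian`) to the full log-likelihood at $\hat\theta$, using the data from Exercise 3.17. The result should be close to $\mathrm{diag}(n/\hat\sigma^2, n/(2\hat\sigma^4))$ with a vanishing off-diagonal element.

```r
## Data from Exercise 3.17
x <- c(0.88, 1.07, 1.27, 1.54, 1.91, 2.27, 3.84, 4.5, 4.64, 9.41)
n <- length(x)

## Full log-likelihood l(theta) with theta = c(mu, sigma2)
loglik <- function(theta, x) {
  mu <- theta[1]
  s2 <- theta[2]
  -length(x) / 2 * log(2 * pi) - length(x) / 2 * log(s2) -
    sum((x - mu)^2) / (2 * s2)
}

## Central finite-difference Hessian (hypothetical helper; step h in each coordinate)
fd.hessian <- function(f, theta, h = 1e-4, ...) {
  p <- length(theta)
  H <- matrix(0, p, p)
  for (i in 1:p) {
    for (j in 1:p) {
      ei <- rep(0, p); ei[i] <- h
      ej <- rep(0, p); ej[j] <- h
      H[i, j] <- (f(theta + ei + ej, ...) - f(theta + ei - ej, ...) -
                  f(theta - ei + ej, ...) + f(theta - ei - ej, ...)) / (4 * h^2)
    }
  }
  H
}

## MLE (mu.hat, sigma2.hat) and observed information -H(theta.hat)
theta.hat <- c(mean(x), sum((x - mean(x))^2) / n)
I.num <- -fd.hessian(loglik, theta.hat, x = x)
I.theory <- diag(c(n / theta.hat[2], n / (2 * theta.hat[2]^2)))

I.num
I.theory
```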

Referring to Example 3.10 and Exercise 3.17, the profile log-likelihoods of $\mu$ and $\sigma^2$ can be implemented as

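lp.mu <- function(mu, y){
  ## profile log-likelihood of mu (additive constants omitted):
  ## sigma^2 is replaced by its conditional MLE sum((y - mu)^2) / n
  n <- length(y)
  lp <- sum((y-mu)^2)^(-n/2)
  log(lp)
}

lp.sig <- function(sigma2, y){
  ## profile log-likelihood of sigma^2, eq. (7) in Exercise 3.17
  n <- length(y)
  sigma.hat.sq <- sum((y - mean(y))^2) / n
  - sigma.hat.sq * n/(2 * sigma2) + log(sigma2^(- n / 2))
}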

The theoretical values of the diagonal elements of the information matrix are calculated by

sigma.hat2 <- sum((x - mean(x))^2) / length(x)   # MLE of sigma^2
I11 <- length(x) / sigma.hat2                     # n / sigma.hat^2
I22 <- length(x) / (2 * sigma.hat2^2)             # n / (2 sigma.hat^4)

These can be compared with the values obtained by a numerical approximation of the curvature (using $\hat\sigma^2 = \frac{n-1}{n}s^2$):

library(numDeriv)
c(-hessian(lp.mu, mean(x), y = x), I11)

## [1] 1.622818 1.622818

c(-hessian(lp.sig,9/10 * var(x),y=x), I22)

## [1] 0.1316769 0.1316769

The numerical and theoretical values are the same.


In order to check the quadratic approximation we use that

$$l(\theta_i) - l(\hat\theta_i) \approx -\frac{I_{ii}}{2}(\theta_i - \hat\theta_i)^2 \tag{8}$$

where $\theta_1 = \mu$ and $\theta_2 = \sigma^2$. These are compared below in the log domain and in the natural domain.

par(mfrow=c(2,2))
## Profile log-likelihood of mu
mu <- seq(0,6,by=0.01)
plot(mu, sapply(mu, lp.mu, y = x) - lp.mu(mean(x), x),type="l",
     ylab="Profile log-likelihood")
## Quadratic approximation
lines(mu, - I11 / 2 * (mu - mean(x)) ^ 2, lty = 2)

## Profile likelihood of mu
plot(mu, exp(sapply(mu, lp.mu, y = x) - lp.mu(mean(x), x)), type = "l",
     ylab = "Profile likelihood")
## Quadratic approximation
lines(mu, exp(- I11 / 2 * (mu - mean(x)) ^ 2), lty = 2)

## Profile log-likelihood of sigma^2
sigma2 <- seq(0.0001, 30, length = 200)
plot(sigma2, sapply(sigma2, lp.sig, y = x) -
       lp.sig(9 / 10 * var(x), x),type="l",
     ylab="Profile log-likelihood",ylim=c(-10,0))
## Quadratic approximation
lines(sigma2, - I22 / 2 * (sigma2 - 9 / 10 * var(x)) ^ 2, lty = 2)

## Profile likelihood of sigma^2
plot(sigma2, exp(sapply(sigma2, lp.sig, y = x) -
       lp.sig(9 / 10 * var(x), x)),type="l",
     ylab="Profile likelihood")
## Quadratic approximation
lines(sigma2, exp(- I22 / 2 * (sigma2 - 9 / 10 * var(x)) ^ 2), lty = 2)

[Figure: 2 x 2 panels showing the profile log-likelihood and profile likelihood of mu (top, mu from 0 to 6) and of sigma2 (bottom, sigma2 from 0 to 30), each with its quadratic approximation (dashed).]

We see that the quadratic approximation is quite good for inference on $\mu$, but it is considerably off for $\sigma^2$.

Exercise 3.23

b)

We start by comparing $I_{22}$ and $(I^{22})^{-1}$. We can find $I^{22}$ (using the definitions on page 63) by

$$I^{22} = \frac{I_{11}}{\det(I)} = \frac{n}{\sigma^4}\,\frac{\sigma^4(1-\rho^2)^2}{n^2} = \frac{(1-\rho^2)^2}{n}, \qquad (I^{22})^{-1} = \frac{n}{(1-\rho^2)^2}$$

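hence

$$\frac{I_{22}}{(I^{22})^{-1}} = \frac{n(1+\rho^2)}{(1-\rho^2)^2}\,\frac{(1-\rho^2)^2}{n} = 1 + \rho^2 \geq 1$$

or $I_{22} \geq (I^{22})^{-1}$, with equality only for $\rho = 0$. Hence ignoring the off-diagonal terms in $I$, i.e. using $1/I_{22}$ instead of $I^{22}$ as the variance of $\hat\rho$, will underestimate the uncertainty.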
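The algebra for $I^{22}$ can be checked numerically by inverting the information matrix directly. The sketch assumes the part a) expressions $I_{11} = n/\sigma^4$, $I_{22} = n(1+\rho^2)/(1-\rho^2)^2$ and $I_{12} = -n\rho/(\sigma^2(1-\rho^2))$ (the same expressions compared with the numerical Hessian further down) and uses made-up values of $n$, $\sigma^2$ and $\rho$.

```r
## Hypothetical parameter values for the check
n <- 16
sigma2 <- 100
rho <- 0.7

## Information matrix for theta = (sigma^2, rho), filled column by column
I12 <- -n * rho / (sigma2 * (1 - rho^2))
I <- matrix(c(n / sigma2^2, I12,
              I12, n * (1 + rho^2) / (1 - rho^2)^2), nrow = 2)

I.inv <- solve(I)

## I^22 from the matrix inverse versus the closed form (1 - rho^2)^2 / n
c(I22.upper = I.inv[2, 2], formula = (1 - rho^2)^2 / n)

## Ratio I_22 / (I^22)^(-1) = I_22 * I^22, which should equal 1 + rho^2
c(ratio = I[2, 2] * I.inv[2, 2], one.plus.rho2 = 1 + rho^2)
```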
We now turn to the analysis of the actual data. We start by centering the data and plotting it

x <- c(109, 88, 96, 96, 109, 116, 114, 96, 85, 100, 113,
       117, 107, 104, 101, 81)
y <- c(116, 77, 95, 79, 113, 122, 109, 94, 91, 88, 115,
       119, 100, 115, 95, 90)

## Center the data
x <- x - mean(x)
y <- y - mean(y)
## Collect the centered observations in a 16 x 2 matrix (columns x and y)
y <- cbind(x, y)
plot(y, pch = 19)

[Figure: scatter plot of the centered data, y against x.]

It is quite clear that x and y are correlated.


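In order to find the likelihood of the parameters $\theta = [\sigma^2, \rho]$, we use the distribution of the centered observations

$$\begin{bmatrix} \tilde{x}_i \\ \tilde{y}_i \end{bmatrix} \sim N(0, \Sigma(\sigma^2, \rho)), \qquad \Sigma(\sigma^2, \rho) = \sigma^2\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \tag{9}$$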

The likelihood is implemented by

library(mvtnorm)

## negative log-likelihood
nll <- function(theta, y){
  ## set up parameters
  sigma2 <- theta[1]
  rho <- theta[2]
  Sigma <- matrix(1, ncol = 2, nrow = 2)
  Sigma[1, 2] <- Sigma[2, 1] <- rho
  Sigma <- sigma2 * Sigma
  ## negative log-likelihood
  - sum(dmvnorm(y, sigma = Sigma, log=TRUE))
}

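Different optimizers can be used for the likelihood; here we use nlminb, which minimizes the negative log-likelihood subject to the bounds $\sigma^2 \geq 0$ and $-1 \leq \rho \leq 1$.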

## Initialize parameters
theta0 <- c(1,0)

## Find optimal parameters and the Hessian
library(numDeriv)  # provides hessian()
opt <- nlminb(theta0, nll,
              lower = c(0, -1),
              upper = c(Inf, 1), y = y)
H <- hessian(nll,opt$par,y=y)

We can compare this with the information matrix given in part a) of the exercise:

n <- dim(y)[1]
sigma2.hat <- opt$par[1]
rho.hat <- opt$par[2]
H

## [,1] [,2]
## [1,] 0.0006517657 -0.226674
## [2,] -0.2266740216 202.072046

n / sigma2.hat ^ 2

## [1] 0.0006517674

n * (1 + rho.hat ^ 2) / (1 - rho.hat ^ 2) ^ 2

## [1] 202.0722

- n * rho.hat / (sigma2.hat * (1 - rho.hat^2))

## [1] -0.226674

The likelihood function can be visualized by a contour plot; it is natural to choose the contour levels such that the lines illustrate approximate confidence regions.

## Plot of likelihood
## Define the parameters where the likelihood
## should be calculated
sigma2 <- seq(1, 600)
rho <- seq(0, 0.99, by = 0.01)
ll <- matrix(nrow = length(sigma2), ncol = length(rho))

## Find the values of the negative log-likelihood
for(i in 1:length(sigma2)){
  for(j in 1:length(rho)){
    ll[i,j] <- nll(c(sigma2[i], rho[j]), y)
  }
}

## normalized log-likelihood (0 at the maximum)
Nll <- -ll+opt$objective

## Define the levels: -qchisq(1 - alpha, 2) / 2 bounds an
## approximate (1 - alpha) confidence region
alpha <- c(0.5,0.25,0.1,0.05,0.01)
contour(x = sigma2, y = rho,
        z = Nll,ylim=c(0,1),
        levels = -qchisq(1 - alpha, df = 2) / 2,
        labels = alpha,xlab="sigma2",ylab="rho")

[Figure: contour plot of the normalized log-likelihood, rho against sigma2, with contours labelled by alpha = 0.5, 0.25, 0.1, 0.05, 0.01.]
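For two parameters the contour levels have a simple closed form: the $\chi^2_2$ distribution has CDF $1 - e^{-q/2}$, so $-\chi^2_{2,1-\alpha}/2 = \log\alpha$. The contour labelled $\alpha$ is therefore the set where the normalized likelihood equals $\alpha$, and under the usual $\chi^2$ approximation of the likelihood ratio it bounds an approximate $(1-\alpha)$ confidence region. A short check in R:

```r
alpha <- c(0.5, 0.25, 0.1, 0.05, 0.01)

## Levels used for the contour plot of the normalized log-likelihood
levels.ll <- -qchisq(1 - alpha, df = 2) / 2

## For df = 2 these are log(alpha), i.e. the normalized likelihood equals alpha
all.equal(levels.ll, log(alpha))
all.equal(exp(levels.ll), alpha)
```

This is also why the part c) contour plot, which uses the levels `exp(-qchisq(1-alpha,df=2)/2)` on the likelihood scale and the default labels, shows the $\alpha$ values themselves.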

For the profile likelihood functions we need to set up functions with inner
optimisations

## Profile likelihood for sigma^2
## (returns the negative profile log-likelihood)
lp.sigma <- function(sigma2,y){
  ## Function for inner optimisation
  fun.tmp <- function(rho, sigma2, y){
    nll(c(sigma2, rho), y)
  }
  ## Find the profile likelihood for the given
  ## sigma^2
  optimise(fun.tmp, c(-0.99, 0.99),
           sigma2 = sigma2, y = y)$objective
}

## Profile likelihood for rho
## (returns the negative profile log-likelihood)
lp.rho <- function(rho,y){
  fun.tmp <- function(sigma2,rho,y){
    nll(c(sigma2,rho),y)
  }
  optimise(fun.tmp, c(0,1000), rho = rho, y = y)$objective
}

To compare the likelihood and the quadratic approximation we need the Hessian

## Find Hessian at the optimum (sigma^2)
opt.sig <- optimise(lp.sigma,c(0,1000),y=y)
H11 <- hessian(lp.sigma,opt.sig$minimum,y=y)

## Find Hessian at the optimum (rho)
opt.rho <- optimise(lp.rho,c(-1,1),y=y)
H22 <- hessian(lp.rho,opt.rho$minimum,y=y)
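For a locally quadratic negative log-likelihood with Hessian $H$, the curvature of the profile in $\theta_1$ is $1/(H^{-1})_{11} = H_{11} - H_{12}^2/H_{22}$, which is what `H11` estimates, whereas the quadratic approximations plotted next use the diagonal element `H[1,1]` of the full Hessian. Since $H_{12} \neq 0$ here the two differ, which connects to part b). The sketch below checks the identity on the quadratic form defined by the Hessian printed earlier (entries copied from that output), profiling out the second coordinate with `optimize`; it only illustrates the identity and does not replace `H11` and `H22`.

```r
## Hessian of the negative log-likelihood at the optimum, copied from the output above
A <- matrix(c(0.0006517657, -0.2266740216,
              -0.2266740216, 202.072046), nrow = 2)

## Quadratic approximation of the negative log-likelihood around the optimum
q <- function(d1, d2) 0.5 * (A[1, 1] * d1^2 + 2 * A[1, 2] * d1 * d2 + A[2, 2] * d2^2)

## Profile over the second coordinate: minimise over d2 for fixed d1
q.prof <- function(d1) {
  optimize(function(d2) q(d1, d2), interval = c(-10, 10), tol = 1e-12)$objective
}

## Curvature of the profile by a central second difference
h <- 10
curv.prof <- (q.prof(h) - 2 * q.prof(0) + q.prof(-h)) / h^2

c(profile = curv.prof,
  one.over.inv = 1 / solve(A)[1, 1],
  diagonal = A[1, 1])
```

The profile curvature matches $1/(H^{-1})_{11}$ and is smaller than $H_{11}$, so a quadratic approximation based on `H[1,1]` is narrower than one based on the profile curvature.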

The profile likelihoods and their quadratic approximations are plotted by

## Plot the profile likelihood and the quadratic
## approximation
par(mfrow=c(2,2))

## Profile likelihood of sigma^2
plot(sigma2, exp(-(sapply(sigma2, lp.sigma, y = y) -
       opt$objective)), type = "l", ylab = "Profile likelihood")
## Quadratic approximation
lines(sigma2, exp(-H[1,1] * (opt$par[1]-sigma2) ^ 2 / 2),
      lty=2)

## Profile log-likelihood of sigma^2
plot(sigma2, -(sapply(sigma2,lp.sigma, y = y) - opt$objective),
     type = "l", ylim = c(-7, 0), ylab = "Profile log-likelihood")
## Quadratic approximation
lines(sigma2,-H[1,1]*(opt$par[1]-sigma2)^2/2,lty=2)

## Profile likelihood of rho
plot(rho, exp( -(sapply(rho, lp.rho, y = y)-opt$objective)),
     type="l", ylab = "Profile likelihood")
lines(rho, exp( -H[2,2] * (opt$par[2] - rho)^2 / 2), lty = 2)

## Profile log-likelihood of rho
plot(rho, -(sapply(rho, lp.rho, y = y) - opt$objective), type = "l",
     ylim = c(-7, 0), ylab = "Profile log-likelihood")
lines(rho, -H[2,2] * (opt$par[2] - rho)^2 / 2, lty = 2)
[Figure: 2 x 2 panels showing the profile likelihood and profile log-likelihood of sigma2 (top, sigma2 from 0 to 600) and of rho (bottom, rho from 0 to 1), each with its quadratic approximation (dashed).]

c)

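Different transformations could be used; here we just test the transformation $\sigma^2 = e^{\theta_1}$ and $\rho = \frac{e^{2\theta_2} - 1}{1 + e^{2\theta_2}}$ (corresponding to Fisher's z-transform), with $\theta_1$ and $\theta_2$ called lsig and psi in the code. The implementation and the resulting plots are shown below.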

##################################################
## transformation: negative log-likelihood in (lsig, psi)
nll <- function(pars, y){
  sigma2 <- exp(pars[1])
  rho <- (exp(2*pars[2])-1)/(1+exp(2*pars[2]))
  Sigma <- matrix(ncol=2,nrow=2)
  Sigma[1,1]<-Sigma[2,2]<-1
  Sigma[1,2]<-Sigma[2,1]<-rho
  Sigma <- sigma2*Sigma
  -sum(dmvnorm(y,sigma = Sigma, log=TRUE))
}
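The map used for $\rho$ is the hyperbolic tangent, $\rho = \tanh(\theta_2)$, whose inverse $\theta_2 = \operatorname{artanh}(\rho) = \frac{1}{2}\log\frac{1+\rho}{1-\rho}$ is Fisher's z-transform. It maps the whole real line onto $(-1, 1)$, which is why the optimizer below can run without bounds. A small check with base R's `tanh` and `atanh` (the helper `rho.from.psi` is hypothetical):

```r
## Transformation used in nll above, written as a stand-alone function
rho.from.psi <- function(psi) (exp(2 * psi) - 1) / (1 + exp(2 * psi))

psi <- seq(-3, 3, by = 0.5)
rho <- rho.from.psi(psi)

all.equal(rho, tanh(psi))                          # same as tanh
all.equal(atanh(rho), psi)                         # Fisher's z-transform inverts it
all.equal(0.5 * log((1 + rho) / (1 - rho)), psi)   # explicit form of atanh
range(rho)                                         # stays inside (-1, 1)
```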

opt <- nlminb(c(1,0),nll, lower = c(-Inf,-Inf),upper=c(Inf,Inf),y=y)
H <- hessian(nll,opt$par,y=y)

## Ranges of +/- 4 standard errors, used as a guide for the plotting grid below
opt$par[1]+c(-1,1)*4*sqrt(solve(H)[1,1])

## [1] 3.773703 6.334705

opt$par[2]+c(-1,1)*4*sqrt(solve(H)[2,2])

## [1] 0.09805955 2.09805954

lsig <- seq(3.8,6.4,length=100)
psi <- seq(0,2.1,length=100)
ll <- matrix(nrow=length(lsig),ncol=length(psi))

## Negative log-likelihood on the grid
for(i in 1:length(lsig)){
  for(j in 1:length(psi)){
    ll[i,j] <- nll(c(lsig[i],psi[j]),y)
  }
}

## Contours of the normalized likelihood at exp(-qchisq(1 - alpha, 2) / 2)
alpha <- c(0.5,0.25,0.1,0.05,0.01)
contour(lsig,psi,exp(-ll+opt$objective),
        levels = exp(-qchisq(1-alpha,df=2)/2),xlab="lsig",ylab="psi")

[Figure: contour plot of the normalized likelihood, psi against lsig, with contours labelled 0.5, 0.25, 0.1, 0.05, 0.01.]

## Profile for lsig (returns the negative profile log-likelihood)
lp.sigma <- function(lsig,y){
  fun.tmp <- function(psi,lsig,y){
    nll(c(lsig,psi),y)
  }
  optimise(fun.tmp, c(-10,10), lsig = lsig, y = y)$objective
}

par(mfrow=c(2,2))
## lsig <- log(seq(30,800))

plot(lsig,exp(-(sapply(lsig,lp.sigma,y=y)-opt$objective)),type="l")
lines(lsig,exp(-H[1,1]*(opt$par[1]-lsig)^2/2),lty=2)

plot(lsig,-(sapply(lsig,lp.sigma,y=y)-opt$objective),type="l",ylim=c(-7,0))
lines(lsig,-H[1,1]*(opt$par[1]-lsig)^2/2,lty=2)

## Profile for psi (returns the negative profile log-likelihood)
lp.rho <- function(psi,y){
  fun.tmp <- function(lsig,psi,y){
    nll(c(lsig,psi),y)
  }
  ## lsig = log(sigma^2), so a moderate interval on the log scale suffices
  optimise(fun.tmp, c(-10,10), psi = psi, y = y)$objective
}

plot(psi,exp(-(sapply(psi,lp.rho,y=y)-opt$objective)),type="l")
lines(psi,exp(-H[2,2]*(opt$par[2]-psi)^2/2),lty=2)

plot(psi,(-(sapply(psi,lp.rho,y=y)-opt$objective)),type="l",ylim = c(-7,0))
lines(psi,-H[2,2]*(opt$par[2]-psi)^2/2,lty=2)
[Figure: 2 x 2 panels showing the profile likelihood and profile log-likelihood of lsig (top, lsig from about 3.8 to 6.4) and of psi (bottom, psi from 0 to 2.1), each with its quadratic approximation (dashed).]
