20/03/2023
Week 7-Bootstrap
This week we focus on simulation and bootstrap.
Reading material
Read chapter 20 of Jones et al. The Wikipedia page on bootstrapping contains some additional explanation of
the bootstrap method. We also recommend studying Problem 2 of the Week 6 problem set and its solution.
Problem 1
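In this problem we assess bias and variance of estimators as the number of observations in a sample
increases. First, we consider independent observations $X_1, \dots, X_n \sim N(\mu, \sigma^2)$. In class we have seen that
$$\sqrt{n}\left(\begin{pmatrix}\hat\mu\\ \hat\sigma\end{pmatrix}-\begin{pmatrix}\mu\\ \sigma\end{pmatrix}\right)\overset{asy}{\sim} N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2/2\end{pmatrix}\right).$$
This is an asymptotic result, so at best we can hope that it approximates the finite sample distribution of $\hat\theta = (\hat\mu, \hat\sigma)'$
well (that is, when n is fixed).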
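The covariance matrix in this result is the inverse of the Fisher information of a single observation. A short derivation, starting from the log-likelihood contribution of one observation and using $E[x_i-\mu]=0$ and $E[(x_i-\mu)^2]=\sigma^2$:

```latex
\begin{aligned}
\ell_i(\mu,\sigma) &= -\tfrac{1}{2}\log 2\pi - \log\sigma - \frac{(x_i-\mu)^2}{2\sigma^2},\\
\frac{\partial^2 \ell_i}{\partial \mu^2} &= -\frac{1}{\sigma^2}, \qquad
\frac{\partial^2 \ell_i}{\partial \mu\,\partial \sigma} = -\frac{2(x_i-\mu)}{\sigma^3}, \qquad
\frac{\partial^2 \ell_i}{\partial \sigma^2} = \frac{1}{\sigma^2} - \frac{3(x_i-\mu)^2}{\sigma^4},\\
I(\mu,\sigma) &= -E\left[\frac{\partial^2 \ell_i}{\partial\theta\,\partial\theta'}\right]
= \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix},
\qquad
I(\mu,\sigma)^{-1} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2/2 \end{pmatrix}.
\end{aligned}
```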
1. Write the appropriate log-likelihood function.
2. Use set.seed(45) and generate B = 10000 samples of n = 30 observations from a normal distribution
with µ = 1 and σ² = 2.25. For each sample, estimate the parameters µ and σ, and store those in a
matrix. Make a graph of the density of the estimated σ̂'s, and draw the density of a N(σ, σ²/(2n)) as
well. Is the empirical distribution well approximated by the asymptotic distribution?
3. Calculate $\frac{1}{n}\begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2/2\end{pmatrix}$, and compare it to the inverse of the negative of the Hessian in your last
simulated estimation.
4. Use set.seed(45) and generate B = 10000 samples of n = 3000 observations from a normal distribution
with µ = 1 and σ² = 2.25. For each sample, estimate the parameters µ and σ, and store those in a
matrix. Make a graph of the density of the estimated σ̂'s, and draw the density of a N(σ, σ²/(2n)) as
well. Is the empirical distribution well approximated by the asymptotic distribution in this case?
5. Calculate $\frac{1}{n}\begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2/2\end{pmatrix}$, and compare it to the inverse of the negative of the Hessian in your last
simulated estimation.
Solution
The log-likelihood function is
$$\ell(\mu,\sigma) = -\frac{n}{2}\log 2\pi - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.$$
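Setting the derivatives of ℓ to zero gives closed-form maximizers, µ̂ = x̄ and σ̂² = (1/n)Σ(x_i − x̄)², so the numerical optimization in the solution can be checked against them. The sketch assumes that loglik takes the parameter vector c(mu, sd) first and the data as z, matching how it is called in optim below; that form is an assumption.

```r
# Log-likelihood of an i.i.d. N(mu, sigma^2) sample.
# Assumed form of `loglik`: par = c(mu, sd), data passed as `z`.
loglik <- function(par, z) {
  sum(dnorm(z, mean = par[1], sd = par[2], log = TRUE))
}

set.seed(45)
x <- rnorm(30, mean = 1, sd = 1.5)
n <- length(x)

# Closed-form ML estimates (score equations set to zero)
mu.ml <- mean(x)
sigma.ml <- sqrt(mean((x - mu.ml)^2))  # divisor n, unlike sd(x)

# Numerical ML estimates, as in the solution
fit <- optim(c(mu = 0, sd = 1), loglik, control = list(fnscale = -1), z = x)

rbind(closed.form = c(mu = mu.ml, sd = sigma.ml), optim = fit$par)
# sd(x) uses divisor n - 1, so it exceeds the ML estimate by sqrt(n/(n-1))
sd(x) / sigma.ml
```

The two rows should agree up to optim's convergence tolerance. The gap between sd(x) and the ML estimate is worth keeping in mind in Problem 2, where s.hat is computed with sd().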
First we simulate samples of size 30. These are small samples, so the asymptotic distribution of (µ̂, σ̂)′ may not be a
good approximation to the finite sample distribution.
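rm(list=ls())
# log-likelihood of an i.i.d. N(mu, sd^2) sample; assumed form, with par = c(mu, sd) and the data in z
loglik <- function(par, z) sum(dnorm(z, mean=par[1], sd=par[2], log=TRUE))
set.seed(45)
n <- 30
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)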
## $par
## mu sd
## 1.058899 1.576261
##
## $value
## [1] -56.22442
##
## $counts
## function gradient
## 61 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
t0 <- Sys.time()
B <- 10000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
for (b in 1:B){
sample.b <- rnorm(n,mean=1,sd=1.5)
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
bootstrap.results[b,] <- c(m.b$par,m.b$convergence)
}
t1 <- Sys.time()
t1-t0
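The comparison asked for in question 2 can also be made numerically, by comparing the mean and standard deviation of the simulated σ̂'s with those of the asymptotic N(σ, σ²/(2n)) distribution. To keep it fast, this sketch computes the closed-form ML estimate of σ for each sample instead of calling optim; that shortcut is a choice made here, not the solution's code.

```r
# Sketch: simulated sampling distribution of sigma-hat (n = 30) versus the
# asymptotic N(sigma, sigma^2 / (2n)). Uses the closed-form ML estimate of
# sigma instead of optim() to keep the loop fast.
set.seed(45)
n <- 30
B <- 10000
sigma <- 1.5
sigma.hat <- replicate(B, {
  z <- rnorm(n, mean = 1, sd = sigma)
  sqrt(mean((z - mean(z))^2))
})

comparison <- rbind(
  simulated  = c(mean = mean(sigma.hat), sd = sd(sigma.hat)),
  asymptotic = c(mean = sigma, sd = sigma / sqrt(2 * n))
)
comparison

# Base-graphics version of the density plot
plot(density(sigma.hat), main = "n = 30", xlab = "sigma")
curve(dnorm(x, mean = sigma, sd = sigma / sqrt(2 * n)), add = TRUE, col = "blue")
```

The simulated mean lies somewhat below σ = 1.5 because the ML estimator of σ is biased downward in finite samples; the asymptotic distribution ignores this bias.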
[Figure: density of the simulated σ̂'s for n = 30, empirical versus normal approximation (legend: Empirical, Normal)]
## mu sd
## mu 6.832987e-02 1.739109e-06
## sd 1.739109e-06 3.414888e-02
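The code that produced the matrix above is not shown. One way to obtain the inverse of the negative Hessian at the ML estimate is stats::optimHess, sketched below under the assumption that loglik has the form loglik(par, z) with par = c(mu, sd); passing hessian = TRUE to optim is an alternative. For comparison, the analytic inverse of the negative Hessian evaluated at (µ̂, σ̂) is diag(σ̂²/n, σ̂²/(2n)).

```r
# Assumed form of `loglik`: par = c(mu, sd), data in z
loglik <- function(par, z) sum(dnorm(z, mean = par[1], sd = par[2], log = TRUE))

set.seed(45)
n <- 30
z <- rnorm(n, mean = 1, sd = 1.5)
fit <- optim(c(mu = 0, sd = 1), loglik, control = list(fnscale = -1), z = z)

# Numerical Hessian of the log-likelihood at the estimate, then invert -H
H <- optimHess(fit$par, loglik, z = z)
V.numeric <- solve(-H)

# Analytic counterpart: diag(sigma.hat^2 / n, sigma.hat^2 / (2n))
s <- fit$par[["sd"]]
V.analytic <- diag(c(s^2 / n, s^2 / (2 * n)))

V.numeric
V.analytic
```

Because both matrices use σ̂ from one sample rather than the true σ = 1.5, they can differ noticeably from 1.5²/n and 1.5²/(2n), as in the output above.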
1.5^2/30
## [1] 0.075
1.5^2/(2*30)
## [1] 0.0375
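The off-diagonal elements should be close to zero, and they are. The diagonal elements are of the same order as the theoretical values σ²/n = 0.075 and σ²/(2n) = 0.0375; they differ somewhat because the Hessian is evaluated at the estimates from a single simulated sample.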
Now we do the same simulation for a much larger sample size, n = 3000.
set.seed(45)
n <- 3000
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)
## $par
## mu sd
## 0.9689086 1.4935903
##
## $value
## [1] -5459.993
##
## $counts
## function gradient
## 67 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
B <- 10000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
for (b in 1:B){
sample.b <- rnorm(n,mean=1,sd=1.5)
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
bootstrap.results[b,] <- c(m.b$par,m.b$convergence)
}
library(ggplot2)
ggplot(data.frame(bootstrap.results),aes(x=sigma)) +
geom_line(aes(y = after_stat(density), colour = 'Empirical'), stat = 'density')+
stat_function(fun = dnorm, args=list(mean=1.5,sd=1.5/(sqrt(2*3000))),
aes(colour = 'Normal')) +
scale_colour_manual(name = 'Density', values = c('red', 'blue')) +
theme(legend.position = c(0.85, 0.85))+xlab("sigma")
[Figure: density of the simulated σ̂'s for n = 3000, empirical versus normal approximation (legend: Empirical, Normal)]
## mu sd
## mu 7.207965e-04 -3.018749e-08
## sd -3.018749e-08 3.603010e-04
1.5^2/3000
## [1] 0.00075
1.5^2/(2*3000)
## [1] 0.000375
The code below runs the same simulation with B = 100000 replications, using all but one of the cores of the computer (this is
parallel computing). This is not exam material; it only shows how to speed up computations.
library(doParallel)
## Loading required package: iterators
## Loading required package: parallel
n.cores <- detectCores()
set.seed(45)
n <- 3000
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)
## $par
## mu sd
## 0.9689086 1.4935903
##
## $value
## [1] -5459.993
##
## $counts
## function gradient
## 67 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
B <- 100000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
cl <- makePSOCKcluster(n.cores-1)
registerDoParallel(cl)
bootstrap.results <- foreach(b=1:B,.combine=rbind) %dopar% {
sample.b <- rnorm(n,mean=1,sd=1.5)
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
c(m.b$par,m.b$convergence)
}
stopCluster(cl)
On a MacBook with eight cores, Activity Monitor shows seven R worker processes (n.cores - 1) being started. In simple
simulation or bootstrap problems, parallel computation can reduce computation time noticeably. This
type of program can also be run on the university's multicore cluster, where it is easy to use 20
cores in one job.
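One caveat with the parallel code: set.seed(45) in the main session does not fix the random numbers drawn on the PSOCK workers, so the parallel results are not reproducible. The base parallel package provides clusterSetRNGStream, which gives every worker its own L'Ecuyer-CMRG stream. A minimal sketch, using parSapply from the same package instead of foreach; sim.one and run.sim are hypothetical helper names, and sim.one uses the closed-form ML estimates rather than optim.

```r
library(parallel)

# Hypothetical helper: closed-form ML estimates for one simulated sample
sim.one <- function(b, n) {
  z <- rnorm(n, mean = 1, sd = 1.5)
  c(mu = mean(z), sigma = sqrt(mean((z - mean(z))^2)))
}

run.sim <- function(B, n, n.workers = 2, seed = 45) {
  cl <- makePSOCKcluster(n.workers)
  on.exit(stopCluster(cl))
  # one independent, reproducible RNG stream per worker
  clusterSetRNGStream(cl, iseed = seed)
  # parSapply splits 1:B into fixed chunks, one per worker
  t(parSapply(cl, seq_len(B), sim.one, n = n))
}

r1 <- run.sim(B = 2000, n = 3000)
r2 <- run.sim(B = 2000, n = 3000)
identical(r1, r2)
```

The results are reproducible for a given seed and number of workers; changing n.workers changes how the replications are split over the streams, and therefore the draws.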
Problem 2
Use the setup of Problem 1. Program the methods yourself, do not use boot or some other preprogrammed
procedure.
1. Calculate a 95% confidence interval for σ using the asymptotic distribution of σ̂.
2. Calculate a 95% confidence interval for σ using the bootstrap percentile method.
3. Calculate a 95% confidence interval for σ using the bootstrap BCa method.
4. Calculate an exact 95% confidence interval for σ using the finite sample distribution of σ̂².
Solution
First, we generate a sample of n = 30 observations, which we use in the remainder of this answer. We also set
the other parameters of this problem.
set.seed(45)
n <- 30
m <- 1
sd <- 1.5
x <- rnorm(n,mean=m,sd=sd)
From the asymptotic distribution given in Problem 1, the asymptotic distribution of σ̂ is $\sqrt{n}(\hat\sigma - \sigma) \overset{asy}{\sim} N(0, \sigma^2/2)$, so as the confidence interval based on this distribution we take
$$\left[\hat\sigma - \hat\sigma\,\Phi^{-1}(0.975)/\sqrt{2n},\ \hat\sigma + \hat\sigma\,\Phi^{-1}(0.975)/\sqrt{2n}\right].$$
s.hat <- sd(x)
ci.asymptotic <- s.hat + qnorm(c(0.025,0.975))*sqrt(s.hat^2/(2*n))
According to the bootstrap percentile method, we read the appropriate quantiles of the distribution of σ̂
from the bootstrapped estimates. We choose B = 9999 replications.
B <- 9999
s.hat.star <- rep(NA,B)
for (b in 1:B){
x.b <- sample(x,n,replace=TRUE)
s.hat.star[b] <- sd(x.b)
}
ci.percentile <- quantile(s.hat.star,p=c(0.025,0.975))
In the BCa bootstrap, we do not take the 0.025 and 0.975 quantiles of the distribution of σ̂*, but quantiles that
adjust for bias and skewness. We need to calculate the constants a and b. First, we calculate $\hat b = \Phi^{-1}(\hat F^*(\hat\sigma))$, where $\hat F^*$ is the empirical distribution function of the bootstrap estimates σ̂*:
mean(s.hat.star<=s.hat)
## [1] 0.5678568
b <- qnorm(mean(s.hat.star<=s.hat))
b
## [1] 0.1709203
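In the second step we calculate the scaling (acceleration) parameter a. This requires the jackknife: we recompute the standard deviation
n times, each time leaving out one observation.
s.hat.loo <- rep(NA,n)
for (i in 1:n) s.hat.loo[i] <- sd(x[-i])
av.s.hat.loo <- mean(s.hat.loo)
dev.s.hat.loo <- av.s.hat.loo-s.hat.loo
a <- (1/6)*sum(dev.s.hat.loo^3)/((sum(dev.s.hat.loo^2))^(3/2))
With $z_1 = \Phi^{-1}(0.025)$ and $z_2 = \Phi^{-1}(0.975)$, the adjusted probabilities are $\beta_j = \Phi\left(\hat b + \frac{\hat b + z_j}{1 - \hat a(\hat b + z_j)}\right)$, and the BCa interval consists of the β1 and β2 quantiles of the bootstrap estimates.
z <- qnorm(c(0.025,0.975))
beta1 <- pnorm(b+(b+z[1])/(1-a*(b+z[1])))
beta2 <- pnorm(b+(b+z[2])/(1-a*(b+z[2])))
beta1
## [1] 0.102009
beta2
## [1] 0.9992164
ci.bca <- quantile(s.hat.star,p=c(beta1,beta2))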
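The BCa steps can be collected into a small function that works for any scalar statistic. The sketch below is a hypothetical helper (bca.ci is not part of the solution or of any package); it uses the bias constant b = Φ⁻¹(share of bootstrap estimates ≤ the estimate), the jackknife acceleration a, and the adjusted probabilities β = Φ(b + (b + z)/(1 − a(b + z))).

```r
# Hypothetical helper: BCa interval for a scalar statistic `stat` of a vector x
bca.ci <- function(x, stat, B = 9999, level = 0.95) {
  n <- length(x)
  theta.hat <- stat(x)
  theta.star <- replicate(B, stat(sample(x, n, replace = TRUE)))
  # bias correction: normal quantile of the share of bootstrap estimates <= theta.hat
  b <- qnorm(mean(theta.star <= theta.hat))
  # acceleration from the jackknife (leave-one-out) estimates
  theta.loo <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  d <- mean(theta.loo) - theta.loo
  a <- sum(d^3) / (6 * sum(d^2)^1.5)
  # adjusted probabilities replace (1 - level)/2 and (1 + level)/2
  z <- qnorm(c((1 - level) / 2, (1 + level) / 2))
  beta <- pnorm(b + (b + z) / (1 - a * (b + z)))
  quantile(theta.star, probs = beta, names = FALSE)
}

set.seed(45)
x <- rnorm(30, mean = 1, sd = 1.5)
bca.ci(x, sd)
```

Applied to the same sample with stat = sd, it should give an interval similar to the BCa interval in the solution, up to Monte Carlo variation; the upper endpoint uses an extreme quantile, so it is fairly noisy.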
Finally, we calculate the exact confidence interval. Since the observations are from a normal distribution, we
know that $(n-1)\hat\sigma^2/\sigma^2 \sim \chi^2(n-1)$ in samples of any size n, where $\hat\sigma^2$ is the sample variance with divisor n − 1 (that is, sd(x)^2). As a consequence,
$$\left[\sqrt{\frac{(n-1)\hat\sigma^2}{\chi^2_{0.975}(n-1)}},\ \sqrt{\frac{(n-1)\hat\sigma^2}{\chi^2_{0.025}(n-1)}}\right]$$
is an exact confidence interval, with $\chi^2_\alpha(n)$ denoting the α quantile of a χ²(n) distribution.
ci.exact <- c(sqrt(((n-1)*s.hat^2)/qchisq(0.975,df=n-1)),
sqrt(((n-1)*s.hat^2)/qchisq(0.025,df=n-1)))
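A small Monte Carlo experiment illustrates the difference between the exact interval and the asymptotic interval from part 1: with normal data the exact interval has coverage 0.95 for every n, while the asymptotic interval may deviate from 0.95 when n = 30. The sketch reuses n = 30 and σ = 1.5 from the problem; the number of replications R is an arbitrary choice.

```r
set.seed(45)
n <- 30
sigma <- 1.5
R <- 5000
covers <- replicate(R, {
  z <- rnorm(n, mean = 1, sd = sigma)
  s <- sd(z)
  exact <- sqrt((n - 1) * s^2 / qchisq(c(0.975, 0.025), df = n - 1))
  asym <- s + qnorm(c(0.025, 0.975)) * s / sqrt(2 * n)
  c(exact = exact[1] <= sigma && sigma <= exact[2],
    asymptotic = asym[1] <= sigma && sigma <= asym[2])
})
# share of simulated intervals that contain the true sigma
rowMeans(covers)
```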
Since we have the bootstrap estimates for σ, it is easy to give a basic bootstrap interval as well:
ci.basic <- 2*s.hat-quantile(s.hat.star,p=c(0.975,0.025))
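The basic interval treats the bootstrap deviation σ̂* − σ̂ as a stand-in for σ̂ − σ. Writing $q_\alpha$ for the α quantile of the bootstrap estimates σ̂*, the reasoning is:

```latex
\begin{aligned}
0.95 &\approx P\left(q_{0.025} - \hat\sigma \le \hat\sigma - \sigma \le q_{0.975} - \hat\sigma\right) \\
     &= P\left(2\hat\sigma - q_{0.975} \le \sigma \le 2\hat\sigma - q_{0.025}\right),
\end{aligned}
```

which is why the code subtracts the 0.975 quantile for the lower bound and the 0.025 quantile for the upper bound.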
We collect all intervals, and for comparison we run the boot function as well:
ci <- rbind(asymptotic=ci.asymptotic,percentile=ci.percentile,BCa=ci.bca,
basic=ci.basic,
exact=ci.exact)
ci
## 2.5% 97.5%
## asymptotic 1.1977310 2.009175
## percentile 0.9245941 2.192933
## BCa 1.1009476 2.566972
## basic 1.0139736 2.282312
## exact 1.2770012 2.155546
library(boot)
b.results <- boot(x,function(x,i){ sd(x[i])},R=20000)
boot.ci(b.results)
Problem 3
Twelve observations on the time (in hours) between failures of air-conditioning equipment have been collected:
3, 5, 7, 18, 43, 85, 91, 98, 100, 130, 230, and 487. Assume that the times between failures follow an exponential
distribution with parameter λ.
1. Obtain the ML estimate of λ and use the bootstrap to estimate the bias and the standard error of the
estimate.
2. Compute 95% confidence intervals for the mean time between failures 1/λ by the standard normal,
percentile, and BCa methods. Compare the intervals and explain why they may differ.
3. Use library(boot) and the functions boot and boot.ci to calculate these confidence intervals as well.
Solution
It is well known that the ML estimate of λ is
$$\hat\lambda = \frac{n}{\sum_i X_i}.$$
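The estimate follows from ℓ(λ) = n log λ − λ Σ_i x_i: the score n/λ − Σ_i x_i is zero at λ̂ = n/Σ_i x_i = 1/x̄, and the information n/λ² gives the standard error λ̂/√n. Computed directly:

```r
w.time <- c(3, 5, 7, 18, 43, 85, 91, 98, 100, 130, 230, 487)
n <- length(w.time)
lambda.hat <- n / sum(w.time)      # equals 1 / mean(w.time)
se.lambda <- lambda.hat / sqrt(n)  # square root of the inverse information lambda^2 / n
c(lambda.hat = lambda.hat, se = se.lambda)
```

These reproduce the estimate 0.009252120 and standard error 0.002670857 reported by fitdistr.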
library(MASS)
w.time <- c(3,5,7,18,43,85,91,98,100,130,230,487)
m <- fitdistr(w.time,"exponential")
m
## rate
## 0.009252120
## (0.002670857)
First, we estimate the bias and the standard error of λ̂ by a nonparametric bootstrap.
B <- 9999
lambda.B <- rep(NA,B)
n <- length(w.time)
for (b in 1:B){
b.sample <- sample(1:n,n,replace=TRUE)
lambda.B[b] <- 1/mean(w.time[b.sample])
}
## [1] 0.004208582
The bias is positive, so on average λ̂ overestimates λ. Note that the bias is large compared to the standard
error.
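The usual bootstrap estimates are bias = mean(λ̂*) − λ̂ and SE = sd(λ̂*). A self-contained sketch; the seed is arbitrary, so the numbers will not match the output above exactly.

```r
w.time <- c(3, 5, 7, 18, 43, 85, 91, 98, 100, 130, 230, 487)
n <- length(w.time)
lambda.hat <- 1 / mean(w.time)

set.seed(1)  # arbitrary seed
B <- 9999
lambda.B <- replicate(B, 1 / mean(sample(w.time, n, replace = TRUE)))

bias.B <- mean(lambda.B) - lambda.hat  # bootstrap estimate of the bias
se.B <- sd(lambda.B)                   # bootstrap standard error
c(bias = bias.B, se = se.B, bias.corrected = lambda.hat - bias.B)
```

A positive bias is expected here: 1/x̄ is a convex function of x̄ and the bootstrap means average to x̄, so by Jensen's inequality the average of the bootstrap values 1/x̄* is at least 1/x̄.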
ggplot(data=data.frame(lambda.B),aes(x=lambda.B)) +
stat_function(fun = dnorm, args=list(mean=m$estimate,sd=m$sd),
aes(colour = 'ML, normality')) +
xlab("lambda")+ylab("density")+
geom_density(aes(colour="Nonparametric bootstrap")) +
theme(legend.title=element_blank())
[Figure: density of λ̂, ML normal approximation versus nonparametric bootstrap]
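m <- mean(w.time)
m.star <- rep(NA,B)
for (b in 1:B){
b.sample <- sample(1:n,n,replace=TRUE)
m.star[b] <- mean(w.time[b.sample])
}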
sd.m.star <- sd(m.star)
interval.3 <- m + sd.m.star * qnorm(c(0.025,0.975))
interval.3
## 2.5% 97.5%
## 47.49583 192.83750
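For comparison with the standard normal interval, a self-contained sketch that computes it together with the percentile interval for θ = 1/λ from one set of bootstrap means (arbitrary seed):

```r
w.time <- c(3, 5, 7, 18, 43, 85, 91, 98, 100, 130, 230, 487)
n <- length(w.time)
theta.hat <- mean(w.time)

set.seed(1)  # arbitrary seed
B <- 9999
theta.star <- replicate(B, mean(sample(w.time, n, replace = TRUE)))

# symmetric interval: estimate +/- 1.96 bootstrap standard errors
normal.ci <- theta.hat + sd(theta.star) * qnorm(c(0.025, 0.975))
# percentile interval: quantiles of the bootstrap distribution
percentile.ci <- quantile(theta.star, probs = c(0.025, 0.975), names = FALSE)
rbind(normal = normal.ci, percentile = percentile.ci)
```

Because the bootstrap distribution of the mean is skewed to the right, the percentile interval typically extends further to the right than the symmetric normal interval.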
Finally, we suspect that θ̂ = 1/λ̂ does not have a symmetric distribution, because the data themselves are skewed to the right.
The probabilities used to calculate interval.4 (the BCa interval) therefore need to be adjusted. We need the constants a and b,
reflecting the scaling (acceleration) and the bias adjustment.
b <- qnorm(mean(m.star<m))
psi <- rep(0,n)
for (i in 1:n) psi[i] <- mean(w.time[-i])
psi.centered <- mean(psi)-psi
a <- sum(psi.centered^3)/(6*sum(psi.centered^2)^1.5)
z <- qnorm(c(0.025,0.975))
beta <- pnorm(b+(b+z)/(1-a*(b+z)))
interval.4 <- quantile(m.star,p=beta)
interval.4
## 6.37664% 99.52597%
## 57.33333 223.48742
Very useful bootstrap functions are implemented in the library boot:
library(boot)
bootstrap.example <- boot(w.time,R=9999,statistic=function(x,i){mean(x[i])})
# returns the mean and an estimate of its variance (used for the studentized interval)
mean.fun <- function(d, i) {
  m <- mean(d[i])
  n <- length(i)
  v <- (n-1)*var(d[i])/n^2
  c(m, v)
}
w.boot <- boot(w.time, mean.fun, R = 999)
boot.ci(w.boot, type = c("all"))
## Level Percentile BCa
## 95% ( 47.5, 199.2 ) ( 54.6, 227.3 )
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable