
Problem set 7

Numerical Methods for EOR

20/03/2023

Week 7 – Bootstrap
This week we focus on simulation and bootstrap.

Reading material
Read chapter 20 of Jones et al. The Wikipedia page on bootstrapping contains some additional explanation of
the bootstrap method. We also recommend studying problem 2 of the problem set of week 6, and its solution.

Problem 1
In this problem we assess bias and variance of estimators when the number of observations in a sample
increases. First, we consider independent $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$. In class we have seen

$$\sqrt{n}\left(\begin{pmatrix}\hat{\mu}\\ \hat{\sigma}\end{pmatrix}-\begin{pmatrix}\mu\\ \sigma\end{pmatrix}\right)\overset{asy}{\sim} N\left(0,\begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2/2\end{pmatrix}\right).$$

This is an asymptotic result, so at best we can hope that it approximates the finite sample distribution of $\hat{\theta} = (\hat{\mu}, \hat{\sigma})'$ well (that is, when $n$ is fixed).
1. Write down the appropriate log-likelihood function.
2. Use set.seed(45) and generate B = 10000 samples of n = 30 observations from a normal distribution
with µ = 1 and σ² = 2.25. For each sample, estimate the parameters µ and σ, and store those in a
matrix. Make a graph of the density of the estimated σ̂’s, and draw the density of a N(σ, σ²/(2n)) as
well. Is the empirical distribution well approximated by the asymptotic distribution?
3. Calculate $\frac{1}{n}\begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2/2\end{pmatrix}$, and compare it to the inverse of the negative of the Hessian in your last simulated estimation.
4. Use set.seed(45) and generate B = 10000 samples of n = 3000 observations from a normal distribution
with µ = 1 and σ² = 2.25. For each sample, estimate the parameters µ and σ, and store those in a
matrix. Make a graph of the density of the estimated σ̂’s, and draw the density of a N(σ, σ²/(2n)) as
well. Is the empirical distribution well approximated by the asymptotic distribution in this case?
5. Calculate $\frac{1}{n}\begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2/2\end{pmatrix}$, and compare it to the inverse of the negative of the Hessian in your last simulated estimation.

Solution
The log-likelihood function is

$$\ell(\mu, \sigma) = -\frac{n}{2}\log 2\pi - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$
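A direct R implementation of this expression (a sketch; it should give the same value as the dnorm-based loglik function used below):

loglik.direct <- function(p, z){
  # p[1] = mu, p[2] = sigma; codes the log-likelihood formula above directly
  n <- length(z)
  -n/2*log(2*pi) - n*log(p[2]) - sum((z - p[1])^2)/(2*p[2]^2)
}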

First we simulate samples of size 30; with such small samples the asymptotic distribution of (µ̂, σ̂)′ may not be a
good approximation to the finite sample distribution.
rm(list=ls())

loglik <- function(p,z){
  # p[1] = mu, p[2] = sigma; z is the vector of observations
  sum(dnorm(z,mean=p[1],sd=p[2],log=TRUE))
}

set.seed(45)
n <- 30
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)

## $par
## mu sd
## 1.058899 1.576261
##
## $value
## [1] -56.22442
##
## $counts
## function gradient
## 61 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
t0 <- Sys.time()
B <- 10000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
for (b in 1:B){
  sample.b <- rnorm(n,mean=1,sd=1.5)
  m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
  bootstrap.results[b,] <- c(m.b$par,m.b$convergence)
}
t1 <- Sys.time()
t1-t0

## Time difference of 2.876258 secs


library(ggplot2)
ggplot(data.frame(bootstrap.results),aes(x=sigma)) +
  geom_line(aes(y = ..density.., colour = 'Empirical'), stat = 'density') +
  stat_function(fun = dnorm, args=list(mean=1.5,sd=1.5/sqrt(2*30)),
                aes(colour = 'Normal')) +
  scale_colour_manual(name = 'Density', values = c('red', 'blue')) +
  theme(legend.position = c(0.85, 0.85)) + xlab("sigma")

[Figure: empirical density of the estimated σ̂’s and the asymptotic N(σ, σ²/(2n)) density, n = 30.]
In the graph we see that the asymptotic distribution lies a bit to the right of the actual finite sample distribution.
The mean of the finite sample distribution is a little smaller than the mean of the asymptotic distribution
(since (n/(n − 1))σ̂² is an unbiased estimator for σ², we have E σ̂² < σ² and, as a consequence, E σ̂ < σ). In the last iteration, the estimate of the
asymptotic variance (the inverse of the negative of the Hessian) and the true values σ²/n and σ²/(2n) are
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),hessian=TRUE,z=sample.b)
solve(-m.b$hessian)

## mu sd
## mu 6.832987e-02 1.739109e-06
## sd 1.739109e-06 3.414888e-02
1.5^2/30

## [1] 0.075
1.5^2/(2*30)

## [1] 0.0375
The off-diagonal elements should be close to zero, and they are.
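As a quick numerical check of the downward bias of σ̂ discussed above, we can compare the average of the simulated estimates to the true value σ = 1.5 (a sketch, using the bootstrap.results matrix filled above):

# average of the 10000 estimated sigmas; it should fall slightly below 1.5
mean(bootstrap.results[,"sigma"])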
Now we do the same simulation for a much larger sample size, n = 3000.
set.seed(45)
n <- 3000
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)

## $par
## mu sd
## 0.9689086 1.4935903
##
## $value
## [1] -5459.993
##
## $counts
## function gradient
## 67 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
B <- 10000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
for (b in 1:B){
  sample.b <- rnorm(n,mean=1,sd=1.5)
  m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
  bootstrap.results[b,] <- c(m.b$par,m.b$convergence)
}

library(ggplot2)
ggplot(data.frame(bootstrap.results),aes(x=sigma)) +
  geom_line(aes(y = ..density.., colour = 'Empirical'), stat = 'density') +
  stat_function(fun = dnorm, args=list(mean=1.5,sd=1.5/sqrt(2*3000)),
                aes(colour = 'Normal')) +
  scale_colour_manual(name = 'Density', values = c('red', 'blue')) +
  theme(legend.position = c(0.85, 0.85)) + xlab("sigma")

[Figure: empirical density of the estimated σ̂’s and the asymptotic N(σ, σ²/(2n)) density, n = 3000.]
The finite sample distribution and the asymptotic distribution are virtually identical. In the last iteration,
the estimate of the asymptotic variance (the inverse of the negative of the Hessian) and the true values σ²/n and σ²/(2n)
are
m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),hessian=TRUE,z=sample.b)
solve(-m.b$hessian)

## mu sd
## mu 7.207965e-04 -3.018749e-08
## sd -3.018749e-08 3.603010e-04
1.5^2/3000

## [1] 0.00075
1.5^2/(2*3000)

## [1] 0.000375
The code below does the same simulation, except that it uses all but one of the cores on the computer (this is parallel
computing). This is not exam material; it just shows you how to speed up computations.
library(doParallel)

## Loading required package: foreach


##
## Attaching package: 'foreach'
## The following objects are masked from 'package:purrr':
##
## accumulate, when

## Loading required package: iterators
## Loading required package: parallel
n.cores <- detectCores()
set.seed(45)
n <- 3000
x <- rnorm(n,mean=1,sd=1.5)
optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=x)

## $par
## mu sd
## 0.9689086 1.4935903
##
## $value
## [1] -5459.993
##
## $counts
## function gradient
## 67 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
B <- 100000
bootstrap.results <- matrix(NA,nrow=B,ncol=3)
colnames(bootstrap.results) <- c("mu","sigma","convergence")
cl <- makePSOCKcluster(n.cores-1)
registerDoParallel(cl)
bootstrap.results <- foreach(b=1:B,.combine=rbind) %dopar% {
  sample.b <- rnorm(n,mean=1,sd=1.5)
  m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
  c(m.b$par,m.b$convergence)
}
stopCluster(cl)

On a MacBook you can see in Activity Monitor how seven R processes are started. In simple simulation or
bootstrap problems, computation time may be reduced noticeably by using this parallel computation. Code of
this type can also be run on the multicore cluster of the university, where it is easy to use 20
cores in one job.
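A side note on reproducibility: set.seed on the master process does not control the workers' random number streams, so the %dopar% loop above is not reproducible as written. A minimal sketch of a fix, assuming the doRNG package is installed:

library(doRNG)
cl <- makePSOCKcluster(n.cores-1)
registerDoParallel(cl)
registerDoRNG(45)   # reproducible RNG streams across the workers
bootstrap.results <- foreach(b=1:B,.combine=rbind) %dopar% {
  sample.b <- rnorm(n,mean=1,sd=1.5)
  m.b <- optim(c(mu=0,sd=1),loglik,control=list(fnscale=-1),z=sample.b)
  c(m.b$par,m.b$convergence)
}
stopCluster(cl)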

Problem 2
Use the setup of Problem 1. Program the methods yourself; do not use boot or some other preprogrammed
procedure.
1. Calculate a 95% confidence interval for σ using the asymptotic distribution of σ̂.
2. Calculate a 95% confidence interval for σ using the bootstrap percentile method.
3. Calculate a 95% confidence interval for σ using the bootstrap BCa method.
4. Calculate an exact 95% confidence interval for σ using the finite sample distribution of σ̂².

Solution
First, we generate a sample of n = 30 observations, which we use in the remainder of this answer. We also set
the other parameters in this problem.
set.seed(45)
n <- 30
m <- 1
sd <- 1.5
x <- rnorm(n,mean=m,sd=sd)

From the asymptotic distribution given in Problem 1, we see that $\sqrt{n}(\hat{\sigma} - \sigma) \overset{asy}{\sim} N(0, \sigma^2/2)$, so the confidence interval based on this distribution is $[\hat{\sigma} - \hat{\sigma}\,\Phi^{-1}(0.975)/\sqrt{2n},\ \hat{\sigma} + \hat{\sigma}\,\Phi^{-1}(0.975)/\sqrt{2n}]$.
s.hat <- sd(x)
ci.asymptotic <- s.hat + qnorm(c(0.025,0.975))*sqrt(s.hat^2/(2*n))

According to the bootstrap percentile method, we read the appropriate quantiles of the distribution of σ̂
from the bootstrapped estimates. We choose B = 9999 replications.
B <- 9999
s.hat.star <- rep(NA,B)
for (b in 1:B){
  x.b <- sample(x,n,replace=TRUE)
  s.hat.star[b] <- sd(x.b)
}
ci.percentile <- quantile(s.hat.star,p=c(0.025,0.975))

In the BCa bootstrap, we do not take the 0.025 and 0.975 quantiles of the distribution of σ̂*, but quantiles that
adjust for bias and skewness. We need to calculate the constants a and b. First, we calculate $\hat{b} = \Phi^{-1}(\hat{F}^*(\hat{\sigma}))$:
mean(s.hat.star<=s.hat)

## [1] 0.5678568
b <- qnorm(mean(s.hat.star<=s.hat))
b

## [1] 0.1709203
In the second step we calculate the scaling parameter a, which requires calculating the standard deviation
leaving out one observation at a time.
s.hat.loo <- rep(NA,n)
for (i in 1:n) s.hat.loo[i] <- sd(x[-i])
av.s.hat.loo <- mean(s.hat.loo)
dev.s.hat.loo <- av.s.hat.loo-s.hat.loo
a <- (1/6)*sum(dev.s.hat.loo^3)/((sum(dev.s.hat.loo^2))^(3/2))

Now we can calculate the appropriate quantiles of the bootstrap distribution:


beta1 <- pnorm(b+(b+qnorm(0.025))/(1-a*(b+qnorm(0.025))))
beta2 <- pnorm(b+(b+qnorm(0.975))/(1-a*(b+qnorm(0.975))))
beta1

## [1] 0.102009
beta2

## [1] 0.9992164

ci.bca <- quantile(s.hat.star,p=c(beta1,beta2))

Finally, we calculate the exact confidence interval. Since the observations are from a normal distribution, we
know that $(n-1)\hat{\sigma}^2/\sigma^2 \sim \chi^2(n-1)$ in samples of any size $n$. As a consequence,

$$\left[\sqrt{\frac{(n-1)\hat{\sigma}^2}{\chi^2_{0.975}(n-1)}},\ \sqrt{\frac{(n-1)\hat{\sigma}^2}{\chi^2_{0.025}(n-1)}}\right]$$

is an exact confidence interval, with $\chi^2_{\alpha}(n)$ denoting the $\alpha$ quantile of a $\chi^2(n)$ distribution.
ci.exact <- c(sqrt(((n-1)*s.hat^2)/qchisq(0.975,df=n-1)),
              sqrt(((n-1)*s.hat^2)/qchisq(0.025,df=n-1)))

Since we have the bootstrap estimates for σ, it is easy to give a basic bootstrap interval as well, obtained as 2σ̂ minus the 0.975 and 0.025 quantiles of the bootstrap distribution:
ci.basic <- 2*s.hat-quantile(s.hat.star,p=c(0.975,0.025))

We collect all intervals, and for comparison we run the boot function as well:
ci <- rbind(asymptotic=ci.asymptotic,percentile=ci.percentile,BCa=ci.bca,
basic=ci.basic,
exact=ci.exact)
ci

## 2.5% 97.5%
## asymptotic 1.1977310 2.009175
## percentile 0.9245941 2.192933
## BCa 1.1009476 2.566972
## basic 1.0139736 2.282312
## exact 1.2770012 2.155546
library(boot)
b.results <- boot(x,function(x,i){ sd(x[i])},R=20000)
boot.ci(b.results)

## Warning in boot.ci(b.results): bootstrap variances needed for studentized
## intervals
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 20000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = b.results)
##
## Intervals :
## Level Normal Basic
## 95% ( 1.006, 2.323 ) ( 1.007, 2.278 )
##
## Level Percentile BCa
## 95% ( 0.929, 2.200 ) ( 1.096, 2.523 )
## Calculations and Intervals on Original Scale
The results correspond nicely.
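The warning about studentized intervals appears because boot.ci needs a variance estimate for each bootstrap replicate. A minimal sketch of how one could supply it, using the asymptotic variance σ̂²/(2n) per replicate (an assumption, not part of the problem set):

# return the statistic and an estimate of its variance, so that boot.ci
# can also construct the studentized interval
sd.fun <- function(x,i){
  s <- sd(x[i])
  c(s, s^2/(2*length(i)))
}
b.results2 <- boot(x, sd.fun, R=20000)
boot.ci(b.results2, type="stud")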

Problem 3
Twelve observations on the time (in hours) between failures of air-conditioning equipment have been collected:
3, 5, 7, 18, 43, 85, 91, 98, 100, 130, 230, and 487. Assume that the times between failures follow an exponential
distribution with parameter λ.
1. Obtain the ML estimate of λ and use the bootstrap to estimate the bias and the standard error of the
estimate.
2. Compute 95% confidence intervals for the mean time between failures 1/λ by the standard normal,
percentile, and BCa method. Compare the intervals and explain why they may differ.
3. Use library(boot) and the functions boot and boot.ci to calculate these confidence intervals as well.

Solution
It is well known that the ML estimate of λ is

$$\hat{\lambda} = \frac{n}{\sum_i X_i}.$$

library(MASS)

##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
w.time <- c(3,5,7,18,43,85,91,98,100,130,230,487)
m <- fitdistr(w.time,"exponential")
m

## rate
## 0.009252120
## (0.002670857)
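As a sanity check, the closed-form ML estimate can also be computed directly; it should match the rate reported by fitdistr:

# lambda.hat = n/sum(x) = 1/mean(x)
1/mean(w.time)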
First, we estimate the bias and the standard error of λ̂ by a nonparametric bootstrap.
B <- 9999
lambda.B <- rep(NA,B)
n <- length(w.time)
for (b in 1:B){
  b.sample <- sample(1:n,n,replace=TRUE)
  lambda.B[b] <- 1/mean(w.time[b.sample])
}

bias <- mean(lambda.B-m$estimate)


sd(lambda.B)

## [1] 0.004208582
The bias is positive, so on average λ̂ overestimates λ (indeed, for exponential data E λ̂ = nλ/(n − 1), so the
true bias equals λ/(n − 1)). Note that the bias is sizeable compared to the standard error.
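A parametric variant would resample from the fitted exponential distribution instead of from the data; a minimal sketch:

# parametric bootstrap (sketch): draw B samples from Exp(lambda.hat)
lambda.hat <- 1/mean(w.time)
lambda.P <- rep(NA,B)
for (b in 1:B) lambda.P[b] <- 1/mean(rexp(n,rate=lambda.hat))
mean(lambda.P-lambda.hat)   # parametric bootstrap estimate of the bias
sd(lambda.P)                # and of the standard error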
ggplot(data=data.frame(lambda.B),aes(x=lambda.B)) +
  stat_function(fun = dnorm, args=list(mean=m$estimate,sd=m$sd),
                aes(colour = 'ML, normality')) +
  xlab("lambda")+ylab("density")+
  geom_density(aes(y = ..density..,colour="Nonparametric bootstrap"), alpha = 0.4) +
  theme(legend.title=element_blank())

[Figure: density of the bootstrapped λ̂’s (nonparametric bootstrap) and the normal approximation implied by the ML estimate and its standard error.]
In the second part we have to calculate a 95% confidence interval for the mean time between failures. First,
note that the mean of an exponential distribution is 1/λ and its variance is 1/λ². For clarity, we denote the
mean by θ; that is the parameter we want to estimate. A first interval is based on the assumption that θ̂ = X̄
follows a normal distribution with mean θ and variance σ²/n, where we estimate that last parameter from the
data. It is
n <- length(w.time)
m <- mean(w.time)
se <- sd(w.time)/sqrt(n)
interval.1 <- m + se * qnorm(c(0.025,0.975))
interval.1

## [1] 31.00421 185.16246


But we can also use the assumption that the data are from an exponential distribution. In that case we
have var X̄ = 1/(nλ²) = θ²/n, which can be estimated by X̄²/n.
sd.m <- sqrt(m^2/n)
interval.2 <- m + sd.m * qnorm(c(0.025,0.975))
interval.2

## [1] 46.93055 169.23611


We can also estimate the standard error of θ̂ by means of a bootstrap procedure. We use the nonparametric
bootstrap, that is, we sample from the original sample with replacement.
B <- 9999
m.star <- rep(NA,B)
for (b in 1:B){
  m.star[b] <- mean(sample(w.time,replace=TRUE))
}
sd.m.star <- sd(m.star)
interval.3 <- m + sd.m.star * qnorm(c(0.025,0.975))
interval.3

## [1] 33.65828 182.50838


An interval not based on the assumption of normality of θ̂ is obtained by the percentile method:
interval.4 <- quantile(m.star, probs=c(0.025,0.975))
interval.4

## 2.5% 97.5%
## 47.49583 192.83750
Finally, we suspect that θ̂ does not have a symmetric distribution, because the data themselves are skewed to
the right. The probabilities used in calculating interval.4 need to be adjusted. We need the constants a and b,
reflecting scaling and bias adjustment.
b <- qnorm(mean(m.star<m))
psi <- rep(0,n)
for (i in 1:n) psi[i] <- mean(w.time[-i])
psi.centered <- mean(psi)-psi
a <- sum(psi.centered^3)/(6*sum(psi.centered^2)^1.5)
z <- qnorm(c(0.025,0.975))
beta <- pnorm(b+(b+z)/(1-a*(b+z)))
beta

## [1] 0.0637664 0.9952597


interval.5 <- quantile(m.star, probs=beta)
interval.5

## 6.37664% 99.52597%
## 57.33333 223.48742
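For comparison, we can collect the five intervals in one matrix, as we did in Problem 2:

rbind(normal=interval.1, exponential=interval.2, boot.se=interval.3,
      percentile=unname(interval.4), BCa=unname(interval.5))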
Very useful bootstrap functions are implemented in the library boot:
library(boot)
bootstrap.example <- boot(w.time,R=9999,statistic=function(x,i){mean(x[i])})
mean.fun <- function(d, i){
  # returns the mean and an estimate of its variance,
  # which boot.ci needs for the studentized interval
  m <- mean(d[i])
  n <- length(i)
  v <- (n-1)*var(d[i])/n^2
  c(m, v)
}
w.boot <- boot(w.time, mean.fun, R = 999)
boot.ci(w.boot, type = c("all"))

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = w.boot, type = c("all"))
##
## Intervals :
## Level Normal Basic Studentized
## 95% ( 28.5, 182.9 ) ( 16.9, 168.7 ) ( 43.6, 289.4 )
##
## Level Percentile BCa
## 95% ( 47.5, 199.2 ) ( 54.6, 227.3 )
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable
