
STAT 560 Homework 3

Alex Nguyen

Problem 7.4

Assuming all the elements in $X = (x_1, x_2, \ldots, x_n)$ are unique, the number of times a certain element $x_i$ appears in a bootstrap sample is distributed as $\mathrm{Bin}(n, \tfrac{1}{n})$. Using the Poisson approximation, this is approximately distributed as $\mathrm{Poisson}(n \cdot \tfrac{1}{n}) = \mathrm{Poisson}(1)$. Thus if $Y$ is the number of times an element $x_i$ shows up in the bootstrap sample, the probability of $k$ appearances is given by

$$\mathrm{Prob}(Y = k) = \frac{1^k \, e^{-1}}{k!} = \frac{e^{-1}}{k!}$$
For problem 7.3, the number of times a row is repeated is binomially distributed with parameters $n = 88$ and $p = \tfrac{1}{88}$ (assuming each row is unique), and the corresponding Poisson approximation is distributed as $\mathrm{Poisson}(1)$.

Thus the probabilities that a row shows up $k = 0, 1, 2, 3$ times under the binomial distribution and the Poisson approximation are:

                 k = 0    k = 1    k = 2    k = 3
Binomial(n, p)   0.366    0.370    0.185    0.061
Poisson(np)      0.368    0.368    0.184    0.061

The Poisson approximation matches the binomial probabilities closely in this problem.
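The table values can be checked directly in R with `dbinom()` and `dpois()`:

```r
# Binomial(n = 88, p = 1/88) pmf at k = 0..3 versus its Poisson(1) approximation
k <- 0:3
binom.probs <- dbinom(k, size = 88, prob = 1/88)
pois.probs  <- dpois(k, lambda = 1)
round(rbind(Binomial = binom.probs, Poisson = pois.probs), 3)
```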

Problem 8.1

We could take individual one-sample bootstrap samples for $Z$ and $Y$ separately. Then we calculate the sample means for each of the bootstrap samples $z^*_1, z^*_2, \ldots, z^*_B$, and repeat the same for $y^*_1, y^*_2, \ldots, y^*_B$. From these we compute the bootstrap estimate of the standard error of the mean for each sample and, since $Z$ and $Y$ are independent, combine them with the formula $\widehat{se}(\hat\theta) = \sqrt{\widehat{\mathrm{var}}(\bar{z}) + \widehat{\mathrm{var}}(\bar{y})}$.

Using R to implement the above method, the one-sample bootstrap estimate for the standard error is 26.834, which is very close to the original estimate of 26.4.
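A minimal sketch of the method described above, using placeholder data vectors `z` and `y` (hypothetical values, not the problem's actual data):

```r
# Independent one-sample bootstraps for z and y; placeholder samples
set.seed(1)
z <- rnorm(7, 100, 30)   # stands in for the Z sample
y <- rnorm(9, 70, 25)    # stands in for the Y sample

B <- 2000
# Bootstrap means for each sample separately
zbar.star <- replicate(B, mean(sample(z, length(z), replace = TRUE)))
ybar.star <- replicate(B, mean(sample(y, length(y), replace = TRUE)))

# By independence, se.hat = sqrt(var(zbar*) + var(ybar*))
se.hat <- sqrt(var(zbar.star) + var(ybar.star))
se.hat
```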

Problem 10.5

From equation 10.16 we are given:

$$\mathrm{Prob}_{\hat F}\left( \left|\widehat{\mathrm{bias}}_B - \widehat{\mathrm{bias}}_\infty\right| < \frac{2\,\widehat{se}_B}{\sqrt{B}} \right) = 0.95$$

We need to find the value of $B$ at which $\dfrac{2\,\widehat{se}_B}{\sqrt{B}} = 0.001$. Using R,

library(bootstrap)   # provides the patch data
data(patch)

# Change data to use in the function
xdata = cbind(patch$z, patch$y)

# Bootstrap function: resample row indices, apply theta to each sample
bootstrap10.3 = function(x, nboot, theta, xdata){
  data = matrix(sample(x, size = length(x) * nboot, replace = TRUE),
                nrow = nboot)
  return(apply(data, 1, theta, xdata))
}

# Statistic: ratio of means, computed from the resampled row indices
theta = function(x, xdata){
  mean(xdata[x, 2]) / mean(xdata[x, 1])
}

# Get range of B values
B = seq(30000, 50000, 1000)

# Find the first B for which 2*se.hat_B/sqrt(B) drops below 0.001
for (i in B){
  results = bootstrap10.3(1:8, i, theta, xdata)
  value = 2 * sd(results) / sqrt(i)
  if (value < 0.001){
    print(i)
    break
  }
}

For this instance I got 𝐵 = 42,000 which approximately satisfies the equation above.
Problem 10.8

Given:

$$\hat\theta = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n},$$

And:

$$\widehat{\mathrm{bias}}_{jack} = (n-1)\left(\hat\theta_{(\cdot)} - \hat\theta\right)$$

Thus

$$\bar\theta = \hat\theta - \widehat{\mathrm{bias}}_{jack} = \hat\theta - (n-1)\left(\hat\theta_{(\cdot)} - \hat\theta\right) = n\hat\theta - (n-1)\hat\theta_{(\cdot)}$$

First, writing $\hat\theta_{(\cdot)}$ as the average of the leave-one-out estimates:

$$\hat\theta_{(\cdot)} = \sum_{i=1}^{n} \frac{\hat\theta_{(i)}}{n}$$

Given that:

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$$

each leave-one-out estimate (with leave-one-out mean $\bar{x}_{(i)}$) is

$$\hat\theta_{(i)} = \sum_{j\neq i} \frac{\left(x_j - \bar{x}_{(i)}\right)^2}{n-1} = \frac{1}{n-1}\left(\sum_{j\neq i} x_j^2 - \frac{1}{n-1}\left(\sum_{j\neq i} x_j\right)^2\right)$$

Thus

$$\hat\theta_{(\cdot)} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left[\sum_{j\neq i} x_j^2 - \frac{1}{n-1}\left(\sum_{j\neq i} x_j\right)^2\right]$$

$$= \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\sum_{j\neq i} x_j^2 - \frac{1}{n-1}\sum_{i=1}^{n}\left(\sum_{j\neq i} x_j\right)^2\right]$$

Given:

$$\left(\sum_{j=1}^{n} x_j\right)^2 = \sum_{j=1}^{n} x_j^2 + 2\sum_{j=1}^{n-1}\sum_{k>j} x_j x_k$$

And

$$\sum_{i=1}^{n}\sum_{j\neq i} x_j^2 = (n-1)\sum_{j=1}^{n} x_j^2$$
Continuing with the derivation yields:

$$\hat\theta_{(\cdot)} = \frac{1}{n(n-1)}\left[(n-1)\sum_{j=1}^{n} x_j^2 - \frac{1}{n-1}\left((n-1)\sum_{j=1}^{n} x_j^2 + (n-2)\cdot 2\sum_{j=1}^{n-1}\sum_{k>j} x_j x_k\right)\right]$$

$$= \frac{1}{n(n-1)}\left[(n-1)\sum_{j=1}^{n} x_j^2 - \frac{1}{n-1}\left(\sum_{j=1}^{n} x_j^2 + (n-2)\left(\sum_{j=1}^{n} x_j\right)^2\right)\right]$$

$$= \frac{1}{n-1}\left[\frac{n-1}{n}\sum_{j=1}^{n} x_j^2 - \frac{1}{n(n-1)}\sum_{j=1}^{n} x_j^2 - \frac{n-2}{n(n-1)}\left(\sum_{j=1}^{n} x_j\right)^2\right]$$

Thus

$$\bar\theta = n\hat\theta - (n-1)\hat\theta_{(\cdot)}$$

$$= n\sum_{i=1}^{n}\frac{(x_i - \bar{x})^2}{n} - (n-1)\cdot\frac{1}{n-1}\left[\frac{n-1}{n}\sum_{j=1}^{n} x_j^2 - \frac{1}{n(n-1)}\sum_{j=1}^{n} x_j^2 - \frac{n-2}{n(n-1)}\left(\sum_{j=1}^{n} x_j\right)^2\right]$$

$$= \sum_{i=1}^{n}(x_i - \bar{x})^2 - \left(\frac{n-1}{n}\sum_{j=1}^{n} x_j^2 - \frac{1}{n(n-1)}\sum_{j=1}^{n} x_j^2 - \frac{n-2}{n(n-1)}\left(\sum_{j=1}^{n} x_j\right)^2\right)$$

$$= \sum_{j=1}^{n} x_j^2 - \frac{1}{n}\left(\sum_{j=1}^{n} x_j\right)^2 - \frac{n-1}{n}\sum_{j=1}^{n} x_j^2 + \frac{1}{n(n-1)}\sum_{j=1}^{n} x_j^2 + \frac{n-2}{n(n-1)}\left(\sum_{j=1}^{n} x_j\right)^2$$

Grouping together the terms yields:

$$= \sum_{j=1}^{n} x_j^2\cdot\left(\frac{n(n-1) - (n-1)^2 + 1}{n(n-1)}\right) + \left(\sum_{j=1}^{n} x_j\right)^2\cdot\left(-\frac{1}{n} + \frac{n-2}{n(n-1)}\right)$$

$$= \sum_{j=1}^{n} x_j^2\cdot\frac{1}{n-1} - \left(\sum_{j=1}^{n} x_j\right)^2\cdot\frac{1}{n(n-1)}$$

$$= \frac{1}{n-1}\left(\sum_{j=1}^{n} x_j^2 - \frac{1}{n}\left(\sum_{j=1}^{n} x_j\right)^2\right)$$

$$= \frac{1}{n-1}\sum_{j=1}^{n}(x_j - \bar{x})^2 \qquad QED$$
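A quick numerical check of this identity in R, using an arbitrary simulated sample: the jackknife bias-corrected plug-in variance should equal the usual unbiased sample variance exactly.

```r
# Numerical check: n*theta.hat - (n-1)*theta.hat_(.) equals var(x)
set.seed(1)
x <- rnorm(10)
n <- length(x)

theta.hat <- sum((x - mean(x))^2) / n               # plug-in variance
theta.i   <- sapply(1:n, function(i) {              # leave-one-out estimates
  xi <- x[-i]
  sum((xi - mean(xi))^2) / (n - 1)
})
theta.bar <- n * theta.hat - (n - 1) * mean(theta.i)

all.equal(theta.bar, var(x))   # TRUE
```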
Problem 11.12

Using the 15 sampled values in the law data, we read the data into R:
# Read CSV File
law = read.csv(file="./law.csv",h=T)

# Get the sampled observations in law


law = law[(law[,4]==1),]

# Get estimated correlation value


cor.hat = cor(law[,2:3])[1,2]

# Perform Bootstrap
n=dim(law)[1]
n.bs=2000
bs=rep(0,n.bs)

for(i in 1:n.bs){
bs[i] = cor(law[sample(seq(n),n,replace=T),2:3])[1,2] }

# Bootstrap SE estimate
sd(bs)
# Bootstrap bias estimate
mean(bs)-cor.hat

This yields the bootstrap estimates:

For the jackknife:


# Jackknife (jackknife() comes from the bootstrap package)
library(bootstrap)
xdata = cbind(law$LSAT, law$GPA)
theta = function(x, xdata){ cor(xdata[x,1], xdata[x,2]) }
results = jackknife(1:n, theta, xdata)

This yields the jackknife estimates:

The jackknife estimate of the standard error is slightly larger, and the two bias estimates are roughly the same.
Problem 11.13

Part a)

Creating the data in R:


# Create sample
N.Samp = matrix(rnorm(20*100,1,1),nrow=100,ncol=20)

# create variables to hold the variances


var.bs = rep(0,100)
var.jack = rep(0,100)

# number of bootstrap samples and sample size


n.bs = 2000
n.s = 20

for (i in 1:100){

# Bootstrap
BS=matrix(sample(N.Samp[i,],size=n.bs*n.s, replace=T),n.bs,n.s)
bs.means = apply(BS,1,mean)
var.bs[i] = var(bs.means)

# Jack knife
var.jack[i] = jackknife(N.Samp[i,],mean)$jack.se**2
}

This yields
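As a compact self-contained check (one placeholder sample instead of the full 100-row matrix, since the simulation output is not reproduced above): for $X_1, \ldots, X_{20}$ iid $N(1,1)$, the true variance of $\bar X$ is $1/20 = 0.05$, which both estimates should track.

```r
# One sample of size 20 from N(1,1)
set.seed(1)
x <- rnorm(20, 1, 1)

# Bootstrap estimate of var(Xbar)
bs.means <- replicate(2000, mean(sample(x, 20, replace = TRUE)))
var.bs <- var(bs.means)

# Jackknife estimate: ((n-1)/n) * sum((theta_(i) - theta_(.))^2),
# which for the mean reduces to the sample variance divided by n
loo <- sapply(1:20, function(i) mean(x[-i]))
var.jack <- (19/20) * sum((loo - mean(loo))^2)

c(truth = 0.05, bootstrap = var.bs, jackknife = var.jack)
```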

Repeating the same as above with $\hat\theta = \bar{X}^2$, we replace the statistic (and use theta in place of mean in the bootstrap apply() call and in the jackknife() call):

theta = function(x){ mean(x)^2 }

Which yields:
Problem 12.5

To find the exact confidence interval we are given:

$$20 \cdot \frac{\hat\theta}{\theta} \sim \chi^2_{20}$$

$$P\left(\chi^2_{20}\!\left(\frac{\alpha}{2}\right) < 20 \cdot \frac{\hat\theta}{\theta} < \chi^2_{20}\!\left(1-\frac{\alpha}{2}\right)\right) = 1-\alpha$$

Thus rearranging yields:

$$P\left(\frac{20\,\hat\theta}{\chi^2_{20}\!\left(1-\frac{\alpha}{2}\right)} < \theta < \frac{20\,\hat\theta}{\chi^2_{20}\!\left(\frac{\alpha}{2}\right)}\right) = 1-\alpha$$
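As a sketch, the exact interval endpoints can be computed with qchisq(), using an illustrative value for $\hat\theta$ (the problem's actual estimate is not reproduced here):

```r
# Exact 95% interval for theta from 20 * theta.hat / theta ~ chisq_20
theta.hat <- 1       # illustrative value, not the problem's estimate
alpha <- 0.05
lower <- 20 * theta.hat / qchisq(1 - alpha/2, df = 20)
upper <- 20 * theta.hat / qchisq(alpha/2, df = 20)
c(lower, upper)
```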

For part b), to find the standard interval we are given:

$$\frac{\hat\theta - \theta}{\widehat{se}} \sim N(0,1)$$

Rearranging yields:

$$P\left(\hat\theta - z_{1-\alpha/2}\,\widehat{se} < \theta < \hat\theta - z_{\alpha/2}\,\widehat{se}\right) = 1-\alpha$$
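Similarly, the standard interval is a one-liner with qnorm(), again with illustrative values for $\hat\theta$ and $\widehat{se}$ (not the problem's actual numbers):

```r
# Standard normal-theory interval
theta.hat <- 1; se.hat <- 0.3; alpha <- 0.05   # illustrative values
c(theta.hat - qnorm(1 - alpha/2) * se.hat,
  theta.hat - qnorm(alpha/2) * se.hat)
```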

We use $\alpha = 0.05$, together with the code from class for the bootstrap-$t$ intervals.

The results of the 1000 simulations are shown below.

Since $\alpha = 0.05$, the exact confidence interval has a miscoverage rate of about that level. The standard interval showed the highest error rates, especially in the right tail. The bootstrap-$t$ interval had a low miscoverage rate but suffered from high variance and a long average interval length.
