STAT 560 Homework 3 PDF
Alex Nguyen
Problem 7.4
Assuming all the elements in X = (x₁, x₂, …, xₙ) are unique, the number of times k that a given element xᵢ appears in a bootstrap sample is distributed as Bin(n, 1/n). Using the Poisson approximation, this is approximately distributed as Poisson(n · 1/n) = Poisson(1). Thus if Y is the number of times an element xᵢ shows up in the bootstrap sample, the probability of exactly k appearances is given as

Prob(Y = k) = 1ᵏ · e⁻¹ / k! = e⁻¹ / k!
For Problem 7.3, the number of times a row will be repeated is binomially distributed with parameters n = 88 and p = 1/88 (assuming each row is unique), and the corresponding Poisson approximation is distributed as Poisson(1).
Thus the probabilities that a row will show up k = 0, 1, 2, 3 times under the binomial distribution and the Poisson approximation are:
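These probabilities can be computed directly in R (a sketch; the original table of values is not reproduced here):

```r
# Probability that a given row appears k = 0, 1, 2, 3 times
# in a bootstrap sample of n = 88 rows
k = 0:3
binom.probs = dbinom(k, size = 88, prob = 1/88)  # exact binomial probabilities
pois.probs  = dpois(k, lambda = 1)               # Poisson(1) approximation
round(cbind(k, binom.probs, pois.probs), 4)
```

The two columns agree closely, which is the point of the Poisson approximation here.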
Problem 8.1
We could take individual one-sample bootstrap samples for Z and Y separately. Then we calculate the sample mean of each bootstrap sample, z̄₁*, z̄₂*, …, z̄_B*, and repeat the same for ȳ₁*, ȳ₂*, …, ȳ_B*. We can then compute the bootstrap estimate of the standard error of the mean for each and, since Z and Y are independent, combine them with the formula se(θ̂) = √(var(z̄) + var(ȳ)).
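A minimal sketch of this two-sample scheme (the data here are hypothetical; z and y stand for the two independent samples):

```r
# Two-sample bootstrap SE for theta.hat = mean(z) - mean(y),
# assuming z and y are independent samples
set.seed(1)
z = rnorm(20, mean = 5)   # hypothetical sample Z
y = rnorm(25, mean = 3)   # hypothetical sample Y
B = 2000
# Bootstrap the mean of each sample separately
z.means = replicate(B, mean(sample(z, length(z), replace = TRUE)))
y.means = replicate(B, mean(sample(y, length(y), replace = TRUE)))
# Combine the two bootstrap variances: se = sqrt(var(z.bar) + var(y.bar))
se.hat = sqrt(var(z.means) + var(y.means))
se.hat
```

By independence, the bootstrap variances simply add; no joint resampling is needed.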
Problem 10.5
We need to find the value of B at which 2·sê_B/√B = 0.001. Using R:
library(bootstrap)  # provides the patch data and is assumed loaded here
data(patch)
# Bootstrap function: resample row indices and apply theta to each bootstrap sample
bootstrap10.3 = function(x, nboot, theta, xdata){
data = matrix(sample(x, size=length(x) * nboot, replace=TRUE),
nrow=nboot)
return(apply(data, 1, theta, xdata))
}
# Change data to use in function
xdata = cbind(patch$z, patch$y)
xdata
# Statistic from Problem 10.3 (assumed here): the ratio of means
theta = function(ind, xdata){ mean(xdata[ind,2]) / mean(xdata[ind,1]) }
# Grid of candidate values for B (assumed)
B = seq(1000, 100000, by=1000)
# Using a for-loop
for (i in B){
results = bootstrap10.3(1:8, i, theta, xdata)
value = 2*sd(results)/sqrt(i)
# Find the value of B at which 2*sd/sqrt(B) drops below 0.001
if(value < 0.001){
print(i)
break
}
}
For this instance I got 𝐵 = 42,000 which approximately satisfies the equation above.
Problem 10.8
Given:

θ̂ = (1/n) ∑_{i=1}^{n} (x_i − x̄)²,

and:

bias_jack = (n − 1)(θ̂(·) − θ̂)

Thus the bias-corrected estimate is

θ̄ = θ̂ − bias_jack = θ̂ − (n − 1)(θ̂(·) − θ̂) = nθ̂ − (n − 1)θ̂(·)

Given that:

∑_{i=1}^{n} (x_i − x̄)² = ∑_{i=1}^{n} x_i² − (1/n)(∑_{i=1}^{n} x_i)²,

the leave-one-out estimates are

θ̂(i) = ∑_{j≠i} (x_j − x̄(i))²/(n − 1) = (1/(n − 1)) [∑_{j≠i} x_j² − (1/(n − 1))(∑_{j≠i} x_j)²]

Thus

θ̂(·) = (1/n) ∑_{i=1}^{n} θ̂(i) = (1/(n(n − 1))) ∑_{i=1}^{n} [∑_{j≠i} x_j² − (1/(n − 1))(∑_{j≠i} x_j)²]

= (1/(n(n − 1))) [∑_{i=1}^{n} ∑_{j≠i} x_j² − (1/(n − 1)) ∑_{i=1}^{n} (∑_{j≠i} x_j)²]

Given:

(∑_{j=1}^{n} x_j)² = ∑_{j=1}^{n} x_j² + 2 ∑_{j=1}^{n−1} ∑_{k>j} x_j x_k

and

∑_{i=1}^{n} ∑_{j≠i} x_j² = (n − 1) ∑_{j=1}^{n} x_j²,

it follows that

∑_{i=1}^{n} (∑_{j≠i} x_j)² = ∑_{i=1}^{n} (∑_{j=1}^{n} x_j − x_i)² = ∑_{j=1}^{n} x_j² + (n − 2)(∑_{j=1}^{n} x_j)².

Continuing with the derivation yields:

θ̂(·) = (1/(n(n − 1))) [(n − 1) ∑_j x_j² − (1/(n − 1))(∑_j x_j² + (n − 2)(∑_j x_j)²)]

= (1/(n − 1)) [((n − 1)/n) ∑_j x_j² − (1/(n(n − 1))) ∑_j x_j² − ((n − 2)/(n(n − 1)))(∑_j x_j)²]

Thus

θ̄ = nθ̂ − (n − 1)θ̂(·)

= [∑_j x_j² − (1/n)(∑_j x_j)²] − [((n − 1)/n) ∑_j x_j² − (1/(n(n − 1))) ∑_j x_j² − ((n − 2)/(n(n − 1)))(∑_j x_j)²]

= ∑_j x_j² · [1 − (n − 1)/n + 1/(n(n − 1))] + (∑_j x_j)² · [−1/n + (n − 2)/(n(n − 1))]

= (1/(n − 1)) ∑_j x_j² − (1/(n(n − 1)))(∑_j x_j)²

= (1/(n − 1)) [∑_j x_j² − (1/n)(∑_j x_j)²]

= (1/(n − 1)) ∑_{j=1}^{n} (x_j − x̄)²    QED
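The identity θ̄ = nθ̂ − (n − 1)θ̂(·) = s² can also be checked numerically (a sketch with arbitrary simulated data):

```r
# Numerical check: the jackknife bias-corrected plug-in variance
# equals the unbiased sample variance
set.seed(1)
x = rnorm(10)
n = length(x)
theta.hat = sum((x - mean(x))^2) / n               # plug-in variance estimate
theta.i = sapply(1:n, function(i) {
  xi = x[-i]
  sum((xi - mean(xi))^2) / (n - 1)                 # leave-one-out plug-in variance
})
theta.bar = n * theta.hat - (n - 1) * mean(theta.i)
all.equal(theta.bar, var(x))                       # TRUE: matches var(x)
```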
Problem 11.12
Using the 15 sampled values in the law data, we read the data into R:
# Read CSV file
law = read.csv(file="./law.csv", header=TRUE)
# Observed correlation (columns 2 and 3 hold the two score variables)
cor.hat = cor(law[, 2:3])[1, 2]
# Perform bootstrap
n = dim(law)[1]
n.bs = 2000
bs = rep(0, n.bs)
for (i in 1:n.bs){
bs[i] = cor(law[sample(seq(n), n, replace=TRUE), 2:3])[1, 2]
}
# Bootstrap SE estimate
sd(bs)
# Bootstrap bias estimate
mean(bs) - cor.hat
The jackknife estimate of the standard error is slightly larger, and the two bias estimates are roughly the same.
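The jackknife comparison referenced above can be sketched as follows (assuming the law data are the 15-observation law school data shipped with the bootstrap package, with LSAT and GPA as its two columns):

```r
# Jackknife SE and bias estimates for the sample correlation
library(bootstrap)   # provides jackknife() and the law data
data(law)
n = nrow(law)
# Statistic: correlation of the two columns, written over row indices
theta = function(ind, xdata) cor(xdata[ind, 1], xdata[ind, 2])
jk = jackknife(1:n, theta, as.matrix(law))
jk$jack.se    # jackknife standard error estimate
jk$jack.bias  # jackknife bias estimate
```

Writing the statistic over row indices lets the same theta be reused for the bootstrap as well.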
Problem 11.13
Part a)
library(bootstrap)   # provides jackknife()
# Simulation setup (assumed): 100 samples, each of size n.s, drawn from N(0,1)
n.s = 10; n.bs = 2000
N.Samp = matrix(rnorm(100 * n.s), nrow=100, ncol=n.s)
var.bs = var.jack = rep(0, 100)
for (i in 1:100){
# Bootstrap estimate of var(x.bar)
BS = matrix(sample(N.Samp[i,], size=n.bs*n.s, replace=TRUE), n.bs, n.s)
bs.means = apply(BS, 1, mean)
var.bs[i] = var(bs.means)
# Jackknife estimate of var(x.bar)
var.jack[i] = jackknife(N.Samp[i,], mean)$jack.se^2
}
This yields
Repeating the same as above with 𝜃̂ = 𝑋̅ 2 , we simply change the function argument with:
theta = function(x){mean(x)**2}
Which yields:
Problem 12.5

Since 20θ̂/θ ~ χ²₂₀,

P(χ²₂₀(α/2) < 20θ̂/θ < χ²₂₀(1 − α/2)) = 1 − α

Inverting gives the exact interval:

P(20θ̂/χ²₂₀(1 − α/2) < θ < 20θ̂/χ²₂₀(α/2)) = 1 − α

For the standard interval,

(θ̂ − θ)/ŝe ~ N(0, 1)

Rearranging yields:

P(θ̂ − z₁₋α/₂ · ŝe < θ < θ̂ − z_α/₂ · ŝe) = 1 − α
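As a sketch, the exact chi-square interval and the standard normal interval can be computed as follows (the sample here is hypothetical: n = 10 exponential observations, so that 20θ̂/θ = 2nθ̂/θ ~ χ²₂₀):

```r
# Exact vs. standard 95% intervals for an exponential mean theta
set.seed(1)
n = 10; alpha = 0.05
x = rexp(n, rate = 1)              # hypothetical sample, true theta = 1
theta.hat = mean(x)
# Exact interval: 2*n*theta.hat/theta ~ chi-squared with 2n = 20 df
exact = c(2*n*theta.hat / qchisq(1 - alpha/2, 2*n),
          2*n*theta.hat / qchisq(alpha/2, 2*n))
# Standard interval: (theta.hat - theta)/se.hat ~ N(0,1), se.hat = theta.hat/sqrt(n)
se.hat = theta.hat / sqrt(n)
standard = theta.hat - qnorm(c(1 - alpha/2, alpha/2)) * se.hat
exact; standard
```

The exact interval is asymmetric about θ̂, while the standard interval is symmetric, which is one source of its right-tail miscoverage.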
We use α = 0.05 and the code from class for the bootstrap-t intervals.
Since α = 0.05, the exact confidence interval has a miscoverage rate of about that level. The standard interval showed the highest error rates, especially in the right tail. The bootstrap-t interval had a low miscoverage rate but suffered from high variance and a high mean interval length.