
STA303: Assignment 1

Hailey Shim - 1007654030

February 28th, 2024

Question 1a)
Given the number of successes y = 27 and the sample size n = 30, the estimated proportion π̂ is:

$$\hat{\pi} = \frac{27}{30} = 0.9$$
The Wald Confidence Interval can be calculated using the formula:
$$\text{Wald CI} = \hat{\pi} \pm z \cdot \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}$$

Substituting the values into the formula, we get:


$$\text{Wald CI} = 0.9 \pm 1.96 \cdot \sqrt{\frac{0.9(1 - 0.9)}{30}} = [0.79264, 1.00735]$$

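Following this, we can calculate the Score (Wilson) interval using the formula:

$$\hat{\pi}\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right) \pm z_{\alpha/2}\sqrt{\frac{1}{n + z_{\alpha/2}^2}\left[\hat{\pi}(1 - \hat{\pi})\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{4}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right)\right]}$$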

Substituting the values, we get:

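$$0.9\left(\frac{30}{30 + 1.96^2}\right) + \frac{1}{2}\left(\frac{1.96^2}{30 + 1.96^2}\right) \pm 1.96\sqrt{\frac{1}{30 + 1.96^2}\left[0.9(1 - 0.9)\left(\frac{30}{30 + 1.96^2}\right) + \frac{1}{4}\left(\frac{1.96^2}{30 + 1.96^2}\right)\right]} = [0.7438, 0.9654]$$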

Interpretation: The upper limit of the Wald confidence interval [0.79264, 1.00735] exceeds 1, which is impossible for a probability. The Score confidence interval, [0.7438, 0.9654], is therefore the better choice. Unlike the Wald interval, it stays within the valid probability range of 0 to 1, so it gives a more reliable description of the uncertainty in the estimated proportion.
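
The two intervals above can be reproduced numerically. The following is a minimal R sketch under the same inputs (y = 27, n = 30, 95% level). The variable names are illustrative, and `qnorm(0.975)` is used in place of the rounded 1.96, so the last digits may differ slightly from the hand calculation. Base R's `prop.test()` with `correct = FALSE` also reports the Wilson interval, so it is included as an independent check.

```r
y <- 27
n <- 30
z <- qnorm(0.975)          # two-sided 95% critical value
pihat <- y / n

# Wald interval: estimate plus/minus z times the estimated standard error
wald_ci <- pihat + c(-1, 1) * z * sqrt(pihat * (1 - pihat) / n)

# Score (Wilson) interval in the weighted form used above
nprime <- n + z^2
w1 <- n / nprime
w2 <- z^2 / nprime
midpoint <- pihat * w1 + 0.5 * w2
half_width <- z * sqrt((pihat * (1 - pihat) * w1 + 0.25 * w2) / nprime)
score_ci <- midpoint + c(-1, 1) * half_width

# Cross-check against prop.test without continuity correction
wilson_check <- as.numeric(prop.test(y, n, correct = FALSE)$conf.int)

print(round(wald_ci, 4))
print(round(score_ci, 4))
print(round(wilson_check, 4))
```

The Wald upper limit is not truncated here, so it can be seen exceeding 1 directly.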

Question 1b)

set.seed(1007654030)
N <- 100000
pie <- 0.9 ## pi is already defined in R thus we use "pie"
n <- 30
alph <- 0.05 # For 95% Confidence Interval
y <- rbinom(N, n, pie)
pihat <- y/n

## Wald Confidence Interval ##


L.wald <- pihat - qnorm(1-alph/2)*sqrt((pihat*(1-pihat))/n)
U.wald <- pihat + qnorm(1-alph/2)*sqrt((pihat*(1-pihat))/n)
pi_in_wald_CI <- (pie > L.wald)*(pie < U.wald)
observedConfLevel_WaldCI <- mean(pi_in_wald_CI)
observedConfLevel_WaldCI

## [1] 0.81025
## Score Confidence Interval ##
nprime <- n + (qnorm(1-alph/2))^2
w1 <- n/nprime
w2 <- ((qnorm(1-alph/2))^2)/nprime
midpoint <- pihat*w1+0.5*w2
L_bound <- midpoint - qnorm(1-alph/2)*sqrt((1/nprime)*(pihat*(1-pihat)*w1+0.25*w2))
U_bound <- midpoint + qnorm(1-alph/2)*sqrt((1/nprime)*(pihat*(1-pihat)*w1+0.25*w2))
pi_in_score_CI <- (pie > L_bound)*(pie < U_bound)
observedConfLevel_scoreCI <- mean(pi_in_score_CI)
observedConfLevel_scoreCI

## [1] 0.97303
Interpretation: The observed confidence level of the Wald confidence interval is 0.81025. This means that approximately 81.025% of the simulated Wald CIs contain the true proportion 0.9, which is well below the nominal 95% confidence level set by alpha = 0.05. On the other hand, the observed confidence level of the Score confidence interval is 0.97303, so approximately 97.303% of the Score CIs contain the true proportion 0.9. This is much closer to the nominal 95% level, although slightly above it. Therefore, the Score CI is the more accurate and reliable interval here, because its actual coverage is close to the stated confidence level.
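
Because Y can only take the values 0, ..., n, the coverage probability can also be computed exactly, by summing binomial probabilities over the outcomes whose interval contains the true proportion. The sketch below does this for the same settings as the simulation (n = 30, true proportion 0.9). `coverage_exact` is a hypothetical helper written for this illustration, not part of the assignment code.

```r
n <- 30
pie <- 0.9
z <- qnorm(0.975)
y <- 0:n                   # every possible number of successes
pihat <- y / n

# Wald limits for each possible outcome
se_wald <- sqrt(pihat * (1 - pihat) / n)
L_wald <- pihat - z * se_wald
U_wald <- pihat + z * se_wald

# Score (Wilson) limits for each possible outcome
nprime <- n + z^2
w1 <- n / nprime
w2 <- z^2 / nprime
midpoint <- pihat * w1 + 0.5 * w2
half <- z * sqrt((pihat * (1 - pihat) * w1 + 0.25 * w2) / nprime)
L_score <- midpoint - half
U_score <- midpoint + half

# Exact coverage: total probability of the outcomes whose interval covers pie
coverage_exact <- function(lower, upper) {
  covered <- (pie > lower) & (pie < upper)
  sum(dbinom(y, n, pie)[covered])
}

wald_cov <- coverage_exact(L_wald, U_wald)
score_cov <- coverage_exact(L_score, U_score)
print(c(Wald = wald_cov, Score = score_cov))
```

If the simulation behaves as intended, these exact values should sit close to the Monte Carlo estimates, differing only by simulation error.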

Question 2a)
The likelihood function with the binomial coefficient is given by:

$$\ell(\pi) = \frac{30!}{27!\,(30 - 27)!}\,\pi^{27}(1 - \pi)^{30 - 27}$$

The log-likelihood function follows as:

$$L(\pi) = \log\left(\frac{30!}{27!\,(30 - 27)!}\right) + 27\log(\pi) + 3\log(1 - \pi)$$
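
As a quick numerical check of the two expressions above, the following R sketch evaluates the log-likelihood with `lchoose()` for the log binomial coefficient and compares it with `dbinom(..., log = TRUE)`. The function name `loglik_full` and the grid of π values are illustrative choices, not part of the assignment.

```r
y <- 27
n <- 30

# Log-likelihood including the binomial coefficient
loglik_full <- function(p) {
  lchoose(n, y) + y * log(p) + (n - y) * log(1 - p)
}

p_grid <- c(0.5, 0.75, 0.9, 0.95)   # arbitrary values of pi to compare
manual <- loglik_full(p_grid)
builtin <- dbinom(y, size = n, prob = p_grid, log = TRUE)

max_abs_diff <- max(abs(manual - builtin))
best_on_grid <- p_grid[which.max(manual)]
print(cbind(p = p_grid, manual = manual, builtin = builtin))
```

The two columns should agree up to floating-point error, and the largest value on this grid occurs at the sample proportion.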

Question 2b)
par(mfrow = c(1,2))
## Using Likelihood ##
## (the binomial coefficient is constant in pi, so it is omitted here)
likelihood <- function(pi) { (pi^27)*((1-pi)^3) }
opt.lik <- optimize(likelihood, interval=c(0, 1), maximum=TRUE)
curve(likelihood, from=0, to=1, xlab=expression(pi), ylab=expression(l(pi)))
abline(v = opt.lik$maximum)
text(opt.lik$maximum + 0.05, 0.0014, label = round(opt.lik$maximum, 4))

## Using log-likelihood ##
log_likelihood <- function(pi) { 27*log(pi) + 3*log(1-pi) }
curve(log_likelihood, from=0, to=1, xlab=expression(pi), ylab=expression(L(pi)))
opt.log_lik <- optimize(log_likelihood, interval=c(0, 1), maximum=TRUE)
abline(v = opt.log_lik$maximum)
text(opt.log_lik$maximum + 0.05, -15, label = round(opt.log_lik$maximum, 4))
[Figure: two panels. Left: likelihood l(π) against π on [0, 1]. Right: log-likelihood L(π) against π on [0, 1]. A vertical line marks the maximum, labelled 0.9.]
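
The location of the maximum marked in both panels can also be derived analytically. A short derivation, using the log-likelihood from Question 2a (the binomial coefficient is constant in π and drops out on differentiation):

```latex
\begin{aligned}
\frac{dL(\pi)}{d\pi} &= \frac{27}{\pi} - \frac{3}{1 - \pi} = 0 \\
27(1 - \pi) &= 3\pi \\
\hat{\pi} &= \frac{27}{30} = 0.9 \\
\frac{d^2L(\pi)}{d\pi^2} &= -\frac{27}{\pi^2} - \frac{3}{(1 - \pi)^2} < 0
\end{aligned}
```

The negative second derivative confirms that the stationary point is a maximum, which agrees with the value returned by `optimize()`.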
Question 2c)
Here, we are testing H0 : π = 0.5 against Ha : π ≠ 0.5 using the likelihood ratio test with y = 27 successes out of n = 30 trials. The test statistic is given by:

$$-2\log(\Lambda) = 2\left[y\log\left(\frac{y}{n \cdot \pi_0}\right) + (n - y)\log\left(\frac{n - y}{n \cdot (1 - \pi_0)}\right)\right]$$

Substituting the values, we get:

$$-2\log(\Lambda) = 2\left[27\log\left(\frac{27}{30 \cdot 0.5}\right) + (30 - 27)\log\left(\frac{30 - 27}{30 \cdot (1 - 0.5)}\right)\right]$$

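Simplifying the above expression, we calculate the test statistic (log denotes the natural logarithm):

$$-2\log(\Lambda) = 2\left[27\log\left(\frac{27}{15}\right) + 3\log\left(\frac{3}{15}\right)\right]$$

$$-2\log(\Lambda) = 2\left[27\log(1.8) + 3\log(0.2)\right] = 22.08385$$

Since 22.08385 > 3.84, the critical value of the chi-square distribution with 1 degree of freedom at α = 0.05, we reject H0.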
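
The statistic can be evaluated directly in R, where `log()` is the natural logarithm. The following is a minimal sketch; `loglik`, `lrt_stat`, `crit` and `p_value` are illustrative names, and the p-value line goes beyond what the question asks for.

```r
y <- 27
n <- 30
pi0 <- 0.5
pihat <- y / n

# Binomial log-likelihood without the constant binomial coefficient
loglik <- function(p) y * log(p) + (n - y) * log(1 - p)

# Likelihood ratio statistic: twice the gap between the maximised
# log-likelihood and the log-likelihood under H0
lrt_stat <- 2 * (loglik(pihat) - loglik(pi0))

crit <- qchisq(0.95, df = 1)                              # 5% critical value
p_value <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)   # upper-tail p-value
reject <- lrt_stat > crit

print(c(statistic = lrt_stat, critical = crit, p_value = p_value))
```

Writing the statistic as a difference of log-likelihoods is algebraically the same as the bracketed formula, since the log ratios split into differences of logs.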

Question 2d)
library(rootSolve)

n <- 30
y <- 27
phat <- y/n
alpha <- 0.05
f1 <- function(pi0) {
-2*(y*log(pi0) + (n-y)*log(1-pi0)-y*log(phat)
-(n-y)*log(1-phat)) - qchisq(1-alpha,df=1)
}
uniroot.all(f=f1, interval=c(0.000001,0.999999))

## [1] 0.7607957 0.9741571


par(oma = c(1,1,0,1), bty = "n")
curve(f1, from=0, to=1, xlab=expression(pi[0]),
ylab=expression(paste("-2log(" ) ~ Lambda ~ paste( ") - ") ~ chi[1]^2 ~
paste( "(" ) ~ alpha ~ paste( ")" ) ))
abline(h=0, col="red")
[Figure: −2log(Λ) − χ²₁(α) plotted against π0 on [0, 1], with a red horizontal line at 0.]
Interpretation: The likelihood ratio confidence interval suggests that we can be 95% confident that the true proportion lies between roughly 76.08% and 97.42%. As the interval does not include the null hypothesis value of 0.5, we would reject the null hypothesis: based on the observed data, the true proportion differs significantly from 0.5. The graph agrees with the output, since the curve crosses the red horizontal line at roughly 0.7608 and 0.9742.
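
The same endpoints can be found without the rootSolve package by calling base R's `uniroot()` once on each side of the sample proportion, where the function changes sign. This is a sketch under the same inputs; the function name `g` and the search limits 0.01 and 0.999 are assumptions chosen for this illustration.

```r
n <- 30
y <- 27
phat <- y / n
alpha <- 0.05

# Likelihood ratio statistic minus the chi-square critical value;
# the confidence limits are the values of pi0 where this equals zero
g <- function(pi0) {
  2 * (y * log(phat / pi0) + (n - y) * log((1 - phat) / (1 - pi0))) -
    qchisq(1 - alpha, df = 1)
}

# g is negative at phat and positive near 0 and 1, so search each side
lower <- uniroot(g, interval = c(0.01, phat), tol = 1e-10)$root
upper <- uniroot(g, interval = c(phat, 0.999), tol = 1e-10)$root
lr_ci <- c(lower, upper)
print(round(lr_ci, 4))
```

Splitting the search at the sample proportion guarantees one sign change per interval, which is what `uniroot()` requires.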

Question 3a)
### Poisson Regression ###
set.seed(1007654030)

## Simulate the data ##


n<-500
X1<-runif(n,-10,10)
X2<- rnorm(n, 0, 2)
X3 <- rbinom(n, size = 1, prob = 0.7)

b0 <- -0.8
b1 <- 0.1
b2 <- 0.2
b3 <- 0.3

eta <- b0 + b1*X1 + b2*X2 + b3*X3


mu <- exp(eta)

Y <- rpois(n, mu)


Xmat <- cbind(1, X1, X2, X3)

Question 3b)
print(dim(t(Xmat)))

## [1] 4 500
glm.303 <- function(Y, Xmat, tol.lim = 1e-10){
  # Initialization: start from beta = 0, so eta = 0 and mu = 1
  beta <- rep(0, ncol(Xmat))
  eta <- Xmat %*% beta
  mu <- exp(eta)
  W <- diag(as.vector(mu)) # Ensure 'mu' is treated as a vector
  Z <- eta + (Y - mu) / mu # Working response

  ## Threshold based on the score function
  thresh <- sum(t(Xmat) %*% (Y - mu))
  istep <- 0

  # Iteratively Reweighted Least Squares (IRLS) loop
  while(thresh^2 > tol.lim){
    beta <- solve(t(Xmat) %*% W %*% Xmat) %*% t(Xmat) %*% W %*% Z
    eta <- Xmat %*% beta
    mu <- exp(eta)
    W <- diag(as.vector(mu))
    Z <- eta + (Y - mu) / mu
    thresh <- sum(t(Xmat) %*% (Y - mu))
    istep <- istep + 1
  }

  # Standard errors from the inverse of the Fisher information
  SE <- sqrt(diag(solve(t(Xmat) %*% W %*% Xmat)))
  z <- beta/SE

  res.mat <- data.frame(Estimates = beta, OR = c(1, exp(beta)[-1]),
                        SE = SE,
                        z = z,
                        p_value = ifelse(z < 0, 2*pnorm(z), 2*(1 - pnorm(z)))
  )
  rownames(res.mat)[1] <- "Intercept"
  results <- list(Table = res.mat, Iteration = istep)
  return(results)
}

glm.r <- glm(Y ~ X1 + X2 + X3, family = poisson(link = "log"))


summary(glm.r)

##
## Call:
## glm(formula = Y ~ X1 + X2 + X3, family = poisson(link = "log"))
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.87866    0.12374  -7.101 1.24e-12 ***
## X1           0.09836    0.01008   9.759  < 2e-16 ***
## X2           0.18846    0.02476   7.613 2.69e-14 ***
## X3           0.44768    0.13348   3.354 0.000797 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 664.64 on 499 degrees of freedom
## Residual deviance: 496.21 on 496 degrees of freedom
## AIC: 1015.6
##
## Number of Fisher Scoring iterations: 5
glm.303(Y = Y, Xmat = Xmat) # Default tolerance is 1e-10, can be changed by setting tol.lim = some value

## $Table
##             Estimates       OR         SE         z      p_value
## Intercept -0.87866335 1.000000 0.12374540 -7.100574 1.242397e-12
## X1         0.09836197 1.103362 0.01007931  9.758796 0.000000e+00
## X2         0.18845696 1.207385 0.02475694  7.612288 2.686740e-14
## X3         0.44767828 1.564675 0.13349034  3.353638 7.975662e-04
##
## $Iteration
## [1] 5
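
For comparison, the same IRLS update can be written without forming the n × n weight matrix, by multiplying the rows of the design matrix by the weights directly. The sketch below is a minimal, self-contained illustration on freshly simulated data. The seed, the single covariate and the coefficients 0.5 and 0.8 are assumptions made for this example and do not come from the assignment. It stops when the coefficients stop changing, which is a different stopping rule from the score-based threshold in glm.303, and then compares the result with `glm()`.

```r
set.seed(1)                       # arbitrary seed for this illustration
n <- 200
x <- runif(n, -1, 1)
X <- cbind(1, x)                  # design matrix with an intercept column
y <- rpois(n, exp(0.5 + 0.8 * x)) # assumed true coefficients 0.5 and 0.8

beta <- rep(0, ncol(X))
iter <- 0
for (i in 1:25) {
  iter <- i
  eta <- as.vector(X %*% beta)
  mu <- exp(eta)                  # Poisson mean under the log link
  z <- eta + (y - mu) / mu        # working response
  # Weighted least squares step: mu * X scales row i of X by mu[i],
  # so crossprod(X, mu * X) equals t(X) %*% W %*% X with W = diag(mu)
  beta_new <- solve(crossprod(X, mu * X), crossprod(X, mu * z))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- as.vector(beta_new)
  if (converged) break
}

fit <- glm(y ~ x, family = poisson(link = "log"))
max_diff <- max(abs(beta - coef(fit)))
print(rbind(irls = beta, glm = coef(fit)))
```

Avoiding `diag()` keeps memory use proportional to n rather than n squared, which matters little at n = 500 but helps for larger samples.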
