
Statistical Methodology

MATH10095

Wednesday 16th December 2020


1300-1500 † *


All students: you have an additional 1 hour to assemble and submit your PDF.
Final submission deadline: 16:00.

* Students with a Schedule of Adjustment: You are entitled to a further fixed additional 1 hour for this remote examination.

Final submission deadline: 17:00

Attempt all questions

Important instructions

1. Start each question on a new sheet of paper.
2. Number your sheets of paper to help you scan them in order.
3. Only write on one side of each piece of paper.
4. If you have rough work to do, simply include it within your overall answer – put
brackets at the start and end of it if you want to highlight that it is rough work.
MATH10095 Statistical Methodology 1

(1) The data $y_1, \dots, y_n$ are $n$ independent observations of a random variable with probability density function

$$f(y;\theta) = \frac{\theta\, e^{-\theta \sin(y)} \cos(y)}{1 - e^{-\theta}}, \quad y \in [0, \pi/2],$$

and $f(y;\theta) = 0$ for $y \notin [0, \pi/2]$, where $\theta > 0$ is an unknown parameter.
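The normalising constant $1 - e^{-\theta}$ comes from the substitution $u = \sin y$, which gives $\int_0^{\pi/2} \theta e^{-\theta \sin y}\cos y\,dy = 1 - e^{-\theta}$. As a rough numerical sanity check, the Python sketch below integrates the density with Simpson's rule for a few illustrative values of $\theta$. The helper names `pdf` and `simpson` are hypothetical and not part of the question.

```python
import math


def pdf(y: float, theta: float) -> float:
    """Density f(y; theta) from Question 1 (hypothetical helper); zero outside [0, pi/2]."""
    if 0.0 <= y <= math.pi / 2:
        return theta * math.exp(-theta * math.sin(y)) * math.cos(y) / (1.0 - math.exp(-theta))
    return 0.0


def simpson(f, a: float, b: float, m: int = 1000) -> float:
    """Composite Simpson's rule on [a, b] with m subintervals (m is forced to be even)."""
    if m % 2:
        m += 1
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# Illustrative theta values only; any theta > 0 should integrate to (numerically) 1.
areas = {theta: simpson(lambda y: pdf(y, theta), 0.0, math.pi / 2) for theta in (0.5, 3.2)}
for theta, area in areas.items():
    print(f"theta = {theta}: integral of f over [0, pi/2] = {area:.8f}")
```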

(a) Show that the log-likelihood for this data can be written as

$$n\log(\theta) - n\log(1 - e^{-\theta}) - \theta\sum_{i=1}^{n}\sin(y_i) + \sum_{i=1}^{n}\log\cos(y_i).$$

[4 marks]
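Because the observations are independent, the log-likelihood is $\sum_i \log f(y_i;\theta)$, so the closed form in (a) should agree with that direct sum. The sketch below checks this on simulated data, drawing from $f$ by inverting the CDF $F(y) = (1 - e^{-\theta\sin y})/(1 - e^{-\theta})$. The function names, seed, true $\theta$ and sample size are illustrative assumptions.

```python
import math
import random


def log_lik_closed_form(theta: float, ys: list[float]) -> float:
    """Log-likelihood as written in part (a)."""
    n = len(ys)
    return (n * math.log(theta)
            - n * math.log(1.0 - math.exp(-theta))
            - theta * sum(math.sin(y) for y in ys)
            + sum(math.log(math.cos(y)) for y in ys))


def log_lik_direct(theta: float, ys: list[float]) -> float:
    """Sum of log f(y_i; theta), computed term by term."""
    return sum(math.log(theta * math.exp(-theta * math.sin(y)) * math.cos(y)
                        / (1.0 - math.exp(-theta))) for y in ys)


def sample(theta: float, n: int, rng: random.Random) -> list[float]:
    """Inverse-CDF draws: sin(Y) = -log(1 - U (1 - e^{-theta})) / theta, U ~ Uniform(0, 1)."""
    c = 1.0 - math.exp(-theta)
    return [math.asin(-math.log(1.0 - rng.random() * c) / theta) for _ in range(n)]


rng = random.Random(1)       # arbitrary seed
ys = sample(3.0, 50, rng)    # hypothetical data; theta = 3 chosen only for illustration
for theta in (1.0, 3.0, 5.0):
    print(theta, log_lik_closed_form(theta, ys), log_lik_direct(theta, ys))
```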
(b) Show that the observed information function for this distribution is

$$\frac{n}{\theta^2} - \frac{n e^{\theta}}{(e^{\theta} - 1)^2}.$$

[6 marks]
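Differentiating the log-likelihood in (a) gives the score $U(\theta) = n/\theta - n/(e^{\theta} - 1) - \sum_i \sin y_i$, and the observed information is $-U'(\theta)$. The sketch below compares the closed form in (b) with a central finite difference of the score; the step size, the function names and the trial values of $\sum_i \sin y_i$ are illustrative assumptions. The finite-difference result does not move when $\sum_i \sin y_i$ changes, because the expression in (b) does not involve the data.

```python
import math


def score(theta: float, n: int, s: float) -> float:
    """U(theta) = n/theta - n/(e^theta - 1) - s, where s = sum of sin(y_i)."""
    return n / theta - n / math.expm1(theta) - s


def observed_info(theta: float, n: int) -> float:
    """Closed form from part (b): n/theta^2 - n e^theta / (e^theta - 1)^2."""
    return n / theta**2 - n * math.exp(theta) / math.expm1(theta) ** 2


def numeric_info(theta: float, n: int, s: float, h: float = 1e-5) -> float:
    """-dU/dtheta by a central difference (illustrative step size h)."""
    return -(score(theta + h, n, s) - score(theta - h, n, s)) / (2 * h)


for theta in (0.5, 2.0, 3.2):
    print(theta, observed_info(theta, 42), numeric_info(theta, 42, 11.0), numeric_info(theta, 42, 20.0))
```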
(c) Obtain an iterative formula based on Fisher's method of scoring for calculating the MLE of θ. Discuss whether the iterative formula would change if the Newton-Raphson method were applied instead.
[8 marks]
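Fisher's method of scoring updates $\theta^{(m+1)} = \theta^{(m)} + U(\theta^{(m)})/I(\theta^{(m)})$, where $I$ is the expected information, whereas Newton-Raphson puts the observed information $-U'(\theta)$ in the same place. Since the expression in (b) does not depend on the $y_i$, its expectation equals itself, so in the sketch below one update function serves for both methods. The tolerance, iteration cap and function names are illustrative choices, not part of the question.

```python
import math


def score(theta: float, n: int, s: float) -> float:
    """U(theta) = n/theta - n/(e^theta - 1) - s, with s = sum of sin(y_i)."""
    return n / theta - n / math.expm1(theta) - s


def info(theta: float, n: int) -> float:
    """Information from part (b); free of the data, so observed = expected here."""
    return n / theta**2 - n * math.exp(theta) / math.expm1(theta) ** 2


def fisher_scoring(n: int, s: float, theta0: float, tol: float = 1e-8, max_iter: int = 50):
    """Iterate theta <- theta + U(theta) / I(theta) until the step is below tol."""
    theta = theta0
    path = [theta]
    for _ in range(max_iter):
        step = score(theta, n, s) / info(theta, n)
        theta += step
        path.append(theta)
        if abs(step) < tol:
            break
    return theta, path


theta_hat, path = fisher_scoring(n=42, s=11.0, theta0=3.2)
print([round(t, 4) for t in path])
```

The usage line plugs in $n = 42$ and $\sum_i \sin y_i = 11$ from part (d).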
(d) An experiment with 42 observations yielded $\sum_{i=1}^{42} \sin y_i = 11$. Taking the initial value for the iteration as $\theta^{(0)} = 3.2$, complete one iteration of Fisher's method of scoring.
[6 marks]
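For the numbers in (d), a single scoring step only needs $U(\theta^{(0)})$ and $I(\theta^{(0)})$. The sketch below evaluates both directly at $\theta^{(0)} = 3.2$ with $n = 42$ and $\sum_i \sin y_i = 11$; the variable names and the four-decimal printing are illustrative choices.

```python
import math

n, s, theta0 = 42, 11.0, 3.2          # values given in part (d)

u0 = n / theta0 - n / math.expm1(theta0) - s                         # score U(theta0)
i0 = n / theta0**2 - n * math.exp(theta0) / math.expm1(theta0) ** 2  # information I(theta0)
theta1 = theta0 + u0 / i0                                            # one scoring update

print(f"U(theta0) = {u0:.4f}, I(theta0) = {i0:.4f}, theta1 = {theta1:.4f}")
```

The updated value lands close to the estimate $\hat\theta = 3.35$ quoted in (e).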
(e) We know that after 3 iterations the iterative scheme in (c), applied to the data given in (d), converges to the maximum likelihood estimate $\hat\theta = 3.35$. Test the hypothesis $\theta = 3$ against the two-sided alternative $\theta \neq 3$ for these data using a Wald test at $\alpha = 0.05$.
[Note: Choose an appropriate critical point from the quantiles below, in which $(\alpha)$ is the area under the curve in the right-hand tail of the distribution:

$\chi^2_1(0.05) = 3.8415$, $\chi^2_2(0.05) = 5.9915$, $z(0.025) = 1.96$, $z(0.05) = 1.64$.]

[10 marks]
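One common form of the Wald statistic is $W = (\hat\theta - \theta_0)^2 I(\hat\theta)$, compared with $\chi^2_1(0.05) = 3.8415$; equivalently, $|z| = |\hat\theta - \theta_0|\sqrt{I(\hat\theta)}$ is compared with $z(0.025) = 1.96$. The sketch below assumes the information is evaluated at $\hat\theta$ (evaluating it at $\theta_0 = 3$ is another convention) and uses $n = 42$ from (d). The function name `info` is illustrative.

```python
import math

n = 42
theta_hat, theta_null = 3.35, 3.0


def info(theta: float, n: int) -> float:
    """Information from part (b)."""
    return n / theta**2 - n * math.exp(theta) / math.expm1(theta) ** 2


info_hat = info(theta_hat, n)                        # assumption: information evaluated at the MLE
wald = (theta_hat - theta_null) ** 2 * info_hat      # compare with chi^2_1(0.05)
z = (theta_hat - theta_null) * math.sqrt(info_hat)   # compare |z| with z(0.025)
reject = wald > 3.8415

print(f"I(theta_hat) = {info_hat:.4f}, W = {wald:.4f}, z = {z:.4f}, reject H0: {reject}")
```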


(2) Let $y_1, y_2, \dots, y_n$ denote a random i.i.d. sample from the following distribution:

$$f(y \mid \mu, \beta) = C\beta^{1/4} e^{-\beta(y-\mu)^4}, \quad y \in (-\infty, +\infty),$$

with $\mu \in \mathbb{R}$ and $\beta > 0$, where $C \approx 0.552$.

(a) Consider the case where $\mu = 0$ is known. Derive the likelihood ratio test statistic for testing the hypothesis $H_0: \beta = 1$ against $H_1: \beta \neq 1$, and hence derive the rule for rejecting the null hypothesis.
[Note: log(1) = 0.]
[10 marks]
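With $\mu = 0$ and $T = \sum_i y_i^4$, the log-likelihood is $\ell(\beta) = n\log C + \tfrac{n}{4}\log\beta - \beta T$, maximised at $\hat\beta = n/(4T)$, so $2\{\ell(\hat\beta) - \ell(1)\} = \tfrac{n}{2}\log\hat\beta - \tfrac{n}{2} + 2T$. The sketch below computes this and applies the usual large-sample rule of rejecting $H_0$ when it exceeds $\chi^2_1(0.05) = 3.8415$ (the quantile listed in Question 1). To simulate data it uses the fact that, for $\mu = 0$, $\beta Y^4$ follows a Gamma(1/4, 1) distribution, with a random sign for $Y$. The function names, seed, sample size and true $\beta$ are illustrative assumptions.

```python
import math
import random


def lrt_statistic(ys: list[float]) -> float:
    """2{l(beta_hat) - l(1)} for f(y | 0, beta) = C beta^{1/4} exp(-beta y^4)."""
    n = len(ys)
    t = sum(y**4 for y in ys)
    beta_hat = n / (4.0 * t)
    return (n / 2.0) * math.log(beta_hat) - n / 2.0 + 2.0 * t


def reject_h0(ys: list[float], crit: float = 3.8415) -> bool:
    """Large-sample rule: reject H0: beta = 1 when the statistic exceeds chi^2_1(0.05)."""
    return lrt_statistic(ys) > crit


def simulate(beta: float, n: int, rng: random.Random) -> list[float]:
    """Draw Y with mu = 0: beta * Y^4 ~ Gamma(shape 1/4, scale 1), sign chosen at random."""
    return [rng.choice((-1.0, 1.0)) * (rng.gammavariate(0.25, 1.0) / beta) ** 0.25
            for _ in range(n)]


rng = random.Random(2020)                 # arbitrary seed
ys = simulate(beta=1.0, n=100, rng=rng)   # hypothetical data generated under H0
print(lrt_statistic(ys), reject_h0(ys))
```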
(b) Let $\mu = 0$ be known and consider a Bayesian model for $\beta$ with prior density

$$p(\beta) = \frac{b^a}{\Gamma(a)}\beta^{a-1}e^{-b\beta}, \quad \beta > 0,$$

for some fixed values $a, b > 0$; i.e. $\beta \sim \Gamma(a, b)$, for which $E(\beta) = a/b$ and $\mathrm{Var}(\beta) = a/b^2$.
(i) Derive the posterior distribution of β.
[10 marks]
(ii) Write down the expressions for the posterior mean and the posterior variance of β (there is no need to prove them).
[4 marks]
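Multiplying the likelihood kernel $\beta^{n/4}e^{-\beta T}$ (with $T = \sum_i y_i^4$ and $\mu = 0$) by the prior gives $\beta^{a + n/4 - 1}e^{-(b + T)\beta}$, i.e. a $\Gamma(a + n/4,\, b + T)$ posterior in the same shape/rate parametrisation as the prior. The sketch below turns that conjugate update into code; the function name, prior values and data in the usage line are hypothetical.

```python
def gamma_posterior(a: float, b: float, ys: list[float]) -> dict[str, float]:
    """Conjugate update for beta with mu = 0 known (shape/rate parametrisation)."""
    n = len(ys)
    t = sum(y**4 for y in ys)
    a_post = a + n / 4.0   # posterior shape
    b_post = b + t         # posterior rate
    return {
        "shape": a_post,
        "rate": b_post,
        "mean": a_post / b_post,         # E(beta | y) = a_post / b_post
        "var": a_post / b_post ** 2,     # Var(beta | y) = a_post / b_post^2
    }


# Hypothetical prior Gamma(2, 1) and four made-up observations.
print(gamma_posterior(2.0, 1.0, [1.0, -1.0, 0.5, 2.0]))
```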
(c) Now consider the case where both $\mu$ and $\beta$ are unknown. Obtain the score vector and the Fisher information matrix for the vector of unknown parameters $\theta = (\mu, \beta)$.
[Note: you can use $E((Y - \mu)^2) = 0.338\beta^{-1/2}$ and $E((Y - \mu)^3) = 0$ for $Y \sim f(y \mid \mu, \beta)$.]
[10 marks]
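With $\ell(\mu, \beta) = n\log C + \tfrac{n}{4}\log\beta - \beta\sum_i (y_i - \mu)^4$, the score has components $4\beta\sum_i (y_i-\mu)^3$ and $n/(4\beta) - \sum_i (y_i - \mu)^4$. Taking expectations of minus the second derivatives with the moments in the note gives a diagonal information matrix with entries $12 \times 0.338\, n\beta^{1/2}$ and $n/(4\beta^2)$. The sketch below checks the quoted constants against the exact values $C = 1/(2\Gamma(5/4))$ and $E((Y-\mu)^2) = \{\Gamma(3/4)/\Gamma(1/4)\}\beta^{-1/2}$, and compares the one-observation information with a Monte Carlo estimate of $E(UU^\top)$. The sampler, seed, parameter values and function names are illustrative assumptions.

```python
import math
import random

C = 1.0 / (2.0 * math.gamma(1.25))          # exact normalising constant, approx 0.552
K2 = math.gamma(0.75) / math.gamma(0.25)    # E((Y - mu)^2) * beta^{1/2}, approx 0.338


def score_vector(mu: float, beta: float, ys: list[float]) -> tuple[float, float]:
    """(dl/dmu, dl/dbeta) for the log-likelihood above."""
    n = len(ys)
    d_mu = 4.0 * beta * sum((y - mu) ** 3 for y in ys)
    d_beta = n / (4.0 * beta) - sum((y - mu) ** 4 for y in ys)
    return d_mu, d_beta


def fisher_info(beta: float, n: int) -> list[list[float]]:
    """Expected information: diag(12 K2 n beta^{1/2}, n / (4 beta^2))."""
    return [[12.0 * K2 * n * math.sqrt(beta), 0.0],
            [0.0, n / (4.0 * beta ** 2)]]


def simulate(mu: float, beta: float, m: int, rng: random.Random) -> list[float]:
    """Assumed sampler: beta (Y - mu)^4 ~ Gamma(1/4, 1), with a random sign."""
    return [mu + rng.choice((-1.0, 1.0)) * (rng.gammavariate(0.25, 1.0) / beta) ** 0.25
            for _ in range(m)]


mu, beta = 1.0, 2.0                                    # illustrative parameter values
draws = simulate(mu, beta, 100_000, random.Random(3))  # arbitrary seed
scores = [score_vector(mu, beta, [y]) for y in draws]  # one-observation scores
mc = [[sum(s[i] * s[j] for s in scores) / len(scores) for j in range(2)] for i in range(2)]
print("C =", round(C, 4), " K2 =", round(K2, 4))
print("exact I_1 =", fisher_info(beta, 1))
print("Monte Carlo:", mc)
```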


(3) (a) Consider the simple linear regression model $E(Y_i \mid x_i) = \beta_0 + \beta_1 x_i$ for $i = 1, \dots, n$, in which the $Y_i$ are independently distributed as $N(\beta_0 + \beta_1 x_i, \sigma^2)$ and $\sigma^2$ is unknown. Assume we have a sample of size $n = 6$ with $\mathrm{RSS} = 0.026$, $S_{yy} = \sum_{i=1}^{6}(y_i - \bar y)^2 = 0.54$ and $S_{xy} = \sum_{i=1}^{6}(x_i - \bar x)(y_i - \bar y) = -6$.

We want to test whether the expectation of the response variable ($Y$) depends linearly on the explanatory variable ($X$).
(i) Write down the null and alternative hypotheses and find the value of the t-test statistic for this test. Use four decimal places in the calculations.
[Note: $\mathrm{RSS} = S_{yy} - 2\hat\beta_1 S_{xy} + \hat\beta_1^2 S_{xx}$.]
[10 marks]
(ii) For the same test described in (a), find the value of another test statistic and specify its distribution under the null hypothesis.
[4 marks]
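For $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$, the usual statistics are $t = \hat\beta_1/\sqrt{\hat\sigma^2/S_{xx}}$ with $\hat\sigma^2 = \mathrm{RSS}/(n-2)$, which has a $t_{n-2}$ distribution under $H_0$, and $F = (S_{yy} - \mathrm{RSS})/\hat\sigma^2 = t^2$, which has an $F_{1,n-2}$ distribution. $S_{xx}$ is not given, but with $\hat\beta_1 = S_{xy}/S_{xx}$ the identity in the note reduces to $\mathrm{RSS} = S_{yy} - S_{xy}^2/S_{xx}$, which can be solved for it. The sketch below keeps full floating-point precision rather than rounding to four decimal places, so hand calculations may differ in the last digit; variable names are illustrative.

```python
import math

n, rss, syy, sxy = 6, 0.026, 0.54, -6.0     # summary values from part (a)

sxx = sxy ** 2 / (syy - rss)                 # from RSS = Syy - Sxy^2 / Sxx
beta1_hat = sxy / sxx                        # least squares slope
sigma2_hat = rss / (n - 2)                   # unbiased estimate of sigma^2
se_beta1 = math.sqrt(sigma2_hat / sxx)       # standard error of beta1_hat
t_stat = beta1_hat / se_beta1                # ~ t_{n-2} under H0: beta1 = 0
f_stat = (syy - rss) / sigma2_hat            # ~ F_{1, n-2} under H0

print(f"Sxx = {sxx:.4f}, beta1_hat = {beta1_hat:.4f}, t = {t_stat:.4f}, F = {f_stat:.4f}")
```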
(b) Consider the simple linear regression model $E(Y_i \mid x_i) = \beta_0 + \beta_1 x_i$ for $i = 1, \dots, n$, in which the $Y_i$ are independently distributed as $N(\beta_0 + \beta_1 x_i, \sigma^2)$. The value of $1 - \mathrm{RSS}/S_{yy}$ is calculated for this model, where RSS is the residual sum of squares and $S_{yy}$ is the total sum of squares about $\bar y$. What is this value called? What range of values does it take? Explain what it describes about the model.
[8 marks]
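The quantity $1 - \mathrm{RSS}/S_{yy}$ is the coefficient of determination $R^2$; for simple linear regression with an intercept it lies in $[0, 1]$ and equals the squared sample correlation $S_{xy}^2/(S_{xx}S_{yy})$. The sketch below fits a least squares line to a small made-up data set and checks that both forms agree; the data and the helper name `fit_line` are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> dict[str, float]:
    """Least squares fit of y = b0 + b1 x, returning the pieces used for R^2."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return {"b0": b0, "b1": b1, "rss": rss, "syy": syy,
            "r2": 1.0 - rss / syy,                 # coefficient of determination
            "corr2": sxy ** 2 / (sxx * syy)}       # squared sample correlation


fit = fit_line([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])  # made-up data
print(fit["r2"], fit["corr2"])
print("R^2 for the summary values in (a):", 1 - 0.026 / 0.54)
```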
(c) Consider the regression model

$$\log Y_i = \beta_0 + \beta_1 \frac{1}{x_i} + \epsilon_i,$$

in which $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. Apply the least squares estimation method and derive the estimates of the parameters $\beta_0$ and $\beta_1$.
[10 marks]
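Writing $z_i = 1/x_i$ and $w_i = \log y_i$ turns the model into a simple linear regression of $w$ on $z$, so minimising $\sum_i (w_i - \beta_0 - \beta_1 z_i)^2$ gives $\hat\beta_1 = S_{zw}/S_{zz}$ and $\hat\beta_0 = \bar w - \hat\beta_1\bar z$. The sketch below implements that, assuming all $x_i \neq 0$ and $y_i > 0$; the function name and the noise-free data are made up so that the true coefficients should be recovered.

```python
import math


def fit_log_reciprocal(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least squares estimates for log Y = b0 + b1 / x + error (needs x != 0, y > 0)."""
    zs = [1.0 / x for x in xs]            # transformed regressor z = 1/x
    ws = [math.log(y) for y in ys]        # transformed response w = log y
    n = len(zs)
    zbar, wbar = sum(zs) / n, sum(ws) / n
    szz = sum((z - zbar) ** 2 for z in zs)
    szw = sum((z - zbar) * (w - wbar) for z, w in zip(zs, ws))
    b1 = szw / szz
    b0 = wbar - b1 * zbar
    return b0, b1


# Made-up, noise-free data generated from b0 = 0.5, b1 = -2.0.
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
ys = [math.exp(0.5 - 2.0 / x) for x in xs]
print(fit_log_reciprocal(xs, ys))
```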

[End of Paper]
