
Master’s Written Examination

Option: Statistics and Probability Fall 2016

Full points may be obtained for correct answers to eight questions. Each numbered question
(which may have several parts) is worth the same number of points. All answers will be
graded, but the score for the examination will be the sum of the scores of your best eight
solutions.

Use separate answer sheets for each question. DO NOT PUT YOUR NAME ON
YOUR ANSWER SHEETS. When you have finished, insert all your answer sheets into
the envelope provided, then seal it.

Problem 1—Stat 401. Let A1, . . . , An be a sequence of events such that
(i) At least one of these events happens;
(ii) No more than two of these events happen at the same time.
Suppose P(Ai) = p for all 1 ≤ i ≤ n, and P(Ai ∩ Aj) = q for all i ̸= j. Prove: p ≥ 1/n, q ≤ 2/n.
Solution to Problem 1. By (i), 1 = P(Ω) = P(A1 ∪ · · · ∪ An) ≤ np, therefore p ≥ 1/n.
We prove the second claim by contradiction, using “⊔” to denote disjoint unions; by (ii),
no point lies in three or more of the events, so the intersections A1 ∩ A2, A3 ∩ A4, . . . are
mutually disjoint. Assume q > 2/n.

Case 1: n is even. Then 1 ≥ P[(A1 ∩ A2) ⊔ (A3 ∩ A4) ⊔ · · · ⊔ (An−1 ∩ An)] > (n/2) · (2/n) = 1.
Contradiction.

Case 2: n is odd. Then 1 ≥ P[(A1 ∩ A2) ⊔ (A3 ∩ A4) ⊔ · · · ⊔ (An−2 ∩ An−1) ⊔ An] >
((n − 1)/2) · (2/n) + p ≥ (n − 1)/n + 1/n = 1. Again a contradiction.

Problem 2—Stat 401. Let ξ, η be two independent standard normal random variables.
Find the PDF of ξ/η.
Solution to Problem 2. Let F (x) be the CDF of ξ/η. Then

    F (x) = P(ξ/η ≤ x)
          = P(ξ/η ≤ x, η > 0) + P(ξ/η ≤ x, η < 0)
          = P(ξ ≤ xη, η > 0) + P(ξ ≥ xη, η < 0)
          = 2P(ξ ≤ xη, η > 0)
          = 2 ∫∫_{t ≤ xs, s > 0} (1/(2π)) e^{−(s²+t²)/2} ds dt
          = (1/π) ∫_0^∞ ds ∫_{−∞}^{xs} e^{−(s²+t²)/2} dt.

Taking the derivative in x yields

    f (x) := F ′(x) = (1/π) ∫_0^∞ s e^{−s²(1+x²)/2} ds = 1/(π(1 + x²)),   for all x ∈ R,

which is the standard Cauchy density.
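As a quick sanity check (not part of the exam solution), a Monte Carlo sketch in Python compares the empirical CDF of ξ/η with the standard Cauchy CDF 1/2 + arctan(x)/π:

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.standard_normal(200_000)
eta = rng.standard_normal(200_000)
ratio = xi / eta  # ratio of two independent standard normals

# Standard Cauchy CDF: F(x) = 1/2 + arctan(x)/pi
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    empirical = np.mean(ratio <= x)
    theoretical = 0.5 + np.arctan(x) / np.pi
    assert abs(empirical - theoretical) < 0.01
```

The agreement at several points of the CDF is well within Monte Carlo error.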

Problem 3—Stat 411. Let X(1) < X(2) < . . . < X(n) be the order statistics of a random
sample of size n from the exponential distribution with pdf
    f (x | θ1 , θ2 ) = (1/θ2) e^{−(x−θ1)/θ2},   x ≥ θ1.

(i) Derive the MLE of (θ1 , θ2 ).

(ii) Show that Z2 = (n − 1)(X(2) − X(1)), Z3 = (n − 2)(X(3) − X(2)), . . . , Zn = X(n) − X(n−1)
are independent and each Zi has the same exponential distribution.

Solution to Problem 3.
(i) The joint density of X(1) < X(2) < . . . < X(n) is

        g(x(1) , . . . , x(n) ) = (n!/θ2^n) exp{−(∑_{i=1}^n x(i) − nθ1)/θ2} · I{x(1) ≥ θ1, x(i+1) > x(i), i = 1, . . . , n − 1}.

    For given θ1, g(x(1) , . . . , x(n) ) is maximized over θ2 at θ2 = (∑_{i=1}^n x(i) − nθ1)/n, and the
    maximum value is

        n! exp(−n) / ((∑_{i=1}^n x(i) − nθ1)/n)^n · I{x(1) ≥ θ1, x(i+1) > x(i), i = 1, . . . , n − 1}.

    This maximum value is an increasing function of θ1. Since θ1 ≤ x(1), it is maximized at
    θ1 = x(1). Hence the MLE of (θ1 , θ2 ) is (θ̂1 = X(1), θ̂2 = (∑_{i=1}^n X(i) − nX(1))/n).
(ii) The joint density of X(1) < X(2) < . . . < X(n) is given above. Let Z1 = nX(1). Then

        X(1) = Z1/n,
        X(2) = Z2/(n − 1) + Z1/n,
        X(3) = Z3/(n − 2) + Z2/(n − 1) + Z1/n,
        . . .
        X(n) = Zn + · · · + Z2/(n − 1) + Z1/n.

    The Jacobian matrix (∂X(i)/∂Zj) is lower triangular with diagonal entries 1/n, 1/(n − 1),
    . . . , 1, so its determinant is J = 1/n!. Note also that ∑_{i=1}^n x(i) = ∑_{i=1}^n zi, since Zm
    appears in exactly n − m + 1 of the X(i)’s, each time with coefficient 1/(n − m + 1).
    Therefore the joint density of Z1 , . . . , Zn is

        f (z1 , . . . , zn ) = |J| g(x(1) , . . . , x(n) ) = (1/θ2^n) exp{−(∑_{i=1}^n zi − nθ1)/θ2} I{z1 ≥ nθ1, zi > 0, i = 2, . . . , n}.

    On the other hand, the density function of X(1) is

        (n/θ2) e^{−n(x(1) − θ1)/θ2} I{x(1) ≥ θ1},

    so the density function of Z1 = nX(1) is f (z1 ) = (1/θ2) e^{−(z1 − nθ1)/θ2} I{z1 ≥ nθ1}. Consequently

        f (z1 , . . . , zn ) = f (z1 ) · (1/θ2^{n−1}) exp{−(∑_{i=2}^n zi)/θ2} I{zi > 0, i = 2, . . . , n}.

    This factorization shows that Z2 , . . . , Zn are independent (of each other and of Z1), each
    with the exponential distribution of mean θ2.
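A quick simulation check of this spacing result (a sketch with arbitrarily chosen values θ1 = 2, θ2 = 3, n = 5, which are not from the problem): the normalized spacings should each have mean θ2 and be mutually uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
theta1, theta2, n, reps = 2.0, 3.0, 5, 100_000

# Samples from the shifted exponential with location theta1 and scale theta2
x = theta1 + rng.exponential(theta2, size=(reps, n))
xs = np.sort(x, axis=1)  # order statistics X_(1) < ... < X_(n)

# Normalized spacings Z_i = (n - i + 1)(X_(i) - X_(i-1)), i = 2, ..., n
z = np.arange(n - 1, 0, -1) * np.diff(xs, axis=1)

# Each Z_i should be exponential with mean theta2, and pairwise uncorrelated
assert np.allclose(z.mean(axis=0), theta2, atol=0.1)
off_diag = np.corrcoef(z, rowvar=False) - np.eye(n - 1)
assert np.max(np.abs(off_diag)) < 0.02
```

Zero correlation is of course weaker than independence, but it is a cheap necessary check.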

Problem 4—Stat 411. Let X1 , X2 , . . . , Xn denote a random sample from the normal
distribution N (θ, 1), where −∞ < θ < ∞ .

(a) Show that Y = ∑_{i=1}^n Xi is a complete sufficient statistic for θ.

(b) Determine the MVUE of θ2 .

(c) A constant b is defined by the equation P (X1 ≤ b) = 0.95. Find the MVUE of b.
(For your reference, P (Z ≤ 1.645) = 0.95 if Z ∼ N (0, 1).)

Solution to Problem 4.

(a) The density function of Xi is

        f (x; θ) = (1/√(2π)) exp{−(x − θ)²/2} = exp{x · θ − θ²/2 − x²/2 − log √(2π)}

    with x ∈ (−∞, ∞), which belongs to the regular exponential class with K(x) ≡ x.
    Therefore,

        Y = ∑_{i=1}^n Xi = ∑_{i=1}^n K(Xi)

    is complete and sufficient for θ.

(b) Since the Xi are i.i.d. N (θ, 1), Y ∼ N (nθ, n), so

        E(Y ²) = Var(Y ) + [E(Y )]² = n + (nθ)² = n + n²θ².

    Let T = (Y ² − n)/n². Then E(T ) = θ².

    Since Y is complete sufficient and T is a function of Y that is unbiased for θ², T is
    the MVUE of θ².

(c) Since 0.95 = P (X1 ≤ b) = P (X1 − θ ≤ b − θ), then b − θ = 1.645.


Let U = Y /n + 1.645, a function of Y . Since E(U ) = nθ/n + 1.645 = θ + 1.645 = b,
U is the MVUE of b.
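A simulation check of (b) and (c) (a sketch with arbitrary θ = 1.5 and n = 10, chosen only for illustration): both T and U should be unbiased.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 1.5, 10, 200_000

x = rng.normal(theta, 1.0, size=(reps, n))
y = x.sum(axis=1)            # complete sufficient statistic, Y ~ N(n*theta, n)
t = (y**2 - n) / n**2        # candidate MVUE of theta^2
u = y / n + 1.645            # candidate MVUE of b = theta + 1.645

assert abs(t.mean() - theta**2) < 0.02          # E(T) = theta^2 = 2.25
assert abs(u.mean() - (theta + 1.645)) < 0.01   # E(U) = b
```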

Problem 5—Stat 411. Let X1 , . . . , Xn be iid samples from an exponential distribution


with density function
fθ (x) = (1/θ)e−x/θ , x > 0, θ > 0.
The goal is to test H0 : θ = θ0 versus H1 : θ > θ0 , where θ0 is some fixed value.

1. Suppose that the original data are corrupted somehow and all that you are left with
are the binary data Y1 , . . . , Yn , where
    Yi = I{Xi ≤ c} = 1 if Xi ≤ c, and 0 if Xi > c,   i = 1, . . . , n,

and c > 0 is some fixed cutoff, e.g., c = 7. How would you test H0 versus H1 based on
only these corrupted data? Give sufficient details about the null distribution that you
would use to make the test achieve a given size.
2. It is intuitively clear that the corrupted data is not as good as the original data. Make
this intuition precise by explaining how the power of your test based on the corrupted
data compares to the power of a test based on the original data.
Solution to Problem 5.
1. It is easy to see that Y1 , . . . , Yn are iid Ber(p), where

       p = pθ = Pθ (X1 ≤ c) = 1 − e^{−c/θ}.

   Since pθ is a decreasing function of θ, the stated hypothesis testing problem is equivalent
   to testing

       H0′ : p = pθ0   vs.   H1′ : p < pθ0

   based on the corrupted data. For this problem, the most powerful test is of the form

       reject H0′ iff ∑_{i=1}^n Yi < k,

   where k is chosen to control the size of the test. A conservative test can be derived
   from the binomial distribution of ∑_{i=1}^n Yi under the null; alternatively, if n is
   reasonably large, an approximate test can be based on the normal approximation to
   the binomial.
2. There is a uniformly most powerful test based on the original data. The test based
   on the corrupted data is, itself, a test based on the original data: given the original
   data, we could always corrupt it ourselves, if desired. So the Neyman–Pearson lemma
   implies that the most powerful test based on the original data is at least as powerful
   as the test based on the corrupted data at the same level; therefore, the corrupted
   data is not as good.
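A concrete version of the conservative test in part 1 (a sketch: the helper name and the illustrative data are mine, with c = 7 taken from the problem's example):

```python
import math

def corrupted_test(y, theta0, c, alpha=0.05):
    """Conservative exact test of H0: theta = theta0 vs H1: theta > theta0
    from binary data y_i = 1{X_i <= c}; rejects when S = sum(y) is small."""
    n, s = len(y), sum(y)
    p0 = 1.0 - math.exp(-c / theta0)       # P(X_1 <= c) under H0
    # Exact binomial p-value: P(S <= s) when S ~ Bin(n, p0)
    pval = sum(math.comb(n, j) * p0**j * (1 - p0)**(n - j)
               for j in range(s + 1))
    return pval, pval <= alpha

# Only 1 of 10 observations fell below c = 7: strong evidence that theta > 7
pval, reject = corrupted_test([1] + [0] * 9, theta0=7.0, c=7.0)
assert reject and pval < 0.01
```

Using P(S ≤ s) as the p-value makes the test conservative because of the discreteness of the binomial distribution.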

Problem 6—Stat 416. A researcher is interested in learning if a new drug is better than
a placebo in treating a certain disease. Because of the nature of the disease, only a limited
number of patients can be found. Out of these, five are randomly assigned to the placebo
and five to the new drug. Suppose that the concentration of a certain chemical in blood is
measured and smaller measurements are better and the data are:
Drug: 3.2, 2.1, 2.3, 1.2, 1.5
Placebo: 3.4, 3.5, 4.1, 1.7, 2.2

(a) Let random variables X and Y stand for the output of a randomly selected person
in the Drug group and Placebo group, respectively. Then the goal is to test if X is
stochastically smaller than Y . Interpret the definition of “stochastically smaller”.
(b) Calculate the Mann-Whitney U Test statistic on the given dataset. Specify the type
of rejection region (that is, reject H0 if U is too small/large/small or large).
(c) Name another test for the same purpose (no need to perform it). Compared with the
Mann-Whitney U Test in (a), which test do you prefer? Explain your reason(s).
Solution to Problem 6.
(a) X is stochastically smaller than Y if FX (x) ≥ FY (x) for all x and FX (x) > FY (x) for
some x. Here FX and FY stand for the cumulative distribution functions of X and Y ,
respectively.
(b) The Mann-Whitney U = #{(i, j) | Xi > Yj } = 5. We reject H0 if U is too small.
(c) Other tests include Wald-Wolfowitz runs test, K-S two-sample test, median test, control
median test, etc. Mann-Whitney U test is far preferable as a test of location since it
is more powerful.
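The count in part (b) can be reproduced directly (a quick sketch):

```python
drug = [3.2, 2.1, 2.3, 1.2, 1.5]
placebo = [3.4, 3.5, 4.1, 1.7, 2.2]

# U counts the (Drug, Placebo) pairs in which the Drug measurement is larger
u = sum(x > y for x in drug for y in placebo)
assert u == 5  # matches the hand count; a small U favors the alternative
```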

Problem 7—Stat 431. For a sample of size n = 2, let πi be the desired probability of
inclusion for unit i, with the usual constraint that ∑_{i=1}^N πi = n. Let ψi = πi/2 and
ai = ψi(1 − ψi)/(1 − πi). Draw the first unit with probability ai/∑_{k=1}^N ak of selecting
unit i. Supposing unit i is selected at the first draw, select the second unit j from the
remaining N − 1 units with probability ψj /(1 − ψi).

(i) Show that the probability that units i and j are both in the sample is

        πij = (ψi ψj / ∑_{k=1}^N ak) (1/(1 − πi) + 1/(1 − πj)).

(ii) Show that P (unit i is in the sample) = πi.

Solution to Problem 7.
(i)

    P (units i and j are both in the sample)
      = P (unit i drawn first and j drawn second) + P (unit j drawn first and i drawn second)
      = (ai/∑_{k=1}^N ak) · ψj/(1 − ψi) + (aj/∑_{k=1}^N ak) · ψi/(1 − ψj)
      = ψi(1 − ψi)ψj / [∑_{k=1}^N ak · (1 − ψi)(1 − πi)] + ψj(1 − ψj)ψi / [∑_{k=1}^N ak · (1 − ψj)(1 − πj)]
      = (ψi ψj / ∑_{k=1}^N ak) (1/(1 − πi) + 1/(1 − πj)).                                  (1)

(ii) Note first that ∑_{j=1}^N ψj = (1/2) ∑_{j=1}^N πj = n/2 = 1. Then

    P (unit i in the sample)
      = ∑_{j̸=i} πij
      = ∑_{j̸=i} (ψi ψj / ∑_{k=1}^N ak) (1/(1 − πi) + 1/(1 − πj))
      = (ψi / ∑_{k=1}^N ak) [ ∑_{j=1}^N ψj (1/(1 − πi) + 1/(1 − πj)) − 2ψi/(1 − πi) ]
      = (ψi / ∑_{k=1}^N ak) [ 1/(1 − πi) + ∑_{j=1}^N ψj/(1 − πj) − πi/(1 − πi) ]
      = (ψi / ∑_{k=1}^N ak) [ 1 + ∑_{j=1}^N ψj/(1 − πj) ].                                 (2)

    On the other hand, using πk = 2ψk and ∑_{k=1}^N ψk = 1, we can also show that

    2 ∑_{k=1}^N ak = ∑_{k=1}^N 2ψk(1 − ψk)/(1 − πk)
                   = ∑_{k=1}^N [ψk(1 − 2ψk) + ψk]/(1 − πk)
                   = ∑_{k=1}^N ψk(1 − πk)/(1 − πk) + ∑_{k=1}^N ψk/(1 − πk)
                   = 1 + ∑_{k=1}^N ψk/(1 − πk).                                            (3)

    Combining (2) and (3), P (unit i in the sample) = (ψi/∑_{k=1}^N ak) · 2 ∑_{k=1}^N ak = 2ψi = πi.
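The inclusion-probability claim in (ii) can also be checked by simulating the two-draw scheme (a sketch with an arbitrary population of N = 4 units; the π values below are mine, chosen to sum to n = 2):

```python
import numpy as np

rng = np.random.default_rng(3)
pi = np.array([0.6, 0.5, 0.5, 0.4])   # target inclusion probabilities, sum = n = 2
psi = pi / 2                           # the psi_i, which sum to 1
a = psi * (1 - psi) / (1 - pi)
N, reps = len(pi), 50_000

counts = np.zeros(N)
for _ in range(reps):
    i = rng.choice(N, p=a / a.sum())   # first draw
    q = psi / (1 - psi[i])
    q[i] = 0.0                         # second draw from the remaining units;
    j = rng.choice(N, p=q)             # q sums to 1 because sum(psi) = 1
    counts[i] += 1
    counts[j] += 1

# Empirical inclusion frequencies should match the target pi's
assert np.allclose(counts / reps, pi, atol=0.015)
```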

Problem 8—Stat 451. Set D = {(x, y) : |x| + |y| ≤ 1}, and suppose that (X, Y ) has the
uniform distribution on D, i.e., the joint density function satisfies

    f (x, y) = 1/2 if (x, y) ∈ D, and f (x, y) = 0 otherwise.

Write down a Gibbs sampler algorithm to simulate from f (x, y). (Hint: It may be helpful
to sketch a picture of the region D.)
Solution to Problem 8. The joint density is uniform, so the two conditional distributions
(X given Y and Y given X) must also be uniform, as their densities are proportional to the
joint density. But the ranges depend on the variable being conditioned on. Specifically, we have

    X | Y = y ∼ Unif(−(1 − |y|), 1 − |y|),
    Y | X = x ∼ Unif(−(1 − |x|), 1 − |x|).

Simulation from these two conditional distributions is straightforward, e.g., we can use runif
in R. Alternating between simulations from these two generates a sequence {(Xt , Yt ) : t ≥ 1},
a Markov chain, and the stationary distribution of the chain is the uniform distribution on
the region D.
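The solution mentions runif in R; the same alternating scheme in Python (a sketch, function name mine):

```python
import numpy as np

rng = np.random.default_rng(4)

def gibbs_uniform_diamond(n_iter, y=0.0):
    """Gibbs sampler targeting the uniform distribution on |x| + |y| <= 1."""
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.uniform(-(1 - abs(y)), 1 - abs(y))  # draw X | Y = y
        y = rng.uniform(-(1 - abs(x)), 1 - abs(x))  # draw Y | X = x
        out[t] = x, y
    return out

draws = gibbs_uniform_diamond(50_000)
assert np.all(np.abs(draws).sum(axis=1) <= 1.0)   # every draw stays in D
assert np.max(np.abs(draws.mean(axis=0))) < 0.03  # D is symmetric about 0
```

Each full sweep draws X from its conditional given the current Y, then Y given the new X, exactly as in the displayed conditionals.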

Problem 9—Stat 461. Show that a finite-state aperiodic irreducible Markov chain is
regular.

Solution to Problem 9. We must show that there is an M such that p_ij^(n) > 0 for every
pair of states i, j and all n ≥ M. Fix a pair i and j. Since the Markov chain is irreducible,
there exists an integer k with p_ij^(k) > 0; and, since the set {n : p_jj^(n) > 0} is closed
under addition with gcd d(j), there exists a large enough integer N such that

    p_jj^(n d(j)) > 0   for all n ≥ N,

where d(j) is the period of state j. Now recall that the Markov chain is also aperiodic,
hence d(j) = 1, so p_ij^(k+n) ≥ p_ij^(k) p_jj^(n) > 0 for all n ≥ N. Thus M_ij = k + N works
for this pair, and since the state space is finite we may take M = max_{i,j} M_ij.

Problem 10—Stat 461. Customers arrive at a facility according to a Poisson process
X(t) with rate λ. Each customer pays $1 on arrival, and it is desired to evaluate the
expected value of the total sum collected during the time (0, t], discounted back to time 0.
This quantity is given by

    M = E[ ∑_{k=1}^{X(t)} e^{−rWk} ],

where r is the discount rate, W1 , W2 , . . . are the arrival times, and X(t) is the total
number of arrivals in (0, t]. Evaluate M explicitly in terms of λ, r and t.

Solution to Problem 10. By the law of total expectation, we have

    M = ∑_{n=1}^∞ E[ ∑_{k=1}^n e^{−rWk} | X(t) = n ] P{X(t) = n}.            (4)

Let U1 , . . . , Un denote independent random variables that are uniformly distributed on (0, t].
Given X(t) = n, the arrival times (W1 , . . . , Wn) are distributed as the order statistics of
(U1 , . . . , Un); because the functional ∑_{k=1}^n exp{−rWk} is symmetric in its arguments,

    E[ ∑_{k=1}^n e^{−rWk} | X(t) = n ] = E[ ∑_{k=1}^n e^{−rUk} ]
                                       = n E e^{−rU1}
                                       = (n/(rt))(1 − e^{−rt}).

Substitution into (4) gives

    M = (1/(rt))(1 − e^{−rt}) ∑_{n=1}^∞ n P{X(t) = n}
      = (1/(rt))(1 − e^{−rt}) E X(t)
      = (λ/r)(1 − e^{−rt}).
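A Monte Carlo sketch confirms this closed form (the parameter values λ = 2, r = 0.5, t = 3 are arbitrary, chosen only for the check):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, r, t, reps = 2.0, 0.5, 3.0, 50_000

total = 0.0
for _ in range(reps):
    n = rng.poisson(lam * t)
    # Given X(t) = n, the arrival times are distributed as n iid Unif(0, t]
    w = rng.uniform(0.0, t, size=n)
    total += np.exp(-r * w).sum()

m_hat = total / reps
m_theory = lam / r * (1 - np.exp(-r * t))  # (lambda/r)(1 - e^{-rt})
assert abs(m_hat - m_theory) < 0.05
```

The simulation exploits the same order-statistics fact used in the derivation: the discounted sum is symmetric in the arrival times, so unordered uniforms suffice.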

Problem 11—Stat 481. Observations: for k brands of beer, researchers recorded the
sodium content of n 12-ounce bottles. Denote the measurement responses by Yij , i =
1, ..., k; j = 1, ..., n. Questions of interest: what is the “grand mean” sodium content?
How much variability is there from brand to brand?
(1). Given the research objective and observations, what statistical model and basic
assumptions do you think appropriate for the analysis? What are your hypotheses about the
variability from brand to brand?
(2). Derive E (M ST R) .
(3). Derive the sampling distribution of the overall sample mean Ȳ·· and construct the
confidence interval for the grand mean µ.
Solution to Problem 11. (1). One-way random effects model Yij = µ + αi + εij , i =
1, ..., k; j = 1, ..., n, where αi ∼ i.i.d. N (0, σα²) and εij ∼ i.i.d. N (0, σ²) are two independent
random components in the model. The hypotheses for the variability are H0 : σα² = 0 vs H1 : σα² > 0.
(2). Note that αi ∼ i.i.d. N (0, σα²) and ε̄i· ∼ i.i.d. N (0, σ²/n). Then

    E (SSTR) = E[ ∑_{i=1}^k ∑_{j=1}^n (Ȳi· − Ȳ··)² ] = E[ n ∑_{i=1}^k (αi + ε̄i· − ᾱ − ε̄··)² ]
             = n [ ∑_{i=1}^k E (αi − ᾱ)² + ∑_{i=1}^k E (ε̄i· − ε̄··)² ]
             = n [ (k − 1) σα² + (k − 1) σ²/n ] = n (k − 1) σα² + (k − 1) σ²,

    E (MSTR) = E (SSTR/(k − 1)) = nσα² + σ².

(3). The overall sample mean satisfies

    Ȳ·· = µ + ᾱ + ε̄·· ∼ N (µ, σα²/k + σ²/(kn)) = N (µ, (σ² + nσα²)/(kn)).

From (2), E (MSTR) = nσα² + σ², and since DF (SSTR) = k − 1,

    (Ȳ·· − µ) / √(MSTR/(kn)) ∼ t (k − 1).

Then the confidence interval for the grand mean µ is

    Ȳ·· ± t_{α/2} (k − 1) √(MSTR/(kn)).

Problem 12—Stat 481. In forestry, the diameter of a tree at breast height (which is
fairly easy to measure) is used to predict the height of a tree (a difficult measurement to
obtain). Silviculturists working in British Columbia’s boreal forest conducted a series of
spacing trials to predict the heights of several species of trees. The data set whitespruce.txt
contains the breast height diameters (in centimeters, xi ) and heights (in meters, Yi ) for a
random sample of 36 white spruce trees.
(1). Consider a linear regression model, write down the model and assumptions.
(2). Based on the least squares estimation, β̂0 = 9.15, β̂1 = 0.48, and the standard errors
are se(β̂0 ) = 1.12, se(β̂1 ) = 0.06. Is there sufficient evidence to conclude that there is a linear
association between breast height diameter and tree height? [t0.005 (34) = 2.73, t0.025 (34) = 2.02.]
(3). Derive E (SSR) given that β̂1 ∼ N (β1 , σ²/Sxx ), where σ² is the error variance, and
Sxx = ∑(xi − x̄)².
(4). Complete the ANOVA table given that SSE = 95.7, Sxx = 795.3. Will you reach the same
decision as in (2)? [F0.01 (1, 34) = 7.44, F0.05 (1, 34) = 4.13.]

Solution to Problem 12. (1). Consider the simple linear regression Yi = β0 + β1 xi + εi , with
i.i.d. errors εi ∼ N (0, σ²), i = 1, ..., n.
(2). To test H0 : β1 = 0 vs H1 : β1 ̸= 0, under H0 ,

    tβ̂1 = β̂1 / se(β̂1 ) ∼ t (n − 2),   t0 = 0.48/0.06 = 8 > t0.005 (34) = 2.73,

which implies that the p-value < 0.01, hence reject H0 . Alternatively, construct a confidence
interval for β1 , β̂1 ± tα/2 (n − 2) se(β̂1 ): the 95% C.I. for β1 is 0.48 ± 2.02 × 0.06 =
(0.36, 0.60), which doesn’t include 0.
(3). The predicted value is Ŷi = β̂0 + β̂1 xi = Ȳ + β̂1 (xi − x̄), which leads to

    SSR = ∑_{i=1}^n (Ŷi − Ȳ)² = β̂1² ∑_{i=1}^n (xi − x̄)² = β̂1² · Sxx ,

    E (SSR) = E (β̂1²) Sxx = (β1² + σ²/Sxx ) Sxx = β1² Sxx + σ²,

since E (β̂1²) = [E (β̂1 )]² + Var(β̂1 ) and β̂1 ∼ N (β1 , σ²/Sxx ).
(4). We have SSR = β̂1² · Sxx = 0.48² × 795.3 = 183.24. ANOVA table:

    Source of Variation      SS      DF      MS       F     p-value
    Regression             183.24     1    183.24    65.1   < 0.01
    Error                   95.70    34      2.81
    Total                  278.94    35

We can reach the same decision of rejecting H0 as in (2).
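The table entries can be reproduced from the given summaries (a quick arithmetic check):

```python
# Reproducing the ANOVA quantities from the given summaries
beta1_hat, sxx, sse, n = 0.48, 795.3, 95.7, 36

ssr = beta1_hat**2 * sxx     # regression sum of squares
mse = sse / (n - 2)          # error mean square on 34 df
f = ssr / mse                # F statistic on (1, 34) df

assert round(ssr, 2) == 183.24
assert round(f, 1) == 65.1   # far above F_0.01(1, 34) = 7.44, so reject H0
```

Note that F = t0² up to rounding (8² = 64 vs 65.1, the gap due to the rounded se), which is why (2) and (4) must agree.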
