Full points may be obtained for correct answers to eight questions. Each numbered question
(which may have several parts) is worth the same number of points. All answers will be
graded, but the score for the examination will be the sum of the scores of your best eight
solutions.
Use separate answer sheets for each question. DO NOT PUT YOUR NAME ON
YOUR ANSWER SHEETS. When you have finished, insert all your answer sheets into
the envelope provided, then seal it.
To earn full credit, you must show all the steps used to obtain your answer.
Problem 1—Stat 401. Suppose $X_1$ and $X_2$ are independent, and $X_i$ has p.d.f.
\[ f(x_i) = \lambda_i e^{-\lambda_i x_i}, \quad x_i \ge 0, \ i = 1, 2. \]
(i) Show that for each $x > 0$, we have $P(T > x) = e^{-(\lambda_1 + \lambda_2)x}$.
Solution to Problem 1.
(i) $P(T > x) = P(X_1 > x, X_2 > x) = P(X_1 > x)P(X_2 > x) = e^{-(\lambda_1 + \lambda_2)x}$.
Therefore,
\[ E(T) = \int_0^\infty x\, p_T(x)\, dx = (\lambda_1 + \lambda_2)^{-1}. \]
(iii) Let $p_{X_1,X_2}(x_1, x_2)$ be the joint p.d.f. of $X_1, X_2$. Elementary calculation gives
\[ P(T = X_1) = P(X_1 \le X_2) = \int_0^\infty \int_{x_1}^\infty p_{X_1,X_2}(x_1, x_2)\, dx_2\, dx_1 = \frac{\lambda_1}{\lambda_1 + \lambda_2}. \]
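The two closed forms above can be sanity-checked by simulation. The sketch below reads T as min(X1, X2), which is what the first step of the solution implies; the rates 2 and 3, the seed, and the function name are illustrative assumptions and are not taken from the exam.

```python
import random

def simulate_min_exponentials(lam1, lam2, n_draws=200_000, seed=0):
    """Monte Carlo estimates of E(T) and P(T = X1), with T = min(X1, X2)."""
    rng = random.Random(seed)
    total_t = 0.0
    x1_is_min = 0
    for _ in range(n_draws):
        x1 = rng.expovariate(lam1)  # exponential draw with rate lam1
        x2 = rng.expovariate(lam2)  # exponential draw with rate lam2
        total_t += min(x1, x2)
        if x1 <= x2:
            x1_is_min += 1
    return total_t / n_draws, x1_is_min / n_draws

lam1, lam2 = 2.0, 3.0  # hypothetical rates chosen only for illustration
mean_t, prob_x1 = simulate_min_exponentials(lam1, lam2)

# Compare the simulated values with the closed forms from the solution.
print(mean_t, 1 / (lam1 + lam2))
print(prob_x1, lam1 / (lam1 + lam2))
```

With these rates the closed forms are 0.2 and 0.4, and the simulated values should land close to them.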
Problem 2—Stat 401. Suppose $X_1$ and $X_2$ have joint p.d.f. $f_{X_1,X_2}(x_1, x_2) = 2e^{-(x_1 + x_2)}$, $0 < x_1 < x_2 < \infty$; zero elsewhere.
Solution to Problem 2.
(a)
\[ f_{X_2}(x_2) = \int_0^{x_2} f_{X_1,X_2}(x_1, x_2)\, dx_1 = 2e^{-x_2}(1 - e^{-x_2}), \quad 0 < x_2 < \infty. \]
(b) The conditional density of $X_1$ given $X_2 = 3$ is
\[ \left. \frac{f_{X_1,X_2}(x_1, x_2)}{f_{X_2}(x_2)} \right|_{x_2 = 3} = \frac{e^{-x_1}}{1 - e^{-3}}, \quad 0 < x_1 < 3. \]
(c) Clearly $|J| = 1$. Notice that $0 < X_1 < X_2$, so we have $Y_1 > 2Y_2 > 0$. Thus the joint p.d.f. of $Y_1, Y_2$ is
\[ p_{Y_1,Y_2}(y_1, y_2) = 2e^{-y_1}, \quad 0 < 2y_2 < y_1 < \infty. \]
Problem 3—Stat 401. At the beginning of a study of individuals, 15% were classified as
heavy smokers, 30% were classified as light smokers, and 55% were classified as nonsmokers.
In the five-year study, it was determined that the death rate of the heavy smoker was 5%,
the death rate of the light smoker was 3%, and the death rate of the nonsmoker was 1%. A
randomly selected participant died over the five-year period. Calculate the probability that
this participant was a nonsmoker.
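Let $D$ denote death during the study and let $H$, $L$, $N$ denote heavy smoker, light smoker and nonsmoker. By Bayes's theorem,
\[ P(N \mid D) = \frac{P(D \mid N) \cdot P(N)}{P(D \mid H) \cdot P(H) + P(D \mid L) \cdot P(L) + P(D \mid N) \cdot P(N)} = \frac{1\% \cdot 55\%}{5\% \cdot 15\% + 3\% \cdot 30\% + 1\% \cdot 55\%} = 0.25. \]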
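The arithmetic in the Bayes calculation can be reproduced exactly with rational numbers. This is a minimal sketch; the dictionary keys and the function name are illustrative choices, while the percentages are those stated in the problem.

```python
from fractions import Fraction

# Prior class proportions and five-year death rates from the problem statement.
prior = {"heavy": Fraction(15, 100), "light": Fraction(30, 100), "non": Fraction(55, 100)}
death_rate = {"heavy": Fraction(5, 100), "light": Fraction(3, 100), "non": Fraction(1, 100)}

def posterior(group):
    """P(group | death) by Bayes's theorem over the three smoking classes."""
    total = sum(prior[g] * death_rate[g] for g in prior)  # P(death)
    return prior[group] * death_rate[group] / total

p_non = posterior("non")
print(p_non, float(p_non))  # 1/4 0.25
```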
Problem 4—Stat 401. Let $X_1, X_2, X_3$ be a random sample from a distribution having density function $f(x) = 2x$, $0 < x < 1$, and zero otherwise. Let $X_{(3)} = \max(X_1, X_2, X_3)$ and $X_{(1)} = \min(X_1, X_2, X_3)$. Compute $E(X_{(3)} - X_{(1)})$.
Solution to Problem 4. Let $F(x)$ be the cumulative distribution function corresponding to the density function $f(x)$. We have $F(x) = 0$ when $x \le 0$, $F(x) = x^2$ when $0 < x \le 1$, and $F(x) = 1$ when $x > 1$. Using the formula for the density of order statistics, the density function of $X_{(3)}$ is $6x^5$ and the density function of $X_{(1)}$ is $6x(1 - x^2)^2$, each on $0 < x < 1$. Direct computation shows that $E(X_{(3)}) = 6/7$ and $E(X_{(1)}) = 16/35$. Thus $E(X_{(3)} - X_{(1)}) = 14/35 = 2/5$.
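The "direct computation" amounts to integrating polynomials over the unit interval, which can be checked with exact fractions. In the sketch below a polynomial is stored as a dictionary from exponent to coefficient; that representation and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def integrate_on_unit_interval(poly):
    """Integrate sum(c * x**k) over [0, 1]; poly maps exponent k to coefficient c."""
    return sum(Fraction(c, k + 1) for k, c in poly.items())

def times_x(poly):
    """Multiply a polynomial by x, i.e. shift every exponent up by one."""
    return {k + 1: c for k, c in poly.items()}

density_max = {5: 6}                 # 6x^5, density of X(3)
density_min = {1: 6, 3: -12, 5: 6}   # 6x(1 - x^2)^2 expanded, density of X(1)

# Both densities should integrate to one.
assert integrate_on_unit_interval(density_max) == 1
assert integrate_on_unit_interval(density_min) == 1

e_max = integrate_on_unit_interval(times_x(density_max))
e_min = integrate_on_unit_interval(times_x(density_min))
expected_range = e_max - e_min
print(e_max, e_min, expected_range)  # 6/7 16/35 2/5
```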
Solution to Problem 5.
\[ \frac{\partial \log L(\mu, \sigma^2)}{\partial \mu} = 0 \quad \text{and} \quad \frac{\partial \log L(\mu, \sigma^2)}{\partial \sigma^2} = 0, \]
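(b) $P(X_1 \le b) = P\left(Z \le \frac{b - \mu}{\sigma}\right) = 0.975$. Let $z^*$ be the real number such that $P(Z \le z^*) = 0.975$. Then $\frac{b - \mu}{\sigma} = z^*$ and $b = \mu + z^* \sigma$. Hence, $\hat{b}$, the MLE of $b$, is $\hat{b} = \hat{\mu} + z^* \sqrt{\widehat{\sigma^2}}$.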
Problem 6—Stat 411. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Rayleigh distribution with parameter $\tau > 0$ with pdf given by $f_\tau(x) = \tau x e^{-\frac{1}{2} \tau x^2} I_{[0,\infty)}(x)$.
(a) Is the above Rayleigh distribution an exponential family distribution? Justify your
answer.
(b) Find a minimal sufficient statistic for τ .
(c) What is the distribution of $U = e^{-\frac{1}{2} \tau X_1^2}$?
Solution to Problem 6.
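(a) Yes, by definition: the density can be written as $f_\tau(x) = x I_{[0,\infty)}(x) \cdot \tau \cdot \exp\{-\frac{1}{2} \tau x^2\}$, which has the exponential family form $h(x) c(\tau) \exp\{w(\tau) t(x)\}$ with $w(\tau) = -\tau/2$ and $t(x) = x^2$.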
(b) The joint density function of $X_1, X_2, \ldots, X_n$ is given by $\prod_{i=1}^n \tau x_i e^{-\frac{1}{2} \tau x_i^2} I_{[0,\infty)}(x_i)$, which can be factorized as
\[ \left( \prod_{i=1}^n x_i I_{[0,\infty)}(x_i) \right) (\tau^n)\, e^{-\frac{1}{2} \tau \sum_{i=1}^n x_i^2}. \]
By the factorization theorem, $T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n X_i^2$ is a sufficient statistic. It is also minimal sufficient since $\sum_{i=1}^n X_i^2$ is complete sufficient.
Problem 8—Stat 411. Let X1 , . . . , Xn be a random sample from Uniform(0, θ) with
θ > 0. In order to test H0 : θ = θ0 against H1 : θ > θ0 , suppose we consider a test that
rejects H0 if and only if Mn > c, where Mn = max{X1 , . . . , Xn }.
(a) Find c > 0 such that the size of the test is α ∈ (0, 1).
(b) Find the power function γn (θ) for the size-α test.
(c) Suppose θ0 = 1 and α = 0.05. Find the smallest sample size n such that γn (2) ≥ 0.9.
Solution to Problem 8.
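(a) The cumulative distribution function of $M_n$ is
\[ F_\theta(x) = P_\theta(M_n \le x) = P_\theta(X_1 \le x)^n = \left( \frac{x}{\theta} \right)^n \]
for $x \in [0, \theta]$. For a size-$\alpha$ test,
\[ \alpha = P_{\theta_0}(M_n > c) = 1 - F_{\theta_0}(c) = 1 - \left( \frac{c}{\theta_0} \right)^n. \]
Then $c = \theta_0 (1 - \alpha)^{1/n}$.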
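(b) The power function at $\theta$ is
\[ \gamma_n(\theta) = P_\theta(M_n > c) = 1 - \left( \frac{\theta_0 (1 - \alpha)^{1/n}}{\theta} \right)^n = 1 - (1 - \alpha) \left( \frac{\theta_0}{\theta} \right)^n. \]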
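The power formula in (b) can be evaluated directly to answer part (c), whose worked solution does not appear here. The sketch below searches upward from n = 1; the function names are illustrative, and the zero-power branch for values of theta at or below c is an addition that follows from the support of Uniform(0, theta) rather than something stated above.

```python
def critical_value(theta0, alpha, n):
    """Cut-off c = theta0 * (1 - alpha)**(1/n) from part (a)."""
    return theta0 * (1 - alpha) ** (1 / n)

def power(theta, theta0, alpha, n):
    """Power of the test that rejects when the sample maximum exceeds c."""
    c = critical_value(theta0, alpha, n)
    if theta <= c:
        return 0.0  # the maximum cannot exceed c when theta <= c
    return 1 - (1 - alpha) * (theta0 / theta) ** n

def smallest_n(theta, theta0, alpha, target):
    """Smallest n with power at least `target`; assumes theta > theta0 so the loop ends."""
    n = 1
    while power(theta, theta0, alpha, n) < target:
        n += 1
    return n

n_min = smallest_n(theta=2.0, theta0=1.0, alpha=0.05, target=0.9)
print(n_min, power(2.0, 1.0, 0.05, n_min))
```

Under these inputs the search stops at n = 4, since the power is 1 - 0.95/8 = 0.88125 at n = 3 and 1 - 0.95/16 = 0.940625 at n = 4.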
where eij are independent and identically distributed with N (0, σ 2 ) and σ 2 is unknown. The
parameters in our model are µ, θ1 and θ2 . For your convenience, you may use the following
notation to answer questions:
\[ S_y = \sum_{i=1}^{3} \sum_{j=1}^{3} y_{ij}, \quad S_{jy} = \sum_{i=1}^{3} \sum_{j=1}^{3} j\, y_{ij} \quad \text{and} \quad S_{j^2 y} = \sum_{i=1}^{3} \sum_{j=1}^{3} (j - 1)^2 y_{ij}. \]
(a) Let $Y = (y_{11}, \cdots, y_{13}, y_{21}, \cdots, y_{23}, y_{31}, \cdots, y_{33})^T$, $e = (e_{11}, \cdots, e_{13}, e_{21}, \cdots, e_{23}, e_{31}, \cdots, e_{33})^T$ and $\beta = (\mu, \theta_1, \theta_2)^T$. Specify the corresponding design matrix $X$ in the linear model of matrix form $Y = X\beta + e$ for the above model and derive the least squares estimates of $\mu$, $\theta_1$ and $\theta_2$. Note that the inverse of $X^T X$ is
\[ (X^T X)^{-1} = 6^{-1} \begin{pmatrix} 21 & -16 & 7 \\ -16 & 13 & -6 \\ 7 & -6 & 3 \end{pmatrix}. \]
(b) Compute the variances of the least squares estimates (LSE) for θ1 and θ2 , and their
covariance.
(c) Construct a 95% confidence interval for θ1 − θ2 based on the LSEs θ̂1 , θ̂2 of θ1 , θ2 , the
sample variance estimate of σ 2 , and the t distribution.
(b) Because the variance of β̂ is σ 2 (XT X)−1 , the variances of θ̂1 and θ̂2 , and their covariance,
are respectively,
where
\[ \hat{\sigma}^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \{ y_{ij} - \hat{\mu} - j \hat{\theta}_1 - (j - 1)^2 \hat{\theta}_2 \}^2 / 6. \]
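The entries of the inverse matrix needed for part (b) can be recomputed exactly. The sketch below assumes the mean structure mu + j*theta1 + (j-1)^2*theta2 for i, j = 1, 2, 3, which is read off the residual in the formula for the variance estimate because the model statement itself is not shown here; the helper names are illustrative. All results are multiples of sigma^2.

```python
from fractions import Fraction

def design_rows():
    # Assumed model: E(y_ij) = mu + j*theta1 + (j-1)^2*theta2, i, j = 1..3.
    return [[1, j, (j - 1) ** 2] for i in range(1, 4) for j in range(1, 4)]

def gram(rows):
    """Compute X^T X from the rows of X."""
    p = len(rows[0])
    return [[Fraction(sum(r[a] * r[b] for r in rows)) for b in range(p)] for a in range(p)]

def inverse(m):
    """Gauss-Jordan inverse with exact fractions (m must be nonsingular)."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

xtx_inv = inverse(gram(design_rows()))
var_theta1 = xtx_inv[1][1]    # Var(theta1-hat) / sigma^2
var_theta2 = xtx_inv[2][2]    # Var(theta2-hat) / sigma^2
cov_theta12 = xtx_inv[1][2]   # Cov(theta1-hat, theta2-hat) / sigma^2
var_diff = var_theta1 + var_theta2 - 2 * cov_theta12  # Var(theta1-hat - theta2-hat) / sigma^2
print(var_theta1, var_theta2, cov_theta12, var_diff)  # 13/6 1/2 -1 14/3
```

Under that assumed model the recomputed inverse agrees with the matrix given in the problem, and the last value is the variance factor a confidence interval for the difference of the two slopes would use.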
Problem 10—Stat 481. Some island has 1,000 male and 1,000 female inhabitants. An
investigator wants to know if males spend more minutes than females on the phone each
month. He samples 10 males and 10 females and asks them their monthly phone minutes.
The sample means and sample standard deviations/variances are shown below.
(a) In order to test if the monthly mean minutes spent by males is higher than the mean
minutes by females, what assumptions do you need for the distributions of the two
populations? Specify the hypotheses for your test.
(b) Perform an appropriate test for checking if the variances of the two populations are
the same and draw your conclusion at 5% significance level. For your reference, some
F -quantiles are F0.025 (9, 9) = 0.248, F0.975 (9, 9) = 4.026.
(c) Assuming that the variances of the two populations are the same, perform an appropriate
test to check if the mean of males is higher than the mean of females and
draw your conclusion at the 5% significance level. For your reference, some t-quantiles are
t0.95 (18) = 1.734, t0.975 (18) = 2.101.
Solution to Problem 10. (a) We may assume that (1) the monthly minutes of the females, $X_1, \ldots, X_{10}$, are iid from $N(\mu_1, \sigma_1^2)$; (2) the monthly minutes of the males, $Y_1, \ldots, Y_{10}$, are iid from $N(\mu_2, \sigma_2^2)$; and (3) the $X_i$'s and $Y_j$'s are independent. The hypotheses are: $H_0: \mu_1 = \mu_2$; $H_1: \mu_1 < \mu_2$.
(b) In order to test if $\sigma_1^2 = \sigma_2^2$, we may use an F-test with test statistic $F = s_X^2 / s_Y^2 = 224.16/190.95 = 1.17$. If $\sigma_1^2 = \sigma_2^2$, then $F \sim F(9, 9)$, whose central 95% region is $(0.248, 4.026)$. Since 1.17 falls inside this region, we do not reject the hypothesis that the two variances are the same at the 5% significance level.
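(c) Assuming $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the pooled variance is $s_p^2 = (9 s_X^2 + 9 s_Y^2)/18 = 207.56$, and the test statistic is
\[ T = \frac{\bar{Y} - \bar{X}}{\sqrt{s_p^2 \left( \frac{1}{10} + \frac{1}{10} \right)}} = \frac{106.61 - 99.44}{\sqrt{207.56/5}} = 1.113. \]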
Under $H_0$, $T \sim t(18)$ with one-sided critical value $t_{0.95}(18) = 1.734$. Since $1.113 < 1.734$, we do not reject $H_0$. That is, the data do not provide sufficient evidence that the male mean is higher than the female mean.
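Both tests in this solution reduce to a few lines of arithmetic. The sketch below uses the summary statistics quoted in the solution text (the data table itself does not appear here) together with the reference quantiles given in the problem; the variable names are illustrative.

```python
import math

n_x = n_y = 10
mean_x, var_x = 99.44, 224.16    # females: sample mean and variance quoted in the solution
mean_y, var_y = 106.61, 190.95   # males: sample mean and variance quoted in the solution

# (b) F-test for equal variances, two-sided at the 5% level.
f_stat = var_x / var_y
f_lo, f_hi = 0.248, 4.026        # F quantiles for (9, 9) given in the problem
reject_equal_var = not (f_lo < f_stat < f_hi)

# (c) Pooled two-sample t-test, one-sided (males higher than females).
pooled_var = ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)
t_stat = (mean_y - mean_x) / math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
t_crit = 1.734                   # t_0.95(18) given in the problem
reject_equal_means = t_stat > t_crit

print(round(f_stat, 2), reject_equal_var)
print(round(pooled_var, 2), round(t_stat, 3), reject_equal_means)
```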
Problem 11—Stat 481. A researcher studied the sodium content in beer by selecting
six brands from the large number of brands of U.S. and Canadian beers. The researcher
then chose eight 12-ounce cans or bottles of each selected brand at random and measured
the sodium content Y (in mg).
(1) Write down an appropriate statistical model and the necessary assumptions for the model.
What are the hypotheses for this study?
(2) Complete the following ANOVA table and then state your conclusion at level α = 0.05. For
your reference, F0.05 (5, 42) = 2.44, F0.05 (6, 41) = 2.33.
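Solution to Problem 11. (1). One-way ANOVA with random effect
\[ Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \ldots, 6; \ j = 1, \ldots, 8, \]
where the errors $\varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$ are independent of the random effects $\tau_i \overset{iid}{\sim} N(0, \sigma_\tau^2)$.
Hypotheses: $H_0: \sigma_\tau^2 = 0$ vs $H_1: \sigma_\tau^2 \ne 0$.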
(2). ANOVA table
Source   DF   Sum of Squares   Mean Square   F
Brand     5   42               8.4           11.5
Error    42   30.8             0.73
Total    47   72.8
Reject $H_0$ as $F = 11.5 > F_{0.05}(5, 42) = 2.44$.
(3). Estimate for error variance: $\hat{\sigma}^2 = MSE = 0.73$. Estimate for variance of random effect: $\hat{\sigma}_\tau^2 = (MSTR - MSE)/n = (8.4 - 0.73)/8 = 0.96$.
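The ANOVA table entries and the two variance-component estimates follow from the sums of squares by simple division. The sketch below starts from the two sums of squares shown in the completed table and from the design sizes in the problem (six brands, eight units per brand); the variable names are illustrative.

```python
k, n = 6, 8                      # number of brands, observations per brand
ss_brand, ss_error = 42.0, 30.8  # sums of squares from the ANOVA table

df_brand = k - 1                 # 5
df_error = k * (n - 1)           # 42
ms_brand = ss_brand / df_brand   # MSTR
ms_error = ss_error / df_error   # MSE
f_stat = ms_brand / ms_error

# Method-of-moments estimates for the random-effects model.
sigma2_hat = ms_error
sigma2_tau_hat = (ms_brand - ms_error) / n

print(round(ms_brand, 2), round(ms_error, 2), round(f_stat, 1))
print(round(sigma2_hat, 2), round(sigma2_tau_hat, 2))
```

Working with the unrounded mean squares gives an F ratio near 11.45, which rounds to the tabulated 11.5.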
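(4). Note that $\mathrm{Cov}(Y_{ij}, Y_{i'j'}) = \mathrm{Cov}(\tau_i + \varepsilon_{ij}, \tau_{i'} + \varepsilon_{i'j'})$. The model assumes independence among all random components, i.e.
\[ \mathrm{Cov}(Y_{ij}, Y_{i'j'}) = \mathrm{Cov}(\tau_i, \tau_{i'}) + \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{i'j'}). \]
So the covariance matrix of the response vector is
\[ \mathrm{Cov}(Y_{ij}, Y_{i'j'}) = \begin{cases} 0, & i \ne i' \\ \sigma_\tau^2, & i = i', \ j \ne j' \\ \sigma_\tau^2 + \sigma^2, & i = i', \ j = j'. \end{cases} \]
Then the correlation matrix of the response is
\[ \mathrm{Corr}(Y_{ij}, Y_{i'j'}) = \begin{cases} 0, & i \ne i' \\ \dfrac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma^2}, & i = i', \ j \ne j' \\ 1, & i = i', \ j = j'. \end{cases} \]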
Problem 12—Stat 481. A food company wished to test four different package designs
(A, B, C, D) for a new breakfast cereal. Twenty stores, with comparable location and sales
volume, were selected as the experimental units. Each design is randomly assigned to five
stores. Sales, in number of cases, were observed for the study period.
(1) What design is used in this study? Please specify the factor and its levels. Write down
the model and state your hypotheses.
(2) Complete the following ANOVA table and draw your conclusion given level 0.05.
Source   DF   Sum of Squares   Mean Square   F
Design        588.2
Error         158.2
Total         746.4
(3) Find an estimate for the mean difference between package designs C and D. Then
derive its sampling distribution, a t-distribution.
(4) If the sales averages of package designs C and D are $\bar{Y}_C = 19.5$ and $\bar{Y}_D = 27.2$ respectively, find a 95% confidence interval for the mean difference $\mu_C - \mu_D$ based on the sampling distribution in (3). Then make inference about the difference. For your reference, $t_{0.025}(16) = 2.12$, $t_{0.05}(16) = 1.75$.
Solution to Problem 12. (1). It is a completely randomized design study. Factor is the
package design and it has four levels, A, B, C, and D. One-way ANOVA model:
Note that $F = 19.82 > F_{0.05}(3, 16) = 3.24$, i.e. p-value < 0.05. So reject the null hypothesis
and conclude that there exist significant differences in sales volume due to the
different designs.
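(3). The estimate of the mean difference is the sample mean difference $\bar{Y}_C - \bar{Y}_D$. Since $\bar{Y}_C \sim N(\mu_C, \sigma^2/n)$ and $\bar{Y}_D \sim N(\mu_D, \sigma^2/n)$ are independent, $\bar{Y}_C - \bar{Y}_D$ follows a normal distribution with mean $\mu_C - \mu_D$ and variance $2\sigma^2/n$. In addition,
\[ \frac{\bar{Y}_C - \bar{Y}_D - (\mu_C - \mu_D)}{\sqrt{2\sigma^2/n}} \sim N(0, 1), \qquad \frac{SSE}{\sigma^2} \sim \chi^2(N - k). \]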
Based on Student's theorem, $\bar{Y}_C - \bar{Y}_D$ is independent of $SSE = \sum_{i=1}^{k} (n - 1) s_i^2$. Hence it has sampling distribution
\[ t = \frac{\bar{Y}_C - \bar{Y}_D - (\mu_C - \mu_D)}{\sqrt{2 \cdot MSE/n}} \sim t(N - k), \]
where $n = 5$, $k = 4$, $N = 20$.
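(4). The 95% confidence interval for $\mu_C - \mu_D$ is
\[ \bar{Y}_C - \bar{Y}_D \pm t_{0.025}(N - k) \cdot se(\bar{Y}_C - \bar{Y}_D) \ \Rightarrow \ \bar{Y}_C - \bar{Y}_D \pm t_{0.025}(16) \cdot \sqrt{\frac{2 \cdot MSE}{n}} \ \Rightarrow \ (-7.7) \pm 4.22. \]
We can conclude that designs C and D lead to different sales, as $0 \notin (-11.92, -3.48)$.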
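The F ratio and the confidence interval can be recomputed from the sums of squares in the ANOVA table and the two sample means. The sketch below uses the reference quantile 2.12 given in the problem; the variable names are illustrative, and small differences from hand-computed values come only from when rounding is applied.

```python
import math

k, n = 4, 5                        # package designs, stores per design
N = k * n                          # 20 stores in total
ss_design, ss_error = 588.2, 158.2

ms_design = ss_design / (k - 1)    # mean square for design
ms_error = ss_error / (N - k)      # MSE on 16 degrees of freedom
f_stat = ms_design / ms_error

mean_c, mean_d = 19.5, 27.2        # sample means given in part (4)
diff = mean_c - mean_d
se_diff = math.sqrt(2 * ms_error / n)
t_quantile = 2.12                  # t_0.025(16) given in the problem
half_width = t_quantile * se_diff
ci = (diff - half_width, diff + half_width)

print(round(f_stat, 2))
print(round(diff, 1), round(half_width, 3), tuple(round(v, 3) for v in ci))
```

The whole interval lies below zero, which is what supports the conclusion that designs C and D differ.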