
QUESTION 1 (5 + 5 + 1 + 0.5 + 0.5 + 3 = 15 marks)
1.1 Consider a parameter of interest, $\theta$, and suppose you have identified three estimators of this
parameter, $\hat{\theta}_1$, $\hat{\theta}_2$ and $\hat{\theta}_3$. Name the desired properties of a good estimator and explain how you would
choose amongst the three estimators based on these properties.

1.2 Let $Y$ be a binomially distributed random variable with parameters $n$ and $p$. Consider two estimators of
$p$, namely $\hat{p}_1 = \dfrac{Y}{n}$ and $\hat{p}_2 = \dfrac{Y+1}{n+2}$. Given that $E(Y) = np$ and $V(Y) = npq$, where $q = 1 - p$, show that $\hat{p}_1$ is
an unbiased estimator of $p$ while $\hat{p}_2$ is a biased estimator of $p$.
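
The bias claim in 1.2 can be checked numerically by summing each estimator over the binomial probability mass function. The sketch below is only a check on the algebra, not the requested proof; the helper name `expected_value` and the values n = 10 and p = 0.3 are illustrative assumptions that do not come from the exam.

```python
from math import comb

def expected_value(estimator, n, p):
    """Exact E[estimator(Y)] for Y ~ Binomial(n, p), summed over the pmf."""
    return sum(
        estimator(y) * comb(n, y) * p**y * (1 - p) ** (n - y)
        for y in range(n + 1)
    )

n, p = 10, 0.3  # illustrative values (assumption, not from the exam)

e_p1 = expected_value(lambda y: y / n, n, p)              # estimator Y / n
e_p2 = expected_value(lambda y: (y + 1) / (n + 2), n, p)  # estimator (Y + 1) / (n + 2)

print("bias of p1:", e_p1 - p)  # numerically zero
print("bias of p2:", e_p2 - p)  # nonzero for this choice of p
```

With these values the second expectation agrees with $(np + 1)/(n + 2)$, which is what the algebraic argument should produce.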

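1.3 Let $Y_1, \dots, Y_n$ be independent and identically distributed random variables with $E(Y_i) = \mu$ and
$V(Y_i) = \sigma^2 < \infty$. Define

$$U_n = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}},$$

where $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$. Then the distribution function of $U_n$ converges to the standard normal distribution function
as $n \to \infty$. That is,

$$\lim_{n \to \infty} P(U_n \le u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} \, e^{-t^2/2} \, dt \quad \text{for all } u.$$
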
1.3.1 Give the name of the above theorem.

1.3.2 Read the following excerpt from the proof of the theorem and then answer the questions following
it:

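Proof:

1. Let $Y = \dfrac{\bar{X} - E(\bar{X})}{\sigma_{\bar{X}}}$.

2. Then, $M_Y(t) = E\left( e^{\, t \left( \frac{\bar{X} - E(\bar{X})}{\sigma_{\bar{X}}} \right)} \right)$

3. $\vdots$

4. $= e^{-\frac{\sqrt{n}\,\mu t}{\sigma}} \prod_{i=1}^{n} M_{X_i}\!\left( \dfrac{t}{\sqrt{n}\,\sigma} \right)$

5. $= e^{-\frac{\sqrt{n}\,\mu t}{\sigma}} \left[ 1 + \mu'_1 \dfrac{t}{\sqrt{n}\,\sigma} + \mu'_2 \dfrac{\left( \frac{t}{\sqrt{n}\,\sigma} \right)^2}{2!} + \dots \right]^n$
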
1.3.2.1 In line 2, what does $M_Y(t)$ represent?

1.3.2.2 Why is it a good idea to make use of moment generating functions in this proof?

1.3.2.3 Show how line 4 is obtained.

QUESTION 2 (20 marks)


Are the following statements TRUE or FALSE? Copy the table numbers to your exam book and indicate
your answer as illustrated below. (A correct answer = 2 marks; an incorrect answer = −0.5 marks; no
answer = 0 marks.)

                                                              TRUE   FALSE
Example: I have written my name on the cover page.             X
2.1 When conducting a hypothesis test for the difference between two population
proportions, $p_1 - p_2$, the appropriate test statistic is given by

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_{pool}\, \hat{q}_{pool} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}},$$

where $\hat{p}_{pool} = \dfrac{Y_1 + Y_2}{n_1 + n_2}$ and $\hat{q}_{pool} = 1 - \hat{p}_{pool}$.

2.2 A large-sample lower one-sided hypothesis test with $\alpha$ level of significance is related to
a $(1 - \alpha)$ lower one-sided interval.

2.3 Consider the deviation of an observed value, $y$, from the fitted line, $\hat{y}$, i.e. $(y_i - \hat{y}_i)$. The
sum of squares to be minimized is given by

$$\sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - \hat{\beta}_1 x_i)^2$$

2.4 The least-squares equation for solving for $\hat{\beta}$, using matrices, is given by
$(X'X)\hat{\beta} = X'Y$.

2.5 The correlation coefficient can be calculated by $r = \hat{\beta}_1 \sqrt{\dfrac{S_{xx}}{S_{yy}}}$
2.6 A $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is given by $\hat{\beta}_1 \pm t_{\alpha} S \sqrt{c_{11}}$ where $c_{11} = \dfrac{1}{S_{xx}}$
2.7 A $100(1 - \alpha)\%$ prediction interval for $Y$ when $x = x^*$ is given by

$$(\hat{\beta}_0 + \hat{\beta}_1 x^*) \pm t_{\alpha/2}\, S \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

2.8 The ANOVA procedure for a one-way layout is used for the comparison of more than
two means.
2.9 The variability of a set of measurements is quantified by the sum of squares of
deviations, $\sum_i (y_i - \bar{y})^2$. Under the ANOVA procedure this is called the total sum of
squares and it is partitioned into the sum of squares attributed to one of the
independent variables, SST, plus a remainder that is associated with random error, SSE.
2.10 In 1900 Karl Pearson proposed the following:

$$\chi^2 = \sum \frac{[n_i - E(n_i)]^2}{E(n_i)} = \sum \frac{[n_i - n p_i]^2}{n p_i} = \sum \frac{[O - E]^2}{E}$$

QUESTION 3 (4 + 7 + 7 + 2 = 20 marks)
The article “Plugged In, but Tuned Out” (USA Today, January 20, 2010) summarizes data from two surveys
of kids aged 8 to 18. One survey was conducted in 1999 and the other in 2009. Data on the number of
hours per day spent using electronic media are given below.

Number of Hours per Day spent using Electronic Media


1999 4 5 7 7 5 7 5 6 5 6 7 8 5 6 6
2009 5 9 5 8 7 6 7 9 7 9 6 9 10 9 8

Research question: Is there sufficient evidence to conclude that the average number of hours spent per
day using electronic media by kids aged 8 to 18 was less in 1999 than in 2009?

3.1 Before continuing with any inference you have to determine whether the normality assumption for the
populations is met. Calculate the mean, median and mode of the number of hours per day spent using
electronic media in 1999 and in 2009. Using these descriptive measures, can you conclude that the
number of hours per day spent using electronic media by kids aged 8 to 18 is approximately
normally distributed in both populations?
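
The descriptive measures asked for in 3.1 can be cross-checked with Python's standard `statistics` module. This is a minimal sketch for verifying hand calculations; the variable and function names are illustrative assumptions, and the data are copied from the table above.

```python
import statistics

# Hours per day, copied from the exam table
hours_1999 = [4, 5, 7, 7, 5, 7, 5, 6, 5, 6, 7, 8, 5, 6, 6]
hours_2009 = [5, 9, 5, 8, 7, 6, 7, 9, 7, 9, 6, 9, 10, 9, 8]

def describe(sample):
    """Return the mean, median and mode of a sample as a dict."""
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
    }

summary = {1999: describe(hours_1999), 2009: describe(hours_2009)}
for year, measures in summary.items():
    print(year, measures)
```

Comparing how close the three measures are within each year is one informal indication of symmetry; it does not by itself establish normality.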

3.2 Next, is there sufficient evidence to conclude that the variances of the number of hours spent per day
using electronic media by kids aged 8 to 18 differ between 1999 and 2009? Why is it important to check
the equal-variance assumption before answering the research question? Report your conclusion about the
variances at the 1% significance level.
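
For 3.2, the usual statistic for comparing two variances is the ratio of the sample variances. The sketch below computes only that ratio, placing the larger variance in the numerator (a common convention, assumed here); the critical value for 14 and 14 degrees of freedom still has to be read from an F table. Variable names are illustrative.

```python
import statistics

hours_1999 = [4, 5, 7, 7, 5, 7, 5, 6, 5, 6, 7, 8, 5, 6, 6]
hours_2009 = [5, 9, 5, 8, 7, 6, 7, 9, 7, 9, 6, 9, 10, 9, 8]

# Sample variances (divisor n - 1)
var_1999 = statistics.variance(hours_1999)
var_2009 = statistics.variance(hours_2009)

# F ratio with the larger sample variance in the numerator
f_stat = max(var_1999, var_2009) / min(var_1999, var_2009)
df_num = df_den = len(hours_1999) - 1  # both samples have 15 observations

print(f"s^2 (1999) = {var_1999:.4f}, s^2 (2009) = {var_2009:.4f}")
print(f"F = {f_stat:.4f} on ({df_num}, {df_den}) degrees of freedom")
```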

3.3 Finally, what is your answer to the research question? Use $\alpha = 0.01$.
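
If the equal-variance assumption examined in 3.2 is judged reasonable, one common choice for 3.3 is the pooled two-sample t statistic. The sketch below assumes that choice; it computes the statistic and its degrees of freedom only, leaving the critical value to a t table. Variable names are illustrative.

```python
import statistics
from math import sqrt

hours_1999 = [4, 5, 7, 7, 5, 7, 5, 6, 5, 6, 7, 8, 5, 6, 6]
hours_2009 = [5, 9, 5, 8, 7, 6, 7, 9, 7, 9, 6, 9, 10, 9, 8]

n1, n2 = len(hours_1999), len(hours_2009)
mean1, mean2 = statistics.mean(hours_1999), statistics.mean(hours_2009)
var1, var2 = statistics.variance(hours_1999), statistics.variance(hours_2009)

# Pooled variance estimate (assumes equal population variances)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Test statistic for H0: mu_1999 - mu_2009 = 0 against Ha: mu_1999 < mu_2009
t_stat = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

print(f"t = {t_stat:.4f} on {df} degrees of freedom")
```

Because the alternative is lower-tailed, the statistic is compared with the negative critical value $-t_{0.01}$ for the stated degrees of freedom.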

3.4 Report an approximate p-value for the test you carried out in 3.3.

QUESTION 4 (5 + 4 + 1 = 10 marks)
It is stated that a random sample of 500 measurements on the length of stay in hospital had a sample
mean of 4.6 days and a standard deviation of 3.1 days. A federal regulatory agency hypothesized that the
average length of stay is less than 5 days.
4.1 Do the data support this hypothesis? Use $\alpha = 0.05$.

4.2 Construct a 95% upper confidence bound for the average length of stay in hospital.

4.3 How does the hypothesized value compare to this upper bound?
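
With n = 500 a large-sample z procedure is the natural approach to Question 4. The following sketch, using only the standard library, computes the test statistic for a lower-tailed test and the 95% upper confidence bound; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s = 500, 4.6, 3.1   # sample size, sample mean, sample standard deviation
mu_0, alpha = 5.0, 0.05       # hypothesized mean and significance level

std_error = s / sqrt(n)

# Lower-tailed test: H0: mu = 5 against Ha: mu < 5
z_stat = (x_bar - mu_0) / std_error
z_crit = NormalDist().inv_cdf(alpha)      # lower-tail critical value
reject_h0 = z_stat < z_crit

# 95% upper confidence bound for the mean
upper_bound = x_bar + NormalDist().inv_cdf(1 - alpha) * std_error

print(f"z = {z_stat:.4f}, critical value = {z_crit:.4f}, reject H0: {reject_h0}")
print(f"95% upper confidence bound = {upper_bound:.4f}")
```

The test decision and the position of the hypothesized value relative to the upper bound should agree, which is the point of 4.3.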

QUESTION 5 (5 + 3 + 7 + 5 = 20 marks)
Medical researchers have noted that adolescent females are much more likely to deliver low-birth-weight
babies than are adult females. Because low-birth-weight babies have higher mortality rates, a number of
studies have examined the relationship between birth weight and mother’s age for babies born to young
mothers. The following data on the age of the mother (in years) and the birth weight of the baby (in
grams) are provided:

Birth Weight (grams) Mother’s Age (years)


2289 15
3393 17
3271 18
2648 15
2897 16
3327 19
2970 17
2535 16
3138 18
3573 19

5.1 Fit a straight line equation to predict the birth weight of a baby from the maternal age of the mother.
Show all calculations.
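
The least-squares fit in 5.1 can be verified with a few lines of plain Python using the usual formulas $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. This is a sketch for checking hand calculations; the variable names are illustrative, and the data are copied from the table above.

```python
# Mother's age (years) and birth weight (grams), copied from the exam table
ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 19]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(weights) / n

# Corrected sums of squares and cross products
s_xx = sum((x - x_bar) ** 2 for x in ages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, weights))

beta1 = s_xy / s_xx              # slope estimate
beta0 = y_bar - beta1 * x_bar    # intercept estimate

print(f"S_xx = {s_xx:.2f}, S_xy = {s_xy:.2f}")
print(f"fitted line: y_hat = {beta0:.2f} + {beta1:.2f} x")
```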

5.2 Calculate the sum of squared error, SSE, followed by the standard deviation, S.

5.3 Is there sufficient evidence of a linear relationship between the birth weight of a baby and the maternal
age of the mother? Use $\alpha = 0.05$.
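
For 5.2 and 5.3, the quantities SSE, S and the t statistic for the slope follow from $SSE = S_{yy} - \hat{\beta}_1 S_{xy}$, $S = \sqrt{SSE/(n-2)}$ and $t = \hat{\beta}_1 / (S/\sqrt{S_{xx}})$. The sketch below applies these formulas; the critical value for $n - 2 = 8$ degrees of freedom is left to a t table, and the variable names are illustrative.

```python
from math import sqrt

ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 19]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(weights) / n

s_xx = sum((x - x_bar) ** 2 for x in ages)
s_yy = sum((y - y_bar) ** 2 for y in weights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, weights))

beta1 = s_xy / s_xx

sse = s_yy - beta1 * s_xy            # sum of squared errors
s = sqrt(sse / (n - 2))              # residual standard deviation
t_stat = beta1 / (s / sqrt(s_xx))    # test statistic for H0: beta_1 = 0

print(f"SSE = {sse:.2f}, S = {s:.3f}")
print(f"t = {t_stat:.3f} on {n - 2} degrees of freedom")
```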

5.4 Find a 95% prediction interval for the birth weight of a baby when the maternal age of the mother is
14. Interpret your answer.

QUESTION 6 (15 marks)


Suppose that three drugs used to reduce cholesterol are compared in a randomized experiment in which
three people use each drug for a month. The data for the reductions in cholesterol level for the 9
participants follow in the table below.

DRUG 1   DRUG 2   DRUG 3
6 10 9
4 14 12
2 9 6

Is there sufficient evidence to permit us to conclude that the mean reduction in cholesterol levels differs
for the three drugs? What would you conclude at the 5% level of significance? Interpret your finding.
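
The one-way ANOVA quantities for Question 6 can be checked as follows. The sketch reads each column of the table as one drug group, computes the treatment and error sums of squares, and forms the F statistic; the critical value for 2 and 6 degrees of freedom is left to an F table. The dictionary keys and variable names are illustrative.

```python
# Reductions in cholesterol level; each column of the exam table is one drug
groups = {
    "drug1": [6, 4, 2],
    "drug2": [10, 14, 9],
    "drug3": [9, 12, 6],
}

all_obs = [y for obs in groups.values() for y in obs]
n_total = len(all_obs)
k = len(groups)
grand_mean = sum(all_obs) / n_total

def mean(values):
    return sum(values) / len(values)

# Sum of squares for treatments (between groups)
sst = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())
# Sum of squares for error (within groups)
sse = sum((y - mean(obs)) ** 2 for obs in groups.values() for y in obs)

mst = sst / (k - 1)
mse = sse / (n_total - k)
f_stat = mst / mse

print(f"SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"F = {f_stat:.3f} on ({k - 1}, {n_total - k}) degrees of freedom")
```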

THE END

Remember to check the notice board to determine if you qualify for a supplementary exam.
