
Examiners’ commentaries 2013



ST3133 Advanced statistics: distribution theory

Important note

This commentary reflects the examination and assessment arrangements for this course in the
academic year 2012–13. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).

Information about the subject guide

Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2011).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refers to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.

General remarks

Learning outcomes

By the end of this half course and having completed the Essential reading and activities you should
be able to demonstrate to the Examiners that you are able to:

• recall a large number of distributions and be a competent user of their mass/density and
distribution functions and moment generating functions
• explain relationships between variables, conditioning, independence and correlation
• relate the theory and method taught in the course to solve practical problems.

Format of the examination


• As last year, Section A of this year's examination has only three parts but still carries
40 per cent of the marks for the whole paper. This aims to reduce the time it takes to
complete the paper, and this format will be retained for next year's examination.

Key steps to improvement


• In 2012–13, there was a general improvement in the ability to answer questions from
different parts of the subject guide. That said, many candidates still have not studied the
subject guide thoroughly enough. Candidates are reminded that examination questions vary
from year to year, and may cover material that has not been examined in the
past. Hence it is suboptimal to practise only the past papers, and so candidates should
practise all the examples, Learning activities and Sample examination questions in
the subject guide in order to understand the relevant theory and adequately prepare for the
examination.
• In the examination, candidates should first find and attempt the parts of questions which
they are most confident of answering correctly. Again, candidates should prepare well by
studying the subject guide and knowing what the syllabus covers, not only by practising past papers.
• The purpose of your work is to show the Examiners that you understand how to answer the
question. So you should give reasons for steps taken in your answers, and make them
complete, logical and ordered. You should aim to show complete working in your answer. A
‘splatter’ approach, where the parts of your answer are scattered over the page, is not likely
to get full marks.
• Candidates are advised to practise their accuracy in evaluating sums, integrals or derivatives,
as a large number of questions involve these basic operations. This is the key to improving
performance. Many candidates each year fail to perform integration by parts accurately.
• Quite a number of candidates can only partly remember formulae or procedures for solving
particular problems. This is not good enough, and certainly more practice is needed. This is
especially true for finding the probability of an event involving more than one random
variable, integration by parts, and finding the probability mass/density function of
transformed bivariate random variables.
• Please bring a calculator for evaluating probabilities in some questions. Check to make sure
your calculator complies in all respects with the specification given with your Admission
Notice.
• Candidates should be ready to derive the moment generating functions of standard random
variables, like the normal, Gamma, chi-square, exponential (all continuous), or the
geometric, binomial, Poisson (all discrete), and ideally know the forms by heart. It is also
important to know basic applications of these distributions, and apply the correct formulae
in probability questions.

Question spotting
Many candidates are disappointed to find that their examination performance is poorer
than they expected. This can be due to a number of different reasons and the Examiners’
commentaries suggest ways of addressing common problems and improving your performance.
We want to draw your attention to one particular failing – ‘question spotting’, that is,
confining your examination preparation to a few question topics which have come up in past
papers for the course. This can have very serious consequences.
We recognise that candidates may not cover all topics in the syllabus in the same depth, but
you need to be aware that Examiners are free to set questions on any aspect of the syllabus.
This means that you need to study enough of the syllabus to enable you to answer the required
number of examination questions.
The syllabus can be found in the ‘Course information sheet’ in the section of the VLE dedicated
to this course. You should read the syllabus very carefully and ensure that you cover sufficient
material in preparation for the examination.
Examiners will vary the topics and questions from year to year and may well set questions that
have not appeared in past papers – every topic on the syllabus is a legitimate examination
target. So although past papers can be helpful in revision, you cannot assume that topics or
specific questions that have come up in past examinations will occur again.
If you rely on a question spotting strategy, it is likely you will find yourself in
difficulties when you sit the examination paper. We strongly advise you not to
adopt this strategy.


Comments on specific questions – Zones A and B

Candidates should answer all FOUR questions: Question 1 of Section A (40 marks) and all
THREE questions from Section B (60 marks in total).

Section A

Answer all three parts of Question 1 (40 marks in total).

Question 1

(a) The joint probability density function of (X, Y) is (note that x < 2y in the
range)
$$f_{X,Y}(x,y) = \begin{cases} kxy^2, & 0 < x < 2y < 1; \\ 0, & \text{otherwise.} \end{cases}$$
i. Show that k = 80. (4 marks)
ii. Find P(X > Y). (5 marks)
iii. Find E(X). (5 marks)

Approaching the question


This question was not answered as well as it should have been, even though the same type of
question appears in the examination every year. A major weakness was finding the correct
range for x and y, which depends on the order of integration used. Many candidates wrongly
used x/2 < y < 1 when integrating out y first, but it should be x/2 < y < 1/2, because
2y < 1 means that y < 1/2. Candidates need to practise finding the correct region of
integration in this kind of question. Please study Section 4.2 of the subject guide
carefully, and practise similar questions.
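As a numerical sanity check (not a substitute for the exact integration required in the examination), the sketch below approximates the double integral of $xy^2$ over the region with a plain midpoint rule. It is our own illustration: the helper `midpoint`, the grid size `n` and the function `g` are hypothetical names. With the correct limits, in either order, the integral is 1/80, so k = 80; with the mistaken limit y < 1 the "density" integrates to about 12.67 instead of 1.

```python
def midpoint(f, a, b, n=400):
    """Approximate the integral of f over [a, b] by the midpoint rule with n cells."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def g(x, y):
    """The question's density with k = 1, i.e. x * y**2."""
    return x * y**2


# Correct region, integrating out x first: 0 < x < 2y for 0 < y < 1/2.
mass_x_first = midpoint(lambda y: midpoint(lambda x: g(x, y), 0, 2 * y), 0, 0.5)

# Correct region, integrating out y first: x/2 < y < 1/2 for 0 < x < 1.
mass_y_first = midpoint(lambda x: midpoint(lambda y: g(x, y), x / 2, 0.5), 0, 1)

# Common mistake: x/2 < y < 1 ignores the constraint 2y < 1.
mass_wrong = midpoint(lambda x: midpoint(lambda y: g(x, y), x / 2, 1), 0, 1)

print(1 / mass_x_first, 1 / mass_y_first)  # both approximately 80
print(80 * mass_wrong)                     # approximately 12.67, not 1
```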


In part ii., some candidates just used a single integral and left the answer in terms of x or y,
showing a rather weak concept of probability. In fact, we need to solve the system of
inequalities:
x > y, and 0 < x < 2y < 1.

The solution is y < x < 2y when 0 < y < 1/2, if you decide to integrate out x first; or
x/2 < y < x when 0 < x < 1/2; x/2 < y < 1/2 when 1/2 < x < 1, if you decide to integrate
out y first.
The former case is covered in the solutions. Many candidates in 2013 got the latter case
wrong. It needs two integrals because, although y < x, we also have y < 1/2, and x can be
larger than 1/2; so for 1/2 < x < 1 the upper limit for y is 1/2 rather than x. Drawing the
region on graph paper shows why, when y is integrated out first, the answer is the sum of
two integrals rather than one. Some candidates in 2013 omitted the second integral in the
solution for integrating out y first:
$$\int_0^{1/2}\!\int_{x/2}^{x} 80xy^2\,dy\,dx + \int_{1/2}^{1}\!\int_{x/2}^{1/2} 80xy^2\,dy\,dx.$$

Again, please study Section 4.2 of the subject guide carefully.
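A minimal numerical check of part ii., again using a hand-written midpoint rule (the names `midpoint` and `f` are our own, hypothetical choices). It evaluates P(X > Y) in both orders of integration and shows that they agree at about 0.75 (exactly 3/4 by direct integration), while dropping the second integral leaves only 7/48.

```python
def midpoint(f, a, b, n=400):
    """Approximate the integral of f over [a, b] by the midpoint rule with n cells."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def f(x, y):
    """Joint density 80 x y^2 on 0 < x < 2y < 1."""
    return 80 * x * y**2


# Integrate out x first: y < x < 2y for 0 < y < 1/2 (a single double integral).
p_x_first = midpoint(lambda y: midpoint(lambda x: f(x, y), y, 2 * y), 0, 0.5)

# Integrate out y first: the upper limit for y switches from x to 1/2 at x = 1/2,
# so the region splits into two pieces.
p_y_first = (
    midpoint(lambda x: midpoint(lambda y: f(x, y), x / 2, x), 0, 0.5)
    + midpoint(lambda x: midpoint(lambda y: f(x, y), x / 2, 0.5), 0.5, 1)
)

# Keeping only the first piece (the mistake described above) gives too small a value.
p_missing_piece = midpoint(lambda x: midpoint(lambda y: f(x, y), x / 2, x), 0, 0.5)

print(p_x_first, p_y_first)  # both approximately 0.75
print(p_missing_piece)       # approximately 0.146 (= 7/48)
```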


For part iii., candidates generally knew to find the marginal density of X, $f_X(x)$, first,
and then use the definition $E(X) = \int x f_X(x)\,dx$. Some candidates used
$E(X) = \iint x f_{X,Y}(x,y)\,dx\,dy$ directly, which is fine. Please check Section 4.3 and
Proposition 4.3.1 of the subject guide for definitions.
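A worked sketch of both routes to E(X) (our own working, offered as a check rather than as the official solution). It uses the limits x/2 < y < 1/2 discussed above for the marginal, and 0 < x < 2y for the direct double integral.

```latex
% Marginal of X: for fixed 0 < x < 1, y runs over x/2 < y < 1/2.
f_X(x) = \int_{x/2}^{1/2} 80xy^2\,dy = \frac{80x}{3}\Big(\frac{1}{8} - \frac{x^3}{8}\Big)
       = \frac{10}{3}\,x(1 - x^3), \qquad 0 < x < 1,

E(X) = \int_0^1 x f_X(x)\,dx = \frac{10}{3}\Big(\frac{1}{3} - \frac{1}{6}\Big) = \frac{5}{9}.

% Directly from the joint density (integrating out x first) gives the same value:
E(X) = \int_0^{1/2}\!\int_0^{2y} x\cdot 80xy^2\,dx\,dy
     = \int_0^{1/2}\frac{640}{3}\,y^5\,dy = \frac{640}{3}\cdot\frac{1}{384} = \frac{5}{9}.
```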

(b) A product is faulty with probability 0.1, and is intact with probability 0.9.
In an inspection 10 products are randomly selected. If at least 2 are found
faulty, an alert is raised.
i. Find the probability that an alert is raised in an inspection. (4 marks)
ii. Find the probability that in 5 independent inspections, there is at least one
alert raised. (3 marks)
iii. Given an alert is raised, what is the probability that there are 2 faulty
products exactly? (4 marks)

Approaching the question


This question was generally answered well in 2013, though not as well as it should have been
for a very basic probability question testing the binomial distribution. See Equation (3.4)
in the subject guide. Candidates are reminded that this is a discrete distribution, and hence
whether the inequality defining an event is strict or not matters. For instance, in part i.,
denote X = number of faulty products in an inspection. Then

P (alert) = P (X ≥ 2) = 1 − P (X = 0) − P (X = 1).

If a candidate writes the probability as P (X > 2) instead of P (X ≥ 2), then

P (X > 2) = 1 − P (X = 0) − P (X = 1) − P (X = 2),

which gives the wrong solution. Another important point is that each candidate should
bring an authorised calculator into the examination, for evaluating probabilities.
To complete part ii., candidates need to realise that the number of inspections (out of 5)
that raise an alert follows a binomial distribution with n = 5 and p = P(alert). Please study
Example 3.3.8 in the subject guide carefully. Part iii. is a conditional probability question
and was answered well in general. See Section 2.4.1 of the subject guide.
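A short Python sketch of the three calculations, using only the standard library (`math.comb`, Python 3.8+); the helper name `binom_pmf` is our own. With these inputs the three probabilities come out at roughly 0.2639, 0.7839 and 0.7340, and the sketch also shows how far the P(X > 2) slip moves the answer to part i. (to about 0.0702).

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.1  # 10 products per inspection, each faulty with probability 0.1

# Part i.: an alert needs AT LEAST 2 faulty products,
# so only P(X = 0) and P(X = 1) are subtracted.
p_alert = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)

# The slip described above: P(X > 2) also subtracts P(X = 2).
p_wrong = p_alert - binom_pmf(2, n, p)

# Part ii.: the number of alerts in 5 independent inspections is Bin(5, p_alert).
p_some_alert = 1 - binom_pmf(0, 5, p_alert)

# Part iii.: {X = 2} is contained in {X >= 2}, so P(X = 2 | X >= 2) = P(X = 2) / P(X >= 2).
p_two_given_alert = binom_pmf(2, n, p) / p_alert

print(round(p_alert, 4), round(p_wrong, 4), round(p_some_alert, 4), round(p_two_given_alert, 4))
```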


(c) Let Y be a positive random variable with an absolutely continuous
probability density $f_Y(y)$ and finite mean µ.
i. Prove the Markov inequality
$$P(Y > a) \le \frac{\mu}{a},$$
where a > 0. (4 marks)
ii. Let X ∼ Exp(1). Show that the moment generating function of X is
$$M_X(t) = \frac{1}{1-t},$$
stating clearly the range of t where this is valid. (5 marks)
iii. Using i. and ii. by putting $Y = e^{tX}$, show that
$$P\left(X > \frac{\log a}{t}\right) \le \frac{1}{a(1-t)},$$
where the range of t should be stated again.
By putting $a = e^t$ and calculating P(X > 1), show that
$$xe^{-x} \le 1$$
for any x > 0. (6 marks)

Approaching the question


This question was not answered well. The Markov inequality is Proposition 3.4.12 in the
subject guide. This goes to show that studying everything in the subject guide is important,
as an inequality question is not often set in examinations. Part ii. is a standard question on
the moment generating function of an exponential random variable. See Definition 3.5.1 and
Example 3.5.6 in the subject guide. One thing to note is that there are still a number of
candidates who do not write the range of t where the moment generating function is valid,
even if explicitly asked to do so. Your solution will be incomplete if you do not write the
range of t.
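A sketch of the standard arguments for parts i. and ii. (our own working, not the official solution). The key points are the two inequalities in the Markov proof and the stated range t < 1 for the moment generating function.

```latex
% i. Markov: since Y > 0, and y > a on (a, \infty),
\mu = \int_0^\infty y f_Y(y)\,dy \ \ge\ \int_a^\infty y f_Y(y)\,dy
    \ \ge\ a\int_a^\infty f_Y(y)\,dy = a\,P(Y > a).

% ii. MGF of X ~ Exp(1), density e^{-x} for x > 0:
M_X(t) = \int_0^\infty e^{tx}e^{-x}\,dx = \int_0^\infty e^{-(1-t)x}\,dx = \frac{1}{1-t},
\qquad t < 1,
% while for t >= 1 the integrand does not decay and the integral diverges.
```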
Part iii. was not answered well in general. Some candidates jumped to the conclusion after
substituting $Y = e^{tX}$, without explaining why $E(Y) = M_X(t) = 1/(1-t)$ appears on the
right-hand side. Marks will not be given for this, since the question requires you to show
your working, meaning that you need to present the logical reasoning.
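A sketch of the reasoning that part iii. expects (our own working, offered as a check). It shows where the range 0 < t < 1 comes from and each step from Markov to the final inequality.

```latex
% Take 0 < t < 1: t < 1 keeps M_X(t) finite, and t > 0 means
% e^{tX} > a  <=>  X > (\log a)/t. Apply Markov with Y = e^{tX} > 0:
P\Big(X > \frac{\log a}{t}\Big) = P(e^{tX} > a) \le \frac{E(e^{tX})}{a}
  = \frac{M_X(t)}{a} = \frac{1}{a(1-t)}.

% With a = e^t, the left-hand side is P(X > 1) = e^{-1}, so
e^{-1} \le \frac{e^{-t}}{1-t}
\iff (1-t)e^{-(1-t)} \le 1
\iff xe^{-x} \le 1 \quad\text{with } x = 1 - t \in (0,1).
% (For x >= 1 the inequality also holds, since e^x >= 1 + x > x.)
```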

Section B

Answer all three questions in this section (60 marks in total).

Question 2

Let X and Y be Exp(λ) random variables with X independent of Y. That is,
$$f_X(x) = \lambda e^{-\lambda x}, \quad x > 0.$$
Define V = X/Y, U = Y.

(a) Find the joint density function $f_{U,V}(u,v)$ of (U, V). State clearly the range of
(u, v) where the density function is non-zero. (5 marks)

Approaching the question


This part was not answered as well as it should have been. Many candidates forgot to take
the absolute value of the Jacobian, leaving −u in the density and so making the density
negative, which does not make sense. Some candidates inverted the Jacobian in the formula,
which is wrong. See Section 4.6.2 of the subject guide on how the Jacobian should be
calculated and where it should be placed. Some left the solution in terms of x and y, which
is not correct: the question asks for $f_{U,V}(u,v)$, so the answer must be in terms of the
transformed variables u and v.
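A worked sketch of the transformation in part (a) (our own working, offered as a check rather than the official solution). It inverts the map, computes the Jacobian of (x, y) with respect to (u, v), and takes its absolute value.

```latex
% Inverse transformation: x = uv, y = u.
J = \det\begin{pmatrix}
      \partial x/\partial u & \partial x/\partial v \\
      \partial y/\partial u & \partial y/\partial v
    \end{pmatrix}
  = \det\begin{pmatrix} v & u \\ 1 & 0 \end{pmatrix} = -u,
\qquad |J| = u.

f_{U,V}(u,v) = f_X(uv)\,f_Y(u)\,|J|
             = \lambda e^{-\lambda uv}\cdot\lambda e^{-\lambda u}\cdot u
             = \lambda^2 u\, e^{-\lambda u(1+v)}, \qquad u > 0,\ v > 0,
% since x > 0 and y > 0 hold exactly when u > 0 and v > 0.
```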

(b) Hence show that the density function $f_V(v)$ of V is
$$f_V(v) = \frac{1}{(v+1)^2}, \quad v > 0.$$
(5 marks)

Approaching the question


This part was answered well by those who were able to derive the joint density in part (a).
Some candidates made mistakes in integration by parts, which is a basic technique that all
candidates should master before the examination. See Activity 3.4 in the subject guide for
an example of using integration by parts.
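A sketch of the integration by parts needed here (our own working). The constant c is introduced only to avoid a clash with the variable v.

```latex
% Integrate u out of the joint density; write c = \lambda(1+v) > 0.
f_V(v) = \int_0^\infty \lambda^2 u\, e^{-cu}\,du,
\qquad
\int_0^\infty u e^{-cu}\,du
  = \Big[-\frac{u e^{-cu}}{c}\Big]_0^\infty + \frac{1}{c}\int_0^\infty e^{-cu}\,du
  = 0 + \frac{1}{c^2}.

% Hence
f_V(v) = \frac{\lambda^2}{\lambda^2(1+v)^2} = \frac{1}{(1+v)^2}, \qquad v > 0.
```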

(c) For k > 0, find P (kY < X < 2kY ). (5 marks)

Approaching the question


This part was generally badly answered; only a few candidates got it right. Some evaluated
the probability starting from the joint density of X and Y, which is valid but
unnecessarily difficult and time-consuming. The key is to realise that part (b) feeds into
part (c); indeed, most examination questions have related parts. Since Y > 0, dividing
through by Y shows that kY < X < 2kY is equivalent to k < V < 2k, so the density from
part (b) can be used.
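The sketch below compares the closed form obtained from part (b), $P(k < V < 2k) = 1/(1+k) - 1/(1+2k)$, with a seeded Monte Carlo estimate of P(kY < X < 2kY). The function names, sample size and seeds are our own illustrative choices, and `random.expovariate` is given the rate λ (mean 1/λ). Estimates for two different rates land close to the same value, which reflects the fact that V = X/Y does not depend on λ.

```python
import random


def prob_formula(k):
    """P(k < V < 2k) from f_V(v) = 1/(1+v)^2: the integral from k to 2k is 1/(1+k) - 1/(1+2k)."""
    return 1 / (1 + k) - 1 / (1 + 2 * k)


def prob_simulated(k, lam, n=100_000, seed=0):
    """Monte Carlo estimate of P(kY < X < 2kY) for independent X, Y ~ Exp(lam)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.expovariate(lam)  # expovariate takes the rate lam, i.e. mean 1/lam
        y = rng.expovariate(lam)
        if k * y < x < 2 * k * y:
            hits += 1
    return hits / n


for k in (0.5, 1.0, 2.0):
    # Different rates (and seeds) should give estimates close to the same formula value.
    print(k, prob_formula(k), prob_simulated(k, lam=0.5, seed=1), prob_simulated(k, lam=3.0, seed=2))
```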

(d) Find k to maximise the probability in part (c). You are not required to
perform the second derivative test. (5 marks)

Approaching the question


This part was answered well by those who were able to complete part (c) successfully. Note
that if k < 0, then 2kY < kY, so the event kY < X < 2kY is impossible and its probability
is 0; such a k minimises rather than maximises the probability. Hence k has to be positive.
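A sketch of the calculus (our own working), writing g(k) for the probability found in part (c). Setting the derivative to zero yields two roots, and the negative one is rejected for the reason just given.

```latex
% From part (c): g(k) = P(k < V < 2k) = \frac{1}{1+k} - \frac{1}{1+2k}, k > 0.
g'(k) = -\frac{1}{(1+k)^2} + \frac{2}{(1+2k)^2} = 0
\iff 2(1+k)^2 = (1+2k)^2
\iff 2k^2 = 1
\iff k = \pm\frac{1}{\sqrt{2}}.

% Rejecting the negative root (for k < 0 the probability is 0):
k = \frac{1}{\sqrt{2}}, \qquad g\Big(\frac{1}{\sqrt{2}}\Big) = 3 - 2\sqrt{2} \approx 0.1716.
```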

Question 3

Let N ∼ Pois(µ) be the number of vehicles passing through a particular street on a
weekday, while M ∼ Pois(γ) is the same number on a weekend. N and M are
independent of each other.

(a) Show that the moment generating function of N is
$$M_N(t) = \exp\{\mu(e^t - 1)\}, \quad t \in \mathbb{R}.$$
(4 marks)

Approaching the question


This part was generally answered well. The derivation of the moment generating function of
a Poisson random variable can be found in Example 3.5.5 in the subject guide.
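For reference, a sketch of the standard derivation (our own working), including the reason the result holds for every real t.

```latex
M_N(t) = E(e^{tN}) = \sum_{n=0}^{\infty} e^{tn}\,\frac{e^{-\mu}\mu^n}{n!}
       = e^{-\mu}\sum_{n=0}^{\infty}\frac{(\mu e^t)^n}{n!}
       = e^{-\mu}e^{\mu e^t}
       = \exp\{\mu(e^t - 1)\}, \qquad t \in \mathbb{R},
% valid for every real t because the exponential series converges for any \mu e^t.
```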


(b) A vehicle is either public transport or a private car, with probability p and
q = 1 − p respectively. Let $X_i$, $Y_i$ ∼ Bernoulli(p) be independent of each other and of
N and M, for all i. Write
$$G = \sum_{i=1}^{N} X_i, \qquad H = \sum_{i=1}^{M} Y_i.$$
What do G and H represent? Write down the meaning of N − G and M − H as
well. (4 marks)

Approaching the question


Many candidates were confused by part (b). Both $X_i$ and $Y_i$ are Bernoulli(p), where p is
the probability of public transport, so a value of 1 marks a public transport vehicle rather
than a private car. G sums over the N vehicles on a weekday and H over the M vehicles on a
weekend, so G and H are the numbers of public transport vehicles on a weekday and on a
weekend, respectively. N − G is the rest of the vehicles on a weekday, which must be private
cars. The same is true for M − H on a weekend.

(c) Find the joint moment generating function of G and H, stating clearly the
range of validity. What are the distributions of G and H? (7 marks)

Approaching the question


This part was not answered well, though many candidates were able to point out that the
joint moment generating function is the product of the individual moment generating
functions of G and H, since G and H are independent. Both G and H are random sums, and the
details on how to find the moment generating function of a random sum can be found in
Section 5.6 of the subject guide. See also Proposition 5.6.3 in the subject guide.
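A sketch of the random-sum argument, obtained by conditioning on N (our own working, not necessarily the exact route of Proposition 5.6.3). It uses the result of part (a) for $M_N$.

```latex
% Condition on N; given N = n, G is a sum of n independent Bernoulli(p) variables:
E(e^{tG}\mid N = n) = (q + pe^t)^n
\quad\Longrightarrow\quad
M_G(t) = E\big[(q+pe^t)^N\big] = M_N\big(\log(q+pe^t)\big)
       = \exp\{\mu(q + pe^t - 1)\} = \exp\{\mu p(e^t - 1)\}.

% Likewise M_H(t) = \exp\{\gamma p(e^t - 1)\}; G and H are independent, so
M_{G,H}(s,t) = E(e^{sG + tH}) = \exp\{\mu p(e^s - 1) + \gamma p(e^t - 1)\},
\qquad (s,t) \in \mathbb{R}^2,
% which identifies G \sim \mathrm{Pois}(\mu p) and H \sim \mathrm{Pois}(\gamma p).
```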

(d) Find Var(G|N = 100). You can use the formula for the mean of a Bernoulli(p)
random variable without proof. If the variance of a Bernoulli(p) random variable
is needed, you need to derive it explicitly. (5 marks)

Approaching the question


This part is testing basic knowledge of conditional variance. See Lemma 5.6.2 in the subject
guide. To derive the variance of a Bernoulli random variable, the easiest way is to use
Definition 3.4.7 in the subject guide, which is the formula for calculating the variance of a
discrete random variable.
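A sketch of the two steps (our own working). First derive the Bernoulli variance from the discrete definition, then use the fact that, given N = 100, G is a fixed-length sum of independent terms.

```latex
% Bernoulli(p) variance from the definition of variance for a discrete variable:
E(X_i^2) = 0^2\cdot q + 1^2\cdot p = p,
\qquad
\mathrm{Var}(X_i) = E(X_i^2) - \{E(X_i)\}^2 = p - p^2 = pq.

% Given N = 100, G is a sum of 100 independent X_i (which are independent of N):
\mathrm{Var}(G \mid N = 100) = 100\,\mathrm{Var}(X_1) = 100pq = 100p(1-p).
```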

Question 4

In a game, two players alternately pick a ball from an urn at random, which
contains 2 red balls and 8 blue balls. If a blue ball is picked, it is put back into the
urn and the other player then picks a ball. If a red ball is picked, then the player
has won the set. The red ball is then put back into the urn, and the player who lost
the set picks the first ball of the next set. The winner is the first player to win 3
sets, so the game has a maximum of 5 sets.

At the beginning of the first set you are the first to pick a ball. Let N be the total
number of sets played, and $Y_i$, i = 1, . . . , N, be the total number of balls drawn in set i.

(a) Write down $P(Y_i = y)$, y = 1, 2, . . .. Show that the probability you win the first
set is 5/9. (Hint: you winning the first set means $Y_1$ is odd.) (5 marks)

Approaching the question


This part was answered well in general. Most candidates could state the geometric
distribution for $Y_i$ and calculate $P(Y_i \text{ is odd})$. The geometric distribution can be
seen in Example 3.3.9 in the subject guide.
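An exact check of part (a) with Python's `fractions` module. The sketch assumes, as the question states, that balls are drawn with replacement and a set ends at the first red ball, so that $P(Y_i = y) = (4/5)^{y-1}(1/5)$ for y = 1, 2, .... The names `p_red` and `pmf` are our own. The player who starts a set draws on turns 1, 3, 5, ..., so they win exactly when $Y_i$ is odd.

```python
from fractions import Fraction

p_red = Fraction(2, 10)  # 2 red balls out of 10, drawn with replacement
q = 1 - p_red            # probability of a blue ball


def pmf(y):
    """P(Y_i = y) = q^(y-1) * p for y = 1, 2, ...: the set ends at the first red ball."""
    return q ** (y - 1) * p_red


# Summing the geometric series over odd y gives p / (1 - q^2) exactly.
p_starter_wins = p_red / (1 - q**2)

# A long truncated sum over odd y, for comparison with the closed form.
partial = sum(pmf(y) for y in range(1, 200, 2))

print(p_starter_wins)  # 5/9
print(float(partial))
```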


(b) What is the probability that you win 3 sets in a row? Show that
$$P(N = 3) = \frac{16}{81}.$$
(5 marks)

Approaching the question


This part was badly answered, especially the probability of winning 3 sets in a row. To
complete the question, one must realise that you win a set that you start with probability
5/9. In the next set you do not start, since you won the last set, so your probability of
winning changes to 4/9; and if you win again, it stays at 4/9, because again you won the last
set. The draws in different sets are independent, so the probability of a particular
sequence of set results is the product of these winning probabilities, using the
multiplication rule found in Section 2.4.3 of the subject guide.
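A sketch of the resulting arithmetic (our own working). N = 3 happens either when you win three sets in a row or when your opponent does.

```latex
% You start set 1 (win prob 5/9); after each win you no longer start (win prob 4/9):
P(\text{you win 3 in a row}) = \frac{5}{9}\cdot\frac{4}{9}\cdot\frac{4}{9} = \frac{80}{729}.

% Your opponent wins set 1 with prob 4/9; you then start the next set, which the
% opponent wins with prob 4/9, and so on:
P(\text{opponent wins 3 in a row}) = \frac{4}{9}\cdot\frac{4}{9}\cdot\frac{4}{9} = \frac{64}{729}.

P(N = 3) = \frac{80}{729} + \frac{64}{729} = \frac{144}{729} = \frac{16}{81}.
```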

(c) Show that $P(N = 4) = \frac{280}{729}$. Hence find P(N = 5). (10 marks)

Approaching the question


Since N can only take the values 3, 4 and 5, most candidates realised that
P(N = 5) = 1 − P(N = 3) − P(N = 4), and so the last step of part (c) was generally answered
well. However, most candidates were not able to evaluate P(N = 4). As in part (b), they
needed to identify the probability of winning or losing each set, and list all the
combinations of results giving N = 4.
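The enumeration can be checked mechanically. This sketch, with our own hypothetical names `P_START`, `sequence_prob` and `dist_of_N`, lists every sequence of set results ('W' if you win the set, 'L' if you lose it) and multiplies the per-set probabilities under the rule in the question: the starter of a set wins it with probability 5/9, you start set 1, and afterwards the loser of the previous set starts. A sequence of length n ends the game at set n exactly when the winner of set n has 3 wins in total. Exact fractions reproduce P(N = 3) = 16/81 and P(N = 4) = 280/729, giving P(N = 5) = 305/729.

```python
from fractions import Fraction
from itertools import product

P_START = Fraction(5, 9)  # probability that the player who starts a set wins it


def sequence_prob(results):
    """Probability of a sequence of set results ('W' = you win, 'L' = you lose).

    You start set 1; afterwards the loser of the previous set starts the next one,
    so you start a set exactly when you lost the previous one.
    """
    prob = Fraction(1)
    you_start = True
    for r in results:
        p_you_win = P_START if you_start else 1 - P_START
        prob *= p_you_win if r == "W" else 1 - p_you_win
        you_start = r == "L"
    return prob


def dist_of_N():
    """Return {n: P(N = n)} by enumerating every way the game can end."""
    dist = {3: Fraction(0), 4: Fraction(0), 5: Fraction(0)}
    for n in dist:
        for results in product("WL", repeat=n):
            # The game ends at set n iff the winner of set n reaches exactly 3 wins there.
            if results.count(results[-1]) == 3:
                dist[n] += sequence_prob(results)
    return dist


dist = dist_of_N()
print(dist[3], dist[4], dist[5])  # 16/81 280/729 305/729
print(sum(dist.values()))         # 1
```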
