Probability ideas
Issued: Week 1, Friday 14 January.
Workshop: Week 2, 17-21 January.
Hand in: Friday 21 January, by 4 pm.
Solutions posted online: Friday 28 January, evening.
1.1 On the space $\Omega = \{1, 2, 3, \ldots\}$ a probability distribution is defined by $p_\omega = 2^{-\omega}$ for $\omega = 1, 2, 3, \ldots$.
(a) For positive integer $n$ denote by $A_n$ the event {$\omega$ is a multiple of $n$}. Show that
$P(A_n) = 1/(2^n - 1)$.
(b) Denote by $C$ the event {$\omega$ is a multiple of 2 or of 3}. Find $P(C)$.
1.3 In the UK National Lottery, where six numbers are chosen at random from a list of 49
numbers, players select six numbers themselves, hoping to match as many of the chosen
numbers as possible. Find the probabilities that for a given entry:
(a) exactly four winning numbers are selected;
(b) at least four winning numbers are selected;
(c) exactly two of the winning numbers are multiples of six.
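A hand calculation for parts (a) and (b) can be checked numerically. The sketch below assumes the usual hypergeometric model (six winning numbers drawn without replacement from 49, six numbers on the entry); the function name `match_probability` is illustrative and not part of the exercise.

```python
from math import comb

def match_probability(k, picks=6, winners=6, total=49):
    """P(exactly k of the `picks` selected numbers lie in a set of `winners` numbers)."""
    return comb(winners, k) * comb(total - winners, picks - k) / comb(total, picks)

# (a) exactly four winning numbers on the entry
p_exactly_four = match_probability(4)

# (b) at least four winning numbers: add the cases k = 4, 5, 6
p_at_least_four = sum(match_probability(k) for k in range(4, 7))

print(p_exactly_four, p_at_least_four)
```

The same function covers part (c) if the "special" set is taken to be the eight multiples of six below 49, i.e. `match_probability(2, winners=8)`.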
1.4 The roulette wheel in a UK casino has 37 numbers, {0, 1, . . . , 36}, all equally likely.
Pandora makes bets on the following combinations:
(a) top half = {19, 20, 21, . . . , 36};
(b) odd = {1, 3, 5, . . . , 35};
(c) bottom row = {34, 35, 36};
(d) the foursome {14, 15, 17, 18}.
Her bet wins if the winning number falls into her selection. Let A, B, C and D indicate
that the above four respective bets are winning bets. Find the probabilities of A, B,
A B, A B, B C and A B C D.
PSEx.tex
17 i 2011
[Figure: system diagram with components, including those marked A and B, connected between input and output.]
What is the probability that the system fails within the next five years? Given that it
has not failed in five years, find the probability that neither of the components marked
A and B has failed.
$\sum_{k=1}^{n} k = \frac{1}{2} n(n + 1)$,
$\sum_{k=1}^{n} k^2 = \frac{1}{6} n(n + 1)(2n + 1)$,
$\sum_{k=0}^{\infty} k^2 x^k = \frac{x(1 + x)}{(1 - x)^3}$ for $|x| < 1$.
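These three summation identities are easy to sanity-check numerically before using them. The sketch below compares each closed form with a brute-force sum; the function name, the default arguments and the truncation of the infinite series at 200 terms are illustrative choices, not part of the original sheet.

```python
def check_identities(n=10, x=0.3, terms=200):
    """Compare each closed form with a brute-force sum."""
    sum_k = sum(range(1, n + 1))
    sum_k2 = sum(k * k for k in range(1, n + 1))
    # Truncated version of the infinite series; valid only for |x| < 1.
    series = sum(k * k * x**k for k in range(terms))
    closed_series = x * (1 + x) / (1 - x) ** 3
    return (
        sum_k == n * (n + 1) // 2,
        sum_k2 == n * (n + 1) * (2 * n + 1) // 6,
        abs(series - closed_series) < 1e-12,
    )

print(check_identities())
```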
3.1 In each case below show that the probabilities given are non-negative and sum to 1, and
that the stated means and variances are correct. In parts (b) and (c), take $0 < p < 1$ and
$q = 1 - p$.
(a) The discrete uniform distribution Unif(1, . . . , n), where $P(X = x) = 1/n$ for $x = 1, 2, \ldots, n$, with $EX = (n + 1)/2$ and $\operatorname{var} X = (n^2 - 1)/12$.
3.2 In a simple Lottery, four numbers are selected at random from twenty, and you win the
first prize if you match all four numbers, or a second prize if you match 3 (and not 4)
numbers. You buy one ticket.
(a) Find the probabilities of winning the respective prizes.
(b) Calculate the mean number of correct guesses that you will make.
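One way to check answers to 3.2 is to enumerate the distribution of the number of matches exactly. This is a sketch under the assumption that the number of matches is hypergeometric (four numbers drawn from twenty, four on the ticket); `p_match` is an illustrative name.

```python
from fractions import Fraction
from math import comb

def p_match(k, drawn=4, total=20):
    """Exact P(ticket matches exactly k of the drawn numbers)."""
    return Fraction(comb(drawn, k) * comb(total - drawn, drawn - k), comb(total, drawn))

first_prize = p_match(4)    # all four numbers matched
second_prize = p_match(3)   # exactly three matched
mean_matches = sum(k * p_match(k) for k in range(5))

print(first_prize, second_prize, mean_matches)
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with a hand-derived fraction.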
3.3 You buy 100 used computer monitors for a lump sum of £1,500. You expect about 60%
to be functioning, and you'll sell those for £40 each. The rest you'll sell at £5 each as
scrap. Model X, the number of monitors that will be functioning, using the binomial
distribution. Write down your net profit, in terms of X. Deduce the mean and standard
deviation of your net profit.
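The calculation in 3.3 can be sketched as follows, assuming X ~ Binom(100, 0.6) and that the net profit is sales revenue minus the lump sum paid. The variable and function names are illustrative only.

```python
from math import sqrt

n, p = 100, 0.6                      # assumed binomial model for X
cost, price_good, price_scrap = 1500, 40, 5

def net_profit(x):
    """Profit when x monitors work: 40x + 5(100 - x) - 1500 = 35x - 1000."""
    return price_good * x + price_scrap * (n - x) - cost

mean_x = n * p
sd_x = sqrt(n * p * (1 - p))

# Profit is linear in X, so its mean is the profit at E(X)
# and its standard deviation is |slope| times sd(X).
mean_profit = net_profit(mean_x)
sd_profit = (price_good - price_scrap) * sd_x

print(mean_profit, sd_profit)
```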
3.4 Suppose the random variables X and Y have the following joint distribution:
$P(X = -1, Y = 0) = P(X = 0, Y = 0) = P(X = 0, Y = 1) = P(X = 1, Y = 0) = \frac{1}{4}$.
(a) Find the (marginal) distributions of X and of Y.
(b) Deduce that $E(XY) = EX \cdot EY$, but that X and Y are not independent.
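A small enumeration can confirm the claims in 3.4. The sketch assumes the four equally likely points are (-1, 0), (0, 0), (0, 1) and (1, 0), i.e. that one X value carries a minus sign that was lost in the source; that reading is an assumption.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
# ASSUMED support: the point (-1, 0) is a reconstruction, not confirmed by the source.
joint = {(-1, 0): quarter, (0, 0): quarter, (0, 1): quarter, (1, 0): quarter}

# Marginal distributions: sum the joint probabilities over the other variable.
marg_x, marg_y = {}, {}
for (x, y), prob in joint.items():
    marg_x[x] = marg_x.get(x, 0) + prob
    marg_y[y] = marg_y.get(y, 0) + prob

e_x = sum(x * prob for x, prob in marg_x.items())
e_y = sum(y * prob for y, prob in marg_y.items())
e_xy = sum(x * y * prob for (x, y), prob in joint.items())

# Independence would require P(X=x, Y=y) = P(X=x) P(Y=y) for every pair (x, y).
independent = all(
    joint.get((x, y), 0) == marg_x[x] * marg_y[y] for x in marg_x for y in marg_y
)

print(marg_x, marg_y, e_xy == e_x * e_y, independent)
```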
(a) Show that, if X and Y are independent with Binom(m, p) and Binom(n, p) distributions, then X + Y has a Binom(m + n, p) distribution.
(b) Show that if, instead, Y has a Binom(n, r) distribution, X + Y does not have a
binomial distribution unless r = p.
4.4 Suppose that X has the geometric distribution Geom(p), so that $P(X = x) = p q^x$ for
$x = 0, 1, 2, \ldots$. Show that its probability generating function is $p/(1 - qz)$, provided
that $|z| < 1/q$. Hence confirm that the mean and variance of X are $q/p$ and $q/p^2$,
respectively.
Use this result to show that the probability generating function of the sum of
r independent Geom(p) random variables is $p^r/(1 - qz)^r$. Deduce that if Y has this
distribution then
$P(Y = y) = \binom{y + r - 1}{y} p^r q^y$ $(y = 0, 1, 2, \ldots)$.
This is called the negative binomial distribution NB(r, p) and arises as the probability
that in a sequence of Bernoulli trials there are exactly y failures before the r-th success.
Deduce the mean and variance of a NB(r, p) distribution from its probability generating
function, and note that these results can be obtained directly from the fact that this
negative binomial arises as the sum of r independent geometric random variables.
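The link between the geometric and negative binomial distributions can be checked numerically by convolving geometric probability mass functions. This is a sketch, assuming the parametrisation $P(X = x) = p q^x$ on $x = 0, 1, 2, \ldots$ used above; all function names are illustrative.

```python
from math import comb

def geom_pmf(p, size):
    """P(X = x) = p q^x for x = 0, ..., size - 1."""
    q = 1 - p
    return [p * q**x for x in range(size)]

def convolve(a, b):
    """Distribution of a sum of two independent variables, truncated to len(a) terms."""
    out = [0.0] * len(a)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < len(out):
                out[i + j] += ai * bj
    return out

def nb_pmf_by_convolution(r, p, size):
    """pmf of the sum of r independent Geom(p) variables, for y = 0, ..., size - 1."""
    pmf = geom_pmf(p, size)
    for _ in range(r - 1):
        pmf = convolve(pmf, geom_pmf(p, size))
    return pmf

def nb_pmf_formula(r, p, y):
    """Closed form: C(y + r - 1, y) p^r q^y."""
    return comb(y + r - 1, y) * p**r * (1 - p) ** y

by_convolution = nb_pmf_by_convolution(3, 0.4, 15)
by_formula = [nb_pmf_formula(3, 0.4, y) for y in range(15)]
print(max(abs(u - v) for u, v in zip(by_convolution, by_formula)))
```

Truncating each convolution at `size` terms loses nothing for the values kept, because $P(Y = y)$ only involves summands no larger than $y$.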
(d) Show how $P(0.2 < X < 0.6)$ can be obtained from the answers to parts (b) and (c)
using the Law of Total Probability.
5.4 Prove that if X has the $\Gamma(\alpha, \lambda)$ distribution, i.e.
$f(x) = \lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x} / \Gamma(\alpha)$ if $x > 0$, and $f(x) = 0$ elsewhere,
Reminder: $\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x} \, dx$.
$\frac{1}{4} e^{-|x|/2}$,
(b) Work your answer out for small values of n and sketch the result.
7.3 Suppose that $Z_1$, $Z_2$ and $Z_3$ are independent observations from the standard normal
distribution. Let $X_1 = Z_1 + Z_2$, $X_2 = Z_1 - 2Z_2 + 3Z_3$ and $X_3 = Z_1 - Z_2$. Find the
covariance and correlation between
(a) $X_1$ and $X_2$,
(b) $X_1$ and $X_3$,
(c) $X_2$ and $X_3$.
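Because the $Z_i$ are independent with unit variance, the covariance of two linear combinations is the dot product of their coefficient vectors. The sketch below uses that fact and assumes the reading $X_2 = Z_1 - 2Z_2 + 3Z_3$ and $X_3 = Z_1 - Z_2$ (the minus signs are a reconstruction); the names `coeffs`, `cov` and `corr` are illustrative.

```python
from math import sqrt

# Coefficients of (Z1, Z2, Z3) in each X; the signs in X2 and X3 are ASSUMED.
coeffs = {
    "X1": (1, 1, 0),
    "X2": (1, -2, 3),
    "X3": (1, -1, 0),
}

def cov(a, b):
    """cov(sum a_i Z_i, sum b_i Z_i) = sum a_i b_i for independent unit-variance Z_i."""
    return sum(u * v for u, v in zip(coeffs[a], coeffs[b]))

def corr(a, b):
    """Correlation = covariance divided by the product of standard deviations."""
    return cov(a, b) / sqrt(cov(a, a) * cov(b, b))

for pair in [("X1", "X2"), ("X1", "X3"), ("X2", "X3")]:
    print(pair, cov(*pair), corr(*pair))
```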
7.4 A rock specimen from a particular area is randomly selected and weighed two different
times. Let W denote the actual weight and $X_1$ and $X_2$ the two measured weights. Thus
$X_1 = W + E_1$ and $X_2 = W + E_2$, where $E_1$ and $E_2$ are the two measurement errors.
Assume that the $E_i$ are independent of each other and of W and that $\operatorname{var} E_1 = \operatorname{var} E_2 = \sigma_E^2$.
(a) Express $\rho$, the correlation between the two measured weights $X_1$ and $X_2$, in terms of
$\sigma_W^2$, the variance of actual weight, and $\sigma_E^2$.
(b) Calculate $\rho$ when $\sigma_W = 1$ kg and $\sigma_E = 50$ g.
(c) Weight might be proportional to height$^3$ (e.g. volume) or height$^2$ (e.g. area, for hollow
plants). Therefore try fitting a power relationship. Calculate the least-squares regression
line for $V := \ln Y$ on $U := \ln X$.
(d) Plot the residuals against the values $u_i$.
(e) Express the regression line as an equation giving Y in terms of X, and plot the curve
on your original graph.
8.4 A genetic experiment was undertaken to study the competition between two types of
female Drosophila melanogaster (fruit fly) in cages with one male genotype acting as a
substrate. The independent variable X is the time, in days, spent in cages, and the
dependent variable Y is the ratio of the numbers of Type 1 to Type 2 females. The
following data were recorded:
X:  17      31      45      59      73
Y:  0.2338  0.5804  1.982   3.388   13.01
(a) Plot the points. The relationship is clearly non-linear.
(b) Transform the data appropriately and find the least-squares regression line for the
transformed data.
(c) Plot the residuals and say whether your transformation has made it satisfactory to fit
a straight line.
(d) Express the regression line as an equation giving Y in terms of X, and plot the curve
on your original graph.
13,
24,
10,
11,
9.4 Repeat as much as you can of Exercise 8.4 using Minitab. Use two transformations and
look at the residuals in each case. Give reasons for the transformation that you eventually
choose. Show the Minitab commands that you use clearly.
Estimation I
Issued: Week 10, Friday 18 March.
Workshop: 28 April-5 May.
Hand in: Friday 6 May, by 4 pm.
Solutions posted online: Friday 13 May, evening.
10.1 The Poisson distribution has been used by traffic engineers as a model for
light traffic, based on the rationale that if the rate is approximately constant
and the traffic is light (so the individual cars move independently of each
other), the distribution of counts of cars in a given time-interval or space area
should be nearly Poisson. The table on the right records numbers of right
turns in 300 three-minute periods at a specific intersection (D. Gerlough &
A. Schuhl, Use of Poisson Distribution in Highway Traffic, Eno Foundation
for Highway Traffic Control, 1955). The table shows the frequency fi of
periods in which there were i right turns.
(a) The usual formula for sample mean is $\bar{x} := \sum_{i=1}^{n} x_i / n$. However for
the present data-set, in frequency form, the average number of right
turns per 3-minute period is $\bar{x} = \sum_{i=0}^{12} i f_i \big/ \sum_{i=0}^{12} f_i$. Explain why,
and calculate $\bar{x}$.
(b) Assume that X, the number of right turns in a 3-minute period, has
a Poisson distribution with parameter $\lambda$. Find an unbiased estimator
of $\lambda$ and calculate the estimate for the given data. What is the standard error (i.e.
standard deviation) of your estimator? Calculate the estimated standard error.
Hint: $EX = \lambda$ and $\operatorname{var} X = \lambda$ for X Poisson, so $E(\bar{X}) = ?$ and $\operatorname{var}(\bar{X}) = ?$
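The frequency-form calculation can be sketched in a few lines. The frequencies used here are made-up illustrative numbers, NOT the Gerlough and Schuhl data (the table is not reproduced in this text), and the function name is hypothetical. The estimated standard error uses the standard Poisson facts that the sample mean is unbiased for the parameter and has variance equal to the parameter divided by the number of periods.

```python
from math import sqrt

def frequency_mean(freq):
    """Mean of count data given as freq[i] = number of periods with i events."""
    n_periods = sum(freq)
    total_events = sum(i * f for i, f in enumerate(freq))
    return total_events / n_periods, n_periods

# HYPOTHETICAL frequencies f_0, ..., f_4, for illustration only.
example_freq = [10, 20, 30, 25, 15]

xbar, n_periods = frequency_mean(example_freq)

# For Poisson data: estimate of the parameter is xbar,
# and the estimated standard error is sqrt(xbar / n).
est_standard_error = sqrt(xbar / n_periods)

print(xbar, est_standard_error)
```

Each period with i turns contributes i to the total, so summing `i * f_i` counts every turn once; dividing by the number of periods gives the ordinary sample mean.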
(c) For both the following sets of observations from this distribution, calculate the values
of the maximum-likelihood estimate and the method-of-moments estimate of the parameter.
(i) 0.0256, 0.3051, 0.0278, 0.8971, 0.0739, 0.3191, 0.7379, 0.3671, 0.9763, 0.0102.
(ii) 0.4698, 0.3675, 0.5991, 0.9513, 0.6049, 0.9917, 0.1551, 0.0710, 0.2110, 0.2154.
10.4 Let X1 , . . . , Xn be a random sample of size n from the geometric distribution with success
parameter p, i.e.
$f(x) = p(1 - p)^x$ $(x = 0, 1, 2, \ldots)$.
(a) Use the method of moments to find a point estimate of p.
(b) Explain in words why this estimate makes sense.
(c) Find a point estimate of p, given the following data:
2, 33, 6, 3, 18, 1, 0, 18, 42, 1, 21, 3, 18, 10, 6, 0, 1, 20, 14, 15.
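A sketch of the method-of-moments calculation for part (c), assuming the parametrisation $f(x) = p(1 - p)^x$ on $x = 0, 1, 2, \ldots$, whose mean is $(1 - p)/p$. Equating that mean to the sample mean and solving gives $\hat{p} = 1/(1 + \bar{x})$. Variable names are illustrative.

```python
data = [2, 33, 6, 3, 18, 1, 0, 18, 42, 1, 21, 3, 18, 10, 6, 0, 1, 20, 14, 15]

# Sample mean: average number of failures before a success.
xbar = sum(data) / len(data)

# Solve (1 - p) / p = xbar for p.
p_hat = 1 / (1 + xbar)

print(xbar, p_hat)
```

The estimate is the number of successes (one per observation) divided by the total number of trials (failures plus successes), which is the intuition part (b) asks for.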
Hypothesis testing I
Issued: Friday 13 May.
Workshop: 16-19 May.
Hand in: Friday 20 May, by 4 pm.
Solutions posted online: Friday 27 May, evening.
12.1 Assume that IQ scores for a certain population are approximately $N(\mu, 100)$. To test
$H_0: \mu = 110$ against the one-sided alternative $H_1: \mu > 110$ we take a random sample of
size 16 from this population, and find that the mean of this sample is $\bar{x} = 113.5$.
(a) Do we accept or reject H0 at the 5% level?
(b) Do we accept or reject H0 at the 10% level?
(c) What is the p-value?
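The one-sided z-test in 12.1 can be sketched with the standard library alone. The sketch assumes the variance 100 is known (so the standard deviation is 10), reads the sample mean as 113.5, and computes the upper-tail normal probability via `math.erfc`; variable names are illustrative.

```python
from math import erfc, sqrt

mu0, sigma, n, xbar = 110.0, 10.0, 16, 113.5   # assumed reading of the exercise

# Standardised test statistic under H0.
z = (xbar - mu0) / (sigma / sqrt(n))

# One-sided p-value: P(Z >= z) = 0.5 * erfc(z / sqrt(2)) for standard normal Z.
p_value = 0.5 * erfc(z / sqrt(2))

for level in (0.05, 0.10):
    decision = "reject" if p_value < level else "do not reject"
    print(f"level {level}: {decision} H0")
print(z, p_value)
```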
12.2 The calibration of a scale is to be checked by weighing a 5 kg test specimen 10 times.
Suppose that the results of different weighings are independent of one another and that
the weight on each trial is Normally distributed with $\sigma = 0.200$ kg. Let $\mu$ denote the true
average weight reading on the scale.
(a) What hypotheses should be tested?
(b) Suppose the scale is to be re-calibrated if either $\bar{x} \ge 5.1629$ or $\bar{x} \le 4.8371$.
p Express
(c) What is the probability that re-calibration is carried out when it is actually unnecessary?
(d) Which type of error would that be?
(e) Using the test of (b), what would you conclude from the sample data below?
4.981, 5.006, 4.857, 5.107, 4.888, 4.793, 4.728, 5.439, 5.214, 5.190
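Parts (c) and (e) can be checked numerically. The sketch assumes the readings are in kilograms with three decimal places (4.981, 5.006, ...), a standard deviation of 0.200 kg, and re-calibration thresholds of 5.1629 and 4.8371; these decimal placements are a reading of the source rather than something it states explicitly.

```python
from math import erfc, sqrt

sigma, n, mu0 = 0.200, 10, 5.0          # assumed known sigma and nominal weight
upper, lower = 5.1629, 4.8371           # assumed re-calibration thresholds

# Standard error of the sample mean, and the critical value in z units.
se = sigma / sqrt(n)
z_crit = (upper - mu0) / se

# P(re-calibrate | mu = 5) = 2 * (1 - Phi(z_crit)) = erfc(z_crit / sqrt(2)).
alpha = erfc(z_crit / sqrt(2))

data = [4.981, 5.006, 4.857, 5.107, 4.888, 4.793, 4.728, 5.439, 5.214, 5.190]
xbar = sum(data) / n
recalibrate = xbar >= upper or xbar <= lower

print(z_crit, alpha, xbar, recalibrate)
```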
12.3 Assume that the birth weight in grammes of a baby born in the US is $N(3315, 525^2)$ for
boys and girls combined. Let X be the weight of a baby girl who is born at home in
Ottawa County and assume that $X \sim N(\mu, \sigma^2)$.
(a) Using 11 observations of X, give the test statistic and critical region for testing
$H_0: \mu = 3315$ against the alternative $H_1: \mu > 3315$ (home-born girls in Ottawa County
are heavier) with significance level $\alpha = 0.01$.
(b) Calculate the value of the test statistic and give your conclusion using the following
weights:
3119, 2657, 3459, 3629, 3345, 3629, 3515, 3856, 3629, 3345, 3062.
12.4 Copper values ($\mu$g Cu/100 ml blood) were determined for cattle grazing in an area known
to have well-defined molybdenum anomalies (metal values in excess of normal regional
variation) and for cattle grazing in a non-anomalous area [L. Thornton, G. F. Kershaw,
M. K. Davies, An investigation into copper deficiency in cattle in the southern Pennines,
I, J. Agricultural Sci. 78 (1972), 157-163], resulting in $s_X = 21.5$ ($m = 48$) for the
anomalous area and $s_Y = 19.45$ ($n = 45$) for the non-anomalous area. Test at significance
level 0.10 for equality of population variances.
(b) Deduce, without further calculation or use of tables, the result of the test at significance level 5% for the Null Hypothesis $\mu_X = \mu_Y$.
(c) Repeat (a) without the assumption of common variance of X and Y, i.e. define T
using a non-pooled estimate of variance:
$T := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{S_X^2/m + S_Y^2/n}}$,
and use Welch's formula for the degrees of freedom r, as given in lectures.
(d) Test at the 10% level whether the variances of residues in the two populations are
equal.
14.3 The driver of a diesel-powered car decided to test the quality of three types of diesel
fuel, based upon miles per gallon. Test the Null Hypothesis that the three means are
equal using the data below, using the significance level $\alpha = 0.05$ and making the usual
assumptions.
Brand A: 38.7, 39.2, 40.1, 38.9
Brand B: 41.9, 42.3, 41.3
Brand C: 40.8, 41.2, 39.5, 38.9, 40.3
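The one-way ANOVA F statistic for 14.3 can be computed directly from its definition. The sketch assumes the readings are miles per gallon with one decimal place (38.7, 39.2, ...) and makes the usual assumptions of independent normal samples with common variance; the function name `one_way_f` is illustrative.

```python
groups = {
    "A": [38.7, 39.2, 40.1, 38.9],
    "B": [41.9, 42.3, 41.3],
    "C": [40.8, 41.2, 39.5, 38.9, 40.3],
}

def one_way_f(samples):
    """Return (F, df_between, df_within) for a one-way layout."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    # Between-groups sum of squares: group sizes times squared deviations of group means.
    ss_between = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Within-groups sum of squares: deviations of observations from their own group mean.
    ss_within = sum((v - sum(s) / len(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_between, df_within = one_way_f(list(groups.values()))
print(f_stat, df_between, df_within)
```

The statistic is then compared with the upper 5% point of the F distribution on (2, 9) degrees of freedom, which standard tables give as approximately 4.26.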
14.4 Different sizes of nails are packaged in one-pound boxes. Let Xi for i = 1, 2, 3, 4, 5 be
the weight of a box with nail size 4C, 8C, 12C, 16C, 20C respectively, these being the
sizes from smallest to largest. It is desired to test whether the mean weights of nails in
the 4C, 8C, 12C, 16C and 20C boxes are equal. Assume that the distribution of $X_i$ is
$N(\mu_i, \sigma^2)$.
(a) Using random samples of size 7, give a critical region for a test with $\alpha = 0.05$.
(b) Construct an ANOVA table, and state your conclusions using the following data.
$X_1$: 1.03, 1.04, 1.07, 1.03, 1.08, 1.06, 1.07
$X_2$: 1.03, 1.10, 1.08, 1.05, 1.06, 1.06, 1.05
$X_3$: 1.03, 1.08, 1.06, 1.02, 1.04, 1.04, 1.07
$X_4$: 1.10, 1.10, 1.09, 1.09, 1.06, 1.05, 1.08
$X_5$: 1.04, 1.06, 1.07, 1.06, 1.05, 1.07, 1.05
(c) Construct boxplots on the same diagram for each type of nail, and comment.