G5098 Probability and Statistics 2010–11: Exercises 1

Probability ideas
Issued: Week 1, Friday 14 January.
Workshop: Week 2, 17–21 January.
Hand in: Friday 21 January, by 4 pm.
Solutions posted online: Friday 28 January, evening.
1.1 On the space $\Omega = \{1, 2, 3, \ldots\}$ a probability distribution is defined by $p_\omega = 2^{-\omega}$ for $\omega = 1, 2, 3, \ldots$.
(a) For positive integer $n$ denote by $A_n$ the event $\{\omega \text{ is a multiple of } n\}$. Show that
$P(A_n) = 1/(2^n - 1)$.
(b) Denote by $C$ the event $\{\omega \text{ is a multiple of 2 or of 3}\}$. Find $P(C)$.
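
A numerical check of part (a) can be reassuring before attempting the proof. The sketch below assumes the distribution is $p_\omega = 2^{-\omega}$ and sums the probabilities of the multiples of $n$ exactly, truncating the series after a fixed number of terms; the function name `prob_multiple_of` and the truncation point are illustrative choices only.

```python
from fractions import Fraction


def prob_multiple_of(n: int, terms: int = 200) -> Fraction:
    """Partial sum of 2**(-w) over w = n, 2n, ..., terms*n (exact arithmetic)."""
    return sum(Fraction(1, 2 ** (k * n)) for k in range(1, terms + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        approx = float(prob_multiple_of(n))
        claimed = 1 / (2 ** n - 1)
        # The two columns should agree to many decimal places.
        print(n, approx, claimed)
```

The same function can be combined with inclusion-exclusion to check an answer to part (b).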

1.2 Show by induction that, for any events $A_1, A_2, \ldots, A_n$,

$$P\left(\bigcup_{k=1}^{n} A_k\right) \le \sum_{k=1}^{n} P(A_k).$$

This is called Boole's inequality.

1.3 In the UK National Lottery, where six numbers are chosen at random from a list of 49
numbers, players select six numbers themselves, hoping to match as many of the chosen
numbers as possible. Find the probabilities that for a given entry:
(a) exactly four winning numbers are selected;
(b) at least four winning numbers are selected;
(c) exactly two of the winning numbers are multiples of six.
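
Parts (a) and (b) are hypergeometric counts, so hand calculations can be checked with a few lines of code. This is a minimal sketch, assuming that all $\binom{49}{6}$ draws are equally likely; `p_match` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb


def p_match(k: int, drawn: int = 6, total: int = 49) -> Fraction:
    """Probability that an entry of `drawn` numbers matches exactly k winning numbers."""
    ways = comb(drawn, k) * comb(total - drawn, drawn - k)
    return Fraction(ways, comb(total, drawn))


if __name__ == "__main__":
    print("exactly four:", p_match(4), float(p_match(4)))
    at_least_four = sum(p_match(k) for k in (4, 5, 6))
    print("at least four:", at_least_four, float(at_least_four))
```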
1.4 The roulette wheel in a UK casino has 37 numbers, {0, 1, . . . , 36}, all equally likely.
Pandora makes bets on the following combinations:
(a) top half = {19, 20, 21, . . . , 36};
(b) odd = {1, 3, 5, . . . , 35};
(c) bottom row = {34, 35, 36};
(d) the foursome {14, 15, 17, 18}.
Her bet wins if the winning number falls into her selection. Let A, B, C and D indicate
that the above four respective bets are winning bets. Find the probabilities of A, B,
A B, A B, B C and A B C D.


G5098 Probability and Statistics 2010–11: Exercises 2


Conditional probability and independence
Issued: Week 2, Friday 21 January.
Workshop: Week 3, 24–28 January.
Hand in: Friday 28 January, by 4 pm.
Solutions posted online: Friday 4 February, evening.
2.1 Darren plays twice as many good games as bad games; he scores in 70% of his good games
and in 40% of his bad games. In what proportion of games does he score? Given that he
has scored, what is the probability that he had a good game?
2.2 An urn initially has one red ball. Persephone uses a device to select $n$ blue balls with
probability $e^{-\lambda}\lambda^n/n!$, for $n = 0, 1, 2, \ldots$, and add them to the urn. She then selects one
ball at random from the urn. Show that the probability that she selects the red ball is
$(1 - e^{-\lambda})/\lambda$.
2.3 In a communications system, a string of 0s and 1s, known as bits, is transmitted by a
sender to a receiver. Noise in the system means that some bits are incorrectly received:
the probability that a 0 arrives as a 1 is 0.05 and the probability that a 1 arrives as a 0
is 0.1. All bits are sent independently and 60% of all items sent begin as 1.
(a) Find the proportion of bits accurately received.
(b) Find the probability that a 1 was sent, given that a 1 is received.
(c) Find the probability that a 0 was sent, given that a 0 is received.
(d) To improve reliability, 0 is sent as 000 and 1 is sent as 111; the signal received
is interpreted as 0 or 1, according to the majority of 0s and 1s. For example, 010 is
interpreted as 0. What proportion of signals sent are interpreted correctly?
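
Parts (a) and (b) rest on the Law of Total Probability and Bayes' theorem. The sketch below assumes the error probabilities are 0.05 (a 0 received as a 1) and 0.1 (a 1 received as a 0), with 60% of bits sent as 1; the constant and function names are illustrative, not part of the exercise.

```python
from fractions import Fraction

P_ONE_SENT = Fraction(6, 10)       # assumed proportion of bits sent as 1
P_FLIP_0_TO_1 = Fraction(5, 100)   # assumed P(receive 1 | sent 0)
P_FLIP_1_TO_0 = Fraction(1, 10)    # assumed P(receive 0 | sent 1)


def p_correct() -> Fraction:
    """Law of Total Probability: proportion of bits received accurately."""
    return P_ONE_SENT * (1 - P_FLIP_1_TO_0) + (1 - P_ONE_SENT) * (1 - P_FLIP_0_TO_1)


def p_received_one() -> Fraction:
    """Total probability that a 1 is received."""
    return P_ONE_SENT * (1 - P_FLIP_1_TO_0) + (1 - P_ONE_SENT) * P_FLIP_0_TO_1


def p_sent_one_given_received_one() -> Fraction:
    """Bayes' theorem: P(1 sent | 1 received)."""
    return P_ONE_SENT * (1 - P_FLIP_1_TO_0) / p_received_one()


if __name__ == "__main__":
    print(p_correct(), p_received_one(), p_sent_one_given_received_one())
```

Part (c) follows the same pattern with the roles of 0 and 1 exchanged.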
2.4 The system in the diagram below will work if there is some path from left to right. In the
boxes, which represent components of the system, the numbers indicate the probability
that that component will fail in the next five years. Components behave independently
of each other.
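[Diagram: a system of six components between input and output, labelled with failure probabilities 0.3, 0.05 (A), 0.05 (B), 0.3, 0.2 and 0.3.]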

What is the probability that the system fails within the next five years? Given that it
has not failed in five years, find the probability that neither of the components marked
A and B has failed.

G5098 Probability and Statistics 2010–11: Exercises 3


Discrete random variables I
Issued: Week 3, Friday 28 January.
Workshop: Week 4, 31 January–4 February.
Hand in: Friday 4 February, by 4 pm.
Solutions posted online: Friday 11 February, evening.
The following facts may help. The first two are standard.

$$\sum_{k=1}^{n} k = \tfrac{1}{2}n(n+1), \qquad \sum_{k=1}^{n} k^2 = \tfrac{1}{6}n(n+1)(2n+1), \qquad \sum_{k=0}^{\infty} k^2 x^k = \frac{x(1+x)}{(1-x)^3} \quad \text{for } |x| < 1.$$

3.1 In each case below show that the probabilities given are non-negative and sum to 1, and
that the stated means and variances are correct. In parts (b) and (c), take $0 < p < 1$ and
$q = 1 - p$.
(a) The discrete uniform distribution Unif(1, . . . , n), where $P(X = x) = 1/n$ for $x = 1, 2, \ldots, n$, with $EX = (n + 1)/2$ and $\operatorname{var} X = (n^2 - 1)/12$.

(b) The geometric distribution Geom(p), where $P(X = x) = pq^x$ for $x = 0, 1, 2, \ldots$, with $EX = q/p$ and $\operatorname{var} X = q/p^2$.

(c) The binomial distribution Binom(n, p), where $P(X = x) = \binom{n}{x} p^x q^{n-x}$ for $x = 0, 1, \ldots, n$, with $EX = np$ and $\operatorname{var} X = npq$.

Hint for (c): find $E(X^2)$ by calculating $E[X(X - 1)]$ and adding $EX$ to it.
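
The stated means and variances can be sanity-checked numerically for particular parameter values before proving them in general. The sketch below does this for the binomial case; `binomial_pmf` and `moments` are hypothetical helper names, and a numerical check for one choice of n and p is of course not a proof.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binom(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def moments(pmf, support):
    """Return (total probability, mean, variance) of a pmf over a finite support."""
    probs = [(x, pmf(x)) for x in support]
    total = sum(pr for _, pr in probs)
    mean = sum(x * pr for x, pr in probs)
    var = sum((x - mean) ** 2 * pr for x, pr in probs)
    return total, mean, var


if __name__ == "__main__":
    n, p = 10, 0.3
    total, mean, var = moments(lambda x: binomial_pmf(x, n, p), range(n + 1))
    # Compare with the claimed values np and npq.
    print(total, mean, n * p, var, n * p * (1 - p))
```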

3.2 In a simple Lottery, four numbers are selected at random from twenty, and you win the
first prize if you match all four numbers, or a second prize if you match 3 (and not 4)
numbers. You buy one ticket.
(a) Find the probabilities of winning the respective prizes.
(b) Calculate the mean number of correct guesses that you will make.
3.3 You buy 100 used computer monitors for a lump sum of £1,500. You expect about 60%
to be functioning, and you'll sell those for £40 each. The rest you'll sell at £5 each as
scrap. Model X, the number of monitors that will be functioning, using the binomial
distribution. Write down your net profit, in terms of X. Deduce the mean and standard
deviation of your net profit.
3.4 Suppose the random variables X and Y have the following joint distribution:
$P(X = -1, Y = 0) = P(X = 0, Y = 0) = P(X = 0, Y = 1) = P(X = 1, Y = 0) = \tfrac{1}{4}$.
(a) Find the (marginal) distributions of X and of Y.
(b) Deduce that $E(XY) = EX \cdot EY$, but that X and Y are not independent.

G5098 Probability and Statistics 2010–11: Exercises 4


Discrete random variables II
Issued: Week 4, Friday 4 February.
Workshop: Week 5, 7–11 February.
Hand in: Friday 11 February, by 4 pm.
Solutions posted online: Friday 18 February, evening.
4.1 The paper by Y. Mori and B. R. Ellingwood, Reliability-based service-life assessment
of aging concrete structures (J. Structural Engineering 119(1993), 1600–1621) suggests
that as exceptional structural loads occur at random over time, Poisson distributions are
appropriate for modelling the numbers of such loads in given time intervals. Suppose
that exceptional loads on a specific building occur at the rate of two per year on average.
Use a Poisson distribution, with appropriate mean in relation to the time period, to find
the probabilities of the following events.
(a) Exactly five exceptional loads occur over the years 2011–12.
(b) At least two exceptional loads occur in 2013.
(c) The second exceptional load in 2011 or after occurs in 2012.
(d) Just two exceptional loads occur in 2011, given that there are five in total in 2011–12.
4.2 Suppose that X has the Pois($\lambda$) distribution and that, independently, Y is Pois($\mu$). Then
it is shown in lectures that X + Y is Pois($\lambda + \mu$). Find the conditional probability
$P(X = k \mid X + Y = n)$ when $0 \le k \le n$ and write your answer as a binomial probability.
Note that this is a generalisation of Exercise 4.1(d).
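
Once an answer to Exercise 4.2 has been found, it can be compared numerically with the conditional probability computed directly from Poisson probabilities. The sketch below assumes the two Poisson means are called `lam` and `mu` (names chosen here for illustration) and uses the fact, quoted above from lectures, that the sum is Poisson with mean `lam + mu`. The candidate binomial success probability is passed in by the caller rather than asserted by the code.

```python
from math import comb, exp, factorial


def poisson_pmf(k: int, mean: float) -> float:
    """P(N = k) for N ~ Pois(mean)."""
    return exp(-mean) * mean ** k / factorial(k)


def conditional_pmf(k: int, n: int, lam: float, mu: float) -> float:
    """P(X = k | X + Y = n) for independent X ~ Pois(lam), Y ~ Pois(mu)."""
    return poisson_pmf(k, lam) * poisson_pmf(n - k, mu) / poisson_pmf(n, lam + mu)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(B = k) for B ~ Binom(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


if __name__ == "__main__":
    # Setting of Exercise 4.1(d): two loads in 2011 given five in 2011-12,
    # with an assumed mean of 2 loads in each year.
    print(conditional_pmf(2, 5, 2.0, 2.0))
    # Compare with a candidate binomial probability, here with p = 1/2.
    print(binomial_pmf(2, 5, 0.5))
```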
4.3 In lectures, it was found that the probability generating function of a Binom(n, p) random
variable is $(pz + 1 - p)^n$.

(a) Show that, if X and Y are independent with Binom(m, p) and Binom(n, p) distributions, then X + Y has a Binom(m + n, p) distribution.
(b) Show that if, instead, Y has a Binom(n, r) distribution, X + Y does not have a
binomial distribution unless r = p.

4.4 Suppose that X has the geometric distribution Geom(p), so that $P(X = x) = pq^x$ for
$x = 0, 1, 2, \ldots$. Show that its probability generating function is $p/(1 - qz)$, provided
that $|z| < 1/q$. Hence confirm that the mean and variance of X are $q/p$ and $q/p^2$,
respectively.
Use this result to show that the probability generating function of the sum of
r independent Geom(p) random variables is $p^r/(1 - qz)^r$. Deduce that if Y has this
distribution then

$$P(Y = y) = \binom{y + r - 1}{y} p^r q^y \qquad (y = 0, 1, 2, \ldots).$$

This is called the negative binomial distribution NB(r, p) and arises as the probability
that in a sequence of Bernoulli trials there are exactly y failures before the rth success.
Deduce the mean and variance of a NB(r, p) distribution from its probability generating
function, and note that these results can be obtained directly from the fact that this
negative binomial arises as the sum of r independent geometric random variables.

G5098 Probability and Statistics 2010–11: Exercises 5

Continuous random variables I


Issued: Week 5, Friday 11 February.
Workshop: Week 6, 14–18 February.
Hand in: Friday 18 February, by 4 pm.
Solutions posted online: Friday 25 February, evening.
5.1 Let X denote the vibratory stress, in pounds per square inch (psi), on a wind turbine blade
at a particular wind speed in a wind tunnel. The paper by P. S. Veers, Blade fatigue
life assessment with application to VAWTS (J. Solar Energy Engineering 104(1982),
106–111) proposes a Rayleigh distribution, with density

$$f(x) = \begin{cases} \dfrac{x}{\theta^2}\, e^{-x^2/(2\theta^2)}, & \text{for } x > 0,\\ 0, & \text{otherwise,} \end{cases}$$

where $\theta > 0$, as a model for the distribution of X.
(a) Verify that f is a legitimate density.
(b) Suppose $\theta = 100$ (a value suggested by a graph in the article). What is the probability
that X is at most 200 psi? Less than 200 psi? At least 200 psi?
(c) What is the probability that X is between 100 psi and 200 psi (again assuming $\theta = 100$)?
(d) Find the distribution function of X.
5.2 A one-person business has to submit detailed accounts to the Income Tax authorities only
if its annual turnover is at least £67 000 per annum. Let X, in thousands of pounds, be
the annual turnover of a randomly chosen such business. A widely used model for the
density of such a random variable, for which there are theoretical reasons, is

$$f(x) := \begin{cases} kx^{-\alpha}, & \text{if } x > 67,\\ 0, & \text{otherwise,} \end{cases}$$

where $\alpha$ and $k$ are parameters.
(a) What values of $\alpha$ can be permitted? Find the value of k in terms of $\alpha$.
(b) Find the distribution function of X.
(c) For the case $\alpha = 4$ find $P(X \le 75)$ and $P(X > 100)$.
5.3 Suppose that X has density $f(x) = 2x$ for $0 < x < 1$.
(a) Calculate $P(0.2 < X < 0.6)$.
(b) Calculate $P(0.2 < X < 0.6 \mid X < 0.4)$.
(c) Calculate $P(0.2 < X < 0.6 \mid X > 0.4)$.

(d) Show how $P(0.2 < X < 0.6)$ can be obtained from the answers to parts (b) and (c)
using the Law of Total Probability.
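5.4 Prove that if X has the $\Gamma(\alpha, \beta)$ distribution, i.e.

$$f(x) = \begin{cases} \dfrac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}, & \text{if } x > 0,\\ 0, & \text{elsewhere,} \end{cases}$$

where $\alpha > 0$ and $\beta > 0$, then
(a) $EX = \alpha/\beta$;
(b) $\operatorname{var} X = \alpha/\beta^2$.

Reminder: $\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx$.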

G5098 Probability and Statistics 2010–11: Exercises 6


Continuous random variables II
Issued: Week 6, Friday 18 February.
Workshop: Week 7, 21–25 February.
Hand in: Friday 25 February, by 4 pm.
Solutions posted online: Friday 4 March, evening.
6.1 The diameter in inches, at chest height, of trees of a certain type is normally distributed
with mean 8.8 and standard deviation 2.8, as suggested in D. M. Aedo-Ortiz, E. D.
Olsen and L. D. Kellogg, Simulating a harvester-forwarder softwood thinning: a software
evaluation (Forest Products J. 47(1997), 36–41).
(a) What is the probability that the diameter of a randomly selected tree will be at least
10 in? Will exceed 10 in?
(b) What is the probability that the diameter of a randomly selected tree will exceed
20 in? Comment on this calculation.
(c) What is the probability that the diameter of a randomly selected tree will be between
5 and 10 in?
(d) What value c is such that the interval $(8.8 - c, 8.8 + c)$ includes 98% of all diameter
values?
(e) What is the probability, if four trees are independently selected, that at least one has
a diameter exceeding 10 in?
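
Normal probabilities of this kind are usually read from tables, but they can be cross-checked with Python's standard library. The sketch below assumes a mean of 8.8 in and a standard deviation of 2.8 in; the helper names `prob_between` and `central_half_width` are illustrative.

```python
from statistics import NormalDist

# Assumed model: diameter ~ N(8.8, 2.8**2), measured in inches.
diameter = NormalDist(mu=8.8, sigma=2.8)


def prob_between(dist: NormalDist, low: float, high: float) -> float:
    """P(low < X < high) for a normal random variable X."""
    return dist.cdf(high) - dist.cdf(low)


def central_half_width(dist: NormalDist, coverage: float) -> float:
    """Half-width c such that (mean - c, mean + c) has the given probability."""
    upper = dist.inv_cdf(0.5 + coverage / 2)
    return upper - dist.mean


if __name__ == "__main__":
    print("P(X > 10):", 1 - diameter.cdf(10))
    print("P(5 < X < 10):", prob_between(diameter, 5, 10))
    print("c for 98% coverage:", central_half_width(diameter, 0.98))
    # Part (e): at least one of four independent trees exceeds 10 in.
    print("P(at least one of four > 10):", 1 - diameter.cdf(10) ** 4)
```

Since the normal distribution is continuous, "at least 10" and "exceeds 10" give the same value in this model.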
6.2 Let X have density

$$f(x) = \tfrac{1}{4} e^{-|x|/2} \qquad \text{for } -\infty < x < \infty.$$

(a) Sketch this density.


(b) Show that EX = 0 and var X = 8.
(c) Suppose that the error, in grammes, of a balance has the above distribution and that
100 items are weighed, independently of each other. Use the Central Limit Theorem to
approximate the probability that the absolute difference between the true total weight
and the measured total weight is more than 50 g.
6.3 Suppose that X and Y are independent random variables, with N(2, 2) and N(10, 3)
distributions, respectively. State the distributions of the five random variables $-X$, $2X$,
$5X + Y$, $Y - X - 5$ and $(X + Y)/2$.
6.4 (a) Find the moment generating function of $X \sim \mathrm{Unif}(-1, 1)$.
(b) By expanding the mgf as a power series,
(i) show that X has mean zero and variance 1/3;
(ii) find the third and fourth moments of X.

G5098 Probability and Statistics 2010–11: Exercises 7


Continuous random variables III
Issued: Week 7, Friday 25 February.
Workshop: Week 8, 28 February–4 March.
Hand in: Friday 4 March, by 4 pm.
Solutions posted online: Friday 11 March, evening.
7.1 Let X and Y have joint density f (x, y) = c(x+2y) on the rectangle 0 < x < 3, 1 < y < 2.
Evaluate c and calculate the marginal densities of X and Y . Are X and Y independent?
Find the densities of X, given that Y = 1.25, and of Y, given that X = 2. Confirm that
the two answers you obtain are indeed densities, that is, that they are non-negative and
integrate to 1. Evaluate EX, EY and cov(X, Y ).
7.2 Let $X_1, \ldots, X_n$ be random variables denoting n independent bids for an item that is for
sale. Suppose each $X_i$ is uniformly distributed between £100 and £200. The seller sells
to the highest bidder.
(a) Find how much, as a function of n, he can expect to make on the sale.
Hint: let $Y = \max(X_1, \ldots, X_n)$. Find the distribution function of Y by noting that
$Y \le y$ if and only if $X_i \le y$ for all i.

(b) Work your answer out for small values of n and sketch the result.

7.3 Suppose that $Z_1$, $Z_2$ and $Z_3$ are independent observations from the standard normal
distribution. Let $X_1 = Z_1 + Z_2$, $X_2 = Z_1 - 2Z_2 + 3Z_3$ and $X_3 = Z_1 - Z_2$. Find the
covariance and correlation between
(a) X1 and X2 ,
(b) X1 and X3 ,
(c) X2 and X3 .
7.4 A rock specimen from a particular area is randomly selected and weighed two different
times. Let W denote the actual weight and X1 and X2 the two measured weights. Thus
X1 = W + E1 and X2 = W + E2 , where E1 and E2 are the two measurement errors.
Assume that the $E_i$ are independent of each other and of W and that $\operatorname{var} E_1 = \operatorname{var} E_2 = \sigma_E^2$.
(a) Express $\rho$, the correlation between the two measured weights $X_1$ and $X_2$, in terms of
$\sigma_W^2$, the variance of actual weight, and $\sigma_E^2$.
(b) Calculate $\rho$ when $\sigma_W = 1$ kg and $\sigma_E = 50$ g.

G5098 Probability and Statistics 2010–11: Exercises 8

Linear regression and least squares


Issued: Week 8, Friday 4 March.
Workshop: Week 9, 7–11 March.
Hand in: Friday 11 March, by 4 pm.
Solutions posted online: Friday 18 March, evening.
8.1 The following data are from the paper by O. Klemm & I. C. Ziomas, Urban emissions
measured with aircraft, J. Air & Waste Mgmt. Assoc. 48(1998), 16–25, and record the
enrichment of plumes of pollutants along flight paths above Athens. The response variable
is NO, the explanatory variable is CO, and the units are parts per billion times seconds
times 100.
CO: 50 60 95 108 135 210 214 315 720
NO: 23 45 40 37 82 54 72 138 321
(a) Find the correlation coefficient of the data.
(b) Calculate the least-squares regression line for these data, and find an estimate of the
variance of the error term.
(c) Plot the points and the regression line on a graph.
(d) Comment on the data and the graph, and carry out any further analysis that your
comments indicate would be appropriate.
8.2 The article by B. Kroll and M. R. Ramey, Effects of bike lanes on driver and bicyclist
behavior (ASCE Transportation Eng. J. 103(1977), 243–256) reports the results of a
regression analysis with x the available travel space in feet (a convenient measure of
roadway width, defined as the distance between a cyclist and the centre line) and y the
separation distance between a bike and a passing car (determined by photography). The
data, for 10 streets with bike lanes, were as follows:
X: 12.8 12.9 12.9 13.6 14.5 14.6 15.1 17.5 19.5 20.8
Y:  5.5  6.2  6.3  7.0  7.8  8.3  7.1 10.0 10.8 11.0
(a) Verify that $\sum x_i = 154.2$, $\sum y_i = 80.0$, $\sum x_i^2 = 2452.18$ and $\sum x_i y_i = 1282.74$.
(b) Derive the equation of the estimated regression line.
(c) Plot the points and the regression line on a graph.
(d) What separation distance would you predict for another street that has 15.0 as its
available travel space value?
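
The least-squares line in part (b) comes from the corrected sums $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n$ and $S_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$. The sketch below applies those formulas; it assumes the data are measured in feet with one decimal place (12.8, 12.9, and so on), and `least_squares` is a hypothetical helper name.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


# Assumed data: available travel space and separation distance, in feet.
travel_space = [12.8, 12.9, 12.9, 13.6, 14.5, 14.6, 15.1, 17.5, 19.5, 20.8]
separation = [5.5, 6.2, 6.3, 7.0, 7.8, 8.3, 7.1, 10.0, 10.8, 11.0]

if __name__ == "__main__":
    slope, intercept = least_squares(travel_space, separation)
    print("slope:", slope, "intercept:", intercept)
    print("prediction at x = 15.0:", intercept + slope * 15.0)
```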
8.3 The following data represent the height (X) in centimetres and weight (Y ) in grammes
of a type of plant. A sample of ten plants was taken.
X: 47 62 64 69 76 78 81 87 92 104
Y : 22 46 50 68 92 92 109 136 159 221
(a) Verify that the correlation coefficient of the data is 0.975.
(b) Plot the points. Despite how high the correlation coefficient is, the data clearly lie
on a curve.


(c) Weight might be proportional to $\text{height}^3$ (e.g. volume) or $\text{height}^2$ (e.g. area, for hollow
plants). Therefore try fitting a power relationship. Calculate the least-squares regression
line for $V := \ln Y$ on $U := \ln X$.
(d) Plot the residuals against the values $u_i$.
(e) Express the regression line as an equation giving Y in terms of X, and plot the curve
on your original graph.
8.4 A genetic experiment was undertaken to study the competition between two types of
female Drosophila melanogaster (fruit fly) in cages with one male genotype acting as a
substrate. The independent variable X is the time, in days, spent in cages, and the
dependent variable Y is the ratio of the numbers of Type 1 to Type 2 females. The
following data were recorded:
X: 17     31     45    59    73
Y: 0.2338 0.5804 1.982 3.388 13.01
(a) Plot the points. The relationship is clearly non-linear.
(b) Transform the data appropriately and find the least-squares regression line for the
transformed data.
(c) Plot the residuals and say whether your transformation has made it satisfactory to fit
a straight line.
(d) Express the regression line as an equation giving Y in terms of X, and plot the curve
on your original graph.

G5098 Probability and Statistics 2010–11: Exercises 9

Descriptive statistics and Minitab


Issued: Week 9, Friday 11 March.
Workshop: Week 10, 14–18 March.
Hand in: Friday 18 March, by 4 pm.
Solutions posted online: Friday 29 April, evening.
9.1 In the casino game roulette, if a player bets one unit on red, the probability of winning is
18/38 and of losing is 20/38 (in American casinos; European ones are more generous!).
Suppose that a player begins with five units and let Y be a player's maximum capital,
before eventually losing their money. The following data are 100 simulations of this value
of Y .
25, 9, 5, 5, 5, 9, 6, 5, 15, 45, 55, 6, 5, 6, 24, 21, 16, 5, 8, 7, 7, 5, 5, 35, 13,
9, 5, 18, 6, 10, 19, 16, 21, 8, 13, 5, 9, 10, 10, 6, 23, 8, 5, 10, 15, 7, 5, 5, 24,
9, 11, 34, 12, 11, 17, 11, 16, 5, 15, 5, 12, 6, 5, 5, 7, 6, 17, 20, 7, 8, 8, 6, 10,
11, 6, 7, 5, 12, 11, 18, 6, 21, 6, 5, 24, 7, 16, 21, 23, 15, 11, 8, 6, 8, 14, 11,
6, 9, 6, 10.

(a) Construct an ordered stem-and-leaf plot.


(b) Find the five-number summary and draw a boxplot.
(c) Draw a histogram of the data.
9.2 The following are the ages at death for the 38 American presidents from Washington to
Ford.
Washington 67, J. Adams 90, Jefferson 83, Madison 85, Monroe 73, J. Q.
Adams 80, Jackson 78, Van Buren 79, W. H. Harrison 68, Tyler 71, Polk 53,
Taylor 65, Fillmore 74, Pierce 64, Buchanan 77, Lincoln 56, A. Johnson 66,
Grant 63, Hayes 70, Garfield 49, Arthur 56, Cleveland 71, B. Harrison 67,
Cleveland 71, McKinley 58, T. Roosevelt 60, Taft 72, Wilson 67, Harding
57, Coolidge 60, Hoover 90, F. D. Roosevelt 63, Truman 88, Eisenhower
78, Kennedy 46, L. Johnson 64, Nixon 81, Ford 93.
(a) Construct a stem-and-leaf plot of the data and describe its shape.
(b) Find the five-number summary, and hence draw a boxplot.
9.3 An article in U.S. Consumer Reports, September 1990, reported the following scores for
various brands of two types of peanut butter:
Smooth: 56, 44, 62, 36, 39, 53, 50, 65, 45, 40, 56, 68, 41, 30, 40, 50, 56, 30,
22.
Crunchy: 62, 53, 75, 42, 47, 40, 34, 62, 52, 50, 34, 42, 36, 75, 80, 47, 56,
62.
Construct a comparative stem-and-leaf display by listing stems in the middle of your page
and then displaying the smooth leaves out to the right and the crunchy leaves out to the
left. Also construct boxplots. Use your displays to make a comparison between smooth
and crunchy peanut butter, considering shape, spread and location of the distributions
of the scores.


9.4 Repeat as much as you can of Exercise 8.4 using Minitab. Use two transformations and
look at the residuals in each case. Give reasons for the transformation that you eventually
choose. Show the Minitab commands that you use clearly.

G5098 Probability and Statistics 2010–11: Exercises 10

Estimation I
Issued: Week 10, Friday 18 March.
Workshop: 28 April–5 May.
Hand in: Friday 6 May, by 4 pm.
Solutions posted online: Friday 13 May, evening.
10.1 The Poisson distribution has been used by traffic engineers as a model for
light traffic, based on the rationale that if the rate is approximately constant
and the traffic is light (so the individual cars move independently of each
other), the distribution of counts of cars in a given time-interval or space area
should be nearly Poisson. The table on the right records numbers of right
turns in 300 three-minute periods at a specific intersection (D. Gerlough &
A. Schuhl, Use of Poisson Distribution in Highway Traffic, Eno Foundation
for Highway Traffic Control, 1955). The table shows the frequency fi of
periods in which there were i right turns.
(a) The usual formula for sample mean is $\bar{x} := \sum_{i=1}^{n} x_i / n$. However for
the present data-set, in frequency form, the average number of right
turns per 3-minute period is $\bar{x} = \sum_{i=0}^{12} i f_i \big/ \sum_{i=0}^{12} f_i$. Explain why,
and calculate $\bar{x}$.

[Table: frequency $f_i$ of periods with $i$ right turns, $i = 0, 1, \ldots, 12$.]

(b) Assume that X, the number of right turns in a 3-minute period, has
a Poisson distribution with parameter $\lambda$. Find an unbiased estimator
of $\lambda$ and calculate the estimate for the given data. What is the standard error (i.e.
standard deviation) of your estimator? Calculate the estimated standard error.
Hint: $EX = \lambda$ and $\operatorname{var} X = \lambda$ for X Poisson, so $E(\bar{X}) = {?}$ and $\operatorname{var}(\bar{X}) = {?}$

10.2 Let $X_1, \ldots, X_n$ be independent, each with distribution $P(X_i = 1) = p$, $P(X_i = 0) = 1 - p$,
where p is an unknown parameter satisfying $0 \le p \le 1$.
(a) State the distribution of $Y = \sum_{i=1}^{n} X_i$, and give its mean and variance (you may
quote from the Probability Distributions sheet).
(b) Deduce that $\bar{X} = Y/n$ is an unbiased estimator of p, with variance $p(1 - p)/n$.
(c) Show that $E\left[\bar{X}(1 - \bar{X})\right] = (n - 1)p(1 - p)/n$.
(d) Find the value of c so that $c\bar{X}(1 - \bar{X})$ is an unbiased estimator of $\operatorname{var} \bar{X} = p(1 - p)/n$.
10.3 Let $X_1, \ldots, X_n$ be a random sample of size n from the distribution with density

$$f(x; \theta) = \theta x^{\theta - 1} \qquad (0 < x < 1,\ 0 < \theta < \infty).$$

(a) Sketch the graph of this density for $\theta = 0.5, 1, 2$.

(b) Show that the maximum-likelihood estimator of $\theta$ is given by

$$\hat{\theta} = -\frac{n}{\ln \prod_{i=1}^{n} X_i}.$$

(c) For both the following sets of observations from this distribution, calculate the values
of the maximum-likelihood estimate and the method-of-moments estimate for $\theta$.
(i) 0.0256, 0.3051, 0.0278, 0.8971, 0.0739, 0.3191, 0.7379, 0.3671, 0.9763, 0.0102.
(ii) 0.4698, 0.3675, 0.5991, 0.9513, 0.6049, 0.9917, 0.1551, 0.0710, 0.2110, 0.2154.


10.4 Let X1 , . . . , Xn be a random sample of size n from the geometric distribution with success
parameter p, i.e.
$$f(x) = p(1 - p)^x \qquad (x = 0, 1, 2, \ldots).$$
(a) Use the method of moments to find a point estimate of p.
(b) Explain in words why this estimate makes sense.
(c) Find a point estimate of p, given the following data:
2, 33, 6, 3, 18, 1, 0, 18, 42, 1, 21, 3, 18, 10, 6, 0, 1, 20, 14, 15.
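
For part (c), a short script avoids arithmetic slips when averaging the twenty observations. The sketch below assumes the method-of-moments estimate is obtained by equating the sample mean to the Geom(p) mean $(1 - p)/p$ for the parameterisation on $\{0, 1, 2, \ldots\}$ used above; `geometric_mom_estimate` is a hypothetical name.

```python
def geometric_mom_estimate(sample):
    """Solve sample_mean = (1 - p) / p for p, giving p = 1 / (1 + sample_mean)."""
    sample_mean = sum(sample) / len(sample)
    return 1 / (1 + sample_mean)


observations = [2, 33, 6, 3, 18, 1, 0, 18, 42, 1,
                21, 3, 18, 10, 6, 0, 1, 20, 14, 15]

if __name__ == "__main__":
    print("sample mean:", sum(observations) / len(observations))
    print("estimate of p:", geometric_mom_estimate(observations))
```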

G5098 Probability and Statistics 2010–11: Exercises 11


Estimation II
Issued: Friday 6 May.
Workshop: 9–12 May.
Hand in: Friday 13 May, by 4 pm.
Solutions posted online: Friday 20 May, evening.
11.1 Assume that the yield per acre of a particular variety of soya beans is $N(\mu, \sigma^2)$. For a
random sample of n = 5 plots, the yields in bushels per acre were 37.4, 48.8, 46.9, 55.0
and 44.0.
(a) Give a point estimate for $\mu$.
(b) Find a 90% confidence interval for $\mu$.
11.2 The following observations were made on fracture toughness of a base plate of 18% nickel
maraging steel [J. A. Kies, H. L. Smith, H. E. Romine, H. Bernstein, Fracture testing of
weldments, ASTM Special Tech. Publ. 381(1965), 328–356]. The observations are in ksi $\sqrt{\text{in}}$
and are given in increasing order.

69.5, 71.9, 72.6, 73.1, 73.3, 73.5, 75.5, 75.7, 75.8, 76.1, 76.2,
76.2, 77.0, 77.9, 78.1, 79.6, 79.7, 79.9, 80.1, 82.2, 83.7, 93.7.
Calculate a 99% confidence interval for the standard deviation of the fracture toughness
distribution. Is this interval valid whatever the nature of the distribution? Explain.
11.3 Let Y be the sum of n independent observations from a Pois($\lambda$) distribution. Further let
the prior distribution for $\lambda$ be $\Gamma(\alpha, \beta)$.
(a) Find the posterior distribution of $\lambda$, given Y = y.
(b) Find a point estimate of $\lambda$ given this value y.
11.4 Suppose that $Y_i$ is the result of a Bernoulli trial, with probability $\theta$ of success ($Y_i = 1$).
If we assign a Unif(0, 1) prior distribution to $\theta$, find the posterior distribution of $\theta$ after
the observation(s)
(a) 1;
(b) 0, 1, 1, 0, 0.

G5098 Probability and Statistics 2010–11: Exercises 12

Hypothesis testing I
Issued: Friday 13 May.
Workshop: 16–19 May.
Hand in: Friday 20 May, by 4 pm.
Solutions posted online: Friday 27 May, evening.
12.1 Assume that IQ scores for a certain population are approximately $N(\mu, 100)$. To test
$H_0: \mu = 110$ against the one-sided alternative $H_1: \mu > 110$ we take a random sample of
size 16 from this population, and find that the mean of this sample is $\bar{x} = 113.5$.
(a) Do we accept or reject H0 at the 5% level?
(b) Do we accept or reject H0 at the 10% level?
(c) What is the p-value?
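
The test statistic and p-value here follow the standard one-sided z-test with known variance. The sketch below assumes a population variance of 100 (standard deviation 10), a sample size of 16 and a sample mean of 113.5; the function name `one_sided_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(sample_mean: float, mu0: float, sigma: float, n: int):
    """Return (z, p_value) for H0: mu = mu0 against H1: mu > mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


if __name__ == "__main__":
    z, p_value = one_sided_z_test(113.5, 110, 10, 16)
    print("z:", z, "p-value:", p_value)
    for level in (0.05, 0.10):
        decision = "reject" if p_value < level else "do not reject"
        print(f"at level {level}: {decision} H0")
```

Comparing the p-value with each significance level answers parts (a) and (b) at once.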
12.2 The calibration of a scale is to be checked by weighing a 5 kg test specimen 10 times.
Suppose that the results of different weighings are independent of one another and that
the weight on each trial is Normally distributed with $\sigma = 0.200$ kg. Let $\mu$ denote the true
average weight reading on the scale.
(a) What hypotheses should be tested?
(b) Suppose the scale is to be re-calibrated if either $\bar{x} \ge 5.1629$ or $\bar{x} \le 4.8371$. Express
this test procedure in terms of the standardised test-statistic $Z = (\bar{X} - 5)\big/\sqrt{\sigma^2/n}$.

(c) What is the probability that re-calibration is carried out when it is actually unnecessary?
(d) Which type of error would that be?
(e) Using the test of (b), what would you conclude from the sample data below?
4.981, 5.006, 4.857, 5.107, 4.888, 4.793, 4.728, 5.439, 5.214, 5.190

12.3 Assume that the birth weight in grammes of a baby born in the US is $N(3315, 525^2)$ for
boys and girls combined. Let X be the weight of a baby girl who is born at home in
Ottawa County and assume that $X \sim N(\mu, \sigma^2)$.

(a) Using 11 observations of X, give the test statistic and critical region for testing
$H_0: \mu = 3315$ against the alternative $H_1: \mu > 3315$ (home-born girls in Ottawa County
are heavier) with significance level $\alpha = 0.01$.
(b) Calculate the value of the test statistic and give your conclusion using the following
weights:
3119, 2657, 3459, 3629, 3345, 3629, 3515, 3856, 3629, 3345, 3062.

(c) What is the approximate p-value?

(d) Give the test statistic and critical region for testing $H_0: \sigma^2 = 525^2$ against the
Alternative Hypothesis $H_1: \sigma^2 < 525^2$ at significance level $\alpha = 0.05$.
(e) Calculate the test statistic and state your conclusions.
(f) Find the approximate p-value for this second test.


12.4 Copper values ($\mu$g Cu/100 ml blood) were determined for cattle grazing in an area known
to have well-defined molybdenum anomalies (metal values in excess of normal regional
variation) and for cattle grazing in a non-anomalous area [L. Thornton, G. F. Kershaw,
M. K. Davies, An investigation into copper deficiency in cattle in the southern Pennines,
I, J. Agricultural Sci. 78(1972), 157–163], resulting in $s_X = 21.5$ (m = 48) for the
anomalous area and $s_Y = 19.45$ (n = 45) for the non-anomalous area. Test at significance
level 0.10 for equality of population variances.

G5098 Probability and Statistics 2010–11: Exercises 13


Hypothesis testing II
Issued: Friday 20 May.
Workshop: 23–26 May.
Hand in: Friday 27 May, by 4 pm.
Solutions posted online: Friday 3 June, evening.
13.1 It was claimed that 75% of dentists recommend a certain design of toothbrush. A consumer
group doubted this claim and decided to test $H_0: p = 0.75$ against $H_1: p < 0.75$,
where p is the proportion of dentists who recommend this design. A survey of 390 dentists
found that 273 recommended the design.
(a) Would you reject the Null Hypothesis at the 5% level?
(b) Would you reject the Null Hypothesis at the 1% level?
(c) Find the p-value.
13.2 In the Michigan Daily Lottery, each week-day a three-digit integer is generated, one digit
at a time. For i = 0, 1, . . . , 9 let pi denote the probability that the digit generated is i.
Use the following 50 digits to test $H_0: p_0 = p_1 = \cdots = p_9 = 0.1$, using $\alpha = 0.05$.
1, 6, 9, 9, 3, 8, 5, 0, 6, 7, 4, 7, 5, 9, 4, 6, 5, 6, 4, 4, 4, 8, 0, 9, 3,
2, 1, 5, 4, 5, 7, 3, 2, 1, 4, 6, 7, 1, 3, 4, 4, 8, 8, 6, 1, 6, 1, 2, 8, 8.
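
Tallying fifty digits by hand is error-prone, so the observed counts and the goodness-of-fit statistic $\sum (O_i - E_i)^2 / E_i$ can be computed as a check. The sketch below assumes equal expected counts under the null hypothesis; `chi_square_uniform` is a hypothetical helper, and the critical value 16.919 is the upper 5% point of the chi-squared distribution with 9 degrees of freedom, taken from standard tables.

```python
from collections import Counter

digits = [1, 6, 9, 9, 3, 8, 5, 0, 6, 7, 4, 7, 5, 9, 4, 6, 5, 6, 4, 4, 4, 8, 0, 9, 3,
          2, 1, 5, 4, 5, 7, 3, 2, 1, 4, 6, 7, 1, 3, 4, 4, 8, 8, 6, 1, 6, 1, 2, 8, 8]


def chi_square_uniform(observations, categories):
    """Pearson statistic for H0: every category is equally likely."""
    counts = Counter(observations)
    expected = len(observations) / len(categories)
    return sum((counts[c] - expected) ** 2 / expected for c in categories)


CRITICAL_VALUE_9_DF_5_PERCENT = 16.919  # from standard chi-squared tables

if __name__ == "__main__":
    statistic = chi_square_uniform(digits, range(10))
    print("observed counts:", sorted(Counter(digits).items()))
    print("statistic:", statistic)
    print("reject H0:", statistic > CRITICAL_VALUE_9_DF_5_PERCENT)
```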
13.3 The article by S. M. Specht, R. J. Tushup and C. N. Deatrick, Psychiatric and alcoholic
admissions do not occur disproportionately close to patients' birthdays [Psychological
Reports 71(1992), 944–946], focusses on the existence of any relationship between date of
patient admission for treatment of alcoholism and patient's birthday. Assuming a 365-day
year (i.e. excluding leap year), in the absence of any relation, a patient's admission date
is equally likely to be any one of the 365 possible days. The investigators established four
different admission categories: (1) within 7 days of birthday, (2) between 8 and 30 days,
inclusive, from the birthday, (3) between 31 and 90 days, inclusive, from the birthday,
and (4) more than 90 days from the birthday. A sample of 200 patients gave observed
frequencies of 11, 24, 69 and 96 for categories 1, 2, 3 and 4 respectively. State and test
the relevant hypotheses using a significance level of 0.01.
13.4 The article by J. Levy & J. M. Levy, Human lateralization from head to foot: sex-related
factors (Science 200(1978), 1291–1292) reports for a sample of right-handed men and women the
numbers of individuals whose feet were the same size, the numbers with a bigger left foot
than right (a difference of half a shoe size or more), and the numbers with a bigger right
foot than left.

         L>R   L=R   L<R
Men        2    10    28
Women     55    18    14

(a) Do the data indicate that gender has a strong effect on the development of foot
asymmetry? State the appropriate Null and Alternative Hypotheses and test at level
$\alpha = 0.1$.

(b) If there is evidence of an effect, state where it primarily lies.

G5098 Probability and Statistics 2010–11: Exercises 14

Hypothesis testing III


Issued: Friday 27 May.
Workshop: 30 May–2 June.
Hand in: Friday 3 June, by 4 pm.
Solutions posted online: Friday 10 June, evening.
14.1 The length of life of brand X light bulbs is assumed to be $N(\mu_X, 784)$. The length of life
of brand Y light bulbs is assumed to be $N(\mu_Y, 627)$. If a random sample of m = 56 brand
X bulbs yielded a mean life of $\bar{x} = 937.4$ hours and an independent random sample of
size n = 57 brand Y bulbs yielded a mean life of $\bar{y} = 988.9$ hours, find a 90% confidence
interval for $\mu_X - \mu_Y$.
14.2 The article by K. Vermeer, F. A. J. Armstrong and D. R. M. Hatch, Mercury in aquatic
birds at Clay Lake, Western Ontario (J. Wildlife Mgmt. 37(1973), 58–61) reported the
following data on mercury residues in breast muscles:
Mallards: m = 16, $\bar{x} = 6.13$, $s_x = 2.40$,
Blue-winged teals: n = 17, $\bar{y} = 6.46$, $s_y = 1.73$.
(a) Assuming that $X \sim N(\mu_X, \sigma^2)$ and $Y \sim N(\mu_Y, \sigma^2)$, find a 95% confidence interval for
$\mu_X - \mu_Y$.

(b) Deduce, without further calculation or use of tables, the result of the test at significance level 5% for the Null Hypothesis $\mu_X = \mu_Y$.
(c) Repeat (a) without the assumption of common variance of X and Y, i.e. define T
using a non-pooled estimate of variance:

$$T := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{S_X^2/m + S_Y^2/n}},$$

and use Welch's formula for the degrees of freedom r, as given in lectures.

(d) Test at the 10% level whether the variances of residues in the two populations are
equal.
14.3 The driver of a diesel-powered car decided to test the quality of three types of diesel
fuel, based upon miles per gallon. Test the Null Hypothesis that the three means are
equal using the data below, using the significance level $\alpha = 0.05$ and making the usual
assumptions.
Brand A: 38.7, 39.2, 40.1, 38.9
Brand B: 41.9, 42.3, 41.3
Brand C: 40.8, 41.2, 39.5, 38.9, 40.3
14.4 Different sizes of nails are packaged in one-pound boxes. Let Xi for i = 1, 2, 3, 4, 5 be
the weight of a box with nail size 4C, 8C, 12C, 16C, 20C respectively, these being the
sizes from smallest to largest. It is desired to test whether the mean weights of nails in
the 4C, 8C, 12C, 16C and 20C boxes are equal. Assume that the distribution of $X_i$ is
$N(\mu_i, \sigma^2)$.
(a) Using random samples of size 7, give a critical region for a test with $\alpha = 0.05$.


(b) Construct an ANOVA table, and state your conclusions using the following data.
$X_1$: 1.03, 1.04, 1.07, 1.03, 1.08, 1.06, 1.07
$X_2$: 1.03, 1.10, 1.08, 1.05, 1.06, 1.06, 1.05
$X_3$: 1.03, 1.08, 1.06, 1.02, 1.04, 1.04, 1.07
$X_4$: 1.10, 1.10, 1.09, 1.09, 1.06, 1.05, 1.08
$X_5$: 1.04, 1.06, 1.07, 1.06, 1.05, 1.07, 1.05
(c) Construct boxplots on the same diagram for each type of nail, and comment.