Collins - Cambridge - Further Probability & Statistics - Worked Solutions

Cambridge International
AS & A Level Further Mathematics
Further Probability & Statistics

STUDENT’S BOOK: Worked solutions
Yimeng Gu, Dr Patrick Wallace

Series Editor: Dr Adam Boddison

Worked solutions
1 Continuous random variables
Please note: Full worked solutions are provided as an aid to learning, and represent one approach to answering the
question. In some cases, alternative methods are shown for contrast.
All sample answers have been written by the authors. Cambridge Assessment International Education bears no
responsibility for the example answers to questions taken from its past question papers, which are contained in this
publication.
Non-exact numerical answers should be given correct to 3 significant figures, or 1 decimal place for angles in
degrees, unless a different level of accuracy is specified in the question.
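Prerequisite knowledge

1 a $\int_1^6 k(x - 2)\,dx = 1$
$k\left[\frac{x^2}{2} - 2x\right]_1^6 = 1$
$k\left(\left(\frac{36}{2} - 2(6)\right) - \left(\frac{1}{2} - 2\right)\right) = 1$
$k = \frac{2}{15}$
b $P(X = 2.5) = 0$
c $\int_3^5 \frac{2}{15}(x - 2)\,dx = \frac{2}{15}\left[\frac{x^2}{2} - 2x\right]_3^5 = \frac{2}{15}(4) = \frac{8}{15}$
d $\int_1^6 x \cdot \frac{2}{15}(x - 2)\,dx = \frac{2}{15}\left[\frac{x^3}{3} - x^2\right]_1^6 = \frac{44}{9}$

2 $\int_1^4 \frac{1}{7}x\,dx = \left[\frac{x^2}{14}\right]_1^4 = \frac{16}{14} - \frac{1}{14} = 1\frac{1}{14}$

3 $u = x^2 - 4$, $\frac{du}{dx} = 2x$, $x^2 = u + 4$
Change limits: when $x = 2$, $u = 2^2 - 4 = 0$; when $x = 3$, $u = 3^2 - 4 = 5$.
$\int_2^3 \frac{x^3}{\sqrt{x^2 - 4}}\,dx = \frac{1}{2}\int_0^5 u^{-\frac{1}{2}}(u + 4)\,du$
$= \frac{1}{2}\left[\frac{2}{3}u^{\frac{3}{2}} + 8u^{\frac{1}{2}}\right]_0^5$
$= \frac{1}{2}\left(\frac{2}{3} \times 5^{\frac{3}{2}} + 8 \times 5^{\frac{1}{2}}\right)$
$= \frac{5}{3}\sqrt{5} + 4\sqrt{5}$
$= \frac{17}{3}\sqrt{5}$

4 $y = 2x^2 + 5$
$x^2 = \frac{y - 5}{2}$
$x = \sqrt{\frac{y - 5}{2}}$
Therefore $f^{-1}(x) = \sqrt{\frac{x - 5}{2}}$
Domain: $x \ge 5$
Range: $f^{-1}(x) \ge 0$

Exercise 1.1A

1 a $P(X < 2) = \int_{-1}^0 \frac{1}{3}\,dx + \int_0^2 \frac{1}{6}\,dx = \left[\frac{1}{3}x\right]_{-1}^0 + \left[\frac{1}{6}x\right]_0^2 = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}$
b $P(-0.5 < X < 3) = \int_{-0.5}^0 \frac{1}{3}\,dx + \int_0^3 \frac{1}{6}\,dx = \frac{1}{6} + \frac{1}{2} = \frac{2}{3}$
c $P(X > -0.8) = 1 - P(X \le -0.8) = 1 - \int_{-1}^{-0.8} \frac{1}{3}\,dx = 1 - \frac{1}{15} = \frac{14}{15}$
d $P(X > 1) = \int_1^4 \frac{1}{6}\,dx = \frac{1}{2}$

2 a $\int_{-1}^1 \left(\frac{2}{5}m^2 + a\right)dm + \int_1^2 a\,dm = 1$
$\left[\frac{2}{15}m^3 + am\right]_{-1}^1 + [am]_1^2 = 1$
$a = \frac{11}{45}$
b [Graph of f(m) against m for $-1 \le m \le 2$: $f(-1) = f(1) = \frac{29}{45}$, $f(0) = \frac{11}{45}$, and $f(m) = \frac{11}{45}$ for $1 \le m \le 2$.]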
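The value of a in Exercise 1.1A question 2 can be double-checked with exact fractions. The sketch below is only an illustration: the helper name `integrate_poly` is an assumption made here, and it simply integrates a polynomial term by term.

```python
from fractions import Fraction


def integrate_poly(coeffs, lo, hi):
    """Exact integral over [lo, hi] of sum(coeffs[i] * x**i)."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(Fraction(c) / (i + 1) * (hi ** (i + 1) - lo ** (i + 1))
               for i, c in enumerate(coeffs))


# f(m) = (2/5)m^2 + a on [-1, 1] and f(m) = a on [1, 2].
quadratic_part = integrate_poly([0, 0, Fraction(2, 5)], -1, 1)   # 4/15
support_width = 2 - (-1)                                         # a is added across all of [-1, 2]
a = (1 - quadratic_part) / support_width

peak = Fraction(2, 5) + a                                        # f(1) = f(-1)
total = (integrate_poly([a, 0, Fraction(2, 5)], -1, 1)
         + integrate_poly([a], 1, 2))

print(a, peak, total)   # 11/45 29/45 1
```

The same helper can be reused for any of the polynomial densities in this exercise by changing the coefficient lists and limits.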
©HarperCollinsPublishers 2018 Cambridge International AS & A Level Mathematics: Further Probability & Statistics 9780008271886
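c $P\left(0 < M \le \frac{3}{2}\right) = \int_0^1 \left(\frac{2}{5}m^2 + \frac{11}{45}\right)dm + \int_1^{\frac{3}{2}} \frac{11}{45}\,dm = \frac{1}{2}$

3 a $\int_{-\frac{\pi}{3}}^0 k\cos x\,dx + \int_0^1 (1 - x)\,dx = 1$
$[k\sin x]_{-\frac{\pi}{3}}^0 + \left[x - \frac{1}{2}x^2\right]_0^1 = 1$
$\left(0 - \left(-\frac{\sqrt{3}}{2}k\right)\right) + \left(\left(1 - \frac{1}{2}\right) - 0\right) = 1$
$\frac{\sqrt{3}}{2}k + \frac{1}{2} = 1$
$\frac{\sqrt{3}}{2}k = \frac{1}{2}$
$k = \frac{\sqrt{3}}{3}$
b [Graph of f(x) against x for $-\frac{\pi}{3} \le x \le 1$.]
c i $P\left(X = \frac{\pi}{4}\right) = 0$
ii $P\left(X < \frac{\pi}{3}\right) = \int_{-\frac{\pi}{3}}^0 \frac{\sqrt{3}}{3}\cos x\,dx + \int_0^1 (1 - x)\,dx = 1$
iii $P\left(-\frac{\pi}{6} \le X \le \frac{\pi}{6}\right) = \int_{-\frac{\pi}{6}}^0 \frac{\sqrt{3}}{3}\cos x\,dx + \int_0^{\frac{\pi}{6}} (1 - x)\,dx = 0.675$ (3 s.f.)

4 a [Graph of f(y) against y for $-1 \le y \le 2$, with $\frac{1}{2}m$, $\frac{3}{4}m$ and $m$ marked on the vertical axis.]
b $\int_{-1}^0 \frac{1}{4}m(2 - y)\,dy + \int_0^1 \frac{1}{2}m\,dy + \int_1^2 \frac{1}{2}my\,dy = 1$
$\left[\frac{1}{2}my - \frac{1}{8}my^2\right]_{-1}^0 + \left[\frac{1}{2}my\right]_0^1 + \left[\frac{1}{4}my^2\right]_1^2 = 1$
$m = \frac{8}{15}$
c $P(-0.2 \le Y \le 1.2) = \int_{-0.2}^0 \frac{1}{4}m(2 - y)\,dy + \int_0^1 \frac{1}{2}m\,dy + \int_1^{1.2} \frac{1}{2}my\,dy$
$= \int_{-0.2}^0 \frac{1}{4} \times \frac{8}{15}(2 - y)\,dy + \int_0^1 \frac{1}{2} \times \frac{8}{15}\,dy + \int_1^{1.2} \frac{1}{2} \times \frac{8}{15}y\,dy$
$= \frac{7}{125} + \frac{4}{15} + \frac{22}{375} = \frac{143}{375}$

5 a $\int_0^{25} \frac{1}{50}a\,dt + \int_{25}^{\infty} at^{-\frac{3}{2}}\,dt = 1$
$\left[\frac{1}{50}at\right]_0^{25} + \left[-2at^{-\frac{1}{2}}\right]_{25}^{\infty} = 1$
$a = \frac{10}{9}$
b [Graph of f(t) against t, with $\frac{1}{45}$ and $0.00\dot{8}$ marked on the vertical axis and 25 on the horizontal axis.]
c $\int_{20}^{25} \frac{1}{50} \times \frac{10}{9}\,dt + \int_{25}^{30} \frac{10}{9}t^{-\frac{3}{2}}\,dt = 0.150$ (3 s.f.)

6 a $\int_0^1 \frac{1}{3}t\,dt + \int_1^{k-1} \frac{1}{3}\,dt + \int_{k-1}^k \frac{1}{3}(k - t)\,dt = 1$
$\left[\frac{1}{6}t^2\right]_0^1 + \left[\frac{1}{3}t\right]_1^{k-1} + \left[\frac{1}{3}kt - \frac{1}{6}t^2\right]_{k-1}^k = 1$
$\left(\frac{1}{6} - 0\right) + \left(\frac{1}{3}(k - 1) - \frac{1}{3}\right) + \left(\left(\frac{1}{3}k^2 - \frac{1}{6}k^2\right) - \left(\frac{1}{3}k(k - 1) - \frac{1}{6}(k - 1)^2\right)\right) = 1$
$\frac{1}{6} + \frac{1}{3}k - \frac{2}{3} + \frac{1}{3}k^2 - \frac{1}{6}k^2 - \frac{1}{3}k^2 + \frac{1}{3}k + \frac{1}{6}k^2 - \frac{1}{3}k + \frac{1}{6} = 1$
$-\frac{1}{2} + \frac{1}{3}k + \frac{1}{6} = 1$
$\frac{1}{3}k = 1 - \frac{1}{6} + \frac{1}{2}$
$k = 4$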
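As a numerical cross-check on the constants found in questions 4 and 6 of Exercise 1.1A, the area under each piecewise density can be estimated with Simpson's rule, one piece at a time. This is a sketch under stated assumptions: the helper names `simpson` and `area` are illustrative, and the densities are the ones read off from the integrals above.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] using an even number n of strips."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def area(pieces):
    """Sum the Simpson estimates over (function, lower, upper) pieces."""
    return sum(simpson(f, lo, hi) for f, lo, hi in pieces)


# Question 4: density in terms of m on [-1, 0], [0, 1] and [1, 2], with m = 8/15.
m = 8 / 15
q4 = [(lambda y: m * (2 - y) / 4, -1, 0),
      (lambda y: m / 2, 0, 1),
      (lambda y: m * y / 2, 1, 2)]

# Question 6: density on [0, 1], [1, k - 1] and [k - 1, k], with k = 4.
k = 4
q6 = [(lambda t: t / 3, 0, 1),
      (lambda t: 1 / 3, 1, k - 1),
      (lambda t: (k - t) / 3, k - 1, k)]

print(round(area(q4), 10), round(area(q6), 10))
```

Both totals should come out as 1 to within floating-point error, which is what a valid probability density requires.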
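b $P(T > 2) = 1 - P(T \le 2) = 1 - \left(\int_0^1 \frac{1}{3}t\,dt + \int_1^2 \frac{1}{3}\,dt\right) = 1 - \left(\frac{1}{6} + \frac{1}{3}\right) = \frac{1}{2}$
c $\int_{0.5}^1 \frac{1}{3}t\,dt + \int_1^{2.5} \frac{1}{3}\,dt = \frac{5}{8}$

7 a [Graph of f(t) against t, with $\frac{1}{8}$ and $\frac{1}{6}$ marked on the vertical axis and 4, 7 and 11 on the horizontal axis.]
b i Using the graph from part a,
$P(2 \le T < 7) = \int_2^4 \frac{1}{128}t^2\,dt + (7 - 4) \times \frac{1}{6} = \frac{7}{48} + \frac{1}{2} = \frac{31}{48}$
ii $P(T > 7) = \frac{1}{2} \times \frac{1}{6} \times (11 - 7) = \frac{1}{3}$
c $P(T < 2) = \int_0^2 \frac{1}{128}t^2\,dt = \frac{1}{48}$
Alternative method, using the previous answers:
$P(T < 2) = 1 - P(2 \le T < 7) - P(T > 7) = 1 - \frac{31}{48} - \frac{1}{3} = \frac{1}{48}$
Therefore, $\frac{1}{48} \times 100\% = 2.08\%$ of appointments were delayed by less than 2 minutes.

8 a [Graph of f(t) against t, with $7k$ and $16k$ marked on the vertical axis and 2 and 5 on the horizontal axis.]
b Using the graph,
$2 \times 7k + \frac{1}{2}(5 - 2)(7k + 16k) = 1$
$14k + \frac{69}{2}k = 1$
$k = \frac{2}{97}$
c i $P(T \le 3) = 14 \times \frac{2}{97} + \int_2^3 \frac{2}{97}(3t + 1)\,dt = \frac{28}{97} + \frac{17}{97} = \frac{45}{97}$
Or
$P(T \le 3) = 1 - P(T > 3)$
$= 1 - \int_3^5 \frac{2}{97}(3t + 1)\,dt$
$= 1 - \left[\frac{3}{97}t^2 + \frac{2}{97}t\right]_3^5$
$= 1 - \left(\frac{75}{97} + \frac{10}{97} - \frac{27}{97} - \frac{6}{97}\right)$
$= 1 - \frac{52}{97}$
$= \frac{45}{97}$
Alternative method, using the graph to work out the area:
$P(T \le 3) = 14 \times \frac{2}{97} + \frac{1}{2} \times 1 \times \left(\frac{14}{97} + \frac{20}{97}\right) = \frac{28}{97} + \frac{17}{97} = \frac{45}{97}$
ii $P(2 \le T \le 5) = \int_2^5 \frac{2}{97}(3t + 1)\,dt = \frac{69}{97}$
Alternative method, using the graph:
$P(2 \le T \le 5) = \frac{1}{2} \times 3 \times \left(\frac{14}{97} + \frac{32}{97}\right) = \frac{3}{2} \times \frac{46}{97} = \frac{69}{97}$

9 a [Graph of f(x) against x, with $p - 2q$ and $p - 4q$ (equal to $q$) marked on the vertical axis and 2, 4 and 10 on the horizontal axis.]
b $P(X > 5) = 0.5$
From the graph, $(10 - 5) \times q = 0.5$, so $q = 0.1$,
or q can be found using integration:
$\int_5^{10} q\,dx = 0.5$
$[qx]_5^{10} = 0.5$
$10q - 5q = 0.5$
$q = 0.1$
Total area under PDF = 1
$\int_2^4 (p - 0.1x)\,dx + 6 \times 0.1 = 1$
$\int_2^4 (p - 0.1x)\,dx = 0.4$
$\left[px - 0.05x^2\right]_2^4 = 0.4$
$(4p - 0.8) - (2p - 0.2) = 0.4$
$2p = 0.4 + 0.8 - 0.2 = 1$
$p = 0.5$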
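The graph-based methods in questions 8 and 9 amount to adding up trapezium areas, so they can be reproduced exactly with fractions. The sketch below is illustrative only; `area_under` is a name introduced here, and the heights are the ones read from the solutions above.

```python
from fractions import Fraction


def area_under(points):
    """Exact area under straight segments joining consecutive (x, height) points."""
    total = Fraction(0)
    for (x0, h0), (x1, h1) in zip(points, points[1:]):
        total += Fraction(x1 - x0) * (h0 + h1) / 2
    return total


# Question 8: heights in units of k are 7 on [0, 2], rising linearly to 16 at t = 5.
k = 1 / area_under([(0, 7), (2, 7), (5, 16)])
p_t_le_3 = area_under([(0, 7 * k), (2, 7 * k), (3, 10 * k)])   # f(3) = (3*3 + 1)k

# Question 9: p = 1/2 and q = 1/10 give heights p - 2q at x = 2, p - 4q at x = 4, then q up to x = 10.
p, q = Fraction(1, 2), Fraction(1, 10)
total_q9 = area_under([(2, p - 2 * q), (4, p - 4 * q), (10, q)])
p_x_gt_5 = area_under([(5, q), (10, q)])

print(k, p_t_le_3, total_q9, p_x_gt_5)   # 2/97 45/97 1 1/2
```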
10 a $\int_0^{0.5} kx\,dx + \int_{0.5}^1 1\,dx + \int_1^{1.5} (3 - kx)\,dx = 1$
$\left[\frac{kx^2}{2}\right]_0^{0.5} + [x]_{0.5}^1 + \left[3x - \frac{kx^2}{2}\right]_1^{1.5} = 1$
$\left(\frac{k}{8} - 0\right) + (1 - 0.5) + \left(\left(\frac{9}{2} - \frac{9k}{8}\right) - \left(3 - \frac{k}{2}\right)\right) = 1$
$\frac{4k}{8} = 1$
$k = 2$
b $P(X < 1.2) = \int_0^{0.5} 2x\,dx + \int_{0.5}^1 1\,dx + \int_1^{1.2} (3 - 2x)\,dx$
$= \frac{1}{4} + 0.5 + \left((3.6 - 1.44) - (3 - 1)\right) = 0.91$
Alternatively:
$P(X < 1.2) = 1 - P(X > 1.2) = 1 - \int_{1.2}^{1.5} (3 - 2x)\,dx = 1 - \frac{9}{100} = 0.91$

Exercise 1.2A

1 a For $x \in (1, 6)$, $F(x) = \int_1^x \frac{1}{10}\,dx = \frac{x}{10} - \frac{1}{10}$
For $x \in (6, 16)$, $F(x) = F(6) + \int_6^x \left(\frac{4}{25} - \frac{1}{100}x\right)dx$
$= \frac{1}{2} + \left[\frac{4x}{25} - \frac{x^2}{200}\right]_6^x = \frac{4x}{25} - \frac{x^2}{200} - \frac{7}{25}$
Therefore
$F(x) = \begin{cases} 0 & x < 1, \\ \frac{x}{10} - \frac{1}{10} & 1 \le x < 6, \\ \frac{4x}{25} - \frac{x^2}{200} - \frac{7}{25} & 6 \le x < 16, \\ 1 & x \ge 16. \end{cases}$
b For $x \in (0, 2)$, $M(x) = \int_0^x \frac{1}{8}x\,dx = \frac{x^2}{16}$
For $x \in (2, 8)$, $M(x) = M(2) + \int_2^x \frac{1}{24}(8 - x)\,dx$
$= \frac{1}{4} + \left[\frac{1}{3}x - \frac{x^2}{48}\right]_2^x = -\frac{x^2}{48} + \frac{1}{3}x - \frac{1}{3}$
Therefore
$M(x) = \begin{cases} 0 & x < 0 \\ \frac{x^2}{16} & 0 \le x < 2 \\ -\frac{x^2}{48} + \frac{1}{3}x - \frac{1}{3} & 2 \le x \le 8 \\ 1 & x > 8. \end{cases}$

2 a For $x \in (0, 1)$, $F(x) = \int_0^x x^3\,dx = \frac{x^4}{4}$
For $x \in (1, 1.5)$, $F(x) = F(1) + \int_1^x 1\,dx = \frac{1}{4} + [x]_1^x = x - \frac{3}{4}$
For $x \in (1.5, 2.5)$,
$F(x) = F(1.5) + \int_{1.5}^x -(x - 2.5)^3\,dx = \frac{3}{4} + \left[-\frac{(x - 2.5)^4}{4}\right]_{1.5}^x = 1 - \frac{(x - 2.5)^4}{4}$
Therefore
$F(x) = \begin{cases} 0 & x < 0 \\ \frac{x^4}{4} & 0 \le x < 1 \\ x - \frac{3}{4} & 1 \le x \le 1.5 \\ 1 - \frac{(x - 2.5)^4}{4} & 1.5 \le x < 2.5 \\ 1 & x \ge 2.5. \end{cases}$
b $P(X > 1.2) = 1 - P(X \le 1.2) = 1 - F(1.2) = 1 - \left(1.2 - \frac{3}{4}\right) = 0.55$
c $P(0.5 < X < 2) = F(2) - F(0.5) = 0.969$

3 a $\int_{-1}^k \frac{1}{5}(x + 1)\,dx = 1$
$\left[\frac{1}{10}(x + 1)^2\right]_{-1}^k = 1$
$\frac{1}{10}(k + 1)^2 - 0 = 1$
$k = \sqrt{10} - 1$, reject $k = -\sqrt{10} - 1$
b $F(x) = \begin{cases} 0 & x < -1 \\ \frac{(x + 1)^2}{10} & -1 \le x \le \sqrt{10} - 1 \\ 1 & x > \sqrt{10} - 1. \end{cases}$

4 a $\int_0^2 a(3 - z)^2\,dz + \int_2^6 \frac{1}{80}(3z + 2)\,dz = 1$
$\left[9az - 3az^2 + \frac{a}{3}z^3\right]_0^2 + \frac{1}{80}\left[\frac{3}{2}z^2 + 2z\right]_2^6 = 1$
$\left(18a - 12a + \frac{8}{3}a\right) + \frac{7}{10} = 1$
$\frac{26}{3}a = \frac{3}{10}$
$a = \frac{9}{260}$
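A cumulative distribution function built piece by piece should join up continuously and reach 1 at the top of the range. The sketch below checks this for the F(x) found in question 1 a of Exercise 1.2A, using exact fractions; the probability P(4 < X < 10) at the end is an extra illustration chosen here, not a value from the exercise.

```python
from fractions import Fraction as Fr


def cdf(x):
    """F(x) from Exercise 1.2A question 1 a, evaluated exactly."""
    x = Fr(x)
    if x < 1:
        return Fr(0)
    if x < 6:
        return x / 10 - Fr(1, 10)
    if x <= 16:
        return 4 * x / 25 - x ** 2 / 200 - Fr(7, 25)
    return Fr(1)


left_piece_at_6 = Fr(6) / 10 - Fr(1, 10)   # value of the first piece at the join

print(left_piece_at_6, cdf(6))   # 1/2 1/2  (the two pieces agree at x = 6)
print(cdf(16))                   # 1
print(cdf(10) - cdf(4))          # 13/25, i.e. P(4 < X < 10)
```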
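b For $0 \le z < 2$,
$\int_{-\infty}^z a(3 - z)^2\,dz = F(0) + \int_0^z a(3 - z)^2\,dz = 0 + \left[-\frac{1}{3}a(3 - z)^3\right]_0^z = \frac{81}{260} - \frac{3(3 - z)^3}{260}$
For $2 \le z < 6$,
$\int_{-\infty}^z \frac{1}{80}(3z + 2)\,dz = F(2) + \int_2^z \frac{1}{80}(3z + 2)\,dz = \frac{3}{10} + \left[\frac{1}{480}(3z + 2)^2\right]_2^z = \frac{1}{6} + \frac{(3z + 2)^2}{480}$
Therefore,
$F(z) = \begin{cases} 0 & z < 0 \\ \frac{81}{260} - \frac{3(3 - z)^3}{260} & 0 \le z < 2 \\ \frac{1}{6} + \frac{(3z + 2)^2}{480} & 2 \le z < 6 \\ 1 & z \ge 6. \end{cases}$
c $P(1 < Z \le 3) = F(3) - F(1) = \frac{83}{416}$

5 a $P(X > 8) = 1 - P(X \le 8) = 1 - \frac{1}{50}(8 - 5)^2 = \frac{41}{50}$
b $f(x) = \begin{cases} \frac{1}{25}(x - 5) & 5 \le x \le 10 \\ \frac{1}{4} & 10 \le x \le 12 \\ 0 & \text{otherwise.} \end{cases}$
Therefore the graph of f(x) is:
[Graph of f(x) against x, with $\frac{1}{5}$ and $\frac{1}{4}$ marked on the vertical axis and 5, 10 and 12 on the horizontal axis.]

6 a $f(t) = \begin{cases} \cos t & 0 \le t \le \frac{\pi}{2}, \\ 0 & \text{otherwise.} \end{cases}$
b $P\left(\frac{\pi}{6} < t < \frac{\pi}{4}\right) = F\left(\frac{\pi}{4}\right) - F\left(\frac{\pi}{6}\right) = \frac{\sqrt{2}}{2} - \frac{1}{2} = 0.207$ (3 s.f.)

7 a $\int_0^2 \frac{1}{4}s\,ds + \int_2^8 a(s - 8)^2\,ds = 1$
$\left[\frac{1}{8}s^2\right]_0^2 + \left[\frac{1}{3}a(s - 8)^3\right]_2^8 = 1$
$72a = \frac{1}{2}$
$a = \frac{1}{144}$
b For $0 \le s < 2$, $F(0) + \int_0^s \frac{1}{4}s\,ds = 0 + \left[\frac{1}{8}s^2\right]_0^s = \frac{s^2}{8}$
For $2 \le s < 8$, $F(2) + \int_2^s \frac{1}{144}(s - 8)^2\,ds = \frac{1}{2} + \left[\frac{1}{432}(s - 8)^3\right]_2^s = \frac{(s - 8)^3}{432} + 1$
Therefore,
$F(s) = \begin{cases} 0 & s < 0 \\ \frac{s^2}{8} & 0 \le s < 2 \\ \frac{(s - 8)^3}{432} + 1 & 2 \le s < 8 \\ 1 & s \ge 8. \end{cases}$
c $P(1.5 < s \le 2.5) = F(2.5) - F(1.5) = \frac{1153}{3456}$ or 0.334 (3 s.f.)

8 a $F(a) = 1$, therefore $\frac{1}{3}a - \frac{1}{3} = 1$, so $a = 4$
b $P(X > 1.5) = 1 - F(1.5) = 1 - \frac{1}{12}(1.5)^2 = \frac{13}{16}$
c $F(2) = \frac{1}{3}$, therefore $y > 2$
$F(y) = \frac{2}{3}$
$\frac{1}{3}y - \frac{1}{3} = \frac{2}{3}$, so $y = 3$

9 a $\int_0^{\ln 2} \frac{1}{3}e^{-3t}\,dt + \int_{\ln 2}^k \frac{1}{24}\,dt = 1$
$\frac{7}{72} + \left[\frac{1}{24}t\right]_{\ln 2}^k = 1$
$\frac{k}{24} - \frac{\ln 2}{24} = \frac{65}{72}$
$k = \frac{65}{3} + \ln 2$
b For $0 \le t < \ln 2$, $F(0) + \int_0^t \frac{1}{3}e^{-3t}\,dt = 0 + \left[-\frac{1}{9}e^{-3t}\right]_0^t = \frac{1}{9}\left(1 - e^{-3t}\right)$
For $\ln 2 \le t < \ln 2 + \frac{65}{3}$,
$F(\ln 2) + \int_{\ln 2}^t \frac{1}{24}\,dt = \frac{7}{72} + \left[\frac{t}{24}\right]_{\ln 2}^t = \frac{7 - 3\ln 2}{72} + \frac{t}{24}$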
Therefore, x
∫0 0.02 (10 − x)dx = 0.2x − 0.01x
2
2

0 t <0
1 
−3t
 0 x <0
9 (1 − e ) 0 t < ln 2
F(t) =  Therefore F ( x ) =  0.2x − 0.01x 2 0 x 10
7 − 3 ln 2 + t ln 2 t < ln 2 + 65 
x > 10.
 72 24 3 1
1 65 Limits of M:
 t ln 2 + .
3
X M
ln 2 1 −3t 0 0
+ ∫ 1 dt
1
c ∫0. 5 3
e dt
ln 2 24
or F(1) − F (0.5)
10 5
( 12 x m) = F (2m) = 0.2(2m) − 0.01(2m)

= 0.0237 (3 s.f.)
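The cumulative distribution function from Exercise 1.2A question 9 mixes an exponential piece with a linear one, so a quick numerical check is worthwhile. The sketch below assumes the two pieces of F(t) found in part b and the value k = 65/3 + ln 2 from part a; the function names are illustrative.

```python
import math

LN2 = math.log(2)
K = LN2 + 65 / 3            # upper end of the range, from part a


def linear_piece(t):
    """F(t) on ln 2 <= t < K."""
    return (7 - 3 * LN2) / 72 + t / 24


def cdf(t):
    if t < 0:
        return 0.0
    if t < LN2:
        return (1 - math.exp(-3 * t)) / 9
    if t < K:
        return linear_piece(t)
    return 1.0


print(round(linear_piece(K), 10))       # 1.0: the linear piece reaches 1 exactly at t = K
print(round(cdf(1) - cdf(0.5), 4))      # 0.0237, matching part c
```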
For m ∈( 0, 5 ) , FM (m ) = P X
( )
10 a Substitute a = 2,
m ∈ ( 0, 5 ) , FM (m ) = P x m = FX ( 2m ) = 0.2 ( 2m ) − 0.01( 2m ) = 0.4m − 0.04m 2
1 2
( )
31 2
21 2 7, ⌠ 3 3 1 8
∫
1 5
t dt =
15  ⌡2 5
− t − 3 dt =
3 15 
m <0
0
7 8 F (m ) = 0.4m − 0.04m 2 0 m 5
+ = 1 , therefore a = 2
15 15 
1  1 m > 5.
( ) ( ) ( )
31 2 33
 3 1 
P t 2 = ⌠
1 3 3 1 5
− t −3 dt =  − t −3 = (or 0.208 to 3. s.f.)
b 2  ⌡2 1 5 3  10 3  1 24 Hence, f (m ) = 0.4 − 0.08m 0m5
2 2 0 otherwise.
2 
31
( ) ( )
3 1
⌠ 3 − 3 t − 3 1 dt =  − 3 t − 3 1  = 5 (or 0.208 to 3. s.f.)
2 3
=  3 Y =X3
⌡2 1 5 3  10 3  1 24
2 2
2
 0 x <6

c 60 members, therefore 
F(x) =  1 x − 3 6 x 10
4 2
( )
1
33 
60 × P (t > 3 ) = 60 ×⌠ 3 1
 − 5 t − 3 3 dt = 60 × 0.03 = 2 members  1 x > 10.
⌡3
( )
3 13 Change limits:
×⌠ 3 1
 − 5 t − 3 3 dt = 60 × 0.03 = 2 members
⌡3 X Y = X3
Exercise 1.3A 0 0
6 216
x
10 1000
∫0 0.08x dx = 0.04x
2
1
1
3 1 1 3
0 x <0 F(y) = P(X y) = P(X y 3 ) = 4 y 3 − 2

Therefore F ( x ) =  0.04x 2 0x 5 Therefore,
1 x > 5.
  2
Limits of Y : 1 −
f(y) = 12 y 3 216 y 1000
X Y 0 otherwise.

0 0
5 25 10
4 a ∫0 kx dx = 1
1 2
   
( )
1
For y ∈( 0,25 ) , FY ( y ) = P X 2 y = FX  y2  = 0.04 ×  y2 = 0.04 y 1 kx 2  = 1
10
  2
2 0
 21   21 
( 2
)
X y = FX  y  = 0.04 ×  y  = 0.04 y
    50k = 1

 1
y <0 k=
 0 50
F ( y ) =  0.04y 0 y 25
 
1
1 y > 25 . b f ( x ) =  50 x 0 x 10
 0 otherwise.

Hence, f ( y ) =  0.04 0 y 25
x 1 1
 0 ∫0 50 x dx = 100 x
otherwise. 2

0 x <0 For r ∈ ( 13 ,1), F (r ) = P ( 13 t r ) = F (3r ) = 25 (3r ) − 201 (3
R T
Therefore F ( x ) =  1 2
100
1
x
( )
1
r0∈ x,1 ,10
3
x > 10.
1
3 (

) 2
5
1
FR (r ) = P t r = FT ( 3r ) = ( 3r ) − ( 3r ) −
20
2 7 6 9
= r − r2 −
20 5 20
7
20
Change limits:
X Y
1
( )
For r ∈ (1, 3 ) , FR (r ) = P 3 t r = FT (3r ) = 10 + 10 ( 3r ) =
1 1
0
10
0
2
r ∈ (1, 3 ) , FR (r ) = P ( 13 t r )
= F (3r ) =
T
1
+
1
3r =
1
10 10 ( ) 10 10
3
+ r
For y ∈ ( 0, 2 ) , FY ( y ) = P ( 15 x y ) = F (5y ) = 1001 (5y ) = 14 y

X
2 2

0

r<1
3
) , FY ( y ) = P ( 15 x y )
= F ( 5y ) =
X
1
100 ( )
1
5y = y
4
2 2 F (r ) =  6 r − 9 r 2 − 7
5
1
20 20
1 r 1
3
3
 10 + 10 r 1r 3
0 y <0 
F ( y ) =  1 2 1 r > 3.
y 0y 2
4 
1 y > 2. 6 − 9 r 1 r 1
5 10 3
 1 Hence, f (r ) = 
 y 0 y 2 3
Hence, f ( y ) =  2 10 1r 3
 0 otherwise. 0 otherwise.

3 9 1 3 11
P ( R < 1.5 ) = F (1.5 ) =
∫1 0.4 − kt dt + ∫3 k dt = 1 10 10 ( ) 20
5 a c + 1.5 =
3 1 2
0.4t − 1 kt 2  + [ kt ]9 = 1 6 a a = 1 so a2 = 16
 2 1 3 16
0.8 – 4k + 6k = 1 Therefore, a = 4, a = –4 (reject as a > 0)

k = 0.1 Change limits:
t
⌠ 2 t X Y
 
b For t ∈ (1, 3 ) ,  0.4 − 0.1t dt = 0.4t − 0.1t  = 2 t − 1 t 2 − 7
 2  5 20 20 0 0
⌡1 1
t t 4 16
⌠  0.1t 2  2 1 2 7
 0.4 − 0.1t dt = 0.4t − 2  = 5 t − 20 t − 20  1 1  1
2
⌡1  1
( )
FY ( y ) = P x 2 y = FX  y 2  =  y 2  =
  16  
1
16
y
t
For t ∈ (3, 9), F ( 3 ) + ∫ 0.1t dt = 2 + [0.1t ]t3 = 1 + 1 t
3 5 10 10 
t 2 1 1 0 y <0
F ( 3 ) + ∫ 0.1t dt = + [0.1t ]3 =
t
+ t
3 5 10 10 F ( y ) =  1
y 0 y 16
16
Therefore 1 y > 16.
 
0 t <1 1
2 1 2 7 Hence, f ( y ) =  16 0 y 16
F (t ) =  5 t − 20 t − 20 1t 3 0 otherwise.

1 + 1t 3 t 9 1 15
10 10 b P (Y > 1) = 1 − F (1) = 1 − =
16 16
1 t > 9..

x
∫−∞ 2 dx = F(−1) +  2 x −1 = 2 x + 2
x 1 1 1 1
Change limits: 7 a
T R
1 
1 0 x < −1
3
F(x ) =  1
3 1 x+1 −1 x 1
2 2
9 3 1 x > 1.
7
Change limits: Therefore
X Y = eX 
1024 3 0 y 17
f(y) =  83521 y 4
−1 e−1 0
 otherwise.
1 e 2 3 2
9 ∫a 16 x dx = 1
F(Y ) = P(Y y ) = P(e X y ) = P(x ln y ) =
 3x 3  2
=
1
ln y +
1   =1
2 2  48  a
Therefore, a = −2

 0 y <e −1

 0 x < −2

F( y ) =  1
ln y +
1 e −1 y e F ( x ) =  1 3 1
 2 2 x + −2 x 2
y > e. 16 2
 1
1 x > 2.

1 Change limits:
 e −1 y e
Hence f(y) =  2 y
0 otherwise. X Y = X2

−2 4
b P(Y  k) = 0.25 is the same as 0 0
P(Y  k) = 0.75
2 4
1 ln k + 1 = 0.75
2 2 1 1
1 3
F(y) = P(X 2 y) = P(− y 2 X y 2 ) = y 2
1 ln k = 0.25 8
2 Therefore
ln k = 0.5  1
3
k = e0.5 f(y) = 16 y 2 0y 4
0 otherwise.
1 
∫02 c x
3
8 dx = 1 x
⌠ ( x − 10 ) dx = F ( 0 ) +  ( x − 10 )  = ( x − 10 ) + 1
x 2 3 3
1 10   
 cx 
4 2 ⌡−∞ 3000 
9000

9000 9
 4  =1
0
 0

c = 64 0 x 0
 F ( x ) = ( x − 10 )3 1

f(x) =  64x
3
0x 1  9000 + 9 0 < x < 30
2 1
0  x 30.
 otherwise.
Since X + T = 30, T = 30 − X


0 x <0

Change limits:
F(x ) = 16x 4 0x 1
2 X T

1 1
x> . 0 30
 2
30 0
Change limits:
FT (t ) = P(T < t ) = P(30 − X < t ) = P(X > 30 − t ) =
X Y = 8.5X
= P(X > 30 − t ) = 1 − P(X 30 − t )
0 0
1  (( 30 − t ) − 10 )3 1 
17
2 4 = 1 − F X (30 − t) = 1 −  9000
+ 
9

y  =8−
(20 − t ) 3
F(y) = P(8.5X y) = P  X
256 4
= y
 8.5  83521 9 9000
Therefore
∫0 m (0.4 − 0.08m )dm = 3
E(M ) =
5 5

0 t 0
10 5 15
F (t ) =  8 ( 20 − t )3 Therefore E ( X ) + E ( M ) = + = =5
3 3 3
 9 − 9000 0 < t < 30
1 t 30. b E ( 2 X − M ) = 2E ( X ) − E ( M ) = 2 ×
10 5
− =5
 3 3
Exercise 1.4A 3 9 11 18 = 13
5 a E (T ) = ∫1 t (0.4 − 0.1t )dt + ∫3 0.1t dt = 15 + 5 3
1 33
dx + ∫ 1 x 2dx = 9
0 4
a E(Y) = ∫−1 3 x
2
1 1 1 13 13
E(R) =
3 ( ) 3 3
0 6 b ET = × =
9
1 13 8
dx + ∫ 1 x 3dx = 127
0 4
b E(Z) = ∫−1 3 x
3
E ( 2R + 3 ) = 2E(R) + 3 = 2 × +3=5
0 6 12 9 9
1 1
dx + ∫ 1 x 4dx = 513
0 4
∫−1 3 x
4
c E(P) =
0 6 15
6 a f ( x ) =  8 x 0x 4
57 0 otherwise.
d E(Y + Z) = E(Y) + E(Z) = 
4
( 18 x ) dx =  24x  = 83

10 1 3 4
2 a E(Y) = ∫6 4
x 3 dx = 544 cm3 E( X ) = ∫0 x
4
0
b b Y = X3
( )
4 4
3Y 2 = 3X 6 ⌠
b ( ) 8
 4
E X 2 =  x 2 1 x dx =  x  = 8
 32 0
E(3Y 2 ) = E(3X 6 ) ⌡0
10 1 7 290048 E(Y) = E(2X2) = 2E(X2) = 16
= ∫6 4
× 3x 6dx =
7
Alternatively,
c E(3Y 2 + Y + 2) = E(3Y 2 ) + E (Y ) + E(2) 1
7 290048 Y = 2X 2 and f ( x ) =  8 x 0x 4
= + 544 + 2
7 0 otherwise.

7 293870
= 4
7
∫ (2x ) 8 x  dx =  16 
1  x  4
E(Y ) =
4 2
= 16
5 0
∫0 kx dx = 1
0
3 a
2 31
5
 1 kx 2  = 25 k = 1
7 a ∫−1k dy +∫2 5 (7 − 2y ) dy = 1
 2 0 2
3
 2

k=
2
25 [ky ] 2−1+  75 y − 210y  = 1
 2
( 252 x )dx = 1002 x  = 252

5 2 5
b E (Y ) = ∫0 x
4
2k + k +
4
=1
0 10
c E(2X + Y) = 2E(X) + E(Y) 1
k=
5
( )
5
= 2∫ x 2 x dx + 25 = 2 ×  2 x 3  + 25
5
2 25 2  75 0 2 b f(y)
10 25 115 3
=2 × + = 5
3 2 6
2
5
0.02x (10 − x )dx = 10
10
4 a E( X ) = ∫0 3

f (m ) = 0.4 − 0.08m 0 m 5 –1
0
2 3 y
0 otherwise.
9
2 2 3 3Therefore,
1 y dy + 3 1 y (7 − 2y ) dy =  y  +  7y − 2y  = 3 + 29 = 19
2 2
c E (Y ) =∫−1 5 ∫2 5  10  
  −1  10 15  10 30  015 w<0
2 
31 y2 
2
 7y 2 2y 3 
3
3 29 19  1 w3 0 w < 1.5
y dy + ∫ y (7 − 2y ) dy =   +  −  = + =  5
2 5 10  10 15  10 30 15
   27 27
− 1 2
F (w ) =  40 w − 80 1.5 w < 1.9

d X = 1 Y , therefore X 2 = 1 Y 2  189 + e −1.9 − e −w 1.9 w < 2.36
2 4  200
 1
( )( ) ( )( )
2 3 w 2.36.

E (Y ) = ⌠
 4 y 2 5 dy + ⌠
1 1 1 1
 4 y 2 5 ( 7 − 2y ) dy
⌡−1 ⌡−2
Y = 2.2W
2 3
 x3  7 1 4 y 
=   +  y3 − y For y ∈ ( 0, 3.3) , F ( y ) = P (Y y ) = P ( 2.2W y ) = P  W =
 60  −1  60 40  2  2.2 
3
y  1 y 
∈ ( 0, 3.3) , F ( y ) = P (Y y ) = P ( 2.2W y ) = P  W
25 3
3 y 71 = = y
= +  2.2  5  2.2  1331
20 120
89 For y ∈(3.3, 4.18),
=
120
21 8 1
F ( y ) = P (Y y ) = P ( 2.2W y ) = P  W

y 
2.2 
= ( )
27  y  27
−
40  2.2  80
8 ∫ x(x ) dx + ∫ (8 − x )(x ) dx = 82
2 2
( )
0 4 2 4
y  27  y  27 27
F ( y ) = P (Y y ) = P ( 2.2W y ) = P  W
27
= − = y−
6 1 43  2.2  40  2.2  80 88 80
9 E(R 2) = ∫ r 2 dr = 3
1 5
For y ∈ (4.18, 5.192),
E(A) = πE(R 2) = 43 π cm 2 or 45.0 (3 s.f.) y  189 −5
3 F ( y ) = P (Y y ) = P ( 2.2W y ) = P  W = + e −1.9 − e 1
 2.2  200
( 2)
E 1 ( A + 1)2 = 1 E ( A + 1)2
2 ( )
F ( y ) = P (Y y ) = P ( 2.2W y ) = P  W
y  189
= + e −1.9 − e 11
−5 y
1 2  2.2  200
= 2 (E( A ) + 2E( A) + E(1)) = 1059
10 To find the value of k, Therefore, f ( y ) = d F( y)
1.5 1.9 k dy
∫0 0.6w dw + ∫1.5 0.675 dw + ∫1.9 e dw = 1
2 −w
 75 2
1.5
0.2w 3  + [ 0.675w ]1.9 +  −e −w  = 1
k
 1331 y 0 < y < 3.3
 0 1.5  1.9  27
 3.3 y < 4.18
0.675 + 0.27 + (– e–k + e–1.9) = 1 f ( y ) =  88
 5 −11 y 5
e–k = e– 1.9 – 0.055  11 e 4.18 y < 5.192
 0 otherwise.
–k = ln(e–1.9 – 0.055) 
( )
4.18
3. 3 75 2 ⌠ y × 27 dy +
k = 2.36 E(Y ) = ∫0 (y × 1331 y ) dy +  88
⌡3.3
Change limits: 5.192
⌠  5y
5 −11 
W Y = 2.2 W   y × 11 e  d y = 2.94 (3 s.f.)
⌡4.18
0 0
Alternatively:
1.5 3.3
Y = 2.2W so E(Y) = 2.2E(W)
1.9 4.18
1.5 1.9 k
2.36 5.192 E(W ) = ∫ 0.6w 3dw + ∫ 0.675w dw + ∫ we −w dw
0 1.5 1.9
w 1 3
∫0 0.6w
2 1.5 1.9 k
For W ∈(0,1.5), dw = w = 0.15w 4  + 0.3375w 2  +  −e −w (w + 1)  where k = − ln(
5 0 1.5 1.9
1.5 1.9 −w k
For W ∈(1.5,1.9),0.15 w)4 + w+ 0.675
F (1.5 0.3375w 2= 27+w −−e27 (w + 1)  where k = − ln(e −1.9 − 0.055) ≈ 2.36
0 ∫1.5  dw
40
1.5 80 1.9
= 0.759375 + 0.459 + 0.116147 = 1.335

For W ∈ (1.9,2.36 ), F (1.9) + ∫ e −w dw = 189 + e −1.9 − e −w
w
1.9 200 E(Y) = 2.2 × 1.335 = 2.94 (3 s.f.)
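The alternative method for E(Y) relies on E(2.2W) = 2.2E(W). The sketch below evaluates the three closed-form pieces of E(W) quoted above, with k = -ln(e^{-1.9} - 0.055); the helper name `tail_antiderivative` is an assumption made for this illustration.

```python
import math

# Upper limit k, from solving for the total probability of 1.
k = -math.log(math.exp(-1.9) - 0.055)


def tail_antiderivative(w):
    """An antiderivative of w * exp(-w), namely -exp(-w) * (w + 1)."""
    return -math.exp(-w) * (w + 1)


e_w = (0.15 * 1.5 ** 4                              # integral of 0.6 w^3 over [0, 1.5]
       + 0.3375 * (1.9 ** 2 - 1.5 ** 2)             # integral of 0.675 w over [1.5, 1.9]
       + tail_antiderivative(k) - tail_antiderivative(1.9))   # integral of w e^{-w} over [1.9, k]
e_y = 2.2 * e_w

print(round(k, 2), round(e_y, 2))   # 2.36 2.94
```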
Exercise 1.5A – 0.05x2 + 0.4x – 0.55 = 0

x = 1.76, x = 6.24 (reject, as x < 3 from part b)
1 a F(x) = 0.5
Therefore the 20th percentile is 1.76.
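A percentile can also be found numerically when the equation F(x) = p is awkward to solve by hand. The sketch below uses bisection on the cumulative distribution function for 1 ≤ x ≤ 3 from part c, F(x) = 0.4x - 0.05x² - 0.35; the function names and the tolerance are illustrative choices.

```python
def cdf(x):
    """F(x) on 1 <= x <= 3 for the density f(x) = 0.4 - 0.1x."""
    return 0.4 * x - 0.05 * x ** 2 - 0.35


def percentile(p, lo=1.0, hi=3.0, tol=1e-10):
    """Bisection: F is increasing on [lo, hi], so halve the bracket around F(x) = p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(percentile(0.2), 2))   # 1.76
```

Because the search is restricted to the interval where this piece of F applies, the rejected root 6.24 never appears.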
1 x 2 = 0.5
25
3 a y = 1, 5 (y − 4)2 = 5
x = 3.54 x = −3.54, reject 72 8
b Q1: 1 x 2 = 0.25 f(y)

25
x = 2.5 x = −2.5, reject
Q3: 1 x 2 = 0.75 3
25 4
5
x = 4.33 x = −4.33, reject 8
2 a f(x)
0.3
0.1
1 2 3 4 y
0
1 3 9 x

0 y <0
b Using the graph, area of trapezium 3 2
1 8 y 0y <1
= (0.1 + 0.3) × 2 = 0.4 < 0.5 b F(y) = 
2
 5( y − 4 )
3
Therefore the median value lies between 3 1 + 1 y < 4

216

and 9. 1 y > 4.
Area of trapezium + area of rectangle = 0.5
3
Area of rectangle = 0.1 c F(m) = 1 + 5(m − 4) = 0.5
216
(x – 3) × 0.1 = 0.1 m = 1.22 (3 s.f.)
x=4 d The 20th percentile is when y is between 0 and 1,

3 y 2 = 0.2
Therefore the median is 4. 8
Alternative method: y = 0.730, y = −0.730, reject
3 31 3 1
Using calculus, ∫1 0.4 − 0.1x dx = 0.4 < 0.5 4 a ∫0 7 dw = 7 < 2
Therefore, the median value lies between 3 Therefore, the median value lies between 3
and 9. and 5.
x 3 + w 2 dw = 1
7 ∫3 7
0.4 + ∫ 0.1 dx = 0.5
3 2
w
[0.1x ]3x = 0.1 2w = 1 − 3
 7  3 2 7
0.1x – 0.3 = 0.1
1
w=3
x=4 4
1
Therefore the median is 4. Therefore, the median is 3 .
4
c F(x) = 0.2 3 + w 2 dw = 7
7 ∫3 7
b
x 10
∫1 0.4 − 0.1x dx = 0.2
w
x 2w = 7 − 3
0.4x − 0.1 x 2  = 0.2  7  3 10 7
 2 1
19
w=3
20
3 5 F(Q3) = 0.75
5 a ∫1 k ( x − 1)dx + ∫3 k (5 − x )dx = 1 1 7
x − 1 = 0.75 x=
3 2 5
2 2
 kx 2
  kx 
 2 − kx  + 5kx − 2  = 1
 1  3 Therefore IQR = 7 − 2 = 1 1
2 2
( 2)( )
 9k − 3k − k − k 
 2 
x 1 −1x  −1x  −1x
x
7 a ∫0 3 e dx =  −e 3  = 1 − e 3
+  ( 25k −
2 ) (
3
− 15k − k ) = 1
25k 9  0
 2 
4k = 1  0 x <0
Therefore F ( x ) = 
−1x
k=
1  1 − e 3 x 0.
4
1
b F(m) = 2
b f(x)
− 1m 1
1 1− e 3 = m = 3ln 2 or 2.08 (3 s.f.)
2
2
c F(x) = 0.8
−1x
1− e 3 = 0.8 x = 3ln 5 or 4.83 (3 s.f.)
0
1 3 5 x
c The graph is an isosceles triangle. It has the 8 a y
line of symmetry x = 3. Therefore, the median 1
is 3. 2 a
3 4
6 a ∫1 k dx + ∫3 2k dx = 1 –1 1 x
[kx ] + [2kx ]
3
1
4
3
=1
(3k – k) + (8k – 6k) = 1 a=1
2
1 t 1 1
k=
4 b F(t) = 0 + ∫−1 2 dt = 2 (t + 1)
x1
Alternative method
1 1
b For x ∈ (1, 3), F(x) = ∫ 4 dx = 4 x − 4 1
1 (m + 1) = 0.5
2
x1 1 m=0
For x ∈ (3, 4), F(x) = ∫3 2 dx = 2 x − 1
 c Q1= −0.5 Q3 = 0.5
0 x <1
IQR = Q3 − Q1 = 1
1 1
F ( x ) =  4 x − 4 1 x < 3
k
4 1
1 x − 1 3x 4
9 a ∫1 0.25 dr + ∫4 − 8 (r − k ) dr =1
2
1 x > 4.
  1 4  1 2
k
 4 r  +  − 16 (r − k )  = 1
1 4
c F(x)
1 (1 − 14 ) +  0 − − 161 ( 4 − k )  = 1 2
1
1 1
2 (4 − k )2 =
16 4
0
1 3 4 x 4 – k = ±2
d F(Q1) = 0.25. From the graph, LQ value must k = 2 (reject because k > 4) or k = 6
be between 1 and 3. b For r ∈(1, 4),
1 1 r
x − = 0.25 x=2 1 1 1
dr = 0 +  r  = (r − 1)
r
4 4 F (r ) = F (1) + ∫
1 4  4 1 4
For r ∈ (4, 6), Exam-style questions

r r
F( r ) = F( 4 ) + ⌠ − 1 ( r − 6 )dr = 3 +  − 1 (r − 6)2  = 1 − 1 (r − 6)2 1
⌡4 8 4  16  4 16
1 a F(7) = a(7)3 = 343a = 1, so a = 343
r r
1 3
) = F( 4 ) + ⌠⌡4 − 18 (r − 6 )dr = 34 +  − 16
1 (r − 6)2  = 1 − 1 (r − 6)2
 4 16 343
t = 0.2

t = 4.0936…
 t = 4 days
0 r <1
1 
 4 (r − 1) 1r < 4  3 2
F(r) =  b f(t) =  343 t 0 t 7
 (r − 6 ) 2
0 otherwise.
1 − 16 4 r 6 
1 r > 6. c F(t) = 0.25

c For the 80th percentile, R is between 4 and 6. For t ∈(0, 7),
t 3 1 3
(r − 6)2 = 0.8 F (t ) = ∫ t 2 dt = t
1− 0 343 343
16
r = 4.21 r = 7.79 reject Lower quartile F(t) = 0.25
1 3
−x −1 x < 0 t = 0.25
343

10 a f(x) = x 0 x 1
0 t3 = 85.75
 otherwise.
t = 4.41
0 x < −1 a1 1
1 1 2
 2 − 2 x −1 x 0
2 a ∫1 6 (4 − x)dx + 6 (5 − a) = 1
a
F(x) =   − 1 (4 − x)2  + 1 (5 − a) = 1
1 + 1 x2 0 x 1  2  1 6
2 2
x > 1.
1 −
1
12 { 1
}
(4 − a)2 − 9 + (5 − a) = 1
6
Change limits:
(4 − a )2 − 9 − 2(5 − a ) = −12
X Y
−1 4 a 2 − 6a + 9 = 0

2
0 0
(a − 3) = 0
1 4 a=3

(
FY (y) = P(4 X 2 y) = P − 1 y X 1 y
2 2 ) b
3 1
∫1.5 6 (4 − x)dx + ∫3
4. 5 1
6
dx = 11
16
1
( ) (
= FX 2 y − FX − 2 y = 4 y
1 1
) c The 95th percentile lies in the region (3, 5).
Therefore, 1 1
 F(x) = 6 + 6 x 3x 5
0 y <0
 1 + 1 x = 0.95
F(y) =  y 1 0y 4
4 6 6
1 y > 4. x = 4.7
1
1 3 a  1 x 4  + ( k − 1) x k +  − ( x − 2 )2 2 = 1
 0 y 4  4 0  1  k
Hence f(y) =  4
0 otherwise. 8k 2 − 24k + 17 = 0

1 6+ 2 6− 2
b y = 0.5 or
4 4 4
M: y = 2
c For Q1, 1 y = 0.25,
0.5 so y = 1
4
For Q3, 1 y = 0.75,
0.5 so y = 3
4
IQR = 2 13
 Or change limits:
x3 0x <1

 X Y
2+ 2 6+ 2
 1 x < 0 0
b f(x) =  4 4
 6+ 2 1 1
− 2( x − 2) x < 2
 4
 6− 2 126 − 55 2
0 otherwise
 4 32
 x3 0x <1
 2 8
 2− 2 6− 2
 1 x <
or f ( x ) =  4 4  1
6− 2 1 y 3 0y <1
 −2( x − 2) x <2 3
 4 
 −2
0 otherwise. 2 − 2 3 1 y < 126 − 55 2
 f(y) =  12 y 32
  1  −2
Therefore, − 2  y 3 − 2  y 3 126 − 55 2 y < 8
 3  32
 0 x <0 0
  otherwisse.
4
 x
 0x <1
4 4 a F(15) − F(10) = 0.145

2+ 2 2 +1 6+ 2 b Q1 : 1 − e −0.1t = 0.25 t = 10 ln 4
F(x) =  4
x−
4
1 x <
4 3

 6+ 2 Q 3 : 1 − e −0.1t = 0.75 t = 10 ln 4
1 − ( x − 2)
2
 x 2
4 Q 3 − Q1 = 10 ln 3
 1 x >2
 c For median:
 1 − e −0.1t = 0.5
0 x <0
x4 t = 10 ln 2
4 0x <1
  1 − 10
1t
or F(x) =  2 − 2 x − 1 − 2  e
1 x < 6 − 2
t 0
f (t ) = 10
 4 4 4
  0 otherwise.
6− 2 x 2
1 − ( x − 2 )
2
 4 ∞ ∞
1 x > 2. ⌠  − 1t  − 1t ∞ − 1t
Mean = E(t ) =  t  1 e 10  dt =  −te 10  − ∫ e 10 dt
⌡0  10  0
 0
c Change limits:
∞ ∞
⌠  − 1t  − 1t ∞ − 1t
X Y E(t ) =  t  1 e 10  dt =  −te 10  − ∫ e 10 dt
⌡0  10  0
0 0  0
1 1  − 1t
∞
6+ 2 = 0 − 10e 10  = 0 − ( 0 − 10 ) = 10
126 − 55 2
 0
4 32
P(median ≤ T ≤ mean) = F(10) – F(10 ln 2)
2 8
= 0.5 − e −1
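The quartiles, median and mean for the exponential model in exam-style question 4 can be confirmed numerically. The sketch below assumes F(t) = 1 - e^{-0.1t} for t ≥ 0, as used in the solution, and inverts it directly; the names `RATE` and `quantile` are illustrative.

```python
import math

RATE = 0.1   # F(t) = 1 - exp(-0.1 t) for t >= 0


def cdf(t):
    return 1 - math.exp(-RATE * t) if t >= 0 else 0.0


def quantile(p):
    """Solve F(t) = p for t: t = -ln(1 - p) / RATE."""
    return -math.log(1 - p) / RATE


q1, median, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
mean = 1 / RATE   # agrees with the integration by parts, which gives 10

print(round(q3 - q1, 3), round(10 * math.log(3), 3))   # both 10.986
print(round(cdf(mean) - cdf(median), 3))               # 0.132, i.e. 0.5 - e^{-1}
```

The median (about 6.93) is smaller than the mean (10), so the probability that T lies between them is F(mean) minus F(median).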
 1 13
 y 0 y <1 5 a i Two points (−2, 0) and (0, 0.2) on the
 3 first piece:
 2 + 2 − 23 126 + 55 2 0.2 − 0 = 1
 y 1 y < a=
f(y) =  12 32 0 + 2 10
2
 2 1  −3 126 + 55 2 Two points (1, 0.4) and (4, 0) on the third
 −  y 3 − 2 y y <8 piece:
3  32

 0 − 0.4 2
0 otherwise. b = 4 − 1 = − 15

x
1 1  9 x − 1 x2  = 3
10 x + 5 −2 x < 0
 14 14 1 8

1 0x <1 4x2 – 36x + 53 = 0
f ( x ) = 5
− 2 x + 8 1 x < 4 x = 1.85, x = 7.15 reject
 15 15
0
 otherwise.
c ( ) ∫ 12 x dx + ∫
E X2 =
1
0
5
14 7
1 (
x 2 9 − 1 x dx
4. 5
)
0 x < −2 1 301 919
 = + =
12 64 192
 1 x 2 + 4x + 4 ( ) −2 x < 0
20
( ) = 0.803
2

ii F(x) = 1 (1 + x ) 0x <1
( )
Var ( X ) = E X 2 − ( E ( X )) =
2 919
192
− 1
239
240
5

− 1 x 2 − 8x + 1
 15
( ) 1 x < 4 7 a F(4) = 0.5(4 – 3) = 0.5
1 x > 4. so k(4 – 2) = 0.5

b i Change limits: 2k = 0.5
k = 0.25
X Y
−2 e−2 b F(4) = 0.5, therefore the value of Q1 must lie
0 1 between 2 and 4; the value of Q3 must lie
between 4 and 5.
1 e
4 e4 0.25(x – 2) = 0.25 x = 3

0.5(x – 3) = 0.75 x = 4.5
F(y) = P(Y  y) = P(eX  y) = P(X  ln y)
1 ((ln y)2 + 4 ln y + 4) IQR = 4.5 – 3 = 1.5
For y ∈ (e–2, 1), F ( y ) =
20 
 0.25 2x 4
1
For y ∈ (1, e), F ( y ) = (1 + ln y) c f ( x ) =  0.5 4x 5
5

0 otherwise.
For y ∈ (e, e4), F ( y ) = 1 (ln y 2 − 8ln y + 1)
15
Therefore, E( X ) =
4 5 3 9 3
d i ∫2 0.25x dx + ∫4 0.5xdx = 2 + 4 = 3 4

 3
10y
1
e −2 y < 1
ii P( X > µ) = P X > ( 15
4 )
=1− P X (15
4
. )

(154 ) = 1 – 0.4375 = 0.5625
f(y) =  5y 1 y <e
 −2 =1− F
4
 5y ey <e
 8 a
0 otherwise. f(x)
1 e
ii ⌠  3  ⌠  1
 y  10y  d y +  y  5y  d y
⌡e−2 ⌡1 1 or 1
15 k
e4
+⌠  −2 
 y  5y  d y = 21.4
⌡e
a P ( 0.5 < x < 3 ) =

1 1 31
∫0.5 2 x dx + ∫1 7 ( 4.5 − x )dx
3
6 0
15 x
1
1
9 1 15 5 745
3 0 x <0
=  x 4  +  x − x2  = + = 1 dx = 1 x F x =  1
 8 0.5  14 14 1 128 7 896 x
( ) 15 x
b ∫0 15 15 0 x 15

11 1 1 x > 15.
∫0 2 x dx = 8 , therefore the median value
3
b
must lie between 1 and 4.5.

1 + x 1 4 . 5 − x dx = 1
8 ∫1 7 ( ) 2
15
c Change limits:
(16k − 8k ) − (8k − 2k ) = 53
X Y
3
0 0 k=
10
15 225
b f(t)
   1  1  1  1 21
( )
1
FY ( y ) = P X 2 y = P  X y 2  = FX  y 2  =  y 2  = y
    15   15 0.6
1
  1
 1 1
 1 1
y2 = FX  y2  = 15  y2  = 15 y2

0 y <0
F ( y ) =  1 21
 15
y 0 y 225 0
2 4 t
1 y > 225.

dt + ∫ 3 t ( 4 − t )dt = 3 + 8 = 2.2 hours
2 4
Mean = E (T ) = ∫0 0.15t
3
c 2 10 5 5

−1
Therefore, f ( y ) =  1 y 2 0 y 225
E (T ) =
2
t + ∫ 3 t ( 4 − t )dt = 3 + 8 = 2.2 hours
4
∫0 0.15t
3
30 d
 0 2 10 5 5
 otherwise.
d For t ∈ ( 0, 2 ) , F (t ) =
t
∫0 0.15t
2
dt = 0.05t 3
d E(Y) = E(X 2) = 75
 3X 4  t
For t ∈ ( 2, 4 ) , F (t ) = F ( 2 ) + ∫
t 3 4 − t dt = 0 . 4 +  6 t − 3 t 2  =
E(Z) = E  2  = E(3X 2) ( )
 X  2 10  5 20 2
= 3 × E(X2) = 3 × 75 = 225 t
t ∈ ( 2, 4 ) , F (t ) = F ( 2 ) + ∫ 3
( 4 − t )dt = 0.4 +  65 t − 230 t 2  = − 20
t 3 t 2 + 6t − 7
9 a For X ∈(–∞, 0), 2 10  2 5 5
x
Therefore,
F(x) = ∫−∞ 0 dx = 0 
t <0
0
For X ∈(0, ∞ ), 0.05t 3 0 t 2
x F (t ) = 
x 1x  1x  1x 1x 3 6 7
F(x) = F( 0 ) + ⌠ 4 e 4 dx = 0 +  − e 4  = − e 4 − ( −1) = 1 − e 4 − 20 t + 5 t − 5
1 − − − − 2
2 t 4
⌡0  0
1 t > 4.
x
1  1  1 1
4   ( )
1 e − 4 xdx = 0 + − e − 4 x = − e − 4 x − −1 = 1 − e − 4 x 17 3
e P(T > 3) = 1 – F(3) = 1 − 20 = 20
 0
Therefore, 2 6

x < 0,
11 a ∫0 kx dx + ∫2 2k dx = 1
0
F( x ) = 
1 2
1 − e − 4 x x 0.  1 kx 2  + [ 2kx ]6 = 1
  2 0 2
 −1 
b P ( X > 1) = 1 − P ( X 1) = 1 − F (1) = 1 −  1 − e 4  = 0.779 (2k – 0) + (12k – 4k) = 1
 
 −1 
k=
1
1) = 1 − P ( X 1) = 1 − F (1) = 1 −  1 − e 4  = 0.779 10

1
− x 3 2 1 31 3 1 7
c 1−e 4 =
4 b P (1 < X < 3 ) = ∫ x dx + ∫ dx = + =
1 10 2 5 20 5 20
1 = e − 14 x
For x ∈ ( 0, 2 ) , F ( x ) =
x 1 1
∫0 10 x dx = 20 x
2
c
4
1 1
− x = ln x
For x ∈( 2, 6 ) , F ( x ) = F ( 2 ) + ∫ 5 dx = 5 +  5 x  = 5 x − 5
x1 1 1 1 1
4 4
2  2
x = 5.55 x
x ∈( 2, 6 ) , F ( x ) = F ( 2 ) + ∫ 1 dx = 1 +  1 x  = 1 x − 1
x
2 4 2 5 5  5 2 5 5
10 a ∫0 0.15t
2
dt + ∫ k ( 4 − t ) dt = 1
2
4
2  1
+ 4kt − kt 2  = 1
5  2 2
Therefore, 
 1 4a < 9
0 x <0 10
b f (a ) = 
1 2 1
F ( x ) =  20 x 0x 2 100 (19 − a ) 9 a < 19
0 otherwise.
1 x − 1 2x 6 
5 5
1 x > 4.
 For a ∈ ( 4, 9 ) , F (a ) =
a 1 1 4
∫4 10 da = 10 a − 10
1
d F ( 2 ) = , therefore the median value must lie
5
between 2 and 6.
For a ∈ ( 9, 19 ) , F (a ) = F ( 9 ) + ∫ 100 (19 − a ) da = 2 + 10
a 1
9
1 19
(
1
5
1 1
x− =
5 2
a ∈ ( 9, 19 ) , F (a ) = F ( 9 ) + ∫
9
a
2 100 200 (
1 19 − a da = 1 + 19 a − 1 a 2 − 261
100 ( ) 200 )
x = 3.5 1 2 19 161
=− a + a−
200 100 200
12 a
1
4 ( 1
4
1
4 )
P − π < x < π = 0 + F π = 0.383 ( ) Therefore,

0 a<4

1 1 1
f ( x ) =  2 cos 2 x 0x π 4
b i F (a ) = 10 a − 10 4 a 9
0 otherwise.
 − 1 a 2 + 19 a − 161 9 a 19
 200 100 200
ii f(x) 1 a > 19.

5
1 1
2 c S = A2
Limits:
1
2 1
A S = A2
–5
0 π 5 x
4 2
9 3
–5 19 19
c i E ( 2X ) = ( 12 x ) dx
π 1
∫0 2x  2 cos  
For s ∈ ( 2,3) , F ( s ) = P  A s  = P ( A s ) =
1
10
s
1
2 2 2
−
1
 
= 2x sin ( x )  + ∫ 2sin ( x ) dx = 2π −  −4cos ( x )  = 2.28
π π π
1 1 1
 s = P ( A s ) = 1 s − 4
1
 2  2 s ∈ ( 2,3) , F (s ) = P 2A
0 2 2 2

0   0
10 10
sin ( x )  + ∫ 2sin ( x ) dx = 2π −  −4cos ( x )  = 2.28

π π π
1 1 1
2  0 2  2  P (S < 2.5) = F ( 2.5) = 1 (2.5) − 4 = 9 2
0
0 10 10 40
or 0.225
1
E( X ) =
2 ( )
ii E 2 X = 1.14
x
∫0 cx
2
14 a dx = 1
9 19 1 19 − a da = 1
∫4 k da + ∫9 100 ( )
13 a 2
c x3 = 1
 3 0
19
19 1 2
[ka ]94 +  100 a− a
200 
=1 8c
9 −0=1
3
 192 192  19 × 9 81   3
5k +  − − − =1 c=
 100 200  100 200   8
1
5k + =1
2
1
k=
10
17
0 x <0 0 y <8
 
b F ( x ) =  1 x 3 0x 2 F ( y ) =  1 y 4 − 1 8 y 24
8  327680 80
1 x > 2. 1 y > 24.
Q1 :
1 3 1
x = x = 1.260  1 y 3 8 y 24
8 4 Therefore, f ( y ) =  81920
1 3 3 0 otherwise.
Q3 : x = x = 1.817 
8 4
b P(Y > 6) = 1 – F(6) = 1 – 0 = 1
IQR = 1.817 – 1.260 = 0.557 24
(10y )  81920
1
y 3  d y = 
1
y5
24
c E (10Y ) = ∫
c Limits of Y: 8  40960 
 8
X Y = 193.6
0 0
2 16 4k x + k dx = 1
16 a i ∫−k 5k
1 3
( ) ( )( )
 1 21  1  1 2
( ) 1 3 4k
FY ( y ) = F 4 X 2 y = FX  y  =  4 y  = 64 y 2  x + 1 x  = 1
2
 4  8    10k 5 
1 3   −k
( ) ( )( )
1 2 1  1 2
1
1 23
y =  y  = y
( )( )
4  8 
4  64

 k + k − k − k  =1
8 4 1 1
0  5 5 10 5 
y <0

F ( y ) =  1 y 2
3
0 y 16 12 1
64 k + k = 1
 5 10
1 y > 16.
 5
k = 1
2
 1
3 2
Therefore, f ( y ) = 128 y 0 y 16 2
k =
0 otherwise. 5


3
0 x < −2
d E ( y ) = ∫ 3 y 2 dy = 9.60
16
5
0 128 
( 5x + 2 )2
ii F(x) =  −2 x < 8
 100 5 5
x
x 1 3
 1 4 1 1 1 8
15 a ∫ x dx = x = x − 4
 x .
1 20  80 1 80 80 5
0 x <1 iii P(X  p) = 0.85


F ( x ) =  1 x 4 − 1 1 x 3
(5x + 2)2 = 0.85
80 80 100
1 x > 3. x = 1.44 (3 s.f) −2.24, reject
b i Change limits:
Limits of Y:
X Y
X Y
2 8
1 8 − −
5 125
3 24 8 512
( 18 y ) = ( 801 )( 18 y ) − 801
4 5 125
FY ( y ) = F ( 8X y ) = FX
1
1 1 P (Y y) = P (X 3 y) = P (X y 3 ) =
= y4 −
327680 80 2
 13 
 5y + 2 
=
100
 0 t <0
0 y <− 8  2
125
 2
t 0t < 4
  13   24
F(y) =   5y + 2  F(t) = 
 1 − (12 − t )
2
 − 8 y < 512 192
4 t < 12
100 125 125 

1
1 y 512 . 
t 12.
 125
 1 2 b i 2.52 = 25 ii 1 −
(12 − 5)2 − 32 = 71
 5y 3 + 2  24 96 192 24 192
  = 0.5
ii
100 c Q 3 when 4 t < 12,
y = 1.01 −1.81, reject
1−
(12 − t )2 = 0.75
( ) 192
∞
17 a E(T) = ⌠ 1
 t 10 e −0.1t dt = 10 minutes t = 5.07, 18.9, reject
⌡0
( )
t t 1 × 2 ×k + 5 − 2 × k + 1 × 8 − 5 × k = 1
F (T ) = dt =  − e −0.1t  = − e −0.1t − ( −1)19= 1 −ae −0i.1t ( )
∫0 0.1e 2 ( )
−0.1t
b i 2
0
2
( ) k =
t t
T) = ∫0 0.1e
−0.1t
dt =  − e −0.1t  = − e −0.1t − ( −1) = 1 − e −0.1t 11
0
M = 2T 
F(m) = P(2T  m) = P(T  0.5m) 1 x 0x < 2
11
= 1 – e–0.1(0.5m) = 1 – e– 0.05m 2
ii f(x) =  11 2x <5

 −0.05m
2
Therefore, f (m ) =  0.05 e m 0
 33 ( 8 − x ) 5x < 8
0 otherwise. 
– 0.05m 0 ottherwise.
To find median, 1 – e = 0.5
e– 0.05m = 0.5 b i For x ∈ (0 , 2)
x
– 0.05m = ln 0.5 x 1  x2  x2
F( x ) = ∫0 11 x dx =  22  =
m = 13.9 minutes 22
0
ii FN(n) = P(T 2  n) For x ∈ (2, 5)

(121 x −
x
∫2 11 dx = 11 + 11 x  2 = 11 +
F ( x ) = F ( 2) +
x 2 2 2 2
(
= P − n T n )
(( ) − (1 − e F(x ))=) F(2) + ∫ 112 dx = 112 +  11 ( 11 11 ) 11
x
x 2 x = 2 + 2 x − 4 = 2 x −1
= 1 − e −0.1 n 0.1 n
2 11 
( )
2
For x ∈ (5, 8)
= e
0.1 n
− e −0.1 n
x
x 2 8  (8 − x)2 
F ( x ) = F ( 5) + ∫ ( 8 − x ) dx = + − 
5 33 11  33 
 0 n<0 5
FN ( n ) =  0.1
8  (8 − x ) 3
n −0.1 n 2
 e −e n 0. = + − + 
11  33 11 
18 a For t ∈ (0 , 4)
=1−
( 8 − x )2
2 t
t 1 t  t2
33
F (t ) = ∫0 12t dt =  24  =
24
0 Change limits:
For t ∈ [4, 12], X Y = X2
2 t 0 0
1  (12 − t ) 
(12 − t ) dt = 32 +  − 192 
t
F (t ) = F ( 4 ) + ∫ 2 4
4 96   4
5 25
2  (12 − t ) 1
2
= + − +  8 64
3  192 3
=1−
(12 − t )2
192
For $y \in (0, 4)$:
$F(y) = P(Y \le y) = P(X^2 \le y) = P\left(-y^{1/2} \le X \le y^{1/2}\right) = \frac{y}{22} - 0 = \frac{y}{22}$,
because X cannot take negative values.
For $y \in (4, 25)$:
$F(y) = \frac{2}{11}\left(y^{1/2} - 1\right) - 0 = \frac{2}{11}\left(y^{1/2} - 1\right)$
For $y \in (25, 64)$:
$F(y) = 1 - \frac{\left(8 - y^{1/2}\right)^2}{33}$
$f(y) = \frac{d}{dy}F(y)$, therefore:
$f(y) = \begin{cases} \frac{1}{22} & 0 \le y \le 4 \\ \frac{1}{11\sqrt{y}} & 4 \le y \le 25 \\ \frac{8 - \sqrt{y}}{33\sqrt{y}} & 25 \le y \le 64 \\ 0 & \text{otherwise} \end{cases}$
ii $\frac{383}{22}$

20 a $\int_0^1 0.2x^{1/2}\,dx + \int_1^b \frac{26}{45}x\,dx = 1$
$\left[\left(\frac{1}{5}\right)\left(\frac{2}{3}\right)x^{3/2}\right]_0^1 + \left[\frac{13}{45}x^2\right]_1^b = 1$
$\frac{2}{15} + \frac{13}{45}b^2 - \frac{13}{45} = 1$
$b^2 = 4$
$b = 2$; $b = -2$ is rejected.
b $E(X) = \int_0^1 0.2x^{3/2}\,dx + \int_1^2 \frac{26}{45}x^2\,dx = \frac{2}{25} + \frac{182}{135} = 1.43$
c $P(0.5 < X < 1.5) = \int_{0.5}^1 0.2x^{1/2}\,dx + \int_1^{1.5} \frac{26}{45}x\,dx = 0.08619... + 0.36111... = 0.447$

Mathematics in life and work
1 a [Figure: sketch of the probability density function f(a) against a, with 10, 16, 21 and 26 marked on the a-axis.]
b Age is a non-negative continuous random variable. However, we are not expecting a baby to play computer games. The model can be modified to take this into account.
c Mean is calculated from the integral of x f(x). Mean is 22.1 years, so in the age group $21 \le a \le 26$.
d Players between 21 and 26.
2 a The probability will be zero since t is the continuous random variable measuring the number of hours in one day.
b Any increasing function, e.g. $x^2$, $0.2x^3$, $e^x$.
c Median > mean, negative skew.
Reduce the difficulty level in order to reduce the median of the play time.
Put in extra help functions to support players to complete each level, so that the mean of play time will increase.
d Let daily playing time on the weekend be Y, Y = 2X
E(Y) = 2 E(X)
M(Y) = 2 M(X)
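The piecewise results in question 18 can be checked numerically. The following is a minimal sketch, assuming the density implied by the integrals in that solution (t/12 on [0, 4) and (12 − t)/96 on [4, 12]); the function names are illustrative and not taken from the Student's Book.

```python
def pdf(t):
    """Density for question 18, as implied by the integrals in the solution (an assumption)."""
    if 0 <= t < 4:
        return t / 12
    if 4 <= t <= 12:
        return (12 - t) / 96
    return 0.0


def cdf(t):
    """Closed-form cumulative distribution function from the worked solution."""
    if t < 0:
        return 0.0
    if t < 4:
        return t ** 2 / 24
    if t < 12:
        return 1 - (12 - t) ** 2 / 192
    return 1.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h


def quantile(p, lo=0.0, hi=12.0, tol=1e-9):
    """Solve cdf(t) = p by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Integrate each piece separately because the density jumps at t = 4.
total_area = integrate(pdf, 0, 4) + integrate(pdf, 4, 12)
prob_3_to_5 = cdf(5) - cdf(3)        # part b ii
upper_quartile = quantile(0.75)      # part c

print(round(total_area, 6), round(prob_3_to_5, 4), round(upper_quartile, 2))
```

Bisection is used for the quartile so that the same helper works for any probability without solving the quadratic by hand.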
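2 Inference using normal and t-distributions

Please note: Full worked solutions are provided as an aid to learning, and represent one approach to answering the
question. In some cases, alternative methods are shown for contrast.
All sample answers have been written by the authors. Cambridge Assessment International Education bears no
responsibility for the example answers to questions taken from its past question papers, which are contained in
this publication.
Non-exact numerical answers should be given correct to 3 significant figures, or 1 decimal place for angles in
degrees, unless a different level of accuracy is specified in the question.
Where values from the Cambridge International Education statistical tables are used, the same level of accuracy has
been used in workings unless stated otherwise.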
Prerequisite knowledge
1 $P(W < 53) = P(Z < 1.5) = 0.9332$
2 a $P(W \le 3.0) = P(Z \le -0.3529) = 1 - P(Z < 0.3529) = 0.3621$
Expected number $= 100 \times 0.3621 = 36$
b $P(3.2 \le W \le 3.5) = P(-0.1176 \le Z \le 0.2353) = 0.1398$
Expected number $= 100 \times 0.1398 = 14$
c $3.3 \pm 1.96 \times \frac{0.85}{\sqrt{100}}$ gives the interval $[3.13, 3.47]$
3 a $46.3 \pm 1.645 \times \frac{15.2}{\sqrt{50}}$ gives the interval $[42.8, 49.8]$
b You are 90% confident that, on average, applicants score between 42.8 and 49.8. Since the lower limit of the confidence interval is greater than 42, you are 90% confident that applicants could achieve a mean score of 42 or higher.
4 $H_0: \mu = 1.26$  $H_1: \mu \ne 1.26$
$\bar{x} = \frac{13.35}{10} = 1.335$
The test statistic $z = \frac{1.335 - 1.26}{0.24/\sqrt{10}} = 0.988$
Two-tailed test, with p = 0.975; the critical values are ±1.96.
As 0.988 < 1.96, the test statistic Z lies inside the acceptance region, so you should accept $H_0$. There is no evidence to suggest a change in mean growth of tomato plants.

Exercise 2.1A
1 $H_0: \mu = 9$  $H_1: \mu < 9$
$\bar{x} = \frac{57.6}{8} = 7.2$
$s^2 = \frac{1}{7}\left(423.7 - \frac{57.6^2}{8}\right) = 1.2829$
The test statistic $T = \frac{7.2 - 9}{\sqrt{1.2829/8}} = -4.495$
One-tailed test to the left, with p = 0.95 and $\nu = 7$; the critical value of t = −1.895.
As −4.495 < −1.895, the test statistic T does not lie in the acceptance region, so you should reject $H_0$. There is significant evidence to suggest that the mean value of the random variable has decreased from 9.
2 a $H_0: \mu = 10$ cm. The average length of leaves is 10 cm.
b $H_1: \mu > 10$ cm. The average length of leaves is greater than 10 cm.
c Gemma should use the test statistic T, because the population standard deviation is unknown (and the sample size is small).
d Assume that the length of leaves is normally distributed.
$\bar{x} = \frac{96}{10} = 9.6$  $s^2 = \frac{1}{9}\left(1080 - \frac{96^2}{10}\right) = 17.6$
The test statistic $T = \frac{9.6 - 10}{\sqrt{17.6/10}} = -0.302$
A one-tailed test to the right, with p = 0.95 and $\nu = 9$. The critical value of t is 1.833.
The test statistic, −0.302 < 1.833, lies within the acceptance region, so you should accept $H_0$. There is no evidence to suggest that the average length of leaves in Gemma’s garden is greater than 10 cm.
3 a $H_0: \mu = 165$ cm. The average height of students is 165 cm.
$H_1: \mu > 165$ cm. The average height of students is greater than 165 cm.
b One-tailed test to the right.
c Assume the height of students is normally distributed.
$\bar{x} = \frac{972}{6} = 162$  $s^2 = \frac{1}{5}\left(157754.6 - \frac{972^2}{6}\right) = 58.12$
The test statistic $T = \frac{162 - 165}{\sqrt{58.12/6}} = -0.964$
One-tailed test to the right, with p = 0.99 and $\nu = 5$. The critical value of t is 3.365.
The test statistic, −0.964 < 3.365, lies within the acceptance region, so you should accept $H_0$. There is no evidence to suggest that the average height of the students is greater than 165 cm.
4 Assume the population is normally distributed.
$H_0: \mu = 10.5$  $H_1: \mu > 10.5$
$\bar{x} = \frac{127.3}{10} = 12.73$  $s^2 = \frac{1}{9}\left(2219.6 - \frac{127.3^2}{10}\right) = 66.563$
The test statistic $T = \frac{12.73 - 10.5}{\sqrt{66.563/10}} = 0.864$
One-tailed test to the right, with p = 0.90 and $\nu = 9$; the critical value of t = 1.383.
As 0.864 < 1.383, the test statistic T lies inside the acceptance region, so you should accept $H_0$. There is no evidence to suggest that the new technology has increased the television lifetime.
5 a $\bar{x} = \frac{207.7}{10} = 20.77$  $s^2 = \frac{1}{9}\left(4361.33 - \frac{207.7^2}{10}\right) = 5.267$
b $H_0: \mu = 20$
$H_1: \mu > 20$
The test statistic $T = \frac{20.77 - 20}{\sqrt{5.267/10}} = 1.061$
One-tailed test to the right, with p = 0.975 and $\nu = 9$. The critical value of t is 2.262.
The test statistic, 1.061 < 2.262, lies within the acceptance region, so you should accept $H_0$. There is no evidence to suggest that the mean is greater than 20.
6 $H_0: \mu = 138$ ml  $H_1: \mu \ne 138$ ml
$\sum x = 682 \rightarrow \bar{x} = 136.4$
$\sum x^2 = 93\,246 \rightarrow s^2 = 55.3$
The test statistic $T = \frac{136.4 - 138}{\sqrt{55.3/5}} = -0.481$
Two-tailed test at the 5% significance level, with p = 0.975 and $\nu = 4$; the critical values of t = ±2.776.
As −2.776 < −0.481 < 2.776, the test statistic T lies inside the acceptance region, so you should accept $H_0$. There is no evidence to suggest that the mean volume is not as expected.
7 $H_0: \mu = 20$ g  $H_1: \mu < 20$ g
$\sum x = 115 \rightarrow \bar{x} = 19.17$
$\sum x^2 = 2231.26 \rightarrow s^2 = 5.419$
The test statistic $T = \frac{19.17 - 20}{\sqrt{5.419/6}} = -0.8734$
One-tailed test to the left, with p = 0.05 and $\nu = 5$; the critical value of t = −2.015.
As −0.8734 > −2.015, the test statistic T lies inside the acceptance region, so you should accept $H_0$. There is no evidence to suggest that the mean weight of a pack of Brand A’s raisins is less than 20 g.
8 $H_0: \mu = 12$ minutes  $H_1: \mu < 12$ minutes
$\sum x = 26 \rightarrow \bar{x} = 3.25$
$\sum x^2 = 312 \rightarrow s^2 = 32.5$
The test statistic $T = \frac{3.25 - 12}{\sqrt{32.5/8}} = -4.341$
One-tailed test at the 1% significance level, with p = 0.01 and $\nu = 7$; the critical value of t is −2.998.
As −4.341 < −2.998, the test statistic T does not lie in the acceptance region, so you should reject $H_0$. There is evidence to suggest that the new mean is less than 12 minutes. That means the new schedule introduced by the control room works better than the previous one during the peak time.
9 $H_0: \mu = 1$ kg  $H_1: \mu \ne 1$ kg
$\sum x = 10.54 \rightarrow \bar{x} = 1.054$
$\sum x^2 = 11.2202 \rightarrow s^2 = 0.012\,34$
The test statistic $T = \frac{1.054 - 1}{\sqrt{0.012\,34/10}} = 1.537$
Two-tailed test at the 10% significance level, with p = 0.95 and $\nu = 9$; the critical values of t = ±1.833.
As 1.537 < 1.833, the test statistic T lies within the acceptance region, so you accept $H_0$. There is no evidence to suggest that the mean weight of bags of vegetables has changed from 1 kg.
10 $H_0: \mu = 50$ grams  $H_1: \mu \ne 50$ grams
$\bar{x} = \frac{597}{12} = 49.75$  $s^2 = \frac{1}{11}\left(29\,767 - \frac{597^2}{12}\right) = 6.023$
The test statistic $T = \frac{49.75 - 50}{\sqrt{6.023/12}} = -0.353$
A two-tailed test, with p = 0.975 or p = 0.025; $\nu = 11$. The critical values of t are ±2.201.
The test statistic −0.353 is between −2.201 and 2.201, and lies within the acceptance region, so you should accept $H_0$. There is no evidence to suggest that the mean weight differs from 50 grams.

Exercise 2.2A
1 a Normal distribution test is suitable as the population variances are known.
b $H_0: \mu_1 - \mu_2 = 0$  $H_1: \mu_1 - \mu_2 \ne 0$
The test statistic Z is
$z = \frac{23.4 - 19.8}{\sqrt{\frac{12^2}{6} + \frac{17^2}{6}}} = 0.4238$
Two-tailed test with p = 0.975; the critical values of z = ±1.960.
As 0.4238 < 1.960, the test statistic Z lies within the acceptance region, so you accept $H_0$. There is no evidence to suggest that the two random variables have different means.
2 $H_0: \mu_M - \mu_F = 0$  $H_1: \mu_M - \mu_F > 0$
$\bar{x}_M = \frac{256}{80} = 3.2$  $s_M^2 = \frac{1}{79}\left(975 - \frac{256^2}{80}\right) = 1.972$
$\bar{x}_F = \frac{208}{80} = 2.6$  $s_F^2 = \frac{1}{79}\left(731 - \frac{208^2}{80}\right) = 2.408$
The test statistic Z is
$z = \frac{3.2 - 2.6}{\sqrt{\frac{1.972}{80} + \frac{2.408}{80}}} = 2.564$
One-tailed test to the right with p = 0.95; the critical value of z = 1.645.
As 2.564 > 1.645, the test statistic Z does not lie in the acceptance region, so you reject $H_0$. There is significant evidence to suggest that the mean number of sales of the new cereal to men is higher than to women on that day.
3 a $H_0: \mu_X - \mu_Y = 0$
$H_1: \mu_X - \mu_Y \ne 0$
The test statistic Z is $Z = \frac{305 - 309}{\sqrt{\frac{82.35}{55} + \frac{123.61}{65}}} = -2.170$
Two-tailed test with p = 0.975; the critical values of z = ±1.960.
As −2.170 < −1.960, the test statistic Z does not lie in the acceptance region, so you reject $H_0$. There is evidence to suggest that the mean lifespans of candles made at the two factories are different. As the sample mean of factory X is smaller than the sample mean from factory Y, it also suggests that the mean lifespan of candles produced by factory X is shorter.
b It is not necessary as both samples are larger than 30. The Central Limit Theorem can be applied and the unbiased estimators can be used as the population variances.
4 $H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \ne 0$
The test statistic $Z = \frac{4.06 - 3.91}{\sqrt{\frac{0.062}{10} + \frac{0.047}{10}}}$
$z = 1.437$
Two-tailed test with p = 0.95; the critical values of z = ±1.645.
As 1.437 < 1.645, the test statistic Z does lie in the acceptance region, so accept $H_0$. There is no significant evidence to suggest that the mean amounts of milk dispensed by the two machines are different.
5 Assume each accident that happened was independent.
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 > 0$
$\bar{x}_1 = \frac{2859}{35} = 81.69$  $s_1^2 = \frac{1}{34}\left(235\,425 - \frac{2859^2}{35}\right) = 55.46$
$\bar{x}_2 = \frac{3052}{35} = 87.20$  $s_2^2 = \frac{1}{34}\left(268\,450 - \frac{3052^2}{35}\right) = 68.11$
The test statistic $Z = \frac{81.69 - 87.20}{\sqrt{\frac{55.46}{35} + \frac{68.11}{35}}} = -2.93$
A one-tailed test to the right with p = 0.95. The critical value from the normal distribution table is 1.645.
The test statistic, −2.93 < 1.645, lies within the acceptance region, so accept $H_0$. There is no evidence to show that the warning signs reduce road accidents.
6 Let the time spent on the internet by families with children be $X_1$ and the time spent by families without children be $X_2$.
$H_0: \mu_1 - \mu_2 \ge 60$
$H_1: \mu_1 - \mu_2 < 60$
$\bar{x}_1 = 728$  $s_1^2 = 11\,970.59$
$\bar{x}_2 = 635$  $s_2^2 = 49\,929.66$
$Z = \frac{(728 - 635) - 60}{\sqrt{\frac{11\,970.59}{35} + \frac{49\,929.66}{30}}} = 0.7367$
One-tailed test to the left with p = 0.05; the critical value of z = −1.645.
0.7367 > −1.645, so the test statistic Z lies within the acceptance region and you accept $H_0$. The evidence supports the claim that families with children spend at least sixty more hours on the internet than those without children in a year.
7 The time taken by the new system is denoted by N. Let the time taken by the old system be O.
$H_0: \mu_O - \mu_N = 2$
$H_1: \mu_O - \mu_N < 2$
$\bar{n} = 4.3$  $s_N^2 = 2.56$
The test statistic $Z = \frac{(7 - 4.3) - 2}{\sqrt{\frac{2.5^2}{45} + \frac{2.56}{45}}}$
$z = 1.582$
One-tailed test to the left with p = 0.05; the critical value of z = −1.645.
As 1.582 > −1.645, the test statistic Z does lie in the acceptance region, so accept $H_0$. The hotel manager’s claim is supported; there is at least 2 minutes improvement from the new computer system.
8 Let $X_1$ be the size of flowers grown in high nitrogen compost, and $X_2$ be the size of flowers grown in normal compost.
$H_0: \mu_1 - \mu_2 = 1$
$H_1: \mu_1 - \mu_2 < 1$
$\bar{x}_1 = \frac{285.9}{35} = 8.169$  $s_1^2 = \frac{1}{34}\left(2357.75 - \frac{285.9^2}{35}\right) = 0.6575$
$\bar{x}_2 = \frac{244.9}{35} = 6.997$  $s_2^2 = \frac{1}{34}\left(1751.61 - \frac{244.9^2}{35}\right) = 1.118$
The test statistic $Z = \frac{(8.169 - 6.997) - 1}{\sqrt{\frac{0.6575}{35} + \frac{1.118}{35}}} = 0.764$
A one-tailed test to the left with p = 0.10. The critical value from the normal distribution table is −1.282.
The test statistic, 0.764 > −1.282, lies within the acceptance region, so accept $H_0$. The evidence supports the garden centre’s claim that flowers grown in the high nitrogen compost are at least 1 cm bigger.
9 $H_0: \mu_B - \mu_G = 2$
$H_1: \mu_B - \mu_G < 2$
$\bar{x}_B = \frac{2474}{30} = 82.47$  $s_B^2 = \frac{1}{29}\left(206\,044 - \frac{2474^2}{30}\right) = 69.71$
$\bar{x}_G = \frac{2380}{30} = 79.33$  $s_G^2 = \frac{1}{29}\left(191\,094 - \frac{2380^2}{30}\right) = 78.64$
The test statistic $Z = \frac{(82.47 - 79.33) - 2}{\sqrt{\frac{69.71}{30} + \frac{78.64}{30}}} = 0.510$
A one-tailed test to the left with p = 0.05. The critical value from the normal distribution table is −1.645.
The test statistic, 0.510 > −1.645, lies within the acceptance region, so accept $H_0$. The evidence supports the claim that boys scored at least 2 marks more than girls in the mock science exam.
10 a $H_0: \mu_X - \mu_Y = 0$
$H_1: \mu_X - \mu_Y \ne 0$
$\bar{y} = 19.99$
The test statistic $Z = \frac{20.32 - 19.99}{\sqrt{\frac{0.36^2}{35} + \frac{0.36^2}{40}}}$
$z = 3.960$
Two-tailed test with p = 0.95; the critical values of z = ±1.645.
3.960 > 1.645. The test statistic does not lie in the acceptance region, so reject $H_0$. There is significant evidence to suggest that the mean length has changed. After the machine has been serviced, the mean length of flat-pack components has decreased.
b $H_0: \mu_X - \mu_Y = 0$
$H_1: \mu_X - \mu_Y \ne 0$
$\bar{y} = 19.99$  $s_y^2 = 0.0784$
$z = \frac{20.32 - 19.99}{\sqrt{\frac{0.36^2}{35} + \frac{0.0784}{40}}} = 4.39$
Two-tailed test with p = 0.95; the critical values of z = ±1.645.
4.39 > 1.645. The test statistic Z lies outside the acceptance region, so reject $H_0$. There is significant evidence to suggest that the mean length has changed after the service.
The second test is more reliable as the sample variance is used.
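The two-sample Z calculations in Exercise 2.2A all follow the same pattern, which can be sketched in a few lines. This is an illustrative sketch only: the function name `two_sample_z` is an assumption, and the figures are the summary statistics quoted in the solution to question 2.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2, hypothesised_diff=0.0):
    """Z statistic for H0: mu1 - mu2 = hypothesised_diff (large samples or known variances)."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - hypothesised_diff) / standard_error


# Question 2: sales to men and to women, 80 observations in each sample.
z = two_sample_z(3.2, 1.972, 80, 2.6, 2.408, 80)

# One-tailed test to the right at the 5% level, so p = 0.95.
critical = NormalDist().inv_cdf(0.95)
reject_h0 = z > critical

print(round(z, 3), round(critical, 3), reject_h0)
```

Passing `hypothesised_diff=2` (or 1, or 60) reproduces the questions in which the null hypothesis specifies a non-zero difference.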
Exercise 2.2B
1 a $s_p^2 = \frac{(8 - 1) \times 3.7 + (15 - 1) \times 2.9}{8 + 15 - 2} = 3.167$
b $H_0: \mu_X - \mu_Y = 0$
$H_1: \mu_X - \mu_Y \ne 0$
The test statistic $T = \frac{26.1 - 24.8}{\sqrt{3.167\left(\frac{1}{8} + \frac{1}{15}\right)}} = 1.669$
Two-tailed test with p = 0.975, $\nu = 8 + 15 - 2 = 21$; the critical values of t = ±2.080.
As −2.080 < 1.669 < 2.080, the test statistic T lies within the acceptance region, so accept $H_0$. There is not enough evidence to suggest that the two random variables X and Y have different means.
2 a $\sum x = 62.7$  $\sum x^2 = 445.61$  $\sum y = 59$  $\sum y^2 = 393.8$
$\bar{x} = 6.967$  $s_x^2 = 1.1$
$\bar{y} = 6.556$  $s_y^2 = 0.8778$
$s_p^2 = \frac{(9 - 1)1.1 + (9 - 1)0.8778}{9 + 9 - 2} = 0.989$
b Assume that both samples are independent and randomly selected. They have the same population variance.
$H_0: \mu_X - \mu_Y = 0$
$H_1: \mu_X - \mu_Y > 0$
The test statistic $T = \frac{6.967 - 6.556}{\sqrt{0.989\left(\frac{1}{9} + \frac{1}{9}\right)}} = 0.877$
One-tailed test with p = 0.95, $\nu = 9 + 9 - 2 = 16$; the critical value of t = 1.746.
As 0.877 < 1.746, the test statistic T lies within the acceptance region, so you accept $H_0$. There is no evidence to suggest that the sunflowers planted in soil X are larger than those planted in soil Y.
3 $H_0: \mu_A - \mu_B = 0$
$H_1: \mu_A - \mu_B > 0$
$\bar{x}_A = 4.2$  $s_A^2 = 0.49$
$\bar{x}_B = 3.8$  $s_B^2 = 1.257$
$s_p^2 = \frac{(8 - 1)0.49 + (8 - 1)1.257}{8 + 8 - 2} = 0.8735$
$T = \frac{4.2 - 3.8}{\sqrt{0.8735\left(\frac{1}{8} + \frac{1}{8}\right)}} = 0.8560$
One-tailed test with p = 0.95, $\nu = 8 + 8 - 2 = 14$; the critical value of t = 1.761.
As 0.8560 < 1.761, the test statistic T does lie in the acceptance region, so you accept $H_0$. There is no evidence to suggest that trains from manufacturer A have greater fuel efficiency on average than those from manufacturer B.
4 Let X denote the height of young plants receiving the new fertiliser. Y denotes the height of young plants receiving the usual fertiliser.
a $\sum x = 45.3$  $\sum x^2 = 345.11$  $\sum y = 45.2$  $\sum y^2 = 340.9$
$\bar{x} = 7.55$  $s_x^2 = 0.619$
$\bar{y} = 7.533$  $s_y^2 = 0.078\,67$
$s_p^2 = \frac{(6 - 1)0.619 + (6 - 1)0.078\,67}{6 + 6 - 2} = 0.3488$
b $H_0: \mu_X - \mu_Y = 0$
$H_1: \mu_X - \mu_Y > 0$
$T = \frac{7.55 - 7.533}{\sqrt{0.3488\left(\frac{1}{6} + \frac{1}{6}\right)}} = 0.0499$
One-tailed test to the right, with p = 0.75, $\nu = 10$; the critical value of t = 0.700.
As 0.0499 < 0.700, the test statistic T does lie in the acceptance region, so accept $H_0$. There is no evidence to suggest that the new fertiliser gives an increase in growth.
5 The five records are independent between Adam and Bob. The population variances are the same.
$H_0: \mu_{x_A} - \mu_{x_B} = 0$
$H_1: \mu_{x_A} - \mu_{x_B} \ne 0$
$\bar{x}_A = 49.3$  $s_{x_A}^2 = 0.0375$
$\bar{x}_B = 48.7$  $s_{x_B}^2 = 0.0625$
$s_p^2 = \frac{(5 - 1)0.0375 + (5 - 1)0.0625}{5 + 5 - 2} = 0.05$
The test statistic $T = \frac{49.3 - 48.7}{\sqrt{0.05\left(\frac{1}{5} + \frac{1}{5}\right)}} = 4.243$
Two-tailed test with p = 0.975, $\nu = 8$; the critical values of t = ±2.306.
As 4.243 > 2.306, the test statistic T does not lie in the acceptance region, so you reject $H_0$. There is evidence to suggest that the average swimming times of Adam and Bob are different.
6 $H_0: \mu_d = 0$, there is no difference between the population mean leaf widths.
$H_1: \mu_d \ne 0$, there is a difference between the population mean leaf widths.
$\bar{X}_d = \frac{42}{10} = 4.2$
$s_d^2 = \frac{1}{10 - 1}\left(381.96 - \frac{42^2}{10}\right) = 22.84$
$T = \frac{4.2}{\sqrt{22.84/10}} = 2.779$
Two-tailed test with p = 0.975, $\nu = 9$; the critical values of t = ±2.262.
As 2.779 > 2.262, the test statistic does not lie in the acceptance region, so reject $H_0$. There is evidence to suggest that nitrogen affects leaf growth, as the two samples indicate there is a difference in the population mean leaf widths.
7 $H_0: \mu_d \ge 1$
$H_1: \mu_d < 1$
pH difference:
0.8  1.2  0.6  1.2  0.7  1.8
$\bar{x}_d = 1.05$  $s_d^2 = 0.199$
The test statistic T is
$T = \frac{1.05 - 1}{\sqrt{0.199/6}} = 0.2745$
One-tailed test to the left with p = 0.05, $\nu = 5$; the critical value of t = −2.015.
0.2745 > −2.015, so the test statistic T lies within the acceptance region and you accept $H_0$. The evidence suggests that the chemical reduces pH value by at least 1.
8 The sample is randomly selected; the differences are normally distributed.
$H_0: \mu_d = 0$
$H_1: \mu_d \ne 0$
$\bar{x}_d = -0.4571$  $s_d^2 = 0.082\,19$
The test statistic T is
$T = \frac{-0.4571}{\sqrt{0.082\,19/7}} = -4.218$
Two-tailed test with p = 0.975, $\nu = 6$; the critical values of t = ±2.447.
−4.218 < −2.447, so the test statistic does not lie in the acceptance region and you reject $H_0$. There is strong evidence to suggest that the wear on the front and rear tyres is different.
9 $H_0: \mu_d = 0$
$H_1: \mu_d > 0$
The test statistic T is $T = \frac{3.5}{\sqrt{44.1/10}} = 1.67$
One-tailed test to the right with p = 0.9, $\nu = 9$; the critical value of t = 1.383.
As 1.67 > 1.383, the test statistic does not lie in the acceptance region, so you reject $H_0$. There is evidence to suggest that vitamins increase attention span.
10 The differences are normally distributed.
$H_0: \mu_d = 0$
$H_1: \mu_d \ne 0$
Difference   7   −6   −6   −10   3
$\bar{x}_d = -2.4$  $s_d^2 = 50.3$
The test statistic T is $T = \frac{-2.4}{\sqrt{50.3/5}} = -0.7567$
Two-tailed test with p = 0.975, $\nu = 4$; the critical values are t = ±2.776.
As −2.776 < −0.7567 < 2.776, the test statistic lies within the acceptance region, so accept $H_0$. There is not sufficient evidence to suggest that the percentages of bacteria from the two lakes are different.

Exercise 2.3A
1 a A t-distribution with 15 degrees of freedom, critical value = 2.131
b A t-distribution with 9 degrees of freedom, critical value = 1.833
c A t-distribution with 22 degrees of freedom, critical value = 2.819
d A normal distribution, critical value = 2.326
2 a Assuming sample is normally distributed.
$\bar{x}_t = 5.3$  $s_t^2 = 2.161$
A t-distribution with p = 0.975, $\nu = 14$, critical value = 2.145
$5.3 \pm 2.145\sqrt{\frac{2.161}{15}}$
4.49 minutes ≤ mean time ≤ 6.11 minutes
b You are 95% confident that the mean time of completing the puzzle is between 4.49 and 6.11 minutes. That means the students can complete the puzzle more quickly than the time that the manufacturer suggested.
3 a $(176 - 166) \pm 1.645\sqrt{\frac{3.24}{40} + \frac{1.87}{32}}$
9.39 cm ≤ height difference ≤ 10.6 cm
b $(80.3 - 67.6) \pm 1.96\sqrt{\frac{2.5}{40} + \frac{2.9}{32}}$
11.9 kg ≤ mass difference ≤ 13.5 kg
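The intervals in question 3 parts a and b use the same formula with different critical values. As a hedged sketch (the function name is illustrative, and the values 3.24, 1.87, 2.5 and 2.9 are taken to be variances because they appear directly under the square root in the solution), the calculation is:

```python
from math import sqrt
from statistics import NormalDist


def z_interval_for_difference(mean1, var1, n1, mean2, var2, n2, confidence):
    """Confidence interval for mu1 - mu2 when a normal distribution applies."""
    # A two-sided interval puts (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Part a: heights in cm, 90% interval (critical value 1.645).
height_low, height_high = z_interval_for_difference(176, 3.24, 40, 166, 1.87, 32, 0.90)

# Part b: masses in kg, 95% interval (critical value 1.96).
mass_low, mass_high = z_interval_for_difference(80.3, 2.5, 40, 67.6, 2.9, 32, 0.95)

print(round(height_low, 2), round(height_high, 2))
print(round(mass_low, 2), round(mass_high, 2))
```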
c $(28.8 - 27.6) \pm 2.326\sqrt{\frac{4.6}{40} + \frac{5.9}{32}}$
−0.0727 ≤ BMI difference ≤ 2.47
d $(63 - 68) \pm 2.576\sqrt{\frac{2.37}{40} + \frac{5.78}{32}}$
−6.26 bpm ≤ heart rate difference ≤ −3.73 bpm
4 Assume sugar content is normally distributed.
$\bar{x} = 36$  $s = 2.280$
A t-distribution with p = 0.99, $\nu = 5$, critical value = 3.365
$36 \pm 3.365 \times \frac{2.28}{\sqrt{6}}$
32.9 g ≤ mean sugar content ≤ 39.1 g
5 a $\bar{\pi} = 3.13$  $s = 0.029\,15$
A t-distribution with p = 0.975, $\nu = 4$, critical value = 2.776
$3.13 \pm 2.776 \times \frac{0.029\,15}{\sqrt{5}}$
3.094 ≤ calculated value of π ≤ 3.166
b The confidence interval calculated is valid because neither the distribution nor the parameters used are approximated.
6 a $\bar{x} = 501.7$  $s = 7.421$
A t-distribution with p = 0.975, $\nu = 5$, critical value = 2.571
$501.7 \pm 2.571 \times \frac{7.421}{\sqrt{6}}$
493.9 g ≤ mean weight ≤ 509.5 g
b There is a 5% chance that the confidence interval will not contain the population mean.
7 a $\bar{m} = 8.6$  $s_m^2 = 7.575$
$\bar{w} = 5.5$  $s_w^2 = 1.692$
Let $d = \mu_M - \mu_W$
$(8.6 - 5.5) \pm 1.96\sqrt{\frac{7.575}{600} + \frac{1.692}{800}}$
2.86 ≤ d ≤ 3.34
b You are 95% confident that men spend more money on breakfast at a café than women do as both lower and upper limits are above the value of zero.
8 a The pooled estimate $s_p^2 = \frac{11 \times 5.9^2 + 14 \times 4.1^2}{12 + 15 - 2} = 24.73$
b $(63 - 57) \pm 1.708 \times \sqrt{24.73 \times \left(\frac{1}{12} + \frac{1}{15}\right)}$
2.71 ≤ difference in mean scores ≤ 9.29
9 a Assume population variances of the times spent watching TV by boys and girls are equal.
b $\bar{x}_B = 7.657$  $s_B = 2.219$
$\bar{x}_G = 7.556$  $s_G = 1.014$
$X_d = \mu_B - \mu_G$
The pooled estimate $s_p^2 = \frac{6 \times 2.219^2 + 8 \times 1.014^2}{7 + 9 - 2} = 2.698$
$(7.657 - 7.556) \pm 2.145 \times \sqrt{2.698 \times \left(\frac{1}{7} + \frac{1}{9}\right)}$
$-1.67 \le x_d \le 1.88$
c You are 95% confident that, on average, the difference between the times spent by boys and girls watching TV each week lies in this interval. As the interval contains the value zero, there is no significant difference between boys and girls.
10 Assume population variances of the study times of first-year students and final-year students are the same.
$X_d$ = mean first-year study time − mean final-year study time
The pooled estimate $s_p^2 = \frac{4 \times 1.3 + 4 \times 0.98}{5 + 5 - 2} = 1.14$
$(3 - 2) \pm 1.86 \times \sqrt{1.14 \times \left(\frac{1}{5} + \frac{1}{5}\right)}$
$-0.256 \le x_d \le 2.26$
11 a Assume the pH values are normally distributed.
$\bar{x} = 6.99$  $s = 0.4408$
A t-distribution with p = 0.975, $\nu = 9$, critical value = 2.262
$6.99 \pm 2.262 \times \frac{0.4408}{\sqrt{10}}$
6.67 ≤ mean pH value ≤ 7.31
b 0.95 × 60 = 57
12 Let Year 7 level of progress be the random variable X, and Year 8 level of progress be the random variable Y.
a $\bar{x} = 2.28$  $s_x^2 = 0.907$
$\bar{y} = 2.9$  $s_y^2 = 0.185$
$X_d = \mu_X - \mu_Y$
The pooled estimate $s_p^2 = \frac{4 \times 0.907 + 4 \times 0.185}{5 + 5 - 2} = 0.546$
$(2.28 - 2.9) \pm 2.306 \times \sqrt{0.546 \times \left(\frac{1}{5} + \frac{1}{5}\right)}$
$-1.70 \le x_d \le 0.458$
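The pooled-variance interval in question 12 part a can be reproduced with a short sketch. The helper names are illustrative assumptions, and the critical value 2.306 is passed in as read from the t-tables (p = 0.975 with 8 degrees of freedom) rather than computed.

```python
from math import sqrt


def pooled_variance(var1, n1, var2, n2):
    """Pooled estimate of a common population variance from two unbiased estimates."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t_interval(mean1, var1, n1, mean2, var2, n2, t_critical):
    """Confidence interval for mu1 - mu2 assuming equal population variances."""
    sp2 = pooled_variance(var1, n1, var2, n2)
    margin = t_critical * sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Question 12 a: Year 7 (X) and Year 8 (Y), five students in each sample.
sp2 = pooled_variance(0.907, 5, 0.185, 5)
low, high = pooled_t_interval(2.28, 0.907, 5, 2.9, 0.185, 5, t_critical=2.306)

print(round(sp2, 3), round(low, 2), round(high, 3))
```

Because the interval contains zero, it leads to the same conclusion as a two-tailed test at the 5% level.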
b You are 95% confident that the difference in mean levels of progress between Year 7 and Year 8 is between −1.70 and 0.458. As the interval contains the value 0, there is no significant difference.
c A normal distribution as populations are used.
$X_d = \mu_7 - \mu_8$
$\bar{x}_7 = 2.6$  $s_7^2 = 0.850$
$\bar{x}_8 = 3.1$  $s_8^2 = 0.128$
$(2.6 - 3.1) \pm 1.96\sqrt{\frac{0.850}{120} + \frac{0.128}{87}}$
$-0.681 \le x_d \le -0.319$
d The 95% confidence interval found from the population suggests that Year 7 students are making less progress than Year 8, as the upper and lower limits of the interval are both negative values.
The interval calculated in part a suggests that there is no significant difference between Year 7 and Year 8 levels of progress. However, since the upper limit of that interval is only just above the value 0, it is consistent with the conclusion from part c.

Exam-style questions
1 $H_0: \mu = 4.27$  $H_1: \mu < 4.27$
$\bar{x}_m = 4.023$  $s_m^2 = 0.025\,60$
The test statistic $T = \frac{4.023 - 4.27}{\sqrt{0.0256/12}} = -5.347$
One-tailed test to the left, with p = 0.01 and $\nu = 11$; the critical value of t = −2.718.
−5.347 < −2.718. The test statistic T lies outside the acceptance region, so reject $H_0$. There is significant evidence to suggest that the mean is less than 4.27.
2 Use a two-sample t-test, assuming both white and brown bread weights are normally distributed with the same variance.
Let X be the weight of white bread and Y the weight of brown bread sold by Bakery A.
$H_0: \mu_X - \mu_Y = 0$
$H_1: \mu_X - \mu_Y \ne 0$
$\bar{x} = 503.5$  $s_x^2 = 17.1$  $\bar{y} = 498$  $s_y^2 = 30.83$
$s_p^2 = \frac{9 \times 17.1 + 6 \times 30.8}{10 + 7 - 2} = 22.58$
$T = \frac{503.5 - 498}{\sqrt{22.58\left(\frac{1}{10} + \frac{1}{7}\right)}} = 2.349$
Two-tailed test with p = 0.975, $\nu = 15$; the critical values of t = ±2.131.
2.349 > 2.131. The test statistic T does not lie in the acceptance region, so reject $H_0$. There is evidence to suggest that the population mean weights of Bakery A’s white bread and brown bread are different.
To test whether the mean weight of Bakery A’s white bread is higher than the mean weight, 505 g, of Bakery B’s bread:
$H_0: \mu_X = 505$
$H_1: \mu_X > 505$
$\bar{x} = 503.5$  $s_x^2 = 17.1$
Test statistic $T = \frac{503.5 - 505}{\sqrt{17.1/10}} = -1.147$
One-tailed test to the right with p = 0.95, $\nu = 9$
Critical value is t = 1.833
Since −1.147 < 1.833, the test statistic T lies in the acceptance region, so accept $H_0$; there is no evidence to suggest that Bakery A’s claim is justified.
3 a Assuming the mobile phone signal strength is normally distributed in city X and city Y.
$\bar{x} = -113.34$  $s_x^2 = 0.18$
$\bar{y} = -112.61$  $s_y^2 = 0.052$
$(-113.34 - (-112.61)) \pm 1.645 \times \sqrt{\frac{0.18}{60} + \frac{0.052}{50}}$
$-0.835 \le \mu_x - \mu_y \le -0.625$
b $H_0: \mu_X - \mu_Y = 0$
$H_1: \mu_X - \mu_Y \ne 0$
$z = \frac{-113.34 - (-112.61)}{\sqrt{\frac{0.18}{60} + \frac{0.052}{50}}} = -11.49$
Two-tailed test with p = 0.99; the critical values of z = ±2.326.
As −11.49 < −2.326, the test statistic Z does not lie in the acceptance region, so reject $H_0$. There is significant evidence to suggest that the mean signal strengths in the two cities are different.
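The pooled two-sample t-test used in exam-style question 2 can be sketched as follows. The function name is an illustrative assumption; the critical value 2.131 is taken from the t-tables as in the solution, and the brown-bread variance is entered as 30.83, so the statistic differs slightly in the fourth significant figure from the value obtained with 30.8.

```python
from math import sqrt


def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2, hypothesised_diff=0.0):
    """Two-sample t statistic assuming a common population variance.

    Returns the statistic and its degrees of freedom.
    """
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesised_diff) / standard_error
    return t, n1 + n2 - 2


# Question 2: white bread (10 loaves) against brown bread (7 loaves).
t_stat, dof = pooled_t_statistic(503.5, 17.1, 10, 498, 30.83, 7)

T_CRITICAL = 2.131  # two-tailed 5% test, 15 degrees of freedom, from tables
reject_h0 = abs(t_stat) > T_CRITICAL

print(round(t_stat, 2), dof, reject_h0)
```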
4 A paired sample t-test
$H_0: \mu_d = 0$
$H_1: \mu_d \ne 0$
The differences, course 1 − course 2, are
Runner       1     2     3     4     5
Difference   −3    0.4   −1    −1.5  1.8
$\bar{X}_d = -0.66$  $s_d = 1.835$
The test statistic $T = \frac{-0.66}{1.835/\sqrt{5}} = -0.804$
Two-tailed test with p = 0.95, $\nu = 4$; the critical values of t = ±2.132.
−2.132 < −0.804 < 2.132. The test statistic T lies within the acceptance region, so accept $H_0$. There is not sufficient evidence to suggest that the mean running time over the two courses is different.
5 a Population variances are equal.
b $\bar{x}_m = 7.934$  $s_m = 0.2130$
$\bar{x}_w = 5.795$  $s_w = 0.439$
The pooled estimate
$s_p^2 = \frac{9 \times 0.2130^2 + 9 \times 0.439^2}{10 + 10 - 2} = 0.119$
$(7.934 - 5.795) \pm 2.101 \times \sqrt{0.119 \times \left(\frac{1}{10} + \frac{1}{10}\right)}$
$1.81 \le \mu_m - \mu_w \le 2.46$
c $H_0: \mu_m - \mu_w = 2.20$
$H_1: \mu_m - \mu_w \ne 2.20$
The test statistic $T = \frac{(7.934 - 5.795) - 2.2}{\sqrt{0.119\left(\frac{1}{10} + \frac{1}{10}\right)}} = -0.395$
Two-tailed test with p = 0.995, $\nu = 18$; the critical values of t = ±2.878.
−2.878 < −0.395 < 2.878. The test statistic lies within the acceptance region, so accept $H_0$. There is no evidence to suggest that the mean long-jump distance of men is not 2.20 m greater than that of women.
6 a $\sum x = 0 \times 8 + 2 \times 5 + 7 \times 24 + 9 \times 7 + 13 \times 6 = 319$
$\sum x^2 = 0^2 \times 8 + 2^2 \times 5 + 7^2 \times 24 + 9^2 \times 7 + 13^2 \times 6 = 2777$
$\bar{x} = 6.38$  $s_x = 3.891$
$\sum y = 0 \times 4 + 2 \times 7 + 7 \times 25 + 9 \times 0 + 13 \times 4 = 241$
$\sum y^2 = 0^2 \times 4 + 2^2 \times 7 + 7^2 \times 25 + 9^2 \times 0 + 13^2 \times 4 = 1929$
$\bar{y} = 6.025$  $s_y = 3.497$
$(6.38 - 6.025) \pm 1.96 \times \sqrt{\frac{3.891^2}{50} + \frac{3.497^2}{40}}$
$-1.17 \le \mu_x - \mu_y \le 1.88$
b No ladybird with nine spots was found in forest B. However, a few of them were found in forest A. Therefore, forest A is more likely to be located in the north-eastern United States.
7 a $\sum (x - \bar{x})^2 = 22 - \frac{8^2}{n}$
$\sum (y - \bar{y})^2 = 94 - \frac{28.8^2}{3n}$
$\frac{94}{375} = \frac{\left(22 - \frac{8^2}{n}\right) + \left(94 - \frac{28.8^2}{3n}\right)}{n + 3n - 2}$
$\frac{94(4n - 2)}{375} = 116 - \frac{1021.44}{3n}$
$1128n^2 - 131\,064n + 383\,040 = 0$
$n = 3$ or $n = 113.2$ (reject)
b $\bar{x} = 2.67$  $\bar{y} = 3.2$
$\sum (x - \bar{x})^2 = 0.67$  $\sum (y - \bar{y})^2 = 1.84$
t-distribution with p = 0.95, $\nu = 10$, critical value = 1.812
$(2.67 - 3.2) \pm 1.812 \times \sqrt{\frac{94}{375} \times \left(\frac{1}{3} + \frac{1}{9}\right)}$
$-1.13 \le \mu_x - \mu_y \le 0.0748$
8 a $H_0: \mu_r = 36.5$
$H_1: \mu_r \ne 36.5$
$\bar{x}_r = 34.8$  $s_r^2 = 15.46$
$T = \frac{34.8 - 36.5}{\sqrt{15.46/15}} = -1.675$
Two-tailed test with p = 0.975, $\nu = 14$; critical values = ±2.145.
−2.145 < −1.675 < 2.145. The test statistic T lies within the acceptance region, so accept $H_0$. There is no evidence to suggest that the mean is not 36.5.
b A t-distribution with p = 0.975, $\nu = 14$, critical value = 2.145
$34.8 \pm 2.145\sqrt{\frac{15.46}{15}}$
$32.6 \le \mu_r \le 37.0$
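Question 8 shows that a two-tailed t-test and the matching confidence interval lead to the same decision. A minimal sketch of both calculations follows; the function names are illustrative, and the critical value 2.145 is supplied from the t-tables (p = 0.975, 14 degrees of freedom) as in the solution.

```python
from math import sqrt


def one_sample_t(sample_mean, sample_var, n, mu0):
    """t statistic for H0: mu = mu0, using the unbiased variance estimate."""
    return (sample_mean - mu0) / sqrt(sample_var / n)


def t_interval(sample_mean, sample_var, n, t_critical):
    """Two-sided confidence interval for the population mean."""
    margin = t_critical * sqrt(sample_var / n)
    return sample_mean - margin, sample_mean + margin


T_CRITICAL = 2.145  # from tables

t_stat = one_sample_t(34.8, 15.46, 15, 36.5)
low, high = t_interval(34.8, 15.46, 15, T_CRITICAL)

accept_h0 = abs(t_stat) < T_CRITICAL
mu0_inside_interval = low < 36.5 < high

print(round(t_stat, 3), round(low, 1), round(high, 1), accept_h0, mu0_inside_interval)
```

The hypothesised mean 36.5 lies inside the 95% interval exactly when the test statistic lies inside the acceptance region, which is why the two flags agree.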
9 a A paired sample t-test
$H_0: \mu_d = 0$
$H_1: \mu_d > 0$
The differences before − after are:
Staff        A   B   C   D    E   F    G    H
Difference   4   1   4   −2   0   10   18   −7
$\bar{X}_d = 3.5$  $s_d = 7.672$
The test statistic $T = \frac{3.5}{7.672/\sqrt{8}} = 1.29$
One-tailed test with p = 0.90, $\nu = 7$, critical value t = 1.415.
1.29 < 1.415. The test statistic T lies within the acceptance region, so accept $H_0$. There is not sufficient evidence to suggest that the holiday kids’ club reduced the absence rate.
b The mean number of absent hours is reduced by 3.5 hours from eight staff. However, at the 10% significance level, the test statistic is not significant enough to suggest that the number of hours of absence is reduced. As it costs the agency to run a kids’ club, it is recommended not to do it in the coming year.
10 a $H_0: \mu = 15$
$H_1: \mu > 15$
$\bar{x} = 15.95$  $s = 1.629$
The test statistic $T = \frac{15.95 - 15}{1.629/\sqrt{6}} = 1.428$
One-tailed test to the right with p = 0.95, $\nu = 5$, critical value t = 2.015.
1.428 < 2.015. The test statistic lies in the acceptance region, so accept $H_0$. There is no evidence to suggest that the mean tail length of new-born mice is greater than 15 mm.
b $15.95 \pm 2.571 \times \frac{1.629}{\sqrt{6}}$
14.2 mm ≤ μ ≤ 17.7 mm
11 a $H_0: \mu_A - \mu_B = 0$
$H_1: \mu_A - \mu_B < 0$
$\bar{t}_A = 5.7$  $s_A^2 = 0.7838$
$\bar{t}_B = 5.9$  $s_B^2 = 0.2669$
The test statistic Z is
$z = \frac{5.7 - 5.9}{\sqrt{\frac{0.7838}{30} + \frac{0.2669}{40}}} = -1.104$
One-tailed test to the left with p = 0.05; the critical value of z = −1.645.
−1.104 > −1.645. The test statistic Z lies within the acceptance region, so accept $H_0$. There is no evidence to suggest that college A students took less time than college B students.
b Calculate value of $Z = \frac{(5.9 - 5.7) - 0.1}{\sqrt{\frac{0.7838}{30} + \frac{0.2669}{40}}} = 0.5522$.
Gives probability 0.7095.
1 − 0.7095 = 0.2905, therefore β > 29.05%
12 a $H_0: \mu = 28$
$H_1: \mu < 28$
$\bar{x} = 27.44$  $s = 1.502$
The test statistic $T = \frac{27.44 - 28}{1.502/\sqrt{6}} = -0.9133$
One-tailed test to the left with p = 0.05, $\nu = 5$, critical value t = −2.015.
−0.9133 > −2.015. The test statistic lies in the acceptance region, so accept $H_0$. There is no evidence to suggest that the mean completion time for the swimmers is less than 28 seconds.
b p = 0.975, $\nu = 9$; the critical value of t = 2.262
$m - 2.262\frac{s}{\sqrt{10}} = 26.07$
$m + 2.262\frac{s}{\sqrt{10}} = 28.17$
$m = 27.12$  $s = 1.468$, so $s^2 = 2.15$
13 Assume the numbers of people coming to the gym on different days are independent. Population variances before and after extended opening hours are the same.
Let $X_1$ be the number of people using the gym each day when the gym is open for 12 hours, and $X_2$ be the number of people when the gym is open for 18 hours.
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 < 0$
$\bar{x}_1 = \frac{407}{10} = 40.7$  $s_1^2 = \frac{1}{9}\left(18\,125 - \frac{407^2}{10}\right) = 173.34$
$\bar{x}_2 = \frac{511}{10} = 51.1$  $s_2^2 = \frac{1}{9}\left(28\,109 - \frac{511^2}{10}\right) = 221.88$
$s_p^2 = \frac{(9)(173.34) + (9)(221.88)}{18} = 197.61$
The test statistic $T = \frac{40.7 - 51.1}{\sqrt{197.61\left(\frac{1}{10} + \frac{1}{10}\right)}} = -1.65$
A one-tailed test to the left with p = 0.05, v = 18. The critical value is –1.734. The test
statistic, –1.65 > –1.734, lies within the acceptance region, so accept H0. There is not
sufficient evidence to suggest that more people use the gym when the gym is open for 18 hours.
14 a Let X1 be the time needed to solve the puzzle without any training and X2 be the time
needed with training.
H0: μ1 – μ2 = 0
H1: μ1 – μ2 > 0
$\bar{x}_1 = \frac{19}{5} = 3.8$, $s_1^2 = \frac{1}{4}\left(75.34 - \frac{19^2}{5}\right) = 0.785$
$\bar{x}_2 = \frac{17.4}{5} = 3.48$, $s_2^2 = \frac{1}{4}\left(61 - \frac{17.4^2}{5}\right) = 0.112$
$s_p^2 = \frac{(4)(0.785) + (4)(0.112)}{8} = 0.4485$
The test statistic $T = \frac{3.8 - 3.48}{\sqrt{0.4485\left(\frac{1}{5} + \frac{1}{5}\right)}} = 0.7555$
A one-tailed test to the right with p = 0.95, v = 8. The critical value is 1.860. The test
statistic, 0.7555 < 1.860, lies within the acceptance region, so accept H0. There is not
sufficient evidence to suggest that the training improves the time taken to solve the puzzle.
b Assume the times spent completing the puzzle are independent. Population variances with and
without training are the same.
c From part a, $\bar{x}_1 = \frac{19}{5} = 3.8$, $\bar{x}_2 = \frac{17.4}{5} = 3.48$
$s_p^2 = \frac{(4)(0.785) + (4)(0.112)}{8} = 0.4485$
The confidence interval for the difference in means:
$(3.8 - 3.48) \pm 1.860 \times \sqrt{0.4485\left(\frac{1}{5} + \frac{1}{5}\right)}$
–0.468 minutes ≤ μ1 – μ2 ≤ 1.11 minutes
15 Assume the sales made on different days are independent.
Let X be the random variable of new menu sales and Y be the random variable of old menu sales.
H0: μx – μy = 0
H1: μx – μy > 0
$\bar{x} = \frac{343.8}{30} = 11.46$, $s_x^2 = \frac{1}{29}\left(4069.04 - \frac{343.8^2}{30}\right) = 4.451$
$\bar{y} = \frac{311}{30} = 10.37$, $s_y^2 = \frac{1}{29}\left(3336.29 - \frac{311^2}{30}\right) = 3.871$
The test statistic $Z = \frac{11.46 - 10.37}{\sqrt{\frac{4.451}{30} + \frac{3.871}{30}}} = 2.070$
A one-tailed test to the right with p = 0.90. The critical value is 1.282. The test statistic,
2.070 > 1.282, lies outside the acceptance region, so reject H0. There is significant evidence
to show that the new menu increases sales.
16 a A suitable test would be a two-sample t-test. Assume that the height of each group of
children is independent. Population variances are the same.
b Let X1 be the height of the boys and X2 be the height of the girls.
H0: μ1 – μ2 = 2
H1: μ1 – μ2 < 2
$\bar{x}_1 = \frac{564}{5} = 112.8$, $s_1^2 = \frac{1}{4}\left(63\,640 - \frac{564^2}{5}\right) = 5.2$
$\bar{x}_2 = \frac{555}{5} = 111$, $s_2^2 = \frac{1}{4}\left(61\,635 - \frac{555^2}{5}\right) = 7.5$
$s_p^2 = \frac{(4)(5.2) + (4)(7.5)}{8} = 6.35$
The test statistic $T = \frac{(112.8 - 111) - 2}{\sqrt{6.35\left(\frac{1}{5} + \frac{1}{5}\right)}} = -0.1255$
A one-tailed test to the left with p = 0.05, v = 8. The critical value is –1.860.
The test statistic, –0.1255 > –1.860, lies within the acceptance region, so accept H0. There is
not sufficient evidence to reject the claim that the boys are taller than the girls by at
least 2 cm.
c For a 90% confidence interval, p = 0.95 and v = 8. The critical values are ± 1.860.
$(112.8 - 111) \pm 1.860 \times \sqrt{6.35\left(\frac{1}{5} + \frac{1}{5}\right)}$
–1.16 cm ≤ μ1 – μ2 ≤ 4.76 cm
17 a Assume the weight of each pack of potatoes is independent and normally distributed.
Population variances are the same.
Let X1 be the weight of King Edward potatoes, and X2 be the weight of salad potatoes.
H0: μ1 – μ2 = 0
H1: μ1 – μ2 ≠ 0
$\bar{x}_1 = \frac{8.4}{8} = 1.05$, $s_1^2 = \frac{1}{7}\left(8.8434 - \frac{8.4^2}{8}\right) = 0.003\,343$
$\bar{x}_2 = \frac{8.18}{8} = 1.0225$, $s_2^2 = \frac{1}{7}\left(8.3706 - \frac{8.18^2}{8}\right) = 0.000\,9357$
$s_p^2 = \frac{(7)(0.003\,343) + (7)(0.000\,9357)}{14} = 0.002139$
The test statistic $T = \frac{1.05 - 1.0225}{\sqrt{0.002139\left(\frac{1}{8} + \frac{1}{8}\right)}} = 1.189$
A two-tailed test with p = 0.05 and p = 0.95. v = 14. The critical values are ± 1.761.
The test statistic –1.761 < 1.189 < 1.761 lies within the acceptance region, so accept H0.
There is not sufficient evidence to suggest that the weights of the two types of potatoes
are different.
b For a 95% confidence interval, p = 0.975 and v = 14. The critical values are ± 2.145.
$(1.05 - 1.0225) \pm 2.145 \times \sqrt{0.002139\left(\frac{1}{8} + \frac{1}{8}\right)}$
–0.0221 kg ≤ μ1 – μ2 ≤ 0.0771 kg
18 a A paired sample t-test. Assume that the difference between the two types of pain relief
tablets is normally distributed.
b Let d = Tablet A time – Tablet B time
H0: μd = 0
H1: μd < 0
Differences: 4, 1.5, 0, –1.5, 1.5, –2.5
$\bar{x}_d = \frac{3}{6} = 0.5$, $s_d^2 = \frac{1}{5}\left(29 - \frac{3^2}{6}\right) = 5.5$
The test statistic $T = \frac{0.5 - 0}{\sqrt{5.5/6}} = 0.5222$
A one-tailed test to the left with p = 0.05, v = 5. The critical value is –2.015.
The test statistic, 0.5222 > –2.015, lies within the acceptance region, so accept H0. There is
not sufficient evidence to suggest that Tablet A is more efficient than Tablet B.
c Tablet A, $\bar{x}_A = \frac{52.5}{6} = 8.75$, $s_A = 2.44$
Tablet B, $\bar{x}_B = \frac{49.5}{6} = 8.25$, $s_B = 0.524$
The test statistic T in part b can be used to show that Tablet B is more efficient than
Tablet A at the 5% significance level. Also, Tablet B has a smaller sample mean and
standard deviation. Tablet B would be recommended.
19 a A two-sample t-test. Assume that the samples of length measurements before and after the
machine’s service are independent, that each is taken from a normally distributed
population, and that the two populations have the same variance.
b Let X be the length of a piece before and Y the length after the machine is serviced.
H0: μX − μY = 0
H1: μX − μY ≠ 0
∑x = 6.04, ∑x² = 6.0864
$\bar{x} = \frac{6.04}{6} = 1.006667$, $s_x^2 = \frac{1}{5}\left(6.0864 - \frac{6.04^2}{6}\right) = 0.0012267$
∑y = 6.01, ∑y² = 6.0223
$\bar{y} = \frac{6.01}{6} = 1.001667$, $s_y^2 = \frac{1}{5}\left(6.0223 - \frac{6.01^2}{6}\right) = 0.0004567$
$s_p^2 = \frac{5 \times 0.0012267 + 5 \times 0.0004567}{6 + 6 - 2} = 0.00084167$
Test statistic $T = \frac{1.006667 - 1.001667}{\sqrt{0.00084167\left(\frac{1}{6} + \frac{1}{6}\right)}} = 0.2985$
Two-tailed test with p = 0.025, 0.975 and v = 10
Critical values are t = ±2.228
Since −2.228 < 0.2985 < 2.228, the test statistic lies within the acceptance region, so accept
H0. There is no significant difference between the mean lengths before and after the
machine’s service.
20 a Let X1 be the 11-year-olds’ progress and X2 be the 12-year-olds’ progress.
$\bar{x}_1 = 0.84$, $s_1^2 = 0.148$
$\bar{x}_2 = 1.16$, $s_2^2 = 1.528$
$s_p^2 = \frac{(4)(0.148) + (4)(1.528)}{8} = 0.838$
The confidence interval:
$(0.84 - 1.16) \pm 2.306 \times \sqrt{0.838\left(\frac{1}{5} + \frac{1}{5}\right)}$
–1.655 ≤ μ1 – μ2 ≤ 1.015
b The confidence interval contains the value 0, so there is no significant difference between
the two groups of children.
Mathematics in life and work
1 A 2-sample t-test
Let H denote the goals scored in home matches
and A denote the goals scored in away matches
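H0: μH − μA = 0
H1: μH − μA > 0
$\bar{x}_H = 2.33$, $s_H^2 = 3.87$
$\bar{x}_A = 1.83$, $s_A^2 = 1.37$
$s_p^2 = \frac{5 \times 3.87 + 5 \times 1.37}{6 + 6 - 2} = 2.62$
The test statistic $T = \frac{2.33 - 1.83}{\sqrt{2.62\left(\frac{1}{6} + \frac{1}{6}\right)}} = 0.535$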
One-tailed test to the right, p = 0.95, v = 10; the
critical value of t = 1.812.
0.535 < 1.812. The test statistic lies in the
acceptance region, so accept H0. There is not
enough evidence to show that Liverpool plays
better at a home match than an away match.
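The pooled two-sample calculation above can be reproduced with a short script. This is a minimal sketch using only the summary statistics quoted in the solution; the function name `pooled_t` is illustrative and not part of the book.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2, hypothesised_diff=0.0):
    """Return (pooled variance, t statistic, degrees of freedom)."""
    df = n1 + n2 - 2
    # Pooled variance: weight each sample variance by its degrees of freedom.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean1 - mean2 - hypothesised_diff) / standard_error
    return sp2, t, df

# Home and away summary statistics from the solution (6 matches each).
sp2, t, df = pooled_t(2.33, 3.87, 6, 1.83, 1.37, 6)
print(round(sp2, 2), round(t, 3), df)
```

The statistic is then compared with the tabulated critical value (1.812 for v = 10 at the 5% level), exactly as in the written solution.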
2 For home matches, ∑x = 14 + 19 = 33
$\bar{x}_H = 2.06$
Pooled variance estimate of home matches
$= \frac{5 \times 1.97 + \left(49 - \frac{19^2}{10}\right)}{6 + 10 - 2} = 1.625$
For away matches, ∑x = 11 + 11 = 22
$\bar{x}_A = 1.375$
Pooled variance estimate of away matches
$= \frac{5 \times 1.17 + \left(21 - \frac{11^2}{10}\right)}{6 + 10 - 2} = 1.054$
Pooled estimate of home and away matches
$s_p^2 = \frac{15 \times 1.625 + 15 \times 1.054}{16 + 16 - 2} = 1.3395$
95% confidence interval: p = 0.975, v = 30, t = 2.042
$(2.06 - 1.375) \pm 2.042 \times \sqrt{1.3395 \times \left(\frac{1}{16} + \frac{1}{16}\right)}$
$-0.15 \le \mu_H - \mu_A \le 1.52$

From part b, the confidence interval contains 0,
so you are 95% confident that there is no
difference between how Liverpool play at home
or away; both parts suggest that there is not
enough statistical evidence to support the claim
that Liverpool plays better at home.
3 χ2-tests
1 a The total frequency is 25 + 31 + ⋯ + 1 = 100
$\bar{x} = \frac{1}{100}(1 \times 25 + 2 \times 31 + \cdots + 8 \times 1) = 2.5$
$s^2 = \frac{1}{99}\left\{(1^2 \times 25 + 2^2 \times 31 + \cdots + 8^2 \times 1) - 100 \times 2.5^2\right\} = \frac{189}{99} = 1.91$
b The total frequency is 13 + 25 + 32 + 5 = 75
$\bar{x} = \frac{1}{75}(10 \times 13 + 30 \times 25 + 50 \times 32 + 70 \times 5) = \frac{566}{15} = 37.7$ (3 s.f.)
$s^2 = \frac{1}{74}\left\{(10^2 \times 13 + 30^2 \times 25 + 50^2 \times 32 + 70^2 \times 5) - 75 \times \left(\frac{566}{15}\right)^2\right\} = 291$
2 a i $P(X = 2) = \binom{10}{2}\, 0.3^2 (1 - 0.3)^{10 - 2} = 0.233$
ii E(X) = 10 × 0.3 = 3
b i $P(150 < Y \le 200) = \Phi\left(\frac{200 - 260}{50}\right) - \Phi\left(\frac{150 - 260}{50}\right)$
= Φ(–1.2) – Φ(–2.2) = (1 – Φ(1.2)) – (1 – Φ(2.2))
= (1 – 0.8849) – (1 – 0.9861) = 0.101
ii Var(Y) = 50² = 2500
3 Performing a paired t-test, assume the differences between sample A and sample B are normally
distributed. Take differences to be sample A minus sample B.
Differences: 2, –7, 6, 5, 3, 7, 4, 0
$\bar{d} = \frac{1}{8}(2 - 7 + \cdots + 0) = 2.5$
$s_d^2 = \frac{1}{7}\left((2^2 + (-7)^2 + \cdots + 0^2) - 8 \times 2.5^2\right) = \frac{138}{7} = 19.71$
H0: μd = 0
H1: μd ≠ 0
5% significance level, two-tailed test. The test statistic $T = \frac{2.5 - 0}{\sqrt{19.71/8}} = 1.593$. Critical value: t7 (2.5%) = 2.365.
As 1.593 < 2.365 there is no reason to doubt H0. There is insufficient evidence at the 5% significance level to
suggest the means of sample A and sample B are different.
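As a check on the arithmetic, the paired t statistic can be computed directly from the eight differences listed above. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import math
import statistics

# Differences (sample A minus sample B) as listed in the solution.
diffs = [2, -7, 6, 5, 3, 7, 4, 0]
n = len(diffs)

d_bar = statistics.mean(diffs)        # sample mean of the differences
s2_d = statistics.variance(diffs)     # unbiased variance (divisor n - 1)

# Test statistic for H0: mu_d = 0.
t_stat = (d_bar - 0) / math.sqrt(s2_d / n)
print(d_bar, round(s2_d, 2), round(t_stat, 3))
```

The result is compared with the tabulated critical value 2.365 for 7 degrees of freedom, as in the solution.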
Exercise 3.2A
1 H0: Distribution of flights is as Yusuf claims.

H1: Distribution of flights is not as Yusuf claims.
5% significance level, degrees of freedom: 4 − 1 = 3, critical value 7.815
On time Under 30 mins Over 30 mins Cancelled
Probability 0.5 0.2 0.2 0.1
Observed 35 10 3 2
Expected 25 10 10 5
(Ok − Ek)²/Ek 4 0 4.9 1.8
X 2 = 10.7 > 7.815. Therefore reject H0; the distribution of flight departure times is not as Yusuf claims.
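The goodness-of-fit statistic used throughout this chapter is the sum of (O − E)²/E over the cells. A minimal sketch for the flight data above follows; the helper name `chi_squared_statistic` is an illustrative choice, and the critical value 7.815 is taken from the solution rather than computed.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [35, 10, 3, 2]              # on time, under 30 mins, over 30 mins, cancelled
probabilities = [0.5, 0.2, 0.2, 0.1]   # the claimed distribution
n = sum(observed)

expected = [n * p for p in probabilities]
x2 = chi_squared_statistic(observed, expected)

critical_value = 7.815                 # 5% level, 3 degrees of freedom (from tables)
reject_h0 = x2 > critical_value
print(expected, round(x2, 3), reject_h0)
```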
2 H0: Number rolled on the dice can be modelled by a uniform distribution.

H1: Number rolled on the dice cannot be modelled by a uniform distribution.
               1      2    3    4    5    6
Probability    1/6    1/6  1/6  1/6  1/6  1/6
Observed       29     44   38   34   48   47
Expected       40     40   40   40   40   40
(Ok − Ek)²/Ek  3.025  0.4  0.1  0.9  1.6  1.225
X2 = 7.25 < 9.236. Therefore, no reason to doubt H0, the dice are fair.
3 a H0: The distribution of ‘shiny’ stickers in packs can be modelled by B(8, 0.2).
H1: The distribution of ‘shiny’ stickers in packs cannot be modelled by B(8, 0.2).
5% significance level.
             0       1       2       3       4         5          6          7             8
Probability  0.1678  0.3355  0.2936  0.1468  0.045 88  0.009 175  0.001 147  8.192 × 10⁻⁵  2.560 × 10⁻⁶
Observed     32      43      40      21      10        3          1          0             0
Expected     25.17   50.33   44.04   22.02   6.881     1.376      0.1720     0.012 29      0.000 384 0
Combining columns 4 to 8:
0 1 2 3 4 or more
Observed 32 43 40 21 14
Expected 25.17 50.33 44.04 22.02 8.442
(Ok − Ek)²/Ek 1.856 1.068 0.3706 0.047 26 3.659
Degrees of freedom 5 − 1 = 4, critical value 9.488

X 2 = 7.001 < 9.488. Therefore no reason to doubt H0, the distribution of shiny stickers in packs is B(8, 0.2).
b For the test statistic to be approximately described by the χ2-distribution, expected values must be
greater than 5.
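The pooling step can be automated. The sketch below computes the B(8, 0.2) expected frequencies for the 150 packs and merges cells from the upper tail until the last expected frequency is at least 5. The helper `pool_upper_tail` is a hypothetical name, and it only handles small expected values at the upper end, which is the situation in this question.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pool_upper_tail(observed, expected, minimum=5):
    """Merge the last cell into its neighbour until its expected value >= minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_e, last_o = exp.pop(), obs.pop()
        exp[-1] += last_e
        obs[-1] += last_o
    return obs, exp

observed = [32, 43, 40, 21, 10, 3, 1, 0, 0]
total = sum(observed)                                   # 150 packs
expected = [total * binomial_pmf(k, 8, 0.2) for k in range(9)]

obs_pooled, exp_pooled = pool_upper_tail(observed, expected)
x2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1                                # no parameters estimated
print(obs_pooled, [round(e, 3) for e in exp_pooled], round(x2, 3), df)
```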
4 a A geometric distribution would be suitable if the probability of success is fixed and if successes
occur independently.
b Expected frequencies, under Geo(0.4)
1 2 3 4 5 6 7 or more
Probability 0.4 0.24 0.144 0.0864 0.051 84 0.031 10 0.046 66
Expected 24 14.4 8.64 5.184 3.110 1.866 2.799
c H0: First sale of the day can be modelled by Geo(0.4).
H1: First sale of the day cannot be modelled by Geo(0.4).
2.5% significance level. Combine the classes 5, 6 and 7 or more.
1 2 3 4 5 or more
Observed 10 20 16 7 7
Expected 24 14.4 8.64 5.184 7.776
(Ok − Ek)²/Ek 8.167 2.178 6.270 0.6362 0.077 44
X 2 = 17.33 > 11.14. Therefore reject H0; the distribution for the first sale of the day cannot be modelled
by Geo(0.4).
5 a H0: Defective parts can be modelled by Po(2.5).
H1: Defective parts cannot be modelled by Po(2.5).
1% significance level
0 1 2 3 4 5 6 or more
Prob 0.082 08 0.2052 0.2565 0.2138 0.1336 0.066 80 0.042 02
Obs 28 49 50 44 16 8 5
Exp 16.42 41.04 51.30 42.75 26.72 13.36 8.404
(Ok − Ek)²/Ek 8.172 1.543 0.033 10 0.036 40 4.301 2.151 1.379
X² = 17.62 > 16.81. Therefore reject H0; defective parts cannot be modelled by Po(2.5).
b Defects should occur singly and randomly.
6 H0: Sarah’s cat’s food preference can be modelled by a uniform distribution.

H1: Sarah’s cat’s food preference cannot be modelled by a uniform distribution.
Turkey Fish Chicken Lamb Beef

Prob 0.2 0.2 0.2 0.2 0.2
Obs 4 7 3 6 10
Exp 6 6 6 6 6
(Ok − Ek)²/Ek 2/3 1/6 3/2 0 8/3
X 2 = 5 < 9.488. Therefore, no reason to doubt H0, the cat does not have a preference.
7 H0: The number of goals scored in a penalty shootout can be modelled by B(5, 0.7).
H1: The number of goals scored in a penalty shootout cannot be modelled by B(5, 0.7).
0 1 2 3 4 5
Prob 0.002 43 0.028 35 0.1323 0.3087 0.3602 0.1681
Exp 0.243 2.835 13.23 30.87 36.02 16.81
Combine 0, 1 and 2:
2 or fewer 3 4 5
Obs 22 40 27 11
Exp 16.31 30.87 36.02 16.81
(Ok − Ek)²/Ek 1.987 2.700 2.257 2.006
Degrees of freedom 4 − 1 = 3 , critical value 7.815

X 2 = 8.950 > 7.815. Therefore reject H0; goals scored in a penalty shootout cannot be modelled by B(5, 0.7).
8 a The sum of the probabilities is 1. $k\left(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}\right) = 1$, therefore $\frac{25}{12}k = 1$ so $k = \frac{12}{25}$.
b Expected frequencies for sample of size 50.
r 1 2 3 4
P(X = r) 0.48 0.24 0.16 0.12
Er 24 12 8 6
c H0: The data can be modelled by the random variable X.
H1: The data cannot be modelled by the random variable X.
10% significance level, Degrees of freedom 4 − 1 = 3, critical value 6.251
$X^2 = \frac{3}{2} + \frac{1}{12} + \frac{1}{2} + \frac{3}{2} = \frac{43}{12}$
X 2 = 3.583 < 6.251. Therefore, no reason to doubt H0, the random variable X is a good model for the data.
9 a $P(X < 150) = \Phi\left(\frac{150 - 260}{50}\right) = \Phi(-2.2) = 1 - 0.9861 = 0.0139$
$P(150 \le X < 200) = \Phi\left(\frac{200 - 260}{50}\right) - 0.0139 = \Phi(-1.2) - 0.0139 = 0.1012$
b 0.1012 using sum of probabilities.
c H0: the finishing times can be modelled by N(260, 50²)
H1: the finishing times cannot be modelled by N(260, 50²)
Time under 150 150–200 200–250 250–300 over 300

Prob 0.0139 0.1012 0.3057 0.3674 0.2119
Obs 20 83 373 476 298
Exp 17.38 126.5 382.1 459.3 264.82
(Ok − Ek)²/Ek 0.3952 14.93 0.2162 0.6105 4.158
5% significance level, degrees of freedom 5 − 1 = 4, critical value 9.488
X 2 = 20.31 > 9.488. Therefore reject H0; there is evidence that the finishing times cannot be modelled by
N(260, 50).
Exercise 3.2B
1 a H0: The number of buses arriving before Nury’s can be modelled by a Poisson distribution.
H1: The number of buses arriving before Nury’s cannot be modelled by a Poisson distribution.
b Sample mean: $\bar{x} = \frac{0 \times 4 + 1 \times 13 + 2 \times 10 + 3 \times 3}{30} = 1.4$, hence λ = 1.4
c Expected frequencies under Po(1.4)
0 1 2 3 4 or more
Probability 0.2466 0.3452 0.2417 0.1128 0.053 73
Expected 7.398 10.36 7.250 3.383 1.612
d Groups 2, 3 and 4 must be combined in order to get expected frequencies greater than 5.
e Three (combined) groups, two constraints, degrees of freedom 3 − 2 = 1.
f 5% significance level, critical value 3.841
0 1 2 or more
Obs 4 13 13
Exp 7.398 10.36 12.25
(Ok − Ek)²/Ek 1.561 0.6744 0.046 55
X 2 = 2.282 < 3.841. Therefore, no reason to doubt H0, the number of buses arriving before Nury’s bus can
be modelled by a Poisson distribution.
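When the Poisson mean is estimated from the sample, one extra degree of freedom is lost. The sketch below reruns this question end to end under that convention: it estimates λ from the data, builds the three combined cells, and computes X². The critical value is again read from tables rather than computed.

```python
import math

counts = [0, 1, 2, 3]
observed_raw = [4, 13, 10, 3]
n = sum(observed_raw)                                           # 30 days

# Estimate lambda by the sample mean.
lam = sum(c * o for c, o in zip(counts, observed_raw)) / n

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Cells after combining: 0, 1, and "2 or more".
p0, p1 = poisson_pmf(0, lam), poisson_pmf(1, lam)
probs = [p0, p1, 1 - p0 - p1]
observed = [4, 13, 10 + 3]
expected = [n * p for p in probs]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Constraints: the total frequency and the estimated mean.
df = len(observed) - 2
print(round(lam, 2), [round(e, 3) for e in expected], round(x2, 3), df)
```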
2 a Sample mean: $\bar{x} = \frac{1 \times 51 + 2 \times 46 + 3 \times 29 + \cdots + 8 \times 1}{150} = \frac{7}{3}$
b $p = 1 \div \frac{7}{3} = \frac{3}{7}$
c Probabilities
1 2 3 4 5 6 7 8 9 or more
0.4286 0.2449 0.1399 0.079 97 0.045 69 0.026 11 0.014 92 0.008 526 0.011 37
d Expected frequencies
1 2 3 4 5 6 7 8 9 or more
64.29 36.73 20.99 12.00 6.854 3.917 2.238 1.279 1.705
e Combined groups 6, 7, 8 and 9 or more. Degrees of freedom: 6 combined groups minus 2 constraints, so 4.
f H0: The number of darts taken to hit a double can be modelled by a geometric distribution.
H1: The number of darts taken to hit a double cannot be modelled by a geometric distribution.
5% significance level, critical value 9.488
X 2 = 2.746 + 2.337 + 3.056 + 0.082 54 + 0.1065 + 0.501 = 8.83
X 2 = 8.83 < 9.488. Therefore accept H0, the number of darts required to hit a double can be modelled by
a geometric distribution.
g The geometric distribution requires the probability of success to remain constant. As the dart player
throws more darts, they are probably more likely to hit a double through practice. Therefore, it is
unlikely this condition would be met.
3 a Sample mean: $\bar{x} = \frac{0 \times 49 + 1 \times 64 + 2 \times 34 + 3 \times 3}{150} = 0.94$, hence λ = 0.94
b H0: The visits by patients can be modelled by a Poisson distribution.
H1: The visits by patients cannot be modelled by a Poisson distribution.
c Expected frequencies
0 1 2 3 4 or more
Prob 0.3906 0.3672 0.1726 0.054 07 0.015 53
Exp 58.59 55.08 25.89 8.111 2.329
d Combine groups 3 and 4 or more. 1% significance level, degrees of freedom 4 − 2 = 2, critical value 9.210
X 2 = 1.571 + 1.445 + 2.543 + 5.302 = 10.86
X 2 = 10.86 > 9.210. Therefore reject H0; the visits by patients cannot be modelled by a Poisson
distribution. As a generic Poisson distribution is not a good fit, it is not surprising that in the example
the null hypothesis is rejected as well.
4 H0: The number of goals scored in a penalty shootout can be modelled by a binomial distribution.
H1: The number of goals scored in a penalty shootout cannot be modelled by a binomial distribution.
Sample mean: $\bar{x} = \frac{0 \times 3 + 1 \times 6 + \cdots + 5 \times 11}{100} = 3.15$, so an estimate for p is 3.15 ÷ 5 = 0.63
0 1 2 3 4 5
Prob 0.006 934 0.059 04 0.2010 0.3423 0.2914 0.099 24
Exp 0.6934 5.904 20.10 34.23 29.14 9.924
Combine 0 and 1:
1 or fewer 2 3 4 5
Obs 9 13 40 27 11
Exp 6.597 20.10 34.23 29.14 9.924
(Ok − Ek)²/Ek 0.8753 2.510 0.9721 0.1576 0.1166

X 2 = 4.632 < 7.815. Therefore, no reason to doubt H0, goals scored in a penalty shootout can be modelled by
a binomial distribution. This reverses the result from the previous question; a 70% chance of scoring was
too high, with this sample suggesting a 63% chance would be more appropriate.
5 a c + (c + d ) + (c + 2d ) + (c + 3d ) + (c + 4d ) = 1
5c + 10d = 1
c + 2d = 0.2
b E ( X ) = 0 × c + 1 × ( c + d ) + … + 4 × ( c + 4d ) = 10c + 30d
c $\bar{x} = \frac{0 \times 12 + 1 \times 20 + \cdots + 4 \times 3}{60} = 1.5$
d Solving c + 2d = 0.2 and 10c + 30d = 1.5 simultaneously gives c = 0.3, d = −0.05
e H0: The data can be modelled by the random variable X.
r 0 1 2 3 4
P(X = r) 0.3 0.25 0.2 0.15 0.1
Or 12 20 17 8 3
Er 18 15 12 9 6
(Or − Er)²/Er 2 5/3 25/12 1/9 3/2
X 2 = 7.361 < 7.815. Therefore, no reason to doubt H0, the random variable X is a good fit for the data.
6 a x = 38.68 (by symmetry), y = 160 − 2 × (0.9935 + 9.696 + 38.68) = 61.26
Alternative method:
$y = 160 \times \left[\Phi\left(\frac{50 - 45}{10}\right) - \Phi\left(\frac{40 - 45}{10}\right)\right] = 160 \times (2\Phi(0.5) - 1) = 61.28$. Actual value using spreadsheet, y = 61.27.
(You get slightly different answers, due to the fact that normal tables round probability values to 4 d.p.)
b H0: Plant growth can be modelled by N(45, 10²).
H1: Plant growth cannot be modelled by N(45, 10²).
5% significance level, combine ‘20 or less’ and ‘20–30’, and combine ‘more than 70’ and ‘60–70’; degrees
of freedom 5 − 1 = 4, critical value 9.488
X² = 0.009 040 + 2.421 + 0.083 96 + 0.002 701 + 11.97 = 14.49
X² = 14.49 > 9.488. Therefore reject H0; plant growth cannot be modelled by N(45, 10²).
c $\bar{x} = \frac{25 \times 11 + 35 \times 29 + 45 \times 59 + 55 \times 39 + 65 \times 22}{160} = 47$ mm
d z = 160 − (0.5547 + 6.576 + … + 1.716) = 13.77
Alternative method: $z = 160 \times \left[\Phi\left(\frac{70 - 47}{10}\right) - \Phi\left(\frac{60 - 47}{10}\right)\right] = 160 \times (\Phi(2.3) - \Phi(1.3)) = 13.78$
e H0: Plant growth can be modelled by N(μ, 10²).
H1: Plant growth cannot be modelled by N(μ, 10²).
5% significance level, combine ‘20 or less’ and ‘20–30’, and combine ‘more than 70’ and ‘60–70’; degrees
of freedom 5 − 2 = 3, critical value 7.815
X² = 2.102 + 0.2108 + 0.021 99 + 0.9677 + 2.738 = 6.0404
X² = 6.0404 < 7.815. Therefore, no reason to doubt H0, plant growth can be modelled by N(μ, 10²) with μ
estimated to be 47.
7 a At x = 3, F(3) = 3a + 9b = 1
b f(x) = F′(x) = a + bx² for 0 ≤ x ≤ 3, 0 otherwise
$E(X) = \int_0^3 x f(x)\,dx = \int_0^3 (ax + bx^3)\,dx = \left[\frac{ax^2}{2} + \frac{bx^4}{4}\right]_0^3 = \frac{9a}{2} + \frac{81b}{4}$
c Average time $= \frac{0.5 \times 3 + 1.5 \times 16 + 2.5 \times 21}{40} = 1.95$
d Solving $3a + 9b = 1$ and $\frac{9a}{2} + \frac{81b}{4} = \frac{39}{20}$ simultaneously gives $a = \frac{2}{15}$, $b = \frac{1}{15}$
e H0: The data can be modelled by the random variable X.
               0 ≤ t ≤ 1   1 < t ≤ 2   2 < t ≤ 3
Prob           7/45        13/45       5/9
Ot             3           16          21
Et             56/9        104/9       200/9
(Ot − Et)²/Et  1.669       1.709       0.067 22
X 2 = 3.445 < 6.635. Therefore, no reason to doubt H0, the random variable X is a good fit for the data.
8 a $\bar{x} = \frac{5 \times 0 + 15 \times 12 + 25 \times 48 + \cdots + 55 \times 7}{200} = 35$
$s = \sqrt{\frac{200}{199}\left[\frac{5^2 \times 0 + 15^2 \times 12 + 25^2 \times 48 + \cdots + 55^2 \times 7}{200} - 35^2\right]} = 9.563$ (4 s.f.)
b H0: Income distribution can be modelled by N( μ, σ 2)
H1: Income distribution cannot be modelled by N( μ, σ 2)
Less than 10 10–20 20–30 30–40 40–50 50–60 60 or more

Prob 0.004 472 0.053 91 0.2422 0.3989 0.2422 0.053 91 0.004 472
Exp 0.8943 10.78 48.43 79.78 48.43 10.78 0.8943
Combine ‘less than 10’ and ‘10–20’ and combine ‘60 or more’ and ‘50–60’:
Less than 20 20–30 30–40 40–50 50 or more

Obs 12 48 75 58 7
Exp 11.68 48.43 79.78 48.43 11.68
(Om − Em)²/Em 0.009 026 0.003 864 0.2869 1.890 1.872
X² = 4.062 < 5.991. Therefore no reason to doubt H0, income distribution can be modelled by N(μ, σ²).
c $P(M \le 22.5) = \Phi\left(\frac{22.5 - 35}{9.563}\right) = \Phi(-1.307) = 0.095\,59$
200 × 0.095 59 = 19.12, therefore approximately 19 families.
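The grouped-data estimates and the normal probability in part c can be checked numerically. This sketch assumes the class midpoints 5, 15, …, 55 and takes the six class frequencies from the calculation in part a and the observed row of the table in part b; the normal CDF is built from `math.erf`.

```python
import math

midpoints = [5, 15, 25, 35, 45, 55]
freqs = [0, 12, 48, 75, 58, 7]          # frequencies as used in parts a and b
n = sum(freqs)                          # 200 families

mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
sum_sq = sum(m * m * f for m, f in zip(midpoints, freqs))
variance = (sum_sq - n * mean**2) / (n - 1)   # unbiased estimate
sd = math.sqrt(variance)

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

p = normal_cdf(22.5, mean, sd)
families = n * p
print(mean, round(sd, 3), round(p, 5), round(families, 2))
```

Small differences from the printed values can arise because the book reads Φ from four-figure tables.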
Exercise 3.3A
1 a $\frac{60 \times 80}{150} = 32$; row total multiplied by column total divided by grand total is expected frequency.
$x = \frac{70 \times 60}{150} = 28$
b $\frac{(O_k - E_k)^2}{E_k} = \frac{(26 - 28)^2}{28} = \frac{1}{7} = 0.1429$ (4 s.f.). $y = \frac{(34 - 32)^2}{32} = \frac{1}{8} = 0.125$
c H0: Age group and clinic time attendance are independent.
H1: Age group and clinic time attendance are not independent.
5% significance level, degrees of freedom (4 − 1) × (2 − 1) = 3, critical value 7.815
X 2 = 12.84 > 7.815. Therefore reject H0; age group and clinic time attendance are not independent.
2 a $\frac{365 \times 460}{1460} = 115$; row total multiplied by column total divided by grand total is expected frequency. The
column totals (365) are identical for each region, so expected frequency will be equal in each row.
b Expected frequencies: 115, 187.5, 62.5.
$X^2_{\text{England}} = \frac{(120 - 115)^2}{115} + \frac{(199 - 187.5)^2}{187.5} + \frac{(46 - 62.5)^2}{62.5} = 5.279$
$X^2_{\text{Scotland}} = \frac{(103 - 115)^2}{115} + \frac{(215 - 187.5)^2}{187.5} + \frac{(47 - 62.5)^2}{62.5} = 9.130$
$X^2_{\text{Wales}} = \frac{(103 - 115)^2}{115} + \frac{(184 - 187.5)^2}{187.5} + \frac{(78 - 62.5)^2}{62.5} = 5.162$
$X^2_{\text{N. Ireland}} = \frac{(134 - 115)^2}{115} + \frac{(152 - 187.5)^2}{187.5} + \frac{(79 - 62.5)^2}{62.5} = 14.22$
Hence X2 = 33.79.
c Degrees of freedom (4 − 1) × (3 − 1) = 6
d H0: Region and rainfall are independent.
H1: Region and rainfall are not independent.
0.1% significance level, 6 degrees of freedom,
critical value 22.46
X 2 = 33.79 > 22.46. Therefore reject H0; region and rainfall are not independent. There is strong evidence
to justify this conclusion.
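The expected frequencies and X² for a contingency table follow mechanically from the row and column totals. The sketch below uses the observed counts that appear in the four regional sums above (rows in the order England, Scotland, Wales, N. Ireland); the function name `expected_table` is illustrative.

```python
def expected_table(observed):
    """Expected frequency = row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [
    [120, 199, 46],   # England
    [103, 215, 47],   # Scotland
    [103, 184, 78],   # Wales
    [134, 152, 79],   # N. Ireland
]
expected = expected_table(observed)

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected[0], round(x2, 2), df)
```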
3 H0: Gender and political preference are independent.

H1: Gender and political preference are not independent.
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
         A              B              C              D            E             Total
Male     27             32             24             12           5             100
         (0.4016) 30.5  (0.4298) 28.5  (0.5976) 20.5  (0.2857) 14  (0.3462) 6.5
Female   34             25             17             16           8             100
         (0.4016) 30.5  (0.4298) 28.5  (0.5976) 20.5  (0.2857) 14  (0.3462) 6.5
Total    61             57             41             28           13            200
X² = 4.122 < 9.488. Therefore, no reason to doubt H0; there is no relationship between gender and
political preference.
4 H0: Consumption type and taste change are independent.
H1: Consumption type and taste change are not independent.
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
         Better           No Change     Worse            Total
High     24               22            29               75
         (0.5523) 20.625  (2.133) 30    (0.8776) 24.375
Medium   18               30            12               60
         (0.1364) 16.5    (1.5) 24      (2.885) 19.5
Low      13               28            24               65
         (1.330) 17.875   (0.1538) 26   (0.3913) 21.125
Total    55               80            65               200
X 2 = 9.959 < 13.28. Therefore, no reason to doubt H0, volume of consumption is not related to taste change.
5 a $\frac{34 \times 25}{125} = 6.8$; row total multiplied by column total divided by grand total is expected frequency.
b The expected frequencies are less than 5, and in order for the test statistic to be approximated by the
χ2-distribution, this cannot be the case.
c H0: Advertising and sales are independent.
H1: Advertising and sales are not independent.
10% significance level, degrees of freedom (4 − 1) × (3 − 1) = 6 , critical value 10.64
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
               0 ≤ a < 5     5 ≤ a < 10     10 ≤ a < 15      15 ≤ a ≤ 20
None           13            5              6                10
               (5.653) 6.8   (0.4765) 6.8   (0.5718) 8.16    (0.4099) 12.24
Low            5             8              11               16
               (1.125) 8     (0) 8          (0.2042) 9.6     (0.1778) 14.4
Medium & High  7             12             13               19
               (1.004) 10.2  (0.3176) 10.2  (0.04719) 12.24  (0.02231) 18.36
X 2 = 10.01 < 10.64. Therefore, no reason to doubt H0; there is insufficient evidence at the 10%
significance level to state that advertising and sales are linked.
d H0: Advertising and sales are independent.
H1: Advertising and sales are not independent.
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
         0 ≤ a < 10       10 ≤ a < 15    15 ≤ a ≤ 20
None     18               6              10
         (1.424) 13.6     (0.5718) 8.16  (0.4099) 12.24
Low      13               11             16
         (0.5625) 16      (0.2042) 9.6   (0.1778) 14.4
Medium   11               11             7
         (0.031 03) 11.6  (2.345) 6.96   (1.133) 10.44
High     8                2              12
         (0.072 73) 8.8   (2.038) 5.28   (2.102) 7.92
X 2 = 11.07 > 10.64. Therefore reject H0; there is sufficient evidence at the 10% significance level to state
that advertising and sales are linked.
e By combining different rows or columns, opposite conclusions are reached.
6 a H0: Gender and subject preference are independent.

H1: Gender and subject preference are not independent.
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
         Maths & History   Science         Geography
Male     14                5               6
         (0.050 78) 13.18  (0.4848) 6.818  (0.2) 5
Female   15                10              5
         (0.042 32) 15.82  (0.4040) 8.182  (0.1667) 6
X 2 = 1.349 < 5.991. No reason to doubt H0, there is no relationship between gender and subject preference.
b H0: Gender and subject preference are independent.
H1: Gender and subject preference are not independent.
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
         History         Science         Maths & Geography
Male     12              5               8
         (0.9309) 9.091  (0.4848) 6.818  (0.1309) 9.091
Female   8               10              12
         (0.7758) 10.91  (0.4040) 8.182  (0.1091) 10.91
X 2 = 2.836 < 5.991. No reason to doubt H0, there is no relationship between gender and subject preference.
Both tests return the same result, despite different column groupings. However, the second test has a
p-value of approximately 24%, whereas the first has a p-value of approximately 51%, meaning the
grouping of columns (which here are arbitrary) could affect conclusions at weaker significance levels
(for example 25%).
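The two p-values quoted above can be verified without tables. For an even number of degrees of freedom the χ² upper-tail probability has a closed form, P(X² > x) = e^(−x/2) Σ (x/2)^j / j! for j = 0, …, df/2 − 1, which reduces to e^(−x/2) when df = 2. The function below is a sketch of that formula; its name is illustrative.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X^2 > x) for a chi-squared distribution with even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

p_first = chi2_upper_tail_even_df(1.349, 2)    # first grouping
p_second = chi2_upper_tail_even_df(2.836, 2)   # second grouping
print(round(p_first, 2), round(p_second, 2))
```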
1 a Geometric distribution
b H0: The first time a head is tossed can be modelled by Geo(0.4).
H1: The first time a head is tossed cannot be modelled by Geo(0.4).
1 2 3 4 5 6 7 8 9+
Prob 0.4 0.24 0.144 0.0864 0.051 84 0.031 10 0.018 66 0.011 20 0.016 80
Obs 70 42 33 21 20 7 5 2 0
Exp 80 48 28.8 17.28 10.37 6.221 3.732 2.239 3.359
1 2 3 4 5 6 7+
Obs 70 42 33 21 20 7 7
Exp 80 48 28.8 17.28 10.37 6.221 9.331
(Ok − Ek)²/Ek 1.25 0.75 0.6125 0.8008 8.948 0.097 60 0.5824
Combine 7, 8 and 9+, degrees of freedom 7 − 1 = 6, critical value 12.59
X 2 = 13.04 > 12.59, therefore reject H0; there is evidence that the first time a head is tossed cannot be
modelled by Geo(0.4).
2 H0: The number of broken teacups in a pack of four can be modelled by B(4, 0.15).
H1: The number of broken teacups in a pack of four cannot be modelled by B(4, 0.15).
0 1 2 3 4
Prob 0.5220 0.3685 0.0975 0.011 48 0.000 506 3
Obs 42 16 5 2 0
Exp 33.93 23.95 6.340 0.7459 0.032 91
0 1 2 or more
Obs 42 16 7
Exp 33.93 23.95 7.119
(Ok − Ek)²/Ek 1.919 2.639 0.001 980
Combine 2, 3 and 4, degrees of freedom 3 − 1 = 2, critical value 5.991

X 2 = 4.561 < 5.991, therefore no reason to doubt H0, the number of broken teacups in a pack of four can be
modelled by B(4, 0.15).
3 H0: Gender and vegetable preference are independent.
H1: Gender and vegetable preference are not independent.
5% significance level.
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
            Male            Female
Tomatoes    26              24
            (0.7273) 22     (0.5714) 28
Carrots     10              28
            (2.701) 16.72   (2.122) 21.28
Mushrooms   19              18
            (0.4544) 16.28  (0.3571) 20.72
Degrees of freedom (3 − 1) × (2 − 1) = 2, critical value 5.991

X 2 = 6.933 > 5.991, therefore reject H0; there is a relationship between gender and vegetable choice.
4 a H0: House type and supermarket shopped at are independent.

H1: House type and supermarket shopped at are not independent.
b Total columns minus one multiplied by total combined rows minus one:
degrees of freedom (3 − 1) × (3 − 1) = 4
c Critical values: 2.5% significance level 11.14, 1% significance level 13.28, 0.5% significance level 14.86.
13.28 < 13.95 < 14.86, so 0.5% is the largest significance level for which there would be no reason to
doubt H0. (Note: from calculator 0.75% is the solution.)
5 a $P(150 < l \le 170) = \Phi\left(\frac{170 - 160}{20}\right) - \Phi\left(\frac{150 - 160}{20}\right) = \Phi(0.5) - \Phi(-0.5)$
= 2 × 0.6915 − 1 = 0.3830
Expected value: 100 × 0.3830 = 38.30
100 × P(l ≤ 150) = 100 × Φ(−0.5) = 100 × 0.3085 = 30.85
100 × P(170 < l ≤ 190) = 100 × [Φ(1.5) − Φ(0.5)] = 24.17
100 × P(l > 190) = 100 × [1 − Φ(1.5)] = 6.681
b H0: The length of Aesculapian snakes can be modelled by N(160, 20²).
H1: The length of Aesculapian snakes cannot be modelled by N(160, 20²).
2.5% significance level, degrees of freedom 4 − 1 = 3, critical value 9.348
X² = 1.997 + 0.075 46 + 0.1386 + 2.794 = 5.005
X² < 9.348, therefore no reason to doubt H0, the length of Aesculapian snakes can be modelled by
N(160, 20²).
c Confidence interval
$\left[165 - 1.96 \times \frac{20}{\sqrt{100}},\ 165 + 1.96 \times \frac{20}{\sqrt{100}}\right]$
[161, 169]
The proposed population mean of 160 lies outside the 95% confidence interval, and so it would seem
unlikely that it would make a suitable model to measure the lengths of the snakes.
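The expected frequencies in part a and the interval in part c can be reproduced as follows. This is a sketch that evaluates Φ with `math.erf` instead of reading it from tables, so the second cell comes out as 38.29 rather than the table-based 38.30; the sample mean 165 and n = 100 are taken from the solution.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 160, 20, 100
boundaries = [150, 170, 190]

# Cumulative probabilities at the class boundaries, padded with 0 and 1.
cdf = [0.0] + [phi((b - mu) / sigma) for b in boundaries] + [1.0]
expected = [n * (upper - lower) for lower, upper in zip(cdf, cdf[1:])]

# 95% confidence interval for the mean with known sigma.
sample_mean = 165
half_width = 1.96 * sigma / math.sqrt(n)
interval = (sample_mean - half_width, sample_mean + half_width)
print([round(e, 2) for e in expected], interval)
```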
6 a $\int_0^3 ky^2\,dy + \int_3^4 k(12 - y)\,dy = 1$
$k\left[\frac{y^3}{3}\right]_0^3 + k\left[12y - \frac{y^2}{2}\right]_3^4 = k(9 - 0) + k(40 - 31.5) = \frac{35}{2}k = 1$, therefore $k = \frac{2}{35}$
b $P(Y \le 3) = \int_0^3 \frac{2}{35}y^2\,dy = \left[\frac{2}{105}y^3\right]_0^3 = \frac{18}{35}$
c From data given in the table $P(Y \le 2) = \frac{6 + 10}{105} = \frac{16}{105}$
Therefore $P(2 < Y \le 3) = \frac{18}{35} - \frac{16}{105} = \frac{38}{105}$
and $P(3 < Y \le 4) = 1 - P(Y \le 3) = 1 - \frac{18}{35} = \frac{17}{35}$
so the remaining two expected values are 38 and 51 respectively.
H0: Amount of chemical produced can be modelled by the random variable Y.
H1: Amount of chemical produced cannot be modelled by the random variable Y.
X 2 = 1.5 + 1.6 + 3.789 + 7.078 = 13.97

X 2 > 7.815, therefore reject H0, so the amount of chemical produced cannot be modelled by the random
variable Y.
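The constant k and the two remaining expected values can be confirmed with exact fractions. The sketch below uses the antiderivatives written out in part a; the sample size 105 is an assumption inferred from the fractions in part c.

```python
from fractions import Fraction

def F1(y):
    """Antiderivative of y^2 (used on 0 <= y <= 3)."""
    return Fraction(y) ** 3 / 3

def F2(y):
    """Antiderivative of 12 - y (used on 3 <= y <= 4)."""
    return 12 * Fraction(y) - Fraction(y) ** 2 / 2

area = (F1(3) - F1(0)) + (F2(4) - F2(3))   # total area before scaling by k
k = 1 / area

n = 105                                    # assumed sample size (see lead-in)
p_le_2 = k * (F1(2) - F1(0))
p_le_3 = k * (F1(3) - F1(0))
expected_2_to_3 = n * (p_le_3 - p_le_2)
expected_3_to_4 = n * (1 - p_le_3)
print(k, p_le_2, p_le_3, expected_2_to_3, expected_3_to_4)
```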
7 a $\bar{x} = \frac{0 \times 29 + 1 \times 38 + 2 \times 33 + 3 \times 13 + 4 \times 8 + 5 \times 4}{125} = 1.56$
$s^2 = \frac{125}{124}\left[\frac{0^2 \times 29 + 1^2 \times 38 + 2^2 \times 33 + 3^2 \times 13 + 4^2 \times 8 + 5^2 \times 4}{125} - 1.56^2\right] = 1.7$
b Poisson distribution has equal expectation and variance, and the sample mean and variance are close.
c $P(X = 2) = \frac{e^{-1.56} \times 1.56^2}{2!} = 0.2557$, and expected value is 125 × 0.2557 = 31.96
x = 125 − 26.27 − 40.98 − 31.96 − 16.62 − 6.482 − 2.022 = 0.6705
d Groups 4, 5 and 6 or more must be combined (to form 4 or more) so that expected frequencies are
greater than five.
e H0: The number of A* gained can be modelled by a Poisson distribution.
H1: The number of A* gained cannot be modelled by a Poisson distribution.
X 2 = 0.2844 + 0.2162 + 0.033 73 + 0.7885 + 0.8701 = 2.193

X 2 < 6.251, therefore no reason to doubt H0, the number of A* gained can be modelled by a Poisson
distribution.
f Expected frequencies would change. One more degree of freedom.
8 a H0: Opening day and level of demand are independent.

H1: Opening day and level of demand are not independent.
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
         Tuesday      Wednesday      Thursday
Low      15           18             12
         (0) 15       (0.6) 15       (0.6) 15
Normal   30           25             23
         (0.6154) 26  (0.038 46) 26  (0.3462) 26
High     5            7              15
         (1.778) 9    (0.4444) 9     (4) 9
X² = 8.422 < 9.488, therefore no reason to doubt H0, there is not a relationship between level of demand
and the weekday of opening.
b The conclusion states there is no relationship between level of demand and opening day, meaning
no specific day gains a higher or lower demand. Therefore, it would be difficult to choose which one of
the days to open on. Also, the test does not give any information on whether the demand is sufficient for
the club to make a profit.
9 a H0: The number of cases assigned can be modelled by a uniform distribution.

H1: The number of cases assigned cannot be modelled by a uniform distribution.
               A      B    C
Observed       43     32   45
Expected       40     40   40
(Ok − Ek)²/Ek  0.225  1.6  0.625
X 2 = 2.45 < 5.991, therefore no reason to doubt H0, the number of cases assigned can be modelled by a
uniform distribution.
b H0: Cases solved and officer are independent.
H1: Cases solved and officer are not independent.
Key: each cell shows the observed frequency, with ((Ok − Ek)²/Ek) and the expected frequency on the line below.
           A             B          C             Total
Solved     31            18         11            60
           (4.198) 21.5  (0.25) 16  (5.878) 22.5
Unsolved   12            14         34            60
           (4.198) 21.5  (0.25) 16  (5.878) 22.5
Total      43            32         45            120

X 2 = 20.65 > 9.210, therefore reject H0, so there is a relationship between the proportion of cases solved
and assigned officer.
10 a ΣP(X = r) = a + 2a + 3a + 4b + 5b + 6b = 1 ⇒ 6a + 15b = 1
b E(X) = ΣrP(X = r) = a + 4a + 9a + 16b + 25b + 36b = 14a + 77b
c x̄ = (1 × 3 + 2 × 13 + 3 × 11 + 4 × 10 + 5 × 16 + 6 × 7)/60 = 56/15
Solving 6a + 15b = 1 and 14a + 77b = 56/15 simultaneously gives a = 1/12, b = 1/30.
d Use these values to calculate expected frequencies:
r                1      2      3        4      5      6
Obs              3      13     11       10     16     7
Exp              5      10     15       8      10     12
(Or − Er)²/Er    0.8    0.9    1.067    0.5    3.6    2.083
X² = 8.95, degrees of freedom 6 − 2 = 4
critical values: 10% significance level 7.779, 5% significance level 9.488
7.779 < X² < 9.488, therefore 5% significance level is the greatest at which there would be no reason to
doubt H0, the data is modelled by the random variable X.
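The algebra and arithmetic for this question can be checked with exact fractions. The sketch below is only an illustration (the variable names are assumptions): it solves the two simultaneous equations by Cramer's rule, builds the expected frequencies for the sample of 60, and sums the contributions.

```python
from fractions import Fraction

# Solve 6a + 15b = 1 and 14a + 77b = 56/15 by Cramer's rule.
sample_mean = Fraction(56, 15)
det = 6 * 77 - 15 * 14
a = (1 * 77 - 15 * sample_mean) / det
b = (6 * sample_mean - 14 * 1) / det

observed = [3, 13, 11, 10, 16, 7]
probabilities = [a, 2 * a, 3 * a, 4 * b, 5 * b, 6 * b]  # P(X = r) for r = 1..6
expected = [60 * p for p in probabilities]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(a, b, [int(e) for e in expected], float(statistic))
```

Working in `Fraction` avoids rounding, so the statistic comes out as exactly 8.95.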
11 a Row total multiplied by column total divided by grand total gives expected frequency. Here
(48 × 70)/315 = 32/3 = 10.67 to 2 d.p.
b Expected frequencies for the row x > 150 are all less than five, and so must be combined to get expected
frequencies of over five in order for the test statistic to be compared to the χ2-distribution.
c X_F² = (11 − 9.90)²/9.90 + (19 − 27.24)²/27.24 + (35 − 27.86)²/27.86 = 4.44 to 2 d.p.
d 1% significance level, degrees of freedom (5 − 1) × (3 − 1) = 8, critical value 20.09. X2 > 20.09, so she would
have rejected the null hypothesis under these conditions.
12 a x̄ = (25 × 10 + 35 × 42 + 45 × 31 + 55 × 17)/100 = 40.5
s = √[(100/99) × ((25² × 10 + 35² × 42 + 45² × 31 + 55² × 17)/100 − 40.5²)] = 8.92 to 3 s.f.
b 100 × P(50 ≤ length < 60) = 100 × [Φ((60 − 40.5)/8.92) − Φ((50 − 40.5)/8.92)]
= 100 × [Φ(2.186) − Φ(1.065)] = 100 × (0.9856 − 0.8566) = 12.90
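The expected frequency for a class under the fitted normal model can be checked numerically. This sketch uses Python's `statistics.NormalDist` in place of tables, with the mean 40.5 and standard deviation 8.92 from part a; the function name is illustrative.

```python
from statistics import NormalDist

def expected_in_interval(mean, sd, lower, upper, total):
    """Expected number of observations falling in [lower, upper) under N(mean, sd^2)."""
    dist = NormalDist(mean, sd)
    return total * (dist.cdf(upper) - dist.cdf(lower))

e = expected_in_interval(40.5, 8.92, 50, 60, 100)
print(round(e, 2))
```

Small differences from the tabulated answer can arise because the tables round Φ to four decimal places.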
c H0: Algae length can be modelled by a normal distribution.
H1: Algae length cannot be modelled by a normal distribution.
1% significance level, combine first two and last two cells, three constraints, degrees of freedom 4 − 3 = 1,
critical value 6.635
X2 = 0.3202 + 1.070 + 1.253 + 0.4938 = 3.137
X2 < 6.635, therefore no reason to doubt H0, algae length can be modelled by a normal distribution.
d Expected values will change, and there will be two fewer constraints, so degrees of freedom will increase
by two.
13 a H0: Mobile phone signal strength and service provider are independent.
H1: Mobile phone signal strength and service provider are not independent.
Key: each cell shows the observed frequency on the first line, then the contribution (Ok − Ek)²/Ek in brackets
followed by the expected frequency Ek.

     Good             Medium           Bad
A    55               56               39
     (0.018 52) 54    (0.074 07) 54    (0.2143) 42
B    35               34               31
     (0.027 78) 36    (0.1111) 36      (0.3214) 28

X² = 0.018 52 + 0.074 07 + 0.2143 + 0.027 78 + 0.1111 + 0.3214 = 0.7672 < 9.210, therefore no reason to
doubt H0; there is no evidence of a relationship between signal and provider.
b Increasing sample size by a factor of n and keeping observed frequencies in the same proportion
increases X² by a factor of n.
0.7672n > 9.210 ⇒ n > 12.005, therefore n = 13.
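The scaling property used in part b can be demonstrated directly. In this sketch (the function name `chi_squared` is illustrative), every observed frequency is multiplied by k, and the statistic is compared with the unscaled value.

```python
def chi_squared(observed):
    """Chi-squared statistic for a contingency table of observed frequencies."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    grand = sum(rows)
    return sum(
        (o - rows[i] * cols[j] / grand) ** 2 / (rows[i] * cols[j] / grand)
        for i, r in enumerate(observed)
        for j, o in enumerate(r)
    )

# Rows: providers A and B; columns: Good, Medium, Bad
signal = [[55, 56, 39], [35, 34, 31]]
base = chi_squared(signal)
for k in (2, 5, 13):
    scaled = [[k * o for o in row] for row in signal]
    print(k, chi_squared(scaled) / base)  # the ratio equals k, up to rounding
```

Each expected frequency also scales by k, so every contribution (kO − kE)²/(kE) is k times the original.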
14 a H0: Treatment and reaction are independent.

H1: Treatment and reaction are not independent.
b Column total multiplied by row total divided by grand total gives expected frequency,
here (41 × 41)/150 = 11.21 to 2 d.p.
c X_S² = (25 − 21.05)²/21.05 + (7 − 11.21)²/11.21 + (9 − 8.75)²/8.75 = 2.33 to 2 d.p.
d Degrees of freedom (3 − 1) × (3 − 1) = 4
Critical value at 5% is 9.488 and at 2.5% is 11.14. Since 9.488 < X 2 < 11.14, the null hypothesis would be
rejected at the 5% significance level, but not at the 2.5% significance level. Therefore 5% is the smallest
significance level from the tables for which the null hypothesis would be rejected.
e H0: Proportions of people given the treatments A, B and C are in the ratio 3:2:1.
H1: Proportions of people given the treatments A, B and C are not in the ratio 3:2:1.
                 A           B       C
Observed         77          41      32
Expected         75          50      25
(Ok − Ek)²/Ek    0.053 33    1.62    1.96

X² = 3.633 < 5.991, therefore no reason to doubt H0; the proportion of people given each treatment is in
the ratio 3:2:1.

H0: Age and demand are independent
H1: Age and demand are not independent
5% significance level, combine 40–49 and 50+ rows.
Key: each cell shows the observed frequency on the first line, then the contribution (Ok − Ek)²/Ek in brackets
followed by the expected frequency Ek.

         Low               Medium            High
<20      12                5                 7
         (2) 8             (0.5714) 7        (0.4444) 9
20–29    5                 4                 14
         (0.9277) 7.667    (1.093) 6.708     (3.350) 8.625
30–39    7                 12                13
         (1.262) 10.67     (0.7621) 9.333    (0.083 33) 12
40+      16                14                11
         (0.3983) 13.67    (0.3487) 11.96    (1.247) 15.38

X² = 12.49 < 12.59, therefore just about no reason to doubt H0; there is no evidence of a relationship
between age and demand.
H0: The ages of customers attending the coffee shop can be modelled by a uniform distribution.
H1: The ages of customers attending the coffee shop cannot be modelled by a uniform distribution.
                 <20    20–29       30–39    40–49    50+
Observed         24     23          32       27       14
Expected         24     24          24       24       24
(Ok − Ek)²/Ek    0      0.041 67    2.667    0.375    4.167

X² = 7.25 < 9.488, therefore no reason to doubt H0; the ages of customers attending the coffee shop can be
modelled by a uniform distribution.
4 Non-parametric tests
Where values from the Cambridge International Education statistical tables are used, the same level of accuracy has
been used in workings unless stated otherwise.
1 P(X ≥ 12) = P(X = 12) + P(X = 13) + P(X = 14) + P(X = 15)
= (1/2)^15 × {455 + 105 + 15 + 1} = 576/32768 = 0.0176
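The binomial tail above can be confirmed exactly with a few lines of Python. This is a sketch for checking only; `upper_tail` is an illustrative name.

```python
from fractions import Fraction
from math import comb

def upper_tail(n, k):
    """P(X >= k) for X ~ B(n, 1/2), returned as an exact fraction."""
    return Fraction(sum(comb(n, r) for r in range(k, n + 1)), 2**n)

p = upper_tail(15, 12)
print(p, round(float(p), 4))
```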
2 x̄ = (1/10)(156 + 46 + … + 175) = 121
s² = (1/9)(167760 − 1210²/10) = 21350/9 = 2372.2…
H0: μ = 150
H1: μ < 150
5% significance level, one-tailed test. The test statistic T = (121 − 150)/√(2372/10) = −1.883. Critical value:
t9(5%) = –1.833. As –1.883 < –1.833, reject H0. There is (just) sufficient evidence at the 5% significance level
to reject the null hypothesis that the mean is 150 in favour of the alternative hypothesis that the mean is less
than 150.
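The test statistic can be recomputed from the summary totals quoted in the working (n = 10, Σx = 1210, Σx² = 167760). The helper name `t_from_summary` is an assumption made for this sketch.

```python
from math import sqrt

def t_from_summary(n, sum_x, sum_x2, mu0):
    """One-sample t statistic from n, the sum of x and the sum of x squared."""
    mean = sum_x / n
    s2 = (sum_x2 - sum_x**2 / n) / (n - 1)  # unbiased estimate of the variance
    return (mean - mu0) / sqrt(s2 / n)

t = t_from_summary(10, 1210, 167760, 150)
print(round(t, 3))
```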
3 Perform a paired t-test, assuming the differences in the calorie intakes in January and June are normally
distributed. Take differences to be January minus June.
Differences: 126, 63, 189, – 92, – 49, 6, 93, 141
d̄ = (1/8)(126 + 63 + … + 141) = 59.625
s_d² = (1/7)((126² + 63² + … + 141²) − 8 × 59.625²) = 9508
H0: μd = 0
H1: μd > 0
10% significance level, one-tailed test.
Test statistic T = (59.625 − 0)/√(9508/8) = 1.730.
Critical value: t7 (10%) = 1.415. As 1.730 > 1.415 reject H0. There is sufficient evidence at the 10% significance
level to support Juliet’s hypothesis that people consume fewer calories in summer than winter.
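A short sketch of the paired t-test arithmetic, using the eight differences listed above and the standard-library `statistics` module (whose `variance` divides by n − 1):

```python
from math import sqrt
from statistics import mean, variance

differences = [126, 63, 189, -92, -49, 6, 93, 141]  # January minus June
d_bar = mean(differences)
s2 = variance(differences)                 # unbiased estimate, divisor n - 1
t = d_bar / sqrt(s2 / len(differences))    # test statistic under H0: mean difference 0
print(d_bar, round(s2), round(t, 3))
```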
Exercise 4.1A
1 a H0: median revision time is 30 hours a week

H1: median revision time is less than 30 hours a week
X ∼ B(9, 0.5), 5% significance level, one-tailed test
Signed differences: −14, −11, −16, −18, −1, 9, −9, −20, −21
One positive sign. P(X ≤ 1) = 0.01953 < 0.05
Reject H0. Sufficient evidence that students are doing less than 30 hours revision a week.
b H0: median revision time is 20 hours a week
H1: median revision time is not 20 hours a week
X ∼ B(9, 0.5), 5% significance level, two-tailed test
Signed differences: −4, −1, −6, −8, 9, 19, 1, −10, −11
Three positive signs. P(X ≤ 3) = 0.2539 > 0.025
No reason to doubt H0. Insufficient evidence to refute the claim that median revision time is 20 hours
per week.
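The sign-test probabilities in this question can be reproduced as follows. This is a minimal sketch, assuming zero differences are discarded and the tail is taken for the less frequent sign; the function name is illustrative.

```python
from math import comb

def sign_test_tail(differences):
    """Return (k, P(X <= k)) where k counts the rarer sign and X ~ B(n, 0.5)."""
    signs = [d for d in differences if d != 0]   # zero differences carry no sign
    n = len(signs)
    positives = sum(d > 0 for d in signs)
    k = min(positives, n - positives)
    return k, sum(comb(n, r) for r in range(k + 1)) / 2**n

part_a = sign_test_tail([-14, -11, -16, -18, -1, 9, -9, -20, -21])
part_b = sign_test_tail([-4, -1, -6, -8, 9, 19, 1, -10, -11])
print(part_a, part_b)
```

For a two-tailed test the probability is compared with half the significance level, as in part b.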
2 H0: Standard and premium brand yield same total distance.

H1: Premium brand yields higher distance than standard.
Signed differences: 15, −7, 24, 23, −16, 21, 10, 33
Two negative signs. P(X ≤ 2) = 0.1445 > 0.05.
No reason to doubt H0. Insufficient evidence to show premium is better than standard.
The cars with the lowest distance travelled with standard are the ones that travel less far with premium.
Perhaps premium works better with more efficient cars.
3 H0: Wheatees and Crunchos are equally popular.

H1: There is a difference in popularity between Wheatees and Crunchos.
Two signs for Crunchos. P(X ≤ 2) = 0.01929 < 0.025
Reject H0. Sufficient evidence to say there is a difference in preferences for the two breakfast cereals.
4 H0: Chesford and Amerston have the same median crime rate.
H1: Chesford and Amerston do not have the same median crime rate.
X ∼ B(12, 0.5), 5% sig level, two-tailed test
Signed differences: 0.78, 0.27, 0.48, −1.29, −0.15, 0.50, −0.04, 0.07, 2.09, 1.20, 1.04, 0.02
Three negative signs. P(X ≤ 3) = 0.07300 > 0.025
No reason to doubt H0. Insufficient evidence to show that Chesford and Amerston have different median
crime rates.
5 H0: Insulation does not affect median heat loss.

H1: Homes with insulation have a lower median heat loss.
Signed differences: 0.9, 3.2, 1.4, 0.1, −1.3, 1.2, 1.7, 5.1, 0.4, −1.9
Two negative signs. P(X ≤ 2) = 0.05469 > 0.02
No reason to doubt H0. Insufficient evidence to show that insulation reduces median heat loss.
6 a 2.17, 3.50, 2.06, 0.55, 2.05

b 0.17, 1.50, 0.06, −1.45, 0.05
c H0: Derivative A performs two percentage points better than derivative B.
H1: Derivative A performs more than two percentage points better than derivative B.
One negative sign. P(X ≤ 1) = 0.1875 > 0.1
No reason to doubt H0. Insufficient evidence to show that derivative A has median performance more
than two percentage points better than derivative B.
7 H0: WBC count is normal, with a median of 7 million per 1 ml.

H1: WBC count is abnormal; median is not 7 million per 1 ml.
Signed differences: −0.27, –0.08, 0.21, −0.06, 0.45, 0.32, −0.49, −0.29, −0.09, −0.13, −0.05, 0.93, 0.35, −0.37,
−0.18, −0.39, −0.03, 0.23, 0.02, −0.24, −0.28, −0.10, −0.59, −0.34, −0.16
Seven positive signs
Normal approximation: X ∼ N(12.5, 6.25)
P(X ≤ 7) = Φ((7.5 − 12.5)/√6.25)
= Φ(−2) = 1 − Φ(2)
= 1 − 0.9772
= 0.0228 < 0.05
Reject H0. The patient has an abnormal white blood cell count.
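The normal approximation with continuity correction can be sketched in Python, using `statistics.NormalDist` in place of tables. The function name is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def sign_test_normal_tail(n, k):
    """Approximate P(X <= k) for X ~ B(n, 0.5) using N(n/2, n/4) with a continuity correction."""
    mu, var = n / 2, n / 4
    return NormalDist().cdf((k + 0.5 - mu) / sqrt(var))

print(round(sign_test_normal_tail(25, 7), 4))
```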
8 X ∼ B(n, 0.5): P(X = 0) = (0.5)^n
Note: 0 negative signs implies a one-tailed test.
0.5^n < 0.001
n ln 0.5 < ln 0.001
n > ln 0.001/ln 0.5 = 9.966 (the inequality reverses because ln 0.5 is negative)
Therefore n = 10.
X ∼ B(30, 0.5); therefore, by the normal approximation X ∼ N(15, 7.5)
P(X ≤ 8) = Φ((8.5 − 15)/√7.5) = Φ(−2.373)
= 1 − 0.9912 = 0.0088
Two-tailed test, therefore significance level is 2 × 0.0088 = 0.0176, or 1.76%.
Exercise 4.2A
1 Signed diff 4 −7 −5 −13 1 3 −20 −31 24 10 −12 −23 6 −21 −8

Unsigned diff 4 7 5 13 1 3 20 31 24 10 12 23 6 21 8
Unsigned rank 3 6 4 10 1 2 11 15 14 8 9 13 5 12 7
Signed rank 3 −6 −4 −10 1 2 −11 −15 14 8 −9 −13 5 −12 −7
H0: Employees stay with the company for a median period of a year (52 weeks).
H1: Employees stay with the company for a median period of less than a year.
1% significance level, one-tailed test, n = 15, critical value from table: 19
P = 33, Q = 87, therefore T = 33
T > 19, so no reason to doubt H0. Employees stay with the company for a year on average.
Assumption: weeks worked at the company are distributed symmetrically.
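The rank sums P and Q can be computed mechanically from the signed differences. The sketch below assumes there are no zero differences and no tied magnitudes (true for this data); `signed_rank_sums` is an illustrative name.

```python
def signed_rank_sums(differences):
    """Return (P, Q, T) for the Wilcoxon signed-rank test.

    Assumes no zero differences and no tied absolute values.
    """
    ordered = sorted(differences, key=abs)  # rank 1 = smallest magnitude
    p = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    q = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return p, q, min(p, q)

weeks = [4, -7, -5, -13, 1, 3, -20, -31, 24, 10, -12, -23, 6, -21, -8]
print(signed_rank_sums(weeks))
```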
2 H0: Median is the same.

H1: Median is different.
10% significance level, two-tailed test, n = 18, critical value from table: 47
P = 133, Q = 38, therefore T = 38
T < 47, so reject H0. The median is different.
3 H0: Chesford and Amerston have the same median crime rate.
H1: Chesford and Amerston do not have the same median crime rate.
5% significance level, two-tailed test, n = 12, critical value from table: 13
Signed diff 0.78 0.27 0.48 −1.29 −0.15 0.50 −0.04 0.07 2.09 1.20 1.04 0.02
Unsigned diff 0.78 0.27 0.48 1.29 0.15 0.50 0.04 0.07 2.09 1.20 1.04 0.02
Unsigned rank 8 5 6 11 4 7 2 3 12 10 9 1
Signed rank 8 5 6 −11 −4 7 −2 3 12 10 9 1
P = 61, Q = 17, therefore T = 17

T > 13, no reason to doubt H0. Median crime rates in Chesford and Amerston are equal.
4 H0: Literacy rates between genders (people aged 35–39) in South America are equal.
H1: Literacy rates between genders (people aged 35–39) in South America are not equal.
5% significance level, two-tailed test, n = 9, critical value from table: 5
Signed diff 1.59 −1.28 2.70 1.67 3.62 1.60 −0.19 0.46 1.79
Unsigned diff 1.59 1.28 2.70 1.67 3.62 1.60 0.19 0.46 1.79
Unsigned rank 4 3 8 6 9 5 1 2 7
Signed rank 4 −3 8 6 9 5 −1 2 7
P = 41, Q = 4, therefore T = 4
T < 5, reject H0. There is sufficient evidence at the 5% significance level that literacy rates between genders
(people aged 35–39) in South America are not equal.
5 a H0: Median consumption of refined petroleum products is 35 barrels a day.

H1: Median consumption of refined petroleum products is less than 35 barrels a day.
Signed differences: −9.99, 23.33, −6.24, −7.59, −5.66, −3.54, −11.06, 25.50, −10.40, −5.03, −2.37, −9.38
Two positive signs. P(X ≤ 2) = 0.0193 < 0.05
Reject H0. Sufficient evidence to demonstrate median consumption of refined petroleum products is
less than 35 barrels a day.
b Same hypotheses. 5% significance level, one-tailed test, n = 12, critical value: 17
Sign diff −9.99 23.33 −6.24 −7.59 −5.66 −3.54 −11.06 25.5 −10.4 −5.03 −2.37 −9.38
Unsign diff 9.99 23.33 6.24 7.59 5.66 3.54 11.06 25.5 10.4 5.03 2.37 9.38
Unsign rank 8 11 5 6 4 2 10 12 9 3 1 7
Sign rank −8 11 −5 −6 −4 −2 −10 12 −9 −3 −1 −7
P = 23, Q = 55, therefore T = 23

T > 17, no reason to doubt H0. Median consumption of refined petroleum products is 35 barrels a day.
c The two biggest consumers are Netherlands and Belgium. They only count for two positive signs in the
sign test, but are the two largest deviations from the median in the Wilcoxon signed-rank test, so count
more significantly towards this test. Netherlands and Belgium could be considered outliers, so the sign
test would be more persuasive. Or, thinking of ‘average’ consumption, the fact that Netherlands and
Belgium consume so much means the Wilcoxon signed-rank test might be more appropriate.
6 H0: Median is as given.

H1: Median is less than the value given (also acceptable: greater than).
4% significance level, one-tailed test. Normal approximation required
μ = (1/4) × 50 × (50 + 1) = 637.5 and
σ² = (1/24) × 50 × (50 + 1) × (2 × 50 + 1) = 10 731.25
T = 423
P(T ≤ 423) = Φ((423.5 − 637.5)/√10 731.25)
= Φ(−2.066)
= 1 − 0.9806
= 0.0194 < 0.04
Therefore reject H0. Median is not as the given value.
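For larger samples, the normal approximation to the signed-rank statistic can be evaluated as in this sketch. It uses the mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24 from the working; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_normal_tail(n, t):
    """Return (mu, variance, z, P(T <= t)) using a continuity correction."""
    mu = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t + 0.5 - mu) / sqrt(var)
    return mu, var, z, NormalDist().cdf(z)

mu, var, z, p = signed_rank_normal_tail(50, 423)
print(mu, var, round(z, 3), round(p, 4))
```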
7 a H0: WBC count is normal, with a median of 7 million per 1 ml.
H1: WBC count is abnormal; median is not 7 million per 1 ml.
10% significance level, two-tailed test, n = 25. Normal approximation required
μ = (1/4) × 25 × 26 = 162.5 and
σ² = (1/24) × 25 × 26 × 51 = 1381.25
Signed ranks: −14, −5, 11, −4, 22, 17, −23, −16, −6, −8, −3, 25, 19, −20, −10, −21, −2, 12, 1, −13, −15, −7, −24,
−18, −9
P = 107, Q = 218, therefore T = 107
P(T ≤ 107) = Φ((107.5 − 162.5)/√1381.25)
= Φ(−1.480) = 1 − 0.9306
= 0.0694 > 0.05
Therefore no reason to doubt H0. WBC count is normal.
b The Wilcoxon signed-rank test has a lower probability of a Type II error (failing to reject a false null
hypothesis). Given the data does not look asymmetric, the Wilcoxon signed-rank test would be more
appropriate here.
8 a H0: Population median is as given.
H1: Population median is not as given.
1% significance level, two-tailed test, n = 30. Normal approximation required
μ = (1/4) × 30 × 31 = 232.5 and
σ² = (1/24) × 30 × 31 × 61 = 2363.75
T = 105
P(T ≤ 105) = Φ((105.5 − 232.5)/√2363.75)
= Φ(−2.612) = 1 − 0.9955
= 0.0045 < 0.005
Therefore reject H0. The population median is not as given.
b S_N = (1/2)N(N + 1)
The maximum number of positive ranks would occur if these were all the lowest ranks (because T is the
smaller of P and Q). If the lowest N ranks were all positive, then
(1/2)N(N + 1) = 105
N² + N = 210
N² + N − 210 = 0
(N − 14)(N + 15) = 0
As N is positive, N = 14, so at most 14 positive ranks.
c X ∼ B(30, 0.5), two-tailed test. Normal approximation required.
P(X ≤ 14) = Φ((14.5 − 15)/√7.5)
= Φ(−0.1826)
= 1 − 0.5726
= 0.4274
Therefore the probability of a Type I error (rejecting a true null hypothesis) is 2 × 0.4274 = 0.855.
9 a
1  2  3     P  Q  T
−  −  −     0  6  0
+  −  −     1  5  1
−  +  −     2  4  2
+  +  −     3  3  3
−  −  +     3  3  3
+  −  +     4  2  2
−  +  +     5  1  1
+  +  +     6  0  0
Therefore, for a two-tailed test with rejection region T ≤ 2, P(T ≤ 2) = 6/8 = 6/2³
b
1  2  3  4     P   Q   T
+  +  +  +     10  0   0
−  +  +  +     9   1   1
+  −  +  +     8   2   2
+  +  −  +     7   3   3
−  −  +  +     7   3   3
+  +  +  −     6   4   4
−  +  −  +     6   4   4
+  −  −  +     5   5   5
−  +  +  −     5   5   5
+  −  +  −     4   6   4
−  −  −  +     4   6   4
+  +  −  −     3   7   3
−  −  +  −     3   7   3
−  +  −  −     2   8   2
+  −  −  −     1   9   1
−  −  −  −     0   10  0
Therefore, for a two-tailed test with rejection region T ≤ 2, P(T ≤ 2) = 6/16 = 6/2⁴
c (only the six rows with T ≤ 2 are shown)
1  2  3  4  5     P   Q   T
−  −  −  −  −     0   15  0
+  −  −  −  −     1   14  1
−  +  −  −  −     2   13  2
+  −  +  +  +     13  2   2
−  +  +  +  +     14  1   1
+  +  +  +  +     15  0   0
Each increase in n raises the total number of possible outcomes by an additional power of two; however,
the number of different ways of getting signed-ranks of two or less remains constant at six. Therefore, for
a two-tailed test with rejection region T ≤ 2, P(T ≤ 2) = 6/32 = 6/2⁵
d For a sample of size n, P(T ≤ 2) = 6/2ⁿ
e P(T ≤ 2) = 6/2ⁿ < 0.001
6 < 0.001 × 2ⁿ
ln 6 < ln 0.001 + n ln 2
ln 6 − ln 0.001 < n ln 2
(ln 6 − ln 0.001)/ln 2 < n
n > 12.55
Therefore, n = 13.
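The pattern P(T ≤ 2) = 6/2ⁿ can be checked by brute force. This sketch enumerates every assignment of signs to the ranks 1 to n and counts those with T ≤ 2; the function name `count_t_at_most` is illustrative.

```python
from itertools import product

def count_t_at_most(n, limit):
    """Count sign patterns on ranks 1..n whose signed-rank statistic T is at most limit."""
    total = n * (n + 1) // 2
    count = 0
    for signs in product((1, -1), repeat=n):
        p = sum(rank for rank, s in enumerate(signs, start=1) if s > 0)
        if min(p, total - p) <= limit:  # T is the smaller of P and Q
            count += 1
    return count

for size in range(3, 9):
    print(size, count_t_at_most(size, 2), 2**size)

# Smallest n with 6 / 2^n < 0.001
smallest_n = 3
while 6 / 2**smallest_n >= 0.001:
    smallest_n += 1
print(smallest_n)
```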
Exercise 4.3A
1 a β = 81
b γ = 55
c m(n + m + 1) = 7 × (9 + 7 + 1) = 119. 119 − 81 = 38 and 119 − 55 = 64
d There are fewer boys, so should use 119 − 81 = 38 or 81. But as 38 is lower, the test statistic is W = 38.
e Critical value for 5% significance level one-tailed test: 43.
f H0: Girls and boys raise the same amount of sponsorship.

H1: Girls raise more sponsorship than boys.
W < 43, so reject H0. There is sufficient evidence at the 5% significance level that girls raise more
sponsorship than boys.
2 a
Rank     1     2     3     4     5     6     7     8     9     10
Value    98A   95B   93A   90B   87B   84B   81A   78A   77A   67A
b (10 choose 6) = 210
c 17
d H0: Quarry A and quarry B have the same purities of iron ore.
H1: Quarry A and quarry B have different purities of iron ore.
10% significance level, two-tailed test, m = 4, n = 6, critical value: 13, m (n + m + 1) − 17 = 27, therefore W = 17
As W > 13, no reason to doubt H0; the quarries have the same purity of iron ore.
e No assumptions on the underlying probability distribution are necessary to perform this test.
3 a H0: Group 1 and Group 2 are drawn from identical populations.
H1: Group 2 has higher values than Group 1.
5% significance level, one-tailed test, m = 6, n = 8, critical value: 31, m is the second group with
summed rank Rm = 62, m(n + m + 1) − 62 = 28, therefore W = 28
As W < 31, reject H0, so Group 2 has higher values than Group 1.
b Two-tailed test, 2% significance level (from tables) critical value is 27; as W > 27, no reason to doubt H0.
At 5%, critical value is 29, so this would yield a rejection of H0. Answer: 2%.
4 a Rank 1 2 3 4 5 6 7 8 9 10
Ball C C C C G G C G G G
Sum of ranks C = 17
Sum of ranks G = 38
b The two sample sizes are equal.
c H0: Claxxon and Galway golf balls travel the same distance.
H1: Claxxon and Galway golf balls do not travel the same distance.
10% significance level, two-tailed test, m = 5, n = 5, critical value: 19
For Claxxon balls, Rm = 17, m(n + m + 1) − 17 = 38, therefore W = 17
As W ≤ 19, reject H0. There is sufficient evidence at the 10% significance level that the two types of balls
do not travel the same distance.
d H0: The balls travel a median distance of 275 metres.
H1: The balls travel a median distance of greater than 275 metres.
Three negative signs. P(X ≤ 3) = 0.172 > 0.08
No reason to doubt H0. Insufficient evidence that balls travel further than 275 metres.
5 H0: Younger and older drivers take the same length of time to pass their driving test.
H1: Younger drivers take less time than older drivers.
1% significance level, one-tailed test, m = 6, n = 10, critical value: 29
Rm = 37, m (n + m + 1) − 37 = 65 therefore W = 37
As W > 29 no reason to doubt H0; there is insufficient evidence at the 1% significance level that younger
drivers take less time than older drivers to pass their driving test.
6 H0: The two doctors have the same waiting time.

H1: The two doctors do not have the same waiting time.
5% significance level, two-tailed test, m = 3, n = 7, critical value: 7
Rm = 26, m (n + m + 1) − 26 = 7 therefore, W = 7
As W ≤ 7, there is just about reason to doubt H0, so just sufficient evidence at the 5% significance level that
waiting times are different.
7 H0: The two samples are drawn from identical distributions.

H1: The two samples are not drawn from identical distributions.
2.5% significance level, two-tailed test, m = 13, n = 14, therefore normal approximation required.
Rm = 231, m(n + m + 1) − 231 = 133
therefore W = 133
μ = (1/2)m(n + m + 1) = (1/2) × 13 × (13 + 14 + 1) = 182
σ² = (1/12)nm(n + m + 1) = (1/12) × 14 × 13 × (13 + 14 + 1) = 1274/3
P(W ≤ 133) = Φ((133.5 − 182)/√(1274/3)) = Φ(−2.354) = 1 − 0.9907 = 0.0093 < 0.0125
Therefore reject H0; there is sufficient evidence at the 2.5% significance level that the two samples are not
from identical distributions.
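The normal approximation for the rank-sum statistic can be evaluated with a short sketch. It uses the mean m(n + m + 1)/2 and variance nm(n + m + 1)/12 from the working; `rank_sum_normal_tail` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_normal_tail(m, n, w):
    """Return (mu, variance, z, P(W <= w)) using a continuity correction."""
    mu = m * (m + n + 1) / 2
    var = m * n * (m + n + 1) / 12
    z = (w + 0.5 - mu) / sqrt(var)
    return mu, var, z, NormalDist().cdf(z)

mu, var, z, p = rank_sum_normal_tail(13, 14, 133)
print(mu, round(var, 2), round(z, 3), round(p, 4))
```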
8 a The samples are no longer matched pairs, but just 12 observations from each population (though one
might consider they are not randomly drawn).
b The three pairs of tied values occur within each sample, e.g. 4.94 appears in Chesford’s data twice, but not
in Amerston’s. So, this means they can be ranked k and k + 1 in either ordering, and it will not affect the test.
c A 1 A 13
A 2 A 14
A 3 C 15
C 4 A 16
A 5 A 17
C 6 A 18
C 7 C 19
A 8 C 20
C 9 C 21
C 10 A 22
A 11 C 23
C 12 C 24
H0: Chesford and Amerston have the same crime rate.

H1: Chesford and Amerston do not have the same crime rate.
5% significance level, two-tailed test, m = 12, n = 12, therefore normal approximation required. Using
Amerston as m, and ranking from low to high crime rate:
Rm = 130, m(n + m + 1) − 130 = 170 therefore W = 130
μ = (1/2) × 12 × 25 = 150
σ² = (1/12) × 12 × 12 × 25 = 300
P(W ≤ 130) = Φ((130.5 − 150)/√300) = Φ(−1.126) = 1 − 0.8698 = 0.1302 > 0.025

Therefore no reason to doubt H0; the crime rates in Chesford and Amerston are the same.
9 Entries N have been omitted for clarity (shown as ·). Here m = 2 and n = 4, so m(n + m + 1) = 14.

1  2  3  4  5  6     Rm   14 − Rm   W
M  M  ·  ·  ·  ·     3    11        3
M  ·  M  ·  ·  ·     4    10        4
M  ·  ·  M  ·  ·     5    9         5
M  ·  ·  ·  M  ·     6    8         6
M  ·  ·  ·  ·  M     7    7         7
·  M  M  ·  ·  ·     5    9         5
·  M  ·  M  ·  ·     6    8         6
·  M  ·  ·  M  ·     7    7         7
·  M  ·  ·  ·  M     8    6         6
·  ·  M  M  ·  ·     7    7         7
·  ·  M  ·  M  ·     8    6         6
·  ·  M  ·  ·  M     9    5         5
·  ·  ·  M  M  ·     9    5         5
·  ·  ·  M  ·  M     10   4         4
·  ·  ·  ·  M  M     11   3         3

Sampling distribution:
w           3      4      5      6      7
P(W = w)    2/15   2/15   4/15   4/15   3/15
Therefore, lowest possible significance level for two-tailed test would be 2/15.
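The sampling distribution of W can be generated by listing every way of placing the m ranks, as in this sketch. It assumes m = 2 and n = 4 (six ranks in total, giving the 15 rows of the table); the function name is illustrative.

```python
from collections import Counter
from itertools import combinations

def rank_sum_distribution(m, n):
    """Count how often each value of W occurs over all placements of the m ranks."""
    counts = Counter()
    for ranks in combinations(range(1, m + n + 1), m):
        r_m = sum(ranks)
        counts[min(r_m, m * (m + n + 1) - r_m)] += 1  # W is the smaller of the two sums
    return dict(sorted(counts.items()))

distribution = rank_sum_distribution(2, 4)
print(distribution)  # counts out of 15 equally likely placements
```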
1 a H0: Median ratio of pupils to teachers is the same in 2000 and 2010.
H1: Median ratio of pupils to teachers is lower in 2010 than in 2000.
X ∼ B(9, 0.5), 10% significance level, one-tailed test, 2 negative signs. P(X ≤ 2) = 0.08984 < 0.1, therefore
reject H0; there is evidence to show that the median ratio of pupils to teachers is lower in 2010, so quality
is increasing.
b The Wilcoxon matched-pairs signed-rank test has a lower probability of Type II error.
2 H0: Median number of pages set for reading is 40.
H1: Median number of pages set for reading is not 40.
2% significance level, two-tailed test, n = 14, critical value 15
Difference 9 20 −8 25 3 −5 −2 −1 38 13 17 7 15 22
|Difference| 9 20 8 25 3 5 2 1 38 13 17 7 15 22
Rank 7 11 6 13 3 4 2 1 14 8 10 5 9 12
Signed rank 7 11 −6 13 3 −4 −2 −1 14 8 10 5 9 12
P = 92, Q = 13, so T = 13 < 15, therefore reject H0; the median number of pages set is not 40.
3 H0: Drinking Blue Stallion does not improve concentration.
H1: Drinking Blue Stallion does improve concentration.
5% significance level, one-tailed test, m = 6, n = 7, critical value 29
1 2 3 4 5 6 7 8 9 10 11 12 13
32 44 51 58 59 60 62 67 68 72 73 74 81
(BS) (BS) (Co) (BS) (Co) (BS) (Co) (Co) (BS) (BS) (Co) (Co) (Co)
Rm = 32, m ( n + m + 1) − Rm = 52 therefore W = 32. W > 29, so no reason to doubt H0; Blue Stallion drinkers’
reaction times are drawn from an identical distribution to the control groups’ reaction times.
4 a H0: The median coefficient of friction is the same with both oils.
H1: The median coefficient of friction is not the same with both oils.
X ~ B(15, 0.5), 5% significance level, two-tailed test, 3 negative signs. P(X ≤ 3) = 0.01758 < 0.025, therefore
reject H0; there is a difference in median coefficient of friction between the two oils.
b Critical value for n = 15, 5% significance level, two-tailed test is 25. As T = 33 > 25, there is no reason to
doubt the null hypothesis. This changes the conclusion from above.
5 a The data does not appear to be symmetric.

b H0: The median amount of time for pain to be relieved is 30 minutes.
H1: The median amount of time for pain to be relieved is more than 30 minutes.
X ~ B(12, 0.5), 5% significance level, one-tailed test, 3 negative signs
P(X ≤ 3) = 0.0730 > 0.05, therefore no reason to doubt H0; on average pain is relieved within 30 minutes.
6 H0: There is no preference for one sports kit manufacturer over the other.
H1: There is preference for one sports kit manufacturer over the other.
X ~ B(100, 0.5), therefore use a normal approximation, X ~ N(50, 25)
1% significance level, two-tailed test, 36 ‘negative’ signs
P(X ≤ 36) = Φ((36.5 − 50)/5) = Φ(−2.7) = 1 − 0.9965 = 0.0035 < 0.005,
therefore reject H0; there is evidence of a difference in preference for the two manufacturers of the sports kit.
7 a H0: North African and Central American television ownership rates are drawn from identical distributions.
H1: North African and Central American television ownership rates are not drawn from
identical distributions.
5% significance level, two-tailed test, m = 5, n = 10, critical value 23
Rank Value Region
1 84.8 NA
2 93.5 NA
3 93.9 NA
4 99.5 CA
5 99.8 CA
6 104.8 NA
7 109.7 CA
8 110.9 NA
9 125.9 CA
10 129.4 CA
11 134.6 CA
12 152.0 CA
13 158.7 CA
14 158.9 CA
15 166.1 CA
Rm = 20, m (n + m + 1) − Rm = 60, therefore W = 20. W < 23, reject H0; there is sufficient evidence of a

difference between television ownership rates in North Africa and Central America.
b In order to perform a two-sample t-test the samples must be drawn from distributions with identical
variance, but clearly the two standard deviations are not close to being the same.
8 a H0: Median score on first paper is the same as on the second paper.
H1: Median score on first paper is lower than on the second paper.
5% significance level, one-tailed test, n = 10, critical value 10
Difference 10 −19 11 16 20 28 −4 21 8 2
|Difference| 10 19 11 16 20 28 4 21 8 2
Rank 4 7 5 6 8 10 2 9 3 1
Signed rank 4 −7 5 6 8 10 −2 9 3 1
P = 46, Q = 9, so T = 9 < 10 therefore reject H0; there is evidence to support the claim that the median
mark on the second paper is higher than on the first paper.
b This converts the test into a two-tailed test, and the critical value is now 8. As T > 8 there is no reason to
doubt H0, which is that the median scores on the two papers are the same. For a given significance level,
in stating that one median is lower than the other, the critical region becomes larger than just looking
for a generic difference (either higher or lower). Hence, it is not contradictory that the first test should
reject, whilst the second test finds no reason to doubt the null hypothesis.
9 H0: The two samples are drawn from identical distributions.

H1: The two samples are not drawn from identical distributions.
5% significance level, two-tailed test, m = 15, n = 20, normal approximation required
Rm = 340, m(n + m + 1) − Rm = 200, therefore W = 200.
W ~ N(μ, σ²): μ = (1/2)m(n + m + 1) = 270, σ² = (1/12)nm(n + m + 1) = 900
P(W ≤ 200) = Φ((200.5 − 270)/30)
= Φ(−2.317) = 1 − 0.9898 = 0.0102 < 0.025
Therefore, reject H0; the two samples are not drawn from identical distributions.
10 a H0: Median birth rates have not changed from 2000 to 2005.
H1: Median birth rates have decreased from 2000 to 2005.
Difference −2.55 −0.45 0.73 −0.28 −0.17 −0.29 0.06

|Difference| 2.55 0.45 0.73 0.28 0.17 0.29 0.06
Rank 7 5 6 3 2 4 1
Signed rank −7 −5 6 −3 −2 −4 1
P = 7, Q = 21, so T = 7 > 3, therefore no reason to doubt H0; there has been no change in median birth rate

from 2000 to 2005.
b Wilcoxon rank-sum test
c 5% significance level, one-tailed test, m = n = 7, critical value 39. As W = 50 > 39, this does not change the
conclusion that there is no reason to doubt the null hypothesis.
11 H0: The median weight of the first-born twin is the same as that of the second-born.
H1: The median weight of the first-born twin is greater than that of the second-born.
8% significance level, one-tailed test, n = 45, normal approximation required
T ~ N(μ, σ²): μ = (1/4)n(n + 1) = 517.5
σ² = (1/24)n(n + 1)(2n + 1) = 7848.75
T = 437
P(T ≤ 437) = Φ((437.5 − 517.5)/√7848.75) = Φ(−0.9030) = 1 − 0.8167 = 0.1833 > 0.08
Therefore, no reason to doubt H0; the median weight of the two twins is the same.
12 a H0: Winston and Jamal have the same median time.
H1: Winston has a lower median time than Jamal.
X ~ B ( 5, 0.5 ) , 5% significance level, one-tailed test, no negative signs
P(X = 0) = 0.03125 < 0.05, therefore reject H0; there is sufficient evidence that Winston has a lower median
time than Jamal.
b The paired-sample sign test can only be used if the samples are drawn under the same conditions for
each point. In this case the races being different will have different underlying conditions, so the test is
not valid.
c Use a Wilcoxon rank-sum test.
H0: Winston and Jamal’s times are drawn from identical distributions.
H1: Winston’s times are lower than Jamal’s.
5% significance level, one-tailed test, m = n = 5, critical value 19.
Using Jamal as m Rm = 34, m ( n + m + 1) − Rm = 21, therefore W = 21.
W > 19, therefore, no reason to doubt H0; Winston and Jamal’s times are drawn from identical
distributions.
As only the second non-parametric test is valid, this suggests that there is insufficient evidence to pick
Winston ahead of Jamal. For example, Winston’s times may have come in races where the wind was
behind, whereas Jamal’s may have all been into a headwind.
If the times had come from the same races (so under the same conditions) there might have been
sufficient evidence to pick Winston ahead of Jamal, as he would have beaten him five times out of five.
13 a H0: Median black-fly damage is identical on crops treated by organic or chemical pesticides.
H1: Median black-fly damage is not identical on crops treated by organic or chemical pesticides.
5% significance level, two-tailed test, n = 9, critical value 5
Difference −4.6 1.2 −7.6 −4.0 3.4 −1.1 −3.2 −5.8 −3.3
|Difference| 4.6 1.2 7.6 4.0 3.4 1.1 3.2 5.8 3.3
Rank 7 2 9 6 5 1 3 8 4
Signed rank −7 2 −9 −6 5 −1 −3 −8 −4
P = 7, Q = 38, so T = 7 > 5, therefore no reason to doubt H0; type of pesticide makes no difference to
prevalence of black-fly damage.
b The paired-sample t-test could be used to test whether the mean damage is the same.
c H0: Mean difference of black-fly damage between crops treated by organic or chemical pesticides is zero.
H1: Mean difference of black-fly damage between crops treated by organic or chemical pesticides is not zero.
5% significance level, two-tailed test
Estimated standard deviation of differences
s = √[(9/8) × (((−4.6)² + 1.2² + … + (−3.3)²)/9 − (6.4 − 3.6)²)]
= 3.416
Test statistic T = (6.4 − 3.6)/(3.416/√9) = 2.459
Critical value from t-distribution with 8 degrees of freedom is 2.306
As T > 2.306, reject H0; there is sufficient evidence of a difference in the mean black-fly damage between
the two pesticides.
d Paired sample t-test, as the Wilcoxon signed-rank test uses only the ranks of the differences and not their
exact magnitudes. Different regions may respond better to different pesticides.
14 a H0: Ship-building times in Guangnan and Jiangzhou are drawn from identical distributions.
H1: Ship-building times in Guangnan and Jiangzhou are not drawn from identical distributions.
10% significance level, two-tailed test, m = 3, n = 4, critical value 6
Ranking: 23J, 27J, 32G, 34J, 40J, 43G, 54G
Rm = 16, m (n + m + 1) − Rm = 8
Therefore W = 8. W > 6, so no reason to doubt H0; the ship-building times are the same.
b H0: The mean ship-building times in Guangnan and Jiangzhou are equal.
H1: The mean ship-building times in Guangnan and Jiangzhou are not equal.
10% significance level, two-tailed test, degrees of freedom 3 + 4 − 2 = 5
Critical value from t-distribution is 2.015
Sample means x̄G = 43 and x̄J = 31
Estimate of shared variance
s² = (2 × 121 + 3 × 170/3)/5 = 82.4
Test statistic T = (43 − 31)/√(82.4 × (1/3 + 1/4)) = 1.731
As T < 2.015, there is no reason to doubt H0; the mean ship-building times are the same
for both companies.
c Advantage is that it does not rely on the underlying populations being drawn from a normal
distribution. Disadvantage is that it does not use all the information (it only uses the relative sizes and
not the exact values) of the data points to test the hypothesis.
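Both tests in question 14 use the same seven observations, so they can be checked side by side. This is a minimal sketch; the variable and helper names are illustrative, and the ranking step assumes no tied values.

```python
import math

g = [32, 43, 54]      # Guangnan times (m = 3)
j = [23, 27, 34, 40]  # Jiangzhou times (n = 4)
m, n = len(g), len(j)

# Rank-sum statistic: ranks are positions in the pooled, sorted sample (no ties here).
pooled = sorted(g + j)
r_m = sum(pooled.index(x) + 1 for x in g)
w = min(r_m, m * (n + m + 1) - r_m)


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs)


# Pooled (shared) variance estimate and two-sample t statistic.
s2 = (sum_sq_dev(g) + sum_sq_dev(j)) / (m + n - 2)
t_stat = (mean(g) - mean(j)) / math.sqrt(s2 * (1 / m + 1 / n))
print(r_m, w, round(s2, 1), round(t_stat, 3))
```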
15 a $R_m = \frac{1}{2} \times 10 \times (10 + 1) = 55$, assuming the first ten ranks were all from the sample of size m.
b Let ω be the maximum value at which the null hypothesis would be just rejected. Using a normal
approximation
$W \sim N(\mu, \sigma^2)$: $\mu = \frac{1}{2} m(n + m + 1) = 125$, $\sigma^2 = \frac{1}{12} nm(n + m + 1) = \frac{875}{3}$
$P(W \le \omega) = \Phi\left(\dfrac{\omega + 0.5 - 125}{\sqrt{875/3}}\right) \Rightarrow 0.01 = \Phi\left(\dfrac{\omega + 0.5 - 125}{\sqrt{875/3}}\right)$
$\dfrac{\omega + 0.5 - 125}{\sqrt{875/3}} = -2.326$; this yields ω = 84.78
Therefore W < 84.8.
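The normal approximation in part b can be reproduced numerically. In this sketch the value n = 14 is an inference, not quoted from the question: it is the value that gives μ = 125 and σ² = 875/3 when m = 10.

```python
import math
from statistics import NormalDist

m, n = 10, 14  # n = 14 is inferred from mu = 125; it is not quoted in the solution
mu = m * (n + m + 1) / 2
var = n * m * (n + m + 1) / 12

# Solve Phi((omega + 0.5 - mu) / sigma) = 0.01 for omega.
z = NormalDist().inv_cdf(0.01)
omega = mu - 0.5 + z * math.sqrt(var)
print(mu, var, round(omega, 2))
```

The small difference from 84.78 in the last decimal place comes from using an unrounded z value instead of the tabulated −2.326.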
Use a matched-pairs Wilcoxon signed-rank test, as the nesting locations are the same throughout.
H0: Median number of eggs from earlier year to later year is unchanged.
H1: Median number of eggs from earlier year to later year has decreased.
For 2000 to 2005, T = 19 and for 2005 to 2010 T = 18. In both cases T > 17, so there is no reason to doubt H0; the
median number of eggs is unchanged.
However, for 2000 to 2010, T = 9 and T < 17, so the null hypothesis is rejected in favour of H1; the median number
of eggs has decreased.
This demonstrates that even if over shorter periods there is no evidence to show egg numbers are decreasing, in
the longer run the evidence supports this hypothesis.
Location A B C D E F G H I J K L
2000 154 239 107 167 130 245 280 68 179 294 273 249
2005 99 201 121 162 129 258 254 90 162 278 242 252
Difference 55 38 −14 5 1 −13 26 −22 17 16 31 −3
|Difference| 55 38 14 5 1 13 26 22 17 16 31 3
Rank 12 11 5 3 1 4 9 8 7 6 10 2
Negative ranks 5 4 8 2
2005 99 201 121 162 129 258 254 90 162 278 242 252
2010 109 193 76 140 118 246 230 108 153 203 269 212
Difference −10 8 45 22 11 12 24 −18 9 75 −27 40
|Difference| 10 8 45 22 11 12 24 18 9 75 27 40
Rank 3 1 11 7 4 5 8 6 2 12 9 10
Negative ranks 3 6 9
2000 154 239 107 167 130 245 280 68 179 294 273 249
2010 109 193 76 140 118 246 230 108 153 203 269 212
Difference 45 46 31 27 12 −1 50 −40 26 91 4 37
|Difference| 45 46 31 27 12 1 50 40 26 91 4 37
Rank 9 10 6 5 3 1 11 8 4 12 2 7
Negative ranks 1 8
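The three test statistics quoted above (T = 19, 18 and 9) can be recomputed from the tabulated counts. A minimal sketch, assuming T is the sum of the ranks of the negative differences (earlier year minus later year) and that there are no tied absolute differences; the dictionary and function names are illustrative.

```python
# Egg counts per location (A to L), copied from the tables above.
eggs = {
    2000: [154, 239, 107, 167, 130, 245, 280, 68, 179, 294, 273, 249],
    2005: [99, 201, 121, 162, 129, 258, 254, 90, 162, 278, 242, 252],
    2010: [109, 193, 76, 140, 118, 246, 230, 108, 153, 203, 269, 212],
}


def negative_rank_sum(earlier, later):
    """Sum the ranks (by absolute size) of the negative differences earlier - later."""
    diffs = [a - b for a, b in zip(earlier, later)]
    ranked = sorted(diffs, key=abs)
    return sum(rank for rank, d in enumerate(ranked, start=1) if d < 0)


t_00_05 = negative_rank_sum(eggs[2000], eggs[2005])
t_05_10 = negative_rank_sum(eggs[2005], eggs[2010])
t_00_10 = negative_rank_sum(eggs[2000], eggs[2010])
print(t_00_05, t_05_10, t_00_10)
```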
5 Probability generating functions

Prerequisite knowledge
1 Mean, $E(X) = 2 \times 0.2 + 5 \times 0.5 + 7 \times 0.3 = 5$
Variance, $\mathrm{Var}(X) = 2^2 \times 0.2 + 5^2 \times 0.5 + 7^2 \times 0.3 - 5^2 = 3$
2 Expectations, $E(X) = 2$, $E(Y) = 5 \times 0.4 = 2$
Variances, $\mathrm{Var}(X) = 2$, $\mathrm{Var}(Y) = 5 \times 0.4 \times (1 - 0.4) = 1.2$
3 $\dfrac{9}{(4 - x)^2} = 9 \times \dfrac{1}{16}\left(1 - \dfrac{1}{4}x\right)^{-2}$
$= \dfrac{9}{16}\left[1 + (-2)\left(-\dfrac{1}{4}x\right) + \dfrac{(-2)(-3)}{2}\left(-\dfrac{1}{4}x\right)^2 + \dfrac{(-2)(-3)(-4)}{6}\left(-\dfrac{1}{4}x\right)^3 + \dots\right]$
$= \dfrac{9}{16}\left\{1 + \dfrac{1}{2}x + \dfrac{3}{16}x^2 + \dfrac{1}{16}x^3 + \dots\right\} = \dfrac{9}{16} + \dfrac{9}{32}x + \dfrac{27}{256}x^2 + \dfrac{9}{256}x^3 + \dots$
Exercise 5.1A
1 $G_X(t) = \dfrac{1}{2} + \dfrac{1}{5}t + \dfrac{3}{20}t^2 + \dfrac{1}{10}t^3 + \dfrac{1}{20}t^4$
2 $G_H(t) = \dbinom{10}{0}(0.7)^{10} + \dbinom{10}{1}(0.7)^9(0.3)t + \dbinom{10}{2}(0.7)^8(0.3)^2t^2 + \dots + \dbinom{10}{10}(0.3)^{10}t^{10}$
$= (0.7 + 0.3t)^{10}$
3 a i $(1 - \alpha t)^{-1} = 1 + \alpha t + (\alpha t)^2 + (\alpha t)^3 + \dots$
$k\alpha t(1 - \alpha t)^{-1} = k\alpha t + k(\alpha t)^2 + k(\alpha t)^3 + k(\alpha t)^4 + \dots$
$\Rightarrow P(X = 1) = k\alpha$
ii From the expansion, the coefficient of $t^r$ is $P(X = r) = k\alpha^r$
b $G_X(1) = \dfrac{k\alpha}{1 - \alpha} = 1 \Rightarrow \alpha = \dfrac{1}{k + 1}$
4 a Geometric distribution requires a set of trials that have two outcomes (success or failure, equivalently, pass or fail his test) with these outcomes being independent from trial to trial and with a fixed probability of success. These assumptions are stated in the question. Once Sudhir passes his test, he does not retake it, which is as in the geometric distribution: once there has been a success the trials end.
b $G_X(t) = \dfrac{1}{3}t + \dfrac{2}{3} \times \dfrac{1}{3}t^2 + \left(\dfrac{2}{3}\right)^2 \times \dfrac{1}{3}t^3 + \left(\dfrac{2}{3}\right)^3 \times \dfrac{1}{3}t^4 + \dots$
$G_X(t) = \dfrac{1}{3}t\left[1 + \dfrac{2}{3}t + \left(\dfrac{2}{3}t\right)^2 + \left(\dfrac{2}{3}t\right)^3 + \dots\right]$
This is an infinite geometric series with a common ratio of $\frac{2}{3}t$ and initial term $\frac{1}{3}t$. Therefore
$G_X(t) = \dfrac{\frac{1}{3}t}{1 - \frac{2}{3}t} = \dfrac{t}{3 - 2t}$
$G_X(1) = \dfrac{1}{3 - 2} = 1$ as required.
c $G_X(t) = \dfrac{pt}{1 - (1 - p)t}$
5 $G_X(t) = e^{-\lambda} + e^{-\lambda}\lambda t + \dfrac{e^{-\lambda}\lambda^2}{2}t^2 + \dfrac{e^{-\lambda}\lambda^3}{6}t^3 + \dots + \dfrac{e^{-\lambda}\lambda^r}{r!}t^r + \dots$
$= e^{-\lambda}\left[1 + \lambda t + \dfrac{(\lambda t)^2}{2!} + \dfrac{(\lambda t)^3}{3!} + \dots\right]$
Using the Maclaurin series result this equals
$e^{-\lambda}e^{\lambda t} = e^{\lambda t - \lambda} = e^{\lambda(t - 1)}$ as required.
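A closed-form PGF can be sanity-checked by summing $P(X = r)t^r$ directly and comparing with the formula at a test point. The sketch below does this for the binomial result of question 2 and the geometric result of question 4; the helper name `eval_pgf` and the test point t = 0.5 are arbitrary choices for illustration.

```python
from math import comb


def eval_pgf(coeffs, t):
    """Evaluate G(t) = sum of P(X = r) * t**r from a list of probabilities."""
    return sum(p * t ** r for r, p in enumerate(coeffs))


# Exercise 5.1A question 2: binomial probabilities with n = 10, p = 0.3.
binomial = [comb(10, r) * 0.7 ** (10 - r) * 0.3 ** r for r in range(11)]

# Exercise 5.1A question 4: geometric probabilities with p = 1/3, truncated at r = 199.
geometric = [0.0] + [(1 / 3) * (2 / 3) ** (r - 1) for r in range(1, 200)]

t = 0.5  # arbitrary test point with |t| <= 1
binomial_gap = abs(eval_pgf(binomial, t) - (0.7 + 0.3 * t) ** 10)
geometric_gap = abs(eval_pgf(geometric, t) - t / (3 - 2 * t))
print(binomial_gap, geometric_gap)
```

Both gaps should be at the level of floating-point rounding, since the truncated geometric tail is negligible.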
Exercise 5.2A
1 $G_X(t) = \dfrac{1}{5} + \dfrac{2}{5}t + \dfrac{3}{10}t^3 + \dfrac{1}{10}t^{10}$
$G_X'(t) = \dfrac{2}{5} + \dfrac{9}{10}t^2 + t^9$
$G_X'(1) = \dfrac{2}{5} + \dfrac{9}{10} + 1 = 2.3 = E(X)$
$G_X''(t) = \dfrac{9}{5}t + 9t^8$
$G_X''(1) = \dfrac{9}{5} + 9 = 10.8$
$\mathrm{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2 = 10.8 + 2.3 - 2.3^2 = 7.81$
2 a $G_X(1) = k(3 + 2 \times 1)^3 = 125k = 1 \Rightarrow k = \dfrac{1}{125}$
b $G_X'(t) = \dfrac{6}{125}(3 + 2t)^2$ and $G_X''(t) = \dfrac{24}{125}(3 + 2t)$
Therefore $E(X) = G_X'(1) = \dfrac{6}{125} \times 25 = 1.2$
$\mathrm{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2 = \dfrac{24}{125} \times 5 + 1.2 - 1.2^2 = 0.72$
3 a Expanding brackets: $G_Z(t) = \dfrac{4}{25}t^{-1} + \dfrac{4m}{25} + \dfrac{m^2}{25}t$
Therefore Z can take values {−1, 0, 1}
b $G_Z(1) = \dfrac{(2 + m)^2}{25} = 1 \Rightarrow (m + 2)^2 = 25$; as m > 0, m + 2 = 5, m = 3
c $G_Z'(t) = \dfrac{6(2 + 3t) \times 25t - 25 \times (2 + 3t)^2}{625t^2} = \dfrac{(2 + 3t)(3t - 2)}{25t^2}$
Therefore the mean (expectation) is $G_Z'(1) = \dfrac{(2 + 3)(3 - 2)}{25} = 0.2$
$G_Z''(t) = \dfrac{18t \times 25t^2 - 50t(9t^2 - 4)}{625t^4} = \dfrac{8}{25t^3}$
$G_Z''(1) = \dfrac{8}{25} = 0.32$
Standard deviation is $\sqrt{G_Z''(1) + G_Z'(1) - [G_Z'(1)]^2} = \sqrt{0.32 + 0.2 - 0.04} = \dfrac{2\sqrt{3}}{5}$
4 $G_X(t) = \dfrac{pt}{1 - (1 - p)t}$
$G_X'(t) = \dfrac{p \times (1 - (1 - p)t) - pt \times (-(1 - p))}{(1 - (1 - p)t)^2} = \dfrac{p}{(1 - (1 - p)t)^2}$
$E(X) = G_X'(1) = \dfrac{p}{(1 - (1 - p))^2} = \dfrac{p}{p^2} = \dfrac{1}{p}$ as required
$G_X''(t) = \dfrac{2(1 - p)p}{(1 - (1 - p)t)^3}$ so $G_X''(1) = \dfrac{2(1 - p)p}{(1 - (1 - p))^3} = \dfrac{2(1 - p)}{p^2}$
$\mathrm{Var}(X) = \dfrac{2(1 - p)}{p^2} + \dfrac{1}{p} - \dfrac{1}{p^2} = \dfrac{1 - p}{p^2}$ as required
5 a $G_Y(1) = \dfrac{1 + a}{5 - b} = 1 \Rightarrow a = 4 - b$
b $G_Y'(t) = \dfrac{a(5 - bt) + b(1 + at)}{(5 - bt)^2}$ so $G_Y'(1) = \dfrac{a(5 - b) + b(1 + a)}{(5 - b)^2} = 1$
Substituting a = 4 − b gives
$\dfrac{(4 - b)(5 - b) + b(1 + 4 - b)}{(5 - b)^2} = \dfrac{4}{5 - b} = 1 \Rightarrow b = 1$ and $a = 3$
c Expanding
$(1 + 3t)(5 - t)^{-1} = \dfrac{1}{5}(1 + 3t)\left(1 + \dfrac{1}{5}t + \dfrac{1}{25}t^2 + \dots\right)$
Mean is expectation, so equal to 1. Therefore P(Y = 1) is the coefficient for t from the expansion: $\dfrac{1}{5}\left(\dfrac{1}{5}t + 3t\right)$, so $P(Y = 1) = \dfrac{16}{25}$
Exercise 5.3A
1 $G_{X+Y}(t) = G_X(t) \times G_Y(t)$
$G_{X+Y}(t) = 0.1t^2(6 + 3t + t^2) \times \dfrac{0.05}{t^2}(8 + 12t^4) = 0.005(6 + 3t + t^2)(8 + 12t^4)$
$G_{X+Y}(t) = 0.005(48 + 24t + 8t^2 + 72t^4 + 36t^5 + 12t^6)$
P(X + Y is odd) is the sum of coefficients of $t^r$ where r is odd. P(X + Y is odd) = 0.005(24 + 36) = 0.3
2 Let $Y_i$ be the score on an individual dice, then
$G_{Y_i}(t) = \dfrac{1}{6}t + \dfrac{1}{6}t^2 + \dfrac{1}{6}t^3 + \dfrac{1}{6}t^4 + \dfrac{1}{6}t^5 + \dfrac{1}{6}t^6 = \dfrac{1}{6}t(1 + t + t^2 + \dots + t^5)$
using formula for geometric sum, therefore
$G_{Y_i}(t) = \dfrac{1}{6}t \times \dfrac{1 - t^6}{1 - t} = \dfrac{t(1 - t^6)}{6(1 - t)}$
So for $X = Y_1 + Y_2 + Y_3$
$G_X(t) = \left[\dfrac{t(1 - t^6)}{6(1 - t)}\right]^3 = \dfrac{t^3}{216}\left[\dfrac{1 - t^6}{1 - t}\right]^3$ as required.
$P(X \ge 16)$ is the sum of the coefficients of $t^{16}$, $t^{17}$ and $t^{18}$. Begin by noting
$\dfrac{1 - t^6}{1 - t} = 1 + t + t^2 + t^3 + t^4 + t^5$
so $\left[\dfrac{1 - t^6}{1 - t}\right]^3 = \dots + 6t^{13} + 3t^{14} + t^{15}$
therefore $P(X \ge 16) = \dfrac{6 + 3 + 1}{216} = \dfrac{5}{108}$
3 $G_{X_1}(t) = 0.75t^{-2} + 0.25t^5$ and $X = X_1 + X_2 + \dots + X_{25}$
$G_X(t) = \left[0.75t^{-2} + 0.25t^5\right]^{25}$
To find expectation
$G_X'(t) = 25 \times (-1.5t^{-3} + 1.25t^4) \times (0.75t^{-2} + 0.25t^5)^{24}$
$E(X) = G_X'(1) = 25 \times (-1.5 + 1.25) \times (0.75 + 0.25)^{24} = -6.25$
4 a $G_X'(t) = 0.3 + 0.74t + 0.6t^2 + 0.16t^3$
$E(X) = G_X'(1) = 0.3 + 0.74 + 0.6 + 0.16 = 1.8$
$G_X''(t) = 0.74 + 1.2t + 0.48t^2$
$\mathrm{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2 = (0.74 + 1.2 + 0.48) + 1.8 - 1.8^2 = 0.98$
b i $G_Y(t) = 0.3 + 0.5t + 0.2t^2$ as $[G_Y(t)]^2 = G_X(t)$
ii P(Y = 1) = 0.5
iii $2E(Y) = E(X) \Rightarrow E(Y) = 0.9$ and $2\mathrm{Var}(Y) = \mathrm{Var}(X) \Rightarrow \mathrm{Var}(Y) = 0.49$
5 The number of heads Alberta gets, A, is a geometric sequence, but if she succeeds (gets a tail) on her first throw there are no heads. This can be thought of as a geometric distribution with the values starting at zero. Hence $G_A(t) = \dfrac{0.5}{1 - 0.5t}$
The number of heads Bruno gets, B, is simple binomial, so $G_B(t) = (0.5 + 0.5t)^5$
$G_{A+B}(t) = G_A(t) \times G_B(t) = \dfrac{0.5(0.5 + 0.5t)^5}{1 - 0.5t}$
$0.5(0.5 + 0.5t)^5 = 0.5(0.03125 + 0.15625t + 0.3125t^2 + \dots) = 0.015625 + 0.078125t + 0.15625t^2 + \dots$
$(1 - 0.5t)^{-1} = 1 + 0.5t + 0.25t^2 + \dots$
$0.5(0.5 + 0.5t)^5 \times (1 - 0.5t)^{-1} = 0.015625 + 0.0859375t + 0.19921875t^2 + \dots$
Therefore
$P(A + B < 3) = P(A + B = 0) + P(A + B = 1) + P(A + B = 2)$
= 0.015625 + 0.0859375 + 0.19921875 = 0.301 (3 s.f.)
6 a This is a case of the sum of two geometric distributions, each with a PGF of $G_{X_i}(t) = \dfrac{0.25t}{1 - 0.75t}$
So $G_{X_1 + X_2}(t) = G_{X_1}(t) \times G_{X_2}(t) = \left[\dfrac{0.25t}{1 - 0.75t}\right]^2$
b $G_X'(t) = \dfrac{0.125t(1 - 0.75t)^2 + 1.5(1 - 0.75t) \times 0.0625t^2}{(1 - 0.75t)^4}$
$E(X) = G_X'(1) = \dfrac{0.125(1 - 0.75)^2 + 1.5(1 - 0.75) \times 0.0625}{(1 - 0.75)^4} = 8$
c $G_X(t) = (0.25t)^2(1 - 0.75t)^{-2} = (0.25t)^2(1 + 1.5t + 1.6875t^2 + \dots)$
$P(X < 5) = 0.25^2(1 + 1.5 + 1.6875) = 0.262$ to 3 s.f.
Exam-style questions
1 a $G_X(t) = \dbinom{n}{0}(1 - p)^n + \dbinom{n}{1}(1 - p)^{n-1}pt + \dbinom{n}{2}(1 - p)^{n-2}p^2t^2 + \dbinom{n}{3}(1 - p)^{n-3}p^3t^3 + \dots + \dbinom{n}{n}p^nt^n = ((1 - p) + pt)^n$
b $G_X'(t) = np((1 - p) + pt)^{n-1}$ and $G_X''(t) = n(n - 1)p^2((1 - p) + pt)^{n-2}$
Therefore $E(X) = G_X'(1) = np((1 - p) + p)^{n-1} = np$ and
$\mathrm{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2 = n(n - 1)p^2 + np - n^2p^2 = np(1 - p)$
2 a $G_X(1) = k\left(1(3 + 4) + (1 + 1)^2\right) = 11k = 1 \Rightarrow k = \dfrac{1}{11}$
b The coefficient of the $t^2$ term is P(X = 2).
Therefore $P(X = 2) = \dfrac{4}{11}$
c $G_X'(t) = \dfrac{1}{11}(2 + 8t + 12t^2)$ and $G_X''(t) = \dfrac{1}{11}(8 + 24t)$
Therefore $E(X) = G_X'(1) = 2$ and
$\mathrm{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2 = \dfrac{32}{11} + 2 - 4 = \dfrac{10}{11}$
3 $G_Y(t) = pt + (1 - p)pt^2 + (1 - p)^2pt^3 + (1 - p)^3pt^4 + \dots$
$= pt\left(1 + (1 - p)t + (1 - p)^2t^2 + (1 - p)^3t^3 + \dots\right)$
This is an infinite geometric series with common ratio (1 − p)t and first term pt.
Therefore $G_Y(t) = \dfrac{pt}{1 - (1 - p)t}$
To find expectation and variance
$G_Y'(t) = \dfrac{p \times (1 - (1 - p)t) - pt \times (-(1 - p))}{(1 - (1 - p)t)^2} = \dfrac{p}{(1 - (1 - p)t)^2}$
$E(Y) = G_Y'(1) = \dfrac{p}{(1 - (1 - p))^2} = \dfrac{p}{p^2} = \dfrac{1}{p}$
$G_Y''(t) = \dfrac{2(1 - p)p}{(1 - (1 - p)t)^3}$ so $G_Y''(1) = \dfrac{2(1 - p)p}{(1 - (1 - p))^3} = \dfrac{2(1 - p)}{p^2}$
$\mathrm{Var}(Y) = \dfrac{2(1 - p)}{p^2} + \dfrac{1}{p} - \dfrac{1}{p^2} = \dfrac{1 - p}{p^2}$
4 a $G_Q(1) = c\left((1 + 1)^4 + 2(2 + 3)^2\right) = 66c = 1 \Rightarrow c = \dfrac{1}{66}$
b $G_Q'(t) = \dfrac{1}{66}\left(4(1 + t)^3 + 12(2 + 3t)\right)$
Therefore, $E(Q) = G_Q'(1) = \dfrac{1}{66}\left(4(1 + 1)^3 + 12(2 + 3)\right) = \dfrac{46}{33}$
c $G_Q''(t) = \dfrac{1}{66}\left(12(1 + t)^2 + 36\right)$
so, $\mathrm{Var}(Q) = G_Q''(1) + G_Q'(1) - [G_Q'(1)]^2 = \dfrac{1}{66}\left(12(1 + 1)^2 + 36\right) + \dfrac{46}{33} - \left(\dfrac{46}{33}\right)^2 = 0.724$
d $G_{P+Q}(t) = G_P(t) \times G_Q(t) = \dfrac{1}{4356}\left((1 + t)^4 + 2(2 + 3t)^2\right)^2$
5 a $G_X(1) = \dfrac{a}{b - 1} = 1 \Rightarrow b = a + 1$
b $G_X'(t) = \dfrac{a(b - t) + at}{(b - t)^2}$
$E(X) = G_X'(1) = \dfrac{a(b - 1) + a}{(b - 1)^2} = \dfrac{a^2 + a}{a^2} = \dfrac{a + 1}{a}$
c $G_X(t) = \dfrac{a}{b}t\left(1 - \dfrac{1}{b}t\right)^{-1} = \dfrac{a}{b}t\left[1 + \dfrac{t}{b} + \left(\dfrac{t}{b}\right)^2 + \dots\right]$
$= \dfrac{a}{a + 1}t + \dfrac{a}{(a + 1)^2}t^2 + \dfrac{a}{(a + 1)^3}t^3 + \dots$
d $p = \dfrac{a}{a + 1}$ and $1 - p = \dfrac{1}{a + 1}$
6 a $G_X'(t) = 4te^{2t^2 - 2}$ and $G_X''(t) = 4(1 + 4t^2)e^{2t^2 - 2}$
so $E(X) = G_X'(1) = 4$
and $\mathrm{Var}(X) = G_X''(1) + G_X'(1) - [G_X'(1)]^2 = 20 + 4 - 4^2 = 8$
b $G_{X+Y}(t) = G_X(t) \times G_Y(t) = e^{2t^2 - 2} \times e^{t - 1} = e^{2t^2 + t - 3} = e^{(t - 1)(2t + 3)}$
7 a $G_X(1) = \dfrac{3 + a}{5 - 1} = 1 \Rightarrow a = 1$
b The coefficient of $t^2$ represents P(X = 2).
$G_X(t) = \dfrac{1}{5}(3 + t^2)\left(1 - \dfrac{t}{5}\right)^{-1} = \dfrac{1}{5}(3 + t^2)\left(1 + \dfrac{t}{5} + \dfrac{t^2}{25} + \dots\right)$
Coefficient is: $\dfrac{1}{5}\left(1 + 3 \times \dfrac{1}{25}\right) = \dfrac{28}{125}$
c $G_X'(t) = \dfrac{2t(5 - t) + (3 + t^2)}{(5 - t)^2}$ so $E(X) = G_X'(1) = \dfrac{3}{4}$
d $H_Z(t) = [G_X(t)]^2 \Rightarrow H_Z(t) = \left[\dfrac{3 + t^2}{5 - t}\right]^2$
$H_Z'(t) = \dfrac{4t(3 + t^2)(5 - t)^2 + 2(5 - t)(3 + t^2)^2}{(5 - t)^4}$
so $E(Z) = H_Z'(1) = \dfrac{256 + 128}{256} = \dfrac{3}{2} = 2 \times E(X)$ as required
8 a $G_{D_1}(t) = k(t + 2t^2 + 3t^3 + 4t^4 + 5t^5 + 6t^6)$
When t = 1, 21k = 1, therefore $k = \dfrac{1}{21}$
b $G_{D_1}'(t) = \dfrac{1}{21}(1 + 4t + 9t^2 + 16t^3 + 25t^4 + 36t^5)$, $G_{D_1}'(1) = \dfrac{13}{3}$
and $G_{D_1}''(t) = \dfrac{1}{21}(4 + 18t + 48t^2 + 100t^3 + 180t^4)$, $G_{D_1}''(1) = \dfrac{50}{3}$
$E(X) = G_{D_1}'(1) = \dfrac{13}{3}$ and
$\mathrm{Var}(X) = G_{D_1}''(1) + G_{D_1}'(1) - [G_{D_1}'(1)]^2 = \dfrac{20}{9}$
c Let D be the sum of the scores on the three dice, then
$G_D(t) = \dfrac{1}{9261}(t + 2t^2 + 3t^3 + 4t^4 + 5t^5 + 6t^6)^3$
$P(D \ge 16)$ is given by the coefficients of the $t^{16}$, $t^{17}$ and $t^{18}$ terms.
$P(D \ge 16) = \dfrac{1}{9261}\left((432 + 450) + 540 + 216\right) = \dfrac{26}{147}$
9 a $G_V(t) = \dfrac{a}{b}t^2 + \dfrac{2}{b}t^{-2}$
$G_V(1) = \dfrac{a + 2}{b} = 1 \Rightarrow b = a + 2$
$G_V'(t) = \dfrac{2a}{b}t - \dfrac{4}{b}t^{-3}$
$E(V) = G_V'(1) = \dfrac{2a - 4}{b} = -\dfrac{2}{3} \Rightarrow b = 6 - 3a$
Solving simultaneously yields a = 1 and b = 3
b Let V be the sum of six independent observations, then
$H_V(t) = [G_V(t)]^6 = \left[\dfrac{1}{3}t^2 + \dfrac{2}{3}t^{-2}\right]^6$
P(V = 0) is the coefficient of the constant term. Using the binomial expansion, this coefficient is $\dbinom{6}{3}\left(\dfrac{1}{3}\right)^3\left(\dfrac{2}{3}\right)^3 = \dfrac{160}{729}$
10 a $G_Y(1) = k(1 + a) = 1 \Rightarrow k = \dfrac{1}{1 + a}$
$G_Y'(t) = 3akt^2$ and $G_Y''(t) = 6akt$
$\mathrm{Var}(Y) = G_Y''(1) + G_Y'(1) - [G_Y'(1)]^2 = 6ak + 3ak - (3ak)^2 = 9ak(1 - ak) = 2$
Substituting $k = \dfrac{1}{1 + a}$ gives
$\dfrac{9a}{1 + a}\left(1 - \dfrac{a}{1 + a}\right) = \dfrac{9a}{(1 + a)^2} = 2$
$2a^2 - 5a + 2 = 0 \Rightarrow a = \dfrac{1}{2}$ or 2 and $k = \dfrac{2}{3}$ or $\dfrac{1}{3}$
b $H_Z(t) = k^{10}(1 + at^3)^{10}$
$P(Z \le 3)$ is given by the sum of coefficients of the constant term, t, $t^2$ and $t^3$. Using a binomial expansion
$H_Z(t) = k^{10}(1 + 10at^3 + \dots)$
For $a = \dfrac{1}{2}$, $k = \dfrac{2}{3}$: $P(Z \le 3) = 0.104$
For $a = 2$, $k = \dfrac{1}{3}$: $P(Z \le 3) = 0.000356$
11 a $E(X) = 0 \times (1 - p) + 1 \times p = p$ and
$\mathrm{Var}(X) = [0^2 \times (1 - p) + 1^2 \times p] - p^2 = p(1 - p)$
b $G_X(t) = P(X = 0) \times t^0 + P(X = 1) \times t^1 = (1 - p) + pt$
c $H_Y(t) = e^{-\lambda} + e^{-\lambda}\lambda t + \dfrac{e^{-\lambda}\lambda^2}{2}t^2 + \dfrac{e^{-\lambda}\lambda^3}{6}t^3 + \dots + \dfrac{e^{-\lambda}\lambda^r}{r!}t^r + \dots$
$= e^{-\lambda}\left[1 + \lambda t + \dfrac{(\lambda t)^2}{2!} + \dfrac{(\lambda t)^3}{3!} + \dots\right]$
Using the Maclaurin series result
$= e^{-\lambda}e^{\lambda t} = e^{\lambda t - \lambda} = e^{-\lambda(1 - t)}$ as required.
d $H_Y'(t) = \lambda e^{-\lambda(1 - t)}$ and $H_Y''(t) = \lambda^2 e^{-\lambda(1 - t)}$
$H_Y'(1) = \lambda$ and $H_Y''(1) = \lambda^2$ so
$E(Y) = H_Y'(1) = \lambda$ and
$\mathrm{Var}(Y) = H_Y''(1) + H_Y'(1) - [H_Y'(1)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$ as required.
e $K_Z(t) = H_Y(G_X(t)) = e^{-\lambda(1 - ((1 - p) + pt))} = e^{\lambda p(t - 1)}$ as required
f $K_Z'(t) = \lambda p e^{\lambda p(t - 1)}$ and $K_Z''(t) = (\lambda p)^2 e^{\lambda p(t - 1)}$
$K_Z'(1) = \lambda p$ and $K_Z''(1) = (\lambda p)^2$ so
$E(Z) = K_Z'(1) = \lambda p$ and
$\mathrm{Var}(Z) = (\lambda p)^2 + \lambda p - (\lambda p)^2 = \lambda p$
g Po(λp)
Let X be the score of the voter:
$G_X(t) = 0.4t + 0.25 + 0.35t^{-1}$
If sample is random, then scores for each voter will be independent, so let Y be the total score of three voters, hence
$G_Y(t) = (0.4t + 0.25 + 0.35t^{-1})^3$
In the sample of three, these provisos imply that $Y \ge 2$ (2 yes, 1 non-vote or 3 yes votes).
Therefore, $P(Y \ge 2)$ is the sum of the coefficients of the $t^2$ and $t^3$ terms.
$P(Y \ge 2) = 0.4^3 + 3 \times 0.4^2 \times 0.25 = 0.184$
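Several answers in this chapter read a probability off the coefficients of a product of PGFs. Multiplying PGFs is polynomial multiplication, so the coefficients can be generated mechanically. The sketch below stores a PGF as a dictionary from exponent to probability (an illustrative representation, not the book's method) and rechecks the three-dice result of exam-style question 8 and the three-voter result.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two PGFs stored as {exponent: coefficient} dictionaries."""
    out = {}
    for i, x in a.items():
        for j, y in b.items():
            out[i + j] = out.get(i + j, 0) + x * y
    return out


def poly_pow(a, n):
    """PGF of the sum of n independent copies of the variable with PGF a."""
    result = {0: 1}
    for _ in range(n):
        result = poly_mul(result, a)
    return result


# Question 8: biased dice with P(score = r) = r/21, three dice summed.
die = {r: Fraction(r, 21) for r in range(1, 7)}
p_dice = sum(c for e, c in poly_pow(die, 3).items() if e >= 16)

# Voter scores: +1 with probability 0.4, 0 with 0.25, -1 with 0.35; three voters summed.
voter = {1: Fraction(2, 5), 0: Fraction(1, 4), -1: Fraction(7, 20)}
p_voters = sum(c for e, c in poly_pow(voter, 3).items() if e >= 2)

print(p_dice, p_voters)
```

Exact fractions are used so the results can be compared directly with 26/147 and 0.184 = 23/125.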
WORKED SOLUTIONS
Summary Review
Warm-up Questions
1 Width of the confidence interval is $\dfrac{2\sigma z}{\sqrt{n}} < 0.2$
σ = 0.17 and for a 99% confidence interval z = 2.576
$\dfrac{2 \times 0.17 \times 2.576}{\sqrt{n}} < 0.2$
$\sqrt{n} > 4.3792\ldots$
$n > 19.17\ldots$
nMIN = 20
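The minimum sample size follows from rearranging the width inequality for n. A short check, with illustrative variable names:

```python
import math

sigma, z, max_width = 0.17, 2.576, 0.2

# Width 2 * sigma * z / sqrt(n) < max_width  =>  n > (2 * sigma * z / max_width) ** 2
bound = (2 * sigma * z / max_width) ** 2
n_min = math.floor(bound) + 1  # smallest integer strictly greater than the bound
print(round(bound, 2), n_min)
```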
2 H0: μ = 17
H1: μ ≠ 17
This is a two-tailed test with 2.5% in each tail ⇒ z = ±1.96
$\bar{x} = \dfrac{17.8 + 22.4 + 16.3 + 23.1 + 11.4}{5} = 18.2$
$\dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{18.2 - 17}{2.4 / \sqrt{5}} = 1.12$
−1.96 < 1.12 < 1.96 ⇒ not in the critical region.
Accept H0: μ = 17. Accept the lecturer’s claim.
3 $C \sim N(91, 3.2^2)$ and $S \sim N(72, 2.6^2)$
$X = C_1 + \dots + C_6 + S_1 + \dots + S_6 + 550$
$E(X) = 6 \times 91 + 6 \times 72 + 550 = 1528$
$\mathrm{Var}(X) = 6 \times 3.2^2 + 6 \times 2.6^2 + 0^2 = 102$
$X \sim N(1528, 102)$
$P(X > 1550) = P\left(Z > \dfrac{1550 - 1528}{\sqrt{102}}\right)$
$= P(Z > 2.178)$
$= 1 - P(Z \le 2.178)$
$= 1 - 0.9853 = 0.0147$
A Level Questions
1 $\sum x = 5$ and $\sum x^2 = 11$ for N observations
$\bar{x} = \dfrac{5}{N}$, $s_x^2 = \dfrac{1}{N - 1}\left(11 - \dfrac{5^2}{N}\right)$
$\sum y = 10$ and $\sum y^2 = 160$ for 10 observations
$\bar{y} = \dfrac{10}{10} = 1$, $s_y^2 = \dfrac{1}{9}\left(160 - \dfrac{10^2}{10}\right) = \dfrac{150}{9}$
So, the pooled estimate is:
$s_p^2 = \dfrac{(N - 1) \times \frac{1}{N - 1}\left(11 - \frac{25}{N}\right) + 9\left(\frac{150}{9}\right)}{N + 10 - 2}$
$s_p^2 = \dfrac{11 - \frac{25}{N} + 150}{N + 8}$
Given that $s_p^2 = 12$
$12(N + 8) = 11 - \dfrac{25}{N} + 150$
$12N + 96 = 161 - \dfrac{25}{N}$
$12N^2 - 65N + 25 = 0$
$(12N - 5)(N - 5) = 0$
$N = \dfrac{5}{12}$ or $N = 5$
We know that N must be an integer, so N = 5.
2 H0: median = 400 ml

H1: median < 400 ml
The deviations from the median are: −10, −3, −15, 10, −8, −30, 30, −3, −12, −9, −25, 42, −4, −28, −19, 4.
There are 4 positives and 12 negatives.
Under H0, $X \sim B(16, 0.5)$
$P(X \ge 12) = 0.0384$
0.0384 < 0.05 ⇒ this result is in the critical region. Reject H0 and accept H1.
The customers’ complaints are justified.
3 $F(x) = \begin{cases} 0 & x < 1 \\ \frac{1}{63}(x^3 - 1) & 1 \le x \le 4 \\ 1 & x > 4 \end{cases}$
$Y = X^2 \Rightarrow X = \sqrt{Y}$
New limits are: 1 → 1, 4 → 16
For $1 \le y \le 16$, $G(y) = \dfrac{1}{63}\left((\sqrt{y})^3 - 1\right) = \dfrac{1}{63}\left(y^{3/2} - 1\right)$
$G(y) = \begin{cases} 0 & y < 1 \\ \frac{1}{63}\left(y^{3/2} - 1\right) & 1 \le y \le 16 \\ 1 & y > 16 \end{cases}$
Differentiating:
$g(y) = \begin{cases} \frac{1}{42}y^{1/2} & 1 \le y \le 16 \\ 0 & \text{otherwise} \end{cases}$
i At the median, G(y) = 0.5 and y = m
$\dfrac{1}{63}\left(m^{3/2} - 1\right) = 0.5$
$m^{3/2} = 32.5$
m = 10.18
ii $E(Y) = \int_1^{16} y\,g(y)\,dy = \dfrac{1}{42}\int_1^{16} y^{3/2}\,dy = \dfrac{1}{105}\left[y^{5/2}\right]_1^{16} = \dfrac{1}{105}[1024 - 1] = 9.74$
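4 i $G_X(1) = 1 \Rightarrow \dfrac{2 + a}{5} = 1 \Rightarrow a = 3$
ii For P(X = 2), we need the coefficient of $t^2$.
$\dfrac{2 + 3t^3}{7 - 2t} = (2 + 3t^3)(7 - 2t)^{-1} = \dfrac{1}{7}(2 + 3t^3)\left(1 - \dfrac{2}{7}t\right)^{-1}$
$\dfrac{2 + 3t^3}{7 - 2t} = \dfrac{1}{7}(2 + 3t^3)\left[1 + (-1)\left(-\dfrac{2}{7}t\right) + \dfrac{(-1)(-2)}{2!}\left(-\dfrac{2}{7}t\right)^2 + \dots\right]$
The term in $t^2$ is:
$\dfrac{1}{7} \times 2 \times \dfrac{(-1)(-2)}{2!}\left(-\dfrac{2}{7}t\right)^2 = \dfrac{2}{7} \times \dfrac{4}{49}t^2 = \dfrac{8}{343}t^2$
So $P(X = 2) = \dfrac{8}{343}$
iii $G_X'(t) = \dfrac{(7 - 2t)9t^2 - (2 + 3t^3)(-2)}{(7 - 2t)^2}$
$E(X) = G_X'(1) = \dfrac{5 \times 9 - 5 \times (-2)}{25} = \dfrac{11}{5}$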
5 Let d = score before eating fruit – score after eating fruit
H0: µd = 0 there is no difference between the two sets of results
H1: µd < 0 there is an increase in the results
This is a one-tailed test with p = 0.05, v = 13, so the critical value is −1.771
Calculating the differences and squares:
Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Before 15 10 7 12 18 16 15 13 10 5 19 20 14 15
After 16 7 11 14 19 15 12 15 11 7 18 19 19 18
Differences −1 3 −4 −2 −1 1 3 −2 −1 −2 1 1 −5 −3
Squared 1 9 16 4 1 1 9 4 1 4 1 1 25 9
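$\sum d = -12$ and $\sum d^2 = 86$
$\bar{x}_d = \dfrac{-12}{14} = -\dfrac{6}{7}$
$s_d^2 = \dfrac{1}{13}\left(86 - \dfrac{(-12)^2}{14}\right) = \dfrac{1}{13} \times \dfrac{530}{7} = \dfrac{530}{91}$
The test statistic is:
$\dfrac{-\frac{6}{7}}{\sqrt{\frac{530}{91} \div 14}} = -1.329$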
−1.329 > −1.771 ⇒ not in the critical region. Accept H0
There is insufficient evidence to claim that eating fruit improves mathematical ability.
The claim is not justified.
6 H0: gender and preferred brand are independent
H1: gender and preferred brand are not independent
Table of expected frequencies:
A B C
Male 28.57 37.71 13.71
Female 21.43 28.29 10.29
$X^2 = \sum \dfrac{(O - E)^2}{E}$
$= \dfrac{(32 - 28.57)^2}{28.57} + \dfrac{(36 - 37.71)^2}{37.71} + \dfrac{(12 - 13.71)^2}{13.71} + \dfrac{(18 - 21.43)^2}{21.43} + \dfrac{(30 - 28.29)^2}{28.29} + \dfrac{(12 - 10.29)^2}{10.29}$
= 0.4118 + 0.07754 + 0.2133 + 0.5490 + 0.1034 + 0.2842
= 1.639

For a 5% test with v = 1 × 2 = 2, the critical value is 5.991.
1.639 < 5.991 ⇒ not in the critical region. Accept H0.
Gender and brand are independent. There is no difference in the preferences between males and females.
If the sample is n times larger, then χ2 will also be n times larger. For χ2 to be in the critical region it must be
greater than 5.991.
1.639n > 5.991
n > 3.655
Since n is an integer, nMIN = 4.
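The statistic and the scaling argument in question 6 can be checked from the observed counts that appear in the X² expression. A minimal sketch with illustrative names; the critical value 5.991 is taken from tables, as in the solution.

```python
observed = [
    [32, 36, 12],  # male: brands A, B, C
    [18, 30, 12],  # female: brands A, B, C
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total x column total / grand total.
x2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        x2 += (o - e) ** 2 / e

critical = 5.991  # chi-squared critical value, 2 degrees of freedom, 5% level (from tables)
n_min = int(critical // x2) + 1  # smallest integer n with n * x2 > critical
print(round(x2, 3), n_min)
```

With unrounded expected frequencies the statistic is about 1.642 rather than 1.639; the minimum multiple is unaffected.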
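7 $\int_2^x \dfrac{x}{6}\,dx = \left[\dfrac{x^2}{12}\right]_2^x = \dfrac{x^2}{12} - \dfrac{4}{12} = \dfrac{1}{12}(x^2 - 4)$
$F(x) = \begin{cases} 0 & x < 2 \\ \frac{1}{12}(x^2 - 4) & 2 \le x \le 4 \\ 1 & x > 4 \end{cases}$
$Y = X^3 \Rightarrow X = \sqrt[3]{Y}$
New limits are: 2 → 8, 4 → 64
For $8 \le y \le 64$, $G(y) = \dfrac{1}{12}\left((\sqrt[3]{y})^2 - 4\right) = \dfrac{1}{12}\left(y^{2/3} - 4\right)$
So, the CDF is:
$G(y) = \begin{cases} 0 & y < 8 \\ \frac{1}{12}\left(y^{2/3} - 4\right) & 8 \le y \le 64 \\ 1 & y > 64 \end{cases}$
Differentiating:
$g(y) = \begin{cases} \frac{1}{18}y^{-1/3} & 8 \le y \le 64 \\ 0 & \text{otherwise} \end{cases}$
$E(Y) = \int_8^{64} y\,g(y)\,dy = \dfrac{1}{18}\int_8^{64} y^{2/3}\,dy = \dfrac{1}{30}\left[y^{5/3}\right]_8^{64} = \dfrac{1}{30}\left[4^5 - 2^5\right] = 33.1$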
8 i Let the difference be after minus before.

H0: median difference = 0 There is no change in the amount of litter in the street
H1: median difference < 0 There has been a reduction in the amount of litter in the street
Calculating the differences and ranks gives:
Site A B C D E F G H I J
Before poster campaign 85 146 137 120 79 95 153 144 108 127
After poster campaign 78 120 110 128 61 65 121 131 88 104
Difference −7 −26 −27 8 −18 −30 −32 −13 −20 −23
Rank −1 −7 −8 2 −4 −9 −10 −3 −5 −6
Sum of the positive ranks: P = 2
Absolute sum of the negative ranks: Q = 53
Therefore T = 2.
For a one-tail test at the 1% level, T ≤ 5 to reject H0.
Since T = 2 ≤ 5 we can reject H0 and accept H1.
There has been a reduction in the amount of litter in the street.
ii The test tells us if there has been a significant change, but it does not establish cause and effect. In this
case, the reduced amount of litter may not be as a result of the poster campaign.
9 $\lambda = \dfrac{(0 \times 7) + (1 \times 20) + (2 \times 39) + (3 \times 16) + (4 \times 14) + (5 \times 2) + (6 \times 1) + (7 \times 1)}{100}$
$\lambda = \dfrac{225}{100} = 2.25$
H0: data can be modelled by Po(2.25)
H1: data cannot be modelled by Po(2.25)
The expected values are calculated using $100 \times \dfrac{e^{-2.25} \times 2.25^r}{r!}$, which gives:
10.540, 23.715, 26.679, 20.009, 11.255, 5.065, 1.899, 0.6105, 0.2275
The last three expected values are too small as they must be greater than 5, so the final four categories are
combined to get an observed value of 4 and an expected value of 7.802.
$X^2 = \sum \dfrac{(O - E)^2}{E} = 1.189 + 0.5820 + 5.690 + 0.803 + 0.6695 + 1.853 = 10.8$
At the 2.5% level with v = 4, the critical value is 11.14
10.8 < 11.14 ⇒ Accept H0. The data can be modelled by Po(2.25).
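The goodness-of-fit calculation in question 9 can be reproduced from the observed frequencies. This sketch assumes, as in the solution, that the categories for 5 or more are pooled into one class whose expected value is the remainder of the total; the variable names are illustrative.

```python
import math

observed = [7, 20, 39, 16, 14, 2, 1, 1]  # frequencies for r = 0, 1, ..., 7
n = sum(observed)
lam = sum(r * o for r, o in enumerate(observed)) / n  # sample mean used as lambda

# Expected frequencies for r = 0..4, then one pooled class for r >= 5.
expected = [n * math.exp(-lam) * lam ** r / math.factorial(r) for r in range(5)]
expected.append(n - sum(expected))
pooled_observed = observed[:5] + [sum(observed[5:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(pooled_observed, expected))
print(lam, [round(e, 3) for e in expected], round(x2, 2))
```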
10 i $G_Y(t) = k(5t - at^4) \Rightarrow G_Y'(t) = k(5 - 4at^3)$
$G_Y(1) = 1 \Rightarrow 1 = k(5 - a)$ [1]
$E(Y) = 2 \Rightarrow 2 = k(5 - 4a)$ [2]
[2] ÷ [1]:
$2 = \dfrac{5 - 4a}{5 - a}$
$10 - 2a = 5 - 4a \Rightarrow a = -\dfrac{5}{2}$
Substituting in [1]:
$1 = k\left(5 + \dfrac{5}{2}\right) \Rightarrow k = \dfrac{2}{15}$
ii $G_Y(t) = \dfrac{2}{15}\left(5t + \dfrac{5}{2}t^4\right) = \dfrac{1}{3}(2t + t^4)$
$G_Y'(t) = \dfrac{1}{3}(2 + 4t^3)$
$G_Y''(t) = \dfrac{1}{3}(12t^2) = 4t^2$
$\mathrm{Var}(Y) = G_Y''(1) + G_Y'(1) - [G_Y'(1)]^2$
$G_Y'(1) = E(Y) = 2$
$G_Y''(1) = 4 \times 1^2 = 4$
$\mathrm{Var}(Y) = 4 + 2 - 2^2 = 2$
iii $H_Z(t) = \left(\dfrac{1}{3}\right)^3(2t + t^4)^3$
$= \dfrac{1}{27}\left[(2t)^3 + 3(2t)^2(t^4) + 3(2t)(t^4)^2 + (t^4)^3\right]$
$= \dfrac{1}{27}\left[8t^3 + 12t^6 + 6t^9 + t^{12}\right]$
$P(Z \le 6)$ is the sum of the coefficients of t with powers ≤ 6.
$P(Z \le 6) = \dfrac{8}{27} + \dfrac{12}{27} = \dfrac{20}{27}$
11 i $\bar{x} = \dfrac{2478}{55} = 45.05$, $s_x^2 = \dfrac{343.75}{55} = 6.25$
$\bar{y} = \dfrac{3981}{70} = 56.87$, $s_y^2 = \dfrac{857.5}{70} = 12.25$
For a 90% confidence interval, we need p = 0.95 ⇒ z = 1.645
$(45.05 - 56.87) \pm 1.645 \times \sqrt{\dfrac{6.25}{55} + \dfrac{12.25}{70}}$
$-12.7 \le \mu_x - \mu_y \le -10.9$
ii $H_0: \mu_x - \mu_y = 0$
$H_1: \mu_x - \mu_y \ne 0$
The test statistic is
$\dfrac{45.05 - 56.87}{\sqrt{\frac{6.25}{55} + \frac{12.25}{70}}} = -22.0$
For a two-tail test at the 10% significance level, z = ±1.645
−22.0 < −1.645 ⇒ it is in the critical region. Reject H0.
μx is not the same as μy.
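12 For $1 \le x \le 3$,
$F(x) = \int \dfrac{1}{2}\,dx = \dfrac{x}{2} + c$
When x = 1, $F(x) = 0 \Rightarrow \dfrac{1}{2} + c = 0 \Rightarrow c = -\dfrac{1}{2}$
$F(x) = \dfrac{x}{2} - \dfrac{1}{2} = \dfrac{1}{2}(x - 1)$
$F(x) = \begin{cases} 0 & x < 1 \\ \frac{1}{2}(x - 1) & 1 \le x \le 3 \\ 1 & x > 3 \end{cases}$
$G(y) = P(Y \le y)$
$Y = X^3 \Rightarrow G(y) = P(X^3 \le y) = P\left(X \le y^{1/3}\right) = F\left(y^{1/3}\right)$
$G(y) = \begin{cases} 0 & y < 1 \\ \frac{1}{2}\left(y^{1/3} - 1\right) & 1 \le y \le 27 \\ 1 & y > 27 \end{cases}$
For $1 \le y \le 27$, $G(y) = \dfrac{1}{2}\left(y^{1/3} - 1\right) \Rightarrow g(y) = \dfrac{1}{6}y^{-2/3} = \dfrac{1}{6y^{2/3}}$
$g(y) = \begin{cases} \dfrac{1}{6y^{2/3}} & 1 \le y \le 27 \\ 0 & \text{otherwise} \end{cases}$
[Sketch of g(y) against y for 1 ≤ y ≤ 27: the curve falls from (1, 1/6) to (27, 1/54).]
$E(Y) = \int_1^{27} y\,g(y)\,dy = \int_1^{27} \dfrac{1}{6}y^{1/3}\,dy = \left[\dfrac{3}{4} \times \dfrac{1}{6}y^{4/3}\right]_1^{27} = \dfrac{3}{24}\left[y^{4/3}\right]_1^{27} = \dfrac{3}{24}(81 - 1) = 10$
$P(\text{median} \le Y \le \text{mean}) = |P(Y < 10) - 0.5| = |G(10) - 0.5| = \left|\dfrac{1}{2}\left(10^{1/3} - 1\right) - 0.5\right| = 0.0772$ (3 s.f.)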
13 $\sum x = 2623$, $\sum x^2 = 1\,376\,081$
$\bar{x} = \dfrac{2623}{5} = 524.6$
$s^2 = \dfrac{1}{4}\left(1\,376\,081 - \dfrac{2623^2}{5}\right) = 13.8$
95% confidence interval ⇒ 2.5% in each tail ⇒ p = 0.975
There are 4 degrees of freedom ⇒ $t_{4,\,0.975} = 2.776$ (from tables)
Therefore, the confidence interval is: $524.6 \pm 2.776\sqrt{\dfrac{13.8}{5}} = 524.6 \pm 4.6118$
[520, 529] to 3 s.f.
Let the first sample be sample A and the second sample be sample B.
H0: μA = μB
H1: μA ≠ μB
For sample B: $\sum x = 5216$, $\sum x^2 = 2\,720\,780$, $\bar{x} = \dfrac{5216}{10} = 521.6$
$s_B^2 = \dfrac{1}{9}\left(2\,720\,780 - \dfrac{5216^2}{10}\right) = 12.71$
For the combined sample:
$s_p^2 = \dfrac{4 \times 13.8 + 9 \times 12.71}{13} = 13.05$
$T = \dfrac{524.6 - 521.6}{\sqrt{13.05\left(\frac{1}{5} + \frac{1}{10}\right)}} = 1.516$
10% significance level and 2-tail test ⇒ p = 0.95
1.5164 < 1.771 ⇒ not in the critical region ⇒ accept H0.
There is no significant evidence of a difference in the population means before and after the adjustments.
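14 i $\int_0^\infty Ae^{-\lambda t}\,dt = 1$
$\left[-\dfrac{A}{\lambda}e^{-\lambda t}\right]_0^\infty = 1$
$[0] - \left[-\dfrac{A}{\lambda}\right] = 1$
$\dfrac{A}{\lambda} = 1$
$A = \lambda$
ii $\int_0^1 \lambda e^{-\lambda t}\,dt \approx \dfrac{16}{100}$
$\left[-e^{-\lambda t}\right]_0^1 \approx \dfrac{16}{100}$
$\left[-e^{-\lambda}\right] - [-1] \approx \dfrac{16}{100}$
$e^{-\lambda} \approx 1 - \dfrac{16}{100}$
$-\lambda \approx \ln\left(1 - \dfrac{16}{100}\right)$
$\lambda \approx -\ln\left(1 - \dfrac{16}{100}\right) = 0.174$
For median:
$\int_0^T 0.174e^{-0.174t}\,dt = 0.5$
$\left[-e^{-0.174t}\right]_0^T = 0.5$
$\left[-e^{-0.174T}\right] - [-1] = 0.5$
$e^{-0.174T} = 0.5$
$-0.174T = \ln 0.5$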
T = 3.98 years (3 s.f.)
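The rate and the median in question 14 follow from two logarithms, so they are easy to recheck. A small sketch with illustrative names:

```python
import math

p_within_one = 16 / 100  # estimated probability of the event within the first time unit

# Solve 1 - exp(-lam) = p for lam, then exp(-lam * T) = 0.5 for the median T.
lam = -math.log(1 - p_within_one)
median = math.log(2) / lam
print(round(lam, 3), round(median, 2))
```

Using the unrounded rate gives a median of about 3.976, while the rounded rate 0.174 gives about 3.984; both round to 3.98.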
15 $G(y) = P(Y \le y)$
$Y = X^3 \Rightarrow G(y) = P(X^3 \le y) = P\left(X \le y^{1/3}\right) = F\left(y^{1/3}\right)$
For $1 \le x \le 4$, $F(x) = \int \dfrac{2}{15}x\,dx = \dfrac{1}{15}x^2 + c$
When x = 1, $F(x) = 0 \Rightarrow \dfrac{1}{15} + c = 0 \Rightarrow c = -\dfrac{1}{15}$
$F(x) = \dfrac{x^2}{15} - \dfrac{1}{15} = \dfrac{1}{15}(x^2 - 1)$
$G(y) = \begin{cases} 0 & y < 1 \\ \frac{1}{15}\left(y^{2/3} - 1\right) & 1 \le y \le 64 \\ 1 & y > 64 \end{cases}$
i Let m be the median value of Y.
G(m) = 0.5
$\dfrac{1}{15}\left(m^{2/3} - 1\right) = 0.5$
$m^{2/3} - 1 = 7.5$
$m^{2/3} = 8.5$
m = 24.8 (3 s.f.)
ii For $1 \le y \le 64$, $g(y) = \dfrac{1}{15}\left(\dfrac{2}{3}y^{-1/3}\right) = \dfrac{2}{45}y^{-1/3}$
$E(Y) = \int_1^{64} y\,g(y)\,dy = \dfrac{2}{45}\int_1^{64} y^{2/3}\,dy$
$= \dfrac{2}{45}\left[\dfrac{3}{5}y^{5/3}\right]_1^{64} = \dfrac{2}{75}\left[y^{5/3}\right]_1^{64} = \dfrac{2}{75}(1024 - 1)$
= 27.28 = 27.3 (3 s.f.)
16 H0: μO – μI = 0
H1: μO – μI ≠ 0
Outdoor times – Indoor times: 0.1, 2.1, –0.1, 0.2, 2.4, 0.5, 2.8, –2.6
$\sum x = 5.4$, $\sum x^2 = 25.08$, $\bar{x} = \dfrac{5.4}{8} = 0.675$
$s = \sqrt{\dfrac{1}{7}\left(25.08 - \dfrac{5.4^2}{8}\right)} = 1.750$
For the combined sample: $t = \dfrac{0.675}{1.750 / \sqrt{8}} = 1.091$
There is no significant evidence that there is a difference between the indoor and outdoor swimming times.
17 $\bar{x} = \dfrac{222.8}{10} = 22.28$
$s = \sqrt{\dfrac{4.12}{9}} = 0.6766$
Therefore, the confidence interval is:
$22.28 \pm 2.262\sqrt{\dfrac{0.6766^2}{10}} = 22.28 \pm 0.4840$
[21.8, 22.8] to 3 s.f.
18 $E_{2 \le x < 3} = 80\int_2^3 \dfrac{3}{x^2}\,dx = 80\left[-\dfrac{3}{x}\right]_2^3 = 80\left([-1] - \left[-\dfrac{3}{2}\right]\right) = 80 \times 0.5 = 40$
$E_{3 \le x < 4} = 80\int_3^4 \dfrac{3}{x^2}\,dx = 80\left[-\dfrac{3}{x}\right]_3^4 = 80\left(\left[-\dfrac{3}{4}\right] - [-1]\right) = 80 \times 0.25 = 20$
$E_{4 \le x < 5} = 80\int_4^5 \dfrac{3}{x^2}\,dx = 80\left[-\dfrac{3}{x}\right]_4^5 = 80\left(\left[-\dfrac{3}{5}\right] - \left[-\dfrac{3}{4}\right]\right) = 80 \times 0.15 = 12$
$E_{5 \le x < 6} = 80\int_5^6 \dfrac{3}{x^2}\,dx = 80\left[-\dfrac{3}{x}\right]_5^6 = 80\left(\left[-\dfrac{3}{6}\right] - \left[-\dfrac{3}{5}\right]\right) = 80 \times 0.1 = 8$
H0: $f(x) = \dfrac{3}{x^2}$ fits the data.
H1: $f(x) = \dfrac{3}{x^2}$ does not fit the data.
10% significance level ⇒ p = 0.9
There are 3 degrees of freedom ⇒ $\chi^2_{3,\,0.9} = 6.251$ (from tables)
$X^2 = \dfrac{(36 - 40)^2}{40} + \dfrac{(29 - 20)^2}{20} + \dfrac{(9 - 12)^2}{12} + \dfrac{(6 - 8)^2}{8}$
= 0.4 + 4.05 + 0.75 + 0.5 = 5.7
5.7 < 6.251 ⇒ accept H0
$f(x) = \dfrac{3}{x^2}$ fits the data
19 $\bar{x} = \dfrac{42.5}{8} = 5.3125$
$s = \sqrt{\dfrac{15.519}{7}} = 1.4890$
$T = \dfrac{5.3125 - 4.5}{1.4890 / \sqrt{8}} = 1.5434$
H0: μ = 4.5
H1: μ > 4.5
There is not significant evidence that μ is greater than 4.5
$5.3125 \pm 2.365\sqrt{\dfrac{1.489^2}{8}} = 5.3125 \pm 1.2450$
[4.07, 6.56] to 3 s.f.
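20 For $1 \le x \le 3$,
$F(x) = \int \dfrac{1}{2}\,dx = \dfrac{x}{2} + c$
When x = 1, $F(x) = 0 \Rightarrow \dfrac{1}{2} + c = 0 \Rightarrow c = -\dfrac{1}{2}$
$F(x) = \dfrac{x}{2} - \dfrac{1}{2} = \dfrac{1}{2}(x - 1)$
$F(x) = \begin{cases} 0 & x < 1 \\ \frac{1}{2}(x - 1) & 1 \le x \le 3 \\ 1 & x > 3 \end{cases}$
i $G(y) = P(Y \le y)$
$Y = X^3 \Rightarrow G(y) = P(X^3 \le y) = P\left(X \le y^{1/3}\right) = F\left(y^{1/3}\right)$
$G(y) = \begin{cases} 0 & y < 1 \\ \frac{1}{2}\left(y^{1/3} - 1\right) & 1 \le y \le 27 \\ 1 & y > 27 \end{cases}$
For $1 \le y \le 27$, $G(y) = \dfrac{1}{2}\left(y^{1/3} - 1\right) \Rightarrow g(y) = \dfrac{1}{6}y^{-2/3} = \dfrac{1}{6y^{2/3}}$
$g(y) = \begin{cases} \dfrac{1}{6y^{2/3}} & 1 \le y \le 27 \\ 0 & \text{otherwise} \end{cases}$
ii $E(Y) = \int_1^{27} y\,g(y)\,dy = \int_1^{27} \dfrac{1}{6}y^{1/3}\,dy = \left[\dfrac{3}{4} \times \dfrac{1}{6}y^{4/3}\right]_1^{27} = \dfrac{3}{24}\left[y^{4/3}\right]_1^{27} = \dfrac{3}{24}(81 - 1) = 10$
$E(Y^2) = \int_1^{27} y^2 g(y)\,dy = \int_1^{27} \dfrac{1}{6}y^{4/3}\,dy = \left[\dfrac{3}{7} \times \dfrac{1}{6}y^{7/3}\right]_1^{27} = \dfrac{1}{14}\left[y^{7/3}\right]_1^{27} = \dfrac{1}{14}(2187 - 1) = 156.14$
$\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = 156.14 - 10^2 = 56.1$ (3 s.f.)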
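Because Y = X³ with X uniform on [1, 3], the moments of Y are moments of X, which gives an independent check on E(Y) and Var(Y) without the change of variable. The helper name below is illustrative.

```python
from fractions import Fraction


def uniform_moment(a, b, k):
    """Return E(X**k) for X uniform on [a, b], i.e. the integral of x**k / (b - a)."""
    return Fraction(b ** (k + 1) - a ** (k + 1), (k + 1) * (b - a))


e_y = uniform_moment(1, 3, 3)   # E(Y) = E(X^3)
e_y2 = uniform_moment(1, 3, 6)  # E(Y^2) = E(X^6)
var_y = e_y2 - e_y ** 2
print(e_y, float(e_y2), float(var_y))
```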
21 $s = \sqrt{\dfrac{23.2}{50} + \dfrac{27.8}{60}} = 0.96298$
$\bar{x} - \bar{y} = 25.4 - 23.6 = 1.8$
$Z = \dfrac{1.8}{0.96298} = 1.8692$
Using the normal tables in reverse: z = 1.8692 ⇒ P(Z ≤ z) = 0.9692
Two-tail test at α% significance level ⇒ $\dfrac{\alpha}{2}$% in each tail.
$\dfrac{\alpha}{2} = (1 - 0.9692) \times 100$
$\dfrac{\alpha}{2} = 3.08$
α ≥ 6.16%
22 Total number of goals scored = (0 × 12) + (1 × 16) + (2 × 31) + (3 × 25) + (4 × 13) + (5 × 3) = 220
Therefore, the average number of goals scored/match is 220 = 2.2 ⇒ λ = 2.2
100
H0: Total number of goals scored can be modelled by Po(2.2)
H1: Total number of goals scored cannot be modelled by Po(2.2)
The expected numbers of goals are:
$E_0 = 100 \times \dfrac{2.2^0 \times e^{-2.2}}{0!} = 11.080$
$E_1 = 100 \times \dfrac{2.2^1 \times e^{-2.2}}{1!} = 24.377$
$E_2 = 100 \times \dfrac{2.2^2 \times e^{-2.2}}{2!} = 26.814$
$E_3 = 100 \times \dfrac{2.2^3 \times e^{-2.2}}{3!} = 19.664$
$E_4 = 100 \times \dfrac{2.2^4 \times e^{-2.2}}{4!} = 10.815$
$E_5 = 100 \times \dfrac{2.2^5 \times e^{-2.2}}{5!} = 4.7587$
E6+ = 100 – (E0 + E1 + E2 + E3 + E4 + E5) = 100 – 97.509 = 2.491
E5 < 5 and E6+ < 5 ⇒ combine E5+ = 4.7587 + 2.491 = 7.2497
O5+ = 3 (from the table in the question)
$X^2 = \dfrac{(12 - 11.080)^2}{11.080} + \dfrac{(16 - 24.377)^2}{24.377} + \dfrac{(31 - 26.814)^2}{26.814} + \dfrac{(25 - 19.664)^2}{19.664} + \dfrac{(13 - 10.815)^2}{10.815} + \dfrac{(3 - 7.2497)^2}{7.2497}$
$X^2 = 7.99$
There are 4 degrees of freedom ⇒ χ 42, 0.95 = 9.488 (from tables)
7.99 < 9.488 ⇒ not in the critical region ⇒ accept H0
Total number of goals scored can be modelled by Po(2.2).
23 H0: μ = 5.2
H1: μ > 5.2
$\sum x = 61$, $\sum x^2 = 384$, $\bar{x} = \dfrac{61}{10} = 6.1$
$s_x = \sqrt{\dfrac{1}{9}\left(384 - \dfrac{61^2}{10}\right)} = 1.1499$
$t = \dfrac{6.1 - 5.2}{1.1499 / \sqrt{10}} = 2.4751$
2.4751 > 1.833 ⇒ in the critical region ⇒ reject H0. There is significant evidence that the new type of
tree produces a greater mass of fruit on average.
H0: μy = μx
H1: μy > μx
$\sum y = 70$, $\sum y^2 = 500.6$, $\bar{y} = \dfrac{70}{10} = 7$
$s_y = \sqrt{\dfrac{1}{9}\left(500.6 - \dfrac{70^2}{10}\right)} = 1.0853$
Estimate of the common variance:
$s^2 = \dfrac{1.1499^2 + 1.0853^2}{10} = 0.25$
$T = \dfrac{7 - 6.1}{\sqrt{0.25}} = 1.8$
1.8 > 1.734 ⇒ in the critical region ⇒ reject H0. There is significant evidence that the mean mass of
fruit produced by gardener Q's trees is greater than the mean mass of fruit produced by gardener P's trees.
24 H0: coffee preferences are independent of company

H1: coffee preferences are not independent of company
Observed Cappuccino Latte Ground Total

Company A 60 52 32 144
Company B 35 40 31 106
Total 95 92 63 250
Expected Cappuccino Latte Ground Total

Company A 54.72 52.992 36.288 144
Company B 40.28 39.008 26.712 106
Total 95 92 63 250
$X^2 = \dfrac{(60 - 54.72)^2}{54.72} + \dfrac{(52 - 52.992)^2}{52.992} + \dfrac{(32 - 36.288)^2}{36.288} + \dfrac{(35 - 40.28)^2}{40.28} + \dfrac{(40 - 39.008)^2}{39.008} + \dfrac{(31 - 26.712)^2}{26.712}$
= 0.5095 + 0.0186 + 0.5067 + 0.6921 + 0.0252 + 0.6883 = 2.4404
v = 2 × 1 = 2 ⇒ $\chi^2_{2,\,0.95} = 5.991$ (from tables)

2.4404 < 5.991 ⇒ not in the critical region ⇒ accept H0.
Preferences are independent of company.
For the larger sample, the value of v is the same ⇒ v=2
1% significance level ⇒ p = 0.99 ⇒ χ 22, 0.99 = 9.21 (from tables)
To be in the critical region, we require:
2.44N > 9.21
N > 3.774
N must be an integer ⇒ Nmin = 4
25 i $\int_0^6 kx^2\,dx = 1$
$\dfrac{k}{3}\left[x^3\right]_0^6 = 1$
72k = 1
$k = \dfrac{1}{72}$
$f(x) = \begin{cases} \frac{1}{72}x^2 & 0 \le x \le 6 \\ 0 & \text{otherwise} \end{cases}$
$E_{2 \le x < 3} = 3\int_2^3 x^2\,dx = \left[x^3\right]_2^3 = (27 - 8) = 19 \Rightarrow a = 19$
$E_{3 \le x < 4} = 3\int_3^4 x^2\,dx = \left[x^3\right]_3^4 = (64 - 27) = 37 \Rightarrow b = 37$
$E_{4 \le x < 5} = 3\int_4^5 x^2\,dx = \left[x^3\right]_4^5 = (125 - 64) = 61 \Rightarrow c = 61$
ii H0: f(x) fits the data
H1: f(x) does not fit the data
$X^2 = \dfrac{(4 - 8)^2}{8} + \dfrac{(15 - 19)^2}{19} + \dfrac{(31 - 37)^2}{37} + \dfrac{(59 - 61)^2}{61} + \dfrac{(107 - 91)^2}{91}$
= 2 + 0.84210 + 0.97297 + 0.065573 + 2.8132
= 6.6938
v = 4 ⇒ $\chi^2_{4,\,0.9} = 7.779$ (from tables)
6.6938 < 7.779 ⇒ accept H0; f(x) fits the data
26 H0: area and preference are independent

H1: area and preference are not independent
Observed             Area 1   Area 2   Area 3   Total
Local bus service    73       36       30       139
Road surfaces        47       44       20       111
Total                120      80       50       250

Expected             Area 1   Area 2   Area 3   Total
Local bus service    66.72    44.48    27.8     139
Road surfaces        53.28    35.52    22.2     111
Total                120      80       50       250
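$$X^2 = \frac{(73 - 66.72)^2}{66.72} + \frac{(36 - 44.48)^2}{44.48} + \frac{(30 - 27.8)^2}{27.8} + \frac{(47 - 53.28)^2}{53.28} + \frac{(44 - 35.52)^2}{35.52} + \frac{(20 - 22.2)^2}{22.2}$$
$= 5.3646$
$\nu = 2 \times 1 = 2 \Rightarrow \chi^2_{2,\,0.95} = 5.991$ (from tables)
5.3646 < 5.991 ⇒ not in the critical region ⇒ accept H0.
Area and preference are independent. There is no association between them.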
27 H0: μ = 1.2
H1: μ > 1.2
Assume the masses are normally distributed.

$\sum x = 12.11$, $\sum x^2 = 14.6745$, $\bar{x} = \frac{12.11}{10} = 1.211$
$s = \sqrt{\frac{1}{9}\left(14.6745 - \frac{12.11^2}{10}\right)} = 0.032128$
$T = \dfrac{1.211 - 1.2}{0.032128 / \sqrt{10}} = 1.0827$
There is no significant evidence that the mean mass of the greengrocer’s cabbages is greater than 1.2 kg.
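The one-sample t statistic in question 27 depends only on n, Σx and Σx², so it is easy to recompute. The following is a minimal sketch; the function name `one_sample_t` is illustrative and not taken from the book.

```python
import math

def one_sample_t(n, sum_x, sum_x2, mu0):
    """Return (sample mean, unbiased s, T) from summary statistics."""
    mean = sum_x / n
    # Unbiased estimate of the standard deviation
    s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
    t = (mean - mu0) / (s / math.sqrt(n))
    return mean, s, t

mean, s, t = one_sample_t(n=10, sum_x=12.11, sum_x2=14.6745, mu0=1.2)
print(round(mean, 3), round(s, 6), round(t, 4))
```

The value of T is then compared with the table value for n − 1 = 9 degrees of freedom at the significance level given in the question.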
28 H0: μ = 7.5
H1: μ < 7.5
$\bar{x} = \frac{70.4}{10} = 7.04$
$s = \sqrt{\frac{8.48}{9}} = 0.97068$
$T = \dfrac{7.04 - 7.5}{0.97068 / \sqrt{10}} = -1.4986$
The tables are based on the upper tail, so we need to use the positive value of t.
10% significance level and 1-tailed test ⇒ p = 0.9
1.4986 > 1.383 ⇒ in the critical region ⇒ reject H0. There is significant evidence that the population
mean is less than 7.5.
29 For A:
$\sum x = 57.4$, $\sum x^2 = 481.1$, $\bar{x} = \frac{57.4}{7} = 8.2$
$s = \sqrt{\frac{1}{6}\left(481.1 - \frac{57.4^2}{7}\right)} = 1.3178$
$8.2 \pm 2.447\sqrt{\frac{1.3178^2}{7}} = 8.2 \pm 1.2188$
[6.98, 9.42] to 3 s.f.
Assume that for B, the population is also normally distributed and has the same variance as for A.
H0: μA = μB
H1: μA > μB
For B:
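$\sum x = 37$, $\sum x^2 = 278.74$, $\bar{x} = \frac{37}{5} = 7.4$
$s = \sqrt{\frac{1}{4}\left(278.74 - \frac{37^2}{5}\right)} = 1.1113$

For the combined sample:
$s = \sqrt{\frac{6 \times 1.3178^2 + 4 \times 1.1113^2}{10}} = \sqrt{1.536} = 1.2394$

$T = \dfrac{8.2 - 7.4}{1.2394 \times \sqrt{\frac{1}{7} + \frac{1}{5}}} = \dfrac{0.8}{0.72569} = 1.1024$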

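5% significance level and one-tailed test ⇒ p = 0.95, ν = 7 + 5 − 2 = 10, critical value 1.812 (from tables)
1.1024 < 1.812 ⇒ not in the critical region ⇒ accept H0. There is no significant evidence that μA is greater than μB.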
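The confidence interval and pooled two-sample statistic in question 29 can be reproduced from the summary statistics. This is a sketch only; the helper name `sample_stats` is illustrative, and 2.447 is the table value quoted in the solution for 6 degrees of freedom.

```python
import math

def sample_stats(n, sum_x, sum_x2):
    """Return (sample mean, unbiased variance estimate) from summary statistics."""
    mean = sum_x / n
    var = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, var

n_a, n_b = 7, 5
mean_a, var_a = sample_stats(n_a, 57.4, 481.1)
mean_b, var_b = sample_stats(n_b, 37, 278.74)

# 95% confidence interval for the mean of A
half_width = 2.447 * math.sqrt(var_a / n_a)
ci_a = (mean_a - half_width, mean_a + half_width)

# Pooled variance estimate, assuming a common population variance
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
print(ci_a, round(pooled_var, 3), round(t, 4))
```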
Extension Questions
1 i $G_X'(t) = \lambda e^{\lambda(t-1)} \Rightarrow G_X'(1) = \lambda$
$G_X''(t) = \lambda^2 e^{\lambda(t-1)} \Rightarrow G_X''(1) = \lambda^2$
$E(X) = \lambda$
$\mathrm{Var}(X) = \lambda^2 + \lambda - \lambda^2 = \lambda$
So, E(X) = Var(X)
ii Poisson distribution
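2 i For $0 \le x \le \sqrt{\frac{\pi}{2}}$,
$I = \int_0^x x\cos(x^2)\,dx$
Using the substitution $u = x^2$:
$I = \int \frac{1}{2}\cos u\,du$
$I = \left[\frac{1}{2}\sin(x^2)\right]_0^x = \frac{1}{2}\sin(x^2)$
When $x = \sqrt{\frac{\pi}{2}}$, $\frac{1}{2}\sin(x^2) = \frac{1}{2}$
So the CDF is:
$$F(x) = \begin{cases} 0 & x < 0 \\ \frac{1}{2}\sin(x^2) & 0 \le x \le \sqrt{\frac{\pi}{2}} \\ \frac{1}{4} + \frac{x}{\sqrt{8\pi}} & \sqrt{\frac{\pi}{2}} < x \le 3\sqrt{\frac{\pi}{2}} \\ 1 & x > 3\sqrt{\frac{\pi}{2}} \end{cases}$$
ii $P\left(\sqrt{\frac{\pi}{6}} < X < \sqrt{\pi}\right) = P\left(X < \sqrt{\pi}\right) - P\left(X < \sqrt{\frac{\pi}{6}}\right)$
$= \frac{1}{4} + \frac{\sqrt{\pi}}{\sqrt{8\pi}} - \frac{1}{2}\sin\left(\frac{\pi}{6}\right)$
$= \frac{1}{4} + \frac{1}{\sqrt{8}} - \frac{1}{4}$
$= 0.354$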
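The piecewise CDF in question 2 can be coded directly to confirm that it is continuous at the joins and that it reproduces the probability in part ii. This is an illustrative sketch; the names `A` and `cdf` are not from the book.

```python
import math

A = math.sqrt(math.pi / 2)  # upper end of the first non-zero piece

def cdf(x):
    """Piecewise CDF F(x) from part i."""
    if x < 0:
        return 0.0
    if x <= A:
        return 0.5 * math.sin(x ** 2)
    if x <= 3 * A:
        return 0.25 + x / math.sqrt(8 * math.pi)
    return 1.0

# Part ii: P(sqrt(pi/6) < X < sqrt(pi))
prob = cdf(math.sqrt(math.pi)) - cdf(math.sqrt(math.pi / 6))
print(round(prob, 3))
```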
3 Let the difference be after minus before.
H0: Median difference = 0 There is no change in the number of flowers.
H1: Median difference > 0 There is an increase in the number of flowers.
Calculating the ranks and signed ranks, we get:
Plant                                     A    B    C    D    E    F    G    H    I    J    K    L
Number of flowers before spraying         3    7    1    5    2    8    4    4    5    9    1    6
Number of flowers 1 week after spraying   5    8    5    5    2    2    7    4    0    20   9    15
Difference                                2    1    4    0    0    −6   3    0    −5   11   8    9
Rank                                      2    1    4              −6   3         −5   9    7    8
Notice that three plants have a difference of zero, so we ignore them and reduce n by 3.
P = 34 and Q = 11 ⇒ T = 11
This is a one-tail test at the 5% level with n = 9 ⇒ T ≤ 8 to reject H0
T > 8 ⇒ Accept H0
There has been no significant increase in the number of flowers.
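4 i P(X = 0) = (k − 5) × 0! = k − 5
P(X = 1) = (k − 5) × 1! = k − 5
P(X = 2) = (k − 5) × 2! = 2(k − 5)
∴ $G_X(t) = (k - 5) + (k - 5)t + 2(k - 5)t^2$
$G_X(t) = (k - 5)(1 + t + 2t^2)$
$G_X(1) = 1 \Rightarrow 1 = (k - 5)(1 + 1 + 2) \Rightarrow \frac{1}{4} = k - 5 \Rightarrow k = \frac{21}{4}$
$G_X(t) = \frac{1}{4}(1 + t + 2t^2)$
$G_X(t) = \frac{1}{4} + \frac{1}{4}t + \frac{1}{2}t^2$

ii $G_X'(t) = \frac{1}{4} + t$
$\mu = G_X'(1) = \frac{5}{4}$

$G_X''(t) = 1$
$\mathrm{Var}(X) = G_X''(1) + G_X'(1) - \left(G_X'(1)\right)^2 = 1 + \frac{5}{4} - \left(\frac{5}{4}\right)^2 = \frac{11}{16}$
$\mathrm{Var}(X) = \frac{2}{5}\mu^2 + \frac{1}{16} \Rightarrow \frac{11}{16} = \frac{2}{5}\left(\frac{5}{4}\right)^2 + \frac{1}{16}$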
5 i $I = \int_0^{\pi} e^{kx}\sin x\,dx$
By parts: $I = -e^{kx}\cos x + k\int e^{kx}\cos x\,dx$
By parts again: $I = -e^{kx}\cos x + k\left(e^{kx}\sin x - k\int e^{kx}\sin x\,dx\right)$
Notice the integral is equal to I.
$I = -e^{kx}\cos x + ke^{kx}\sin x - k^2 I$
$(1 + k^2)I = ke^{kx}\sin x - e^{kx}\cos x$
$I = \left[\dfrac{ke^{kx}\sin x - e^{kx}\cos x}{1 + k^2}\right]_0^{\pi}$
$I = \left(\dfrac{e^{k\pi}}{k^2 + 1}\right) - \left(\dfrac{-1}{k^2 + 1}\right)$
$I = \dfrac{e^{k\pi} + 1}{k^2 + 1}$
The integral must sum to 1 (total probability). Therefore: $\dfrac{e^{k\pi} + 1}{k^2 + 1} = 1$
$e^{k\pi} + 1 = k^2 + 1$
$e^{k\pi} = k^2$
ii [Graph: the curves $y = e^{k\pi}$ and $y = k^2$ plotted against k]
The only solution is when k < 0.
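The graphical argument can be backed up numerically by locating the root of $e^{k\pi} - k^2 = 0$. The sketch below uses plain bisection on the interval [−1, 0]; the interval and the function names `g` and `bisect` are choices made for this illustration and do not come from the question.

```python
import math

def g(k):
    """g(k) = e^(k*pi) - k^2; a root of g solves e^(k*pi) = k^2."""
    return math.exp(k * math.pi) - k ** 2

def bisect(func, lo, hi, tol=1e-12):
    """Bisection, assuming func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2

# g(-1) < 0 and g(0) = 1 > 0, so a root lies between -1 and 0
k = bisect(g, -1.0, 0.0)
print(k)
```

For k > 0 the exponential stays above the parabola, which is consistent with the statement that the only solution has k < 0.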
6 H0: hair colour and eye colour are independent

H1: hair colour and eye colour are not independent
The table of expected values is:
                         Eye colour
Hair colour    Blue      Green     Brown     Total
Blonde         4         7.25      13.75     25
Brown          3.84      6.96      13.2      24
Black          5.92      10.73     20.35     37
Red            2.24      4.06      7.7       14
Total          16        29        55        100
We need all expected values to be greater than 5 to apply the χ² test, so merge the blue eye and green eye
columns to get the following table of observed and (expected) values.
                         Eye colour
Hair colour    Blue or green    Brown         Total
Blonde         21 (11.25)       4 (13.75)     25
Brown          10 (10.8)        14 (13.2)     24
Black          7 (16.65)        30 (20.35)    37
Red            7 (6.3)          7 (7.7)       14
Total          45               55            100
$$\sum \frac{(O_k - E_k)^2}{E_k} = \frac{(21 - 11.25)^2}{11.25} + \frac{(10 - 10.8)^2}{10.8} + \frac{(7 - 16.65)^2}{16.65} + \frac{(7 - 6.3)^2}{6.3} + \frac{(4 - 13.75)^2}{13.75} + \frac{(14 - 13.2)^2}{13.2} + \frac{(30 - 20.35)^2}{20.35} + \frac{(7 - 7.7)^2}{7.7}$$
$\sum \dfrac{(O_k - E_k)^2}{E_k} = 25.781\ldots$
There are 3 degrees of freedom. So the critical value of $\chi^2_3$ at the 0.1% level is 16.27.
Since 25.782 > 16.27, there is sufficient evidence to reject H0. Therefore, you can conclude that hair colour
and eye colour are not independent.
7 i The sample space for the difference between the two dice is:

Difference                Dice 1
               1    2    3    4    5    6
          1    0    1    2    3    4    5
          2    1    0    1    2    3    4
Dice 2    3    2    1    0    1    2    3
          4    3    2    1    0    1    2
          5    4    3    2    1    0    1
          6    5    4    3    2    1    0

Therefore:

x           0      1       2      3      4      5
P(X = x)    1/6    5/18    2/9    1/6    1/9    1/18

Therefore, the PGF is:
$G_X(t) = \frac{1}{6} + \frac{5}{18}t + \frac{2}{9}t^2 + \frac{1}{6}t^3 + \frac{1}{9}t^4 + \frac{1}{18}t^5$

ii $G_X'(t) = \frac{5}{18} + \frac{4}{9}t + \frac{3}{6}t^2 + \frac{4}{9}t^3 + \frac{5}{18}t^4$
$E(X) = G_X'(1) = \frac{35}{18}$

$G_X''(t) = \frac{4}{9} + t + \frac{12}{9}t^2 + \frac{20}{18}t^3$
$G_X''(1) = \frac{4}{9} + 1 + \frac{12}{9} + \frac{20}{18} = \frac{35}{9}$
$\mathrm{Var}(X) = \frac{35}{9} + \frac{35}{18} - \left(\frac{35}{18}\right)^2 = \frac{665}{324}$
aE(X) = Var(X)
$a \times \frac{35}{18} = \frac{665}{324} \Rightarrow a = \frac{19}{18}$
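The distribution, mean and variance in question 7 can be confirmed by enumerating the 36 outcomes with exact fractions. This is a minimal sketch; it uses the identities $G_X'(1) = \sum x\,p_x$ and $G_X''(1) = \sum x(x-1)\,p_x$, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Count each value of X = |die 1 - die 2| over the 36 equally likely outcomes
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    x = abs(d1 - d2)
    counts[x] = counts.get(x, 0) + 1
pmf = {x: Fraction(c, 36) for x, c in counts.items()}

# G'(1) = sum of x * p_x and G''(1) = sum of x(x - 1) * p_x
g1 = sum(x * p for x, p in pmf.items())
g2 = sum(x * (x - 1) * p for x, p in pmf.items())
variance = g2 + g1 - g1 ** 2
a = variance / g1
print(g1, g2, variance, a)
```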

8 Let the difference be after minus before.

H0: Difference = 0 There is no change in the test scores.
H1: Difference > 0 Test scores have increased.
                              A     B     C     D     E     F     G     H     I     J
Test scores before tuition    20    15    17    18    8     15    19    24    6     23
Test scores after tuition     22    16    14    18    18    17    25    20    18    18
Difference                    2     1     −3    0     10    2     6     −4    12    −5
Rank                          2.5   1     −4          8     2.5   7     −5    9     −6
Notice that 1 person has a difference of zero, so we ignore this and reduce n by 1. Notice also that two of the
differences are equal, so the ranks (2 and 3) are averaged.
P = 30 and Q = 15 ⇒ T = 15
This is a 1-tail test at the 2.5% level with n = 9 ⇒ T ≤ 5 to reject H0
T>5 ⇒ Accept H0
There has been no significant increase in test scores.
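The ranking procedure used in question 8 (drop zero differences, rank by absolute value, average tied ranks, then sum the positive and negative ranks) can be sketched as follows. The function name `signed_rank_sums` is illustrative, and the critical value still has to be read from tables.

```python
def signed_rank_sums(before, after):
    """Return (P, Q, T) for the Wilcoxon signed-rank test on paired data."""
    # Differences are after minus before; zero differences are discarded
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Tied values occupy positions i+1 .. j and share the average rank
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    p = sum(ranks[abs(d)] for d in diffs if d > 0)
    q = sum(ranks[abs(d)] for d in diffs if d < 0)
    return p, q, min(p, q)

before = [20, 15, 17, 18, 8, 15, 19, 24, 6, 23]
after = [22, 16, 14, 18, 18, 17, 25, 20, 18, 18]
p, q, t = signed_rank_sums(before, after)
print(p, q, t)
```

The same function can be applied to the other signed-rank questions in this section by passing in their paired data.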
9 i When $x = \frac{\pi}{2}$ ⇒ $k\,\frac{\pi^2}{4}\,e^{\pi/2} = 1$
∴ $k = \dfrac{4e^{-\pi/2}}{\pi^2}$
ii Let $y = (kx^2e^x)\sin x$
Using the product rule for the expression within the brackets and for the overall expression:
$\dfrac{dy}{dx} = (kx^2e^x)\cos x + (kx^2e^x + 2kxe^x)\sin x$
$= kx^2e^x\cos x + kx^2e^x\sin x + 2kxe^x\sin x$
$= kxe^x(x\cos x + x\sin x + 2\sin x)$

$\dfrac{dy}{dx} = \dfrac{4e^{-\pi/2}}{\pi^2}\,xe^x(x\cos x + x\sin x + 2\sin x)$
Therefore the pdf is
$$f(x) = \begin{cases} \dfrac{4e^{-\pi/2}}{\pi^2}\,xe^x(x\cos x + x\sin x + 2\sin x) & 0 \le x \le \frac{\pi}{2} \\ 0 & \text{otherwise} \end{cases}$$
10 Since a, b, c forms a geometric progression: $G_X(t) = a + art + ar^2t^2$
$G_X(1) = 1 \Rightarrow 1 = a + ar + ar^2 \Rightarrow 1 = a(1 + r + r^2)$
$G_X'(t) = ar + 2ar^2t \Rightarrow G_X'(1) = E(X) = ar + 2ar^2 = a(r + 2r^2)$

Simultaneous equations:
$1 = a(1 + r + r^2)$   (1)
$\frac{24}{19} = a(r + 2r^2)$   (2)
(2) ÷ (1):
$\dfrac{24}{19} = \dfrac{a(r + 2r^2)}{a(1 + r + r^2)}$

$24(1 + r + r^2) = 19(r + 2r^2)$
$24 + 24r + 24r^2 = 19r + 38r^2$
$0 = 14r^2 - 5r - 24$
$0 = (7r + 8)(2r - 3)$
$r = -\frac{8}{7}$ or $r = \frac{3}{2}$

Since the geometric progression is increasing ⇒ $r = \frac{3}{2}$
Therefore $a = \dfrac{1}{1 + \frac{3}{2} + \frac{9}{4}} = \dfrac{4}{19}$
So $G_X(t) = \frac{4}{19} + \frac{6}{19}t + \frac{9}{19}t^2$
$G_X'(t) = \frac{6}{19} + \frac{18}{19}t$
$G_X''(t) = \frac{18}{19} \Rightarrow G_X''(1) = \frac{18}{19}$

$\mathrm{Var}(X) = \frac{18}{19} + \frac{24}{19} - \left(\frac{24}{19}\right)^2 = \frac{222}{361}$
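The algebra in question 10 can be verified with exact arithmetic. The sketch below takes the quadratic $14r^2 - 5r - 24 = 0$ from the working above, selects the root greater than 1 (so that the progression is increasing), and rebuilds the mean and variance; the variable names are illustrative.

```python
from fractions import Fraction

mean = Fraction(24, 19)  # E(X), as used in equation (2)

# Roots of 14r^2 - 5r - 24 = 0; the discriminant is 25 + 4*14*24 = 1369 = 37^2
roots = [Fraction(5 + sign * 37, 28) for sign in (1, -1)]
r = next(root for root in roots if root > 1)  # increasing progression
a = 1 / (1 + r + r ** 2)

# Probabilities a, ar, ar^2 for X = 0, 1, 2
pmf = {0: a, 1: a * r, 2: a * r ** 2}
g1 = sum(x * p for x, p in pmf.items())            # G'(1)
g2 = sum(x * (x - 1) * p for x, p in pmf.items())  # G''(1)
variance = g2 + g1 - g1 ** 2
print(r, a, g1, variance)
```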
11 Let the difference be new minus original.

H0: Difference = 0 There is no change in the median number of customers per hour.
H1: Difference ≠ 0 There is a change in the median number of customers per hour.
                                                           A     B     C     D     E     F     G     H     I     J     K     L     M     N     O
Original location (median number of customers per hour)   224   108   613   251   700   632   348   372   366   571   336   515   324   198   337
New location (median number of customers per hour)        361   202   484   380   837   632   485   395   237   258   465   714   523   69    337
Difference                                                 137   94    −129  129   137   0     137   23    −129  −313  129   199   199   −129  0
Rank                                                       9     2     −5    5     9           9     1     −5    −13   5     11.5  11.5  −5
Notice that zero ranks have been ignored and tied ranks have been averaged.
P = 63 and Q = 28 ⇒ T = 28
This is a 2-tail test at the 2% level with n = 13 ⇒ T ≤ 12 to reject H0
T > 12 ⇒ accept H0
There has been no significant change in the median number of customers per hour. The market research
was correct.
12 i $G_X(t) = q + pt$
ii $G_Y(t) = [G_X(t)]^n = (q + pt)^n$ This represents the binomial distribution.
iii $G_Y'(t) = np(q + pt)^{n-1} \Rightarrow G_Y'(1) = np(q + p)^{n-1}$
We know that q + p = 1 ⇒ $G_Y'(1) = E(Y) = np$
iv $G_Y''(t) = (n - 1)np^2(q + pt)^{n-2} \Rightarrow G_Y''(1) = (n - 1)np^2(q + p)^{n-2}$

We know that q + p = 1 ⇒ $G_Y''(1) = (n - 1)np^2$
$\mathrm{Var}(Y) = (n - 1)np^2 + np - (np)^2$
$= n^2p^2 - np^2 + np - n^2p^2$
$= np - np^2$
$= np(1 - p)$
We know that q + p = 1 ⇒ q = 1 − p
Var(Y) = npq