
THE ROYAL STATISTICAL SOCIETY

2001 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

The Society provides these solutions to assist candidates preparing for the
examinations in future years and for the information of any other persons using the
examinations.

The solutions should NOT be seen as "model answers". Rather, they have been
written out in considerable detail and are intended as learning aids.

Users of the solutions should always be aware that in many cases there are valid
alternative methods. Also, in the many cases where discussion is called for, there
may be other valid points that could be made.

While every care has been taken with the preparation of these solutions, the Society
will not be responsible for any errors or omissions.

The Society will not enter into any correspondence in respect of these solutions.

RSS 2001

Ordinary Certificate, Paper II, 2001. Question 1

(i) p for z = 0.714 is given by

(p for 0.70) + (0.014/0.02) × [(p for 0.72) − (p for 0.70)]
= 0.7580 + (0.014/0.02) × (0.7642 − 0.7580)
= 0.7580 + (0.7 × 0.0062)
= 0.7623.

This can be rewritten as 0.7580 + 0.7(0.7642 − 0.7580)
or (0.3 × 0.7580) + (0.7 × 0.7642)
where the weights are 0.3 and 0.7.

(ii) When z = 0.70, pmin = 0.75795 and pmax = 0.75805.
When z = 0.72, pmin = 0.76415 and pmax = 0.76425.

The minimum for z = 0.714 is (0.3 × 0.75795) + (0.7 × 0.76415)
= 0.76229 i.e. 0.7623.

The maximum is (0.3 × 0.75805) + (0.7 × 0.76425)
= 0.76239 i.e. 0.7624.

[Note that the values calculated above are not necessarily accurate to 4 decimal
places.]
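The interpolation in Question 1 of the 2001 paper can be checked with a short Python sketch. The function name `interpolate` is hypothetical (it is not part of the solutions); the table values are the ones quoted in the solution.

```python
def interpolate(z, z_lo, z_hi, p_lo, p_hi):
    """Linear interpolation between two adjacent table entries."""
    w = (z - z_lo) / (z_hi - z_lo)      # weight on the upper entry (0.7 here)
    return (1 - w) * p_lo + w * p_hi    # weights (1 - w) and w, i.e. 0.3 and 0.7


# Part (i): tabulated values for z = 0.70 and z = 0.72
p = interpolate(0.714, 0.70, 0.72, 0.7580, 0.7642)

# Part (ii): smallest and largest values consistent with 4-decimal rounding
p_min = interpolate(0.714, 0.70, 0.72, 0.75795, 0.76415)
p_max = interpolate(0.714, 0.70, 0.72, 0.75805, 0.76425)

print(round(p, 4), round(p_min, 5), round(p_max, 5))
```

Writing the result as a weighted sum makes it easy to see why the rounding limits of the two table entries carry straight through to the interpolated value.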

Ordinary Certificate, Paper II, 2001. Question 2

For clarity, use the hundreds as stems, tens as leaves and discard the units. [An
alternative would be first to round to the nearest 10.] Times in seconds.

Unordered, reading across rows:

0 | 7 6 7 8 8
1 | 7 0 3 1
2 | 1 5 9 7 8 9 5 1 8 7 9
3 | 1 2 2 2 3
4 | 2 1 5 4 8 9
5 | 0 5 2 2 8 8 5 5 0

Ordered:

0 | 6 7 7 8 8
1 | 0 1 3 7
2 | 1 1 5 5 7 7 8 8 9 9 9
3 | 1 2 2 2 3
4 | 1 2 4 5 8 9
5 | 0 0 2 2 5 5 5 8 8

The median and quartiles help to give information about the distribution of times.
The median is about 300 (midway between the 20th and 21st entries in the ordered
stem-and-leaf diagram). The lower quartile (between 10th and 11th) is about 210 and
the upper quartile is about 485. So a quarter of the calls last less than about 210, half
less than about 300, and a quarter are longer than about 485.

Inspection also shows that 14 (i.e. about one-third) of calls are concentrated in the
period 250-330. Five of the 40 calls were 540 or longer (i.e. 9 minutes or more).

So the calls range in length from approximately 1 to 10 minutes, with a substantial
number between 4 and 5 minutes.

Ordinary Certificate, Paper II, 2001. Question 3

Cumulative frequencies and percentage frequencies are:

Months    <1    <2    <3    <6   <12   <24   <36   TOTAL
A          2     5    12    32    65    85    95     100
%          2     5    12    32    65    85    95     100

B         40    60    78   100   120   140   160     200
%         20    30    39    50    60    70    80     100

(i) For the graphs, see the next page.

(ii) From the graphs, take the 50% points for (a) and (b), and use the 25% and
75% points in calculating (c) and (d).

(a) Median for A is 9 months.
(b) Median for B is 6 months.
(c) Quartiles for A are 5 months and 18 months (approximately) so the
inter-quartile range is 13 months.
(d) Quartiles for B are 1½ months and 30 months (approximately) so the
inter-quartile range is 28½ months.

(iii) Half of the employees at A leave within 9 months, at B within only 6 months.
The inter-quartile ranges show that there is much more variability in the length of
service in Company B than in A. A quarter of B's employees stay more than 30
months, but for A the comparable figure is only 18 months. But a quarter of B's
employees leave within about 1½ months, whereas for A the comparable figure is
about 5 months. So employees of B have a tendency either to leave almost at once or
to stay a long time, whereas for A they do not begin leaving quite so soon but nearly
all have left by the end of 3 years.
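The median and quartiles read from the cumulative percentage graphs in Question 3 can be approximated numerically. The sketch below assumes the graph joins the plotted points with straight lines and starts at 0% at 0 months; the helper name `point_at` is hypothetical.

```python
def point_at(percent, bounds, cum_pct):
    """Length of service (months) below which `percent` % of employees fall."""
    xs = [0] + list(bounds)      # assumed start of the graph: 0 months
    ys = [0] + list(cum_pct)     # assumed start of the graph: 0 %
    for i in range(1, len(xs)):
        if ys[i - 1] <= percent <= ys[i]:
            frac = (percent - ys[i - 1]) / (ys[i] - ys[i - 1])
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    raise ValueError("percent lies beyond the tabulated range")


bounds = [1, 2, 3, 6, 12, 24, 36]          # upper class limits in months
cum_a = [2, 5, 12, 32, 65, 85, 95]         # cumulative % for Company A
cum_b = [20, 30, 39, 50, 60, 70, 80]       # cumulative % for Company B

for name, cum in (("A", cum_a), ("B", cum_b)):
    q1, med, q3 = (point_at(p, bounds, cum) for p in (25, 50, 75))
    print(name, round(q1, 2), round(med, 2), round(q3, 2), round(q3 - q1, 2))
```

Values read by eye from a graph will differ slightly from these interpolated figures, which is why the solution quotes them as approximate.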

Continued on next page


Cumulative percentage frequency of length of service for companies A and B

[Graph: Cumulative % Frequency (vertical axis, 0 to 100) against Length of Service
(months) (horizontal axis, 0 to 36), with one line for Company A and one for
Company B.]

Ordinary Certificate, Paper II, 2001. Question 4

Advantages: standard deviation is very useful in theoretical work and in statistical
methods and inference.

Disadvantages: it is not easy to calculate, and its value is seriously influenced by
extreme values in a set of data.

(i) For the Celsius temperatures, Σxi (i = 1 to 8) = 146, so x̄ = 18.25.

Σxi² = 2738, so s² = (1/7)[Σxi² − (Σxi)²/8] = (1/7)(2738 − 2664.5)
= 73.5/7 = 10.5, so s = 3.2404.

(ii) If y = Fahrenheit temperature, we have y = 32 + 1.8x.

Hence ȳ = 32 + 1.8x̄ = 64.85.

sy = 1.8 sx = 1.8 × 3.2404 = 5.8327.

(iii) The scaling z = (x − x̄)/sx is to be applied.

Maximum and minimum values of x are 23 and 13, so for z they are

(23 − 18.25)/3.2404 and (13 − 18.25)/3.2404, i.e. 1.466 and −1.620.

The range is 1.466 − (−1.620) = 3.086.
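The arithmetic in Question 4 can be reproduced from the two sums quoted in the solution. This is a minimal sketch; the variable names are illustrative only.

```python
from math import sqrt

n, sum_x, sum_x2 = 8, 146, 2738            # sums quoted in part (i)

mean_c = sum_x / n                          # Celsius mean
var_c = (sum_x2 - sum_x ** 2 / n) / (n - 1) # sample variance, divisor n - 1
sd_c = sqrt(var_c)

# Part (ii): y = 32 + 1.8x shifts the mean but only rescales the spread
mean_f = 32 + 1.8 * mean_c
sd_f = 1.8 * sd_c

# Part (iii): standardised values of the extremes 23 and 13
z_max = (23 - mean_c) / sd_c
z_min = (13 - mean_c) / sd_c

print(round(sd_c, 4), round(mean_f, 2), round(sd_f, 4), round(z_max - z_min, 3))
```

Adding the constant 32 leaves the standard deviation unchanged, so only the factor 1.8 appears in the Fahrenheit spread.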

Ordinary Certificate, Paper II, 2001. Question 5

Probabilities are

                  Male    Female    Total
Age  Under 25     0.15     0.12      0.27
     Over 25      0.45     0.28      0.73
Total             0.6      0.4       1

using the information given.
(e.g. P(male and under 25) = 0.25 × 0.6 = 0.15).

(i) (a) 0.15; (b) 0.45; (c) 0.12; (d) 0.28.

(ii)

(a) P(under 25) = sum of entries in first row = 0.27.

(b) Males contribute 0.15 towards this 0.27, so required probability
= 0.15/0.27 = 0.556.

(c) This is all except the class "female over 25", so required probability = 1 − 0.28
= 0.72.

(d) This is the complement of (c), so probability is 0.28.

(iii)

(a) The probabilities in this table must be multiplied by the corresponding
probabilities in the table above, to obtain

(0.09 × 0.15) + (0.06 × 0.12) + (0.04 × 0.45) + (0.02 × 0.28) = 0.0443.

(b) P(male and under 25 | accident)
= P(male and under 25 and accident) / P(accident)
= (0.09 × 0.15)/0.0443 = 0.305.

Ordinary Certificate, Paper II, 2001. Question 6

Pairs of data values may have been collected but not in the form of measurements.
For example, two people may have tasted the same range of jams, each person
ranking them from best to worst. Do these rankings agree, at least reasonably well, or
are they very different? In other words, are the items ranked in approximately the
same order by the two people? A rank correlation coefficient can be used to examine
this.

The product-moment coefficient assumes measurements are bivariate Normally
distributed. If this is not a reasonable assumption, the ranking of measurements rather
than their actual values may be used to compare them. Data with extreme values in
them can be handled in this way.

Spearman's rank coefficient uses rankings, and makes the same calculations on them
as Pearson does on actual measurements.

A rank coefficient lies between −1 and 1.

(i) (identical rankings)

Sample     (1)  (2)  (3)  (4)  (5)  (6)
Judge A     4    1    6    2    3    5
Judge B     4    1    6    2    3    5

(ii) (diametrically opposite rankings)

Sample     (1)  (2)  (3)  (4)  (5)  (6)
Judge A     3    4    1    5    6    2
Judge B     4    3    6    2    1    5

(iii) See graph on next page. The last-but-one is the obvious outlier.

(iv) Rankings smallest first.

           (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)  (10)
L           10    9    4    5    2    1    8    3    7     6
S            8   10    5    6    3    2    9    4    1     7
d = L − S    2   −1   −1   −1   −1   −1   −1   −1    6    −1

Σd² = 48. rs = 1 − 6Σd²/[n(n² − 1)] = 1 − (6 × 48)/(10 × 99) = 0.709.
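The Spearman calculation in Question 6 (iv), and the two extreme cases in parts (i) and (ii), can be verified with a few lines of Python. The function name `spearman` is hypothetical, and the formula used is the no-ties form quoted in the solution.

```python
def spearman(rank_a, rank_b):
    """Spearman's rank coefficient, 1 - 6*sum(d^2) / (n(n^2 - 1)), no ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Part (iv): ranks of L and S, smallest first
L = [10, 9, 4, 5, 2, 1, 8, 3, 7, 6]
S = [8, 10, 5, 6, 3, 2, 9, 4, 1, 7]
print(round(spearman(L, S), 3))

# Parts (i) and (ii): identical and diametrically opposite rankings
print(spearman([4, 1, 6, 2, 3, 5], [4, 1, 6, 2, 3, 5]))
print(spearman([3, 4, 1, 5, 6, 2], [4, 3, 6, 2, 1, 5]))
```

The two judge examples give the limiting values +1 and −1, which is a useful sanity check on any implementation of the formula.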

Continued on next page


(v) Without the outlier (L7, S1), ranks become

L    9    8    4    5    2    1    7    3    6
S    7    9    4    5    2    1    8    3    6
d    2   −1    0    0    0    0   −1    0    0

Σd² = 6. rs = 1 − (6 × 6)/(9 × 80) = 0.95.

The rankings now agree almost perfectly. The outlier was clearly a different shape.

SCATTER DIAGRAM FOR PART (iii)

[Scatter diagram: L metres (vertical axis, 20 to 70) against S metres (horizontal
axis, 20 to 70).]

Ordinary Certificate, Paper II, 2001. Question 7

(i)

(a) The basic underlying long-term movement of the series.

(b) Short-term regular variation about the trend (e.g. seasonal, daily, time of day).

(c) A model in which the series value y is the sum of a trend component and a
seasonal component and a random "residual" component.

(d) A model in which y is the product of these components.

If the trend is changing slowly, an additive model is likely to be suitable, but with
rapid changes of trend the multiplicative version is preferable.

(ii) Taking seasonal variation as (column 2 − column 3):

Quarter                      1         2          3         4
1996                                           −34.125    15.000
1997                       6.000    19.125     −38.000     3.375
1998                      32.000     3.625     −37.375     8.125
1999                      16.875     9.625     −35.875     7.250
(Unadjusted) average      18.292    10.792     −36.344     8.438    Total 1.178
Adjustment 1.178/4 = 0.294
Adjusted average          17.998    10.498     −36.638     8.144    (0.002)
(to nearest 1000 units)   18        10         −37         8

(iii) Predictions for 2001 (thousands of units) are

Quarter      1           2           3           4
          170 + 18    172 + 10    174 − 37    175 + 8
          = 188       = 182       = 137       = 183

(Note. The predicted trend values for 2001 used in this calculation are given in the
question.)

(iv) Unusual changes in the series, e.g. especially cold or warm winter causing
particularly high or low usage, would make predictions inaccurate. Expansion of
college activities would change the usage, and also make forecasts inaccurate. So
would closure of a department using heavy electrical equipment.
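The additive seasonal adjustment in Question 7 (ii) and the forecasts in (iii) can be sketched as follows. The signs of the quarter-3 deviations are an assumption here: they are inferred from the quoted averages and from the prediction 174 − 37, since the minus signs did not survive in the printed table. The 2001 trend values are those quoted in part (iii).

```python
# Deviations (actual minus trend) by quarter, as tabulated in part (ii);
# the negative signs for quarter 3 are inferred, not printed.
deviations = {
    1: [6.000, 32.000, 16.875],
    2: [19.125, 3.625, 9.625],
    3: [-34.125, -38.000, -37.375, -35.875],
    4: [15.000, 3.375, 8.125, 7.250],
}

means = {q: sum(v) / len(v) for q, v in deviations.items()}

# Seasonal effects in an additive model should sum to zero over the year,
# so the excess is shared equally between the four quarters.
adjustment = sum(means.values()) / 4
adjusted = {q: m - adjustment for q, m in means.items()}

trend_2001 = {1: 170, 2: 172, 3: 174, 4: 175}   # given in the question
forecast = {q: trend_2001[q] + round(adjusted[q]) for q in adjusted}

print({q: round(a, 3) for q, a in adjusted.items()})
print(forecast)
```

Working with unrounded averages gives adjusted values that differ from the tabulated ones in the third decimal place only; the rounded seasonal effects and the forecasts are the same.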

Ordinary Certificate, Paper II, 2001. Question 8

(i) The correlation implies that as the age of men at marriage increases, so does
that of women. The ages do not have to be equal; it is simply that younger men marry
younger women and older men marry older women. If every man was 5 years older
than his bride there would be correlation +1.

(ii) The index in 2000 is (150.4/124.2) × 100% = 121.1% of its 1995 value, i.e. it
has increased by 21.1%. This is the correct, direct comparison.

(iii) The statement would be true if we were given the median, not the mean. More
than half will earn less than the mean, because the few highly paid specialists will
make the distribution skew to the right.

THE ROYAL STATISTICAL SOCIETY

2002 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

RSS 2002
Ordinary Certificate, Paper II, 2002. Question 1

(i)

Age last birthday (years)    Employees    Frequency Density
15-19                           240             240
20-24                           340             340
25-29                           360             360
30-39                           420             210
40-49                           380             190
50-64                           240              80

Frequency density is frequency per 5-year interval.

See next page for histogram.

(ii) The histogram is peaked between 20 and 30, and is quite skew to the right. The
greatest density of employees is at a younger age, so one possible explanation of this
could be "ageist", although there might well be other explanations such as experienced
people finding better-paid jobs elsewhere.

(iii) The original table used unequal intervals of age, and it is not until a frequency
density based on intervals of the same size (5 years) is calculated that the pattern is easy
to see. The difference between absolute frequency and frequency density is unlikely to
be known to the casual observer.

[Histogram: Frequency density (vertical axis, 0 to 300) against Age (horizontal axis,
15 to 60).]
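The frequency densities in Question 1 of the 2002 paper follow from scaling each frequency to a common 5-year width. The sketch below assumes that "age last birthday 15-19" covers exact ages from 15 up to (but not including) 20, and similarly for the other classes; the helper name `density` is hypothetical.

```python
# (lower limit, upper limit, number of employees) for each age class
classes = [
    (15, 20, 240),
    (20, 25, 340),
    (25, 30, 360),
    (30, 40, 420),
    (40, 50, 380),
    (50, 65, 240),
]


def density(lower, upper, freq, unit=5):
    """Frequency per `unit`-year interval."""
    return freq * unit / (upper - lower)


densities = [density(*c) for c in classes]
print(densities)
```

The heights of the histogram bars are these densities, so the 10-year and 15-year classes are drawn lower than their raw frequencies would suggest.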

Ordinary Certificate, Paper II, 2002. Question 2

The dispersion in a set of data is the variation among the set of data values. It measures
whether they are all close together, or more scattered.

Range is the difference between the largest and smallest values in the data set. It is very
easy to calculate, but it depends only on the largest and smallest values which may
sometimes be extremes or outliers; it does not use the whole pattern including the central
values.

Inter-Quartile Range (IQR) is the difference in value between the upper quartile (Q3) and
the lower quartile (Q1). Q3 has 75% of the total number of observations below it, Q1 has
25%. (The semi-inter-quartile range, SIQR, is ½(Q3 − Q1), and is often used instead).
Once the data are arranged in rank order, Q1 and Q3 are easy to locate, and the value
(Q3 − Q1) is not affected by the most extreme data values. However, it is not easy to
develop theory for using these measures in mathematical statistical methods.

Variance is an "average" deviation from the mean, squared. For a sample, the definition
is s² = [1/(n − 1)] Σ(x − x̄)²; if the data are regarded as an entire population, the divisor is
often taken as n rather than n − 1. It shows whether data cluster round their mean or are
more spread out. It does use all the data values in the calculation but only really gives a
good measure when data are fairly symmetrical; extreme values or outliers affect s²
considerably. This measure is the one for which good mathematical theory exists (based
on a Normal distribution), and so it is widely used. The standard deviation, s, is in the
same units of measurement as x, so is often preferred.

The variance and range will be affected by the extreme values (if any) at each end of the
distribution. The IQR (or SIQR) gives a better idea of the spread of wages in the central
part of the wage distribution, so it may be a better measure (even though it does not
reflect the whole distribution, especially its ends).

Ordinary Certificate, Paper II, 2002. Question 3

(i) x̄ = 47118/100 = 471.18.

s² = (1/99)[30710404 − 47118²/100] = 8509344.76/99 = 85952.98, so s = 293.18.

(ii)

Class       Tally                       Frequency    Mid-point, x    y = (x − 499.5)/200
000-199     |||| |||| |||| |||| |||        23            99.5              −2
200-399     |||| |||| |||| |||| |||        23           299.5              −1
400-599     |||| |||| |||| ||||            19           499.5               0
600-799     |||| |||| |||| |               16           699.5               1
800-999     |||| |||| |||| ||||            19           899.5               2
Total                                     100

(iii) Using the last two columns in the above table, to give a coded measure y, we have
Σfy = −46 − 23 + 0 + 16 + 38 = −15, so ȳ = −15/100 = −0.15. Now,
y = (x − 499.5)/200, so x̄ = 200ȳ + 499.5 = −30 + 499.5 = 469.5.

Also, Σfy² = (23 × 4) + (23 × 1) + 0 + (16 × 1) + (19 × 4) = 207.
So variance in coded units = (1/99)[207 − (−15)²/100] = 204.75/99 = 2.0682, and the
standard deviation in coded units is 1.438. Hence the standard deviation of x is
200 × 1.438 = 287.6.

(iv) In part (iii) we have had to "group" all the observations at the interval mid-points,
assuming uniform spread through each interval. This explains the small differences
between results in (i) and (iii); but since the differences are small, the assumption is quite
good.
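The coded calculation in Question 3 (iii) of the 2002 paper can be checked directly from the frequency table. This is a sketch with illustrative variable names; the coding y = (x − 499.5)/200 is the one used in the solution.

```python
from math import sqrt

freq = [23, 23, 19, 16, 19]
mid = [99.5, 299.5, 499.5, 699.5, 899.5]

# Coded values: -2, -1, 0, 1, 2
y = [(m - 499.5) / 200 for m in mid]

n = sum(freq)
sum_fy = sum(f * v for f, v in zip(freq, y))
sum_fy2 = sum(f * v ** 2 for f, v in zip(freq, y))

mean_y = sum_fy / n
var_y = (sum_fy2 - sum_fy ** 2 / n) / (n - 1)

# Decode: the shift affects only the mean, the scale factor affects both
mean_x = 499.5 + 200 * mean_y
sd_x = 200 * sqrt(var_y)

print(sum_fy, sum_fy2, round(mean_x, 1), round(sd_x, 1))
```

Coding keeps the hand arithmetic small; the decoded mean and standard deviation are the grouped-data estimates compared with the exact values in part (iv).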
Ordinary Certificate, Paper II, 2002. Question 4

(i)

[Venn diagram: three overlapping sets D, P and F. Percentages in each region:
D only 12; P only 6; F only 8; all three 22; D and P only x; D and F only y;
P and F only z; none of the three w.]

Given that 62% did tick D, then 12 + x + y + 22 = 62 or x + y = 28.   (1)

Also 58% did tick P, so 6 + x + z + 22 = 58 or x + z = 30.   (2)

Finally, 56% did tick F, so 8 + y + z + 22 = 56 or y + z = 26.   (3)

If w is the % ticking none of D, P, F, then 48 + x + y + z + w = 100, so x + y + z + w = 52.

Adding (1), (2), (3) gives 2(x + y + z) = 28 + 30 + 26 = 84, i.e. x + y + z = 42, and so w = 10.

(2) − (1) gives z − y = 2. In (3), put z = y + 2, giving 2y + 2 = 26, i.e. y = 12.

Then x = 16 and z = 14.

(ii)

               No factors    1 factor      2 factors    3 factors    Total
               w             12 + 6 + 8    x + y + z    22
Probability    0.10          0.26          0.42         0.22         1.00

(iii) Excluding w, the total probability is 0.90, so the conditional probabilities of 1, 2,
3 are

0.26/0.9 = 0.289,   0.42/0.9 = 0.467,   0.22/0.9 = 0.244.

Ordinary Certificate, Paper II, 2002. Question 5

(i)

[Line chart: Expenditure and its 3 year M.A., plotted for 1986 to 2000; vertical axis
from 0 to 500.]

(ii)

Year    Expenditure    3-year M.A.
1986        200
1987        203          203.33
1988        207          210.00
1989        220          223.00
1990        242          239.33
1991        256          251.67
1992        257          262.00
1993        273          278.33
1994        305          306.33
1995        341          341.67
1996        379          371.67
1997        395          400.00
1998        426          423.33
1999        449          442.33
2000        452

(iii) A moving average for 2000 requires the 2001 data. Retrospectively, a moving
average keeps close to the data curve provided that the data curve is fairly smooth, as it is
here. As a forecasting tool, however, a moving average does not pick up changes in trend
until some time later and we have no knowledge whether the trend after 2000 will
continue as before. Prediction for a 4-year period 2001-5, using in effect data which end
at 1999, is not likely to be reliable.
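The 3-year moving averages in Question 5 (ii) can be recomputed from the expenditure column. The function name `moving_average_3` is hypothetical; each average is centred on the middle year, so the first and last years have no value.

```python
expenditure = {
    1986: 200, 1987: 203, 1988: 207, 1989: 220, 1990: 242,
    1991: 256, 1992: 257, 1993: 273, 1994: 305, 1995: 341,
    1996: 379, 1997: 395, 1998: 426, 1999: 449, 2000: 452,
}


def moving_average_3(series):
    """Centred 3-point moving average, keyed by the middle year."""
    years = sorted(series)
    return {
        mid: (series[prev] + series[mid] + series[nxt]) / 3
        for prev, mid, nxt in zip(years, years[1:], years[2:])
    }


ma = moving_average_3(expenditure)
for year in sorted(ma):
    print(year, round(ma[year], 2))
```

The absence of an entry for 2000 is the point made in part (iii): the average for the latest year cannot be formed until the following year's figure is known.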

Ordinary Certificate, Paper II, 2002. Question 6

Diagram 1

(i) There is no indication of what is being measured on the y-axis, and no units are
given. Therefore we do not know whether the changes are large or small.

(ii) If it is worthy of comment, there seems to be a fairly steady increase in y over the
4 years. (If the units are in money, we would like to know whether they have been
corrected for inflation.)

Diagram 2

(i) Which is which for 2000/1: have they crossed over or not?

Note also that the vertical scale does not start at zero.

(ii) The heading implies that they have crossed over, and if so A has gone back to
where it started, having initially done much better than B; while B still has a more gentle
upward trend.
A key with different symbols for drawing the lines would distinguish between A
and B.

Diagram 3

(i) What is the distinction between the solid lines and the ---- lines? Do we already
have the complete year-end data for 2001/2? And what are the ---- lines based on?

(ii) Starting from a very low base, sales have doubled in the first year, and apparently
doubled again in the second year. On this basis, the projection seems to be that they will
double again in each of the next two years. This seems highly dubious on the
information available.

Ordinary Certificate, Paper II, 2002. Question 7

(i) If 1997 is taken as 100, 1998 is (100 × 15203)/14590 = 104.2. In the same way, 1999
is (100 × 15735)/14590 = 107.8, 2000 is 111 and 2001 is 113.7 (all relative to 1997).

(ii) 1998 is 104.2, as in part (i).
By this method, 1999 is (100 × 15735)/15203 = 103.5, 2000 is (100 × 16191)/15735 = 102.9
and 2001 is (100 × 16596)/16191 = 102.5.

(iii) The index (i) seems to be showing that roughly 3.5% of the 1997 earnings is
being added each year. However, the % increase based on present earnings, shown in
(ii), is falling by a little more than 0.5% each year. So actual earnings have increased
each year, but at a decreasing rate.
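The two kinds of index in Question 7 of the 2002 paper, one with a fixed base of 1997 and one with the previous year as base, can be compared in a short sketch. The earnings figures are those appearing in the solution's fractions; the dictionary names are illustrative.

```python
earnings = {1997: 14590, 1998: 15203, 1999: 15735, 2000: 16191, 2001: 16596}
years = sorted(earnings)

# Part (i): every year relative to 1997 = 100
fixed_base = {y: 100 * earnings[y] / earnings[1997] for y in years[1:]}

# Part (ii): every year relative to the year before = 100
chain_base = {
    y: 100 * earnings[y] / earnings[prev] for prev, y in zip(years, years[1:])
}

for y in years[1:]:
    print(y, round(fixed_base[y], 1), round(chain_base[y], 1))
```

Printing the two series side by side shows the pattern described in part (iii): the fixed-base index keeps climbing while the year-on-year index drifts down towards 100.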
Ordinary Certificate, Paper II, 2002. Question 8

(i) Driver and car costs for 10 ferry routes

[Scatter diagram: the 10 routes plotted as points labelled A to J; horizontal axis (x)
from 0 to 45, vertical axis (y) from 0 to 180.]

(ii) Σx = 275, Σy = 1167, n = 10, x̄ = 27.5, ȳ = 116.7.

Sxy = Σxy − (1/10)(Σx)(Σy) = 34046 − 32092.5 = 1953.5.

Sxx = 8113 − 275²/10 = 550.5.    Syy = 144671 − 1167²/10 = 8482.1.

r = Sxy/√(Sxx Syy) = 1953.5/√(550.5 × 8482.1) = 0.904.

This is near +1, showing a good linear relationship between x and y, with both increasing
together.

(iii) y = a + bx, where b = Sxy/Sxx and a = ȳ − b x̄.

b = 1953.5/550.5 = 3.549.    a = 116.7 − (3.549 × 27.5) = 19.10.

The fitted line is y = 19.10 + 3.549x.

(iv) H is "cheapest". (E is only slightly "dearer" than H). D is "most expensive".

THE ROYAL STATISTICAL SOCIETY

2003 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

RSS 2003
Ordinary Certificate, Paper II, 2003. Question 1

(i) Letters to the Editor, 18 September 2001 to 12 January 2002.

Sex       Frequency
Men          1698
Women         244
TOTAL        1942

Source: "The Times", 21 January 2002.

[Bar chart: Frequency (vertical axis, 0 to 1500) for Men and Women.]

(ii) Letters to the Editor from ladies, 18 September 2001 to 12 January 2002.

Subject               Frequency
Home and family           18
Terrorism and war         17
Education                 12
Social questions           5
The arts                   5
Others                    43
TOTAL                    100

Source: "The Times", 21 January 2002.

[Bar chart: Frequency (vertical axis, 0 to 50) for Home and family, Terrorism and war,
Education, Social questions, The arts, Others.]

(iii) Pie charts are less easy to draw than bar charts, but they show proportions of the total
rather than actual numbers of letters. In (i) this would be more useful. A bar chart would
be better in (ii) to make exact comparisons between the subjects, although a pie chart
would help to emphasise the proportion in the "others" category.

Ordinary Certificate, Paper II, 2003. Question 2

(i) One-fifth is 20%. The quotation has simply added 14% and 6%, instead of
averaging them, weighted according to the numbers of each sex in the population.

If there were equal numbers, the weights would each be ½. The percentage affected in
the whole population would therefore be (½ × 14) + (½ × 6) = 10.

(ii) The reduction is actually (999 − 333)/999 = 666/999 = 66.7%.

(The quotation had divided by 333, not 999.)

(iii) The further 10% reduction applies to the reduced premium, not the original, i.e. it
is 10% of the 30% no-claims premium.
So the total reduction is 70 + (1/10 × 30) = 73%, not 80%.
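The three corrections in Question 2 of the 2003 paper are each a single line of arithmetic, sketched below. The helper name `weighted_percentage` is hypothetical, and the equal weights are the assumption stated in the solution.

```python
def weighted_percentage(percentages, weights):
    """Average of group percentages, weighted by group size."""
    return sum(p * w for p, w in zip(percentages, weights)) / sum(weights)


# (i) 14% of one sex and 6% of the other, assuming equal numbers of each
overall = weighted_percentage([14, 6], [1, 1])

# (ii) a fall from 999 to 333, expressed as a percentage of the ORIGINAL value
reduction = 100 * (999 - 333) / 999

# (iii) a 70% reduction, then a further 10% off the remaining 30%
total_reduction = 70 + 0.10 * 30

print(overall, round(reduction, 1), total_reduction)
```

In each case the error in the quotation comes from using the wrong base: adding percentages of different groups, dividing by the new value, or applying a discount to the original premium.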
Ordinary Certificate, Paper II, 2003. Question 3

(i)

                        Question 2 incorrect    Question 2 correct    Total
Question 1 incorrect             4                      4               8
Question 1 correct               6                     11              17
Total                           10                     15              25

(ii) (a) 17/25 = 0.68

(b) 15/25 = 0.60

(c) 11/25 = 0.44

(d) 21/25 = 0.84

(e) 11/17 = 0.65

(f) 4/10 = 0.4

(iii)

Number correct    0     1     2    Total
Frequency         4    10    11     25

Mean = [(4 × 0) + (10 × 1) + (11 × 2)]/25 = 32/25 = 1.28.

Variance = (1/24)[Σfx² − (Σfx)²/25] = (1/24)[(4 × 0) + (10 × 1) + (11 × 4) − 32²/25]
= (1/24)(54 − 40.96) = 0.5433, so standard deviation = 0.737

[standard deviation = 0.72 if divisor 25 is used]

Ordinary Certificate, Paper II, 2003. Question 4

Salaries for EFG Bank:-

Salary                        Frequency    Cumulative frequency
Under 10,000                     6200             6200
10,000 but under 15,000         12000            18200
15,000 but under 20,000         15600            33800
20,000 but under 25,000         14200            48000
25,000 but under 30,000         11900            59900
30,000 but under 40,000          7000            66900
40,000 but under 50,000          3500            70400
50,000 but under 100,000         1500            71900
100,000 or more                   100            72000

(i) Median is at ½(m1 + m2) where m1 and m2 are the values of the 36000th and 36001st
observations. There are 33800 up to 19999.5 [19999.99 could be argued for, and similarly in the
rest of the question, but this possibility is ignored; it would make hardly any difference to the answers].
36000.5 − 33800 = 2200.5, and the next interval is 5000 wide with frequency 14200. So
we need to go 2200.5/14200 of the way through it to locate the median, which will
therefore be 19999.5 + (2200.5/14200) × 5000 = 20774, or 20800 to the nearest 100.

For the upper quartile, ¾(72000) = 54000 [very slightly different definitions of percentiles are similarly
ignored; they would make hardly any difference; the 6000/11900 below is similarly an approximation].
There are 48000 up to 24999.5. The next interval is 5000 wide with frequency 11900.
So we need to go (approximately) 6000/11900 of the way through it, to Q3 = 24999.5 +
(6000/11900) × 5000 = 27521, or 27500 to the nearest 100.

For the lower quartile, ¼(72000) = 18000 and thus we similarly get Q1 = 9999.5 +
(11800/12000) × 5000 = 14916, or 14900 to the nearest 100.

For the 95th percentile, 0.95 × 72000 = 68400 and we similarly get that the 95th percentile
is at 39999.5 + (1500/3500) × 10000 = 44285, or 44300 to the nearest 100.
5% of 72000 is at 3600, which is in the "under 10000" group.

(ii)

        5th percentile     Q1      Median     Q3      95th percentile
UK          12100         15700    23200     32300        48600
EFG        < 10000        14900    20800     27500        44300

EFG's statistics are all lower than the corresponding UK ones, differences tending to
increase further up the scale.

Possibly EFG has a different pattern of workforce compared with banks in general, with
more younger workers and/or more part-timers, or more employees in call-centres.

(iii) The mean and standard deviation would be inflated by the salary figures in the (open-
ended) top salary range, and we would need to make assumptions about the limits of the
uppermost interval (and, less importantly, the lowest interval) in order to do the
calculations. The percentiles allow more detailed comparison of the differences between
UK and EFG than could be made using only the mean and standard deviation.
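The grouped-data percentiles in Question 4 of the 2003 paper all use the same interpolation step, which can be written as one small function. The name `interpolate_position` is hypothetical; the class boundaries (19999.5 and so on) and target positions are those used in the solution.

```python
def interpolate_position(position, lower_bound, width, cum_before, class_freq):
    """Value at a given rank, assuming observations are spread evenly in the class."""
    return lower_bound + (position - cum_before) / class_freq * width


# Median: midway between the 36000th and 36001st of 72000 observations
median = interpolate_position(36000.5, 19999.5, 5000, 33800, 14200)
# Quartiles at ranks 18000 and 54000
q1 = interpolate_position(18000, 9999.5, 5000, 6200, 12000)
q3 = interpolate_position(54000, 24999.5, 5000, 48000, 11900)
# 95th percentile at rank 68400, in the 10000-wide class starting at 40,000
p95 = interpolate_position(68400, 39999.5, 10000, 66900, 3500)

print(round(q1), round(median), round(q3), round(p95))
```

The 5th percentile is not computed because it falls in the open-ended lowest class, for which no lower boundary is given.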

Ordinary Certificate, Paper II, 2003. Question 5

(i) r measures the degree of linear relationship between the two variables.

(ii)

[Three sketched scatter diagrams, labelled r = +1, r = −1 and r = 0.]

(iii) The product-moment correlation coefficient uses the actual recorded values of x
and y; Spearman's rank correlation coefficient ranks the values in order and uses
only the rankings. Both coefficients are calculated from the same basic formula
(although Spearman's is usually expressed in a different way).

(iv)

(v)
                    A    B    C    D    E    F    G    H
Time rank           3    6    8    1    5    2    7    4
Speed rank          8    3    1    5    2    6    7    4
Difference (d)     −5    3    7   −4    3   −4    0    0

Σd² = 25 + 9 + 49 + 16 + 9 + 16 + 0 + 0 = 124

rs = 1 − 6Σd²/[n(n² − 1)] = 1 − (6 × 124)/(8 × 63) = −0.476.

Faster speed is associated with shorter time, so the coefficient is negative. But the
association is not very strong and so rs is not near to −1.

(vi) No effect, because the actual values are not used, only their rankings, which
would not be altered.

Ordinary Certificate, Paper II, 2003. Question 6

(i) A weighted index number takes account of the amounts consumed, giving greater
weight to the prices of fruit with higher consumption.

(ii) A Paasche index number is based on the current consumption pattern and so is
more up to date.
[NOTE. If it is intended to go on monitoring price changes using an index
number, a Laspeyres index, which is base-weighted, would be useful.]

(iii) The Paasche index is Σp1q1/Σp0q1, where p0 and p1 are 1999 and 2002 prices
respectively and q1 is 2002 consumption.

Σp1q1 = (2 × 0.75) + (20 × 0.15) + (1 × 0.75) + (10 × 0.35) = 8.75.

Σp0q1 = (2 × 0.60) + (20 × 0.12) + (1 × 0.80) + (10 × 0.25) = 6.90.

Therefore the Paasche index is 100 × 8.75/6.90 = 126.8.

Fruit prices rose by 26.8%, using this index, from September 1999 to September
2002.
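The Paasche index in Question 6 (iii) can be recomputed from the four products in each sum. In this sketch the first factor of each product (2, 20, 1, 10) is taken to be the 2002 consumption and the second the price; that reading is an assumption, although the index is the same either way. The list names are illustrative.

```python
q1 = [2, 20, 1, 10]              # 2002 consumption (assumed reading of the products)
p0 = [0.60, 0.12, 0.80, 0.25]    # 1999 prices
p1 = [0.75, 0.15, 0.75, 0.35]    # 2002 prices

cost_now = sum(p * q for p, q in zip(p1, q1))    # sum of p1*q1
cost_base = sum(p * q for p, q in zip(p0, q1))   # sum of p0*q1

paasche = 100 * cost_now / cost_base
print(round(cost_now, 2), round(cost_base, 2), round(paasche, 1))
```

A Laspeyres index would use the same code with base-year quantities in place of `q1`; those quantities are not quoted in the solution, so they are not shown here.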
Ordinary Certificate, Paper II, 2003. Question 7

(i) (a) Trend is the underlying long-term movement of the series.

(b) The seasonal component is short-term regular variation about the trend, and can
be daily or weekly as well as quarterly; even in some cases time of day.

(c) In an additive model, observations are expressed as the sum

trend + seasonal component + random residual "noise".

(d) In a multiplicative model, these three items are multiplied together instead of
being added.

An additive model is appropriate for a slowly changing trend component, while a
multiplicative model is better when trend is changing rapidly. In a multiplicative model,
it is assumed that the ratio of seasonal variation to trend is constant in each season over
time.

(ii)

Year   Quarter   Passengers     4-quarter    Trend     Actual/Trend (%)
                 (x 100,000)    mv avge                [for part (iii)]
1999      1          12
          2          15
                                  15.25
          3          21                      15.625        134.40
                                  16.00
          4          13                      16.500         78.79
                                  17.00
2000      1          15                      17.750         84.51
                                  18.50
          2          19                      19.000        100.00
                                  19.50
          3          27                      20.250        133.33
                                  21.00
          4          17                      22.125         76.84
                                  23.25
2001      1          21                      24.375         86.15
                                  25.50
          2          28                      25.375        110.34
                                  25.25
          3          36                      25.375        141.87
                                  25.50
          4          16                      26.000         61.54
                                  26.50
2002      1          22                      27.500         80.00
                                  28.50
          2          32                      29.625        108.02
                                  30.75
          3          44                      31.125        141.37
                                  31.50
          4          25
2003      1          25

The trend rises until the second quarter of 2001, and then it rises again after a short
period of being constant.

(iii) See the table above for calculations of (actual ÷ trend). Average seasonal variations (%)
are calculated as follows:-

               Q1        Q2        Q3        Q4
1999                              134.40     78.79
2000          84.51    100.00     133.33     76.84
2001          86.15    110.34     141.87     61.54
2002          80.00    108.02     141.37
Mean          83.55    106.12     137.74     72.39    Sum = 399.8
× 400/399.8   83.59    106.17     137.81     72.43    Sum = 400

In quarters 1 and 4, actual numbers are below trend, namely about 84% and 72% of trend
respectively (i.e. 16% and 28% below). In quarters 2 and 3, actual numbers are above
trend, namely about 106% and 138% of trend respectively (i.e. 6% and 38% above).

Ordinary Certificate, Paper II, 2003. Question 8

(i) Coins B and C could have been tossed a large number of times, and the
proportions of heads obtained. The estimates could then have been expressed to
the nearest simple fraction.

[Coin A could have been dealt with in the same way, or if it was thought to be
"fair" (e.g. a new undamaged coin) a smaller number of tosses could have been
used and the result tested against the hypothesis that P(head) = ½, using this as
the "true" value of the probability if the hypothesis is not rejected.]

(ii)

(A)          (B)          (C)          Outcome : Probability
H (1/2)      H (1/3)      H (1/4)      HHH : 1/24
                          T (3/4)      HHT : 3/24 = 1/8
             T (2/3)      H (1/4)      HTH : 2/24 = 1/12
                          T (3/4)      HTT : 6/24 = 1/4
T (1/2)      H (1/3)      H (1/4)      THH : 1/24
                          T (3/4)      THT : 3/24 = 1/8
             T (2/3)      H (1/4)      TTH : 2/24 = 1/12
                          T (3/4)      TTT : 6/24 = 1/4
                                       (Total = 24/24)

(iii) P(3 heads) = 1/24, P(2 heads) = 1/4, P(1 head) = 11/24, P(0 heads) = 1/4.

(iv) P(2 heads) = 1/4. For the outcomes where one head is on coin A, we have
P(HHT) + P(HTH) = 5/24. Hence the required probability is (5/24) ÷ (1/4) = 5/6.
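The tree diagram in Question 8 of the 2003 paper can be checked by enumerating the eight outcomes with exact fractions. The head probabilities 1/2, 1/3 and 1/4 for coins A, B and C are read from the branches of the tree; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

p_head = {"A": Fraction(1, 2), "B": Fraction(1, 3), "C": Fraction(1, 4)}

# Probability of each outcome, written in the order A, B, C (e.g. "HTH")
outcomes = {}
for faces in product("HT", repeat=3):
    prob = Fraction(1)
    for coin, face in zip("ABC", faces):
        prob *= p_head[coin] if face == "H" else 1 - p_head[coin]
    outcomes["".join(faces)] = prob

# Part (iii): distribution of the number of heads
heads_dist = {
    k: sum(p for o, p in outcomes.items() if o.count("H") == k) for k in range(4)
}

# Part (iv): P(coin A shows a head | exactly 2 heads)
a_and_two = sum(p for o, p in outcomes.items() if o.count("H") == 2 and o[0] == "H")
conditional = a_and_two / heads_dist[2]

print(heads_dist, conditional)
```

Using `Fraction` avoids rounding, so the results can be compared directly with the fractions in the solution.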

THE ROYAL STATISTICAL SOCIETY

2004 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

RSS 2004

Ordinary Certificate, Paper II, 2004. Question 1

A quantitative variable is one measured on a numerical scale, while a qualitative variable
is not numerical but categorical, each item being assigned to one of a set of categories.

Qualitative variables can be nominal or ordinal.

The categories of a nominal variable cannot be put into an order; they may for example
be colours of objects (e.g. vehicles), source of origin (e.g. food products from different
parts of the world) or ethnic group.

Ordinal variables can be arranged in a logical order, such as for example in a 1-to-5
scoring scale for an opinion (e.g. disagree up to strongly agree), sizes of motor cars
(small, medium, large), vigour of plants (weak, average, good, very good).

Discrete variables are usually counted as integers, for example the number of vehicles
passing along a road or the number of insects found on the leaves of a plant.

Continuous variables are precise measured variables, such as lengths, heights, times,
which can take any value within a range.

Nominal : a bar chart with categories in no special order

Ordinal : a bar chart in the order given by the categories

Discrete : a bar diagram in numerical order of the variable

Continuous : a histogram with intervals in increasing order of the value of the variable.
Ordinary Certificate, Paper II, 2004. Question 2

(i) Stem and leaf diagrams, using 10 units as stems and 1 as leaves, for the two
categories are as follows, after ordering the leaves for each stem.

SEROMBA
Stem | Leaves                           Frequency   Cumulative frequency
1    | 0 6 6 8                               4            4
2    | 2 3 4 4 5 5 5 6 6                     9           13
3    | 0 0 0 0 0 0 5 5 5 7 8 8              12           25
4    | 0 0 0 0 0 0 0 0 5 5 5 5 5 5 7        15           40
5    | 0 0 0 0 2 5                           6           46
6    | 0 0 0 0                               4           50

ETOUDERAL
Stem | Leaves                                    Frequency   Cumulative frequency
1    | 5 9                                            2            2
2    | 3 5 5 6 6 9 9 9 9                              9           11
3    | 5 5 5 5 5 5 5 5 5 5 5 9 9 9 9 9 9 9 9 9       20           31
4    | 5 5 5 5 5 5 9 9 9                              9           40
5    | 5 5 5 5 9 9 9                                  7           47
6    | 5 5 5                                          3           50

(ii) There are 50 prices in each list. The median is therefore midway between the
25th and 26th, and so is 39 for each [Seromba: ½(38+40); Etouderal: ½(39+39)].
The upper quartile is the [¾(50+1)]th value in the list, i.e. the 38th item; this is 45 for
Seromba and 49 for Etouderal. The lower quartile is similarly the 12th item, which is
26 for Seromba and 35 for Etouderal. [Note. Alternative definitions of quartiles for discrete data
sets are sometimes used. In the present cases, they would make no difference due to the repeated values
in the lists.]

The maxima and minima are also used in the boxplots.

[Boxplots of price for SEROMBA and ETOUDERAL, drawn on a common price axis running from 10 to 60.]

(iii) Both have a range of 50, Seromba from 10 to 60 and Etouderal from 15 to
65. Both have a median of 39. Seromba has more at the lower end of the price range.
Etouderal has a large number at 35–39.

(iv) Almost all Etouderal products have prices ending in 5 or 9, the 9 being no doubt
intended to give an impression that they are not all that expensive. Seromba has several
ending in zeros, and none with 9s.

Ordinary Certificate, Paper II, 2004. Question 3

(i)
Value    Minimum possible    Maximum possible
2.7      2.65                2.75
3.8      3.75                3.85
3.0      2.95                3.05
4.4      4.35                4.45
Total 13.9    13.7           14.1

(ii) Minimum for mean = 13.7/4 = 3.425.

Maximum for mean = 14.1/4 = 3.525.

(iii) Minimum possible SD = 0.7182, maximum = 0.8261 (these are given in the
question).

The coefficient of variation is (SD/Mean) × 100%.

\[ \frac{\text{Min SD}}{\text{Max mean}} = \frac{0.7182}{3.525} = 0.2037, \qquad \frac{\text{Max SD}}{\text{Min mean}} = \frac{0.8261}{3.425} = 0.2412. \]

Hence the minimum possible coefficient of variation is 20.4% and the maximum is
24.1%.
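The bounds on the coefficient of variation in Question 3 can be checked with a short sketch. It assumes, as the table of minimum and maximum values implies, that each recorded value may be out by up to 0.05 either way; the SD bounds are simply the figures quoted in the question, and the variable names are illustrative only.

```python
# Bounds on the mean and coefficient of variation (Question 3).
# Assumption: each value is recorded to 1 d.p., so it may be off by 0.05.
values = [2.7, 3.8, 3.0, 4.4]
half_unit = 0.05

min_mean = sum(v - half_unit for v in values) / len(values)
max_mean = sum(v + half_unit for v in values) / len(values)

# SD bounds as quoted in the question (not recomputed here).
min_sd, max_sd = 0.7182, 0.8261

# The smallest ratio pairs the smallest SD with the largest mean, and vice versa.
min_cv = 100 * min_sd / max_mean
max_cv = 100 * max_sd / min_mean

print(f"mean between {min_mean:.3f} and {max_mean:.3f}")
print(f"CV between {min_cv:.1f}% and {max_cv:.1f}%")
```

The pairing of minimum SD with maximum mean is what makes the ratio as small as possible.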

Ordinary Certificate, Paper II, 2004. Question 4

(i) The proportion with no amenities is 1 − 0.84 = 0.16.

The proportion with 3 is given as 0.16 and the proportion with 2 or more as 0.36. Hence
the proportion with 2 is 0.36 − 0.16 = 0.20.

This leaves a proportion of 0.48 with 1 amenity.

Summary table:

Number of amenities    0       1       2       3
Proportion             0.16    0.48    0.20    0.16

Mean = (0 × 0.16) + (1 × 0.48) + (2 × 0.20) + (3 × 0.16) = 1.36.

(ii) Put x = proportion having just C = proportion having just BL. The proportion
having just S is then x + 0.06.

So, for one amenity, we have 0.48 = x + x + x + 0.06 = 3x + 0.06, giving 3x = 0.42 so that
x = 0.14.

Now put y = proportion with (C+BL) = proportion with (C+S). Then the proportion
having (S+BL) is y + 0.02. Thus for two amenities we have 3y + 0.02 = 0.20, so y = 0.06.

The diagram can now be completed.

[Venn diagram of the three amenities S, BL and C, with proportions: S only 0.20; BL only 0.14; C only 0.14; S and BL only 0.08; C and BL only 0.06; C and S only 0.06; all three 0.16; none 0.16.]

Ordinary Certificate, Paper II, 2004. Question 5

(i) The chart shows actual takings (diamond symbols) and the 7-day moving average
calculated in part (iii) (square symbols).

[Time chart of takings (vertical axis, 0 to 140) against day (M to Su over three weeks).]

(ii) Seasonal variation is a regular short-term variation from the trend. Here there is a
7-day variation, with more sales on Friday and Saturday, dropping on Sunday and
Monday, and then increasing again steadily through the week.

(iii) A 7-day moving average will be needed.

Day   Takings (000)   7-day total   7-day moving average
M          92
Tu         98
W         104
Th        106            753            107.57
F         120            750            107.14
Sa        132            745            106.43
Su        101            741            105.86
M          89            738            105.43
Tu         93            733            104.71
W         100            726            103.71
Th        103            724            103.43
F         115            722            103.14
Sa        125            719            102.71
Su         99            716            102.29
M          87            712            101.71
Tu         90            711            101.57
W          97            707            101.00
Th         99            705            100.71
F         114
Sa        121
Su         97

(iv) The trend is slowly downwards, almost linear.
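As a check on the moving-average table in Question 5, the following minimal sketch recomputes the 7-day totals and averages from the daily takings. The function name `moving_average` is illustrative and not part of the original solution.

```python
# Daily takings for three weeks, Monday to Sunday, as listed in the table.
takings = [92, 98, 104, 106, 120, 132, 101,
           89, 93, 100, 103, 115, 125, 99,
           87, 90, 97, 99, 114, 121, 97]


def moving_average(series, window):
    """Return (totals, averages) for every full window of the given length."""
    totals = [sum(series[i:i + window])
              for i in range(len(series) - window + 1)]
    return totals, [t / window for t in totals]


totals, averages = moving_average(takings, 7)

# Each average belongs to the middle (4th) day of its window,
# so the first one is plotted against the first Thursday.
for total, avg in zip(totals, averages):
    print(total, round(avg, 2))
```

Because the window length is odd, each average already sits on an actual day and no centring step is needed.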


Ordinary Certificate, Paper II, 2004. Question 6

(i) There are n = 8 destinations. Sample mean $\bar{x}$ = 84/8 = 10.5 days.

\[ \text{Sample standard deviation} = \sqrt{\frac{1}{7}\left(944 - \frac{84^2}{8}\right)} = \sqrt{\frac{62}{7}} = 2.976. \]

(ii)

[Scatter diagram of price (vertical axis, 0 to 1800) against duration in days (horizontal axis, 0 to 20), with the line from part (iii) drawn on it.]

There is very little suggestion of a linear relationship. It is possible to find holidays of
the same length at very different prices.

(iii) $\bar{y}$ = 9490/8 = 1186.25.

The estimate of the slope parameter is

\[ \frac{102230 - (84 \times 9490)/8}{944 - 84^2/8} = \frac{2585}{62} = 41.69. \]

So the line is y − 1186.25 = 41.69(x − 10.5) = 41.69x − 437.78, i.e. y = 748.5 + 41.7x.

[This is plotted on the diagram above. Note for plotting that, for example, the line passes
through ($\bar{x}$, $\bar{y}$) and (15, 1374).]

(iv) With x = 21, we get y = 1624(.2). However, the data are very variable and
predictions are therefore unreliable. Also there are no data above x = 15, so to extend the
line to x = 21 is extremely hazardous.

Ordinary Certificate, Paper II, 2004. Question 7

(i) In the tree diagram, C represents cursory check and T represents thorough check.
The tree "starts" with C at week 1; weeks 2, 3 and 4 are as shown.

[Tree diagram for weeks 1 to 4, starting from C in week 1. After a C, the next check is T with probability 0.7 and C with probability 0.3; after a T, the next check is T with probability 0.2 and C with probability 0.8.]

(ii) From the tree diagram, the probabilities are (a) 0.7, (b) (0.7 × 0.2) + (0.3 × 0.7) = 0.35,
(c) (0.7 × 0.2 × 0.2) + (0.7 × 0.8 × 0.7) + (0.3 × 0.7 × 0.2) + (0.3 × 0.3 × 0.7) = 0.525.

(iii) For P(2 | 4), we have P(2 | 4) = P(2 and 4) / P(4) and, following the appropriate routes
through the tree to obtain the numerator, we have

\[ \frac{P(2 \text{ and } 4)}{P(4)} = \frac{(0.7 \times 0.2 \times 0.2) + (0.7 \times 0.8 \times 0.7)}{0.525} = \frac{0.42}{0.525} = 0.8. \]
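The tree-diagram arithmetic in Question 7 can be reproduced by enumerating every route through weeks 2 to 4. This is a sketch only: it assumes the events of interest are "thorough check in week k", which is what the arithmetic in parts (ii) and (iii) implies, and the names `P` and `path_probability` are illustrative.

```python
from itertools import product

# Branch probabilities read from the tree: P[current][next].
P = {"C": {"T": 0.7, "C": 0.3},
     "T": {"T": 0.2, "C": 0.8}}


def path_probability(path, start="C"):
    """Multiply the branch probabilities along one route through the tree."""
    prob, current = 1.0, start
    for state in path:
        prob *= P[current][state]
        current = state
    return prob


# Every possible sequence of checks for weeks 2, 3 and 4 (week 1 is C).
paths = list(product("CT", repeat=3))

p_T2 = sum(path_probability(p) for p in paths if p[0] == "T")
p_T3 = sum(path_probability(p) for p in paths if p[1] == "T")
p_T4 = sum(path_probability(p) for p in paths if p[2] == "T")

# Conditional probability: thorough in week 2 given thorough in week 4.
p_T2_and_T4 = sum(path_probability(p) for p in paths
                  if p[0] == "T" and p[2] == "T")
p_T2_given_T4 = p_T2_and_T4 / p_T4

print(round(p_T2, 3), round(p_T3, 3), round(p_T4, 3), round(p_T2_given_T4, 3))
```

Enumerating all eight routes is wasteful for long chains, but it mirrors the way the tree is read by hand.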

Ordinary Certificate, Paper II, 2004. Question 8

The table could be rewritten in the order of SMRs (increasing or decreasing equally
suitable), with the statement "SMR for whole of South Trafford = 87" placed beneath it.
(Possibly the statement "National SMR = 100" would also be useful.)

The number of deaths in South Trafford is 13% below the national average, but the
electoral wards vary substantially. Sale Moor and St Martin's are over 20% higher than
the national average, while Bowdon, Village, Mersey St Mary's, Hale and Timperley are
all more than 25% less than the national average. Priory and Broadheath are very close
to the national average, Altrincham is 15% below it and Broadlands 18% below it.

Why should Sale Moor and St Martin's be so high? Some study should be made of
possible reasons, such as population profiles (age etc) and local conditions. Comparison
with those at the opposite extreme of the table would be useful.

THE ROYAL STATISTICAL SOCIETY

2005 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II


Note. In accordance with the convention used in the Society's examination papers, the notation log denotes
logarithm to base e. Logarithms to any other base are explicitly identified, e.g. log10.

© RSS 2005
Ordinary Certificate, Paper II, 2005. Question 1

(i)

[Bar chart: Number of people with Alzheimer's Disease in USA. Source: US National Center for Health Statistics 2000. Vertical axis: number of people (millions); horizontal axis: year (2000 estimated, 2050 projected); separate bars for ages 65-74, 75-84 and 85+.]

(ii) Total numbers in each age group increase in the projected figures, only
slightly in the 65–74 group but substantially in 75–84 and even more in 85+.
In the 2000 data, the number in the 85+ group is less than that in 75–84, but
the prediction is that it will become much larger in the 85+ group by 2050. In
2000 there were relatively few in the 65–74 group, and this is predicted to
continue to 2050.

(iii) Possible advantages of using a pie chart are
(A) it is more eye-catching, easier to interpret,
(B) it shows the proportions in each category.

Possible disadvantages are
(A) calculations (of angles) are needed before it can be drawn,
(B) there is no scale to read off actual numbers in each category,
(C) the absolute numbers with the disease are not shown.

Ordinary Certificate, Paper II, 2005. Question 2

(i) Maximum is 27 years minus 1 day; minimum 26 years exactly. If his
birthday is on Census Day he will be recorded as having reached that age, and
he will still be 26 until the actual day he reaches 27.

(ii) The minimum age difference is 1 year 1 day (husband is exactly 36 and wife
will be 35 tomorrow). The maximum difference is 3 years minus 1 day
(husband will be 37 tomorrow and wife is 34 today).

Mid-point of difference is 2 years.

(iii) (a) Differences (older – younger) are

2, 5, 11, 3, 4, 0, 3, 13, 3, 1. Mean = 45/10 = 4.5 years.

The sum of the squares of the differences is 4 + 25 + 121 + 9 + 16 + 0
+ 9 + 169 + 9 + 1 = 363.

\[ s^2 = \frac{1}{10}\left(363 - \frac{45^2}{10}\right) = 16.05 \text{ and therefore } s = 4.01 \]

regarding this as a population of couples, or

\[ s^2 = \frac{1}{9}\left(363 - \frac{45^2}{10}\right) = 17.833 \text{ and therefore } s = 4.22 \]

regarding it as a sample.

(b) Differences (husband – wife) are

2, 5, –11, 3, –4, 0, –3, 13, 3, –1. Mean = 7/10 = 0.7 years.

\[ s^2 = \frac{1}{10}\left(363 - \frac{7^2}{10}\right) = 35.81 \text{ and therefore } s = 5.98 \text{ for a population, or} \]

\[ s^2 = \frac{1}{9}\left(363 - \frac{7^2}{10}\right) = 39.79 \text{ and therefore } s = 6.31 \text{ for a sample.} \]
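The population and sample versions of the standard deviation in Question 2(iii) differ only in the divisor (n or n − 1). A minimal sketch using Python's standard `statistics` module shows both for each set of differences; the list names are illustrative.

```python
import statistics

# Age differences for the ten couples, as listed in the solution.
older_minus_younger = [2, 5, 11, 3, 4, 0, 3, 13, 3, 1]
husband_minus_wife = [2, 5, -11, 3, -4, 0, -3, 13, 3, -1]

for label, d in (("older - younger", older_minus_younger),
                 ("husband - wife", husband_minus_wife)):
    mean = statistics.mean(d)
    sd_population = statistics.pstdev(d)  # divisor n
    sd_sample = statistics.stdev(d)       # divisor n - 1
    print(label, mean, round(sd_population, 2), round(sd_sample, 2))
```

Taking signed differences leaves the sum of squares at 363 but shrinks the mean from 4.5 to 0.7, which is why the standard deviation is larger in part (b).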

Ordinary Certificate, Paper II, 2005. Question 3

(i) These are the parking times to the nearest 10 minutes (below), i.e. 5 | 2 would
be any time from 520 to 529 minutes. 5 is the "stem", which is in hundreds of
minutes, and 2 is the "leaf", which is in tens of minutes. (Minutes less than 10
are ignored).

(ii) Six hours = 360 minutes. Seven vehicles stayed the full permitted time (or a
little more), and only two were above (490 and 520). Six cars park for just
below 60 minutes (1 hour), suggesting that there is one price for up to one
hour and then an increase. There is a similar pattern leading up to 180 minutes
(3 hours), pointing to a price increase at 3 hours. Also leading up to 240
minutes (4 hours) there is a concentration of leavers, suggesting another
increase at 4 hours. There are no indications from these data of tariff change
at 2 hours; there is a possibility at 5 hours.

(iii) The median is the ½(60 + 1)th item in order in the list, i.e. between the 30th
and 31st. Thus it is at 195 (with the assumptions given).

The lower and upper quartiles may be taken at the 15th and 45th observations
in order (there are slightly differing conventions about this); these are at 135
and 295.

(iv) The first quartile is nearer to the median than the third quartile is, which
indicates a distribution that is skew to the right (positively skew).

[Boxplot of parking times on an axis from 0 to 500 minutes, marked at 15, 135, 195, 295 and 525.]

Ordinary Certificate, Paper II, 2005. Question 4

(i)

[Cumulative frequency polygon: cumulative number of customers (0 to 600) against waiting time in minutes (0 to 25), with the lower quartile, median and upper quartile marked.]

Cumulative frequencies are:

Up to       0.5    1    2     3     4     5     7.5   10    15    20    Total
Cum freq    16     38   112   188   256   316   372   426   468   504   520

(ii) The median is at the (520/2)th item, i.e. it is found by drawing a horizontal
line to the graph at height 260 and projecting down to the x-axis – this gives a
waiting time of about 4.1 minutes. Similarly, using heights 130 and 390 in the
y-direction, the quartiles are approximately 2.3 and 8.5 minutes.

(iii) The 260th customer in order of times is in the group 4 to 5. It is easier to
calculate the median as 4/60 of the way up from 4 (there are 256 up to 4 and 60
in the next interval, which is of width 1). Therefore M = 4 + 4/60 = 4.07 mins
(i.e. 4 mins 4 seconds).

(iv) Half the customers wait less than 4.07 minutes BUT there is a very long tail to
the right, with 10% waiting more than 15 minutes and 25% more than 8.5
minutes (see (ii)).
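The interpolation used for the median waiting time in Question 4(iii) can be written as a small function that works for any cumulative-frequency target. This is a sketch under the usual assumption that values are spread evenly within each class; `interpolated_quantile` is a hypothetical helper name, and the open-ended final class is left out because its upper bound is not needed here.

```python
# Upper class bounds (minutes) and cumulative frequencies from the table.
upper_bounds = [0.5, 1, 2, 3, 4, 5, 7.5, 10, 15, 20]
cum_freq = [16, 38, 112, 188, 256, 316, 372, 426, 468, 504]
total = 520


def interpolated_quantile(target):
    """Linear interpolation within the class containing the target count."""
    lower_bound, lower_cum = 0.0, 0
    for bound, cum in zip(upper_bounds, cum_freq):
        if target <= cum:
            fraction = (target - lower_cum) / (cum - lower_cum)
            return lower_bound + fraction * (bound - lower_bound)
        lower_bound, lower_cum = bound, cum
    raise ValueError("target lies beyond the last listed class")


median = interpolated_quantile(total / 2)
lower_quartile = interpolated_quantile(total / 4)
upper_quartile = interpolated_quantile(3 * total / 4)
print(round(median, 2), round(lower_quartile, 2), round(upper_quartile, 2))
```

The interpolated quartiles are close to, but not identical with, the values read by eye from the graph in part (ii).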
Ordinary Certificate, Paper II, 2005. Question 5

(i) The inspector seems to believe that if only 3 eggs are used they will always be
the 3 (out of 4) that do not contain salmonella! This is clearly not true, indeed
totally lacking in logic. Unless we have tested them, we do not know; the
bacterium is likely to be randomly distributed. We assume this in what
follows.

(ii) Let S be the event "contains salmonella". We have P(S) = 0.25.

P(3-egg quiche salmonella free) = $(0.75)^3$ = 0.422.

(iii) Similarly, P(4-egg quiche salmonella free) = $(0.75)^4$ = 0.316.

(iv) The probability that an n-egg quiche is salmonella free is $(0.75)^n$. This is less
than 0.1 if n log(0.75) < log(0.1).

Hence the critical value of n is

\[ \frac{\log(0.1)}{\log(0.75)} = \frac{-2.302585}{-0.287682} = 8.004. \]

[Check: $(0.75)^8$ = 0.1001 and $(0.75)^9$ = 0.07508. (This result could be reached
on a hand calculator very easily by successive multiplication.)]

We must take n at least 9, so the answer is n ≥ 9.

(v) If locally produced eggs are used, independence is not likely since they are
likely to have come from exactly the same source, so will all (or mostly) have
the infection or all (or mostly) not. If however a large egg-packing company
is used for supplies, independence is more likely since sources of supply will
be numerous and mixing will occur.

Ordinary Certificate, Paper II, 2005. Question 6

(i)

[Time chart of reported accidents in a large town 2002-2004: number of accidents (0 to 60) against year and quarter, showing the accidents (y), the moving average trend and the regression trend.]

(ii) The peaks and troughs recur at 4-quarterly intervals, so a 4-quarter moving
average will smooth this out.

(iii)

Year   Quarter   Quarter      Accidents   4-quarter      Sum of pairs   Centred moving
                 number (x)   (y)         moving total   of totals      average trend
2002   1          1           42
       2          2           38
                                          162
       3          3           36                         330            41.250
                                          168
       4          4           46                         341            42.625
                                          173
2003   1          5           48                         352            44.000
                                          179
       2          6           43                         359            44.875
                                          180
       3          7           42                         364            45.500
                                          184
       4          8           47                         372            46.500
                                          188
2004   1          9           52                         379            47.375
                                          191
       2         10           47                         384            48.000
                                          193
       3         11           45
       4         12           49

                 Σx = 78      Σy = 535

(iv) The moving average trend is very nearly linear.

(v) The quarter numbers (x) have been added to table in part (iii).

Linear regression:

Σx² = 650, Σy² = 24085, Σxy = 3599.

$\bar{x}$ = 6.5, $\bar{y}$ = 44.583.

\[ \hat{b} = \frac{3599 - \dfrac{78 \times 535}{12}}{650 - \dfrac{78^2}{12}} = \frac{121.5}{143} = 0.8497 \]

Line is y – 44.583 = 0.8497(x – 6.5) or y = 0.8497x + 39.060.

(vi) The linear regression trend is very similar to the moving average, with the
advantage of being estimated for all quarters (not omitting the end ones).
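The least-squares line for the quarterly accident counts in Question 6 can be recomputed directly from the data with the same sums-of-products formula used in the solution. The sketch below uses plain Python; the variable names are illustrative.

```python
# Quarterly accident counts, 2002 Q1 to 2004 Q4, with quarter numbers 1..12.
y = [42, 38, 36, 46, 48, 43, 42, 47, 52, 47, 45, 49]
x = list(range(1, len(y) + 1))
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Corrected sums of products and squares.
s_xy = sum_xy - sum_x * sum_y / n   # 3599 - 78*535/12
s_xx = sum_x2 - sum_x ** 2 / n      # 650 - 78**2/12

slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

print(round(slope, 4), round(intercept, 2))

# Fitted trend for every quarter, including the end ones that the
# centred moving average cannot cover.
trend = [intercept + slope * q for q in x]
```

Unlike the centred moving average, the fitted line gives a trend value for all twelve quarters, which is the advantage noted in part (vi).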


Ordinary Certificate, Paper II, 2005. Question 7

TEAM                  A    I    F    S    E    SA   NZ   W
Scored                2    6    4    8    3    5    1    7
Conceded              1    3    6    7    2    5    4    8
d (diff. in ranks)    1    3   –2    1    1    0   –3   –1

(Highest scored ranked 1. Lowest conceded ranked 1.)

Σd² = 1 + 9 + 4 + 1 + 1 + 0 + 9 + 1 = 26.

\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 26}{8 \times 63} = 0.6905 \]

The sample is too small for $r_s$ to be worth testing for significance (in fact it is not
significant) but the main contributions to Σd² are from Ireland and New Zealand.
New Zealand scored most points but conceded a fair number; Ireland did not score
very many.

Ordinary Certificate, Paper II, 2005. Question 8

(i) Cost relatives are (percentages):

A = 8.4/7.5 = 1.120 i.e. 112.0%;

B = 12.0/10.2 = 1.176 i.e. 117.6%;

C = 9.6/9.8 = 0.980 i.e. 98.0%;

D = 13.2/12.4 = 1.065 i.e. 106.5%.

(ii) Model C became 2% cheaper to produce, and the others were all more
expensive: D by 6.5%, A by 12.0%, B by 17.6%.

(iii) (200 × 112.0) + (240 × 117.6) + (750 × 98.0) + (140 × 106.5) = 139034.

Total weekly in 2004 = 1330.

Therefore weighted index = 139034/1330 = 104.5.
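The weighted index in Question 8 is a weighted average of the cost relatives. A minimal sketch follows; it assumes, as the solution does, that the relatives are rounded to 1 d.p. before weighting, and the dictionary names (`base_cost`, `new_cost`, `weekly_weight`) are illustrative labels for the figures quoted above.

```python
# Costs appearing in the cost relatives (new cost / base cost) and the
# 2004 weekly figures used as weights, for models A to D.
base_cost = {"A": 7.5, "B": 10.2, "C": 9.8, "D": 12.4}
new_cost = {"A": 8.4, "B": 12.0, "C": 9.6, "D": 13.2}
weekly_weight = {"A": 200, "B": 240, "C": 750, "D": 140}

# Cost relatives as percentages, rounded to 1 d.p. as in the solution.
relative = {m: round(100 * new_cost[m] / base_cost[m], 1) for m in base_cost}

weighted_sum = sum(weekly_weight[m] * relative[m] for m in relative)
total_weight = sum(weekly_weight.values())
index = weighted_sum / total_weight

print(relative)
print(round(weighted_sum), total_weight, round(index, 1))
```

Model C dominates the weights, which is why the overall index rises by only about 4.5% even though three of the four models became noticeably more expensive.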

THE ROYAL STATISTICAL SOCIETY

2006 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

© RSS 2006

Ordinary Certificate, Paper II, 2006. Question 1

(a)

[Diagram: a percentage scale from 0 to 100%, with 39.0% reached at x = 14.5 and 64.3% at x = 29.5, so that 25.3% lies between them; the median M falls in this interval.]

In a large sample, the median can be taken at the 50% frequency point. There
is 39.0% of the sample below the value x = 14.5 and in the next 15 units of
measurement there is 25.3% of the sample. We need a further 11.0% to
reach 50%, i.e. we go 11.0/25.3 of the way from x = 14.5 to 29.5. So

\[ 14.5 + \frac{11.0}{25.3} \times 15 \]

is the required value of the median, i.e. 14.5 + 6.52 = 21.0 (1 d.p.). This
assumes that the values between x = 14.5 and x = 29.5 are uniformly spread.

(b) (i) Again assuming linear interpolation is correct, the critical value for
n = 35 will be ½(55.76 + 43.77) = 49.765.

(ii)
n                 30         35         40
1/n               0.03333    0.02857    0.02500
Critical value    43.77                 55.76

Using 1/n for the interpolation, and the given critical values of the
statistic at n = 30 and n = 40, the value at n = 35, where 1/n = 0.02857,
is

\[ 43.77 + \frac{0.02857 - 0.03333}{0.02500 - 0.03333}\,(55.76 - 43.77) = 43.77 + \frac{(-0.00476)}{(-0.00833)} \times 11.99 \]

= 43.77 + 6.85

= 50.62 (slightly larger than the value in (b)(i)).
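Both interpolations in part (b) use the same two-point formula; only the scale on which n is measured changes. The following sketch makes that explicit. The helper `interpolate` is an illustrative name, not part of the original solution.

```python
def interpolate(x, x0, y0, x1, y1):
    """Straight-line interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


# Critical values given at n = 30 and n = 40.
cv_30, cv_40 = 43.77, 55.76

# (b)(i): interpolate linearly in n itself.
linear_in_n = interpolate(35, 30, cv_30, 40, cv_40)

# (b)(ii): interpolate linearly in 1/n instead.
linear_in_reciprocal = interpolate(1 / 35, 1 / 30, cv_30, 1 / 40, cv_40)

print(round(linear_in_n, 3), round(linear_in_reciprocal, 2))
```

On the 1/n scale, n = 35 lies 4/7 of the way from n = 30 to n = 40 rather than half way, which is why the second answer is slightly larger.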


Ordinary Certificate, Paper II, 2006. Question 2

(i) Driver sleeplessness is thought to account for 10% of all UK road accidents.
On motorways and dual carriageways, that figure rises to 20%.

(ii) 1 in 10,000 is 0.0001, but as a percentage this is 0.01%.

(iii) The increase as a percentage of the previous year is

\[ \frac{280000 - 75440}{75440} \times 100 \]

which is 2.7116 × 100 i.e. 271%.

This is the only change needed to the quotation.

(iv) The percentages quoted add up to 100. It is not possible for all of them to
have increased or remained the same – some must have decreased (as
percentages) to balance the increases. Possibly there has been confusion
between actual numbers and percentages.

Ordinary Certificate, Paper II, 2006. Question 3

(i) As the data are already ranked in order, an ordered stem and leaf diagram is
easily produced by using 01 to 18 as stems and omitting the final digit when
constructing the leaves. For example, 1517 simply becomes 15|1 (see the
diagram).

Lifetimes (h) of 100 light bulbs

Stem       | Leaf
(hundreds) | (tens)
 1 | 9
 2 | 7
 3 | 6
 4 | 2
 5 | 15
 6 | 27
 7 | 5599
 8 | 4456778
 9 | 479
10 | 00000223558889
11 | 12337788
12 | 23344455667
13 | 01244456666677
14 | 000001122334566678899
15 | 13
16 | 25789
17 | 02
18 | 9

(ii) Grouped frequency distribution of lifetimes of light bulbs:-

Lifetime (h)               Frequency    Density*
0 but less than 500            4          0.8
500 but less than 1000        18          3.6
1000 but less than 1100       14         14
1100 but less than 1200        8          8
1200 but less than 1300       11         11
1300 but less than 1400       14         14
1400 but less than 1500       21         21
1500 but less than 2000       10          2
Total                        100

* Density is frequency density per 100 hours. This column is appended to the
table to enable the histogram in part (iii) to be constructed, in which the areas
standing on each interval represent frequencies.

(iii) Histogram of lifetimes of 100 light bulbs

[Histogram: frequency density per 100 h (0 to 25) against lifetime (h) (0 to 2000).]

(iv) The histogram shows the overall shape of the distribution, so giving a general
picture of the overall quality of the batch. The stem and leaf diagram also
does this, but in addition makes it easy to calculate the median and quartiles;
it is therefore more useful. The stem and leaf is more easily incorporated into
a routine quality control procedure because it can be built up as data are
collected.
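The density column in the grouped frequency table of Question 3(ii) is the frequency divided by the class width, scaled to a width of 100 hours. A short sketch makes the calculation explicit; the tuple layout and names are illustrative.

```python
# (lower bound, upper bound, frequency) for each lifetime class in hours.
classes = [
    (0, 500, 4),
    (500, 1000, 18),
    (1000, 1100, 14),
    (1100, 1200, 8),
    (1200, 1300, 11),
    (1300, 1400, 14),
    (1400, 1500, 21),
    (1500, 2000, 10),
]

PER_HOURS = 100  # densities are quoted per 100 hours

densities = [freq / (upper - lower) * PER_HOURS
             for lower, upper, freq in classes]

# In a histogram the area of each bar, not its height, gives the frequency.
areas = [d * (upper - lower) / PER_HOURS
         for d, (lower, upper, _) in zip(densities, classes)]

for (lower, upper, freq), d in zip(classes, densities):
    print(f"{lower} but less than {upper}: frequency {freq}, density {d:g}")
```

Plotting the raw frequencies as heights would exaggerate the three wide classes by a factor of five relative to the 100-hour classes.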

Ordinary Certificate, Paper II, 2006. Question 4

(i) P(A | B) is

\[ \frac{P(A \text{ and } B)}{P(B)}, \quad \text{i.e.} \quad \frac{P(A \cap B)}{P(B)}. \]

(ii) If A and B are independent, then
(a) P(A and B) = P(A) P(B),
(b) P(A | B) = P(A) [i.e. B gives no information on A].

(iii) (a) Device 1 works only when both X and Y work, so the probability is

\[ \frac{3}{4} \times \frac{7}{8} = \frac{21}{32}. \]

(b) Device 2 works except when both X and Y fail. The probabilities of
failure are 1/4 and 1/8, so P(both fail) = 1/4 × 1/8 = 1/32 and P(Device 2
works) = 1 − 1/32 = 31/32.

(iv) Suppose Device 1 works. Then X and Y both work, so the probabilities are
(a) 1, (b) 0, (c) 1.

Now suppose Device 2 works. We use notation as follows: X for "X works", Y for
"Y works", 2 for "Device 2 works" and $Y^C$ for "Y does not work".

(a) [note that the event X ∩ 2 is the same event as X]

\[ P(X \mid 2) = \frac{P(X \cap 2)}{P(2)} = \frac{P(X)}{P(2)} = \frac{3/4}{31/32} = \frac{3}{4} \times \frac{32}{31} = \frac{24}{31}. \]

(b)

\[ P(X \cap Y^C \mid 2) = \frac{P(X \cap Y^C \cap 2)}{P(2)} = \frac{P(X \cap Y^C)}{P(2)} = \frac{\frac{3}{4} \times \frac{1}{8}}{\frac{31}{32}} = \frac{3}{4} \times \frac{1}{8} \times \frac{32}{31} = \frac{3}{31}. \]

(c)

\[ P(X \cap Y \mid 2) = \frac{P(X \cap Y \cap 2)}{P(2)} = \frac{P(X \cap Y)}{P(2)} = \frac{\frac{3}{4} \times \frac{7}{8}}{\frac{31}{32}} = \frac{3}{4} \times \frac{7}{8} \times \frac{32}{31} = \frac{21}{31}. \]

Ordinary Certificate, Paper II, 2006. Question 5

(i) Rank the profit in order also, then calculate d, the difference in ranks for each
club.

Profit             1    4    2    3   10    6    9    5    8    7
League position    1    5    2    3    4   10    9    8    7    6
d                  0   –1    0    0    6   –4    0   –3    1    1

Spearman's rank correlation coefficient

\[ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 64}{10 \times 99} = 1 - 0.388 = 0.612. \]

This shows a fairly strong positive relation between position and profit. The
main exceptions are Chelsea (high in table but a very large loss) and
Tottenham Hotspur (a smaller loss than some clubs above them in the table;
also true of Southampton).

(ii) League position is an ordinal variable, so Pearson's coefficient is not suitable.
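The Spearman calculation in Question 5 can be checked with a few lines of Python. This is a sketch of the shortcut formula only, which is valid when there are no tied ranks (as here); the function name `spearman_from_ranks` is illustrative.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman's r_s via 1 - 6*sum(d^2) / (n(n^2 - 1)); assumes no tied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks for the ten clubs, in the order given in the table.
profit_rank = [1, 4, 2, 3, 10, 6, 9, 5, 8, 7]
league_rank = [1, 5, 2, 3, 4, 10, 9, 8, 7, 6]

sum_d2, r_s = spearman_from_ranks(profit_rank, league_rank)
print(sum_d2, round(r_s, 3))
```

Two clubs, with rank differences of 6 and –4, contribute 52 of the 64 in the sum of squared differences, which matches the exceptions picked out in the commentary.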
Ordinary Certificate, Paper II, 2006. Question 6

(i) For x = number of heart valve operations, Σx = 1790, Σx² = 437898.

The sample is of size n = 8 (note this is a sample, not a population, so divisor
n – 1 is used in calculating the variance and standard deviation), so the sample
mean is 1790/8 = 223.75 and the sample variance is

\[ \frac{1}{7}\sum (x - \bar{x})^2 = \frac{1}{7}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{8}\right) = 5340.786 \]

and the standard deviation is √(5340.786) = 73.08.

[Note. The mean and standard deviation can of course be found directly from
calculators, but candidates are strongly advised to give an indication of the
method being used.]

The coefficient of variation (expressed as a percentage) is (100 × sd)/mean = 32.7%.

(ii) Similarly for y = number of heart bypass operations, we have Σy = 14286
and Σy² = 28503366, leading to mean 1785.75, variance 427448.786 and
standard deviation 653.80.

Hence the coefficient of variation is 65380/1785.75 = 36.6%.

(iii) On average, a far greater number of heart bypass operations is carried out,
compared with heart valve operations.

The standard deviation is larger for bypasses, so we could say that the
between-hospital variation for this operation is greater than for heart valve
operations.

But when the coefficients of variation, which compare the standard deviations
with the means, are calculated they are quite similar. So, relative to the means,
the variability in the number of operations between hospitals is about the same
for the two types of operation.

Ordinary Certificate, Paper II, 2006. Question 7

(i) The graph shows percentage changes on the previous quarter. So the chain
base index numbers are read directly from the graph.

Year   Qtr   North    South
2003   Q1    –        –
2003   Q2    102.0    103.0
2003   Q3    102.5    103.0
2003   Q4    103.0    103.5
2004   Q1    103.5    102.5
2004   Q2    102.5    101.5
2004   Q3    102.0    101.0
2004   Q4    102.0    100.5

(ii) To obtain fixed index numbers based on 2003 Q1 = 100.0, we "scale up" each chain
base index number by the corresponding factor for its successor. For example, North
2003 Q3 is calculated as 102.0 × (102.5/100.0) = 104.55 (shown as 104.6 in the
table); North 2003 Q4 is then this multiplied by (103.0/100.0); and so on.

Year   Qtr   North    South
2003   Q1    100.0    100.0
2003   Q2    102.0    103.0
2003   Q3    104.6    106.1
2003   Q4    107.7    109.8
2004   Q1    111.5    112.5
2004   Q2    114.3    114.2
2004   Q3    116.6    115.3
2004   Q4    118.9    115.9

[Each result is given to 1 d.p. and then used as such. Slightly different
numbers are obtained in some quarters if higher precision is used throughout.]

(iii) Overall there was an 18.9% increase in house prices in the North over the two
years, and a 15.9% increase in the South.

The peak rate of increase in the South was in 2003 Q4, whereas it was a
quarter later in the North.

In the North, the rate of increase continued to rise until 2004 Q1, and then
began to drop; in the South, there was a more rapid increase until 2003 Q4
and then the rate of increase dropped considerably for the rest of the period.

The rate of increase was greater in the South than in the North in 2003, but the
reverse was true in 2004.
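The conversion from chain base to fixed base index numbers in Question 7(ii) can be sketched as below. To match the solution, each result is rounded to 1 d.p. before being carried forward; `Decimal` with half-up rounding is used here so that a value such as 104.55 rounds to 104.6 as it would by hand. The function name `fixed_base` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def fixed_base(chain_indices, base="100.0"):
    """Chain-link quarter-on-quarter indices into a fixed base series.

    Each value is rounded to 1 d.p. and then used as such, as in the solution.
    """
    series = [Decimal(base)]
    for link in chain_indices:
        value = series[-1] * Decimal(link) / Decimal("100")
        series.append(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))
    return series


# Chain base index numbers for 2003 Q2 to 2004 Q4, read from the table.
north_chain = ["102.0", "102.5", "103.0", "103.5", "102.5", "102.0", "102.0"]
south_chain = ["103.0", "103.0", "103.5", "102.5", "101.5", "101.0", "100.5"]

north_fixed = [str(v) for v in fixed_base(north_chain)]
south_fixed = [str(v) for v in fixed_base(south_chain)]
print(north_fixed)
print(south_fixed)
```

Carrying full precision instead of rounding at each step changes some quarters in the last decimal place, as the note in the solution warns.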

Ordinary Certificate, Paper II, 2006. Question 8

In order to draw a useful chart on this scale, a whole sheet of graph paper should be
used. The fluctuations in quarterly numbers of new members cause the vertical scale
to be inconveniently wide. If it goes down to 0, this wastes space at the bottom end;
a "false origin" might be used, but there must be a clear indication of it (e.g. by a
"break" in the axis).

(i) The line shows the time trend, which is rather irregular but does
not seem to have a seasonal component. There may have been a membership
campaign occasionally; information would be useful. This could explain the
local peaks.

[Time chart "Society membership": new members (0 to 400) against time (2000 Q1 to 2004 Q4), showing New Members, Centred 4Q MA and 7Q MA.]

(ii) and (iii) The calculations are shown in the table below. For example, the centred
four-quarterly moving average for 2000 Q3 is

\[ \frac{1}{2}\left(\frac{242 + 357 + 168 + 186}{4} + \frac{357 + 168 + 186 + 278}{4}\right) = \frac{242 + 2(357 + 168 + 186) + 278}{8}. \]

The seven-quarterly moving averages do not need centring in this way.

The graph above shows both the centred four-quarterly moving averages and
the seven-quarterly moving averages.

Calculation

Membership numbers
Time       New members   Centred 4Q MA   7Q MA
2000 Q1       242
     Q2       357
     Q3       168           242.8
     Q4       186           229.4         236.4
2001 Q1       278           216.8         231.1
     Q2       214           224.4         212.6
     Q3       210           220.4         215.1
     Q4       205           210.5         213.1
2002 Q1       227           202.3         197.4
     Q2       186           192.9         191.6
     Q3       172           181.5         186.6
     Q4       168           173.4         183.9
2003 Q1       173           173.8         178.3
     Q2       175           178.0         177.0
     Q3       186           181.0         176.1
     Q4       188           180.4         186.3
2004 Q1       177           185.9         180.4
     Q2       166           185.5
     Q3       239
     Q4       132

(iv) There is some remaining fluctuation in both moving averages, although both
suggest that the quarterly number of new members is reducing slowly from the
early two quarters. However, the peak in 2004 Q3 causes a slight rise at the
end. But the average of this and the next quarter is similar to that for the
corresponding quarters in 2003. Since moving averages should not be
extrapolated, neither plot for these data is helpful in prediction.
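The two moving averages in Question 8 can be recomputed as follows. The sketch uses the weighted form of the centred four-quarterly average shown in the worked example (weights 1, 2, 2, 2, 1 over 8); the function names are illustrative.

```python
# New members per quarter, 2000 Q1 to 2004 Q4.
members = [242, 357, 168, 186, 278, 214, 210, 205, 227, 186,
           172, 168, 173, 175, 186, 188, 177, 166, 239, 132]


def centred_ma4(series):
    """Centred 4-quarter moving average: weights 1, 2, 2, 2, 1 divided by 8."""
    return [(series[i - 2] + 2 * (series[i - 1] + series[i] + series[i + 1])
             + series[i + 2]) / 8
            for i in range(2, len(series) - 2)]


def ma7(series):
    """Simple 7-quarter moving average; odd length, so no centring is needed."""
    return [sum(series[i - 3:i + 4]) / 7 for i in range(3, len(series) - 3)]


centred = centred_ma4(members)  # first value belongs to 2000 Q3
seven = ma7(members)            # first value belongs to 2000 Q4

print([round(v, 1) for v in centred])
print([round(v, 1) for v in seven])
```

An even window falls between two quarters, so two adjacent averages are themselves averaged to line the result up with a quarter; an odd window needs no such step.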


THE ROYAL STATISTICAL SOCIETY

2007 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

© RSS 2007

Ordinary Certificate, Paper II, 2007. Question 1

(a) Discrete data are measurements that can only take a set of specified values,
such as the integers: the number of people in a household must be a whole
number, an integer. The values need not have an upper limit: if we count
insects or caterpillars in an agricultural plantation there must be a whole
number but we cannot say there is a highest possible number. We are not
limited to integers; for example, if we consider a currency that includes say a
half-penny or half-cent coin, the total count of money in our pocket may be ½,
1, 1½, 2, … pence or cents; this is a discrete set of data points.

Continuous data can take any value whatever within a specified range. For
example, the height of a plant may be any value whatever: 25.39827… cm is
a possible height even if it might be recorded as 25.4. This is of course subject
to the accuracy of available measuring equipment, but this does not detract
from the principle that any height is possible. Obviously there is a physical
upper limit (though we may be unsure what it is) to the possible height, and
likewise a physical lower limit (perhaps taken as 0 for a plant that has died).
But between these limits, any value is possible.

(b) As an illustration, if the same person does the same operation (e.g. a
calculation) several times, the variation in the times he or she takes is "within
subject" or "intra-subject" variation. If several different people each do the
same operation once, the variation between these times is "between subject" or
"inter-subject" variation. Now suppose that several different people do the
same task twice each; the variation between the mean times for these people
is "inter-subject" and the variations between the two times taken by the same
subject are "intra-subject".

Ordinary Certificate, Paper II, 2007. Question 2

(i) Unordered data, as recorded:

Hundreds | Tens
1 | 7 5 8
2 | 1 2 5 0 5
3 | 5 9 8 6 8 8 9
4 | 3 6 2 1 3 9 7 6 7
5 | 1 2 1 5 0 2 0

With the "leaves" ordered:

Hundreds | Tens                   Cumulative Frequency (for use in part (ii))
1 | 5 7 8                          3
2 | 0 1 2 5 5                      8
3 | 5 6 8 8 8 9 9                 15
4 | 1 2 3 3 6 6 7 7 9             24
5 | 0 0 1 1 2 2 5                 31

(ii) The median is the 16th from the beginning, which is 41. Referring to the
original data, it is 419.

The lower quartile is at the 8th position and the upper quartile at the 24th
position. Using the cumulative frequencies, these are found as 25 (the second
of those) and 49, which give the exact results 253 and 494. [Other
conventions for the detailed locations of the lower and upper quartiles also
exist.]

(iii) Taking the days in order, the last two observations in each group of seven (i.e.
6, 7, 13, 14, 20, 21, 27, 28) are much lower than the other five. It seems very
likely that these would be the weekend days. A more informative analysis
would take these 8 days as one group and the remaining 23 days as another
group, and examine the groups separately.

Ordinary Certificate, Paper II, 2007. Question 3

(i)

Length (mm)   Upper end-point (mm)   Frequency   Cumulative frequency
20–29              29.5                  2              2
30–39              39.5                  7              9
40–49              49.5                 11             20
50–59              59.5                 15             35
60–69              69.5                 18             53
70–79              79.5                 17             70
80–89              89.5                 12             82
90–99              99.5                  9             91
100–109           109.5                  5             96
110–119           119.5                  4            100

(ii) Plot cumulative frequencies against upper-end points of intervals, beginning
with 0 at 19.5.

[Cumulative Frequency Curve of Mussel Data: cumulative frequency (0 to 120) against length (mm) (0 to 140).]

(iii) The median length corresponds to a cumulative frequency of 50, and is
approximately 68 mm. The quartiles correspond to frequencies 25 and 75, and
are approximately 53 and 83 mm.

Hence the inter-quartile range is, approximately, 83 – 53 = 30 mm.


Ordinary Certificate, Paper II, 2007. Question 4

(i) For ammonia, the mean is (0.533 + ... + 0.278)/6 = 1.9827/6 = 0.330 (or this
can be found directly from a calculator).

For the standard deviation, it is as well to check from another measure that
divisor n – 1 is intended (it is), and then the standard deviation for ammonia
can be found directly from the appropriate key on a calculator (or worked out
using the usual formula) as 0.296.

Similarly, for dissolved oxygen the mean is 88.0% and the standard deviation
is 8.34%.

The coefficient of variation (%) = (100 × standard deviation)/mean. This can now be
calculated for each measure.

Thus the completed table is as follows.

Quality measure     Mean          Standard deviation   Coefficient of variation (%)
Suspended solids    15.0 mg/l     19.7 mg/l            131.0
Ammonia             0.330 mg/l    0.296 mg/l            89.7
Nitrate             2.39 mg/l     0.411 mg/l            17.2
Orthophosphate      0.176 mg/l    0.0858 mg/l           48.7
Dissolved oxygen    88.0%         8.34%                  9.5

(ii) One value (April) for suspended solids is very high, while two values
(December and February) for ammonia are very low. These lead to very large
coefficients of variation as the sets of data for these measures are very
variable. Orthophosphate is also quite variable. It is very likely that on some
occasions there are pollutants in the stream. The value 104% for dissolved
oxygen in March needs checking for possible errors in measurement or
technical errors in the analysis or calculation.

Ordinary Certificate, Paper II, 2007. Question 5

(i) (a) From the tree diagram below, P(HM in week 2) = 0.1.

(b) In week 3 there are three ways of finishing up at the hypermarket:
(HM, HM), (SM, HM) and (VS, HM). From the tree diagram, these
have probabilities 0.1 × 0.1 = 0.01, 0.3 × 0.3 = 0.09 and 0.6 × 0.4 = 0.24
respectively. So the total probability is 0.34.

Tree diagram (week 1 is HM; each row gives the probability of the week 2
outlet and then the probabilities of the week 3 outlet that follow it):

Week 2 (probability)    Week 3:  HM     SM     VS
HM (0.1)                         0.1    0.3    0.6
SM (0.3)                         0.3    0.3    0.4
VS (0.6)                         0.4    0.4    0.2

In an obvious notation, we want

P(SM2 | HM3) = P(SM2 ∩ HM3) / P(HM3).

SM2 ∩ HM3 means "SM in week 2 followed by HM in week 3" and the
tree diagram shows that this has probability 0.3 × 0.3 = 0.09.

The required probability is therefore 0.09/0.34 = 0.265.
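The conditional probability in part (i) of Question 5 can be checked with a short script. This is only a sketch: the dictionaries hold the branch probabilities as read from the tree diagram, and the names `week2`, `week3_given_week2`, `p_week3` and `p_week2_given_week3` are illustrative, not part of the original solution.

```python
# Week-2 outlet probabilities, given that week 1 is HM (from the tree diagram).
week2 = {"HM": 0.1, "SM": 0.3, "VS": 0.6}

# Week-3 outlet probabilities, conditional on the week-2 outlet.
week3_given_week2 = {
    "HM": {"HM": 0.1, "SM": 0.3, "VS": 0.6},
    "SM": {"HM": 0.3, "SM": 0.3, "VS": 0.4},
    "VS": {"HM": 0.4, "SM": 0.4, "VS": 0.2},
}


def p_week3(outlet):
    """Total probability of a given outlet in week 3 (sum over week-2 routes)."""
    return sum(week2[w2] * week3_given_week2[w2][outlet] for w2 in week2)


def p_week2_given_week3(w2, w3):
    """P(week-2 outlet is w2 | week-3 outlet is w3), by the definition of conditional probability."""
    return week2[w2] * week3_given_week2[w2][w3] / p_week3(w3)


print(round(p_week3("HM"), 2))                     # total probability of HM in week 3
print(round(p_week2_given_week3("SM", "HM"), 3))   # P(SM in week 2 | HM in week 3)
```

The first value reproduces the total 0.34 and the second the required probability 0.265.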

(ii) Tree diagram (each row gives the probability of the week 1 outlet and then
the probabilities of the week 2 outlet that follow it):

Week 1 (probability)    Week 2:  HM     SM     VS
HM (20/71)                       0.1    0.3    0.6
SM (24/71)                       0.3    0.3    0.4
VS (27/71)                       0.4    0.4    0.2

P(HM in week 2) = P(HM, HM) + P(SM, HM) + P(VS, HM)

= (0.1 × 20/71) + (0.3 × 24/71) + (0.4 × 27/71) = (2.0 + 7.2 + 10.8)/71 = 20/71.

Ordinary Certificate, Paper II, 2007. Question 6

(i) ["Series 1" is the given set of data points. For "series 2", see part (v).]

[Figure: Scatterplot of Skiing Data. Horizontal axis: Final checkpoint (secs);
both axes run from 0 to 3; legend: Series1, Series2.]

(ii) x̄ = 15.4/10 = 1.54, ȳ = 12.2/10 = 1.22.

Σ(x – x̄)² = 26.34 – (15.4)²/10 = 26.34 – 23.716 = 2.624.

Σ(y – ȳ)² = 18.24 – (12.2)²/10 = 18.24 – 14.884 = 3.356.

Σ(x – x̄)(y – ȳ) = 21.68 – (15.4 × 12.2)/10 = 21.68 – 18.788 = 2.892.

r = 2.892/√(2.624 × 3.356) = 0.975.

(iii) r is very close to +1, so there is a very close (linear) relation between x and y,
both increasing together.
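The arithmetic in part (ii) of Question 6 follows the usual "corrected sums" route to the product-moment correlation coefficient. A minimal sketch, using only the totals quoted in the solution (the function name `corrected_sums` is illustrative):

```python
from math import sqrt


def corrected_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (Sxx, Syy, Sxy) from the raw totals of a paired sample of size n."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return sxx, syy, sxy


# Totals quoted in the solution: n = 10, sum of x = 15.4, sum of y = 12.2,
# sum of x squared = 26.34, sum of y squared = 18.24, sum of xy = 21.68.
sxx, syy, sxy = corrected_sums(10, 15.4, 12.2, 26.34, 18.24, 21.68)
r = sxy / sqrt(sxx * syy)

print(round(sxx, 3), round(syy, 3), round(sxy, 3), round(r, 3))
```

The printed values agree with 2.624, 3.356, 2.892 and r = 0.975 above.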


(iv) The regression line is given by y – ȳ = b(x – x̄), where

b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² = 2.892/2.624 = 1.102,

so the line is

y – 1.22 = 1.102(x – 1.54)

or y = 1.102x – 0.477.

The checkpoint at 1 min 30 seconds has an x-value of x = 2. So the estimate
of y is 2.204 – 0.477 = 1.727, giving that the estimated finishing time is 1 min
54.7 seconds.

(v) The mean (x = 1.54, y = 1.22) lies on the line. In part (iv) we have found that
(x = 2, y = 1.73) lies on the line. So these can be used. See graph above.
These points have been indicated as "series 2"; the line has not been drawn, to
avoid confusing the display.

Ordinary Certificate, Paper II, 2007. Question 7

(i) (a) Trend is the basic long-term underlying movement of the series.

(b) Seasonal variation is short-term, usually regular (and in some sense
seasonal), variation about the trend.

(c) An additive model assumes that the components Trend, Seasonal and
Irregular are added together (rather than multiplied together) to give
the time series value, so that the model to explain the time series data
actually observed is of the form

Time series value = Trend + Seasonal + Irregular.

(ii) Differences between Sales and Trend are

Year     Qtr 1      Qtr 2      Qtr 3       Qtr 4
2004                92.000     –108.000    –75.750
2005     87.625     81.125     –86.125     –78.250
2006     84.000     88.125     –89.875
Mean     85.8125    87.083     –94.667     –77.000     Total 1.229

Since these means do not add to 0, an adjustment must be made by subtracting
1.229/4 = 0.307 from them to give

                 Qtr 1     Qtr 2     Qtr 3      Qtr 4
Adjusted mean    85.506    86.776    –94.974    –77.307

(the total of 0.001 is still non-zero, but this is a minor rounding error).

(iii) Sales are considerably higher in quarters 1 and 2 than they are in quarters 3
and 4.

(iv) When the trend changes rapidly, a multiplicative model may be more
appropriate because seasonal variation is then a constant percentage of trend
(rather than just an absolute value).
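The seasonal adjustment in part (ii) of Question 7 (average the Sales minus Trend differences for each quarter, then shift the averages so that they sum to zero) can be sketched as follows. The variable names are illustrative. Because the script subtracts the unrounded correction rather than 0.307, its results can differ from the tabulated adjusted means in the third decimal place.

```python
# Sales minus Trend differences, grouped by quarter (values from the table).
differences = {
    1: [87.625, 84.000],
    2: [92.000, 81.125, 88.125],
    3: [-108.000, -86.125, -89.875],
    4: [-75.750, -78.250],
}

# Raw seasonal means for each quarter.
means = {q: sum(v) / len(v) for q, v in differences.items()}

# The four means should sum to zero under an additive model; spread the
# discrepancy equally over the quarters.
correction = sum(means.values()) / len(means)
adjusted = {q: m - correction for q, m in means.items()}

for q in sorted(adjusted):
    print(q, round(means[q], 3), round(adjusted[q], 3))
print("correction:", round(correction, 3))
```

After the shift, the adjusted seasonal effects sum to zero apart from floating-point error.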

Ordinary Certificate, Paper II, 2007. Question 8

(i) Price relatives, using 1975 = 100, are (2005 price / 1975 price) × 100. These are
as follows.

Sugar             450.00
Eggs              380.95
Raisins           256.14
Ground Almonds    172.76
Brandy            431.82
Total            1691.67     Mean = 1691.67 / 5 = 338.3

(ii) Although the price of these ingredients is higher in 2005 by 238.3% compared
with 1975, this index tells us nothing about other items in the cost of living,
such as other food and drink, housing, heating, clothing, travel and transport.

(iii) Expenditure weights are appropriate. Base weighting requires the 1975
expenditure weights. Expenditure = price × quantity. We work in units of
pence.

                     1975 price                1975 Expenditure      Price relative    Price relative
Ingredient           (pence)      Quantity     = Price × Quantity    (1975 = 100)      × Expenditure
Sugar (1 kg)          16          0.2 kg         3.20                450.00             1440
Eggs (12)             42          4 eggs        14.00                380.95             5333
Raisins (1 kg)        57          0.45 kg       25.65                256.14             6570
Gr almonds (1 kg)    246          0.1 kg        24.60                172.76             4250
Brandy (70 cl)       220          0.1 bottle    22.00                431.82             9500
Total                                           89.45                                  27093

The base-weighted price relative index number for 2005 using 1975 as base is
27093/89.45 = 302.9.

Overall the prices of the required ingredients have slightly more than trebled
during the period. The weighted index is less than the unweighted index
because relatively small quantities of the ingredients with the greatest price
increases are used.

THE ROYAL STATISTICAL SOCIETY

2008 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

© RSS 2008
Ordinary Certificate, Paper II, 2008. Question 1

(i) (45/60) × 15 = 11.25 km.

(ii) (80/60) × 15 = 20 km.

(iii) (20/30) × 60 = 40 minutes.

(iv) (50/30) × 60 = 100 minutes = 1 hour 40 minutes.

Distance travelled = (15 × 0.5) + (30 × 2) = 67.5 km.

Time taken = 2.5 hrs.

So average speed = 67.5/2.5 = 27 km per hour.

Time taken = (10/15) + (40/30) = 2 hours.

Distance travelled = 50 km.

So average speed = 50/2 = 25 km per hour.

Ordinary Certificate, Paper II, 2008. Question 2

Table of tally counts

                          English
                   A         B         C       Total
              A    II        III       III       8
Mathematics   B    IIII      II        I         7
              C    IIII I    IIII I    III      15
Total              12        11        7        30

Contingency table for grades of 30 students in Mathematics and English

                          English
                   A     B     C     Total
              A    2     3     3       8
Mathematics   B    4     2     1       7
              C    6     6     3      15
Total              12    11    7      30

The modal grade in Mathematics is C.

The modal grade in English is A.

Probability of a randomly selected student having As in both subjects is 2/30 or 1/15.

If Alice has a grade A in Mathematics, the probability that she has a grade A in
English is 2/8 = ¼.

If David has a grade A in English, the probability that he has a grade A in
Mathematics is 2/12 = 1/6.

Ordinary Certificate, Paper II, 2008. Question 3

[Figure: bar chart "Greenhouse Gas Emissions by Activity" (Source: Stern Report
2006). Vertical axis: Percent; horizontal axis: Activity. Key: I Industry,
C Commercial buildings, R Residential buildings, T Transport, N Non-energy,
M Miscellaneous.]

(i) Angle for non-energy: (35/100) × 360 = 126°.

(ii) Angle for commercial buildings: (5/100) × 360 = 18°.

The overall percentage reduction would be 10% and the angles would be unchanged.

The overall percentage reduction in the second situation would be

{(35+30) × (20/100)} + {(5+10+14+6) × (10/100)} = 13 + 3.5 = 16.5%.

New percentage for non-energy is 35 × (80/100) = 28%.
Therefore new angle for non-energy is (28/83.5) × 360 = 120.7°.

New percentage for commercial buildings is 5 × (90/100) = 4.5%.
Therefore new angle for commercial buildings is (4.5/83.5) × 360 = 19.4°.

Ordinary Certificate, Paper II, 2008. Question 4

(i) For easy puzzles, Σx = 36, n = 4.

So mean = 36/4 = 9 minutes.

Standard deviation = √(Σ(x – 9)²/4)

= √{(2² + 0² + 1² + 1²)/4} = √(6/4) = √1.5 = 1.2 minutes to 1 decimal place.

(ii) For mild puzzles we have:

x        f     fx     fx²
10       3     30     300
11       4     44     484
12       5     60     720
13       2     26     338
14       2     28     392
Total    16    188    2234

So mean = 188/16 = 11.75 minutes.

Standard deviation = √{(Σfx²/Σf) – (Σfx/Σf)²} [or equivalent formula]

= √{(2234/16) – (188/16)²} = √(139.625 – 138.0625) = √1.5625
= 1.25 minutes.

[Note. n – 1 instead of n in the denominator is acceptable, regarding this as a sample
rather than a population. This gives values for the standard deviations of 1.4 and 1.3
respectively.]

Coefficient of variation = (standard deviation/mean) × 100%.

Easy: CV = 13% [could be given as 14% if 1.5 used for st dev]
Mild: CV = 11%
Difficult: CV = 12.5%
Fiendish: CV = 13%

Times taken to solve Sudoku puzzles

Level                  Easy    Mild     Difficult    Fiendish
Mean (min)             9       11.75    18.4         25.3
Std deviation (min)    1.2     1.25     2.3          3.4
CV %                   13      11       12.5         13

The mean time to solve the puzzles increases with the level of difficulty.
The variability in time also increases, as measured by the standard deviation, but the
relative variability is greatest in the easy and fiendish levels.
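For the mild puzzles in Question 4, the mean and standard deviation come from a frequency table. The sketch below repeats that calculation with both divisors (n, as in the main solution, and n – 1, as allowed in the note). The dictionary `freq` and the other names are illustrative.

```python
from math import sqrt

# Solving time in minutes -> frequency, for the mild puzzles.
freq = {10: 3, 11: 4, 12: 5, 13: 2, 14: 2}

n = sum(freq.values())
sum_fx = sum(x * f for x, f in freq.items())
sum_fx2 = sum(x * x * f for x, f in freq.items())

mean = sum_fx / n
sd_divisor_n = sqrt(sum_fx2 / n - mean ** 2)                   # population form
sd_divisor_n_minus_1 = sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))  # sample form
cv_percent = 100 * sd_divisor_n / mean

print(n, sum_fx, sum_fx2)
print(mean, sd_divisor_n, round(sd_divisor_n_minus_1, 1), round(cv_percent))
```

With divisor n the standard deviation is 1.25 minutes, and with divisor n – 1 it rounds to 1.3, matching the note in the solution.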
Ordinary Certificate, Paper II, 2008. Question 5

(i) Tree diagram (each row gives the probability of the first colour and then
the probabilities of the second colour that follow it):

First (probability)    Second:  R      B      G
R (5/10)                        4/9    3/9    2/9
B (3/10)                        5/9    2/9    2/9
G (2/10)                        5/9    3/9    1/9

Probability = (5/10 × 4/9) + (3/10 × 2/9) + (2/10 × 1/9) = 28/90 = 14/45.

(ii) Tree diagram (the first colour is R; each row gives the probability of the
second colour and then the probabilities of the third colour that follow it):

Second (probability)    Third:  R      B      G
R (4/9)                         3/8    3/8    2/8
B (3/9)                         4/8    2/8    2/8
G (2/9)                         4/8    3/8    1/8

Probability = (4/9 × 8/8) + (3/9 × 6/8) + (2/9 × 5/8) = 60/72 = 5/6.

(iii) Tree diagram (branch probabilities): R 4/8 followed by B 2/7; B 2/8 followed
by R 4/7; remaining branch 2/8.

Probability = (4/8 × 2/7) + (2/8 × 4/7) = 16/56 = 2/7.

Ordinary Certificate, Paper II, 2008. Question 6

Σx = 208 so x̄ = 17.3333.

Σy = 101 so ȳ = 8.4167.

Σ(x – x̄)² = Σx² – (Σx)²/n = 4772 – (208 × 208)/12 = 1166.6667.

Σ(y – ȳ)² = Σy² – (Σy)²/n = 1605 – (101 × 101)/12 = 754.9167.

Σ(x – x̄)(y – ȳ) = Σxy – (Σx)(Σy)/n = 2624 – (208 × 101)/12 = 873.3333.

r = 873.3333/√(1166.6667 × 754.9167) = 0.9306, i.e. r = 0.93 to 2 decimal places.

r is positive as higher maximum temperature is associated with higher minimum
temperature; it is close to +1 indicating a high correlation.

b̂ = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² = 873.3333/1166.6667 = 0.7486, i.e. 0.75 to 2 decimal places.

â = ȳ – b̂x̄ = 8.4167 – (0.7486 × 17.3333) = –4.559, i.e. –4.56 to 2 decimal places.

(i) New x̄ = {1.8 × (Old x̄)} + 32 = 63.2 (degrees Fahrenheit).

New ȳ = {1.8 × (Old ȳ)} + 32 = 47.2 [47.15] (degrees Fahrenheit).

New Σ(x – x̄)² = (1.8)² × Old Σ(x – x̄)² = 3780.00.

New Σ(y – ȳ)² = (1.8)² × Old Σ(y – ȳ)² = 2445.93.

(ii) r is unchanged. b̂ is unchanged.

New â = New ȳ – (b̂ × New x̄) = –0.16.

Ordinary Certificate, Paper II, 2008. Question 7

(i) (a) Trend is the basic long-term underlying movement of the series.

(b) Seasonal variation is short-term, usually regular (and in some sense
seasonal), variation about the trend.

(c) A multiplicative model assumes that the components Trend, Seasonal
and Irregular are multiplied together (rather than added together) to
give the time series value, so that the model to explain the time series
data actually observed is of the form

Time series value = Trend × Seasonal × Irregular.

[Cyclical variation could be included in this too.]

The chart shows a marked seasonal pattern with the highest rainfall every year in Q1
and the lowest in Q3. There appears to be a tendency for the rainfall in Q1 to be
increasing with time and for that in Q3 to be decreasing with time.

Time Series Analysis of Rainfall Data

                Rainfall    4-Qtr         Add in        Centred 4-Qtr MA    Detrended
Year/Quarter    (mm)        Total (mm)    pairs (mm)    (Trend) (mm)        data (to 3 dp)
2004 Q1         650
     Q2         525
                            1725
     Q3         125                       3550          443.750             0.282
                            1825
     Q4         425                       3600          450.000             0.944
                            1775
2005 Q1         750                       3525          440.625             1.702
                            1750
     Q2         475                       3525          440.625             1.078
                            1775
     Q3         100                       3575          446.875             0.224
                            1800
     Q4         450                       3575          446.875             1.007
                            1775
2006 Q1         775                       3525          440.625             1.759
                            1750
     Q2         450                       3525          440.625             1.021
                            1775
     Q3         75                        3575          446.875             0.168
                            1800
     Q4         475                       3575          446.875             1.063
                            1775
2007 Q1         800                       3550          443.750             1.803
                            1775
     Q2         425                       3550          443.750             0.958
                            1775
     Q3         75
     Q4         475

Note: 4-Qtr totals: First total T1 = t1 + t2 + t3 + t4
Second total T2 = T1 + (t5 – t1)
Third total T3 = T2 + (t6 – t2), etc
Last total, check sum of 2007 quarterly values = total obtained by difference
method above.
Centred 4-Qtr Moving Average values are obtained by dividing previous
column by 8.
The detrended data column is Rainfall / Trend.

The trend appears to be a fairly constant value between 440 and 450 mm per quarter.

The detrended data column shows that the Q3 rainfall is markedly below the trend
(28.2%, 22.4% and 16.8% of the trend in successive years), indicating that Q3 is
becoming even drier than previously. By contrast the Q1 rainfall is markedly above
the trend (170.2%, 175.9% and 180.3% of the trend in successive years), indicating
that Q1 is becoming even wetter than previously. The rainfall in Q2 and Q4 remains
much closer to the trend value throughout.
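The columns of the rainfall table in Question 7 can be rebuilt step by step: 4-quarter totals, sums of adjacent totals, division by 8 to centre the moving average, and finally Rainfall / Trend. A minimal sketch, with illustrative variable names:

```python
# Quarterly rainfall (mm), 2004 Q1 to 2007 Q4, from the table.
rainfall = [650, 525, 125, 425, 750, 475, 100, 450,
            775, 450, 75, 475, 800, 425, 75, 475]

# 4-quarter totals: one for each run of four consecutive quarters.
totals = [sum(rainfall[i:i + 4]) for i in range(len(rainfall) - 3)]

# Add adjacent totals in pairs, then divide by 8 to get the centred trend.
pairs = [a + b for a, b in zip(totals, totals[1:])]
trend = [p / 8 for p in pairs]

# The first trend value lines up with the third quarter (index 2) of the series.
detrended = [round(rainfall[i + 2] / t, 3) for i, t in enumerate(trend)]

print(totals)
print(trend)
print(detrended)
```

Twelve trend values are produced, one for each quarter from 2004 Q3 to 2007 Q2, as in the table.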
Ordinary Certificate, Paper II, 2008. Question 8

[Figure: chart "Energy costs of company". Horizontal axis: Month (2006), Feb to
Jun; vertical axis runs from –1 to 2.]

Chain-based index numbers of costs 2006

Month    Jan     Feb      Mar      Apr      May      Jun
Index    ----    101.5    101.3    100.9    100.4    99.5

Fixed based index numbers of costs 2006 (January 2006 = 100)

Month    Jan    Feb      Mar      Apr      May      Jun
Index    100    101.5    102.8    103.7    104.1    103.6

Calculations
Mar 101.5 × 101.3% = 102.8195 = 102.8 to 1 decimal place
Apr 102.8 × 100.9% = 103.7252 = 103.7 to 1 d. p.
May 103.7 × 100.4% = 104.1148 = 104.1 to 1 d. p.
Jun 104.1 × 99.5% = 103.5795 = 103.6 to 1 d. p.

Comments
Costs rose every month from February to May and then dropped in June.
The rate of increase of costs was highest in February. The rate of increase of costs
gradually decreased from February to May.
June was the only month showing a decrease in costs over the previous month.
The costs in May were the highest over this six-month period.
The costs in January were the lowest over this six-month period.
Overall, the costs in June were 3.6% higher than in January.

THE ROYAL STATISTICAL SOCIETY

2009 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

© RSS 2009

Ordinary Certificate, Paper II, 2009. Question 1

(i) Since all the data values are non-negative, the lowest possible value of each is
0 and therefore the lowest possible value of the mean is 0.

This occurs when all the data values are 0.

(ii) The lowest possible value of the variance is 0.

This occurs when all the data values are equal.

(iii) The lowest possible value of Spearman's rank correlation coefficient is –1.

This occurs when the two sets of rankings are completely reversed, i.e. {1, 2,
…, n – 1, n} and {n, n – 1, …, 2, 1}.

(iv) The lowest possible value of the product-moment correlation coefficient is –1.

This occurs when the values of the two variables lie on a straight line of
negative slope.

Ordinary Certificate, Paper II, 2009. Question 2

The percentages in each category are rounded to the nearest whole number, so
rounding errors have occurred.

Advantages of bar charts:
They are easy to draw without calculation and with ruler only
It is easy to compare directly on a scale the value of each category

Advantages of pie charts:
They show the size of each category in relation to the whole
They are visually appealing

The angle for "daytime cost" is (61/99) × 360 = 222 to the nearest whole number.

Thus, arguing similarly for the others, the angles are as follows.

Type of call           Cost as % of total cost    Angle for pie chart
Daytime                61                         222
Evening and weekend    13                          47
Mobile                 17                          62
0845 numbers            6                          22
All others              2                           7
Total                  99                         360

                       Number as % of total
Type of call           number of calls            Angle for pie chart
Daytime                50                         182
Evening and weekend    33                         120
Mobile                  7                          25
0845 numbers            8                          29
All others              1                           4
Total                  99                         360

The pie charts are shown below.

[Figure: pie chart "Cost as % of total cost". Sectors: Daytime, Evening and
weekend, Mobile, 0845 numbers, All other.]

[Figure: pie chart "Number as % of total number of calls". Sectors: Daytime,
Evening and weekend, Mobile, 0845 numbers, All other.]

[Note. The use of colour is not necessary provided the sectors are clearly identifiable
and labelled.]

Comments might include the following.

The most common calls are daytime ones.

Daytime calls account for half of the calls but approaching two-thirds of the cost.

Evening and weekend calls account for one-third of the calls but a much smaller
proportion of the cost.

The cost of mobile calls as a proportion of the total cost is more than double the
proportion of mobile calls as a proportion of all calls.

Daytime and mobile calls are relatively more expensive than evening and weekend
calls.

Ordinary Certificate, Paper II, 2009. Question 3

The data arranged in order of magnitude are shown in the table.

Numbers of calls in ascending order

Centre A    Centre B
503         418
508         436
509         455
518         518
521         519
529         523
546         527
546         546
554         558
564         571
574         572
582         601
583         615
591         623
592         667

Range for centre A is 592 – 503 = 89 calls.

Range for centre B is 667 – 418 = 249 calls.

There are 15 days, so the median is the number of calls on the (15+1)/2 = 8th day
when the daily calls are arranged in order.

Thus the median for centre A is 546 calls and the median for centre B is 546 calls.

There are 15 observations. The lower quartile is the number of calls on the (15+1)/4
= 4th day.

Similarly, the upper quartile is the number of calls on the 3 × (15+1)/4 = 12th day.

[Note. Other conventions exist for defining the lower and upper quartiles. These
were acceptable in the examination.]

Thus
lower quartile for centre A is 518 calls
upper quartile for centre A is 582 calls
lower quartile for centre B is 518 calls
upper quartile for centre B is 601 calls

For both centres, on half the days the number of daily calls is 546 or fewer. The range
of calls per day is higher (249) for centre B than for centre A (89). Although for both
centres on a quarter of the days the number of calls is 518 or fewer, the number of
calls in centre B on such days can be as low as 418 whereas the lowest number of
calls for centre A is 503. For centre A, on a quarter of the days there are at least 582
calls. For centre B there are at least 601 calls on a quarter of the days. Although the
medians are the same for each centre and so are the lower quartiles, the workload is
much more variable in centre B than centre A.
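The (n + 1)/4 positional rule used for the quartiles in Question 3 is easy to express in code. The sketch below assumes, as is the case here with n = 15, that (n + 1) is divisible by 4 so that no interpolation is needed; the function name `summary` is illustrative, and other quartile conventions would need different code.

```python
centre_a = [503, 508, 509, 518, 521, 529, 546, 546, 554, 564, 574, 582, 583, 591, 592]
centre_b = [418, 436, 455, 518, 519, 523, 527, 546, 558, 571, 572, 601, 615, 623, 667]


def summary(values):
    """Range, quartiles and median by the (n + 1)/4 positional rule."""
    ordered = sorted(values)
    n = len(ordered)
    # This sketch only handles the case where the positions are whole numbers.
    assert (n + 1) % 4 == 0, "interpolation would be needed for this n"
    q = (n + 1) // 4                      # 1-based position of the lower quartile
    return {
        "range": ordered[-1] - ordered[0],
        "lower_quartile": ordered[q - 1],
        "median": ordered[2 * q - 1],
        "upper_quartile": ordered[3 * q - 1],
    }


print(summary(centre_a))
print(summary(centre_b))
```

The output reproduces the ranges (89 and 249), the common median of 546 and the quartiles quoted in the solution.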


Ordinary Certificate, Paper II, 2009. Question 4

The class "30.5 but less than 35.5" has class width 5 (hours).

The area of this bar of the histogram is therefore 5 × 22.

If this is to represent a frequency of 22 then the scale factor is 1/5 = 0.2.

The class "20.5 but less than 30.5" has width 10 (hours).

As the frequency here is 22, we must have Height × 10 × 0.2 = 22, so Height = 11 cm.

Similarly for the "70.5 but less than 90.5" class, the width is 20 (hours) and so we
have Height × 20 × 0.2 = 8 so that Height = 2 cm.

We construct a table based on the mid-points of the cells.

Mid-point in hours (x)    Frequency (f)    fx
 8.0                        6                48
15.5                       12               186
25.5                       22               561
33.0                       22               726
38.0                       20               760
45.5                       24              1092
60.5                       16               968
80.5                        8               644
Total                     130              4985

x̄ = 4985/130 = 38.35 hours, to 2 decimal places.

The table of the upper class boundaries and cumulative frequencies for a cumulative
frequency graph is as follows.

Upper class boundary in hours (x)    Cumulative frequency
 5.5                                   0
10.5                                   6
20.5                                  18
30.5                                  40
35.5                                  62
40.5                                  82
50.5                                 106
70.5                                 122
90.5                                 130

Ordinary Certificate, Paper II, 2009. Question 5

The table of rankings is as follows.

Country    100m    Hammer    High Jump
GBR        1       7         8
FRA        2       5         4
GER        3       2         1
UKR        4       6         6
POL        5       1         3
GRE        6       3         5
RUS        7       4         2
BEL        8       8         7

The formula for Spearman's rank correlation coefficient is rs = 1 – 6Σd²/{n(n² – 1)}.
Here we have n = 8 and therefore n(n² – 1) = 504.

For 100m and Hammer:

                                                      Total
100m       1     2     3     4     5     6     7     8
Hammer     7     5     2     6     1     3     4     8
d         –6    –3     1    –2     4     3     3     0
d²        36     9     1     4    16     9     9     0     84

rs = 1 – (6 × 84/504) = 0

For 100m and High Jump ("Hi Jmp"):

                                                      Total
100m       1     2     3     4     5     6     7     8
Hi Jmp     8     4     1     6     3     5     2     7
d         –7    –2     2    –2     2     1     5     1
d²        49     4     4     4     4     1    25     1     92

rs = 1 – (6 × 92/504) = –0.0952

For High Jump ("Hi Jmp") and Hammer:

                                                      Total
Hi Jmp     8     4     1     6     3     5     2     7
Hammer     7     5     2     6     1     3     4     8
d          1    –1    –1     0     2     2    –2    –1
d²         1     1     1     0     4     4     4     1     16

rs = 1 – (6 × 16/504) = 0.8095
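The three rank correlations in Question 5 all use the same formula, so they can be checked together. A minimal sketch with no tied ranks assumed (true for these data); the function name `spearman` is illustrative.

```python
# Rankings of the eight countries, in the order GBR, FRA, GER, UKR, POL, GRE, RUS, BEL.
ranks = {
    "100m": [1, 2, 3, 4, 5, 6, 7, 8],
    "Hammer": [7, 5, 2, 6, 1, 3, 4, 8],
    "High Jump": [8, 4, 1, 6, 3, 5, 2, 7],
}


def spearman(a, b):
    """Spearman's rank correlation via 1 - 6*sum(d^2)/(n(n^2 - 1)); no ties assumed."""
    n = len(a)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


for first, second in [("100m", "Hammer"), ("100m", "High Jump"), ("High Jump", "Hammer")]:
    print(first, "vs", second, round(spearman(ranks[first], ranks[second]), 4))
```

The 100m and High Jump coefficient comes out slightly negative, which is the value interpreted in the comments on this question.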

For 100m and Hammer the value of rs is 0, indicating that there is no correlation
between the rankings in these two events.

For 100m and High Jump the value of rs is –0.0952, a small negative value. This
indicates a slight tendency for a country finishing well in the 100m to do badly in the
High Jump and vice versa.

For High Jump and Hammer the value of rs is 0.8095, a large positive value. This
indicates a strong tendency for a country which performs well in the High Jump to
also do so in the Hammer and for countries which perform badly to do so in both.

Ordinary Certificate, Paper II, 2009. Question 6

(i) Probability that £250,000 is not in central box in any show = 21/22.

Probability that this happens for 99 shows = (21/22)^99 = 0.009997 which is
slightly less than 0.01.

(ii) (a) If the contestant has a "blue" sum in the central box, there are 10
"blue" and 11 "red" to choose from for the first choice, and so on for
the remaining choices.

So the probability of five "blues" is

(10/21) × (9/20) × (8/19) × (7/18) × (6/17) = 4/323 = 0.01238.

Similarly, if there is a "red" sum in the central box, there are initially
11 "blue" and 10 "red" to choose from and the probability is

(11/21) × (10/20) × (9/19) × (8/18) × (7/17) = 22/969 = 0.02270.

The central box is equally likely to contain a "blue" or a "red", i.e. each
has probability ½.

So the overall probability of 5 "blues" is

(1/2 × 4/323) + (1/2 × 22/969) = 1/57 = 0.0175 to 4 decimal places.

(b) By symmetry, P(5B) = P(5R), P(4B and 1R) = P(1B and 4R), and
P(3B and 2R) = P(2B and 3R).

Thus P(5B) + P(4B and 1R) + P(3B and 2R) = 0.5.

We have that P(5B) = 0.0175 [answer to part (ii)(a)] and P(4B and 1R)
= 0.1379 [given in the question].

Thus 0.0175 + 0.1379 + P(3B and 2R) = 0.5, so the probability of 3
"blues" and 2 "reds" is 0.5 – 0.0175 – 0.1379 = 0.3446.
Ordinary Certificate, Paper II, 2009. Question 7

[Figure: chart "Number of properties bought by Britons in Spain". Horizontal axis:
Year, 1998 to 2007; vertical axis runs from 0 to 7000. Series plotted: Number,
Least squares trendline, 3-yr moving average.]

The 3-year moving averages for the number of properties bought by Britons in Spain
are as follows. As an example, 2646 is calculated as (2634 + 2385 + 2920)/3
(rounded to the nearest integer for convenience). These have been plotted on the chart
above.

Year    Number    3yr moving average
1998    2634
1999    2385      2646
2000    2920      3151
2001    4148      3584
2002    3683      3926
2003    3947      4248
2004    5114      4799
2005    5336      5296
2006    5438      5395
2007    5412

The least squares trend line y = 2421 + 373x is plotted on the chart above.

Advantages of moving average:
Its form is not fixed so its gradient changes as the data change
On the chart, the gradient is flatter towards the upper end of the series, which
seems to match the behaviour of the observed values

Disadvantages of moving average:
It has no fixed equation; each trend value had to be worked out individually
Because of the averaging process, there is no value at the start or end of the
time period (i.e. here no trend value for 1998 or 2007)

Advantages of least squares trend line:
It has a fixed equation so its value can be worked out easily at any time point
To plot the line, we needed to calculate only two values, plus a third one as a
check
It uses values from all the data

Disadvantages of least squares trend line:
It is always linear regardless of how the data change
On the chart, the least squares line is still rising at the same rate towards the
upper end of the series, although it looks as though the data values are
levelling off

We would choose the moving average method, as a linear form for the trend does not
appear appropriate.

Ordinary Certificate, Paper II, 2009. Question 8

Company A's share price increased by the greatest amount of money, from 765 to
1566 pence per share, i.e. an increase of 801p.

The price relative for Company A (based on 1 Dec 1998 = 100) is 1566/765 × 100 =
204.7, to 1 decimal place. Similarly for the other companies. Thus we get

Company    Price relative (1 Dec 1998 = 100)
A          205
B          165
C          151
D          456
E          501

Company E's share price has risen by the greatest percentage, 401%.

A weighted index number will allow for the fact that the investor has different values
of shareholdings in each company. The greater the value of the shareholding, the
greater the investor is affected by changes in the share price.

Value weights are appropriate, i.e. the number of shares multiplied by the price on
1 Dec 2008. The calculation is shown below.

           Number       Price in pence    Price in pence    Price relative                      Weight × Price
Company    of shares    on 1 Dec 1998     on 1 Dec 2008     (1 Dec 1998 = 100)    Weight (p)    relative (p)
           held
A          200          765               1566              205                    313200        64206000
B          400          636               1050              165                    420000        69300000
C          350          813               1231              151                    430850        65058350
D          500           62                283              456                    141500        64524000
E          250          153                767              501                    191750        96066750
Total                                                                             1497300       359155100

Thus the share price index for 1 Dec 2008 (1 Dec 1998 = 100) is 359155100/1497300
= 239.868 = 240 to the nearest whole number.

The investor's share prices as a whole have risen by 140% from 1 Dec 1998 to 1 Dec
2008.

THE ROYAL STATISTICAL SOCIETY

2010 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER II

© RSS 2010
Ordinary Certificate, Paper II, 2010. Question 1

The mode is the most frequently occurring value. There may be more than one mode.

The median is the middle value when the observations are arranged in order from
lowest to highest (or vice versa). If the sample size is even, the median is halfway
between the two middle values.

The (arithmetic) mean is the sum of all the observations divided by the sample size.

Hannah: categorical, unordered
Joshua: categorical, ordered
Sarah: counting, discrete

Hannah: mode only
Joshua: mode and median
Sarah: all three

Ordinary Certificate, Paper II, 2010. Question 2

                          Sugar content
                     R             O            G             Total
               R     ||     2      ||    2      |||| |  6     10
Fat content    O     ||||   5      |     1      ||||    5     11
               G     ||||   4      |     1      ||||    4      9
Total                11            4            15            30

(i) 10/30 = 1/3.
(ii) 4/30 = 2/15.
(iii) G (green).
(iv) Fat R (red), sugar G (green).
(v) 5/15 = 1/3.

Ordinary Certificate, Paper II, 2010. Question 3

[Figure: bar chart "UK population 2007: Male/female splits at particular ages".
Horizontal axis: Age (years), 0 to 80; vertical axis: Populations in thousands,
0 to 500; bars for Males and Females.]

[Note. The use of colour is not necessary provided the bars are clearly identifiable
and suitably labelled.]

UK population 2007

Age    Ratio of females to males
0      0.95
10     0.96
20     0.94
30     0.99
40     1.03
50     1.03
60     1.03
70     1.11
80     1.41

There are more males than females at ages 0, 10, 20, and 30 but the reverse is true at
ages 40, 50, 60, 70 and 80. The ratio is the same at ages 40, 50 and 60. The ratio
increases quite sharply with age from 60 to 80, and especially from 70 to 80.

Ordinary Certificate, Paper II, 2010. Question 4

The maximum length is 3.2 cm and the minimum length is 2.3 cm, so the range
[ = max – min] is 0.9 cm.

The lower quartile is the length of the [(15+1)/4]th bean when arranged in order, i.e.
the length of the 4th bean. The upper quartile is the length of the [3(15+1)/4]th bean,
i.e. the 12th bean.

[Note. Other conventions exist for defining the lower and upper
quartiles. These were acceptable in the examination.]

The data arranged in ascending order are as follows.

Bean    Length in cm
1       2.3
2       2.4
3       2.4
4       2.5
5       2.7
6       2.8
7       2.8
8       2.8
9       2.8
10      2.8
11      2.9
12      3.0
13      3.0
14      3.1
15      3.2

So the lower quartile is 2.5 cm and the upper quartile is 3.0 cm, and thus the
inter-quartile range is 3.0 – 2.5 = 0.5 cm.

The mean is x̄ = Σx/n = 41.5/15 = 2.77 (cm) to 2 decimal places.

The sample variance is

{1/(n – 1)} × {Σx² – (Σx)²/n} = (1/14) × {115.81 – (41.5)²/15} = 0.9933/14 = 0.070952

and the sample standard deviation is the square root of this, i.e. 0.26637, or 0.27 (cm)
to 2 decimal places.

Coefficient of variation = standard deviation/mean (usually expressed as a percentage)
= 0.26637/2.77 × 100% = 9.6% to 1 decimal place.

By every measure of variability, the length of the new beans is more variable than the
length of the usual beans, even taking into account, with the coefficient of variation,
the fact that the new beans have a larger mean than the usual beans. There is evidence
to support the merchant's views on the variability in size of the beans.

Ordinary Certificate, Paper II, 2010. Question 5

The base period for the index numbers is January 1987.

The index numbers for July 2007 are RPI = 182.2, PPI = 179.8.

The index numbers for June 2008 are RPI = 193.2, PPI = 193.3.

The percentage rise in the RPI is {(193.2 – 182.2)/182.2} × 100% = 6.0% (to 1 d.p.).

The percentage rise in the PPI is {(193.3 – 179.8)/179.8} × 100% = 7.5% (to 1 d.p.).

[Thus in June 2008, the annual rate of inflation based on the RPI is
6.0% and based on the PPI is 7.5%.]

The RPI has risen by 11.0 percentage points.

The PPI has risen by 13.5 percentage points.

Rebased to July 2007, the series are as follows (to 1 decimal place).

UK Retail Price Index Numbers

Month       RPI (Jul 2007 = 100)    PPI (Jul 2007 = 100)
2007 Jul    100.0                   100.0
2007 Aug    100.4                   100.3
2007 Sep    100.7                   100.9
2007 Oct    101.2                   101.3
2007 Nov    101.6                   102.0
2007 Dec    102.3                   102.6
2008 Jan    101.6                   102.2
2008 Feb    102.6                   103.6
2008 Mar    103.3                   104.2
2008 Apr    104.1                   105.1
2008 May    104.9                   106.2
2008 Jun    106.0                   107.5

The price rise for (two-person) pensioner households (0.3%) was smaller than for
general households (0.4%) between July and August 2007. For all other months, the
pensioner households have faced a larger percentage increase in prices compared with
July 2007 than the general households. Over the whole period from July 2007 to June
2008, the pensioner annual rate of inflation was 7.5% whereas it was 6% for general
households.
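The summary statistics for the bean lengths in Question 4 can be reproduced from the fifteen ordered values. The sketch below uses the same "sum of squares minus correction" formula with divisor n – 1; the variable names are illustrative. The coefficient of variation here uses the unrounded mean, which still gives 9.6% to 1 decimal place.

```python
from math import sqrt

# Bean lengths in cm, in ascending order (from the table in Question 4).
lengths = [2.3, 2.4, 2.4, 2.5, 2.7, 2.8, 2.8, 2.8, 2.8, 2.8,
           2.9, 3.0, 3.0, 3.1, 3.2]

n = len(lengths)
sum_x = sum(lengths)
sum_x2 = sum(x * x for x in lengths)

mean = sum_x / n
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # sample variance, divisor n - 1
std_dev = sqrt(variance)
cv_percent = 100 * std_dev / mean

print(round(sum_x, 1), round(sum_x2, 2))
print(round(mean, 2), round(variance, 6), round(std_dev, 5), round(cv_percent, 1))
```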
Ordinary Certificate, Paper II, 2010. Question 6

(The data source is shown for interest.)

[Figure: chart "Days with air frost in Bradford UK (Source: UK Met Office)".
Horizontal axis: Year; vertical axis runs from 0 to 70. Series plotted: Actual,
7year MA.]

[Note. The 7-year moving average is required later in the question. The use of colour
is not necessary provided the plots are clearly identifiable and suitably labelled.]

The data fluctuate markedly from year to year with no obvious pattern or trend.

A benefit of using a moving average is that it smoothes out the highs and lows of the
data series.

A drawback of using a moving average is that, being an averaging process, there is no
estimate for the trend at each end of the data series.

The 7-year moving average is as follows. For illustration, the first MA figure, 42.7, is
calculated as (15 + 53 + 43 + 46 + 34 + 47 + 61)/7.

Days with air frost in Bradford UK

Year    Actual    7-year MA
1990    15
1991    53
1992    43
1993    46        42.7
1994    34        45.1
1995    47        41.6
1996    61        39.4
1997    32        36.3
1998    28        39.6
1999    28        35.4
2000    24        34.1
2001    57        33.7
2002    18        35.7
2003    52        36.3
2004    29        36.9
2005    42
2006    32
2007    28

A 7-year moving average does not appear to be appropriate as the fluctuations in the
data series have not been completely removed.

Ordinary Certificate, Paper II, 2010. Question 7

The probability of an event A given an event B is P(A and B)/P(B).

The probability tree with probabilities inserted on the branches is as follows. The
probability values for the final outcomes are written at the ends of the branches (eg
0.0095 = 0.01 × 0.95).

First branch          Second branch           Final outcome
Disease (0.01)        Positive test (0.95)    0.0095
Disease (0.01)        Negative test (0.05)    0.0005
No disease (0.99)     Positive test (0.05)    0.0495
No disease (0.99)     Negative test (0.95)    0.9405

(i) The probability that a randomly chosen member of the population has a
positive test is 0.0095 + 0.0495 = 0.059.

(ii) The probability that a person has the disease given that this person's test result
is positive is 0.0095/0.059 = 0.161.

(iii) The probability that a person has the disease given that this person's test result
is negative is 0.0005/(1 – 0.059) = 0.00053.

Ordinary Certificate, Paper II, 2010. Question 8

(i) x̄ = 581/10 = 58.1, ȳ = 607/10 = 60.7.

Σ(x – x̄)² = Σx² – (Σx)²/n = 37193 – (581)²/10 = 3436.9.

Σ(y – ȳ)² = Σy² – (Σy)²/n = 38795 – (607)²/10 = 1950.1.

Σ(x – x̄)(y – ȳ) = Σxy – (Σx)(Σy)/n = 33426 – (581 × 607)/10 = –1840.7.

Therefore the product-moment correlation coefficient is

r = –1840.7/√(3436.9 × 1950.1) = –0.711.

(ii) r is negative, indicating that increasing age is associated with a decrease in
fitness.

r is reasonably close to 1 in absolute value, indicating a relationship that is
reasonably close (but not necessarily very close) to being linear.

(iii) The appropriate straight line is the usual linear regression line y = a + bx.

From the calculations above, we have

b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² = –1840.7/3436.9 = –0.5356

and a = ȳ – bx̄ = 60.7 – (–0.5356 × 58.1) = 91.8166.

Inserting x = 45 gives an estimated average fitness score of 67.71 (to 2 d.p.).

(iv) The value is an estimate because the line parameters (a and b) have been
The result of part (iii) indicates that the test is working well with respect to those who estimated from the sample.
have a negative result, as their probability of disease is reduced from 1% to 0.053%.
However, for those with a positive result, the test is not so satisfactory: only 16.1% of It is the average fitness score that is being estimated because, according to the
those with a positive result actually have the disease and the remainder will be underlying model, any individual fitness score would include an "error" term
subjected to further tests that are in fact unnecessary, thus leading to anxiety and no representing variability about the straight line; this is assumed to have average
doubt having cost implications. value 0 and is being ignored here.
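
The Question 8 calculations can be reproduced from the summary totals alone. The following Python sketch assumes only the sums quoted in the solution (n = 10, Σx = 581, Σy = 607, Σx² = 37193, Σy² = 38795, Σxy = 33426); all variable names and the helper `predict` are illustrative, not taken from the original.

```python
from math import sqrt

# Summary totals quoted in the Question 8 solution.
n = 10
sum_x, sum_y = 581, 607
sum_xx, sum_yy, sum_xy = 37193, 38795, 33426

# Corrected sums of squares and products.
s_xx = sum_xx - sum_x ** 2 / n       # sum of (x - xbar)^2
s_yy = sum_yy - sum_y ** 2 / n       # sum of (y - ybar)^2
s_xy = sum_xy - sum_x * sum_y / n    # sum of (x - xbar)(y - ybar)

# Product-moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Least-squares regression line y = a + b x.
b = s_xy / s_xx
a = sum_y / n - b * (sum_x / n)


def predict(x):
    """Estimated average fitness score at age x (illustrative helper)."""
    return a + b * x


print(round(r, 3), round(b, 4), round(a, 4))
print(predict(45))
```

Because this sketch carries a and b at full precision, the prediction at x = 45 comes out near 67.716, so its final digit may differ from the quoted 67.71, which corresponds to substituting the rounded values of a and b.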

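For Question 7, the conditional probabilities in parts (i) to (iii) follow directly from the branch probabilities of the tree. A minimal Python sketch, using only the three figures given on the tree (prevalence 0.01, positive rate 0.95 with disease, positive rate 0.05 without disease); the variable names are illustrative assumptions.

```python
# Branch probabilities from the Question 7 probability tree.
p_disease = 0.01             # P(disease)
p_pos_given_disease = 0.95   # P(positive test | disease)
p_pos_given_healthy = 0.05   # P(positive test | no disease)

# Joint probabilities at the ends of the four branches.
p_disease_pos = p_disease * p_pos_given_disease               # 0.0095
p_disease_neg = p_disease * (1 - p_pos_given_disease)         # 0.0005
p_healthy_pos = (1 - p_disease) * p_pos_given_healthy         # 0.0495
p_healthy_neg = (1 - p_disease) * (1 - p_pos_given_healthy)   # 0.9405

# (i) Total probability of a positive test.
p_positive = p_disease_pos + p_healthy_pos

# (ii) and (iii): P(A | B) = P(A and B) / P(B).
p_disease_given_pos = p_disease_pos / p_positive
p_disease_given_neg = p_disease_neg / (1 - p_positive)

print(round(p_positive, 3))
print(round(p_disease_given_pos, 3))
print(round(p_disease_given_neg, 5))
```

Changing `p_disease` in this sketch shows how strongly the result in part (ii) depends on the low prevalence: with a rare disease, most positive results come from the much larger disease-free group.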