The importance of the weighting is twofold. First, petroleum products are all
made from crude oil and are to some extent substitutable as far as the producer
is concerned. An index should reflect the average price of every gallon of petro-
leum product purchased. Only the weighted index does this. Secondly, products
such as car petrol, of which large quantities are purchased, will have a bigger
effect on the general public than products, such as kerosene, for which only
small amounts are purchased. Again, the index should reflect this.
(d) A differently constructed index would use different weightings. The price part of
the calculation could be changed by using price relatives but this would have
little effect since the prices are close together.
The different weightings that could be used are:
(i) Most recent year quantity weighting. But this would imply a change in his-
torical index values every year.
(ii) Average quantity for all years’ weighting. This would not necessarily mean
historical changes every year. The average quantities for 2012–14 could be
used in the future. This has the advantage that it guards against the chosen
base year being untypical in any way.
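As an illustration of the weighting calculation, a base-weighted index of the kind described can be sketched as follows (the products, prices and quantities are invented for the example):

```python
# Base-weighted aggregate price index: each price is weighted by the
# quantity purchased in the base year. All figures below are hypothetical.

def weighted_index(base_prices, current_prices, weights):
    """Weighted current cost as a percentage of weighted base cost."""
    current_cost = sum(p * w for p, w in zip(current_prices, weights))
    base_cost = sum(p * w for p, w in zip(base_prices, weights))
    return 100 * current_cost / base_cost

base_prices = [1.20, 0.80, 0.60]     # petrol, diesel, kerosene (per litre)
current_prices = [1.35, 0.95, 0.63]
quantities = [500, 300, 20]          # petrol dominates; kerosene is tiny

print(round(weighted_index(base_prices, current_prices, quantities), 1))
# about 114: the index moves almost entirely with the petrol price
```

Because petrol carries most of the weight, the index reflects mainly the petrol price rise, which is exactly the property argued for above.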
Module 6
Review Questions
6.1 The correct answer is C. Sampling is necessary because it is quicker and easier than
measuring the whole population while little accuracy is lost. Statement A is not true
because it is not always impossible to take population measurements, although it is
usually difficult. Statement B is untrue because sampling is always less accurate since
fewer observations/measurements are made.
6.2 The correct answer is A. Many sampling methods are based on random selection for
two reasons. First, it helps to make the sample more representative (although it is
unlikely to make it totally representative). Second, it enables the use of statistical
procedures to calculate the range of accuracy of any estimates made from the
sample. A is therefore a correct reason, while B, C and D are incorrect.
6.3 The correct answer is B. Starting top left, the first number is 5; therefore, the first
region chosen is SE England. Moving across the row, the second number is 8 and
the corresponding region is Scotland. The third number is 5, which is ignored, since
SE England is already included. In this situation, having the same region repeated in
the sample would not make sense. Consequently, we sample without replacement.
The fourth number is 0 and is also ignored since it does not correspond to any
region. The fifth number is 4 and so London completes the sample.
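The selection procedure in 6.3 can be sketched in code. The region numbering other than 4 (London), 5 (SE England) and 8 (Scotland) is invented for the example:

```python
# Sampling regions without replacement from a stream of random digits:
# repeats and digits with no corresponding region (e.g. 0) are skipped.
regions = {1: "N England", 2: "NW England", 3: "Midlands", 4: "London",
           5: "SE England", 6: "SW England", 7: "Wales", 8: "Scotland",
           9: "N Ireland"}

def sample_regions(digits, n):
    chosen = []
    for d in digits:
        if d in regions and regions[d] not in chosen:
            chosen.append(regions[d])
        if len(chosen) == n:
            break
    return chosen

print(sample_regions([5, 8, 5, 0, 4], 3))
# ['SE England', 'Scotland', 'London'] - as in the worked answer
```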
6.4 The correct answer is B, C. Multi-stage sampling has two advantages over simple
random sampling. The population is divided into groups, then each group into
subgroups, then each subgroup into subsubgroups, etc. A random sample of groups
is taken, then for each group selected a random sample of its subgroups is selected
and so on. Therefore, it is not necessary to list the whole population and advantage
B is valid. Since the observations/measurements/interviews of sample elements are
restricted to a few sectors (often geographical) of the population, time and effort
can be saved, as, for example, in opinion polls. Advantage C is therefore also valid.
Multi-stage sampling is solely a way of collecting the sample. Once collected, the
sample is treated as if it were a simple random sample. Its accuracy and the observa-
tions required are therefore just the same as for simple random sampling. Reasons A
and D are false.
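The two-stage selection described can be sketched as follows (the region/district structure is invented for the example):

```python
# Multi-stage sampling sketch: a random sample of groups is taken first,
# then a random sample of subgroups within each chosen group. Only the
# chosen groups ever need their subgroups listed, so the whole population
# is never enumerated at the element level.
import random

population = {f"region{i}": [f"district{i}.{j}" for j in range(10)]
              for i in range(8)}

def multi_stage_sample(pop, n_groups, n_subgroups):
    groups = random.sample(list(pop), n_groups)                      # first stage
    return {g: random.sample(pop[g], n_subgroups) for g in groups}   # second stage

sample = multi_stage_sample(population, 3, 2)
print(sample)
```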
6.5 The correct answer is A, C. In stratified sampling the population is split into
sections on the basis of some characteristic (e.g. management status in the absentee-
ism survey). The sample has to have the same sections in the same proportions as
the population (e.g. if there are 23 per cent skilled workers in the population, the
sample has to have 23 per cent skilled workers). In respect of management status,
therefore, the sample is as representative as it can be. In respect of other characteris-
tics (e.g. length of service with the company) the stratified sample is in the same
position as the simple random sample (i.e. its representativeness is left to chance).
Thus a stratified sample is usually more representative than a simple random one
but not necessarily so. Statement A is true.
A cluster sample can also be stratified by making each cluster have the same
proportions of the stratifying characteristics as the population. Statement B is
untrue.
If a total sample of 100 is required, stratification will probably mean that more than
100 elements must be selected. Suppose 23 per cent skilled workers are required and
that, by the time the sample has grown to 70, 23 skilled workers have already been
selected. Any further skilled workers chosen cannot be used. To get a sample of
100, therefore, more than 100 elements will have had to be selected. Only if the cost
of selection is negligible will a stratified sample be as cheap as a simple random
sample. Statement C is true.
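The quota-filling argument behind statement C can be sketched deterministically. The selection stream below (1 = skilled, 0 = other) is invented for the example:

```python
# Filling a stratified quota: with a 23 per cent skilled-worker quota in a
# sample of 100, skilled workers selected beyond the quota are discarded,
# so more than 100 selections may be needed in total.

def selections_needed(stream, quota_skilled=23, total=100):
    skilled = other = drawn = 0
    for s in stream:
        drawn += 1
        if s == 1 and skilled < quota_skilled:
            skilled += 1                      # accepted into the quota
        elif s == 0 and other < total - quota_skilled:
            other += 1
        # otherwise the selection is wasted, but it was still made
        if skilled == quota_skilled and other == total - quota_skilled:
            break
    return drawn

print(selections_needed([1] * 30 + [0] * 77))
# 107: seven skilled selections beyond the quota were wasted
```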
6.6 The correct answer is B. Use of a variable sampling fraction means that one section
of the population is deliberately over-represented in the sample. This is done when
the section in question is of great importance. It is over-sampled to minimise the
likelihood of there being error because only a few items from this section have been
measured. Incidentally, D describes weighted sampling.
6.7 The correct answer is A, B, C. Non-random sampling is used in all three situations.
Size             Small   Medium   Large
Sample %           33       33      33
Population %       34       34      32
(i) Some small percentage of accounts will have been closed between the
time of compilation of the computer listings and use of the infor-
mation, thus reducing the sample size and the accuracy. The original
sample may have to be increased to allow for this.
(ii) Some accounts will be dormant (i.e. open but rarely, or never, used). A
decision to include or exclude these accounts must be made. It is usual-
ly made after consideration of the purposes to which the information is
being put.
(iii) To know the occupation of customers requires visiting branches, since
this information may not be computerised. This is time-consuming and
requires the establishment of a working relationship with the bank
manager. It also requires permission to breach the confidentiality be-
tween customers and manager.
(iv) The personal details may well be out of date, since such information is
only occasionally updated.
(v) It may not be possible to classify some account holders into a socioec-
onomic group. For example, the customer may have been classified as a
schoolchild seven years ago. This is a problem of non-response. Omit-
ting such customers from the sample will lead to bias. Extra work must
be done to find the necessary details.
Module 7
Review Questions
7.1 The correct answer is B. The probabilities are calculated from the formula:
7.6 The correct answer is C. The basic multiplication rule as given relates to
independent events. They may not be independent. For example, spells of bad
weather are likely to prevent patients attending. The fact that there were no cancella-
tions on day 1 might indicate a spell of good weather and a higher probability of no
cancellations on the following day.
7.7 The correct answer is D. The number of ways of choosing three objects from eight
is 8C3 = (8 × 7 × 6)/(3 × 2 × 1) = 56.
7.8 The correct answer is B. Knowledge about standard distributions (e.g. for the
normal ±2 standard deviations covers 95 per cent of the distribution) is available
rather than it having to be calculated direct from data.
A is not correct since some small amount of data has to be collected to check that
the standard distribution is applicable and to calculate parameters.
C is not necessarily correct. Standard distributions are approximations to actual
situations and may not lead to greater accuracy.
7.9 The correct answer is B. The population is split into two types: watchers and non-
watchers. A random sample of 100 is taken from this population. The number of
watchers per sample therefore has a binomial distribution.
7.10 The correct answer is True. Since the programme is described as being popular, the
proportion of people viewing (p) is likely to be sufficiently high (perhaps about 0.3)
so that np and n(1 − p) are both greater than 5. The normal approximation to the
binomial can therefore be applied.
7.11 The correct answer is D. The population can be split into two types: those that have
heard and those that have not. A random sample of five is taken from this popula-
tion. The underlying distribution is therefore binomial with p = 0.4 and n = 5. The
binomial formula is:
P(r) = nCr × p^r × (1 − p)^(n−r)
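A direct computation of these binomial probabilities, as a sketch (the individual probabilities can then be combined as the question requires):

```python
# Binomial probabilities for n = 5, p = 0.4, computed directly from
# P(r) = C(n, r) * p^r * (1 - p)^(n - r).
from math import comb

def binom_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

for r in range(6):
    print(r, round(binom_pmf(r, 5, 0.4), 4))
# e.g. P(2) = 0.3456, the most likely outcome
```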
[Figure: normal curve showing Areas A and B, with an 8 per cent tail area at
z = 1.405]
To see whether the results are consistent with an overall 10 per cent defective
rate, the observed results from the 100 samples are compared with the theoreti-
cally expected results calculated above.
Number of defectives           0    1    2    3    4    5    6
Observed no. of samples       52   34   10    4    0    0    0
Theoretical no. of samples    53   35   10    1    0    0    0
There is a close correspondence. The results are consistent with a process defec-
tive rate of 10 per cent. Note that because of rounding the theoretical numbers
of samples do not add up to 100.
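The theoretical row can be reproduced from the binomial formula. The sample size of six items is not stated above but is consistent with the theoretical frequencies quoted, so it is assumed here:

```python
# Theoretical numbers of samples containing r defectives, assuming each
# sample holds six items (an inferred size) with a 10 per cent defective
# rate, over 100 samples.
from math import comb

def expected_count(r, n=6, p=0.1, samples=100):
    return samples * comb(n, r) * p**r * (1 - p)**(n - r)

print([round(expected_count(r)) for r in range(4)])
# [53, 35, 10, 1] - matching the theoretical row after rounding
```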
(b) A first reservation concerning this conclusion is whether the samples were taken
at random. If the samples were only taken at particular times, say at the start of a
shift, it might be that starting-up problems meant that the defective rate at this
time was high. The results would then suggest the overall rate was higher than it
actually is.
Second, if the samples that contain defectives were mostly towards the end of
the time period during which the samples were collected, this might indicate that
the process used to have a defective rate less than 10 per cent but had deteriorat-
ed.
Third, the fact that there are more samples with three defectives than expected,
and fewer with zero and one defective, suggests greater variability in the process
than expected. This might occur because p is not constant at 10 per cent but
varies throughout the shift. The antidote to this problem is either to split the
shift into distinct time periods and take samples from each or to use a more
sophisticated distribution called the beta-binomial, which allows for variability in
p and which will be described in a later module.
Therefore, there is a 96 per cent probability that a sample will contain from five to
13 regular users of the breakfast cereal. If many samples are taken, it is likely that 96
per cent of them will contain five to 13 users. Because consumers are counted in
whole numbers, there is no range of users equivalent to the 95 per cent requested in
the question.
The question has been answered, but it has been a lengthy process. Since np (= 9)
and n(1 − p) (= 11) are both greater than five, the binomial can be approximated by
the normal. The parameters are:
mean = np = 9; standard deviation = √(np(1 − p)) = √4.95 ≈ 2.22
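As a check on the 96 per cent figure, the exact binomial probability can be computed, taking n = 20 and p = 0.45 so that np = 9 and n(1 − p) = 11 (these parameter values are inferred from the text, not stated):

```python
# Exact binomial probability that a sample contains from 5 to 13 users,
# assuming n = 20 and p = 0.45 (inferred so that np = 9, n(1 - p) = 11).
from math import comb

prob = sum(comb(20, r) * 0.45**r * 0.55**(20 - r) for r in range(5, 14))
print(round(prob, 3))
```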
Module 8
Review Questions
8.1 The correct answer is C. Statistical inference uses sample information to make
statements about populations. The statements are in the form of estimates or
hypotheses.
8.2 The correct answer is C. Inference is based on sample information. Even though a
sample is random, it may not be representative, and therefore there is some chance
that the inference may be incorrect. The other statements are true but they are not
the reasons for using confidence levels.
8.3 The correct answer is A. The mean of the sample = (7 + 4 + 9 + 2 + 8 + 6 + 8 + 1
+ 9)/9 = 6
The variance = [(7 − 6)² + (4 − 6)² + (9 − 6)² + (2 − 6)² + (8 − 6)² + (6 − 6)² +
(8 − 6)² + (1 − 6)² + (9 − 6)²]/(9 − 1) = 72/8 = 9
The standard deviation is √9 = 3.
8.4 The correct answer is B. The point estimate of the population mean is simply the
sample mean.
8.5 The correct answer is B. The point estimate of the mean is 6. The 90 per cent
confidence limits are (from normal curve tables) 1.645 standard errors either side of
the point estimate. The limits are 6 ± 1.645 × 1 = 4.4 to 7.6 (approximately).
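The calculations in 8.3 and 8.5 can be reproduced directly; a sketch:

```python
# Sample mean, variance (n - 1 divisor), standard deviation and 90 per
# cent confidence limits for the nine observations of 8.3.
from math import sqrt

data = [7, 4, 9, 2, 8, 6, 8, 1, 9]
n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = sqrt(variance)
se = sd / sqrt(n)                       # 3 / 3 = 1
lower, upper = mean - 1.645 * se, mean + 1.645 * se
print(mean, variance, sd)               # 6.0 9.0 3.0
print(lower, upper)                     # ~4.4 to 7.6
```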
8.6 The correct answer is A. The 95 per cent confidence limits cover a range of 2
standard errors on either side of the mean. A standard error is 150/√n where n is
the sample size. Thus
8.7 The correct answer is False. Sample evidence does not prove a hypothesis. Because
it is from a sample, it merely shows whether the evidence is statistically significant or
not.
8.8 The correct answer is A. The tester decides on the significance level. He or she may
choose whatever value is thought suitable but 5 per cent has come to be accepted as
the convention. The other statements are true but only after 5 per cent has been
chosen as the significance level.
8.9 The correct answer is True. Critical values are an alternative approach to
significance tests and can be used in both one- and two-tailed tests.
8.10 The correct answer is A. The standard error of the sampling distribution is 6 (=
48/√64). There is no suggestion that any deviation from the hypothesised mean of 0
could be in one direction only. Therefore the test is two-tailed. At the 5 per cent
level the critical values are 2 standard errors either side of the mean (i.e. at −12 and
12). Since the observed sample mean is 9.87, at the 5 per cent level the hypothesis is
accepted. At the 10 per cent level the critical values are 1.645 standard errors from
the mean (i.e. at −9.87 and 9.87). At the 10 per cent level the test is inconclusive.
8.11 The correct answer is C. Since the test is one-tailed at the 5 per cent level, the
critical value is 1.645 standard errors away from the null hypothesis mean. The
critical value is therefore 9.87. For the alternative hypothesis the z value of 9.87 is
−1.69 (= (9.87 − 20)/6). The corresponding area in normal curve tables is 0.4545.
Since the null hypothesis will be accepted (and the alternative rejected even if true)
when the observed sample mean is less than 9.87, the probability of a type 2 error is
0.0455 (= 0.5 − 0.4545; i.e. 4.55 per cent).
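The calculations in 8.10 to 8.12 can be sketched numerically, using the normal CDF via math.erf in place of a table lookup:

```python
# Standard error, one-tailed 5 per cent critical value, type 2 error and
# power for the hypothesis test of 8.10-8.12 (null mean 0, alternative
# mean 20, sd 48, sample size 64).
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

se = 48 / sqrt(64)              # 6.0
crit = 1.645 * se               # 9.87
beta = phi((crit - 20) / se)    # P(type 2 error), ~0.046
power = 1 - beta                # ~0.954
print(round(crit, 2), round(beta, 4), round(power, 4))
```

The small differences from the table-based figures reflect the limited precision of printed normal curve tables.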
8.12 The correct answer is D. The power of the test is the probability of accepting the
alternative hypothesis when it is true. The power is therefore 1 − P(type 2 error) =
1 − 0.0455 = 0.9545 (i.e. 95.45 per cent).
8.13 The correct answer is C. The test is to determine whether the plea has met its
objective by bringing about an increase of £2500 per month (i.e. whether the
average increase in turnover per branch is £2500 per month). This is equal to £7500
over a three-month period.
8.14 The correct answer is B. The samples being compared are the turnovers before and
after the plea. They are paired in that the same 100 branches are involved. Each
turnover in the first period is paired with the turnover of the same branch in the
second period.
The test is one-tailed. In the circumstances that the plea was well exceeded the
hypothesis would not be rejected. One would only say that the plea had not
succeeded if the observed increase was significantly less than £7500, but not if
significantly more. Therefore only one tail should be considered. The test should
thus be one-tailed, based on paired samples.
8.15 The correct answer is False. The procedure described relates to an unpaired sample
test, not a paired test. A paired test requires a new sample formed from the differ-
ences in each pair of observations.
The mean of the one sample (£186) must lie within 12 (= 2 × 6) of the true
population mean at the 95 per cent confidence level. Consequently, the true
population mean must be in the range 186 ± 12 = £174 to £198.
The z value for the sample result of an average score of 261 is thus:
z = (261 − 242)/13
= 19/13
= 1.46
From the normal curve table given in Appendix 1 (Table A1.2), the associated
area under the curve is 0.4279. The hypothesis is that the new course would not
change the test score. The possibility that the new course could have led to an
improvement or a deterioration was recognised. The probability of the sample
result must therefore be seen as the probability of a result as far from the mean
as z = 1.46 in either direction (a two-tailed test).
(e) This result is larger than the significance level of 5 per cent and the hypothesis
must be accepted. There is insufficient evidence to suggest that the new course
makes a significant difference to the test scores at the 5 per cent level.
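The two-tailed probability can be computed directly rather than read from tables; a sketch:

```python
# Two-tailed z test for the average score of 261 against a hypothesised
# mean of 242 with a standard error of 13.
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = (261 - 242) / 13
p_two_tailed = 2 * (1 - phi(z))
print(round(z, 2), round(p_two_tailed, 3))
# ~14.4 per cent, above the 5 per cent level, so the hypothesis is accepted
```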
rather than to be the basis for contract action. Second, the assumptions underlying
the test must be met. In this case this means that the sample should have been
selected in a truly random fashion. If not, the whole basis of the test is undermined.
Third, a tensile strength of slightly less than 12 kg might be adequate for the cloth
concerned, and it might not be economic to go to the expense of ensuring the
contract is kept to the letter. On the other hand, a tensile strength of, say, 11 kg or
less might have been more serious because the quality of the cloth was noticeably
reduced. This might suggest the adoption of a test that had 11 kg as the alternative
hypothesis.
Salesperson   Difference (d)   d − d̄   (d − d̄)²
     6              13             9         81
     7              −3            −7         49
     8              −6           −10        100
     9              −6           −10        100
    10              25            21        441
    11              17            13        169
    12              21            17        289
    13              21            17        289
    14             −14           −18        324
    15              −7           −11        121
    16              19            15        225
    17              −7           −11        121
    18             −34           −38       1444
    19              −7           −11        121
    20              13             9         81
    21              13             9         81
    22               9             5         25
    23             −11           −15        225
    24              11             7         49
    25              18            14        196
    26             −19           −23        529
    27               8             4         16
    28              −7           −11        121
    29               9             5         25
    30              18            14        196
Total              120             0       5750
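The summary statistics implied by the table totals can be checked directly; a sketch:

```python
# Mean difference, standard deviation, standard error and observed z value
# from the table totals (30 salespeople, sum of differences 120, sum of
# squared deviations 5750).
from math import sqrt

n = 30
sum_d, sum_sq_dev = 120, 5750
mean_d = sum_d / n                 # 4.0
sd = sqrt(sum_sq_dev / (n - 1))    # ~14.08
se = sd / sqrt(n)                  # ~2.57
z = mean_d / se                    # ~1.56, matching the table area 0.4406
print(round(sd, 2), round(se, 2), round(z, 2))
```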
(v) From the normal curve table in Appendix 1 (Table A1.2), the correspond-
ing area is 0.4406. The probability of such a value is therefore 0.0594 or
5.94 per cent.
This percentage is slightly higher than the significance level. The hypothesis is
accepted (but only just). The new scheme does not give rise to a significant
increase in output.
(c) The six possible reservations listed in the text suggest:
(i) The sample should be collected at random. This means a true random-
based sampling method should have been used to select the salespeople
covering all grades, areas of the country, etc. It also means that the months
used should not be likely to show variations other than those stemming
from the new scheme. For instance, allowance should be made for seasonal
variations in sales.
(ii) Checks should be made that the structure of the test is right. For instance,
does a simple measure like the total sum assured reflect the profitability of
the company? Profitability may have more to do with the mix of types of
policy than total sum assured.
(iii) The potential cost/profit to the company of taking the right decision in
regard to the incentive scheme suggests that more effort could be put into
the significance test. In particular, a larger sample could be taken.
(iv) The balance between the two types of error should be right. It is more im-
portant to know whether the scheme is profitable than to know whether it
gives a significant increase. The test should have sufficient power to distin-
guish between null and alternative hypotheses. This is discussed below.
(d) If the alternative hypothesis is a mean increase of £5000:
(i) The standard error is 14.08/√30 = 2.57.
(ii) The critical value of the one-tailed test is 1.645 standard errors from the
mean. The critical value is 4.23 (= 1.645 × 2.57). For the alternative hy-
pothesis, the z value of 4.23 is −0.30 (= (4.23 − 5)/2.57) (see Figure A4.8).
From the normal curve table in Appendix 1 (Table A1.2), the correspond-
ing area is 0.1179. The null hypothesis is accepted (and the alternative
hypothesis is rejected) if the observed sample mean is less than the critical
value, 4.23. A type 2 error is the acceptance of the null hypothesis when the
alternative hypothesis truly applies. Therefore:
P(type 2 error) = 0.5 − 0.1179 = 0.3821 (i.e. 38.21 per cent)
[Figure A4.8: distributions under the null (mean 0) and alternative (mean 5000)
hypotheses, with the critical value at 4230 and a 5 per cent tail area]
(e) Note that, although the null hypothesis was accepted, the alternative was more
likely to apply. Under the null hypothesis, the probability of the observed sample
mean is 5.94 per cent; under the alternative hypothesis the z-value of the observed
sample mean is −0.39 (= (4 − 5)/2.57) and the corresponding probability is
34.83 per cent. The problem is that the power of the test is low. There is a much
higher probability of a type 2 error than a type 1 error.
To balance the situation, if the probabilities of type 1 and type 2 errors are to be
equal, then the critical value must be halfway between the means of the null hypothesis
distribution and the alternative hypothesis distribution and also 1.645 standard errors
from both means. The critical value must therefore be at £2500 and (working in
thousands):
1.645 × 14.08/√n = 2.5, giving √n = 9.26 and n ≈ 86
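Working in thousands, the implied sample size can be computed; a sketch (the balanced-error condition, with the critical value 1.645 standard errors from each mean and 2.5 from both, is taken from the argument above):

```python
# Sample size needed for equal type 1 and type 2 error probabilities:
# 1.645 standard errors (sd 14.08, in thousands) must equal half the gap
# of 5 between the hypothesised means.
from math import sqrt, ceil

sd, half_gap = 14.08, 2.5
n = (1.645 * sd / half_gap) ** 2
print(ceil(n))
# 86: a much larger sample than the original 30 salespeople
```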
(Note that the original estimate of the standard deviation, 14.08, is still used.)
Module 9
Review Questions
9.1 The correct answer is B. A ‘natural’ measurement such as height is a typical example
of a normal distribution. Many small genetic factors presumably cause the varia-
tions. This is highly typical of the sort of situation on which the normal is defined.
9.2 The correct answer is B, C. The binomial formula, with its factorials and powers, is
more difficult to use than the Poisson. Binomial tables extend to many more pages
than the Poisson because the former has virtually one table for each sample size.
A is not a correct reason. If the situation is truly binomial but the Poisson is used,
some accuracy will be lost but the loss will be small if the rule of thumb applies.
9.3 The correct answer is B. The situation looks to be Poisson. Assume this to be the
case. The parameter is equal to the average number of accidents per month: 36/12
= 3. From the Poisson probability table (see Appendix 1, Table A1.3):
Therefore:
Therefore:
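The table lookup can be reproduced directly from the Poisson formula; a sketch:

```python
# Poisson probabilities for a mean of 3 accidents per month, from
# P(x) = exp(-mu) * mu^x / x!.
from math import exp, factorial

def poisson_pmf(x, mu=3):
    return exp(-mu) * mu**x / factorial(x)

for x in range(5):
    print(x, round(poisson_pmf(x), 4))
# P(2) and P(3) are equal (both ~0.224), a feature of an integer mean
```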
9.4 The correct answer is B. The variance is estimated in essentially the same way as the
standard deviation, which is merely the square root of the variance. The same
reasoning that leads to the standard deviation having n − 1 degrees of freedom leads
to the variance having n − 1 degrees of freedom.
9.5 The correct answer is False. In addition to the conditions made in the statement, the
distribution from which the sample is taken must also be normal if the sampling
distribution of the mean is to be a t-distribution.
9.6 The correct answer is D.
9.8 The correct answer is B. For this test there are 17 degrees of freedom. The test is
one-tailed since it is supposed that the fitness of the executives can only have
improved after the course. The table value is therefore in the row for 17 degrees of
freedom and the column headed 0.05.
9.9 The correct answer is B. The observed t value is greater than the 5 per cent
significance value. Hence it falls within the 5 per cent tail and the probability of the
sample evidence is less than 5 per cent. The fitness of the executives does show a
significant improvement.
9.10 The correct answer is True. Chi-squared is essentially the ratio between a sample
variance and the variance of the population from which it was drawn. It can
therefore be used to test hypotheses such as that described in the question, provided
the population is normal and the sample is selected at random.
9.11 The correct answer is E. The correct value is taken from the row referring to 18
degrees of freedom, and the column referring to an upper tail area 0.10. The critical
chi-squared value is 25.989.
9.12 The correct answer is False. The chi-squared, not -distribution, is applicable in
such circumstances.
9.13 The correct answer is D. Find the entry for row 11 and column 8 in the table. The
upper entry refers to the 5 per cent tail and the lower to the 1 per cent tail. The
correct answer is 2.95.
9.14 The correct answer is True. The observed F ratio is 96/24 = 4.0. The 1 per cent
critical ratio is 3.80, taken from the row corresponding to 14 degrees of freedom
in the denominator and the column corresponding to 12 degrees of freedom in the
numerator. The observed F therefore does exceed the 1 per cent critical value.
9.15 The correct answer is C. Essentially the distribution is binomial with n = 400 and
p = 0.004. For such a low value of p and large value of n, the distribution can and
would be approximated by the Poisson, which is easier to use in practice than the
binomial.
On average, therefore, an aircraft was involved in 1.6 incidents over the 400
days. From the column headed 1.6 in the table, the theoretical frequencies of
incidents can be found. For example, the probability of 0 incidents is 0.2019.
Out of 100 aircraft, one would thus expect 20 to be involved in 0 incidents. The
comparison between theoretical and observed is:
No. of incidents (x)       0    1    2    3    4    5    6    7
Observed                  23   33   23   11    5    3    1    1
Theoretical Poisson       20   32   26   14    6    2    0    0
P(5) = 0.0176
P(6) = 0.0047
P(7) = 0.0011
P(8) = 0.0002
P(9 or more) ≈ 0
giving:
P(5 or more) = 0.0236
A proportion of 0.0236 (or 2.36 per cent) of the 800 would therefore be ex-
pected to be involved in five or more incidents over a 400-day period (i.e. 19
aircraft).
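The tail probability can be computed in one step rather than term by term; a sketch:

```python
# P(5 or more incidents) for a Poisson mean of 1.6: one minus the sum of
# P(0) to P(4).
from math import exp, factorial

mu = 1.6
p_tail = 1 - sum(exp(-mu) * mu**x / factorial(x) for x in range(5))
print(round(p_tail, 4))        # ~0.0237 (the text's 0.0236 reflects rounding)
print(round(800 * p_tail))     # ~19 aircraft out of 800
```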
(c) Reservations about the conclusion are principally to do with whether the
incidents are random. It may be that for this aircraft certain routes/flights/ pi-
lots/times of the year are more prone to accident. If so, the average incident rate
differs from one part of the population to another and a uniform value (1.6)
covering the whole population should not be used. In this case it may be neces-
sary to treat each section of the population differently or to move to a more
sophisticated distribution, possibly the negative binomial.
Another problem may be that the sample is not representative of the population.
Not all routes may be included; not all airlines may be included; there may be a
learning effect, with perhaps fewer errors later in the 400-day period than earlier;
or perhaps the pilots are doubly careful when they first fly a new aircraft. The
data should be investigated to search for gaps and biases such as these. If the
insurance is to cover all aircraft/routes/airlines then the sample data should be
representative of this population.
Lastly, the data are all about incidents; the insurance companies become involved
financially only when an accident takes place. The former may not be a good
surrogate for the latter. If possible, past records should be used to establish a
relationship between the two and to test just how good a basis the analysis of
incidents is for deciding on accident insurance.
(d) If a check of the data reveals missing routes or airlines then the gaps should be
filled if possible. The data should be split into subsections and the analysis re-
peated to find if there has been a learning effect or if there are different patterns
in different parts of the data. There could be differences on account of seasonali-
ty, routes, airlines, type of flight (charter or scheduled). If differences are
observed then the insurance premium would be weighted accordingly.
Data from the introduction of other makes of aircraft could serve to indicate
learning effects and also the future pattern of incidents.
The large amounts of money at stake in a situation like this would make the extra
statistical work suggested here worthwhile.
Car          1    2    3    4    5    6    7    8    9   10   11   12   13   14
mpg (x)     21   24   22   24   29   18   21   26   25   19   22   20   28   23
x − x̄      −2    1   −1    1    6   −5   −2    3    2   −4   −1   −3    5    0
(x − x̄)²    4    1    1    1   36   25    4    9    4   16    1    9   25    0
(v) The test is one-tailed, assuming the tyres could bring about only an im-
provement, not a deterioration, in petrol consumption; the degrees of
freedom are 13. The t value corresponding to the 5 per cent level is thus
taken from the row for 13 degrees of freedom and the column for the 5 per
cent tail. The value is 1.771. The observed t value is less than this and
therefore the hypothesis is accepted. The tyres do not make a significant
difference.
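Taking the sample mean as 23 and the squared-deviation total as 136 (both obtainable from the table above), the observed t in (v) can be reproduced; a sketch:

```python
# Observed t for the tyre test: mean mpg 23 against a null hypothesis
# mean of 22.4, with 14 cars and a squared-deviation total of 136.
from math import sqrt

n, mean, null_mean = 14, 23.0, 22.4
sum_sq = 136
sd = sqrt(sum_sq / (n - 1))    # ~3.23
se = sd / sqrt(n)              # ~0.86
t = (mean - null_mean) / se
print(round(t, 2))             # 0.69, well below the critical 1.771
```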
(b) On the other hand, the alternative hypothesis is that the tyres result in an
improvement in petrol consumption of 1.5 mpg. Under this hypothesis the sam-
ple would have come from a distribution of mean 23.9 (= 22.4 + 1.5). The
observed value is:
Ignoring the negative sign, this observed value (under the alternative hypothe-
sis) is lower than the critical value of 1.771, just as was the previous observed
value (under the null hypothesis). The sample evidence would therefore be insuf-
ficient to reject either the null or the alternative hypothesis. Clearly the sample
size is too small to discriminate properly between the two hypotheses.
If the probabilities of type 1 and type 2 errors are to be equal then the critical
value should be equidistant from both hypotheses (i.e. halfway between
them at 23.15). A sample size larger than 14 is evidently required to do this. As-
suming that the sample size needed is greater than 30 (and therefore the t-
distribution can be approximated to the normal):
1.645 × 3.23/√n = 0.75, giving √n = 7.08 and n ≈ 50
To achieve such a sample size would presumably involve using cars of a wider age
span than six to nine months.
(c) Many factors affect a car’s petrol consumption. A well-designed significance test
should exclude or minimise the effect of all factors except the one of interest, in
this case, the tyres. Major influences on consumption are the type of car and its
age. By comparing like with like in respect of these factors, their effect is elimi-
nated. Other factors cannot be controlled in this way. Very little can be done
about the type of usage, total mileage, the style of the drivers and the quality of
maintenance. It is hoped that these factors will balance out over the sample of
cars and the time period.
(d) The principal argument in favour of the officer’s suggestion is that it may
eliminate the effect of different maintenance methods from the test so that ob-
served differences are accounted for by the tyres, not the maintenance methods.
The arguments against his proposal are stronger. First, the maintenance methods
are unlikely to be identical in the two forces. The procedures laid down may be
the same but the interpretation of them by different sets of mechanics of differ-
ent levels of skill will almost certainly mean that there are still differences.
Second, his proposals create some new difficulties not present in the original
significance test. Some factors affecting petrol consumption that were eliminated
by the first test are now reintroduced. The geography of the territories served by
the forces will differ; the drivers of the cars will be different; the roles of the cars
may be different. All these factors will cause different fuel consumptions in the
two samples of cars, which may well disguise or overwhelm the influence of the
tyres.
While the officer’s test could certainly be carried out, the new variables his test
introduces would put a question mark over any conclusions drawn. On the
whole, the officer’s suggestion should be rejected but without blunting his en-
thusiasm for using analytical methods to help in decision taking.
Module 10
Review Questions
10.1 The correct answer is A, C. Analysis of variance tests the hypothesis that the
samples come from populations with equal means or that they come from a
common population. In the former case, B is an assumption. In both cases, D is an
assumption. B and D are therefore not hypotheses but assumptions underlying the
testing of the hypotheses by analysis of variance.
10.2 The correct answer is A.
10.5 The correct answer is True. The hypothesis tested by the analysis of variance is that
the treatments come from populations with equal means (i.e. the treatments have no
effect). But since the observed F value exceeds the critical value, the hypothesis
must be rejected. The treatments do have a significant effect.
10.6 The correct answer is A, D. It is hypothesised that B is an attribute of the
populations from which the samples are taken; C is an assumed attribute of the
populations. Since the samples are selected at random, it would be virtually impossi-
ble for the samples to have these attributes.
10.7 The correct answer is B. The grand mean is 4. Total SS is calculated by finding the
deviation of each observation from the grand mean, then squaring and summing.
Taking each row in turn:
10.10 The correct answer is False. A balanced design is one in which all the treatment
groups are the same size (i.e. they all have an equal number of observations).
The first column of Table A4.7 describes the sources of error. The second relates to
degrees of freedom, always given by k − 1 (between groups) and n − k (within
groups) for a one-way analysis of variance.
The third column requires the calculation of the sums of squares. SST deals with the
‘between’ sums of squares and is concerned with the group means and their
deviations from the grand mean.
SSE deals with the ‘within’ sums of squares and is concerned with the individual
observations and the deviations between them and their group means.
(Table of per-brand calculations, Brand 1 to Brand 6.)
Total SS is of course the total sums of squares. It is concerned with all observations
and their deviations from the grand mean. Going through the observations, each
row in turn:
It was not strictly necessary to calculate all three sums of squares since
Total SS = SST + SSE.
Calculating all three provided a check. Table A4.7 shows that the equality is satis-
fied: 1078.4 = 490.0 + 588.4.
Next, the mean squares are calculated by dividing the sums of squares by the
associated degrees of freedom (column 4 in Table A4.7). The ratio of the mean
squares is the observed value of the F variable and is calculated in the final column.
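As a check on the arithmetic, the F ratio can be reproduced from the sums of squares quoted above; a minimal sketch in Python:

```python
# One-way ANOVA arithmetic for the skin-irritation example, using the
# sums of squares quoted in the text (SST = 490.0 between brands,
# SSE = 588.4 within; 6 brands, 10 observations each).
k, n = 6, 60                 # number of groups, total observations
sst, sse = 490.0, 588.4
total_ss = sst + sse         # 1078.4, the check in Table A4.7
mst = sst / (k - 1)          # mean square, between groups (5 df)
mse = sse / (n - k)          # mean square, within groups (54 df)
f_observed = mst / mse       # observed F ratio, about 8.99
```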
To finish the test, the critical value for (5, 54) degrees of freedom at the 5 per cent
level is found from the table of the F-distribution. In this case, the value is 2.38. The
observed value, 8.99, greatly exceeds 2.38. The hypothesis is rejected at the 5 per
cent significance level. There is a significant difference in the levels of irritability
caused.
At the 1 per cent level the hypothesis is also rejected. The critical value for (5, 54)
degrees of freedom is 3.37 at the 1 per cent level. The observed value exceeds this
also. The evidence that the powders do not cause the same level of skin irritation is
strong.
Qualifications
The reservations that should attach to the results of the test are to do with both
statistics and common sense.
(a) An F test assumes that the populations from which the samples are drawn are
normally distributed. In this case, it must be assumed that the distribution of
observations for each brand is normal. This may not be true, especially since the
sample size (10) is too small for the central limit theorem to have any real effect.
(b) An F test also assumes that the populations from which the samples are drawn
have equal variances. Again, this may not be true although statistical research has
indicated that variances would have to be very different before the results of the
test were distorted.
(c) Since skin irritation is very much a subjective problem and one that is hard to
quantify, there must also be doubts about the validity of the data (i.e. does the
index measure accurately what it is supposed to measure?). The tester should
look carefully at the ways in which the index has been validated by the research-
ers.
(d) The data must also come into question for more fundamental reasons. The
design of the experiment gives rise to the following doubts:
(i) How were the households chosen? Are they a representative group?
(ii) Do the households do their washing any differently because they are being
monitored by the tester?
(iii) How representative are the batches of washing?
(iv) How ‘standard’ are the batches of washing?
(v) Are there factors that make some people more prone to skin irritation and
that should therefore be built into the test?
(vi) Are the data independent (e.g. is there any cumulative effect in the testing)?
Does any brand suffer a higher index because of the effect of brands tested
earlier?
After calculating the means, the next step is to construct a two-way analysis of
variance (ANOVA) table as shown in Table A4.9.
The block sum of squares is calculated from the block (row) means:
The error sum of squares (SSE) is calculated by first determining the total sum of
squares (Total SS).
Monday 616
Tuesday 928
Wednesday 326
Thursday 62
Friday 900
(a) Do the locations of the store have different effects on the responses? To test the
hypothesis that the location (column) means come from the same population,
the observed F value relating to treatments must be compared with a critical
value. If the significance level is chosen to be 5 per cent then the appropriate
critical value, relating to (7, 28) degrees of freedom, is found from the F table
to be 2.36. The observed F value is 3.77; therefore the hypothesis should be
rejected. At the 5 per cent significance level, location does appear to affect re-
sponses.
(b) It is important to neutralise the effect of the days of the week in a test such as
this. Intuitively there is a likelihood that people’s attitudes will vary between the
beginning of the week and the end, when the weekend approaches. This factor
may affect customers and staff alike.
(c) If the effect of days of the week had not been neutralised, the appropriate test
would have been a one-way analysis of variance as shown in Table A4.10. Total
SS and SST are calculated just as in the two-way case, but SSE is obtained from
the relationship:
The critical value at the 5 per cent level and for (7, 32) degrees of freedom is
2.32. The observed value, 1.03, is less than the critical value. The hypothesis should
be accepted. Location does not appear to affect responses. When the effects of
days of the week are not allowed for, the result of the test is the opposite of
when they are allowed for.
(d) Referring back to Table A4.9, the effect of days of the week on responses can
also be tested. This time the observed F value is the ratio MSB/MSE, equal to
22.3. The critical value at the 5 per cent level for (4, 28) degrees of freedom is
2.71. The observed value far exceeds this amount. Days of the week have a highly
significant effect on the responses.
(e) Should the analysis of variance be taken further by looking into the possibility of
an interaction effect? The usefulness of such an extension to the study depends
on how much it is thought days of the week and locations have independent
effects on responses. If it were thought that the ‘Monday’ and ‘Friday’ effects
were more marked in some parts of the country than others then an interaction
variable would permit the inclusion of this influence in the analysis of variance.
Intuitively it does not seem likely that people feel particularly worse about Mon-
days (better about Fridays) in some cities than in others. In any case, since the
effect of location has already been demonstrated to have a significant bearing on
responses, the inclusion of a significant interaction term could only make the
effect more marked (by decreasing the SSE while SST remains the same). Over-
all it does not seem worthwhile to extend the analysis to include an interaction
term.
Module 11
Review Questions
11.1 The correct answer is C, D. A is untrue because regression is specifying the
relationship between variables; correlation is measuring the strength of the relation-
ship. B is untrue because regression and correlation cannot be applied to unpaired
sets of data. C is true, by definition, and D is true, because if the data were plotted in
a scatter diagram, they would lie approximately along a straight line with a negative
slope.
11.2 The correct answer is B. A is untrue because residuals are measured vertically, not at
right angles to the line. B is true, by definition. C is untrue because actual points
below the line have negative residuals, and D is untrue because residuals are all zero
only when the points all lie exactly on the line (i.e. when there is perfect correlation).
11.3 The correct answer is B.
 x    y    x−x̄   (x−x̄)²   y−ȳ   (y−ȳ)²   (x−x̄)(y−ȳ)
 4    2    −4      16      −3      9         12
 6    4    −2       4      −1      1          2
 9    4     1       1      −1      1         −1
10    7     2       4       2      4          4
11    8     3       9       3      9          9
40   25             34             24         26
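The working behind these sums can be reproduced in a few lines; a sketch using the data pairs above:

```python
# Deviations from the means, their squares and cross-products for the
# question 11.3 data, leading to slope, intercept and correlation.
xs = [4, 6, 9, 10, 11]
ys = [2, 4, 4, 7, 8]
x_bar = sum(xs) / len(xs)    # 8
y_bar = sum(ys) / len(ys)    # 5
sxx = sum((x - x_bar) ** 2 for x in xs)                       # 34
syy = sum((y - y_bar) ** 2 for y in ys)                       # 24
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 26
b = sxy / sxx                  # slope, about 0.76
a = y_bar - b * x_bar          # intercept
r = sxy / (sxx * syy) ** 0.5   # correlation, about 0.91
```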
11.7 The correct answer is A. The evidence of everyday life is that husbands and wives
tend to be of about the same age, with only a few exceptions. One would therefore
expect strong positive correlation between the variables.
11.8 The correct answer is A. If data are truly represented by a straight line, the residuals
should exhibit no pattern. They should be random. Randomness implies that each
residual should not be linked with the previous (i.e. there should be no serial
correlation). Randomness also implies that the residuals should have constant
variance across the range of x values (i.e. heteroscedasticity should not be present).
11.9 The correct answer is False. The strong correlation indicates association, not
causality. In any case, it is more likely that if causal effects are present, they work in
the opposite direction (i.e. a longer life means a patient has more time in which to
visit his doctor).
11.10 The correct answer is C. The prediction of sales volume for advertising expenditure
of 5 is:
11.11 The correct answer is B. Unexplained variation = Sum of squared residuals = 900
11.12 The correct answer is A. The difference between a regression of y on x and one of
x on y is that x and y are interchanged in the regression and correlation formulae.
Since the correlation coefficient formula is unchanged if x and y are swapped round,
the correlation coefficients are the same in both cases. Since the slope and intercept
formulae are changed if x and y are swapped round, then these two quantities are
different in the two cases (unless by a fluke).
(Scatter diagram: y plotted against x, with x running from 0 to 8 and y from 0 to 20.)
 x    y    x−x̄   (x−x̄)²   y−ȳ   (y−ȳ)²   (x−x̄)(y−ȳ)
 3   11    −1       1      −3      9          3
 1    7    −3       9      −7     49         21
 3   12    −1       1      −2      4          2
 4   17     0       0       3      9          0
 6   19     2       4       5     25         10
 7   18     3       9       4     16         12
24   84             24            112         48
The correlation coefficient is high, confirming the visual evidence of the scatter
diagram that the relationship is linear.
(b) Line (i)
The line goes through the points (1,7) and (6,19). Therefore, the line has slope =
(19 − 7)/(6 − 1) = 12/5 = 2.4 (i.e. the line is y = a + 2.4x). Since the line goes
through the point (1,7): 7 = a + 2.4, so a = 4.6.
The line is y = 4.6 + 2.4x
Line (ii)
The line goes through the points (1,7) and (7,18). Therefore, the line has slope =
(18 − 7)/(7 − 1) = 11/6 = 1.83 (i.e. the line is y = a + 1.83x). Since the line goes
through (1,7): 7 = a + 1.83, so a = 5.17.
The line is y = 5.17 + 1.83x
Line (iii)
The regression line is found from the regression formulae: slope b = 48/24 = 2 and
intercept a = ȳ − b x̄ = 14 − (2 × 4) = 6.
The line is y = 6 + 2x
The residuals are calculated as actual minus fitted values. For example, for line
(i) and the point (3,11), the residual is: 11 − (4.6 + 2.4 × 3) = 11 − 11.8 = −0.8.
The MADs are calculated as the average of the absolute values of the residuals.
For example, for line (i): MAD = (0.8 + 0 + 0.2 + 2.8 + 0 + 3.4)/6 = 7.2/6 = 1.2.
The variances are calculated as the average of the squared residuals (but with a
divisor of 5, not 6, as in the formula for the variance). For example, for line (i):
variance = (0.64 + 0 + 0.04 + 7.84 + 0 + 11.56)/5 = 20.08/5 = 4.02.
The mean absolute deviation shows that line (i), connecting the extreme values,
has the smallest residual scatter. On the MAD criterion, line (i) is the best.
The variance shows that line (iii), the regression line, has the smallest residual
scatter. On the variance criterion (equivalent to least squares), line (iii) is the best.
This has to be the case since the regression line is the line that minimises the
sum of squared residuals.
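The comparison can be sketched in code (assuming line (i) joins (1,7) and (6,19), line (ii) joins (1,7) and (7,18), and the regression line works out to y = 6 + 2x from the sums Σ(x−x̄)(y−ȳ) = 48 and Σ(x−x̄)² = 24 in the table):

```python
# Comparing the three candidate lines on the MAD and variance criteria.
data = [(3, 11), (1, 7), (3, 12), (4, 17), (6, 19), (7, 18)]

def fit_through(p1, p2):
    """Intercept and slope of the straight line through two points."""
    slope = (p2[1] - p1[1]) / (p2[0] - p1[0])
    return p1[1] - slope * p1[0], slope

lines = {
    "i": fit_through((1, 7), (6, 19)),
    "ii": fit_through((1, 7), (7, 18)),
    "iii": (6.0, 2.0),   # least-squares line, assumed to be y = 6 + 2x
}

results = {}
for name, (a, b) in lines.items():
    residuals = [y - (a + b * x) for x, y in data]
    mad = sum(abs(e) for e in residuals) / len(residuals)
    var = sum(e * e for e in residuals) / (len(residuals) - 1)  # divisor 5
    results[name] = (mad, var)
# Line (i) has the smallest MAD; line (iii) the smallest variance.
```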
Clearly different, but equally plausible criteria (minimising the MAD and mini-
mising the variance of the residuals) give different ‘best fit’ lines. Even when one
keeps to one criterion the margin between the ‘best’ line and the others is small
(in terms of the criterion). Yet the three lines (i), (ii) and (iii) differ markedly
from one another and would give distinctly different results if used to forecast.
The conclusion is that, while regression analysis is a very useful concept, it
should be used with caution. A regression line is best only in a particular way
and, even then, only by a small margin.
(b) The goodness of fit can be checked by considering the correlation coefficient
and the residuals. The correlation coefficient is 0.92. This is high, suggesting a
good fit.
The next step is to check the residuals for randomness. They must first be calcu-
lated using the regression equation (see Table A4.11).
A visual inspection of the residuals does not suggest any particular pattern. First,
there is no tendency for the positives and negatives to be grouped together (e.g.
for the positive residuals to refer to the smaller stores and the negatives to the
larger, or vice versa). In other words, there is no obvious evidence of serial cor-
relation. Second, there is no tendency for the residuals to be of different sizes at
different parts of the range (e.g. for the residuals to be, in general, larger for
larger stores and smaller for smaller stores). In short, there is no evidence of
heteroscedasticity.
Visually, the residuals appear to be random. Taken with the high correlation
coefficient, this indicates that there is a linear relationship between sales and
family disposable income.
(c) The scatter of the residuals about the regression line is measured through the
residual standard error. If the residuals are normally distributed, 95 per cent of
them will lie within 2 standard errors of the line. For a point forecast (given by
the line) it may be anticipated, if the future is like the past, that the actual value
will also lie within 2 standard errors of the point forecast.
If residual error were the only source of error, 95 per cent confidence limits for
the forecast could be defined as, in the example given above:
£74 668 ± (2 × £4720)
i.e. £65 228 to £84 108
However, there are other sources of error (see Module 12) and therefore the
above confidence interval must be regarded as the best accuracy that could be
achieved.
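The interval arithmetic can be sketched as:

```python
# 95 per cent limits from the point forecast and residual standard
# error quoted in the text: point forecast ± 2 standard errors.
point_forecast = 74668
residual_se = 4720
lower = point_forecast - 2 * residual_se   # 65228
upper = point_forecast + 2 * residual_se   # 84108
```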
(d) The linear relationship between sales and family disposable income appears to
pass the statistical tests. Further, since it must be a reasonable supposition that
sales are affected to some degree by the economic wealth of the catchment area,
the model has common sense on its side.
On the other hand, there are many influences on a store’s sales besides family
income. These are not included in the forecasting method. Ideally, a method that
can include other variables would be preferable.
A second reservation is concerned with the quality of the data. While store sales
are probably fairly easy to measure and readily available, this is unlikely to be the
case with the disposable family income. If these data are not available, an expen-
sive survey would be required to make estimates. Even then, the data are not
likely to carry a high degree of accuracy.
Last, the catchment area will be difficult to define in many if not all cases, adding
further to the inaccuracy of the data.
Module 12
Review Questions
12.1 The correct answer is B, C. B and C give synonyms for a right-hand-side variable.
Another synonym is an independent variable, the opposite of A. D is incorrect,
there being no such thing as a residual variable.
12.2 The correct answer is False. The statement on simple regression is correct, but the
statement on multiple regression should be altered to ‘one variable is related to
several variables’.
12.3 The correct answer is A. The coefficients have standard errors because they are
calculated from a set of observations that is deemed to be a sample and therefore
the coefficients are estimates. Possible variations in the coefficients are calculated
via their standard errors, which are in turn estimated from variation in the residuals.
B is incorrect since, although there may be data errors, this is not what the standard
errors measure. The standard errors are used to calculate t values that are used in
multiple regression, but this is not why they arise. Therefore, C and D are incorrect.
12.4 The correct answer is C. The t values are found by dividing the coefficient estimate
by the standard error. Thus:
Variable 1: 5.0/1.0 = 5.0
Variable 2: 0.3/0.2 = 1.5
Variable 3: 22/4 = 5.5
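The division can be sketched as:

```python
# t value = coefficient estimate / standard error, for the three
# variables in question 12.4.
coefficients = [5.0, 0.3, 22.0]
standard_errors = [1.0, 0.2, 4.0]
t_values = [round(c / se, 1) for c, se in zip(coefficients, standard_errors)]
# t_values is [5.0, 1.5, 5.5]
```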
12.5 The correct answer is D. The degrees of freedom = No. observations − No.
variables − 1. Thus, number of observations = 32 + 3 + 1 = 36.
12.6 The correct answer is C. The elimination of variables is based on a significance test
for each variable. The t value for each variable is compared with the critical value
for the relevant degrees of freedom. In this case, the number of observations
exceeds 30; therefore, the normal distribution applies and the critical value is 1.96.
12.7 The correct answer is True. The formula for R-bar-squared has been adjusted to
take degrees of freedom into account. Since each variable reduces the degrees of
freedom by 1, the number of variables included is allowed for.
12.8 The correct answer is A. Sums of squares (regression) have as many degrees of
freedom as there are right-hand-side variables (i.e. 3).
12.9 The correct answer is C.
12.10 The correct answer is D. The degrees of freedom for sums of squares (residuals)
12.11 The correct answer is C. The critical F ratio is for (3,34) degrees of freedom and for
the 5 per cent level. From tables this is found to be 2.88. Since observed F exceeds
critical F, there is a significant linear relationship.
12.12 The correct answer is D. The independent variables are the right-hand-side
variables. Only x (and x²) appear on the right-hand side, but in curvilinear regression
squared terms are treated as additional variables. Therefore, the independent
variables are x and x².
12.13 The correct answer is False. A transformation is used not to approximate a curved
relationship to a linear one but to put the relationship in a different form so that the
technique of linear regression can be applied to it.
12.14 The correct answer is B. To carry out a regression analysis on the exponential
function the equation is first transformed by taking logarithms (to the base
e) of either side to obtain eventually: log y = log a + bx.
(b) The values of the residuals are found by first calculating the fitted values from
the regression equation. The fitted values are then subtracted from the actual.
(See Table A4.12.) The residuals are not especially unusual. A visual inspection
suggests that they are random, although it is of course difficult to detect patterns
from so few observations.
(c) The correlation coefficient is high at 0.92. But it would be expected to be high
where there are so few observations and two right-hand-side variables. The F test
would be a more precise check. An ANOVA table should be drawn up as in
Table A4.13.
The observed F value is 37.3. The critical F value for (2,5) degrees of freedom at
the 5 per cent significance level is, from the table, 5.79. Since observed F exceeds
the critical F, it can be concluded that there is a significant linear relationship.
(d) Two additional pieces of information would be useful. The correlation matrix
would help to check on the possible collinearity of the two variables. Calcula-
tions of SE(Pred) would help to determine whether the predictions produced by
the model were of sufficient accuracy to use in decision making.
(e) The model could be used to forecast revenue, provided that conditions do not
change. In particular, this means that the ways in which decisions are taken re-
main the same.
A seeming paradox of the model is the negative coefficient for television adver-
tising expenditure. Does this mean that television advertising causes sales to be
reduced? The answer is almost certainly no. The reason is this: television adver-
tising is only used when sales are disappointing. Consequently, high television
advertising expenditure is always associated with low revenue (but not as low as
it might have been). The causality works in an unexpected way: from sales to
advertising and not the other way around.
Provided decisions about when to use the two types of advertising conform to
the past, the model could be used for predictions. If, however, it was decided to
experiment in advertising policy and make expenditures in different circum-
stances to those that have applied in the past, the model could not be expected
to predict gross revenue.
(f) The prime improvement would be extra data. Eight observations is an unsatis-
factory base for a regression analysis. This is not a statistical point. It is simply
that common sense suggests that too little information will be contained in those
eight observations for the decision at hand, whatever it is.
(Scatter diagram: Unit costs (£/tonne), from 40 to 90, plotted against Capacity (tonnes/week), from 100 to 400.)
unit costs to 1/Capacity alone). They do not appear to be random in the first
model as described in Case Study 12.2.
(iv) The third model is the only one of the three that is a multiple regression
model. Both of its right-hand-side variables have been included correctly.
Their t values are 23.8 (for 1/Capacity) and 7.3 (for Age), well in excess of
the critical value at the 5 per cent level.
(v) Collinearity is largely absent from the third model. Using the formula given
in Module 11, the correlation coefficient between 1/Capacity and Age is
0.44. Squaring this to obtain R-squared, the answer is 0.19. This is a low
value (and an F test shows the relationship is not significant).
(vi) The third model has the lowest SE(Pred), at 1.7, compared with 4.0 for the
second model. It is more than twice as accurate as the next best model.
(b)
(i) Although the value of age in making a prediction for a 2018 plant is zero,
age nevertheless has had an effect on the prediction. Age was allowed for in
constructing the regression model. All coefficients were affected by the
presence of age in the regression. One could say that the regression has
separated the effect of capacity from that of age. The (pure) effect of capac-
ity can now be used in predicting for a modern plant.
(ii) The 95 per cent forecast intervals must be based on the t-distribution since
the number of observations is less than 30. For 9 degrees of freedom
(12 − 2 − 1), t is 2.26. The intervals are:
(iii) SE(Pred) takes into account a number of sources of error. One of these is
in the measurement of the variable coefficients. Any prediction involves
multiplying these coefficients by the values of the right-hand-side variables
on which the predictions are based. Therefore, the amount of the error will
vary as these prediction values vary. SE(Pred) will thus be different for dif-
ferent predictions.
(iv) R-squared measures variation explained; SE(Pred) deals with unexplained
variation plus other errors. Although, therefore, the two are linked, the rela-
tionship is not a simple or exact one. An increase in R-squared from 0.93 to
0.99 in one way appears a small increase. From another point of view it re-
flects a great reduction in the unexplained variation, which is reflected in
the substantial improvement in prediction accuracy.
Module 13
Review Questions
13.1 The correct answer is False. The techniques are classified into qualitative, causal
modelling and time series; the applications are classified into short-, medium- and
long-term.
13.2 The correct answer is B, D. A is false because time series methods make predictions
from the historical record of the forecast variable only, and do not involve other
variables. B is true because a short time horizon does not give time for conditions to
change and disrupt the structure of the model. C is false since time series methods
work by projecting past patterns into the future and therefore are usually unable to
predict turning points. D is true because some time series methods are able to
provide cheap, automatic forecasts.
13.3 The correct answer is A, C. A is the definition of causal modelling. B is false since
there is no reason why causal modelling cannot be applied to time series as well as
cross-sectional data. C is true because causal modelling tries to identify all the
underlying causes of a variable’s movements and can therefore potentially predict
turning points. D is false since causal modelling can be used for short-term fore-
casts, but its expense often rules it out.
13.4 The correct answer is False. Causal modelling is the approach of relating one
variable to others; least squares regression analysis is a technique for defining the
relationship. There are other ways of establishing the relationship besides least
squares regression analysis.
13.5 The correct answer is True. Qualitative forecasting does not work statistically from a
long data series, as the quantitative techniques tend to. However, in forming and
collecting judgements, numerical data may be used. For example, a judgement may
be expressed in the form of, say, a future exchange rate between the US dollar and
the euro.
13.6 The correct answer is A, B, C, D. The situations A to D are the usual occasions
when qualitative forecasting is used.
13.7 The correct answer is A. A is correct, although ‘expert’ needs careful definition. It
would be better to say that the participants were people with some involvement in
or connection with the forecast. B is not true since the participants are not allowed
to communicate with one another at all. C is not true because the chairman passes
on a summary of the forecasts, not the individual forecasts. D is not true. The
chairman should bring the process to a stop as soon as there is no further move-
ment in the forecasts, even though a consensus has not been reached.
13.8 The correct answer is True. This is a definition of scenario writing. Each view of the
future is a scenario.
13.9 The correct answer is C, D. A and B are not true since the technique of the cross-
impact matrix does not apply to forecasts or probabilities of particular variables,
whether sales or not. C is true since the technique is based on the full range of
future events or developments and they therefore need to be fully listed. D is true,
being a description of what the technique does.
13.10 The correct answer is False. The essence of an analogy variable is that it should
represent the broad pattern of development expected for the forecast variable. It
does not have to be exactly the same at each point. They could, for example, differ
by a multiplicative factor of ten.
13.11 The correct answer is A. Catastrophe theory applies to ‘jumps’ in the behaviour of a
variable rather than smooth changes, however steep or unfortunate.
13.12 The correct answer is C. C gives the formula for a partial relevance number.
Objective:
Level Design successful automobile
A Performance
B Passenger comfort
C Safety
D Running costs
E Capital costs
Weight the importance of each criterion relative to the others. This is done by
asking which criteria are most relevant to the basic objective of designing a suc-
cessful automobile. The weights might be assigned as follows:
Weight
A Performance 0.30
B Passenger comfort 0.20
C Safety 0.10
D Running costs 0.15
E Capital costs 0.25
Total 1.00
(c) Weight the sub-objectives at each level (the elements of the tree) according to
their importance in meeting each criterion. In this case, the result might be as in
Table A4.14.
The first column shows the assessed relevance of the three elements at level 1 to
the criterion of performance. Accommodation is weighted 10 per cent, control
65 per cent and information 25 per cent. Since the table gives the relative rele-
vance of the elements at each level to the criteria, this part of each column must
sum to 1. The process of assessing relevance weights is carried out in a similar
way for the second level of the tree.
(d) Each element has a partial relevance number (PRN) for each criterion. It is
calculated:
It is a measure of the relevance of that element with respect only to that criteri-
on. For this case the partial relevance numbers are shown in Table A4.15.
For instance, at level 2 the PRN for direction with respect to capital costs is 0.05
× 0.25 = 0.0125.
PRNs are calculated for each element at each level for each criterion.
(e) The LRN for each element is the sum of the PRNs for that element (see Ta-
ble A4.16). It is a measure of the importance of that element relative to others at
the same level in achieving the highest-level objective. For example, at level 2 the
LRN for direction is 0.0375 (= 0.0150 + 0 + 0.0100 + 0 + 0.0125). There is one
LRN for each element at each level.
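A sketch of the PRN/LRN arithmetic for the ‘direction’ element; the criterion weights are those from part (b), while the relevance weights here are assumed values chosen to match the figures quoted from Table A4.15:

```python
# PRN = relevance weight (element vs. criterion) x criterion weight;
# LRN = sum of the element's PRNs across the criteria A to E.
criterion_weights = [0.30, 0.20, 0.10, 0.15, 0.25]   # criteria A to E
relevance = [0.05, 0.0, 0.10, 0.0, 0.05]             # assumed weights
prns = [r * w for r, w in zip(relevance, criterion_weights)]
lrn = sum(prns)   # 0.0375, matching the figure in the text
```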
(f) There is one CRN for each element. They are calculated by multiplying the LRN
of an element by the LRNs of each associated element at a higher level (see Ta-
ble A4.17). This gives each element an absolute measure of its relevance.
For example:
The CRNs at the second level show the comparative importance with respect to
the overall objective of the elements at that level. Thus, speed is the most im-
portant (0.240) and baggage the least important (0.012).
Recall that by this process the bottom row of elements (specific technological
requirements) will have overall measures of their relevance in achieving the ob-
jective at the highest level of the tree. This should lead to decisions about the
importance, timing, resource allocation, etc. of the tasks ahead.
Module 14
Review Questions
14.1 The correct answer is C. C is the definition of time series forecasting. A is untrue
because TS methods work for stationary and non-stationary series. Decomposition
is the only method that uses regression even to a small extent. Therefore, B is
untrue. D is partly true. Some, but not all, TS methods are automatic and need no
intervention once set up.
14.2 The correct answer is A, D. TS methods analyse the patterns of the past and project
them into the future. Where conditions are not changing, the historical record is a
reliable guide to the future. TS methods are therefore good in the short term when
conditions have insufficient time to change (situation A) and in stable situations (D).
For the same reason, they are not good at predicting turning points (situation B). In
order to analyse the past accurately, a long data series is needed, thus situation C is
unlikely to be one in which TS methods are used.
14.3 The correct answer is D. A stationary series has no trend and constant variance.
Homoscedastic means ‘with constant variance’. Thus, it is only D that fully defines a
stationary series.
14.4 The correct answer is False. The part of the statement referring to MA is true; the
part referring to ES is false. ES gives unequal weights to past values, but they are
not completely determined by the forecaster. They are partly chosen by the
forecaster in that a smoothing constant, α, is selected.
14.5 The correct answer is A. The three-point moving average forecast for period 9 is the
average of the values for periods 6, 7 and 8:
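A minimal sketch of a three-point moving-average forecast; the series here is hypothetical, since the actual values for periods 6 to 8 are not reproduced:

```python
# Forecast the next period as the mean of the last n observations.
def moving_average_forecast(series, n=3):
    return sum(series[-n:]) / n

# e.g. for a hypothetical series, the forecast for the next period is
# the mean of the last three values:
forecast = moving_average_forecast([1, 2, 3, 4, 5, 6])   # (4+5+6)/3 = 5.0
```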
14.11 The correct answer is D. The length of a cycle is the time it takes to repeat itself. In
this case the time is 20 quarters (i.e. five years).
14.12 The correct answer is A. A forecast at time t is given by:
(b) The forecast for future months is 2056 – see Table A4.19.
(c) In both cases it is assumed that the series are stationary. In other words, there is
no trend in the data and they have constant variance through time.
The forecasts for quarterly demand in 2018 are calculated in Table A4.20. The total
forecast is therefore 215.7 + 223.9 + 232.1 + 240.3 = 912.0.
α = 0.2; β = 0.3
Period   Actual   Smoothed series   Smoothed trend
The forecast four periods ahead is 207.5 + (4 × 8.2) = 240.3.
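The smoothed series and trend columns follow Holt's exponential smoothing; a minimal sketch, assuming the standard update equations with α = 0.2 and β = 0.3, and a simple initialisation (first value and first difference) that the text does not confirm:

```python
# Holt's exponential smoothing: a level and a trend, each updated with
# its own smoothing constant.
def holt(series, alpha=0.2, beta=0.3):
    level = series[0]
    trend = series[1] - series[0]   # assumed initialisation
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level, trend

def holt_forecast(level, trend, h):
    """Forecast h periods ahead, e.g. 207.5 + 4 * 8.2 = 240.3."""
    return level + h * trend
```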
where T = trend or fitted value for time period t, and t = 1, 2, 3, … for suc-
ceeding time periods.
Exhibit 3 shows the historical data in column 1, the time index in column 2
and the trend in column 3.
Exhibit 3
                (1)    (2)    (3)     (4)      (5)     (6)
Year   Month   Sales   Time   Trend   Mov. av.  Cycle   Season
2006 1 8.12 1 8.89
2006 2 7.76 2 8.94
2006 3 7.97 3 9.00
2006 4 7.88 4 9.05
2006 5 8.45 5 9.11
2006 6 8.68 6 9.16 9.80 1.07 0.89
2006 7 6.77 7 9.22 9.74 1.06 0.70
2006 8 6.60 8 9.27 9.70 1.05 0.68
2006 9 8.39 9 9.33 9.76 1.05 0.86
2006 10 11.88 10 9.38 9.87 1.05 1.20
2006 11 15.58 11 9.44 10.05 1.06 1.55
2006 12 19.50 12 9.49 10.09 1.06 1.93
2007 1 7.43 13 9.55 10.25 1.07 0.72
2007 2 7.26 14 9.60 10.07 1.05 0.72
2007 3 8.67 15 9.66 10.13 1.05 0.86
2007 4 9.26 16 9.71 10.08 1.04 0.92
2007 5 10.55 17 9.77 10.05 1.03 1.05
2007 6 9.17 18 9.82 9.93 1.01 0.92
2007 7 8.66 19 9.88 9.80 0.99 0.88
2007 8 4.45 20 9.93 9.71 0.98 0.46
2007 9 9.10 21 9.99 9.68 0.97 0.94
2007 10 11.32 22 10.04 9.65 0.96 1.17
2007 11 15.23 23 10.10 9.53 0.94 1.60
2007 12 18.02 24 10.15 9.59 0.94 1.88
2008 1 5.87 25 10.21 9.39 0.92 0.62
2008 2 6.19 26 10.26 9.35 0.91 0.66
2008 3 8.34 27 10.32 9.20 0.89 0.91
2008 4 8.91 28 10.37 9.35 0.90 0.95
2008 5 9.05 29 10.43 9.41 0.90 0.96
2008 6 9.98 30 10.48 9.82 0.94 1.02
2008 7 6.26 31 10.54 9.98 0.95 0.63
2008 8 3.98 32 10.59 10.12 0.96 0.39
2008 9 7.24 33 10.65 10.12 0.95 0.72
(Chart: the cycle component, ranging from about 0.88 to 1.16, plotted against time, 0 to 120.)
Since the seasonality is calculated several times for each month (e.g. for Janu-
ary, an estimate of seasonality is made for 2006, 2007, 2008, …), the estimates
are averaged to give the seasonal factors in the final column of Exhibit 5.
Exhibit 5
       2006  2007  2008  2009  2010  2011  2012  2013  2014  Average
Jan. 0.72 0.62 0.75 0.79 0.69 0.51 0.71 0.76 0.69
Feb. 0.72 0.66 0.74 0.79 0.76 0.56 0.57 0.63 0.68
Mar. 0.86 0.91 0.78 0.69 0.86 0.65 0.76 0.80 0.79
Apr. 0.92 0.95 0.87 0.79 0.80 0.73 0.82 0.84
May 1.05 0.96 0.87 0.84 0.91 0.58 0.88 0.87
June 0.89 0.92 1.02 0.91 0.90 0.82 0.80 0.86 0.89
July 0.70 0.88 0.63 0.81 0.75 0.63 0.82 0.82 0.75
Aug. 0.68 0.46 0.39 0.36 0.34 0.33 0.34 0.29 0.40
Sep. 0.86 0.94 0.72 0.95 0.92 0.96 1.00 1.04 0.92
Oct. 1.20 1.17 1.30 1.11 1.19 1.28 1.21 1.22 1.21
Nov 1.55 1.60 1.55 1.82 1.75 2.08 1.80 1.73 1.73
.
Dec. 1.93 1.88 2.23 2.07 2.18 2.71 2.36 2.21 2.20
Total 11.98
Average 1.00
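The averaging that produces the final column of Exhibit 5 can be sketched as follows, using the January estimates from the Jan. row:

```python
# Average a month's several yearly seasonal estimates to get its seasonal factor.
# The January estimates below are the values in the Jan. row of Exhibit 5.
jan_estimates = [0.72, 0.62, 0.75, 0.79, 0.69, 0.51, 0.71, 0.76]
jan_seasonal = sum(jan_estimates) / len(jan_estimates)
print(round(jan_seasonal, 2))  # 0.69, matching the Jan. average in Exhibit 5
```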
The cycle for January 2015 has to be estimated from column 5 of Exhibit 3.
Since the pattern is not a convincing one and it does seem to peter out, it is
assumed that there is no definite cycle. The cyclical factor for January 2015
and the other months is taken to be 1.0.
Seasonality for January 2015, taken from Exhibit 5, is 0.69.
The forecast for January 2015 is calculated by multiplying the three components together: Forecast = Trend × Cycle × Seasonality.
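As a sketch in Python: the cycle of 1.0 and seasonality of 0.69 are from the text, while the trend value of 14.8 is an assumption, obtained by extrapolating column 3 of Exhibit 3 forward to January 2015:

```python
# Multiplicative decomposition forecast: trend x cycle x seasonality.
# trend=14.8 is a hypothetical extrapolation of column 3 of Exhibit 3;
# cycle=1.0 and season=0.69 are the values given in the text.
def decomposition_forecast(trend, cycle, season):
    return trend * cycle * season

print(round(decomposition_forecast(14.8, 1.0, 0.69), 1))
```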
(b) There are a number of assumptions and limitations in the use of these forecasts.
These reservations do not mean of course that the forecasts cannot be used, but
they do mean that they should only be used in full awareness of the problems
involved. The reservations are:
(i) The decomposition method does not allow the accuracy of the forecasts to
be measured.
(ii) Other forecasting methods could be used in such a situation, for example, the
Holt–Winters Method. Keith Scott should try other methods and compare
their accuracy.
(iii) Keith Scott should ensure he discusses the forecasts thoroughly with man-
agement before formalising the method by incorporating the forecasts in pro
formas and the like.
Module 15
Review Questions
15.1 The correct answer is False. Both manager and expert should view forecasting as a
system. They differ in the focus of their skills. The expert knows more about
techniques; the manager knows more about the wider issues.
15.2 The correct answer is A. The people who are to use the system should have primary
responsibility for its development. They will then have confidence in it and see it as
‘their’ system.
15.3 The correct answer is D. Analysing the decision-making process may reveal
fundamental flaws in the system or organisational structure which must be ad-
dressed before any forecasts can hope to be effective.
15.4 The correct answer is False. The conceptual modelling stage includes consideration
of possible causal variables but has wider objectives. The stage should be concerned
with all influences on the forecast variable. Time patterns and qualitative variables
also come into the reckoning.
15.5 The correct answer is D. The smoothed value calculated in period 5 from the actual
values for periods 3–5 is 16.0. This is the forecast for the next period ahead, period
6.
15.6 The correct answer is A. The one-step-ahead forecast error for period 6 is the
difference between the actual value and the one-step-ahead forecast for that period.
15.7 The correct answer is C. The MAD is the average of the errors, ignoring the sign.
15.8 The correct answer is B. The MSE is the average of the squared errors.
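The MAD and MSE calculations in 15.7 and 15.8 can be sketched as follows; the error values here are hypothetical, since the questions' data are not reproduced:

```python
# MAD: average of the absolute errors; MSE: average of the squared errors.
# The error list is hypothetical, standing in for the questions' own data.
def mad(errors):
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    return sum(e * e for e in errors) / len(errors)

errors = [2.0, -1.0, 3.0, -2.0]
print(mad(errors))  # 2.0
print(mse(errors))  # 4.5
```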
15.9 The correct answer is E. MAD and MSE are different measures of scatter. In
comparing forecasting methods they may occasionally give different answers,
suggesting different methods as being superior.
MAD measures the average error. The method for which it is lower is therefore
more accurate on average. The MSE squares the errors. This can penalise heavily a
method that leaves large residuals. Forecasting method 2 may be better on average
than method 1 (i.e. have a lower MAD), but contained within the MAD for method
2 there may be some large errors that cause the MSE of method 2 to be higher than
that of method 1.
15.10 The correct answer is C. Other measures of closeness of fit (e.g. the correlation
coefficient) are based on the same data as were used to calculate the model parame-
ters. This method keeps the two sets of data apart. A and B are true but not the
reasons why the test is described as an independent one.
15.11 The correct answer is A, B, C. A, B and C summarise the reasons why step 7,
incorporating judgements, is an important part of the system.
15.12 The correct answer is False. A consensus on what the problems are can be just as
difficult to obtain as a consensus on the solutions.
(b) Using exponential smoothing (ES) with a smoothing constant of 0.1, the forecast is as in Table A4.22.
(d) Exponential smoothing has the lower MSE and therefore performs better over
the historical time series. The forecast for September 2017 is the exponential
smoothing forecast. The most recent smoothed value is the forecast for the next
period ahead. Thus, the forecast for September 2017 = 2056.
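The smoothing recursion used in part (b) can be sketched as follows; the smoothing constant of 0.1 is the one used above, but the series is hypothetical:

```python
# Simple exponential smoothing: new smoothed value =
# alpha * actual + (1 - alpha) * previous smoothed value.
# The final smoothed value is the one-period-ahead forecast.
def exp_smooth(series, alpha=0.1):
    smoothed = series[0]          # initialise with the first actual value
    for x in series[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

series = [2100.0, 1980.0, 2150.0, 2040.0]  # hypothetical data
print(round(exp_smooth(series), 1))        # forecast for the next period
```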
(e) The reservations about the forecast are:
(i) Exponential smoothing assumes the series is stationary. This has not been
checked (there are insufficient data to do so).
(ii) The possibility of a seasonal effect has been ignored (it would be impossible
to measure seasonality from less than one year’s data).
(iii) There is much volatility in the series, as seen by considering the data and
the MSE. The exponential smoothing forecast, although better than the
moving average, is not particularly good. It may be that different types of
forecasting methods should be used. Perhaps a causal model should be
tried.
(a) Internal
(i) reputation of play
(ii) reputation of actors
(iii) presence of star names
(iv) ticket prices
(v) advertising/promotional expenditures
(vi) demand for previous, similar productions
(b) External
(i) personal disposable income
(ii) weather
(iii) time of year
(iv) day of week
(v) competition
These are the factors that it is hoped a forecasting model could take into account.
Step 8: Implement
The manager of the forecasting system should establish and gain agreement on what
the implementation problems are and how they can be solved. The problems would
depend on the individuals involved, but it is likely that they would centre on the
replacement of purely judgement-based methods by a more scientific one. The
manager should also follow the first set of forecasts through the decision process.
Step 9: Monitor
Implementation refers to just before and after the first use of the forecasts. In
monitoring, the performance of the system is watched, but not in such detail, every
time it is used. The accuracy of the forecasted demand for each production should
be recorded and reasons for deviations explored. In the light of this information, the
usefulness of the system can be assessed and changes to it recommended. The track
records of those making judgemental adjustments to the forecasts should also be
monitored. In this situation, where judgement must play an important part, this
aspect of monitoring will take on particular significance.
The forecasting project may have to be postponed pending the resolution of these wider
issues.
A thorough analysis of a decision-making system involves systems theory. A lower-
level approach of listing the decisions directly and indirectly affected by the fore-
casts is described here. The list would be determined from an exhaustive study of
the organisational structure and the job descriptions associated with relevant parts
of it. Here is a brief description of the main decisions.
The list forms the input information for step 2, determining the forecasts required.
Step 3: Conceptualise
At this step consideration is given to the factors that influence the forecast variable.
No thought is given to data availability at this step. An ideal situation is envisaged.
An alcoholic beverage is not strictly essential to the maintenance of life. It is a
luxury product. Therefore, its consumption will be affected by economic circum-
stances. It would be strange if advertising and promotions did not result in changes
in demand. In addition, the variability of the production, advertising and promo-
tions of the different competitors must have an effect. In particular, the launch of a
new product changes the state of the market. It is not just competing beer products
that are important. Competition in the form of other products, such as lager and
possibly wines and spirits, must also have an influence.
The data record in Table 15.4 makes it clear that there is a seasonal effect. In other
words, the time of year and, perhaps, the weather are relevant factors. The occur-
rence of special events may have had a temporary influence. A change in excise duty
as a result of a national budget is an obvious example. More tentatively, national
success in sporting tournaments is rumoured to have an effect on the consumption
of alcoholic beverages.
There are also reasons for taking a purely time series approach to the forecast. First,
the seasonal effect will be handled easily by such methods. Second, the time horizon
for the forecasts is short: less than one year. Within such a period there is little time
for other variables to bring their influence to bear. To some extent, therefore, sales
volume could have a momentum of its own. Third, a time series approach will give a
forecast on the basis ‘if the future is like the past’. Such a forecast would be the
starting point for judging the effect of changing circumstances.
The data record is probably too short to determine any cyclical effect (these effects are often five or seven years in length).
To summarise, what is required is a technique that can handle trend and seasonality
but not cycles. The obvious choice is the Holt–Winters technique.
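A minimal sketch of the multiplicative Holt–Winters recursion follows. This is a standard formulation, not code from the case; the smoothing constants and quarterly data are illustrative only:

```python
# Multiplicative Holt-Winters: smooth the level, the trend and one seasonal
# factor per position in the cycle, then forecast one period ahead.
def holt_winters(series, period, alpha, beta, gamma):
    level = sum(series[:period]) / period           # naive initialisation
    trend = 0.0
    seasonals = [x / level for x in series[:period]]
    for t in range(period, len(series)):
        x, s = series[t], seasonals[t % period]
        last_level = level
        level = alpha * (x / s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonals[t % period] = gamma * (x / level) + (1 - gamma) * s
    return (level + trend) * seasonals[len(series) % period]  # one step ahead

quarterly = [10.0, 20.0, 30.0, 40.0, 12.0, 24.0, 36.0, 48.0]  # made-up data
print(round(holt_winters(quarterly, 4, 0.2, 0.4, 0.5), 2))
```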
The causal modelling technique should be multiple regression analysis with two
independent variables. The first independent variable is the gross domestic product
(GDP) of the UK, as a measure of the economic climate. Other economic variables
can also be used in this role, but GDP is the most usual. The second independent
variable is the sum of advertising and promotional expenditures (ADV/PRO) of the
organisation. Scatter diagrams relating the dependent variable with each independ-
ent variable in turn can verify that it is reasonable to consider GDP and ADV/PRO
as independent variables.
Other potential independent variables will have to be ignored for reasons of the
non-availability of data.
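A sketch of how such a two-variable regression could be fitted, using numpy least squares; the GDP, ADV/PRO and sales figures are made up, constructed so the fitted coefficients are known in advance:

```python
import numpy as np

# Made-up data: sales are constructed as 0.5*GDP + 2*ADV/PRO, so the fitted
# coefficients should recover those values (intercept 0, slopes 0.5 and 2).
gdp     = np.array([100.0, 102.0, 104.0, 106.0, 108.0, 110.0])
adv_pro = np.array([5.0, 6.0, 5.5, 7.0, 6.5, 8.0])
sales   = 0.5 * gdp + 2.0 * adv_pro

# Design matrix: intercept column, then the two independent variables.
X = np.column_stack([np.ones_like(gdp), gdp, adv_pro])
coeffs, *_ = np.linalg.lstsq(X, sales, rcond=None)
intercept, b_gdp, b_adv = coeffs
print(intercept, b_gdp, b_adv)  # approx. 0.0, 0.5, 2.0
```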
The table shows the parameter sets in three groups. For the first group the smooth-
ing constant for the main series has been varied; for the second, that for the trend
has been varied by keeping the series constant at its ‘best’ level; finally, the constant
for seasonality has been varied by keeping the other two at their ‘best’ level. The
parameter set with the lowest MAD and MSE is (0.2, 0.4, 0.5). The Holt–Winters
model with these parameters would appear to be the best. Note that the procedure
for finding these parameter values is an approximate one. There is no guarantee that
the truly optimum set has been found. To ensure that this had been done would
have required an exhaustive comparison of all possible parameter sets.
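The search procedure described above can be sketched for the simpler case of a single smoothing constant; the principle with the three Holt–Winters constants is the same, and the data here are made up:

```python
# Grid search: try each candidate smoothing constant and keep the one with
# the lowest one-step-ahead mean square error. Data are hypothetical.
def one_step_mse(series, alpha):
    smoothed = series[0]
    sq_errors = []
    for x in series[1:]:
        sq_errors.append((x - smoothed) ** 2)  # forecast = previous smoothed value
        smoothed = alpha * x + (1 - alpha) * smoothed
    return sum(sq_errors) / len(sq_errors)

series = [20.0, 22.0, 21.0, 24.0, 23.0, 26.0, 25.0, 28.0]
grid = [k / 10 for k in range(1, 10)]          # 0.1, 0.2, ..., 0.9
best_alpha = min(grid, key=lambda a: one_step_mse(series, a))
print(best_alpha, one_step_mse(series, best_alpha))
```

As the text notes, a grid of this kind is approximate: there is no guarantee that the truly optimum set has been found.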
The third model, with GDP and ADV/PRO as independent variables, is slightly
better than the second, having a marginally higher R-bar-squared.
Finally, since these are regression models, they should be checked for the existence
of any of the usual reservations: lack of causality, spuriousness, etc. There may
indeed be a problem with causality. The second and third models are superior
because the ADV/PRO variable captures the seasonality, which was a problem in
the first. It is not clear whether it is the seasonality or the expenditure on advertising
and promotion that explains the changes in sales volumes. There will be no difficul-
ty if advertising and promotion expenditures continue to be determined with
seasonal variations as in the past, but if the method of allocation changes then both
models will be inadequate. A new model, dealing with advertising/promotion and
seasonality separately, will have to be tested.
Meanwhile, the model with two independent variables seems to be the best. The
results of this regression analysis are shown in more detail in Table A4.25.
Table A4.25 Output of the regression model linking sales to GDP and
ADV/PRO
Sales volume = –44.3 + 0.49 × GDP + 17.4 × ADV/PRO
R-bar-squared = 0.93
Residuals
Year  Quarter 1  Quarter 2  Quarter 3  Quarter 4
2009 0.3 4.2 0.4 0.6
2010 2.4 1.3 1.7 4.6
2011 2.6 4.3 0.5 1.4
2012 1.1 5.3 4.7 1.7
2013 2.9 1.8 0.7 2.5
2014 1.9 2.8 1.3 3.0
2015 2.6 1.2 0.7 0.3
2016 1.4 2.2 0 0.1
2017 1.4 5.0 3.5 0.8
The best time series model is the Holt–Winters with smoothing constants 0.2 (for
the series), 0.4 (for the trend) and 0.5 (for the seasonality); the best regression model
relates sales volume to GDP and total expenditure on advertising and promotion.
To choose between these two, an independent test of accuracy should be used. This
means that the latter part of the data (2017) is kept apart and the data up to then
(2009–16) used as the basis for forecasting 2017. The better model is the one that
provides forecasts for 2017 that are closer to the actual sales volumes. There are two
reasons for comparing the models in this way.
First, the test is independent in the sense that the data being forecast (2017) are not
used in establishing the forecasting model. Contrast this with the use of R-bar-
squared. All of the data, 2009–17, are used to calculate the coefficients of the model;
the residuals are then calculated and R-bar-squared measures how close this model is
to the same 2009–17 data. This is not an independent measure of accuracy.
Second, the accuracy of smoothing techniques is usually measured through the
mean square error or mean absolute deviation; the accuracy of regression models is
measured by R-bar-squared. These two types of measures are not directly comparable.
On the other hand, the independent test of accuracy does provide a directly
comparable measure: closeness to the 2017 data.
The details of the test are as follows. The 2009–16 data are used for each of the two
models as the basis of a forecast for each of the four quarters of 2017. The close-
ness of the two sets of forecasts to the actual 2017 data is measured using the mean
square error and the mean absolute deviation. The model for which both these
measures are smaller is chosen as the better to use in practice. Should the two
measures have contradicted one another, then this would have meant that the model
with the lower MSE tended to be closer to extreme values whereas the model with
lower MAD tended to be closer on average to all values.
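The holdout comparison can be sketched as follows; the 2017 actuals and the two sets of forecasts below are made-up numbers standing in for Table A4.26:

```python
# Compare two sets of holdout-period forecasts against the actual values
# using MAD and MSE. All numbers here are hypothetical stand-ins.
def mad(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

actual_2017     = [52.0, 48.0, 50.0, 58.0]  # hypothetical quarterly actuals
holt_winters_fc = [51.0, 49.0, 51.0, 57.0]  # hypothetical HW forecasts
regression_fc   = [54.0, 45.0, 47.0, 61.0]  # hypothetical regression forecasts

for name, fc in [("Holt-Winters", holt_winters_fc), ("regression", regression_fc)]:
    print(name, mad(actual_2017, fc), mse(actual_2017, fc))
# The model with the lower MAD and MSE would be chosen.
```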
Table A4.26 shows the results of this test. The Holt–Winters time series is clearly
superior to the regression model. Both measures, MAD and MSE, demonstrate that
it gives the better forecasts for the already known 2017 data. The Holt–Winters
technique, with smoothing constants 0.2, 0.4 and 0.5, should be chosen to make
forecasts. The whole of the data series, including of course 2017, should be used in
doing this. Forecasts for 2018 are shown in Table A4.27.
To obtain a consensus from these data, a modified version of the Delphi method
might be used. All the experts represented in the table should be approached,
presented with the views of the others and asked if they wish to adjust their opin-
ions. As a result, some of the more extreme views might be altered.
The second stage, the adjustment of the statistical forecasts, is made by people
within the organisation. They should be accountable for any changes made; it is not
of course possible to make the external experts accountable for their views within
the context of an organisation’s forecasts.
An early user-involvement approach is likely to minimise the effect, since the participants will feel
more like a team than they would otherwise.
Likewise, a lack of belief in a forecasting system will probably have effects that go
far beyond any one particular project, but early user involvement will mitigate the
effects.
Conclusions
It should be emphasised that it is probably only for short-term forecasts that a time
series method will seem to be the best. For medium-term forecasts beyond a year
ahead, a causal model is likely to be better. Even for a short-term forecast, however,
uncertainty and volatility in the UK economic environment will eventually cause
problems, and adjustments will have to be made to the Holt–Winters model. For
important medium-term forecasts on which the expenditure of a lot of money is
justified, it may be worthwhile to use all three approaches to forecasting: causal,
time series and qualitative. If all give similar output, there is mutual confirmation of
the correct forecast; if they give different output, then the process of reconciling
them will be a valuable way for the managers involved to gain a better understand-
ing of the future.
This case solution has covered the important aspects of the case, but not all the
aspects. Among the omissions, for example, are statistical tests of randomness.
Furthermore, techniques other than Holt–Winters and some limited causal model-
ling have not been described, but they should have been considered. The emphasis
has, however, been on the role of a manager, not a statistician. The items included
are, in general, the things a manager would need to be aware of in order to be able
to engage in sensible discussions with forecasting experts.