
THE ROYAL STATISTICAL SOCIETY

2001 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

The Society provides these solutions to assist candidates preparing for the
examinations in future years and for the information of any other persons using the
examinations.

The solutions should NOT be seen as "model answers". Rather, they have been
written out in considerable detail and are intended as learning aids.

Users of the solutions should always be aware that in many cases there are valid
alternative methods. Also, in the many cases where discussion is called for, there
may be other valid points that could be made.

While every care has been taken with the preparation of these solutions, the Society
will not be responsible for any errors or omissions.

The Society will not enter into any correspondence in respect of these solutions.

RSS 2001

Ordinary Certificate, Paper I, 2001. Question 1

(i) A longitudinal study is one carried out over a period of time. A cohort is the
sample of people chosen initially to take part in the study, who are then followed over
the time period.

(ii) The sampling unit is an individual young person of school leaving age. Hence
the sampling frame is all schools with pupils in the required year.

The sample design is to choose a random sample of pupils of the required age using
the school lists of pupils.

[Note. The actual survey excludes special schools and those with less than 15
students.]

(iii) Age 16: response 77% of the original
      Age 17: 76% × 77% = 59% of the original
      Age 18: 76% × 76% × 77% = 44% of the original
      Age 23: 62% × 76% × 76% × 77% = 28% of the original

The percentages decrease quite rapidly. Young people are likely to be quite mobile,
so follow-up is likely to be difficult. The first response is only 77%, and this could be
because a self-completion questionnaire was used; this would also be a factor in later
years.

(iv) Government (the Department of Education actually sponsors the survey);
educationalists (colleges, universities, training organisations); employers (supply of
suitable skills and training).
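
The cumulative response figures in Question 1 (iii) can be checked with a short script. This is only a sketch: the wave-on-wave rates (77%, 76%, 76%, 62%) are taken from the solution, while the names `wave_rates` and `cumulative_response` are illustrative and not part of the original.

```python
# Wave-on-wave response rates quoted in the solution, as fractions.
# Each rate applies to those who responded at the previous wave.
wave_rates = [("16", 0.77), ("17", 0.76), ("18", 0.76), ("23", 0.62)]


def cumulative_response(rates):
    """Return (age, percentage of the original cohort still responding)."""
    results = []
    remaining = 1.0
    for age, rate in rates:
        remaining *= rate  # multiply through successive waves
        results.append((age, round(remaining * 100)))
    return results


cumulative = cumulative_response(wave_rates)
for age, pct in cumulative:
    print(f"Age {age}: {pct}% of the original cohort")
```

Multiplying the rates wave by wave reproduces the 77%, 59%, 44% and 28% quoted in the solution.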

Ordinary Certificate, Paper I, 2001. Question 2

ENTRY FORM FOR OUR PRIZE DRAW

Please complete the form in BLOCK CAPITALS or print your answers clearly

1. TITLE Mr / Mrs / Miss / Other___________(please specify)
Cross out those that do not apply

2. INITIALS___________________SURNAME___________________________

3. ADDRESS_______________________________________________________

_______________________________________ POSTCODE_________________

4. TELEPHONE NUMBER DAYTIME______________________________

EVENING______________________________

5. EMAIL ADDRESS (IF ANY)________________________________________

6. PLEASE RING YOUR AGE GROUP

18-25   26-35   36-45   46-55   56-65   OVER 65

7. DURING A TYPICAL WEEK, ON WHICH DAYS DO YOU BUY "THE
NATIONAL DAILY"?

MON TUE WED THUR FRI SAT (please ring)

8. DURING A TYPICAL WEEK, PLEASE SAY WHICH OTHER NATIONAL
DAILY PAPERS YOU BUY (please write the names)

Monday __________________________________________________________
Tuesday __________________________________________________________
Wednesday __________________________________________________________
Thursday __________________________________________________________
Friday __________________________________________________________
Saturday __________________________________________________________

9. WHICH NATIONAL SUNDAY NEWSPAPERS ARE BOUGHT ONCE A
MONTH OR MORE OFTEN ? Please list them (OR write NONE)

____________________________ ____________________________
____________________________ ____________________________

[Note. 8 and 9 could also be asked by giving a full list of all those available and
asking for boxes to be ticked. This could take a lot more space]

Ordinary Certificate, Paper I, 2001. Question 3

(i) Simple random sampling, advantages:

- electoral roll can be used as a sample frame
- there is no personal bias in selecting units
- sampling variation can be estimated mathematically.

Simple random sampling, disadvantages:

- selection process tedious when the population is large
- some wards may be represented much better than others
- there may be non-response of selected units.

Stratification by ward, advantages:

- electoral roll can be used as sampling frame since it distinguishes between
wards
- all wards represented, giving information about each ward as well as total
population
- sampling variation for each ward can be estimated mathematically
- less tedious to select because each ward contains a manageable number of
units.

Stratification by ward, disadvantages:

- non-response may occur
- preliminary calculation of sample sizes in each ward necessary, using
guesses of relative variability
- calculation of overall sampling error less straightforward.

Quota sampling by ward, advantages:

- no difficulty over non-response
- only limited area to be covered, therefore quick
- different wards all represented satisfactorily.

Quota sampling by ward, disadvantages:

- no estimate of sampling variation can be made
- results may be biased through choice of individuals to be approached, and
through willingness or not to reply
- appearance of interviewer may cause some people to respond, others not.

[Two comments required for each.]
(ii) The roll will not be an up-to-date list of residents, due to deaths, removals into
or away from the area, including from one ward to another and from town to country or
vice versa. Also any building, clearance or renovation schemes may have affected the
structure of wards, the numbers in them and their economic characteristics.

Ordinary Certificate, Paper I, 2001. Question 4

(i) The number of respondents, because the larger this is the smaller will be the
standard error of the estimated percentage. This must be taken note of when assessing
the meaning of the result.

(ii)

(1) The poll is only of listeners to the radio station; so it will not be representative
of the whole area it serves. The audience may be biased to particular age-groups,
economic characteristics, work and leisure habits, and to those who like the sort of
entertainment the station gives.

(2) A telephone is required to answer the poll. There may be difficulties in
getting through, or accessing a telephone.

(3) The timing will exclude several groups of people, perhaps even those out
walking their dogs.

(4) The nature of the question may make dog owners more likely to respond.

(5) Giving a figure half-way through the hour will encourage more no-voters to
respond as their view is in a minority (or vice versa if the announced percentage had
been below 50).

(6) People can vote more than once if no identification is asked for and checked.
This is a source of considerable bias.

[Three problems required.]

(iii) "Are you a dog-owner?", with the figures for Yes and No kept separate. This
allows comparison of the views of the two groups. [It is possible that age or sex may
be relevant also, but only one question is allowed.]

Ordinary Certificate, Paper I, 2001. Question 5

The sampling fraction is the proportion of the total population (or of a particular
subgroup) that is used in the sample survey. If there are N in the group, of whom n
are in the sample, the sampling fraction is n/N.

(i) A uniform sampling fraction requires 400/10000 = 0.04 to be selected from
each stratum, i.e.

40 from A, 160 from B, 200 from C.

[The population size is 1000 + 4000 + 5000 = 10000.]

(ii) The sample fractions in A, B, C must be 10k, 5k, 2k where k is a constant that
will achieve the required total 400. The sample sizes then are 1000 × 10k, 4000 × 5k
and 5000 × 2k which add to 40000k; this has to be 400, so k = 0.01.

Hence 100 in A, 200 in B, 100 in C.

(iii) The second method should reduce the standard error of the estimate.

Ordinary Certificate, Paper I, 2001. Question 6

She should sit in a position where she can see the whole shop clearly, and can also see
the cash desk/till to record the value of goods purchased. A pre-printed form should
be used for each customer observed, recording sex, age (in the form of a very broad
classification, young/middle-aged/old, since she cannot ask the customers), time of
entry and exit. A floor-plan of the shop, printed on the form, would allow direction
and pattern of movement around the shop to be recorded. The number of times an
item is looked at, or picked up for examination, can be recorded on the plan.

Only one person can be observed at a time, so as soon as she is in her observation
position she should observe the first customer coming in, and when that customer has
finally left she can take the next one to enter. This should ensure reliable records, and
span the whole working day (with breaks taken at convenient times, e.g. for refreshment).
Forms will be numbered and dated, and used in order, so comparisons between days,
and of times in the same day, can be made.

Care must be taken not to be conspicuous, or to disturb the normal running of the
shop; staff need to be fully aware of her task and to prevent customers asking her for
assistance (so far as possible). Difficulty could arise if a shop cannot be seen fully
and easily from one place; such a shop probably should not be used for the study. Any
groups of shoppers, perhaps looking for a single item, may be hard to record properly.
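
The two allocations in Question 5 of the 2001 paper (uniform sampling fraction, and sampling fractions in the ratio 10 : 5 : 2) can be reproduced with a minimal sketch. The stratum sizes, the ratios and the total of 400 come from the solution; the function names are illustrative only.

```python
# Stratum sizes and required total sample size, as given in the solution.
strata = {"A": 1000, "B": 4000, "C": 5000}
total_sample = 400


def uniform_allocation(sizes, n):
    """Same sampling fraction n/N in every stratum."""
    fraction = n / sum(sizes.values())
    return {name: round(size * fraction) for name, size in sizes.items()}


def variable_allocation(sizes, ratios, n):
    """Sampling fractions ratio*k, with k chosen so the sizes add to n."""
    k = n / sum(sizes[name] * ratios[name] for name in sizes)
    return {name: round(sizes[name] * ratios[name] * k) for name in sizes}


uniform = uniform_allocation(strata, total_sample)
variable = variable_allocation(strata, {"A": 10, "B": 5, "C": 2}, total_sample)
print(uniform)
print(variable)
```

Both allocations sum to 400, which is a quick check that the constant k = 0.01 has been applied correctly.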
Ordinary Certificate, Paper I, 2001. Question 7

(i) People may simply refuse; may be away from home; may be out at the time
of the call; may be unsuitable to be interviewed, for various reasons; or may be new
occupants, not the persons on the available list. Pre-selected names should not
normally be replaced by substitutes.

(ii) Non-respondents may well have different characteristics from those who do
respond, and bias will depend on the extent of these differences and the amount of
non-response.

(iii) It may be necessary to keep a sample up to the planned size, to provide
sufficient data for analysis and for adequate estimation of sample variance. In
stratified sampling, strata proportions need to be kept correct if response rates are
likely to vary between strata. But since substitutions are necessarily responders, the
difficulties in (ii) still remain.

(iv) Skilled professional interviewers can sometimes help with unsuitable
interviewees and potential refusals. Brief, clear and well-designed questionnaires
may overcome these difficulties also.

Repeat calling is used for those not at home, either choosing a different time of day or
day of the week, in the light of any knowledge about age, sex, occupation etc of
respondent, or if possible making a firm appointment.

Results could be weighted for characteristics such as age, sex, social class if it was
thought, or there was information, that these differ between respondents and non-
respondents.

Ordinary Certificate, Paper I, 2001. Question 8

A typical database might look like this:

FIELD NAME     FIELD TYPE     WIDTH
Title          Text            4
Surname        Text           24
Given_name     Text           24
Initials       Text            6
House_no       Numeric         4
Address_1      Text           36
Address_2      Text           36
Address_3      Text           36
Postcode       Text            8
Telephone      Numeric        16
Number         Numeric         2
Savings        Attribute       1
Current        Attribute       1
Loan           Attribute       1
Deposit        Attribute       1
Other          Attribute       1

[Attribute: Yes/No]
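
The field list for Question 8 can be expressed as a small schema with a width and type check. This is a hedged sketch, not a prescribed implementation: the field names, types and widths are copied from the solution, but the `validate` function, the use of "Y"/"N" for Attribute fields and the treatment of missing values are assumptions made for illustration.

```python
# (field name, field type, width) rows copied from the Question 8 table.
SCHEMA = [
    ("Title", "Text", 4), ("Surname", "Text", 24), ("Given_name", "Text", 24),
    ("Initials", "Text", 6), ("House_no", "Numeric", 4),
    ("Address_1", "Text", 36), ("Address_2", "Text", 36), ("Address_3", "Text", 36),
    ("Postcode", "Text", 8), ("Telephone", "Numeric", 16), ("Number", "Numeric", 2),
    ("Savings", "Attribute", 1), ("Current", "Attribute", 1), ("Loan", "Attribute", 1),
    ("Deposit", "Attribute", 1), ("Other", "Attribute", 1),
]


def validate(record):
    """Return a list of problems found in one record (a dict keyed by field name)."""
    problems = []
    for name, field_type, width in SCHEMA:
        value = record.get(name)
        if value is None:
            continue  # assumption: missing values are allowed in this sketch
        text = str(value)
        if len(text) > width:
            problems.append(f"{name}: longer than {width} characters")
        if field_type == "Numeric" and not text.isdigit():
            problems.append(f"{name}: not numeric")
        # Assumption: Yes/No attributes are stored as the single letters Y or N.
        if field_type == "Attribute" and text not in ("Y", "N"):
            problems.append(f"{name}: must be Y or N")
    return problems


good = {"Title": "Mrs", "Surname": "Smith", "House_no": 12, "Savings": "Y"}
bad = {"Title": "Professor", "House_no": "12a", "Loan": "maybe"}
print(validate(good))
print(validate(bad))
```

Keeping the schema as data rather than code makes it straightforward to adjust widths if real entries turn out to be longer than planned.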

THE ROYAL STATISTICAL SOCIETY

2002 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

RSS 2002

Ordinary Certificate, Paper I, 2002. Question 1

A survey usually aims to estimate a mean or a proportion, e.g. the mean expenditure
of a family per week or the proportion of the population holding a particular opinion.
If a sample from the whole population is used for this purpose, it must "represent" the
population so that the results from the sample can be applied to the wider population.
Some methods of selecting a sample do not properly represent a population, e.g. if
using a list of members that is not up to date. Some methods of obtaining information
will cause non-response: the refusal of people to answer badly constructed questions
or the failure to take part in enquiries when people have no interest in a topic. These,
and other, errors in carrying out a sample survey rarely affect all sections or groups of
a population in the same way or to the same extent, so the answers which are obtained
are incomplete but the effect of this on estimated means or proportions cannot be
measured. This leads to bias in the sense that the sample estimates, even from large
samples, may differ systematically from the true (but unknown) value of the required
mean or proportion in the whole population due to, for example, a non-response
group being systematically different from the rest. This bias cannot usually be
corrected by statistical methods. It is a structural error in the sampling technique
used.

In this survey, the response rate was so small that it gives negligible information on
the whole population. It very likely represents the views of minorities or special
interest groups who selected themselves by replying. No statistical selection took
place, and no reminder to reply seems to have gone out. The 'Newsletter' is most
unlikely to have been read in detail by more than a small proportion of the population,
and even those who tried to telephone their responses may not always have got
through. 52 out of 103456 residents, self-selected, cannot be taken as a good
representation of residents in the borough.
Ordinary Certificate, Paper I, 2002. Question 2

It is quite likely that the same errors would occur if a repeat "survey" was done in the
same way: responses would be very few and would represent only those with some
special interest in the questions asked, very far from the population as a whole.
Hence, in any repeat, the same sort of results might occur. Reliability in this sense is
therefore a misuse of the word; it does not imply validity. If a sampling method is
basically faulty, the same faults will influence any set of results.

Ordinary Certificate, Paper I, 2002. Question 3

These three methods all imply proper statistical selection, which is an improvement.
Nevertheless we also need to select from all the population of residents, and none of
the methods ensures this.

(i) A simple random sample, using the register of voters as its sampling frame,
will exclude those residents newly moved in, and those who failed to return their
registration forms for that year; and will include some who have moved out or died.
In the UK, these lists become quite inaccurate by the end of a year. This is a source
of bias, difficult to estimate. A postal survey, even with up to two reminders, rarely
gains more than 50% response unless some form of visit can be organised for non-
responders.
Advantages of simple random sampling, given an adequate frame, are that selection is
made from the population with no personal bias by the survey organiser; that
everyone on the list has the same probability of selection; that some valid statistical
analysis of results can be done, and can be generalised to the whole population.

(ii) A stratified sample requires the population to be split into groups or "strata",
which should ideally be homogeneous within a group, the main differences occurring
between groups. A simple random sample is taken within each of the strata. The
advantage is that information is obtained for each stratum, and similarities or
differences between them can be seen. A disadvantage is that the frame has to be split
into strata, and the likely boundaries between strata are not always clear. Sometimes
urban and rural may differ, sometimes type of housing is a useful division (and a
borough council will have this information), especially if it also represents social
class.
Personal interviewers will lead to a considerably better response rate, without the
need for reminders, except that some revisiting is needed to those not at home on the
first visit. It is more expensive, especially in less densely populated areas. Although
it gives better quality data, a larger total sample may be needed if there are several
strata, although in the present case that seems unlikely to be necessary.

(iii) The sampling frame is incomplete. Some people do not want their names and
addresses published in a directory, some have mobile phones or unlisted cable
services. This causes bias, since some groups of people are more likely than others to
be in one of these categories. Also, people do not always respond to telephone
interviews (unless some publicity beforehand has told the population that a telephone
survey is to be carried out, and encouraged them to cooperate).
It is a quick method, even allowing for non-response, and enough reserves can be
located to enable the desired size of sample to be achieved, whereas in other
methods this depends on the response rate, which can be hard to guess in advance.
However, the use of reserves instead of attempting to contact again those who are out
at the first call can bias the response in favour of some parts of the population (older,
different leisure interests, different hours of work).
This is a very cheap method and so is often used. Its response rate would be higher
than for a postal survey.

Ordinary Certificate, Paper I, 2002. Question 4

(i) N1 + N2 = 12000 + 80000 = 92000;  n1 + n2 = 500;  c1 = 18, c2 = 5.

n1 = 500 × 12000/92000 = 65.217; take n1 = 65.
n2 = 500 × 80000/92000 = 434.783; take n2 = 435.

Cost is n1c1 + n2c2 = 1170 + 2175 = 3345.

(ii) ni = k Ni si / √ci, where k is a proportionality constant to be found. Budget for
sampling costs is (3600 - 500) = 3100 = n1c1 + n2c2.

n1/k = 12000 × 11/√18 = 31112.7;  n2/k = 80000 × 8/√5 = 286216.7.

Hence 3100 = (31112.7k × 18) + (286216.7k × 5) = 1991112.1k.

Therefore k = 3100/1991112.1 and so n1 = 31112.7k = 48.44, say 48.

Then n2 = 286216.7k = 445.62, say 446.

This gives as cost (18 × 48) + (5 × 446) = 3094.

(iii) The optimum method gives a total sample size of 494, allowing for costs.
Assuming that in proportional allocation 500 had also been required for fixed costs,
the stratum sample sizes would have had to be reduced in the ratio 3100/3345, giving
n1 = 60.44 and n2 = 402.94 (take as 60 and 403); n1c1 + n2c2 is then 3095, so one more
urban item could be taken to give finally n1 = 60, n2 = 404. Total sample size would be
464.
The optimum method has given a sample 30 items larger, with 12 fewer in the
smaller, more expensive stratum and 42 more in the larger, less expensive stratum.

The strata SDs from the pilot survey are not so very different, so the actual estimates
in the full survey will not be much more variable on the simple method (i). Since we
may be interested in stratum results as well as overall results, method (i) seems
suitable in this case.

Ordinary Certificate, Paper I, 2002. Question 5

A pilot survey allows data collection equipment (such as questionnaires) and methods
(such as use of interviewers) to be field-tested and improved, data processing to be
checked and if necessary improved, explanation of purpose of survey for respondents
to be refined. Sampling frames can be assessed for accuracy, completeness and
suitability for use as a basis for whatever sampling method is planned (random,
stratified, systematic, multi-stage). Training of enumerators (field workers) and
supervisors is carried out in the pilot stage. Decisions on the sampling unit where
necessary (e.g. individual or household) can be made, and whether to include or
exclude any special types of unit that may exist. A pilot survey may indicate that
other items of information need to be collected to make the final survey worthwhile,
capable of answering important questions. It will give realistic ideas of the time
needed for interviews and travel. Sources of variation, e.g. between towns/villages or
parts of towns, can be determined, and/or previous knowledge checked for relevance
to the present survey.

Ordinary Certificate, Paper I, 2002. Question 6

(i) (a) As many as possible of the questions should be capable of answers
Yes/No, or a limited set of alternatives, with boxes to be ticked. All questions
should be 'closed', not open-ended allowing imprecise verbal answers; those
that ask for numerical information should have the units clearly stated and a
format marked out for respondents to enter information in standard form, e.g.
for a date:
D D M M Y Y
or for a time:
Hours Minutes
If decimal answers are possible, the position of the point should be marked
clearly. Any questions that must be open-ended have to be copied into the
database with answers abbreviated according to instructions given to the
operator.
(b) A spreadsheet will use a row for each response, so these have to be
numbered 1 to N, assuming N (≤ 700) reply. A pattern could be constructed on
the computer screen to make sure the data are transferred into the correct
columns of the database and are entered in the right format; otherwise the
spreadsheet must be completed carefully row-by-row.

(c) The data may be (partially) validated by providing upper and lower
limits for some of the data, e.g. the column containing children's ages could be
programmed to reject entries less than 5 years or more than, say, 19 years.
Accuracy can be checked by a second person entering the data, and the
program rejecting items which do not match. Each column requires the type
of data to be specified, numerical or non-numerical.

(d) Usually a symbol such as * is entered where values are missing. When
using the data, results for each column (i.e. each data item) may be used as
they stand, or the rows with any item lost may be omitted completely. If two-
way tables, using data from two columns, are to be produced, a row need only
be omitted if one (or both) of these items is (are) lost, so minimising the
number of missing values.

The procedures required will depend on which program is being used, and what
computing equipment is available.

(ii) Given a good program, used properly, accuracy can be achieved quite quickly,
since all the calculations after the checking stage can be done in the computer
(column means etc, two-way tables), and if verbal answers can be categorised into a
few classes these can also be summarised quickly. A disadvantage could be that the
data are not scrutinised so closely as in an analysis by hand, so that some useful two-
way tables are not calculated because a possible relation has not been spotted.
Comments in words, from individuals or only a few people, may be lost in a purely
mechanical analysis. On the other hand, standard statistical packages will note "odd"
values (possible outliers) where these may be missed by hand. Computer analysis can
carry out all the studies and comparisons that seem useful; this would often be
impossible by hand.

Ordinary Certificate, Paper I, 2002. Question 7

(i) The UK Index groups items into broad categories, such as food, alcohol and
tobacco, clothing and footwear, transport and vehicles, fuel and light, leisure
activities. Household surveys provide estimates of the expenditure in each of these
categories from a large number of participating households each year (or two years)
and hence the weighting (i.e. the proportion of total expenditure) for each category is
found. This weighting becomes out of date fairly quickly, as fashions change or some
items suddenly increase in price so consumption is reduced. However, grouping into
categories reduces the effect of these changes, as for example seasonal price rises in
some food items will result in people buying less, and replacing them with
alternatives. Only consumer items are used in the RPI (e.g. savings and investments
are not).

(ii) Prices vary substantially in different parts of the country. Different groups of
the population will buy somewhat different ranges of products and will react in
different ways to price changes. Some people will use more expensive small shops
to avoid travelling; others will concentrate on supermarkets where prices are lower
and less variable. Data need to be collected from the whole range of outlets, and a
suitable form of "average" found from them.

(iii) Weights for the categories will, in practice, vary substantially from group to
group in the population. Children's clothes need more frequent replacement, food
consumption by the elderly differs in amount, and in types of food, from that by
families with growing children; some types of occupation demand more energy-
giving foods, as well as different types and strengths of clothing. Old people are more
likely to need supportive footwear, for example, whereas teenage sportsmen and
women want quite different special items. Only broad generalisations are possible
between different types of household. However it is certainly possible to estimate the
expenditure necessary for healthy living and eating as a basis for deciding what a
"minimum income" should be. A single "cost of living" index is really a fallacy.
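
Returning to Question 4 of the 2002 paper, the cost-optimal allocation in part (ii) can be checked numerically. The sketch below assumes the figures quoted in that solution (stratum sizes 12000 and 80000, pilot standard deviations 11 and 8, unit costs 18 and 5, total budget 3600 with 500 fixed costs); the labels "stratum 1" and "stratum 2" and the function name are illustrative.

```python
from math import sqrt

# Figures quoted in the Question 4 solution:
# stratum size N, pilot standard deviation s, cost per sampled unit c.
strata = {
    "stratum 1": {"N": 12000, "s": 11, "c": 18},
    "stratum 2": {"N": 80000, "s": 8, "c": 5},
}
budget = 3600 - 500  # total budget less fixed costs


def cost_optimal_allocation(strata, budget):
    """n_i = k * N_i * s_i / sqrt(c_i), with k fixed by sum(n_i * c_i) = budget."""
    weights = {name: v["N"] * v["s"] / sqrt(v["c"]) for name, v in strata.items()}
    k = budget / sum(weights[name] * strata[name]["c"] for name in strata)
    return {name: k * w for name, w in weights.items()}


raw = cost_optimal_allocation(strata, budget)
sizes = {name: round(n) for name, n in raw.items()}
cost = sum(sizes[name] * strata[name]["c"] for name in strata)
print(raw)
print(sizes, cost)
```

Rounding to whole units leaves a little of the budget unspent, which matches the cost of 3094 against the 3100 available in the solution.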

Ordinary Certificate, Paper I, 2002. Question 8

Quota sampling splits a population into a number of groups and samples a prescribed
number (quota) of people in each group. Often these numbers will be in the same
ratio as the totals in the population. Suppose that a college or university is split into
male/female, home/overseas and three areas of subject study A, B, C. The numbers of
each of the 12 groups in the population can be found from college records, and so a
quota of each group (A/male/home), ..., (C/female/overseas) can be specified: this
may, for example, be 10% of each total. Interviewers now go in search of the
appropriate numbers from each group, and ask the first suitable individuals they meet
the questions in the survey they are conducting. As soon as the required number of
(A/M/H) have been interviewed, no one else who is in that group will be asked the
survey questions; and so on for all 12 groups. There is no question of randomness in
the sampling, but if the survey is about opinions on some topic that is likely to affect
people within the same group in much the same way, the answers can often be quite
representative. However, there is no statistical theory that can be used to assess the
results. Interviewers can be told not to concentrate on all the same type of unit, e.g.
they can be warned not to go for the tallest males!

The population is the whole college; the frame is the college list; the method gives
answers quickly, without the need to set up a randomised scheme, and unless the
survey questions are very sensitive the answers are likely to be reasonably reliable.

Cluster sampling is useful where a number of similar large (primary) units, such as
villages in an agricultural region, exist and a sample of individual farms or holdings
(secondary units) is required. Time and cost can be saved by selecting some of the
clusters at random from the whole population of clusters (here, the villages).
The remaining villages are not visited at all, and so the sampling frame only needs to
exist for the chosen villages. Often it will have to be constructed as part of the
survey, so considerable effort is saved in this way. From each of the chosen villages,
a random sample is selected in the usual way, of farms or holdings to take part in the
survey. (If there are not many farms in clusters, they can all be taken; but usually a
sample of the same size would be taken for the survey.)
The sampling frame at the beginning needs to list all the villages. The population is
all the holdings in all the villages. Assuming that differences between villages are not
great, resources can be conserved by not having to visit all of them. This allows
sampling within villages to be sufficient for a good estimate of variance to be found.
A random method which required several villages to be visited for only a single unit,
or very few units, to be studied would be inefficient by comparison.

THE ROYAL STATISTICAL SOCIETY

2003 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

RSS 2003
Ordinary Certificate, Paper I, 2003. Question 1

Methods vary between countries. The UK conducts regular 10-yearly censuses, the
most recent being in 2001. Questionnaires are used, one for each household (not one
for each building); these are delivered by enumerators and (in 2001) returned by post
in pre-paid envelopes. (In previous censuses, they were collected personally by the
enumerators.) Those not returned by post in 2001 were followed up by the
enumerators.

Forms contained sections for household entries and separate (but identical) sections
for individual members. Because the form was to be filled in by the householder (or
another member, but not by an enumerator), questions needed to be as few as possible
and as clear as possible, and should have been tested in a pre-census pilot survey. The
UK 2001 census had five sections:-

(1) Residents: the name of each person usually living there.
(2) Visitors: name, together with the usual address.
(3) Household: type of accommodation, whether self-contained (e.g. no sharing
of kitchen, etc), number of rooms, ownership, central heating, vehicle
ownership.
(4) Relationships between individual members (husband, wife, partner, parent,
child).
(5) Individuals: sex, date of birth, marital status, various ethnic questions such as
country of birth, migration, previous address, ethnic group classification,
religion, qualifications, various questions on health, provision of care etc,
employment, working hours and method of travel.

Some different questions were added in Wales (e.g. language), Scotland and Northern
Ireland. There were 5 household questions and just over 20 individual ones in 2001.

Data remain confidential and individuals cannot be identified. However, data for
relatively small areas, towns, villages, minority groups, can be obtained. Hence,
nationally data will be useful to government for
(1) planning of requirements of health care, hospitals, schools and colleges;
demographic variation and pension provision, welfare,
(2) in the UK, devolving funds to regions or areas.

Also, (3) businesses and commercial interests, and social research organisations and
departments, use these data.

Changes should be minimal between censuses so that comparisons can be made; also,
comparisons between countries are very useful. (Note that some developing countries
are still developing census methods, and that some developed countries (e.g.
Netherlands) use sample surveys combined with administrative registers rather than
complete censuses.)

Ordinary Certificate, Paper I, 2003. Question 2

(i) A target population is that population for which information is required.
Results from a survey apply only to the study population which was sampled, e.g.
those who respond to the first request for information, or those geographically easy to
locate. If there are any real differences between target and study populations, results
may not apply to the target.

(ii) Either the target will be those who already use the canteen, regularly or
occasionally, or it will cover all actual and potential users. If the main aim of the
survey is to improve satisfaction among existing users, the first is appropriate, but if it
is also desired to improve user numbers the second target is appropriate. In that way,
information can be obtained on reasons for non-use, such as not supplying the type of
food required at lunch time (which would be the main time of interest to the
manager), or speed of service, supply of vegetarian meals, etc.

Ordinary Certificate, Paper I, 2003. Question 3

(i) Bias, in general, is a systematic tendency to overestimate, or to underestimate, a
parameter of the target population (e.g. a total, mean or proportion). That is, repeated use of
the same sampling method would not produce an estimate of the required parameter, but
would aim above it, or below it, consistently.
Selection bias is due to the method of selecting the sample (as opposed to using an
inappropriate formula for estimating parameters).
Response rate is the proportion of those selected and contacted in the sample who actually
provide a reply. Low response rates can lead to (self-)selection bias when only a particular
type of member of the population is willing to respond (e.g. those with non-typical views on a
survey topic).

(ii)

A: If the target population is only present users, this method could be adequate provided
enough time was allowed for occasional (as well as regular) users to be adequately
represented among the respondents; but even here the most regular users might be
over-represented, so that too high a proportion of "well satisfied" customers was recorded.
For the other possible target, the non-users are not represented at all, so it would be quite
unsatisfactory.
Response rates could also be lower among those in a hurry, who may not be so satisfied with
speed of service but did not take the time to say so.

B: This is an expensive method, but it avoids selection bias. Response rates could be
relatively high, although those who were "too busy", not interested in the canteen or not easily
available for interview could produce bias in the results by not answering. Interviewers, if not
regularly used to carrying out surveys, would need careful training to avoid bias in the way
they asked the questions and recorded answers.

C: Using the list of work email addresses, this is a good method for either target. As in
B, present usage of the canteen could be one of the questions asked. Effectively this is a
complete census, but of course non-response is possible. This could be minimised by sending
reminders to those known not to have replied. In fact, people commonly reply to emails
quickly (if at all!) and so if a clear and fairly short questionnaire is emailed a large number of
responses would be hoped for.

D: A display stand, with questionnaire forms to be taken, could raise general interest in
the survey but still may not obtain an unbiased selection of replies, which could be limited to
those who were already interested in the canteen and felt they could spare time to reply.
Unless there are sufficient questions to check identity of respondents, a few members of the
public visiting the offices might answer. This does not seem to be a good method.

E: If strata (as in question 4) cover all the departments, this is potentially a good way of
selecting a "representative" sample, but there remains the possibility of variable response
rates. Any group which spends almost all of its time in the office is likely to respond better
than any whose work is partly external. Prepaid envelopes may possibly reduce non-response
among those not always working in the office. But any differences between department
groups should be shown up by this method. It can be argued that better stratification would
be between (1) regular users, (2) occasional users, (3) non-users. But lists of these would be
very hard to construct. Provided strata do have underlying differences in important responses,
the method is bound to give better (more precise) estimates than simple random samples,
provided also that response rates are similar in all strata.

Ordinary Certificate, Paper I, 2003. Question 4

(i) Total number of staff = 3000. 450/3000 = 15%. Use 15% of the number of
staff in each group, i.e. Education 165, Social Services 135, Chief Executive's 48,
Environment and Resources 102.

(ii) This achieves equal proportions of employees in each of the departments. It
avoids some departments being over-represented in the sample originally selected.
(Also it is easy to adapt the stratification method to study groups which may give
more variable results, or higher means, or are more expensive to sample (e.g. through
low initial response rate).)
Ordinary Certificate, Paper I, 2003. Question 5

(i) C is cheapest; A costs only the paper; B is most costly in terms of staff time;
D will have some cost to make a useful and eye-catching display; E will have postal
costs.

C is quick, even allowing for reminders; A and D depend on how long it takes to get
enough responses; E will take some weeks, allowing for reminders; B may be slow if
only one or two trained interviewers are available.

B and E are the only ones where some staff will not have the opportunity to answer
(though both are statistically satisfactory methods). For all the others, the choice whether
to respond is theirs, except for the present non-users by method A.

(ii) Multi-stage sampling uses natural clusters (groupings) in the population,
samples a randomly chosen set of these, and then samples members within each
chosen group. It is economical where clusters are geographical areas and members
within them have to be interviewed. In the present case, email, post or telephone
contact would make any multi-stage scheme unnecessary; with only four
departments, on the same site, which do not necessarily form "natural clusters",
saving of time under scheme B would be minimal, so multi-stage sampling does not
seem worth considering for this survey.

(iii) An email survey would be satisfactory, and could be backed up by a personal
interview survey of any staff not using email, such as maintenance and security staff
who do not have individual office space, also cleaners, car park staff, etc. It might
also be possible to interview some of those who did not respond to email. This should
collect maximum information in minimum time, at no very great cost, without
"excluding" anyone.

Ordinary Certificate, Paper I, 2003. Question 6

(i) A longitudinal survey follows a group ("cohort") of the target population over
a period of time; a cross-sectional study takes a "snapshot" at a particular time.

For the present topic, a longitudinal survey would follow the opinions of the same
group of people through, say, a year, to note any changes in attitude to food, variety,
service etc of regular users. Another example would be to take a newly-arrived group
of users and follow their changes in attitude and use.

A cross-sectional study is a single undertaking on the lines discussed in previous
questions.

(ii) A panel would be a group of people, selected at random or otherwise, who
agreed to report on their use of, and attitude to, the canteen over a period of time,
providing an indication of responses to any changes and initiatives the manager
introduced and/or other relevant changes in competition, canteen layout, organisation,
etc.

There is always a danger that such a panel, willing to do this task for some time, may
be atypical of the whole target population, so causing bias in the results of the study.
Also, familiarity with the running of the canteen may in time make them less like the
"average" user, again leading to possible bias.

Ordinary Certificate, Paper I, 2003. Question 7

(i) A possible closed question is:

How well would you say your dietary needs are catered for?

[ ] Very well   [ ] Quite well   [ ] Not very well   [ ] Not at all

[ ] Don't know (e.g. because never go to canteen)

This aims to obtain a qualitative answer to one point of particular concern, offering
alternatives to provide any respondent with a suitable box to tick.

An open question could be:

What, if anything, could be done by the canteen that would encourage those
with your dietary needs to use it more often?

[Provide a large box or a few dotted lines for the answer]

This covers the same area of enquiry, but allows the opportunity for respondents to
offer suggestions which may be useful.

(ii) Closed questions provide a number of possible alternative answers, either
mutually exclusive such as boxes to tick for amount of income or including a general
"don't know" box as above. Open questions do not offer specific alternatives, but
merely ask for opinions.

Closed questions are easy to analyse, although respondents may be constrained in
thinking about an answer and may wish to answer in a way not provided for. "Other"
answers may not always raise points that would have been ticked if a box had been
offered for them. Open questions do not limit answers to those expected by the
questionnaire designers, and may give very useful feedback. But the more open the
answers, the more difficult they often are to analyse. The survey planners cannot
know how widespread particular concerns are (e.g. canteen layout, lighting, comfort)
unless answers on those specific points have been asked for.

(iii) The wording of a question may point to possible answers, or discourage
people from giving others, which will bias estimates of the opinion in the population
on the issue being studied. For example, "How far do you feel that the canteen
service should be controlled by money-making motives?" is pointing people to answer
against this emotive idea. A better question would be "Should the present system of
subsidising the canteen be continued, or would it be better to expect those using it to
pay prices sufficient for the canteen to break even?"

Another example is "Do you think that reasonable accommodation should be devoted
by the canteen to meeting special dietary needs?", which begs the question of what is
"reasonable", a vague word which should not be used. A better alternative is "How
much of the canteen area should be dedicated to special dietary needs?"

Ordinary Certificate, Paper I, 2003. Question 8

(i) Disadvantages of computers

open-ended questions difficult to deal with, requiring some knowledge and
judgement to classify;
need to have a specialist computer package and an operator who can use it.

Disadvantages of hand analysis

still not easy to deal with open-ended questions;
further analyses after data entry are very time-consuming, and need checking.

Advantages of computers

after data entry, a wide range of tables, tests and analyses can be carried out;
new (derived) variables can be created and studied;
modern packages have good graphics for use in reports.

Advantages of hand analyses

very few, except that "rogue" data may be more easily spotted (although even
that should be done during data entry).

(ii) SPSS (for example) requires variables to be set up in variable view, with
information on whether variables are numeric, alphanumeric etc, labels to be applied
to responses, data to be entered in a spreadsheet matrix related to variables as defined.
Questionnaires should therefore be formatted so that it is easy to translate answers
into single numbers or very short mnemonic codes. Transfer to computer from
questionnaire should be easy, without having to search the form to find codes.

(iii) Cross-check consistency of answers: e.g. a situation where, if one question is
answered "no", there should be no data entered in a following question. A standard
symbol (such as *) should be entered where values are missing. Types of entry,
numeric, alphanumeric etc, can be checked for appropriateness. Values can be
checked to see if they lie in an acceptable range.
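
The data checks listed in Question 8 (iii) (consistency between linked questions, a standard missing-value symbol, type checks and range checks) can be illustrated with a short sketch. The field names `uses_canteen` and `visits_per_week`, the codes "yes"/"no" and the 0 to 7 range are hypothetical choices made for this illustration; they are not taken from the solution.

```python
MISSING = "*"  # standard symbol entered where a value is missing


def check_record(record):
    """Return a list of problems found in one keyed-in questionnaire record.

    Both fields are assumed to arrive as strings, as typed at data entry.
    """
    problems = []
    uses = record["uses_canteen"]
    visits = record["visits_per_week"]

    # Type-of-entry check: only the agreed codes are allowed.
    if uses not in ("yes", "no", MISSING):
        problems.append("uses_canteen: unexpected code")

    # Type and range checks on the numeric follow-up question.
    if visits != MISSING:
        if not visits.isdigit():
            problems.append("visits_per_week: not numeric")
        elif not 0 <= int(visits) <= 7:
            problems.append("visits_per_week: out of range")

    # Consistency check: a "no" should leave the follow-up question empty.
    if uses == "no" and visits != MISSING:
        problems.append("visits_per_week: should be missing when uses_canteen is 'no'")

    return problems


print(check_record({"uses_canteen": "no", "visits_per_week": "2"}))
```

Running checks of this kind at data entry, rather than after analysis has begun, is what allows "rogue" values to be caught early.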
THE ROYAL STATISTICAL SOCIETY

2004 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

RSS 2004

Ordinary Certificate, Paper I, 2004. Question 1

NOTE. The question does not ask for definitions of the two sampling methods, so
these can be assumed.

Advantages of quota sampling over random sampling include the following.

(i) No sampling frame is needed, so the quota method can be used when a frame
is not available.

(ii) Very little preparatory work is required in the survey office.

(iii) It is quick to carry out, since interviewers do not have to find specified persons
for respondents [this solution is written throughout in the context of a survey
of people, but the ideas will apply in general for other contexts], so call-back
is not necessary.

(iv) Because of speed, it can be used repeatedly for topical purposes such as
election predictions.

(v) The costs of planning and analysis are less than for random sampling with
call-backs.

(vi) The controls on quotas consist simply of finding men and women in fairly
broad categories, such as age-group and type of occupation/work.

(vii) A specified target sample size can be achieved, since it does not matter which
individuals form the sample.

Disadvantages include the following.

(i) There is no theoretical method of assessing sampling variability.

(ii) Approximate methods have to be used for estimating variances.

(iii) Interviewers can easily introduce bias when choosing people to form the
sample.

(iv) Refusals can be numerous if the same vicinity (e.g. a shopping area) is used
for obtaining interviews for several surveys in succession.

(v) Refusals may in any case be more common than with a random sample.

(vi) It is difficult to control fieldwork, or to make checks while it is being done
(and impossible after it finishes).

(vii) Errors through bias etc form a "hidden" cost because of non-detection.

Ordinary Certificate, Paper I, 2004. Question 2

(i) Possible problems and suggestions for overcoming them are as follows.
Credit was of course given in the examination for any other reasonable suggestions.

(1) Some addresses will be non-residential; if this is recognised, they can be
omitted from the sampling frame.

(2) Extra "reserve" addresses need to be available to replace any that are empty or
non-residential.

(3) Some addresses may be listed more than once, by accident or error or in
different categories, but no address should be used more than once. However,
this will not get over the problem of repeated addresses having a higher
probability of appearing in the sample.

(4) Alternatively, repeated addresses may be deleted, but this could be a
time-consuming task before a sample is selected.

(5) If there is time, a list in some reasonable order will allow someone to travel
round and check it for completeness before it is used; geographical order,
rather than (for example) alphabetical order, would be needed.

(6) Some addresses will have more than one household living there (this should
not apply to blocks of flats, where each household has its own number), and
all households at such an address could be sampled to ensure proper
representation of this type.

(ii) Each interviewer could have a quota of interviewees in age, sex and
occupation groups, to be ascertained by the first few questions asked. Size of
household(s) is a useful criterion also, and the whole of a target area must be covered
by the team of interviewers between them. Time of interview should be varied (day,
evening) to ensure all residents are available for interview.

Ordinary Certificate, Paper I, 2004. Question 3

A database might look like the following.

FIELD NAME      FIELD TYPE    WIDTH
Customer_Id     Auto-number   6
Title           Text          4
Surname         Text          24
Given_name      Text          24
Initials        Text          8
House_number    Numeric       4
Address_1       Text          36
Address_2       Text          36
Address_3       Text          36
Postcode        Text          8
Telephone       Numeric       16
Doctor          Text          24
AllergyA        Text          24
AllergyB        Text          24
MedicationA     Text          24
StartA          Date          6
FinishA         Date          6
MedicationB     Text          24
StartB          Date          6
FinishB         Date          6
MedicationC     Text          24
StartC          Date          6
FinishC         Date          6

Note. A separate table for medication would also be useful:

FIELD NAME      FIELD TYPE    WIDTH
Medication_Id   Auto-number   6
Customer_Id     Auto-number   6
StartDate       Date          6
FinishDate      Date          6
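
The benefit of the separate medication table noted in Question 3 is that a customer can have any number of medications, rather than being limited to the three MedicationA/B/C slots. A minimal sketch using Python's built-in sqlite3 module follows. The table and column names are adapted from the field lists above; the `name` column, the date format and the sample rows are assumptions added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
    surname     TEXT NOT NULL,
    given_name  TEXT,
    postcode    TEXT
);
CREATE TABLE medication (
    medication_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id   INTEGER NOT NULL REFERENCES customer(customer_id),
    name          TEXT NOT NULL,   -- assumed column, not in the field list above
    start_date    TEXT NOT NULL,   -- ISO dates stored as text (an assumption)
    finish_date   TEXT             -- NULL while the medication is ongoing
);
""")

# Invented example rows, for illustration only.
cur = conn.execute(
    "INSERT INTO customer (surname, given_name, postcode) VALUES (?, ?, ?)",
    ("Smith", "Ann", "AB1 2CD"),
)
customer_id = cur.lastrowid
conn.executemany(
    "INSERT INTO medication (customer_id, name, start_date, finish_date) "
    "VALUES (?, ?, ?, ?)",
    [
        (customer_id, "Medication A", "2004-01-10", "2004-02-10"),
        (customer_id, "Medication B", "2004-03-01", None),
        (customer_id, "Medication C", "2004-03-15", "2004-04-15"),
        (customer_id, "Medication D", "2004-05-01", None),  # a fourth item fits
    ],
)

rows = conn.execute(
    "SELECT name, start_date FROM medication "
    "WHERE customer_id = ? ORDER BY start_date",
    (customer_id,),
).fetchall()
print(rows)
```

The two tables are linked through Customer_Id, which is why that field appears in both lists.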
Ordinary Certificate, Paper I, 2004. Question 4

Please complete this questionnaire only if you are an employee working in the city of (X). If
you do not work in (X), we apologise for bothering you.
Please give your Name ____________________________
and Home Address ____________________________
____________________________
_________ Postcode __________

1. Where in (X) do you work? Please give the name of the road or building and its
postcode. ___________________________________________________________

2. How far is it from your home to work? Please tick the appropriate box.
[ ] Less than 2 miles
[ ] Between 2 and 5 miles
[ ] Between 5 and 10 miles
[ ] 10 or more miles

3. On a normal day with no unusual delays, how long does it take you to travel from
home to work? _____ hours _____ minutes

4. What method(s) of transport do you use? Please tick all the relevant boxes.
[ ] Foot
[ ] Bicycle
[ ] Car or Van
[ ] Bus, Tram or Coach
[ ] Train or Underground
[ ] Other (please specify) ________________

5. For any of the methods of transport you have marked in question 4, please give the
cost of a return trip.
Car or Van (cost of fuel only) ________
Bus, Tram or Coach ________
Train or Underground ________
Other ________

[Note. For many large cities, this question would need amending to allow for use of season
or other multi-purchase tickets or "travelcards" that may cover more than one method of
transport.]

6. Do you find any disadvantages in your present method(s) of travel?
[ ] Yes   [ ] No
If you have answered Yes, please say what they are.
______________________________________________________

Thank you for completing this questionnaire.
Your answers will be kept confidential.

Additional questions would be on relevant important topics, such as delays on public
transport, overcrowding, reliability of services, cost, congestion on roads, problems
caused by flexible working times.

Ordinary Certificate, Paper I, 2004. Question 5

The advantages of a "diary" include the following.

(i) There is a much more accurate record of what was eaten and when.

(ii) Answers do not depend on long- or medium-term memory.

(iii) A diary form could be designed, with suitable headings and definitions, to
make accurate and correct recording easier.

(iv) A further improvement may be to record quantities in some convenient way.

(v) Regularity of diet can be included by having carefully specified "time" boxes.

Disadvantages include the following.

(i) It takes time, and may become tedious, for a diary to be fully completed over a
reasonable period.

(ii) There is no guarantee that it is completed fully accurately, at the time food or
drinks are consumed or very soon afterwards.

(iii) Diets will vary somewhat according to seasonal availability of some items,
requiring repetition of the exercise a few times during a year.

(iv) People may actually change their regular habits during the time they are
keeping a diary.

(v) "Snacks" between main meals may not be recorded unless clear instructions
are given, and not always then.

Ordinary Certificate, Paper I, 2004. Question 6

What is to be estimated? In particular, will the interest lie in mean values of
measurements or in proportions? This determines which variance formula is used.
Proportions give much less information per item and so need much larger samples.

Are estimates required for subgroups? If so, stratification is required, and each
subgroup has to be sampled adequately.

How accurate are estimates required to be (how close to the "true" population value),
and are the available resources (time, money, staff) sufficient to collect and process
sufficient data to achieve this accuracy?

Does the person planning the survey have any information on the variability of
measurements to be taken or the size of the proportion to be estimated? If not, few of
these questions can be answered satisfactorily and some preliminary work or a pilot
survey will be needed.

Is the sample multi-purpose, i.e. required to estimate several things, either several
measurements or proportions or a mixture of the two? If so, the sample size must be
large enough to meet the requirements of precision for all of them. Assess this in the
light of available resources.

What level of non-response may be expected? Allowance for this will be needed in
deciding sample size.

How quickly are results needed, and is this realistic with the available resources?

Ordinary Certificate, Paper I, 2004. Question 7

(i) The total number of farms is 400, so we can take a 10% sample (40 farms) and
therefore 10% in each size group. Rounded to the nearest whole number, this gives
20, 12, 8.

(ii)

Size group   Number × SD             Sample size
                                     (see calculation below table)
Small        203 × 6.4  = 1299.2     10.67
Medium       115 × 11.6 = 1334.0     10.95
Large         82 × 27.3 = 2238.6     18.38
Total                     4871.8

The sample sizes are 40 × 1299.2/4871.8, 40 × 1334.0/4871.8, 40 × 2238.6/4871.8.
Rounding, these will be taken as 11, 11, 18.

(iii) Method (i) is proportional allocation, which is easy to plan and does not need
estimates of standard deviations in groups. The groups (or strata) are represented in
the same proportions as in the population, so the method gives reasonable estimates
valid for the whole population without further adjustment.

Method (ii) is optimal allocation, sampling more intensively in the more variable parts
of the population and in the larger strata. Its estimates have minimum variance for
fixed total sample size (provided the available information on SDs is good). The
recorded data have to be kept in the correct strata during the estimation calculations.
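
The two allocations in Question 7 can be reproduced with a few lines of code. This is only a sketch of the arithmetic; the stratum sizes, standard deviations and total sample size are those given in the solution, while the function and variable names are illustrative choices.

```python
# Stratum name -> (number of farms, standard deviation), as in the table above.
strata = {
    "Small": (203, 6.4),
    "Medium": (115, 11.6),
    "Large": (82, 27.3),
}
n = 40  # total sample size


def proportional_allocation(strata, n):
    """Allocate n in proportion to stratum size N_h."""
    total = sum(size for size, _ in strata.values())
    return {name: n * size / total for name, (size, _) in strata.items()}


def optimal_allocation(strata, n):
    """Allocate n in proportion to N_h * S_h (size times standard deviation)."""
    weights = {name: size * sd for name, (size, sd) in strata.items()}
    total = sum(weights.values())
    return {name: n * w / total for name, w in weights.items()}


prop = proportional_allocation(strata, n)
opt = optimal_allocation(strata, n)
for name in strata:
    print(f"{name:<7} proportional {prop[name]:6.2f}   optimal {opt[name]:6.2f}")
```

Rounding the unrounded values gives 20, 12, 8 and 11, 11, 18, as in the solution. In general, simple rounding need not add up to exactly n, so the rounded sizes should be checked against the intended total.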
Ordinary Certificate, Paper I, 2004. Question 8

(i) (a) For systematic sampling, number the books in order of positions on
shelves, beginning with (say) the top shelf and then move to the second, then
to the third and so on until all N books are identifiable. (Since a library will
have many sets of shelves, the shelves in one set will be completed first, then
move to the next set; this will usually be easier than completing all the top
shelves first.)
Calculate k = N/n and round it to the nearest whole number. Choose at
random a number between 1 and k (inclusive), say j. Locate the jth book
along from the starting point; this is the first member of the sample. Then
take every kth book after that.
(b) For cluster sampling, the clusters could be taken as the sets of shelves
or, alternatively, individual shelves could be used. Number these clusters 1 to
L. To obtain an approximate value of the number of books in each cluster, use
M = N/L. Then choose n/M clusters to form the sample, and take every book
in each cluster chosen.

Advantages of systematic sampling are that it is easy to carry out and would be very
much quicker than a random sampling scheme. Since N is known, no complete count
is necessary at the beginning. Most likely there would be no periodic variation in the
ages of books, so age can be assumed to be a random variable when based on
systematic sampling. All the stock of books would be covered, so long as no shelves
were missed in the initial count. There is no theoretical basis on which to study
systematic sampling, but either a form of cluster sample analysis can be used or one
based on assuming simple random sampling.

Cluster sampling requires the shelves, or sets of shelves, to be numbered first. Care
needs to be taken when sampling each cluster to look at every book in it, once and
once only. It is a good method if the distribution of ages of books is similar in each
cluster and the clusters are similar to the overall distribution in the library. This may
not happen if books are arranged by subjects, some of which will have more recent
books and others older books.

(ii) Suitable strata would be very hard to define: books of similar age would take
a long time to locate (unless there is a computer listing of stock, in which case
average age could probably be calculated directly for the population without
sampling). Even if some other strata (not age) were used, the process would not be
easy; for example, if shelves (or sets of shelves) were strata, every book would need
numbering so that random samples could be taken. This would be very
time-consuming.

Stratified sampling does not seem a good idea for this purpose.

THE ROYAL STATISTICAL SOCIETY

2005 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

© RSS 2005

Ordinary Certificate, Paper I, 2005. Question 1

(i) Using stratified sampling followed by simple random sampling, we could
proceed as follows.

Stage 1: stratify schools into size groups and select samples from the strata.

Stage 2: using the lists of students, select simple random samples from the
chosen schools.

Using cluster sampling followed by systematic sampling:

Stage 1: take the schools in different geographical regions as clusters and
select some clusters at random from these;

Stage 2: use the student lists as the basis for a systematic sample of
individuals.

(ii) For the first scheme above, we are bound to obtain schools of all size groups;
but we have to construct the strata first. Then it is easy to estimate means,
proportions etc from a simple random sample, but the method is rather tedious
to plan and carry out. Also occasionally a very "untypical" sample can result.
For the second scheme, clusters are administratively easier to handle, for
example by using local offices as bases to cut down travel, but the chosen
clusters may not be typical of the whole country and the schools within a
cluster may be quite similar to one another. Then systematic samples from
lists are easy to carry out, and can ensure a good balance of leaving dates, but
there is no theoretical basis for estimating variability.

NOTE that other combinations are possible, and will show similar advantages
and disadvantages.

Ordinary Certificate, Paper I, 2005. Question 2

A possible questionnaire might be as follows.

Please give the date you left school: Month ______ Year ______

Your age then: Years ______ Months ______

Your date of birth: D D / M M / Y Y

We are interested to know what your occupation has been in each year since you left
school. This could be study, or work (paid or unpaid), or something quite different
such as travel. For this survey the main occupation is defined as one on which you
spent 4 or more hours per week and which lasted at least one month. If you changed
your occupation within a year, please give the details of the months spent in each. (If
you have any queries about how to complete this question, please contact the survey
team.)

Calendar Year 2004 [Repeat all of this for 2005]

Study? [ ] Yes   [ ] No (please tick relevant box)

If YES please give the following information:

Time for which it lasted (months) _________________

Name and location of institution ___________________________________

[NOTE: it would be wise to give two spaces for answering these questions in case
there has been a change]

Work? [ ] Yes   [ ] No (please tick relevant box)

If YES please give the following information:

[ ] Paid   [ ] Unpaid

Time for which it lasted (months) _________________

Name and location of employer ___________________________________

Type of work done ______________________________________________

[Give a second space for answers here also]

If you had a main occupation other than study or work, please give details here.

_____________________________________________________________________

Please tell us what you hope to be doing in five years' time from now:

_____________________________________________________________________

_____________________________________________________________________

And in ten years:

_____________________________________________________________________

_____________________________________________________________________

(These could be the same as now or something quite different).

Thank you for responding to this survey.

Ordinary Certificate, Paper I, 2005. Question 3

A database might look like the following, continuing with as many occupations as
necessary. The details of the occupations would be coded to indicate names (if
appropriate) and locations. Career plans would also be coded.

FIELD NAME      FIELD TYPE    WIDTH
Respondent_ID   Auto-number   6
Title           Text          4
Surname         Text          24
Given_name      Text          24
Initials        Text          8
House_number    Numeric       4
Address_1       Text          36
Address_2       Text          36
Address_3       Text          36
Postcode        Text          8
Telephone       Numeric       16
Date_left       Date          6
Age             Numeric       2
Year_1_occ1     Numeric       2
Date_occ11      Date          6
Details_occ11   Numeric       3
Year_1_occ2     Numeric       2
Date_occ12      Date          6
Details_occ12   Numeric       3
Year_1_occ3     Numeric       2
Date_occ13      Date          6
Details_occ13   Numeric       3
Year_2_occ1     Numeric       2
:
Career_plans    Numeric       3

It would also be useful to have separate tables for occupations:


FIELD NAME FIELD TYPE WIDTH
Occupation_ID Auto-number 6
Respondent_ID Auto-number 6
StartDate Date 6
FinishDate Date 6
Details Numeric 3

A spreadsheet would have variable names similar to the field names, and cell widths
the same as the field widths. Coding could be the same as the above. Either extra
columns can be used for occupations or separate sheets for each. Types are number,
date or text; an Auto-number field becomes a number.
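
The choice between extra columns per occupation and a separate occupations table can be illustrated by converting one "wide" respondent record into rows for the separate table. This is a sketch only. It assumes that the suffix in names such as Date_occ12 means year 1, occupation 2, and that Date_occ holds the start date; neither is stated explicitly in the solution. The function name and the sample values are invented for illustration.

```python
def occupation_rows(record, years=2, slots=3):
    """Turn wide fields (Date_occ11, Details_occ11, ...) into one row per occupation.

    Slots with no Details code are treated as unused and skipped.
    """
    rows = []
    for year in range(1, years + 1):
        for slot in range(1, slots + 1):
            details = record.get(f"Details_occ{year}{slot}")
            if details is None:
                continue
            rows.append({
                "Respondent_ID": record["Respondent_ID"],
                "Year": year,
                "StartDate": record.get(f"Date_occ{year}{slot}"),
                "Details": details,
            })
    return rows


# Invented example record with two occupations in year 1 and one in year 2.
wide = {
    "Respondent_ID": 17,
    "Date_occ11": "2004-01", "Details_occ11": 101,
    "Date_occ12": "2004-07", "Details_occ12": 205,
    "Date_occ21": "2005-01", "Details_occ21": 205,
}
for row in occupation_rows(wide):
    print(row)
```

With the separate table there is no need to decide in advance how many occupation columns to provide for each year.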

Ordinary Certificate, Paper I, 2005. Question 5

(i) Some of the local representatives may be chosen at random from those with
responsibilities in, and/or knowledge of, the community served by the market.
Some may have an official position which makes them well known, and able
to obtain the required information. Merchants may have to pay a hire charge,
and if so the numbers doing so on any particular day could be obtained from
the authority receiving the charge. Times of start and end of activity each day
could be found by regular visits, and the chosen representatives must be
prepared to do this. Numbers could also be estimated in this way. Merchants
could be asked exactly where they come from, and their range of produce
recorded. The amount they bring will determine the length of time they stay.
Representatives have to be taught how to obtain accurate, reliable information.

(ii) Few people are likely to remember last year’s prices at any particular season,
even if they have a rough idea that things are cheaper/more expensive this
year. Merchants could perhaps be asked whether supplies are more or less
plentiful than last year, which may be related to price, but any attempt at
numerical estimates is probably not worthwhile. Central figures on price and
quantity of staple foods may be available but regions will vary.

Ordinary Certificate, Paper I, 2005. Question 4

(i) A non-random method could:

(A) ensure that regions of both types are sampled, without the need to
stratify;
(B) use regions that are easy to get to as often as necessary;
(C) choose regions that are likely to have a variety of types of market;
(D) give what may be thought a "representative" sample.

Disadvantages include:

(A) possibility of bias in choice;
(B) no valid estimate of sampling variation can be found;
(C) areas thought "not typical" may not be included;
(D) those least easy to reach may not be used.

(ii) (a) Merchants change from day to day; the same merchant may bring a
different set of produce at each visit; some merchants may have access
to very little land and so visit rarely; a "sampling frame" of merchants
is not likely to exist.

Sampling of stalls, at random or perhaps as a systematic sample, may


be satisfactory. Local representatives may be able to say who is a
regular stall-holder, if required. Those sampled could be asked how
often they come to the market, and some balance between those who
are very often there and those rarely there could perhaps be achieved.
Supplementary questions about their background and sources of
produce could be asked.

(b) It should be possible to construct a comprehensive list of the fruits and


vegetables likely to be available in the region at the times of visits.
Staple varieties of these should generally be on sale in season, but a
few merchants may also have less common specialities. Prices of the
staple foodstuffs should be collected if possible (it is not always easy
to do so if an actual purchase is not made) and the proportion of
merchants lacking any of these should be noted. Interviewers need
skill and tact to achieve reliable results. Local representatives can
probably advise about shortages, or poor quality of produce, due to
adverse weather.
Ordinary Certificate, Paper I, 2005. Question 6

Interviewers can encourage respondents to give particular answers by asking
questions in a particular way, loaded to a particular answer. Also the question may
not be fully understood by the interviewer, so it is not answered as it was meant to be
asked. Respondents may not be shown the questionnaire upon which answers are
recorded, so they do not know all the possible answers expected; or the interviewer
may take advantage of illiteracy to enter inaccurate answers. A trained interviewer
should ask all questions in a neutral way, not depart from the wording of them, and be
careful to record the answer as closely as possible to what the respondent says. A
friendly attitude, helping but not forcing people towards answers, is necessary.

When collecting prices from markets, it may be wise to record prices quoted to
potential buyers instead of asking directly. A direct answer may be one thought likely
to please the interviewer, or biased in either direction according to why the merchant
thinks the question is being asked – such as if he fears higher stall charges.
Interviewers should explain the purpose of the survey, to avoid suspicions over the
reason for it.

Ordinary Certificate, Paper I, 2005. Question 7

(i) There are 80 markets. Hence a 10% sample is required. This requires 2.7 and
    5.3 markets, or 3 large and 5 small as the nearest whole numbers.

    This will cost (3 × 15) + (5 × 12) = 105 currency units.

(ii) Using $n_1$ large and $n_2$ small markets,

     $n_1 \propto \frac{27 \times 0.05}{\sqrt{15}} = 0.3486$ and $n_2 \propto \frac{53 \times 0.07}{\sqrt{12}} = 1.0710$

     so that $n = n_1 + n_2 \propto 1.4196$.

     Hence $n_1 = \frac{0.3486}{1.4196}\,n = 0.2456n$ and $n_2 = 0.7544n$.

     Total cost $= c_1 n_1 + c_2 n_2 = (0.2456n \times 15) + (0.7544n \times 12) = 12.7368n$.

     This should not exceed 105, so $n \le \frac{105}{12.7368} = 8.24$.

     So $n_1 = 0.2456n = 2.02$ and $n_2 = 0.7544n = 6.22$.

     Therefore take 2 large and 6 small markets, cost 102.

(iii) If the data refer to a very important vegetable, the optimal allocation may be
preferred as it gives a minimum-variance estimate. However, there is no
guarantee that it will do the same for any or all of the other vegetables and
fruit on sale. Using a uniform sampling fraction should achieve representative
results for items of produce as a whole, and it does not need estimated
variances to make any calculations. The cost is marginally higher but this is
not very serious. The slightly larger number of large markets in uniform
sampling could also help in assessing the variation among the large markets.
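
The calculation in part (ii) can be checked with a few lines of code. The sketch below assumes the figures read off the working above (27 large and 53 small markets, standard deviations 0.05 and 0.07, unit costs 15 and 12, budget 105); the function name `cost_optimal_allocation` is introduced here for illustration only.

```python
from math import sqrt

def cost_optimal_allocation(sizes, sds, costs, budget):
    """Allocate n_i proportional to N_i * s_i / sqrt(c_i), spending the whole budget."""
    weights = [N * s / sqrt(c) for N, s, c in zip(sizes, sds, costs)]
    total = sum(weights)
    shares = [w / total for w in weights]              # n_i / n
    cost_per_unit = sum(sh * c for sh, c in zip(shares, costs))
    n = budget / cost_per_unit                         # largest affordable n
    return [sh * n for sh in shares]

costs = [15, 12]
alloc = cost_optimal_allocation([27, 53], [0.05, 0.07], costs, 105)
rounded = [round(a) for a in alloc]
total_cost = sum(r * c for r, c in zip(rounded, costs))
print([f"{a:.2f}" for a in alloc], rounded, total_cost)
```

Rounding each stratum separately happens to stay within the budget here; in general the rounded cost should be re-checked, as the solution does.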

Ordinary Certificate, Paper I, 2005. Question 8

Index numbers for prices require a set of fruits and vegetables to be specified to go
into the index. This is the first important decision to make. Prices (pi) have to be
found for each chosen item, and also the quantities (qi) of these that are consumed in
the population. Decisions on how to collect (pi) and (qi) have to be made. The index
number is generally calculated as a weighted average based on piqi.

Assuming that no index at present exists, the prices from as many markets (or other
outlets) as possible should be obtained. These may vary with season, but a decision
must be made on a "typical" pi for the index. Some combination, possibly weighted,
of local prices is likely to be best. In the same way, qi for each item has to be
constructed. This is often done by a consumer survey, separately from the price
survey, though care should be taken to see whether pi and qi for particular parts of the
country are related (scarcity in a region may lead to high prices there). Town and
rural consumption patterns may differ. The first year's data would be used as
"baseline" information, and subsequent years' data compared with that baseline –
usually by changing only the prices. Quantities are updated less often. Whatever
method has been used to construct the (pi) should go on being used for subsequent
years to provide valid comparisons. (If any serious error is found in the method after
later use, a new base may need to be set up using a modified version of the collection
method.) Although indices are typically quoted as annual figures, food prices are
almost bound to vary by season and the data collection has to allow for this.


THE ROYAL STATISTICAL SOCIETY

2006 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

© RSS 2006
Ordinary Certificate, Paper I, 2006. Question 1

(i) The description of the project indicates that a large area is involved. Villages
    chosen should represent all distinct types of agriculture and land, such as high
    and low land, mixed crop and animal farms, farms that deal only with crops
    and those that are only animal farms. Size of farm should also be considered,
    perhaps split into large and small. If the area is large enough to have a
    variable climate, for example if some parts have less rainfall than others, this
    is also important. If possible, reasonably accessible villages should be used.
    But with so many constraints and only 15 villages, there may be some
    difficulty making a good choice.

(ii) Method A

     Advantages are that those attending are likely to be interested in the project
     and less likely to refuse to be interviewed; there is no "not at home" problem.
     Provided they are willing to stay, the necessary number can be obtained – if
     there are enough at the meeting!

     Disadvantages are that a meeting may be poorly attended; unless there are a
     good number of interviewers, people may not be prepared to wait for their
     interview; and the sample is in any case self-selected, through interest in the
     project, enough spare time to attend, and perhaps being more articulate or
     literate. Good publicity beforehand and plenty of interviewing staff would
     help considerably. There could be other activities while waiting for interviews
     (e.g. experts whom they could consult), or some refreshments, or the promise
     of assistance by advisory services when needed.

     The choice of random sample members also has problems. This cannot be
     done in advance; it must be done at the meetings. As there is no list, some
     form of selection has to be based on numbering of seating or distributing
     tickets to people as they arrive and then drawing numbers out of a hat when
     everyone has arrived. More than one member from the same farm could be
     selected unless this was deliberately avoided.

(iii) Method B

     Absence of maps makes it hard to know whether all farms have been located
     and listed, and also to fix a suitable sampling interval in systematic sampling.
     Asking local inhabitants where neighbouring farms are could help to complete
     a list, and a preliminary tour round the area could then be used to fix the
     sampling interval. Non-response through being unwilling or unavailable is
     likely. Considerable time will be needed for this method.

     It is however a properly random method, in which all farmers have a chance of
     being interviewed, and the interviewer can record other useful information as
     well as answers to questions. Choice of calling time could help to reduce
     non-response, especially by not going at busy times during the working day. Some
     form of incentive could be offered to obtain answers, though this would cause
     ill-feeling among those not in the sample.

Ordinary Certificate, Paper I, 2006. Question 2

The required questions could be covered on a form such as the one below, and
instructions to interviewers might be printed in italics to avoid confusion.

Interviewer, say the following: Good morning (or afternoon, evening, as appropriate).
The … organisation is undertaking a project on the control of XX in this country, and
as part of this project would like some information about farms and farmers in this
village. It would be very helpful if you could answer some questions about your farm.
This will take at most half an hour.

Interviewer: If the farmer agrees, say "Thank you" and then start asking the
questions below. If the farmer refuses, apologise for taking his/her time and try to
arrange an alternative time of interview.

Q1. Please tell me your name.

Q2. How old were you last birthday?

Q3. Is your highest educational level primary, secondary, or higher? (Interviewer:
    delete as necessary).

Interviewer: Ask questions 4–7 in turn, recording the answers to all four questions on
the grid below. Insert the name of the household member where indicated.

Q4. Please could you tell me the names of all members of your household.

Q5. What relationship is (give name, working through all members of the
    household in turn) to you?

Q6. How old was (give name, working through all members of the household in
    turn) last birthday?

Q7. I am now going to ask you about the education of each member of your
    household. (Interviewer: Say this once only.)

    Was the highest educational level of (give name, working through all members
    of the household in turn) primary, secondary, or higher?

    Name of household    Relationship    Age    Highest educational level
    member               to farmer              (delete as necessary)
    1                                           primary/secondary/higher
    2                                           primary/secondary/higher
    3                                           primary/secondary/higher
    4                                           primary/secondary/higher

Q8. How many hours per week do you usually work on the farm this time of year?

Interviewer: Record answer on grid after Q.10.

Interviewer: Ask questions 9–10 in turn, recording the answers on the grid below.

Q9. Which members of your household work on the farm this time of year?

Q10. How many hours per week does (give name, working through all members of
     the household who work on the farm in turn) usually work at this time?

    Name of household member    Hours per week
    Farmer

Interviewer: Ask questions 11–12 in turn, recording the answers on the grid below.

Q11. What kinds of livestock do you own?

Interviewer: Record answers in table in Q.12. If answer is none, go to Q.13.

Q12. How many (give name of type of livestock, working through all listed in turn) do
     you own?

    Type of livestock    Number

Interviewer: Ask questions 13–14 in turn, recording the answers on the grid below.

Q13. What crops do you grow?

Interviewer: Record answers in table in Q.14. If answer is none, go to Q.15.

Q14. What size of area is planted with (give name of crop, working through all
     listed in turn)?

    Type of crop    Area

Q15. Would you describe the pest risk from XX in this area as high, medium, or
     low?

Q16. What measures do you take to control XX?

Interviewer, say the following: "That is the end of the questions. Thank you very
much for taking the time to answer them. Is there anything you would like to ask me
or anything else that you would like to add?" Record responses. Answer any
questions if you are able to do so, or say you will try to find out the answers and that
someone will get back to the interviewee. Record the action. Record anything
interviewee adds.


Ordinary Certificate, Paper I, 2006. Question 3

(i) Clearly one possibility is to allow for five entries for every farm, i.e. one for
    each potential household member. When households are smaller than five,
    this wastes a large amount of space and leads to large unwieldy files. [It also
    requires care in programming when calculating means (or other statistics). For
    example, suppose five columns (one for each potential member) are used for
    each farm, but a particular farm has only four members. The mean age, say,
    must be calculated using just these four members, i.e. assuming the fifth is
    "missing" (and not erroneously taking the fifth as being present but zero). A
    total (e.g. total number of hours worked) assumes a zero in the "missing"
    column, but clearly makes no sense on its own without knowing the number of
    members. It might help to use a code (different from any code for an
    individual missing value) to indicate those columns in which there are no
    entries at all because there are fewer than five members at that farm.]

    If an additional variable, the number of household members, is created for
    each farm, and only that number of entries is allowed for, space will be saved
    but care will still be needed when calculating means and totals (and other
    statistics) to ensure that the calculation takes the correct number of members
    into account.

(ii) Livestock: discrete variable (integer), showing number on farm – as many
     variables as there are types of animal in the whole survey, score 0 if a
     particular farm does not have a particular animal type.

     Crops: continuous variable, measured area under each crop – as many
     variables as there are crops in the whole survey, score 0 if there is none of a
     particular crop on a particular farm.

     In each case, could use an "other" category, to save space, if there are
     particular types of crop or animal mentioned by only very few farmers –
     include all "rare" types in "other".

     Pest risk: an ordinal variable coded 0 (low), 1 (medium), 2 (high) for each
     farm – only one entry needed. [If there were any "don't know" answers, they
     would be treated as missing values.]

     Pest control: a nominal variable coded 1 (used), 0 (not) for each possible
     control method for each farm.

Ordinary Certificate, Paper I, 2006. Question 4

(i) Sample size n = 20. There are 92 children altogether. The proportions in the
    schools are as follows.

    School 1: 25/92 = 0.2717
    School 2: 30/92 = 0.3261
    School 3: 37/92 = 0.4022

    The sample sizes from the three schools should be as near as possible to
    n × 0.2717 etc. These are as follows.

    School 1: 5.43    School 2: 6.52    School 3: 8.04

    Therefore take samples of sizes 5, 7, 8.

(ii) The values of (population size × SD of m) are as follows.

     School 1: 255.0    School 2: 231.0    School 3: 329.3

     The total of these is 815.3, so the required sample sizes are
     $20 \times \frac{255.0}{815.3}$ etc, i.e.

     School 1: 6.26    School 2: 5.67    School 3: 8.08

     So now the sample sizes should be 6, 6, 8.

(iii) The sample sizes are almost the same. Method (ii) gives an extra item from
      the more variable location which could be an advantage. Method (i) is simple
      to carry out and does not require information from the previous study, since
      when conditions could have changed. The locations are in (as nearly as
      possible) the same proportions in the sample and in the population. The
      estimate for the whole population is simple to calculate. With method (ii),
      sample results must be kept separate for the three locations in order to
      calculate population estimates, using strata membership information for
      weighting.

      Perhaps there is little to choose between the methods overall. The final
      decision could be made on practical grounds, e.g. ease of reaching the schools.
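
The two allocations in Question 4 (i) and (ii) follow the same pattern: share the fixed sample size n = 20 in proportion to a set of stratum weights. A minimal sketch, using only the school sizes and the (population size × SD) products quoted in the solution; the helper name `allocate` is an assumption made here.

```python
def allocate(weights, n):
    """Split a total sample size n in proportion to the given stratum weights."""
    total = sum(weights)
    return [n * w / total for w in weights]

school_sizes = [25, 30, 37]            # N_i, used for proportional allocation
size_times_sd = [255.0, 231.0, 329.3]  # N_i * s_i, used for allocation by variability

proportional = allocate(school_sizes, 20)
by_variability = allocate(size_times_sd, 20)

print([round(x, 2) for x in proportional], [round(x) for x in proportional])
print([round(x, 2) for x in by_variability], [round(x) for x in by_variability])
```

Simple rounding gives totals of exactly 20 in both cases here, but that is not guaranteed in general, so the rounded sizes should always be summed and adjusted if necessary.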

Ordinary Certificate, Paper I, 2006. Question 5

(i) Stratified sampling

    Urban and rural districts are two obvious strata that could be used. Simple
    random samples in each stratum should give districts with different types of
    employment pattern, although this cannot be guaranteed unless a more
    complicated method is used.

    Cluster sampling

    Regions could be used as clusters, each containing several districts. First some
    regions will be chosen for a random sample of clusters, and then a random
    sample will be selected within each cluster, giving administrative districts as
    the units, which again would have a variety of employment patterns.

(ii) Stratified sampling controls the urban/rural factor better. Cluster sampling
     would be easier to administer as both the sampling and the interviewing of
     individuals are carried out in more concentrated areas rather than over the
     whole country, saving time and resources.

Ordinary Certificate, Paper I, 2006. Question 6

Simple random sampling from voters' register, especially one a year old

The target population is all residents. However, only those who were voters a year
ago will be listed (probably not all of them), so any not eligible to vote will be missed,
as well as voters who were not listed, which will include all newcomers. Thus the
sampling frame is biased. Further, there will be non-response due to deaths or people
moving away. When a sample has been selected, it may not be easy to contact all the
selected members, and some of those contacted may refuse to respond anyway.

Quota sampling

If this method is to be used easily and successfully, there must be some central facility
where a wide selection of people will go, e.g. a shopping centre or a main railway
station. It is very unlikely that the whole population of the area could be captured at
any one place, or even a representative part of that population – some could only be
located at home or work places at convenient times.

Bias through interviewer choice of individuals is likely, refusal is more likely from
some groups than others, interviewers need good training (and occasional checking)
to make sure they correctly allocate individuals to quotas. Refusal rates may be no
lower than in random methods, and bias in asking questions is possible (as with any
personal interview method).
Ordinary Certificate, Paper I, 2006. Question 7

(i) Advantages of a diary – it does not usually rely on memory to any extent,
    does not usually involve any interviewing if a good layout is used and data can
    be taken directly from it, there is a reasonably accurate record of the activities
    in each time unit. Headings could make completion easier, e.g. eating,
    working, leisure, codes for common occupations.

(ii) Disadvantages – in some circumstances instant completion is not possible,
     risk of some participants losing it, ceasing to keep it up to date, not giving
     details. Fewer people may be willing to join up in the first place, and some of
     those who do will not complete it. Also the act of keeping a diary may lead
     some respondents to change habits, permanently or temporarily. A tedious
     task like this may not always be done fully. Coding of responses for analysis
     could be difficult and time-consuming, and analysis more complicated.

Ordinary Certificate, Paper I, 2006. Question 8

(i) Obviously for the youngest age group this is likely to be the best way of
    collecting information, but there are a number of difficulties. Some length of
    time will be needed for each child, perhaps a full day (or the equivalent such
    as a morning and another afternoon/evening). This could easily alter the
    child's pattern of activities. A sample of several children will be required to
    find out the full variety of activities in that group, and to discover whether
    rural and urban differ and in what ways. Observing just one child for a week,
    for example, would be a waste of the observer's time. Basic information could
    be collected from parents, for example whether the child is at school or, if
    relevant, nursery and play school attendance although observation in such
    activities is not encouraged for legal reasons. Certainly by the time a child is
    at school "full-time", detailed observation would be neither possible nor
    necessary. Observation would be limited to out-of-school hours and holidays.
    An observational study needs fully trained observers for long periods of time
    and will be very expensive. For the (say) 11–16 age group, a diary (see
    Question 7) might be sufficient.

(ii) (a) Depending on the access to the web in the region being studied, it is
probably better to concentrate on school-based data collection. For
those who have access at home, this could be added. Schools would
therefore have to agree to take part and to direct children to a web page
containing a questionnaire. The answers would vary in quality
according to whether the children took it seriously or not, and whether
they were supervised or not; supervision might lead to less-than-
honest answers for some questions.

For home use, pop-ups while browsing the web could be used. This
misses those whose equipment does not accept pop-ups and those who
only use the web occasionally for specific enquiries. Data quality by
this method must be in serious doubt.

 (b) As for any questionnaire, it will be necessary to check that the
 questions can be understood, and answered easily in the absence of an
interviewer. Children in the appropriate age group must be used in the
pilot, to test the wording, coverage of appropriate topics and ease of
access to the questionnaire. It is also necessary to examine the
interface between the questionnaire and the storage of answers. Open-
ended questions will need great care in automatic collection methods.
A "focus group" could perhaps be useful.

THE ROYAL STATISTICAL SOCIETY

2007 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

© RSS 2007

Ordinary Certificate, Paper I, 2007. Question 1

Dear Customer,

We value the feedback from our customers on the holidays they take with us. So we
hope you will spend a few moments answering the questions below. Please either tick
the appropriate box when there is one or write your answer in the space provided.
When you have completed the questionnaire, please hand it to your tour leader or, if
you prefer, post it when you get home to the FREEPOST address given at the end.

1a. CODE NUMBER OF HOLIDAY ________________

1b. DATES FROM _______________________ TO ______________________

2a. Are you    Male [ ]    Female [ ]

2b. Please indicate your age range

    18-24 [ ]    25-34 [ ]    35-44 [ ]
    45-54 [ ]    55-64 [ ]    65-74 [ ]    75+ [ ]

3. How satisfied were you with:

                                     VERY         ONLY JUST    NOT AT ALL
                                     SATISFIED    SATISFIED    SATISFIED
   (a) your accommodation?             [ ]          [ ]          [ ]
   (b) the programme of excursions?    [ ]          [ ]          [ ]
   (c) your tour leader?               [ ]          [ ]          [ ]

4. Would you recommend this holiday to other people?

   YES [ ]    NO [ ]

5. What did you like best about the holiday?
   _______________________________________________________________

6. What was the worst thing about the holiday?
   _______________________________________________________________

7. Please add here any other comments you would like to make
   _______________________________________________________________
   _______________________________________________________________
   _______________________________________________________________

Thank you for answering these questions. Please now hand this form to your tour
leader, or remember to post it to the FREEPOST address, XXX, FREEPOST 6789,
Townsname, YZ77 0XQ

[Note: most companies would include an undertaking not to pass on details to any
other companies when people's names are on the form, but in this case there is no
need to ask for a name.]

Ordinary Certificate, Paper I, 2007. Question 2

(i) A large number of possible descriptions exist and, if they are to be classified, a
    considerable number of completed questionnaires will have to be gathered first
    so that the coders have a good idea of the sorts of responses that have been
    made.

    Examples of responses include the following.
         Beach activities. Swimming, surfing, etc. Sport – watching or
         playing. Golf, cricket, tennis, winter (e.g. skiing) etc. Evening
         activities. Excursions: coach or rail journeys, cruises (educational or
         other), city/region tours, sightseeing and/or educational.
         Walking/hiking/climbing. Safaris.

 A coding system could either use groups of activities or individual activities
 having separate codes. If individual activities are each given a code, it might
be useful to code them 01, 02, ... in order of popularity as shown by the
questionnaires. A disadvantage of this is that similar activities would not have
adjacent code numbers. Another disadvantage is that the popularity rankings
might be found to change after the coding order has been laid down.
Alternatively all activities of the same basic type could have a group of
adjacent code numbers, e.g. sport – playing 11, watching 12, golf 13, cricket
14, tennis 15 and so on. However, with this system many responses would
need several codes (e.g. a response of "playing golf and watching cricket"
would need 11 13 12 14, with a database convention regarding the ordering);
an alternative would be to have a code of say 11 for playing golf and a
separate code of 12 for watching golf, similarly 13 and 14 for cricket, 15 and
16 for tennis, etc, but there would soon be very many codes. Either way, a
response of simply (say) "golf" does not indicate whether it refers to playing or
watching the sport. Any system of this kind would require a few basic types to
be identified first, e.g. 0 for beach/water, 1 for sport, 2 for travel, and so on.
Within the basic types those that turn out to be rare could be combined into the
same code ("other") – but, once this has been done, it is very difficult to
break the combined code down into its constituent parts again.

(ii) In closed form, a list of the more popular activities could be given, each
having a box to be ticked if the respondent liked doing that activity, while the
less popular ones could be combined into related activities such as
walking/hiking/climbing that would have just one box to tick.

Ordinary Certificate, Paper I, 2007. Question 3

(i) A quota sample should contain a given number (quota) of people in each of a
    set of categories. These categories are based on sex, age-group and any other
    characteristics of interest, such as in this case whether people are in a group or
    are travelling independently. Assume he has been told what each category
    consists of (e.g. "males over 60 travelling independently"), and how many
    residents he must interview from each category.

    He should walk round the hotel complex, visiting all its facilities at different
    times of the day, and also interview people in the restaurant and in their rooms
    if possible. There will be a few questions to ask first to identify which
    category a person is in, so that the manager knows whether he needs another
    member from that category or not. He should go on searching until all his
    quotas have been met – some people may not be easy to catch e.g. if they go
    on organised excursions most days.

    A systematic sample can be taken once it is known how many people are
    needed and how many registrations there are in that particular week. If n
    interviews are needed and N people have registered during the week, he can
    take every (N/n)th from the registration list, the starting point being chosen at
    random between 1 and N/n. [If most arrivals are in groups, the sampling
    fraction for independent travellers might need to be larger than that for groups
    to obtain adequate precision.]

(ii) The systematic method is easy to carry out, and should be effectively random
     (unless the "sampling interval" N/n unfortunately coincides with any cyclical
     pattern among the registrations). Everyone who has registered has the same
     chance of being selected (but this may need adjusting as mentioned above).
     Some may of course be difficult to locate, or may refuse, as in any survey. A
     quota sample ought to be representative of the population of residents but the
     interviewer may be selective in which people are approached – those who
     look less likely to refuse – and there is a danger of missing altogether some
     types of residents, e.g. those who breakfast early or dine late because they
     spend a lot of time out of the hotel. There is no theoretical support for quota
     sampling because there is no element of random selection in it.

Ordinary Certificate, Paper I, 2007. Question 4

(i) (A) is two-stage cluster sampling; (B) is stratified random sampling.

(ii) For cluster sampling (method A) to work well, the views of the patients in the
     sampled wards have to be representative of those in the hospital on the whole,
     i.e. in each chosen ward the variability in patients' views must reflect
     variability in the whole hospital.

     For stratified sampling (method B) to work well, the views of patients in
     sampled wards of a given type (O, SC or IC) should not vary much between
     wards of that type, but the types may differ noticeably – in which case results
     for each particular type are more useful than a single "overall" result.

(iii) (A) Advantages include: sample relatively quick to choose; require details
          of patients only for selected wards; less effort to visit only a few wards
          than the whole hospital.

          Disadvantages include: may not have all three types (O, SC, IC)
          represented adequately in the sample; variances of estimators tend to
          be high.

      (B) Advantages include: easy to get results for each type of ward;
          possibility of including different questions relevant to each type of
          ward; estimates are often more precise.

          Disadvantages include: takes time to select samples; takes time to
          visit several wards of each type; details are needed to trace individual
          patients.
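
The systematic selection described in Question 3 (i), taking every (N/n)th registration from a random start, can be sketched as follows. This is an illustration only: the registration list is made up, the function name `systematic_sample` is hypothetical, and the interval is taken as the whole-number part of N/n, which is one simple way of handling the case where N/n is not an integer.

```python
import random

def systematic_sample(items, n, rng=random):
    """Take every k-th item (k = N // n) starting from a random point in the first k."""
    N = len(items)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                   # sampling interval
    start = rng.randrange(k)     # 0-based random start within the first interval
    return [items[start + i * k] for i in range(n)]

# Hypothetical week with 120 registrations, numbered 1..120; 12 interviews wanted.
registrations = list(range(1, 121))
sample = systematic_sample(registrations, 12, random.Random(1))
print(sample)
```

Because only the start is random, a list ordered in a cycle that matches the interval would give a distorted sample, which is the caveat raised in part (ii).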
Ordinary Certificate, Paper I, 2007. Question 6

(i) Telephone interviews are relatively cheap – no travel costs, and hardly any
    wasted time through finding selected sample members are not at home or if the
    respondent says "call back later". Interviewers' performance is easily
    monitored as conversations will usually be recorded (it is important to make
    the respondents aware of this). Sometimes seeing an interviewer can put
    potential respondents off, whereas hearing may not do so. Answers to
    sensitive questions may be better in telephone interviews than face to face.

(ii) However, refusals may be more likely, especially if people have been
     interrupted at a busy or inconvenient time. Conversations must be kept short.
     Background noise and being overheard are possible problems. And not
     everyone is accessible by telephone.

Ordinary Certificate, Paper I, 2007. Question 5

(i) $N = \sum N_i = 85$. So the sample sizes $n_i$ have to be in the ratio
    $\frac{50}{85} : \frac{25}{85} : \frac{10}{85}$,
    multiplied by 36 to find the actual numbers. Thus we get 21.2 in ward A, 10.6
    in B and 4.2 in C, so we take 21, 11 and 4 respectively.

    The cost of this is (5 × 21) + (5 × 11) + (10 × 4) = 200 dollars.

(ii) The values of $\frac{N_i s_i}{\sqrt{c_i}}$ are: for A, $\frac{50 \times 1.81}{\sqrt{5}} = 40.47$;
     for B, $\frac{25 \times 3.23}{\sqrt{5}} = 36.11$; for C, $\frac{10 \times 2.18}{\sqrt{10}} = 6.89$.

     $\sum \frac{N_i s_i}{\sqrt{c_i}} = 83.47$, so the ratio $n_A : n_B : n_C$ is to be
     $\frac{40.47}{83.47}n : \frac{36.11}{83.47}n : \frac{6.89}{83.47}n$,
     where n is the total sample size.

     This gives 0.485n : 0.433n : 0.083n and the total cost is then

     (0.485 × 5n) + (0.433 × 5n) + (0.083 × 10n) = 5.42n.

     Hence we require 5.42n ≤ 200, or n ≤ 36.9.

     Taking n as 36.9 gives nA = 0.485 × 36.9 = 17.90; take nA = 18
                            nB = 0.433 × 36.9 = 15.98; take nB = 16
                            nC = 0.083 × 36.9 = 3.06; take nC = 3.

The total cost will then be (5 × 18) + (5 × 16) + (10 × 3) = 200 dollars, so this
is satisfactory.

[Note: because the sample size for the more expensive ward (C) is rounded
down to 3, we save just enough to take n as 37. Otherwise we would need to
consider n = 36 and find suitable nA, nB, nC.]

(ii) The main difference is in B, where the second method gives a larger sample
size. Since B is the most variable ward, this should lead to more precise
overall results. A sample as large as 21 in A, as in the first method, is rather
wasteful, although C does have 4 on that scheme but only 3 on the second
scheme. On balance, the second scheme is likely to be preferred.
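
The two allocations worked out in Question 5 (proportional allocation for a sample of 36, and the cost-based allocation with sizes proportional to $N_i s_i / \sqrt{c_i}$ under the 200-dollar budget) can be checked with a short Python sketch. The ward sizes, standard deviations and unit costs are those used in the solution; the function and variable names are hypothetical, and because the code carries full precision its intermediate figures differ slightly from the rounded values 0.485, 0.433 and 0.083 used above.

```python
from math import sqrt

# Ward data used in Question 5: (N_i, s_i, c_i) = size, SD, cost per patient in dollars.
wards = {"A": (50, 1.81, 5), "B": (25, 3.23, 5), "C": (10, 2.18, 10)}
budget = 200


def proportional(wards, n):
    """Sample sizes proportional to ward sizes N_i, for total sample size n."""
    total = sum(N for N, _, _ in wards.values())
    return {w: n * N / total for w, (N, _, _) in wards.items()}


def cost_optimal(wards, budget):
    """Sample sizes proportional to N_i * s_i / sqrt(c_i), scaled to use the budget."""
    weights = {w: N * s / sqrt(c) for w, (N, s, c) in wards.items()}
    total_weight = sum(weights.values())
    shares = {w: v / total_weight for w, v in weights.items()}
    # Cost of the whole sample per unit of n (about 5.4 dollars here).
    cost_per_n = sum(shares[w] * wards[w][2] for w in wards)
    n = budget / cost_per_n
    return n, {w: shares[w] * n for w in wards}


def total_cost(sizes, wards):
    """Total cost in dollars of taking sizes[w] patients from each ward w."""
    return sum(sizes[w] * wards[w][2] for w in wards)


prop = {w: round(x) for w, x in proportional(wards, 36).items()}
n_max, exact = cost_optimal(wards, budget)
opt = {w: round(x) for w, x in exact.items()}

print(prop, total_cost(prop, wards))   # rounds to 21, 11, 4 at a cost of 200
print(opt, total_cost(opt, wards))     # rounds to 18, 16, 3 at a cost of 200
```

Simple rounding happens to meet the budget exactly in both cases here; with other data the rounded sizes would need to be checked against the budget, as the note in part (ii) points out.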

Ordinary Certificate, Paper I, 2007. Question 7

(i) A longitudinal study follows the same group of people through the whole
period of the study. One advantage is that a sample will be relatively easy to
choose from the list of last year's graduates, which is likely to be complete and
very nearly fully up-to-date in terms of addresses etc, and thus provides a good
sampling frame. Recent graduates are likely to be interested in responding, so
there should be a good response rate at least initially – though this may fall off
over time. It is useful to be able to follow a group through an extended period
of time, noting changes in occupations and reasons for them.

A disadvantage is that occupations which are recruiting in one year may not be
recruiting every year, so the pattern of jobs that are obtained immediately after
graduation may change year by year. Another disadvantage is that contact
may be lost as people change jobs or move to other addresses. People may
also lose interest in the survey, leading to problems of reduced sample size and
possible non-response bias. On the other hand, it is even possible that
participation in the survey, with the feeling of being "watched", might make
some respondents change some aspects of their occupations in the hope of
creating a good impression. A further disadvantage is that the study must
obviously take ten years (plus time for analysis) before full results are
available.

(ii) A major advantage of using samples from graduates of five and ten years ago
as well as current new graduates is timeliness. Results will immediately be
available for people in varying stages of their careers and with different
experiences of the initial jobs market.

However, unless the university has kept a good database of its graduates, with
sufficient contact to keep addresses etc up to date, the sample frames available
at five and ten years will not be as good as the recent one. If contact has not
been reasonably regular, even those who do receive an inquiry may not be
very interested in responding. These disadvantages would make it wise to
select a larger sample to allow for non-response, but this would increase the
cost of the survey. Another disadvantage is that it would not be possible to
find out how the careers of individual graduates had changed over time (unless
the respondents were also asked to record their progress throughout five or ten
years – this could be done if it was thought appropriate).

Ordinary Certificate, Paper I, 2007. Question 8

(i) (1) People who respond might be systematically different from those who
do not. For example, they might be very interested in the topic of the
survey while most other people are not, or they might hold strong
views which in no way represent those of the whole population. This
can bias results very seriously.

(2) When some of the selected people fail to respond, the achieved sample
size becomes smaller than was planned and so the results have lower
precision than was aimed for. This can be very serious if there are
many non-respondents.

(3) Following up non-response (as opposed to outright refusal) can be done
by telephone or by having an interviewer visit people, but this costs
resources (time and money) and may not be possible in a short time-scale
when results are needed quickly.

(ii) Reasons include the following.

(1) The available address is no longer the correct one, as the person has
moved.

(2) The questionnaire may not get delivered.

(3) The questionnaire may be regarded as junk mail and destroyed.

(4) The intended respondent may have died.

(5) A questionnaire may not reach the survey organiser even if it is
returned.

(6) People may be too busy to reply, or not interested, or simply set the
survey aside until they are less busy – when it is too late.

It may be possible to improve the look of a questionnaire to make it seem
more worthwhile answering; to remind people (also by post) at intervals; to
send a pre-paid reply envelope with the questionnaire; to offer gifts or
inducements, such as taking part in a draw, as an incentive to reply; to take
care that the introductory explanation to the questionnaire makes it seem
interesting and relevant, and is expressed in simple form rather than unduly
"official".

Also selected non-respondents could be visited or telephoned.


THE ROYAL STATISTICAL SOCIETY

2008 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

© RSS 2008

Ordinary Certificate, Paper I, 2008. Question 1

The letter should be on the organisation's headed notepaper.

We would like to know whether the monthly newsletter meets our members' needs.
You have been chosen by a random selection method as one of the members to
approach for opinions. We hope that you will complete the short enclosed
questionnaire. Your responses will be kept confidential and results of the survey will
be published in aggregate form only.

Many questions can be answered by simply ticking the relevant response.

1. In which year did you join the organisation?

2. Are you: male [ ] female [ ]

3. Please tick the box showing your main employment status.

    student
    employed part-time
    employed full-time
    unemployed
    retired

4. Do you read everything in the newsletter? Yes No

5. What do you do with the newsletter when you have finished with it?

    Keep it for a while
    Pass it on to someone else
    Throw it away

6. We are particularly interested in knowing members' opinions of the
special features in the newsletter. In general, do you think:

    There are too many
    The number is about right
    There are too few

In general do you find them:

    Of no interest
    A few are of interest
    Most or all of them are of interest

7. Do you think that advertisements should be included?
Yes No No opinion

8. Do you agree that publishing a newsletter at intervals of one month is
about right?
Yes No
If you answered No, at what interval would you like the newsletter to
be published?

……………………………………………………….

9. Please rate your satisfaction with the newsletter on a scale of 1 to 5,
where 1 is not at all satisfied and 5 is extremely satisfied.

1 2 3 4 5

Thank you very much for completing this questionnaire. Please return it in the
enclosed reply-paid envelope.

Ordinary Certificate, Paper I, 2008. Question 2

Advantages of including a questionnaire as an insert in the newsletter include:

    There are no costs of selecting a sample, addressing envelopes etc, or separate
      postage costs.
    The questionnaire is sent to all members so potentially every member could
      give an opinion of the newsletter.

Disadvantages include:

    Could be time consuming/costly to insert a questionnaire in every newsletter.
    The extra weight of the questionnaire might increase the postage costs.
    Unless the mailing list of members is up to date, some newsletters will not be
      delivered (and this may not be detected). [The mailing list is likely to be at its
      most up to date fairly early in the organisation's financial year soon after
      membership subscriptions have become due and been paid; but it is still very
      likely that there will be errors and omissions.]
    Some members might postpone opening the newsletter if they are busy, or
      might never open it.
    The response rate is likely to be low as people tend to ignore inserts in
      publications.
    Difficult/costly to follow-up non-response and could only be done if names or
      other identifiers were requested in the questionnaire.
    There is likely to be a respondent bias towards, for example, those who have
      strong views about the newsletter and those who have time on their hands.

Advantages of sending the questionnaire separately to a random sample of members
include:

    Members might be more likely to open the envelope containing a
      questionnaire than to open one which they recognise as containing the
      newsletter.
    It is more likely that members will respond to a personal request than to a
      questionnaire sent as an insert in the newsletter.
    It will be easier and cheaper to follow-up non-response even if this means
      sending reminder letters to everyone if responses are anonymous.
    A random sample of members should give a representative sample of members
      covering all types of members and opinions, provided non-response is
      minimal.

Disadvantages include:

    The financial and time costs of selecting the sample, addressing and stuffing
      envelopes.
    Costs associated with following up non-response.

Ordinary Certificate, Paper I, 2008. Question 3

(i) Systematic sample from the alphabetical list of members.
This should achieve a sample close to simple random. The proportions in
the sample in the three grades of membership should be similar to the actual
proportions in the entire membership, and likewise there should be a good
spread across the years of joining the organisation. An example of a
potential problem is that, if the number in any grade is very small (e.g. there
are only a few student members), this method might not select any
members at all from that grade.

Systematic sample from the list ordered by year of joining the organisation.
This would achieve a good sample across time, with numbers from different
periods of joining represented in much the same proportions as in the entire
membership. This is important for picking up any trends over time. Year
of joining is likely to be related to some extent to grade of membership,
perhaps with those who joined a long time ago more likely to be retired and
those who joined recently more likely to be students, and this method
would thus also sample the three grades in roughly the same proportions as
in the entire membership. However, if very few joined in some years for
some reason important to the organisation, such as a sharp rise in
subscriptions, then this sampling method might not sample that group at all.

(ii) As pointed out above, grade of membership and year of joining are likely to be
related, so there might not be much advantage to be gained by systematic
sampling from the list reordered by grade rather than by year of joining.
However, this should ensure that each grade is correctly represented in the
sample, and this might therefore be preferable to the other methods, depending
on the purpose of the survey. [If the grade information is readily available in
this way, it might be desirable to stratify by grade.]

Ordinary Certificate, Paper I, 2008. Question 4

(i) Total of 1162 members.

Need (141/1162) × 95 = 0.1213 × 95 = 11.53 student members, say 12.

Need (782/1162) × 95 = 0.6730 × 95 = 63.9 ordinary members, say 64.

Need (239/1162) × 95 = 0.2057 × 95 = 19.54 retired members, say 20.

Note that 12 + 64 + 20 = 96 but there is sufficient budget only for 95
altogether. So take one fewer in one of the groups, say 11 student members
(as 11.53 is very slightly further from 12 than 19.54 is from 20 – but there is
very little in it, and this does depend on the calculations being worked to 2
decimal places).

(ii)

Membership grade   $N_i$    $s_i$     $N_i s_i$    $N_i s_i / \sum N_i s_i$   $(N_i s_i / \sum N_i s_i) \times 95$
Student             141    226.21     31895.61            0.0597                        5.67
Ordinary            782    550.12    430193.84            0.8046                       76.44
Retired             239    303.60     72560.40            0.1357                       12.89

($\sum N_i s_i$ = 534649.85)

This suggests 6 student members, 76 ordinary members and 13 retired
members.

6 + 76 + 13 = 95, so the budget constraint is satisfied.

(iii) The method described in part (ii) assumes that there has been little change in
the SDs of expenditure over the two years. If however the SDs now are
substantially different, the sample sizes found in part (ii) will not be optimum.
In contrast the sampling method in (i) uses only the membership numbers
which can be assumed to still represent the groups, at least proportionally, and
this could seem fairer to members.

Note in particular that in (ii) only 6 student members are selected. If non-
response is high in this group, as is likely, then the achieved sample might be
very small indeed.
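
The arithmetic in Question 4 parts (i) and (ii) can be reproduced with the Python sketch below. The membership numbers and standard deviations are those in the table; the function names are hypothetical. One assumption is labelled loudly: the solution adjusts the rounded sizes by judgement so that they total 95, whereas the code uses a largest-remainder rule (round everything down, then give the spare places to the largest fractional parts). That rule is a substitute for the judgement step, not the solution's stated method, although it gives the same sizes for these data.

```python
from math import floor

# Membership data from the table: grade -> (N_i, s_i).
grades = {
    "Student": (141, 226.21),
    "Ordinary": (782, 550.12),
    "Retired": (239, 303.60),
}
n = 95  # total sample size allowed by the budget


def allocate(weights, n):
    """Exact (unrounded) sample sizes proportional to the given weights."""
    total = sum(weights.values())
    return {g: n * w / total for g, w in weights.items()}


def round_to_total(exact, n):
    """Largest-remainder rounding (an assumption, see lead-in): sizes sum to n."""
    sizes = {g: floor(x) for g, x in exact.items()}
    spare = n - sum(sizes.values())
    # Grades ordered by fractional part, largest first.
    order = sorted(exact, key=lambda g: exact[g] - sizes[g], reverse=True)
    for g in order[:spare]:
        sizes[g] += 1
    return sizes


# Part (i): proportional allocation, weights N_i.
proportional = round_to_total(allocate({g: N for g, (N, s) in grades.items()}, n), n)

# Part (ii): optimal allocation with equal costs, weights N_i * s_i.
optimal = round_to_total(allocate({g: N * s for g, (N, s) in grades.items()}, n), n)

print(proportional)  # Student 11, Ordinary 64, Retired 20
print(optimal)       # Student 6, Ordinary 76, Retired 13
```

Rerunning the sketch with updated standard deviations would show how sensitive the allocation in part (ii) is to the assumption discussed in part (iii).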

Ordinary Certificate, Paper I, 2008. Question 5

(i) A pilot survey is a small sample survey carried out at the planning stage of a
full scale census or survey. It is used to test the design of the questionnaire,
the sample frame and the general administrative procedures. It provides
estimates of the costs and response rates, and of the variances of measured
variables. It can lead to improvements, technical and/or procedural, in the
conduct of the full census or survey.

A sample survey is, as the name suggests, a survey of a sample of a
population. The sample would usually be selected by a random procedure and
would be considerably larger than the sample used in a pilot survey, but
normally considerably smaller than the population. It would be done when it
is too costly both financially and in terms of time to survey the whole
population. For fixed resources, information can be obtained in greater depth
in a sample survey than in a census, and quality control can be more stringent.

A census is strictly a complete count of a population, but is taken more
generally to mean a survey of the complete population. It is done when it is
important to get information about every member of the population and to
provide benchmark figures with which later figures can be compared. A
census might also be appropriate when the population is very small so that
little is to be gained by sampling it, especially if there is large variability in the
variables of interest.

(ii) Results from a census might differ from the population figures if questions are
misunderstood, or if the responses given to questions are incorrect – either
because the respondent has answered incorrectly (or not at all) or because
responses have been recorded incorrectly or copied incorrectly into a database.
Results might also differ because some particular groups in the population are
hard to reach and so, in practice, these groups turn out to be under-represented
even though it was meant to be a complete census.

Examples will depend on candidates' experience. In the UK Population
Census, the homeless and young professionals are found to be hard to reach
(the former for obvious reasons, the latter often because they work very long
hours and are thus only rarely at home to be interviewed). In rural
communities, several groups of workers may be hard to locate, and others may
be itinerant.

Ordinary Certificate, Paper I, 2008. Question 6

(i) A region can be considered to be a cluster of supermarkets. Cluster sampling
could consist of selecting one or more (but not all) regions at random. All
supermarkets in these regions would then be in the sample of supermarkets
(1-stage cluster sampling) or a sample from the supermarkets in these regions
could be taken (2-stage cluster sampling).

An advantage of this method is that interviewers would need to be employed
in the selected regions only, saving on travel and administrative time and
costs.

Disadvantages include: the supermarkets in the chosen region(s) might be
atypical; the variances of estimators in this method of sampling tend to be
higher than in stratified and simple random sampling; the estimation method
is more complex.

(ii) The supermarkets could be stratified into the three size groups, and a random
sample of supermarkets taken from each group.

Advantages of this method include: all three sizes of supermarket will be
included; different sampling methods and questionnaires could be used in
each group if necessary as the types of customer could be very different in the
three types of supermarket; it is easy to produce estimates for each group;
variances tend to be lower under stratified sampling than under cluster
sampling.

A disadvantage is that the supermarkets could be scattered widely
geographically, so that interviewer costs may be high.
Ordinary Certificate, Paper I, 2008. Question 7

(i) As the interviewers are to select the customers and interview them at the time
of selection, quota sampling would be an appropriate sampling method.

Interviewers should go to the stores at varying times during the day and week.
They should aim to interview customers of all ages and of both sexes, in about
the same proportions as customers using the supermarket. Observing
customers at check-outs would give an indication of these proportions, as
might asking staff (but staff will not necessarily see the whole range of
customers; for example, staff who work only in the evenings are likely to see
mainly people who are employed during the day and who do not have children
under 16).

Different types of customer should be approached – both those shopping on
their own and those who are with others, both those who are doing a big shop
and those who have come in for one or two items only. Customers should be
selected taking no account of their dress or demeanour, that is interviewers
should avoid approaching only those who look respectable, who look as if they
might be sympathetic to answering questions, who appear not to be in a hurry,
etc. Customers could be selected from those in the check-out queue, from
those leaving the store, or by walking round the store. Standing by one display
in the store would not be sensible as not all customers will pass it.

Interviewers should dress moderately, approach customers politely, etc. They
should not try to persuade customers to respond against their will.

(ii) Interviewers should treat the customers with respect. They should ask the
questions and read any introduction and closing note of thanks exactly as
written on the questionnaire, and should use a neutral tone of voice to avoid
biasing the customers' answers. They should not comment on or react in any
way to the customers' responses. They should record the responses that are
given and not make them up or expand on comments. They might (discreetly)
record specific observations as regards customers, such as if a customer was
hard of hearing leading to communication difficulties.

Ordinary Certificate, Paper I, 2008. Question 8

The observers could stand by the displays of organic fruit and vegetables with a
pre-prepared check-list to be completed by ticking boxes indicating characteristics such as
sex and estimated age group for customers who put such produce in their baskets or
trolleys, and other boxes indicating what organic fruit and vegetables are taken and in
what kinds of quantity. However, there might be a problem if fruit and vegetable
displays are not very close to each other, in which case individual customers might
need to be followed round the displays to collect any useful information.

Alternatively, or as well, the observers could stand by the checkouts to make similar
observations.

Observations would need to be made at different times of day and on different days of
the week.

Advantages of an observational study in this context over the use of a questionnaire
include:

    It does not take up customers' time.
    It does not rely on customers answering questions truthfully.
    Customers do not have to be approached and persuaded to take part.

Disadvantages include:

    It is only possible to observe whether organic fruit and vegetables are bought
      by a customer at the time of observation. No information can be gained about
      customers who buy such produce at other times, or who do not buy because
      they cannot find what they are looking for or think it too expensive.
    No information can be obtained about customer characteristics that cannot be
      observed, such as household size, place of residence and occupation.
    The results depend very much on the ability of interviewers to decide which
      customers to observe and their ability to make correct records.

[Note. Possible alternative survey methods include (short and simple) interviews of
customers in check-out queues. Excellent information on products bought should be
available from the data base holding information from the check-out tills.]

THE ROYAL STATISTICAL SOCIETY

2009 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

© RSS 2009

Ordinary Certificate, Paper I, 2009. Question 1

(i) Several methods are possible. For example:

Could take a random sample of regions at the first stage, each region being a
cluster of administrative districts (cluster sampling). Could take a simple
random sample of districts in selected regions at the second stage and then
contact all adults listed in these districts (no further sampling). Note: the
districts can be considered to be clusters of adults so this could be argued to be
two-stage cluster sampling.

Consider the administrative districts as lying in strata classified as urban and
rural, possibly also with regions as strata. As the first stage of sampling take a
simple random sample of districts from each stratum (stratified sampling). At
the second stage of sampling take systematic samples of adults from the lists
in the districts in the sample.

(ii) Comments on benefits and drawbacks may depend to some extent on the
sampling designs suggested in part (i), but are likely to include the following.

Cluster sampling benefits: administratively easier than other methods; less
travel if interviewers are used. Cluster sampling drawbacks: districts within a
cluster might be similar to one another; clusters chosen might be atypical of
districts in the country as a whole; estimators are complicated.

Simple random sampling benefits: estimation is easy; properties of estimators
are well understood. Drawbacks: moderately complicated to select a sample;
extreme/atypical samples might occur.

Stratified sampling benefits: ensures both rural and urban districts (and all
regions if used as strata) are represented in the survey. Drawbacks: involves
stratifying districts as a first step; can lead to reduced precision if the
stratification is poorly done.

Systematic sampling benefits: easy to implement; ensures good
representation of adults by address. Drawbacks: might not behave as a simple
random sample if there are cycles in the lists; cannot estimate variability.

(iii) To obtain a sample of all adults would need supplementation of the sample of
adults in private households by sampling adults living in institutions such as
hostels, residential homes and prisons. Care would be needed to ensure there
was no duplication, for example somebody listed as in a private household six
months ago might now be in prison. Obtaining access to suitable lists might
be difficult due to confidentiality.
Ordinary Certificate, Paper I, 2009. Question 2

Instructions to interviewers for the start of the interview may be as follows. Other
instructions are inserted in the questionnaire.

When you get a reply from the address at which you have been asked to call,
show your identity card and say the following: "Good morning (or afternoon,
evening, as appropriate). I work for XXX and have been asked to interview
ABC whose address I believe this is. Is ABC in?" If the answer is "No", try
to find out when ABC might be in and arrange to call back. If the answer is
"Yes", ask to see this person unless you are already speaking with him/her.

When speaking with ABC, or if asked by the person answering the door, say
"As you probably know" (or if person is not ABC "might know") "from a
letter that you have (or ABC has) been sent, the government is interested in
finding out adults' perceptions of crime and has asked my organisation to
undertake a survey on this topic. You have (or ABC has) been selected by a
random process as a person who could help, and we would be very grateful if
you (or ABC) would answer some questions. This will take at most half an
hour."

If ABC agrees, say "Thank you" and then start asking the questions below in
order and record the answers, ticking boxes where appropriate. If ABC
refuses, apologise for taking his/her time and try to arrange an alternative time
of interview.

Q1. Do you think that crime is a problem in your neighbourhood?
Yes No Don't know Did not answer

Q2. Compared with five years ago, do you think the level of crime in your neighbourhood
is now
    Lower?
    About the same?
    Higher?
    Don't know or did not answer

Q3. Compared with five years ago, do you think the level of crime in the country as a
whole is now
    Lower?
    About the same?
    Higher?
    Don't know or did not answer

Q4. Do you think that violent crimes are on the increase in the country as a whole?
Yes No Don't know Did not answer

Q5. Violent crimes are sometimes committed by those under 16 years of age. What
are your feelings about this?

[Interviewer: record verbatim as far as possible.]

Interviewer: now say "To put your replies in context, I am now going to ask you
some questions about yourself. It will not be possible to identify you in any way from
the published results of the survey which will be as tables."

Q6. Are you
    Married or in a civil partnership?
    Divorced or separated?
    Living with a partner?
    Single and not living with a partner?

Q7. What is your occupation? …………………………………………………………….

Q8. What is your age group? (Interviewer: show card and record answer.)
    18 – 24
    25 – 44
    45 – 64
    65 – 79
    80 and over

Interviewer, now say the following: "That is the end of the questions. Thank you
very much for taking the time to answer them. Is there anything you would like to ask
me or anything else that you would like to add?" Record responses. Answer any
questions if you are able to do so, or say you will try to find out the answers and that
someone will get back to the interviewee. Record the action. Record anything
interviewee adds.

Record the sex of the respondent. This is a check as the name of ABC will usually
reveal this.

Male Female Not clear

Return completed forms to the office as requested.

Ordinary Certificate, Paper I, 2009. Question 3

As the topic is a sensitive one, self-completion questionnaires (no interviewer present)
are more likely to elicit honest responses than telephone interviews. The respondent
would have time to reflect on the answers and to remember details that might have
been suppressed. The questionnaire could use closed questions and examples to
prompt the respondent's memory and ensure that the respondent knew what kinds of
incidents constituted crimes. In addition, not everyone will have a telephone; in a
telephone call it could be difficult for the interviewer to establish rapport with the
respondent; identification details cannot be shown; more concentration is needed in a
telephone interview and respondents are likely to become tired if the interview is
long; it is not easy for the respondent to look up facts; there could be background
noise and/or the chance of others overhearing the interview.

Advantages of telephone interviews compared with self-completion questionnaires are
that people often keep the same telephone number when they move house; the
researcher will know quickly whether or not the selected sample member is willing to
respond, so there is no time and money spent on following up non-respondents; and it
is usually possible to get a reply from a telephone number even if it is only an
answer-phone.

Ordinary Certificate, Paper I, 2009. Question 4

(i) Advantages of quota sampling over simple random sampling.

    It does not require a sampling frame, so it is useful when no suitable frame
      exists.
    It is quick to do as interviewers are not constrained to find named respondents,
      so there are no call-backs.
    Hardly any preparatory work in the office is required.
    Controls, such as limiting the numbers of men and of women to interview, are
      relatively easy to use.
    The costs of planning and analysis are less than in random methods with
      call-backs.
    A target sample size can usually be achieved.

(ii) Disadvantages of quota sampling compared with random sampling.

    There is no theoretical method of assessing sampling variability (but there are
      methods of estimating variances).
    Control of fieldwork is more difficult than in random methods. Undetected
      errors are a hidden cost.
    Interviewers could be biased in their choice of sample members. For example,
      they might select people of a similar type to one another.
    Interviewers might misclassify people in order to fill their quotas (or, indeed,
      accidentally).
    Some groups might have no chance of being chosen (depending on when and
      where the fieldwork is done).
    Refusal rates tend to be higher than in random methods, and this might
      introduce substantial bias.
Ordinary Certificate, Paper I, 2009. Question 5

(i) The sampling fraction is 150/750 = 1/5.

For "budget", 377/5 = 75.4; for "standard", 303/5 = 60.6; for "de luxe", 70/5
= 14.

This suggests taking sample sizes of 75, 61 and 14 from those who have
booked budget, standard and de luxe cabins respectively.

(ii)

$N_h$    $s_h$     $N_h s_h$    $N_h s_h / \sum (N_h s_h)$   $\{N_h s_h / \sum (N_h s_h)\} \times 150$
377      8850      3336450            0.4387                          65.8
303     13005      3940515            0.5181                          77.7
 70      4695       328650            0.0432                           6.4
750                7605615

This suggests taking sample sizes of 66, 78 and 6 from those who have booked
budget, standard and de luxe cabins respectively.

Ordinary Certificate, Paper I, 2009. Question 6

The main sources of error in an interviewer survey of a random sample of cruise
passengers drawn from passenger lists are sampling error, measurement error,
non-response, interviewer effects and processing errors.

Sampling error occurs because only a sample of passengers is selected. If the method
is random there would be no sampling biases, but this does depend on having a good
sampling frame. If there are problems with the frame such as duplication of names or
omissions of names then sampling biases will occur.

Measurement error might be due to problems with the questionnaire, for example if a
question is worded in such a way that it measures something different from what was
intended. Respondents do not necessarily give true answers to questions, either
deliberately or because they do not know or have forgotten details. However, if the
responses resemble the truth, measurement error from this source will be small.
Measurement error also occurs when interviewers record responses incorrectly. This
might be because they did not hear a response properly, but might also be a
transcription error, or even deliberate. Measurement error can also occur during
processing.

Non-response would occur if the passenger refuses to be interviewed or is unavailable


(iii) The sample size from those who have booked de luxe cabins is very small, for interview. Partial non-response would occur if the response was missing on some
particularly in the method of part (ii). If there is high non-response from this questions. Views of non-respondents might differ markedly from those of
group, the achieved sample size could be negligible, leading to poor respondents and, unless other information was available, there would be no way of
representation of this group and high standard errors. knowing whether this was the case.
The method of part (ii) depends on standard deviations found from a past Interviewer bias is when an interviewer influences the response in some way. This
survey and from a different group (those booking with this travel firm are not might occur because of the relationship between the interviewer and respondent, or
necessarily similar to a wider population of those who go on cruises, many of because of the way the interviewer asks the questions, or interprets the responses.
whom might not have booked with this firm). If these standard deviations are The general looks and demeanour of the interviewer can also have an effect.
very different from those for the group in question, then the sample sizes of
(ii) will not be optimal for estimation of the mean annual income of those who Processing errors occur if a wrong estimation method is used, in particular if
booked with this firm. inappropriate weights are used. Incorrect entry of responses into a database might
also be regarded as a processing error.
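
Returning to Question 5, the two allocations in parts (i) and (ii) can be checked with
a short Python sketch. The variable and function names are illustrative only. Part (ii)
is taken to be the usual optimal (Neyman) allocation, in which the sample size for a
stratum is proportional to Nh Sh, as in the table of that solution.

```python
# Stratum sizes N_h and past-survey standard deviations S_h from Question 5.
strata = {
    "budget": (377, 8850),
    "standard": (303, 13005),
    "de luxe": (70, 4695),
}
n = 150  # total sample size required

# (i) Proportional allocation: n_h = n * N_h / N
N = sum(N_h for N_h, _ in strata.values())
proportional = {name: n * N_h / N for name, (N_h, _) in strata.items()}

# (ii) Optimal (Neyman) allocation: n_h = n * N_h * S_h / sum(N_h * S_h)
total_NS = sum(N_h * S_h for N_h, S_h in strata.values())
neyman = {name: n * N_h * S_h / total_NS for name, (N_h, S_h) in strata.items()}


def round_allocation(allocation):
    """Round each stratum sample size to the nearest whole number."""
    return {name: round(size) for name, size in allocation.items()}


print(round_allocation(proportional))
print(round_allocation(neyman))
```

Rounding stratum by stratum happens to give a total of 150 in both cases here; in
general the rounded sizes may need a small adjustment to hit the target total.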

Ordinary Certificate, Paper I, 2009. Question 7

In a question in open format (often referred to as an open-ended question), the
respondent is given no suggestions of possible answers. In questions in closed format,
alternative responses are given. In a closed question with a single answer, the
alternatives are mutually exclusive and the respondent is asked to choose one. In a
closed question with multiple answers, the respondent is asked to choose as many as
apply. Sometimes an "Other, please state" option is given in a closed question.

An example of an open-ended question is "How many cruises have you gone on
previously?"

An example of a closed question with multiple answers is

"Which of the following newspapers do you read at least once a week? Please
select as many as apply."

The Daily Telegraph
The Guardian
The Independent
The Daily Mirror
The Sun
Other – please specify

An example of a closed form question with a single answer is

"Do you hope to go on another cruise within the next two years?"
Yes
No
Don't know

Ordinary Certificate, Paper I, 2009. Question 8

(i) The amount of money taken each month is known as the value and is equal to
price per item times number of items sold. Clearly data suitable for
monitoring changes in the value per month would consist of prices and
quantities sold of a sample of the different magazines and newspapers. The
quantities could be obtained from records of stock coming in and subtracting
the amounts unsold and ultimately removed from sale.

Decisions would need to be made as to which publications and which outlets
to include. A decision would also be needed as to the definition of a month, as
calendar months vary in length. A decision is also needed as to when to
collect the prices as there might be variations throughout the month; for
example, there might be reductions in prices of out of date publications.
Further, a decision is needed as to when to collect quantities. The last day of
each month might be suitable.

(ii) A simple way to summarise the information is to estimate a total value or an
average, or alternatively to form an index relating the value each month to the
value at a fixed point in time. All of these summaries [only one was required
from candidates in the examination] involve summing over a number of
different magazines and newspapers.

To update the summary, prices and quantities would need to be collected every
month and in a similar way as was done initially. However, changes of buying
habits and of what is published might need to be considered (in other words,
the "basket" of goods might need to be changed if the monitoring took place
over a long period of time).

(iii) Many sampling schemes are possible. As it is a chain of newsagents, taking a
sample of outlets, perhaps stratified by area of the country, might be sensible;
though as prices of magazines and newspapers tend to be fixed, a relatively
small sample should suffice. What will vary is the publications stocked in
different outlets, for example one in a business district is more likely to stock
the Financial Times and less likely to stock Playboy.

Sampling of newspapers and magazines is desirable. These might be stratified
by type; for example, magazines might be put into such groups as women's,
professional, gardening and so on. Stratification by frequency and time of
publication is also needed, probably as daily, weekly and monthly. Simple
random samples from the different strata could be taken. An element of
purposive sampling might also be done if, say, some publications tended to be
price leaders.
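
As a rough illustration of the value index mentioned in part (ii) of Question 8, the
Python sketch below computes the total monthly value as the sum of price times
quantity over a sample of publications, and expresses the current month relative to a
base month taken as 100. The publications, prices and quantities are invented for
illustration and do not come from the examination question.

```python
# Hypothetical data: publication -> (price in pounds, quantity sold in the month).
base_month = {
    "Daily paper": (0.90, 1200),
    "Weekly magazine": (2.50, 300),
    "Monthly magazine": (4.00, 100),
}
current_month = {
    "Daily paper": (1.00, 1100),
    "Weekly magazine": (2.60, 320),
    "Monthly magazine": (4.20, 90),
}


def total_value(month):
    """Total value = sum over publications of price times quantity sold."""
    return sum(price * quantity for price, quantity in month.values())


def value_index(current, base):
    """Value of the current month relative to the base month (base = 100)."""
    return 100 * total_value(current) / total_value(base)


print(round(total_value(base_month), 2))
print(round(value_index(current_month, base_month), 1))
```

If the "basket" of publications changes over time, the same publications would need
to appear in both months for the comparison to remain meaningful.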
THE ROYAL STATISTICAL SOCIETY

2010 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

The Society provides these solutions to assist candidates preparing for the
examinations in future years and for the information of any other persons using the
examinations.

The solutions should NOT be seen as "model answers". Rather, they have been
written out in considerable detail and are intended as learning aids.

Users of the solutions should always be aware that in many cases there are valid
alternative methods. Also, in the many cases where discussion is called for, there may
be other valid points that could be made.

While every care has been taken with the preparation of these solutions, the Society
will not be responsible for any errors or omissions.

The Society will not enter into any correspondence in respect of these solutions.

© RSS 2010

Ordinary Certificate, Paper I, 2010. Question 1

(i) Selection bias is bias due to the method of selecting the sample and arises
when the members selected are in some way consistently atypical of the study
population. It would result in estimates of population quantities that are
systematically too low or too high.

Response rate is the proportion of those selected to take part in the survey who
provide a reply. A low response rate could produce poor estimates of
population quantities as those who respond might be atypical and not
representative of the population, even if there was no selection bias when the
sample was taken. Standard errors are also likely to be high, so precision of
estimates will be low.

(ii) In method A, there is no sampling scheme as such. Only those with a
particular interest in the restaurant and with time to spare might reply,
introducing both selection bias and a low response rate. These are both
disadvantages. On the other hand, potentially anyone who approaches the pay
desk could respond, which is an advantage, as is the publicity about the
survey.

In method B, tables are selected rather than people, but there could well be
more than one customer at a table. If the decision as to whom to interview at a
table is left to the interviewer, there could be selection bias due to the
interviewer. There could also be problems arising from the time of day (or
day of the week), as the restaurant is likely to be much busier at some times
than others. The response rate will depend on people's willingness to respond,
and this might be low as they might not wish to be interviewed while, or just
after, eating. The advantages of the method are the element of randomness
involved and that a personal approach has the potential to increase the sample
size compared with method A. The personal approach might also enable
deeper questioning to be carried out ("probing").

Ordinary Certificate, Paper I, 2010. Question 2

(i) The regions (which consist of small and large urban areas) are considered as
clusters. The first stage is to take a simple random sample of clusters. This is
cluster sampling. Having selected these clusters, the second stage should
consist of stratified sampling, with each selected cluster stratified into small
and large urban areas; it may be useful to further subdivide the large urban
areas into those with (say) 2 or 3 outlets and those with 4 or 5 outlets. The
stratified sampling would be conducted by simple random sampling within the
strata, perhaps using proportional allocation. It is common practice that the
selected sample of clusters contains only a small number of them (sometimes
only one), and sometimes complete enumeration is then carried out within
each selected cluster. This relies, of course, on each of the clusters being
representative of the population as a whole.

(ii) The tables need to be numbered. One method is to choose a simple random
sample of tables and ask the interviewer to approach customers at these tables
in turn in a specified order, returning at a later time to tables that were vacant.
Another method is to take a systematic sample of tables and ask the
interviewer to follow a similar procedure.

Bearing in mind that interviews will take time, say 10 minutes including time
to approach customers and decide who to interview at a table, it might be
reasonable to do six interviews in an hour. For restaurants with only 6 tables,
all might be approached so that the sample size, in terms of tables, is 100%
(complete enumeration); only a random order of tables to approach is needed.
For restaurants with as many as 15 tables, a 50% sample of the tables might be
appropriate (though perhaps somewhat ambitious in terms of the time taken).
Restaurants with intermediate numbers of tables could reasonably have
samples of between 50 and 100 per cent of the tables.

Ordinary Certificate, Paper I, 2010. Question 3

A suggested covering letter, to be on the headed paper of the organisation conducting
the survey and signed by the chief researcher, is shown below. If the names of all the
teachers are available, they could be inserted in the salutation. If the survey has been
commissioned by a well-known organisation, the letter could start instead with "We
have been commissioned by ..... to undertake ...".

Dear teacher,

We are undertaking a survey to investigate whether teachers in colleges of students
aged 16–19 feel stressed by their work and to investigate factors that might affect
stress levels. You have been selected by a random process to take part in this survey
and we hope that you will agree to do so. Your responses will be strictly confidential
to our organisation.

The survey consists of a questionnaire which is enclosed with this letter. Please
answer all the questions in the spaces provided. Many can be answered by ticking
boxes. Please return the completed questionnaire to me at the address shown in the
letter-heading. A reply-paid envelope is enclosed. Alternatively, if you prefer to
answer the questionnaire electronically, please email me at [insert email address] and
I will send it to you as a Word attachment which can be returned by email.

With our thanks in advance,

Yours sincerely,

[insert name]
Chief researcher

A suggested questionnaire is shown on the next page. Questions 1, 2, 5 and 10 are
closed; questions 3 and 4 are open; questions 6, 7, 8 and 9 are rating scale.

Solution continued on next page


Q1. What is your sex?
Male     Female

Q2. What is your age group?
Under 30     30 – 44     45 – 59     60 or over

Q3. What subject or subjects do you teach? ………………………………….............

Q4. What position do you hold in your college? ……………………………........…..

Q5. Do you feel that the hours you work are excessively long?
Yes     No     Am not sure

Please indicate the extent of your agreement with the statements in the next four
questions.

Q6. My work makes a valuable contribution to society.
Strongly agree     Agree     Neither agree nor disagree     Disagree     Strongly disagree

Q7. I feel valued at work.
Strongly agree     Agree     Neither agree nor disagree     Disagree     Strongly disagree

Q8. I feel stressed at work.
Strongly agree     Agree     Neither agree nor disagree     Disagree     Strongly disagree

Q9. The pay is adequate.
Strongly agree     Agree     Neither agree nor disagree     Disagree     Strongly disagree

Q10. Are you likely to leave the sector during the next year?
Yes     No     Do not know

Thank you for your time.

Ordinary Certificate, Paper I, 2010. Question 4

(i) The total number of teachers is 610, so the required overall sampling fraction
is about 150/610. Working with a sampling fraction of exactly 150/610 in
each college gives
(150/610) × 307 = 75.49 from A,
(150/610) × 200 = 49.18 from B,
(150/610) × 103 = 25.33 from C.

Taking 75 + 49 + 25 gives a total sample of 149 and a total cost (in £) of
(75 × 5) + (49 × 10) + (25 × 7) = 1040.

Taking 76 + 49 + 25 gives a total sample of 150 and a total cost (in £) of
(76 × 5) + (49 × 10) + (25 × 7) = 1045.

[Note. In the examination, either answer was acceptable;
candidates were not expected to give both.]

(ii) Let n be the total sample size and n1, n2, n3 the sample sizes for the three
colleges. The required calculation is set out in the table below; the first four
columns repeat the information given in the table in the question.

College   Ni     ci    si     Ni si/√ci    (Ni si/√ci) / Σ(Ni si/√ci)    Cost
A         307    5     7.5    1029.71      0.729                         0.729n × 5 = 3.645n
B         200    10    2.8    177.09       0.125                         0.125n × 10 = 1.250n
C         103    7     5.3    206.33       0.146                         0.146n × 7 = 1.022n
Total                         1413.13                                    5.917n

The calculation shows that n1 = 0.729n, n2 = 0.125n, n3 = 0.146n and the total
cost is 5.917n.

So we require 5.917n ≤ 1050, which gives n ≤ 177.45.

Using 177.45 as the value of n, we get n1 = 129.36, n2 = 22.18, n3 = 25.91.

Taking 129, 22 and 26 respectively (with which n = 177) gives a total cost of
1047.

[Note. Slightly different decimal values might be found
depending on rounding within the calculation, but these
are unlikely to alter the integer values in the final answer.]

Solution continued on next page
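
The arithmetic of parts (i) and (ii) of Question 4 can be reproduced with the Python
sketch below. The names are illustrative, the costs are read as costs in £ per sampled
teacher, and the budget of £1050 is taken from the inequality used in part (ii). The
code keeps the weights unrounded, so its intermediate decimals differ slightly from
those in the solution (for example n comes out at about 177.4), but the integer sample
sizes are the same.

```python
from math import sqrt

# College -> (N_i teachers, c_i cost in pounds per sampled teacher, s_i standard deviation)
colleges = {"A": (307, 5, 7.5), "B": (200, 10, 2.8), "C": (103, 7, 5.3)}
BUDGET = 1050  # pounds, as in the constraint of part (ii)


def cost(allocation):
    """Total cost in pounds of sampling allocation[k] teachers from college k."""
    return sum(allocation[k] * colleges[k][1] for k in colleges)


# (i) Uniform sampling fraction 150/610 applied in every college.
N = sum(N_i for N_i, _, _ in colleges.values())
uniform = {k: 150 * N_i / N for k, (N_i, _, _) in colleges.items()}

# (ii) Optimum allocation for a fixed budget: n_i proportional to N_i * s_i / sqrt(c_i).
weights = {k: N_i * s_i / sqrt(c_i) for k, (N_i, c_i, s_i) in colleges.items()}
total_weight = sum(weights.values())
shares = {k: weights[k] / total_weight for k in colleges}

# Cost per unit of total sample size n, then the largest n the budget allows.
cost_per_unit = sum(shares[k] * colleges[k][1] for k in colleges)
n_max = BUDGET / cost_per_unit
optimum = {k: round(shares[k] * n_max) for k in colleges}

print({k: round(v, 2) for k, v in uniform.items()})
print(optimum, cost(optimum))
```

Because the sample sizes are rounded after the budget calculation, it is worth
confirming, as the last line does, that the rounded allocation still costs no more than
the budget.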

(iii) There are advantages and disadvantages of both methods. The optimum
allocation method minimises the variance of the estimate of the mean number
of years teachers have been at the colleges. However, we are told that the
survey has several objectives, and there is no guarantee that this optimum
allocation for the mean number of years will also be optimal in respect of any
other objectives. Indeed, it almost certainly will not, depending on the
standard deviations that would apply for other objectives. Using a uniform
sampling fraction is safe in that it achieves representativeness across all
variables that might need to be measured. Further, it does not rely on the
standard deviations, which are only estimates. Compared with the optimum
allocation, it uses a noticeably larger sample from college B and so will pick
up more of the variation between teachers at B. Conversely, it has a smaller
sample at A, but it remains quite a large sample, so A should be well covered.
Its overall sample size is considerably smaller, which may have consequences
for overall accuracy, but despite this it costs almost as much as the sample
found by optimum allocation.

The overall decision is not clear-cut but, particularly as there are several
objectives, perhaps on the whole the uniform sampling fraction method is to
be preferred here.

Ordinary Certificate, Paper I, 2010. Question 5

(i) Advantages of this longitudinal study include the following.

Recent recruits are likely to be fairly interested in responding (initially
    at least).
As the same group is followed, any changes can be related directly to
    the teachers.
As the sampling frame is recent, it is likely to be fairly accurate.
It is only necessary to look at one year's list of teachers to select the
    sample.

Disadvantages include the following.

The results relate to one particular group only (things might be
    different for those joining in other years).
Sample members might get conditioned to responding and change
    some of their views because of an impression they want to create.
It is necessary to wait five or more years to get results for those who
    have been at the college for five years.
Members of the sample might leave the college or, even if they stay,
    might get tired of responding and drop out of the study (leading to
    reduced sample size and/or likely bias due to non-response).

(ii) Advantages of this method of sample surveys include the following.

Information is obtained from teachers with different experiences time-
    wise.
Results relating to one, two and five years are obtained in one survey
    and immediately.
It is not too much of a burden on respondents.

Disadvantages include the following.

It could be difficult to select the samples required as records have to be
    searched for appropriate details or a preliminary "census" done to find
    out when teachers joined.
Teachers might have forgotten how they felt about stress in the past.
It is not easy to find how views of individual teachers have changed
    over time.

Ordinary Certificate, Paper I, 2010. Question 6

A pilot survey is a small scale initial survey done with similar procedures to a
proposed survey.

It is done to test various aspects of the proposed survey and to help in the design of
the survey and to train personnel. In particular:

Variability and costs can be estimated to help in the determination of sample
    size
Decisions on the sampling units can be made
Sample frames can be drawn up and/or tested for accuracy, completeness etc
Questionnaires can be tested and improved, including the introduction made to
    potential respondents
Interviewers, if used, can be field-tested and given further training if necessary
Office procedures can be developed and staff can be trained
Coding and analysis procedures can be pre-tested
Background information useful to the full-scale survey can be obtained
The times needed for the different stages can be assessed.

Ordinary Certificate, Paper I, 2010. Question 7

(i) Several potential problems are listed here with suggestions as to how they
might be overcome. [In the examination, candidates were only asked to
discuss three problems. Other reasonable suggestions were of course
accepted.]

Some addresses might be missing from the list. Could travel round the area
    and add these to the list before taking a sample, or could take an additional
    sample from the extra addresses found while doing the survey.
Some addresses might no longer exist. Could take a further sample to
    compensate for this loss of sample members.
Some addresses might be listed more than once. Delete duplicates from the
    list before taking the sample if they are spotted. Do not include the same
    address more than once in the sample.
Some addresses will be non-residential. Might overcome by dropping these
    from the sample, but would need to choose more addresses than the required
    number of households to avoid too big a reduction in planned sample size.
Some addresses with more than one household living at them could be under-
    represented (for example if the address does not identify individual households
    living at it). Might always include all households living at any selected
    address.
Some households might have more than one address. Do not include them
    more than once in the sample (but it might be difficult to identify duplicates of
    this type).
Some households might not have an address. Perhaps supplement the sample
    by using other lists.

(ii) The interviewers should be given quotas to tell them the numbers of
households of each size and type that should be interviewed, perhaps also
including quotas for the ages and sex of people interviewed. An alternative
might be to instruct the interviewers how to take a systematic sample of
residential dwellings.
Geographical coverage should be ensured, either by making this part of each
quota or by telling interviewers in which area of the community they should
interview.
Interviewers should be instructed to interview in evenings as well as during
the day, and on all days of the week.
Interviewers might perhaps be advised to consider other ways of finding
members of households: for example, as well as knocking on doors, they
could stand in shopping centres.

Ordinary Certificate, Paper I, 2010. Question 8

The researcher should observe customer characteristics such as sex, broad age group,
ethnicity and whether the customer is with others such as children or other adults.
The researcher should observe what goods the customers look at and the approximate
time for which they do so, and what they put in their basket or trolley.

A variety of shoppers should be selected for observation, in all the weeks of the
survey and at different times of day. This might be done continuously, selecting a
further shopper as soon as observation of one has been completed.

The information would be best recorded on a check form so that as far as possible the
researcher just has to tick boxes.

One difficulty is that the researcher needs to be unobtrusive, and must avoid being
mistaken for a member of the supermarket staff. It is difficult to hide a clip-board. It
might be possible to stand at the end of an aisle.

Another difficulty is that, if there are a lot of people around the shelves, it will be hard
to tell who is doing what.

Selecting a variety of shoppers might also be difficult as time goes on. To begin with
most shoppers will be suitable, but later it might be difficult to fill some quotas,
especially in the rarer groups.

It will be difficult to time accurately how long people look at goods; indeed, it may
be difficult to tell whether they have looked at them at all.
