THE ROYAL STATISTICAL SOCIETY
2001 EXAMINATIONS SOLUTIONS
ORDINARY CERTIFICATE
PAPER I

Question 1
(i) A longitudinal study is one carried out over a period of time. A cohort is the
sample of people chosen initially to take part in the study, who are then followed over
the time period.
(ii) The sampling unit is an individual young person of school-leaving age. Hence
the sampling frame is all schools with pupils in the required year.
The sample design is to choose a random sample of pupils of the required age using
the school lists of pupils.
[Note. The actual survey excludes special schools and those with fewer than 15
students.]
The Society will not enter into any correspondence in respect of these solutions.
RSS 2001
Ordinary Certificate, Paper I, 2001. Question 2

EVENING______________________________
5. EMAIL ADDRESS (IF ANY)________________________________________
6. PLEASE RING YOUR AGE GROUP
18-25   26-35   36-45   46-55   56-65   OVER 65
7. DURING A TYPICAL WEEK, ON WHICH DAYS DO YOU BUY "THE NATIONAL DAILY"?
MON   TUE   WED   THUR   FRI   SAT   (please ring)
8. DURING A TYPICAL WEEK, PLEASE SAY WHICH OTHER NATIONAL DAILY PAPERS YOU BUY
(please write the names)
Monday __________________________________________________________
Tuesday __________________________________________________________
Wednesday __________________________________________________________
Thursday __________________________________________________________
Friday __________________________________________________________
Saturday __________________________________________________________
9. WHICH NATIONAL SUNDAY NEWSPAPERS ARE BOUGHT ONCE A MONTH OR MORE OFTEN?
Please list them (OR write NONE)
____________________________ ____________________________
____________________________ ____________________________
[Note. Questions 8 and 9 could also be asked by giving a full list of all those available
and asking for boxes to be ticked. This could take a lot more space.]

Ordinary Certificate, Paper I, 2001. Question 3

Stratification by ward, advantages:
- electoral roll can be used as sampling frame since it distinguishes between wards
- all wards represented, giving information about each ward as well as total population
- sampling variation for each ward can be estimated mathematically
- less tedious to select because each ward contains a manageable number of units.
Stratification by ward, disadvantages:
- non-response may occur
- preliminary calculation of sample sizes in each ward necessary, using guesses of
relative variability
- calculation of overall sampling error less straightforward.
Quota sampling by ward, advantages:
- no difficulty over non-response
- only limited area to be covered, therefore quick
- different wards all represented satisfactorily.
Quota sampling by ward, disadvantages:
- no estimate of sampling variation can be made
- results may be biased through choice of individuals to be approached, and through
willingness or not to reply
- appearance of interviewer may cause some people to respond, others not.
[Two comments required for each.]
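The selection step for stratification by ward can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the electoral roll is represented as a hypothetical list of (elector_id, ward) pairs, and the sample is allocated to wards in proportion to ward size. Proportional allocation is a simplification swapped in here; the allocation mentioned above, based on guesses of relative variability, would instead weight each ward by its size multiplied by a guessed standard deviation.

```python
import random
from collections import defaultdict


def stratified_sample(roll, total_n, seed=None):
    """Draw a simple random sample within each ward (stratum).

    roll: hypothetical list of (elector_id, ward) pairs
    total_n: overall sample size wanted
    Each ward gets a share of total_n proportional to its size on the roll;
    rounding means the ward sizes may not add to total_n exactly.
    """
    rng = random.Random(seed)
    by_ward = defaultdict(list)
    for elector_id, ward in roll:
        by_ward[ward].append(elector_id)

    N = len(roll)
    sample = {}
    for ward, electors in sorted(by_ward.items()):
        n_h = round(total_n * len(electors) / N)   # proportional allocation
        sample[ward] = rng.sample(electors, n_h)   # SRS without replacement
    return sample


# Hypothetical roll: 600 electors in North, 300 in South, 100 in East.
roll = ([(f"N{i}", "North") for i in range(600)]
        + [(f"S{i}", "South") for i in range(300)]
        + [(f"E{i}", "East") for i in range(100)])
chosen = stratified_sample(roll, 50, seed=1)
print({ward: len(ids) for ward, ids in chosen.items()})
```

Because each ward is sampled at random separately, a sampling variance can be estimated within each ward, which is the advantage listed above; a quota scheme has no such random step to build on.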
(ii) The roll will not be an up-to-date list of residents, due to deaths and removals into
or away from the area (including from one ward to another and from town to country or
vice versa). Also any building, clearance or renovation schemes may have affected the
structure of wards, the numbers in them and their economic characteristics.

Ordinary Certificate, Paper I, 2001. Question 4
(i) The number of respondents, because the larger this is, the smaller will be the
standard error of the estimated percentage. This must be taken into account when
assessing the meaning of the result.
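To see how the number of respondents drives precision, the usual approximation for the standard error of a percentage, 100 × sqrt(p(1 - p)/n) with p as a proportion and n respondents, can be tabulated. This sketch assumes simple random sampling; a phone-in poll such as this one is not a random sample, so the figures show only the best case, and the biases listed in (ii) are not reduced by a larger n. The function name is hypothetical.

```python
import math


def se_percentage(p_percent, n):
    """Approximate standard error, in percentage points, of a sample
    percentage p_percent based on n respondents (simple random sampling)."""
    p = p_percent / 100
    return 100 * math.sqrt(p * (1 - p) / n)


for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}: SE of 50% is about {se_percentage(50, n):.2f} points")
```

Quadrupling n halves the standard error, so at 50% it falls from 10 points with 25 respondents to 2.5 points with 400.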
(ii)
(1) The poll is only of listeners to the radio station; so it will not be representative
of the whole area it serves. The audience may be biased to particular age-groups,
economic characteristics, work and leisure habits, and to those who like the sort of
entertainment the station gives.
(3) The timing will exclude several groups of people, perhaps even those out
walking their dogs.
(4) The nature of the question may make dog owners more likely to respond.
(5) Giving a figure half-way through the hour will encourage more no-voters to
respond as their view is in a minority (or vice versa if the announced percentage had
been below 50).
(6) People can vote more than once if no identification is asked for and checked.
This is a source of considerable bias.
(iii) "Are you a dog-owner?", with the figures for Yes and No kept separate. This
allows comparison of the views of the two groups. [It is possible that age or sex may
be relevant also, but only one question is allowed.]
Ordinary Certificate, Paper I, 2001. Question 5

The sampling fraction is the proportion of the total population (or of a particular
subgroup) that is used in the sample survey. If there are N in the group, of whom n
are in the sample, the sampling fraction is n/N.

(i) A uniform sampling fraction requires 400/10000 = 0.04 to be selected from each
stratum, i.e. 40 from A, 160 from B, 200 from C.
[The population size is 1000 + 4000 + 5000 = 10000.]

(ii) The sampling fractions in A, B, C must be 10k, 5k, 2k, where k is a constant that
will achieve the required total of 400. The sample sizes then are 1000 × 10k, 4000 × 5k
and 5000 × 2k, which add to 40000k; this has to be 400, so k = 0.01.
Hence 100 in A, 200 in B, 100 in C.

(iii) The second method should reduce the standard error of the estimate.

Ordinary Certificate, Paper I, 2001. Question 6

She should sit in a position where she can see the whole shop clearly, and can also see
the cash desk/till to record the value of goods purchased. A pre-printed form should
be used for each customer observed, recording sex, age (in the form of a very broad
classification, young/middle-aged/old, since she cannot ask the customers), time of
entry and exit. A floor-plan of the shop, printed on the form, would allow direction
and pattern of movement around the shop to be recorded. The number of times an
item is looked at, or picked up for examination, can be recorded on the plan.
Only one person can be observed at a time, so as soon as she is in her observation
position she should observe the first customer coming in, and when that customer has
finally left she can take the next one to enter. This should ensure reliable records, and
span the whole working day (with breaks taken at convenient times, e.g. for
refreshment). Forms will be numbered and dated, and used in order, so comparisons
between days, and of times in the same day, can be made.
Care must be taken not to be conspicuous, or to disturb the normal running of the
shop; staff need to be fully aware of her task and should prevent customers from asking
her for assistance (so far as possible). A shop that cannot be seen fully and easily from
one place would cause difficulty and probably should not be used for the study. Any
groups of shoppers, perhaps looking for a single item, may be hard to record properly.
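The allocations in Question 5 can be checked with a short sketch. The function name allocate is hypothetical; it takes the stratum sizes and the required ratios of the sampling fractions (1:1:1 for a uniform fraction, 10:5:2 for part (ii)), solves for the constant k, and returns k together with the stratum sample sizes.

```python
def allocate(pop_sizes, ratios, total):
    """Return (k, sample sizes) when the sampling fraction in stratum h
    must be ratios[h] * k and the sample sizes must add up to total.

    Solves sum(N_h * r_h * k) = total for k, then n_h = N_h * r_h * k.
    """
    k = total / sum(N * r for N, r in zip(pop_sizes, ratios))
    sizes = [N * r * k for N, r in zip(pop_sizes, ratios)]
    return k, sizes


pops = [1000, 4000, 5000]  # strata A, B, C

k_uniform, n_uniform = allocate(pops, [1, 1, 1], 400)  # part (i)
k_ratio, n_ratio = allocate(pops, [10, 5, 2], 400)     # part (ii)
print(k_uniform, n_uniform)
print(k_ratio, n_ratio)
```

The same function covers both parts, since a uniform sampling fraction is just the special case in which all the ratios are equal.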
Ordinary Certificate, Paper I, 2001. Question 7

(i) People may simply refuse; may be away from home; may be out at the time of the
call; may be unsuitable to be interviewed, for various reasons; or may be new
occupants, not the persons on the available list. Pre-selected names should not
normally be replaced by substitutes.

(ii) Non-respondents may well have different characteristics from those who do
respond, and bias will depend on the extent of these differences and the amount of
non-response.

(iii) It may be necessary to keep a sample up to the planned size, to provide
sufficient data for analysis and for adequate estimation of sample variance. In
stratified sampling, strata proportions need to be kept correct if response rates are
likely to vary between strata. But since substitutes are necessarily responders, the
difficulties in (ii) still remain.

(iv) Skilled professional interviewers can sometimes help with unsuitable
interviewees and potential refusals. Brief, clear and well-designed questionnaires
may overcome these difficulties also.
Repeat calling is used for those not at home, either choosing a different time of day or
day of the week, in the light of any knowledge about age, sex, occupation etc of the
respondent, or if possible making a firm appointment.
Results could be weighted for characteristics such as age, sex, social class if it was
thought, or there was information, that these differ between respondents and
non-respondents.

Ordinary Certificate, Paper I, 2001. Question 8

A typical database might look like this:

FIELD NAME    FIELD TYPE   WIDTH
Title         Text         4
Surname       Text         24
Given_name    Text         24
Initials      Text         6
House_no      Numeric      4
Address_1     Text         36
Address_2     Text         36
Address_3     Text         36
Postcode      Text         8
Telephone     Numeric      16
Number        Numeric      2
Savings       Attribute    1
Current       Attribute    1
Loan          Attribute    1
Deposit       Attribute    1
Other         Attribute    1

[Attribute = Yes/No]
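One way to turn the Question 8 field list into an actual table is a SQLite schema, sketched here with Python's built-in sqlite3 module. The table and column names are illustrative, not taken from any particular package. SQLite has no fixed-width text type or Yes/No type, so widths are expressed as CHECK constraints and each attribute is a 0/1 integer. The telephone field is held as text rather than as a number so that a leading zero is not lost; that is a deliberate departure from the listed Numeric type.

```python
import sqlite3

# Illustrative customer table mirroring the field list; widths become
# CHECK constraints and each Yes/No attribute is stored as 0 or 1.
SCHEMA = """
CREATE TABLE customer (
    title        TEXT CHECK (length(title) <= 4),
    surname      TEXT CHECK (length(surname) <= 24),
    given_name   TEXT CHECK (length(given_name) <= 24),
    initials     TEXT CHECK (length(initials) <= 6),
    house_no     INTEGER CHECK (house_no BETWEEN 0 AND 9999),
    address_1    TEXT CHECK (length(address_1) <= 36),
    address_2    TEXT CHECK (length(address_2) <= 36),
    address_3    TEXT CHECK (length(address_3) <= 36),
    postcode     TEXT CHECK (length(postcode) <= 8),
    telephone    TEXT CHECK (length(telephone) <= 16),  -- text keeps leading zeros
    number       INTEGER CHECK (number BETWEEN 0 AND 99),
    has_savings  INTEGER NOT NULL DEFAULT 0 CHECK (has_savings IN (0, 1)),
    has_current  INTEGER NOT NULL DEFAULT 0 CHECK (has_current IN (0, 1)),
    has_loan     INTEGER NOT NULL DEFAULT 0 CHECK (has_loan IN (0, 1)),
    has_deposit  INTEGER NOT NULL DEFAULT 0 CHECK (has_deposit IN (0, 1)),
    has_other    INTEGER NOT NULL DEFAULT 0 CHECK (has_other IN (0, 1))
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO customer (title, surname, initials, postcode, telephone, has_savings) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("Mrs", "Smith", "J", "AB1 2CD", "01234 567890", 1),
)
print(conn.execute("SELECT surname, telephone, has_savings FROM customer").fetchall())
```

A value outside 0 or 1 in an attribute column, or a surname longer than 24 characters, is rejected by the CHECK constraints with an IntegrityError.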
THE ROYAL STATISTICAL SOCIETY
2002 EXAMINATIONS SOLUTIONS
ORDINARY CERTIFICATE
PAPER I

The Society provides these solutions to assist candidates preparing for the
examinations in future years and for the information of any other persons using the
examinations.
The solutions should NOT be seen as "model answers". Rather, they have been
written out in considerable detail and are intended as learning aids.
Users of the solutions should always be aware that in many cases there are valid
alternative methods. Also, in the many cases where discussion is called for, there
may be other valid points that could be made.
While every care has been taken with the preparation of these solutions, the Society
will not be responsible for any errors or omissions.
RSS 2002

A survey usually aims to estimate a mean or a proportion, e.g. the mean expenditure
of a family per week or the proportion of the population holding a particular opinion.
If a sample from the whole population is used for this purpose, it must "represent" the
population so that the results from the sample can be applied to the wider population.
Some methods of selecting a sample do not properly represent a population, e.g. if
using a list of members that is not up to date. Some methods of obtaining information
will cause non-response: the refusal of people to answer badly constructed questions,
or the failure to take part in enquiries when people have no interest in a topic. These,
and other, errors in carrying out a sample survey rarely affect all sections or groups of
a population in the same way or to the same extent, so the answers obtained are
incomplete, and the effect of this on estimated means or proportions cannot be
measured. This leads to bias in the sense that the sample estimates, even from large
samples, may differ systematically from the true (but unknown) value of the required
mean or proportion in the whole population due to, for example, a non-response
group being systematically different from the rest. This bias cannot usually be
corrected by statistical methods. It is a structural error in the sampling technique
used.

In this survey, the response rate was so small that it gives negligible information on
the whole population. It very likely represents the views of minorities or special
interest groups who selected themselves by replying. No statistical selection took
place, and no reminder to reply seems to have gone out. The 'Newsletter' is most
unlikely to have been read in detail by more than a small proportion of the population,
and even those who tried to telephone their responses may not always have got
through. 52 out of 103456 residents, self-selected, cannot be taken as a good
representation of residents in the borough.
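The claim that non-response bias does not shrink as the sample grows can be illustrated by simulation. The population below is hypothetical: 40% hold an opinion, holders reply with probability 0.6 and non-holders with probability 0.2. The proportion among respondents then settles near 0.24/0.36, about 0.67, however many people are contacted, rather than near the true 0.4. The function name is illustrative.

```python
import random


def respondent_proportion(n_contacted, seed=0):
    """Proportion holding the opinion among those who reply, in a
    hypothetical population where 40% hold it and holders are three times
    as likely to reply (0.6 versus 0.2)."""
    rng = random.Random(seed)
    replies = holders_replying = 0
    for _ in range(n_contacted):
        holds = rng.random() < 0.4
        reply_prob = 0.6 if holds else 0.2
        if rng.random() < reply_prob:
            replies += 1
            holders_replying += holds  # True counts as 1
    return holders_replying / replies


for n in (1_000, 10_000, 100_000):
    print(n, round(respondent_proportion(n), 3))
```

Increasing n only makes the estimate settle more tightly around the wrong value; the gap from 0.4 is the structural error described above.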
Ordinary Certificate, Paper I, 2002. Question 2
It is quite likely that the same errors would occur if a repeat "survey" was done in the
same way: responses would be very few and would represent only those with some
special interest in the questions asked, very far from the population as a whole.
Hence, in any repeat, the same sort of results might occur. Reliability in this sense is
therefore a misuse of the word; it does not imply validity. If a sampling method is
basically faulty, the same faults will influence any set of results.

Ordinary Certificate, Paper I, 2002. Question 3
These three methods all imply proper statistical selection, which is an improvement.
Nevertheless we also need to select from all the population of residents, and none of
the methods ensures this.
(i) A simple random sample, using the register of voters as its sampling frame,
will exclude those residents newly moved in, and those who failed to return their
registration forms for that year; and will include some who have moved out or died.
In the UK, these lists become quite inaccurate by the end of a year. This is a source
of bias, difficult to estimate. A postal survey, even with up to two reminders, rarely
gains more than 50% response unless some form of visit can be organised for non-
responders.
Advantages of simple random sampling, given an adequate frame, are that selection is
made from the population with no personal bias by the survey organiser; that
everyone on the list has the same probability of selection; that some valid statistical
analysis of results can be done, and can be generalised to the whole population.
(ii) A stratified sample requires the population to be split into groups or "strata",
which should ideally be homogeneous within a group, the main differences occurring
between groups. A simple random sample is taken within each of the strata. The
advantage is that information is obtained for each stratum, and similarities or
differences between them can be seen. A disadvantage is that the frame has to be split
into strata, and the likely boundaries between strata are not always clear. Sometimes
urban and rural may differ, sometimes type of housing is a useful division (and a
borough council will have this information), especially if it also represents social
class.
Personal interviewers will lead to a considerably better response rate, without the
need for reminders, except that some revisiting is needed to those not at home on the
first visit. It is more expensive, especially in less densely populated areas. It gives
better quality data, but a larger total sample may be needed if there are several
strata, although in the present case that seems unlikely to be necessary.
(iii) The sampling frame is incomplete. Some people do not want their names and
addresses published in a directory, some have mobile phones or unlisted cable
services. This causes bias, since some groups of people are more likely than others to
be in one of these categories. Also, people do not always respond to telephone
interviews (unless some publicity beforehand has told the population that a telephone
survey is to be carried out, and encouraged them to cooperate).
It is a quick method, even allowing for non-response, and enough reserves can be
located to enable the desired size of sample to be achieved, whereas in other
methods this depends on the response rate, which can be hard to guess in advance.
However, the use of reserves instead of attempting to contact again those who are out
at the first call can bias the response in favour of some parts of the population (older,
different leisure interests, different hours of work).
This is a very cheap method and so is often used. Its response rate would be higher
than for a postal survey.
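The point that the achieved sample size depends on a response rate that is hard to guess in advance can be made concrete with a small calculation. The function name and the target of 500 replies are hypothetical. Issuing more questionnaires only controls the number of replies; it does nothing about the bias from those who do not reply.

```python
import math


def addresses_to_issue(target_replies, guessed_rate):
    """Number of questionnaires to send so that, if the guessed response
    rate is right, about target_replies come back."""
    return math.ceil(target_replies / guessed_rate)


for rate in (0.3, 0.4, 0.5):
    print(f"guessed rate {rate:.0%}: issue {addresses_to_issue(500, rate)}")
```

If 1000 are sent out on a guess of 50% but only 30% reply, the survey ends with about 300 replies rather than 500.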
Ordinary Certificate, Paper I, 2002. Question 4
n1 12000 11 n2 80000 8
31112.7 286216.7
k 18 k 5
(d) Usually a symbol such as * is entered where values are missing. When using the
data, results for each column (i.e. each data item) may be used as they stand, or the
rows with any item lost may be omitted completely. If two-way tables, using data from
two columns, are to be produced, a row need only be omitted if one (or both) of these
items is (are) lost, so minimising the number of missing values.
The procedures required will depend on which program is being used, and what
computing equipment is available.

(ii) Given a good program, used properly, accuracy can be achieved quite quickly,
since all the calculations after the checking stage can be done in the computer
(column means etc, two-way tables), and if verbal answers can be categorised into a
few classes these can also be summarised quickly. A disadvantage could be that the
data are not scrutinised so closely as in an analysis by hand, so that some useful
two-way tables are not calculated because a possible relation has not been spotted.
Comments in words, from individuals or only a few people, may be lost in a purely
mechanical analysis. On the other hand, standard statistical packages will note "odd"
values (possible outliers) where these may be missed by hand. Computer analysis can
carry out all the studies and comparisons that seem useful; this would often be
impossible by hand.

Ordinary Certificate, Paper I, 2002. Question 5
(ii) Prices vary substantially in different parts of the country. Different groups of
the population will buy somewhat different ranges of products and will react in
different ways to price changes. Some people will use more expensive small shops
to avoid travelling; others will concentrate on supermarkets where prices are lower
and less variable. Data need to be collected from the whole range of outlets, and a
suitable form of "average" found from them.

(iii) Weights for the categories will, in practice, vary substantially from group to
group in the population. Children's clothes need more frequent replacement, food
consumption by the elderly differs in amount, and in types of food, from that by
families with growing children; some types of occupation demand more energy-giving
foods, as well as different types and strengths of clothing. Old people are more
likely to need supportive footwear, for example, whereas teenage sportsmen and
women want quite different special items. Only broad generalisations are possible
between different types of household. However it is certainly possible to estimate the
expenditure necessary for healthy living and eating as a basis for deciding what a
"minimum income" should be. A single "cost of living" index is really a fallacy.
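The effect of different weights can be shown with a fixed-weight index: a weighted average of price relatives (current price divided by base price, times 100). With base-period expenditure shares as weights this is the Laspeyres form. The categories, price relatives and weights below are hypothetical, chosen only to show that two household types facing the same price changes can see quite different index values.

```python
def weighted_index(price_relatives, weights):
    """Weighted average of price relatives (current/base price x 100)
    using the given expenditure weights for each category."""
    total_w = sum(weights.values())
    return sum(price_relatives[c] * w for c, w in weights.items()) / total_w


# Hypothetical price relatives (base = 100) for three categories.
relatives = {"food": 110, "clothing": 102, "fuel": 120}

# Hypothetical expenditure weights for two household types.
family = {"food": 50, "clothing": 30, "fuel": 20}
pensioner = {"food": 40, "clothing": 10, "fuel": 50}

print(round(weighted_index(relatives, family), 1))
print(round(weighted_index(relatives, pensioner), 1))
```

Here the family index is 109.6 and the pensioner index 114.2, because fuel, which rose most, takes a larger share of the second budget.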
Quota sampling splits a population into a number of groups and samples a prescribed
number (quota) of people in each group. Often these numbers will be in the same
ratio as the totals in the population. Suppose that a college or university is split into
male/female, home/overseas and three areas of subject study A, B, C. The numbers in
each of the 12 groups in the population can be found from college records, and so a
quota of each group (A/male/home), ..., (C/female/overseas) can be specified: this
may, for example, be 10% of each total. Interviewers now go in search of the
appropriate numbers from each group, and ask the first suitable individuals they meet
the questions in the survey they are conducting. As soon as the required number of
(A/M/H) have been interviewed, no one else who is in that group will be asked the
survey questions; and so on for all 12 groups. There is no question of randomness in
the sampling, but if the survey is about opinions on some topic that is likely to affect
people within the same group in much the same way, the answers can often be quite
representative. However, there is no statistical theory that can be used to assess the
results. Interviewers can be told not to concentrate on one type of unit, e.g. they can
be warned not to go for the tallest males!
The population is the whole college; the frame is the college list; the method gives
answers quickly, without the need to set up a randomised scheme, and unless the
survey questions are very sensitive the answers are likely to be reasonably reliable.

Cluster sampling is useful where a number of similar large (primary) units, such as
villages in an agricultural region, exist and a sample of individual farms or holdings
(secondary units) is required. Time and cost can be saved by selecting some of the
clusters at random from the whole population of clusters, which are the villages.
The remaining villages are not visited at all, and so the sampling frame only needs to
exist for the chosen villages. Often it will have to be constructed as part of the
survey, so considerable effort is saved in this way. From each of the chosen villages,
a random sample of farms or holdings to take part in the survey is selected in the
usual way. (If there are not many farms in clusters, they can all be taken; but usually a
sample of the same size would be taken for the survey.)
The sampling frame at the beginning needs to list all the villages. The population is
all the holdings in all the villages. Assuming that differences between villages are not
great, resources can be conserved by not having to visit all of them. This allows
sampling within villages to be sufficient for a good estimate of variance to be found.
A random method which required several villages to be visited for only a single unit,
or very few units, to be studied would be inefficient by comparison.

THE ROYAL STATISTICAL SOCIETY
2003 EXAMINATIONS SOLUTIONS
ORDINARY CERTIFICATE
PAPER I
RSS 2003
Ordinary Certificate, Paper I, 2003. Question 1
Methods vary between countries. The UK conducts regular 10-yearly censuses, the
most recent being in 2001. Questionnaires are used, one for each household (not one
for each building); these are delivered by enumerators and (in 2001) returned by post
in pre-paid envelopes. (In previous censuses, they were collected personally by the
enumerators.) Those not returned by post in 2001 were followed up by the
enumerators.

Forms contained sections for household entries and separate (but identical) sections
for individual members. Because the form was to be filled in by the householder (or
another member, but not by an enumerator), questions needed to be as few as possible
and as clear as possible, and should have been tested in a pre-census pilot survey. The
UK 2001 census had five sections:
(1) Residents: the name of each person usually living there.
(2) Visitors: name, together with the usual address.
(3) Household: type of accommodation, whether self-contained (e.g. no sharing
of kitchen, etc), number of rooms, ownership, central heating, vehicle
ownership.
(4) Relationships: between individual members (husband, wife, partner, parent,
child).
(5) Individuals: sex, date of birth, marital status, various ethnic questions such as
country of birth, migration, previous address, ethnic group classification,
religion, qualifications, various questions on health, provision of care etc,
employment, working hours and method of travel.
Some different questions were added in Wales (e.g. language), Scotland and Northern
Ireland. There were 5 household questions and just over 20 individual ones in 2001.
Data remain confidential and individuals cannot be identified. However, data for
relatively small areas, towns, villages and minority groups can be obtained. Hence,
nationally, the data will be useful to government for
(1) planning of requirements of health care, hospitals, schools and colleges;
demographic variation and pension provision, welfare;
(2) in the UK, devolving funds to regions or areas.
Also, (3) businesses and commercial interests, and social research organisations and
departments, use these data.
Changes should be minimal between censuses so that comparisons can be made; also,
comparisons between countries are very useful. (Note that some developing countries
are still developing census methods, and that some developed countries (e.g.
Netherlands) use sample surveys combined with administrative registers rather than
complete censuses.)

Ordinary Certificate, Paper I, 2003. Question 2
(i) A target population is that population for which information is required.
Results from a survey apply only to the study population which was sampled, e.g.
those who respond to the first request for information, or those geographically easy to
locate. If there are any real differences between target and study populations, results
may not apply to the target.

(ii) Either the target will be those who already use the canteen, regularly or
occasionally, or it will cover all actual and potential users. If the main aim of the
survey is to improve satisfaction among existing users, the first is appropriate, but if it
is also desired to improve user numbers the second target is appropriate. In that way,
information can be obtained on reasons for non-use, such as not supplying the type of
food required at lunch time (which would be the main time of interest to the
manager), or speed of service, supply of vegetarian meals, etc.
Ordinary Certificate, Paper I, 2003. Question 3 Ordinary Certificate, Paper I, 2003. Question 4
A: If the target population is only present users, this method could be adequate provided
enough time was allowed for occasional (as well as regular) users to be adequately
represented among the respondents; but even here the most regular users might be
over-represented, so that too high a proportion of "well satisfied" customers was recorded.
For the other possible target, the non-users are not represented at all, so it would be quite
unsatisfactory.
Response rates could also be lower among those in a hurry, who may not be so satisfied with
speed of service but did not take the time to say so.
B: This is an expensive method, but it avoids selection bias. Response rates could be
relatively high, although those who were "too busy", not interested in the canteen or not easily
available for interview could produce bias in the results by not answering. Interviewers, if not
regularly used to carrying out surveys, would need careful training to avoid bias in the way
they asked the questions and recorded answers.
C: Using the list of work email addresses, this is a good method for either target. As in
B, present usage of the canteen could be one of the questions asked. Effectively this is a
complete census, but of course non-response is possible. This could be minimised by sending
reminders to those known not to have replied. In fact, people commonly reply to emails
quickly (if at all!) and so if a clear and fairly short questionnaire is emailed a large number of
responses would be hoped for.
D: A display stand, with questionnaire forms to be taken, could raise general interest in
the survey but still may not obtain an unbiased selection of replies, which could be limited to
those who were already interested in the canteen and felt they could spare time to reply.
Unless there are sufficient questions to check identity of respondents, a few members of the
public visiting the offices might answer. This does not seem to be a good method.
E: If strata (as in question 4) cover all the departments, this is potentially a good way of
selecting a "representative" sample, but there remains the possibility of variable response
rates. Any group which spends almost all of its time in the office is likely to respond better
than any whose work is partly external. Prepaid envelopes may possibly reduce non-response
among those not always working in the office. But any differences between department
groups should be shown up by this method. It can be argued that better stratification would
be between (1) regular users, (2) occasional users, (3) non-users. But lists of these would be
very hard to construct. Provided strata do have underlying differences in important responses,
the method is bound to give better (more precise) estimates than simple random samples,
provided also that response rates are similar in all strata.
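The gain from stratification can be quantified with the standard large-population approximations, ignoring the finite population correction. Under simple random sampling the variance of the sample mean is about S²/n, where the overall variance S² is approximately the within-stratum part Σ W_h S_h² plus the between-stratum part Σ W_h (Ȳ_h - Ȳ)²; with proportional allocation the stratified mean has variance about Σ W_h S_h² / n, so the between-stratum part is removed. The sketch evaluates both for hypothetical strata of regular users, occasional users and non-users and, like the text, assumes response rates are similar in all strata.

```python
def variance_comparison(weights, means, sds, n):
    """Approximate variances of the sample mean under simple random sampling
    and under stratified sampling with proportional allocation.

    weights: stratum shares W_h of the population (summing to 1)
    means, sds: stratum means and standard deviations
    Large strata and no finite population correction are assumed.
    """
    overall = sum(w * m for w, m in zip(weights, means))
    within = sum(w * s ** 2 for w, s in zip(weights, sds))
    between = sum(w * (m - overall) ** 2 for w, m in zip(weights, means))
    var_srs = (within + between) / n
    var_strat = within / n
    return var_srs, var_strat


# Hypothetical satisfaction scores (0-10) for regular users, occasional
# users and non-users, with a total sample of 200.
v_srs, v_st = variance_comparison([0.5, 0.3, 0.2], [8.0, 6.0, 3.0],
                                  [1.5, 2.0, 2.0], n=200)
print(v_srs, v_st)
```

If the stratum means were all equal the between-stratum part would be zero and the two variances would coincide, which is why the text requires strata to "have underlying differences in important responses".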
Ordinary Certificate, Paper I, 2003. Question 5
(i) C is cheapest; A costs only the paper; B is most costly in terms of staff time;
D will have some cost to make a useful and eye-catching display; E will have postal
costs.
C is quick, even allowing for reminders; A and D depend on how long it takes to get
enough responses; E will take some weeks, allowing for reminders; B may be slow if
only one or two trained interviewers are available.
B and E are the only ones where some staff will not have the opportunity to answer
(though these are statistically satisfactory methods). For all the others, the choice
whether to respond is theirs, except for the present non-users by method A.

Ordinary Certificate, Paper I, 2003. Question 6
(i) A longitudinal survey follows a group ("cohort") of the target population over
a period of time; a cross-sectional study takes a "snapshot" at a particular time.
For the present topic, a longitudinal survey would follow the opinions of the same
group of people through, say, a year, to note any changes in attitude to food, variety,
service etc of regular users. Another example would be to take a newly-arrived group
of users and follow their changes in attitude and use.
A cross-sectional study is a single undertaking on the lines discussed in previous
questions.
Ordinary Certificate, Paper I, 2003. Question 7 Ordinary Certificate, Paper I, 2003. Question 8
Another example is "Do you think that reasonable accommodation should be devoted
by the canteen to meeting special dietary needs?", which raises the question of what is
"reasonable", a vague word which should not be used. A better alternative is "How
much of the canteen area should be dedicated to special dietary needs?"
THE ROYAL STATISTICAL SOCIETY
RSS 2004

Ordinary Certificate, Paper I, 2004. Question 1
NOTE. The question does not ask for definitions of the two sampling methods, so
these can be assumed.
(iv) Because of its speed, it can be used repeatedly for topical purposes such as
election predictions.
(v) The costs of planning and analysis are less than for random sampling with
call-backs.
(vi) The controls on quotas consist simply of finding men and women in fairly
broad categories, such as age-group and type of occupation/work.
(vii) A specified target sample size can be achieved, since it does not matter which
individuals form the sample.

Disadvantages include the following.
(i) There is no theoretical method of assessing sampling variability.
(ii) Approximate methods have to be used for estimating variances.
(iii) Interviewers can easily introduce bias when choosing people to form the
sample.
(iv) Refusals can be numerous if the same vicinity (e.g. a shopping area) is used
for obtaining interviews for several surveys in succession.
(v) Refusals may in any case be more common than with a random sample.
(vii) Errors through bias etc form a "hidden" cost because of non-detection.
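The quota procedure discussed in these answers (and in the 2002 college example) can be sketched as a loop that accepts people in the order they are met until each group's quota is full. The groups, population counts and stream of passers-by below are hypothetical, and the function name is illustrative. The sketch shows advantage (vii), that the target size is reached exactly, and also why the disadvantages arise: which individuals end up in the sample depends on who happens to be met first, not on a random draw.

```python
import random
from collections import Counter


def fill_quotas(stream, group_of, quotas):
    """Interview people in the order they are met, accepting each one only
    while the quota for their group is still open; stop once every quota is
    full. Selection depends on who happens to be met, not on a random draw."""
    taken = Counter()
    interviewed = []
    target = sum(quotas.values())
    for person in stream:
        group = group_of(person)
        if taken[group] < quotas.get(group, 0):
            taken[group] += 1
            interviewed.append(person)
            if len(interviewed) == target:
                break
    return interviewed


# Hypothetical population counts by (sex, age band); quotas are 10% of
# each count, rounded down.
population = {("M", "18-34"): 400, ("M", "35+"): 600,
              ("F", "18-34"): 500, ("F", "35+"): 500}
quotas = {g: count // 10 for g, count in population.items()}

# Hypothetical stream of passers-by, each an (id, group) pair.
rng = random.Random(0)
groups = list(population)
passers_by = [(i, rng.choice(groups)) for i in range(2000)]

sample = fill_quotas(passers_by, lambda p: p[1], quotas)
print(Counter(group for _, group in sample))
```

Reordering passers_by would change who is interviewed but not how many from each group, which is the sense in which quota sampling controls composition without providing a basis for sampling-error calculations.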
Ordinary Certificate, Paper I, 2004. Question 2
(i) Possible problems and suggestions for overcoming them are as follows. Credit was of course given in the examination for any other reasonable suggestions.

(1) Some addresses will be non-residential; if this is recognised, they can be omitted from the sampling frame.

(2) Extra "reserve" addresses need to be available to replace any that are empty or non-residential.

(3) Some addresses may be listed more than once, by accident or error or in different categories, but no address should be used more than once. However, this will not get over the problem of repeated addresses having a higher probability of appearing in the sample.

(4) Alternatively, repeated addresses may be deleted, but this could be a time-consuming task before a sample is selected.

(5) If there is time, a list in some reasonable order will allow someone to travel round and check it for completeness before it is used; geographical order, rather than (for example) alphabetical, would be needed.

(6) Some addresses will have more than one household living there (this should not apply to blocks of flats, where each household has its own number), and all households at such an address could be sampled to ensure proper representation of this type.

(ii) Each interviewer could have a quota of interviewees in age, sex and occupation groups, to be ascertained by the first few questions asked. Size of household(s) is also a useful criterion, and the whole of a target area must be covered by the team of interviewers between them. Time of interview should be varied (day, evening) so that all residents have a chance of being available for interview.

Ordinary Certificate, Paper I, 2004. Question 3
A database might look like the following.

FIELD NAME      FIELD TYPE    WIDTH
Customer_Id     Auto-number   6
Title           Text          4
Surname         Text          24
Given_name      Text          24
Initials        Text          8
House_number    Numeric       4
Address_1       Text          36
Address_2       Text          36
Address_3       Text          36
Postcode        Text          8
Telephone       Numeric       16
Doctor          Text          24
AllergyA        Text          24
AllergyB        Text          24
MedicationA     Text          24
StartA          Date          6
FinishA         Date          6
MedicationB     Text          24
StartB          Date          6
FinishB         Date          6
MedicationC     Text          24
StartC          Date          6
FinishC         Date          6

Note. A separate table for medication would also be useful. Its Customer_Id is an ordinary number that links each row back to the customer's auto-number:

FIELD NAME      FIELD TYPE    WIDTH
Medication_Id   Auto-number   6
Customer_Id     Numeric       6
StartDate       Date          6
FinishDate      Date          6
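A rough sketch of the two-table design, using Python's built-in sqlite3 module. It rests on several assumptions. The SQLite types are chosen here: INTEGER PRIMARY KEY AUTOINCREMENT stands in for the auto-number fields, and dates are stored as TEXT. Only some of the customer fields are included. The Medication name column is a hypothetical addition, and the customer data are invented. The point is that the linked table can hold any number of medications per customer, without fixed A/B/C columns.

```python
import sqlite3

# Hypothetical in-memory database illustrating the two-table design.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the customer link

# Main customer table (only some of the fields listed above are included).
conn.execute("""
    CREATE TABLE customer (
        Customer_Id INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-number
        Title       TEXT,
        Surname     TEXT,
        Given_name  TEXT,
        Postcode    TEXT,
        Doctor      TEXT
    )
""")

# Separate medication table: one row per course of medication.
conn.execute("""
    CREATE TABLE medication (
        Medication_Id INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-number
        Customer_Id   INTEGER NOT NULL
                      REFERENCES customer(Customer_Id),  -- plain number linking to customer
        Medication    TEXT,   -- hypothetical name field, not in the original table
        StartDate     TEXT,   -- ISO date string, e.g. '2004-01-05'
        FinishDate    TEXT
    )
""")

cur = conn.execute(
    "INSERT INTO customer (Title, Surname, Given_name, Postcode, Doctor) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Ms", "Smith", "Ann", "AB1 2CD", "Dr Jones"),  # invented example customer
)
cid = cur.lastrowid

# Any number of medications per customer.
conn.executemany(
    "INSERT INTO medication (Customer_Id, Medication, StartDate, FinishDate) "
    "VALUES (?, ?, ?, ?)",
    [(cid, "Drug A", "2004-01-05", "2004-02-05"),
     (cid, "Drug B", "2004-03-01", None)],
)

n_meds = conn.execute(
    "SELECT COUNT(*) FROM medication WHERE Customer_Id = ?", (cid,)
).fetchone()[0]

# A medication row for a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO medication (Customer_Id, Medication) VALUES (?, ?)",
                 (999, "Drug C"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(n_meds, rejected)
```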
Ordinary Certificate, Paper I, 2004. Question 4
Please complete this questionnaire only if you are an employee working in the city of (X). If you do not work in (X), we apologise for bothering you.
Please give your Name ____________________________
and Home Address ____________________________
____________________________
_________ Postcode __________
1. Where in (X) do you work? Please give the name of the road or building and its postcode. ___________________________________________________________
2. How far is it from your home to work? Please tick the appropriate box.
Less than 2 miles
Between 2 and 5 miles
Between 5 and 10 miles
10 or more miles
3. On a normal day with no unusual delays, how long does it take you to travel from home to work? _____ hours _____ minutes
4. What method(s) of transport do you use? Please tick all the relevant boxes.
Foot
Bicycle
Car or Van
Bus, Tram or Coach
Train or Underground
Other (please specify) ________________
5. For any of the methods of transport you have marked in question 4, please give the cost of a return trip.
Car or Van (cost of fuel only)
Bus, Tram or Coach
Train or Underground
Other
[Note. For many large cities, this question would need amending to allow for use of season or other multi-purchase tickets or "travelcards" that may cover more than one method of transport.]
6. Do you find any disadvantages in your present method(s) of travel?
Yes No
If you have answered Yes, please say what they are.
______________________________________________________
Thank you for completing this questionnaire.
Your answers will be kept confidential.

Ordinary Certificate, Paper I, 2004. Question 4 continued
Additional questions would be on relevant important topics, such as delays on public transport, overcrowding, reliability of services, cost, congestion on roads, and problems caused by flexible working times.

Ordinary Certificate, Paper I, 2004. Question 5
The advantages of a "diary" include the following.
(i) There is a much more accurate record of what was eaten and when.
(ii) Answers do not depend on long- or medium-term memory.
(iii) A diary form could be designed, with suitable headings and definitions, to make accurate and correct recording easier.
(iv) A further improvement may be to record quantities in some convenient way.
(v) Regularity of diet can be included by having carefully specified "time" boxes.

Disadvantages include the following.
(i) It takes time, and may become tedious, for a diary to be fully completed over a reasonable period.
(ii) There is no guarantee that it is completed fully accurately, at the time food or drinks are consumed or very soon afterwards.
(iii) Diets will vary somewhat according to seasonal availability of some items, requiring repetition of the exercise a few times during a year.
(iv) People may actually change their regular habits during the time they are keeping a diary.
(v) "Snacks" between main meals may not be recorded unless clear instructions are given, and not always even then.
Ordinary Certificate, Paper I, 2004. Question 6
What is to be estimated? In particular, will the interest lie in mean values of measurements or in proportions? This determines which variance formula is used. Proportions give much less information per item and so need much larger samples.

Are estimates required for subgroups? If so, stratification is required, and each subgroup has to be sampled adequately.

How accurate do the estimates need to be (how close to the "true" population value), and are the available resources (time, money, staff) sufficient to collect and process enough data to achieve this accuracy?

Does the person planning the survey have any information on the variability of the measurements to be taken or the size of the proportion to be estimated? If not, few of these questions can be answered satisfactorily and some preliminary work or a pilot survey will be needed.

Is the sample multi-purpose, i.e. required to estimate several things, either several measurements or proportions or a mixture of the two? If so, the sample size must be large enough to meet the requirements of precision for all of them. Assess this in the light of available resources.

What level of non-response may be expected? Allowance for this will be needed in deciding sample size.

How quickly are results needed, and is this realistic with the available resources?

Ordinary Certificate, Paper I, 2004. Question 7
(i) The total number of farms is 400, so we can take a 10% sample (40 farms) and therefore 10% in each size group. Rounded to the nearest whole number, this gives 20, 12, 8.

(ii)
Size group   Number (N_h)   SD (S_h)   N_h × S_h   Sample size (see calculation below table)
Small        203            6.4        1299.2      10.67
Medium       115            11.6       1334.0      10.95
Large        82             27.3       2238.6      18.38
Total        400                       4871.8      40.00

The sample sizes are $n_h = 40 \times N_h S_h / 4871.8$, that is $40 \times \frac{1299.2}{4871.8} = 10.67$, $40 \times \frac{1334.0}{4871.8} = 10.95$ and $40 \times \frac{2238.6}{4871.8} = 18.38$.
Rounding, these will be taken as 11, 11, 18.

(iii) Method (i) is proportional allocation, which is easy to plan and does not need estimates of standard deviations in groups. The groups (or strata) are represented in the same proportions as in the population, so the method gives reasonable estimates valid for the whole population without further adjustment.

Method (ii) is optimal allocation, sampling more intensively in the more variable parts of the population and in the larger strata. Its estimates have minimum variance for fixed total sample size (provided the available information on SDs is good). The recorded data have to be kept in the correct strata during the estimation calculations.
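The following Python sketch reproduces the two allocations in (i) and (ii) from the stratum sizes and standard deviations in the table. The optimal (Neyman) formula $n_h = n N_h S_h / \sum_k N_k S_k$ assumes the cost of sampling is the same for every farm, as the solution implicitly does. `round_half_up` is a helper introduced here to match ordinary hand rounding, because Python's built-in `round` sends halves to the nearest even number.

```python
import math

# Strata from the question: (name, number of farms N_h, standard deviation S_h)
strata = [("Small", 203, 6.4), ("Medium", 115, 11.6), ("Large", 82, 27.3)]
n = 40  # total sample size (10% of 400 farms)

def round_half_up(x):
    """Round to the nearest whole number, with .5 always rounding up."""
    return math.floor(x + 0.5)

N = sum(N_h for _, N_h, _ in strata)

# (i) Proportional allocation: n_h = n * N_h / N
proportional = [n * N_h / N for _, N_h, _ in strata]

# (ii) Optimal (Neyman) allocation: n_h = n * N_h * S_h / sum(N_k * S_k)
weights = [N_h * S_h for _, N_h, S_h in strata]
total_weight = sum(weights)
neyman = [n * w / total_weight for w in weights]

for (name, _, _), p, o in zip(strata, proportional, neyman):
    print(f"{name:<7} proportional {p:5.2f} -> {round_half_up(p):2d}   "
          f"Neyman {o:5.2f} -> {round_half_up(o):2d}")
```

Rounding each stratum separately happens to give totals of 40 in both cases here. In general the rounded sizes need not add up exactly to n.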
Ordinary Certificate, Paper I, 2004. Question 8
(i) (a) For systematic sampling, number the books in order of their positions on the shelves, beginning with (say) the top shelf, then moving to the second, then to the third and so on until all N books are identifiable. (Since a library will have many sets of shelves, the shelves in one set will be completed first before moving to the next set; this will usually be easier than completing all the top shelves first.)
Calculate k = N/n and round it to the nearest whole number. Choose at random a number between 1 and k (inclusive), say j. Locate the jth book along from the starting point; this is the first member of the sample. Then take every kth book after that.
(b) For cluster sampling, the clusters could be taken as the sets of shelves or, alternatively, individual shelves could be used. Number these clusters 1 to L. To obtain an approximate value of the number of books in each cluster, use M = N/L. Then choose n/M clusters at random to form the sample, and take every book in each cluster chosen.

Advantages of systematic sampling are that it is easy to carry out and would be very much quicker than a random sampling scheme. Since N is known, no complete count is necessary at the beginning. Most likely there would be no periodic variation in the ages of books, so age can be assumed to be a random variable when based on systematic sampling. All the stock of books would be covered, so long as no shelves were missed in the initial count. There is no exact theory for estimating sampling variability from a single systematic sample, but either a form of cluster sample analysis can be used or one based on assuming simple random sampling.

Cluster sampling requires the shelves, or sets of shelves, to be numbered first. Care needs to be taken when sampling each cluster to look at every book in it, once and once only. It is a good method if the distribution of ages of books within each cluster is similar to the overall distribution in the library. This may not happen if books are arranged by subjects, some of which will have more recent books and others older books.
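Below is a small Python sketch of the selection steps in (i)(a) and (i)(b), run on a hypothetical library of 40 shelves with 30 books each (N = 1200) and a target of n = 60. The function names and the library layout are invented for illustration. Each shelf is treated as one cluster.

```python
import random

def systematic_sample(N, n, rng):
    """Positions of books chosen by systematic sampling from books numbered 1..N."""
    k = max(1, int(N / n + 0.5))      # interval k = N/n, rounded to nearest whole number
    j = rng.randint(1, k)             # random start j with 1 <= j <= k
    return list(range(j, N + 1, k))   # books j, j+k, j+2k, ... up to N

def cluster_sample(shelves, n, rng):
    """All books on about n/M randomly chosen shelves (clusters), where M = N/L."""
    L = len(shelves)
    N = sum(len(shelf) for shelf in shelves)
    M = N / L                                   # approximate books per cluster
    m = min(L, max(1, int(n / M + 0.5)))        # number of clusters to select
    chosen = sorted(rng.sample(range(L), m))    # simple random sample of clusters
    return [book for c in chosen for book in shelves[c]]

# Hypothetical library: 40 shelves of 30 books each (N = 1200), target n = 60.
rng = random.Random(2004)
shelves = [[f"shelf{s}-book{b}" for b in range(1, 31)] for s in range(1, 41)]

sys_sample = systematic_sample(1200, 60, rng)
clu_sample = cluster_sample(shelves, 60, rng)
print(len(sys_sample), sys_sample[:3])
print(len(clu_sample), clu_sample[0])
```

When N/n is not a whole number, rounding k means the achieved systematic sample can differ somewhat from n. In the same way, the cluster sample size depends on how many books happen to be on the chosen shelves.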
(ii) Suitable strata would be very hard to define: books of similar age would take a long time to locate (unless there is a computer listing of stock, in which case the average age could probably be calculated directly for the population without sampling). Even if some other strata (not age) were used, the process would not be easy; for example, if shelves (or sets of shelves) were strata, every book would need numbering so that random samples could be taken. This would be very time-consuming.
Stratified sampling does not seem a good idea for this purpose.
© RSS 2005
Ordinary Certificate, Paper I, 2005. Question 1
chosen schools.
Using cluster sampling followed by systematic sampling:
Stage 1: take the schools in different geographical regions as clusters and select some clusters at random from these;
Stage 2: use the student lists as the basis for a systematic sample of individuals.

(ii) For the first scheme above, we are bound to obtain schools of all size groups, but we have to construct the strata first. It is then easy to estimate means, proportions etc from a simple random sample, but the method is rather tedious to plan and carry out. Also, occasionally a very "untypical" sample can result.
For the second scheme, clusters are administratively easier to handle, for example by using local offices as bases to cut down travel, but the chosen clusters may not be typical of the whole country and the schools within a cluster may be quite similar to one another. Systematic samples from the lists are then easy to carry out, and can ensure a good balance of leaving dates, but there is no theoretical basis for estimating variability.
NOTE that other combinations are possible, and will show similar advantages and disadvantages.

Ordinary Certificate, Paper I, 2005. Question 2
Your date of birth: DD/MM/YY

We are interested to know what your occupation has been in each year since you left school. This could be study, or work (paid or unpaid), or something quite different such as travel. For this survey the main occupation is defined as one on which you spent 4 or more hours per week and which lasted at least one month. If you changed your occupation within a year, please give the details of the months spent in each. (If you have any queries about how to complete this question, please contact the survey team.)

Calendar Year 2004 [Repeat all of this for 2005]
Study? Yes No (please tick relevant box)
If YES please give the following information:
Time for which it lasted (months) _________________
Name and location of institution ___________________________________
[NOTE it would be wise to give two spaces for answering these questions in case there has been a change]
Work? Yes No (please tick relevant box)
Paid Unpaid
_____________________________________________________________________

Please tell us what you hope to be doing five years from now:
_____________________________________________________________________
_____________________________________________________________________
And in ten years:
_____________________________________________________________________
_____________________________________________________________________
(These could be the same as now or something quite different.)

Thank you for responding to this survey.

A database might look like the following, continuing with as many occupations as necessary. The details of the occupations would be coded to indicate names (if appropriate) and locations. Career plans would also be coded.

FIELD NAME      FIELD TYPE    WIDTH
Respondent_ID   Auto-number   6
Title           Text          4
Surname         Text          24
Given_name      Text          24
Initials        Text          8
House_number    Numeric       4
Address_1       Text          36
Address_2       Text          36
Address_3       Text          36
Postcode        Text          8
Telephone       Numeric       16
Date_left       Date          6
Age             Numeric       2
Year_1_occ1     Numeric       2
Date_occ11      Date          6
Details_occ11   Numeric       3
Year_1_occ2     Numeric       2
Date_occ12      Date          6
Details_occ12   Numeric       3
Year_1_occ3     Numeric       2
Date_occ13      Date          6
Details_occ13   Numeric       3
Year_2_occ1     Numeric       2
:
Career_plans    Numeric       3

A spreadsheet would have variable names similar to the field names, and cell widths the same as the field widths. Coding could be the same as above. Either extra columns can be used for occupations or separate sheets for each. Types are number, date or text; an auto-number field becomes a number.
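The repeated occupation fields follow a regular naming pattern, so they could be generated rather than typed one by one. This sketch reads the pattern off the table (Year_y_occj, Date_occyj and Details_occyj, with three occupations per year as shown). The function name and the number of years are hypothetical, and the types and widths are copied from the table.

```python
def occupation_fields(n_years, occs_per_year=3):
    """Return (field name, type, width) rows for the repeated occupation fields."""
    rows = []
    for y in range(1, n_years + 1):
        for j in range(1, occs_per_year + 1):
            rows.append((f"Year_{y}_occ{j}", "Numeric", 2))    # coded occupation
            rows.append((f"Date_occ{y}{j}", "Date", 6))        # date(s) of this occupation
            rows.append((f"Details_occ{y}{j}", "Numeric", 3))  # coded name/location
    return rows

for name, ftype, width in occupation_fields(2):
    print(f"{name:<15}{ftype:<10}{width}")
```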
Ordinary Certificate, Paper I, 2005. Question 4
(i) A non-random method could:
(A) ensure that regions of both types are sampled, without the need to stratify;
(B) use regions that are easy to get to as often as necessary;
(C) choose regions that are likely to have a variety of types of market;
(D) give what may be thought a "representative" sample.
Disadvantages include:
(A) possibility of bias in choice;
(B) no valid estimate of sampling variation can be found;
(C) areas thought "not typical" may not be included;
(D) those least easy to reach may not be used.

(ii) (a) Merchants change from day to day; the same merchant may bring a different set of produce at each visit; some merchants may have access to very little land and so visit rarely; a "sampling frame" of merchants is not likely to exist.

Ordinary Certificate, Paper I, 2005. Question 5
(i) Some of the local representatives may be chosen at random from those with responsibilities in, and/or knowledge of, the community served by the market. Some may have an official position which makes them well known, and able to obtain the required information. Merchants may have to pay a hire charge, and if so the numbers doing so on any particular day could be obtained from the authority receiving the charge. Times of start and end of activity each day could be found by regular visits, and the chosen representatives must be prepared to do this. Numbers could also be estimated in this way. Merchants could be asked exactly where they come from, and their range of produce recorded. The amount they bring will determine the length of time they stay. Representatives have to be taught how to obtain accurate, reliable information.

(ii) Few people are likely to remember last year's prices at any particular season, even if they have a rough idea that things are cheaper or more expensive this year. Merchants could perhaps be asked whether supplies are more or less plentiful than last year, which may be related to price, but any attempt at numerical estimates is probably not worthwhile. Central figures on the price and quantity of staple foods may be available, but regions will vary.
Interviewers can encourage respondents to give particular answers by asking questions in a particular way, loaded towards a particular answer. Also, the question may not be fully understood by the interviewer, so it is not answered as it was meant to be asked. Respondents may not be shown the questionnaire on which answers are recorded, so they do not know all the possible answers expected; or the interviewer may take advantage of illiteracy to enter inaccurate answers. A trained interviewer should ask all questions in a neutral way, not depart from their wording, and be careful to record each answer as closely as possible to what the respondent says. A friendly attitude, helping but not forcing people towards answers, is necessary.

When collecting prices from markets, it may be wise to record prices quoted to potential buyers instead of asking directly. A direct answer may be one thought likely to please the interviewer, or biased in either direction according to why the merchant thinks the question is being asked, for example if he fears higher stall charges. Interviewers should explain the purpose of the survey, to avoid suspicions over the reason for it.

(i) There are 80 markets (27 large and 53 small), and a 10% sample is required. This requires 2.7 large and 5.3 small markets, or 3 large and 5 small as the nearest whole numbers.
This will cost (3 × 15) + (5 × 12) = 105 currency units.

(ii) Using $n_1$ large and $n_2$ small markets, optimal allocation takes each $n_h$ proportional to $N_h S_h / \sqrt{c_h}$:
$$n_1 \propto \frac{27 \times 0.05}{\sqrt{15}} = 0.3486 \quad\text{and}\quad n_2 \propto \frac{53 \times 0.07}{\sqrt{12}} = 1.0710,$$
so that $n = n_1 + n_2 \propto 1.4196$.
Hence $n_1 = \frac{0.3486}{1.4196}\,n = 0.2456n$ and $n_2 = 0.7544n$.
The cost is $15n_1 + 12n_2 = (15 \times 0.2456 + 12 \times 0.7544)\,n = 12.7368n$. This should not exceed 105, so $n \le \frac{105}{12.7368} = 8.24$.
Taking $n = 8$ gives $n_1 = 1.96$ and $n_2 = 6.04$, i.e. 2 large and 6 small markets, at a cost of (2 × 15) + (6 × 12) = 102 currency units.
(iii) If the data refer to a very important vegetable, the optimal allocation may be
preferred as it gives a minimum-variance estimate. However, there is no
guarantee that it will do the same for any or all of the other vegetables and
fruit on sale. Using a uniform sampling fraction should achieve representative
results for items of produce as a whole, and it does not need estimated
variances to make any calculations. The cost is marginally higher but this is
not very serious. The slightly larger number of large markets in uniform
sampling could also help in assessing the variation among the large markets.
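Here is a Python sketch of the cost-constrained optimal allocation in (ii), using the stratum sizes, standard deviations and per-market costs given above, with a budget of 105. The intermediate shares are not rounded, so they come out as about 0.2455 and 0.7545 rather than 0.2456 and 0.7544. This does not change the answer. The final rounding step (largest whole n within budget, then each $n_h$ to the nearest whole number) is one reasonable choice, not necessarily the examiners'.

```python
import math

# Strata for the market survey: (name, N_h markets, S_h, cost c_h per market)
strata = [("Large", 27, 0.05, 15), ("Small", 53, 0.07, 12)]
budget = 105

# Optimal allocation with unequal costs: n_h proportional to N_h * S_h / sqrt(c_h)
a = [N * S / math.sqrt(c) for _, N, S, c in strata]
shares = [x / sum(a) for x in a]                    # n_h = share_h * n

# Total cost = sum(c_h * share_h) * n, so the largest n within budget is:
cost_per_unit_n = sum(c * s for (_, _, _, c), s in zip(strata, shares))
n_max = budget / cost_per_unit_n

n = math.floor(n_max)                               # whole number of markets
alloc = [math.floor(s * n + 0.5) for s in shares]   # round each n_h to nearest
cost = sum(c * k for (_, _, _, c), k in zip(strata, alloc))

print(f"shares {shares[0]:.4f}, {shares[1]:.4f}; n_max {n_max:.2f}")
print(f"n = {n}: allocation {alloc}, cost {cost}")
```

Comparing this with the uniform 10% design (3 large and 5 small, cost 105) shows the trade-off discussed in (iii). The optimal design is slightly cheaper, but it samples fewer large markets.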
Index numbers for prices require a set of fruits and vegetables to be specified to go into the index. This is the first important decision to make. Prices ($p_i$) have to be found for each chosen item, and also the quantities ($q_i$) of these that are consumed in the population. Decisions on how to collect the $p_i$ and $q_i$ have to be made. The index number is generally calculated as a weighted average with weights based on $p_i q_i$.

Assuming that no index exists at present, prices from as many markets (or other outlets) as possible should be obtained. These may vary with season, but a decision must be made on a "typical" $p_i$ for the index. Some combination, possibly weighted, of local prices is likely to be best. In the same way, $q_i$ for each item has to be constructed. This is often done by a consumer survey, separately from the price survey, though care should be taken to see whether $p_i$ and $q_i$ for particular parts of the country are related (scarcity in a region may lead to high prices there). Town and rural consumption patterns may differ. The first year's data would be used as "baseline" information, and subsequent years' data compared with that baseline, usually by changing only the prices; quantities are updated less often. Whatever method has been used to construct the $p_i$ should go on being used in subsequent years to provide valid comparisons. (If any serious error is found in the method after later use, a new base may need to be set up using a modified version of the collection method.) Although indices are typically quoted as annual figures, food prices are almost bound to vary by season and the data collection has to allow for this.
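The solution does not name a specific formula. One standard choice that fits the description is the base-weighted (Laspeyres) index. It holds the base-year quantities $q_{0i}$ fixed and changes only the prices, and it equals a weighted average of the price relatives $p_{1i}/p_{0i}$ with weights $p_{0i} q_{0i}$. The Python sketch below uses a hypothetical three-item basket. The prices and quantities are invented for illustration.

```python
def base_weighted_index(p0, q0, p1):
    """Price index for period 1 relative to base period 0 (base = 100),
    keeping base-period quantities q0 fixed (Laspeyres form)."""
    base_cost = sum(p * q for p, q in zip(p0, q0))      # sum of p0 * q0
    current_cost = sum(p * q for p, q in zip(p1, q0))   # sum of p1 * q0
    return 100 * current_cost / base_cost

# Hypothetical basket of three items (prices per kg, quantities in kg per household)
p0 = [10.0, 20.0, 5.0]   # base-year "typical" prices
q0 = [3.0, 1.0, 4.0]     # base-year quantities from a consumption survey
p1 = [12.0, 20.0, 6.0]   # current-year prices

index = base_weighted_index(p0, q0, p1)
print(round(index, 1))
```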
© RSS 2006
Ordinary Certificate, Paper I, 2006. Question 1
(i) The description of the project indicates that a large area is involved. Villages chosen should represent all distinct types of agriculture and land, such as high and low land, mixed crop and animal farms, farms that deal only with crops and those that are only animal farms. Size of farm should also be considered, perhaps split into large and small. If the area is large enough to have a variable climate, for example if some parts have less rainfall than others, this is also important. If possible, reasonably accessible villages should be used. But with so many constraints and only 15 villages, there may be some difficulty making a good choice.

(ii) Method A
Advantages are that those attending are likely to be interested in the project and less likely to refuse to be interviewed; there is no "not at home" problem. Provided they are willing to stay, the necessary number can be obtained, if there are enough at the meeting!
Disadvantages are that a meeting may be poorly attended; unless there is a good number of interviewers, people may not be prepared to wait for their interview; and the sample is in any case self-selected, through interest in the project, enough spare time to attend, and perhaps being more articulate or literate. Good publicity beforehand and plenty of interviewing staff would help considerably. There could be other activities while waiting for interviews (e.g. experts whom they could consult), or some refreshments, or the promise of assistance by advisory services when needed.
The choice of random sample members also has problems. This cannot be done in advance; it must be done at the meetings. As there is no list, some form of selection has to be based on numbering of seating or distributing tickets to people as they arrive and then drawing numbers out of a hat when everyone has arrived. More than one member from the same farm could be selected unless this was deliberately avoided.

(iii) Method B
Absence of maps makes it hard to know whether all farms have been located and listed, and also to fix a suitable sampling interval in systematic sampling. Asking local inhabitants where neighbouring farms are could help to complete a list, and a preliminary tour round the area could then be used to fix the sampling interval. Non-response through being unwilling or unavailable is likely. Considerable time will be needed for this method.
It is however a properly random method, in which all farmers have a chance of being interviewed, and the interviewer can record other useful information as well as answers to questions. Choice of calling time could help to reduce non-response, especially by not going at busy times during the working day. Some form of incentive could be offered to obtain answers, though this would cause ill-feeling among those not in the sample.

Ordinary Certificate, Paper I, 2006. Question 2
The required questions could be covered on a form such as the one below, and instructions to interviewers might be printed in italics to avoid confusion.

Interviewer, say the following: Good morning (or afternoon, evening, as appropriate). The … organisation is undertaking a project on the control of XX in this country, and as part of this project would like some information about farms and farmers in this village. It would be very helpful if you could answer some questions about your farm. This will take at most half an hour.

Interviewer: If the farmer agrees, say "Thank you" and then start asking the questions below. If the farmer refuses, apologise for taking his/her time and try to arrange an alternative time of interview.

Q1. Please tell me your name.
Q2. How old were you last birthday?
Q3. Is your highest educational level primary, secondary, or higher? (Interviewer: delete as necessary.)

Interviewer: Ask questions 4–7 in turn, recording the answers to all four questions on the grid below. Insert the name of the household member where indicated.

Q4. Please could you tell me the names of all members of your household.
Q5. What relationship is (give name, working through all members of the household in turn) to you?
Q6. How old was (give name, working through all members of the household in turn) last birthday?
Q7. I am now going to ask you about the education of each member of your household. (Interviewer: Say this once only.)
Was the highest educational level of (give name, working through all members of the household in turn) primary, secondary, or higher?

Name of household member   Relationship to farmer   Age   Highest educational level (delete as necessary)
1                                                         primary/secondary/higher
2                                                         primary/secondary/higher
3                                                         primary/secondary/higher
4                                                         primary/secondary/higher
Q8. How many hours per week do you usually work on the farm at this time of year?
Interviewer: Record answer on grid after Q.10.

Interviewer: Ask questions 9–10 in turn, recording the answers on the grid below.
Q9. Which members of your household work on the farm at this time of year?
Q10. How many hours per week does (give name, working through all members of the household who work on the farm in turn) usually work at this time?

Interviewer: Ask questions 11–12 in turn, recording the answers on the grid below.
Q12. How many (give name of type of livestock, working through all listed in turn) do you own?

Interviewer: Ask questions 13–14 in turn, recording the answers on the grid below.
Q14. What size of area is planted with (give name of crop, working through all listed in turn)?

Q15. Would you describe the pest risk from XX in this area as high, medium, or low?

Interviewer, say the following: "That is the end of the questions. Thank you very much for taking the time to answer them. Is there anything you would like to ask me or anything else that you would like to add?" Record responses. Answer any questions if you are able to do so, or say you will try to find out the answers and that someone will get back to the interviewee. Record the action. Record anything the interviewee adds.
(i) Clearly one possibility is to allow for five entries for every farm, i.e. one for each potential household member. When households are smaller than five, this wastes a large amount of space and leads to large, unwieldy files. [It also requires care in programming when calculating means (or other statistics). For example, suppose five columns (one for each potential member) are used for each farm, but a particular farm has only four members. The mean age, say, must be calculated using just these four members, i.e. treating the fifth as "missing" (and not erroneously taking the fifth as being present but zero). A total (e.g. total number of hours worked) treats the "missing" column as zero, but clearly makes no sense on its own without knowing the number of members. It might help to use a code (different from any code for an individual missing value) to indicate those columns in which there are no entries at all because there are fewer than five members at that farm.]
(ii) Livestock: discrete variable (integer), showing the number on the farm, with as many variables as there are types of animal in the whole survey; score 0 if a particular farm does not have a particular animal type.

(i) Sample size n = 20. There are 92 children altogether. The proportions in the schools are as follows.
School 1: 25/92 = 0.2717
School 2: 30/92 = 0.3261
School 3: 37/92 = 0.4022
The sample sizes from the three schools should be as near as possible to n × 0.2717 etc. These are as follows.
School 1: 5.43   School 2: 6.52   School 3: 8.04
Therefore take samples of sizes 5, 7, 8.

School 1: 6.26   School 2: 5.67   School 3: 8.08
So now the sample sizes should be 6, 6, 8.
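For the schools, rounding each size to the nearest whole number happens to give a total of 20. That is not guaranteed: three equal strata sharing n = 2 would each round 0.67 up to 1, giving 3. The Python sketch below uses the largest-remainder rule, a common way of making the rounded sizes add up to n. It is a technique swapped in for illustration, not the method stated in the solution, which rounds each size directly. The function name is hypothetical.

```python
import math

def proportional_allocation(sizes, n):
    """Allocate n sample units across strata in proportion to their sizes.

    Largest-remainder rule: take the whole-number part of each exact share,
    then give the leftover units to the strata with the largest fractional
    parts, so the allocated sizes always add up to n.
    """
    N = sum(sizes)
    exact = [n * s / N for s in sizes]
    alloc = [math.floor(x) for x in exact]
    leftover = n - sum(alloc)
    # Strata ordered by fractional part, largest first
    order = sorted(range(len(sizes)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return exact, alloc

exact, alloc = proportional_allocation([25, 30, 37], 20)
print([round(x, 2) for x in exact], alloc)
```

For these schools the result, 5, 7 and 8, agrees with the solution.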
(ii) Stratified sampling controls the urban/rural factor better. Cluster sampling would be easier to administer, as both the sampling and the interviewing of individuals are carried out in more concentrated areas rather than over the whole country, saving time and resources.

If this method is to be used easily and successfully, there must be some central facility where a wide selection of people will go, e.g. a shopping centre or a main railway station. It is very unlikely that the whole population of the area could be captured at any one place, or even a representative part of that population; some could only be located at home or work places at convenient times.
Bias through interviewer choice of individuals is likely; refusal is more likely from some groups than others; and interviewers need good training (and occasional checking) to make sure they correctly allocate individuals to quotas. Refusal rates may be no lower than in random methods, and bias in asking questions is possible (as with any personal interview method).
Ordinary Certificate, Paper I, 2006. Question 7
(i) Advantages of a diary: it does not usually rely on memory to any extent; it does not usually involve any interviewing if a good layout is used and data can be taken directly from it; and there is a reasonably accurate record of the activities in each time unit. Headings could make completion easier, e.g. eating, working, leisure, codes for common occupations.

(ii) Disadvantages: in some circumstances instant completion is not possible, and there is a risk of some participants losing it, ceasing to keep it up to date, or not giving details. Fewer people may be willing to join up in the first place, and some of those who do will not complete it. Also the act of keeping a diary may lead some respondents to change habits, permanently or temporarily. A tedious task like this may not always be done fully. Coding of responses for analysis could be difficult and time-consuming, and analysis more complicated.

Ordinary Certificate, Paper I, 2006. Question 8
(i) Obviously for the youngest age group this is likely to be the best way of collecting information, but there are a number of difficulties. Some length of time will be needed for each child, perhaps a full day (or the equivalent, such as a morning and another afternoon/evening). This could easily alter the child's pattern of activities. A sample of several children will be required to find out the full variety of activities in that group, and to discover whether rural and urban differ and in what ways. Observing just one child for a week, for example, would be a waste of the observer's time. Basic information could be collected from parents, for example whether the child is at school or, if relevant, nursery and play-school attendance, although observation in such settings is not encouraged for legal reasons. Certainly by the time a child is at school "full-time", detailed observation would be neither possible nor necessary; observation would be limited to out-of-school hours and holidays. An observational study needs fully trained observers for long periods of time and will be very expensive. For the (say) 11–16 age group, a diary (see Question 7) might be sufficient.

(ii) (a) Depending on access to the web in the region being studied, it is probably better to concentrate on school-based data collection. For those who have access at home, this could be added. Schools would therefore have to agree to take part and to direct children to a web page containing a questionnaire. The answers would vary in quality according to whether the children took it seriously or not, and whether they were supervised or not; supervision might lead to less-than-honest answers for some questions.
For home use, pop-ups while browsing the web could be used. This misses those whose equipment does not accept pop-ups and those who only use the web occasionally for specific enquiries. Data quality by this method must be in serious doubt.
THE ROYAL STATISTICAL SOCIETY
2007 EXAMINATIONS SOLUTIONS
While every care has been taken with the preparation of these solutions, the Society will not be responsible for any errors or omissions.
© RSS 2007

Dear Customer,
We value the feedback from our customers on the holidays they take with us. So we hope you will spend a few moments answering the questions below. Please either tick the appropriate box when there is one or write your answer in the space provided. When you have completed the questionnaire, please hand it to your tour leader or, if you prefer, post it when you get home to the FREEPOST address given at the end.
YES NO
5. What did you like best about the holiday?
_______________________________________________________________
7. Please add here any other comments you would like to make
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
(ii) In closed form, a list of the more popular activities could be given, each
having a box to be ticked if the respondent liked doing that activity, while the
less popular ones could be combined into related activities such as
walking/hiking/climbing that would have just one box to tick.
Ordinary Certificate, Paper I, 2007. Question 3
(i) A quota sample should contain a given number (quota) of people in each of a set of categories. These categories are based on sex, age-group and any other characteristics of interest, such as in this case whether people are in a group or are travelling independently. Assume he has been told what each category consists of (e.g. "males over 60 travelling independently"), and how many residents he must interview from each category.
He should walk round the hotel complex, visiting all its facilities at different times of the day, and also interview people in the restaurant and in their rooms if possible. There will be a few questions to ask first to identify which category a person is in, so that the manager knows whether he needs another member from that category or not. He should go on searching until all his quotas have been met; some people may not be easy to catch, e.g. if they go on organised excursions most days.
A systematic sample can be taken once it is known how many people are needed and how many registrations there are in that particular week. If n interviews are needed and N people have registered during the week, he can take every (N/n)th person from the registration list, the starting point being chosen at random between 1 and N/n. [If most arrivals are in groups, the sampling fraction for independent travellers might need to be larger than that for groups to obtain adequate precision.]
(ii) The systematic method is easy to carry out, and should be effectively random (unless the "sampling interval" N/n unfortunately coincides with any cyclical pattern among the registrations). Everyone who has registered has the same chance of being selected (but this may need adjusting as mentioned above). Some may of course be difficult to locate, or may refuse, as in any survey. A quota sample ought to be representative of the population of residents, but the interviewer may be selective in which people are approached (for example, those who look less likely to refuse), and there is a danger of missing altogether some types of residents, e.g. those who breakfast early or dine late because they spend a lot of time out of the hotel. There is no theoretical support for quota sampling because there is no element of random selection in it.
Ordinary Certificate, Paper I, 2007. Question 4
(i) (A) is two-stage cluster sampling; (B) is stratified random sampling.
(ii) For cluster sampling (method A) to work well, the views of the patients in the sampled wards have to be representative of those in the hospital as a whole, i.e. in each chosen ward the variability in patients' views must reflect the variability in the whole hospital.
For stratified sampling (method B) to work well, the views of patients in sampled wards of a given type (O, SC or IC) should not vary much between wards of that type, but the types may differ noticeably, in which case results for each particular type are more useful than a single "overall" result.
(iii) (A) Advantages include: sample relatively quick to choose; requires details of patients only for selected wards; less effort to visit only a few wards than the whole hospital.
Disadvantages include: may not have all three types (O, SC, IC) represented adequately in the sample; variances of estimators tend to be high.
(B) Advantages include: easy to get results for each type of ward; possibility of including different questions relevant to each type of ward; estimates are often more precise.
Disadvantages include: takes time to select samples; takes time to visit several wards of each type; details are needed to trace individual patients.
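As a rough illustration of the systematic selection described in Question 3 above, the Python sketch below takes every k-th registration with a random start between 1 and k. It assumes that k = N/n is rounded down to a whole number when it is not exact (units beyond position nk then cannot be chosen, a simplification); the function name `systematic_sample` and the 120 registrations are hypothetical.

```python
import random


def systematic_sample(frame, n, rng=random):
    """Select n units from `frame`: every k-th unit, k = N // n, random start 1..k."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                  # sampling interval N/n, rounded down if not whole
    start = rng.randint(1, k)   # 1-based random start between 1 and k
    # 1-based positions start, start + k, ..., start + (n - 1)k; the last is <= n*k <= N
    return [frame[start - 1 + i * k] for i in range(n)]


# Usage: a hypothetical week with N = 120 registrations and n = 20 interviews (k = 6).
registrations = [f"guest{i:03d}" for i in range(1, 121)]
sample = systematic_sample(registrations, 20, random.Random(2007))
print(sample[:3])
```

Passing a seeded `random.Random` makes the selection reproducible, which is useful if the list of selected guests has to be checked later.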
Ordinary Certificate, Paper I, 2007. Question 5
(i) $N = \sum N_i = 50 + 25 + 10 = 85$. So the sample sizes $n_i$ have to be in the ratio $\frac{50}{85} : \frac{25}{85} : \frac{10}{85}$, multiplied by 36 to find the actual numbers. Thus we get 21.2 in ward A, 10.6 in B and 4.2 in C, so we take 21, 11 and 4 respectively.
The cost of this is (5 × 21) + (5 × 11) + (10 × 4) = 200 dollars.
(ii) For optimum allocation with costs, $n_i \propto N_i s_i / \sqrt{c_i}$. For A, $\frac{50 \times 1.81}{\sqrt{5}} = 40.47$; for B, $\frac{25 \times 3.23}{\sqrt{5}} = 36.11$; for C, $\frac{10 \times 2.18}{\sqrt{10}} = 6.89$.
This gives $n_A : n_B : n_C = 0.485n : 0.433n : 0.083n$, and the total cost is then $5(0.485n) + 5(0.433n) + 10(0.083n) = 5.42n$. Setting $5.42n = 200$ gives $n \approx 36.9$; with $n = 37$ the allocation is 17.9, 16.0 and 3.1, so we take 18, 16 and 3.
The total cost will then be (5 × 18) + (5 × 16) + (10 × 3) = 200 dollars, so this is satisfactory.
[Note: because the sample size for the more expensive ward (C) is rounded down to 3, we save just enough to take n as 37. Otherwise we would need to consider n = 36 and find suitable $n_A$, $n_B$, $n_C$.]
(iii) The main difference is in B, where the second method gives a larger sample size. Since B is the most variable ward, this should lead to more precise overall results. A sample as large as 21 in A, as in the first method, is rather wasteful, although C does have 4 on that scheme but only 3 on the second scheme. On balance, the second scheme is likely to be preferred.
Ordinary Certificate, Paper I, 2007. Question 6
(i) Telephone interviews are relatively cheap: there are no travel costs, and hardly any time is wasted if selected sample members are not at home or if the respondent says "call back later". Interviewers' performance is easily monitored as conversations will usually be recorded (it is important to make the respondents aware of this). Sometimes seeing an interviewer can put potential respondents off, whereas hearing one may not do so. Answers to sensitive questions may be better in telephone interviews than face to face.
(ii) However, refusals may be more likely, especially if people have been interrupted at a busy or inconvenient time. Conversations must be kept short. Background noise and the possibility of being overheard may cause problems. And not everyone is accessible by telephone.
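The two allocations in Question 5 above can be checked with a short Python sketch. It assumes one particular rounding rule (largest remainder: round every share down, then give the leftover units to the largest fractional parts), which happens to reproduce the solution's figures; `largest_remainder`, `cost` and the search over n are illustrative helpers, not part of the original solution.

```python
import math

N = [50, 25, 10]          # ward sizes N_i for A, B, C
s = [1.81, 3.23, 2.18]    # standard deviations s_i
c = [5, 5, 10]            # cost per interview c_i (dollars)
BUDGET = 200


def largest_remainder(weights, n):
    """Split n in proportion to weights, rounding to whole numbers that sum to n."""
    total = sum(weights)
    raw = [n * w / total for w in weights]
    alloc = [math.floor(x) for x in raw]
    # give the units lost by rounding down to the largest fractional parts
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:
        alloc[i] += 1
    return alloc


def cost(alloc):
    return sum(ci * ni for ci, ni in zip(c, alloc))


# (i) proportional allocation of n = 36
prop = largest_remainder(N, 36)

# (ii) optimum allocation with costs: n_i proportional to N_i s_i / sqrt(c_i);
# take the largest n whose rounded allocation still fits within the budget
w = [Ni * si / math.sqrt(ci) for Ni, si, ci in zip(N, s, c)]
best_n = max(n for n in range(1, BUDGET // min(c) + 1)
             if cost(largest_remainder(w, n)) <= BUDGET)
opt = largest_remainder(w, best_n)

print(prop, cost(prop))          # [21, 11, 4] 200
print(best_n, opt, cost(opt))    # 37 [18, 16, 3] 200
```

The brute-force search over n mirrors the note in part (ii): n = 38 would round to 18, 17 and 3 at a cost of 205 dollars, so 37 is the largest sample the budget allows.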
Ordinary Certificate, Paper I, 2007. Question 7
(i) A longitudinal study follows the same group of people through the whole period of the study. One advantage is that a sample will be relatively easy to choose from the list of last year's graduates, which is likely to be complete and very nearly fully up-to-date in terms of addresses etc, and thus provides a good sampling frame. Recent graduates are likely to be interested in responding, so there should be a good response rate at least initially, though this may fall off over time. It is useful to be able to follow a group through an extended period of time, noting changes in occupations and reasons for them.
A disadvantage is that occupations which are recruiting in one year may not be recruiting every year, so the pattern of jobs that are obtained immediately after graduation may change year by year. Another disadvantage is that contact may be lost as people change jobs or move to other addresses. People may also lose interest in the survey, leading to problems of reduced sample size and possible non-response bias. It is even possible that participation in the survey, with the feeling of being "watched", might make some respondents change some aspects of their occupations in the hope of creating a good impression. A further disadvantage is that the study must obviously take ten years (plus time for analysis) before full results are available.
(ii) A major advantage of using samples from graduates of five and ten years ago as well as current new graduates is timeliness. Results will immediately be available for people in varying stages of their careers and with different experiences of the initial jobs market.
However, unless the university has kept a good database of its graduates, with sufficient contact to keep addresses etc up to date, the sampling frames available at five and ten years will not be as good as the recent one. If contact has not been reasonably regular, even those who do receive an inquiry may not be very interested in responding. These disadvantages would make it wise to select a larger sample to allow for non-response, but this would increase the cost of the survey. Another disadvantage is that it would not be possible to find out how the careers of individual graduates had changed over time (unless the respondents were also asked to record their progress throughout the five or ten years; this could be done if it was thought appropriate).
Ordinary Certificate, Paper I, 2007. Question 8
(i) (1) People who respond might be systematically different from those who do not. For example, they might be very interested in the topic of the survey while most other people are not, or they might hold strong views which in no way represent those of the whole population. This can bias results very seriously.
(2) When some of the selected people fail to respond, the achieved sample size becomes smaller than was planned and so the results have lower precision than was aimed for. This can be very serious if there are many non-respondents.
(3) Following up non-response (as opposed to outright refusal) can be done by telephone or by having an interviewer visit people, but this costs resources (time and money) and may not be possible in a short time-scale when results are needed quickly.
(ii) Reasons include the following.
(1) The available address is no longer the correct one, as the person has moved.
(2) The questionnaire may not get delivered.
(3) The questionnaire may be regarded as junk mail and destroyed.
(4) The intended respondent may have died.
(5) A questionnaire may not reach the survey organiser even if it is returned.
(6) People may be too busy to reply, or not interested, or may simply set the survey aside until they are less busy, by which time it is too late.
It may be possible to improve the look of a questionnaire to make it seem more worthwhile answering; to remind people (also by post) at intervals; to send a pre-paid reply envelope with the questionnaire; to offer gifts or inducements, such as taking part in a draw, as an incentive to reply; and to take care that the introductory explanation to the questionnaire makes it seem interesting and relevant, and is expressed in simple form rather than unduly "official".
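To put a number on point (2) of Question 8 (i) above, the sketch below compares the standard error of an estimated proportion for a planned and an achieved sample. It assumes simple random sampling, ignores the finite population correction, and uses hypothetical sample sizes (400 planned, 200 achieved); halving the sample multiplies the standard error by √2. It says nothing about the non-response bias of point (1), which a larger sample does not remove.

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling
    (finite population correction ignored)."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5             # proportion giving the largest standard error
planned_n = 400     # hypothetical planned sample size
achieved_n = 200    # hypothetical achieved size after 50% non-response

se_planned = se_proportion(p, planned_n)     # 0.025
se_achieved = se_proportion(p, achieved_n)   # about 0.0354
print(f"planned SE {se_planned:.4f}, achieved SE {se_achieved:.4f}, "
      f"ratio {se_achieved / se_planned:.3f}")
```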
ORDINARY CERTIFICATE
PAPER I
© RSS 2008
Many questions can be answered by simply ticking the relevant response.
1. In which year did you join the organisation?
Of no interest
A few are of interest
Most or all of them are of interest
7. Do you think that advertisements should be included?
Yes No No opinion
8. Do you agree that publishing a newsletter at intervals of one month is about right?
Yes No
If you answered No, at what interval would you like the newsletter to be published?
Ordinary Certificate, Paper I, 2008. Question 2
Advantages of including a questionnaire as an insert in the newsletter include:
There are no costs of selecting a sample, addressing envelopes etc, or separate postage costs.
The questionnaire is sent to all members so potentially every member could give an opinion of the newsletter.
Disadvantages include:
The financial and time costs of selecting the sample, addressing and stuffing envelopes.
Costs associated with following up non-response.
Ordinary Certificate, Paper I, 2008. Question 3
(i) Systematic sample from the alphabetical list of members.
This should achieve a sample close to a simple random sample. The proportions in the sample in the three grades of membership should be similar to the actual proportions in the entire membership, and likewise there should be a good spread across the years of joining the organisation. An example of a potential problem is that, if the number in any grade is very small (e.g. there are only a few student members), this method might not select any members at all from that grade.
Systematic sample from the list ordered by year of joining the organisation.
This would achieve a good sample across time, with numbers from different periods of joining represented in much the same proportions as in the entire membership. This is important for picking up any trends over time. Year of joining is likely to be related to some extent to grade of membership, perhaps with those who joined a long time ago more likely to be retired and those who joined recently more likely to be students, and this method would thus also sample the three grades in roughly the same proportions as in the entire membership. However, if very few joined in some years for some reason important to the organisation, such as a sharp rise in subscriptions, then this sampling method might not sample that group at all.
Ordinary Certificate, Paper I, 2008. Question 4
(i) Total of 1162 members.
Need (141/1162) × 95 = 0.1213 × 95 = 11.53 student members, say 12.
Need (782/1162) × 95 = 0.6730 × 95 = 63.93 ordinary members, say 64.
Need (239/1162) × 95 = 0.2057 × 95 = 19.54 retired members, say 20.
Note that 12 + 64 + 20 = 96 but there is sufficient budget only for 95 altogether. So take one fewer in one of the groups, say 11 student members (as 11.53 is very slightly further from 12 than 19.54 is from 20, but there is very little in it, and this does depend on the calculations being worked to 2 decimal places).
(ii)
Membership grade   N_i    s_i      N_i s_i     N_i s_i / ΣN_i s_i   (N_i s_i / ΣN_i s_i) × 95
Student            141   226.21    31895.61    0.0597                5.67
Ordinary           782   550.12   430193.84    0.8046               76.44
Retired            239   303.60    72560.40    0.1357               12.89
Total             1162            534649.85    1.0000               95.00
This gives sample sizes of 6 student, 76 ordinary and 13 retired members (total 95).
Note in particular that in (ii) only 6 student members are selected. If non-response is high in this group, as is likely, then the achieved sample might be very small indeed.
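Both allocations in Question 4 above can be reproduced with the Python sketch below. It uses the largest-remainder rounding rule (an assumption, but equivalent here to the reasoning in part (i) about which group to reduce): each allocation is rounded down and the spare units go to the largest fractional parts. The helper name `largest_remainder` is hypothetical.

```python
import math


def largest_remainder(weights, n):
    """Split n in proportion to weights, rounding to whole numbers that sum to n."""
    total = sum(weights)
    raw = [n * w / total for w in weights]
    alloc = [math.floor(x) for x in raw]
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:   # spare units go to the largest fractions
        alloc[i] += 1
    return alloc


grades = ["Student", "Ordinary", "Retired"]
N = [141, 782, 239]              # members in each grade
s = [226.21, 550.12, 303.60]     # standard deviations s_i
n = 95                           # total sample the budget allows

proportional = largest_remainder(N, n)                             # part (i)
neyman = largest_remainder([Ni * si for Ni, si in zip(N, s)], n)   # part (ii)
for g, p, q in zip(grades, proportional, neyman):
    print(f"{g:<9} proportional {p:3d}  Neyman {q:3d}")
```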
Ordinary Certificate, Paper I, 2008. Question 5
(i) A pilot survey is a small sample survey carried out at the planning stage of a full scale census or survey. It is used to test the design of the questionnaire, the sample frame and the general administrative procedures. It provides estimates of the costs and response rates, and of the variances of measured variables. It can lead to improvements, technical and/or procedural, in the conduct of the full census or survey.
A sample survey is, as the name suggests, a survey of a sample of a population. The sample would usually be selected by a random procedure and would be considerably larger than the sample used in a pilot survey, but normally considerably smaller than the population. It would be done when it is too costly both financially and in terms of time to survey the whole population. For fixed resources, information can be obtained in greater depth in a sample survey than in a census, and quality control can be more stringent.
Ordinary Certificate, Paper I, 2008. Question 6
(i) A region can be considered to be a cluster of supermarkets. Cluster sampling could consist of selecting one or more (but not all) regions at random. All supermarkets in these regions would then be in the sample of supermarkets (1-stage cluster sampling), or a sample from the supermarkets in these regions could be taken (2-stage cluster sampling).
An advantage of this method is that interviewers would need to be employed in the selected regions only, saving on travel and administrative time and costs.
Disadvantages include: the supermarkets in the chosen region(s) might be atypical; the variances of estimators in this method of sampling tend to be higher than in stratified and simple random sampling; the estimation method is more complex.
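A minimal Python sketch of the one-stage and two-stage cluster designs in Question 6 above, treating each region as a cluster of supermarkets. The regions, store names, numbers selected and the function `cluster_sample` are all hypothetical, and the weighting needed for estimation is not shown.

```python
import random


def cluster_sample(regions, m, per_region=None, rng=random):
    """Cluster sample of supermarkets, each region being one cluster.

    Stage 1: choose m regions by simple random sampling.
    If per_region is None, every supermarket in a chosen region is taken
    (one-stage); otherwise a simple random sample of per_region supermarkets
    is drawn within each chosen region (two-stage).
    """
    chosen = rng.sample(sorted(regions), m)
    sample = {}
    for r in chosen:
        stores = regions[r]
        if per_region is None:
            sample[r] = list(stores)
        else:
            sample[r] = rng.sample(stores, min(per_region, len(stores)))
    return sample


# Hypothetical frame: regions R1..R6 holding 8 to 13 supermarkets each.
regions = {f"R{j}": [f"R{j}-S{k}" for k in range(1, 8 + j)] for j in range(1, 7)}
rng = random.Random(2008)
one_stage = cluster_sample(regions, 2, rng=rng)
two_stage = cluster_sample(regions, 2, per_region=3, rng=rng)
print(sorted(one_stage), {r: len(v) for r, v in two_stage.items()})
```

Only the chosen regions ever need interviewers, which is the administrative saving described above; the price is that two atypical regions can dominate the whole sample.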
(i) As the interviewers are to select the customers and interview them at the time of selection, quota sampling would be an appropriate sampling method.
Interviewers should go to the stores at varying times during the day and week. They should aim to interview customers of all ages and of both sexes, in about the same proportions as customers using the supermarket. Observing customers at check-outs would give an indication of these proportions, as might asking staff (but staff will not necessarily see the whole range of customers; for example, staff who work only in the evenings are likely to see mainly people who are employed during the day and who do not have children under 16).
Different types of customer should be approached: both those shopping on their own and those who are with others, and both those who are doing a big shop and those who have come in for one or two items only. Customers should be selected taking no account of their dress or demeanour; that is, interviewers should avoid approaching only those who look respectable, who look as if they might be sympathetic to answering questions, who appear not to be in a hurry, etc. Customers could be selected from those in the check-out queue, from those leaving the store, or by walking round the store. Standing by one display in the store would not be sensible as not all customers will pass it.
Interviewers should dress moderately, approach customers politely, etc. They should not try to persuade customers to respond against their will.
(ii) Interviewers should treat the customers with respect. They should ask the questions and read any introduction and closing note of thanks exactly as written on the questionnaire, and should use a neutral tone of voice to avoid biasing the customers' answers. They should not comment on or react in any way to the customers' responses. They should record the responses that are given and not make them up or expand on comments. They might (discreetly) record specific observations about customers, such as a customer being hard of hearing, leading to communication difficulties.
The observers could stand by the displays of organic fruit and vegetables with a pre-prepared check-list to be completed by ticking boxes indicating characteristics such as sex and estimated age group for customers who put such produce in their baskets or trolleys, and other boxes indicating which organic fruit and vegetables are taken and in roughly what quantities. However, there might be a problem if fruit and vegetable displays are not very close to each other, in which case individual customers might need to be followed round the displays to collect any useful information.
Alternatively, or as well, the observers could stand by the checkouts to make similar observations.
Observations would need to be made at different times of day and on different days of the week.
Advantages of an observational study in this context over the use of a questionnaire include:
It does not take up customers' time.
It does not rely on customers answering questions truthfully.
Customers do not have to be approached and persuaded to take part.
Disadvantages include:
It is only possible to observe whether organic fruit and vegetables are bought by a customer at the time of observation. No information can be gained about customers who buy such produce at other times, or who do not buy because they cannot find what they are looking for or think it too expensive.
No information can be obtained about customer characteristics that cannot be observed, such as household size, place of residence and occupation.
The results depend very much on the ability of observers to decide which customers to observe and their ability to make correct records.
[Note. Possible alternative survey methods include (short and simple) interviews of customers in check-out queues. Excellent information on products bought should be available from the database holding information from the check-out tills.]
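The quota procedure described above for the supermarket interviewers (screening each customer into a category, then interviewing only while that category's quota is unfilled) can be sketched as a small tracker. The categories and quota sizes are hypothetical, and the sketch does nothing about interviewer selection bias, which remains the main weakness of quota sampling.

```python
from collections import Counter


class QuotaTracker:
    """Count interviews against a fixed quota for each category."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)   # category -> number of interviews required
        self.done = Counter()        # category -> interviews completed so far

    def needs(self, category):
        """True if another interview is still wanted in this category."""
        return self.done[category] < self.quotas.get(category, 0)

    def record(self, category):
        if not self.needs(category):
            raise ValueError(f"quota for {category!r} is already full")
        self.done[category] += 1

    def complete(self):
        return all(self.done[c] >= q for c, q in self.quotas.items())


# Hypothetical quotas by (sex, age group).
tracker = QuotaTracker({("F", "16-34"): 2, ("F", "35+"): 2,
                        ("M", "16-34"): 1, ("M", "35+"): 2})
# Categories found by the screening questions for successive customers approached.
screened = [("F", "35+"), ("M", "16-34"), ("M", "16-34"), ("F", "35+"), ("F", "35+")]
for category in screened:
    if tracker.needs(category):
        tracker.record(category)   # interview this customer
    # otherwise thank the customer and move on
print(dict(tracker.done), tracker.complete())
```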
ORDINARY CERTIFICATE
PAPER I
The Society provides these solutions to assist candidates preparing for the examinations in future years and for the information of any other persons using the examinations.
The solutions should NOT be seen as "model answers". Rather, they have been written out in considerable detail and are intended as learning aids.
Users of the solutions should always be aware that in many cases there are valid alternative methods. Also, in the many cases where discussion is called for, there may be other valid points that could be made.
© RSS 2009
Consider the administrative districts as lying in strata classified as urban and rural, possibly also with regions as strata. As the first stage of sampling, take a simple random sample of districts from each stratum (stratified sampling). At the second stage of sampling, take systematic samples of adults from the lists in the districts in the sample.
(ii) Comments on benefits and drawbacks may depend to some extent on the sampling designs suggested in part (i), but are likely to include the following.
Cluster sampling benefits: administratively easier than other methods; less travel if interviewers are used. Cluster sampling drawbacks: districts within a cluster might be similar to one another; clusters chosen might be atypical of districts in the country as a whole; estimators are complicated.
Simple random sampling benefits: estimation is easy; properties of estimators are well understood. Drawbacks: moderately complicated to select a sample; extreme/atypical samples might occur.
Stratified sampling benefits: ensures both rural and urban districts (and all regions if used as strata) are represented in the survey. Drawbacks: involves stratifying districts as a first step; can lead to reduced precision if the stratification is poorly done.
(iii) To obtain a sample of all adults would need supplementation of the sample of adults in private households by sampling adults living in institutions such as hostels, residential homes and prisons. Care would be needed to ensure there was no duplication; for example, somebody listed as in a private household six months ago might now be in prison. Obtaining access to suitable lists might be difficult due to confidentiality.
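The stratified two-stage design described above (simple random sample of districts within each urban/rural stratum, then a systematic sample of adults from each selected district's list) can be sketched as follows. It assumes the lists are available as Python dictionaries and lists; the strata, district names, sample sizes and `stratified_two_stage` are hypothetical, and the interval k is rounded down when a list length is not a multiple of the number wanted.

```python
import random


def stratified_two_stage(frame, n_districts, n_adults, rng=random):
    """Stage 1: simple random sample of districts within each stratum.
    Stage 2: systematic sample of n_adults from each selected district's list.

    frame: stratum -> {district -> list of adults in private households}
    n_districts: stratum -> number of districts to select in that stratum
    """
    sample = {}
    for stratum, districts in frame.items():
        for d in rng.sample(sorted(districts), n_districts[stratum]):
            adults = districts[d]
            if n_adults > len(adults):
                raise ValueError(f"district {d} lists fewer than {n_adults} adults")
            k = len(adults) // n_adults    # sampling interval, rounded down
            start = rng.randrange(k)       # random start, 0-based
            sample[d] = [adults[start + i * k] for i in range(n_adults)]
    return sample


# Hypothetical frame: 5 urban districts of 200 adults, 8 rural districts of 60.
frame = {
    "urban": {f"U{i}": [f"U{i}-a{j}" for j in range(200)] for i in range(1, 6)},
    "rural": {f"R{i}": [f"R{i}-a{j}" for j in range(60)] for i in range(1, 9)},
}
sample = stratified_two_stage(frame, {"urban": 2, "rural": 3}, 10, random.Random(2009))
print({d: len(v) for d, v in sample.items()})
```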
Ordinary Certificate, Paper I, 2009. Question 2
When you get a reply from the address at which you have been asked to call, show your identity card and say the following: "Good morning (or afternoon, evening, as appropriate). I work for XXX and have been asked to interview ABC, whose address I believe this is. Is ABC in?" If the answer is "No", try to find out when ABC might be in and arrange to call back. If the answer is "Yes", ask to see this person unless you are already speaking with him/her.
When speaking with ABC, or if asked by the person answering the door, say "As you probably know" (or, if the person is not ABC, "might know") "from a letter that you have (or ABC has) been sent, the government is interested in finding out adults' perceptions of crime and has asked my organisation to undertake a survey on this topic. You have (or ABC has) been selected by a random process as a person who could help, and we would be very grateful if you (or ABC) would answer some questions. This will take at most half an hour."
If ABC agrees, say "Thank you" and then start asking the questions below in order and record the answers, ticking boxes where appropriate. If ABC refuses, apologise for taking his/her time and try to arrange an alternative time of interview.
Q1. Do you think that crime is a problem in your neighbourhood?
Yes No Don't know Did not answer
Q2. Compared with five years ago, do you think the level of crime in your neighbourhood is now
Lower?
About the same?
Higher?
Don't know or did not answer
Q3. Compared with five years ago, do you think the level of crime in the country as a whole is now
Lower?
About the same?
Higher?
Don't know or did not answer
Q4. Do you think that violent crimes are on the increase in the country as a whole?
Yes No Don't know Did not answer
Q5. Violent crimes are sometimes committed by people under 16 years of age. What are your feelings about this?
Interviewer: now say "To put your replies in context, I am now going to ask you some questions about yourself. It will not be possible to identify you in any way from the published results of the survey, which will be presented as tables."
Q6. Are you
Married or in a civil partnership?
Divorced or separated?
Living with a partner?
Single and not living with a partner?
Q7. What is your occupation? …………………………………………………………….
Q8. What is your age group? (Interviewer: show card and record answer.)
18 – 24
25 – 44
45 – 64
65 – 79
80 and over
Interviewer: now say the following: "That is the end of the questions. Thank you very much for taking the time to answer them. Is there anything you would like to ask me or anything else that you would like to add?" Record responses. Answer any questions if you are able to do so, or say you will try to find out the answers and that someone will get back to the interviewee. Record the action. Record anything the interviewee adds.
Record the sex of the respondent. This is a check, as the name of ABC will usually reveal this.
Male Female Not clear
Return completed forms to the office as requested.
Ordinary Certificate, Paper I, 2009. Question 3
As the topic is a sensitive one, self-completion questionnaires (no interviewer present) are more likely to elicit honest responses than telephone interviews. The respondent would have time to reflect on the answers and to remember details that might have been suppressed. The questionnaire could use closed questions and examples to prompt the respondent's memory and ensure that the respondent knew what kinds of incidents constituted crimes. In addition, not everyone will have a telephone; in a telephone call it could be difficult for the interviewer to establish rapport with the respondent; identification details cannot be shown; more concentration is needed in a telephone interview and respondents are likely to become tired if the interview is long; it is not easy for the respondent to look up facts; and there could be background noise and/or the chance of others overhearing the interview.
Advantages of telephone interviews compared with self-completion questionnaires are that people often keep the same telephone number when they move house; the researcher will know quickly whether or not the selected sample member is willing to respond, so there is no time and money spent on following up non-respondents; and it is usually possible to make contact at a telephone number, even if only through an answer-phone.
Ordinary Certificate, Paper I, 2009. Question 4
(i) Advantages of quota sampling over simple random sampling.
It does not require a sampling frame, so it is useful when no suitable frame exists.
It is quick to do as interviewers are not constrained to find named respondents, so there are no call-backs.
Hardly any preparatory work in the office is required.
Controls, such as limiting the numbers of men and of women to interview, are relatively easy to use.
The costs of planning and analysis are less than in random methods with call-backs.
A target sample size can usually be achieved.
(ii) Disadvantages of quota sampling compared with random sampling.
(i) The sampling fraction is 150/750 = 1/5.
For "budget", 377/5 = 75.4; for "standard", 303/5 = 60.6; for "de luxe", 70/5 = 14.
This suggests taking sample sizes of 75, 61 and 14 from those who have booked budget, standard and de luxe cabins respectively.
(ii)
N_h    s_h     N_h s_h    N_h s_h / ΣN_h s_h   (N_h s_h / ΣN_h s_h) × 150
377    8850    3336450    0.4387               65.8
303   13005    3940515    0.5181               77.7
 70    4695     328650    0.0432                6.5
750            7605615    1.0000              150.0
This suggests taking sample sizes of 66, 78 and 6 from those who have booked budget, standard and de luxe cabins respectively.
The main sources of error in an interviewer survey of a random sample of cruise passengers drawn from passenger lists are sampling error, measurement error, non-response, interviewer effects and processing errors.
Sampling error occurs because only a sample of passengers is selected. If the method is random there would be no sampling biases, but this does depend on having a good sampling frame. If there are problems with the frame, such as duplication of names or omissions of names, then sampling biases will occur.
Measurement error might be due to problems with the questionnaire, for example if a question is worded in such a way that it measures something different from what was intended. Respondents do not necessarily give true answers to questions, either deliberately or because they do not know or have forgotten details. However, if the responses are close to the truth, measurement error from this source will be small. Measurement error also occurs when interviewers record responses incorrectly. This might be because they did not hear a response properly, but might also be a transcription error, or even deliberate. Measurement error can also occur during processing.
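The proportional and optimum (Neyman) allocations for the cruise passengers in parts (i) and (ii) above can be checked with the Python sketch below. It also evaluates the usual estimated variance of the stratified mean, Σ W_h² s_h² (1 − n_h/N_h)/n_h with W_h = N_h/N, to show that the Neyman allocation gives the smaller variance here. The largest-remainder rounding rule and the helper names are assumptions, not part of the original solution.

```python
import math


def largest_remainder(weights, n):
    """Split n in proportion to weights, rounding to whole numbers that sum to n."""
    total = sum(weights)
    raw = [n * w / total for w in weights]
    alloc = [math.floor(x) for x in raw]
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[: n - sum(alloc)]:   # spare units go to the largest fractions
        alloc[i] += 1
    return alloc


N = [377, 303, 70]          # passengers in budget, standard, de luxe cabins
s = [8850, 13005, 4695]     # standard deviations s_h
n = 150

prop = largest_remainder(N, n)                                     # part (i)
neyman = largest_remainder([Nh * sh for Nh, sh in zip(N, s)], n)   # part (ii)


def var_stratified_mean(alloc):
    """Estimated variance of the stratified sample mean, with finite population correction."""
    total = sum(N)
    return sum((Nh / total) ** 2 * sh ** 2 * (1 - nh / Nh) / nh
               for Nh, sh, nh in zip(N, s, alloc))


print(prop, neyman)                                               # [75, 61, 14] [66, 78, 6]
print(var_stratified_mean(neyman) < var_stratified_mean(prop))   # True
```

The gain comes from moving interviews from the low-variability de luxe stratum to the highly variable standard stratum.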
Ordinary Certificate, Paper I, 2009. Question 7

In a question in open format (often referred to as an open-ended question), the respondent is given no suggestions of possible answers. In questions in closed format, alternative responses are given. In a closed question with a single answer, the alternatives are mutually exclusive and the respondent is asked to choose one. In a closed question with multiple answers, the respondent is asked to choose as many as apply. Sometimes an "Other, please state" option is given in a closed question.

Ordinary Certificate, Paper I, 2009. Question 8

(i) The amount of money taken each month is known as the value and is equal to price per item times number of items sold. Clearly, data suitable for monitoring changes in the value per month would consist of prices and quantities sold of a sample of the different magazines and newspapers. The quantities could be obtained from records of stock coming in and subtracting the amounts unsold and ultimately removed from sale.
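The definition of value (price per item times quantity sold, with quantity sold taken from stock records) can be sketched as below. The titles, prices and stock figures are entirely hypothetical and only illustrate the calculation for two months.

```python
from dataclasses import dataclass


@dataclass
class Title:
    """One sampled magazine or newspaper in one month (hypothetical figures)."""
    name: str
    price: float    # price per copy, in £
    delivered: int  # copies coming into stock during the month
    unsold: int     # copies unsold and removed from sale

    @property
    def quantity_sold(self) -> int:
        # Quantity sold = stock coming in minus copies unsold and removed.
        return self.delivered - self.unsold

    @property
    def value(self) -> float:
        # Value = price per item x number of items sold.
        return self.price * self.quantity_sold


def monthly_value(sample: list) -> float:
    """Total value (money taken) for the sampled titles in one month."""
    return sum(t.value for t in sample)


# Hypothetical sample of two titles observed in two consecutive months.
march = [Title("Daily paper", 1.20, 600, 45), Title("Weekly magazine", 3.50, 80, 12)]
april = [Title("Daily paper", 1.20, 580, 60), Title("Weekly magazine", 3.50, 80, 8)]

change = (monthly_value(april) - monthly_value(march)) / monthly_value(march)
print(f"March £{monthly_value(march):.2f}, April £{monthly_value(april):.2f}, "
      f"change {100 * change:+.1f}%")
```

Tracking the same sample of titles from month to month keeps the comparison like-for-like; changes in value then reflect changes in prices and quantities sold rather than changes in which titles were recorded.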
THE ROYAL STATISTICAL SOCIETY

2010 EXAMINATIONS SOLUTIONS

ORDINARY CERTIFICATE

PAPER I

The Society provides these solutions to assist candidates preparing for the examinations in future years and for the information of any other persons using the examinations.

The solutions should NOT be seen as "model answers". Rather, they have been written out in considerable detail and are intended as learning aids.

Users of the solutions should always be aware that in many cases there are valid alternative methods. Also, in the many cases where discussion is called for, there may be other valid points that could be made.

While every care has been taken with the preparation of these solutions, the Society will not be responsible for any errors or omissions.

The Society will not enter into any correspondence in respect of these solutions.

© RSS 2010

(i) Selection bias is bias due to the method of selecting the sample and arises when the members selected are in some way consistently atypical of the study population. It would result in estimates of population quantities that are systematically too low or too high.

Response rate is the proportion of those selected to take part in the survey who provide a reply. A low response rate could produce poor estimates of population quantities, as those who respond might be atypical and not representative of the population, even if there was no selection bias when the sample was taken. Standard errors are also likely to be high, so precision of estimates will be low.

(ii) In method A, there is no sampling scheme as such. Only those with a particular interest in the restaurant and with time to spare might reply, introducing both selection bias and a low response rate. These are both disadvantages. On the other hand, potentially anyone who approaches the pay desk could respond, which is an advantage, as is the publicity about the survey.

In method B, tables are selected rather than people, but there could well be more than one customer at a table. If the decision as to whom to interview at a table is left to the interviewer, there could be selection bias due to the interviewer. There could also be problems arising from the time of day (or day of the week), as the restaurant is likely to be much busier at some times than others. The response rate will depend on people's willingness to respond, and this might be low as they might not wish to be interviewed while, or just after, eating. The advantages of the method are the element of randomness involved and that a personal approach has the potential to increase the sample size compared with method A. The personal approach might also enable deeper questioning to be carried out ("probing").
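The point in (i), that a low response rate can bias estimates even when the sample itself was selected without bias, can be illustrated by a small simulation. This is a sketch under invented assumptions: the population scores are uniform on 0 to 10, and the chance of replying is assumed to rise from 5% at a score of 0 to 55% at a score of 10.

```python
import random
from statistics import mean


def simulate_nonresponse(pop_size=20_000, n=2_000, seed=1):
    """Compare population, sample and respondent means under an assumed
    response mechanism in which keener people are more likely to reply."""
    rng = random.Random(seed)
    # Hypothetical population of scores (e.g. satisfaction, 0 to 10).
    population = [rng.uniform(0, 10) for _ in range(pop_size)]
    # Simple random sample: no selection bias in who is selected.
    sample = rng.sample(population, n)
    # Assumed response probability 0.05 + 0.05 * score (hypothetical).
    respondents = [y for y in sample if rng.random() < 0.05 + 0.05 * y]
    return {
        "population mean": mean(population),
        "sample mean": mean(sample),
        "respondent mean": mean(respondents),
        "response rate": len(respondents) / n,
    }


for name, value in simulate_nonresponse().items():
    print(f"{name:16s} {value:.3f}")
```

The sample mean stays close to the population mean, but the respondent mean is pushed upwards (to around 6.4 in expectation under these assumptions, against a population mean of 5), and only about 30% of the sample contributes, which also inflates the standard error.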
Ordinary Certificate, Paper I, 2010. Question 2

(i) The regions (which consist of small and large urban areas) are considered as clusters. The first stage is to take a simple random sample of clusters. This is cluster sampling. Having selected these clusters, the second stage should consist of stratified sampling, with each selected cluster stratified into small and large urban areas; it may be useful to further subdivide the large urban areas into those with (say) 2 or 3 outlets and those with 4 or 5 outlets. The stratified sampling would be conducted by simple random sampling within the strata, perhaps using proportional allocation. It is common practice that the selected sample of clusters contains only a small number of them (sometimes only one), and sometimes complete enumeration is then carried out within each selected cluster. This relies, of course, on each of the clusters being representative of the population as a whole.

(ii) The tables need to be numbered. One method is to choose a simple random sample of tables and ask the interviewer to approach customers at these tables in turn in a specified order, returning later to tables that were vacant. Another method is to take a systematic sample of tables and ask the interviewer to follow a similar procedure.

Bearing in mind that interviews will take time, say 10 minutes including time to approach customers and decide whom to interview at a table, it might be reasonable to do six interviews in an hour. For restaurants with only 6 tables, all might be approached so that the sample size, in terms of tables, is 100% (complete enumeration); only a random order of tables to approach is needed. For restaurants with as many as 15 tables, a 50% sample of the tables might be appropriate (though perhaps somewhat ambitious in terms of the time taken). Restaurants with intermediate numbers of tables could reasonably have samples of between 50% and 100% of the tables.

Ordinary Certificate, Paper I, 2010. Question 3

A suggested covering letter, to be on the headed paper of the organisation conducting the survey and signed by the chief researcher, is shown below. If the names of all the teachers are available, they could be inserted in the salutation. If the survey has been commissioned by a well-known organisation, the letter could start instead with "We have been commissioned by ..... to undertake ...".

Dear teacher,

We are undertaking a survey to investigate whether teachers in colleges for students aged 16–19 feel stressed by their work and to investigate factors that might affect stress levels. You have been selected by a random process to take part in this survey and we hope that you will agree to do so. Your responses will be strictly confidential to our organisation.

The survey consists of a questionnaire which is enclosed with this letter. Please answer all the questions in the spaces provided. Many can be answered by ticking boxes. Please return the completed questionnaire to me at the address shown in the letter-heading. A reply-paid envelope is enclosed. Alternatively, if you prefer to answer the questionnaire electronically, please email me at [insert email address] and I will send it to you as a Word attachment which can be returned by email.

With our thanks in advance,

Yours sincerely,

[insert name]
Chief researcher
Q5. Do you feel that the hours you work are excessively long?
Yes    No    Am not sure

Q6. My work makes a valuable contribution to society.
Strongly agree    Agree    Neither agree nor disagree    Disagree    Strongly disagree

Q7. I feel valued at work.
Strongly agree    Agree    Neither agree nor disagree    Disagree    Strongly disagree

Q8. I feel stressed at work.
Strongly agree    Agree    Neither agree nor disagree    Disagree    Strongly disagree

Q9. The pay is adequate.
Strongly agree    Agree    Neither agree nor disagree    Disagree    Strongly disagree

Q10. Are you likely to leave the sector during the next year?
Yes    No    Do not know

Taking 75 + 49 + 25 gives a total sample of 149 and a total cost (in £) of (75 × 5) + (49 × 10) + (25 × 7) = 1040.

[Note. In the examination, either answer was acceptable; candidates were not expected to give both.]

(ii) Let n be the total sample size and n1, n2, n3 the sample sizes for the three colleges. The required calculation is set out in the table below; the first four columns repeat the information given in the table in the question.

College   Ni    ci   si    Ni si/√ci   (Ni si/√ci) / Σ(Ni si/√ci)   Cost
A         307   5    7.5   1029.71     0.729                        0.729n × 5 = 3.645n
B         200   10   2.8   177.09      0.125                        0.125n × 10 = 1.250n
C         103   7    5.3   206.33      0.146                        0.146n × 7 = 1.022n
Total                      1413.13                                  5.917n

The calculation shows that n1 = 0.729n, n2 = 0.125n, n3 = 0.146n and the total cost is 5.917n.

So we require 5.917n ≤ 1050, which gives n ≤ 177.45.

Using 177.45 as the value of n, we get n1 = 129.36, n2 = 22.18, n3 = 25.91.

Taking 129, 22 and 26 respectively (with which n = 177) gives a total cost of 1047.
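Both allocations for the college survey, and the variance comparison discussed in (iii), can be reproduced with the sketch below. The data are those in the question; the variance formula is the standard one for a stratified mean with finite population correction, Σ Wi² si²/ni × (1 − ni/Ni) with Wi = Ni/N, which is an addition here rather than part of the original solution.

```python
from math import floor, sqrt

# Data from the question: N_i teachers, cost c_i (£) per interview, s_i sd of years.
colleges = {"A": (307, 5, 7.5), "B": (200, 10, 2.8), "C": (103, 7, 5.3)}
budget = 1050


def cost(alloc):
    """Total interviewing cost (£) of an allocation {college: n_i}."""
    return sum(n * colleges[k][1] for k, n in alloc.items())


def var_mean(alloc):
    """Variance of the stratified mean: sum of W_i^2 s_i^2 / n_i * (1 - n_i / N_i)."""
    N_total = sum(N for N, _, _ in colleges.values())
    total = 0.0
    for k, n in alloc.items():
        N, _, s = colleges[k]
        W = N / N_total
        total += W**2 * s**2 / n * (1 - n / N)
    return total


# (i) Uniform sampling fraction f, chosen so that f * sum(N_i c_i) <= budget.
f = budget / sum(N * c for N, c, _ in colleges.values())
uniform = {k: floor(f * N) for k, (N, _, _) in colleges.items()}

# (ii) Optimum allocation for a fixed budget: n_i proportional to N_i s_i / sqrt(c_i).
w = {k: N * s / sqrt(c) for k, (N, c, s) in colleges.items()}
share = {k: w[k] / sum(w.values()) for k in w}
n_max = budget / sum(share[k] * colleges[k][1] for k in share)
optimum = {k: round(share[k] * n_max) for k in share}

for label, alloc in (("uniform", uniform), ("optimum", optimum)):
    print(f"{label:8s} {alloc}  n = {sum(alloc.values())}  cost = £{cost(alloc)}"
          f"  var(mean) = {var_mean(alloc):.4f}")
```

Carrying the proportions unrounded gives n of about 177.41 rather than 177.45, but the same integer allocation of 129, 22 and 26. Rounding to the nearest integer could in principle push the cost over budget, so the cost of the rounded allocation should always be checked, as is done here.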
(iii) There are advantages and disadvantages of both methods. The optimum allocation method minimises, for the given budget, the variance of the estimate of the mean number of years teachers have been at the colleges. However, we are told that the survey has several objectives, and there is no guarantee that this optimum allocation for the mean number of years will also be optimal in respect of any other objectives. Indeed, it almost certainly will not be, because the optimum depends on the standard deviations, and these will differ from one objective to another. Using a uniform sampling fraction is safe in that it achieves representativeness across all variables that might need to be measured. Further, it does not rely on the standard deviations, which are only estimates. Compared with the optimum allocation, it uses a noticeably larger sample from college B (49 rather than 22) and so will pick up more of the variation between teachers at B. Conversely, it has a smaller sample at A (75 rather than 129), but this remains quite a large sample, so A should be well covered. Its overall sample size is considerably smaller (149 rather than 177), which may have consequences for overall accuracy, but despite this it costs almost as much as the sample found by optimum allocation (£1040 compared with £1047).

The overall decision is not clear-cut but, particularly as there are several objectives, perhaps on the whole the uniform sampling fraction method is to be preferred here.

Ordinary Certificate, Paper I, 2010. Question 5

(i) Advantages of this longitudinal study include the following.
- Recent recruits are likely to be fairly interested in responding (initially at least).
- As the same group is followed, any changes can be related directly to the teachers.
- As the sampling frame is recent, it is likely to be fairly accurate.
- It is only necessary to look at one year's list of teachers to select the sample.

Disadvantages include the following.
- The results relate to one particular group only (things might be different for those joining in other years).
- Sample members might become conditioned to responding and change some of their views because of an impression they want to create.
- It is necessary to wait five or more years to get results for those who have been at the college for five years.
- Members of the sample might leave the college or, even if they stay, might get tired of responding and drop out of the study (leading to reduced sample size and/or likely bias due to non-response).
A pilot survey is a small-scale initial survey carried out using procedures similar to those of a proposed survey.

It is done to test various aspects of the proposed survey, to help in the design of the survey and to train personnel. In particular:
- Variability and costs can be estimated to help in the determination of sample size.
- Decisions on the sampling units can be made.
- Sampling frames can be drawn up and/or tested for accuracy, completeness, etc.
- Questionnaires can be tested and improved, including the introduction made to potential respondents.
- Interviewers, if used, can be field-tested and given further training if necessary.
- Office procedures can be developed and staff can be trained.
- Coding and analysis procedures can be pre-tested.
- Background information useful to the full-scale survey can be obtained.
- The times needed for the different stages can be assessed.

(i) Several potential problems are listed here with suggestions as to how they might be overcome. [In the examination, candidates were only asked to discuss three problems. Other reasonable suggestions were of course accepted.]
- Some addresses might be missing from the list. Could travel round the area and add these to the list before taking a sample, or could take an additional sample from the extra addresses found while doing the survey.
- Some addresses might no longer exist. Could take a further sample to compensate for this loss of sample members.
- Some addresses might be listed more than once. Delete duplicates from the list before taking the sample if they are spotted. Do not include the same address more than once in the sample.
- Some addresses will be non-residential. This might be overcome by dropping these from the sample, but it would be necessary to choose more addresses than the required number of households to avoid too big a reduction in planned sample size.
- Some addresses with more than one household living at them could be under-represented (for example if the address does not identify individual households living at it). Could always include all households living at any selected address.
- Some households might have more than one address. Do not include them more than once in the sample (but it might be difficult to identify duplicates of this type).
- Some households might not have an address. Perhaps supplement the sample by using other lists.
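Two of the remedies listed above, deleting duplicate addresses before sampling and selecting extra addresses to allow for non-residential or vanished ones, can be sketched as follows. The address-matching rule, the example addresses and the eligibility rate are hypothetical; in practice the eligibility rate might be estimated from a pilot survey.

```python
import math
import random


def normalise(address):
    """Hypothetical matching rule: lower-case, treat commas and full stops as
    spaces, and collapse repeated spaces so trivially different copies match."""
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def deduplicate(addresses):
    """Keep the first copy of each address so that none can be sampled twice."""
    seen, unique = set(), []
    for a in addresses:
        key = normalise(a)
        if key not in seen:
            seen.add(key)
            unique.append(a)
    return unique


def inflated_sample_size(households_needed, eligible_rate):
    """Addresses to select so that, after dropping ineligible ones, roughly
    households_needed usable addresses remain (eligible_rate is an assumed estimate)."""
    return math.ceil(households_needed / eligible_rate)


frame = ["12 High St", "12 High St.", "3 Mill Lane", "Unit 4, Mill Lane", "7 Church Rd"]
unique = deduplicate(frame)                                        # drops "12 High St."
k = inflated_sample_size(households_needed=2, eligible_rate=0.8)   # ceil(2.5) = 3
selected = random.Random(2010).sample(unique, k)  # simple random sample without repeats
print(unique, k, selected)
```

A simple normalisation like this only catches near-identical entries; the harder duplicates mentioned above (one household with several addresses) cannot be detected from the list alone.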
(ii) The interviewers should be given quotas to tell them the numbers of
households of each size and type that should be interviewed, perhaps also
including quotas for the ages and sex of people interviewed. An alternative
might be to instruct the interviewers how to take a systematic sample of
residential dwellings.
Geographical coverage should be ensured, either by making this part of each
quota or by telling interviewers in which area of the community they should
interview.
Interviewers should be instructed to interview in evenings as well as during
the day, and on all days of the week.
Interviewers might perhaps be advised to consider other ways of finding
members of households: for example, as well as knocking on doors, they
could stand in shopping centres.
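The alternative to quotas mentioned in (ii), instructing interviewers to take a systematic sample of residential dwellings, can be sketched as below. The list of dwellings is hypothetical and is assumed to be in the order the interviewer walks past them; the interval is allowed to be fractional so that every dwelling has the same chance of selection.

```python
import random


def systematic_sample(units, n, rng=None):
    """Systematic sample of n units from an ordered list: interval k = N / n
    (possibly fractional), random start in [0, k), then every k-th unit."""
    rng = rng or random.Random()
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= number of units")
    k = N / n
    start = rng.random() * k
    # min() guards against floating-point rounding pushing the last index to N.
    return [units[min(int(start + i * k), N - 1)] for i in range(n)]


dwellings = [f"dwelling {i}" for i in range(1, 104)]  # 103 hypothetical dwellings
print(systematic_sample(dwellings, 10, random.Random(5)))
```

With 103 dwellings and a sample of 10 the interval is 10.3, so consecutive selected dwellings are 10 or 11 apart along the route.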
The researcher should observe customer characteristics such as sex, broad age group,
ethnicity and whether the customer is with others such as children or other adults.
The researcher should observe what goods the customers look at and the approximate
time for which they do so, and what they put in their basket or trolley.
A variety of shoppers should be selected for observation, in all the weeks of the
survey and at different times of day. This might be done continuously, selecting a
further shopper as soon as observation of one has been completed.
The information would be best recorded on a check form so that as far as possible the
researcher just has to tick boxes.
One difficulty is that the researcher needs to be unobtrusive, and must avoid being
mistaken for a member of the supermarket staff. It is difficult to hide a clip-board. It
might be possible to stand at the end of an aisle.
Another difficulty is that, if there are a lot of people around the shelves, it will be hard
to tell who is doing what.
Selecting a variety of shoppers might also be difficult as time goes on. To begin with
most shoppers will be suitable, but later it might be difficult to fill some quotas,
especially in the rarer groups.
It will be difficult to time accurately how long people look at goods; indeed, it may
be difficult to tell whether they have looked at them at all.
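The difficulty of filling quotas in the rarer groups can be monitored with a simple tally kept alongside the check form. This sketch assumes hypothetical quota cells (sex by broad age group) and targets; the original solution does not specify any.

```python
from collections import Counter

# Hypothetical quota targets for each (sex, broad age group) cell.
QUOTAS = {("F", "under 35"): 8, ("F", "35 and over"): 12,
          ("M", "under 35"): 8, ("M", "35 and over"): 12}


def record(tally, sex, age_group):
    """Tick one observed shopper into its cell on the check form.
    Returns False (and records nothing) if that cell's quota is already full."""
    cell = (sex, age_group)
    if tally[cell] >= QUOTAS.get(cell, 0):
        return False
    tally[cell] += 1
    return True


def shortfall(tally):
    """Cells still short of their quota, with the number still needed."""
    return {cell: target - tally[cell]
            for cell, target in QUOTAS.items() if tally[cell] < target}


tally = Counter()
for sex, age in [("F", "35 and over")] * 14 + [("M", "under 35")] * 3:
    record(tally, sex, age)
print(shortfall(tally))
```

Checking the shortfall periodically during the survey shows early on which groups are proving hard to find, so that observation times can be adjusted before the end of the survey period.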