
Office for National Statistics – Government Buildings – Newport – NP10 8XG Fax : 01633 652727 – Email: info@ons.

RSS Ordinary Certificate in Statistics
Module 1: Research Methods, Data Collection Methods and Questionnaire Design

Statistical Training Unit
GTN 72 – 5523 / 01633 455523

Objectives
Syllabus
References
Introduction
Section A: Study Design
  1. Experimental Studies
     Confounding Factors
     Controlling for confounding factors
       (i) Block randomisation
       (ii) Replication
       (iii) Blind Experimentation
  2. Observational Studies
  Exercise 1
Section B: Survey Design
  Censuses, Samples and Administrative Data
  1. Administrative data
  2. Census Surveys
     The UK Decennial Population Census
     Pause and think
  3. Sample Surveys
  How often do we need to collect data?
    (i) One-off sample surveys
    (ii) Serial sample surveys
       Repeated cross-sectional sample surveys
       Longitudinal sample surveys
         (1) Panel Surveys
         (2) Cohort Studies
       Combination of repeated cross-sectional and longitudinal sample surveys
  Exercise 2
  Examples of serial surveys
    Expenditure and Food survey
    General Household Survey
    National Travel Survey
Section C: Data Collection
  1. Self-completion Questionnaires
  2. Telephone Interviews
  3. Personal (face to face) interviews
  4. Diaries
  Exercise 3
Section D: Questionnaire Design
  Principles of Questionnaire Design
  Establishing the objectives
  Overall layout
  Question wording
  Layout
  Question Numbers
  Question Order
  Routing or Filtering
  Front Page
  Instructions
  Question Styles
    Open Questions
    Closed Questions
  Exercise 4
Section E: Data Processing and Analysis
  Coding Structures
  Missing Value Codes
  Coding and Data Analysis
  Data Capture
  Data Analysis
  Outliers
    Detecting outliers
    Dealing with Outliers
Section F: Pilot Surveys
  Exercise 5
Assignment
Glossary
Appendix
Answers


Module 1: Research Methods, Data Collection Methods and Questionnaire Design
Objectives
By the end of this module you will be able to:
• distinguish between experimental and observational studies and define their advantages and disadvantages;
• define, explain and give examples of administrative, census and sample surveys;
• discuss and compare the advantages and disadvantages of census and sample surveys;
• define, explain and discuss the advantages and disadvantages of personal interviewing, telephone interviewing, self-completion questionnaires and diaries as methods of data collection;
• explain the uses of statistical computing;
• explain the problems that can occur in data collection;
• design and assess questionnaires based on best practice principles and a format that will aid data collection processes;
• classify and code questions to facilitate data processing and analysis; and
• explain the need for and uses of pilot surveys.

Syllabus
This module covers the following parts of the syllabus:
• Distinction between observational studies and experimental studies.
• The origin, use and interpretation of published or administrative data.
• Overview of official statistics in a country of the candidate’s choice: what statistical series are produced, how the data are collected, and to what uses the data are put.
• Pilot surveys, censuses, sample surveys, personal interviews, self-completion questionnaires, postal and telephone enquiries.
• Serial surveys – longitudinal or cross-sectional.
• Problems arising in the collection of data, late returns, freak values and their treatment.
• Use of computers for data storage and retrieval (access to a computer when preparing for the examinations is not necessary).
• Design of simple questionnaires and forms for collection of data.
• Formulation, classification and coding of questions including verification.
• Making questionnaires suitable for data processing and analysis; use of missing value codes.
• Candidates should be able to produce their own simple questionnaire and data form.

References
Upton, G. & Cook, I. (2001) Introducing Statistics (2nd ed), chapter 3
Sudman, S. (1982) Asking Questions: A practical guide to questionnaire design
Fowler, F. J. (1993) Survey Research Methods
Dillman, D. (2001) Mail and Internet Surveys, chapter 3
Corti, L. (1993) Using Diaries in Social Research (


Introduction
You hear claims involving statistics all the time; newspapers, magazines and news programmes use them frequently. But where do these statistics come from? Most will come from a piece of research into a particular subject. The research could have been conducted through a study or a survey and it is therefore important that we can distinguish between the two.

Studies involve the investigation and testing of a hypothesis. They can be divided into two different types, experimental or observational. Section A will examine both, how they differ and suggest the advantages and disadvantages of each.

On the other hand, some of the statistics we see will claim, for example, that 49% of the UK voting population don’t vote. These statistics have come from polls or surveys. Surveys are used to collect data through questions and answers and are used to gather information about the opinions, demographics, employment history and other reportable characteristics of the population of interest. Section B will cover sample surveys as well as censuses, which are a special type of survey. It will also investigate the use of administrative data and describe some of the better-known UK official statistics.

When we decide upon the type of research that we are going to conduct it is necessary to think about the type of data that we will need and how we can collect this. The more common data collection methods of personal interviews, telephone interviews and self-completion questionnaires are discussed and reviewed in Section C, along with the diary method of data collection. Each method relies on questionnaires to ask and record responses to questions. Therefore Section D will address questionnaire design.

When designing surveys and questionnaires, consideration needs to be given to data processing and analysis, for example the use of coding structures and the format of responses. This will be considered in Section E.

We will close the booklet with Section F, which will discuss the need to test out all aspects of our studies or surveys before we make them ‘live’. The section will focus on pilot surveys.


Section A: Study Design
As discussed in the introduction, most statistics that you read about will have come from a piece of research into a particular subject of interest. The research could have been conducted through a study or a survey and it is therefore important that we can distinguish between the two. Studies often involve the investigation and testing of a hypothesis, for example:

Hypothesis: Taking a vitamin C supplement on a daily basis will reduce the chance of catching a cold.

So we need to design a statistical procedure that either accepts or rejects this hypothesis. This idea is a fundamental way of thinking in Western science and it has led to many breakthroughs in understanding the world around us. For example, we now understand what causes certain illnesses and we have proven ways of curing them. The idea of collecting evidence to accept or reject a hypothesis might seem obvious, but that is because it is part of our culture and we have grown up with it. History shows that evidence-based reasoning is a relatively new idea.

This approach is seen in many different aspects of our decision making. For example, in a law court, a suspect is declared guilty if we have enough evidence to reject the hypothesis that the person is innocent:

Hypothesis: Suspect is innocent

So, if the evidence against the suspect is sufficiently strong, then we reject the hypothesis and convict the suspect. If the evidence is not strong, then we accept the hypothesis of innocence and release the suspect without a criminal record. This does not prove innocence; it only says that we cannot reject the hypothesis of innocence.

Overall, we only conclude something if there is strong evidence for it. Without evidence, it is simply an opinion that has no weight behind it. Clearly, a GP should only prescribe a drug that has evidence to suggest that it works. If there is insufficient evidence that a particular drug works, then it should not be licensed and dispensed.
But how do we go about testing the effectiveness of a drug? This section will focus on study design, in particular ‘experimental’ and ‘observational’ studies. It will look at how they differ and suggest the main advantages and disadvantages of each. Before we discuss these studies, it is best to think about what we mean by research.


What is research? Use this space to write down your thoughts.


What is Research?
Research can be defined in the following way:

A systematic investigation and study in order to establish facts and conclusions

As statisticians, we often want to know the answer to certain questions. For example:
• What is the percentage of the adult population who have cash savings above £10,000?
• How many people in a certain area speak Welsh?
• How many people in the UK are self-employed?
• How many under 3s live in a certain city?
• What is the average amount of time required to see a specialist at a local hospital?

Sometimes, we want to test a hypothesis (i.e., can we accept or reject the hypothesis). For example:
• The number of unemployed people who are available and actively seeking work in a certain area has gone down over a ten year period
• A new drug is more effective than an old drug in curing an illness
• A new teaching method is more effective than the old method in preparing students for an examination
• A CCTV camera reduces reported crime
• Counselling is more effective than drugs in treating mild depression

We may think that we already know the answer, but until we subject our problem to testing and analysis then what we think is purely an opinion. But how would we test and analyse it? These sorts of questions can be answered with the use of ‘Experimental’ or ‘Observational’ studies. Let us start with a particular problem:

Hypothesis: Taking a vitamin C supplement on a daily basis will reduce the chance of catching a cold.

We can accept or reject this hypothesis by performing an experiment, or we can observe past events.


It would be unacceptable to say that I take vitamin C and I never catch a cold, so the vitamin C tablet must be the cause and everyone should take it. Obviously, many other things could have prevented me from catching a cold, such as:
• My general health
• Not being exposed to the virus
• My lifestyle
• My immune system

Similar thinking happened in the past. For example, some smokers believed that smoking was not affecting their health and shortening their lives, because a family member who smoked lived until they were 100. However, a single observation like this does not show that smoking is harmless. In some situations, it is possible to perform an experiment to test a hypothesis, but in other cases this is not possible. Let us discuss experimental studies in more detail.

Experimental Studies
An experimental study deliberately imposes a treatment on a group of units of the population. These units are referred to as experimental units and as a group are called the experimental or treatment group. A comparison is made between this group and a control group, a group of units which have no treatment imposed. It is important that both groups are sufficiently large and representative, but this will be discussed in more detail in Module 2.

Treatment Group
The treatment group are asked to take vitamin C on a daily basis over a year. They are asked to record the number of colds they catch during this period.

Control Group
The control group are asked to take a placebo drug on a daily basis. A placebo is a drug that has no known nutritional or medical properties. They are asked to record the number of colds they catch during this period.

Comparison
Overall, within a comparison we want to find out:
1. if the two groups are different;
2. if they are, how and why they differ;
3. how we can measure this difference.


If vitamin C is effective in reducing the chance of catching a cold, then we would expect the percentage of people catching a cold in the treatment group to be less than the control group. Suppose that we conducted this experiment and at the end of the year we noted the following:

Group        % who caught a cold
Treatment    45
Control      60
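The difference between the two groups in the table can be measured in more than one way. As a minimal sketch (the function name compare_groups is ours and purely illustrative), the Python below works out the absolute difference in percentage points and the reduction relative to the control group. It measures the size of the difference only and says nothing about why the groups differ.

```python
def compare_groups(treatment_pct, control_pct):
    """Return (absolute, relative) difference between two percentages.

    absolute: control minus treatment, in percentage points
    relative: that difference as a percentage of the control figure
    """
    absolute = control_pct - treatment_pct
    relative = absolute / control_pct * 100
    return absolute, relative


# Figures taken from the table above
absolute, relative = compare_groups(treatment_pct=45, control_pct=60)
print(f"Absolute difference: {absolute} percentage points")
print(f"Reduction relative to the control group: {relative:.0f}%")
```

Reporting both figures avoids ambiguity: "15 percentage points fewer" and "25% fewer" describe the same result in different ways.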

We can see that the two groups are different, that the treatment group caught fewer colds, and we can measure this difference. However, it is not so easy for us to determine why the groups are different. It is tempting to say that our experiment shows that taking vitamin C on a regular basis reduces the chance of catching a cold. However, we must be careful in making such statements. Our experiment here has not considered other factors that may have led to these results. These factors are known as confounding factors, which we will discuss now.

Confounding Factors
Confounding factors are factors, other than the treatment itself, that can affect the results of our experiment. In the vitamin C example the number of colds caught could be down to a combination of the following factors:
• the general health of the subjects before beginning the experiment;
• their gender;
• the strength of the supplement they received;
• their diet (it’s worth noting here that some people may have a diet that is already rich in vitamin C);
• their lifestyle; and
• their age, etc.

With so many factors able to influence our results, we can’t be sure that the vitamin C supplement has caused the reduction in the frequency of colds. What we need, therefore, is a way to control for some of these factors.

Controlling or Accounting for Confounding Factors
We can never eliminate all of the confounding factors within an experiment, but we can put in place some measures that will allow us to control or account for some of them. This in itself is an advantage of experimental studies. There are several ways in which we can do this and we will consider three of them.


(i) Block Randomisation
After a sample is selected for the experiment, the research subjects can be divided into small groups by some factor that we believe could have an effect on our results, for example age. These groups can then be split further into treatment and control groups. This creates an opportunity to analyse the effects of age combined with the actual treatment. Consider again the vitamin C experiment. Suppose we were to amend the experiment so that the effects of age could be taken into account. To control for age we could group people of similar ages into blocks and then randomly assign the subjects in these blocks to either the experimental or the control group:
















Key: T = Treatment Group, C = Control Group

By doing this, we can now compare the effects of the vitamin C supplement across the various age groups.

(ii) Replication
We can assess the influence of confounding factors by repeating or replicating an experiment. Replication is needed to show that any difference between the control and treatment groups on the first run of the experiment was not due to chance or other one-off factors. The method does not directly control particular confounding factors but is a way of accounting for chance results and determining validity.

(iii) Blind Experimentation
When conducting an experiment using humans, it is often important that the person does not know if they are in the control or the treatment group (although in certain cases this will not be possible). This is important, because knowing which group they are in could change behaviour in either a conscious or unconscious way. For example:



• Students knowing that they are being taught using a new method might change their behaviour in class in an unusual way.
• Patients knowing that they are taking the new wonder drug might improve their mental state and increase their chances of recovery. This is known as the placebo effect.

This type of blind experimentation is called ‘single blind’ experimentation. Where practical, single blind experimentation should always be implemented. Sometimes it may be necessary to conduct a ‘double blind’ experiment. This is where neither the researcher nor the subject knows who belongs to the treatment or control group. Conducting this type of experiment prevents the researcher from biasing the results, by reporting what he/she thinks should be the observed result. Where a double blind experiment is conducted an external organisation or individual will be charged with allocating subjects to either the treatment or control group.
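To make the block randomisation step described earlier in this section more concrete, here is a minimal Python sketch. The age bands, the subject ages and the function names age_band and block_randomise are all invented for illustration; the booklet does not prescribe them. Subjects are first grouped into age blocks and then, within each block, shuffled and split evenly between the treatment group (T) and the control group (C).

```python
import random
from collections import defaultdict


def age_band(age):
    """Illustrative age bands (an assumption, not taken from the booklet)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50 and over"


def block_randomise(ages, seed=None):
    """ages maps subject id -> age. Returns a dict of subject id -> 'T' or 'C'."""
    rng = random.Random(seed)

    # Step 1: form the blocks by grouping subjects of similar age.
    blocks = defaultdict(list)
    for subject_id, age in ages.items():
        blocks[age_band(age)].append(subject_id)

    # Step 2: within each block, shuffle and split in half at random.
    allocation = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for subject_id in members[:half]:
            allocation[subject_id] = "T"
        for subject_id in members[half:]:
            allocation[subject_id] = "C"
    return allocation


# Twelve hypothetical subjects, four in each age band
ages = {1: 22, 2: 27, 3: 25, 4: 29, 5: 34, 6: 41,
        7: 38, 8: 45, 9: 52, 10: 67, 11: 58, 12: 71}
allocation = block_randomise(ages, seed=1)
for subject_id in sorted(allocation):
    print(subject_id, age_band(ages[subject_id]), allocation[subject_id])
```

Because the split is made inside each block, every age band contributes equally to both groups, which is what allows the effect of the supplement to be compared across age groups. In a double blind experiment the table this produces would be held by the external organisation or individual, not by the researcher.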


Observational Studies
There are many reasons why we cannot conduct an experimental study. For example, it may be unethical to impose the treatment that we want to investigate, or we may be in a position where we can only observe behaviours. In cases like these we use observational studies as an alternative.

Observational studies observe the natural characteristics of a group of units in their natural environment. As such, an observational study imposes no treatment on the group being observed and confounding factors are not controlled for. If they were, then it would be similar to imposing a treatment and as a result the study would no longer be observational, it would be experimental.

Because observational studies simply observe something as it is found, with no treatment imposed and no artificial manipulation to control for confounding factors, their results can reflect real-world behaviour more closely than those of an experimental study. However, in an observational study it is more difficult to determine ‘cause and effect’, as nothing is controlled for and there could be many factors at play that will determine results.

Observational studies are often found in the medical or psychological fields where, for ethical and practical reasons, there are strict rules regarding what can and cannot be ‘done’ to subjects. For example, we couldn’t impose smoking on a group of subjects to determine its effects on health, but it is possible to observe a group of subjects who already smoke and record the effects that this has on health.


Observation here does not mean sight alone: an observational study does not need to watch the research situation physically in order to investigate cause and effect. Consider the smoking example again: it would be difficult to observe a sample of smokers over a set period of time going about their daily routine and to record what we observe. How would we ascertain what effect smoking had on their health? What we could do instead is to define a population of interest. From this population we could, by using medical records or otherwise, ascertain who are smokers and who aren’t, and assign them to groups accordingly. When we have formed two groups we could use the medical records again to observe any differences between the health of smokers and non-smokers.

When reporting our observations it is essential that we state what other factors could influence our results. With the smoking example, factors such as age, diet, lifestyle, medical history etc. could all affect a person’s health. But we should always remember that we cannot control for any of these effects.
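The grouping step described above can be sketched in a few lines of Python. The records below are entirely made up for illustration (they are not real medical data), and the field names and the function rate_by_group are our own assumptions. The sketch splits existing records into smokers and non-smokers and compares how often a condition is recorded in each group.

```python
# Hypothetical records, invented purely to illustrate the method
records = [
    {"id": 1, "smoker": True,  "age": 54, "chronic_cough": True},
    {"id": 2, "smoker": True,  "age": 61, "chronic_cough": True},
    {"id": 3, "smoker": True,  "age": 47, "chronic_cough": False},
    {"id": 4, "smoker": True,  "age": 58, "chronic_cough": True},
    {"id": 5, "smoker": False, "age": 33, "chronic_cough": False},
    {"id": 6, "smoker": False, "age": 29, "chronic_cough": False},
    {"id": 7, "smoker": False, "age": 45, "chronic_cough": True},
    {"id": 8, "smoker": False, "age": 38, "chronic_cough": False},
]


def rate_by_group(records, group_key, outcome_key):
    """Proportion of records with the outcome, for each value of group_key."""
    totals = {}
    for record in records:
        count, hits = totals.get(record[group_key], (0, 0))
        totals[record[group_key]] = (count + 1, hits + int(record[outcome_key]))
    return {group: hits / count for group, (count, hits) in totals.items()}


rates = rate_by_group(records, "smoker", "chronic_cough")
print("Smokers:    ", rates[True])
print("Non-smokers:", rates[False])
```

Nothing here is imposed on the subjects, so the study remains observational. Note that in these invented records the smokers also happen to be older, which is exactly the kind of factor that should be reported alongside the comparison because it cannot be controlled for.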


Exercise 1
Taken from 1996, Paper 1, Question 3
Describe the difference between observational and experimental studies. Give one example of each type of study. Discuss one advantage and one disadvantage of experimental studies over observational studies. (10)

Hint: Try to think of different examples to those given in this booklet for your answer.

Note: When asked to discuss advantages and disadvantages of one method over another, you do not need to give advantages and disadvantages of both methods; rather, you discuss what the one method can do that the other cannot.

Difference between methods

Example of experimental study

Example of observational study

Advantage of experimental study over observational study

Disadvantage of experimental study over observational study


Section B: Survey Design
When we design a survey there are two things that we need to consider:
1. Who will we obtain data from?
2. How often do we need to collect it?
For the first part of this section we consider who we will collect data from.

Censuses, Samples and Administrative Data
Who we collect data from is directly related to the type of data that we wish to collect. For example, we may want to get a full picture of a population of interest, so we conduct a census. However, there may be cases where cost and time constraints mean that data are collected from only a proportion of the population of interest, i.e. a sample. There are also times when we do not need to conduct a survey at all, as we can make use of data that have been collected by another organisation. That is, we make use of administrative data.

1. Administrative data
Administrative or ‘admin’ data are data which have been collected in order to carry out an administrative process rather than for the sole purpose of producing a statistical output.

Example 1
The main data sources used in compiling the Inter-Departmental Business Register (the list of businesses in the UK) are administrative:
• VAT registration data from Customs and Excise
• PAYE data from Inland Revenue
• Company details from Companies House

These data sources are routinely collected and stored by other organisations. Example 2 Administrative data is becoming more popular within Official Statistics. The Neighbourhood Statistics programme for example relies heavily on Administrative data from Local Authorities, the Home Office, Department for Work and Pensions and the Department for Transport to name but a few. Data taken from these organisations is passed through rigorous quality checks to ensure that it is fit for purpose. For more information on Neighbourhood Statistics see


Advantages and Disadvantages of Using Administrative Data

Advantages
• Costs are reduced as the data has already been collected.
• The data is already available.
• There is a reduction in respondent burden as a new survey does not need to be conducted.

Disadvantages
• The data have been collected for a purpose other than the one you want them for, and may therefore have important data missing.
• The data may be restricted due to legal and policy reasons. Examples of restrictions include the Data Protection Act, Human Rights, Administrative Law, etc.
• The data will not have been produced for statistical purposes, and therefore often fail basic statistical criteria (e.g. definitions may not be consistent over time).
• The required data may have to be collected from different sources of admin data, hence it may be difficult to merge all of the data together for the same unit.

2. Census Surveys
A census is a special type of survey where data is collected from all the units in the population of interest. Usually when we think of a census we think of the UK decennial population census. However, it is important to note that a census is not limited to this one case; rather, it is any survey where data is collected from all the units within the population of interest. For example, if we were interested in the views of school children in the United Kingdom, and we surveyed all school children, we would be conducting a census.

The term census comes from the Latin word censere, meaning ‘to assess, or tax’. One of the earliest recorded censuses took place in Rome and dates back to around 500 B.C. Government officials, called censors, made a register of people and their property. One purpose was to identify persons for military service. The other was to place a value on property so that taxes could be collected. The first decennial population census in the UK was conducted in 1801, and in its modern form in 1841.


The UK Decennial Population Census
The following information has been taken from:

What is the census?
Since 1801, every 10 years the nation has set aside one day for the Census – a count of all people and households. It is the most complete source of information about the population that we have. The latest Census was held on Sunday 29 April 2001.

Every effort is made to include everyone, and that is why the Census is so important. It is the only survey which provides a detailed picture of the entire population, and is unique because it covers everyone at the same time and asks the same core questions everywhere. This makes it easy to compare different parts of the country.

The information the Census provides allows central and local Government, health authorities and many other organisations to target their resources more effectively and to plan housing, education, health and transport services for years to come.

In England and Wales, the Census is planned and carried out by the Office for National Statistics. Elsewhere in the UK, responsibility lies with the General Register Office for Scotland and the Northern Ireland Statistics and Research Agency.

Why have a census?
We all use public services at various times – including schools, health services, roads and libraries. These services need to be planned, and in such a way that they keep pace with fast-changing patterns of modern life. We need accurate information on the numbers of people, where they live and what their needs are.

Every ten years the Census provides a benchmark. Uniquely, it gives us a complete picture of the nation. It counts the numbers of people living in each city, town and country area. It tells us about each area and its population, including the balance of young and old, what jobs people do, and the type of housing they live in.

Because the same questions are asked and the information is recorded in the same way throughout the UK, the Census allows us to compare different groups of people across the nation. The Census costs some £255 million for the UK as a whole, but the information it provides enables billions of pounds of taxpayers’ money to be targeted where it is needed most.


How is the data collected?
In 2001, a Census form was delivered to every household, establishment, or to people living anywhere else, by a field force set up throughout the country. The forms were designed for self-completion by form fillers and to provide information that is related to Census day – 29 April 2001. Most forms were then posted back to temporary local offices and the remainder collected by the field force.

The form for a household in England asked questions which collected information on household accommodation, relationship, demographic characteristics (e.g. sex, age, marital status), migration, cultural characteristics, health and provision of care, qualifications, employment, workplace and journey to work. In Wales there was an additional question on the Welsh language. The forms for people who lived communally, such as in homes for the elderly, collected information on each person.

What is the data used for?
The Census gives us invaluable facts about:

Population
An accurate count of the population in each local area helps Government to calculate the size of grants it allocates each local authority and health authority. In turn, these authorities use Census information when planning services within their areas.

Health
Data on the age and socio-economic make up of the population, and more specifically on general health, long-term illness and carers enables the Government to plan health and social services, and to allocate resources.

Housing
Information on housing and its occupants measures inadequate accommodation and, with information about the way we live as households, indicates the need for new housing.

Employment
The Census shows how many people work in different occupations and industries throughout the country, helping government and businesses to plan jobs and training policies and to make informed investment decisions.
Transport
Information collected on travel to and from work, and on availability of cars, contributes to the understanding of pressures on transport systems and to the planning of roads and public transport.

Ethnic Group
Data on ethnic groups help to identify the extent and nature of disadvantage in Britain and to measure the success of equal opportunities policies. The information helps central and local government to allocate resources and plan programmes to take account of the needs of minority groups.

2011 Census
The following information has been taken from:

The next full census of England and Wales will take place in 2011 and includes a number of new approaches which will be evaluated during a rehearsal on 11 October 2009 with around 135,000 selected households in three local authority areas. The 2009 Census Rehearsal will test, among other things, the questions we are asking, the accuracy of our address list for posting out questionnaires and the new internet services for getting help and completing questionnaires online.

The 2011 Census Programme
The Office for National Statistics (ONS) designs, manages and runs the census in England and Wales. The General Register Office Scotland (GROS) and the Northern Ireland Statistics and Research Agency (NISRA) are responsible for the census in Scotland and Northern Ireland. All three have agreed to conduct their 2011 censuses on the same day in order to produce consistent and coherent information that covers the whole of the UK.

Early research within the programme confirmed the unique value and continuing need for the sort of information that has traditionally been collected in the census and that a census is the only viable option for collecting such data in 2011. A test of the data collection process took place in 2007 and a rehearsal in three local authority areas in October 2009 will test all the plans, systems and processes for 2011.

ONS is working with local authorities and community groups to make sure the 2011 Census accounts for population diversity: special attention is being paid to hard-to-count, under-represented groups, including:
• disabled and/or very elderly people
• ethnic minority groups
• faith communities
• migrants
• non-English speaking people
• unemployed people
• people on low income
• students and other young adults
• gypsy, traveller and other groups where response has historically been low


Collecting the Information Pre-addressed questionnaires will be posted out to most households using a specially developed enumeration list of addresses. This was trialled in the 2007 Census Test. A variety of approaches will be used to increase coverage of groups of people. These include delivering and collecting forms by hand, and using more field staff in the most difficult areas. Following the success of the 2001 Census Community Liaison programme, the 2011 Census will seek to build relationships with relevant groups and agencies. The programme involves close co-operation with local authorities and regional, community and neighbourhood groups. Householders will have a choice to submit their answers to census questions online or by post. New web services are being created for the online questionnaire and an online help centre providing advice and guidance for completing the questionnaire. An accessibility area on the website will provide video and audio assistance, in English and Welsh, for people who are visually impaired or deaf. A multi-lingual telephone helpline will also be available.

Further information on the 2011 census can be found via the link above.


Pause and think.
Use the space below to write down what you think are the advantages and disadvantages of a census.

Advantages

Disadvantages



The advantages and disadvantages of a census survey are set out below. They mainly relate to population censuses (which tend to be large and ask for detailed information). However, it is important to note that the advantages and disadvantages may change, depending on the census survey you are conducting.

Advantages
• The whole population of interest will be covered and therefore (give or take some non-response) the results of the survey will be representative.
• As the quality of census data tends to be good, the results of major census surveys are used by Governments and Local Authorities to inform policy decisions. They are also used as a benchmark for sample surveys.

Disadvantages
• Compared to a sample survey, the cost of running the survey will be high (particularly if the population size is large) simply because the entire population will be targeted and it may take a long time to obtain responses.
• There is a large collective burden as all units in the population of interest are contacted.
• The analysis stage can be slow and by the time the results are published they may well be out of date.

3. Sample Surveys
A sample survey is defined as the collection of data from a proportion, or sample, of the population of interest. Generally, when we talk about a survey, we are referring to a sample survey. The results from a sample survey are used to make population estimates. Methods of selecting appropriate samples, depending on the aim of the survey, will be discussed in module 2.

As a sample survey only selects a proportion of the population of interest, the advantages and disadvantages of this method are different to those of a census. The table below gives the main advantages and disadvantages of a sample survey.


Advantages and Disadvantages of a Sample Survey
(Again, it is important to note that the advantages and disadvantages may change depending on the sample survey being carried out.)

Advantages
• Only a proportion of the population is selected, therefore costs will be lower than in a census and data collection should be quicker.
• There is a lower overall burden (on the population, not the individual) as only a proportion of the population takes part.
• As sample surveys deal with smaller numbers of respondents than a census, results can be published in a more timely manner.

Disadvantages
• As data will only be obtained from a proportion of the population, the researcher cannot be certain that population estimates will be representative of the entire population. The sampling method used is important here.
• The periodic nature of some sample surveys means that respondents can potentially stay in a sample for a long time. This may increase burden over time compared to running a one-off sample survey or an infrequent census.

We have discussed censuses and sample surveys. Whether we conduct a census or a sample survey depends on the aim of the survey. It would be wrong to say that one is always better than the other, because it is not that black and white. Statisticians face the following main issues when designing a survey procedure:
• Timeliness
• Accuracy
• Cost

All of these issues need to be taken into account when any decision relating to the survey is made.
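The accuracy side of this trade-off can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the population of household spending figures is simulated, the function name `estimate_mean` is hypothetical, and simple random sampling is assumed (sampling methods are covered in module 2). It compares an estimate from a sample of 500 with the figure from a full census of the same simulated population.

```python
import math
import random
import statistics

def estimate_mean(population, n, seed=0):
    """Estimate the population mean from a simple random sample of size n (n >= 2)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)          # sampling without replacement
    mean = statistics.fmean(sample)
    variance = statistics.variance(sample)      # sample variance (n - 1 divisor)
    fpc = 1 - n / len(population)               # finite population correction
    std_error = math.sqrt(fpc * variance / n)
    return mean, std_error

# Simulated (hypothetical) population: weekly spend in pounds for 10,000 households.
rng = random.Random(42)
population = [rng.gauss(500, 120) for _ in range(10_000)]

sample_mean, sample_se = estimate_mean(population, n=500)
census_mean, census_se = estimate_mean(population, n=len(population))

print(f"sample of 500: {sample_mean:.1f} (standard error {sample_se:.1f})")
print(f"census:        {census_mean:.1f} (standard error {census_se:.1f})")
```

The census has no sampling error because every unit is included, whereas the sample estimate carries a standard error that shrinks as the sample size grows. The sketch ignores non-response and the cost and time of collection, which in practice weigh against the census.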

How often do we need to collect data?
Surveys can be one-offs or conducted at regular intervals. The latter are often referred to as serial surveys.

When we conduct a census, how often we can collect our data is limited. We need to allow time for planning, collection, analysis and publication. This is why we only run the population census every ten years. For sample surveys we tend to have much more choice. Market research and opinion polls tend to be one-off surveys. Many ONS surveys are serial sample surveys. We will now consider one-off sample surveys and serial sample surveys in turn.


One-off sample surveys
One-off or cross-sectional sample surveys are used to take a snapshot of a proportion of the population at one point in time. They are commonly used during election campaigns to establish public opinion towards the political parties. Cross-sectional means that these surveys target different members of the population in an effort to make the survey as representative as possible, although how representative it is depends on the sampling method. These surveys are primarily concerned with macro-level processes; that is, the general rather than the specific.

One weakness of cross-sectional sample surveys is their inability to track change over time. We could ask people about events in the past, but memories are fallible and likely to be unreliable. To overcome this problem, we need to collect the same information over an extended period of time – a serial survey design.


Serial sample surveys
There are two types of serial sample surveys:
• Repeated cross-sectional sample surveys; and
• Longitudinal sample surveys.

Repeated cross-sectional sample surveys
These occur where a survey is conducted at regular intervals (for example, every month) and a new sample is taken each time. These surveys help to provide a ‘moving picture’ of the population. Repeated cross-sectional surveys are useful if we simply want to document some change over time, for example, consumption patterns or fluctuations in the numbers of people having their car stolen. They are not useful, however, if we are interested in how certain types of events typically influence and affect individuals over time.

Examples of repeated cross-sectional surveys include the:
• General Household Survey (GHS);
• Expenditure and Food Survey (EFS); and
• Consumer Price Index (CPI)


Advantages and Disadvantages of Repeated Cross-sectional Sample Surveys

Advantages
• The data obtained can be used to estimate the average level of some characteristic over time. For example, the average number of passengers who use the London to Cardiff train between April 2006 and April 2007.
• The data can be used to obtain some measurement of the relationship between two or more variables over time. For example, a train company asks a sample of its passengers on a monthly basis whether they think the train service offers value for money. If the tariff for one particular route was £15, and in May it rose to £20, the company could analyse the results to determine whether the price rise affected passenger views.
• Since the sample changes at every interval, population coverage is good. The end results are then considered highly representative of the population and valid.

Disadvantages
• There are two sources of variation in repeated cross-sectional sample surveys: the time period and the change in respondents. Therefore, even though it is possible to identify relationships between variables or changes over time, these relationships may not be considered to be accurate.
• In-depth information tends not to be asked for in repeated cross-sectional sample surveys, as the focus is more on movements in general trends than on specific individual level results, where more detailed information is required. Note: Most social surveys are very detailed and complex.
• Misrepresentation of the population only becomes an issue where an inappropriate sampling method has been used; therefore, the choice of sampling method requires careful consideration.


Longitudinal Sample Surveys
Longitudinal sample surveys collect data from the same sample of respondents at intervals of time. There are two types of longitudinal sample surveys:
1. panel surveys; and
2. cohort studies.

1. Panel Surveys
Panel surveys recruit a single representative sample, a panel, from the population and collect data from them at regular intervals of time over a long period. Where panel members die or become unobtainable, their children may be added to the panel. It should be noted that panel surveys are not limited to individual subjects as the observational units; sometimes households, businesses and even countries are used.

An example of a panel survey with this type of design is the British Household Panel Survey (BHPS), which recruited a cross-section of the population in the early nineties and has returned to them annually ever since. The BHPS collects information on individual demographics, income and wealth, health and care, and household details such as ownership and size of dwelling. For more information on the BHPS see

2. Cohort Studies
A cohort study is a form of longitudinal study used commonly in medicine and social science. A cohort differs from a panel in that, instead of a representative sample being chosen, a group of units is selected that share a common characteristic or experience within a defined time period. Members of the cohort are tracked over time and information on them is gathered and recorded.

An example of a cohort study is the Child of Our Time study, which has been commissioned and televised by the BBC. The study aims to answer the question “are we born or made?” The study will run for 20 years, working with families whose children were born around the millennium and who represent the widest possible range of genetic, social, geographical and ethnic backgrounds.


Whilst the study runs, a series of experiments will be conducted to build up a coherent and accurate picture of how the genes and environments of our growing children interact whilst they progress into adulthood. For more information see the BBC website: _time/ [Note that this is an example of a cohort study, not a survey, although surveys may have been conducted as part of the study.]

Advantages and disadvantages of longitudinal surveys

Advantages
• As the sample respondents in a longitudinal study remain the same, the variability associated with changing the sample is accounted for. As such there is only one source of variability in longitudinal designs and that is the change in time period.
• Longitudinal studies are often used to determine the relationship between variables and how they interact. This often makes it possible to determine causality as well as before and after effects.
• Field work planning can be easier with longitudinal surveys than repeated cross-sectional surveys as the sample only needs to be chosen once.

Disadvantages
• The cost of longitudinal studies can be high. This is down to several factors. Often the study takes place over a lengthy time period and respondents are interviewed on more than one occasion. Added to this is the cost of researchers and programme managers, which can result in escalating costs.
• As longitudinal studies use the same panel or cohort over a designated time period there is a danger of loss of respondents from the sample. This may be down to mortality or attrition, where respondents become unavailable for a particular reason. Those that remain in the study may not be representative of those that have left.
• Respondents are often asked to submit responses to the same questions over a period of time. This can lead to the respondent becoming familiar with the style of question and the responses expected. This is often referred to as conditioning. There is a danger therefore that as respondents become familiar with the survey they will adjust their answers, leading to biased results.

Combination of repeated cross-sectional and longitudinal surveys
There are benefits to be gained from combining elements of repeated cross-sectional sample surveys with elements of longitudinal surveys. In these instances, some respondents in the sample will remain the same over successive periods to give you your panel. Other respondents will be rotated in and out of the sample each time the survey is run, both to ensure a wider coverage of the population of interest and to ensure that respondents are not over-burdened. This combined design allows researchers to obtain estimates of population parameters at a specific point in time, and increases the reliability of estimates of changes over time.
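The combined design can be sketched in a few lines of Python. This is a simplified illustration, not the rotation scheme of any particular ONS survey: the function name `rotating_samples`, the 50% carry-over and the population of 1,000 numbered units are all assumptions. At each wave, part of the previous sample is retained (the panel element) and the rest is replaced by units drawn from outside the previous sample (the fresh cross-sectional element).

```python
import random

def rotating_samples(population_ids, sample_size, n_waves, keep_fraction=0.5, seed=0):
    """Return one sample per wave; part of each sample is carried over to the next."""
    rng = random.Random(seed)
    n_keep = int(sample_size * keep_fraction)
    current = rng.sample(population_ids, sample_size)
    waves = [current]
    for _ in range(n_waves - 1):
        retained = rng.sample(current, n_keep)                 # panel element
        previous = set(current)
        # Units rotated out are rested: fresh units come from outside the last sample.
        fresh_pool = [u for u in population_ids if u not in previous]
        fresh = rng.sample(fresh_pool, sample_size - n_keep)   # new cross-sectional element
        current = retained + fresh
        waves.append(current)
    return waves

waves = rotating_samples(list(range(1000)), sample_size=100, n_waves=4)
for i in range(1, len(waves)):
    overlap = len(set(waves[i]) & set(waves[i - 1]))
    print(f"wave {i + 1}: {overlap} respondents carried over from wave {i}")
```

Real rotating designs usually keep each unit in the sample for a fixed number of waves rather than choosing the retained units at random, but the principle of a partly overlapping sample is the same.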


Exercise 2
Taken from 1997, Paper I, Question 5
(a) What is the essential difference between longitudinal and cross-sectional surveys? (4)
(b) Discuss two advantages and disadvantages of longitudinal surveys over cross-sectional surveys. (8)



Examples of Serial Surveys

The Living Costs and Food Survey (Previously the Expenditure and Food Survey) – Repeated cross-sectional design
The following information has been taken from:

Why is the survey carried out?
The Living Costs and Food Survey is a continuous survey of household expenditure, food consumption and income. The primary uses are to provide information about spending patterns for the Retail Price Index, and about food consumption and nutrition. The survey also feeds into estimates of consumers’ expenditure in the National Accounts, is used for tax benefit modelling and is an important source of economic and social data for government and other research agencies.

The Living Costs and Food Survey is commissioned by the Office for National Statistics (ONS) and by the Department for Environment, Food and Rural Affairs (DEFRA). The main customers are divisions within ONS, DEFRA and a number of other government departments. Social Survey Division (SSD) within ONS is responsible for the survey design and carries out fieldwork in Great Britain, while the Central Survey Unit of the Northern Ireland Statistics and Research Agency undertakes the fieldwork in Northern Ireland. SSD carries out the coding and editing of all the data, quality control and supply of data to customers. SSD will also report the expenditure and income data, whilst DEFRA will publish the detailed food results and nutrition information.

How is the survey done?
Information for the Living Costs and Food Survey is collected from people living in private households. The survey is made up of:
• a comprehensive household questionnaire which asks about regular household bills and expenditure on major but infrequent purchases;
• an individual questionnaire for each adult (aged 16 or over) which asks detailed questions about their income;
• a diary of all personal expenditure kept by each adult for two weeks, and of home grown and wild food brought into the home;
• a simplified diary kept by children aged 7 to 15 years, also kept for two weeks.

The sample in Great Britain is 12,000 addresses a year, which are selected from the Postcode Address File. Data is collected using Computer Assisted Interviewing and paper diary questionnaires.


General Household Survey (GHS) – Repeated Cross-sectional Survey
This information has been taken from:

Why is the survey carried out?
The General Household Survey (GHS) is a multi-purpose continuous survey carried out by the Social Survey Division of the Office for National Statistics (ONS), which collects information on a range of topics from people living in private households in Great Britain. The survey started in 1971 and has been carried out continuously since, except for breaks in 1997/98 (when the survey was reviewed) and 1999/2000 (when the survey was re-developed).

The main aim of the survey is to collect data on:
• household and family information
• housing tenure and household accommodation
• consumer durables including vehicle ownership
• employment
• education
• health and use of health services
• smoking and drinking
• family information including marriage, cohabitation and fertility
• income
• demographic information about household members including migration.

The information is used by government departments and other organisations for planning, policy and monitoring purposes, and to present a picture of households, families and people in Great Britain. Data from the General Household Survey is widely used in other publications, such as Social Trends and Regional Trends; details of both of these publications can be found on the ONS website.

The GHS has documented the major changes in households, families and people which have occurred over the last 30 years. These include the decline in average household size and the growth in the proportion of the population who live alone, and the increase in the proportion of families headed by a lone parent and in the percentage of people who are cohabiting. It has also recorded changes in housing, such as the growth of home ownership, and the increasing proportion of homes with household facilities and goods such as central heating, washing machines, microwave ovens and home computers. The survey also monitors trends in the prevalence of smoking and drinking.

How is the survey done?
Fieldwork for the GHS is conducted on a financial year basis, with interviewing taking place continuously throughout the year. A sample of approximately 13,000 addresses is selected each year from the Postcode Address File (PAF). All adults aged 16 years and over are interviewed in each responding household. Demographic and health information is also collected about children in the household. Since 1994, all interviews have been conducted using Computer Aided Personal Interviewing (CAPI).


The National Travel Survey (NTS) – Repeated cross-sectional survey
Further information regarding the National Travel Survey can be found at: or s/

What is the NTS?
The National Travel Survey is designed to provide a databank of personal travel information for Great Britain. It has been conducted as a continuous survey since July 1988, following ad hoc surveys since the mid-1960s. The survey is designed to identify long-term trends and is not suitable for monitoring short-term trends.

What data is collected?
NTS respondents keep a travel diary for seven days of their trips within Great Britain. Travel details provided by respondents include trip purpose, method of travel, time of day and trip length. The NTS is the only source of national information on subjects such as cycling and walking, which provide a context for the results of more local studies. The households also provide personal information, such as their age, gender, working status and driving licence holding, and details of the cars available for their use. In order to minimise the burden of completing the diaries, respondents only include walks under a mile on the seventh day, but data on short walks is estimated for the full seven-day period.

What is the data used for?
The NTS is carried out in order to provide a better understanding of the use of transport facilities made by different sectors of the population, and trends in the patterns of demand. Important uses include the forecasting of future traffic levels, monitoring accident rates amongst different types of road user, and informing policy decisions regarding infrastructure and environmental issues relating to pollution caused by travel. The results are used extensively by consultants and academics, and they appear in many Government statistical publications.


Section C: Data Collection Methods
After the study design has been finalised, the issue of how the data will be collected needs to be addressed. Many organisations, including the ONS, collect data through the use of a questionnaire or form. A questionnaire or form is a type of data collection instrument and provides a structure to the data collection process. The person who completes the questionnaire or form is known as the respondent.

There is a subtle difference between a questionnaire and a form and in some cases the two are mixed together. Generally, a questionnaire asks for information using specific questions (e.g. How old are you?), whereas a form provides a basis for recording information next to key words or statements (e.g. age).

In this section we will look at four methods of data collection via questionnaire or form. These are:
1. Self-completion questionnaire;
2. Telephone interview;
3. Personal, or face to face interview; and
4. Diaries.

The first three of these methods all use questionnaires or forms to collect the data. Questionnaires/forms are also often included in diaries, to add structure. We will address each method in turn, outlining the advantages and disadvantages of each.

1. Self-completion Questionnaires
A self-completion questionnaire is a questionnaire that is sent or given to a respondent for completion and return. There are four main types of self-completion questionnaire:

(a) Postal questionnaire
The questionnaire is posted to the respondent, asking him/her to complete it and post it back. The respondent may be encouraged to complete and return the questionnaire by enclosing pre-paid envelopes or offering an incentive.

(b) Drop-off and pick-up questionnaires
The questionnaire is delivered to and collected from the respondent by field staff.

(c) Computer assisted self-interviewing (CASI)
The questionnaire is filled in by the respondent on a PC or laptop in the presence of an interviewer. The interviewer is present only to explain the purpose of the questionnaire and to clarify any questions or concepts that the respondent does not understand.

(d) Electronic self-completion questionnaire
The questionnaire is sent to the respondent in an electronic format, such as floppy disc or via the internet, and the respondent completes the questionnaire electronically.

Advantages and disadvantages of Postal Self-completion Questionnaires

Advantages
• Large numbers of questionnaires can be sent out at relatively low cost. This is because it is cheaper to post a questionnaire to the respondent than to conduct a personal or telephone interview.
• The survey can be widely spread, as the cost of postage remains the same for posting a questionnaire locally or nationally.
• The questionnaires can include visual prompts and products for trial, and can be useful for asking the sorts of questions unlikely to be easily answered on the telephone (e.g. where a lot of quantitative (numeric) detail is required).

Disadvantages
• Response rates are usually extremely low, generally lower than 20%. Response rates depend on factors such as the questionnaire’s length, presentation and subject matter, the incentives offered, ease of completion and respondents’ vested interest in participating.
• The results may be unrepresentative of the whole population, with some groups more motivated to return questionnaires than others.
• The data collection process could take a long time since the questionnaires need to be distributed to respondents, completed and then returned to the survey centre.

What advantages/disadvantages do you think the other 3 methods have?

There are also some specific advantages and disadvantages for web-based questionnaires that you should consider.

Advantages
• Pictures and graphics can be used much more freely.
• Data is captured and can be verified at the same time the respondent inputs their response.
• Can increase respondent’s confidence.

Disadvantages
• There are significant costs involved in development and maintenance. However, as technology improves these costs are falling.
• Technology is improving continuously and quickly, making some questionnaires and the process attached to them out of date.

2. Telephone interviews
Telephone interviews are as they sound. A trained interviewer telephones a respondent and uses an interviewer-led questionnaire to conduct an interview over the phone. Telephone interviews tend to be conducted from a central location (i.e. the survey centre, or the interviewer’s own home).

The telephone interviewer can use a computer to generate the questions and to record and code responses. This is known as Computer Assisted Telephone Interviewing (CATI) and it allows the interviewer to conduct some validation checks at the time of the interview (i.e. checking suspect responses with the respondent). CATI can also be used to schedule telephone calls so that respondents are called at convenient times, and it can also generate random telephone numbers for random digit dialling.

Advantages and Disadvantages of Telephone Interviewing

Advantages
• Data collection from telephone interviews can be relatively quick, especially in comparison to self-completion surveys, as a response can be obtained immediately.
• In comparison to personal interviews, telephone interviews are less expensive because interviewer travel costs do not feature.

Disadvantages
• Any survey which targets those in the very low socio-economic groups will suffer from the fact that telephone ownership is lower within these groups. This results in a less representative sample.
• The number of people having mobiles only, and no land line, is increasing. As there is currently no reliable database containing mobile numbers, this leads to them being excluded from the survey.
• Telephone interviews should be kept relatively short; a common rule of thumb is 20 minutes at the most. If longer interviews are necessary it is always sensible to set appointments to call back.
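The text notes that CATI can generate random telephone numbers for random digit dialling. A minimal Python sketch of the idea follows. The function name `random_digit_dial`, the area-code prefixes and the six-digit suffix length are assumptions made for illustration; they are not taken from any CATI package.

```python
import random

def random_digit_dial(prefixes, n_numbers, suffix_digits=6, seed=0):
    """Generate distinct telephone numbers: a known prefix plus random digits."""
    rng = random.Random(seed)
    numbers = set()                       # a set avoids dialling the same number twice
    while len(numbers) < n_numbers:
        prefix = rng.choice(prefixes)
        suffix = "".join(rng.choice("0123456789") for _ in range(suffix_digits))
        numbers.add(prefix + suffix)
    return sorted(numbers)

# Hypothetical area-code prefixes, for illustration only.
sample_numbers = random_digit_dial(["01632", "01633"], n_numbers=5)
for number in sample_numbers:
    print(number)
```

Many numbers generated in this way will not be in service or will not belong to a household, so in practice far more numbers have to be generated than the number of interviews required.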



3. Personal (Face to Face) interviews
Personal interviews can be conducted in the respondent’s home or workplace, or in locations such as shopping malls, or even simply on the street. Computer Assisted Personal Interviewing (CAPI) involves the interviewer using a laptop to record the respondent’s answers. The questionnaire is programmed onto the laptop using specialist software (e.g. Blaise). This software enables the interviewer to record the responses and then routes the interviewer automatically to the next appropriate question.

Advantages and Disadvantages of Personal Interviews

Advantages
• CAPI allows for a more efficient survey process since validation checks and coding can be done at the time of interview. Also, the electronically held responses can be transferred back to the survey centre quickly following the interview.
• The response rate is usually high for personal interviews as the respondent typically finds it more difficult to refuse an interviewer face-to-face than over the telephone.
• Data quality tends to be good since the interviewer can probe for more complete answers from the respondent and can complete validation checks at the time of interview. As the interviewer is present to answer queries, more complex questions designed to elicit detailed information can be asked.
• Visual aids, or ‘flash’ cards, can be used to help the respondent to answer certain sensitive questions, for example, questions regarding sexual behaviour or drug addiction.
• Since only the interviewer need understand the questionnaire structure, complex routing and filtering may be used in the questionnaire.

Disadvantages
• As with CATI, the cost of setting up and maintaining the questionnaire software and computer technology is high.
• Making contact with the desired respondent may be difficult, as the respondent may not be home when the interviewer calls. Repeated attempts to contact should be made but these will inevitably increase costs.
• Interviewer bias may occur if the interviewer is poorly trained. The interviewer may consistently misinterpret responses or give misleading guidance. Bias may also be introduced through the interviewer’s appearance, gender, age, ethnicity and tone of voice.
• Compared to self-completion questionnaires and telephone interviews, personal interviews are expensive. This is because there is the added cost of employing and training interviewers for the survey, along with paying for the interviewers’ travel and subsistence costs. Such costs may be minimised depending on the sampling method used.
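The automatic routing and at-interview validation described for CAPI can be sketched as follows. This is an illustration only and does not show how Blaise or any other package is programmed: the three questions, the validation ranges and the names `QUESTIONNAIRE` and `run_interview` are all hypothetical. Each question carries a validation check and a routing rule that picks the next question from the answer given.

```python
def is_yes_no(answer):
    return answer in ("yes", "no")

def is_hours(answer):
    return answer.isdigit() and 1 <= int(answer) <= 100

def is_age(answer):
    return answer.isdigit() and 16 <= int(answer) <= 120

# question id -> (question text, validation check, routing rule)
QUESTIONNAIRE = {
    "Q1": ("Are you currently in paid work? (yes/no)", is_yes_no,
           lambda a: "Q2" if a == "yes" else "Q3"),   # those not in work skip Q2
    "Q2": ("How many hours do you usually work each week?", is_hours,
           lambda a: "Q3"),
    "Q3": ("What was your age last birthday?", is_age,
           lambda a: None),                           # None marks the end of the interview
}

def run_interview(answers, start="Q1"):
    """Follow the routing using pre-supplied answers, validating each one as it is given."""
    recorded, current = {}, start
    while current is not None:
        _text, is_valid, route = QUESTIONNAIRE[current]
        answer = answers[current]
        if not is_valid(answer):
            raise ValueError(f"{current}: answer {answer!r} failed validation")
        recorded[current] = answer
        current = route(answer)
    return recorded

print(run_interview({"Q1": "no", "Q2": "35", "Q3": "42"}))
# {'Q1': 'no', 'Q3': '42'}
```

Because the respondent answered "no" at Q1, the routing rule skips Q2 and no working hours are recorded. An implausible answer, such as 170 hours a week, would be rejected at the time of interview so that it can be checked with the respondent.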

4. Diaries
Diaries are used as data collection instruments to collect detailed information about behaviour, events and other aspects of individuals’ daily lives. Respondents are given a diary to complete at regular (usually predetermined) intervals, for a set period of time. Diaries provide an alternative method to interviews for events that are easily forgotten or difficult to recall.

They are often used in combination with personal interviews. The diary provides the main data collection instrument and the interview is used to collect background information from the respondent, and information that may be of interest to the study that will not be captured by the diary. During the interview, interviewers will explain to the respondent how the diary is to be kept.

Diaries can be open format, allowing respondents to record activities and events in their own words, or they can be highly structured, where all activities are pre-categorised. All diary surveys, regardless of their format, need highly trained coding and analysis staff, and an exhaustive and mutually exclusive coding structure is paramount to the success of the survey analysis. (The importance of coding will be discussed later in this module.)

Advantages and Disadvantages of Diaries

Advantages
• The open format of a diary allows respondents to answer in detail and gives the respondent the opportunity to clarify the meaning of their response.
• Diaries do not rely on long term memory. Although many respondents may not complete the diary at each relevant time period, the recall time required is much less than when an interview or a self-completion questionnaire is used.
• The accuracy and quality of data collected within a diary survey is high.
• It is possible that responses will reveal the unexpected.
• Diaries can be designed so that they provide a form for respondents to complete.
• It is possible to use different diaries for different sub-groups within the population. For example, the Living Costs and Food Survey uses more detailed diaries for adults than it does for children.

Disadvantages
• There are significant costs involved with this method. The costs incurred from accompanying this method with personal interviews are added to by the labour intensive editing and coding processes that are needed for data analysis.
• Respondents may become conditioned to the survey. They may become more aware of their behaviour and adjust their behaviour accordingly to comply more with the norms of society.
• Diaries often suffer from what is known as “first day effects”. This is where respondents are diligent for the first day of the collection period, entering their responses accurately and promptly. After the first day their enthusiasm for the survey may wane and their responses become less prompt, meaning that the accuracy of their inputs will rely on memory.
• Members of the chosen sample may refuse to take part in the survey. Reassurances of confidentiality and offering incentives to respondents will influence respondents’ cooperation. The success of the survey may also depend on the quality of interviewing staff, who should be highly motivated, competent and well briefed, thereby encouraging response.
• Diaries are time consuming and may become tedious if a diary is to be fully completed over a reasonable period.
• Diaries, as with all self-completion questionnaires, rely on an assumed level of literacy and as such they disadvantage the less literate respondents. In some cases this assumed literacy level will mean that the sample used is biased in favour of the more literate from the population.
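The requirement that a diary coding structure be exhaustive and mutually exclusive can be illustrated with a small Python sketch. The codes, categories and item names below are invented for illustration and are not taken from any real survey coding frame. Mutual exclusivity is checked by making sure no item appears under two codes, and exhaustiveness is achieved with a catch-all "other" code.

```python
# Hypothetical coding frame for an expenditure diary: code -> items covered.
CODING_FRAME = {
    1: {"bread", "milk", "vegetables"},           # food and drink
    2: {"bus fare", "train ticket", "petrol"},    # transport
    3: {"cinema", "book"},                        # leisure
}
OTHER_CODE = 9   # catch-all, so that every diary entry receives a code (exhaustive)

def build_lookup(frame):
    """Return an item -> code lookup, refusing frames where an item has two codes."""
    lookup = {}
    for code, items in frame.items():
        for item in items:
            if item in lookup:
                raise ValueError(f"{item!r} is listed under codes {lookup[item]} and {code}")
            lookup[item] = code
    return lookup

def code_entry(entry, lookup):
    """Assign exactly one code to a diary entry written in the respondent's own words."""
    return lookup.get(entry.strip().lower(), OTHER_CODE)

lookup = build_lookup(CODING_FRAME)
diary = ["Milk", "train ticket", "haircut"]
codes = [code_entry(entry, lookup) for entry in diary]
print(codes)   # [1, 2, 9]
```

A large share of entries falling into the catch-all code would suggest that the coding frame needs further categories, which is one reason trained coding staff are needed for open format diaries.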


Although the design of a diary will depend on the detailed requirement of the topic under consideration, there are certain design aspects which are common to most:
• The diary should include clear instructions on how to complete the diary. These instructions should stress the need for the entries to be placed in the diary as soon as possible after the event.
• An example should be given of how the diary is to be completed.
• Each page of the diary should be for a specific period. For example, a day or week.
• A checklist of what should and shouldn’t be included in the diary should be given.
• Terminology that can be recognised by the respondents should be used.
• At the end of the diary a series of questions on the diary itself should be asked. These should include any comments or feelings that respondents have on completing the diary.


Exercise 3
Taken from 2004, Paper I, Question 5
In dietary studies, subjects are sometimes asked to keep a diary for a limited period of time in which they record what foods and drinks they have consumed, and when. Outline the advantages and disadvantages of collecting information using a diary rather than relying solely on a questionnaire. (10)


Section D: Questionnaire Design
Questionnaire design plays a central role in the survey process, as the questionnaire is often the first point of contact with the respondent. Good questionnaire design is crucial in terms of:
• encouraging the respondent to participate in the survey; and
• eliciting the required information from the respondent in a valid way.

The main objectives of a questionnaire are to:
• obtain accurate information from respondents;
• provide a logical structure to the questionnaire/interview so that it flows smoothly;
• provide a standard form on which responses can be recorded; and
• facilitate data entry and processing through the use of coding.

Designing a questionnaire is considered by some to be a simple task – how hard can it be to write a few questions? But producing a good questionnaire requires careful thought, thorough planning and specialist knowledge. A well designed questionnaire should achieve all of the above objectives.

Throughout this next section we will consider the principles of questionnaire design for self-completion questionnaires. However, candidates should note that many of these principles can also be applied to interviewer-led questionnaires. For interviewer-led questionnaires, instructions to the interviewer must also be included. A template of both interviewer-led and self-completion questionnaires can be found in the appendix.

Principles of Questionnaire Design
When designing a questionnaire there are some general principles that should be considered.

Establishing the objectives of the survey
When researchers design a survey, the first step that they will take is to decide what they want to learn from the survey, as this will determine the objectives of the survey. The objectives help researchers decide who they need to collect data from and the data required. If survey objectives are unclear, it is highly likely the data collected will be unclear. The more specific the objectives, the more usable the data collected.
When we establish our objectives, we should list them, split them into topic areas and then design questions based on these topics.


Overall layout of the questionnaire
Before you start to think about the questions that you will ask, it is important to consider the overall layout of the questionnaire. Ideally the questionnaire should be kept short and simple. This can be achieved by:
• Minimising clutter
All questionnaires should be kept as uncluttered as possible. Nearly all aspects of the questionnaire will require some response from the respondent. This might simply be that they have to read the instructions or questions, or they may have to assimilate what is needed in a response box. The more cluttered a questionnaire is with text, graphics and symbols, the greater the burden on the respondent. White space within a questionnaire should be maximised; not only does it make the questionnaire easier to read, but it is thought that it will also help to relax the respondent.
• Questionnaire / section title
The first thing that the respondent will read is the questionnaire title. It will set the tone of the questionnaire and inform the respondent of its relevance to them. Titles should also be used throughout the questionnaire, so that respondents can distinguish one section from the next. All titles should be clear and easily understood.
• Accessibility and clarity
We all perceive what we see differently, so we should take this into account in our questionnaire design and ensure that it is accessible and easy to understand for all respondents. We often make assumptions about our respondents’ level of comprehension, and assume that it is the same for all of them. This is often not the case, and care must be taken to ensure that all respondents can understand what is being asked of them.

Question Wording
The major problem faced when designing questionnaires is designing questions that the respondent can understand and interpret in the way that we intend.
The following should all be considered:
• Terminology
When designing a questionnaire you should always use language and terminology that the respondent is familiar with. Therefore you should look to:
  ▪ avoid jargon, shorthand or uncommon words;
  ▪ avoid ambiguous words that do not have a specific meaning, for example the words ‘often’, ‘usually’ and ‘frequently’ have no specific meaning and should be made more specific; and
  ▪ avoid words which can be misinterpreted.
• Question Structure
As well as considering the terminology of questions we should also consider their structure.
  ▪ ‘Least Read’: respondents only read as much of the question as they think is necessary. It is therefore important that questions are structured so respondents are more likely to read the whole question before they answer it. As a guide, keep questions short and concise.
  ▪ Multi-part questions will only lead to confusion. Even though they may appear to save space they should be avoided.
  ▪ Double-barrelled questions that ask the respondent for two pieces of information at the same time may confuse respondents. Some respondents may only answer one part of the question whilst others will answer the other part.
  ▪ Leading or biased questions will force a respondent to answer in a certain way, thus biasing the response. Questions should be worded in a neutral and balanced way to guard against response bias.

Layout
The layout of the questionnaire should be inviting and interesting and should provide a clear and logical path for the eye to follow. Throughout the questionnaire this layout should stay consistent to help respondents navigate through the questionnaire. Much of this can be achieved through the effective use of blank space so that it is clear and easy to read. There should be more space between the questions than there is within them (this helps the respondent to group question parts together). Questions, response options, response boxes and instructions should be laid out in a standard format. Where appropriate there should be enough space made available for the respondent to write their responses.

Question numbers
Question numbers are the main tool the designer has to help respondents progress through the questionnaire in the desired sequence. Therefore they need to stand out from the rest of the text on the page. Question numbers should:
• Have a simple sequence of numbers.
• Avoid a combination of numbers and letters, as this can confuse respondents.
• Always be displayed consistently throughout the questionnaire.

Question order
The questions and sections within a questionnaire should be ordered in a logical manner that makes sense to the respondent. For example, all demographic questions such as age and sex should be contained within the same section. The respondent should be able to work through the questionnaire without having to look back or forwards for references. It is also sensible to ensure that questions are ordered in a way that minimises the need for routing or filtering to other questions. You should ensure that the questionnaire starts with easy or less sensitive questions to encourage the respondent to participate. Respondents are more likely to answer sensitive questions, for example those relating to income or alcohol consumption, if they are placed towards the end of the questionnaire rather than at the beginning. As a whole, the questions should be grouped into topics in a logical sequence and should flow easily.

Routing or Filtering
Routing questions can be used to guide respondents to questions that are applicable to them and to ensure that they do not respond to questions that are not applicable. The use of routing should be kept to a minimum, as respondents can find routing instructions difficult to follow and they can disrupt a respondent’s flow through the questionnaire. Where routing is used, instructions should be included that aid the respondent, and these should be placed with the appropriate questions rather than with the general instructions for the questionnaire.

Front Page
The front page of a questionnaire should contain all of the information that the respondent will need to know to complete the questionnaire. However, we should ensure that respondents are not overloaded with information as this can be confusing.
The front page should act as an introduction or covering letter and should inform the respondent of:
• Who is conducting the survey;
• Why they have been selected to take part;
• Whether the survey is compulsory;
• Confidentiality;
• Who should complete the questionnaire;
• How and when they should return the questionnaire;
• What the data collected will be used for, i.e. the objectives of the survey; and
• Contact details for advice and further information.

Where possible questionnaires should be personalised. This can be done by using the name and address of the respondent and including a signature of the person responsible for the survey.

Instructions
A respondent needs to know how to complete a questionnaire. As such, the questionnaire should include some clear guidance on how to complete it. Instructions should be presented where they are needed and before the respondent is required to put pen to paper. For example, where a question requires a tick or cross in a box the respondent should be informed of this. Instructions that are presented away from the questionnaire, for example in a booklet or on a separate page, can confuse respondents and will often be forgotten. Instructions should also be repeated throughout the questionnaire to remind the respondent. There may be cases where specific instructions are needed for a particular question. Where this is the case the instructions should be integrated into the question. If this is not possible, they should be placed immediately after the question but before the respondent is required to respond.

Question Styles
There are many types of questions available for a questionnaire. Two such types are open and closed questions. The type of question used depends on the data required. Using a variety of question styles can help to keep the respondent interested and engaged in the survey.

Open Questions
Open questions require the respondent to produce their own answers. In a self-completion questionnaire respondents write in their own answers, constrained only by the space available. In an interview, the interviewer writes down the respondent’s answer verbatim. Open questions are used when rich, detailed information is required from the respondent. They provide a source of qualitative data, where qualitative data refers to descriptive rather than numeric data. Let’s assume that a theatre director wants to know what the audience thought of his latest play. He designs a questionnaire and distributes copies to the audience to complete. An open question that he could have asked is:


What did you think of tonight’s show?


Closed Questions
Closed questions offer respondents a choice of answers, or response categories. Some closed questions require ‘yes/no’ answers, others provide a list of possible choices. Closed questions tend to be used when high level, quantifiable data is required. Quantifiable refers to data that is numeric or can be summarised in numeric form. In the theatre director’s questionnaire above, the question “What did you think of tonight’s show?” can be changed into a closed question by offering the audience a set of response categories and asking them to choose one. For example:

What did you think of tonight’s show? (please tick one box only)
[ ] Very good
[ ] Good
[ ] Poor
[ ] Very poor
[ ] No opinion

The advantages and disadvantages of open and closed questions should be balanced against the response that we require before we decide which type of question to use.

Advantages and Disadvantages of Open Questions

Advantages
• They allow an unlimited number of possible answers.
• Respondents can answer in detail and can clarify responses.
• Unanticipated responses can be discovered.
• They enable adequate answers to be given to complex issues/questions.
• They encourage creativity, self-expression and richness of detail.
• They reveal a respondent’s logic, thinking process and frame of reference.

Disadvantages
• Different respondents give different degrees of detail in their answers, making them difficult to compare.
• Responses may be irrelevant or buried in useless detail.
• Comparisons and statistical analysis can be difficult, and indeed there is a methodological argument against converting qualitative data into a quantifiable form.
• Coding of responses can be time consuming and difficult, especially where responses are incomplete or unclear.
• Questions may be too general, causing respondents to lose focus.
• A large amount of respondent time, effort and energy is required to answer open questions.
• Response boxes for open questions often take up a lot of space on a questionnaire.
• Articulate and highly literate respondents have an advantage over those who are less literate.

Compare the above with the advantages and disadvantages of closed questions.

Advantages and Disadvantages of Closed Questions

Advantages
• They are easy and quick to answer.
• The answers of different respondents are comparable.
• Responses are easier to code and analyse compared to open questions.
• Response categories can clarify the question meaning to respondents.
• Respondents are more likely to respond to closed questions that relate to a sensitive topic than they are to an open question on the same topic.
• Less articulate or less literate people are not disadvantaged by closed questions.

Disadvantages
• Misinterpretation of a question may go unnoticed.
• They force respondents to give simplistic answers to potentially complex questions.
• It can be confusing for the respondent if many response categories are offered (especially where the categories are read by an interviewer).
• The response categories can suggest ideas that respondents would not otherwise have considered, thereby influencing their response.
• Respondents with no opinion or knowledge may just choose a category anyway.
• A respondent’s desired answer may not be listed and as such they may feel forced to choose an available category instead of their preferred option. (This problem should not occur if an ‘Other’ category is provided.)


Exercise 4
Taken from 2001, Paper I, Question 2

You are employed in the marketing department of a national newspaper, “The National Daily”, that is published Monday to Saturday. Your paper is running a prize draw. In return for supplying data about themselves and their reading habits, readers aged 18 or over will be entered into a draw to win a valuable prize. You have been asked to design the form to be used in the newspaper as an entry to the draw. The form has to elicit the following information: name, address including post code (or zip code), telephone number, email address, age group, on which days the reader usually buys “The National Daily”, which other national daily newspapers are bought regularly during the week by the reader, and which national Sunday newspapers are bought at least once a month.

Design a form that could be used for this purpose. Marks will be given for clarity of layout and ease of use by the readers. (12)

Use the space provided to design your form.


Section E: Data Processing and Analysis
The questionnaires or forms that we design are used to collect accurate data. Survey organisations such as ONS will perform several steps to transform the data collected into information that can be assimilated and used. These steps are known as data processing.

Note: The following is a guide to what typically happens in ONS in terms of data processing and analysis. The process used by other organisations may vary.

Coding Structures
Before the data we collect is entered onto a computer for data processing it must be coded. Coding involves allocating a number to each of the possible responses provided to a closed question, or allocating a code to the response of an open question. A code is quicker to enter onto computer systems than text responses, thus data processing is more efficient. Coding also aids the data analysis stage, as it categorises the responses given and enables the frequency of selection to be calculated. Note that codes should cover all possible responses and should not overlap.

Coding can be carried out at one of three stages:

(a) Before the survey
Where closed questions are being used, a code can be assigned to each of the possible responses on the questionnaire before the survey is sent out. This is known as pre-coding. The responses are then ready for data entry as soon as the questionnaire is returned to the office. However, the code may distract the respondent’s attention from the question, so where you position the codes on the questionnaire is important (a common place is on the right hand side of the response box and in a small or greyed out font). For example:

How well would you say your dietary needs (e.g. low fat, vegetarian, vegan, kosher, etc.) are catered for by the ONS canteen? Please tick one box only.

Very well        1
Quite well       2
Not very well    3
Not at all       4
No opinion       5

(b) During the interview
Where open questions are being used during an interviewer led questionnaire, the interviewer can code the responses as they are given. Note, however, that the coding schedule must have been established before the interview takes place. The interviewer is able to clarify the response and provide appropriate coding. However, it is possible for the interviewer to bias the respondent, or to interpret detailed or complicated responses in line with personal prejudices, possibly providing the incorrect code.

(c) After the survey
Where open questions are being used in a self-completion survey, a range of responses are received and coded after the survey has been completed. All given answers will therefore be considered in the coding. However, responses to open questions may be incomplete or vague and thus difficult to code.
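As a minimal sketch of pre-coding, the response categories and codes from the canteen example above can be held in a lookup table and used to tabulate how often each code was selected. The variable names and the returned responses below are hypothetical, chosen only for illustration; this is not a description of the system used in ONS.

```python
from collections import Counter

# Pre-coded response categories from the canteen question (codes 1 to 5).
# Each label maps to exactly one code, so the codes do not overlap.
CODES = {
    "Very well": 1,
    "Quite well": 2,
    "Not very well": 3,
    "Not at all": 4,
    "No opinion": 5,
}

def encode(responses):
    """Convert ticked response labels into their numeric codes."""
    return [CODES[label] for label in responses]

def frequencies(coded):
    """Count how often each code was selected (zero for unused codes)."""
    counts = Counter(coded)
    return {code: counts.get(code, 0) for code in CODES.values()}

# Hypothetical returned questionnaires (illustrative data only).
returned = ["Quite well", "Very well", "Quite well", "No opinion", "Not at all"]
coded = encode(returned)
table = frequencies(coded)
```

Entering `2` rather than the text "Quite well" is what makes data entry quicker, and the frequency table falls out directly from the codes.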


Missing Value Codes

(a) Text responses
The codes used in practice for a missing text response (for example, where a respondent has missed a question out completely) will vary, but they may take the form of a ‘.’ or ‘#’.

(b) Numeric responses
For numerical responses it is important to distinguish between a missing value and a returned zero. Hence it is also good practice to provide a code for a missing numerical value, such as ‘99’ or ‘999’. The code chosen should be a value that could not occur as a genuine response. This will aid analysis later on and will prevent confusion with returned responses.
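The distinction between a missing value and a returned zero matters as soon as summary statistics are calculated. The sketch below assumes a hypothetical missing-value code of 999 and made-up responses to an invented question; it shows the code being excluded from the mean while a genuine zero is kept.

```python
MISSING = 999  # assumed missing-value code for this illustration

def mean_of_returned(values, missing=MISSING):
    """Mean of returned values; the missing code is excluded, zeros are kept."""
    returned = [v for v in values if v != missing]
    if not returned:
        return None  # nothing was returned for this question
    return sum(returned) / len(returned)

# Hypothetical answers to "How many cars does your household own?"
answers = [2, 0, 999, 1, 999, 1]

correct = mean_of_returned(answers)   # based on 2, 0, 1 and 1 only
naive = sum(answers) / len(answers)   # wrongly treats 999 as a real value
```

Treating the code as a real value inflates the naive mean enormously, which is why the missing code must be recognised before analysis.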


Coding and Data Analysis
Coded answers are easily analysed using computer software packages such as SPSS (Statistical Package for the Social Sciences) or SAS (Statistical Analysis System). Both of these packages are used extensively across the Government Statistical Service for statistical analysis. These packages allow researchers/statisticians to produce a wide range of summary statistics, like the mean or standard deviation, and tables and graphs that can be used in reports. Note, however, that operators need to be trained to use the software packages in order that they are used correctly and that outputs are interpreted appropriately.

Data Capture
This is the process whereby the data we collect on questionnaires or forms are transferred to an electronic file and subsequently put onto the computer. Before we can complete this step, we must ensure that the questionnaire or form is ready for data capture. The questionnaire is reviewed by someone to ensure that all of the minimum required data have been reported, and that they are understandable.

There are several methods used for data capture:
• Batch keying is one of the oldest methods of data capture. It involves the manual keying of data onto the computer. During this keying period no immediate editing takes place, so validity and range edits need to be implemented to ensure quality keying. This does not mean the data are being re-edited, but if a field is numeric and alpha characters are entered instead, the error will be flagged.
• The scanning of questionnaires to capture the information that has been supplied on them is commonly used within ONS. The main system used is Intelligent Character Recognition (ICR). Within this system questionnaires are designed so that responses can be distinguished from the actual questionnaire. For successful data capture through both batch keying and scanning, we need to consider how the information we collect will be interpreted when designing a questionnaire. We can do this by considering the format we want our responses in. For example, if we ask for date of birth then we should state whether we want this in DD/MM/YYYY or some other format. Some of the questionnaires that we pass through ICR will fail on scanning, that is, they cannot be interpreted. A common example of this is where respondents enter a six but the scanner picks it up as a zero. Whilst ICR is adept at recognising characters, some will inevitably fail. Where this is the case a team of data experts will manually input the information on the questionnaire onto a computer.
• Data recorded by an interviewer can sometimes be entered directly onto a computer. These files can then be transferred electronically to the relevant system.
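A minimal sketch of the kind of validity and range edit described for batch keying, together with a check on the DD/MM/YYYY date format. The field names, the age range of 0 to 120 and the function names are assumptions made for illustration; they are not ONS rules.

```python
from datetime import datetime

def check_numeric(field, raw, low, high):
    """Return an error message if raw is not a whole number within [low, high]."""
    if not raw.isdecimal():
        return f"{field}: '{raw}' is not numeric"
    if not low <= int(raw) <= high:
        return f"{field}: {raw} is outside the range {low}-{high}"
    return None

def check_date(field, raw):
    """Return an error message if raw is not a valid DD/MM/YYYY date."""
    try:
        datetime.strptime(raw, "%d/%m/%Y")
    except ValueError:
        return f"{field}: '{raw}' is not a valid DD/MM/YYYY date"
    return None

def validate(record):
    """Collect all edit failures for one keyed record (hypothetical fields)."""
    checks = [
        check_numeric("age", record["age"], 0, 120),
        check_date("date_of_birth", record["date_of_birth"]),
    ]
    return [message for message in checks if message is not None]

# A clean record, and one with an alpha character and an impossible date.
good = validate({"age": "34", "date_of_birth": "07/03/1972"})
bad = validate({"age": "3A", "date_of_birth": "31/02/1972"})
```

The checks only flag suspect entries for someone to review; they do not alter the keyed data.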

Once an electronic file of all of the information collected has been created, the data are passed through a series of validation and automatic editing rules. One automatic editing rule used at the ONS is automatic rounding. Much of the turnover data that ONS collects is asked for in thousands of pounds (£’000), so an automatic editing rule has been set up to check that a respondent’s data has been reported in the correct format. Where it has not, the rule is programmed to correct the respondent’s turnover and transform it into the appropriate form. For example, if a respondent returns a monthly turnover figure of £1,000,000 when £1,000 is far more likely (i.e. the respondent has written their figure in full, rather than in £’000), the system will automatically adjust this. All data are also passed through a set of validation gates. These gates check the feasibility of the data and highlight possible errors. The data that fail validation are passed to a team of analysts who contact the respondent to confirm the data or query it. They then correct it where necessary.

Data Analysis
Once all data has been edited in this manner it will be passed to a results team who will analyse it further. The main interest for this team is how the individual data will collate together to form the key outputs of the survey. Here they will use statistical computing packages such as SAS to analyse the data sets, to find out about aspects of the data such as the trend, irregular movements and outliers or freak values. After the data has been analysed, the final results are produced, published and disseminated.

Outliers or Freak Values
An outlier is a response that is unusual in comparison to responses from other respondents. That is, the response appears to be inconsistent with the rest of the population. It may be so different from other responses that it arouses suspicion. Outliers are identified after all the data has been processed and has passed through validation and editing. Since outliers are unusual responses there should be a small number of outliers in any dataset. There are two main types of outlier:
• Representative outliers: these are genuine values which cannot be assumed to be unique in the population.
• Non-representative outliers: these are unique or incorrect data values which should be looked at and treated by editing and imputation systems.

Outlier theory has been developed to deal with representative outliers only, that is, it assumes that all data is correct.

Detecting Outliers
One of the easiest ways to detect an outlier is through plotting all data on a scatter plot. Any observations that appear to fall away from the bulk of the data can be viewed as outliers. There are also two mathematical ways of detecting outliers.
• Distance from the mean
Calculating the distance from the mean is the most common method used for detecting outliers. It works by measuring the relative distance between a response and the average response. Those values that appear to have a large distance from the average value are deemed to be outliers. (Please note that you do not need to know how to calculate distance from the mean, but you do need to recognise it as a method of detecting an outlier.)
• Trimming
Trimming is a relatively simple method for detecting outliers. The responses are sorted into ascending order and the top x% and bottom y% of the responses are identified as outliers. The upper and lower percentages are pre-determined by the researcher, but typically they are between 2% and 20%. This method tends to identify a large number of outliers, depending on the number of responses, and the values for the upper and lower percentages.
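The two mathematical methods above can be sketched as follows. The notes do not give a formula for distance from the mean, so the sketch assumes one common choice: the distance measured in standard deviations, with a hypothetical cut-off of 2. The trimming percentages and the responses are likewise illustrative only.

```python
import statistics

def distance_from_mean(values, cutoff=2.0):
    """Flag values more than `cutoff` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all responses identical, so nothing stands out
    return [v for v in values if abs(v - mean) / sd > cutoff]

def trimming(values, top_pct, bottom_pct):
    """Flag the top `top_pct`% and bottom `bottom_pct`% of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    n_bottom = int(n * bottom_pct / 100)
    n_top = int(n * top_pct / 100)
    return ordered[:n_bottom] + ordered[n - n_top:]

# Made-up responses with one value well away from the rest.
responses = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

by_distance = distance_from_mean(responses)   # flags 95 only
by_trimming = trimming(responses, 10, 10)     # flags 12 and 95
```

With these data, trimming also flags 12, which is not unusual at all. This illustrates why trimming tends to identify more outliers than are really present: it always flags a fixed share of the responses.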

Dealing with outliers
Before any moves are made to deal with an outlier the first step should be to re-check the results; a recording error may be the explanation, or two or more data items may have been mixed up. Data entry to a computer may also be faulty (e.g. scanning or keying errors). If an outlier is confirmed then action can be taken to reduce its effect on the survey results. Analysis may be carried out with and without the doubtful values to see if omitting a possible outlier makes a difference to the survey results. But it is logically dangerous to omit observations unless there is a valid, likely reason for doing so. The more extreme values give information on how variable the data is, which in some studies is just as important as location.
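The with-and-without comparison described above can be sketched as below, using made-up responses and a hypothetical helper name. It reports the mean (location) and the standard deviation (variability) both ways, so that the effect of the doubtful value can be seen before any decision is made to omit it.

```python
import statistics

def summarise(values):
    """Return (mean, standard deviation) for a list of responses."""
    return statistics.mean(values), statistics.stdev(values)

# Illustrative data only: 95 is the doubtful value.
responses = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
doubtful = 95

with_outlier = summarise(responses)
without_outlier = summarise([v for v in responses if v != doubtful])

# Differences in location and spread caused by the single doubtful value.
mean_shift = with_outlier[0] - without_outlier[0]
sd_shift = with_outlier[1] - without_outlier[1]
```

Both the mean and the standard deviation drop when the doubtful value is left out, so a single response can change the picture of location and of variability at the same time.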


Section F: Pilot Surveys
Before implementing a survey, it is essential to make sure that it will work and that all of the processes that have been put in place are appropriate for the survey. The best way to do this is through a pilot survey. A pilot survey is a small scale survey carried out to gain information which is typically used to improve the design of a larger survey. It allows researchers to find and overcome any difficulties before implementation and to test the whole survey process. The pilot survey can be used to provide information on:
• The accuracy of the sampling frame
The pilot survey can be used to identify whether the sampling frame is complete, up-to-date and accurate. (We will discuss sampling frames in module 2.)
• The potential response rate
The pilot survey can give an indication of the likely response rate for the main survey. It can also be used to test the effects on the response rate of making certain changes to an existing survey. Changes could be made to the number of questions and the data collection method.
• The effectiveness of the questionnaire
The pilot survey offers a way of testing how well respondents understand and respond to the questions and instructions. For interviewer led questionnaires, how well interviewers understand and follow the questions may also be tested.
• The possible answers for closed questions
It is often difficult to produce a closed list of all possible answers for use in designing closed questions. The pilot survey can therefore use open questions to obtain a range of answers, which can then be transferred into closed questions for the real survey.
• The codes for pre-coded answers
As with establishing the possible answers for closed questions, a pilot survey can be used to obtain a more complete range of codes.
• The potential cost and duration
The pilot survey can give an indication of the likely cost and duration of the main survey.
• How efficient the survey processes are
The pilot survey can be used to identify the efficiency and smooth running of all survey processes such as form printing, posting, receipt, scanning, data editing and data analysis.

Note: The extent to which good information is gained during a pilot survey will depend on how well you manage to obtain a representative sample for the pilot.


Exercise 5
Taken from 1995, Paper I, Question 4.

As part of a nation-wide study into the health of the elderly population, medical investigators intend to question a large number of old people in great detail about their previous medical history. The investigators ask for your advice regarding the use of a pilot questionnaire.
(a) What is the role of a pilot questionnaire in formulating a main study questionnaire? (2)
(b) State three advantages of using a pilot questionnaire with reference to the study described. (6)



Please complete the following questions as your assignment for this module:
• 1999, Paper I, Question 6
• 2006, Paper I, Question 2 (information given at the beginning of the 2006 paper will need to be read)
• 2006, Paper I, Question 8 part (i)
• 2007, Paper I, Question 7

The assignment due-by-date is stated on your course timetable. Please return your completed assignment to your designated course tutor.

Gemma Hamilton, Room 1.127, Office for National Statistics, Government Buildings, Cardiff Road, Newport. NP10 8XG
Richard Treloar, Room 4200N, Office for National Statistics, Segensworth Rd, Titchfield, Fareham, Hants. PO15 5RR


This appendix contains all of the key definitions from this module in alphabetical order.

Blind experimentation
Blind experimentation is a specific method of controlling for confounding effects from the subjects or the researchers in an experiment. There are two types: single blind and double blind. Single blind experimentation means that the subjects do not know which group they are in; double blind means that neither the researcher nor the subjects know who is in which group.

Block randomisation
A method for accounting for confounding factors by grouping like units together and then randomly allocating the units within each group to either the experimental or control groups.

Census survey
A census survey involves the collection of data from all units in a population of interest.

Closed question
A closed question offers respondents a choice of answers known as response categories.

Coding
Coding involves allocating a number to each of the possible responses provided to a closed question, or allocating a code to a response to an open question.

Cohort study
A cohort study is a type of longitudinal survey that selects a sample of individuals who have experienced a similar or same life event, and then follows this sample over time.

Confounding factor
A factor, other than the experimental factor, that may influence the results of experimental research.

Control group
A group that does not receive an experimental treatment.

Data capture
The process by which collected data are put in a machine-readable form. This may involve some simple editing checks.

Experimental group
A group that receives an experimental treatment.

Experimental study
An experimental study involves deliberately applying a treatment to one group of experimental units and comparing that group to a group that does not receive the treatment (control units).

Longitudinal sample surveys
A longitudinal sample survey collects data from the same sample of respondents at regular intervals of time.

Observational study
An observational study observes the natural characteristics of a group of observational units, or subjects, in their natural environment.

One-off sample survey
A one-off, or cross-sectional, sample survey provides a snapshot of a proportion of the population at one point in time.

Open question
An open question is worded in such a way that it invites respondents to answer a question in their own words, without being restricted to set responses.

Outlier (freak value)
An outlier is a single observation that is inconsistent with the rest of the data for the variable being observed.

Panel surveys
A panel survey is a type of longitudinal survey that recruits a single representative sample from the population and collects data from these same respondents at regular intervals of time, sometimes adding the children of panel members when they are born and removing people when they die.

Personal interviews
Personal interviews involve face-to-face contact between an interviewer and the respondent, where the interviewer may conduct the interview in the respondent’s home or workplace, or in locations such as shopping malls, or even on the street.

Pilot survey
A pilot survey can be thought of as a ‘dress-rehearsal’ of the main survey with the objective of ensuring that all survey processes are working effectively. Pilot surveys provide an opportunity to test the data collection instrument and allow necessary amendments to be identified and made before the main survey.

Placebo
A medication or treatment believed by a researcher to be inert or innocuous.

Repeated cross-sectional sample surveys
These are surveys conducted at regular intervals, where a new sample is taken each time the survey is run.


Questionnaire Templates Two templates have been inserted here to assist you with the design of: 1. An Interviewer led questionnaire; and 2. A Self-completion postal questionnaire.

Template 1: An Interviewer led Questionnaire

Title of Survey

Interviewer Details
E.g. name & address of respondent, interviewer's name, interview date and time, number of previous contacts, reference number, etc.

Scripted introduction
• Explain the purpose of the survey.
• Provide an estimate of how long the interview will take.
• Mention that responses will be kept in confidence.
• This should be scripted, so that the interviewer can read it out during the interview.

The Questions
• Follow guidelines for questionnaire and question design.
• Use open and closed questions.
• Use visual aids/show cards as appropriate.
• Use large enough font so that interviewer can read clearly.
• Make questions short & simple.
• Use prompts and probes (follow-up questions that glean further information).
• Use filtering/routing, to ensure only relevant questions are asked.
• Use pre-coding on the questionnaire or provide a list of codes for the interviewer to code responses during the interview.

Close the interview and thank respondent


Template 2: A Self-completion Postal Questionnaire
There are two elements to this type of questionnaire: (a) introduction or covering letter; and (b) main questionnaire.

Title of Survey

Introduction or Cover Letter
• Explain the purpose of the survey.
• Provide information about what the respondent will gain by taking part in the survey (e.g. promise of an improved product/service or a free gift).
• Provide an estimate of how long it will take the respondent to complete the questionnaire.
• Provide the deadline for returning the questionnaire.
• Outline how the responses will be dealt with confidentially.
• Thank the respondent in advance for taking part in the survey.

General guidance
E.g.
• how to complete tick boxes,
• write in capital letters only,
• use black ink only, etc.

The Questions
• Follow guidelines for questionnaire and question design.
• Use closed questions to encourage response.
• Follow guidelines for formulating questions, e.g. make questions short & simple.
• Use simple routing, to ensure only relevant questions are asked.
• Use pre-coding.

Close the survey
E.g.
• Thank the respondent.
• Explain how to return the questionnaire.
• Provide the deadline for its return.


Exercise 1
Answer to 1996, Paper I, Question 3

Observational studies observe the natural characteristics of the subjects or units under investigation. No treatments are imposed on the subjects and confounding factors are not controlled for.

With an experimental study a treatment is imposed on the subjects or units under investigation; the units that have the treatment imposed form the treatment group. The results of the treatment group are compared against a control group, which has no treatment imposed, and the cause and after-effects of the treatment are then noted. Confounding factors can be controlled for in an experimental study, using block randomisation, replication or blind experimentation.

An example of an observational study is an investigation of the effect of smoking on health. An example of an experimental study is an investigation of the effect of a new fertiliser on a particular crop.

An advantage of an experimental study over an observational study is that you can control the situation, and therefore confounding factors can be controlled for. In an observational study confounding factors cannot be controlled for, as this would result in a process being imposed on the subject.

A disadvantage of an experimental study is that, by controlling for confounding factors, the results of the study could be affected by this control. In an observational study, because confounding factors are not controlled for, the whole process of the study can be viewed.

Exercise 2
The essential difference between longitudinal and cross-sectional surveys is that in a longitudinal survey the sample remains the same throughout the duration of the survey, whereas a cross-sectional survey will have a different random sample at each repeat of the survey.

An advantage of a longitudinal survey over a cross-sectional survey is that a longitudinal survey has only one source of variation, the change in time period, whereas a cross-sectional survey has two sources of variation, the change in time and the change in sample. Because the sample stays the same in a longitudinal survey, it is possible to calculate change over time. In a cross-sectional survey this is difficult, as the sample changes, making it hard to make meaningful comparisons between time periods.

A disadvantage of a longitudinal survey compared with a cross-sectional survey is that a longitudinal survey will suffer from loss of respondents through mortality and attrition, leading to the remaining sample being less representative and potentially biased. As a cross-sectional survey has a new sample at each repeat of the survey, it is thought to be more representative.

Another disadvantage is that the respondents in a longitudinal survey are often asked to submit responses to the same questions over a period of time, leading to the respondents becoming familiar with the style of question and the responses expected. This is often referred to as conditioning, and it can lead to respondents who are familiar with the survey adjusting their answers, giving biased results. This is much less of an issue with cross-sectional surveys, as the respondents will only be asked to respond once.

Exercise 3
Advantages of a “diary” method include the following:
• there is a much more accurate record of what is eaten and when;
• answers do not depend on long- or medium-term memory;
• a diary form could be designed with suitable headings and definitions to make accurate and correct recording easier;
• a further improvement may be to record quantities in some convenient way;
• regularity of diet can be included by having carefully specified “time” boxes.

Disadvantages include the following:
• it takes time, and may become tedious, for a diary to be fully completed over a reasonable period;
• there is no guarantee that it is completed fully and accurately, at the time food or drinks are consumed or very soon afterwards;
• diets will vary somewhat according to seasonal availability of some items, requiring repetition of the exercise a few times during a year;
• people may actually change their regular habits during the time they are keeping a diary;
• “snacks” between main meals may not be recorded unless clear instructions are given – and are not always then.


Exercise 4

The National Daily Prize Draw: Entry Form
By simply completing this form you will automatically be entered into a prize draw where you could win a valuable prize!

Please complete this form in BLOCK CAPITALS.

1. Title

2. First (or given name)

3. Surname

4. Home address


5. Telephone number:

Daytime Evening

5. Email address (if applicable)


6. What age group do you belong to? Please tick one box only.
18 - 25    1
26 - 35    2
36 - 45    3
46 - 55    4
56 - 65    5
over 65    6

7. During a typical week, on what days do you buy The National Daily? Please tick all applicable boxes.
Monday       1
Tuesday      2
Wednesday    3
Thursday     4
Friday       5
Saturday     6

8. During a typical week (excluding Sunday), what other daily national newspapers do you buy? Please write the name of the newspaper(s) in the box below.

9. Do you buy a national Sunday newspaper at least once a month?
Yes    1
No     2

10. If yes, what national Sunday newspaper(s) do you buy?


Please write the names of the newspapers in the box below.

Thank you for completing this form. Please now send it to:
The National Newspaper
Some Street
Somewhere
AA11 2BB
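
The pre-coded tick boxes on this form, and the routing from question 9 to question 10, can be illustrated with a short sketch of how a returned form might be coded for data capture. This is not part of the model answer. The numeric codes mirror the pre-codes printed beside the tick boxes, but the names `AGE_GROUP_CODES`, `YES_NO_CODES` and `code_entry`, and the missing value code 9, are assumptions made purely for illustration.

```python
# Hypothetical sketch: coding answers from the Exercise 4 entry form.
# Question numbers are those printed on the form. All names and the
# missing value code (9) are assumptions made for illustration only.

AGE_GROUP_CODES = {
    "18 - 25": 1, "26 - 35": 2, "36 - 45": 3,
    "46 - 55": 4, "56 - 65": 5, "over 65": 6,
}
YES_NO_CODES = {"Yes": 1, "No": 2}
MISSING = 9  # assumed missing value code, outside the range of valid codes


def code_entry(age_group, buys_sunday_paper, sunday_papers=None):
    """Return coded answers for questions 6, 9 and 10 of the form."""
    q6 = AGE_GROUP_CODES.get(age_group, MISSING)
    q9 = YES_NO_CODES.get(buys_sunday_paper, MISSING)
    # Routing: question 10 applies only when question 9 is "Yes" (code 1).
    if q9 == 1:
        # Question 10 is an open question, so the text itself is kept.
        q10 = sunday_papers if sunday_papers else MISSING
    else:
        q10 = None  # not applicable: respondent is routed past question 10
    return {"Q6": q6, "Q9": q9, "Q10": q10}


print(code_entry("26 - 35", "Yes", "The Sunday Example"))
print(code_entry("over 65", "No"))
```

Keeping "not applicable" (routed past the question) distinct from "missing" (should have answered but did not) is a design choice in this sketch; it lets the analyst tell item non-response apart from correct routing.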

Exercise 5
(a) The role of a pilot survey is to test the whole survey process with a small number of people. The pilot survey will allow researchers to overcome any difficulties before implementation and can therefore be thought of as the dress rehearsal for the main survey.

(b) Three advantages of using a pilot questionnaire in a study into the health of the elderly population are:
1. It will test the effectiveness of the questionnaire, testing how well respondents understand and respond to the questions and instructions.
2. If closed questions are to be used on the full survey, then the pilot survey will be able to ask these questions in an open format, to obtain a full range of answers, which can be transferred into closed questions for the real survey.
3. The pilot survey will also test the sampling frame, checking whether it is complete, up to date and accurate.