7.1 Sampling
When conducting scientific research one should always consider the use of sampling. If it is possible to obtain
and analyse data from every possible case or group member, this is termed a ‘census’. However, a census is
often impossible because of restrictions on time, money and access. This is why sampling is often used.
A sample should always represent the full set of cases in a way that is meaningful and which we can justify
(Becker 1998). The full set of cases from which the sample is taken is called the population. The
population does not necessarily consist of people; it could also refer to, for example, Chinese restaurants or
electric cars in a specific region. Sampling can even be a better option than a census.
Researchers such as Barnett argue that using a sample can lead to higher overall accuracy than a census. This
is because the researcher focuses on a smaller number of cases and therefore has more time to design and
pilot the data collection methods. Moreover, collecting data from fewer cases means that more detailed
information can be obtained from each.
Sampling techniques
There are two types of sampling techniques (see figure 7.2 on page 261):
• Probability/representative sampling – the chance of each case being selected from the population
is known and is usually equal for all cases. This is used when you need to make statistical inferences
about the population from the sample.
• Non-probability sampling – the chance of each case being selected from the population is not
known, so it is impossible to make statistical inferences about the characteristics of the
population.
The ‘sampling frame’ for a probability sample is a complete list of all the cases in the population from
which the sample is drawn. It is not possible to select a probability sample without a sampling frame.
When no suitable list exists and you still want to use a probability sampling technique, you will have to
compose your own sampling frame and ensure that it’s valid.
The way in which a researcher defines the sampling frame also has implications for the extent to which they
can generalize from the sample. If a sampling frame is a list of all customers of an organization, one can
only generalize to that population. Thus, you should not generalize beyond your sampling frame. This is a
mistake many researchers make: they do not place clear limits on the generalizability of their findings.
The choice of sample size depends on:
• The confidence one needs to have in the data (whether you are certain that the sample is representative of
the entire population)
• The margin of error (footnote 1) one can tolerate (the accuracy required for estimates made from the sample)
• The type of analyses one is going to undertake
• The size of the total population
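To make these factors concrete, the sketch below uses one widely taught formula for the minimum sample size needed to estimate a proportion, n = p% × q% × (z / e%)², followed by a correction for small populations, n′ = n / (1 + n / N). Neither formula appears in these notes; both are assumptions brought in for illustration, as are the function names and the z-value table. Here p% is the assumed percentage of cases in the category of interest, q% = 100 − p%, z reflects the confidence level, e% is the margin of error and N is the population size.

```python
import math

# Two-tailed z values for common confidence levels (standard normal distribution)
Z_VALUES = {90: 1.645, 95: 1.96, 99: 2.576}

def minimum_sample_size(p_pct: float, margin_pct: float, confidence: int = 95) -> float:
    """Assumed formula: n = p% * q% * (z / e%)^2, with q% = 100 - p%."""
    q_pct = 100.0 - p_pct
    z = Z_VALUES[confidence]
    return p_pct * q_pct * (z / margin_pct) ** 2

def adjust_for_population(n: float, population: int) -> int:
    """Assumed small-population correction n' = n / (1 + n / N), rounded up."""
    return math.ceil(n / (1 + n / population))

n = minimum_sample_size(p_pct=50, margin_pct=5)   # 95% confidence, +/- 5 points
print(round(n, 2))                                # 384.16
print(adjust_for_population(n, population=2000))  # 323
```

Using p% = 50 gives the most cautious (largest) estimate when the true proportion is unknown, because p% × q% is at its maximum there.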
Many statistical tests assume that the data being analysed are normally distributed; if this assumption does not
hold, their results can be misleading. The larger the absolute size of a sample, the closer the sampling
distribution of its mean will be to the normal distribution. This relationship is known as the ‘central limit
theorem’, and it holds even if the population from which the sample is drawn is not normally distributed. As a
rule of thumb, a sample size larger than 30 will usually result in a sampling distribution for the mean that is
very close to a normal distribution. This is the reason why Stutely (2003) advises using samples with a minimum
of 30 cases.
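The central limit theorem can be checked with a small simulation. The sketch below assumes a strongly skewed population (an exponential distribution, chosen only for illustration), repeatedly draws samples of a given size, and measures the skewness of the resulting sample means; a normal distribution has skewness 0. The function names and the choice of population are hypothetical.

```python
import random
import statistics

def sample_means(sample_size: int, n_samples: int, seed: int = 1) -> list[float]:
    """Means of repeated samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness; 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

for n in (1, 5, 30):
    print(n, round(skewness(sample_means(n, 5000)), 2))
```

For an exponential population the theoretical skewness of the sample mean is 2/√n, roughly 2, 0.89 and 0.37 for n = 1, 5 and 30, so the printed values should fall as the sample size grows; the exact figures depend on the random seed.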
The process of drawing conclusions about a population on the basis of data describing the sample is
called ‘statistical inference’. The ‘law of large numbers’ holds that large samples are more likely to
represent the population from which they are drawn than smaller samples. Moreover, their means are also
more likely to be close to the mean of the population.
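The law of large numbers can be illustrated in the same way. This sketch assumes a hypothetical population with a known mean of 0.5 (a uniform distribution between 0 and 1) and measures, for several sample sizes, how far the sample mean typically lies from that population mean. The names are illustrative only.

```python
import random
import statistics

POPULATION_MEAN = 0.5  # mean of the hypothetical uniform(0, 1) population

def mean_abs_error(sample_size: int, trials: int = 1000, seed: int = 7) -> float:
    """Average absolute gap between the sample mean and the population mean."""
    rng = random.Random(seed)
    gaps = [
        abs(statistics.fmean(rng.random() for _ in range(sample_size)) - POPULATION_MEAN)
        for _ in range(trials)
    ]
    return statistics.fmean(gaps)

for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 4))
```

The typical gap shrinks roughly in proportion to 1/√n, so larger samples give means that are, on average, closer to the population mean.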
Response rate
It is essential for a probability sample to be representative of the population. A perfectly representative
sample is one that exactly represents the population from which it is taken. There are four levels of response
to questionnaires and structured interviews:
People may not respond because they refuse to participate in the research, because they are ineligible (they
do not fit the requirements) or because they are unreachable. A research report should always include the
response rate of the research. The total response rate can be calculated with the following formula:

$$\text{total response rate} = \frac{\text{total number of responses}}{\text{total number in sample} - \text{ineligible}}$$

A more common way of calculating the response rate excludes both ineligible respondents and those who were
unreachable. This is the active response rate:

$$\text{active response rate} = \frac{\text{total number of responses}}{\text{total number in sample} - (\text{ineligible} + \text{unreachable})}$$

Footnote 1: A margin of error tells you by how many percentage points your results may differ from the real
population value. For example, a 95% confidence interval with a 4 percent margin of error means that your
statistic will be within 4 percentage points of the real population value 95% of the time.
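The two response-rate calculations (a total rate that removes only ineligible cases from the denominator, and the active rate that also removes unreachable cases) can be written as two small functions. The function names and the survey figures below are hypothetical, chosen only to show the arithmetic.

```python
def total_response_rate(responses: int, sample: int, ineligible: int) -> float:
    """Total response rate = responses / (sample - ineligible)."""
    return responses / (sample - ineligible)

def active_response_rate(responses: int, sample: int,
                         ineligible: int, unreachable: int) -> float:
    """Active response rate = responses / (sample - (ineligible + unreachable))."""
    return responses / (sample - (ineligible + unreachable))

# Hypothetical survey: 500 sampled, 40 ineligible, 20 unreachable, 230 responses
print(f"{total_response_rate(230, 500, 40):.1%}")       # 50.0%  (230 / 460)
print(f"{active_response_rate(230, 500, 40, 20):.1%}")  # 52.3%  (230 / 440)
```

Because its denominator is smaller, the active response rate is never lower than the total response rate for the same survey.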
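It is important to estimate the likely response rate and increase the sample size accordingly, to ensure that
you will be able to undertake the analysis at the level of detail required. Once the estimated response rate
and the minimum sample size have been determined, the actual sample size can be calculated with the
following formula:

$$n^a = \frac{n \times 100}{re\%}$$

where $n^a$ is the actual sample size required, $n$ is the minimum sample size and $re\%$ is the estimated
response rate expressed as a percentage.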
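The actual-sample-size formula translates directly into code. In this sketch the function name is hypothetical, and rounding up (so that the expected number of responses does not fall below the minimum) is a design choice added here rather than something stated in the notes.

```python
import math

def actual_sample_size(minimum_n: int, response_rate_pct: float) -> int:
    """n_a = n * 100 / re%, rounded up so expected responses are not below n."""
    if not 0 < response_rate_pct <= 100:
        raise ValueError("response rate must be a percentage in (0, 100]")
    return math.ceil(minimum_n * 100 / response_rate_pct)

# Hypothetical case: a minimum of 439 responses and an estimated 30% response rate
print(actual_sample_size(439, 30))  # 1464
```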
One way to estimate the response rate is to analyse the response rates achieved by similar surveys that have
already been undertaken and to base your estimate on these.
Probability sampling techniques
There are five main techniques for selecting a probability sample:
• Simple random
• Systematic random
• Stratified random
• Cluster
• Multi-stage
See figure 7.3 for a guideline for selecting the appropriate probability sampling technique.
In systematic random sampling the sample is selected at regular intervals from the sampling frame. For
example, when the sampling fraction is ¼, one needs to select every fourth case. When using this technique
one needs to be sure that the list does not contain periodic patterns, since these may bias the results.
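A minimal sketch of systematic random sampling, assuming the sampling frame is held as a Python list and that the sampling interval k is the frame size divided by the required sample size (so a sampling fraction of ¼ gives k = 4). The random start within the first interval and the function name are assumptions for illustration.

```python
import random

def systematic_sample(frame: list, sample_size: int, seed=None) -> list:
    """Select every k-th case after a random start, where k = len(frame) // sample_size."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // sample_size             # sampling interval, e.g. 4 for a fraction of 1/4
    start = random.Random(seed).randrange(k)  # random starting case within the first interval
    return frame[start::k][:sample_size]

# Hypothetical sampling frame of 100 numbered cases; fraction 25/100 = 1/4
frame = list(range(1, 101))
print(systematic_sample(frame, 25, seed=42))
```

If the frame were ordered in a cycle of length 4 (for example, alternating job grades), every selected case could share the same attribute, which is exactly the periodic-pattern problem described above.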
Cluster sampling
This technique is similar to stratified random sampling, as the population must be divided into discrete
groups prior to sampling. The groups are called ‘clusters’ and can be based on any naturally occurring
grouping (for example, manufacturing firms or geographical areas). With this technique the sampling frame
consists of the list of clusters rather than the individual cases within the population. The technique has three
stages:
This technique usually leads to a sample that represents the whole population less accurately than stratified
random sampling.
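Cluster sampling can be sketched as follows, assuming each cluster (for example, a manufacturing firm) is stored together with the list of its cases. Whole clusters are chosen by simple random sampling from the list of clusters, and every case in a chosen cluster enters the sample. The data layout, firm names and function name are hypothetical.

```python
import random

def cluster_sample(clusters: dict[str, list], n_clusters: int, seed=None) -> list:
    """Randomly select whole clusters; every case in a selected cluster is included."""
    rng = random.Random(seed)
    # The sampling frame is the list of clusters, not the individual cases
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [case for name in chosen for case in clusters[name]]

# Hypothetical clusters: manufacturing firms and their employees
firms = {
    "firm-A": ["a1", "a2", "a3"],
    "firm-B": ["b1", "b2"],
    "firm-C": ["c1", "c2", "c3", "c4"],
    "firm-D": ["d1"],
}
print(cluster_sample(firms, n_clusters=2, seed=5))
```

Because cases within a cluster tend to resemble each other, a sample built from a few whole clusters usually covers the population's variety less well than a stratified sample of the same size.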
Multi-stage sampling
This is a development of cluster sampling. Just like cluster sampling, multi-stage sampling can be used for any
discrete group, including those that are not geographically based. With this technique one modifies a cluster
sample by adding at least one or more stage of sampling that also involves some form of random sampling.
The four phases of multi-stage sampling are depicted in figure 7.4 on page 279. Because this technique relies
on various different sampling frames, one needs to ensure that they are all suitable and available.
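A two-stage version of multi-stage sampling can be sketched under a hypothetical data layout in which each cluster maps to its list of cases: a random sample of clusters is drawn first, then a simple random sample of cases is drawn within each selected cluster. Each stage uses its own sampling frame. The region and household names and the function name are illustrative assumptions; real designs may add further stages.

```python
import random

def two_stage_sample(clusters: dict[str, list], n_clusters: int,
                     cases_per_cluster: int, seed=None) -> list:
    """Stage 1: randomly select clusters. Stage 2: randomly select cases within each."""
    rng = random.Random(seed)
    # Stage 1: simple random sample from the list of clusters (first sampling frame)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        cases = clusters[name]  # second sampling frame: the cases within this cluster
        # Stage 2: simple random sample within the cluster (take all if it is small)
        sample.extend(rng.sample(cases, min(cases_per_cluster, len(cases))))
    return sample

# Hypothetical frame: 5 regions with 10 households each
regions = {f"region-{r}": [f"r{r}-hh{h}" for h in range(10)] for r in range(5)}
print(two_stage_sample(regions, n_clusters=2, cases_per_cluster=3, seed=1))
```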
Extreme case sampling focuses on unusual or special cases. For example, if you were studying inner-city
violence, you could study a city with high violence and compare it to a city with low violence. Like any
sampling technique where a researcher deliberately chooses cases, extreme case sampling could result in
selection bias, undermining the results (Collier & Mahoney, 1996).
For example:
• A researcher is conducting a door-to-door survey to find attitudes towards single parents. ...
• A researcher is investigating why people don't complete their prescribed course of antibiotics and
thinks that socioeconomic class may be a reason.
• Homogeneous sampling – focuses on one specific subgroup in which all the members are very
similar. For example, people in a homogeneous sample might share the same age, location or occupation;
the shared traits are chosen because they are useful to the researcher.
• Critical case sampling – selects critical cases because they are important or can make a
dramatic point.
• Typical case sampling – provides an illustrative profile of what is typical, for readers of the
research report who are unfamiliar with the topic.
• Theoretical sampling – sample selection is dictated by the needs of the theory being developed.
Sampling therefore continues during the research as the emerging theory indicates that more participants
are needed.
Volunteer sampling
Volunteer sampling is a type of sampling where participants volunteer to take part in the research,
instead of being chosen by the researcher. There are two forms. In snowball sampling, the first participants
identify further members of the population, who in turn identify others. Self-selection sampling, on the other
hand, occurs when the researcher publicizes the research and asks cases to volunteer.
Example: a TV show host asks his viewers to visit his website and respond to an online poll.
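Snowball sampling, in which each participant refers further members of the population, can be sketched as a breadth-first walk through a referral network. The referral data, names and function below are hypothetical; in practice referrals are collected during fieldwork, and because cases are reached through personal contacts the result is a non-probability sample that may over-represent well-connected people.

```python
from collections import deque

def snowball_sample(referrals: dict[str, list[str]], seeds: list[str], target: int) -> list[str]:
    """Start from initial participants and follow their referrals until the target size is reached."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):  # each participant names further cases
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: who each participant says they can put us in touch with
referrals = {"Ann": ["Bo", "Cy"], "Bo": ["Di"], "Cy": ["Ann", "Ed"], "Di": [], "Ed": ["Fay"]}
print(snowball_sample(referrals, ["Ann"], 4))  # ['Ann', 'Bo', 'Cy', 'Di']
```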
Haphazard sampling
This is a type of sampling where sample cases are selected without an obvious relation to the research
questions.
An example of haphazard sampling would be standing on a busy corner during rush hour and interviewing
people who pass by.