Chapter 7 – Sample selection

7.1 Sampling
When conducting scientific research one should always consider the use of sampling. Collecting and analysing
data from every possible case or group member is termed a ‘census’. However, a census is not always
feasible because of restrictions of time, money and access. This is the reason why sampling is often used.

A sample should always represent the full set of cases in a way that is meaningful and which we can justify
(Becker 1998). The full set of cases from which the sample is taken is called the population. The
population does not necessarily consist of people; it could also be, for example, Chinese restaurants or
electric cars in a specific region. There are a number of reasons why sampling can be a better option than a
census:

• There is no budget to survey the entire population
• There is no time to survey the entire population
• It is not practicable to survey the entire population

Researchers such as Barnett argue that using a sample can lead to higher overall accuracy than a census. This
is because the researcher focusses on a smaller number of cases to collect data from and therefore has
more time to design and pilot the data collection methods. Moreover, collecting data from fewer cases
means that the information collected can be more detailed.

Sampling techniques
There are two types of sampling techniques (see figure 7.2 on page 261):

• Probability/representative sampling – the chance of each case being selected from the population
is known and is usually equal for all cases. Use this when you want to support a claim
statistically.
• Non-probability sampling – the chance of each case being selected from the population is not
known, and it is impossible to make statistical inferences about the characteristics of the
population.

7.2 Probability sampling


Probability sampling is often associated with survey strategies, where one needs to make inferences
from a sample about a population to answer the research question and to meet the objectives. Henry (1990)
advises against probability sampling for research that uses fewer than 50 cases, because so few cases may
not be representative of the entire population. The process of probability sampling has four stages:

• Identify a suitable sampling frame based on the research questions and objectives
• Decide on an appropriate sample size
• Choose the most suitable sampling technique and select the sample
• Check whether the sample represents the total population

The ‘sampling frame’ for a probability sample is a complete list of all the cases in the population from
which the sample is drawn. It is not possible to select a probability sample without a sampling frame.
When no suitable list exists and you still want to use a probability sampling technique, you will have to
compose your own sampling frame and ensure that it’s valid.

The way in which a researcher defines the sampling frame also has implications for the extent to which the
findings can be generalized from the sample. If a sampling frame is a list of all customers of an organization,
one can only generalize to that population. Thus, you should not generalize beyond your sampling frame. This
is a mistake many researchers make; they don’t place clear limits on the generalizability of their findings.

Suitable sample size


The larger a sample’s size, the lower the likely error in generalizing to the population. The choice of sample
size is governed by:

• The confidence one has in the data (whether you are certain that the sample is representative of
the entire population)
• The margin of error¹ one tolerates (the accuracy required for estimates made from the sample)
• The type of analyses one is going to undertake
• The size of the total population

To ensure that spurious results do not occur, the analysed data must be normally distributed. The larger
the absolute size of a sample, the closer the sampling distribution of its mean will be to the normal
distribution. This relationship is known as the ‘central limit theorem’, and it holds even if the population
from which the sample is drawn isn’t normally distributed. Statisticians have shown that a sample size of 30
or more will usually result in a sampling distribution for the mean that is very close to a normal
distribution. This is the reason why Stutely (2003) advises using a sample with a minimum of 30 cases.

The process of drawing conclusions about a population on the basis of data describing the sample is
called ‘statistical inference’. The ‘law of large numbers’ holds that large samples are more likely to
represent the population from which they are drawn than smaller samples. Moreover, their means are also
more likely to be close to the mean of the population.
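
The central limit theorem can be illustrated with a small simulation. The sketch below is not taken from the chapter: it assumes a deliberately skewed (exponential) population with mean 1, draws many samples of 30 cases each, and summarises the resulting sample means. The function name `sample_means`, the chosen population and the number of repetitions are all illustrative assumptions.

```python
import math
import random
import statistics


def sample_means(draw_case, sample_size, repetitions, seed=0):
    """Draw `repetitions` samples of `sample_size` cases; return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(draw_case(rng) for _ in range(sample_size))
        for _ in range(repetitions)
    ]


# Assumed population: exponential with mean 1 and standard deviation 1 (strongly skewed).
POPULATION_MEAN = 1.0
POPULATION_SD = 1.0
SAMPLE_SIZE = 30

means = sample_means(lambda rng: rng.expovariate(1.0), SAMPLE_SIZE, repetitions=2000)

# Standard error that theory predicts for the mean of 30 cases.
standard_error = POPULATION_SD / math.sqrt(SAMPLE_SIZE)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this would be about 95%.
share_within = sum(
    abs(m - POPULATION_MEAN) <= 1.96 * standard_error for m in means
) / len(means)

print(f"mean of sample means: {statistics.mean(means):.3f}")
print(f"sd of sample means:   {statistics.stdev(means):.3f} (theory: {standard_error:.3f})")
print(f"share within 1.96 SE: {share_within:.1%}")
```

Although the individual cases are far from normally distributed, the sample means should cluster around the population mean roughly as a normal distribution would predict.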

Response rate
It is essential for a probability sample to be representative of the population. A perfectly representative
sample is one that represents the population from which it is taken exactly. There are four levels of response
to questionnaires and structured interviews:

• Complete refusal – no questions were answered
• Break-off – less than 50% of the questions were answered
• Partial response – 50% to 80% of the questions were answered
• Complete response – all the questions were answered

People may not respond because they refuse to participate in the research, because they are
ineligible (they don’t fit the requirements) or because they are unreachable. A research report should always
include the response rate of the research. This can be calculated with the following formula:

total response rate = total number of responses / (total number in sample - ineligible)

A more common way of calculating the response rate also excludes respondents who were unreachable, in
addition to those who were ineligible. This is the active response rate:

active response rate = total number of responses / (total number in sample - (ineligible + unreachable))

¹ A margin of error tells you by how many percentage points your results may differ from the real population
value. For example, a 95% confidence interval with a 4 percent margin of error means that your statistic will
be within 4 percentage points of the real population value 95% of the time.
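
Both response-rate formulas can be computed with a few lines of code. This is a minimal sketch: the function names and the figures (300 cases in the sample, 120 responses, 20 ineligible, 40 unreachable) are hypothetical and chosen only for illustration.

```python
def total_response_rate(responses, sample_size, ineligible):
    """Total response rate = responses / (sample size - ineligible)."""
    return responses / (sample_size - ineligible)


def active_response_rate(responses, sample_size, ineligible, unreachable):
    """Active response rate = responses / (sample size - (ineligible + unreachable))."""
    return responses / (sample_size - (ineligible + unreachable))


# Hypothetical figures, not taken from any real survey.
total = total_response_rate(responses=120, sample_size=300, ineligible=20)
active = active_response_rate(
    responses=120, sample_size=300, ineligible=20, unreachable=40
)

print(f"total response rate:  {total:.1%}")
print(f"active response rate: {active:.1%}")
```

Because the active response rate removes more cases from the denominator, it is never lower than the total response rate for the same survey.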

It is important to estimate the likely response rate and increase the sample size accordingly to ensure that
you will be able to undertake the analysis at the level of detail required. Once the estimated response rate
and the minimum sample size are determined one could calculate the actual sample size with the
following formula:

na = n × 100 / re%

na = actual sample size required
n = minimum sample size
re% = estimated response rate expressed as a percentage

One way to estimate the response rate is to analyse the response rates achieved for similar surveys that have
already been undertaken and to base your estimate on these.
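
As a worked illustration of the formula for the actual sample size, the sketch below computes na from a minimum sample size and an estimated response rate. The function name and the example figures are hypothetical, and rounding up to a whole case is an assumption added here rather than something the formula itself states.

```python
import math


def actual_sample_size(minimum_sample_size, estimated_response_rate_pct):
    """Return na = n * 100 / re%, rounded up to a whole number of cases."""
    if not 0 < estimated_response_rate_pct <= 100:
        raise ValueError("estimated response rate must be in the range (0, 100]")
    return math.ceil(minimum_sample_size * 100 / estimated_response_rate_pct)


# Hypothetical example: 200 usable responses needed, 40% response rate expected.
required = actual_sample_size(minimum_sample_size=200, estimated_response_rate_pct=40)
print(required)
```

With a minimum sample size of 200 and an expected response rate of 40%, 500 cases would have to be approached.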

Sampling techniques
There are five main techniques for selecting a probability sample:

• Simple random
• Systematic random
• Stratified random
• Cluster
• Multi-stage

See figure 7.3 for a guideline for selecting the appropriate probability sampling technique.

Simple random sampling


This involves selecting the sample at random from the sampling frame using a computer or random number
tables. To do this, number each of the cases with a unique number (starting with 0) and select cases
using random numbers until your actual sample size is reached. Selection is done without replacement, so that
no case can be selected twice. This type of sampling is best used when one has an accurate and easily
accessible sampling frame that lists the entire population. The sample that is eventually selected can be said
to be representative of the entire population, because the cases were chosen without bias.
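
The procedure can be sketched with Python's standard `random` module, which selects without replacement. The sampling frame of 50 customer names and the function name are hypothetical; the `seed` parameter is an added convenience so that a selection can be reproduced.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Select `sample_size` cases at random, without replacement."""
    rng = random.Random(seed)
    # Number every case with a unique number, starting with 0.
    case_numbers = range(len(sampling_frame))
    # random.sample never returns the same number twice.
    chosen = rng.sample(case_numbers, sample_size)
    return [sampling_frame[number] for number in sorted(chosen)]


# Hypothetical sampling frame of 50 customers.
frame = [f"customer_{number:03d}" for number in range(50)]
sample = simple_random_sample(frame, sample_size=10, seed=42)
print(sample)
```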

Systematic random sampling


This involves the researcher selecting the sample at regular intervals from the sampling frame. This is done
by numbering each of the cases in the sampling frame (starting with 0), selecting the first case using a
random number, calculating the sampling fraction and finally selecting subsequent cases systematically, using
the sampling fraction to determine the frequency of selection. The sampling fraction is the proportion of the
entire population that one needs to select and can be calculated using the following formula:

sampling fraction = actual sample size / total population

When the sampling fraction is ¼, one needs to select every fourth case from the sampling frame. When using
this technique one needs to be sure that the list does not contain periodic patterns, since these may bias the
results.
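
A minimal sketch of these steps is shown below. It assumes that the sampling fraction corresponds to a whole-number interval (for example ¼ gives every fourth case); with other fractions the rounding used here may return slightly fewer cases than requested. The function name and the frame of 100 numbered cases are hypothetical.

```python
import random


def systematic_sample(sampling_frame, sample_size, seed=None):
    """Select cases at a regular interval after a random starting case."""
    rng = random.Random(seed)
    sampling_fraction = sample_size / len(sampling_frame)
    # A fraction of 1/4 means every 4th case is selected.
    interval = round(1 / sampling_fraction)
    # The first case is chosen at random within the first interval.
    start = rng.randrange(interval)
    return sampling_frame[start::interval][:sample_size]


# Hypothetical frame: 100 cases numbered from 0, sampling fraction 25/100 = 1/4.
frame = list(range(100))
sample = systematic_sample(frame, sample_size=25, seed=1)
print(sample)
```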

Stratified random sampling


This is a modification of random sampling in which you split the population into two or more relevant and
significant strata (layers) based on one or more attributes. In other words, the sampling frame is divided into
various subsets, after which a random sample is drawn from each of the strata. Dividing the population
into a series of relevant strata makes the sample more representative, because one can ensure that each of
the strata is represented proportionally within the sample.

To do this, a researcher chooses the stratification variable(s) and divides the sampling frame into the
discrete strata. The researcher then numbers each of the cases within each stratum with a unique number
(starting with 0) and selects the sample using either simple random or systematic random sampling.
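
The sketch below shows one way to carry this out, using simple random sampling within each stratum and the same sampling fraction for every stratum so that the strata are represented proportionally. The employee frame, the `stratum_of` callback and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(sampling_frame, stratum_of, sampling_fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    # Divide the sampling frame into discrete strata.
    strata = defaultdict(list)
    for case in sampling_frame:
        strata[stratum_of(case)].append(case)
    sample = []
    for name in sorted(strata):
        cases = strata[name]
        # Proportional allocation: each stratum contributes the same fraction.
        stratum_size = round(len(cases) * sampling_fraction)
        sample.extend(rng.sample(cases, stratum_size))
    return sample


# Hypothetical frame: 60 staff members and 40 managers, stratified by role.
frame = [(f"employee_{n:03d}", "staff" if n < 60 else "manager") for n in range(100)]
sample = stratified_sample(
    frame, stratum_of=lambda case: case[1], sampling_fraction=0.1, seed=7
)
print(sample)
```

A 10% fraction yields 6 staff members and 4 managers, mirroring the 60/40 split in the frame.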

Cluster sampling
This technique is similar to stratified random sampling in that the population must be divided into discrete
groups prior to sampling. The groups are called ‘clusters’ and can be based on any naturally occurring
grouping (for example by manufacturing firm or geographical area). With this technique the sampling frame
consists of the list of clusters instead of the individual cases within the population. The technique has three
stages:

• Choose the cluster grouping for the sampling frame
• Number each of the clusters with a unique number (beginning with 0)
• Select the sample of clusters using some form of random sampling

This technique leads to a sample that represents the whole population less accurately than stratified random
sampling.
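
The three stages can be sketched as follows. The firms and their employees are hypothetical, and the sketch assumes that data are then collected from every case in each selected cluster, which the summary above does not state explicitly.

```python
import random


def cluster_sample(clusters, number_of_clusters, seed=None):
    """Randomly select whole clusters from a mapping of cluster name to cases."""
    rng = random.Random(seed)
    # The sampling frame is the list of clusters, not the individual cases.
    cluster_frame = sorted(clusters)
    chosen = rng.sample(cluster_frame, number_of_clusters)
    # Assumption: every case in a selected cluster is included.
    return {name: clusters[name] for name in chosen}


# Hypothetical clusters: employees grouped by manufacturing firm.
clusters = {
    "firm_A": ["a1", "a2", "a3"],
    "firm_B": ["b1", "b2"],
    "firm_C": ["c1", "c2", "c3", "c4"],
    "firm_D": ["d1", "d2", "d3"],
}
selected = cluster_sample(clusters, number_of_clusters=2, seed=3)
print(selected)
```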

Multi-stage sampling
This is a development of cluster sampling. Just like cluster sampling, multi-stage sampling can be used for any
discrete group, including those that are not geographically based. With this technique one modifies a cluster
sample by adding at least one more stage of sampling that also involves some form of random sampling.
The four phases of multi-stage sampling are depicted in figure 7.4 on page 279. Because this technique relies
on several different sampling frames, one needs to ensure that they are all suitable and available.
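
A two-stage version can be sketched as a cluster sample followed by a simple random sample within each selected cluster. This is only an illustration under assumed names and data; it does not reproduce the phases shown in figure 7.4.

```python
import random


def multi_stage_sample(clusters, number_of_clusters, cases_per_cluster, seed=None):
    """Stage 1: randomly select clusters. Stage 2: randomly select cases within them."""
    rng = random.Random(seed)
    # Stage 1 uses the list of clusters as its sampling frame.
    chosen = rng.sample(sorted(clusters), number_of_clusters)
    sample = {}
    for name in chosen:
        # Stage 2 uses the cases of one selected cluster as its sampling frame.
        cases = clusters[name]
        sample[name] = rng.sample(cases, min(cases_per_cluster, len(cases)))
    return sample


# Hypothetical clusters: 20 households in each of five regions.
clusters = {
    f"region_{r}": [f"household_{r}_{h}" for h in range(20)] for r in range(5)
}
sample = multi_stage_sample(clusters, number_of_clusters=2, cases_per_cluster=3, seed=11)
print(sample)
```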

7.3 Non-Probability Sampling


Non-probability sampling provides alternative techniques for selecting samples. There are no rules for
deciding the sample size, but it is important to choose a size that represents the population adequately.

Quota sampling
Quota sampling is a non-random sampling approach that is often used for structured interviews. It is a
type of stratified sample in which the selection of cases within strata is entirely non-random. To select a
quota sample one needs to divide the population into groups, calculate the quota for each group (based on
relevant data), give each interviewer an ‘assignment’ (which states the number of cases in each quota from
which to collect data) and combine the data collected by the interviewers to provide the full sample.

Purposive sampling/judgemental sampling


This type of sampling requires the researcher to use judgement to select the cases that are most suitable for
answering the research questions and meeting the objectives. There are a number of strategies one can
adopt when using purposive sampling:

• Extreme case/deviant sampling – focuses on unusual or special cases that will provide answers to the
research questions and enable the researcher to meet the objectives

For example, if you were studying inner-city violence, you could study a city with high violence and
compare it to a city with low violence. Like any sampling technique where a researcher deliberately
chooses cases, extreme case sampling could result in selection bias, undermining the results (Collier &
Mahoney, 1996).

• Heterogeneous/maximum variation sampling – uses the judgement of the researcher to choose
participants with sufficiently diverse characteristics to generate the maximum variation possible in
the collected data.

For example:

• A researcher is conducting a door to door survey to find attitudes towards single parents. ...
• A researcher is investigating why people don't complete their prescribed course of antibiotics and
thinks that socioeconomic class may be a reason.

• Homogeneous sampling – focuses on one specific subgroup in which all the members are very
similar (for example in age or occupation)

For example, people in a homogeneous sample might share the same age, location or employment.
The selected traits are ones that are useful to a researcher.


• Critical case sampling – selects critical cases because they are either important or can make a
dramatic point.
• Typical case sampling – selects typical cases, which enable the researcher to illustrate what is typical
to those who will read the research report and are unfamiliar with the topic.
• Theoretical sampling – sample selection is dictated by the needs of the theory being developed.
Thus the sampling occurs during the research as more participants are needed.

Volunteer sampling
Snowball sampling is a type of sampling in which participants are volunteered for the research, typically by
being identified by earlier participants, instead of being chosen by the researcher. Self-selection sampling,
on the other hand, occurs when the researcher asks people to volunteer for the research and collects data
from those who respond.

Example of self-selection sampling: a TV show host asks his viewers to visit his website and respond to an
online poll.

Haphazard sampling
This is a type of sampling where sample cases are selected without an obvious relation to the research
questions.

An example of haphazard sampling would be standing on a busy corner during rush hour and interviewing
people who pass by.
