SAMPLING
First: Introduction to Sampling Surveys
There are two main approaches to data collection: complete enumeration (census) and
sampling.
Complete enumeration means collecting information about all individual elements in the
studied population. Example: population censuses.
Sampling, on the other hand, means gathering information about only a fraction or a subset
of the studied population. Example: sampling surveys.
Advantages of Sampling
1) Low cost
With each extra element added to the process, the cost of data collection and data handling
(including editing, coding, data entry, description, and data analysis) increases monotonically.
2) Higher accuracy
Since research based on sampling requires fewer employees and less time for data collection and
data handling, it is easier to keep stricter control over the two processes.
3) Wider applicability
The low cost of sampling research facilitates performing a large number of studies that would
otherwise have been precluded by the high cost of complete enumeration.
Probability and Non-Probability Sampling
Probability sampling is defined as any selection technique that gives each
population element a determined (known or could be estimated) probability of
being selected into the sample. These probabilities of selection are not necessarily
equal.
Unequal probabilities of selection imply biased probability sampling. Some
population elements may be given a zero probability of selection, in which case
we have non-coverage probability sampling. Neither biased nor non-coverage
probability sampling is representative of the sampled population.
All probability sampling techniques that assign equal probabilities of selection to
all elements of the sampled population are said to be representative sampling
techniques.
The main advantage of probability sampling is that it follows the laws of probability.
Therefore, generalization from the results of a sample selected by a probability
sampling technique to the population can be achieved through statistical
inference.
All sample selection techniques that do not satisfy the necessary condition of
probability sampling are considered non-probability sampling techniques. There
are several circumstances when it is useful to use non-probability sampling.
Examples of Non-Probability Sampling:
1) Purposive sampling
It is any sample selection technique that depends on experts’ opinion in choosing
a typical sample.
It may be preferred to probability sampling techniques in some circumstances,
mainly when the desired sample size is small.
2) Quota Sampling
Experts do not specify which elements are to be chosen, but rather their main
characteristics, such that the sample resembles the population.
Quota sampling is frequently used in marketing studies and in opinion polls
where quick and inexpensive results are required. However, quota sampling is
plagued by personal interference in the selection process, which makes its results
non-generalizable. The resulting sample can be highly non-representative of the
population if the study topic depends on characteristics other than those
included in the selection constraints.
3) Convenience Sampling
The main selection criterion in this sampling technique is how easy it is to reach the
selected population elements.
This technique is not suitable for descriptive studies because it yields samples that are not
representative of any specific population, so their results cannot be generalized.
The usefulness of the technique depends on the validity of the underlying assumption that all
population elements are similar with respect to the basic structure of variable relationships.
Convenience sampling is only appropriate in exploratory studies.
4) Haphazard Sampling
This type includes several selection techniques that do not apply any objective criteria.
An example is selecting people who happen to be present in a specific place at a specific
time, without any objective selection of the place and time (e.g. around-the-corner surveys).
Haphazard sampling is the worst and most biased type of sampling. Results from a
haphazard sample are non-generalizable. They cannot even be taken as indicative, like the
results of purposive or convenience samples, because haphazard samples are usually
atypical, i.e. very different from the population. It is known, for instance, that media and
internet polls tend to represent the minority-held extreme opinions rather than the majority-
held moderate ones because those with strong extreme opinions are keener to participate in
such polls.
5) Snowball sampling
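It is mainly used in studies of populations that are hard to reach or enter (such as criminals and illegal
migrants). The idea is to start with a small number of individuals belonging to that population,
who are then asked to refer the researcher to other members, and so on until the desired sample size is reached.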
The Population and the Frame
The population is the entire set of individual elements to which the researcher wants to generalize
(or extrapolate) the findings of the research.
In the first stage, the researcher should determine what is called the target population, which is
the population from which the sample will be selected. The target population is usually smaller
than the population to which the researcher wants to generalize.
The most important factors to be considered in the process of determining the target population
are generalizability to the original population and the feasibility of sampling and data collection
from the chosen target population.
In the second stage, the researcher has to get a frame for the target population.
The frame is a mapping of the target population that can be used for sample selection.
The frame contains a list of sampling units, which are usually given numerical codes such that a
probability sample could be selected using random number generators. The key is to have a clear
mapping between the population elements and the sampling units in the frame.
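The use of numerical codes and a random number generator can be sketched as follows. This is only a minimal illustration, assuming a simple random sample without replacement; the frame contents, the function name `select_simple_random_sample`, and the `seed` parameter are hypothetical and not taken from the text.

```python
import random

def select_simple_random_sample(frame, n, seed=None):
    """Select n distinct sampling units from the frame, each with equal probability."""
    rng = random.Random(seed)        # a fixed seed makes the draw reproducible
    codes = sorted(frame)            # numerical codes of the sampling units
    chosen = rng.sample(codes, n)    # draws n codes without replacement
    return {code: frame[code] for code in sorted(chosen)}

# Hypothetical frame: numerical code -> sampling unit (here, a name list).
frame = {1: "Amal", 2: "Bashir", 3: "Carla", 4: "Dina", 5: "Emil", 6: "Farah"}

sample = select_simple_random_sample(frame, n=3, seed=42)
print(sample)
```

Because selection works on the codes only, the same function applies whether the sampling units are names, households, or delimited map areas, as long as each unit has a code.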
The most straightforward examples of frames are name lists, but frames also exist in other forms.
The frame could be a map such that specific delimited locations constitute
the sampling units.
There are two criteria for choosing the frame. The first is the correspondence between the
sampling units in the frame and the elements of the target population. The second criterion is the
availability of the proposed frame.
Unconventional frames usually depend on specifying times or places (or both) where the elements
of the target population can be located.
Characteristics of Perfect Frames
1- Coverage
Every element in the population has to be mapped to at least one sampling unit in
the frame.
2- Non-repetition
Every element in the target population should correspond to a single unit in the
frame.
3- No empty or foreign units
Every sampling unit in the frame should correspond to at least one element in the
target population.
4- Individuality
Every sampling unit in the frame should correspond to only one element in the
target population.
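The four characteristics above can be checked mechanically when the frame is available as data. The sketch below assumes a hypothetical representation in which each sampling unit code maps to the list of elements it corresponds to; the function name `check_frame` and the example data are illustrative only.

```python
from collections import Counter

def check_frame(frame, population):
    """Report which of the four characteristics of a perfect frame hold.

    frame: dict mapping each sampling unit code to a list of elements.
    population: set of target population elements.
    """
    # How many sampling units each element appears in.
    listed = Counter(e for elements in frame.values() for e in elements)
    return {
        # Every element is mapped to at least one sampling unit.
        "coverage": all(listed[e] >= 1 for e in population),
        # No element is mapped to more than one sampling unit.
        "non_repetition": all(listed[e] <= 1 for e in population),
        # Every unit contains at least one target population element.
        "no_empty_or_foreign_units": all(
            any(e in population for e in elements) for elements in frame.values()
        ),
        # No unit contains more than one target population element.
        "individuality": all(
            sum(e in population for e in elements) <= 1 for elements in frame.values()
        ),
    }

population = {"a", "b", "c", "d"}
perfect = {1: ["a"], 2: ["b"], 3: ["c"], 4: ["d"]}
# "d" is missing, "a" is listed twice, unit 2 holds two elements, unit 4 is empty.
flawed = {1: ["a"], 2: ["b", "c"], 3: ["a"], 4: []}

print(check_frame(perfect, population))
print(check_frame(flawed, population))
```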
Sampling Distributions
Let us assume that the target population contains N elements and that we are interested
in estimating a particular measure that summarizes the values of a specific variable
among the elements of the target population. Let V denote the value of the measure we
want to estimate, which we will get if we study the target population using complete
enumeration. Since we are not going to do a complete enumeration, the population
value V is unknown.
If we select a sample of size n from the target population using a probability sampling
approach and estimate the measure under study, we will get what we will call the
sample value, denoted by v. Since the selected sample contains only a subset of the
elements of the target population, the sample value is typically different from the
population value. The difference (e = v – V) is called the sampling error. The sampling
error is hence defined as the error in estimating the measure under study that is
attributable to the fact that sampling rather than complete enumeration is used in data
collection. Since the population value V is unknown, the sampling error e is also
unknown.
Many different samples of size n could be selected from the same target population.
Typically, the estimate v will differ among these potential samples. Since the sampling
approach is probabilistic, the sample value v is a random variable that will vary among
potential samples according to a specific probability distribution.
We can construct a table containing each possible value for the sample value v along
with the probability of getting that particular value, i.e. the probability distribution of
the sample value, which is called the sampling distribution.
It should be noted that the sampling distribution depends on the properties
of the target population and also on the sampling design (i.e. the sample size
and the technique of selecting the sampled elements from the target
population).
Also note that the sampling distribution of the sample value v can easily be
translated to the sampling distribution of the sampling error e, since the
difference between the two random variables v and e is a constant V.
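For a very small population, the table described above can be built by listing every possible sample. The sketch below is an illustration under stated assumptions: the measure under study is taken to be the mean, the design is simple random sampling without replacement, and the population values and the name `sampling_distribution` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sampling_distribution(population, n):
    """Map each possible sample mean v to its probability.

    Assumes simple random sampling without replacement, so every
    subset of size n is equally likely.
    """
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}

population = [2, 4, 6, 8]                        # hypothetical values, N = 4
V = Fraction(sum(population), len(population))   # population value (the mean)

dist_v = sampling_distribution(population, n=2)
# Shifting every sample value by the constant V gives the distribution of e = v - V.
dist_e = {v - V: p for v, p in dist_v.items()}

for v, p in dist_v.items():
    print(f"v = {v}, e = {v - V}, probability = {p}")
```

In this example the probabilities are the same in both tables; only the values are shifted by V. Changing n or the selection technique changes the table, which reflects the dependence on the sampling design noted above.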
Statistical inference techniques can be employed to estimate the
characteristics of the sampling distribution using one observed
representative sample. If the sample is large enough, many sample values
(estimators) are approximately normally distributed.
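The approximate normality of sample values in large samples can be explored by simulation. The sketch below is illustrative only: it assumes the estimator is the sample mean, uses a hypothetical skewed population generated from an exponential distribution, and checks what share of simulated sample means falls within 1.96 standard deviations of their centre (about 95% would be expected under a normal distribution).

```python
import random
import statistics

def simulate_sample_means(population, n, repetitions, seed=0):
    """Draw repeated simple random samples and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]

# Hypothetical, strongly skewed population of 10,000 values.
rng = random.Random(1)
population = [rng.expovariate(1.0) for _ in range(10_000)]

means = simulate_sample_means(population, n=50, repetitions=2_000)
center = statistics.fmean(means)
spread = statistics.pstdev(means)

# Share of sample means within 1.96 standard deviations of the centre.
share = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)
print(f"share within 1.96 standard deviations: {share:.3f}")
```

Repeating the simulation with a much smaller n (for example n=3) would be expected to give a visibly skewed set of sample means, which is why the statement is limited to large samples.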
Sources of Errors in Surveys
Survey errors can be classified into two main groups: non-measurement errors and
measurement errors.
Non-measurement errors mean that information is not available about some
population elements.
Measurement errors mean that some of the information available is different from
the true values of the variable under study.
Whether the true values for some elements are unavailable because of a non-measurement
error or a measurement error, the value computed for the measure under study will be
different from its true value V*. If the type of error results in a tendency for the
calculated value to be higher (or lower) than the true value (i.e. to be biased), it is said
to be a systematic error. Otherwise, it is said to be a random error.
Non-Measurement Errors:
There are three sources of non-measurement errors.
1) Sampling
When the sampling approach is used instead of complete enumeration, the results are
always subject to sampling errors. The main advantage of the sampling error in the case
of representative sampling techniques is that it is random.
2) Non-coverage
This error happens when a part of the target population is excluded from the studied
population from which data are collected (in the case of complete enumeration) or from
the sampling frame. Non-coverage typically leads to biased results.
3) Non-response
This means the unavailability of information about some elements covered by the
study, either because efforts to collect this information failed or because the collected
information was lost or deemed unusable. Refusal is the most prevalent cause of
non-response, and it usually introduces bias in the results. This is because refusal is
typically related to the variable under study.
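The contrast between a random sampling error and a systematic non-coverage error can be illustrated by averaging over all possible samples. The sketch below is a toy example under stated assumptions: the measure is the mean, samples are simple random samples of size 2, and the population values and the name `expected_sample_mean` are hypothetical.

```python
from itertools import combinations
from statistics import mean

def expected_sample_mean(units, n):
    """Average the sample mean over every equally likely sample of size n."""
    return mean(mean(s) for s in combinations(units, n))

target_population = [10, 12, 14, 30, 34]   # hypothetical values
frame_units = [10, 12, 14]                 # frame that omits the two largest elements

V = mean(target_population)

# Complete frame: individual samples miss V, but the errors average out to zero.
full = expected_sample_mean(target_population, 2)

# Incomplete frame: sample values fall below V on average (a systematic error).
partial = expected_sample_mean(frame_units, 2)

print(f"population value V = {V}")
print(f"expected sample value, complete frame = {full}")
print(f"expected sample value, non-coverage frame = {partial}")
```

Here the excluded elements differ from the covered ones on the variable under study, which is what turns non-coverage into bias; the same mechanism applies when non-respondents differ from respondents.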
Measurement Errors
There are many sources of measurement errors (also called response errors).
1) Interviewers
Unless effectively trained and monitored, interviewers can introduce a multitude
of response errors. This might happen due to negligence in asking the survey
questions or in writing down the answers they get. At worst, some
interviewers could fabricate the answers of whole questionnaires or parts of them.
2) Respondents
The respondent may give untrue answers due to being ignorant or forgetful of the
true answer, or because he or she misunderstood the question. The respondent
may also give an untrue answer deliberately to avoid a real or imagined harm, or
to gain a real or imagined benefit.
3) Questionnaires
Complicated, unclear and suggestive questions induce respondents to give untrue
answers.
4) Data handling
In addition to the measurement errors occurring during the interview or questionnaire
filling process, errors can occur while editing the questionnaire. Extra errors can be
induced during coding and data entry processes.
The sampling error is the only type of error that is unique to sampling surveys, while
the other types of errors can affect both complete enumerations and sampling surveys. In
addition, except for non-coverage errors, all other types of errors are more likely to
affect complete enumerations, which are more complicated and, hence, more
difficult to monitor efficiently.
Criteria for Good Sampling Designs
1) Goal orientation
Focusing on other criteria for good designs may distract the researcher from
focusing on the most important criterion, which is satisfying the goals of the
planned study.
2) Feasibility
This essential criterion is especially relevant when determining the target
population, preparing the sampling frame and choosing the sampling technique
compatible with the available frame.
3) Generalizability
This criterion is the main reason why probability sampling is preferred to
non-probability sampling. However, even with probability sampling, further conditions
have to be met to enable generalizability.
4) Efficiency
Efficiency means achieving a specified level of precision at the least cost;
or, alternatively, achieving the highest precision within a specified cost.