
ED223: Sampling Techniques and Design of Experiments

Module-I

© Dr. Tina Dutta


Concepts of population, sample, survey and census

➢ Population
➢ Sample
➢ Survey
➢ Sample Surveys
➢ Census/Complete Enumeration
Sample Surveys

➢ “A sample survey is a method for collecting data from or about the
members of a population so that inferences about the entire population
can be obtained from a subset, or sample, of the population members”
(R. L. Williams, in Encyclopedia of Health Economics, 2014).
➢ A properly conducted sample survey supports scientifically valid
inference from the sample to the population.
Advantages of Sample Surveys over Complete Enumeration
• Reduced Cost
• Greater Speed
• Greater Scope
• Greater Accuracy
• The only option when studying a phenomenon requires destroying the items (destructive testing)
Disadvantages of Sample Surveys Compared with Complete Enumeration
• Subpopulation estimates may not be reliable
• Small geographic areas may not be represented properly
• The estimates may contain sampling error as the data are obtained and
processed from a part of the population
Census in India
• Started in 1872.
• Decennial census continued from 1881 till 2011.
• What about Census 2021? Not held yet!
Census-2011 Schedule, India
Major Large-Scale Sample Surveys in India
• National Sample Surveys (now PLFS: Periodic Labour Force Survey)
[first NSS conducted in 1950-51]
• National Family Health Survey
Major Steps in Sample Surveys
• Objectives of the survey (descriptive or analytic or both)
• Population to be sampled (target population to be specified by units, place
and time)
• Survey variables (for which data to be collected)
• Degree of precision desired (the amount of error that can be tolerated in
estimates)
• Methods of measurement
• The Frame
• Selection of the sample (Sampling design)
• The pretest
• Organization of fieldwork
• Summary and analysis of the data
Probability Sampling
• Probability sampling is the mechanism through which inferences are
extended from the sample to the population.
• A probability sampling plan associates a nonzero probability of selection
with each and every member of the survey population such that the selection
probability can be determined for every member of the sample.
• A random process is used to select the sample so that the desired
probabilities of selection are achieved.
Common Probability Sampling Methods
• Simple Random Sampling
• Stratified random sampling
• Systematic sampling
• Cluster sampling
• Sampling with probability proportional to size (PPS)
• Multistage sampling
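Three of the methods listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the frame of 100 unit labels, and the "low"/"high" strata are hypothetical, and the systematic sampler assumes the frame size is a multiple of the sample size.

```python
import random
from collections import defaultdict

def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit has inclusion probability n/N."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start in the first interval, then take every k-th unit.

    Assumes len(frame) is a multiple of n, so k = N/n is an integer.
    """
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Draw a simple random sample with the same sampling fraction in each stratum."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, round(len(units) * fraction)))
    return sample

rng = random.Random(42)          # fixed seed so the run is reproducible
frame = list(range(1, 101))      # hypothetical sampling frame: unit labels 1..100

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda u: "low" if u <= 60 else "high", 0.1, rng)
```

In each case the selection probability of every unit is known in advance (here 10/100), which is what distinguishes these designs from the non-probability methods.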
Non-Probability Sampling Methods
• Quota sampling
• Purposive sampling (judgmental sampling)
• Snowball sampling
• Convenience sampling
• Volunteer sampling (self-selection)
Sampling and Non-Sampling Errors
• Data from sample surveys are subject to two types of errors: sampling and non-sampling
errors.
• Sampling errors result from the fact that we only have information from a portion of the
population of interest rather than from the whole population.
• Non-sampling errors include errors from all other sources, such as:
• the sampling frame
• the questionnaire
• the interviewer
• and the respondent.
• Data from both censuses and sample surveys may suffer from non-sampling errors, while only sample survey
data are subject to sampling error.
• Sampling errors are controlled by choice of the survey design.
• The size of the sampling error can be estimated using the survey data
themselves.
• Sampling errors typically contribute to the variance of an estimate from
sample survey data (and do not exist for census data), while non-sampling
errors can contribute to both variance and bias of an estimate.
• In collecting sample survey or census data we try to control, reduce, or
avoid non-sampling errors as much as possible because their size and
effect on estimates are very hard to assess.
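To illustrate the point that the size of the sampling error can be estimated from the survey data themselves, here is a minimal sketch that assumes simple random sampling without replacement and uses the usual standard error with a finite population correction. The data values and the population size of 200 are hypothetical.

```python
import math
import statistics

def srs_mean_and_se(sample, population_size):
    """Estimate the population mean and its standard error from an SRS.

    Assumes simple random sampling without replacement from a finite
    population of known size N.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)      # sample variance, divisor n - 1
    fpc = 1 - n / population_size         # finite population correction
    se = math.sqrt(fpc * s2 / n)
    return mean, se

sample = [12, 15, 11, 18, 14, 16, 13, 17]     # hypothetical survey values
mean, se = srs_mean_and_se(sample, population_size=200)
print(f"estimate = {mean:.2f}, standard error = {se:.3f}")
```

No comparable calculation exists for most non-sampling errors, which is why they are harder to assess.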
Major types of Non-Sampling Errors
1. Frame Error (erroneous inclusions, erroneous exclusions, and multiple inclusions)
• Some units may have a higher probability of being included in the sample than most units.
• These errors may result in sample estimates that are biased for the population quantity being
estimated.
• For example, if wealthier households are more likely to have multiple phone lines, then
those households are more likely to be included in the sample (if sampling frame is the
telephone directory).
• This could lead to biased (here overestimated) estimates of, say, amount spent on
entertainment.
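The telephone-directory example can be made concrete with a small calculation. The five households and their figures below are invented purely for illustration; the sketch assumes one phone line is drawn at random from the frame, so a household's chance of selection is proportional to its number of lines.

```python
# Hypothetical households: (number of phone lines, monthly entertainment spend)
households = [(1, 100), (1, 120), (1, 80), (2, 300), (3, 400)]

# True population mean: every household counts once.
true_mean = sum(spend for _, spend in households) / len(households)

# Drawing one line at random from the directory selects a household with
# probability proportional to its number of lines, so the expected value of
# the reported spend is a line-weighted mean.
total_lines = sum(lines for lines, _ in households)
expected_from_line_frame = (
    sum(lines * spend for lines, spend in households) / total_lines
)

print(true_mean)                 # 200.0
print(expected_from_line_frame)  # 262.5
```

Because the multi-line households also spend more in this toy population, the frame-based expectation (262.5) overstates the true mean (200).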
2. Response Error: occurs when a respondent intentionally or unintentionally
gives an incorrect answer to a survey question.
• Carefully designed and worded questionnaires and, for in-person or
telephone surveys, well-trained interviewers are the best tools for avoiding
response errors.
• Questionnaires must be pilot tested to ensure that respondents understand
questions and that all possible responses are allowed for on closed-form
questions.
3. Nonresponse Error (missing data or nonresponse)
• There are two types of nonresponse in surveys and censuses: unit
nonresponse and item nonresponse.
• If no survey information is collected from a sampled unit, this is termed
unit nonresponse.
• If a sampled unit responds to some, but not all, survey questions, this is
termed item nonresponse.
• There are different methods for dealing with unit and item nonresponse.
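The distinction between unit and item nonresponse can be sketched as a simple classification of returned records. The question names and records are hypothetical, and a missing answer is assumed to be stored as `None` or to be absent from the record.

```python
def classify_response(record, questions):
    """Label a sampled unit's record as complete, item or unit nonresponse."""
    answered = [q for q in questions if record.get(q) is not None]
    if not answered:
        return "unit nonresponse"      # no survey information collected at all
    if len(answered) < len(questions):
        return "item nonresponse"      # some, but not all, questions answered
    return "complete"

questions = ["age", "income", "household_size"]   # hypothetical survey items
records = [
    {"age": 34, "income": 52000, "household_size": 4},
    {"age": 51, "income": None, "household_size": 2},
    {},
]
labels = [classify_response(r, questions) for r in records]
print(labels)  # ['complete', 'item nonresponse', 'unit nonresponse']
```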
• There are other sources of non-sampling error that can affect survey
data.
• Interviewer bias, survey mode effects, coding mistakes, and data-entry
mistakes are all sources of error that should be guarded against when
conducting surveys.
Unbiased estimator of a population parameter

• The word estimator denotes the rule by which an estimate of some population
characteristic µ is calculated from the sample results.
• The word estimate is applied to the value obtained from a specific sample.
• An estimator $\hat{\mu}$ of µ given by a sampling plan is called unbiased if the mean
value of $\hat{\mu}$, taken over all possible samples provided by the plan, is equal to µ.
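The definition can be checked directly on a small population by listing every possible sample. The sketch below uses a hypothetical population of five values and assumes simple random sampling without replacement of size 2, so that all 10 possible samples are equally likely; it then averages the sample mean over all of them.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 9, 14]     # hypothetical population values
n = 2                             # sample size

mu = Fraction(sum(population), len(population))   # population mean

# Under simple random sampling without replacement, every subset of size n
# is an equally likely sample.
samples = list(combinations(population, n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Mean of the estimator over all possible samples.
mean_of_estimates = sum(sample_means) / len(sample_means)

print(mu, mean_of_estimates)  # 7 7
```

Individual estimates range from 3 to 11.5, yet their average over all samples equals µ = 7, which is exactly what unbiasedness requires.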
CENTRAL LIMIT THEOREM (CLT)
• The central limit theorem is one of the most important theorems in statistics
because of its applications in hypothesis testing.
• Let S1, S2, …, Sk be independent random samples, each of size n, drawn from
a population with mean µ and standard deviation σ.
• Let $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ be the sample means of the samples S1, S2, …, Sk.
• According to the CLT, for large n the distribution of $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ is
approximately normal with mean µ and standard deviation $\sigma/\sqrt{n}$.
• That is, the sampling distribution of the mean will approximately follow a normal
distribution with mean µ (same as the mean of the population) and standard
deviation $\sigma/\sqrt{n}$.
• In simple terms, the central limit theorem states that for a large sample drawn
from a population with mean µ and standard deviation σ, the sampling
distribution of the mean, $\bar{X}$, follows an approximate normal distribution with
mean µ and standard deviation (standard error) $\sigma/\sqrt{n}$, irrespective of the
distribution of the population (Thomas, 1984).
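A small simulation can make the theorem tangible. The sketch below is only illustrative: it assumes an exponential population with rate 1 (so µ = 1 and σ = 1, and the population is strongly skewed), draws k = 4000 samples of size n = 50, and compares the spread of the sample means with $\sigma/\sqrt{n}$. The seed and sizes are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)    # fixed seed for reproducibility
n, k = 50, 4000           # sample size and number of samples (arbitrary)

# Exponential population with rate 1: mean 1, standard deviation 1, skewed.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(k)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / n ** 0.5     # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

A histogram of `sample_means` would look close to a bell curve even though the population itself is far from normal.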
Alternative version of CLT
Implications of CLT
Central Limit Theorem for Proportions

If a characteristic (for example, smartphone use) has proportion p in a
population, then the sampling distribution of the sample proportion
$\hat{p}$ (calculated from several samples of size n) will, for large n,
approximately follow a normal distribution with mean p and standard deviation

$$\sqrt{\frac{p(1-p)}{n}}.$$

Then

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$

will approximately follow a standard normal distribution.
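As a hedged numerical illustration of the standardised proportion, the sketch below computes the standard error and the z value for made-up numbers (p = 0.50, n = 400, observed proportion 0.56). The function name is hypothetical, and the normal approximation is assumed to be adequate for this sample size.

```python
import math
from statistics import NormalDist

def proportion_z(p_hat, p, n):
    """Standardise a sample proportion using the CLT standard error sqrt(p(1-p)/n)."""
    se = math.sqrt(p * (1 - p) / n)
    return (p_hat - p) / se

# Hypothetical figures: population proportion 0.50, sample of 400, observed 0.56.
z = proportion_z(p_hat=0.56, p=0.50, n=400)

# Approximate probability of seeing a sample proportion at least this large.
upper_tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, upper-tail probability = {upper_tail:.4f}")
```

Here the standard error is 0.025, so an observed proportion of 0.56 lies about 2.4 standard errors above p.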
Use of the Normal Distribution in Sampling Theory
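We cannot know the exact value of the error of estimate ($\hat{\mu} - \mu$) but, from
the properties of the normal curve, the chances are about
• 0.32 (1 in 3) that the absolute error $|\hat{\mu} - \mu|$ exceeds $\sigma_{\hat{\mu}}$,
• 0.05 (1 in 20) that it exceeds $1.96\,\sigma_{\hat{\mu}}$,
• 0.01 (1 in 100) that it exceeds $2.58\,\sigma_{\hat{\mu}}$,
where $\sigma_{\hat{\mu}}$ is the standard deviation (standard error) of $\hat{\mu}$, and $\hat{\mu}$ is
assumed to be unbiased and approximately normally distributed.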
The Confidence Interval

The “99% confidence” implies that if the same sampling
plan were used many times in a population, a confidence
statement being made from each sample, about 99% of
these statements would be correct and 1% wrong.
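The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration under simplifying assumptions: the population is normal with a known standard deviation, the interval is the usual mean ± z·σ/√n, and the values µ = 50, σ = 10, n = 30 and the seed are arbitrary. It counts how often the 99% interval covers the true mean.

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.99):
    """Normal-theory interval for a mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

rng = random.Random(1)                      # fixed seed for reproducibility
mu, sigma, n, trials = 50.0, 10.0, 30, 2000

covered = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    low, high = mean_confidence_interval(statistics.fmean(sample), sigma, n)
    covered += low <= mu <= high            # True counts as 1

coverage = covered / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The observed share should land close to 0.99, with the remaining intervals missing the true mean.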
Sample Size Determination for Mean of the Population
Sample Size Determination for Proportion of the Population
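One common textbook approach to both sample-size problems (mean and proportion) can be sketched as follows. It assumes the normal approximation, a specified margin of error d, and no finite population correction, giving n = (zσ/d)² for a mean and n = z²p(1−p)/d² for a proportion; this may differ in detail from the formulas used in the lecture. The function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (no finite population correction)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

def sample_size_for_proportion(p, margin, confidence=0.95):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin (no finite population correction)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical inputs: guessed sigma of 15 with margin 2; p = 0.5 with margin 0.05.
n_mean = sample_size_for_mean(sigma=15, margin=2)
n_prop = sample_size_for_proportion(p=0.5, margin=0.05)

print(n_mean, n_prop)  # 217 385
```

Using p = 0.5 gives the largest required n for a given margin, so it is a conservative choice when no prior estimate of p is available.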
Example-1
Example-2
References:
1. Cochran (1977). Sampling Techniques.
2. Williams, R. L. (2014). Survey Sampling and Weighting. Encyclopedia of Health Economics, 371–374.
3. “Nonsampling Errors”, book chapter in International Encyclopedia of the Social & Behavioral Sciences, 2nd edition, Volume 16.
4. “Sampling and Estimation”, book chapter in Business Analytics: The Science of Data-Driven Decision Making by U. Dinesh Kumar (2019).
