Statistical Surveys
» The purpose of a survey is the collection of information to
satisfy a definite need.
* The need to collect data arises in all walks of life.
* The data we need may be about
1
etc,
the population (total, sex, age, migration, rate of growth, literacy,
religion ...
. labour (no. of employees, hrs. of work, wages, strikes,
unemployment...)
. agriculture ( area under diff. crops, forests, agriculture income,
manures, cultivation practices ...)
|. Industry ( turn over, production, capital investment, water
consumption, pollution ... )
. Trade ( wholesale/ retail prices, demand, profit/ loss .. )+A statistical survey is a sort of investigation carried out
by an agency or individual to study the nature of the
unknown characteristics of a population.
. We undertake a survey for a variety of purposes.
However in most cases our interest may be concentrated
on 4 important unknown values (or parameters ) of the
population under study.
1, the population total
2. the population mean
3. the population proportion
4. the population ratio
Proportion : whole and part ( proportion of smokers,
males, defectives, distinctions ... )
Ratio : part and part ( sex ratio, import/ export ratio,
birth/ death ratio ... )Population : Aggregate of all objects about which
we want to collect information ( houses in an area,
students in a class, fishes in a lake, viewers of a
specific T. V. programme, normal population with
mean 50 and SD 5...)
Characteristic : Any aspect of the population about
which we want to collect information ( colour,
height, life length, yield, political affiliation,
income, employment ... )
Characteristics are of two types — Variables and Attributes
Two ways of collecting information —
Census method ( complete enumeration method)
and Sampling method.* Census - Method of collecting data from each and
every unit of the population.
Merits of census method
+ The results are more representative, accurate and reliable
+ The results are free from sampling errors
+ Acensus data may be used as a basis for various other surveys
However despite of these advantages, the census method is not
popularly used in practice.
+ Effort, money, time required for completing census is very large.
+ There is no way of checking the error in the data except through a
re- survey or sample checks.
Census is practically impossible for a researcher or a small
organization.
If the population is infinite or the enumeration is destructive in
nature, census cannot be used.Sampling -— Method of collecting data from a
representative part of the population only
Merits of sampling include
+ Reduced cost, time and labour
* Greater scope ( need only less number of trained investigators,
less administrative cost, less number of equipments ... )
+ Greater accuracy
If the population is hypothetical or infinite, only sampling is
possible
Itis always possible to determine the extent of sampling error
Demers of sampling method are
+ A proper choice of the sampling method is not made, the results
may be misleading
The chances of sampling errors are great in sampling
When the population is small, we can’t use sampling
When the information is needed from each and every unit in the
population (Voters list preparation, incom-tax assessment,
college admissions... ), sampling cannot be used.Neither sampling nor census admit universal
application.
Census and sampling will produce identical
conclusions when the population is perfectly
homogenous.
It is a curious fact that the results from a carefully
planned, well executed sample survey are expected to
be more accurate than those from a census survey.
The aim of sampling theory is to make sampling more
effective so that the answer to a particular question is
given in a quick, valid, efficient and economical way.Errors in Surveys
‘Two major types of errors can arise when a survey is conducted to
make observations on a characteristic defined over the population:
sampling errors and non-sampling errors
Sampling error refers to the error arising due to drawing
inferences about the population on the basis of few observations
taken from it.
This error is inherent and unavoidable in any sample survey. It can
be decreased by increasing the sample size .
S. E. is inversely proportional to the square root of the sample size.
Sampling errors are absent in census surveys.
Few reasons for sampling errors are — faulty selection of the
sample(purposive or judgment sampling, use of inappropriate
sampling scheme like srs for heterogeneous populations ... ),
substitution(when difficulties arise, investigator may substitute a
convenient member of the population), faulty identification of the
‘sampling units (high in area surveys or crop surveys) ...* Non-sampling errors are more serious and are due to
mistakes made in the acquisition of data .
* This is present in both sample surveys and census
surveys.
* It can occur at any stage of its planning, execution and
analysis.
Few reasons for non sampling errors are — faulty planning or
definitions(faulty objectives, faulty questionnaire, errors in
measurements, lack of trained investigators ... ), errors due
to non response (not at homes, unable to answer, refuses
to answer the questions ... ), response errors (respondent
may misunderstand a question and may furnish false data,
prestige bias, investigator bias ... ), errors in
coverage(inclusion/exclusion of units which are to be
excluded/included in a survey ... ), compiling errors(errors
in coding, editing, tabulation ... ), publication errors (errors
in printing, presentation... )...Classification of Sampling Techniques:Non- probability sampling and probability sampling
* Non probability sampling - Method of selecting
samples in which the choice of selection of units
into the sample depends entirely on the judgment
of the sampler (investigator).
Probability sampling - Scientific method of
selecting samples from the population. In this
procedure , each unit in the population has a
definite pre assigned non zero probability of
being selected into the sample.Non probability sampling
+ Convenience sampling
® Attempts to obtain a sample of convenient elements.
> Usually the sample is restricted to a part of the population
that is readily available.
> Often, respondents are selected because they happen to
be in the right place at the right time.
— use of students or members of social organizi
1S
—mall intercept interviews without qualifying the
respondents
— fruits on the top of the containers
— “people on the street” interviews+ Judgmental sampling
® method of sampling in which the sample elements
from the population are selected based on the judgment
of the researcher.
— party members selected in voting behavior research
— expert witnesses used in court
— purchase engineers selected in industrial marketing
researchQuota sampling
may be viewed as two-stage restricted judgmental sampling.
— The first stage consists of developing control categories, or
quotas, of population elements.
— In the second stage, sample elements are selected based on
convenience or judgment.
Population Sample
composition composition
Control
Characteristic Percentage Number
(Sex)
Male 48 480 (48%)
Female 52 520 (52%)
100 1000* Snowball sampling
® an initial group of respondents is selected, usually at
random.
— After being interviewed, these respondents are
asked to identify others who belong to the target
population of interest.
— Subsequent respondents are selected based on the
referrals.
Disadvantages of non probability sampling
All the non probability sampling procedures suffers from
drawbacks of favoritism, personal biases, prejudices ... of
the investigator.
Only if the investigator is well experienced and perfect in
nature we can expect satisfactory results in this caseProbability Sampling Schemes
+ Simple Random Sampling
+ Stratified Random Sampling
+ Systematic Sampling
+ Cluster Sampling
+ PPS Samling
Advantages of probability sampling
The sample will be representative of the population with
respect to the variables of interest.
Probability samples are more accurate than non-probability
samples (They remove conscious and unconscious sampling bias )
Probability samples permit the development of the theory for
the estimation of population parameters.
Probability samples allow us to determine the accuracy of the
sample estimates.For any survey there are three important stages
— Planning, execution and Analysis.
* Execution of the survey is 100% practical work.
Sampling theory plays no role in the execution
of the survey.
+ It gives importance to the other two stages
namely, planning and analysis.Principal steps in sample surveys
Sample survey may be considered as an organized fact
finding procedure. While developing a sampling design
( planning execution and analysis), we must pay attention
to the following points.
Stating clearly the objectives of the survey
Define the population to be sampled (covered)
Definition of the sampling units and the preparation of
the sampling frame
Deciding the data to be collected
Methods of collecting the data
Preparation of the questionnaire
Selection of the sample
Organization of the field work
Summary and analysis of the data collectedThe Language of Sampling
Population: the theoretical aggregation of specified
elements defined for a given survey defined by time and
space
element: a case or a single unit that is selected
from a population and measured in some way for the
study (e.g., a person, thing ...).
Sampling frame: a specific list containing the names or
addresses of all elements in the population. From this, the
researcher selects units to create the study sample.
?: a set of cases or units that is drawn froma
population and used to make conclusions(generalizations
or inferences) about the unknown aspects of the
population
Estimator: a calculating scheme or formula (statistic) for
obtaining an appropriate value of the population
parameter based on the sample observations..
Estimate: particular value of an estimator w. r. toa
sample
Expected value: average value of all possible
estimates based on an estimator from repeated
trials of a sampling scheme.
Bias : difference between the expected value of the
estimator and the true value of the parameter
Precision : measures the closeness of the estimator
with its expected value. Variance of an estimator is
usually used to measure precision.
: refers to the closeness of the estimate
and the true value of the parameter. Mean Square
Error of the estimator is used to measure accuracy
Estimator is both accurate and precise only if
the estimator is unbiasedSimple Random Sampling (srs)
It is the simplest method of probability sampling
The sample is drawn unit by unit with equal probability of
selection for each unit at each draw.
If the unit selected is returned to the population after
enumeration, before the next draw, the procedure of
selection called srswr.
If the unit selected is removed from the population after
enumeration, before the next draw, the procedure of
selection called srswor.
If Nis the population size and n is the size of the sample we
select, there are IN" srswr samples and NC, srswor samples
possible
‘Since the srswor sample provide more precise and accurate
estimates of the population parameters than that based on the
stswr sample, we always prefer srswor samplesProcedures of selecting a srs
® Define the population and select a suitable sampling
frame
® Each element is assigned a number from 1to N
* Generate n different random numbers between 1
andN
° The numbers generated denote the elements that
should be included in the sample
Lottery method and random number table method
are the two procedures available for selecting srsLottery Method
* This is the most popular and simplest method. In this
method all the items of the population are numbered on
separate slips of paper of same size, shape and colour. They
are folded and stored in a container. Shuffle them
thoroughly. Slips are then drawn at random one by one till
the required number or units are selected into the sample.
Table of Random numbers
As the lottery method cannot be used, when the population is
large, the alternative method is that of using the table of
random numbers. There are several standard tables of random
numbers.
1. Tippett’ s table
2. Fisher and Yates’ table
3. Kendall and Smith’ s tableSelection of a srs using random number tables
» Identify and define the population.
» Determine the desired sample size.
® Assign all individuals on the list a consecutive
number from zero to the population size.
® Select an arbitrary number in the table of random
numbers.
® For the selected number, locate the unit in the
population bearing it and select that unit to the
sample
® Go to the next number in the column of the table
and repeat the above step until the desired
number of individuals has been selected for the
sample.Estimation of Parameters using srswor
+ Let (Vis Yor» ¥q) be the srswor sample of observations taken from.
the population under study. Then the sample mean is
- 12
Vere =
nt
Itis an unbiased and consistent estimator of the population mean
Y
If Nis the populati Sol >) is the sample
variance, then an unbiased estimator of the variance of the
estimator is ( nn)
Nn
+ An estimate of the population total Y is Ny and an unbiased
estimator of its variance isAdvantages of srs
* easy to conduct
* strategy requires minimum knowledge of
the population to be sampled
+ simple estimators for the parameters
Drawbacks of srs
* When the population is not homogeneous w. r.
to the characteristic under survey, the srs need
not be a good representative of it.Homogeneous and heterogeneous populations
If all members of a population were identical, the
population is considered to be homogenous. That is,
the characteristics of any one individual in the
population would be the same as the characteristics of
any other individual (little or no variation among
individuals).
When individual members of a population are different
from each other, the population is considered to be
heterogeneous (having significant variation among
individuals).
Eg. Students in a school, students in a college,
workers in a factory ...Stratified Sampling
When the population is heterogeneous in nature,
the stratified random sampling is used.
Two-step process in which the population is
partitioned into subpopulations, or strata.
Strata should be mutually exclusive and collectively
exhaustive so that every population element should
be assigned to one and only one strata and no
population elements should be omitted.
Elements are selected from each strata by a random
sample procedure, usually srs. Then pool them
together to get the stratified sample
A major objective of stratified sampling is to
increase precision without increasing cost.* The elements within a strata should be as
homogeneous as possible, but the elements in
different strata should be as heterogeneous as
possible.
The stratification variables (Auxiliary variables)
should also be closely related to the
characteristic of interest (survey variable or Study
variable).
If height is the study variable,
weight or age may taken as the auxiliary variable
If volume of timber is the study variable,
girth or height of the trees may be taken as auxiliary variableProcedure for Drawing a Stratified Random Sample
Define the population and the sampling frame
Select the stratification variable(s) and the number of strata, L
Divide the entire population into L non overlapping subdivisions
called strata based on the classification variable
In each strata, number the elements from 1 to N, (the pop.
size of strata h)
Determine the sample size of each strata, nj, where
From each strata, select a simple random sample of n, units
and pool them together to get the required Stratified sample.Reasons for stratification
Administrative convenience
When sampling problems differ markedly in
different parts of the population(In surveying
factories, firms may be grouped into large/
medium/small, individual/group, private/govt. ... )
When stratification produce gain in precision in the
estimates of the parameters of the whole
population ( when the population is highly
heterogeneous)Estimation of Parameters using a Stratified sample
Let the population of size N is divided into L strata
each of size N,, h= 1, 2, ..., Lso that »-
We take random samples of size n, fiom the, hth
strata in the population, h=1, 2, ...,Lso that » Sa
Suppose ¥,, 4=1,2,....n,, 2=1,2,...2, denote the it
observation in the sample taken from the h'* strata
in the Population.
Now y, = hyn, be the sample mean from the ht"
iy a
strata in the population . 72 = 1, 2...
Then an unbiased estimator of the population mean
Y is
Yu ==,Allocation of sample size in different strata
* Once the sampling strategy is fixed as Stratified
random sampling, their arise the question of
deciding the sample size, n,, for the ht strata,
h= 1, 2, ..., Lin the population.
* The following are the important methods of
allocation in stratified sampling
1. equal allocation
2. proportional allocation
3. optimum allocationEqual Allocation
Proportional Allocation 72,
Optimum Allocation
In optimum allocation procedures, we resort to
conditional minimization techniques. Here we consider
linear cost functions. The standard procedures are
+ Minimizing the variance of the estimator for a given total
cost of the survey
* Minimizing total cost the survey for a given variance of
the estimator
* Minimizing the variance of the estimator for a given
sample size (Neyman Optimum Allocation)Advantages of Stratification
Administrative convenience
Samples are more representative
Estimation with greater accuracy
Stratification makes it possible to use
different sampling designs in different strataSystematic Sampling
+ A sampling technique in which only the first unit is
selected randomly and the rest are selected
automatically according to a predefined pattern
The sample is chosen by selecting a random starting
point and then picking every kt" element in succession
from the sampling frame.
The sampling interval, k, is determined by dividing the
population size N by the required sample size n and
rounding it to the nearest integer.
For example, let there are 5,000 elements in the
population and a sample of 50 is desired. In this case
the sampling interval, k, is 100. A random number ( +,
the random start) between 1 and 100 is selected. If, for
example, r=23, the sample consists of elements 23, 123,
223, 323, 423, 523, and so on upto 4923..
Procedure for Drawing a Systematic Sample
Define the population and select a suitable
sampling frame
Each element is assigned a number from 1 to N
Determine the sampling interval k (k=N/n). If k
is a fraction, round it to the nearest integer
Select a random number, r, between 1 and k
Now the elements with the following numbers
will constitute the systematic random sample
r+k,r+2k,r+3k,r+4k,...,r+(n-1)Things to remember before using
Systematic sampling scheme
* Efficiency of systematic sampling depends on the
order of arrangement of the units in the population
If the units in the population show an
increasing/decreasing trend along with the
increase in magnitude of their labels, the
systematic sample means will also show the same
tendency (rank lists, salary lists
If the population is almost periodic/cyclic in nature,
then the efficiency of systematic sampling depends
on the value of k, the sampling interval (Rain fall,
market days, peak traffic hrs. ... )Cluster Sampling
The target population is first divided into mutually
exclusive and collectively exhaustive
subpopulations, or clusters.
Then a random sample of clusters is selected,
based on a probability sampling technique such as
srs.
For each selected cluster, either all the elements
are enumerated (single stage) or a random sample
of elements is drawn from the selected clusters and
are only enumerated (two-stage).
Ideally, each cluster should be a small-scale
representation of the population.Types of Cluster Sampling
Cluster Sampling
Population : People in Kerala Population : Students in a
Cluster : districts/ Villages/ college
Panchayats/ municipalities Cluster : Courses/batches
Elements : Houses/Schools/ Elements : Each student
individualsTHANK YOU