
CLT-BASED SAMPLE SIZE CALCULATIONS FOR MARKETING RESEARCH SURVEYS (CLASSICAL INFERENCE)

§ 1. Sample Size Calculations

Using the Central Limit Theorem (CLT), it can be shown that the minimum
sample size is $n = (1.96)^2 \sigma^2 / e^2$ if $e$ is such that
$P(|\bar{x} - \mu| \le e) \ge 0.95$, where $\mu$ and $\sigma^2$ are the
population mean and variance respectively and $\bar{x}$ is the mean of the
random sample.
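Let us outline a simple “proof” of the proposition before we generalize it. By
CLT we know that, for large n, $\bar{x} \sim N(\mu, \sigma^2/n)$ approximately … (1)
Hence, $Z = (\bar{x} - \mu) / (\sigma/\sqrt{n}) \sim N(0,1)$

From normal probability tables we know $P(|Z| \le 1.96) = 0.95$

That is $P(|(\bar{x} - \mu) / (\sigma/\sqrt{n})| \le 1.96) = 0.95$

Or $P(|\bar{x} - \mu| \le 1.96\,\sigma/\sqrt{n}) = 0.95$ … (2)

We are given that $P(|\bar{x} - \mu| \le e) \ge 0.95$ … (3)
From (2) and (3) we have $e \ge 1.96\,\sigma/\sqrt{n}$
Or $n \ge 1.96^2 \sigma^2 / e^2$, that is $n \ge 3.84\,\sigma^2/e^2$.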
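The bound can be checked numerically. The following is a minimal simulation sketch, not part of the original derivation: it assumes a hypothetical exponential population with mean 1 and standard deviation 1, a tolerable error e = 0.2, and helper names (min_sample_size, coverage) chosen only for illustration. If the CLT approximation holds, roughly 95% of the simulated sample means should fall within e of the population mean.

```python
import math
import random
from statistics import fmean


def min_sample_size(sigma, e, z=1.96):
    """Smallest integer n satisfying n >= z^2 * sigma^2 / e^2."""
    return math.ceil((z * sigma / e) ** 2)


def coverage(n, mu, e, draw, trials=2000, seed=0):
    """Fraction of simulated samples of size n whose mean lies within e of mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = fmean(draw(rng) for _ in range(n))
        if abs(xbar - mu) <= e:
            hits += 1
    return hits / trials


# Hypothetical population (an assumption): exponential, mean 1, sd 1.
n = min_sample_size(sigma=1.0, e=0.2)
rate = coverage(n, mu=1.0, e=0.2, draw=lambda rng: rng.expovariate(1.0))
print(n, rate)  # rate should be close to 0.95
```

The population is deliberately non-normal: the argument relies on the sample mean being approximately normal, not the individual observations.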
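This gives the minimum sample size for estimating the mean at confidence level
95%. This may be generalized to any level of significance $\alpha$ (e.g. 5%)
and corresponding confidence level $1-\alpha$ (e.g. 95%), giving
$n = z_\alpha^2 \sigma^2 / e^2$. Remember we use the Z score $z_\alpha$
corresponding to the confidence level $1-\alpha$, that is, the two-sided value
for which $P(|Z| \le z_\alpha) = 1-\alpha$ (1.96 when $\alpha$ = 5%).
Also, to estimate the population proportion P (with Q = 1-P) at level of
significance $\alpha$ and corresponding confidence level $1-\alpha$ we use
$n = z_\alpha^2 PQ / e^2$.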
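As an illustration of the generalization, here is a minimal sketch that computes both sample sizes for an arbitrary confidence level. The function names and the example inputs are assumptions made for this sketch; the Z score is taken from Python's standard-library normal distribution rather than from printed tables.

```python
import math
from statistics import NormalDist


def z_two_sided(confidence):
    """Critical value z with P(|Z| <= z) = confidence for standard normal Z."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def n_for_mean(sigma, e, confidence=0.95):
    """Minimum n to estimate a mean within +/- e (sigma: population sd)."""
    z = z_two_sided(confidence)
    return math.ceil(z**2 * sigma**2 / e**2)


def n_for_proportion(p, e, confidence=0.95):
    """Minimum n to estimate a proportion within +/- e (p: anticipated proportion)."""
    z = z_two_sided(confidence)
    return math.ceil(z**2 * p * (1.0 - p) / e**2)


# Hypothetical inputs, for illustration only.
print(n_for_mean(sigma=10, e=2))                    # 95% confidence
print(n_for_mean(sigma=10, e=2, confidence=0.99))   # 99% confidence
print(n_for_proportion(p=0.5, e=0.05))              # 95% confidence
```

Results are rounded up because the formulas give a lower bound on n.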

A detailed discussion on specific instances of sample size calculation follows.

For continuous or interval-scaled variables, the sample size (n) for estimating
means is $n = s^2 z^2 / e^2$ where “z” is the score from the normal
distribution for the chosen confidence level. For a 95% confidence level and a
two-tailed test the z score would be 1.96. “s” is the population standard
deviation for the variable under study. Of course, for a new study area, rough
estimates or surrogate variable values have to be used. “e” is the tolerable
error in estimating the variable in question. This is usually decided based on
the researcher’s experience and the expected response error and other human
factors.

The sample size (n) for estimating proportions is $n = pq z^2 / e^2$ where “p”
is the frequency of occurrence expressed as a proportion and q = 1 - p. “pq” is
estimated in much the same way as “s” above, from rough estimates or surrogate
values. The other symbols have the same meaning as in the case of estimation of
means.
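When no rough estimate of p is available, a common conservative choice is p = 0.5, since the product pq is largest there. The sketch below tabulates the required n over a few hypothetical values of p, assuming a 95% confidence level (z = 1.96) and a tolerable error of 0.05; the function name is illustrative.

```python
import math

Z_95 = 1.96  # two-tailed z score for 95% confidence


def n_for_proportion(p, e, z=Z_95):
    """Sample size n = p*q*z^2 / e^2, rounded up to a whole respondent."""
    q = 1 - p
    return math.ceil(p * q * z**2 / e**2)


# Hypothetical anticipated proportions, tolerable error of 5 percentage points.
sizes = {p: n_for_proportion(p, e=0.05) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
for p, n in sizes.items():
    print(f"p = {p:.1f}  ->  n = {n}")
```

The largest n in the table occurs at p = 0.5, so planning with p = 0.5 never understates the sample size for a given e.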

The calculations for proportionate and disproportionate stratified random
sampling are similar but not identical. In general, disproportionate
stratified random sampling is considered more efficient and accurate than
proportionate stratified random sampling, particularly when the variability
of the study variable differs markedly between strata.
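To make the comparison concrete, the sketch below contrasts proportionate allocation with one particular disproportionate scheme, Neyman allocation (sample sizes proportional to stratum size times stratum standard deviation). The choice of Neyman allocation, the stratum sizes and the standard deviations are all assumptions made for illustration; the text does not specify a scheme. The finite population correction is ignored.

```python
def proportionate_allocation(n, sizes):
    """Allocate n across strata in proportion to stratum sizes N_h."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]


def neyman_allocation(n, sizes, sds):
    """Allocate n in proportion to N_h * S_h (one disproportionate scheme)."""
    weights = [N_h * s_h for N_h, s_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def variance_of_mean(alloc, sizes, sds):
    """Variance of the stratified mean, ignoring finite population correction."""
    N = sum(sizes)
    return sum((N_h / N) ** 2 * s_h**2 / n_h
               for N_h, s_h, n_h in zip(sizes, sds, alloc))


# Hypothetical strata: sizes and standard deviations are made up.
sizes = [6000, 3000, 1000]
sds = [5, 10, 40]
prop = proportionate_allocation(400, sizes)
neym = neyman_allocation(400, sizes, sds)
print(prop, variance_of_mean(prop, sizes, sds))
print(neym, variance_of_mean(neym, sizes, sds))
```

In this made-up example the small but highly variable third stratum receives a larger share under Neyman allocation, and the variance of the estimated mean is lower for the same total n. In practice the allocations would also need rounding to whole units.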

Several non-probabilistic methods such as convenience sampling and snowball
sampling may also be resorted to, but because the selection probabilities are
unknown the results may not have much statistical validity.

§ 2. Survey Research and Sampling Fieldwork

As is commonly known, the census approach covers data points thoroughly but
is impractical for most instances of market research. There are many
different ways field surveys can be carried out, but there are strong
commonalities that have to be satisfied irrespective of survey type.

Typically, in a sample survey, the sampling element has to be clearly defined.
The element is usually a human respondent about whom information is sought
by the researcher based on certain parameters. The responses of this element
will be stored as a record and the parameter values as fields against the said
record in the resultant raw database. Every field against each such record is
usually referred to as a datapoint. The population is the predefined set of
potential respondents within a geographical area and within the scope of the
survey. It is not necessarily the entire human population of an area. The
sampling frame is a systematic list of the population elements – if Borivali
households need to be surveyed a Mumbai telephone directory may suffice as a
useful sampling frame. However, sometimes, all population elements may not
be properly listed, in which case commercially available products of the
national census may provide a more accurate (but also more expensive)
frame. The sampling unit is usually a multi-stage link between the sampling
frame and the sampling element. For example, in the survey of Borivali
households, the city blocks within the locality itself are first-stage sampling
units, the apartments in each block are the second-stage units and so on, until
the sampling element is reached.
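The record, field and datapoint terminology can be pictured with a small data structure. This is only an illustrative sketch: the field names and the two example records are hypothetical and are not taken from any actual survey.

```python
from dataclasses import dataclass, fields


@dataclass
class Record:
    """One record per sampling element (respondent); attributes are the fields."""
    block: str            # first-stage sampling unit (hypothetical field)
    apartment: str        # second-stage sampling unit (hypothetical field)
    household_size: int   # parameter value
    annual_income: float  # parameter value


# Hypothetical raw database with two records.
raw_database = [
    Record("Block A", "A-101", 4, 520000.0),
    Record("Block B", "B-204", 3, 410000.0),
]

# Every field against each record is one datapoint.
datapoints = len(raw_database) * len(fields(Record))
print(datapoints)
```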

To ensure valid results, sampling techniques have to be chosen with care.
Probabilistic sampling techniques set the ground for accurate estimation of
population parameters (e.g. mean annual income of Borivali households) based
on sample statistics (e.g. mean annual income of the sampled Borivali
households). There are many such techniques, such as simple random sampling
without replacement (srswor), simple random sampling with replacement
(srswr), stratified random sampling, cluster sampling, systematic sampling and
multi-stage and combination sampling.
Simple random sampling sounds simple but can actually be very complicated to
implement in practice. Hence, stratified random sampling is a common social
research technique. In this, based on some robust population characteristic,
one heterogeneous population is divided into several homogeneous strata. In
proportionate stratified random sampling, the sampling fraction is the same in
every stratum, so the number of units sampled from each stratum is
proportional to the size of that stratum. Sample size calculations are
discussed in § 1. Selection is done by assigning a serial number to each
element in each stratum in the frame and then drawing sampling elements
according to a random number series with a random start. In case of
disproportionate stratified random sampling, the weights assigned to each
stratum have to be taken into account.
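The selection step can be sketched as follows. This is a minimal illustration under stated assumptions: it uses proportionate allocation, substitutes a seeded pseudo-random generator for a printed random number series, and works on a made-up frame of three strata. The function name and frame layout are hypothetical.

```python
import random


def stratified_sample(frame, n, seed=None):
    """Proportionate stratified sample without replacement.

    frame: dict mapping stratum name -> list of elements in that stratum.
    Serial numbers 1..N_h are assigned within each stratum and n_h of them
    are drawn at random. Rounding may make the total differ slightly from n.
    """
    rng = random.Random(seed)
    total = sum(len(elements) for elements in frame.values())
    sample = {}
    for stratum, elements in frame.items():
        n_h = round(n * len(elements) / total)
        serials = range(1, len(elements) + 1)
        chosen = sorted(rng.sample(serials, n_h))
        sample[stratum] = [elements[s - 1] for s in chosen]
    return sample


# Hypothetical frame: households grouped into three income strata.
frame = {
    "low": [f"L{i}" for i in range(600)],
    "middle": [f"M{i}" for i in range(300)],
    "high": [f"H{i}" for i in range(100)],
}
sample = stratified_sample(frame, n=50, seed=42)
print({stratum: len(chosen) for stratum, chosen in sample.items()})
```

For a disproportionate design, the per-stratum counts would be supplied directly and each stratum's results weighted accordingly at the estimation stage.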

Cluster sampling is a slightly modified method. A cluster is a group of sampling
units taken together on the basis of criteria which may be geographical,
social or economic. Clusters selected on a geographical basis are also called
area samples. Note that a cluster is distinct from a stratum: every stratum is
sampled, whereas only some clusters are selected. Usually, clusters are
selected randomly from the list of clusters and then a sample of households or
other units is drawn from each selected cluster. In general, for a given
sample size, cluster sampling gives less precise estimates than simple or
stratified random sampling, but the sampling costs are low.
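A minimal two-stage sketch of this procedure is given below. It assumes that clusters are drawn with equal probability and that a fixed number of units is drawn from each selected cluster; both choices, the function name and the made-up frame are assumptions for illustration.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly select clusters, then randomly sample units within each.

    clusters: dict mapping cluster name -> list of units (e.g. households).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: clusters
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen                              # stage 2: units
    }


# Hypothetical area frame: 8 city blocks with 20 apartments each.
clusters = {
    f"block_{b}": [f"block_{b}/flat_{i}" for i in range(20)] for b in range(8)
}
picked = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=7)
for name, units in picked.items():
    print(name, units)
```

Fieldwork is confined to the selected clusters, which is where the cost saving comes from.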

Systematic sampling is the selection of elements from the frame at regular (or
systematic) intervals. For example, I draw up a sampling frame for Borivali
households, identify sampling units and then pick every 10th element from
there. While systematic sampling is easy and cost-effective, results may vary
widely. Systematic sampling may give erroneous results on account of
misrepresentation (for example, when the frame has a periodic pattern that
coincides with the sampling interval), or results almost as accurate as those
obtained by random sampling if misrepresentation is avoided.
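The sketch below shows every-k-th selection with a random start, and one way misrepresentation can arise. The frame, the income figures and the periodic pattern are invented for illustration; the function name and its optional start argument are assumptions of this sketch.

```python
import random
from statistics import fmean


def systematic_sample(frame, k, start=None, seed=None):
    """Pick every k-th element of the frame, beginning at a random start in [0, k)."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return frame[start::k]


# Every 10th household from a hypothetical frame of 100 serial numbers.
households = list(range(1, 101))
print(systematic_sample(households, k=10, seed=1))

# Misrepresentation: a made-up frame in which every 10th household
# (say, a corner apartment) has a higher income than the rest.
incomes = [100 if i % 10 == 0 else 50 for i in range(100)]
biased = systematic_sample(incomes, k=10, start=0)
print(fmean(incomes), fmean(biased))
```

Because the sampling interval coincides with the period of the pattern, the sample taken from start 0 contains only the high-income households and badly overstates the mean.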

Finally, multi-stage and combination sampling, as the names suggest, combine
two or more probability sampling techniques as part of the sampling strategy
and the process is implemented in stages. Given adequate researcher experience
and familiarity with the population characteristics, robust parametric inference
can be drawn in this way.

A brief discussion on field force control follows. The field work must first be
organised by selecting sampling units, laying out the field coverage plan and
budgeting the project. Since there may be non-response (selected sampling
elements do not respond), poorly filled-in questionnaires or inconsistent and
incomplete answers, quotas may be set to get a sufficient number of
responses over and above the sample size calculations. The field force must
have grassroots-level investigators and supervisors, and strict control and
back-check procedures. There must also be due briefing and debriefing
procedures for the field force at the project headquarters.
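One simple way to set such a quota is to inflate the calculated sample size by the expected response and usability rates. The sketch below assumes this inflation rule; the rule, the function name and the rates used are illustrative assumptions and would have to come from the researcher's experience.

```python
import math


def field_quota(n_required, response_rate, usable_rate=1.0):
    """Number of elements to approach so that about n_required usable
    questionnaires are expected, given assumed response and usability rates."""
    if not (0 < response_rate <= 1 and 0 < usable_rate <= 1):
        raise ValueError("rates must lie in (0, 1]")
    return math.ceil(n_required / (response_rate * usable_rate))


# Hypothetical: 385 usable responses needed, 70% respond, 90% of those usable.
quota = field_quota(385, response_rate=0.7, usable_rate=0.9)
print(quota)
```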
