
Determining Sample Size

Prof. Noor Md. Rahmatullah

1 How to Determine Sample Size


Most of the sciences and social sciences use statistics to understand what is studied. In order to
make statistical analysis manageable, researchers must define their sample size rather than
attempt to work with an entire population. The purpose of a sample is to gain knowledge about a
population using an unbiased representation that can be easily observed and measured. This is
why it is necessary to choose a sample size that is large enough to represent the population as a
whole but small enough to make measuring and recording observations possible.

Design your experiment. The sample size will depend on what type of research you conduct,
such as performing interviews, doing surveys, reporting voting patterns or measuring molecules.

Calculate the population size. Your research likely has the goal of finding something out about a
population, and in order to determine how many observations you need to make (your sample
size), it's necessary to know how many total possible observations are available.

Specify the level of accuracy you want from your research. The sample size directly determines
the margin of error or the width of the confidence interval, two statistical measurements that can
be used to judge how accurately your research tracks to the larger population.

Calculate your ideal sample size. Do this by using a formula or an estimate. Statistical software
often provides formulas for calculating sample size. You can use such software, or you can
estimate your sample size given your research design, size of population and level of accuracy.

A number of formulae are available for working out sample size. We will introduce two of them
here to highlight some of the issues involved in calculating sample size for a quantitative project.

1.1 Points to be considered in determining the sample size


Technically, the sample size should be large enough to give a confidence interval of the desired
width, so the size of the sample must be chosen by some logical process before the sample is
taken from the universe. The researcher should determine the sample size keeping in view
the following points.
a) Nature of universe: The universe may be homogeneous or heterogeneous. If the items of
the universe are homogeneous, a small sample can serve the purpose. But if the items are
heterogeneous, a large sample would be required. Technically, this can be termed the
dispersion factor.

Sample Size Determination Page 1


b) Number of classes proposed: If many groups and sub-groups are to be formed, a large
sample is required because a small sample might not be able to provide a reasonable
number of items in each class-group.
c) Nature of study: If items are to be intensively and continuously studied, the sample
should be small. For a general survey, the size of the sample should be large, but a small
sample is considered appropriate in technical surveys.
d) Type of sampling: The sampling technique plays an important role in determining the
size of the sample. A small random sample is apt to be much superior to a large but badly
selected sample.
e) Standard of accuracy and acceptable confidence level: If the standard of accuracy and
the confidence level are to be kept high, a relatively larger sample is
required.
f) Availability of finance: In practice, the size of the sample depends upon the amount of
money available for study purposes. This factor should be kept in view while determining
the size of the sample, because larger samples increase the cost of obtaining the
estimates.

1.2 Margin of Error


The sample size of a statistical survey is also directly related to the survey's margin of error, that
is, how precisely a statistic can be estimated. The margin of error is the half-width of the
confidence interval around an estimate: at a 95% confidence level, about 95% of repeated samples
of the same size would give an estimate within that distance of the true population value. For
example, in a survey about religious beliefs, a margin of error of 3 percentage points means the
reported percentage holding a given belief is expected to lie within 3 points of the population
percentage. As a rough rule for a proportion at 95% confidence, divide 1 by the
square root of the sample size, and then multiply by 100 to get a percentage. For instance, a
sample size of 2,400 will have a margin of error of about 2.04 percent.
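
The shortcut can be checked with a few lines of Python. This is a minimal sketch: the function name is illustrative and not part of the original notes, and the 1/sqrt(n) rule is only an approximation of 1.96 × sqrt(0.5 × 0.5 / n), which applies to a proportion near 50% at 95% confidence.

```python
import math


def rule_of_thumb_margin(n: int) -> float:
    """Approximate 95% margin of error, in percent, using the 1/sqrt(n) shortcut."""
    return 100.0 / math.sqrt(n)


# Sample of 2,400 respondents, as in the text.
print(round(rule_of_thumb_margin(2400), 2))  # 2.04
```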

2 Estimating the sample size based on a proportion


To calculate the sample size based on the sample required to estimate a proportion with an
approximate 95% confidence level, you can use the following formula:
$$n_r = \frac{z^2 p q}{d^2}$$

Where $n_r$ = required sample size,
z = the standard normal value for the chosen confidence level (1.96 for 95%),
p = proportion of the population having the characteristic,
q = 1 - p and
d = the degree of precision.

The proportion of the population (p) may be known from prior research or other sources; if it is
unknown use p = 0.5 which assumes maximum heterogeneity (i.e. a 50/50 split). The degree of
precision (d) is the margin of error that is acceptable. Setting d = 0.02, for example, would give a
margin of error of plus or minus 2%. We apply this formula in the worked
example below.

2.1 Worked example


You are investigating the use of mobile phones for online banking and want to estimate what
proportion of the population uses their phones in this way at an approximate 95% confidence
level. Since no data are available on the proportion currently using their mobile phones you take
the worst case scenario and set p = 0.5 (and therefore q = 1-0.5 = 0.5). As this is a preliminary
study you are prepared to accept a margin of error of ± 5% so you set d = 0.05.

To determine the minimum sample size you then apply the formula:

$$n_r = \frac{z^2 p q}{d^2} = \frac{(1.96)^2 \times 0.5 \times 0.5}{0.05^2} = 384.16 \approx 385$$

So your minimum sample size would be 385.
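
The same calculation can be sketched in Python. The function name and its default z = 1.96 are assumptions made for illustration; rounding up to the next whole number follows the worked example, where 384.16 becomes 385.

```python
import math


def sample_size_proportion(p: float, d: float, z: float = 1.96) -> int:
    """Minimum sample size for estimating a proportion: z^2 * p * q / d^2."""
    q = 1.0 - p
    n = (z ** 2) * p * q / (d ** 2)
    # Round to 9 decimals first so floating-point noise cannot push an
    # exact whole number up by one, then round up to a whole respondent.
    return math.ceil(round(n, 9))


# Worst case p = 0.5 with a margin of error of 5 percentage points.
print(sample_size_proportion(p=0.5, d=0.05))  # 385
```

Changing `p` or `d` in the call shows how the required size responds to a different assumed proportion or a different precision.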


Note that accepting a higher margin of error (making d bigger) will reduce the sample size. For a
10% margin of error in this scenario (i.e. d = 0.1), the required sample size would fall to 97
(96.04 rounded up). Similarly, if the population were more homogeneous the sample size would also
fall. In this example, for instance, if we knew that only 10% of the population used their phones
for mobile banking then p = 0.1, q = 0.9 and the required sample size would drop to 139
(138.30 rounded up), assuming a margin of error of 5% and z = 1.96.

3 Estimating the sample size based on a mean
The second formula applies when estimating the arithmetic mean (average) of a particular
variable for a population. Suppose, for example, that you wanted to know the average employee
satisfaction level in your organization.

To calculate the sample size based on the sample required to estimate a population mean with an
approximate 95% confidence level, you can use the following formula:
$$n_r = \frac{z^2 \sigma^2}{d^2}$$

Where $n_r$ = required sample size, z = the standard normal value for the chosen confidence
level (1.96 for 95%), σ (the Greek letter sigma) = the population standard deviation,
a measure of the variation in the population, and d = the degree of precision required by the
researcher. A drawback with this formula is the need to know the population standard deviation.
This may be known from prior research; if a good estimate is unavailable the formula will not be
reliable. We apply this formula in the example below.

3.1 Worked example


You are investigating the average (mean) level of employee satisfaction and want to know the
required sample size. You decide on a 95% confidence level. Prior studies have reported a
standard deviation (σ) of 1.5 so you decide to use the same figure in your estimate. Satisfaction
will be measured on a 7-point scale and you set a margin of error of ±0.25 units. To determine
the minimum sample size you then apply the formula:

$$n_r = \frac{z^2 \sigma^2}{d^2} = \frac{(1.96)^2 \times (1.5)^2}{0.25^2} = 138.30 \approx 139$$

So your minimum sample size would be 139.
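
A minimal Python sketch of the mean-based formula follows. The function name and the default z = 1.96 are illustrative assumptions, and the result is rounded up as in the worked example.

```python
import math


def sample_size_mean(sigma: float, d: float, z: float = 1.96) -> int:
    """Minimum sample size for estimating a mean: z^2 * sigma^2 / d^2."""
    n = (z ** 2) * (sigma ** 2) / (d ** 2)
    # Guard against floating-point noise, then round up to a whole unit.
    return math.ceil(round(n, 9))


# Standard deviation 1.5 on the 7-point scale, margin of error 0.25 units.
print(sample_size_mean(sigma=1.5, d=0.25))  # 139
```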

Note that as with the proportion formula, accepting a higher margin of error will decrease the
sample size. (You can see this from the formula: as d gets bigger, nr gets smaller.) The sample
size will also decrease if the population is more homogenous (i.e. the standard deviation is
smaller).

3.2 Finite population correction factor


When the sample represents a significant (e.g. over 5%) proportion of the population, a finite
population correction factor can be applied. This will reduce the sample size required. The
formula for this is:
$$n_a = \frac{n_r}{1 + \dfrac{n_r - 1}{N}}$$

Where $n_a$ = the adjusted sample size,
$n_r$ = the original required sample size and
N = population size.

The formula is applied in the worked example below:

3.3 Worked example


Having calculated your sample size (nr = 139) for the employee satisfaction survey in the
previous example, you decide to apply a finite population correction factor because the total
number of employees is only 650 (N = 650). To determine the adjusted sample size you apply
the following formula
$$n_a = \frac{n_r}{1 + \dfrac{n_r - 1}{N}} = \frac{139}{1 + \dfrac{139 - 1}{650}} = \frac{139}{1.21} \approx 115$$

So your adjusted minimum sample size would be 115.
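
The correction can be sketched in Python as follows. The function name is hypothetical, and rounding up to the next whole number is an assumption that agrees with the figure above.

```python
import math


def fpc_adjusted_size(n_r: int, population: int) -> int:
    """Apply the finite population correction: n_r / (1 + (n_r - 1) / N)."""
    n_a = n_r / (1 + (n_r - 1) / population)
    # Guard against floating-point noise, then round up to a whole unit.
    return math.ceil(round(n_a, 9))


# 139 required respondents drawn from 650 employees.
print(fpc_adjusted_size(139, 650))  # 115
```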

3.4 Worked example
Suppose a sample of farm households is to be selected by simple random sampling to estimate
the cost of cultivation of paddy per hectare. The precision is to be within ±50 Taka with 95%
confidence. From a past survey, it is known that s = 255. How large must the sample be if the
population size is 1000? Find the sample size if the population size is 10,000.

We are given margin of error d = 50, standard deviation s = 255 and reliability coefficient
z = 1.96.

$$n_r = \left(\frac{z s}{d}\right)^2 = \left(\frac{1.96 \times 255}{50}\right)^2 = 99.92 \approx 100$$

The population size is N = 1000. Therefore

$$\frac{n_r - 1}{N} = \frac{99}{1000} = 0.099$$
Since the sampling fraction is considerable, we have to calculate the adjusted sample size using the
formula:

$$n_a = \frac{n_r}{1 + \dfrac{n_r - 1}{N}} = \frac{100}{1 + \dfrac{100 - 1}{1000}} = \frac{100}{1.099} \approx 91$$
If N = 10000,

$$\frac{n_r - 1}{N} = \frac{99}{10000} = 0.0099$$

Since the sampling fraction is negligible we can take the previously calculated n (=100).
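
The decision made in this example (correct only when the sampling fraction is considerable) can be sketched in Python. The function names are hypothetical, and the 5% cut-off is an assumption taken from the "over 5%" guideline in section 3.2; the notes do not fix an exact threshold.

```python
import math


def required_n_mean(s: float, d: float, z: float = 1.96) -> int:
    """Uncorrected sample size for a mean: (z * s / d)^2, rounded up."""
    return math.ceil(round((z * s / d) ** 2, 9))


def adjust_if_needed(n_r: int, population: int, threshold: float = 0.05) -> int:
    """Apply the finite population correction only above the threshold."""
    fraction = (n_r - 1) / population  # sampling fraction used in the example
    if fraction <= threshold:
        return n_r  # negligible fraction: keep the uncorrected size
    return math.ceil(round(n_r / (1 + fraction), 9))


n_r = required_n_mean(s=255, d=50)
print(n_r)                           # 100
print(adjust_if_needed(n_r, 1000))   # 91
print(adjust_if_needed(n_r, 10000))  # 100
```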

3.5 Worked example

A preliminary sample of 100 farmers was selected from a population of 5000 farmers by simple
random sampling. It was found that 40 of the selected farmers opt for a new variety of paddy.
How large a sample must be selected to have a precision of ±5% with 95% confidence?

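We are given that p = 40/100 = 0.4, d = 0.05 and the reliability coefficient z = 1.96.
Hence,

$$n_r = z^2 \frac{p q}{d^2} = (1.96)^2 \times \frac{0.4 \times 0.6}{0.0025} = \frac{0.922}{0.0025} \approx 369$$

The sampling fraction is

$$\frac{n_r - 1}{N} = \frac{368}{5000} = 0.074$$

It is considerable. Hence

$$n_a = \frac{n_r}{1 + \dfrac{n_r - 1}{N}} = \frac{369}{1 + \dfrac{369 - 1}{5000}} = \frac{369}{1.074} \approx 344$$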
