
De La Salle University Dasmariñas

College of Engineering, Architecture and Technology


Department of Architecture

Determining Sampling Size


Research Compilation

Submitted by:
Ron Ernest G. Borres

Submitted to:
Arch. Juanito Y. Sy

T01
What is Data Sampling?
Data sampling is an empirical and concise way of determining the scope of the audience that will
participate in your study. Sampling is necessary in research because it saves time on data
gathering while still providing reliable data to analyze and interpret. Data gathering is simple
in principle but laborious and time-consuming in practice. Over time, new approaches have been
developed to aid the data gathering process. These approaches help keep the margin of error as
low as possible.

Determining Sampling Size


Large Sample Estimation of a Population Proportion
Suppose we randomly select n individuals from the population of voters and let N denote the
number in the sample who favor a particular candidate. Then $\hat{p} = N/n$ is our
estimate of $p$, the proportion of the population who favor this candidate. The value of this
estimate depends on the individuals who are selected for the
sample. To understand how we can make use of this fact to make a statement about estimation
error, consider the following thought experiment (an experiment that we don't actually perform,
but can think about). Suppose we select every possible sample of size n from the population and
for each sample we obtain the sample proportion who favor this candidate. These estimates will
vary from 0 to 1. The actual sampling experiment we perform, selecting a random sample of
size n and obtaining its sample proportion, is equivalent to randomly selecting one proportion from
the population of proportions obtained from all possible samples of size n. Although we could not
perform this experiment in reality, we can perform it mathematically. If we can determine the
distribution of the population of all possible sample proportions, then we can use this distribution
to make a probability statement about the estimation error. The Central Limit Theorem states that
if n is large, then the distribution of N is approximately a normal distribution with mean $np$ and
variance $np(1-p)$. Therefore, $\hat{p}$ has approximately a normal
distribution with mean $p$ and variance $p(1-p)/n$ (see the plot below). This distribution is
called the sampling distribution of $\hat{p}$. One of the properties of normal curves is that approximately
95% of a normally distributed population lies within 2 standard deviations of the mean. In this
case that means that approximately 95% of all possible samples of size n have sample proportions
that are within 2 standard deviations of their mean $p$. Therefore, when we randomly select our
sample proportion from the population of all possible sample proportions, the probability is
approximately 0.95 that the error of estimation, the difference between the estimate and the actual
proportion, will be no more than 2 standard deviations, $2\sqrt{p(1-p)/n}$. This represents a bound on
the error of estimation. It is not an absolute bound, but is a reasonable bound in the sense that there
is only a 5% chance that the error of estimation will exceed this bound.
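The thought experiment can be approximated numerically. The sketch below is an illustration added here, not part of the source: it uses only Python's standard library, and the values p = 0.5, n = 500, and 2000 trials are assumptions chosen for demonstration. It draws many random samples and reports the fraction of sample proportions that land within 2 standard deviations of the true proportion, which should come out near 0.95.

```python
import math
import random


def coverage_within_two_sd(p, n, trials, seed=0):
    """Fraction of simulated sample proportions within 2 SDs of the true p."""
    rng = random.Random(seed)
    # Two standard deviations of the sample proportion: 2 * sqrt(p(1-p)/n).
    bound = 2 * math.sqrt(p * (1 - p) / n)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the voters in favor.
        count = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = count / n
        if abs(p_hat - p) <= bound:
            hits += 1
    return hits / trials


# Hypothetical population proportion and sample size, for illustration only.
print(coverage_within_two_sd(p=0.5, n=500, trials=2000))
```

Only a finite number of samples is drawn, so the printed fraction approximates, rather than reproduces, the "all possible samples" population described above.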

For example, suppose we randomly select 500 voters and find that 260 of these voters favor this
candidate. Then our estimate of the population proportion is $\hat{p} = 260/500 = 0.52$. We are about
95% certain that the error of this estimate is no more than $2\sqrt{p(1-p)/500}$. The problem that
remains to be solved is that this error bound depends on the value of $p$, which is unknown. There
are two approaches we can take to solve this problem. The first approach is to note that the
function

$$f(p) = p(1-p)$$

is a bounded function of $p$ with upper bound $1/4$. The plot below shows how this
function depends on $p$.
This implies that the bound on the error of estimation is at most

$$2\sqrt{\frac{1}{4n}} = \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{500}} \approx 0.045.$$

Therefore, we can make the following statement about the proportion of voters who favor our
candidate based on the information contained in our sample: the estimated proportion who favor
our candidate is 0.520 and we are about 95% certain that this estimate is no more than 0.045
from the actual population proportion. Another way of stating this is that we are about 95% certain
that the population proportion is within the interval $0.520 \pm 0.045$, that is, between 0.475 and
0.565.
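The voter example can be reproduced with a few lines of Python. This is a minimal sketch added for illustration; the function name `conservative_interval` is hypothetical, and the bound uses the fact that p(1-p) never exceeds 1/4, so 2 standard deviations are at most 1/sqrt(n).

```python
import math


def conservative_interval(successes, n):
    """Approximate 95% interval using the worst-case bound p(1-p) <= 1/4."""
    p_hat = successes / n
    bound = 1 / math.sqrt(n)  # equals 2 * sqrt(0.25 / n)
    return p_hat, bound, p_hat - bound, p_hat + bound


p_hat, bound, low, high = conservative_interval(260, 500)
print(f"{p_hat:.3f} +/- {bound:.3f} -> ({low:.3f}, {high:.3f})")
# prints: 0.520 +/- 0.045 -> (0.475, 0.565)
```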
This bound on the error of estimation of a population proportion is conservative in the sense that
it does not depend on the actual population proportion. However, if $p$ is close to 0 or 1, then it
will be too conservative because in this case, the value of $p(1-p)$ would be much smaller than the
upper bound. It can be seen from the plot that if $0.2 \le p \le 0.8$, then $0.16 \le p(1-p) \le 0.25$, so the upper
bound becomes too conservative when the population proportion is below 0.2 or above 0.8. In some
situations, we may have prior information in the form of a bound on $p$ that allows us to place a
bound on $p(1-p)$. Suppose, for example, that we wish to estimate the proportion of memory chips
that do not meet specifications, and we know from past history that this proportion has never
exceeded 0.15. In that case, we can say that

$$p(1-p) \le (0.15)(0.85) = 0.1275.$$

If a sample of 400 memory chips is randomly selected from a production run, and it is found that
32 fail to meet specifications, then the estimated population proportion is $\hat{p} = 32/400 = 0.08$, and a bound
on the error of estimation would be $2\sqrt{0.1275/400} \approx 0.036$. We could present these results as


follows: The estimated proportion of memory chips that do not meet specifications is 0.080. With
95% certainty, this proportion could be as low as 0.044 or as high as 0.116.
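The memory chip calculation can be sketched the same way. The function name `prior_bound_interval` and the parameter `p0` are illustrative names introduced here, not from the source; the sketch assumes the prior bound is at most 0.5, since only then does p0(1-p0) bound p(1-p) for every p up to p0.

```python
import math


def prior_bound_interval(successes, n, p0):
    """Approximate 95% interval when p is known not to exceed p0 (p0 <= 0.5)."""
    if not 0 < p0 <= 0.5:
        raise ValueError("p0 must be in (0, 0.5] for p0*(1-p0) to bound p*(1-p)")
    p_hat = successes / n
    bound = 2 * math.sqrt(p0 * (1 - p0) / n)  # 2 standard deviations at p = p0
    return p_hat, bound, p_hat - bound, p_hat + bound


p_hat, bound, low, high = prior_bound_interval(32, 400, p0=0.15)
print(f"{p_hat:.3f} +/- {bound:.3f} -> ({low:.3f}, {high:.3f})")
# prints: 0.080 +/- 0.036 -> (0.044, 0.116)
```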
If we do not have available any prior bounds on the population proportion, then we could use $\hat{p}$ in
place of $p$ in the error bound. That is, the estimated bound on the error of estimation would be

$$2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

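As a worked illustration (arithmetic added here, not taken from the source), substituting the sample proportion for the unknown population proportion in the two earlier examples gives the following estimated bounds. For the voters the result is almost identical to the conservative bound of 0.045; for the memory chips it is noticeably tighter than the prior-bound value of 0.036.

```latex
% Voters: \hat{p} = 260/500 = 0.52, n = 500
2\sqrt{\frac{(0.52)(0.48)}{500}} \approx 0.0447

% Memory chips: \hat{p} = 32/400 = 0.08, n = 400
2\sqrt{\frac{(0.08)(0.92)}{400}} \approx 0.0271
```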
One of the interpretations of the estimated proportion of voters who favor our candidate is that we
are 95% confident that this proportion is between 0.475 and 0.565. This interval represents a range
of reasonable values for the population proportion. The confidence level of 95% is determined by
the use of 2 standard deviations for the error bound and the property of normal curves that
approximately 95% of a population falls within 2 standard deviations from the mean. However,
this also implies that there is a 5% chance that the estimation error is greater than the stated bound,
or that there is a 5% chance that the interval does not contain the population proportion. If there
are very serious consequences of reporting an error bound that turns out to be too small, then we
should decide what is an acceptable risk that the error bound is too small. We can then use the
appropriate number of standard deviations so that the risk is acceptably small. Suppose for example
that we are willing to accept a risk of 1% that the error bound is too small or that the resulting
interval of reasonable values does not include the population proportion. To accomplish this, we
must find the z-score such that the area between -z and z is 0.99. To find this z-score, note that the
area above z must be 0.005 and so the total area below z is 0.995. We can use the R quantile
function qnorm for the normal distribution to obtain this value,
z = qnorm(.995)
This gives z=2.576 and so the 99% confidence interval is

$$0.52 \pm 2.576\sqrt{\frac{1}{4 \cdot 500}} \approx 0.52 \pm 0.058.$$

In this case we are 99% confident that the proportion of voters who favor our candidate is
somewhere within this interval. Such intervals are called confidence intervals. To summarize the
discussion above, a confidence interval for a population proportion based on a random sample of
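size n is

$$\hat{p} \pm z\,\sigma_{\hat{p}},$$

where z is selected so that the area between -z and z is the required level of confidence, and $\sigma_{\hat{p}}$ is
the standard deviation of $\hat{p}$ based on a bound for $p$: $\sqrt{p_0(1-p_0)/n}$ when a prior bound
$p \le p_0 < 0.5$ is available, and $1/(2\sqrt{n})$ otherwise.

In practice we can just use the estimated standard deviation for confidence intervals. This gives
the confidence interval,

$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$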

The standard deviation based on a prior bound is used for sample size determination.
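For readers working outside R, a rough Python counterpart of this summary is sketched below. It is an illustration, not the source's method: `statistics.NormalDist().inv_cdf` from the standard library plays the role of R's `qnorm`, and the function names are hypothetical. The interval uses the estimated standard deviation.

```python
import math
from statistics import NormalDist


def z_for_confidence(level):
    """z such that the area between -z and z under the standard normal is `level`."""
    # The area below z is 0.5 + level/2, e.g. 0.995 for a 99% level.
    return NormalDist().inv_cdf(0.5 + level / 2)


def estimated_interval(successes, n, level=0.95):
    """Confidence interval for a proportion using the estimated standard deviation."""
    p_hat = successes / n
    z = z_for_confidence(level)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


print(round(z_for_confidence(0.99), 3))  # prints: 2.576
low, high = estimated_interval(260, 500, level=0.99)
print(f"({low:.3f}, {high:.3f})")  # prints: (0.462, 0.578)
```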
Confidence intervals have two inter-related properties: the level of confidence and the precision
as measured by the width of the confidence interval. These properties are inversely related. If the
confidence level is increased, then the width is increased and so its precision is decreased. The
only way to increase the confidence level while maintaining or increasing precision is to use a
larger sample size. The sample size can be determined by specifying the confidence level and the
required precision. Suppose for example that we would like to estimate the proportion who favor
our candidate to within 0.02 with 95% confidence. These goals require that the confidence
interval has the form $\hat{p} \pm e$, where e denotes the required precision, 0.02. Since there is no prior
bound available for the population proportion, we must use the conservative standard deviation for
the confidence interval, $\hat{p} \pm z/(2\sqrt{n})$. Therefore, to attain these goals we must have

$$\frac{z}{2\sqrt{n}} \le e, \quad \text{that is,} \quad n \ge \left(\frac{z}{2e}\right)^2,$$

where z is chosen so that the area between -z and z is 0.95 and e=0.02. Using R gives
z = qnorm(.975)
e = .02
n = (z/(2*e))^2
n
[1] 2400.912
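The same calculation can be sketched in Python with the standard library. This is an illustrative equivalent of the R lines above; the function name is hypothetical. Because a sample size must be a whole number, the sketch also rounds up.

```python
import math
from statistics import NormalDist


def conservative_sample_size(e, level=0.95):
    """Sample size so that z / (2*sqrt(n)) <= e, using the worst case p(1-p) = 1/4."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # counterpart of qnorm(.975) at 95%
    n = (z / (2 * e)) ** 2
    return n, math.ceil(n)


n, n_rounded = conservative_sample_size(e=0.02, level=0.95)
print(round(n, 3), n_rounded)  # prints: 2400.912 2401
```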
If the actual population proportion is close to 0 or 1, then this sample size will be much larger than
is required for the stated goals. In such situations if we have a prior bound on the population
proportion, we can incorporate that bound to improve the sample size determination. If we would
like to estimate the proportion of memory chips that do not meet specifications and we have a prior

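bound, $p \le p_0 < 0.5$, for the proportion, then the confidence interval will have the form,

$$\hat{p} \pm z\sqrt{\frac{p_0(1-p_0)}{n}}.$$

This gives

$$n \ge p_0(1-p_0)\left(\frac{z}{e}\right)^2.$$

If we require that the estimate of this proportion be within 0.02 of the population proportion with
90% confidence, and we have a prior bound $p \le p_0$, then z=1.645, e=0.02, and so the
sample size would be

$$n = p_0(1-p_0)\left(\frac{1.645}{0.02}\right)^2.$$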
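A minimal Python sketch of the prior-bound sample size follows. The value p0 = 0.15 is an assumption borrowed from the earlier memory chip example, since the bound used at this point in the source did not survive; the function name is hypothetical, and z is computed from the confidence level rather than typed in as 1.645.

```python
import math
from statistics import NormalDist


def prior_bound_sample_size(p0, e, level):
    """Sample size so that z * sqrt(p0*(1-p0)/n) <= e, for a prior bound p <= p0 <= 0.5."""
    if not 0 < p0 <= 0.5:
        raise ValueError("p0 must be in (0, 0.5]")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n = p0 * (1 - p0) * (z / e) ** 2
    return n, math.ceil(n)


# ASSUMPTION: p0 = 0.15 is taken from the earlier memory chip example.
n, n_rounded = prior_bound_sample_size(p0=0.15, e=0.02, level=0.90)
print(n_rounded)  # prints: 863
```

Under that assumption the requirement drops to 863 chips, well under the sample size that the conservative bound would demand for the same precision.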
Consequential Research
Determining Sample Size: How to Ensure You Get the Correct Sample Size
Author: Scott Smith, Ph.D. | January 31, 2018

How many responses do you really need for statistically sound results? This simple question is a
never-ending quandary for researchers who use statistically based calculations to answer
different questions. A larger sample group can yield more accurate study results — but excessive
responses can be pricey.
Consequential research requires an understanding of the statistics that drive the range of sample
size decisions you need to make. A simple equation will help you put the migraine pills away
and sample confidently knowing that there is a high probability that your survey is statistically
accurate with the correct sample size.
Sample size variables based on target population
Before you can calculate a sample size, you need to determine a few things about the target
population and the sample you need:
1. Population Size — How many total people fit your demographic? For instance, if you
want to know about mothers living in the US, your population size would be the total
number of mothers living in the US. Not all populations sizes need to be this large. Even
if your population size is small, just know who fits into your demographics. Don’t worry
if you are unsure about this exact number. It is common for the population to be unknown
or approximated between two educated guesses.
2. Margin of Error (Confidence Interval) — No sample will be perfect, so you must decide
how much error to allow. The margin of error determines how much higher or lower
than the population mean you are willing to let your sample mean fall. If you’ve ever
seen a political poll on the news, you’ve seen a confidence interval. For example, it will
look something like this: “68% of voters said yes to Proposition Z, with a margin of error
of +/- 5%.”
3. Confidence Level — How confident do you want to be that the actual mean falls within
your confidence interval? The most common confidence levels are 90%, 95%, and 99%.
4. Standard of Deviation — How much variance do you expect in your responses? Since we
haven’t actually administered our survey yet, the safe decision is to use .5 – this is the
most forgiving number and ensures that your sample will be large enough.
Calculating sample size
Okay, now that we have these values defined, we can calculate our needed sample size. This can
be done using an online sample size calculator or with paper and pencil.
Your confidence level corresponds to a Z-score. This is a constant value needed for this
equation. Here are the z-scores for the most common confidence levels:
• 90% – Z Score = 1.645
• 95% – Z Score = 1.96
• 99% – Z Score = 2.576
If you choose a different confidence level, use this Z-score table* to find your score.
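If a Z-score table is not at hand, the score for any confidence level can be computed directly. The sketch below is an illustration using Python's standard library; the function name `z_score` is hypothetical.

```python
from statistics import NormalDist


def z_score(confidence_level):
    """Two-sided Z-score: the area between -z and z equals the confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> Z = {z_score(level):.3f}")
# prints:
# 90% -> Z = 1.645
# 95% -> Z = 1.960
# 99% -> Z = 2.576
```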
Next, plug in your Z-score, Standard of Deviation, and confidence interval into the sample size
calculator or into this equation:**
Necessary Sample Size = (Z-score)^2 * StdDev * (1 - StdDev) / (margin of error)^2
Here is an example of how the math works assuming you chose a 95% confidence level, .5
standard deviation, and a margin of error (confidence interval) of +/- 5%.
((1.96)^2 x .5(.5)) / (.05)^2
(3.8416 x .25) / .0025
.9604 / .0025
384.16
385 respondents are needed
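The worked example above can be checked with a short Python sketch. This is an illustration with hypothetical names; note that the value the article calls the standard deviation (0.5) enters the formula as an expected proportion p, through the term p(1-p).

```python
import math


def necessary_sample_size(z, p, margin_of_error):
    """Sample size from z^2 * p * (1 - p) / margin_of_error^2."""
    return z ** 2 * p * (1 - p) / margin_of_error ** 2


n = necessary_sample_size(z=1.96, p=0.5, margin_of_error=0.05)
print(round(n, 2), math.ceil(n))  # prints: 384.16 385
```

Rounding up rather than to the nearest whole number keeps the margin of error at or below the target.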
Taken from:

Works Cited
Smith, S. (2018, January 31). Determining sample size: How to ensure you get the correct sample size.
Retrieved April 03, 2018, from qualtrics.com: https://www.qualtrics.com/blog/determining-sample-size/

University of Texas at Dallas. (2018, April 02). University of Texas at Dallas Portal. Retrieved April 03,
2018, from www.utdallas.edu/: http://www.utdallas.edu/~ammann/stat3355/node25.html
