Submitted by:
Ron Ernest G. Borres
Submitted to:
Arch. Juanito Y. Sy
T01
What is Data Sampling?
Data sampling is an empirical and concise way of determining how much of your audience needs to
participate in your study. Sampling is necessary in research because it saves time on data
gathering while still providing reliable data to analyze and interpret. Data gathering is simple
in concept, but it is hard and time-consuming in practice. Over time, new approaches have been
developed to aid the data gathering process. These approaches help keep the margin of error as
low as possible.
When the sample size $n$ is large, the sample proportion $\hat{p}$ has approximately a normal
distribution with mean $p$, the population proportion, and variance $p(1-p)/n$ (see the plot
below). This distribution is called the sampling distribution of $\hat{p}$. One of the properties
of normal curves is that approximately 95% of a normally distributed population lies within 2
standard deviations of the mean. In this case that means that approximately 95% of all possible
samples of size $n$ have sample proportions that are within 2 standard deviations of their mean
$p$. Therefore, when we randomly select our sample proportion from the population of all possible
sample proportions, the probability is approximately 0.95 that the error of estimation, the
difference between the estimate and the actual population proportion, satisfies

$$|\hat{p} - p| \le 2\sqrt{\frac{p(1-p)}{n}}.$$

The right-hand side is called a bound on the error of estimation. It is not an absolute bound, but
is a reasonable bound in the sense that there is only a 5% chance that the error of estimation
will exceed this bound.
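The "approximately 95%" claim can be checked against the exact binomial distribution. The following is a minimal sketch in R, assuming illustrative values of p = 0.5 and n = 500 that are not taken from the text; the variable names are also only for illustration.

```r
# Exact probability that a sample proportion lands within 2 standard
# deviations of the true proportion. p and n are ASSUMED illustrative values.
p <- 0.5
n <- 500
sd_p_hat <- sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion

# Convert the interval p +/- 2 sd into counts of successes out of n.
lower_count <- ceiling(n * (p - 2 * sd_p_hat))
upper_count <- floor(n * (p + 2 * sd_p_hat))

# P(lower_count <= X <= upper_count) for X ~ Binomial(n, p).
coverage <- pbinom(upper_count, n, p) - pbinom(lower_count - 1, n, p)
coverage
```

The result should come out close to 0.95, which is what the 2 standard deviation rule predicts for large samples.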
For example, suppose we randomly select 500 voters and find that 260 of these voters favor this
candidate, so the estimated proportion is $\hat{p} = 260/500 = 0.520$. The bound on the error of
estimation depends on the unknown $p$ through $p(1-p)$, which is a bounded function of $p$ with
upper bound $1/4$. The plot below shows how this function depends on $p$.
This implies that the bound on the error of estimation is at most

$$2\sqrt{\frac{1}{4n}} = \frac{1}{\sqrt{n}} = \frac{1}{\sqrt{500}} \approx 0.045.$$

Therefore, we can make the following statement about the proportion of voters who favor our
candidate based on the information contained in our sample: the estimated proportion who favor
our candidate is 0.520 and we are about 95% certain that this estimate is no more than 0.045
from the actual population proportion. Another way of stating this is that we are about 95% certain
that the population proportion is within the interval $0.520 \pm 0.045$, that is, between 0.475 and
0.565.
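The arithmetic for the voter example can be reproduced in a few lines of R. This is a minimal sketch; the variable names are illustrative, and the bound used is the conservative one that replaces p(1 - p) by its maximum of 1/4.

```r
# Voter example from the text: 260 of 500 sampled voters favor the candidate.
n <- 500
favorable <- 260

p_hat <- favorable / n   # estimated proportion
bound <- 1 / sqrt(n)     # conservative bound: 2 * sqrt((1/4) / n)
interval <- c(lower = p_hat - bound, upper = p_hat + bound)

round(p_hat, 3)
round(bound, 3)
round(interval, 3)
```

Rounded to three decimals, this gives an estimate of 0.52 with an interval from 0.475 to 0.565, matching the values quoted above.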
This bound on the error of estimation of a population proportion is conservative in the sense that
it does not depend on the actual population proportion. However, if $p$ is close to 0 or 1, then it
will be too conservative because in this case, the value of $p(1-p)$ would be much smaller than the
upper bound. It can be seen from the plot that if $0.2 \le p \le 0.8$, then $p(1-p) \ge 0.16$, so
the upper bound becomes too conservative when the population proportion is below 0.2 or above 0.8.
In some situations, we may have prior information in the form of a bound on $p$ that allows us to
place a bound on $p(1-p)$. Suppose, for example, that we wish to estimate the proportion of memory
chips that do not meet specifications, and we know from past history that this proportion has never
exceeded 0.15. In that case, we can say that

$$p(1-p) \le (0.15)(0.85) = 0.1275.$$

If a sample of 400 memory chips is randomly selected from a production run, and it is found that
32 fail to meet specifications, then the estimated population proportion is
$\hat{p} = 32/400 = 0.08$, and a bound on the error of estimation is

$$2\sqrt{\frac{0.1275}{400}} \approx 0.036.$$
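To see how much a prior bound tightens the error bound, the memory chip example can be worked through in R. This is a sketch only: `error_bound` is a hypothetical helper name, and the grid of p values is chosen purely for illustration.

```r
# How p(1 - p) varies with p: it peaks at 1/4 when p = 0.5.
p_grid <- c(0.05, 0.15, 0.2, 0.5, 0.8, 0.95)
data.frame(p = p_grid, p_times_q = p_grid * (1 - p_grid))

# Bound on the error of estimation, 2 * sqrt(p0 * (1 - p0) / n), where p0 is
# a prior upper bound on p (p0 <= 0.5). error_bound is a hypothetical helper.
error_bound <- function(n, p0 = 0.5) {
  stopifnot(n > 0, p0 > 0, p0 <= 0.5)
  2 * sqrt(p0 * (1 - p0) / n)
}

# Memory chip example: 32 of 400 chips fail; proportion has never exceeded 0.15.
p_hat <- 32 / 400
prior_bound <- error_bound(400, p0 = 0.15)
conservative_bound <- error_bound(400)   # ignores the prior information

p_hat
round(prior_bound, 3)
round(conservative_bound, 3)
```

With the prior bound the error bound is roughly 0.036, compared with 0.05 when the conservative value of 1/4 is used for p(1 - p).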
One of the interpretations of the estimated proportion of voters who favor our candidate is that we
are 95% confident that this proportion is between 0.475 and 0.565. This interval represents a range
of reasonable values for the population proportion. The confidence level of 95% is determined by
the use of 2 standard deviations for the error bound and the property of normal curves that
approximately 95% of a population falls within 2 standard deviations from the mean. However,
this also implies that there is a 5% chance that the estimation error is greater than the stated bound,
or that there is a 5% chance that the interval does not contain the population proportion. If there
are very serious consequences of reporting an error bound that turns out to be too small, then we
should decide what is an acceptable risk that the error bound is too small. We can then use the
appropriate number of standard deviations so that the risk is acceptably small. Suppose for example
that we are willing to accept a risk of 1% that the error bound is too small or that the resulting
interval of reasonable values does not include the population proportion. To accomplish this, we
must find the z-score such that the area between -z and z is 0.99. To find this z-score, note that the
area above z must be 0.005 and so the total area below z is 0.995. We can use the R quantile
function qnorm for the normal distribution to obtain this value,
z = qnorm(.995)
This gives z=2.576 and so the 99% confidence interval is

$$0.520 \pm 2.576 \cdot \frac{1}{2\sqrt{500}}, \quad \text{that is,} \quad 0.520 \pm 0.058.$$

In this case we are 99% confident that the proportion of voters who favor our candidate is
somewhere within this interval. Such intervals are called confidence intervals. To summarize the
discussion above, a confidence interval for a population proportion based on a random sample of
size n is

$$\hat{p} \pm z\,\sigma_{\hat{p}},$$

where z is selected so that the area between -z and z is the required level of confidence, and
$\sigma_{\hat{p}}$ is the standard deviation of $\hat{p}$, taken from the conservative bound
$1/(2\sqrt{n})$ or from a prior bound on $p$.
In practice we can just use the estimated standard deviation for confidence intervals. This gives
the confidence interval,

$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

The standard deviation based on a prior bound is used for sample size determination.
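The interval based on the estimated standard deviation can be wrapped in a small R function. This is a minimal sketch under the normal approximation described above; `prop_ci` is a hypothetical helper name, not a base R function.

```r
# Confidence interval for a proportion using the estimated standard
# deviation sqrt(p_hat * (1 - p_hat) / n). prop_ci is a hypothetical helper.
prop_ci <- function(successes, n, level = 0.95) {
  stopifnot(n > 0, successes >= 0, successes <= n, level > 0, level < 1)
  p_hat <- successes / n
  z <- qnorm(1 - (1 - level) / 2)   # area between -z and z equals level
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(estimate = p_hat, lower = p_hat - half_width, upper = p_hat + half_width)
}

# Voter example from the text at 99% confidence.
ci_99 <- prop_ci(260, 500, level = 0.99)
round(ci_99, 3)
```

For the voter data this gives an interval of roughly 0.462 to 0.578. Raising `level` widens the interval, which is the trade-off between confidence and precision.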
Confidence intervals have two inter-related properties: the level of confidence and the precision
as measured by the width of the confidence interval. These properties are inversely related. If the
confidence level is increased, then the width is increased and so its precision is decreased. The
only way to increase the confidence level while maintaining or increasing precision is to use a
larger sample size. The sample size can be determined by specifying the confidence level and the
required precision. Suppose for example that we would like to estimate the proportion who favor
our candidate to within 0.02 with 95% confidence. These goals require that the confidence
interval has the form $\hat{p} \pm e$, where e denotes the required precision, 0.02. Since there is
no prior bound available for the population proportion, we must use the conservative standard
deviation for $\hat{p}$, which is $1/(2\sqrt{n})$. Setting $z/(2\sqrt{n}) = e$ and solving for n
gives

$$n = \left(\frac{z}{2e}\right)^2,$$

where z is chosen so that the area between -z and z is 0.95 and e=0.02. Using R gives
z = qnorm(.975)
e = .02
n = (z/(2*e))^2
n
[1] 2400.912
If the actual population proportion is close to 0 or 1, then this sample size will be much larger than
is required for the stated goals. In such situations if we have a prior bound on the population
proportion, we can incorporate that bound to improve the sample size determination. If we would
like to estimate the proportion of memory chips that do not meet specifications and we have a prior
bound, $p \le p_0 \le 0.5$, for the proportion, then the confidence interval will have the form,

$$\hat{p} \pm z\sqrt{\frac{p_0(1-p_0)}{n}}.$$

This gives

$$n = \frac{z^2\,p_0(1-p_0)}{e^2}.$$

If we require that the estimate of this proportion be within 0.02 of the population proportion with
90% confidence, and we have a prior bound $p_0$, then $z = 1.645$, $e = 0.02$, and so the
sample size would be

$$n = \frac{(1.645)^2\,p_0(1-p_0)}{(0.02)^2}.$$
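Both sample size calculations, with and without a prior bound, can be sketched as one R function. `sample_size` is a hypothetical helper name. The prior bound of 0.15 in the second call is an assumption borrowed from the memory chip figure quoted earlier; the text does not state the value used at this point.

```r
# Sample size so that z * sqrt(p0 * (1 - p0) / n) <= e, that is,
# n >= z^2 * p0 * (1 - p0) / e^2. sample_size is a hypothetical helper;
# p0 = 0.5 gives the conservative case with no prior bound.
sample_size <- function(e, level = 0.95, p0 = 0.5) {
  stopifnot(e > 0, level > 0, level < 1, p0 > 0, p0 <= 0.5)
  z <- qnorm(1 - (1 - level) / 2)
  ceiling(z^2 * p0 * (1 - p0) / e^2)   # round up to a whole unit
}

n_conservative <- sample_size(e = 0.02, level = 0.95)         # no prior bound
n_prior <- sample_size(e = 0.02, level = 0.90, p0 = 0.15)     # ASSUMED p0 = 0.15

n_conservative
n_prior
```

The first call rounds the 2400.912 obtained above up to 2401. Under the assumed bound of 0.15, the second call gives 863, well under half of the conservative figure.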
Consequential Research
Determining Sample Size: How to Ensure You Get the Correct Sample Size
Author: Scott Smith, Ph.D.|January 31, 2018
How many responses do you really need for statistically sound results? This simple question is a
never-ending quandary for researchers who use statistically based calculations to answer
different questions. A larger sample group can yield more accurate study results — but excessive
responses can be pricey.
Consequential research requires an understanding of the statistics that drive the range of sample
size decisions you need to make. A simple equation will help you put the migraine pills away
and sample confidently knowing that there is a high probability that your survey is statistically
accurate with the correct sample size.
Sample size variables based on target population
Before you can calculate a sample size, you need to determine a few things about the target
population and the sample you need:
1. Population Size: How many total people fit your demographic? For instance, if you
want to know about mothers living in the US, your population size would be the total
number of mothers living in the US. Not all population sizes need to be this large. Even
if your population size is small, just know who fits into your demographics. Don’t worry
if you are unsure about this exact number. It is common for the population to be unknown
or approximated between two educated guesses.
2. Margin of Error (Confidence Interval): No sample will be perfect, so you must decide
how much error to allow. The confidence interval determines how much higher or lower
than the population mean you are willing to let your sample mean fall. If you’ve ever
seen a political poll on the news, you’ve seen a confidence interval. For example, it will
look something like this: “68% of voters said yes to Proposition Z, with a margin of error
of +/- 5%.”
3. Confidence Level: How confident do you want to be that the actual mean falls within
your confidence interval? The most common confidence levels are 90%, 95%, and 99%.
4. Standard Deviation: How much variance do you expect in your responses? Since we
haven’t actually administered our survey yet, the safe decision is to use 0.5. This is the
most forgiving number and ensures that your sample will be large enough.
Calculating sample size
Okay, now that we have these values defined, we can calculate our needed sample size. This can
be done using an online sample size calculator or with paper and pencil.
Your confidence level corresponds to a Z-score. This is a constant value needed for this
equation. Here are the z-scores for the most common confidence levels:
• 90% – Z Score = 1.645
• 95% – Z Score = 1.96
• 99% – Z Score = 2.576
If you choose a different confidence level, use a Z-score table to find your score.
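As an alternative to a lookup table, the Z-score for any confidence level can be computed with R's `qnorm` quantile function. A minimal sketch, with illustrative variable names:

```r
# z-score for a two-sided confidence level: the area between -z and z equals
# the level, so the area below z is 1 - (1 - level) / 2.
conf_levels <- c(0.90, 0.95, 0.99)
z_scores <- qnorm(1 - (1 - conf_levels) / 2)
data.frame(confidence_level = conf_levels, z_score = round(z_scores, 3))
```

This reproduces the three values listed above (1.645, 1.96, and 2.576). Any other level can be added to `conf_levels`.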
Next, plug in your Z-score, Standard Deviation, and confidence interval into the sample size
calculator or into this equation:
Necessary Sample Size = (Z-score)^2 × StdDev × (1 − StdDev) / (margin of error)^2
Here is an example of how the math works assuming you chose a 95% confidence level, 0.5
standard deviation, and a margin of error (confidence interval) of +/- 5%.
((1.96)^2 × 0.5 × 0.5) / (0.05)^2
(3.8416 × 0.25) / 0.0025
0.9604 / 0.0025
384.16
Rounding up, 385 respondents are needed.
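The same calculation can be scripted so that the inputs are easy to change. This is a minimal sketch in R using the article's rounded Z-score of 1.96; the variable names are illustrative.

```r
# Sample size formula from the article, with its example inputs.
z_score <- 1.96           # 95% confidence level
std_dev <- 0.5            # most conservative choice
margin_of_error <- 0.05   # +/- 5%

raw_size <- z_score^2 * std_dev * (1 - std_dev) / margin_of_error^2
raw_size
ceiling(raw_size)         # round up: a fraction of a respondent is not possible
```

The raw value is 384.16. It is rounded up rather than to the nearest integer, because 384 respondents would fall slightly short of the target margin of error.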
Taken from:
Works Cited
Smith, S. (2018, January 01). Determining sample size: How to ensure you get the correct sample size.
Qualtrics. Retrieved April 03, 2018, from https://www.qualtrics.com/blog/determining-sample-size/
University of Texas at Dallas. (2018, April 02). University of Texas at Dallas Portal. Retrieved April 03,
2018, from http://www.utdallas.edu/~ammann/stat3355/node25.html