
Sampling Distribution and Estimation

When discussing sampling techniques, we talked about the difference between a population and a sample. We said that the population refers to the entire pool of potential outcomes; in a survey, the population includes every person in a set geographical area. A characteristic computed from a population is called a parameter. On the other hand, a sample refers to a smaller group taken from the population, used because of resource and time constraints. The idea is that even though we only consider a smaller group, the characteristic of this sample reflects or "estimates" the characteristic of the population. The sample characteristic is called a statistic.
If we expand this idea to probability distributions, we can say that the population should follow
a given distribution. Typically, most examples have the population follow the normal distribution.
In this case, what happens to the distribution of a sample if we take it from a population with
a normal distribution? Ideally, with proper sampling techniques, the sample distribution
appropriately reflects the distribution of the population. Hence, the sample distribution should
also be normally distributed in this case.
However, recall that sampling can give varied results because of variation. Say we have a
population with a mean of 10. If I ask each student to take a sample from this population, it is
possible that some will get values less than 10 and some greater than 10. Hence, there is a
certain "interval" into which the possible values can fall. If you recall the normal distribution,
this is explained by the empirical rule: the values fall within a few standard deviations of the
mean.
With this, we can see that we can have two kinds of statistical estimations.

• Point Estimate – the single value of a statistic calculated from a sample. This is usually
the sample mean.

• Interval Estimate – a range of values calculated from a sample statistic and a standardized
statistic such as the z-score. This is the range of possible values that can occur. An example
from the normal distribution is

$\bar{x} \pm zs$

$\bar{x} - zs < \mu < \bar{x} + zs$
As you can see, there will be an upper and lower limit for this range. This is an example of an
interval estimate. We will be using this a lot later.
A more general approach to this interval estimate is as follows:

$\text{Point Estimate} \pm (\text{standardized statistic}) \times (\text{standard deviation})$
Later, we will learn different standardized statistics, such as the z-statistic and the t-statistic.
Know that whenever you are asked to develop an interval estimate, we will always follow the
same format.
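The general format above can be sketched in a few lines of Python. The function name and the example numbers here are illustrative, not from the notes.

```python
# Minimal sketch of the general interval-estimate format:
#   Point Estimate ± (standardized statistic)(standard deviation)
# Function name and inputs are illustrative.

def interval_estimate(point_estimate, critical_value, std_dev):
    """Return the (lower, upper) limits of an interval estimate."""
    margin = critical_value * std_dev
    return point_estimate - margin, point_estimate + margin

# Example: sample mean 10, z critical value 1.96, standard deviation 0.5
lower, upper = interval_estimate(10, 1.96, 0.5)
print(round(lower, 2), round(upper, 2))  # 9.02 10.98
```

Every case that follows plugs a different point estimate, critical value, and standard deviation into this same pattern.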
We will now consider five different cases where we can analyze for an interval estimate.

Case 1: Estimating µ when n is large


In this first case, we will try to estimate the population mean by obtaining a large sample. In the
simplest terms, we can consider a sample large when it has at least 30 data points, or n ≥ 30. If
that is the case, we can analyze using this case.
In this case, our point estimate is simply the sample mean.
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Our interval estimate will be based on the z-statistic. Recall that the z-statistic is

$z = \dfrac{x - \mu}{\sigma}$

However, for this analysis, we do not have to calculate z. Rather, our z-statistic will be based
on critical values. Remember that using the z-statistic we can get the area under the curve, P(z).

If we look at this diagram, we can see the area in green. This is our interval estimate. Notice
the 95% attributed to this example. This means that with this particular interval estimate, we
can be sure that 95% of all sample means will fall within the said interval. In other words, we
can be "confident" in our interval. Say we have an exam with 100 items. If I ask how "confident"
you are that you will score between 70 and 80, you might say that you are 60% sure. But what
if I ask how "confident" you are that you will score between 60 and 90? Notice that the interval
became wider. This makes you a lot more confident, as there is a bigger room for error. Thus,
we can say that if the interval is wider, we are more confident.

Thus, we can see that our "confidence" is highly dependent on the critical values of our z-statistic.
A 90% confidence requires critical values at ±1.65, a 95% confidence requires critical values at
±1.96, and so on. With this, we can call our interval estimate a "confidence interval". This
confidence interval is defined by the shaded area under the curve. This "confidence" is called
the level of confidence.
If we look at the opposite side, we focus on the area under the curve outside of the confidence
interval. In this case, we are seeing how "significant" the interval is. Thus, if we have a 90%
confidence interval, we have a 10% level of significance. Similarly, a 95% confidence interval
will have a 5% level of significance. This is important for you to understand, as we will keep
coming back to this idea later.
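The link between level of confidence, level of significance, and the critical values can be checked with Python's standard library; `statistics.NormalDist` gives the inverse normal CDF, so no z-table is needed. (The table values ±1.65 and ±1.96 quoted above are the usual rounded figures; the computation below gives 1.645 and 1.960.)

```python
# Relating level of confidence, level of significance (alpha),
# and the two-sided z critical value z_{alpha/2}.
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z critical value for a given confidence level."""
    alpha = 1 - confidence                      # level of significance
    return NormalDist().inv_cdf(1 - alpha / 2)  # inverse standard normal CDF

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence -> alpha = {1 - conf:.2f}, "
          f"z = {z_critical(conf):.3f}")
```

Running this prints z ≈ 1.645 for 90%, 1.960 for 95%, and 2.576 for 99% confidence.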
The confidence interval for this case is
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
This interval varies depending on the value of your sample mean. However, this confidence
interval gives us a range of values within which we can capture the population mean with some
degree of confidence.
For example, a computed confidence interval might come out as 3.98 to 4.54 at a 95% level of confidence.
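A sketch of this Case 1 computation in Python. The inputs `x_bar`, `sigma`, and `n` below are assumed values, chosen so the result reproduces the 3.98 to 4.54 interval quoted above; the original example data are not shown in these notes.

```python
# Case 1 confidence interval: sigma known, n large (n >= 30).
from math import sqrt
from statistics import NormalDist

x_bar = 4.26       # sample mean (assumed value)
sigma = 1.43       # known population standard deviation (assumed value)
n = 100            # sample size, large since n >= 30 (assumed value)
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2} ~ 1.96
margin = z * sigma / sqrt(n)                        # z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"{confidence:.0%} CI for mu: ({lower:.2f}, {upper:.2f})")  # (3.98, 4.54)
```

Note how the code is just the general format again: point estimate x̄, critical value z, and standard error σ/√n playing the role of the standard deviation.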
In this case, we assumed that the population standard deviation, σ, is known. However, there
are cases where we do not know the population standard deviation. This leads us to the second
case.
Case 2: Estimating µ when n is large and σ is unknown
When the population standard deviation is unknown, we have no choice but to estimate it. We
can do this by calculating the sample standard deviation.

$\sigma \approx s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

With this, we can then estimate the confidence interval using the sample standard deviation, s.
The confidence interval for this case is
$\bar{x} - z_{\alpha/2}\dfrac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + z_{\alpha/2}\dfrac{s}{\sqrt{n}}$
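Case 2 can be sketched the same way, now starting from raw data. The data values below are invented for illustration; note that Python's `statistics.stdev` already uses the n − 1 divisor, matching the formula for s above.

```python
# Case 2 confidence interval: sigma unknown, estimated by the sample
# standard deviation s, with n large (n >= 30).
from math import sqrt
from statistics import NormalDist, mean, stdev

# Invented data for illustration (n = 30)
data = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3,
        9.6, 10.1, 10.2, 9.9, 10.0, 10.5, 9.8, 10.1,
        9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.8, 10.0,
        10.2, 9.9, 10.4, 10.0, 9.6, 10.1]

n = len(data)
x_bar = mean(data)                   # point estimate of mu
s = stdev(data)                      # sample std. dev. (n - 1 divisor)
z = NormalDist().inv_cdf(0.975)      # z_{alpha/2} for 95% confidence
lower = x_bar - z * s / sqrt(n)
upper = x_bar + z * s / sqrt(n)
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```

The only change from Case 1 is substituting s for σ; everything else in the format stays the same.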
