[Figure: making theory-based inferences about the population based on sample statistics]
Population characterized by
its parameters 𝜇 and 𝜎
Lesson Proper
Suppose that you just lost a game, and your coach wants you to work on your 3-point shooting and then
report your progress. In doing this, you might be interested in computing the mean number of 3-point shots
made and the standard deviation across the shooting practice routines you set. The sample mean (𝑥̅) and the
sample standard deviation (𝑠) you recorded are considered “estimates” of the population parameters (your
true 3-point shooting characteristics).
Using proper statistical terms, we say that the sample mean (𝑥̅) and the sample standard deviation (𝑠) are
point estimates of the population parameters. Below is the full definition:
Def. Point Estimate
A point estimate is a single number calculated from sample data to estimate a population parameter. The
rule (or formula) by which the point estimate is computed is called the point estimator.
Possible point estimators of the center of a population include the sample mean, sample median, and
sample mode.
However, in real life situations, recall that there could be various statistics (various sample means,
medians, and modes) to use as point estimates since sampling is a random process. In this regard, to be
able to determine which point estimate is best, we need to determine how the point estimates behave in
repeated sampling—this is why we discussed the sampling distribution in the previous lecture.
Now, suppose the image above is the result of four of your friends shooting arrows at a target. As an
analogy, each of your friends is a point estimator; each arrow hit is a point estimate; and the innermost
circle (otherwise known as the Bull’s Eye) is the desired parameter. Suppose each of them was able to hit
the Bull’s Eye. Which of your friends would you consider the best marksman? To answer this, we need the
following definition:
Given the definition of an unbiased estimate, it should be clear that your friend who showed the rightmost
performance is the best marksman.
Further, note that when the sample size is large, the sample mean (𝑥̅) is considered the best point
estimator (an unbiased estimator) of the population mean, since the sample mean has the smallest variance
among the possible estimators of the population mean.
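The claim about variance can be checked informally by repeated sampling. The sketch below is only an illustration under stated assumptions: it draws samples from a made-up normal population (mean 0, standard deviation 1, sample size 30, none of which come from the lecture) and compares how much the sample mean and the sample median vary from sample to sample. The function name `estimator_variances` is hypothetical.

```python
import random
import statistics


def estimator_variances(n=30, trials=5000, mu=0.0, sigma=1.0, seed=1):
    """Return (variance of sample means, variance of sample medians).

    The population is assumed normal with the given mu and sigma;
    these values are illustrative, not taken from the lecture.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = estimator_variances()
print(f"variance of sample means:   {var_mean:.4f}")
print(f"variance of sample medians: {var_median:.4f}")
```

For a normal population, the sample means should cluster more tightly around the true mean than the sample medians do, which is the sense in which the sample mean is the preferred estimator here.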
Remark. In making inferences about the population, it is imperative to have an estimator whose sampling
distribution is centered very close to, if not exactly on, the population parameter. Through the Central
Limit Theorem, we know that the sampling distribution of the estimator centers on the parameter of the
population we are attempting to estimate. However, recall that all we have is the estimate computed
using a sample of size 𝑛. The fact that it is taken only from a sample prompts us to ask the following
question: How far from the true value of the parameter does our estimate lie? This question is answered
by the concept of the error of estimation.
Def. Error of Estimation
The absolute value of the difference between the estimate and the estimated parameter is called the
error of estimation.
Recall that for a point estimator with a normal distribution, the Empirical Rule states that 95% of all the
point estimates lie within two standard deviations (or more exactly 1.96 standard deviations) from the
mean of that distribution. See the illustration below:
[Figure: sampling distribution of the sample estimator centered on the true value, with the margin of error
extending 1.96 SE on each side of the true value and a particular estimate marked]
This implies that, for unbiased estimators, the difference between a point estimate and the true value of
the parameter will be less than 1.96 standard deviations (standard errors) for about 95% of all samples.
The quantity 1.96 × SE is called the 95% margin of error, or simply the margin of error.
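A small simulation can make the 95% statement concrete. This is a minimal sketch, assuming a normal population with made-up values (mean 50, standard deviation 10, sample size 40); the function name `share_within_margin` is hypothetical. It counts how often the error of estimation of the sample mean falls below 1.96 × SE.

```python
import math
import random
import statistics


def share_within_margin(mu=50.0, sigma=10.0, n=40, trials=10000, seed=7):
    """Fraction of samples whose error of estimation is below 1.96 * SE.

    mu, sigma and n are illustrative assumptions, not lecture values.
    """
    rng = random.Random(seed)          # fixed seed for repeatability
    se = sigma / math.sqrt(n)          # standard error of the sample mean
    margin = 1.96 * se                 # 95% margin of error
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(xbar - mu) < margin:    # error of estimation within the margin
            hits += 1
    return hits / trials


share = share_within_margin()
print(f"share of estimates within the margin of error: {share:.3f}")
```

The printed share should come out close to 0.95, in line with the Empirical Rule statement above.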
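[Figure: standard normal curve with an area of $\alpha/2$ in each tail, bounded by $-z_{\alpha/2}$ and $z_{\alpha/2}$]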
This brings us to the general formula for computing the $100(1-\alpha)\%$ Confidence Interval Estimate. That is,
$$100(1-\alpha)\%\ \text{Confidence Interval Estimate} = \bar{x} \pm z_{\alpha/2} \times \text{SE}$$
where $z_{\alpha/2}$ is the standard normal score with an area of $\alpha/2$ to its right (the right tail of
the $z$-distribution), and $-z_{\alpha/2}$ is the standard normal score with an area of $\alpha/2$ to its
left (the left tail of the $z$-distribution). Note that this formula produces two limits: the lower
confidence limit and the upper confidence limit.
Example 4.2.2. If $(1-\alpha) = 0.90$, compute $z_{\alpha/2}$ and $-z_{\alpha/2}$.
Solution to Example 4.2.2
Since $(1-\alpha) = 0.90$, $\alpha$ must be equal to 0.10, and $\alpha/2 = 0.05$. This means that we are
looking for the $z$-scores that cut off an area of 0.05 in the right tail and 0.05 in the left tail. Using
the NORMSINV function, we have
$z_{\alpha/2} = 1.64$ and $-z_{\alpha/2} = -1.64$
(Using MS Excel, the values of $z_{\alpha/2}$ and $-z_{\alpha/2}$ are computed as =norm.s.inv(0.95) and
=norm.s.inv(0.05), respectively.)
Remark. The values of $z_{\alpha/2}$ commonly used for confidence intervals are outlined in the table below:
Confidence Coefficient $(1-\alpha)$    $\alpha$    $\alpha/2$    $z_{\alpha/2}$
0.90                                   0.10        0.05          1.64
0.95                                   0.05        0.025         1.96
0.98                                   0.02        0.01          2.33
0.99                                   0.01        0.005         2.58
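The tabulated values can also be reproduced outside MS Excel. The following is a minimal sketch using Python's standard library, where `NormalDist().inv_cdf` plays the role of =norm.s.inv; the helper name `z_critical` is hypothetical and not part of the lecture.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a given confidence coefficient (1 - alpha)."""
    alpha = 1 - confidence
    # Area to the left of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for confidence in (0.90, 0.95, 0.98, 0.99):
    print(f"{confidence:.2f}  z = {z_critical(confidence):.4f}")
```

The unrounded results agree with the table once rounded to two decimal places.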
Example 4.3.3. Suppose a dietician selected a random sample of 𝑛 = 50 female UST-SHS students and found
that their mean daily intake of flour-based products is 𝑥̅ = 750 grams per day. Further, the dietician was
informed by the central statistics department of the university that the standard deviation of the
population is 𝜎 = 35 grams per day. Use this sample to provide a 99% confidence interval estimate for the
mean daily intake of flour-based products of these students.
Solution to Example 4.3.3.
Since the sample size is large, then by the Central Limit Theorem, we know that the standard error (SE) is
equal to $\sigma/\sqrt{n} = 35/\sqrt{50}$. Given this, the 99% Confidence Interval Estimate is
$$99\%\ \text{Confidence Interval Estimate} = 750 \pm 2.58 \times \frac{35}{\sqrt{50}}$$
or, in confidence statement format, we say that we are 99% confident that
$737.23 < \mu < 762.77$.
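The computation in this example can be sketched in a few lines of Python. This is only an illustration: the function name `z_interval` is hypothetical, and it uses the unrounded critical value (about 2.576) instead of the tabulated 2.58, so the limits differ slightly from those in the solution.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)                   # z_{alpha/2} * SE
    return xbar - margin, xbar + margin


# Values from Example 4.3.3: xbar = 750, sigma = 35, n = 50, 99% confidence.
lower, upper = z_interval(xbar=750, sigma=35, n=50, confidence=0.99)
print(f"{lower:.2f} < mu < {upper:.2f}")
```

With the unrounded critical value the interval is roughly 737.25 to 762.75; the small gap from 737.23 and 762.77 comes only from rounding $z_{\alpha/2}$ to 2.58.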
Remark. In the previous examples, we constructed confidence interval estimates that depend on the
sampling distribution of the sample mean. This means that all we know so far is how to construct a
confidence interval estimate when the population standard deviation (𝜎) is known. Now, when the
population standard deviation is unknown, there is just a slight change in the construction of the
confidence interval, and this change involves the tabular value to use and the standard error of the
sampling distribution of the sample mean.
Now, with 𝑥̅ = 56.05 and 𝑠 = 7.5147, the 95% Confidence Interval Estimate of the true average weight of
all learners is
$$56.05 \pm t_{0.025,\,19} \times \frac{7.5147}{\sqrt{20}} = 56.05 \pm 2.09 \times \frac{7.5147}{\sqrt{20}}$$
(The 𝑡-value is computed using MS Excel or the 𝑡-table. The syntax is =t.inv(0.975,19) or
=t.inv.2t(0.05,19))
or, in confidence statements, we say that we are 95% confident that the population mean (𝜇) is between
52.54 kilograms and 59.56 kilograms.
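The arithmetic behind these limits can be sketched as follows. Python's standard library has no 𝑡-distribution, so this sketch assumes the critical value 2.09 is supplied from MS Excel or the 𝑡-table, as in the text; the function name `t_interval` is hypothetical.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the tabular t-value for n - 1 degrees of freedom,
    looked up separately (for example, =t.inv.2t(0.05,19) in MS Excel).
    """
    margin = t_crit * s / math.sqrt(n)  # t-value times estimated SE
    return xbar - margin, xbar + margin


# Values from the example: xbar = 56.05, s = 7.5147, n = 20, t = 2.09.
lower, upper = t_interval(xbar=56.05, s=7.5147, n=20, t_crit=2.09)
print(f"{lower:.2f} kg < mu < {upper:.2f} kg")
```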
Remark. A property of the $t$-distribution is that as the degrees of freedom increase, the $t$-distribution
approaches the standard normal distribution ($z$-distribution). Note that this is also in harmony with the
Central Limit Theorem discussed in the previous chapter. As such, given a sample size of at least 30
and an unknown population standard deviation ($\sigma$), we take the tabular value from the
$z$-distribution and use the estimated standard error $s/\sqrt{n}$ to construct the $100(1-\alpha)\%$
confidence interval estimate. That is,
$$100(1-\alpha)\%\ \text{Confidence Interval Estimate} = \bar{x} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}$$
Example 4.3.5. A random sample of statistics students was asked to estimate the total number of hours
they spend watching television in an average week. The responses are shown in the table below:
0 3 1 20 9
5 10 1 10 4
4 2 4 4 5
5 10 1 10 4
4 4 5 4 2
20 9 0 3 1
4 10 3 4 6
Use these data to construct a 98% Confidence Interval Estimate for the mean number of hours that statistics
students spend watching television in one week.
Solution to Example 4.3.5.
Using MS Excel, we get that $\bar{x} = 5.46$ and $s = 4.7113$. Now, since $n = 35$ is at least 30, we use
the $z$-distribution, and the 98% confidence interval estimate is computed as
$$5.46 \pm 2.33 \times \frac{4.7113}{\sqrt{35}}$$
which means that we are 98% confident that $3.60 < \mu < 7.32$ hours.
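The summary statistics and the interval for this example can be reproduced from the raw responses. This is a minimal sketch using Python's standard library in place of MS Excel; the variable names are illustrative, and the unrounded critical value (about 2.326) is used instead of the tabulated 2.33, so the upper limit comes out slightly below 7.32.

```python
import math
import statistics
from statistics import NormalDist

# Responses from Example 4.3.5 (hours of television per week).
hours = [
    0, 3, 1, 20, 9,
    5, 10, 1, 10, 4,
    4, 2, 4, 4, 5,
    5, 10, 1, 10, 4,
    4, 4, 5, 4, 2,
    20, 9, 0, 3, 1,
    4, 10, 3, 4, 6,
]

n = len(hours)
xbar = statistics.mean(hours)    # sample mean
s = statistics.stdev(hours)      # sample standard deviation (n - 1 divisor)

z = NormalDist().inv_cdf(0.99)   # z_{alpha/2} for 98% confidence
margin = z * s / math.sqrt(n)    # z_{alpha/2} times estimated SE
lower, upper = xbar - margin, xbar + margin

print(f"n = {n}, xbar = {xbar:.2f}, s = {s:.4f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```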
Remark. When to use the 𝑧- or 𝑡-distribution for creating confidence interval estimates:
Path 1: Sigma is Known → Use 𝑧-distribution no matter the sample size.
Path 2: Sigma is Unknown → Use 𝑡-distribution if 𝑛 < 30.
Path 3: Sigma is Unknown → Use 𝑧-distribution if 𝑛 ≥ 30.
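The three paths above amount to a short decision rule, sketched below. The function name `choose_distribution` and its return labels are hypothetical and only mirror the remark.

```python
def choose_distribution(sigma_known, n):
    """Return "z" or "t" following the three paths in the remark."""
    if sigma_known:
        return "z"                    # Path 1: sigma known, any sample size
    return "t" if n < 30 else "z"     # Paths 2 and 3: sigma unknown


print(choose_distribution(sigma_known=True, n=12))   # sigma known
print(choose_distribution(sigma_known=False, n=20))  # sigma unknown, n < 30
print(choose_distribution(sigma_known=False, n=35))  # sigma unknown, n >= 30
```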
Supplementary Exercises:
1. A machine produces cylindrical metal pieces with a mean diameter of 14.20 cm when the machine is in
good condition. A quality engineer evaluates the condition of the machine using a random sample of
36 runs, which resulted in a mean diameter of 14.25 cm with a standard deviation of 0.30 cm. Using
this information, construct a 95% confidence interval estimate of the true average diameter of the
cylindrical metal pieces produced by the machine.
2. A 99% confidence interval estimate can be interpreted to mean that:
Choice A: If all possible samples are taken and confidence interval estimates are developed, 99% of
them would include the true population mean somewhere within their interval.
Choice B: We have 99% confidence that we have selected a sample whose interval includes the
population mean.
Choice C: Both of the above.
Choice D: None of the above.
3. A local government official observes an increase in the number of individuals with cardiovascular
and obesity problems in his barangay. In order to improve the health conditions of his constituents,
he aims to promote an easy and cheap way to reduce weight. It is known that obesity results in
greater risk of having illnesses like diabetes and heart problems. He encouraged his constituents to
participate in his Dance for Life project every weekend for 3 months. To know if the program is
effective in reducing weight, he randomly selected 12 participants from the group who completed
the program. The weight loss data, in kilograms, of the 12 randomly selected participants after
completing the program are: 0.5, 0.7, 0.9, 1.1, 1.2, 1.3, 1.4, 2.0, 2.3, 2.4, 2.7, and 3.0. It is known that
the weight loss of those who have completed the dance program follows a normal distribution with
a variance of 3.24 kg². Construct a 90% confidence interval estimate for the true mean weight loss
of the participants who have completed the dance program.
4. The Human Toxome Project (HTP) is working to understand the scope of industrial pollution in the
human body. Industrial chemicals may enter the body through pollution or as ingredients in
consumer products. In October 2008, the scientists at HTP tested cord blood samples for 20
newborn infants in the United States. The cord blood of the "In utero/newborn" group was tested
for 430 industrial compounds, pollutants, and other chemicals, including chemicals linked to brain
and nervous system toxicity, immune system toxicity, reproductive toxicity, and fertility problems.
There are health concerns about the effects of some chemicals on the brain and nervous system. The
table below shows how many of the targeted chemicals were found in each infant’s cord blood.
79 145 147 160 116 100 159 151 156 126
137 83 156 94 121 144 123 114 139 99
Use this sample data to construct the 90% confidence interval estimate for the mean number of
targeted industrial chemicals to be found in an infant’s blood.
- End of the Lecture Transcript –