Lecture Transcript 4 (Estimation of Parameters)

A Transcript of the Lecture in Estimation of Parameters

Lecture Objective:

This lecture aims to impart to the learners the basics of making inferences regarding the population of a
random variable.

Lecture 4.1
Point Estimation

Introduction

In the previous lecture, we discussed the link between probability and the statistics of samples. We also
learned from the Central Limit Theorem that even if the population is not normal, the sampling distribution
of the sample mean will be approximately normal when 𝑛 is large (conservatively, 𝑛 ≥ 30). In this
lecture, we will learn that we do not always need the population parameters to characterize a population.
With certain techniques, we can make inferences about a population based
on the statistics of samples. This process is called inferential statistics, and it is illustrated below:

The Sampling-Inferencing Process

[Figure: A random sample, characterized by its statistics 𝑥̅ and 𝑠, is drawn from a population characterized by its parameters 𝜇 and 𝜎. Theory-based inferences about the population are then made based on the statistics of the sample.]

Lesson Proper

Suppose that you just had a losing game, and your coach wants you to work on your 3-point shooting and then
report your progress. To do this, you might compute the mean number of 3-point shots made
and the standard deviation across the shooting practice routines you set. The sample mean (𝑥̅) and the sample
standard deviation (𝑠) you record are considered “estimates” of the population parameters (your true
3-point shooting characteristics).

In proper statistical terms, we say that the sample mean (𝑥̅) and the sample standard deviation (𝑠) are
point estimates of the population parameters. Below is the full definition:

Def. Point Estimate
A point estimate is a single number calculated from sample data to estimate a population parameter. The rule
(or formula) by which the point estimate is computed is called the point estimator.

Possible point estimators of the center of a population include the sample mean, sample median, and sample mode.

However, in real life situations, recall that there could be various statistics (various sample means,
medians, and modes) to use as point estimates since sampling is a random process. In this regard, to be
able to determine which point estimate is best, we need to determine how the point estimates behave in
repeated sampling—this is why we discussed the sampling distribution in the previous lecture.

Now, suppose the image above shows the results of four of your friends shooting arrows at a target. To put things
in an analogy, each of your friends is a point estimator; each arrow hit
is a point estimate; and the innermost circle (otherwise known as the Bull’s Eye) is the desired
parameter. Suppose each of them was able to hit the Bull’s Eye. Which of your friends would you consider the
best marksman? To answer this, we need the following definition:

Def. Unbiased and Biased Estimate

An estimator of a parameter is said to be unbiased if the mean of its sampling distribution is equal to the
true value of the parameter. Otherwise, the estimator is said to be biased.

Given the definition of an unbiased estimate, it should be clear that your friend who showed the rightmost
performance is the best marksman.

Further, note that when the sample size is large, the sample mean (𝑥̅) is considered the best point
estimator of the population mean: it is unbiased, and it has the smallest variance
among the possible estimators of the population mean.
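The behavior of point estimators in repeated sampling can be checked with a small simulation. The sketch below is only an illustration under assumed values: the normal population (mean 50, standard deviation 10), the sample size, the number of repetitions, and the function name estimator_spread are all hypothetical choices, not part of the lecture.

```python
import random
import statistics


def estimator_spread(n=30, reps=5000, mu=50.0, sigma=10.0, seed=1):
    """Draw `reps` samples of size `n` from an assumed normal population
    and summarise how the sample mean and sample median behave."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return {
        "mean_of_means": statistics.mean(means),
        "mean_of_medians": statistics.mean(medians),
        "var_of_means": statistics.variance(means),
        "var_of_medians": statistics.variance(medians),
    }


result = estimator_spread()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For this assumed normal population, both estimators should center near 50, while the sample means should vary less from sample to sample than the sample medians.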

Remark. In making inferences about the population, it is imperative to have an estimator whose sampling
distribution is centered at, or very close to, the population parameter. Through the Central
Limit Theorem, we know that the sampling distribution of the estimator centers on the parameter
of the population we are attempting to estimate. However, recall that all we have is the estimate computed
from a sample of size 𝑛. The fact that it is only taken from a sample prompts the following question:
How far from the true value of the parameter does our estimate lie? This question is answered by the
concept of the error of estimation.

Def. Error of Estimation
The absolute value of the difference between the estimate and the estimated parameter is called the
error of estimation.

Recall that for a point estimator with a normal distribution, the Empirical Rule states that about 95% of all the
point estimates lie within two standard deviations (more exactly, 1.96 standard deviations) of the
mean of that distribution. See the illustration below:

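[Figure: Sampling distribution of the sample estimator, centered at the true value. The margin of error extends 1.96 SE on either side of the true value, and a particular estimate is marked within this range.]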

This implies that for unbiased estimators, the difference between the point estimate and the true value of
the parameter will be less than 1.96 standard deviations (standard errors) for about 95% of all samples. This
bound is called the 95% margin of error, or simply the margin of error.

Def. Formula for the 95% Margin of Error

The 95% margin of error when 𝑛 is sufficiently large (𝑛 ≥ 30), is computed as

95% Margin of Error = ±1.96 × SE = ±1.96 × 𝜎/√𝑛

Remark. Note that it is possible for the error of estimation to exceed the margin of error; however, this
is unlikely (a 5% chance).

Example 4.1.1. A random sample of 50 polar bears produced an average weight of 980 pounds. Further, it
was determined that the population standard deviation is 105 pounds. Use this information to 1) compute
the margin of error; and 2) to provide an estimate of the average weight of all Arctic polar bears.

Solution to Example 4.1.1.
The problem above asks us to determine the margin of error and to infer the average weight of all Arctic
polar bears (𝜇), given the sample mean 𝑥̅ = 980, the sample size 𝑛 = 50, and the population standard deviation 𝜎 = 105.

Using the concepts discussed above, the 95% margin of error of the sample mean is

±1.96 × 105/√50 ≈ ±29 pounds

In addition, this also means that the average weight of all polar bears may be as low as 951 pounds and as
high as 1009 pounds.
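The arithmetic in Example 4.1.1 can be reproduced with a few lines of Python. This is a minimal sketch; the function name margin_of_error is a hypothetical helper introduced only for illustration, and it simply applies the 95% margin of error formula ±1.96 × 𝜎/√𝑛.

```python
import math


def margin_of_error(sigma, n, z=1.96):
    """Margin of error z * sigma / sqrt(n); z = 1.96 gives the 95% version."""
    return z * sigma / math.sqrt(n)


# Values from Example 4.1.1 (polar bear weights, in pounds)
x_bar, sigma, n = 980, 105, 50

moe = margin_of_error(sigma, n)
lower, upper = x_bar - moe, x_bar + moe

print(f"margin of error = {moe:.2f} pounds")
print(f"estimate range  = {lower:.0f} to {upper:.0f} pounds")
```

Rounded to whole pounds, the result agrees with the ±29 pounds and the 951 to 1009 pound range given above.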

Example 4.1.2. Calculate the margin of error in estimating a population mean 𝜇 for 𝑛 = 30 and 𝜎² = 0.2.

To compute the 95% margin of error, we need to find the standard error first. Since 𝜎² = 0.2, we have 𝜎 = √0.2, and so

SE = √0.2/√30

Thus, the 95% margin of error is

95% Margin of Error = ±1.96 × √0.2/√30 = ±0.1600.

Lecture 4.2
Interval Estimation

Introduction

Previously, we learned about point estimators and the concept of the margin of error. In this part of the
lecture, we will study a rule used to produce an interval that “reasonably” contains the
parameter we want to estimate. This concept is called the confidence interval.

Def. Confidence Interval and the Confidence Coefficient
A confidence interval is an interval estimate that is thought to contain the parameter of interest. The
level of confidence attached to the interval is measured by the confidence coefficient, defined as 1 − 𝛼.

Further, the confidence coefficient gives the probability that a confidence interval constructed by this rule
will contain the estimated parameter.

Lesson Proper

The most commonly used confidence interval is the 95% confidence interval. A 95% confidence interval implies that the
probability that the interval created will contain the actual parameter estimated is 0.95. However, it should
be clear that one can raise or lower the level of confidence when making confidence intervals. Some other
confidence coefficients typically used are 0.90, 0.98, and 0.99.

A confidence interval and confidence coefficient may also be thought of through the following analogy:

Consider a person throwing hoops at a wooden post. The wooden post represents the parameter wished
to be estimated, and the hoops are the different confidence intervals. Each time a hoop is thrown, it is hoped
that the hoop hoops the post. Similarly, when one draws a sample and constructs a confidence interval, we
hope to include the target parameter. However, it should be clear that sometimes missing is inevitable.

The success rate, that is, the proportion of intervals that “hoop the post” in repeated sampling
(repeated throwing), is called the confidence coefficient.

Def. 95% Confidence Interval Estimate Formula
The 95% confidence interval used to estimate the population parameter (95% Confidence Interval
Estimate) is computed as

95% Confidence Interval Estimate = Point Estimate ± 1.96 × SE = 𝑥̅ ± 1.96 × SE

Remark.
Creating a 95% confidence interval means that, in repeated sampling, about 95% of the intervals constructed
this way will contain the parameter.
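The "repeated throwing" idea can be imitated by simulation. The sketch below assumes a normal population with 𝜇 = 56 and 𝜎 = 9 (hypothetical values chosen only for illustration), repeatedly draws samples of size 30, builds 𝑥̅ ± 1.96 × SE each time, and counts how often the interval contains 𝜇. The function name coverage is likewise an assumption.

```python
import math
import random
import statistics


def coverage(mu=56.0, sigma=9.0, n=30, reps=10000, z=1.96, seed=7):
    """Proportion of simulated intervals x_bar +/- z * SE that contain mu."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)  # standard error with sigma known
    hits = 0
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - z * se < mu < x_bar + z * se:
            hits += 1  # this "hoop" landed on the post
    return hits / reps


rate = coverage()
print(f"proportion of intervals containing mu: {rate:.3f}")
```

The printed proportion should be close to 0.95, which is the confidence coefficient. Any single interval either contains 𝜇 or misses it; the 95% describes the rule over many samples.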

Figure 1. A Person Throwing a Hoop at a Post

Real-world problems often lead to the estimation of 𝜇. Below are some examples:

• The average achievement of students in a particular academy
• Average compressive strength of a new type of cement
• Average number of deaths per age category of a certain disease
• The average demand for a new lip tint line

Example 4.2.1. Suppose a random sample of 𝑛 = 30 observed weights produced a sample mean of
𝑥̅ = 56.05 kilograms, and the standard deviation of the population (𝜎) is known to be 9 kilograms. Create a 95%
confidence interval estimate for the mean of the population.

Given 𝑥̅ = 56.05, 𝜎 = 9, and 𝑛 = 30, to create a 95% confidence interval estimate, we need to use the formula

95% Confidence Interval Estimate = 𝑥̅ ± 1.96 × SE = 𝑥̅ ± 1.96 × 𝜎/√𝑛

and so we have

56.05 ± 1.96 × 9/√30 ≈ 56.05 ± 3.22

which means that the 95% confidence interval estimate is

52.83 < 𝜇 < 59.27.

Remark. The 95% Confidence Interval Estimate defined previously as 𝑥̅ ± 1.96 × 𝜎/√𝑛 may also be expressed
as 𝑥̅ − 1.96 × 𝜎/√𝑛 < 𝜇 < 𝑥̅ + 1.96 × 𝜎/√𝑛. Further, it is also a correct statement to say that

𝑃(𝑥̅ − 1.96 × 𝜎/√𝑛 < 𝜇 < 𝑥̅ + 1.96 × 𝜎/√𝑛) = 0.95.

Remark. Recall that we may also want to change the confidence coefficient from (1 − 𝛼) = 0.95 to another
confidence level. To do this, we just have to replace 𝑧 = 1.96 with the standard score that
encloses an area of (1 − 𝛼) in the center of the curve.

[Figure: Standard normal curve. The whole red area between −𝑧_{𝛼/2} and 𝑧_{𝛼/2} is equivalent to (1 − 𝛼). Each of the unshaded tail areas is equal to 𝛼/2.]

This brings us to the general formula for computing the (1 − 𝛼)100% Confidence Interval Estimate. That is,

(1 − 𝛼)100% Confidence Interval Estimate = 𝑥̅ ± 𝑧_{𝛼/2} × SE

where 𝑧_{𝛼/2} is the standard normal score associated with an area of 𝛼/2 in the right tail of
the 𝑧-distribution, and −𝑧_{𝛼/2} is the standard normal score associated with an area of 𝛼/2
in the left tail of the 𝑧-distribution. Note that this formula produces two limits: the lower confidence limit
and the upper confidence limit.

Example 4.2.2. If (1 − 𝛼) = 0.90, compute 𝑧_{𝛼/2} and −𝑧_{𝛼/2}.

Solution to Example 4.2.2
Since (1 − 𝛼) = 0.90, 𝛼 must be equal to 0.10, and 𝛼/2 = 0.05. This means that we are
looking for the 𝑧-scores that correspond to an area of 0.05 in the right tail and
0.05 in the left tail. Now, using the NORMSINV function, we have

𝑧_{𝛼/2} = 1.64 and −𝑧_{𝛼/2} = −1.64

(Using MS Excel, the values of 𝑧_{𝛼/2} and −𝑧_{𝛼/2} are computed as =norm.s.inv(0.95) and =norm.s.inv(0.05), respectively.)

Remark. The values of 𝑧 commonly used for the confidence intervals are outlined in the table below:

Confidence Coefficient
(1 − 𝛼)      𝛼       𝛼/2      𝑧_{𝛼/2}
0.90         0.10    0.05     1.64
0.95         0.05    0.025    1.96
0.98         0.02    0.01     2.33
0.99         0.01    0.005    2.58
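The tabular values above can also be obtained outside MS Excel. As a sketch, Python's standard library provides statistics.NormalDist, whose inv_cdf method plays the same role as =norm.s.inv; the helper name z_critical is a hypothetical one introduced here.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z-score that leaves an area of alpha/2 in the right tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for confidence in (0.90, 0.95, 0.98, 0.99):
    print(f"{confidence:.2f} -> z = {z_critical(confidence):.2f}")
```

Rounded to two decimal places, the values agree with the table: 1.64, 1.96, 2.33, and 2.58.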

Example 4.3.3. Suppose a dietician selected a random sample of 𝑛 = 50 female UST-SHS students and found
that their mean daily intake of flour-based products is 𝑥̅ = 750 grams per day. Further, the dietician was
informed by the central statistics department of the university that the standard deviation of the
population is 𝜎 = 35 grams per day. Use this sample to provide a 99% confidence interval estimate for the
mean daily intake of flour-based products for these students.

Since the sample size is large, the Central Limit Theorem applies, and the standard error (SE) is
equal to 35/√50. Given this, the 99% Confidence Interval Estimate is

99% Confidence Interval Estimate = 750 ± 2.58 × 35/√50

or, in confidence statement format, we say that we are 99% confident that

737.23 < 𝜇 < 762.77.
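The computation for the case where 𝜎 is known can be sketched as a small function. The name z_interval is a hypothetical helper; the 𝑧 value is passed in explicitly so that the two-decimal tabular values used in this lecture (for example, 2.58 for 99%) reproduce the stated limits.

```python
import math


def z_interval(x_bar, sigma, n, z):
    """Confidence limits x_bar -/+ z * sigma / sqrt(n) when sigma is known."""
    se = sigma / math.sqrt(n)
    return x_bar - z * se, x_bar + z * se


# Example 4.3.3: 99% interval for mean daily intake (grams per day)
lower, upper = z_interval(x_bar=750, sigma=35, n=50, z=2.58)
print(f"{lower:.2f} < mu < {upper:.2f}")

# Example 4.2.1: 95% interval for mean weight (kilograms)
lower_w, upper_w = z_interval(x_bar=56.05, sigma=9, n=30, z=1.96)
print(f"{lower_w:.2f} < mu < {upper_w:.2f}")
```

Using a 𝑧 value with more decimal places (such as 2.5758 instead of 2.58) would shift the limits slightly in the second decimal place.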

Remark. In the previous examples, we constructed confidence interval estimates that depend on the
sampling distribution of the sample mean. So far, however, we only know how to construct a
confidence interval estimate when the population standard deviation (𝜎) is known. When the
population standard deviation is unknown, there is a slight change in the construction of the
confidence interval: the change involves the tabular value to use and the standard error of the
sampling distribution of the sample mean.

Def. Standard Error Estimate

If 𝜎 is unknown, the sample mean is still an unbiased point estimator of the population mean 𝜇, and its
standard error is estimated as

SE_estimate = 𝑠/√𝑛

where 𝑠 represents the sample standard deviation.

Now, the tabular value to use when 𝜎 is unknown will come from the Student’s 𝑡-Distribution.

Def. Student’s 𝑡-Distribution
The Student’s 𝑡-distribution is a probability distribution used when making inferences about an
approximately normal population using a sample of size 𝑛 less than 30, with 𝑠 used to estimate 𝜎.

Remark. When 𝑛 < 30, the Central Limit Theorem will not guarantee that the sampling distribution of the
sample mean is approximately normal.

Remark. When making (1 − 𝛼)100% confidence interval estimates using the Student’s 𝑡-distribution, we usually
use the notation

𝑡_{(𝛼/2, 𝑛−1)}

to represent the tabular value that cuts off an area of 𝛼/2 in the right tail of the 𝑡-distribution. Note that 𝑛 − 1
is its degrees of freedom (df), and the degrees of freedom are used to determine the 𝑡-score
associated with a given probability value. In this regard, the (1 − 𝛼)100% confidence interval
estimate for the population mean (𝜇) when 𝜎 is unknown is constructed as

𝑥̅ ± 𝑡_{(𝛼/2, 𝑛−1)} × 𝑠/√𝑛

or

𝑥̅ − 𝑡_{(𝛼/2, 𝑛−1)} × 𝑠/√𝑛 < 𝜇 < 𝑥̅ + 𝑡_{(𝛼/2, 𝑛−1)} × 𝑠/√𝑛

Example 4.3.4. Consider the following sample data of the weights (in kilograms) of 20 learners.

40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66

Create a 95% Confidence Interval Estimate for the population mean 𝜇.

Solution to Example 4.3.4
To create a 95% confidence interval, we need point estimates. Since all we have is a sample, the point
estimates we can get are only 𝑥̅ and 𝑠. Using MS Excel, we get 𝑥̅ = 56.05 and 𝑠 = 7.5147.

With these values, the 95% Confidence Interval Estimate of the true average weight of
all learners is

56.05 ± 𝑡_{(0.025, 19)} × 7.5147/√20 = 56.05 ± 2.09 × 7.5147/√20

(The 𝑡-value is computed using MS Excel or the 𝑡-table. The syntax is =t.inv(0.975,19) or
=t.inv.2t(0.05,19).)

or, in confidence statement format, we say that we are 95% confident that the population mean (𝜇) is between
52.54 kilograms and 59.56 kilograms.
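The solution to Example 4.3.4 can be checked with the sketch below. Python's standard library has no 𝑡-distribution quantile function, so the tabular value 2.09 used in the lecture is entered by hand as the constant T_CRITICAL; that constant and the variable names are assumptions made for illustration.

```python
import math
import statistics

# Weights (in kilograms) of the 20 learners in Example 4.3.4
weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]

n = len(weights)
x_bar = statistics.mean(weights)   # point estimate of mu
s = statistics.stdev(weights)      # sample standard deviation (n - 1 divisor)

# Tabular t-value for a right-tail area of 0.025 with n - 1 = 19 df,
# typed in from the lecture rather than computed.
T_CRITICAL = 2.09

se_estimate = s / math.sqrt(n)
lower = x_bar - T_CRITICAL * se_estimate
upper = x_bar + T_CRITICAL * se_estimate

print(f"x_bar = {x_bar:.2f}, s = {s:.4f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```

With the rounded tabular value 2.09, the limits match the 52.54 and 59.56 kilograms reported above; a 𝑡-value carried to more decimal places would change the second decimal place slightly.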

Remark. A property of the 𝑡-distribution is that as the degrees of freedom increase, the 𝑡-distribution
approaches the standard normal distribution (𝑧-distribution). Note that this is also in harmony with the
Central Limit Theorem discussed in the previous lecture. As such, given a sample size of at least 30
and an unknown population standard deviation (𝜎), we take the tabular value from the 𝑧-distribution and
use the standard error estimate to construct the (1 − 𝛼)100% confidence interval estimate. That
is,

(1 − 𝛼)100% Confidence Interval Estimate = 𝑥̅ ± 𝑧_{𝛼/2} × 𝑠/√𝑛

Example 4.3.5. The students in a random sample of statistics students were asked to estimate the total number of hours
they spend watching television in an average week. The responses are shown in the table below:

0 3 1 20 9
5 10 1 10 4
4 2 4 4 5
5 10 1 10 4
4 4 5 4 2
20 9 0 3 1
4 10 3 4 6

Use this data to construct a 98% Confidence Interval Estimate for the mean number of hours statistics
students spend watching television in one week.

Using MS Excel, we get 𝑥̅ = 5.46 and 𝑠 = 4.7113. Now, since 𝑛 = 35, the 98% confidence interval
estimate is computed as

5.46 ± 2.33 × 4.7113/√35

which means that we are 98% confident that 3.60 < 𝜇 < 7.32 hours.

Remark. When to use the 𝑧- or 𝑡-distribution for creating confidence interval estimates:

Path 1: Sigma is Known → Use 𝑧-distribution no matter the sample size.
Path 2: Sigma is Unknown → Use 𝑡-distribution if 𝑛 < 30.
Path 3: Sigma is Unknown → Use 𝑧-distribution if 𝑛 ≥ 30.
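The three paths above amount to a simple decision rule, which can be written out as a short sketch. The function name choose_distribution is a hypothetical one, and the rule encoded is only the one stated in this lecture.

```python
def choose_distribution(sigma_known, n):
    """Return "z" or "t" following the three paths given in the remark."""
    if sigma_known:
        return "z"                    # Path 1: sigma known, any sample size
    return "t" if n < 30 else "z"     # Path 2 (n < 30) or Path 3 (n >= 30)


# Settings taken from the examples in this lecture
print(choose_distribution(sigma_known=True, n=30))    # Example 4.2.1
print(choose_distribution(sigma_known=False, n=20))   # Example 4.3.4
print(choose_distribution(sigma_known=False, n=35))   # Example 4.3.5
```

The three calls correspond to the 𝑧-based interval in Example 4.2.1, the 𝑡-based interval in Example 4.3.4, and the 𝑧-based interval with 𝑠 in Example 4.3.5.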

Supplementary Exercises:

1. A machine produces metal pieces which are cylindrical in shape with a mean diameter of
14.20 cm if the machine is in good condition. A quality engineering officer evaluates the condition of
the machine by using a random sample of 36 runs, which resulted in a mean diameter of 14.25 cm
with a standard deviation of 0.30 cm. Using this information, construct a 95% confidence interval
estimate of the true average diameter of the cylindrical metal pieces produced by the machine.

2. A 99% confidence interval estimate can be interpreted to mean that:

Choice A: If all possible samples are taken and confidence interval estimates are developed, 99% of
them would include the true population mean somewhere within their interval.

Choice B: We have 99% confidence that we have selected a sample whose interval includes the
population mean.

Choice C: Both of the above.

Choice D: None of the above.

3. A local government official observes an increase in the number of individuals with cardiovascular
and obesity problems in his barangay. In order to improve the health conditions of his constituents,
he aims to promote an easy and cheap way to reduce weight. It is known that obesity results in
greater risk of having illnesses like diabetes and heart problems. He encouraged his constituents to
participate in his Dance for Life project every weekend for 3 months. To know if the program is
effective in reducing weight, he randomly selected 12 participants from the group who completed
the program. The weight loss data, in kilograms, of the 12 randomly selected participants after
completing the program are: 0.5, 0.7, 0.9, 1.1, 1.2, 1.3, 1.4, 2.0, 2.3, 2.4, 2.7, and 3.0. It is known that
the weight loss of those who have completed the dance program follows a normal distribution with
a variance of 3.24 kg². Construct a 90% confidence interval estimate for the true mean weight loss
of the participants who have completed the dance program.

4. The Human Toxome Project (HTP) is working to understand the scope of industrial pollution in the
human body. Industrial chemicals may enter the body through pollution or as ingredients in
consumer products. In October 2008, the scientists at HTP tested cord blood samples for 20
newborn infants in the United States. The cord blood of the "In utero/newborn" group was tested
for 430 industrial compounds, pollutants, and other chemicals, including chemicals linked to brain
and nervous system toxicity, immune system toxicity, reproductive toxicity, and fertility problems.
There are health concerns about the effects of some chemicals on the brain and nervous system. The
table below shows how many of the targeted chemicals were found in each infant’s cord blood.

79 145 147 160 116 100 159 151 156 126
137 83 156 94 121 144 123 114 139 99

Use this sample data to construct the 90% confidence interval estimate for the mean number of
targeted industrial chemicals to be found in an infant’s blood.

- End of the Lecture Transcript –