CHAPTER 5
ESTIMATION AND
STATISTICAL INTERVALS
Outline of Chapter 5

5.1 Point Estimation

5.2 Large-Sample Confidence Intervals for a Population Mean

5.3 More Large-Sample Confidence Intervals

5.4 Small-Sample Intervals Based on a Normal Population Distribution

5.5 Intervals for µ1- µ2 Based on Normal Population Distributions


5.1 Point Estimation
→ Introduction

• The general objective of statistical inference is to use sample information as a basis for drawing various types of conclusions.
• When a parameter is being estimated, the estimate can be either a single number or a range of values.
➢ When the estimate is a single number, it is called a point estimate.
➢ When the estimate is a range of values, it is called an interval estimate. Confidence intervals are used for interval estimates.
→ Point Estimation

• A point estimate of some parameter θ is a single number, calculated from sample data, that can be regarded as an educated guess for the value of θ.
• The symbol θ̂ is frequently used to denote either the estimator or the resulting estimate.
➢ For example, we might decide that .350 is a point estimate for the proportion π of all individuals who would try a particular product again after using a free trial sample.
→ Properties of Estimators
→ Bias and unbiased estimator

• One desirable property that a good estimator should possess is that it be unbiased.
• In terms of sampling distributions, an estimator is said to be unbiased if the mean of its sampling distribution coincides with the parameter that is being estimated.
➢ For instance, the sampling distribution of the statistic 𝑥̅ has a mean value of µ_𝑥̅, which equals the mean µ of the population from which the samples are taken.
➢ Thus 𝑥̅ is an estimator of the parameter µ and, because µ_𝑥̅ = µ, 𝑥̅ is also an unbiased estimator of µ.
In general, for any population parameter θ and any estimator θ̂ of that parameter, the figure illustrates what it means for θ̂ to be unbiased or biased.

[Figure: Sampling distribution of an estimator θ̂]


DEFINITIONS

Denote a population parameter generically by the letter θ and denote any estimator of this parameter by θ̂. Then θ̂ is an unbiased estimator if µ_θ̂ = θ. Otherwise, θ̂ is said to be biased, and the quantity µ_θ̂ − θ is called the bias of θ̂.

• Some of the most important statistics we have studied are unbiased estimators of certain population parameters.
• For example, it can be shown that the sample mean 𝑥̅ is an unbiased estimator of the population mean µ, the sample variance s² is an unbiased estimator of the population variance σ², and the sample proportion p is an unbiased estimator of the population proportion π.
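The unbiasedness claim can be checked informally by simulation. The sketch below is only an illustration under assumed conditions: a normal population with hypothetical mean 50 and standard deviation 4, and a hypothetical helper named `simulate_estimator_means`. It averages the sample mean and the sample variance (divisor n − 1) over many samples; if the estimators are unbiased, the averages should land near 50 and 16.

```python
import random
import statistics


def simulate_estimator_means(population_mean, population_sd, n, reps, seed=0):
    """Average the sample mean and sample variance over many simulated samples.

    The population is assumed normal; all numeric settings are hypothetical.
    """
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means.append(statistics.mean(sample))
        # statistics.variance uses the divisor n - 1 (the sample variance s^2)
        variances.append(statistics.variance(sample))
    return statistics.mean(means), statistics.mean(variances)


avg_mean, avg_var = simulate_estimator_means(50.0, 4.0, n=10, reps=20000)
print(avg_mean, avg_var)  # both should be close to 50 and 4**2 = 16
```

A simulation like this does not prove unbiasedness; it only shows that the centers of the two sampling distributions sit close to the parameters being estimated.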
→ Properties of Estimators
→ Consistency

• A second desirable property that estimators often possess is consistency. If θ̂ denotes an estimator of some population parameter θ, then θ̂ is said to be consistent if the probability that it lies close to θ increases to 1 as the sample size increases.
➢ Consistent estimators become more and more accurate as the sample size increases. That is, as you increase n, it becomes more and more likely that such estimators will be very close to the parameter they are intended to estimate.
• The most common method for showing that an estimator is consistent is to show that its standard error decreases toward 0 as the sample size increases.
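As a small illustration of the standard-error argument, the sketch below evaluates the standard error of the sample mean, σ/√n, for growing n. The value σ = 2 and the function name `standard_error_of_mean` are assumptions made only for this example.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the sample mean for a population with std sigma."""
    return sigma / math.sqrt(n)


sigma = 2.0  # hypothetical population standard deviation
for n in (10, 100, 1000, 10000):
    # The printed standard error shrinks as n grows
    print(n, round(standard_error_of_mean(sigma, n), 4))
```

Each hundredfold increase in n cuts the standard error by a factor of 10, which is why 𝑥̅ concentrates around µ as the sample grows.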
5.2 Large-Sample Confidence Intervals for a Population Mean
→ Interval estimate or confidence interval (CI)

• A point estimate, because it is a single number, by itself provides no information about the precision and reliability of estimation.
• Because of sampling variability, it is virtually never the case that 𝑥̅ = µ. The point estimate says nothing about how close it might be to µ.
• An alternative is to calculate and report an entire interval of plausible values: an interval estimate or confidence interval (CI).
• A confidence interval is always calculated by first selecting a confidence level, which is a measure of the degree of reliability of the interval.
• The higher the confidence level, the more strongly we believe that the value of the parameter being estimated lies within the interval.
→ A Confidence Interval for µ with Confidence Level 95%

• A confidence interval for a population or process mean µ is based on the following properties of the sampling distribution of 𝑥̅:
• When n is large, the 𝑥̅ distribution is approximately normal (this is the Central Limit Theorem).
• Standardizing 𝑥̅ gives a variable with approximately a standard normal distribution (the z curve):
  $$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
• In practice, the standard deviation σ will almost never be known, so it is replaced with the sample standard deviation s:
  $$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
• From Appendix Table I, the z critical value 1.96 captures a central z curve area of .95:
  $$P\left(-1.96 < \frac{\bar{x} - \mu}{s/\sqrt{n}} < 1.96\right) \approx .95$$

[Figure: Capturing a central curve area of .95]


• Rearranging the inequalities to isolate µ gives
  $$\bar{x} - 1.96\frac{s}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{s}{\sqrt{n}}$$

Substituting the values of n, 𝑥̅, and s from any particular sample into these expressions gives a confidence interval for µ with a confidence level of approximately 95%.
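The substitution step can be sketched in a few lines of code. The function name `large_sample_ci_95` and the summary values (𝑥̅ = 100, s = 15, n = 36) are hypothetical, chosen only to show the arithmetic of 𝑥̅ ± 1.96·s/√n; they are not data from these slides.

```python
import math


def large_sample_ci_95(xbar, s, n):
    """Approximate 95% CI for mu: xbar +/- 1.96 * s / sqrt(n), intended for large n."""
    half_width = 1.96 * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary statistics, used only to illustrate the calculation
low, high = large_sample_ci_95(xbar=100.0, s=15.0, n=36)
print(round(low, 2), round(high, 2))
```

Here the half-width is 1.96 · 15/6 = 4.9, so the interval runs from about 95.1 to 104.9.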
→ Example 5.3

Given the accompanying sample observations on breakdown voltage (kV) of a particular circuit under certain conditions, what is the CI for µ?

The boxplot of the data shows a high concentration in the middle half of the data (narrow box width).

[Figure: Output from the JMP software's Analyze/Distribution command]

Solution:
→ Other Confidence Levels and a General Formula

• The confidence level of 95% was inherited from the probability .95 with which we began the derivation of the interval. This probability in turn dictated the use of the z critical value 1.96 in the confidence interval formula.
➢ It follows that if we want a confidence level of 99%, we should identify the z critical value that captures a central z curve area of .99.

[Figure: Finding the critical value for a 99% confidence level]


A large-sample confidence interval for a population or process mean µ is given by the formula:
  $$\bar{x} \pm (z\ \text{critical value})\,\frac{s}{\sqrt{n}}$$

• As a general rule, this interval is appropriate when the sample size exceeds 30.
• The three most commonly used confidence levels, 90%, 95%, and 99%, use critical values of 1.645, 1.96, and 2.576, respectively.
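The tabulated critical values can be reproduced from the standard normal quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical_value` is an assumption introduced for this example.

```python
from statistics import NormalDist


def z_critical_value(confidence_level):
    """z value that captures a central z-curve area equal to confidence_level."""
    tail_area = (1.0 - confidence_level) / 2.0
    return NormalDist().inv_cdf(1.0 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical_value(level), 3))
```

Rounded to three decimals, the results agree with the values 1.645, 1.96, and 2.576 listed above.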



Exercises:

1.

2. Random samples of size n are selected from a normal population whose standard
deviation 𝜎 is known to be 2.
a. Suppose you want 90% of the area under the sampling distribution of 𝑥̅ to lie within
±1 unit of a population mean 𝜇. Find the minimum sample size n that satisfies this
requirement.
b. Repeat the calculations in part (a) for areas of 80%, 95%, and 99%.
Solution Ex1:

Solution Ex2:
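One possible approach to Exercise 2 is sketched below; it is not the slide's own worked solution. It assumes the requirement means z·σ/√n ≤ 1, where z captures the stated central area, so that n ≥ (z·σ/1)². The function name `min_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, half_width, central_area):
    """Smallest n with z * sigma / sqrt(n) <= half_width for the given central area."""
    z = NormalDist().inv_cdf(0.5 + central_area / 2.0)
    return math.ceil((z * sigma / half_width) ** 2)


# Exercise 2 settings: sigma = 2, within +/- 1 unit of mu
for area in (0.90, 0.80, 0.95, 0.99):
    print(area, min_sample_size(sigma=2.0, half_width=1.0, central_area=area))
```

Rounding up is deliberate: rounding down would leave slightly less than the required area within ±1 unit of µ.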
→ Choosing the Sample Size

• The half-width 1.96s/√n of the 95% CI is sometimes called the bound on the error of estimation associated with a 95% confidence level; that is, with 95% confidence, the point estimate 𝑥̅ will be no farther than this from µ.
• Before obtaining data, an investigator may wish to determine a sample size for which a particular value of the bound is achieved.
• More generally, suppose we wish to estimate µ to within an amount B (the specified bound on the error of estimation) with 95% confidence. This implies that B = 1.96s/√n, from which:
  $$n = \left(\frac{1.96\,s}{B}\right)^2$$

→ How can s be specified before any data are available?
⇒ For a population distribution that is not too skewed, a reasonable guess is s ≈ (x_max − x_min)/4, that is, the anticipated range divided by 4.
→ Choosing the Sample Size: Example

• Example: Back to Example 5.3
Given the accompanying sample observations on breakdown voltage (kV) of a particular circuit:
→ Suppose that the investigator believes that almost all values in the population distribution are between 40 and 70. Then (70 − 40)/4 = 7.5 gives a reasonable value for s.
→ Question: What is the appropriate sample size for estimating true average breakdown voltage to within 1 kV with confidence level 95%?
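The question can be worked through with the relation n = (1.96·s/B)², using the range-based guess s = 7.5 and the bound B = 1 kV stated above. The helper name `sample_size_for_bound` is an assumption made for this sketch.

```python
import math


def sample_size_for_bound(s_guess, bound, z=1.96):
    """Smallest integer n for which z * s_guess / sqrt(n) does not exceed bound."""
    return math.ceil((z * s_guess / bound) ** 2)


s_guess = (70 - 40) / 4  # range / 4 = 7.5, as in the example
n_required = sample_size_for_bound(s_guess, bound=1.0)
print(n_required)  # (1.96 * 7.5 / 1)**2 = 216.09, rounded up to 217
```

The result is always rounded up, since a sample of 216 would give a bound slightly larger than 1 kV.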
→ One-Sided Confidence Intervals (Confidence Bounds)
5.3 More Large-Sample Confidence Intervals
→ A Large-Sample Confidence Interval for µ1 − µ2

• For a population: µ → mean value, σ → standard deviation (std), σ² → variance
• For a sample: 𝑥̅ → sample mean, s → sample std, s² → sample variance
→ Example 5.5

A study was carried out to compare population mean lifetimes (hr) for two different brands of AA batteries. Here, µ1 and σ1 are the mean value and standard deviation for the distribution of brand 1 lifetimes; µ2 and σ2 are the mean value and standard deviation for the distribution of brand 2 lifetimes. Values of the summary quantities calculated from the two resulting samples are as follows:

Question: What is the estimate of the difference µ1 − µ2?

→ The natural statistic for estimating µ1 is 𝑥̅1, and for estimating µ2 it is 𝑥̅2.
→ The difference µ1 − µ2 is estimated by 𝑥̅1 − 𝑥̅2.
→ The point estimate from the data is 4.15 − 4.53 = −0.38.
➢ That is, we estimate that, on average, brand 2 batteries last 0.38 hr longer than do brand 1 batteries.
1. For any two random variables x and y,
   $$\mu_{x-y} = \mu_x - \mu_y$$
2. If x and y are two independent random variables, then
   $$\sigma^2_{x-y} = \sigma^2_x + \sigma^2_y$$
3. If x and y are independent random variables, each with a normal distribution, then the difference x − y also has a normal distribution. If each variable is approximately normal, then the distribution of the difference is also approximately normal.
→ Properties of the Sampling Distribution of 𝑥̅1 − 𝑥̅2

• Consider how to standardize 𝑥̅1 − 𝑥̅2 when both sample sizes are large.
• When n1 and n2 are both large, the standardized variable
  $$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
  has approximately a standard normal distribution (the z curve).

• Using this variable in the same way that variables were used earlier to obtain confidence intervals for µ and for π gives the following large-sample confidence interval formula for estimating µ1 − µ2:
  $$\bar{x}_1 - \bar{x}_2 \pm (z\ \text{critical value})\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
• This formula is valid irrespective of the shapes of the two underlying distributions.
• The three most frequently used confidence levels of 95%, 99%, and 90% are achieved by using the critical values 1.96, 2.576, and 1.645, respectively.
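A minimal sketch of the large-sample interval for µ1 − µ2 follows. The function name `two_sample_z_ci` and all summary values are hypothetical and serve only to illustrate the arithmetic; they are not the battery data of Example 5.5.

```python
import math


def two_sample_z_ci(xbar1, s1, n1, xbar2, s2, n2, z=1.96):
    """Large-sample CI for mu1 - mu2 from two independent samples."""
    estimate = xbar1 - xbar2
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical summaries: standard error = sqrt(0.09 + 0.16) = 0.5
low, high = two_sample_z_ci(xbar1=10.0, s1=3.0, n1=100,
                            xbar2=8.0, s2=4.0, n2=100)
print(round(low, 2), round(high, 2))
```

With these numbers the interval is 2 ± 1.96 · 0.5, roughly (1.02, 2.98). Passing z = 2.576 or z = 1.645 gives the 99% or 90% interval.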
→ Section 5.3 Exercises

• Given the two samples of the following disks with 3/8-inch and 1/2-inch diameters.
→ What is the estimate of µ1 − µ2 for the two populations with a confidence level of 95%?

• The estimate of µ1 − µ2 with a confidence level of 95%:

5.4 Small-Sample Intervals Based on a Normal Population Distribution
→ t Distributions and the One-Sample t Confidence Interval

The large-sample interval for µ was obtained by standardizing 𝑥̅, so that for large n the variable
  $$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has approximately a standard normal distribution.

→ However, for small n this is no longer true.
→ For small n, we can use a t distribution instead.

PROPOSITION
Let x1, x2, . . . , xn be a random sample from a normal distribution. Then the standardized variable
  $$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has a type of probability distribution called a t distribution with n − 1 degrees of freedom (df).
→ Properties of Distributions

• The Z distribution is a special case of the normal distribution with a mean of 0 and standard deviation of 1, i.e. Z ~ N(0,1).
• The t distribution is similar to the Z distribution, but its shape depends on the sample size; it is used for small or moderate samples when the population standard deviation is unknown.
➢ For large samples, the z and t distributions are very similar.
➢ How well a t distribution approximates a normal distribution is determined by its degrees of freedom (df).
➢ The greater the sample size n, the larger the degrees of freedom (n − 1), and the more closely the t distribution approximates the normal distribution.
➢ The z curve is sometimes referred to as the t curve with df = ∞.
→ One-Sample Confidence Intervals

Let 𝑥̅ and s be the sample mean and sample standard deviation of a random sample of size n from a normal population distribution. Then a two-sided confidence interval for the population mean µ has the form
  $$\bar{x} \pm (t\ \text{critical value})\,\frac{s}{\sqrt{n}}$$
where the t critical value is based on n − 1 df. t critical values for the most frequently used confidence levels, corresponding to particular central t curve areas, are given in Appendix Table IV.
→ Example 5.6

Consider the following observations

To simplify calculation, we simplify the data by replacing each observation: yi = xi − 10,000.

[Figure: A Q-Q plot of the data]
→ Tolerance Intervals

Let k be a number between 0 and 100. A tolerance interval for capturing at least k% of the x values in a normal population distribution with a confidence level of 95% has the form
  $$\bar{x} \pm (\text{tolerance critical value}) \cdot s$$
Tolerance critical values for k = 90, 95, and 99 in combination with various sample sizes are given in Appendix Table V.
→ Section 5.4 Exercises

1.

2. Given the following 16 mileages of a Porsche car:

a. What are the min and max values? Q1 and Q2? The mean 𝑥̅? The sample standard deviation s?
b. What is the confidence interval (CI) estimate of the mean mileage at a 95% confidence level?

Solution:
5.5 Intervals for µ1 - µ2 Based on Normal Population Distributions
→ The Two-Sample t Interval
PROPOSITION

Consider two normal distributions with mean values µ1 and µ2, respectively. Suppose a random sample of size n1 is selected from the first distribution, resulting in a sample mean of 𝑥̅1 and a sample standard deviation of s1. A random sample of size n2 from the second distribution, selected independently of that from the first one, yields sample mean 𝑥̅2 and sample standard deviation s2. Then the standardized variable
  $$t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
has approximately a t distribution with df estimated from the sample by the following formula:
  $$\mathrm{df} = \frac{\left(se_1^2 + se_2^2\right)^2}{\dfrac{se_1^4}{n_1 - 1} + \dfrac{se_2^4}{n_2 - 1}}$$
where se = s/√n, that is, se1 = s1/√n1 and se2 = s2/√n2 (Note: df should be rounded down to the nearest integer).


This implies that a confidence interval for µ1 − µ2 in this situation is
  $$\bar{x}_1 - \bar{x}_2 \pm (t\ \text{critical value})\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
t critical values corresponding to the most frequently used confidence levels appear in Appendix Table IV.
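The df estimate and the two-sample t interval can be sketched as follows. The function names `welch_df` and `two_sample_t_ci` are hypothetical. The numbers are the apple-firmness summaries from the Section 5.5 exercise (n = 20 per sample, means 8.74 and 4.96, standard deviations .66 and .39). The t critical value is passed in by the caller because it comes from a t table; 2.042 is the standard tabulated value for 30 df at the 95% level.

```python
import math


def welch_df(s1, n1, s2, n2):
    """Estimated df for the two-sample t interval, rounded down to an integer."""
    se1_sq = s1 ** 2 / n1  # se1^2, with se = s / sqrt(n)
    se2_sq = s2 ** 2 / n2  # se2^2
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return math.floor(df)


def two_sample_t_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2 using a caller-supplied t critical value from a table."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    estimate = xbar1 - xbar2
    return estimate - t_crit * standard_error, estimate + t_crit * standard_error


df = welch_df(0.66, 20, 0.39, 20)
low, high = two_sample_t_ci(8.74, 0.66, 20, 4.96, 0.39, 20, t_crit=2.042)
print(df, round(low, 2), round(high, 2))
```

Keeping the critical value as an argument avoids depending on a statistics library for t quantiles; with such a library available, the table lookup could be replaced by a quantile call.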
→ Example 5.7

Which way of dispensing champagne, the traditional vertical method or a tilted beerlike pour, preserves more of the tiny gas bubbles that improve flavor and aroma? The following data were reported in the article “On the Losses of Dissolved CO2 during Champagne Serving”:

⇒ Question: Assuming the sampled distributions are normal, what are the confidence intervals for the difference between true average dissolved CO2 loss for the traditional pour and that for the slanted pour at each of the two temperatures?
→ A Confidence Interval from Paired Data

• Let µd denote the population mean difference, that is, the average of all differences in the population. It can be shown that
  $$\mu_d = \mu_1 - \mu_2$$
  where µ1 is the population mean value of all first numbers within pairs and µ2 is defined similarly for all second numbers.
• The importance of this relationship is that if we can obtain a CI for µd, it will also be a CI for µ1 − µ2.
• A CI for µd can be calculated from the differences for pairs in the sample.
• In particular, if the population distribution of the differences can be assumed to be normal, then a one-sample t interval based on the sample differences is appropriate.
→ Example 5.8

Data are given on the modulus of elasticity obtained 1 minute after loading in a certain configuration and on the modulus of elasticity obtained 4 weeks after loading for the same lumber specimens. The data are presented here.

[Figure: Normal quantile plot of the differences]

It is reasonable to assume that the population distribution of the differences is approximately normal.
The sample consists of 16 pairs, so a 99% confidence interval based on 15 df requires the t critical value 2.947. With d̄ = 2635.6 and s_d = 508.64, the interval is
  $$\bar{d} \pm (t\ \text{critical value})\,\frac{s_d}{\sqrt{n}} = 2635.6 \pm 2.947 \cdot \frac{508.64}{\sqrt{16}} = 2635.6 \pm 374.7 = (2260.9,\ 3010.3)$$
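The paired-data interval is a one-sample t interval applied to the differences, so the calculation for Example 5.8 can be sketched directly from the summary values quoted above (16 pairs, mean difference 2635.6, standard deviation of the differences 508.64, t critical value 2.947). The function name `one_sample_t_ci` is an assumption made for this sketch.

```python
import math


def one_sample_t_ci(mean, sd, n, t_crit):
    """mean +/- t_crit * sd / sqrt(n), with t_crit taken from a t table (n - 1 df)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary values of the 16 paired differences in Example 5.8
low, high = one_sample_t_ci(mean=2635.6, sd=508.64, n=16, t_crit=2.947)
print(round(low, 1), round(high, 1))
```

The half-width is 2.947 · 508.64/4, about 374.7, which gives an interval of roughly (2260.9, 3010.3) for µd and therefore for µ1 − µ2.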



→ Section 5.5 Exercises

The firmness of a piece of fruit is an important indicator of fruit ripeness. The Magness–Taylor firmness (N) was determined for one sample of 20 golden apples with a shelf life of zero days, resulting in a sample mean of 8.74 and a sample standard deviation of .66, and another sample of 20 apples with a shelf life of 20 days, with a sample mean and sample standard deviation of 4.96 and .39, respectively.
→ Calculate a confidence interval for the difference between true average firmness for zero-day apples and true average firmness for 20-day apples using a confidence level of 95%, and interpret the interval.

Solution:
