Lesson 3

Chapter 12
12.2
Inference About A Population
• Inference about a Population Mean when the Standard Deviation is Unknown
• Inference about a Population Variance
Inference About A Population…
12.3
[Diagram: a sample is drawn from the population; a statistic computed from the sample is used to make inferences about the population parameter.]
We will develop techniques to estimate and test
three population parameters:
Population Mean, µ
Population Proportion, p
Population Variance, σ²
Inference With Variance Unknown…
12.4
Previously, we looked at estimating and testing the
population mean when the population standard
deviation (σ) was known or given:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
But how often do we know the actual population
variance?
Instead, we use the Student t-statistic, given by:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Inference With Variance Unknown…
12.5
When σ is unknown, we use its point estimator s,
and the z-statistic is replaced by the t-statistic, where the
number of “degrees of freedom”, ν, is n − 1.
Testing µ when σ is unknown…
12.6
When the population standard deviation is unknown
and the population is normal, the test statistic for
testing hypotheses about µ is:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
which is Student t distributed with ν = n − 1 degrees of
freedom. The confidence interval estimator of µ is
given by:
$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}, \qquad \nu = n - 1$$
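The two formulas on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the sample summary (n = 16, x̄ = 10.5, s = 2.0, µ = 10 under the null hypothesis) are hypothetical and not taken from the lecture, and the critical value t_{0.025,15} = 2.131 is read from a standard Student t table rather than computed.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)); compare against a t table with n - 1 df."""
    return (xbar - mu0) / (s / math.sqrt(n))

def mean_confidence_interval(xbar, s, n, t_crit):
    """xbar ± t_crit * s / sqrt(n), where t_crit is t_{alpha/2, n-1} from a table."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample summary, chosen only for illustration.
t = t_statistic(xbar=10.5, mu0=10.0, s=2.0, n=16)

# 95% interval: t_{0.025, 15} = 2.131 (table value, supplied by hand).
lcl, ucl = mean_confidence_interval(xbar=10.5, s=2.0, n=16, t_crit=2.131)
print(t, lcl, ucl)
```

The critical value is passed in as an argument so the sketch needs nothing beyond the standard library; a statistics package could supply it instead.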
Example 12.1
12.7
It is likely that in the near future nations will have to
do more to save the environment.
Possible actions include reducing energy use and

recycling.
Currently (2007) most products manufactured from

recycled material are considerably more expensive
than those manufactured from material found in the
earth.
Example 12.1
12.8
Newspapers are an exception.
It can be profitable to recycle newspaper.
A major expense is the collection from homes. In

recent years a number of companies have gone
into the business of collecting used newspapers from
households and recycling them.
A financial analyst for one such company has

recently computed that the firm would make a profit
if the mean weekly newspaper collection from each
household exceeded 2.0 pounds.
Example 12.1
12.9
In a study to determine the feasibility of a recycling
plant, a random sample of 148 households was
drawn from a large community, and the weekly
weight of newspapers discarded for recycling for
each household was recorded.
Do these data provide sufficient evidence to allow

the analyst to conclude that a recycling plant would
be profitable?
Example 12.1 IDENTIFY
12.10
Our objective is to describe the population of the

amount of newspaper discarded per household,
which is an interval variable. Thus the parameter to
be tested is the population mean µ.
We want to know if there is enough evidence to

conclude that the mean is greater than 2. Thus,
H1: µ > 2
Therefore we set our usual null hypothesis to:

H0: µ = 2
12.11
Data:
12.12
The test statistic is:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \nu = n - 1$$
The alternative hypothesis is:
H1: µ > 2
The manager believes the cost of a Type I error is high and sets
the significance level at 1%. The rejection region becomes:
$$t > t_{\alpha,\nu} = t_{0.01,147} \approx t_{0.01,150} = 2.351$$

Example 12.1 COMPUTE
12.13
Manually:
To calculate the value of the test statistic, we need
to calculate the sample mean and the sample
standard deviation s. From the data, we determine:
Thus,
and
12.14
Manually:
The value of µ is to be found in the null
hypothesis. It is 2.0. The value of the test statistic is:
Because 2.23 is not greater than 2.351, we

cannot reject the null hypothesis in favor of the
alternative. (Students performing the calculations
manually can approximate the p-value. The
online appendix Approximating the p-Value from
the Student t Table describes how.)
12.15
Example 12.1 INTERPRET
12.16
The value of the test statistic is t = 2.24 and its p-value is 0.0134.
There is not enough evidence to infer that the mean weight of

discarded newspapers is greater than 2.0.
Note that there is some evidence; the p-value is 0.0134. However,

because we wanted the Type I error to be small, we insisted on a 1%
significance level.
Thus, we cannot conclude that the recycling plant would be

profitable.
What if the significance level was set at 5%? (Discuss)
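As a starting point for the discussion, the two decision rules used in this example can be written out as a short Python sketch. The helper names are hypothetical; the numbers (t = 2.23, critical value 2.351, p-value 0.0134) are the ones reported in the example.

```python
def reject_by_critical_value(t_stat, t_crit):
    """Right-tail test: reject H0 when the statistic falls in the rejection region."""
    return t_stat > t_crit

def reject_by_p_value(p_value, alpha):
    """Reject H0 when the p-value is smaller than the significance level."""
    return p_value < alpha

# Values reported in Example 12.1.
t_manual = 2.23
t_crit_1pct = 2.351
p_value = 0.0134

print(reject_by_critical_value(t_manual, t_crit_1pct))  # False
print(reject_by_p_value(p_value, 0.01))                 # False
print(reject_by_p_value(p_value, 0.05))                 # True
```

The same data lead to different conclusions at 1% and 5%, which is why the significance level should be chosen before looking at the result.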

Example 12.2
12.17
In 2014 (the latest year reported), 146,861,217 tax returns
were filed in the United States. The Internal Revenue
Service (IRS) examined 1,228,117 of them to determine whether
they were correctly done.
To determine how well the auditors are performing, a
random sample of these returns was drawn, and the
additional tax was reported, which is listed next.
Estimate with 95% confidence the mean additional
income tax collected from the 1,228,117 files audited.
Example 12.2
Random sample of tax collections
12.18
12.19
The objective is to describe the population of
additional tax collected.
The data are interval.
The parameter to be estimated is the mean

additional tax.
The confidence interval estimator is
$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

12.20
Thus,
12.21
12.22
Example 12.2… INTERPRET
12.23
We estimate that the mean additional tax collected

lies between $8,887 and $10,168.
We can use this estimate to help decide whether the
IRS is auditing the individuals who should be audited.
Check Required Conditions
12.24
The Student t distribution is robust, which means that
if the population is non-normal, the results of the t-test
and confidence interval estimate are still valid
provided that the population is “not extremely non-normal”.
To check this requirement, draw a histogram of the
data and see how “bell shaped” the resulting figure
is. If a histogram is extremely skewed (say, in the case
of an exponential distribution), that could be
considered “extremely non-normal”, and hence the
t-statistic would not be valid in this case.
Histogram for Example 12.1
12.25
Histogram for Example 12.2
12.26
Estimating Totals of Finite Populations
12.27
The inferential techniques introduced thus far were
derived by assuming infinitely large populations. In
practice however, most populations are finite.
When the population is small, we must adjust the test

statistic and interval estimator using the finite
population correction factor introduced in Chapter
9.
However, in populations that are large relative to the

sample size we can ignore the correction factor.
Large populations are defined as populations that
are at least 20 times the sample size.
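The rule of thumb above, together with the finite population correction factor sqrt((N − n)/(N − 1)), can be sketched as follows. The function names and the population and sample sizes are hypothetical, chosen only to show one case on each side of the threshold.

```python
import math

def is_large_population(N, n):
    """Rule of thumb from the slide: large means at least 20 times the sample size."""
    return N >= 20 * n

def finite_population_correction(N, n):
    """Correction factor sqrt((N - n) / (N - 1)) applied to the standard error."""
    return math.sqrt((N - n) / (N - 1))

# Hypothetical sizes, for illustration only.
print(is_large_population(N=50_000, n=148))         # True: correction can be ignored
print(is_large_population(N=1_000, n=100))          # False: correction is needed
print(finite_population_correction(N=1_000, n=100))
```

When the population is large relative to the sample, the factor is very close to 1, which is why it can be dropped.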
Estimating Totals of Finite Populations
12.28
Finite populations allow us to use the confidence
interval estimator of a mean to produce a
confidence interval estimator of the population
total.
To estimate the total we multiply the lower and

upper confidence limits of the estimate of the mean
by the population size.
Thus, the confidence interval estimator of the total is:
$$N\left[\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}\right]$$
Estimating Totals of Finite Population
12.29
For example, suppose that we wish to estimate the
total amount of additional income tax collected
from the 1,228,117 returns that were examined. The
95% confidence interval estimate of the total is:
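A minimal sketch of this calculation multiplies the confidence limits for the mean reported in Example 12.2 ($8,887 and $10,168) by the population size. The function name is hypothetical, and because the limits are rounded to the nearest dollar, the totals may differ slightly from those obtained with unrounded limits.

```python
def total_confidence_interval(N, mean_lcl, mean_ucl):
    """Scale the confidence limits of the mean by the population size N."""
    return N * mean_lcl, N * mean_ucl

N = 1_228_117  # number of returns examined
lcl_total, ucl_total = total_confidence_interval(N, mean_lcl=8_887, mean_ucl=10_168)
print(f"{lcl_total:,} to {ucl_total:,}")  # 10,914,275,779 to 12,487,493,656
```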
Developing an Understanding of Statistical Concepts 1
12.30
This section introduced the term degrees of freedom.
We will encounter this term many times in this book,
so a brief discussion of its meaning is warranted. The
Student t-distribution is based on using the sample
variance to estimate the unknown population
variance. The sample variance is defined as:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
To compute s², we must first determine x̄.

12.31
Recall that sampling distributions are derived by repeated
sampling from the same population. To repeatedly take
samples to compute s², we can choose any numbers for
the first n − 1 observations in the sample. However, we
have no choice on the nth value because the sample
mean must be calculated first. To illustrate, suppose that
n = 3 and we find x̄ = 10.
We can have x1 and x2 assume any values without
restriction. However, x3 must be such that x̄ = 10.
For example, if x1 = 6 and x2 = 8, then x3 must equal 16.
Therefore, there are only two degrees of freedom in our
selection of the sample. We say that we lose one degree
of freedom because we had to calculate x̄.
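The constraint in this illustration can be checked with a short Python sketch. The function name is hypothetical; it simply solves n·x̄ = x1 + ... + xn for the last observation once the other n − 1 values have been chosen freely.

```python
def last_observation(n, xbar, free_values):
    """Return the one value that is forced once n - 1 values and the mean are fixed."""
    if len(free_values) != n - 1:
        raise ValueError("expected exactly n - 1 freely chosen values")
    return n * xbar - sum(free_values)

# The slide's illustration: n = 3, mean 10, x1 = 6, x2 = 8.
x3 = last_observation(n=3, xbar=10, free_values=[6, 8])
print(x3)  # 16
```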
12.32
The t-statistic, like the z-statistic, measures the
difference between the sample mean x̄ and the
hypothesized value of µ in terms of the number of
standard errors. However, when the population
standard deviation σ is unknown, we estimate the
standard error by s/√n.
12.33
When we introduced the Student t-distribution in
Section 8-4, we pointed out that it is more widely
spread out than the standard normal. This
circumstance is logical. The only variable in the
z-statistic is the sample mean x̄, which will vary from
sample to sample.
The t-statistic has two variables: the sample mean x̄
and the sample standard deviation s, both of which
will vary from sample to sample. Because of the
greater uncertainty, the t-statistic will display greater
variability.
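A small simulation can make this point concrete. The sketch below is an illustration under assumed settings (standard normal population, samples of size n = 5, 5,000 repetitions, fixed seed); none of these choices come from the lecture. It computes both statistics for each sample and compares their variances.

```python
import math
import random

def simulate_z_and_t(n=5, reps=5000, mu=0.0, sigma=1.0, seed=0):
    """Return the empirical variances of the z- and t-statistics over many samples."""
    rng = random.Random(seed)
    z_values, t_values = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with n - 1 in the denominator.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        z_values.append((xbar - mu) / (sigma / math.sqrt(n)))  # sigma known
        t_values.append((xbar - mu) / (s / math.sqrt(n)))      # sigma estimated by s
    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)
    return variance(z_values), variance(t_values)

var_z, var_t = simulate_z_and_t()
print(var_z, var_t)  # the t-statistic should show the larger variance
```

With only four degrees of freedom the difference is pronounced; it shrinks as n grows and s becomes a more reliable estimate of σ.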
Inference About Population Variance
12.34
If we are interested in drawing inferences about a
population’s variability, the parameter we need to
investigate is the population variance: σ²
The sample variance (s²) is an unbiased, consistent and
efficient point estimator for σ². Moreover,
the statistic $\chi^2 = (n-1)s^2/\sigma^2$ has a chi-squared distribution
with n − 1 degrees of freedom.

Testing & Estimating Population Variance
12.35
The test statistic used to test hypotheses about σ² is:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
which is chi-squared with ν = n − 1 degrees of
freedom.
Testing & Estimating Population Variance
12.36
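Combining this statistic:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
with the probability statement:
$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$
yields the confidence interval estimator for σ²:
$$\text{lower confidence limit} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \qquad \text{upper confidence limit} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$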

Example 12.3…
12.37
Container-filling machines are used to package a variety of liquids,
including milk, soft drinks, and paint.
Ideally, the amount of liquid should vary only slightly, since large
variations will cause some containers to be underfilled (cheating the
customer) and some to be overfilled (resulting in costly waste).
The president of a company that developed a new type of machine
boasts that this machine can fill 1-liter (1,000 cubic centimeters)
containers so consistently that the variance of the fills will be less than
1 cubic centimeter squared.
Example 12.3
12.38
To examine the veracity of the claim, a
random sample of 25 1-liter fills was taken
and the results (cubic centimeters)
recorded.
Do these data allow the president to make
this claim at the 5% significance level?
Example 12.3… IDENTIFY
12.39
The problem objective is to describe the population
of 1-liter fills from this machine.
The data are interval, and we're interested in the
variability of the fills.
It follows that the parameter of interest is the
population variance σ².
Example 12.3… IDENTIFY
12.40
Because we want to determine whether there is
enough evidence to support the claim, the
alternative hypothesis is:
H1: σ² < 1
The null hypothesis is:
H0: σ² = 1
and the test statistic we will use is:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
12.41
Manually:
Using a calculator, we find:
Thus,
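The value of the test statistic is:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = 15.20$$
The rejection region is:
$$\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{0.95,\,24} = 13.85$$
Because 15.20 is not less than 13.85, we cannot
reject the null hypothesis in favor of the alternative.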
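The left-tail test can be sketched in Python as follows. The function name is hypothetical. The sample variance is not shown on the slide, so the sketch uses the value implied by the reported statistic (s² = 15.20/24 with σ² = 1 under the null hypothesis); this is an assumption for illustration, and the critical value 13.85 is the one quoted above.

```python
def chi_squared_statistic(n, s2, sigma2_0):
    """chi^2 = (n - 1) * s^2 / sigma_0^2, with n - 1 degrees of freedom."""
    return (n - 1) * s2 / sigma2_0

n = 25
s2 = 15.20 / (n - 1)   # assumption: implied by the reported statistic of 15.20
chi2 = chi_squared_statistic(n, s2, sigma2_0=1.0)

chi2_crit = 13.85      # chi^2_{0.95, 24}, as quoted in the example
reject_h0 = chi2 < chi2_crit  # left-tail test because H1 is sigma^2 < 1
print(round(chi2, 2), reject_h0)
```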
12.42
Example 12.3 INTERPRET
12.43
There is not enough evidence to infer that the claim is
true.
As we discussed before, the result does not say that

the variance is equal to 1; it merely states that we
are unable to show that the variance is less than 1.
Example 12.4 (Workshop)
12.44
Consistency of a Container-Filling Machine, Part 2
Estimate with 99% confidence the variance of fills in Example 12.3.
12.45
12.46
Example 12.4… INTERPRET
12.47
In Example 12.3, we saw that there was not sufficient
evidence to infer that the population variance is
less than 1. Here we see that σ² is estimated to lie
between 0.3336 and 1.5375.
Part of this interval is above 1, which tells us that the
variance may be larger than 1, confirming the
conclusion we reached in Example 12.3.
We may be able to use the estimate to predict the

percentage of overfilled and underfilled bottles.
This may allow us to choose among competing
machines.
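The interval reported above can be reproduced with a short Python sketch. The function name is hypothetical, the sample variance is again the value implied by Example 12.3 (an assumption, since the data are not shown), and the critical values χ²_{0.005,24} = 45.559 and χ²_{0.995,24} = 9.886 are read from a standard chi-squared table.

```python
def variance_confidence_interval(n, s2, chi2_upper_tail, chi2_lower_tail):
    """LCL = (n-1)s^2 / chi^2_{alpha/2}, UCL = (n-1)s^2 / chi^2_{1-alpha/2}."""
    numerator = (n - 1) * s2
    return numerator / chi2_upper_tail, numerator / chi2_lower_tail

n = 25
s2 = 15.20 / (n - 1)  # assumption: implied by the statistic in Example 12.3

# 99% confidence: table values for 24 degrees of freedom, supplied by hand.
lcl, ucl = variance_confidence_interval(n, s2, chi2_upper_tail=45.559, chi2_lower_tail=9.886)
print(round(lcl, 4), round(ucl, 4))  # 0.3336 1.5375
```

Note that the larger critical value produces the lower limit, because it appears in the denominator.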
Checking the Required Condition
12.48
Factors That Identify the Chi-Squared Test
12.49
1. Problem objective: Describe a population

2. Data type: Interval
3. Type of descriptive measurement:
Variability
Flowchart of Techniques
12.50
Describe a Population
  Data type?
    Interval: Type of descriptive measurement?
      Central location: t test & estimator of µ
      Variability: χ² test & estimator of σ²
    Nominal: z test & estimator of p
