
Chapter 12

12.2
Inference About A Population

• Inference about a Population Mean when the
Standard Deviation is Unknown
• Inference about a Population Variance
Inference About A Population…
12.3
[Figure: a sample is drawn from the population; a sample statistic is used to draw inferences about the population parameter.]

We will develop techniques to estimate and test
three population parameters:
Population Mean, µ
Population Proportion, p
Population Variance, σ²
Inference With Variance Unknown…
12.4
Previously, we looked at estimating and testing the
population mean when the population standard
deviation (σ) was known or given:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

But how often do we know the actual population
variance?

Instead, we use the Student t-statistic, given by:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Inference With Variance Unknown…
12.5
When σ is unknown, we use its point estimator s,
and the z-statistic is replaced by the t-statistic, where the
number of “degrees of freedom”, ν, is n–1.
Testing µ when σ is unknown…
12.6
When the population standard deviation is unknown
and the population is normal, the test statistic for
testing hypotheses about μ is:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

which is Student t distributed with ν = n–1 degrees of
freedom. The confidence interval estimator of μ is
given by:

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}, \qquad \nu = n - 1$$
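As a minimal sketch of these two computations, assuming the standard forms t = (x̄ − µ)/(s/√n) and x̄ ± t·s/√n, the Python below computes the test statistic and the interval from raw data. The function names, the sample values, and the choice to pass the critical value in from a Student t table are illustrative assumptions, not part of the slides.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for testing H0: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

def t_confidence_interval(sample, t_crit):
    """Return (lower, upper) limits x_bar +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical data, not taken from the examples in this chapter.
weights = [2.1, 1.9, 2.4, 2.0, 2.6]
t, df = t_statistic(weights, mu0=2.0)

# t_{0.025,4} = 2.776 is read from a Student t table for a 95% interval.
lower, upper = t_confidence_interval(weights, t_crit=2.776)

print(f"t = {t:.3f} with {df} degrees of freedom")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Passing the critical value as an argument keeps the sketch within the standard library; a statistics package could look it up instead.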
Example 12.1
12.7
It is likely that in the near future nations will have to
do more to save the environment.

Possible actions include reducing energy use and
recycling.

Currently (2007) most products manufactured from
recycled material are considerably more expensive
than those manufactured from material found in the
earth.
Example 12.1
12.8
Newspapers are an exception.

It can be profitable to recycle newspaper.

A major expense is the collection from homes. In
recent years a number of companies have gone
into the business of collecting used newspapers from
households and recycling them.

A financial analyst for one such company has
recently computed that the firm would make a profit
if the mean weekly newspaper collection from each
household exceeded 2.0 pounds.
Example 12.1
12.9
In a study to determine the feasibility of a recycling
plant, a random sample of 148 households was
drawn from a large community, and the weekly
weight of newspapers discarded for recycling for
each household was recorded.

Do these data provide sufficient evidence to allow
the analyst to conclude that a recycling plant would
be profitable?
Example 12.1 IDENTIFY
12.10

Our objective is to describe the population of the
amount of newspaper discarded per household,
which is an interval variable. Thus the parameter to
be tested is the population mean µ.

We want to know if there is enough evidence to
conclude that the mean is greater than 2. Thus,
H1: µ > 2

Therefore we set our usual null hypothesis to:
H0: µ = 2
Example 12.1 IDENTIFY
12.11
Data:
Example 12.1 IDENTIFY
12.12

The test statistic is:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \nu = n - 1$$

The alternative hypothesis is:
H1: µ > 2

The manager believes the cost of a Type I error is high and sets
the significance level at 1%, so the rejection region becomes:

$$t > t_{\alpha,\nu} = t_{0.01,147} \approx t_{0.01,150} = 2.351$$


Example 12.1 COMPUTE
12.13
Manually:
To calculate the value of the test statistic, we need
to calculate the sample mean and the sample
standard deviation s. From the data, we determine:

Thus,

and
Example 12.1 COMPUTE
12.14
Manually:
The value of µ is to be found in the null
hypothesis. It is 2.0. The value of the test statistic is:

Because 2.23 is not greater than 2.351, we
cannot reject the null hypothesis in favor of the
alternative. (Students performing the calculations
manually can approximate the p-value. The
online appendix Approximating the p-Value from
the Student t Table describes how.)
Example 12.1 COMPUTE
12.15
Example 12.1 INTERPRET
12.16

The value of the test statistic is t = 2.24 and its p-value is 0.0134.

There is not enough evidence to infer that the mean weight of
discarded newspapers is greater than 2.0.

Note that there is some evidence; the p-value is 0.0134. However,
because we wanted the probability of a Type I error to be small, we
insisted on a 1% significance level.

Thus, we cannot conclude that the recycling plant would be
profitable.

What if the significance level was set at 5%? (Discuss)
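One way to explore this question is the p-value rule: reject the null hypothesis when the p-value is smaller than the significance level. The sketch below applies that rule to the p-value reported for Example 12.1; the function name is an illustrative assumption.

```python
def reject_null(p_value, alpha):
    """Reject H0 when the p-value is smaller than the significance level."""
    return p_value < alpha

p_value = 0.0134  # p-value reported for Example 12.1

for alpha in (0.01, 0.05):
    decision = "reject H0" if reject_null(p_value, alpha) else "do not reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The same data lead to different conclusions at the two levels, which is why the significance level should be chosen before looking at the results.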


Example 12.2
12.17
In 2014 (the latest year reported), 146,861,217 tax returns
were filed in the United States. The Internal Revenue
Service (IRS) examined 1,228,117 of them to determine if
they were correctly done.
To determine how well the auditors are performing, a
random sample of these returns was drawn, and the
additional tax was reported, which is listed next.
Estimate with 95% confidence the mean additional
income tax collected from the 1,228,117 files audited.
Example 12.2
12.18
Random sample of tax collections
Example 12.2 IDENTIFY
12.19
The objective is to describe the population of
additional tax collected.

The data are interval.

The parameter to be estimated is the mean
additional tax.

The confidence interval estimator is

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
Example 12.2 COMPUTE
12.20

Thus,
Example 12.2 COMPUTE
12.21
Example 12.2 COMPUTE
12.22
Example 12.2… INTERPRET
12.23

We estimate that the mean additional tax collected
lies between $8,887 and $10,168.
We can use this estimate to help decide whether the
IRS is auditing the individuals who should be audited.
Check Required Conditions
12.24
The Student t distribution is robust, which means that
if the population is non-normal, the results of the t-test
and confidence interval estimate are still valid
provided that the population is “not extremely non-normal”.

To check this requirement, draw a histogram of the
data and see how “bell shaped” the resulting figure
is. If a histogram is extremely skewed (say in the case
of an exponential distribution), that could be
considered “extremely non-normal” and hence
t-statistics would not be valid in this case.
Histogram for Example 12.1
12.25
Histogram for Example 12.2
12.26
Estimating Totals of Finite Populations
12.27
The inferential techniques introduced thus far were
derived by assuming infinitely large populations. In
practice, however, most populations are finite.

When the population is small, we must adjust the test
statistic and interval estimator using the finite
population correction factor introduced in Chapter 9.

However, in populations that are large relative to the
sample size we can ignore the correction factor.
Large populations are defined as populations that
are at least 20 times the sample size.
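The rule of thumb can be sketched as a simple check. The correction factor below is the usual form, √((N − n)/(N − 1)), and is assumed to match the one introduced in Chapter 9; the function names and the population and sample sizes are hypothetical.

```python
import math

def is_large_population(population_size, sample_size):
    """Rule of thumb from the text: large means at least 20 times the sample size."""
    return population_size >= 20 * sample_size

def finite_population_correction(population_size, sample_size):
    """Usual correction factor sqrt((N - n) / (N - 1)) applied to the standard error."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

# Hypothetical sizes chosen only to show both cases.
for N, n in [(1_000, 100), (100_000, 100)]:
    large = is_large_population(N, n)
    factor = finite_population_correction(N, n)
    print(f"N = {N}, n = {n}: large = {large}, correction factor = {factor:.4f}")
```

When the population is large by this rule, the factor is so close to 1 that omitting it barely changes the standard error.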
Estimating Totals of Finite Populations
12.28
Finite populations allow us to use the confidence
interval estimator of a mean to produce a
confidence interval estimator of the population
total.

To estimate the total we multiply the lower and
upper confidence limits of the estimate of the mean
by the population size.

Thus, the confidence interval estimator of the total is:

$$N\left[\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}\right]$$
Estimating Totals of Finite Populations
12.29
For example, suppose that we wish to estimate the
total amount of additional income tax collected
from the 1,228,117 returns that were examined. The
95% confidence interval estimate of the total is:
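A minimal sketch of this calculation multiplies the confidence limits of the mean reported in Example 12.2 ($8,887 and $10,168) by the population size. Because those limits are rounded to the nearest dollar, the totals are approximate; the function name is an illustrative assumption.

```python
def total_interval(population_size, mean_lower, mean_upper):
    """Scale the confidence limits of the mean up to limits for the population total."""
    return population_size * mean_lower, population_size * mean_upper

N = 1_228_117  # number of returns examined
lower_total, upper_total = total_interval(N, 8_887, 10_168)

print(f"Total additional tax: ${lower_total:,} to ${upper_total:,}")
```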
Developing an Understanding of Statistical Concepts 1
12.30
This section introduced the term degrees of freedom.
We will encounter this term many times in this book,
so a brief discussion of its meaning is warranted. The
Student t-distribution is based on using the sample
variance to estimate the unknown population
variance. The sample variance is defined as:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

To compute s², we must first determine x̄.
Developing an Understanding of Statistical Concepts 1
12.31
Recall that sampling distributions are derived by repeated
sampling from the same population. To repeatedly take
samples to compute s², we can choose any numbers for
the first n − 1 observations in the sample. However, we
have no choice on the nth value because the sample
mean must be calculated first. To illustrate, suppose that
n = 3 and we find x̄ = 10.

We can have x1 and x2 assume any values without
restriction. However, x3 must be such that x̄ = 10.

For example, if x1 = 6 and x2 = 8, then x3 must equal 16.
Therefore, there are only two degrees of freedom in our
selection of the sample. We say that we lose one degree
of freedom because we had to calculate x̄.
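The constraint can be sketched in a few lines: once the sample mean and the first n − 1 values are fixed, the last value follows from n·x̄ minus the sum of the others. The function name is an illustrative assumption.

```python
def last_observation(free_values, sample_mean):
    """Value the final observation must take so the sample has the given mean."""
    n = len(free_values) + 1
    return n * sample_mean - sum(free_values)

# The illustration from the text: n = 3, mean 10, x1 = 6, x2 = 8.
print(last_observation([6, 8], 10))
```

Whatever values are chosen for the first two observations, the third is determined, which is the sense in which only n − 1 values are free.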
Developing an Understanding of Statistical Concepts 2
12.32
The t-statistic, like the z-statistic, measures the
difference between the sample mean x̄ and the
hypothesized value of µ in terms of the number of
standard errors. However, when the population
standard deviation σ is unknown, we estimate the
standard error by s/√n.
Developing an Understanding of Statistical Concepts 3
12.33
When we introduced the Student t-distribution in
Section 8-4, we pointed out that it is more widely
spread out than the standard normal. This
circumstance is logical. The only variable in the
z-statistic is the sample mean x̄, which will vary from
sample to sample.
The t-statistic has two variables: the sample mean x̄
and the sample standard deviation s, both of which
will vary from sample to sample. Because of the
greater uncertainty, the t-statistic will display greater
variability.
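A small simulation can illustrate the point. The sketch below draws repeated samples from a normal population and counts how often each statistic falls beyond ±1.96. The sample size, number of trials, cutoff, seed, and function name are all assumptions chosen for illustration.

```python
import math
import random

def tail_fractions(n=6, trials=20_000, cutoff=1.96, seed=1):
    """Fraction of simulated z- and t-statistics with absolute value above cutoff."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # population parameters, known to the simulation only
    z_hits = 0
    t_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        z = (x_bar - mu) / (sigma / math.sqrt(n))  # uses the known sigma
        t = (x_bar - mu) / (s / math.sqrt(n))      # uses the estimate s
        if abs(z) > cutoff:
            z_hits += 1
        if abs(t) > cutoff:
            t_hits += 1
    return z_hits / trials, t_hits / trials

z_frac, t_frac = tail_fractions()
print(f"|z| > 1.96: {z_frac:.3f}   |t| > 1.96: {t_frac:.3f}")
```

With small samples the t-statistic lands in the tails noticeably more often than the z-statistic, reflecting the extra variability that comes from estimating σ with s.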
Inference About Population Variance
12.34
If we are interested in drawing inferences about a
population’s variability, the parameter we need to
investigate is the population variance: σ²

The sample variance (s²) is an unbiased, consistent and
efficient point estimator for σ². Moreover, the statistic

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

has a chi-squared distribution with n–1 degrees of freedom.


Testing & Estimating Population Variance
12.35
The test statistic used to test hypotheses about σ² is:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

which is chi-squared with ν = n–1 degrees of
freedom.
Testing & Estimating Population Variance
12.36
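Combining this statistic:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

with the probability statement:

$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$

yields the confidence interval estimator for σ²:

$$\text{lower confidence limit} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}} \qquad \text{upper confidence limit} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$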


Example 12.3…
12.37

Container-filling machines are used to package a variety of liquids,
including milk, soft drinks, and paint.

Ideally, the amount of liquid should vary only slightly, since large
variations will cause some containers to be underfilled (cheating the
customer) and some to be overfilled (resulting in costly waste).

The president of a company that developed a new type of machine
boasts that this machine can fill 1-liter (1,000 cubic centimeters)
containers so consistently that the variance of the fills will be less than
1.
Example 12.3
12.38

To examine the veracity of the claim, a
random sample of 25 1-liter fills was taken
and the results (cubic centimeters)
recorded.

Do these data allow the president to make
this claim at the 5% significance level?
Example 12.3… IDENTIFY
12.39

The problem objective is to describe the population
of 1-liter fills from this machine.

The data are interval, and we're interested in the
variability of the fills.

It follows that the parameter of interest is the
population variance σ².
Example 12.3… IDENTIFY
12.40

Because we want to determine whether there is
enough evidence to support the claim, the
alternative hypothesis is:
H1: σ² < 1

The null hypothesis is:
H0: σ² = 1

and the test statistic we will use is:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
Example 12.3 COMPUTE
12.41
Manually:
Using a calculator, we find:

Thus,

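The value of the test statistic is:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = 15.20$$

The rejection region is:

$$\chi^2 < \chi^2_{1-\alpha,\nu} = \chi^2_{0.95,24} = 13.85$$

Because 15.20 is not less than 13.85, we cannot
reject the null hypothesis in favor of the alternative.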
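The comparison can be sketched as a left-tail chi-squared test. The sample variance used below, 0.6333, is not quoted in these slides; it is an assumption back-computed from the reported statistic (15.20 / 24), and the function names are illustrative.

```python
def chi_squared_statistic(n, sample_variance, sigma0_squared):
    """Test statistic (n - 1) * s^2 / sigma0^2 for H0: sigma^2 = sigma0^2."""
    return (n - 1) * sample_variance / sigma0_squared

def reject_left_tail(chi2_value, critical_value):
    """Left-tail test: reject H0 when the statistic falls below the critical value."""
    return chi2_value < critical_value

n = 25
s2 = 0.6333  # assumption: back-computed from the reported statistic, 15.20 / 24

chi2_value = chi_squared_statistic(n, s2, sigma0_squared=1.0)
rejected = reject_left_tail(chi2_value, critical_value=13.85)

print(f"chi-squared = {chi2_value:.2f}, reject H0: {rejected}")
```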
Example 12.3 COMPUTE
12.42
Example 12.3 INTERPRET
12.43
There is not enough evidence to infer that the claim is
true.

As we discussed before, the result does not say that
the variance is equal to 1; it merely states that we
are unable to show that the variance is less than 1.
Example 12.4 (Workshop)
12.44
Consistency of a Container-Filling Machine, Part 2
Estimate with 99% confidence the variance of fills in Example 12.3.
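A minimal sketch of the interval calculation follows, assuming the usual estimator (n − 1)s²/χ² with the two critical values taken from a chi-squared table for 24 degrees of freedom. The value 15.20 for (n − 1)s² is the test statistic reported in Example 12.3 (where the hypothesized variance is 1); the function name is an illustrative assumption.

```python
def variance_confidence_interval(sum_squared_deviations, chi2_right_tail, chi2_left_tail):
    """Limits (n - 1) s^2 / chi2_{alpha/2} and (n - 1) s^2 / chi2_{1 - alpha/2}."""
    lower = sum_squared_deviations / chi2_right_tail
    upper = sum_squared_deviations / chi2_left_tail
    return lower, upper

numerator = 15.20  # (n - 1) s^2, as reported in Example 12.3

# Table values for 24 degrees of freedom and 99% confidence:
# chi2_{0.005,24} = 45.559 and chi2_{0.995,24} = 9.886.
lcl, ucl = variance_confidence_interval(numerator, 45.559, 9.886)

print(f"99% CI for the variance: ({lcl:.4f}, {ucl:.4f})")
```

Note that the larger critical value produces the lower limit, because it appears in the denominator.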
Example 12.4 COMPUTE
12.45
Example 12.4 COMPUTE
12.46
Example 12.4… INTERPRET
12.47
In Example 12.3, we saw that there was not sufficient
evidence to infer that the population variance is
less than 1. Here we see that σ² is estimated to lie
between 0.3336 and 1.5375.

Part of this interval is above 1, which tells us that the
variance may be larger than 1, confirming the
conclusion we reached in Example 12.3.

We may be able to use the estimate to predict the
percentage of overfilled and underfilled bottles.
This may allow us to choose among competing
machines.
Checking the Required Condition
12.48
Factors That Identify the Chi-Squared Test
12.49

1. Problem objective: Describe a population
2. Data type: Interval
3. Type of descriptive measurement: Variability
Flowchart of Techniques
12.50
Describe a Population
  Data Type?
    Interval: Type of descriptive measurement?
      Central Location: t test & estimator of µ
      Variability: χ² test & estimator of σ²
    Nominal: z test & estimator of p
