
03758_11_ch10_p364-424.

qxd 9/7/11 12:47 PM Page 391

10.5 SMALL-SAMPLE INFERENCES FOR THE DIFFERENCE BETWEEN TWO MEANS: A PAIRED-DIFFERENCE TEST ❍ 391

reduction in the standard error more than compensated for the loss in degrees of
freedom.
Except for notation, the paired-difference analysis is the same as the single-sample
analysis presented in Section 10.3. However, both MINITAB and MS Excel provide
a single procedure (Paired t in MINITAB and t-Test: Paired Two Sample for Means in
MS Excel) to analyze the differences. The MINITAB output in Figure 10.12 reports the
p-value for the paired analysis as .000 (that is, less than .0005), indicating a highly
significant difference in the means. You will find instructions for generating both the
MINITAB and MS Excel outputs in the “Technology Today” section at the end of this chapter.
FIGURE 10.12
MINITAB output for paired-difference analysis of tire wear data

Paired T-Test and CI: Tire A, Tire B
Paired T for Tire A - Tire B
             N     Mean   StDev  SE Mean
Tire A       5   10.240   1.316    0.589
Tire B       5    9.760   1.328    0.594
Difference   5   0.4800  0.0837   0.0374
95% CI for mean difference: (0.3761, 0.5839)
T-Test of mean difference = 0 (vs not = 0): T-Value = 12.83  P-Value = 0.000
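
The paired analysis reduces to a one-sample t computation on the differences, so the key numbers in Figure 10.12 can be checked by hand. The following is a minimal Python sketch, not the procedure MINITAB itself runs. The function names are hypothetical, and the critical value 2.776 (t with 4 degrees of freedom, upper-tail area .025) is an assumption taken from a standard t table rather than computed.

```python
import math


def paired_t_from_summary(d_bar, s_d, n):
    """Return (standard error, t statistic, degrees of freedom) for H0: mu_d = 0."""
    se = s_d / math.sqrt(n)   # standard error of the mean difference
    t = d_bar / se            # test statistic under H0: mu_d = 0
    return se, t, n - 1


def paired_ci(d_bar, se, t_crit):
    """Return the interval d_bar +/- t_crit * se."""
    half_width = t_crit * se
    return d_bar - half_width, d_bar + half_width


# Summary values from the "Difference" row of Figure 10.12.
se, t, df = paired_t_from_summary(d_bar=0.4800, s_d=0.0837, n=5)

# Assumption: t_.025 with 4 df = 2.776, read from a standard t table.
T_CRIT_025_DF4 = 2.776
lower, upper = paired_ci(0.4800, se, T_CRIT_025_DF4)

print(f"SE = {se:.4f}, t = {t:.2f}, df = {df}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Because the standard deviation in the printout is rounded to .0837, the t statistic computed this way can differ from the printed 12.83 in the last digit.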

10.5 EXERCISES

BASIC TECHNIQUES

10.36 A paired-difference experiment was conducted using $n = 10$ pairs of observations.
a. Test the null hypothesis $H_0: (\mu_1 - \mu_2) = 0$ against $H_a: (\mu_1 - \mu_2) \neq 0$
for $\alpha = .05$, $\bar{d} = .3$, and $s_d^2 = .16$. Give the approximate p-value for the test.
b. Find a 95% confidence interval for $(\mu_1 - \mu_2)$.
c. How many pairs of observations do you need if you want to estimate $(\mu_1 - \mu_2)$
correct to within .1 with probability equal to .95?

10.37 A paired-difference experiment consists of $n = 18$ pairs, $\bar{d} = 5.7$, and
$s_d^2 = 256$. Suppose you wish to detect $\mu_d > 0$.
a. Give the null and alternative hypotheses for the test.
b. Conduct the test and state your conclusions.

10.38 A paired-difference experiment was conducted to compare the means of two populations:

              Pairs
Population    1    2    3    4    5
1           1.3  1.6  1.1  1.4  1.7
2           1.2  1.5  1.1  1.2  1.8

a. Do the data provide sufficient evidence to indicate that $\mu_1$ differs from $\mu_2$?
Test using $\alpha = .05$.
b. Find the approximate p-value for the test and interpret its value.
c. Find a 95% confidence interval for $(\mu_1 - \mu_2)$. Compare your interpretation of the
confidence interval with your test results in part a.
d. What assumptions must you make for your inferences to be valid?

APPLICATIONS

10.39 Auto Insurance (EX1039) In Exercise 2.4, we presented the annual 2010 premium for a
male, licensed for 6–8 years, who drives a Honda Accord 12,600 to 15,000 miles per year and
has no violations or accidents.¹¹

City             GEICO ($)  21st Century ($)
Long Beach            2780              2352
Pomona                2411              2462
San Bernardino        2261              2284
Moreno Valley         2263              2520
Source: www.insurance.ca.gov

a. Why would you expect these pairs of observations to be dependent?
b. Do the data provide sufficient evidence to indicate that there is a difference in the
average annual premiums between GEICO and 21st Century insurance? Test using $\alpha = .01$.
c. Find the approximate p-value for the test and interpret its value.
d. Find a 99% confidence interval for the difference in the average annual premiums for
GEICO and 21st Century insurance.
e. Can we use the information in the table to make valid comparisons between GEICO and
21st Century insurance throughout the United States? Why or why not?

392 ❍ CHAPTER 10 INFERENCE FROM SMALL SAMPLES

10.40 Runners and Cyclists II Refer to Exercise 10.27. In addition to the compartment
pressures, the level of creatine phosphokinase (CPK) in blood samples, a measure of muscle
damage, was determined for each of 10 runners and 10 cyclists before and after exercise.⁷
The data summary (CPK values in units/liter) is as follows:

                      Runners               Cyclists
                          Standard              Standard
Condition          Mean   Deviation      Mean   Deviation
Before Exercise  255.63      115.48     173.8       60.69
After Exercise   284.75      132.64     177.1       64.53
Difference        29.13       21.01       3.3        6.85

a. Test for a significant difference in mean CPK values for runners and cyclists before
exercise under the assumption that $\sigma_1^2 \neq \sigma_2^2$; use $\alpha = .05$. Find a
95% confidence interval estimate for the corresponding difference in means.
b. Test for a significant difference in mean CPK values for runners and cyclists after
exercise under the assumption that $\sigma_1^2 \neq \sigma_2^2$; use $\alpha = .05$. Find a
95% confidence interval estimate for the corresponding difference in means.
c. Test for a significant difference in mean CPK values for runners before and after exercise.
d. Find a 95% confidence interval estimate for the difference in mean CPK values for cyclists
before and after exercise. Does your estimate indicate that there is no significant
difference in mean CPK levels for cyclists before and after exercise?

10.41 America’s Market Basket (EX1041) An advertisement for a popular supermarket chain
claims that it has had consistently lower prices than one of its competitors. As part of a
survey conducted by an independent price-checking company, the average weekly total, based
on the prices of approximately 95 items, is given for this chain and for its competitor
recorded during four consecutive weeks in a particular month.

Week  Advertiser ($)  Competitor ($)
1             254.26          256.03
2             240.62          255.65
3             231.90          255.12
4             234.13          261.18

a. Is there a significant difference in the average prices for these two different
supermarket chains?
b. What is the approximate p-value for the test conducted in part a?
c. Construct a 99% confidence interval for the difference in the average prices for the two
supermarket chains. Interpret this interval.

10.42 No Left Turn (EX1042) An experiment was conducted to compare the mean reaction times
to two types of traffic signs: prohibitive (No Left Turn) and permissive (Left Turn Only).
Ten drivers were included in the experiment. Each driver was presented with 40 traffic
signs, 20 prohibitive and 20 permissive, in random order. The mean time to reaction (in
milliseconds) was recorded for each driver and is shown here.

Driver  Prohibitive  Permissive
1               824         702
2               866         725
3               841         744
4               770         663
5               829         792
6               764         708
7               857         747
8               831         685
9               846         742
10              759         610

MS Excel printout for Exercise 10.42

a. Explain why this is a paired-difference experiment and give reasons why the pairing
should be useful in increasing information on the difference between the mean reaction
times to prohibitive and permissive traffic signs.
b. Use the Excel printout to determine whether there is a significant difference in mean
reaction times to prohibitive and permissive traffic signs. Use the p-value approach.

10.43 Healthy Teeth II Exercise 10.25 describes a dental experiment conducted to investigate
the effectiveness of an oral rinse used to inhibit the growth of plaque on teeth. Subjects
were divided into two groups: One group used a rinse with an antiplaque ingredient, and the

control group used a rinse containing inactive ingredients. Suppose that the plaque growth
on each person’s teeth was measured after using the rinse after 4 hours and then again after
8 hours. If you wish to estimate the difference in plaque growth from 4 to 8 hours, should
you use a confidence interval based on a paired or an unpaired analysis? Explain.

10.44 Ground or Air? The earth’s temperature can be measured using either ground-based
sensors or infrared-sensing devices mounted in aircraft or space satellites. Ground-based
sensoring is very accurate but tedious, while infrared-sensoring appears to introduce a bias
into the temperature readings; that is, the average temperature reading may not be equal to
the average obtained by ground-based sensoring. To determine the bias, readings were obtained
at five different locations using both ground- and air-based temperature sensors. The
readings (in degrees Celsius) are listed here:

Location  Ground   Air
1           46.9  47.3
2           45.4  48.1
3           36.3  37.9
4           31.0  32.7
5           24.7  26.2

a. Do the data present sufficient evidence to indicate a bias in the air-based temperature
readings? Explain.
b. Estimate the difference in mean temperatures between ground- and air-based sensors using
a 95% confidence interval.
c. How many paired observations are required to estimate the difference between mean
temperatures for ground- versus air-based sensors correct to within .2°C, with probability
approximately equal to .95?

10.45 Red Dye (EX1045) To test the comparative brightness of two red dyes, nine samples of
cloth were taken from a production line and each sample was divided into two pieces. One of
the two pieces in each sample was randomly chosen and red dye 1 applied; red dye 2 was
applied to the remaining piece. The following data represent a “brightness score” for each
piece. Is there sufficient evidence to indicate a difference in mean brightness scores for
the two dyes? Use $\alpha = .05$.

Sample   1   2   3   4   5   6   7   8   9
Dye 1   10  12   9   8  15  12   9  10  15
Dye 2    8  11  10   6  12  13   9   8  13

10.46 Tax Assessors (EX1046) In response to a complaint that a particular tax assessor (A)
was biased, an experiment was conducted to compare the assessor named in the complaint with
another tax assessor (B) from the same office. Eight properties were selected, and each was
assessed by both assessors. The assessments (in thousands of dollars) are shown in the table.

Property  Assessor 1  Assessor 2
1              276.3       275.1
2              288.4       286.8
3              280.2       277.3
4              294.7       290.6
5              268.7       269.1
6              282.8       281.0
7              276.1       275.3
8              279.0       279.1

Use the MINITAB printout to answer the questions that follow.

MINITAB output for Exercise 10.46

Paired T-Test and CI: Assessor A, Assessor B
Paired T for Assessor A - Assessor B
             N    Mean  StDev  SE Mean
Assessor A   8  280.78   7.99     2.83
Assessor B   8  279.29   6.85     2.42
Difference   8   1.487  1.491    0.527
95% lower bound for mean difference: 0.489
T-Test of mean difference = 0 (vs > 0): T-Value = 2.82  P-Value = 0.013

a. Do the data provide sufficient evidence to indicate that assessor A tends to give higher
assessments than assessor B?
b. Estimate the difference in mean assessments for the two assessors.
c. What assumptions must you make in order for the inferences in parts a and b to be valid?
d. Suppose that assessor A had been compared with a more stable standard, say, the average
$\bar{x}$ of the assessments given by four assessors selected from the tax office. Thus, each
property would be assessed by A and also by each of the four other assessors and
$(x_A - \bar{x})$ would be calculated. If the test in part a is valid, can you use the
paired-difference t-test to test the hypothesis that the bias, the mean difference between
A’s assessments and the mean of the assessments of the four assessors, is equal to 0? Explain.

10.47 Memory Experiments (EX1047) A psychology class performed an experiment to compare
whether a recall score in which instructions to form images of 25 words were given is
better than an initial

recall score for which no imagery instructions were given. Twenty students participated in
the experiment with the following results:

         With     Without            With     Without
Student  Imagery  Imagery   Student  Imagery  Imagery
1             20        5   11            17        8
2             24        9   12            20       16
3             20        5   13            20       10
4             18        9   14            16       12
5             22        6   15            24        7
6             19       11   16            22        9
7             20        8   17            25       21
8             19       11   18            21       14
9             17        7   19            19       12
10            21        9   20            23       13

Does it appear that the average recall score is higher when imagery is used?

10.48 Music in the Workplace (EX1048) Before contracting to have stereo music piped into
each of his suites of offices, an executive had his office manager randomly select seven
offices in which to have the system installed. The average time (in minutes) spent outside
these offices per excursion among the employees involved was recorded before and after the
music system was installed with the following results.

Office Number  1  2  3  4  5   6  7
No Music       8  9  5  6  5  10  7
Music          5  6  7  5  6   7  8

Would you suggest that the executive proceed with the installation? Conduct an appropriate
test of hypothesis. Find the approximate p-value and interpret your results.

10.6 INFERENCES CONCERNING A POPULATION VARIANCE
You have seen in the preceding sections that an estimate of the population variance
$\sigma^2$ is usually needed before you can make inferences about population means.
Sometimes, however, the population variance $\sigma^2$ is the primary objective in an
experimental investigation. It may be more important to the experimenter than the
population mean! Consider these examples:
• Scientific measuring instruments must provide unbiased readings with a very
small error of measurement. An aircraft altimeter that measures the correct
altitude on the average is fairly useless if the measurements are in error by as
much as 1000 feet above or below the correct altitude.
• Machined parts in a manufacturing process must be produced with minimum
variability in order to reduce out-of-size and hence defective parts.
• Aptitude tests must be designed so that scores will exhibit a reasonable amount
of variability. For example, an 800-point test is not very discriminatory if all
students score between 601 and 605.

In previous chapters, you have used

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

as an unbiased estimator of the population variance $\sigma^2$. This means that, in repeated
sampling, the average of all your sample estimates will equal the target parameter,
$\sigma^2$. But how close or far from the target is your estimator $s^2$ likely to be? To
answer this question, we use the sampling distribution of $s^2$, which describes its
behavior in repeated sampling.
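
The idea of repeated sampling can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population mean (10), standard deviation (2), sample size (5), number of repetitions, and the function names are all hypothetical choices, not values from the text. It draws many samples from a normal population, computes the sample variance for each using the formula above, and averages the results.

```python
import random


def sample_variance(xs):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def simulate_s2(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from Normal(mu, sigma); return each sample's variance."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sample_variance([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


# Hypothetical population: mean 10, standard deviation 2 (so variance 4).
estimates = simulate_s2(mu=10.0, sigma=2.0, n=5, reps=20000)
avg_s2 = sum(estimates) / len(estimates)

print(f"average of the sample variances: {avg_s2:.3f}")
print(f"smallest and largest estimates: {min(estimates):.3f}, {max(estimates):.3f}")
```

With this many repetitions the average of the estimates should land close to the population variance of 4, while the individual estimates spread widely around it and never fall below 0.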
Consider the distribution of $s^2$ based on repeated random sampling from a normal
distribution with a specified mean and variance. We can show theoretically that the
distribution begins at $s^2 = 0$ (since the variance cannot be negative) with a mean equal
to $\sigma^2$. Its shape is nonsymmetric and changes with each different sample size and each
