Professional Documents
Culture Documents
Introductory
Statistics
A Problem-Solving Approach
S t e p h e n K o k o sk a
APPLICATIONS
Introductory
Statistics
A Problem-Solving Approach
Stephen Kokoska
Bloomsburg University
Omar Harran/Moment/Getty Images
Senior Publisher: Terri Ward
Senior Acquisitions Editor: Karen Carson
Marketing Manager: Cara LeClair
Development Editor: Leslie Lahr
Associate Editor: Marie Dripchak
Senior Media Editor: Laura Judge
Media Editor: Catriona Kaplan
Associate Media Editor: Liam Ferguson
Editorial Assistant: Victoria Garvey
Marketing Assistant: Bailey James
Photo Editor: Robin Fadool
Cover Designer: Vicki Tomaselli
Text Designer: Jerry Wilke
Managing Editor: Lisa Kinne
Senior Project Manager: Dennis Free, Aptara®, Inc.
Illustrations and Composition: Aptara®, Inc.
Production Coordinator: Julia DeRosa
Printing and Binding: QuadGraphics
Cover credit: Omar Harran/Moment/Getty Images
First printing
Optional Sections
(available online at www.whfreeman.com/introstats2e and on LaunchPad):
v
Contents
vi
Contents vii
Optional Sections
Chapter 13 Categorical Data and (available online at www.whfreeman.com/
Frequency Tables 651 introstats2e and on LaunchPad):
S tudents frequently ask me why they need to take an introductory statistics course. My
answer is simple. In almost every occupation and in ordinary daily life, you will have
to make data-driven decisions, inferences, as well as assess risk. In addition, you must be
able to translate complex problems into manageable pieces, recognize patterns, and most
important, solve problems. This text helps students develop the fundamental lifelong tool
of solving problems and interpreting solutions in real-world terms.
One of my goals was to make this problem-solving approach accessible and easy to
apply in many situations. I certainly want students to appreciate the beauty of statistics
and the connections to so many other disciplines. However, it is even more important for
students to be able to apply problem-solving skills to a wide range of academic and career
pursuits, including business, science and technology, and education.
Introductory Statistics: A Problem-Solving Approach, Second Edition, presents long-
term, universal skills for students taking a one- or two-semester introductory-level
statistics course. Examples include guided, explanatory Solution Trails that emphasize
problem-solving techniques. Example solutions are presented in a numbered, step-by-
step format. The generous collection and variety of exercises provide ample opportunity
for practice and review. Concepts, examples, and exercises are presented from a practical,
realistic perspective. Real and realistic data sets are current and relevant. The text uses
mathematically correct notation and symbols and precise definitions to illustrate statisti-
cal procedures and proper communication.
This text is designed to help students fully understand the steps in basic statistical ar-
guments, emphasizing the importance of assumptions in order to follow valid arguments
or identify inaccurate conclusions. Most important, students will understand the process
of statistical inference. A four-step process (Claim, Experiment, Likelihood, Conclusion)
is used throughout the text to present the smaller pieces of introductory statistics on which
the larger, essential statistical inference puzzle is built.
LaunchPad
Introductory Statistics is accompanied by its own dedicated version of W. H. Freeman’s
breakthrough online course space, which offers the following:
• Pre-built Units for each chapter, curated by experienced educators, with media for
each chapter organized and ready to assign or customize to suit the course.
• All online resources for the text in one location, including an interactive e-Book,
LearningCurve adaptive quizzing, Try It Now exercises, StatTutors, video technology
manuals, statistical applets, CrunchIt! and JMP statistical software, EESEE case stud-
ies, and statistical videos.
• Intuitive and useful analytics, with a Gradebook that allows instructors to see how the
class is progressing, for individual students and as a whole.
• A streamlined and intuitive interface that lets instructors build an entire course in
minutes.
ix
x Preface
New Chapter 0
This introductory chapter eases students into the course and Kokoska’s approach. It in-
cludes about a dozen exercises that instructors can assign for the first day of class, helping
students settle into the course more easily.
Features
Focus on Statistical Inference The main theme of this text is statistical inference and
decision making through interpretation of numerical results. The process of statistical
inference is introduced in a variety of contexts, all using a similar, carefully delineated,
four-step approach: Claim, Experiment, Likelihood, and Conclusion.
n Be able to classify a data set as categorical or numerical, discrete or continuous.
n Learn several graphical summary techniques.
n Construct bar charts, pie charts, stem-and-leaf plots, and histograms.
FREE136_CH09_390-459.indd Page 391 29/08/14 9:41 AM f403 /207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_9781
Preface xi
Can the Florida Everglades be saved?
Burmese pythons have invaded the Florida Everglades and now threaten the
9
wildlife indigenous to the area. It is likely that people were keeping pythons as
pets and somehow a few animals slithered into Everglades National Park. The
Chapter Opener Each chapter begins with a unique, real-world
Option
first python was found in the Everglades in 1979, and these snakes became an
officially established species in 2000.1 The Everglades has an ideal climate for the
pythons, and the large areas of grass allow the snakes plenty of places to hide.
Hypothesis Tests Based
question, providing an interesting introduction to new concepts
and an application to begin discussion. The chapter question is
In January 2013, the Florida Fish and Wildlife Conservation Commission started
the Python Challenge. The purpose of the contest was to thin the python popu-
lation, which could be tens of thousands, and help save the natural wildlife in
on a Single Sample
presented again as an exercise at the end of the chapter, to close
the concept and application loop, as a last step.
the Everglades. There were 800 participants, with prizes for the most pythons
captured and for the longest. At the end of the competition, 68 Burmese pythons
had been harvested.
Suppose a random sample of pythons captured during the Challenge was Looking Back
obtained. The length (in feet) of each python is given in the following table. =
n Recall that x, p, and s2 are the point estimates for the parameters m, p, and s2 .
9.3 3.5 5.2 8.3 4.6 11.1 10.5 3.7 2.8 5.9
n Remember how to construct and interpret confidence intervals.
7.4 14.2 13.6 8.3 7.5 5.2 6.4 12.0 10.7 4.0
11.1 3.7 7.0 12.2 5.2 8.1 4.2 6.1 6.3 13.2 n Think about the concept of a sampling distribution for a statistic and the process of
standardization.
3.9 6.7 3.3 8.3 10.9 9.5 9.4 4.3 4.6 5.8
4.1 5.2 4.7 5.8 6.4 3.8 7.1 4.6 7.5 6.0 Looking Forward
The tabular and graphical techniques presented in this chapter will be used to n Use the available information in a sample to make a specific decision about a population
describe the shape, center, and spread of this distribution of python lengths and parameter.
to identify any outliers. n Understand the formal decision process and learn the four-part hypothesis test procedure.
ContEnts n Conduct formal hypothesis tests concerning the population parameters m, p, and s2 .
2 1 2 1 2
358 20.89 402 0.04 441 1.02
361 20.78 406 0.12 443 1.18
364 20.67 408 0.21 446 1.36
163
xii 371
Preface
376
20.57
20.47
412
424
0.29
0.38
447
449
1.61
2.04
4.4 Conditional Probability
probability that a person is male Minitab, or ExcelNever is presented at the end of each text
and divorced is 0.043, the example.
Married (R)This married
allows(N)students Widowed to (W)
focusDivorced
on concepts (D)
intersection of the first row and
Male (M) and interpretation.
0.282 0.147 0.013 0.043 0.485
the fourth column. The probabilities 4.4 Conditional Probability 163
obtained by summing across rows
Female (F) 0.284 0.121 0.050 0.060 0.515
or down columns are calledThe body of the table contains 0.566
Solution 0.268 0.063 0.103 1.000
x marginal probabilities. The total
intersection probabilities: the
Step 1 The keyword is and, which means intersection. The probability of female (F )
probability of a row event andPa( F d W ) 5 0.050
440
probability in the table is 1.000.
column event. For example, the
and widowed (W ) is found by reading the appropriate cell.
probability that aStep
person 2 The keyword is suppose. That suggests
is male Never conditional probability. The extra infor-
420 and divorced is 0.043, the mation is male. Married (R) married (N) Widowed (W) Divorced (D)
intersection of the first row and
400 Male (M)
the fourth column. The probabilities P ( N d M ) 0.282 0.147 0.013 0.043 0.485
380 obtained by summing across P ( N 0 M ) 5Female (F)
rows
0.284 0.121 0.050
Translated 0.060probability;
conditional 0.515definition.
or down columns are called
(
P M )
0.566 0.268 0.063 0.103 1.000
360 marginal probabilities. The total 0.147
P(F d W) 5 0.050
340
probability in the table is 1.000. 5 5 0.303 Use known probabilities.
2 1 0 1 2 z Step 2 0.485
The keyword is suppose. That suggests conditional probability. The extra infor-
mation is male.
Figure 6.63 Normal probability plot for the Step
Figure 6.64 JMP normal probability 3 This is another conditional
P(N dprobability.
M) This time, the event R is given.
chemotherapy dose data. plot. P(N 0 M) 5 Translated conditional probability; definition.
P ( F d R ) P(M)0.284
P(F 0 R) 5 50.147 5 0.502
The histogram, backward empirical rule, IQR/s, and the normal probability plot P ( R ) 5 0.485
0.566
5 0.303 Use known probabilities.
Theory Symbols More In general, for any two 5 P(R d M) 1 P(R d Mr)
events A and B,
In general, for any two events A and B,
advanced material, which P ( A ) 5 P ( A d P(A)
B) 1 ( )
5 P(AAdd
P B)Br
1 P(A d Br)
may be found in “A Closer
This decomposition technique technique
This decomposition is oftenisneeded in order
often needed in ordertotofind P(A).
find P(A). TheThe
VennVenn diagram
diagram
Look” and regular ex- in Figure 4.19 illustrates this equation.
in Figure 4.19 illustrates this equation.
The events B and B9 make up the entire sample space: S 5 B c Br.
position Page
FREE136_CH03_072-121.indd as 75appropriate,
20/08/14 11:45 AM f403
The events B and B9/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292
make up the entire sample space: S 5 B c Br. S
...
c. In words, expression (c) says subtract 7 from each observation, square each difference,
Try to draw the Venn diagram to 2. Suppose B1, B2, and B3 are mutually exclusive and exhaustive:
How TotheBoxes
and add This feature provides clear steps for con-
resulting values.
g (xi 2 7) 2 5 basic
illustrate this equality. B1 c B2 c B3 5 S. For any other event A,
How to Construct a Standard Box Plot structing (x 2 7) 2graphs
1 (x 2 7)or
2
1performing
(x 2 7) 2 1 (x4 2essential
7) 2 1 (x5 2 7)calculations.
2
1 (x6 2 7) 2
P(A) 5 P(A d B1) 11 P(A d B22) 1 P(A d3 B3)
Given a set of n observations x1, x2, . . . , xn: How To boxes are color-coded and easy to locate within
Expand summation notation.
1. Find the five-number summary xmin, Q1, ~ x , Q3, xmax. each chapter.
5 (5 2 7) 2 1 (9 2 7) 2 1 (12 2 7) 2 1 (26 2 7) 2 1 (17 2 7) 2 1 (22 2 7) 2
e minimum 2. Draw a (horizontal) measurement axis. Carefully sketch a box with edges at the quar-
Use given data.
s the tiles: left edge at Q1, right edge at Q3. (The height of the box is irrelevant.)
3. Draw a vertical line in the box at the median. 5 (22) 2 1 (2) 2 1 (5) 2 1 (213) 2 1 (10) 2 1 (29) 2 Compute each difference.
4. Draw a horizontal line (whisker) from the left edge of the box to the minimum value 5 4 1 4 1 25 1 169 1 100 1 81 5 383 Square each difference, and add.
(from Q1 to xmin). Draw a horizontal line (whisker) from the right edge of the box to the
maximum value (from Q3 to xmax).
Try IT Now Go To ExErCiSE 3.2
The most common measure of central tendency is the sample, or arithmetic, mean.
Figure 3.36 illustrates this step-by-step procedure and shows a standard box plot with
the five numbers indicated on a measurement axis. Note that the length of the box is the
Definition
interquartile range. The box contains the middle half of the values.
x is read as “x bar.” The sample (arithmetic) mean, denoted x, of the n observations x1, x2, . . . , xn is the sum
of the observations divided by n. Written mathematically.
x 5 g xi 5
Definition/Formula Boxes Definitions and formulas are 1 x1 1 x2 1 c1 xn
(3.1)
n n
clearly marked and outlined with clean, crisp color-coded lines.
Figure 3.36 Standard box plot. xmin Q1 ~
x Q3 xmax
A CLoser L ok
The position of the vertical line in the box (median) and the lengths of the horizontal
lines (whiskers) indicate symmetry or skewness, and variability. Figure 3.37 shows a 1. The notation x is used to represent the sample mean for a set of observations denoted
standard box plot for a distribution of data that is skewed to the right. The lower half of by x1, x2, . . . , xn. Similarly, y would represent the sample mean for a set of observations
the data is in the interval from 3 to 4.5, while the upper half of the data is much more denoted by y1, y2, . . . , yn.
spread out, from 4.5 to 11. Figure 3.38 shows a standard box plot for a fairly symmetric 2. The population mean is denoted by m, the Greek letter mu.
distribution with lots of variability. The lower and upper half of the data are evenly distrib-
The sample contains 28 observations and 18 successes. The sample proportion of suc- 113 124 141 1
Data Set
cessesInismany states it is against the law to drive without a fastened seatbelt. The State Police
power was measured for each engine. ENGINEhP
SeatBeLt recently established a checkpoint along a heavily traveled road. A success was recorded
FREE136_CH03_072-121.indd Page 82 20/08/14 forf403
11:45 AM =
p5
n(S) 18
and a failure
5 a. Find the mean and the standard deviation of these
a driver wearing a/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292
seatbelt, recorded otherwise. The observations from
5 0.6429 ... a. Find the mean and
this checkpoint are given in the following
n 28 table.
b. Find the z-score fo
Approximately 64% of the drivers stopped at the checkpoint werehorsepower measurements.
Preface Findxiii
wearing their seatbelts.
S assume
S F theFvalue S ofS p= isF close
S toFthe Spopulation
F S proportion
S S c. the mean and
It is reasonable to
successes—in thisSexample,
F Sthe true S proportion
S S Fof drivers S S whob.F wear
Find F theS actual
S a seatbelt. F
of
proportion of observations within one
standard deviation of the mean, within two standard z-scores.
82 Cha P T E R 3 NumericalTry The
Summary sample contains
IT Measures
Now 28 observations
3.10 and 18 successes. The sample proportion of suc-
cesses is
Go To ExErCisE
deviationsFREE136_CH03_072-121.indd
of the mean, and within three 11:45
standard d. For any set of obse
Page 78 20/08/14 AM f403 /207/WHF00260/9781429239769_KOKOSKA/K
Technology Corner This feature, atp= the 5
end
n(S)
5
18of most sec-
5 0.6429 deviations of the mean. standard deviation
n 28
tions,
Technology presents
Corner step-by-step
FREE136_CH03_072-121.indd Page 82 20/08/14 11:45 AM f403
instructions for using CrunchIt!,
c. Using the results in part
/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292 (b), do
... you think the shape of the prove this result.
Approximately 64% of the drivers stopped at the checkpoint were wearing their seatbelts.
=
the TI-84, Compute theMinitab, isand
sample mean,Itsample Excel
reasonable atotrimmed
assumetomean,
thesolve
value pthe
of mode.is closeexamples
distribution
to the of horsepower
population proportion of measurements is normal? Why
Procedure: median, and the
successes—in this example, the true proportion of drivers who wear a seatbelt. Challenge
presented in that section. Keystrokes, menu items, specific
Reconsider: Example 3.2, solution, and interpretations.
or why not? 78 C h a P T E R 3 Numerical Summary Measures
functions, and screenTry
CrunchIt!
IT Now Go To ExErCisE 3.10
illustrations are presented. 3.94 Travel and Transp
82 CrunchIt!Cha
hasPaT Ebuilt-in functionSummary
to find certain descriptive statistics, including the sample mean and the sample median.
R 3 Numerical Measures
There is no built-in function to compute a trimmed mean nor a sample mode. 3.92 Sports and Leisure In 1974, Erno Rubik created an setts Bay Transportation A
Figure 3.9 The sample mean Figure 3.10 The sample mean Figure 3.11 The second output 2. In general, the population mean is not equal to the population
Technology
1. Enter and
thethe
datasample
into a median Corner
column. using is part of the output from the imaginative
screen from 1-Var StatsandFREE136_CH07_294-331.indd
best-selling puzzle—the Rubik’s
Page 300 10/31/14 8:54 AMcube.
s-w-149Many
distribution Boston’s Loganthen mAirport
/207/WHF00260/9781429239769_KOKOSKA/K
of the population is symmetric, 5m ~. o
2. Selectbuilt-in calculator
Statistics;
FREE136_CH02_026-071.indd
functions.Statistics. Choose
Descriptive
Page 64 19/08/14
1-Var the
8:44 AM
Stats function.column and click
appropriate
f403 a trimmed mean, and
shows
thethe sample median
countries
Calculate button.(Med).
hold See
competitions in which participants
/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292 3. Thetry to solve
relative positions
... of x A
and random
~x suggest sample
the shape of
of atrav
dist
Procedure:
FREE136_CH03_072-121.indd
Figure 3.8. Page Compute
82 the 11:45
20/08/14 sample
AM mean,
f403
Reconsider: Example 3.2, solution, and interpretations.
sample median, the mode.
Helpful Icons
/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292
provides a 3.3–3.5
in Figures theillustrate
results arepossibilities:
three given in th
Minitab ~
Page 113 20/08/14 11:47 AM f403 a. If x . x , the distribution of the sample is positively skewed
CrunchIt!
There are several ways to find the summary statistics using Minitab. In addition to the general sampleDescribefrom
command, thetherelist
are of record
FREE136_CH03_072-121.indd
times (in seconds) in
Data(Figure official
Set 3.3).
/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_I
icons indicate46.5 when38.3 a
Calc; Column has
CrunchIt! statistics functions,
a built-in functionCalc; Calculator
to find certain functions,
descriptiveand variousincluding
statistics, macros. the sample mean and the sample median.
world competitions.29300 CUBETIME 39.1
Therethe
1. Enter is no
databuilt-in function
into column C1.to compute a trimmed mean nor a sample mode. data
C h a p t e r 7 Sampling b.set
If x is
Distributions <~available
x , the distribution online, and
of the sample also
is approximately sy
822. Select Cha P T E R 3 Numerical Summary Measures
a. Find the actual proportion of observations within c. If x ,one
~x , the distribution of the39.0 34.8 36.5
1.64
Stat;theBasic
data Statistics; Display Descriptive Statistics. Enter C1 in the Variables window.
Choose
3.Figure
Enter
theThe
CHaPTer
into a column.
Statistics option
2 Tables
button Statistics.
and
and Graphs
check3.10 the summary
for Summarizing
statistics
mean Mean,
Data
Median, Mode,theand Trimmed mean. See the name of the data set. sample is negatively skewe
Note:
and the
3.9
2. Select
Minitab
sample
sample
Statistics;
Figure
computes
median
mean
Descriptive
using
3.8 Crunchit! Figure
only a 5% trimmed
Choose
descriptive
is part of mean.
Thethesample
appropriate
statistics.
Other from
the output macros
Figure
column
theallow any percentage.
3.11
and click
screen from See
The standard
second
Calculate
Figure
1-Var
output
3.12.
Stats
button. deviation of the mean, within two standard
(Figure 3.5).
sample consists of pieces of coal a.(objects)
Find and values (are
thethemean x )t
Figure 3.8.
built-in calculator functions.
TI-84 Plus C 1-Var Stats function. deviations
shows the sample median (Med). of the mean, and within three standard ments. However, it is common practice 3.4 to
data
Five-Number
set.
Summary
say, “Consider thean ra
There are3. Eachways class is labeled with its and
right theendpoint. In addition,
using the Excel places observations
There is noon a boundary STaTISTICaL aPPLeT
in the smaller class. Statistical
mium measurements.” Applet icons indicate
several to find the sample mean sample median graphing deviations
calculator. of the mean.
built-in
b.presented
Compute each z-sc
Minitab See Figure
function to compute 2.51.mean nor a sample mode.
a trimmed
b. Using
MeaN aND MeDIaN
statistical
3. Unless stated applets
otherwise, that are
all data available in this text are obtai
There are several ways to find the summary statistics using Minitab. In addition to the general Describe command, the results in part (a), do you think the
there are dom shape
sample.of the
1. Enter the data into list L1. in LaunchPad. c. Find a general form
LIST distribution
; MATH; median to of national record times is normal? Why or
Calc; Column statistics functions, Calc;64Calculator
FREE136_CH02_026-071.indd
functions, and various macros.
2. Use the command LIST ; Page MATH; mean 19/08/14 8:44 AM
to compute thef403 /207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292
sample mean. Use the command STePPeD ...
STEPPED TUToRIaL
TUTORIALS
Data Set Example~
x x̄ 7.4 Cattle Feeds~x ¯x
1. Enter the data
compute theinto column
sample C1. See Figure 3.9.
median.
Figure 3.8 Crunchit! descriptive statistics. why not? MeaSUReS
BOX PLOTS of CeNTeR:
3.95 Reconsider Examp
Select
2. 3. The Stat;
3. Choose
screen the
and
Figure
BasicSTAT
function Statistics;
Statistics
; CALC;
Theoption
the sample
3.9
Display
1-Var
button
median
sample mean
Descriptive
is and
Stats returns
check
on the
Statistics.
the summary
second,
Figure denoted
3.10 The
Entersummary
several C1 in thestatistics.
statistics
by Med.
sample Mean,
See
VariablesThe
Median,
Figures
mean
window.
sample mean is on the first output
Mode,
3.10Figure andc.
and 3.11.
3.11 Construct
Trimmed
The mean.
second
FeeD
MeaN aND
STePPeD
STEPPED MeDIaN
TUToRIaL
TUTORIALS
output a histogram for these data. Describe the3 shape
Stepped
Figure
The
Figure 3.43
following Tutorial
Modified
table
3.3 Positively listsbox icons
theplot indicate
for the3.4
percentage
Figure of protein for the entire
Approximately po
Figu
TI-84 Plus C Box PLoTS sled
cattle
skeweddog trip
feeds. data.
distribution. guaranteed
symmetric 0 life.
distribution.2 That
4 skew is,6
Note: and
Minitab computes
the sample onlyusing
median a 5% trimmed mean. is part ofOther macrosfrom
the output allowthe any percentage. Seefrom
screen Figure 3.12.Stats
1-Var
ofmedian
the distribution.
BOX PLOTS
is no built-inNote: The population mean is the
detailed tutorials for specific calcu-
There
built-inare several functions.
calculator ways toFigure
find the3.12
sample meandescriptive
1-Var
Minitab
and thefunction.
Stats sample median using the
statistics.
graphing
shows calculator.
the sample There
(Med).
age, then the store would
function to compute a trimmed mean nor a sample mode.
64
recall:
sum of A
allhistogram consists of
the observations lations.Note: IF
Because the sample
L 18
than 0 hours.
and OF are negative even though an observed
mean
19L 34
That’s
is extremely
26 11sensitive3 20 to outliers,
9 12 an
CHaPTer 2 Tables and Graphs for Summarizing Data
1. Enter the data into list L1.
rectangles
divided by ndrawn
5 20:above each class
m 5 15.8. very insensitive 24 OK.
to13outliers, This
10 is
it 4seems a24
correct
10 statistical
reasonable to 25
search15calculat
for 12
ac
Minitab
excel even though it seems odd. Figure 3.42 shows a technology
2. Use the command LIST ; MATH; mean to compute the sample mean. to
Use LIST with height proportional to central tendency. A trimmed mean is moderately sensitive to outli
There
Excelare several ways to find thethese
summary statistics using Minitab. In addition thethe command
general Describe ; MATH; there
command, median
Statisticsare
to
SOLUTION TRAIL
Calc;
has built-in functions
compute the sample
also Column
be used statistics
to compute
for
median.
functions,
four descriptive
See
Calc;
several summary
Figure 3.9.
Calculator
statistics.
functions,
statistics
Under
VIDEO the TECH
Data tab,
and various macros.
simultaneously.
Data
MANUALS Analysis; Descriptive can frequencyViDeo Video Tech Manual icons indi-
or relative
VIDEO TECH
tech frequency.
MANUALS
ManualS Find an approximate sampling distribution for the sample mean of
Try IT Now Go To ExErCiSE 3.91
3. The function STAT ; CALC; 1-Var Stats returns several EXEL statistics. The sample mean is on the first outputWe draw SaMpling
DISCRIPTIVE
summary a curve
EXEL along the tops
DISCRIPTIVE
FroM Definition
this population.
1.1.Enter
Enterthe Eachinto
3. data
the data
screen
class
andinto
is labeled with its right endpoint. In addition,
thecolumn
columnC1.
sample A.
median is on the second, denoted by Med.
Excel places observations on a boundary in theasmaller
See Figures 3.10 and 3.11. Page 64 19/08/14 8:44 AM
FREE136_CH02_026-071.indd of the cate video instructions for solving
Data to
f403rectangles
class.
Setsmooth out
A CLoser L mean,
ok denoted xtr(p), of the n observations x1,
/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_
See Figure 2.51.function A 100p% trimmed
3.4
2.2.Select Stat; Basic Statistics; Display Descriptive
to compute Statistics.
the sampleEnter in the median,
Variablestrimmed
window.mean, and mode. Note: the
Use the appropriate Excel mean,C1sample the histogram and display an Solution
certain kinds of problems using sta-
3. Choose
secondthe
Note:
tionsMinitab
Statistics
argument
down tocomputes
in option
TRIMMEAN
the nearestonly
buttonisand
thecheck
a 5% trimmed
multiple
the summary
total proportion
of 2. Seemean.
FigureOther
statistics
of data Mean,
trimmed. Median,
Excel roundsMode,
3.13.macros allow any percentage. See Figure 3.12.
Five-Number
and Trimmed
the number mean.observa-
of trimmed
summary
enhanced graphical
representation of the distribution. tistical
Step
and
1 software.
1. placed
be Order There
the are
Box
mean of
When
a measurement
b5
observations
on the same 15,504
from
Plots
wethe trimmed
compare
possible
smallest
axis to
twodata set.
20(or more) data sets graphically, the corres
samples
above theofother
largest.
(one size usi
5.
Figure 2.51 Excel histogram.
Figure 3.12 Minitab descriptive statistics. 5
side-by-side
2. Delete, orwith a vertical
trim, axis). Figureand
3.44 shows three boxof
plot
every onethe
of smallest 100p%
these samples, selectthe largest
some 100p%
(say, th
100) samp
ment axis, representing the number of gallons of gasoline pumpe
SolutionATrail
Figure 3.44 Ti-84 Plus C box
icons
box plot, data within
orset.
vehicles
mean fortheeachexercise
at box-and-whisker
the sets in-a histogram of t
three different stations. plot, is a compact g
sample, and construct
excel 64 CHaPTer 2 plotsfor
Tables and Graphs forSummarizing
gasoline data.
Data 3. Compute the sample mean for the remaining data.
Excel has built-in functions for these four descriptive statistics. Under the Data tab, Data Analysis; Descriptive Statistics can dicate the opportunity
mation Try about
Step 2 The for
central students
tendency,
table below to create theirof size
lists the firstsymmetry,
few samples skewnes
5 and the
also be used to compute several summary statistics simultaneously. 100p isITthe
Nowtrimming Gopercentage,
To ExErCiSE the3.94
percentage of observations
own Solution box Trails.
plot ofis
theconstructed
ordered list. using
Sample the minimum
x and ma
Sampl
1. Enter theseCTIoN 2.4 exerCIses
data into column A.
Figure 3.13 Excel descriptive
2. Use the appropriate Excel function to compute the sample mean, sample median, 3.
statistics.
Eachmean,
trimmed classand
is labeled
mode. Note:
technology and
withtheits right endpoint. In addition, Excel
Corner third
places quartiles,
observations on a25 and
boundary
13 18 the
in median.
the10
smaller
12 class.
15.6This
VIDEO TeCH
24 collectio
3 13
second argument in TRIMMEAN is the total proportion of data trimmed. Excel roundsSee
Concept Check Figure of
the number
Practice 2.51.
trimmed observa- VIDEO TECH MANUALS 3 10 13 12 11VIDeo 9.8TECH MaNUaLS
MANUALS
4 24 10
tions down to the nearest multiple of 2. See Figure 3.13. summary.
EXEL DISCRIPTIVE A CLoser L ok
19 9 10 11 16Box13.0
EXELPLoTS
DISCRIPTIVE
20 16 26
2.68 True/False Figure 3.12 Minitab
The classes descriptivedistribution
in a frequency statistics. may Procedure:
2.77 Consider the data given in the following table. Construct
ex2.77a box plot.
1. We compute a trimmed( mean by deleting( the smallest and( la
overlap. Reconsider: Example 3.20, solution, and interpretations.
possible outliers. Some statisticians believe that deleting any da
Grouped Exercises
excel2.69 True/False The classes inKokoska
Excel has built-in functions for these
a frequency distribution 87 81 86
91 Descriptive
four descriptive statistics. Under the Data tab, Data Analysis;
90
offers a wide variety of interesting,Crunchit!
86 86 Statistics
87 can
88 85 79
88 85 92 85 87 86
91 87
engaging exercises Step82
every Definition
observation
3 Figure contributes
7.1 shows to theof
a histogram bigthepicture.
sample means for 10
should have the same width. Figure 2.51 Excel histogram.
also on relevant
be used topics,
to compute several summarybased
statistics on current data, at the 91end81of 89
simultaneously. each89section and
CrunchIt!
83 90 83 80 90 80chapter.
has a These
built-in function to 2. A 100p%
Step 4 There
construct a box trimmed
are some
plot. mean
very is computed
interesting, by deleting
curious results the smalles
here. Even
1. Enter
2.70 True/False The cumulative relative frequency for each
problems provide
the data into column A. plenty of opportunity for practice,89 review,
85 86 and
90 application
90 89
1. 78
Enter 91
the of
data concepts.
83
into 92
a
The
column.
five-number
100p% thesummary
notofnormally
observations.
distributed for
(the apopulation
Therefore, set of
2(100p)%n
is observati
of theand
finite; observ
con
class in a frequency distribution may be greater than 1. 3. Therecattle
is no feeds in for
set rule Figure 7.2), the the
determining shape of the
value of p.sampling
It seemsdisre
Answers to odd-numbered
Figure the
2. Use the appropriate Excel function to compute 3.13
statistics.
second argument in TRIMMEAN is the total
Excel mean,
sample
section
descriptive
sample median, trimmed mean, and mode. Note: the
andcanchapter exercises are given at the
proportion of data trimmed. Excel rounds the number of trimmed observa- back
2. Select of data
Graphics; mum
theBoxbook.
Plot. value,
Choose the the maximum
appropriate
a few column,
approximately
observations, and
value,
optionally
normal!
to select
the
p enter
In addition, first
a the
so that Title, Xand
Label,
center
np (the
third
of the
number and Y
samp
of ob
2.71 True/False The relative frequency for each class be Construct a frequency distribution to summarize these
tionsdetermined
down to the nearest multiple of 2. See Figure 3.13.frequencies. recall: The range of a data set is calculate button. The resulting box plot is shown
mean
each end in
ofisthe Figure 3.45.
approximately an integer.mean (m 5 15.8). A
ordered data)theispopulation
Exercises byare
usinggrouped
the cumulative according
relative to: using the class intervals 78–80, 80–82, 82–84, . . . . between the original population parameters and the sampli
4. Here is a specific example using the notation: xtr(0.05) is a (100
seCTIoN
2.72 True/False 2.4 exerCIses
The only difference between a frequency the largest observation (maximum
2.78 Consider the data given on the text website. Construct a mean.ters is not
In this clear, there
example, 10% is certainly less variability
of the observations are in the samp
discarded.
histogram and a relative frequency histogram (for the same
Concept
data) Check
is the scale on the vertical axis.
value)
frequency
Practice
minus tothe
distribution smallest
summarize these data. ex2.78 These fivethe numbers do provide
population distribution (s 5 a glimpse
8.11). ofobserv
These three sym
example distribution of the sample mean,
3.6 example,
overtime andasstress
discussed in the next se
2.68 True/False A
2.73 The classes incan
histogram a frequency
be used todistribution
describe themay
2.79observation (minimum
Consider the following frequencyvalue). This
distribution.
2.77 Consider the data given in the following table. ex2.77
ity in a data set. For minimum and maxim
to an article in The Guardian,3 Americans spend more tim
overlap.center, and variability of a distribution.
shape, descriptive
Concept statisticFigure
Check was 2.51
DaTahistogram.
our Excel
first
SeT
Cumulative
gest lotsAccording
of variability.
ers in Germany
If the median is approximate
do. Dr. Paul Landsbergis, an epidemiologist at M
8
Figure 3.13 Excel descriptive 87 81 86 90 88 85 79 91 87 82 STReSS 25
2.69 Short
2.74 True/False
answer The classes in a frequency distribution attempt
True/False, at measuring variability in maximum values and approximately halfway betw
86 Fill-in-the-Blank, and87 Short-
statistics. Relative relative studies job stress, and he warns that too many overtime hours may inc
91Class
86 Frequency87 88 frequency
85 92 85 86 20 6
should have is
a. When thea same width.
density histogram appropriate? frequency
a data set. suggests the distribution is symmetric. A box plot i
Frequency
Answer exercises designed to reinforce the
Frequency
b. In a density histogram, what is the sum of areas of all 91 81 89 89 83 90 83 80 90 80 Figure 3.45 Crunchit!
15 box
2.70 True/False The cumulative relative frequency for each 400–410 5 0.0758 0.0758 4
rectangles? 89 85 86 90 90 89 78 91 83 92 plot of the sled dog trip data.
class in a frequency distribution may be greater than 1. basic concepts
410–420 8 presented section.
0.1212in the 0.1970 10
2.75 Fill in the Blank
2.71 True/False The relative frequency for each class can be
seCTIoN 420–4302.4 exerCIses
10 ti-84 Plus
0.1515 C
0.3485 2
a. The most common unimodal distribution is a Construct a frequency12distribution to0.1818
summarize these data 5
430–440 0.5303
determined by using the cumulative relative frequencies. using the class intervals The TI-84 Plus has two built-in statistical plots, a standard and a modified box plot. The modified box
. Concept Check
440–450 9 78–80, 80–82, 82–84, . . . . 0.6667
0.1364 Practice
b. A
2.72 unimodal distribution
True/False is
The only difference between aiffrequency
there is a distinguish between
450–460 the data8 given on the0.1212 0.7879 mild and extreme outliers.6 8 10 12 14 16 18 20 22 24 26 5 1
vertical
histogram andline of symmetry.
a relative frequency histogram (for the same 2.68 2.78 Consider
True/False
460–470 The classes
5
text distribution
in a frequency
0.0758
website. Construct
may a 2.77 Consider the data given in the following
0.8637
Mean
table. ex2.77
frequency distribution to summarize 1.
overlap. these data.the data into list L1 .
Enter ex2.78
c. is
data) If the
a unimodal
scale ondistribution
the vertical is not symmetric, then it is
axis. 470–480 4 0.0606 0.9243 Figure 7.1 Histogram of the sample Figure 7.2
2.79 Consider the classes
following Press STATPLOT and select
2. distribution. 87 Plot1 from
86 the
81 means. 88 85 menu.
90 STATPLOTS 79 91 87 82
. 2.69 480–490The
True/False 3 in afrequency 0.0455
frequency distribution 0.9698 original pop
2.73 True/False A histogram can be used to describe the 3. Turn the plot On and select Type box plot (modified or standard). Set Xlist to the name of the li
490–500 2 0.0303 1.0001 91 86 86 87 88 85 92 85 87 86
2.76 True/False
shape, A bimodalofdistribution
center, and variability cannot be symmetric.should have the same width.
a distribution. data andCumulative
Freq to 1. Choose91 a Mark for outliers (if constructing a modified box plot) and select a C
81 89 89 83 90 83 80 90 80
2.74 Short answer 2.70 True/False The cumulative relative4.frequency for each WINDOW settings. Press GRAPH to display the box plot. A modified box plot
Enter appropriate
Relative relative 89 85 86 90 90 89 78 91 83 92
class in a frequency distribution may be greater than 1.
a. When is a density histogram appropriate? Practice Class Frequency Figure
frequency 3.42.
frequency
b. In a density histogram, what is the sum of areas of all 2.71 True/False The relative frequency for each class can be
rectangles? Basic, introductory
determined
400–410 problems
5
by using the cumulative to familiar-
0.0758
relative frequencies.
0.0758 Construct a frequency distribution to summarize these data
using the class intervals 78–80, 80–82, 82–84, . . . .
410–420 8 0.1212 0.1970
2.75 Fill in the Blank ize students withThethe
420–430
2.72 True/False 10concepts
only anda frequency
0.1515
difference between solu- 0.3485
2.78 Consider the data given on the text website. Construct a
a. The most common unimodal distribution is a
.
tionhistogram
methods. 430–440
and
440–450
12
a relative frequency
9 axis.
0.1818
histogram (for the same
0.1364
0.5303
0.6667
frequency distribution to summarize these data. ex2.78
data) is the scale on the vertical
b. A unimodal distribution is if there is a 450–460 8 0.1212 0.7879 2.79 Consider the following frequency distribution.
vertical line of symmetry. 2.73 True/False
460–470A histogram5 can be used to describe the
0.0758 0.8637
c. If a unimodal distribution is not symmetric, then it is shape, center, and variability4 of a distribution.
470–480 0.0606 0.9243 Cumulative
. 2.74 Short480–490
answer 3 0.0455 0.9698 Relative relative
490–500 2 0.0303 1.0001
2.76 True/False A bimodal distribution cannot be symmetric. a. When is a density histogram appropriate? Class Frequency frequency frequency
b. In a density histogram, what is the sum of areas of all
rectangles? 400–410 5 0.0758 0.0758
410–420 8 0.1212 0.1970
c.variability?
Use the frequency
diamond and otherdistribution to estimate
precious stones the middle
are usually of in car-
measured of fighting (and penalty minutes), the League Office believes most
quency xiv
efficiency
such that
are (AFUE).
above
25%
gas furnace can be measured
0.23 Q10.27
This
of
.Preface
the observations
by the
0.30 depends
number
annual
0.25 0.27
are below
fuel
on many 0.26
Q
utilization
1 and
furnace
75%
0.40 0.40 0.4
39 38 14 22 26 15 65 39 24 44
e. Use and
properties, the
0.51frequency
is an0.58 distribution
indication
0.61 of the to estimate
0.80proportion
0.90 of a number
1.02fuel energy
0.92Q3 1.01 21 29 16 18 19 17 37 40 56 17
delivered such that
heat75%
as 0.76 energy of the observations are below Q and 25% 39 21 19 19 15 25 46 14 32 30
0.90 during1.14 an 1.11entire heating
1.38 season.
1.523 0.96 The 1.16 0.2
are above Q
U.S. Department 3.Energy (DOE) requires all new gas
1.52 of1.05 1.51 1.36 0.91 341.22 1.54 1.35 25(12,0)14 22 17 15 71 14 13 23 24
furnaces to operate at an AFUE of at least 78%. A gas
Applications 1.76 2.00 1.01 1.69 1.51 2.00 1.45 1.38 Applications
5 a.10 15 a histogram
20 25 for these
30 data. 35 Describe
40 45
company selected a random sample of customers, carefully Construct the
tested
2.86 each furnace,
Biology andand recorded the AFUE
environmental Science number. The data
A weather Realistic, appealing
distribution in termsAge exercises
of shape, center, and tovariability.
build confidence and promote routine under-
are ona.the
givenlocated
station
Construct a frequency
text website.
along the Maine coast
distribution and a histogram for
FUrNaCein Kennebunkport Figure standing.
2.53 WriteResultingaManyogive. exercises
Solution are
Trail for this problem. based on interesting and carefully researched data.
a. Construct
these data. b. Find a value m for the number of minutes such that 90%
collects data ona stem-and-leaf
temperature, wind plot for thesewind
speed, data.chill, and
b. Multiply each observation in the table by 200, to convert the of all players have fewer than m penalty minutes.
n. b. Construct
rain. The maximum a frequency distribution
wind speed (in miles for per
these data for
hour) and50 draw
weights into
the corresponding milligrams. Construct a frequency distribution
histogram. A random sample of game scores from Abby Sciuto’s evening
randomly selected times in February 2013 are given on the
and a histogram for these new, transformed data. bowling league with Sister Rosita was obtained, and the data
umulative c. Describe
text website.27the distribution
maxwiND in terms of shape, center, and extended Applications
c. Compare the two histograms. Are the shapes similar? are given on the text website. BOwLiNG
relative a.variability.
Construct Are there anydistribution
a frequency outliers? Iftoso,summarize
what are they? these 2.92 Biology and distribution
environmental Science
d. Using theDescribe
frequency anydistribution
differences.in part (a), a. Construct a frequency for these data. Fruits such as
equency data, and draw the corresponding histogram. cherries and grapes are harvested and placed in a shallow box
b. Draw the resulting ogive for these data.
b.approximately
2.89 Public
Describe what
Health
the shape proportion
of the of furnaces
anddistribution.
Nutrition do
Vitamin
Are notany
there meet
B 3 (niacin) helps or crate called a lug. The size of a lug varies, but one typically
the toDOE’s minimum
detoxify
outliers? the body, AFUEaids requirement?
digestion, can ease the pain of 2.108 Public Health and Nutrition A doughnut graph is
holds between 16 and 28 pounds. A random sample of the
e. Themigraine
gas company classifies
headaches, andeach
helpsAFUE promotereading
healthyaccording
skin. A random another graphical representation of a frequency distribution for
2.87toFuel Consumption and Cars The quality of an auto- weight (in pounds) of full lugs holding peaches was obtained,
mobile
the following
sample
battery
butobtained
below 90,is often
scheme:
of adults
good;
in the90
measured
at least
or above,
United
70bybutcold
excellent;
States at leastwas
and in Europe
crankingfair; amps
80 categorical data. To construct a doughnut graph:
and the data are summarized in the following table. Extended Applications
and the daily intake of below
niacin 80, and lesswas
(in milligrams) 1. Divide a (flat) doughnut (or washer) into pieces, so that
(CCA),than a 70,
measure
poor.The
recorded.
of the current
Classify
data areeach suppliedinatthe
reading
summarized
08F.
in the
Thirty
table automo-
above,
following and
table. each piece (bite of the doughnut)
Class corresponds
Frequencyto a class. Applied problems that require
bile batteries
constructwerea barrandomly
chart for theseselected and subjected
classification data.to subfreez- 2. The size of each piece is measured by the angle made at
ing temperatures. The resulting CCA data are given in the
/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292
the center of the doughnut.
...
20.0–20.5 To compute the angle 6 of each extra care and thought.
following table. BaTTery
United States Europe
CHAllenGe Class frequency frequency piece, multiply the 20.5–21.0relative frequency times 12 3608 (the
FREE136_CH02_026-071.indd Page 67 19/08/14 8:44 AM f403 number of degrees 21.0–21.5 in a whole, or complete, 17
/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292circle). ...
2.107 63 Sports87 and302 Leisure
0–34 259 106
An ogive,
15 or 198 55 relative
cumulative 99
4 134 21.5–22.0 21
frequency
122 514 polygon,91 is3–6another
117 325type of 39visual
23 30representation
164 6 of 16
75 a The manager at a Whole Foods Market obtained
22.0–22.5 28a random
n and draw frequency distribution. To construct an ogive: sample of customers who22.5–23.0 purchased at least one 25 popular herb
340 199 77 6–9 217 64 320 21 145 84 12 47 232
FREE136_CH02_026-071.indd (for Page 71or19/08/14
cooking medicinal 8:44 AM f403 Figure 2.54 and/207/WHF00260/9781429239769_KOKOSKA/KOKOSKA_INTRODUCTORY_STATISTICS02_SE_97814292
purposes). ...
mmarizing Data n Plot each point (upper 9–12 endpoint of
14 class interval, cumulative
17 23.0–23.5 19 2.55
relative frequency).12–15 12 32 show a frequency distribution 23.5–24.0and the corresponding 15 dough-
n Connect the points 15–18 with line segments.9 25 nut graph. 24.0–24.5 11
18–21 3 distribution and 20 the
Chapter 2 Summary 67
) was Figures 2.52 and 2.53 show a frequency 24.5–25.0 10
ing table. 21–24 2
corresponding ogive. The observations are ages. The values to 10
be used in the plot are shown in bold in the table. a. Complete the frequency distribution.
C a. Construct two relative frequency histograms, Cumulative one for the
a. Complete the frequency distribution.
b. Construct a histogram corresponding to this frequency Chapter 2 Exercises 71
b. A traditional frequency histogram or relative frequency
cy United States and one for Relative
Europe. relative distribution.
Cumulative histogram is not appropriate in this case. Why not?
Class
b. Describe Frequency
the shape of each frequency
histogram. frequency
Does a comparison c. Estimate the weight w such that 90% of all full peach lugs
Relative c. Construct a density histogram corresponding to this
of the two histograms suggest anyrelative
differences in niacin weigh more than w. Relative Class Frequency
100–105 Frequency frequency frequency
Class 0.050 frequency distribution.
intake between the two samples? Explain. Herb Frequency frequency
105–110 0.425 2.93 Travel and Transportation Maglev trains operate in Smoking or smoking materials 70
12–16 8 0.08 0.08 2.95 Fuel Consumption and Cars The total cost of owning
110–115
2.90 manufacturing and Product Development 0.625 In the Echinacea Germany and Japan at speeds of up to 300 miles per hour. Mag- Heating equipment 85
an automobile includes the25amount spent0.125 on repairs. Before
16–20
United States, 10
115–120 yarn is often0.10 sold in hanks.0.18 For0.750
woolen yarn, nets create a frictionless
Ephedra 15 system in which0.075 the train operates at aCooking and cooking equipment 205
20–24 20 0.20 0.38 purchasing a new car, many consumers research the past quality
120–125
one hank is approximately 1463 meters. A quality
24–28 0.68
0.850control Challenge
distance of 100–150 20
ofFeverfew
specific makes and models.
millimeters from
The data in0.100
the rail. The size of this Children playing with matches
the following table 105
inspector uses 30
125–130 a special machine 0.30 to quickly 0.925 each hank. Garlic
measure air gap is monitored 35 constantly to ensure a safe ride. A randomArson / suspicious
0.175over 35
28–32
130–135
A random sample 15 was obtained 0.15 during the 0.83 0.975
manufacturing Additional
lists the number of problems
Ginkgosample of the exercises
size of air
40
per 100 and
gaps (in
vehicles technology
millimeters)
0.200
the three projects that allow students to discover
at one specific
32–36 10length (in 0.10 0.93 years 2010–2012. CarCOST
135–140
process, and the
32–40
meters) of each
1.00
1.000
hank is given on more advanced30concepts0.150
Kava location in the track was obtained. Theand connections.
frequency distribution
a. Find the relative frequency for each class.
the text website.7 yarN
0.07
Sawfor this data is shown20in the following
palmetto table.
0.100
Total 200 b. Draw a doughnut graph for these data.
Total 100 1.00 St. John’s wort 71
Lexus 15
Porsche 940.075Lincoln 112
a. Complete the frequency distribution. Toyota 112 Mercedes 115 Buick 118
Figure 2.52 Frequency distribution. Total 200 1.000 lASt SteP
b. Draw a histogram corresponding to this frequency distribution. Honda 119 Acura 120 Ram 122
are the
c. What proportion of air gaps were between 110 and 125 Suzuki 122 Mazda
Figure 2.54 Frequency distribution. 124 Chevrolet 125 2.109 Can the Florida Everglades be saved? In January
millimeters? Ford 127 Cadillac 128 Subaru 132 2013, the Florida Fish and Wildlife Conservation
e, center, or
1.0 BMW 133 GMC 134 Scion Commission started the Python Challenge. The purpose of the
135
Last Step
Cumulative relative frequency
gas
0.2
Class Frequency frequency Width Density the
a. Constructsolution
Ginkgo a histogram involves
for these 7.5%
20% the distribution in terms
the skills
data. 9.3 3.5 5.2 8.3 4.6 11.1 10.5 3.7 2.8 5.9
(12,0) b. Describe Saw of shape, center, and 7.4 14.2 13.6 8.3 7.5 5.2 6.4 12.0 10.7 4.0
gas 1400–1500
5 10 15
1
20 25 30 35 40 45
and conceptsKavapresented
variability. palmetto
10%
in the 11.1 3.7 7.0 12.2 5.2 8.1 4.2 6.1 6.3 13.2
arefully 1500–1600 2
r. The data 1600–1800 41 Age chapter.
c. Find a number Q115% such that 25% of the problem data are 3.9 6.7 3.3 8.3 10.9 9.5 9.4 4.3 4.6 5.8
less than Q1. Find a number Q3 such that 25% of the
Figure 2.53 Resulting
1800–2000 192ogive. 4.1 5.2 4.7 5.8 6.4 3.8 7.1 4.6 7.5 6.0
problem data are greater than Q3.
2000–2100 79 Figure 2.55 Resulting doughnut graph.
d. How many values should be between Q1 and Q3? Find the a. Construct a frequency distribution, stem-and-leaf plot,
ta and draw 2100–2200 60 actual number of values between Q1 and Q3. Explain any
A random sample of game scores from Abby Sciuto’s evening and histogram for these data.
2200–2300 15 A randomdifference
sample ofbetween
house fires these in two values.North Dakota,
Bismarck, b. Use these tabular and graphical techniques to describe the
nter, and bowling league with Sister Rosita was obtained, and the data
2300–2800 10 was selected and the cause of each was recorded. The resulting shape, center, and spread of this distribution, and to
are they? are given on the text website. BOwLiNG
a. Construct a frequency
Total 400 distribution for these data. data are shown in the following table. identify any outlying values.
b. Draw the resulting ogive for these data.
not meet
2.108 Public Health and Nutrition A doughnut graph is
g according another graphical representation of a frequency distribution for
t; at least 80 categorical data. To construct a doughnut graph:
air; and less 1. Divide a (flat) doughnut (or washer) into pieces, so that
e above, and CHAPTer
each piece (bite2of summAry
the doughnut) corresponds to a class.
a. 2. The size of each piece is measured by the angle made at
Concept Page notation / Formula / Description
the center of the doughnut. To compute the angle of each
piece, multiply the relative frequency times 3608 (the Chapter Summary A table at
Categorical data set 29 Consists of observations that may be placed into categories.
ve relative
number of degrees in a whole, or complete, circle).
Numerical data set 29 Consists of observations that are numbers.
the end of each chapter provides
ntation of a The manager at a Whole Foods Market obtained a random
Discrete data set 30 The set of all possible values is finite, or countably infinite.
sample of customers who purchased at least one popular herb
a list of the main concepts with
cumulative (forContinuous
cooking or data set purposes). Figure
medicinal 30 2.54 and
The2.55
set of all possible values is an interval of numbers. brief descriptions, proper nota-
Frequency
show distribution
a frequency 33 A table
distribution and the corresponding dough-used to describe a data set. It includes the class, frequency, and relative
nut graph. frequency (and cumulative relative frequency, if the data set is numerical).
tion, and applicable formulas,
Class frequency 33 The number of observations within a class. along with page numbers for
Class relative frequency 33 The proportion of observations within a class: class frequency divided by total quick reference.
number of observations.
media and supplements
W. H. Freeman’s new online homework system, LaunchPad, offers our quality content
curated and organized for easy assignability in a simple but powerful interface. We’ve
taken what we’ve learned from thousands of instructors and hundreds of thousands of
students to create a new generation of W. H. Freeman/Macmillan technology.
Curated Units. Combining a curated collection of videos, homework sets, tutorials, ap-
plets, and e-Book content, LaunchPad’s interactive units give instructors building blocks
to use as is or as a starting point for their own learning units. Thousands of exercises from
the text can be assigned as online homework, including many algorithmic exercises. An
entire unit’s worth of work can be assigned in seconds, drastically reducing the amount of
time it takes to have a course up and running.
Easily customizable. Instructors can customize the LaunchPad Units by adding quizzes
and other activities from our vast collection of resources. They can also add a discussion
board, a dropbox, and RSS feed, with a few clicks. LaunchPad allows instructors to cus-
tomize their students’ experience as much or as little as they like.
Useful analytics. The Gradebook quickly and easily allows instructors to look up perfor-
mance metrics for classes, individual students, and individual assignments.
Intuitive interface and design. The student experience is simplified. Students’ naviga-
tion options and expectations are clearly laid out at all times, ensuring that they can never
get lost in the system.
New Stepped Tutorials are centered on algorithmically generated quizzing with step-
by-step feedback to help students work their way toward the correct solution. These
new exercise tutorials (two to three per chapter) are easily assignable and assessable.
Icons in the textbook indicate when a Stepped Tutorial is available for the material
being covered.
xv
xvi Preface
Statistical Video Series consists of StatClips, StatClips Examples, and Statistically Speak-
ing “Snapshots.” View animated lecture videos, whiteboard lessons, and documentary-style
footage that illustrate key statistical concepts and help students visualize statistics in real-
world scenarios.
New Video Technology Manuals available for TI-83/84 calculators, Minitab, Excel, JMP,
SPSS, R, Rcmdr, and CrunchIT! provide brief instructions for using specific statistical
software.
Updated StatTutor Tutorials offer multimedia tutorials that explore important concepts
and procedures in a presentation that combines video, audio, and interactive features. The
newly revised format includes built-in, assignable assessments and a bright new interface.
CrunchIt! is a web-based statistical program that allows users to perform all the statistical
operations and graphing needed for an introductory statistics course and more. It saves
users time by automatically loading data from the text, and it provides the flexibility to
edit and import additional data.
JMP Student Edition (developed by SAS) is easy to learn and contains all the
capabilities required for introductory statistics, including pre-loaded data sets from In-
troductory Statistics: A Problem-Solving Approach. JMP is the commercial data analysis
software of choice for scientists, engineers, and analysts at companies around the globe
(for Windows and Mac).
Stats@Work Simulations put students in the role of the statistical consultant, helping
them better understand statistics interactively within the context of real-life scenarios.
Data files are available in ASCII, Excel, TI, Minitab, SPSS (an IBM Company),* and
JMP formats.
Student Solutions Manual provides solutions to the odd-numbered exercises in the text.
Available electronically within LaunchPad, as well as in print form.
Interactive Table Reader allows students to use statistical tables interactively to seek the
information they need.
Instructor’s Solutions Manual contains full solutions to all exercises from Introductory
Statistics: A Problem-Solving Approach. Available electronically within LaunchPad.
Test Bank offers hundreds of multiple-choice questions. Also available on CD-ROM (for
Windows and Mac), where questions can be downloaded, edited, and resequenced to suit
each instructor’s needs.
Special Software Packages Student versions of JMP and Minitab are available for pack-
aging with the text. JMP is available inside LaunchPad at no additional cost. Contact your
W. H. Freeman representative for information or visit www.whfreeman.com.
Question 1
Select an answer
C
Received
E
Acknowledgments
I would like to thank the following colleagues who offered specific comments and sug-
gestions on the second-edition manuscript throughout various stages of development:
A special thanks to Ruth Baruth, Terri Ward, Karen Carson, Cara LeClair, Lisa Kinne,
Tracey Kuehn, Julia DeRosa, Vicki Tomaselli, Robin Fadool, Marie Dripchak, Liam
Ferguson, Catriona Kaplan, Laura Judge, and Victoria Garvey of W. H. Freeman and
Company.
Designer Jerry Wilke and illustrator Cambraia Fernandez, led by Vicki Tomaselli, of-
fered the creativity, expertise, and hard work that went into the design of this new edition.
I am very grateful to Jackie Miller for her insights, suggestions, and editorial talent
throughout the production of the second edition. She is doggedly accurate in her accuracy
reviews and page proof examination. Thanks to Dennis Free of Aptara for his patience
and typesetting expertise. Much appreciated are the copy editing skills brought to the
project by Lynne Lackenbach; her time and perseverance helped to add cohesion and
continuity to the flow of the material. Many thanks to Aaron Bogan for bringing his
xviii
Acknowledgments xix
a ttention to detail and knowledge of statistics to the accuracy review of the solutions
manuals. And I could not have completed this project without Karen Carson and Leslie
Lahr. Both have superb editing skills, a keen eye for style, a knack for eliciting the best
from an author, and unwavering support.
My sincere thanks go to the authors and reviewers of the supplementary materials
available with Introductory Statistics: A Problem-Solving Approach, Second Edition; their
hard work, expertise, and creativity have culminated in a top-notch package of resources:
Test Bank written by Julie Clark, Hollins University
Test Bank and iClicker slides accuracy reviewed by John Samons, Florida State
College at Jacksonville
Practice Quizzes written by James Stamey, Baylor University
Practice Quizzes accuracy reviewed by Laurel Chiappetta, University of Pittsburgh
iClicker slides created by Paul Baker, Catawba College
Lecture PowerPoints created by Susan Herring, Sonoma State University
I am very grateful to the entire Antoniewicz family for providing the foundation for a
wide variety of problems, including those that involve nephelometric turbidity units, floor
slip testers, and crazy crawler fishing lures.
I continue to learn a great deal with every day of writing. I believe this kind of exposi-
tion has made me a better teacher.
To Joan, thank you for your patience, understanding, inspiration, and tasty treats.
ABOUT THE AUTHOR
S teve received his undergraduate degree from Boston College and his
M.S. and Ph.D. from the University of New Hampshire. His initial re-
search interests included the statistical analysis of cancer chemoprevention
experiments. He has published a number of research papers in mathematics
journals, including Biometrics, Anticancer Research, and Computer Methods
and Programs in Biomedicine; presented results at national conferences; and
written several books. He has been awarded grants from the National Sci-
ence Foundation, the Center for Rural Pennsylvania, and the Ben Franklin
Program.
Steve is a long-time consultant for the College Board and conducted
workshops in Brazil, the Dominican Republic, and China. He was the AP
Calculus Chief Reader for four years and has been involved with calculus
reform and the use of technology in the classroom. He has been teaching at
Bloomsburg University for 25 years and recently served as Director of the
Honors Program.
Steve has been teaching introductory statistics classes throughout his aca-
demic career, and there is no doubt that this is his favorite course. This class
(and text) provides students with basic, lifelong, quantitative skills that they
will use in almost any job and teaches them how to think and reason logically.
Steve believes very strongly in data-driven decisions and conceptual under-
standing through problem solving.
Steve’s uncle, Fr. Stanley Bezuszka, a Jesuit and professor at Boston Col-
lege, was one of the original architects of the so-called new math in the 1950s
Credit: Eric Foster and 1960s. He had a huge influence on Steve’s career. Steve helped Fr. B.
with test accuracy checks, as a teaching assistant, and even writing projects
through high school and college. Steve learned about the precision, order, and elegance
of mathematics and developed an unbounded enthusiasm to teach.
xx
Introductory
Statistics
0 Why Study Statistics
1
2 Ch a p t e r 0 Why Study Statistics
Here is another example of an extraordinary event involving a daily lottery number. The
1980 Pennsylvania Lottery scandal, or the Triple Six Fix, involved a three-digit daily lot-
tery number. Nick Perry was the announcer for the Daily Number and the plan’s architect.
With the help of partners, Nick was able to weight all of the balls except for the ones
numbered 4 and 6. This meant that the winning three-digit lottery number would be a
combination of 4s and 6s. There were thus only eight possible winning lottery numbers,
444, 446, 464, 466, 644, 646, 664, and 666, and the conspirators were certain that the plan
would work.
The winning number on the day of the fix was 666. Ignoring the connection to the
Book of Revelations, lottery officials discovered that there were very unusual betting pat-
terns that day, all on the eight possible lottery numbers involving 4 and 6. This extraordi-
nary occurrence suggested that the unusual bets were not due to pure chance. This
conclusion, along with an anonymous tip, helped in a grand jury investigation leading to
convictions and jail time for several men.
The crucial prevailing theme in this text is statistical inference and decision making
through problem solving. Computation is important and is shown throughout the text.
However, calculators and computers remove the drudgery of hand calculations and
allow us to concentrate more on interpretation and drawing conclusions. Most problems
in this text contain a part asking the reader to interpret the numerical result or to draw
a conclusion.
The process of questioning a rare occurrence or claim can be described in four steps.
Claim: This is the status quo, the ordinary, typical, and reasonable course of events—
what we assume to be true.
Experiment: To check a claim, we conduct a relevant experiment or make an appropriate
observation.
Likelihood: Here we consider the likelihood of occurrence of the observed experimental
outcome, assuming the claim is true. We will use many techniques to determine
whether the experimental outcome is a reasonable observation (subject to some vari-
ability), or whether it is an exceptionally rare occurrence. We need to consider care-
fully and quantify our natural reaction to the relevant experiment. Using probability
rules and concepts, we will convert our natural reaction to an experimental outcome
into a precise measurement.
Conclusion: There are always only two possible conclusions.
1. If the outcome is reasonable, then we cannot doubt the original claim. The natural
conclusion is that nothing out of the ordinary is occurring. More formally, there is
no evidence to suggest that the claim is false.
2. If the experimental outcome is rare or extraordinary, we usually disregard the lucky
alternative, and we think something is wrong. A rare outcome is a contradiction.
Strange occurrences naturally make us question a claim. In this case we believe
there is evidence to suggest that the claim is false.
Let’s try to apply these four steps to the PG&E case in Erin Brokovich. The claim or
status quo is that the cancer incidence rate in Hinkley is equivalent to the national
incidence rate. Recent figures from the American Cancer Society suggest that the can-
cer incidence rate is approximately 551 in 100,000 for men and 419 in 100,000 for
women.1
The experiment or observed outcome is the cancer incidence rate for the population
living in Hinkley. In the movie, it is implied that Erin counts the number of people in
Hinkley who have developed cancer.
With a Little Help from Technology 3
Erin determines that the likelihood, or probability, of observing that many people in
Hinkley who have developed cancer is extremely low. Subject to reasonable variability,
we should not see that many people with cancer in this location.
The conclusion is that this rare event is not due to pure chance or luck. There is some
other reason for this rare observation. The implication in the movie is that there is evi-
dence to suggest that something else is affecting the health of the people in Hinkley.
Problem Solving
SOLUTION TRAIL
Perhaps one of the most difficult concepts to teach is problem solving. We all struggle to
solve problems: thinking about where to begin, what assumptions we can make, and
which rules and techniques to use. One reason many students consider statistics a difficult
Solution Trail course is because almost every problem is a word problem. These word problems have to
be translated into mathematics.
K e y w or ds The Solution Trail in this text is a prescriptive technique and visual aid for problem
n Normally distributed solving. To decipher a word problem, start by identifying the keywords and phrases. Here
n Mean are the four steps identified in each Solution Trail for solving many of problems in this text.
n Standard deviation
1. Find the keywords.
T r ansl ati o n 2. Correctly translate these words in statistics.
n Normal random variable 3. Determine the applicable concepts.
n m 5 34
4. Develop a vision, or strategy, for the solution.
n s 5 0.5
Many of the examples presented in this text have a corresponding Solution Trail in the
C onc e pts margin to aid in problem solving. An example of a Solution Trail appears in the margin.
n Normal probability distribution Note that many of these terms and symbols may be unfamiliar to you at this point. Right
n Standardization now, just focus on the idea that the Solution Trial involves keywords, a translation, con-
cepts, and a vision.
V isi on The keywords in the problem lead to a translation into statistics. The statistics question
Define a normal random variable is then solved by using the appropriate, specific concepts. The keywords, translation, and
and translate each question into a concepts are used to develop a grand vision for solving the problem.
probability statement. Standardize
This solution technique is not applicable to every problem, but it is most appropriate
and use cumulative probability
for finding probabilities through hypothesis testing, which is the foundation of most intro-
associated with Z if necessary.
ductory statistics courses. Some exercises in this text ask you to write each step in the
Solution Trail formally. As you become accustomed to using this solution style, it will
become routine, natural, and helpful.
2. The Texas Instruments TI-84 Plus C graphing calculator includes many common
statistical features such as confidence intervals, hypothesis tests, and probability distri-
bution functions. Data are entered and edited in the stat list editor as shown in Fig-
ure 0.4. Figure 0.5 shows the results from a one-sample t test, and Figure 0.6 shows a
visualization of this hypothesis test.
With a Little Help from Technology 5
Figure 0.4 The stat list editor. Figure 0.5 One-sample t-test Figure 0.6 One-sample t-test
output. visualization.
3. Minitab is a powerful software tool for analyzing data. It has a logical interface,
including a worksheet screen similar to a common spreadsheet. Data, graph, and sta-
tistics tools can be accessed through pull-down menus, and most commands can also
be entered in a session window. Figure 0.7 shows a bar chart of the number of Rolex
24 sports car race wins by automobile manufacturer.
4. Excel 2013 includes many common chart features accessible under the Insert tab.
There are also probability distribution functions that allow the user to build tem-
plates for confidence intervals, hypothesis tests, and other statistical procedures. The
Data Analysis tool pack provides additional statistical functions. Figure 0.8 shows
some descriptive statistics associated with the ages of 100 stock brokers at a New
York City firm.
In addition to these tools, JMP statistical software is used by scientists, engineers, and
others who want to explore or mine data. Various statistical tools and dynamic graphics
are available, and this software features a friendly interactive interface. Figure 0.9 shows
a scatter plot of the price of used Honda Accords versus the age of the vehicle, the least-
squares regression line, and confidence bands for the true mean price for each age.
Many other technology tools and statistical software packages are also available. For
example, R is free statistical software, SPSS is used primarily in the social sciences, and
SAS incorporates a proprietary programming language. Regardless of your technology
choice, remember that careful and thorough interpretation of the results is an essential
part of using software properly.
6 C h a p t e r 0 Why Study Statistics
Chapter 0 Exercises
0 0.1 Name the four parts of every statistical inference problem. the SEC believed Gordon and Bud may have had inside
information or manipulated the price of certain stocks?
0.2 Apply the four statistical inference steps to the Triple
Six Fix. 0.7 What do you think it means when a weatherperson says,
“There is a 50% chance of rain today.” Contact a weatherperson
0.3 Name the four parts of the Solution Trail. and ask him or her what this statement means. Does this
0.4 The Canary Party recently began the Not a Coincidence explanation agree with yours?
campaign to highlight women who have been affected by 0.8 James Bozeman of Orlando won the Florida lotto twice. He
Merck’s human papillomavirus (HPV) vaccine, Gardasil.3 As of beat the odds of 1 in 22,957,480 twice to win a total of $13
November 2013, a report states that there have been 31,741 million. State two possible explanations for this occurrence.
adverse events, 10,849 hospitalizations, and 144 deaths due to Which explanation do you think is more reasonable? Why?
HPV vaccines. Explain why The Canary Party believes that
there must be something wrong with the vaccine. 0.9 In January 2014, 33 whales died off the coast of Florida.
Twenty-five were found on Kice Island in Collier County. Blair
0.5 It had been very rare for an NBA player to suffer a major Mase, a marine mammal scientist with the National Oceanic
knee injury while on the court. Derrick Rose tore an anterior and Atmospheric Administration (NOAA) indicated that NOAA
cruciate ligament (ACL) in his left knee in 2012. Rose was was carefully investigating these deaths.4 Explain why NOAA
the first player to suffer an ACL tear since Danny Manning in believes the whales did not die as a result of natural causes and
1995 and Bernard King in 1985. Since the injury to Derrick is investigating the deaths.
Rose, at least six NBA players have experienced similar
injuries—torn ACLs. State two possible explanations for this 0.10 In January 2014, 62 people became sick after dining at
rare rash of injuries. Which explanation do you think is more one of two restaurants that share a kitchen in Muskegon
plausible? Why? County, Michigan.5 The illnesses occurred over a four-day
period, and county health officials began an immediate
0.6 In the movie, Wall Street, corporate raider Gordon Gekko investigation.
and his partner Bud Fox made a lot of money trading stocks. a. Explain why officials investigated the source of these
However, several of the trades attracted the attention of the illnesses.
Securities and Exchange Commission (SEC). Why do you think b. Apply the four statistical inference steps to this situation.
Chapter 0 Exercises 7
0.11 In 2009 and 2010, Toyota issued a costly recall of over 9 0.13 Recently, the Sedgwick County Health Department
million vehicles because of possibly out-of-control gas pedals. reported at least 27 cases of whooping cough in one month. This
There had been at least 60 reported cases of runaway vehicles, observed count was more than in any month in the previous five
some of which resulted in at least one death.6 years. Do you think health officials should be concerned about
a. State two possible reasons for this observed high number this outbreak of whooping cough? Why or why not?
of runaway vehicles.
0.14 To understand the definitions and formulas in this text,
b. Why do you think Toyota issued this recall?
you will need to feel comfortable with mathematical notation.
0.12 Suppose there were 15 home burglaries in a small town To review and prepare for the notation we will use, make sure
during the entire year. None occurred on a Thursday. Do you you are familiar with the following:
Looking Forward
s
n Recognize that data and statistics are pervasive and that statistics are used to describe typical
values and variability, and to make decisions that affect everyone.
n Understand the relationships among a population, a sample, probability, and statistics.
n Learn the basic steps in a statistical inference procedure.
Contents
1.1 Statistics Today
1.2 Populations, Samples, Probability, and Statistics
1.3 Experiments and Random Samples
wanphen chawarung/Shutterstock
9
10 C h a p t e r 1 An Introduction to Statistics and Statistical Inference
4. Likelihood and inference: Recent research suggests that there is a link between the
incidence of lung cancer and cancer-causing chemical pollutants near large industrial
facilities, especially oil refineries.3 The incidence of lung cancer in oil refinery coun-
ties was higher than in non–oil refinery counties. The chance of this happening was so
small that the researchers concluded that oil refinery products significantly affect lung
carcinogenesis.
5. Relative frequency and probability: According to Oddee, some vending machines
mysteriously fall over and crush 13 people per year.4 If all 312 million Americans are
equally likely to be killed by a vending machine, then the probability that a randomly
selected individual will be crushed by a vending machine during a particular year is
0.00000004167 5 13/312,000,000. The relative frequency of occurrence is a good
estimate of probability and is often used to develop statistical models and make
predictions.
There has been an explosion of numerical information, in stories like those above, in
business, in consumer reports, and even in casual conversation. Interpretation of graphs
and evaluation of statistical arguments are no longer reserved for academics and research-
ers. It is essential for all of us to be able to understand arguments based on acquired data.
This numerical, or quantitative, literacy is a vital life-long tool.
No matter how you are employed or where you live, you will have to make decisions
based on available information or data. Here are some questions you may have to consider.
Stephen Finn/Shutterstock
1. Do you have enough information (data) to make a confident decision? How were the
data obtained? If more information is necessary, how will these data be gathered?
2. How are the data summarized? Are the graphical and/or numerical techniques appro-
priate? Does the summary represent the data accurately?
3. What is the appropriate statistical technique for analyzing the data? Are the conclu-
sions reasonable and reliable?
Definition
Descriptive statistics: Graphical and numerical methods used to describe, organize, and
summarize data.
Here’s a dictionary definition for Inferential statistics: Techniques and methods used to analyze a small, specific set of
inference: a deduction or logical data so as to draw a conclusion about a large, more general collection of data.
conclusion.
40000
35000
30000
Number of Reports
25000
20000
15000
10000
5000
an
So iian
Ha n
e
Am lta
ka
Sk a
r
t
t
d
tie
es
es
a
lu
es
ite
r tr
as
ic
De
tb
yw
hw
wa
on
M
er
Un
Ai
Al
Je
Fr
ut
Airline
Figure 1.1 Mishandled baggage reports. (Source: Office of Aviation
Enforcement and Proceedings, U.S. Department of Transportation.)
Table 1.1 A portion of table summarizing health and safety characteristics in owner-occupied units (national)
Region
Definition
A population is the entire collection of individuals or objects to be considered or studied.
A sample is a subset of the entire population, a small selection of individuals or objects
taken from the entire collection.
A variable is a characteristic of an individual or object in a population of interest.
A Closer L ok
1. A population consists of all objects of a particular type. There are usually infinitely
many objects in a population, or at least so many that we cannot look at all of them.
14 C h a p t e r 1 An Introduction to Statistics and Statistical Inference
Solution
Step 1 The population consists of all adult drivers in the United States. This population is
not infinite, but it is so large that it would be impossible to contact every adult driver.
SOLUTION TRAIL
Solution Trail 1.6 Step 2 The sample consists of the 10 adult drivers selected.
Step 3 The variable in the problem is whether the driver has fallen asleep at the wheel,
K e y w ords a yes/no response.
n Adult drivers
n 10 adult drivers
Try It Now Go to Exercise 1.7
n Fallen asleep while driving
Tr ansl at i o n A Closer L ok
n All adult drivers in the Example 1.6 raises some important issues regarding the sample of 10 adult drivers.
United States
n Subset of all adult drivers 1. How large a sample is necessary for us to be confident in our conclusion? Ten adult
n Characteristic of adult drivers drivers may not seem like enough. But how many do we need? 100? 1000? We will
consider the problem of sample size in Chapter 7 and beyond.
C onc e p ts 2. This problem does not say how the sample was obtained. Perhaps the first 10 drivers
n Population who recently renewed their licenses were selected. Or maybe only those from one state
n Sample were included. To draw a valid conclusion, we need to be certain the sample is repre-
n Variable sentative of the entire population. The formal definition of a representative sample is
presented in Section 1.3.
V isi on
Determine the set of all objects Statistical inference is based on, and follows from, basic probability concepts.
of interest, the subset, and the robability and inferential statistics are both related to a population and a sample, but
P
attribute measured for each from different perspectives. For the rest of this chapter, statistics really means inferen-
object. tial statistics.
Definition
To solve a probability problem, certain characteristics of a population are assumed to be
known. We then answer questions concerning a sample from that population.
In a statistics problem, we assume very little about a population. We use the information
about a sample to answer questions concerning the population.
Probability
Population Sample
Statistics
Figure 1.2 Relationships among probability,
statistics, population, and sample.
In Figure 1.2, it may seem like we can start our study anywhere in this circular dia-
gram. However, we need to understand probability before we can learn statistics. A solid
background in probability is necessary before we can actually do statistical inference.
16 C h a p t e r 1 An Introduction to Statistics and Statistical Inference
to conclude that an owl can turn its neck more than 300
120 degrees from the forward position.
250
c. A Navy research facility runs several tests to check the
structural integrity of a new submarine. A laboratory 200
Bankruptcies
report states the vessel can withstand pressure at depths
of at most 800 ft. 150
d. A safety inspector in Atlanta selects a sample of
100
apartment buildings and checks the fire ladders on each.
The proportion of broken ladders in the sample is used to 50
estimate the proportion of broken fire ladders in the
entire city. 0
te
n
e
sa ing
e. Like most states, a large portion of the New York
Re rad
tio
io
ad
io
nc
ta
ct
at
r
na
r
ta
es
tu
t
ru
rm
State budget is spent on health care, pensions, and
or
le
il
ac
Fi
t
ta
al
ns
fo
sp
uf
Re
In
Co
le
education. The pie chart in Figure 1.3 shows the
an
an
ho
Tr
M
W
percentage of the budget spent on each item for Fiscal Industry
Year 2013.8
Figure 1.4 Total bankruptcies in Quebec.
controversial issue. Before he votes, he would like to know 1.13 Psychology and Human Behavior During each
what his constituents think about the proposed bill. An aide for summer, many families spend part of their vacation time at a
the senator selects 500 people from Florida and asks them beach along the East or West Coast. Due to the popularity of
whether they believe the bill should become law. Describe the movies like Jaws, and recent shark attacks on surfers, swim-
population and the sample in this problem. mers, snorkelers, and spearfishermen, Americans have become
increasingly concerned about water activities. Research
1.11 Economics and Finance In 2012, Hurricane Sandy suggests that 46% of all shark attacks are on divers.10 One
caused over 2 million households to lose power and damaged thousand records of shark attacks are selected, and each is
over 70,000 homes and businesses. Every family filed some categorized by victim group.
sort of insurance claim for damage to their home and/or car. a. What is the population of interest?
An insurance company serving the area is interested in the b. What is the sample?
typical amount of a claim as a result of this storm. Seventy- c. Describe the variable of interest.
five affected families are selected and their total claims are
recorded. Describe the population and the sample in this 1.14 Manufacturing and Product Development Spray-on
problem. tans, or fake tans, contain several chemicals that have been
linked to allergies, diabetes, and obesity.11 Twenty fake tan
1.12 Probability or Statistics In each of the following products are selected and the amount of dihydroxyacetone (the
problems, identify the population and the sample, and active ingredient) in each is measured. Describe the population,
determine whether the question involves probability or the sample, and variable in this problem.
statistics.
a. Seventy-five percent of all people who buy a dining room 1.15 Medicine and Clinical Studies There is some
table purchase matching chairs. Five people who evidence to suggest people with chronic hepatitis C have a
purchased a dining room table within the last month are liver enzyme level that fluctuates between normal and
selected at random. What is the probability that all five abnormal.12 Fifty patients diagnosed with hepatitis C are
purchased matching chairs? selected and their liver enzyme levels are recorded each day
b. Twenty-five people entering a rest area and food court on for one month. Describe the population, sample, and variable
Highway 59 near Houston are selected at random. Of in this problem.
these 25 people, 20 purchased food from at least one of 1.16 Manufacturing and Product Development Paper
the eateries. Estimate the true proportion of people towel manufacturers constantly advertise their products’
stopping at this rest area who purchase food. strength, amount of stretch, and softness. A consumer group
c. Historical records indicate 1 out of every 500 people is interested in testing the absorption of Bounty paper
using a particularly steep water slide suffer some kind of towels. Thirty-five rolls are selected, and the amount of
injury. Fifty people using the slide are selected at random. absorption for a single paper towel from each roll is
How many do you expect to be injured? recorded.
d. A building inspector in Henderson, Nevada, is checking a. What is the population of interest?
public buildings with doors that open automatically. One b. What is the sample?
hundred doors are randomly selected. Careful inspection c. Describe the variable of interest.
reveals that 12 doors are broken. Use this information to
estimate the percentage of automatic doors in Henderson
that are broken. Extended Applications
e. One thousand people entering Los Angeles International
1.17 Manufacturing and Product Development While
Airport (LAX) are selected at random. Each person is
much of the cheddar cheese consumed around the world is
asked to complete a short survey regarding travel. The
processed, some is still produced in the traditional manner:
survey results show 637 carry a frequent-flier card. Is
made in small batches, wrapped in cloth to breathe, and allowed
there evidence to suggest the true proportion of travelers
to age. Most traditional cheddar is aged one to two years; like
entering LAX who carry frequent flier cards is greater
fine wines, older cheddars assume their own character and
than 0.60?
flavor. Suppose 75% of all cheddars are aged less than two
f. The Risdall Advertising Agency reports 65% of all
years, and a sample of 20 cheddar cheeses from around the
women have purchased perfume within the last three
world is obtained.
months. Thirty-four women are selected at random. Is it
a. Describe the population and the sample in this problem.
likely more than 20 of these women purchased perfume
b. Write a probability question and a statistics question
within the last three months?
involving this population and sample.
g. Representatives from the Occupational Safety and Health
Administration inspected several for-profit and Medicare 1.18 Marketing and Consumer Behavior Magazines,
nursing homes for any violations. The resulting data will newspapers, and books have become more readily available
be used to determine whether there is any evidence to in digital format. In addition, the quality of readers, for
suggest the quality of treatment is different in the two example, the Kindle, Nook, and iPad, has increased. A recent
types of nursing homes. study suggests that 21% of adults read an ebook within the
1.3 Experiments and Random Samples 19
past year.13 Suppose a sample of 500 adults in the United making certain investments overseas. A recent study suggests
States is obtained. that 4% of large companies have plans to relocate jobs back to
a. Describe the population and the sample in this problem. the United States.14 Seventy-five large companies are selected,
b. Write a probability question and a statistics question and each is surveyed to determine if it plans to move jobs back
involving this population and sample. to the United States.
a. What is the population of interest?
1.19 Business and Management One of the main reasons b. What is the sample?
that U.S. companies shift jobs overseas is labor costs. Although c. Describe the variable of interest.
the compensation gap between the United States and China has d. Write a probability question and a statistics question
decreased recently, the tax code still rewards companies for involving this population and sample.
Definition
In an observational study, we observe the response for a specific variable for each indi-
vidual or object.
In an experimental study, we investigate the effects of certain conditions on individuals
or objects in the sample.
Definition
A (simple) random sample (SRS) of size n is a sample selected in such a way that every
possible sample of size n has the same chance of being selected.
20 C h a p t e r 1 An Introduction to Statistics and Statistical Inference
A Closer L ok
How can we be absolutely certain 1. In practice, a random sample may be very difficult to achieve. Statisticians employ
every possible sample of size n is various techniques, including random number tables and random number generators,
equally likely? to select a random sample.
2. If a sample is not random, then it is biased. There are many different kinds of bias and
factors that contribute to a biased sample.
3. Nonresponse bias is very common when data are collected using surveys. The
majority of people who receive a survey in the mail simply discard it. The original
collection of people receiving the survey may be random, but the final sample of
completed surveys is not. Because the sample is biased, it is impossible to draw a
valid conclusion.
4. Self-selection bias occurs when the individuals (or objects) choose to be included in
the sample, as opposed to being selected. For example, a television news program may
ask viewers to respond to a yes/no question by dialing one of two phone numbers to
cast their vote. Viewers choose to participate, and usually those with strong opinions
(either way) vote. There are many more who did not have the opportunity to respond—
every single sample is not equally likely. Certainly this sample is biased, and hence no
valid conclusion is possible.
5. If the population is infinite, then the number of simple random samples is also infinite.
For finite populations, the formula for the number of possible random samples is pre-
sented in Chapter 7.
6. A simple random sample is vital for sound statistical practice. Before doing any analy-
sis, you should always ask how the data were obtained. If there is any evidence of a
pattern in selection, if the observations are associated or linked in some way, or if there
is some connection among the observations, then the sample is not random. There is
simply no way to transform bad data into good statistics.
in order to investigate and isolate specific effects. The following example is of an exper-
imental study.
We will use this four-step process to check a claim in many different contexts.
The method for determining likelihood is the key to this valuable tool for logical
reasoning.
1.22 True/False The number of simple random samples is 1.28 Demographics and Population Statistics State Farm
always infinite. Insurance Company would like to estimate the proportion of
volunteer firefighters across the country who are full-time
1.23 True/False A simple random sample is representative teachers. The 25 largest volunteer fire companies in the United
of the entire population of interest. States are identified. Each is contacted and asked to complete a
short survey regarding the number of volunteers and the
1.24 Fill in the Blank
occupation of each volunteer.
a. If a sample is not random, then it is .
a. Is this an observational or an experimental study?
b. It is very common to experience when
b. Describe the sample in this problem.
data is collected using surveys.
c. Is this a random sample? Justify your answer.
c. occurs when individuals ask to be
included in an survey. 1.29 Manufacturing and Product Development The
Visniak Bottling Plant in Cheektowaga, New York, has been
1.25 Statistical Inference Name the four parts of every
accused of systematically underfilling 12-oz bottles of soda. An
statistical inference problem.
inspection team enters the plant one afternoon and selects
1.26 Liar, Liar Suppose an experimental outcome is very bottled soda ready for shipment from various locations within
rare. What two things could have happened? the plant. The contents of each selected bottle are carefully
measured.
a. Describe the population and the sample in this problem.
Applications b. Is this a random sample? Justify your answer.
1.27 Fuel Consumption and Cars The administration at 1.30 Biology and Environment Science Oregon Scientific
the University of Nebraska in Lincoln is interested in student has come under suspicion of purposely shipping defective
reaction to a planned parking garage on campus. A dormitory wireless weather stations. The Attorney General’s office in
near the proposed site is selected and several Student Senate Delaware would like to estimate the proportion of defective
members volunteer to solicit responses. One Thursday products being shipped by this company. Describe a method
1.3 Experiments and Random Samples 23
for obtaining a simple random sample of shipped wireless 1.37 Fuel Consumption and Cars Electric and plug-in
weather stations. electric cars are designed to save gasoline and help the
environment. In addition, there are certain tax credits for these
1.31 Fuel Consumption and Cars The Massachusetts State
types of hybrid automobiles.15 Although there are certainly
Police union is interested in the number of miles driven by each
benefits to owning a hybrid car, many people complain about
officer during an 8-hour shift. Twelve officers are selected from
the slow acceleration, repair expense, and overall comfort.
the 11:00 p.m. to 7:00 a.m. shift, and the number of miles
Thirty-five passengers are randomly selected. Each is blind-
traveled by each officer is recorded.
folded and taken for a ride in a traditional combustion-engine
a. Is this an observational or an experimental study?
automobile and in a comparably sized hybrid car (over the same
b. Describe the population and the sample in this problem.
route). The passenger is then asked to select the car with the
c. Is this a random sample? Justify your answer.
most comfortable ride.
1.32 Manufacturing and Product Development Gillette a. Is this an observational or an experimental study?
claims a new disposable razor provides a closer shave than any b. What is the variable of interest?
other brand currently on the market. One hundred men who are c. Describe possible sources of bias in these results.
observed buying a disposable razor are selected and asked to
participate in a shaving study. 1.38 Manufacturing and Product Development The
a. Describe the population and the sample in this problem. ceramic tile used to construct the floors in a mall must be
b. Is this a random sample? Justify your answer. sturdy, easy to clean, and long-lasting. Before installing a
specific tile, a construction firm orders a box of 25 tiles and
1.33 Manufacturing and Product Development Midwest uses a standard strength test on each. The results are used to
Pet Supplies claims its K9 Chain Link Dog Kennel can be set determine whether the tiles will be used throughout the
up in less than 30 minutes. An investigative reporter would like new mall.
to check this claim. Describe a method for obtaining a simple a. Describe the population and the sample in this
random sample of customers who set up this kennel. problem.
1.34 Sports and Leisure A National Football League coach b. Is this a random sample? If so, justify your answer.
is permitted to initiate two challenges to referee calls per game If not, describe a technique for obtaining a random
(outside of the final 2 minutes in each half). If both challenges sample.
are successful, then the coach is given a third. During a
1.39 Manufacturing and Product Development Many
challenge, the referee reviews the play in question on a replay
comforters contain both white feathers and down in order to
monitor on the field, and the call is either confirmed or the
provide a warm, soft cover. A bed-and-bath company would
challenge is upheld. The NFL reports the time required to
like to expand its line of products and sell comforters for
resolve a coach’s challenge is less than 5 minutes. A sports
queen- and king-size beds. Before manufacturing begins, a
statistician would like to check this claim. Describe a method
random sample of comforters is obtained from other companies
for obtaining a simple random sample of challenges during
and the proportions of white feathers, down, and other compo-
NFL games.
nents are measured and recorded. These data will be used to
1.35 Travel and Transportation The Department of determine the exact mixture of feathers and down for the new
Public Works in Bismarck, North Dakota, would like to line of comforters.
estimate the number of potholes per mile (after a long, snowy a. Is this an observational or an experimental study?
winter). Each selected mile-long stretch will be thoroughly b. What are the variables of interest?
examined for potholes, and the number in each section will be c. Describe a method for obtaining a random sample of
recorded. comforters from current manufacturers.
a. Describe a method for obtaining a simple random sample
of mile-long road segments. 1.40 Marketing and Consumer Behavior Disney World
b. Is this an observational or experimental study? is going to initiate the use of wireless tracking wristbands
for visitors to the Orlando, Florida, theme park.16 The new
1.36 Biology and Environmental Science The Faber Floral MagicBand has several functions: it serves as a hotel room
Company in Kankakee, Illinois, claims to have developed a key and park entry pass, and is linked to customer credit
special spray for roses that causes the blossom to last longer card information. Visitors wearing these wristbands will
than an untreated flower. Fifty long-stemmed roses are obtained have immediate access to certain rides. Suppose a random
and randomly assigned to one of two groups: treated versus sample of Disney World visitors wearing wristbands is
untreated. The treated roses are sprayed, and the lifetime of obtained and the wait time for Big Thunder Mountain is
each blossom is carefully recorded. recorded for each.
a. Is this an observational or an experimental study? a. Is this an observational or an experimental study?
b. What is the variable of interest? b. What are the variables of interest?
c. Describe a technique to randomly assign each rose to a c. Describe a method for obtaining a random sample of
group. visitors wearing wristbands.
24 C h a p t e r 1 An Introduction to Statistics and Statistical Inference
Chapter 1 Summary
Concept Page Notation / Formula / Description
Descriptive statistics 11 Graphical and numerical methods used to describe, organize, and summarize
data.
Inferential statistics 11 Techniques and methods used to draw a conclusion or make an inference.
Population 13 The entire collection of individuals or objects to be considered or studied.
Sample 13 A subset of the entire population.
Variable 13 A characteristic of an individual or object in a population of interest.
Probability problem 15 Certain properties of a population are assumed known. Questions involve a
sample taken from this population.
Statistics problem 15 Information about a sample is used to answer questions concerning a population.
Observational study 19 We observe the response for a specific variable for each individual or object in
the sample.
Experimental study 19 We investigate the effects of certain conditions on individuals or objects in the
sample.
Simple random sample of size n 19 A sample selected in such a way that every possible sample of size n has the
same chance of being selected.
Statistical inference procedure 21 Four-step process: Claim, Experiment, Likelihood, and Conclusion.
Chapter 1 Exercises
Looking Back
n Realize the difference between a sample and the population.
n Recognize the importance of a simple random sample in the statistical inference procedure.
n Understand the difference between descriptive and inferential statistics.
Looking Forward
n Be able to classify a data set as categorical or numerical, discrete or continuous.
n Learn several graphical summary techniques.
n Construct bar charts, pie charts, stem-and-leaf plots, and histograms.
9.3 3.5 5.2 8.3 4.6 11.1 10.5 3.7 2.8 5.9
7.4 14.2 13.6 8.3 7.5 5.2 6.4 12.0 10.7 4.0
11.1 3.7 7.0 12.2 5.2 8.1 4.2 6.1 6.3 13.2
3.9 6.7 3.3 8.3 10.9 9.5 9.4 4.3 4.6 5.8
4.1 5.2 4.7 5.8 6.4 3.8 7.1 4.6 7.5 6.0
The tabular and graphical techniques presented in this chapter will be used to
describe the shape, center, and spread of this distribution of python lengths and
to identify any outliers.
Contents
2.1 Types of Data
2.2 Bar Charts and Pie Charts
2.3 Stem-and-Leaf Plots
2.4 Frequency Distributions and Histograms
Michael R. Rochford/University of Florida/AP
27
28 Ch a pte r 2 Tables and Graphs for Summarizing Data
1100
1000
900
Number of crashes 800
700
600
500
400
300
200
100
0
n
ic
en
nd
ay
to
de
nt
rg
rla
ng
la
m
Be
pe
rli
At
be
Ca
Bu
Ca
m
Cu
County
Definition
A data set consisting of observations on only a single characteristic, or attribute, is a
We’ll do more with bivariate data univariate data set.
in Chapter 12. If we measure, or record, two observations on each individual, the data set is bivariate.
If there are more than two observations on the same person, the data set is multivariate.
Suppose we record only the make of car driven by each person who arrives at the
country club—a univariate data set. The observations, for example, Ford, Honda, or
Lexus, are categorical. There is no natural ordering of the data, and each observation
2.1 Types of Data 29
falls into only one category or class. We might instead ask each person who arrives how
long it took to reach the country club. This time the responses, for example, 10, 15, or 45
minutes, are numerical.
Definition
A categorical, or qualitative, univariate data set consists of non-numerical observations
that may be placed in categories.
A numerical, or quantitative, univariate data set consists of observations that are numbers.
The following examples illustrate the two basic types of data sets.
2.0 6.1 7.3 6.4 8.0 8.2 9.9 5.2 7.7 6.7
Data Set
6.9 4.8 10.8 9.2 3.2 7.9 8.5 8.9 6.6 8.1
mailwts
Definition
A numerical data set is discrete if the set of all possible values is finite, or countably infi-
nite. Discrete data sets are usually associated with counting.
A numerical data set is continuous if the set of all possible values is an interval of num-
bers. Continuous data sets are usually associated with measuring.
A Closer L ok
1. To decide whether a data set is discrete or continuous, consider all the possible val-
ues. Finite or countably infinite means discrete. An interval of possible values means
continuous.
Mathematically, a set is countably 2. Countably infinite means there are infinitely many possible values, but they are count-
infinite if it can be put into able. You may not ever be able to finish counting all of the possible values, but there
one-to-one correspondence with exists a method for actually counting them.
the counting numbers (1, 2, 3, 3. The interval for a continuous data set can be any interval, of any length, open or
4, . . .). The three dots, . . . , mean closed. The exact interval may not be known, only that there is some interval of
the list continues in the same possible values.
manner.
4. In practice, we have no measurement device that is precise enough to return any
number in some interval. We may only be able to achieve up to 10 digits of accuracy.
So a continuous data set may contain any number in some interval in theory, but not
in reality.
Stepped
STEPPED Tutorial
TUTORIALS
The classifications of univariate data are shown in Figure 2.4. Here is an example to
Variable
BOX PLOTStypes,
values, individuals
illustrate these classifications.
Categorical
data
Univariate Discrete
data data
Numerical
data
Continuous
data
b. The position of the drawbridge in Belmar, New Jersey, at noon on days in July. Assume
the drawbridge is not moving, and is either open or closed to boat traffic.
c. The length of time (in minutes) it takes to get a haircut.
g. The type of plumbing problem reported by the next person who contacts the Plumbing
Pros.
SOLUTION
a. The observations are numbers, so the data set is numerical. The set of possible values
is finite. We don’t know the maximum number of books read, but the possible numbers
in the data set represent counts. The data set is discrete.
b. The observations are categorical: Up (open) or down (closed). There is no natural
ordering; the possible responses fall into groups or classes. This data set is categorical.
c. The observations are numbers and the set of possible values is some interval, perhaps
5 to 45 minutes. This is a numerical continuous data set.
d. The observations are numbers and the set of possible values is finite. We can count the
number of advertised garage sales. The minimum number may be 0 and the maximum
may be 25. This is a numerical discrete data set.
e. The observations may be Milky Way, Snickers, Nestle Crunch, etc. Although there may
be some personal preference and an individual ranking, this is a categorical data set.
f. The observations are numbers and the set of possible values is some interval, say 12.5
to 13.5 psi. This is a numerical continuous data set.
g. The observations are dripping faucets, leaking pipes, plugged sinks, etc. There may be
some preference for the Plumbing Pros, but this is a categorical data set.
f. The classifications of Forward Operating Air Force bases 2.12 Univariate Data Classifications Classify each data set
(Main Air Base, Air Facility, Air Site, or Air Point). as categorical, discrete, or continuous.
a. A random sample of mature Eastern tent caterpillars is
2.7 Univariate Data Classifications A set of observations
obtained from a tree branch in a neighborhood yard. The
is obtained as indicated below. In each case, classify the result-
length of each caterpillar is recorded.
ing data set as categorical or numerical. If the data set is numer-
b. Randomly selected prime-time television shows are
ical, determine whether it is discrete or continuous.
selected and the number of violent acts is recorded for
a. The number of steps on apartment fire escapes.
each show.
b. The number of leaves on maple trees.
c. A representative sample of employees from a large
c. The reason several automobiles fail inspection.
company is obtained, and the overtime hours for the past
d. The weight of fully loaded tractor trailers.
month are recorded for each employee.
e. The area of several Nebraska farms.
d. An HMO selects a random sample of subscribers and
f. The cellular calling plan selected by customers.
records the number of office visits over the past year for
2.8 Univariate Data Classifications A set of observations each patient.
is obtained as indicated below. In each case, classify the result- e. Thirty-six apples are randomly selected from an orchard.
ing data set as categorical or numerical. If the data set is numer- Each is graded for quality of appearance: excellent, good,
ical, determine whether it is discrete or continuous. fair, or poor.
a. The number of engine revolutions per minute in f. A random sample of mattresses is obtained and the
automobiles. firmness (medium, medium firm, firm, or extra firm) of
b. The thickness of the polar ice cap in several locations. each is recorded.
c. The state in which families vacationed last summer.
d. The type of Internet connection in county households. 2.13 Univariate Data Classifications Classify each data set
e. The make of watch worn by people entering a certain as categorical, discrete, or continuous.
department store. a. A random sample of cheeses is obtained and the number
f. The number of raisins in 24-ounce boxes. of months each is allowed to age before sale is recorded.
b. Sixteen universities are selected and each computer
2.9 Numerical Observations A set of numerical observa- network system is carefully analyzed. The computer
tions is obtained as described below. Classify each resulting virus threat is assessed for each campus: low, medium,
data set as discrete or continuous. or high.
a. The widths of posters at an art gallery. c. A random sample of Waterford Normandy dinner plates is
b. The time it takes to compile computer programs. selected and the weight of each plate is recorded.
c. The number of radioactive particles that escape from d. Thirty-five new customers at a health club are selected
special containers during a one-hour period. and the body-fat percentage of each member is computed
d. The time it takes to bake batches of banana muffins. and recorded.
e. The concentration of carbon monoxide in homes during e. A random sample of CDs is obtained from a local music
the winter. store. The company that produced each CD is noted.
f. The number of pages in best-selling murder-mystery novels. f. A collection of pens is obtained from employees at a large
2.10 Numerical Observations A set of numerical observa- company. For each pen, the outside diameter of the barrel
tions is obtained as described below. Classify each resulting at its widest point is measured and recorded.
data set as discrete or continuous.
2.14 Univariate Data Classifications Classify each data set
a. The weight of baseball bats.
as categorical, discrete, or continuous.
b. The area of selected dorm rooms.
a. A random sample of Hudson River ferry trips is obtained
c. The number of bees in hives.
and the number of riders on each trip is recorded.
d. The height of a storm surge during hurricanes.
b. A random sample of military helicopters is obtained and
e. The amount of ink used in office printers during a week.
the weight of each is recorded.
f. The number of fish in office aquariums.
c. A random sample of communities in Canada is selected
2.11 Numerical Observations A set of numerical observa- and the number of full-time police officers employed is
tions is obtained as described below. Classify each resulting recorded.
data set as discrete or continuous. d. A random sample of locations in the United States is
a. The time it takes giant slalom skiers to cover a race course. selected. Temperature data are used to determine whether
b. The number of magazines available for sale at newsstands. or not a new record high temperature was set during the
c. The number of black squares in crossword puzzles. past year.
d. The length of time spent waiting in line at grocery-store e. A random sample of stock analysts is obtained and each is
checkout lanes. asked to rate a specific stock as buy, sell, or hold.
e. The number of French fries in a small order from fast-food f. A random sample of cross-country flights is obtained and
restaurants. the number of controllers each pilot talks to during the
f. The length in words of email messages received. flight is recorded.
2.2 Bar Charts and Pie Charts 33
Definition
A frequency distribution for categorical data is a summary table that presents catego-
ries, counts, and proportions.
1. Each unique value in a categorical data set is a label, or class. In Table 2.1, the classes
are physical damage, natural events, loss of essential services, etc.
2. The frequency is the count for each class. In Table 2.1, the frequency for the compro-
mise of information class is 35 (i.e., 35 computer security threats were due to compro-
mise of information).
3. The relative frequency, or sample proportion, for each class is the frequency of the
class divided by the total number of observations. In Table 2.1, the relative frequency
for the technical failure class is 95/500 5 0.19.
Construct a frequency distribution to describe these data. What proportion of cruise ships
Jimmy Lopes/FeaturePics did not go to Southampton?
34 C hapte r 2 Tables and Graphs for Summarizing Data
Solution
Step 1 Each unique destination is a label, or class. This is a categorical data set. There
are five unique classes and 25 observations in total.
Step 2 Draw a table and list each unique class in the left-hand column. Find the fre-
quency and relative frequency for each class. For example, because Bermuda
appears four times in the sample, the frequency for this class is 4. The relative
frequency for Bermuda is 4/25 5 0.16.
Relative
Class Tally Frequency frequency
Bahamas || 2 0.08 (5 2/25)
Bermuda |||| 4 0.16 (5 4/25)
Caribbean 00000 6 0.24 (5 6/25)
Mediterranean ||| 3 0.12 (5 3/25)
Southampton 0000 0000 10 0.40 (5 10/25)
Total 25 1.00
The proportion of cruise ships that did go to Southampton is 10/25 5 0.40. The
total proportion is always 1.00. Therefore, the proportion of cruise ships that did
not go to Southampton is 1.00 2 0.40 5 0.60.
A Closer L ok
A tally mark is a short line drawn 1. If you have to construct a frequency distribution by hand, an additional tally column is
for each count up to four. On helpful. Insert this after the class column, and use a tally mark or tick mark to count
number five, draw a diagonal line observations as you read them from the table.
across the other four. Count in 2. The last (total) row is optional, but it is a good check of your calculations. The fre-
sets of five. quencies should sum to the total number of observations, and the relative frequencies
should sum to 1.00 (subject to round-off error).
3. There is no rule for ordering the classes. In Example 2.4, the classes happen to be
presented in alphabetical order.
25
20
15
10
5
0
CT DE FL MA NJ PA RI
State
Figure 2.5 Bar chart showing the number of Nathan’s
World Famous Beef Hot Dogs franchises in certain states
as of March 25, 2012.
2.2 Bar Charts and Pie Charts 35
Solution
Either frequency or relative
Step 1 Use the frequency distribution for the cruise ship data. There are five classes,
frequency may be used on the
and the frequencies range from 2 to 10.
vertical axis. Both are acceptable
because the resulting graphical Step 2 Draw a horizontal and a vertical axis. On the horizontal axis, draw five ticks for
representations of the distribution the five classes and label them with the class names. Because the greatest fre-
are identical. The only difference quency is 10, draw and label tick marks from 0 to at least 10 on the vertical axis.
between the two graphs is the Step 3 The height of each vertical bar is determined by the frequency of the class. For
labels on the vertical axis. Unless it example, the frequency of trips to Bermuda is 4, so the height of the bar repre-
is stated otherwise, frequencies senting Bermuda is 4. The resulting bar chart is shown in Figure 2.6. A technol-
are used. ogy solution is shown in Figure 2.7.
12
10
8
Frequency
6
4
2
0
as
an
an
on
ud
m
be
ne
pt
rm
ha
m
rib
rra
Be
ha
Ba
Ca
ite
ut
ed
So
M
Destination
Figure 2.6 Bar chart for the cruise ship data. Figure 2.7 CrunchIt! bar chart.
31%,
18%, Try It Now Go to Exercise 2.21
US
Europe
A pie chart is another graphical representation of a frequency distribution for categor-
ical data. An example of a pie chart is shown in Figure 2.8.3
11%, 10%,
Other Japan
3. The first slice of a pie chart is usually drawn with an edge horizontal and to the right
(08). The angle is measured counterclockwise. Each successive slice is added counter-
clockwise with the appropriate angle.
Solution
Step 1 Add a column to the frequency distribution for slice angle. Use the relative fre-
quency of each class to find the slice angle.
Step 2 Draw a circle and mark slices using the angles in the frequency distribution.
Draw the first slice with an edge extending from the center of the circle to the
right. The remaining slices are drawn moving around the pie counterclockwise.
It may be helpful to use a protractor and compass to draw the circle and measure
the angles. See Figure 2.9. A technology solution is shown in Figure 2.10.
Caribbean Bermuda
Bahamas
28.8º
Mediterranean
Southampton
Stepped
STEPPED Tutorial
TUTORIALS Try It Now Go to Exercise 2.23
Bar
BOX charts
PLOTS and pie
charts A Closer L ok
1. A pie chart is hard to draw accurately by hand, even with a protractor and compass.
A graphing calculator or computer is quicker and more efficient for constructing
this graph.
2. There are lots of pie-chart variations, for example, exploding pie charts and 3D pie
charts. Each is simply a visual representation of a frequency distribution for categor-
ical data.
Technology Corner
Procedure:VIDEO
Construct
TECHaMANUALS
bar chart. VIDEO Tech
Video TECH Manuals
MANUALS
Reconsider:
EXEL DISCRIPTIVEsolution, and interpretations.
Example 2.5, EXELchart
Bar DISCRIPTIVE
CrunchIt!
CrunchIt! has a built-in function to construct a bar chart from original or summarized data.
1. Enter the classes in column Var1 and the frequencies in column Var2. Rename each column if desired.
2. Select Graphics; Bar Chart. Choose Var1 for Labels and Var2 for Heights. Optionally enter a Title, X Label, and Y
Label. Click the Calculate button. Refer to Figure 2.7.
TI-84 Plus C
The TI-84 Plus C does not accept categorical data. Therefore, there is no built-in function to construct a bar chart.
However, you may assign a number to each class, and use the Histogram statistical plot to construct a bar chart.
1. Enter integers corresponding to each class in list L1 and the frequency for each class in the corresponding row in list
L2 (Figure 2.11).
2. Press STATPLOT and select Plot1 from the STAT PLOts menu.
3. Turn the plot On and select Type histogram. For Xlist, enter the name of the list containing the categories. For
Freq, enter the name of the list containing the frequencies. Select a Color (Figure 2.12).
4. Consider each class to have width 1. Enter appropriate WINDOW settings. Press GRAPH to display the bar chart
(Figure 2.13).
Figure 2.11 The categories and Figure 2.12 The Plot1 setup Figure 2.13 TI-84 Plus C bar
frequencies. screen. chart.
38 Cha pte r 2 Tables and Graphs for Summarizing Data
Minitab
The input can be either the entire data set in a single column or a summary table of categories and frequencies in two columns.
1. Enter the data into column C1.
2. Select Graph; Bar chart. Choose Counts of unique values and Simple.
3. Enter C1 under Categorical variables.
4. Edit graph attributes as necessary, for example, the axes labels, plot title, and gaps between clusters. See Figure 2.14.
Excel
The built-in functions Frequency or Sumif may be used to construct a frequency distribution. Assume this summary
information is available.
1. Enter the categories into column A and the corresponding frequencies into column B.
2. Select the range of cells A1:B5. Under the Insert tab, select Column; 2-D Column; Clustered Column.
3. Use Chart Tools to format the bar chart as necessary. See Figure 2.15.
CrunchIt!
CrunchIt! has a built-in function to construct a pie chart from original or summarized data.
1. Enter the classes in column Var1 and the frequencies in column Var2. Rename each column if desired.
2. Select Graphics; Pie Chart. Choose Var1 for Labels and Var2 for Sizes. Optionally enter a Title. Click the Calculate
button. Refer to Figure 2.7.
2.2 Bar Charts and Pie Charts 39
TI-84 Plus C
A pie chart can be constructed using the CellSheet App for the TI-84 Plus C.
1. Open a new spreadsheet in the CellSheet app.
2. Enter the categories in row 1 and the corresponding frequencies in row 2. See Figure 2.16.
3. Select MENU; Charts; Pie. Enter the range for the categories, the range for the frequencies, select Number or
Percent for display, and enter a Title if desired. See Figure 2.17.
4. Highlight Draw and press ENTER to display the pie chart. See Figure 2.18.
Figure 2.16 The categories and Figure 2.17 The PIE CHART Figure 2.18 TI-84 Plus C pie
frequencies in a CellSheet setup screen. chart.
spreadsheet.
Minitab
The input can be either the entire data set in a single column or a summary table of categories and frequencies in two columns.
1. Enter the data into column C1.
2. Select Graph; Bar chart. Choose Counts of unique values and Simple.
3. Enter C1 under Categorical variables.
4. Edit graph attributes as necessary, for example, the axes labels, plot title, and gaps between clusters. See Figure 2.19.
Excel
The built-in functions Frequency or Sumif may be used to construct a frequency distribution. Assume this summary
information is available.
1. Enter the categories into column A and the corresponding frequencies into column B.
2. Select the range of cells A1:B5. Under the Insert tab, select Column; 2-D Column; Clustered Column.
3. Use Chart Tools to format the bar chart as necessary. See Figure 2.20.
40 Ch a pte r 2 Tables and Graphs for Summarizing Data
2.22 Biology and Environmental Science The following a. Find the relative frequency for each prize.
table lists the number of dairy farms for various counties in b. Construct a pie chart for these data.
Vermont.4 Dairy
2.26 Education and Child Development The grade distri-
County Frequency bution for a large psychology class at Louisiana
State University is given in the following frequency
Addison 145 distribution: PSYCHGRD
Bennington 18
Caledonia 80 Grade Frequency Relative frequency
Chittenden 42
Essex 12 A 10
Franklin 210 B 43
Grand Isle 17 C 54
Lamoille 38 D 26
Orange 93 F 15
Orleans 141 a. Find the relative frequency for each grade.
Rutland 71 b. Construct a bar chart using frequency on the vertical axis
Washington 39 and a pie chart from the frequency distribution.
Windham 25
c. How many students were in this psychology class? What
Windsor 39
proportion of students passed (i.e., received a D or better)?
a. Find the relative frequency for each county. 2.27 Public Health and Nutrition A random survey of 200
b. Construct a bar chart for these data using relative customers who purchased ice cream at Brigham’s showed the
frequency on the vertical axis. following proportions: icecream
Sector Proportion
Product Proportion
Alarms 0.2964 Government 0.246
Training 0.0632 Professional services 0.062
Extinguishers 0.0514 Technology 0.136
Pumps 0.0237 Industry, manufacturing 0.154
Sprinklers 0.0632 Service 0.104
Building materials 0.0751 Resource-based 0.098
Electrical equipment 0.1265 Financial 0.142
Hazmat storage 0.0870 Nonprofit 0.058
Security products 0.1621 a. Construct a bar chart and a pie chart for these data using
Signaling systems 0.0514 the proportions in the table.
b. Suppose 225 companies participated in this survey. Find
the frequency, the number of companies, for each sector.
a. Find the number of exhibitors in each classification.
b. Carefully sketch a bar chart and a pie chart using the 2.33 Marketing and Consumer Behavior A survey of new
proportions for each class. homes built in the Sleepy Creek Mountains of West Virginia
produced the following results for the type of siding: siding
2.30 Psychology and Human Behavior Using the Library
of Congress classification scheme, the Brookings Public Siding Frequency
Library in South Dakota recorded the type of book borrowed by
30 randomly selected patrons. The data are given in the Aluminum 20
following table: books Brick 15
Stucco 12
Vinyl 45
Medicine Science Medicine Medicine
Wood 24
Science Education Medicine Science
Education Law Medicine Technology a. Find the relative frequency for each siding classification.
Education Technology Literature Education b. Construct a bar chart using frequency on the vertical axis
and a pie chart for these data.
Technology Medicine Technology Science
Science Literature Medicine Literature 2.34 Public Policy and Political Science There are many
think tanks in the world, consisting of groups of independent
Law Literature Law Technology
scholars with academic, government, and/or private experience.
Technology Education These think-tank scholars publish articles in appropriate
journals and offer advice on politics, economics, and govern-
a. Construct a frequency distribution for these data. mental policy matters. The following table lists the number of
b. Carefully sketch a bar chart using relative frequency on think tanks (TTs) in the world in 2012 by region.8 think
the vertical axis and pie chart for these data.
c. Do you think the public library should try to purchase Region No. of TTs
more books in one particular subject area? Why or Africa 554
why not? Asia 1194
Europe 1836
2.31 Marketing and Consumer Behavior Cardinal Glass
Latin America and the Caribbean 721
Industries produces several products for residential buildings,
Middle East and North Africa 339
for vehicles, and for ordinary consumer use. The proportion
North America 1919
of each type of manufactured product is given in the follow-
Oceania 40
ing table: glass
2.2 Bar Charts and Pie Charts 43
Frequency
2.35 Travel and Transportation In late October 2012, 25
urricane Sandy had a devastating effect on the coast of
H
20
the northeastern United States. This superstorm caused an
15
estimated $65.6 billion in damages and was the largest Atlantic
hurricane by diameter ever recorded. A survey was conducted 10
to determine the primary source of local transportation 5
information for people affected by the storm. The following 0
table lists each source and its frequency.9 sandy
is
tz
e
ty
ge
am
ris
lu
r
rif
Av
He
Va
d
Th
rp
Al
Bu
te
En
Agency
Source Frequency
Official websites and alerts 265 a. Construct a frequency distribution for these survey results.
Social media 198 b. How many observations were in this data set?
News websites 152 c. What proportion of people did not use Hertz or Enterprise?
News TV/radio 147 d. Construct a pie chart for these data.
Friends/family 115
2.38 Marketing and Consumer Behavior One thousand
Community groups 45
customers entering the Mall of America in Bloomington,
Smartphone apps 45
Minnesota, were randomly selected and asked to rank
Other 30
the variety of stores. The results are given in the following
table.
a. Find the relative frequency associated with each
source. Response Frequency Relative frequency
b. Construct a pie chart for these data.
Excellent 50
Very good 152
Extended Applications Good 255
Fair 0.4250
2.36 Sports and Leisure Complete the following frequency Poor 0.1180
distribution from a random sample of people visiting Atlantic
City casinos.
a. Complete the frequency distribution.
b. Construct a bar chart using frequency on the vertical axis
and a pie chart for these data.
Relative
c. What proportion of customers did not rank the store
Class Frequency frequency
variety as very good or excellent?
Bally’s 40 2.39 Travel and Transportation The following table shows
Caesars 25 0.125 the number of fatal vehicle crashes in 2011 by day of the week
Harrah’s 32 for rural and urban roads or streets.10 crash
Resorts 0.110
Sands 25
Trump Plaza 0.280 Day of week Rural Urban
b. Find the relative frequency for urban for each day. a. Construct a bar chart for these data.
Construct a bar chart using relative frequency on the b. Construct a pie chart for these data.
vertical axis (and day on the horizontal axis). c. Is it reasonable to conclude that hunters in Texas are more
c. How do these two bar charts compare? Describe any likely to be injured hunting hogs than any other animal?
similarities or differences. Why or why not?
2.40 Biology and Environmental Science An avalanche is 2.42 Public Policy and Political Science A side-by-side or a
a major danger facing skiers, snowboarders, and snowmobilers. stacked bar chart may be used to compare categorical data
The number of avalanche fatalities per year in the United States obtained from two (or more) different sources or groups.
varies from approximately 5 to 35, but in general has increased Figures 2.21 and 2.22 show an example of each—a comparison
steadily since 1956. The following table shows the number of of test grades in two different sections of an introductory
avalanche fatalities from 2002 to 2011 in the United States by statistics course. The blue rectangles represent students from
activity.11 Avalanch Section 01; the green rectangles represent students from
Section 02. RATINGS
Activity Frequency
Climber 30
Hiker 2 10
Rec snowplayer 1
Resident 8 8
Ski inbounds 9 Frequency
Ski out of bounds 21 6
Ski tour 53
Snowboard inbounds 1 4
Snowboard out of bounds 10
Snowboard tour 23 2
Snowmobiler 107
Snowshoer 12
0
Work other 1 A B C D E
Work patrol 3 Grade
Ski helicoptor 2 Figure 2.21 Side-by-side bar chart. Bars corresponding
Snowmobiler other 1 to the same category are placed side by side for easy
comparison.
a. Find the relative frequency of fatalities for each
activity.
b. Construct a bar chart using frequency on the vertical axis, 16
and a bar chart using relative frequency on the vertical 14
axis. Which of these two graphs do you think is a better
graphical description of avalanche fatalities by activity? 12
Why? 10
Frequency
4
Animal hunted Frequency
2
Dove 14
0
White-tailed deer 11 A B C D E
Rabbit/hare 5 Grade
Hog 21
Figure 2.22 Stacked bar chart. Within each category, bars are
Quail/pheasant 6
stacked for comparison.
Turkey 3
Duck/goose 4
Coyote 2 In January 2013, a poll by ABC News indicated that President
Squirrel/prairie dog 4 Obama’s popularity had reached a three-year high. Suppose the
Nongame bird/snake 5 frequency of occurrence of each response, grouped by sex, is
Raccoon 2 given in the following table.
2.3 Stem-and-Leaf Plots 45
3. List all the digits of each leaf next to its corresponding stem. It is not necessary to put
the leaves in increasing order, but make sure the leaves line up vertically.
4. Indicate the units for the stems and leaves.
STEPPED Tutorial
Stepped TUTORIALS Example 2.7 Waterfall Heights
BOX PLOTS
Stemplots Kerepakupai Meru, or Angel Falls, is the highest waterfall in the world.14 Because the
falls are so high, 979 meters, by the time water reaches the canyon below, it has vaporized
into a giant mist cloud. Suppose the following table lists the total height, in meters, of
several waterfalls in the world.
Data Set
Wtrfall 693 745 631 635 625 629 739 738 732 725
720 719 715 715 707 707 706 705 700 680
674 671 671 665 660 660 650 646 645 640
640 638 620 620 612 610 610 610 610 610
610 610 610 610 610 600 600 600 651 727
60 000
61 20000000000
62 5900
63 158
64 6500
65 01
66 500
67 411
68 0
69 3
70 77650
71 955
72 507 Stem 5 10
73 982 Leaf 5 1
74 5
2.3 Stem-and-Leaf Plots 47
A technology solution: Note that Stem 5 10 means the rightmost digit in the stem is in the tens
place and Leaf 5 1 means the leftmost digit in each leaf is in the ones place.
Reading from the graph, the smallest waterfall height is 600 meters and the
largest is 745 meters. The center of a data set is a typical value or values near
the middle of the observations when they are arranged in (increasing) order.
For these data, the center appears to be in the 64 or 65 stem row. There are
no outlying values. Figure 2.23 shows a technology solution.
A Closer L ok
1. As a general rule of thumb, try to construct the plot with 5 to 20 stems. With fewer than
5, the graph is too compact; with more than 20, the observations are too spread out.
Neither extreme reveals much about the distribution.
2. Sometimes, to help us find the center of the data, we put the leaves in increasing order,
to make an ordered stem-and-leaf plot.
3. Some advantages of a stem-and-leaf plot: each observation is a visible part of the
graph and (in an ordered stem-and-leaf plot, as when using a computer) the data are
sorted. However, a stem-and-leaf plot can get very big, very fast.
If a stem-and-leaf plot is made for a very large data set, the stems may be divided, usu-
ally in half or fifths. Consider the following example.
75 84 78 79 72 73 50 90 85 69
76 77 77 78 61 58 80 81 75 75
89 89 74 73 79 86 85 94 64 72
77 83 78 81 91 70 93 75 78 54
55 60 63 69 65 73 93 81 79 79
77 75 61 71 68 72 77
Solution
Step 1 If we split each observation between the tens place and the ones place, there will
be five stems. However, the leaves will extend far to the right and the shape,
center, and spread of the distribution will be unclear.
Step 2 Divide each stem in half. The first 5-stem row holds numbers 50–54, the sec-
ond 5-stem row holds numbers 55–59, the first 6-stem row holds numbers
60–64, etc.
Step 3 The resulting stem-and-leaf plot, with divided stems, offers a better graphical
description of the distribution. Note that the leaves have been ordered.
48 Ch apte r 2 Tables and Graphs for Summarizing Data
Step 4 Notice that we can draw a straight line across the stem-and-leaf plot near the
75–79 row and the graph is almost a mirror image, or reflection, over this line.
Therefore, the distribution of the data is approximately symmetric, centered near
75–79. In addition, the distribution is compact and there are no outlying values.
Figure 2.24 shows a technology solution.
Men Women
110 124 132 147 157 183 190 201 211 154
164 172 180 193 201 186 212 213 169 173
210 224 112 158 165 177 195 203 203 189
173 181 193 205 216 207 158 218 213 205
194 194 179 185 185 205 204 189 179 177
Solution
Step 1 The following graph is a back-to-back stem-and-leaf plot for these data.
Men Women
20 11
4 12
2 13
7 14
87 15 48
54 16 9
932 17 3779
5510 18 3699
4433 19 05
51 20 1334557
60 21 12338 Stem 5 10
4 22 Leaf 5 1
Step 2 The center column of numbers (11, 12, 13, . . .) represents the stems for both
groups. The stem-and-leaf plot for the men’s data is constructed to the left,
while the plot for the women’s data is constructed to the right. Note that the
leaves have been placed in increasing order, starting from the stem and pro-
ceeding outward.
Step 3 The distribution for the men seems more spread out, or has more variability,
while the distribution of women’s HDL-cholesterol levels is more compact and
seems centered at a slightly greater value.
Technology Corner
VIDEO TECH
Procedure: Construct MANUALS
a stem-and-leaf plot. VIDEO Tech
Video TECH Manuals
MANUALS
Reconsider:EXEL DISCRIPTIVE
Example 2.7, solution, and interpretations. EXEL DISCRIPTIVE
Stem plot
There is no built-in command on the TI-84 Plus C or in Excel to construct a stem-and-leaf plot. Some calculator programs
are available, and there are several add-ins for Excel for drawing a stem-and-leaf plot.
CrunchIt!
1. Enter the data in column Var1. Rename the column if desired.
2. Select Graphics; Stem and Leaf. Choose Var1 for the Sample and optionally enter a Title. Click the Calculate button.
Refer to Figure 2.23 (page 47).
Minitab
1. Enter the data into column C1.
2. Select Graph; Stem-and-Leaf.
3. Enter C1 under Graph variables. Click OK.
4. The increment (distance between stems) is automatically selected and the leaves are placed in order. The numbers in the
left column represent the cumulative counts from each end. The stem row containing the middle value is marked by
only a count, in parentheses, of the number of observations in that row. N is the total number of observations. See
Figure 2.25.
Determine a range of numbers to indicate the center of the selected and the reaction time (in seconds) for each was
data. Within this range, select one number that is a typical recorded. reactime
value for this data set. EX2.49 a. Construct a stem-and-leaf plot by splitting each
observation between the ones place and the tenths place.
2.50 Construct a stem-and-leaf plot for the data given on the
Truncate the hundredths digit so that each leaf has a
text website. EX2.50
single digit.
2.51 Construct a stem-and-leaf plot for the data given on the text b. Construct a stem-and-leaf plot by splitting each
website. Split each observation between the tens place and the observation between the ones place and the tenths place.
ones place, and divide each stem in half. Determine a range of Round each leaf to the nearest tenth so that each leaf has
numbers to indicate the center of the data. Within this range, select a single digit.
one number that is a typical value for this data set. EX2.51 c. Describe any differences between the two plots. What is a
typical value?
2.52 Construct a stem-and-leaf plot for the data given on the
website for this book. Use the stem-and-leaf plot to identify any 2.56 Public Health and Nutrition The owner of Copper-
outliers in this distribution. EX2.52 field Racquet and Health Club randomly selected 50 people
and recorded the number of calories burned after 20 minutes
2.53 Consider the following stem-and-leaf plot:
on a treadmill. calburn
1717 1719 1645 3739 3024 3664 3830 468 526 463 520 481 521 536 492 509 520
2991 2430 2730 3469 5086 2119 3021 497 487 506 464 474 516 503 481 562 514
3292 2844 3426 2067 3215 2767 3124 503 482 531 486 488 508 495 536 504 514
2573 2840 2449 2584 1505 1390 1645 529 518 495 497 471 458 494 519 511 490
2497 3466 3228 3192 435 520 499 492 519 466 450 482 514 475
a. Construct a stem-and-leaf plot by splitting each observation a. Construct a stem-and-leaf plot for these light-intensity data.
between the thousands place and the hundreds place. b. What is a typical light intensity? Are there any outliers? If
b. Construct a stem-and-leaf plot by splitting each so, what are they?
observation between the hundreds place and the tens
2.59 Public Health and Nutrition Every patient who
place (using two-digit leafs).
visits a hospital emergency room is classified by the imme-
c. Which plot presents a better picture of the distribution?
diacy with which the patient should be seen. The American
Why?
College of Emergency Physicians suggests that approxi-
mately 92% of all patients who visit emergency rooms can be
Applications
classified as urgent, that is, need attention within 1 minute to
2.55 Psychology and Human Behavior A random sam- 2 hours.18 Suppose a random sample of hospital emergency
ple of patients involved in a psychology experiment was rooms was obtained and yearly records were examined. The
52 Ch apte r 2 Tables and Graphs for Summarizing Data
in the competition for the largest pumpkin. A random sample of a. Construct a stem-and-leaf plot for these data. Divide each
the largest pumpkins from 2012 was obtained, and the data are stem into five parts.
given on the text website.20 pumpkin
b. Based on the plot in part (a), do you believe this city has
a. Construct a stem-and-leaf plot for these data. set the amber light duration to meet federal
b. What is a typical weight for these giant pumpkins? Are recommendations? Justify your answer.
there any outliers in the data set? If so, what are they? 2.67 Biology and Environmental Science The 2011 Maine
Sea Scallop Survey was conducted in November 2011 between
West Quoddy Head and Matinicus Island.23 Each sea scallop
Extended Applications
catch was divided into three size categories: seed, sublegal, and
2.64 Sports and Leisure A greyhound race handicapper harvestable ($101.6 mm). Based on information in this report,
uses several factors to predict the winner, such as past perfor- a random sample of shell heights (in millimeters) is given on
mance, track condition, early speed, form, and competition. the text website. scallop
Races on a 5/16-mile track were randomly selected at the a. Construct a stem-and-leaf plot for these data.
Naples-Fort Myers track, and the winning time (in seconds) b. Describe the distribution of shell heights in terms of shape,
was recorded for each. The data are given in the following center, and spread.
table.21 Greyrace c. Estimate the proportion of sea scallops that are harvestable.
2.4 Frequency Distributions and Histograms 53
10
6
Frequency
0
24 26 28 30 32 34 36 38 40 42
Silver price
Definition
A frequency distribution for numerical data is a summary table that displays classes,
frequencies, relative frequencies, and cumulative relative frequencies.
Here is a procedure for constructing a frequency distribution, along with the necessary
definitions.
6. Find the cumulative relative frequency (CRF) for each class: the sum of all the rela-
tive frequencies of classes up to and including that class. This column is a running
total or accumulation of relative frequency, by row.
20.4 24.1 28.4 53.4 62.1 31.7 57.2 45.7 38.1 51.1
41.3 11.0 37.5 36.4 25.6 43.5 23.1 24.2 35.5 26.4
13.0 44.4 16.9 14.9 63.7
Solution
Step 1 The data set is numerical (continuous). The observations are measurements, and
each can be any number in some interval. Scan the data to find the smallest and
largest observations (11.0 and 63.7). Choose between 5 and 20 reasonable
(equal) intervals that capture all of the data.
Step 2 The range of values 10–70 captures all of the data. Divide this range using the
friendly numbers 10, 20, 30, . . . into the class intervals 10–20, 20–30, 30–40,
etc.
Step 3 Count the number of observations in each interval. For example, in the interval
10–20, there are four observations (16.9, 14.9, 11.0, and 13.0), so the frequency is 4.
Step 4 Compute the proportion of observations in each class. For example, in the inter-
val 10–20, the relative frequency is 4 (observations) divided by 25 (total number
of observations).
Step 5 Find the CRF for each class. For example, for the class 30–40, the cumulative
relative frequency is the sum of the relative frequencies of this class and of all
those listed above it: 0.16 1 0.28 1 0.20 5 0.64.
Cumulative
Relative relative
Class Frequency frequency frequency
10–20 4 0.16 (54/25) 0.16 (50.16)
20–30 7 0.28 (57/25) 0.44 (50.16 1 0.28)
30–40 5 0.20 (55/25) 0.64 (50.44 1 0.20)
40–50 4 0.16 (54/25) 0.80 (50.64 1 0.16)
50–60 3 0.12 (53/25) 0.92 (50.80 1 0.12)
60–70 2 0.08 (52/25) 1.00 (50.92 1 0.08)
Total 25 1.00
Step 6 As for categorical data, if you must construct a frequency distribution by hand,
an additional tally column is helpful (as introduced in Section 2.2). Insert this
after the class column, and use a tally mark or tick mark to count observations as
you read them from the table.
(25 in Example 2.10), and the relative frequencies should sum to 1.00 (subject to
round-off error).
The CRF of the first class row is equal to the relative frequency of the first class.
There are no other observations before the first class. The CRF of the last class should
be 1.00 (subject to round-off error). You must accumulate all of the data by the
last class.
CRF gives the proportion of observations in that class and all previous classes. In
Example 2.10, the CRF of the class 40–50 is 0.80. Interpretation: the proportion of torque
measurements less than 50 is 0.80.
A Closer L ok
This idea of working backward 1. Suppose you were given just the CRF for each class. To find the relative frequency for
from cumulative relative a class, take the class CRF and subtract the previous class CRF. In Example 2.10, to
frequency to obtain relative find the relative frequency for the class 50–60: 0.92 (CRF for the class 50–60) 2 0.80
frequency is a handy technique (CRF for the previous class 40–50) 5 0.12.
for answering many probability 2. If the data set is numerical and discrete, use the same procedure outlined above for
questions. constructing a frequency distribution. If the number of discrete observations is small,
then each value may be a class, or category. In addition, certain liberties are sometimes
acceptable in listing the classes. For example, suppose a discrete data set consists of
integers from 1 to 30. One might use the classes 1–5, 6–10, 11–15, 16–20, 21–25, and
26–30. This is not a strict partition of the interval 1–30 even though these classes are
disjoint, or do not overlap. They do not allow for all numbers between 1 and 30. For
example, the value 5.5 is between 1 and 30 but does not fall into any of these classes.
However, these classes work fine in this case, because each observation is an integer.
The resulting frequency distribution is perfectly valid.
STEPPED Tutorial
Stepped TUTORIALS Example 2.11 Nuts and Bolts, Continued
BOX PLOTS
Histograms Construct a frequency histogram for the torque data presented in Example 2.10. For reference,
here is the frequency distribution from Example 2.10:
Cumulative
Relative relative
Class Frequency frequency frequency
10–20 4 0.16 0.16
20–30 7 0.28 0.44
30–40 5 0.20 0.64
40–50 4 0.16 0.80
50–60 3 0.12 0.92
60–70 2 0.08 1.00
56 C hapte r 2 Tables and Graphs for Summarizing Data
Solution
Step 1 Draw a horizontal axis and place tick marks corresponding to the class boundar-
ies, or endpoints: 10 through 70 by tens.
Step 2 Draw a vertical axis for frequency and place appropriate tick marks by checking
the frequency distribution. The frequencies range from 0 to 7, so draw tick marks
at 0 to at least 7 on the vertical axis.
Step 3 Draw a rectangle above each class with height equal to frequency. The resulting
histogram is shown in Figure 2.27. Figure 2.28 shows a technology solution.
8
7
6
Frequency
5
4
3
2
1
0
10 20 30 40 50 60 70
Torque
Figure 2.27 Frequency histogram for torque. Figure 2.28 CrunchIt! histogram.
A Closer L ok
1. A histogram tells us about the shape, center, and variability of the distribution. In addi-
tion, we can quickly identify any outliers.
2. If you must draw a histogram by hand, then you need to construct the frequency distri-
bution first. However, calculators and computers construct histograms directly from
the data. The frequency distribution is in the background and is usually not displayed.
3. To construct a relative frequency histogram, plot relative frequency versus class inter-
Histogram usually means val. The only difference between a frequency histogram and a relative frequency histo-
frequency histogram. gram is the scale on the vertical axis. The two graphs are identical in appearance. In
Example 2.12 both a frequency histogram and a relative frequency histogram are shown.
4. Histograms should not be used for inference. They provide a quick look at the distribu-
tion of data and only suggest certain characteristics.
Cumulative
Relative relative
Class Frequency frequency frequency
© Caro/Alamy 0–500 16 0.08 0.08
500–1000 28 0.14 0.22
1000–1500 54 0.27 0.49
1500–2000 48 0.24 0.73
2000–2500 36 0.18 0.91
2500–3000 18 0.09 1.00
Total 200 1.00
2.4 Frequency Distributions and Histograms 57
Use this table to construct a frequency histogram and a relative frequency histogram for
these data.
Solution
Step 1 For each graph, draw a horizontal axis and place tick marks at the class boundar-
ies: 0, 500, 1000, . . . , 3000.
Step 2 For the frequency histogram:
a. Draw a vertical axis for frequency. Since the largest frequency is 54, use the
tick marks at 0, 10, 20, . . . , 60.
b. Draw a rectangle above each class with height equal to frequency. The result-
ing frequency histogram is shown in Figure 2.29.
Step 3 For the relative frequency histogram:
a. Draw a vertical axis for relative frequency. Because the largest relative fre-
quency is 0.27, use the tick marks at 0, 0.05, 0.10, . . . , 0.30.
b. Draw a rectangle above each class with height equal to relative frequency.
The resulting relative frequency histogram is shown in Figure 2.30.
60
0.30
50 0.25
40 0.20
30 0.15
20 0.10
10 0.05
0 0.00
0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000
Figure 2.29 Frequency histogram for the Figure 2.30 Relative frequency histogram
tunnel-length data. for the tunnel-length data.
Relative
Class Frequency frequency Width Density
16–18 18,157 0.0494 2 0.0247 (50.0494/2)
18–21 40,122 0.1091 3 0.0364 (50.1091/3)
21–25 45,247 0.1231 4 0.0308 (50.1231/4)
25–30 43,106 0.1172 5 0.0234 (50.1172/5)
30–40 73,846 0.2008 10 0.0201 (50.2008/10)
40–50 78,442 0.2133 10 0.0213 (50.2133/10)
50–60 68,781 0.1871 10 0.0187 (50.1871/10)
Total 367,701 1.0000
Solution
Step 1 The class intervals are of unequal width, so the class density must be used as the
height of each rectangle in a histogram.
Step 2 Draw a horizontal axis corresponding to age. Because the classes range from 16
to 60, use tick marks at 15, 20, 25, . . . , 60, or tick marks corresponding to the
endpoints of each class.
Step 3 Add a vertical axis for density. The largest density is 0.0364, so use the tick
marks 0, 0.005, 0.010, . . . , 0.040.
Step 4 Draw a rectangle above each class with height equal to density. The resulting
density histogram is shown in Figure 2.31. A technology solution is shown in
Figure 2.32.
0.040
0.035
0.030
0.025
Density
0.020
0.015
0.010
0.005
0
1618 21 25 30 40 50 60
Age
Figure 2.31 Histogram for unequal class Figure 2.32 A technology solution:
widths: density histogram for the age data. Minitab density histogram.
Shape of a Distribution
Because the relative frequency is equal to the area of each rectangle in a density histo-
gram, the sum of the areas of all the rectangles is 1. This is an important concept as we
begin to associate area with probability.
2.4 Frequency Distributions and Histograms 59
60
50
40
30
20
10
0
–3 –2 –1 0 1 2 3
Figure 2.33 Smooth curve that captures Figure 2.34 Typical smoothed histogram.
the general shape of the distribution.
Definition
1. A unimodal distribution has one peak. This is very common; almost all distributions
have a single peak.
2. A bimodal distribution has two peaks. This shape is not very common and may occur
if data from two different populations are accidentally mixed.
3. A multimodal distribution has more than one peak. A distribution with more than two
distinct peaks is very rare.
Figure 2.35 Unimodal histogram. Figure 2.36 Bimodal histogram. Figure 2.37 Multimodal
distribution with four peaks.
The following characteristics are used to further classify and identify unimodal distri-
butions.
Definition
1. A unimodal distribution is symmetric if there is a vertical line of symmetry in the
distribution.
2. The lower tail of a distribution is the leftmost portion of the distribution, and the
upper tail is the rightmost portion of the distribution.
60 C hapte r 2 Tables and Graphs for Summarizing Data
Figures 2.38 and 2.39 show examples of symmetric, unimodal distributions. Each
shows the (dashed) line of symmetry. The left half of the distribution is a mirror image of
the right half. A bimodal or multimodal distribution may also be symmetric, and many
distributions are approximately symmetric.
Examples of skewed distributions are shown in Figures 2.40 and 2.41. Positively
skewed distributions are more common. The distribution of the lifetime of an electronics
part might be positively skewed.
We will learn much more about The most common unimodal distribution shape is a normal curve (as shown in Fig-
the normal curve in Chapter 6. ure 2.42). This curve is symmetric and bell-shaped, and can be used to model, or approx-
imate, many populations.
The vertical cross section of a bell A curve with heavy tails has more observations in the tails of the distribution than a
is a normal curve. comparable normal curve. The tails do not drop down to the measurement axis as quickly
as a normal curve. A curve with light tails has fewer observations in the tails of the dis-
tribution than a comparable normal curve. The tails drop to the measurement axis quickly.
Examples of curves with heavy and light tails are shown in Figures 2.43 and 2.44. Both
of these characteristics are subtle and tricky to spot.
Figure 2.42 Normal curve. Figure 2.43 A distribution with Figure 2.44 A distribution with
heavy tails. light tails.
2.4 Frequency Distributions and Histograms 61
SOLUTION TRAIL
Data Set
Example 2.14 Radiation Exposure
The U.S. Nuclear Regulatory Commission was established to regulate commercial, indus-
rems
trial, academic, and medical uses of nuclear materials.26 The NRC is charged with moni-
toring our health and safety and protecting the environment. Part of this responsibility
involves monitoring the radiation exposure at nuclear power reactors and other facilities.
Solution Trail 2.14 The individual radiation dose per year is measured in rem (roentgen equivalent man).
Suppose a sample of 50 individual radiation measurements was obtained from employees
K e y w ords
and the data are given in the following table:
n Less than 0.40
n At least 0.50 0.62 0.29 0.06 0.09 0.10 0.24 0.06 0.38 0.32 0.46
0.71 0.53 0.28 0.19 0.16 0.40 0.08 0.24 0.57 0.11
Tr a nsl at i o n
0.30 0.32 0.18 0.29 0.15 0.13 0.42 0.18 0.28 0.39
n Up to but not including 0.40
n 0.50 or greater 0.14 0.18 0.27 0.20 0.37 0.22 0.26 0.31 0.11 0.29
0.19 0.12 0.22 0.21 0.12 0.05 0.22 0.26 0.49 0.43
C once pts
n Use the frequency distribution a. Construct a frequency distribution and a histogram for these data using the class inter-
to describe the distribution vals 0–0.10, 0.10–0.20, etc.
and answer questions about b. Describe the shape, center, and spread of the distribution.
specific classes.
c. What proportion of observations are less than 0.40 rem?
V isi on d. What proportion of observations are at least 0.50 rem?
Use the frequency distribution to
construct the histogram. Smooth Solution
out the rectangles to describe the Step 1 The class intervals are given. Create a frequency distribution and compute the
shape, consider the middle of the
frequency, relative frequency, and cumulative relative frequency for each class.
observations to determine the
center, and try to decide whether Cumulative
the data are compact or spread Relative relative
out over a wide range. Use the
Class Frequency frequency frequency
relative frequencies or
cumulative relative frequencies 0.00–0.10 5 0.10 (5 5/50) 0.10 (5 0.10)
to find the proportion of 0.10–0.20 14 0.28 (5 14/50) 0.38 (5 0.10 1 0.28)
observations in certain classes.
0.20–0.30 15 0.30 (5 15/50) 0.68 (5 0.38 1 0.30)
0.30–0.40 7 0.14 (5 7/50) 0.82 (5 0.68 1 0.14)
0.40–0.50 5 0.10 (5 5/50) 0.92 (5 0.82 1 0.10)
0.50–0.60 2 0.04 (5 2/50) 0.96 (5 0.92 1 0.04)
0.60–0.70 1 0.02 (5 1/50) 0.98 (5 0.96 1 0.02)
0.70–0.80 1 0.02 (5 1/50) 1.00 (5 0.98 1 0.02)
Total 50 1.00
Use the frequency distribution to sketch the histogram (Figure 2.45). A tech-
nology solution is shown in Figure 2.46.
16
14
12
Frequency
10
8
6
4
2
0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
Radiation
Figure 2.45 Histogram for the radiation data. Figure 2.46 JMP histogram for the
radiation data.
62 Ch apte r 2 Tables and Graphs for Summarizing Data
()*
0.04 ()*
+ 0.02 ()*
+ 0.02 = 0.08
0.50–0.60 0.60–0.70 0.70–0.80
b. Find the cumulative relative frequency up to 0.50 and subtract this value from 1.
Technology Corner
Procedure: Construct
VIDEO TECH MANUALS
a histogram. VIDEO Tech
Video TECH Manuals
MANUALS
Reconsider: Example
EXEL DISCRIPTIVE
2.11, solution, and interpretations. EXEL DISCRIPTIVE
Histogram
CrunchIt!
CrunchIt! has a built-in function to construct a histogram.
1. Enter the data into column Var1.
2. Select Graphics; Histogram. Choose Sample (column) Var1. Optionally enter the Bin Width and Start Bins At. Optionally
enter a Title and X Label. The Y Label is Count by default. Click the Calculate button. Refer to Figure 2.28 (page 56).
TI-84 Plus C
A histogram is one of the six types of TI-84 Plus C statistical plots. There is no built-in function to construct a density
histogram. There are calculator programs available that will produce this graph.
1. Enter the data into list L1.
2. Select STATPLOT ; Plot1 to define, or set up, the histogram. Turn the plot On, select Type histogram, set Xlist to
the name of the list containing the data, set Freq (frequency of occurrence of each observation) to 1, and select a Color.
See Figure 2.47.
3. Set the WINDOW parameters so that Xmin is the left endpoint on the first class, Xmax is the right endpoint on the last
class, and Xscl is the width of each class. Ymin should be 0 (the smallest frequency) and Ymax should be at least the
largest frequency. Set Yscl to a reasonable distance between tick marks. See Figure 2.48.
2.4 Frequency Distributions and Histograms 63
Figure 2.47 TI-84 Plus C Plot1 Figure 2.48 The WINDOW Figure 2.49 TI-84 Plus C
setup screen. settings. histogram.
4. Press GRAPH to display the histogram. See Figure 2.49. Note: The TRACE key is used to move on the graph between
rectangles (classes). The corresponding class boundaries and frequency are displayed.
Minitab
The input may be either a single column containing the data or summary information in two columns: observations and
frequencies.
1. Enter the data into column C1.
2. Select Graph; Histogram and highlight Simple histogram. Click OK.
3. Enter C1 in the Graph variables window. Click OK to view the histogram.
4. Edit the horizontal axis scale to use the correct class intervals. Under the Binning tab, select Interval type: Cutpoint and
enter the Midpoint/Cutpoint positions (class boundaries). See Figure 2.50.
Excel
The input may be either a single column containing the data or summary information in two columns: right endpoint of
each class (bin limits) and corresponding frequencies. There are several methods to construct a frequency distribution in
Excel using FREQUENCY, SUMIF, or COUNTIFS, for example.
1. Enter the data into column A and the right endpoint of each class into column B.
2. Under the Data tab, select Data Analysis and choose Histogram. Enter the Input Range, the Bin Range, the Output
Range, and select Chart Output.
64 C ha pte r 2 Tables and Graphs for Summarizing Data
3. Each class is labeled with its right endpoint. In addition, Excel places observations on a boundary in the smaller class.
See Figure 2.51.
2.71 True/False The relative frequency for each class can be Construct a frequency distribution to summarize these data
determined by using the cumulative relative frequencies. using the class intervals 78–80, 80–82, 82–84, . . . .
2.72 True/False The only difference between a frequency 2.78 Consider the data given on the text website. Construct a
histogram and a relative frequency histogram (for the same frequency distribution to summarize these data. EX2.78
data) is the scale on the vertical axis.
2.79 Consider the following frequency distribution.
2.73 True/False A histogram can be used to describe the
shape, center, and variability of a distribution.
Cumulative
2.74 Short Answer Relative relative
a. When is a density histogram appropriate? Class Frequency frequency frequency
b. In a density histogram, what is the sum of areas of all
rectangles? 400–410 5 0.0758 0.0758
410–420 8 0.1212 0.1970
2.75 Fill in the Blank 420–430 10 0.1515 0.3485
a. The most common unimodal distribution is a 430–440 12 0.1818 0.5303
. 440–450 9 0.1364 0.6667
b. A unimodal distribution is if there is a 450–460 8 0.1212 0.7879
vertical line of symmetry. 460–470 5 0.0758 0.8637
c. If a unimodal distribution is not symmetric, then it is 470–480 4 0.0606 0.9243
. 480–490 3 0.0455 0.9698
490–500 2 0.0303 1.0001
2.76 True/False A bimodal distribution cannot be symmetric.
2.4 Frequency Distributions and Histograms 65
SOLUTION TRAIL
a. Construct a frequency distribution to summarize these a. Construct a histogram for these data.
data, and draw the corresponding histogram. b. Describe the distribution in terms of shape, center, and
b. Describe the shape of the distribution. variability.
c. Estimate the middle of the distribution, a number M such
2.91 Sports and Leisure The National Hockey League
that 50% of the data are below M and 50% are above M.
is concerned about the number of penalty minutes assessed
2.88 Marketing and Consumer Behavior The weights of a to each player. While some people in attendance hope to see a lot
diamond and other precious stones are usually measured in car- of fighting (and penalty minutes), the League Office believes most
ats. One carat is traditionally equal to 200 milligrams. A ran- fans are interested in good, clean hockey. A sample of total penalty
dom sample of the weights (in carats) of conflict-free loose minutes per player during the 2012–2013 regular season was
diamonds is given in the following table.28 diamond obtained, and the data are given in the following table.29 penalty
Relative
Class Frequency frequency Width Density a. Construct a histogram for these data.
b. Describe the distribution in terms of shape, center, and
1400–1500 1 variability.
1500–1600 2 c. Find a number Q1 such that 25% of the problem data are
1600–1800 41 less than Q1. Find a number Q3 such that 25% of the
1800–2000 192 problem data are greater than Q3.
2000–2100 79 d. How many values should be between Q1 and Q3? Find the
2100–2200 60 actual number of values between Q1 and Q3. Explain any
2200–2300 15 difference between these two values.
2300–2800 10
Total 400
Chapter 2 Summary
Concept Page Notation / Formula / Description
Categorical data set 29 Consists of observations that may be placed into categories.
Numerical data set 29 Consists of observations that are numbers.
Discrete data set 30 The set of all possible values is finite, or countably infinite.
Continuous data set 30 The set of all possible values is an interval of numbers.
Frequency distribution 33 A table used to describe a data set. It includes the class, frequency, and relative
frequency (and cumulative relative frequency, if the data set is numerical).
Class frequency 33 The number of observations within a class.
Class relative frequency 33 The proportion of observations within a class: class frequency divided by total
number of observations.
68 C hapte r 2 Tables and Graphs for Summarizing Data
Class cumulative relative 34 The proportion of observations within a class and every class before it: the sum
frequency of all the relative frequencies up to and including the class.
Bar chart 34 A graphical representation of a frequency distribution for categorical data with
a vertical bar for each class.
Pie chart 35 A graphical representation of a frequency distribution for categorical data with
a slice, or wedge, for each class.
Stem-and-leaf plot 45 A graph used to describe numerical data. Each observation is split into a stem
and a leaf.
Histogram 55 A graphical representation of a frequency distribution for numerical data.
Density histogram 58 A graphical representation of a frequency distribution for numerical data con-
taining class intervals of unequal width.
Unimodal distribution 59 A distribution with one peak.
Bimodal distribution 59 A distribution with two peaks.
Multimodal distribution 59 A distribution with more than one peak.
Symmetric distribution 59 A distribution with a vertical line of symmetry.
Positively skewed distribution 60 A distribution in which the upper tail extends farther than the lower tail.
Negatively skewed distribution 60 A distribution in which the lower tail extends farther than the upper tail.
Normal curve 60 The most common distribution, a bell-shaped curve.
CHAPTER 2 EXERCISES
several construction-related machines are given on the text c. Suppose the performance of these microwave ovens is graded
website.32 noise by actual output power, according to the following chart.
a. Construct a frequency distribution for these data.
b. Draw the corresponding histogram. Power Grade
c. What proportion of construction equipment had a peak
900–1000 Excellent
noise level below 80 dBA?
800–900 Very good
d. What proportion of construction equipment have peak
700–800 Good
noise levels of at least 90 dBA?
600–700 Fair
2.100 Technology and the Internet Many computer sellers 500–600 Poor
and most software vendors maintain help lines for customers. 0–500 Not serviceable
A random sample of the duration (in minutes) of customer
support calls to Amazon.com was obtained, and the resulting Classify each power output, construct a frequency
stem-and-leaf plot is given below. distribution by grade, and draw the resulting pie chart.
Stem Leaf
EXTENDED APPLICATIONS
0 11223344555566678888999
1 0001222222335668999 2.103 Economics and Finance PwC, World Bank, and IFC
2 012334556678 released a study about paying taxes around the world. The
3 000123478 report includes measures of the world’s tax systems associated
4 334468 with a standardized business. One of the measures, Time to
5 125 Comply (in hours), for each of the 185 countries in the study, is
6 15 Stem 5 10 given on the text website.33 taxpay
vitamin C group. The duration of each cold (in days) was Figures 2.52 and 2.53 show a frequency distribution and the
recorded, and the data are summarized in the following table. corresponding ogive. The observations are ages. The values to
be used in the plot are shown in bold in the table.
Placebo Vitamin C
Duration frequency frequency
Cumulative
3 0 3 Relative relative
4 0 6 Class Frequency frequency frequency
5 8 7
12–16 8 0.08 0.08
6 7 10
16–20 10 0.10 0.18
7 21 18
20–24 20 0.20 0.38
8 10 15
24–28 30 0.30 0.68
9 26 17
28–32 15 0.15 0.83
10 15 10
32–36 10 0.10 0.93
11 8 9
32–40 7 0.07 1.00
12 3 2
13 1 3 Total 100 1.00
14 1 0
Figure 2.52 Frequency distribution.
a. Use appropriate graphical procedures to compare the
placebo and vitamin C data sets.
b. Do the graphs suggest any differences in shape, center, or
1.0
variability?
Cumulative relative frequency
d. Using the frequency distribution in part (a), a. Construct a frequency distribution for these data.
approximately what proportion of furnaces do not meet b. Draw the resulting ogive for these data.
the DOE’s minimum AFUE requirement? 2.108 Public Health and Nutrition A doughnut graph is
e. The gas company classifies each AFUE reading according another graphical representation of a frequency distribution for
to the following scheme: 90 or above, excellent; at least 80 categorical data. To construct a doughnut graph:
but below 90, good; at least 70 but below 80, fair; and less 1. Divide a (flat) doughnut (or washer) into pieces, so that
than 70, poor. Classify each reading in the table above, and each piece (bite of the doughnut) corresponds to a class.
construct a bar chart for these classification data. 2. The size of each piece is measured by the angle made at
the center of the doughnut. To compute the angle of each
CHALLENGE piece, multiply the relative frequency times 3608 (the
number of degrees in a whole, or complete, circle).
2.107 Sports and Leisure An ogive, or cumulative relative
frequency polygon, is another type of visual representation of a The manager at a Whole Foods Market obtained a random
frequency distribution. To construct an ogive: sample of customers who purchased at least one popular herb
■ Plot each point (upper endpoint of class interval, cumulative (for cooking or medicinal purposes). Figure 2.54 and 2.55
relative frequency). show a frequency distribution and the corresponding dough-
■ Connect the points with line segments. nut graph.
Chapter 2 Exercises 71
Looking Back
n Be familiar with several common tabular and graphical summary procedures.
n Be able to construct a bar chart, pie chart, frequency distribution, stem-and-leaf plot, and
histogram.
Looking Forward
n Learn how to compute and interpret common numerical summary measures that describe
central tendency, variability, or relative standing.
n Learn how to measure distance in statistics.
n Find a five-number summary and construct box plots.
572 711 582 663 612 577 650 550 590 659
610 611 557 685 683 629 626 637 634 723
718 707 673 697 808 755 438 569 684 637
The procedures presented in this chapter will be used to describe the center
and variability of these data, and to search for any unusual observations.
Contents
3.1 Measures of Central Tendency
3.2 Measures of Variability
3.3 The Empirical Rule and Measures of Relative Standing
3.4 Five-Number Summary and Box Plots
73
74 C h apter 3 Numerical Summary Measures
a xi 5 x1 1 x2 1
n
c1 xn : This is an example of summation notation, often used to
i51
can be written more compactly by using the notation on the left side. g is the Greek
write long mathematical expressions more concisely. Here, the sum of n observations
capital letter sigma; i is the index of summation; 1 is the lower bound; and n is the
upper bound. To make the notation more compact and less threatening, we will usu-
ally omit the subscript i 5 1 and superscript n. Unless specifically indicated, each
used to represent the sum of each squared observation: g xi2 5 x12 1 x22 1 c1 xn2.
summation applies to all values of the variable. For example, the following notation is
The following example illustrates the use of this notation and some of the computa-
tions used throughout this text.
Solution
In each case, i is the index of summation, 1 is the lower bound, and 6 is the upper bound.
Apply the definition of summation notation to each expression.
a. In words, expression (a) says add all of the observations, and square the result.
( g xi ) 2 5 ( x1 1 x2 1 x3 1 x4 1 x5 1 x6 ) 2 Expand summation notation.
5 3 5 1 9 1 12 1 ( 26 ) 1 17 1 ( 22 ) 4
2
Use given data.
b. In words, expression (b) says square each observation, and add the resulting values.
g x2i 5 x21 1 x22 1 x23 1 x24 1 x25 1 x26 Expand summation notation.
2 2 2 2 2 2
5 ( 5 ) 1 ( 9 ) 1 ( 12 ) 1 ( 26 ) 1 ( 17 ) 1 ( 22 ) Use given data.
5 579 Add.
c. In words, expression (c) says subtract 7 from each observation, square each difference,
and add the resulting values.
g ( xi 2 7 ) 2 5 ( x1 2 7 ) 2 1 ( x2 2 7 ) 2 1 ( x3 2 7 ) 2 1 ( x4 2 7 ) 2 1 ( x5 2 7 ) 2 1 ( x6 2 7 ) 2
Expand summation notation.
5 ( 5 2 7 ) 2 1 ( 9 2 7 ) 2 1 ( 12 2 7 ) 2 1 (26 2 7 ) 2 1 ( 17 2 7 ) 2 1 (22 2 7 ) 2
Use given data.
The most common measure of central tendency is the sample, or arithmetic, mean.
Definition
x is read as “x bar.” The sample (arithmetic) mean, denoted x, of the n observations x1, x2, . . . , xn is the sum
of the observations divided by n. Written mathematically.
x 5 g xi 5
1 x1 1 x2 1 c1 xn
(3.1)
n n
A Closer L ok
1. The notation x is used to represent the sample mean for a set of observations denoted
by x1, x2, . . . , xn. Similarly, y would represent the sample mean for a set of observations
denoted by y1, y2, . . . , yn.
2. The population mean is denoted by m, the Greek letter mu.
6 11 20 19 23 28 30 8 23 25 29 33
Find the sample mean temperature at the base camp.
Solution
Use Equation 3.1 to find the sample mean.
g xi 5
1 1
x5 ( x1 1 x2 1 c1 x12 ) Add all the numbers, and divide by n 5 12.
12 12
1
5 ( 6 1 11 1 20 1 19 1 23 1 28 1 30 1 8 1 23 1 25 1 29 1 33 )
Galyna Andrushko/Shutterstock 12
1
5 ( 255 ) 5 21.25°F
12
76 C h apter 3 Numerical Summary Measures
A Closer L ok
1. x is a sample characteristic. It describes the center of a fixed collection of data. There
is no set rule to determine the number of included decimal places. Often, at least one
extra decimal place to the right is used to write the result; then the sample mean has
one more decimal place than the original data values.
2. The sample mean is an average. There are many other averages, for example, the geo-
metric mean, the harmonic mean, a weighted mean, the median, and the mode. People
usually associate the average with the sample mean.
3. m is a population characteristic. It describes the center of an entire population. If the
population happens to be of finite size N, then m is the sum of all the values divided by
N. Most populations of interest are infinite, or at least very large, and therefore m is an
unknown constant that cannot be measured. It seems reasonable to use x to estimate
and draw conclusions about m.
4. The population mean m is a fixed constant. x varies from sample to sample. It is rea-
sonable to think that two sample means computed using samples from the same popu-
lation should be close, but different.
If a data set contains outliers—observations very far away from the rest—then the
sample mean may not be a very good measure of central tendency. An outlier has lots of
influence on the sample mean, and tends to pull the mean in its direction. Example 3.3
shows how an outlier can affect the sample mean.
6 11 20 19 23 28 30 8 23 25 29 72
Definition
~x is read as “x tilde.” The sample median, denoted ~x , of the n observations x1, x2, . . . , xn is the middle number
when the observations are arranged in order from smallest to largest.
1. If n is odd, the sample median is the single middle value.
2. If n is even, the sample median is the mean of the two middle values.
3.1 Measures of Central Tendency 77
A Closer L ok
1. The median divides the data set into two parts, so that half of the observations lie
below and half lie above the median.
2. Only one calculation is necessary to find the median (no calculations are needed if n is
odd). Put the observations in ascending order of magnitude (not the order in which the
observations were selected), and find the middle value.
~
3. Similarly, y represents the sample median for a set of observations denoted by y , y , 1 2
. . . , yn.
~ .
4. The population median is denoted by m
Observations Median
a. 10 11 14 16 17 There are n 5 5 observations. The middle number is in the third
position. ~x 5 14.
b. 10 11 14 16 57 There are still n 5 5 observations. The middle number is in
the third position, and ~x 5 14. The outlier 57 does not affect
the median.
Data Set c. 10 11 14 16 17 20 There are n 5 6 observations. There is no single middle value.
PUBLISH The median is the mean of the observations in the third and
fourth positions. ~x 5 12 ( 14 1 16 ) 5 15.
Solution
Step 1 Arrange the observations in order.
13,500 13,900 14,000 14,100 14,300 14,500 15,200 15,200 15,700 15,900 16,100 16,900
Step 2 There are n 5 12 observations. The median is the mean of the two middle values
(in the sixth and seventh positions).
A Closer L ok
Figure 3.2 The sample median 1. In general, the sample mean is not equal to the sample median, x 2 ~
x . If the distribu-
and other summary statistics tion of the sample is symmetric, then x 5 ~x . If the sample distribution is approxi-
found using JMP. mately symmetric, then x < ~x .
78 C h apter 3 Numerical Summary Measures
~ . If the
2. In general, the population mean is not equal to the population median, m 2 m
distribution of the population is symmetric, then m 5 m ~.
3. The relative positions of x and ~ x suggest the shape of a distribution. The smoothed
histograms in Figures 3.3–3.5 illustrate three possibilities:
a. If x . ~
x , the distribution of the sample is positively skewed, or skewed to the right
(Figure 3.3).
b. If x < ~
x , the distribution of the sample is approximately symmetric (Figure 3.4).
c. If x , ~
x , the distribution of the sample is negatively skewed, or skewed to the left
(Figure 3.5).
Statistical Applet
Mean and Median
Stepped
STEPPED Tutorial
TUTORIALS ~ ~ ~
x x̄ x ¯x x̄ x
Measures
BOX PLOTSof center:
mean and median Figure 3.3 Positively Figure 3.4 Approximately Figure 3.5 Negatively
skewed distribution. symmetric distribution. skewed distribution.
Recall: A histogram consists of Because the sample mean is extremely sensitive to outliers, and the sample median is
rectangles drawn above each class very insensitive to outliers, it seems reasonable to search for a compromise measure of
with height proportional to central tendency. A trimmed mean is moderately sensitive to outliers.
frequency or relative frequency.
We draw a curve along the tops Definition
of the rectangles to smooth out
the histogram and display an A 100p% trimmed mean, denoted xtr(p), of the n observations x1, x2, . . . , xn is the sample
enhanced graphical mean of the trimmed data set.
representation of the distribution. 1. Order the observations from smallest to largest.
2. Delete, or trim, the smallest 100p% and the largest 100p% of the observations from the
data set.
3. Compute the sample mean for the remaining data.
100p is the trimming percentage, the percentage of observations deleted from each end
of the ordered list.
A Closer L ok
1. We compute a trimmed mean by deleting the smallest and largest values, which are
possible outliers. Some statisticians believe that deleting any data is a bad idea, because
every observation contributes to the big picture.
2. A 100p% trimmed mean is computed by deleting the smallest 100p% and the largest
100p% of the observations. Therefore, 2(100p)% of the observations are removed.
3. There is no set rule for determining the value of p. It seems reasonable to delete only
a few observations, and to select p so that np (the number of observations deleted from
each end of the ordered data) is an integer.
4. Here is a specific example using the notation: xtr(0.05) is a (100)(0.05) 5 5% trimmed
mean. In this example, 10% of the observations are discarded.
disease. Suppose the following December overtime hours for tellers at the Kaw Valley State
Bank and Trust Company in Topeka, Kansas, were obtained. Find a 10% trimmed mean.
0.2 0.8 1.5 1.5 1.6 1.7 1.7 1.8 2.0 2.0 2.2 2.5 2.7 2.7 3.0 3.0 3.2 3.5 4.0 5.0
Solution
Step 1 The trimming percentage is 10%. p 5 10/100 5 0.10. Find the number of obser-
vations to delete from each end of the ordered list.
There are n 5 20 observations.
np 5 ( 20 )( 0.10 ) 5 2 Trim 2 observations from each end.
Note that np may not be an integer. Computer software packages have algo-
rithms for dealing with this problem.
Step 2 The resulting data set is
Definition
The mode, denoted M, of the n observations x1, x2, . . . , xn is the value that occurs most
often, or with the greatest frequency.
If all the observations occur with the same frequency, then the mode does not exist.
If two or more observations occur with the same greatest frequency, then the mode is not
unique. If there are two modes, the distribution is bimodal, three modes, trimodal, etc.
The mode is easy to compute and, intuitively, it does return a reasonable measure of
central tendency. For example, consider a bell-shaped distribution. A random sample
from this distribution should contain lots of (identical) values near the center. Therefore,
the mode should suggest the middle of the distribution (Figure 3.7). For symmetric distri-
Figure 3.7 We expect the mode butions, the mean, the median, and the mode will be about the same.
M of a sample from this
distribution to be near the A Closer L ok
population mean m.
1. Sometimes only a data summary table, or grouped data, is available. Let x1, x2, . . . , xk
For example, x7 occurs f7 times. The total number of observations is n 5 g fi. If the
be a set of (representative) observations with corresponding frequencies f1, f2, . . . , fk.
data are grouped, there are corresponding formulas for the (approximate) measures of
central tendency defined above.
2. Remember, there are many other averages, for example, the weighted mean, the geo-
metric mean, and the harmonic mean.
The remainder of this section
describes summary measures The natural summary measures for observations on a qualitative variable are simply
for qualitative data. the frequency and relative frequency of occurrence for each category. We have already
80 C h apter 3 Numerical Summary Measures
done this! Recall Example 2.4, in which 25 cruise ships were randomly selected and the
destination of each ship was recorded. Each response was categorical (destination), and
the data were summarized in a table listing only category, frequency of occurrence for
each category, and relative frequency of occurrence for each category.
Suppose now that the commuter students at a small college are asked to complete a
survey to identify the make of car they use to drive to school. Numerical summary mea-
sures for this categorical variable should include frequencies and relative frequencies, or
proportions, as shown in the following table. (The cumulative relative frequency is used
only for numerical data sets, and doesn’t really make sense here because there is no natu-
ral ordering.)
Relative
Category Frequency frequency
Buick 137 0.0938
Chevrolet 288 0.1973
Ford 202 0.1384
Honda 336 0.2301
Mazda 175 0.1199
Saturn 322 0.2205
Total 1460 1.0000
A dichotomous or Bernoulli variable is a special categorical variable that has only two
possible responses. One response is often associated with, or called, a success, denoted S,
and the other response is called a failure, denoted F. The two possible actual responses
are ignored. For example, suppose a medical researcher selects children at random and
asks them all whether they have had an ear infection within the past year. The response
had an ear infection might be a success, and had no ear infection would be a failure. The
same numerical measures are used to summarize observations on this kind of categorical
variable: frequency and relative frequency of occurrence for each response. The relative
frequency of successes has a special name.
Definition
For observations on a categorical variable with only two responses, the sample propor-
=
tion of successes, denoted p, is the relative frequency of occurrence of successes:
=
p is read as “p hat.” = number of S’s in the sample n(S)
p5 5 (3.2)
total number of responses n
A Closer L ok
The symbol p is used in notation 1. The population proportion of successes is denoted by p.
to represent several quantities: 2. The success response is not necessarily associated with a good thing. For example, a
the population proportion of researcher may be interested in the proportion of laboratory animals that die when they
successes, the sample proportion are exposed to a certain toxic chemical. A success may be associated with the death of
of successes, and in the definition an animal.
of the trimmed mean. The context =
3. The sample proportion of successes p can be thought of as a sample mean in disguise.
in which the notation is used
implies the appropriate concept.
Suppose every S is changed to a 1, and every F to a 0. The sample mean for this new
numerical data is
1 n(S) =
x5 ( a sum of 0’s and 1’s ) 5 5 p
n n
3.1 Measures of Central Tendency 81
S S F F S S F S F S F S S S
S F S S S S F S S F S F S F
The sample contains 28 observations and 18 successes. The sample proportion of suc-
cesses is
= n(S) 18
p5 5 5 0.6429
n 28
Approximately 64% of the drivers stopped at the checkpoint were wearing their seatbelts.
=
It is reasonable to assume the value of p is close to the population proportion of
successes—in this example, the true proportion of drivers who wear a seatbelt.
Technology Corner
Procedure: Compute the sample mean, sample median, a trimmed mean, and the mode.
Reconsider: Example 3.2, solution, and interpretations.
CrunchIt!
CrunchIt! has a built-in function to find certain descriptive statistics, including the sample mean and the sample median.
There is no built-in function to compute a trimmed mean nor a sample mode.
1. Enter the data into a column.
2. Select Statistics; Descriptive Statistics. Choose the appropriate column and click the Calculate button. See
Figure 3.8.
TI-84 Plus C
There are several ways to find the sample mean and the sample median using the graphing calculator. There is no built-in
function to compute a trimmed mean nor a sample mode.
1. Enter the data into list L1.
2. Use the command LIST ; MATH; mean to compute the sample mean. Use the command LIST ; MATH; median to
compute the sample median. See Figure 3.9.
3. The function STAT ; CALC; 1-Var Stats returns several summary statistics. The sample mean is on the first output
screen and the sample median is on the second, denoted by Med. See Figures 3.10 and 3.11.
82 C h apter 3 Numerical Summary Measures
Figure 3.9 The sample mean Figure 3.10 The sample mean Figure 3.11 The second output
and the sample median using is part of the output from the screen from 1-Var Stats
built-in calculator functions. 1-Var Stats function. shows the sample median (Med).
Minitab
There are several ways to find the summary statistics using Minitab. In addition to the general Describe command, there are
Calc; Column statistics functions, Calc; Calculator functions, and various macros.
1. Enter the data into column C1.
2. Select Stat; Basic Statistics; Display Descriptive Statistics. Enter C1 in the Variables window.
3. Choose the Statistics option button and check the summary statistics Mean, Median, Mode, and Trimmed mean.
Note: Minitab computes only a 5% trimmed mean. Other macros allow any percentage. See Figure 3.12.
Excel
Excel has built-in functions for these four descriptive statistics. Under the Data tab, Data Analysis; Descriptive Statistics can
also be used to compute several summary statistics simultaneously.
1. Enter the data into column A.
2. Use the appropriate Excel function to compute the sample mean, sample median, trimmed mean, and mode. Note: the
second argument in TRIMMEAN is the total proportion of data trimmed. Excel rounds the number of trimmed observa-
tions down to the nearest multiple of 2. See Figure 3.13.
3.2 True/False 3.9 Use the values of the sample mean and the sample median
a. The sample mean and the population mean are always the to determine whether the distribution is symmetric, skewed to
same value. the left, or skewed to the right.
b. The sample mean and the sample median can be the same a. x 5 37, ~ x 5 49
value. b. x 5 63.5, ~ x 5 62.75
c. The sample mean is sensitive to outliers. c. x 5 237, ~ x 5 216
d. When computing a trimmed mean, we discard the same d. x 5 212.56, ~ x 5 12.56
number of observations from each end of the ordered list.
3.10 Compute the indicated trimmed mean for each data
e. The mode may not exist for a specific data set.
set. EX3.10
f. It is reasonable to assume that the sample proportion of
a. {24, 36, 26, 30, 28, 35, 33, 33, 34, 27} xtr(0.10)
successes is close to the population proportion of
b. {72, 76, 76, 77, 85, 76, 80, 86, 62, 70} xtr(0.20)
successes.
c. {182, 169, 180, 166, 173, 101, 188, 124, 182, 137, 100,
137, 118, 111, 137, 181, 189, 130, 168, 133} xtr(0.20)
Practice d. {5.5, 7.5, 7.3, 6.4, 5.3, 9.5, 7.2, 5.8, 7.0, 6.7, 9.0, 8.1, 8.4,
5.8, 5.4, 7.2, 7.4, 7.5, 5.9, 7.5} xtr(0.15)
3.3 Compute each summation using the following random
sample. EX3.3 3.11 Find the mode for each data set, if it exists. EX3.11
a. 3, 5, 6, 7, 3, 4, 6, 6, 8, 11, 13, 2, 1
a. g xi gx2i c. g ( xi 2 10 )
x1 5 215 x2 5 6 x3 5 40 x4 5 13 x5 5 38 b. 217, 210, 0, 3, 25, 4.3, 12, 0, 5, 22.1, 1.7, 27
d. g ( xi 2 5 ) 2 e. g ( 2xi )
b. c. 6.6, 7.3, 5.2, 6.2, 8.3, 9.8, 4.1, 3.7
f. 2g xi
3.12 Find the sample proportion of successes for each data
3.4 Suppose the following random sample is obtained. set. EX3.12
a. S, F, S, F, F, F, F, F, S, S, S, F, F, S
43.3 52.7 67.7 52.1 54.7 b. F, S, S, F, S, F, F, S, S, S, S, S, S, S, S, F, S, S, S, S, S
54.1 46.7 47.2 48.5 45.8 c. S, F, S, F, F, F, F, F, S, S, S, F, F, S, F, F, S, F, S, S, S, S,
F, F, F, F, F, F, F, S, S, F, S, F, F
a. gx2i b. gx3i
Compute the following sums. EX3.4
c. g ( xi 2 50 ) 2 d. ( gxi ) 2
Applications
e. g ( xi 2 51.28 ) f. g
xi 3.13 Travel and Transportation Tractor trailers tend to
exceed the speed limit (65 mph) on one downhill stretch of
7
Route 80 in Pennsylvania. Using a radar gun, the following
a. g xi 5 1057, n 5 10 b. g xi 5 356, n 5 27
3.5 Compute the mean for each sample with known sum.
tractor trailer speeds (in mph) were observed.
c. g xi 5 250.5, n 5 36 d. gxi 5 1.355, n 5 11
RADAR
e. g xi 5 237.4, n 5 15 f. g xi 5 496.81, n 5 28 81 66 67 69 79 62 70 73 67 60 61
67 74 65 77 74 64 71 64 67 61
3.6 Find the position, or location, of the sample median in an
ordered data set of size n. a. Find the sample mean, x.
a. n 5 22 b. n 5 37 b. Find the sample median, ~x.
c. n 5 117 d. n 5 64 c. What do your answers to parts (a) and (b) suggest about
the shape of the distribution of speeds?
3.7 Find the sample mean and the sample median for each
data set. EX3.7 3.14 Biology and Environmental Science The 2012
a. 5, 3, 7, 9, 11, 5, 6, 7, 7 estimated wheat production (in 1000 metric tons) for several
b. 27, 10, 25, 22, 36, 224, 0, 1, 12, 9, 211 countries is given on the text website.4 WHEAT
c. 5.4, 3.3, 6.0, 10.1, 13.6, 7.7, 16.6, 28.9, 4.6 a. Find the sample mean and the sample median for these
d. 2103.7, 2110.4, 2109.1, 299.7, 2115.6 data.
84 C h apter 3 Numerical Summary Measures
b. What do the summary statistics in part (a) suggest about manufactured to give hitters an advantage. To investigate this
the shape of the distribution of wheat yield? claim, a sample of the earned run average (ERA) for Ameri-
can League starting pitchers for the 2012 season was
3.15 Technology and the Internet Steam is a software
obtained.8 PITCH
platform used to distribute and manage multiplayer online
a. Find the sample mean and the sample median.
games. A day was selected at random and the number of
b. Suppose the pitcher with the highest ERA plays in
simultaneous peak users for certain games is given in the
Colorado, where the air is thin, and the home runs are
following table.5 GAMES
many. To eliminate such outliers, find a 4% trimmed mean.
c. Find the mode for the original data set, if it exists.
930 858 827 849 744 849 763 753 781
948 678 742 769 754 782 862 894 861 3.21 Biology and Environmental Science The water
temperature (in degrees Fahrenheit) during the summer of 2012
a. Find the sample mean and the sample median. at several locations off the coast of Florida is given in the
b. Suppose the last observation had been 2861 instead of following table.9 TEMP2012
861. Find the sample mean and sample median for this
revised data set. Explain how this change in the data 79 80 80 80 80 81 79 82 84 86
affects the mean and median found in part (a). 81 81 83 84 84 84 80 81 83 84
3.16 Fuel Consumption and Cars The text website contains 84 85 86 86 79 84 79 80 79 79
a table that lists the atmospheric CO2 concentration (in ppm) a. Find the sample mean and the sample median.
for 36 months ending in October 2012.6 ATMOCO2
b. Find a 10% trimmed mean for these data.
a. Find the sample mean and sample median for this data. c. Find the mode for the original data, if it exists.
b. A certain group considers any monthly concentration less
than 390 a success (not harmful to the environment). Find 3.22 Education and Child Development An educational
the sample proportion of successes. study was designed to compare cooperative learning versus
traditional lecture style. Two sections of an introductory statistics
3.17 Education and Child Development The Math SAT class were used. Seven students were randomly selected from
scores for all students in an introductory statistics class at each section. The scores on the second test (a 30-item exam) are
Edinboro University are given on the text website. EDINSAT
given in the following table. EDSTYLE
a. Find the sample mean and the sample median.
b. Find a 5% trimmed mean. Traditional 21 28 25 25 21 19 23
c. Using these three numerical summary measures, describe Cooperative 25 30 28 25 24 24 29
the shape of the distribution.
Which group of students did better on average? Justify your
3.18 Manufacturing and Product Development A
answer.10
random sample of 12-ounce cans of Dr Pepper soda was
obtained from E. M. Heaths supermarket. The exact amount
of soda (in ounces) in each can was measured, and the data Extended Applications
are given on the text website. SODACAN
3.23 Sports and Leisure The gold medal in the women’s
a. Find the sample mean and the sample median. 10-meter platform diving at the 2012 Summer Olympics in
b. What do the summary statistics in part (a) suggest about London was won by Roulin Chen from China. Many of the
the shape of the distribution of the amount of soda in each participants in this competition included a 407C, an inward
can? 312 somersault, as one of their dives. The text website contains the
c. Suppose any amount of 12 ounces or greater is considered scores for some of these dives for various participants.11 DIVING
a success. Find the sample proportion of successes. a. Find the sample mean and the sample median for this data
3.19 Biology and Environmental Science There are set.
approximately 10,000 commercial fishing harvesters in Maine, b. Find the mode, if it exists.
and these businesses contribute one-third of the value of all c. Multiply each score by the degree of difficulty, 3.2. Find the
New England fisheries. Some of the species caught off the sample mean for this new data set. How does this sample
Maine coast include cod, salmon, and flounder. The number of mean compare with the sample mean found in part (a)?
pounds (in millions) caught by Maine fisheries for several 3.24 Public Health and Nutrition Residents in the greater
recent years is given on the text website.7 CATCH
Toronto area have complained that there has been an increase in
a. Find the sample mean and the sample median for this data aircraft noise.12 This increased noise may lead to sleep distur-
set. bances and other health problems. Suppose the noise level (in
b. Which statistic is a better measure of central tendency for dBA) of several aircraft was measured using one flight path.
these data? Justify your answer. The data are given in the following table. AIRNOISE
3.20 Sports and Leisure Some critics of Major League
Baseball believe the ball is juiced (livelier) because it is 75 72 71 65 63 72 70 68
3.1 Measures of Central Tendency 85
a. Find the sample mean for these data. 3.28 Medicine and Clinical Studies In a random sample of
b. Some researchers believe that the wind velocity gradient 13 patients with calcaneus bone fractures, the sample mean
can add as much as 4 dB to each reading. Add 4 dB to number of days until fracture healing was x 5 37.85 and the
each observation in the data set. Compute the new sample sample median was ~x 5 40. Suppose an additional patient is
mean. How does this compare with the sample mean added to the sample so that x14 5 44.5.
found in part (a)? a. Find the sample mean for all 14 patients.
b. Is there any way to determine the sample median for all
3.25 Travel and Transportation The following table
14 patients? Explain.
contains the estimated unlinked light rail transit passenger trips
(in thousands) for various cities during June 2012. TRANSIT 3.29 Fuel Consumption and Cars The estimated oil
reserves (in millions of barrels) of four wells are given by
995.1 5132.7 190.9 1018.0 2593.8 4055.9
x1 5 1078 x2 5 5833 x3 5 10,772 x4 5 7320
849.6 1691.2 22.5 597.5 6415.3 752.4
933.4 1426.4 409.9 1867.0 517.0 245.8 a. Find x5 so that the mean for all five observations is 6883.4.
3605.5 2381.4 655.3 155.0 1822.0 896.2 b. Find x5 so that the sample mean is equal to the sample
median.
Source: American Public Transportation Association.
a. Find the sample mean number of unlinked light rail 3.30 Manufacturing and Product Development A
transit trips. consumer group has tested the drying time for 15 samples of
b. Use each June observation to estimate the yearly number exterior latex paint. The sample mean drying time is 83.8 minutes.
of trips. That is, multiply each observation by 12. Find the What must the 16th drying time be if the 16th observation
sample mean for this new data set. How does this sample decreases the mean drying time by 30 seconds? By 1 minute?
mean compare with the sample mean found in part (a)? 3.31 Biology and Environmental Science The beaches
along the coast of New Hampshire are famous for chilly
3.26 Manufacturing and Product Development A new
waters, even during the hottest summer days. A recent sample
quality control program was recently started at a Hyundai
of the water temperature on 24 randomly selected summer
manufacturing facility. Several times each day, randomly
days was obtained. The following temperatures are in degrees
selected panels from a stamping press are inspected for defects.
Fahrenheit (8F). BEACH
A nondefective panel is a success (S). A defective panel is a
failure (F) and must be restamped at an additional cost. During a
recent inspection, the following 32 observations were recorded: 58 58 53 53 59 57 54 61 56 60
57 61 56 55 59 60 55 53 55 58
S S S S F S F F S S S S S S 59 53 59 63
S S F S S S S S S S S F S S
a. Find the sample mean and the sample median.
S S S S
b. Convert each temperature to degrees Celsius (8C). Use the
2 32
a. Find the sample proportion of successes. formula C 5 F 1.8 . Find the mean for all the water
b. Change each S to a 1, and each F to a 0. Find the sample temperatures in degrees Celsius.
mean for these new data. How does the mean compare c. What is the relationship between the sample means in
with the sample proportion of successes found in part (a)? parts (a) and (b)?
c. Suppose 8 additional panels were selected and inspected
3.32 Technology and the Internet A recent survey of
(for a total of 40 panels). Is it possible for the sample
students at Minneapolis North High School included a question
proportion of successes to be 0.9? Why or why not?
about the number of computers at home. The (grouped) data are
3.27 Sports and Leisure The playing time for rookies in the summarized below. HOMECOMP
National Basketball Association (NBA) depends on many
factors, including position and performance. A random sample
Number of Frequency
of playing times per game (in minutes) for rookies in the NBA
computers of occurrence
during the 2011–2012 season was obtained. The data are given
in the following table. rookies 0 3
1 27
34.2 30.5 29.4 29.4 25.5 23.1 20.2 19.5 10.5
2 23
18.9 18.6 16.7 15.2 15.0 14.6 13.5 13.2 12.8
3 7
Source: National Basketball Association.
4 3
a. Find the sample mean and the sample median.
b. What do the summary statistics in part (a) suggest about 5 1
the shape of the distribution of playing time for rookies?
c. Can you change the maximum observation (34.2) so that the Find the sample mean and the sample median number of
sample mean is equal to the sample median? Why or why not? computers at home.
86 C h apter 3 Numerical Summary Measures
0 5 10 15 20 25 30 0 5 10 15 20 25 30
Figure 3.14 Sample 1: x1, x2, . . . , xn. Figure 3.15 Sample 2: y1, y2, . . . , ym.
The smoothed histogram suggests a The smoothed histogram suggests the
compact distribution. data are more disperse, or spread out.
The measures of central tendency (sample mean and sample median) are approxi-
mately the same (x < y < 15 and ~x < ~y < 15), but the data in Sample 1 are more com-
pact because more of the data are clustered about the mean x 5 15. To describe the
difference between the data sets, we need to consider variability.
Definition
The (sample) range, denoted R, of the n observations x1, x2, . . . , xn is the largest observa-
tion minus the smallest observation. Written mathematically,
R 5 xmax 2 xmin (3.3)
where xmax denotes the maximum, or largest, observation, and xmin stands for the mini-
mum, or smallest, observation.
A Closer L ok
1. In theory, the sample range does measure, or describe, variability. A data set with a
small range has little variability and is compact. A data set with a large range has lots
of variability and is spread out.
2. The sample range is used in many quality control applications. For example, a produc-
tion supervisor may want to maintain small variability in a manufacturing process. The
sample range may be used to determine whether the process is still well controlled, or
whether there is abnormal variation.
Despite being very easy to compute and a logical measure, the sample range is not
adequate for describing variability. The sample range may not accurately represent the
variability of a distribution if the maximum and minimum values are outliers.
The sample range for each data set summarized by the smoothed histograms in Figures
3.14 and 3.15 is approximately the same: R < 30 2 0 5 30. In fact, the two data sets
have approximately the same mean. Therefore, it is necessary to use a better, more sensi-
tive measure of variability. To derive a more precise measure of variability, consider how
far each observation lies from the mean.
A graph may be used to visualize the spread of data and to suggest another measure-
ment. A dot plot is a graph that simply displays a dot corresponding to each observation
3.2 Measures of Variability 87
Sample 2: y’s
Data: {12, 13, 14, 26, 27, 28}
Sample 1: x’s
Data: {17, 18, 19, 21, 22, 23}
12 14 16 18 22 24 26 28
x y
along a number line. The stacked dot plot in Figure 3.16 may be used to compare the vari-
ability in Sample 1 (x’s) versus Sample 2 (y’s).
In Sample 1, the data set is compact; each observation is very close to the mean. In
Sample 2, the data set is more spread out; each observation is far away from the mean.
This analysis of Figure 3.16 suggests that a better measure of variability might include the
distances from the mean.
Definition
Given a set of n observations x1, x2, . . . , xn, the ith deviation about the mean is xi 2 x.
A Closer L ok
1. Given a data set, to calculate the ith deviation about the mean, find x, then compute the
difference xi 2 x. For example, the seventh deviation about the mean is the value x7 2 x.
2. We usually do not need any one deviation about the mean; all of the deviations about
the mean together will be used to find a suitable measure of variability.
3. If the ith deviation about the mean is positive, then the observation is to the right of the
mean: If xi 2 x . 0, then xi . x.
I f the ith deviation about the mean is negative, then the observation is to the left of the
mean: If xi 2 x , 0, then xi , x.
A data set with little variability should have small deviations about the mean, and the
squares of the deviations should be small. A data set with lots of variability should have
large deviations about the mean, and the squares of the deviations should be large. This
idea is used to define the sample variance.
Definition
The sample variance, denoted s2, of the n observations x1, x2, . . . , xn is the sum of the
squared deviations about the mean divided by n 2 1. Written mathematically,
g ( xi 2 x ) 2
1
s2 5
n21
1
5 3 ( x1 2 x ) 2 1 ( x2 2 x ) 2 1 c1 ( xn 2 x ) 2 4 (3.4)
n21
The sample standard deviation, denoted s, is the positive square root of the sample vari-
ance. Written mathematically,
s 5 "s2 (3.5)
88 C h apter 3 Numerical Summary Measures
A Closer L ok
The sample variance s2 is often 1. The population variance, a measure of variability for an entire population, is denoted
called an average of the squared by s2, and the population standard deviation is denoted by s, the Greek letter sigma.
deviations about the mean, yet 2. Just knowing s2 doesn’t seem to say much about variability. If s2 5 6, for example, it
we divide the sum of the squared is hard to infer anything about variability. However, the sample variance s2 is a measure
deviations by n 2 1. Although this of variability, and it is useful in comparisons. For example, if Sample 1 and Sample 2
does not seem correct, dividing by have similar units, s21 5 14, and s22 5 10, then the data in Sample 2 are more compact.
n 2 1 makes s2 an unbiased
3. The sample standard deviation s is used (rather than s2) in many statistical inference
estimator of s2. We will see later
problems. So, if we need to find s (by hand), we need to compute s2 first, and then take
in the text that an unbiased
the positive square root to find s.
statistic is, in some sense, a good
thing. There are n 2 1 degrees of 4. The units for the sample standard deviation are the same as for the original data. And
freedom, a kind of dimension of a value of s 5 0 means there is no variability in the data set.
variability, associated with the 5. The notation s2x is used to represent the sample variance for a set of observations
sample variance s2. denoted by x1, x2, . . . , xn. Similarly, s2y represents the sample variance for a set of obser-
vations y1, y2, . . . , yn.
Solution
Step 1 Find the sample mean:
1 1
x5 ( 6.2 1 4.5 1 6.6 1 7.0 1 8.2 ) 5 ( 32.5 ) 5 6.5
5 5
Step 2 Use Equation 3.4 to find the sample variance.
1
s2 5 3 ( 6.2 2 6.5 ) 2 1 ( 4.5 2 6.5 ) 2 1 ( 6.6 2 6.5 ) 2 1 ( 7.0 2 6.5 ) 2 1 ( 8.2 2 6.5 ) 2 4
4
Use data and x.
1
5 3 ( 20.3 ) 2 1 ( 22.0 ) 2 1 ( 0.1 ) 2 1 ( 0.5 ) 2 1 ( 1.7 ) 2 4 Compute differences.
4
1
5 3 0.09 1 4.0 1 0.01 1 0.25 1 2.89 4 Square each difference.
4
1
5 ( 7.24 ) 5 1.81 Add, divide by 4.
4
Step 3 Take the positive square root of the variance to find the standard deviation.
Equation 3.4 is the definition of the sample variance and may be used to find s2, but
there is actually a more efficient technique for computing s2.
Definition
The computational formula for the sample variance is
c g x2i 2 ( g xi ) 2 d
1 1
s2 5 (3.6)
n21 n
3.2 Measures of Variability 89
This is a convenient shortcut method for calculating s2 without having to find all the
deviations about the mean. Suppose x1, x2, . . . , xn is a set of observations. To find s2,
Equation 3.6 says:
1. Find the sum of the squared observations, g x2i .
2. Find the sum of the observations, g xi.
3. Square the sum of the observations, ( g xi ) 2.
( g xi ) 2.
1
4. Multiply the square of the sum of the observations by 1/ n,
n
5. Subtract the two quantities, and multiply the difference by 1 / (n 21),
c g x2i 2 ( g xi ) 2 d
1 1
s2 5
n21 n
Solution
Step 1 Find the sum of the squared observations:
( g xi ) 2 5 ( 32.5 ) 2 5 211.25
1 1
5 5
Step 4 Subtract the two quantities, and multiply by 1 / (n 2 1):
1 1
s2 5 ( 218.49 2 211.25 ) 5 ( 7.24 ) 5 1.81
4 4
It can be shown that Equation 3.4 and Equation 3.6 are equivalent. Exercise 3.49 at the
end of this section asks for a proof. If you must find a sample variance by hand, then use
the computational formula. It has fewer calculations (is more efficient) and is usually
more accurate (has less round-off error). In fact, most calculator and computer programs
that find the sample variance use the computational formula.
The sample variance is always greater than or equal to zero: s2 $ 0. This is easy to see
by looking at the definition in Equation 3.4. We sum squared deviations about the mean
(always greater than or equal to zero) and divide by a positive number (n 2 1). There are
two special cases.
1. s2 5 0: This occurs if all the observations are the same. If all the observations are
equal to some constant c, the mean is c, and all the deviations about the mean are zero.
Hence, s2 5 0. This makes sense intuitively also: If all the observations are the same,
there is no variability.
2. n 5 1: This is a strange case, but it can occur. If n 5 1, there is no variability—or,
another way to think of this—we cannot measure variability. The denominator in
Equation 3.4 is zero, and anything divided by zero is undefined.
90 C h apter 3 Numerical Summary Measures
The sample variance (and the sample standard deviation) can be greatly influenced by
outliers. An observation very far away from the rest has a large deviation about the mean,
a large squared deviation about the mean, and therefore contributes a lot to the sum (in the
definition of the sample variance). The interquartile range is another measure of variability,
and it is resistant to outliers.
Definition
Let x1, x2, . . . , xn be a set of observations. The quartiles divide the data into four parts.
Note that the definition for Q1 1. The first (lower) quartile, denoted Q1 (QL), is the median of the lower half of the
and Q3 involves the median, not observations when they are arranged in ascending order.
the mean. 2. The second quartile is the median ~ x5Q. 2
3. The third (upper) quartile, denoted Q3 (QU), is the median of the upper half of the
observations when they are arranged in ascending order.
4. The interquartile range, denoted IQR, is the difference IQR 5 Q3 2 Q1.
In smoothed histograms, the area The quartiles are illustrated in Figure 3.18.
under the curve between two
points corresponds to the
proportion of observations
between those points.
Interpreting Figure 3.18: 25% of
the observations are between Q1
and ~x . 25% 25% 25% 25%
Q1
x Q2 Q3
A Closer L ok
There is a very intuitive method for finding the quartiles. Arrange the data in order from
smallest to largest. The median, ~x 5 Q2, is the middle value. The first quartile, Q1, is the
median of the lower half, and the third quartile, Q3, is the median of the upper half. In
practice, a more general method is used for locating the position, or depth, of the first and
third quartiles (in the ordered data set).
68 71 64 58 61 76 73 62 72 66
a. Find the first quartile, the third quartile, and the interquartile range.
b. Suppose there are 12 patients in the study, with x11 5 78 and x12 5 81. Find the first
quartile, the third quartile, and the interquartile range for this modified data set.
Solution
Step 1 Arrange the observations in order from smallest to largest.
Observation 58 61 62 64 66 68 71 72 73 76
Position 1 2 3 4 5 6 7 8 9 10
Step 2 Find the depth of the first quartile.
IQR 5 72 2 62 5 10
A technology solution is shown in Figures 3.19 and 3.20.
Step 5 If there are 12 patients in the study, arrange the observations in order from smallest
to largest in the modified data set.
Observation 58 61 62 64 66 68 71 72 73 76 78 81
Position 1 2 3 4 5 6 7 8 9 10 11 12
Step 6 Find the depth of the first quartile.
Q1 is the mean of the observations in the third and fourth positions in the
ordered list.
1
Q1 5 ( 62 1 64 ) 5 63
2
Step 7 Find the depth of the third quartile.
3n ( 3 )( 12 ) Because d3 is a whole number, add 0.5.
d3 5 5 5 9
4 4 The depth of the third quartile is 9.5.
Q3 is the mean of the observations in the ninth and tenth positions in the ordered list.
1
Q3 5 ( 73 1 76 ) 5 74.5
2
Step 8 Find the interquartile range.
IQR 5 74.5 2 63 5 11.5
A technology solution is shown in Figures 3.21 and 3.22.
Stepped
STEPPED Tutorial
TUTORIALS A Closer L ok
Measures
BOX PLOTSof spread 1. The interquartile range is the length of an interval that includes the middle half (middle
50%) of the data.
Statistical Applet 2. The interquartile range is not sensitive to outlying values. The lower and/or upper 25%
One-Variable Statis- of the distribution can be extreme without affecting Q1 and/or Q3.
tical Calculator
Technology Corner
Procedure: Compute the sample variance, VIDEOsample standard deviation, first quartile,
TECH MANUALS VIDEO Tech
Video TECH Manuals
MANUALS
third quartile, and interquartile range. EXEL DISCRIPTIVE EXEL DISCRIPTIVE
Summary statistics
Reconsider: Example 3.10(b), solution, and interpretations.
CrunchIt!
CrunchIt! has a built-in function to find certain descriptive statistics, including the sample standard deviation, first quartile,
and third quartile. There is no built-in function to compute the sample variance nor the interquartile range.
1. Enter the data into a column.
2. Select Statistics; Descriptive Statistics. Choose the appropriate column and click the Calculate button. See
Figure 3.23.
3.2 Measures of Variability 93
TI-84 Plus C
1. Enter the data into list L1.
2. Select LIST ; MATH; variance. Take the square root of the variance to find the standard deviation. See
igure 3.17.
F
3. Select STAT ; CALC; 1-Var Stats.
4. The quartiles are displayed on the second output screen. Refer to Figure 3.21. Note: The sample standard deviation is
displayed on the first output screen and the value is stored in the statistic variable Sx.
5. Compute the interquartile range on the Home screen. Use the TI-84 Plus statistics variables that represent the quartiles.
Refer to Figure 3.22.
Minitab
1. Enter the data into column C1.
2. Select Stat; Basic Statistics; Display Descriptive Statistics. Enter C1 in the Variables window.
3. Choose the Statistics option button and check the summary statistics Standard deviation, variance, First quartile, Third
quartile, and Interquartile range. Note: Minitab computes quartiles using a slightly different algorithm. See Figure 3.24.
Excel
Use built-in functions to compute the sample standard deviation, sample variance, quartiles, and interquartile range.
1. Enter the data into column A.
2. Use the function STDEV.S to compute the sample standard deviation; VAR.S to compute the sample variance;
QUARTILE to compute the first and third quartile; and compute the interquartile range using the results. See
Figure 3.25. Note: Excel computes quartiles using another different algorithm.
Practice Find the sample variance and the sample standard deviation
for this new data set. How are these values related to the
3.38 Find the sample range, sample variance, and sample sample variance and the sample standard deviation found
standard deviation for each data set. EX3.38 in part (a)?
a. {2.7, 6.0, 5.7, 5.4, 4.0, 3.1, 6.6, 5.7, 6.1, 3.0}
3.43 How does an outlier affect each of the following?
b. {18.5, 23.5, 15.7, 15.7, 36.3, 20.8, 21.1, 20.2, 26.8, 19.9,
a. The sample variance
17.6, 17.5, 21.5, 22.4, 25.7}
b. The sample standard deviation
c. {23.94, 231.04, 37.09, 22.64, 261.23, 1.59, 23.09, 1.14}
c. The first quartile and the third quartile
d. {0.13, 0.96, 20.50, 0.10, 21.65, 20.14, 1.43, 22.57,
d. The interquartile range
21.28, 20.24, 20.90, 21.27, 1.53, 3.00, 21.28, 1.04,
20.90, 2.44, 1.70, 3.13} Applications
3.39 Compute the sample variance and the sample standard 3.44 Sports and Leisure The following times (in seconds) are
deviation for each sample with known sum(s).
a. gxi 5 1219.29 gx2i 5 58,945.1
from the ISU World Cup 2012/2013, Montreal, women’s 500-
meter speed skating event, October 26–28, 2012.13
b. gxi 5 35.2918 gx2i 5 7748.98
n 5 30 SPDSKT
d. g ( xi 2 x ) 2 5 49.784
n 5 15 47.149 59.028 46.188 54.118 45.416
n 5 21
a. Find the sample range, R.
3.40 Find the depth of the first quartile and the third quartile in b. Find the sample variance, s2, and the sample standard
an ordered data set of size n. deviation, s.
a. n 5 60 c. Find the first and third quartiles, Q1 and Q3, and the inter-
b. n 5 37 quartile range, IQR.
c. n 5 100
3.45 Physical Sciences The following table includes
d. n 5 48
some of the data from an experiment performed by H. S.
3.41 Find the first quartile, the third quartile, and the interquar- Lew for the Center for Building Technology at the U.S.
tile range for each data set. EX3.41 National Institute of Standards and Technology (NIST).
a. {20, 17, 37, 33, 29, 50, 20, 33} These data are used to certify computational results and evalu-
b. {13.1, 7.8, 11.9, 2.3, 6.7, 2.3, 7.4, 2.7, 8.9, 6.6, 6.8, 5.1, ate statistical software. Each observation represents the
2.2, 5.6, 5.5, 2.1, 7.7, 13.9, 1.6, 1.7} deflection of a steel-concrete beam while subjected to periodic
c. {215, 213, 27, 215, 222, 212, 221, 221, 226, 217} pressure. BEAM
d. {43.6, 44.1, 59.5, 52.3, 50.9, 39.7, 42.4, 58.5, 40.9, 38.5,
44.2, 60.3, 72.2, 34.8, 46.0, 54.7, 51.0, 54.3, 49.7, 62.9, 2213 2564 235 215 141 115 2420
44.6, 61.3, 52.4, 43.9, 68.8, 59.2, 57.1, 70.5, 52.3, 49.5} 2360 203 2338 2431 194 2220 2513
3.42 Consider the following data set: EX3.42 154 2125 2559 92 221 2579
Source: National Institute of Standards and Technology.
a. Compute CV and CQV for each development. and the number of acres burned has approximately doubled.
b. Compare the coefficient of variation and the coefficient of The following table lists the number of wildland fires in 2012
quartile variation for each development. Which data set as of November 2 in selected states.18 WILDFIRE
has more variability?
State AK CA CO CT GA FL
3.52 Physical Sciences Solar wind released from the Sun can
Fires 398 7737 1447 180 2878 2217
affect power grids on Earth, the northern and southern lights,
and even the tails of comets. The following table lists the proton
density in protons per cubic centimeter (p/cc) for several times State IN KY LA MA MD ME
in November 2012.16 SUNWIND Fires 67 896 828 1446 143 551
2.8 15.7 0.7 0.5 0.6 2.7 2.2 2.7 3.1 0.5 State NC NH NJ OH
5.0 5.5 1.3 10.9 3.2 3.4 2.7 1.2 1.7 0.8 Fires 2575 312 994 218
a. Find the sample variance and the sample standard a. Find the sample variance and the sample standard deviation.
deviation. b. Find Q1, Q3, and IQR for these data.
b. Find the Q1, Q3, and IQR for these data. c. Verify that the sum of the deviations about the mean is 0
c. Remove the two largest proton densities from the data (subject to round-off error).
set. Answer parts (a) and (b) for this reduced data set.
Compare the sample standard deviation and IQR in these 3.56 Marketing and Consumer Behavior An Internet search
two data sets and explain how these values have changed. for the best deal on a 12-megapixel digital camera revealed the
following prices (in U.S. dollars).19 BESTDEAL
3.53 Public Health and Nutrition The Center for Science
in the Public Interest (CSPI), a consumer group concerned
160 169 783 90 129 188
about nutrition labeling, has defined a new measure of
breakfast cereal called the nutritional index (NI), which is 300 295 600 579 356 553
based on calories, vitamins, minerals, and sugar content per
a. Find Q1, Q3, and IQR for these price data.
serving. A larger NI indicates greater nutritional value. The
b. Suppose the highest price (783) is changed to 699. Find
NI was measured for randomly selected cereals sold by
Q1, Q3, and IQR for this modified data set.
Kellogg’s and General Mills. The results are given in the
c. How large could the maximum price be without changing
following table. CEREALNI
IQR?
Kellogg’s d. How much could the minimum price be raised before Q1
changes?
86 70 77 79 71 80 88 62 81 82
75 83 70 67 72 68 74 80 62 74 3.57 Travel and Transportation A typical road bridge is
constructed to last 50 years. The mean age of all bridges in the
General Mills United States is approximately 43 years. The number of structur-
54 49 50 31 46 29 81 63 41 60 ally deficient bridges in each state and the District of Columbia
as of 2009 is given on the text website.20 BRIDGES
66 68 39 59 47 80 41 91 41 33
a. Find the sample variance and the sample standard devia-
a. Find s2, s, and IQR for Kellogg’s. tion of the number of structurally deficient bridges.
b. Find s2, s, and IQR for General Mills. b. Suppose each state is able to repair 10% of all structur-
c. Use the results in parts (a) and (b) to compare the vari- ally deficient bridges. Find the sample variance and the
ability in NI for the two companies. sample standard deviation for this new data set.
c. How do your answers to parts (a) and (b) compare?
3.54 Biology and Environmental Science Stage data
(in feet-NGVD) for the Mississippi River at Baton 3.58 Travel and Transportation In its annual study, the
Rouge at various times in 2012 are given on the text International Telework Association & Council asks survey
website.17 MSRIVER participants how many miles they must drive to work each day.
a. Find Q1, Q3, and IQR for these stream velocity data. Six study participants were selected at random and their
b. How large could the minimum stage be without changing mileage was recorded: MILEAGE
the IQR?
c. Find the coefficient of quartile variation, CQV (defined in 25 39 16 35 18 45
Exercise 3.51).
a. Find each deviation about the mean.
b. Verify that the sum of the deviations about the mean is 0
Extended Applications
c. Prove that, in general, g ( xi 2 x ) 5 0. (Hint: Write as two
(subject to round-off error).
3.55 Biology and Environmental Science The number of
wildland fires has increased dramatically over the last decade, separate sums, and use the definition of the sample mean.)
3.2 Measures of Variability 97
3.59 Biology and Environmental Science The Virginia 3.63 Transformed Data Combine the results obtained in the
Estuarine & Coastal Observing System monitors the previous two exercises. Suppose a data set (x’s) has variance s2x
Chesapeake Bay and records values of several variables, and standard deviation sx. A new (transformed) data set is created
including salinity, temperature, and turbidity. The wind speed using the equation yi 5 ax 1 b, where a and b are constants.
(in miles per hour) at selected locations on November 6, 2012, How are the variance and standard deviation of the new data
is given in the following table.21 CHESABAY set (s2y and sy) related to s2x and sx?
3.64 Is This Possible? Consider the set of observations
2 5 11 17 9 7 11 10
6 9 24 27 8 8 10 28 5 5, 7, 3, 2, 4, 6, 9, 11, 13 6
30 25 14 24 10 18 18 25 Can you find a subset of size n 5 7 with x 5 5 and s2 5 6? If
not, why not?
a. Find the sample variance and the sample standard devia-
tion for these wind speed data. Challenge
b. Convert each observation to meters per second (multiply
each observation by 0.44704). Find the sample variance 3.65 Biology and Environmental Science A whale-
and the sample standard deviation for the wind-speed data watching tour off the coast of Maine is considered a success
in meters per second. if at least one whale is sighted. Thirty-two randomly selected
c. How do your answers to parts (a) and (b) compare? summer tours are classified in the following table:
3.60 Proof Prove that Equation 3.4 (the definition of the sample S S S S F S S S S S S S
variance) can be written as Equation 3.6. That is, show that S S S S S S S F S S S S
g ( xi 2 x ) 2 5 c g x2i 2 ( g xi ) 2 d
1 1 1 S F S S S S S S
n21 n21 n
a. Find the sample proportion of successes.
3.61 Public Health and Nutrition A nutritional study b. Change each S to a 1 and each F to a 0. Find the sample
recently found the following number of calories in one slice of variance for these new data. Write the sample variance in
plain pizza at 10 different national chains.27 PIZZA terms of the sample proportion.
c. If a population happens to be of finite size N, then the
228 281 274 408 364 259 317 299 302 231 population mean and population variance are defined by
g xi g ( xi 2 m ) 2
Source: Food Science and Human Nutrition, Colorado State University. 1 1
m5 s2 5
a. Find the sample variance and the sample standard deviation. N N
b. Add 15 (calories) to each observation. Find the sample Suppose the table represents an entire population. Find
variance and the sample standard deviation for this modi- the population variance for the data (consisting of 0’s
fied data set. and 1’s). Write the population variance in terms of the
c. How do your answers to parts (a) and (b) compare? sample proportion.
d. Suppose a data set (x’s) has variance s2x and standard 3.66 Other Summary Statistics Many other summary statistics
deviation sx. A new (transformed) data set is created using can also be used to describe various characteristics of a numerical
the equation yi 5 xi 1 b, where b is a constant. How are data set. Suppose x1, x2, . . . , xn is a set of observations. For r 5 1, 2,
the variance and standard deviation of the new data set 3, . . . , the rth moment about the mean x is defined as
g ( xi 2 x ) r
(s2y and sy) related to s2x and sx?
1
3.62 Technology and the Internet A benchmark computer
mr 5
n
program was executed on eight different machines, and the
For example, the second moment about the mean is
g ( xi 2 x ) 2
following times to completion (in seconds) were recorded:
Programs
1
m2 5
n
12.592 14.152 12.396 6.801 Certain moments about the mean are used to define the coeffi-
13.646 12.075 15.377 7.602 cient of skewness (g1) and the coefficient of kurtosis (g2):
m3 m4
a. Find the sample variance and the sample standard deviation. g1 5 g2 5
b. Multiply each observation by 7. Find the sample variance m3/2
2 m22
and the sample standard deviation for this modified data set. The statistic g1 is a measure of the lack of symmetry, and g2 is a
c. How do your answers to parts (a) and (b) compare? measure of the extent of the peak in a distribution.
d. Suppose a data set (x’s) has variance s2x and standard devia- Use technology to compute the values g1 and g2 for various
tion sx. A new (transformed) data set is created using the distributions: skewed, symmetric, unimodal, uniform. Use your
equation yi 5 ax, where a is a constant. How are the vari- results to determine the values of g1 that suggest more skewness
ance and standard deviation of the new data set (s2y and sy) in the distribution, and the values of g2 that indicate a flatter,
related to s2x and sx? more uniform distribution.
98 C h apter 3 Numerical Summary Measures
Chebyshev’s Rule
What happens if k 5 1? Let k . 1. For any set of observations, the proportion of observations within k stan-
dard deviations of the mean [lying in the interval ( x 2 ks, x 1 ks ) , where s is the standard
deviation] is at least 1 2 (1 / k 2 ).
Recall interval notation: (a, b) The diagram in Figure 3.26, and the accompanying table, illustrate this idea. For any
denotes an open interval, with the set of observations, the smoothed histogram shows that the proportion of observations
endpoints not included, from a to captured in the interval ( x 2 ks, x 1 ks ) is at least 1 2 (1 / k 2 ). For example, the propor-
b. Therefore, ( x 2 ks, x 1 ks ) tion of observations within 1.5 standard deviations of the mean is at least 0.56 (or 56%).
means the set of all x’s such that The proportion of observations within 3 standard deviations of the mean is at least 0.89
x 2 ks , x , x 1 ks. (or 89%).
1
k 12 2
k
1
1.5 12 < 0.56
1.52
1
2.0 12 < 0.75
2.02
Recall: In smoothed histograms,
the area under the curve between 1
1 2.3 12 < 0.81
two points a and b corresponds to At least 1
k2
2.32
the proportion of observations 1
between a and b. 3.0 12 < 0.89
x ks x x ks
3.02
A Closer L ok
A symmetric interval about the 1. Chebyshev’s rule simply helps to describe a set of observations using symmetric
mean is centered at the mean and intervals about the mean. If we move k standard deviations from the mean in both
has endpoints that are the same directions, then the proportion of observations captured is at least 1 2 (1 / k 2 ).
distance from the mean. 2. The total area under the curve (the sum of all the proportions) is 1. Hence, Chebyshev’s
rule also implies that the proportion of observations in the tails of the distribution,
outside the interval ( x 2 k s, x 1 k s ) , is at most 1 / k 2.
3. As indicated in the statement of Chebyshev’s rule and as suggested in the table, you
may use any value of k greater than 1, including decimals. The two most common
values for k are k 5 2 and k 5 3. The actual proportions of observations within 2 and
within 3 standard deviations can be compared to the values predicted by Chebyshev’s
rule and the empirical rule (page 100). In addition, k 5 2 and k 5 3 provide the funda-
mental background to statistical inference.
4. Chebyshev’s rule is very conservative because it applies to any set of observations.
Usually, the proportion of observations within k standard deviations of the mean is
bigger than 1 2 (1 / k 2 ).
3.3 The Empirical Rule and Measures of Relative Standing 99
SOLUTION TRAIL
5. Chebyshev’s rule may also be used to describe a population. If the mean and standard devia-
tion are known, then m and s may be used in place of x and s. For any population, the pro-
portion of observations that lie in the interval (m 2 ks, m 1 ks) is at least 1 2 (1 / k 2 ).
b. Find the approximate proportion of observations less than 1.85 or greater than 4.85 minutes.
Solution
No values of k are specified, so use k 5 2 and k 5 3.
a. ( x 2 2s, x 1 2s ) 5 ( 3.35 2 2 ( 0.5 ) , 3.35 1 2 ( 0.5 )) 5 ( 2.35, 4.35 ) k52
At least 1 2 (1 / 4) 5 3 / 4 (or 75%) of the observations lie between 2.35 and 4.35 minutes.
b. ( x 2 3s, x 1 3s ) 5 ( 3.35 2 3 ( 0.5 ) , 3.35 1 3 ( 0.5 )) 5 ( 1.85, 4.85 ) k53
At least 1 2 (1 / 9) 5 8 / 9 (or 89%) of the observations lie between 1.85 and 4.85 minutes.
At most 1 / 9 (or 11%) of the observations are less than 1.85 or greater than 4.85 minutes.
c. Since Chebyshev’s rule measures intervals in terms of the number of standard
deviations from x, find out how far 5 is from x in standard deviations.
x 1 ks 5 3.35 1 k ( 0.5 ) 5 5 1 k 5 3.3
We cannot assume anything about the shape of the distribution.
1 1
12 2
512 < 0.91
k 3.32
At least 0.91 (or 91%) of the observations lie in the interval
( x 2 3.3s, x 1 3.3s ) 5 ( 3.35 2 3.3 ( 0.5 ) , 3.35 1 3.3 ( 0.5 ) ) 5 ( 1.7, 5.0 )
100 C h apter 3 Numerical Summary Measures
Therefore, at most 1 2 0.91 5 0.09 (or 9%) of the observations are outside this inter-
val, either less than 1.7 or greater than 5.0 minutes. We cannot assume that the distribu-
tion is symmetric, so we do not know what part of the 9% is less than 1.7 and what part
is more than 5.0 minutes. To be conservative, the best we can say is that at most 9% of
the observations are more than 5 minutes long.
Figure 3.27 illustrates the empirical rule, the symmetric intervals about the mean, and
the proportions. The empirical rule conclusions are more accurate than Chebyshev’s rule
because we know (assume) more about the shape of the distribution (normality).
x 1s x x 1s x 2s x x 2s x 3s x x 3s
Figure 3.27 Symmetric intervals and proportions associated with the empirical rule.
A Closer L ok
For now, the reasons for the 1. Given a set of observations, the empirical rule may be used to check normality. To test
proportions 0.68, 0.95, and 0.997 for normality numerically, find the mean, standard deviation, and the three symmetric
remain a mystery. We will discover intervals about the mean ( x 2 ks, x 1 ks ) , k 5 1, 2, 3. Compute the actual propor-
where these numbers come from tion of observations in each interval. If the actual proportions are close to 0.68, 0.95,
in Chapter 6. and 0.997, then normality seems reasonable. Otherwise, there is evidence to suggest
the shape of the distribution is not normal. This process is sort of a backward empiri-
cal rule.
2. The empirical rule may also be used to describe a population. If the distribution of the
population is approximately normal, and the mean and standard deviation are known,
then m and s may be used in place of x and s.
3. The proportion of observations beyond three standard deviations from the mean is
1 2 0.997 5 0.003 (pretty small). Therefore, if the shape of a (population) distribution
is approximately normal, it would be unusual to have an observation more than three
standard deviations from the mean. What if there is one? (See Example 3.14.)
SOLUTION TRAIL
1 0.997 0.003
0.95 0.997
55 80 105 130 155 180 205 55 80 105 130 155 180 205
c. Because a normal distribution is symmetric about the mean, the remaining proportion
outside three standard deviations from the mean (1 2 0.997 5 0.003) is divided evenly
between the two tails. Therefore, approximately 0.003 / 2 5 0.0015 (or 0.15%) of the
observations are greater than 205. See Figure 3.30.
d. (105, 180) is not a symmetric interval about the mean. However, approximately 0.68
of the observations lie in the interval (105, 155) (one standard deviation from the
102 C h apter 3 Numerical Summary Measures
mean). Approximately 0.95 of the observations lie in the interval (80, 180) (two
standard deviations from the mean). This means that 0.95 2 0.68 5 0.27 of the
observations lie in the intervals (80, 105) and (155, 180). Because a normal distribution
is symmetric, 0.27 / 2 5 0.135 of the observations lie between 155 and 180. Therefore,
a total of approximately 0.68 1 0.135 5 0.815 (or 81.5%) of the observations lie
between 105 and 180. See Figure 3.31.
55 80 105 130 155 180 205 55 80 105 130 155 180 205
Figure 3.30 Approximately 0.0015 Figure 3.31 Approximately 0.27 (or 27%)
(or 0.15%) of the observations are greater of the observations lie in the intervals (80,
SOLUTION TRAIL
than 205. 105) and (155,180).
Translat i on Solution
n Draw a conclusion. Step 1 Because the shape of the (population) distribution is approximately normal, the
Conc epts empirical rule applies (using m 5 8 and s 5 0.2).
n Empirical rule a. Approximately 0.68 of the population lies in the interval (7.8, 8.2).
n Inference procedure. b. Approximately 0.95 of the population lies in the interval (7.6, 8.4).
c. Approximately 0.997 of the population lies in the interval (7.4, 8.6).
Vi sion
Step 2 The observation x 5 7 hours lies outside the largest interval (7.4, 8.6). Only
Because the distribution of
pain-relief times is
1 2 0.997 5 0.003 of the population lies outside this interval. More precisely
approximately normal, the (because of symmetry), only 0.003 / 2 5 0.0015 of the population lies below 7.4.
empirical rule may be used to Seven hours is a very rare observation. Two things may have occurred.
determine how often observed a. Seven hours is an incredibly lucky observation. Even though the proportion of
times in certain intervals occur. observations below 7.4 is small, it is still possible for the manufacturer’s claim
If the observed pain-relief time to be true and for the pain reliever to last only 7 hours in this patient.
is rare, then we should question b. The manufacturer’s claim is false. Because an observation of 7 hours is so
the manufacturer’s claim.
rare, it is more likely that one of the assumptions is wrong. The shape of the
distribution may not be normal, the mean may be different from 8, and/or the
standard deviation might be different from 0.2.
Step 3 Typically, statistical inference discounts the lucky alternative. Therefore, because
7 hours is such an unlikely observation, there is evidence to suggest the manu-
facturer’s claim is false. Something is awry. We would rarely see pain relief of
only 7 hours if the claim is true.
3.3 The Empirical Rule and Measures of Relative Standing 103
Note: We may be too quick to make an inference based on only a single observation.
We will learn how to use more observations (information) to reach a more confident
conclusion.
One method for comparing observations from different samples (with different units)
is to use a standardized score. For a given observation, this relative measure is used to
determine the distance from the mean in standard deviations.
Definition
Suppose x1, x2, . . . , xn is a set of n observations with mean x and standard deviation s. The
z-score corresponding to the ith observation, xi, is given by
xi 2 x
zi 5 (3.7)
s
zi is a measure associated with xi that indicates the distance from x in standard deviations.
A Closer L ok
Statisticians tend to measure 1. zi may be positive or negative (or zero). A positive z-score indicates the observation is
distances in standard deviations, to the right of the mean. A negative z-score indicates the observation is to the left of the
not miles, feet, inches, or meters. mean.
We often ask, “How many 2. A z-score is a measure of relative standing; it indicates where an observation lies in
standard deviations from the mean relation to the rest of the values in the data set. There are other methods of standardiza-
is a given observation?”
3. Given a set of n observations, the sum of all the z-scores is 0; g zi 5 0. Can you
tion, but this is the most common.
SOLUTION TRAIL
prove this?
Definition
Let x1, x2, . . . , xn be a set of observations. The percentiles divide the data set into 100
parts. For any integer r (0 , r , 100), the rth percentile, denoted pr, is a value such that
r percent of the observations lie at or below pr (and 100 2 r percent lie above pr).
0.75 0.25
P75
The rth percentile has the same units as the observations, not a percent. Figure 3.32
Figure 3.32 The 75th percentile shows a smoothed histogram and illustrates the location of the 75th percentile on the
is illustrated using a smoothed measurement axis.
histogram.
Solution
Here, 3000 is a single observation from the population of number of campsites used per
day and percentiles divide these observations into 100 parts. Because 3000 lies at the 75th
percentile, 75% of the days had 3000 or fewer campsites used, and on 100 2 75 5 25%
of the days, more than 3000 campsites were utilized.
Note: We do not know anything about the shape of the distribution, nor do we know the
mean or standard deviation. There is no way of telling how far 3000 is from the mean in
standard deviations.
44 51 43 31 50 53 59 49 55 25
Solution Trail 3.18 28 30 60 42 36 54 31 33 48 39
K e y w ords 37 44 48 51 34 38 58 59 53 59
n Find the time at which it took
20% of the walkers to make it Find the time at which it took 20% of the walkers to make it across the bridge.
across the bridge.
Transl at i o n Solution
n Find the time, t, such that 20%
of all times are less than or Step 1 Order the data from smallest to largest. A portion of this ordered list is given in
equal to t. the following table:
C once p ts
Observation 25 28 30 31 31 33 34 36 37 38
n Percentiles.
Position 1 2 3 4 5 6 7 8 9 10
V isi on
Find p20 (in minutes) so that 20% Step 2 Compute
of the observations lie below and n#r 30 # 20
80% lie above. Follow the steps d20 5 5 56
100 100
for computing percentiles.
Step 3 Because d20 is a whole number, add 0.5. The depth of p20 is d20 1 0.5 5 6 1 0.5 5 6.5.
106 C h apter 3 Numerical Summary Measures
Step 4 The 20th percentile, p20, is the mean of the sixth and seventh observations.
1
p20 5 ( 33 1 34 ) 5 33.5
2
Figure 3.33 shows a technology solution.
Step 5 Twenty percent of the walkers made it across the bridge before 33.5 minutes, and
80% made it after 33.5 minutes.
Technology Corner
Procedure: Compute the rth percentile.
Reconsider: Example 3.18, solution, and interpretation. The TI-84 does not have a built-in function to compute the rth
percentile.
CrunchIt!
CrunchIt! has a built-in function to find certain descriptive statistics, including the rth percentile.
1. Enter the data into a column.
2. Select Statistics; Descriptive Statistics. Choose the appropriate column and enter the desired value of r. Click the Calcu-
late button. Refer to Figure 3.33.
Minitab
The Minitab calculator function Percentile can be used to compute the rth percentile in a Session Window.
1. Enter the data into column C1.
2. In the Session Window, compute and print the appropriate percentile (Figure 3.34).
Excel
1. Enter the data into column A.
2. Use the function PERCENTILE.EXC to compute the 20th percentile. See Figure 3.35.
During the 2012 tournament, the team headed by Sue Vermillion a. Approximately what proportion of observations is
caught a 638-pound fish, a weight in the 85th percentile of all between 23.8 and 27.4 inches?
blue marlin caught. Interpret this value. b. Approximately what proportion of observations is
between 22.9 and 26.5 inches?
3.83 Manufacturing and Product Development There
c. Approximately what proportion of observations is less
are approximately 3 million parts in a Boeing 777, and
than 27.4 inches?
suppliers are all over the world. It takes considerable coordina-
tion and organization to assemble this aircraft. From the time 3.87 Education and Child Development The Iowa Test of
the first part is moved from the factory to delivery of an Basic Skills (ITBS) is a multiple-choice exam given to students
aircraft, the mean time to assemble a Boeing 777 is 83 days.26 in various grades in each state each year. The purpose is to test
Suppose the standard deviation is 6 days. fundamental skills in reading, mathematics, language, social
a. What values are one standard deviation away from the studies, and science. Scores are reported as state and/or national
mean? What values are two standard deviations away percentile points. Results from the 2010–2011 academic year
from the mean? indicate that students in grades 7–9 at Carlisle High School in
b. Without assuming anything about the shape of the Arkansas scored at the 53rd and 69th (national) percentiles in
distribution of times, approximately what proportion of mathematics and science, respectively.27
assembly times is between 71 and 95 days? a. Interpret these values.
c. Without assuming anything about the shape of the b. The 50th percentile (in any subject area) is the national
distribution of times, approximately what proportion average. Explain the meaning of average in this context.
of assembly times is either less than 65 or greater than c. Suppose a seventh grader scored at the 99th percentile
101 days? (nationally) in mathematics. Interpret this result.
d. Assuming the distribution of times is normal, what
3.88 Travel and Transportation Bicycle delivery services
proportion of assembly times is between 71 and 95?
Either less than 65 or greater than 101? are utilized in many metropolitan areas because they can
provide rush deliveries and are not subject to traffic jams or
3.84 Biology and Environmental Science The Common- parking restrictions. Suppose an architectural firm would like
wealth of Pennsylvania is concerned about the dwindling to evaluate two bicycle delivery services in New York City.
number of family-owned farms and the number of smaller, The first service has a mean and standard deviation for
less efficient farms. For a random sample, the total acreage of delivery (in minutes) of 37 and 5. The second service has a
each farm was recorded. The mean was 1125 acres, with a mean of 42 with a standard deviation of 7. The company sent
standard deviation of 250. The shape of the distribution of two test
SOLUTION TRAIL packages to the same location, one with each
areas is not normal. delivery service. The times to delivery were 33 and 35
a. Approximately what proportion of areas is between 625 minutes, respectively. Use z-scores to determine which
and 1625 acres? service performed better.
b. Approximately what proportion of areas is between 375
and 1875 acres? 3.89 Business and Management The Green Mill
c. Approximately what proportion of areas is less than Restaurant and Bar in Wausau, Wisconsin, is advertising
375 acres? quick lunches with a mean waiting time of 11 minutes and a
d. Approximately what proportion of areas is between 750 standard deviation of 2.5 minutes. The general manager (Rob
and 1500 acres? Meyer, a former statistician) also claims that the distribution of
waiting times is approximately normal.
3.85 Physical Sciences During the spring, many rivers are a. Suppose your waiting time is 13 minutes. Is there any
monitored very carefully so as to be able to warn residents of reason to believe the general manager’s claim is false?
an impending flood. The depth (in feet) of the Susquehanna (Use a z-score.) Write a Solution Trail for this problem.
River at the Bloomsburg bridge is measured and reported daily. b. Suppose your waiting time is 20 minutes. Now, is there
In a random sample of depths, x 5 16.7, s 5 2.1, and the shape any evidence to refute the general manager’s claim?
of the distribution is approximately normal.
a. Approximately what proportion of depths is between 3.90 Marketing and Consumer Behavior The time spent
14.6 and 18.8 feet? in a grocery store is an important issue for shoppers and for
b. Approximately what proportion of depths is less than companies trying to market new products. Men tend to spend
14.6 feet? less time in a grocery store than women, and people spend
c. Approximately what proportion of depths is between more time in the store on weekends. A random sample of
14.6 and 23 feet? shoppers at a local grocery store was obtained and the shopping
time (in minutes) for each was recorded. GROCERY
3.86 Biology and Environmental Science Many farmers a. Construct a histogram for these data.
use the height of their corn on July 4th as an indication of the b. Use your histogram in part (a) to approximate the
entire crop. In a random sample of corn-stalk heights on July following percentiles: (i) 45th, (ii) 80th, (iii) 10th.
4th in Columbia County, x 5 25.6, s 5 0.9 inches, and a c. Compute the exact percentiles in part (b) and compare
histogram of the observations is bell-shaped. your results.
3.4 Five-Number Summary and Box Plots 109
Definition
The five-number summary for a set of n observations x1, x2, . . . , xn consists of the mini-
mum value, the maximum value, the first and third quartiles, and the median.
Recall: The range of a data set is
the largest observation (maximum
value) minus the smallest These five numbers do provide a glimpse of symmetry, central tendency, and variabil-
observation (minimum value). This ity in a data set. For example, minimum and maximum values that are very far apart sug-
descriptive statistic was our first gest lots of variability. If the median is approximately halfway between the minimum and
attempt at measuring variability in maximum values and approximately halfway between the first and third quartiles, that
a data set. suggests the distribution is symmetric. A box plot is constructed as described below.
110 C h apter 3 Numerical Summary Measures
Figure 3.36 illustrates this step-by-step procedure and shows a standard box plot with
the five numbers indicated on a measurement axis. Note that the length of the box is the
interquartile range. The box contains the middle half of the values.
The position of the vertical line in the box (median) and the lengths of the horizontal
lines (whiskers) indicate symmetry or skewness, and variability. Figure 3.37 shows a
standard box plot for a distribution of data that is skewed to the right. The lower half of
the data is in the interval from 3 to 4.5, while the upper half of the data is much more
spread out, from 4.5 to 11. Figure 3.38 shows a standard box plot for a fairly symmetric
distribution with lots of variability. The lower and upper half of the data are evenly distrib-
uted, but the whiskers extend far from each edge of the box. That is, 25% of the data are
between 0 and 4, and 25% are between 7 and 11.
0 1 2 3 4 5 6 7 8 9 10 11 12 1 0 1 2 3 4 5 6 7 8 9 10 11 12
Figure 3.37 Standard box plot for Figure 3.38 Standard box plot for a
data skewed to the right. symmetric distribution.
Solution
Step 1 Find the five-number summary:
Step 5 The box plot suggests the data are negatively skewed, or skewed to the left. The
lower half of the data is much more spread out than the upper half. A technology
solution is shown in Figure 3.40.
Figure 3.40 JMP quantile
box plot. A Closer L ok
1. A box plot has only one measurement axis, and it may be horizontal or vertical. Many soft-
ware packages, including CrunchIt! and Minitab, draw box plots with a vertical measure-
ment axis by default. The construction and interpretations are the same. The Transpose
option in Minitab produces a box plot with a horizontal measurement axis.
2. The software does not usually include tick marks on the measurement axis for the five-
number summary. The tick marks and scale are selected simply for convenience.
There are some disadvantages to using a standard box plot based on the five-number
summary to describe a data set. By examining the graph, there is no way of knowing how
many observations are between each quartile and the extreme. Each whisker is drawn
from the quartile to the extreme, regardless of the number of observations in between. In
addition, there are no provisions for identifying outliers. A standard (graphical) technique
for distinguishing outliers is important because these values play an important role in
statistical inference. Therefore, many statisticians prefer to use a modified box plot to
describe a data set graphically. This type of graph still conveys information about center,
variability, symmetry, and skewness, but it is also more precise and plots outliers.
Any observations outside the outer fences (less than OFL, or greater than OFH) are
classified as extreme outliers and are plotted separately with open circles. Note: Some
statistical packages will use other symbols for outliers and may not distinguish mild
and extreme outliers.
Figure 3.41 shows the relationship between construction points for a modified box plot
and the location of any outliers.
extreme outliers mild outliers mild outliers extreme outliers
3(IQR) 3(IQR)
1.5(IQR) 1.5(IQR)
IQR
0.7 1.7 0.8 2.7 1.4 0.8 2.6 3.5 1.3 0.6
9.5 2.6 2.7 11.9 3.1 7.9 0.6 6.1 5.2 1.9
2.2 3.7 0.5 1.1 1.4 3.1 0.1 2.6 4.3 4.5
Solution
Winterdance Dogsled Tours
Step 1 Find the quartiles, the median, and the interquartile range.
Stepped
STEPPED Tutorial
TUTORIALS Figure 3.43 Modified box plot for the
box plots
BOX PLOTS sled dog trip data. 0 2 4 6 8 10 12
Note: IFL and OFL are negative even though an observed trip time cannot be less
than 0 hours. That’s OK. This is a correct statistical calculation, not a contradiction,
even though it seems odd. Figure 3.42 shows a technology solution.
A Closer L ok
When we compare two (or more) data sets graphically, the corresponding box plots may
be placed on the same measurement axis (one above the other using a horizontal axis, or
side-by-side with a vertical axis). Figure 3.44 shows three box plots on the same measure-
Figure 3.44 TI-84 Plus C box ment axis, representing the number of gallons of gasoline pumped in randomly selected
plots for gasoline data. vehicles at three different stations.
Technology
VIDEOCorner
TECH MANUALS VIDEO Tech
Video TECH Manuals
MANUALS
EXEL DISCRIPTIVE EXELplots
Box DISCRIPTIVE
Procedure: Construct a box plot.
Reconsider: Example 3.20, solution, and interpretations.
CrunchIt!
CrunchIt! has a built-in function to construct a box plot.
1. Enter the data into a column.
2. Select Graphics; Box Plot. Choose the appropriate column, optionally enter a Title, X Label, and Y Label, and click the
calculate button. The resulting box plot is shown in Figure 3.45.
TI-84 Plus C
The TI-84 Plus has two built-in statistical plots, a standard and a modified box plot. The modified box plot does not
distinguish between mild and extreme outliers.
1. Enter the data into list L1 .
2. Press STATPLOT and select Plot1 from the STATPLOTS menu.
3. Turn the plot On and select Type box plot (modified or standard). Set Xlist to the name of the list containing the
data and Freq to 1. Choose a Mark for outliers (if constructing a modified box plot) and select a Color.
4. Enter appropriate WINDOW settings. Press GRAPH to display the box plot. A modified box plot is shown in
Figure 3.42.
114 C h apter 3 Numerical Summary Measures
Minitab
The Minitab modified box plot does not distinguish between mild and extreme outliers.
1. Enter the data into column C1.
2. Select Graph; Boxplot and choose a One Y; Simple box plot.
3. Enter C1 in the Graph variables window. Select the Scale options button and check Transpose value and category scales
to construct a box plot with a horizontal measurement axis.
4. Select the Data view options button and check Interquartile range box, and Outlier symbols for a modified box plot.
Note there are only two outliers in this box plot due to the numerical method Minitab uses to compute quartiles. See
Figure 3.46.
Excel
The following steps may be used to construct a standard box plot. Additional calculations and options are necessary to
construct a modified box plot.
1. Enter the data into column A.
2. Find the five-number summary in the order shown. Highlight these five cells.
3. Under the Insert tab, select Insert Line Chart; 2-D Line; Line with Markers.
4. Under Chart tools; Design, select Switch Row/Column.
5. Right-click on the point representing the minimum value. Select Format Data Series; Line and choose No line. Repeat
this process for each point.
6. Select any point on the graph. Select Add Chart Element; High-Low Lines and Add Chart Element; Up/Down Bars;
Up/Down Bars.
7. Format other graph items as appropriate. See Figure 3.47.
Practice 2 1 0 1 2 3 4 5
3.102 For each data set with Q1 and Q3 given, find the
interquartile range and the inner and outer fences. 80 70 60 50 40 30 20
a. Q1 5 22, Q3 5 46
b. Q1 5 1255, Q3 5 1306 Applications
c. Q1 5 65.75, Q3 5 75.21
3.105 Business and Management A Roth’s supermarket in
d. Q1 5 914.9, Q3 5 1140.5
Salem, Oregon, is using a new statistical tool to help in
e. Q1 5 1.275, Q3 5 4.07
ordering bottles of raspberry iced tea. A random sample of the
f. Q1 5 0.265, Q3 5 2.51
number of bottles sold per day is given in the following table.
g. Q1 5 233.67, Q3 5 223.90
Construct a modified box plot for these data. Describe the
h. Q1 5 98.43, Q3 5 98.81
distribution of the number of bottles sold. BOTTLES
3.107 Public Policy and Political Science A recent study Teen Center. The 2012 monthly commission checks
reported the school tax bills (in dollars) for randomly selected (in dollars) associated with these machines are given in the
families in Hillsboro, New Hampshire, a small rural commu- following table: VENDING
nity. Use the following modified box plot to describe the
distribution of the data. 51.24 27.60 37.80 44.76 24.36 14.04
119.52 123.48 38.48 35.28 16.92 47.25
25.65 31.95 40.50 22.50 33.75 59.85
36.90 18.45 31.95 18.45 19.50 22.65
1300 1400 1500 1600 1700 1800 1900 a. Construct a modified box plot for these data. Describe the
distribution in terms of symmetry, skewness, variability,
3.108 Education and Child Development Dr. Jan Remer, and outliers.
a psychologist in Hermitage, Tennessee, randomly selected b. Construct a standard box plot for these data. Which plot
six-year-olds and recorded the time (in minutes) each child do you think is more descriptive? Why?
needed to complete a 20-piece picture puzzle. The data were
3.112 Biology and Environmental Science Several
used to predict readiness for first grade. The following standard
weather centers across the country carefully track
box plot for the data was drawn. Describe the distribution of
and record data for tropical disturbances during the
completion times.
hurricane season. The following table from the 2012
Atlantic hurricane season contains the minimum central
pressure (in millibars) for each named tropical storm and
hurricane.35 STORMS
1.0 1.5 2 2.5 3 3.5 4 4.5 5.0 998 992 987 990 980 1000 965
1004 968 1006 970 968 964 978
3.109 Public Health and Nutrition As part of a new 997 1005 969 940 1000
physical fitness program, the Crooked Oak Middle School in
Oklahoma City, Oklahoma, records the number of sit-ups each Construct a modified box plot for the hurricane data. Describe
sixth-grade student can do in one minute. A random sample for the distribution in terms of symmetry, skewness, variability, and
males and females was obtained and the following modified box outliers. How would the graph change if a standard box plot
plots were drawn. Describe the male and female data separately. were used?
What similarities and/or differences do the box plots suggest?
Extended Applications
Males
3.113 Public Policy and Political Science The amber-light
time at an intersection usually varies according to the speed
limit.36 A random sample of amber-light times (in seconds) at
Females
intersections with different speed limits is given below.
Construct a modified box plot for each set of data on the same
measurement axis. Describe and compare the distributions. Do
15 20 25 30 35 40 45 50 55
the box plots suggest that amber-light times are longer at
3.110 Economics and Finance Many government program intersections with a higher speed limit? AMBERLT
budgets are determined by the annual inflation rate. The
inflation rate (as a percent change in the consumer price index 30 mph
since 1913) for the United States for the years 1965–2011 are 3.7 3.7 3.4 2.2 3.4 3.6 3.5 3.8 2.6 2.1
given on the text website.33 Construct a modified box plot for 3.1 3.9 4.3 1.8 3.0 1.6 4.5 2.7 2.2 2.4
these data. Describe the distribution of inflation rate over the
past 40 years in terms of symmetry, skewness, variability, and 50 mph
outliers. INFRATE 4.1 4.7 4.3 4.1 4.8 5.4 2.9 3.9 2.1 6.9
3.111 Public Policy and Political Science According 4.6 3.3 6.6 5.0 5.5 3.6 3.7 5.9
to US Vending Company, the national average commission
per machine per month is approximately $35.34 Suppose 3.114 Business and Management In an attempt
the Culver City, California, City Council recently entered to control costs, a manager at San José State University is
into an agreement with Coca-Cola Bottling Company. carefully looking at the number of photocopies made by
As part of this contract, two US Vending snack machines faculty members at the campus duplicating center.
were installed, one at Veteran’s Park and the other at the Random samples of liberal arts faculty and of natural
Chapter 3 Summary 117
Chapter 3 Summary
Concept Page Notation / Formula / Description
g ( xi 2 x ) 2 5 c g x2i 2 ( g xi ) 2 d
1 1 1
Sample variance 87 s2 5
n21 n21 n
118 C h apter 3 Numerical Summary Measures
Sample standard deviation 87 s 5 "s2: the positive square root of the sample variance.
Population variance 88 s2: the variance for an entire population.
Population standard deviation 88 s 5 "s2: the positive square root of the population variance.
Quartiles 90 The quartiles divide the data into four parts. Q1 is the first quartile and Q3 is the
third quartile. Q2 is the median.
Interquartile range 90 IQR 5 Q3 2 Q1
Chebyshev’s rule 98 For any set of observations, the proportion of observations within k standard
1
deviations of the mean is at least 1 2 2.
k
Empirical rule 100 If a distribution is approximately normal, the proportion of observations within
one, two, and three standard deviations about the mean is approximately 0.68,
0.95, and 0.997, respectively.
xi 2 x
z-score 103 zi 5 , how far an observation is from the mean in standard deviations.
s
Percentiles 104 The percentiles divide a data set into 100 parts.
Five-number summary 109 xmin, Q1, ~x , Q3, xmax
Box plot 110 A graphical description of a data set, constructed using the five-number
summary. The graph conveys information about central tendency, symmetry,
skewness, and variability.
Modified box plot 111 A graphical description of a data set, constructed using ~x , Q1, Q3, IQR, and the
inner and outer fences. This box plot also indicates any outliers.
Chapter 3 Exercises
a. Find the median, the first and third quartiles, and the cups of coffee) is a moderate intake.41 The amount of
interquartile range. caffeine in a cup of coffee varies according to coffee bean,
b. Find the 30th and the 95th percentiles. brewing technique, filter, etc. A random sample of eight-
c. Suppose Dallas currently has 357 episodes (as of May ounce cups of coffee was obtained and the caffeine content
2014). Using the data in the table, in what percentile does (in milligrams) was measured. The data are given in the
this episode count lie? following table: CAFFEIN
to measure central tendency justified (or necessary) in this a. Find the mean, variance, and standard deviation.
case? Why or why not? b. Construct a modified box plot for these data. Classify any
outliers as mild or extreme.
3.124 Medicine and Clinical Studies Although caffeine is c. Using the data in the table, in what percentile does
believed to be safe in moderate amounts, some health experts 400 lux lie?
suggest that 300 mg of caffeine (the amount in about three d. Use Chebyshev’s rule to describe this data set (k 5 2, 3).
120 C h apter 3 Numerical Summary Measures
3.128 Manufacturing and Product Development The a. Without assuming anything about the shape of the distri-
density of tires is an important selling point for serious bution of M&M weights, what proportion of M&Ms have
mountain bike riders. The tire industry uses a type A durometer weights between 0.83 and 0.99 gram?
to measure the indentation hardness for mountain bike tires. b. Suppose a random M&M has weight 0.74 gram. Do you
Suppose the distribution of tire hardness is approximately believe the manufacturer’s claim about the mean weight?
normal, with mean 45 and standard deviation 7. Justify your answer.
a. Carefully sketch the normal curve for tire hardness.
b. Is a tire hardness of 30 unusually soft? Justify your answer. 3.133 Manufacturing and Product Development The
c. A certain bicycle shop claims the hardness of all its tires actual width of a 2 3 4 piece of lumber is approximately
is in the 84th percentile. If this is true, what is the mini- 134 inches but can vary considerably. The Lumber Yard in
mum hardness of any tire in the store? Martinsburg, West Virginia, advertises consistent dimensions
for better building, and claims all 2 3 4s sold have a mean
3.129 Physical Sciences There were approximately width of 134 inches with a standard deviation of 0.02 inch.
17,000 earthquakes around the world in 2012.43 A random a. Assume the distribution of widths is approximately
sample of the magnitudes (on the Richter scale) of these normal. Find a symmetric interval about the mean that
earthquakes during November is given on the text website. contains almost all of the 2 3 4 widths.
QUAKES b. Suppose a random 2 3 4 has width 1.79 inches. Is there
a. Find the mean, median, variance, and standard deviation any evidence to suggest The Lumber Yard’s claim is
of the magnitudes. wrong? Justify your answer.
b. Find the 40th and the 80th percentiles. c. Suppose a random 2 3 4 has width 1.68 inches. Is there
c. How likely is a magnitude of 4.8? Justify your answer. any evidence to suggest The Lumber Yard’s claim is
wrong? Justify your answer.
EXTENDED APPLICATIONS
3.134 Biology and Environmental Science Some fish have
3.130 Physical Sciences Hydraulic fracturing, or fracking, been found to have mercury levels greater than 1 ppm (parts
is a method used to extract natural gas from deep shale per million), a level considered safe by the U.S. Food and Drug
deposits. This process involves over 500 chemicals and Administration. Suppose the mean mercury level for small-
millions of gallons of water. In a random sample of fracking mouth bass in the Susquehanna River is 0.7 ppm with standard
wells, the mean depth was 8000 feet.44 Assume the standard deviation 0.1 ppm, and the distribution of mercury level is
deviation is 450 feet and the distribution of depths is approxi- approximately normal.
mately normal. a. Is it likely a fisherman will catch a smallmouth bass with
a. What proportion of wells have depths between 7100 and mercury level greater than 1 ppm? Justify your answer.
8900 feet? b. Suppose the standard deviation is 0.05 ppm. Now, is it
b. What proportion of wells have depths less than 6650 feet? likely a fisherman will catch a smallmouth bass with
c. What proportion of wells have depths between 7550 and mercury level greater than 1 ppm? Justify your answer.
9350 feet? c. Carefully sketch the normal curves for parts (a) and (b)
d. Suppose a new fracking well was drilled in 2012 to a on the same measurement axis.
depth of 8255. Is there any evidence to suggest that the
mean depth of wells has changed? 3.135 Sports and Leisure The longest-running
Broadway show, with over 10,000 performances, is Phantom
3.131 Physical Sciences A building code officer inspected
of the Opera, surpassing Cats in 2006. A sample of Broadway
random home fire extinguishers for pressure (in psi), and the shows was obtained, and the number of performances
data are given on the text website. FIREX
of each was recorded.46 SHOWS
a. Construct a modified box plot for these data.
a. Find the sample mean and the sample median number of
b. Use the empirical rule to decide whether this distribution
performances. What do these values suggest about the
of pressures is approximately normal. shape of the distribution?
c. Create a new set of observations, yi 5 ln ( xi ) , where ln is
b. Find the sample variance and the sample standard deviation.
the natural logarithm function. Construct a modified box Find the proportion of observations within one standard
plot for this new set of data. Use this graph and the deviation of the mean, within two standard deviations of
empirical rule to decide whether the distribution of the the mean, and within three standard deviations of the mean.
transformed data is approximately normal. What do these proportions suggest about the shape of the
3.132 Manufacturing and Product Development distribution?
America’s favorite candy is M&Ms, with over $670 million c. Find the first quartile, the third quartile, and the interquar-
in annual sales. M&Ms were originally sold in tubes and tile range. Construct a modified box plot for the perfor-
are now available in several versions and can even be mance data. Use this graph to describe the distribution
personalized.45 Suppose the manufacturer (Mars) claims the in terms of symmetry, skewness, variability, and outliers.
mean weight of a single M&M is 0.91 gram with standard Does your description based on the box plot agree with
deviation 0.04 gram. your answers to parts (a) and (b)? Why or why not?
Chapter 3 Exercises 121
d. Find out how many performances there have been for carloads of grain mill products for 30 randomly selected weeks
Phantom of the Opera and add this value to the data in 2011 and 2012. RAILWAY
set. How will this value affect the sample mean, sample
median, sample variance, and quartiles? Find these values 572 711 582 663 612 577 650 550 590 659
and verify your predictions. 610 611 557 685 683 629 626 637 634 723
718 707 673 697 808 755 438 569 684 637
a. Compute the summary statistics for these data, including the
LAST STEP
mean, median, variance, standard deviation and quartiles.
3.136 How efficient is the Canadian Pacific b. Construct a box plot for these data.
Railway? To increase efficiency, officials at the c. Describe the distribution. Identify any outliers.
Canadian Pacific Railway monitor several variables including d. Find the proportion of observations within one, two,
train speed, cars on each train, and terminal dwell time. In and three standard deviations of the mean. Compare the
addition, the type and amount of freight is carefully recorded results to the empirical rule.
for each train. The following table shows the number of e. Find the 90th percentile.
4 Probability
Looking Back
n Understand the relationships among a population, a sample, probability, and statistics.
Looking Forward
n Learn the definition of probability and useful probability concepts.
n Compute the probability of various events involving counting techniques, independence,
or conditional probability.
Contents
4.1 Experiments, Sample Spaces, and Events
4.2 An Introduction to Probability
4.3 Counting Techniques
4.4 Conditional Probability
4.5 Independence
123
124 C hapter 4 Probability
Definition
An experiment is an activity in which there are at least two possible outcomes and the
result of the activity cannot be predicted with absolute certainty.
Solution
Step 1 The last digit of a person’s Social Security number can be any integer from 0 to 9.
Step 2 There are 10 possible outcomes.
The outcomes are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
Solution
Step 1 If a driver is wearing a seatbelt, denote this observation by R (for restrained), and
if he is not wearing a seatbelt, use U (for unrestrained).
4.1 Experiments, Sample Spaces, and Events 125
There are lots of other ways to Step 2 Each outcome is a pair of observations, one on each driver. There are four pos-
denote these four outcomes. There sible outcomes, and here is a way to write them: RR, RU, UR, UU.
is no single correct notation. Write The first letter indicates the observation on the first driver, and the second letter
the outcomes so that others can indicates the observation on the second driver.
understand and interpret your list.
RU is a different outcome from UR. RU means the first driver was wearing a
seatbelt and the second driver was not. UR means the first driver was not wearing
a seatbelt and the second driver was.
Tree diagrams will also be All of the outcomes from the experiment in Example 4.2 can be determined by con-
extremely useful for determining structing a tree diagram, a visual road map of possible outcomes. Figure 4.1 is a tree
probabilities in problems involving diagram associated with this experiment.
Bayes’s rule. Problems of this type
are presented in Section 4.5. First-generation Second-generation
branches branches Outcomes
R RR
U
R
RU
U
R UR
The first-generation branches indicate the possible choices associated with the first
driver, and the second-generation branches represent the choices for the second driver. A
path from left to right represents a possible experimental outcome.
Solution
Now there are eight possible outcomes: RRR, RRU, RUR, RUU, URR, URU, UUR, UUU.
Figure 4.2 is a tree diagram for this extended experiment. Again, every path from left to
right represents a possible outcome.
R RRR
U
R
RRU
U
R R RUR
U
RUU
R URR
U
R U
URU
U
R UUR
Figure 4.2 Tree diagram for U
Example 4.3. UUU
126 C hapter 4 Probability
A Closer L ok
Tree diagrams are also used to 1. Tree diagrams are a fine technique for finding all the possible outcomes for an experi-
prove the multiplication rule ment. However, they can get very big, very fast.
(Section 4.3), an arithmetic 2. A tree diagram does not have to be symmetric, as they are in Figures 4.1 and 4.2. The
technique used to count the branches and paths depend on the experiment. Consider the next example.
number of possible outcomes
in certain experiments. Example 4.4 Breakfast of Champions
A consumer in Clarkdale, Arizona, is searching for a box of his favorite breakfast cereal.
He will check all three grocery stores in town if necessary, but will stop if the cereal is
found. The experiment consists of searching for the cereal. How many possible outcomes
are there, and what are they?
Solution
Step 1 If the cereal is in stock, use the letter I; if it is out of stock, use O. Figure 4.3
shows a tree diagram for this experiment.
Why isn’t IO a possible outcome? Step 2 On the tree diagram, there are four possible paths from left to right. The out-
comes are
OOO
O
O I
OOI
O I
OI
This tree diagram is not symmetric, but all possible outcomes are represented by
left-to-right paths.
The paths representing the outcomes have different lengths. Some of the out-
comes are shorter because the experiment ends early if the consumer finds the
cereal in the first or second store.
The symbol S is used to denote
several different objects, or
Definition
quantities, in probability and
statistics; a small s is also The sample space associated with an experiment is a listing of all the possible outcomes
common. The text of the problem using set notation. It is the collection of all outcomes written mathematically, with curly
reveals the relevant meaning braces, and denoted by S.
of the symbol.
4.1 Experiments, Sample Spaces, and Events 127
Solution
We determined the outcomes for each experiment. Write the sample space using set notation.
Step 1 Last digit of Social Security number: S 5 {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
Step 2 Seatbelt experiment: S 5 {RR, RU, UR, UU}.
Step 3 Extended seatbelt experiment:
S 5 {RRR, RRU, RUR, RUU, URR, URU, UUR, UUU}.
Step 4 Cereal experiment: S 5 {I, OI, OOI, OOO}.
Definition
1. An event is any collection (or set) of outcomes from an experiment (any subset of the
sample space).
2. A simple event is an event consisting of exactly one outcome.
3. An event has occurred if the resulting outcome is contained in the event.
A Closer L ok
1. An event may be given in standard set notation, or it may be defined in words. If a writ-
ten definition is given, we need to translate the words into mathematics in order to
identify the event outcomes.
2. Notation:
(a) Events are denoted with capital letters, for example, A, B, C, . . . .
(b) Simple events are often denoted by E1, E2, E3, . . . .
3. It is possible for an event to be empty. An event containing no outcomes is denoted by
{} or 0 (the empty set).
Definition
Let A and B denote two events associated with a sample space S.
A9 is read as “A prime” or “A 1. The event A complement, denoted A9, consists of all outcomes in the sample space S
complement.” that are not in A.
2. The event A union B, denoted A c B, consists of all outcomes that are in A or B or both.
3. The event A intersection B, denoted A d B, consists of all outcomes that are in both A
and B.
4. If A and B have no elements in common, they are disjoint or mutually exclusive, written
A d B 5 {}.
A Closer L ok
1. The event A9 is also called not A. The word not in the text of a probability question
usually means you need to find the complement of an event.
2. Or usually means union; A or B means A c B.
3. And usually means intersection; A and B means A d B.
4. Any outcome in both A and B is included only once in the event A c B.
5. The three events defined above could be denoted using any new symbols. A9, A c B, and
A d B are traditional mathematical symbols to denote complement, union, and intersection.
6. It is possible for one of these new events to contain all the outcomes in the sample
space.
4.1 Experiments, Sample Spaces, and Events 129
First-generation Second-generation
branches branches
FF
F
T
FT
G
F FG
TF
F
T T
TT
G
G TG
GF
F
T
GT
G
GG
Here are some new events created from the four given events:
Dr 5 5 FF, FG, TT, GF, GG 6
5 D complement, neither or both buy satin.
5 All outcomes in S not in D.
A c C 5 5 FF, TT, GG, FT, FG, TF, GF 6
5 Both buy the same finish or at least one buys flat.
5 All outcomes in A or C ( or both ) .
AdD5 5 6
5 Both buy the same finish and exactly one buys satin.
5 All outcomes in A and D. A and D are disjoint.
( A c C ) r 5 5 TG, GT 6
5 A union C, complement.
5 All outcomes in S not in A c C.
(A d D) r 5 S
5 A intersection D, complement.
5 All outcomes in S not in A d D.
A Venn diagram may be used to visualize a sample space and events, to determine
outcomes in combinations of events, and to answer probability questions in later sections.
To construct a Venn diagram draw a rectangle to represent the sample space. Various fig-
ures (often circles) are drawn inside the rectangle to represent events. The Venn diagrams
in Figure 4.5 illustrate various combinations of events.
S S
A
A A B
A complement: A A union B: A ∪ B
S S
A B A B
In a Venn diagram, plane regions represent events. We often add labeled points to
denote outcomes. Later, probabilities assigned to events will be added to the diagrams.
The definitions of union, intersection, and disjoint events can be extended to a collec-
tion consisting of more than two events.
Definition
Let A1, A2, A3, . . . , Ak be a collection of k events.
1. The event A1 c A2 c # # # c Ak is a generalized union and consists of all outcomes
in at least one of the events A1, A2, A3, . . . , Ak.
2. The event A1 d A2 d # # # d Ak is a generalized intersection and consists of all
outcomes in every one of the events A1, A2, A3, . . . , Ak.
3. The k events A1, A2, A3, . . . , Ak are disjoint if no two have any element in common.
a. List the outcomes in the event A c B c C and illustrate these three events using a Venn
diagram.
. List the outcomes in the event A c B c D and illustrate these three events using a Venn
b
diagram.
c. List the outcomes in each of the following events:
i. A d B d C ii. A d B d D
iii. ( A c B ) r iv. ( A c B c D ) r
Solution
Step 1 The event A c B c C includes all the outcomes in at least one of the events A, B,
or C. A c B c C 5 {0, 1, 2, 3, 4, 5, 6, 7, 8}.
Figure 4.6 shows the relationships among the events A, B, and C, and the sample
space S.
Step 2 The event A c B c D includes all the outcomes in at least one of the events A, B,
or D. A c B c D 5 {0, 1, 2, 3, 4, 5, 6, 9}.
Figure 4.7 shows the relationships among the events A, B, and D, and the sample
space S.
Step 3 A d B d C 5 5 6 There are no outcomes in all three events.
S S
A 1 9
A 1 D
0 2 C 2 9
0
7
3 4 4
7 8 3 6
8
5 6 5
B B
4.9 One playing card is selected from a regular 52-card deck. 4.16 The Venn diagram below shows the relationship among
An experiment consists of recording the denomination (ace, 2, three events.
3, 4, 5, 6, 7, 8, 9, 10, jack, queen, king) and suit (club, diamond,
heart, or spade). How many possible outcomes are there in this S
experiment?
A B
4.10 Consider an experiment with sample space
S 5 5 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 6
and the events
A 5 {0, 2, 4, 6, 8}
B 5 {1, 3, 5, 7, 9}
C 5 {0, 1, 2, 3, 4}
C
D 5 {5, 6, 7, 8, 9}
Find the outcomes in each of the following events.
a. A9 b. C9 c. D9
d. A c B e. A c C f. A c D Redraw the Venn diagram for each part of this problem
and carefully shade in the region corresponding to each
4.11 Use the sample space and the events in Exercise 4.10 new event.
to find the outcomes in each of the following events. a. A c B c C b. A d B d C c. A c C
a. B d C b. B d D c. A d B . B d C
d e. B d Cr f. ( A c B ) r d C
. A d C
d e. ( B d C ) r f. Br c Cr . ( A c B c C ) r h. Ar d Br d Cr
g i. B d C d Ar
4.12 Consider an experiment with sample space 4.17 Consider an experiment with sample space
S 5 5 a, b, c, d, e, f, g, h, i, j, k 6 S 5 {YYY, YYN, YNY, YNN, NYY, NYN, NNY, NNN}
and the events a. Find the outcomes in each of the following events.
A 5 {a, c, e, g} B 5 {b, c, f, j, k} A 5 Exactly one Y.
C 5 {c, f, g, h, i} D 5 {a, b, d, e, g, h, j, k} B 5 Exactly two Ns.
Find the outcomes in each of the following events. C 5 At least one Y.
a. A9 b. C9 c. D9 D 5 At most one N.
d. A d B e. A d C f. C d D Find the outcomes in each event described in words
and write each as a combination of the events A, B, C,
4.13 Use the sample space and the events in Exercise 4.12
and D.
to find the outcomes in each of the following events.
b. Exactly one Y or at most one N
a. A c B c D
c. Two or more Ns
b. B c C c D
d. Exactly two Ns and at least one Y
c. B d C d D
e. Two or more Ys
d. A d B d C
4.18 Consider an experiment with sample space
4.14 Use the sample space and the events in Exercise 4.12 S 5 {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
to find the outcomes in each of the following events. and the events
a. ( A d B d C ) r A 5 {0, 1, 2, 7, 8, 9}
b. A c B c C c D B 5 {0, 1, 2, 4, 8}
c. ( B c C c D ) r C 5 {0, 1, 3, 9}
d. Br d Cr d Dr D 5 {1, 4, 9}
4.15 The Venn diagram below shows the relationship between Draw a separate Venn diagram to illustrate the relationships
two events. among each collection of events and the sample space S.
a. B and C
S b. A and D
A B c. A, B, and C
d. A, C, and D
Applications
4.19 Physical Sciences An experiment consists of recording
the time zone (E, C, M, P) and strength (L, M, H) for the next
Redraw the Venn diagram for each part of this problem and earthquake in the 48 contiguous states United States.
carefully shade in the region corresponding to each new event. a. Carefully sketch a tree diagram to illustrate the possible
a. ( A c B ) r b. ( A d B ) r c. Ar d B outcomes for this experiment.
. A d Br
d e. Ar d Br f. Ar c Br b. Find the sample space S for this experiment.
4.1 Experiments, Sample Spaces, and Events 133
4.20 Economics and Finance Three taxpayers are selected 4.27 Medicine and Clinical Studies The Emergency Room
at random and asked whether they itemized their tax deductions in a rural hospital is staffed in four six-hour shifts (1, 2, 3, 4).
last year or used the standard deduction. An experiment During any shift, an Emergency Room patient is attended to
consists of recording each response. Construct a tree diagram by either a general physician (G), a surgeon (R), or an intern
to represent this experiment and find the outcomes in the (I). An experiment consists of coding the next Emergency
sample space. Room patient by shift and attending doctor. Consider the
following events.
4.21 Travel and Transportation Two people who work in
A 5 The attending doctor is the general physician.
New York City are selected at random and asked how they get
B 5 The patient is admitted during the second shift.
to work: drive, take a train, or take a bus. An experiment
C 5 The patient is admitted during shift 3 or is seen by
consists of recording each response. Construct a tree diagram
the intern.
to represent this experiment and find the outcomes in the
D 5 The patient is admitted during shift 4 and is seen by the
sample space.
general physician.
4.22 Fuel Consumption and Cars In early 2013 it was a. Find the sample space S for this experiment.
revealed that the Winnipeg Police Service had worked 627 b. List the outcomes in each of the events A, B, C, and D.
overtime shifts over the last six months specifically for c. List the outcomes in the events A c B and A d B.
traffic enforcement.2 The cost to Canadian taxpayers was
4.28 Psychology and Human Behavior Drivers entering
approximately $900,000, and some people believe the time
the Quaker Bride Mall parking lot at the main entrance may
and money should be spent on crime prevention instead.
turn left, right, or go straight. An experiment consists of
An experiment consists of selecting a ticketed car at random
recording the direction of the next car entering the mall and
and recording:
the vehicle style (sedan, SUV, van, or pickup). Consider the
a. Whether the driver has a valid registration.
following events.
b. Whether the automobile is properly insured.
A 5 The next vehicle is a van.
c. The time of day during the traffic stop: morning, afternoon
B 5 The next vehicle is a sedan or pickup.
or evening.
C 5 The next vehicle turns left.
Construct a tree diagram to represent this experiment and find
D 5 The next vehicle goes straight or turns right.
the outcomes in the sample space.
a. Find the sample space S for this experiment.
4.23 Physical Sciences A construction crew excavating b. List the outcomes in each of the events A, B, C, and D.
a site for a building foundation must remove the rock and c. List the outcomes in the events C c D and C d D.
prepare a trench for concrete footers. An experiment consists of
4.29 Public Health and Nutrition Each patient with a
recording the type of rock present (I, igneous; S, sedimentary;
regular appointment at Dr. Kenneth Heise’s dental office is
M, metamorphic) and the number of days needed to prepare
classified by the number of cavities found (assume 4 is the
the site (1 to 5).
maximum) and as late (L) or on time (T) for the appointment.
a. Carefully sketch a tree diagram to illustrate the possible
a. Find the sample space S for this experiment.
outcomes for this experiment.
b. Describe the following events in words.
b. Find the sample space S for this experiment.
A 5 {0L, 1L, 2L, 3L, 4L}
4.24 Manufacturing and Product Development One of B 5 {3L, 4L, 3T, 4T}
four calculator batteries is bad. An experiment consists of C 5 {1L, 3L, 1T, 3T}
testing each battery until the dead one is found. D 5 {0L, 0T}
a. How many possible outcomes are there for this experiment? E 5 {0L, 0T, 1L, 2L, 3L, 4L}
b. Is the outcome GBGG (Good, Bad, Good, Good) possible? F 5 {4T}
Why or why not?
4.30 Travel and Transportation Every passenger arriving
4.25 Sports and Leisure An experiment consists of at the Las Vegas McCarran International Airport is classified as
recording the number of pins knocked down on each roll during American (A) or Foreign (F), and by the number of checked
a frame of a bowling game. A bowler may take a maximum of bags (assume 5 is the maximum).
two rolls per frame. How many outcomes are in the sample a. Find the sample space S for this experiment.
space for this experiment? Hint: If the first roll is a 10 (a strike), b. Describe the following events in words.
the experiment is over. A 5 {A0, F0}
B 5 {F0, F1, F2, F3, F4, F5}
4.26 Sports and Leisure A sports statistician must
C 5 {A1, F1, A2, F2}
carefully chart opposition football plays in preparation for
D 5 {F0, F5}
the next game. An experiment consists of recording the
E 5 {A1, F1, A3, F3, A5, F5}
type of play (pass or rush) and the yards gained (299, 298,
297, . . . , 22, 21, 0, 1, 2, . . . , 97, 98, 99) on a randomly 4.31 Public Health and Nutrition A researcher working for
selected first down. How many outcomes are in the sample a Five Guys fast-food restaurant in Savannah, Georgia, selects
space for this experiment? random customers and classifies each according to sex
134 C hapter 4 Probability
[male (M) or female (F)], fresh-cut French fries or not [(C) or Each user must select the number of gigabytes depending on
(N)], and age group [young (Y), middle aged (D), or senior (R)]. the planned volume of emails, music, and videos.3 An experi-
a. Find the sample space S for this experiment. ment consists of selecting a Verizon Wireless smartphone
b. Describe the following events in words. customer and recording the data plan (1, 2, 3, 4, 5, or 6 GB)
A 5 {MCY, MCD, MCR, MNY, MND, MNR} and whether the customer used more data than planned last
B 5 {MCR, FCR} month. Consider the following events.
C 5 {MCY, MNY, FCY, FNY} A 5 The customer used more data than planned.
D 5 {MNY, MND, MNR, FNY, FND, FNR} B 5 The customer has a 1-, 2-, or 3-GB plan.
C 5 The customer has a 5- or 6-GB plan, and used more data
than planned.
Extended Applications D 5 The customer has a 2-, 4-, or 6-GB plan.
a. Find the sample space S for this experiment.
4.32 Sports and Leisure A single six-sided die is rolled.
b. Find the outcomes in each of the following events.
If the number face up is even, then the experiment is over.
i. B9 ii. A c B iii. A d B
If the number face up is odd, then the die is rolled again.
iv. C d D v. A d B d D vi. (A d D)9
The experiment continues until the number face up is even.
a. Carefully sketch (part of) a tree diagram to illustrate the 4.36 Travel and Transportation An experiment consists
possible outcomes for this experiment. of selecting a random passenger on a train from Washington,
b. Find the sample space for this experiment. DC, Union Station to Trenton, NJ, and recording the purpose
of travel (business or pleasure) and the number of pieces
4.33 Economics and Finance A taxpayer in need of advice
of luggage (0 to 4). Consider the following events:
will call the IRS repeatedly until she can get through (no busy
A 5 The passenger is traveling on business.
signal). If she receives a busy signal, she will hang up and try
B 5 The passenger has no luggage.
again later, and will stop calling as soon as she reaches an
C 5 The passenger has at most one piece of luggage.
agent. An experiment consists of recording the calling pattern.
D 5 The passenger has three pieces of luggage or is traveling
A possible outcome is BBH: a busy signal (B) on the first two
for pleasure.
calls, and (finally) help (H) on the third call.
a. Find the sample space S for this experiment.
a. How many possible outcomes are there in this experiment?
b. Find the outcomes in each of the following events.
b. List some of the outcomes for this experiment.
i. A c B ii. A d B iii. B c C
4.34 Marketing and Consumer Behavior Musicnotes.com iv. B d C v. A d D vi. A d B d C d D
sells sheet music in the following genres: rock, jazz, new age,
4.37 Marketing and Consumer Behavior Raman’s Coffee
and country. An experiment consists of recording the preferred
and Chai offers Chai tea in the following variations: hot or
genre for the next customer, and the number of songs purchased
cold; with whipped cream or without; and in small, medium, or
(assume 5 is the maximum). Consider the following events.
large size. An experiment consists of recording these three
A 5 The next customer prefers rock.
options for the next customer.
B 5 The next customer prefers jazz and buys at least three songs.
a. Carefully sketch a tree diagram to illustrate the possible
C 5 The next customer buys at most two songs.
outcomes for this experiment.
D 5 The next customer prefers country and buys one song.
b. Find the sample space S for this experiment.
a. Find the sample space S for this experiment.
c. Consider the following events.
b. Find the outcomes in each of the following events.
A 5 The next customer order is small.
i. A9 ii. A c C iii. A d D
B 5 The next customer order is cold.
iv. C d D v. A d C d D vi. (A d B)
C 5 The next customer order is small or hot.
4.35 Marketing and Consumer Behavior Verizon Wireless Find the outcomes in each of the following events.
offers smartphone and tablet customers a variety of data plans. i. A c B ii. B c C iii. B d C iv. C9
Definition
The probability of an event A is a number between 0 and 1 (including those endpoints)
that measures the likelihood A will occur.
1. If the probability of an event is close to 1, then the event is likely to occur.
2. If the probability of an event is close to 0, then the event is not likely to occur.
Would you enroll in a class where If the probability of an event A is 1, then the event is a certainty: It will occur. If the
the probability of receiving an probability of an event B is 0, then B is definitely not going to occur. What about events
A is 1? with probabilities in between? How do we decide to assign a probability of 0.3, for exam-
ple, to an event C? We need a reasonable, all-purpose rule for linking an event to its
likelihood of occurrence. The natural (theoretical) definition for assigning a probability to
an event is very intuitive.
Definition
The relative frequency of occurrence of an event is the number of times the event occurs
divided by the total number of times the experiment is conducted.
Solution
Step 1 Let C be the event that a club is selected. We want the probability of the event C,
which is denoted by P(C ).
Step 2 To estimate the probability of C, it seems reasonable to conduct the experiment
several times and see how often a club is selected. If C occurs often (we get a
club a lot of the time), then the likelihood (probability) should be high. If C
occurs rarely, then the probability should be close to 0.
Relative frequency was defined in Step 3 To estimate the likelihood of selecting a club, we use the relative frequency of
Chapter 2 in the context of occurrence of a club, which is the frequency divided by total trials, or
frequency distributions.
number of times a club is selected
Relative frequency 5
total number of selections
After every selection, the Step 4 Suppose that after 10 tries, a club was selected only twice. The relative frequency
observed card is placed back in is 2/10 5 0.2. This is an estimate of P(C ). It’s quick and easy, but it doesn’t seem
the deck. The deck is shuffled, too accurate.
and another selection is made. Step 5 Suppose we try the experiment a few more times. With more observations we
should be able to make a better guess at P(C). The table below shows values for
=
N, the number of trials, and p, the relative frequency of occurrence of a club.
N 10 50 100 200 300 400 500 600 700
=
p 0.2 0.3 0.29 0.23 0.223 0.205 0.228 0.252 0.267
N 800 900 1000 1100 1200 1300 1400 1500
=
p 0.245 0.243 0.227 0.254 0.260 0.256 0.261 0.249
For example, after 300 draws, the relative frequency of occurrence of the event
C was 0.223.
136 C hapter 4 Probability
Step 6 Figure 4.8 shows a plot of relative frequency versus number of trials. The graph
shows a remarkable pattern. As N increases, the points are noticeably closer to
the dashed line. The relative frequencies seem to be homing in on one number
(around 0.25); this relative frequency, whatever it is, should be the probability of
the event C.
0.3
Relative frequency, p
0.2
0.1
0.0
0 500 1000 1500
Number of trials, N
Step 7 In the long run, the relative frequencies tend to stabilize, or even out, and become
almost constant. They close in on one number, the limiting relative frequency.
The probability of the event C is the limiting relative frequency.
If an experiment is conducted N times and an event occurs n times, then the probability
of the event is approximately n/N (the relative frequency of occurrence). The probability
of an event A, P(A), is the limiting relative frequency, the proportion of time the event A
will occur in the long run. This is a basic and sensible definition, a rule for assigning prob-
ability to an event. Given an event, all we need to do is find the limiting relative frequency.
Statistical Applet Although this definition makes sense, and Example 4.10 and Figure 4.8 support and illus-
Probability trate our intuition, there is a real practical problem. We cannot conduct experiments over and
over, compute relative frequencies, and only then estimate the true probability. How will we
ever know the true limiting relative frequency? How large should N be? When are we close
enough? Will we ever hit the limiting relative frequency exactly? The definition is nice, but
there seems little hope of ever finding the true probability of an event. Fortunately, there is
another way to determine the exact probability in some cases. Consider the next two examples.
Solution
If we were to conduct this There are only two possible outcomes on each flip of the coin, and they are both equally
experiment over and over, the likely to occur. In the long run, we expect heads to occur half of the time.
relative frequency of occurrence
Therefore, P ( H ) 5 1/2.
of H would close in on 1/ 2.
Without flipping the coin thousands of times, making estimates, or guessing at the limit-
ing relative frequency, we are certain the probability is 1/2.
Solution
There are six possible outcomes on each roll of the die, and they are all equally likely to
The relative frequency of occurrence occur. In the long run, we expect 1 to occur one-sixth of the time.
of a 1 would get closer and closer
Therefore, P ( E ) 5 1/6.
to 1/ 6 as the number of rolls gets
larger and larger. We can identify the exact limiting relative frequency.
These two examples suggest it is indeed possible to find the limiting relative frequency!
They are special cases, however, because in each experiment, all of the outcomes are
equally likely.
Properties of Probability
The word chance is also used to 1. For any event A, 0 # P(A) # 1.
express likelihood. A 10% chance The probability of any event is a limiting relative frequency, and a relative frequency
means the probability is 0.10. is a number between 0 and 1. An event with probability close to 0 is very unlikely to
occur, and an event with probability close to 1 is very likely to occur.
2. For any event A, P(A) is the sum of the probabilities of all of the outcomes in A.
To compute P(A), just add up the probability of each outcome or simple event in A.
3. The sum of the probabilities of all possible outcomes in a sample space is 1: P(S) 5 1.
The sample space S is an event. If an experiment is conducted, S is guaranteed to occur.
4. The probability of the empty set is 0: P({ }) 5 P(0) 5 0. This event contains no outcomes.
In the next example, the probability (limiting relative frequency) of each simple event
is assumed to be known. We will use the properties above and some earlier definitions to
develop some common tools and strategies for solving similar probability questions.
Solution
Step 1 P ( A ) 5 P ( 1 ) 1 P ( 2 ) Add the probabilities of each simple event in A.
5 0.08 1 0.12 5 0.20
138 C hapter 4 Probability
To find probabilities in the previous example, we looked at each event piece by piece.
We broke down each event into simple events. Let’s apply the same properties in an
equally likely outcome experiment.
Think about tossing a fair coin, Suppose an experiment has n equally likely outcomes, S 5 {e1, e2, e3, . . . , en}. Each
or rolling a fair die, or randomly simple event has the same chance of occurring, so the probability of each is 1/n; P(ei ) 5
selecting a student in a class 1/n. The limiting relative frequency of ei is 1/n. This is exactly what we found in Exam-
to answer a question. ples 4.11 and 4.12. Consider an event A 5 {e1, e2, e3, e4, e5}. To find P(A), add up the
probabilities of each simple event in A.
P ( A ) 5 P ( e1 ) 1 P ( e2 ) 1 P ( e3 ) 1 P ( e4 ) 1 P ( e5 )
1 1 1 1 1 5
5 1 1 1 1 5
n n n n n n
number of outcomes in A N(A)
5 5
number of outcomes in the sample space S N(S)
Section 4.3 presents some special counting rules to help compute probabilities associ-
ated with common experiments and events. However, we can solve some of these prob-
lems already, and even use our results to make a statistical inference.
V isi on a. Let A 5 both trainees are selected for the audit. Because the trainees are tellers 1 and
To find the probability of each 2, there is only one outcome in the event A: A 5 {12}.
event, count the number of
number of outcomes in A N(A) 1
outcomes in that event and P(A) 5 5 5 5 0.10
divide by the total number of number of outcomes in S (
N S ) 10
outcomes in the sample space.
b. Let B 5 one male and one female teller are selected. Tellers 2, 3, and 4 are female, and
tellers 1 and 5 are male. Check the sample space carefully to list the outcomes in B.
N(B) 6
B 5 5 12, 13, 14, 25, 35, 45 6 1 P ( B ) 5 5 5 0.60
N(S) 10
c. Let C 5 two females are selected. Tellers 2, 3, and 4 are female. Check the sample
space again, and pick out the matching outcomes.
N(C) 3
C 5 5 23, 24, 34 6 1 P ( C ) 5 5 5 0.30
N(S) 10
The next example involves an equally likely outcome experiment and an inference
question. We’ll need to compute the likelihood of the observed event to help us draw a
conclusion.
Solution Trail 4.15a a. Find the probability that exactly one person buys a plain bagel.
b. Suppose all five customers purchase a plain bagel. Is there any evidence to suggest that
K e y w ords demand is weighted more toward one variety?
n Demand for each kind is
the same
Solution
n Selected at random
The experiment consists of selecting five customers at random and recording their bagel
Tr a nsl at i o n purchase. Each outcome is a sequence of five letters: Cs and/or Ps. For example, the out-
n Equally likely outcomes come CCPCP stands for: the first customer buys a cinnamon raisin bagel, the second
customer buys a cinnamon raisin bagel, the third customer buys a plain bagel, the fourth
C onc e p ts
buys a cinnamon raisin bagel, and the fifth buys a plain bagel. There are 32 possible out-
n P(A) 5 N(A) / N(S)
comes; a systematic listing helps, and a tree diagram works (but is big). (The multiplica-
V isi on tion rule also works here. This very useful counting technique is presented in Section 4.3.)
To find the probability of each Here is the sample space:
event, count the number of
S = {PPPPP, PPPPC, PPPCP, PPPCC, PPCPP, PPCPC, PPCCP, PPCCC, PCPPP,
outcomes in that event and
divide by the total number of PCPPC, PCPCP, PCPCC, PCCPP, PCCPC, PCCCP, PCCCC, CPPPP, CPPPC,
outcomes in the sample space. CPPCP, CPPCC, CPCPP, CPCPC, CPCCP, CPCCC, CCPPP, CCPPC, CCPCP,
CCPCC, CCCPP, CCCPC, CCCCP, CCCCC}.
140 C hapter 4 Probability
a. Let A 5 exactly one person buys a plain bagel. Check the sample space and carefully
SOLUTION TRAIL list all the outcomes in A.
A 5 5 PCCCC, CPCCC, CCPCC, CCCPC, CCCCP 6
N(A) 5
P(A) 5 5 5 0.15625 Equally likely outcomes.
N(S) 32
b. The claim is that the demand for each type of bagel is equal. If this is true, then all of
Solution Trail 4.15b
the outcomes in the sample space S are equally likely.
Ke ywor ds The experiment consists of observing the bagel purchase for the next five customers.
n All five purchase a plain bagel Let B 5 the observed outcome, everyone buys a plain bagel.
n Is there any evidence? Find the likelihood of the event B occurring. There is only one outcome in B, so the
Transl at i on
probability of the event B is
n Experimental outcome N(B) 1
P(B) 5 5 5 0.03125 Count and divide.
n Draw a conclusion (
N S ) 32
Concepts The conclusion: Because this probability is so small, all five people buying a plain
n Inference procedure bagel is a rare event. But it happened! This suggests the assumption is wrong—there is
evidence to suggest that the demand for each type of bagel is not equal.
Vi sion
Note: There is really evidence to suggest that some assumption is wrong. It could be,
Find the probability of the
experimental outcome that all
for example, that the five customers were not selected at random. To draw a conclusion
five customers buy a plain bagel. about the demand for these two types of bagels, we must accept all other assumptions
Compute how likely this are true.
outcome is, and draw a
conclusion about the claim. Try It Now Go to Exercise 4.52
Consider an experiment, two events A and B, and known probabilities P(A) and P(B).
Suppose we use A and B to create a new event using complement, union, or intersection.
Sometimes we can use the known probabilities P(A) and P(B) to calculate the probability
of the new event quickly. We may not have to break down the new event into simple
events, or even count all the outcomes in the new event (if it is an equally likely outcome
experiment). The complement rule and the addition rule for two events are two rules
that help with probability calculations.
A Closer L ok
S 1. The complement rule is easy to visualize and justify by looking at a Venn diagram.
Figure 4.9 shows an event A and its complement A9.
Remember, the area of a region represents probability. P(A) 1 P(A9) 5 P(S) 5 1,
which can be written as P(A) 5 1 2 P(A9) or P(A9) 5 1 2 P(A).
A A
2. The complement rule is incredibly handy; it is used in various contexts throughout
probability and statistics. The problem is, how do you know when to use it? Look for
keywords such as not, at least, and at most. A rule of thumb: If you are faced with a
very long probability calculation involving many simple events, or one that may
Figure 4.9 Venn diagram for require lots of counting, try looking at the complement.
visualizing the complement rule.
Example 4.16 Law and Order
(A9)9 is read as “A complement, Three public defenders are assigned to cases randomly. An experiment consists of recording
complement.” What is (A9)9? All the lawyer (by number) assigned to the next three cases. The outcome 132 means lawyer 1
outcomes not in A9, which is A! was assigned case 1, lawyer 3 was assigned case 2, and lawyer 2 was assigned case 3.
a. Find the probability that all three cases are assigned to different lawyers.
b. Find the probability that lawyer 2 is not assigned to any of the three cases.
c. Find the probability that lawyer 2 is assigned to at least one case.
4.2 An Introduction to Probability 141
Solution
There are 27 possible outcomes; a tree diagram works.
Note: Each case can be assigned to one of three lawyers:
Number of possible assignments for each case.
T T T
3 3 3 3 3 5 27
Case 1 Case 2 Case 3
Here’s the sample space:
S = {111, 112, 113, 121, 122, 123, 131, 132, 133,
211, 212, 213, 221, 222, 223, 231, 232, 233,
311, 312, 313, 321, 322, 323, 331, 332, 333}.
a. Let A 5 all three cases are assigned to different lawyers. Find all the outcomes in S
with a 1, a 2, and a 3.
A 5 5 123, 132, 213, 231, 312, 321 6
N(A) 6
P(A) 5 5 5 0.2222 Equally likely outcomes.
N(S) 27
8/ 27 is really approximately equal b. Let B 5 lawyer 2 is not assigned to any of the three cases. Find all the outcomes
to 0.2963. Many answers in this
SOLUTION TRAIL
without a 2.
text are rounded (here, to four B 5 5 111, 113, 131, 133, 311, 313, 331, 333 6
decimal places) and an equal sign
is used for simplicity and
N(B) 8
P(B) 5 5 5 0.2963
convenience. N(S) 27
c. Let C be the event lawyer 2 has at least one case. The outcomes in C include those with
Solution Trail 4.16c one 2, two 2s, and three 2s. That seems like a lot of counting. This is a good opportunity
to use the complement rule.
K e y w ords
n At least one case P ( C ) 5 1 2 P ( Cr ) Complement rule.
A B
To find the probability of the union, start by adding P(A ) 1 P(B). This sum includes
the region of intersection, P(A d B), twice. Adjust this total by subtracting the intersec-
tion area, once.
2. A c B 5 B c A; order doesn’t matter here. So, P ( A c B ) 5 P ( B c A ) .
3. The f irst, more general, formula always works. If A and B are disjoint, then
P(A d B) 5 0.
4. The addition rule can be extended.
For any three events A, B, and C:
You can also visualize and derive this by using a Venn diagram. In this case, the sum
P(A) 1 P(B) 1 P(C) includes the double intersections twice and the triple intersection
three times. We therefore need to adjust the total accordingly.
5. Let A1, A2, A3, . . . , Ak be a collection of k disjoint events.
P ( A1 c A2 c cc Ak ) 5 P ( A1 ) 1 P ( A2 ) 1 c1 P ( Ak ) .
If the events are disjoint, to find the probability of a union, just add up the corre-
sponding probabilities. This is especially useful in questions that ask about the num-
Beware of these common errors:
ber of individuals or objects with a specific attribute. For example, suppose 10 people
P(A 1 B)—you can’t add two
are asked whether they received a flu shot this winter. The probability of at least 3
events; P(A) c P(B)—you can’t
is the probability of 0, plus the probability of 1, plus the probability of 2, plus the
union two numbers.
probability of 3.
6. Complement, union, and intersection are operations applied to events. It doesn’t
Probabilities are given as
SOLUTION TRAIL make sense to take the union of probabilities (which are numbers). Similarly, addi-
percentages in Example 4.17. tion and subtraction are operations on real numbers. You shouldn’t try to add or
Divide each by 100 to convert subtract events.
to a probability.
Concepts
Solution
n Probability of events
a. Define the events given in the problem.
n Intersection
Let G 5 the customer adds sugar; P(G) 5 0.70.
Vi sion Let M 5 the customer adds milk; P(M) 5 0.35.
Create a Venn diagram and find Use both means uses sugar and milk, which means intersection. Therefore,
probabilities of events using the
P(G d M) 5 0.25.
Venn diagram and probability
rules. These three probabilities add up to more than 1. That’s OK because the events M and G
intersect.
4.2 An Introduction to Probability 143
Remember, area of a region corresponds to probability. To complete the picture, start at the
inside and work your way out.
i. The shaded area represents the probability that the customer uses both sugar and
milk, that is, P(G d M). We know that P(G d M) 5 0.25. Because P(G) 5 0.70, the
remaining area representing G corresponds to 0.70 2 0.25 5 0.45.
ii. Similarly, because P(G d M) 5 0.25 and P(M) 5 0.35, the remaining area repre-
senting M corresponds to 0.35 2 0.25 5 0.10.
iii. The total probability in the entire sample space must sum to 1; the remaining prob-
ability is 1 2 (0.45 1 0.25 1 0.10) 5 0.20.
Figure 4.11 is the Venn diagram that corresponds to this problem.
S
0.20
G M
The Venn diagram supports this answer. Look at the region that represents P(G c M),
and add up the corresponding probabilities.
c. Uses neither means does not use sugar or milk. Because G c M means sugar or milk,
neither suggests the complement of G c M.
P 3 ( G c M ) r 4 5 1 2 P ( G c M ) Complement rule applied to the event G c M.
5 1 2 0.80 5 0.20 Use the previous answer.
e. Uses just one of these items means uses sugar or milk, but not both. Start with the
union, subtract off the intersection.
P ( exactly one ) 5 P ( G c M ) 2 P ( G d M ) Use the Venn diagram.
5 0.80 2 0.25 5 0.55 Use the known probabilities.
Try It Now Go to Exercise 4.61
Solution
Find the probability of A, B, C, and D. Break down each event and look at the individual
outcomes.
P(A) 5 P ( M1 )1 P ( M2 ) 5 0.10 1 0.25 5 0.35
P(B) 5 P ( M2 )1 P ( M3 ) 1 P ( M6 ) 5 0.25 1 0.20 1 0.05 5 0.50
P(C) 5 P ( M4 )1 P ( M5 ) 5 0.30 1 0.10 5 0.40
P(D) 5 P ( M6 )5 0.05
a. Or means union. Find the corresponding events, and translate everything into
mathematics.
P(A c B) 5 P(A) 1 P(B) 2 P(A d B) (General) addition rule.
Find the following probabilities. five possible benefits are health insurance, life insurance, a
a. P(A) b. P(C) c. P(D) prescription plan, dental insurance, and vision insurance.
. P ( A c B )
d e. P ( A d C ) f. P ( B d D ) a. How many different benefit packages can an employee
. P(A9)
g h. P ( A d Cr ) i. P ( Ar d D ) select? List them.
j. P(C9) k. P ( B d C d D ) l. P 3 ( B c C ) r 4 b. If all benefit packages are equally likely, what is the
How do you know there is no other possible simple event in this probability that an employee selects a package that includes
experiment? health insurance?
c. If all benefit packages are equally likely, what is the
4.45 An experiment consists of rolling a special 18-sided die.
probability that an employee selects a package that
All of the numbers, 1 through 18, are equally likely. Find the
includes life insurance and a prescription?
probability of each event.
a. A 5 rolling an even number. 4.53 Travel and Transportation Suppose a bridge has 10 toll
b. B 5 rolling a number divisible by 3. booths in the east-bound lane: four are only for E-Z Pass holders,
c. C 5 rolling a number less than 7. two are only for exact change, one takes only tokens, and the
d. D 5 rolling at least a 10. remainder are manned by toll collectors who accept only cash.
During heavy-traffic hours it is difficult to see the signs indicating
4.46 An experiment consists of rolling a special 22-sided die. the type of toll booth. Suppose a driver selects a toll booth randomly.
All of the numbers, 1 through 22, are equally likely. Find the a. What is the probability that an exact-change toll booth is
probability of each event. selected?
a. A 5 rolling a number greater than 10 and even. b. What is the probability that a manual-collection toll booth
b. B 5 rolling a prime number or a number divisible by 5. or the token toll booth is selected?
c. C 5 rolling at most an 11. c. What is the probability that an E-Z Pass toll booth is not
d. D 5 rolling a number divisible by 2 and 3. selected?
4.47 Consider an experiment, the events A and B, and prob- d. Suppose the driver has only tokens. What is the probability
abilities P(A) 5 0.55, P(B) 5 0.45, and P ( A d B ) 5 0.15. of selecting the appropriate toll booth?
Find the probability of: 4.54 Public Health and Nutrition As of February 16, 2013,
a. A or B occurring. the risk of contracting the flu in the United States was still
b. A and B occurring. elevated. However, the number of new cases was decreasing.
c. Just A occurring. The following table lists the proportion of reported cases of
d. Just A or just B occurring. influenza A by region since September 30, 2012.4
4.48 Consider an experiment, the events A and B, and prob- Region Proportion
abilities P(A) 5 0.26, P(B) 5 0.68, and P ( A c B ) 5 0.80.
Find each probability. 1 0.076
a. P ( A d B ) b. P(A9) 2 0.074
c. P 3 ( A d B ) r 4 d. P 3 ( A c B ) r 4 3 0.215
4.49 Consider an experiment, the events A and B, and prob-
4 0.080
abilities P(A) 5 0.355, P(B) 5 0.406, and P ( A d B ) 5 0.229. 5 0.154
Find each probability. 6 0.065
a. P ( A c B ) b. P 3 ( A c B ) r 4 7 0.062
c. P(B9) d. P 3 ( A d B ) r 4 8 0.089
4.50 Carefully sketch a Venn diagram showing the 9 0.104
relationship between two events. Add probabilities to 10 0.081
the appropriate regions so that the following statements
are true: P ( A d B ) 5 0.31, P(A) 5 0.57, and P(B) 5 0.48. Suppose a reported case is selected at random.
a. What is the probability that the case is from Region 1, 2,
4.51 Carefully sketch a Venn diagram showing the relation- 3, or 4?
ships among three events. Add probabilities to the appropriate b. What is the probability that the case is from Region 9 or 10?
regions so that the following statements are true: c. What is the probability that the case is not from Region 5?
P ( A ) 5 0.46 P ( B ) 5 0.35 P ( C ) 5 0.44
4.55 Marketing and Consumer Behavior Delorenzo’s
P ( A d B ) 5 0.05 P ( A d C ) 5 0.18
Pizza offers five different toppings on its pizzas: pepperoni,
P ( B d C ) 5 0.14 P ( A d B d C ) 5 0.03
sausage, olives, mushrooms, and anchovies. A large pizza
comes with any two different toppings.
Applications
a. How many different two-topping pizzas are possible?
4.52 Manufacturing and Product Development Valassis, b. Suppose that all of the pizzas are equally likely. What is
a marketing services company, offers a cafeteria-style benefit the probability that the next pizza ordered has at least one
program; an employee may select three benefits from five. The meat topping?
146 C hapter 4 Probability
c. What is the probability that the next pizza ordered does Consider the following events.
not have anchovies? A 5 {Magazine, Newspaper}
d. Suppose one more large pizza choice is added: plain B 5 {TV, Radio, Internet}
cheese with no toppings. Answer parts (b) and (c) with C 5 {Magazine, Newspaper, Internet, Billboard}
this added assumption. Find the following probabilities.
4.56 Demographics and Population Statistics The a. P(A), P(B), P(C)
following table lists the proportion of employed people by b. P(A c B), P(A d B), P(B d C)
each major industry in Japan as of December 2012.5 c. P(A9), P(Ar d C), P(A d B d C)
d. P(Br d Cr), P[(B c C)9]
Major industry Proportion
4.59 Sports and Leisure The Florida Cash 3 daily midday
Agriculture and forestry 0.0328
lottery number consists of three digits, each 0–9.
Construction 0.0845 a. How many possible midday numbers are there?
Manufacturing 0.1722 b. If all of the midday numbers are equally likely, find the
Information and communications 0.0330 probability that all three digits are the same.
Transport and postal activities 0.0585 c. If all of the midday numbers are equally likely, find the
Wholesale and retail trade 0.1786 probability that all three digits are 8s or 9s.
Scientific, professional, technical 0.0367 d. There is an evening number, also consisting of three
digits, 0–9. If all of the midday and evening numbers are
Accommodations 0.0666
equally likely, what is the probability that the two
Personal services 0.0412
numbers are the same?
Education 0.0511
4.60 Marketing and Consumer Behavior The following
Medical 0.1247
table lists the most popular U.S. convention centers.7 Suppose
Services 0.0797
the probability given represents the likelihood that a randomly
Government 0.0404 selected U.S. convention will be held at that site.
Suppose an employed person in Japan is selected at random. Site Probability
a. What is the probability that the person works in
Orlando 0.310
construction or manufacturing?
b. What is the probability that the person does not work in Chicago 0.225
wholesale or retail trade? Las Vegas 0.098
c. What is the probability that the person does not work in Washington, DC 0.075
education nor medical? Dallas 0.064
4.57 Public Health and Nutrition The number of reported Atlanta 0.055
cases of vaccine-preventable diseases in Canada in 2011 is Phoenix 0.033
given in the following table.6
Other 0.140
Disease Frequency
Hib meningitis 38 Suppose a convention is randomly selected. Consider the events:
Measles 759 A 5 {Convention in Orlando or Chicago}
Mumps 282 B 5 {Convention not in Washington, DC}
C 5 {Convention in Las Vegas}
Pertussis 676
Other 8 Find the following probabilities.
a. P(A), P(B), P(C)
Suppose a reported case is selected at random. b. P(A d B), P(A c C), P(A d C)
a. What is the probability that the case is measles? c. P(Ar c C), P(A c B c Cr)
b. What is the probability that the case is mumps or pertussis? 4.61 Technology and the Internet Tablet computers have
c. What is the probability that the case is not Hib meningitis? become very popular and fill a gap between smartphones and PCs.
4.58 Marketing and Consumer Behavior A marketing A recent survey indicated that of those people who own tablets,
firm can place an advertisement using several media. The table 70% use the device to play games and 44% use the device to access
below shows the probability that a randomly selected person in a bank accounts.8 Suppose 30% do both—play games and access
targeted region will see the advertisement in the given medium. bank accounts—and suppose a tablet user is selected at random.
a. What is the probability that the tablet user plays games or
Medium Newspaper Radio Magazine TV accesses bank accounts?
Probability 0.15 0.10 0.08 0.30 b. What is the probability that the tablet user does not play
games nor access bank accounts?
Medium Internet Billboard Not seen
c. What is the probability that the tablet user only plays
Probability 0.12 0.05 0.20 games? Only accesses bank accounts?
4.3 Counting Techniques 147
A Closer L ok
1. You can picture (and even prove) this rule by drawing a tree diagram and counting the
number of paths from left to right.
2. To use this rule, think of each choice as a slot, or a position, to fill.
Solution
Step 1 This is a counting problem, and there are three slots to fill: receiver, speakers,
and Blu-ray player. We’ll assume that all components are compatible, and that
the choice of any one item does not depend on any other item.
Step 2 Here’s how to apply the multiplication rule:
7 3 12 3 9 5 756
Receiver Speakers Blu-ray player
There are 756 possible systems.
Solution
a. This is a counting problem. There are six slots to fill: three letters followed by three
numbers. There are 26 possible letters for each of the first three positions, and 10
The actual number of possible possible numbers for each of the last three positions. Use the multiplication rule.
license plates is smaller, because
26 3 26 3 26 3 10 3 10 3 10 5 17,576,000
some three-letter words aren’t
allowed. Letter Letter Letter Number Number Number
There are 17,576,000 possible different license plates.
4.3 Counting Techniques 149
b. If the license plate ends in 555, then each of the number positions is fixed; there is only
one choice. We are still free to choose any letter in each of the first three positions. The
multiplication rule still works.
26 3 26 3 26 3 1 3 1 3 1 5 17,576
Letter Letter Letter Number Number Number
There are 17,576 license plates that end in 555.
Solution
a. There are five slots to fill, one for each die. Use the multiplication rule.
Keeweeboy/Dreamstime.com
6 3 6 3 6 3 6 3 6 5 7776
Die 1 Die 2 Die 3 Die 4 Die 5
There are 7776 possible rolls, or outcomes, in the sample space.
b. There are only six possible Yahtzees: 11111, 22222, 33333, 44444, 55555, and 66666.
Because all the outcomes are equally likely (fair dice), the probability of rolling a
Yahtzee is
number of Yahtzees 6
P ( Yahtzee ) 5 5 5 0.0007716
number of different rolls 7776
Solution
a. There are three positions to fill, but the number of choices in the second slot depends
on the first choice and the number of choices in the third slot depends on the first two
choices. Even though we are drawing from the same, reduced, collection, we can still
use the multiplication rule.
There are 12 horses that could finish first. Once a first-place horse is selected, there are
only 11 left that could come in second. After a first and second-place horse are selected,
there are only 10 possible for third place. The multiplication rule is used here to count
the number of permutations.
12 3 11 3 10 5 1320
First Second Third
There are 1320 possible different finishes. N(S) 5 1320.
150 C hapter 4 Probability
b. Let A be the event horse 4 or 5 wins the race. We’ll assume that all of the outcomes are
equally likely so that P(A) 5 N(A) / N(S) 5 N(A) / 1320.
There are two choices for first place (horse 4 or 5). There are now 11 choices for
second place (the horse not selected for first, plus the remaining 10), and 10 choices
for third place.
The multiplication rule is used to find the number of outcomes in A:
2 3 11 3 10 5 220
First Second Third
Finally, P(A) 5 220 / 1320 5 0.1667.
c. The word not suggests the use of a complement, but a direct approach may be easier.
Let the event B 5 horse 7 does not finish first, second, or third. P(B) 5 N(B) / N(S) 5
N(B) / 1320.
Use the multiplication rule again to count the number of outcomes in the event B. We
do not want horse 7 in the top three. That leaves 11 possible horses for first place, 10
for second, and 9 for third.
11 3 10 3 9 5 990
First Second Third
P(B) 5 990 / 1320 5 0.75.
Definition
For any positive whole number n, the symbol n! (read “n factorial”) is defined by
n! 5 n ( n 2 1 )( n 2 2 ) c( 3 )( 2 )( 1 )
In addition, 0! 5 1 (0 factorial is 1).
A Closer L ok
1. To find n!, just start with n, multiply by (n 2 1), then (n 2 2), . . . , down to 1. For
example,
7! 5 ( 7 )( 6 )( 5 )( 4 )( 3 )( 2 )( 1 ) 5 5040
10! 5 ( 10 )( 9 )( 8 )( 7 )( 6 )( 5 )( 4 )( 3 )( 2 )( 1 ) 5 3,628,800
2. Factorials get really big, really fast. Try finding 50!. If you absolutely have to find a
large factorial, then you should probably use a good calculator or computer.
Definition
Given a collection of n different items, an ordered arrangement, or subset, of these items is
called a permutation. The number of permutations of n items, taken r at a time, is given by
nPr 5 n ( n 2 1 )( n 2 2 ) c 3 n 2 ( r 2 1 ) 4
4.3 Counting Techniques 151
A Closer L ok
1. All n items must be different in order for this formula to be used.
2. A distinguishing characteristic of a permutation is that order matters. For example, if
the outcome AB is different from the outcome BA, that suggests a permutation. Sup-
pose an experiment consists of selecting two students from a class of 35. The first one
selected will be the president and the second will be the vice president. Order certainly
matters here; we will be counting permutations. If the two students selected will form
a committee, however, then the order of selection does not matter. Counting in this
case involves a combination, which will be introduced a little later.
3. Here is an example to justify this formula.
( 9 )( 8 )( 7 )( 6 )( 5 )( 4 )( 3 )( 2 )( 1 )
5 ( 12 )( 11 )( 10 ) 3 Multiply by 1 in a nice form.
( 9 )( 8 )( 7 )( 6 )( 5 )( 4 )( 3 )( 2 )( 1 )
( 12 )( 11 )( 10 )( 9 )( 8 )( 7 )( 6 )( 5 )( 4 )( 3 )( 2 )( 1 )
5 Rewrite as one fraction.
( 9 )( 8 )( 7 )( 6 )( 5 )( 4 )( 3 )( 2 )( 1 )
12! 12! n!
5 5 5 Definition of factorial.
9! ( )
12 2 3 ! ( n 2 r)!
Solution
If you compute nPr by hand, there Step 1 There are n 5 10 items, we need to choose r 5 6, and the order in which the soda
is always a lot of canceling. nPr is a is arranged matters. For example, if capital letters represent soda types, then the
count, so the answer has to be an arrangement ABCDEF is different from ABCEDF. We must count the number of
integer. permutations of 10 items, taken 6 at a time.
10! 10!
Step 2 10P6 5 5 Definition of nPr, using factorials.
( 10 2 6 ) ! 4!
( 10 )( 9 )( 8 )( 7 )( 6 )( 5 )( 4 )( 3 )( 2 )( 1 )
5 Definition of factorial.
( 4 )( 3 )( 2 )( 1 )
5 ( 10 )( 9 )( 8 )( 7 )( 6 )( 5 ) 5 151,200 Cancel; multiply.
There are 151,200 ordered arrangements of soda types in the vending machine.
Figure 4.12 shows a technology solution.
Count the number of There are 3024 different ordered arrangements of four episodes.
arrangements in which the final
b. Let A 5 the last recording selected is the season finale. There are four positions to fill,
recording is selected last, and
divide this count by the total
but the last slot is fixed (with the season finale). The first three positions can be filled
number of ordered arrangements by any of the remaining eight recordings, in any order.
6 8P3
8 3 7 3 6 3 1 5 336
Rec 1 Rec 2 Rec 3 Rec 4
336
P(A) 5 5 0.1111
3024
Figure 4.13 shows a technology solution.
Definition
Figure 4.13 Find the number of
permutations and divide by the Given a collection of n different items, an unordered arrangement, or subset, of these
total number of outcomes. items is called a combination. The number of combinations of n items, taken r at a time,
is given by
n n! nPr
nCr 5a b5 5
r r! ( n 2 r ) ! r!
A Closer L ok
n
1. a b is read as “n choose r.”
r
2. To find nCr from nPr we need to collapse all ordered arrangements of the same r items
into one possible outcome. Dividing by r! does this because every unordered set of r
distinct items can be arranged in r! ways.
3. If you have to calculate nCr by hand, there is always a lot of cancellation. The final
answer must be an integer because it is a count.
Solution
Step 1 There are n 5 20 prospective jurors, and we need to choose r 5 12, without
regard to order. A jury is an unordered arrangement of 12 people. We need to
count the number of combinations of 20 items, taken 12 at a time.
4.3 Counting Techniques 153
20
Step 2 20C12 5a b
12
20! 20! n
5 5 Definition of a b.
12! ( 20 2 12 ) ! 12!18! r
( 20 )( 19 )( 18 )( 17 )( 16 )( 15 )( 14 )( 13 )
5 5 125,970 Cancellation; computation.
8!
SOLUTION TRAIL
There are 125,970 ways to select a jury of 12 from a pool of 20 candidates. Figure 4.14
Figure 4.14 The TI-84 Plus C shows a technology solution.
combination function.
Example 4.26 Hardwood Floors
Lumber Liquidators ships 214 -inch solid oak wood flooring in cartons containing 20
Solution Trail 4.26 square feet. Suppose that two cartons in a shipment of 11 contain defective pieces. An
installer randomly selects five cartons.
K e y w ords
n Randomly selects a. What is the probability that there are no defective pieces in any of the five cartons
selected?
Tr a nsl at i o n
b. What is the probability that the installer picks exactly one carton that contains defective
n Equally likely outcomes pieces?
C onc e p ts
n Probability of an event means Solution
counting
There are n 5 11 cartons, and we need to choose r 5 5, without regard to order.
n Combinations
V isi on
11 11! 11! ( 11 )( 10 )( 9 )( 8 )( 7 )
11C5 5a b5 5 5 5 462
The order in which the cartons 5 5! ( 11 2 5 ) ! 5!6! 5!
are selected does not matter. To
There are 462 outcomes in the sample space, all equally likely.
find the number of outcomes in
the sample space, count a. Let A 5 select no cartons that contain defective pieces. Count the number of ways to
combinations. To count the select no cartons that contain defective pieces. This is the same as choosing five good
number of outcomes in each cartons. There are nine good cartons, so we count the number of ways to select five
event, use the multiplication rule cartons from the nine good cartons, without regard to order.
and the formula for nCr.
9 9! 9! ( 9 )( 8 )( 7 )( 6 )
N(A) 5 a b 5 5 5 5 126
5 5! ( 9 2 5 ) ! 5!4! 4!
N(A) 126
P(A) 5 5 5 0.2727
N(S) 462
b. Let B 5 select one carton that contains defective pieces (and, therefore, four good
cartons). To find the number of outcomes in B, there are two cases to consider: the
number of ways to select one bad carton, and the number of ways to select four
good cartons.
2 9
a b 3 a b 5 2 3 126 5 252
1 4
c c
The number of ways to select The number of ways to select
one bad carton from four good cartons from
two, without regard to order nine, without regard to order
N(B) 252
P(B) 5 5 5 0.5455
N(S) 462
Figure 4.15 Probability
calculations. Figure 4.15 shows a technology solution.
Technology Corner
Procedure: Compute permutations and combinations.
Reconsider: Examples 4.23 and 4.25, solutions, and interpretations.
TI-84 Plus C
There are built-in functions to compute permutations, combinations, and even factorials.
1. On the Home screen, enter the value of n.
2. Select MATH ; PROB ; nPr or MATH ; PROB ; nCr.
3. Enter the value of r. See Figures 4.13 and 4.14.
Minitab
Use the functions Permutations and Combinations in either a Session window or in Calc; Calculator. See Figure 4.16.
Excel
Use the function PERMUT to compute permutations and the function COMBIN to compute combinations. Use intermediate
results to compute probabilities. See Figure 4.17.
Practice work. Three of the 14 do not have their union card, and 6
carpenters will be selected at random for construction jobs.
4.73 Find the number of permutations indicated. a. What is the probability that all six carpenters selected will
a. 8P4 b. 11P7 c. 12P4 have their union card? Write a Solution Trail for this
d. 10P10 e. 10P1 f. 10P0 problem.
g. 9P2 h. 20P2 i. 100P2 b. What is the probability that exactly one carpenter selected
4.74 Find the number of combinations indicated. will not have a union card?
9 9 14 c. What is the probability that at least one carpenter selected
a. a b b. a b c. a b will not have a union card?
5 4 7
10 10 10 4.83 Fuel Consumption and Cars Suppose Geico offers
. a b
d e. a b f. a b
10 1 0 automobile insurance with specific levels of coverage according
12 16 20 to the table below.
. a b
g h. a b i. a b Coverage levels
3 7 18
4.75 How many permutations of the letters in the word Medical: $10,000; $20,000; $50,000; $100,000
HISTOGRAM are possible? Bodily injury liability: $50,000; $100,000
Property damage liability: $25,000; $50,000; $100,000
4.76 A businessman’s outfit consists of a pair of pants, a shirt,
Uninsured motorists: $50,000; $100,000; $200,000
and a tie. Suppose he can choose from among 5 pairs of pants,
8 shirts, and 15 ties. Comprehensive: $250,000; $500,000; $1,000,000
a. How many different outfits are possible?
b. Suppose a winter outfit includes a sweater and he can Suppose an automobile policy must have all five coverages.
select one of 7 sweaters. Now how many different winter a. How many different automobile policies are possible?
outfits are possible? b. How many policies have comprehensive coverage of at
least $500,000?
4.77 A disc jockey has 20 songs to choose from but can play only c. How many policies have bodily injury liability and
7 in the next half-hour. How many different playlists are possible? property damage liability of $100,000?
4.78 A grocery store has 6 cashiers on duty, 10 baggers, and 4 4.84 Manufacturing and Product Development eBags
people who will help customers load their groceries into a car. offers backpacks in 5 styles, 3 sizes, and 10 colors.
How many different checkout crews are possible? a. How many different backpacks does the company offer?
b. Midnight blue and dark green are the two most popular
4.79 A television station is developing a new identifying
colors. How many different backpacks in these colors
three-note theme. How many different three-note themes are
does the company offer?
possible if there are 20 notes to choose from and no note can be
c. The urban-style backpack is the least popular. If the
repeated?
company eliminates this style, how many different
4.80 A small basket contains 17 good apples and 3 rotten apples. backpacks will it offer?
a. How many different handfuls of six apples are possible?
4.85 Fuel Consumption and Cars A small tool-and-die
b. How many different handfuls of five good apples and one
shop manufactures kneuter valves. A shipment of 15 valves to a
rotten apple are possible?
Swedish automobile assembly plant contains three defective
c. How many different handfuls of three good apples and
values. Suppose the assembly plant randomly selects four
three rotten apples are possible?
valves from the shipment.
a. What is the probability that all four valves will be
Applications
defect-free?
4.81 Manufacturing and Product Development Suppose b. What is the probability that the plant will select all three
Target sells a combination lock, which is really a permutation defectives?
lock, with 40 numbers, 0 to 39. The combination for each lock c. What is the probability that the plant will select at least
is set at the factory and consists of three numbers. one defective?
a. How many lock combinations are possible if numbers can
4.86 Medicine and Clinical Studies A physician routinely
be repeated?
visits a local nursing home on Thursday mornings to examine
b. If all lock combinations are equally likely, what is the
patients. Suppose the facility has 20 residents, but the physician
probability of selecting a lock with only single-digit
SOLUTION TRAIL
only has time to check 8. The supervisor places 8 random
numbers in the combination?
patients on an ordered list and presents the schedule to the
c. Answer parts (a) and (b) if the lock combination must be
physician.
three different numbers.
a. How many different schedules are possible?
4.82 Public Policy and Political Science Suppose 14 b. If there are 15 women and 5 men in the facility, what is the
carpenters report to the union hall hoping for a chance to probability that all appointments will be with women?
156 C hapter 4 Probability
4.87 Psychology and Human Behavior A telemarketer has sink consisting of 8 six-inch-square ceramic tiles decorated
12 people on his contact list. Suppose he will randomly select 8 with different botanical herbs. The tiles will be installed in a
people to call during the next shift. custom-made wooden panel. The tile supplier has 12 different
a. How many different calling schedules are possible? herb designs to choose from, and the builder selects 8 of these
b. Suppose only 2 of the 12 will definitely purchase the 12 at random. Suppose the order in which the tiles are arranged
product when contacted. What is the probability that these on the splashguard does not matter.
2 people will be the first 2 called?
SOLUTION TRAIL
a. Two of the 12 herb tiles contain a blue tint that matches
c. Suppose another 2 of the 12 will ask to be placed on the the kitchen color scheme. What is the probability that
do-not-call list when contacted. What is the probability these 2 tiles will be included in the splashguard?
that these 2 people will not be called? b. The family actually grows 5 of the 12 herbs in a backyard
garden. What is the probability that all 5 of these will be
4.88 Sports and Leisure In preparation for the coming
included on the splashguard?
season, a bass fisherman decides to buy 5 random lures
out of the 10 new ones in the local tackle shop. 4.93 Travel and Transportation A PennDOT road
a. How many different collections of 5 new lures are possible? line-painting crew consists of a foreman, a driver, and a painter.
b. Suppose 1 of the 10 lures is a Crazy Crawler. What is the Suppose a supervisor is preparing the schedule to paint lines on
probability that the fisherman will not select this lure? roads in Johnstown and 10 foremen, 15 drivers, and 17 painters
Write a Solution Trail for this problem. are available.
c. Suppose 3 of the 10 are Excalibur lures. What is the a. How many different crews are possible?
probability that at least 1 of the 5 selected will be an b. Suppose the crews are selected at random, and there is
Excalibur lure? one foreman who has a severe personality conflict with
4.89 Classic Vinyl A music collector has 15 unopened
one driver. What is the probability that neither of these
classic rock albums in her collection. Suppose she decides individuals will be on the road painting crew?
to select three to sell at an upcoming auction. c. Eight of the painters have been cited by a supervisor for
a. How many different ways are there to select three albums improper painting. What is the probability that the crew
from her collection? will include one of these painters?
b. Suppose the albums are selected at random and five are 4.94 Education and Child Development A university
by the Beatles. What is the probability that all three library is preparing a display case of books written by faculty
selected will be by the Beatles? members. There are 25 new faculty books, but there is room
c. What is the probability that none of the three will be by for only 10 in the display case. Suppose 10 books are selected
the Beatles? at random.
d. Suppose that one album is by the Doors. What is the a. How many different faculty book collections can be
probability that this album is not selected? displayed?
4.90 Business and Management The Gagosian art gallery b. If 15 of the new books are written by faculty members
in New York City has 20 stored paintings but has just made room from the College of Science and Technology, what is the
to display several of them. Seven paintings will be randomly probability that all 10 displayed books are written by
selected and offered to the public for sale. faculty members from this college?
a. How many different collections of 7 paintings are c. If none of the 10 displayed books is written by faculty
possible? members from the College of Science and Technology, is
b. Suppose 10 of the 20 stored works are by the same local there any evidence to suggest the selection process was
artist. What is the probability that all 7 of the selected not random? Justify your answer.
paintings will be by this artist? 4.95 Psychology and Human Behavior In a family with
c. The featured room in the gallery receives the most attention, five children, two of the five are selected at random each
and the order in which the paintings are displayed in this evening to do the dishes. The first one selected washes, and the
room is related to buyer interest. Suppose the 7 selected second one dries.
paintings will be placed in this featured room. How many a. How many different wash–dry crews are possible?
different arrangements are possible? b. Suppose there are two girls and three boys in the family.
4.91 Economics and Finance The purchasing agent for a If the two girls are selected to wash and dry, is there any
state office building placed a call for bids on replacing the entry evidence to suggest the selection process was not
doors. Suppose that eight sealed bids are received by the random? Justify your answer.
deadline. The bids will be opened in random order.
a. In how many different ways can the bids be opened? Extended Applications
b. What is the probability that the lowest bid will be opened
4.96 Manufacturing and Product Development A
last?
remote-control garage door opener has a series of 10 two-
4.92 Marketing and Consumer Behavior In remodeling a position (0 or 1) switches used to set the access code. The
kitchen, a builder decides to place a splashguard behind the code is initially set at the factory, and the switch sequence on
4.3 Counting Techniques 157
the remote control and the opener must match in order to use Challenge
the system.
a. How many different access codes are possible? 4.100 The Complement Rule Reconsider Example 4.22.
b. If all access codes are equally likely, what is the Verify the probability in part (c) using the complement rule.
probability that a randomly selected system will have a 4.101 Combination Patterns Find the following sums.
code with exactly one 0?
2 2 2
c. To increase security and ensure that customers will have a. a b 1 a b 1 a b
different access codes, new systems have 10 three- 0 1 2
position switches (0, 1, or 2). Answer parts (a) and (b) 3 3 3 3
b. a b 1 a b 1 a b 1 a b
using the new system. 0 1 2 3
4.97 Psychology and Human Behavior An annual family 4 4 4 4 4
c. a b 1 a b 1 a b 1 a b 1 a b
picture following Thanksgiving dinner is arranged with all 10 0 1 2 3 4
family members in a row in front of a fireplace.
n n n n
a. How many different arrangements of family members are d. a b 1 a b 1 a b 1 c1 a b
possible? 0 1 2 n
b. Suppose the family includes one set of twins, and all 4.102 Sports and Leisure Consider a regular deck of
arrangements are equally likely. What is the probability 52 playing cards. For a five-card poker hand, find the
that the twins will be in the middle two places (positions probability of:
5 and 6)? a. One pair.
c. What is the probability that the twins will be side by side b. Two pairs.
in the picture? c. Three of a kind: three cards of the same rank and two
d. Suppose the family includes five males and five females. others of different ranks, for example, JJJ74.
What is the probability that the picture arrangement will d. A straight: five cards in sequence; the ace can be either
alternate male, female, male, female, etc., or female, high or low.
male, female, male, etc.? e. A flush: five cards of the same suit.
4.98 Public Policy and Political Science A special commit- 4.103 Psychology and Human Behavior How many
tee on community development has four members from the different ways are there to arrange n people at a round table?
town council. The full town council has 14 members, six (Hint: A simple rotation of a seating plan, shifting each person
Democrats and eight Republicans. around the table but keeping the order the same, is not a
a. How many different committees on community devel- different arrangement.)
opment are possible?
b. Suppose the committee members are selected at random. 4.104 Travel and Transportation Suppose there are n
What is the probability of a committee consisting of all items of which n1 are of one type, n2 are of a second type, . . . ,
Republicans? and nk are of the kth type, and n1 1 n2 1 c1 nk 5 n. The
c. Suppose every member of the committee selected is a number of unordered arrangements of the n items is a general-
Democrat. Do you believe the selection process was ized combination given by
random? Justify your answer. n n!
a b5
4.99 Sports and Leisure Texas hold ’em poker has become n1 n2 # # # nk n1! n2! cnk
very popular in gambling casinos and is seen on ESPN and the
(Think about arranging a string of colored Christmas
Travel Channel. In November 2012, Greg Merson won the
tree lights.) Suppose the Amtrak Auto Train from
World Series of Poker in Las Vegas and a cool $8.53 million.
Washington, DC, to Florida has 10 sleeper cars, 2 diner
The game is played with a standard 52-card deck, and starts
cars, and 14 car carriers. Discounting the engine and
with each player being dealt two (random) cards face down
caboose, how many different arrangements of cars in the
(hole cards). There is a round of betting, the dealer then flips
train are there?
three cards face up (the flop), betting, one card is flipped (the
turn), betting, a fifth card is flipped (the river), and more 4.105 Public Policy and Political Science The U.S.
betting. Let’s focus on the two hole cards, called a (pre-flop) Senate Committee on Homeland Security and Governmental
hand, in this problem. Affairs has 16 members. The full Senate has 53 Democrats,
a. How many (two-card, pre-flop) hands are possible in 45 Republicans, and 2 Independents.
Texas hold ’em? a. How many different 16-member Senate committees are
b. What is the probability that a pre-flop hand consists of possible?
two aces? b. If the committee members are selected at random, what is
c. What is the probability that a pre-flop hand consists of a the probability of a committee consisting of all
pair, that is, two cards of the same rank? Democrats?
d. What is the probability that a pre-flop hand consists of c. What is the probability that the committee consists of
two cards of the same suit? 14 Democrats and 2 Independents?
158 C hapter 4 Probability
Knowing something extra may change the probability assignment. How do we use
any added information to compute the (possibly) new probability? Consider the next
example.
Suppose someone rolls the die, covers it with her hands, peeks at the number, and reports,
“I rolled an odd number.” With this added information, the probability of a 1 is now
P ( A 0 B ) 5 13 . This conditional probability is reasonable because now we only have to
consider three possibilities—that is, we have reduced the sample space from six outcomes
to three, and the number of outcomes in A is 1.
The idea of reducing, or shrinking, the sample space is key to calculating conditional
probabilities. The definition of conditional probability, and some justification for it, are
given below.
Definition
Suppose A and B are events with P(B) > 0. The conditional probability of the event A
given that the event B has occurred, P(A 0 B), is
What goes wrong with this P(A d B)
P(A 0 B) 5
definition if P(B) 5 0? P(B)
A Closer L ok
1. The unconditional probability of an event A can be written as
P(A) P(A) probability of the event A
P(A) 5 5 5
1 P(S) probability of the relevant sample space
We use this same reasoning to find P(A 0 B).
2. Given that B has occurred, the relevant sample space has changed. It is reduced from
S to B. (See Figure 4.18.)
3. Given that B has occurred, the only way A can occur is if A d B has occurred, because
the sample space has been reduced to B.
4. P(A 0 B) is the probability that A has occurred, P ( A d B ) , divided by the probability of
the relevant sample space P(B).
S
A B
In the following example, the formula for conditional probability is used to confirm our
intuitive answer to the die problem above.
Solution
Step 1 We will need the following probabilities:
A Closer L ok
Here are some facts about union, intersection, and conditional probability to help trans-
late and solve many of the problems that follow.
1. P ( A c B ) 5 P ( B c A ) .
This is always true, because A c B 5 B c A (all the outcomes in A or B or both).
2. P ( A d B ) 5 P ( B d A ) .
This is also always true, because A d B 5 B d A.
When are these two conditional 3. P ( A 0 B ) 2 P ( B 0 A ) .
probabilities equal? These two probabilities could be equal, but in general they are different.
It’s all right to switch A and B with union and intersection, but not with conditional
probability.
4. The keywords given and suppose often signal partial information and, therefore, indi-
cate a conditional probability question.
Assume that these results are representative of the entire population of Chicago, so the
relative frequency of occurrence is the true probability of the event. A person from
Chicago is randomly selected.
a. Find the probability that the person rates the prices medium.
b. Find the probability that the person rates the food 2 stars.
c. Suppose the person selected rates the prices high. What is the probability that he rates
the restaurants 1 star?
d. Suppose the person selected does not rate the food 4 stars. What is the probability that
she rates the prices high?
4.4 Conditional Probability 161
Solution
Step 1 This is an unconditional probability question, asking only about the event M.
Compute the relative frequency of occurrence of M, that is, the proportion of
responses that rated the restaurant medium priced.
50 1 80 1 95 1 25 250
P(M) 5 5 5 0.4902
510 510
Step 2 This is just another unconditional probability question. Find the relative fre-
quency of occurrence of 2 stars.
35 1 80 1 5 120
P(B) 5 5 5 0.2353
510 510
Step 3 This is a conditional probability question; the key word is suppose. The
given information is rates prices high. We need the probability of the event
A given that the event H has occurred. Using the formula for conditional
probability,
P(A d H) 25/510 25 # 510 25
P(A 0 H) 5 5 5 5 5 0.2500
P(H) 100/510 510 100 100
This probability can also be obtained directly by reducing the sample space in
the two-way table. The shaded row is the reduced sample space.
There are 440 (5 95 1 120 1 225) outcomes in the reduced sample space, and
70 (5 25 1 5 1 40) people rated the prices high.
70
P ( H 0 Dr ) 5 5 0.1591
440
Try It Now Go to Exercise 4.124
162 C hapter 4 Probability
Steps for Calculating Example 4.32 Sex, Marital Status, and the Census
a Conditional The U.S. Constitution directs the government to conduct a census of the population every
Probability 10 years. Population totals are used to allocate congressional seats, electoral votes, and
funding for many government programs. The U.S. Census Bureau also compiles informa-
To find the conditional probabil- tion related to income and poverty, living arrangements for children, and marital status.
ity of the event A given that the The following joint probability table lists the probabilities corresponding to marital status
event B has occurred: and sex of persons 18 years and over.14
a. Calculate P(B) and P(A d B).
Marital status
P(A d B)
b. Find P ( A 0 B ) 5 Never Divorced or
P(B) Married (R) married (N) Widowed (W) separated (D)
Male (M) 0.282 0.147 0.013 0.043 0.485
Sex
P(F d R) 0.284
P(F 0 R) 5 5 5 0.502
(
P R ) 0.566
A Closer L ok
1. In Example 4.32,
P(R) 5 P(R d M) 1 P(R d F)
5 P ( R d M ) 1 P ( R d Mr )
In general, for any two events A and B,
P ( A ) 5 P ( A d B ) 1 P ( A d Br )
This decomposition technique is often needed in order to find P(A). The Venn diagram
in Figure 4.19 illustrates this equation.
The events B and B9 make up the entire sample space: S 5 B c Br.
S
A B
A B
Try to draw the Venn diagram to 2. Suppose B1, B2, and B3 are mutually exclusive and exhaustive:
illustrate this equality. B1 c B2 c B3 5 S. For any other event A,
P(A) 5 P(A d B1) 1 P(A d B2) 1 P(A d B3)
164 C hapter 4 Probability
4.117 Consider an experiment and three events A, B, and C Find the following probabilities.
defined in the Venn diagram below. a. P(A), P(B), and P(C).
b. P(A d B) and P(B d C).
S c. P(A 0 B), P(B 0 C), and P[(A d B) 0 C].
A d. P(0 0 C9), P(7 0 C), and P[(A c B) 0 C9).
C e. P(2 0 B), P(3 0 B), and P(7 0 B).
1 2 6 7
8 Applications
5
3 4.119 Sports and Leisure Consider a regular 52-card deck
9
4 of playing cards. Suppose two cards are drawn at random from
B the deck without replacement.
a. What is the probability that the second card is an ace,
given that the first card is a king?
The following table gives the probability of each outcome. b. What is the probability that the second card is an ace,
given that the first card is an ace?
Outcome 1 2 3 4 5 c. What is the probability that the second card is a heart,
given that the first card is a heart?
SOLUTION TRAIL
Probability 0.01 0.12 0.11 0.10 0.15 d. Suppose two cards are drawn at random from the deck
with replacement. What is the probability that the second
Outcome 6 7 8 9 card is a heart, given that the first card is a heart?
Probability 0.25 0.14 0.08 0.04 4.120 Economics and Finance In the United States,
there is some evidence to suggest a link between people
Find the following probabilities. who participate in an office football pool and those who cheat
a. P(A), P(B), and P(C). on their income taxes. Suppose 25% of all people participate
Why don’t these three probabilities sum to 1? in an office football pool. The IRS estimates that 15% of
b. P(A d B) and P(B d C). all people participate in an office football pool and cheat
c. P(B 0 C) and P(C 0 B). on their income tax return. Suppose a person is randomly
d. P(A 0 B9), P(C 0 A9), and P[1 0 (A c B)9]. selected. If the person is known to participate in an
e. P(3 0 B), P(4 0 B), and P(5 0 B). office football pool, what is the probability that she cheats
Why do these three probabilities sum to 1? on her income tax return? Write a Solution Trail for this
4.118 Consider an experiment and three events A, B, and C
problem.
defined in the Venn diagram below. 4.121 Trail Users According to Trail Count, the annual
survey of San Jose’s off-street bicycle and pedestrian trail users,
S approximately 24% use trails daily.15 Suppose 12% use trails
daily and exercise, and 8% use trails daily for commuting.
A B
Suppose a trail user is randomly selected.
2 a. Given that the person uses trails daily, what is the
1 3
probability that he uses the trails for exercise?
b. Given that the person uses trails daily, what is the
SOLUTION TRAIL
5 6
4 7 probability that she uses the trails for commuting?
c. If the person uses trails daily, what is the probability that
he does not use the trial for commuting?
8 9
0 4.122 Travel and Transportation In a particularly
C rural area in upstate New York, 80% of all people use
chains on their car tires (for winter driving), 60% carry a
snow shovel in their car and use chains, and 15% carry a
The following table gives the probability of each outcome. shovel but do not use chains. Suppose a person from this area
is selected at random.
Outcome 0 1 2 3 4 a. If the person uses chains, what is the probability that he
carries a shovel? Write a Solution Trail for this problem.
Probability 0.135 0.130 0.142 0.128 0.147 b. Given that the person does not use chains, what is the
probability that she carries a shovel?
Outcome 5 6 7 8 9
4.123 Biology and Environmental Science The Florida
Probability 0.083 0.072 0.063 0.055 0.045 Sea Grants Agents, Florida Fish and Wildlife Commission, and
166 C hapter 4 Probability
volunteer divers work together in the Great Goliath Grouper Revenue source
Count. The table below lists the probability of each size grouper Gambling Liquor stores Other
in two regions.16
18–21 33 68 12
Length 21–30 55 121 50
Age
, 3 ft 3–5 ft . 5 ft 30–45 117 109 132
Sarasota 0.059 0.347 0.079 $ 45 158 110 90
Region
Monroe 0.030 0.277 0.208
Assume this table is representative of the entire state’s popula-
Suppose a grouper from one of these regions is selected at tion and suppose a resident is randomly selected.
random. a. What is the probability that the resident is in favor of
a. Suppose the grouper is 3–5 feet long. What is the legalized gambling?
probability that it is from Monroe? b. Suppose the person is in favor of state-owned liquor stores.
b. Suppose the grouper is from Sarasota. What is the What is the probability that the person is 30–45 years old?
probability that the grouper is longer than 5 feet? c. Suppose the person selected is under 21. What is the
c. Suppose the grouper is longer than 3 feet. What is the probability that the person is in favor of some other option?
probability that it is not from Sarasota? d. Suppose the resident is not in favor of legalized gambling.
What is the probability that the respondent is 21–30 years old?
4.124 Sports and Leisure An assistant football coach at a
e. If the person selected is under 21 or at least 45, what is
Division II school helps his team prepare for the next opponent
the probability that this resident is in favor of state-owned
by charting plays. He looks at game films and records the down
liquor stores?
distance (first, second, or third down, and a categorical measure
of the number of yards needed for a first down) and type of play 4.126 Marketing and Consumer Behavior Because
(rush or pass), to look for tendencies. The two-way table below wireless technology has become reliable and prevalent, many
shows the number of plays that fall into each category. (Fourth- people are terminating their landline service. According to a
down plays are not charted, because they usually involve a punt survey by the Centers for Disease Control and Prevention,
or a field-goal attempt.) 35.8% of households have wireless phones only. In addition,
52.5% have a landline with wireless, 9.4% have a landline with-
Down/distance out wireless, 0.2% have a landline with unknown wireless, and
2nd 2nd 3rd 3rd 2.1% are phoneless.18 Suppose the survey indicated that 22.2%
1st short long short long were wireless-only users and in excellent health, 9.3% were
Rush 126 35 46 65 12 wireless-only users and in adequate health, and 4.3% were
Play wireless-only users and in poor health. Suppose a person who
Pass 87 16 67 23 59
completed the survey is selected at random.
a. Suppose the person is a wireless-only user. What is the
Suppose this table represents the true tendencies of the next
probability that she is in excellent health?
opponent.
b. Suppose the person is a wireless-only user. What is the
a. What is the probability that the opponent rushes
probability that she is in poor health?
the ball?
c. Suppose the person is a wireless-only user. What is the
b. Suppose the opponent has a first down. What is the
probability that he is not in poor health?
probability of a pass?
c. Suppose it is a first or second down. What is the 4.127 Marketing and Consumer Behavior The manager at
probability that the opponent rushes the ball? Of The Land Gallery in Red Lake Falls, Minnesota, is looking
d. Suppose the opponent passes the ball. What is the at many ways to increase pottery sales for next year. The
probability that it is a third down? probability that she will advertise more is 0.65, and the
probability of advertising more and increasing revenue is 0.35.
4.125 Public Policy and Political Science As a result of
a. Suppose the store manager decides to advertise more.
decreasing revenue, economic conditions, and deep budget defi-
What is the probability that revenue will increase?
cits, many states have tried to legalize gambling or, in states
b. If the store manager does advertise more, what is the
where gambling is already legal, expand casino operations. For
probability that revenue will not increase?
example, Washington state and Pennsylvania have joined
c. What is the probability of not advertising more and
multistate lotteries, several states have raised taxes on receipts
revenue increasing?
from riverboat gambling, and at least four states have had ballot
questions about starting new types of games.17 To measure 4.128 Psychology and Human Behavior A random sample
public opinion in Kansas, a random sample of residents was of adult drivers was obtained, 1000 men and 900 women. A
selected and each response was categorized according to survey showed that 640 men rely on GPS systems and 450
revenue preference and age. The results are given in the women rely on them.19 Suppose a person included in this
following two-way table. survey is randomly selected.
4.4 Conditional Probability 167
a. What is the probability that the person selected is a Suppose this table is representative of all Comcast user
woman and relies on GPS systems? customer satisfaction and a customer is selected at random.
b. Suppose the person selected is a man. What is the a. What is the probability that the customer is from the
probability that he relies on a GPS system? Midwest and customer service is fair?
c. Suppose the person selected relies on a GPS system. b. If the customer is from the North, what is the probability
What is the probability that the person is a woman? that customer service is excellent?
c. Suppose customer service is poor. What is the probability
Extended Applications that the customer is from the South?
d. If the customer service is good or fair, which region is the
4.129 Demographics and Population Statistics Accord-
customer most likely from? Justify your answer.
ing to the U.S. Census Bureau, 87.1% of people living in the
United States are native-born and 12.9% are foreign-born. In 4.132 Psychology and Human Behavior The following
addition, 10.16% are native-born and had no health insurance partial two-way table lists the number of adult criminal cases in
during the last year, and 3.78% are foreign-born and had no Canada by case type and sentence.22
health insurance during the last year.20 Suppose a person living
Type of sentence
in the United States is selected at random.
Conditional
a. Suppose the person is native-born. What is the probability Custody sentence Probation
that he had no health insurance during the last year?
b. Suppose the person is foreign-born. What is the probability Crimes against
that she had no health insurance during the last year? the person 16,067 2,465 55,006
c. Suppose the person is foreign-born. What is the probability Property
that he had health insurance during the last year? crimes 22,178 34,353 60,456
Criminal code
4.130 Biology and Environmental Science Homeowners Administration
who cultivate small backyard gardens are often worried about of justice 28,186 1,460 20,378 50,024
pests (for example, rabbits and groundhogs) ruining plants. Some Other criminal
gardeners protect their gardens with a fence, others spread chemicals code offenses 456 5,708 10,288
around the perimeter of the garden to keep animals away, and Criminal code
some do nothing. The joint probability table below shows the offenses (traffic) 7,387 751 7,141 15,279
relationships among these garden protection methods and success.
Other federal
Garden defense statute 7,980 2,732 10,534 21,246
Fence Chemicals Nothing 85,922 114,588
Pests 0.05 0.08 0.34
Result a. Complete the table.
No pests 0.30 0.20 0.03
b. Suppose the case was a crime against the person. What is
the probability that it resulted in custody?
c. Suppose the case results in probation. What is the
Suppose a backyard gardener is selected at random.
probability that it was a conviction for a criminal code
a. Suppose the garden had pests. What is the probability the
offense (traffic)?
gardener used nothing?
d. Suppose the case was a crime against the person. What is
b. Suppose the gardener used chemicals. What is the
the probability that it resulted in custody?
probability there were pests?
c. Given that the garden had no pests, which method of defense 4.133 Public Health and Nutrition The McPherson Middle
did the gardener most likely use? Justify your answer. School in Clyde, Ohio, is set to examine its school lunch program.
A survey of 2200 students asked students about their lunch type
4.131 Business and Management Bank of America, Time
and how they got to school in the morning. The following (partial)
Warner Cable, and Delta Airlines are some of the companies
two-way table is assumed to represent the entire student body.
ranked worst in customer service according to the American
Consumer Satisfaction Index.21 Suppose the table below Arrival mode
represents the results of a survey of customer satisfaction for Bus Car Walk
Comcast, another company on the Worst Customer Service list. Carries 466 142
Lunch
Customer service Buys 345 500 967
Excellent Good Fair Poor
970
North 0.102 0.059 0.062 0.004
Region
c. Suppose the student takes the bus to school. What is the a. What is the probability that the injury is to the eye and the
probability that the student buys lunch? shooter is a relative?
d. Suppose the student does not walk to school. What is the b. Suppose the injury was caused by a stranger. What is the
probability that the student carries a lunch? probability that the body part injured is an extremity?
e. If the student buys lunch, how did he or she most likely c. Suppose the injury is to the head/neck. What is the
get to school? probability that the shooter is a friend/acquaintance?
d. Suppose the injury is not to the eye. What is the
4.134 Medicine and Clinical Studies In the movie A
probability that the shooter is a relative?
Christmas Story, Ralph “Ralphie” Parker wanted an official
Red Ryder carbine-action 200-shot range model BB gun (with
a compass in the stock). Everyone in the movie (including Challenge
Santa) tells Ralphie he will shoot his eye out with this present.
According to the Centers for Disease Control and Prevention, 4.135 Public Policy and Political Science A survey
approximately 30,000 people visited an Emergency Room last of voters in a certain district asked if they favored a
year for BB gun accidents. The most common injuries were to return to stronger isolationism. The following three-way
the face, head, neck, and eye.23 In a survey of BB– and pellet table classifies each response by sex, political party,
gun–related injuries treated at hospitals, each injury was and response.
classified by primary body part injured and victim–shooter Male Female
relationship. The results are given in the table below.
Dem Rep Ind Dem Rep Ind
Victim-shooter relationship Yes 202 126 105 234 101 95
Other/
No 124 288 85 312 66 150
Friend/ shooter Not
Self acquaintance Relative Stranger not seen stated
Extremity 200 128 89 17 25 189 Suppose a random voter is selected from this district.
Body part injured
4.5 Independence
If extra information is given, In the last section we learned about conditional probability, that is, how knowing extra
sometimes we simply say, information may change a probability assignment. Often, however, additional informa-
“So what?” tion has no effect on the probability assignment. Consider the following examples.
Knowing extra information here does not change the conditional probability assignment.
Intuitively, the events C (contracting a cold) and E (exercising more) are unrelated, or
independent.
In these two examples, the occurrence or nonoccurrence of one event has no effect on
the occurrence of the other. In this case, the two events are independent.
Definition
Two events A and B are independent if and only if
P(A 0 B) 5 P(A)
If A and B are not independent, they are said to be dependent events.
A Closer L ok
One way to verify independent 1. If we know the events A and B are independent, then
events: Is P(A | B) 5 P(A)? If so,
P(A 0 B) 5 P(A) and P(B 0 A) 5 P(B).
then A and B are independent; if
not, they are dependent. Similarly, if either one of these equations is true, then the other is also true, and the
events are independent.
2. If A and B are independent events, then so are all combinations of these two events and
their complements.
Mathematical translation: If P(A 0 B) 5 P(A), then P(A 0 B9) 5 P(A), P(Ar 0 B) 5
P(A9), and P(Ar 0 Br) 5 P(A9).
3. Unfortunately, independent events cannot be shown on a Venn diagram. In problems
that involve independent events, we’ll have to translate the words into a probability
question and then use an appropriate formula.
4. It is reasonable to think of independent events as unrelated. One might conclude that
they are therefore disjoint. This is not true!
Suppose A and B are mutually exclusive and P ( A ) 2 0 (there is some positive proba-
bility associated with the event A). Then
P(A 0 B) 5 0 2 P(A)
The probability of A given B has to be 0, because A and B are disjoint. Once B occurs,
A cannot occur. Hence, disjoint events are dependent.
Stepped
STEPPED Tutorial
TUTORIALS
We can solve both of these equations for P ( A d B ) to obtain the following probability
Independence multiplication rule.
BOX PLOTS
and the
multiplication rule
$'%'&
# P(B 0 A) Always true.
5 P(A)
5 P(A) # P(B) Only true if A and B are independent.
A Closer L ok
1. The real skill in applying this rule is knowing which equality to use. The first two
equalities are always true. Use one of these only if A and B are dependent and you need
to find P(A d B). Read the problem carefully to determine which conditional and
unconditional probabilities are given.
If A and B are independent, use the third equality to compute the probability of inter-
section. The word independent will not always appear in the problem. It may be implied
or can be inferred from the type of experiment described.
2. If events are dependent, a modified tree diagram can be used to apply the probability
multiplication rule. In Figure 4.20 the probability of traveling along any branch is writ-
ten along the appropriate leg. Second-generation branch probabilities are conditional.
For example, P(C 0 A) (the probability of C given A) is the probability of taking path C,
given path A.
A P(D |
On this road map, to determine a A)
P(A)
final probability, we multiply
D P(A D) P(A) P(D | A)
probabilities along the way.
P(B) E P(B E ) P(B) P(E |B)
B)
P(E |
B P(F |
B)
All probabilities coming from a single node must sum to 1. To find the probability of
traveling along a complete path from left to right (equivalent to the probability of an
intersection), we multiply probabilities along the path.
A modified tree diagram is useful 3. The probability multiplication rule can be extended. For any three events A, B, and C:
here also. Try drawing one to P(A d B d C) 5 P(A) ? P(B 0 A) ? P(C 0 A d B).
illustrate this extended rule. 4. If the events A1, A2, . . . , Ak are mutually independent, then P(A1 d A2 d # # # d Ak)
5 P ( A1 ) # P ( A2 ) # # # P ( Ak ) .
In words, if the events are mutually independent, the probability of an intersection is
the product of the corresponding probabilities.
4.5 Independence 171
SOLUTION TRAIL There are lots of probability rules, formulas, and diagrams in this section. Here are
some examples (along with Solution Trails to help you translate the words into mathemat-
ics) to illustrate these concepts.
SOLUTION TRAIL
This is a correct application of the rule, but it doesn’t help in this problem,
because the probabilities on the right-hand side are not given. An accurate but
inappropriate use of the probability multiplication rule is evident because the
probabilities given in the problem and in the equality are mismatched. Simply
try the other equality.
K e y w ords
Example 4.36 It’s Made for Sleep
n Both mattresses will be
delivered on time
Better Bedding in East Hartford, Connecticut, claims that 99.4% of all its mattress deliv-
eries are on time. Suppose two mattress deliveries are selected at random.
Tr a nsl at i o n
a. What is the probability that both mattresses will be delivered on time?
n What is the probability of
the intersection of the event b. What is the probability that both mattresses will be delivered late?
mattress 1 is delivered on time c. What is the probability that exactly one mattress will be delivered on time?
and the event mattress 2 is
delivered on time? Solution
C onc e p ts a. Let Mi 5 mattress i is delivered on time; P(Mi) 5 0.994 (given).
n Probability multiplication rule Both mattresses delivered on time means mattress 1 is on time and mattress 2 is
V isi on on time.
Determine whether the events are P ( M1 d M2 ) 5 P ( M1 ) # P ( M2 ) Both mattresses on time; independent events.
independent, determine which
5 ( 0.994 )( 0.994 ) Probability of each mattress delivered on time.
probabilities are given, and use
the appropriate form of the 5 0.988036
probability multiplication rule.
The probability that both mattresses will be delivered on time is 0.9880.
172 C hapter 4 Probability
b. Both mattresses delivered late means mattress 1 is delivered late and mattress 2 is
delivered late. Delivered late is the complement of delivered on time.
5 ( 0.006 )( 0.006 )
5 0.000036
We don’t usually see this step, but there really is a union of two events in the back-
ground. [Notice the or separating (a) and (b) above.] These two events are disjoint, and
the probability of the union of disjoint events is the sum of the corresponding proba-
bilities.
5 P ( M1 d Mr2 ) 1 P ( Mr1 d M2 )
5 P ( M1 ) # P ( Mr2 ) 1 P ( Mr1 ) # P ( M2 ) Independent events.
5 0.011928
5 1 2 0.988072
5 0.011928
Example 4.37 Immunizations
Federal health officials have reported that the proportion of children (ages 19 to 35
months) who received a full series of inoculations against vaccine-preventable dis-
eases, including diphtheria, tetanus, measles, and mumps, increased up until 2006, but
has stalled since. The CDC reports that 14 states have achieved a vaccination coverage
rate of at least 80% for the 4:3:1:3:3:1 series.26 The probability that a randomly selected
toddler in Alabama has received a full set of inoculations is 0.792, for a toddler in
Georgia, 0.839, and for a toddler in Utah, 0.711.27 Suppose a toddler from each state
is randomly selected.
a. Find the probability that all three toddlers have received these inoculations.
b. Find the probability that none of the three has received these inoculations.
c. Find the probability that exactly one of the three has received these inoculations.
4.5 Independence 173
Solution
a. Define the following three events:
The probability that all three toddlers have received these inoculations is 0.4725.
b. None of the three has received inoculations means toddler A has not received the
inoculations and toddler G has not received the inoculations and toddler U has not
received the inoculations. Translate this sentence into mathematics using intersection
and complement.
P ( Ar d Gr d Ur ) Math translation: intersection.
# #
5 P ( Ar ) P ( Gr ) P ( Ur ) Independent events.
# #
5 3 1 2 P(A) 4 3 1 2 P(G) 4 3 1 2 P(U) 4 Complement rule.
5 0.0097
The probability that none of the three has received the inoculations is 0.0097.
c. To write a probability statement for exactly one has received the inoculations, ask
“How can that happen?” Toddler A has received the inoculations and toddlers G and U
have not, or toddler G has received the inoculations and toddlers A and U have not, or
toddler U has received the inoculations and toddlers A and G have not. Translate this
sentence into probability using intersection and complement.
P ( A d Gr d Ur ) 1 P ( Ar d G d Ur ) 1 P ( Ar d Gr d U )
Three ways exactly one toddler has received the inoculations.
5 P ( A ) # P ( Gr ) # P ( Ur ) 1 P ( Ar ) # P ( G ) # P ( Ur ) 1 P ( Ar ) # P ( Gr ) # P ( U )
Independent events.
5 ( 0.792 )( 0.161 )( 0.289 ) 1 ( 0.208 )( 0.839 )( 0.289 ) 1 ( 0.208 ) ( 0.161 )( 0.711 )
Known probabilities; complement rule.
5 0.0369 1 0.0504 1 0.0238
5 0.1111
The probability that exactly one toddler of the three has received the inoculations
is 0.1111.
n The words at least one means 5 1 2 3 1 2 P(M) 4 # 3 1 2 P(O) 4 # 3 1 2 P(Q) 4 Complement rule.
the event one, two, or three 5 1 2 ( 1 2 0.40 )( 1 2 0.43 )( 1 2 0.98 ) Use given probabilities.
drivers use winter tires. This is
the same as the complement of 5 1 2 ( 0.60 )( 0.57 )( 0.02 )
the event none of the drivers 5 1 2 0.0068 5 0.9932
uses winter tires
The probability that at least one driver uses winter tires is 0.9932.
Concepts
Challenge: Find this probability using a direct approach, without using the comple-
n Complement rule ment rule.
Vi sion Try It Now Go to Exercise 4.153
Define the event none of the
SOLUTION TRAIL
drivers uses winter tires and use
the complement rule to find the Example 4.39 A Traveling Salesperson
probability that at least one does. During frequent trips to a certain city, a traveling salesperson stays at hotel A 50%
of the time, at hotel B 30% of the time, and at hotel C 20% of the time. When check-
ing in, there is some problem with the reservation 3% of the time at hotel A, 6% of
Solution Trail 4.39a the time at hotel B, and 10% of the time at hotel C. Suppose the salesperson travels
to this city.
Ke ywor ds
a. Find the probability that the salesperson stays at hotel A and has a problem with the
n Stays at hotel A, problem with
the reservation reservation.
b. Find the probability that the salesperson has a problem with the reservation.
Transl at i on
c. Suppose the salesperson has a problem with the reservation; what is the probability
n What is the probability of the
intersection of the event A and that the salesperson is staying at hotel A?
the event R?
Concepts Solution
n Probability multiplication rule Define the following events:
Vi sion A 5 stays at hotel A; B 5 stays at hotel B;
Determine whether the events are C 5 stays at hotel C; and R 5 problem with the reservation.
independent, determine which
probabilities are given, and use Convert all the given percentages into probabilities.
the appropriate form of the The phrase of the time indicates conditional probability.
probability multiplication rule.
P ( A ) 5 0.50 P ( B ) 5 0.30 P ( C ) 5 0.20
P ( R 0 A ) 5 0.03 P ( R 0 B ) 5 0.06 P ( R 0 C ) 5 0.10
To find P(R9 | A), apply the This experiment can be represented with a modified tree diagram (Figure 4.21). Remem-
complement rule to a conditional ber, the probabilities along all paths coming from a node must sum to 1, and second-
probability statement: generation branch probabilities are conditional.
P(R9 | A) 5 1 2 P(R | A). a. The events A and R are dependent. The likelihood of a problem with a reservation
depends on the hotel.
P ( A d R ) 5 P ( A ) # P ( R 0 A ) Probability multiplication rule.
5 0.0150
The probability of staying at hotel A and having a problem with the reservation is
0.0150.
4.5 Independence 175
R P(A R)
0.03
A 0.97
0 R P(A R)
0.5
R P(B R)
0.06
0.30
B 0.94
0.
20 R P(B R)
R P(C R)
0.10
C 0.90
SOLUTION TRAIL
R P(C R)
K e y w ords 5 P(A) # P(R 0 A) 1 P(B) # P(R 0 B) 1 P(C) # P(R 0 C) Probability multiplication rule.
n Problem with the reservation 5 ( 0.50 )( 0.03 ) 1 ( 0.30 )( 0.06 ) 1 ( 0.20 )( 0.10 ) Use known probabilities.
Tr a nsl at i o n
5 0.0150 1 0.0180 1 0.0200 5 0.0530
n The event R The probability of a problem with the reservation (regardless of the hotel) is 0.0530.
C onc e p ts Figure 4.22 shows this decomposition of R using a Venn diagram.
n Unconditional probability
V isi on
A B C
To find P(R), ask, “How can that
happen?” Which paths from left
to right involve the event R? The R
tree diagram suggests that three
compound events (paths)
involve
SOLUTION TRAIL a problem with the
reservation.
A Closer L ok
1. Part (c) of the hotel example illustrates Bayes’ rule. This theorem loosely states:
Given certain conditional probabilities (and other unconditional probabilities), we
are able to solve for a new conditional probability where the events are inverted, or
swapped.
Stepped
STEPPED Tutorial
TUTORIALS
Tree
In the hotel example, we were given the conditional probabilities P(R 0 A), P(R 0 B),
BOX diagrams
PLOTS and
Bayes’ rule and P(R 0 C). Using these probabilities and the unconditional probabilities P(A), P(B),
and P(C), we were able to find P(A 0 R), a conditional probability with the events A and
R switched.
2. Suppose P(A), P(B), and P ( A d B ) are known. To decide whether A and B are inde-
?
pendent, check the equation P ( A d B ) 5 P ( A ) # P ( B ) . If the probability of the inter-
section is equal to the product of the probabilities, then the events are independent. If
not, they are dependent.
3. There are many applications in probability and statistics that involve repeated sam-
pling from a population with replacement. In this case, each draw is independent of
any other draw.
Other applications involve sampling without replacement, for example, exit polls and
telephone surveys. Consider each individual response as an event. These events are defi-
nitely dependent. However, if the population is large enough and the sample is small rela-
tive to the size of the population, then the events are almost independent. Calculating
probabilities assuming independence results in little loss of accuracy. Exercise 4.134
illustrates this idea.
a. Identify and determine each missing path probability. 4.151 Snakes on a Plane India has reported the greatest
b. Find P(A d C) and P(Ar d B). number of venomous snake bites and fatalities in the world.
c. Find P(D). The four snakes that reportedly do the most biting are the
King Cobra, Indian Krait, Russell’s Viper, and the Saw
4.148 Consider the modified tree diagram below. Scaled Viper. The snake bite fatality rate in India is 0.20.30
Suppose two people in India bitten by a Krait are selected
7 C
0.3 at random.
B
a. What is the probability that both people will die?
8
0.2 b. What is the probability that exactly one person will die?
C c. What is the probability that at most one person will die?
A
C 4.152 Economics and Finance In a 2013 survey conducted
5
0.3 0.55 by the Bank of Montreal, Canadians indicated that they were
B
feeling more optimistic about the economy. It was found that
C 38% of the respondents believed that their employer would hire
additional people during the year.31 Suppose three Canadian
C
workers are selected at random.
B 0.0 a. What is the probability that all three believe their
8
employer will hire additional people in the coming year?
C b. What is the probability that at least one believes that
A 0.7
6 4 C her employer will hire additional people in the
0.6 coming year?
SOLUTION TRAIL
a. Identify and determine each missing path probability. 4.153 Physical Sciences The San Francisco Bay Area
b. Find P(A d B d C) and P(Ar d B d Cr). is near several geological fault lines and is therefore
c. Find P(C). Are the events B and C independent? Justify vulnerable to the constant threat of earthquakes. According
your answer. to a study by the U.S. Geological Survey, the probability of a
178 C hapter 4 Probability
magnitude 6.7 or greater earthquake in the Greater Bay Area in living room, 68% have a TV in the bedroom, and 17% have
the next 30 years is 0.63. The probability of a large earthquake a TV in the kitchen.35 Suppose a home with TVs is selected
within the next 30 years along four major fault lines is given in at random.
the following table.32 a. What is the probability that the home has a TV in all three
rooms?
Fault line Probability b. What is the probability that the home has a TV in only the
North San Andreas 0.21 living room?
Hayward 0.31 c. What is the probability that the home has a TV in exactly
two of the three rooms?
San Gregorio 0.06
Concord–Green Valley 0.03 4.157 Economics and Finance Detailed analysis of two
technology stocks indicates over the next six months the
Suppose earthquakes occur in this area independently of probability that the price of stock 1 will rise is 0.42 and for
one another. stock 2 the probability is 0.63. Suppose the stock prices react
a. What is the probability that there will be a major earthquake independently.
within the next 30 years in all four fault regions? Write a a. What is the probability that both stock prices will rise
Solution Trail for this problem. over the next six months?
b. What is the probability that there will be no major b. What is the probability that stock 1 will rise and stock 2
earthquake within the next 30 years in any of the four will sink?
regions? c. Suppose both stocks are in the technology sector, and
c. What is the probability that there will be a major stock 2 tends to follow stock 1. If stock 1 rises over the
earthquake within the next 30 years in at least one of the next six months, the chance of stock 2 rising is 81%. Now
four regions? what is the probability of both stock prices rising over the
d. Suppose each probability is doubled if we consider the next six months?
next 40 years. What is the probability that there will be a 4.158 Sports and Leisure The PGA Tour maintains
major earthquake within the next 40 years in at least one statistical reports on variables such as money leaders, driving
of the four regions? distance, and driving accuracy. The table below lists the
4.154 Public Policy and Political Science Recent national probability that selected players were able to hit the green “in
elections suggest that the political ideology of adults in the regulation.”36 According to the PGA Tour, a green is considered
United States is very evenly divided. In a USA Today/Gallup hit in regulation if any portion of the ball is touching the putting
survey, 29% were Republicans, 30% were Democrats, and 41% surface after the green in regulation stroke has been taken.
were Independents.33 In addition, 51% of all Republicans, 16%
Golfer Probability
of all Democrats, and 28% of all Independents describe their
political views as conservative. Suppose an adult in the United Nick Watney 0.7738
States is selected at random. Bubba Watson 0.7654
a. What is the probability that the adult is a Republican and Camilo Villegas 0.7525
describes her political views as conservative?
b. What is the probability that the adult is a Democrat and
Suppose these three golfers are playing a round at Sawgrass
describes his political views as conservative?
and they tee up on number 11, one of the most difficult holes to
c. Suppose the adult described his views as conservative.
play on the professional tour.
What is the probability that he is an Independent?
a. What is the probability that all three players will hit the
4.155 Medicine and Clinical Studies According to the green in regulation?
Alzheimer’s Association, approximately 13% of older Ameri- b. What is the probability that none of the three players will
cans have Alzheimer’s disease.34 This is the sixth leading cause hit the green in regulation?
of death in the United States, and there is no treatment to c. What is the probability that exactly one of the players will
prevent, cure, or even slow the disease. Suppose four older hit the green in regulation?
Americans are selected at random. d. What is the probability that all three players will hit the
a. What is the probability that all four have Alzheimer’s green in regulation on all four rounds?
disease?
4.159 Sports and Leisure As of March 2013, Larry Bird
b. What is the probability that exactly one has Alzheimer’s
had the tenth highest career free-throw percentage in NBA
disease?
history—88.6%. Mark Price was number one.37 Suppose
c. What is the probability that at least two have Alzheimer’s
Larry were still playing and he steps up to the free-throw
disease?
line for two shots. It is unlikely that the two shots are
4.156 Psychology and Human Behavior There are independent. If he misses the first shot, the probability that
almost 36 million homes in the United States that have four he makes the second is 0.95, and if he makes the first shot,
TVs. In homes where there are TVs, 88% have a TV in the the probability that he makes the second is 0.85.
4.5 Independence 179
a. What is the probability that he makes both shots? b. What is the probability that their bid will be accepted?
b. What is the probability that he misses both shots? c. Suppose their bid is accepted. What is the probability that
c. What is the probability that he makes only one shot? it is for $20?
4.160 Economics and Finance As Baby Boomers reach 4.164 Biology and Environmental Science Opponents of
retirement age, many are beginning to carefully examine their the U.S. Navy SURTASS LFA Sonar System argue that it
savings plans and government benefits. This generation is constitutes a substantial risk to marine life, causing extraordi-
worried about remaining financially independent as many nary numbers of stranded, or beached, whales. Consider the
companies cut or even eliminate pension plans. According to a following statements concerning the use of this system near a
report commissioned by the Canadian bank CIBC, approxi- remote island in the South Pacific.
mately 25% of retiring Baby Boomers expect to carry some ■ On any given day, the probability of a mass stranding of
debt into their retirement.38 Suppose three Baby Boomers are whales in this area is 0.01.
selected at random. ■ The probability of a military exercise on any given day
a. What is the probability that exactly two of the three is 0.001.
expect to carry some debt into retirement? ■ If there is a military exercise, the probability of a mass
b. What is the probability that all three expect to carry some stranding is 0.17.
debt into retirement? a. Define events and write a probability statement for each
c. Suppose another random sample of five Baby Boomers is fact above.
obtained. What is the probability that exactly one adult b. On a randomly selected day, what is the probability of a
from each sample expects to carry some debt into mass stranding of whales and a military exercise?
retirement? c. Are the events mass stranding and military exercise
independent? Justify your answer.
4.161 Psychology and Human Behavior A recent survey
revealed that more people in the United Kingdom believe in 4.165 Medicine and Clinical Studies A tine test is a
space aliens than in God.39 One in 10 people has reported common method used to determine whether a person has been
seeing a UFO, and 20% of the respondents believe that UFOs exposed to tuberculosis. Approximately 5% of people in the
have landed on Earth. Suppose three people from the United United States have been exposed to tuberculosis.41 Using the
Kingdom are selected at random. tine test, 95% of all people who have been exposed test
a. What is the probability that all three believe UFOs have positive, and 98% of those not exposed test negative.
landed on Earth? Suppose a person is randomly selected and given the
b. What is the probability that none of the three believes tine test.
UFOs have landed on Earth? a. What is the probability that the person tests positive and
c. What is the probability that exactly one of the three has been exposed to tuberculosis?
believes UFOs have landed on Earth? b. What is the probability that the person tests positive?
c. Suppose the test is positive; what is the probability that
the person actually has been exposed?
Extended Applications
4.166 Travel and Transportation There are four major
4.162 Medicine and Clinical Studies When a person has a air carriers with flights from Boston to Los Angeles; 32%
certain type of leukemia, a physician may perform a bone of all passengers take American Airlines, 25% take Jet Blue,
marrow transplant in order to restore a healthy blood supply. 17% take United, and 26% take Virgin America. Data
Among the general population, the chances of an acceptable from 2012 indicate that 20% of all American Airlines flights
bone marrow match are 1 in 20,000.40 Suppose a person needs a from Boston to Los Angeles are late, 23% of Jet Blue
bone marrow transplant and four people from the general flights from Boston to Los Angeles are late, 19% of United
population are selected at random. flights from Boston to Los Angeles are late, and 11% of Virgin
a. What is the probability that none of the four will match? America flights from Boston to Los Angeles are late.42 Suppose
b. What is the probability that at least one will match? a passenger taking a flight from Boston to Los Angeles is
c. How many people would have to be tested in order for the randomly selected.
probability of at least one match to be 0.50? a. What is the probability that the passenger takes American
Airlines and is late?
4.163 Travel and Transportation A family trying to
b. What is the probability that the passenger is late? On time?
arrange a vacation is using the Internet to name their own price
c. Suppose the passenger arrives late. Which airline did the
for a rental car. The software reports that 50% of all people
passenger most likely fly?
name a price of $30 per day, 40% bid $25 per day, and 10% bid
$20 per day. The Internet company also reports that 90% of all 4.167 Manufacturing and Product Development The
$30 bids are accepted, 60% of all $25 bids are accepted, and Italian Aspide missile, a licensed version of the U.S.
only 5% of all $20 bids are accepted. Sparrow, has a sophisticated homing guidance system
a. What is the probability that the family will submit a bid and single-shot hit probability of 0.80.43 Suppose an enemy
of $25 and have it accepted? plane is within range of three missile firing stations, all three
180 C hapter 4 Probability
fire an Aspide surface-to-air missile, and the missiles operate a. Complete the tree diagram by filling in the missing path
independently. probabilities.
a. What is the probability that the plane is hit? b. What is the probability that the car is repaired under
b. What is the probability that all three missiles miss? budget, on time, and with company D?
c. How many missiles would have to be fired at the plane in c. What is the probability that the cost of the repair is over
order to be 99.99% sure it would be hit? the estimate?
d. What is the probability that the car is repaired under
4.168 Public Health and Nutrition Many more adults in the
budget, given that it is ready on time?
United States have celiac disease than a decade ago. A new
study suggests that approximately 1% of all U.S. adults have 4.170 Sports and Leisure Suppose two cards are drawn
celiac disease and should avoid eating foods with gluten.44 without replacement from a regular deck of 52 playing cards.
Suppose five adults in the United States are selected at random. Consider the events
a. What is the probability that exactly one of the five has A1 5 an ace is selected on the first draw;
celiac disease? A2 5 an ace is selected on the second draw.
b. What is the probability that only the first adult and the a. Find P(A2 0 A1) and P(A2). Are the events A1 and A2
fifth adult selected have celiac disease? independent? Justify your answer.
c. Suppose all five adults have celiac disease. Do you believe b. Suppose the two cards are drawn without replacement
the claim concerning the percentage of adults with celiac from six regular 52-card decks shuffled together.
disease? Justify your answer. Find P(A2 0 A1) and P(A2) for this experiment.
4.169 Fuel Consumption and Cars After a minor collision, Are the events A1 and A2 independent? Justify
a driver must take his car to one of two body shops in the area. your answer.
Consider the following events. c. In part (b), the events are almost independent. For
D 5 driver takes his car to shop D. six decks, find P(A1 d A2) exactly, and then find the
L 5 driver takes his car to shop L. same probability assuming the two events are
T 5 the work is completed on time. independent (with the probability of an ace on any
B 5 the cost is less than or equal to the estimate (under budget). draw being 24 / 312).
The following modified tree diagram describes the relationships
among these events.
Challenge
4.171 The Traveling Salesperson Reconsider Example
55 B
0.4 4.39. Suppose the salesperson has a problem with the
T reservation. In which hotel did the salesperson most
26
0.2 likely stay?
B
D 4.172 The Grapes of China The Chinese wine industry
78 B
0.3 has become very large. However, there are many quality
34
0.6 T control problems.45 Wine batches are systematically exam-
ined for alcohol content, total sugar, volatile acid, and other
B food additives. Suppose the Yantai Best Cellar Consulting
95 B Company claims that only 3% of all bottles are unqualified,
0.8
or defective. Suppose six bottles of Yantai wine are selected
T
at random.
B a. What is the probability that none of the six bottles will be
L unqualified?
0.8 B
44 95 b. What is the probability that at least one of the bottles will
0.9
T be unqualified?
c. Suppose all six bottles are unqualified. Do you believe the
B company’s claim? Justify your answer.
Chapter 4 Summary
Concept Page Notation / Formula / Description
Experiment 124 An activity in which there are at least two possible outcomes and the result cannot
be predicted with certainty.
Sample space 126 S: a listing of all possible outcomes, using set notation.
Chapter 4 Exercises 181
CHAPTER 4 Exercises
4.174 Physical Sciences At a state middle-school science 4.175 Sports and Leisure The Boston Bruins play in the
fair, students launch bottle rockets designed and built from Northeast Division of the Eastern Conference in the National
182 C hapter 4 Probability
Hockey League. For each game played against another team 4.178 Economics and Finance Many Americans use
in the Eastern Conference, the division [Atlantic (A), savings bonds to supplement retirement funds or to pay for
Northeast (N), or Southeast (S)] and the outcome [win (W), qualified higher-education expenses. The U.S. Treasury even
loss (L), tie (T), or overtime loss (O)] are recorded. Consider sells savings bonds online. Approximately one in every six
the following events. Americans owns savings bonds.48 Suppose four Americans are
E 5 The opponent is in the Southeast Division. randomly selected.
F 5 The Bruins win the game. a. What is the probability that all own savings bonds?
G 5 The opponent is in the Northeast or the game is an b. What is the probability that none of the four owns savings
overtime loss. bonds?
H 5 The Bruins lose and the opponent is from the Atlantic c. What is the probability that exactly two of the four own
Division. savings bonds?
a. Carefully sketch a tree diagram to illustrate the possible 4.179 Marketing and Consumer Behavior At Elmo’s, an
outcomes for this experiment. old-fashioned barber shop in Melbourne, Florida, 70% of all
b. Find the sample space for this experiment. customers get a haircut, 40% get a shave, and 15% get both.
c. Find the outcomes in each of the events E, F, G, and H. a. What is the probability that a randomly selected customer
d. List the outcomes in E c F, F d G, and H9. gets a shave or a haircut?
e. List the outcomes in E c H9, E c F c G9, and F c G9. b. What is the probability that a randomly selected customer
gets neither?
4.176 Demographics and Population Statistics In a
c. What is the probability that a randomly selected customer
recent population survey, the U.S. Census Bureau reported the
gets only a shave?
following classifications and corresponding probabilities.46
d. What is the probability that a randomly selected customer
Educational attainment Probability gets a shave, given that he gets a haircut?
e. Suppose two customers are selected at random. What is
No degree 0.1317 the probability that both get only a haircut?
High school graduate 0.3001
4.180 Medicine and Clinical Studies More and more
Some college, no degree 0.1946 people are trying herbal remedies, including gooseberry juice,
Associate degree 0.0916 eucalyptus oil, and crushed ajwain, for relief from the common
Bachelor’s degree 0.1844 cold. The following joint probability table shows the relation-
Master’s degree 0.0708 ship between having tried an herbal remedy and highest degree
Professional degree 0.0132 earned.
Doctorate degree 0.0136 Highest degree earned
High College Graduate
Consider the events: Vocational school degree degree
A 5 has a bachelor’s, master’s or doctorate degree
Tried 0.23 0.17 0.06 0.05
B 5 does not have an associate degree
C 5 does not have a degree Not tried 0.04 0.12 0.15 0.18
Find the following probabilities.
a. P(A), P(B), and P(C). Suppose one person is randomly selected.
b. P(A d B), P(A c C), and P(B c C). a. What is the probability that the person has tried an herbal
c. P(C9), P(A9 c B), and P(Br d Cr). remedy, given that the highest degree earned is from
college?
4.177 Biology and Environmental Science The germina-
b. If the person has not tried an herbal remedy, what is the
tion rate for pumpkin seeds is directly related to the prevailing
probability that the highest degree earned is from high
weather conditions. The Autumn Gold is a popular medium-
school?
sized pumpkin and ripens to a deep orange. If conditions are
c. Suppose the person has not earned a graduate degree.
seasonable, the probability of germination is 0.85.47 If it is dry,
What is the probability that the person has tried an herbal
suppose the probability that a random seed will germinate is
remedy?
0.75. Recent weather history suggests there is a 40% chance of
d. Suppose two people are selected at random. What is the
a dry start to the growing season. Suppose an Autumn Gold
probability that exactly one has tried an herbal remedy?
pumpkin seed is randomly selected.
a. What is the probability that the growing season will be 4.181 Psychology and Human Behavior Do you believe in
dry and the seed will germinate? ghosts? According to a recent survey, 21% of people in Sweden
b. What is the probability that the seed will germinate? believe in ghosts.49 Suppose that of those who believe in ghosts,
c. Suppose the seed does not germinate. What is the 20% said they have had a spiritual encounter with a ghost.
probability that the growing season had a dry start? Suppose a Swede is selected at random.
Chapter 4 Exercises 183
a. If the person believes in ghosts, what is the probability their preferred flavors were offered. Suppose a coffee drinker
that she has never had an encounter with a ghost? is selected at random.
b. What is the probability that the person believes in ghosts a. Suppose the coffee drinker uses creamer. What is the
and has had an encounter with a ghost? probability that he would not drink more even if his
c. What is the probability that the person believes in ghosts preferred flavor were offered?
and has not had an encounter with one? b. What is the probability that the coffee drinker uses
creamer and would drink more if his preferred flavor
4.182 Travel and Transportation A super-commuter is a
were offered?
person who commutes to work from one large metro area to c. Suppose three coffee drinkers are selected at random. What
another by car, rail, bus, or even air. Super-commuters are not is the probability that exactly one uses creamer?
necessarily elite business travelers, but rather middle-income
individuals who are willing to commute long distances in order
EXTENDED APPLICATIONS
to secure affordable housing or better schools. According to a
recent study, 13% of workers in Houston are super-commuters, 4.186 Public Health and Nutrition Tobacco smoke
8.6% in Phoenix, and 7.5% in Atlanta.50 Suppose three workers contains more than 7000 chemicals, many that are toxic and
are selected at random, one from each city. several that are known to cause cancer. As a result, many
a. What is the probability that all three are super-commuters? smokers try various methods to quit. Consider a group of
b. What is the probability that none of the three are super- smokers who want to quit. In this group, 2.7% have tried an
commuters? electronic cigarette, or e-cigarette. Of those who have tried
c. What is the probability that only the worker from Houston e-cigarettes, 31% quit smoking after six months.53 Of those
is a super-commuter? people who tried some other method, suppose 16% quit
d. What is the probability that exactly two of the workers are smoking after six months. Suppose a smoker who would
super-commuters? like to quit is selected at random.
a. What is the probability that the smoker tried e-cigarettes
4.183 Manufacturing and Product Development At a
and quit smoking after six months?
glass manufacturing facility, crystal stemware is carefully
b. What is the probability that the smoker quit after six
inspected for correct dimensions, quality, and production
months?
trends. After lengthy studies, the factory is known to produce
c. Suppose the smoker quit smoking after six months. What
15% defectives. Most of these pieces are discovered through
is the probability that she tried e-cigarettes?
inspection and are reworked or discarded. Suppose two pieces
are randomly selected for inspection. 4.187 Travel and Transportation In a study of the world-
a. What is the probability that both pieces are defect-free? wide commercial jet fleet through 2011, 37% of all fatal
b. What is the probability that neither piece is defect-free? accidents occurred when the plane was on final approach or
c. Suppose at least one of the pieces has a flaw; what is the landing.54 Suppose four fatal jet accidents are selected at random.
probability that both are defective? a. What is the probability that all four occurred during final
approach or landing?
4.184 Medicine and Clinical Studies There is a constant
b. What is the probability that none of the four occurred
shortage of organ donors in the United States. Fewer people are
during final approach or landing?
donating organs and ever more people are on waiting lists. One
c. What is the probability that exactly one of the four
solution to this problem involves compensating organ donors. A
occurred during final approach or landing?
recent poll suggests that 60% of Americans support some form
of compensation in terms of future health care for people who 4.188 Economics and Finance Customers at a Publix
make organ donations while alive, for example, kidneys, bone grocery store in Charleston, South Carolina, can pay for
marrow, or liver.51 Suppose four Americans are selected at purchases with cash, a debit card, or a credit card. Fifty-five
random. percent of all customers use cash and 38% use a debit card.
a. What is the probability that all four support compensation Careful research has shown of those paying with cash, 75%
for organ donors? use coupons; of those using a debit card, 35% use coupons;
b. What is the probability that none of the four supports and of those using a credit card, only 10% use coupons.
compensation for organ donors? Suppose a customer is randomly selected.
c. What is the probability that exactly two of the four a. What is the probability that the customer pays with a credit
support compensation for organ donors? card and does not use coupons?
b. What is the probability that the customer does not use
4.185 Marketing and Consumer Behavior Americans
coupons?
drink a lot of coffee, and they put all sorts of extras into their
c. If the customer does not use coupons, what is the probability
coffee to enhance the drink, including flavor shots and
that he paid with a debit card?
flavored creams. Research data indicates that 62% of all
coffee drinkers put creamer in their coffee.52 Of those people 4.189 Psychology and Human Behavior According to a
who use creamer, 40% say they would drink more coffee if recent survey, 11% of men ages 50 to 64 now color their hair.55
184 C hapter 4 Probability
Vehicle type
c. Suppose none of the four colors his hair. Is there any Van 2,494 174,299 453,197
evidence to suggest that the study’s claim is not correct? Other light 32 67,828 222,940
Justify your answer. truck
4.190 Fuel Consumption and Cars Auto Parts Warehouse Large truck 3,215 53,411 239,298
offers a wide variety of parts and accessories for cars. Consider Bus 221 9,968 47,387
the following events: Other 1,152 6,282 12,429
A 5 a randomly selected customer purchases a manual
B 5 a randomly selected customer purchases trim accessories a. What is the probability that the crash results in injury
C 5 a randomly selected customer purchases a car-care only?
product b. Suppose the crash involves a utility vehicle. What is the
Suppose the following probabilities are known. probability that it is fatal?
P(A) 5 0.44, P(B) 5 0.52, P(C) 5 0.39, c. Suppose the crash involves property damage only. What is
P(A d B) 5 0.19, P(A d C) 5 0.10, P(B d C) 5 0.23, the probability that the vehicle is a large truck?
P(A d B d C) 5 0.08. d. Suppose the crash is not fatal. What is the probability that
a. Carefully sketch a Venn diagram illustrating the it involves a bus?
relationship among these three events and label each e. Suppose the crash does not involve a passenger car. What
region with the corresponding probability. is the probability that it results in injury only?
b. Find the probability of just event A occurring. f. Are the events fatal crash and van independent? Justify
c. Find the probability of none of the events (A, B, or C) your answer.
occurring. g. Suppose two crashes are selected at random. What is the
d. Find P(A 0 C), P(B 0 A d C), and P(A d B d C 0 A). probability that both were fatal and involved an unknown
type of vehicle?
4.191 Travel and Transportation Pasco County in Florida
has special evacuation plans in the event of a hurricane.
CHALLENGE
Suppose residents can take one of five different major highways
out of the county. Department of Transportation officials have 4.193 Free Nights During the month of August, one guest at
produced the following table indicating the probability that a the Golden Nugget in Las Vegas will be selected at random to
resident will use a selected road. participate in a contest to win free lodging. A fair quarter will
be tossed until the first head is recorded. If the first head occurs
Road A B C D E
on toss x, the contestant will win x free nights’ stay at the
Probability 0.20 0.18 0.26 0.32 0.04 Golden Nugget.
So, if a head is obtained on the first coin toss, the contest is
Suppose three Pasco County residents are selected at random
over, and the guest wins one free night. If the first head appears
and a hurricane strikes.
on the fourteenth toss (13 tails and then a head), the guest wins
a. What is the probability that all three will take the same
14 free nights. Theoretically, a guest could win any number of
escape route?
free nights, 1, 2, 3, 4, . . . , although it seems unlikely someone
b. What is the probability that exactly one will take escape
could win, for example, 100 free nights.
route E?
a. Use technology to model this contest. Try your
c. What is the probability that two will take escape
simulation 10 times and record the number of free nights
route C?
awarded each time. Did anyone win five or more free
d. Suppose all three Pasco County residents hear a traffic
nights’ stay?
report indicating that route A is flooded and
b. Consider the event A 5 the guest wins five or more
impassable. What is the probability that all three will
free nights at the Golden Nugget. Simulate the contest
take route B?
n 5 50 times and compute the relative frequency of
4.192 Travel and Transportation Some researchers believe occurrence of the event A. Repeat this process for n 5
that the severity of an automobile accident is related to the type 100, 150, 200, . . . , 2000.
of vehicle. For example, pickup trucks tend to be involved in c. Construct a plot of the relative frequency versus the
more fatal crashes than other types of vehicles. The following number of simulations. Describe any patterns.
table presents the number of vehicles involved in each type of d. Use your results in (b) and (c) to estimate the probability
crash.56 Suppose this table is representative of all crashes in the of winning five or more free nights at the Golden
United States and one crash is selected at random. Nugget.
Chapter 4 Exercises 185
SOLUTION TRAIL
Looking Back
n Recall the definition of an experiment, how to find a sample space, and operations on
events.
n Remember the properties and rules used to compute the probability of various events.
Looking Forward
n Learn the concept of a random variable, a bridge between the experimenter’s world and
the statistician’s world, and how information is transferred between worlds.
n Understand the connection between an experimental outcome and the number associated
with that outcome.
n Understand probability distributions for discrete random variables and work with several
special discrete distributions.
Contents
5.1 Random Variables
5.2 Probability Distributions for Discrete Random Variables
5.3 Mean, Variance, and Standard Deviation for a Discrete Random Variable
5.4 The Binomial Distribution
5.5 Other Discrete Distributions
187
188 C h apte r 5 Random Variables and Discrete Probability Distributions
5.1 Random Variables
The idea of assignment suggests A function f is a rule that takes an input value and returns an output value (according to
the need for a function. the rule). Suppose the function f is defined by f ( x ) 5 x2 1 4. This rule indicates that f
takes an input x and assigns, or maps, x to the value x2 1 4. For example, the function f
assigns the input 1 to the output 5 because f ( 1 ) 5 12 1 4 5 5. A random variable is just
a special kind of function.
Definition
A random variable is a function that assigns a unique numerical value to each outcome
in a sample space.
A Closer L ok
1. Such functions are called random variables because their values cannot be predicted
with certainty before the experiment is performed.
2. Capital letters, such as X and Y, are used to represent random variables.
3. A random variable is a rule for assigning each outcome in a sample space to a unique
X: S S R real number. If e is an experimental outcome and x is a real number, here is a formal
A random variable maps elements way to picture this assignment: X ( e ) 5 x. The random variable X takes an outcome e
of a sample space to the real and maps, or assigns, it to the number x. The number x is associated with the outcome
numbers. e, and is a value the random variable can take on, or assume.
4. Figures 5.1 and 5.2 help us understand how a random variable works. These figures
illustrate the random variable X as the link between experimental outcomes and
numerical values. The rule for a random variable may be given by a formula, as a table,
or even in words. Note that several outcomes may be assigned to the same number, but
each outcome is assigned to only one number.
S
e4 X
e3
e1 e6
e2
e7 e5
4 3 2 1 0 1 2 3 4x
Experimenter’s Statistician’s
world X world
•
• •
•
• •
e4 2
e5 1
e6 0
e7 1
e8 2
e9 3
•
• •
•
• •
Figure 5.2 Another visualization of
the definition of a random variable.
5.1 Random Variables 189
The next example shows how a specific random variable maps outcomes to numbers.
The notation will get shorter and more concise as the concept of assignment becomes
clearer.
Solution
Step 1 The experiment consists of recording the type of each sinkhole. Each outcome
consists of a sequence of three letters, with each letter a C or an O. There are
eight possible outcomes (from the multiplication rule). Here is the sample space:
S 5 5 OOO, OOC, OCO, OCC, COO, COC, CCO, CCC 6
Step 2 The random variable X takes each outcome and returns the number of collapse
sinkholes (Cs). Here is a table that illustrates this mapping and the values the
random variable X can assume:
Outcome Value of X
OOO 0
OOC 1
OCO 1
OCC 2
COO 1
COC 2
CCO 2
CCC 3
Outcome X x
OOO
x is a possible value of the
OOC
random variable X.
OCO 0
OCC 1
COO 2
COC 3
CCO
CCC
There are two types of random variables. The type depends on the number of possible
values the random variable can assume.
Definition
A random variable is discrete if the set of all possible values is finite, or countably infinite.
A random variable is continuous if the set of all possible values is an interval of numbers.
These definitions are analogous to those for discrete and continuous data sets. The fol-
lowing remarks are also similar.
A Closer L ok
1. Discrete random variables are usually associated with counting, and continuous ran-
dom variables are usually associated with measuring.
2. To decide whether a random variable is discrete or continuous, consider all the possi-
ble values the random variable could assume. Finite or countably infinite means dis-
crete. An interval of possible values means continuous.
3. Recall: Countably infinite means there are infinitely many possible values, but they are
countable. You may not ever be able to finish counting all of the possible values, but
there exists a method for actually counting them.
4. The interval of possible values for a continuous random variable can be any interval,
of any length, open or closed. The exact interval may not be known, only that there is
some interval of possible values.
5. In practice, no measurement device is precise enough to return any number in some
interval. In theory, a continuous random variable may assume any value in some inter-
val (but not in reality).
Remember, an experiment may result in a numerical value right away, not a symbol or
a token. In this case we do not need any extra link or connection to the real numbers. The
description of the experiment is the same as the definition of the random variable. The
values the random variable can assume are the possible distinct experimental outcomes.
In the following example, several experiments are described and each associated random
variable is identified.
a. A Kohl’s department store has 65 cash registers. At the end of the day, the re-
ceipts are carefully audited to determine whether each cash register balances. Let
the random variable X be the number of cash registers that balance on a randomly
selected day.
b. Patients undergoing a tonsillectomy are administered a general anesthetic. Let the ran-
dom variable Y be the length of time from injection of the anesthetic until a patient is
rendered unconscious.
c. Schlage manufactures and sells a maximum-security double-cylinder deadbolt
lock for homes. At the facility where the locks are made and assembled, finished
locks are randomly selected and carefully checked for defects. If a defective lock
is found, the assembly line is shut down. An experiment consists of recording
whether the selected lock is good (G) or defective (B). The sample space is
S 5 5 B, GB, GGB, GGGB, GGGGB, c6 . Let the random variable X be the number
of locks inspected until a defect is found.
d. Let the random variable Y be the length of the largest fish caught on the next party boat
arriving back to the dock in Belmar, New Jersey.
Solution
a. There is no need to use a collection of symbols to represent experimental outcomes for
the cash registers. The possible values for X (and the distinct experimental outcomes)
are finite: 0, 1, 2, 3, . . . , 65. These values are distinct, disconnected points on a number
line. The random variable X is discrete.
b. Y is a measurement, the time elapsed until a patient is unconscious. The possible values
for Y are any number in some interval, say, 0 to 60 minutes. The random variable Y is
continuous.
c. The values X can assume are 1, 2, 3, 4, . . . . The number of possible values is countably
infinite; the values are disconnected on a number line. The random variable X is
discrete.
d. Y is a measurement, and can (theoretically) take on any value in some interval. The
possible values for Y are any number is some interval, say 5 to 25 inches. The random
variable Y is continuous.
e. The number of girls born in a rural hospital during the 5.14 Education and Child Development An experiment
next year. consists of showing a four-year-old child an interactive
f. The interest rate on a savings account at a randomly instructional video and then asking the child to tie his shoe-
selected bank in Philadelphia. laces. The random variable Y is the length of time the child
g. The number of tickets sold in the next Powerball lottery. takes to tie the first shoelace. Is Y discrete or continuous?
h. The number of oil tankers registered to a certain country Justify your answer.
at a given time.
5.15 Biology and Environmental Science The Waynes-
5.10 Classify each random variable as discrete or continuous. burg Lions Club receives a shipment of 300 Christmas trees
a. The number of visitors to the Museum of Science in from Wending Creek Farms in Coudersport, Pennsylvania, to
Boston on a randomly selected day. sell as a fundraiser. Classify each of the following random
b. The camber-angle adjustment necessary for a front-end variables as discrete or continuous.
alignment. a. The number of trees over six feet tall.
c. The total number of pixels in a photograph produced by b. The moisture content (expressed as a percentage) of a
a digital camera. randomly selected tree.
d. The number of days until a rose begins to wilt after c. The number of Douglas fir trees in the shipment.
purchase from a flower shop. d. The diameter of the trunk at the bottom of a randomly
e. The running time for the latest James Bond movie. selected tree.
f. The blood alcohol level of the next person arrested for
DUI in a particular county. 5.16 Biology and Environmental Science To map the
current of bottom water in a certain part of the Atlantic Ocean,
5.11 Classify each random variable as discrete or continuous. a dye is released and used to trace the water flow. Let the
a. The number of people requesting vegetarian meals on a random variable X be the maximum distance (in meters) from
flight from New York to London. release at which the dye is detected after one day. Is X discrete
b. The exact thickness (in millimeters) of a paper towel. or continuous? Justify your answer.
c. The time it takes a driver to react after the car in front
stops suddenly. 5.17 Psychology and Human Behavior An experiment
d. The number of escapees in the next prison breakout. consists of recording the behavior of a randomly selected
e. The length of time a deep-space probe remains in contact Duluth cab driver as a traffic signal changes from red to green.
with Earth. Let the random variable X be the acceleration (in ft/s2) of the
f. The number of points on a randomly selected buck. cab one second after the light changes. Is X discrete or continu-
The definition of a point is an antler projection at least one ous? Justify your answer.
inch in length from the base to tip. The brow tine and main
5.18 Medicine and Clinical Studies A report in the
beam tip shall be counted as points regardless of length.
journal Cancer suggests that women who take aspirin on a
5.12 Classify each random variable as discrete or continuous. regular basis lower their risk of a certain kind of melanoma.2
a. The number of votes necessary to elect a new Pope. Every woman who visits the Turtle Creek Medical Center in
b. The amount of sugar in a 16-ounce sweetened bottled Dallas, Texas, during the next business day will be asked
drink purchased in a New York City cafe. whether she regularly takes an aspirin. Let X be the number of
c. The total number of riders on all forms of public women who take an aspirin daily. Is X discrete or continuous?
transportation in the United States during the year. Justify your answer.
d. The number of residents in an assisted-living center who
suffer from hardening of the arteries. 5.19 Sports and Leisure The Boston Red Sox play their
e. The amount of lead measured in the soil of a children’s spring training games in JetBlue Park, Fort Myers, Florida.
playground. Suppose a game against the Tampa Bay Rays is selected.
f. The time it takes an automobile to pass through the Classify each of the following random variables as discrete or
George Massey Tunnel in Vancouver, British Columbia. continuous.
a. The price of a randomly selected ticket.
b. The time it takes to complete the game.
Applications c. Whether the game is postponed due to rain or is
5.13 Marketing and Consumer Behavior T. J. Maxx sells completed.
home fashions and men’s, women’s, boys’, and girls’ apparel. An d. The speed of the first pitch in the bottom of the third
experiment consists of classifying the next two items purchased, inning.
each as men’s, women’s, boys’, or girls’ apparel. Let the random e. The number of fans in attendance.
variable X be the number of sales of women’s or girls’ apparel. f. The number of hot dogs sold during the entire game.
a. List the outcomes in the sample space. g. The total number of errors in the game.
b. What are the possible values for X? Is X discrete or h. The weight in ounces of the bat used by the third hitter in
continuous? Justify your answer. the fifth inning.
5.2 Probability Distributions for Discrete Random Variables 193
Definition
The probability distribution for a discrete random variable X is a method for specify-
ing all of the possible values of X and the probability associated with each value.
A Closer L ok
1. A probability distribution for a discrete random variable may be presented in the form
of an itemized listing, a table, a graph, or a function.
2. A probability mass function (pmf) is denoted with a small p, and is the probability that
a discrete random variable is equal to some specific value. In symbols, it is defined by
p ( x ) 55
P(X 5 x).
Rule
In words, the rule for the function p evaluated at an input x is the probability of an
event, the probability that the random variable X takes on the specific value x. The
function p and its probability rule are used interchangeably.
Suppose X is a discrete random variable. Then p(7) means find the probability that the
random variable X equals 7, or P ( X 5 7 ) .
The random variable X is defined to be the number of Ds in an outcome. Find the proba-
bility distribution for X.
Solution
Step 1 The probability distribution for X consists of all the possible values X can assume
along with the associated probabilities. The table below shows the random vari-
able assignment and a technique for calculating the probability of each value.
194 C h a p t e r 5 Random Variables and Discrete Probability Distributions
To find the probability that X takes on a specific value x, find all the outcomes
that are mapped to x, and add the probabilities of these outcomes.
Step 2 The random variable X takes on the values 0, 1, 2, and 3. The probability distri-
Looking at just this (probability bution can be presented in a table as shown below:
distribution) table, how do you
x 0 1 2 3
know X cannot assume any other
value? p(x) 0.336 0.452 0.188 0.024
A Closer L ok
1. Think about this process of constructing a probability distribution. To find the proba-
bility that X takes on the value x, look back at the experiment and find all the outcomes
that are mapped to x. Drag along these probabilities and sum them.
2. The probability distribution for a random variable X is a reference for use in answering
probability questions about the random variable. For example, we’ll need to answer
probability questions such as “Find P ( X 5 3 ) .” Think of X 5 3 as an event stated in
terms of a random variable. The details needed for answering this question are in the
probability distribution.
The next example illustrates various methods for presenting a probability distribution.
y 0 1 2 3 4
p(y) 5 / 15 4 / 15 3 / 15 2 / 15 1 / 15
This kind of table is the most common way to present a probability distribution
for a discrete random variable. It concisely lists all the values Y can assume and
the associated probabilities.
Step 3 A probability histogram:
p(y)
5/15
4/15
3/15
2/15
1/15
0
0 1 2 3 4 y
p(y)
5/15
4/15
3/15
2/15
1/15
0
0 1 2 3 4 y
522 3
p(2) 5 5
15 15
All of the techniques presented in the previous example are valid methods for present-
ing a probability distribution. Use the style that is most convenient or appropriate, or what
is called for in the question. Often, a graphical representation of the distribution will be
helpful. Sometimes, having a formula for the probability distribution is more useful. In
the next example we’ll construct another probability distribution and consider some prob-
ability questions involving a random variable.
Solution
The experiment consists of observing four customer choices. Each outcome consists of a
sequence of four letters, each a C or a T. From the multiplication rule, there are 16 pos-
sible outcomes: CCCC, CCCT, CCTC, etc.
Because the customers are selected at random, each choice is independent, and the prob-
ability of each outcome is obtained by multiplying the corresponding probabilities. For
example,
SOLUTION TRAIL
P ( CTCT ) 5 P ( C d T d C d T ) First customer buys coffee and second customer buys tea and . . .
# # #
5 P ( C ) P ( T ) P ( C ) P ( T ) Events are independent. Multiply corresponding probabilities.
5 ( 0.70 )( 0.30 )( 0.70 )( 0.30 ) P(T) 5 1 2 P(C)
5 0.0441
Solution Trail 5.5a The following table lists all the possible experimental outcomes, the probability of each
outcome (computed as above), and the value of the random variable assigned to each
Ke ywor d s outcome.
n Probability distribution
Tr anslat ion
Outcome Probability x Outcome Probability x
n Find all the values X can TTTT 0.0081 0 CTTC 0.0441 2
assume and all the associated TTTC 0.0189 1 CTCT 0.0441 2
probabilities.
TTCT 0.0189 1 CCTT 0.0441 2
C oncept s TCTT 0.0189 1 TCCC 0.1029 3
n Connection between CTTT 0.0189 1 CTCC 0.1029 3
experimental outcomes and TTCC 0.0441 2 CCTC 0.1029 3
real numbers
TCTC 0.0441 2 CCCT 0.1029 3
Vis ion TCCT 0.0441 2 CCCC 0.2401 4
First, think about the experiment.
Use the definition of the random
variable to link experimental This table shows that the values of X are 0, 1, 2, 3, and 4.
outcomes with values of the
a. Use the links in the table to construct the probability distribution for X.
random variable, and drag along
all of the probabilities. Construct p ( 0 ) 5 P ( X 5 0 ) 5 P ( TTTT ) 5 0.0081
a table listing all the values of X
and the associated probabilities. There is only one outcome assigned to a 0, and the probability of that outcome is
0.0081.
SOLUTION TRAIL
Solution Trail 5.5b p(1) 5 P(X 5 1) Definition of a probability mass function.
C on cepts Given that at least two people order coffee, the probability that exactly four order cof-
n Conditional probability
fee is 0.2620.
Here is a way to picture this conditional probability using the probability distribution
V is io n table:
Given that at least two customers
order coffee, the number of
x 0 1 2 3 4
values X can assume is reduced.
Use the definition of conditional p(x) 0.0081 0.0756 0.2646 0.4116 0.2401
probability with events involving
the random variable.
Given that X is either 2, 3, or 4, the reduced, or relevant, probability is 0.9163. The
proportion of time that X is equal to 4, given X is 2, 3, or 4, is 0.2401 / 0.9163.
The probability distribution of a random variable reveals which values of the random
variable are most likely to occur. This information is extremely helpful in making a statis-
tical inference. Consider the following example.
The following example involves a probability distribution for a discrete random vari-
able and illustrates these two properties.
a. Find p(150).
b. Find P ( 100 # Y # 250 ) and P ( 100 , Y , 250 ) .
Keith Brofsky/Photodisc/Getty Images
c. Construct the corresponding probability histogram.
p(y)
0.4
0.3
0.2
0.1
0.0
0 100 150 200 250 300 y
5.33 Fuel Consumption and Cars An automobile insurance a. Construct the probability distribution for X. Construct the
policy depends on many factors, including the vehicle type, corresponding probability histogram.
year, make, and model, type of coverage, your driving history, b. TRAIL
SOLUTION What is the probability that all three Americans trust the
your insurance score based on claims history, payment history, government?
and credit score, and your state of residence.4 Suppose that for c. What is the probability that at least one of the three trusts
some driver category, the probability distribution for the the government?
random variable Y, the amount (in dollars) paid to policy-
5.36 Marketing and Consumer Behavior Twinkies
holders for claims in one year, is given in the table below.
are an American tradition, sold for over 83 years,
y 0 500 1000 5000 10000 originating in a Chicago bakery, and forever linked to legal
lingo. (You may have read about the “Twinkie defense,”
p(y) 0.65 0.20 0.10 0.04 0.01 associated with a 1979 murder trial in San Francisco.) Approxi-
a. Find P ( Y . 0 ) . mately 12% of households in the United States buy Twinkies.7
b. Find P ( Y # 1000 ) . Suppose four households are selected at random and let Y be
c. What is the probability that a randomly selected driver is the total number of households that buy Twinkies.
paid $5000? a. Construct the probability distribution for Y. Write a
d. Suppose two drivers are selected at random. What is the Solution Trail for this problem.
probability that both are paid $1000? b. What is the probability that at least one household buys
e. Suppose two drivers are selected at random. What is the Twinkies?
probability that at least one is paid $500 or more? c. Suppose at least two households buy Twinkies. What is
the probability that all four households buy Twinkies?
5.34 Public Policy and Political Science Camden, New
Jersey, has one of the highest crime rates in the United States. 5.37 Manufacturing and Product Development Staples has
As a result, a new Camden County Police Department will six special drafting pencils for sale, two of which are defective. A
include 400 officers and be responsible for an area covering student buys two of these six drafting pencils, selected at random.
nine square miles.5 Suppose the number of times a police Let the random variable X be the number of defective pencils
cruiser from this new department drives through the Whitman purchased. Construct the probability distribution for X.
Park neighborhood during a one-hour period is a random 5.38 Business and Management Two packages are
variable X, with probability distribution as given in the table independently shipped from Fort Collins, Colorado, to the
below. Convention Center in Kansas City, Missouri, and each is
x 0 1 2 3 guaranteed to arrive within four days. The probability that a
package arrives within one day is 0.10, within two days is
p(x) 0.3679 0.3679 0.1839 0.0613
0.15, within three days is 0.25, and on the fourth day is 0.50.
x 4 5 6 7 Let the random variable X be the total number of days for
both packages to arrive. Construct the probability distribu-
p(x) 0.0153 0.0031 0.0005 0.0001 tion for X.
Suppose a one-hour period is randomly selected.
a. What is the probability that no police cruiser will drive Extended Applications
through the neighborhood?
b. What is the probability that at least one police cruiser will 5.39 Biology and Environmental Science Agway, a farm
drive through the neighborhood? and garden supply store, sells winter fertilizer in 50-pound
c. What is the probability that at most two police cruisers bags. For customers who purchase this product, the probability
will drive through the neighborhood? distribution for the random variable X, the number of bags sold,
d. What is the probability that more than seven police is given in the table below.
cruisers will drive through the neighborhood? x 1 2 3 4 5
e. Suppose at least two police cruisers were sighted in
the neighborhood during the one-hour period. What is p(x) 0.55 0.35 0.07 0.02 0.01
the probability that there were at least four during this
Suppose a person buying winter fertilizer is randomly selected.
time?
a. What is the probability that the customer buys more than
5.35 Public Policy and Political Science According to a two bags?
January 2013 survey by the Pew Research Center, the percent- b. What is the probability that the customer does not buy
age of Americans who trust the government in Washington has two bags 3 P ( X 2 2 ) 4 ?
decreased steadily since Bill Clinton left office.6 Today, c. Find the probability that two randomly selected customers
approximately 30% of Americans say they trust the govern- each buy one bag.
ment (always or most of the time). Suppose three Americans d. Suppose two customers are randomly selected. What is
are selected at random. Let X be the number who trust the the probability that the total number of bags purchased
government. will be at least eight?
202 C h apter 5 Random Variables and Discrete Probability Distributions
e. Let the random variable Y be the number of pounds sold that allows customers to pay for meals with a bank debit card.
to a randomly selected customer buying winter fertilizer. The manager of the restaurant claims this new system will
Find the probability distribution for Y. decrease the waiting time, and that the probability of getting a
meal in under two minutes (with this system in place) is 0.75.
5.40 Sports and Leisure Suppose the probability that a
Suppose four customers are selected at random. Let the random
person says he or she was at the Woodstock Festival and
variable X be the number of customers who receive their meal
Concert is 0.20. An experiment consists of randomly selecting
in under two minutes.
people in Green Bay and asking them whether they were at
a. Construct the probability distribution for X.
Woodstock. The experiment stops as soon as one person says he
b. Suppose none of the four customers receives their
or she was there. The random variable X is the number of
meal in under two minutes. Is there any evidence to
people stopped and questioned (until one person says he or she
suggest the manager’s claim is false? Justify your
was there). Let Y and N stand for a Yes and No response,
answer.
respectively.
a. List the first several outcomes in the sample space. 5.43 Demographics and Population Statistics Most
b. Find the probability of each outcome in part (a). physicians in Canada graduated from a medical school in
c. Find the value of the random variable associated with Canada. However, some attended medical school in the
each outcome in (a). United States or another foreign country. The probability
d. Find a formula for the probability distribution of X. that a physician in Alberta attended a Canadian medical
school is 0.684; in British Columbia, 0.705; in Quebec,
5.41 Sports and Leisure A game show contestant on Let’s
0.891; and in Ontario, 0.736.8 Suppose one physician is
Make a Deal selects two envelopes with prize money enclosed.
independently selected from each province, and let the
Two of the envelopes contain $100, one envelope contains
random variable Y be the number of physicians who
$250, two envelopes contain $500, and the last envelope
attended a Canadian medical school.
contains $1000. Let the random variable M be the maximum of
a. Construct a probability distribution for Y.
the two prizes.
b. What is the probability that at least one physician
a. Find the probability distribution for M.
attended a Canadian medical school?
b. Suppose two contestants independently select prize
c. Suppose another group of physicians is selected, one from
envelopes on two different days. What is the probability
each province. What is the probability that at least three
that both win the top prize?
physicians from each group attended a Canadian medical
5.42 Economics and Finance A Subway restaurant in school?
Scotsbluff, Nebraska, recently installed a new computer system
Solution
Step 1 This question is concerned with the amount earned each day on average, not on any
one particular day. Consider the probabilities given and consider five typical days.
On four of five days, the caddy earns $50. On the fifth day, he earns $0. The total
earned for the five typical days is $200. To find the average amount earned each
day, divide by five: $200/5 5 $40.
Step 2 Consider a random variable X that takes on only two values, 0 and 50, with prob-
abilities 0.20 and 0.80, respectively. Another way to compute the average earned
each day is to use this probability distribution.
40 5 0 3 0.20 1 50 3 0.80
c c c c
Value Probability Value Probability
The long-run average earnings per day can be found by using a probability dis-
tribution. Multiply each value by its corresponding probability, and sum these
products.
Definition
Let X be a discrete random variable with probability mass function p(x). The mean, or
expected value, of X is
E ( X ) 5 m 5 mX 5 a 3 x # p ( x ) 4 . (5.1)
8 8
all x
Notation Calculation
A Closer L ok
1. The capital E stands for expected value and is a function. The function E takes a ran-
dom variable as an input and returns the expected value. More generally, E accepts as
an input any function of a random variable. For example, suppose f ( X ) is a function
of a discrete random variable X. The expected value of f ( X ) is
E 3 f (X ) 4 5 a 3 f (x) # p(x) 4 (5.2)
all x
2. m is the mean, or expected value, of a random variable (which may model a popula-
tion). If necessary, the associated random variable is used as a subscript for identifi-
cation, for example, mX or mY .
3. The mean is easy to compute. Multiply each value of the random variable by its cor-
responding probability, and add the products.
4. The mean of a random variable is a weighted average and is only what happens on
average. The mean may not be any of the possible values of the random variable.
x 220 210 30 60
p(x) 0.2 0.3 0.4 0.1
Find the mean of X.
204 C h apter 5 Random Variables and Discrete Probability Distributions
Solution
Step 1 X is a discrete random variable. To find the mean, use Equation 5.1.
x 0 1 2 3 4 5
p(x) 0.55 0.28 0.09 0.04 0.03 0.01
Find the expected number of glasses of soda that an American drinks each day.
Solution
Step 1 X is a discrete random variable (it takes on only a finite number of values). Find
the mean using Equation 5.1.
The variance and standard deviation of a random variable measure the spread of the
distribution. The variance is computed using the expected value function, and the stan-
dard deviation together with the mean can be used to determine the most likely values of
the random variable.
Definition
Let X be a discrete random variable with probability mass function p(x). The variance
of X is
s 5 sX 5 "s2 (5.4)
Notation Calculation
A Closer L ok
1. In words, the variance is the expected value of the squared deviations about the
mean.
2. The symbol Var stands for variance and is a function. The function Var takes a random
variable as an input and returns the variance.
3. To compute the variance using Equation 5.3:
a. Find the mean, m, of X using Equation 5.1.
b. Find each difference: ( x 2 m ) .
c. Square each difference: ( x 2 m ) 2.
d. Multiply each squared difference by the associated probability.
e. Sum the products.
4. There is a computational formula for the variance of a random variable.
In words, the variance is the expected value of X squared minus the expected value of
X, squared.
x 1 2 3 4 5 6 7
p(x) 0.05 0.10 0.15 0.25 0.20 0.15 0.10
Solution
a. Expected value: Find the expected value of X using Equation 5.1.
E(X) 5 a 3 x # p(x) 4
all x
5 ( 1 )( 0.05 ) 1 ( 2 )( 0.10 ) 1 ( 3 )( 0.15 ) 1 ( 4 )( 0.25 ) 1 ( 5 ) ( 0.20 )
1 ( 6 )( 0.15 ) 1 ( 7 )( 0.10 )
5 0.05 1 0.20 1 0.45 1 1.00 1 1.00 1 0.90 1 0.70
5 4.30 5 m
206 C h apter 5 Random Variables and Discrete Probability Distributions
Here is a tabular method for computing the variance using the definition. Sum the last
column to obtain s2 .
E ( X 2 ) 5 a 3 x2 # p ( x ) 4 Equation 5.2.
all x
5 0.05 1 0.40 1 1.35 1 4.00 1 5.00 1 5.40 1 4.90 Compute each product.
5.3 Mean, Variance, and Standard Deviation for a Discrete Random Variable 207
The probability that X takes on a value within one standard deviation of the mean is
0.60.
A Closer L ok
1. The computational formula for the variance is quicker and produces less round-off
error. Use Equation 5.5 to find the variance of a discrete random variable.
2. In Example 5.11, the random variable X is not (approximately) normal; the empirical
rule does not apply. In addition, even though Chebyshev’s rule applies to any distribu-
tion, it should not be used if the probability distribution is known. Chebyshev’s rule
provides only an estimate, a lower bound for the probability that X is within k standard
deviations of the mean. The exact probability can be determined using the known
probability distribution. (Actually, Chebyshev’s rule can’t help at all here, because k
must be greater than 1.)
3. Neither the TI-84 Plus C nor Minitab has a built-in menu function to find the mean, vari-
ance, and standard deviation of a discrete random variable. However, it’s easy to perform
list operations on the calculator and column operations in Minitab or Excel in order to
produce these summary statistics. See the technology manuals for details.
x 20 10 5 4 2 0
p(x) 0.006667 0.013333 0.040000 0.046667 0.100000 0.783172
208 C h apte r 5 Random Variables and Discrete Probability Distributions
a. Find the expected payout, and the variance and standard deviation of the payout.
b. Interpret the values found in part (a). In particular, if it costs $2.00 to purchase a ticket,
what happens in the long run?
Solution
a. Use the tabular method and the computation formula for the variance.
x2 # p ( x ) x2 p(x) x x # p(x)
625.0000 625000000 0.000001 25000 0.0250
100.0000 25000000 0.000004 5000 0.0200
5.5000 250000 0.000022 500 0.0110
8.0000 10000 0.000800 100 0.0800
6.6675 2500 0.002667 50 0.1334
4.1669 625 0.006667 25 0.1667
2.6668 400 0.006667 20 0.1333
1.3333 100 0.013333 10 0.1333
1.0000 25 0.040000 5 0.2000
0.7467 16 0.046667 4 0.1867
0.4000 4 0.100000 2 0.2000
0.0000 0 0.783172 0 0.0000
E(X 2) S 755.4811 1.2894 d m
5.52 The probability distribution for a random variable X is a. Is this a valid probability distribution? Justify your answer.
given in the table below. b. Find the mean, variance, and standard deviation of X.
c. TRAIL
SOLUTION Find the probability X takes on a value less than the mean.
x 5 10 15 20 d. Suppose two bags of pistachios are selected at random.
p(x) 0.10 0.15 0.70 0.05 What is the probability that both bags have four or more
pistachios too difficult to open by hand?
a. Find the mean, variance, and standard deviation of X. 5.57 Sports and Leisure Suppose the number of rides
b. Find the probability X takes on a value smaller than the that a visitor enjoyed at Disney World during a day is a
mean. random variable with probability distribution given in the table
c. Using the probability distribution, explain why the value below.11
of the mean of X makes sense.
x 5 6 7 8 9 10 11 12
5.53 Suppose Y is a discrete random variable with probability
distribution given in the table below. p(x) 0.04 0.07 0.09 0.12 0.20 0.30 0.13 0.05
y 220 210 0 10 20 a. Find the mean number of rides for a Disney World visitor.
p(y) 0.30 0.15 0.10 0.15 0.30 b. Find the variance and standard deviation of the number of
rides for a Disney World visitor.
c. Find the probability that the number of rides for a
a. Find m, s2 , and s .
randomly selected Disney World visitor is within one
b. Find P ( m 2 2s # Y # m 1 2s ) .
standard deviation of the mean. Write a Solution Trail for
c. Find P ( Y $ m ) and P ( Y . m ) .
this problem.
5.54 Suppose the random variable X has the probability d. Find the probability that the number of rides for a
distribution given in the table below. randomly selected Disney World visitor is less than one
standard deviation above the mean.
x 2 3 5 7 11 13
5.58 Public Policy and Political Science Approximately
p(x) 0.15 0.25 0.15 0.10 0.30 0.05 100 children’s products are recalled every year.12 In particular,
children’s clothing is recalled for a variety of reasons, for
a. Find the mean, variance, and standard deviation of X. example, drawstrings that are too long and pose a hazard, small
b. Suppose the random variable Y is defined by Y 5 2X 1 1. buttons that may break off and cause choking, and material that
Find the mean, variance, and standard deviation of Y. fails to meet federal flammability standards. Suppose the
c. Suppose the random variable W is defined by number of recalls of children’s clothing during a given month is
W 5 X 2 1 1. Find the mean, variance, and standard a random variable with probability distribution given in the
deviation of W. table below.
5.55 Suppose X is a discrete random variable with probability x 0 1 2 3 4 5 6
distribution as given in the table below.
p(x) 0.005 0.185 0.275 0.305 0.200 0.020 0.010
x 1 2 3 5 8 13 21
p(x) 0.05 0.10 0.15 0.20 0.25 0.20 0.05 a. Find the mean, variance, and standard deviation of the
number of recalls of children’s clothing during a given
month.
a. Find the mean, variance, and standard deviation of X.
b. Suppose the number of recalls in a given month is at least
b. Find the probability X is more than one standard deviation
three. What is the probability that the number of recalls
from the mean.
that month will be at least five?
c. Find P ( X # m 1 2s ) .
c. If the number of recalls in a given month is above more
than one standard deviation from the mean, the federal
Applications government issues a special warning directed toward
5.56 Manufacturing and Product Development In a parents. What is the probability that a special warning
quarter-pound bag of red pistachio nuts, some shells are too will be issued during a given month?
difficult to pry open by hand. Suppose the random variable X, 5.59 Public Health and Nutrition A bill introduced into the
the number of pistachios in a randomly selected bag that cannot Virginia State Senate stipulated that the owner of a tanning
be opened by hand, has the probability distribution given in the facility must identify and document the skin type of every
table below. customer, and must advise every customer of the maximum
time of recommended exposure in the tanning device.13 Past
x 0 1 2 3 4 5
records indicate that most sessions range from 10 to 30 minutes.
p(x) 0.500 0.250 0.100 0.050 0.075 0.025 Suppose the duration of a tanning session (in minutes) at
210 C h apte r 5 Random Variables and Discrete Probability Distributions
p(x) 0.07 0.12 0.18 0.37 0.17 0.09 0 1/n 1 2511 / n 2 262,544 / n
3 839,562 / n 4 204,764 / n 5 15,344 / n
a. Find the mean, variance, and standard deviation of the 6 1397 / n 7 204 / n 8 32 / n
number of patients assigned to each nurse.
b. Find the probability that the number of assigned patients is
greater than one standard deviation to the right of the mean. a. Find the mean, variance, and standard deviation for the
c. Suppose three nurses are selected at random. What is the Bacon Number.
probability that exactly two of the three have five assigned b. Find P ( X $ m 2 s ) .
patients? c. The number of movies, Y, in which a Hollywood
personality has appeared is related to the Bacon Number
5.61 Psychology and Human Behavior A certain elevator by the formula Y 5 2X 1 5. Find the mean, variance, and
in Tampa’s tallest office building, 100 North Tampa, is used standard deviation of the number of movies in which a
heavily between 8:00 a.m. and 9:00 a.m. as employees arrive Hollywood personality has appeared.
for work. Suppose the number of people who board the elevator
on the ground floor going up is a random variable with 5.64 Marketing and Consumer Behavior While the
probability distribution given in the table below. temperature range of most household ovens is approximately
200–600°F, most consumers only use four or five common
x 1 2 3 4 5 6 settings. Suppose the probability distribution for the oven-
p(x) 0.002 0.010 0.050 0.060 0.080 0.090 temperature setting for a randomly selected use is given in the
table below.
x 7 8 9 10 11 12
x 300 325 350 375 400 500
p(x) 0.100 0.120 0.140 0.150 0.150 0.048
p(x) 0.040 0.205 0.400 0.075 0.200 0.080
2
a. Find m, s , and s .
b. For a randomly selected elevator ride from the ground a. Find the mean, variance, and standard deviation of the
floor going up, what is the probability that the number of oven temperature settings.
riders is within one standard deviation of the mean? b. Suppose three different uses are randomly selected. Find
c. For two randomly selected elevator rides from the ground the probability that the temperature settings for all three
floor going up, what is the probability that both trips have are at least 400°F.
a number of riders more than two standard deviations c. Suppose three different uses are randomly selected. Find
from the mean? the probability that exactly one use is for 350°F.
5.4 The Binomial Distribution 211
There are four common properties in all of these experiments. These properties are
used to describe a binomial experiment, and they are necessary to define a binomial
random variable.
A Closer L ok
1. A trial is a small part of the larger experiment. A trial results in a single occurrence of
either a success or a failure. For example, flipping a coin once, or drilling one test oil
well, is a single trial. A typical binomial experiment might consist of n 5 50 trials.
2. A success does not have to be a good thing. For example, the experiment may consist
of injecting animals with a potential carcinogen and checking for the development of
tumors. A success might be an animal that develops at least one tumor. Success and
failure could stand for heads and tails, acceptable and not acceptable, or even dead
and alive.
3. Trials are independent if whatever happens on one trial has no effect on any other trial.
For example, any one voter response has no effect on any other voter response.
4. The probability of a success on every trial is exactly the same. For example, the prob-
ability of the tossed (fair) coin landing with head face up is always 1/2.
Notation
Why is P(F) 5 1 2 p? 1. The probability of a success is denoted by p. Therefore, P ( S ) 5 p and P ( F ) 5
1 2 p 5 q.
2. A binomial random variable X is completely determined by the number of trials n and
the probability of a success p. If we know those two values, then we will be able to
answer any probability question involving X.
The shorthand notation X , B ( n, p ) means X is (distributed as) a binomial random
variable with n trials and probability of a success p.
For example, X , B(25, 0.4) means X is a binomial random variable with 25 trials
and probability of success 0.4.
Our goal now is to find the probability distribution for a binomial random variable.
Given n and p, we want to find the probability of obtaining x successes in n trials,
P ( X 5 x ) 5 p ( x ) . We will solve this problem by first considering a simple case with
n 5 5.
5.4 The Binomial Distribution 213
For example, suppose five people Example 5.13 Binomial Experiment with n 5 5
are selected at random. Let p be
Consider a binomial experiment with n 5 5 trials and probability of success p.
the probability that a randomly
selected person snores. a. A typical outcome with two successes is SFFSF. Find the probability of this outcome.
b. Another possible outcome with two successes is FSFSF. Find the probability of this
outcome.
c. Compare your results from (a) and (b).
d. Find the probability that X 5 2 successes.
Solution
a. The probability of this outcome is
P ( SFFSF ) 5 P ( S d F d F d S d F )
Probability of a success on the first trial and a failure on the second trial and . . . .
c. The results are identical. Every other outcome with two successes, and therefore three
failures, has exactly the sample probability, p2 ( 1 2 p ) 3. Therefore, the probability of
an outcome depends on the number of successes (and failures), not on the order in
which they appear.
d. To compute the probability that X 5 2 successes, find all the outcomes mapped to a 2,
and add the corresponding probabilities. However, every outcome that is mapped to a
2 has the same probability, so all we need to know is how many outcomes are mapped
to a 2.
P ( X 5 2 ) 5 ( number of outcomes with 2 successes ) p2 ( 1 2 p ) 3
The number of successes and the Generalizing, suppose X , B ( n, p ) . We want to find the probability of obtaining x
s
number of failures must sum to n, successes in n trials. The probability of any single outcome with x successes, and there-
the total number of trials. fore n 2 x failures, is px ( 1 2 p ) n2x. The probability of obtaining x successes is
P ( X 5 x ) 5 ( number of outcomes with x successes ) px ( 1 2 p ) n2x
We need a method for quickly counting the number of outcomes with x successes.
Recall: For any positive whole number n, the symbol n! (read “n factorial”) is defined
by
n! 5 n ( n 2 1 )( n 2 2 ) c( 3 )( 2 ) ( 1 )
In addition, 0! 5 1 (0 factorial is 1). Given a collection of n items, the number of com-
binations of size x is given by
n n!
nCx 5a b5
x x! ( n 2 x ) !
The number of outcomes with x successes is determined using combinations. Suppose
n
Can you figure out why this is X , B ( n, p ) . The number of outcomes with x successes is a b. We can now write an
true?
x
expression for the probability of obtaining x successes in n trials.
s
SOLUTION TRAIL
5 0.7759
The probability that at least seven bottles of ice wine are from Ontario is 0.7759.
Stepped
STEPPED Tutorial
TUTORIALS Note:
Binomial
BOX PLOTS
probabilities 1. The two most important elements for solving this problem are (1) the probability dis-
tribution and (2) the probability statement.
5.4 The Binomial Distribution 215
2. Often, the properties of a binomial experiment will not be stated explicitly in the prob-
lem. Usually we must read into the problem to see the n trials, to identify a success, to
recognize independence, and to presume that the probability of a success remains con-
stant from trial to trial.
A Closer L ok
1. Even for small values of n, many of the probabilities associated with a binomial ran-
dom variable are a little tedious to calculate, and are subject to lots of round-off error.
Technology helps, and Table 1 in Appendix A presents cumulative probabilities for a
binomial random variable, for various values of n and p.
2. Cumulative probability is an important concept. If X , B ( n, p ) , the probability that X
takes on a value less than or equal to x is cumulative probability. Accumulate all the
probability associated with values up to and including x. Symbolically, cumulative
probability is
P(X # x) 5 a P(X 5 k)
x
(5.7)
k50
5 P ( X 5 0 ) 1 P ( X 5 1 ) 1 P ( X 5 2 ) 1 c1 P ( X 5 x )
3. Graphically, cumulative probability is like standing on a special staircase, looking
down (or back), and measuring the height. The steps are labeled 0, 1, 2, . . . , n, the
height of step x is P ( X 5 x ) , and the total height of the staircase is 1. Figure 5.4 illus-
trates P ( X # 3 ) . The number of steps is n 1 1, and the height of each step depends on
n and p. In this example, n 5 10 and p 5 0.25. The largest steps (the highest proba-
bilities) are associated with X 5 1, 2, and 3. Steps 7, 8, 9, and 10 are hard to see
SOLUTION TRAIL
because P ( X 5 7 ) , P ( X 5 8 ) , P ( X 5 9 ) , and P ( X 5 10 ) are so small.
1
K e y words
P(X = 2) P(X 3)
n 40%
n 20 pizza orders
n Randomly selected P(X = 1)
P(X = 0)
Tr a nslat io n 0 1 2 3 4 5 6 7 8 9 10
n p 5 0.40 Figure 5.4 Staircase analogy to cumulative probability.
n n 5 20
C on cepts Every probability question about a binomial random variable can be answered using
n Probabilities associated with a cumulative probability. There may also be other, faster methods, but cumulative probabil-
binomial distribution ity always works. The following example illustrates some of the techniques for converting
to and using cumulative probability.
V is io n
Consider a binomial experiment:
n and p are given, and the four Example 5.15 A Slice Above
characteristics of a binomial Approximately 40% of all pizza orders are carry-out.18 Suppose 20 pizza orders are ran-
experiment are assumed to apply. domly selected.
Identify the probability
distribution and write each a. Find the probability that at most 8 are carry-out orders.
probability question in terms of b. Find the probability that exactly 10 are carry-out orders.
the random variable. Use
c. Find the probability that at least 7 are carry-out orders.
appropriate probability rules.
d. Find the probability that between 5 and 11 (inclusive) are carry-out orders.
216 C h a p t e r 5 Random Variables and Discrete Probability Distributions
Solution
Let X be the number of pizza orders (out of the 20 selected) that are carry-out. X is a
binomial random variable with n 5 20 and p 5 0.40: X , B ( 20, 0.40 ) .
a. The probability that at most 8 are carry-out orders
5 0.1172
This solution may also be found by using the probability mass function for a binomial
random variable (Equation 5.6). In addition, most statistical software can compute
binomial probabilities for single values.
c. The probability that at least 7 are carry-out orders
5 P(X $ 7) Translate the words into mathematics.
5 0.7500
d. The probability that between 5 and 11 (inclusive) are carry-out orders
5 0.8925
Figures 5.5 through 5.8 show technology solutions.
Figure 5.5 P(X # 8); Figure 5.6 P(X 5 10); using Figure 5.7 P(X $ 7); using Figure 5.8 P(5 # X # 11);
cumulative probability. the probability mass function. cumulative probability. using cumulative probability.
The next example shows how the binomial distribution can be used to make an
inference.
Solution Trail 5.16a or joint pain, which is considered an adverse reaction.19 Suppose 25 people who need
Lipitor are selected at random. Each is given a 40-mg dose (per day), and the number of
K e y w or d s people who experience joint pain is recorded.
n 10%
a. Find the probability that at most one person will experience joint pain.
n 25 people
b. Suppose seven people experience joint pain. Is there any evidence to suggest that
n At most one
Pfizer’s claim is wrong? Justify your answer.
T r a n s l at io n
n p 5 0.10 Solution
n n 5 25 Let X be the number of people (out of the 25 selected) who experience joint pain after tak-
n X#1 ing Lipitor. X is a binomial random variable with n 5 25 and p 5 0.10: X , B ( 25, 0.10 ) .
Translate the words into a mathematical probability statement, convert to cumulative
C once pts
probability if necessary, and use Table I in the Appendix.
n Binomial random variable
a. The probability that at most one will experience joint pain
V is ion
Consider a binomial experiment: 5 P(X # 1) Translate the words into mathematics.
n 5 25 trials with two outcomes 5 0.2712 Already cumulative probability; use Table 1 in Appendix A.
(joint pain or no joint pain), the
trials are independent (random The probability of at most one person experiencing joint pain is 0.2712.
sample), and the probability of a b. Pfizer claims p 5 0.10. This implies that the random variable X has a binomial
success (experience joint pain) is distribution with n 5 25 and p 5 0.10.
constant from trial to trial.
Claim: p 5 0.10 S X , B ( 25, 0.10 ) .
The experimental outcome is that seven people experience joint pain.
SOLUTION TRAIL
Experiment: x 5 7.
It seems reasonable to consider P ( X 5 7 ) and draw a conclusion based on this prob-
ability. However, to be conservative (to give the person making the claim the benefit of
the doubt), we always consider a tail probability. We accumulate the probability in a
tail of the distribution, and if it is small, then there is evidence to suggest the claim is
Solution Trail 5.16b false.
K E Y W OR DS
So, which tail? It depends on the mean of the distribution (and later on, the alternative
hypothesis). Formulas for the mean, variance, and standard deviation of a binomial
n Is there any evidence?
random variable are given below. Intuitively, however, the mean of a binomial random
T R ANS L AT I O N variable is m 5 np. If n 5 25 and p 5 0.10, we expect to see m 5 ( 25 )( 0.10 ) 5 2.5
n Use the experimental outcome people experience joint pain. Because x 5 7 is to the right of the mean, we’ll consider
to draw a conclusion concern- a right-tail probability. See Figure 5.9.
ing Pfizer’s claim.
p(x)
C ONC E P T S 0.25
n Inference procedure
0.20
V I S I ON
To decide whether seven people
experiencing joint pain is 0.15
reasonable, we need to follow the
four-step inference procedure. 0.10
This process now involves a
random variable. Consider the
assumption (claim), the 0.05
experiment, and the likelihood of
the experimental outcome. 0.00
0 1 2 3 4 5 6 7 8 9 10 x
Likelihood:
P(X $ 7) 5 1 2 P(X , 7) The complement rule.
5 0.0095
Conclusion: Because this tail probability is so small (less than 0.05), it is very unusual
to observe seven or more people with joint pain. But it happened! This is either an
incredibly lucky occurrence, or someone is lying. We usually discount the lucky pos-
sibility, and conclude that there is evidence to suggest Pfizer’s claim is false.
Figure 5.10 shows a technology solution.
A random variable is often described, or characterized, by its mean and variance (or
standard deviation): m and s2 (or s ). If we know m and s , we can use Chebyshev’s rule
to determine the most likely values of the random variable. For any population (random
variable), most (at least 89%) of the values are within three standard deviations of the
mean. This fact provides another approach to statistical inference, for determining the
SOLUTION TRAIL likelihood of an experimental outcome.
To find the mean and variance of a binomial random variable, we could use the math-
ematical definitions (Equations 5.1 and 5.3). These formulas are used to produce the
general results below. However, the mean is intuitive. Consider a binomial random vari-
able X , B ( 10, 0.5 ) . We expect to see ( 10 )( 0.5 ) 5 5 5 np successes in 10 trials. (Think
about tossing a fair coin 10 times.) Similarly, if X , B ( 100, 0.75 ) , we expect to see
Solution Trail 5.17a 100 ( 0.75 ) 5 75 5 np successes. The mean of a binomial random variable with n trials
Ke ywor d s
and probability of a success p is m 5 np.
n 60%
n 100 children
Mean, Variance, and Standard Deviation of a Binomial
Random Variable
Tr anslat ion
If X is a binomial random variable with n trials and probability of a success p, X , B ( n, p ) ,
n p 5 0.60 then
n n 5 100
m 5 np s2 5 np ( 1 2 p ) s 5 "np ( 1 2 p ) (5.8)
C oncept s
n Binomial random variable,
mean, variance, standard
Given a binomial random variable, n, and p, we know the mean, variance, and standard
deviation
deviation immediately. There is no need to create a table of values and probabilities, and
Vis ion use the formulas to find m and s2 . Here is an example to illustrate the use of this concept.
Consider a binomial experiment:
n 5 100 trials, two outcomes Example 5.17 Children in Poverty
(living in poverty or not living in Troubles in the automobile industry and the global economic downturn caused high
poverty), trials are independent
unemployment and a population exodus in Detroit. According to the 2013 State of Detroit
(random sample), and probability
report, 60% of all children there live in poverty.20 Suppose 100 children in Detroit are
of a success (living in poverty) is
constant from trial to trial. Define selected at random.
a random variable and identify its a. Find the mean, variance, and standard deviation of the number of children living in
probability distribution. poverty.
b. Suppose 55 of the 100 children are living in poverty. Is there any evidence to suggest
the report’s claim is false? Justify your answer.
SOLUTION TRAIL
A Closer L ok
1. Recall: In statistics, we usually measure distance in standard deviations, not miles,
feet, inches, meters, or other units. We often want to know how many standard devia-
tions from the mean is a given observation.
2. The inference problem in part (b) of Example 5.17 can also be answered using the tail
probability approach. (Try it!) This method leads to the same conclusion and is gener-
ally more precise than constructing an interval about the mean. A more formal process
for checking claims (hypothesis tests) is introduced in Chapter 9.
3. Whenever we test a claim, there are only two possible conclusions:
a. There is evidence to suggest the claim is false.
b. There is no evidence to suggest the claim is false.
Note that, in either case, we never state with absolute certainty that the claim is true or
the claim is false. This is because we never look at the entire population, only at a
sample. With a large, random (representative) sample, we can be pretty confident in
our conclusion, but never absolutely sure.
Technology Corner
Procedure: Compute probabilities associated
VIDEO TECHwith a binomial random variable.
MANUALS VIDEO Tech
Video TECH Manuals
MANUALS
Reconsider: Example 5.15, solution, and interpretations.
EXEL DISCRIPTIVE EXEL DISCRIPTIVE
Binomial probability
computations
220 C h apter 5 Random Variables and Discrete Probability Distributions
CrunchIt!
CrunchIt! has a built-in function to compute probabilities associated with a binomial random variable.
1. Select Distribution Calculator; Binomial. Enter the values for n and p, and select an appropriate inequality symbol or
the equals sign. Enter the value for x and click Calculate. See Figure 5.11.
2. To find the probability that X takes on a single value, use the same menu choices and enter values for n and p. Select the
equals sign and enter the value for x. Click Calculate. See Figure 5.12.
TI-84 Plus C
Suppose X , B ( n, p ) . There are built-in functions to compute a value of the probability mass function (the probability
X takes on a single value) and cumulative probability.
1. For cumulative probability, use DISTR ; DISTR; binomcdf. Enter values for the number of trials (n), p (p), and
x value (x). Position the cursor on Paste and tap ENTER . The appropriate calculator command is copied to the
Home screen. Tap ENTER again to compute the resulting probability. Refer to Figure 5.5, page 216.
2. To find the probability that X takes on a single value, use DISTR ; DISTR; binompdf. Enter values for the number
of trials (n), p (p), and x value (x). Position the cursor on Paste and tap ENTER . The appropriate calculator
command is copied to the Home screen. Tap ENTER again to compute the resulting probability. Refer to Figure 5.6,
page 216.
Minitab
There are built-in functions accessed via input windows or the command language to compute a value of the probability
mass function (the probability that X takes on a single value) and cumulative probability. The command language may be
necessary to perform additional calculations involving probabilities.
1. Select Calc; Probability Distributions; Binomial. Choose Cumulative probability. Enter the Number of trials (n), the
Event probability (p), and the Input constant (x). See Figure 5.13.
2. To find the probability that X takes on a single value, select Calc; Probability Distributions; Binomial. Choose
Probability. Enter the Number of trials (n), the Event probability (p), and the Input constant (x). See Figure 5.14.
Excel
There is a single built-in function to find the probability that X takes on a single value or cumulative probability. The last
argument of the function is either True for cumulative probability or False for the probability mass function. Additional
spreadsheet calculations may be necessary to find the final answer.
1. To find cumulative probability, use the function BINOMDIST. Enter x, n, p, and True. See Figure 5.15.
2. To find the probability that X takes on a single value, use BINOMDIST. Enter x, n, p, and False. See Figure 5.15.
Practice
Applications
5.81 Suppose X , B ( 15, 0.25 ) . Find the following probabilities.
5.87 Economics and Finance Approximately 90% of
a. P ( X # 2 ) b. P ( X , 2 )
freshmen at Marquette University receive financial aid.21
c. P ( X 5 7 ) d. P ( X . 6 )
Suppose 20 Marquette freshmen are randomly selected.
e. P ( 3 # X # 10 )
a. Find the probability that at most 15 freshmen receive
5.82 Suppose X , B ( 20, 0.40 ) . Find the following probabilities. financial aid.
a. P ( X $ 12 ) b. P ( X 2 10 ) b. Find the probability that at least 12 freshmen receive
c. P ( X # 15 ) d. P ( 2 , X # 8 ) financial aid.
222 C h apte r 5 Random Variables and Discrete Probability Distributions
5.95 Demographics and Population Statistics There is d. Suppose two groups of 20 random orders are independently
some evidence to suggest that businesses are moving out of selected. What is the probability that at least 16 orders will
states where unions are prevalent. In California, 18.4% of all be filled correctly in both groups?
workers belong to a union, and in Arkansas, 3.7%.24 Suppose
5.99 Sports and Leisure The reality television series
20 workers from California and 20 workers from Arkansas are
Splash! features celebrities attempting to learn how to dive.
selected at random.
The first episode aired in January 2013 and earned a 23.6%
a. Find the probability that at most two California workers
audience share. That is, 23.6% of all TVs in use during the
belong to a union.
show time period were tuned to a station airing Splash!.27
b. Find the probability that none of the Arkansas workers
Twenty people who watched TV during that time period were
belongs to a union.
selected at random.
c. Find the probability that at least one worker from each
a. Find the probability that at least six watched Splash!.
state belongs to a union.
b. Find the expected number of people who watched
5.96 Public Health and Nutrition A recent report suggests Splash!. Find the probability that the number of people
that one-third of all U.S. children are overweight or obese. One who watched Splash! is less than the mean.
possible cause is the availability of junk foods and sugary c. Suppose that at most four people watched Splash!. What
snacks in schools. Approximately 40% of all students buy and is the probability that no one watched Splash!?
eat one or more snacks at school.25 Suppose 40 school children
5.100 Marketing and Consumer Behavior More children
are selected at random.
are being rushed to the hospital because they were able to push
a. Find the mean, variance, and standard deviation of the
down and twist the cap on a medication bottle and were
number of school children (out of the 40 selected) who
poisoned by a common drug. A recent research study suggested
buy and eat a snack at school.
that 25% of all preschool children can open a medication bottle
b. Construct intervals one, two, and three standard
and 10 preschool children are selected at random. Let the
deviations from the mean.
random variable X be the number of children who can open
c. Suppose 27 students (out of the 40 selected) buy and eat a
the bottle.
snack at school. How many standard deviations from the
a. Construct a probability histogram for the random
mean is this observation? What does this distance
variable X.
measure indicate about the likelihood of observing 27
b. Find the mean, variance, and standard deviation of X.
students who buy and eat a snack at school?
Indicate the mean on the graph from part (a).
5.97 Business and Management According to the CICA c. Find P ( m 2 s # X # m 1 s ) and indicate this
Business Monitor, 11% of Canada’s executive chartered probability on the graph from part (a).
accountants are pessimistic about the economy.26 To check d. Suppose these 10 children try to open a medicine bottle
this claim, 50 chartered accountants were randomly selected with a new design for the cap and one child is able to
and asked whether they feel pessimistic about the economy open the bottle. Is there any evidence to suggest that the
in 2013. new cap is more effective in stopping children from
a. If the claim is true, find the probability that exactly seven opening the bottle? Justify your answer.
chartered accountants are pessimistic about the economy. 5.101 Manufacturing and Product Development A
b. If the claim is true, find the probability that at most four company has developed a very inexpensive explosive-detection
chartered accountants are pessimistic about the economy. machine for use at airports. However, if an explosive is actually
c. Suppose 12 chartered accountants are pessimistic about in a suitcase, the probability of it being detected by this
the economy. Is there any evidence to suggest that the machine is only 0.60. Therefore, several of these machines will
claim is false? Justify your answer. be used simultaneously to screen each piece of luggage
independently. Suppose a piece of luggage actually contains an
Extended Applications explosive.
a. If three machines screen this luggage, what is the
5.98 Business and Management In the movie Lethal probability that exactly one will detect the explosive?
Weapon, the character played by Joe Pesci is concerned about What is the probability that none of the three will detect
drive-thru windows at fast-food restaurants. Suppose the the explosive?
probability that an order at a drive-thru window at a fast-food b. If four machines screen this luggage, what is the
restaurant will be filled correctly is 0.75. Twenty orders are probability that at least one device will detect the
selected at random. explosive?
a. What is the probability that exactly 15 orders will be c. If five machines screen this luggage, what is the
filled correctly? probability that at least one device will detect the
b. What is the probability that at most 12 orders will be explosive?
filled correctly? d. How many machines are necessary for screening in order
c. What is the probability that between 10 and 14 (inclusive) to be certain that at least one device will detect the
orders will be filled correctly? explosive with probability 0.999 or greater?
224 C h apter 5 Random Variables and Discrete Probability Distributions
5.102 Marketing and Consumer Behavior Forever 21 5.104 Psychology and Human Behavior As a result of
sells women’s flip-flops in (oddly enough) 21 different colors. stricter training requirements, fewer big fires, higher-paying
Despite this vast available color selection, 50% of all flip-flop jobs in cities, and changes in society, the number of volunteer
purchases are in white. Suppose 30 buyers are selected at firefighters is declining. Approximately three-fourths of all
random. firefighters in the United States are volunteers, and the total
a. Find the mean, variance, and standard deviation of the number of volunteer firefighters has decreased steadily over the
number of buyers who purchase white flip-flops. last two decades. Suppose 30 U.S. firefighters are selected at
b. Find the probability that the number of white flip-flops random.
purchased will be within two standard deviations of the a. Find the probability that exactly 22 of the firefighters are
mean. Compare this with the predicted result from volunteers.
Chebyshev’s rule. b. Find the probability that more than 25 of the firefighters
c. Suppose two groups of 30 customers are independently are volunteers.
selected. What is the probability of at least one group c. Suppose 17 of the firefighters are volunteers. Is there any
having exactly 15 people who buy white flip-flops? evidence to suggest that the proportion of volunteer
firefighters has decreased? Justify your answer.
5.103 Manufacturing and Product Development In early
d. Suppose 50 firefighters are selected at random from the
2013 there was a worldwide glut of solar panels. This forced
West and 50 firefighters are selected from the Northeast.
prices down and made this form of energy production afford-
What is the probability that at least 40 of the firefighters
able to many more people. This oversupply of solar panels may
will be volunteers in both groups?
have caused some manufacturers to cut corners. Sainty Solar is
a leading producer and claims the proportion of its solar panels 5.105 Sports and Leisure Most ski resorts operate beginner,
that are defective is 0.02. Solar Solutions, a leading installer, intermediate, and advanced terrain in order to appeal to people
receives a shipment of 50,000 solar panels. Before accepting with varying abilities. The table below lists several ski areas in
the entire lot, Solar Solutions selects a random sample of 25 Canada and the proportion of skiers who attempt the advanced
panels and thoroughly tests each one. If four or more panels are terrain during their visit.
found to be defective, the entire shipment will be sent back.
Otherwise the shipment will be accepted. Ski area Probability
a. Suppose the claim is true: The actual proportion of
Big White 0.28
defectives is p 5 0.02. What is the probability that the
shipment will be rejected? (This is one type of error Kicking Horse 0.60
probability. The company would be making a mistake if Norquay 0.44
this event occurred. It would be rejecting the shipment
when the proportion of defectives is as claimed.) Suppose 20 skiers are randomly selected from each area.
b. Suppose the actual proportion of defectives is p 5 0.05. a. Find the probability that exactly five skiers at Big White
What is the probability that the shipment will be will attempt the advanced terrain.
accepted? (This is another type of error probability. In b. Find the probability that more than eight skiers at Kicking
this case, the company would also be making a mistake. It Horse will attempt the advanced terrain.
would be accepting the shipment when the proportion of c. Find the probability that between 12 and 16 (inclusive)
defectives is too high.) skiers at Norquay will attempt the advanced terrain.
c. Suppose the actual proportion of defectives is p 5 0.07. d. Find the probability that at most five skiers at all three
What is the probability that the shipment will be accepted? locations will attempt the advanced terrain.
Think of an experiment in which you continue to phone a friend until you get through.
The number of calls necessary until the first success (reaching your friend) is the value of
a geometric random variable.
The derivation of the probability distribution involves the properties given above. Let
X be a geometric random variable, the number of trials until the first success (including
the trial on which the success is obtained). Given p, the probability of a success, find the
probability of needing x trials, P ( X 5 x ) 5 p ( x ) .
P(X 5 1) 5 P(S) 5 p
X 5 1 means the first trial results in a success, and the experiment is over. The probability
of a success is simply p.
P(X 5 2) 5 P(F d S) 5 P(F) # P(S) 5 (1 2 p)p
X 5 2 means the first trial is a failure and the second trial is a success. Because trials are
independent, we multiply the corresponding probabilities.
P(X 5 3) 5 P(F d F d S) 5 P(F) # P(F) # P(S)
5 ( 1 2 p )( 1 2 p ) p 5 ( 1 2 p ) 2p
Why isn’t FSF a possible outcome X 5 3 means the first two trials are failures and the third trial is a success. We use inde-
in a geometric experiment? pendence again, and multiply the corresponding probabilities.
P(X 5 4) 5 P(F d F d F d S) 5 P(F) # P(F) # P(F) # P(S)
5 ( 1 2 p )( 1 2 p )( 1 2 p ) p 5 ( 1 2 p ) 3p
X 5 4 means the first three trials are failures and the fourth trial is a success. We use
independence again, and multiply the corresponding probabilities.
In general,
P ( X 5 x ) 5 P ( F ) # P ( F ) cP ( F ) # P ( S ) 5 ( 1 2 p )( 1 2 p ) c( 1 2 p ) # p
x 2 1 failures x 2 1 terms
5 ( 1 2 p ) x21p
226 C h apter 5 Random Variables and Discrete Probability Distributions
X 5 x means the first x 2 1 trials are failures and the xth trial is the first success. This
generalization is the formula for the probability distribution.
A Closer L ok
1. The geometric random variable is discrete. The number of possible values is countably
infinite: 1, 2, 3, . . . .
2. The geometric distribution is completely characterized, or defined, by one parame-
ter, p.
3. We do not need a table to find cumulative probabilities associated with a geometric
There is a formula for the sum of random variable because there is an easy formula for computing these values. If X is a
a geometric series. Can you use it geometric random variable with probability of success p, then
to show that this sum is 1?
P ( X # x ) 5 1 2 ( 1 2 p ) x (5.11)
Several years ago the Bekins
4. Equation 5.9 is a valid probability distribution.
Moving Company used a clever
Each probability is between 0 and 1, and the sum of all the probabilities is an infinite
s
a P(X 5 x) 5 a (1 2 p) p
“Bekins men are careful, quick, ` `
x21
and kind, Bekins takes a load off
x51 x51
of your mind. . . .”
is called a geometric series and it does sum to 1!
s
Solution Trail 5.18 Example 5.18 Bekins Men are Careful, Quick, and Kind
The number of people who changed residences in the United States has declined steadily
Ke ywords
since 1985. However, perhaps due to the sluggish economy, the U.S. Census Bureau esti-
n First person to have moved mates that approximately 12% of all people changed residences in 2012.28 Suppose
T ransl at io n researchers at Bekins randomly call people in the United States and ask if they have
n First success moved in the last year.
Co ncepts a. What is the probability that the fourth person called will be the first to have moved in
n Geometric probability
the past year?
distribution b. What is the probability that it will take at least six calls before speaking to someone
who has moved in the past year?
Vision
Consider a geometric experiment. Solution
Each trial is a call to ask if the
person has moved in the past year, a. Let X be the number of calls necessary until the first mover is found. X is a geometric
P ( S ) 5 P ( moved ) 5 0.12, P(S) random variable with P ( S ) 5 0.12 5 p.
is the same on each trial, and the The probability that the first mover (success) is found on the fourth call:
trials are independent. Define the
geometric random variable and P ( X 5 4 ) 5 ( 1 2 p ) 421p Equation 5.9.
write a probability statement for 3 3
5 ( 1 2 0.12 ) ( 0.12 ) 5 ( 0.88 ) ( 0.11 ) 5 0.0818 Use p 5 0.12.
each part.
The probability that the first mover is found on the fourth call is 0.0818.
b. At least six calls before speaking to someone who has moved in the past year means
the first success will occur on the sixth call or later.
5.5 Other Discrete Distributions 227
The probability that it will take six or more calls to find the first mover is 0.5277.
Figures 5.16 and 5.17 show technology solutions.
The distribution is named after The Poisson probability distribution has many practical applications and is often
the French mathematician Simeon associated with rare events. A Poisson random variable is a count of the number of
Denis Poisson (1781–1840). occurrences of a certain event in a given unit of time, space, volume, distance, etc., for
example, the number of arrivals to a hospital Emergency Room in a certain 30-minute
period, the number of asteroids that pass through Earth’s orbit during a given year, or the
number of bacteria in a milliliter of drinking water.
These properties are often referred to as a Poisson process and can be difficult to
verify.
The Poisson distribution is completely determined by the mean, denoted by the Greek
letter lambda, l. Because the Poisson distribution is often used to count rare events, the
mean number of events per interval is usually small. The probability distribution is given
below.
228 C h apter 5 Random Variables and Discrete Probability Distributions
A Closer L ok
1. The Poisson random variable is discrete. The number of possible values is countably
infinite: 0, 1, 2, 3, . . . .
2. The Poisson distribution is completely characterized by only one parameter, l. The
mean and the variance are both equal to the same value, l.
3. Equation 5.12 is a valid probability distribution. All of the probabilities are between 0
and 1, and the sum of all the probabilities is 1 (another infinite series).
4. e in Equation 5.12 is the base of the natural logarithm. e < 2.71828 is an irrational
number, and most calculators have this special constant built in.
SOLUTION TRAIL
5. The denominator of Equation 5.12 contains x! (x factorial).
Recall: x! 5 x ( x 2 1 )( x 2 2 ) c( 3 )( 2 )( 1 ) and 0! 5 1.
6. Table 2 in Appendix A contains values for P ( X # x ) (cumulative probability) for vari-
ous values of l.
month. The mean of the Poisson 5 0.8571 2 0.6767 5 0.1804 Use Table II in the Appendix.
distribution is given.
b. At least five means five or more: X $ 5.
5 0.0527
5.5 Other Discrete Distributions 229
P(m 2 s # X # m 1 s)
5 P ( 2 2 1.4142 # X # 2 1 1.4142 ) Use values for m and s.
Figure 5.18 P(X 5 3). Figure 5.19 P(X $ 5). Figure 5.20 P(1 # X # 3).
A Closer L ok
1. Here is an explanation for the strange restriction on the possible values for the ran-
s
dom variable X.
max ( 0, n 2 N 1 M ) # x: x must be at least 0 or n 2 N 1 M, whichever is bigger. If
n 2 N 1 M is positive, it is impossible to obtain fewer than n 2 N 1 M successes.
x # min ( n, M ) : x can be at most n or M, whichever is smaller. The greatest number
of successes possible is either n or the total number of successes in the population.
Suppose n 5 5, N 5 10, and M 5 6. Then:
max ( 0, n 2 N 1 M ) 5 max ( 0, 5 2 10 1 6 ) 5 max ( 0, 1 ) 5 1 and
SOLUTION TRAIL min ( n, N ) 5 min ( 5, 10 ) 5 5 1 1 # x # 5
It is impossible to obtain less than 1 success. Also, the greatest number of successes
possible is 5.
s
2. The hypergeometric random variable is discrete. All of the probabilities are between 0
and 1, and the probabilities do sum to 1.
Solution Trail 5.20 n
3. Recall: a b is a combination. The number of combinations of size r is given by
r
Ke ywords
n n!
n 10 twin-size comforters nCr 5a b5
r r! n 2 r ) !
(
n 2 stitched incorrectly
n 4 selected at random Example 5.20 Hello Kitty
T ransl at io n A Target store has 10 Hello Kitty twin-size comforters for sale. Two of the 10 comforters
n N 5 10 (finite population) have been stitched incorrectly at the factory and will split open when used. Suppose four
n 2 failures, therefore M 5 8 of the comforters are randomly selected.
successes
a. What is the probability that exactly two comforters will be stitched correctly?
n Sample size n 5 4
b. What is the probability that at least three comforters will be stitched correctly?
Co ncepts
n Hypergeometric probability Solution
distribution Let X be the number of successes in the sample. X has a hypergeometric distribution with
vision
n 5 4, N 5 10, and M 5 8. Translate each question into a probability statement, convert
to cumulative probability if necessary, and use Equation 5.14 and/or technology.
All four comforters selected at
random without replacement and a. Exactly two means X 5 2.
there are 10 comforters to choose
from. Each comforter selected is M N2M 8 10 2 8
either a success (stitched a ba b a ba b
x n2x 2 422
correctly) or a failure. Consider a P(X 5 2) 5 5 Use Equation 5.14.
N 10
hypergeometric distribution. a b a b
n 4
5.5 Other Discrete Distributions 231
8 2 8 2
a ba b a ba b
3 1 4 0
5 1 Use Equation 5.14.
10 10
a b a b
4 4
( 56 )( 2 ) ( 70 )( 1 )
5 1 Use the formula for a combination.
210 210
5 0.5333 1 0.3333 5 0.8666
Note: This problem can also be solved using cumulative probability:
P(X $ 3) 5 1 2 P(X # 2)
Figures 5.21 and 5.22 show technology solutions.
Figure 5.21 Minitab session window Figure 5.22 P(X $ 3) using the Minitab
output using the Hypergeometric command language.
Distribution input window.
Technology Corner
Procedure: Compute probabilities associated with a geometric, Poisson, or hypergeometric distribution.
Reconsider: Examples 5.18, 5.19, and 5.20, solutions, and interpretations.
CrunchIt!
CrunchIt! has a built-in function to compute probabilities associated with a geometric and a Poisson variable. The
CrunchIt! geometric random variable is defined to be the number of failures until the first success is achieved.
1. Suppose X is a geometric random variable with P ( X ) 5 p. Select Distribution Calculator; Geometric. Enter the value
for p and select an appropriate inequality symbol or the equals sign. Enter the value for x 2 1 and click Calculate. See
Figures 5.23 and 5.24.
232 C h apte r 5 Random Variables and Discrete Probability Distributions
2. Suppose X is a Poisson random variable with mean l. Select Distribution Calculator; Poisson. Enter the value for
lambda and select an appropriate inequality symbol or the equals sign. Enter the value for x and click Calculate. See
Figures 5.25 and 5.26.
TI-84 Plus C
There are built-in functions to compute cumulative probability and to evaluate the probability mass function associated
with a geometric and Poisson random variable. Use the built-in function for combinations to compute probabilities
associated with a hypergeometric random variable.
1. Suppose X is a geometric random variable with P ( S ) 5 p. Use the functions in the DISTR ; DISTR menu. Use the
function geometpdf to find the probability that X takes on a single value and geometcdf to find cumulative
probability. In each case, enter a value of p (p) and x value (x). Position the cursor on Paste and tap ENTER . The
appropriate calculator command is copied to the Home screen. Tap ENTER again to compute the desired probability.
Refer to Figures 5.16 and 5.17, page 227.
2. Suppose X is a Poisson random variable with mean l. Use the functions in the DISTR ; DISTR menu. Use the
function poissonpdf to find the probability that X takes on a single value and poissoncdf to find cumulative
probability. In each case, enter a value of p (p) and x value (x). Position the cursor on Paste and tap ENTER . The
appropriate calculator command is copied to the Home screen. Tap ENTER again to compute the desired probability.
Refer to Figures 5.18 through 5.20, page 229.
Minitab
There are built-in functions to compute cumulative probability and to evaluate the probability mass function associated
with a geometric, Poisson, or hypergeometric random variable. These functions may be accessed through a graphical input
window: Calc; Probability Distributions, or by using the command language.
1. Suppose X is a geometric random variable with P ( S ) 5 p. In a session window, use the commands PDF or CDF to
evaluate the probability mass function or to compute cumulative probability. See Figures 5.27 and 5.28.
5.5 Other Discrete Distributions 233
Figure 5.27 P(X 5 4) in Example 5.18. Figure 5.28 P(X $ 6) in Example 5.18.
2. Suppose X is a Poisson random variable with mean l. In a session window, use the command PDF or CDF to evaluate
the probability mass function or to compute cumulative probability. See Figures 5.29 and 5.30.
Figure 5.29 P(X 5 3) in Example 5.19. Figure 5.30 P(X $ 5) in Example 5.19.
3. Suppose X is a hypergeometric random variable with parameters n, N, and M. In a session or Calc; Probability
istributions input window, evaluate the probability mass function or compute cumulative probability. Refer to Figures
D
5.21 and 5.22, page 231.
Excel
There are built-in functions to compute cumulative probability and to evaluate the probability mass function associated
with a Poisson and a hypergeometric random variable. Use the built-in function for a binomial distribution,
BINOM.DIST, to find probabilities associated with the geometric distribution.
1. Suppose X is a geometric random variable with P ( S ) 5 p. Use the following formulas:
P ( X 5 x ) 5 BINOM.DIST ( 1, x, p, FALSE ) /x
P ( X # x ) 5 1 2 BINOM.DIST ( 0, x, p, FALSE )
See Figure 5.31.
2. Suppose X is a Poisson random variable with l 5 L. To evaluate the probability mass function, use the function
POISSON.DIST with the last argument set to False. To compute cumulative probability, use the function
OISSON.DIST with the last argument set to True. See Figure 5.32.
P
3. Suppose X is a hypergeometric random variable with parameters n, N, and M. To evaluate the probability mass function,
use the function HYPGEOM.DIST with the last argument set to False. To compute cumulative probability, use the
function HYPGEOM.DIST with the last argument set to True. See Figure 5.33.
Figure 5.31 Probabilities associated Figure 5.32 Probabilities associated Figure 5.33 Probabilities associated
with a geometric random variable. with a Poisson random variable. with a hypergeometric random variable.
234 C h apte r 5 Random Variables and Discrete Probability Distributions
Note: Similar calculations are available using JMP. See Figures 5.34–5.36.
Figure 5.34 Probabilities associated Figure 5.35 Probabilities associated Figure 5.36 Probabilities associated
with a geometric random variable. with a Poisson random variable. with a hypergeometric random variable.
are infected with some form of malware.32 Suppose a computer event. Suppose the probability of a false start in any swimming
repair specialist carefully checks every machine left at his store. event is 0.07, and swimming events are selected at random.
a. What is the probability that the second computer examined a. What is the probability that the first false start will occur
will be the first to have malware? Write a Solution Trail in the fourth event selected?
for this problem. b. What is the probability that the first false start will occur
b. What is the probability that the tenth machine examined after the 15th event selected?
will be the first to have malware? c. What is the mean number of events selected before a false
c. What is the mean number of computers examined before start occurs?
one will be infected with malware? d. Suppose there are 26 events at a swimming and diving
d. What is the probability that at least five computers will be meet. What is the probability that the first false start will
examined before one will have malware? occur at this meet?
5.121 Physical Sciences Of all cities in the United States, 5.125 Manufacturing and Product Development Flat-
Amherst, New York, has the fewest number of days per year panel TV displays in televisions and computer monitors often
clear of clouds, 4.4.33 Other cities with very few clear days develop dead pixels, pixels that become locked in one state—for
include Buffalo, New York, Lakewood, Washington, and example, red at all times. Manufacturers maintain that dead
Seattle, Washington. Suppose a random year is selected. pixels are a natural defect and there are various return policies
a. What is the probability that Amherst will have exactly after the discovery of a dead pixel. Suppose the mean number of
three days clear of clouds? dead pixels in a new Samsung 64-inch plasma TV is 2.5. One of
b. What is the probability of fewer than six days clear of clouds? these TVs is randomly selected and inspected for dead pixels.
c. What is the probability of at least nine days clear of clouds? a. What is the probability that there will be no dead pixels?
d. Suppose that between 2 and 10 (inclusive) days are clear b. If the number of dead pixels is more than m 1 3s, the
of clouds. What is the probability of more than five days assembly line is automatically stopped and examined.
clear of clouds? What is the probability that the assembly line will be
stopped?
5.122 Travel and Transportation Bad weather is to blame
c. What is the probability that the number of dead pixels
for some of the worst highway crashes in Canada. In December
will be within two standard deviations of the mean?
2012 there was a 27-vehicle pile-up on Highway 40, and in
February 2013 a 50-car pile-up shut down Highway 401 near 5.126 Economics and Finance Most banks charge a
Woodstock, Ontario. Highway 63 in Alberta has a notorious monthly fee for a checking account, in addition to ATM,
reputation; there are approximately four accidents every week overdraft, and other fees. However, 72% of credit unions offer
on this road.34 Suppose a week is randomly selected. checking accounts with no monthly fees.36 Suppose a bank
a. Find the probability that there are no more than four auditor selects credit unions at random.
crashes. a. What is the probability that the second credit union
b. Find the probability that the number of crashes is more selected will be the first to offer free checking?
than m 1 2s. b. What is the probability that the first credit union to offer
c. To obtain government funding for safety improvements, free checking will be one of the first three?
SOLUTION TRAIL
there must be five weeks in a row with six or more c. Suppose the first credit union to offer free checking is the
crashes. What is the probability of this happening? 10th selected. Is there any evidence to suggest that the
claim (72%) is false? Justify your answer.
5.123 Physical Sciences On a Friday night in late March
2013, there were hundreds of reports of meteor sightings along 5.127 Business and Management Fifteen
the East Coast of the United States. People from North Carolina lobstermen have their boats anchored at a small pier
to Canada contacted the American Meteor Society to report along the New Hampshire coast. Five of these lobstermen have
what they saw and heard. Suppose that in a group of 25 people been fined within the past year for commercial lobster-size
at the National Mall, 15 actually saw the meteor. A patrolman violations. Suppose four lobstermen are selected at random.
randomly selects five people from this group. a. What is the probability that exactly two have been fined
a. What is the probability that none of the five people saw for violations within the past year? Write a Solution Trail
the meteor? for this problem.
b. What is the probability that at least four people saw the b. What is the probability that all four have been fined for
meteor? violations within the past year?
c. What is the probability that at most two people saw the c. What is the probability that at least one has been fined for
meteor? violations within the past year?
5.124 Sports and Leisure The NCAA Men’s and Women’s 5.128 Demographics and Population Statistics Buchtal, a
Swimming and Diving Committee recently recommended a no manufacturer of ceramic tiles, reports 3.9 job-related accidents
recall false-start rule.35 This proposal means that unless a false per year. Accident categories include trip, fall, struck by
start is blatant, the race will continue. The student-athlete equipment, transportation, and handling. Suppose a year is
committing the false start will be disqualified following the selected at random.
236 C h apte r 5 Random Variables and Discrete Probability Distributions
a. What is the probability that there will be no job-related a. What is the probability that the third telephone order
accidents? selected will be the first to contain an error?
b. What is the probability that the number of accidents that b. What is the probability that the supervisor will inspect
year will be between two and five (inclusive)? between two and six (inclusive) telephone orders before
c. If the number of accidents is more than three standard finding an error?
deviations above the mean, the company insurance carrier c. What is the probability that the inspector will examine at
will raise the rates. What is the probability of an increase least seven orders before finding an error?
in the company’s insurance bill? d. What is the probability that the first error will be on the
fourth telephone order or later?
5.129 Sports and Leisure Amusement park rides are great
e. Suppose the first four telephone orders contain no errors.
family fun, but over 4400 children are injured on rides every
What is the probability that the first error will be on the
year. According to a recent study, on average, one child is
eighth order or later?
treated in a hospital Emergency Room every two hours as a
result of an injury from an amusement park ride.37 Suppose a 5.133 Economics and Finance The manager of Capitol Park
two-hour period is selected at random. Plaza, an apartment complex in Washington, DC, collects the
a. What is the probability that no children will be treated rent from each tenant on the first day of every month. Past
in an Emergency Room as a result of an injury from an records indicate that the mean number of tenants who do not
amusement park ride? pay the rent on time in any given month is 4.7. Consider the
b. What is the probability that at most three children will be rent collection for the next month.
treated in an Emergency Room as a result of an injury a. Find the probability that every tenant will pay the rent on
from an amusement park ride? time.
c. Suppose that six children are treated in an Emergency b. Find the probability that at least seven tenants will be late
Room as a result of an injury on an amusement park ride. with their rent.
Is there any evidence to suggest that the claim (of one c. Suppose the number of delinquent rent payments in a
every two hours) is wrong? Justify your answer. month is independent of the number in every other
month. What is the probability that at most three tenants
Extended Applications will be late with their rent in two consecutive months?
5.134 Public Health and Nutrition The mean number of
5.130 Marketing and Consumer Behavior The Sweet
Leaf Iced Teas Company is sponsoring a conventional bottle- commercially prepared meals per week for a typical American is
cap sweepstakes game. Under each bottle cap there is a note 4.39 This might seem a little high. But consider young profes-
saying either “You are not a winner,” or the prize awarded. sionals who eat lunch out several times each week, and families
Suppose there are 20 of the game bottles on a shelf in the that order out for pizza or stop at a fast-food restaurant routinely.
supermarket, and two of them are winners. A customer a. What is the probability that a randomly selected American
randomly selects six bottles from the shelf. does not eat out during a week? Eats out once during the
a. What is the probability of selecting no winning bottles? week? Twice during the week?
b. What is the probability of selecting both winning bottles? b. Suppose two Americans are selected at random. What is
c. What is the mean number of winning bottles selected? the probability that the total number of meals for the two
d. How many bottles would the customer have to purchase in Americans during a week is 0? 1? 2?
order to expect one winning bottle? c. Suppose the mean number of times an American eats out
during a two-week period is 8. What is the probability
5.131 Physical Sciences There are over 1 million earth- that a randomly selected American does not eat out during
quakes worldwide of magnitude 2–2.9 each year. However, the a two-week period? Once during the two-week period?
mean number of earthquakes of magnitude 8 or higher is Twice during the two-week period?
approximately one per year.38 Suppose a random year is selected. d. How do your answers in parts (b) and (c) compare? What
a. What is the probability of exactly two earthquakes of property does this suggest about a Poisson random variable?
magnitude 8 or higher?
5.135 Psychology and Human Behavior The percentage
b. What is the probability of at most four earthquakes of
of Americans who claim to have no religious affiliation is the
magnitude 8 or higher?
highest since 1930, approximately 20%.40 In a group of 30
c. Suppose there are three earthquakes of magnitude 8 or
police officers, 6 have no religious affiliation. Suppose 4
higher. Is there any evidence to suggest that the mean is
officers from this group are selected at random.
different from one? Justify your answer.
a. What is the probability that exactly 1 officer will have no
5.132 Business and Management Managers at CafePress religious affiliation?
acknowledge that a variety of errors may occur in customer b. What is the probability that at most 2 officers will have no
orders received via telephone. A recent audit revealed that the religious affiliation?
probability of some type of error in a telephone order is 0.20. In c. Suppose the group consists of 50 police officers, 10 with
an attempt to correct these errors, a supervisor randomly selects no religious affiliation. Find the probabilities in parts (a)
telephone orders and carefully inspects each one. and (b) given this new, larger group.
Chapter Summary 237
d. Suppose 4 officers are selected at random from across the Suppose the distribution of Y is changed again, once more at
country. What is the probability that exactly 1 will have the right tail.
no religious affiliation? At most 2 will have no religious
y P(Y 5 y)
affiliation?
e. Compare all of these probabilities. Explain how the hyper 0 P(X 5 0) 5 0.1353
geometric distribution is related to the binomial distribution. 1 P(X 5 1) 5 0.2707
2 P(X 5 2) 5 0.2707
Challenge 3 P(X 5 3) 5 0.1804
5.136 Approaching Poisson Suppose X is a Poisson 4 P(X 5 4) 5 0.0902
random variable with l 5 2. Let the random variable Y have a 5 P(X $ 5) 5 0.0527
probability distribution as given in the following table.
y P(Y 5 y) Find the expected value of Y.
Chapter 5 Summary
Concept Page Notation / Formula / Description
Random variable 188 A function that assigns a unique numerical value to each outcome in a sample
space.
Discrete random variable 190 The set of all possible values is finite, or countably infinite.
Continuous random variable 190 The set of all possible values is an interval of numbers.
Probability distribution for 193 A method for conveying all the possible values of the random variable and the
a discrete random variable probability associated with each value.
Mean, or expected value, 203 m 5 E(X) 5 a 3 x # p(x) 4
of a discrete random all x
variable X
Variance of a discrete 204 s2 5 Var ( X ) 5 a 3 ( x 2 m ) 2 # p ( x ) 4
random variable X all x
Chapter 5 Exercises
5 APPLICATIONS on hold for no more than 60 seconds, then the call is classified
as successful (actually, this sounds like a miracle). The supervi-
5.138 Public Health and Nutrition Emergency defibrilla- sor in technical support claims 80% of all calls are successful.
tors are now located in many public buildings. However, the Suppose 25 calls to technical support are selected at random.
U.S. Food and Drug Administration (FDA) is concerned about a. Find the mean, variance, and standard deviation of the
the reliability of these devices. Approximately 45,000 devices number of successful calls.
failed during the past seven years.41 Suppose the FDA claims b. Find the probability that at least 18 calls will be successful.
that the probability of an emergency defibrillator working c. Suppose 21 calls are successful. Is there any evidence to
correctly is 0.90, and 30 of these devices are selected at random suggest the supervisor’s claim is false? Justify your answer.
and tested.
a. Find the probability that exactly 28 devices will work 5.140 Economics and Finance Bank overdraft fees range
correctly. from $10 to $38, consumers believe they are annoying and
b. Find the probability that at least 25 devices will work excessive, and these fees are a huge revenue source for banks.42
correctly. The Overdraft Protection Act of 2013 is designed to limit
c. Suppose only 20 of the devices work correctly. Is there overdraft fees in a variety of ways and to require fees to be
any evidence to suggest that the proportion of emergency reasonable and proportional to the amount of the overdraft. Let
defibrillators that work correctly has changed? Justify X be the amount of an overdraft fee for a randomly selected
your answer. bank. The probability distribution for X is given in the table
below.
5.139 Business and Management IKEA is a Swedish
company that sells ready-to-assemble furniture. Shoppers x 10 12 15 20 25
contact customer service with regard to finding a nearby store, p(x) 0.02 0.06 0.08 0.10 0.16
online shipping questions, and even for help with assembly.
IKEA classifies all telephone calls to its customer support staff x 27 30 35 38
by the amount of time the customer is on hold. If the customer is p(x) 0.28 0.15 0.07 0.08
Chapter 5 Exercises 239
a. Find the mean, variance, and standard deviation of the a. What is the probability that there will be no shark attacks?
overdraft amount. b. What is the probability that between two and five shark
b. Find the probability that a randomly selected bank has an attacks inclusive will occur?
overdraft fee greater than $25. c. If there is evidence to suggest that the mean number of
c. Find the probability that a randomly selected bank has an shark attacks per year has increased, the Coast Guard will
overdraft fee less than m 2 s . begin more patrols to adequately protect the public.
d. Suppose three banks are selected at random. What is the Suppose there are eight shark attacks in the thirteenth
probability that at least one bank has an overdraft fee less year. Is there evidence to suggest the need for more
than $20? patrols? Justify your answer.
d. Find out exactly how many shark attacks there were in a
5.141 Manufacturing and Product Development Thales
recent year in North Carolina. Determine the probability
Alenia Space is a European company that manufactures of this occurrence.
communications satellites. Researchers at the company have e. Florida has the highest number of shark attacks per year,
determined that the most common reason for a satellite to fail approximately 22.5, followed by Hawaii (4.33 per year),
once it is in orbit is a problem related to opening and initiating and California (3.17 per year). What is the probability
the solar panels. Suppose the probability of a failure related to that there will be no attacks in all four states in a given
the solar panels is 0.08. year?
a. What is the probability that the fifth satellite launched
will be the first to fail due to a solar-panel problem? 5.145 Technology and the Internet In February 2013, the
b. What is the expected number of satellites launched until Chinese Ministry of Industry and Information Technology
the first one fails due to a solar-panel problem? announced plans to expand broadband connections (or faster)
c. Thales Alenia Space is preparing an advertising campaign to 70% of Chinese households.45 Suppose the plan is successful
in which they claim to have had 20 successful launches in and 40 Chinese households are selected at random.
a row. What is the probability that the first failure due to a a. What is the probability that exactly 25 households have
solar-panel problem will occur after the 20th launch? broadband coverage?
b. What is the probability that at most 30 households have
5.142 Business and Management An easy-assembly, broadband coverage?
no-tools-required, gas grill comes with detailed step-by-step c. What is the probability that more than 33 households have
instructions. Even though each grill is carefully packaged, there broadband coverage?
are often missing pieces. This can aggravate the customer and d. Suppose the number of households that have broadband
increase the cost to the producer, who must provide phone coverage is within two standard deviations of the mean.
support and ship the missing parts. Suppose the mean number What is the probability that the actual number that have
of missing pieces per packaged grill is 0.7, and one grill is broadband coverage is within one standard deviation of
randomly selected from the stockroom. the mean?
a. What is the probability that there will be no missing
pieces in the package? 5.146 Sports and Leisure In 2012–2013, Carnival Cruise
b. If there are more than five missing pieces, the producer Lines experienced onboard problems with four ships: Triumph,
identifies the packager and issues a warning. What is the Elation, Dream, and Legend. The technical issues included loss
probability of a warning being issued at the packaging plant? of power, steering problems, and even a fire in an engine room.
c. Suppose three grills are randomly selected. What is the Despite these setbacks, the cruise industry contributes almost
probability that each will have no more than one missing $38 billion to the U.S. economy, and approximately 20% of all
piece? people in the United States have taken a cruise.46 Suppose 25
people from the United States are selected at random.
5.143 Travel and Transportation Because of forced a. Find the probability that exactly three people have been
spending cuts in Spring 2013, the Federal Aviation Administra- on a cruise.
tion identified 149 air traffic control towers that would be b. Find the probability that at most two people have been on
closed.43 In a group of 20 air traffic control towers in the a cruise.
Midwest, five will be closed. Suppose six of the 20 air traffic c. Suppose seven people have taken a cruise. Is there any
control towers are selected at random. evidence to suggest that the percentage of people who
a. What is the probability that none of the six air traffic have taken a cruise has increased? Justify your answer.
control towers will be closed?
5.147 Psychology and Human Behavior The army
b. What is the probability that at most two of the air traffic
emphasizes cleanliness and neatness in a military barracks.
control towers will be closed?
Each cadet is responsible for maintaining his or her area in top
c. What is the probability that at least four will be closed?
condition. Periodic inspections are held, and those receiving top
5.144 Sports and Leisure Over a period of 12 years, there scores are rewarded. Suppose the mean number of violations
were approximately 2.42 shark attacks per year at the beaches discovered per cadet during a barracks inspection is 2.7.
in North Carolina.44 Consider the number of shark attacks a. What is the probability that a randomly selected cadet will
during the thirteenth year. have exactly three violations during an inspection?
240 C h apte r 5 Random Variables and Discrete Probability Distributions
b. If a cadet has six or more violations, he or she is assigned a. What is the probability that Abby has no Caf-Pows on the
to KP duty for one week. What is the probability that a show?
randomly selected cadet will be assigned to KP duty b. What is the probability that she is shown having at most
following a barracks inspection? 3 Caf-Pows on the show?
c. If every member of a 10-cadet unit has no violations, then c. Suppose Abby has 6 Caf-Pows. Is there any evidence that
each will receive a weekend pass. What is the probability the number of Caf-Pows per show has changed? Justify
of this happening following a barracks inspection? your answer?
5.148 Marketing and Consumer Behavior In the country’s
2012 budget, Canada decided to stop production of the penny.47 EXTENDED APPLICATIONS
The government indicated that it costs 1.6 cents to produce 5.152 Discrete Uniform Random Variable Suppose X is a
each penny, some see the penny as a burden to the economy, random variable with probability distribution given by
and approximately 10% of all Canadians believe the penny is a
nuisance. Suppose 50 Canadians are selected at random and 1
p(x) 5 x 5 1, 2, 3, 4, 5
asked if they believe the penny is a nuisance. 5
a. Find the mean, variance, and standard deviation of the a. Find the mean, variance, and standard deviation of X.
number of Canadians who believe the penny is a nuisance. b. Suppose p ( x ) 5 1/ 6, x 5 1, 2, 3, 4, 5, 6. Find the mean,
b. What is the probability that at most three Canadians variance, and standard deviation of X.
believe the penny is a nuisance? c. Suppose p ( x ) 5 1/ n, x 5 1, 2, 3, c, n. Find the mean,
c. Find the probability that the number of Canadians who variance, and standard deviation of X in terms of n.
believe the penny is a nuisance will be within two
standard deviations of the mean. 5.153 Medicine and Clinical Studies According to a Pew
Research Center survey, approximately 35% of Americans
5.149 Marketing and Consumer Behavior The movie attempt to diagnose a medical condition online.49 Highmark
Les Misérables, an adaptation of Victor Hugo’s novel, starred insurance company is concerned about the rising number of
Hugh Jackman, Russell Crowe, Anne Hathaway, and Amanda online diagnosers and the resulting failure to consult a physi-
Seyfried, and won many awards. The Flixster movie site, Rotten cian. They have decided to select 25 policyholders at random. If
Tomatoes, rated the movie at 74% on the Tomatometer. the number of online diagnosers is 11 or fewer, then no action
However, 81% of all people who saw the movie liked it.48 will be taken. Otherwise, they will begin a campaign to remind
Suppose 30 people who saw the movie are selected at random. policyholders that they should always consult a physician to
a. What is the probability that at most 20 people liked the confirm a medical condition.
movie? a. Suppose the true proportion of online diagnosers is 0.35.
b. What is the probability that at least 25 people liked the What is the probability that Highmark will begin a new
movie? reminder campaign?
c. Suppose 18 people liked the movie. Is there any evidence b. Suppose the true proportion of online diagnosers is 0.40.
to suggest that the claim (81%) is wrong? Justify your What is the probability that no action will be taken? What
answer. if the true proportion is 0.50?
5.150 Public Policy and Political Science In a recent c. Suppose the decision rule is changed such that if the
nationwide study, it was reported that U.S. adults continue to number of online diagnosers is 12 or fewer, then no action
believe that big companies and lobbyists have too much power. will be taken. Answer parts (a) and (b) using this rule.
In particular, 85% of those polled indicated that PACs (Political 5.154 Marketing and Consumer B ehavior Kohl’s is
Action Committees) have too much influence in Washington. running a sale in which customers may save as much as 40%
Suppose 50 U.S. adults are selected at random. on any purchase. Once a customer decides to make a purchase,
a. What is the probability that at least 45 U.S. adults think he selects two sales prize tickets at random from a large bin at
PACs have too much power? the front of the store. Each ticket has a percentage marked on
b. What is the probability that between 38 and 42 (inclusive) it, and the probability of selecting each ticket is given in the
U.S. adults think PACs have too much power? table below.
c. Suppose 35 U.S. adults think PACs have too much power.
Is there any evidence to suggest the poll results are Percentage 10% 20% 30% 40%
wrong? Justify your answer. Probability 0.50 0.35 0.10 0.05
5.151 The CBS show NCIS stars Mark Harmon as Special The larger of the two percentages selected is used for the
Agent Leroy Jethro Gibbs in which his team investigates purchase.
military-related criminal cases. The character Abigail Sciuto is a. Let X be the maximum of the two prize ticket percentages.
frequently seen drinking the high-energy caffeine-laden drink Find the probability distribution for X.
Caf-Pow (she keeps a spare in the lab refrigerator). The mean b. Find the mean, variance, and standard deviation of X.
number of Caf-Pows Abby consumes per show is 3. Suppose a c. What is the probability that a customer will receive at
2013 NCIS episode is selected at random. least 20% off on his or her purchase?
Chapter 5 Exercises 241
5.155 Medicine and Clinical Studies A recent study shipments for crew 2). Add these two values to compute a
suggests that total knee joint replacement surgery may be random total number of complete shipments for crews 1
related to weight gain. Researchers who studied records from and 2.
the Mayo Clinic Health system report that 30% of these Repeat this process to generate 1000 random total number
patients gained 5% or more of their body weight following of complete shipments for crews 1 and 2. Compute the
surgery.50 Suppose this percentage is the same at all hospitals relative frequency of occurrence of each observation.
in the United States and 40 people who had total knee joint
Suppose Y is a binomial random variable with n 5 11
replacement surgery are selected at random.
and p 5 0.6. Use technology to construct a table of
a. What is the probability that exactly 14 will experience a
probabilities for Y 5 0, 1, 2, 3, c, 11. Compare these
weight gain?
probabilities with the relative frequencies obtained above.
b. Find the largest value w such that the probability of w or
b. Suppose a new receiving crew is added and it unloads five
fewer patients who experience a weight gain is at most 0.20.
shipments each day. Let X3 be a binomial random
c. Suppose 16 patients experience a weight gain. Is there any
variable, representing the number of complete shipments
evidence to suggest that the proportion of patients with
for crew 3, with parameters n3 5 5 and p 5 0.6.
knee joint replacement surgery who experience a weight
gain has changed? Justify your answer. Use technology to generate random observations for X1,
X2, and X3. Add these three values to compute a random
5.156 Public Health and Nutrition A recent study sug- total number of complete shipments for crews 1, 2, and 3.
gested that 75% of all meals marketed specifically to babies and
Repeat this process to generate 1000 random total number
toddlers and sold in grocery stores had high sodium content.51
of complete shipments for crews 1, 2, and 3. Compute the
This is of deep concern because increased salt in the diet can
relative frequency of occurrence of each observation.
cause hypertension, which may lead to cardiovascular disease.
Suppose 30 toddler meals are randomly selected and the Suppose Y is a binomial random variable with n 5 16 and
sodium content in each is carefully measured. p 5 0.6. Use technology to construct a table of
a. What is the probability that exactly 23 meals will have probabilities for Y 5 0, 1, 2, 3, c, 16. Compare these
high sodium content? probabilities with the relative frequencies obtained above.
b. What is the probability that at least 25 will have high c. Suppose another receiving crew is added and it unloads
sodium content? nine shipments each day. Let X4 be a binomial random
c. Suppose at most 20 meals have high sodium content. variable, representing the number of complete shipments
What is the probability that at most 15 will have high for crew 4, with parameters n2 5 9 and p 5 0.6. Let Y
sodium content? represent the total number of complete shipments for all
four crews.
i. Find P ( Y 5 15 ) (the probability of exactly 15 total
CHALLENGE complete shipments).
ii. Find P ( Y # 12 ) .
5.157 A Day on the Dock Two crews work on a receiving i ii. Find P ( Y . 16 ) .
dock at a fabric manufacturing plant. The first crew unloads iv. How many total complete shipments can be expected?
four shipments every day and the second crew unloads seven
shipments every day. A supervisor records whether each
shipment is complete (a success) or missing items (a failure). LAST STEP
Suppose X1 is a binomial random variable, representing the 5.158 Is a flu shot really effective? In 2012–2013
number of complete shipments for crew 1, with parameters the CDC reported that the flu vaccine was 56% effective. That
n1 5 4 and p 5 0.6. Similarly, let X2 be a binomial random is, 56% of all people receiving the flu vaccine who were
variable, representing the number of complete shipments for exposed to a flu virus did not contract the flu. To check this
crew 2, with parameters n2 5 7 and p 5 0.6. Assume X1 and X2 claim (56% effective), a random sample of 50 at-risk people
are independent. who received a flu shot was selected. During the flu season, all
a. Use technology to generate a random observation for X1 50 were exposed to the flu and 29 actually contracted the
(the number of complete shipments for crew 1) and a disease. Is there any evidence to suggest the claim is false, that
random observation for X2 (the number of complete the chance of contracting the flu is greater than 44%?
6 Continuous Probability
Distributions
Looking Back
n Remember how to completely describe and compute probabilities associated with a
discrete random variable.
n Recall the characteristics of and probability computations associated with the binomial,
geometric, Poisson, and hypergeometric random variables.
Looking Forward
n Learn how to completely describe a continuous random variable.
n Compute probabilities associated with a continuous random variable.
n Understand the characteristics of the normal distribution and compute probabilities
involving a normal random variable.
Contents
6.1 Probability Distributions for a Continuous Random Variable
6.2 The Normal Distribution
6.3 Checking the Normality Assumption
6.4 The Exponential Distribution
243
244 C hapter 6 Continuous Probability Distributions
Definition
A probability distribution for a continuous random variable X is given by a smooth
curve called a density curve, or probability density function (pdf). The curve is defined
so that the probability that X takes on a value between a and b ( a , b ) is the area under
the curve between a and b.
A Closer L ok
1. Probability in a continuous world is area under a curve. Figures 6.1–6.3 illustrate the
correspondence between the probability of an event (defined in terms of a continuous
random variable) and the area under the density curve.
Probability
f (x) f (x) f (x)
density curve
P(X a)
P(a X b) P(X b)
a b x a x b x
Figure 6.1 The probability is the area Figure 6.2 The shaded area is Figure 6.3 The shaded area is
of the shaded region. P(X # a). P(X $ b).
f (x)
0 x
Figure 6.4 A valid probability density function.
6.1 Probability Distributions for a Continuous Random Variable 245
P(X 5 a) translated: Find the area 4. If X is a continuous random variable with density function f, the probability that X
under the curve between a and a. equals any one specific value is 0. That is, P ( X 5 a ) 5 0 for any a. The reason: There
This is asking for the area of a line is no area under a single point.
segment. There is no second This seems like a contradiction. Certainly we can observe specific values of X, yet the
s
dimension. Hence the area is 0. probability of observing any single value is 0. Recall: Probability is a limiting relative fre-
quency. There are an (uncountably) infinite number of values for any continuous random
variable. Therefore, the limiting relative frequency of occurrence of any single value is 0.
Because no probability is associated with a single point, the following four probabili-
This is not necessarily true for a ties are all the same:
discrete random variable.
P(a # X # b) 5 P(a , X # b) 5 P(a # X , b) 5 P(a , X , b) (6.1)
In fact, we can remove as many single points as we want from any interval, and the
probability will stay the same. The only reasonable probability questions concerning
continuous random variables involve intervals. And we can almost always sketch a
graph to visualize these probabilities, or regions.
s
So, how do we find area under a curve, and therefore probability? In general, this is a
calculus question. Don’t panic. We’ll use a little geometry, tables, and technology to find
the necessary area (probability).
The (continuous) uniform distribution provides a good opportunity to illustrate the
connection between area under the curve and probability. For this random variable, the total
probability, 1, is distributed evenly, or uniformly, between two points. Computing probabil-
ities associated with this random variable reduces to finding the area of a rectangle.
Definition
The random variable X has a uniform distribution on the interval [a, b] if
1
if a # x # b
f (x) 5 • b 2 a 2` , a , b , ` (6.2)
0 otherwise
a1b (b 2 a)2
m5 s2 5 (6.3)
2 12
A Closer L ok
Stepped
STEPPED Tutorial
TUTORIALS 1. a and b can be any real numbers, as long as a is less than b (a , b).
Density curves
BOX PLOTS 2. All of the probability (action) is between a and b. The probability density function is
the constant 1/ ( b 2 a ) between a and b, and zero outside of this interval. Hence, there
is no area and no probability outside the interval [a, b]. Figure 6.5 shows a graph of the
uniform probability density function.
f (x)
1
b–a
a b x
Figure 6.5 The graph of the probability
density function for a uniform random
variable.
246 C hapter 6 Continuous Probability Distributions
3. Equation 6.2 is a valid probability density function because f ( x ) $ 0 for all x, and the
total area under the curve is 1. The area under the curve for x , a is zero, and the area
under the curve for x . b is zero. Between a and b, the area under the curve is the area
of a rectangle ( area 5 width 3 height ) . See Figure 6.6.
f (x)
1
Area (b – a) 1
b–a
1
b–a
1
b–a
Area 0 Area 0
a b x
b–a
Figure 6.6 The total area under the curve is 1. Here,
SOLUTION TRAIL
the density curve consists of three line segments.
The following example involves a uniform distribution and illustrates visualizing and
calculating probabilities associated with a continuous random variable.
Vi sion
Solution
The time it takes to reach a dive a. Let X be the time it takes to reach a dive site. X is uniform between the times a 5 5
site has a uniform distribution and b 5 25. Use Equation 6.2 to find
between 5 and 25 minutes. The 1 1 1
two most important components 5 5 5 0.05
for solving this kind of problem b2a 25 2 5 20
are the probability distribution The probability density function is
and the probability statement.
Use Equation 6.2 to draw the 0.05 if 5 # x # 25
density function, and sketch a f (x) 5 e
0 otherwise
graph corresponding to each
probability statement. Figure 6.7 shows the graph of the probability density function.
b. We have the distribution of X. Translate the question in part (b) into a probability
statement, sketch the region corresponding to the probability statement, and find the
area of that region.
At most means up to and including 10. We need the probability that X is less than or
equal to 10: P ( X # 10 ) . See Figure 6.8.
6.1 Probability Distributions for a Continuous Random Variable 247
f (x) f (x)
0.075 0.075
P(X 10)
0.050 0.050
0.025 0.025
0 5 10 15 20 25 30 x 0 5 10 15 20 25 30 x
Figure 6.7 The graph of the probability Figure 6.8 The area of the shaded region is
density function for a uniform random variable P(X # 10).
on the interval a 5 5 to b 5 25.
The probability statement P ( X # 10 ) 5 area under the density curve between 5 and 10
P(X # 10) simplifies to 5 area of a rectangle
P(5 # X # 10) in this case
5 width 3 height
because there is no probability
(area) for x less than 5. 5 ( 5 )( 0.05 ) 5 0.25
The probability that it takes at most 10 minutes is 0.25.
P(10 # X # 20) c. The probability that it takes between 10 and 20 minutes to reach a dive site in terms of
5 P(10 , X # 20) the random variable X is P ( 10 # X # 20 ) . Even though the word inclusive is not used
in the question, we chose to write the interval including the endpoints. It doesn’t really
5 P(10 # X , 20)
matter! Remember: In a continuous world, single values contribute no probability and
5 P(10 , X , 20) do not change the probability calculation.
P ( 10 # X # 20 ) 5 area under the density curve between 10 and 20
5 area of a rectangle
5 width 3 height
5 ( 10 )( 0.05 ) 5 0.50
The probability that it takes between 10 and 20 minutes is 0.50. See Figure 6.9.
f (x)
0.050
0.025
0 5 10 15 20 25 30 x
Figure 6.9 The area of the shaded region is
P(10 # X # 20).
a1b 5 1 25 30
m5 5 5 5 15
2 2 2
The mean time it takes to reach a dive site is 15 minutes. Because the uniform
distribution is symmetric, the mean is the middle of the distribution, and the mean is
equal to the median.
248 C h a p t e r 6 Continuous Probability Distributions
cumulative probability will mean 5 1 2 P(X # b) A single value contributes no probability.
up to and including x; we use # Cumulative probability
(not ,).
P ( a # X # b ) 5 P ( X # b ) 2 P ( X , a ) (Figure 6.12)
5 P(X # b) 2 P(X # a) A single value contributes no probability.
Find all the probability up to b, find all the probability up to a, and subtract. The dif-
ference is the probability that X lies in the interval from a to b.
P(X b)
Density curve
Density curve Density curve
P(a X b)
P(X x)
P(X a)
P(X b) P(X b)
x b a b
Figure 6.10 The shaded area is the Figure 6.11 Use the complement rule Figure 6.12 The shaded area is
cumulative probability P(X # x). to convert to cumulative probability. P(X $ b).
Here is one more way to picture cumulative probability. As x moves from left to right, we
accumulate more and more probability. As x increases, cumulative probability also increases.
Imagine starting at an altitude of zero and walking up a (smooth) hill. At any point along the
walk, measure the altitude. This distance is the cumulative probability. Figure 6.13 shows
the relationship between the area under the density curve and the altitude.
f (x) f (x)
1.0 Cumulative
distribution
function Density
curve
0.5
P(X x)
0.0 0.0
x x
Figure 6.13 Picturing cumulative probability: The altitude is equal to the shaded area.
6.1 Probability Distributions for a Continuous Random Variable 249
A Closer L ok
Why is 1 the maximum value of 1. The drawing on the left in Figure 6.13 is a graph of a cumulative distribution function.
the cumulative distribution This function starts at 0 and is always increasing, until it reaches a maximum value of 1.
function? 2. The mean, m, and the variance, s2 , for a continuous random variable are computed
using calculus. Although we will not consider any of these calculations, we will inter-
pret and use these values as usual. m is a measure of the center of the distribution, and
s2 (or s ) is a measure of the spread, or variability, of the distribution.
Figure 6.14 shows density functions for the random variables X and Y.
a. The mean of X is less than the mean of Y, mX , mY , because the center of the dis-
tribution of X is to the left of the center of the distribution of Y.
b. The standard deviation of X is greater than the standard deviation of Y, sX . sY ,
because the distribution of X is more spread out, and thus has more variability, than
the distribution of Y.
Density
curve for Y
Density
curve for X
X Y
The following example illustrates the use of cumulative probability to compute prob-
ability associated with a continuous random variable.
Video Tech Manuals
VIDEO TECH MANUALS Example 6.2 Keeping Good Time
Probability distribu-
EXEL DISCRIPTIVE
tion calculations Each Citizen Eco-Drive wristwatch is carefully tested for accuracy before being packaged
with density curve and shipped. If the watch gains or loses time during the 24-hour testing period, it is sent
to a technician for adjustment. The time inconsistency (in seconds) is a random variable,
X, with the probability density function shown in Figure 6.15. A negative value of X
indicates that the watch lost time, and a positive value indicates that the watch gained
time. Cumulative probability for X is illustrated in Figure 6.16 and can be computed using
f (x) f (x)
P(X x)
10 5 0 5 10 x 10 5 0 x 5 10
Figure 6.15 The probability density Figure 6.16 The cumulative probability for
function for the time inconsistency of a the time inconsistency of a wristwatch.
wristwatch.
250 C hapter 6 Continuous Probability Distributions
the equation that follows. The cumulative probability (the area of the shaded region in
Figure 6.16) is
Recall: e is the base of the natural
1
logarithm; e < 2.71828. Most P(X # x) 5 for all x (6.4)
calculators have a specific key 1 1 e2x/2
for e. Suppose a watch is randomly selected.
a. What is the probability that the watch is 5 seconds slow or slower?
b. What is the probability that the watch is more than 10 seconds fast?
c. What is the probability that the watch is between 3 seconds slow and 3 seconds fast?
Solution
a. If the watch is 5 seconds slow or slower, this means X # 25. The probability
P ( X # 25 ) is cumulative probability already. The calculation and the graphical
interpretation (Figure 6.17) follow.
f (x)
P(X 5)
10 5 0 5 10 x
1
P ( X # 25 ) 5 ( )
5 0.0759 Use Equation 6.4.
1 1 e2 25 /2
The probability that a randomly selected watch is 5 seconds slow or slower is 0.0759.
b. If the watch is more than 10 seconds fast, this means X . 10. To compute the
corresponding probability, use the complement rule to convert to an expression
involving cumulative probability, and use Equation 6.4. See Figure 6.18.
f (x)
P(X 10)
10 5 0 5 10 x
Figure 6.18 The probability for watch
inconsistency, P(X . 10).
1
512 Use Equation 6.4.
1 1 e210/2
5 1 2 0.9933 5 0.0067 Simplify.
6.1 Probability Distributions for a Continuous Random Variable 251
The probability that a randomly selected watch is at least 10 seconds fast is 0.0067.
c. If the watch is between 3 seconds slow and 3 seconds fast, this means 23 # X # 3.
To compute the corresponding probability, find the difference between two cumulative
probabilities. See Figure 6.19.
f (x)
P(3 X 3)
P(X 3)
10 5 3 0 3 5 10 x
P ( 23 # X # 3 ) 5 P ( X # 3 ) 2 P ( X , 23 )
5 P ( X # 3 ) 2 P ( X # 23 ) A single point contributes no probability.
1 1
5a 23/2 b
2a ( ) b
Use Equation 6.4.
11e 1 1 e2 23 /2
5 0.8176 2 0.1824 5 0.6352 Simplify.
The probability that a randomly selected watch is between 3 seconds slow and 3
seconds fast is 0.6352.
Solution
Step 1 Because X has a uniform distribution with a 5 10 and b 5 60, the probability
Tim Dominick/MCT/ABACAUSA.COM/Newscom density function is
f (x)
0.020
0.015
0.010
0.005
10 t 60 x
Step 3 Set the expression for probability equal to 0.75 and solve for t.
( t 2 10 )( 0.02 ) 5 0.75
0.75
t 2 10 5 5 37.5 Divide both sides by 0.02.
0.02
t 5 37.5 1 10 5 47.5 Add 10 to both sides.
a. Find the probability that the candle burns at least seven d. If the DOT road crew removes all of the cones on the
hours. center line 55 minutes after painting, what is the
b. Find the probability that the candle burns at most eight probability that the paint will still be wet?
hours.
c. Find the mean burning time and the probability that the 6.16 Physical Sciences In Grafton, a rural area in
burning time of a randomly selected candle will be within
SOLUTION TRAIL Vermont, the distance (in meters) between telephone
one standard deviation of the mean. poles has a uniform distribution with a 5 40 and b 5 65.
d. Find a time t such that 25% of all candles burn longer Suppose two consecutive telephone poles are selected at
than t hours. random.
a. What is the probability that the distance between the poles
6.13 Sport and Leisure According to Major League is less than 60 meters?
Baseball rules, a baseball should weigh between 5 and b. What is the probability that the distance between the poles
5.25 ounces and have a circumference of between 9 and 9.25 is between 45 and 55 meters? Write a Solution Trail for
inches. Suppose the weight of a baseball (in ounces) has a this problem.
uniform distribution with a 5 5.085 and b 5 5.155, and the c. Any distance between poles greater than 50 meters is
circumference (in inches) has a uniform distribution with considered to be environment friendly. What is the
a 5 9.0 and b 5 9.1. probability that the distance is environment friendly?
a. Find the probability that a randomly selected baseball has
a weight greater than 5.14 ounces. Write a Solution Trail
for this problem. Extended Applications
b. Find the probability that a randomly selected baseball has
a circumference less than 9.03 inches. 6.17 Psychology and Human Behavior Some of the
c. Suppose the weight and the circumference are common medications to help people with insomnia fall asleep
independent. Find the probability that a randomly selected include Ambien, Lunesta, and Sonata. Ambien CR is an
baseball will have a weight between 5.11 and 5.13 ounces extended-release variation and is formulated so that an
and a circumference between 9.04 and 9.06 inches. individual will fall asleep within 30 minutes.3 The probability
density function for X, the time (in minutes) it takes to fall
6.14 Manufacturing and Product Development Pre- asleep after taking an Ambien CR tablet, is given below.
manufactured wooden roof trusses allow builders to complete
projects faster and with lower on-site labor costs. The connector 0.05 if 0 # x # 10
plates for trusses are made from Grade A steel and are hot-dip f ( x ) 5 • 20.0025 ( x 2 30 ) if 10 , x # 30
galvanized. The thickness of a truss connector (in inches) varies 0 otherwise
254 C hapter 6 Continuous Probability Distributions
f (x) c. What is the probability that the car is parked for more
than 2.6 hours?
0.05
d. What is the probability that the car is parked for between
0.04 1.4 and 2.6 hours?
6.19 Marketing and Consumer Behavior Marini’s
0.03
candy store on the beach boardwalk in Santa Cruz sells
0.02 candy in bulk. Customers can mix products from over 100
barrels. The probability distribution for the number of
0.01 pounds of candy purchased by a randomly selected customer
is shown below.
0 5 10 15 20 25 30 x
f (x)
f (x)
f (x)
1.0
0.375
0.8
0.6
0.250
0.4
0.2 0.125
0 1 2 3 4 x
3 2 1 0 1 2 3 x
a. What is the probability that the car is parked for less than
2 hours? a. Verify that this is a valid probability density function.
b. What is the probability that the car is parked for less than b. What is the probability that the stock price increases by
1.4 hours? at least $1.00 on a randomly selected day?
6.1 Probability Distributions for a Continuous Random Variable 255
c. What is the probability that the change in stock price is b. What is the probability that the person must wait for a
between 21.00 and 1.00? table for over one half-hour?
d. Find a value c such that P ( 2c # X # c ) 5 0.90. c. What is the probability that the person must wait for a
table for between 15 and 30 minutes?
6.21 Medicine and Clinical Studies Although we all
experience inflammation associated with a bruise or sprain, 6.23 Psychology and Human Behavior Parents with
inflammation can also affect the cells in the body. C-reactive children under age 16 often spend a lot of time during the day
protein (CRP) is a measure of inflammation and can be part of a driving their kids to various places, for example, to/from
routine blood test. For healthy adults, the CRP level is less than 5 after-school activities, music practice, sports practices and
milligrams per liter of blood. Suppose the probability distribution games, the library, and a friend’s home. Suppose a family has
for X, the CRP level in healthy adults, is given as follows: k child(ren) under 16 (k 5 1, 2, 3, 4, 5), and let the random
variable Xk be the time (in hours) spent taxiing during the day.
20.08 ( x 2 5 ) if 0 # x # 5
f (x) 5 e Xk has a uniform distribution with a 5 0 and b 5 k. For
0 otherwise example, for a family with two children, X2 has a uniform
distribution with a 5 0 and b 5 2.
f (x) a. For a family with three children, what is the probability
that parents will spend less than one hour driving kids on
0.4 a randomly selected day?
b. For a family of four children, what is the mean number of
hours spent driving kids? What is the probability that the
0.3
driving time will be greater than two standard deviations
from the mean?
0.2 c. For a family with five children under 16, find a time t such
that the probability of driving kids more than t hours is 0.25.
0.1 d. Suppose five families are selected at random, the first
with one child under 16, the second with two children
under 16, etc. What is the probability that all five families
0 1 2 3 4 5 6 x drive less than 30 minutes on a randomly selected day?
What is the probability that all five families drive more
than 90 minutes on a randomly selected day?
a. Verify that this is a valid probability distribution.
b. What is the probability that a randomly selected healthy 6.24 Suppose X is a continuous random variable such that
adult will have a CRP level less than 2.5? 2
1 2 e2x /8 if x $ 0
c. What is the probability that a randomly selected healthy P(X # x) 5 e
0 otherwise
adult will have a CRP level between 2 and 3?
d. Find a value c such that 5% of healthy adults have a CRP a. Find P ( X # 4 ) .
level of at least c. b. Find P ( X . 2 ) .
e. If a patient has a CRP level of at least 4, then additional c. Find P ( 1 # X # 3 ) .
testing is done. What is the probability that a healthy d. Find P ( X # 2 0 X # 4 ) .
adult will need additional testing?
f. What is the probability that the fifth healthy adult will be 6.25 Sports and Leisure A figure skating routine is
the first to need additional testing? designed to last six minutes. The amount of time (in minutes)
less than or greater than six minutes is a random variable, X,
with a probability density function given by
Challenge
2 2 2
6.22 Marketing and Consumer Behavior Dinner 2 x2 if 2 #x#
f (x) 5 c Å p Åp Åp
c ustomers at the Primanti Brothers restaurant in Pittsburgh,
Pennsylvania, often experience a long wait for a table. For a 0 otherwise
randomly selected customer who arrives at the restaurant If the value of X is negative, then the routine was shorter than
between 6:00 p.m. and 7:00 p.m., the waiting time (in minutes) six minutes; if the value of X is positive, the routine went too
is a continuous random variable such that long.
1 2 e20.05x if x $ 0 a. Carefully sketch a graph of the density function.
P(X # x) 5 e b. Find the probability that a randomly selected performance
0 otherwise
is within 1/ !p minutes of 6. That is, find
Suppose a dinner customer is randomly selected.
a. What is the probability that the person must wait for a 1 1
Pa2 #X# b
table for at most 20 minutes? !p !p
256 C hapter 6 Continuous Probability Distributions
A Closer L ok
We’ve seen e before, in the 1. In this probability density function, e is the base of the natural logarithm; e < 2.71828.
Poisson distribution. p is another constant, commonly used in trigonometry; p < 3.14159.
2. We use the shorthand notation X , N ( m, s2 ) to indicate that X is (distributed as) a
normal random variable with mean m and variance s2. For example, X , N ( 5, 36 )
means that X is a normal random variable with mean m 5 5 and variance s2 5 36
(and s 5 6).
3. Equation 6.6 means that x can be any real number (the density curve continues forever
in both directions), the mean m can be any real number (positive or negative), and the
variance can be any positive real number.
Bell-shaped means: Place a bell on 4. For any mean m and variance s2, the density curve is symmetric about the mean m,
a table and pass a plane (a piece unimodal, and bell-shaped as shown in Figure 6.21.
of paper) through the bell
perpendicular to the table. The f (x) Concave down
intersection of the plane and the
bell is a bell-shaped curve.
Concave up Concave up
x
again at x 5 m 1 s.
s
The mean is equal to the median because the normal distribution is symmetric.
6.2 The Normal Distribution 257
It can be shown (using calculus) that the total area under this density curve is 1 (even
s
though it extends forever in both directions, getting closer and closer to the x axis but
never touching it).
s
Stepped
STEPPED Tutorial
TUTORIALS 5. The mean m is a location parameter, and the variance s2 determines the spread of the
Normal
BOX PLOTS distribution. As the variance increases, the total area under the probability density
distributions function (1) is rearranged. The graph is compressed down and pushed out (on the
tails). Figures 6.22 and 6.23 show the effects of m and s2 on the location (center) and
spread of the density curve.
f (x) f (x)
4 7 10 13 16 19 x 4 6 8 10 12 14 16 18 20 x
Figure 6.22 Normal probability density Figure 6.23 Normal probability density
function with m 5 7 and small s2 . function with m 5 12 and large s2 .
P(a X b)
a b x
Figure 6.24 The shaded region corresponds
to P(a # X # b).
The shaded region in Figure 6.24 is not a simple geometric figure; it’s bounded by a
curve! Consequently, there is no nice formula for the area of this region, corresponding to
P ( a # X # b ) . However, a probability statement associated with any normal random
variable can be transformed into an equivalent expression involving a standard normal
random variable (defined below). Cumulative probabilities associated with this distribu-
tion are provided in Appendix A, Table III.
A Closer L ok
1. In Equation 6.7 the independent variable z is used to define the probability density
function simply because the standard normal random variable is usually denoted by Z.
2. Figure 6.25 shows a graph of the probability density function for a standard normal
random variable. The mean is m 5 0 and the standard deviation is s 5 1. Note most
of the probability (area) is within three standard deviations of the mean, between 23
and 3. The shorthand notation Z , N ( 0, 1 ) means Z is a normal random variable with
mean 0 and variance 1.
f (z)
3 2 1 0 1 2 3 z
Figure 6.25 Graph of the probability density
function for a standard normal random variable.
We will often refer to a standard 3. The standard normal distribution is not common, but it is used extensively as a refer-
normal distribution as a Z world. ence distribution. Any probability statement involving any normal random variable
can be transformed into an equivalent expression (with the same probability) involving
a Z random variable. We will learn how to standardize shortly. Therefore, you need to
become an expert at computing probabilities in the Z world. Probabilities associated
with Z are computed using cumulative probability, as shown below. Figure 6.26 shows
the steps for computing probabilities associated with a normal random variable.
Probabilities associated with a standard normal random variable, Z, are computed using
cumulative probability. Table III in Appendix A contains values for P ( Z # z ) for selected
values of z. Figure 6.27 shows the geometric region corresponding to P ( Z # z ) , and Fig-
ure 6.28 illustrates the use of Table III in Appendix A. Locate the units and tenths digits in z
along the left side of the table. Find the hundredths digit in z across the top row. The intersec-
tion of this row and column, in the body of the table, contains the cumulative probability.
f (z)
P(Z z)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
( ( ( ( ( ( ( ( ( ( (
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
( ( ( ( ( ( ( ( ( ( (
Figure 6.28 P(Z # 1.23) 5 0.8907 in Table III, Appendix A.
The following example illustrates the use of Table III in Appendix A to find probabili-
ties associated with Z.
Solution
a. This expression is already cumulative probability. Go directly to Table III in the
Appendix, and find the intersection of row 1.4 and column 0.05. See Figure 6.29.
P ( Z # 1.45 ) 5 0.9265 Cumulative probability; use Table III in Appendix A.
f (z)
0 1.45 z
Figure 6.29 The area of the shaded region is Figure 6.30 P(Z # 1.45).
P(Z # 1.45).
b. This is a right-tail probability. Convert to cumulative probability and use Table III in
the Appendix. See Figure 6.31.
P ( Z $ 20.6 ) 5 1 2 P ( Z , 20.6 ) The Complement Rule.
f (z)
0.6 0 z
Figure 6.31 The area of the shaded region is Figure 6.32 P(Z $ 20.6).
P(Z $ 20.6).
c. Find all the probability up to 2.13, find all the probability up to 21.25, and subtract.
The difference is the probability that Z lies in this interval. See Figure 6.33.
P ( 21.25 # Z # 2.13 )
5 P ( Z # 2.13 ) 2 P ( Z , 21.25 ) Use cumulative probability.
f (z)
1.25 0 2.13 z
d. In this problem, we need to work backward to find the solution. This is an inverse
cumulative probability problem. The cumulative probability is given. We need the
value b such that the cumulative probability is 0.90. See Figure 6.35. Search the body
of Table III in Appendix A to find a cumulative probability as close to 0.90 as possible.
Read the row and column entries to find b.
In the body of Table III, the closest cumulative probability to 0.90 is 0.8997. This
corresponds to 1.28 < b. Figure 6.36 shows a technology solution.
f (z)
0 b z
Figure 6.35 The area of the shaded region is Figure 6.36 Inverse cumulative
0.90 5 P(Z # b). probability.
6.2 The Normal Distribution 261
Note: Linear interpolation can be used to find a more exact answer. The technology
solution presented in Figure 6.36 uses a special inverse cumulative probability
functions.
Interpolation
Recall: interpolation is a method of approximation. It is often used to estimate a value at
a position between two given values in a table. Linear interpolation assumes that the two
known values lie on a straight line.
In Example 6.4(d), 0.90 is between the Table III known cumulative probabilities 0.8997
and 0.9015. Suppose the two points (1.28, 0.8997) and (1.29, 0.9015) lie on a straight
line. The approximate z value corresponding to the cumulative probability 0.90 is
The following rule provides the connection between any normal random variable and
the standard normal random variable.
Standardization Rule
If X is a normal random variable with mean m and variance s2 , then a standard normal
random variable is given by
X2m
Z5 (6.8)
s
A Closer L ok
There are other types of 1. The process of converting from X to Z is called standardization. Z is a standardized
standardization. Z 5 (X 2 m)/ s is random variable.
the most common. 2. Using this rule, any probability involving a normal random variable can be transformed
into an equivalent expression involving a Z random variable. We can then convert to
cumulative probability if necessary, and use Table III in the Appendix.
3. The rule above is illustrated in Figure 6.37, using cumulative probability.
P(X x) X–
x 0 z
Figure 6.37 An illustration of standardization. The areas of the shaded regions
are equal.
The following calculation shows why the two shaded regions in Figure 6.37 have the
same area, and how to use the rule to compute probabilities involving any normal random
variable.
Assume: X , N ( m, s2 ) .
262 C hapter 6 Continuous Probability Distributions
Remember the phrase: Whatever P(X # x) The original (cumulative) probability statement.
you do to one side of the X2m x2m Work within the probability statement. Subtract the mean of X and divide
inequality, you have to do to the 5 Pa # b
s s by the standard deviation of X, on both sides of the inequality (standardize).
other side.
5 P(Z # z) Apply the standardization rule within the probability statement. The expression with X
is transformed into Z. The expression with x becomes some fixed value z. Use Table III
in the Appendix to find this probability.
The examples below involve normal random variables and standardization. The hardest
part of these types of problems is (as before) (1) to define and identify the probability distribu-
tion, and (2) to write a probability statement. Given a probability statement involving a normal
random variable, all we have to do is standardize and use cumulative probability. Even for
backward problems (with a known probability), we still standardize and still use cumulative
probability. Note that the technology solutions presented do not require standardization.
b. Find P ( 9 # X # 10 ) .
c. Find the value b such that P ( X # b ) 5 0.75.
Solution
a. X is normal. We know the mean and standard deviation. Standardize and use cumulative
probability associated with Z.
X 2 10 12.5 2 10
P ( X . 12.5 ) 5 Pa . b Standardize.
2 2
5 P ( Z . 1.25 ) Equation 6.8; simplify.
N(10, 4)
P(X 12.5)
10 12.5 x
Standardize f (z)
Standardization illustrated b. Standardize again. Work within the probability statement to write an equivalent ex-
part (b): pression involving Z.
P(9 X 10)
f (x) 9 2 10 X 2 10 10 2 10
P ( 9 # X # 10 ) 5 Pa # # b Standardize.
2 2 2
5 P ( 20.5 # Z # 0 ) Use Equation 6.8; simplify.
x
b 2 10
10 11.349 5 0.6745 Table III; interpolation.
2
b – 10
P Z
2
b 2 10 5 1.349 Multiply both sides by 2.
0 0.6745 z
Figure 6.39 P(X $ 12.5). Figure 6.40 P(9 # X # 10). Figure 6.41 Inverse cumulative
probability.
Solution Trail 6.6 a. For a randomly selected economy seat, find the probability that the seat pitch is be-
tween 33.25 and 34.75 inches (considered comfortable).
Ke ywor ds b. Any seat with a seat pitch less than 33 inches is considered constricted. Find the prob-
n Normally distributed ability that a randomly selected economy seat is constricted.
n Mean
n Standard deviation Solution
a. Let X be the seat pitch in inches. The keywords in the problem suggest X ,
T ranslati on N ( 34, 0.25 ) , s 5 0.5 .
n Normal random variable
Between 33.25 and 34.75 means in the interval [33.25, 34.75] (whether it is closed or
n m 5 34
open doesn’t matter). Find the probability that X lies in this interval.
n s 5 0.5
P ( 33.25 # X # 34.75 )
CONCEPT S
33.25 2 34 X 2 34 34.75 2 34
n Normal probability 5 Pa # # b Standardize.
0.5 0.5 0.5
distribution
n Standardization 5 P ( 21.50 # Z # 1.50 ) Equation 6.8; simplify.
Standardization illustrated
part (b):
P(X 33)
f (x)
Solution
Step 1 Let X be the volume (in cubic inches, in3) of a randomly selected backpack. The
information given indicates that X is a normal random variable with mean
m 5 600 and standard deviation s 5 100: X , N ( 600, 10,000 ) .
SOLUTION TRAIL
Solution Trail 6.7 Find a symmetric interval about the mean such that 95% of all backpack
volumes lie in this interval translates as: Find a value of b such that P ( 600 2 b
K e y w ords # X # 600 1 b ) 5 0.95. Figure 6.43 illustrates this probability statement.
n Normally distributed
n Mean
n Standard deviation
P(600 b X 600 b)
n 95%
The value of b is 196 and the symmetric interval about the mean is
P ( 600 2 b # X # 600 1 b ) 5 P ( 600 2 196 # X # 600 1 196 )
5 P ( 404 # X # 796 ) 5 0.95
95% of all backpacks have a volume between 404 and 796 in3.
Technology can be used to find the endpoints of the interval without solving for
Figure 6.44 A technology b. See Figure 6.44.
solution: Use inverse cumulative
probability to find each endpoint. Try It Now Go to Exercise 6.46
266 C hapter 6 Continuous Probability Distributions
Technology Corner
Procedure: Solve probability questions involving
VIDEO TECH a normal random variable.
MANUALS VIDEO Tech
Video TECH Manuals
MANUALS
Reconsider: Example 6.5, solutions, and interpretations.
EXEL DISCRIPTIVE EXEL DISCRIPTIVE
Normal
calculations
CrunchIt!
There is a built-in function to compute probabilities associated with a normal random variable. Select Distribution
Calculator; Normal. To find cumulative probability or right-tail probability, select the Probability tab. Choose an appropri-
ate inequality symbol and enter a value for the endpoint. To solve an inverse cumulative probability problem, select the
Quantile tab and enter a cumulative probability.
1. Select Distribution Calculator; Normal. Enter the mean, 10, and the standard deviation, 2. Under the Probability tab,
select . and enter 12.5 for the endpoint. See Figure 6.45.
2. To find the probability that X takes on a value in an interval, use cumulative probability.
P ( 9 # X # 10 ) 5 P ( X # 10 ) 2 P ( X # 9 ) .
3. Select Distribution Calculator; Normal. Enter the mean, 10, and the standard deviation, 2. Under the Quantile tab, enter
the cumulative probability and click Calculate. See Figure 6.46.
TI-84 Plus C
The built-in function normalcdf is used to find (calculator) cumulative probability: the probability that X takes on a
value between a and b. This function takes four arguments: a (lower), b (upper), m, and s. The built-in function invNorm
takes three arguments: p (area), m, and s. This function returns a value x such that P ( X # x ) 5 p. The default values for m
and s are 0 and 1, respectively.
1. Select DISTR ; DISTR; normalcdf. Enter the left endpoint (lower), 12.5, the right endpoint (upper), 1E99
(calculator infinity), the mean (m), 10, and the standard deviation (s), 2. Highlight Paste and tap ENTER . Refer to
Figure 6.39.
2. Select DISTR ; DISTR; normalcdf. Enter the left endpoint (lower), 9, the right endpoint (upper), 10, the mean
(m), 10, and the standard deviation (s), 2. Highlight Paste and tap ENTER . Refer to Figure 6.40.
3. Select DISTR ; DISTR; invNorm. Enter the cumulative probability (area), 0.75, the mean (m), 10, and the standard
deviation (s), 2. Highlight Paste and tap ENTER . Refer to Figure 6.41.
Minitab
There are several built-in functions to compute cumulative probability, tail probability, and inverse cumulative probability.
These functions may be accessed through a graphical input window or by using the command language.
6.2 The Normal Distribution 267
1. In a session window, use the function CDF and the complement rule to find P ( X . 12.5 ) . See Figure 6.47.
2. Select Graph; Probability Distribution Plot; View Probability. In the Distribution menu, select Normal. Enter the Mean
and Standard deviation. Under the Shaded Area tab, choose X Value and Middle. Enter the X value 1 (9) and X value 2
(10). Click OK. Minitab displays a distribution plot with the shaded area corresponding to probability. See Figure 6.48.
3. Select Calc; Probability Distributions; Normal. Choose Inverse cumulative probability, enter the Mean, 10, and the
Standard deviation, 2. Select Input constant (p), and enter 0.75. Click OK. The value of x is displayed in the session
window. See Figure 6.49.
Figure 6.47 P(X . 12.5) using Figure 6.48 P(9 # X # 10) using Figure 6.49 Inverse cumulative
the command language. Probability Distribution Plot. distribution function output.
Excel
There are built-in functions to compute cumulative probability associated with a standard normal random variable
(NORM.S.DIST) and a normal random variable with arbitrary mean and standard deviation (NORM.DIST). The functions
NORM.S.INV and NORM.INV are the corresponding inverse cumulative probability functions.
1. Use the function NORM.DIST to find P ( X # 12.5 ) . Use the complement rule to find P ( X . 12.5 ) . See Figure 6.50.
2. Use the function NORM.DIST to find P ( X # 9 ) and P ( X # 10 ) . Compute the difference to find P ( 9 # X # 10 ) . See
Figure 6.51.
3. Use the function NORM.INV. Enter the cumulative probability, 0.75, the mean, 10, and the standard deviation, 2. See
Figure 6.52.
Figure 6.50 P(X . 12.5). Figure 6.51 P(9 # X # 10). Figure 6.52 Inverse cumulative
probability.
a. Find the probability that the wedding costs more than a. What is the probability that the salinity is more than 36 ppt?
$31,000. b. What is the probability that the salinity is less than 33.5 ppt?
b. Find the probability that the wedding costs between c. A certain species of fish can only survive if the salinity is
$26,000 and $30,000. between 33 and 35 ppt. What is the probability that this
c. Find the probability that the wedding costs less than species can survive in a randomly selected area?
$25,000. d. Find a symmetric interval about the mean salinity such
that 50% of all salinity levels lie in this interval. What are
6.44 Public Policy and Political Science The President of
the endpoints of this interval called?
the United States gives a State of the Union Address every year
in late January or early February. Since Lyndon Johnson’s 6.48 Public Health and Nutrition Many people grab a
address in 1966, Richard Nixon gave some of the shortest granola bar for breakfast or for a snack to make it through the
messages, and Bill Clinton presented a 1-hour and 28-minute afternoon slump at work. A Kashi GoLean Crisp Chocolate
address in 2000. The mean length for these addresses is 51.75 Caramel bar is 45 grams, and the mean amount of protein in
minutes and the standard deviation is 14.37 minutes. Assume the each bar is 8 grams.10 Suppose the distribution of protein in a
length of a State of the Union Address is normally distributed. bar is normally distributed and the standard deviation is 0.15
a. What is the probability that the next State of the Union gram, and a random Kashi bar is selected.
Address will be between 45 and 55 minutes long? a. What is the probability that the amount of protein is less
b.TRAIL
SOLUTION What is the probability that the next State of the Union than 7.75 grams?
Address will be more than 90 minutes long? b. What is the probability that the amount of protein is
c. What is the probability that the next two State of the between 7.8 and 8.2 grams?
Union Addresses will be less than 30 minutes long? c. Suppose the amount of protein is at least 8.1 grams. What
is the probability that it is more than 8.3 grams?
6.45 Manufacturing and Product Development A d. Suppose three bars are selected at random. What is the
standard Versa-lok block used in residential and commer- probability that all three will be between 7.7 and 8.3 grams?
cial retaining wall systems has mean weight 37.19 kg.8 Assume
the standard deviation is 0.8 kg and the distribution is approxi- 6.49 Public Health and Nutrition Many typical household
mately normal. A standard block unit is selected at random. cleaners contain toxic chemicals. 2-Butoxyethanol is found in
a. What is the probability that the block weighs more than multipurpose cleaners and is a very powerful solvent. The EPA
38 kg? Write a Solution Trail for this problem. has a safety standard for this chemical when used in the
b. What is the probability that the block weighs between workplace, but cleaning at home in a confined area can cause
36 and 37 kg? levels to rise well above this standard. The mean percent of
c. If the block weighs less than 35.5 kg, it cannot be used 2-butoxyethanol in Rain-X Glass Cleaner is 3.11 Suppose the
in certain commercial construction projects. What is the distribution is approximately normal and the standard deviation
probability that the block cannot be used? is 1%, and a random bottle of Rain-X Glass Cleaner is selected.
a. What is the probability that the percentage of
6.46 Marketing and Consumer Behavior Movie trailers 2-butoxyethanol is less than 2.5?
are designed to entice audiences by showing scenes from b. What is the probability that the percentage of
coming attractions. Several trailers are usually shown in a 2-butoxyethanol is between 2.2 and 3.5?
theater before the start of the main feature, and most are c. Suppose the EPA has established a limit of 5%
available via the Internet. The duration of a movie trailer is 2-butoxyethanol in all consumer products. What is the
approximately normal, with mean 150 seconds and standard probability that a bottle exceeds this limit?
deviation 30 seconds.
6.50 Psychology and Human Behavior In many U.S.
a. What is the probability that a randomly selected trailer
lasts less than 1 minute? families, both parents work outside the home, while children
b. Find the probability that a randomly selected trailer lasts spend time at daycare centers or are cared for by other relatives.
between 2 minutes and 3 minutes 15 seconds. The mean amount of time fathers spend with their child(ren) is
c. Any movie trailer that lasts beyond 4 minutes and 7.3 hours per week.12 Suppose this time distribution is approxi-
30 seconds is considered too long. What proportion of mately normal with standard deviation 0.75 hour, and suppose a
movie trailers is too long? father is randomly selected.
d. Find a symmetric interval about the mean such that 99% a. What is the probability that the father spends at least
of all movie trailer durations lie in this interval. 8 hours with his child in a given week?
b. What is the probability that the father spends between 6
6.47 Biology and Environmental Science The salinity, or and 7 hours with his child in a given week?
salt content, in the ocean is expressed in parts per thousand (ppt). c. If the child sees his or her father for less than 6 hours per
The number varies with depth, rainfall, evaporation, river runoff, week, the parental bond is weakened. What is the probability
and ice formation. The mean salinity of the oceans is 35 ppt.9 that this special bond will be weakened in a given week?
Suppose the distribution of salinity is normal and the standard d. What is the probability that the father will spend at least
deviation is 0.52 ppt, and suppose a random sample of ocean 9 hours with his child in each of five randomly selected
water from a region in the tropical Pacific Ocean is obtained. weeks?
270 C hapter 6 Continuous Probability Distributions
6.57 Medicine and Clinical Studies Repeated industrial 6.60 Physical Sciences Hydroelectric projects are carefully
tasks often cause work-related muscle disorders. Measurements monitored, and their energy capability is predicted for several
of joint angles (of the shoulder and elbow, for example) years into the future. Suppose the Klamath Hydro Project,
required to complete a certain task can be used to predict future located on the upper Klamath River in south-central Oregon,
injuries. The shoulder joint angle required to fasten an alumi- generates electricity according to a normal distribution. The
num door frame on an assembly line varies according to the Pacific Northwest Utilities Conference Committee claims the
worker’s height, arm length, and location. The shoulder joint mean electricity generated per year is 35 megawatts (MW).
angle for this task is normally distributed with mean 23.7 a. The probability that the Klamath Hydro Project generates
degrees and standard deviation 1.9 degrees. Suppose an less than 34 MW during any randomly selected year is
employee is randomly selected. 0.3540. Find the standard deviation.
a. What is the probability that the shoulder joint angle will b. Suppose the years are independent, and the hydro project
be between 20 and 25 degrees? will record a profit in a given year if it is able to generate
b. What is the probability that the joint angle will be less at least 37.8 MW that year. What is the probability that
than 18 degrees? the project will record a profit for four consecutive years?
c. If the joint angle is more than 28 degrees, there is a good c. Suppose the electricity generated during a certain year is
chance the employee will suffer from a muscle disorder. 33.5 MW. Is there any evidence to suggest that the claim
What is the probability that the employee will suffer from by the Pacific Northwest Utilities Conference Committee
a muscle disorder? is false? Justify your answer.
d. If the joint angle is between 21.7 and 25.7 degrees, then
management believes the ergonomics of the task are 6.61 Manufacturing and Product Development Dining-
adequate. If five employees are randomly selected, what is room chairs come in many different woods, styles, and shapes.
the probability that four of the five have adequate The height of the seat of a randomly selected oak dining-room
ergonomics? chair is approximately normal with mean 85 centimeters (cm)
and standard deviation 1.88 cm.
6.58 Biology and Environmental Science Many lakes are a. Find a value h such that 99% of all dining-room chairs
carefully monitored for pH concentration, total phosphorus, have height less than h.
chlorophyll, nitrogen, and total suspended solids. These data are b. Consumer testing indicates that any chair seat higher than
used to characterize the condition of the lake and to chart 90 cm is uncomfortable to use when eating. What is the
year-to-year variability. Based on information from the Lake probability that a randomly selected dining-room chair is
Partner Program, Ontario Ministry of the Environment, uncomfortable?
Aberdeen Lake has a mean total phosphorus concentration of c. Find the first and third quartiles of the dining-room chair
14.6 mg/liter and standard deviation 5.8 mg/liter. Suppose a day height distribution.
is selected at random, and a total phosphorus measure from d. There is some evidence to suggest that, after five years of
Aberdeen Lake is obtained. use, the mean height of these chairs has decreased, due to
a. What is the probability that the total phosphorus is less wear, erosion, and humidity. Suppose that after five years,
than 13 mg/liter? the probability the height is more than 86 cm is 0.0718.
b. What is the probability that the total phosphorus differs Find the mean height after five years.
from the mean by more than 5 mg/liter?
c. Suppose the total phosphorus is less than 20 mg/liter. 6.62 Sports and Leisure Tianlang Guan was the youngest
What is the probability that it is less than 14 mg/liter? person ever to participate in the Masters Golf Tournament. The
d. If the total phosphorus measurement is 27 mg/liter, is 14-year-old from China played the Augusta, Georgia, course
there any evidence to suggest the mean has increased? with the confidence of a professional, but very slowly. He was
warned about his slow play and was assessed a one-stroke
6.59 Manufacturing and Product Development High- penalty on the par-4 17th hole on the second day of the
pressure washers have become popular for cleaning siding, tournament.16 The PGA tour maintains a 40-second time limit
decks, and windows. This equipment is available in various to play a stroke, but also has several exceptions to this rule that
engine types and horsepower. Suppose the power rating (in allow for an additional 20 seconds. The mean time for all golf
horsepower, hp) for a residential pressure washer is normally shots is 38 seconds.17 Assume the time for all golf shots is
distributed with mean 20 hp and standard deviation s . normally distributed with standard deviation 9 seconds, and
a. The probability that a randomly selected power rating is suppose a golf shot is selected at random.
within 2.5 hp of the mean is 0.7229. Find the value of s. a. What is the probability that a randomly selected shot
b. A leading consumer magazine advised its readers to takes between 25 and 35 seconds?
purchase pressure washers with a power rating of 15 hp b. If the shot takes more than 60 seconds, the golfer is
or more. What proportion of pressure washers have this assessed a penalty stroke. What is the probability that the
rating? golfer will be assessed a penalty stroke?
c. If the power rating is more than 26.5 hp, the pressure washer c. Suppose a golfer takes 72 strokes to complete the round.
will crack, or even break, certain windows. What is the What is the probability that at least 60 of these shots take
probability that a pressure washer could break a window? less than 45 seconds?
272 C h a p t e r 6 Continuous Probability Distributions
x x
20 20
15 15
10 10
5 5
0 0
2 1 0 1 2 z 2 1 0 1 2 z
Figure 6.53 A normal probability plot. The points Figure 6.54 A normal probability plot. The curved
lie along an approximate straight line. There is no graph suggests that the distribution is not normal
evidence of non-normality. and is skewed.
x x
20 20
15 15
10 10
5 5
0 0
2 1 0 1 2 z 2 1 0 1 2 z
Figure 6.55 A normal probability plot. The Figure 6.56 A normal probability plot. The plot
plot suggests that the distribution is not normal suggests that the distribution is not normal and
and that the data set contains an outlier. has heavy tails.
The data axis can be horizontal or vertical. To use a horizontal data axis, plot the points
( x(i), zi ) . Figure 6.57 shows a normal probability plot with the data plotted on the vertical
axis and Figure 6.58 shows a normal probability plot (using the same data and standard-
ized normal scores) with the data plotted on the horizontal axis.
274 C hapter 6 Continuous Probability Distributions
x z
2
80
1
60
0
40
1
20
2
0
2 1 0 1 2 z 0 20 40 60 80 x
Figure 6.57 A normal probability plot with the Figure 6.58 A normal probability plot with the
data plotted on the vertical axis. data plotted on the horizontal axis.
Interpretation of a normal probability plot is very subjective, and even if the axes are
reversed, we are still looking for the points to lie along a straight line.
All four methods can be used to check the normality assumption, and any one (or several)
may suggest the data did not come from a normal distribution. Because we are searching for
evidence of non-normality, even if we fail to reject the normality assumption in each test, we
still cannot say with absolute certainty that the data came from a normal distribution.
757 751 749 753 745 749 738 746 732 750
741 743 760 741 758 762 745 752 735 767
Is there any evidence to suggest that this distribution is not normally distributed?
Solution
Step 1 Figure 6.59 shows a frequency histogram for these data. There are no obvious
outliers, and the distribution seems approximately normal.
4
Frequency
0
730 735 740 745 750 755 760 765 770
Copper mined
Step 2 The sample mean and the sample standard deviation are x 5 748.70 and
s 5 9.17. The following table lists three symmetric intervals about the mean, the
number of observations in each interval, and the proportion of observations in
each interval (recall that n 5 20).
The actual proportions are close to those given by the empirical rule (0.68, 0.95,
and 0.997).
Step 3 The quartiles are Q1 5 742, Q3 5 755.
Plot these points to obtain the normal probability plot, as shown in Figure 6.60.
Figure 6.61 shows a technology solution. The points lie along an approximately
straight line.
x
770
760
750
740
730
2 1 0 1 2 z
Figure 6.60 Normal probability plot for the copper Figure 6.61 Normal probability plot.
mine data.
The histogram, backward empirical rule, IQR/s, and normal probability plot
show no significant evidence of non-normality. Remember, however, that this
decision is very subjective.
Try It Now Go to Exercise 6.76
276 C hapter 6 Continuous Probability Distributions
350 351 352 353 354 358 361 364 371 376
377 378 387 396 399 402 406 408 412 424
427 430 432 437 440 441 443 446 447 449
Is there any evidence to suggest the distribution of six-month total dosage is not normally
distributed?
Solution
Step 1 Figure 6.62 shows a frequency histogram for these data. Although the graph
seems symmetric, it is not bell-shaped. Most of the data are concentrated
in the tails of the distribution. This suggests the data are not from a normal
distribution.
5
Frequency
0
350 360 370 380 390 400 410 420 430 440 450
Total dose
Step 2 The sample mean is x 5 399.03 and the sample standard deviation is s 5 34.94.
The following table lists three symmetric intervals about the mean, the number
of observations in each interval, and the proportion of observations in each inter-
val (computed using n 5 30).
The first two proportions (0.50 and 1.00) are significantly different from those
given by the empirical rule (0.68 and 0.95). This suggests the population of total
chemotherapy doses is not normal.
6.3 Checking the Normality Assumption 277
440
420
400
380
360
340
2 1 0 1 2 z
Figure 6.63 Normal probability plot for the Figure 6.64 JMP normal probability
chemotherapy dose data. plot.
The histogram, backward empirical rule, IQR/s, and the normal probability plot
all indicate that this sample did not come from a normal population.
TECHNOLOGY CORNER
Procedure: Construct a normal
VIDEO TECH probability plot.
MANUALS VIDEO Tech
Video TECH Manuals
MANUALS
Reconsider: Example 6.8, solution,
EXEL DISCRIPTIVE and interpretations. EXEL DISCRIPTIVE
Normal quantile
plots
CrunchIt!
A QQ Plot (quantile plot) is used to construct a normal probability plot.
1. Enter the data into a column. Rename the column if desired.
2. Select Graphics; QQ Plot. Select the Sample (name of the column) from the drop-down menu. Click Calculate to view
the graph. See Figure 6.65. CrunchIt! adds a straight line to this plot for visual reference.
TI-84 Plus C
A normal probability plot is one of the six built-in statistical plots.
1. Enter the data into list L1.
2. Choose STATPLOT ; STAT PLOTS; Plot1. Turn the plot On, select Type normal probability plot (the last graph
icon), enter the Data List, L1, set the Data Axis to Y, choose a Mark (for the points on the graph), and select
a Color.
3. Enter appropriate window settings and press GRAPH to view the normal probability plot. Refer to Figure 6.61.
Minitab
Use the built-in function NSCOR to compute the normal scores. Construct a scatter plot of the data versus the normal scores.
1. Enter the data into column C1.
2. Compute the normal scores in a session window (or by using the Minitab Calculator) and store the results in column
C2: LET C2 = NSCOR(C1).
3. Construct a scatter plot.
a. In a session window: PLOT C1*C2.
b. In a graphical input window: Graph; Scatterplot. Select Simple and let the Y variable be the data column (C1) and
the X variable be the normal scores column (C2). See Figure 6.66.
6.3 Checking the Normality Assumption 279
Excel
To construct a normal probability plot, compute the normal scores using the formula below. Construct a scatter plot of the
data versus the normal scores.
1. Enter the data into column C, in increasing order, and the numbers 1 to n 5 20 in column A.
2. Set the cell B1 equal to NORM.S.INV((A1 – 3/8)/(20 + 1/4)). Copy this result and paste into the cells
B2–B20. These are (approximately) the normal scores.
3. Highlight the data range B1:C20. Under the Insert tab, select Scatter; Scatter. See Figure 6.67.
0
Construct a normal probability plot. Is there any evidence to 2 1 0 1 2 z
suggest the data are from a non-normal population? Justify
your answer. 6.73 Consider the given data. Use the four methods presented in
this section to determine whether there is any evidence to suggest
6.72 Examine each normal probability plot below. Is there the data are from a non-normal population. EX6.73
any evidence to suggest the data are from a non-normal
population? Justify your answer. 6.74 Consider the following data:
40
2223 641 1275 1796 802 691 1716 1157
20 1340 1504 1006 778 1476 914 1108 2110
1704 2106 1069 1398 1680 1705 2376 1577
0 1601 2201 1809 1847 1721 1069
2 1 0 1 2 z
60
6.76 Education and Child Development Some people
claim children who practice yoga are more physically fit,
40
self-confident, and self-aware. A random sample of pre-teens
(ages 10–12) practicing yoga was obtained and their meditation
20
(or quiet breathing) times (in minutes) per day were recorded.
Use the four methods presented in this section to determine
0 whether there is any evidence to suggest the data are from a
2 1 0 1 2 z non-normal population. YOGA
6.3 Checking the Normality Assumption 281
6.77 Sports and Leisure Many NBA players express 6.81 Marketing and Consumer Behavior The predomi-
themselves on the court through their sneakers. High-school nant acid in frozen concentrated orange juice (FCOJ) is citric
basketball players often insist on wearing the same expensive acid, and the amount is usually given as a percentage. Degrees
sneakers worn by their favorite player. A random sample of Brix is a measure of the total soluble solids in FCOJ and is
sneakers worn by Notre Dame Prep (in Massachusetts) basket- also a percentage. The Brix/acid ratio is computed by simple
ball players was obtained, and the retail price (in dollars) for division, and 12 is considered an ideal ratio. A random sample
each pair of sneakers is given in the table below: SNEAKERS of FCOJ was obtained from different sellers, and the Brix/acid
ratio was measured for each. These measurements were used to
96.70 112.05 120.70 106.40 86.60 construct the following normal probability plot:
126.40 134.75 76.75 142.20 116.70
x
Use the four methods presented in this section to determine
13
whether there is any evidence to suggest the data are from a
non-normal population.
6.78 Public Policy and Political Science Bicycle paths are 12
usually planned and constructed according to certain guidelines
(for example, the American Association of State Highway and
11
Transportation Officials guidelines for construction). There are
construction standards for width, offset from the road, maximum
grade, and horizontal and vertical clearances. A random sample 10
2 1 0 1 2 z
of bicycle-path widths (in feet) was obtained. BIKEPATH
a. Find the sample mean and the sample standard deviation
for these data. Does this plot suggest the data are from a non-normal distribu-
b. Compute the intervals ( x 2 s, x 1 s ) , ( x 2 2s, x 1 2s ) , tion? Justify your answer.
and ( x 2 3s, x 1 3s ) . 6.82 Biology and Environmental Science There are
c. Find the proportion of observations in each interval in approximately 1500 black bears in Great Smoky Mountains
part (b). Is there any evidence to suggest the data are from National Park. Park visitors are warned that these animals are
a non-normal population? dangerous and unpredictable, and that it is illegal to disturb or
6.79 Physical Sciences Near-Earth objects (NEOs) are comets displace a black bear. To track and protect these animals,
and asteroids that have entered the Earth’s neighborhood. Between suppose a random sample of male black bears was obtained and
March 4 and March 10, 2013, four asteroids were in our neighbor- the weight of each (in pounds) was recorded.21 Use the four
hood; the largest was approximately 600,000 miles away. The methods presented in this section to determine whether there is
National Aeronautics and Space Administration maintains a list of any evidence to suggest the data are from a non-normal
NEOs deemed to have the potential to collide with Earth. The population. BEARS
estimated absolute magnitude (H, related to diameter) of several of 6.83 Manufacturing and Product Development Pig iron
these objects is given on the text website.20 Use the methods is a mixture of iron ore, charcoal from coal, and limestone,
presented in this section to determine whether there is any evidence melted together under very high pressure. This type of iron is
to suggest the data are from a non-normal population. NEOS usually refined and used to produce wrought iron, cast iron, or
6.80 Sports and Leisure A typical round of golf takes steel. A random sample of the production and shipment of
approximately 3.5 hours (walking, without an electric cart). pig iron (in 100 metric tonnes) per month in Canada was
Many weekend golfers take more time as a result of lost golf recorded.22 Use the four methods presented in this section to
balls, thinking about certain shots, and talking to other players. determine whether there is any evidence to suggest the data
The manager at the Wawona Golf Course in Wawona, are from a non-normal population. PIGIRON
California, obtained a random sample of round times (in hours),
and the data are reported in the table below. GOLF Challenge
6.84 Normal Scores Generate 500 random samples of size
3.86 4.92 4.15 3.83 4.34 4.56 4.24 4.33 10 from a standard normal distribution. Order each sample
4.36 4.09 4.30 4.23 4.28 4.63 4.34 3.73 from smallest to largest. Find x(1), the sample mean of the
4.66 4.40 4.45 4.42 500 smallest values from each sample. Consider the next
largest value in each sample. Find x(2), the sample mean of
a. Construct a stem-and-leaf plot for these data. these 500 values. Continue in this manner to find x(3), x(4), . . . ,
b. Compute the ratio IQR/s. and x(10), the mean of the 500 largest values from each
c. Construct a normal probability plot for these data. sample. Compare these 10 sample means, x(1), x(2), . . . , x(10)
d. Is there any evidence to suggest these data are from a with the standardized normal scores for n 5 10 in Table IV in
non-normal population? Justify your answer. the Appendix.
282 C h a p t e r 6 Continuous Probability Distributions
Try a similar procedure for n 5 20. Generate 500 random Generate 500 random observations from a normal distribution
samples of size 20 from a standard normal distribution. with mean 50 and standard deviation 10. Arrange the observa-
Order each sample from smallest to largest. Find tions in order from smallest to largest and denote this ordered
x(1), x(2) , c, x(20) and compare these values with the list x(1), x(2), . . . , x(500) .
standardized normal scores for n 5 20 in Table IV in the
Form the ordered pairs ( x(1), 1/ 500 ) , ( x(2), 2/ 500 ) , . . . ,
Appendix.
( x(i), i/ 500 ) , . . . , ( x(500), 1 ) . Plot these points in a rectangular
Explain why these sample means should be good estimates coordinate system and describe the shape of the graph. Which
of the standardized normal scores. curve does this graph approximate? Why?
A Closer L ok
1. The symbol e in Equation 6.9 is the base of the natural logarithm (e < 2.71828). The
constant e is also used in the Poisson distribution and the normal distribution.
2. If the exponential distribution is used to model the lifetime of a light bulb, machine, or
even a human being, then l represents the failure rate.
Notice that f (x) 5 l when x 5 0 3. The exponential distribution has positive probability only for x $ 0. Figures 6.68 and
(because e0 5 1). 6.69 show the graphs of a general probability density function for an exponential ran-
dom variable and a probability density function with l 5 2.
f (x) f (x)
2
0 x 0 1 2 3 4 x
Figure 6.68 A graph of the Figure 6.69 A graph of the
probability density function for an probability density function for an
exponential random variable with exponential random variable with
parameter l. l 5 2.
6.4 The Exponential Distribution 283
4. The mean and variance for an exponential random variable, X, with parameter l are
1 1
E(X) 5 m 5 s2 5 (6.10)
l l2
f (x)
P(X x)
Figure 6.70 A graph of the probability
density function for an exponential random
variable. The shaded area corresponds to
P(X # x). x
Solution
a. Use Equation 6.9 to sketch the graph (Figure 6.71) and Equation 6.10 to compute the
mean, variance, and standard deviation.
1 1
m5 5 5 10
l 0.1
1 1
s2 5 2 5 5 100 s 5 "s2 5 "100 5 10
l 0.12
284 C hapter 6 Continuous Probability Distributions
f (x)
0.10
0.05
0
0 2 4 6 8 10 12 14 16 18 x
5 1 2 0.3679 5 0.6321
The probability that the length of time until symptoms return is less than the mean is
0.6321. Figure 6.72 illustrates this probability.
f (x)
0.10
0.05
0
0 2 4 6 8 10 12 14 16 18 x
c. Translate the question into a probability statement, and use cumulative probability
where appropriate. At least 12 hours means 12 hours or more.
P ( X $ 12 ) Translation to a probability statement.
20.1(12)
5e Use the formula for a right-tail probability.
21.2
5e 5 0.3012 Simplify.
The probability that the symptoms will return in between 8 and 16 hours is 0.2474.
Figures 6.73 through 6.75 show technology solutions.
Figure 6.73 P(X , 10): Use the Figure 6.74 P(X $ 12): Use the Figure 6.75 P(8 # X # 16):
formula for cumulative probability. formula for right-tail probability. Compute each cumulative probability
and subtract.
Solution
a . Translate the question into a probability statement. Use cumulative probability or
SOLUTION TRAIL right-tail probability where appropriate. At least five years means five years or longer.
P(X $ 5) Translation to a probability statement.
Christian Delbert/Shutterstock 20.0625(5)
5e 5 0.7316 Right-tail probability.
The probability that the heat pump lasts for at least five years is 0.7316.
Solution Trail 6.11b b. Use the definition of conditional probability, and cumulative or right-tail probability
where appropriate.
K e y w ords
n Suppose the heat pump lasts
P ( X $ 5 1 5 0 X $ 5 ) Translation to a probability statement.
Technology Corner
Procedure: Compute probabilities associated with an exponential random variable.
Reconsider: Example 6.11, solution, and interpretations.
CrunchIt!
There is no built-in function to compute probabilities associated with an exponential random variable.
TI-84 Plus C
There is no built-in function to compute (cumulative) probabilities associated with an exponential random variable. Use
the formulas for cumulative probability and right-tail probability.
1. P ( X , 10 ) : Use the formula for cumulative probability. Refer to Figure 6.73.
2. P ( X . 12 ) : Use the formula for right-tail probability. Refer to Figure 6.74.
3. P ( 8 # X # 16 ) : Compute each cumulative probability and subtract. Refer to Figure 6.75.
Minitab
There are built-in functions to compute (strict) cumulative probability and inverse cumulative probability. These functions
may be accessed through a graphical input window or by using the command language.
1. In a session window, use the function CDF to find P ( X , 10 ) , or select Calc; Probability Distributions; Exponential.
Choose Cumulative probability and enter the Scale 5 m 5 10 and Threshold 5 0. Select Input constant (x) and enter
10. See Figure 6.76. Click OK. The cumulative probability is displayed in the session window. See Figure 6.77.
3. Use the function CDF to find P ( X # 16 ) . Note that Minitab uses the mean to completely describe an exponential
random variable. Store the result in K1. Use the function CDF to find P ( X # 8 ) . Store the result in K2. Find the
difference, and print the result. See Figure 6.79.
Excel
The built-in function EXPON.DIST may be used to compute (strict) cumulative probability.
1. Use the function EXPON.DIST to find P ( X , 10 ) . The arguments for this function are x, l, and the logical value
True to compute cumulative probability. See Figure 6.80.
2. Use the function EXPON.DIST to find P ( X # 12 ) . Use the complement rule to find P ( X $ 12 ) . See Figure 6.80.
3. Use the function EXPON.DIST to find P ( X # 16 ) . Use the function EXPON.DIST to find P ( X # 8 ) . Find the
difference. See Figure 6.82.
Figure 6.80 P(X , 10): Use the Figure 6.81 P(X $ 12): Use the Figure 6.82 P(8 # X # 16).
function for cumulative function for cumulative probability
probability. and the complement rule.
d. If a subscriber spent less than $500 for personal or a. If a test occurred at 6:00 a.m., what is the probability that
vacation travel during the past 12 months, the circulation another one would occur before 6:00 p.m.?
department will no longer hand-deliver each issue. What b. If a test occurred at 10:00 p.m., what is the probability that
is the probability that a randomly selected subscriber will another one would not occur until after 6:00 a.m.?
no longer receive hand-delivered issues? c. If a test occurred at 9:00 a.m., what is the probability
that the next test would occur between 12:00 p.m. and
6.94 Fuel Consumption and Cars The purpose of an
1:00 p.m.?
automobile’s timing belt is to provide a connection between the
camshaft and the crankshaft. This allows the valves to open and
close in sync with the pistons. Suppose the duration of a timing Extended Applications
belt (in miles) can be modeled by an exponential random
variable with mean 100,000 miles. 6.98 Public Health and Nutrition The U.S. Public Health
a. Find the value of l. Service’s Advisory Committee on Immunization Practices
b. Find the probability that a randomly selected timing belt recommends a tetanus booster every 10 years. Suppose the
lasts for more than 120,000 miles. lifetime (in years) of a tetanus booster shot has an exponential
c. Find the probability that a randomly selected timing belt distribution with l 5 0.05.
lasts for between 75,000 and 125,000 miles. a. Find the probability that a randomly selected tetanus
d. TRAIL
SOLUTION If the timing belt on a new car breaks within 70,000 miles, booster shot is still protecting against tetanus after more
the dealer will install a new belt free of charge. What is the than 10 years.
probability the dealer will be forced to install a new timing b. Suppose a physician recommends a booster shot after a
belt free of charge on a randomly selected car? period when only 10% of all tetanus shots still have an
effect. How long should you wait before getting another
6.95 Business and Management Suppose the booster shot?
dentists at the Eastin Center for Modern Dentistry in c. Suppose two people independently receive tetanus shots.
Coeur D’Alene, Idaho, never see a patient on time. The amount What is the probability that both shots still have an effect
of time (in minutes) past the appointment time a random patient after more than five years?
must wait to see a dentist has an exponential distribution with
6.99 Medicine and Clinical Studies A patient’s blood
l 5 0.01. Suppose a patient is selected at random, and the
appointment time has passed. cholesterol level is often checked during a routine physical
a. What is the probability that the dentist will see the patient examination. A blood sample is taken (from the finger or arm)
within five minutes after the appointment time? Write a and tested for total cholesterol and HDL (high-density lipopro-
Solution Trail for this problem. tein, the “good cholesterol”) cholesterol levels in milligrams
b. Find the probability that the dentist will see the patient per deciliter (mg/dL). If total cholesterol is less than 200 mg/
between 10 and 20 minutes after the appointment time. dL, then no action is taken. If a patient’s total cholesterol is
c. Find the probability that the patient will have to wait at approximately 300 mg/dL, a new drug together with a strict diet
least 30 minutes past the appointment time. is prescribed to reduce this number to a safe level. The amount
of time (in days) it takes to reduce total cholesterol to a safe
6.96 Psychology and Human Behavior In Columbia, level has an exponential distribution with l 5 1/ 15. A patient
South Carolina, on a randomly selected Friday night between with total cholesterol of 300 mg/dL is randomly selected and
9:00 p.m. and 2:00 a.m., the time (in minutes) between calls to a placed on this drug-and-diet regimen.
911 dispatcher has an exponential distribution with l 5 0.125. a. Find the mean number of days until the total cholesterol
Suppose a Friday evening in Columbia is selected at random. level is safe (less than 200 mg/dL).
a. If a 911 call is taken at 10:07 p.m., what is the probability b. Find the probability that the total cholesterol level will be
that the next call will occur before 10:30 p.m.? safe in a number of days within two standard deviations
b. If a 911 call is taken at 11:30 p.m., what is the probability of the mean.
that the next call will occur after 11:45 p.m.? c. Find a value d such that 75% of all such patients have a
c. Find the mean, variance, and standard deviation for the safe cholesterol level with d days.
time between 911 calls. d. Suppose two patients are selected at random. Find the
d. Find a value t such that if a 911 call is received between probability that this drug-and-diet regimen works for at
9:00 p.m. and 2:00 a.m., then the probability that the next least one of the patients within 10 days.
911 call will occur more than t minutes later is 0.75.
6.100 Marketing and Consumer Behavior A toy manufac-
6.97 Public Policy and Political Science During the Cold turer routinely tests new toys in a controlled environment
War there were frequent radio tests of alert warnings. The before deciding whether to actually market a toy. Research has
regular program was interrupted and replaced by a long shown that the amount of time (in minutes) a randomly selected
high-pitched signal. An announcer would break in with a child plays with a new toy has an exponential distribution with
message similar to, “This has been a test of the emergency l 5 0.05. Suppose a new toy is presented for study.
broadcasting system.’’ Suppose the time (in hours) between a. What is the probability that a randomly selected child will
these tests had an exponential distribution with mean 24 hours. play with the toy for at most 10 minutes?
Chapter 6 Summary 289
b. What is the probability that a randomly selected child will b. What is the probability that the aroma lasts for at least
play with the toy for between 5 and 20 minutes? 40 minutes?
c. Suppose a child plays with the toy for at least 15 minutes. c. What is the probability that the aroma lasts for between
What is the probability that the child will play with the 30 and 50 minutes?
toy for another 20 minutes? d. Find a value t such that the probability that the aroma
d. If four different children each play independently with a lasts for at most t minutes is 0.90.
new toy for at least 25 minutes, then the toy is e. Suppose a second batch of cookies is taken out of the
immediately brought to market. What is the probability of oven 10 minutes after the first. What is the probability
this happening for a newly designed toy? that there will be no aroma from either batch 35 minutes
after the first batch was done?
6.101 Physical Sciences Four large water pumps supply
Bellingham, Washington, with water. If water pump 6.103 Psychology and Human Behavior Sensory memory in
i (i 5 1, 2, 3, 4) breaks down, then the time to repair it has an humans is very short-term and lasts approximately 200–500
exponential distribution with l 5 1/ ( 2i ) hours. Suppose the milliseconds after an individual perceives an object.23 Suppose
water pumps operate independently, and when breakdowns the length of time (in milliseconds) an object remains in sensory
occur, the repair times are also independent. memory for a randomly selected adult has an exponential
a. Suppose water pump 1 breaks down. What is the distribution with l 5 0.003. A flashing grid of letters is displayed
probability that it will take more than 30 minutes to repair? to a randomly selected adult and the sensory memory is measured.
b. Answer part (a) for water pumps 2, 3, and 4. a. What is the probability that the sensory memory will last
c. Suppose all four water pumps break simultaneously. What for at most 200 milliseconds?
is the probability that at least one of the four water pumps b. Find the median time the sensory memory will last.
will not be repaired within one hour? c. Find a time t such that only 5% of all adults’ sensory
memory lasts at least t milliseconds.
6.102 Public Health and Nutrition From the instant a
fresh-baked chocolate-chip cookie is taken out of the oven, the
Challenge
time (in minutes) the wonderful aroma lasts has an exponential
distribution with l 5 1/ 30. Suppose a chocolate-chip cookie is 6.104 Memoryless Property Suppose X is an exponential
done baking and is taken out of the oven. random variable with parameter l. If a and b are constants
a. Find the mean, variance, and standard deviation for the (.0), confirm the memoryless property by showing that
time the aroma lasts. P(X $ a) 5 P(X $ a 1 b 0 X $ b) .
Chapter 6 Summary
Concept Page Notation / Formula / Description
Probability distribution 244 A smooth curve (density curve) defined such that the probability X takes on a
for a continuous random value between a and b is the area under the curve between a and b.
variable
Uniform distribution 245 If X has a uniform distribution on the interval [a, b], then
1
if a # x # b a1b 2 (b 2 a)2
f (x) 5 • b 2 a m 5 ,s 5
2 12
0 otherwise
Normal distribution 256 If X is a normal random variable with mean m and variance s2, then the
probability density function is given by
1 ( )2 2
f (x) 5 e2 x2m /2s
s"2p
Standard normal 257 A normal distribution with m 5 0 and s2 5 1 is the standard normal
distribution distribution. The standard normal random variable is usually denoted by
Z: Z , N ( 0, 1 ) .
Standardization 261 If X is a normal random variable with mean m and variance s2, then
X2m
Z5
s
is a standard normal random variable.
290 C hapter 6 Continuous Probability Distributions
Normal probability plot 272 A scatterplot of each observation versus its corresponding expected value
from a Z distribution. For a normal distribution, the points will fall along a
straight line.
Exponential distribution 282 If X is an exponential random variable with parameter l, then the probability
density function is given by
le2lx if x $ 0 1 1
f (x) 5 e m 5 , s2 5 2
0 otherwise l l
Chapter 6 Exercises
c. Find the first and the third quartile times. c. Find a value w such that 80% of all male jockeys weigh
d. Suppose it takes seven minutes to make the reservation. Is more than w.
there any evidence to suggest the inn’s claim ( d. Suppose the jockey selected weighs 57 kg. Is there any
m 5 4 minutes) is false? Justify your answer. evidence to suggest the claimed mean (52 kg) is wrong?
Justify your answer.
6.111 Fuel Consumption and Cars The Fiat 500E is an
electric car that can travel approximately 80–100 miles on a full 6.115 Public Health and Nutrition The amount of caffeine
charge.26 The time it takes to fully recharge depends on the in a cup of coffee varies considerably, even if it is brewed by the
percent depleted, but is approximately normal with mean same person, using the same brewing method and ingredients.
2.2 hours and standard deviation 0.6 hour (on a 240-volt line). Suppose the amount of caffeine in an eight-ounce cup of
Suppose a random Fiat 500E is recharged. Maxwell House coffee is approximately normally distributed
a. What is the probability that the amount of time to fully with mean 120 mg and standard deviation 7.75 mg.27
recharge is between 1 and 2 hours? a. What is the probability that a randomly selected cup
b. What is the probability that the amount of time to fully of Maxwell House coffee will have less than 110 mg
recharge is more than 3 hours? of caffeine?
c. If the amount of time to fully recharge is less than 30 b. What is the probability that a randomly selected cup of
minutes, the car could still travel another 50 miles. What is Maxwell House coffee will have between 115 and 130 mg
the probability that the car could travel another 50 miles? of caffeine?
d. Suppose three Fiat 500E’s are selected at random. What is c. A recent article in a medical journal suggests that an
the probability that all three will have a time to fully eight-ounce cup of coffee with more than 140 mg of
recharge greater than 2.5 hours? caffeine could cause a person’s heart to race. What is the
probability that a randomly selected cup will have more
6.112 Manufacturing and Product Development Oriental than 140 mg of caffeine?
rugs are made from various wools and woven in several
different countries. The pile height (in millimeters) of an 6.116 Manufacturing and Product Development The
Oriental rug varies slightly and can be modeled by a uniform quality of a kitchen knife is often measured by the sharpness
random variable on the interval [6, 10]. and total lifetime of the blade. One test for sharpness involves
a. Carefully sketch a graph of the density function for pile mounting the knife with the blade vertical and lowering a
height. specially designed pack of paper onto the blade. The sharpness
b. What is the probability that a randomly selected Oriental is measured by the depth of the cut. A greater depth indicates a
rug will have pile height less than 7 mm? sharper knife. Suppose the depth of the cut for a randomly
c. What is the probability that a randomly selected Oriental selected knife has a normal distribution with mean 92 mm and
rug will have pile height between 8.5 and 9.5 mm? standard deviation 21 mm.
d. Find a value h such that 90% of all Oriental rugs have pile a. What is the probability that a randomly selected kitchen
height less than h. knife has a sharpness measure less than 75 mm?
b. A kitchen knife with a sharpness measure of at least 100
6.113 Technology and the Internet The life expectancy of mm qualifies as a steak knife. What proportion of kitchen
a laser-printer toner cartridge varies considerably according to knives are steak knives?
the toner and drum type, and how the cartridge is used. Printed c. The Kitchen Gadgets Association would like to set a
pages containing lots of graphics require more toner, while maximum sharpness for butter knives. Find a value c such
pages with mostly text require considerably less toner. Page that 15% of all knives have sharpness less than c.
coverage is usually measured as a proportion (or percentage). d. Suppose a randomly selected kitchen knife has sharpness
For example, a typical text page has approximately 0.05 cover- greater than 90 mm. What is the probability that it has
age. A random sample of printed pages from an office printer sharpness greater than 100 mm?
was obtained, and the page coverage was carefully measured. Is
6.117 Economics and Finance Some of the variables that
there any evidence to suggest the data are from a non-normal
population? TONER
affect the monthly payment of a new-car loan are the total
amount borrowed, the interest rate, and the length of the loan.
6.114 Sports and Leisure Jockeys in the United States and During the fourth quarter of 2012, the mean length of a new-car
England work very hard to keep their weight down. Many loan was 65 months, a record high according to Experian.28
participate in weight-loss programs, carefully monitor their Suppose the length of a new-car loan is approximately normal
diet, and exercise regularly. The weight of a male jockey is with standard deviation nine months.
approximately normal with mean 52 kg and standard deviation a. What is the probability that a new-car loan is for at most
1.2 kg. Suppose a male jockey is randomly selected. 48 months?
a. What is the probability that the jockey weighs more than b. What is the probability that a new-car loan length is
53 kg? between 50 and 70 months?
b. What is the probability that the jockey weighs between c. Find a symmetrical interval about the mean such that 95%
50 and 54 kg? of all new-car loan lengths fall in this interval.
292 C hapter 6 Continuous Probability Distributions
d. Suppose the amount borrowed on a new-car loan is also passenger vehicles is 15 minutes.31 Suppose the time to cross
approximately normal with mean $20,000 and standard the border at Alexandria Bay is a random variable, X, with
deviation $5000. If the length of the loan and the amount probability density function given by
borrowed are independent, what is the probability that
the loan will be for more than $27,000 and for less than 0.02 x if 0 # x # 10
f (x) 5 e
60 months? 0 otherwise
6.118 Fuel Consumption and Cars Even though automo- a. Carefully sketch a graph of the density function.
biles are becoming more fuel-efficient, the cost of gasoline is b. Find the probability that it takes less than five minutes for
still taking a huge chunk of our personal income. According to a randomly selected passenger vehicle to cross the border
the U.S. Energy Information Administration, the mean amount at Alexandria Bay.
spent on gasoline in 2012 was $2912.29 A random sample of c. Find the probability that it takes more than eight minutes
U.S. households was obtained, and each was asked for the for a randomly selected passenger vehicle to cross the
amount (in dollars) spent on gasoline during the last year. Is border at Alexandria Bay.
there any evidence to suggest that the data are from a non-normal d. Find the probability that it takes between two and six
distribution? GASCOST minutes for a randomly selected passenger vehicle to
cross the border at Alexandria Bay.
6.119 Physical Sciences Large reservoirs of oil found e. Suppose it takes less than two minutes for a randomly selected
underground are under very high pressure, which allows the oil passenger vehicle to cross the border at Alexandria Bay. What
to be pumped to the surface. All oil fields contain some water, is the probability that it takes less than one minute?
and as water is pumped back into a well to maintain high
pressure, the water content increases. Suppose the proportion 6.122 Public Health and Nutrition Meat or poultry
of water in a randomly selected barrel of oil pumped to the classified as lean has less than four grams of saturated fat.
surface has a normal distribution with mean 0.12 and standard Suppose the amount of saturated fat in a randomly selected
deviation 0.025. piece of lean meat or poultry is a random variable, X, with
a. What is the probability that a randomly selected barrel of probability density function given by
oil has a proportion of water less than 0.12?
b. What is the probability that a randomly selected barrel of 0.1 if 0 # x , 1
oil has a proportion of water between 0.15 and 0.17? 0.2 if 1 # x , 2
c. The higher the proportion of water in oil, the more f ( x ) 5 e 0.3 if 2 # x , 3
expensive it is to separate the oil from the water. If the 0.4 if 3 # x , 4
proportion of water is greater than 0.20, then the well is 0 otherwise
too expensive to operate and maintain. What is the
probability that a randomly selected well is too expensive? a. Carefully sketch a graph of the density function.
b. What is the probability that a randomly selected piece of lean
6.120 Travel and Transportation The Washington, DC, meat or poultry has less than 1.5 grams of saturated fat?
Metro has some long and deep escalators. The longest continu- c. What is the probability that a randomly selected piece of
ous escalator is at the Wheaton (Red Line) stop. It is 230 feet lean meat or poultry has more than three grams of
long and approximately 140 feet deep.30 For those who simply saturated fat?
ride the escalator (without extra steps), it takes just over 2.5 d. What is the probability that a randomly selected piece of
minutes to complete the journey. However, because many lean meat or poultry has between two and four grams of
people walk while on the escalator, the amount of time to travel saturated fat?
the 230 feet is approximately normal with mean 1.9 minutes e. Suppose a randomly selected piece of lean meat or poultry
and standard deviation 0.3 minute. Suppose an escalator rider is has at most three grams of saturated fat. What is the
selected at random. probability that it has at most one gram of saturated fat?
a. What is the probability that it takes less than 2 minutes to
ride the escalator? 6.123 Manufacturing and Product Development
b. What is the probability that it takes between 1.5 and 2.5 Four people working independently on an assembly line all
minutes to ride the escalator? perform the same task. The time (in minutes) to complete
c. Suppose that five escalator riders are selected at random. this task for person i (i 5 1, 2, 3, 4) has a uniform distribution
What is the probability that three of the five take at least on the interval [0, i]. Suppose each person begins the task at
2 minutes and 20 seconds to ride the escalator? the same time.
a. What is the probability that person 2 takes less than
90 seconds to complete the task?
EXTENDED APPLICATIONS
b. What is the mean completion time for each person?
6.121 Travel and Transportation The U.S. Customs and c. What is the probability that all four people complete the
Border Protection Agency maintains a table of the estimated task in less than 30 seconds?
wait times for reaching the primary inspection booth when d. What is the probability that exactly one person completes
crossing the Canada/U.S. border. The processing goal for the task in less than one minute?
Chapter 6 Exercises 293
6.124 Probability Calculations Using a Density Function b. If the amount of protein in a cup of yogurt is between 14
The probability density function for a random variable X is and 17 grams, the manufacturing process is consider to be
given by in control. What is the probability that the process is
considered in control? Out of control?
1 1 c. Suppose the cup of yogurt has less than 16 grams of protein.
2 x1 if 0 # x , 2
4 2 What is the probability that it has more than 15 grams?
f (x) 5 e 1 d. Suppose that six cups of yogurt are selected at random.
2 x11 if 2 # x , 4 What is the probability that at least three cups have at
4
0 otherwise least 16.5 grams of protein?
Looking Back
n Recall that parameters such as m, s2 , and p completely characterize, or describe, a
population.
n These parameters are constant and usually unknown, but we would like to estimate these
values, or draw a conclusion about these parameters.
Looking Forward
=
n Understand and utilize methods to find a sampling distribution (for X and for P ) and use
it in statistical inference.
n Discover the central limit theorem, and solve probability and inference problems using
this result.
Contents
7.1 Statistics, Parameters, and Sampling Distributions
7.2 The Sampling Distribution of the Sample Mean and the Central Limit Theorem
7.3 The Distribution of the Sample Proportion
© Kristoffer Tripplaar/Alamy
295
296 C h apte r 7 Sampling Distributions
Definition
A parameter is a numerical descriptive measure of a population.
A statistic is any quantity computed from values in a sample.
A Closer L ok
1. A parameter is a population quantity. It is used to describe some characteristic of a
population. Usually we cannot measure a parameter; it is an unknown constant that we
would like to estimate.
2. A statistic is any sample quantity. There are infinitely many quantities one could com-
pute using the data in a sample. For example, x and ~x are statistics, as is the sum of the
smallest and the largest values divided by 2.
The difference between a parameter and a statistic is key: Parameters describe popula-
tions, and statistics describe samples. We use statistics to make inferences about parameters.
Therefore, the properties of a statistic are important.
Solution
a. 52% is a statistic. This number describes a characteristic of a sample of Republicans.
b. 0.32 is a parameter. This number describes a characteristic of the entire population of
women.
c. 42 is a parameter. This number is a characteristic of all the highway bridges in the
United States.
d. 5.6 is a statistic. This number describes a characteristic of the sample of 20 guests.
e. 7.2 is a statistic. This number describes a characteristic of the sample of individuals
who responded to the survey.
Suppose the mean is computed for a sample of size n from a population of interest.
Denote this value x1. In a second sample of size n from the same population, let x2 be the
7.1 Statistics, Parameters, and Sampling Distributions 297
mean. It is reasonable to expect x1 and x2 to be close to each other but not equal. The
important realization is that x1 will be different from x2. In fact, the sample mean will dif-
fer from sample to sample. This statistic is subject to sampling variability. Therefore, the
sample mean, X , is a random variable, and X has a mean, a variance, a standard devia-
tion, and a probability distribution. This distribution is called a sampling distribution.
Statistics are random variables! Any statistic is a random variable, because the value of the statistic differs from
sample to sample. One cannot predict the value of a statistic with absolute certainty.
To make a reliable inference based on a specific statistic, we need to know the proper-
ties of the distribution of the statistic.
Definition
The sampling distribution of a statistic is the probability distribution of the statistic.
A Closer L ok
1. A sampling distribution (like any random variable) describes the long-run behavior of
the statistic.
2. We can use either of two techniques to obtain, or find, a sampling distribution.
a. Recall that to approximate the distribution of a population, we construct a histo-
gram (or stem-and-leaf plot) using values from the population. If the sample is
representative, then the histogram should be similar in shape, center, and spread to
the population distribution.
Similarly, to approximate the distribution of a statistic, we obtain (many) values of
the statistic and construct a histogram. The resulting graph approximates the sam-
pling distribution of the statistic.
For example, to approximate the sampling distribution of the mean of a sample of
size n 5 10 from a population: (i) obtain several samples of size 10 from the popu-
lation; (ii) compute the sample mean for each sample; (iii) construct a histogram
using all the sample means. The histogram approximates the sampling distribution
of the sample mean.
b. In some cases, the exact sampling distribution of a statistic can be obtained. If the
statistic is a discrete random variable, the sampling distribution includes all the
values the statistic assumes and the associated probabilities. If the statistic is a
continuous random variable, the sampling distribution consists of a probability
density curve.
Solution
5
The order of selection does not Step 1 There are a b 5 10 ways to select three koalas from the five in the population.
3
matter here. Therefore, this is a
List all the possible samples, the computed value of the statistic for each sample,
combination.
and the probability of selecting each sample. Use this table to construct the sam-
pling distribution.
298 C h apte r 7 Sampling Distributions
The median for each sample is the middle value. The probability of each sample
is 1/10 5 0.1, because we assume that each sample is equally likely.
Step 3 Of all the possible samples, only three values are possible for the sample median
in this case. Sum the probabilities associated with each value. The probability
~ ~
distribution for the random variable X lists the values (of X ) and the associated
probabilities.
~x 14.5 17.1 17.6
p ( ~x ) 0.3 0.4 0.3
A Closer L ok
~
1. In Example 7.2, X is a discrete random variable. Using Equations 5.1, 5.3, and 5.4, the
Challenge: Find the sampling ~
~ mean, variance, and standard deviation for X are m 5 16.47, s2 5 1.71, and s 5 1.31.
distribution for X if sampling
is done with replacement. 2. The koalas were selected without replacement. Suppose three bears are selected with
Hint: There are 5 3 5 3 5 5 125 replacement. That is, select a stuffed animal, record its weight, and place it back on the
possible samples. dresser. The same koala could be selected two or three times. This sampling scheme
~
changes the probability distribution for X .
Solution
Step 1 A sample consists of two numbers: The first represents the sick days for employee
1, and the second denotes the sick days for employee 2. There are 16 possible
samples. Using the multiplication rule, there are two slots to fill: 4 3 4 5 16.
List all the possible samples, the computed value of the statistic for each sample,
and the probability of selecting each sample.
Step 2 The probability associated with each sample is computed using the indepen-
Anyra/Dreamstime.com dence assumption. For example, the probability the first employee used one sick
day and the second employee used two sick days is given by
The maximum for each sample is the largest of the two values.
Remember, a capital letter, such Step 4 There are four possible values for the discrete random variable M. Sum the prob-
as M, represents a random abilities associated with each value. The probability distribution is given in the
variable. The corresponding table below.
lowercase letter m denotes a
m 0 1 2 3
specific value the random variable
M can assume. p(m) 0.1600 0.4025 0.3400 0.0975
A Closer L ok
1. Suppose the population is finite of size N and the sample is of size n. The number of
N
possible simple random samples is a b.
n
2. A random sample consists of individuals or objects, and a variable is a characteristic
of an individual or object. A value of the variable is obtained for each member of the
random sample. We often refer to the values of the variable as the random sample
rather than the individuals. For example, consider a study in which the amount of the
trace element chromium is measured in coal from around the world. The random
300 C h a p t e r 7 Sampling Distributions
sample consists of pieces of coal (objects) and the values are the chromium measure-
ments. However, it is common practice to say, “Consider the random sample of chro-
mium measurements.”
3. Unless stated otherwise, all data presented in this text are obtained from a simple ran-
dom sample.
VIDEO Tech
Video TECH Manuals
MANUALS Find an approximate sampling distribution for the sample mean of five observations from
EXEL DISCRIPTIVE
Sampling from this population.
a data set
Solution
20
Step 1 There are a b 5 15,504 possible samples of size 5. Instead of considering
5
every one of these samples, select some (say, 100) samples of size 5, compute
the mean for each sample, and construct a histogram of the sample means.
Step 2 The table below lists the first few samples of size 5 and the mean for each sample.
Sample x Sample x
25 13 18 10 12 15.6 24 3 13 20 10 14.0
3 10 13 12 11 9.8 4 24 10 9 19 13.2
19 9 10 11 16 13.0 20 16 26 24 15 20.2
( ( ( (
Step 3 Figure 7.1 shows a histogram of the sample means for 100 samples of size 5.
Step 4 There are some very interesting, curious results here. Even though the population is
not normally distributed (the population is finite; and consider a graph of the 20
cattle feeds in Figure 7.2), the shape of the sampling distribution appears to be
approximately normal! In addition, the center of the sampling distribution of the
mean is approximately the population mean ( m 5 15.8 ) . Although the relationship
between the original population parameters and the sampling distribution parame-
ters is not clear, there is certainly less variability in the sampling distribution than in
the population distribution ( s 5 8.11 ) . These three observations suggest the exact
distribution of the sample mean, as discussed in the next section.
8
25
20 6
Frequency
Frequency
15
4
10
2
5
6 8 10 12 14 16 18 20 22 24 26 5 10 15 20 25 30 35
Mean Percentage
and quality of the paint, and the applicator. A local hardware 7.17 Education and Child Development The number of
store stocks 30 different paints. paint copies sold (in millions) for the all-time best-selling children’s
a. Use a computer or calculator to draw 50 random samples books are given in the table below. books
of size 5 without replacement, and compute the mean for
each sample. 9.9 9.7 7.1 7.0 6.8
b. Construct a histogram of the sample means.
c. Use the histogram to approximate the sampling
Suppose two of these books are selected at random without
distribution of the mean. What is the approximate shape
replacement.
of the distribution? What is the approximate value of the
a. Find the sampling distribution of the sample mean.
mean of the sample mean?
b. Find the sampling distribution of the total number of
d. Find the population mean. How does this compare with
books sold.
the approximate mean of the sampling distribution?
7.14 Demographics and Population Statistics For plan- Extended Applications
ning purposes, U.S. Bancorp has determined the probability
distribution for the retirement age, X, of employees in the mort- 7.18 Fuel Consumption and Cars An automobile manu-
gage work group. The probability distribution for X is given in facturer lists several specifications for every one of its cars.
the table below. RETIRE For example, the length, width, wheelbase, turning circle,
curb weight, and interior room measurements are readily
x 64 65 66 available to customers. The acceleration time from 0 to 60
p(x) 0.1 0.7 0.2 miles per hour (in seconds) for 10 cars is given in the table
below.6 cars
a. Find the mean of X.
b. Suppose two employees from this work group are selected
at random. Find the exact probability distribution for the 2.5 2.8 3.0 3.5 3.4 3.7 3.9 4.1 4.7 5.1
sample mean, X .
c. Find the mean of X . How does this compare with your a. Use a computer or calculator to draw 30 random samples
answer in part (a)? of size 3 without replacement, and compute the standard
deviation for each sample.
7.15 Public Health and Nutrition The U.S. Food & Drug b. Construct a histogram of the sample standard deviations.
Administration regulates the use of certain terms used on food c. Use the histogram to describe the shape of the sampling
labels—for example, reduced fat, low fat, and light. Lean meat distribution.
must contain less than 4.5 grams of saturated fat per serving.4 d. Compute the population standard deviation. Find an
Suppose the probability distribution for the amount of saturated approximate mean of the sampling distribution. How
fat in a randomly selected serving of lean meat is given in the do these two numbers compare?
table below. USFDA
7.19 Travel and Transportation American Airlines offers
x 0 1 2 3 4 limited first-class seating to passengers on a flight from Newark
p(x) 0.50 0.25 0.10 0.10 0.05 to Los Angeles. The probability distribution for the number of
passengers in first class on a randomly selected flight is a ran-
a. Suppose two servings of lean meat are selected at dom variable, X. The probability distribution for X is given in
random. Find the exact probability distribution for the the table below. FLIGHT
~
sample median amount of saturated fat, X .
~
b. Find the mean, variance, and standard deviation of X . x 5 6 7 8
7.16 Biology and Environmental Science In April 2013, p(x) 0.50 0.30 0.15 0.05
Eddie (“Flathead Ed”) Wilcoxson caught a flathead catfish that
weighed 76.52 pounds on Bartlett Lake in Arizona.5 The cat- a. Find the variance of X.
fish was 53.5 inches long and its girth measured 34.75 inches. b. Suppose two American Airlines cross-country flights are
Suppose five catfish were caught on the same lake with randomly selected. Find the exact probability distribution
weights (in pounds) given in the following table. CATFISH for the sample variance, S 2.
c. Find the mean of S 2. How does this compare with your
11 15 32 21 27 answer in part (a)?
7.20 Sports and Leisure The times (in minutes) for
A random sample of three of these catfish is selected without three men in the 3M Half Marathon are given in the
replacement. following table.7 RUNTIME
a. Find the sampling distribution of the sample mean X .
b. Find the sampling distribution of the total length for all
three catfish, T. 66.08 68.58 77.10
7.1 Statistics, Parameters, and Sampling Distributions 303
Suppose a random sample of two of these times is selected c. Use technology to select 100 random samples of size 10
with replacement. from this population. Find the sample mean for each sample.
a. Find the sampling distribution of the minimum time. d. Construct a histogram of the 100 sample means. Where is
b. Find the sampling distribution of the total time. the histogram centered? How does this value compare to m?
e. Find the standard deviation for the 100 sample means.
7.21 Economics and Finance The Kentucky Derby is
How does this value compare to s?
run every May at Churchill Downs. This extravagant affair
features mint juleps and unusual woman’s hats. The jockeys
Challenge
with the most Kentucky Derby mounts are given in the fol-
lowing table.8 DERBY 7.25 Which Estimator Is Better? Each of the following
graphs shows the probability distribution for two statistics that
Jockey Mounts
could be used to estimate a parameter u (describing an underly-
Bill Shoemaker 26 ing population). Select the statistic that would be a better esti-
Pat Day 22 mator of u and justify your answer.
Eddie Arcaro 21 a.
Laffit Pincay Jr. 21
S2
Suppose a random sample of two of these jockeys is selected
at random without replacement.
a. Find the sampling distribution for the maximum number
of mounts. S1
b. Find the sampling distribution for the total number of
mounts.
7.22 Psychology and Human Behavior Five people own
and operate the Victrola Coffee Shop in Seattle. Each person is b.
married, and the number of years each has been married is
given below. COFFEE
S1 S2
5 3 7 2 12
Solution
Step 1 Consider a random sample of n observations selected with replacement. For
n 5 5, five numbers are selected at random from the population, and the sam-
ple mean is computed. This procedure is repeated 500 times. A histogram of the
resulting 500 sample means is shown in Figure 7.3. For n 5 10, a similar pro-
cedure is followed. The resulting histogram of the sample means is shown in
Figure 7.4.
90 140
80 120
70 100
Frequency
Frequency
60
50 80
40 60
30 40
20
10 20
0
0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20
Sample mean Sample mean
Figure 7.3 Histogram of sample means for Figure 7.4 Histogram of sample means for
n 5 5. n 5 10.
Step 2 These histograms suggest some very surprising results. Each distribution appears
to be centered near the population mean, 10.5. In addition, the shape of each
The original population is certainly distribution is approximately normal! Also notice that the variability of the sam-
not normally distributed; it is pling distribution decreases as n increases. The sampling distribution for X for
finite, and a histogram is a n 5 10 is more compact than that for n 5 5.
horizontal line. It is amazing and Step 3 This example implies that the distribution of the sample mean is approximately
perhaps counterintuitive that X normal, with mean equal to the underlying, original, population mean and vari-
has a normal distribution. ance related to the sample size, n.
In the next example, the exact sampling distribution of the mean is obtained.
7.2 The Sampling Distribution of the Sample Mean and the Central Limit Theorem 305
Solution
a. A sample consists of three numbers: The first represents the number of sources for
student 1, the second for student 2, and the third for student 3. By the multiplication
rule, there are 3 3 3 3 3 5 27 possible samples. For each sample, list the sample
mean and the probability of selecting that sample.
The probability associated with each sample is computed using independence. For
example, the probability of observing the sample 2, 2, 4 is
P(X 5 2 d X 5 2 d X 5 4) Intersection of three events.
( ) # (
5P X52 P X52 P X54 ) # ( ) Independence.
The probability distribution for X is given in the table below. It lists all the values X
can assume and the corresponding probabilities.
x 2 7/3 8/3 3 10 / 3 11 / 3 4
p(x) 0.125 0.225 0.285 0.207 0.114 0.036 0.008
306 C h apte r 7 Sampling Distributions
Here is one more approach to illustrate the very important connections between the
distribution of the sample mean and the distribution of the original population.
Consider three original, or underlying, distributions: (1) the standard normal distribu-
tion, a normal distribution with mean 0 and standard deviation 1; (2) a uniform distribu-
tion with parameters a 5 0 and b 5 1; and (3) an exponential distribution with parameter
l 5 0.5. The probability density function and the mean for each distribution are shown
in the first row of Figure 7.5.
Consider the following process for each distribution. Select 500 samples of size n 5 2
and compute the mean for each sample. Construct a histogram of the sample means. The
resulting smoothed histogram is shown in the second row of Figure 7.5. Repeat this pro-
cedure for n 5 5, 10, and 20, for each underlying distribution. These smoothed histo-
grams are also given in Figure 7.5.
Notice the following incredible patterns.
1. If the underlying population is normal, the distribution of the sample mean appears to
be normal, regardless of the sample size. See Figure 7.5(a).
2. Even if the underlying population is not normal, the distribution of the sample mean
becomes more normal as n increases. See Figures 7.5(b) and 7.5(c).
3. The sampling distribution of the mean is centered at the mean of the underlying popu-
lation. See Figures 7.5(a), 7.5(b), and 7.5(c).
4. As the sample size, n, increases, the variance of the distribution of the sample mean
decreases. See Figures 7.5(a), 7.5(b), and 7.5(c).
The previous examples and observations lead to the following properties concerning
the sample mean, X .
In symbols: mX 5 m.
2. The variance of X is equal to the variance of the underlying population divided by the
sample size.
s2
In symbols: sX2 5 .
n
7.2 The Sampling Distribution of the Sample Mean and the Central Limit Theorem 307
s2 s
The standard deviation of X is sX 5 5 .
Ån !n
3. If the underlying population is distributed normally, then the distribution of X is also
exactly normal for any sample size.
In symbols: X , N ( m, s2 / n )
n 2 f (x) f (x ) f (x )
0.8
1.5 5
4 0.6
1.0
3 0.4
2
0.5 0.2
1
0.0 0 0.0
3 2 1 0 1 2 3x 0.00 0.25 0.50 0.75 1.00 x 0 1 2 3 4 x
n 5 f (x) f (x ) f (x )
0.8
1.5 5
4 0.6
1.0
3 0.4
0.5 2
0.2
1
0.0 0 0.0
3 2 1 0 1 2 3x 0.00 0.25 0.50 0.75 1.00 x 0 1 2 3 4 x
n 10 f (x) f (x )
f (x ) 0.8
1.5 5
4 0.6
1.0
3 0.4
0.5 2
0.2
1
0.0 0 0.0
3 2 1 0 1 2 3x 0.00 0.25 0.50 0.75 1.00 x 0 1 2 3 4 x
n 20 f (x) f (x ) f (x )
0.8
1.5 5
4 0.6
1.0
3 0.4
2
0.5 0.2
1
0.0 0 0.0
3 2 1 0 1 2 3x 0.00 0.25 0.50 0.75 1.00 x 0 1 2 3 4 x
(a) (b) (c)
Figure 7.5 Three different original, or underlying, populations, and approximations to the
distribution of the sample mean for various sample sizes n.
308 C h apte r 7 Sampling Distributions
The mean of the sampling distribution of X is the same as the mean of the underlying
distribution that we want to estimate, so the sample mean is an unbiased estimator of the
population mean m.
Even if the underlying population is not normal, the previous examples suggest the
distribution of X becomes more normal as the sample size increases. This amazing result
is the most important idea in all of statistics, the central limit theorem.
Statistical Applet The central limit theorem is really a remarkable result. No matter what the shape of the
The central limit underlying population (skewed, bimodal, lots of variability, etc.), the CLT says the distri-
theorem bution of X is approximately normal, as long as n is large enough!
A Closer L ok
1. A better name for this result might be the normal convergence theorem. The distri-
bution of X converges, or gets closer and closer, to a normal distribution.
2. If the original population is normally distributed, then the distribution of X is normal,
no matter how large or small the sample size (n) is.
3. If the original population is not normal, the central limit theorem says the distribution
of X approaches a normal distribution as n increases, and the approximation improves
as n increases: The approximation gets better and better as the sample size n gets big-
ger and bigger.
There is no magical threshold value for n. However, in most cases, if n $ 30, then the
approximation is pretty good. Even for severely skewed populations, as long as n is at
least 30, the approximation is reasonable.
There is a tendency to think that if n 5 29 the distribution of X will not be approxi-
mately normal, but if n 5 31 it will be. This is simply not true. The CLT should be
interpreted to mean that the distribution of X approaches a normal distribution, looks
more and more like a normal distribution, as n increases. In some cases, for n as little
as 5 the approximation will be excellent. In others, n might have to be at least 26 before
the approximation is good.
4. To compute a probability involving the sample mean, we treat X just like any other
normal random variable: Standardize and use cumulative probability where appropri-
ate. We use the same method even if X is only approximately normal.
5. The expression for the variance of X mathematically confirms our observations regard-
ing the variability of the sampling distribution. As n increases, the distribution of the
SOLUTION TRAIL
7.2 The Sampling Distribution of the Sample Mean and the Central Limit Theorem 309
Solution Trail 7.7 sample mean becomes more compact. sX2 5 s2 /n and s2 is constant. This fraction
becomes smaller as n (the denominator) increases.
K e y w ords
6. A more general version of the CLT includes a statement about the sum of independent
n Normally distributed observations, T. If n is sufficiently large, the distribution of T approaches a normal
n Mean 50, standard deviation distribution with mean nm and variance ns2.
4, 25 trips
In symbols: T , N ( nm, ns2 ) .
d
n Sample mean
Tr a nsl at i o n
n Normal An even more general version of the CLT concludes, essentially, that any statistic that
n m 5 50, s2 5 16, n 5 25 is a sum or a mean tends toward a normal distribution as n increases. This is useful in
n X many inference problems because the appropriate statistic is often a sum or a mean. It is
easy to compute probabilities associated with a normal statistic (random variable). There-
C onc e p ts fore, the likelihood, or tail probability, associated with an observed value of the statistic is
n Properties of X a straightforward calculation.
V isi on
In addition, the central limit theorem helps to explain why so many real-world distri-
butions are approximately normal. Almost any statistic can be decomposed into a sum of
The underlying distribution is
normal, so the distribution of X
other variables. For example, the height of a tomato plant after six weeks might be
is exactly normal. Find the mean, directly related to, or is the sum of, the effects of a large number of independent variables
variance, and standard deviation including the amount of water, fertilizer, and sunlight it has received. Therefore, the
of X , translate each question into distribution of the height should be approximately normal. This single theorem explains
a probability statement, empirical evidence that suggests almost every measurement distribution is approxi-
standardize, and use cumulative mately normal.
probability where appropriate. The following examples illustrate the properties of X and the central limit theorem.
Solution
a. The underlying distribution is normal with m 5 50 and s2 5 16. The sample size is
n 5 25. The sample mean is (exactly) normally distributed:
2.5 0 z
X , N ( m, s2 /n ) 5 N ( 50, 16/25 ) ; sX 5 !16/25 5 4/5 5 0.80.
To solve part (a), begin with a probability statement involving the random variable X .
We need the probability that X will be less than 48 minutes.
X 2 50 48 2 50
P (X , 48 ) 5 Pa , b Standardize.
0.80 0.80
5 P ( Z , 22.5 ) Equation 6.8; simplify.
The probability that the sample mean time for the 25 trips will be less than 48 minutes is
Figure 7.6 P(X , 48). 0.0062. Figure 7.6 shows a technology solution.
310 C h apte r 7 Sampling Distributions
Remember, one point in a b. Within 1 minute of the population mean means 49 # X # 51. Write the probability
continuous world (Z) contributes statement, and standardize again.
no probability:
P ( 49 # X # 51 )
P(Z , 21.25) 5 P(Z # 21.25).
49 2 50 X 2 50 51 2 50
5 Pa # # b Standardize.
0.8 0.8 0.8
5 P ( 21.25 # Z # 1.25 ) Equation 6.8; simplify.
5 0.7888
Standardization illustrated:
P(49 X 51)
f (x) P(1.25 Z 1.25)
f (z)
49 50 51 x 1.25 0 1.25 z
SOLUTION TRAIL
The probability that the sample mean time for the 25 trips will be between 49 and 51
Figure 7.7 P(49 # X # 51). minutes is 0.7888. Figure 7.7 shows a technology solution.
5 0.0228
7.2 The Sampling Distribution of the Sample Mean and the Central Limit Theorem 311
The probability that the sample mean amount of milk picked up by the 36 trucks will
be greater than 7800 is 0.0228.
Standardization illustrated:
P(X 7800) P(Z 2)
f (x ) f (z)
7750 7800 x 0 2 z
b. To solve part (b), translate the question into a probability statement. Convert the
expression into cumulative probability involving Z. Because the probability is
already given, this is an inverse cumulative probability problem. Work backward in
Table III.
X 2 7750 m 2 7750
P (X , m ) 5 Pa , b Standardize.
25 25
m 2 7750
5 PaZ , b 5 0.1 Equation 6.8.
25
There is no further simplification within the probability statement. However, the result-
ing probability statement involves Z and is cumulative probability. Find a value in the
body of Table III, as close to 0.1 as possible. Set the corresponding z value equal to
m 2 7750
a b , and solve for m.
25
m 2 7750
5 21.28 Table III in the Appendix.
25
m 2 7750 5 232 Multiply both sides by 25.
The probability that the sample mean will be less than m 5 7718 is (approximately) 0.1.
Standardization illustrated:
m 7750 x 1.28 0 z
Solution
Solution Trail 7.9 Step 1 Assume the underlying distribution has m 5 26 and s 5 9. We do not know the
shape of the underlying distribution of customer waiting times, but the sample
Ke ywor ds size is large: n 5 32 ($30). By the central limit theorem, the distribution of X
n Is there any evidence? is approximately normal.
Transl at i on Step 2 Claim: m 5 26 S X , N ( 26, 92 / 32 ) ,
d
sX 5 9/ !32.
n Statistical inference
Experiment: x 5 28.0.
Concepts Likelihood: We are looking for evidence that the mean customer waiting time is
n Central limit theorem greater than 26 seconds, so find the right-tail probability (to be conservative).
Vi sion X 2 15 28.0 2 26.0
This inference question concerns P (X $ 28.0 ) 5 Pa $ b Standardize.
5/ !40 9/ !32
a population mean. Use the CLT
and the distribution of X to 5 P ( Z $ 1.26 ) Equation 6.8; simplify.
f (x)
22 24 26 28 30 x
Figure 7.9 The distribution of X . The shaded region Figure 7.10 Use calculator
represents the right-tail probability P(X $ 28). variables to make the input easier.
Technology Corner
Procedure: Solve probability questions involving the sample mean X .
Reconsider: Example 7.7, solution, and interpretations.
Note: In these probability questions X is either normally distributed or approximately normally distributed. In either case,
the technology procedures are similar to those described in Section 6.2.
CrunchIt!
CrunchIt! has a built-in function to compute probabilities associated with a normal random variable. Select Distribution
Calculator; Normal. To find cumulative probability or right-tail probability, select the Probability tab. Choose an appropri-
ate inequality symbol and enter a value for the endpoint. To solve an inverse cumulative probability problem, select the
Quantile tab and enter a cumulative probability.
1. Select Distribution Calculator; Normal. Enter the mean, 50, and the standard deviation, 0.8. Under the Probability tab,
select # and enter 48 for the endpoint. Click the Calculate Tab. See Figure 7.11.
2. To find the probability that X takes on a value in an interval, use cumulative probability.
P ( 49 # X # 51 ) 5 P (X # 51 ) 2 P (X # 49 ) .
TI-84 Plus C
The built-in function normalcdf is used to find (calculate) cumulative probability: the probability that X takes on a value
between a and b. This function takes four arguments: a (lower), b (upper), m, and s. The built-in function invNorm takes
three arguments: p (area), m, and s. This function returns a value x such that P ( X # x ) 5 p. The default values for m and
s are 0 and 1, respectively.
1. Select DISTR ; DISTR; normalcdf. Enter the left endpoint (lower), 21E99 (negative calculator infinity), the
right endpoint (upper), 48, the mean ( m ) , 50, and the standard deviation ( s ) , 0.8. Highlight Paste and tap
ENTER . Refer to Figure 7.6.
2. Select DISTR ; DISTR; normalcdf. Enter the left endpoint (lower), 49, the right endpoint (upper), 51, the
mean ( m ) , 50, and the standard deviation ( s ) , 0.8. Highlight Paste and tap ENTER . Refer to Figure 7.7.
314 C h apte r 7 Sampling Distributions
Minitab
There are several built-in functions to compute cumulative probability, tail probability, and inverse cumulative probability.
These functions may be accessed through a graphical input window or by using the command language.
1. In a session window, use the function CDF to find P (X , 48 ) . See Figure 7.12.
2. Select Graph; Probability Distribution Plot; View Probability. In the Distribution menu, select Normal. Enter the Mean
and Standard deviation. Under the Shaded Area tab, choose X Value and Middle. Enter the X value 1 (49) and X value 2
(51). Click OK. Minitab displays a distribution plot with the shaded area corresponding to probability. See Figure 7.13.
Figure 7.12 P(X , 48) using the Figure 7.13 P(49 # X # 51) using
command language. Probability Distribution Plot.
Excel
There are built-in functions to compute cumulative probability associated with a standard normal random variable
(NORM.S.DIST) and a normal random variable with arbitrary mean and standard deviation (NORM.DIST). The functions
NORM.S.INV and NORM.INV are the corresponding inverse cumulative probability functions.
1. Use the function NORM.DIST to find P (X , 48 ) . See Figure 7.14.
2. Use the function NORM.DIST to find P (X # 49 ) and P (X # 51 ) . Compute the difference to find P ( 49 # X # 51 ) .
See Figure 7.15.
7.26 True/False The sample mean varies from sample to 7.29 True/False If the underlying population is not normal,
sample. the central limit theorem says the distribution of X approaches
a normal distribution as n increases.
7.27 Fill in the Blank Let X be the mean of observations in
7.30 True/False As n increases, the distribution of the sum
a random sample of size n from a population with mean m and
of the observations approaches a normal distribution.
variance s2.
a. The mean of X is . 7.31 True/False As n increases, the variance of X also
b. The variance of X is . increases.
7.2 The Sampling Distribution of the Sample Mean and the Central Limit Theorem 315
Note: To find the distribution of X , or any random variable, 7.37 The figure below shows the graphs of the probability den-
you need to describe the distribution, by name if possible, and sity functions for the random variable X, the random variable X
provide the values of the parameters that characterize the for n 5 5, and the random variable X for n 5 15.
distribution. For example, X is normally distributed with mean
m 5 27 and variance s2X 5 0.35, or X , N ( 27, 0.35 ) , com-
pletely describes the distribution of X .
Practice
7.32 Consider a normally distributed population with mean
m 5 10 and standard deviation s 5 2.5. Suppose a random sam-
ple of size n is selected from this population. Find the distribution
of X and the indicated probability in each of the following cases.
a. n 5 7, P (X # 9 ) .
b. n 5 12, P (X . 11.5 ) .
c. n 5 15, P ( 9.5 # X # 10.5 ) .
d. n 5 25, P (X $ 10.25 ) . 30 40 50 60 70
e. n 5 100, P (X # 9.8 c X $ 10.2 ) .
Identify each probability density function.
7.33 Suppose X is a normal random variable with mean
m 5 17.5 and standard deviation s 5 6. A random sample 7.38. The figure below shows graphs of the probability density
of size n 5 24 is selected from this population. function for the random variable X and the approximate density
a. Find the distribution of X . functions for the random variable X for n 5 5 and the random
b. Carefully sketch a graph of the probability density variable X for n 5 15.
functions for X and X on the same coordinate axes.
c. Find P ( X # 14 ) and P (X , 14 ) .
d. Find P ( 15 , X , 19 ) and P ( 15 , X , 19 ) .
d. A certain health spa claims its clients have BMI between b. How can this probability in part (a) be so small when 11.9
23.5 and 24. Suppose eight health spa clients are selected is so close to the population mean m 5 12?
at random. What is the probability that the sample mean c. From your answer to part (a), is there any evidence to
will be in this interval? suggest that the mean weight of bags of potato chips is
less than 12 ounces?
7.40 Public Policy and Political Science In certain
h urricane-prone areas of the United States, concrete columns 7.44 Psychology and Human Behavior In attempting to
used in construction must meet specific building codes. The flee from police, criminals often barricade themselves inside a
minimum diameter for a cylindrical column is 8 inches. building and create a police standoff. The criminal is usually
Suppose the mean diameter for all columns is 8.25 inches with armed with a dangerous weapon and may hold hostages. Sup-
standard deviation 0.1 inch. A building inspector randomly pose the mean length of a police standoff, ending with some
selects 35 columns and measures the diameter of each. sort of resolution, is 6.5 hours with standard deviation 4 hours.
a. Find the approximate distribution of X . Carefully sketch A random sample of 35 police standoffs is selected.
a graph of the probability density function. a. Find the distribution of the sample mean.
b. What is the probability that the sample mean diameter for b. What is the probability that the sample mean for the 35
the 35 columns will be greater than 8 inches? police standoffs will be greater than 7 hours?
c. What is the probability that the sample mean diameter for c. A new psychological technique was used in negotiations
the 35 columns will be between 8.2 and 8.4 inches? with the criminals involved in the 35 police standoffs.
d. Suppose the standard deviation is 0.15 inch. Answer parts Suppose the sample mean for the 35 police standoffs is
(a), (b), and (c) using this value of s. x 5 5.1 hours. Is there any evidence to suggest that the
mean police standoff time is lower when this new
7.41 Manufacturing and Prodcut Development A large
technique is used?
part of the luggage market is made up of overnight bags. These
bags vary by weight, exterior appearance, material, and size. 7.45 Biology and Environmental Science During monsoon
Suppose the volume of overnight bags is normally distributed season in India, the mean amount of rainfall is 89 centimeters.12
with mean m 5 1750 cubic inches and standard deviation Suppose the standard deviation is 5 centimeters and 30 monsoon
s 5 250 cubic inches. A random sample of 15 overnight bags seasons are selected at random.
is selected, and the volume of each is found. a. What is the probability that the sample mean rainfall is
a. Find the distribution of X . less than 87 centimeters?
b. What is the probability that the sample mean volume is b. What is the probability that the sample mean rainfall is
more than 1800 cubic inches? greater than 91.5 centimeters?
c. What is the probability that the sample mean volume is c. Find a symmetric interval about the mean, 89, such that the
within 100 cubic inches of 1750? probability that the sample mean lies in this interval is 0.90.
d. Find a symmetric interval about 1750 such that 95% of all
7.46 Biology and Environmental Science Carbon dioxide
values of the sample mean volume lie in this interval.
(CO2) is one of the primary gases contributing to the green-
7.42 Biology and Environmental Science The U.S. house effect and global warming. The mean amount of CO2 in
nvironmental Protection Agency (EPA) is concerned about
E the atmosphere for March 2013 was 397.34 parts per million
pollution caused by factories that burn sulfur-rich fuel. To (ppm).13 Suppose 40 atmospheric samples are selected at ran-
decrease the impact on the environment, factory chimneys must dom in May 2013 and the standard deviation for CO2 in the
be high enough to allow pollutants to dissipate over a larger atmosphere is s 5 20 ppm.
area. Assume the mean height of chimneys in these factories is a. Find the probability that the sample mean CO2 level is
100 meters (an EPA-acceptable height) with standard deviation less than 393 ppm.
12 meters. A random sample of 40 chimney heights is obtained. b. Find the probability that the sample mean CO2 level is
a. What is the probability that the sample mean height for between 395 and 403 ppm.
the 40 chimneys is greater than 102 meters? c. Suppose the sample mean CO2 level is 400. Is there any
b. What is the probability that the sample mean height is evidence to suggest that the population mean CO2 level
between 101 and 103 meters? has increased? Justify your answer.
c. Suppose the sample mean is 98.5 meters. Is there any
7.47 Public Health and Nutrition Typhoid fever is more
evidence to suggest that the true mean height for
common in developing countries with poor sanitation and
chimneys is less than 100 meters? Justify your answer.
greater incidence of food contamination. The mean duration of
7.43 Manufacturing and Product Development The man- this illness is approximately 4 weeks.14 Periodically, a team of
ager at an HEB grocery store in Corpus Christi, Texas, suspects a doctors randomly selects 10 patients with typhoid fever and
supplier is systematically underfilling 12-ounce bags of potato carefully measures the duration of the illness. If the mean
chips. To check the manufacturer’s claim, a random sample of 100 duration is greater than 4.5 weeks, then a health alert is issued.
bags of potato chips is obtained, and each bag is carefully weighed. Suppose the duration of this illness is normally distributed with
The sample mean is 11.9 ounces. Assume s 5 0.3 ounces. standard deviation one week.
a. Find the probability that the sample mean is 11.9 ounces a. Find the probability that the sample mean duration is less
or less. than 4.5 weeks.
7.2 The Sampling Distribution of the Sample Mean and the Central Limit Theorem 317
c ontinue. Otherwise, it is shut down. Suppose the force distribu- Standard clip-on weights for steel rims on automobiles (used
tion is normal with standard deviation 0.1 kN. when tires are balanced) are available in 14 -ounce to 6-ounce
a. If the true mean is 0.8 kN, what is the probability that the sizes. Suppose an automobile garage receives a large shipment
process will be shut down? This probability represents the of 2-ounce weights. A garage mechanic will select a random
chance of making one kind of error. sample of 30 weights and weigh each one on a precise scale.
b. If the true mean is 0.82 kN, what is the probability that If the sample mean is within 0.05 ounce of the printed weight
the process will be allowed to continue? What if (of 2 ounces), then the shipment is accepted. Otherwise, the
m 5 0.84 kN? These probabilities represent the chance of entire shipment is rejected and returned to the manufacturer.
making a different kind of error. Suppose the population standard deviation is 0.13 ounce.
c. Suppose the process is allowed to continue if the mean a. Find the probability of accepting the entire shipment if
force required is between 0.76 and 0.84 kN. Answer parts the true population mean weight is 1.86, 1.88, 1.90, 1.92,
(a) and (b) with this new interval. 1.94, 1.96, 1.98, 2.00, 2.02, 2.04, 2.06, 2.08, 2.10, 2.12,
of 2.14 ounces.
7.55 Manufacturing and Product Development Shetland
b. Plot the probability of accepting the entire shipment versus
wool is considered some of the finest in the world because it is
the true population mean weight. The resulting graph is the
soft, durable, and easy to spin. The mean fleece weight from a
OC curve for the given acceptance sampling plan.
typical sheep is 3.25 pounds with a standard deviation of 0.4
pound. Suppose a farmer has 100 sheep ready to be sheared. 7.58 Normal Approximation to the Binomial
a. Find the distribution for the total weight of fleece from Distribution Consider a binomial experiment with n
the 100 sheep, T. trials and probability of success p. If we assign a 0 to each fail-
b. What is the probability that the total fleece weight is less ure and a 1 to each success, then the binomial random variable
than 323 pounds? can be defined as a sum. By the central limit theorem, as n
c. What is the probability that the total fleece weight is increases the distribution of X (the sum) approaches a normal
between 330 and 340 pounds? distribution with mean np and variance np ( 1 2 p ) . Suppose X
d. Find a value t such that P ( T $ t ) 5 0.15. is a binomial random variable with n 5 30 and probability of
success p 5 0.5.
7.56 Sports and Leisure A wingsuit is a specially designed
a. Construct a probability histogram for the binomial
jumpsuit with added surface area to provide more lift. Extreme-
random variable X. Find P ( 12 # X # 16 ) using the
sports enthusiasts wear this suit and jump from planes or a fixed
binomial distribution.
object (BASE jumping, where BASE stands for Building,
b. Find the approximate normal distribution for X. Find
Antenna, Span, Earth). One of the most popular BASE environ-
P ( 12 # X # 16 ) using the normal distribution for X.
ments is Troll Wall in Norway. Suppose the mean flight time for
c. Compare the probabilities found in parts (a) and (b).
a wingsuit jump at Troll Wall is 22.5 seconds with standard
d. Find P ( 11.5 # X # 16.5 ) using the normal distribution
deviation 2.3 seconds. Thirty-two jumpers are selected at ran-
for X. Compare this answer with the probability in part (a).
dom to test a new wingsuit at Troll Wall.
Why do you think this is a much better approximation?
a. What is the probability that the sample mean flight time is
less than 22 seconds? 7.59 Manufacturing and Product Development A hard-
b. Suppose the sample mean flight time for these jumpers is ware store has 20 interior plantation shutter sets in stock. The
23.15 seconds. Is there any evidence to suggest that this width (in inches) of each set is given in the table below.
new wingsuit increases the mean flight time? SHUTTER
c. Find a value w such that P (X # w ) 5 0.005.
30 30 28 34 36 28 34 36 24 35
Challenge 28 30 32 30 44 34 22 32 20 30
7.57 Fuel Consumption and Cars Companies receiving large
shipments of raw materials of any product often use a specific a. Find the (population) mean width of the 20 shutters.
plan for accepting the entire shipment. An acceptance sampling b. Use technology to generate at least 100 random samples
plan usually includes the sample size for close inspection, the of size five (without replacement). Find the sample mean
acceptance criterion, and the rejection criterion. For a given plan, width for each sample.
the operating characteristic (OC) curve shows the probability of c. Construct a histogram of the sample means found in part
accepting the entire lot as a function of the actual quality level. (a). Describe the distribution.
p (the probability of a success). For example, a politician might want to estimate the pro-
portion of voters in a district in favor of a certain highway bill, or a quality-control super-
visor might need to estimate the true proportion of defective parts in a large shipment.
=
It seems reasonable to use a value of the sample proportion, p, to make an inference
concerning the population proportion p. Therefore, knowledge of the sampling distribu-
tion of the sample proportion is necessary. We need to completely characterize the vari-
ability of this statistic.
Consider a sample of n individuals or objects (or trials) and let X be the number of suc-
cesses in the sample. The sample proportion (introduced in Section 3.1) is defined to be
= X the number of successes in the sample
P5 5 (7.1)
n the sample size
The sample proportion is simply the proportion of successes in the sample, or a relative
frequency.
An approach
= similar to the one in Section 7.2 can be used to approximate the distri-
bution of P: Generate lots of sample proportions, construct a histogram, and try to char-
acterize the distribution in terms of shape, center, and variability. The sampling distribution
is summarized below.
=
The Sampling Distribution of P
=
Let P be the sample proportion of successes in a sample of size n from a population with
true proportion of success p.
=
1. The mean of P is the true population proportion.
In symbols: mP= 5 p.
= p(1 2 p)
2. The variance of P is sP2= 5 .
n
= p(1 2 p)
As n increases, the distribution The standard deviation of P is sP= 5 .
= Å n
of P approaches a normal =
3. If n is large and both np $ 5 and n ( 1 2 p ) $ 5, then the distribution of P is approxi-
distribution. There is no threshold
mately normal.
value for n. The larger the value =
In symbols: P , N ( p, p ( 1 2 p ) /n ) .
d
of n and the closer p is to 0.5, the
better is the approximation.
A Closer L ok
=
1. It may be a little surprising to learn that P is approximately normal. However, the
sample proportion can be written as a sample mean. Assign 0 to each failure and 1 to
each success. X, the total number of successes, is a sum (of 0s and 1s). Therefore, the
sample proportion, X/n, is really a sample mean. Remember, the central limit theorem
says that the sample mean is approximately normal for n sufficiently large.
= =
2. Because the mean of P is the true population proportion, P is an unbiased estimator
for p.
3. A large sample isn’t enough for normality. The two products np and n ( 1 2 p ) must
both be greater than or equal to 5. This is called the nonskewness criterion. It guaran-
=
tees that the distribution of P is approximately symmetric, that is, centered far enough
away from 0 or 1.
=
4. To compute a probability involving the sample proportion, treat P like any other normal
random variable: Standardize and use cumulative probability where appropriate.
=
The following example illustrates the properties of P and the technique for computing
probabilities associated with this random variable.
SOLUTION TRAIL
Concepts
= Solution
n Distribution of P
a. For n 5 110 and p 5 0.45, check the nonskewness criteria.
Vi sion
This question involves the np 5 ( 110 )( 0.45 ) 5 49.5 $ 5 and n ( 1 2 p ) 5 ( 110 )( 0.55 ) 5 60.5 $ 5
= =
random variable P . Use the Both inequalities are satisfied. The distribution of P is approximately normal with
=
properties of P to find the
distribution, translate each p(1 2 p) ( 0.45 )( 0.55 )
mP= 5 p 5 0.45 and s2P= 5 5 5 0.00225
question into a probabiity n 120
statement, and write one =
In symbols: P , N ( 0.45, 0.00225 ) sP= 5 !0.00225.
d
f (p)
P(P 0.50)
=
Figure 7.16 The probability density function for P . The shaded
=
area represents P(P . 0.50).
=
b. P (P . 0.50 ) Probability that the sample proportion is greater than 0.50.
=
P 2 0.45 0.50 2 0.45
5 Pa . b Standardize.
!0.00225 !0.00225
5 P ( Z . 1.05 ) Equation 6.8; simplify.
5 0.1469
The probability that the sample proportion is greater than 0.50 is 0.1469.
7.3 The Distribution of the Sample Proportion 321
=
Standardization illustrated c. P ( 0.37 # P # 0.47 ) Probability the sample proportion is between 0.37 and 0.47.
part (c): =
0.37 2 0.45 P 2 0.45 0.47 2 0.45
5 Pa # # b Standardize.
!0.00225 !0.00225 !0.00225
P(0.37 P 0.47)
f (p ) 5 P ( 21.69 # Z # 0.42 )
5 0.6173
Figure 7.17 and 7.18 show technology solutions to parts (b) and (c).
p
0.37 0.45 0.47
P(1.69 Z 0.42)
f (z)
1.69 0 0.42 z = =
Figure 7.17 P(P . 0.50): Figure 7.18 P(0.37 # P # 0.47).
right-tail probability.
Remember, it is very important for
SOLUTION TRAIL
f (p)
P(P r ) 0.25
0.70 p
0.50 0.55 0.60 r 0.65
=
Figure 7.19 The shaded area represents P(P . r).
Concepts
n Inference procedure Experiment: The proportion of CEOs who believe there are too many government
=
Vi sion regulations is p 5 0.56.
=
To decide whether the observed Likelihood: Because p 5 0.56 is to the left of the mean ( p 5 0.60 ) , consider a left-
sample proportion is reasonable, tail probability as a measure of likelihood.
we need to follow the four-step =
inference procedure. Consider = P 2 0.60 0.56 2 0.60
P (P # 0.56 ) 5 Pa # b Standardize.
the claim, the experiment, and 0.04 0.04
the likelihood of the
5 P ( Z # 21.00 ) Equation 6.8; simplify.
experimental outcome.
5 0.1587 Table III in the Appendix.
7.3 The Distribution of the Sample Proportion 323
P(P 0.56) P(Z 1.00)
f (p) f (z)
= p
0.56 0 1.00 0 z
Figure 7.21 P is approximately
normal, so use normalcdf.
In statistical inference problems, any probability less than or equal to 0.05 is consid-
Note: Even though the sample ered small.
proportion can not be less than 0,
Conclusion: This probability is larger than 0.05, so it is reasonable to observe a sam-
21E99 was used in this
ple proportion of 0.56 or smaller. There is no evidence to suggest that the claim of
technology solution because a
p 5 0.60 is wrong.
normal distribution is defined for
all real numbers. Most of the time, Figure 7.21 shows a technology solution.
when the approximation is good,
using 0 (or 1 as a right bound) will Try It Now Go to Exercise 7.75
not change the probability.
Technology Corner
=
Procedure: Solve probability questions involving the sample proportion P.
Reconsider: Example 7.11, solutions, and interpretations.
CrunchIt!
=
If P is approximately normal, use the Distribution Calculator as described in Section 6.2 to find probabilities and to solve
inverse cumulative probability problems.
1. Select Distribution Calculator; Normal. Enter the mean, 0.60, and the standard deviation, 0.04. Under the Quantile tab,
enter the cumulative probability and click Calculate. See Figure 7.22.
2. Select Distribution Calculator; Normal. Enter the mean, 0.60, and the standard deviation, 0.04. Under the Probability
tab, select # and enter 0.56 for the endpoint. See Figure 7.23.
=
Figure 7.22 A solution to Figure 7.23 P(P # 0.56).
=
P(P # r) 5 0.75.
324 C h apte r 7 Sampling Distributions
TI-84 Plus C
=
If P is approximately normal, use the built-in function normalcdf to find (calculate) cumulative probability and the
built-in function invNorm to solve inverse cumulative probability problems.
1. Select DISTR ; DISTR; invNorm. Enter the area (cumulative probability), 0.75, the mean, 0.6, and the standard
deviation, 0.04. Refer to Figure 7.20.
2. Select DISTR ; DISTR; normalcdf. Enter the left endpoint, 21E99, the right endpoint, 0.56, the mean, 0.6, and the
standard deviation, 0.04. Refer to Figure 7.21.
Minitab
As in Chapter 6 for normal random variables, use the built-in functions to find cumulative probability or inverse cumula-
tive probability. These functions may be accessed through a graphical input window: Calc; Probability Distributions;
Normal, Graph; Probability Distribution Plot; or by using the command language.
1. Use the function INVCDF to solve inverse cumulative probability problems. See Figure 7.24.
2. Use the function CDF to find cumulative probability. See Figure 7.25.
Figure 7.24 Use INVCDF to solve inverse Figure 7.25 Use CDF to find cumulative
cumulative probability problems. probability.
Excel
As in Chapter 6 for normal random variables, use the built-in function NORM.DIST to compute cumulative probability
associated with a normal random variable, and the function NORM.INV to solve inverse cumulative probability problems.
=
1. Use the function NORM.INV to find the value r such that P (P # r ) 5 0.75. See Figure 7.26.
2. Use the function NORM.DIST to find cumulative probability. See Figure 7.27.
7.66 Suppose a random sample of size n 5 200 is obtained 7.72 Marketing and Consumer Behavior Movie trailers
from a population with probability of success p 5 0.40. Find are designed to attract large audiences. However, according to a
each of the following probabilities. recent survey, 32% of Americans believe movie trailers give
=
a. P (P # 0.37 ) . away too many of a movie’s best scenes, that is, they reveal too
= much.20 Suppose 250 Americans are selected at random and
b. P (P . 0.45 ) .
= asked if they believe movie trailers reveal too much.
c. P ( 0.38 # P # 0.42 ) .
= = a. Find the distribution of the sample proportion of
d. P (P , 0.33 ) or P (P . 0.47 ) .
Americans who believe movie trailers give away too much.
7.67 Suppose a random sample of size n 5 500 is obtained b. Find the probability that the sample proportion is more
from a population with probability of success p 5 0.50. Find than 0.35.
each of the following probabilities. c. Find the probability that the sample proportion is less
= =
a. P ( 0.52 . P) . b. P (P $ 0.47 ) . than 0.25.
= = d. Find a symmetric interval about the mean ( p 5 0.32 )
c. P ( 0.44 # P , 0.49 ) . d. P ( 0.45 # P , 0.55 ) .
such that the probability that the sample proportion is in
7.68 Suppose a random sample of size n 5 80 is obtained this interval is 0.90.
from a population with probability of success p 5 0.35.
= 7.73 Sports and Leisure Anyone who bets money in a
a. Find a value a such that P (P # a ) 5 0.10.
= casino is classified as a winner if he or she wins more than he
b. Find a value b such that P (P . b ) 5 0.01.
= or she loses. Casino operators in Atlantic City, New Jersey,
c. Find a value c such that P ( 0.35 2 c # P # 0.35 1 c ) 5 believe the proportion of all players who go home a winner is
0.95. 0.46. Suppose 75 Atlantic City gamblers are selected at random.
a. Find the sampling distribution of the proportion of
7.69 Suppose a random sample of size n 5 1000 is obtained
from a population with probability of success p 5 0.25. gamblers who go home winners.
= b. Find the probability that the sample proportion is less
a. Find a value a such that P (P # a ) 5 0.05.
= than 0.40.
b. Find a value b such that P (P . b ) 5 0.005.
= c. Find the probability that the sample proportion is more
c. Find a value c such that P ( 0.25 2 c # P # 0.25 1 c ) 5 than 0.45.
0.99. d. Find a symmetric interval about the mean ( p 5 0.46 )
=
d. Find the quartiles of the distribution of P . such that the probability that the sample proportion is in
this interval is 0.99.
7.74 Marketing and Consumer Behavior According to a
Applications
recent study, approximately 69% of consumers order food
7.70 Public Health and Nutrition Approximately 70% of online using a mobile device, for example, a smartphone or
American adults monitor a specific health indicator, for exam- tablet.21 Consumers use mobile applications to check restaurant
ple weight, diet, or exercise.18 Suppose a random sample of 250 menus, consider fast-food options, and even to order a pizza.
American adults is obtained, and all were asked if they monitor Suppose 220 consumers are randomly selected.
their health in some way. a. Find the probability that the sample proportion of
a. Find the distribution of the sample proportion of consumers who order food online using a mobile device
American adults who monitor their health. is less than 0.65.
b. Find the probability that the sample proportion is less b. Find the probability that the sample proportion of
than 0.66. consumers who order food online using a mobile device
c. Find the probability that the sample proportion is more is more than 0.72.
=
than 0.71. c. Find a value t such that P (P , t ) 5 0.95.
d. Find the probability that the sample proportion is between
7.75 Public Health and Nutrition The U.S. Centers for
0.68 and 0.78.
Disease Control announced that skin allergies have risen among
7.71 Technology and the Internet According to a report U.S. children to 12.5%.22 Serious allergies can affect a child’s
from Microsoft, 24% of PCs worldwide are not adequately pro- education, sleep, and ordinary daily activities. Suppose 320 U.S.
tected by antivirus software.19 Suppose 200 PCs from around children are randomly selected and tested for skin allergies.
the world are selected at random. a. Find the probability that the sample proportion of
a. Find the distribution of the sample proportion of PCs that children with skin allergies is less than 0.10.
are not adequately protected. b. Find the probability that the sample proportion of
b. Find the probability that the sample proportion is less children with skin allergies is between 0.08 and 0.16.
than 0.20. c. Suppose 150 U.S. children aged 5 to 9 are randomly
c. Find the probability that the sample proportion is more selected and each is tested for skin allergies. Nine children
than 0.29. are found to have skin allergies. Is there any evidence to
d. Find a value v such that the probability the sample suggest that children in this age group have a lower
proportion is less than v is 0.01. incidence rate of skin allergies? Justify your answer.
326 C h apte r 7 Sampling Distributions
7.76 Education and Child Development Admissions 7.80 Biology and Environmental Science Ships dumping
offices at universities and colleges keep careful records of garbage and ordinary beachgoers contribute to the increasing
acceptance and yield rates. The yield rate is the percentage of amount of trash that washes onto the shore and collects in the
admitted students who decide to accept an offer of admission. oceans all over the world. In 2012 the International Coastal
Historical data help them to plan for the incoming classes and Cleanup collected more than 9 million pounds of garbage from
guide some admission decisions. The yield rate for Yale the world’s oceans. Common debris items collected included
University is 64.1%.23 Suppose 100 students who were food wrappers, bottle caps, and beverage bottles. For the United
accepted to Yale are randomly selected. States, cigarettes accounted for 28.1% of all debris items.26 A
a. Find the probability that the sample proportion of those random sample of 430 debris items from a clean-up along a
who enroll is less than 0.55. Texas beach was obtained.
b. Find the probability that the sample proportion of those a. Find the probability that the proportion of cigarette debris
who enroll is between 0.60 and 0.65. items is greater than 0.31.
c. Suppose the actual sample proportion of those who enroll b. Find the probability that the proportion of cigarette debris
is 0.70. Is there any evidence to suggest that the yield has items is between 0.25 and 0.30.
increased? Justify your answer. c. Suppose 94 of the debris items are cigarettes. Is there any
evidence to suggest that the true proportion of cigarette
7.77 Business and Management A certain philanthropic debris items is different from 0.281? Justify your answer.
organization funds one out of every ten grant proposals. Sup-
pose a random sample of 300 grant proposals is obtained.
a. Find the probability that the sample proportion of funded Extended Applications
grant proposals is less than 0.075.
7.81 Manufacturing and Product Development Low-
b. Find the probability that the sample proportion of funded
quality coffee shipped from various locations in Europe tends
grant proposals is between 0.11 and 0.15.
to contain a high proportion of defective beans (beans com-
c. If the funding rate increases, the board of directors may
posed of foreign matter; moldy, black, unripe, or fermented
become concerned about resources being depleted.
beans; or those known as stinkers). Suppose a shipper claims
Suppose the actual sample proportion of funded grant
the proportion of defective beans is 0.07. A U.S. packaging
proposals is 0.16. Is there any evidence to suggest that
company receives a huge shipment of coffee beans and ran-
the funding rate has increased? Justify your answer.
domly select 1000 beans. If the sample proportion of defective
7.78 Sports and Leisure The fielding percentage for a beans is more than 0.09, then the entire shipment will be
major league baseball team is a measure of how well defen- returned to the supplier in Europe.
sive players handle a batted ball (without error) and is based a. If the true proportion of defective beans is 0.07 (as
on total chances ( putouts 1 assists 1 errors ) . One quarter claimed), what is the probability that the shipment will be
into the 2013 baseball season, the Boston Red Sox fielding sent back?
percentage was 0.984.24 Suppose 150 chances are randomly b. If the true proportion of defective beans is 0.08, what is
selected. the probability that the shipment will be accepted?
a. Find the distribution of the sample proportion of chances 7.82 Manufacturing and Product Development McGuckin
handled without error. Hardware in Boulder, Colorado, routinely receives shipments of
b. Find the probability that the sample proportion is less
4 3 8 foot sheets of 12 -inch-thick plywood. A plywood sheet
than 0.97.
= may contain defects, for example, a knot, a split, or a deviation
c. Find a value f such that P (P # f ) 5 0.05. in wood structure. Defective sheets reduce profits because they
d. If the team fielding percentage for these random chances are sold at a lower price. The manufacturer claims the propor-
is greater than 0.99, then everyone on the team receives a tion of all plywood sheets that are defective is 0.05. Suppose a
bonus. What is the probability of a team bonus? large shipment of plywood sheets is received and 200 are ran-
7.79 Psychology and Human Behavior Many Americans domly selected for inspection. If the sample proportion of
own a DVR or subscribe to a special TV recording service. For defective sheets is more than 0.09, the entire shipment will be
example, the Hopper from DISH network allows users to record sent back to the supplier.
up to six HD programs at once. Despite this increased record- a. If the true proportion of defective plywood sheets is 0.05,
ing flexibility, Motorola Mobility recently announced that for what is the probability that the entire shipment will be
U.S. TV viewers, 41% of the recorded content on DVRs is sent back?
never watched.25 Suppose 425 programs recorded on DVRs in b. If the true proportion of defective plywood sheets is 0.03,
the United States are randomly selected. what is the probability that the entire shipment will be
a. Find the probability that the sample proportion of sent back?
programs never watched is less than 0.37. c. If the true proportion of defective plywood sheets is 0.10,
b. Find the probability that the sample proportion is between what is the probability that the shipment will be accepted?
0.42 and 0.43.
= 7.83 Demographics and Population Statistics According
c. Find a value w such that P (P $ w ) 5 0.01. to National Geographic, 28% of Canadians are of British
Chapter 7 Exercises 327
descent.27 Suppose 150 Canadians in Alberta are randomly is defective. The acceptance sampling plan states: Accept the
selected and 185 Canadians in Ontario are randomly selected. entire shipment if the sample proportion of defective tiles is
a. What is the probability that the sample proportion of 0.08 or less; otherwise, reject the entire shipment.
those Canadians from Alberta of British descent is less a. Find the probability of accepting the entire shipment if
than 0.23? the true proportion of defective floor tiles is 0.01, 0.02,
b. What is the probability that the sample proportion of 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.15.
those Canadians from Ontario of British descent is less b. Plot the probability of accepting the entire shipment (on
than 0.23? the y axis) versus the true proportion of defective floor
c. What is the probability that the sample proportion of tiles (on the x axis). The resulting graph is the OC curve
those Canadians from both provinces is greater than 0.35? for the given acceptance sampling plan.
7.85 Maximum Variance For a fixed sample size n, find
Challenge the value of p that maximizes the variance of the sample
p(1 2 p)
7.84 Manufacturing and Product Development Suppose proportion, s2P= 5 .
a company is receiving a large shipment of peel-and-stick vinyl n
floor tiles. The manufacturer claims the proportion of defective Hint: Compute the value of the variance for several different
floor tiles is 0.05. The company will select a random sample of values of p. Plot the variance (on the y axis) versus the value
200 floor tiles, carefully inspect each, and determine whether it of p (on the x axis).
Chapter 7 Summary
Concept Page Notation / Formula / Description
Chapter 7 Exercises
suggest the marble floor is unsafe on rainy days? Assume the b. What is the probability that the sample mean oxygen
underlying population standard deviation is 0.2 and justify your produced is between 7.25 and 7.5 ml/h?
answer. c. Suppose each plant is exposed to a new high-intensity
grow light and the sample mean oxygen produced for the
7.88 Marketing and Consumer Behavior In the fresh 35 plants is 8.1 ml/h. Is there any evidence that the new
fruits and vegetables section of a Kroger grocery store, lamp has increased oxygen output?
customers can purchase any desired amount (by placing the d. Answer parts (a), (b), and (c) if s 5 3.75 ml/h.
food in a plastic bag to be weighed and/or priced at the
checkout line). For those people who purchase cucumbers, the 7.92 Marketing and Consumer Behavior Seafood
probability distribution for the number purchased is given in restaurants along the coast of New England offer a variety of
the table below. CUCUMBER entrees, but lobster is the most popular meal. At Newick’s
Seafood Restaurant in Dover, New Hampshire, 37% of all
x 1 2 3 4 5 diners order lobster. Suppose 120 customers are selected at
p(x) 0.10 0.50 0.20 0.15 0.05 random, and the meal ordered by each is recorded.
a. Find the distribution of the sample proportion of diners
a. Suppose two customers who purchase cucumbers are
who order lobster, and carefully sketch the probability
selected at random. Find the exact probability distribution distribution.
for the sample mean number of cucumbers purchased, X . b. Find the probability that the sample proportion of diners
b. Find the mean, variance, and standard deviation of X . who order lobster is less than 0.30.
7.89 Marketing and Consumer Behavior A drive-in movie c. Find the probability that the sample proportion of diners
theater charges viewers by the carload, but keeps careful who order lobster is between 0.35 and 0.40.
records of the number of people in each car. The probability d. The manager of Newick’s is concerned that more cus-
distribution for the number of people in each car entering the tomers might be ordering lobster. This would require
drive-in is given in the table below. DRIVEIN a change in restaurant ordering and a shift in kitchen
staff. Suppose the actual proportion of diners who order
x 1 2 3 4 5 6 lobster is 0.42. Is there any evidence to suggest that the
p(x) 0.02 0.30 0.10 0.30 0.20 0.08 proportion of diners who order lobster has increased?
Justify your answer.
a. Suppose two cars entering the drive-in are selected at
random. Find the exact probability distribution for the 7.93 Physical Sciences Tropical rainforests are located in
maximum number of people in either one of the cars, M. areas near the Equator. The temperature in a typical rainforest
b. Find the mean, variance, and standard deviation of M. ranges from 20 to 30°C, and the humidity is usually between
77% and 88%.29 The mean amount of rain per year in any one
7.90 Demographics and Population Statistics Many clubs rainforest is m 5 155 inches with standard deviation s 5 35
and companies around the country offer hot-air balloon rides. inches. Suppose 30 rainforests are selected at random and the
Most impose strict safety regulations and take at most four amount of rain per year is recorded for each.
adults at one time, plus a pilot. The mean weight for an adult a. What is the probability that the mean rainfall per year for
male in the United States is 195.5 pounds.28 Suppose the the 30 rainforests is more than 170 inches?
distribution is normal with a standard deviation of 30 pounds. b. What is the probability that the mean rainfall is between
a. If a hot-air balloon pilot (an adult male) takes three adult 140 and 150 inches?
males for a ride, what is the distribution for the total c. Find a value r such that the probability that the mean
weight aboard, T ? Carefully sketch the probability distri- rainfall is less than r is 0.001.
bution for T.
b. If the total weight is less than 900 pounds, the pilot will 7.94 Technology and the Internet An office manager
have enough fuel to extend the ride by a few minutes. has several computers running distributed programs.
What is the probability of an extended ride? Because of the demands on the system, the machines may
c. If the total weight is over 975 pounds, then the balloon crash at various times during the day and require a hard
will not be able to take off. What is the probability that reset. The probability distribution for the number of times a
the balloon will not be able to take off? randomly selected machine crashes during a day, X, is given
in the table below. COMPUTER
7.91 Biology and Environmental Science A typical
houseplant produces oxygen (from carbon dioxide) in varying x 0 1 2 3 4 5
amounts, depending on the amount of light and water. Suppose
p(x) 0.50 0.30 0.10 0.07 0.02 0.01
a medium Norfolk Island pine produces 7.5 ml per hour of
oxygen when exposed to normal sunlight. Thirty-five Norfolk a. Find the mean, variance, and standard deviation for the
Island pines are selected at random, and the oxygen output is number of crashes by a single machine during a day.
carefully measured for each. Assume s 5 1.75 ml/h. b. Suppose n 5 2 machines are selected at random. Find the
a. What is the probability that the sample mean oxygen sampling distribution of the statistic T, the total number
produced is less than 7 ml/h? of crashes for the two machines.
Chapter 7 Exercises 329
7.99 Travel and Transportation The Greenline Taxi recyclable products once a week, sorted into barrels of paper,
Company in New York City keeps careful records of items glass, and plastic. The amount of glass recycled per week by a
left behind by riders. Most items are claimed, but many single household is normally distributed with mean m 5 27
remain in a lost-and-found area at the company’s headquar- pounds and standard deviation s 5 7 pounds. Suppose 12
ters. Records indicate that the proportion of riders who leave households are randomly selected.
an item in a taxi is 0.12. Suppose a random sample of 250 a. What is the probability that the total glass collected for
riders is obtained. the 12 homes, T, will be less than 350 pounds?
a. Find the sampling distribution of the sample proportion b. Find a value g such that P ( T $ g ) 5 0.05.
of riders who leave an item behind. Carefully sketch the c. If the total glass collected for the 12 homes is more than
probability density function. 400 pounds, the recycling plant will make a profit. Find
b. What is the probability that the sample proportion of rid- a value of m such that the probability of making a profit
ers who leave an item behind is more than 0.15? is 0.10.
c. What is the probability that the sample proportion
of riders who leave an item behind is between 0.11 7.103 Sports and Leisure The Hotel Association of Canada
and 0.115? reported that, because of the high cost of airline tickets and
d. Some people speculate that in bad economic times, rid- gasoline, more people are taking a staycation, a vacation close
ers (and people in general) are more careful with their to their hometown. According to a survey, 31% of travelers
belongings. Suppose this sample was obtained during a stayed in a hotel near home, visited local attractions, restau-
recession, and the sample proportion of riders who left rants, and shopped.32 Suppose 250 Canadians planning to take a
an item behind was 0.09. Is there any evidence to suggest vacation are selected at random and asked whether they will
that the true proportion is less than 0.12? take a staycation.
a. What is the probability that the sample proportion who
7.100 Manufacturing and Product Development Eye- will take a staycation is between 0.30 and 0.35?
drops are used by many people to soothe and relieve irritation b. What is the probability that the sample proportion is more
and to lubricate their eyes. A manufacturer claims that the than 0.37?
amount of dextran, which helps to prolong the effect of c. Find a value of the sample size n such that the probability
eyedrops, contained in its eyedrops product is 70%. Thirty-six that the sample proportion is less than 0.40 is 0.95.
randomly selected bottles of these eye drops were obtained and
the sample mean amount of dextran was 68.25%. Assume the 7.104 Manufacturing and Product Development A
population standard deviation is s 5 5%. manufacturer of ice pops fills each plastic container with a
a. Find the sampling distribution of the sample mean. fruity liquid, leaving enough room so that consumers can freeze
b. Find the probability that the amount of dextran is more the product. A filling machine is set so that the amount of liquid
than 71%. in each ice pop is normally distributed with mean 8.00 ounces
c. Does the sample mean found suggest that the manufac- and standard deviation 0.25 ounce. Suppose 16 ice pops are
turer’s claim is wrong? Justify your answer. randomly selected.
a. Find the probability distribution of the sample mean num-
7.101 Public Health and Nutrition Individuals who belong
ber of ounces in each container, X .
to a health maintenance organization (HMO) usually share the
b. Find the probability that the sample mean is less than
cost of certain medical services, for example, paying $10 for an
7.9 ounces.
office visit. An insurance company study indicates that the
c. Find the probability that the sample mean is more than
proportion of all people who belong to an HMO who have a
8.15 ounces.
copayment of more than $10 for an office visit is 0.65. Suppose
d. Suppose the filling machine operator can fine-tune the
1000 people who belong to an HMO are randomly selected, and
process by controlling the standard deviation of the
the office copayment amount is recorded.
fill. Find a value for s such that the probability that the
a. Find the distribution of the sample proportion of people
sample mean is more than 8.05 ounces is 0.05.
who have a copayment of more than $10 for an office
visit. Verify the nonskewness criterion. 7.105 Biology and Environmental Science During
b. Find the probability that the sample proportion is more Spring 2013, a huge population of 17-year cicadas emerged
than 0.66. from the ground on the East Coast of the United States.
c. Find the probability that the sample proportion is between These prehistoric-looking bugs have red eyes, huge wings,
0.64 and 0.67. and make noises as loud as a lawnmower.33 Suppose the
d. Find a value h such that the probability that the sample mean noise level is 90 decibels and the standard deviation is
proportion is less than h is 0.01. 15.6 decibels. During the cicada emergence in Virginia and
other Southern states, a random sample of 40 locations was
selected and the noise level (in decibels) was carefully
Extended Applications
measured.
7.102 Physical Sciences Carlinville, Illinois, has just started a. Find the probability distribution of the sample mean
an ambitious recycling program. Special trucks collect noise level.
Chapter 7 Exercises 331
b. Find the probability that the sample mean is less than b. Suppose the true mean diameter of the table tennis balls
60 decibels, the level of moderate conversational speech. is 40.4 mm. What is the probability that the shipment will
c. Find the probability that the sample mean is more than be accepted by the sporting goods store?
110 decibels, the level of a passing train. c. Suppose the true mean diameter of the table tennis balls
d. Find a value of n such that the probability that the sample is 39.4 mm. What is the probability that the shipment will
mean is less than 93 decibels with probability 0.95. be accepted by the sporting goods store?
7.106 Sports and Leisure A large sporting goods company
LAST STEP
has just received a shipment of 100,000 table tennis balls. USA
Table Tennis tournament regulations specify that the diameter 7.107 How much was the typical foreclosure check?
of the ball must be 40 millimeters (mm). Suppose the distribu- The checks sent to Americans in 2013 were in the amount
tion of the diameter is normal with standard deviation 0.4 mm. of $300 to $125,000. Suppose a spokesperson from the U.S.
Twenty-five table tennis balls will be selected at random, and Treasury claimed that the mean check amount was $800 with
the diameter of each will be carefully measured. If the mean standard deviation $125. A random sample of 40 Americans
diameter is within 0.2 mm of 40 mm, then the shipment will be compensated under this program was obtained. The amount of
accepted. Otherwise the entire shipment will be returned to the each check was recorded and the sample mean check amount
manufacturer. for these Americans was $755.
a. Suppose the true mean diameter of the table tennis balls a. Find the distribution of the sample mean.
is 40 mm. What is the probability that the entire shipment b. Is there any evidence to suggest that the true mean check
will be sent back to the manufacturer? amount was less than $800?
8 Confidence Intervals Based
on a Single Sample
Looking Back
n Remember numerical summary measures, for example, the sample mean, sample variance,
and sample proportion.
n Recall that the sampling distribution of a statistic is the probability distribution of
the statistic.
Looking Forward
n Learn the properties of point estimators and understand what makes an estimator good.
n Construct confidence intervals (CIs) for various parameters.
Contents
8.1 Point Estimation
8.2 A Confidence Interval for a Population Mean When s Is Known
8.3 A Confidence Interval for a Population Mean When s Is Unknown
8.4 A Large-Sample Confidence Interval for a Population Proportion
8.5 A Confidence Interval for a Population Variance or Standard Deviation
333
334 C hapter 8 Confidence Intervals Based on a Single Sample
Definition
An estimator (statistic) is a rule used to produce a point estimate of a population
parameter.
Suppose we need to estimate a population parameter u and there are many different
statistics (rules) available. Which one should we use? Figure 8.1 shows the sampling
distributions of three different statistics for estimating u. These graphs suggest some
properties of a good statistic.
(a) (b) (c)
Figure 8.1 The sampling distributions for three different statistics for estimating u: (a) a
skewed statistic; (b) a statistic with large variance; (c) a statistic with small variance. The
=
horizontal axis represents all possible values of u .
The statistic in (a) is unlikely to produce a value close to u. The sampling distribution
is skewed to the left, and most of the values of the statistic are to the right of u. The sta-
tistic in (b) is centered at u. On average (in the long run), this statistic will produce u
(that’s good!). However, this statistic has large variance (that’s bad!). Even though the
sampling distribution is centered at the true value of the population parameter, specific
estimates will probably be far away from u. The statistic in (c) exhibits two very desirable
properties. It is centered at the true value of the population parameter, and it has small
variance. These observations suggest two rules for selecting a statistic.
Definition
= =
A statistic u is an unbiased estimator of a population parameter u if E ( u ) 5 u, the mean
=
of u is u.
= =
If E ( u ) 2 u, then the statistic u is a biased estimator of u.
Figure 8.2 illustrates the sampling distribution of an unbiased estimator for u. The
distribution of the statistic is centered at u; the value of the statistic is, on average, u.
Figure 8.2 also shows the sampling distribution of a biased estimator for u. On average,
the value of this statistic is greater than u.
8.1 Point Estimation 335
(a) (b)
Figure 8.2 The sampling distribution for (a) an unbiased estimator for u, and (b) a biased
=
estimator for u. The horizontal axis represents all possible values of u .
= =
Suppose there are two statistics to choose from, u 1 and u 2, for estimating u, as shown
= =
in Figure 8.4. The statistic u 1 is unbiased but has large variance; u 2 is slightly biased but
has small variance. The choice of an estimator is a difficult decision, and there is no
definitive answer.
336 C hapte r 8 Confidence Intervals Based on a Single Sample
2
1
Figure 8.4 Sampling distributions
=
of an unbiased statistic u 1 with large
=
variance and a biased statistic u 2 with
small variance.
For a given parameter u, a MVUE Suppose we need to estimate the population parameter u, and there are several unbi-
may not exist. ased statistics from which to choose. If one of these statistics has the smallest possible
variance, it is called the MVUE (minimum-variance unbiased estimator). If the underly-
ing population is normal, the sample mean, X , is the MVUE for estimating m. So, if the
population is normal, the sample mean is a really good statistic to use for estimating m. X
is unbiased, and it has the smallest variance of all possible unbiased estimators for m.
Find point estimates for the population mean petroleum-solvent storage capacity and for
the population median petroleum-solvent storage capacity.
Solution
Step 1 Use the sample mean to estimate the population mean.
1
x5 ( 770 1 875 1 c1 925 1 1100 )
12
1
5 ( 11,145 ) 5 928.75
12
A point estimate of the population mean petroleum-solvent storage capacity is
928.75 gallons.
Step 2 Use the sample median to estimate the population median.
Order the observations from smallest to largest. The sample median is the middle
value. There are n 5 12 observations, so the middle value is in position 6.5, that
is, the mean of the values in positions 6 and 7.
Ordered observations:
770 800 830 850 875 925 940 950 980 1000 1100 1125
c
~x 5 1 ( 925 1 940 ) 5 932.50
2
Technology Corner
Procedure: Compute point estimates.
Reconsider: Example 8.1, solution, and interpretations.
CrunchIt!
CrunchIt! has a built-in function to find certain descriptive statistics, including the sample mean and the sample median.
1. Enter the data into a column.
2. Select Statistics; Descriptive Statistics. Choose the appropriate column and click the Calculate button. See Figure 8.5.
TI-84 Plus C
There are several built-in functions to compute summary statistics. STAT ; CALC ; 1-Var Stats may be used to find
several summary statistics at once. The LIST ; MATH menu contains several functions that return single point estimates.
1. Enter the data into list L1.
2. Use LIST ; MATH; mean to find the sample mean and LIST ; MATH; median to find the sample median. See Figure 8.6.
Minitab
There are several built-in functions to compute summary statistics, in a session window, using a graphical input window,
or the Calculator.
1. Enter the data into column C1.
2. Select Stat; Basic Statistics; Display Descriptive Statistics.
3. Enter C1 in the Variables input window. Use the Statistics options button to select the desired estimates. See Figure 8.7.
Excel
There are several built-in functions to compute summary statistics and under the Data tab, Data Analysis; Descriptive
Statistics returns several summary statistics at once.
1. Enter the data into column A.
2. Use the function AVERAGE to find the sample mean and the function MEDIAN to find the sample median. See
Figure 8.8.
Practice
8.7 The graph below shows the probability density functions for 8.9 Why is an unbiased estimator for u better than a biased
three different statistics that could be used to estimate a popula- estimator for u?
tion parameter u. Which statistic would you use, and why? 8.10 Suppose there are several unbiased estimators for u. What
criterion would you use for selecting one of these estimators,
and why?
1 2
Applications
8.11 Business and Management A recent survey asked
employees in technology-related jobs what perks they would
like their employer to provide. Out of 1200 randomly selected
workers polled, 975 said they would like ongoing training paid
for by their employer. Find a point estimate for the population
proportion of all workers in technology-related jobs who would
3 like ongoing training.
8.12 Biology and Environmental Science A hand is a
traditional unit used to measure the height of a horse. One hand
8.2 A Confidence Interval for a Population Mean When s Is Known 339
is four inches, and the height of a horse is measured from the course sprayers would like to rate the pressure (in psi)
ground to the horse’s shoulder. A random sample of horses sold developed by this unit. Thirty pumps were randomly selected
at auctions around the country revealed the following heights and tested. pumps
(in hands). horseht a. Find a point estimate for the minimum pressure developed
by this pump.
15.8 13.7 11.0 17.1 19.3 14.6 14.4 13.8 b. Find a point estimate for the maximum pressure developed
18.7 16.5 12.2 17.9 16.8 16.3 12.8 18.7 by this pump.
10.4 15.6 11.7 12.8 c. Use your answers to parts (a) and (b) to construct an
interval estimate for the pressure developed by this pump.
a. Find a point estimate for the population mean height of all 8.16 Travel and Transportation The San Francisco Bay
horses sold at auctions. Area Rapid Transit (BART) system conducted a survey to
b. Find a point estimate for the population median height of learn if bikes on board would have any effect on riders. A
all horses sold at auctions. random sample of riders was selected and asked if there was
c. Find a point estimate for the population variance of the enough room for bikes and passengers. Of the 1720 riders
height of all horses sold at auctions. selected, 671 indicated there was enough room and 327 said
d. Any horse with a height less than 14.5 hands is considered it would be too crowded.2
short. Find a point estimate for the true proportion of short a. Find a point estimate for the proportion of all BART riders
horses sold at auctions. who believe there is enough room for passengers and bikes.
8.13 Biology and Environmental Science During the first b. Find a point estimate for the proportion of all BART
few weeks of January every year, biologists associated with the riders who believe it is too crowded with bikes on board.
U.S. Fish and Wildlife Service and the Maryland Department of c. Let pd denote the difference in population proportions
Natural Resources physically count ducks, geese, and swans between all BART riders who believe there is enough
near the Chesapeake Bay. In 2013 these scientists counted room for passengers and bikes and all BART riders who
175,500 ducks, of which 33,100 were Mallards.1 Find a point believe it is too crowded. Find a point estimate for pd.
estimate for the proportion of Mallards that were observed 8.17 Medicine and Clinical Studies Merck recently
during the 2013 winter duck count. released an experimental insomnia drug, suvorexant. This drug
8.14 Biology and Environmental Science Cowrie shells temporarily blocks chemical messengers that work to keep
have been used as money in many parts of the world, for example, people alert and awake.3 The new pill seems to work but does
China and Africa. Today, they are used in decorations and jewelry, have some potential harmful side effects, for example, extended
and cowrie-shell bracelets and necklaces are popular near seaside daytime drowsiness. A random sample of patients taking this
resorts. A jeweler recently purchased several hundred cowrie experimental drug was obtained. The dosage in milligrams was
shells to use in making earrings. A random sample of the finished recorded for each patient. sleep
earrings was obtained, and the weight (in grams) of each is given a. Find a point estimate for the mean dosage.
in the table below. cowrie b. Find a point estimate for the variance of the dosage.
c. Find a point estimate for the first quartile and a point
7.3 7.2 7.2 7.9 7.3 7.3 7.0 7.0 7.4 7.4 estimate for the third quartile.
7.4 7.2 7.5 7.7 7.1 7.0 7.2 7.2 7.7 6.9 8.18 Ambulance Response Time In March 2013, the
7.4 7.8 7.7 7.4 7.5 7.7 7.5 7.3 7.6 7.3 17 ambulances in Manatee County, Florida, responded to
3645 calls.4 The response time is measured from the time the
a. Find a point estimate for the first quartile and a point ambulance is notified of the call until the crew arrives on the
estimate for the third quartile. scene. A random sample of the March response times was
b. The smallest (in weight) 20% of all cowrie-shell earrings obtained. ambresp
are sold at a discount. Find a point estimate for the 20th a. Find a point estimate for the minimum response time.
percentile of the cowrie-shell earring weight distribution. b. Find a point estimate for the maximum response time.
c. Find a point estimate for the interquartile range of
8.15 Manufacturing and Product Development A
company that manufactures a centrifugal pump for golf- response times.
interval of values is constructed so that we can be reasonably sure the true value of the
population parameter lies within it.
Statistical Applet
Definition
Confidence A confidence interval (CI) for a population parameter is an interval of values constructed
intervals so that, with a specified degree of confidence, the value of the population parameter lies
within it.
The confidence coefficient is the probability that the CI encloses the population param-
eter in repeated samplings.
The confidence level is the confidence coefficient expressed as a percentage.
A Closer L ok
Stepped
STEPPED Tutorial
TUTORIALS 1. A confidence interval is usually expressed as an open interval, for example, (10.5,
General
BOX PLOTSidea of a 15.8), where 10.5 is the left endpoint, or lower bound, and 15.8 is the right end-
confidence interval point, or upper bound. The interval extends all the way to, but does not include, the
endpoints.
2. Typical confidence coefficients are 0.95 and 0.99.
3. Typical confidence levels are, therefore, 95% and 99%.
The following steps provide background for the construction of a confidence interval
for a population mean m.
1. Suppose either (a) the underlying population is normal, or (b) the sample size n is
large, or both, and the population standard deviation s is known.
2. Using the properties of X and the central limit theorem (if necessary): The sample
mean X is (approximately) normal with mean m and variance s2 /n.
In symbols, X , N ( m, s2 /n ) .
3. Using the empirical rule, approximately 95% of all values of the sample mean lie
within two standard deviations of the population mean. Figure 8.9 shows an
example of a single value, or point estimate, x within two standard deviations of
the mean.
4. Even though we know the distribution of X is centered at m, we do not know the true
value of m. To capture m it seems reasonable to step two standard deviations from an
estimate x in both directions. The resulting (rough) 95% confidence interval
( x 2 2s / !n, x 1 2s / !n ) is illustrated in Figure 8.10.
X
X
( )
x x – 2(/–n) x x + 2(/–n)
Figure 8.9 The sampling distribution of Figure 8.10 A rough 95% confidence
X and a typical value. interval that probably captures the true
value m.
8.2 A Confidence Interval for a Population Mean When s Is Known 341
To construct a more accurate 95% confidence interval for m, using the same assump-
tions, begin with the standardized random variable Z.
X 2m
1. X , N ( m, s2 /n ) S Z5 , N ( 0, 1 )
s / !n
2. Find a symmetric interval about 0 such that the probability that Z lies in this interval
is 0.95.
P ( 21.96 , Z , 1.96 ) 5 0.95 Use Table III in the Appendix; see Figure 8.11.
3. Substitute for Z and manipulate the interval inside the probability statement so that m
is caught in the middle.
X 2m
Pa21.96 , , 1.96b 5 0.95
s
Substitute for Z.
s / !n
s s
Pa ( 21.96 ) , X 2 m , ( 1.96 ) b 5 0.95 Multiply all three parts by s/ !n.
!n !n
s s
Pa2X 2 ( 1.96 ) , 2m , 2X 1 ( 1.96 ) b 5 0.95 Subtract X from all three parts.
!n !n
s s Multiply all three parts by 21.
PaX 1 1.96 . m . X 2 1.96 b 5 0.95 Multiplying by 21 changes the
!n !n directions of the inequalities.
s s
PaX 2 1.96 , m , X 1 1.96 b 5 0.95 Rewrite the expressions in increasing order.
s
!n !n
The last expression includes a formula for a 95% confidence interval for m: Step
exactly 1.96 (not 2) standard deviations from a specific value x in both directions. This
interval is shown in Figure 8.12.
The last step is to find a more general 100 ( 1 2 a ) % confidence interval for m, using
the same assumptions. a is usually small. For example, if a 5 0.05, the resulting confi-
dence level is 100 ( 1 2 0.05 ) % 5 95%. Usually, the confidence level is given and we
need to work backward to find a. The following definition is necessary in order to start
with a probability statement involving Z.
342 C h a p t e r 8 Confidence Intervals Based on a Single Sample
A Closer L ok
1. za/2 is simply a z value such that there is a /2 of the area (probability) to the right of
za/2. 2za/2 is just the negative critical value.
2. Critical values are always defined in terms of right-tail probability.
3. z critical values are easy to find by using the complement rule and working backward.
For example,
P ( Z $ za/2 ) 5 a /2 Definition of critical value.
To find a general confidence interval for m, start once again in the Z world. Find a
s ymmetric interval about 0 such that the probability that Z lies in this interval is 1 2 a
(Figure 8.13).
P ( 2za/2 , Z , za/2 ) 5 1 2 a (8.1)
Manipulate Equation 8.1 to obtain the probability statement.
s s
PaX 2 za/2 , m , X 1 za / 2 b512a
Stepped !n !n
STEPPED Tutorial
TUTORIALS
Confidence
BOX PLOTS interval Figure 8.14 illustrates this interval for a specific value x.
for a mean
These derivations lead to the following general result.
Z X
( )
z/2 0 z/2 x – z/2(/n) x x + z/2(/n)
Figure 8.13 A symmetric interval about 0 Figure 8.14 A 100(1 2 a)% confidence
such that P(2za/2 , Z , za/2) 5 1 2 a. interval for m.
A Closer L ok
1. Equation 8.2 can be used only if s is known.
VIDEO Tech
Video TECH Manuals
MANUALS 2. If n is large and s is unknown, some statisticians suggest using the sample standard devi-
EXEL DISCRIPTIVE
Confidence ation s in Equation 8.2. in place of s. This produces an approximate confidence interval
SOLUTION TRAIL
intervals—Z— for m. The next section presents an exact confidence interval for m when s is unknown.
summarized data
3. As the confidence coefficient increases (with s and n constant), the critical value za/2
increases. Therefore, the confidence interval is wider.
(18.12, 19.38) is a 95% confidence interval for the true mean weight (in pounds)
of 185/60/14 tires.
Figure 8.15 and 8.16 together show a technology solution.
There are two important ideas to remember when constructing a confidence interval.
1. The population parameter, in this case m, is fixed. The confidence interval varies
from sample to sample. It is correct to say, “We are 95% confident that the interval
captures the true mean m.” The statement, “We are 95% confident m lies in the
interval,” (incorrectly) implies that the interval is fixed and the parameter m varies.
2. The confidence coefficient, a probability, is a long-run limiting relative frequency.
In repeated samples, the proportion of confidence intervals that capture the true
value of m approaches 0.95. Figure 8.17 illustrates this concept. We cannot be
certain about any one specific confidence interval. The confidence is in the long-
run process.
This confidence
interval does not
capture the true
value of .
Repeated
confidence
intervals
In the following example, actual data are presented, rather than summary statistics.
Equation 8.2 is used again to construct a confidence interval for a population mean.
VIDEO Tech
Video TECH Manuals
MANUALS
EXEL DISCRIPTIVE
Confidence
Example 8.3 Vitality Snack
intervals—Z—with In May 2013, Mamma Chia introduced a new drink in four flavors called the Chia Squeeze
data
Vitality Snack. The product was advertised as a good source of fiber, containing fruit and
vegetables, no added sugar, gluten-free and vegan, and only 70 calories. The drink had a
Data Set
smoothie-like texture, and the company claimed each squeeze packet contained 1200 mg
vitality of omega-3.5 A random sample of Green Magic Vitality Snacks was obtained and the
SOLUTION TRAIL
Solution Trail 8.3a amount of omega-3 was carefully measured in each.The data are given in the table below.
Assume that s 5 12 mg.
K e y w ords
n 99% CI for the mean 1192 1200 1207 1185 1198 1194 1210 1197 1212 1209
n s 5 12 mg 1189 1202 1194 1196 1179 1191 1214 1197 1213 1213
n Random sample 1211 1193 1204 1187 1196 1194 1220 1193 1194 1194
Tr a nsl at i o n 1177 1181 1217 1213 1204 1197 1221 1198 1210 1183
n 99% CI for m
a. Find a 99% confidence interval for the mean amount of omega-3 in each Green Magic
n Known standard deviation
Vitality Snacks.
C onc e p ts b. Using the confidence interval in part (a), is there any evidence to suggest that the popu-
n A 100 ( 1 2 a ) % CI for a lation mean amount of omega-3 is different from 1200 mg?
population mean when s
is known
Solution
V isi on a. s 5 12 n 5 40 Given.
Construct a 99% CI for m. No
1
information is given about the x5 ( 1192 1 c1 1183 ) 5 1199.47 Compute the sample mean.
shape of the underlying 40
population distribution. 1 2 a 5 0.99 1 a 5 0.01 1 a /2 5 0.005 Find a/2.
However, n 5 40 ( $30 ) , so the
P ( Z $ za/2 ) 5 P ( Z $ z0.005 ) 5 0.005 Definition of critical value.
central limit theorem applies,
and we can use Equation 8.2. P ( Z # z0.005 ) 5 1 2 0.005 5 0.995 The complement rule.
Solution Trail 8.3b (1194.58, 1204.36) is a 99% confidence interval for the true mean amount of omega-3
in a Green Magic Vitality Snack. Figure 8.18 shows a technology solution.
K e y w ords
n Is there any evidence? b. Claim: m 5 1200.
Experiment: x 5 1199.47.
Tr a nsl at i o n
n Inference procedure Likelihood: The likelihood in this case is expressed as a 99% confidence interval, an
interval of likely values for m: (1194.57, 1204.37).
C onc e p ts
Conclusion: This CI includes 1200, so there is no evidence to suggest that m is
n A 100 ( 1 2 a ) % CI for a
different from 1200.
population mean when s is
known
Try It Now Go to Exercise 8.35
V isi on
A 99% CI for m is an interval in Here is one more example involving a data set instead of summary statistics.
which we are 99% confident that
the true value of m lies. If the CI Example 8.4 Health Canada
in part (a) captures, or includes,
In April 2013, the second set of biomonitoring data from the Canadian Health Measures
1200, then there is no evidence
Survey was released. This report contained information on 91 chemicals and was designed
to suggest that m is different
from 1200. Use the four-step to track levels over time in order to assess certain regulations and health-risk management
inference procedure. actions. 100% of Canadians tested in this survey, aged 3 to 79, had lead in their blood.
However, almost all had blood lead levels lower than the current intervention level.6 A
346 C hapt er 8 Confidence Intervals Based on a Single Sample
Data Set random sample of 50 Canadians was obtained and the blood lead level in each was recorded,
healthca in mg/dL. The data are given in the following table. Assume that s 5 0.3 mg/dL.
1.6 1.3 1.6 1.3 1.3 1.2 1.2 1.3 1.0 1.3 1.1 0.7 1.2
SOLUTION TRAIL
1.3 1.3 1.5 0.9 1.3 1.1 1.0 1.5 1.7 1.3 1.6 0.8 1.1
1.5 1.3 1.4 0.9 1.0 1.5 1.2 1.3 1.5 1.3 1.6 1.4 1.9
0.8 0.9 1.6 1.4 1.3 1.2 1.0 0.7 1.2 1.5 0.8
Find a 98% confidence interval for the true mean blood lead level in Canadians.
Solution Trail 8.4
Solution
Ke ywor ds
Step 1 s 5 0.3 n 5 50 Given.
n 98% CI for the true mean
1
n s 5 0.3 x5 ( 1.6 1 c1 0.8 ) 5 1.254 Compute the sample mean.
50
n Random sample
1 2 a 5 0.98 1 a 5 0.02 1 a /2 5 0.01 Find a/2.
Transl at i on
P ( Z $ za/2 ) 5 P ( Z $ z0.01 ) 5 0.01 Definition of critical value.
n 98% CI for m
n Known standard deviation P ( Z # z0.01 ) 5 1 2 0.01 5 0.99 The complement rule.
Many confidence intervals have endpoint formulas that are of the same general form.
=
Suppose the statistic u is used to estimate the population parameter u. The general form
of a confidence interval is
= =
( Point estimate using u ) 6 ( critical value ) # 3 ( estimate of ) standard deviation of u 4 .
The critical value may be from the standard normal distribution or some other reference
distribution. If the actual standard deviation of the statistic is not known (a likely situa-
tion), then an estimate is used.
The (two-sided) confidence interval derived above is the most common CI. However,
there are other two-sided confidence intervals, and also one-sided confidence intervals
consisting of either only a lower bound or only an upper bound. The derivation of several
one-sided confidence intervals is outlined in the Challenge problems.
8.2 A Confidence Interval for a Population Mean When s Is Known 347
The error of estimation is the step, Most of the time, as in the examples above, there is no control over the sample size.
the distance from the sample The data or summary statistics are presented, and a confidence interval is constructed.
mean to the left and to the right However, a certain desired accuracy may be expressed by the width of a confidence inter-
endpoint of the confidence val. Given s and the confidence level, a sample size n can be computed such that the
interval. We would like to find a resulting confidence interval has a certain desired width.
sample size n so that the step is Suppose n is large (but unknown), s is known, and the confidence level is 100 ( 1 2 a ) %.
no bigger than B. Therefore, B is If the desired width is W, let B 5 W/2, called the bound on the error of estimation. Half
called a bound on the error of the width of the confidence interval is given by the step (in each direction) from x. The
estimation. endpoints of the confidence interval for m are (from Equation 8.2)
s
x 6 za / 2
!n
B 5 bound
Let B 5 za/2 ( s / !n ) , and solve for n. The resulting formula for n is given by
s za / 2 2
n5a b (8.3)
B
A Closer L ok
In symbols, 1. Consider the effect on n as one value (on the right-hand side of Equation 8.3)
sc 1 nc changes while the other two remain constant.
za/ 2 c 1 n c As s increases (with za/2 and B constant), the fraction is larger, the square is larger, and n
is larger. This result makes sense, because a larger underlying variance would require a
BT 1 nc larger sample size to preserve a certain bound on the error of estimation.
As za/2 increases (equivalently, as the confidence level increases, with s and B con-
SOLUTION TRAIL
VIDEO Tech
Video TECH Manuals
MANUALS stant), again the fraction is larger, the square is larger, and n is larger. More confidence
EXEL DISCRIPTIVE
Sample size requires a larger n to maintain the same bound on the error of estimation.
calculation As B decreases (with s and za/2 constant), the denominator is smaller, which makes
the fraction larger, the square larger, and n larger. A smaller bound on the error of
estimation (i.e., a smaller CI) means more information (a larger sample size) is needed
Solution Trail 8.5 to maintain the same confidence level.
2. When Equation 8.3 is applied, it is likely that the value of n will not be an integer.
K e y w ords Always round up. This guarantees a 100 ( 1 2 a ) % confidence interval with a bound
n 95% CI for the mean on the error of estimation of at most B. Any larger sample size will also be sufficient.
n Bound of 5 feet, standard 3. It is unlikely that s, needed in Equation 8.3, is known. However, often an experi-
deviation 15 feet menter can make a very good guess at s from previous experience or a small
n How large a sample? preliminary study.
Tr a nsl at i o n
n B 5 5, s 5 15 The following example illustrates the use of Equation 8.3.
Find a value of n
n
Example 8.5 Geyser Height
C onc e p ts Yellowstone National Park, which opened in 1872, includes magnificent scenery, historic
n Equation 8.3 for n sites, attractions of scientific interest, and recreational areas. The Cliff Geyser is located
V isi on
in the Black Sands Basin on Iron Creek. The time between eruptions varies considerably,
from a few minutes to several hours.7 A National Park Service researcher would like to
The researcher is interested in a
95% CI for a population mean find a 95% confidence interval for the mean height of the Cliff Geyser eruption with a
with a certain bound on the error bound on the error of estimation of five feet. Previous experience suggests that the popu-
of estimation. There is a good lation standard deviation for the height is approximately 15 feet. How large a sample is
estimate for the population necessary to achieve this accuracy?
standard deviation. Find the
appropriate critical value, and Solution
use Equation 8.3 to determine a
Step 1 s 5 15 B 5 5 Given.
lower bound for the sample size.
1 2 a 5 0.95 1 a /2 5 0.25 1 z0.025 5 1.96 Find the critical value.
348 C h a p t e r 8 Confidence Intervals Based on a Single Sample
The necessary sample size is n $ 35 (always round up). This will guarantee that a 95%
confidence interval for the population mean height will have a bound on the error of
estimation no greater than five.
Technology Corner
Procedure: Construct a confidence interval for a population mean when s is known.
Reconsider: Example 8.2, solution, and interpretations.
CrunchIt!
Use the built-in function z 1-Sample. Input is either summary statistics or a column containing the sample data.
1. Select Statistics; z; 1-sample.
2. Under the Summarized tab, enter values for n, the Sample Mean, the Standard Deviation, and the Confidence Interval
Level (Figure 8.20). Click Calculate. The results are displayed in a new window (Figure 8.21).
TI-84 Plus C
Use the built-in function ZInterval. Input is either summary statistics or a list containing the data.
1. Select STAT ; TESTS; ZInterval.
2. Highlight Stats (for summary statistics). Enter values for s, x, n, and the C-Level (the confidence coefficient). See
Figure 8.15.
3. Highlight Calculate and press ENTER . See Figure 8.16.
8.2 A Confidence Interval for a Population Mean When s Is Known 349
Minitab
Use the built-in function 1-Sample Z. Input is either summary statistics or a column containing data.
1. Select Stat; Basic Statistics; 1-Sample Z.
2. Choose Summarized data and enter the Sample size, Mean, and Standard deviation.
3. Select the Options button and enter the confidence level. Click OK. The results are displayed in the session window.
See Figure 8.22.
Excel
There is no built-in function to compute the endpoints of a confidence interval for a population mean when s is known.
Use the function CONFIDENCE.NORM to find the step (za/2 # s / !n) in each direction from the mean, and the function
AVERAGE to find the sample mean, if necessary.
1. Use the command CONFIDENCE.NORM to find the step. The arguments are a, s, and n.
2. Compute the left and right endpoints of the confidence interval. See Figure 8.23.
8.29 Two statisticians are given the same data from an inseam on pants and letting out a sports coat around the waist.
experiment. Each uses these data to construct a confidence A random sample of each type of alteration was obtained, and
interval for the true population mean m. The resulting CIs the summary statistics are given in the table below. Measure-
are (8.55, 10.85) and (8.40, 11.0). ments are in inches.
a. What is the value of the sample mean x?
b. One CI has a confidence level of 95%, and the other has a Sample Sample Assumed
confidence level of 99.9%. Match the confidence level with Alteration size mean s
the confidence interval, and justify your answer. Pants inseam 33 0.74 0.22
8.30 In each of the following problems, the population Sports coat waist 42 1.05 0.37
standard deviation, the bound on the error of estimation, and
the confidence level are given. Find a value for the sample size a. Find a 95% confidence interval for the mean alteration of
n necessary to satisfy these requirements. the inseams on men’s pants.
a. s 5 7.9, B 5 2.5, 95% b. How large a sample is necessary for the bound on the
b. s 5 10.77, B 5 5, 99% error of estimation to be 0.05 for the confidence interval
c. s 5 0.55, B 5 0.001, 98% in part (a)?
d. s 5 35.97, B 5 3.5, 95% c. Find a 90% confidence interval for the mean alteration of
e. s 5 55, B 5 2, 99.9% the waists of sports coats.
SOLUTION TRAIL
A random sample of 35 koalas was obtained, and each was material that forms in flues due to the condensation of certain
carefully weighed. The mean weight was x 5 20.75 pounds. gases. In an effort to promote fireplace and wood-stove safety, a
Assume s 5 3.05 pounds. town in Vermont has started a new inspection program. Forty
a. Find a 99% confidence interval for the mean weight of all homes with wood stoves were selected at random, and the
koalas. amount of creosote buildup one foot from the top of the
b. Researchers believe that if the true mean weight is less than chimney was carefully measured. The resulting sample mean
23 pounds, these animals may be suffering from malnutrition. thickness was x 5 0.131 inch. Assume s 5 0.02 inch.
Is there any evidence to suggest that the koala population is a. Find a 95% confidence interval for the true mean thickness
suffering from malnutrition? Justify your answer. of creosote buildup one foot from the top of chimneys.
b. One-eighth of an inch of creosote buildup is considered safe.
8.38 Technology and the Internet Many companies are
If there is evidence to suggest that the true mean thickness is
increasingly concerned about computer security and employees
greater than 18 inch, the town will embark on an extensive
using their computers for personal use. A random sample of
safety program. Using the interval constructed in part (a),
50 employees of Liberty Mutual Insurance Company was asked
should the town stress greater safety? Justify your answer.
whether they use the Internet for nonbusiness (personal) purposes
during the day. The number of hours for each employee is given 8.42 Physical Sciences According to the U.S. National
on the text website. Assume s 5 0.5 hours. INTERNET Weather Service, at any given moment of any day, approximately
a. Find a 95% confidence interval for the mean number of 2000 thunderstorms are occurring worldwide. Many of these
hours all employees at Liberty Mutual use the Internet for storms include lightning strikes. Sensitive electronic equip-
personal reasons. ment is used to record the number of lightning strikes
b. Using the confidence interval in part (a), is there any evidence worldwide every day. Twelve days were selected at random,
to suggest that the mean time spent using the Internet for and the number of lightning strikes on each day was
personal reasons is more than 1 hour? Justify your answer. recorded. The sample mean was x 5 8.6 million. Assume the
c. How large a sample is necessary for the bound on the error distribution of the number of lightning strikes per day is
of estimation to be 0.1 hours? normal with s 5 0.35 million.
d. Do you think the underlying distribution of time spent on the a. Find a 99% confidence interval for the mean number of
Internet for personal use is normal? Explain your answer. lightning strikes per day worldwide.
b. Do you think the normality assumption in this problem is
8.39 Sports and Leisure A random sample of professional
reasonable? Why or why not?
wrestlers was obtained, and the annual salary (in dollars) for
each was recorded. The summary statistics were x 5 47,500 8.43 Technology and the Internet In early 2013, Cree, Inc.,
and n 5 18. Assume the distribution of annual salary is normal introduced a new, cheap LED light that uses much less electricity
with s 5 8,500. and is designed to last for years. These new bulbs are replace-
a. Find a 90% confidence interval for the true mean annual ments for traditional incandescent bulbs, and the company claims
salary for all professional wrestlers. that its 60-watt warm white bulb has a brightness of 800 lumens.13
b. How large a sample is necessary for the bound on the A random sample of 45 60-watt warm white bulbs was obtained
error of estimation to be 3000? and the brightness of each was carefully measured. The sample
c. How large a sample is necessary for the bound on the mean was x 5 798 lumens. Assume s 5 12.2 lumens.
error of estimation to be 1000? a. Find a 95% confidence interval for the true mean
brightness for the 60-watt warm white bulb.
8.40 Travel and Transportation The Boeing 747-8 is a
b. Using your answer in part (a), is there any evidence to
technologically advanced aircraft with a new wing design and
suggest that the mean brightness is less than 800 lumens?
GEnx-2B engines, and is capable of holding more passengers
Justify your answer.
and payload than previous versions of the 747. In addition, this
c. Suppose s 5 5.2. Find a 95% confidence interval for the
plane has better fuel efficiency than the 747-400 and the A380,
true mean brightness for the 60-watt warm white bulb. In
which means lower carbon emissions.12 A random sample of
this case, is there any evidence to suggest that the true mean
twelve 747-8 test flights was obtained and the carbon emission
brightness is less than 800 lumens? Justify your answer.
was measured on each (in grams CO2 per seat-km). The sample
mean was 72. Assume the carbon emission distribution is
normal and s 5 8.7. Extended Applications
a. Find a 99% confidence interval for the true mean carbon
emission of all 747-8 planes. 8.44 Biology and Environmental Science Peregrine falcons
b. The carbon emission for the A380 is 80 g CO2 per were placed on the U.S. Endangered Species list in the 1970s.
seat-km. Is there any evidence to suggest that the mean Their population dwindled primarily as a result of the introduc-
carbon emission for the 747-8 is lower than that of the tion of new chemicals and pesticides. Although it may seem like
A380? Justify your answer. a peculiar nesting area, there are many falcon pairs in the New
York City area. The tall buildings and bridges provide lots of
8.41 Public Policy and Political Science Many chimney open space for hunting. These birds have a wingspan of 3.3 to 3.6
fires are caused by a buildup of creosote, a highly flammable feet and weigh between 18.8 and 56.5 ounces. A random sample
352 C hapte r 8 Confidence Intervals Based on a Single Sample
of 35 falcons was obtained and each was carefully weighed. The a. Find a 95% confidence interval for the true mean coping
sample mean was x 5 47.3 ounces. Assume s 5 6.2 ounces. skills level for all male athletes in each sport.
a. Find a 95% confidence interval for the true mean weight b. Use your answers to part (a) to determine whether there is
of peregrine falcons. any evidence to suggest that the mean coping skills level
b. Using your answer from part (a), is there any evidence for male basketball players is different from that of male
to suggest that the mean weight is less than 48 ounces? football players. Justify your answer.
c. How large a sample is necessary for the bound on the error
8.45 Economics and Finance Representative Bill Cassidy
of estimation for each confidence interval in part (a) to be 2?
has become concerned about the disparity in home prices in
Louisiana’s 6th district (Monroe). A random sample of homes 8.48 Sports and Leisure Two competing ski slopes in
sold within the past year in each of two parishes was obtained. Colorado advertise their powder base each day in an effort to
The summary statistics are given in the table below. Prices are attract more skiers. A random sample of the powder base depth
in dollars. (in inches) was obtained for each ski resort on days during a
recent winter. Assume sB 5 2.5 and sV 5 2.7. POWDER
Sample Sample Assumed
a. Find a 99% confidence interval for the true mean powder
Parish size mean s
depth at Breckenridge Ski Resort.
East Feliciana 30 125,200 5,750 b. Find a 99% confidence interval for the true mean powder
Iberville 36 155,900 25,390 depth at Vail Ski Resort.
c. Use the results in parts (a) and (b) to determine whether
a. Find a 99% confidence interval for the mean selling price there is any evidence to suggest a difference in the true
of all homes in East Feliciana Parish. mean powder depths at these two ski resorts.
b. Find a 99% confidence interval for the mean selling price d. What assumptions were necessary to construct the
of all homes in Iberville Parish. confidence intervals in parts (a) and (b)?
c. Using the confidence intervals in parts (a) and (b), is there
8.49 Public Health and Nutrition The nutritional value
any evidence to suggest that the mean selling price is
of every food product sold in the United States is listed on the
different for the two parishes? Justify your answer.
package in terms of protein, fat, etc. A researcher randomly
8.46 Sports and Leisure There are three main types of selected 50 one-ounce samples of various nuts and carefully
exercises: range-of-motion (flexibility), strengthening, and measured the amount of protein (in grams) in each sample.
endurance. A random sample of people who exercise regularly The sample mean amount of protein and the assumed standard
was obtained. The type of exercise and length (in minutes) of deviation for each type of nut are given in the table below.14
each workout was recorded. The table below summarizes the Sample Assumed
information obtained. Nut mean s
Exercise Sample Sample Assumed
Cashew 5.17 0.40
type size mean s
Filbert 4.24 0.60
Range-of-motion 65 25.2 5.2 Pecan 2.60 0.95
Strengthening 32 73.6 10.7
Endurance 40 82.2 12.5 a. Find a 95% confidence interval for the true mean amount
of protein in each type of nut. Based on these intervals, is
a. Construct a 99% confidence interval for the mean there any evidence to suggest that the mean amount of
workout length for each of the three types of exercises. protein is different for cashews and pecans? How about
b. A group of exercise scientists claims the length of a filberts and pecans?
workout for range-of-motion exercises is the same as for b. Answer part (a) using a sample size for each nut of 18.
the other two types. Based on the confidence intervals in 8.50 Business and Management Snail farming, or
part (a), is there any evidence to refute this claim? heliciculture, has become more popular and lucrative in many
8.47 Psychology and Human Behavior A random sample parts of the world. In April 2013, John Morton, head of U.S.
of male college athletes was obtained and their coping skills Immigration and Customs Enforcement, resigned with plans to
levels were measured by means of an extensive psychological open a snail farm.15 The Helix pomatia, or escargot snail, grows
profile. Each athlete was also classified by sport. The table to full size in 2 to 3 years and weighs 15 to 25 grams. A random
below summarizes the information obtained. sample of snails of this type was obtained from a farm near the
town of Targovishte, Bulgaria. Each snail was carefully
Sample Sample Assumed weighed (in ounces). Assume s 5 2.4 grams. SNAILS
Sport size mean s a. Find a 99% confidence interval for the true mean weight
Football 35 65.77 14.07 of these snails.
Basketball 30 53.90 12.50 b. Using your answer in part (a), is there is any evidence
to suggest that the population mean weight is more than
Hockey 32 68.45 10.25
20 grams?
8.3 A Confidence Interval for a Population Mean When s Is Unknown 353
c. How large a sample size is necessary for the bound on the was obtained. The sample mean was x 5 259.79. Assume and
error of estimation to be 0.5 gram? s 5 7.5 the population distribution of winning times is normal,
and find a one-sided 99% confidence interval, bounded below,
for the true mean winning time.
Challenge
8.53 The “Best” CI There are actually infinitely many
8.51 One-Sided Confidence Intervals Given a random different confidence intervals for a population mean m. For
sample of size n from a population with mean m, assume the example, suppose that, in a random sample of size n from a
underlying population is normal or n is large, and the popula- population with mean m, the underlying population is normal or
tion standard deviation s is known. n is large, and the population standard deviation s is known.
a. Manipulate the probability statement One could start with a probability statement of the form
X 2m X 2m
Pa . 2za b 5 1 2 a P a2z3a/4 , , za / 4 b 5 1 2 a
s/ !n s/ !n
to find a one-sided 100 ( 1 2 a ) % confidence interval for to find a 100 ( 1 2 a ) % confidence interval for m. Why is the
m. That is, find an upper bound for the population mean m. traditional two-sided confidence interval (presented in this
b. Manipulate a similar probability statement to find a chapter) the best CI for a parameter?
one-sided 100 ( 1 2 a ) % confidence interval for m
8.54 The Real Meaning of a 95% CI In the Confidence
bounded below.
c. First-class mail in the United States includes personal
Intervals applet, select a 95% Confidence Level (C). Click the
correspondence and all kinds of bills. Each piece of first-class Sample 50 button to generate 50 simple random samples. The
mail must weigh less than 13 ounces. In a random sample of applet will construct the confidence interval associated with
25 first-class letters, the sample mean weight was x 5 2.2 each sample and display a visual representation of each CI with
ounces. Assume the population is normal and s 5 0.75. a line centered at the sample mean. Lines in red do not capture
Find a 95% one-sided confidence interval bounded above for the true population mean.
a. Describe any pattern in the confidence intervals.
the population mean weight of first-class letters.
b. Click the Sample 50 button repeatedly and observe the
8.52 Sports and Leisure Steeplechase horseraces are run on Percent hit. This indicates the percentage of CIs that
courses that include obstacles such as brush fences, stone walls, captured the true population mean. Describe the behavior
timber walls, and water jumps. A random sample of 17 winning of the percent hit as the total CIs generated increases.
times (in seconds) in the 2 38 -mile Saratoga Steeplechase Race Justify your answer.
The t distribution is closely related to the standard normal distribution. Most continu-
ous random variables are defined by a probability density function, but it is only impor-
tant here to understand the properties of a t distribution.
The t distribution was derived by Properties of a t Distribution
William Gosset in 1908. Working
1. A t distribution is completely determined (characterized) by only one parameter, v,
for Guinness Breweries, he
called the number of degrees of freedom (df). n must be a positive integer, n 5 1, 2,
published his result using the
3, 4, c, and there is a different t distribution corresponding to each value of v.
pseudonym Student. For this
reason, the distribution is still often 2. If T (a random variable) has a t distribution with v degrees of freedom, denoted T , tv, then
called “Student’s t distribution.” v
mT 5 0 and s2T 5 (v $ 3) (8.5)
v22
The symbol n is the Greek letter 3. The density curve for every t distribution is bell-shaped and centered at 0, but more
“nu.” It is often used to represent spread out than the density curve for a standard normal random variable, Z. As v
the number of degrees of increases, the density curve for T becomes more compact and closer to the density
freedom associated with a curve for Z. See Figure 8.24.
random variable.
Z N(0, 1)
T t15
T t5
Definition
ta,n is a critical value related to a t distribution with n degrees of freedom. If T has a t
distribution with n degrees of freedom, then P ( T $ ta,n) 5 a.
A Closer L ok
Remember, the a in ta,n is simply 1. For any t distribution, ta,n is simply a t value (a value on the measurement axis) such
a placeholder. Any symbol could that there is a of the area (probability) to the right of ta,n. The negative critical value
be used here to represent a is 2ta,n. Because the t distribution is symmetric, P ( T # 2ta,n ) 5 P ( T $ ta,n) 5 a.
right-tail probability. See Figure 8.25.
P(T $ ta,n) 5 a
t
Table V in the Appendix presents selected critical values associated with various t
distributions. Right-tail probabilities are in the top row and degrees of freedom are listed
in the left column. In the body of the table, ta,n is at the intersection of the a column and
the n row. The following example illustrates the use of this table for finding critical values
associated with a t distribution.
Solution
a. Using the following portion of Table V, find the intersection of the a 5 0.05 column
and the n 5 12 row.
a
n 0.20 0.10 0.05 0.025 0.01 0.005 0.001 0.0005 0.0001
( ( ( ( ( ( ( ( ( (
10 .8791 1.3722 1.8125 2.2281 2.7638 3.1693 4.1437 4.5869 5.6938
11 .8755 1.3634 1.7959 2.2010 2.7181 3.1058 4.0247 4.4370 5.4528
12 .8726 1.3562 1.7823 2.1788 2.6810 3.0545 3.9296 4.3178 5.2633
13 .8702 1.3502 1.7709 2.1604 2.6503 3.0123 3.8520 4.2208 5.1106
14 .8681 1.3450 1.7613 2.1448 2.6245 2.9768 3.7874 4.1405 4.9850
( ( ( ( ( ( ( ( ( (
T t12 T t21
Figure 8.26 Visualization of Figure 8.27 Visualization of Figure 8.28 Use the invT function
t0.05,12 5 1.7823. t0.01,21 5 2.5176. to find critical values associated with
a t distribution.
A Closer L ok
1. Table V is very limited. However, a calculator or computer can find almost any critical
value needed.
Even for n 5 40, ta,n < za. 2. In Table V, as n increases, the t critical values approach the corresponding Z critical values.
This numerical observation is analogous to the graphical comparison in Figure 8.24. The
density curve for T approaches the density curve for Z as n increases.
356 C h a p t e r 8 Confidence Intervals Based on a Single Sample
8.1 Theorem
Let X be the mean of a random sample of size n from a normal distribution with mean m.
The random variable
X 2m
Even though n is in the T5 (8.6)
S/ !n
denominator, n 2 1 degrees of
freedom is correct! has a t distribution with n 2 1 degrees of freedom.
This theorem is used to construct a confidence interval for m. As in Section 8.2, start
in the appropriate t world. Find a symmetric interval about 0 such that the probability T
lies in this interval is 1 2 a.
P ( 2ta/2,n21 , T , ta/2,n21) 5 1 2 a
X 2m
Pa2ta/2,n21 , , ta/2,n21b 5 1 2 a (8.7)
S/ !n
Manipulate Equation 8.7 to obtain the probability statement
S S
Pa X 2 ta/2,n21 , m , X 1 ta/2,n21 b 5 1 2 a (8.8)
!n !n
This probability statement leads to the following general result.
A Closer L ok
1. Equation 8.9 can be used with any sample size n ( $ 2 ) and produces an exact (not an
approximate) confidence interval for m.
2. This confidence interval for m (Equation 8.9) is valid only if the underlying population
is normal.
n Normally distributed ta/2,n21 5 t0.025,10 5 2.2281 Use Table V in the Appendix with n 5 10.
n s 5 27.6
n Random sample Step 2 Use Equation 8.9.
s
T ra nsl ati o n x 6 ta/2,n21 Equation 8.9.
!n
n 95% CI for m
Underlying population is normal s
n 5 x 6 t0.025,10 Use the values of a and n.
n s is unknown !n
27.6
C oncepts 5 980.2 6 ( 2.2281 ) Use summary statistics and value for t0.025,10.
n A 100 ( 1 2 a ) % CI for a !11
population mean when s is 5 980.2 6 18.54 Simplify.
unknown 5 ( 961.66, 998.74 ) Compute endpoints.
V isi on
(961.66, 998.74) is a 95% confidence interval for the true mean oil yield (in grams)
We need a 95% CI for m. The
through distillation in a typical batch.
population is normal, and s
is unknown. Find the appropriate Figures 8.29 and 8.30 together show a technology solution.
t critical value, and use
Equation 8.9.
3.4 3.5 3.0 3.4 3.3 3.5 3.3 3.0 3.5 3.0 2.7 3.1
3.8 3.1 3.3 3.1 3.1 2.9 2.5 3.1 3.0 3.6 3.2 3.2
358 C h a p t e r 8 Confidence Intervals Based on a Single Sample
a. Find a 99% confidence interval for the true mean amount of time Americans spend
SOLUTION TRAIL social networking each day. Assume that the underlying population is normal.
b. Using the confidence interval constructed in part (a), is there any evidence to suggest
that the true mean amount of time spent social networking each day is different from
3.2 hours? Justify your answer.
Concepts
Use Equation 8.9.
n A 100 ( 1 2 a ) % CI for a
population mean when s is s
x 6 ta/2,n21 Equation 8.9.
unknown !n
Vi sion s
5 x 6 t0.005,23 Use the values of a and n.
Construct a 99% CI for m. The !n
underlying distribution is 0.2903
assumed to be normal, and s is 5 3.19 6 ( 2.8073 ) Use summary statistics and value for t0.005,23.
!24
unknown. Use Equation 8.9.
5 3.19 6 0.1664 Simplify.
SOLUTION TRAIL
(3.02, 3.36) is a 99% confidence interval for the true mean amount of time
Americans spend social networking each day. Figure 8.31 and 8.32 show technology
solutions.
Tr anslat i on
n Inference procedure
Figure 8.31 TI-84 Plus C Figure 8.32 JMP confidence interval.
Con cepts
TInterval.
n A 100 ( 1 2 a ) % CI for a
population mean when s is
unknown b. Claim: m 5 3.2.
Experiment: x 5 3.19.
Vi sion
The CI constructed in part (a) is Likelihood: The likelihood is expressed as a 99% confidence interval, an interval of
an interval in which we are 99% likely values for m: (3.02, 3.36), from part (a).
confident the true value of m lies. Conclusion: The claimed amount of time spent social networking, 3.2 hours, is
If 3.2 lies in this interval, there is included in this confidence interval. There is no evidence to suggest m is different
no evidence to suggest m is from 3.2.
different from 3.2. Use the
four-step inference procedure.
Try It Now Go to Exercise 8.71
8.3 A Confidence Interval for a Population Mean When s Is Unknown 359
A Closer L ok
1. Technology solutions can find the appropriate t critical value for any sample size and
any confidence level. If you must use Table V to find a critical value for a value of n
and/or a value of a not listed, use linear interpolation.
Recall that interpolation is a method of approximation. It is often used to estimate a
value at a position between two given values in a table. Linear interpolation assumes
the two known values lie on a straight line and was discussed in Chapter 6.
2. Sample size calculation is more complicated in this case. s is unknown and the critical
value, ta/2,n21, depends on n. Equation 8.3 is often used with an estimate for s and the
assumption that za/2 < ta/2,n21.
Technology Corner
Procedure: Construct a confidence interval for a population mean when s is unknown.
Reconsider: Example 8.8, solution, and interpretations.
CrunchIt!
Use the function t; 1-sample. Input is either summary statistics or a column containing data.
1. Enter the data in column Var1.
2. Select Statistics; t; 1-sample.
3. Under the Columns tab, enter the name of the column containing the data.
4. Under the Confidence Interval tab, enter the confidence level. Click Calculate. The confidence interval is displayed in a
new window. See Figure 8.33.
TI-84 Plus C
Use the calculator function TInterval. Input is either summary statistics or a list containing data.
1. Enter the data into list L1.
2. Select STATS ; TESTS; TInterval.
3. Highlight Data. Enter the name of the list L1, set the frequency to 1, and enter the C-Level (the confidence coefficient).
4. Highlight Calculate and press ENTER . The confidence interval and summary statistics are displayed on the Home
Screen. See Figure 8.31.
Minitab
Use the function 1-Sample t. Input is either summary statistics or a column containing data.
1. Enter the data into column C1.
2. Select Stat; Basic Statistics; 1-Sample t.
3. Choose One or more samples, each in a column, and enter C1 in the entry window.
360 C h a p t e r 8 Confidence Intervals Based on a Single Sample
4. Select the Options option button and enter the confidence level. The results are displayed in a session window. See Figure 8.34.
Excel
There is no built-in function to compute the endpoints of a confidence interval for a population mean when s is unknown. Use the
function CONFIDENCE.T to find the bound on the confidence interval and the functions AVERAGE and STDEV.S if necessary.
1. Enter the data into column A.
2. Compute the sample mean and the sample standard deviation.
s
3. Find the bound on the confidence interval using the function CONFIDENCE.T. This is the value ta/2,n21 , half the
!n
width of the CI. The arguments are a, x, and n.
4. Compute the left and right endpoints of the confidence interval. See Figure 8.35.
8.65 In each of the following problems, the sample mean, the b. Interpret the confidence interval found in part (a).
sample standard deviation, the sample size, and the confidence c. Suppose a 95% confidence interval for the true mean
level are given. Assume the underlying population is normally wash cycle time is constructed. Would this interval be
distributed. Find the associated confidence interval for the smaller or larger than the interval in part (a)? Justify
population mean. your answer.
a. x 5 0.234, s 5 0.081, n 5 16, 95%
8.69 Biology and Environmental Science For the week
b. x 5 259.6, s 5 76.9, n 5 26, 99%
ending 5/29/13, the Iowa Department of Agriculture reported
c. x 5 22.85, s 5 7.19, n 5 27, 99%
the mean weight of barrows and gilts (young male and female
d. x 5 380.9, s 5 28.4, n 5 21, 95%
hogs) as 275.4 pounds.17 To check this claim, a random
e. x 5 88.1, s 5 17.45, n 5 19, 99.9%
sample of hogs was obtained and each was carefully
8.66 In each of the following problems, put the values in order weighed. The weight (in pounds) of each is given in the
from smallest to largest. (Note: There is no need to use a table following table. HOGS
or technology here. Use your knowledge of the t distribution
and the Z distribution.)
269.7 253.2 277.3 264.1 264.2 266.7
a. t0.01,5, t0.01,27, z0.01, t0.01,17
b. t0.025,13, t0.025,11, t0.025,45, z0.025 267.4 268.3 268.2 272.4 259.7 263.3
c. t0.05,15, t0.001,15, t0.02,15, t0.025,15
d. t0.0001,21, t0.1,21, t0.005,21, t0.05,21 a. Assuming the underlying distribution of barrow and gilt
e. t0.10,6, t0.001,26, t0.05,17, z0.0001 weights is normal, find a 95% confidence interval for the
true mean weight of barrows and gilts.
b. Using your answer in part (a), is there any evidence to
Applications
suggest that the claim ( m 5 275.4 ) is wrong? Justify
8.67 Manufacturing and Product Development During your answer.
the manufacture of certain commercial windows and doors,
8.70 Physical Sciences The Earth is structured in layers:
hot steel ingots are passed through a rolling mill and flattened
crust, mantle, and core. A recent study was conducted to
to a prescribed thickness. The machinery is set to produce a
estimate the mean depth of the upper mantle in a specific
steel section 0.25 inch thick. Fourteen steel sections were
farming region in California. Twenty-six sample sites were
selected at random and the thickness of each was recorded.
selected at random, and the depth to the upper mantle was
The data are given in the table below.
measured using changes in seismic velocity and density. The
summary statistics are x 5 127.5 km and s 5 21.3 km.
0.213 0.298 0.236 0.324 0.254 0.271 0.204 Suppose the depth of the upper mantle is normally distrib-
SOLUTION TRAIL
0.252 0.307 0.297 0.301 0.291 0.222 0.246 uted. Find a 95% confidence interval for the true mean depth
of the upper mantle in this farming region, and interpret
your result.
Assume the underlying distribution of section thickness is
normal. 8.71 Fuel Consumption and Cars In many rural
a. Find a 99% confidence interval for the true mean steel areas, newspaper carriers deliver morning papers using
section thickness. their automobiles because the length of the route prohibits
b. As the rollers erode, the machine begins to produce walking. In a random sample of 28 carriers who use their
steel sections that are too thick. Using your answer to automobiles, the sample mean route length was x 5 16.7 miles
part (a), is there any evidence to suggest the true mean with s 5 3.4.
steel section thickness is more than 0.25 inch? Justify
SOLUTION TRAIL
a. If the distribution of route lengths is normal,
your answer. find a 95% confidence interval for the true mean
c. Use the methods described in Section 6.3, page 272, to route length of newspaper carriers who use their
check for any evidence of non-normality. automobiles.
b. If the mean length of the routes is over 20 miles, the
8.68 Manufacturing and Product Development
circulation department becomes concerned that papers
A typical washing machine has several different cycles,
will not be delivered by 7:00 a.m. Using your answer to
including soak, wash, and rinse. The energy consumption
part (a), is there any evidence to suggest the true mean
of a washing machine is linked to the length of each
route length is over 20 miles? Justify your answer. Write a
cycle. A random sample of 21 washing machines was
Solution Trail for this problem.
obtained, and the length (in minutes) of each wash cycle
was recorded. The summary statistics are x 5 37.8 and 8.72 Sports and Leisure The popular sport of mountain
s 5 5.9. Assume the underlying distribution of main wash biking involves riding bikes off-road, often over very
cycle times is normal. rough terrain. Trek is a leading manufacturer of mountain
a. Find a 90% confidence interval for the true mean wash bikes, specially designed for durability and performance.
cycle time. Write a Solution Trail for this problem. A random sample of Trek mountain bikes was obtained
362 C h a p t e r 8 Confidence Intervals Based on a Single Sample
and the weight (in kg) of each is given in the following c. If city buses in Atlanta consistently arrive more than five
table.18 MTNBIKE minutes late, the mayor receives a large number of phone
calls from irate citizens. Using your answer to part (a), is
there any evidence to suggest the mayor will be receiving
11.74 12.32 12.06 13.58 12.19 12.29 10.84 12.04 nasty phone calls? Justify your answer.
13.01 11.37 9.01 7.34 7.19 7.45 7.69
8.75 Business and Management Recently the Health
Services Purchasing Organization started a patient-focused
a. Assume the underlying distribution is normal. Find a 99% funding (PFF) program in British Columbia. One component of
confidence interval for the true mean weight of Trek this program included financial incentives for hospitals that
mountain bikes. operate more efficiently, which is associated with the patient
b. Suppose Trek claims to have the lightest mountain bikes average length of stay (ALOS). In a random sample of 25
on the market, with a mean weight of 10 kg. Using your patients with congestive heart failure at Vancouver Coastal
answer in part (a), is there any evidence to suggest this Health, the resulting mean ALOS (in days) was 9.9.19 The
claim is false? Justify your answer. standard deviation was s 5 3.7 days and assume that the
8.73 Technology and the Internet A computer supply underlying distribution is normal.
store sells a wide variety of generic replacement ink cartridges a. Find a 99% confidence interval for the true mean ALOS
for printers. A consumer group is concerned that the cartridges for patients with congestive heart failure.
may not contain the specified amount of ink (30 ml). A b. Suppose the Health Services Purchasing Organization
random sample of 17 black replacement cartridges was will award hospitals performance funding money if the
obtained, and the amount of ink (in ml) in each is given in mean ALOS for these patients is less than 12 days. Is
the table below. there any evidence to suggest that Vancouver Coastal
INK
Health will receive performance funding money?
Justify your answer.
30.27 29.70 29.35 29.08 29.74 29.26 29.50
8.76 Sports and Leisure One factor in rating a National
29.12 29.68 28.54 30.01 29.87 30.61 29.80
Hockey League team is the mean weight of its players.
29.33 29.21 28.84 A random sample of players from five teams was obtained.
The weight (in pounds) of each player was carefully
Assume the underlying distribution of ink amount is normal. measured, and the resulting data are given in the
a. Find a 95% confidence interval for the true mean amount following table.20
of ink in each black cartridge.
b. Is there any evidence to suggest the cartridges are Sample
underfilled? Justify your answer. Sample Sample standard
Team size mean deviation
8.74 Travel and Transportation Some municipal managers
complain a city bus route is the most difficult service to fulfill. Boston 15 200.0 10.5
Traffic jams and roadwork are often the cause of unreliable Detroit 18 201.2 11.8
service. In an effort to analyze current city bus transportation in NY Rangers 12 200.3 10.9
Atlanta, Georgia, a random sample of routes and stops was
Philadelphia 10 202.7 12.1
obtained. The number of minutes the bus was late (compared
with the posted route times) was recorded. The resulting data San Jose 15 210.7 14.2
are given in the table below. A value of 0.00 means the bus
arrived on time. CITYBUS a. Assuming normality, find a 95% confidence interval
for the true mean weight of players for each hockey
team.
10.29 0.00 0.00 9.96 2.10 0.00 9.52 b. The sample size for Boston and San Jose is the same.
1.83 2.37 1.47 0.00 7.98 5.19 0.40 Why is the 95% confidence interval for the weight of San
3.21 0.00 6.91 0.00 5.02 Jose players wider?
c. If the true mean weight for two teams is different, then it
is likely that there will be a more physical game when the
a. Assume the underlying distribution of late times is
two teams meet. Is there any evidence to suggest that the
normal. Find a 90% confidence interval for the true mean
true mean player weights are different from Boston and
number of minutes a city bus in Atlanta is late.
San Jose? Justify your answer.
b. Use the methods described in Section 6.3, page 272, to
determine whether there is any evidence to suggest that 8.77 Marketing and Consumer Behavior According to
the distribution of late times is non-normal. a recent survey, Americans are keeping their cars longer.21
8.3 A Confidence Interval for a Population Mean When s Is Unknown 363
Some of the reasons may include the poor economy and 8.80 Public Policy and Political Science Juvenile courts
better-built vehicles. A random sample of new- and used-car in all states maintain careful records for cases, including
owners was asked how long they usually kept their car demographics, charges, and dispositions. One variable of
(in months). CARS interest for repeat offenders is the number of days since
a. Find a 99% confidence interval for the true mean time the last offense (or arrest). A random sample of repeat
people keep a new car. offenders appearing in juvenile court was obtained for three
b. Find a 99% confidence interval for the true mean time states. The number of days since the last offense was
people keep a used car. recorded for each, and the summary statistics are given in
c. Using your answers in parts (a) and (b), do you think the table below.
new-car owners and used-car owners keep their cars for
different lengths of times? Justify your answer. Sample
Sample Sample standard
8.78 Economics and Finance Along with gold and silver,
State size mean deviation
platinum is also considered a precious metal and is traded on
the commodities market. A random sample of the price of Ohio 17 180.6 37.8
platinum (in dollars per troy ounce) was obtained from jewelers California 29 162.7 25.2
around the world. The resulting summary statistics were Massachusetts 22 115.3 17.6
x 5 664.50 , s 5 5.25, and n 5 55. Assume the underlying
distribution of the price is normal.
a. Find a 95% confidence interval for the current true mean a. Assume the underlying distribution of days since the last
price per troy ounce of platinum. offense is normal for each state. Find a 99% confidence
b. One month ago, a brokerage firm reported the true mean interval for the true mean number of days since the last
price of platinum to be $660.79 per troy ounce, with a offense for repeat offenders in each state.
recommendation to buy. Is there any evidence to suggest b. Using your answers to part (a), is there any evidence to
the true mean price has increased? Justify your answer. suggest the true mean number of days since the last
offense in Ohio and California is different? How about
California and Massachusetts? Justify each answer.
Extended Applications
8.81 Medicine and Clinical Studies Arrhythmia, or an
8.79 Psychology and Human Behavior The ambient irregular heart beat, may be caused by heart disease or by
temperature in which humans are comfortable varies with environmental factors such as stress, caffeine, tobacco, or
culture, activity, metabolic rate, psychological state, environ- even cold medicine. One type of arrhythmia is atrial
ment, and season. For most people in the United States, the flutter, in which the heart beats very fast, at over 250 beats
comfort zone is 68 to 78°F. During a recent winter, a random per minute. In a recent research study, randomly selected
sample of homeowners was selected from two different parts of patients identified to have atrial flutter were carefully
the country. The thermostat temperature setting (in °F) was monitored. The number of beats per minute for
recorded for each home, and the summary statistics are given in the most recent flutter for each patient, by sex, is
the following table. given on the text website. Assume the underlying
Sample distribution of beats per minute in atrial flutter patients
Sample Sample standard is normal. FLUTTER
Region size mean deviation a. Find a 98% confidence interval for the true mean beats
per minute in male atrial flutter patients.
New England 14 70.2 2.75
b. Find a 98% confidence interval for the true mean beats
South 11 72.1 1.55 per minute in female atrial flutter patients.
c. Using your answers to parts (a) and (b), is there any
a. Find a 95% confidence interval for the true mean
evidence to suggest the true mean beats per minute in
thermostat setting for New England homeowners during
atrial flutter patients is different for men and women?
winter.
Justify your answer.
b. Find a 95% confidence interval for the true mean
d. Do you think the normality assumption is reasonable in
thermostat setting for Southern homeowners during
this case? Why or why not?
winter.
c. What assumption(s) did you make in constructing these 8.82 Sports and Leisure The Wimbledon Championships
two confidence intervals? tennis tournament, where women players curtsy to the Queen
d. Using your answers to parts (a) and (b), is there any and fans eat strawberries and cream, finishes in the first week of
evidence to suggest the New England mean thermostat July. Each match is played on a grass court, as opposed to a
setting is different from the Southern thermostat setting clay or hard court. Some people believe the grass court is more
during winter? Justify your answer. challenging and speeds up the game. Random samples of match
364 C h apter 8 Confidence Intervals Based on a Single Sample
lengths (in minutes) from Wimbledon and from the U.S. Clay more about the approximately 1 million participants, a random
Court Championships were obtained. The data are summarized sample of bike enthusiasts was obtained. Some of the summary
in the table below. data are given in the table below.
Sample Sample
Court Sample Sample standard Sample Sample standard
type size mean deviation Variable size mean deviation
Grass 18 65.7 25.3 Age (males, years) 60 38.9 7.9
Clay 12 83.2 35.8 Age (females, years) 40 35.6 4.5
Distance traveled (miles) 75 257.5 56.8
Assume the underlying distribution of match lengths is normal
on grass and on clay. Assume the underlying distributions are normal.
a. Find a 99% confidence interval for the true mean match a. Find a 95% confidence interval for the true mean age of
length for grass courts at Wimbledon. men attending the rally and a 95% confidence interval for
b. Find a 99% confidence interval for the true mean match the true mean age of women attending the rally.
length for clay courts at the U.S. Clay Court b. Is there any evidence to suggest the mean age for men is
Championships. different from the mean age for women? Justify your answer.
c. Is there any evidence to suggest the mean match time for c. Find a 99% confidence interval for the true mean distance
grass courts and clay courts is different? Justify your traveled to the rally. Interpret this result.
answer.
8.86 Biology and Environmental Science There have
8.83 Public Health and Nutrition During a medical been many studies that suggest global climate change is
emergency, people who dial 911 expect an ambulance to arrive affecting the population of polar bears. One measure of the
quickly and personnel to provide vital care. Health insurance health of a polar bear population is weight, and the mean
companies believe there is a marked difference in the response weight for a male polar bear is approximately 900 pounds. A
time for rural areas versus cities because of differences in recent study suggests that the polar bears in the Chukchi Sea
coverage area and number of qualified paramedics. Random are doing very well, despite changes in the climate.24 A
samples of ambulance response times (in minutes) were random sample of 14 male polar bears from the Chukchi Sea
obtained for rural areas and cities. PARAMED
was obtained from a capture-and-release program. The sample
a. Assuming normality, find a 99% confidence interval for mean weight was x 5 1016 pounds . Assume the underlying
the true mean response time for rural-area ambulances. weight distribution is normal and that s 5 78.
b. Assuming normality, find a 99% confidence interval for a. Find a 99% confidence interval for the true mean weight
the true mean response time for city ambulances. of male polar bears in the Chukchi Sea.
c. Use the methods described in Section 6.3, page 272, to b. Using your answer from part (a), is there any evidence to
determine whether there is any evidence to suggest either suggest that the mean weight of polar bears in the Chukchi
response time distribution is non-normal. Sea is greater than 900 pounds? Justify your answer.
d. Is there any evidence to suggest the mean response time is
different for rural areas and cities? Justify your answer.
Challenge
8.84 Sports and Leisure In May 2013 the New York Post
8.87 Public Health and Nutrition Given a random sample of
published an article about a scheme by wealthy Manhattan
size n from a normal population with mean m (and s unknown):
mothers to avoid long wait times at Disney World.22
a. Use the method outlined in Exercise 8.51 to find a
Although the wait times can be long and the Disney VIP
one-sided 100 ( 1 2 a ) % confidence interval for m
service is expensive, the plot to cut in line has been difficult
bounded above, and another bounded below.
to prove. A random sample of wait times (in minutes) for the
b. A recent study suggests that drinks made with broccoli
Magic Kingdom, Epcot, and Animal Kingdom was
sprouts may filter out harmful chemicals from our
obtained.23 Assume each underlying distribution of wait
bodies.25 A new “super broccoli” has been developed
times is normal. WAITTIME
that contains two to three times more glucoraphanin,
a. Find a 95% confidence interval for the true mean wait
which may help to reduce the risk of certain kinds of
time for each park.
cancer. A random sample of the amount of
b. Is there any evidence that any of the parks have different
glucoraphanin in 17 super broccoli seeds was obtained.
true mean wait times? Justify your answer.
The sample mean was 21.6 mmol/g with a standard
8.85 Sports and Leisure The Jackpine Gypsies Motorcycle deviation of 3.2. Assume the underlying population
Club holds the Sturgis Rally in the Black Hills of South Dakota distribution is normal, and find a one-sided 99%
every summer. There are motocross, hill climb, and short-track confidence interval, bounded below, for the true mean
races, and plenty of parties and bikes. In an attempt to learn amount of glucoraphanin in a super broccoli seed.
8.4 A Large-Sample Confidence Interval for a Population Proportion 365
=
P2p
Z5 , N ( 0, 1 ) (8.11)
d
p(1 2 p)
Ä n
P ( 2za/2 , Z , za/2 ) 5 1 2 a
=
P2p
P ° 2za/2 , , za/2 ¢ 5 1 2 a (8.12)
p(1 2 p)
Ä n
A Challenge exercise in this Trying to sandwich p in Equation 8.12 is a little tricky because p appears in both the
section asks you to find a numerator and the denominator (inside a square root). Instead, because n is large, we
=
confidence interval for p without usually use the sample proportion, p, as a good estimate of p in the denominator.
=
using p in the denominator. Manipulating the inequality (inside the probability statement) leads to the following
general result.
Solution
Step 1 n 5 350, x 5 86. Given.
= 86
Step 2 The sample proportion: p 5 5 0.2457.
350
=
np 5 ( 350 )( 0.2457 ) 5 86 $ 5
JAXA/NASA
=
n ( 1 2 p ) 5 ( 350 )( 0.7543 ) 5 264 $ 5
=
Both inequalities are satisfied. P is approximately normal.
Step 3 1 2 a 5 0.99 1 a 5 0.01 1 a / 2 5 0.005 Find a/2.
(0.1864, 0.3050) is a 99% confidence interval for the true proportion, p, of days on
which there is at least one solar flare. Figures 8.37 and 8.38 show technology solutions.
Suppose n is large, the confidence level is 100 ( 1 2 a ) %, and B is the bound on the
error of estimation. Half the width of the confidence interval is the step in each direction
=
from p. Therefore, the bound on the error of estimation is
= =
p(1 2 p)
B 5 za / 2 # (8.14)
Å n
Solving for n yields
= = za / 2
2
n 5 p(1 2 p) a b (8.15)
B
Look carefully at this result. Although the mathematics is correct, there is a real problem
with Equation 8.15. The critical value, za/2, and the bound on the error of estimation, B, can
= =
be specified. But p is unknown. We do not know p until we have the sample size n (and x).
There are two solutions to this problem.
=
1. Use a reasonable estimate for p from previous experience. Researchers often conduct
=
many similar experiments over time and can make very realistic guesses for p.
=
For given za/2 and B, Equation 2. If no prior information is available, use p 5 0.5 in Equation 8.15. This produces a
8.15 is greatest (maximized) when very conservative, large value for n.
=
p 5 0.5.
In either case, it is unlikely that the value of n will be an integer. Always round up. This
guarantees the resulting confidence interval will have a bound on the error of estimation
of at most B. Any larger sample size will also suffice. The following example illustrates
the use of Equation 8.15.
Solution
=
VIDEO Tech
Video TECH Manuals
MANUALS a. Use p 5 0.1 and B 5 0.02 in Equation 8.15.
EXEL DISCRIPTIVE
Sample size 1 2 a 5 0.95 1 a /2 5 0.05 1 z0.025 5 1.96 Find the critical value.
calculation
= = za / 2
2
n 5 p(1 2 p) a b Equation 8.15.
B
1.96 2
5 ( 0.1 )( 0.9 ) a b Substitute given values.
0.02
5 ( 0.09 )( 98 ) 2 5 864.36 Computation.
= = za / 2
2
n 5 p(1 2 p) a b Equation 8.15.
B
1.96 2
5 ( 0.5 )( 0.5 ) a b Substitute given values.
0.02
5 ( 0.25 )( 98 ) 2 5 2401 Computation.
Technology Corner
Procedure: Construct a confidence interval for a population proportion.
Reconsider: Example 8.9, solution, and interpretations.
CrunchIt!
Use the built-in function Proportion 1-Sample. Input is either summary statistics or a single column of values in which one
is designated as a success.
1. Select Statistics; Proportion; 1-sample.
2. Under the Summarized tab, enter values for n, Successes, and the Confidence Interval Level. Click Calculate. The
results are displayed in a new window (Figure 8.37).
TI-84 Plus C
Use the built-in function 1-PropZInt.
1. Select STATS ; TESTS; 1-PropZInt.
2. Enter values for x, the number of success, n, the total number of trials, and C-Level, the confidence coefficient.
3. Highlight Calculate and press ENTER . The confidence interval and summary statistics are displayed on the Home
Screen. See Figure 8.36.
Minitab
Use the built-in function 1 Proportion. Input is either a sample in a column or summarized data.
1. Select Stat; Basic Statistics; 1 Proportion.
2. Choose Summarized data and enter the Number of events (the number of successes), and the Number of trials.
3. Select the Options option button. Enter the Confidence level and select Method: Normal appoximation. Click OK.
The results are displayed in a session window. See Figure 8.39.
water heaters includes a flame arrester. This device helps a. Find a 95% confidence interval for the true proportion of
to prevent flashback fires from flammable liquid vapor Americans in favor of limiting the number of
nearby. The Consumer Product Safety Commission would commercials shown during each children’s show.
like to estimate the proportion of homes affected by this b. Find a 95% confidence interval for the true proportion of
safety standard. Americans in favor of providing more adult-education
a. In a random sample of 575 homes, 235 had gas water programs.
heaters. Find a 90% confidence interval for the true c. Find a 95% confidence interval for the true proportion of
proportion of homes with gas water heaters. Write a Americans in favor of making all children’s programming
Solution Trail for this problem. commercial-free.
b. Prior research suggests that the proportion of homes with d. Which of the above confidence intervals is the narrowest?
gas water heaters is approximately 0.40. How large a Why?
sample is necessary for the bound on the error of
8.105 Marketing and Consumer Behavior In the CRFA
estimation to be 0.03 for a 95% confidence interval?
2013 Canadian Chef Survey, 350 professional chefs were
8.102 Marketing and Consumer Behavior A successful randomly selected and asked to rate the popularity of several
company usually has high brand-name and logo recognition items, including cooking methods, hot trends, and perennial
among consumers. For example, Coca-Cola products are favorites.29 In the Hot Trends section, suppose 209 chefs
available to 98% of all people in the world, and may have the selected slow cooking as the most popular preparation method,
highest logo recognition of any company. A software firm 107 chefs selected gluten-free as the most popular culinary
developing a product would like to estimate the proportion of theme, and 98 selected bite-size desserts as the most popular
people who recognize the Linux penguin logo. Of the 952 dessert item.
randomly selected consumers surveyed, 132 could identify the a. Find a 95% confidence interval for the true proportion of
product associated with the penguin. = Canadian chefs who believe slow cooking is the most
a. Is the distribution of the sample proportion, P, popular preparation method.
approximately normal? Justify your answer. b. Find a 95% confidence interval for the true proportion of
b. Find a 95% confidence interval for the true proportion of Canadian chefs who believe gluten-free is the most
consumers who recognize the Linux penguin. popular culinary theme.
c. The company will market a Linux version of their new c. Find a 95% confidence interval for the true proportion of
software if the true proportion of people who recognize Canadian chefs who believe bite-size desserts are the
the logo is greater than 0.10. Is there any evidence to most popular dessert item.
suggest the true proportion of people who recognize the
8.106 Civic Engagement The Hyde Park Neighborhood
logo is greater than 0.10? Justify your answer.
Association Outreach Committee recently conducted a survey
8.103 Travel and Transportation To advertise appropriate to learn more about the issues and concerns of the residents. In
vacation packages, Best Bets Travel would like to learn more a random sample of 124 residents, 20 indicated that the main
about families planning overseas trips. In a random sample of barrier to involvement in community events is scheduling
125 families planning a trip to Europe, 15 indicated France as conflicts.30
their travel destination. a. Find a 90% confidence interval for the true proportion of
a. For those families planning vacations to Europe, find a residents who believe the main barrier to involvement in
98% confidence interval for the true proportion traveling community events is scheduling conflicts.
=
to France. b. Suppose no prior estimate of p is known. How large a
= sample is necessary for the bound on the error of
b. Suppose no prior estimate of p is known. How large a
sample is necessary for the bound on the error of estimation to be 0.04 with confidence level 90%?
estimation to be 0.05 with the confidence level in part (a)?
8.107 Economics and Finance When unemployment rises,
8.104 Public Policy and Political Science In a recent high school students face more competition from college
survey, a random sample of 650 Americans were asked several students and adult workers for summer jobs. In a random
questions regarding television broadcasting and programming. sample of 188 high school students looking for summer work,
The table below lists three proposals and the number of 61 said they were able to find a job.
Americans who responded in favor of each. a. Find a 95% confidence interval for the true proportion of
high school students who were able to find a summer job.
Number in b. In a similar study the previous year, the sample proportion
Proposal favor of high school students who were able to find a job was
0.25. Use this estimate to find the sample size necessary
To limit the number of commercials shown 566 for the bound on the error of estimation to be 0.025 with
during each children’s show confidence level 95%.
To provide more adult-education programs 530
8.108 Marketing and Consumer Behavior Most walk-
To make children’s shows commercial-free 468 behind lawn mowers have three options for disposal of grass
372 C hapt er 8 Confidence Intervals Based on a Single Sample
clippings: by bagging, by mulching, or by side discharge. The Geographic Sample Number who
manager at an Aubuchon Hardware store conducted a survey to region size responded Yes
determine which disposal method is most common. The results
Northeast 225 180
are given in the table below, classified by area mowed. A small
area is less than 12 acre, a medium area is 12 to 1 acre, and a Midwest 276 224
large area is over 1 acre. South Central 301 232
South Atlantic 454 377
Disposal method
West 366 304
Sample Side
Area size Bagging Mulching discharge a. Find a 99% confidence interval for the true proportion of
Small 125 85 35 5 home-buyers who believe a separate dining room is
Medium 157 70 40 47 essential in each geographic region.
Large 144 42 45 57 b. Which confidence interval in part (a) is the largest? Why?
8.112 Travel and Transportation The U.S. Transportation
a. For people with small yards, find a 95% confidence Security Administration (TSA) screens approximately 1.8
interval for the true proportion who dispose of grass million travelers in the nation’s airports each day. All carry-on
clippings by bagging. and checked baggage is carefully screened, and every passenger
b. For people with medium yards, find a 95% confidence must pass through a metal detector or an advanced imaging-
interval for the true proportion who dispose of grass technology scanner. The agency claims that approximately 3%
clippings by mulching.
SOLUTION TRAIL
of all passengers are patted down.31 In a random sample of 478
c. For people with large yards, find a 95% confidence airline passengers, 31 were patted down.
interval for the true proportion who dispose of grass a. Find a 99% confidence interval for the true proportion of
clippings by side discharge. airline passengers who are patted down.
b. Is there any evidence to suggest the TSA claim is false?
8.109 Business and Management A U.S. textile
Justify your answer.
company is interested in the proportion of orders shipped
to another country. In a random sample of 1560 clothing orders
placed to U.S. companies, 500 were exported. Extended Applications
a. Find a 99% confidence interval for the true proportion of
clothing orders shipped to other countries. 8.113 Business and Management The 2013 Global
b. In the previous year, the true proportion of clothing orders Management Education Graduate Survey asked students in
shipped to other countries was believed to be 0.30. Using 159 business schools from 33 countries several questions
your answer to part (a), is there any evidence to suggest about their job search. Approximately 60% of job seekers
the true proportion has changed? Justify your answer. reported that they had an offer of employment.32 A random
Write a Solution Trail for this problem. sample of graduate students in their final year at business
schools in the Northeast was asked whether they had a job
8.110 Business and Management Many dairy farms offer. The resulting data are given in the table below, by
have experienced bankruptcy over the past decade as a degree program.
result of wild fluctuations in conventional milk prices.
However, organic farms, those that do not treat cows Sample Number with Degree
with antibiotics or hormones and use hay grown without size a job offer (x) program
chemicals, have remained solvent and even expanded. In a
260 160 MBA
random sample of 1400 New England dairy farms, 90 are
certified as organic. 380 288 MAcc
a. Find a 99% confidence interval for the true proportion of 310 125 MFin
New England dairy farms certified as organic.
b. Five years ago, an extensive census reported that 3% of a. Find a 99% confidence interval for the true proportion of
all New England dairy farms were organic. Is there any graduate business students who have a job offer for each
evidence to suggest this proportion has changed? Justify degree program.
your answer. b. Is there any evidence to suggest that the true proportion
of MFin graduate business students who have a job offer
8.111 Marketing and Consumer Behavior In a recent
is different from that in either of the other two degree
survey of home-buyer preferences, consumers were asked about
programs? Justify your answer.
desired characteristics, such as a wood-burning fireplace, a den/
library, and flooring. In particular, each person was asked 8.114 Psychology and Human Behavior The Drivers
whether a separate dining room is essential. The sample size Technology Association in the United Kingdom recently
and the number who responded “Yes” to this question are given studied the behavior of drivers with and without radar
in the table below by geographic region. detectors. A random sample of 550 users and 562 non-users
8.4 A Large-Sample Confidence Interval for a Population Proportion 373
was obtained. In the past three years, 108 users and 68 c. Suppose the results obtained in this study are preliminary
non-users have had an accident. and a larger survey is planned. How large a sample
a. Find a 99% confidence interval for the true proportion of is necessary for each political party for the bound on
radar-detector users who have had an accident in the past the error of estimation to be 0.02 with confidence
three years. level 95%?
b. Find a 99% confidence interval for the true proportion of
8.117 Medicine and Clinical Studies A new prescription
radar-detector non-users who have had an accident in the
medication is designed to ease the pain of arthritis. In a
past three years.
clinical trial, both treatment and placebo groups were studied
c. Is there any evidence to suggest the two true proportions
to determine whether there were any adverse reactions. The
are different? Justify your answer.
table below lists the number of patients in each group who
8.115 Education and Child Development Many families experienced each adverse reaction. Assume each group
that must find care for their children have turned to an au pair. represents a random sample.
An au pair usually provides more than just child care, for
example, help with meals, house cleaning, and even participa- Treatment group Placebo group
tion in children’s activities. Some become an extended family Adverse reaction ( n 5 465 ) ( n 5 154 )
member. According to Aupair World, the majority of au pairs Headache 61 15
are from Europe. However, approximately 8% are from North Rash 31 9
America.33 A random sample of au pairs in the United States
and Canada was asked for their home country. The resulting
a. Find a 95% confidence interval for the true proportion of
data are given in the table below.
people who suffered a headache in each of the groups.
Sample Number from b. Is there any evidence to suggest the true proportion of
size North America (x) Country people who suffered a headache is different for the two
groups? Justify your answer.
450 41 United States c. Find a 98% confidence interval for the true proportion of
506 51 Canada people who experienced a rash in each of the groups.
d. Is there any evidence to suggest the true proportion of
a. Find a 95% confidence interval for the true proportion of people who experienced a rash is different for the two
au pairs in the United States that come from North groups? Justify your answer.
America.
8.118 Public Policy and Political Science A recent
b. Find a 95% confidence interval for the true proportion
survey of Canadians on privacy-related issues indicated that
of au pairs in Canada that come from North America.
approximately two-thirds of all Canadians are concerned
c. Is there any evidence to suggest that the two true
about privacy protection. In addition, 70% of Canadians
proportions are different? Justify your answer.
believe their personal information is less protected than
8.116 Economics and Finance The first question on the it was 10 years ago.34 Suppose the table below lists the
U.S. IRS Income Tax Form 1040 asks taxpayers whether they number of respondents concerned about privacy protection
want $3 to go to the Presidential Election Campaign Fund. A by province.
large Washington political-action committee obtained a random
sample of registered voters and asked them if they checked Number concerned
“Yes” on this question on the past year’s return. The results are Province Sample size about privacy protection
given in the table below, by party affiliation. Ontario 580 401
British Columbia 617 381
Number who
Political party Sample size checked Yes Nova Scotia 478 320
Alberta 436 292
Democrat 237 70
Republican 388 184 a. Find a 95% confidence interval for the true proportion of
Independent 155 23 Canadians who are concerned about privacy protection
for each province.
a. Find a 95% confidence interval for the true proportion b. Is there any evidence to suggest that the true proportion
of registered voters in each political party who checked of Canadians who are concerned about privacy
“Yes” on the Presidential Election Campaign Fund protection is different from 67% in any province?
question. Justify your answer.
b. Is there any evidence to suggest the proportion of c. Is there any evidence to suggest that the true proportion
Independent voters who checked “Yes” is different from of Canadians who are concerned about privacy
either the proportion of Democrat or Republican voters protection is different for any two provinces? Justify
who checked “Yes”? Justify your answer. your answer.
374 C h a p t e r 8 Confidence Intervals Based on a Single Sample
3. Suppose X , x2n. The density curve for X is positively skewed (not symmetric), and as
x increases it gets closer and closer to the x axis but never touches it. As n increases,
the density curve becomes flatter and actually looks more normal. See Figure 8.41.
f (x)
0.2 X 23
X 25
0.1 X 2
7
0 2 4 6 8 10 12 14 16 x
Figure 8.41 Density curves for several
chi-square distributions.
The definition and notation for a x2 critical value is analogous to those for a Z and a t
critical value.
Definition
x2a,n is a critical value related to a chi-square distribution (a x2 critical value) with v
degrees of freedom. If X , x2n, then P ( X $ x2a,n) 5 a.
A Closer L ok
1. x2a,n is a value on the measurement axis in a chi-square world with v degrees of free-
dom such that there is a of the area (probability) to the right of x2a,n. There is no sym-
metry in chi-square critical values (as there was for Z and t critical values).
2. Critical values are always defined in terms of right-tail probability. The notation is just
a little trickier here because a chi-square distribution is not symmetric. It will be neces-
sary to find critical values denoted x212a,n (with large values for 1 2 a). By defini-
tion, P ( X $ x212a,n) 5 1 2 a, and by the complement rule, P ( X # x212a,n) 5 a. See
Figure 8.42.
f (x)
X 2
21, 2, x
Table VI in the Appendix presents selected critical values associated with various chi-
square distributions. Right-tail probabilities are in the top row and degrees of freedom are
listed in the left column. In the body of the table, x2a,n is at the intersection of the a column
376 C hapte r 8 Confidence Intervals Based on a Single Sample
and the v row. The first part of this table presents left-tail critical values corresponding to
large right-tail probabilities. The second half contains right-tail critical values correspond-
ing to small right-tail probabilities. The following example illustrates the use of this table
for finding critical values associated with a chi-square distribution.
Solution
a. Here, x20.05,10 is a critical value related to a chi-square distribution with 10 degrees of
freedom. By definition, if X , x210, then P ( X $ x20.05,10 ) 5 0.05. The following portion of
Table VI shows the intersection of the a 5 0.05 column and the n 5 10 row.
a
v 0.10 0.05 0.025 0.01 0.005 0.001 0.0005 0.0001
( ( ( ( ( ( ( ( (
8 13.3616 15.5073 17.5345 20.0902 21.9550 26.1245 27.8680 31.8276
9 14.6837 16.9190 19.0228 21.6660 23.5894 27.8772 29.6658 33.7199
10 15.9872 18.3070 20.4832 23.2093 25.1882 29.5883 31.4198 35.5640
11 17.2750 19.6751 21.9200 24.7250 26.7568 31.2641 33.1366 37.3670
12 18.5493 21.0261 23.3367 26.2170 28.2995 32.9095 34.8213 39.1344
( ( ( ( ( ( ( ( (
f (x)
X 210
0.95 0.05
0 18.307 x
b. Here, x20.99,7 is a critical value related to a chi-square distribution with seven degrees of
freedom. The following portion of Table VI shows the intersection of the a 5 0.99
column and the n 5 7 row.
a
v 0.9999 0.9995 0.999 0.995 0.99 0.975 0.95 0.90
( ( ( ( ( ( ( ( (
5 0.0822 0.1581 0.2102 0.4117 0.5543 0.8312 1.1455 1.6103
6 0.1724 0.2994 0.3811 0.6757 0.8721 1.2373 1.6354 2.2041
7 0.3000 0.4849 0.5985 0.9893 1.2390 1.6899 2.1673 2.8331
8 0.4636 0.7104 0.8571 1.3444 1.6465 2.1797 2.7326 3.4895
9 0.6608 0.9717 1.1519 1.7349 2.0879 2.7004 3.3251 4.1682
x20.99,7 5 1.2390 and if X , x27, then P ( X $ 1.2390 ) 5 0.99 and P ( X # 1.2390 ) 5 0.01,
as illustrated in Figure 8.44.
8.5 A Confidence Interval for a Population Variance or Standard Deviation 377
f (x)
X 27
0.01 0.99
0 1.239 X
Figure 8.44 Visualization of x20.99,7 5 1.2390. Figure 8.45 Use the Quantile option in
the Chi-square Distribution Calculator of
CrunchIt! to find critical values associated
with a chi-square distribution.
Table VI is limited. There are only a handful of values for a, and n 5 1240, 50, 60, 70,
80, 90, 100. However, a calculator or computer can find almost any critical value needed.
A confidence interval for a population variance is based on the following theorem.
Theorem
Let S2 be the sample variance of a random sample of size n from a normal distribution
with variance s2. The random variable
( n 2 1 ) S2
X5 (8.18)
s2
has a chi-square distribution with n 2 1 degrees of freedom.
This is another kind of This theorem is used to construct a confidence interval for s2 .
standardization, a transformation
to a chi-square distribution. Let X , x2n21 and find an interval that captures 1 2 a in the middle of this chi-
square distribution (see Figure 8.46).
P ( x212a/2,n21 , X , x2a/2,n21) 5 1 2 a (8.19)
( n 2 1 ) S2
Pax212a/2,n21 , , x2a/2,n21b 5 1 2 a (8.20)
s2
f (x)
X 2n−1
212,n−1 22,n−1 x
Manipulate here means: Inside the Manipulate Equation 8.20 to obtain the probability statement
probability statement, divide each
( n 2 1 ) S2 ( n 2 1 ) S2
term by (n 2 1)S2 and take the Pa , s2 , b 5 1 2 a (8.21)
reciprocal of each term (change x2a/2,n21 x212a/2,n21
the direction of the inequality).
This probability statement leads to the following general result.
A Closer L ok
1. This confidence interval for s2 is valid only if the underlying population is normal.
2. Take the square root of each endpoint of Equation 8.22 to find a 100 ( 1 2 a ) % confi-
dence interval for the population standard deviation s.
The following example illustrates the use of Equation 8.22 to construct a confidence
interval for s2 and to answer an inference question.
Tr a nsl at i o n
Figure 8.47 Confidence intervals for the
n Inference procedure
population standard deviation and population
C onc e p ts variance.
n A 100 ( 1 2 a ) % CI for a
b. Claim: s2 5 16.
population variance
Experiment: s2 5 17.55.
V isi on
Likelihood: The likelihood is expressed as a 95% confidence interval, an interval of
The CI constructed in part (a) is
an interval in which we are 95% likely values for s2: (10.0202, 38.3807), from part (a).
confident the true value of s2 Conclusion: The required kiln temperature variance, 16°C, is included in this
lies. If 16 lies in this interval, confidence interval. There is no evidence to suggest s2 is different from 16.
there is no evidence to suggest
Note: The confidence interval for s2 is not symmetric about the point estimate s2 and
s2 is different from 16. Use the
four-step inference procedure. is quite large (see Figure 8.48). The confidence interval will be narrow only for very
large values of n.
Left Right
endpoint s2 endpoint
( )
0 10.02 17.55 38.38 x
Technology Corner
Procedure: Construct a confidence interval for a population variance.
Reconsider: Example 8.13, solution, and interpretations.
CrunchIt!
There is no built-in function to construct a confidence interval for a population variance. The Chi-square Distribution
Calculator may be used to find critical values.
TI-84 Plus C
There is no built-in function to compute critical values associated with a chi-square distribution nor to construct a confi-
dence interval for a population variance. However, the EQUATION SOLVER, or equivalently, the solve function, may be
used to find critical values, and other confidence interval calculations can be completed on the Home screen.
380 C h a p t e r 8 Confidence Intervals Based on a Single Sample
Minitab
Use the built-in function 1 Variance. Input is either summary statistics or a column containing data.
1. Select Stat; Basic Statistics; 1 Variance.
2. Choose Sample variance from the pull down menu. Enter the Sample size and the Sample variance.
3. Select the Options option button and enter the Confidence level. The results (a confidence interval for the population
standard deviation and for the population variance) are displayed in a session window. See Figure 8.47.
Excel
There are built-in functions to find critical values. Use these values in the appropriate formula to construct a confidence
interval for a population variance.
1. Use the function CHISQ.INV to find the left tail critical values and the function CHISQ.INV.RT to find the right tail
critical value. The arguments are tail probability and degrees of freedom.
2. Compute the left and right endpoints of the confidence interval. See Figure 8.49.
associated confidence intervals for the population variance and accounts. The time (in hours) needed to obtain each password is
the population standard deviation. given in the table below. CODES
a. s2 5 3.08, n 5 14, 99%
b. s2 5 64.10, n 5 11, 95% 1.88 1.71 2.09 6.60 2.28 3.52 2.64 4.94
c. s2 5 59.07, n 5 6, 80% 1.78 4.13 2.55 1.66 0.28 4.02 4.47 0.68
d. s2 5 7.35, n 5 27, 99.98% 5.67 0.03
e. s2 5 31.38, n 5 22, 95%
f. s2 5 12.39, n 5 18, 99% Assume the distribution of times is normal.
a. Find a 98% confidence interval for the true population
SOLUTION TRAIL
8.132 Use Table VI in the Appendix and linear interpolation to variance of the time needed to acquire someone’s password.
approximate each critical value. Verify each approximation b. Use the methods described in Section 6.3, page 272, to
using technology. check for any evidence of non-normality.
a. x20.05,45 b. x20.005,65
2
c. x0.01,72 d. x20.025,56 8.136 Physical Sciences St. Simons Island, Georgia,
2 is a popular vacation destination. Attractions include a
e. x0.95,85 f. x20.999,52
2 salt-marsh tour, a dolphin watch, a golf course, and the art and
g. x0.90,66 h. x20.975,75
antiques trail. In a sample of high-tide observations (in feet)
from June 2013, x 5 7.07, s 5 0.702, and n 5 30.36
Applications a. Assuming normality, find a 99% confidence interval for
8.133 Business and Management A personnel manager at the true variance in high-tide level.
Inserra Supermarkets is concerned about the pattern of overtime b. Historical evidence suggests the variance in high tide is
hours claimed by employees. Although there is a special budget 0.50. Is there any evidence to suggest the variance in high
to pay for overtime hours, large fluctuations may cause cash- tide has decreased? Justify your answer. Write a Solution
flow problems. In a random sample of 25 employees, the sample Trail for this problem.
variance for the number of overtime hours claimed was 4.25. 8.137 Physical Sciences June temperatures in Phoenix,
Assume the distribution of overtime hours is normal. Arizona, routinely soar above 100°F, but residents claim that it
a. Find a 95% confidence interval for the population
SOLUTION TRAIL
is “dry” heat. The low humidity makes outside physical activity
variance of overtime hours. easier and attracts tourists, especially golfers, and retirees. The
b. Find a 95% confidence interval for the population following table contains relative humidity data (percentages)
standard deviation of overtime hours. for several days in June in Phoenix.37 HUMID
b. Suppose a process is considered “in control” if the 8.143 Physical Sciences The well depth in certain locations
variance is 0.02 mm (or less). Is there any evidence to is important for agriculture and for the development and
suggest any of the three processes is out of control? maintenance of residential and commercial areas. A random
sample of the depth to water (in centimeters) at the Upper
8.139 Fuel Consumption and Cars New hybrid automo-
Gordon Gulch Ground Water Well from 2011 to 2013 is given
biles run on both gasoline and electricity. Toyota claims the
in the following table.41 GULCH
Prius is capable of traveling 15 miles on electricity alone, and
GM reports that the all-electric range of the 2013 Volt is 38
miles.38 A random sample of each automobile was obtained. All 1177 1021 989 984 981 979 974 974 973 975
cars were fully charged, and the all-electric range for each was 970 925 877 959 962 937 960 985 984 970
carefully measured. The data are given in the following table.
a. Find a 98% confidence interval for the population
Sample Sample variance in depth to water at this site.
Automobile size variance b. Find a 98% confidence interval for the population
Volt 12 8.31 standard deviation in depth to water at this site.
Prius 8 41.50 c. Use the methods described in Section 6.3, page 272, to
test for any evidence of non-normality.
a. Find a 95% confidence interval for the true population 8.144 Fuel Consumption and Cars One measure of a vehicle’s
variance in all-electric miles for the Volt. front-end alignment is the caster angle—the relationship between
b. Find a 95% confidence interval for the true population the upper and lower ball joints. The specifications for a certain
variance in all-electric miles for the Prius. sports car include a caster angle of 2.8° with variance 0.40.
c. Is there any evidence to suggest that the two population Twenty-two new sports cars were randomly selected, and the caster
variances are different? Justify your answer. angle was measured for each. The sample variance was 0.55.
8.140 Sports and Leisure Some of the stages of the Tour de a. Find a 95% confidence interval for the population
France bicycling championships are over 200 km. In a random variance in caster angle. Assume normality.
sample of 22 riders on Stage 4 of the 2012 Tour de France, b. Is there any evidence to suggest the population variance
Abbeville to Rouen, 214.5 km, times were measured (in in caster angle is different from 0.40? Justify your answer.
minutes), and the sample variance was 4.81 minutes.39 Assum- 8.145 Business and Management Even though the budget
ing normality, find a 99% confidence interval for the true for a movie may be close to $300 million, worldwide distribution
population variance in times for Stage 4 of the Tour de France. can provide a huge net income for a studio. The following table
8.141 Sports and Leisure During the National Football includes a sample of movies, the budget (in dollars) for each, and
League preseason, coaches appraise new players and tested the U.S. gross income (in dollars) for each.42 MOVIES
veterans. Suppose a rookie is trying to unseat a veteran for the
Movie Budget Gross
starting tailback position. A random sample of preseason runs
is selected for each player, and the number of yards gained on Avatar 237,000,000 760,507,625
each play is recorded. RUNS Superman Returns 232,000,000 200,120,000
a. Find a 99% confidence interval for the variance of
Quantum of Solace 230,000,000 169,368,427
yardage gained per run for the veteran tailback.
Green Lantern 200,000,000 116,601,172
b. Find a 99% confidence interval for the variance of
yardage gained per run for the rookie tailback. Wrath of the Titans 150,000,000 36,030,512
c. Suppose the coach will start the more consistent and Die Another Day 142,000,000 160,942,139
dependable tailback. Which player do you think will get Lethal Weapon 4 140,000,000 130,444,603
the starting position, and why? Sahara 145,000,000 68,671,925
8.142 Travel and Transportation Airline travelers are often I Am Legend 150,000,000 256,393,010
plagued by long delays on the tarmac waiting to take off. In April Transformers 151,000,000 319,246,193
2013 there were six flights in the United States with tarmac times Cowboys and Aliens 163,000,000 100,240,551
of more than 3 hours; four were international flights.40 A random Iron Man 2 170,000,000 312,433,331
sample of 25 planes at the Charlotte Douglas International
Wild Wild West 175,000,000 113,805,681
Airport was obtained, and the tarmac time (in minutes) was
recorded for each. The sample variance was 181.55. Salt 130,000,000 118,311,368
a. Find a 95% confidence interval for the population Bad Boys II 130,000,000 138,540,870
variance in tarmac times. Assume normality.
b. Airlines would like to reduce tarmac times, of course, and a. Find a 95% confidence interval for the population
lower the variance. Is there any evidence to suggest that variance in gross income.
the population variance is less than 100 minutes? Justify b. Compute the ratio of U.S. revenue to budget ( ratio 5
your answer. gross / budget ) for each movie. Find a 95% confidence
8.5 A Confidence Interval for a Population Variance or Standard Deviation 383
interval for the population variance in ratio of revenue to c. Is there any evidence to suggest the true variance is
budget. different for men and women? Justify your answer.
c. If the variance in the ratio is greater than 1, it is d. What assumption(s) did you make in constructing the
considered risky to invest in a new film. Is there any confidence intervals above?
evidence to suggest investing in a new movie is risky?
8.149 Biology and Environmental Science Rice is a staple
Justify your answer.
food for much of the world’s population. The height of a rice plant
8.146 Manufacturing and Product Development (usually 0.4–5 mm) depends on the rice variety and the environ-
Silver bars used in trading on the bullion market are manu- ment. Genetic engineering is currently underway to protect rice
factured to certain height, width, and length specifications. plants from disease and to decrease the variability in plant height.
In addition, there is variation in the silver content. Five In a controlled experiment, plant heights were measured (in
randomly selected silver bars were carefully analyzed for millimeters); 30 natural rice plants had a sample variance in
silver content (in ounces). The sample variance was 65.5. height of 1.5 mm, and 22 genetically engineered rice plants had a
Assume the distribution of silver content is normal, and find sample variance in height of 0.89 mm. Assume normality.
a 99% confidence interval for the population variance in a. Find a 95% confidence interval for the true variance in
silver content. height for natural rice plants.
b. Find a 95% confidence interval for the true variance in
8.147 Psychology and Human Behavior The Great Wall
height for genetically engineered rice plants.
of China was started in 214 b.c. and designed as a defense
c. Is there any evidence to suggest the population variance is
against nomadic tribes. This 1500-mile-long wall, built from
different for natural and genetically engineered rice plants?
earth and stone, is visible from space and is one of the most
famous structures in the world. Suppose a random sample of 8.150 Public Health and Nutrition In June 2013 the
midpoints between guard towers along the Great Wall was Agriculture Department banned all high-calorie sports drinks
obtained, and the height of the wall at each point was and candy bars from school vending machines.44 Many
measured (in feet). The data are given in the following beverage companies had already added sports drinks to replace
table.43 GRWALL high-calorie sodas. The new rules stipulate that sports drinks
must contain 60 calories or less in a 12-ounce serving. A
26.9 22.2 21.8 26.6 19.2 29.3 21.9 19.0 random sample of sports drinks was obtained, and the calories
in each was recorded.45 DRINKS
23.5 26.7 18.3 27.7 18.8 25.0 24.9 21.1
a. Find a 95% confidence interval for the population
24.1 24.3 19.5 20.8 27.1 variance in calories for sports drinks.
b. Find a 95% confidence interval for the population
a. Find a 95% confidence interval for the population standard deviation in calories for sport drinks.
variance in height at the midpoints between guard towers
of the Great Wall. 8.151 Physical Sciences A microwave radiometer is used to
b. Using the confidence interval constructed in part (a), is measure the column water vapor and the infrared brightness
there any evidence to suggest the true population variance (IB) temperatures in clouds. Clouds were randomly selected,
in the height at midpoints between guard towers is less and a weather station in Coffeyville, Kansas, collected the
than 12 feet? Justify your answer. following summary statistics.
Column water IB
Extended Applications vapor (cm) Temperature (°C)
Cloud Sample Sample Sample Sample
8.148 Medicine and Clinical Studies Many researchers
type size variance size variance
believe moderate physical activity, such as walking, will help
prevent weight gain. To study this claim, doctors in Colorado Cirrus 11 0.06 17 201.7
had randomly selected patients wear a pedometer to measure Cumulus 21 0.08 28 225.6
the distance walked (in miles) per week. The table below
presents data for men and women. a. Find a 99% confidence interval for the population
variance in column water vapor and in infrared brightness
Sample Sample temperature for cirrus clouds.
size variance b. Find a 99% confidence interval for the population
Men 32 5.75 variance in column water vapor and in infrared brightness
Women 28 7.66 temperature for cumulus clouds.
c. Is there any evidence to suggest the variance in column
water vapor or infrared brightness temperature is different
a. Find a 95% confidence interval for the true variance in
in cirrus and cumulus clouds? Justify your answers.
distance walked per week for men.
b. Find a 95% confidence interval for the true variance in 8.152 Medicine and Clinical Studies A new medicinal spray
distance walked per week for women. has been developed to help ease the itch and burn associated with
384 C h a p t e r 8 Confidence Intervals Based on a Single Sample
poison ivy and poison oak. People who suffer from these poisons a. Use the method outlined in Exercise 8.51 to find a
want and need immediate relief. A research study was conducted one-sided 100 ( 1 2 a ) % confidence interval for s2
to measure the time (in minutes) from application of this spray to bounded above, and another bounded below.
relief from itching. The following summary statistics were b. A random sample of soccer stadiums from around the
reported for a random sample of children and adults. world was obtained, and the seating capacity for each is
given on the text website. Find a one-sided 95% confidence
Sample Sample interval, bounded above, for the population variance in
Population size variance seating capacity for soccer stadiums. SEATCAP
Children 11 1.57 8.155 Normal Approximation to the Chi-Square Distribu-
Adults 25 2.38 tion Given a random sample of size n from a normal
population with variance s2, a 100 ( 1 2 a ) % confidence
a. Find a 95% confidence interval for the population variance interval for s2 (based on a chi-square distribution) is given by
in time to relief for children. Equation 8.22:
b. Find a 95% confidence interval for the population variance
in time to relief for adults. ( n 2 1 ) s2 ( n 2 1 ) s2
, s2 ,
c. Is there any evidence to suggest the variance in time to relief x2a/2 x212a/2
is different in children and adults? Justify your answer.
If n is large ( n . 30 ) , then the chi-square random variable
8.153 Electric Shock During the first half of 2013, it was ( n 2 1 ) S2/s2 is approximately normal with mean n 2 1 and
discovered that British Columbia Hydro was dramatically variance 2 ( n 2 1 ) . Therefore, for large n,
overcharging customers in a Burnaby condominium complex.
( n 2 1 ) S2
New “smart” meters had been installed but not connected to units, s2
2 (n 2 1)
and the company was estimating electricity use based on very Pa2za/2 , , za/2 b 5 1 2 a (8.23)
!2 ( n 2 1 )
poor comparisons.46 Suppose a random sample of bimonthly
electricity bills for these condominium owners was obtained a. Manipulate Equation 8.23 to obtain an approximate
before and after the new meters were installed. meters 100 ( 1 2 a ) % confidence interval for a population
a. Find a 99% confidence interval for the population variance variance.
in electricity bills before the new meters were installed. b. The thickness of pavement on roads and highways
b. Find a 99% confidence interval for the population variance depends on the predicted weight and volume of vehicular
in electricity bills after the new meters were installed. traffic. The thickness of the pavement (in mm) was
c. Is there any evidence to suggest that the variance is measured at 51 random locations along Route 95. The
different before and after the new meters were installed? sample variance was s2 5 16.25.
Justify your answer. i. Find a 95% confidence interval for the population
variance in pavement thickness using Equation 8.22.
ii. Find an approximate 95% confidence interval for the
Challenge
population variance in pavement thickness using the
8.154 Sports and Leisure Given a random sample of size n equation derived in part (a).
from a normal population with variance s2: i ii. Which of these two intervals is wider? Why?
Chapter 8 Summary
Concept Page Notation / Formula / Description
Estimator 334 A statistic, or rule, used to produce a point estimate of a population parameter.
= =
Unbiased estimator 334 A statistic u is an unbiased estimator of u if E ( u ) 5 u.
= =
Biased estimator 334 A statistic u is a biased estimator of u if E ( u ) 2 u.
MVUE 336 Minimum variance unbiased estimator.
Confidence interval (CI) 340 An interval of values constructed so that with a specified degree of confidence, the value
of the population parameter lies in this interval.
Confidence coefficient 340 The probability the confidence interval encloses the population parameter in repeated
samplings.
Confidence level 340 The confidence coefficient expressed as a percentage.
za 342 z critical value; a value such that P ( Z $ za ) 5 a.
Chapter 8 Exercises 385
ta,n 354 t critical value; a value such that P ( T $ ta,n) 5 a where T has a t distribution with v
degrees of freedom.
x2a,n 375 Chi-square critical value; a value such that P ( X $ x2a,n) 5 a, where X has a chi-square
distribution with v degrees of freedom.
Chapter 8 Exercises
APPLICATIONS varies due to the seasons, the entry of fresh water into the area, 8
and the salt water from the ocean. A random sample of 35 days
8.156 Medicine and Clinical Studies Most patients in need was selected and the chlorophyll near Stearns Wharf was
of a kidney transplant are placed on a dialysis machine. There is measured (in mg/L) for each.47 The sample mean chlorophyll
some evidence to suggest that the longer patients remain on concentration was 2.93 mg/L. Assume the population standard
dialysis, the worse they fare following a transplant. The mean deviation is s 5 2.12.
waiting time for a kidney transplant is five years (60 months). a. Find a 98% confidence interval for the true mean
Suppose a random sample of kidney transplant patients was chlorophyll concentration near Stearns Wharf.
obtained, and the wait time (in months) was recorded for each. b. Find the sample size necessary to construct a 98% confidence
The data are given in the table below. KIDNEY interval for the true mean chlorophyll concentration with a
bound on the error of estimation B 5 0.5.
58.1 63.9 74.4 57.8 85.5 68.5 53.5 100.8
8.158 Marketing and Consumer Behavior Parrot Jungle
87.6 83.7 78.4 49.7 76.6 40.7 63.6 50.3
Island is a roadside attraction in Miami, Florida, featuring
75.5 76.8 101.4 64.0 43.7 64.3 48.8 76.2
tropical birds, crocodiles, and over 2000 varieties of plants and
flowers. A new advertising campaign was developed to attract
a. Find a 95% confidence interval for the true mean wait
more out-of-state visitors. In a random sample of 270 visitors,
time for a kidney transplant.
189 were area residents.
b. Find a 95% confidence interval for the true variance in
a. Find the sample proportion of visitors who were area
wait time for a kidney transplant.
residents. Check the nonskewness criterion.
c. Construct a normal probability plot for the wait-time data.
b. Find a 99% confidence interval for the true proportion of
Is there any evidence to suggest non-normality?
visitors who were area residents.
d. Is there any evidence to suggest that the waiting time is
c. Suppose no prior knowledge of the proportion of visitors
different from 60 months? Justify your answer.
who were area residents is available. Find the sample size
8.157 Biology and Environmental Science The amount of necessary for a 99% confidence interval with a bound on
chlorophyll near Stearns Wharf in Santa Barbara, California, the error of estimation of 0.05.
386 C hapt er 8 Confidence Intervals Based on a Single Sample
8.159 Sports and Leisure Most Major League baseball 8.162 Medicine and Clinical Studies Passengers on long
parks have a device to measure the speed of every pitch. airline flights may develop deep vein thrombosis (DVT),
The results are often displayed on a scoreboard and tracked potentially dangerous blood clots. Although most blood clots in
by coaches. A random sample of pitches made during the the bloodstream dissolve naturally, travelers on long flights
first inning of baseball games around the country was have three times the risk of developing a blood clot. In a
selected. The speed of each pitch (in mph) is given in the research study of people who traveled frequently as part of
table below. PITCH their job, 22 of 8755 developed a blood clot within eight weeks
of a long flight.
a. Find a 90% confidence interval for the true proportion of
85 90 82 86 83 88 87 90 92 84 92 passengers on long flights who develop DVT within eight
90 89 90 89 89 84 87 92 89 86 89 weeks of a long flight.
b. Is there any evidence to suggest this proportion is greater
a. Assume normality. Find a 95% confidence interval for than 0.5%? Justify your answer.
the population mean speed of pitches in the first inning 8.163 Psychology and Human Behavior People who
of Major League baseball games. watch multiple DVDs or several online TV episodes in a row
b. Find a 99% confidence interval for the population are indulging in binge-watching. Most experts agree that this is
variance in speed of pitches in the first inning of Major mostly a harmless, enjoyable addiction. In a recent survey of
League baseball games. 2693 U.S. adults with Internet access, 1670 watch multiple TV
c. The mean speed of all pitches made in the first inning episodes back to back.48
of games during the previous season was 90.225. Is a. Find a 99% confidence interval for the true proportion
there any evidence to suggest the mean speed has of adults with Internet access who watch multiple TV
changed? episodes back to back.
8.160 Economics and Finance ATM machines have made b. Find the sample size necessary to construct a 99%
it easier and faster for customers to check account balances, confidence interval with a bound on the error of
transfer money, and obtain cash. Many banks are now consider- estimation of 0.02.
ing the next generation of ATM machines, which will feature 8.164 Travel and Transportation An English racing team
news headlines, full-motion video, and tickets to events. In a called the Bloodhound Project is testing a rocket car that they
random sample of 500 customers, 280 said ATM machines hope will break the current land speed record. They expect this
should offer postage stamps. hybrid vehicle, which runs on concentrated peroxide and a
a. Find the sample proportion of customers who believe synthetic rubber, to reach speeds over 1000 mph.49 Suppose a
ATM machines should offer postage stamps, and check random sample of test runs was obtained, and the speed of the
the nonskewness criterion. rocket car was recorded for each. ROCKET
b. Find a 99% confidence interval for the true proportion a. Find a 99% confidence interval for the population mean
of customers who believe ATM machines should offer speed of the rocket car.
postage stamps. b. Find a 99% confidence interval for the population
c. A bank official claims the proportion of customers variance in speed of the rocket car.
who believe ATM machines should offer postage c. What assumptions did you make in constructing these two
stamps is 0.60. Is there any evidence to refute confidence intervals?
this claim? d. The Bloodhound Team members will attempt to break the
8.161 Manufacturing and Product Development A
1000 mph barrier on the Hakskeen Pan desert in South
0.2-kiloton “bunker buster” missile is designed to destroy Africa in 2016. Using the test speed data, is there any
enemy bunkers approximately 70 feet deep. To prevent fallout, evidence to suggest the mean speed of the rocket car will
the missile must penetrate 120 feet below the ground. In tests be great than 1000 mph? Justify your answer.
conducted by the military, eight missiles were fired and the 8.165 Business and Management Many businesses rent a
penetration depth (in feet) of each was recorded. The data are limousine to chauffeur important clients to/from an airport,
given in the table below. MISSILE hotel, or office. A random sample of the cost (in dollars) of
renting a limousine for the entire day in Cincinnati was
obtained. For n 5 25, the mean was x 5 410.25 and
122 117 119 119 121 124 119 120
the standard deviation was s 5 35.07. Assume the cost
distribution is normal.
a. Assume normality. Find a 99% confidence interval for a. Find a 95% confidence interval for the population mean
the population mean penetration depth. cost of renting a limousine for the entire day.
b. Is there any evidence to suggest the population mean b. Find a 95% confidence interval for the population variance
penetration depth is less than 120 feet? in the cost of renting a limousine for the entire day.
Chapter 8 Exercises 387
c. Find a 95% confidence interval for the standard b. A government publication claims that the proportion of
deviation in the cost of renting a limousine for the people without health insurance for the general
entire day. population is 0.146. Using the confidence intervals in part
(a), is there evidence that any group has a different
8.166 Psychology and Human Behavior According population proportion without health insurance?
to a recent survey, approximately one of every five people
hides money from their spouse. Women are slightly more
likely to hide money than men, but when men stash cash, EXTENDED APPLICATIONS
they hide more of it. The report claims that the mean
8.169 Medicine and Clinical Studies The drugs Ritalin
amount hidden is $10,000.50 Suppose a random sample
and Adderall are designed to stimulate the central nervous
of 17 people who hide money from their spouse was obtained.
system and are widely prescribed for children diagnosed with
The mean amount hidden was x 5 10,460 and the standard
attention deficit disorder (ADD). Suppose a random sample of
deviation was s 5 325. Assume the distribution of hidden
girls and boys in 12th grade was obtained and the following
cash is normal.
results were obtained.
a. Find a 99% confidence interval for the mean amount of
hidden cash. Sample Number taking
b. Suppose divorce lawyers always assume a spouse has Group size Ritalin/Adderall
hidden cash of approximately $10,000. Is there any
evidence to suggest that the mean amount of stashed cash Girls 375 24
is more than $10,000? Boys 480 39
c. Find a 99% confidence interval for the variance in the
amount of hidden cash. a. Find a 95% confidence interval for the true proportion of
d. A large variance in the amount of hidden cash means 12th-grade girls who are taking Ritalin/Adderall.
divorce lawyers need to hire more forensic accountants. Is b. Find a 95% confidence interval for the true proportion of
there any evidence to suggest that the variance in hidden 12th grade boys who are taking Ritalin or Adderall.
cash is greater than $55,000? c. Is there any evidence to suggest that the proportion
of girls in this age group taking Ritalin or Adderall is
8.167 Physical Sciences The Quabbin Reservoir in Massa-
different from the proportion of boys? Justify
chusetts is 18 miles long and has a capacity of 412 billion
your answer.
gallons. This reservoir serves approximately 2.5 million people,
with a daily yield of 300 million gallons of water. The depth of 8.170 Public Health and Nutrition Tannin is a general term
the reservoir is carefully monitored and measured at certain for certain nonvolatile phenolic substances in many fruits that
locations and times each day. In a random sample of 18 summer provide an astringent sensation, in apple cider for example.
days, the depth (in feet) was measured at location S-1 at noon. There is some evidence to suggest the tannin level in apples is
The mean depth was 75.4. Assume the population standard affected by the fertilizer regimen. A random sample of apples
deviation is s 5 7.58. was obtained from both fertilized and unfertilized trees. The
a. Find a 99% confidence interval for the depth of the percentage of tannin was measured in each apple, and the data
reservoir at this location. (sample size, sample mean, and sample standard deviation) are
b. Historical records indicate the mean depth at this location reported in the table below.
for the previous 10 years is 78.4 feet. Is there any
evidence to suggest this population mean has changed? Sample
Sample Sample standard
8.168 Public Health and Nutrition Rising costs caused Apples size mean deviation
many Americans to do without health insurance. A random
sample of adults of various ethnicities was obtained and asked Fertilized trees 48 0.30 0.058
whether they had health insurance. The resulting data are given Unfertilized trees 55 0.35 0.077
in the table below.
a. Find a 95% confidence interval for the true mean tannin
Sample Number without percentage in apples from fertilized trees.
Group size health insurance b. Find a 95% confidence interval for the true mean tannin
White 1220 166 percentage in apples from unfertilized trees.
African American 1080 205 c. Is there any evidence to suggest the percentage of tannin
is different in apples from fertilized and unfertilized
Hispanic 1156 384
trees?
a. Find a 95% confidence interval for the proportion of 8.171 Sports and Leisure Most high schools in the United
adults without health insurance for each group. States offer a wide variety of sports for both boys and girls.
388 C hapt er 8 Confidence Intervals Based on a Single Sample
However, the participation rate has historically been higher for generated by each in one week was recorded. The sample
boys. A random sample of high school students from across the mean was x 5 52.3 with a standard deviation of s 5 10.75.
country was obtained, and the students were asked whether they Assume normality.
participate in a high school sport. The results are given in the a. Find a 98% confidence interval for the true mean amount
following table. of trash generated by an American household per week.
b. Find an approximate 98% confidence interval for the true
Sample Number mean amount of trash generated by an American
Group size participating household per week (based on a Z distribution rather than
Boys 1250 583 a t distribution). Compare your answers in parts (a) and
(b). How do they differ?
Girls 1475 494
c. Find a 95% confidence interval for the variance in the
amount of trash generated by an American household per
a. Find a 95% confidence interval for the true proportion of week.
boys participating in a high school sport.
b. Find a 95% confidence interval for the true proportion of 8.174 Medicine and Clinical Studies In a major health
girls participating in a high school sport. study, a random sample of adult males was obtained and
c. Is there any evidence to suggest the proportion of boys each was tested for symptoms of heart disease yearly for a
and girls participating in a high school sport is different? decade. Subjects were divided into those who regularly
Justify your answer. donated blood and those who did not. The results are given
in the table below.
8.172 Public Health and Nutrition Elevated levels of
mercury can cause hair loss, fatigue, and memory lapses. The Sample Number with
concentration of mercury in a person’s blood can be greatly Group size heart disease
affected by diet, specifically, the amount of fish consumed.
Random samples of adults who eat fish two to three times per Donate blood 145 51
week and of adults who never eat fish were obtained. The Do not donate blood 527 210
mercury concentration in the blood of each person was
measured (in micrograms per liter of blood), and the summary a. Find a 95% confidence interval for the population proportion
data are reported in the table below. of males who donate blood who have heart disease.
b. Find a 95% confidence interval for the population
Sample proportion of males who do not donate blood who have
Sample Sample standard heart disease.
Group size mean deviation c. Is there any evidence to suggest the proportion of males
Fish 18 4.662 0.298 with heart disease is different for those who donate blood
No fish 27 2.079 0.309 and those who do not?
8.175 Manufacturing and Product Development Teflon-
Assume both distributions are normal. coated pots and pans are designed to be nonstick and to make
a. Find a 95% confidence interval for the population mean cooking and clean-up easier. A recent study compared the
mercury concentration in the blood of adults who eat fish thickness of the Teflon layer in new, factory-coated pans with
regularly. recoated pans. A random sample of pans was obtained for
b. Find a 95% confidence interval for the population mean each group, and the Teflon thickness was carefully measured
mercury concentration in the blood of adults who never (in inches). TEFLON
eat fish. a. Find a 95% confidence interval for the population mean
c. The safe level of mercury (set by the U.S. Environmental Teflon thickness in factory-coated pans.
Protection Agency) is five micrograms per liter of blood. b. Find a 95% confidence interval for the population mean
Is there any evidence to suggest either group is over the Teflon thickness in recoated pans.
safe limit of mercury concentration? c. Is there any evidence to suggest these two population
d. Use the confidence intervals in parts (a) and (b) to means are different? Justify your answer.
determine whether the two groups have different mean
8.176 Public Health and Nutrition Most people understand
mercury concentration levels.
that highly processed, carbohydrate-heavy foods are not very
8.173 Biology and Environmental Science There is healthy. However, we often justify a carbohydrate-rich meal by
growing concern that Americans are generating too much vowing to “eat healthy” next time. A new study suggests that
trash. More zoning permits are being sought for landfills in those who eat a high-carbohydrate meal start to crave even more
rural areas, and trash haulers move garbage out of state and of these foods shortly afterwards.51 An experiment was conducted
even out to the ocean. In a recent study, 65 households were to test this claim. Researchers gave two groups of adults similar
randomly selected, and the amount of trash (in pounds) test meals, one of which was rich in carbohydrates. After four
Chapter 8 Exercises 389
hours, brain scans were used to determine if there was activity c. Is there any evidence to suggest that the proportion of
related to reward, craving, and addictive behavior. The follow- people who exhibit addictive behavior is different after
ing results were obtained. eating a high- or low-carbohydrate meal?
Looking Back
=
n Recall that x, p, and s2 are the point estimates for the parameters m, p, and s2 .
n Remember how to construct and interpret confidence intervals.
n Think about the concept of a sampling distribution for a statistic and the process of
standardization.
Looking Forward
n Use the available information in a sample to make a specific decision about a population
parameter.
n Understand the formal decision process and learn the four-part hypothesis test procedure.
n Conduct formal hypothesis tests concerning the population parameters m, p, and s2 .
Contents
9.1 The Parts of a Hypothesis Test and Choosing the Alternative Hypothesis
9.2 Hypothesis Test Errors
9.3 Hypothesis Tests Concerning a Population Mean When s Is Known
9.4 p Values
9.5 Hypothesis Tests Concerning a Population Mean When s Is Unknown
9.6 Large-Sample Hypothesis Tests Concerning a Population Proportion
9.7 Hypothesis Tests Concerning a Population Variance or Standard Deviation
Marcelo Santos/Stone/Getty Images
391
392 C hapter 9 Hypothesis Tests Based on a Single Sample
Definition
In statistics, a hypothesis is a declaration, or claim, in the form of a mathematical state-
ment, about the value of a specific population parameter (or about the values of several
population characteristics).
A Closer L ok
Statistical Applet 1. The test of a statistical hypothesis is a procedure by which we decide whether there is
The reasoning of evidence to suggest that the alternative hypothesis, Ha, is true. The ultimate objective
a statistical test of a hypothesis test is to use the information in a sample to decide which hypothesis is
more likely to be true, H0 or Ha. Usually, we are trying to reject the null hypothesis.
2. The rejection region and the nonrejection region divide the world (values of the test
statistic) into parts. Figure 9.1 illustrates this concept in terms of a parameter. The
cutoff point, or dividing line, is determined by considering likely values for the test
statistic if H0 is true, and is included in one of the regions.
The value of the parameter must lie in one of the regions. In Section 9.3, we’ll see how
easy it is to specify a rejection region.
Cutoff
point
3. The hypothesis test procedure is very prescriptive. Once the four parts are identified,
the sample data are used to compute a value of the test statistic. There are only two
possible conclusions.
a. If the value of the test statistic lies in the rejection region, then we reject H0. There
is evidence to suggest the alternative hypothesis is true.
b. If the value of the test statistic does not lie in the rejection region, then we cannot
reject H0. There is no evidence to suggest the alternative hypothesis is true.
We never say that we accept H0. A hypothesis test is designed to prove the alterna-
tive hypothesis. If there is no evidence in favor of Ha, this does not imply H0 is true.
A hypothesis test can only provide support in favor of Ha. If the value of the test statis-
tic lies in the rejection region, reject the null hypothesis. There is evidence to suggest
Ha is true. If the value of the test statistic does not lie in the rejection region, do not
reject H0. There is no evidence to suggest Ha is true. Watch the wording: We never
accept the null hypothesis. Rather, the value of the test statistic does not lie in the
rejection region.
4. This formal hypothesis test procedure is analogous to the four-step inference proce-
dure used in the previous chapters. In fact, many of the concepts presented in earlier
chapters are combined here to produce this traditional, well-established hypothesis test
procedure.
The claim corresponds to H0, a claim about a population parameter. The experiment
is equivalent to a value of the test statistic. Likelihood is expressed in terms of the
nonrejection region (likely values of the test statistic) and the rejection region (unlikely
values of the test statistic). The conclusion is completely determined by the region in
which the value of the test statistic lies. If the value is in the rejection region, we reject
H0; otherwise, we cannot reject H0.
394 Chapter 9 Hypothesis Tests Based on a Single Sample
Solution
Step 1 The claim assumed to be true involves the population proportion of patients who
experience reduced pain and greater mobility with the existing surgical tech-
nique. It is assumed that 65% of all patients experience relief. Therefore, the null
hypothesis, stated in terms of a single value for p, is
Eraxion/Dreamstime.com H0: p 5 0.65
Step 2 The existing proportion of patients who experience relief is 0.65. The experi-
ment is designed to detect an increase in this proportion, i.e., to answer the ques-
tion, “Is the new procedure more effective?” Researchers hope to find evidence
that the proportion of patients who experience relief with IDET is greater than
0.65. Therefore, the alternative hypothesis is
Ha: p . 0.65
Solution
Step 1 The claim assumed to be true involves the population mean number of kilome-
ters driven by all Australians. The null hypothesis, the status quo, is given in
terms of the parameter m.
H0: m 5 15,300
Step 2 We are searching for evidence in favor of a smaller mean number of kilometers
driven. The advertising campaign is designed with the hope that the mean
number of miles driven is less than the current mean. Therefore,
Ha: m , 15,300
Solution
Step 1 The assumption, or existing state, involves the population variance in thickness
of recycled printer paper. The null hypothesis is given in terms of s2.
H0: s2 5 0.0007
Step 2 The experiment is designed to detect any difference in the population variance.
This suggests a two-sided alternative.
Ha: s2 2 0.0007
9.9 In each of the following problems, determine whether 9.14 Biology and Environmental Science During the
each pair of statements is a valid set of null and alternative summer months, wildfires in the western United States pose a
hypotheses. Justify your answers. great hazard to people, residential and commercial buildings,
a. H0: m 5 30; Ha: m 2 30 and animals. Previous records indicate that the mean number of
b. H0: s 5 3.55; Ha: s . 3.55 acres burned during a wildfire is 17,060. The most recent
c. H0: p # 0.32; Ha: p . 0.32 summer was unusually wet, and firefighting officials would
d. H0: x 5 78.5; Ha: x 2 78.5 like to know whether the mean number of acres burned during
wildfires was any less. State the null and alternative hypotheses.
9.10 In each of the following problems, determine whether
each pair of statements is a valid set of null and alternative 9.15 Biology and Environmental Science Tourism offi-
hypotheses. Justify your answers. cials in Palm Beach County, Florida, claim that the mean diam-
a. H0: p 5 0.50; Ha: p 2 0.50 eter of sand dollars found on Delray Beach is 4.25 centimeters.
b. H0: m 5 25.6; Ha: m , 25.6 Some scientists claim that warmer water off the coast has inhib-
c. H0: m , 35.9; Ha: m $ 35.9 ited growth of all sea life and that the diameter of sand dollars
d. H0: s2 5 95; Ha: s2 . 95 has decreased. State the null and alternative hypothesis.
9.11 In each of the following problems, a conclusion to a 9.16 Marketing and Consumer Behavior The stereotypical
hypothesis test is presented. Determine whether each statement video-game player is a male teenager. However, recent studies
is permissible. suggest that at least 45% of all players are women.4 A software
a. The value of the test statistic does not lie in the rejection company has decided to develop and market a sophisticated
region. Therefore, we accept the null hypothesis. stock-market video game if the mean age of all video-game
b. The value of the test statistic lies in the rejection region. players is greater than 25. A random sample of players will be
Therefore, there is evidence to suggest the null hypothesis obtained, and the resulting data will be used to test the relevant
is not true. hypothesis. Let m represent the mean age of all video-game
c. The value of the test statistic does not lie in the rejection players. Which of the following sets of null and alternative
region. Therefore, there is evidence to suggest the null hypotheses is appropriate? Justify your answer.
hypothesis is true. a. H0: m 5 25 versus Ha: m . 25
d. The value of the test statistic lies in the nonrejection b. H0: m 5 25 versus Ha: m , 25
region. Therefore, there is no evidence to suggest the c. H0: m 5 25 versus Ha: m 2 25
alternative hypothesis is true. 9.17 Physical Sciences There are approximately 16 fiber-
e. The value of the test statistic does not lie in the rejection optic lines under the ocean off the Florida coast. These lines
region. Therefore, there is no evidence to suggest the carry telephone and Internet communications between Florida
alternative hypothesis is true. and Europe, Latin America, and the Caribbean. During storms,
f. The value of the test statistic lies in the rejection region. these lines sway dramatically and often damage coral or get
Therefore, there is evidence to suggest the alternative caught in anchors. Technical reports including the distance the
hypothesis is true. line swayed indicate that the variance in sway during storms is
32 feet. A study will be conducted to determine whether a new,
Applications heavier cable housing will decrease the variance in sway. State
9.12 Education and Child Development The College the relevant null and alternative hypotheses in terms of s2, the
Board reported that the mean cumulative SAT score (Critical variance in cable sway.
Reading, Mathematics, and Writing) for 2013 college-bound 9.18 Technology and the Internet Many limousines now
seniors was 1498.3 Officials from the Pennsylvania State offer the latest in high-tech gadgets. DVD players, satellite radio,
System of Higher Education would like to know whether the and wireless Internet connections are some of the high-priced
students enrolled for fall 2013 classes have a mean cumulative accessories. Suppose Boston Coach will install wireless Inter-
SAT score greater than 1498. A random sample of students net in all of its vehicles if more than 75% of all clients desire
from across the system is obtained, each cumulative SAT score this service. A random sample of clients will be selected, and
is recorded, and the mean cumulative score is recorded. State the resulting information will be used to test the relevant
the null and alternative hypotheses. hypothesis. Let p represent the proportion of all clients who
9.13 Marketing and Consumer Behavior A marketing would like wireless Internet in a limousine. State the null and
research study conducted in 2013 indicated that 11% of all alternative hypotheses in terms of p.
households in the United States have at least one telescope. 9.19 Sports and Leisure The mean duration of a Major
During May 2013, Venus, Jupiter, and Mercury were visible in League baseball game from the first pitch to the last out over the
a near-perfect triangle formation. A company that manufactures first eight weeks of the 2013 season was 2 hours and 58 minutes.5
telescopes believes the increased media attention caused more During the second half of the season, Major League umpires
people to buy telescopes in order to see this spectacular sight. tried various techniques to speed up the game. A random sample
State the null and alternative hypotheses in terms of the popula- of second-half games will be selected, and the resulting game
tion proportion of households that have at least one telescope. durations will be used to determine whether games take less time
9.1 The Parts of a Hypothesis Test and Choosing the Alternative Hypothesis 397
to complete. Let m be the mean duration of a baseball game. whether there is evidence that the mean distance to a fire
State the relevant null and alternative hypotheses in terms of m. hydrant has decreased. State the null and alternative hypotheses
in terms of m, the population mean distance to a fire hydrant.
9.20 Travel and Transportation DATTCO, a school bus
company in New Britain, Connecticut, will install seat belts 9.26 Public Policy and Political Science The U.S. Health
on all buses if more than 50% of all parents favor this change. Insurance Portability and Accountability Act of 1996 (HIPAA)
A random sample of parents in the school district will be was designed to protect personal privacy. However, this law has
obtained, and the responses will be used to test the relevant created mountains of paperwork and may even increase the cost
hypothesis. If p is the true proportion of parents who favor of medical care. The federal government has decided to consider
seat-belt installation, which of the following sets of null and repealing this law if more than 60% of all hospitals are experi-
alternative hypotheses is appropriate? Justify your answer. encing increased costs due to this regulation. A random sample
a. H0: p 5 0.50 versus Ha: p 2 0.50 of hospitals will be obtained, and the resulting information will
b. H0: p 5 0.50 versus Ha: p , 0.50 be used to test the relevant hypothesis. State the null and alterna-
c. H0: p 5 0.50 versus Ha: p . 0.50 tive hypotheses in terms of p, the population proportion of hos-
pitals experiencing increased costs as a result of HIPAA.
9.21 Public Policy and Political Science Suppose Colby
Coash, a state senator from Lancaster County, Nebraska, is con- 9.27 Public Policy and Political Science The mean time
sidering a run for governor. He will enter the campaign if there is from receipt of application to a final decision for a routine citizen-
evidence to suggest that more than 65% of all state residents favor ship request in Canada is 25 months.6 In an effort to shorten this
his candidacy. A random sample of likely voters is obtained, and lengthy process, the government of Canada has installed a sophis-
the resulting survey data will be used to test the relevant hypoth- ticated electronic record system and hired more staff. A study will
esis. Let p be the proportion of all voters who favor his candidacy. be conducted to determine whether these actions have been effec-
State the null and alternative hypotheses in terms of p. tive. State the null and alternative hypothesis in terms of m, the
mean time from receipt of application to a final decision.
9.22 Education and Child Development Students at
9.28 Public Policy and Political Science The Sparks City
Stetson University plan to ask administrators to build more
Council in Nevada will build a new dock at the Sparks Marina if
on-campus housing on the DeLand campus if there is any
more than 80% of the residents favor the plan. A random sample of
evidence that the median monthly rent for off-campus housing
residents will be obtained, and their responses will be used to test
is more than $350 per person. A random sample of students
the relevant hypothesis. Let p be the population proportion of resi-
living off campus will be obtained, and the resulting rent data
~ be the median dents who favor a new marina. Which of the following sets of null
will be used to test the relevant hypothesis. Let m
and alternative hypotheses is appropriate? Justify your answer.
monthly rent for students. State the null and alternative
~. a. H0: p 5 0.80 versus Ha: p 2 0.80
hypotheses in terms of the population median m
b. H0: p 5 0.80 versus Ha: p , 0.80
9.23 Travel and Transportation Commercial airline pilots c. H0: p 5 0.80 versus Ha: p . 0.80
often begin as flight engineers or first officers for smaller, 9.29 Business and Management The Savannah Sugar
regional airlines. Historically, newly hired pilots at major airline Refinery in Louisiana will invest in new energy-saving devices
companies have approximately 2000 hours of flight experience. if there is evidence to suggest that the mean amount of energy
The FAA is conducting a study to determine if the mean flight (in kilowatts) used per day will decrease from 1925. Suppose m
experience of newly hired pilots has decreased. State the null and is the population mean amount of energy used per day, and an
alternative hypotheses in terms of the population mean m. experiment is designed to test the new equipment. State the null
9.24 Medicine and Clinical Studies A new surgical proce- and alternative hypotheses in terms of m.
dure has been developed to remove cataracts; it involves a 9.30 Economics and Finance Suppose U.S. monetary policy
smaller incision. This more expensive procedure will be imple- will be set by the Federal Reserve Board by examining the median
mented at a major hospital only if there is evidence that the consumer price, m ~ , for a large fixed set of common commodities.
standard deviation in time to recovery is less than seven days. The Board will raise the interest rate if there is evidence that the
A random sample of patients will receive the new procedure, median price is less than $125.50. A random sample of counties
and the resulting recovery times will be used to test the relevant will be obtained, and the cost of these goods in each county will be
hypothesis. Let s be the population standard deviation in recov- used to test the relevant hypothesis. State H0 and Ha in terms of m ~.
ery time. State the null and alternative hypotheses in terms of s.
9.31 Public Health and Nutrition The typical American
9.25 Public Policy and Political Science Officials in Dexter, adult male consumes approximately 2666 calories per day. New
a small town in upstate New York, have decided to install more research suggests that those people who stay up late lose more
fire hydrants in order to decrease insurance rates for many busi- sleep and consume more calories.7 A random sample of 125
nesses. Prior to installing the new fire hydrants, the mean dis- American adult males who routinely stay up late was obtained,
tance to a fire hydrant for downtown buildings was 525 feet. and the number of calories per day consumed by each was care-
After the new installations, a random sample of downtown fully recorded. Let m represent the mean number of calories
buildings will be obtained, and the distance to the nearest fire consumed per day by an American adult male. State the rele-
hydrant will be recorded. The data will be used to determine vant null and alternative hypotheses in terms of m.
398 C hapter 9 Hypothesis Tests Based on a Single Sample
Cutoff
point
Here is what can happen when the sample data are collected and the hypothesis is
tested.
This is all very theoretical. You never 1. Suppose the true population mean m is 62°F or greater; the water is warm enough for
really know whether you made a swimming.
mistake in a hypothesis test. a. If x . 61.5°F, then there is no evidence to reject H0. This conclusion is correct.
b. If x # 61.5°F, then there is evidence to reject H0. An advisory warning is posted.
This conclusion is incorrect. The water really is warm enough for swimming.
2. Suppose the true population mean m is less than 62°F; the water is too cold for
swimming.
a. If x . 61.5°F, then there is no evidence to reject H0. This conclusion is incorrect.
The water is really too cold for swimming.
b. If x # 61.5°F, then there is evidence to reject H0. An advisory warning is posted.
This conclusion is correct.
This example illustrates the two possible errors in a hypothesis test.
9.2 Hypothesis Test Errors 399
Definition
1. The value of the test statistic may lie in the rejection region, but the null hypothesis is
true. If we reject H0 when H0 is really true, this is called a type I error. The probability
of a type I error is called the significance level of the hypothesis test and is denoted by
a: P(type I error) 5 a.
2. The value of the test statistic may not lie in the rejection region, but the alternative
hypothesis is true. If we do not reject the null hypothesis when Ha is really true, this is
called a type II error. The probability of a type II error is denoted by b: P(type II
error) 5 b.
The following table illustrates the decisions and errors in a hypothesis test.
Decision
Reject H0 Do not reject H0
H0 Type I error Correct decision
Truth
Ha Correct decision Type II error
Each example below describes a specific hypothesis test. The type I error and the type
II error are described in context, and the real-world consequences of a wrong decision
are given.
Solution
Step 1 Suppose H0 is true: The proportion of defective solar panels is really 0.078
(or less).
a. If H0 is rejected, the entire solar panel shipment will be sent back. This is a
type I error, and in this case the company in Shanghai will not be happy. The
solar panels are of acceptable quality, but the entire shipment is returned.
b. If H0 is not rejected, then the solar panel shipment is accepted. Everyone is
happy in this case. The hypothesis test indicated no evidence of a higher pro-
portion of defective solar panels.
Step 2 Suppose Ha is true: The proportion of defective solar panels is greater than 0.078.
a. If H0 is rejected, the entire solar panel shipment will be returned. This is the
correct conclusion. The hypothesis test indicated the solar panel shipment
contained a higher proportion of defectives, and Florida Power and Light will
be glad to return a bad batch.
400 Chapter 9 Hypothesis Tests Based on a Single Sample
b. If H0 is not rejected, then the solar panel shipment will be accepted. This is a
type II error, and in this case Florida Power and Light will not be happy. Too
many solar panels are defective, yet Florida Power and Light will accept a
shipment of poor quality.
Solution
Step 1 Suppose H0 is true: The mean amount of the active ingredient is 80 mg.
a. If H0 is rejected, the aspirin shipment is returned to Bayer. This is a type I
error, and Bayer will not like this decision. The aspirin has the specified
amount of acetylsalicylic acid, but the entire shipment is returned.
b. If H0 is not rejected, then the aspirin bottles are placed on the shelves at Wal-
greens. Everyone is happy. The aspirin contains the correct amount of the
active ingredient, and the hypothesis test did not indicate otherwise.
Step 2 Suppose Ha is true: The mean amount of active ingredient is not 80 mg.
a. If H0 is rejected, the aspirin shipment is returned to Bayer. This is the cor-
rect conclusion. The hypothesis test indicated the mean amount of acetyl-
salicylic acid is different from 80 mg, and Walgreens is happy to return a
flawed shipment.
b. If H0 is not rejected, the aspirin bottles are placed on the shelves at Walgreens,
ready for sale. This is a type II error, and in this case Walgreens may encoun-
ter some very unhappy customers. The mean amount of the active ingredient
is not as specified, yet the aspirin is placed in store stock and sold.
Solution
Step 1 Suppose H0 is true: The population variance is 0.04 inch (or less).
a. If H0 is rejected, the manufacturing process is shut down. This is a type I
error. The null hypothesis is true, but H0 is rejected. This error is bad for
Nucor. The process is stopped unnecessarily, and production time is lost.
b. If H0 is not rejected, then the facility continues to manufacture metal sheets.
This is a correct decision.
Step 2 Suppose Ha is true: The population variance in thickness is greater than 0.04 inch.
a. If H0 is rejected, the correct decision has been made. The variance is too high
and the hypothesis test procedure suggested that the manufacturing process
should be shut down for inspection.
b. If H0 is not rejected, the facility continues to hum along. This is an incorrect
decision, a type II error. Ha is true, but the null hypothesis is not rejected. In
this case, there is a good chance the metal sheets are being made with too
much variability in thickness.
A perfect hypothesis test would The efficiency or goodness of a hypothesis test is often measured in terms of a and b.
have the probability of a type I Intuitively, an effective test should have the probability of both types of errors small. In fact,
error and the probability of a type because no one wants to make a mistake, it would be ideal to have a 5 b 5 0. However, this
II error both equal to 0. is impossible, because any decision in a hypothesis test is made using information in a sample.
We can never examine the entire population, so there is always a chance of making a mistake.
In a hypothesis test, we usually control the value of a. We do not have any direct control
over b, but we can often compute the probability of a type II error for a specific alternative
value of the parameter. It seems reasonable to set a as small as possible, but for a fixed sample
a and b are inversely related, but size, a and b are inversely related. For a fixed n, making a smaller forces b to be larger. The
not by any set formula. We know only way to decrease both a and b (simultaneously) is to increase the sample size. More infor-
only that as a decreases, b mation (larger n) means less of a chance of making a mistake. The most common values for a
increases; and as b decreases, a are 0.05 and 0.01. The probability of a type II error actually depends on the specific alternative
increases. value of the population parameter being tested. Therefore, the probability of a type II error is
usually written as a function of the population parameter. For example, b ( ma ) represents the
probability of a type II error if the true value of m is ma.
Consider a hypothesis test with H0: u 5 u 0 and Ha: u 2 u 0. As the alternative value of
the parameter, u a, moves farther away from the hypothesized value, u 0, the probability of
a type II error, b ( u a ) , decreases. This seems reasonable because for values of u a far away
from u 0, we have a better chance of detecting the difference between the hypothesized
value and the alternative value of the population parameter.
Later in this chapter, we will visualize and learn how to compute the probability of a
type II error for a specific alternative value of the population characteristic.
Practice evidence to suggest that the mean number of cars per day using
this bridge has increased.
9.37 Consider a hypothesis test with a. Write the null and alternative hypotheses about m, the
H0: m 5 180 and Ha: m , 180 mean number of cars per day that cross the Tappan Zee
Bridge.
Determine whether each of the following decisions is correct or
b. For the hypotheses in part (a), describe the type I and
in error. Identify each error as type I or type II.
type II errors.
a. The true value of m is 180 and H0 is rejected.
c. If a type I error is committed, who is more angry, the
b. The true value of m is 179 and H0 is rejected.
transportation officials or drivers, and why?
c. The true value of m is 160 and H0 is not rejected.
d. If a type II error is committed, who is more angry, the
d. The true value of m is 182 and H0 is rejected.
transportation officials or drivers, and why?
9.38 Consider a hypothesis test with
9.44 Public Policy and Political Science The Avenal State
H0: p 5 0.44 and Ha: p 2 0.44 Prison in California has thousands of files on former inmates, an
Determine whether each of the following decisions is correct estimated 20 million pages of documents. A proposal has been
or in error. Identify each error as type I or type II. made to archive many of these old files to compact discs. The
a. The true value of p is 0.44 and H0 is not rejected. state will release money for this project only if prison officials
b. The true value of p is 0.41 and H0 is not rejected. present evidence to suggest the true mean age of all files is greater
c. The true value of p is 0.45 and H0 is not rejected. than 10 years. A random sample of files will be obtained, and the
d. The true value of p is 0.42 and H0 is rejected. information in the sample will be used to test the hypotheses
may increase the level of antioxidants, compounds that protect technology.14 Many shoppers still feel insecure completing a
us against free radicals, which can cause heart disease and transaction with a credit card via a mobile device. Executives
cancer.12 A study was designed, and the antioxidant concentration at Alt Hotels have decided to conduct a survey to determine
in patients who ate dark chocolate regularly was measured. whether hotel guests are becoming more comfortable booking a
This information was used to test the hypotheses room using a mobile device. If the proportion ( p 5 0.02 ) has
increased, they intend to invest more money in this technology.
H0: m 5 0.4 versus Ha: m . 0.4
a. State the null and the alternative hypothesis.
where m is the true mean percentage of antioxidants in the b. Describe a type I error and a type II error in this context.
bloodstream.
a. Describe a type I error and a type II error in this context. 9.52 Medicine and Clinical Studies An electroencephalo-
b. The Hershey Company, maker of Hershey’s chocolates, is gram (EEG) is often used to measure brain, or neural, activity
very interested in the results of this study. Which error is expressed as electrical voltage. Research suggests that transcen-
more serious for this company? Why? dental meditation (TM) produces a simplified state of rest and
relaxation resulting in an EEG with smaller than normal vari-
9.48 Public Policy and Political Science The Dallas, Texas, ance. Suppose normal brain activity has variance s2 5 15 volts2.
city council is going to consider a zoning variance for the A random sample of patients who can achieve TM will be
Estates on Frankford apartment complex so that the developer selected, and their brain activity will be measured during TM.
may extend the structure closer to the nearest road. Some coun- The resulting information will be used to test the research theory.
cil members are concerned about safety, but they will vote for a. State the null and alternative hypotheses in terms of s2.
the measure if more than 60% of all city residents favor the b. Describe a type I and a type II error in this context.
variance. A random sample of residents will be obtained and c. If there is evidence to suggest TM decreases the variance
asked whether they favor the zoning variance. Let p be the true in brain activity, then the National Science Foundation
proportion of residents who favor the extended structure. (NSF) will commit more money for TM research. Which
a. State the null and alternative hypotheses in terms of p. error is more serious for the NSF? Why? Which error is
b. Describe a type I error and a type II error in this context. more serious for TM researchers? Why?
c. Which error is more serious for the developer? Why?
d. Which error is more serious for the city council
members? Why? Extended Applications
9.49 Manufacturing and Product Development The 9.53 Marketing and Consumer Behavior Many families
National Fire Protection Association maintains an extensive list have decided to use a satellite dish instead of traditional cable
of codes and standards that address equipment, extinguishing TV. Satellite-dish companies offer a wide variety of premium
systems, and fire-protective gear.13 A firefighter’s helmet should channels and popular sports packages. The local cable TV
withstand a temperature of 1200°F for 30 seconds. The Seattle provider in Neodesha, Kansas, Cable ONE, is concerned about
Fire Department is ordering helmets from Kidde Fire Fighting. losing market share and is going to conduct a hypothesis test
They will accept the shipment only if there is no evidence to to determine whether more advertising is needed. A random
suggest the helmets fail to meet the standard. Let m be the sample of homes in the city will be obtained, and the data will
mean temperature the helmets can withstand. be used to determine whether there is any evidence that the true
a. State the null and alternative hypothesis. proportion of homes with satellite dishes is greater than 0.25.
b. Describe a type I error and a type II error in the context of a. State the null and alternative hypotheses in terms of p.
this problem. b. Describe type I and type II errors in this context.
c. For a fixed sample size, which is smaller, b ( 0.27 ) or
9.50 Economics and Finance A shadow chief secretary b ( 0.35 ) ? Justify your answer.
to the treasury claims that first-time home buyers in the South
9.54 Sports and Leisure South Carolina Department of
Downs area (near Worthing in Sussex, England) are paying
(on average) more than £1367 in stamp duty. The shadow chief Natural Resources personnel enforce hunting and fishing regula-
secretary believes this steep tax is prohibiting families from tions and conduct routine safety checks on recreational boats.
buying a first home in this area. A random sample of first-time Past experience indicates that 15% of all boats inspected have
home buyers will be obtained, and the stamp duty paid by each at least one safety violation. Recent accidents on lakes in South
family will be recorded. These data will be used to determine Carolina have prompted calls for more extensive inspections.
whether there is any evidence the mean stamp duty for first- A random sample of recreational boats will be obtained and
time home buyers, m, is more than £1367. inspected. If there is evidence that the true proportion of boats
a. State the null and alternative hypotheses in terms of m. with safety violations, p, is more than 15%, a methodical inspec-
b. Describe a type I error and a type II error in this context. tion of every boat launched at popular sites will be started.
a. State the null and alternative hypotheses in terms of p.
9.51 Psychology and Human Behavior Recently a panel b. Describe a type I error and a type II error in this context.
of Canadian hotel industry IT executives reported on hotel tech- c. What happens to the probability of a type I error as the
nology trends. Although 22–30% of guests shop for a room value of p approaches 0.15 from the right (that is, 0.20,
using a mobile device, only 2% actually book a room using this 0.19, . . .)?
404 C hapter 9 Hypothesis Tests Based on a Single Sample
9.55 Physical Sciences Civil engineers are going to test a 9.56 Medicine and Clinical Studies The failure rate of hip
highway bridge outside of Washington, D.C., for structural implants in men is approximately p 5 0.019, and this propor-
integrity. A random sample of locations along the bridge will be tion is even higher for women.15 A new implant design by
selected, and the concrete stress (in pounds per square inch, psi) researchers at the Missouri Bone & Joint Research Foundation
will be measured. These data will be used to determine the uses a thinner plastic and will hopefully cause fewer complica-
safety of the bridge. tions and further surgeries. A long-term study was conducted to
a. Suppose 6400 psi is considered a safe concrete stress examine the failure rate of the new implant design in men.
level. Which pair of hypotheses should be tested? a. State the null and alternative hypotheses in terms of p.
H0: m 5 6400 versus Ha: m . 6400 b. Describe a type I error and a type II error in this context.
c. For a fixed sample size, which is smaller, b ( 0.015 ) or
or
b ( 0.010 ) ?
H0: m 5 6400 versus Ha: m , 6400 d. If you have invested in the company that manufactures this
b. If you regularly drive over the bridge being tested, would new implant, would you prefer a 5 0.1 or a 5 0.01?
you prefer a 5 0.1 or a 5 0.01? Why? Why?
Solution
Step 1 Let m be the mean time to treatment for very ill patients entering the emergency
room. The current state, or what is assumed to be true, involves time to treatment
prior to the waiting-room experiment. Therefore, the null hypothesis is
H0: m 5 20
Step 2 The chairman of the Department of Emergency Medicine for University Hospi-
tal hopes doctors going directly to patients in the waiting room will decrease
time to treatment for very ill patients. The alternative hypothesis is
Ha: m , 20
Step 3 We need to know whether 16.1 minutes is a reasonable observation under the
null hypothesis, or whether it is too far away from the assumed mean, 20 min-
utes. Because H0 is assumed to be true, and n 5 36 $ 30, by the central limit
theorem, X is approximately normal with mean m 5 20 and standard deviation
s / !n. Although X could be used as the test statistic, it’s easier to identify
9.3 Hypothesis Tests Concerning a Population Mean When s Is Known 405
Critical value
1.6449 0
The critical value divides the z
Rejection region
measurement axis into two parts:
the rejection region and the Figure 9.3 The critical value and the rejection region for
nonrejection region. the emergency-room hypothesis test.
x 2 20 16.1 2 20
The small letters here (z and x) z5 5 5 24.68
represent actual values. s / !n 5.0/ !36
The value x 5 16.1 is 4.68 standard deviations to the left of the mean—a very
unusual observation if the null hypothesis is really true. This means either 16.1
is an incredibly lucky observation or the assumption (m 5 20) is wrong. As
usual, we discount the lucky possibility.
More formally, because z 5 24.68 ( # 21.6449 ) lies in the rejection region, we
reject the null hypothesis at the a 5 0.05 level of significance. There is evidence to
suggest the true population mean time to treatment, m, is less than 20 minutes.
In any hypothesis test, as in the example above, we assume H0 is true and consider the
likelihood of the sample outcome (expressed as a single value of the test statistic).
1. If the value of the test statistic is reasonable under the null hypothesis, then we cannot
reject H0.
2. If the value of the test statistic is unlikely under the null hypothesis, then we reject H0.
406 Chapter 9 Hypothesis Tests Based on a Single Sample
As presented in Section 9.1, there are three possible alternative hypotheses: one two-sided
alternative and two one-sided alternatives. The rejection region depends on the alternative
hypothesis. The general procedure for a hypothesis test concerning m is summarized below.
A hypothesis test about the population mean m with significance level a has the form:
H0: m 5 m0
Ha: m . m0, m , m0, or m 2 m0
X 2 m0
The rejection region always TS: Z 5
includes the endpoint of the s / !n
(infinite) interval. RR: Z $ za, Z # 2za, or 0 Z 0 $ za / 2
A Closer L ok
1. m0 is a fixed, hypothesized value of the population mean m.
2. Use only one (appropriate) alternative hypothesis and the corresponding rejection
region. The graphs in Figures 9.4–9.6 illustrate the rejection region for each alternative
hypothesis.
Z Z Z
/2 /2
0 z z 0 z 2 0 z 2
Figure 9.4 Rejection region for Figure 9.5 Rejection region for Figure 9.6 Rejection region for
Ha: m . m0 . Ha: m , m0 . Ha: m 2 m0 .
3. For a two-sided alternative hypothesis, the rejection region 0 Z 0 $ za/2 (written using
the absolute value of Z ) is simply a shorthand way to write Z $ za/2 or Z # 2za/2.
4. For a given significance level, the corresponding critical value is found using Table III
in the Appendix, backward. This procedure was presented in Chapter 8. Common values
for a are 0.05, 0.025, and 0.01.
5. The hypothesis test procedure described above can be used only if s is known. If n is
large and s is unknown, some statisticians suggest using the sample standard devia-
tion s in place of s. This produces an approximate test statistic. Section 9.5 presents
an exact test procedure concerning m when s is unknown.
Solution Trail 9.8 weather. The long-term mean area of the Dead Zone is 5960 square miles.16 As a result of
recent flooding in the Midwest and subsequent runoff from the Mississippi River, research-
K e ywor ds ers believe that the Dead Zone area will increase. A random sample of 35 days was obtained,
n Is there any evidence? and the sample mean area of the Dead Zone was 6759 mi2. Is there any evidence to suggest
n Greater than the long-term mean that the current mean area of the Dead Zone is greater than the long-term mean? Assume
n Standard deviation 1850 that the population standard deviation is 1850 and use a 5 0.025.
n Random sample
Solution
Tr a nsl ati o n
Step 1 The current state, or assumed mean, is m 5 5960 ( 5 m0 ) .
n Conduct a one-sided, right-
tailed test about a population The sample size is n 5 35; s 5 1850 and a 5 0.025.
mean m We are looking for evidence that the current mean area of the Dead Zone is
n m0 5 5960 greater than the long-term mean. Therefore, the alternative hypothesis is one-
n s 5 1850 sided, right-tailed.
Step 2 The four parts of the hypothesis test are
C onc e p ts
n Hypothesis test concerning a H0: m 5 5960
population mean when s is Ha: m . 5960
known
X 2 m0
Vision TS: Z 5
s/ !n
Use the template for a one-sided,
right-tailed test about m. The RR: Z $ za 5 z0.025 5 1.96
underlying population distribution Step 3 The value of the test statistic is
is unknown, but n is large and
s is known. Determine the x 2 m0 6759 2 5960
z5 5 5 2.555 $ 1.96
appropriate alternative hypothesis s / !n 1850/ !35
and the corresponding rejection
region, find the value of the test Step 4 Because 2.555 lies in the rejection region, we reject the null hypothesis at the
statistic, and draw a conclusion. a 5 0.025 level. There is evidence to suggest the current mean area of the Dead
Zone is greater than 5960 mi2.
Figures Figure 9.7–Figure 9.9 show a technology solution.
VIDEO Tech
Video TECH Manuals
MANUALS Try It Now Go to Exercise 9.74
EXELmean
One DISCRIPTIVE
test - z -
summarized data Example 9.9 Natural Defense
Data Set White blood cells are the body’s natural defense mechanism against disease and infection.
The mean white blood cell count in healthy adults, measured as part of a CBC (complete
whtcells
blood count), is approximately 7.5 3 103 /ml.17 A company developing a new drug to treat
arthritis pain must check for any side effects. A random sample of patients using the new
VIDEO Tech
Video TECH Manuals
MANUALS drug was selected, and the white blood cell count of each patient was measured. The
EXELmean
One DISCRIPTIVE
test - z - results are given in the table below ( 3103 /ml ) .
with data
6.50 8.69 6.85 6.76 6.58 8.84 8.44 8.28 7.65 6.95 10.12
8.74 8.00 8.84 7.93 7.65 7.00 6.70 9.20 6.45 7.66
408 Chapter 9 Hypothesis Tests Based on a Single Sample
Assume the distribution of white blood cell counts is normal and s 5 1.1. Conduct a
hypothesis test to determine whether there is any change in the mean white blood cell
count due to the arthritis drug. Use a 5 0.01.
Solution
Step 1 The assumed mean is m 5 7.5 ( 5 m0 ) ; the sample size is n 5 21; s 5 1.1; and
a 5 0.01.
The company is looking for any change in the mean white blood cell count.
Therefore, the relevant alternative hypothesis is two-sided.
The sample size is small (n , 30), but the population is assumed to be nor-
mal. The hypothesis test concerning a population mean when s is known can
be used.
Step 2 The four parts of the hypothesis test are
H0: m 5 7.5
Ha: m 2 7.5
X 2 m0
TS: Z 5
s / !n
RR: 0 Z 0 $ za/2 5 z0.005 5 2.5758 ( Z # 22.5758 or Z $ 2.5758 )
Step 3 The sample mean is
1
x5 ( 6.50 1 8.69 1 c1 7.66 ) 5 7.8014
21
The value of the test statistic is
x 2 m0 7.8014 2 7.5
z5 5 5 1.2556
s / !n 1.1/ !21
Step 4 The value of the test statistic, z 5 1.2556, does not lie in the rejection region.
We do not reject the null hypothesis at the a 5 0.01 level of significance. There
is no evidence to suggest the new arthritis drug has changed the mean white
blood cell count.
Figures 9.10 through Figure 9.12 show a technology solution using the TI-84
plus; Figure 9.13 shows JMP Test Mean results.
Recall that the probability of a type II error depends on the alternative value of the
parameter under investigation. The example below presents a method for computing b in
a hypothesis test about a population mean m when s is known.
Solution
Step 1 The assumed mean is m 5 315 ( 5 m0 ) ; n 5 36, s 5 29.0, and a 5 0.025.
The NFL scouts are looking for evidence of a smaller mean weight in offensive
tackles. This is a one-sided, left-tailed test.
The underlying population distribution is unknown, but the sample size is large
(n 5 36 $ 30). The hypothesis test concerning a population mean m when s is
known is relevant.
Step 2 The four parts of the hypothesis test are
H0: m 5 315.0
Ha: m , 315.0
X 2 m0 X 2 315.0
TS: Z 5 5
s / !n 29.0/ !36
RR: Z # 2za 5 2z0.025 5 21.96
Step 3 To compute the probability of a type II error, you must visualize the rejection
region in terms of X . Start by writing the definition of a type I error, and work
backward to the X world.
P ( Z # 21.96 ) 5 0.025 Definition of type I error.
X 2 315.0
Pa # 21.96b 5 0.025 Use the definition of the test statistic.
29.0/ !36
Isolate X. Within the probability expression, multiply
P ( X # 305.5 ) 5 0.025 both sides by 29.0 / !36, and add 315.0 to both sides.
305.5 5 X c is the critical value in the X world (see Figure 9.14). If the
sample mean weight of the tackles is less than or equal to 305.5, reject the
null hypothesis.
Step 4 b ( 300 ) is the probability of not rejecting the null hypothesis if the real mean is
300 (often denoted ma 5 300; the a in the subscript stands for alternative
mean). Write a probability statement for b ( 300 ) using the critical value 305.5,
and standardize.
b ( ma ) 5 P ( X . X c ) Definition of a type II error.
Z Z
1.96 0 0 1.14
Figure 9.14 Visualization of the calculations for b(300), the probability of a type II error
for ma 5 300.
If the true mean weight of tackles at the combine is 300 pounds, the probability
of a type II error is 0.1271, at the a 5 0.025 significance level.
A Closer L ok
1. By referring to Figure 9.14, one can visualize the inverse relationship between a and
b. For example, if a increases, Xc moves to the right, and the area of the region cor-
responding to b decreases. Exercise 9.88 at the end of this section is designed to
explore and confirm this concept numerically.
2. The logic and method for finding the probability of a type II error for the other alterna-
tive hypotheses is the same:
a. Work backward to find Xc .
b. Write a probability statement for b ( ma ) in terms of X and Xc .
c. Standardize to find the probability.
3. Rather than trying to memorize a formula for the probability of a type II error corre-
sponding to each alternative hypothesis, use the definition, a drawing, and the three-
A little educational philosophy step procedure described above. This conceptual approach will help when we consider
sneaks in here. the hypothesis test procedure concerning a population proportion.
Technology Corner
Procedure: Hypothesis test concerning a population mean when s is known.
Reconsider: Example 9.9, solution, and interpretations.
9.3 Hypothesis Tests Concerning a Population Mean When s Is Known 411
CrunchIt!
Use the built-in function z 1-Sample. Input is either summary statistics or a column containing the sample data.
1. Enter the data in column Var1.
2. Select Statistics; z; 1-sample.
3. Under the Columns tab, select Var1 for the Sample and enter the standard deviation. Under the Hypothesis Test
tab, enter the value for m0 and select the appropriate Alternative. (Figure 9.15).
4. Click Calculate. The results are displayed in a new window (Figure 9.16).
Figure 9.15 1-Sample input screen. Figure 9.16 1-Sample test results.
TI-84 Plus C
Use the calculator function Z-Test. The input for this function is either data in a list or summary statistics.
1. Enter the data into list L1.
2. Select STAT ; TESTS; Z-Test.
3. The data are given in this example; highlight Data. Enter m0 , s, the name of the list containing the data,
the frequency of occurrence of each observation, and highlight the appropriate alternative hypothesis.
See Figure 9.10.
4. In the last row of the input screen, select Calculate and press ENTER .
5. The p value associated with the hypothesis test is given on the Z-Test output screen. This topic is covered in
Section 9.4. See Figure 9.11.
6. The Z-Test input screen includes an option to Draw, or visualize, the results of the hypothesis test. The calculator
will automatically determine a suitable WINDOW, sketch a standard normal density curve, display the computed z
and p values, and shade the area under the curve corresponding to the p value. See Figure 9.12.
Minitab
Use the function 1-Sample Z. The input is either data in a column (Samples in columns) or summary statistics
(Summarized data).
1. Enter the data into column C1.
2. Choose Stat; Basic Statistics; 1-Sample Z.
3. In this example, select One or more samples, each in a column and enter C1. Enter the Standard deviation, check
the box to Perform hypothesis test, and enter the Hypothesized mean (Figure 9.17).
4. Choose Options. Enter a Confidence level and select the Alternative hypothesis (Figure 9.18).
5. Summary statistics, the confidence interval, the value of the test statistic, and the p value are displayed in the
Session window (Figure 9.19).
412 C h a p t e r 9 Hypothesis Tests Based on a Single Sample
Figure 9.17 1-Sample Z input screen. Figure 9.19 1-Sample Z hypothesis test results.
Excel
Use the function Z.TEST. The input is a column (array) of data.
1. Enter the data into column A.
2. The arguments for Z.TEST are the array, m0 , and s. The function returns the probability that the sample mean is
greater than the hypothesized mean. Other calculations may be necessary to find the desired p value (discussed in
Section 9.4). See Figure 9.20.
a. Write the appropriate test statistic. c. The value of the test statistic does not lie in the rejection
b. Write the rejection region corresponding to each value of a: region. We accept the null hypothesis and conclude that
i. a 5 0.01 ii. a 5 0.2 iii. a 5 0.05 m 5 m0.
iv. a 5 0.10 v. a 5 0.001 vi. a 5 0.0002 d. The null hypothesis is H0: m . m0.
e. The probability of a type II error when m 5 ma is
9.65 Consider a hypothesis test concerning a population mean
1 2 a 5 1 2 0.05 5 0.95 .
with H0: m 5 3.55, Ha: m , 3.55, n 5 49, and s 5 6.2. Find
the significance level (a) for each rejection region:
a. Z # 21.6449 b. Z # 22.5758 c. Z # 22.0537 Applications
. Z # 22.3263 e. Z # 23.0902 f. Z # 23.7190
d 9.72 Business and Management The mean income per
9.66 Consider a hypothesis test concerning a population mean year of employees who produce internal and external
with H0: m 5 7.6, Ha: m 2 7.6, n 5 37, and s 5 4.506. Find n ewsletters and magazines for corporations was reported to be
the significance level (a) for each rejection region: $51,500. These editors and designers work on corporate
publications but not on marketing materials. As a result of
a. 0 Z 0 $ 1.96 b. 0 Z 0 $ 1.6449 c. 0 Z 0 $ 2.8070
poor economic conditions and oversupply, these corporate
. 0 Z 0 $ 3.2905 e. 0 Z 0 $ 1.2816
d f. 0 Z 0 $ 2.3263
communications workers may be experiencing a decrease in
9.67 Consider a hypothesis test concerning a population mean salary. A random sample of 38 corporate communications
from a normal population with H0: m 5 98.6, Ha: m . 98.6, workers revealed that x 5 49,762. Conduct a hypothesis test
SOLUTION TRAIL
n 5 10, and s 5 1.2. Find the significance level ( a ) for each to determine whether there is any evidence to suggest that the
rejection region: mean income per year of corporate communications workers
a. Z $ 3.7190 b. Z $ 0.8416 c. Z $ 2.3263 has decreased. Assume s 5 3750 and use a 5 0.01.
. Z $ 1.6449 e. Z $ 3.2905
d f. Z $ 2.8782
9.73 Biology and Environment Science Motor
9.68 Consider a random sample of size 25 from a normal vehicles contribute approximately 15% of all carbon
population with hypothesized mean 212 and s 5 2.88. dioxide released into the atmosphere each year. Carbon dioxide
a. Write the four parts for a one-sided, right-tailed hypothesis emissions are often used to determine if a motor vehicle is
test concerning the population mean with a 5 0.01. efficient, or green. A random sample of 40 2013 automobiles
b. What assumptions are made in order for the test in part was obtained and the CO2 emissions for each was measured
(a) to be appropriate? (in g/mi).19 The mean CO2 emissions for all automobiles tested
c. Suppose the sample mean is x 5 213.5. Find the value of by the U.S. Environmental Protection Agency in 2012 was
the test statistic, and draw a conclusion about the 335 g/mi. Is there any evidence that the mean CO2 emissions
population mean. has decreased? Assume s 5 65 and use a 5 0.05. Write a
Solution Trail for this problem. greencar
9.69 Consider a random sample of size 32 from a population
with hypothesized mean 3.14 and s 5 6.8. EX9.69 9.74 Marketing and Consumer Behavior After analyzing
a. Write the four parts for a one-sided, left-tailed hypothesis a database of over 2 million telephone calls, Vonage reported
test concerning the population mean with a 5 0.001. that the mean length of all international calls was 295 seconds.
b. What assumptions are made in order for the test in part Following an extensive advertising campaign, the company
(a) to be appropriate? expected the length of international calls to increase. A few
c. Compute the sample mean, and find the value of the test months after the ads first appeared, a random sample of 48 calls
statistic. Draw a conclusion about the population mean. was obtained. The sample mean was x 5 306.3 seconds.
Assume that the population standard deviation is 52 seconds.
9.70 Consider a random sample of size 48 from a population Is there any evidence to suggest the advertising campaign has
with hypothesized mean 365.25 and s 5 22.3. been successful? Use a 5 0.01.
a. Write the four parts for a two-sided hypothesis test
concerning the population mean with a 5 0.05. 9.75 Physical Sciences The U.S. Geological Survey collects
b. What assumptions are necessary in order for the test in water-level measurement data from various locations. Data
part (a) to be appropriate? from the La Pine Basin, Oregon, are being used to study
c. Suppose the sample mean is x 5 360.0. Find the value changes in the water table. Twelve days were randomly selected,
of the test statistic, and draw a conclusion about the and the water table in feet below land surface on each day is
population mean. given in the following table. WTRTABLE
9.76 Sports and Leisure The length of a motorboat is tradi- s 5 3.4. Is there any evidence to suggest that the download
tionally measured in two ways. The distance along the centerline speed for the United States is greater than the world population
from the outside of the front hull to the rear is the length overall mean? Use a 5 0.01. download
(LOA). The length of waterline, or load waterline (LWL), is the
9.81 Public Policy and Political Science The U.S. Depart-
length of the boat on the line where the boat meets the water.
ment of Health and Human Services defines “response time” as
Members of the Community Association in the lakeside city of
the time from receipt of a report of child neglect to the initial
Vermilion, Ohio, are concerned that residents are using bigger
investigation. Once a call to the agency is received and logged
boats, contributing to more noise and water pollution. Past
in, the clock starts. Face-to-face contact with the alleged victim
registration records indicate the mean LOA for boats allowed
marks the initial investigation. There is some concern that
on the lake is 35 feet. A random sample of 41 boats on the lake
workload and poor organization have contributed to a significant
was obtained, and each LOA was carefully measured. The
increase in the long-term mean response time of 14.0 hours. A
sample mean LOA was x 5 36.22 feet. Assume s2 5 5.7 ft2.
random sample of cases was selected, and the response time for
a. TRAIL
SOLUTION Is there any evidence to suggest the mean LOA has
each was recorded. Assume s 5 3.2 and conduct a hypothesis
increased? Use a 5 0.01.
test to determine whether there is any evidence that the mean
b. Does your answer to part (a) change if a 5 0.1? Why or
response time has increased. Use a 5 0.001. hhscases
why not?
9.82 Manufacturing and Product Development A major
9.77 Economics and Finance The price of coffee has cause of injuries in highway work zones is weak construction
been steadily declining since October 2012. The Interna- barriers. One federal government road-barrier specification
tional Coffee Organization (ICO) reported that the mean price involves the velocity of a front-seat passenger immediately fol-
for coffee in May 2013 was 126.96 U.S. cents/lb.20 A random lowing impact. This impact velocity must be less than 12 m/s.
sample of 34 days was obtained and the composite indicator for TSS GmbH, a German company, has just developed a new high-
Brazilian Natural coffee was recorded for each. The sample impact road barrier with special absorbing material. In controlled
mean was x 5 130.29. Assume the population standard devia- tests the impact velocity was measured for 12 randomly selected
tion is 8.6 U.S. cents/lb. Is there any evidence to suggest that the crashes. The sample mean was 11.85 m/s. Assume the distribu-
mean composite indicator for Brazilian Natural is greater than tion of impact velocities is normal and s 5 0.26.
126.96? Use a 5 0.05. Write a Solution Trail for this problem. a. Conduct a hypothesis test to determine whether there is
9.78 Public Health and Nutrition The mean daily energy sufficient evidence to suggest the true mean impact
requirement for eight-year-old boys is 2200 calories. An educa- velocity is less than 12 m/s. Use a 5 0.05.
tion researcher believes many students in this group do poorly b. What is your conclusion if a 5 0.01?
in school because they have an inadequate diet and therefore 9.83 Marketing and Consumer Behavior Many people
not enough energy. A random sample of academically at-risk live under a lot of stress and various time constraints. Conse-
eight-year-old boys was obtained, and their caloric intake was quently, some people run out of the house without the keys, and
carefully measured for one day. The summary statistics were are locked out. Rather than break into one’s own home, most
n 5 37, x 5 2089. Assume s 5 358. people in these circumstances call a locksmith. In 2013, the
a. Is there any evidence this group of students has a mean mean service charge for a locksmith was $68.22 A random sam-
caloric intake below the daily energy requirement? ple of 26 locksmith service charges in Black Springs, Arkansas,
Use a 5 0.05. was obtained. The sample mean was $72.35. Assume the under-
b. How would your answer to part (a) change if a 5 0.01? lying population distribution is normal and that s 5 15.5. Is
9.79 Biology and Environment Science An average lawn there any evidence to suggest the mean locksmith service
has a mean of 21 blades of grass per square inch. A garden charge in this city is greater than $68? Use a 5 0.05.
store sells an expensive fertilizer designed to transform an aver- 9.84 Manufacturing and Product Development The
age lawn into a lush, thick carpet within three weeks. To test the Brunton Optimus Crux Camping Stove, a lightweight camping
claim, a random sample of average lawn plots was obtained. stove, is designed to accept a tri-blend fuel made of propane,
Each was treated with the fertilizer according to the instructions iso-butane, and butane. The cartridges for this stove are sold
on the package. Three weeks later the density of each plot was with camping equipment, and the label indicates 225 grams of
measured by counting the blades of grass per square inch. The fuel. To check this claim, 52 cartridges were randomly selected
summary statistics were n 5 32, x 5 22.4. Is there any evi- and the amount of fuel in each was carefully measured. The
dence to suggest the fertilizer improves the thickness of an sample mean was x 5 224.2 grams.
average lawn? Assume s 5 2.7 and use a 5 0.005. a. Is there any evidence that the true mean amount of fuel in
these cartridges is less than the advertised amount?
9.80 Technology and the Internet Based on test results,
Assume s 5 3.6 and use a 5 0.05.
Speedtest.net publishes an Internet download index for coun-
b. What assumptions did you make in order to conduct the
tries all over the world. In July 2013, the mean download speed
hypothesis test in part (a)?
for the entire world was 13.91 Mbps.21 A random sample of
states was obtained and the download index for each was 9.85 Public Health and Nutrition Egg Beaters are egg sub-
recorded. Assume the underlying population is normal and that stitutes with no fat or cholesterol. The Original Egg Beaters are
9.3 Hypothesis Tests Concerning a Population Mean When s Is Known 415
produced to have 5 grams of protein, one less gram than a regular was tested with biodiesel fuel. The resulting HC measurements
egg.23 To check the protein claim, 18 Egg Beaters were randomly are given in the table below. biodies
selected and the amount of protein in each was measured. The
resulting data are given in the following table. eggsub 0.17 0.19 0.10 0.21 0.15 0.23 0.06 0.21
0.18 0.01 0.24 0.29 0.23 0.14 0.18 0.10
5.54 3.83 3.01 5.83 4.64 6.26 4.45 5.71 4.84 0.02 0.16 0.13 0.10 0.14 0.30 0.24 0.24
4.41 5.62 5.49 5.51 4.11 6.46 6.26 4.95 4.40 0.18 0.06
Is there any evidence to suggest the mean amount of protein in a. Suppose the underlying population is normal. Is there any
an Egg Beater is less than 5 grams? Assume the distribution of evidence to suggest the use of biodiesel has decreased the
protein is normal and s 5 0.95. Use a 5 0.02. mean level of HC emissions? Use a 5 0.01.
9.86 Public Policy and Political Science Residential mail- b. Suppose a company is thinking about building a $20
boxes in Des Moines, Iowa, should be installed such that the million biodiesel fuel production facility. The company
bottom of the mailbox is 42 inches above the ground. This rule will invest the money only if there is overwhelming
is designed for safety and to accommodate short mail carriers. evidence in favor of decreased HC emissions. Which error
A random sample of 75 mailboxes in the city was selected. The (type I or type II) is more important to the fuel company?
height of each was carefully measured, and the sample mean Would the company prefer a smaller or a larger
was 43.22 inches. Assume s 5 7.6 inches and use a 5 0.05. Is significance level? Justify your answer.
there any evidence to suggest the true mean height of mailboxes 9.90 Manufacturing and Product Development The left
in Des Moines is different from 42 inches? outside panel of an apartment-size, frost-free refrigerator is
9.87 Public Health and Nutrition Iodine is an important nutri- designed to have width 2358 inches. Each hour, 10 such panels are
ent for the human body, especially in hormone development. Milk randomly selected from the assembly line and carefully measured.
is a natural source of iodine, and the mean amount in cow’s milk is If there is any evidence that the mean width is different from 2358
88 mg/250ml.24 Milk from organic farms is reported to have a inches, the assembly line is shut down for cleaning and inspection.
lower concentration of elements such as zinc, iodine, and selenium. Suppose the distribution of panel widths is normal and s 5 0.15.
A random sample of 35 gallons of organic milk from Humboldt a. During a specific hour of operation, x 5 23.7 inches.
County, California, was obtained and the iodine concentration was Should the assembly line be shut down? Use a 5 0.05.
measured in each. The sample mean was x 5 86.5. Assume b. Using a 5 0.05, find the critical values in the x world.
s 5 3.4. Is there any evidence to suggest that the mean iodine (Note: There are two critical values, because this is a
concentration in organic milk is less than 88? Use a 5 0.01. two-sided hypothesis test.)
9.91 Manufacturing and Product Development A
Extended Applications manufacturer claims the weight of a package of their
unsalted pretzels is (at least) 15.5 ounces. To test this claim,
9.88 Type II Errors The four parts of a hypothesis test concern- a consumer group randomly selected 250 packages and
ing a population mean from a normal population are shown below. carefully weighed the contents of each. The sample mean
H0: m 5 50 was x 5 15.45 ounces. pretzel
a. Conduct a one-sided, left-tailed hypothesis test to see
Ha: m . 50
whether there is any evidence that the true mean weight
X 2 m0 of the pretzel packages is less than 15.5 ounces. Use
TS: Z 5
s/ !n a 5 0.01 and assume s 5 0.26.
RR: Z $ za b. If your conclusion in part (a) is to reject the null
hypothesis, explain how this conclusion is possible even
Assume the sample size is n 5 25; s 5 7.5 and a 5 0.01. though the sample mean, 15.45, is so close to the
a. Find the probability of a type II error for the alternative hypothesized mean, 15.5. If your conclusion in part (a) is
mean ma 5 54; that is, find b ( 54 ) . to not reject H0, check those calculations one more time.
b. Find b ( 55 ) and b ( 56 ) .
9.92 Biology and Environmental Science Lake Vostok is
c. Repeat parts (a) and (b) for a 5 0.025.
Antarctica’s largest and deepest subsurface lake. It is buried
9.89 Fuel Consumption and Cars Biodiesel fuels are made under approximately 2 miles of ice, and scientists believe it con-
from vegetable oils or animal fats and may be used instead of tains evidence of thousands of tiny organisms and fish. The mean
conventional diesel fuel. One advantage to using biodiesel is a depth of the lake is believed to be 344 meters.25 A random sam-
possible decrease in regulated emissions, specifically total ple of the depth of this lake was obtained. Assume the underlying
hydrocarbons (HC). Using the heavy-duty transient Federal distribution is normal and that s 5 30 meters. vostok
Test Procedure (FTP), the mean HC emission is 0.23 g/hp-hr a. Conduct a hypothesis test to determine whether there
(grams per horsepower-hour) with standard deviation s 5 0.07. is any evidence to suggest the mean depth is less than
A random sample of heavy-duty engines was obtained, and each 344 meters. Use a 5 0.01.
416 Chapter 9 Hypothesis Tests Based on a Single Sample
b. Find the probability of a type II error if the true mean 9.96 Fuel Consumption and Cars The joint venture ContiTech-
depth of the lake is 315 meters; that is, find b ( 315 ) . Jiebao Power Transmission Systems Ltd. produces many makes
c. Carefully sketch a graph illustrating the probability found and models of automobile drive belts. One popular poly V-belt is
in part (b) using density curves for X . designed to have length 1050 mm. To ensure that the belts meet
d. Find b ( 310 ) . this specification, 18 are randomly selected from the assembly
line every hour and carefully measured. If there is any evidence
9.93 Marketing and Consumer Behavior The water-
that the population mean length is different from 1050 mm, the
treatment facility owners in Lake Havasu City, Arizona, recently
entire line is shut down for inspection. fuel
proposed changes to all user fees. There is a treatment capacity
a. Assume the distribution of belt lengths is normal and
fee, a connection fee, and a monthly minimum fee. All prices
s 5 3.7 mm. In a two-sided hypothesis test of
have been set assuming the mean monthly usage is 714 cubic
H0: m 5 1050 versus Ha: m 2 1050, find the critical
feet per household. To check this claim, a random sample of
values in the X world. Use a 5 0.01.
16 homes was obtained and the monthly usage for each home
b. In a sample of 18 belts, suppose x 5 1049. Should the
was recorded. The sample mean was 601.2 cubic feet. Assume
assembly line be shut down? Justify your answer using
the distribution of monthly water usage per household is
the Z distribution and the appropriate X distribution.
normal with standard deviation 283 cubic feet.
a. Conduct a two-sided hypothesis test to determine whether 9.97 Manufacturing and Product Development The 2013
there is any evidence the mean monthly water usage is Kawasaki Jet Ski Ultra 300LX is supercharged and intercooled,
different from 714 cubic feet. Use a 5 0.05. has a four-stoke inline-four engine, and is designed to have a
b. If your conclusion in part (a) is to not reject the null load capacity of 496 pounds.26 If the passenger and gear weight
hypothesis, explain how this conclusion is possible even exceed this capacity, the jet ski will not function properly and
though the sample mean, 601.2, is so far away from the there will be a severe strain on the engine. To determine if cus-
hypothesized mean, 714. If your conclusion in part (a) is tomers are following the load capacity recommendation, a ran-
to reject the null hypothesis, try checking your dom sample of these jet skis is obtained from marinas around
calculations one more time. the country and each is carefully weighed as owners prepare for
9.94 Marketing and Consumer Behavior The Carpet a ride. The summary statistics were n 5 33 and x 5 498.42.
Corner in Gladstone, Missouri, offers a variety of carpets, wood Assume s 5 46.7.
flooring, and tiles. Sales records indicate that the mean amount a. Is there any evidence to suggest that the true mean weight
of carpet installed in a wall-to-wall carpeted residential home of passengers and gear is greater than 496 pounds? Use
by crews from this store is 1250 square feet. The store manager, a 5 0.01.
Frank Vida, believes that when there is a sale, customers trans- b. Suppose the engineers have determined that the watercraft
late the savings into carpeting a larger area. A random sample will overheat rapidly if the load is 525 pounds. Find the
of 45 wall-to-wall carpet purchases was selected during a sale. probability of a type II error if the true mean weight of
The sample mean was x 5 1305 square feet; assume that passengers and gear is 525 pounds; that is, find b ( 525 ) .
s 5 155 square feet.
a. Conduct a one-sided, right-tailed test of H0: m 5 1250 Challenge
versus Ha: m . 1250. Is there any evidence to suggest 9.98 The Power of a Test The probability of a type II
the true mean square footage is larger during a sale? error, b, represents the likelihood of accepting the null
Use a 5 0.01. hypothesis when the alternative is true. The power of a statisti-
b. Find the probability of a type II error if the true mean cal test is p 5 1 2 b, the probability of (correctly) rejecting
square footage during a sale is ma 5 1330; that is, find the null hypothesis, of detecting a difference in the hypothe-
b ( 1330 ) . sized value of the population parameter. Try the Statistical
c. Carefully sketch a graph that includes the distribution of Power applet on the text website.
X if m 5 1250 and if m 5 1330. Shade in the areas that A hot torsion test is used to determine the workability of a metal.
correspond to the probability of a type I error (0.01) and Suppose a carbon steel rod is designed to fail (break) with mean
to the probability of a type II error found in part (b). axial load of 800 newtons (N) (under certain temperature and
9.95 Physical Sciences Kiln-dried solid grade A teak wood speed conditions). A random sample of 25 rods is obtained, and
should have a moisture content of no more than 12%. A furni- the axial load failure is measured for each. Suppose the underly-
ture company recently purchased a large shipment of this wood ing distribution of axial load is normal and s 5 50.
for use in constructing dining room sets. Thirty-seven pieces a. Consider a hypothesis test of H0: m 5 800 versus
of teak wood were randomly selected and carefully measured Ha: m , 800 with a 5 0.025. Find the probability of a
for moisture content. The sample mean moisture content was type II error and the power of this test for m 5 775; that
x 5 12.3%. Assume s 5 1.25. is, find b ( 775 ) and p ( 775 ) .
a. Is there any evidence to suggest the true population mean b. Use a calculator or computer to compute the power,
moisture content is greater than 12%? Use a 5 0.01. p ( ma ) , for ma 5 730, 735, 740, . . . , 800.
b. Find the probability of a type II error if the true population c. Use the values from part (b) to carefully sketch a plot of
mean moisture content is 12.2%; that is, find b ( 12.2 ) . p ( ma ) versus ma. The resulting plot is called a power curve.
9.4 p Values 417
9.4 p Values
The last piece of a hypothesis test, the rejection region, establishes a firm decision rule
based on the value of the test statistic. If the value of the test statistic lies in the rejection
region, we reject H0. If not, we cannot reject the null hypothesis. Using a fixed rejection
region associated with a specific value of a (the significance level) can lead to a peculiar
dilemma. Consider the following example.
Solution
Step 1 The assumed mean is m 5 7.0 ( 5 m0 ) ; n 5 32 and s 5 0.67.
We are searching for any evidence to suggest that the mean sleeping time for
adults is less than 7.0 hours. This is a one-sided, left-tailed test.
The underlying population distribution is unknown, but the sample size is large
( n 5 32 $ 30 ) . The hypothesis test concerning a population mean m when s is
known, is relevant.
Step 2 No value of a is given. The first three parts of the hypothesis test are
H0 : m 5 7
Ha: m , 7
X 2 m0 X 27
TS: Z 5 5
s / !n 0.67/ !32
The value of the test statistic is
6.8 2 7.0
z5 5 21.6886
0.67/ !32
Step 3 The following table shows the resulting rejection region and conclusion for var-
ious values of a.
a Rejection region Conclusion
0.10 Z # 2z0.10 5 21.2816 z 5 21.6886 lies in the rejection region.
We reject the null hypothesis at the
a 5 0.10 level.
0.05 Z # 2z0.05 5 21.6449 z 5 21.6886 lies in the rejection region.
We reject the null hypothesis at the
a 5 0.05 level.
0.025 Z # 2z0.025 5 21.9600 z 5 21.6886 does not lie in the rejection
region. We cannot reject the null hypothesis
at the a 5 0.025 level.
0.01 Z # 2z0.01 5 22.3263 z 5 21.6886 does not lie in the rejection
region. We cannot reject the null hypothesis
at the a 5 0.01 level.
418 Chapter 9 Hypothesis Tests Based on a Single Sample
Step 4 In this example, the decision to reject or not reject H0 depends on the value of
a. To avoid this dilemma, an alternative method for reporting the result of a
hypothesis test is often used. This technique involves computing a tail probability,
or p value.
Alternative Probability
hypothesis definition
Ha: m . m0 p 5 P(Z $ z) Figure 9.21
Ha: m , m0 p 5 P(Z # z) Figure 9.22
Ha: m 2 m0 p/2 5 P ( Z $ z ) if z $ 0 Figure 9.23
p/2 5 P ( Z # z ) if z , 0 Figure 9.24
The p value conveys the strength of the evidence in favor of the alternative hypothesis.
For a small p value, the value of the test statistic lies far out in a tail of the distribution, the
more unlikely the value of the test statistic, and the more evidence in favor of Ha. Small
values of p are usually p # 0.05.
Z Z
p P(Z z) p P(Z z)
0 z z 0
Figure 9.21 Illustration of the p value Figure 9.22 Illustration of the p value
for Ha: m . ma . for Ha: m , ma .
Z Z
p/2 P(Z z)
p/2 P(Z z)
0 z z 0
Figure 9.23 Illustration of the p value Figure 9.24 Illustration of the p value
for Ha: m 2 ma, z $ 0. for Ha: m 2 ma, z , 0.
9.4 p Values 419
If a is the significance level of the hypothesis test, the conclusion is usually written in
one of two ways.
1. If p # a: Reject the null hypothesis. There is evidence to suggest Ha is true at the
p 5 ( observed p value ) level of significance.
2. If p . a: Do not reject the null hypothesis. There is no evidence to suggest the null
hypothesis is false.
Consider a one-sided, right-tailed hypothesis test concerning a population mean m
when s is known ( Ha: m . m0 ) . Suppose either the underlying population is normal or n
is large, a is the significance level, and let z be the value of the test statistic. Recall the
definition of a critical value: a 5 P ( Z $ za ) . Figures 9.25 and 9.26 demonstrate the rela-
tionship between the p value and a fixed significance level. These figures also illustrate the
definition of a p value, that is, the smallest significance level for which H0 can be rejected.
If the significance level ( a ) is less than p, then z , za, and we cannot reject H0.
Z Z
p
p
0 z z 0 z z
Figure 9.25 If p # a, then z $ za. Figure 9.26 If p . a, then z , za.
Conclusion: Reject H0 . Conclusion: Do not reject H0 .
Solution
Step 1 The assumed mean is m 5 620 ( 5 m0 ) ; n 5 20 and s 5 25.
This is a one-sided, left-tailed test. The underlying population is assumed to be
normal. The hypothesis test concerning a population mean m when s is known,
is relevant.
Step 2 The value of the test statistic will be used to compute a p value. The first three
parts of the hypothesis test are
H0: m 5 620
Ha: m , 620
X 2 m0 X 2 620
TS: Z 5 5
s / !n 25/ !20
The value of the test statistic is
610.1 2 620
z5 5 21.7710
25/ !20
420 Chapter 9 Hypothesis Tests Based on a Single Sample
p 0.0384
1.771 0
Figure 9.28 TI-84 Plus C Figure 9.29 TI-84 Plus C Figure 9.30 TI-84 Plus C
Z-Test input screen. Z-Test Calculate results. Z-Test Draw results.
Assume the underlying population is normal and that s 5 10 mm. Is there any evidence
that the true mean daily rainfall total in Toronto is different from 5 mm? Use a 5 0.05
and the p value associated with the test statistic to justify your answer.
9.4 p Values 421
Solution
Step 1 The assumed mean is m 5 5.0 ( 5 m0 ) ; n 5 26 and s 5 10.0.
The phrase “is different’’ means this is a two-sided test. The distribution of
24-hour rainfall total is assumed to be normal. The hypothesis test concerning a
population mean m when s is known, is relevant.
Step 2 The first three parts of the hypothesis test are
H0: m 5 5.0
Ha: m 2 5.0
X 2 m0 X 2 5.0
TS: Z 5 5
s / !n 10.0/ !26
Step 3 The sample mean is
1
x5 ( 0.1 1 4.2 1 c1 1.1 ) 5 7.8269
26
The value of the test statistic is
x 2 m0 7.8269 2 5.0
z5 5 5 1.4414
s / !n 10.0/ !26
Step 4 Since this is a two-sided test and the value of the test statistic ( z 5 1.4414 ) is
positive, p/2 is a right-tail probability (see Figure 9.31).
p/2 5 P ( Z $ 1.4414 ) Definition of p value for Ha: m 2 m0 .
p/2 0.0747
0 1.4414
Figure 9.32 TI-84 Plus C Figure 9.33 TI-84 Plus C Figure 9.34 TI-84 Plus C
Z-Test input screen. Z-Test Calculate Z-Test Draw results.
results.
422 C hapter 9 Hypothesis Tests Based on a Single Sample
Technology Corner
Procedure: Compute the p value in a hypothesis test concerning a population mean when s is known.
Reconsider: Example 9.13, solution, and interpretations.
CrunchIt!
Use the built-in function z 1-Sample. Input is either summary statistics or a column containing the sample data.
1. Enter the data in column Var1. Rename the column if desired.
2. Select Statistics; z; 1-sample.
3. Under the Columns tab, select the appropriate column name for the Sample and enter the standard deviation. Under the
Hypothesis Test tab, enter the value for m0 and select the appropriate Alternative. (Figure 9.35).
4. Click Calculate. The results are displayed in a new window (Figure 9.36).
Figure 9.35 1-Sample input screen. Figure 9.36 1-Sample test results.
TI-84 Plus C
Use the calculator function Z-Test. The input for this function is either data in a list or summary statistics.
1. Enter the data into list L1.
2. Select STAT ; TESTS; Z-Test.
3. The data are given in this example; highlight Data. Enter m0 , s, the name of the list containing the data, the frequency
of occurrence of each observation, and highlight the appropriate alternative hypothesis. See Figure 9.31.
4. In the last row of the input screen, select Calculate and press ENTER .
5. The p value associated with the hypothesis test is given on the Z-Test output screen. See Figure 9.32.
6. The Draw results are shown in Figure 9.33.
Minitab
Use the function 1-Sample Z. The input is either data in a column (One or more samples, each in a column) or summary
statistics (Summarized data).
1. Enter the data into column C1.
2. Choose Stat; Basic Statistics; 1-Sample Z.
9.4 p Values 423
3. In this example, select One or more samples, each in a column and enter C1. Enter the Standard deviation, check the
box to Perform hypothesis test, and enter the Hypothesized mean (Figure 9.37).
4. Choose Options. Enter a Confidence level and select the Alternative hypothesis (Figure 9.38).
5. Summary statistics, the confidence interval, the value of the test statistic, and the p value are displayed in the Session
window (Figure 9.39).
Figure 9.37 1-Sample Z input screen. Figure 9.39 1-Sample Z hypothesis test results.
Excel
Use the function Z.TEST. The input is a column (array) of data.
1. Enter the data into column A.
2. The arguments for Z.TEST are the array, m0 , and s. The function returns the probability that the sample mean is
greater than the hypothesized mean. Use this probability to compute the p value. See Figure 9.40.
9.106 Consider a hypothesis test concerning a population produce dioxin air pollution have complained of many health
mean with s known, underlying population normal, and alter- problems. A random sample of people living within a 10-mile
native hypothesis Ha: m , m0. Find the p value associated with radius of the facility was obtained, and the dioxin level (in ppt)
each value of the test statistic. in the blood of each was measured. Is there any evidence to
a. z 5 22.05 b. z 5 21.43 c. z 5 23.22 suggest that the mean dioxin level in the blood of residents
. z 5 20.67
d e. z 5 24.58 f. z 5 0.25 living near the incinerator is greater than 10 ppt? Assume
s 5 2.66, use a 5 0.05, and compute the p value. dioxin
9.107 Consider a hypothesis test concerning a population
mean with s known, underlying population normal, and 9.112 Sports and Leisure The National Football League
alternative hypothesis Ha: m 2 m0. Find the p value associated rates each quarterback after every game according to a for-
with each value of the test statistic. mula that involves pass attempts, passing yards, touchdown
a. z 5 21.77 b. z 5 1.43 c. z 5 2.58 passes, and interceptions. Suppose the mean quarterback
. z 5 20.37
d e. z 5 3.58 f. z 5 0.85 rating per game is 52.1 with s 5 16.7. There is some
speculation that rule changes benefiting receivers have
9.108 Consider a hypothesis test concerning a population
also increased quarterback ratings. In a random sample of
mean with s known and n large. For each alternative
34 quarterback 2012 season ratings, the sample mean was
hypothesis, value of the test statistic, and significance level,
x 5 57.09.31 Is there any evidence to suggest that the true
find the p value and determine whether H0 is rejected or
mean quarterback rating has increased? Use a 5 0.05 and
not rejected.
compute the p value.
a. Ha: m . 12.5, z 5 1.43, a 5 0.001
b. Ha: m , 20.56, z 5 22.05, a 5 0.05 9.113 Medicine and Clinical Studies The active ingredient
c. Ha: m 2 1200, z 5 1.75, a 5 0.10 in antiseptic liquid bandages is 8-hydroxyquinoline (8h). This
d. Ha: m . 37.7, z 5 3.11, a 5 0.01 chemical can be harmful through simple skin contact, by swal-
e. Ha: m , 52.68, z 5 21.16, a 5 0.025 lowing, or inhalation. Suppose the list of ingredients on a 3M
f. Ha: m 2 46.68, z 5 22.35, a 5 0.001 Nexcare liquid bandage indicates that the volume of 8h is 1%.
A random sample of eight bottles of this product was obtained,
9.109 Consider a hypothesis test concerning a population
and each was analyzed to obtain the volume of 8h. The sample
mean with s known and underlying population normal. For
mean was x 5 1.025%. Assume the underlying population is
each alternative hypothesis, value of the test statistic, and sig-
normal and s 5 0.04. Is there any evidence to suggest that the
nificance level, find the p value and determine whether H0 is
true mean percentage of 8h is different from 1%? Use a 5 0.01
rejected or not rejected.
and compute the p value.
a. Ha: m . 3.14, z 5 2.52, a 5 0.05
b. Ha: m . 9.80, z 5 1.39, a 5 0.05 9.114 Fuel Consumption and Cars Jack Keef General
c. Ha: m , 186,000, z 5 22.28, a 5 0.01 Motors, an automobile dealer in Lincoln, Nebraska, adver-
d. Ha: m , 4.135, z 5 0.17, a 5 0.01 tised a $9.99 40-point safety checkup for any car and claimed
e. Ha: m 2 1.62, z 5 21.63, a 5 0.001 that the service department would finish the job in less than
f. Ha: m 2 0.671, z 5 2.96, a 5 0.001 30 minutes. A local newspaper reporter selected a random
sample of 58 customers who took advantage of the dealer’s
9.110 Consider a hypothesis test concerning a population
offer and found the sample mean time to complete the safety
mean with s known and underlying population normal. For
checkup was x 5 32.2 minutes. Assume s 5 5.7 and use
each hypothesis test find the p value and determine whether H0
a 5 0.01.
is rejected or not rejected.
a. Is there any evidence to suggest that the true mean time to
H0 Ha x s n a complete the safety checkup is greater than 30 minutes?
b. Find the p value for the hypothesis test in part (a).
a. m 5 10 m . 10 11.50 7.56 18 0.05
b. m 5 2.718 m , 2.718 2.60 0.56 21 0.01 9.115 Sports and Leisure The Der Dachstein mountain in
Austria has several hiking trails, lots of options for skiers, and
c. m 5 57.72 m 2 57.72 56.42 1.58 14 0.01
a spectacular sky walk that extends out from a ledge to offer a
d. m 5 216.18 m . 216.18 2.35 21.23 8 0.001 view of the Alps. A new cable car takes passengers to approxi-
e. m 5 273 m , 273 275.80 17.80 15 0.05 mately 2700 meters above sea level in approximately 5.5 min-
f. m 5 6.63 m 2 6.63 7.17 1.08 27 0.05 utes.32 A random sample of cable car rides was selected and the
time (in minutes) to reach the top was recorded for each. The
data are given in the following table. cblcar
Applications
9.111 Public Health and Nutrition High levels of the 5.7 6.4 6.1 5.2 8.7 5.4 5.9 4.7
chemical dioxin, ingested through food and air, can cause 7.1 6.1 8.9 6.2 6.8 6.0 4.5 6.1
diabetes, immune disorders, and cancer. The U.S. EPA has set
a maximum acceptable level of dioxin in blood at 10 ppt (parts Assume the underlying population is normal and s 5 1.3. Is
per trillion). Residents living near an old incinerator known to there any evidence to suggest that the mean cable car time is
9.4 p Values 425
greater than 5.5 minutes? Use a 5 0.05 and compute the p that the manufacturer’s claim is false? Use a 5 0.01 and
value to draw a conclusion. compute the p value to draw a conclusion.
9.116 Manufacturing and Product Development Nitter-
house Masonry Products, LLC, in Chambersburg, Pennsylva- Extended Applications
nia, produces architectural concrete masonry products. The 9.120 Manufacturing and Product Development Arizona
Dover is the largest block in a certain collection, is used primar- state procurement specifications for instant-dry highway paint
ily for residential retaining walls, and is manufactured to weigh indicate that the paint must dry in less than 60 seconds. A ran-
40 pounds. A quality control inspector for the company ran- dom sample of paint was obtained from a potential seller, and
domly selected eight blocks, and the weight (in pounds) of each the drying time of each batch was carefully tested. Assume
is given in the following table. dover s 5 8.0 and use a 5 0.05. paint
a. Is there any evidence to suggest the mean drying time for
43.4 39.8 42.3 42.6 41.0 39.8 39.6 38.8 this highway paint is less than 60 seconds? Use the p
value to draw a conclusion.
Assume the underlying population is normal and s 5 2 b. Using the results from part (a), do you believe there is
pounds. Is there any evidence to suggest that the true mean overwhelming evidence to suggest the paint dries in less
weight of the blocks is different from 40 pounds? Use a 5 0.05 than 60 seconds? Why or why not?
and compute the p value to draw a conclusion. c. Carefully sketch a graph to illustrate the p value in this
9.117 Travel and Transportation At maximum takeoff hypothesis test.
weight, the Airbus A350-900 needs a runway of at least 8000 9.121 Biology and Environmental Science A large farm
feet.33 To check this claim, officials from the FAA randomly selling round fescue hay bales claims the mean weight per bale
selected A350-900 flights and measured the length (in feet) of the is 1600 pounds. To check this claim, personnel from the local
runway needed for takeoff. Assume s 5 365 and use a 5 0.05. agricultural extension service randomly selected 25 round bales
Is there any evidence to suggest that the true mean runway length and carefully weighed each. The sample mean was x 5 1595.6
needed for takeoff of an Airbus A350-900 is greater than 8000 pounds. Assume the underlying population is normal and
feet? Use the p value to draw a conclusion. runway s 5 23 pounds.
9.118 Manufacturing and Product Development a. Is there any evidence to suggest that the true mean weight
According to U.S. Inspect, roofing shingles weighed approxi- of these round hay bales is less than 1600 pounds? Use
mately 240 pounds per square foot prior to 1973, and modern a 5 0.01 and compute the p value to draw a conclusion.
shingles weigh approximately 190 pounds per square foot, b. Carefully sketch a graph to illustrate the p value in this
depending on the material and the manufacturer. Suppose a hypothesis test.
brand of recycled synthetic shingles is advertised to weigh 9.122 Public Policy and Political Science A technical
185 pounds per square foot. A random sample of these shin- panel investigating residential smoke alarms recommends
gles was obtained, and the weight of each is given in the that the sound emitted by the device (during a fire) should be
following table. roofing at least 85 decibels at 10 feet away. In a random sample of
60 smoke alarms from homes in a large city, the sample mean
182.6 184.8 187.4 186.4 180.4 187.3 185.1 decibel level (at 10 feet away) was x 5 84.88. Assume
181.4 183.5 187.6 182.7 186.6 185.4 184.0 s 5 5.6.
a. Is there any evidence that the true mean decibel level
185.1 187.7 186.7 185.4 183.4 188.1 182.5
produced by smoke alarms in this city is less than 85? Use
186.3 185.2 187.8 186.5 187.8 182.7 188.8 a 5 0.01 and compute the p value to draw a conclusion.
187.2 189.7 184.5 183.7 b. What is the smallest significance level at which this
hypothesis test would be significant?
Assume s 5 2.7. Is there any evidence to suggest that the true
9.123 Technology and the Internet Communications sat-
mean weight of these recycled shingles is different from 185
ellites in geostationary orbit are designed to have a high orbit of
pounds per square foot? Use a 5 0.05 and compute the p value
35,800 kilometers.34 If the mean apogee is greater than 35,800
to draw a conclusion.
km, there is a danger that the satellite will temporarily lose con-
9.119 Manufacturing and Product Development The tact with ground antennas. A random sample of geostationary
Wood Flooring Manufacturers Association claims a new satellites was obtained and the apogee for each was recorded.35
product restores a high-gloss finish to old wood floors. A gloss Assume s 5 315. orbit
meter is used to measure the shine of a wood floor, and a high- a. Is there any evidence to suggest that the true mean apogee
gloss finish should have a rating of at least 80 units. To check for communication satellites is greater than 35,800 km?
the manufacturer’s claim, the sealer and finish were applied to Use a 5 0.05 and compute the p value to draw a
35 randomly selected wood floors and the gloss was measured conclusion.
after each application dried. The sample mean gloss was b. Carefully sketch a graph to illustrate the p value in this
x 5 78.6. Assume s 5 3.45. Is there any evidence to suggest hypothesis test.
426 C hapter 9 Hypothesis Tests Based on a Single Sample
Theorem
Given a random sample of size n from a normal distribution with mean m, the random
variable
X 2m
T5
S/ !n
has a t distribution with n 5 n 2 1 degrees of freedom.
Remember, this is simply another kind of standardization, and yes, the distribution of
T is characterized by n 2 1 (not n) degrees of freedom. The general procedure for a
hypothesis test concerning m is summarized below.
A Closer L ok
1. This procedure is often called a small-sample test (concerning a population mean), or
simply a t test. As long as the underlying population is normal, this test is valid (and
exact) for any sample size n (large or small).
2. Table V in the Appendix presents selected critical values associated with various t distri-
butions. These critical values were used to construct confidence intervals in Section 8.3.
3. Figures 9.41, 9.42, and 9.43 illustrate the rejection region for each alternative
hypothesis.
9.5 Hypothesis Tests Concerning a Population Mean When s Is Unknown 427
/2 /2
0 t t 0 t 2 0 t 2
Figure 9.41 Rejection region for Figure 9.42 Rejection region for Figure 9.43 Rejection region for
SOLUTION TRAIL Ha: m . m0 . Ha: m , m0 . Ha: m 2 m0 .
x 2 m0 305.72 2 279
t5 5 5 2.3111 $ 1.7959
s / !n 40.05/ !12
Step 4 Because 2.3111 lies in the rejection region, we reject the null hypothesis at the
a 5 0.05 level. There is evidence to suggest that the mean July residential elec-
tric bill in the Pacific region is greater than 279 dollars.
Figures 9.44–9.46 show a technology solution. Figures 9.46 and 9.47 illustrate
the p value associated with this test.
428 Chapter 9 Hypothesis Tests Based on a Single Sample
Figure 9.44 TI-84 Plus C Figure 9.45 TI-84 Plus C Figure 9.46 TI-84 Plus
T-Test input screen. T-Test Calculate T-Test Draw results.
results.
t11
0.0206
0 2.3111
Is there any evidence to suggest the mean water usage is different from 250,000 gallons?
Assume the underlying population is normal and use a 5 0.01.
Solution
Step 1 The assumed mean is 250 ( 5 m0 ) thousand gallons, n 5 21, and a 5 0.01.
The water department is looking for water usage different from 250,000. There-
fore, the relevant alternative hypothesis is two-sided.
Mayangsari/Dreamstime.com
The underlying population is assumed to be normal, and s is unknown. A t test
is appropriate.
9.5 Hypothesis Tests Concerning a Population Mean When s Is Unknown 429
H0: m 5 250
Ha: m 2 250
X 2 m0
TS: T 5
S/ !n
RR: 0 T 0 $ ta/2,n21 5 t0.005,20 5 2.8453 ( T # 22.8453 or T $ 2.8453 )
Step 3 The sample mean is
1
x5 ( 140.1 1 270.1 1 c1 303.5 ) 5 245.2238
21
Find the sample standard deviation.
1 1
s2 5 a1,296,137.59 2 ( 5149.7 ) 2 b 5 1665.4269
20 21
s 5 !1665.4269 5 40.8096
The value of the test statistic is
x 2 m0 245.2238 2 250
t5 5 5 20.5363
s / !n 40.8096/ !21
Step 4 The value of the test statistic, t 5 20.5363, does not lie in the rejection region.
We cannot reject the null hypothesis. There is no evidence to suggest park water
usage is different from 250,000 gallons per day.
Figure 9.48 shows a technology solution, and Figure 9.49 illustrates the p value
associated with this test.
t20
0.2988 0.2988
0.5363 0 0.5363
The table listing critical values for various t distributions is very limited. Using this
table, the best we can do is bound the p value associated with a hypothesis test.
Solution
Step 1 The assumed mean is 6.5 ( 5 m0 ) ; n 5 8 and a 5 0.01.
We would like to know whether the new driver has been able to shorten the mean
delivery time. The relevant alternative hypothesis is one-sided, left-tailed.
The underlying population is assumed to be normal, and s is unknown. A t test
is appropriate.
Step 2 The four parts of the hypothesis test are
H0: m 5 6.5
Ha: m , 6.5
X 2 m0
TS: T 5
S/ !n
RR: T # 2ta,n21 5 2t0.01,7 5 22.9980
Step 3 The sample mean is
1
x5 ( 6.61 1 6.25 1 c1 6.29 ) 5 6.3688
8
9.5 Hypothesis Tests Concerning a Population Mean When s Is Unknown 431
t7
0.10
0.05
0 1.4149 |t | 1.8946
The p value associated with this hypothesis test is between 0.05 and 0.10.
Because the smallest possible value, 0.05, is greater than the significance level,
0.01, we cannot reject the null hypothesis.
Figures 9.51 through 9.53 show a technology solution.
Figure 9.51 TI-84 Plus C Figure 9.52 TI-84 Plus C Figure 9.53 TI-84 Plus C
T-Test input screen. T-Test Calculate T-Test Draw results.
results.
Technology Corner
Procedure: Hypothesis test concerning a population mean when s is unknown.
Reconsider: Example 9.16, solution, and interpretations.
CrunchIt!
Use the built-in function t 1-Sample. Input is either summary statistics or a column containing the sample data.
1. Enter the data in column Var1.
2. Select Statistics; t; 1-sample.
3. Under the Columns tab, select Var1 for the Sample. Under the Hypothesis Test tab, enter the value for m0 and select the
appropriate Alternative. (Figure 9.54).
4. Click Calculate. The results are displayed in a new window (Figure 9.55).
TI-84 Plus C
Use the calculator function T-Test. The input for this function is either data in a list or summary statistics.
1. Enter the data into list L1.
2. Select STAT ; TESTS; T-Test.
3. The data are given in this example; highlight Data. Enter m0 , the name of the list containing the data, the frequency of
occurrence of each observation, and highlight the appropriate alternative hypothesis. Refer to Figure 9.50.
4. In the last row of the input screen, select Calculate and press ENTER .
5. The alternative hypothesis, value of the test statistic, p value, and summary statistics are displayed on the output screen.
Refer to Figure 9.51.
6. The Draw results are shown in Figure 9.52.
Minitab
Use the function 1-Sample t. The input is either data in a column (Samples in columns) or summary statistics (Summa-
rized data).
1. Enter the data into column C1.
2. Choose Stat; Basic Statistics; 1-Sample t.
3. In this example, select One or more samples, each in a column, and enter C1. Check the box to Perform a hypothesis
test, and enter the Hypothesized mean (Figure 9.56).
4. Choose Options. Enter a Confidence level and select the Alternative hypothesis (Figure 9.57).
5. Summary statistics, the confidence interval, the value of the test statistic, and the p value are displayed in the Session
window (Figure 9.58).
9.5 Hypothesis Tests Concerning a Population Mean When s Is Unknown 433
Figure 9.56 1-Sample t input screen. Figure 9.58 1-Sample t hypothesis test results.
Excel
There is no built-in function for a one-sample t test. Compute the summary statistics and use the appropriate equation to
find the value of the test statistic.
1. Enter the data into column A.
2. Compute the sample mean, sample standard deviation, and value of the test statistic. Use the appropriate Excel function
to compute the p value. See Figure 9.59.
Find the significance level ( a ) corresponding to each value of n a. Write the four parts for a two-sided hypothesis test
and rejection region. concerning the population mean with a 5 0.002.
a. n 5 27, T $ 2.0555 b. n 5 9, T $ 4.5008 b. Suppose x 5 9.04 and s 5 1.20. Find the value of the test
c. n 5 16, T $ 2.6025 d. n 5 20, T $ 3.5794 statistic, and draw a conclusion about the population mean.
c. Find the p value associated with this hypothesis test.
9.133 Consider a hypothesis test concerning the mean from a
normal population with H0: m 5 215.76 and Ha: m , 215.76. 9.141 Consider a two-sided hypothesis test concerning the
Find the significance level ( a ) corresponding to each value of n mean, m0, from a normal population, with sample size n 5 25,
and rejection region. standard deviation s, and a 5 0.05. Explain the error in each
a. n 5 23, T # 21.3212 b. n 5 4, T # 210.2145 of the following statements.
c. n 5 31, T # 22.7500 d. n 5 18, T # 22.5524 X 2 m0
a. The test statistic is T 5 .
9.134 Consider a hypothesis test concerning the mean from a s/ !25
normal population with H0: m 5 5128 and Ha: m 2 5128. Find X 2 m0
b. The test statistic is T 5 .
the significance level ( a ) corresponding to each value of n and S/ !24
rejection region. c. The rejection region is RR: T $ 1.7109.
a. n 5 30, 0 T 0 $ 1.6991 b. n 5 6, 0 T 0 $ 4.0321 d. TRAIL
SOLUTION If the value of the test statistic is t 5 2.6732, then
c. n 5 17, 0 T 0 $ 3.6862 d. n 5 14, 0 T 0 $ 5.1106 p # 0.005.
9.135 Consider a hypothesis test concerning the mean from a
normal population with H0: m 5 2.53 and Ha: m . 2.53. Find Applications
bounds on the p value for each value of n and test statistic t. 9.142 Biology and Environmental Science The
a. n 5 24, t 5 2.35 b. n 5 3, t 5 8.55
Atlantic bluefin tuna is the largest and most endangered
c. n 5 12, t 5 5.68 d. n 5 8, t 5 1.52
of the tuna species. They are found throughout the North Atlan-
9.136 Consider a hypothesis test concerning the mean from a tic Ocean and the mean weight is 550 pounds.37 The Center for
normal population with H0: m 5 6.28 and Ha: m , 6.28. Find Biological Diversity is concerned that this species has been
bounds on the p value for each value of n and test statistic t. overfished and that the mean weight has decreased. Suppose a
a. n 5 25, t 5 21.97 b. n 5 13, t 5 20.63 random sample of 12 Atlantic bluefin tuna was obtained from
c. n 5 18, t 5 22.28 d. n 5 27, t 5 23.58 commercial fishing boats and weighed. The summary statistics
were x 5 535.7 and s 5 37.8. Conduct a hypothesis test to
9.137 Consider a hypothesis test concerning the mean from a determine whether there is any evidence that the mean weight
normal population with H0: m 5 1.414 and Ha: m 2 1.414. Find of Atlantic bluefin tuna is less than 550 pounds. Assume the
bounds on the p value for each value of n and test statistic t. distribution of weight is normal and use a 5 0.05. Write a
a. n 5 10, t 5 2.04 b. n 5 23, t 5 23.14 Solution Trail for this problem.
c. n 5 14, t 5 25.52 d. n 5 11, t 5 1.75
9.143 Travel and Transportation During a routine com-
9.138 Consider a random sample of size 20 from a normal mercial airline flight, much of the time in the aircraft is spent
population with hypothesized mean 1.618. waiting to take off and taxiing to an arrival gate. However, air-
a. Write the four parts for a one-sided, left-tailed hypothesis lines also keep careful records of the actual air time of each
test concerning the population mean with a 5 0.05. flight. A random sample of the air times (in minutes) of US Air-
b. Suppose x 5 1.5 and s 5 0.45. Find the value of the test ways flights from Philadelphia to Las Vegas during December
statistic, and draw a conclusion about the population mean. 2012 was obtained, and the data are given in the following
c. Find the p value associated with this hypothesis test. table.38 flights
9.139 Consider a random sample of 11 observations, given in
the following table, from a normal population with hypothe- 297 301 299 306 307 300 308 327
sized mean 57.71. EX9.139 316 300 323 313 348 303 316
59.94 58.93 59.41 60.66 59.00 60.98 The air time for this flight is affected by the prevailing west-to-
58.85 55.21 59.02 61.14 59.25 east winds and weather systems and normally takes 303 min-
utes. Is there any evidence to suggest the true mean air time is
a. Write the four parts for a one-sided, right-tailed hypothesis greater than this 303 minutes? Assume the underlying distribu-
test concerning the population mean with a 5 0.01. tion is normal and use a 5 0.025.
b. Compute the sample mean, the sample standard deviation,
9.144 Biology and Environmental Science The historical
and the value of the test statistic. Draw a conclusion about
mean yield of soybeans from farms in Indiana is 31.9 bushels
the population mean.
per acre. Following a recent dry summer, a random sample of 26
c. Find the p value associated with this hypothesis test.
farms across the state was obtained. The mean yield in bushels
9.140 Consider a random sample of size 28 from a normal per acre was x 5 30.088 with s 5 4.433. Is there any evidence
population with hypothesized mean m0 5 9.96. to suggest the lack of rain adversely affected the soybean yield
9.5 Hypothesis Tests Concerning a Population Mean When s Is Unknown 435
SOLUTION TRAIL
in Indiana? Assume the underlying population is normal and and the physician density per 1000 people was recorded for
use a 5 0.01. each. The data are given in the following table. oecd
incentives. A random sample of white-collar employees was a. Is there any evidence to suggest that the mean time has
obtained, and the number of hours each worked during the last decreased? Use a 5 0.01.
week are given in the following table. work
b. Find bounds on the p value associated with this test.
9.151 Business and Management In June 2013 the largest
44.7 42.0 45.8 43.0 42.8 50.9 47.0
building in the world opened in Chengdu, China. The Century
41.9 49.3 45.6 45.7 39.4 39.0 44.4 Global Center has an ice-skating rink, shopping mall, 14-screen
IMAX theater, and an artificial beach.43 Some researchers
Is there any evidence the true mean number of hours worked
believe that as the economy improves, larger buildings are
by white-collar employees is greater than 40? Use a 5 0.01.
being constructed. A random sample of buildings from around
What assumption(s) did you make in order to complete this
the world with large floor space was obtained and the floor area
hypothesis test?
(in million square feet) was measured for each. bigbldg
9.147 Marketing and Consumer Behavior Kennebunk- a. Is there any evidence to suggest that the mean floor area
port, Maine, is a popular tourist town, but businesses suffer of very large buildings is greater than 9 million square
during the winter months, especially January. The mean hotel feet? Assume the underlying population is normal and use
room occupation rate per day, a measure of tourist activity, is a 5 0.05.
23.1% during winter months. A new advertising campaign was b. Find bounds on the p value.
launched to attract more tourists to this town during the winter.
9.152 Marketing and Consumer Behavior As the use of
Following the campaign, a random sample of nine winter days
the Internet has increased, companies are spending more money
was selected. The mean hotel room occupation rate was
advertising online. A sample of leading national advertisers was
x 5 24.6% and s 5 2.1%. Is there any evidence to suggest
obtained, and the Internet advertising spending (in millions of
the mean hotel room occupation rate has increased? Assume
dollars) for the year was recorded for each. The data are given
normality and use a 5 0.01. If this test is significant, can you
in the following table. advert
conclude the ad campaign caused the increase?
9.148 Economics and Finance Economic growth in China Company Spending
has contributed to increased demand for crude oil and rising Procter & Gamble Co. 80.6
gasoline prices in several countries. In May 2013, the mean
Verizon Communications 189.1
crude oil imported per day in China was 5.64 million bar-
rels.40 A random sample of 17 days in the last quarter of 2013 Time Warner 98.4
was obtained and the crude oil imported was recorded for Walt Disney Co. 141.4
each (in million barrels per day, bpd). The summary statistics General Electric Co. 78.9
were x 5 5.869 million bpd and s 5 1.16 million bpd. Is Toyota Motor Corp. 57.3
there any evidence to suggest that the mean crude oil Sony Corp. 71.5
imported per day in China has increased? Use a 5 0.05 and
Kraft Foods 36.3
assume normality.
Macy’s 11.1
9.149 Medicine and Clinical Studies According to the PepsiCo 28.1
Organization for Economic Co-operation and Development
Pfizer 24.2
(OECD), the mean density of physicians per 1000 people
worldwide is 3.2.41 A random sample of countries was obtained McDonald’s Corp. 27.2
436 C hapter 9 Hypothesis Tests Based on a Single Sample
Suppose the mean spending on Internet advertising for these a. Is there any evidence to suggest the mean methane output
companies the previous year was $55.5 million. Is there any per well per day has increased?
evidence to suggest the true mean spending on Internet adver- b. Find bounds on the p value associated with this hypothesis
tising has increased? Use a 5 0.05. test.
9.159 Biology and Environmental Science A quahog is a 9.162 Public Health and Nutrition Fluoride is added to
chewy Atlantic hard-shell clam with a blue-gray shell. Qua- public water supplies and many dental products in order to
hogs have different names, depending on size; for example, help prevent tooth decay. Sodium fluoride is a common addi-
the width of a littleneck clam is under 5 centimeters (across tive in toothpaste, and the concentration is usually measured
the shell). The historical mean width of a littleneck is 4.75 cm. in parts per million by weight (ppmF). The manufacturer of
Following a recent oil spill off the coast of Maine, a random Pepsodent toothpaste claims the concentration of fluoride in
sample of 46 littlenecks was obtained and the width of each every tube is 1000 ppmF. A random sample of toothpaste
was recorded. The summary statistics were x 5 4.66 cm and tubes was obtained, and the fluoride concentration in each
s 5 0.25 cm. was determined. Some research suggests that high concentra-
a. Use a one-sided, left-tailed hypothesis test to show that tions of fluoride can be toxic and may cause brain damage,
there is evidence to suggest the mean width of littlenecks immune disorders, and changes in bone structure and
has decreased. Assume the population of littleneck widths strength. FLUOR
is normal and use a 5 0.01. a. Test the relevant hypothesis concerning the mean fluoride
b. Explain why there is statistical evidence that the concentration per tube of toothpaste. Assume normality
population mean has decreased even though the sample and use a 5 0.05.
mean (4.66) is so close to the historical mean (4.75). b. Find bounds on the p value associated with this hypothesis
c. Find bounds on the p value associated with this test.
hypothesis test.
9.163 Public Health and Nutrition A recent study sug-
9.160 Physical Sciences The Palmer Drought Severity Index gested that artificially sweetened soft drinks actually have a
(PDSI) is a measure of prolonged abnormal dryness or wetness. negative impact on weight and other health issues. Aspartame is
It indicates general conditions and is not affected by local varia- the most common sweetener used in diet sodas. A random sam-
tions. An index value of 24 indicates extreme drought, and 14 ple of diet sodas was obtained and the amount of aspartame in
represents very wet conditions. A random sample of the PDSI an 8-ounce can was recorded for each (in mg). The data are
for selected regions in the Eastern United States for the week given in the following table.49 asprtme
ending July 6, 2013, was obtained, and the data are given in the
following table.47 drought 125 125 0 58 118 118 83
123 57 50 50 19 66 0
0.56 2.51 2.41 1.75 20.07 21.34 1.79
2.08 20.42 2.62 1.74 2.61 5.11 1.53 Assume the underlying population is normal.
1.29 1.30 20.28 21.14 20.85 3.63 2.10 a. Is there any evidence to suggest that the mean amount of
aspartame in an 8-ounce can of diet soda is greater than
a. Is there any evidence to suggest the true mean PDSI is 65 mg? Use a 5 0.01.
different from 0? Assume the underlying distribution is b. Find bounds on the p value associated with this test.
normal and use a 5 0.01. Find bounds on the p value
associated with this test. 9.164 Marketing and Consumer Behavior Red Rocks
b. Explain your conclusion in part (a) in terms of farming Park in Morrison, Colorado, is a naturally formed, rock, open-
conditions and reservoir levels. air amphitheater 6450 feet above sea level. A random sample of
events from Summer 2013 was obtained and the attendance for
9.161 Physical Sciences Alaska, California, and Hawaii each was recorded. The summary statistics were x 5 8722,
are the states with the most earthquakes, and over 50% of the s 5 460.4, and n 5 11. Assume the distribution of attendance
earthquakes in the United States occur in Alaska. A random is normal and use a 5 0.01.
sample of earthquakes around the world during July 2013 a. The amphitheater operators believe that 9000 people is
was obtained, and the magnitude of each is given in the fol- the optimal attendance in terms of comfort and profit. Is
lowing table.48 quake there any evidence to suggest that the true mean
attendance is different from 9000?
1.5 0.4 0.6 0.5 0.3 0.6 1.1 1.0 1.7 1.4 b. Find bounds on the p value associated with this test.
1.9 1.6 1.9 1.2 1.5 1.6 5.3 4.5 0.5 1.9 c. How large a sample size would be necessary for this test
1.3 0.5 0.3 2.0 2.4 0.5 1.8 0.6 to be significant (assuming x and s remain the same)?
a Poisson random variable. The four parts of the hypothesis The four parts of the large-sample hypothesis test are
test are
H0: l 5 l0
H0: l 5 l0
Ha: l . l0, l , l0, or l 2 l0
Ha: l . l0, l , l0, or l 2 l0
X 2 l0
TS: X 5 the number of events that occur in the specified TS: Z 5
interval "l0
RR: X $ xa, X # xra, or X # xa/2 or X $ xra/2 X 5 the number of events that occur in the specified
interval
The critical values xa, xra, xa/2, and xra/2 are obtained from the
RR: Z $ za, Z # 2za, or 0 Z 0 $ za/2
Poisson distribution with parameter l0 to yield the desired
significance level a. Postal workers who deliver mail are at high risk for dog
At a large hotel in Los Angeles, on average four people per bites. The number of dog bites to postal employees in the
day forget to take their room key and, therefore, lock them- United States per (fiscal) year peaked during the mid-1980s,
selves out. In an effort to lower the number of these service but with increased public awareness and employee training
calls, the hotel staff has placed special signs on the back of the number steadily decreased to less than 3000 per year.
every hotel room door reminding visitors to take their key. However, in fiscal year 2002, the number of dog bites
On a randomly selected day, two people were locked out of began to rise again, especially during the “dog days’’ of
their rooms. summer. In 2012 the mean number of dog bites to postal
a. Write the four parts of a small-sample, one-sided, left-tailed workers per day was 19.50 A random day in July 2013 was
test concerning the mean number of people locked out of obtained and the number of dog bites to postal workers
their room, based on a Poisson distribution. Use a 5 0.05. was 24.
b. Is there any evidence to suggest the mean has decreased? a. Write the four parts of a large-sample, one-sided,
c. Find the p value associated with this hypothesis test. right-tailed test concerning the mean number of dog
bites to postal employees per day, based on a normal
9.166 Public Policy and Political Science A large-sample approximation to the Poisson distribution. Use
test concerning the mean of a Poisson random variable is based a 5 0.01.
on a normal approximation. If X has a Poisson distribution with b. Is there any evidence to suggest the mean number of dog
(large) mean l, then X is approximately normal with mean l bites to postal employees per day has increased?
and standard deviation"l. c. Find the p value associated with this hypothesis test.
From Section = 7.3, if n is large and both np $ 5 and n ( 1 2 p ) $ 5, then the ran-
dom
= variable P is approximately normal with mean p and variance = p ( 1 2 p ) /n:
P , N ( p, p ( 1 2 p ) /n ) . These results concerning the distribution of P are used to con-
d
SOLUTION TRAIL
Video Tech Manuals
VIDEO TECH MANUALS A Closer L ok
One
EXELProportion
DISCRIPTIVE
Inference CI - 1. This test is valid as long as np0 $ 5 and n ( 1 2 p0 ) $ 5 (the nonskewness criterion).
Summarized Data
2. The critical values for this test are from the standard normal distribution, Z (as in a
hypothesis test about a population mean when s is known).
Solution Trail 9.17
The following example illustrates this hypothesis test procedure.
K e ywor ds
Is there any evidence?
n
Example 9.17 Online Diagnosers
n Proportion increased
It’s hard to believe, but doctors used to make house calls! Today, many people first
n 2400 U.S. adults
consult the Internet when they are sick, starting with a search engine and then a spe-
n 871 online diagnosers
cialized health site such as WebMD. According to a Pew Research Center survey, 35%
n 35% of U.S. adults have used the Internet specifically to diagnose a certain medical condi-
n Random sample tion.51 A medical insurance company is concerned that the percentage of online diag-
Tr a nsl ati o n nosers is increasing in response to rising health care costs. A random sample of 2400
n Conduct a one-sided, U.S. adults was obtained, and 871 indicated that they diagnose online. Is there any
right-tailed test about p evidence to suggest that the proportion of online diagnosers has increased? Use
n n 5 2400 a 5 0.01.
n x 5 871
n p0 5 0.35 Solution
Step 1 The given information:
C onc e p ts
n Large-sample hypothesis Sample size: n 5 2400.
test concerning a population Number of people with the specific characteristic, online diagnoser: x 5 871.
proportion =
The sample proportion: p 5 x/n 5 871/2400 5 0.3629.
Vision The assumed value of the population proportion is 0.35 ( 5 p0 ) , and the signifi-
Check the nonskewness cance level is a 5 0.01.
criterion. Use the template for a
Step 2 Check the nonskewness criterion.
one-sided, right-tailed test about
p. Use a 5 0.01 to find the np0 5 ( 2400 )( 0.35 ) 5 840 $ 5
critical value, compute the value n ( 1 2 p0 ) 5 ( 2400 )( 0.65 ) 5 1560 $ 5
of the test statistic, and draw a =
conclusion. Both inequalities are satisfied, so P is approximately normal, and the large-sample
hypothesis test concerning a population proportion can be used.
440 C hapter 9 Hypothesis Tests Based on a Single Sample
Figure 9.60 TI-84 Plus C Figure 9.61 TI-84 Plus C Figure 9.62 TI-84 Plus C
1-PropZTest input 1-PropZTest 1-PropZTest Draw
screen. Calculate results. results.
0.0923
0 1.325
Figure 9.63 JMP hypothesis test results. Figure 9.64 p-Value illustration:
p 5 P(Z $ 1.325)
5 0.0923 . 0.01 5 a
9.6 Large-Sample Hypothesis Tests Concerning a Population Proportion 441
Solution
a. The sample size is n 5 400 and x 5 204.
=
The sample proportion is p 5 x/n 5 204/400 5 0.51.
The assumed value of the population proportion is p0 5 0.54 and a 5 0.05.
b. Because this is a two-sided test and the value of the test statistic ( z 5 21.2039 ) is
negative, p/2 is a left-tail probability.
Figure 9.65 TI-84 Plus C Figure 9.66 TI-84 Plus C Figure 9.67 TI-84 Plus C
1-PropZTest input 1-PropZTest 1-PropZTest Draw
screen. Calculate results. results.
0.1143 0.1143
–1.2039 0 1.2039
Technology Corner
Procedure: Hypothesis test concerning a population proportion.
Reconsider: Example 9.18, solution, and interpretations.
CrunchIt!
Use the built-in function Proportion 1-Sample. Input is either summary statistics or a column containing the sample data.
1. Select Statistics; Proportion; 1-sample.
2. Under the Summarized tab, enter the value for n and Successes (x). Under the Hypothesis Test tab, enter the value for p0
and select the appropriate Alternative. (Figure 9.69).
3. Click Calculate. The results are displayed in a new window (Figure 9.70).
9.6 Large-Sample Hypothesis Tests Concerning a Population Proportion 443
TI-84 Plus C
Use the calculator function 1-PropZTest. The input is the value of p0, the number of successes, and the number of
trials.
1. Select STAT ; TESTS; 1-PropZTest.
2. Enter p0, the hypothesized proportion, x, the number of successes, and n, the sample size.
3. Choose the appropriate alternative hypothesis and Calculate. Press ENTER . See Figure 9.65.
4. The 1-PropZTest output includes the alternative hypothesis, the value of the test statistic, the p value, the sample
proportion, and the sample size. Refer to Figure 9.66.
5. The Draw results are shown in Figure 9.67.
Minitab
Use the function 1 Proportion. The input is either data in a column (Samples in columns) or summary statistics (Summa-
rized data).
1. Choose Stat; Basic Statistics; 1 Proportion.
2. In this example, select Summarized data. Enter the Number of events (x) and the Number of trials (n). Check the box to
Perform a hypothesis test, and enter the Hypothesized proportion. See Figure 9.71.
3. Choose Options. Enter a Confidence level, select the Alternative hypothesis, and select the Normal approximation
Method. See Figure 9.72.
4. Summary statistics, the confidence interval, the value of the test statistic, and the p value are displayed in the Session
window. See Figure 9.73.
Figure 9.71 1 Proportion input screen. Figure 9.73 1 Proportion hypothesis test results.
444 C hapter 9 Hypothesis Tests Based on a Single Sample
Excel
There is no built-in function to conduct a hypothesis test concerning a population proportion. However, Excel can be used
to compute the value of the test statistic and the p value.
1. Enter p0, the hypothesize proportion, x, the number of successes, and n, the sample size.
2. Compute the sample proportion, the value of the test statistic, and the p value.
9.169 True/False In a large-sample hypothesis test concerning 9.175 In each of the following problems, assume the null hypoth-
a population proportion, the number of trials must be at least 250. esis is H0: p 5 p0 and the alternative hypothesis is Ha: p 2 p0.
Use the values of p0, x, n, and a to conduct a large-sample
9.170 True/False In a large-sample hypothesis test concern-
hypothesis test about the population proportion.
ing a population proportion, there are only two possible alterna-
a. p0 5 0.28, x 5 88, n 5 377, a 5 0.025
tive hypotheses.
b. p0 5 0.46, x 5 130, n 5 243, a 5 0.02
9.171 Short Answer In a large-sample hypothesis test con- c. p0 5 0.337, x 5 120, n 5 414, a 5 0.05
cerning a population proportion, what happens if the nonskew- d. p0 5 0.71, x 5 865, n 5 1250, a 5 0.005
ness criterion is not satisfied? e. p0 5 0.93, x 5 1515, n 5 1600, a 5 0.01
9.176 In each of the following problems, assume the null
Practice hypothesis is H0: p 5 p0 and the alternative hypothesis is
Ha: p . p0 . Use the values of p0, x, n, and a to conduct a
9.172 In each of the following problems, the sample size and
large-sample hypothesis test about the population proportion.
p0 are given for a large-sample hypothesis test concerning a
Find the p value associated with each test, and use it to draw a
population proportion. Check the two nonskewness inequalities
conclusion.
and determine whether this test is appropriate.
a. p0 5 0.15, x 5 60, n 5 356, a 5 0.10
a. n 5 276, p0 5 0.30 b. n 5 1158, p0 5 0.60
b. p0 5 0.62, x 5 298, n 5 450, a 5 0.05
c. n 5 645, p0 5 0.03 d. n 5 159, p0 5 0.97
c. p0 5 0.743, x 5 1035, n 5 1360, a 5 0.05
e. n 5 322, p0 5 0.38 f. n 5 443, p0 5 0.82
d. p0 5 0.94, x 5 795, n 5 825, a 5 0.01
9.173 In each of the following problems, assume the null e. p0 5 0.32, x 5 960, n 5 2750, a 5 0.001
hypothesis is H0: p 5 p0 and the alternative hypothesis is
9.177 In each of the following problems, assume the null hypoth-
Ha: p . p0. Use the values of p0, x, n, and a to conduct a large-
esis is H0: p 5 p0 and the alternative hypothesis is Ha: p , p0.
sample hypothesis test about the population proportion.
Use the values of p0, x, n, and a to conduct a large-sample
a. p0 5 0.34, x 5 121, n 5 348, a 5 0.05
hypothesis test about the population proportion. Find the p value
b. p0 5 0.67, x 5 200, n 5 281, a 5 0.10
associated with each test, and use it to draw a conclusion.
c. p0 5 0.488, x 5 535, n 5 1020, a 5 0.01
a. p0 5 0.54, x 5 145, n 5 301, a 5 0.05
d. p0 5 0.02, x 5 52, n 5 2366, a 5 0.025
b. p0 5 0.39, x 5 180, n 5 460, a 5 0.005
e. p0 5 0.85, x 5 570, n 5 667, a 5 0.01
c. p0 5 0.07, x 5 35, n 5 566, a 5 0.01
9.174 In each of the following problems, assume the null hypoth- d. p0 5 0.64, x 5 449, n 5 747, a 5 0.025
esis is H0: p 5 p0 and the alternative hypothesis is Ha: p , p0. e. p0 5 0.47, x 5 395, n 5 925, a 5 0.005
9.6 Large-Sample Hypothesis Tests Concerning a Population Proportion 445
9.178 In each of the following problems, assume the null hypoth- 9.182 Public Policy and Political Science It has always been
esis is H0: p 5 p0 and the alternative hypothesis is Ha: p 2 p0. d ifficult to attract general physicians to rural areas. Small-town
Use the values of p0, x, n, and a to conduct a large-sample doctors do not have the opportunity to take much time off, and the
hypothesis test about the population proportion. Find the p value salary is relatively low. However, the quality of life is often
associated with each test, and use it to draw a conclusion. appealing. Past records indicate that 52% of all physicians in rural
a. p0 5 0.50, x 5 418, n 5 882, a 5 0.01 areas leave after one year. The federal government has decided to
b. p0 5 0.19, x 5 158, n 5 700, a 5 0.05 offer more incentives for doctors to stay in rural areas, for exam-
c. p0 5 0.90, x 5 445, n 5 520, a 5 0.001 ple, loan forgiveness and housing allowances. A few years after
d. p0 5 0.75, x 5 1095, n 5 1400, a 5 0.025 this program was implemented, a random sample of rural-area
e. p0 5 0.45, x 5 2386, n 5 5525, a 5 0.01 physician positions showed 62 of 130 left after one year.
a. Identify n, x, and p0.
Applications b. Check the nonskewness criterion. Is a large-sample
hypothesis test about p appropriate?
SOLUTION TRAIL
9.179 Public Health and Nutrition In a recent research study
c. Conduct the appropriate test to determine whether the
concerning nutrition, it was reported that only 2% of children in
turnover rate of rural doctors has decreased. Use a 5 0.01.
the United States ages 6–10 eat the recommended number of
d. Compute the p value associated with this test.
servings for all five major food groups each day. A certain state
conducted a nutrition campaign through elementary schools, 9.183 Medicine and Clinical Studies According to a
newspapers, and television. Following the campaign, a random recent study, taller women have a significantly higher risk
sample of 500 children in this age group was obtained, and 16 of developing some form of cancer.54 The American Cancer
were found to eat the recommended number of servings each day. Society reports that approximately 38% of all women develop
a. Identify n, x, and p0. some form of cancer in their lifetime. A long-term Canadian
b. Check the nonskewness criterion. Is a large-sample study was conducted involving 1250 tall women between the
hypothesis test about p appropriate? ages of 50 and 79, and 502 developed some form of cancer. Is
c. Conduct the appropriate hypothesis test to determine there any evidence to suggest that the proportion of taller
whether the proportion of children who eat the women who develop cancer is greater than 0.38? Use a 5 0.01.
recommended number of servings each day has increased. Write a Solution Trail for this problem.
Use a 5 0.05.
d. Compute the p value associated with this test. 9.184 Public Policy and Political Science Soon after Edward
Snowden disclosed information about the surveillance programs
9.180 Psychology and Human Behavior It is very com- of the U.S. National Security Agency, a survey of Americans
mon to complain about one’s boss. There are many examples in revealed that 56% were worried the United States would go too
TV, movies, and real life of bullying and harassment. In a recent far in violating privacy rights.55 Aides to Senator Roy Blunt
survey it was reported that 20% of workers say that a manager selected a random sample of 575 Missouri residents, and 358 said
has hurt their career.53 A random sample of 500 workers in the they were worried about violations of their privacy rights. Is there
retail industry was obtained, and 115 indicated that a boss had any evidence to suggest that the true proportion of Missouri resi-
hurt their career. dents worried about violations of privacy rights is greater than
a. Identify n, x, and p0. 0.58? Use a 5 0.05 and use the p value to draw a conclusion.
b. Check the nonskewness criterion. Is a large-sample
hypothesis test about p appropriate? 9.185 Psychology and Human Behavior There are many
c. Conduct the appropriate hypothesis test to determine urban legends involving a full moon and human behavior.
whether the proportion of workers in the retail industry Research at the University of Basel in Switzerland suggests that
who believe a boss has hurt their career is different from sleep is associated with the lunar cycle.56 In a new sleep study,
p0. Use a 5 0.05. 120 random adults were selected and studied during a full
d. Compute the p value associated with this test. moon phase. Melatonin levels were used to determine whether
eachTRAIL
SOLUTION person experienced a deep sleep and 76 experienced low
9.181 Education and Child Development Homeschooling
levels, and therefore, trouble sleeping, during the full moon. Is
has become very popular in the United States, and many col-
there any evidence to suggest that more than half of all people
leges try to attract students from this group. Evidence suggests
have trouble sleeping during a full moon? Use a 5 0.05.
that approximately 90% of all homeschooled children attend
college. To check this claim, a random sample of 225 home- 9.186 Marketing and Consumer Behavior A recent
schooled children was obtained and 189 of them were found to study suggests that Canadians are getting richer but they
have attended college. are among the most indebted people in the world.57 The Bank of
a. Identify n, x, and p0. Canada suggests that no household should have debt in excess of
b. Check the nonskewness criterion. Is a large-sample 160% of annual disposable income. A random sample of 1500
hypothesis test about p appropriate? households in Canada’s wealthiest cities, Vancouver, Calgary, and
c. Conduct the appropriate test to determine whether the Toronto, was obtained, and 235 were found to be too much in
proportion of homeschooled children who attend college debt. Is there any evidence to suggest that the true proportion of
is different from 0.90. Use a 5 0.05. Canadian households that are too much in debt is greater than
d. Compute the p value associated with this test. 14%? Use a 5 0.05. Write a Solution Trail for this problem.
446 C h a p t e r 9 Hypothesis Tests Based on a Single Sample
9.187 Public Policy and Political Science Ron Littlefield is more students leery of travel and life in another country. In a
considering a campaign for mayor of Chattanooga. He will enter random sample of 1200 graduating college students, 150 said
the race if there is evidence to suggest that less than 40% of all they had participated in a study-abroad program. Is there any
residents are satisfied with the local government. A random sam- evidence to suggest the proportion of students participating in
ple of 375 residents is obtained, and 127 indicate they are satis- these programs has decreased? Use a 5 0.05.
fied with the local government. Do you think this politician will
9.193 Physical Sciences Interstate Batteries guarantees their
enter the race for mayor? Justify your answer. (Use a 5 0.01.)
lawn tractor battery will last at least three years. To increase
9.188 Physical Sciences It’s known as the Hum, a steady, sales, the company is planning to offer a new replacement war-
droning sound heard in places all around the world. It is a low- ranty, as long as 95% of all batteries do indeed last at least three
frequency, annoying, throbbing, unexplained sound. It seems to years. A random sample of 200 customers was contacted, and
be louder at night and indoors, and approximately 2% of the 183 had batteries that lasted at least three years.
people living in any given Hum-prone area can hear the sound.58 a. Is there any evidence to suggest the proportion of tractor
To determine if more people are affected by the Hum, a random batteries that last at least three years is less than 0.95?
sample of 1300 adults in Taos, New Mexico, was obtained, and Use a 5 0.05.
24 reported that they regularly hear the noise. Is there any b. Find the p value associated with this hypothesis test.
evidence to suggest that the proportion of adults in this area who c. Based on your results, would you recommend
hear the Hum has changed? Use a 5 0.05 and use the p value implementation of the new replacement warranty? Justify
associated with this hypothesis test to draw a conclusion. your answer.
9.189 Public Policy and Political Science A local planning 9.194 Medicine and Clinical Studies A certain puzzle task
board must consider whether to require all new projects with is designed to measure spatial reasoning performance. Forty-
five or more apartments to designate some of the units as rent- five percent of all people attempting the puzzle complete the
controlled. A random sample of 100 apartments in Cheyenne, task within the allotted time. A researcher decided to test the
Wyoming, was obtained, and 9 were found to be rent-controlled. theory that classical music increases brain activity and
Is there any evidence to suggest the proportion of rent-controlled improves the ability to perform such tasks. A random sample of
apartments is less than 10%? Let a 5 0.05, and use the p value 400 people was obtained, and each listened to 15 minutes of
associated with this hypothesis test to draw a conclusion. classical music and attempted the task. Two hundred and eleven
9.190 Public Policy and Political Science Early in 2013, a completed the task within the allotted time.
political report indicated that 86% of Americans disapprove of a. Is there any evidence to suggest the proportion of people
the job Congress is doing. In a poll of 1000 Americans during who complete the puzzle within the allotted time has
Summer 2013 by NBC News, 830 said they disapprove of the increased? Use a 5 0.001.
job Congress is doing.59 Is there any evidence to suggest that b. Compute the p value associated with this test.
the proportion of Americans who disapprove of the job Con- c. Carefully sketch a graph indicating the critical value from
gress is doing has decreased? Use a 5 0.01. part (a) and the p value from part (b).
9.191 Technology and the Internet Technology is con- 9.195 Location, Location, Location The 2013 Profile of Inter-
stantly improving, increasing the speed of communication, and national Home Buying Activity reported the percentage of total
many software programs on tablets, laptops, and phones now foreign purchases of U.S. homes by country of origin. Although
anticipate our next words or phrases. Docmail, a UK-based buyers from China, India, and Mexico accounted for a large por-
printing and mailing company, conducted a study and concluded tion of all foreign purchases, Canadians purchased approximately
that one of every three adults had not been required to produce twice as many homes as the second largest group of foreign pur-
something in handwriting for more than a year.60 To check this chasers, the Chinese.62 In 2014, a random sample of 347 foreign
claim in the United States, a random sample of 656 adults was purchases was obtained, and 63 were purchased by Canadians.
obtained, and 235 said they had not been required to produce a. Is there any evidence to suggest that the proportion of foreign
something in handwriting for more than a year. Is there any purchases of U.S. homes by Canadians has decreased from
evidence to suggest that the proportion of Americans not using the 2013 proportion (0.23)? Use a 5 0.05.
handwriting is different in the United States? Use a 5 0.05 b. Compute the p value associated with this test.
and find the p value associated with this test. c. Carefully sketch a graph that shows the critical value from
part (a) and the p value from part (b).
Extended Applications
Challenge
9.192 Education and Child Development College study-
abroad programs have great educational value, compel students 9.196 Public Policy and Political Science The high cost of
to become more globally aware, and invite participants to learn health care has forced some people to choose between medical
about different nations and cultures. Some colleges sponsor treatment and other necessities such as food, heat, or electricity. A
several programs and have high participation rates. However, recent survey concluded that 43% of America’s working-age
nationally, only 14% of all college students participate in a adults did not see a doctor or access other medical services during
study-abroad program.61 Recent world events may have made the past year because of the cost.63 Following a recent advertising
9.7 Hypothesis Tests Concerning a Population Variance or Standard Deviation 447
campaign urging adults to seek medical assistance when needed, The four parts of the hypothesis test are
physicians’ groups hope this percentage has decreased. A random H0: p 5 p0
sample of 1000 working Americans was obtained, and 380 did not
Ha: p . p0, p , p0, or p 2 p0
seek medical assistance when needed, because of the cost.
a. Is there any evidence to suggest the proportion of TS: X 5 the number of successes in n trials
American workers who did not seek medical assistance RR: X $ xa, X # xra or X # xa/2 or X $ xra/2
has decreased? Use a 5 0.01. The critical values xa, xra, xa/2, and xra/2 are obtained from the
b. Find the probability of a type II error in this hypothesis binomial distribution with parameters n and p0 to yield the desired
test for pa 5 0.38; that is, find b ( 0.38 ) . significance level a. For example, in a one-sided, right-tailed test,
c. Find a value of pa such that the probability of a type II the critical value xra is found such that P ( X $ xra ) < a.
error is 0.1; that is, solve b ( pa ) 5 0.1 for pa.
A recent poll indicated that 51% of all students admit to cheat-
d. Find the sample size necessary such that the probability
ing.64 A sociologist conducting research believes this percent-
of a type II error for pa 5 0.35 is 0.025; that is, solve
age is much lower. In a random sample of 25 students from
b ( 0.35 ) 5 0.025 for n.
Long Island public schools, 9 admitted to cheating.
9.197 Demographics and Population Statistics A large- a. Write the four parts of a small-sample (exact) one-sided,
sample hypothesis test concerning a population proportion is based left-tailed test concerning the population proportion of
on a normal approximation and the nonskewness criterion. If n is Long Island students who admit to cheating, based on a
small (and the nonskewness criterion fails), the test statistic in an binomial distribution. Use a 5 0.01.
exact hypothesis test concerning p and is based on the number of b. Is there any evidence to suggest the true proportion is less
successes in the sample, X. If H0: p 5 p0 is true, then X has a than 0.51?
binomial distribution with n trials and probability of success p0. c. Find the p value associated with this hypothesis test.
A Closer L ok
1. X (a Greek capital chi) is a random variable. x (a Greek lowercase chi) is a specific
value.
2. This test is valid for any sample size, as long as the underlying population is normal.
3. Table VI in the Appendix presents selected critical values associated with various chi-
square distributions. These critical values were used to construct confidence intervals
in Section 8.5.
4. Figures 9.75, 9.76, and 9.77 illustrate the rejection region for each alternative
hypothesis.
2n−1
2n−1 2n−1
2
2
SOLUTION TRAIL
0 2,n−1 0 21,n−1 0 212,n−1 22,n−1
Figure 9.75 Rejection region for Figure 9.76 Rejection region for Figure 9.77 Rejection region for
Ha: s2 . s20 . Ha: s2 , s20 . Ha: s2 2 s20 .
Step 4 The value of the test statistic does not lie in the rejection region. We do not reject
the null hypothesis. There is no evidence to suggest the true population variance
in volumetric liquid water content is greater than 0.002.
Figure 9.78 shows a technology solution.
A Closer L ok
1. Neither the TI-84 Plus nor Excel has a built-in command to conduct a hypothesis test
concerning a population variance. However, both can be used to find the appropriate
critical value and associated p value.
2. The table listing critical values for various chi-square distributions is very limited.
Using this table, the best we can do is bound the p value associated with a hypothesis
test (as for a t test).
360 347 346 347 338 370 362 356 330 339
372 358 387 359 343 372 369 349 344 334
a. Is there any evidence to suggest that the true population variance in ice cream pur-
chased per day is different from 660.49 gallons2? Assume normality and use a 5 0.01.
b. Find bounds on the p value associated with this test.
Solution
Step 1 The assumed population variance is 25.72 5 660.49 ( 5 s20 ) ; n 5 20 and
a 5 0.01.
We would like to know whether the population variance is different from 660.49.
The relevant alternative hypothesis is two-sided.
The underlying population is assumed to be normal. A hypothesis test concern-
ing a population variance can be used.
SOLUTION TRAIL
Solution Trail 9.20 Step 2 The four parts of the hypothesis test are
Ke y wor ds
H0: s2 5 660.49
n Is there any evidence? Ha: s2 2 660.49
n Population variance (n 2 1) S 2
TS: X2 5
n Different from 660.49 s20
n Random sample RR: X2 # x212a/2,n21 5 x20.995,19 5 6.8440 or
2
Transl ati on X $ x2a/2,n21 5 x20.005,19 5 38.5823
n Conduct a two-sided test
Step 3 The sample variance is
about s2
c g x2i 2 ( g xi ) 2 d
n Find n and s2 1 1
s2 5 Computational formula, sample variance.
n21 n
Concepts
Hypothesis test concerning a 1 1
n
5 c 2,511,964 2 ( 7082 ) 2 d 5 222.52 Use given data.
population variance 19 20
Visi on The value of the test statistic is
We can assume that the ( n 2 1 ) s2 ( 19 )( 222.52 )
underlying distribution is normal. x2 5 5 5 6.4011
Use the template for a two-sided s20 660.49
test about s2. Use a 5 0.01 to The value of the test statistic lies in the rejection region ( x2 5 6.4011 # 6.8840 ).
find the critical values.
We reject the null hypothesis. There is evidence to suggest the true population
Compute the value of the test
variance is different from 660.49. Figure 9.79 shows a technology solution.
statistic, and draw a conclusion.
Use Table VI in the Appendix to Step 4 x2 5 6.4011.
find bounds on the p value. Using Table VI in the Appendix, row n 2 1 5 20 2 1 5 19, place 6.4011 in the
ordered list of critical values.
5.4068 # 6.4011 # 6.8440
x20.999,19 # 6.4011 # x20.995,19
x2120.001,19 # 6.4011 # x2120.005,19
The initial bounds are on p / 2,
because the hypothesis test is Therefore, 0.001 # p/2 # 0.005
two-sided. and 0.002 # p # 0.010 See Figure 9.80.
219
0.005
0.001
0 5.4068 2 6.8440
Figure 9.79 JMP Hypothesis Figure 9.80 The left tail of a chi-square distribution with
test concerning a population 19 degrees of freedom. This illustrates the computations to find
standard deviation. bounds on the p value.
Technology Corner
Procedure: Hypothesis test concerning a population variance or standard deviation.
Reconsider: Example 9.20, solution, and interpretations.
CrunchIt!
There is no built-in function to conduct a hypothesis test concerning a population variance or standard deviation. However,
CrunchIt! may be used to compute the p value.
1. Select Distribution Calculator; Chi-square.
2. Enter the degrees of freedom associated with the test statistic (df).
3. Under the Probability tab, select the appropriate inequality and enter the value of the test statistic. Click Calculate. A
graph and the tail probability are displayed. See Figure 9.81. Note that the probability displayed in this example is p/2.
TI-84 Plus C
There is no built-in function to conduct a hypothesis test concerning a population variance or standard deviation. However,
the calculator may be used to compute summary statistics, the test statistic, and the p value.
1. Enter the data into list L1.
2. Use LIST ; MATH; stdDev to compute the sample standard deviation.
3. Compute the value of the test statistic.
4. Use DISTR ; DISTR; x2cdf to compute the left tail probability, and multiply by 2 to obtain the p value. See
Figure 9.82.
Minitab
Use the function 1 Variance to conduct a hypothesis test concerning a population variance or standard deviation. The input may
be either Samples in columns or Summarized data, and the test may be conducted in term of the variance or standard deviation.
1. Enter the data into column C1.
2. Select Stat; Basic Statistics; 1 Variance.
452 C hapter 9 Hypothesis Tests Based on a Single Sample
3. Choose One or more samples, each in a column, and enter C1. Check the box to Perform a hypothesis test, and enter the
Hypothesized variance. See Figure 9.83.
4. Choose Options. Enter a Confidence level and select the Alternative hypothesis. See Figure 9.84.
5. Click OK. Summary statistics, the confidence interval, the value of the test statistic, and the p value are displayed in the
Session window. See Figure 9.85.
Figure 9.84 1 Variance Options screen. Figure 9.85 Hypothesis test results.
Excel
There is no built-in function to conduct a hypothesis test concerning a population variance or standard deviation. However,
Excel may be used to compute summary statistics, the test statistic, and the p value.
1. Enter the data into column A.
2. Compute the sample variance and the value of the test statistic.
3. Use CHISQ.DIST to find the left tail probability and calculate the p value. See Figure 9.86.
9.204 Consider a hypothesis test concerning a population 9.212 Consider a random sample of size 21 from a normal
v ariance from a normal population with H0: s2 5 352.98 and population with hypothesized variance 16.7.
Ha: s2 , 352.98. a. Write the four parts for a one-sided, right-tailed hypothesis
a. Write the appropriate test statistic. test concerning the population variance with a 5 0.01.
b. Find the rejection region corresponding to each value of n b. Suppose s2 5 28. Find the value of the test statistic, and
and a. draw a conclusion about the population variance.
i. n 5 17, a 5 0.001 ii. n 5 13, a 5 0.05 c. Find bounds on the p value associated with this hypothesis
i ii. n 5 37, a 5 0.0001 iv. n 5 27, a 5 0.10 test, and carefully sketch a graph to illustrate this value.
v. n 5 33, a 5 0.05 vi. n 5 8, a 5 0.005
9.213 Consider a random sample of 16 observations, given in
9.205 Consider a hypothesis test concerning a population the following table, from a normal population with hypothe-
v ariance from a normal population with H0: s2 5 43.8 and sized variance 36.8. EX9.213
Ha: s2 2 43.8.
a. Write the appropriate test statistic. 233.1 226.1 220.3 247.6 232.9 232.8 235.9
b. Find the rejection region corresponding to each value of n 232.4 249.4 207.4 231.8 232.1 220.7 229.6
and a.
242.5 229.3
i. n 5 40, a 5 0.05 ii. n 5 31, a 5 0.01
i ii. n 5 22, a 5 0.001 iv. n 5 16, a 5 0.02 a. Write the four parts for a two-sided hypothesis test
v. n 5 29, a 5 0.002 vi. n 5 5, a 5 0.20 concerning the population variance with a 5 0.05.
9.206 Consider a hypothesis test concerning the variance from
b. Compute the sample variance and the value of the test
a normal population with H0: s2 5 1.28 and Ha: s2 . 1.28. statistic. Draw a conclusion about the population variance.
Find the significance level ( a ) corresponding to each value of n c. Find bounds on the p value associated with this
and rejection region. hypothesis test.
a. n 5 21, X2 $ 31.4104 b. n 5 33, X2 $ 56.3281 9.214 Consider a random sample of size 40 from a normal
c. n 5 14, X2 $ 29.8195 d. n 5 29, X2 $ 59.3000 population with hypothesized variance 75.6.
9.207 Consider a hypothesis test concerning the variance from a. Write the four parts for a one-sided, left-tailed hypothesis
a normal population with H0: s2 5 48.92 and Ha: s2 , 48.92. test concerning the population variance with a 5 0.001.
Find the significance level ( a ) corresponding to each value of n b. Suppose s2 5 48.5. Find the value of the test statistic, and
and rejection region. draw a conclusion about the population variance.
a. n 5 25, X2 # 7.4527 b. n 5 36, X2 # 18.5089 c. Find bounds on the p value associated with this hypothesis
c. n 5 12, X2 # 3.8157 d. n 5 51, X2 # 27.9907 test.
Assume normality and use a 5 0.01. Write a Solution Trail for a corn maze so that the mean time for completion is 1.5 hours
this problem. with variance 0.57 hour2. A random sample of 18 people
entering the maze was selected, and the time (in hours) to com-
9.217 Manufacturing and Product Development A man-
plete the puzzle was recorded for each. The sample standard
ufactured glass rod is tested for the stress required for fracture.
deviation was s 5 0.34. Is there any evidence the completion-
Experiments suggest that a flame-polished rod has a higher
time variance is different from the intended variance? Use
mean fracture stress than a rod with an abraded surface. In both
a 5 0.05 and assume the underlying distribution is normal.
cases, the variance in stress is designed to be at most 324 MPa2.
A random sample of 12 polished glass rods was obtained, and 9.223 Sports and Leisure Bull riding has become a very
the fracture stress for each was measured. The sample standard popular rodeo sport—the action is fast and dangerous. A bull
deviation in fracture stress was s 5 21.56. Is there any evidence ride is scored by two judges; one actually rates the bull, while
to refute the manufacturer’s claim? Assume the underlying dis- the other evaluates the rider. The variance in bull-riding time
tribution is normal and use a 5 0.01. tends to be large, approximately 22.5 seconds2. A random
sample of 37 bull rides was selected over an entire rodeo sea-
9.218 Public Health and Nutrition The Italian Peoples
son. The sample variance in riding times was s2 5 15.6 sec-
Bakery in Pennington, New Jersey, advertises an original-recipe
onds2. Smaller variability in bull-riding time translates to a
cream puff packed with 3 ounces of filling. To produce a con-
more monotonous, unexciting rodeo. Is there any evidence
sistent product, the variance in cream filling is carefully moni-
that bull riding has become less exciting? Assume normality
tored.
SOLUTION In a random sample of 23 cream puffs, the variance in
TRAIL
and use a 5 0.05.
filling was s2 5 0.105 ounce2. Is there any evidence to suggest
the variance in filling is greater than 0.09 ounce2? Assume nor-
mality and use a 5 0.025. Extended Applications
9.219 Manufacturing and Product Development In 9.224 Public Health and Nutrition The Obama administra-
an effort to improve efficiency, Samuel Adams brew- tion worked to make health care more affordable and account-
ing company would like to produce yeast slurry with vari- able. However, there is considerable variation across the
ance no greater than 62.5. A new delivery pump was country in provider charges for common medical services. A
installed, and a random sample of 10 slurry mixtures was random sample of hospitals was obtained and the charge for
obtained. Each mixture was measured in billion cells/ml and outpatient service 300 was recorded for each.67 health
a. Is there any evidence to suggest that the standard
the sample variance was s2 5 70.1. Is there any evidence to
deviation in outpatient charges for this service is greater
suggest the variance is greater than the desired value?
than 1000? Assume normality and use a 5 0.05.
Assume normality and use a 5 0.01. Write a Solution Trail
b. Find bounds on the p value associated with this hypothesis
for this problem.
test, and carefully sketch a graph to illustrate this value.
9.220 Sports and Leisure The specified weight of a
9.225 Manufacturing and Product Development Bitter-
22-mm die used at casinos in Las Vegas, Nevada, is 10.4 g.
sharp apple cider is made from apples that contain more than
The variability in die weight must be negligible (at most
0.45% malic acid. It is important for the variance in malic acid
0.04 g2) in order for patrons to believe that games involving
content to be small, because large variability may cause the
dice are fair. A random sample of 25 new 22-mm dice was
resulting cider to be too sharp. In preparation for making a
obtained and each die was carefully weighed. The sample
batch of cider, a random sample of 20 Foxwhelp Bittersharp
mean weight was x 5 10.38 g and the sample standard
apples was obtained, and the malic acid content in each was
deviation was s 5 0.244 g. Conduct the relevant hypothesis
measured. The sample variance was s2 5 0.42.
test to determine whether there is any evidence that the
a. Is there any evidence to suggest the population variance
variance in die weight is greater than 0.04. Use a 5 0.01
in malic acid is greater than 0.36? Assume the underlying
and assume normality.
distribution is normal and use a 5 0.05.
9.221 Manufacturing and Product Development SoLux b. Find bounds on the p value associated with this hypothesis
produces special light bulbs for museums, photography studios, test.
automotive paint finishing, and machine vision. The SoLux
9.226 Business and Management Researchers studying
4100 Kelvin 50-watt 12-volt 10° bulb is designed to produce a
the European Union have suggested that small variability in
beam diameter of 1.75 feet 10 feet away.66 In a random sample
minimum wage among countries leads to more economic
of 26 bulbs from the production line, the standard deviation was
stability and growth. A random sample of European countries
0.342 feet. Is there any evidence to suggest that the standard
was obtained and the minimum wage (in euros) was recorded
deviation in beam diameter at 10 feet is greater than 0.3?
for each.68 minwage
Assume normality and use a 5 0.05.
a. Is there any evidence to suggest that the standard
9.222 Business and Management Many farmers craft deviation in minimum wage is greater than 400 euros?
unique corn mazes in their fields after the growing season. Assume normality and use a 5 0.05.
These huge puzzles challenge both children and adults, and b. Find bounds on the p value associated with this hypothesis
increase revenue for farmers. Suppose a farmer has constructed test.
9.7 Hypothesis Tests Concerning a Population Variance or Standard Deviation 455
9.227 Manufacturing and Product Development Cell goggles was obtained and the resolution was measured for
phone companies attract customers by offering a wide cover- each. The sample variance was s2 5 5.230.
age area and a consistent, strong signal. Signal strength is a. Is there any evidence to suggest that the standard
measured in dBm, decibels above or below 1.0 milliwatt. A deviation in the resolution for these goggles is different
random sample of locations in a company’s coverage area from 2? Assume normality and use a 5 0.01.
was obtained, and the signal strength (in dBm) at each spot b. Find bounds on the p value associated with this
was measured. cphone hypothesis test.
a. Suppose a consistent signal means a variance of at most
9.231 Manufacturing and Product Development Oxford
230 mw2. Is there any evidence of an inconsistent signal
Paper Company manufactures boxboard for folding cartons
in this company’s coverage area? Use a 5 0.05 and
made entirely from recycled material. One specific product is
assume normality.
designed to have thickness 375 micrometers with standard
b. Find bounds on the p value associated with this hypothesis
deviation 7 micrometers. A random sample of 21 pieces of
test.
boxboard was obtained from the assembly line.
9.228 Manufacturing and Product Development The a. Assume the thickness distribution is normal and let
Estes Pro Series II Ventris model rocket weighs approximately a 5 0.05. Consider a two-sided hypothesis test to
16 ounces and can reach altitudes of over 2000 feet.69 Suppose determine whether the boxboard is being manufactured
the company would like the standard deviation in height to be with the designed variance in thickness.
approximately 50 feet. If the standard deviation is any larger, b. Find the critical values in terms of the sample variance.
then more rockets are lost (because they soar too high and drift Work backward to determine two values such that if
away) or customers become angry because more launches do S 2 # s 2L or S 2 $ s 2H, then the null hypothesis is rejected.
not achieve the advertised height. A random sample of rockets c. Suppose s2 5 56. Use the critical values from part (b)
was obtained, and the height (in feet) achieved on each launch to draw a conclusion about the population variance.
was measured. rocket d. Suppose s2 5 15.6. Use the critical values from part (b)
a. Is there any evidence to suggest the population variance to draw a conclusion about the population variance.
in height is greater than the company’s desired value?
Assume normality and use a 5 0.01. Challenge
b. Find bounds on the p value for this hypothesis test.
9.232 Economics and Finance An approximate test con-
9.229 Biology and Environmental Science A monarch cerning a population variance is based on a normal approxima-
butterfly may fly as many as 2000 miles during its migration tion. Recall that if S 2 is the sample variance of a random sample
to Mexico. These amazing creatures actually vary dramati- of size n from a normal distribution with variance s2, the ran-
cally in weight, color, and even flying behavior. Previous dom variable X 5 ( n 2 1 ) S 2 / s2 has a chi-square distribution
research suggests the mean wingspan for a monarch butterfly with n 2 1 degrees of freedom. If n is large, the random
is 50 mm with a standard deviation of 2.75 mm. There is variable X is approximately normal with mean n 2 1 and
some speculation that changes in the environment (for exam- variance 2 ( n 2 1 ) . An approximate hypothesis test is based
ple, pollution and climate) have caused greater variability in on standardizing X to a Z random variable.
the wingspan. A random sample of monarch butterflies was The four parts of the hypothesis test are
obtained, and the wingspan of each (in mm) is given in the
following table. monarch
H0: s2 5 s20.
Ha: s2 . s20, s2 , s20, or s2 2 s20
49.3 52.4 43.3 51.2 55.1 48.4 41.6 54.9 S 2 2 s20
TS: Z 5
51.9 51.6 47.0 50.7 47.2 55.7 50.3 48.3 !2s20 / !n 2 1
RR: Z $ za, Z # 2za, or 0 Z 0 $ za / 2
52.9 47.8 49.4 56.8 47.0 51.5
The historic variance in the exchange rate for the Japanese yen
against the U.S. dollar is approximately 1.56. A random sample
a. Is there any evidence that the variability in wingspan of
of 48 closing exchange rates was obtained, and the summary
the monarch butterfly has increased? Assume normality
statistics were x 5 103.74 and s2 5 2.2.
and use a 5 0.05.
a. Write the four parts of a large-sample, two-sided test
b. Find bounds on the p value associated with this
concerning the variance in exchange rate, based on a
hypothesis test.
normal approximation. Use a 5 0.05.
9.230 Manufacturing and Product Development Fol- b. Is there any evidence to suggest the variance in exchange
lowing a recent night-time helicopter crash, the air ambulance rate has changed?
service in Ontario is considering night-vision goggles for all c. Find the p value associated with this hypothesis test.
pilots. One measure of the effectiveness of these goggles is d. Conduct an exact hypothesis test based on the chi-square
the resolution, in lp/mm (line pairs per millimeter). A random distribution. Compute the p value, and compare it with
sample of 18 ATN PVS7-CGT high-performance night-vision your answer to part (c).
456 C hapter 9 Hypothesis Tests Based on a Single Sample
Chapter 9 Summary
Concept Page Notation / Formula / Description
Alternative Rejection
Parameter Assumptions hypothesis Test statistic region
m n large, s known, or m . m0 X 2 m0 Z $ za
normality, s known m , m0 Z5 Z # 2za
m 2 m0 s / !n 0 Z 0 $ za / 2
m normality, s unknown m . m0 T $ ta,n21
m , m0 X 2 m0 T # 2ta,n21
T5
m 2 m0 S/ !n 0 T 0 $ ta/2,n21
p n large, p . p0 =
P 2 p0 Z $ za
nonskewness p , p0 Z5 Z # 2za
criteria p 2 p0 p0 ( 1 2 p0 )
0 Z 0 $ za / 2
Å n
s2 normality s2 . s20 X
2
$ x2a,n21
s2 , s20 (n 2 1)S 2 X
2
# x212a,n21
s2 2 s20 X2 5 2
s20 X # x212a/2,n21 or
2
X $ x2a/2,n21
Chapter 9 Exercises
a random sample of 40 pieces of tape are carefully measured. If random patient was recorded for each. The data are given in the
there is any evidence the thickness is different from 4 mil, then following table (in minutes). DOCTIME
the entire process is stopped and the machinery checked and
cleaned. Assume s 5 0.05 mil and use a 5 0.01. 6.8 9.0 7.2 8.2 7.4 4.8 7.8 5.7 9.0
a. Suppose the sample mean is x 5 4.014. Should the 7.6 6.2 7.5 6.8 5.7 6.4 11.6 8.0
process be stopped?
b. Suppose the sample mean is x 5 3.979. Should the a. Is there any evidence to suggest that the true population
process be stopped? mean time spent with each patient is less than eight
c. Comment on the statistical and practical differences minutes? Assume normality and use a 5 0.05.
between the decisions in parts (a) and (b). b. Find bounds on the p value associated with this
9.235 Physical Sciences CubeSats are tiny satellites that hypothesis test.
began as an educational experiment but are now included in 9.239 Business and Management According to a report
many commercial and government space launches. A CubeSat issued by the management consulting firm McKinsey &
is a 10-cm cube that weighs up to 1.33 kg. A random sample Company, 20% of all companies require genetic or family
of CubeSats set for launch in 2013 was obtained and the weight medical history information from employees or job applicants.
of each was measured. CUBESAT A random sample of 1500 companies was obtained, and 345
a. Each CubeSat has optimal launch characteristics if the said they require this information from employees or job
weight is close to 1 kg. Is there any evidence to suggest applicants.
that the mean weight of CubeSats scheduled for launch is =
a. Identify p0 and n, and compute p.
different from 1? Assume the distribution is normal, b. Check the nonskewness criterion.
s 5 0.1, and use a 5 0.05. c. Is there any evidence to suggest the true proportion of
b. Compute the p value for this hypothesis test. companies that require this information is greater than
9.236 Biology and Environmental Science A special water 0.20? Use a 5 0.01.
channel is being constructed, connected to the Campbell River d. Compute the p value associated with this hypothesis test.
in British Columbia, to provide more area for chinook salmon 9.240 Public Policy and Political Science A candidate for
spawning. The plans call for the channel to be 23 meters wide. district attorney claims that 75% of all residents in Elko
Following completion, a random sample of 18 locations along County, Nevada, favor granting police additional powers to tap
the channel was selected, and the width at each location was telephone lines. To check this claim, a random sample of 560
measured. The sample mean was x 5 24.6 meters. Assume the residents was selected, and 392 said they were in favor of more
distribution of channel widths is normal and s 5 2.28 meters. phone taps.
a. Is there any evidence to suggest the mean width of the a. Is there any evidence to suggest the true proportion of
channel is greater than 23 meters? Use a 5 0.01. residents who favor additional power to tap phones is less
b. Find the p value associated with this hypothesis test. than 0.75? Use a 5 0.01.
b. Compute the p value associated with this hypothesis test.
9.237 Manufacturing and Product Development A
piston in a particular 12-cylinder diesel engine is manufactured 9.241 Public Policy and Political Science The sequester—
to have diameter 13 mm. Any larger or smaller diameter will potentially harmful, automatic budget cuts—can affect jobs,
cause immediate, costly damage to the engine. Every half-hour, services, education, and public safety. In 2013, the U.S.
10 finished pistons are selected and the diameter of each (in Congress was unable to reach a compromise on the federal bud-
mm) is carefully measured. If there is any evidence that the get, and on March 1, the sequester began. According to a poll
mean diameter is different from 13 mm, the manufacturing conducted during the Summer, 22% of Americans said they had
process is stopped and the machinery inspected. The quality been significantly affected by resulting cuts.71 Early in the Fall,
control inspector uses a significance level of a 5 0.05. in a new survey, 55 of 310 randomly selected Americans said
a. Suppose the sample mean is x 5 12.89 and the sample they were significantly affected by the resulting cuts.
standard deviation is s 5 0.96. Should the manufacturing a. Is there any evidence to suggest that the proportion of
process be stopped? Americans affected by the sequester changed? Use
b. Suppose the sample mean is x 5 13.04 and the sample a 5 0.05.
standard deviation is s 5 0.045. Should the b. Find the p value associated with this hypothesis test.
manufacturing process be stopped?
c. If you were buying these pistons, would you like the 9.242 Public Health and Nutrition Many remember actor
manufacturer to use a smaller or a larger significance Michael J. Fox as conservative Republican Alex Keaton on the
level? Justify your answer. TV show Family Ties. After more than 10 years, he returned to
television during the Fall of 2013 as a news anchor diagnosed
9.238 Medicine and Clinical Studies According to a recent with Parkinson’s disease. The show reflects much of his
study, doctors are spending less time with each patient, approxi- frustrating experience. Most cases of Parkinson’s disease are
mately eight minutes each day with each patient.70 A random linked to environmental or genetic factors. It has been reported
sample of residents was obtained and the time spent with a that approximately 15% of people with Parkinson’s disease
458 Chapter 9 Hypothesis Tests Based on a Single Sample
have a family history of the disease.72 A random sample of 420 was obtained and the response time (in minutes) for each was
Parkinson’s patients was obtained, and 78 had a family history recorded. The data are given in the following table. DETROIT
of the disease. Is there any evidence to suggest that the
proportion of Parkinson’s patients with a family history of the 52 53 56 50 48 51 56 60 49 57 60 59 61
disease is different from 0.15? Use a 5 0.05. 59 60 44 57 61 66 54 55 59 53 54 62
9.243 Travel and Transportation In July 2013 a train
a. The chief of police claims that response time has
carrying 72 tank cars filled with oil exploded and rolled off the
improved. Is there any evidence to suggest that the mean
tracks in Lac-Mégantic, Quebec. This was one of the worst train
police response time has decreased? Assume normality
accidents in Canadian history, and the disaster prompted many
and use a 5 0.05.
to discuss improved regulations for the transportation of
b. Find bounds on the p value for this hypothesis test.
hazardous materials. The Association of American Railroads
reported that the safety record for moving hazardous materials
is outstanding, and that 74% of railroad crude-oil spills involve EXTENDED APPLICATIONS
less than five gallons.73 Over the next year, a random sample of 9.248 Physical Sciences Workers at the Daivik diamond
129 railroad crude-oil spills was obtained, and 85 involved mine in northern Canada extract approximately 1800 DMT
spills of less than five gallons. (dry metric tons) of ore per day, which is sifted and examined
a. Is there any evidence to suggest that the proportion of for diamonds. New machinery has just been installed that is
crude-oil spills that involve less than five gallons has designed to increase the amount of ore extracted per day. A
decreased? Use a 5 0.05. random sample of 36 days was selected, and the amount of ore
b. Explain the practical implications of the hypothesis test extracted each day was recorded. The sample mean was
results in part (a). x 5 1852 DMT. Assume s 5 202 DMT.
c. Find the p value associated with this hypothesis test. a. Is there any evidence to suggest the new machinery has
9.244 Medicine and Clinical Studies The diameter of a improved production? Use a 5 0.05.
virus is approximately 0.3 mm (micrometers). There is some b. What is the probability of a type II error if the true mean
speculation that new virus strains exhibit greater variability in amount of ore extracted has changed to 1875 DMT; that
diameter. A medical research lab obtained a random sample of is, find b ( 1875 ) ? Find the probability of a type II error if
15 new virus strains and measured the diameter (in mm) of the true mean is 1925 DMT.
each. The sample mean diameter was 0.323 and the sample 9.249 Biology and Environmental Science A random
variance was s2 5 0.0026. Is there any evidence to suggest the sample of days in July 2013 was obtained and the highest
true population variation in diameter of viruses has increased significant wave heights that occurred in July 2013 near
from 0.0015? Assume normality and use a 5 0.05. Southeast Queensland, Australia, was obtained. The data are
9.245 Manufacturing and Product Development The given in the following table (in meters).75 WAVES
radial shrinkage in paper birch wood from green to oven dry
is approximately 6.3%. A new process has been developed to 0.9 0.4 2.4 0.9 1.7 1.5 2.0
decrease the variability in shrinkage. In a random sample of 1.3 2.1 2.1 2.0 1.3 2.1
21 pieces of paper birch, the sample variance in shrinkage was
0.39. Is there any evidence to suggest the population variance a. Is there any evidence to suggest that the mean significant
in shrinkage is less than 0.50? Assume normality and use wave height is greater than 1 meter? Assume normality
a 5 0.10. and use a 5 0.01.
b. Find bounds on the p value for this hypothesis test.
9.246 Medicine and Clinical Studies The normal blood
platelet count for an adult ranges from 150,000 to 400,000, 9.250 Physical Sciences The design specifications for a new
with a standard deviation of approximately 62,500. A gymnasium at a local high school call for the lights to produce
researcher has speculated that increased exposure to pollutants at least 40 footcandles (a measure of brightness). Before the
has increased the variability in numbers of blood platelets. A new facility was opened to students, the contractor collected a
random sample of 37 adults was obtained, and the blood sample of 23 brightness measurements at random locations in
platelet count was measured in each person. The sample the gym. The sample mean was x 5 38.63 footcandles and the
standard deviation was s 5 65,268. Is there any evidence to sample standard deviation was s 5 5.6 footcandles.
suggest the population variance in blood platelet count has a. Is there any evidence to suggest the mean brightness in
increased? Assume normality and use a 5 0.001. the gym is less than the design specification? Assume
normality and use a 5 0.01.
9.247 Public Policy and Political Science With over
b. Find bounds on the p value associated with this
$18 billion in debt, in July 2013 the city of Detroit declared
hypothesis test.
bankruptcy. In many parts of the city there are few public
services, roads in need of repair, and approximately 40% of all 9.251 Business and Management The largest above-
streetlights do not work at night. In addition, the mean police ground storage tanks in Bayonne, New Jersey, have the capacity
response time is 58 minutes.74 A random sample of calls to police to hold 6,000,000 gallons of oil. For safety reasons, managers
Chapter 9 Exercises 459
prefer the mean amount stored at any given time to be no Summer 2013 was obtained and the capacity of the lake on each
greater than 4,500,000 gallons. A random sample of nine large day (in thousands of acre feet) was measured.76 WATER
tanks was selected, and the amount of oil stored in each (in a. Suppose the historic mean capacity during the summer is
gallons) was recorded. The summary statistics were 277 thousand acre feet. Is there any evidence to suggest
x 5 4,675,250 and s 5 482,556. that the true population mean capacity has decreased?
a. Is there any evidence to suggest the mean amount of oil Assume normality and use a 5 0.05.
stored in the large tanks is above the safety level? Assume b. Find the p value associated with this hypothesis test.
normality and use a 5 0.025.
9.255 Manufacturing and Product Development The
b. Find bounds on the p value associated with this
Liberty Stone Vista standard unit is manufactured to weigh 70
hypothesis test.
pounds.77 This double-sided stone is part of a wall system for
9.252 Public Health and Nutrition Cast iron is an extremely residential and commercial retaining walls. Large variability in
durable cookware material, good for searing and blackening the weight of these stones could cause structural deficiencies in
foods. However, a cast-iron pan, for example, can be full of a wall. A random sample of 17 standard units was obtained and
bacteria and can lend unwanted flavors to food. A company the sample variance was 7.75 pounds2.
trying to promote alternative ceramic cookware asked members a. Is there any evidence to suggest that the population
of a community to bring their favorite cast-iron cookware to variance is greater than 4 pounds2? Assume normality and
their store for bacteria testing. The company manager claimed use a 5 0.05.
that at least 60% of all cast-iron cookware contains harmful b. Find bounds on the p value for this hypothesis test.
bacteria. A random sample of 120 pans was selected, and 57
9.256 Business and Management According to Catalyst, a
were found to contain harmful bacteria.
nonprofit organization working to provide more opportunities
a. Is there any evidence to refute the manager’s claim? Use
for women and business, women currently hold 4.6% of
a 5 0.01.
Fortune 1000 CEO positions.78 Some economic researchers
b. Find the p value associated with this hypothesis test.
believe that women own a larger proportion of small busi-
c. Do you believe the sample is really random? Why or why
nesses. A random sample of 550 small businesses in the United
not?
States was obtained, and 37 were owned by women.
9.253 Marketing and Consumer Behavior An assisted- a. Is there any evidence to suggest that the proportion of
living home is an alternative to a nursing home and a bridge small businesses owned by women is greater than 4.6%?
between a skilled-care facility and a patient’s residence. A Use a 5 0.01.
recent report indicated 92% of all patients in assisted-living c. Find the p value for this hypothesis test.
homes are satisfied with the facility and the care. An insurance
company believes that this percentage is actually much lower last Step
(due to health-care violations and strict for-profit motives).
9.257 Is it OK to pad an insurance claim? The
A random sample of 5000 patients in assisted-living homes
results of a recent survey revealed that 24% of Americans
around the country was obtained, and 4576 said they were
believe that it is acceptable to increase an insurance claim by a
satisfied with the facility and the care.
small amount to make up for deductibles that they are required
a. Is there any evidence to suggest the true proportion of
to pay. Suppose Liberty Mutual Insurance Company has hired
assisted-living patients who are satisfied is less than 0.92?
an independent agency to conduct a survey of its automobile
Use a 5 0.01.
policy holders. Liberty Mutual plans to increase all policies by
b. Find the p value associated with this hypothesis test.
a fixed rate if there is evidence to suggest that more than 24%
c. Carefully sketch a graph illustrating the critical value,
of policy holders view claim padding as acceptable. Of the 355
value of the test statistic, and p value.
policy holders contacted, 96 said they view claim padding as
9.254 Biology and Environmental Science The Castaic acceptable. Is there any evidence to suggest that the proportion
Lake reservoir provides water for the northern portion of the of policy holders that view claim padding as acceptable is
Greater Los Angeles Area. A random sample of days during greater than 0.24? Use a 5 0.05.
10 Confidence Intervals and
Hypothesis Tests Based on
Two Samples or Treatments
Looking Back
n Recall the formal, four-part hypothesis test process.
n Remember the specific inference procedures concerning a single population parameter: m,
p, or s2 .
Looking Forward
n Adapt and extend the single-sample hypothesis test procedures.
n Construct confidence intervals to estimate the difference between two population parameters.
n Conduct hypothesis tests to compare two population parameters.
Contents
10.1 Comparing Two Population Means Using Independent Samples When Population
Variances Are Known
10.2 Comparing Two Population Means Using Independent Samples from Normal
Populations
10.3 Paired Data
10.4 Comparing Two Population Proportions Using Large Samples
10.5 Comparing Two Population Variances or Standard Deviations
Katja Kreder/AWL Images/Getty Images
461
462 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
Notation
To conduct a hypothesis test to compare two (similar) population parameters, we will sim-
ply modify the single-sample procedures presented in the previous chapter. Perhaps the
most tricky aspect of these procedures is the notation. The following table summarizes the
notation used to represent similar parameters associated with two different populations.
Population parameters
Standard
Mean Variance deviation Proportion
Population 1 m1 s21 s1 p1
Population 2 m2 s22 s2 p2
The following table summarizes the notation used to represent values of summary statis-
tics associated with samples from two different populations.
Note: We do not necessarily use Sample statistics
every summary statistic associated
Sample Standard
with a sample in every problem.
size Mean Variance deviation Proportion
For example, we may only need
the sample size and proportion in Sample from =
n1 x1 s21 s1 p1
one case, but use the sample size, population 1
mean, and standard deviation in Sample from =
another problem. population 2 n2 x2 s22 s2 p2
To compare two population parameters to see whether there is any evidence that they
are different, we often consider a difference. For example, to compare two population
means, m1 and m2, we consider the difference m1 2 m2. In searching for evidence that p1
is larger than p2, we look at the difference p1 2 p2.
There are two reasons to consider a difference.
1. A typical relationship between two population parameters can be written in terms of a
difference. For example, suppose we need to compare the means from two popula-
tions, m1 and m2.
Standard Difference
notation notation
m1 5 m2 is equivalent to m1 2 m2 5 0
m1 . m2 is equivalent to m1 2 m2 . 0
m1 , m2 is equivalent to m1 2 m2 , 0
Definition
1. Two samples are independent if the process of selecting individuals or objects in
sample 1 has no effect on, or no relation to, the selection of individuals or objects in
sample 2. If the samples are not independent, they are dependent.
Similar means the individuals or 2. A paired data set is the result of matching each individual or object in sample 1 with
objects share some common, a similar individual or object in sample 2. A common experiment in which paired data
fundamental characteristic. They are obtained involves a before and after measurement on each individual or object.
may even be the same individual Each before observation is matched, or paired, with an after observation.
or object!
The notation, the idea of using differences, and the extra assumptions are all used in
the following sections to construct hypothesis tests for comparing various characteristics
of two populations.
Properties of X 1 2 X 2
Suppose
1. X 1 is the mean of a random sample of size n1 from a population with mean m1 and
variance s21.
2. X2 is the mean of a random sample of size n2 from a population with mean m2 and
variance s22.
3. The samples are independent.
If the distributions of both populations are normal, then the random variable X 1 2 X 2 has
the following properties.
1. E (X1 2 X2 ) 5 mX12X 2 5 m1 2 m2 .
X 1 2 X 2 is an unbiased estimator of the parameter m1 2 m2. The distribution is cen-
tered at m1 2 m2.
s21 s22
2. Var ( X1 2 X2 ) 5 sX212X2 5 1 and the standard deviation is
n1 n2
s21 s22
sX12X 2 5 1 .
Å n1 n2
3. The distribution of X 1 2 X 2 is normal.
Can you see the standardization If the underlying distributions are not known, but both n1 and n2 are large, then X 1 2 X 2
coming? is approximately normal (by the central limit theorem).
A hypothesis test concerning two population means, in terms of the difference in means
m1 2 m2, with significance level a, has the form
This is the template for a H0: m1 2 m2 5 D0
hypothesis test concerning two Ha: m1 2 m2 . D0, m1 2 m2 , D0, or m1 2 m2 2 D0
population means when variances
( X1 2 X2 ) 2 D0
are known, sometimes called a TS: Z 5
two-sample Z test. s21 s22
1
Å n1 n2
RR: Z $ za, Z # 2za, or 0 Z 0 $ za/2
A Closer L ok
D is the uppercase Greek letter 1. The value D0 is the fixed, hypothesized difference in means. Usually D0 5 0, that is,
delta. the means are assumed equal. The null hypothesis is then H0: m1 2 m2 5 0, which is
equivalent to H0: m1 5 m2. However, D0 may be some nonzero value. For example, two
population means may historically differ by 12 so that H0: m1 2 m2 5 12 ( 5 D0 ) . We
may want to conduct a test to see whether there is any change in this difference, with
Ha: m1 2 m2 2 12.
2. Just a reminder: Use only one (appropriate) alternative hypothesis and the correspond-
ing rejection region. The z critical values are from the standard normal distribution.
3. This hypothesis test procedure can be used only if both population variances are
known. If they are unknown but both sample sizes are large, some statisticians substi-
tute s21 for s21 and s22 for s22. This produces an approximate test statistic. Section 10.2
presents an exact test procedure for comparing population means (under certain
assumptions) when the population variances are unknown.
Is there any evidence to suggest that the mean weekly time spent watching TV for 18–24
year olds is less that the mean weekly time spent watching TV for 25–34 year olds? Use
SOLUTION TRAIL
10.1 Comparing Two Population Means Using Independent Samples When Population Variances Are Known 465
Solution Trail 10.1 a 5 0.01 and assume that each underlying distribution of weekly time spent watching TV
is normal.
K e ywor ds
n Is there any evidence? Solution
n Less than Step 1 Arbitrarily, let the 18–24-year-old group be population 1, and the 25–34-year-
n Known variances old group be population 2.
n Independent random samples The current state, or assumption, is that the two population mean weekly times
n Each underlying distribution is spent watching TV are equal:
normal
m1 5 m2 1 m1 2 m2 5 0 ( 5 D0 ) .
T ra nsl ati o n
The sample sizes, sample means, and population variances are given.
n Conduct a one-sided,
left-tailed test to compare m1 We are trying to find evidence that the 18–24-year-old group has smaller mean
and m2 . weekly time spent watching TV: m1 , m2, which is the same as m1 2 m2 , 0.
Therefore, the alternative hypothesis is one-sided, left-tailed.
C onc epts
Step 2 The four parts of the hypothesis test are
n Hypothesis test concerning
two population means when H0: m1 2 m2 5 0
variances are known Ha: m1 2 m2 , 0
V isi on ( X1 2 X2 ) 2 0
TS: Z 5
Use the template for this s21 s22
hypothesis test. The samples are 1
Å n1 n2
random and independent, the
underlying populations are RR: Z # 2za 5 2z0.01 5 22.3263
normal, and the population Step 3 The value of the test statistic is
variances are known. Use a
one-sided alternative hypothesis ( x1 2 x2 ) 2 0 23.4 2 28.9
and the corresponding rejection z5 5 5 22.4055 ( #22.3263 )
s21 s22 44.89 65.61
region, find the value of the test 1 1
statistic, and draw a conclusion. Å n1 n2 Å 18 24
Step 4 Because 22.4055 lies in the rejection region, we reject the null hypothesis at the
a 5 0.01 significance level. There is evidence to suggest that the mean weekly
time spent watching TV for 18–24-year-olds is less than the mean weekly time
spent watching TV for 25–34-year-olds.
The p value for this hypothesis test is
−2.4055 0
Figure 10.1 p-Value illustration:
p 5 P(Z # 22.4055)
5 0.0081 # 0.05 5 a
The following example involves a hypothesis test with a nonzero value for the hypoth-
esized difference in means, D0.
SOLUTION TRAIL
Example 10.2 Low-Carb Ice Cream
Data Set Low-carbohydrate foods are very popular as many Americans try to avoid this sugar and
LOWCARB starch combination that many believe causes weight gain. An advertisement for a low-
carb ice cream claims that the product has 16 fewer grams of carbohydrates per serving
than the leading store brand. To check this claim, independent random samples of each
Solution Trail 10.2 type of ice cream were obtained, and the amount of carbohydrates in each serving was
measured. The data are given (in grams) in the following table.
Ke y wor ds
n Is there any evidence? Store brand (l)
n difference in population
means 15.4 20.4 21.0 24.3 23.3 18.7 19.8 22.5 18.9 22.8
n 16 grams 25.4 25.1 20.3 24.1 16.6 22.6 22.1 19.4 16.6 24.4
n the variance is known 17.8 18.6 14.9 24.6 19.1 17.9 18.7 20.1 26.3 18.4
n independent random samples 21.8 17.1 21.5 19.6 22.9 22.2 21.5 18.3
H0: m1 2 m2 5 16
Ha: m1 2 m2 2 16
( X1 2 X2 ) 2 16
TS: Z 5
s21 s22
1
Å n1 n2
RR: 0 Z 0 $ za/2 5 z0.005 5 2.5758
Step 3 The sample means are
1
x1 5 ( 15.4 1 20.4 1 c1 18.3 ) 5 20.6579
38
1
x2 5 ( 3.7 1 3.9 1 c1 3.9 ) 5 3.8486
35
The value of the test statistic is
( x1 2 x2 ) 2 16 ( 20.6579 2 3.8486 ) 2 16
z5 5 5 1.6842
s21 s22 8.5 0.253
1 1
Å n1 n2 Å 38 35
Step 4 The value of the test statistic, z 5 1.6842, does not lie in the rejection region.
We do not reject the null hypothesis. There is no evidence to suggest that the
difference in population mean carbohydrates is different from 16 grams at the
a 5 0.01 significance level.
This is a two-sided test and the value of the test statistic is positive, so p/2 is a
Z
right-tail probability.
p/2 5 P ( Z $ 1.6842 ) Definition of p value for a two-sided test.
−1.6842 0 1.6842
p 5 2 ( 0.0461 ) 5 0.0922 Solve for p.
Figure 10.5 p-Value illustration: Because p 5 0.0922 . 0.01 ( 5 a ) , we do not reject the null hypothesis. See
p 5 2P(Z $ 1.6842) figure 10.5.
5 0.0922 . 0.01 5 a Figure 10.6 shows a technology solution.
Given the two-sample Z test assumptions and the properties of the random variable
X1 2 X2 , we can construct a confidence interval (CI) for the (difference) parameter
m1 2 m2.
468 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
As usual, to find a general CI, start with an appropriate symmetric interval about 0
s
such that the probability Z lies in this interval is 1 2 a.
( X1 2 X2 ) 2 ( m1 2 m2 )
P £ 2za/2 , , za/2 § 5 1 2 a (10.1)
s21 s22
1
Å n1 n2
Z
Manipulate the inequality in Equation 10.1 to sandwich the parameter m1 2 m2. We
obtain the following probability statement:
s
How to Find a 100(1 2 a)% Confidence Interval
for m1 2 m2 When Variances Are Known
Given the two-sample Z test assumptions, a 100 ( 1 2 a ) % confidence interval for
m1 2 m2 has as endpoints the values
s21 s22
( x1 2 x2 ) 6 za/2 1 (10.2)
Å n1 n2
Find a 95% confidence interval for the difference in population mean pizza-stone weights.
Solution
Step 1 Sample sizes, sample means, and known variances are given.
The underlying weight distributions are unknown, but the sample sizes are both
large ( $ 30 ) .
1 2 a 5 0.95 1 a 5 0.05 1 a /2 5 0.025 Find a/2.
s21 s22
( x1 2 x2 ) 6 za/2 1 Equation 10.2.
Å n1 n2
10.1 Comparing Two Population Means Using Independent Samples When Population Variances Are Known 469
2.1 3.5
5 ( 6.21 2 7.08 ) 6 ( 1.96 ) 1 Use summary statistics and critical value.
Å 35 31
5 20.87 6 0.8150 Simplify.
Technology Corner
Procedure: Hypothesis tests and confidence intervals concerning two population means when the population variances
are known.
Reconsider: Example 10.2, solution, and interpretations.
Crunchlt!
Use the function z 2-Sample to conduct a hypothesis test concerning two population means.
1. Enter the store-brand data into column Var1 and the low-carb brand data into column Var2.
2. Select Statistics; z; 2-Sample. Under the Columns tab, select Var1 for Sample 1 and Var2 for Sample 2. Enter the
standard deviation for each group.
3. Under the Hypothesis Test tab, enter the Difference of means under null hypothesis, 16 ( 5 D0 ) . Choose the appropriate
Alternative (hypothesis).
4. Click Calculate. The results are shown in Figure 10.6.
TI-84 Plus C
Use the calculator functions 2-SampZTest and 2-SampZInt. Input is either summary statistics or data in lists.
1. Enter the store brand data into list L1 and the low-carb brand data into list L2.
2. Subtract D0 5 16 from each observation in list L1 and store the results in list L1.
3. Select STAT ; TESTS; 2-SampZTest. Highlight Data. Enter s1 , s2 , List1, and List2. Set each frequency to 1.
Highlight the alternative hypothesis. See Figure 10.9.
4. Highlight Calculate and press ENTER . The results are displayed on the Home screen. See Figure 10.10. The Draw
results are shown in Figure 10.11.
470 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
Minitab
There is no built-in function to conduct hypothesis tests and construct confidence intervals concerning two population
means when the population variances are known. Remember, this is an instructive situation, not very realistic. It is pretty
unlikely we would know the population variances, but not the population means.
Excel
Use the built-in function z-test: Two Sample for Means.
1. Enter the store brand data into column A and the low-carb brand data into column B.
2. Under the Data tab, select Data Analysis; z-test: Two Sample for Means.
3. Enter the Variable 1 Range and Variable 2 Range, the Hypothesized Mean Difference, and the known population
variances. Enter the value for Alpha and choose an Output option. See Figure 10.14. Click OK.
4. Summary statistics along with the value of the test statistic, critical values, and p values are displayed. See Figure 10.15.
Figure 10.14 z-Test: Two Sample for Means input Figure 10.15 Hypothesis test
screen. results.
10.1 Comparing Two Population Means Using Independent Samples When Population Variances Are Known 471
each toothbrush was measured, and the summary statistics are Sample Sample Population
given in the following table. State size mean variance
Is there any evidence to suggest that the mean amount of pro- a. Assume the underlying distributions are normal. Is there
tein is different in these two products? Use a 5 0.01 and any evidence to suggest that the population mean
assume normality. airspeeds are different? Use a 5 0.05.
b. Find the p value associated with this hypothesis test.
10.21 Manufacturing and Product Development Several
factors determine how well a ceiling fan cools a room, includ- 10.24 Medicine and Clinical Studies The time it takes for
ing blade pitch, height from the ceiling, and revolutions per general anesthesia to work (time to induction) is an important
minute. Independent random samples of two types of ceiling consideration during an emergency and for scheduled surgeries.
fans were obtained, and the revolutions per minute (on high) for Recently, a study was conducted to compare the mean induction
each was measured. The summary statistics and known vari- time of similar drugs administered via inhalation and intrave-
ances are given in the following table. nously. Independent random samples of patients requiring
general anesthesia were obtained, and the induction times (in
Sample Sample Population minutes) were measured. Assume the variance in induction time
Ceiling fan size mean variance for inhalation administration is 0.0625 and for intravenous
Hampton 34 295.05 11.55 administration is 0.1225. Is there any evidence to suggest that
Altura 35 300.38 6.25 the mean time to induction for intravenous administration is
less than the mean time to induction for inhalation administra-
a. Find a 99% confidence interval for the true mean tion? Use a 5 0.05. anesth
a. Is there any evidence to suggest that the population mean lorida still has the most flood insurance policies, followed by
F
output of elements from The Repair Clinic is different Texas, then Louisiana. A random sample of flood insurance
from 7.4? policies in these three states was obtained, and the premium in
b. Is there any evidence to suggest that the population dollars for each was recorded. The summary statistics are given
mean output of elements from The Parts Pros is different in the following table.4
from 7.4?
c. Is there any evidence to suggest that m1 is different from Sample Sample Population
m2? State size mean standard deviation
d. Using the results from parts (a), (b), and (c), which Louisiana 18 716.15 250
supplier should the manufacturer use? Florida 22 498.71 275
10.27 Marketing and Consumer Behavior The Press Texas 26 560.50 300
Association Mediapoint recently released a housing market
report for England that included how long it would take a Assume the underlying populations are normal. Is there any
typical first-time buyer to save for a deposit in their local area.3 evidence to suggest that any pairs of population mean flood
Independent random samples of first-time buyers were obtained insurance premiums are different? That is, conduct three sepa-
in two areas, and the time (in years) needed for a couple to save rate hypothesis tests to consider m1 2 m2, m1 2 m3, and
for the deposit was recorded for each. The summary statistics m2 2 m3. Use a 5 0.05 in each case.
are given in the following table.
Challenge
Sample Sample Population
Area size mean variance 10.29 Sample Size Calculation Suppose a 100 ( 1 2 a ) %
confidence interval is needed for the difference in two popula-
Yorkshire & The Humer 60 4.5 1.56
tion means, m1 2 m2. In addition, suppose the underlying popu-
East Midlands 75 4.8 3.24
lations are normal, the population variances, s21 and s22, are
known, and the samples sizes are equal, n1 5 n2 5 n.
a. Is there any evidence to suggest that the mean time
a. Find an expression for the sample size necessary (from
needed for a couple to save for a deposit is different in
each population) in order for the resulting confidence
these two areas? Use a 5 0.05.
interval to have a bound on the error of estimation B
b. Find the p value associated with this hypothesis test.
(half the width of the confidence interval).
c. The sample means, 4.5 and 4.8, seem close together. Can
b. How large a sample size is necessary if s1 5 12.7,
you find a value n 5 n1 5 n2 such that the hypothesis test
s2 5 9.5, B 5 5, and the confidence level is 95%?
is significant at the a 5 0.05 level?
c. Use the sample size in part (b) with x1 5 57.3 and
10.28 Flood Insurance According to the U.S. National x2 5 48.6 to construct a 95% confidence interval for
Flood Insurance Program, the cost of policies in New Orleans m1 2 m2. Compute the exact bound on the error of
continued to rise for several years after Hurricane Katrina. estimation. How does this compare with B 5 5?
For reference, these are the 1. X1 is the mean of a random sample of size n1 from a normal population with mean m1.
two-sample t test assumptions. 2. X2 is the mean of a random sample of size n2 from a normal population with mean m2.
The last assumption is new and implies we are comparing populations with the same
variability. If we do not assume equal variances, there is no nice test procedure. More on
this later.
Properties of X1 2 X2
If the two-sample t test assumptions are true, then the estimator X1 2 X2 has the following
properties.
1. E ( X1 2 X2 ) 5 m X1 2X2 5 m1 2 m2
X1 2 X2 is still an unbiased estimator of the parameter m1 2 m2.
s21 s22 s2 s2 1 1
2. Var ( X1 2 X2 ) 5 s2X1 2 X 2 5 1 5 1 5 s2 a 1 b and the standard
n1 n2 n1 n2 n1 n2
1 1
deviation is sX 2X 5 s2 a 1 b.
1 2
Å n1 n2
3. Both underlying populations are normal, so the distribution of X1 2 X2 is also normal.
In the previous section, we used the known population variances, standardized, and
constructed a test based on the Z distribution. Here, an estimate of the common variance
s2 is necessary. The appropriate standardization results in a t distribution.
S 21 and S 22 are separate estimators for the common variance, but using only one of these
means ignoring additional, useful information. Because s2 is the variance for both under-
lying populations, an estimator for this common variance should depend on both samples.
However, it also seems reasonable for the estimator to rely more on the larger sample.
Therefore an estimate of the common variance uses both S12 and S22 in a weighted average.
Definition
The pooled estimator for the common variance s2, denoted Sp2, is
( n1 2 1 ) S 21 1 ( n2 2 1 ) S 22
S 2p 5 (10.3)
n1 1 n2 2 2
n1 2 1 n2 2 1
5a bS 21 1 a bS 22
n1 1 n2 2 2 n1 1 n2 2 2
A Closer L ok
l is the lowercase Greek letter 1. S 2p is indeed a weighted average. This estimator can be written in the form
lambda and represents a constant.
Sp2 5 lS 21 1 ( 1 2 l ) S 22 where 0#l#1
If n1 5 n2, then l 5 12 and Sp2 5 12S12 1 12S 22. If n1 2 n2, then more weight is given
to the larger sample.
2. The constants in Equation 10.3 are related to the number of degrees of freedom. S 21
contributes n1 2 1 degrees of freedom and S 22 contributes n2 2 1 degrees of freedom.
Consequently, there are a total of ( n1 2 1 ) 1 ( n2 2 1 ) 5 n1 1 n2 2 2 degrees of
freedom associated with the estimator S 2p .
Theorem
If the two-sample t test assumptions are true, then the random variable
( X1 2 X2 ) 2 ( m1 2 m2 )
T5
1 1
S 2p a 1 b
Å n1 n2
has a t distribution with n1 1 n2 2 2 degrees of freedom.
As in a two-sample Z test, the null and alternative hypotheses are stated in terms of the
difference m1 2 m2. The critical values are from the appropriate t distribution.
a. Is there any evidence to suggest that there is a difference in the population mean wait-
ing time for a knee replacement between Abbotsford Regional Hospital and Earl Ridge
Hospital? Use a 5 0.05 and assume the underlying distributions are normal, with
equal variances.
b. Find bounds on the p value associated with this hypothesis test.
Solution
Step 1 Let Abbotsford Regional Hospital be population 1 and Earl Ridge Hospital be
population 2.
The null hypothesis is that the two population means are equal—with the same
waiting time for a knee replacement: m1 5 m2 1 m1 2 m2 5 0 ( 5 D0 ) .
SOLUTION TRAIL
10.2 Comparing Two Population Means Using Independent Samples from Normal Populations 477
Solution Trail 10.4 The summary statistics are given, and the samples were obtained independently.
The population variances are unknown but assumed equal. A two-sample t test is
K e ywor ds appropriate.
n Is there any evidence? We are looking for any difference in population means, so this is a two-sided test.
n Difference in population mean
Step 2 The four parts of the hypothesis test are
n Underlying distributions are
normal, with equal variances H0: m1 2 m2 5 0
n Independent random samples Ha: m1 2 m2 2 0
( X1 2 X2 ) 2 0
T ra nsl ati o n TS: T 5
Conduct a two-sided test to 1 1
n
S 2p a 1 b
compare m2 and m2 Å n1 n2
n Variances are unknown but RR: 0 T 0 $ ta/2,n11n222 5 t0.025,30 5 2.0423
assumed equal
Step 3 The pooled estimate of the common population variance is
C onc epts ( n1 2 1 ) s21 1 ( n2 2 1 ) s22 ( 14 )( 34.81 ) 1 ( 16 ) ( 46.24 )
n Hypothesis test concerning s2p 5 5 5 40.906
n1 1 n2 2 2 30
two population means when
variances are unknown but The value of the test statistic is
equal
( x1 2 x2 ) 2 0 17.4 2 12.1
V isi on
t5 5 5 2.3393 ( $ 2.0423 )
1 1 1 1
Use the template for this s2p a 1 b ( 40.906 ) a 1 b
hypothesis test. The samples are
Å n1 n2 Å 15 17
random and independent, the The value of the test statistic, t 5 2.3393, lies in the rejection region, hence we
underlying distributions are reject the null hypothesis at the a 5 0.05 significance level. There is evidence to
normal, and the population suggest the mean waiting time for a knee replacement is different at Abbotsford
variances are unknown but
Regional Hospital and Earl Ridge Hospital.
assumed equal. Use the
two-sided alternative hypothesis Step 4 Recall, because of the nature of the table of critical values for t distributions, we
and the corresponding rejection can only bound the p value.
region.
0 t 0 5 0 2.3393 0 5 2.3393
In Table V in the Appendix, row n1 1 n2 2 2 5 15 1 17 2 2 5 30, place
t30 2.3393 in the ordered list of critical values.
2.0423 # 2.3393 # 2.4573
t0.025,30 # 2.3444 # t0.01,30
Therefore, 0.01 # p/2 # 0.025
And 0.02 # p # 0.05
−2.3393 0 2.3393
See Figure 10.16.
Figure 10.16 p Value illustration:
p 5 2P(T $ 2.3393) Figures 10.17 through 10.19 together show a technology solution.
5 0.0262 # 0.05 5 a
VIDEO Tech
Video TECH Manuals
MANUALS
EXEL sample
Two DISCRIPTIVE
mean
inference - t -
summarized data
T ransl at i on Is there any evidence that the new-process aluminum cans have a smaller population
n Conduct a one-sided test to mean weight? Assume the populations are normal, with equal variances, and use a 5 0.01.
compare m1 and m2
n Variances are unknown Solution
but equal Step 1 The null hypothesis is that the two mean weights are the same: m1 2 m2 5 0. We
are looking for evidence that the new-process cans have a smaller mean weight.
Concepts
The alternative hypothesis is m1 2 m2 . 0.
n Hypothesis test concerning
two population means when The underlying populations are assumed normal with equal variances, and the
variances are unknown but samples were obtained independently. A two-sample t test is relevant.
equal Step 2 The four parts of the hypothesis test are
Vi sion H0: m1 2 m2 5 0
Use the template for this Ha: m1 2 m2 . 0
hypothesis test. The samples are ( X1 2 X2 ) 2 0
random and independent, the TS: T 5
underlying distributions are 1 1
normal, and the population S 2p a 1 b
Å n1 n2
variances are unknown but RR: T $ ta,n11n222 5 t0.01,40 5 2.4233
assumed equal. Use a one-sided
alternative hypothesis and the Step 3 The summary statistics are
corresponding rejection region.
1
x1 5 ( 0.52 1 0.49 1 c1 0.51 ) 5 0.5048
21
1
x2 5 ( 0.51 1 0.51 1 c1 0.48 ) 5 0.4886
21
1 1
s21 5 c 5.3580 2 ( 10.6 ) 2 d 5 0.0003762
20 21
1 1
s22 5 c 5.0216 2 ( 10.26 ) 2 d 5 0.0004429
20 21
The pooled estimate of the common population variances is
( 20 )( 0.0003762 ) 1 ( 20 )( 0.0004429 )
s2p 5 5 0.0004095
40
The value of the test statistic is
( x1 2 x2 ) 2 0 0.5048 2 0.4886
t5 5 5 2.5941
1 1 1 1
s2p a 1 b ( 0.0004095 ) a 1 b
Å n1 n2 Å 21 21
10.2 Comparing Two Population Means Using Independent Samples from Normal Populations 479
Step 4 The value of the test statistic lies in the rejection region (t 5 2.5941 $ 2.4233;
p 5 0.0066 # .01, see Figure 10.20). We reject the null hypothesis at the
a 5 0.01 significance level. There is evidence to suggest that new-process alu-
minum cans have a smaller mean weight.
t40
Figure 10.21 Minitab hypothesis test (and confidence Figure 10.22 JMP two-sample t test.
interval) results.
This methodology has been used Using the assumptions presented in this section and the technique presented in Section
several times, beginning in 10.1, a confidence interval for m1 2 m2 can be derived. Start with a symmetric interval
Chapter 8. about 0 such that the probability T lies in this interval is 1 2 a. Manipulate the inequal-
ity to sandwich the parameter m1 2 m2.
1 1
( x1 2 x2 ) 6 ta/2,n1 1n222 s2p a 1 b (10.4)
Å n1 n2
Assume the populations are normal and the variances are equal. Find a 99% confidence
interval for the difference in population mean iron content.6
Solution
Step 1 The summary statistics are given, the underlying distributions are assumed normal,
and the population variances are assumed equal. Equation 10.4 can be used to
construct a confidence interval for the difference m1 2 m2.
1 2 a 5 0.99 1 a 5 0.01 1 a /2 5 0.005 Find a/2.
1 1
( x1 2 x2 ) 6 ta/2 # s2p a 1 b Equation 10.4.
Å n1 n2
1 1
5 ( 23.17 2 24.19 ) 6 ( 2.7874 ) ( 14.7257 ) a 1 b
Å 12 15
Use summary statistics and critical values.
( 25.1627, 3.1227 ) is a 99% confidence interval for the difference (in mg) in
population mean iron content, m1 2 m2. Note that because 0 is included in, or
captured by, this interval, there is no evidence to suggest the mean iron content
is different.
Figures 10.23 and 10.24 together show a technology solution.
different, as long as the underlying populations are normal and n1 5 n2, the results are
still very reliable.
If the underlying populations are normal, the population variances are unequal, and
Nice means a reasonable the sample sizes are different, there is no nice test procedure concerning m1 2 m2 (or
standardization to produce a confidence interval for m1 2 m2 ). It is reasonable to use each sample variance as an
common random variable. approximation for the corresponding population variance. However, the resulting log-
ical standardization produces only an approximate test statistic. If the sample sizes are
small and the underlying populations are not normal, then a nonparametric test must
be used.
s21 s22
( x1 2 x2 ) 6 ta/2,v 1 (10.5)
Å n1 n2
A Closer L ok
1. The random variable Tr has an approximate t distribution with v degrees of freedom.
2. It is likely that the value of v will not be an integer. To be conservative, always round
down (to the nearest integer).
3. A test for equality of population variances is presented in Section 10.5. This hypothe-
sis test is often used to determine whether equal population variances is a reasonable
assumption.
482 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
T ransl at i on Is there any evidence to suggest that the population mean weight of $100 chips is different
n Conduct a two-sided test to from that of $500 chips? Assume both populations are normal, and use a 5 0.05.
compare m1 and m2
n Variances are unknown and Solution
assumed unequal Step 1 The null hypothesis is that the two mean weights are the same, and the alterna-
tive is two-sided. The underlying populations are assumed normal and the sam-
Concepts
ples were obtained independently. However, there is no assumption of equal
n Hypothesis test concerning variances. The approximate two-sample t test is appropriate.
two population means when
variances are unknown and Step 2 The summary statistics are
unequal 1
x1 5 ( 9.17 1 9.21 1 c1 9.11 ) 5 9.2043
Vi sion 14
1
Use the template for this x2 5 ( 9.37 1 9.98 1 c1 9.69 ) 5 9.4322
hypothesis test. The samples are 9
random and independent, the 1 1
underlying distributions are s21 5 c 1186.1804 2 ( 128.86 ) 2 d 5 0.008934
13 14
normal, and the population
1 1
variances are unknown and s22 5 c 802.1295 2 ( 84.89 ) 2 d 5 0.1785
unequal. Use the two-sided 8 9
alternative hypothesis and the The approximate number of degrees of freedom are
corresponding rejection region,
the test statistic, and draw a 0.008934 0.1785 2
conclusion. a 1 b
14 9
v< 5 8.5177
( 0.008934/14 ) 2 ( 0.1785/9 ) 2
1
13 8
We round v down to 8.
Step 3 The four parts of the hypothesis test are
H0: m1 2 m2 5 0
Ha: m1 2 m2 2 0
( X1 2 X2 ) 2 0
TS: Tr 5
S 21 S 22
1
Å n1 n2
RR: 0 Tr 0 $ ta/2,v 5 t0.025,8 5 2.3060
Step 4 The value of the test statistic is
( x1 2 x2 ) 2 0 9.2043 2 9.4322
tr 5 5 5 21.5928
s21 s22 0.008934 0.1785
1 1
Å n1 n2 Å 14 9
The value of the test statistic does not lie in the rejection region. Equivalently,
p 5 0.1475 . 0.05, illustrated in Figure 10.25. We do not reject the null hypoth-
esis at the a 5 0.05 significance level. There is no evidence to suggest that the
mean weight of $100 chips is different from the mean weight of $500 chips.
Figure 10.26 shows a technology solution.
10.2 Comparing Two Population Means Using Independent Samples from Normal Populations 483
t8
−1.5928 0 1.5928
Figure 10.25 p Value illustration: Figure 10.26 Excel hypothesis test results.
p 5 2P(T $ 1.5928)
5 0.1475 . 0.05 5 a
Technology Corner
Procedure: H
ypothesis tests and confidence intervals concerning two population means when the population variances
are unknown.
Reconsider: Example 10.5, solution, and interpretations.
Crunchlt!
Use the built-in function t 2-Sample. Input is either summary statistics or data in columns.
1. Enter the old process data into column Var1 and the new process data into column Var2.
2. Select Statistics; t; 2-sample. Using the pull-down menus, select Var1 for Sample 1 and Var2 for Sample 2. Check the
Pooled Variance box.
3. Under the Hypothesis tab, enter the Difference of means under null hypothesis (0) and select the appropriate Alternative
(Greater than). See Figure 10.27.
4. Click Calculate. The results are displayed in a new window (Figure 10.28).
5. Use the Confidence Interval tab to construct a confidence interval for the difference of two population means.
TI-84 Plus C
Use the built-in functions 2-SampTTest and 2-SampTInt. Input is either summary statistics or data in lists.
1. Enter the old process data into list L1 and the new process data into list L2.
2. Select STAT ; TESTS; 2-SampTTest. Highlight Data. Enter List1, List2, and set each frequency to 1.
Highlight the alternative hypothesis and Yes for Pooled. See Figure 10.29.
3. Highlight Calculate and press ENTER . The results are displayed on the Home screen. See Figure 10.30.
4. The Draw results are shown in Figure 10.31.
5. Use the function STAT ; TESTS; 2-SampTInt to construct a confidence interval for the difference of two
population means.
Minitab
Use the built-in function 2-Sample t to conduct a hypothesis test and to construct a confidence interval. Input is
either data in one or two columns (data in one column requires a subscript, or group-identifying, column) or
summarized data.
1. Enter the old process data into column C1 and the new process data into column C2.
2. Select Stat; Basic Statistics; 2-Sample t.
3. Choose Each sample is in its own column and enter C1 in the Sample 1 input window and C2 in the Sample 2 input
window.
4. Choose the Options option button. Enter a Confidence level, the hypothesized Test difference, and choose the appropri-
ate Alternative, check the Assume equal variances box.
5. The hypothesis test results and confidence interval are displayed in a session window. Refer to Figure 10.21.
Excel
The Data Analysis toolkit contains two functions for comparing population means, assuming equal variances and
assuming unequal variances. Use the appropriate formula and ordinary spreadsheet calculations to find the endpoints
of a confidence interval.
1. Enter the old process data into column A and the new process data into column B.
2. Under the Data tab, select Data Analysis; t-Test: Two-Sample Assuming Equal Variances.
3. Enter the Variable 1 Range, Variable 2 Range, the Hypothesized Mean Difference, and the value for a. Choose
an Output option and click OK.
4. Summary statistics along with the value of the test statistic, critical values, and p values are displayed. See
Figure 10.32.
10.2 Comparing Two Population Means Using Independent Samples from Normal Populations 485
10.41 In each of the following problems, n1, n2, s1, and s2 are uppose independent random samples of rotors from two
S
given. Assume normal underlying distributions, independent different manufacturers were obtained. The largest deviation
random samples, and unknown, unequal variances. Find the from perfect flatness of each rotor was measured, and the
approximate number of degrees of freedom, v, in the critical resulting summary statistics are given in the following table.
value of an approximate two-sample t test.
a. n1 5 12, n2 5 15, s1 5 11.7, s2 5 16.7 Sample Sample Sample
b. n1 5 8, n2 5 23, s1 5 5.46, s2 5 6.78 Manufacturer size mean standard deviation
c. n1 5 18, n2 5 26, s1 5 57.8, s2 5 49.9 Tire Rack 11 26.74 8.31
d. n1 5 32, n2 5 34, s1 5 5.51, s2 5 5.03 JC Whitney 14 29.53 6.85
10.42 Consider the following table of summary statistics.
Assume the underlying distributions are normal and the popula-
Sample Sample Sample tion variances are equal.
Group size mean variance a. Is there any evidence to suggest that there is a difference
One 8 173.9 320.41 in population mean deviations from perfect flatness for
these two rotor brands? Use a 5 0.05.
Two 9 150.3 655.36
b. Find bounds on the p value associated with this hypothesis
test.
Assume normal underlying distributions, independent random
samples, and unknown, unequal variances. 10.46 Manufacturing and Product Development The mean
a. Conduct a hypothesis test of H0: m1 2 m2 5 0 versus weight of an ordinary key is an important consideration, as most
Ha: m1 2 m2 . 0. Use a 5 0.05. Americans carry a pocketful of keys. A manufacturer claims that
b. Find bounds on the p value associated with this test. a new process produces a lighter and more durable key. Indepen-
dent random samples of both types of keys were obtained, and
10.43 Consider the following table of summary statistics.
each key was carefully weighed and its weight (in ounces) was
Sample Sample Sample recorded. The resulting summary statistics are given in the follow-
Group size mean standard deviation ing table, with the sample means and sample variances.
10.45 Fuel Consumption and Cars The durability and flat- 10.48 Manufacturing and Product Development Shelf
ness of the front rotors on an automobile are important for Safe Milk does not need to be refrigerated until it is opened.
braking and for a smooth ride. The flatness of a rotor can be Although it is convenient, there is some concern that this Grade
determined by a special optical measuring device that measures A milk contains less protein than regular milk. Independent
the largest deviation from perfect flatness in microinches. random samples of 8-ounce servings of Shelf Safe Milk and
10.2 Comparing Two Population Means Using Independent Samples from Normal Populations 487
regular milk were obtained, and the amount of protein (in the boarding time (in minutes) for each was recorded. The sum-
grams) in each was measured. The summary statistics are given mary statistics are given in the following table.
in the following table.
Sample Sample Sample
Sample Sample Sample Airline size mean standard deviation
Milk size mean standard deviation
American 21 44.5 12.3
Shelf Safe 25 13.95 3.93 US Airways 26 50.7 15.5
Regular 23 19.09 5.91
Is there any evidence to suggest that the new boarding proce-
Is there any evidence to suggest that the population mean dure has decreased the population mean boarding time? Use
amount of protein in Shelf Safe Milk is less than the population a 5 0.05, and assume the populations are normal with equal
mean amount of protein in regular milk? Use a 5 0.01, and variances.
assume the underlying distributions are normal, with equal
10.52 Manufacturing and Product Development A
variances.
recent study was conducted to determine the curing efficiency
10.49 Medicine and Clinical Studies A study was con- (time to harden) of dental composites (resins for the restoration
ducted to determine standard reference values for musculoskel- of damaged teeth) using two different types of lights. Indepen-
etal ultrasonography in healthy adults. Independent random dent random samples of lights were obtained and a certain
samples of men and women were obtained, and the sagittal composite was cured for 40 seconds. The depth of each cure
diameter (in mm) of the biceps tendon was measured in each (in mm) was measured using a penetrometer. The summary
subject. The resulting summary statistics are given in the fol- statistics for the Halogen light were n1 5 10, x1 5 5.35, and
lowing table. s1 5 0.7. The summary statistics for the LuxOMax light were
n2 5 10, x2 5 3.90, and s2 5 0.8. Assume the underlying
Sample Sample Sample populations are normal, with equal variances.
Group size mean standard deviation a. The maker of the Halogen light claims that they produce
Women 54 2.5 0.49 a larger cure depth after 40 seconds than LuxOMax
Men 48 2.8 0.49 lights. Is there any evidence to support this claim? Use
SOLUTION TRAIL
a 5 0.01.
Assume the underlying populations are normal, with equal b. Construct a 99% confidence interval for the difference in
variances. population mean cure depths.
a. Is there any evidence to suggest that the population mean 10.53 Manufacturing and Product Development
sagittal diameter of women’s biceps tendons is different Certain masonry ties used in residential construction
from that of men’s biceps tendons? Use a 5 0.01. receive a hot-dipped galvanized finish for strength and protec-
b. Construct a 95% confidence interval for the difference in tion against moisture. Independent random samples of masonry
population mean sagittal diameters, m1 2 m2. ties from two competing companies were obtained. The amount
10.50 Manufacturing and Product Development A com- of coating on one side of each tie was measured (in g/m2). The
pany that produces hospital furniture has two assembly lines resulting summary statistics are given in the following table.
dedicated to cutting and drilling wood for medical cabinets. Sample Sample Sample
Each computer-controlled process is designed to drill holes in Company size mean variance
a certain cabinet part with depth 12.7 mm. Independent ran-
dom samples of drilled holes were obtained from the two Fero 10 331.4 201.64
assembly lines, the resulting hole depths (in mm) were Cintex 18 298.7 1190.25
recorded. Assume the underlying populations are normal, with
equal variances. CABINET Assume the underlying populations are normal.
a. Is there any evidence to suggest that Line 2 is producing a. Managers at Fero claim that their product has a larger
holes with a greater population mean depth than Line 1? mean coating than Cintex. Is there any evidence to
Use a 5 0.05. support this claim? Use a 5 0.01. Write a Solution Trail
b. Find bounds on the p value associated with this hypothesis for this problem.
test. b. Find a 95% confidence interval for the difference in
population mean coatings.
10.51 Travel and Transportation During Summer 2013,
American Airlines introduced a new method for passengers to 10.54 Sports and Leisure The curve in a hockey stick is
board a narrowbody aircraft. The new procedure affected pas- measured by first placing the face of the blade against a flat sur-
sengers traveling light, those carrying one item that fits under face. The curvature of the stick is restricted so that the perpen-
the seat. The system was designed to decrease the total boarding dicular distance from any point at the heel to the end of the
time, and improve on-time performance.8 Independent random blade is at most 34 inch.9 Independent random samples of hockey
samples of American and US Airways flights were obtained and sticks used by players on the Toronto Maple Leafs and
488 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
a. If the population variances are assumed equal, is there any designed to last longer. Independent random samples of the two
evidence to suggest the mean width of a $20 bill is greater types of fuel rods were obtained, and the lifetime of each
than the mean width of a $1 bill? Use a 5 0.01. (in months) was recorded. The summary statistics are given in
b. If the population variances are assumed unequal, is there the following table.
any evidence to suggest the mean width of a $20 bill is
greater than the mean width of a $1 bill? Use a 5 0.01. Fuel-rod Sample Sample Sample
c. Why do both tests lead to the same conclusion (with very design size mean standard deviation
similar p values)? Old 11 34.91 3.20
10.61 Marketing and Consumer Behavior Many home- New 11 39.55 3.55
owners use TIKI torches for outside decoration and to burn
s pecial oil to repel insects. Independent random samples of two Assume the underlying populations are normal and the
types of oil were obtained, and the burn time for 3 ounces of variances are equal.
each was recorded (in hours). The summary statistics are given a. Conduct the relevant hypothesis test to determine
in the following table. whether the new fuel rod does last longer. Use
a 5 0.01.
Sample Sample Sample b. Construct a 99% confidence interval for the difference in
Oil size mean variance population mean lifetimes. Does this confidence interval
Citronella Torch Fuel 21 6.25 1.04 support the hypothesis test conclusion in part (a)?
Explain.
Black Flag Mosquito Control 28 5.98 0.77
10.64 Manufacturing and Product Development Root
Assume the underlying populations are normal. beer was originally made using the sarsaparilla root. However,
a. Do you think the assumption of equal variances is the oil from this root was shown to be carcinogenic (cancer-
reasonable? Why or why not? causing). Since then, many varieties are now made with cane
b. Based on your answer to part (a), conduct the appropriate sugar, herbs, spices, and vanilla. Independent random samples
hypothesis test to determine whether there is any evidence of 12-ounce cans of A&W root beer and Barq’s root beer were
that the mean burn time is different for these two brands. obtained, and the amount of sugar (in grams) was measured in
Use a 5 0.01. each. Assume the underlying populations are normal, with
c. Find bounds on the p value associated with the hypothesis unequal variances. ROOTBEER
test in part (b). a. Is there any evidence to suggest that the population mean
amount of sugar in A&W root beer is greater than in
10.62 Physical Sciences A pressure-relief valve (PRV) is
Barq’s root beer? Use a 5 0.05.
installed on a residential hot-water heater to protect against
b. Construct a 95% confidence interval for the difference in
overheating and, of course, high pressure. Independent random
the population mean sugar amounts. Does this confidence
samples of PRVs from different companies were obtained. Each
interval support the hypothesis test conclusion in part (a)?
value was tested by recording the pressure (in psi) required to
Explain.
cause the valve to open. The summary statistics are given in the
following table.
Challenge
Sample Sample Sample
Company size mean variance 10.65 Robust Statistics The two-sample t test for compar-
ing population means when the variances are equal is a
Delta 30 147.6 7.09 robust statistical procedure. If the population variances are
Gamma 35 147.8 13.70 different, as long as the underlying populations are normal
and the sample sizes are equal, then the hypothesis test is still
Assume the underlying populations are normal, with unequal very reliable.
variances. Generate a random sample of size 25 from a normal distribu-
a. Is there any evidence to suggest that the mean pressure tion with mean m1 5 100 and standard deviation s1 5 5.
required to open each valve is different? Use a 5 0.05. Generate a second random sample of size 25 from a normal
b. Find a 95% confidence interval for the difference in distribution with mean m2 5 100 and standard deviation
population mean pressure required to open each valve. Is s2 5 5. Conduct a two-sided, two-sample t test for comparing
this confidence interval consistent with the results in part population means assuming the population variances are equal
(a)? Explain. and with a 5 0.05. Do this 100 times and record the number of
10.63 Physical Sciences The lifetime of a fuel rod in a com- times you reject the null hypothesis.
mercial light-water nuclear reactor is related to the internal Repeat the same procedure but use s2 5 7. Record the number
pressure. Typically, a fuel rod lasts for 36 months, and one-third of times you reject the null hypothesis. Repeat the same proce-
of all fuel rods are replaced each year during a plant shutdown. dure for s2 5 10, 15, 20, 25, 30, 50. Use your results to explain
A new type of fuel rod includes a gas-relief capsule and is the robust nature of this hypothesis test.
490 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
Even through the word individual Let X1 represent a randomly selected first observation and let X2 represent the cor-
s
is used, this means individual or responding second observation on the same individual. Consider the random variable
object. D 5 X1 2 X2 , the difference in the observations, and the n observed differences
di 5 ( x1 ) i 2 ( x2 ) i, i 5 1, 2, . . . , n. X1 and X2 are both normal, so D is also normal.
More important, the differences are independent. A hypothesis test concerning
m1 2 m2 is based on the sample mean of the differences, D. This random variable has
the following properties.
Properties of D
1. E ( D ) 5 m1 2 m2 . D is an unbiased estimator for the difference in means m1 2 m2.
2. The variance of D is unknown, but it can be estimated using the sample variance of
the differences.
3. Both underlying populations are normal, so D is normal, and hence, D is also
normal.
s
Here’s what all of these results mean for us. To compare population means, m1 and m2,
when the data are paired, we focus on the difference m1 2 m2. As in earlier two-sample
tests, the null hypothesis H0: m1 5 m2 is equivalent to H0: m1 2 m2 5 0. A test to determine
whether the underlying population means of two paired samples are equal is equivalent to
a test to determine whether the population mean of the paired differences is zero. We com-
pute the differences, d1, d2, . . . , dn, and conduct a one-sample t test (with n 2 1 degrees
of freedom) using the differences.
10.3 Paired Data 491
A Closer L ok
1. D0 is the hypothesized difference in the population means. Usually D0 5 0: The null
hypothesis is that the two population means are equal. However, D0 may be nonzero.
For example, the null hypothesis H0: mD 5 m1 2 m2 5 5 5 D0 specifies that the dif-
ference in population means is 5.
2. A paired t test is valid even if the underlying population variances are unequal, that is,
even if s21 2 s22. The sample variance of the differences, S2D, is a good estimator of
Var ( X1 2 X2 ) when the observations are paired.
3. If a paired t test is appropriate, the test statistic is based on n 2 1 degrees of freedom.
A two-sample t test (incorrect here) would be based on a test statistic with
n 1 n 2 2 5 2n 2 2 degrees of freedom. Therefore, the correct analysis is based on
a distribution with greater variability and is more conservative.
Subject 1 2 3 4 5 6
Initial pulse rate 67 71 67 83 70 75
Final pulse rate 61 72 70 76 58 61
Difference 6 21 23 7 12 14
Subject 7 8 9 10 11 12
Initial pulse rate 71 68 72 88 78 70
Final pulse rate 74 59 61 64 71 77
Difference 23 9 11 24 7 27
492 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
Is there any evidence to suggest that the population mean cesium-137 level decreased
from August 23 to August 30? Assume the underlying populations are normal, use
a 5 0.05, and find bounds on the p value associated with this hypothesis test.
t1
0 2.5251
Figure 10.34 p-Value illustration: Figure 10.35 Minitab Paired t (Test and Confidence
p 5 P(T $ 2.5251) Interval) results.
5 0.0198 # 0.05 5 a
The random variable T here is The usual technique can be used to construct a confidence interval for the difference in
D 2 mD means, mD 5 m1 2 m2, when the observations are paired. Start with a symmetric interval
T5
SD / !n about 0 such that the probability T lies in this interval is 1 2 a. Manipulate the inequality
to sandwich mD.
Solution
Step 1 The sample size and summary statistics are given, and the underlying distribu-
tions (before maintenance mpg and after maintenance mpg) are assumed normal.
The observations are paired, so we can use Equation 10.6.
1 2 a 5 0.99 1 a 5 0.01 1 a /2 5 0.005 Find a/2.
Andresr/Shutterstock ta/2,n21 5 t0.005,17 5 2.8982 Find the t critical value with n 5 17.
( 25.1191, 2.5591 ) is a 99% confidence interval for the true mean difference in
miles per gallon, mD.
Note that because 0 is included in this confidence interval, there is no evidence
VIDEO Tech
Video TECH Manuals
MANUALS to suggest that mD is different from 0, and there is no evidence to suggest that the
EXEL DISCRIPTIVE
Paired samples maintenance program improves mileage.
inference
Figures 10.36 and 10.37 together show a technology solution.
Technology Corner
Procedure: Hypothesis tests and confidence intervals concerning paired data.
Reconsider: Example 10.9, solution, and interpretations.
Crunchlt!
Use the built-in function Statistics; t; Paired. There are tabs to conduct a hypothesis test and construct a confidence interval.
1. Enter the data from the first date into column Var1 and the data from the second date into column Var2.
2. Select Statistics; t; Paired. Choose the First and Second Variables from the drop-down menus. Under the Hypothesis
Test tab, enter the Mean difference under the null hypothesis and select the appropriate alternative. See Figure 10.38.
3. Click Calculate. The results are displayed in a separate window. See Figure 10.39.
TI-84 Plus C
Compute the differences if necessary. Use the built-in functions T-Test and TInterval. Input is either summary statistics
or data in lists.
1. Enter the August 23 data into list L1 and the August 30 data into list L2.
2. Find the paired differences on the Home screen, and store them in the list L3.
496 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
3. Select STAT ; TESTS; T-Test. Highlight Data. Enter m0, the hypothesized difference in means and the list
containing the paired differences. Set the frequency to 1 and highlight the alternative hypothesis. See Figure 10.40.
4. Highlight Calculate and press ENTER . The results are displayed on the Home screen. See Figure 10.41. The Draw
results are shown in Figure 10.42.
5. Use STAT ; TESTS; TInterval to construct a confidence interval for the paired differences. This procedure is described
in the Technology Corner in Section 8.3. A 95% confidence interval for the paired differences in shown in Figure 10.43.
Figure 10.40 T-Test Figure 10.41 T-Test Figure 10.42 T-Test Figure 10.43 A 95%
input screen. hypothesis test results. Draw results. confidence interval for the
paired differences.
Minitab
Use the built-in function Paired t to conduct a hypothesis test and to construct a confidence interval. Input is either paired
data in two columns or summarized data for the paired differences.
1. Enter the August 23 data into column C1 and the August 30 data into column C2.
2. Select Stat; Basic Statistics; Paired t.
3. Choose Each sample is in a column, and enter C1 in the Sample 1 input window and C2 in the Sample 2 input window.
4. Choose the Options option button. Enter a Confidence level, the Hypothesized difference ( D0 ) , and the Alternative
hypothesis.
5. The hypothesis test results and the confidence interval are displayed in a session window. Refer to Figure 10.35.
Excel
Use the Data Analysis toolkit function t-Test: Paired Two Sample for Means. If only summarized data for the paired
differences are given, use the function confidence.T to find the bound on the confidence interval and compute the
endpoints of the CI as described in the Technology Corner in Section 8.3.
1. Enter the August 23 data into column A and the August 30 data into column B.
2. Under the Data tab, select Data Analysis; t-Test: Paired Two Sample for Means.
3. Enter the Variable 1 Range (A1:A8), Variable 2 Range (B1:B8), the Hypothesized Mean Difference ( D0 ) , and a value
for Alpha.
4. Choose an Output option and click OK.
5. Summary statistics along with the value of the test statistic, critical values, and p values are displayed. See Figure 10.44.
will be used to determine whether there is a difference in a. What is the common characteristic that makes these data
the mean amount of fiber in the banana peel by country. paired?
c. A new stent has been designed for use in patients with b. Assume normality. Conduct the appropriate hypothesis
diseased arteries. A random sample of patients in need of test to determine whether there is any evidence that the
a stent was selected. The intrasaccular pressure of each mean runtime for Java programs is greater than the mean
was carefully measured, and then a stent was surgically runtime for C++ programs. Use a 5 0.001.
placed in the diseased artery. The intrasaccular pressure c. Find bounds on the p value associated with this hypothesis
was measured again following surgery. The data will be test.
used to determine whether the stent placement reduced 10.79 Physical Sciences A consultant working for a State
intrasaccular pressure. Police barracks contends that service weapons will fire with a
d. An automobile manufacturer claims that using a lighter higher muzzle velocity if the barrel is properly cleaned. A
weight engine oil can actually increase a car’s miles per random sample of Glock 9-mm handguns was obtained, and
gallon (of gasoline). A random sample of cars (and the muzzle velocity (in feet per second) of a single shot from
drivers) was selected, all who currently use a heavy weight each gun was measured. Each gun was professionally
oil in their engines. The miles per gallon for each car was cleaned, and the muzzle velocity of a second shot (with the
recorded. The engine oil was drained, a lighter weight oil same bullet type) was measured. The data are given in the
was used, and the miles per gallon was recorded again. following table. GLOCK
The data will be used to determine if the mean miles per
gallon has increased. Gun 1 2 3 4 5 6
e. A new process has been developed that theoretically Before 1505 1419 1504 1494 1510 1506
improves the nutritional value of barley for use in fish After 1625 1511 1459 1441 1472 1521
feed.* A random sample of traditional barley and another
random sample of the new barley was obtained. The a. What is the common characteristic that makes these data
percentage of protein in each grain was carefully paired?
measured. The data will be used to determine whether the b. Assume normality. Conduct the appropriate hypothesis
new barley has a greater mean percentage of protein. test to determine whether there is any evidence that a
10.75 The following summary statistics were obtained in a clean gun fires with a higher muzzle velocity. Use
paired-data study: d 5 15.68, sD 5 33.55, and n 5 17. Assume a 5 0.01.
normality and conduct a test of H0: mD 5 0 versus Ha: mD . 0. c. Find bounds on the p value associated with this hypothesis
Use a 5 0.05. test.
particulate matter was measured again. The differences (before b allads. The steps per minute for each person for each genre
filtration – after filtration) were recorded. Assume normality. Is were recorded. WALKING
there any evidence to suggest that the new filtration system a. What is the common characteristic that makes these data
improves air quality by removing particulate matter? Use paired?
a 5 0.05. Write a Solution Trail for this problem. FILTER b. Assume normality. Conduct the appropriate hypothesis to
determine whether there is any evidence to suggest that
SOLUTION TRAIL
10.82 Biology and Environmental Science The best hur-
the population mean steps per minute listening to hard
ricane forecasting models use global data and take hours to run
rock is greater than the population mean steps per minute
on the world’s fastest supercomputers. A random sample of 26
listening to ballads. Use a 5 0.01.
tropical storms or hurricanes that passed within 25 miles of
Miami were selected. The European Center for Medium-Range 10.86 Public Health and Nutrition The cost of
Weather Forecasting (ECMWF) and the Global Forecast Sys- long-term health care continues to rise each year, but it
tem (GFS) models were used to predict storm surge (in feet). varies considerable from state to state. Many older adults who
The summary statistics for the differences (ECMWF – GFS) must enter a nursing home may select either a semi-private or
were d 5 1.4923 and sD 5 2.8097. private room. Several states were selected at random and the
a. Is there any evidence to suggest that the ECMWF model cost of each type of room was recorded.14 ROOMS
predicts higher storm surges? Assume normality and use a. Assume normality and conduct the appropriate hypothesis
a 5 0.05. test to determine whether the population mean cost of a
b. Find bounds on the p value associated with this hypothesis private room is $15 greater than the population mean cost
test. of a semi-private room. Use a 5 0.001. Write a Solution
Trail for this problem.
10.83 Physical Sciences It is important to maintain a low b. What characteristic of the differences suggests that the
ammonia-ion concentration in freshwater aquariums to ensure hypothesis test in part (a) will be significant?
healthy fish (and plants). An ammonia neutralizer is advertised c. Find bounds on the p value associated with this hypothesis
to almost instantly detoxify ammonia (i.e., reduce the concen- test.
tration of ammonia ions) in order to protect fish. Fourteen
untreated 20-gallon aquariums were selected at random, and the 10.87 Manufacturing and Product Development The
ammonia-ion concentration (in ppm) in each was measured. porosity of a concrete block is a measure (as a percentage) of
One hour after the directed amount of the neutralizer was used, the amount of empty space in the block. In residential homes
the ammonia ion concentration was measured again. Assuming with concrete-block foundations, a larger porosity leads to
normality, is there any evidence to suggest that the neutralizer damper, colder basements. A contractor recommends pretreat-
decreases the mean ammonia-ion concentration? Use ment of concrete blocks with a product designed to decrease the
a 5 0.025. AQUARIUM porosity. A random sample of concrete blocks was obtained,
and the porosity of each was measured. The clear, paintlike
10.84 Public Health and Nutrition Beef boullion generally product was applied to each block, and the porosity was mea-
has a high salt content, which can cause health problems. A sured again. The differences (before treatment – after treatment)
food columnist for a local newspaper suggested simmering in porosities were recorded. Assume normality. Is there any evi-
boullion with slices of raw potato to remove salt. To check this dence to suggest the new product decreases the mean porosity
claim, 10 different boullion brands were selected at random of concrete blocks? Use a 5 0.001. CONCRETE
and the salt content in each was measured (in mg/cup of
10.88 Psychology and Human Behavior Each employee
water). Five potato slices were then added to each broth and
hired at an electronics parts assembly line in Edmonton, Alberta,
the mixtures were left to simmer for 15 minutes. Following
is given a general intelligence test. To determine which method
this procedure, the salt content was measured again. The dif-
of training is more effective, eight pairs of new hires were
ference between the initial and the final salt content was com-
matched according to their exam scores. One set of employees
puted for each boullion brand, and the data are given in the
was asked to read appropriate training manuals, while the other
following table. BOULLION
group watched interactive training videos. Each employee was
then asked to assemble a part used in a locater-beacon transmit-
2169 2222 431 110 2168 ter, and the time (in minutes) to completion was recorded. The
353 2207 68 25 203 data are given in the following table. TRAINING
a. What is the common characteristic that makes these data to lower the cloud point of this type of fuel. A random sample
paired? of six different biodiesel fuels was obtained and the cloud point
b. Is there any evidence to suggest the true mean time was measured for each. One ounce of the chemical additive was
difference, mD, is different from 0? Assume normality and mixed in with every fuel sample and the cloud point was mea-
use a 5 0.05. sured again. The resulting data are given in the following table
(temperatures in °C). CLOUDING
10.89 Public Health and Nutrition Americans love ham-
burgers, but the high fat content in some cooked patties pres-
ents a severe health threat. Certain electric grills are designed to Fuel 1 2 3 4 5 6
drain fat away from the patty, resulting in a healthier, although Before 11.7 12.9 14.2 12.7 11.3 12.4
perhaps less tasty, meal. A random sample of ground beef pack- additive
ages was obtained (with various fat contents). Two patties were
After 10.3 10.7 14.1 10.0 11.2 12.1
made from each package. One was cooked in an electric grill,
additive
while the other was prepared in a frying pan on top of a stove.
The fat content (as a percentage) in each cooked patty was
measured. HAMBURG a. Assume normality, and conduct the appropriate
a. Conduct the appropriate hypothesis test to determine hypothesis test to determine whether the additive lowers
whether the true mean fat content in hamburgers cooked the mean cloud point in biodiesel fuel. Use a 5 0.05.
on an electric grill is less than the true mean fat content of b. Conduct an inappropriate two-sample t test to compare
hamburgers cooked in a frying pan. Assume normality the population mean cloud point before treatment with the
and use a 5 0.001. population mean cloud point after treatment. Assume the
b. Find bounds on the p value associated with this hypothesis population variances are unequal and use a 5 0.05.
test. c. Compare the conclusions in parts (a) and (b). How are the
test statistics the same, and how do they differ?
Extended Applications 10.92 Biology and Environmental Science In August
2013, Vancouver Coastal Health warned swimmers that the
10.90 Medicine and Clinical Studies A new drug designed
coliform count at East False Creek was approximately twice the
to reduce fever (and relieve aches and pains) is being tested for
safe level.15 The contamination was attributed to boats, birds,
efficacy and side effects. Ten patients entering a hospital with a
geese, and warm weather. A random sample of locations along
high fever were selected at random. The temperature (in °F) of
the creek was obtained, and the coliform count (bacteria per
each patient was measured, the drug was administered, and two
100 ml of water) was measured in early August and again fol-
hours later the temperature was measured again. The data are
lowing several rain storms. COLIFORM
given in the following table. FEVER
a. What is the common characteristic that makes these data
Patient 1 2 3 4 5 paired?
b. Use a one-sided paired t test to determine whether the rain
Before drug 102.6 99.2 102.3 101.1 102.7
caused a decrease in the coliform count. Assume
After drug 99.8 98.8 97.5 100.3 99.6 normality and use a 5 0.001.
c. Find the safe level of coliform. Do you think it was safe to
Patient 6 7 8 9 10 swim in the creek after the rain? Why or why not?
Before drug 102.6 100.5 103.5 105.7 104.3 10.93 Travel and Transportation For anyone planning to
After drug 102.8 99.0 101.8 97.1 99.2 travel, either for vacation or on business, the pricing plans of
airlines remain a guarded mystery. A study by CheapAir.com
a. What is the common characteristic that makes these data
suggested that the cheapest fares are found 49 days before a
paired?
flight. However, according to Travelers Today, the best prices
b. Assume normality. Conduct the appropriate hypothesis
are offered 21 days before a flight.16 The Washington, D.C.-to-
test to determine whether there is any evidence that the
Los Angeles route was selected as a test case. A random sample
new drug reduces the mean patient temperature after two
of days was selected, and the best price for a one-way ticket was
hours. Use a 5 0.05.
recorded 21 and 49 days prior to the flight. AIRPRICE
c. Find bounds on the p value associated with this hypothesis
a. Assume normality. Is there any evidence to suggest that
test.
the price of a ticket on this route differs if purchased 21 or
d. What characteristic of the differences suggests that a
49 days in advance? Use a 5 0.05.
hypothesis test will be significant?
b. Find bounds on the p value associated with this hypothesis
10.91 Fuel Consumption and Cars Biodiesel fuel has a test.
cloud point, the temperature at which the fuel becomes cloudy, c. Find a 95% confidence interval for the difference in
of approximately 13°C. This clouding can lead to poor engine population mean cost per ticket. Does this confidence
performance and can even cause an engine to stop completely. interval support your conclusion in part (a)? Why or
An industrial chemical company produces an additive designed why not?
10.4 Comparing Two Population Proportions Using Large Samples 501
= =
Properties of the Sampling Distribution of P1 2 P2
= =
1. The mean of P1 2 P2 is the true difference between population proportions, p1 2 p2 .
That is,
= =
E ( P1 2 P2 ) 5 mP=1 2 P= 25 p12 p2
= =
2. The variance of P1 2 P2 is
= = p1 ( 1 2 p1 ) p2 ( 1 2 p2 )
Var ( P1 2 P2 ) 5 s2P= 12P= 2 5 1
= n=1 n2
The standard deviation of P1 2 P2 is
p1 ( 1 2 p1 ) p2 ( 1 2 p2 )
sP=12P= 2 5 1
Å n1 n2
3. If
(a) both n1 and n2 are large,
Items (b) and (c) are the (b) n1p1 $ 5 and n1 ( 1 2 p1 ) $ 5, and
nonskewness criterion. (c) n2 p2 $ 5 and n2 ( 1 2 p2 ) $ 5,
= =
then the distribution of P1 2 P2 is approximately normal.
= = p1 ( 1 2 p1 ) p2 ( 1 2 p2 )
In symbols: P1 2 P2 , N c p1 2 p2, d
d
1
n1 n2
A Closer L ok
1. This test is valid as long as the nonskewness criterion holds for both samples. Use the
= =
estimates p1 and p2 to check the inequalities.
2. Just as a reminder, the z critical values for this test are from the standard normal distri-
bution.
3. Remember, we can also determine whether to reject or not to reject the null hypothesis
by comparing the p value associated with the value of the test statistic to the signifi-
cance level a.
n Greater than the true their parents, and in a random sample of 300 female millennials, 96 lived with their par-
proportion
ents. Is there any evidence to suggest that the true proportion of male millennials living
n Random sample with their parents is greater than the true proportion of female millennials living with
T ra nsl ati o n their parents? Use a 5 0.05.
n Conduct a one-sided
hypothesis test about p1 2 p2 Solution
n D0 5 0. Step 1 This is a one-sided test in which we are looking for evidence that a greater pro-
C onc epts portion of males than females are living with their parents. Therefore, D0 5 0,
n Large-sample hypothesis test
and case 1 is appropriate. Arbitrarily, let male millennials be population 1 and
concerning two population female millennials be population 2. The given information:
proportions when D0 5 0
H0: p1 2 p2 5 0
Ha: p1 2 p2 . 0
= =
P1 2 P2
TS: Z 5
= = 1 1
Pc ( 1 2 Pc ) a 1 b
Å n1 n2
RR: Z $ za 5 z0.05 5 1.6449
Step 4 The estimate of the common population proportion is
= x1 1 x2 110 1 96
pc 5 5 5 0.3583
n1 1 n2 275 1 300
The value of the test statistic is
= =
p1 2 p2 0.40 2 0.32
z5 5 5 1.9985
= = 1 1 1 1
pc ( 1 2 pc ) a 1 b ( 0.3583 )( 0.6417 ) a 1 b
Å n1 n2 Å 275 300
Step 5 The value of the test statistic lies in the rejection region. Equivalency, p 5
0.0228 # 0.05, as illustrated in Figure 10.45. We reject the null hypothesis at the
a 5 0.05 significance level. There is evidence to suggest that the true proportion
of male millennials living with their parents is greater than the true proportion of
female millennials living with their parents.
504 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
Case 2: H0: p1 2 p2 5 D0 2 0
This case, with D0 2 0, is less common. Because p1 and p2 are assumed unequal, there is
no hypothesized, common population proportion. The hypothesis test follows routinely
= =
from the properties of P1 2 P2.
Suppose two random samples of sizes n1 =and n2= are obtained, and the nonskewness
criterion is satisfied. Using the properties of P1 2 P2, a confidence interval for p1 2 p2
can be derived ( p1 and p2 are assumed unequal). Start with a symmetric interval about 0
such that the probability Z lies in this interval is 1 2 a. As usual, manipulate the inequal-
ity to sandwich the parameter p1 2 p2.
10.4 Comparing Two Population Proportions Using Large Samples 505
Stepped
STEPPED Tutorial
TUTORIALS
How to Find a 100(1 2 a)% Confidence Interval for p1 2 p2
Confidence
BOX PLOTS Given two (large) random samples of sizes n1 and n2, a 100 ( 1 2 a ) % confidence interval
intervals for the
difference in popu- for p1 2 p2 has as endpoints the values
lation proportions
SOLUTION TRAIL = = = =
= = p1 ( 1 2 p1 ) p2 ( 1 2 p2 )
( p1 2 p2 ) 6 za/2 1 (10.10)
Å n1 n2
Solution Trail 10.12 Example 10.12 Make Room for Canadian Fliers
More Canadians are using smaller U.S. airports near the border when they travel to cities
K e ywor ds
in the United States. Reasons include a strong Canadian dollar and higher taxes and fees
Is there any evidence?
n
on air travel in Canada.18 In a random sample of 500 fliers at Buffalo Niagara Interna-
n True proportion is more than tional Airport, 245 were Canadian, and in a random sample of 400 fliers at Bellingham
0.05 greater than
(Washington) International Airport, 160 were Canadian.
n Random sample
a. Conduct the appropriate hypothesis test to determine whether there is evidence that the
T ra nsl ati o n true proportion of Canadian fliers at Niagara is more than 0.05 greater than the true
n Conduct a one-sided proportion of Canadian fliers at Bellingham. Use a 5 0.01.
hypothesis test about p1 2 p2 b. Find the p value associated with this hypothesis test.
n D0 5 0.05
Solution
C onc epts
Step 1 Let Niagara fliers be population 1 and Bellingham fliers be population 2. We are
n Large-sample hypothesis test
concerning two population looking for evidence that the difference p1 2 p2 is greater than 0.05 5 D0 2 0.
proportions when D0 2 0 Therefore, case 2 is appropriate.
The given information is presented here.
V isi on
Check the large-sample Niagara fliers Bellingham fliers
assumptions. Use the template
for a one-sided test concerning Sample size n1 5 500 n2 5 400
p1 2 p2 when D0 2 0. Use Number of successes x1 5 245 x2 5 160
a 5 0.01 to find the critical = =
Sample proportion p1 5 245/500 5 0.49 p2 5 160/400 5 0.40
value, compute the value of the
test statistic, and draw a
conclusion. Check the nonskewness criterion using estimates for p1 and p2.
= =
n1p1 5 ( 500 )( 0.49 ) 5 245 $ 5 n1 ( 1 2 p1 ) 5 ( 500 ) ( 0.51 ) 5 255 $ 5
= =
n2 p2 5 ( 400 )( 0.40 ) 5 160 $ 5 n2 ( 1 2 p2 ) 5 ( 400 ) ( 0.60 ) 5 240 $ 5
= =
All of the inequalities are satisfied, so P1 2 P2 is approximately normal, and the
large-sample hypothesis test concerning population proportions can be used.
Step 2 The four parts of the hypothesis test are
H0: p1 2 p2 5 0.05
Ha: p1 2 p2 . 0.05
= =
( P1 2 P2 ) 2 0.05
TS: Z 5 = = = =
P1 ( 1 2 P1 ) P2 ( 1 2 P2 )
1
Å n1 n2
RR: Z $ za 5 z0.01 5 2.3263
Step 3 The value of the test statistic is
= =
( p1 2 p2 ) 2 0.05 ( 0.49 2 0.40 ) 2 0.05
z5 = = = = 5 5 1.2062
p1 ( 1 2 p1 ) p2 ( 1 2 p2 ) ( 0.49 )( 0.51 ) ( 0.40 ) ( 0.60 )
1 1
Å n1 n2 Å 500 400
506 C h apte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
The value of the test statistic does not lie in the rejection region. (Equivalently,
p 5 0.1139 . 0.05, as illustrated in Figure 10.49.) We do not reject the null
hypothesis. At the a 5 0.01 significance level, there is no evidence to suggest
that the population proportion of Canadian fliers at Niagara is more than 0.05
greater than the population proportion of Canadian fliers at Bellingham.
VIDEO Tech
Video TECH Manuals
MANUALS The computations for a confidence interval for p1 2 p2 are illustrated in the next
EXEL Proportions
Two DISCRIPTIVE example.
Inference CI Test -
Summarized Data
Example 10.13 Storm Watch
The Weather Channel (TWC) is one of the most popular cable TV networks. Who hasn’t
seen Jim Cantore in the middle of some wild weather? However, the number of viewers
varies greatly according to geographic region and the current weather conditions. Random
samples of cable TV viewers in the Northeast (population 1) and in the West (population
2) were obtained. The number of viewers who watched TWC in the past week was recorded.
The data are given in the following table.
Northeast West
Carolina K. Smith MD/Shutterstock Sample size n1 5 1000 n2 5 1500
Number of successes x1 5 446 x2 5 303
= =
Sample proportion p1 5 446/1000 5 0.4460 p2 5 303/1500 5 0.2020
Construct a 99% confidence interval for the true difference in proportions of cable TV
viewers who watched TWC in the past week.
10.4 Comparing Two Population Proportions Using Large Samples 507
Solution
Step 1 The sample sizes, number of successes, and sample proportions are given. Check
the nonskewness criterion using estimates for p1 and p2.
= =
n1p1 5 ( 1000 )( 0.4460 ) 5 446 $ 5 n1 ( 1 2 p1 ) 5 ( 1000 ) ( 0.5540 ) 5 540 $ 5
= =
n2 p2 5 ( 1500 )( 0.2020 ) 5 303 $ 5 n2 ( 1 2 p2 ) 5 ( 1500 ) ( 0.7980 ) 5 1197 $ 5
All of the inequalities are satisfied, so the distribution of the difference in sample
proportions is approximately normal. A large-sample confidence interval is
appropriate.
Step 2 Find the critical value.
1 2 a 5 0.99 1 a 5 0.01 1 a /2 5 0.005 Find a/2.
(0.1955, 0.2925) is a 99% confidence interval for the difference in the proportion
of cable TV viewers who watched TWC in the past week in the Northeast and in
the West, p1 2 p2. Note that because 0 is not included in this interval, there is
evidence to suggest that the two proportions are different.
Figures 10.51 and 10.52 together show a technology solution.
Technology Corner
Procedure: Hypothesis tests and confidence intervals concerning two population proportions.
Reconsider: Example 10.11, solution, and interpretations.
Crunchlt!
Use Proportion 2-sample to conduct a hypothesis test and concerning two population proportions, D0 5 0 or D0 2 0, and
to construct a confidence interval for the difference in population proportions.
508 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
TI-84 Plus C
Use 2-PropZTest to conduct a hypothesis test concerning two population proportions, D0 5 0, and 2-PropZInt to
construct a confidence interval for the difference in population proportions. There is no built-in function to conduct a
hypothesis test if D0 2 0.
1. Select STAT ; TESTS; 2-PropZTest.
2. Enter the number of successes and the number of trials for each sample: x1, n1, x2, n2. Highlight the appropriate alterna-
tive hypothesis. See Figure 10.46.
3. The Calculate and Draw results are shown in Figures 10.47 and 10.48.
4. To construct a confidence interval, select STAT ; TESTS; 2-PropZInt.
5. Enter the number of successes and the number of trials for each sample, x1, n1, x2, n2, and the confidence level
(Figure 10.56).
6. Highlight Calculate and press ENTER . The confidence interval is displayed on the Home Screen. See Figure 10.57.
Minitab
Use the function 2 Proportions to conduct a hypothesis test and to construct a confidence interval. Input is samples in one
column, samples in different columns, or summarized data.
1. Select Stat; Basic Statistics; 2 Proportions.
2. Choose Summarized data and enter the number of successes (Events) and number of trials for each sample.
10.4 Comparing Two Population Proportions Using Large Samples 509
3. Choose the Options option button. Enter a Confidence level, the (hypothesized) Test difference ( D0 ) , and select the
appropriate Alternative hypothesis. If D0 5 0, use the Test method: Use the pooled estimate of the proportion. If
D0 2 0, use Estimate the proportions separately.
4. The hypothesis test results and confidence interval are displayed in a session window. See Figure 10.58.
Excel
There are no built-in functions to conduct a hypothesis test concerning two population proportions or to construct a
confidence interval for the difference in population proportions. However, functions associated with the standard normal
distribution may be used to find critical values and p values. Use ordinary spreadsheet calculations where necessary.
1. Enter the number of successes and number of trials for each sample.
= = =
2. Compute p1, p2, and pc.
3. Compute the value of the test statistic, z, and use the function NORM.S.DIST to find the p value.
= =
4. To construct the confidence interval, compute the difference, p1 2 p2, use the function NORM.S.INV to find the
= = = =
p1 ( 1 2 p1 ) p2 ( 1 2 p2 )
critical value, and compute 1 .
Å n1 n2
5. Find the left endpoint and the right endpoint of the confidence interval. See Figure 10.59.
states, 85 said that an armed revolution might be necessary. Is 10.111 Travel and Transportation Many people who com-
there any evidence to suggest that the proportion of voters in mute to work by car in New York City every day use either the
the West who believe that an armed revolution might be neces- George Washington Bridge or the Lincoln Tunnel. A random
sary is different from the proportion of voters in the East? Use sample of commuters who use one of these two routes was
a 5 0.05. Write a Solution Trail for this problem. obtained, and each was asked whether they carpooled to work.
The data are given in the following table.
10.108 Technology and the Internet In its Millennium
Development Goals Report, the United Nations suggested that Commuting Sample Number who
by the end of 2013 there will be 6.8 billion cell-phone subscrip- route size carpool
tion plans, and approximately 2.7 billion people will be con-
Bridge 1055 530
nected to the Internet.21 A random sample of households in
Brazil and Russia was obtained, and the number of people Tunnel 1663 825
connected to the Internet was recorded for each. The data are a. Verify that the nonskewness criterion inequalities are
given in the following table. satisfied.
Sample Number connected b. Is there any evidence to suggest that the proportion of
Country size to the Internet carpoolers crossing the George Washington Bridge is
greater than the proportion of carpoolers using the
Brazil 326 161 Lincoln Tunnel? Use a 5 0.01.
Russia 387 210 c. Find the p value associated with the hypothesis test in
part (b).
Is there any evidence to suggest that the proportion of house-
holds connected to the Internet is greater in Russia than in 10.112 Conspiracy Theory In a recent survey, Americans
Brazil? Use a 5 0.01. were asked about 20 popular conspiracy theories. Thirty-seven
percent of voters believe global warming is a hoax, 21% are
10.109 Marketing and Consumer Behavior Many critics
certain a UFO crashed in Roswell, New Mexico, and 7% believe
have been known to say, “They sure don’t make movies like the moon landings were faked.22 (You might consider investi-
they used to.” To assess Americans’ opinions of movies, a ran- gating the theory about lizard people who control our society.)
dom sample of people was obtained and each was asked about The survey results were often very different according to politi-
the quality of movies. The data are given in the following table. cal affiliation. A random sample of Democrats and Republicans
Sample Number who said were asked if they believe pharmaceutical companies invent
Age group size movies are getting better new diseases to make money. The resulting data are given in the
following table.
18–29 347 238
30–49 387 221 Number who believe
Political Sample pharmaceutical companies
a. Conduct the appropriate hypothesis test to determine affiliation size invent diseases
whether there is any evidence to suggest that the true
Democrats 788 137
proportion of 18–29-year-olds who believe movies are
getting better is greater than the proportion of 30–49-year-
SOLUTION TRAIL Republicans 866 105
olds. Use a 5 0.001. Is there any evidence to suggest that the proportion of voters
b. Find the p value associated with the hypothesis test in
who believe pharmaceutical companies invent diseases to make
part (a). money is different for Democrats and Republicans? Use
10.110 Public Health and Nutrition Several years a 5 0.01.
ago, most doctors believed that it was not necessary to 10.113 Medicine and Clinical Studies According to the
take any dietary supplement. Now, because many Americans do National Institute of Allergy and Infectious Diseases, approxi-
not eat a healthy, balanced diet, many physicians recommend a mately 54.6% of all U.S. citizens test positive to one or more
once-a-day multivitamin. A random sample of people was allergens. Between 9% and 16% suffer from hay fever. A ran-
obtained and asked whether they regularly take a multivitamin. dom sample of people who suffer from hay fever was obtained,
The data are given in the following table. and each was treated with either a conventional antihistamine
Sample Number who take or butterbur extract. The number of subjects who experienced
Group size a multivitamin relief from hay fever was recorded for each group. The result-
ing data are given in the following table.
Men 490 181
Women 428 214 Sample Number who
Treatment size experienced relief
Is there any evidence that the proportion of women who take
Antihistamine 255 71
a multivitamin is greater than the proportion of men? Use
a 5 0.005. Write a Solution Trail for this problem. Butterbur extract 237 55
512 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
a. Compute the sample proportion of people who Respondents were asked to rate how safe or unsafe they felt in
experienced relief for each treatment. their neighborhood. The data are given in the following table.24
b. Conduct the appropriate hypothesis test to determine
Number who felt safe
whether the proportion of people who experience relief
Sample in their neighborhood
due to the antihistamine is different from the proportion
District size after dark
of people who experience relief from butterbur extract.
Use a 5 0.01. Hunter Woods 213 141
10.114 Public Health and Nutrition A survey was con- Lake Anne 218 155
ducted concerning physical activity of adults in two states.
Is there any evidence to suggest that the true proportion of resi-
andom samples of adults were obtained from Arizona and
R
dents who feel safe after dark is different in the two districts?
from West Virginia, and they were all asked whether they
Use a 5 0.05.
consider themselves physically inactive. The data are given
in the following table. 10.118 Technology and the Internet The percentage of
Sample Number who are anadians who use technology is very high, but a recent survey
C
State size physically inactive suggests that they greatly overrate their tech savviness. Approx-
imately 60% of Canadians rated themselves as B or better for
Arizona 1122 163 tech savviness, but a large proportion of respondents could not
West Virginia 1181 205 explain roaming, data usage, or online security.25 A random
sample of Canadians from two regions was obtained and asked
Is there any evidence to suggest that the proportion of adults to rate their tech savviness. The results are given in the follow-
who consider themselves physically inactive is greater in West ing table.
Virginia than in Arizona? Use a 5 0.001 and find the p value. Number who rated
10.115 Manufacturing and Product Development Sample their tech savviness
Blenko Specs has two different processes for the manufacture Region size B or better
of optical lenses supplied to the military. A random sample of Edmonton 566 345
finished lenses was obtained from each process, and each lens Thunder Bay 617 330
was carefully inspected for defects. Of the 106 lenses from
Process A, eight were defective, and 12 of the 121 lenses from Is there any evidence to suggest that the true proportion of
Process B were defective. Canadians who rate their tech savviness as B or better is greater
a. Compute the sample proportion of defectives for each in Edmonton than in Thunder Bay? Use a 5 0.01.
process.
b. Check the nonskewness criterion and verify that the
Extended Applications
inequalities are satisfied.
c. Conduct a hypothesis test to determine whether there 10.119 Marketing and Consumer Behavior Americans
is any evidence that the sample proportion of defective have many sources for daily news, for example, local television
lenses is different for Process A and Process B. Use shows, public radio, or national newspapers. A random sample
a 5 0.05. of Americans was obtained and classified by age. Each person
was asked whether he or she obtained news every day from
10.116 Sports and Leisure A major league sports franchise
three specific sources. The data are given in the following table.
can contribute a great deal to the local economy and unite an
entire region. In a recent survey, a random sample of adults in Age group
the Portland, Oregon, area were asked if they would support a
18- to 29-year-olds 30- to 49-year-olds
National Football League team. The data are given in the fol-
lowing table.23 Number who Number who
News Sample obtained news Sample obtained news
Sample Number who would
source size every day size every day
County size support an NFL team
Nightly 570 103 462 120
Clackamas 469 117 network
Multnomah 1985 337 news
Cable 450 108 520 182
Is there any evidence to suggest that the proportion of residents
news
who would support an NFL team is different in the two coun-
networks
ties? Use a 5 0.01.
Internet 546 197 568 239
10.117 Psychology and Human Behavior In a recent sur-
vey, residents of Reston, Virginia, were asked about their quality a. Conduct the appropriate hypothesis test to determine
of life, characteristics of the community, child care, and crime. whether there is evidence that the true proportion of
10.5 Comparing Two Population Variances or Standard Deviations 513
18- to 29-year-olds who obtain news every day from obtained asked whether they planned any landscaping within
nightly network news shows is less than the true the next year. The data are given in the following table.
proportion of 30- to 49-year-olds who obtain news every
day from nightly network news shows. Use a 5 0.05. Sample Number who are
Find the p value associated with this hypothesis test. Residence size planning to landscape
b. Conduct the appropriate hypothesis test to determine Homeowner 261 90
whether there is evidence that the true proportion of 18- to Condominium owner 303 65
29-year-olds who obtain news every day from cable news
networks is less than the true proportion of 30- to Is there any evidence to suggest that the proportion of home-
49-year-olds who obtain news every day from cable news owners planning a landscaping project is more than 0.10 greater
networks shows. Use a 5 0.01. Find the p value than the proportion of condominium owners planning a land-
associated with this hypothesis test. scaping project? Use a 5 0.01.
c. Conduct the appropriate hypothesis test to determine
whether there is evidence that the true proportion of 18- to 10.122 Medicine and Clinical Studies Young children usu-
29-year-olds who obtain news every day from the Internet ally get 5–10 colds each year. To ease cold symptoms, for exam-
is different from the true proportion of 30- to 49-year-olds ple, a runny nose or sore throat, some parents give their children
who obtain news every day from the Internet. Use over-the-counter cough and cold medicines. However, many of
a 5 0.005. Find the p value associated with this these medicines are not effective and can cause serious side
hypothesis test. effects in young children. A random sample of parents of young
children was obtained and asked whether they give their children
10.120 Manufacturing and Product Development Two cough or cold medicine. The data are given in the following table.
different machines in a manufacturing facility are designed to
fill cans with 280 grams of Tang orange-flavored drink mix. A Sample Number who give
random sample of filled cans from each machine was obtained, Parent size cold medicine
and each can was carefully weighed. Of the 134 cans from Male 376 142
machine A, 10 were underfilled, and 7 of 114 cans from
Female 428 183
machine B were underfilled.
a. Compute the sample proportion of underfilled cans for
a. Is there any evidence to suggest that the true population
each machine.
proportion of males who give their children cough
b. Verify the nonskewness criterion.
medicine is different from the true proportion of females?
c. Find a 95% confidence interval for the true difference in
Use a 5 0.05.
the proportion of underfilled cans for machines A and B.
b. Find the p value associated with this hypothesis test.
d. Using the confidence interval in part (c), is there any
c. Find a 95% confidence interval for the difference in the
evidence to suggest that the proportion of underfilled cans
proportion of males who give their children cold medicine
is different for the two machines? Justify your answer.
and the proportion of females who give their children cold
10.121 Psychology and Human Behavior Historically, the medicine. Does this confidence interval support your
three most popular home-improvement projects are interior conclusion in part (a)? Why or why not?
decorating, landscaping, and expansion (respectively). A ran- d. What must be true of the respondents in order for the
dom sample of homeowners and condominium owners was hypothesis test in part (a) to be valid?
An F distribution has positive probability only for non-negative values. The probabil-
ity density function for an F random variable is 0 for x , 0. Once again, it is important
to focus on the properties of an F distribution and the method for finding critical values
associated with this distribution.
Properties of an F Distribution
The numerator and denominator 1. An F distribution is completely determined by two parameters, the number of degrees
designations will make more of freedom in the numerator and the number of degrees of freedom in the denominator,
sense as you read on. given in that order. Both values must be positive integers (1, 2, 3, . . .) and there is, of
course, a different F distribution for every combination.
2. If X has an F distribution with v1 and v2 degrees of freedom, ( X , Fv1,v2 ) , then
v2 2v22 ( v1 1 v2 2 2 )
Why are these restrictions on v2 mX 5 , v2 $ 3 and s2X 5 , v2 $ 5 (10.11)
necessary? What do you suppose
v2 2 2 v1 ( v2 2 2 ) 2 ( v2 2 4 )
the mean is if n2 5 2? 3. Suppose X , Fv1,v2. The density curve for X is positively skewed (not symmetric), and
gets closer and closer to the x axis but never touches it. As both degrees of freedom
increase, the density curve becomes taller and more compact. See Figure 10.60.
f (x)
X ~ F10,15
X ~ F7,7
X ~ F3,4
The definition and notation for an F critical value are analogous to those for Z, t, and
x2 critical values.
Definition
Fa,v1,v2 is a critical value related to an F distribution with v1 and v2 degrees of freedom. If
X , Fv1,v2, then P ( X $ Fa,v1,v2 ) 5 a.
f (x)
X ~ F1,2
P(X ≤ F1−,1,2) =
P(X ≥ F ,1,2) =
0 F1−,1,2 F ,1,2 x
Notice how the degrees of 3. F critical values are related according to the following equation:
freedom switch positions.
1
F12a,v1,v2 5 . (10.12)
Fa,v2,v1
Table VII in the Appendix presents selected critical values associated with various F
distributions and right-tail probabilities. The degrees of freedom in the numerator are
given in the top row and the degrees of freedom in the denominator are given in the left
column. In the body of the table, Fa,v1,v2 is at the intersection of the appropriate row and
column. Left-tail probabilities are found using Equation 10.12. The following example
illustrates the use of Table VII in the Appendix for finding critical values associated
with an F distribution.
Solution
a. F0.05,8,10 is a critical value related to an F distribution with 8 and 10 degrees of free-
dom. By definition, if X , F8,10 , then P ( X $ F0.05,8,10 ) 5 0.05. Using Table VII in
the Appendix, for a 5 0.05, find the intersection of the v1 5 8 column and the
v2 5 10 row.
a 5 0.05
vl
v2 6 7 8 9 10
( ( ( ( ( ( ( (
8 c 3.58 3.50 3.44 3.39 3.35 c
9 c 3.37 3.29 3.23 3.18 3.14 c
10 c 3.22 3.14 3.07 3.02 2.98 c
11 c 3.09 3.01 2.95 2.90 2.85 c
12 c 3.00 2.91 2.85 2.80 2.75 c
( ( ( ( ( ( ( (
Therefore, F0.05,8,10 5 3.07 and if X , F8,10 , then P ( X $ 3.07 ) 5 0.05, as illustrated
in Figure 10.62.
b. F0.99,9,15 is a critical value related to an F distribution with 9 and 15 degrees of freedom.
By definition, if X , F9,15, then P ( X $ F0.99,9,15 ) 5 0.99. Because F0.99,9,15 is in the
left tail of the distribution, use Equation 10.12.
1
F0.99,9,15 5 F120.01,9,15 5
F0.01,15,9
516 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
f (x)
f (x)
X ~ F8, 10 X ~ F9,15
0 3.07 x 0 0.202 x
Figure 10.62 Visualization of F0.05,8,10 . Figure 10.63 Visualization of F0.99,9,15 .
Using Table VII in the Appendix, for a 5 0.01, find the intersection of the v1 5 15
column and the v2 5 9 row.
a 5 0.01
v1
v2 9 10 15 20 30
( ( ( ( ( ( ( (
7 c 6.72 6.62 6.31 6.16 5.99 c
8 c 5.91 5.81 5.52 5.36 5.20 c
9 c 5.35 5.26 4.96 4.81 4.65 c
10 c 4.94 4.85 4.56 4.41 4.25 c
11 c 4.63 4.54 4.25 4.10 3.94 c
( ( ( ( ( ( ( (
Figure 10.64 Use the Excel
function F.INV.RT to find F Therefore, F0.99,9,15 5 1 / 4.96 5 0.202. If the random variable X , F9,15 then
critical values. P ( X $ 0.202 ) 5 0.99 and P ( X # 0.202 ) 5 0.01, as illustrated in Figure 10.63.
A Closer L ok
1. Table VII in the Appendix is very limited. There are only three values for a and a lim-
ited number of values for v1 and v2. The TI-84 Plus C does not have a built-in function
for finding F critical values. However, the SOLVE feature may be used to find a critical
value related to any F distribution.
2. Minitab and Excel may also be used to find a critical value related to any F distribu-
tion. Minitab uses inverse cumulative probability and Excel uses right-tail probability.
For example, to find F0.99,9,15 using Minitab, let the random variable X , F9,15.
P ( X $ F0.99,9,15 ) 5 0.99 Definition of F critical value.
Hypothesis tests concerning two population variances and a confidence interval for the
ratio of two population variances are based on the following results.
For reference, we’ll call these the Let S 21 be the sample variance of a random sample of size n1 from a normal distribution
two-sample F test assumptions. with variance s21, let S 22 be the sample variance of a random sample of size n2 from a
normal distribution with variance s22, and suppose the samples are independent.
10.5 Comparing Two Population Variances or Standard Deviations 517
Using the same assumptions, a confidence interval for the ratio of two population vari-
ances can be derived. Let X , Fn121,n221 and find an interval that captures 1 2 a in the
middle of this F distribution. Manipulate the inequality to sandwich the ratio s21 /s22.
The hypothesis test procedure described above can also be used to compare two popu-
lation standard deviations. And, you can take the square root of each endpoint of Equation
10.14 to find a 100 ( 1 2 a ) % confidence interval for the ratio of two population standard
deviations. The following example illustrates the hypothesis test procedure.
Connecticut (1)
270 294 174 180 314 274 160 210 255 187 271
Colorado (2)
161 150 164 109 168 172 133 148 120 157 138 94
166 116 98 168 153 118 138 116 120
a. Conduct the appropriate hypothesis test to determine whether there is any evidence
that the population variance in cost per day is different in Connecticut and
Colorado. Assume the costs per day underlying populations are normal and use
a 5 0.02.
b. Find bounds on the p value for the hypothesis test in part (a).
Solution
Step 1 The null hypothesis is that the two population variances are equal. We are look-
ing for any difference in the variances, so the alternative hypothesis is two-sided.
The underlying populations are assumed normal and the samples were obtained
independently. A two-sample F test is appropriate.
In this case, n1 5 11 and n2 5 21; a /2 5 0.01 and 1 2 a /2 5 0.99.
Step 2 The four parts of the hypothesis test are
1 1
s21 5 c 638,819 2 ( 2589 ) 2 d 5 2946.25
10 11
1 1
s22 5 c 414,601 2 ( 2907 ) 2 d 5 609.46
20 21
The value of the test statistic is
s21 2946.25
f5 5 5 4.83 ( $ 3.37 )
s22 609.46
The value of the test statistic lies in the rejection region. Therefore, we reject the
null hypothesis at the a 5 0.02 significance level. There is evidence to suggest
that the two population variances are different.
Step 4 Because the tables of critical values for F distributions are very limited, we can
X ~ F10,20
only bound the p value. Place the value of the test statistic, f 5 4.83, in an
ordered list of critical values with df 10 and 20.
3.37 # 4.83 # 5.08
F0.01,10,20 # 4.83 # F0.001,10,20
0 4.83
Therefore, 0.001 # p / 2 # 0.01
Figure 10.65 p-Value illustration: and, 0.002 # p # 0.02
p 5 2P(X $ 4.83)
5 0.0027 # 0.02 5 a The exact p value is illustrated in Figure 10.65.
Figures 10.66 through 10.69 show technology solutions.
10.5 Comparing Two Population Variances or Standard Deviations 519
Figure 10.66 Figure 10.67 Figure 10.68 Figure 10.69 JMP tests for
2-SampFTest input 2-SampFTest hypothesis 2-SampFTest Draw equality of two population
screen. test results. results. variances.
The following example involves constructing a confidence interval for the ratio of two
population variances.
Construct a 90% confidence interval for the ratio of population variances in diameter of
brain coral polyps.
Solution
Step 1 The samples are independent; the sample sizes and sample variances are given.
A confidence interval for the ratio of two population variances is appropriate.
Step 2 Find the critical values.
Fa/2,n121,n221 5 F0.05,15,20 5 2.20 Critical value, left endpoint. Use Table VII in the Appendix.
Fa/2,n221,n121 5 F0.05,20,15 5 2.33 Critical value, right endpoint. Use Table VII in the Appendix.
Step 3 Use Equation 10.15.
s 21 1 s 21
a , Fa/2,n221,n121 b Equation 10.15.
s 22 Fa/2,n121,n221 s 22
0.267 1 0.267
5a , ( 2.33 ) b Use sample variances and critical values.
0.172 2.20 0.172
5 ( 0.7056, 3.6169 ) Simplify.
(0.7056, 3.6169) is a 90% confidence interval for the ratio of the population
variances.
Suppose a two-sample t test will be used to compare two population means. The
hypothesis test presented in this section is often used first to compare the population
variances. The results and conclusion suggest the appropriate hypothesis test concerning
population means from Section 10.2, according to whether or not there is evidence that
the two population variances are unequal.
520 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
Technology Corner
Procedure: Hypothesis tests and confidence intervals concerning two population variances.
Reconsider: Example 10.15, solution, and interpretations.
Crunchlt!
There is no built-in function to conduct hypothesis tests nor to construct a confidence interval concerning two population
variances. However, the F Distribution Calculator can be used to find critical values.
TI-84 Plus C
Use the built-in function 2-SampFTest to conduct a hypothesis test concerning two population variances. Input is either
data in lists or summary statistics. There is no built-in function to construct a confidence interval for the ratio of population
variances.
1. Enter the Connecticut data into list L1 and the Colorado data into list L2.
2. Select STAT ; TESTS; 2-SampFTest.
3. Highlight Data and enter the two lists. Set each frequency to 1 and highlight the appropriate alternative hypothesis.
See Figure 10.66.
4. Highlight Calculate and press ENTER to display the hypothesis test results. See Figure 10.67. The Draw results
are shown in Figure 10.68.
Minitab
Use the built-in function 2 Variances to conduct a hypothesis test concerning two population variances and to construct a
confidence interval for the ratio of the population variances. Input is samples in one column (with subscripts), samples in
different columns, or summarized data. Several graph options are also available.
1. Enter the Connecticut data into column C1 and the Colorado data into column C2.
2. Select Stat; Basic Statistics; 2 Variances.
3. Choose Each sample is in its own column, and enter C1 in the Sample 1 input window and C2 in the Sample 2 input
window.
4. Choose the Options option button. Select the Ratio of sample variances, enter a Confidence level, enter a Hypothesized
ratio (and value, usually 1), and an Alternative hypothesis.
5. Click OK and the results are displayed in a session and graph window. See Figures 10.70 and 10.71.
Figure 10.70 The 2 Variances output from Figure 10.71 The 2 Variances graph results
Minitab. from Minitab.
10.5 Comparing Two Population Variances or Standard Deviations 521
Excel
Use the built-in function F-Test Two-Sample for Variances to conduct a hypothesis test concerning two population
variances. There is no built-in function to construct a confidence interval for the ratio of population variances.
1. Enter the Connecticut data into column A and the Colorado data into column B.
2. Under the Data tab, select Data Analysis; F-Test Two-Sample for Variances.
3. Enter the Variable 1 range, Variable 2 Range, a value for Alpha, and specify an Output option. There is no alternative
hypothesis option.
4. Click OK to view the summary statistics and the hypothesis test results. See Figure 10.72. The p value displayed is for a
one-sided hypothesis test. Double this value for a two-sided test.
a. Write the four parts for a one-sided, left-tailed hypothesis 10.138 Medicine and Clinical Studies A study in the British
test concerning the two population variances with a 5 0.05. Medical Journal suggested that children who received a diag-
b. Compute each sample variance, find the value of the test nostic CT scan using ionizing radiation were more likely to
statistic, and draw a conclusion. develop a cancer 10 years after radiation exposure. Newer CT
c. Find bounds on the p value associated with this hypothesis scanners use less radiation. Therefore, the increased risk for
test. children today may be decreased.26 Independent random sam-
ples of hospital CT scans in the United States and England were
10.133 Consider independent random samples of sizes
obtained, and the amount of radiation for each was recorded (in
n1 5 10 and n2 5 16 from normal populations.
mSV). For the United States, n1 5 10 and s21 5 1.075; for
a. Write the four parts for a two-sided hypothesis test
concerning the population variances with a 5 0.01. England n2 5 16 and s22 5 2.786. Is there any evidence to
b. Suppose s 21 5 426.42 and s 22 5 88.36. Find the value of suggest that the variability in CT radiation per scan is different
the test statistic, and draw a conclusion about the for these two countries? Use a 5 0.05 and assume normality.
population variances. 10.139 Travel and Transportation Most airlines now
c. Find bounds on the p value associated with this hypothesis charge passengers to check a bag and impose a surcharge if the
test. weight of the bag is over 50 pounds. Independent random sam-
10.134 In each of the following problems, the sample sizes and ples of checked luggage on Delta and American flights were
the confidence level are given. Find the appropriate F critical obtained, and the weight (in pounds) of each was recorded. For
values for use in constructing a confidence interval for the ratio Delta, n1 5 25, and s21 5 96.23; for American, n2 5 21 and
of the population variances. s22 5 194.02. Is there any evidence to suggest that the variabil-
a. n1 5 10, n2 5 10, 90% ity in checked baggage weight is different for these two air-
b. n1 5 21, n2 5 31, 98% lines? Use a 5 0.05 and assume normality.
c. n1 5 9, n2 5 7, 98% 10.140 Sports and Leisure Many basketball purists believe
d. n1 5 41, n2 5 31, 99.8% that the three-point shot (a shot from behind the three-point
10.135 In each of the following problems, the sample sizes, line, 22 feet from the basket) has dramatically changed the
the sample variances, and the confidence level are given. game, for the worse. Independent random samples of attempted
Assume the underlying populations are normal and the samples shots from National Basketball Association games played in
were obtained independently. Find the associated confidence 1975 (prior to the three-point shot) and in 2013 were obtained.
interval for the ratio of the population variances. The shot distance (in feet) was recorded for each attempt. The
data are given in the following table.
a. n1 5 10, s 21 5 17.2, n2 5 9, s 22 5 15.6, 90%
2
b. n1 5 16, s 1 5 54.1, n2 5 16, s 22 5 32.6, 98% Sample Sample
2
c. n1 5 16, s 1 5 3.35, n2 5 31, s 22 5 4.59, 98% Year size variance
d. n1 5 31, s 1 5 126.8, n2 5 41, s 22 5 155.3, 99.8%
2
1975 61 12.25
10.136 Use Table VII in the Appendix and linear interpolation 2013 61 26.01
to approximate each critical value. Verify each approximation
using technology. Is there any evidence to suggest that the variability in shot
a. F0.05,25,15 b. F0.99,20,32 distance is greater in the year 2013 than it was in 1975? Use
c. F0.01,10,56 d. F0.025,15,20 a 5 0.01 and assume normality. (Why do you suppose there is
e. F0.995,10,7 f. F0.05,35,35 greater variability in shot distance with a three-point shot?)
10.141 Sports and Leisure A Laurel Downs racetrack
Applications o fficial believes there is less variability in winning times for a
10.137 Biology and Environmental Science In a recent race in which the purse is at least $10,000, called a stakes race.
study conducted by the NOAA, the aerosol light absorption Independent random samples of ordinary races and stakes races
coefficient was measured (in Mm21) at randomly selected loca- were obtained, and the winning time (in seconds) for each race
tions in Africa and in South America. The resulting data are was recorded. The summary statistics were as follows: ordinary
summarized in the following table. race, n1 5 26 and s21 5 110.25; stakes race, n2 5 26 and
s22 5 38.44.
Sample Sample a. Write the four parts for a hypothesis test to check for
Country size variance evidence of the official’s assertion. Use a 5 0.01, assume
Africa 10 243.36 normality, and find the critical value using technology.
Conduct the hypothesis test and draw a conclusion.
South America 21 51.84
b. Construct a 98% confidence interval for the ratio of
population variances.
Is there any evidence that the population variance in aerosol
light absorption coefficient is greater in Africa than in South 10.142 Sports and Leisure A study was conducted to
America? Use a 5 0.05 and assume normality. compare the variability in times for men and women involved
10.5 Comparing Two Population Variances or Standard Deviations 523
in collegiate swimming events. Independent random samples 12.42-mile race up Pikes Peak in Colorado features over 156
of 800-meter freestyle competitors were obtained, and the time turns, grades of 7%, and a finish line at 14,110 feet. Indepen-
(in minutes) was recorded for each swimmer. The data are dent random samples of two classes of cars in the 2013 race
summarized in the following table. were obtained, and the speed (in mph) for each was recorded.
The summary statistics are given in the following table.28
Sample Sample
Group size variance Sample Sample
Class size variance
Men 11 0.1025
Women 12 0.1241 Exhibition Powersports 9 44.05
Heavyweight Supermoto 9 9.76
a. Find the critical values necessary to construct a 95%
confidence interval for the ratio of population variances. Is there any evidence to suggest that the variability in speed for
b. Construct the confidence interval. the Exhibition Powersports class is greater than the variability
in speed for the Heavyweight Supermoto class? Use a 5 0.01
10.143 Take the Stairs The World Summit Wing Hotel in
and assume normality.
China is Beijing’s tallest hotel and hosts the Vertical Run. There
are 81 floors, 330 meters, and 2041 steps to reach the roof. 10.147 Manufacturing and Product Development The
Independent random samples of men’s and women’s times (in Akashi-Kaikyo bridge in Japan is the longest suspension bridge
minutes) from the 2013 run were obtained.27 VERTRUN in the world, with a main span of 1991 meters. Two million
a. Is there any evidence of a difference in variability of times workers took 10 years to construct this bridge using 181,000
for men and women who finished this vertical run? Use tons of steel and 1.4 million cubic meters of concrete. Indepen-
a 5 0.05 and assume normality. dent random samples of suspension bridges in China and the
b. Find bounds on the p value associated with this hypothesis United States were obtained, and the span (in meters) of each
test. was recorded. For China, n1 5 14 and s21 5 66,096.8; and for
the United States, n2 5 10 and s 22 5 59,524.5. Is there any
10.144 Manufacturing and Product Development A sail-
evidence to suggest a difference in the variability of the span
boat manufacturer has two machines for constructing main
of suspension bridges in China and the United States? Use
mast poles with diameter designed to be 76.2 mm. Small vari-
a 5 0.05 and assume normality.
ability in production is very important to ensure boat control
and safety. Independent random samples of mast poles pro-
duced on each machine were obtained, and the diameter of each Extended Applications
was carefully measured. The summary statistics were as fol- 10.148 Physical Sciences Crude oil pumped from ocean
lows: Machine A, n1 5 7, s21 5 0.0231; Machine B, wells contains salt that must be removed before the oil is
n2 5 8, s22 5 0.0096. Conduct the appropriate hypothesis test refined. Otherwise, equipment would erode quickly. Indepen-
to determine whether there is any evidence of a difference in dent random samples of unrefined crude oil from two ocean
variability of mast-pole diameter between the two machines. wells were obtained, and the percentage of salt in each sample
Assume normality, use a 5 0.05, find the p value associated was recorded. The data are given in the following table.
with this test, and use this value to draw a conclusion.
Sample Sample
10.145 Manufacturing and Product Development Tower
Oil well size variance
cranes along a city skyline often indicate the success of eco-
nomic development efforts. In early January 2013, more than North Sea 21 56.40
50 tower crane permits were in use in Washington, D.C. Inde- Antarctica 31 82.42
pendent random samples of items lifted by cranes at two differ-
ent sites were obtained, and the weight (in tons) of each item a. Conduct the appropriate test to determine whether there is
was recorded for each. The summary statistics are given in the evidence of any difference in variability of salt content
following table. between these two wells. Use a 5 0.10 and assume
normality.
Sample Sample
b. Use Table VII in the Appendix to find bounds on the p
Site size variance
value for this hypothesis test. Use technology to find an
New York Avenue 31 109.86 exact p value.
Jefferson at Market Place 31 339.43 10.149 Public Health and Nutrition Saccharin is a low-
calorie sweetener used in sugar-free foods and beverages.
Is there any evidence to suggest that the variability in item
According to the U.S. Food and Drug Administration, the
weight at Jefferson is greater than at New York Avenue? Use
acceptable daily intake (ADI) of saccharin is 5 mg for a per-
a 5 0.01 and assume normality.
son with a body weight of 60 kg. If saccharin is used as an
10.146 Sports and Leisure The Pikes Peak International additive, it must be included on the food label and cannot
Hill Climb is also known as The Race to the Clouds. This exceed certain limits. Independent random samples of
524 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
12-ounce bottles of iced tea from two different manufacturers the null hypothesis, H0: s 21 5 s 22 , is true, then the test statistic
were obtained, and the amount of saccharin in each drink was F 5 S 21 / S 22 is approximately normal with
measured (in mg). The summary statistics were as follows:
Fishing Creek, n1 5 20, s21 5 7.84; Honest Tea, n2 5 15, n2 2 1 2 ( n2 2 1 ) 2 ( n1 1 n2 2 4 )
mF 5 and s2F 5
s 22 5 2.89. n2 2 3 ( n1 2 1 )( n2 2 3 ) 2 ( n2 2 5 )
a. Conduct the appropriate test to determine whether there is An approximate hypothesis test is based on standardizing F to a
any difference in the population variance of saccharin Z random variable.
amounts. Use a 5 0.02.
b. Find bounds on the p value associated with this hypothesis
The four parts of the hypothesis test are
test. H0: s 21 5 s 22
10.150 Fuel Consumption and Cars The length of time Ha: s 21 . s 22, s 21 , s 22, or s 21 2 s 22
brake pads last in an automobile varies depending on driving ( S 21 / S 22 ) 2 3 ( n2 2 1 ) / ( n2 2 3 ) 4
style and the type of car. Brake pads are made from organic, TS: Z 5
semimetallic, metallic, or synthetic materials, and typically 2 ( n2 2 1 ) 2 ( n1 1 n2 2 4 )
last between 30,000 and 70,000 miles. Independent random Å ( n1 2 1 )( n2 2 3 ) 2 ( n2 2 5 )
samples of cars in for an inspection at dealerships and private RR: Z $ za, Z # 2za or 0 Z 0 $ za / 2
garages were obtained and the width (in mm) of the brake pad
The National Wind Energy Assessment contains data from
on the front driver’s side was measured for each. The data are
975 stations and includes measurements of wind speed and
given in the following table.
wind power density. Suppose independent random samples of
Sample Sample wind power density (in watts/m2) during the winter were
Location size variance obtained from two stations. The data are summarized in the
following table.
Dealership 41 1.056
Private garage 41 2.771 Sample Sample Sample
Location size mean variance
a. Is there any evidence to suggest that the variance in
brake-pad widths of cars in for inspection is greater at Chanute 31 207 95.35
private garages than at dealerships? Use a 5 0.01 and Dodge City 31 283 53.68
assume normality.
b. Find bounds on the p value associated with this hypothesis a. Write the four parts of a large-sample, two-sided,
test. approximate test based on the standard normal
distribution to determine whether there is any evidence to
suggest that the two population variances are different.
Challenge
Conduct the test using a 5 0.05.
10.151 Physical Sciences When we are comparing two pop- b. Conduct an exact hypothesis test based on the F
ulation variances, if both sample sizes, n1 and n2, are large and distribution. Compare your answer to part (a).
Independent samples 463 Two samples are independent if the process of selecting individuals or objects in
sample 1 has no effect on the selection of individuals or objects in sample 2.
Paired data set 463 The result of matching each individual or object in sample 1 with a similar indi-
vidual or object in sample 2.
( n1 2 1 ) S 21 1 ( n2 2 1 ) S 22
Pooled estimator for 475 S 2p 5
n1 1 n2 2 2
the common variance
= X1 1 X2
Combined estimate of 502 Pc 5
n1 1 n2
the common population
proportion
Chapter 10 Summary 525
Null Alternative
hypothesis Assumptions hypothesis Test statistic Rejection region
m1 2 m2 5 D0 Normality, m1 2 m2 . D0 ( X 1 2 X2 ) 2 D0 Tr $ ta,v
independence, Tr 5
m1 2 m2 , D0 S 21 S 22 Tr # 2ta,v
s21, s22 unknown, m1 2 m2 2 D0 1 0 Tr 0 $ ta/2,v
Å n1 n2
s21 2 s22.
s21 s22 2
a 1 b
n1 n2
v5
( s21 /n1 ) 2 ( s22 /n2 ) 2
1
n1 2 1 n2 2 1
mD 5 D0 Normality, mD . D0 D 2 D0 T $ ta,n21
n pairs, T5
mD , D0 SD/ !n T # 2ta,n21
dependence. mD 2 D0 0 T 0 $ ta/2,n21
= =
p1 2 p2 5 0 n1, n2 large, p1 2 p2 . 0 P1 2 P2 Z $ za
nonskewness, p1 2 p2 , 0 Z5
= = 1 1 Z # 2za
independence. p1 2 p2 2 0 Pc ( 1 2 Pc ) a 1 b 0 Z 0 $ za / 2
Å n1 n2
= X1 1 X2
Pc 5
n1 1 n2
= =
p1 2 p2 5 D0 n1, n2 large, p1 2 p2 . D0 ( P1 2 P2 ) 2 D0 Z $ za
nonskewness, p1 2 p2 , D0 Z5 = = = = Z # 2za
P1 ( 1 2 P1 ) P2 ( 1 2 P2 )
independence. p1 2 p2 2 D0 1 0 Z 0 $ za / 2
Å n1 n2
North Carolina 22 835.6 3192.25 Is there any evidence to suggest that the true proportion of chest
Virginia 25 884.2 3956.41 hits is greater in Lancashire than in West Mercia? Use a 5 0.01.
Is there any evidence to suggest that the mean amount of 10.154 Biology and Environmental Science Soybeans are
corrosive material carried by trucks in North Carolina is an important source of oil and protein and are also used to
Chapter 10 Exercises 527
produce many food additives. The leading producers of a. What is the common characteristic that makes these data
soybeans are the United States, Brazil, Argentina, and China. paired?
The first genetically modified (GM) soybeans were grown in b. Assume normality. Conduct the appropriate hypothesis
the United States in 1996, and now GM soybeans are grown in test to determine whether there is any evidence that the
at least nine countries. Independent random samples of soybean aluminum arrow flies faster. Use a 5 0.05.
farmers in the United States and Brazil were obtained, and the c. Find bounds on the p value associated with this
number growing GM soybeans was recorded. The data are hypothesis test.
given in the following table.
10.158 Manufacturing and Product Development
Sample Number of Raytheon Aircraft is now manufacturing business jets with a
Country size GM soybean farmers molded carbon fiber fuselage instead of aluminum. This
reduces the overall weight of the plane, speeds production time,
United States 238 202
and increases cabin space. The total wall thickness of a carbon
Brazil 162 104 fiber fuselage is 0.81 inch versus 3 inches for aluminum, and
a. Find the sample proportion of GM soybean farmers for the variability in thickness is theoretically much smaller also.
each country. Verify the nonskewness criterion. Independent random samples of the two fuselage types were
b. Conduct the appropriate hypothesis test to determine obtained, and the thickness was measured (in inches) on each.
whether there is any evidence that the true proportion of The data are given in the following table.
GM soybean farmers in the United States is 0.15 greater Fuselage Sample Sample
than in Brazil. Use a 5 0.05. type size variance
c. Find the p value associated with this hypothesis test.
Aluminum 9 0.0196
10.155 Biology and Environmental Science Maple syrup
Carbon fiber 11 0.0025
producers in New York and Vermont collect sweet-water sap
from sugar maples and black maples in early spring. It takes Is there any evidence to suggest that the variability in fuselage
approximately 30–50 gallons of sap to yield, through boiling thickness is less for carbon fiber fuselages? Use a 5 0.01.
and evaporation, 1 gallon of maple syrup. Independent
random samples of maple trees in both states were obtained, 10.159 Biology and Environmental Science Piers on public
and the amount of sap collected from each tree was recorded. beaches are usually supported by widely spread piles or pillars and
Assume the underlying populations are normal, with equal can extend a thousand feet into the ocean. Many piers are
variances. Is there any evidence to suggest that the population extensions of boardwalks, and visitors frequently fish or simply
mean amount of sap from trees in New York is different from sightsee along these walkways. Longer piers tend to be more
the population mean amount of sap from trees in Vermont? susceptible to wind and storm damage. Independent random
Use a 5 0.01. samples of concrete and wooden piers on public beaches along the
SAP
California and Florida coasts were obtained and the length (in feet)
10.156 Physical Sciences Recycling of aluminum, glass, of each was recorded. Assume the underlying populations are
newspapers, and magazines is good for the environment and normal, with equal variances. Is there any evidence to suggest that
the economy. In 2013, San Francisco had the highest recy- the population mean pier length in California is greater than the
cling rate in the United States30 (recycling rate 5 tons population mean pier length in Florida? Use a 5 0.01. PIERS
collected for recycling/tons of all waste generated). Despite
efforts to make the process easier, many people still do not 10.160 Public Health and Nutrition In case you missed it,
recycle. Independent random samples of residents in Ohio the United Nations declared 2008 as the International Year of the
and in Florida were obtained and asked whether they recycle Potato. Seriously, potatoes are a good source of carbohydrates,
newspapers. Of the 909 Ohio residents, 700 said they protein, fiber, and potassium. However, the amount of each
recycled newspapers, and 691 of the 923 Florida residents element varies depending on where the potato is grown. Indepen-
said they recycled newspapers. dent random samples of medium-sized potatoes from Russia and
a. Is there any evidence to suggest that the population China were obtained, and the amount of potassium (in mg) was
proportion of residents in Ohio who recycle newspapers measured in each. The data are summarized in the following table.
is greater than the population proportion of residents in Sample
Florida? Use a 5 0.01. Sample Sample standard
b. Find the p value for this hypothesis test. Location size mean deviation
10.157 Sports and Leisure Archery target shooters use a Russia 25 896.8 92.9
variety of arrows made from wood, carbon, aluminum, or even China 30 866.0 120.0
platinum. One measure of the quality of an arrow (and bow) is
the speed of the arrow when shot. A random sample of archers Assume normality and equal population variances. Is there any
was obtained, and each was asked to shoot a carbon arrow and a evidence to suggest that the population mean potassium level is
similarly made aluminum arrow. The speed (in feet per second) different for a medium-sized potato in Russia and China? Use
of each arrow was measured. ARCHERY a 5 0.05.
528 C hapte r 1 0 Confidence Intervals and Hypothesis Tests Based on Two Samples or Treatments
10.161 Biology and Environmental Science The n2 5 1448 and 87 were contaminated. Is there any evidence
moisture content in bulk grain is important, because high to suggest that the true proportion of contaminated cat food
values can encourage the development of fungi. Potential is different from the true proportion of contaminated dog
buyers want to know how much water they are buying along food? Use a 5 0.05.
with their grain. Two direct methods for measuring the
moisture content are by means of a chemical reaction (with
iodine in the presence of sulfur dioxide) and by distillation. A EXTENDED APPLICATIONS
random sample of bulk grain was obtained, and the moisture 10.165 Economics and Finance The U.S. Internal
content of each grain sample was measured as a percentage of Revenue Service estimates that the average taxpayer takes
water using each method. Assuming normality, conduct the approximately six hours to complete Form 1040. A study was
appropriate hypothesis test to determine whether there is any conducted to examine the amount of time it takes to complete
difference in the population mean moisture content of bulk this dreaded form, by income level. Independent random
grain measured by chemical reaction and by distillation. Use samples of federal filers in two income ranges were obtained,
a 5 0.05. GRAIN and the length of time (in hours) to complete Form 1040 was
10.162 Psychology and Human Behavior Two recent recorded for each. The summary statistics are given in the
studies suggest that people who drive really nice cars exhibit following table.
some very bad habits. In one study, as a car approached a Income level Sample Sample Sample
crosswalk, a person stepped into the road, and the driver’s (in dollars) size mean variance
reaction was recorded. In another, similar study, independent
50,000–<100,000 17 4.56 1.5625
random samples of drivers were selected, and their behavior
was observed at a four-way intersection. For luxury-car drivers, 100,000–<200,000 14 6.58 15.0544
n1 5 217 and 130 cut ahead in the usual four-way rotation. For
Assume the underlying populations are normal.
ordinary-car drivers, n2 5 182 and 82 violated the four-way-
a. Conduct an F test to determine whether there is any
intersection rotation rule. Is there any evidence to suggest that
evidence that the two population variances are different.
the proportion of luxury-car drivers with insufferable driving
Use a 5 0.02.
habits is greater than the proportion of ordinary-car drivers with
b. Using your conclusion from part (a), conduct the
similar habits? Use a 5 0.01. Note: The largest group of
appropriate test for evidence that the mean time to
driving-rule etiquette violators were men, ages 35–50, with
complete Form 1040 for the lower-income level is less
blue BMWs.31
than the mean time for the higher-income level. Use
10.163 Public Policy and Political Science California law a 5 0.05. State your conclusion and find bounds on the
requires fuel outlets to install special catch basins designed to p value.
contain gasoline leaks in underground storage tanks. Owners
10.166 Public Policy and Political Science In many
who do not comply can face stiff fines and other penalties.
Independent random samples of gasoline stations around Los states, lawyers are encouraged to do pro bono work by both
Angeles and around San Francisco were obtained, and each their firms and judicial advisory councils. However, in recent
station was inspected for catch basins. Sixteen of 140 stations years lawyers have been devoting more time to paying clients
near Los Angeles had no catch basins, and 12 of 126 in San and less time to pro bono legal aid. Independent random
Francisco were not complying with the law. samples of lawyers from two large firms were obtained, and
a. Find the sample proportion of stations without the number of pro bono hours for the past year was recorded
catch basins near each city. Verify the nonskewness for each lawyer. The summary statistics are given in the
criterion. following table.
b. Is there any evidence that the population proportion Sample Sample Sample
of stations in noncompliance with the law is different Law firm size mean variance
near Los Angeles and near San Francisco? Use
Dewey, Cheatum, & Howe 26 75.1 5.92
a 5 0.01.
Fine, Howard, & Fine 26 80.9 5.65
10.164 Manufacturing and Product Development Dur-
ing Summer 2013, Procter and Gamble (P&G) recalled 30 Assume the underlying populations are normal, with equal
different types of cat and dog food because they may have variances.
been contaminated with Salmonella. While pets can become a. Is there any evidence to suggest the mean number of
ill from eating contaminated foods, the Centers for Disease yearly pro bono hours is different at these two law firms?
Control and Prevention also reminded people to wash their Use a 5 0.01.
hands thoroughly after handling pet food. Independent b. Construct a 99% confidence interval for the difference in
random samples of P&G cat foods and dog foods were mean pro bono hours.
obtained, and each was tested for Salmonella. For cat food, c. Does the confidence interval in part (b) support your
n1 5 1250 and 50 were contaminated, and for dog food, conclusion in part (a)? Explain.
Chapter 10 Exercises 529
Looking Back
n Recall the statistical tests use to compare two population means.
n Remember the assumptions and four parts to a two-sample t test.
Looking Forward
n Learn how to compare k (. 2) population means.
n Understand the assumptions and conduct one-way, single-factor, analysis of variance
(ANOVA) tests.
n Learn methods to isolate group differences if an ANOVA test is significant.
n Understand the assumptions, how the total variation is decomposed, and the hypothesis
tests in a two-way ANOVA.
531
532 C hap te r 1 1 The Analysis of Variance
Or from three
different
populations?
1 2 3
Figure 11.1 A visualization of a typical ANOVA problem.
The notation used in a one-way ANOVA is similar to and is an extension of the nota-
tion used in Chapter 10.
ANOVA Notation
k 5 the number of populations under investigation.
Population 1 2 c i c k
Population mean m1 m2 c mi c mk
Population variance s21 s22 c s2i c s2k
Sample size n1 n2 c ni c nk
Sample mean x1. x2. c xi. c xk.
Sample variance s21 s22 c s2i c s2k
n 5 n1 1 n2 1 c1 nk
5 the total number of observations in the entire data set
The null and alternative hypotheses are stated in terms of the population means.
H0: m1 5 m2 5 c5 mk (All k population means are equal.)
Ha: mi 2 mj for some i 2 j (At least two of the k population means differ.)
11.1 One-Way ANOVA 533
The assumptions for this test procedure are similar to those for a two-sample t test.
1. The k population distributions are normal.
2. The k population variances are equal; that is, s21 5 s22 5 c5 s2k .
For reference, these are the
one-way ANOVA assumptions.
3. The samples are selected randomly and independently from the respective populations.
To denote observations, we use a single letter with two subscripts. The first subscript
indicates the sample number and the second subscript denotes the observation number
within that sample. In general,
xij 5 the jth measurement taken from the ith population.
Xij 5 the corresponding random variable.
A comma is placed between i and j if there is ambiguity. For example, x1,23 is the 23rd
observation in the first sample, but x12,3 is the 3rd observation in the 12th sample.
The mean of the observations in the ith sample is
x.. 5 a a xij
1 k ni
(11.2)
n i51 j51
There is just a little more notation that will make some of the calculations easier.
ti. 5 a xij
ni
A more theoretical development 5 sum of the observations in the ith sample.
j51
of a one-way ANOVA test includes
t.. 5 a a xij 5 sum of all the observations.
k ni
the corresponding random
variables Xi., X.., Ti. , and T... i51 j51
Here’s where the analysis of variance plays a role. The total variation in the data (the
total sum of squares) is decomposed into a sum of between-samples variation (the sum of
squares due to factor) and within-samples variation (the sum of squares due to error). The
total variation is the variability of individual observations from the grand mean. The
between-samples (or between-factor) variation is the variability in the sample means; this
tells us how different the sample means are from each other. The within-samples variation
is the variability of the observations from their sample mean; this is just like the sample
variance we have already been using. The three sums of squares are defined in the follow-
ing fundamental identity, which shows the decomposition of the total variation in the data.
A Closer L ok
1. SSA is used to denote the sum of squares due to factor instead of SSF because in a
two-way ANOVA there is a factor A and a factor B.
2. The sample size is used as a weight in the expression for SSA.
534 C hap te r 1 1 The Analysis of Variance
If the null hypothesis is true, then each observation comes from the same population
with mean m and variance s2 . Therefore, the sample means, the xi. ’s, should all be about
the same, should all be close to the grand mean, x.. , and SSA should be (relatively)
small. If at least two population means are different, then at least two xi. ’s should be
very different, and these values will be far away from x.. . In this case, SSA should be
(relatively) large.
The one-way ANOVA test statistic is based on two separate estimates for the common
variance, s2, computed using SSA and SSE.
Definition
The mean square due to factor, MSA, is SSA divided by k 2 1:
SSA
MSA 5 (11.4)
k21
The mean square due to error, MSE, is SSE divided by n 2 k:
SSE
MSE 5 (11.5)
n2k
If the null hypothesis is true, then (the random variable) MSA is an unbiased esti-
mator of s2 . If Ha is true, then MSA tends to overestimate s2 . The mean square due
to error, MSE, is an unbiased estimator of the common variance s2 whether H0 or Ha
is true.
Stepped
STEPPED Tutorial
TUTORIALS Consider the ratio F 5 MSA/MSE. If the value of F is close to 1, then the two esti-
Conditions
BOX PLOTS and mates of s2, or sources of variation, are approximately the same. There is no evidence to
the F test suggest that the population means are different. If the value of F is much greater than 1,
then the variation between samples is greater than the variation within samples. This sug-
gests that the alternative hypothesis is true.
One-way ANOVA does not mean If the one-way ANOVA assumptions are satisfied and H0 is true, then the statistic
a one-sided statistical test, but F 5 MSA /MSE has an F distribution with k 2 1 and n 2 k degrees of freedom. Because
indicates that there is one factor. large values of F suggest that Ha is true, the rejection region is only in the right tail of the
distribution.
If the value of F is smaller than the critical value, then there is no evidence from the data
to reject H0. If the value of F is in the rejection region, we say that the F test is significant
and that there is a statistically significant difference among the population means. Recall
that we can also use the p value to conduct an equivalent test: If p # a, then we reject H0.
Remember that if any of the one-way ANOVA assumptions are violated, then the
conclusion is unreliable. Several methods to check for normality were presented in
Section 6.3, and there are statistical procedures for testing equality of variances. The
samples must also be selected randomly and independently from the appropriate
populations.
The following computational formulas (rather than the definitions) are used to find the
sums of squares, and then the mean squares.
11.1 One-Way ANOVA 535
5
i51 j51
5
definition computational formula
One last detail: One-way ANOVA calculations are often presented in an analysis of
variance table, or ANOVA table. The values included in this table are associated with
the three sources of variation and the calculation of the F statistic.
One-way ANOVA summary table
Source of Sum of Degrees of Mean
variation squares freedom square F p Value
SSA MSA
Factor SSA k21 MSA 5 p
k21 MSE
SSE
SOLUTION TRAIL
Error SSE n2k MSE 5
n2k
Data Set
Total SST n21
sodium
The following example illustrates the computations involved in a one-way ANOVA
and the process of making an inference based on the value of the test statistic.
Solution Trail 11.1 Example 11.1 Take with a Grain of Salt
K e ywor ds A recent research study suggests that a high-salt diet in older women increases the risk of
n Is there any evidence? breaking a bone.2 Although the biological mechanism is still unclear, there does appear to be
n at least two of the population an association between excessive sodium intake and bone fragility. One measure of bone
means are different health is the vitamin D blood level. As determined by food questionnaires, independent ran-
n four categories dom samples of older women in four different salt intake categories were obtained. The vita-
min D blood level (in nmol/L) was measured in each. The data are given in the following table.
n Independent random samples
T ransl ation
Sample Observations
n Statistical inference Very high 91.5 77.5 94.5 77.5 92.0
n analysis of variance High 89.0 92.0 98.2 80.0 86.7
Conc epts Moderate 92.5 100.7 94.0 93.3 106.3
n One-way ANOVA test Low 100.1 98.0 99.1 103.9 97.6
procedure
Is there any evidence to suggest that at least two of the population mean vitamin D blood
Vision
levels are different? Use a 5 0.05.
Compute the summary statis-
tics necessary to complete the Solution
ANOVA summary table. Find
the critical value, and draw the Step 1 Assume the one-way ANOVA assumptions are true. There are k 5 4 samples or
appropriate conclusion. groups and n 5 5 1 5 1 5 1 5 5 20 total observations. Some summary statis-
tics are given in the following table.
536 C hap te r 1 1 The Analysis of Variance
The summary statistics table Sample Sample size Sample total Sample mean Sample variance
suggests a possible violation of
the equal-variances assumption.
Very high n1 55 t1. 5 433.0 x1. 5 86.60 s21 5 70.30
There are formal statistical High n2 55 t2. 5 445.9 x2. 5 89.18 s22 5 44.94
procedures for testing equality Moderate n3 55 t3. 5 486.8 x3. 5 97.36 s23 5 35.62
of (several) variances, but the Low n4 55 t4. 5 498.7 x4. 5 99.74 s24 5 6.36
ANOVA test statistic is robust.
That is, even if the population The sum of all the observations, or grand total, is
variances are a little different, the t.. 5 433.0 1 445.9 1 486.8 1 498.70 5 1864.4
test still provides reliable results. In
Step 2 The four parts of the hypothesis test are
addition, because the sample sizes
are small, it is more difficult to H0: m1 5 m2 5 m3 5 m4 (all four population means are equal)
detect a real difference in Ha: mi 2 mj for some i 2 j (at least two population means differ)
population variances. MSA
TS: F 5
MSE
RR: F $ Fa,k21,n2k 5 F0.05,3,16 5 3.24
Step 3 Find the total sum of squares.
SST 5 a a a x2ij b 2
k ni
t2..
Computational formula for SST.
i51 j51 n
2
1864.4
5 ( 91.52 1 77.52 1 c1 97.62 ) 2 Apply the formula.
20
5 175,027.24 2 173,799.368 5 1227.872 Simplify.
SSA 5 a a b 2
k 2
ti. t2..
Computational formula for SSA.
i51 ni n
433.02 445.92 486.82 498.72 1864.42 Use sample totals and
5a 1 1 1 b2 grand total.
5 5 5 5 20
5 174,398.348 2 173,799.368 5 598.98 Simplify.
Use these two values to find the sum of squares due to error.
SSE 5 SST 2 SSA 5 1227.872 2 598.98 5 628.892
Step 4 Compute MSA and MSE.
The next reasonable question is, MSA 5 SSA/ ( k 2 1 ) Definition.
“Which pairs of means are 5 598.98/ ( 4 2 1 ) 5 199.66 Use SSA and k 5 4 groups.
contributing to this overall
MSE 5 SSE/ ( n 2 k ) Definition.
difference?” We’ll address this
issue in Section 11.2. 5 628.892/ ( 20 2 4 ) 5 39.3058 Use SSE, n 5 20 total observations, and k 5 4.
Step 7 Here’s how all of these calculations are presented in an ANOVA table.
VIDEO Tech
Video TECH Manuals
MANUALS
EXEL DISCRIPTIVE
One-way ANOVA
Figure 11.3 ANOVA
one-way results.
Try It Now Go to Exercise 11.15
Statistical Applet
One-way In the next example, fewer detailed calculations are shown. However, the ANOVA
ANOVA table is presented, with the focus on the inference process.
SSA 5 a a b 2 5 c5 6.6255
k 2
ti. t2..
i51 ni n
Use these two values to find the sum of squares due to error.
SSE 5 SST 2 SSA 5 110.5687 2 6.6255 5 103.9432
Step 3 Compute MSA and MSE.
MSA 5 SSA/ ( k 2 1 ) Definition.
( )
5 6.6255/ 5 2 1 5 1.6564 Use SSA and k 5 5 groups.
X F4,25 MSE 5 SSE/ ( n 2 k ) Definition.
Technology Corner
Procedure: Conduct a one-way analysis of variance test.
Reconsider: Example 11.1, solution, and interpretations.
CrunchIt!
Use the ANOVA; one-way function.
1. Enter the data from sample 1 into column Var1, from sample 2 into column Var2, from sample 3 into column Var3, and
from sample 4 into column Var4.
2. Select Statistics; ANOVA; One-way.
3. Under the Columns tab, check the column names containing the data.
4. Click Calculate. The results are displayed in a new window. Refer to Figure 11.3.
11.1 One-Way ANOVA 539
TI-84 Plus C
Use the built-in function ANOVA.
1. Enter the data from sample 1 into list L1, from sample 2 into list L2, from sample 3 into list L3, and from sample 4 into list L4.
2. Select STAT ; TESTS; ANOVA.
3. Enter the lists containing the data as function arguments. See Figure 11.6.
4. Press ENTER . The results are displayed on the Home Screen. See Figure 11.7.
Minitab
Minitab has several built-in functions to conduct an analysis of variance test, depending on the model. Use One-Way for
the model presented in this section.
1. Enter the data from sample 1 into column C1, from sample 2 into column C2, from sample 3 into column C3, and
from sample 4 into column C4. Each sample is in a separate column.
2. Select Stat; ANOVA; One-Way.
3. Select Response data are in a separate column for each factor level. Enter the Responses: the columns containing the data.
4. Select Options. Check the box to Assume equal variances and select an appropriate Confidence level and Type of
confidence interval.
5. Click OK. The output is displayed in a session window and an Interval Plot is constructed in a graph window. See
Figure 11.8 and 11.9.
Figure 11.8 Minitab ANOVA one-way Figure 11.9 Interval plot associated with
results. one-way ANOVA results.
540 C hapt e r 1 1 The Analysis of Variance
Excel
Use the Data Analysis Toolkit function Anova: Single Factor.
1. Enter the data from sample 1 into column A, from sample 2 into column B, from sample 3 into column C, and from
sample 4 into column D.
2. Under the Data tab, select Data Analysis; Anova: Single Factor.
3. Enter the Input Range, choose Grouped By Columns, and select an Output option. Click OK to display the results.
See Figure 11.10.
11.18 Sports and Leisure A study was conducted to deter- syrups. Independent random samples of four cough syrups were
mine the tension needed by various types of guitar strings in obtained, and the amount of sorbitol per teaspoon (in grams)
order to produce the proper frequency. A high E was used for was measured. Conduct the appropriate test to determine
comparison, and tension was measured in newtons. The guitar whether there is any evidence that at least two population mean
string brands and summary statistics are given in the following levels of sorbitol are different. Use a 5 0.01 and include an
table. ANOVA table. SORBITOL
obtained, and the DOC was measured for each (in mg/L).7 Is Sample Sample Sample
there any evidence to suggest at least two population mean Club size mean variance
DOC concentrations are different? Use a 5 0.01. DOCS
AAA 15 36.8 10.18
11.27 Kerosene Purity Many people who live in northern Discover 17 43.8 9.97
states supplement their main heating system with kerosene Executive 20 34.8 12.15
heaters. These devices are generally inexpensive, but a kerosene
a. Use the definitions to find SSA and SSE. Use these two
flame emits a noticeable ash and the fumes must be well venti-
values to find SST.
lated. The flash point of kerosene is related to purity, ash,
b. Complete an ANOVA table, and use this information to
fumes, and safety. A low flash point is a fire hazard and may be
determine whether there is any evidence to suggest that at
an indication of contamination. Independent random samples of
least two population mean waiting times are different.
kerosene from five dealers in Maine were obtained and the flash
Use a 5 0.01.
point of each was measured (in °C). Is there any evidence to
suggest that at least two of the population mean flash points are 11.31 Physical Sciences During the summer months, gro-
different? Use a 5 0.05. KEROSENE cery stores, convenience stores, and gas stations sell bags of ice.
Independent random sample of bags were obtained from various
11.28 Expense IQ A recent publication by Concur provided
locations, and the weight (in pounds) of each was recorded. The
a summary of travel expenses for both small and medium-sized
summary statistics are given in the following table.
businesses (SMBs) and large-scale companies.8 A random sam-
ple of business trips to some of the most visited international Sum of
cities was obtained, and the amount spent on ground transporta- Sample Sample squared
tion was recorded (in dollars). Is there any evidence to suggest Location size total observations
that at least two of the population mean amounts spent on Giant 10 80.8 654.38
ground transportation are different? Use a 5 0.01. EXPENSE Sheetz 10 88.8 789.20
Star Market 14 119.9 1028.31
Unimart 12 104.1 903.57
Extended Applications
Weis 15 139.7 1303.41
11.29 Manufacturing and Product Development Many
a. Do the data suggest that the population mean weight of bags
cities, towns, and college campuses use a rotary broom to
of ice is the same at these five locations? Use a 5 0.001.
sweep debris and snow from sidewalks. The pressure created by
b. If each store sells these bags of ice for approximately the
the broom is one measure of how effective the machine will be
same price, where would you make your purchase?
in removing debris. Random samples of rotary brooms from
Justify your answer.
four manufacturers were obtained, and the pressure of each
broom was measured (in psi). The data are given in the follow- 11.32 Sports and Leisure The ESPN Home Run Tracker
ing table. BROOMS computes a variety of measurements for every home run hit in
Major League Baseball, including true distance, speed off bat,
Company Observations elevation angle, and apex. Independent random samples of
Ditch Witch 2081 1980 2210 2297 home runs during the 2013 season were obtained from six
2204 2765 2327 different ballparks, and the true distance was recorded for
Schwarze 2567 1799 2422 2437 each.9 HOMERUN
2367 2244 2245 a. Is there any evidence to suggest that at least two
Elgin 2228 2581 2364 2375 population mean home-run distances are different? Use
2066 2091 2543 a 5 0.05.
b. Do you think there are any violations in the one-way
Holder 2905 2695 2503 2931
ANOVA assumptions? If so, why?
2657 2591 2138
c. Do the data suggest that population mean home-run
a. Conduct the appropriate test to determine whether there is distance is greatest in one ballpark? Why do you suppose
any evidence that at least two population mean pressures the distances tend to be greater in that ballpark?
are different. Use a 5 0.05.
b. Which manufacturer would you recommend and why? Challenge
11.30 Fuel Consumption and Cars Automobile service 11.33 Public Policy and Political Science Every city has
clubs offer free maps, trip-interruption protection, payment for design specifications for streets, including minimum right of
some legal fees, and roadside assistance. Independent random way, minimum vertical grade, and minimum centerline radii on
samples of roadside service calls were selected for three differ- curves. Independent random samples of city streets were
ent clubs. The time required (in minutes) for a tow truck to obtained from Washington, D.C., and New York City, and the
arrive was recorded for each call. The summary statistics are width (in feet) of a randomly selected section was recorded.
given in the following table. The data are given in the following table. STREETS
544 C hapt e r 1 1 The Analysis of Variance
City Observations variance s21 5 100. Generate a second random sample of size
New York 28.2 32.9 34.6 31.5 31.6 29.5 n2 5 20 from a normal distribution with mean m2 5 50 and
City 30.3 29.2 25.6 28.6 28.8 31.5 variance s22 5 100. Generate a third random sample of size
n3 5 20 from a normal distribution with mean m3 5 52 and
Washington, 32.5 36.3 34.3 33.0 31.0 36.5
variance s23 5 100. Conduct a one-way ANOVA test with
D.C. 29.8 30.0 30.2 34.7 31.9 30.0
a 5 0.05 to determine whether there is any evidence to suggest
a. Conduct a two-sample t test to determine whether there is that the population means differ.
any evidence to suggest that the population mean street Repeat this process 100 times and record the proportion of
widths are different. Assume the population variances are times you reject the null hypothesis, pr.
equal and use a 5 0.05. Find the exact p value associated
with this test. Let mT 5 ( m1 1 m2 1 m3 ) /3 and compute the effect size e,
a ( mi 2 mT ) /3
b. Conduct a one-way analysis of variance test to determine 3
2
whether there is any difference among the k 5 2
i51
population mean street widths due to city. Use a 5 0.05. e5
Find the exact p value associated with this test. ã 100
c. What is the relationship between the value of the test Plot the point (e, pr).
statistic in part (a) and the value of the test statistic in part
Repeat this process for various values of m1, m2, and m3, and
(b)? How are the p values related? Why do these
therefore, various effect sizes. Plot the points (e, pr).
relationships make sense?
Explain the resulting graph. What concept related to an
11.34 The Effect Size Generate a random sample of size ANOVA test does this graph illustrate?
n1 5 20 from a normal distribution with mean m1 5 50 and
The procedures presented here provide methods for capping the probability of making at
least one mistake in all of the comparisons.
Although we could actually conduct hypothesis tests, usually we construct multiple
confidence intervals for the difference between population means. Recall that, if a confi-
dence interval for m1 2 m2 contains 0, there is no evidence to suggest that the population
means are different. However, if 0 is not included in the confidence interval, there is evi-
dence to suggest that the two population means are different. If we want to find a
100 ( 1 2 a ) % confidence interval for all possible paired comparisons (i.e., so that the
overall probability is a), we must make the intervals much wider than those for individual
differences of means.
In a one-way ANOVA with three groups, suppose there is evidence to suggest that at
least one pair of means is different and a multiple comparison procedure produces the
following confidence intervals.
Difference Confidence interval
m1 2 m2 (21.21, 9.00)
m1 2 m3 ( 3.09, 13.31)
m2 2 m3 (20.81, 9.41)
Zero is included in the confidence intervals for the differences m1 2 m2 and m2 2 m3 .
There is no evidence to suggest that these pairs of population means are different. The
confidence interval for m1 2 m3 does not include 0. There is evidence to suggest that m1
is different from m3.
The general form of each Bonferroni confidence interval is very similar to a confi-
dence interval for the difference between two means based on a t distribution using a
pooled estimate of the common variance (introduced in Section 10.2). Here, we use the
mean square due to error (MSE) as an estimate of the common variance. And a t critical
value is used to achieve a simultaneous, or familywise, confidence level of 100 ( 1 2 a ) %.
1 1
( xi. 2 xj. ) 6 ta/(2c),n2k "MSE 1 for all i 2 j (11.6)
Å ni nj
The ANOVA test is significant at the p 5 0.0016 level. There is evidence to suggest that
at least one pair of population means is different (an overall difference). Construct the
Bonferroni 95% confidence intervals and use them to isolate the pair(s) of means contrib-
uting to this overall experiment difference.
Solution
Step 1 The number of pairwise comparisons needed is
4#3
c5 56
2
a 0.05
95% 5 100 ( 1 2 a ) % 1 a 5 0.05 1 5 # 5 0.0042
2c 2 6
This (infrequent) right-tail probability is not specified in Table V in the Appen-
dix; however, one can use linear interpolation or technology, with n 2 k 5
40 2 4 5 36 degrees of freedom, to find ta/(2c),36 5 t0.0042,36 5 2.7888.
Step 2 The Bonferroni confidence interval for the difference m1 2 m2 is
1 1
( x1. 2 x2. ) 6 ta/(2c),n2k "MSE 1
Å n1 n2
1 1
5 ( x1. 2 x2. ) 6 t0.0042,36 "MSE 1
Å n1 n2
1 1
5 ( 402.3 2 421.1 ) 6 ( 2.7888 ) "4585.7 1
Å 10 10
5 ( 2103.3, 65.7 )
Step 3 The remaining five confidence intervals are found in the same manner. Each
Bonferroni confidence interval and its corresponding conclusion are shown in
the following table.
Bonferroni
Difference confidence interval Conclusion
m1 2 m2 ( 2103.3, 65.7 ) 0 in CI. m1 and m2 are not significantly different.
m1 2 m3 ( 28.3, 160.7 ) 0 in CI. m1 and m3 are not significantly different.
m1 2 m4 ( 3.5, 172.5 ) 0 not in CI. Evidence to suggest that m1 2 m4.
m2 2 m3 ( 10.5, 179.5 ) 0 not in CI. Evidence to suggest that m2 2 m3.
m2 2 m4 ( 22.3, 191.3 ) 0 not in CI. Evidence to suggest that m2 2 m4.
m3 2 m4 ( 272.7, 96.3 ) 0 in CI. m3 and m4 are not significantly different.
Finally, the initial ANOVA test indicates that there is an overall difference among
the four population means. The simultaneous 95% Bonferroni confidence inter-
vals suggest that this overall difference is due to a difference between m1 and m4,
m2 and m3, and m2 and m4. Figure 11.11 shows a technology solution.
11.2 Isolating Differences 547
There is another common, compact, graphical method for summarizing the results of
a multiple comparison procedure. Write the sample means in order from smallest to larg-
est. Use the results from a multiple comparison procedure to draw a horizontal line under
the groups of means that are not significantly different.
Here is the graphical summary for Example 11.3.
1. The sample means in order:
x4. x3. x1. x2.
Sample mean 314.3 326.1 402.3 421.1
2. There is no significant difference between m1 and m2. Draw a horizontal line under the
sample means from population 1 and 2.
x4. x3. x1. x2.
Sample mean 314.3 326.1 402.3 421.1
3. There is no significant difference between m1 and m3. Draw a horizontal line under the
sample means from population 1 and 3. There is no significant difference between m3
and m4. Draw a horizontal line under the sample means from population 3 and 4.
x4. x3. x1. x2.
Sample mean 314.3 326.1 402.3 421.1
Those pairs of means not connected by a horizontal line are significantly different.
1 1 1
( xi. 2 xj. ) 6 Qa,k,n2k "MSE 1 for all i 2 j (11.7)
"2 Å i
n nj
548 C hapte r 1 1 The Analysis of Variance
If all pairwise comparisons are considered, the Bonferroni procedure produces wider
confidence intervals than the Tukey procedure. However, if only a subset of all pairwise
comparisons is needed, then the Bonferroni method may be better. There are also other
methods for comparing population means following an ANOVA test. No single compari-
son method is uniformly best.
In the following example, Tukey’s procedure is used to isolate pairwise differences
contributing to an overall significant ANOVA test.
Group number 1 2 3 4
Region Northeast South Midwest West
Sample size 10 11 12 11
Sample mean 15.64 13.99 9.50 12.98
The ANOVA test is significant at the p 5 0.0016 level. There is evidence to suggest that
at least one pair of means is different (an overall difference). Construct the Tukey 95%
simultaneous confidence intervals and use them to isolate the pair(s) of means contribut-
ing to this overall experiment difference.
Solution
Step 1 The number of pairwise comparisons needed is A 42 B 5 6 and a 5 0.05.
The Studentized range critical value (Table VIII in the Appendix) is Qa,k,n2k 5
Q0.05,4,40 5 3.791.
Step 2 The first confidence interval, for the difference m1 2 m2 , using Tukey’s proce-
dure is
1 1 1
( x1. 2 x2. ) 6 Q0.05,k,n2k "MSE 1
"2 Å 1
n n2
1 1 1
5 ( 15.64 2 13.99 ) 6 ( 3.791 ) "12.34 1
"2 Å 10 11
5 1.65 6 4.11 5 ( 22.47, 5.75 )
Step 3 The remaining confidence intervals and conclusions are given in the following
table.
11.2 Isolating Differences 549
Video Tech Manuals The Bonferroni multiple comparison procedure is conservative. This means the overall
VIDEO TECH MANUALS resulting confidence level is probably higher than the specified value. To compute Bonfer-
One-way ANOVA -
EXEL DISCRIPTIVE
follow-up analysis: roni intervals, the overall confidence level is simply divided evenly among all pairwise
Bonferroni, Tukey comparisons. If we are only interested in some of the A k2 B 5 c pairwise comparisons, the
overall confidence level is divided accordingly. In this case, the resulting confidence level
may be closer to the specified value. Tukey’s procedure is based on a special distribution
and tends to result in more accurate confidence levels. Therefore, if all pairwise compari-
sons are necessary, Tukey’s multiple comparison procedure is usually better.
Consider the Bonferroni confidence intervals for Example 11.4.
Bonferroni
Difference confidence interval Conclusion
m1 2 m2 ( 22.61, 5.91 ) 0 in CI. m1 and m2 are not significantly different.
m1 2 m3 ( 1.97, 10.31 ) 0 not in CI. Evidence to suggest that m1 2 m3.
m1 2 m4 ( 21.60, 6.92 ) 0 in CI. m1 and m4 are not significantly different.
m2 2 m3 ( 0.42, 8.56 ) 0 not in CI. Evidence to suggest that m2 2 m3.
m2 2 m4 ( 23.14, 5.16 ) 0 in CI. m2 and m4 are not significantly different.
m3 2 m4 ( 27.55, 0.59 ) 0 in CI. m3 and m4 are not significantly different.
Notice that each confidence interval is wider than the corresponding Tukey confidence
interval. A wider, more conservative, Bonferroni confidence interval could result in a dif-
ferent conclusion concerning the difference between two population means. For example,
compare the Tukey and Bonferroni confidence intervals for m2 2 m3; the left endpoint of
the Bonferroni CI is closer to 0.
Technology Corner
Procedure: C
onstruct simultaneous confidence intervals to isolate the pair(s) of means contributing to an overall
experiment difference detected using an analysis of variance test.
Reconsider: Example 11.3, solution, and interpretations.
CrunchIt!, the TI-84 Plus C, and Excel do not have built-in functions to construct Bonferroni or Tukey confidence inter-
vals. In Excel, one can use ordinary spreadsheet functions to construct the Bonferroni confidence intervals.
550 C hap te r 1 1 The Analysis of Variance
Minitab
There is an option to One-Way Analysis of Variance to construct the Tukey confidence intervals. There is no option for
Bonferroni confidence intervals.
1. The data may be entered in a one column for all factor levels or in a separate column for each factor level. In this
example, enter the data from samples 1, 2, 3, and 4 into columns C1, C2, C3, and C4, respectively.
2. Select Stat; ANOVA; One-Way. Choose Response data are in a separate column for each factor level.
3. Enter the Responses: the columns containing the data.
4. Choose the Options button. Check the box to Assume equal variances. Select an appropriate Confidence level and Type
of confidence interval.
5. Choose the Comparisons options button. Enter an Error rate for comparisons which is 1 2 confidence level, and check
Tukey. Check Interval plot for the differences of means, Grouping information, and Tests as desired.
6. Click OK. The output is displayed in a session window. See Figure 11.12. Note that the mean differences are reversed.
11.35 True/False The Bonferroni multiple comparison pro- x1. 5 52.8 m1 2 m2 ( 26.00, 17.85 )
cedure is conservative. x2. 5 46.9 m1 2 m3 ( 29.50, 14.35 )
x3. 5 50.4 m2 2 m3 ( 215.42, 8.42 )
11.36 True/False The Bonferroni multiple comparison pro-
cedure can be used even if we are only interested in some of the
pairwise comparisons. b. Sample means Difference Confidence interval
11.37 True/False For Tukey’s multiple comparison proce- x1. 5 4.82 m1 2 m2 ( 25.69, 21.13 )
dure to be valid, all sample sizes must be the same. x2. 5 8.23 m1 2 m3 ( 24.49, 0.07 )
11.38 True/False In a one-way ANOVA with k groups, sup- x3. 5 7.03 m2 2 m3 ( 21.08, 3.48 )
pose we reject the null hypothesis. Using a multiple comparison
procedure, there must be at least one pairwise comparison in c. Sample means Difference Confidence interval
which there is evidence to suggest the population means are
significantly different. x1. 5 16.09 m1 2 m2 ( 212.16, 21.07 )
x2. 5 22.71 m1 2 m3 ( 27.98, 3.11 )
11.39 Short Answer In a one-way ANOVA with k groups,
x3. 5 18.53 m1 2 m4 ( 21.37, 9.72 )
suppose we reject the null hypothesis. What is the total number
of possible pairwise comparisons? x4. 5 16.33 m2 2 m3 ( 25.78, 5.31 )
m2 2 m4 ( 0.83, 11.92 )
11.40 Short Answer Explain why we use a multiple com-
m3 2 m4 ( 23.34, 7.75 )
parison procedure rather than several individual hypothesis tests
or confidence intervals.
d. Sample means Difference Confidence interval
11.41 Short Answer In a one-way ANOVA, suppose the
null hypothesis is not rejected. Explain why there is no need for x1. 5 201.7 m1 2 m2 ( 9.98, 56.20 )
a multiple comparison procedure. x2. 5 168.6 m1 2 m3 ( 13.20, 59.42 )
x3. 5 165.4 m1 2 m4 ( 219.89, 26.33 )
Practice x4. 5 219.7 m2 2 m3 ( 241.10, 5.12 )
11.42 In each of the following problems, the number of m2 2 m4 ( 274.19, 227.98 )
groups, k, the total number of observations, n, and the overall m3 2 m4 ( 277.41, 231.20 )
confidence level for the Bonferroni multiple comparison proce-
dure are given. Find the total number of comparisons, c, and the 11.45 In each of the following problems, use the summary
t critical value used in the calculation of each Bonferroni confi- s tatistics and the Tukey confidence intervals to construct the
dence interval. corresponding graph indicating which pairs of means are and
a. k 5 3, n 5 30, 95% are not significantly different.
b. k 5 3, n 5 43, 99% a. Sample means Difference Confidence interval
c. k 5 4, n 5 32, 99%
d. k 5 5, n 5 55, 90% x1. 5 233.44 m1 2 m2 ( 237.57, 0.36 )
e. k 5 6, n 5 30, 95% x2. 5 214.83 m1 2 m3 ( 252.88, 214.95 )
11.43 In each of the following problems, the number of obser- x3. 5 0.48 m1 2 m4 ( 256.71, 218.77 )
vations in each group and the overall confidence level for x4. 5 4.30 m2 2 m3 ( 234.28, 3.66 )
Tukey’s multiple comparison procedure are given. Find the crit- m2 2 m4 ( 238.10, 20.17 )
ical value from the Studentized range distribution to be used in m3 2 m4 ( 222.79, 15.14 )
the calculation of each Tukey confidence interval.
a. n1 5 6, n2 5 8, n3 5 7, 95%
b. n1 5 n2 5 11, n3 5 12, n4 5 10, 95% b. Sample means Difference Confidence interval
c. n1 5 14, n2 5 12, n2 5 10, n4 5 18, 99% x1. 5 1.62 m1 2 m2 ( 0.086, 0.320 )
d. n1 5 n2 5 n3 5 n4 5 n5 5 11, 99% ( 0.197, 0.432 )
x2. 5 1.41 m1 2 m3
e. n1 5 n2 5 n3 5 5, n4 5 n5 5 6, n6 5 9, 99.9%
x3. 5 1.30 m1 2 m4 ( 20.006, 0.229 )
11.44 In each of the following problems, use the summary sta- x4. 5 1.50 m2 2 m3 ( 0.001, 0.235 )
tistics and the Bonferroni confidence intervals to construct the ( 20.203, 0.320 )
m2 2 m4
corresponding graph indicating which pairs of means are and
m3 2 m4 ( 20.314, 20.079 )
are not significantly different.
552 C ha pt e r 1 1 The Analysis of Variance
c. Sample means Difference Confidence interval ANOVA test was significant at the p 5 0.0190 level. The sum-
( 0.91, 19.38 ) mary statistics are given below. If MSE 5 136,146, construct
x1. 5 64.35 m1 2 m2
the Bonferroni 95% confidence intervals to isolate the popula-
x2. 5 54.21 m1 2 m3 ( 25.68, 12.79 )
tion mean payments that are significantly different.
x3. 5 60.80 m1 2 m4 ( 3.20, 21.66 )
( 29.73, 8.73 ) State Sample size Sample mean
x4. 5 51.92 m1 2 m5
x5. 5 64.85 m2 2 m3 ( 215.83, 2.64 ) Connecticut 18 3489.22
m2 2 m4 ( 26.95, 11.52 ) Georgia 22 3774.50
m2 2 m5 ( 219.88, 21.41 ) Mississippi 25 3794.76
m3 2 m4 ( 20.36, 18.11 )
11.50 Demographics and Population Statistics A study
m3 2 m5 ( 213.29, 5.18 )
was conducted to compare the population mean age of women
m4 2 m5 ( 222.16, 23.70 ) at the time of their first marriage. Independent random samples
of size 11 were obtained from weddings in three different years.
11.46 In each of the following problems, the results from a The resulting ANOVA test was significant at the p 5 0.003
multiple comparison procedure are shown graphically. Use level. The sample means are given below. If MSE 5 17.68,
each illustration to identify all pairs of population means that construct the Tukey 95% confidence intervals to isolate the
are significantly different. population mean ages that are significantly different.
a. x3. x1. x2. x4.
Sample mean 12.69 14.64 15.94 16.21 Year 1985 1995 2003
b. x1. x2. x3. x4. Sample mean 23.3 24.5 25.3
Sample mean 29.98 32.59 32.62 37.25
11.51 Manufacturing and Product Development An
c. x1. x2. x3. x4.
experiment was conducted to compare the true cost of
Sample mean 39.01 41.40 50.83 51.62
microwave popcorn brands by examining the percentage of
d. x2. x3. x5. x1. x4. popped kernels. Independent random samples of six pack-
Sample mean 4.67 6.08 6.23 6.30 7.20 ages of buttered microwave popcorn (all weighing the same)
were obtained for each of four different brands. Each bag
11.47 Suppose data collected from an experiment resulted in was popped, and the percentage of popped kernels was
a significant ANOVA test. Using the summary statistics and measured. An ANOVA test resulted in the test statistic
MSE 5 29.83, construct the Bonferroni 99% confidence inter- f 5 7.69 and MSE 5 2.72. The sample means are given in
vals for all pairwise comparisons. Use these intervals to iden- the following table.
tify the significantly different population means.
Microwave Pop Best Orville
Group 1 2 3 4 popcorn Secret Choice Act II Redenbacher’s
Sample size 20 20 20 20 Sample 90.8 86.9 90.9 89.2
Sample mean 19.59 21.58 16.50 25.76 mean
11.48 Suppose data collected from an experiment resulted in a a. Find bounds on the p value associated with this test
significant ANOVA test. Using the summary statistics and statistic. State the conclusion of the ANOVA test.
MSE 5 8111.8, construct the Tukey 95% confidence intervals b. Construct the Bonferroni 99% simultaneous confidence
for all pairwise comparisons. Use these intervals to identify the intervals to determine which pairs of population mean
significantly different population means. percentages of popped kernels are significantly different.
Personality type Sample size Sample mean Construct the Bonferroni 99% confidence intervals for all pair-
Mechanic 14 1962.7 wise comparisons and draw a graph to represent the results.
Identify the pairs of athletic conference population means that
Nurturer 16 1757.4
are significantly different.
Protector 20 2057.8
Idealist 18 2019.5 11.55 Biology and Environmental Science Most honey
comes from European honey bee colonies cultivated in the
a. Show that there is evidence of an overall difference in United States. Africanized killer bees tend to be aggressive and
population mean resting metabolic rate. Use a 5 0.01 produce less honey. A study was conducted to compare the
and justify your answer. mean amount of honey produced by colonies in four parts of
b. Construct the Tukey 99% confidence intervals and the country. Independent random samples of colonies were
indicate which pairs of means are contributing to the obtained in each area, and the amount of honey (in pounds) per
overall difference. year per colony was recorded. HONEY
a. Conduct an analysis of variance test to show that there is
11.53 Biology and Environmental Science Soil permea- evidence of an overall difference in population mean
bility, the rate at which water can flow through the soil, is honey production. Use a 5 0.05.
an important measure prior to construction. Four different b. Construct the Bonferroni 95% confidence intervals to
potential county development sites in Wisconsin were selected isolate the pairs of means contributing to the overall
for testing. Independent random samples of locations within difference.
each development were selected, and the soil permeability c. Draw a graph to represent the results of the multiple
was measured at each location (in inches per hour). An comparison procedure in part (b).
ANOVA test was conducted to determine whether there
was any difference in population mean permeability rates 11.56 Olive Oil Purity Olive oil is used in a variety of cook-
( f 5 9.10 and MSE 5 0.02504 ) . The summary statistics are ing methods and has been shown to provide some health bene-
given in the following table. fits. However, the chemical composition and purity of olive oil
can also vary and depends on the supplier region and climate.
Development Sample size Sample mean Independent random samples of domestic regular olive oils
Grant 10 1.3164 were obtained and the peroxide value (in mEq O2/kg oil) was
Green 12 0.2567 measured for each. OLIVEOIL
a. Conduct an analysis of variance test to show that there is
Dane 11 0.5601
evidence of an overall difference in population mean
Rock 11 0.9206
peroxide value. Use a 5 0.01.
b. Construct the Tukey 99% confidence intervals to isolate
a. Find bounds on the p value associated with this
the pairs of means contributing to the overall difference.
(significant) F statistic.
c. Draw a graph to represent the results of the multiple
b. Construct the Bonferroni 95% confidence intervals for all
comparison procedure in part (b).
pairwise comparisons and draw a graph to represent the
d. Based on these data, which olive oil appears to be the
results.
purest? Why?
11.54 Sports and Leisure There is a 35-second shot clock
11.57 Demographics and Population Statistics A stun
for men’s NCAA basketball games. A team must shoot before
gun delivers a high-voltage, low-amperage electrical shock to
the shot clock expires, or they turn the ball over to the opposing
an attacker. In certain states, cities, and countries the use of this
team. A team rarely uses the entire 35 seconds, and some teams
self-protection device is restricted or even illegal. Independent
shoot as quickly as possible. A study was conducted to compare
random samples of stun gun owners in cities, suburban areas,
the mean time to take a shot in five different athletic confer-
and rural areas were obtained. Each stun gun was examined,
ences. Independent random samples of possessions were
and the voltage (in thousands of volts) on the device was care-
obtained, and the time (in seconds) to take a shot was recorded
fully measured in a single test. STUNGUN
for each. An ANOVA test was conducted to determine whether
a. Conduct an analysis of variance test to show that there is
there was any overall difference in population mean shot times
evidence of an overall difference in population mean stun
( f 5 6.33 and MSE 5 65.201 ) . The summary statistics are
gun voltage. Use a 5 0.05.
given in the following table.
b. Construct the Tukey 95% confidence intervals to isolate
Conference Sample size Sample mean the pairs of means contributing to the overall difference.
c. Draw a graph to represent the results of the multiple
ACC 20 17.27
comparison procedure in part (b).
Big East 26 27.90
Pac-12 23 19.14 11.58 Public Health and Nutrition A simple gelatin is
known today as “America’s most famous dessert.’’ Although it
Sun Belt 25 22.97
was patented in 1845, Jell-O sales were minimal until the early
WAC 26 20.05 1900s. While primarily made from processed collagen, gelatin
554 C hapte r 1 1 The Analysis of Variance
also contains sugar, artificial flavors, and coloring. Indepen- b. Construct the Bonferroni 95% confidence intervals to isolate
dent random samples of three gelatin brands were obtained, the pairs of means contributing to the overall difference.
and the amount of sugar (in grams) in one serving was c. Construct the Tukey 95% confidence intervals to isolate
measured. GELATIN the pairs of means contributing to the overall difference.
a. Conduct an analysis of variance test to show that there is d. Are your answers to parts (b) and (c) the same? If so, did
evidence of an overall difference in population mean you expect this to happen? If not, why not?
sugar content in gelatin servings. Use a 5 0.01.
11.61 Public Health and Nutrition Butter and margarine
b. Construct the Bonferroni 99% confidence intervals to
contain saturated fat, which can increase the “bad” cholesterol
isolate the pairs of means contributing to the overall
in your blood. In order to advise clients, a dietitian examined
difference.
four types of margarine. A random sample of each type was
c. Draw a graph to represent the results of the multiple
obtained, and the amount of saturated fat (in grams) per serving
comparison procedure in part (b).
was recorded. SATFAT
11.59 Sports and Leisure The outside dimensions and play- a. Conduct an analysis of variance test to show that there is
ing lines of a tennis court are standard and well established. evidence of an overall difference in population mean
However, the distance between courts, or sidelines, varies. Many saturated fat per serving. Use a 5 0.01.
construction guidelines recommend 24 feet between courts, but b. Construct the Tukey 99% confidence intervals to isolate
to conserve space, builders plan for only at least 12 feet. Inde- the pairs of means contributing to the overall difference.
pendent random samples of public courts were obtained in five c. Are the differences found in part (b) consistent with the
different cities, and the distance (in feet) between courts was percentage of fat indicated on each product label? If not,
recorded. The data are given in the following table. TENNIS can you explain any discrepancy?
City Observations 11.62 Washer Water Usage According to the U.S. Environ-
mental Protection Agency, a typical American family washes
Atlanta 14.0 14.2 13.0 15.2 15.0
about 400 loads of laundry each year. Front-loading washing
Boston 14.2 16.8 18.6 15.5 16.6
machines have become popular because they have a small car-
Los Angeles 14.3 14.9 16.5 15.1 14.4 bon footprint, use less electricity, and tend to get clothes cleaner.
Miami 14.3 17.3 17.3 14.9 16.4 Independent random samples of front-loading washing
New York 22.0 18.3 19.3 20.5 18.5 machines were obtained and the amount of water used (in gal-
lons) in a regular load was measured for each. LAUNDRY
a. Conduct an analysis of variance test to show that there a. Conduct an analysis of variance test to show that there is
is evidence of an overall difference in population mean evidence of an overall difference in population mean
distances between tennis courts in public parks. Use water usage. Use a 5 0.05.
a 5 0.01. b. Construct the Bonferroni 95% confidence intervals to
b. Construct the Bonferroni 95% confidence intervals to isolate the pairs of means contributing to the difference.
isolate the pairs of means contributing to the overall c. Construct the Tukey 95% confidence intervals to isolate
difference. the pairs of means contributing to the overall difference.
c. Draw a graph to represent the results of the multiple d. Are the results the same in parts (b) and (c)? If so why?
comparison procedure in part (b). If not, why not?
An interaction effect is significant The total variation in the data, sum of squares (SST), is decomposed into the sum of
if one level of factor A and one squares due to factor A (SSA), the sum of squares due to factor B (SSB), the sum of
level of factor B interact squares due to interaction [SS(AB)], and the sum of squares due to error (SSE).
differently, or inconsistently, from Consistent with previous notation, dots in the subscript of x and t indicate the mean
other factor combinations. and the sum of xijk, respectively, over the appropriate subscript(s), for example,
Here are the fundamental identity, the definitions, and the computational formulas for a
two-way ANOVA.
a ti..
a
2
5 bn a ( xi.. 2 x... ) 2 5
a
i51 t...2
SSA 2
i51 bn abn
a t. j.
b
2
5 an a ( x. j. 2 x... ) 2 5
b
j51 t...2
SSB 5
j51 an abn
i51 j51
definition
a a tij. a ti.. a t. j.
a b a b
2 2 2
computational formula
The assumptions for a two-way ANOVA are stated in terms of each cell, which is con-
sidered a population.
For reference, these are the 1. The ab population distributions are normal.
two-way ANOVA assumptions. 2. The ab population variances are equal.
3. The samples are selected randomly and independently from the respective
populations.
Two-way ANOVA calculations are also usually presented in a summary table. The
values in this table are associated with the five sources of variation, and the F statistics are
used to conduct appropriate hypothesis tests.
The hypothesis test for an interaction effect is usually considered first. An interaction
effect is present when the relationship between the two factors is not linear, or additive,
for at least one combination of levels. For example, suppose the total production of corn
depends on two factors, the amount of water and the amount of fertilizer. Intuitively, one
might expect the total production of corn to increase as the amount of water increases
and/or as the amount of fertilizer increases. Each change in water and/or fertilizer results
in a predictable, additive change in the total amount of corn produced. However, consider
a very high level of water and a high level of fertilizer. The additive model predicts a huge
total production in corn. However, there could be an interaction or inconsistent effect
associated with these two levels. Too much water and too much fertilizer might combine
to produce very little corn production. An interaction plot is a scatter plot of each cell
sample mean versus factor A level. Connect the points in the graph corresponding to the
same factor B levels. Parallel lines suggest no evidence of interaction.
Case 1. If the null hypothesis is not rejected, then the other two hypothesis tests can be
conducted as usual, to see whether there are effects due to either (or both) factors.
Case 2. If the null hypothesis is rejected, then there is evidence of a significant interac-
tion. Interpretation of the other two hypothesis tests is tricky.
1. If we reject a null hypothesis of no effect due to factor A (and/or factor B), then the
effect due to factor A (and/or factor B) is probably significant.
2. If we do not reject a null hypothesis of no effect due to factor A (and/or factor B),
then the effect due to factor A (and/or factor B) is inconclusive.
A study was conducted to determine whether the signal strength was related to the satel-
lite company and/or the geographic region. For each company and region, a random sam-
ple of satellite TV users was obtained, and the signal strength of each system was measured
( in dBmV ) . The data are given in the following table.
Geographic region
Northeast Southeast Midwest West
65.9 73.9 64.0 61.4 55.9 64.2 54.7 69.9
DIRECTTV
70.9 74.9 55.0 52.1 67.2 74.3 62.2 65.5
Company
Dish 71.3 76.8 64.3 64.0 69.4 68.6 65.6 67.8
Network 71.4 65.2 61.3 68.2 61.2 64.9 55.2 66.0
64.0 64.2 60.8 62.9 58.3 64.4 62.4 62.9
Echostar
76.7 65.1 57.0 69.6 69.3 73.1 64.5 71.3
Solution
Step 1 Assume the two-way ANOVA assumptions are true. There are a 5 3 levels of
factor A (satellite TV company), b 5 4 levels of factor B (geographic region),
and n 5 4 observations in each cell. Sample totals are given in the following
table.
Geographic region
Northeast Southeast Midwest West
t11. 5 285.6 t12. 5 232.5 t13. 5 261.6 t14. 5 252.3 t1.. 5 1032.0
Company
DIRECTTV
Dish Network t21. 5 284.7 t22. 5 257.8 t23. 5 264.1 t24. 5 254.6 t2.. 5 1061.2
Echostar t31. 5 270.0 t32. 5 250.3 t33. 5 265.1 t34. 5 261.1 t3.. 5 1046.5
t.1. 5 840.3 t.2. 5 740.6 t.3. 5 790.8 t.4. 5 768.0 t... 5 3139.7
SST 5 a a a a x2ijk b 2
3 4 4
t...2
Computational formula.
i51 j51 k51 ( 3 )( 4 )( 4 )
3139.72
5 ( 65.92 1 73.92 1 c1 71.32 ) 2 Apply the formula.
48
5 206,990.75 2 205,369.09 5 1621.66 Simplify.
a ti..
3
2
i51 t...2
SSA 5 2 Computational formula.
( 4 )( 4 ) ( 3 )( 4 )( 4 )
1032.02 1 1061.22 1 1046.52 3139.72
5 2 Apply the formula.
16 48
5 205,395.73 2 205,369.09 5 26.64 Simplify.
a t. j.
4
2
j51 t...2
SSB 5 2 Computational formula.
( 3 )( 4 ) ( 3 )( 4 )( 4 )
840.32 1 740.62 1 790.82 1 768.02 3139.72
5 2 Apply the formula.
12 48
5 205,815.09 2 205,369.09 5 446.00 Simplify.
11.3 Two-Way ANOVA 559
a a tij. a ti.. a t. j.
3 4 3 4
2 2 2
5 108.19 Simplify.
VIDEO Tech
Video TECH Manuals
MANUALS Try It Now Go to Exercise 11.17
EXEL DISCRIPTIVE
Two-way ANOVA
Data Set
Source of Sum of Degrees of Mean
variation squares freedom square F p Value
snails
Factor A 44.690 4 11.172 1.08 0.3929
Factor B 122.533 1 122.533 11.84 0.0026
Interaction 24.383 4 6.096 0.59 0.6744
Error 206.973 20 10.349
Total 398.579 29
11.3 Two-Way ANOVA 561
Solution
a. Check for an interaction effect.
Technology Corner
Procedure: Two-way analysis of variance.
Reconsider: Example 11.5, solution, and interpretations.
CrunchIt!
Use the Two-Way ANOVA function.
1. Enter the signal data in column Var3. Enter the corresponding company level in column Var1 and the region level in
column Var2.
2. Select Statistics; ANOVA; Two-way.
3. Select the Factor 1 column (Var1), the Factor 2 column (Var2), and the Values column (Var3).
4. Click Calculate. The results are displayed in a new window. Refer to Figure 11.17.
Minitab
Use the Balanced ANOVA function.
1. Enter the signal data in a single worksheet column (C3). Enter the company level in column C1 and the region level in
column C2.
2. Select Stat; ANOVA; Balanced ANOVA.
3. Enter the Response column (C3) and specify the model: C1 C2 C1*C2. The last expression is the interaction term.
4. Click OK. The results are displayed in a session window. See Figure 11.22.
Excel
Use the Data Analysis function Anova: Two-Factor With Replication.
1. Enter the data in an array as shown in Figure 11.23.
2. Under the Data tab, select Data Analysis; Anova: Two-Factor With Replication.
11.3 Two-Way ANOVA 563
3. Enter the Input Range, including the row and column headings, the Rows per sample (4), and the Alpha level (0.05).
4. Choose an Output option and click OK.
5. The analysis of variance table and summary statistics by row and column are displayed. The Anova summary table is
show in Figure 11.24.
Figure 11.23 Data input array for Anova: Figure 11.24 Excel Anova summary table:
Two-Factor With Replication. Two-Factor With Replication.
b. Find a a a x2ijk .
4 2 4 11.74 Complete the following ANOVA table.
i51 j51 k51 Source of Sum of Degrees of Mean p
c. Compute SST, SSA, SSB, SS(AB), and SSE. variation squares freedom square F Value
11.71 Consider the following data in the context of a two-way Factor A 121.25 1.57
ANOVA test. EX11.71 Factor B 91.12 4
Interaction
Factor B
Error 1446.41 75
1 2 3
Total 1833.33 99
11.4 16.4 4.9 9.6 17.7 14.9
1 Conduct the hypothesis tests to check for effects due to
9.6 12.6 16.8
interaction, factor A, and factor B. Use a 5 0.01 for each test.
8.9 8.5 8.6 12.7 6.5 17.3
Factor A
45.5 39.7 37.5 37.9 33.2 45.0 a. Complete the ANOVA table.
2
38.3 35.9 42.4 37.5 40.1 44.5 b. What was the total number of observations?
31.0 35.1 38.3 35.7 41.4 39.8 c. Conduct the hypothesis test for any effect due to
3 interaction. Use a 5 0.05. What does your conclusion
29.5 39.1 36.3 33.5 39.0 36.7
imply about the other two hypothesis tests?
a. Complete an ANOVA summary table. d. Conduct the hypothesis tests for factor effects. Use
b. Conduct the hypothesis tests to check for effects due to a 5 0.05. State your conclusions.
interaction, factor A, and factor B. Use a 5 0.01 for each
11.76 Business and Management A temp agency recently
test. State your conclusions.
conducted a study to determine the effects of job type and sex
11.73 Complete the following ANOVA table. on length of employment. Independent random samples of
employees who worked in service, technology, sales, security,
Source of Sum of Degrees of Mean p and labor were obtained. The length (in weeks) of each assign-
variation squares freedom square F Value ment was recorded. Consider the following partial ANOVA
Factor A 4 40.66 summary table.
Factor B 2 Source of Sum of Degrees of Mean p
Interaction 144.23 variation squares freedom square F Value
Error 1206.87 75 Sex 16.33 1
Total 1670.28 89 Job type 184.39 4
Interaction 4
Conduct the hypothesis tests to check for effects due to
Error 202.42 50
interaction, factor A, and factor B. Use a 5 0.05 for each test.
State your conclusions. Total 422.48
11.3 Two-Way ANOVA 565
a. Complete the ANOVA table. 11.80 Travel and Transportation Many “snowbirds,”
b. Is there any evidence of interaction? Use a 5 0.01. p eople who live in the Northeast during the summer and in
c. Is there any evidence that gender or job type affects the Florida during the winter, ship their vehicles back and forth.
length of employment? Use a 5 0.01. However, automobile transportation costs are affected by a
number of factors. In a study conducted by Corsia Logistics, a
11.77 Biology and Environmental Science According to
random sample of automobiles shipped from Boston to Miami
a recent study, many factors affect feed intake of chickens and,
was obtained for each combination of type of car (compact,
therefore, egg production.13 For example, lighting, noise, and
midsize, full-size, SUV, pickup) and transport type (open,
water may all affect the efficiency of poultry production. A
enclosed). The cost for each (in dollars) was recorded. Conduct
study was conducted to determine whether the feed intake of
an analysis of variance with a 5 0.01 to test for interaction
chickens (in kg/day) is affected by flock size and/or
and factor effects. State your conclusions. autotran
temperature. feed
a. Construct a two-way ANOVA summary table. 11.81 Biology and Environmental Science Wetlands, some-
b. Use a 5 0.05 to test the following hypotheses: (i) there is times called the “nurseries of life,” are home to thousands of spe-
no effect due to interaction; (ii) there is no effect due to cies of plants and animals and are found on every continent expect
flock size; and (iii) there is no effect due to temperature. Antarctica. Canada has approximately 130 million hectares of wet-
lands, which is about 25% of all the wetlands on our planet.14 Sup-
11.78 Sports and Leisure Tailgate parties before college
pose that in a recent Department of Agriculture study, a random
football games have become long and lavish, and some present sample of wetlands was obtained for each combination of state and
an added security risk. A study was conducted to determine cover type. The cover type classifications are defined by the U.S.
whether the length of a tailgate party is affected by the outside Fish and Wildlife service and are determined by the representative
temperature and/or by the college team. The data presented in plant species. The area covered by water was measured (in hect-
the following table are the lengths (in hours) of randomly ares), and the data are given in the following table. WETLAND
selected tailgate parties (before a game) by school and tempera-
ture (C, cold; M, moderate; H, hot). tailgate State
School Louisiana Mississippi California Arkansas
Michigan Miami Ohio State UCLA AB3 0.8 1.1 1.2 0.7 1.2 1.8 3.3 0.2
Cover type
5.1 5.0 4.5 7.0 3.4 4.6 2.9 2.5 FO1 2.0 1.7 0.5 0.1 2.4 3.0 1.4 1.5
C
Temperature
6.0 2.2 6.6 2.8 0.2 3.6 3.1 3.6 ow 2.6 0.6 2.7 1.6 1.3 1.8 2.7 1.8
5.9 5.5 4.7 3.4 3.1 5.7 6.7 7.4 SS1 2.7 2.3 1.1 0.8 1.3 2.4 1.8 1.3
M
4.4 4.9 5.4 7.9 5.0 7.2 5.3 5.8
a. Construct the ANOVA summary table. Is there any
4.8 2.8 6.5 5.1 6.3 5.8 4.4 5.8
H evidence of an effect due to interaction of state and cover
4.7 3.4 7.5 5.3 8.0 8.5 9.8 7.2
type? Use a 5 0.01.
Use a 5 0.05 to test the following hypotheses: (a) there is b. Is there any evidence of an effect due to state or cover
no effect due to interaction; (b) there is no effect due to type? Use a 5 0.01 for both hypothesis tests. State your
temperature; and (c) there is no effect due to school. Include the conclusions.
ANOVA summary table. 11.82 Sports and Leisure The Genesis Diving Institute of
11.79 Public Policy and Political Science The city of Florida certifies scuba divers using a variety of different sys-
ustin, Texas, recently introduced new procedures to streamline
A tems of education. This organization recently studied the time
the review of commercial and residential building proposals. A spent underwater by scuba divers exploring caves and in open
random sample of proposals was obtained and classified by water. Each diver was also classified by age group. Independent
type of development and by the city office conducting the dives were selected at random for each combination of age
review. The length of the review process (in business days) for group and dive type. The data (in minutes) are given in the fol-
each proposal is given in the following table. BUILDING lowing table. DIVERS
Dive type
Type of development
Cave Open water
Commercial Residential
39 38 41 39 42 40 40 39
1 5 18 32 32 30 27 27 30 20–30
41 42 40 40 37 45 36 40
Office
2 20 23 23 28 31 26 31 30 43 40 38 36 45 39 43 43
Age group
Construct the two-way ANOVA summary table. Interpret the b. Is there any evidence of an effect due to gender or
results. Use a 5 0.001 for each hypothesis test. altitude? Use a 5 0.05 for both hypothesis tests.
11.86 Public Health and Nutrition Cerner Corporation
Extended Applications recently presented their vision of future hospital rooms: more
11.83 Travel and Transportation The Swedish Highway technology to improve medical efficiency and decrease the
Department conducted a study to examine the speed of cars on chance of human error. In addition to a system that scans
major highways. Three regions were selected and rated using staff IDs and a monitor that displays all relevant patient
four different quality scores (based on retroreflective proper- medical information, another consideration is the size of the
ties) for road markings. (The four classes of road markings in room. Suppose a random sample of existing hospital rooms
Sweden are K1, certainly approved; K2, probably approved; was obtained, and the amount of square feet per patient in
K3, probably rejected; and K4, certainly rejected.) The speed each room was measured. Each room was classified by
of randomly selected cars (in km/hr) was recorded, the sample location and type, and the data are given in the following
totals for each cell are given in the following table ( n 5 4 ) , and table. ROOMSIZE
Room type
Road marking quality
Semiprivate Semiprivate
K1 K2 K3 K4
Private two beds four beds
North 402.4 393.4 386.6 353.3
Region
11.85 Medicine and Clinical Studies A low hemoglobin a. Construct the ANOVA summary table. Is there any
level may be an indication of anemia, an iron deficiency, or even evidence of an effect due to interaction of county and
kidney disease. A high hemoglobin level could indicate lung dis- ADHD diagnosis? Use a 5 0.05.
ease or extreme physical exercise. In a study conducted by a new b. Is there any evidence of an effect due to county or ADHD
testing facility, a random sample of adults was obtained for each diagnosis? Use a 5 0.05 for both hypothesis tests. State
combination of sex and altitude. The hemoglobin level (in grams your conclusions.
per liter) for each person was recorded. HEMOGLOB c. Early diagnosis of ASD is important in order to begin
a. Construct the summary ANOVA table. Is there any intervention services. Is diagnosis of ADHD associated
evidence of an effect due to interaction of gender and with a decrease or increase in age at diagnosis for ASD?
altitude? Use a 5 0.05. Justify your answer.
Chapter 11 Summary 567
i51 j51
Hypothesis test
H0: m1 5 m2 5 . . . 5 mk (all k population means are equal)
Ha: mi 2 mj for some i 2 j (at least two population means differ)
MSA
TS: F 5
MSE
RR: F $ Fa,k21,n2k
Bonferroni multiple comparison procedure
The c simultaneous 100 ( 1 2 a ) % Bonferroni confidence intervals have as end points
1 1
( xi. 2 xj. ) 6 ta/(2c),n2k"MSE 1 for all i 2 j
Å ni nj
Tukey multiple comparison procedure
The c simultaneous 100 ( 1 2 a ) % confidence intervals have as endpoints
1 1 1
( xi. 2 xj. ) 6 Qa,k,n2k"MSE 1 for all i 2 j
"2 Å ni nj
568 C h ap te r 1 1 The Analysis of Variance
Two-Way ANOVA
A two-way analysis of variance is a statistical procedure used to determine the effect of
two factors on a response variable.
The sums of squares
The fundamental identity is SST 5 SSA 1 SSB 1 SS ( AB ) 1 SSE.
SST 5 total sum of squares
a ti..
a
2
5 bn a ( xi.. 2 x... ) 5
a
2 i51 t...2
2
i51 bn abn
SSB 5 sum of squares due to factor B
a t. j.
b
2
5 an a ( x. j. 2 x... ) 2 5
b
t...2
j51
2
j51 an abn
SS ( AB ) 5 sum of squares due to interaction
a a tij. a ti.. a t. j.
a b a b
2 2 2
Hypothesis tests
1. Test for an interaction effect
H0: There is no interaction effect.
Ha: There is an effect due to interaction.
MS ( AB )
TS: FAB 5
MSE
RR: FAB $ Fa,(a21)(b21),ab(n21)
Chapter 11 Exercises 569
Native 0.86 1.03 1.10 1.45 Campground 23.7 32.1 38.9 31.1 34.2 37.2
American 0.77 0.88 1.58 1.18 49.8 56.6 52.2 34.2 44.5 38.9
Hispanic 1.13 0.81 0.90 1.01 Rock Hill 42.0 41.6 63.2 41.6 25.5 28.4
0.78 0.86 1.57 1.42 33.8
Construct the two-way ANOVA summary table. Interpret the a. Conduct a one-way analysis of variance test to determine
results. Use a 5 0.05 for each hypothesis test. whether there is an overall difference in population mean
catfish weights. Use a 5 0.01.
11.95 Travel and Transportation Many factors affect the
b. Construct the Tukey 99% confidence intervals to isolate
number of automobile accidents on major highways, for
any population means contributing to an overall
example, increased traffic, weather conditions, and driver
difference, and draw a graph to represent the results.
fatigue. A recent study was conducted to determine whether
Which site would you recommend for the fisherman to
the frequency of median crashes is related to the number of
have a good chance at winning the tournament? Why?
lanes and/or the differential elevation or opposite travel
lanes. Independent random samples of U.S. highways were 11.98 Biology and Environmental Science We all expect
obtained for each combination of lanes and differential hospitals to be clean, almost spotless. However, according to
elevation. The number of median crashes over a 6-month the U.S. Centers for Disease Control and Prevention, approxi-
time period was recorded for each. Construct a two-way mately one million people contract an infection directly from a
ANOVA table. Interpret the results. Use a 5 0.05 for each hospital visit each year. A recent study suggested that hospital
hypothesis test. crash lobbies may have high levels of airborne particles, including
Chapter 11 Exercises 571
bacteria and fungi.19 Independent random samples of hospital a. Conduct a one-way analysis of variance test to determine
lobbies were obtained for each combination of season (1, winter; whether there is an overall difference in population mean
2, spring; 3, summer; 4, fall) and lobby size (1, small; 2, medium; glutathione levels. Use a 5 0.05.
3, large). The level of airborne bacteria was measured in each (in b. Construct the Tukey 95% confidence intervals to isolate
CFU/m3). BACTERIA any population means contributing to an overall
a. Construct the two-way ANOVA summary table. Interpret difference. Draw a graph to represent the results.
the results. Use a 5 0.05 for each hypothesis test. c. Because we are interested mainly in whether pain
b. Suppose more airborne bacteria in the lobby means there is relievers affect glutathione levels in healthy males, find
a greater chance of contracting an infection. Is there one the Bonferroni 95% confidence intervals only for the
season in which we should avoid going to a hospital? Why? differences between each pain reliever mean and the
c. Similarly, is there a specific size of hospital lobby to healthy adult mean. Based on these results, do you
avoid? Why? believe pain relievers lower glutathione levels? Justify
your answer.
11.99 Marketing and Consumer Behavior A real-estate
agent conducted a study to compare the effect of location and 11.102 Public Health and Nutrition A consumer group
season on weekly time-share costs (in dollars). The sample recently studied the amount of partially hydrogenated oils (trans fat)
totals for each combination of island (A, Aruba; M, Martinique; in various peanut butter brands. Four brands were selected in
SK, St. Kitts; SL, St. Lucia) and season are given in the smooth and chunky varieties. The amount of trans fat was measured
following table ( n 5 6 ) and SST 5 487,902,980.64. (in grams) per serving (two tablespoons) in each randomly selected
Season jar, and the data are given in the table below. PEANUT
a. Construct the two-way ANOVA summary table. Interpret
Spring Summer Fall Winter
the results. Use a 5 0.05 for each hypothesis test.
A 18,045 12,168 19,894 26,214 b. If a consumer wants to avoid trans fat as much as
Island
M 14,495 1,925 13,890 17,538 possible, which combination of brand and style would
SK 18,075 32,887 24,398 18,834 you recommend? Why?
SL 25,457 20,757 27,505 36,428 Peanut butter brand
Use a 5 0.01 to test the following hypotheses: (a) there is no Jif Peter Pan Skippy Smucker’s
effect due to interaction; (b) there is no effect due to island; and Smooth 0.51 0.72 0.89 0.69 0.75 0.85 0.73 0.71
(c) there is no effect due to season.
Chunky 0.46 0.63 0.88 0.81 0.76 0.69 0.73 0.76
11.100 Biology and Environmental Science A random
sample of male baby California sea lions was captured, tagged,
and monitored for several years. The weight of each sea lion Last Step
was measured (in pounds) once, and each sea lion was classi-
fied by species ID and age (in years). The sample totals are 11.103 Do the hours worked by farm laborers
given in the following table ( n 5 4 ) and SST 5 103,152.38. differ by region? Farm worker wages vary by region
and by the number of years a person has worked for the same
Age employer. In addition, some farm workers are paid based on a
1 2 3 piece rate, that is, by the number of baskets or crates they pick
Species ID
50011 1768 1735 1752 of the crop being harvested. Some research suggests farm
50023 1794 1563 1517 workers work longer hours in some regions of the United
States. Random samples of farm workers from four regions
50037 1750 1742 1328
were obtained, and the number of hours worked during a spring
Use a 5 0.05 to test the following hypotheses: (a) there is no planting week was recorded for each. The summary statistics
effect due to interaction; (b) there is no effect due to species; are given in the following table.20 farmhr
Looking Back
n Recall the properties and the methods to compute probabilities associated with a normal distribution.
n Remember the procedures for conducting hypothesis tests and constructing confidence intervals
based on a t distribution.
n Think about the procedures for conducting hypothesis tests based on an F distribution.
Looking Forward
n Learn how to visualize and measure a linear relationship between two variables.
n Construct scatter plots, compute the correlation coefficient, and consider linear regression concepts
and computations.
n Investigate methods for selecting the best mathematical model.
Contents
12.1 Simple Linear Regression
12.2 Hypothesis Tests and Correlation
12.3 Inferences Concerning the Mean Value and an Observed Value of Y for x 5 x*
12.4 Regression Diagnostics
12.5 Multiple Linear Regression
*12.6 The Polynomial and Qualitative Predictor Models
*12.7 Model Selection Procedures
Eugene Everett/Dreamstime.com
573
574 C hapte r 1 2 Correlation and Linear Regression
y Graph of y 5 f (x)
f (x0)
x0 x
Suppose there are n observations on specific, fixed values of the independent variable.
Here is the notation we use.
1. The observed values of the independent variable are denoted x1, x2, . . . , xn.
2. Yi and yi are the random variable and the observed value of the random variable associ-
ated with xi, for i 5 1, 2, c, n. This is tricky. For each xi, there is a corresponding
random variable Yi. So there are really n random variables in these problems.
3. The data set consists of n ordered pairs: (x1, y1), (x2, y2), . . . , (xn, yn).
Solution
a. The independent variable is sodium, and the dependent variable is TDS. The grounds-
keeper believes that there is a relationship between these two variables and hopes to be
able to predict the TDS as a function of sodium.
iShoot Photos, LLC/E+/Getty Images b. The data set consists of 11 ordered pairs: (4.1, 450), (5.4, 672), . . . , (11.3, 2398). For
each (fixed) value of sodium, there is a corresponding observation on a random vari-
able for TDS.
Stepped
STEPPED Tutorial
TUTORIALS
c. Figure 12.2 is a scatter plot of TDS ( y) versus sodium (x). This plot suggests that as
Scatter plots
BOX PLOTS
sodium increases, TDS also increases, and therefore the relationship might be positive
linear. Figure 12.3 shows a technology solution.
VIDEO Tech
Video TECH Manuals
MANUALS
EXEL DISCRIPTIVE
Scatter plots y
2500
2000
1500
1000
500
0
4 6 8 10 12 14 x
Figure 12.2 A scatter plot of TDS versus Figure 12.3 A TI-84 Plus C scatter
sodium level. plot of TDS versus sodium level.
variable x changes. Figure 12.4 shows the graph of a possible deterministic straight line
added to the scatter plot. Notice that the data points lie close to the line; the vertical
distance between a point and the line depends on the value of the random error. And the
line has positive slope, which conveys the relationship between the two variables: as the
sodium level rises, so does the TDS.
y
2500
2000
1500
1000
Figure 12.4 TDS versus sodium 500
level scatter plot with a (deterministic)
0
line added. 4 6 8 10 12 14 x
In just a few pages, we’ll define This section presents a method for finding the best deterministic straight line for a
what is meant by best. given set of data. Hypothesis tests will be used to determine whether there is a significant
linear relationship between two variables. As with all other hypothesis tests, certain
assumptions are required for the related statistical procedures to be valid.
1. The Ei’s are normally distributed (which implies that the Yi’s are normally distributed).
2. The expected value of Ei is 0 [which implies that E ( Yi ) 5 b0 1 b1xi ].
3. Var ( Ei ) 5 s2 [which implies that Var ( Yi ) 5 s2 ].
4. The Ei’s are independent (which implies that the Yi’s are independent).
A Closer L ok
1. The Ei’s are the random deviations or random error terms.
2. y 5 b0 1 b1x is the true regression line. Each point (xi, yi) lies near the true regres-
sion line, depending on the value of the random error term ei.
3. The four assumptions in the simple linear regression model definition can be stated
ind
compactly in terms of the random error term: Ei , N ( 0, s2 ) .
Just a little more notation is necessary to understand the simple linear regression
model and the resulting properties.
Recall, that Y 0 xi means “Y Consider each random variable Yi 5 Y 0 xi.
given xi.”
mY 0 xi 5 E ( Y 0 xi ) is the expected value of Y for a fixed value xi, and
s2Y 0 xi is the variance of Y for a fixed value xi.
The simple linear regression model assumptions imply that
mY 0 xi 5 E ( b0 1 b1xi 1 Ei ) 5 b0 1 b1xi 1 E ( Ei ) 5 b0 1 b1xi
s2Y 0 xi 5 Var ( b0 1 b1xi 1 Ei ) 5 s2
Y 0 xi is normal.
Therefore, the mean value of Y is a linear function of x. The true regression line passes
through the line of mean values.
12.1 Simple Linear Regression 577
The variability in the distribution of Y is the same for every value of x (this is called
homogeneity of variance).
Figure 12.5 illustrates the model assumptions and resulting properties. Each Yi has a
normal distribution, centered at b0 1 b1xi . All the distributions have the same width, or
variance.
0 1x3
0 1x2
0 1x1
x1 x2 x3 x
Solution
a. We need the expected value of Y for the value x 5 30.
5 83.3 Simplify.
The expected IQ for a six-year-old child with a blood lead level of 30 mg/dL is 83.3.
See Figure 12.6.
y
Distribution of
observed values
for x 30
30 x
Figure 12.6 The true regression line, the mean response for x 5 30, and the distribution
of observed values around the mean response.
578 Cha pte r 1 2 Correlation and Linear Regression
b. The slope of the true regression line, b1 5 20.45, is the change in IQ associated with
a 1 mg/dL change in blood lead level.
If the blood lead level increases by 10 mg/dL, the expected change in IQ is
( b1 )( change in x ) 5 ( 20.45 )( 10 ) 5 24.5.
If the blood lead level decreases by 20 mg/dL, the expected change in IQ is
( b1 )( change in x ) 5 ( 20.45 )( 220 ) 5 9.0.
The change in x is negative because the blood lead level is decreasing.
c. If s 5 8, then for x 5 20 the random variable Y is normally distributed with mean
96.8 2 0.45 ( 20 ) 5 87.8 and variance s2 5 82 5 64: Y , N ( 87.8, 64 ) .
Find the probability that Y exceeds 100.
Y 2 87.8 100 2 87.8
P ( Y . 100 ) 5 Pa . b Standardize.
8 8
5 P ( Z . 1.53 ) Equation 6.8, simplification.
(
5 1 2 P Z # 1.53 ) Use cumulative probability.
Y ~ N(87.8, 64)
Mean response 87.8
fox x 20
P(Y > 100)
True regression line
20 x 87.8 100 x
Suppose two variables are related via a simple linear regression model. The parameters
b0 and b1 are usually unknown. However, if we assume that the observations (x1, y1),
(x2, y2), . . . , (xn, yn) are independent, then these sample data can be used to estimate the
model parameters b0 and b1 .
The line of best fit, or estimated regression line, is obtained using the principle of
least squares. Figure 12.9 illustrates this concept: Minimize the sum of the squared
deviations, or vertical distances from the observed points to the line. Consider the
y
A vertical distance
x
Figure 12.9 An illustration of the principle of least squares.
12.1 Simple Linear Regression 579
v ertical distances from the points (x1, y1), (x2, y2), . . . , (xn, yn) to the line. The principle
of least squares produces an estimated regression line such that the sum of all squared
vertical distances is a minimum.
The focus in the remainder of this section is on the interpretations associated with find-
ing the line of best fit. In Section 12.2, several hypothesis tests utilizing these various
statistics will be introduced.
Least-Squares Estimates
= =
b0 and b1 are estimates of b0 and The least-squares estimates of the y intercept ( b0 ) and the slope ( b1 ) of the true regres-
=
sion line are
ng xi yi 2 ( g xi )( g yi )
b1 , respectively, but b1 is found
first, because its value is used in =
ng xi2 2 ( g xi ) 2
=
the calculation of b0 . b1 5 (12.2)
g yi 2 b1 g xi
and
=
= =
b0 5 5 y 2 b1x (12.3)
n
= =
The estimated regression line is y 5 b0 1 b1x.
A Closer L ok
1. Before using these equations, always consider a scatter plot and compute the sample
correlation coefficient (presented in Section 12.2) to make sure a linear model is rea-
sonable. For example, in Figure 12.10 a linear model seems reasonable; the relation-
ship between x and y appears to be negative linear. In Figure 12.11, a linear model is
not reasonable; the relationship between the variables appears to be quadratic.
y
y
8
8
6
6
4 4
2 2
0 0
0 2 4 6 8 x 0 2 4 6 8 x
Figure 12.10 A scatter plot of data in Figure 12.11 A scatter plot of data
which the relationship appears linear. in which the relationship does not appear
to be linear.
= =
The predictor variable is the 2. If x* is a specific value of the independent, or predictor, variable x, let y* 5 b0 1 b1 x*.
independent variable, and the a. y* is an estimate of the mean value of Y for x 5 x*, denoted mY 0 x* .
response variable is the dependent
b. y* is also an estimate of an observed value of Y for x 5 x*.
variable.
Birth 3 6 9 12 15 18 21 24 27 30 33 36
in cm AGE (MONTHS)
cm in
41 41 L
40 40 E
100 95 100
39 90 39 N
38 G
75 38
95 95 T
37 50 37 H
36 25 36
90 90
35 10 35
5
34
85
33
32 38
80 95 17
31
L 30 36
E 75 90 16
N
29
34
G 28
70 75
15
T 27 32
H 26 65 14
25 50 30 W
24 E
60 13
23 25 28 I
G
22 55 12 H
10 26
21 5 T
20 50 11 24
19
18 45 10 22
17
16 40 9 20
15
8 18
SOLUTION TRAIL 16 16
7 AGE (MONTHS)
kg lb
12 15 18 21 24 27 30 33 36
14
6 Mother’s Stature Gestational
W Father’s Stature Age: Weeks Comment
E 12
Date Age Weight Length Head Circ.
I 5 Birth
G 10
H
T
4
8
Solution Trail 12.3 3
6
Keywo rds 2
lb kg
n Estimated regression line Birth 3 6 9
n True mean weight of a Published May 30, 2000 (modified 4/20/01).
SOURCE: Developed by the National Center for Health Statistics in collaboration with
the National Center for Chronic Disease Prevention and Health Promotion (2000).
22-month-old girl http://www.cdc.gov/growthcharts
n Random sample
Tr ansl at i on
= = A random sample of infant girls was obtained, and the weight (in kilograms) and age
n y 5 b0 1 b1x (in months) for each is given in the following table.
n E ( Y 0 22 )
Concepts Age 33 32 14 20 15 16 30 17 21 23
n Least-squares estimates Weight 12.9 13.8 8.2 12.2 8.5 12.9 13.7 11.2 11.9 10.4
n Estimate of the mean value
of Y for x 5 x*
a. Find the estimated regression line.
Vi sion
b. Estimate the true mean weight of a 22-month-old girl.
Find the necessary summary
statistics.
= Use Equation 12.2 to
find
= b 1 and Equation= 12.3 =to find Solution
b0 . Compute y* 5 b0 1 b1 ( 22 )
as an estimate of the mean value a. Age (x) is the independent, or predictor, variable, and weight ( y) is the dependent, or
of Y for x 5 22. response, variable. Find the necessary summary statistics for the n 5 10 pairs of
observations, and use Equations 12.2 and 12.3.
12.1 Simple Linear Regression 581
g yi 2 b1 g xi
4649
=
=
b0 5 Equation 12.3.
n
115.7 2 ( 0.2012 )( 221.0 ) =
5 Use summary statistics and b1 .
10
5 7.123 Simplify.
VIDEO Tech
Video TECH Manuals
MANUALS The estimated regression line is y 5 7.123 1 0.2012x. Figure 12.12 shows a scatter
EXEL DISCRIPTIVE
Linear regression plot of the data and the graph of the estimated regression line.
b. The estimated true mean weight of a 22-month-old girl is found by substituting
x* 5 22 into the estimated regression line equation.
y 5 7.123 1 0.2012x Estimated regression line equation.
An estimate for the expected weight of a 22-month-old girl is 11.55 kilograms. See
Figure 12.13.
y
y
14
14
12
11.55
10 10
8 8
6
15 20 25 30 35 x 15 20 22 25 30 35 x
Figure 12.12 A scatter plot of the data and Figure 12.13 The expected weight for a
the graph of the estimated regression line. 22-month-old girl.
VIDEO Tech
Video TECH Manuals
MANUALS Figure 12.14 shows a technology solution.
EXEL DISCRIPTIVE
Fitted line plots
Sxx 5 g ( xi 2 x ) 2 5 g xi2 2 ( g xi ) 2
1
n
3 5
definition computational formula
Syy 5 g ( yi 2 y ) 2 5 g yi2 2 ( g yi ) 2
1
n
3 5
definition computational formula
Sxy 5 g ( xi 2 x )( yi 2 y ) 5 g xi yi 2 ( g xi )( g yi )
1
n
8 8
definition computational formula
= = = =
The ith predicted, or fitted, value, denoted yi, is yi 5 b0 1 b1xi. This is simply the equation
for the estimated regression line evaluated at xi.
=
The ith residual is yi 2 yi. This difference is a measure of how far away the observed
value of Y is from the estimated value of Y.
The total variation in the data (the total sum of squares, denoted SST) is decomposed
into a sum of the variation explained by the model (the sum of squares due to regres-
sion, denoted SSR) and the variation about the regression line (the sum of squares due
to error, denoted SSE). These three sums are defined in the following identity, which
shows the decomposition of the total variation in the data.
Sum of Squares
g ( yi 2 y ) 2 5 g ( yi 2 y ) 2 1 g ( yi 2 yi ) 2
= =
(12.4)
3 3 3
SST SSR SSE
Here are the computational formulas for these sums of squares.
=
SST 5 Syy SSR 5 b1Sxy SSE 5 SST 2 SSR
Coefficient of Determination
The coefficient of determination, denoted r 2, is a measure of the proportion of the varia-
tion in the data that is explained by the regression model, and is defined by
r 2 5 SSR/SST (12.5)
r 2 5 1 implies a perfect fit. What Because 0 # SSR # SST, the coefficient of determination, r 2, is always a number
does r 2 5 0 imply? between 0 and 1 (inclusive). The higher r 2 is, the better the model is. Many statistical
software packages report 100r 2, the percentage of variation explained by the regression
model.
The following example illustrates the computations necessary to produce the esti-
mated regression line and the ANOVA table.
a. Find the
= estimated regression line and explain the meaning of the estimated coeffi-
cient b1.
b. Complete the ANOVA table (without the p value), and find and interpret the coefficient
of determination.
SOLUTION TRAIL
g x 2i g y 2i
n ANOVA table
ng xi yi 2 ( g xi )( g yi )
n Coefficient of determination 5 94,066 5 211,775
Random sample =
ng x 2i 2 ( g xi ) 2
n
b1 5 Equation 12.2.
T ranslati on
= =
n y 5 b0 1 b1x ( 18 )( 128,564 ) 2 ( 1,166 )( 1,733 )
ANOVA summary table 5 Use summary statistics.
n ( 18 )( 94,066 ) 2 ( 1,166 ) 2
n r2 293,474
5 5 0.8796 Simplify.
g yi 2 b1 g xi
Concepts 333,632
=
n Least-squares estimates =
b0 5 Equation 12.3.
n ANOVA summary table for n
simple linear regression 1,733 2 ( 0.8796 )( 1,166 ) =
n The proportion of variation 5 Use summary statistics and b1 .
18
in the data explained by the
regression model 5 39.3 Simplify.
=
The estimated regression line is y 5 39.3 1 0.8796x. The value b1 5 0.8796 sug-
Vi sion
gests that an increase of 1 dB leads to an increase of approximately 0.9 in the Agatston
Find the necessary summary
score. Figure 12.15 shows a scatter plot of the data and the graph of the estimated re-
statistics.
= Use Equation 12.2 to
find and Equation 12.3 to find gression line.
= b 1
b0 . Compute the ANOVA
summary table and compute r2. y
175
Estimated regression line:
150 y 39.3 0.8796x
125
100
75
50
25
20 40 60 80 100 120 x
Figure 12.15 A scatter plot of the data and the
graph of the estimated regression line.
SST 5 Syy 5 g y 2i 2 ( g yi ) 2
1
Computational formula.
18
1
5 211,775 2 ( 1,733 ) 2 5 44,925.6 Use summary statistics.
18
SSR 5 b1Sxy 5 b1 c g xi yi 2 ( g xi )( g yi ) d
= = 1
Computational formula.
18
1
( 1,166 )( 1,733 ) d
5 0.8796 c 128,564 2 Use summary statistics.
18
5 14,341.1 Simplify.
You can probably guess that we The value of the test statistic is
will reject the null hypothesis if
the value of the test statistic is MSR 14,341.1
f5 5 5 7.50
large. The formal hypothesis test MSE 1,911.5
is presented in Section 12.2.
Here are all of these calculations presented in an ANOVA table.
Approximately 0.3192, or 32%, of the variation in the data is explained by the regres-
sion model.
Figures 12.16 and 12.17 show technology solutions.
Figure 12.16 JMP scatter plot and Figure 12.17 JMP Linear Fit results.
estimated regression line.
Technology Corner
Procedure: Construct a scatter plot, find the estimated regression line, and complete an ANOVA table.
Reconsider: Example 12.4, solution, and interpretations.
CrunchIt!
A Scatterplot option is available under the Graphics menu, and the Simple linear regression command is used to find the
estimated regression line and construct the ANOVA table.
1. Enter the Noise data into column Var1 and the Agatston score data into column Var2. Rename the columns if desired.
2. Choose Graphics; Scatterplot. Enter the independent variable (X) and the dependent variable (Y).
3. In the Parameters section, select Points, and optionally enter a Title, X Label, Y Label, Slope of line, and Intercept of
line. Click Calculate to produce the scatter plot (Figure 12.18).
4. Choose Statistics; Regression; Simple Linear. Enter the Dependent Variable, the Independent Variable, and in the
Parameters section, select the desired Display. Click Calculate. The estimated regression line and coefficient statistics
are displayed. See Figure 12.19.
Figure 12.18 CrunchIt! scatter plot. Figure 12.19 The estimated regression line.
TI-84 Plus C
A scatter plot is one of the six built-in statistical plots. Use the function LinReg(a+bx) to compute the regression
coefficients. There is no built-in function to produce the ANOVA table.
1. Enter the Noise data into list L1 and the Agatston score data into list L2.
2. Choose STATPLOT ; STAT PLOTS; Plot1. Turn the plot On, select Type scatter plot (the first graph icon), enter
the Xlist, L1, the Ylist, L2, and choose a Mark and Color (for the points on the graph).
3. Enter appropriate window settings and press GRAPH to view the scatter plot. See Figure 12.20.
4. On the Home screen, choose STAT ; CALC; LinReg(a+bx), and enter two arguments: the independent variable
list, L1, and the dependent variable list, L2. Highlight Calculate and press ENTER . The regression coefficients
are displayed on the Home screen. See Figure 12.21.
Note that the function LinReg(a+bx) has an optional argument, a function variable, for example Y1, for storing the
regression equation. There are other calculator functions that will also produce the regression coefficients, for example,
LinReg(ax+b) and LinRegTTest.
Minitab
Minitab has a Scatterplot option in the Graph menu, and there are several built-in functions to compute the regression
coefficients and the ANOVA table, in the Stat; Regression menu.
1. Enter the Noise data into column C1 and the Agatston score data into column C2.
2. Select Graph; Scatterplot. Choose a Simple scatter plot.
3. Enter C2 under Y variables and C1 under X variables. Click OK. The scatter plot is displayed in a graph window. Adjust
the labels and axes as necessary. See Figure 12.22.
4. Select Stat; Regression; Regression; Fit Regression Model. Enter the Response variable, C2, and the Continuous predictors,
C1. Click OK. The regression coefficients and the ANOVA table are displayed in a Session window. See Figure 12.23.
Figure 12.22 Minitab scatter plot. Figure 12.23 Minitab output: regression
coefficients and analysis of variance table.
Excel
Use the Scatter chart option to construct a scatter plot and the Data Analysis Toolkit function Regression to compute the
regression coefficients and ANOVA table.
1. Enter the Noise data into column A and the Agatston score data into column B.
2. Highlight, or select, the data range, A1:B18. Under the Insert tab, select Scatter; Scatter with only Markers. The scatter
plot is displayed in the current worksheet. Edit the plot options as necessary. See Figure 12.24.
2. Under the Data tab, select Data Analysis; Regression. Enter the Y Range (B1:B19), the X Range, (A1:A19), and the
Output Range. Click OK to display the regression coefficients and ANOVA table. See Figure 12.25.
Figure 12.24 Excel scatter plot. Figure 12.25 Estimated regression coefficients and ANOVA table.
588 C hapter 1 2 Correlation and Linear Regression
12.10 Short Answer Explain why a scatter plot should be y 10.5 11.5 6.6 9.8 2.9 6.8 12.8
considered prior to finding an estimated regression line. x 6.8 4.9 7.8 5.3 1.1 0.3
y 10.2 4.9 9.4 8.2 3.6 3.2 EX12.12A
Practice
b. x 98.2 56.4 52.9 99.3 73.5 82.7
12.11 Decide whether a simple linear regression model is
appropriate for each of the following graphs. If it is, indicate y 88.7 167.0 185.5 72.5 149.5 113.0
=
whether the slope ( b1 ) of the estimated regression line is x 54.1 68.2 70.3 71.2 53.4 90.2
positive, negative, or zero. If it is not, state why.
y 163.0 135.4 135.8 130.1 168.4 82.0
a. y
x 53.8 99.4 85.5 72.6 92.7 89.6
y 184.2 90.1 130.3 132.9 93.2 103.6
x 80.9 76.4 54.5 97.3 58.7 92.6
y 121.0 115.7 174.1 79.1 159.3 86.4
EX12.12B
a. Find the expected value of Y when x 5 27.4. b. How much change in yield is expected if the distance
b. How much change in the dependent variable is expected between rows decreases by 2 inches?
when x decreases by 5 units? Justify your answer. c. Suppose s 5 3.2 bushels per acre. Find the probability
c. Suppose s 5 1.5. Find the probability that an that an observed value of yield is between 90 and 95
observed value of Y is between 2250 and 2252 bushels per acre when rows are 17 inches apart.
when x 5 30.
12.19 Physical Sciences Once a PVC pipe joint is assembled
12.15 Suppose a simple linear regression model is used to using solvent cement, a certain amount of time must be allowed
explain the relationship between x and y. A random sample of for the new joint to set. For 4- to 8-inch-diameter pipes, suppose
n 5 12 values for the independent variable was selected, and the setting time ( y, measured in hours) is linearly related to the
the corresponding values of the dependent variable were ambient temperature (x, measured in °F ) . The true regression
g xi 5 460.53 g yi 5 26,349.7
observed. The summary statistics were line is assumed to be y 5 8.3 2 0.09x.
a. Find an estimate of an observed setting time for an
gx 2i 5 17,875.1 g y 2i 5 3,421,892
ambient temperature of 65°F.
g xi yi 5 2246,677
b. How much less time does a joint take to set if the
temperature rises by 10°F?
a. Find the estimated regression line. c. If s 5 15 minutes, find the probability that an observed
b. Estimate the true mean value of Y when x 5 41. set time is less than one hour if the ambient temperature
is 80°F.
12.16 Suppose a simple linear regression model is used
to explain the relationship between x and y. A random 12.20 Sports and Leisure A recent study investigated the
sample of n 5 15 values for the independent variable was association between heading a soccer ball and subclinical evi-
selected, and the corresponding values of the dependent dence of traumatic brain injury.5 Amateur soccer players were
variable were observed. The data are given in the following selected, and each completed a questionnaire to determine the
table. EX12.16 number of times (x) they headed the ball in the last 12 months.
In addition, each player was subject to diffusion-tensor mag-
x 4.9 9.9 0.3 3.8 9.8 4.6 9.8 5.4 netic resonance to measure the fractional anisotropy (FA, y).
y 49.1 53.4 21.7 40.4 78.6 60.4 73.9 37.4 Suppose the estimated regression line was
y 5 20.00026x 1 0.6877.
x 3.1 3.0 4.2 4.0 1.7 1.2 7.0 a. Find the expected value of y if the number of headers is
y 41.6 29.6 46.7 28.5 23.8 15.2 72.3 1000.
b. Find an estimate of an observed y value for x 5 1200.
a. Construct a scatter plot for these data. Is a simple linear c. Suppose s 5 0.06. Find the probability that an observed
regression model reasonable? Justify your answer. y value is more than 0.5 when the number of headers
b. Find the estimated regression line. Add a graph of this is 750.
line to your scatter plot.
c. What is the predicted value of Y when x 5 5.7? 12.21 Manufacturing and Product Development As soon
as a bottle of soda is opened, it begins to lose its carbonation.
12.17 Suppose a simple linear regression model is used to Fourteen 12-ounce bottles of cola were obtained, and each was
explain the relationship between x and y. A random sample of assigned a randomly selected time period (in hours). Each
n 5 21 values for the independent variable was selected, and bottle was opened and allowed to stand at room temperature.
the corresponding values of the dependent variable were The carbonation ( y) in each bottle was measured (in volumes)
observed. EX12.17 after the prescribed time period (x). The summary statistics are
a. Construct a scatter plot for these data. Is a simple linear given below.
gxi 5 8.6 g yi 5 37.4 gxi yi 5 19.61
regression model reasonable? Justify your answer.
gx 2i 5 7.26 gy 2i 5 110.24
b. Find the estimated regression line. Add a graph of this
line to your scatter plot.
c. Estimate the true mean value of Y when x 5 62. a. Find the estimated regression line.
d. Construct an ANOVA summary table, except for the b. Estimate the true mean carbonation after 1 hour and
p value. 15 minutes.
12.22 Biology and Environmental Science Some research
Applications suggests that higher levels of carbon dioxide (CO2) increase
12.18 Biology and Environmental Science Agricultural the rate at which plants and trees grow.6 In fact, even without
research suggests that the final corn yield in bushels per acre ideal conditions (for example, normal precipitation and tem-
(y) is linearly related to the number of inches between rows (x). perature), some trees grow more in elevated levels of CO2.
Suppose the true regression line is y 5 197.5 2 6.1x. Suppose that, in a research study, data were collected on oak
a. Find the expected yield when there are 15 inches trees in northern U.S. forests. For randomly selected levels of
between rows. CO2 (x, measured in parts per million, ppm), the most recent
590 C hapter 1 2 Correlation and Linear Regression
tree-ring growth ( y, in cm) was measured. The summary statis- c. Find an estimate of an observed value of the donation
tics are given below ( n 5 10 ) . amount for a football team with a winning percentage
gxi 5 5,035.0 g yi 5 13.7 g xi yi 5 7,012.5
of 75%.
g x 2i 5 2,607,575 g y 2i 5 19.25
Extended Applications
a. Find the estimated regression line.
b. Find an estimate of an observed value of tree-ring growth 12.26 Public Health and Nutrition A recent study suggests
if the level of CO2 is 600 ppm. that the number of steps walked per day is strongly associated
with good health.8 A person’s heart rate indicates how hard the
12.23 Business and Management A recent study
heart is working to circulate blood throughout the body and is a
considered the effects of innovation on employment in Latin
measure of health. A lower resting heart rate may reduce the
America.7 It seems reasonable that as more firms produce
risk of heart attack and stroke, and also increase endurance. A
new products, they would need more workers, and employ-
random sample of adults between 35 and 40 years old was
ment would rise. For small firms in Argentina, let y be the
obtained, and each person wore a pedometer for a day. The
yearly percentage of employment growth and x be the
number of steps taken, x, was recorded, and the next day the
percentage of small firms that are product or process
resting pulse rate (beats/minute), y, for each person was also
innovators. Assume the estimated regression line is
measured.
y 5 25.399 1 5.790x.
a. Find the estimated regression line and= explain the
a. Find the expected percentage in employment growth if the
meaning of the estimated coefficient b1.
percentage of product or process innovators is 2%.
b. Complete the ANOVA table (without the p value), and find
b. Find an estimate of an observed value y for x 5 3.5.
and interpret the coefficient of determination. WALKING
c. Suppose s 5 1.5. Find the probability that an observed y
value is more than 19 when x 5 4. 12.27 Medicine and Clinical Studies Many automobile
d. Suppose x 5 0.9%. Find the expected value of y and accidents are caused by tired drivers. A typical test for
interpret this result. alertness involves measuring the speed and degree to which
a person’s eyes respond to certain stimuli.9 A random sample
12.24 Sports and Leisure Sailboat enthusiasts believe that
of 25 drivers was obtained, and the oscillations in pupil
the wind speed (x, in miles per hour) is linearly related to
size (x, in millimeters per second) was measured using a
(downwind) boat speed ( y, in knots). For wind speeds between
pupillograph. Each person’s tiredness ( y) was also recorded
10 and 30 mph and Hobie catamarans, the following data were
using the pupil unrest index (PUI). The summary statistics
recorded. EX12.24
are given below.
g xi 5 7.1 gyi 5 192.0 g xi yi 5 49.22
gx 2i 5 2.1064 g y 2i 5 2094.0
x 26.2 11.8 27.3 24.3 20.4 24.6 20.7 14.0 15.5 20.3
y 16.3 7.7 17.3 18.2 16.9 10.3 12.7 13.3 10.0 6.8
a. Construct a scatter plot for these data. a. Find the estimated regression line.
b. Find the estimated regression line. Add a graph of the b. Find the expected PUI for x 5 0.3 millimeters per
estimated regression line to the scatter plot in part (a). second.
c. Estimate the mean speed of a Hobie catamaran for a wind c. Suppose a driver is considered too tired to drive if the PUI
speed of 18 mph. score is 15 (or higher). What value of x yields an expected
PUI score of 15?
12.25 Psychology and Human Behavior A development
officer at a large university believes that the amount of 12.28 Biology and Environment Science The Virginia
money donated to the general fund by alumni each year is Cooperative Extension Service provides yield data and other
linearly related to the football team’s winning percentage. performance statistics on barley as well as a summary of the
A random sample of Division I football teams was selected. growing season. A random sample of barley varieties was
The winning percentage (x) and subsequent alumni donation obtained and the 2013 yield ( y, in bushels per acre) and the
amount (y, in millions of dollars) are given in the following date headed (x, number of days since the beginning of the cal-
table. EX12.25 endar year at which 50% of the heads have emerged from the
plants) are given in the following table.10 BARLEY
x 92 67 92 25 83 83 58 8 75
y 15.8 11.1 13.5 7.0 17.4 10.8 11.2 3.9 23.7 Date headed 115 111 114 116 115
Yield 88 87 87 84 82
x 100 92 58 92 17 25 75 92 33
y 27.8 23.3 11.3 6.6 7.3 10.3 15.8 12.0 6.8 Date headed 120 119 113 112 119
Yield 82 81 78 82 81
a. Construct a scatter plot for these data.
b. Find the estimated regression line. Add a graph of the a. Find the estimated regression line.
estimated regression line to the scatter plot in part (a). b. Complete the ANOVA table (without the p value).
12.2 Hypothesis Tests and Correlation 591
Even though a true regression line knowing the value of x provides no additional information about the value of Y. In this
is shown in Figure 12.27, there is case, a scatter plot would have no discernible pattern, with points scattered randomly
really no significant linear in the plane (Figure 12.26).
relationship. The true regression For simple linear regression, a test of significance is equivalent to testing the
line is of absolutely no use for hypothesis H0: b1 5 0. If H0 is true, then the model assumptions imply that the mean
predicting or estimating values of Y. value of Y for any value of x is the same. That is, mY 0 xi 5 b0 . Therefore, the values of
Y vary around the horizontal line y 5 b0 , and knowing the value of x adds no addi-
Remember that b1 is the slope of
tional information. See Figure 12.27.
the true regression line. A
horizontal line has slope,
b1, equal to 0. y y
True
0 regression
line
x x
Figure 12.26 Scatter plot of data showing Figure 12.27 If b1 5 0, the values of Y vary randomly
no linear pattern. around the true regression line y 5 b0.
A Closer L ok
1. The null hypothesis is rejected only for large values of the test statistic. The associated
p value is a right-tail probability. The F ratio will be larger when b1 2 0 than when
b1 5 0.
2. This is often called a model utility test. In general, if H0 is rejected, then the value of
r2 is usually large.
3. Alternatively, we can conduct this hypothesis test by using the p value. Recall that if
H0 is true, the p value is the probability of obtaining a value of the test statistic at least
as large as the observed value. If p # a, we reject the null hypothesis. If p . a, we
do not reject the null hypothesis.
Theorem
If the simple linear regression assumptions are true, then the random variable
B1 2 b1 B1 2 b1
T5 5
S/ !Sxx SB1
has a t distribution with n 2 2 degrees of freedom.
No proof is given here that the The null and alternative hypotheses are stated in terms of the parameter b1. The most
two tests are equivalent, but there common test has H0: b1 5 0. If b1 5 0, there is no significant linear relationship between
is some numerical evidence in the two variables. The true regression line is y 5 b0 (a horizontal line), and knowing the
Example 12.5. value of x is of no use in predicting the value of Y 0 x. For simple linear regression, the
following hypothesis test (with b10 5 0) is equivalent to an F test for a significant regres-
b10 is the hypothesized value of
sion with significance level a.
b0.
Hypothesis Test and Confidence Interval
Concerning b1
H0: b1 5 b10
Ha: b1 . b10, b1 , b10, or b1 2 b10
B1 2 b10
TS: T 5
SB1
RR: T $ ta,n22, T # 2ta,n22, or 0 T 0 $ ta/2,n22
VIDEO Tech
Video TECH Manuals
MANUALS A 100 ( 1 2 a ) % confidence interval for b1 has as endpoints the values
EXEL DISCRIPTIVE =
Linear regression:
Inference for slope,
b1 6 ta/2,n22 # sB1 (12.6)
CI for slope
There are similar properties, a hypothesis test procedure, and confidence interval concerning
the simple linear regression parameter b0. The estimator B0 has the following properties.
s2 g x 2i
1. B0 is an unbiased estimator for b0: E ( B0 ) 5 mB0 5 b0.
s2 g x 2i
nSxx
Theorem
If the simple linear regression assumptions are true, then the random variable
B0 2 b0 B0 2 b0
T5 5
S"g x 2i /nSxx SB0
SOLUTION TRAIL
Example 12.5 Ready to Operate
Data Set The Pulsar corporation sells a large sterilizer with four extendable shelves for medical
MEDTOOLS tools. Company engineers believe that the time to reach operating temperature from a cold
start (y, measured in minutes) is linearly related to the thickness of insulation (x, in inches).
A random sample of n 5 12 thicknesses was selected, and the time to reach operating
Solution Trail 12.5 temperature was recorded for each. The data and the summary statistics are given below.
Ke ywor ds x 1.3 1.8 0.9 1.6 2.6 1.5 2.1 3.0 0.8 2.4 2.5 2.6
n Estimated regression line y 8.0 6.9 8.1 7.0 6.3 6.5 6.4 5.8 8.3 8.3 6.6 6.6
g x2i g y2i
n t test (concerning b1 ) for
significant regression 5 50.13 5 607.66
n Random sample
a. Find the estimated regression line.
T ranslati on b. Complete the ANOVA table and conduct an F test for a significant regression. Use a
= =
n y 5 b0 1 b1x significance level of 0.05.
n ANOVA summary table
c. Conduct a t test (concerning b1 ) for a significant regression. Use a significance level
n Hypothesis test with b10 5 0 of 0.05.
Concepts d. Interpret your results.
n Least-squares estimates
Solution
n ANOVA summary table for = =
ng xi yi 2 ( g xi )( g yi )
simple linear regression a. Use the summary statistics to find b0 and b1 .
=
ng x2i 2 ( g xi ) 2
n Hypothesis test concerning b1
b1 5 Equation 12.2.
Vi sion
Find the necessary summary ( 12 )( 158.5 ) 2 ( 23.1 )( 84.8 )
statistics. Use Equation 12.2 to 5 Use summary statistics.
= ( 12 )( 50.13 ) 2 ( 23.1 ) 2
find
= b1 and Equation 12.3 to find
b0 . Complete the ANOVA 256.88
5 5 20.8371 Simplify.
g yi 2 b1 g xi
summary table. Use the template 67.95
=
for a hypothesis test concerning =
b1 with b10 5 0. b0 5 Equation 12.3.
n
84.8 2 ( 20.8371 )( 23.1 ) =
5 Use summary statistics and b1 .
12
5 8.6781 Simplify.
SSR 5 b1Sxy 5 b1 c g xi yi 2 ( g xi )( g yi ) d
= = 1
Computation formula.
n
1
( 23.1 )( 84.8 ) d
5 20.8371 c 158.5 2 Use summary statistics.
12
5 3.9679 Simplify.
c. The t test for a significant regression (concerning b1 ) is two-sided and has b10 5 0.
Use a significance level of 0.05.
H0: b1 5 0
Ha: b1 2 0
B1
TS: T 5
SB1
RR: 0 T 0 $ ta/2,n22 5 t0.025,10 5 2.2281
Sxx 5 g x2i 2 ( g xi ) 2
Remember that s2 5 MSE. 1
Computational formula.
n
1
5 50.13 2 ( 23.1 ) 2 5 5.6625 Use summary statistics.
12
s2B1 5 s2 /Sxx 5 0.4439/5.6625 5 0.0784
and the value of the test statistic is
=
b1 20.8371
t5 5 5 22.9896; 0 22.9896 0 5 2.9896 $ 2.2281
sB1 "0.0784
596 Ch apter 1 2 Correlation and Linear Regression
Because 0 t 0 lies in the rejection region, there is evidence to suggest that b1 2 0, the
t10
regression is significant. The tables of critical values for t distributions can be used to
bound the p value. Using technology,
p/2 5 P ( T $ 2.9896 ) 5 0.0068 1 p 5 0.0136
Figure 12.29 shows the calculation and an illustration of the exact p value.
2.9896 0 2.9896
Note: Comparing the two tests for a significant regression, we notice that t 2 < f. In
Figure 12.29 p-Value
fact, for simple linear regression t 2 5 f in every case, subject to round-off error. The
illustration:
p values associated with these two tests are also always the same. It seems reasonable
p 5 2P(T # 22.9896)
that these two tests should always lead to the same conclusion, because they are both
5 0.0136 # 0.05 5 a
overall tests of a significant regression.
d. This analysis suggests that insulation
= thickness is linearly related to the time to reach
Why have two hypothesis tests
operating temperature. Because b1 5 20.8371 , 0, this suggests that for each 1-inch
that are essentially the same? In
increase in thickness, the time to reach operating temperature decreases by 0.8371 minute.
multiple linear regression there
may be many (more than one) Figures 12.30 and 12.31 show technology solutions.
hypothesized explanatory
variables. An overall F test for a
significant regression is conducted
first. If the regression is
significant, then a t test is
conducted for each variable to
see whether it contributes
significantly to the variability in y.
Figure 12.30 Minitab regression analysis Figure 12.31 JMP Linear Fit results.
output.
plots in Figures 12.32 and 12.33 and the quantity Sxy 5 g ( xi 2 x )( yi 2 y ) . The horizon-
To understand the formula for the sample correlation coefficient, consider the scatter
tal line y 5 y and the vertical line x 5 x divide the plane region into four parts. The signs,
plus ( 1 ) or minus ( 2 ) , on each graph indicate the sign of the product ( xi 2 x ) ( yi 2 y )
12.2 Hypothesis Tests and Correlation 597
y y ()
() () ()
in each part. For example, for any ordered pair in the top right corner of Figure 12.32,
xi . x and yi . y. Therefore, xi 2 x . 0 and yi 2 y . 0, and the product is positive.
Similarly, consider any ordered pair in the bottom right corner of Figure 12.33, xi . x and
yi , y. Therefore, xi 2 x . 0 and yi 2 y . 0, and the product is negative.
If x and y are positively related, as in Figure 12.32, then most of the products
( xi 2 x )( yi 2 y ) are positive. And if x and y are negatively related, as in Figure 12.33, then
most of the products ( xi 2 x )( yi 2 y ) are negative. Therefore, it seems reasonable to find
the sum of all of these products, Sxy, and use this as a measure of the linear relationship
between the two variables. Large positive values of Sxy should indicate a positive linear
relationship, and large negative values should indicate a negative linear relationship.
This approach is intuitive, but it must be modified slightly. The magnitude of Sxy
depends on the units of x and y. For example, if we multiply every value of x by 10, the
inherent linear relationship between x and y should not change, but the value of Sxy
increases. The sample correlation coefficient adjusts Sxy so that it is unit-independent.
g xi yi 2 ( g xi )( g yi )
1
Sxy n
g x2i 2 ( g xi ) 2 d c g y2i 2 ( g yi ) 2 d
r5 5 (12.8)
"SxxSyy 1 1
c
Å n n
A Closer L ok
1. The value of r does not depend on the order of the variables and is independent of
units, or is unitless.
2. The value of r is always between 21 and 11, that is, 21 # r # 11. r is exactly 11 if
and only if all of the ordered pairs lie on a straight line with positive slope. r is exactly
21 if and only if all of the ordered pairs lie on a straight line with negative slope.
3. The square of the sample correlation coefficient is the coefficient of determination in
a simple linear regression model. Because 21 # r # 11, 0 # r2 # 1.
4. r is a measure of the strength of a linear relationship. If r is near 0, there is no evidence
of a linear relationship, but x and y may be related in another way.
5. Suppose there is a horizontal line ( y 5 b0 ) with zero slope, and all the data points lie
very close to this line. There is no association between the variables; the correlation is
close to 0. Intuitively, some of the products ( xi 2 x )( yi 2 y ) are positive and some are
negative, and they tend to cancel out one another.
598 Cha pter 1 2 Correlation and Linear Regression
x x
Figure 12.34 The variables x and y are Figure 12.35 The variables x and y are
(strongly) negatively related. The points (moderately) positively related. As the values
all fall near a straight line with negative of x increase, the values of y tend to
slope. The sample correlation coefficient is increase. The sample correlation coefficient
near 21. is near 0.6.
y r near 0 y r near 0
x x
Figure 12.36 The scatter plot shows no Figure 12.37 The variables x and y appear
linear relationship between the variables x and to be related but not linearly. There is a
y. The sample correlation coefficient is near 0. pattern in the scatter plot, but because the
sample correlation coefficient measures linear
association, r is near 0.
Find the sample correlation coefficient between capacity and number of tickets issued,
and interpret this value.
12.2 Hypothesis Tests and Correlation 599
Solution
VIDEO Tech
Video TECH Manuals
MANUALS Step 1 Compute the summary statistics needed to find the sample correlation coefficient.
Sxy 5 g xi yi 2 ( g xi )( g yi )
1
n
Figure 12.38 The sample 1
5 132,058 2 ( 634 )( 1,956 ) 5 8,047.6
correlation coefficient 10
computed using CrunchIt!. Step 2 Using Equation 12.8, the sample correlation coefficient is
Sxy 8,047.6
Remember that r is a measure of r5 5 5 0.1519
association, not causation. Two "SxxSyy " ( 29,078.4 )( 96,554.4 )
variables may be strongly related, Because r 5 0.1519 , 0.5, there is a weak positive linear relationship between
but that does not mean one variable parking-lot capacity and the number of tickets issued each month within 100
causes the other. In children, for meters. This suggests that more tickets are issued close to larger parking lots.
example, shoe size and height are Figure 12.38 shows a technology solution.
associated, but probably both are
caused by a third variable, age. Try It Now Go to Exercise 12.52
Technology Corner
Procedure: Find the estimated regression line, complete the ANOVA table, and conduct a t test (concerning b1 ) for a
significant regression.
Reconsider: Example 12.5, solution, and interpretations.
CrunchIt!
Use the Simple Linear Regression command. CrunchIt! does not display the ANOVA table.
1. Enter the thickness of insulation values into column Var1 and the time to reach operating temperature into column Var2.
2. Select Statistics; Regression; Simple Linear. Using the drop-down menus, select the Dependent Variable, y, the
I ndependent Variable, x; and in the Parameters section, select Numeric Results as the Display option.
3. Click Calculate. The results are displayed in a new window. See Figure 12.39.
TI-84 Plus C
Execute DiagnosticOn on the Home screen to display the coefficient of determination and sample correlation coefficient
with the estimated regression line.
1. Enter the thickness of insulation values into list L1 and the times into list L2.
600 C hapte r 1 2 Correlation and Linear Regression
2. Select STAT ; CALC; LinReg(a+bx). Enter the name of the list containing the observed values of the independent
variable, L1, and the name of the list containing the observed values of the dependent variable, L2. Highlight Calculate
and press ENTER . The estimated regression coefficients are displayed on the Home screen. See Figure 12.40.
3. To test for a significant regression in terms of b1, select STAT ; TESTS; LinRegTTest. Enter the name of the list
containing the observed values of the independent variable, L1, and the name of the list containing the observed values
of the dependent variable, L2, set Freq to 1, highlight 2 0, and optionally store the regression equation in a function
variable. See Figure 12.41.
4. Highlight Calculate and press ENTER . The output is displayed on the Home screen. See Figure 12.42.
Minitab
There are several ways to find the estimated regression line in Minitab, in a Session window and in the Stat; Regression menu.
1. Enter the thickness of insulation values into column C1 and the times into column C2.
2. Select Stat; Regression; Regression; Fit Regression Model. Enter the Response variable, C2, and the Continuous
predictor variable, C1. Click OK. The results are displayed in a session window. Refer to Figure 12.30.
Excel
Regression analysis is part of the Data Analysis tool pack.
1. Enter the thickness of insulation values into column A and the times into column B.
2. Under the Data tab, select Data Analysis; Regression. Enter the Y Range, X Range, and the Output Range. Click OK.
The output is displayed in the specified range. See Figure 12.43
CrunchIt!
Use the command Correlation to find the sample correlation coefficient.
1. Enter the capacity values into column Var1 and the number of tickets into column Var2.
2. Select Statistics; Correlation. Check the appropriate columns and click Calculate. The sample correlation coefficient is
displayed in a new window. Refer to Figure 12.38.
12.2 Hypothesis Tests and Correlation 601
TI-84 Plus C
Use the built-in function LinReg(a+bx) to find the sample correlation coefficient. Execute DiagonisticOn on
the Home screen to display the coefficient of determination and sample correlation coefficient with the estimated
regression line.
1. Enter the values for capacity into list L1 and the values for tickets into list L2.
2. Select STAT ; CALC; LinReg(a+bx). Enter the name of the list containing the observed values of the independent
variable, L1, and the name of the list containing the observed values of the dependent variable, L2. Press ENTER .
3. The sample correlation coefficient, r, is displayed at the bottom of the screen. See Figure 12.44.
Minitab
Minitab has a built-in function to compute the sample correlation coefficient, in the Stat; Basic Statistics menu.
1. Enter the values for capacity into column C1 and the values for tickets into column C2.
2. Select Stat; Basic Statistics; Correlation.
3. Select the Variables (columns) containing the data. Select the Method and optionally display the p-values. Click OK.
The sample correlation coefficient is displayed in a Session window. See Figure 12.45.
Excel
Use the built-in function CORREL to compute the sample correlation coefficient.
1. Enter the values for capacity into column A and the values for tickets into column B.
2. Use the function CORREL with the two arrays. See Figure 12.46.
12.38 True/False The sample correlation coefficient is a c. Using your confidence interval in part (b), is there any
measure of the strength of a linear relationship between two evidence to suggest that b1 2 0? Justify your answer.
continuous variables. How does your conclusion compare with part (a)?
12.39 Short Answer The sample correlation coefficient is 12.44 Consider the following data obtained in a simple linear
always between what two numbers? regression study. EX12.44
a. Complete the ANOVA summary table. a. Conduct an F test for a significant regression. Use
b. Conduct an F test for a significant regression. Use a 5 0.05. Bound the p value and use technology to find
a 5 0.05. the exact p value.
c. Find the coefficient of determination. b. Conduct a t test for a significant regression. Use a 5 0.05.
d. Use your answer in part (c) to find the sample correlation Bound the p value and use technology to find the exact
coefficient. p value.
c. Square the value of the test statistic in part (a). Compare
12.42 Consider the following (partial) ANOVA summary table this value with the test statistic in part (b).
from a simple linear regression analysis. d. How do the exact p values compare in parts (a) and (b)?
Explain this result.
ANOVA summary table for simple linear regression
12.46 Consider the following summary statistics for data
Source of Sum of Degrees of Mean obtained in a study of the linear relationship between two
variation squares freedom square F p Value variables.
Regression 2,772.93 Sxx 5 199.418 Syy 5 81.430 Sxy 5 111.774
Error 12,988.70
a. Find the sample correlation coefficient.
Total 35 b. Use the value of r from part (a) to describe the
relationship between the two variables.
a. Complete the ANOVA summary table.
12.47 Consider the following data obtained in a study of the
b. Conduct an F test for a significant regression. Use
a 5 0.01. relationship between two variables. EX12.47
can be explained by this simple linear regression model? a. Construct a scatter plot of the data.
b. Find the sample correlation coefficient.
12.49 Biology and Environmental Science In a recent c. Use your results in parts (a) and (b) to describe the
study, the weight of an orange (x, in pounds) was compared relationship between crimini mushroom weight and
with the amount of fresh-squeezed juice from the orange (y, in copper content.
ounces). A random sample of n 5 15 oranges was obtained. d. If a simple linear regression analysis were conducted,
Each was carefully weighed and squeezed. The following sum- identify the independent and dependent variables that
mary statistics were obtained.
g xi 5 9.09 g yi 5 50.44 g xi yi 5 30.72
would be used.
gx 2i 5 5.73 gy 2i 5 173.21
12.53 Sports and Leisure A recent study suggests that, for
NASCAR racers, yearly winnings are related to average finish.15
A random sample of NASCAR racers was obtained, and the
a. Find the estimated regression line, and complete the
average finish (x) and total winnings (y, in millions of dollars)
ANOVA table. Interpret your results.
through 31 races in 2013 were recorded for each.16 NASCAR
b. Conduct an F test for a significant regression. Use a
significance level of 0.01. a. Construct a scatter plot of the data. Describe the
c. Find a 95% confidence interval for the regression relationship between average finish and winnings.
parameter b0. Does this interval suggest that b0 is b. Find the sample correlation coefficient. Is the value of r
different from 0? consistent with your interpretation in part (a)? Why or
why not?
12.50 Physical Sciences The temperature of the upper layer c. Find the estimated regression line, and complete the
of ocean water is affected by sunlight and wind. There is often a ANOVA table.
very sharp difference in temperature between the surface zone d. Using your results from part (c), what would the average
and the more stationary deep zone. The thermocline layer finish have to be in order to win at least $6 million
marks the abrupt drop-off in temperature. The following data through 31 races?
were obtained in a study of temperature (x, measured in °C)
versus depth (y, measured in meters) above the thermocline 12.54 Fuel Consumption and Cars An investigative reporter
layer in the Mediterranean Sea. believes that certain automobile service stations that offer state
OCEAN
vehicle inspections routinely charge for unnecessary repair work.
Preliminary data suggest that the cost of the repair work may be
x 76 54 146 7 91 130 131 117 related to the age of the car. A random sample of automobiles
y 11.0 16.9 8.0 24.7 17.4 16.0 11.1 16.5 inspected at these stations was obtained, and the age (in years)
along with the cost of the repairs (in dollars) were recorded for
a. Complete the ANOVA summary table for simple linear each vehicle. The data are given in the following table. repair
regression.
Age Repair cost Age Repair cost
b. Conduct an F test for a significant regression and find
bounds on the p value associated with this test. Interpret 9.1 1882 3.2 17
your results. 3.8 193 3.3 1268
c. Find a 95% confidence interval for the true value of b1. 5.0 368 6.7 1126
1.7 1047 3.9 646
12.51 Physical Sciences Permafrost is soil or rock that
1.9 315 9.7 955
remains at or below 0°C for at least two years. Suppose an
5.3 1631 2.0 801
Arctic study analyzed the observed changes in mean annual air
5.9 652 5.4 973
temperature and the depth of the freezing layer. The depth of
6.4 475
the upper surface of the permafrost layer (y, in feet) was
604 Cha pter 1 2 Correlation and Linear Regression
a. Find the sample correlation coefficient, and describe the Extended Applications
linear relationship.
b. Find the estimated regression line for age (x) and repair 12.58 Business and Management The owner of a small
cost (y). Conduct an F test for a significant repression, ice cream stand believes that total weekly revenue (y, in dollars)
and find the bounds on the p value for this test. during the summer months is related to money spent on adver-
c. Using your results in parts (a) and (b), do you believe the tising (x, dollars per week). A random sample of summer weeks
reporter’s claim? Justify your answer. was selected, and the resulting data are given in the following
table. ICECREAM
12.55 Physical Sciences As the temperature of air
increases, it has a greater capacity to hold moisture. x 30 300 380 275 350 190 85
owever, humidity can only increase if there is a supply of
H y 957 1125 1202 1028 1134 1124 1062
moisture. For the area of southern Florida, a random sample
of days was obtained and the relative humidity (x, a a. Find the estimated regression line, and complete the
percentage) and daily temperature (y, in °F) were recorded ANOVA table.
for each.17 HUMID b. What proportion of the observed variation in weekly
a. Construct a scatter plot of the data. revenue is explained by this regression model?
b. Find the sample correlation coefficient. c. Find a 99% confidence interval for the regression
= parameter b1.
c. Based on the value of r in part (b), predict the sign of b1
d. Conduct a hypothesis test of H0: b0 5 0 versus
in the estimated regression line.
Ha: b0 . 0. Interpret the results. (What happens if the
d. Find the estimated regression line. Do the results support
owner spends nothing on advertising in a week?)
your answer to part (c)? Why or why not?
12.59 Public Health and Nutrition Research suggests that
12.56 Biology and Environmental Science A recent study
salt intake is related to fluid intake in adults and soft drink con-
suggests that airplanes flying into and out of Australia and New
sumption in children and adolescents.20 Suppose a random sam-
Zealand create the most amount of aircraft pollution in the form
ple of 18 children and adolescents was obtained and the amount
of ozone.18 An area over the Pacific Ocean, about 1000 km to
of salt consumed (x, g/week) and the amount of sugar-sweet-
the east of the Solomon Islands, is very sensitive to aircraft
ened soft drinks consumed (y, g/week) was recorded for each.
emissions. A random sample of aircraft flying from Australia
The following summary statistics were obtained.
gxi 5 430 g yi 5 17,512 gxi yi 5 441,608
was obtained, and the amount of aircraft emissions (x, oxides,
c. Find an estimate of the expected turbidity if 0.55 inch of acres) and the depth from the surface (y, in thousands of feet)
rain has fallen within the past week. was obtained for each. GEOLCO2
d. What proportion of the observed variation in turbidity is a. Construct a scatter plot of these data. Describe the
explained by this regression model? How do you think relationship between these two variables.
this model could be improved? b. Compute the sample correlation coefficient. Does this
12.61 Economics and Finance Stock brokers and even value support your answer in part (a)? Why or why not?
casual investors are always searching for better methods to c. Using the results in parts (a) and (b), do you believe the
predict the movement in the price of stocks. The “skirt-length relationship between these two variables is significant?
theory” suggests that if women’s skirts are short, then the Why or why not?
markets will rise. If skirts are long, then the markets will be d. Find the estimated regression line and conduct an F test
headed down. To test this theory, a random sample of years was for a significant regression. Use a 5 0.05. Does your
obtained, and the length of women’s skirts was measured (x, in conclusion support your answer in part (c)?
inches) for a typical fashion model. The change in the S&P 500
market index ( y) was also recorded for that year. The data are Challenge
given in the following table. STOCKS 12.64 Sports and Leisure Let r be the population correlation
coefficient between two variables. A hypothesis test for a correla-
x 24 15 24 16 16 16 19 15 tion different from zero is based on the sample correlation coef-
y 7 27 3 36 235 23 235 79 ficient, R (the random variable), and involves the t distribution.
H0: r 5 0
x 19 25 15 18 17 22 16 19
Ha: r . 0, r , 0, or r20
y 25 255 229 8 42 7 29 245
R!n 2 2
TS: T 5
a. Find the estimated regression line, and complete the "1 2 R2
ANOVA table. RR: T $ ta,n22, T # 2ta,n22, or 0 T 0 $ ta/2,n22
b. Conduct an F test for a significant regression. Use a
significance level of 0.01. A new mountain climber believes that there is a linear relation-
c. Construct a scatter plot of the data, and find the sample ship between the diameter of a single rope and its dry weight. A
correlation coefficient. random sample of climbing ropes was obtained. The diameter
d. Using your answers to parts (b) and (c), do you think the of each was measured (x, in mm) and the weight of each (y, in
skirt-length theory is worth using? Justify your answer. grams per meter) was also recorded. The data are given in the
following table. MTNCLIMB
12.62 Physical Sciences The rate of evaporation at the sur-
face of the water in a swimming pool (kg/h) is believed to be x 10.07 9.54 10.34 10.85 9.85 9.45 9.95
related to the air velocity (m/s) or the relative humidity (mea- y 71.2 64.5 67.2 73.5 64.5 65.2 68.2
sured as a percentage). POOL
a. Find the estimated regression line, and complete the
x 10.63 9.73 10.34
ANOVA table for evaporation rate ( y) and air velocity (x). y 70.9 67.8 72.0
b. Find the estimated regression line and complete the
a. Find the sample correlation coefficient, and conduct a
ANOVA table for evaporation rate ( y) and relative
two-sided hypothesis test with H0: r 5 0. Use a
humidity (x).
significance level of 0.05.
c. Which of these two models do you think is better? Justify
b. Find the estimated regression line, and conduct a test of
your answer.
H0: b1 5 0 versus Ha: b1 2 0. Use a significance level
12.63 Physical Sciences The U.S. Geological Survey of 0.05.
recently completed a national assessment of geologic carbon c. Explain the similarities between the values of the test
dioxide storage resources.22 A random sample of formations statistics in parts (a) and (b). Explain why this
was obtained, and the area of each formation (x, in thousands of relationship makes sense.
Remember that E(Y 0 x*) is the The error in estimating the mean value of Y is less than the error in estimating an
mean of all values of Y for which observed value of Y. In the first case, y* is used to estimate a single value, the mean.
x 5 x*. And y* is a single However, an estimate of an observed value of Y is a guess at the next value selected
observation for x 5 x*. from an entire distribution; the error in estimation must be greater. In this section, we
will first consider a hypothesis test and confidence interval concerning the mean value
of Y for x 5 x*. Then we will consider a prediction interval for an observed value of
Y if x 5 x*.
Suppose the simple linear regression model assumptions are true. For x 5 x*, the
random variable B0 1 B1x* has a normal distribution with expected value
E ( B0 1 B1x* ) 5 b0 1 b1x* (12.9)
and variance
1 ( x* 2 x ) 2
Var ( B0 1 B1x* ) 5 s2 c 1 d (12.10)
n Sxx
The standard deviation is the square root of the expression in Equation 12.10, and an
estimate of the standard deviation is obtained by using s as an estimate for s.
The numerator, (x* 2 x)2 , is 0 The variance of B0 1 B1x* is smallest when x 5 x. The further x* is from x, the
when x 5 x. greater the squared difference ( x* 2 x ) 2 and the greater the variance. Therefore, the esti-
mator, B0 1 B1x*, for the mean value of Y is most precise near x. And a confidence inter-
val for the mean value of Y would be narrower for values of x* near x. Intuitively, think
about predicting the weather. The further into the future we try to predict the weather, the
more inaccurate will be the forecast.
The following theorem is used to conduct a hypothesis test and construct a confidence
interval concerning the mean value of Y given x 5 x*.
Theorem
If the simple linear regression assumptions are true, then the random variable
( B0 1 B1x* ) 2 ( b0 1 b1x* )
T5
S" ( 1/n ) 1 3 ( x* 2 x ) 2 /Sxx 4
has a t distribution with n 2 2 degrees of freedom.
The null and alternative hypotheses are stated in terms of the parameter y* 5 b0 1 b1x*.
A 100 ( 1 2 a ) % confidence interval for mY 0 x*, the mean value of Y for x 5 x*, has as
endpoints the values
= = 1 ( x* 2 x ) 2
( b0 1 b1x* ) 6 ta/2,n22 s 1 (12.11)
Ån S xx
12.3 Inferences Concerning the Mean Value and an Observed Value of Y for x 5 x* 607
x 5.523 67.039 49.686 22.431 4.636 12.594 16.542 30.523 59.581 43.865 36.874 31.820
y 0.225 5.939 5.280 1.723 0.262 1.121 1.075 2.123 6.648 4.875 3.666 3.027
The estimated regression line is y 5 20.4201 1 0.1076x, and the summary ANOVA
Solution Trail 12.7 table is as follows.
Key w or ds
ANOVA summary table for simple linear regression
n Hypothesis test
n 30 thousand dollars Source of Sum of Degrees of Mean
n Mean expenditures greater variation squares freedom square F p Value
than
Regression 53.5304 1 53.5304 222.41 0.0000
n Random sample
Error 2.4068 10 0.2407
Tr anslati o n
Total 55.9372 11
n Conduct a one-sided, right-
tailed, hypothesis test concern- In addition, s 5 !MSE 5 !0.2407 5 0.4906, x 5 31.7595, and Sxx 5 4624.22.
ing the mean when x 5 30
a. For x 5 30 thousand dollars per capita, conduct a hypothesis test to determine
C oncepts whether there is any evidence that the mean expenditures for health care per capita is
n Hypothesis test concerning the greater than 2.6 thousand dollars.
mean value of Y when x 5 x* b. Construct a 95% confidence interval for the true mean health care expenditures in
thousands of dollars for a country with a GDP of 48 thousand dollars per capita.
V isi on
The hypothesized value of the
mean expenditures for a GDP of
Solution
30 is y*0 5 2.6. Use this value a. The hypothesis test is one-sided, right-tailed with y*0 5 2.6. Use a significance level
and the summary statistics to of 0.05.
conduct a one-sided, right-tailed
hypothesis test concerning the H0: y* 5 2.6
mean value of Y for Ha: y* . 2.6
x 5 x* 5 30. ( B0 1 B1x* ) 2 y*0
TS: T 5
S" ( 1/n ) 1 3 ( x* 2 x ) 2 /Sxx 4
t10 RR: T $ ta,n22 5 t0.05,10 5 1.8125
The value of the test statistic is
= =
( b0 1 b1x* ) 2 y*0
t5
s" ( 1/n ) 1 3 ( x* 2 x ) 2 /Sxx 4
0 1.4606 3 20.4201 1 0.1076 ( 30 ) 4 2 2.6
Figure 12.47 p-Value 5 5 1.4606
0.4906" ( 1/12 ) 1 3 ( 30 2 31.7595 ) 2 /4624.22 4
illustration:
p 5 P(T $ 1.4606) The value of the test statistic does not lie in the rejection region. Equivalently,
5 0.0874 . 0.05 5 a p 5 0.0874 . 0.05 as shown in Figure 12.47. At the a 5 0.05 significance level, there
is no evidence to suggest that the mean health care expenditures per capita is greater than
2.6 thousand dollars for a country with GDP 30 thousand dollars per capita.
b. To construct the confidence interval, first find the appropriate critical value.
ta/2,n22 5 t0.025,10 5 2.2281 Use Table V in the Appendix to find the critical value.
608 C hapte r 1 2 Correlation and Linear Regression
= = 1 ( x* 2 x ) 2
5 ( b0 1 b1x* ) 6 t0.025,10 s 1 Use the values of a and n.
Ån Sxx
1 ( 48 2 31.7595 ) 2
5 3 20.4201 1 0.1076 ( 48 ) 4 6 ( 2.2281 )( 0.4906 ) 1
Å 12 4624.22
Use summary statistics, values for s and t0.025,10 .
5 4.7447 6 0.4095 Simplify.
VIDEO Tech
Video TECH Manuals
MANUALS (4.3352, 5.1542) is a 95% confidence interval for the true mean health care expendi-
EXEL DISCRIPTIVE
Linear regression: tures in a country with GDP 48 thousand dollars per capita.
Confidence Interval Note: Figure 12.48 shows a scatter plot of the data, the estimated regression line, and the
for X 5 x
95% confidence bands for the true mean health expenditures for each GDP. The confi-
dence bands allow us to visualize 95% confidence intervals for the true mean value of Y
for any value of x 5 x*. The top confidence band connects the right, or upper, endpoint
of each 95% confidence interval, and the bottom confidence band connects the left, or
lower, endpoint of each 95% confidence interval. To estimate the 95% confidence inter-
val for the true mean value of y for x 5 48 from the graph, draw a vertical line at x 5 48.
The intersection of the line and the confidence bands yields the endpoints of the interval.
As x* moves away from x, the width of the confidence interval increases.
Figure 12.49 shows a technology solution.
y
7
Estimated
6 regression
5 line
4
3
Confidence
2 bands
1
0
0 10 20 30 x 40 50 60 x
Figure 12.48 Scatter plot, estimated regression Figure 12.49 Minitab output: regression equation
line, and confidence bands for the health care and a 95% confidence interval for the true mean health
expenditure–GDP data. care expenditures in a country with a GDP of 48.
The only difference between this Prediction Interval for an Observed Value of Y
prediction interval and the
A 100 ( 1 2 a ) % prediction interval for an observed value of Y when x 5 x* has as end-
confidence interval in Equation
points the values
12.10 is the extra 1 underneath
the square-root symbol. This extra = = 1 ( x* 2 x ) 2
( b0 1 b1x* ) 6 ta/2,n22 s 1 1 1 (12.12)
1 is reasonable because the Å n Sxx
random variable here includes an
extra term, E*, with variance s2.
Example 12.8 Health Care Expenditures (Continued)
Use the health care expenditure–GDP data presented in Example 12.7. Suppose a country
with a GDP of 50 thousands dollars per capita is selected at random. (This was the
approximate U.S. GDP in 2012.) Find a 90% prediction interval for the health care expen-
ditures per capita.
Solution
Step 1 Find the appropriate critical value.
ta/2,n22 5 t0.05,10 5 1.8125 Use Table V in the Appendix to find the critical value.
(4.0041, 5.9157) is a 90% prediction interval for an observed value of health care
expenditures per capita in a country with a GDP of 50 thousand dollars per capita.
VIDEO Tech
Video TECH Manuals
MANUALS Figure 12.50 shows a technology solution.
EXEL DISCRIPTIVE
Linear regression:
Prediction interval
for X 5 x
A Closer L ok
Can you explain why the 1. A confidence interval for the mean value of Y and a prediction interval for an observed
prediction interval is wider the value of Y (for x 5 x*) are centered at the same value. Compare the endpoints for
further x* is from x? Look these two intervals in Equations 12.11 and 12.12. The only difference is a 1 underneath
carefully at Equation 12.12. the radical in the prediction interval.
2. As x* moves farther away from x, the width of the corresponding prediction interval
increases.
Technology Corner
Procedure: Construct a confidence interval for the mean value of Y and a prediction interval for an observed value of Y
when x 5 x*.
Reconsider: Examples 12.7 and 12.8, solution, and interpretations.
CrunchIt!
Use Simple Linear Regression options to construct a prediction interval.
1. Enter the GDP data into column Var1 and the health care expenditure data into column Var2. Rename the columns if
desired.
2. Choose Statistics; Regression; Simple Linear. Enter the Dependent Variable, the Independent Variable. In the Param-
eters section, select the desired Display, enter 50 in the Predict box, and the Prediction Interval Level. Click Calculate.
The estimated regression line, coefficient statistics, and the prediction interval are displayed. See Figure 12.51. Note:
CrunchIt! does not have an option for a confidence interval for Y when x 5 x*.
Minitab
Use regression options to construct a confidence interval and a prediction interval.
1. Enter the values for GDP into column C1 and the values for health care expenditures into column C2.
2. Select Stat; Regression; Regression; Fit Regression Model. Enter the Response column, C2, and the Continuous
predictor column, C1. Click OK.
3. Select Stat; Regression; Regression; Predict. Select the Response variable, enter individual values, and enter 48
under C1.
4. Click OK. The confidence interval(s), and prediction interval(s) are displayed in a session window. Refer to
Figures 12.49 and 12.50.
Note
1. Figure 12.52 shows a 95% confidence interval for the mean value of Y and a 90% prediction interval for an observed
value of Y when x 5 48 using JMP.
2. The TI-84 Plus and Excel do not have a built-in function for constructing a confidence interval for Y nor a prediction
interval for y when x 5 x*.
12.3 Inferences Concerning the Mean Value and an Observed Value of Y for x 5 x* 611
and the amount of solar radiation was measured (x, in lang- 12.79 Public Health and Nutrition The European Food
leys) for each. The total battery charge was measured as a Safety Authority recently issued a scientific opinion on the
proportion (y, between 0 and 1). The summary statistics are public health risks related to mechanically separated meat
given below. (MSM). The analysis suggested that calcium could be used to
= distinguish between MSM and non-MSM products.23 A ran-
b0 5 0.2007 n 5 21 MSE 5 0.06135
= dom sample of MSM poultry was obtained and the deboner
b1 5 0.00446 x 5 103.095 Sxx 5 12335.8 head pressure (x, lb/in2) and the amount of calcium (y, ppm)
a. What proportion of a charge can you expect the batteries was measured for each. The data are given in the following
to take if the amount of solar radiation is 100 langleys? table. POULTRY
b. Find a 95% confidence interval for the true mean battery
charge proportion if the amount of solar radiation is 130 Pressure Calcium Pressure Calcium
langleys. 51 573 112 577
c. A value of 80 langleys indicates a typical cloudy day. On 95 654 76 600
a typical cloudy day, is there any evidence to suggest
104 581 143 666
that the true mean charge proportion is greater than 0.06
(the proportion needed to ensure a home will have 143 709 93 616
sufficient energy until the next day)? Use a significance 77 560 87 514
level of 0.01. 109 629 70 586
12.77 Fuel Consumption and Cars An automobile 102 623 49 584
mechanic believes the weight of a tire is related to the overall 72 560 142 634
diameter of the tire. A random sample of automobile tires was 120 598 132 632
obtained and the diameter (x, inches) and weight (y, pounds)
were measured for each. The estimated regression line was a. Find the estimated regression line.
b. Find an estimate for the true mean calcium for a pressure
SSE 5 258.8706, x 5 25.3474, gx2i 5 12221.6504, and
y 5 237.8316 1 2.46657x with n 5 19, SSR 5 87.1673,
of 100 lb/in2.
c. Suppose a deboner is set at a pressure of 120 lb/in2. Is
Sxx 5 14.3274.
there any evidence to suggest that the true mean calcium
a. Complete the ANOVA summary table, and conduct an F
is greater than 625 ppm? Find the p value associated with
test for a significant regression. Explain the relationship
this test.
between tire weight and tire diameter.
d. Suppose certain hand-deboned poultry has on average
b. Find an estimate of the observed tire weight for a tire
600 ppm calcium. What should the pressure be for the
diameter of 25.2 inches.
mean calcium level for MSM poultry to be the same
c. Find a 99% prediction interval for an observed tire weight
value?
if the tire diameter is 24.8 inches.
d. Conduct the hypothesis test H0: b0 5 0 versus Ha: b0 2 0 12.80 Travel and Transportation Highway engineers
with significance level 0.01. Is there any evidence to have long argued that roads designed with high skid
suggest that the true regression line does not pass through resistance help to prevent accidents, especially in wet condi-
the origin? Does this result make practical sense? Why or tions. A random sample of two-lane highways was selected
why not? from across the United States, and the skid resistance
was measured (in skid numbers) using a Skid Resistance
12.78 Biology and Environmental Science A study was
Tester (SRT). The accident rate (per 10,000 vehicles) was
conducted to investigate the relationship between asthma preva-
computed for 25-mile sections of each highway during wet
lence and annual rainfall in countries around the world. A ran-
conditions. SKIDTEST
dom sample of 11 countries was obtained. The total annual
a. Identify the independent variable and the dependent
rainfall (x, in mm) was recorded for each country, as well as the
variable.
percentage of adults, 22–44 years old, treated for asthma (y).
b. Construct a scatter plot for these data, and describe the
The following summary statistics were reported.
relationship between the two variables.
Sxx 5 178,661 Syy 5 49.2655 Sxy 5 380.955 c. Find the estimated regression line.
x 5 792.636 y 5 5.26 d. Suppose the skid resistance is 0.50. Conduct a hypothesis
test to determine whether there is any evidence that the
a. Find the estimated regression line.
true mean accident rate is less than 0.60. Use a
b. Complete the ANOVA summary table, and conduct an F
significance level of 0.05.
test for a significant regression. Does annual rainfall help
to explain the variation in asthma prevalence? Justify your 12.81 Biology and Environmental Science A recent study
answer. suggests that the onset of Alzheimer’s disease (AD) is related to
c. Find a 95% prediction interval for an observed asthma exposure to microorganisms.24 A greater exposure to microor-
prevalence if the total annual rainfall is 1000 mm. ganisms may actually protect against AD. A random sample of
Comment on anything odd about this interval. countries was obtained and a measure of contemporary parasite
12.3 Inferences Concerning the Mean Value and an Observed Value of Y for x 5 x* 613
stress (x) and AD burden (y) was recorded for each. The data Extended Applications
are given in the following table. ALZHEIM
12.84 Technology and the Internet A recent study suggests
that there is a relationship between Twitter use and TV ratings.26
Parasite AD Parasite AD
A random sample of prime-time television shows was obtained.
stress burden stress burden
During the month of September, the Twitter volume (x, in mil-
3.81 5.14 4.99 3.48 lions) of tweets on the day of each show and the Nielsen TV rat-
3.84 4.59 4.91 3.59 ing ( y) was recorded for each show. The summary statistics are
=
4.45 4.27 3.79 5.06 b0 5 20.54 n 5 25 MSE 5 0.2042
4.86 4.46 4.36 4.76 =
b1 5 0.006795 x 5 407.6 Sxx 5 21612.0
3.95 4.28 4.07 4.31
a. Find an estimate of the true mean TV rating if there are
4.19 4.40 3.89 4.33
400 million tweets on the day of the show.
4.05 4.49 4.59 4.80 b. Suppose the number of tweets during a day is 450 million.
4.46 4.43 3.54 4.77 Is there any evidence that the true mean TV rating will
4.42 4.28 4.51 4.96 exceed 2.3? Use a significance level of 0.05.
4.27 4.99 4.18 5.44 c. Construct a 99% confidence interval for the true mean
TV rating when x 5 375. On the basis of this interval,
a. Find the estimated regression line. Explain the relevance of do you think the mean TV rating is less than 2.3? Justify
the sign on the estimate of b1 in the context of this problem. your answer.
b. Conduct an F test for a significant regression. Find the p 12.85 Public Policy and Political Science In a global
value associated with this test. study by Transparency International, it was found that a coun-
c. Suppose the United States has a parasite stress of try’s environmental performance is related to political corrup-
approximately 5.6. Find a 95% confidence interval for tion. A random sample of 15 countries was selected. The
the true mean AD burden in the United States. Transparency International Corruption Perceptions Index
12.82 Psychology and Human Behavior Several studies (CPI, x), a measure of perceived corruption among public
suggest that the number of violent crimes is related to tempera- officials and politicians, was computed for each country. In
ture. Suppose the FBI investigated this relationship between addition, the Environmental Sustainability Index (ESI, y), a
temperature and the number of violent crimes in several measure of overall progress toward environmental sustainabil-
large cities in the United States. A random sample of days ity, was computed for each country. The estimated regression
was obtained, and the average temperature (x, in °F) and the line was y 5 26.432 1 4.546x, with SSR 5 853.50,
number of violent crimes per 100,000 people (y) were recorded SSE 5 1234.23, x 5 5.4333, and Sxx 5 41.2933.
for each day. CRIME a. Complete the ANOVA summary table, and conduct an F
a. Construct a scatter plot for these data, and describe the test for a significant regression. Explain the relationship
relationship between the two variables. between CPI and ESI.
b. Find the estimated regression line. b. Find an estimate of the observed ESI for a CPI score
c. Find a 95% prediction interval for an observed number of of 6.7.
violent crimes if the average temperature is 80°F. c. Find a 95% prediction interval for an observed ESI if the
d. Find a 95% prediction interval for an observed number of CPI is 8.2. Based on this interval, do you think a
violent crimes if the average temperature is 60°F. randomly selected country with CPI 8.2 will have an ESI
of 90 or greater? Justify your answer.
12.83 Demographics and Population Statistics A recent
study suggests that the linguistic diversity in a community is 12.86 Physical Sciences The frequency of air exchange is
related to the number of traffic accidents.25 A random sample believed to influence the indoor climate in unheated historic
of U.S. cities was obtained and the annual road fatalities per buildings. This is important because many unheated historic
1000 people (y) and the linguistic diversity (x, Greenberg buildings hold collections of artifacts, and the indoor climate
diversity index) was recorded for each. LINGDIV is related to preservation of these items. Skokloster Castle in
a. Construct a scatter plot for these data, and describe the Sweden is a masonry building without active climate control
relationship between the two variables. Does your that houses many artifacts.27 A random sample of days during
answer support the conclusions of the recent study? Why the year was obtained. The air changes per hour (x, ACH)
or why not? and the climate fluctuations transmittance related to relative
b. Find
= the estimated regression line. Explain how the sign humidity (y, CFT) was measured in similar rooms. CLIMATE
of b1 supports the conclusions of the recent study. a. Construct a scatter plot for these data, and describe the
c. The Greenburg diversity index in the United States is relationship between the two variables.
approximately 0.333. Find a 95% interval for the true b. Find the estimated regression line. Does the estimate
mean number of accidents per 1000 people in the of b1 agree with your description in part (a)? Why or
United States. why not?
614 Cha pter 1 2 Correlation and Linear Regression
c. Conduct an F test for a significant regression. Find the p sleep-aid products. Unfortunately, because this drug is so
value associated with this test. accessible, there are frequent cases of overdoses. To treat these
d. Find a 99% confidence interval for the true mean CFT for patients properly, it is important to know the amount of DS
an ACH of 0.5. ingested. However, many overdose patients are unconscious
e. Using your confidence interval in part (d), is there any when they arrive at a hospital emergency room. A study was
evidence that the mean CFT is different from 0.75? conducted to determine if the DS amount ingested could be
reliably predicted from the plasma drug concentration upon
12.87 Medicine and Clinical Studies Golden Rule medical
arrival at the hospital. A random sample of DS overdose
insurance company recently investigated the relationship between
patients was obtained from various hospitals. The plasma drug
the number of patients per registered nurse in a hospital and the
concentration (x, in mg/mL) was measured for each upon
patient’s length of stay. A random sample of hospitals was
arrival at the hospital. Following recovery, each patient was
selected, and the number of patients per registered nurse was
interviewed and the amount of DS ingested (y, in mg) was
computed (x). A patient was randomly selected for each hospital,
also recorded. ANTIHIST
and the length of stay was recorded (y, in hours). PATIENT
a. Construct a scatter plot for these data, and describe the
a. Construct a scatter plot for these data, and describe the
relationship between the plasma concentration and the
relationship between the two variables.
amount ingested.
b. Find the estimated regression line.
b. Find the estimated regression line.
c. Find a 99% prediction interval for an observed length of
c. Find a 95% confidence interval for the true mean
stay if the number of patients per nurse is 3.7.
amount of DS ingested for a plasma concentration of
d. Find a 99% prediction interval for an observed length of
6 mg/mL.
stay if the number of patients per nurse is 3.3.
d. Suppose the lethal dose of DS for a typical teenager
e. Why is the prediction interval in part (c) wider than the
is 2000 mg and that an overdose patient has a plasma
prediction interval in part (d)?
concentration of 5 mg/ mL. Do you believe that
12.88 Physical Sciences Doxylamine succinate (DS) is an this patient has taken a lethal dose of DS? Why or
over-the-counter antihistamine and is used in many nighttime why not?
observations from a normal distribution, the points will fall along a straight line. The data
axis can be horizontal or vertical. If the scatter plot is nonlinear, there is evidence to sug-
gest that the data did not come from a normal distribution.
A normal probability plot of the residuals may be used to check the normality assump-
tion. A simple histogram or stem-and-leaf plot may also be used to reveal departures from
normality.
A scatter plot of the residuals versus the independent variable values is also used to
=
check the simple linear regression assumptions. The ordered pairs in this plot are ( xi, ei ) .
Figure 12.53 illustrates the definition of a residual. This is a scatter plot of y versus x with
the estimated regression line. Figure 12.54 shows a scatter plot of the resulting residuals
versus the independent variable.
y
The ith residual e
is the difference:
ei yi (0 1xi)
ei
Estimated regression
line: y 0 1x
0
xi x
xi x
Figure 12.53 An illustration of the definition Figure 12.54 A scatter plot of the residuals
of a residual. versus the independent variable.
If there are no violations in assumptions, a scatter plot of the residuals versus the inde-
pendent variable should look like a horizontal band around zero with randomly distrib-
uted points and no discernible pattern. For example, see Figure 12.55. There should be no
obvious relation or pattern between the residuals and the predictor variable.
e
8
6
4
2
0
60 70 x
2
4
6
8
e e
0 0
x x
Figure 12.56 A curved residual plot. This Figure 12.57 A residual plot with
suggests that a linear model is not appropriate. nonconstant spread. This suggests that the
variance is not the same for each value of x.
3. Check for any unusually large (in magnitude) residual (Figure 12.58). This suggests
that one observation is very far away from the rest. The data may have been recorded
or entered incorrectly. Often, the offending point is omitted and a new estimated
regression line is computed.
4. Check for any outliers (Figure 12.59). If the observation is correct, an outlying resid-
ual suggests that one observation has an unusually large influence on the estimated
regression line. This point is also often omitted, and a new line is computed.
Possible
e e influential
observation
0 0
x x
Large residual
Figure 12.58 A residual plot with an Figure 12.59 A residual plot with an outlying
unusually large residual. This suggests that an residual. This observation is very influential when
observation is far away from the test. the estimated regression line is computed.
Solution
=
Step 1 Use the estimated regression line to find the predicted value, yi, for each x value:
= = =
yi 5 212.360 1 14.213xi. Then, compute each residual: ei 5 yi 2 yi. The nor-
mal scores for n 5 20 are given in Table IV in the Appendix.
12.4 Regression Diagnostics 617
Normal
= =
xi yi yi ei score
5.9 108 71.5 36.5 0.92
4.4 12 50.2 238.2 20.92
3.6 86 38.8 47.2 1.13
9.1 136 117.0 19.0 0.45
8.8 103 112.7 29.7 20.19
6.3 99 77.2 21.8 0.59
11.7 136 153.9 217.9 20.31
10.9 94 142.6 248.6 21.13
6.3 9 77.2 268.2 21.87
5.7 145 68.7 76.3 1.87
Percho/Dreamstime
5.0 4 58.7 254.7 21.40
Reminder: The residuals should be 8.4 124 107.0 17.0 0.31
placed in order from smallest to 4.5 68 51.6 16.4 0.19
largest when the normal scores
13.3 177 176.7 0.3 0.06
are assigned.
4.9 91 57.3 33.7 0.74
14.2 259 189.5 69.5 1.40
10.4 101 135.5 234.5 20.74
10.2 101 132.6 231.6 20.45
5.0 56 58.7 22.7 20.06
3.6 7 38.8 231.8 20.59
Step 2 The normal probability plot of the residuals is shown in Figure 12.60. The points
lie along an approximate straight line. There is no evidence to suggest that the
normality assumption is violated.
e
75
50
25
2 1 1 2 z
25
50
Figure 12.60 The normal 75
probability plot of the residuals.
=
Step 3 Figure 12.61 shows the residual plot (ei versus xi). There is no discernible pattern
in this graph. One point in the upper right, corresponding to x 5 14.2, may be an
outlier, but the graphical evidence is not convincing enough. These two graphs
suggest that there is no violation in the simple linear regression assumptions.
e
75
50
25
0
2 4 6 8 10 12 14 x
25
Figure 12.61 The residual plot 50
of residuals versus predictor 75
variable values.
618 Cha pte r 1 2 Correlation and Linear Regression
VIDEO Tech
Video TECH Manuals
MANUALS Figures 12.62 and 12.63 together show a technology solution.
EXEL DISCRIPTIVE
Residual plots
Figure 12.62 JMP normal probability plot Figure 12.63 JMP scatter plot of the residuals
of the residuals. versus the predictor variable values.
A Closer L ok
1. The sum of the residuals is always 0, subject to round-off error. We’ll check a few
examples empirically.
=
2. Other diagnostic plots include a plot of the residuals versus the predicted values yi.
Any distinct pattern suggests a violation in the simple linear regression assumptions.
3. Other types of residuals are also used to check for violations in assumptions. Stan-
dardized residuals utilize the standard deviation of each residual and are useful for
identifying residuals with large magnitudes. Studentized residuals are also standard-
ized but by using a model without the current observation.
Technology Corner
Procedure: Construct a normal probability plot of the residuals and a scatter plot of the residuals versus the predictor
variable values.
Reconsider: Example 12.9, solution, and interpretations.
CrunchIt!
In the Simple Linear Regression dialog box, use the Residuals QQ Plot option to produce a normal probability plot and the
Residuals Plot to construct a scatter plot of the residuals versus the predictor variable.
1. Enter the x values into column Var1 and the observed y values into column Var2.
2. Select Statistics; Regression; Simple Linear. Using the pull-down menus, select the Dependent Variable, Var2, and the
Independent Variable, Var1.
3. In the Parameters section, select Residuals QQ Plot from the pull-down menu for Display and click Calculate. See Figure 12.64.
4. To construct a scatter plot of the residuals versus the predictor variable values, select Residuals Plot from the pull-down
menu for Display and click Calculate. See Figure 12.65.
Figure 12.64 Normal probability plot of Figure 12.65 Scatter plot of the residuals
the residuals. versus the predictor variable values.
12.4 Regression Diagnostics 619
TI-84 Plus C
A normal probability plot and a scatter plot are built-in statistical plots.
1. Enter the x values into list L1 and the observed y values into list L2.
2. Select STAT ; CALC; LinReg(a+bx). Enter the name of the list containing the values of the independent variable,
L1, and the name of the list containing the observed values of the dependent variable, L2. When this command is
executed, the residuals are automatically stored in the named list RESID.
3. Construct a normal probability plot for the residuals (in the list RESID as described in Section 6.3. See Figure 12.66.
4. Construct a scatter plot of the residuals versus the predictor variable values as described in Section 12.1. See Figure 12.67.
Minitab
Use the Graphs option to construct a scatter plot of the residuals and a scatter plot of the residuals versus the predictor
variable values.
1. Enter the x values into column C1 and the observed y values into column C2.
2. Select Stat; Regression; Regression; Fit Regression Model. Enter the Response and Continuous predictor columns.
3. In the Graphs options, use Regular residuals. Check Normal probability plot of residuals and enter C1 under Residuals
versus the variables.
4. Click OK to conduct the regression analysis and display the graphs. See Figures 12.68 and 12.69.
Figure 12.68 Minitab normal probability plot of Figure 12.69 Scatter plot of the residuals versus the
the residuals. predictor variable values.
Excel
Compute the normal scores for the residuals as described in Section 6.3. Construct scatter plots as described in Section 12.1.
1. Enter the x values into column A and the observed y values into column B.
2. Under the Data tab, select Data Analysis; Regression. Enter the Y Range, the X Range, and an Output Range.
Check Residuals to save the residuals and Residual Plots to construct a plot of the residuals versus the predictor
variable values.
620 Ch apter 1 2 Correlation and Linear Regression
3. Click OK. The scatter plot of the residuals versus predictor variable values is shown in Figure 12.70.
4. Compute the normal scores and construct the normal probability plot as described in Section 6.3. See Figure 12.71.
Figure 12.70 Scatter plot of the residuals Figure 12.71 Normal probability plot of
versus the predictor variable values. the residuals.
Practice
12.95 Consider the following data used in a simple linear c. e
regression analysis. EX12.95
d. e d. e
10
5
0
20 40 60 80 x
2 1 1 2 z 5
10
12.98 The following residuals were obtained in a simple linear 12.101 The data on the text website were used in a simple
regression analysis. EX12.98 linear regression analysis. The estimated regression line is
y 5 91.74 2 3.7384x. EX12.101
a. Find the residuals.
222.2 267.9 21.1 26.3 212.4 233.0 224.2
b. Carefully sketch a graph of the residuals versus the
35.0 213.2 33.7 42.3 14.2 13.6 210.7 predictor variable, x. Is there any evidence of a violation
21.6 23.7 38.0 0.4 220.3 12.1 in the regression assumptions? Justify your answer.
a. Construct a normal probability plot for these residuals. 12.102 The data on the text website were used in a simple
b. Is there any evidence to suggest that the random errors are linear regression analysis. EX12.102
not normal? Justify your answer. a. Find the estimated regression line.
b. Compute the residuals.
12.99 The residuals on the text website were obtained in a
c. Carefully sketch a graph of the residuals versus the
simple linear regression analysis. EX12.99
predictor variable, x. Is there any evidence of a violation
a. Construct a normal probability plot for these residuals.
in the regression assumptions? Justify your answer.
b. Is there any evidence to suggest that the random errors are
not normal? Justify your answer.
Applications
12.100 Examine each residual plot (residuals versus predic-
tor variable). Is there any evidence to suggest a violation in 12.103 Physical Sciences The Atlantic Elevator Company
the simple linear regression model assumptions? If the graph conducted a study to examine the relationship between the
suggests a violation, indicate which assumption is in doubt. total weight of passengers on an elevator (x, in pounds) and
a. e the total energy (y, in kW) required to lift the passengers
15
one floor in an office building. A simple linear regression
analysis was conducted, and the following residuals were
10 obtained. ELEVAT
5
21.2 21.1 1.5 28.5 213.1 23.3 217.4
4 2 2 4 x
7.5 26.0 8.2 15.6 6.9 24.3 219.2
5
18.9 29.4 16.3 0.8 23.0 10.9
a. Carefully sketch a normal probability plot for the strongly related to the number of businesses, or establishments,
residuals. in that area. During the first quarter of 2013, a random sample
b. Is there any evidence that the random error terms are not of states was obtained. The number of establishments (x, in
normal? Justify your answer. thousands) and the number of people employed (y, in thou-
sands) was recorded for each.28 EMPLOY
12.105 Medicine and Clinical Studies A family counselor
a. Find the estimated regression line and compute the
believes that there is a relationship between number of years
residuals.
married and blood pressure. For each married man in a random
b. Carefully sketch a graph of the residuals versus the
sample, the number of years married (x) and the systolic blood
predictor variable (establishments). Is there any evidence
pressure ( y, in mmHg) were recorded. The data are given in the
that the simple linear regression assumptions are
following table. MARRIED
violated? Justify your answer.
Years Blood Years Blood 12.109 Physical Sciences The solar wind, or the flow of
married pressure married pressure charged particles from the Sun, affects communications, navi-
gation, and the Earth’s static pressure (the force exerted by the
4.9 133 3.4 94
atmosphere on the surface of the Earth). A random sample of
10.3 157 3.1 114 days was selected, and the solar wind density was measured
11.9 129 12.0 144 (x, in protons/cm3) using a special NASA satellite. For each day
13.0 141 12.9 147 selected, the static pressure (y, in MPa) was also recorded. The
13.0 166 7.1 124 data are given in the following table. SOLAR
The estimated regression line is y 5 97.95 1 4.034x. Solar wind Static Solar wind Static
a. Compute the residuals, and verify that they sum to 0. density pressure density pressure
b. Carefully sketch a normal probability plot for the 5.6 0.099 9.9 0.117
residuals. Is there any evidence that the random error 7.4 0.080 9.4 0.110
terms are not normal? Justify your answer.
8.0 0.115 11.0 0.138
12.106 Manufacturing and Product Development An 8.7 0.110 6.7 0.104
automotive component manufacturer believes that the speed of 10.1 0.110 6.9 0.094
the milling machine is related to the surface roughness in alu-
minum components. A random sample of aluminum compo- a. Find the estimated regression line, and compute the
nents was obtained. Each piece had a 50350 mm cross section residuals.
and was 100 mm in length. The speed of the milling machine b. Carefully sketch a normal probability plot of the
(x, in rpm) and the surface roughness (y, in mm) was measured residuals and a plot of the residuals versus the predictor
for each piece. MILLING variable.
a. Find the estimated regression line and compute the c. Do these graphs provide any evidence that the simple
residuals. linear regression assumptions are invalid? Justify your
b. Carefully sketch a normal probability plot of the answer.
residuals and a plot of the residuals versus the predictor
variable. 12.110 Medicine and Clinical Studies A health researcher
c. Do these graphs provide any evidence that the simple recently investigated the relationship between balance and
linear regression assumptions are invalid? Justify your bone density. A random sample of 25-year-old white women
answer. was obtained. Each woman took a standing balance test in
which she held her arms out horizontally and balanced on one
12.107 Biology and Environmental Science Agriculture foot. The length of time (x, in seconds) each person remained
researchers recently investigated the relationship between wheat in that position was recorded. The bone density (y, in mg/cm2)
yield and wind speed. A random sample of wheat-field plots for each person was measured using the DEXA (dual-energy
was obtained. The mean daily wind speed over the growing X-ray absorptiometry) technique. The data are given in the
period (x, in m/s) was recorded for each field, along with the following table. BONE
yield (y, in bushels per acre). The estimated regression line is
y 5 50.052 2 1.103x. WHEAT Balance Bone Balance Bone
a. Compute the residuals, and verify that they sum to 0. time density time density
b. Carefully sketch a graph of the residuals versus the
29.8 1127 20.9 1094
predictor variable (wind speed). Is there any evidence that
the simple linear regression assumptions are violated? 26.3 1105 27.5 1115
Justify your answer. 26.0 1187 39.8 1228
37.2 1334 10.7 871
12.108 Business and Management Many economists
30.1 1067 35.6 1377
believe that the number of people employed in an area is
12.4 Regression Diagnostics 623
a. Find the estimated regression line, and compute the a. Find the estimated regression line, and compute the
residuals. residuals.
b. Carefully sketch a normal probability plot of the b. Conduct an F test for a significant regression. Use a
residuals and a plot of the residuals versus the predictor significance level of 0.10. Do you believe that there is a
variable. relationship between commuting distance and sick hours?
c. Do these graphs provide any evidence that the simple Why or why not?
linear regression assumptions are invalid? Justify your c. Carefully sketch a normal probability plot of the
answer. residuals and a plot of the residuals versus the predictor
variable.
12.111 Medicine and Clinical Studies A coal miner’s job is
d. Do these graphs provide any evidence that the simple
extremely hazardous and can cause serious health problems, for
linear regression assumptions are invalid? Justify your
example, black lung disease and vibration-induced white finger.
answer.
A study was conducted to examine the hearing loss of coal min-
ers exposed to years of vibration and cool temperatures under- 12.114 Physical Sciences Researchers at a marine institute
ground. A random sample of coal miners was selected, and the investigated the relationship between the dissolved barium con-
length of time on the job (x, in years) was recorded for each centration and the silicate/nitrate ratio in the North Pacific
person. Each miner took a test in which the hearing threshold Ocean. The silicate/nitrate ratio (x) and the dissolved barium
level (HTL, y, in dB) was measured. MINER concentration (y, in nmol/kg) were measured at randomly
a. Find the estimated regression line, and compute the selected locations. NPOCEAn
residuals. a. Find the estimated regression line, and compute the
b. Carefully sketch a normal probability plot of the residuals.
residuals and a plot of the residuals versus the predictor b. Conduct an F test for a significant regression. Use a
variable. significance level of 0.01.
c. Do these graphs provide any evidence that the simple c. Carefully sketch a normal probability plot of the
linear regression assumptions are invalid? Justify your residuals and a plot of the residuals versus the predictor
answer. How do you think this regression model could be variable.
improved? d. Do these graphs provide any evidence that the simple
linear regression assumptions are invalid? Justify your
answer.
Extended Applications
12.115 Biology and Environmental Science Research
12.112 Sports and Leisure There have been many studies suggests that the electrical conductivity (EC) of soil is related
to determine the most important characteristic for golfers to to plant growth. Nutrients in the soil are in the form of ions.
produce long drives. A recent investigation considered grip These ions in the soil become dissolved in water and carry an
strength. A random sample of golfers was obtained and a hand- electrical charge, and this determines the soil EC. Too little EC
grip dynamometer was used to measure grip strength (x, in indicates a lack of nutrients, and a high EC level suggests a
Newtons) on each person. Each golfer then drove a ball, and the salinity problem. A random sample of corn fields in Iowa was
length of the drive was measured (y, in yards). GOLFER obtained and the soil EC was measured in each (x, in mS/ cm).
a. Find the estimated regression line, and compute the The resulting crop yield was also recorded (y, in bushels per
residuals. acre). The data are given in the following table. SOIL
b. Conduct an F test for a significant regression. Use a
significance level of 0.01. EC Yield EC Yield EC Yield
c. Carefully sketch a normal probability plot of the
residuals and a plot of the residuals versus the predictor 1087 193 711 165 534 90
variable. 399 104 672 159 661 163
d. Do these graphs provide any evidence that the simple 709 165 931 171 789 152
linear regression assumptions are invalid? Justify your 916 134 636 122 482 155
answer. How do you think this regression model could be 941 151 921 165 841 157
improved?
877 191 707 131 533 131
12.113 Business and Management Suppose the 687 177 1045 140 737 176
Human Resources director at NVR, a large construction
company, believes that the number of sick hours per year a. Find the estimated regression line.
taken by an employee is related to the commuting distance. b. Conduct an F test for a significant regression ( a 5 0.05 ) .
A random sample of employees was obtained, and each Explain your results in the context of this problem. Find
person’s travel distance to work (x, in miles) was recorded. bounds on the p value associated with this test. Use
Personnel records were used to obtain the number of hours technology to verify your answer.
off for sickness in a year (y) for each employee included in c. Carefully sketch a normal probability plot of the residuals
the study. SICKHRS and a plot of the residuals versus the predictor variable. Do
624 Ch a pter 1 2 Correlation and Linear Regression
these graphs provide any evidence that the simple linear c. Carefully sketch a plot of the residuals versus the
regression assumptions are invalid? Justify your answer. predictor variable and a normal probability plot of the
residuals. Do these graphs provide any evidence that the
12.116 Sports and Leisure In a study conducted in Scandi-
simple linear regression assumptions are invalid? Justify
navia, certain physiological parameters of endurance horses
your answer.
were found to be related to performance.29 A random sample of
horses and races was obtained. The total fluid intake pre-race
(x, in liters) and the speed of the horse during the race (y, in Challenge
km/h) was measured for each. HORSES
a. Find the estimated regression line, and compute the 12.117 Sum of the Residuals In a simple linear regression
residuals. analysis, show that the sum of the residuals is always 0. Hint: In
=
b. Conduct an F test for a significant regression. Explain the definition of the ith residual, use the definition of yi, and
=
your results in the context of this problem. then the formula for b0.
A Closer L ok
1. The double subscript notation on x is necessary to indicate both the variable and the
observation. For example, x21 is the value of the variable x2 that corresponds to the
observed value of y1. Similarly, x12 is the value of the variable x1 that corresponds to
the observed value of y2.
2. In the multiple linear regression model, the Ei’s again represent the random deviations
or random error terms.
3. The true regression equation is y 5 b0 1 b1x1 1 b2 x2 1 c1 bk xk . Notice that
this equation is a linear function of the unknown parameters b0, b1, c, bk . The graph
of the true regression equation is, in general, a surface, not a line. However, we often
refer to this equation as the true regression line.
12.5 Multiple Linear Regression 625
4. The unknown constants b0, b1, c, bk are called partial regression coefficients.
Recall that in Section 12.1, b1 represented the mean amount of change in y for every
one-unit increase in x1. In a multiple linear regression model, bi represents the mean
change in y for every increase of one unit in xi if the values of all the other predictor
variables are kept fixed.
Stepped
STEPPED Tutorial
TUTORIALS The focus of this section is on the method for finding the best deterministic linear
General
BOX PLOTSmultiple model, that is, finding estimates of the unknown parameters b0, b1, c, bk . Hypothesis
regression model tests will be used to determine whether the overall model explains a significant amount of
the variability in the dependent variable and to evaluate the contribution of each indepen-
dent variable. The principle of least squares can again be used to minimize the sum of the
squared deviations between the observations and the estimated values (the sum of squares
due to error).
Although hand calculations
= = are possible,
= it is much more efficient to use technology to
compute the estimates b0, b1, c, bk for the true regression parameters, or coefficients,
b0, b1, c, = bk . =For specific
= = independent variables, ( x*1, x*2, c, x*k ) 5 x*,
values of the
let y* 5 b0 1 b1x*1 1 b2 x*2 1 c1 bk x*k . The value y* is an estimate of the mean
value of Y for x 5 x*, denoted mY 0 x* , and also an estimate of an observed value of Y
for x 5 x*.
=
The sign of bi represents the way The estimated regression equation is y 5 8.2214 1 0.1811x1 1 0.3585x2. That is,
= = =
in which the surface is tilted along b0 5 8.2214, b1 5 0.1811, and b2 5 0.3585. Note that the estimated regression coef-
the xi axis. ficients on both payroll and the price of a hot dog are positive. This suggests that as
payroll and the price of a hot dog increase, ticket prices also increase. The signs on
both estimated regression coefficients seem reasonable.
b. The estimated true mean ticket price is found by substituting x1 5 x*1 5 100 and
x2 5 x*2 5 5 into the estimated regression equation.
y 5 8.2214 1 0.1811xi 1 0.3585x2 Estimated regression equation.
5 28.12 Simplify.
An estimate of the expected ticket price for a 100-million-dollar payroll and a 5-dollar
hot dog is approximately $28.12.
In this example, the graph of y 5 8.2214 1 0.1811x1 1 0.3585x2 is a plane in three
dimensions. Figure 12.73 is a three-dimensional scatter plot of the data and the graph of
the estimated regression equation. Note that the remaining points are behind the plane and
are not visible in this particular view. Figures 12.74, 12.75, and 12.76 illustrate how y
depends on each predictor variable separately. In addition to the graph of the estimated
regression equation, Figure 12.74 includes the graphs of two lines (in three dimensions):
one for x1 5 130 held constant, and one for x2 5 3.25 held constant. Figures 12.75 and
12.76 each represent a slice through the three-dimensional
= plane at a particular value of
the other predictor variable. In Figure
= 12.75, b 1 5 0.1811 is the slope of the line when
x2 5 3.25, and in Figure 12.76, b2 5 0.3585 is the slope of the line when x2 5 130.
40 40
7 7
y 30 6 y 30 6
5 5
20 20
4 4
50 3 x2 50 3 x2
100 2 100 2
x1 150 1 x1 150 1
0 0
200 200
Figure 12.73 A scatter plot of the data and Figure 12.74 A scatter plot of the data,
the graph of the estimated regression equation. the graph of the estimated regression equation,
and two lines.
y
y
50 50
40 40
30 30
20 20
10 10
50 100 150 x1 0 1 2 3 4 5 6 x2
Figure 12.75 A graph of y versus x1 when Figure 12.76 A graph of y versus x2 when
x2 5 3.25. x1 5 130.
The following concepts from simple linear regression also apply to multiple linear
regression.
1. The variance s2 is a measure of the underlying variability in the model. An estimate of
s2 is used to conduct hypothesis tests and to construct confidence intervals related to
multiple linear regression.
= = = =
2. The ith predicted value is yi 5 b0 1 b1x1i 1 c1 bk xki.
=
3. The ith residual is yi 2 yi.
4. The total sum of squares, the sum of squares due to regression, and the sum of
squares due to error have the same definitions. The sum of squares identity, Equation
12.4, is also true. The total variation in the data (SST) is decomposed into a sum of
the variation explained by the model (SSR) and the variation about the regression
equation (SSE).
An analysis of variance table is used to summarize the computations in multiple linear
regression. The differences are in the degrees of freedom associated with the sum of
squares due to regression and the sum of squares due to error. As usual, the mean squares
are corresponding sums of squares divided by the corresponding degrees of freedom.
This is the model utility test to Hypothesis Test for a Significant Multiple
determine whether the multiple Linear Regression
linear regression model is
appropriate/useful (similar to the H0: b1 5 b2 5 c5 bk 5 0
simple linear regression case). (None of the predictor variables helps to explain the variation in y.)
Ha: bi 2 0 for at least one i
(At least one predictor variable helps to explain the variation in y.)
MSR
TS: F 5
MSE
RR: F $ Fa,k,n2k21
SOLUTION TRAIL
n Coefficient of determination
a. Complete the summary ANOVA table and conduct an F test for a significant regres-
sion. Use a significance level of 0.05.
Concepts b. Compute r2 and interpret this value.
n Multiple linear regression
n Summary ANOVA table Solution
n Hypothesis test for a signifi- a. Use the sum of squares identity to compute the total sum of squares.
cant regression
2 SST 5 SSR 1 SSE 5 0.3517 1 0.5085 5 0.8602 Use Equation 12.4.
n Computation of r
There are n 5 29 observations and k 5 3 predictor variables.
Vi sion Use these values to compute the mean squares.
Use the sums of squares, number
MSR 5 SSR/k 5 0.3517/3 5 0.1172
of predictor variables, and
number of observations to MSE 5 SSE/ ( n 2 k 2 1 ) 5 SSE/ ( 29 2 3 2 1 ) 5 0.5085/25 5 0.0203
complete the ANOVA summary The F test for a significant regression with a 5 0.05 is
table for multiple linear
regression. Use the template for H0: b1 5 b2 5 b3 5 0
a hypothesis test for a significant Ha: bi 2 0 for at least one i
regression with a 5 0.05. The
sum of squares due to regression MSR
TS: F 5
and the total sum of squares are MSE
used to compute r2. RR: F $ Fa,k,n2k21 5 F0.05,3,25 5 2.99
The value of the test statistic is
f 5 MSR/MSE 5 0.1172/0.0203 5 5.7734 ( $ 2.99 )
X ~ F3,25 Because f lies in the rejection region, equivalently, p 5 0.0038 # 0.05, there is evi-
dence to suggest that at least one of the regression coefficients is different from zero.
At least one of the predictor variables can be used to explain a significant amount of
variation in the price per gallon of gasoline. The next reasonable step is to determine
which of the three predictors is really contributing to this overall significant regression.
0 5.7734 x Recall that using tables of critical values for F distributions, we can only bound the p
value. Place the value of the test statistic in an ordered list of critical values with
p-Value illustration:
v1 5 3 and v2 5 25.
p 5 P(X $ 5.7734)
5 0.0038 # 0.05 5 a 4.68 # 5.7734 # 7.45
F0.01,3,25 # 5.7734 # F0.001,3,25
Therefore, 0.001 # p # 0.01
Here are all the calculations presented in the ANOVA table, with some help from tech-
nology to find the exact p value.
ANOVA summary table for multiple linear regression
Source of Sum of Degrees of Mean
variation squares freedom square F p Value
Regression 0.3517 3 0.1172 5.7734 0.0038
Error 0.5085 25 0.0203
Total 0.8602 28
12.5 Multiple Linear Regression 629
bi0 can be any constant. We can Hypothesis Test and Confidence Interval
conduct a test for evidence that Concerning bi
the regression coefficient bi is
H0: bi 5 bi0
different from any value.
However, usually bi0 5 0 and Ha: bi . bi0, bi , bi0, or bi 2 bi0
Ha: bi 2 0. This means we are Bi 2 bi0
TS: T 5
looking for any evidence to SBi
suggest that the value of xi helps RR: T $ ta,n2k21, T # 2ta,n2k21, or 0 T 0 $ ta/2,n2k21
to predict the value of Y 0 x.
A 100 ( 1 2 a ) % confidence interval for bi has as endpoints the values
=
bi 6 ta/2,n2k21 sBi (12.14)
Solution
Solution Trail 12.12
a. There are k 5 2 predictor variables and, from the ANOVA table, there are n 5 28
Ke ywor ds observations ( n 2 1 5 27 ) .
n Multiple linear regression is The F test for a significant regression with a 5 0.05 is
significant
n Each predictor variable H0: b1 5 b2 5 0.
contributes Ha: bi 2 0 for at least one i.
T ranslati on TS: F 5 MSR/MSE
n Overall F test for a significant RR: F $ Fa,k,n2k21 5 F0.05,2,25 5 3.39
regression
n Separate hypothesis tests As given in the ANOVA table, the value of the test statistic is
concerning b1 and b2
f 5 MSR/MSE 5 3639.1/215.6 5 16.88 ( $ 3.39 )
Concepts
n Hypothesis test for a signifi-
Because f lies in the rejection region (or, equivalently, p , 0.001), there is evidence
cant regression to suggest that at least one of the regression coefficients is different from 0.
n Hypothesis test concerning bi b. To test whether x1, total fat, is a significant predictor, conduct a hypothesis test con-
cerning the regression coefficient b1.
Vi sion
Use the values in the output H0: b1 5 0
provided to conduct a formal F Ha: b1 2 0
test for a significant regression.
Test the hypotheses H0: b1 5 0 B1 2 0
TS: T 5
and H0: b2 5 0. SBi
0 0
RR: T $ ta/2,n2k21 5 t0.025,25 5 2.0595
Notice that Minitab rounds the
Using the Minitab output, the value of the test statistic is
value of the test statistic, f, to two
decimal places.
=
b1 2 0 6.72
t5 5 5 1.60
sB1 4.21
Notice again that Minitab rounds
the value of the test statistic, t, to The value of the test statistic does not lie in the rejection region, 0 t 0 5 0 1.60 0 5
two decimal places. 1.60 , 2.0595 (equivalently, p 5 0.123 . 0.05 ) . There is no evidence to suggest that
the predictor total fat is contributing to the overall regression effect nor to the variabil-
ity in y.
Conduct a similar hypothesis test concerning the regression coefficient b2.
H0: b2 5 0
Ha: b2 2 0
B2 2 0
TS: T 5
SB2
0 0
RR: T $ ta/2,n2k21 5 t0.025,25 5 2.0595
The value of the test statistic lies in the rejection region, 0 t 0 5 0 5.61 0 5
5.61 $ 2.0595 ( equivalently, p , 0.001 ) . There is evidence to suggest that the re-
gression coefficient b2 is different from 0. This implies that the predictor variable
sugar is contributing to the overall regression effect and is contributing significantly to
the variability in y.
A Closer L ok
1. The Minitab output in Example 12.12 also includes the results of a hypothesis test
concerning the constant term, b0, with H0: b0 5 0. If we fail to reject this null hypoth-
esis, a model without a constant term might be more appropriate.
Recall that if a confidence interval 2. If a model utility test is significant in a multiple linear regression model, then there are
for bi contains 0, there is no k $ 2 hypothesis tests to consider in order to isolate those variables contributing to the
evidence to suggest that the ith overall effect. Recall from Chapter 11 that we cannot set the significance level in each
predictor variable is significant. individual hypothesis test. The probability a of making a type I error (a mistake) is set in
However, if 0 is not included in each test under the assumption that only one test is conducted per experiment. Therefore,
the confidence interval, there is the more tests that are conducted, the greater is the chance of making an error. To control
evidence to suggest that the ith the probability of making at least one mistake, we can use the Bonferroni technique (pre-
regression coefficient is different sented in Chapter 11) applied to simultaneous hypothesis tests or confidence intervals.
from 0. a. If k hypothesis tests are conducted, then the significance level in each case is a /k.
b. The k simultaneous 100 ( 1 2 a ) % confidence intervals have as endpoints the val-
=
ues bi 6 ta/(2k),n2k21 sBi.
Consider additional reading about 3. Many different statistical procedures can be used to select the best regression model.
a partial F test, stepwise, and best For now, the most reasonable method is simply to keep only those variables in the
subsets regression. model that have regression coefficients significantly different from zero. Eliminate the
others, and calculate a new, reduced model for prediction.
Two very useful inferences in multiple linear regression involve an estimate of the
mean value of Y for x 5 x* and an estimate of an observed value of Y for x 5 x*. Recall
from the simple linear regression case that the error in estimating the mean value of Y is
less than the error in estimating an observed value of Y. The difference is due to estimat-
ing a single value versus estimating the next value from an entire distribution.
If the multiple linear regression assumptions are true, then a hypothesis test and confi-
dence interval concerning the mean value of Y for x 5 x* is based on the t distribution.
The by-hand calculation of the standard deviation is taxing. An appropriate symbol is
used below, and we will rely on technology to provide the necessary calculations. The null
and alternative hypotheses are stated in terms of the parameter y* 5 b0 1 b1x*1 1
b2 x*2 1 c1 bk x*k . The random variable Y* is used as an estimate of y*.
Recall that we can conduct a test Hypothesis Test and Confidence Interval Concerning
of y* 5 y*0 using either a formal the Mean Value of Y for x 5 x*
hypothesis test or a confidence
H0: y* 5 y*0
interval. Most statistical software
packages produce a confidence Ha: y* . y*0, y* , y*0, or y* 2 y*0
interval rather than conduct a ( B0 1 B1x*1 1 c 1 Bk x*k ) 2 y*0
TS: T 5
hypothesis test. SY *
RR: T $ ta,n2k21, or T # 2ta,n2k21, 0 T 0 $ ta/2,n2k21
A 100 ( 1 2 a ) % confidence interval for mY 0 x*, the mean value of Y for x 5 x*, has as
endpoints the values
= = =
( b0 1 b1x*1 1 c1 bk x*k ) 6 ta/2,n2k21sY* (12.15)
Solution
a. The Fit Model option in JMP is used to produce a summary of fit, the analysis of
v ariance table, and the parameter estimates. The Save Columns option is used with
specific values for x1, x2, and x3 to produce a confidence interval and a prediction
interval. The relevant output is shown in Figures 12.78–12.80.
The assumptions in a multiple linear regression model are also given in terms of the
random deviations, Ei, i 5 1, 2, c, n. As in a simple linear regression model, it is
assumed that the Ei’s are independent, normal random variables with mean 0 and constant
=
A residual, ei , is the difference variance s2. If any of these assumptions
= = are violated,= then the resulting analysis is unreli-
=
between the observed value of Yi able. The residuals, ei 5 yi 2 ( b0 1 b1x1i 1 c1 bk xki ) , i 5 1, 2, c, n, are estimates
and the predicted value for Yi. of the random errors and are used to check the regression assumptions.
The following graphical procedures, developed for a simple linear regression model,
can be extended to a multiple linear regression model.
1. Construct a histogram, stem-and-leaf plot, scatter plot, and/or normal probability plot
of the residuals. These graphs are all used to check the normality assumption.
2. Construct a scatter plot of the residuals versus each independent variable. For exam-
=
ple, the first scatter plot has the ordered pairs ( x1i, ei ) , the second has the ordered pairs
=
( x2i, ei ) , etc. If there are no violations in assumptions, each scatter plot should appear
as a horizontal band around 0. There should be no recognizable pattern. Typical pat-
terns in a residual plot that suggest a violation in assumptions include a distinct curve,
nonconstant spread, an unusually large residual, or an outlier.
Solution
Step 1 Using technology, the normal probability plot of the residuals is shown in
Figure 12.82. The points appear to lie along an approximate straight line.
There is no overwhelming evidence to suggest that the normality assumption
is violated.
= = =
Step 2 Figures 12.83–12.85 show the residual plots: ei versus x1i, ei versus x2i, and ei
versus x3i. There is no obvious pattern in any of the these graphs.
Together, these four graphs suggest that there is no violation in the multiple linear re-
gression model assumptions.
634 Ch apter 1 2 Correlation and Linear Regression
Figure 12.82 Minitab normal probability Figure 12.83 Plot of the residuals versus
plot. the first predictor.
Figure 12.84 Plot of the residuals versus Figure 12.85 Plot of the residuals versus
the second predictor. the third predictor.
Technology Corner
Procedure: Multiple Linear Regression analysis, normal probability plot of the residuals, and plot of the residuals
versus values of a predictor variable.
Reconsider: Example 12.14, solution, and interpretations.
CrunchIt!
Use Multiple Linear regression options to produce a plot of the residuals versus the dependent variable and a normal
probability plot of the residuals.
1. Enter the data into columns Var1, Var2, Var3, and Var4. Rename the columns if desired.
2. Select Statistics; Regression; Multiple Linear. Choose the Dependent Variable from the pull-down menu and check the
Independent Variables. Select Numeric Results from the Display pull-down menu to view the estimated regression line
and hypothesis test results concerning the coefficients. The CI Level box is used to obtain a confidence interval for each
coefficient. Click Calculate to display the results. See Figure 12.86.
3. Construct a normal probability plot of the residuals by using the Residuals QQ Plot option from the Display pull-down
menu. See Figure 12.87. Construct a scatter plot of the residuals versus the dependent variable by using the Residuals
Plot option from the Display pull-down menu. See Figure 12.88.
Figure 12.87 Normal probability plot of the Figure 12.88 Scatter plot of the residuals
residuals. versus the dependent variable.
Minitab
There are several ways to conduct a multiple linear regression analysis in Minitab, in a Session window and in the Stat;
Regression menu.
1. Enter values of the dependent variable into column C1, and values for the predictor variables into columns C2, C3, and
C4. Name the columns if desired.
2. Select Stat; Regression; Regression. Enter the Response column, C1, and the Predictor columns, C2, C3, and C4.
636 C h a p t e r 1 2 Correlation and Linear Regression
3. Select the Storage button and check Residuals to store the residuals in a Worksheet column. Click OK to display the
regression analysis. See Figure 12.81.
4. Construct a normal probability plot of the residuals as described in Section 12.4 (Figures 12.82). Construct scatter plots
of the residuals versus values of each predictor variable as described in Section 12.4 (Figures 12.83–12.85).
Excel
Regression analysis is included in the Data Analysis tool pack.
1. Enter values of the dependent variable into column A, and values for the predictor variables into columns B, C, and D.
2. Under the Data tab, select Data Analysis; Regression. Enter the Y Range, X Range, and the Output Range. Check
Residuals Plot to construct a plot of the residuals versus each predictor variable. Check Residuals to save the residuals.
The summary output is shown in Figure 12.89.
Compute the normal scores and construct a normal probability plot of the residuals as described in Section 6.3. The
normal probability plot is shown in Figure 12.90, and the other residual plots are shown in Figures 12.91–12.93.
Figure 12.90 Normal probability plot of Figure 12.91 Scatter plot of the residuals
the residuals. versus the first predictor variable.
Figure 12.92 Scatter plot of the residuals Figure 12.93 Scatter plot of the residuals
versus the second predictor variable. versus the third predictor variable.
12.5 Multiple Linear Regression 637
d. Find the value of r2 and explain the meaning of this a. Estimate the regression coefficients in the model
value. Yi 5 b0 1 b1 x1i 1 b2 x2i 1 b3 x3i 1 Ei.
b. Conduct an F test for a significant regression. Use
12.136 An experiment was conducted and the data obtained
a 5 0.05. Use technology to find the exact p value.
were used to fit the multiple linear regression model
c. Find the coefficient of determination.
Yi 5 b0 1 b1x1i 1 b2 x2i 1 b3 x3i 1 b4 x4i 1 Ei. Minitab was
d. Which predictor variable(s) is(are) significant in
used to estimate the regression coefficients, and the output is
explaining the variation in the dependent variable?
shown below.
Conduct the appropriate hypothesis tests.
e. Construct a 95% confidence interval for the constant
regression coefficient, b0. Use this interval to determine
whether there is evidence to suggest the constant
regression coefficient is different from 0.
Minitab was used to estimate the regression coefficients, and b. Do these predictor variables explain a significant amount
the output is shown below. of variation in Y ? Conduct the appropriate model utility
test using a 5 0.05.
c. Find an estimate of an observed value of Y for x1 5 1.5
and x2 5 1200.
12.143 Physical Sciences An experiment was conducted to
test the effect of temperature ( x1, °F ) , contact area (x2, cm2),
and wood density (x3, g/cm3) on the bonding strength (y) of a
certain (wood) glue. Each piece of wood was glued to the
same control block and the entire fixture was placed in an
industrial oven. The bonding strength was measured by
recording the time (in minutes) until the test piece of wood
separated from the control block. The multiple linear regres-
sion model equation is Yi 5 b0 1 b1 x1i 1 b2 x2i 1 b3 x3i 1 Ei.
Minitab was used to analyze the data, and a portion of the
output is shown below.
Consider the value x* 5 ( 6.1, 5.5, 8.3, 7.2, 6.5 ) and suppose
an estimate of the standard deviation of Y* is sY* 5 1.973.
a. Find a 95% confidence interval for the true mean value
of Y when x 5 x*. Give an interpretation of this
interval.
b. Find a 95% prediction interval for an observed value of Y
when x 5 x*. Give an interpretation of this interval.
Applications
12.141 Economics and Finance A financial analyst a. Complete the ANOVA table and conduct a model utility
believes that the U.S. 6-month treasury yield (y) can be pre- test.
dicted from the crude oil price per barrel (x1), the price of gold b. Interpret the sign of each estimated regression
(x2), and the M1 money supply (x3, in billions of dollars). A coefficient.
random sample of days was selected and the data were c. Conduct three hypothesis tests with H0: bi 5 0 and
recorded.33 USTREAS a 5 0.05 in each test. Find bounds on the p value
a. Find the estimated regression line. Conduct an F test for a associated with each test. Use these results to determine
significant regression with a 5 0.05. which predictor variables are the most important in
b. Find an estimate of the true mean 6-month treasury yield determining the bonding strength of glue.
for x1 5 100, x2 5 1200, and x3 5 2600.
12.144 Physical Sciences A Daniel cell uses zinc and copper
c. Which of these variables do you think is the most
solutions to produce electricity. An experiment was conducted to
important predictor for the 6-month treasury yield? Why?
determine whether voltage (y) is affected by the temperature of
12.142 Business and Management A study was conducted the solutions ( x1, °F ) , the concentration of the solutions (x2, M),
to determine if the rental price of an apartment in a college and/or the surface area of the electrodes (x3, cm2). DANIEL
town is related to the distance from campus and the apartment a. Estimate the regression coefficients in the model
size. A random sample of apartments in State College, Pennsyl- Yi 5 b0 1 b1x1i 1 b2x2i 1 b3x3i 1 Ei.
vania, was obtained, and the monthly rental price (y, in dollars), b. Conduct a model utility test and the other appropriate
distance from campus (x1, in miles), and square feet (x2) was tests to determine which variables are the most important
recorded for each. RENTAL predictors of voltage.
a. Estimate the regression coefficients in the model c. Construct a normal probability plot of the residuals. Use
Yi 5 b0 1 b1 x1i 1 b2 x2i 1 Ei. Interpret the value of each this plot to determine whether there is any evidence of
estimated coefficient. violation in multiple linear regression assumptions.
640 Cha pter 1 2 Correlation and Linear Regression
12.145 Physical Sciences To keep golf courses attractive in d. Find the value of r2 and interpret this value.
the arid southwest, groundskeepers carefully monitor salt con- e. Estimate the mean value of Y for a sea ice extent of
tent in water used for irrigation. It is believed that high salt con- 12.5% and 35 stormy days.
centration can damage plants and grass. A random sample of
12.148 Physical Sciences Potassium ferricyanide crystals are
golf courses in the southwest was obtained and a sample of
used for etching and in eloctroplating applications. Many factors
water used for irrigating was obtained for each. The following
affect the crystal growth and therefore the size of these crystals.
variables were measured for each water sample: dissolved salt
These factors include the solubility of the compound and the
(y, dS/m), total dissolved solids (x1, mg/L), sodium, calcium
evaporation process. An experiment was conducted to determine
magnesium, and bicarbonate (x2, x3, x4, and x5, all in milliequiv-
whether the weight of the final crystal (y, in grams) is affected by
alents/liter).34 GROUNDS
the amount of initial solid (x1, g), the amount of water (x2, ml),
a. Estimate the regression coefficients in the model
and/or the temperature of the water ( x3, °F ) . Minitab was used to
Yi 5 b0 1 b1 x1i 1 b2 x2i 1 b3 x3i 1 b4 x4i 1 b5 x5i 1 Ei.
analyze the data and to estimate the regression coefficients in the
b. Construct the ANOVA table and conduct the model utility
model Yi 5 b0 1 b1 x1i 1 b2 x2i 1 b3 x3i 1 Ei. A portion of the
test.
output is shown below.
c. Conduct the necessary hypothesis tests to determine the
most important variables in predicting the time to failure.
What do you think is the single most important predictor
variable? Why?
d. Check the model assumptions by constructing a normal
probability plot of the residuals and the appropriate
scatter plots.
12.146 Biology and Environmental Science Researchers
in Canada recently investigated the impacts on birds from the
increased use of wind turbines and loss of nesting habitat.35
A random sample of Canadian bird habitats was obtained, and
the estimated number of nests (y, per hectare) was recorded
for each. In addition, the number of wind farms (x1), number
of turbines (x2), and area of the habitat (x3, in ha) was
also recorded. NESTING
a. Estimate the regression coefficients in the model
Yi 5 b0 1 b1 x1i 1 b2 x2i 1 b3 x3i 1 Ei.
b. Interpret the sign of each estimated regression coefficient.
c. Conduct a model utility test using a 5 0.05 and find the
value of the coefficient of determination.
d. Which predictor variable do you believe is most a. Complete the ANOVA table and conduct a model utility
important? Why? test.
e. Check the model assumptions by constructing a normal b. Use the output to determine which of the three predictor
probability plot of the residuals and the appropriate variables contributes to the overall significant regression.
scatter plots. c. Suppose x*1 5 ( 93, 200, 100 ) and x*2 5 ( 100, 220, 140 ) .
The predicted value for each is given in the Minitab
output, along with the standard error of each estimate
Extended Applications (SE Fit). For example, sY *1 5 1.05782. Find a 95%
12.147 Biology and Environmental Science There is some confidence interval for the true mean value of Y when
evidence to suggest that the variation in the size of penguin col- x 5 x*1 . Find a 95% confidence interval for the true mean
onies is related to sea ice extent. Suppose a study was con- value of Y when x 5 x*2 .
ducted to investigate the effect of sea ice extent and stormy d. Find a 95% prediction interval for an observed value of Y
weather on the size of penguin colonies along the west coast of when x 5 x*1 . Find a 95% prediction interval for an
the Antarctic Peninsula. Fifteen years were selected at random, observed value of Y when x 5 x*2 .
and the data were recorded: PENGUIN e. Why do you suppose sY *1 , sY *2 ?
y 5 size of the penguin colony population. 12.149 Biology and Environmental Science In many rural
x1 5 sea ice extent ( as a percentage ) areas in countries around the world, it is very costly to construct
x2 5 number of stormy days ( yearly total ) an efficient water supply system. Therefore, groundwater is the
a. Estimate the regression coefficients in the model principal source of water for drinking and other activities. Prior
Yi 5 b0 1 b1 x1i 1 b2 x2i 1 Ei. to use, the groundwater must be tested for safety and quality.
b. Conduct the model utility test. Use a 5 0.01. Chloride in water affects the taste of many foodstuffs, and large
c. Interpret the sign of each estimated regression coefficient. amounts may lead to hypertension. However, little is known
12.5 Multiple Linear Regression 641
about the long-term effects. A recent multiple linear regression ( y) in cultured buttermilk (produced from whole milk) can be
model was used to explain the variation in chloride in ground- predicted by the temperature to which the milk is heated
water ( y). The model included five predictor variables. The ( x1, °C ) , the temperature to which the milk is cooled before
independent variables are x1, total dissolved solids (mg/L); x2, the started culture is added ( x2, °C ) , the percentage of started
iron (mg/L); x3, nitrate (mg/L); x4, total hardness (ppm); x5, culture added (x3), and/or the fermentation time (x4, in
depth of sample (feet). hours). MILK
A portion of the analysis associated with this model follows. a. Estimate the regression coefficients in the model
Yi 5 b0 1 b1 x1i 1 b2 x2i 1 b3 x3i 1 b4 x4i 1 Ei.
b. Conduct a model utility test using a 5 0.05 and find
the value of the coefficient of determination.
c. Conduct the appropriate hypothesis tests to determine
which predictor(s) is(are) significant.
d. Conduct a hypothesis test to determine whether there is
any evidence to suggest b3 is less than 0.93. Use
a 5 0.05.
e. Find a 95% confidence interval for the true mean amount
of lactic acid in cultured buttermilk when x1 5 95,
x2 5 22, x3 5 1.5, and x4 5 20.
12.152 The Rainy Season Hawaii is a popular tourist desti-
nation during the winter months, and Honolulu offers many
outdoor attractions. However, the months of December through
February are the rainy season, and records indicate that the
most rain falls during the month of January. Using data from
random years,37 a multiple linear regression model was used to
explain the total January rainfall (y) that consisted of four pre-
dictor variables, one of which was a dummy variable (a variable
a. Complete the ANOVA table and conduct a model utility with a value of either 0 or 1). The independent variables were
test. Use technology to find the p value associated with monthly rainfall total in the previous December, January, and
this test. Find the value of r2. Use these results to February (x1, x2, and x3, respectively) and a dummy variable (x4)
described the overall significance of the model. which indicated whether there was a moderate to weak El Niño
b. Find the value of the test statistic and the p value during the previous year. Rainfall totals were recorded in
associated with each hypothesis test for a significant inches. HAWAII
regression coefficient. Use these results to determine the a. Construct the ANOVA table and conduct a model utility
most important predictor variables in the model. For each test. Use technology to find the p value associated with
of these variables, explain the effect on chloride in this test. Find the value of r2. Use these results to describe
groundwater. Justify your answers. the overall significance of the model.
12.150 Economics and Finance A recent study investi- b. Find the value of the test statistic and the p value
gated the variables that can be used to predict economic associated with each hypothesis test for a significant
growth in China, as measured by gross domestic product regression coefficient. Use these results to determine the
(GDP, y, in hundred millions).36 A sample of years was most important predictor variables in this model.
obtained and the following measurements were recorded: cap- c. Consider a reduced model with the two most important
ital stock (x1, in hundred millions); labor force (x2, in ten predictor variables. Conduct a model utility test. Describe
thousands); and energy consumption (x3, in ten thousand tons the significance of this model compared to the full model
standard coal). GDP in part (a).
a. Estimate the regression coefficients in the model d. Find the most recent rainfall totals for Honolulu and
Yi 5 b0 1 b1x1i 1 b2x2i 1 b3x3i 1 Ei. records on El Niño. Use this information to find a 95%
b. Conduct the model utility test. Use a 5 0.05. prediction interval for the most recent January rainfall
c. Find the value of r2 and interpret this value. Explain this total in Honolulu. Does the actual value lie in this
remarkable value in the context of this problem. interval? What does this suggest about the reduced
d. Conduct the appropriate hypothesis tests to determine model?
which regression coefficient(s) is(are) significantly
different from 0. Use a 5 0.05 in each test. Challenge
e. Do you believe there is any way to improve this model?
12.153 Geothermal Gradient The temperature naturally
Why or why not?
increases as one digs deeper into the continental crust. This
12.151 Public Health and Nutrition A recent experiment phenomenon is called the geothermal gradient. A random sam-
was designed to determine whether the percentage of lactic acid ple of continental crust depths (x, in km) was obtained, and the
642 C hapter 1 2 Correlation and Linear Regression
temperature at each depth was recorded ( y, in °C ) . The data are c. There is some evidence to suggest that the rate of change
given in the following table. GEOGRAD in temperature is different according to two depth zones.
For example, at small depths, the temperature changes
Depth 192 306 75 375 96 120 399 only slightly for each kilometer. However, at large depths,
Temperature 212 682 15 855 93 49 943 the temperature change is much greater for each
kilometer. Use your scatter plot from part (a) to divide the
Depth 469 439 209 224 313 379 296 data into two depth zones. Find the estimated regression
Temperature 1456 1156 271 336 604 911 606 line and the ANOVA summary table for each zone.
d. Some researchers believe the relationship between depth
Depth 37 316 41 242 21 230 and temperature is nonlinear and is better modeled by a
Temperature 35 576 31 462 10 274 quadratic curve. Square each depth to create a new list of
observations (x2). Find the estimated regression line using
a. Carefully sketch a scatter plot of these data. x2 as the independent variable.
b. Find the estimated regression line, and complete the e. Which of these three models do you think is the best?
ANOVA summary table. Justify your answer.
SST 5 Syy 5 g ( yi 2 y ) 2 5 g y 2i 2 ( g yi ) 2
1
n
SSR 5 b1Sxy 5 b1 g ( xi 2 x )( yi 2 y ) 5 b1 c g xi yi 2 ( g xi )( g yi ) d
= = = 1
n
SSE 5 SST 2 SSR
Chapter 12 Summary 643
Coefficient of determination
r2 5 SSR/SST
Estimate of variance
s2 5 SSE/ ( n 2 2 )
F test for a significant linear regression
H0: There is no significant linear relationship ( b1 5 0 ) .
Ha: There is a significant linear relationship ( b1 2 0 ) .
MSR
TS: F 5
MSE
RR: F $ Fa,1,n22
Hypothesis test and confidence interval concerning b1
H0: b1 5 b10
Ha: b1 . b10, b1 , b10, or b1 2 b10
B1 2 b10 B1 2 b10
TS: T 5 5
S/ !Sxx SB1
RR: T $ ta,n22, T # 2ta,n22, 0 T 0 $ ta/2,n22
=
A 100 ( 1 2 a ) % confidence interval for b1 has as endpoints b1 6 ta/2,n22 sB1.
Hypothesis test and confidence interval concerning b0
H0: b0 5 b00
Ha: b0 . b00, b0 , b00, or b0 2 b00
B0 2 b00 B0 2 b00
TS: T 5 5
2
S"g xi /nSxx SB0
RR: T $ ta,n22, T # 2ta,n22, or 0 T 0 $ ta/2,n22
=
A 100 ( 1 2 a ) % confidence interval for b0 has as endpoints b0 6 ta/2,n22sB0
Sample correlation coefficient
g xi yi 2 ( g xi )( g yi )
1
Sxy n
c g x2i 2 ( g xi ) 2 d c g y2i 2 ( g yi ) 2 d
r5 5
!Sxx Syy 1 1
Å n n
Hypothesis test and confidence interval concerning the mean value of Y for x 5 x*
H0: y* 5 y0*
Ha: y* . y*0, y* , y*0, or y* 2 y*0
( B0 1 B1x* ) 2 y*0
TS: T 5
S" ( 1/n ) 1 3 ( x* 2 x ) 2 /Sxx 4
RR: T $ ta,n22, T # 2ta,n22, or 0 T 0 $ ta/2,n22
644 C ha pter 1 2 Correlation and Linear Regression
A 100 ( 1 2 a ) % confidence interval for mY 0 x*, the mean value of Y for x 5 x*, has as
endpoints the values
= = 1 ( x* 2 x ) 2
( b0 1 b1x* ) 6 ta/2,n22 s 1
Ån Sxx
Prediction interval for an observed value of Y
A 100 ( 1 2 a ) % prediction interval for an observed value of Y when x 5 x* has as end-
points the values
= = 1 ( x* 2 x ) 2
( b0 1 b1x* ) 6 ta/2,n22 s 1 1 1
Å n Sxx
Regression diagnostics
1. Normal probability plot of residuals. The points should lie along an approximately
straight line.
2. Scatter plot of residuals versus predictor variable. There should be no distinct pattern.
g ( yi 2 y ) 2 5 g ( yi 2 y ) 2 1 g ( yi 2 yi ) 2
The sum of squares
= =
3 3
SST 3 SSE
SSR
Analysis of variance for multiple linear regression
Source of Sum of Degrees of
variation squares freedom Mean square F p Value
SSR MSR
Regression SSR k MSR 5 p
k MSE
SSE
Error SSE n2k21 MSE 5
n2k21
Total SST n21
Chapter 12 Summary 645
Coefficient of determination
r2 5 SSR/SST
Estimate of variance
s2 5 SSE/ ( n 2 k 2 1 )
H0: bi 5 bi0
Ha: bi . bi0, bi , bi0, or bi 2 bi0
Bi 2 bi0
TS: T 5
SBi
RR: T $ ta,n2k21, T # 2ta,n2k21, or 0 T 0 $ ta/2,n2k21
Hypothesis test and confidence interval concerning the mean value of Y for x 5 x*
H0: y* 5 y*0
Ha: y* . y*0, y* , y*0, or y* 2 y*0
( B0 1 B1x*1 1 c1 Bk x*k ) 2 y*0
TS: T 5
Sy*
RR: T $ ta,n2k21, T # 2ta,n2k21, or 0 T 0 $ ta/ 2,n2k21
A 100 ( 1 2 a ) % confidence interval for mY 0 x*, the mean value of Y for x 5 x*, has as
endpoints the values
= = =
( b0 1 b1x*1 1 c1 bk x*k ) 6 ta/2,n2k21 sY*
Regression diagnostics
1. Construct a histogram, stem-and-leaf plot, scatter plot, and/or normal probability plot
of the residuals. These graphs are all used to check the normality assumption.
2. Construct a scatter plot of the residuals versus each independent variable. If there are
no violations in assumptions, each scatter plot should appear as a horizontal band
around 0. There should be no recognizable pattern.
646 Cha pter 1 2 Correlation and Linear Regression
chemical elements and oak tree defoliation was studied. A a. Find the estimated regression line, and complete the
random sample of oak trees was obtained, and the concentration ANOVA table.
of phosphorus (x, in g/kg) in leaves was measured for each tree. b. Compute the residuals.
The percentage of defoliation (y) for each tree was carefully c. Carefully sketch a normal probability plot of the
estimated at the end of the growing season. OAKTREE residuals, and interpret the graph.
a. Carefully sketch a scatter plot of these data. Describe the d. Carefully sketch a scatter plot of the residuals versus the
relationship between these two variables. predictor variable. Is there any indication of a violation in
b. Find the estimated regression line. Is this regression the regression model assumptions?
significant? Conduct an appropriate test at the 0.05 level.
12.163 Marketing and Consumer Behavior “Black
Add a graph of this line to your scatter plot in part (a).
Friday,” the Friday after Thanksgiving, is traditionally the
c. Find the residuals, and sketch a graph of the residuals
beginning of the Christmas shopping season, and is the busiest
versus the predictor variable.
shopping day of the year. An economist believes there is a
d. How could this model be improved? Justify your answer.
relationship between the time of day at which a consumer starts
12.160 Medicine and Clinical Studies In a study of hair shopping on Black Friday and the amount of money spent
regeneration, 30 men over 45 years old who had used minoxidil that day. Suppose a random sample of Black Friday shoppers
for six months were randomly selected. The daily dose of was obtained and the shopping start time (x, in hours after
minoxidil (x, in mg) and the hair density ( y, in hairs/mm2) 6:00 a.m.) and the total amount spent ( y, in hundreds of
were recorded for each man. The following summary statistics dollars) was recorded for each. SHOPPER
were reported. a. Find the estimated regression line, and complete the
ANOVA table.
Sxx 5 21423.0 Syy 5 80.119 Sxy 5 165.593
b. Describe the relationship between shopping start time and
x 5 49.033 y 5 2.207 total amount spent on Black Friday. What would you
a. Find the estimated regression line. recommend to retailers to increase business?
b. Complete the ANOVA table, and conduct an F test for a c. Find a 95% prediction interval for an observed value of
significant regression. Does the dosage of minoxidil help shopping start time of 8:00 a.m.
to explain variation in hair density? Justify your answer. d. Suppose the shopping start time is 9:00 a.m. Is there any
c. Find a 95% confidence interval for the mean hair density evidence to suggest that the true mean amount spent is
for a man taking a daily 50-mg dose of minoxidil. greater than $725? Use a significance level of 0.05.
12.161 Economics and Finance A statistician (and cigar 12.164 Economics and Finance Scientists at the U.S. Mint
smoker) recently studied the relationship between Cigar believe that the number of years a quarter is in circulation is
Aficionado’s blind ratings of cigars and price. The purpose of this linearly related to the condition of the coin. To test this
study was to determine whether premium cigars are really more theory, a random sample of quarters was obtained. The
expensive. A random sample of 50 cigars was selected, and the number of years in circulation (x) was recorded for each coin,
rating (x) and price per cigar ( y, in dollars) were recorded for and each quarter was assessed for wear using the Official
each. The following summary statistics were obtained. American Numismatic Association Grading Standards for
U.S. Coins ( y, a 70-point scale). A simple linear regression
Sxx 5 185.725 Syy 5 1481.83 Sxy 5 111.877 analysis was performed, and the following partial ANOVA
x 5 5.6084 y 5 11.7654 table was obtained.
a. Find the estimated regression line. ANOVA summary table
b. Complete the ANOVA table.
c. Conduct a test of H0: b1 5 0 versus Ha: b1 2 0 with a Source of Sum of Degrees of Mean
significance level of 0.05. variation squares freedom square F p Value
d. Do you believe the saying, “You get what you pay for”
Regression
applies to cigars? Justify your answer.
Error 7671.0
12.162 Manufacturing and Product Development Con-
Total 11,148.4 54
sumer Federation of America recently studied the relationship
between tensile strength and amount of nickel in household a. Complete the ANOVA table.
stainless steel products. The percentage of nickel by weight (x) b. Find an estimate of the variance of the random error
and the tensile strength ( y, in MPa) were measured for each terms.
product. The data are given in the following table. NICKEL
12.165 Biology and Environmental Science Every 13 or
17 years a brood of cicadas, sometimes in Biblical proportions,
x 5.9 2.8 5.3 4.6 4.6 3.5 3.8
emerges from underground in the northeastern part of the
y 948 859 921 909 915 876 828 United States. Although harmless to humans, cicadas cause
x 5.0 5.3 4.2 damage to small trees and shrubs, and they make a piercing,
irritating sound. Some research suggests that the density of
y 964 964 900
cicadas is related to the density of moles, natural predators of
648 C hapte r 1 2 Correlation and Linear Regression
cicada. Plots were randomly selected, and the density of moles b. Find the estimated regression line, and complete the
per acre (x) and the density of cicadas per acre (y, in millions) ANOVA table.
were carefully estimated. The data are given in the following c. How does the F test for a significant regression support
table. CICADA your answer to part (a)?
12.168 Medicine and Clinical Studies Twenty-one adults
x 4 6 3 0 2 4 0 5 4
were selected at random for a study of the effect of castor oil
y 1.5 0.8 1.2 1.4 1.5 1.5 1.7 1.2 1.1
on the immune system. Participants took a certain amount of
a. Carefully sketch a scatter plot of the data. Describe the castor oil (x, in milliliters) each day for three months. At the
relationship between the density of moles and the density end of this regimen, the white blood cell count was measured
of cicadas. (y, thousands of cells per microliter of blood) for each person.
b. Find the estimated regression line. Conduct a hypothesis Analyze these data using any appropriate methods to deter-
test of H0: b1 5 0 versus Ha: b1 2 0. Use a significance mine whether there is a significant linear relationship between
level of 0.05. the amount of castor oil consumed and the number of white
c. Construct a normal probability plot of the residuals and a blood cells. Check the relevant assumptions. Assume that the
scatter plot of the residuals versus the predictor variable. more white blood cells, the stronger the immune system, and
Is there any evidence that the regression assumptions are draw a conclusion about the effect of castor oil on the immune
not satisfied? system. CASTOR
12.166 Travel and Transportation Since airlines began 12.169 Sports and Leisure Despite the rising cost of
charging for checked luggage, more passengers use carry-on equipment, the number of fishing licenses issued in many states
bags to avoid the extra fee. For those who check luggage, the continues to increase. Officials are trying to predict the number
wait at the baggage carousel while jostling with other passen- of participants in freshwater fishing in order to hire and deploy
gers can be long and infuriating. A study was conducted to the appropriate number of fish and game wardens. The data
determine whether there is a relationship between the flight given on the website list the percentage of freshwater (x) and
time (x, in hours) and the amount of time for a bag to arrive the freshwater fishing participation rate (y) by state.40 Note: No
from a plane to the carousel (y, in minutes). A random sample data are given for Hawaii. Analyze these data using any
of arriving flights at the Denver International Airport was appropriate methods to determine whether there is a significant
obtained and the flight time and baggage time were recorded linear relationship between the percentage of freshwater and the
for each. LUGGAGE participation
= rate. Check the relevant assumptions. Does the
a. Construct a scatter plot for these data. Describe the sign of b1 seem appropriate? Why or why not? FISHING
affected by the water-to-cement ratio (x1), the sand-to-cement a. Estimate the regression coefficients in the model
ratio (x2), the percentage of fly ash (x3), and/or the ambient Yi 5 b0 1 b1x1i 1 b2x2i 1 b3x3i 1 b4x4i 1 b5x5i 1 Ei.
temperature during curing ( x4, °F ) . PAHWY b. Determine the most important predictor variables. Find
a. Estimate the regression coefficients in the model the estimated regression line for this reduced model.
Yi 5 b0 1 b1x1i 1 b2x2i 1 b3x3i 1 b4x4i 1 Ei. Use the c. Use the sign of each estimated regression coefficient in
sign of each estimated regression coefficient to explain the reduced model to explain the effect of each predictor
the effect of each predictor variable on the compression variable on the volume of mesophilic fungi.
strength of concrete. d. Check the relevant assumptions for the reduced model.
b. Construct an ANOVA table and conduct a model utility
test. Use technology to find the p value associated with last Step
this test.
12.173 Are windmills too noisy? An experiment
c. Conduct the appropriate hypothesis tests to determine the
was conducted to measure the windmill noise level
significant predictor variables.
(in dB) at various distances (in meters) from a proposed site.
d. Construct a normal probability plot of the residuals and
The data are summarized in the following table.
the plots of the residuals versus each predictor variable. Is
there any evidence of a violation in the multiple linear Distance 10 50 75 120 150 160 200 250 400 500
regression assumptions? Noise level 75 110 73 52 58 77 56 57 28 4
12.172 Public Health and Nutrition There is some evidence a. Find the estimated regression line, and complete the
to suggest that exposure to airborne fungal products may be ANOVA table.
associated with adverse health effects, for example, respiratory b. Find the coefficient of determination and interpret this value.
tract infections.41 A random sample of Canadian elementary c. Suppose a windmill is constructed 350 meters from your
schools was obtained and the following data was obtained for home. Find an estimate for the mean noise level and a
each: mesophilic fungi ( y, in cfu/m3), building age (x1, in years), 95% prediction interval for the noise level.
air exchange rate (x2 , per hour), indoor CO2 (x3, ppm), outdoor d. Construct a normal probability plot of the residuals and a
temperature ( x4, °C ) , and a dummy variable, signs of moisture in scatter plot of the residuals versus distance. Is there any
the room (x5, 0 for no signs, 1 for signs). FUNGI evidence of non-normality? Justify your answer.
13 Categorical Data and
Frequency Tables
Looking Back
s
n Remember how univariate categorical data are identified: non-numerical observations that
may be placed in categories.
n Recall how bivariate categorical data are obtained and characterized: two non-numerical
observations on each individual or object.
n Think about the natural summary measures for categorical data: frequency and relative
frequency.
Looking Forward
s
Brooklyn 45 52 14 8
Manhattan 30 54 17 11
Queens 47 48 15 13
Staten Island 30 43 20 14
The techniques presented in this chapter will be used to determine whether
two categorical variables are dependent. In the context of this problem, we
would like to know whether residents in all boroughs have the same public
transportation patterns.
Contents
13.1 Univariate Categorical Data, Goodness-of-Fit Tests
13.2 Bivariate Categorical Data, Tests for Homogeneity and Independence
© David M. Grossman/The Image Works
651
652 Cha pte r 1 3 Categorical Data and Frequency Tables
The hypothesis test procedure presented in this section is designed to compare a set of
hypothesized proportions with the set of true proportions, to check the goodness of fit
(Gof). For example, the HR director might use the data above to determine whether
there is any evidence that the true proportions are different from the following hypothe-
sized proportions: 0.4 (for motivated employees), 0.3 (for indifferent employees), 0.2 (for
disgruntled employees), and 0.1 (for con-artist employees).
The notation used here is an extension of the notation used to represent sample and
population proportions. Suppose each observation falls into one of k categories.
The true proportion and the hypothesized proportion are both population proportions.
The sum of the hypothesized proportions (like the sum of the true proportions) must be 1.
That is, p10 1 p20 1 c1 pk0 5 1.
The goodness-of-fit test is done to determine whether there is any evidence (from the
observed, or sample, counts) that the true population proportions differ from the hypoth-
esized population proportions. The null hypothesis and the alternative hypothesis are
stated in terms of the true and hypothesized category proportions.
H0 is a composite null hypothesis. H0: p1 5 p10, p2 5 p20, . . . , pk 5 pk0.
It involves several parameters and (Each true category proportion is equal to a specified hypothesized value.)
is true only if all the equalities Ha: pi 2 pi0 for at least one i.
hold. (There is at least one true category proportion that is not equal to the corresponding
specified hypothesized value.)
Suppose a random sample of size n is selected; let ni ( i 5 1, 2, . . . , k ) be the number
of observations falling into each category. To decide whether the sample data fit the
In this context, a cell is simply a hypothesized proportions, observed cell counts (the ni’s) are compared with expected cell
category. counts. If the null hypothesis is true, then the expected frequency, or count, for category 1,
or in cell 1, is e1 5 np10. For cell 2, the expected frequency is e2 5 np20, etc. For example,
if the sample size is n 5 100 and p10 5 0.25, then we expect the count for category 1 to
be, on average, np10 5 ( 100 )( 0.25 ) 5 25.
13.1 Univariate Categorical Data, Goodness-of-Fit Tests 653
The test statistic is a measure of how far away the observed cell counts are from the
expected cell counts. If the null hypothesis is true, then the random variable
X2 5 a 5 a
The X in X 2 is the uppercase k ( observed cell count 2 expected cell count ) 2 k
( ni 2 ei ) 2
Greek letter chi (x is the expected cell count ei
i51 i51
lowercase form).
has approximately a chi-square distribution with k 2 1 degrees of freedom. This approx-
imation is good if ei 5 npi0 $ 5 for all i, that is, if all expected cell counts are at least 5.
If the observed cell counts are close to the expected cell counts, then the value of X2
will be small. If the observed cell counts are considerably different from the expected cell
counts, then the value of X2 will be large. Therefore, the null hypothesis is rejected only
for large values of the test statistic. Now we have all of the pieces for the complete
hypothesis test.
Stepped
STEPPED Tutorial
TUTORIALS
Goodness-of-Fit Test
Goodness-of-fit
BOX PLOTS test Let ni be the number of observations falling into the ith category ( i 5 1, 2, . . . , k ) , and
let n 5 n1 1 n2 1 c1 nk . A hypothesis test about the true category population propor-
tions with significance level a has the form
H0: p1 5 p10, p2 5 p20, . . . , pk 5 pk0
Ha: pi 2 pi0 for at least one i.
TS: X2 5 a
k
( ni 2 ei ) 2
where ei 5 npi0
i51 ei
RR: X2 $ x2a,k21
A reminder: This test is appropriate if all expected cell counts are at least 5
SOLUTION TRAIL
(ei 5 npi0 $ 5 for all i). The chi-square distribution was introduced in Section 8.5, and a
Data Set hypothesis test based on this distribution was discussed in Section 9.7.
BEES
Example 13.1 Sweet as Tupelo Honey
For beekeepers looking to harvest more honey, there are four possibilities for obtaining
Solution Trail 13.1 more bees: package bees, nucs, established colonies, or swarms. A university agricultural
sciences department obtained a random sample of recent bee purchases, and each was
K e y w or ds
classified into one of the four categories. Use the following one-way frequency table to
n Test the hypothesis test the hypothesis that the four possible bee purchases occur with equal frequency. Use
n Four possible bee purchases a 5 0.05.
occur with equal frequency
TS: X2 5 a
there is any evidence that any k
( ni 2 ei ) 2
one of the true population ei
i51
proportions differs from 0.25.
RR: X2 $ x2a,k21 5 x20.05,3 5 7.8147 From Table VI in the Appendix.
654 C h a p t e r 1 3 Categorical Data and Frequency Tables
Observed
Cell Category cell count Expected cell count
1 Package bees 31 e1 5 np10 5 ( 113 )( 0.25 ) 5 28.25
2 Nucs 36 e2 5 np20 5 ( 113 )( 0.25 ) 5 28.25
3 Colonies 26 e3 5 np30 5 ( 113 )( 0.25 ) 5 28.25
4 Swarms 20 e4 5 np40 5 ( 113 )( 0.25 ) 5 28.25
All four expected cell counts are greater than 5. The chi-square goodness-of-fit
test is appropriate.
Step 3 The value of the test statistic is
x2 5 a
4
( ni 2 ei ) 2
23 ei
i51
Figure 13.1 p-Value illustration: Step 4 The value of the test statistic does not lie in the rejection region or equivalently,
p 5 P(X $ 4.9823) p 5 0.1731 . 0.05. Figure 13.1 shows the calculation and an illustration of the
5 0.1731 . 0.05 5 a exact p value. At the a 5 0.05 significance level, there is no evidence to suggest
that any one of the true population proportions differs from 0.25.
Recall that, using Table VI in the Figure 13.2 shows a technology solution.
Appendix, we can only bound the
p value for this hypothesis test.
CrunchIt!, Minitab, or the TI-84
can be used to find the exact
p value for this test.
Data Set
Example 13.2 Thanksgiving Traditions
On Thanksgiving, many families traditionally gather for a wonderful meal, lively conver-
THANKS
sation, and in some cases a spirited street hockey game. A random sample of adults over
age 18 was obtained and asked to name their favorite Thanksgiving food. The data and the
proportions from a previous survey3 are given in the following table.
Previous
Favorite food Frequency proportions
Turkey 250 0.38
Stuffing 148 0.26
Mashed potatoes 98 0.17
Yams 55 0.10
Green bean casserole 30 0.05
Cranberry sauce 42 0.04
SOLUTION TRAIL
Solution Trail 13.2 Is there evidence to suggest that any of the true cell proportions differ from the previous
proportions? Use a 5 0.05.
K e y w or ds
n Is there evidence? Solution
n True cell proportions differ Step 1 A goodness-of-fit test is appropriate to determine whether there is evidence that
any of the true category population proportions have changed. There are k 5 6
T r ansl ati o n categories.
n Statistical inference The four parts of the hypothesis test are
n True population proportions
differ H0: p1 5 0.38, p2 5 0.26, p3 5 0.17, p4 5 0.10, p5 5 0.05, p6 5 0.04
Ha: pi 2 pi0 for at least one i.
C onc e pts
TS: X2 5 a
n Goodness-of-fit test
k
( ni 2 ei ) 2
i51 ei
V isi on
RR: X2 $ x2a,k21 5 x20.05,5 5 11.0705 Table VI in the Appendix.
Find the expected cell counts,
and use the goodness-of-fit test Step 2 There are n 5 250 1 148 1 98 1 55 1 30 1 42 5 623 total observations.
procedure to determine whether The expected cell counts are given in the following table.
there is evidence that any one of
the true population proportions Observed
differs from the previous Cell Favorite food cell count Expected cell count
proportions.
1 Turkey 250 e1 5 np10 5 ( 623 ) ( 0.38 ) 5 236.74
2 Stuffing 148 e2 5 np20 5 ( 623 ) ( 0.26 ) 5 161.98
3 Mashed potatoes 98 e3 5 np30 5 ( 623 ) ( 0.17 ) 5 105.91
4 Yams 55 e4 5 np40 5 ( 623 ) ( 0.10 ) 5 62.30
5 Green bean casserole 30 e5 5 np50 5 ( 623 ) ( 0.05 ) 5 31.15
6 Cranberry sauce 42 e6 5 np60 5 ( 623 ) ( 0.04 ) 5 24.92
All expected cell counts are greater than 5. The chi-square goodness-of-fit test is
appropriate.
Step 3 The value of the test statistic is
x2 5 a
6
( ni 2 ei ) 2
i51 ei
( 250 2 236.74 ) 2 ( 148 2 161.98 ) 2 ( 98 2 105.91 ) 2
5 1 1
236.74 161.98 105.91
( 55 2 62.30 ) 2 ( 30 2 31.15 ) 2 ( 42 2 24.92 ) 2
1 1 1
62.30 31.15 24.92
Jon Feingersh/Blend Images/Getty Images Use observed and expected cell counts.
know? Step 4 The value of the test statistic ( x2 5 15.15 ) lies in the rejection region or equiva-
lently, p 5 0.0097 # 0.05. At the a 5 0.05 significance level, there is evidence to
25 suggest that at least one population proportion has changed from its previous value.
Note: we can use Table VI in the Appendix to bound the p value for this hypoth-
esis test. In row k 2 1 5 5 (degrees of freedom), place 15.15 in the ordered list
of critical values.
15.0863 # 15.15 # 16.7496
0 15.15
x20.01,5 # 15.15 # x20.005,5
Figure 13.3 p-Value illustration:
Therefore, 0.005 # p # 0.01 See Figure 13.3.
p 5 P(X $ 15.15)
5 0.0097 # 0.05 5 a Figure 13.3 shows the calculation and an illustration of the exact p value.
656 Cha pte r 1 3 Categorical Data and Frequency Tables
VIDEO Tech
Video TECH Manuals
MANUALS
Figure 13.4 shows a technology solution.
EXEL DISCRIPTIVE
Chi-square GOF test
Technology Corner
Procedure: Chi-square goodness-of-fit test.
Reconsider: Example 13.2, solution, and interpretations.
CrunchIt!
Use the built-in function Chi-squared Goodness-of-fit.
1. Enter the observed frequencies into column Var1. If all categories are assumed to be equally likely, continue with
Step 2. Otherwise, enter the corresponding expected probabilities into column Var2.
2. Select Statistics; Chi-squared Goodness-of-fit. Choose the Observed Frequencies column and optionally the Expected
Probabilities column.
3. Click Calculate. The results are displayed in a separate window. See Figure 13.5.
TI-84 Plus C
Use the built-in function x2GOF-Test.
1. Enter the observed frequencies into list L1 and the expected frequencies into list L2.
2. Select STAT ; TESTS; x2GOF-Test. Enter the list containing the observed frequencies, the list containing the
expected frequencies, and the degrees of freedom associated with the test statistics. Note: To compute the expected
frequencies, enter the hypothesized proportions into a list. Compute n (623 in this example) times this list and store the
results (expected frequencies) in a new list.
3. Highlight Calculate and press ENTER . The hypothesis test results are displayed on the Home screen. See
Figure 13.6. The Draw results are shown in Figure 13.7.
13.1 Univariate Categorical Data, Goodness-of-Fit Tests 657
Minitab
The Chi-square goodness-of-fit test is in the Stat; Tables menu.
1. Enter the category names into column C1 (optional), the hypothesized proportions into column C2, and the observed
counts into column C3.
2. Select Stat; Tables; Chi-Square Goodness-of-Fit Test (One variable). Enter the Observed counts column, the Category
names column, and the Specific proportions column.
3. Click OK. The results are displayed in a session window. Refer to Figure 13.4. This Minitab procedure also automati-
cally produces a chart of observed and expected values by category, and a chart of the contribution to the value of the
chi-square statistic by category.
Excel
Use the built-in function CHISQ.TEST.
1. Enter the observed frequencies into column B. Compute and enter the expected frequencies into column D.
2. Use the function CHISQ.TEST to find the p value associated with the goodness-of-fit test and the function CHISQ.INV
to find the value of the test statistic if desired. See Figure 13.8.
13.5 Fill in the Blank The goodness-of-fit test is appropriate to a fraternal organization. The results are given in the follow-
if all expected cell counts are _____________. ing one-way frequency table. EX13.11
b. Find the expected count for each cell. Verify that each is
at least 5. Sentiment Bullish Neutral Bearish
c. Find the value of the test statistic. State your conclusion, Frequency 413 337 250
and justify your answer.
13.9 Consider the following one-way frequency table. EX13.9 Is there any evidence that the percentages during this week are
different from the long-term percentages? Use a 5 0.01.
Category 1 2 3 4 5 6 13.14 Biology and Environmental Science In a recent
Frequency 90 82 75 95 96 36 study of adult eye color, the hypothesized distribution was
given as blue 25%, green 10%, brown 50%, and black 15%.
a. Find the four parts of a goodness-of-fit test with A random sample of adults was obtained, and the resulting
p10 5 0.175, p20 5 0.171, p30 5 0.162, p40 5 0.225, eye colors are summarized in the following one-way frequency
p50 5 0.202, and p60 5 0.065. Use a 5 0.05. table. EYES
b. Find the value of the test statistic. State your conclusion
and justify your answer. Eye color Blue Green Brown Black
c. Find bounds on the p value.
Frequency 121 65 242 72
13.10 Consider the following one-way frequency table. EX13.10
Conduct a goodness-of-fit test to determine whether these data
Category 1 2 3 4 5 provide any evidence that the true proportions differ from the
Frequency 140 135 155 152 168 hypothesized proportions. Use a 5 0.05.
13.15 Marketing and Consumer Behavior Original Designs
Conduct a goodness-of-fit test to check the hypothesis that all sells home plans in five different styles. A random sample of pur-
five categories are equally likely. Use a 5 0.05. Find bounds chases was obtained, and the number falling into each category
on the p value. was recorded. The data are given in the following table. HOMES
Conduct a goodness-of-fit test to determine whether there is Conduct a goodness-of-fit test to determine whether there is
evidence that any of the true proportions differ from the evidence that any of this year’s true population proportions
hypothesized values 0.10, 0.20, 0.25, 0.05, and 0.40. Use differs from the 2009 proportion. Use a 5 0.05.
a 5 0.05.
13.19 Business and Management A random sample of
13.16 Marketing and Consumer Behavior A random employed adults in Canada was obtained and classified by
sample of West Bay Yacht Club members was obtained and occupation. The resulting data and the hypothesized propor-
asked to identify their dinner event preference. The resulting tions are given in the following table.7 EMPLOYED
data along with the historical proportions are given in the fol-
lowing table.5 dinner
Occupation Frequency pi0
Dinner event preference Frequency pi0
Management 1503 0.09
Pot luck 120 0.33 Business, finance, administration 3157 0.18
Catered $8 95 0.18 Natural and applied sciences 1357 0.07
Catered $12 100 0.26 Health 1183 0.07
Catered $15 36 0.10 Social science, education, gov, and rel 1659 0.09
Catered $18 61 0.13 Art, culture, recreation, sport 595 0.03
Sales and service 4262 0.24
Conduct a goodness-of-fit test to determine whether there is
Trades, transport, eq operators 2634 0.15
evidence that any of the true proportions differs from the
historical value. Use a 5 0.05. Unique to primary 557 0.03
Unique to processing, man, util 788 0.05
13.17 Marketing and Consumer Behavior The manager at
a Publix grocery store is trying to determine which types of
lettuce to order and sell to customers. A random sample of Conduct a goodness-of-fit test to determine whether there is
shoppers was obtained and asked to select their favorite lettuce evidence that any of the true proportions differs from the
type. The data are given in the following table. lettuce hypothesized value. Use a 5 0.01.
Lettuce Frequency 13.20 Public Health and Nutrition Prescription medication
Crisphead 45 can be very expensive, in part because of the high cost of
research and development. The most expensive drug in the
Butterhead 17
world is Soliris, which is used to treat a rare blood disorder.
Romaine 130 Many insurance companies have adopted a tiered plan that
Leafy 90 allows the user to select a generic or preferred drug at a reduced
Celtuce 28 cost or copay instead of the named or specialty drug at a higher
copay. Medicare Blue RX has four tiers for prescription medi-
Test the fit of these data to the hypothesized proportions 0.14, cation, each with different copays. Past records indicate that the
0.06, 0.37, 0.30, and 0.13. Use a 5 0.05. proportions of users who select each tier are as follows: Tier 1,
13.18 Public Health and Nutrition The U.S. Centers for 0.35; Tier 2, 0.28; Tier 3, 0.12; Tier 4, 0.25. Suppose a random
Disease Control and Prevention collects data on the number sample of policyholders who purchased medication was
of reported flu cases. The following table shows the number obtained and the frequency associated with each tier is given in
of reported H1N1 flu cases by region for the week ending the following table. soliris
December 15, 2013, and the hypothesized proportions from
2009.6 h1n1 Tier 1 2 3 4
Surveillance region Frequency pi0
Frequency 152 107 36 76
1 40 0.014
2 125 0.047 a. Use a goodness-of-fit test to show that there is a shift in
3 145 0.061 the proportion of users by tiers. Use a 5 0.05.
4 501 0.210 b. Use your results in part (a) to explain the shift in the
5 345 0.132 proportions, that is, which tiers are more/less utilized.
6 380 0.151 13.21 Sports and Leisure As soon as the gates at an
7 140 0.051 amusement park open, there is a mad rush to the most
8 501 0.221 popular rides. A random sample of people waiting in line was
9 164 0.066 obtained and asked which ride they were headed to first. The
results and the hypothesized proportions are given in the
10 110 0.047
following table. rides
660 Ch a pt e r 1 3 Categorical Data and Frequency Tables
Ride Frequency pi0 c. Use your results in part (a) to explain which proportions
are different. Why would you expect these proportions to
Tower of Terror 63 0.40
be different in Florida?
Rockin’ Roller Coaster 35 0.30
Star Tours 16 0.15 Extended Applications
Studios Backlot Tour 14 0.15
13.24 Psychology and Human Behavior Berkeley
Breathed stopped writing his Pulitzer Prize–winning comic strip
Conduct a goodness-of-fit test to determine whether any of the Bloom County in 1989. Since then, he has written several books
true population proportions differ from the hypothesized and animations. Recently, Mr. Breathed has been involved in
proportions. Use a 5 0.01. writing a children’s movie and two sequel comic strips, Outland
13.22 Marketing and Consumer Behavior According and Opus, featuring some of the old Bloom County gang. A ran-
to a Rasmussen Report, 58% of American adults dine out dom sample of people who read the original Bloom County
at least once per week.8 Consequently, customers are quick comic strip was obtained and asked to name their favorite char-
to voice dissatisfaction when their dining experience is acter. The results are given in the following table. bloom
cable companies. In a recent survey by MSN Money, each par- contains the 2012 responses and a random sample of responses
ticipant was asked to select the one item that matters the most from residents in 2014. metro
in customer service.11 A random sample of consumers was
2012 2014
obtained shortly after the 2014 holiday shopping season, and
Category responses sample
each was also asked the same question. The results are given in
the following table. service Transportation 127 135
Growth 127 124
2014 MSN
Crime 120 110
Customer service item sample survey
Economy 80 89
Knowledgeable staff 150 0.30 Government 67 62
Service after the sale 130 0.18 Taxes 40 43
Friendly staff 85 0.16 Social 33 39
Flexible policies for returns/exchanges 75 0.13 Housing 39 45
Readily available staff 47 0.09 Other 33 36
The product is all that matters 45 0.08
Not sure 34 0.06 a. Compute the proportion associated with each category for
the 2012 responses.
b. Conduct a goodness-of-fit test to determine whether there
Conduct a goodness-of-fit test to determine whether there is is any evidence that the true 2012 population proportions
evidence that the true 2014 proportions of the most important have changed in 2014. Use a 5 0.05.
customer service items have changed since the MSN survey.
Use a 5 0.05. Challenge
13.27 Travel and Transportation Capital Bikeshare is a 13.29 Business and Management In Britain, a small baker
Washington, D.C., organization sponsored by several agencies that does not have a fully automatic plant and sells the majority of
offers short-term use of over 1650 bicycles to registered members. his or her production on-site or from vehicles. According to the
There are approximately 175 bicycle stations in the District of Birmingham City Council trading standards, the law states that
Columbia, Arlington County, and the City of Alexandria, the average weight of one loaf type must be 400 grams.
Virginia.12 To distribute bicycles appropriately to stations, a Suppose the Small Baker’s Association (SBA) claims that the
survey is conducted each year to determine the home location and weight of each loaf of bread of this type is approximately nor-
work location of members. The following table contains the mal with mean 400 grams and standard deviation 15 grams. To
historical proportions associated with each home location and a check this claim, a random sample of small-baker loaves was
summary of a random sample of members in 2014. bikes obtained, and each was carefully weighed. The observed fre-
quency of weights in each specified interval (in grams) is
2014 Historical
summarized in the following one-way table.
Home location sample proportions
Interval Frequency
District of Columbia 3750 0.78
Arlington County (VA) 530 0.11 ,370 18
Montgomery County (MD) 185 0.04 370–385 67
Fairfax County (VA) 101 0.02 385–400 175
Prince Georges County (MD) 52 0.01 400–415 184
Alexandria City (VA) 98 0.02 415–430 75
Other 92 0.02 $430 14
a. Assume the SBA claim is true. Find the probability that a
Conduct a goodness-of-fit test to determine whether there is randomly selected loaf of bread falls into each interval.
evidence that any of the historical proportions have changed. b. Use the probabilities computed in part (a) as the
Use a 5 0.05. hypothesized population proportions associated with each
13.28 Public Policy and Political Science The Metropolitan interval. Conduct a goodness-of-fit test to determine
Council routinely conducts a survey to learn the opinions of whether the observed weights fit the hypothesized
Twin City residents about the region’s quality of life and key distribution. Use a 5 0.05. State your conclusion.
regional problems and possible solutions. One key question Note: The goodness-of-fit test provides a formal test for nor-
involves the most important problem in the Twin Cities metro mality. It complements the methods used to check for normality
area, grouped by major categories. Suppose the following table in Section 6.3.
662 C ha pte r 1 3 Categorical Data and Frequency Tables
Two-way tables are described by The natural summary for this type of bivariate data set (in which each observation is
the number of rows and the categorical) is the number of observations in each category combination. For example,
number of columns. For example, compute the number of futures trades (or frequency) by Fidelity and involving agricul-
a 2 3 6 contingency table has 2 ture, the number of futures trades by Fidelity and involving energy, etc. This summary
rows and 6 columns. information can be displayed in a 3 3 4 two-way frequency table, or contingency table,
as shown in Table 13.1. Each cell in this table contains the number of observations
(observed count or frequency) in a category combination, or pairing.
Table 13.1 A two-way frequency table for the data obtained from futures
trades
Futures type
Agriculture Energy Financials Metals
Fidelity 15 25 30 22
Firm
Vanguard 22 24 15 30
WellsTrade 32 25 20 40
Stepped
STEPPED Tutorial
TUTORIALS The columns of the two-way table in Table 13.1 correspond to futures type, and the
two-way
BOX PLOTStables
rows correspond to brokerage firms. Each cell in the body of the table contains a fre-
quency, or observed cell count. For example, there were 15 trades by Vanguard involving
financials, and there were 40 trades by WellsTrade involving metals.
13.2 Bivariate Categorical Data, Tests for Homogeneity and Independence 663
Bar charts were introduced in If we consider a single firm (population), say Fidelity, then a bar chart may be used to
Chapter 2. represent the distribution of futures type (Figure 13.9). A bar chart may be constructed for
each firm. For example, Figure 13.10 is another bar chart associated with these data, a
summary of futures type for Vanguard.
35 35
30 30
25 25
Frequency
Frequency
20 20
15 15
10 10
5 5
0 0
re
s
s
re
s
gy
gy
al
al
al
al
tu
tu
er
er
i
i
et
et
nc
nc
ul
ul
En
En
M
ric
na
ric
na
Ag
Ag
Fi
Fi
Futures type Futures type
Figure 13.9 A bar chart showing the Figure 13.10 A bar chart showing the
frequency of futures type for trades by frequency of futures type for trades by
Fidelity. Vanguard.
Video Tech Manuals A side-by-side or stacked bar chart may be used to compare categorical data from two
VIDEO TECH MANUALS or more sources, or populations. Figure 13.11 shows a side-by-side bar chart of futures
Two-way tables -
EXEL DISCRIPTIVE
stacked/segmented type grouped by brokerage firm. Figure 13.12 shows a stacked bar chart of futures type
bar charts grouped by brokerage firm.
40 120
100
30
80
Frequency
Frequency
Agriculture
Energy
20 60
Financials
Metals 40
10
20
0 0
e
de
y
d
ad
lit
lit
ar
ar
a
de
de
Tr
gu
Tr
gu
ls
Fi
ls
Fi
n
n
el
el
Va
Va
W
W
Brokerage firm Brokerage firm
Figure 13.11 A side-by-side bar chart showing the Figure 13.12 A stacked bar chart
frequency of occurrence of each futures type, by firm. showing the frequency of occurrence of
each futures type, by brokerage firm.
Although these graphs are useful for suggesting possible differences among popula-
tions, there is a more precise statistical test. The practical problem to consider is: Are all
of the true category proportions the same for each population? This is a test for homoge-
neity of populations. Homogeneity is the state of having identical properties of values; in
this case, it refers to populations having identical true category proportions. In the firm
and futures type example, it seems reasonable to ask whether the proportion of futures
type trades is the same for each firm. The statistical procedure used to analyze this prob-
lem is based on observed and expected cell counts (as in Section 13.1). Under the null
hypothesis that the populations have the same category proportions, the test statistic also
has a chi-square distribution.
664 C h a pte r 1 3 Categorical Data and Frequency Tables
Suppose there are I rows and J columns in a two-way frequency table. The notation
associated with this table includes the dot notation in a subscript (introduced in Chap-
ter 11) to indicate a sum over that subscript while the other subscript is held fixed.
nij 5 o bserved cell count, or frequency, in the (ij) cell (the intersection of the ith row and
the jth column).
ni. 5 a nij
J
j51
5 ith row total, the sum of the cell counts, or observed frequencies, in the ith row.
n. j 5 a nij
I
i51
5 j th column total, the sum of the cell counts, or observed frequencies, in the jth
column.
n 5 a a nij
I J
i51 j51
5 grand total, the total of all cell counts, or observed frequencies.
Table 13.2 is a visualization of these symbols in an I 3 J two-way frequency table.
Category
1 2 c j c J Row total
1 n11 n12 c n1j c n1J n1.
2 n21 n22 c n2j c n2J n2.
Population
( ( ( ( ( ( ( (
i ni1 ni2 c nij c niJ ni.
( ( ( ( ( ( ( (
I nI1 nI2 c nIj c nIJ nI.
Column total n.1 n.2 c n.j c n.J n
The row and column totals are used to compute the expected cell counts. The broker-
age firm and futures type data will be used to illustrate these calculations. Table 13.3 is a
modified two-way table containing the observed cell counts, the row and column totals,
and the grand total.
Table 13.3 A modified two-way frequency table including the row and column
totals and the grand total for the trade data
Futures type
Agriculture Energy Financials Metals Row total
Fidelity 15 25 30 22 92
Vanguard 22 24 15 30 91
Firm
WellsTrade 32 25 20 40 117
Column total 69 74 65 92 300
13.2 Bivariate Categorical Data, Tests for Homogeneity and Independence 665
Here, as in Section 13.1, the There were 300 futures trades in this study, and 69 involve agriculture. The proportion
expected cell count may not be an of all agriculture trades in the data set is 69/300 5 0.23. Suppose there is no difference
integer. in the proportion of agriculture trades among firms. Then we expect 23% of the futures
trades by Fidelity to involve agriculture. This expected frequency in the (11) cell is
denoted e11 and is computed by
69 ( 92 )( 69 )
The e here is used to denote e11 5 0.23 3 92 5 3 92 5
expected frequency and is not
300 300
connected in any way with the ( 1st row total )( 1st column total ) n1. 3 n.1
5 5 5 21.16
random errors or residuals in grand total n
Chapter 12.
Similarly, we expect 23% of the futures trades by Vanguard to involve agriculture. The
expected cell count in the (21) cell is
69 ( 91 )( 69 )
e21 5 0.23 3 91 5 3 91 5
300 300
( 2nd row total )( 1st column total ) n2. 3 n.1
5 5 5 20.93
grand total n
The expected counts in column 2 are computed in a similar manner. There are 74
futures trades that involve energy. The proportion of all futures trades involving energy is
74/300 5 0.2467. If there is no difference in the proportion of energy trades, then we
expect 24.67% of the trades, or
74 ( 92 )( 74 )
e12 5 0.2467 3 92 5 3 92 5
300 300
( 1st row total )( 2nd column total ) n1. 3 n.2
5 5 5 22.69
grand total n
to be the energy trades by Fidelity. We continue in the same manner to compute all the
expected counts. Table 13.4 shows each expected count in parentheses beneath the cor-
responding observed count.
Table 13.4 A modified two-way frequency table including the row and column
totals, the grand total, and the expected cell counts for the trade data
Futures type
Agriculture Energy Financials Metals Row total
Fidelity 15 25 30 22 92
(21.16) (22.69) (19.93) (28.21)
Vanguard 22 24 15 30 91
Firm
The computations in this example suggest an easy formula for finding the expected
frequencies. In an I 3 J two-way table, the expected count, or frequency, in the (ij) cell
can be written as
( ith row total )( jth column total ) ni. 3 n.j
eij 5 5
grand total n
666 C ha pte r 1 3 Categorical Data and Frequency Tables
The test statistic is a measure of how far away the observed cell counts are from the
expected cell counts. If there is no difference in category proportions among populations,
the random variable
X 5 a 5 aa
( observed cell count 2 expected cell count ) 2 I ( nij 2 eij ) 2
J
2
x2 5 a a
3 ( nij 2 eij ) 2
4
Stepped
STEPPED Tutorial
TUTORIALS
Test for Homogeneity of Populations
Chi-square
BOX PLOTS tests
In an I 3 J two-way frequency table, let nij be the observed count in the (ij) cell and let
eij be the expected count in the (ij) cell. A hypothesis test for homogeneity of populations
with significance level a has the following form:
H0: The true category proportions are the same for all populations (homogeneity of
populations).
Ha: The true category proportions are not the same for all populations.
TS: X2 5 a a
I ( nij 2 eij ) 2
J ( ith row total )( jth column total ) ni. 3 n.j
where eij 5 5
i51 j51 eij grand total n
RR: X2 $ x2a,(I21)(J21)
A reminder: This test is appropriate if all expected cell counts are at least 5 (eij $ 5
for all i and j). Note that this procedure is called a test of homogeneity and this
property is stated in the null hypothesis. However, we are really testing for evidence
Data Set of inhomogeneity. We cannot prove homogeneity, we can only test for evidence of
HOSPITAL inhomogeneity.
13.2 Bivariate Categorical Data, Tests for Homogeneity and Independence 667
Gender
n Is there any evidence? Males 346 56 19 15 24
n True category proportions . . . Females 454 66 28 18 16
are different
T r ansl ati o n
Solution
n Statistical inference Step 1 This is a test for homogeneity of populations. There are two populations, males
n Population proportions are and females, and five categories for readmittance ( I 5 2, J 5 5 ) . We would like
different to know whether the true proportions of readmittance categories are the same for
each population.
C onc e pts Step 2 The four parts of the hypothesis test are
n Test for homogeneity of
populations H0: The true readmission category proportions are the same for males and
females.
V isi on
Ha: The true readmission category proportions are not the same.
We need to compare the
TS: X2 5 a a
proportion of the number of
2 ( nij 2 eij ) 2
5
x2 5 a a
2 5( nij 2 eij ) 2
i51 j51 eij
( 346 2 353.17 ) 2 ( 56 2 53.86 ) 2 c ( 16 2 22.34 ) 2
5 1 1 1
353.17 53.86 22.34
5 4.777 ( , 9.4877 )
Step 5 The value of the test statistic does not lie in the rejection region or equivalently,
p 5 0.3109 . 0.05. Figure 13.13 shows the calculation and illustration of the
exact p value. At the a 5 0.05 significance level, there is no evidence to suggest
that the true readmission category proportions are different for males and females.
Figure 13.14 shows a technology solution.
Figure 13.14 JMP chi-square
test results. Try It Now Go to Exercise 13.42
668 Ch a pt e r 1 3 Categorical Data and Frequency Tables
VIDEO Tech
Video TECH Manuals
MANUALS Suppose bivariate data arise from a single random sample in which the values of two
EXEL DISCRIPTIVE categorical variables are recorded for each individual or object. For example, in a random
Chi-square two-way
test sample of home burglaries, the type of item stolen and the method of entry might be
recorded. Or suppose a random sample of employed people is obtained, and the occupa-
tion and any job-related injury is recorded for each person. In each instance, the data are
bivariate, and there are observations on two categorical variables.
Given this type of bivariate data, it seems reasonable to ask whether the values of one
variable affect the values of the other (are the variables dependent?). Or are the two vari-
ables independent? Then, knowing the value of one variable suggests nothing special
about the value of the other. For example, suppose the proportion of all employed people
who suffer from job-related respiratory illnesses is 0.15 for farm workers, 0.12 for
industry and construction workers, 0.03 for service workers, and 0.04 for all other occu-
pations. Now, suppose we know an employed person suffered a job-related broken bone.
Does this change the proportions? If so, then the variables (occupation and job-related
injury) are dependent. If knowing the occupation does not alter the job-related injury
proportions, then the variables are independent.
We use the same notation as in the test for homogeneity. In an I 3 J two-way table,
there are I categories for the first variable (instead of I populations), and J categories for
the second variable. The test for independence is based on observed and expected
counts once again. The test statistic is exactly the same as in the test for homogeneity.
Here’s why.
As in the test for homogeneity, Recall from Chapter 4 that if two events A and B are independent, then the probability
this is actually a test for of A and B is the product of the corresponding probabilities; that is,
dependence. We cannot prove
independence, we can only test
P(A d B) 5 P(A) # P(B) if A and B are independent
for evidence of dependence. Suppose the two categorical variables are independent and an individual or object falls
into the (ij) cell. Consider the following probability.
5 P ( ith value for first variable ) # P ( jth value for second variable )
Independent events; multiply corresponding probabilities.
ni. # n. j Using the notation introduced in this section, the probability of falling into the ith row times the
5
n n probability of falling into the jth column.
Because there is a total of n individuals, the expected count, or frequency, in the (ij) cell is
eij 5 expected count in the ( ij ) cell
sample # probability an individual falls
5a b c d
size into the ( ij ) cell
ni. # n.j
5n#a b
n n
ni. 3 n. j
5 Simplify: cancel an n.
n
( ith row total )( jth column total )
5 Symbol translation.
grand total
This is identical to the expression we used before, so the test statistic is the same
as in the test for homogeneity. The test statistic is again a measure of how far away
the observed cell counts are from the expected cell counts. The formal hypothesis test
follows.
13.2 Bivariate Categorical Data, Tests for Homogeneity and Independence 669
TS: X2 5 a a
I J ( nij 2 eij ) 2 ( ith row total )( jth column total ) ni. 3 n.j
where eij 5 5
i51 j51 eij grand total n
RR: X2 $ x2a,(I21)(J21)
This test is appropriate if all expected cell counts are at least 5 (eij $ 5 for all i
and j).
Data Set
Example 13.4 Rest Stop Preferences
SOLUTION TRAIL
RESTSTOP
The Pilot Travel Plaza in Pennsylvania is located at the intersection of Routes 487 and 80
so that travelers on each road have easy-on/easy-off access in both directions. Research is
being conducted to summarize food preferences and to attract vendors. A random sample
of people who purchased food at this plaza was obtained, and the traveling direction and
the food vendor were recorded for each person. The observed frequencies are given in the
table below. Is there any evidence to suggest that traveling direction and food vendor are
Solution Trail 13.4
dependent? Use a 5 0.01.
K e y w or ds
n Is there any evidence? Food vendor
n Traveling direction and food Aunt Pizza Taco Mrs. Hot Dog
vendor are dependent
Annie’s Hut Bell Fields Company
T r ansl ati o n North 25 30 17 38 56
Traveling
direction
TS: X 5 a a
to find the value of the test ( nij 2 eij ) 2
4 5
2
statistic and draw the appropriate eij
i51 j51
conclusion.
RR: X2 $ x2a,(I21)(J21) 5 x20.01,12 5 26.2170
670 C ha pt e r 1 3 Categorical Data and Frequency Tables
Table 13.6 The two-way table for the travel plaza example, including the row
and column totals, the grand total, and the expected cell counts
Food vendor
Aunt Pizza Taco Mrs. Hot Dog
Annie’s Hut Bell Fields Company Row total
North 25 30 17 38 56 166
(32.38) (26.26) (22.18) (40.03) (45.13)
Traveling direction
South 40 22 25 45 41 173
(33.75) (27.37) (23.12) (41.72) (47.04)
East 34 24 20 43 48 169
(32.97) (26.74) (22.59) (40.76) (45.95)
West 28 27 25 31 32 143
(27.90) (22.63) (19.11) (34.49) (38.88)
Column total 127 103 87 157 177 651
x2 5 a a
4 5 ( nij 2 eij ) 2
i51 j51 eij
212
( 25 2 32.38 ) 2 ( 30 2 26.26 ) 2 c ( 32 2 38.88 ) 2
5 1 1 1
32.38 26.26 38.88
5 14.598 ( , 26.2170 )
Step 5 The value of the test statistic does not lie in the rejection region or equivalently,
0 14.598 p 5 0.2642 . 0.01. Figure 13.15 shows the calculation and illustration of the
Figure 13.15 p-Value illustration: exact p value. At the a 5 0.01 significance level, there is no evidence to suggest
p 5 P(X $ 14.598) that the two categorical variables are dependent.
5 0.2642 . 0.01 5 a Figure 13.16 shows a technology solution.
Technology Corner
Procedure: Chi-square test for homogeneity or independence.
Reconsider: Example 13.4, solution, and interpretations.
CrunchIt!
Use the built-in function Contingency Table; with counts.
1. Enter the row numbers in Var1, the column numbers in Var2, and the corresponding counts in Var3.
2. Choose Statistics; Contingency Table; with counts. Select the Row Variable, the Column Variable, and the Counts
(variable).
3. Click Calculate. CrunchIt! computes and displays a table with counts, row percentages, column percentages, and total
percentages in addition to the value of the test statistic. See Figure 13.17.
TI-84 Plus C
Use the built-in function x2-Test.
1. Enter the observed cell counts into the matrix [A]. See Figure 13.18.
2. Select STAT ; TESTS; x2 -Test. Enter the matrix containing the observed cell counts, [A], and specify a matrix for
the expected frequency, [B]. See Figure 13.19.
3. Highlight Calculate and press ENTER . The expected frequencies are shown in Figure 13.20 and the chi-square test
results are shown in Figure 13.21. The Draw results are shown in Figure 13.22.
Minitab
Use the built-in function Chi-Square Test for Association in the Stat; Tables menu.
1. Enter the observed category frequencies into the columns C1–C5.
2. Select Stat; Tables; Chi-Square for Association.
3. Select Summarized data in a two-way table. Enter the columns containing the table.
4. Click OK. The chi-square hypothesis test results are displayed in a sessions window. See Figure 13.16.
Excel
Use the built-in function CHISQ.TEST.
1. Enter the observed cell counts in a rectangular array (matrix) and the expected cell counts in another rectangular array.
2. Use the function CHISQ.TEST to find the p value associated with the chi-square test and the function CHISQ.INV to
find the value of the test statistic if desired. See Figure 13.23.
13.36 True/False In a test for independence of two categori- 13.39 Find the missing observed cell counts and row
cal variables, we reject the null hypothesis only for large values and column totals in the following two-way frequency
of the test statistic. table.
13.2 Bivariate Categorical Data, Tests for Homogeneity and Independence 673
3 33 26 28 119 Product
Column total 68 60 258 Kandy
Krimpets Cupcakes Kakes Creamies
Grocery store
13.40 Consider the following two-way frequency table with three Giant 90 80 95 92
populations and three values of a categorical variable. ex13.40
Shaw 81 66 87 56
Category
Weis 94 83 92 55
1 2 3
1
Population
13.42 Sports and Leisure Random samples of gamblers at Funding for CBC
four Las Vegas casinos were obtained, and each gambler was Decrease Maintain Increase Unsure
asked which game he or she played most. The results are given
Atlantic 17 58 36 9
in the following two-way frequency table. casino
Region
Quebec 27 167 65 37
Game
Ontario 59 191 79 26
Blackjack Poker Roulette Slots
West 68 212 106 44
Bellagio 22 20 38 66
Caesar’s
Casino
Round
426 360 11
Girls 259 369 254 118
3 441 355 30
Boys 274 335 262 129
4 443 358 26
Is there any evidence that the number of times spent helping
a. Is there any evidence to suggest that draft round is
with homework is different for boys and girls? Use a 5 0.01.
dependent on position? Conduct the appropriate
Find bounds on the p value associated with this test.
hypothesis test with a 5 0.01.
13.47 Psychology and Human Behavior While food-and- b. Interpret your results in part (a). For example, when is a
wine pairing is subjective and an inexact science, traditionally red special teams player most likely to be drafted?
wine goes with red meat, and white wine goes with fish and poultry.
13.50 Medicine and Clinical Studies Some research sug-
A random sample of diners at four-star restaurants was obtained,
gests that college student athletes who are subject to high stress
and each diner was classified according to the food and wine
levels (due to academic demands, personal problems, or other
ordered. Here is the resulting two-way frequency table. pairing
sources) are more likely to be injured during a game. A random
Wine sample of student athletes was obtained, and a questionnaire was
Red White used to determine the stress level of each prior to a game. Follow-
ing the game, the injury status of each athlete was also recorded.
Red meat 86 46
Food
Is there any evidence that food and wine are dependent? Test the Low Medium High
relevant hypothesis with a 5 0.05. Do these data suggest that
Injury
Yes 12 17 25
diners are still following the traditional food-and-wine pairings? No 335 292 288
13.48 Sports and Leisure The Whitewater Ski Resort in
Is there any evidence to suggest that stress level and injury are
British Columbia offers downhill skiing, cross-country skiing,
dependent? Conduct the appropriate hypothesis test with a 5 0.05.
snowboarding, and snow tubing. The manager is planning to
develop a new, targeted advertising campaign. A random sam- 13.51 Public Health and Nutrition The National Advisory
ple of customers was obtained, and each was classified by age Committee on Immunization in Canada provides the public
group and activity. The data are summarized in the following with specific health recommendations and summary reports. A
two-way table. resort random sample of patients who tested positive for influenza
Resort activity during the 2012–2013 flu season was obtained. Each person
was classified by age group and influenza type. The data are
Cross- summarized in the following table.15 flu
Downhill country Snow- Snow
skiing skiing boarding tubing Influenza type
,16 24 35 23 48 A/H1N1 A/H3N2 A unsub B
16–20 ,5 207 849 1592 617
Age group
50 25 40 32
5–19
Age group
Violation familiarity with the UN and the country are dependent? Conduct
Serious, pool Water Policy/ the appropriate hypothesis test with a 5 0.01. united
Country
France 31 261 669 84
Private club 367 1,508 863
Italy 32 547 400 74
Pool type
hypothesis test with a 5 0.01 and find bounds on the p value Country
associated with this test. United States Canada
13.53 Psychology and Human Behavior Most municipali- MD 78 30
ties provide the public with a variety of crime statistics, includ- ND 22 31
ing types of crimes and class to the police. A random sample of
gunshot incidents in Oakland, California, was obtained, and each RN 15 14
Degree
was classified by time of day and day of week. The data are sum- DO 12 5
marized in the two-way frequency table below.17 crime NP 11 4
Time of day DVM 10 18
Overnight Daytime Evening Other 21 20
Mon 17 10 11 Is there any evidence to suggest that the true proportions for type
Tue 32 4 9 of degree held are different for each country? Use a 5 0.01.
Wed 16 7 17 Challenge
Day
Thu 14 7 8
13.56 Marketing and Consumer Behavior In a recent USA
Fri 24 6 17 Today poll, most customers shopping online indicated that shipping
Sat 49 3 12 options are important when making buying decisions.18 There are
Sun often several shipping options for online consumers, including
46 7 15
overnight delivery, 3-day ground service, and free slow shipping.
Suppose that, in a random sample of 500 adults in California, 135
a. Is there any evidence that day of the week and time of day
said they prefer free slow shipping. In a random sample of 600
are dependent? Use a 5 0.05.
adults in New Jersey, 204 indicated they prefer free slow shipping.
b. Are the assumptions for the hypothesis test in part (a)
a. Conduct a hypothesis test concerning two population
met? Why or why not?
proportions to determine whether there is any evidence
c. Use your results in part (a) to suggest the day and time of day
that the proportion of adults who prefer free slow shipping
when gunshot incidents are much different than expected.
is different in California and in New Jersey. Find the
13.54 Public Policy and Political Science The United Nations p value associated with this test.
is almost 70 years old. However, many people in countries all over b. Consider “California’’ and “New Jersey’’ as populations
the world are still unfamiliar with the UN and do not understand its and “prefer’’ and “not preferred’’ as categories. Using the
function. A survey concerning familiarity with the United Nations data given in this problem, construct a two-way frequency
was conducted in six nations. The data are summarized in the table and conduct a test for homogeneity of populations.
two-way frequency table. Is there any evidence to suggest that the Use technology to find the p value for this test.
676 C ha pte r 1 3 Categorical Data and Frequency Tables
c. What is the relationship between the value of the test The survey results are summarized in the following two-way
statistic in part (a) and the value of the test statistic in part frequency table. dental
(b)? How are the p values related? Why do these
relationships make sense? Floss
13.57 Public Health and Nutrition The American Dental 1 2 3 4 5 6 7
Association recently conducted a survey regarding dental hygiene 1 35 37 48 52 55 80 101
habits. A random sample of elderly people was obtained, and each
2 36 38 42 47 54 75 97
person was classified according to brushing and flossing frequency.
The following codes and categories were used for each variable. 3 38 43 46 51 58 77 94
Brush
Code Floss/brush frequency 4 33 51 42 42 51 52 86
5 76 79 85 87 93 102 115
1 Never
2 Once per month 6 81 41 46 78 107 103 116
3 A few times per month 7 98 94 126 136 142 198 252
4 Once per week
5 A few times per week Is there any evidence to suggest that flossing frequency and
6 Once per day brushing frequency are dependent? Conduct the appropriate
7 More than once per day hypothesis test with a 5 0.01. Find the p value associated with
this test.
TS: X2 5 a
k ( ni 2 npi0 ) 2
i51 npi0
RR: X2 $ x2a,k21
This test is appropriate if all expected cell counts are at least 5 ( npi0 $ 5 for all i ) .
Two-Way Frequency, or Contingency, Table
A two-way frequency table is a method for summarizing a bivariate categorical data set.
Each cell contains the number of observations (observed count or frequency) in a cat-
egory combination, or pairing.
Test for Homogeneity of Populations
In an I 3 J two-way frequency table, let nij be the observed count in the (ij) cell and let
eij be the expected count in the (ij) cell. A hypothesis test for homogeneity of populations
with significance level a has the form
H0: The true category proportions are the same for all populations (homogeneity of
populations).
Ha: The true category proportions are not the same for all populations.
Chapter 13 Exercises 677
TS: X2 5 a a
I J ( nij 2 eij ) 2 ( ith row total )( jth column total ) ni. 3 n. j
where eij 5 5
i51 j51 eij grand total n
RR: X2 $ x2a,(I21)(J21)
This test is appropriate if all expected cell counts are at least 5 (eij $ 5 for all i and j).
Test for Independence of Two Categorical Variables
In a random sample of n individuals, suppose the values of two categorical variables are
recorded. In the resulting I 3 J two-way frequency table, let nij be the observed count in
the (ij) cell and let eij be the expected count in the (ij) cell. A hypothesis test for indepen-
dence of the two categorical variables with significance level a has the form
H0: The two variables are independent.
Ha: The two variables are dependent.
TS: X2 5 a a
I J ( nij 2 eij ) 2 ( ith row total )( jth column total ) ni. 3 n. j
where eij 5 5
i51 j51 eij grand total n
RR: X2 $ x2a,(I21)(J21)
This test is appropriate if all expected cell counts are at least 5 (eij $ 5 for all i and j).
Is there any evidence to suggest that the true population proportions 13.61 Public Policy and Political Science A Lubbock
of preferred logistics site is different from 0.20? Use a 5 0.05. County, Texas, government official discovered some extra
money in the budget that must be spent by the end of the fiscal
13.59 Psychology and Human Behavior The Rescue year. A random sample of county residents was obtained, and
Pet Store sells five breeds of dogs, and the owner is trying each was asked how the money should be spent. The data and
to determine whether one breed is preferred over the others. the hypothesized population proportions (from past county
A random sample of recent dog sales was obtained, and referendum votes) are summarized in the following one-way
the number of each breed purchased is given in the frequency table. project
following table. breed
Project Frequency pi0
Dog breed Frequency
Road construction 103 0.3
American bulldog 46 Road resurfacing 119 0.4
Collie 54 Bicycle paths 40 0.1
Golden retriever 32 New sidewalks 35 0.1
German shepherd 30 Park improvements 20 0.1
Yorkshire terrier 46
Is there evidence to suggest that any of the true population
Is there evidence to suggest that the true population proportion proportions are different from the hypothesized proportions?
of sales for any breed is different from 0.20? Use a 5 0.05. Use a 5 0.05.
678 C h a p t e r 1 3 Categorical Data and Frequency Tables
satisfaction
A great deal 109 62 101
Current
tant reason for attending a specific university. The following table
summarizes the results associated with this question from students Some 97 56 44
surveyed at Carleton University and the proportions from the Very little 19 7 5
school’s comparison group.20 CARLETON
a. Is there any evidence to suggest that current satisfaction
Carleton Comparison level is associated with attorney category? Use a 5 0.01.
Response frequency group b. Find the p value associated with the hypothesis test in
part (a).
To prepare for a specific job or career 568 0.43
c. Which attorneys tend to be most satisfied with their job?
To get a good job 455 0.26 Justify your answer.
To increase my knowledge in an 179 0.08
academic field 13.65 Marketing and Consumer Behavior One of the most
common home remodeling jobs involves the kitchen. In almost
To get a good general education 146 0.07
every house, this room is heavily used and often needs to be
To prepare for graduate/professional 130 0.07 expanded to accommodate personal tastes. Three building-
school supply stores were selected, and a random sample of individuals
To develop a broad base of skills 65 0.04 purchasing kitchen countertops was obtained from each. The
To meet parental expectations 32 0.02 type of countertop was recorded for each person, and the data
Other 32 0.02 are summarized in the following two-way frequency table
To meet new friends 16 0.01 (Home Depot, HD; Lowe’s, LO; TrueValue, TV). counters
Countertop
Is there evidence to suggest that any of the Carleton true
Concrete Corian Marble Granite
population proportions is different from the comparison group
Supply store
Household tenure
Company
CF 90 51 56 61
LI 75 82 70 60 Owner occupied Social rented Private rented
PR 69 57 78 69 0 18 18 8
UL 93 92 77 91 1–2 140 89 51
Conduct a test for homogeneity of populations with a 5 0.05. 3–5 473 212 95
Days
Is there any evidence to suggest that the true proportions 6–9 665 171 86
associated with transfer plans are different for any of the 10–15 298 71 19
populations? Justify your answer. 16–25 88 18 8
13.64 Psychology and Human Behavior The Ohio State $ 26 70 12 3
Bar Association regularly surveys the legal community with
Is there any evidence to suggest that the number of days they
regard to the economics of law practice. The resulting report
could survive is associated with household tenure? Use
includes information about attorney demographics, experi-
a 5 0.01. Interpret your results in the context of this problem.
ence, and prevailing hourly billing rates.21 In addition, each
survey participant is asked to indicate job satisfaction and 13.67 Psychology and Human Behavior Some research
attorney category. These data are summarized in the following suggests that music influences how much time people spend in a
two-way table. attorney store. To investigate this theory, a random sample of customers
Chapter 13 Exercises 679
at a Paramount retail store was obtained (over a long period of found at each site was recorded along with the source. The data
time), and each was classified by the amount of time spent are summarized in the following table. litter
shopping (in minutes) and the type of music played during the City
day. The results are given in the following table. music
San Washington,
Type of music Francisco D.C. Oakland
Easy Take-out food 35 60 82
Classical listening Rock Country
Conv. store 40 30 78
, 15 49 35 17 14 Pharmacy 5 6 4
15–30 51 27 19 41
Time
Grocery 60 5 5
30–60 67 41 41 22 Other 7 6 12
$ 60 47 26 26 33
Is there any evidence to suggest that the proportion of discarded
Is there any evidence of an association between music type and paper bags by source is not the same for all cities? Use a 5 0.05.
time spent shopping? Conduct the appropriate hypothesis test
13.71 Marketing and Consumer Behavior The Silicon
with a 5 0.01.
Valley Bank Wine Report suggests that there is a relationship
13.68 Physical Sciences Recent research suggests that the between wine sales and age group.22 A random sample of adults
Titanic luxury liner broke into three sections, causing it to sink purchasing wine was obtained from various states. Each person
faster than was previously believed. The collision with an was classified by generation and retail bottle price. The summary
iceberg was a terrifying event, and some experts believe the data are given in the following two-way table. wine
chance of survival was associated with location aboard the Generation
vessel. The following table presents the class and survival status
of passengers aboard the Titanic. Millennial Gen X Boomers Matures
Titanic
Died Survived
$20–$29 44 132 147 44
First 122 203
$30–$39 22 59 85 36
Second 167 118 $40–$69 12 41 67 26
Class
s
n Recall the parametric methods discussed in previous chapters: statistical techniques based on the
normality assumption.
n Remember that if any assumptions are violated, the conclusions may be invalid.
Looking Forward
s
n Learn several nonparametric, or distribution-free, procedures that require very few assumptions
about the underlying population(s).
n Understand the intuitive reasons for using many nonparametric test statistics.
n Compute and use ranks in several nonparametric test procedures.
Alabama 5.0 16.0 8.5 4.2 7.3 4.1 4.2 7.3 6.9 4.3
Florida 4.2 14.0 7.9 8.3 25.0 8.3 4.1 8.5 8.3 8.3 12.0 4.2
It seems reasonable to conduct a two-sample t test for a difference in population
means. However, it is known that naphthalene concentrations are not normally distrib-
uted. Because the normality assumption is violated, a two-sample t test is not valid.
The methods presented in this chapter allow for comparison of continuous distribu-
tions, with few assumptions necessary. These nonparametric procedures are handy when
very little is known about the underlying distributions. A statistical test based on ranks
can be used to compare the naphthalene concentrations in these two locations.
Contents
14.1 The Sign Test
14.2 The Signed-Rank Test
14.3 The Rank-Sum Test
14.4 The Kruskal-Wallis Test
14.5 The Runs Test
14.6 Spearman’s Rank Correlation
Dan Anderson/EPA/Landov
681
682 C h a pt er 1 4 Nonparametric Statistics
A Closer L ok
~ and the sign test
1. If the underlying (continuous) distribution is symmetric, then m 5 m
can be used to test a hypothesis about a population mean.
2. We really do not need to literally replace observations with plus or minus signs. Dis-
card any observations equal to m~ , and simply count the number of observations greater
0
~
than m0.
3. Because the binomial distribution is discrete, we usually cannot find critical values to
yield an exact level-a test. Use critical values such that the significance level is as
close to a as possible but not greater than a.
T r ansl ati o n Is there any evidence to suggest that the median duration of unemployment is greater than
n One-sided, right-tailed 36 weeks? Use a significance level of a 5 0.10.
hypothesis test concerning a
population median Solution
~ 5 36 ( 5 m
Step 1 The assumed median is m ~ ) , the sample size is n 5 15, and we will
0
C onc epts
use a 5 0.10.
n Sign test concerning a
population median We are looking for any evidence that the median unemployment duration is
greater than 36 weeks. Therefore, the relevant alternative hypothesis is one-
V isi on sided, right-tailed.
We cannot assume anything Because the population is not assumed to be normal and this is a test concerning
about the shape of the underlying
the median, the sign test should be used.
distribution of unemployment
duration. Since the normality Step 2 The four parts of the hypothesis test are
assumption is not justified and ~ 5 36
H0 : m
this is a test concerning the
H:m ~ . 36
median, the (nonparametric) sign a
test is appropriate. TS: X 5 the number of observations greater than 36
RR: X $ c1 5 11
To find the critical value c1:
a. There are no observations equal to 36, the hypothesized median. Therefore,
no values are excluded from the analysis.
684 C h a pte r 1 4 Nonparametric Statistics
Table I in the Appendix, Binomial If the null hypothesis is true, X is a binomial random variable with n 5 15
Distribution Cumulative and p 5 0.5: X , B ( 15, 0.5 ) . We need to find a value c1 such that P ( X $ c1 )
Probabilities, n 5 15 is as close to a 5 0.10 as possible without going over.
p Find the smallest c1 such that P ( X $ c1 ) # 0.10.
x c 0.50 c b. Use the complement rule applied to a discrete random variable to convert this
( ( equation to cumulative probability,
8 0.6964 P ( X $ c1 ) 5 1 2 P ( X , c1 ) 5 1 2 P ( X # c1 2 1 ) # 0.10
9 0.8491
10 c 0.9408
or find the smallest c1 such that P ( X # c1 2 1 ) $ 0.90.
11 0.9824 c. Using Table I in Appendix A, with n 5 15 and p 5 0.5, c1 2 1 5 10. There-
12 0.9963 fore, c1 5 11. The actual significance level using this critical value is
( ( P ( X $ 11 ) 5 1 2 P ( X , 11 ) 5 1 2 P ( X # 10 ) 5 1 2 0.9408 5 0.0592
Step 3 Using signs, classify each observation as either greater than or less than the
hypothesized median.
Observation 35 40 38 37 39 43 49 39 34 41 47 35 45 38 43
Sign 2 1 1 1 1 1 1 1 2 1 1 2 1 1 1
The value of the test statistic is the number of plus signs, or the number of obser-
vations greater than m ~ 5 36. Therefore, x 5 12.
0
Step 4 The value of the test statistic lies in the rejection region. At the a 5 0.10 sig-
nificance level, there is evidence to suggest that the population median unem-
ployment duration is greater than 36 weeks.
Step 5 It is unlikely that a critical value will yield the exact desired significance level,
so it is often more appropriate to find a p value when using the sign test.
p 5 P ( X $ 12 ) Definition of a p value.
(
5 1 2 P X # 11 ) The complement rule; discrete random variable.
This is the nonparametric If the data are paired (two observations on each individual or object) and the underly-
counterpart to a paired t test. ing distributions are not normal, then the sign test can be used to compare population
medians. Compute each pairwise difference, disregard the magnitude of the difference,
and consider only the sign of the difference. Replace each positive difference with a plus
14.1 The Sign Test 685
sign and each negative difference with a minus sign. Use these signs and the test proce-
dure described above. This analysis is appropriate for comparing two population medians,
~ and m
m ~ .
1 2
Subject 1 2 3 4 5 6 7 8 9 10
Before 0.58 0.78 0.76 0.54 0.60 0.73 0.56 0.45 0.59 0.45
After 0.60 0.49 0.41 0.74 0.80 0.77 0.52 0.58 0.69 0.71
Subject 11 12 13 14 15 16 17 18 19 20
Before 0.42 0.62 0.76 0.61 0.67 0.49 0.48 0.56 0.57 0.52
After 0.58 0.70 0.73 0.43 0.52 0.70 0.47 0.44 0.64 0.68
Other research suggests that the underlying DVA score populations are not normal. Use a
sign test to compare the median DVA score before the egg diet with the median DVA
Stockbroker/© MBI/Alamy score after the egg diet. Is there any evidence to suggest that eggs help to improve eye-
sight, as measured by DVA? Use a significance level of 0.01.
Solution
Step 1 The data are certainly paired: There are before and after measurements on each
individual. Typically, the before measurements are considered population 1 and
the after measurements are population 2. The underlying populations are not
normal, so a paired t test is not appropriate.
SOLUTION TRAIL
Solution Trail 14.2 Step 2 The null hypothesis is that the two population medians are equal: The egg-a-day
diet has no effect on visual acuity, or
Ke ywor ds
~ 5m
m ~ 1 m
~ 2m
~ 5m
~ 5 0 ( 5D )
n Sign test 1 2 1 2 D 0
n Median score before, after We are searching for evidence that the egg diet improves visual acuity, so the
n Eggs help to improve eyesight alternative hypothesis is m~ ,m ~ 1 m ~ 2m ~ 5m ~ , 0. This is a one-
1 2 1 2 D
sided, left-tailed test.
T ranslati on
n One-sided, left-tailed Step 3 The four parts of the hypothesis test are
hypothesis test concerning H0 : ~ 50
mD
the difference in population ~ ,0
medians Ha: mD
TS: X 5 the number of differences greater than 0
Concepts
RR: X # c2 5 4
n Sign test to compare two
population medians To find the critical value c2:
Vi sion a. There are no differences equal to D0 5 0, the hypothesized difference. There-
There is a before and after fore, no pairs are excluded from the analysis.
measurement on each individual; If the null hypothesis is true, X has a binomial distribution with n 5 20
the data are paired. Compute and p 5 0.5: X , B ( 20, 0.5 ) . Find the largest value c 2 such that
each difference, and use the sign
P ( X # c2 ) # 0.01 5 a.
test to compare two medians
procedure to find the value of the b. Using Table I in the Appendix, with n 5 20 and p 5 0.5, the critical value
test statistic and draw the is c2 5 4. The actual significance level using this critical value is
appropriate conclusion. P ( X # 4 ) 5 0.0059 ( # 0.01 ) .
Step 4 Compute each pairwise difference ( before 2 after ) and determine the signs.
Step 5 The value of the test statistic is the number of plus signs, or the number of dif-
What would the significance level ferences greater than D0 5 0. Therefore, x 5 8.
be if c2 5 5?
Step 6 The value of the test statistic does not lie in the rejection region. At the a 5 0.01
significance level, there is no evidence to suggest that the egg diet increased the
median DVA score.
Step 7 The p value for this hypothesis test is
Technology Corner
Procedure: Conduct a sign test concerning a population median.
Reconsider: Example 14.2, solution, and interpretations.
CrunchIt!
The Sign Test function is only for comparing two sample medians.
1. Enter the data from sample 1 into column Var1 and the data from sample 2 into column Var2.
2. Select Statistics; Non-parametrics; Sign Test.
3. Using the pull-down menus, select the First Variable and the Second Variable.
4. Under the Hypothesis Test tab, enter the Difference under the null hypothesis (D0) and select the appropriate Alterna-
tive. Click Calculate. The results are displayed in a new window (Figure 14.2).
Minitab
Use the 1-Sample Sign function to conduct a sign test concerning a population median.
1. Enter the unemployment data into column C1.
2. Select Stat; Nonparametrics; 1-Sample Sign. Enter the Variable (column containing the data, C1), select Test median
and enter the hypothesized value, 36, and select the Alternative hypothesis (greater than).
3. Click OK. The test results are displayed in a session window. Refer to Figure 14.1.
Excel
There is no built-in function to conduct a sign test. Use other functions to count the number of observations greater than
the hypothesized median and to compute the p value associated with the test.
1. Enter the data into column A.
2. Count the number of observations greater than 30 using the function COUNTIF.
3. Use the function BINOM.DIST to compute the p value. See Figure 14.3.
14.1 Short Answer A nonparametric test is also called a 8.90 7.68 9.41 9.47 8.98 8.41 9.51 7.40
_______ procedure. 9.66 8.17 5.32 9.06 7.67 7.87 8.16 8.77
14.2 Short Answer Name two advantages to using a non- 7.01 9.22 7.41 8.57
parametric statistical test. ~ 5 125, H : m
~ 2 125, a 5 0.01
d. H0: m a
14.3 True/False In a case where either a nonparametric or a
parametric test can be used, it is usually better to use the para- 196 126 187 168 111 196 164 181 110
metric procedure. 113 187 136 159 154 177 132 103 151
187 191 192 139 187 190 112
14.4 True/False The sign test concerning a population
median is based on the number of positive values (observations 14.11 Using the paired data on the website, conduct a sign
greater than 0) and the number of negative values (observations ~ 2m
test to compare population medians with H0: m ~ 5 0,
1 2
less than 0). ~ ~
Ha: m1 2 m2 . 0, and significance level a 5 0.05. EX 14.11
14.5 Short Answer Under what conditions can the sign test 14.12 Using the paired data on the website, conduct a sign test
be used to test a hypothesis about a population mean? to compare population medians with H0: m~ 2m ~ 5 3,
1 2
~ ~
Ha: m1 2 m2 2 3, and significance level a 5 0.01. Find the
14.6 True/False The sign test is based on the binomial distri-
bution with n trials and probability of a success p 5 0.50. exact p value for this test. EX 14.12
site was recorded for each. Assume that the underlying distribu- The distribution of chlorophyll in surface water is assumed to
tion of mileage is continuous. Use the sign test to determine be continuous but not normal. Is there any evidence to suggest
whether there is any evidence that the median mileage is that the median chlorophyll amount in surface water is different
greater than 100. Use a 5 0.05 for this test and find the exact in April than in August? Use a significance level of a 5 0.05.
p value. MODHOMES
14.20 Travel and Transportation Over the past several
14.16 Manufacturing and Product Development Grade years, Times Square in New York City was made pedestrian-
40s grease mohair must have a fiber diameter of approximately only, and protected bike lanes were added to five avenues in the
24 microns. A manufacturer of mohair jackets received a large middle of Manhattan. In addition, a new bike-share program
shipment of 40s-grade raw material and obtained 30 random has become extremely popular. As a result of these changes,
measurements of fiber diameter. The manufacturer can only motor vehicle traffic is actually moving more smoothly into and
assume that the underlying distribution of fiber diameters is through Manhattan.8 One measure of better traffic flow is the
continuous. Is there any evidence to suggest that the median speed of a taxi ride. After carefully reviewing taxi records, a
diameter is different from 24 microns? Use a significance level random sample of similar trips before and after the changes to
of a 5 0.05. MOHAIR Times Square were obtained. The average speed (in mph) of
each trip was recorded. Use the sign test with a 5 0.05 to
14.17 Sports and Leisure There has been increased atten-
determine whether there is any evidence that the median speed
tion lately to sports-related head injuries. Frontline recently
of a taxi trip is faster after the changes to Times Square. Find
published an article about the NFL’s “concussion crisis,”5 and
the p value associated with this test. TIMESQ
physicians have become more sensitive to recognizing and
treating head trauma in younger children. Recent studies sug-
gest that there may be a sex difference in memory after a Extended Applications
concussion.6 A random sample of female soccer players who
14.21 Biology and Environmental Science Jet-engine
sustained a concussion was obtained. A computer test to mea-
emissions at high altitudes cause vapor trails that contribute to
sure memory skills was administered, and the time (in seconds)
cloud cover and pollution, and may even affect weather pat-
needed to complete the test was recorded for each. Assume that
terns. An experiment was conducted to measure the concentra-
the underlying distribution of time needed to complete this
tion of volatile organic compounds (VOCs) directly behind an
memory test is continuous. Is there any evidence to suggest that
operating engine. A random sample of jet engines was obtained.
the median time to complete this test for women is greater than
The VOC concentration was measured (in mg/m3) before and
40 seconds? Use a 5 0.05. MEMORY
after a special exhaust scrubber was installed. The data are
14.18 Travel and Transportation According to a report given in the following table. JETVOC
from TrueCar.com, the average transaction price for a new
vehicle reached a record high in August 2013 of $31,252.7 A Before 13.0 14.9 14.0 13.4 9.8 14.9 12.0 11.2
transaction price of a vehicle is the out-the-door-price, which After 11.0 13.6 12.3 9.9 10.3 10.7 9.6 9.5
includes the price of the vehicle, discounts, add-ons, taxes, and
license fees. A random sample of new vehicles purchased in Before 14.9 15.0 9.7 13.1 14.4 11.3 14.6
August 2014 was obtained, and the transaction price of each After 11.8 13.1 9.2 10.0 11.9 13.7 12.3
was recorded. Assume that the underlying distribution of new-
vehicle transaction price is continuous. CARPRICE Use the sign test with a 5 0.05 to determine whether there is
a. Is there any evidence to suggest that the median new any evidence that the median VOC concentration is smaller
vehicle transaction price has increased (from ($31,252)? when the scrubber is installed. Find the p value associated with
b. Why is the median, rather than the mean, new-vehicle this test.
transaction price a better measure of the center of this
distribution? 14.22 Medicine and Clinical Studies Calcium blockers
have many effects on a person’s heart rate. They are most often
14.19 Biology and Environmental Science Chlorophyll is used to slow the heart rate in patients with atrial fibrillation.
an important part of photosynthesis in plants and is also present in This class of medications work to decrease the heart pumping
microscopic algae and other phytoplankton. A random sample of strength and, therefore, ease tension in blood vessels.9 Suppose
20 locations along the western coastline of the United States was an experiment was conducted to determine whether calcium
selected. Using fluorometry, the quantity of chlorophyll in the sur- blockers affect heart rhythms in patients with arrhythmias. A
face water was measured (in mg/m3) in early April and in late random sample of patients was obtained, and their resting pulse
August. The pairwise differences (April measurement 2 August rate was measured (in beats per minute). After a two-week regi-
measurement) are given in the following table. CHLORO men of a calcium blocker, each person’s resting pulse rate was
measured again. Suppose no assumptions can be made about
6.24 6.51 210.96 9.52 5.39 5.88 29.03 the shape of the continuous distributions of before and after
pulse rates. CABLOCK
4.58 28.16 6.99 4.92 29.35 4.88 21.99
a. Conduct a sign test to determine whether there is any
4.96 3.90 28.13 7.55 12.61 7.68 evidence that the median pulse rate before the calcium
690 C h a pter 1 4 Nonparametric Statistics
blocker medication is different from the median pulse rate obtained. The survival (in years) after the transplant was recorded
after the medication. Use a significance level of a 5 0.01. for each. Suppose no assumptions can be made about the shape
b. Do you believe the assumptions for the sign test are valid of the underlying continuous distribution. LUNG
in this case? Why or why not? a. Conduct a sign test to determine whether there is any
14.23 Medicine and Clinical Studies The short-term sur- evidence to suggest that the median survival at D’Youville
vival rate following a lung transplant has improved recently. Even is greater than 4.6 years. Use a 5 0.05.
though patients face the risks of organ rejection and infection, b. Conduct a one-sample t test concerning a population
approximately 78% of patients survive after the first year, and the mean to determine whether there is any evidence to
median survival for a single-lung recipient is 4.6 years.10 The suggest that the mean survival at D’Youville is greater
D’Youville Hospital Center in New York City claims to have a than 4.6 years. Use a 5 0.05. How do your results
survival rate higher than the national average. Suppose a random compare with part (a)? Which analysis do you believe is
sample of single-lung transplant patients from D’Youville was more appropriate? Why?
µ̃0 µ̃0
Figure 14.4 If H0 is true, for every positive Figure 14.5 If m ~.m ~ , then there should be
0
difference there should be a negative more positive differences of large magnitude.
difference of about the same magnitude. The sum of the ranks associated with the
positive difference would be large.
14.2 The Signed-Rank Test 691
We assume the underlying population is symmetric (when using the signed-rank test), so
the median is equal to the mean. Therefore, the test procedure above can be used to test a
hypothesis concerning a population mean (when the underlying population is not normal
SOLUTION TRAIL
but is symmetric).
Data Set
SWEETEN Example 14.3 Sweet River Water
Diet foods and beverages contain artificial sweeteners that are not processed by the body.
Consequently, these sweeteners, for example, sucralose, cyclamate, saccharin, and ace-
Solution Trail 14.3 sulfame, flow into water-treatment facilities. Most of these chemicals remain unaffected
by the treatment process and flow into rivers and streams. A recent study concluded that
K e y w or ds
the Grand River in Ontario had the highest concentration of artificial sweeteners in the
n Signed-rank test world.11 Suppose a random sample of Grand River water was obtained from 10 different
n Median concentration greater locations, and the saccharin concentration (in mg/L) was measured for each. The data are
than given in the following table.
T r ansl ati o n
n One-sided, right-tailed
5.2 7.2 3.5 4.7 5.9 3.6 2.5 5.8 4.4 3.3
hypothesis test concerning a Assume the underlying distribution of saccharin concentration is continuous and sym-
population median
metric. Use a signed-rank test to determine whether there is any evidence that the median
C onc epts saccharine concentration is greater than 4 mg/L, with a significance level of 0.05.
n Wilcoxon signed-rank test
Solution
V isi on
Step 1 The distribution of saccharin concentration is assumed to be continuous and
Rank the absolute differences,
symmetric. Because this problem involves the population median, the signed-
and find the value of the test
statistic. Use this value to draw rank test is appropriate. We are looking for evidence that the population median
the appropriate conclusion. is greater than 4; the hypothesis test is one-sided, right-tailed. The number of
observations is n 5 10, (,20 ) ; the test statistic is T 1 .
692 C h a pte r 1 4 Nonparametric Statistics
~54
Step 2 H0: m
Ha : ~.4
m
TS: T 1 5 the sum of the ranks corresponding to the positive differences
xi 2 m~
0
RR: T 1 $ c1 5 44
We need a value for c1 such that P ( T 1 $ c1 ) < 0.05. Using Table IX in the
Appendix, P ( T 1 $ 44 ) 5 0.0527.
Step 3 The following table shows the data, each difference xi 2 4, the absolute value of
each difference 0 xi 2 4 0 , the rank associated with each absolute difference, and
the signed rank. The positive or negative sign from the pairwise difference has
been attached to the rank to create the signed rank. Note that there are no zero
differences. Therefore, no observations are excluded from the analysis.
Observation 5.2 7.2 3.5 4.7 5.9 3.6 2.5 5.8 4.4 3.3
Difference 1.2 3.2 20.5 0.7 1.9 20.4 21.5 1.8 0.4 20.7
Absolute difference 1.2 3.2 0.5 0.7 1.9 0.4 1.5 1.8 0.4 0.7
Rank 6.0 10.0 3.0 4.5 9.0 1.5 7.0 8.0 1.5 4.5
Signed rank 16.0 110.0 23.0 14.5 19.0 21.5 27.0 18.0 11.5 24.5
To assign each rank, consider the ordered list of absolute differences and their
position in the list.
Absolute difference 0.4 0.4 0.5 0.7 0.7 1.2 1.5 1.8 1.9 3.2
Position 1 2 3 4 5 6 7 8 9 10
Rank 1.5 1.5 3.0 4.5 4.5 6.0 7.0 8.0 9.0 10.0
The absolute differences in positions 1 and 2 are equal. Each is assigned the
mean rank, ( 1 1 2 ) /2 5 1.5.
The absolute difference 0.5 is in the third position. It is assigned the rank 3.0.
The absolute differences in positions 4 and 5 are equal. Each is assigned the
mean rank, ( 4 1 5 ) /2 5 4.5.
The absolute difference 1.2 is in the sixth position. It is assigned the rank 6.0.
Similarly, 1.5, 1.8, 1.9, and 3.2 are assigned the ranks 7.0, 8.0, 9.0, and 10.0.
Step 4 The value of the test statistic is the sum of the positive signed ranks.
There are two kinds of differences The signed-rank test may also be used to compare population medians when the data
here: the pairwise differences are are paired and the underlying distributions are not normal. We assume that the distribu-
di 5 xi 2 yi , and the differences tion of the pairwise differences is continuous and symmetric, and that each pair of values
used in the calculation of the test ~ 2m
is independent of all the other pairs. If the null hypothesis is H0: m ~ 5m ~ 5 D , we
1 2 D 0
statistic are di 2 D0 . apply the one-sample test procedure to the pairwise differences.
1. Subtract D0 from each pairwise difference.
That is, compute di 2 D0 ( i 5 1, 2, c, n ) .
2. Rank the absolute differences, 0 di 2 D0 0 ( i 5 1, 2, . . . , n ) .
3. Find the sum of the ranks associated with the positive differences, t 1 .
SOLUTION TRAIL
Example 14.4 Night-Shift Hazards
Data Set A recent study suggests that night-shift workers have a greater risk of heart attacks and
NTWORK strokes.12 The body clock of those people who work evening, night, and rotating shifts
becomes disrupted and may be associated with high blood pressure, high cholesterol, and
diabetes. Suppose a study was conducted to compare the cholesterol level in workers on
the night shift and workers on the traditional day shift in a food-processing facility. A
Solution Trail 14.4 random sample of employees on the night and day shifts were selected and paired accord-
ing to sex, age, and weight. The total cholesterol level was measured (in mg/dL) in each,
K e y w or ds
and the data are given in the following table.
n Signed-rank test
n Median greater than
Night shift 215 185 175 185 211 158 171 199 190 151 154 233
T r ansl ati o n Day shift 203 199 168 177 176 153 173 179 187 165 145 212
n Two-sided, right-tailed
hypothesis test concerning
Night shift 180 202 180 160 195 171 152 210 212 196 188 213
the difference in population
medians Day shift 178 190 174 167 189 156 148 200 197 204 190 224
C onc epts Assume that the underlying distribution of the pairwise differences is continuous and
n Wilcoxon signed-rank test symmetric. Use the Wilcoxon signed-rank test to determine whether there is any evidence
V isi on that the median total cholesterol level is greater in night-shift workers. Use a 5 0.05.
Compute each pairwise
difference and rank the absolute Solution
differences. Find the value of the Step 1 The underlying distributions are not assumed to be normal, but the distribution
test statistic, and draw the of the differences is assumed to be continuous and symmetric. The assumptions
appropriate conclusion. for the Wilcoxon signed-rank test are met. Note that the sign test to compare two
medians could also be used, but the Wilcoxon signed-rank test is a more reliable
test because it uses more of the information in the sample.
We are searching for evidence that night-shift workers have higher median total
cholesterol, so the hypothesis test is one-sided, right-tailed, with D0 5 0.
Let population 1 be the total cholesterol level in night-shift workers and popula-
tion 2 be the total cholesterol level in day-shift workers.
Because D0 5 0, we do not need to modify the pairwise differences. No pair-
wise differences are equal to zero; no pairs are excluded from the analysis.
There are n 5 24 $ 20 observations (pairwise differences), so the normal
approximation will be used. The four parts of the hypothesis test are
~ 2m
H0 : m ~ 5m ~ 5D 50
1 2 D 0
~
H:m 2m 5m ~ ~ .D 50
a 1 2 D 0
T 1 2 mT 1
TS: Z 5
sT 1
RR: Z $ za 5 z0.05 5 1.6449
694 C h a pte r 1 4 Nonparametric Statistics
Step 2 Compute the absolute value of each pairwise difference, and rank these num-
bers. Equal values are assigned the mean rank for their positions. The following
table shows the original data, the pairwise differences, the absolute value of each
pairwise difference, the rank associated with each pairwise difference, and the
signed rank.
To find the p value associated with this test, find the right-tail probability.
p 5 P ( Z $ 2.0587 ) Definition of p value.
Technology Corner
Procedure: Conduct a Wilcoxon signed-rank test.
Reconsider: Example 14.3, solution, and interpretations.
CrunchIt!
Use the Wilcoxon Signed Rank function to conduct a signed-rank test concerning a population median. For paired data,
use Wilcoxon Paired.
1. Enter the data into column Var1.
2. Select Statistics; Non-parametrics; Wilcoxon Signed-Rank.
3. Using the pull down menu, select the Sample column.
~ ) and select the appropriate Alternative.
4. Under the Hypothesis Test tab, enter the Median under null hypothesis ( m0
Click Calculate. The results are displayed in a new window. See Figure 14.8.
Minitab
The built-in Wilcoxon signed-rank test is accessible in the Stat; Nonparametrics menu and may also be executed using
commands in a session window. In the case of paired data, use the differences di 2 D0 ( i 5 1, 2, c, n ) .
1. Enter the data into column C1.
2. Select Stat; Nonparametrics; 1-Sample Wilcoxon. Enter the Variable (column containing the data, C1), select Test
median and enter the hypothesized value, 4, and select the Alternative hypothesis (greater than).
3. Click OK. The test results are displayed in a session window. Refer to Figure 14.6.
696 C h a p t e r 1 4 Nonparametric Statistics
Excel
There is no built-in function to conduct a Wilcoxon signed-rank test. Use other arithmetic functions to compute the sum of
the signed ranks.
1. Enter the data in column A.
2. Compute the differences, and the absolute differences using the function ABS. Use the function RANK.AVG to compute
the appropriate ranks for this test. Use the IF function to compute the signed ranks.
3. Use the SUMIF function to find the value of the test statistic. See Figure 14.9.
14.24 Fill in the Blank The signed-rank test is based on the 21.4 20.3 18.5 18.8 21.5 21.6 20.8 21.9
_____________ and _____________ of each difference. 19.8 19.6 19.4 20.8 19.6 18.7 20.4 20.2
14.25 True/False The test statistic for a signed-rank test has 21.6 19.2 20.1 18.5 18.5 19.7 19.7
a chi-square distribution with n 2 1 degrees of freedom. ~ 5 25.
c. H0: m
14.26 Short Answer What are the assumptions for a signed-
rank test? 22 28 29 28 27 22 22 21 25 24
29 27 29 27 25 25 28 28 23 21
14.27 True/False The signed-rank test can be used to test a
24 26 27 27 22
hypothesis about a population mean.
~ 5 305.4.
d. H0: m
14.28 Short Answer If both the sign test and the signed-
rank test are appropriate, which would you use, and why? 296 303 271 263 288 312 260 305
250 308 315 264 254 258 274 314
Practice 279 267 312 310 309 273 269 293
264 278 255 285 272 307
14.29 A data set and null hypothesis are given in each of the
following problems. Assume a Wilcoxon signed-rank test will 14.30 Use the random sample in each problem to conduct a
be used to test H0. Find (i) the differences, (ii) the absolute Wilcoxon signed-rank test with the indicated null hypothesis,
differences, and (iii) the rank associated with each absolute alternative hypothesis, and significance level. Find the p value
difference. EX 14.29 associated with each test. EX14.30
~ 5 60.
a. H0: m ~ 5 70, H : m
a. H0: m ~ 2 70, a 5 0.10
a
~ 5 0.7, H : m
b. H0: m ~ , 0.7, a 5 0.05 s upporting concrete. A random sample of grout from various
a
locations in an office building project was obtained, and the
0.072 0.348 0.319 0.502 0.733 0.603 0.052 strength of each batch (in psi) was determined using the grout-
0.493 0.721 0.762 0.166 0.965 0.616 0.904 cube test at 28 days. Assume the underlying distribution of
0.882 0.773 0.771 0.506 0.735 grout strength is continuous and symmetric. Use the Wilcoxon
signed-rank test with a 5 0.01 to determine whether there is
~ 5 245, H : m
c. H0: m ~ . 245, a 5 0.02 any evidence that the median grout strength is different from
a
6000 psi. Find the p value for this test. GROUT
250 235 235 231 241 232 232 14.35 Economics and Finance Near the end of 2013, the
240 246 235 238 231 236 232 annual percentage rate (APR) on a typical credit card had risen to
approximately 15.06%.14 Usually the APR on student credit cards
~ 5 450, H : m
d. H0: m ~ 2 450, a 5 0.01
a is slightly lower. To check this claim, a random sample of student
credit cards was obtained and the annual percentage rate was
461 424 436 485 476 457 463 409 424 450 recorded for each. Assume that the distribution of APR on student
406 402 435 457 420 410 402 438 428 450 credit cards is continuous. Use the Wilcoxon signed-rank test with
410 418 423 466 497 492 411 426 a 5 0.01 to determine whether there is any evidence that the
median APR on student credit cards is less then 15.06. CREDIT
14.31 Using the paired data on the text website, conduct a
14.36 Sports and Leisure During the summer of 2013, par-
ilcoxon signed-rank test to compare population medians with
W
H0: m~ 2m ~ 5 0, H : m~ 2m ~ 2 0, and significance level ents spent approximately $856 per child on summer activities.
1 2 a 1 2
a 5 0.02. Find the p value associated with this test. This amount was greater than in 2012, and it was expected to
EX14.31
rise again in the summer of 2014.15 Suppose a random sample
of parents with children not yet in high school was obtained.
Applications Each was asked how much they spent per child for summer
14.32 Medicine and Clinical Studies A normal adult brain 2014 activities. Assume the distribution of amount spent per
oxidizes approximately 120 grams of glucose per day.13 Some child is continuous. Use the Wilcoxon signed-rank test to deter-
researchers believe that the rate of metabolism is higher in mine whether there is any evidence that the median amount
adults whose job requires them to make many decisions daily. A spent per child during the summer of 2014 is greater than $856.
random sample of basketball referees was obtained, and the Use a 5 0.05. SUMMER
brain oxidation rate of each was measured. The data are given 14.37 Economics and Finance Two counties in Pennsylvania
in the following table. BRAIN handle delinquent property-tax payments in different manners.
The first county has very strict regulations that include interest
120 121 119 122 121 122 121 119 121 on the unpaid balance and eventually property seizure. The sec-
122 124 121 123 122 123 119 120 119 ond county uses a more personal touch. Someone calls or visits
the property owner to discuss the overdue bill and, if necessary,
The underlying distribution of oxidation rates is assumed to be a payment schedule is created. A random sample of delinquent
continuous and symmetric. payments in each county was obtained, and the data were paired
a. Use the Wilcoxon signed-rank test to determine whether according to the size of the original property-tax bill. The
there is any evidence that the mean oxidation rate for amount of time (in months) until full payment was made was
basketball referees is greater than 120 grams per day. Use a recorded for each property. Assume the distribution of pairwise
significance level of a 5 0.05. Find the p value for this test. differences is continuous and symmetric. Use the Wilcoxon
b. Why can the signed-rank test be used in this case to test a signed-rank test to determine whether there is any evidence of a
hypothesis concerning the population mean (rather than difference in median collection time due to the method of han-
the population median)? dling delinquent payments. Use a 5 0.10, and find the p value
14.33 Medicine and Clinical Studies A random sample of associated with this test. PROPTAX
patients with a certain kidney disorder was obtained, and the 14.38 Biology and Environmental Science The 154-day
renal blood-flow rate was measured (in L/min) for each. In weights (in pounds) of 16 randomly selected Rambouillet lambs
healthy patients, the median renal blood-flow rate is known on a farm in Nebraska were recorded. The data are given in the
to be 3.00 L/min, and the distribution of flow rates is assumed following table. LAMBS
to be continuous. Use the Wilcoxon signed-rank test to deter-
mine whether there is any evidence that the median renal 118 119 120 115 114 113 119 119
blood-flow rate in patients with this kidney disorder is differ-
118 117 120 116 113 115 118 119
ent from 3.00 L/min. Use a 5 0.01, and find the p value for
this test. RENAL
Assume the underlying distribution of weights is continuous and
14.34 Public Policy and Political Science Regulations for a symmetric. Use the Wilcoxon signed-rank test to determine
new office building specify that the median strength for whether there is any evidence that the median 154-day weight is
n onmetallic, nonshrink grout should be 6000 psi when it is less than 118 pounds. Use a significance level of 0.05.
698 C h a pter 1 4 Nonparametric Statistics
14.39 Public Health and Nutrition CBC News recently b. Based on your conclusion in part (a), which method for
reported that close to 8 million acres of farmland in China is estimating speed do you think the state police should use,
too polluted to use for growing food. There is a high concentra- and why?
tion of heavy metals and other chemicals in the soil, and some
14.41 Physical Sciences In a study of several Chesapeake
of these pollutants have already made it into the food chain
Bay tributaries, a random sample of nontidal freshwater loca-
through rice and other crops. Cadmium is a carcinogenic metal
tions on the Patuxent River was obtained. The total arsenic con-
that can cause kidney damage, is absorbed by rice, and is of
centration was measured at each location (in mg/L), and the
particular concern.16 A random sample of farmland near
data are given in the following table. CHESBAY
Guangzhou was identified, and samples of soil were taken from
each. The concentration of cadmium in each sample was
0.67 0.22 0.56 0.27 0.17 0.57 0.55 0.55
measured. Assume that the distribution of the concentration of
cadmium is continuous. Use the Wilcoxon signed-rank test to 0.53 0.09 0.61 0.45 0.49 0.04 0.44 0.19
determine whether there is any evidence that the median
Suppose the distribution of arsenic concentrations is continuous
cadmium concentration is greater than 1 ppm. Use a 5 0.05
and symmetric and that the safe level of arsenic concentration
and find the p value for this test. CADMIUM
is 0.30 mg/L.
a. Conduct a sign test to determine whether there is any
Extended Applications evidence that the median arsenic concentration in freshwater
14.40 Travel and Transportation A recent study was locations is greater than 0.30 mg/L. Use a 5 0.05.
c onducted to assess two methods to estimate vehicle speed on b. Conduct a signed-rank test to determine whether there is
interstate highways. A random sample of vehicles on Interstate any evidence that the median arsenic concentration in
75 along Alligator Alley in Florida was obtained. The speed of freshwater locations is greater than 0.30 mg/L. Use
each vehicle was measured using a loop detector and an elec- a 5 0.05.
tronic toll transponder. The data (in mph) are given in the fol- c. Compare your conclusions in parts (a) and (b). Which test
lowing table. HWYSPEED
do you think is more accurate? Why?
Loop Toll Loop Toll 14.42 Business and Management There has been con-
Vehicle detector transponder Vehicle detector transponder siderable discussion about dairy policy and the price of milk
in the United States. A recent report compared the Federal
1 66 65 2 66 66 Agriculture Reform and Risk Management Act of 2013
3 70 67 4 82 80 (DSA) and the Dairy Freedom Act (DFA) on the price of
5 74 68 6 70 67 milk (in dollars per cwt). A random sample of months was
7 63 65 8 63 61 selected, and the price of milk was recorded for each.17
Assume the distribution of pairwise differences is continuous
9 80 78 10 71 68
and symmetric. DAIRY
11 82 81 12 66 65 a. Conduct a sign test to determine whether there is any
13 72 68 14 70 68 evidence of a difference in the median milk prices under
15 65 60 16 63 62 the two different pricing policies. Use a 5 0.05.
17 66 65 18 79 79 b. Conduct a signed-rank test to determine whether there is
any evidence of a difference in the median milk prices
Assume that the distribution of pairwise differences is continu- under the two different pricing policies. Use a 5 0.05.
ous and symmetric. c. Compare your conclusions in parts (a) and (b). Which test
a. Use the Wilcoxon signed-rank test with a 5 0.05 to do you think is more accurate? Why?
determine whether there is any evidence that the difference d. If you were a dairy farmer, which pricing policy would
in median speed estimates is different from zero. you prefer? Why?
list (just as in the signed-rank test). Let w be the sum of the ranks associated with obser-
vations from the first sample.
Can you figure out the smallest Suppose, for example, that the sample sizes are equal. If m ~ ,m ~ , then the observa-
1 2
and the largest possible value tions in the first sample will tend to be smaller than the observations in the second sample,
of w? and w will be small. If m ~ .m ~ , then the observations in the first sample will tend to be
1 2
larger than the observations in the second sample, and w will be large. There is also a
normal approximation, which is valid when both n1 and n2 are large.
H: m ~ 2m ~ .D, ~ 2m
m ~ ,D, or m~ 2m ~ 2D
a 1 2 0 1 2 0 1 2 0
Subtract D0 from each observation in the first sample. Combine these differences and the
observations in the second sample, and rank all of these values. Equal values are assigned
the mean rank for their positions.
TS: W 5 the sum of the ranks corresponding to the differences from the first sample
RR: W $ c1, W # c2, W$c or W # n1 ( n1 1 n2 1 1 ) 2 c
The critical values c1, c2, and c are obtained from Table X in the Appendix such that
P ( W $ c1 ) < a, P ( W # c2 ) < a, and P ( W $ c ) < a /2.
The Normal Approximation: As n1 and n2 increase, the statistic W approaches a normal
distribution with
n1 ( n1 1 n2 1 1 ) n1n2 ( n1 1 n2 1 1 )
mW 5 and s2W 5
2 12
W 2 mW
Therefore, the random variable Z 5 has approximately a standard normal
sW
distribution. The normal approximation is good when both n1 and n2 are greater than 8. In
this case, Z is the test statistic and the rejection region is
RR: Z $ za, Z # 2za, or 0 Z 0 $ za / 2
A Closer L ok
Stepped
STEPPED Tutorial
TUTORIALS 1. This general test procedure allows for any hypothesized difference between the two
Wilcoxon
BOX PLOTS rank-sum population medians, D0. Usually, D0 5 0, and the null hypothesis is that the two popu-
test lation medians are equal.
2. If D0 2 0, we subtract this quantity from each observation in the first sample. Intui-
tively, if H0 is true, this shifts sample 1 so that the two samples have the same median.
3. Like other nonparametric tests, this procedure is appropriate when the underlying dis-
tributions are non-normal, or when the data are ranks.
4. Suppose both underlying populations are symmetric. The Wilcoxon rank-sum test can
be used to compare population means, because in each population the mean is equal to
the median. In this case, one might also consider the two-sample t test, discussed in
Section 10.2. However, if the underlying populations are not normal, then the t-test
results are not valid.
Many software packages call this 5. Sometimes, a slightly different test statistic is used in this test procedure. The Mann–
the Mann–Whitney rank-sum test. Whitney U statistic is a function of W and also approaches a normal distribution as n1
and n2 increase. The two statistics lead to similar conclusions.
700 C h a pt er 1 4 Nonparametric Statistics
SOLUTION TRAIL
Example 14.5 Icebreakers
Data Set In December 2013 a scientific expedition ship became stuck in ice near Antarctica. At
ICEBREAK least three icebreakers attempted to reach the stranded ship but had to turn back due to
weather or ice thickness. Several countries around the world maintain a fleet of icebreak-
ers. These ships are designed with a strong hull, powerful engines, and a wide beam.
Solution Trail 14.5 Independent random samples of icebreakers from the United States and Canada were
obtained, and the beam width of each was measured (in meters). The data are given in the
Ke ywor ds following table.
n Rank-sum test
n Difference in population
medians U.S. 19.60 25.45 24.40 25.00 18.30
T ranslati on
Canada 24.38 17.80 19.50 24.45 19.84 19.15
n Two-sided hypothesis test
concerning two population
medians Suppose the underlying distributions of icebreaker beam widths are continuous. Conduct
a Wilcoxon rank-sum test to determine whether there is any evidence to suggest a differ-
Concepts ence in the population icebreaker beam widths. Use a 5 0.05.
n Wilcoxon rank-sum test
Vi sion Solution
Combine the samples, and rank Step 1 The samples are independent, and the populations are assumed to be continuous.
the ordered observations. Find We are searching for any difference in the medians, so D0 5 0.
the value of the test statistic W The sample sizes are n1 5 5 and n2 5 6. Because n1 and n2 are both less than 8,
and draw the appropriate
the test statistic is W.
conclusion.
The four parts of the hypothesis test are
~ 2m
H0 : m ~ 5D 50
1 2 0
H:m ~ 2m ~ 2D 50
a 2 2 0
TS: W 5 the sum of the ranks corresponding to the differences from the first
sample
D0 5 0, so W is the sum of the ranks corresponding to the first (smaller)
sample.
RR: W $ 41 or W # 5 ( 5 1 6 1 1 ) 2 41 5 19
© RIA Novosti/The Image Works
We need a value for c such that P ( W $ c ) < 0.05/2 5 0.025. Using Table X in
the Appendix, P ( W $ 41 ) 5 0.026. Therefore, the actual significance level for
this test is 2 ( 0.026 ) 5 0.052 < 0.05.
Step 2 The following table shows the combined samples in order and the rank associ-
ated with each value. The shaded columns correspond to observations from the
first sample.
Observation 17.80 18.30 19.15 19.50 19.60 19.84 24.38 24.40 24.45 25.00 25.45
Rank 1 2 3 4 5 6 7 8 9 10 11
Step 3 The value of the test statistic is the sum of the ranks corresponding to the obser-
vations from the first sample.
w 5 2 1 5 1 8 1 10 1 11 5 36
The value of the test statistic does not lie in the rejection region. At the a 5 0.05
significance level, there is no evidence to suggest that the population median
beam widths are different.
14.3 The Rank-Sum Test 701
VIDEO Tech
Video TECH Manuals
MANUALS
Figure 14.10 shows a technology solution.
EXEL DISCRIPTIVE
Wilcoxon rank-sum
test
Here is one more example that involves the normal approximation in the Wilcoxon
rank-sum test.
Data Set
Example 14.6 Economic Mobility
OPPINDEX
The Opportunity Index (OI) is a unique measure of the economic mobility in counties and
states. Sixteen variables are used to derive the value of the OI, including ones that mea-
sure jobs and the local economy, education, and community health and civic life. The OI
is often considered a measure of the “American dream.” A higher OI suggests more social
mobility and greater economic security. A random sample of counties in the Northeast
SOLUTION TRAIL
and the West were obtained, and the OI was recorded for each.18 The data are given in the
following table.
Observation 37.3 40.0 41.6 43.0 45.7 45.9 46.2 47.3 48.3
Rank 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
Observation 53.3 54.1 54.8 54.8 54.9 55.1 55.6 56.0 56.2
Rank 10.0 11.0 12.5 12.5 14.0 15.0 16.0 17.0 18.0
Note: Remember that equal values are assigned the mean rank for their
positions.
Step 3 The value w is the sum of the ranks corresponding to the observations from the
first sample.
w 5 6.0 1 11.0 1 12.5 1 12.5 1 17.0 1 18.0 1 21.0 1 22.0
1 23.0 1 24.0 1 25.0 1 26.0 5 218.0
The mean and variance of the random variable W are
n1 ( n1 1 n2 1 1 ) 12 ( 12 1 14 1 1 )
mW 5 5 5 162.0
2 2
n1n2 ( n1 1 n2 1 1 ) ( 12 )( 14 )( 12 1 14 1 1 )
s2W 5 5 5 378.0
12 12
The value of the test statistic is
w 2 mW 218.0 2 162.0
z5 5 5 2.88
sW !378.0
The value of the test statistic lies in the rejection region. Equivalently,
p 5 0.004 # 0.01. The exact p value calculation and illustration is shown in
Figure 14.11. At the a 5 0.01 significance level, there is evidence to suggest
that the population median Opportunity Indices are different.
Figure 14.12 shows a technology solution.
–2.88 0 2.88
Figure 14.11 p-Value illustration: Figure 14.12 Minitab Mann-Whitney test.
p 5 2P ( Z $ 2.88 )
5 0.0040 # 0.01 5 a
Technology Corner
Procedure: Conduct a rank-sum test.
Reconsider: Example 14.6, solution, and interpretations.
CrunchIt!
Use the Mann-Whitney U function to conduct a rank-sum test.
1. Enter the NE data into column Var1 and the West data into column Var2.
2. Select Statistics; Non-parametrics; Mann-Whitney U.
3. Under the Columns tab, use the pull-down menus to select Sample 1 (Var1) and Sample 2 (Var2).
4. Under the Hypothesis Test tab enter the Location shift under the null hypothesis (0) and select the Alternative
(Two-sided). Click Calculate. The results are displayed in a new window. See Figure 14.13.
Minitab
Use the built-in function Mann-Whitney, accessible in the Stat; Nonparametrics menu or by using the appropriate com-
mands in a session window.
1. Enter the NE data into column C1 and the West data into column C2.
2. Select Stat; Nonparametrics; Mann-Whitney. Enter the First Sample column, C1, the Second Sample column, C2, and
select an Alternative hypothesis (not equal).
3. Click OK. The test results are displayed in a session window. Refer to Figure 14.12.
Excel
There is no built-in function to conduct a rank-sum test. Use other arithmetic functions to compute the ranks, the values of
the test statistics, w and z, and the p value associated with this test.
1. Enter the NE data into column A and the West data into column B.
2. Use the RANK.AVG function to compute the appropriate ranks associated with each observation.
3. Sum the ranks associated with the NE observations. Find the mean and standard deviation for W and compute the value
of the test statistic. Use the NORM.S.DIST function to find the p value. See Figure 14.14.
of patients at the Mayo Clinic aged 50 to 80 with and without 14.58 Sports and Leisure Many hockey players use a
type 2 diabetes was obtained. Researchers measured the bone slapshot—a short, quick, powerful swing with the stick—to try
material strength in each person’s tibia using micro indentation and score a goal. The speed of this shot makes it very difficult
testing. The data are given in the following table. DIABETES for a goalie to react and make a save. The fastest slapshot
recorded was 108.8 mph in 2012 by Zdeno Chara.20 A National
With type 2 diabetes
Hockey League (NHL) scout believes that defensemen have
1.09 1.14 1.38 1.17 1.29 1.32 faster slapshots than forwards. To test this claim, independent
random samples from each group of players were obtained.
Without type 2 diabetes Each player took a slapshot from the blue line, and the speed of
1.06 1.78 1.72 1.48 1.68 1.87 1.40 the puck was recorded (in mph) using a radar gun. The data are
given in the following table. SLAPSHOT
Note: The indentation data are unitless, and a larger value indi- Defensemen
cates less bone strength. Use the Wilcoxon rank-sum test with
a 5 0.05 to determine whether there is any evidence that 94 98 97 94 94 94 91 100 98 96
patients with type 2 diabetes have less bone strength. Assume Forwards
that the underlying distributions are continuous.
92 88 86 94 90 85 86 95 89 85
14.56 Technology and the Internet Users surfing the
Internet generally leave web pages very quickly, often within Use the Wilcoxon rank-sum test with a 5 0.01 to determine
10–20 seconds. Some research suggests that the first 10 sec- whether there is any evidence to suggest that the population
onds of a page visit are critical to a decision to read further. A median slapshot speed of NHL defensemen is greater than the
random sample of users visiting similar web pages, one with population median slapshot speed of NHL forwards. Assume
advertisements and one without, was obtained. The time (in the underlying distributions are continuous.
seconds) for each visit was recorded, and the data are given in
14.59 Business and Management Kroll’s South restaurant
the following table. WEBVISIT
serves dairy products in two separate buffet lines. Independent
With advertisements random samples of the dairy products holding temperature (in °F)
in each line were obtained. Use the Wilcoxon rank-sum test to
14.5 10.1 11.9 9.7 11.6 8.2
determine whether there is any evidence to suggest that the popu-
Without advertisements lation median holding temperatures are different. Use a 5 0.05,
and find the p value associated with this test. BUFFET
10.4 12.0 13.2 11.7 10.9 13.2 14.5
14.60 Terrestrial Radio Listeners have several options to
Use the Wilcoxon rank-sum test with a 5 0.05 to determine traditional terrestrial radio, for example, Internet radio, music
whether there is any evidence to suggest that the population services, satellite radio, and individual playlists. Despite the
median visit times are different. Assume the underlying distri- loss of listeners, traditional terrestrial radio stations still run
butions are continuous. lots of commercials. A list of traditional music and talk radio
stations was obtained, and a random hour during the day was
14.57 Manufacturing and Product Development Better selected for each. The total time for commercials during the
World Technologies claims to have produced a revolutionary hour (in minutes) was recorded for each. RADIO
new jackhammer with less vibration, reduced noise and energy a. Use the Wilcoxon rank-sum test to determine whether
input, fewer moving parts, and greater impact strength. Inde- there is any evidence to suggest that the population
pendent random samples of conventional jackhammers and new median commercial time per hour is greater on talk radio
jackhammers were obtained. The impact strength of each, using stations than on music stations. Use a 5 0.05 and find the
heavy-duty concrete breakers, was measured (in ft-lb) at 75 psi. p value associated with this test.
The data are given in the following table. IMPACT
b. Suppose the underlying distributions are approximately
Conventional jackhammer normal. Use a two-sample t test assuming equal variances
to determine whether there is any evidence to suggest that
95 94 102 100 93 93 93 the population mean commercial time per hours is greater
on talk radio stations than on music stations. Use
New jackhammer
a 5 0.05 and find the p value associated with this test.
119 130 99 126 114 130 101 c. Which test do you think is more appropriate? Why?
d. Summarize your results—which type of station plays
a. Use the Wilcoxon rank-sum test with a 5 0.01 to
more commercials?
determine whether there is any evidence to suggest that
the population median impact strength is higher for the
Extended Applications
new jackhammer than for the conventional jackhammer.
Assume the underlying distributions are continuous. 14.61 Public Health and Nutrition Independent random
b. Find the p value for this hypothesis test. samples of almonds from two suppliers in the United
706 C h a pt er 1 4 Nonparametric Statistics
g d 2 3(n 1 1)
12 R2i
TS: H 5 c
n(n 1 1) ni
Critical values for the Kruskal-Wallis test statistic are available. However, if H0 is true
and either
1. k 5 3, ni $ 6, ( i 5 1, 2, 3 ) or
2. k . 3, ni . 5, ( i 5 1, 2, . . . , k )
The chi-square distribution was then H has an approximate chi-square distribution with k 2 1 degrees of freedom.
defined in Section 8.5.
RR: H $ x2a,k21
A Closer L ok
If the sample sizes are small, then 1. We reject the null hypothesis only for large values of the test statistic H. If the sample
the distribution of H may not be sizes are large enough, the rejection region is always in the right tail of the appropriate
close enough to chi-square to chi-square distribution.
make a reliable conclusion. 2. If we reject the null hypothesis, there is evidence to suggest that at least two popula-
tions are different. Further analysis is needed to determine which pairs of populations
are different, and how they differ; the means, medians, variances, shapes of the distri-
butions, or other characteristics could be dissimilar.
Airborne Federal
Express (1) UPS (2) Express (3)
834 245 1617
2954 600 1538
2845 915 1298
1889 998 1580
2006 284 1968
1318 325 1526
1675 558 1002
1959 493
SOLUTION TRAIL
Solution Trail 14.7 The underlying populations are assumed to be continuous. Use the Kruskal-Wallis test to
determine whether there is any evidence that the package shipping distance populations
Ke ywor ds are different. Use a significance level of a 5 0.05.
n Kruskal-Wallis test
n Is there any evidence? Solution
n Populations are different Step 1 There are k 5 3 independent random samples of sizes n1 5 8 $ 6, n2 5 8 $ 6,
and n3 5 7 $ 6, so the Kruskal-Wallis test statistic has an approximately chi-
T ranslati on
square distribution with k 2 1 5 3 2 1 5 2 degrees of freedom.
n Hypothesis test concerning the
general populations Step 2 The four parts of the hypothesis test are
g d 2 3(n 1 1)
n
12 R2i
Vi sion TS: H 5 c
n ( n 1 1 ) ni
The underlying distributions are
assumed to be continuous. Use RR: H $ x2a,k21 5 x20.05,2 5 5.9915
the Kruskal-Wallis test
Step 3 There are n 5 n1 1 n2 1 n3 5 8 1 8 1 7 5 23 total observations. The fol-
procedure; combine and rank all
of the observations. Find the
lowing table shows all 23 observations, sorted in ascending order, with the
value of the test statistic, and associated rank. Each sample is color-coded to provide a visual comparison of
draw the appropriate conclusion. the ranks.
Step 4 If the populations are identical, we expect the ranks to be evenly distributed
among the three samples. The shaded cells suggest that there are more low
ranks associated with sample 2, middle ranks associated with sample 3, and
high ranks associated with sample 1; this implies that the populations are
different.
All the observations, the associated ranks, and the rank sums are given in the
following table.
Airborne Federal
Express (1) Rank UPS (2) Rank Express (3) Rank
834 7 245 1 1617 16
2954 23 600 6 1538 14
2845 22 915 8 1298 11
1889 18 998 9 1580 15
2006 21 284 2 1968 20
1318 12 325 3 1526 13
1675 17 558 5 1002 10
1959 19 493 4
Rank sum 139 38 99
14.4 The Kruskal-Wallis Test 709
g d 2 3(n 1 1)
22
12 R2i
h5 c
n ( n 1 1 ) ni
12 1392 382 992
5 c a 1 1 b d 2 3 ( 24 ) 5 14.8645 ( $ 5.9915 )
( 23 )( 24 ) 8 8 7
0 14.8645
Figure 14.15 p-Value illustration: The value of the test statistic ( h 5 14.8645 ) lies in the rejection region. Equiva-
p 5 P( X $ 14.8645) lently, p 5 0.0006 # 0.05. The exact p value calculation and illustration is
5 0.0006 # 0.05 5 a shown in Figure 14.15. At the a 5 0.05 significance level, there is evidence to
suggest that the package shipping distance populations are different.
VIDEO Tech
Video TECH Manuals
MANUALS Figures 14.16 and 14.17 show technology solutions.
EXEL DISCRIPTIVE test
Kruskal-Wallis
Figure 14.16 Minitab Kruskal-Wallis Figure 14.17 JMP Kruskal-Wallis test results.
test results.
Technology Corner
Procedure: Conduct a Kruskal-Wallis test.
Reconsider: Example 14.7, solution, and interpretations.
CrunchIt!
Use Statistics; non-parametrics; Kruskal-Wallis.
1. Enter the data into columns Var1, Var2, and Var3.
2. Select Statistics; Non-parametrics; Kruskal-Wallis.
3. Check the columns containing the data. Click Calculate. The results are displayed in a new window. See Figure 14.18.
Figure 14.18 CrunchIt!
Kruskal-Wallis test results.
710 C h a p t e r 1 4 Nonparametric Statistics
Minitab
The Kruskal-Wallis test is accessible in the Stat; Nonparametrics menu and may also be executed using commands in a
session window. The data from all samples must be in a single column, and the corresponding group numbers (or names)
must be in a second column.
1. Enter the data from each sample into a separate column (C1, C2, and C3).
2. Use the command Data; Stack to place the data from all three samples into column C4 and to store the subscripts
(group numbers) in column C5.
3. Select Stat; Nonparametrics; Kruskal-Wallis. Enter the Response column, C4, and the Factor column, C5.
4. Click OK. The test results are displayed in a session window. Refer to Figure 14.16.
Excel
There is no built-in function to conduct a Kruskal-Wallis test. Use other arithmetic functions to compute the ranks, the
value of the test statistic h, and the p value.
1. Enter the data from each sample into a separate column (A, B, and C).
2. Compute the ranks using the function RANK.AVG.
3. Compute the value of the test statistic h. Find the p value using the function CHISQ.DIST.RT. See Figure 14.19.
populations are different? Use a 5 0.05, and find bounds on is evidence to suggest that at least two populations are
the p value associated with this test. different.
b. Which pair(s) of populations do you think is (are)
14.80 Sports and Leisure The New York City Marathon is
different? Why?
one of the largest in the world, with over 50,000 finishers in 2013.
The annual race takes runners through the five boroughs of New 14.82 Physical Sciences In 2013 there were 1087 superno-
York City over 26.219 miles. A random sample of the top 100 fin- vae reported by various observation posts around the world.
ishers from five countries in 2013 was obtained, and the finish When possible, these exploding stars were classified by their
time for each was recorded (in minutes).23 Assume the underlying light curves and absorption lines. A random sample of these
distributions of finish times are continuous. MARATHON Type I and Type II supernovae were obtained, and the magni-
a. Is there any evidence to suggest that the populations are tude of each was recorded.24 SUPERNOV
different? Use a 5 0.01. a. Assume the underlying distributions are continuous. Use
b. Using the ranks, which of these countries has the fastest the Kruskal-Wallis test with a 5 0.01 to determine if
runners? Why? there is any evidence that the populations are different.
Find the p value associated with this test.
Extended Applications b. Use the normal approximation in the Wilcoxon rank-sum
test to determine whether there is any evidence to suggest
14.81 Business and Management One indication of the population median magnitudes are different. Use
morale in county government positions is the length of service a 5 0.01. Find the p value associated with this test.
in the current position. Independent random samples of county c. What is the relationship between the p values in parts (a)
employees were obtained, and the length of service (in years) and (b)?
was recorded for each person, by job classification. MORALE d. Consider the value of the test statistics found in parts (a)
a. Assume the underlying distributions are continuous. Use and (b). Compare z2 and h. How are these values related?
the Kruskal-Wallis test with a 5 0.05 to show that there
O O B B B B O B B O O O O B B
The grouped subsequences of similar observations, or symbols, are called runs. In the
sequence of observations above, there are six runs. The test for randomness is based on
the total number of runs.
Definition
A run is a series, or subsequence, of one or more identical observations.
14.5 The Runs Test 713
A Closer L ok
1. The runs test is appropriate for testing whether a sequence of observations is not ran-
dom. It can be used if the data can be divided into two mutually exclusive categories,
for example, defective or satisfactory, working or retired, pass or fail. This test may
also be used for quantitative data that can be classified into one of two categories, for
example, above or below the median, or dangerous versus acceptable temperature.
Note: A run can have length 1. 2. The smallest possible number of runs in any sample is 1. This will occur if every observa-
tion in the sample falls into the same category or has the same attribute. The largest pos-
sible number of runs in a sample depends on the number of observations in each category.
3. The statistical test is based on the total number of runs. It seems reasonable that if the
order of observations is random, then the total number of runs should not be very large
or very small. There is a table of critical values that uses the exact distribution for the
number of runs. There is also a normal approximation that can be used if the number
of observations in each category is large.
S S E E S E S S E E E E E S
Use the runs test with a 5 0.05 to determine whether there is any evidence that the order
in which the sample was selected was not random.
SOLUTION TRAIL
For v1 5 11: P(V $ 11) 5 0.0629, There are a total of seven runs. The value of the test statistic, v 5 7, does not lie
which leads to a much higher in the rejection region. At the a 5 0.05 significance level, there is no evidence
significance level. to suggest that the order of observations was not random.
Figure 14.20 shows a technology solution.
The original data in the following example are quantitative but can be classified into
two mutually exclusive categories. The number of observations in each category is large,
so the normal approximation will be used in the runs test.
22.8 95.6 106.3 20.1 23.2 76.8 18.0 61.0 151.5 29.5
24.2 95.6 18.9 68.1 118.3 130.2 21.5 69.2 24.5 38.5
102.7 23.3 72.0 39.2 83.5 22.7 134.3 97.0 17.9 33.0
SOLUTION TRAIL
Solution Trail 14.9 Suppose the residential threshold value for delivery of propane is 25 pounds. Is there
any evidence to suggest that the order of observations is not random with respect to this
K e y w or ds threshold value? Use a 5 0.05.
n Threshold value is 25 pounds
n Is there any evidence? Solution
n Order of observations is not Step 1 Replace each observation above the threshold value with an A and each observa-
random tion below the threshold value with a B. Any observation equal to 25.0 will be
n m 5 19, n 5 11 excluded from the analysis and the sample size will be reduced. Each observa-
tion now falls into one of two mutually exclusive categories: above or below the
Tr ansl at i o n
delivery threshold.
n Hypothesis test to determine
whether there is evidence the Step 2 The runs test will be used to determine whether there is evidence to suggest that
sequence of observations is the sequence of observations is not random. There are m 5 19 observations
not random with respect to the above 25.0 and n 5 11 observations below 25.0.
threshold value Step 3 Because both m and n are greater than 10, the normal approximation will be
n m and n greater than 10 used. The four parts of the hypothesis test are
C onc e p ts H0: The sequence of observations is random.
n Runs test
Ha: The sequence of observations is not random.
n Normal approximation
V 2 mV
V isi on
TS: Z 5
sV
Use the runs test procedure:
RR: 0 Z 0 $ za/2 5 z0.025 5 1.96
compute the total number of
runs, find the value of the test Step 4 Compute the number of runs using the original sequence of observations.
statistic, and draw the
appropriate conclusion. 22.8 95.6 106.3 20.1 23.2 76.8 18.0 61.0 151.5 29.5
B A A B B A B A A A
24.2 95.6 18.9 68.1 118.3 130.2 21.5 69.2 24.5 38.5
B A B A A A B A B A
102.7 23.3 72.0 39.2 83.5 22.7 134.3 97.0 17.9 33.0
A B A A A B A A B A
There are x 5 20 runs.
Step 5 The mean and the variance of the random variable V are
2mn 2 ( 19 )( 11 )
mV 5 115 1 1 5 14.9333
m1n 19 1 11
2mn ( 2mn 2 m 2 n ) 2 ( 19 )( 11 ) 3 2 ( 19 )( 11 ) 2 19 2 11 4
s2V 5 2
5 5 6.2139
(m 1 n) (m 1 n 2 1) ( 19 1 11 ) 2 ( 19 1 11 2 1 )
The value of the test statistic is
v 2 mV 20 2 14.9333
z5 5 5 2.0326
sV "6.2139
Step 6 Because 0 z 0 5 0 2.0326 0 5 2.0326 $ 1.96, the value of the test statistic lies in
the rejection region. Equivalently, p 5 0.0420 # 0.05. At the a 5 0.05 signifi-
cance level, there is evidence to suggest that the order of the observations is not
random.
To find the p value associated with this test, find the right-tail probability and
multiply by 2.
p/2 5 P ( Z $ 2.0326 ) Definition of p value.
The exact p value is illustrated in Figure 14.21. Figure 14.22 shows a technology
solution.
–2.0326 0 2.0326
Figure 14.21 p-Value illustration: Figure 14.22 Minitab runs test.
p 5 2 P(Z $ 2.0326)
5 0.0420 # 0.05 5 a
Technology Corner
Procedure: Conduct a runs test.
Reconsider: Example 14.9, solution, and interpretations.
Minitab
The runs test is accessible in the Stat; Nonparametrics menus and may also be executed using commands in a session
window. Categorical data must be converted to numerical values.
1. Enter the data into column C1.
2. Select Stat; Nonparametrics; Runs Test. Enter the Variable, the column containing the data, C1.
3. Select Above and below, and enter 25.
4. Click OK. The results are displayed in a session window. See Figure 14.22.
c. m 5 6, n 5 6, a 5 0.05 Use the runs test with a < 0.05 to determine whether there is
d. m 5 8, n 5 9, a 5 0.01 any evidence to suggest that the order of observations is not
14.91 In each of the following problems, use the population random with respect to sex.
median to classify each observation in the ordered sample as 14.96 Manufacturing and Product Development Follow-
either above or below the median. Find the number of runs in ing Hurricane Katrina, the Make It Right New Orleans charitable
each sequence of observations. Note: The order of observations organization helped to rebuild homes in the Ninth Ward. Recently,
is left to right, then down. EX14.91 homeowners have complained that the wood used in construction
a. m~ 5 20.0
cannot withstand the New Orleans humidity and is rotting and
22.4 15.1 22.0 25.6 19.0 25.1 18.7 21.2 growing fungus. Make It Right has blamed the company that sup-
plied the lumber, which was guaranteed for 40 years.27 A sample
11.5 19.6
of rebuilt homes in the Ninth Ward was obtained, and each was
~ 5 4.00
b. m carefully inspected for good (G) or bad (B) wood. The ordered
observations are given in the following table. MAKEITRT
4.14 3.26 3.23 3.12 4.80 5.52 5.81 5.71
3.77 3.68 3.79 3.03 4.36 5.29 G G G B G G G G G
~ 5 150
c. m G G G G B G G G G
115 169 139 101 102 138 123 195 107 Is there any evidence to suggest that the order of observations is
175 178 151 181 107 110 174 196 167 not random? Use the runs test with a < 0.05.
~ 5 0.44
d. m 14.97 Marketing and Consumer Behavior Staples sells
photo paper in either glossy (G) or matte (M) finish. A sample of
0.07 0.19 0.10 0.12 0.44 0.08 0.26 0.03 online customers who purchased photo paper was obtained, and
0.10 0.14 0.41 0.20 0.01 0.38 0.28 0.03 each purchase was classified according to finish. The ordered
0.15 0.03 0.00 0.04 0.19 0.04 observations are given in the following table. PHOTOPPR
14.93 A sample was obtained, and each observation was classi- Is there any evidence to suggest that the order of observations is
fied into one of two mutually exclusive categories. The sequence not random? Use the runs test with the normal approximation
of observations is given in the following table. EX14.93 and a 0.02 level of significance.
14.100 What’s in Your Wallet? Perhaps due to the increase determine whether there is any evidence that the order of
in use of debit and credit cards, only 20% of Americans always GMAT scores is not random. GMAT
carry cash.30 A sample of shoppers at the Quaker Bridge Mall
14.103 Medicine and Clinical Studies Brigham and Wom-
in Lawrence, New Jersey was obtained and asked if they were
en’s Hospital reports that approximately 20% of American males
carrying cash. The sequence of responses is given in the follow-
and 10% of American females will develop a kidney stone at
ing table [cash (C) and no cash (N)]. WALLET
some time in their life.32 Some research suggests that drinking lots
of sugar-sweetened soda may lead to an increased risk of develop-
N C N C N C C C N C N C C C
ing a kidney stone. A sample of patients who were admitted to
Is there any evidence to suggest that the order of observations is Brigham and Women’s Hospital for a kidney stone was obtained.
not random? Use the runs test with a < 0.05. Low-dose computed tomography was used to measure the size
(in ml) of each stone, and the ordered observations were recorded.
14.101 Manufacturing and Product Development Use the runs test with the normal approximation and a 5 0.05 to
Composite decks are made of recycled wood and plastic, are determine whether there is any evidence that the order of kidney
low-maintenance and sturdy, and are designed to last much stone sizes is not random with respect to the median size, 3.0 ml.
longer than traditional wood decks. A manufacturer of compos- Find the p value associated with this test. STONES
ite decks routinely checks the surface wear as part of quality
control. A sample of planks was obtained, and each was
Challenge
measured for surface wear using a Taber tester. The ordered
observations (in Taber units) are given in the following table. 14.104 Random Number Generators The purpose of this
exercise is to determine whether a random number generator
204 211 188 208 190 203 203 193 204 really produces a random sequence of observations. Using your
214 176 209 graphing calculator or favorite statistical software:
a. Generate 100 observations from a standard normal
The planks are manufactured to have a median wear index of distribution.
200 units. Use the runs test with a < 0.02 to determine b. Classify each observation as either above or below the
whether there is any evidence that the order of observations is mean (and median) 0.
not random with respect to the median. DECKS c. Conduct a runs test using the normal approximation to
14.102 Education and Child Development The Kellogg determine whether there is any evidence to suggest that
School of Management at Northwestern University has no min- the sequence of observations is not random with respect
imum GMAT score to be accepted into the MBA program. to the mean. Use a 5 0.05.
However, they reported that the average GMAT for the class of Do this 100 times. How many times did you reject the null
2013 was 715.31 A sample of students in this MBA program hypothesis H0: The sequence is random? Draw a conclusion
was obtained, and the GMAT score for each was recorded. Use about the sequence of observations produced by your random
the runs test with the normal approximation and a 5 0.05 to number generator.
A Closer L ok
1. As usual, equal values within each variable are assigned the mean rank of their posi-
tions in the ordered list. Equation 14.1 is not exact when there are tied observations
within either variable. In this case, one should compute rS by finding the sample cor-
The sample correlation coefficient relation coefficient between the ranks.
was defined in Section 12.2. 2. Because rS is really a sample correlation coefficient, the value is always between 21
and 11. Values near 21 indicate a strong negative linear relationship, and values near
11 suggest a strong positive linear relationship between the ranks.
3. Remember, correlation does not imply causation; rS is a measure of the linear associa-
tion between the ranks. And a strong linear relationship between the ranks does not
imply that the relationship between the original variables is also linear.
Month 1 2 3 4 5 6 7 8 9 10 11 12
Miles 212.74 226.30 227.70 227.90 233.66 233.28 261.62 179.54 263.06 256.37 186.83 205.98
Unemp rate 9.7 7.6 3.8 3.9 5.8 4.9 4.8 6.7 4.7 6.6 6.5 7.1
Solution
Step 1 Let x represent the miles driven per month and let y represent the unemployment
rate. For each variable, order the observations from smallest to largest and assign
a rank to each value. Compute the difference between each pair of ranks.
The following table shows each observation, its associated rank, and each
difference.
Step 2 There are n 5 12 pairs and no ties within either variable. Use Equation 14.1 to
compute Spearman’s rank correlation coefficient.
6 a d2i
rS 5 1 2 Equation 14.1.
n ( n2 2 1 )
6 3 ( 28 ) 2 1 ( 26 ) 2 1 c1 ( 27 ) 2 4
512 Use the di’s.
12 ( 122 2 1 )
2640
512 5 20.5385
1716
Step 3 Because rS 5 20.5385, there is a moderate negative linear relationship between
the miles traveled ranks and the unemployment ranks. This suggests that there is
a moderate negative correlation between miles traveled and unemployment rate;
high unemployment rates are associated with fewer miles traveled.
Figures 14.23 and 14.24 show technology solutions.
Technology Corner
Procedure: Compute Spearman’s rank correlation coefficient.
Reconsider: Example 14.10, solution, and interpretations.
Minitab
Use the Correlation function with the appropriate Method to compute Spearman’s rank correlation coefficient.
1. Enter the miles traveled data into column C1 and the unemployment rate data into column C2.
2. Select Stat; Basic Statistics; Correlation. Enter the Variables: the columns containing the data. Select Spearman rho as
the Method.
3. Click OK. Spearman’s rank correlation coefficient is displayed in a session window. See Figure 14.23.
Excel
There is no built-in function to compute rS. However, rank each sample separately, and find the sample correlation coef-
ficient between the ranks.
1. Enter the miles traveled data into column A and the unemployment rate data into column B.
2. Rank the observations in A and store the results in column C. Rank the observations in B and store the results in
column D.
3. Use the function CORREL to find the sample correlation coefficient between the ranks, columns and C and D. See
Figure 14.25.
14.6 Spearman’s Rank Correlation 721
x 5.62 5.04 4.65 6.19 14.114 Biology and Environmental Science Farmers con-
sider heavy molasses to be a high-energy food for cattle. In a
y 214.9 214.4 215.6 215.2
recent study, the percentage of digestible energy (x) was com-
c. pared with the percentage of protein (y) in heavy molasses.
x 25.51 25.80 24.10 24.19 25.52 26.16
Random samples of various brands of molasses were obtained.
y 24.45 26.98 25.72 27.20 28.56 25.69 The percentage of digestible energy and of protein were care-
fully measured. Compute Spearman’s rank correlation coeffi-
x 24.79 23.95 25.14 25.09 25.35 26.64 cient. Use this value to describe the relationship between the
y 27.38 25.84 24.04 25.87 25.51 26.94 percentage of digestible energy and the percentage of protein
in heavy molasses. MOLASS
x 25.63 23.84 14.115 Biology and Environmental Science The bulk den-
y 26.64 26.53 sity of soil (x, in g/cm3) affects plant growth, specifically, how
easily roots can penetrate the soil. A group of carrot growers
d. x y x y x y conducted a study to compare soil bulk density with soil texture
418 754 411 755 414 776 (y), measured on a scale from 1 to 10, with 1 being very sandy
484 725 378 759 453 781 soil and 10 corresponding to clay. A random sample of plots
was obtained, and the data were recorded. Compute Spearman’s
448 712 425 749 478 745
rank correlation coefficient, and use this value to describe the
458 718 436 733 465 785 relationship between these two variables. SOIL
476 756 443 727 438 775
14.116 Biology and Environmental Science A recent
421 772 442 716 459 735
study examined the relationship between measures of egg albu-
480 728 435 763 490 751 men height and pH.34 A random sample of eggs from Brown
461 753 440 752 407 744 Leghorn hens was obtained, and the albumen height (x, in mm)
and pH was measured for each. Compute Spearman’s rank
14.112 Consider the following data obtained in a study of the
correlation coefficient. What does this value suggest about the
relationship between two variables. EX14.112
relationship between albumen height and pH? LEGHORN
x y x y
14.117 Nature Deficit Only a few generations ago, children
1.5 7.3 1.4 7.4 played much more outside, participated in spontaneous play,
1.8 6.8 1.3 7.2 communicated face to face, and learned how to compromise.
1.5 7.4 1.2 7.2 Today, in fear, many parents are reluctant to let their child play
outdoors alone. As a result, many children have very little con-
1.6 7.5 1.4 7.3
tact with nature. Some research suggests that children who play
1.7 6.7 1.2 6.6 regularly in natural environments tend to be healthier. A ran-
a. Rank the observations in each sample separately, and find dom sample of middle school children was obtained, and the
each difference between the ranks, di. number of hours per week each child played outside and the
b. Find the sample correlation coefficient between the ranks. number of days of school missed due to illness was recorded.
c. Find Spearman’s rank correlation coefficient using the The data are given in the following table. OUTDOOR
differences, di.
d. Explain why the values in parts (b) and (c) are different. Hours 3.6 6.3 5.0 7.9 8.7 4.9 6.2 10.6 6.8 8.2
Days 15 3 6 0 0 11 2 5 3 14
Applications
14.113 Travel and Transportation Dynasty Travel offers Compute Spearman’s rank correlation coefficient. Does this
cruises to the Caribbean and often discounts unsold tickets as value support the theory that healthier children spend more
the sailing date approaches. These last-minute deals were stud- time outside? Justify your answer.
ied by comparing the price of an inside stateroom (x, in dollars)
14.118 Biology and Environmental Science In January
with the number of days before sailing (y). A random sample of
2014 a polar vortex caused extreme weather conditions across
last-minute cruise deals was obtained, and the data are given in
Canada and much of the United States. Travel was treacherous,
the following table.
and countless flights were canceled at various airports. The
severe cold at Pearson International Airport in Toronto caused at
x 998 956 763 1313 1071 1091
least 200 flights to be delayed, and additional police were needed
y 9 3 4 13 10 6 to deal with irate passengers. A random sample of passengers in
Chapter 14 Summary 723
the airport was obtained, and the number of days delayed and The dramatic increase in these self-portraits and the use of the
the systolic blood pressure of each was recorded. vortex word made “selfie” the Oxford Dictionary 2013 Word of the
a. Compute Spearman’s rank correlation coefficient. Year. Despite their popularity, a recent study suggests that those
b. Use the value of rS to describe the relationship between who share more self-portraits experience less intimacy with
days delayed and systolic blood pressure, which is related others.35 A random sample of social media users was obtained
to stress. and asked for the number of selfies they had shared within the
past month. In addition, each person took the Intimacy Attitude
Extended Applications Scale (IAS) test, which measures a person’s willingness to
share feelings, fears, and concerns. SELFIES
14.119 Sports and Leisure The owner of a snow tubing a. Compute Spearman’s rank correlation coefficient.
resort in the Pocono Mountains believes that the total snowfall b. Lower values on the IAS test suggest the person is less
(x, in feet) during a winter is related to the number of customers inclined to be close and personal. Explain the meaning of
(y, in thousands). A random sample of winters was obtained, rS in the context of this problem.
and the data were recorded. WINTER
a. Construct a scatter plot for these data, and describe the Challenge
relationship between the two variables.
14.122 Fuel Consumption and Cars Suppose there are n
b. Compute Spearman’s rank correlation coefficient, and use
pairs of observations and rS is the population correlation
this value to describe the relationship between the two
between ranks. The four parts of a hypothesis test concerning
variables.
rS based on a normal approximation are
c. Explain any differences in your answers to parts (a) and (b).
H0: rS 5 0 ( no population correlation between ranks )
14.120 Physical Sciences A certain solder joint on the
undercarriage of a city bus is being tested for shear strength. Ha: rS . 0, rS , 0, or rS 2 0
A sample of solder joints was randomly selected, and the shear TS: Z 5 RS !n 2 1
strength for each was measured (in N) before (x) and after (y)
RS is Spearman’s rank correlation coefficient.
temperature-cycle stress. The data are given in the following
table. SOLDER
RR: Z $ za, Z # 2za, or 0 Z 0 $ za / 2
A turbocharger is a compact way to add more power to an auto-
x 71 69 66 69 62 67 69 64 73 69 mobile, by increasing the amount of air going into the cylin-
y 64 56 65 62 65 64 57 63 57 63 ders. A study was conducted to examine the relationship
between the added pressure (x, in psi) and power ( y, percentage
a. Rank the observations in each sample separately, and find
change in horsepower) provided by a certain turbocharger. A
each difference between the ranks, di.
variety of automobiles was selected, and a randomly selected
b. Find the sample correlation coefficient between the ranks.
turbocharger was installed in each. The additional air pressure
c. Find Spearman’s rank correlation coefficient using the
and power measurements were recorded. TRBOCHG
di’s.
a. Compute Spearman’s rank correlation coefficient, and use
d. Explain why the values in parts (b) and (c) are different,
this value to describe the relationship between added air
and use these results to explain the relationship between
pressure and change in engine power.
the two variables.
b. Conduct a hypothesis test to determine whether there is
14.121 Too Many Selfies Advanced smartphones and easy any evidence that the true population correlation between
uploads to social media sites have led to an increase in “selfies.” ranks is greater than 0. Use a 5 0.01.
Sign test 682 A nonparametric test concerning a population median, based on the number
~ . It can also be used to compare two population
of observations greater than m 0
medians, based on the number of pairwise differences greater than D0.
Wilcoxon signed-rank test 691 A nonparametric test concerning a population median, based on the sum of
the ranks corresponding to the positive differences xi 2 m~ . It can also be
0
used to compare two population medians, based on the sum of the ranks cor-
responding to the positive differences di 2 D0.
Wilcoxon rank-sum test 699 A nonparametric test concerning the difference of two population medians,
based on the sum of the ranks corresponding to the differences xi 2 D0.
724 C h a pter 1 4 Nonparametric Statistics
Kruskal-Wallis test 707 A nonparametric test to determine whether at least two of k samples are from
different populations, based on the ranks of all observations combined.
Run 712 A series, or sub-sequence, or one or more identical observations.
Runs test 713 A nonparametric test to determine whether there is evidence that a sequence
of observations is not random, based on the total number of runs.
Spearman’s rank 718 A nonparametric alternative to the sample correlation coefficient. Each
correlation coefficient observation is converted to a rank, and the correlation coefficient is computed
using the ranks in place of the actual observations.
recommend a personal umbrella policy for most customers, even 14.129 Swim at your own risk Recent research suggests
if they already have a home and an automobile policy. Basic that exposure to disinfection by-products (DBPs) in swimming
liability coverage may not be adequate as lawsuits become more pools may increase the risk of some cancers.38 A study was
common and jury awards escalate. A random sample of adults conducted to determine whether exposure to DBPs in pool
with umbrella policies was obtained, and the amount of coverage water increased toxic biomarkers. A random sample of adults
was recorded for each (in millions of dollars). Assume the was obtained and the concentration of four trihalomethanes
underlying distribution of umbrella coverage amounts is (THMs, in m/ m3) in exhaled breath was measured before and
continuous. Use the sign test with a 5 0.05 to determine one hour after swimming. Assume that the distribution of
whether there is any evidence that the median coverage amount pairwise differences is continuous and symmetric. Use the
is different from $2.00 million. POLICY
Wilcoxon signed-rank test with a 5 0.05 to determine whether
14.126 The Met Several years ago the Metropolitan Opera there is any evidence that the median THM concentration is
in New York City began live-to-cinema opera transmissions in greater after swimming. DBP
an effort to attract a greater audience. The broadcasts are now 14.130 Travel and Transportation A random sample of
seen in 64 countries, and according to the new general manager, Georgia drivers renewing their vehicle registration was obtained.
the Met is reaching younger people.37 A random sample of Using motor vehicle records, the number of miles driven in the
people who attended a performance at the Metropolitan Opera past year was recorded for each. The data (in thousands of
was obtained and the age (in years) of each was recorded. The miles) were classified by whether the driver had an organ donor
data are given in the following table. OPERA
card and are given in the following table. DRIVERS
Assume the underlying distributions are continuous. Company Sample size Rank sum
a. Is there any evidence to suggest that there is a difference
Brunswick n1 5 12 r1 5 263.0
in the absorbed radiation by machine? Use the Wilcoxon
rank-sum test with a 5 0.10. AMF n2 5 14 r2 5 185.0
b. Find the p value associated with this test. Olhausen n3 5 18 r3 5 542.0
14.132 Manufacturing and Product Development The Assume the underlying weight populations are continuous. Use
pressure for brewing espresso is necessary to help form cream the Kruskal-Wallis test with a 5 0.05 to determine whether
and to distinguish this drink from strong drip coffee. Random there is any evidence to suggest that at least two slate weight
samples of espressos from each of two commercial machines populations are different.
were obtained, and the pressure (in atmospheres) while brewing
each cup was recorded. Assume the underlying distributions are 14.136 Physical Sciences Amateur radio operators (hams)
continuous. Is there any evidence to suggest that the espresso communicate with one another all over the world, and many
setting on each machine produces a different median pressure? help coordinate relief efforts during a natural disaster. Unlike
Use the Wilcoxon rank-sum test with a 5 0.05, and find the p the 5-watt power limit on a CB radio, ham radios can have as
value associated with this test. ESPRESS much as 1500 watts of power. A random sample of amateur
radio operators was obtained from four states. The power on
14.133 Technology and the Internet Intel and Test each transmitter was measured (in watts), and the summary
Devices recently began shipping a smart baby body suit called statistics are given in the following table.
Mimo.39 This baby onesie has an Internet connection that
allows parents to monitor an infant at all times. A random State Sample size Rank sum
sample of infants (less than 1 year old) was obtained, and each New Hampshire n1 5 15 r1 5 611.0
was fitted with a Mimo. The suit was used to monitor sleeping
Alabama n2 5 16 r2 5 605.5
respiratory rates (breaths per minute), and each child was
Texas n3 5 18 r3 5 609.0
classified by position: on the back or on the stomach. The data
are given in the following table. ONESIE California n4 5 20 r4 5 589.5
14.135 Manufacturing and Product Development The 87 81 78 37 42 50 52 49
slate in a pool table may be one piece, but because it is prone 50 102 59 56 55 52 53 46
to fracturing during transporting, it is often split into three 82 109 74 84 60 49 52 20
slabs. Slate slabs from three pool-table manufacturers were 106 31 84 88 54 60 56 54
randomly selected. The weight of each slab (in kg) was
recorded, and the summary statistics are given in the follow- Assume the underlying distributions of crude steel production
ing table. per month are continuous. Use the Kruskal-Wallis test with
Chapter 14 Exercises 727
or no exam (N). The sequence of observations is given in the 14.145 Biology and Environmental Science A recent
following table. EXAM
study investigated the possibility of using moving cars to
measure the amount and intensity of rainfall, rather than
E E N E E E N E E N E N E E N traditional rain gauges.44 A random sample of automobiles was
N E N N N E E E N E E E N E E selected, and each was equipped with an automatic wiper
N N E N E N E N E N system. Each car was subject to 10–15 minutes of rain in a
controlled laboratory setting. The wiper frequency (x, w/min)
Is there any evidence to suggest that the order of observations is and the rain intensity ( y, mm/h) were recorded for each.
not random? Use the runs test with the normal approximation Compute Spearman’s rank correlation coefficient. Use this
and a 5 0.01. Find the p value associated with this test. value to explain the relationship between wiper frequency and
rain intensity and whether you believe wiper frequency might
14.142 Public Health and Nutrition The head of the
be used to predict rainfall total. RAIN
Ontario Medical Association recently suggested that companies
should stop requiring a sick note from a doctor when an 14.146 Medicine and Clinical Studies The traditional
employee is ill and stays home.43 The practice of requiring a method for people to test their blood glucose level is to prick a
note is often demeaning and discouraging, and sick employees finger with a short needle, place a drop of blood on a test strip,
should stay home. A sample of Canadian companies was and then use a special measuring device. The process can be
obtained, and each was classified as requiring a sick note (R) or painful. A group of scientists has developed a noninvasive,
not requiring any justification (N). The sequence of observa- painless approach to measuring glucose level using infrared
tions is given in the following table. OUTSICK laser light. A random sample of healthy adults was obtained.
728 C h a pte r 1 4 Nonparametric Statistics
The glucose level was measured in each using the traditional siding, a sample of consecutive customers was obtained, and
method and the laser light approach (both measurements in mil- each was classified by the exterior finish chosen. The ordered
ligrams per deciliter). Assume the underlying distribution of observations are given in the following table. SIDING
pairwise differences is continuous and symmetric. GLUCOSE
a. Use the Wilcoxon signed-rank test to determine whether B V V V V V B V V V V B V B B
there is any evidence that the median glucose levels are
different for the two procedures. Use a 5 0.01. a. Conduct a runs test with a < 0.05 to determine whether
b. Using your answer in part (a), do you believe the new there is any evidence to suggest that the order of
approach is accurate and should be recommended for observations is not random with respect to exterior finish.
people with diabetes who test their glucose level b. Using this sample, do you believe the advertising
regularly? Justify your answer. campaign was successful? Why or why not?
14.150 Psychology and Human Behavior The manager of
concessions at Fenway Park believes that the number of runs
EXTENDED APPLICATIONS
scored by the Boston Red Sox (x) is related to the number of
14.147 Manufacturing and Product Development Most Fenway Franks consumed by fans (y). A random sample of
home air conditioners built before 2010 used Freon gas to nine-inning games was obtained. The hot dog and run totals
provide cooling in a typical evaporation cycle. However, Freon, were recorded. FRANKS
which contains chlorine, was phased out of new equipment by a. Construct a scatter plot for these data, and describe the
2010 and will not be used at all by 2020. R-410A, a mixture of relationship between the two variables.
difluoromethane and pentafluoroethane, is now the most b. Rank the observations in each sample separately, and find
common refrigerant in the United States. Although an air each difference between the ranks, di.
conditioner is, theoretically, a closed system, units usually lose c. Find the sample correlation coefficient between the ranks.
refrigerant and must be recharged every few years. A random d. Find Spearman’s rank correlation coefficient using the di’s.
sample of 10,000-BTU air conditioners was obtained, and the e. Explain why the values in parts (c) and (d) are different.
amount of refrigerant (in pounds) in each was carefully mea- f. Suppose the manager of concessions would like to sell as
sured. Each unit was recharged by a technician, and the amount many hot dogs as possible. How may runs would he or
of refrigerant in each was measured after the service. Suppose she like the Red Sox to score? Why?
no assumptions can be made about the shape of the continuous
14.151 Sports and Leisure In 2012 the NCAA changed a
distributions of before and after refrigerant weights. REFRIG
rule concerning football kickoffs. Beginning in 2012, kickoffs
a. Conduct a sign test to determine whether there is any
took place at the 35-yard line rather than the 30-yard line.
evidence that the median refrigerant weight before service
There has been some concern that this rule change would affect
is less than the median refrigerant weight after service.
the return yards on each kickoff. A random sample of kickoff
Use a significance level of a 5 0.01.
returns from the 2011 and 2013 college football season was
b. Based on your results, do you believe recharging an air
obtained, and the yards were recorded for each.45 Assume the
conditioner really results in more refrigerant in the
underlying distributions are continuous. KICKOFF
system? Why or why not?
a. Use a Wilcoxon rank-sum test to determine whether there
14.148 Manufacturing and Product Development A nail is any evidence to suggest a difference in the population
gun is a handy tool, especially if you need to install a wood median return yards. Use a 5 0.05. Explain this result in
floor, replace a roof, attach a Venetian blind to a metal support, the context of the rule change.
or if you hit your thumb a lot with a hammer. These devices b. Use a two-sample t test to determine whether there is any
propel a nail at incredible speeds and can save time and energy. evidence to suggest a difference in the population mean
A random sample of nail guns from four companies was return yards. Assume equal population variances and use
obtained, and the speed of the nail was measured for each gun a 5 0.05. Explain this result in the context of the rule
(in feet per second, fps). Assume the underlying speed distribu- change.
tions are continuous. NAILGUN c. Which test do you believe is more appropriate? Why?
a. Use the Kruskal-Wallis test with a 5 0.05 to determine d. Suppose more return yards per kickoff leads to more
whether there is any evidence that at least two of the exciting games. Do you believe the rule change has made
nail-gun speed population distributions are different. the games more exciting? Justify your answer.
b. Find bounds on the p value associated with this test.
14.152 Boarding Pain The process of boarding a flight in
c. Suppose you would like to purchase the brand of nail gun
any airport can be slow and agonizing. Passengers with priority
that fires a nail at the highest speed. Which brand would
boarding often crowd the gate area, and passengers with
you choose, and why?
carry-on bags jostle for position in line. Because faster
14.149 Marketing and Consumer Behavior A home- boarding time saves airlines money and eases anxiety, airlines
building company offers a variety of styles in only two exterior are experimenting with more efficient ways to seat passengers
finishes: vinyl siding (V) or brick (B). Immediately following and prepare for departure. Consider the following boarding
an advertising campaign explaining the advantages of vinyl methods.46 BOARD
Chapter 14 Exercises 729
N-1
N-2 Notes and Data Sources
7. Canadian Lawyer Annual Corporate Counsel Survey, 2 6. U.S. NRC Strategic Plan, Fiscal Years 2008–2013.
accessed February 3, 2013, http://www.canadianlawyermag. 27. Adeveloper, accessed February 17, 2013, http://www.ade-
com/images/stories/Surveys/2012/corpcnslsurvey% veloper.com/CPweather/Current_Vantage_Pro_Plus.htm.
20-%20low%20res.pdf.
28. Blue Nile, accessed February 17, 2013, http://www.
8. 2012 Think Tanks and Civil Societies Program, International bluenile.com/diamond-search?track5NavEngLoo.
Relations Program, University of Pennsylvania, 1/28/13.
29. ESPN, accessed February 17, 2013, http://espn.go.com/
9. Based on information from NYU Rudin Center Survey, nhl/statistics/player/_/stat/major-penalties/sort/penalty
November 26, 2012, http://wagner.nyu.edu/blog/ Minutes.
rudincenter/commuting-after-hurricane-sandy-survey-
30. University of Hawaii Sea Level Center, accessed Febru-
results.
ary 17, 2013, http://uhslc.soest.hawaii.edu/data/rqd.
10. National Highway Traffic Safety Administration,
31. Harris Interactive, accessed February 18, 2013, Ellen
accessed February 4, 2013, http://www-fars.nhtsa.dot.
Dances Her Way to the Top and Is America’s Favorite TV
gov/QueryTool/QuerySection/Report.aspx.
Personality.
11. Colorado Avalanche Information Center, accessed Febru-
32. Construction Noise Impact Assessment, Biological Assess-
ary 5, 2013, https://avalanche.state.co.us/acc/acc_stats.php.
ment Preparation, Advanced Training Manual Version
12. 2011 Texas Hunting Incidents Analysis, Texas Parks and 02-2012, Washington State Department of Transportation.
Wildlife, accessed February 5, 2013, http://www.tpwd.
33. PwC, About Paying Taxes 2013, accessed February 18,
state.tx.us/publications/pwdpubs/media/pwd_rp_
2013, http://www.pwc.com/gx/en/ptaxes/about-paying-
k0700_1124_2011.pdf.
taxes.jhtml.
13. Statistics Canada, accessed February 5, 2013, http://
34. Energy.Gov, accessed February 18, 2013, http://energy.
www.statcan.gc.ca/tables-tableaux/sum-som/101/cst01/
gov/energysaver/articles/tips-heating-and-cooling.
legal50c-eng.htm.
14. Waterfall Database, accessed February 5, 2013, http:// Chapter 3
www.worldwaterfalldatabase.com/tallest-waterfalls/ 1. Canadian Pacific, Key Metrics, accessed October 28,
total-height. 2012, http://www.cpr.ca/en/invest-in-cp/key-metrics/
15. Exit Information Guide, accessed February 5, 2013, Pages/default.aspx.
http://www.i95exitguide.com/lodging/index.php. 2. Denali National Park and Preserve, National Park Ser-
16. Mayo Clinic, accessed February 5, 2013, http://www. vice, accessed October 28, 2012, http://www.nps.gov/
mayoclinic.com/health/cholesterol-levels/CL00001. dena/planyourvisit/upload/Weather-2011-obs.pdf.
17. The Engineering Toolbox, accessed February 11, 2013, 3. D. Baker, “Why Americans Should Work Less—The
http://www.engineeringtoolbox.com/light-level-rooms- Way Germans Do,” The Guardian, July 3, 2012.
d_708.html. 4. Wheat Production by Country in 1000 MT, Index Mundi,
18. debt.org, accessed February 11, 2013, http://www.debt. http://www.indexmundi.com/agriculture/?commodity=
org/medical/emergency-room-urgent-care-costs. wheat accessed October 30, 2012.
19. Port of Tacoma, accessed February 11, 2013 http://www. 5. Steam Game Stats, accessed October 30, 2012, http://
portoftacoma.com/Page.aspx?cid5497. store.steampowered.com/stats.
20. Great Pumpkin Commonwealth, accessed February 11, 6. CO2Now.org, accessed October 31, 2012, http://co2now.
2013, http://greatpumpkincommonwealth.com. org/Current-CO2/co2-now/noaa-mauna-loa-co2-data.html.
21. Naples-Fort Myers PDF Files Library, accessed Febru- 7. Department of Marine Resources, State of Maine,
ary 11, 2013, http://www.rosnet2000.com/rx_asp/rx_ accessed November 1, 2012, http://www.maine.gov/dmr/
trackpdf.asp?atid5NF. commercialfishing/historicaldata.htm.
22. Science Daily, accessed February 11, 2013, http://www. 8. ESPN MLB, accessed November 1, 2012, http://espn.
sciencedaily.com/releases/2012/09/120911151835.htm. go.com/mlb/stats/pitching/_/seasontype/2/league/al/
23. Results from the 2011 Maine Sea Scallop Survey, accessed order/false.
February 11, 2013, http://www.maine.gov/dmr/rm/scallops/ 9. National Oceanographic Data Center, accessed Novem-
2011ScallopSurveyReport.pdf. ber 1, 2012, http://www.nodc.noaa.gov/dsdt/cwtg/satl.htm.
24. Cache Metals, accessed February 12, 2013, http://www. 10. “Cooperative Learning and Statistics Instruction,” The
cachemetals.com/charts/historical-silver-chart. Journal of Statistics Education, Vol. 5, No. 3 (1997).
25. Michigan Traffic Crash Facts, accessed February 12, 11. Official London 2012 Web site, accessed November
2013, http://publications.michigantrafficcrashfacts. 2, 2012, http://www.london2012.com/diving/event/
org/2011/2011MTCF_vol1.pdf. women-10m/index.html.
Notes and Data Sources N-3
12. “Residents Claim Area Airplane Noise Has Ramped 31. Run Silent Dog Sled Trips, accessed November 24, 2012,
Up,” Post City Magazine, August 2012. http://www.runsilent.com.
13. International Skating Union, accessed November 4, 32. Flightstats, accessed November 27, 2012, http://www.
2012, http://www.isu.org. flightstats.com.
14. U.S. Environmental Protection Agency, accessed Novem- 33. InflationData.com.
ber 4, 2012, http://www.epa.gov/otaq/crttst.htm. 34. US Vending, accessed November 28, 2012, http://www.
15. Infoplease, accessed November 5, 2012, http://www. usvending.com.
infoplease.com/us/government/presidential-pardons- 35. Unisys, accessed November 28, 2012, http://weather.
1789-present.html. unisys.com/hurricane/atlantic/2012.
16. National Oceanic and Atmospheric Administration, 36. Science Daily, accessed November 28, 2012, http://www.
accessed November 5, 2012, http://www.swpc.noaa.gov/ sciencedaily.com/releases/2012/09/120911151835.htm.
ftpdir/lists/ace2/201211_ace_swepam_1h.txt.
37. Matrix of Mnemosyne, accessed November 28, 2012,
17. U.S. Army Corps of Engineers, accessed November 5, http://www.matrixbookstore.biz/trial_jury.htm.
2012, http://www.mvn.usace.army.mil/eng/edhd/wcontrol/
38. Wisconsin Department of Revenue, accessed November
miss.asp.
28, 2012, http://www.revenue.wi.gov/delqlist/nmallA.
18. National Interagency Fire Center, accessed November 6, htm.
2012, http://www.nifc.gov/f ireInfo/f ireInfo_stats_
39. WSOP, accessed November 30, 2012, http://www.
YTD2012.html.
angelfire.com/trek/proutsy.
19. Nextag, accessed November 6, 2012, http://www.nextag.
40. General Electric Company, accessed November 30,
com/12-mp-digital-camera/products-html.
2012, http://www.ge-energy.com/wind.
20. Bureau of Transportation, accessed November 6, 2012,
41. TeensHealth, accessed November 30, 2012, http://
http://www.bts.gov/current_topics/national_and_state_
kidshealth.org/teen/food_fitness/nutrition/caffeine.html#.
bridge_data/html/bridges_by_state.html.
42. statista, accessed November 30, 2012, http://www.
21. El Dorado Weather, accessed November 6, 2012, http://
statista.com/statistics/208146/number-of-subscribers-
www.eldoradocountyweather.com/buoy/Chesapeake%
of-world-of-warcraft.
20Bay/buoy-xhtml.php.
43. U.S. Geological Survey.
22. CBC News, accessed November 7, 2012, http://www.
cbc.ca/news/5canada/edmonton/story/2012/08/07/ 44. Gasland, accessed November 30, 2012, http://www.
alberta-highways-driving-charges-long-weekend.html. gaslandthemovie.com/whats-fracking.
23. Guinea Pig Manual, accessed November 8, 2012, http:// 45. Bloomberg BusinessWeek, accessed November 30,
www.guineapigmanual.com. 2012, http://images.businessweek.com/ss/09/10/1021_
americas_25_top_selling_candies/26.htm.
24. Minnesota Department of Natural Resources, accessed
November 8, 2012, http://www.dnr.state.mn.us/faq/ 46. Playbill, accessed November 30, 2012, http://www.
mnfacts/state_parks.html. playbill.com/celebritybuzz/article/75222-Long-Runs-
on-Broadway.
25. About.com, accessed November 8, 2012, http://brooklyn.
about.com/od/brooklynbridge/f/How-Long-Does-It-
Take-To-Walk-Across-The-Brooklyn-Bridge.htm. Chapter 4
26. AirlineReporter, accessed November 13, 2012, http:// 1. About.com, Contests and Sweepstakes, Rare McDonald’s
www.airlinereporter.com/2012/01/how-long-does-it- Monopoly Game Pieces for 2012, accessed February
take-to-build-a-boeing-777. 19, 2013, http://contests.about.com/b/2012/09/27/
27. ArkansasOnline, accessed November 13, 2012, http:// rare-mcdonalds-monopoly-game-pieces-for-2012.htm.
www.arkansasonline.com/extras/databases/2010- 2. CBC News, accessed February 21, 2013, http://www.
2011ITBS/?appSession=516336608108138. cbc.ca/news/canada/manitoba/story/2013/02/21/mb-
28. TruckersReport, accessed November 13, 2013, http:// police-overtime-costs-tickets-winnipeg.html.
www.thetruckersreport.com/facts-about-trucks. 3. Verizon Wireless, accessed February 21, 2013, http://
29. Rubik’s Cube World Records, accessed November 13, www.verizonwireless.com/wcms/consumer/explore/
2013, http://www.recordholders.org/en/list/rubik.html. choosing-a-plan.html.
30. G. Chiva-Blanch et al., “Dealcoholized Red Wine 4. Centers for Disease Control and Prevention, accessed
Decreases Systolic and Diastolic Blood Pressure and February 24, 2013, http://www.cdc.gov/flu/weekly.
Increases Plasma Nitric Oxide,” Circulation Research, 5. Statistics Bureau, accessed February 25, 2013, http://
Vol. 111, pp. 959–961, 2012. www.stat.go.jp/english/data/roudou/154.htm#TAB.
N-4 Notes and Data Sources
6. World Health Organization, Immunization Profile— 23. Philadelphia Injury Attorney Blog, accessed March 2,
Canada, accessed February 25, 2013, http://apps.who. 2013, http://www.philadelphiainjuryattorneyblog.com/
int/immunization_monitoring/en/globalsummary/ bb_gun_accidents.
countryprofileresult.cfm?c=can. 24. Alka-Seltzer Plus, accessed March 2, 2013, http://www.
7. USA Business Review, accessed February 25, 2013, alkaseltzerplus.com/asp/coldflufacts.html.
http://www.businessreviewusa.com/business_leaders/ 25. CNN Money, accessed March 3, 2013, http://money.cnn.
top-ten-us-convention-centers. com/2012/11/27/pf/cyber-monday-sales/index.html.
8. Infodocket, accessed February 25, 2013, http://www. 26. Child Trends Databank, accessed March 3, 2013, http://
infodocket.com/2012/08/15/fast-facts-comscore-offers- www.childtrendsdatabank.org/?q=node/71.
a-look-at-the-todays-tablet-consumer. 27. Centers for Disease Control and Prevention, accessed
9. APPA, accessed February 25, 2013, http://www. March 3, 2013, http://www.cdc.gov/vaccines/stats-surv/
americanpetproducts.org/press_industrytrends.asp. nis/data/tables_2011.htm.
10. Discovery Fit and Health, accessed February 25, 2013, 28. Traffic Injury Research Foundation, Winter Tires: A
http://health.howstuffworks.com/human-body/systems/ Review of Research on Effectiveness and Use, accessed
circulatory/question593.htm. March 3, 2013, http://www.tirf.ca/publications/PDF_
11. Based on information from the Centers for Disease Con- publications/2012_Winter_Tire_Report_7.pdf.
trol and Prevention, accessed February 25, 2013, http:// 29. Business Pundit, accessed March 4, 2013, http://www.
www.cdc.gov/nchs/fastats/ervisits.htm. businesspundit.com/5-jobs-with-the-highest-fatality-
12. Zagat, accessed February 28, 2013, http://www.zagat.com. rates.
13. Catalyst, accessed February 28, 2013, http://www. 30. The Garden of Eaden, accessed March 4, 2013, http://
catalyst.org/knowledge/working-parents. gardenofeaden.blogspot.com/2012/05/what-is-most-
poisonous-snake-in-india.html.
14. U.S. Census Bureau, accessed February 28, 2013, http://
31. The Globe and Mail, accessed March 4, 2013, http://
www.census.gov/compendia/statab/cats/population/
www.theglobeandmail.com/report-on-business/careers/
marital_status_and_living_arrangements.html.
careers-leadership/canadians-feeling-optimistic-about-
15. Trail Count 2012, accessed February 28, 2013, http:// employers-hiring-investment-plans/article7367200.
www.sanjoseca.gov/DocumentCenter/View/5647.
32. 2008 Bay Area Earthquake Probabilities, USGS, http://
16. Summary of the 2012 Great Goliath Grouper Count, earthquake.usgs.gov/regional/nca/ucerf.
accessed February 28, 2013, “The Marine Scene,” The 33. Gallup Politics, accessed March 4, 2013, http://www.
Southwest Florida Sea Grant Newsletter. gallup.com/poll/152021/conservatives-remain-largest-
17. More States Embrace Gambling to Fight Budget Woes, ideological-group.aspx.
USA Today, August 29, 2012. 34. Alzheimer’s Association, accessed March 4, 2013, http://
18. Wireless Substitution: Early Release of Estimates from www.alz.org/alzheimers_disease_facts_and_figures.
the National Health Interview Survey, January-June asp#quickfacts.
2012, Blumberg, S.J. and Luke, J.V., Division of Health 35. Connect Your Home, accessed March 4, 2013, http://
Interview Statistics, National Center for Health Statis- www.connectyourhome.com/news_and_articles/
tics, accessed March 2, 2013, http://gigaom2.files. featured-connectyourhome-articles/introducing-the-
wordpress.com/2012/12/wireless201212.pdf. hopper-by-dish-network-changing-the-way-americans-
19. Autoblog, accessed March 2, 2013, http://www.autoblog. watch-tv.
com/2012/07/31/au-survey-results-suggests-men-rely- 36. PGA Tour, accessed March 4, 2013, http://www.pgatour.
on-gps-more-than-women. com/stats.html.
20. U.S. Census Bureau, accessed March 2, 2013, http:// 37. NBA, accessed March 4, 2013, http://www.nba.com/
www.census.gov/hhes/www/cpstables/032012/health/ statistics/default_all_time_leaders/AllTimeLeaders
h01_000.htm. FTPQuery.html?top.
21. Consumerist, accessed March 2, 2013, http://consumerist. 38. Investopedia, accessed March 4, 2013, http://www.
com/2013/01/08/list-of-companies-with-worst- investopedia.com/financial-edge/1012/boomers-staying-
customer-service-scores-is-full-of-familiar-names. in-debt-to-retire-in-comfort.aspx#axzz2MaJSaIO1.
22. Statistics Canada, accessed March 2, 2013, http://www. 39. Weird News, accessed March 4, 2013, http://www.
statcan.gc.ca/tables-tableaux/sum-som/l01/cst01/ huffingtonpost.com/2012/10/15/alien-believers-outnumber-
legal22a-eng.htm. god_n_1968259.html.
Notes and Data Sources N-5
40. American Bone Marrow Donor Registry, accessed Safety Facts, accessed March 6, 2013, http://www.census.
March 4, 2013, http://www.abmdr.org/faq0.aspx. gov/compendia/statab/cats/transportation/motor_vehicle_
41. New Guidelines: What to Do About Unexpected Positive accidents_and_fatalities.html.
Tuberculin Skin Test, Curley, C., accessed March 4,
2013, http://www.ccjm.org/content/70/1/49.full.pdf. Chapter 5
1. Florida Department of Environmental Protection, accessed
42. Bureau of Transportation Statistics, accessed March 5,
March 6, 2013, http://www.dep.state.fl.us/geology/
2013, http://apps.bts.gov/xml/ontimesummarystatistics/
gisdatamaps/sinkhole_database.htm.
src/ddisp/OntimeSummaryDataDisp.xml.
2. Aspirin may lower deadly skin cancer risk in women,
43. FAS Military Analysis Network, accessed March 5,
Carroll, L., MSN, accessed March 11, 2013, http://vitals.
2013, http://www.fas.org/man/dod-101/sys/missile/row/
nbcnews.com/_news/2013/03/11/17240612-aspirin-
aspide.htm.
may-lower-deadly-skin-cancer-risk-in-women?lite.
44. CBS News, accessed March 5, 2013, http://www.cbsnews.
3. Central Blood Bank, accessed March 12, 2013, http://
com/8301-504763_162-57483789-10391704/gluten-free-
www.centralbloodbank.org/program-bonus-points.asp.
diet-fad-are-celiac-disease-rates-actually-rising.
4. HomeInsurance.com, accessed March 12, 2013, http://
45. Yantai Best Cellar Consulting Co., accessed March 5, 2013,
homeinsurance.com/auto-insurance/faqs/what-deter-
http://www.winechina.com/html/2013/01/201301143358.
mines-my-auto-insurance-premium.php.
html.
5. Camden hires first members of regional police force,
46. U.S. Census Bureau, accessed March 5, 2013, http://www.
Laday, J., January 18, 2013, nj.com, accessed March 12,
census.gov/hhes/socdemo/education/data/cps/2012/
2013, http://www.nj.com/camden/index.ssf/2013/01/
tables.html.
camden_hires_first_members_of.html.
47. Generic Seeds, accessed March 5, 2013, http://www.
6. Views of Government: Key Data Points, Pew Research
genericseeds.com/vegetable-garden-seed/autumn-gold-
Center, accessed March 12, 2013, http://www.pewresearch.
pumpkin-seeds.
org/2013/03/09/views-of-government-key-data-points.
48. SlideShare, accessed March 5, 2013, http://www.slideshare.
7. Relax, Twinkies likely to live on, White, M. C., NBCNews.
net/dkjnmd/savings-bond-training-webinar-15069817.
com, Business, accessed March 12, 2013, http://www.
49. About.com, Paranormal Phenomena, accessed March 5, nbcnews.com/business/relax-twinkies-likely-live-
2013, http://paranormal.about.com/od/paranormalbasics/a/ 1C7121954.
news_120214n.htm. 8. Canadian Medical Association, accessed March 13,
50. Atlantic Cities, accessed March 5, 2013, http://www. 2014, http://www.cma.ca/multimedia/CMA/Content_
theatlanticcities.com/commute/2012/02/rise-super- Images/Inside_cma/Statistics/09GradCountry.pdf.
commuter/1351/#. 9. Nearly Half of Americans Drink Soda Daily, Gallup
51. NPR, Compensating Organ Donors Becomes Talk of the Wellbeing, Saad, L., http://www.gallup.com/poll/156116/
Nation, accessed March 5, 2013, http://www.npr.org/ Nearly-Half-Americans-Drink-Soda-Daily.aspx.
blogs/health/2012/05/23/153373854/compensating-organ- 10. North Carolina Education Lottery, accessed March 18,
donors-becomes-talk-of-the-nation. 2013, http://www.nc-educationlottery.org/instant_detail.
52. AC Nielsen, Nestle, Dean Foods, accessed March 6, aspx?gn=332.
2013, http://www.statisticbrain.com/coffee-creamer- 11. Based on the article, How may rides do you ride a day at
industry-statistics. Disney? Budget-Travel, accessed March 18, 2013, http://
53. UBS, Wells Fargo, Tobacco Vapor Electronic Cigarette www.budgettravel.com/blog/how-many-rides-do-you-
Association, accessed March 6, 2013, http://www. ride-on-a-day-at-disney,11602.
statisticbrain.com/electronic-cigarette-statistics. 12. Kids in Danger, accessed March 18, 2013, http://www.
54. Statistical Summary of Commercial Jet Airplane Acci- kidsindanger.org/product-hazards/recalls.
dents Worldwide Operations 1959-2011, accessed March 13. Senate Bill No 1274, Offered January 14, 2013, Virginia
6, 2013, http://www.boeing.com/news/techissues/pdf/ State Senate, accessed March 18, 2013, http://leg1.state.
statsum.pdf. va.us/cgi-bin/legp504.exe?131+ful+SB1274.
55. More Men Coloring Their Hair, Los Angeles Times, 14. Based on the article, Burnout in Nurses Increases Patient
accessed March 6, 2013, http://articles.latimes.com/ Infections, posted in Medical Malpractice, accessed
2012/jan/29/image/la-ig-mens-hair-color-20120129. March 18, 2013, http://www.kinnardclaytonandbeveridge.
56. The 2012 Statistical Abstract, The National Data Book, com/blog/2012/09/burnout-in-nurses-increases-patient-
U.S. National Highway Safety Administration, Traffic infections.shtml.
N-6 Notes and Data Sources
15. The Oracle of Bacon, accessed March 18, 2013, http:// standing-anxiety/related-illnesses/other-related-condi-
oracleofbacon.org/center.php. tions/adult-adhd.
16. About.Com Weddings, accessed March 19, 2013, http:// 31. FBI Releases 2011 Bank Crime Statistics, Federal
weddings.about.com/cs/bridesandgrooms/a/numofatten- Bureau of Investigation, accessed March 22, 2013, http://
dants.htm. www.fbi.gov/news/pressrel/press-releases/fbi-releases-
17. How to market ice wine that can’t be called ice wine, Bri- 2011-bank-crime-statistics.
jbassi, A., The Globe and Mail, accessed March 19, 2013, 32. Infoworld Tech Watch, accessed March 22, 2013, http://
http://www.theglobeandmail.com/report-on-business/ www.infoworld.com/t/cyber-crime/malware-infects-30-
small-business/sb-growth/the-challenge/how-to-market- percent-of-computers-in-us-199598.
ice-wine-that-cant-be-called-ice-wine/article9669088. 33. City-Data.com, accessed March 22, 2013, http://www.
18. Pizza Industry Analysis 2013-Cost and Trends, Franchise city-data.com/top2/c477.html.
Help, accessed March 19, 2013, http://www.franchisehelp. 34. Canada’s most notorious highways, CBCnews, Canada,
com/industry-reports/pizza-industry-report. accessed March 22, 2013, http://www.cbc.ca/news/
19. DailyMed, accessed March 19, 2013, http://dailymed.nlm. canada/story/2013/02/01/f-highways-dangerous.html.
nih.gov/dailymed/drugInfo.cfm?id=81505#section-6. 35. Swimming committee recommends change to false-start
20. Report: Childhood Poverty High in Detroit, But Teen rule, Johnson, G., NCAA, accessed March 22, 2013,
Pregnancy Down, CBS Detroit, accessed March 19, 2013, http://www.ncaa.org/wps/wcm/connect/public/NCAA/
http://detroit.cbslocal.com/2013/01/24/report-childhood- Resources/Latest+News/2011/July/Swimming+
poverty-high-in-detroit-but-teen-pregnancy-down. committee+recommends+change+to+false-start+rule.
21. Marquette University, Tuition and Costs, accessed March 36. 4 popular myths about credit unions, NBCNEWS.com,
19, 2013, http://www.marquette.edu/about/studenttu- accessed March 23, 2013, http://www.nbcnews.com/
ition.shtml. business/4-popular-myths-about-credit-unions-1C8936002.
22. Women Moving Millions, accessed March 19, 2013, http:// 37. Rides Injure More Than 4,400 Children Per Year, Pearson,
www.womenmovingmillions.org/how-we-do-it/facts. C., Huff Post Parents, accessed May 2, 2013, http://www.
23. 2013 Illinois DUI Fact Book, accessed March 20, 2013, huffingtonpost.com/2013/05/01/rides-injury-children_
http://www.cyberdriveillinois.com/publications/pdf_ n_3187571.html.
publications/dsd_a118.pdf. 38. Earthquake Facts and Statistics, USGS, accessed March
24. The Blog, Real Estate Center, Texas A&M University, 22, 2013, http://earthquake.usgs.gov/earthquakes/
accessed March 20, 2013, http://blog.recenter.tamu. eqarchives/year/eqstats.php.
edu/2013/01. 39. Don’t Eat Out as Often, The Simple Dollar, accessed
25. Health Policy Snapshot, Childhood Obesity, accessed March 22, 2013, http://www.thesimpledollar.com/
March 20, 2013, http://www.rwjf.org/content/dam/farm/ 2012/07/07/dont-eat-out-as-often-188365.
reports/issue_briefs/2012/rwjf 72649. 40. Huff Post, accessed March 22, 2013, http://www.huff-
26. Travel Impact Newswire, Canadian CA’s Optimism Dips ingtonpost.com/2013/03/13/religion-america-decline-
in Wake of Concerns over U.S. Economy, March 8, 2013, low-no-affiliation-report_n_2867626.html.
http://www.travel-impact-newswire.com/2013/03/ 41. FDA proposes tightening rules for heart defibrillators,
canadian-cas-optimism-dips-in-wake-of-concerns-over- Clarke, T., nbcnews.com, accessed March 23, 2013, http://
u-s-economy/#axzz2O6Y7tZeh. vitals.nbcnews.com/_news/2013/03/22/17416241-fda-
27. Theguardian, accessed March 20, 2013, http://www. proposes-tightening-rules-for-heart-defibrillators?lite.
guardian.co.uk/media/2013/jan/07/tom-daley-splash- 42. FDIC study: outrageous overdraft fees, Bruce, L.,
itv-tv-ratings. Bankrate.com, accessed March 23, 2013, http://www.
28. U.S. Census Bureau, Mover Rate Reaches Record Low, bankrate.com/finance/investing/fdic-study-outrageous-
Census Bureau Reports, accessed March 21, 2013, http:// overdraft-fees-1.aspx.
www.census.gov/newsroom/releases/archives/mobility_ 43. Who’s in the tower? At some regional airports soon:
of_the_population/cb11-193.html. Nobody. Ahler, M.M., and Marsh, R., CNN Travel,
29. United States Parachute Association, accessed March 21, accessed March 24, 2013, http://www.cnn.com/2013/03/22/
2013, http://www.uspa.org/AboutSkydiving/Skydiving travel/faa-control-tower-closures/index.html?hpt=hp_bn3.
Safety/tabid/526/Default.aspx. 44. International Shark Attack File, Ichthyology at the
30. Adult ADHD (Attention Deficit/Hyperactive Disorder), Florida Museum of Natural History, accessed March 24,
Anxiety and Depression Association of America, 2013, http://www.flmnh.ufl.edu/fish/sharks/statistics/
accessed March 21, 2013, http://www.adaa.org/under- statsus.htm.
Notes and Data Sources N-7
45. China pledges to grow broadband coverage to 70% of 8. Versa-lok, accessed April 10, 2013, http://www.versa-lok.
households, Buckley, S., engadget, http://www.engadget. com/products/residential-commercial/standard.
com/2013/02/28/china-pledges-to-grow-broadband- 9. Science and Technology Focus, Office of Naval Research,
coverage-to-70-of-households. accessed April 10, 2013, http://www.onr.navy.mil/focus/
46. Cruise Ship Industry Statistics, Statistic Brain, accessed ocean/water/salinity1.htm.
March 24, 2013, http://www.statisticbrain.com/cruise- 10. Kashi, accessed April 11, 2013, http://www.kashi.com/
ship-industry-statistics. products/golean_crunchy_bars_chocolate_caramel.
47. Canada getting rid of the penny to save costs, CNN, 11. Household Products Database, U.S. Department of
accessed March 24, 2013, http://www.cnn.com/2012/ Health and Human Services, accessed April 10, 2013,
03/30/business/canada-penny/index.html. http://hpd.nlm.nih.gov/cgi-bin/household/search?tbl=
48. Rotten Tomatoes by Flixter, Les Miserables, accessed TblChemicals&queryx=111-76-2.
March 24, 2013, http://www.rottentomatoes.com/ 12. Modern Parenthood, Parker, K., and Wang, W., Pew
m/1083326-les_miserables. Research Social and Demographic Trends, accessed
49. Health Buzz: Many Americans Seek Medical Diagnoses April 14, 2013, http://www.pewsocialtrends.org/
Online, McMullen, L., USNews Health, accessed March 2013/03/14/modern-parenthood-roles-of-moms-and-
24, 2013, http://health.usnews.com/health-news/articles/ dads-converge-as-they-balance-work-and-family.
2013/01/15/35-percent-of-americans-seek-medical- 13. Johnson Holds Off Dale Jr. for Daytona 500 Win, Bruce,
diagnosis-online. K., NASCAR News and Media, accessed April 14, 2013,
50. Knee Replacement Surgery: Does It Lead to Weight http://www.nascar.com/en_us/news-media/articles/
Gain? A Physical Therapist, Inc., accessed March 24, 2013/02/24/55th-daytona-500-finish.html.
2013, http://aphysicaltherapistinc.com/2013/01/knee- 14. U.S. Forest Service, Forest Management, accessed April
replacement-surgery-does-it-lead-to-weight-gain. 14, 2013, http://www.fs.fed.us/forestmanagement/
51. Toddler meals swimming in salt, CNN Health, accessed documents/sold-harvest/documents/1905-2012_Natl_
March 24, 2013, http://thechart.blogs.cnn.com/2013/03/ Summary_Graph.pdf.
21/meals-and-snacks-for-toddlers-heavy-in-sodium/ 15. Kraft Foods, accessed April 13, 2013, http://www.
?hpt=he_c2. kraftrecipes.com/Products/ProductInfoDisplay.aspx?
SiteId=1&Product=4300095369.
Chapter 6 16. Sports Illustrated, Golf.com, accessed April 14, 2013,
1. Report: AT&T Wins LTE Speed Race, MetroPCS stum- http://blogs.golf.com/presstent/2013/04/14-year-old-
bles, Parker, T., Fierce Broadband Wireless, accessed guan-slapped-with-slow-play-penalty-.html.
March 30, 2013, http://www.fiercebroadbandwireless. 17. PGA Tour Tracks Pace of Play, GolfTalkCentral,
com/story/report-att-wins-lte-speed-race-metropcs- Hoggard, R., accessed April 14, 2013, http://www.
stumbles/2013-02-13. golfchannel.com/news/golftalkcentral/pga-tour-tracks-
2. How Long It Took Different Groups to Vote, The New pace-of-play.
York Times, accessed April 1, 2013, http://www.nytimes. 18. Utah.com, accessed April 15, 2013, http://www.utah.
com/interactive/2013/02/05/us/politics/how-long-it- com/attractions/kennecott.htm.
took-groups-to-vote.html?_r=0. 19. Some Neighborhoods Dangerously Contaminated by
3. Drugs to Treat Insomnia, WebMD, accessed April 2, Lead Fallout, USA Today, Young, A., and Eisler, P.,
2013, http://www.webmd.com/sleep-disorders/insomnia- accessed April 22, 2013, http://usatoday30.usatoday.
medications. com/news/nation/story/2012-04-20/smelting-lead-
4. Tourism Kingston, accessed April 2, 2013, http://www. contamination-soil-testing/54420418/1.
kingstoncanada.com/en/makeaconnection/parking. 20. Near Earth Object Program, National Aeronautics and
asp?_mid_=3232. Space Administration, accessed April 22, 2013, http://
5. Choose Airplane Seats, About.com, accessed April 3, neo.jpl.nasa.gov/stats/wise.
2013, http://honeymoons.about.com/od/flying/ht/ 21. Great Smoky Mountains, National Park Service,
choosing_airplane_seats.htm. accessed April 22, 2013, http://www.nps.gov/grsm/
6. sfist, accessed April 9, 2013, http://sfist.com/2013/03/07/ naturescience/black-bears.htm.
map_average_rent_for_1br_in_san_fra.php. 22. Statistics Canada, Table 303-0048, accessed April 22,
7. TheKnot.com, as reported by CNN Money, accessed 2013, http://www5.statcan.gc.ca/cansim/a47.
April 11, 2013, http://money.cnn.com/2013/03/10/pf/ 23. The Human Memory, accessed April 24, 2013, http://
wedding-cost/index.html. www.human-memory.net/types_sensory.html.
N-8 Notes and Data Sources
24. Fatsecret, accessed April 26, 2013, http://www.fatsecret. 40. Live Science, accessed May 31, 2014, http://www.
com/calories-nutrition/food/noodle-soup/sodium. livescience.com/33657-8-weird-statistics.html.
25. Economic Research Service, U.S. Department of 41. DunwoodyPatch, accessed May 31, 2014, http://
Agriculture, accessed April 24, 2013, http://www.ers.usda. dunwoody.patch.com/groups/business-news/p/reports-
gov/data-products/china-agricultural-and-economic-data/ sams-club-will-cut-2-of-its-workforce.
national-and-provincial-data.aspx#.UXrBIMoVwbk. 42. Mother Jones, accessed May 31, 2014, http://www.
26. 2013 Fiat 500E, Car and Driver, accessed April 24, motherjones.com/blue-marble/2014/04/poll-science-
2013, http://www.caranddriver.com/news/2013-fiat- denial-big-bang-evolution-creationism-climate-change.
500e-photos-and-info-news.
27. Fooducate, accessed April 24, 2013, http://www.fooducate. Chapter 7
com/app#page=product&id=2344EFDA-E10B-11DF- 1. Foreclosure compensation checks arrive, but Anger
A102-FEFD45A4D471. Some Homeowners, Myers, L., and Gardella. R., accessed
May 2, 2013, http://openchannel.nbcnews.com/_news/
28. Average New-Car Loan a Record 65 Months in Fourth 2013/05/02/18022071-foreclosure-compensation-
Quarter, Reuters, accessed April 24, 2013, http://www. checks-arrive-but-anger-some-homeowners?lite.
reuters.com/article/2013/03/05/us-auto-loans-idUS-
BRE9240KQ20130305. 2. Employee Benefits, Hilmar Cheese Company, accessed
May 3, 2014, http://www.hilmarcheese.com/Careers/
29. Average U.S. Family Spent Nearly $3000 on Gas Last Employee_Benefits.
Year, AOL Autos, accessed April 24, 2013, http://
autos.aol.com/article/gas-prices-fuel-economy-income- 3. 2013 Feed Composition Tables, accessed May 3, 2013,
American. http://beefmagazine.com/nutrition/nutrient-values-300-
cattle-feeds.
30. Washington, DC: Metrorail Trivia, tripadvisor, accessed
April 24, 2013, http://www.tripadvisor.com/Travel- 4. American lamb, accessed May 6, 2013, http://www.
g28970-c56842/Washington-Dc:District-Of-Columbia: americanlamb.com/lamb-101.
Metrorail.Trivia.html. 5. Big catfish caught on Bartlett Lake, Hungeree, accessed
31. U.S. Customs and Border Protection, CBP Border Wait May 6, 2013, http://hungeree.com/animals/big-catfish-
Times, accessed April 24, 2013, http://apps.cbp.gov/bwt. caught-on-bartlett-lake.
32. Chobani Products, accessed April 24, 2013, http:// 6. autosnout.com, accessed May 6, 2013, http://www.
chobani.com/products/faq. autosnout.com/Cars-0-60mph-List.php.
7. 3M Half Marathon, accessed May 6, 2013, http://www.
Chapter 6 WEB wetimeraces.com/RacingSystems/Results/2013/
33. Less than Half of Brazilians Favor Hosting World Cup, 3M552/3M2013.htm#/results::1367880139151.
Poll Shows, accessed May 30, 2014, http://www. 8. Jockeys, Kentucky Derby (1875-2012), Churchill Downs,
reuters.com/article/2014/04/08/us-worldcup-brazil- accessed May 6, 2013, http://www.kentuckyderby.com/
idUSBREA3715H20140408. sites/kentuckyderby.com/f iles/u64720/Jockey%20
34. Radio, accessed May 31, 2014, http://radiomagonline. Records%2C%20Kentucky%20Derby_0.pdf.
com/digital_radio/55_percent_cars_uk. 9. Toronto Rock, accessed May 6, 2013, http://www.nll.
35. CBS DC, accessed May 31, 2014, http://washington. com/stats/team_instance/243876?subseason=86199.
cbslocal.com/2014/04/22/poll-63-percent-of-americans-
10. Kroger’s New Weapon: Infrared Cameras, Yahoo Finance,
against-personal-drones-being-allowed-in-us-airspace.
accessed May 2, 2013, http://finance.yahoo.com/news/
36. The Christian Science Monitor, accessed May 31, 2014, krogers-weapon-infrared-cameras-011800830.html.
http://www.csmonitor.com/USA/Education/2014/0514/
11. Fattest countries in the world revealed: Extraordinary
Less-than-40-percent-of-12th-graders-ready-for-college-
graphic charts the average body mass index of men and
analysis-finds.
women in every country (with some surprising results),
37. Climate Change Communication, accessed May 31, 2014, Bond, A., Mail Online, accessed May 8, 2013, http://
http://environment.yale.edu/climate-communication/ www.dailymail.co.uk/health/article-2301172/Fattest-
article/american-opinion-on-climate-change-warms-up. countries-world-revealed-Extraordinary-graphic-charts-
38. Industry Canada, accessed May 31, 2014, http://www. average-body-mass-index-men-women-country-surprising-
ic.gc.ca/eic/site/lsg-pdsv.nsf/eng/h_hn01703.html. results.html.
39. Zero Hedge, accessed May 31, 2014, http://www.zero- 12. Monsoon rainfall seen average in 2013, Reuters, accessed
hedge.com/news/2014-03-24/30-survey-results-sound- May 8, 2013, http://in.reuters.com/article/2013/04/26/
false-are-actually-true. india-monsoon-minister-idINDEE93P03S20130426.
Notes and Data Sources N-9
13. Dr. Pieter Tans, NOAA/ESRL (www.esrl.noaa.gov/gmd/ 30. TheResourceSolutions.com, accessed May 17, 2013,
ccgg/trends). http://theresourcesolutions.com/topics/human_body/
14. WebMD, Information and Resources, Typhoid Fever, where_is_the_smallest_bone_in_your_body.htm.
accessed May 8, 2013, http://www.webmd.com/ 31. All About Vinyl Swimming Pool Liners, accessed May
a-to-z-guides/typhoid-fever. 17, 2013, http://www.poolinfo.com/Vinyl-Liners.htm.
15. topendsports, accessed May 8, 2013, http://www. 32. hospitality.net, accessed May 19, 2013, http://www.
topendsports.com/testing/results/vertical-jump.htm. hospitalitynet.org/news/4059582.html.
16. NASA, Goddard Space Flight Center, Ozone Hole 33. nbcnews, accessed May 18, 2013, http://cosmiclog.
Watch, accessed May 8, 2013, http://ozonewatch.gsfc. nbcnews.com/_news/2013/05/17/18326731-buggy-
nasa.gov/meteorology/annual_data.html. hordes-of-cicadas-sighted-in-virginia-but-new-york-not-
17. Blog.GasBuddy.com, accessed May 13, 2013, http:// yet?lite.
blog.gasbuddy.com/posts/Almost-1-in-4-motorists-are-
shopping-for-better-auto-insurance-45-percent-switch-
Chapter 8
providers/1715-539072-1765.aspx. 1. Maryland Department of Natural Resources, accessed
May 20, 2013, http://news.maryland.gov/dnr/2013/02/
18. 70 percent of Americans track their health, but most go 05/2013-midwinter-waterfowl-survey-results-are-in.
low tech, Heusser, K.M., Gigaom, accessed May 14,
2. March Bike Pilot Evaluation and Initiatives Update, San
2013, http://gigaom.com/2013/01/27/70-percent-of-
Francisco BART, accessed May 20, 2013, http://www.
americans-track-health-but-most-skip-tech-and-many-
bart.gov/docs/Bike_Pilot_2_Board_Presentation.pdf.
just-use-their-heads.
3. nbcnews.com, accessed May 20, 2013, http://vitals.
19. The Official Microsoft Blog, accessed May 14, 2013,
nbcnews.com/_news/2013/05/20/18377639-new-
http://blogs.technet.com/b/microsoft_blog/archive/
sleep-pill-may-be-unsafe-at-higher-doses-fda-review-
2013/04/17/latest-security-intelligence-report-shows-
suggests?lite.
too-many-pcs-lack-antivirus-protection.aspx.
4. Manatee County Administration Dashboards, accessed
20. Inside Movies, accessed May 14, 2013, http://insidemovies. May 20, 2013, http://public.mymanatee.org/dashboards/
ew.com/2013/05/04/movie-trailers-show-too-much- view?view=Public%20Safety&dashboard_param=65.
survey.
5. Mamma Chia, accessed May 21, 2013, http://www.
21. ClickZ, accessed May 15, 2013, http://www.clickz.com/ mammachia.com/chia-squeeze.
clickz/news/2239608/more-consumers-order-food-
6. Statistics Canada, accessed May 21, 2013, http://www.
online-using-a-smartphone-or-tablet.
statcan.gc.ca/pub/82-625-x/2013001/article/11779-eng.htm.
22. Medical News Today, accessed May 15, 2013, http://
7. The Geyser Observation and Study Association, accessed
www.medicalnewstoday.com/articles/260022.php.
May 21, 2013, http://www.geyserstudy.org/geyser.
23. Yale News, accessed May 15, 2013, http://yaledailynews. aspx?pGeyserNo=CLIFF.
com/blog/2013/02/06/yales-admission-yield-rate-
8. Utah Geological Survey, accessed May 22, 2013, http://
ranked-9th.
geology.utah.gov/emp/energydata/coaldata.htm.
24. ESPN MLB, accessed May 15, 2013, http://espn.go.com/ 9. Huff Post Chicago, accessed May 22, 2013, http://www.
mlb/stats/team/_/stat/fielding. huffingtonpost.com/2013/05/20/ferris-wheel-world-
25. Huffpost TV, accessed May 15, 2013, http://www. record_n_3306361.html.
huffingtonpost.com/2013/03/20/motorola-mobility-41- 10. Canadian Geographic, accessed May 22, 2013, http://
percent-dvr-content-unwatched_n_2918929.html. geography.about.com/gi/o.htm?zi=1/XJ&zTi=1&sdn=
26. The Ocean trash Index, accessed May 15, 2013, http:// geog raphy&cdn=education&tm=95&f=10&su=
www.oceanconservancy.org/our-work/marine-debris/2012- p284.13.342.ip_&tt=2&bt=0&bts=0&zu=http%3A//
icc-data-pdf.pdf. www.canadiangeographic.ca/magazine/MA06/indepth/
27. National Geographic, accessed May 15, 2013, http://travel. justthefacts.asp.
nationalgeographic.com/travel/countries/canada-facts. 11. National Geographic, accessed May 22, 2013, http://
28. Centers for Disease Control and Prevention, accessed animals.nationalgeographic.com/animals/mammals/koala.
May 17, 2013, http://www.cdc.gov/nchs/fastats/ 12. Boeing, accessed May 22, 2013, http://www.newair-
bodymeas.htm. plane.com/747/design_highlights/#/home.
29. The Tropical Rainforest Biome, accessed May 17, 2013, 13. Co.Exist, accessed May 22, 2013, http://www.fastcoexist.
http://prezi.com/xki3uz46_hmb/the-tropical-rainforest- com/1681494/a-new-ultra-cheap-led-light-looks-and-
biome. acts-like-an-incandescent-bulb.
N-10 Notes and Data Sources
14. USDA National Nutrient Database for Standard Refer- 31. 2013 Global Management Education Graduate Survey,
ence, accessed May 23, 2013, http://ndb.nal.usda.gov/ GMAC, accessed June 25, 2013, http://www.gmac.
ndb/search/list. com/~/media/Files/gmac/Research/curriculum-insight/
15. Imagine 2050, accessed May 23, 2013, http://imagine gmegs-2013-stats-brief.pdf.
2050.newcomm.org/2013/04/01/john-morton-resigns- 32. Aupair World, Press kit 2012, accessed June 25, 2013.
to-open-gourmet-snail-farm. 33. Final Report, Survey of Canadians on Privacy-Related
16. Watershed Publishing, Marketing Charts, accessed May Issues, accessed June 25, 2013, http://www.priv.gc.ca/
27, 2013, http://www.marketingcharts.com/wp/interac- information/por-rop/2013/por_2013_01_e.pdf.
tive/social-networking-eats-up-3-hours-per-day-for-the- 34. U.S. Department of Agriculture, Economic Research
average-american-user-26049. Service, U.S. milk production and related data (quar-
17. Cattle.com, accessed May 31, 2013, http://www.cattle. terly), accessed June 27, 2013, http://www.ers.usda.gov/
com/markets/archive.aspx?code=NW_LS720. data-products/dairy-data.aspx#.UcxsLJzYl8k.
18. Bike Rumor, accessed May 31, 2013, http://www. 35. Golden Isles of Georgia Tide Tables, accessed June 27,
bikerumor.com/2012/08/26/2013-trek-bikes-actual- 2013, http://www.gacoast.com/tide1.html.
weights-for-road-mountain-bikes.
36. intellicast.com, accessed June 27, 2013, http://www.
19. Sutherland, J., Liu, G., Repin, N., and Crump, T., Hospi- intellicast.com/Local/Obser vation.aspx?char t=
tal funding policies: Average length of stay for conges- Relative%20Humidity.
tive heart failure, BCHeaPR Study Data Bulletin #15.
37. Bloomberg, accessed June 27, 2013, http://www.bloom-
Vancouver: UBC Centre for Health Services and Policy
berg.com/news/2012-06-07/gm-boosts-2013-volt-mile-
Research, 2013.
age-rating-beating-toyota-prius.html.
20. Mirtle, J., A Hockey Journalist’s Blog, accessed May 31,
38. cycling news, accessed June 27, 2013, http://www.
2013, http://mirtle.blogspot.com/2013/01/2013-nhl-
cyclingnews.com/races/tour-de-france-2012/stage-4/
teams-by-weight-height-and-age.html.
results.
21. Vehicles, accessed May 31, 2013, http://vehicles-us.
blogspot.com/2013/02/longer-average-length-of- 39. Bureau of Transportation Statistics, accessed June 27,
vehicle.html. 2013, http://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/
files/subject_areas/airline_information/taxi_out_and_
22. MSN Money, accessed May 31, 2013, http://money.msn. other_tarmac_times/tarmac_more_than_3_hours_
com/now/post.aspx?post=f3c6a0cf-33f1-43fb- april_2013.html.
9072-6e9375392905.
40. Critical Zone Observatories, accessed June 27, 2013,
23. WDW Mobile App, accessed May 31, 2013. http://criticalzone.org/national/data/dataset/2428.
24. Polar Bear Science, accessed May 31, 2013, http://
41. The Numbers, accessed June 27, 2013, http://www.the
polarbearscience.com/2013/03/10/new-chukchi-s ea-
numbers.com/movies/records/allbudgets.php.
polar-bear-survey-exciting-preliminary-results.
42. infoplease, accessed June 27, 2013, http://www.infoplease.
25. NBC News Travel, accessed June 23, 2013, http://www.
com/encyclopedia/history/great-wall-china.html.
nbcnews.com/travel/roads-less-crowded-4th-july-aaa-
says-6C10393641. 43. Today Health, accessed June 27, 2013, http://www.today.
com/health/new-rules-make-school-junk-food-free-
26. International Business Times, accessed June 23, 2013,
zone-6C10467557.
http://www.ibtimes.com/solar-activity-cycle-peaks-
2013-more-solar-flares-expected-later-year-1302041. 44. calorie count, accessed June 27, 2013, http://caloriecount.
27. Huffpost Healthy Living, accessed June 22, 2013, http:// about.com/calories-sports-drinks-ic1444.
www.huffingtonpost.com/2013/06/11/healthy-restau- 45. CBC British Columbia, accessed June 28, 2013, http://
rant-meals_n_3421607.html. www.cbc.ca/bc/news/investigates.
28. CRFA’s 2013 Canadian Chef Survey, accessed June 22, 46. Southern California Coastal Ocean Observing
2013, http://www.crfa.ca/pdf/chefsurvey_2013_english. System, accessed June 28, 2013, http://www.sccoos.
pdf. org/query/.
29. Hyde Park Neighborhood Association, accessed June 22, 47. Today Entertainment, accessed June 28, 2013, http://
2013, http://www.austinhydepark.org/2013/04/hpna- www.today.com/entertainment/tv-binge-watching-
2013-survey-results. mostly-harmless-addiction-6C10476290.
30. The Wichita Eagle, accessed June 22, 2013, http://www. 48. Scientific American, accessed June 28, 2013, http://www.
kansas.com/2013/02/14/2677088/kansas-considering- scientificamerican.com/article.cfm?id=1000-mph-car-
restraints.html. land-speed-record.
Notes and Data Sources N-11
49. MIX 104.1, accessed June 28, 2013, http://mix1041.cbslo- 16. NOAA, accessed July 5, 2013, http://www.noaanews.
cal.com/2013/05/21/do-you-hide-money-from-your-spouse. noaa.gov/stories2013/20130618_deadzone.html.
50. Today Health, accessed June 28, 2013, http://www.today. 17. Mayo Clinic, accessed July 5, 2013, http://www.mayo-
com/health/eating-carbs-may-make-you-crave-even- clinic.com/health/complete-blood-count/MY00476/
more-carbs-6C10484514. DSECTION=results.
18. ESPN NFL, accessed July 5, 2013, http://espn.go.
Chapter 9 com/nfl/draft2013/story/_/id/8953829/2013-nfl-draft-five-
1. Insurance Journal, accessed June 30, 2013, http://www. year-nfl-combine-averages?utm_source=twitterfeed&
insur ancejour nal.com/new s/national/2013/03/ utm_medium=twitter.
20/285243.htm. 19. U.S. Environmental Protection Agency, accessed July 7,
2. Roy Morgan Research, accessed July 1, 2013, http:// 2013, http://www.epa.gov/otaq/tcldata.htm.
www.roymorgan.com/findings/australian-motorists-
20. International Coffee Organization Monthly Coffee Mar-
drive-average-15530km-201305090702.
ket report, May 2013, accessed July 7, 2013, http://www.
3. The College Board, accessed July 1, 2013, http://profes- ico.org/documents/cy2012-13/cmr-0513-e.pdf.
sionals.collegeboard.com/testing/sat-reasoning/scores/
21. Net Index, accessed July 7, 2013, http://www.netindex.
averages.
com/download/#.
4. Entertainment Software Association, accessed July 1,
22. Locksmith Ledger, accessed July 7, 2013, http://www.
2013, http://www.theesa.com/facts/gameplayer.asp.
locksmithledger.com/article/10881122/locksmith-
5. The Boston Globe, accessed July 1, 2013, http://www.bos- services-pricing-2013.
tonglobe.com/sports/2013/06/08/why-baseball-games-
23. ConAgraFoods, accessed July 7, 2013, http://www.
take-long/wikaeRMGatBDGDefpbFE1H/story.html.
eggbeaters.com/healthy-eating-habits-about-us/
6. Government of Canada, Citizenship and Immigration compare-egg-nutrition.
Canada, accessed July 1, 2013, http://www.cic.gc.ca/
24. Livingstrong.com, accessed July 7, 2013, http://www.
english/information/times/canada/cit-processing.asp.
livestrong.com/article/490608-how-much-iodine-is-in-milk.
7. Today Health, accessed July 1, 2013, http://www.today.
25. NBC News Science, accessed July 8, 2013, http://www.
com/health/later-you-stay-more-you-eat-study-shows-
nbcnews.com/science/antarcticas-hidden-lake-vostok-
6C10488450.
found-teem-life-6C10561955.
8. Institute for Energy Research, accessed July 3, 2013,
26. Kawasaki, accessed July 8, 2013, http://www.
http://www.instituteforenergyresearch.org/2013/06/03/
kawasaki.com/Products/Product-Specifications.aspx?
americas-green-energy-problems-defective-solar-panels.
scid=23&id=708.
9. University of Florida News, accessed July 3, 2013, http://
news.ufl.edu/2013/05/15/aspirin-2. 27. The New York Times, Cheating Ourselves of Sleep,
accessed July 9, 2013, http://well.blogs.nytimes.
10. BuildTheBridgeNowNY.org, accessed July 3, 2013, com/2013/06/17/cheating-ourselves-of-sleep.
http://buildthebridgenowny.org/faq.
28. gizmag, accessed July 9, 2013, http://www.gizmag.com/
11. Waikiki Roughwater Swim, accessed July 3, 2013, how-does-elon-musk-hyperloop-work/27757.
http://www.waikikiroughwaterswim.com/2012/
2012OverallResults.html. 29. CNN, accessed July 10, 2013, http://www.cnn.com/2013/
07/09/world/americas/toronto-train-flooding.
12. Livingstrong.com, accessed July 3, 2013, http://www.
livestrong.com/article/5966-need-antioxidants-dark- 30. Agricorp, accessed July 10, 2013, http://www.agricorp.
chocolate. com/en-ca/Programs/ProductionInsurance/ForageRain-
fall/Pages/RainfallDailyData.aspx?StationID=7021293
13. National Fire Protection Association, accessed July 3,
&Year=2013&LANG=EN.
2013, http://www.nfpa.org/codes-and-standards/docu-
ment-information-pages? utm_source=feedburner&utm_ 31. ESPN NFL, accessed July 11, 2013, http://espn.go.com/
medium=feed&utm_campaign=Feed%3A+nfpacodes nfl/qbr/_/year/2011, 2012.
andstandards+%28NFPA+codes+and+standards%29. 32. J2SKI, accessed July 11, 2013, http://www.j2ski.com/
14. Canadian Tourism Human Resource Council, accessed ski-chat-forum/posts/list/14182.page.
July 3, 2013, http://cthrc.ca/en/member_area/member_ 33. FindTheBest, accessed July 11, 2013, http://planes.find-
news/a_panel_of_canadian_hotel_industry_it_execu- thebest.com/l/241/Airbus-A350-900.
tives_to_look_at_hotel_technology_trends. 34. Communications Satellites, accessed July 11, 2013,
15. drugwatch, accessed July 3, 2013, http://www.drug- http://www.satellites.spacesim.org/english/function/
watch.com/2013/02/19/hip-implants-women. communic/index.html.
N-12 Notes and Data Sources
35. Union of Concerned Scientists, accessed July 11, 2013, 52. The Wall Street Journal, accessed July 6, 2013, http://
http://www.ucsusa.org/nuclear_weapons_and_global_ online.wsj.com/article/PR-CO-20130515-909947.html.
security/space_weapons/technical_issues/ucs-satellite- 53. Today Money, accessed July 25, 2013, http://www.
database.html. today.com/money/sometimes-boss-really-psycho-
36. U.S. Energy Information Administration, accessed July 6C10732488.
12, 2013, http://www.eia.gov/todayinenergy/detail. 54. NBC News Health, accessed July 25, 2013, http://www.
cfm?id=11831. nbcnews.com/health/taller-women-more-likely-get-
37. NOAA Fisheries, Office of Protected Resources, cancer-large-study-finds-6C10746890.
accessed July 13, 2013, http://www.nmfs.noaa.gov/pr/ 55. First Read on NBC News, accessed July 25, 2013, http://
species/fish/bluefintuna.htm. firstread.nbcnews.com/_news/2013/07/24/19661695-
38. Research and Innovative Technology Administration, poll-majority-more-worried-us-surveillance-goes-too-
Bureau of Transportation Statistics, accessed July 13, far?lite.
2013, http://apps.bts.gov/xml/ontimesummarystatistics/ 56. NBC News Health, accessed July 25, 2013, http://www.
src/dstat/OntimeSummaryAirtimeData.xml. nbcnews.com/health/full-moon-can-mess-your-sleep-
39. Cleveland.com, Northeast Ohio, accessed July 13, 2013, new-study-finds-6C10743979.
http://www.cleveland.com/north-olmsted/index. 57. CBC News Saskatchewan, accessed July 25, 2013, http://
ssf/2013/06/north_olmsted_mayors_court_app.html. www.cbc.ca/news/canada/saskatchewan/story/2013/
40. CNBC, accessed July 14, 2013, http://www.cnbc.com/ 07/25/business-rich-canada.html.
id/100804842. 58. NBC News Science, accessed July 26, 2013, http://www.
41. OECD, Health Policies and Data, accessed July 14, 2013, nbcnews.com/science/mysterious-hum-driving-people-
http://www.oecd.org/health/health-systems/oecdhealth- crazy-around-world-6C10760872.
data2013-frequentlyrequesteddata.htm. 59. First Read on NBC News.com, accessed July 26, 2013,
42. Rothschild, P., Grabar, S., Le Du, B., Temstet, C., Rostaqui, http://f irstread.nbcnews.com/_news/2013/07/24/
O., and Brezin, A. (2013), Patients subjective assessment 19644154-nbcwsj-poll-faith-in-dc-hits-a-low-83-
of the duration of cataract surgery: A case series, BMJ percent-disapprove-of-congress.
Open 3: e002497, doi:10.1136/bmjopen-2012-002497. 60. CNN Tech, accessed July 26, 2013, http://www.cnn.com/
2013/07/26/tech/web/impact-technology-handwriting/
43. NBC News, accessed July 14, 2013, http://www.nbc-
index.html?hpt=hp_t3.
news.com/travel/worlds-largest-building-opens-china-
6C10578538. 61. Inside Higher Ed, accessed July 26, 2013, http://www.
insidehighered.com/news/2012/11/12/report-shows-
44. Baseball Press, accessed July 15, 2013, http://www.base-
growth-international-enrollments-study-abroad.
ballpress.com/article.php?id=1269.
62. Mingtiandi, accessed July 26, 2013, http://www.mingti-
45. NBC News Science, accessed July 15, 2013, http://www. andi.com/real-estate/outbound-investment/20130707/
nbcnews.com/science/big-lionfish-found-disturbing- the-chinese-are-coming-just-not-all-that-quickly.
depths-6C10631330.
63. CNN Money, accessed July 26, 2013, http://money.cnn.
46. SportStats, accessed July 15, 2013, http://www.sport- com/2013/04/26/news/economy/health-care-cost/index.
stats.ca/displayResults.xhtml?racecode=103723. html.
47. National Weather Service, Climate Prediction Center, 64. CBS New York, accessed July 26, 2013, http://newyork.
accessed July 15, 2013, http://www.cpc.ncep.noaa.gov/ cbslocal.com/2013/05/24/west-islip-students-accused-
products/analysis_monitoring/cdus/pastdata/palmer. of-using-facebook-to-cheat.
48. U.S. Geological Survey Earthquake Hazards Program, 65. U.S. Energy Information Administration, accessed July
accessed July 15, 2013, http://earthquake.usgs.gov/ 28, 2013, http://www.eia.gov/analysis/studies/world-
earthquakes/feed/v1.0/csv.php. shalegas/pdf/fullreport.pdf?zscb=45967897.
49. Diabetes Self-Management, accessed July 15, 2013, 66. SoLux 4100K 50W Specifications, accessed July 28,
http://static.diabetesselfmanagement.com/pdfs/ 2013, http://www.solux.net/ies_files/SoLux%204100K%
DSM0310_012.pdf. 2050W%20Specs.pdf.
50. Time News Feed, accessed July 13, 2013, http:// 67. Centers for Medicare & Medicaid Services, accessed
newsfeed.time.com/2013/05/17/u-s-postal-service- July 28, 2013, http://www.cms.gov/Research-Statistics-
releases-list-of-worst-cities-for-dog-attacks. Data-and-Systems/Statistics-Trends-and-Reports/
51. US News Health, accessed July 16, 2013, http://health. Medicare-Provider-Charge-Data/Outpatient.html.
usnews.com/health-news/articles/2013/01/15/35-percent- 68. Google Public Data, Eurostat, accessed July 28, 2013, http://
of-americans-seek-medical-diagnosis-online. www.google.com/publicdata/explore?ds=ml9s8a132hlg.
Notes and Data Sources N-13
69. Estes Rockets, accessed July 28, 2013, http://www. 7. States seek federal crackdown on mussel invaders, USA
estesrockets.com/rockets/pro-series/009701-ventris- Today, accessed August 5, 2013, http://www.usatoday.
ps-rocket. com/story/news/nation/2013/08/05/mussel-invaders-
70. The New York Times, Health, Science, accessed July 29, cost-billions/2618669.
2013, http://well.blogs.nytimes.com/2013/05/30/for- 8. Forbes, accessed August 5, 2013, http://www.forbes.
new-doctors-8-minutes-per-patient/?_r=0. com/sites/tedreed/2013/05/18/will-americans-new-
71. First Read on NBC News.com, accessed July 29, 2013, boarding-process-work-it-failed-at-virgin-america.
http://f irstread.nbcnews.com/_news/2013/07/28/ 9. NHL.com, Rules, accessed August 5, 2013, http://www.
19700822-ahead-of-budget-battle-more-americans-say- nhl.com/ice/page.htm?id=26286.
sequester-has-hurt?lite. 10. Hardocp, accessed August 5, 2013, http://www.hardocp.
72. Genetics Home Reference, U.S. National Library of com/article/2013/03/12/crysis_3_video_card_perfor-
Medicine, accessed July 29, 2013, http://ghr.nlm.nih. mance_iq_review/5#.Uf_7um0nb90.
gov/condition/parkinson-disease. 11. Urban bees for hire: A thriving hive business, Reuters,
73. Association of American Railroads, accessed July 29, 2013, accessed August 6, 2013, http://www.reuters.com/
https://www.aar.org/safety/Documents/Freight% article/2013/06/19/idUS248215678920130619.
20Railroads%20Safely%20Moving%20Crude%20Oil.pdf. 12. Journal of Biomedical Graphics and Computing, Volume
74. Hot Air, accessed July 29, 2013, http://hotair.com/ 3, Number 3, 2013, accessed August 6, 2013, www.
archives/2013/07/26/important-update-from-msnbc- sciedu.ca/jbgc.
detroit-is-fast-becoming-americas-most-libertarian-city. 13. What’s on your playlist affects how you walk, NBC News
75. Department of Environment and Heritage Protection, Health, accessed August 7, 2013, http://www.nbcnews.
Queensland Government, accessed July 29, 2013, com/health/whats-your-playlist-affects-how-you-walk-
http://www.ehp.qld.gov.au/coastal/monitoring/waves/ 6C10596898.
index.php. 14. Genworth, accessed August 7, 2013, https://www.
genworth.com/corporate/about-genworth/industry-
76. Department of Water Resources, California Data
expertise/cost-of-care.html.
Exchange Center, accessed July 30, 2013, http://cdec.
water.ca.gov/cdecapp/resapp/resDetailOrig.action? 15. CBC News, British Columbia, accessed August 7, 2013,
resid=CAS. http://www.cbc.ca/news/canada/british-columbia/
story/2013/08/06/bc-coliform-false-creek.html.
77. Libertystone Hardscaping Systems, accessed July 30,
2013, http://www.liberty-stone.net/wall-products- 16. CNN Travel, accessed August 7, 2013, http://www.cnn.
stonevista-standard.html. com/2013/08/02/travel/20-travel-mistakes/index.
html?hpt=hp_bn10.
78. Catalyst, accessed July 30, 2013, http://www.catalyst.
17. The New York Times, Business Day, accessed August 8,
org/knowledge/women-ceos-fortune-1000.
2013, http://economix.blogs.nytimes.com/2013/08/01/
millennials-in-their-parents-basements/?_r=0.
Chapter 10 18. The New York Times, Business Day, accessed August 8,
1. Marketing charts, accessed July 31, 2013, http://www. 2013, http://www.nytimes.com/2013/07/16/business/air-
marketingcharts.com/wp/television/are-young-people- ports-on-the-border-make-room-for-canadian-flyers.html.
watching-less-tv-24817.
19. NBC News Health, accessed August 9, 2013, http://
2. NBC News Business, accessed August 1, 2013, http:// www.nbcnews.com/health/kids-exposure-secondhand-
www.nbcnews.com/business/growing-copper-theft- smoke-drops-except-among-those-asthma-6C10867951.
epidemic-sweeping-us-6C10791941. 20. Fox News, accessed August 9, 2013, http://www.
3. Breakdown of Deposit Savings Periods, Sam Beattie, foxnews.com/politics/2013/05/02/poll-shows-2-percent-
Press Association Mediapoint, June 19, 2013. voters-think-armed-revolution-might-be-needed.
4. Average cost of flood insurance policy in New Orleans 21. MCB Mobile, accessed August 9, 2013, http://www.
area continues to rise, Christian Moises, New Orleans mcbmobilepayments.com/96-people-to-have-phones-39-
City business, March 14, 2013. the-internet-by-the-end-of-2013-un/#.UgUYo20nb90.
5. Ministry of Health, Surgical Wait Times, British Columbia, 22. Public Policy Polling, accessed August 9, 2013, http://
accessed August 1, 2013, http://www.health.gov.bc.ca/swt/ www.publicpolicypolling.com/main/2013/04/conspiracy-
faces/Wooden.jsp. theory-poll-results-.html.
6. Top 10 Foods Highest in Iron, Health Alicious Ness. 23. Optin Online Panel, accessed August 9, 2013, http://
com, accessed August 2, 2013, http://www.healthali- optinpanel.org/wp-content/media/Metro-Opt-In-17-
ciousness.com/articles/food-sources-of-iron.php. Community-Newspapers-July.pdf.
N-14 Notes and Data Sources
24. Reston Association, Community Survey, Report of 7. USGS, Scientific Investigations Report 2013-5001,
Results, accessed August 9, 2013, https://www.reston. accessed August 25, 2013, http://pubs.usgs.gov/
o rg / p o r t a l s / 3 / G E N E R A L / I t e m % 2 0 C % 2 0 3 % 2 0 sir/2013/5001/table6.html.
Reston%20VA%20Report%20of%20Results%20 8. Concur Expense IQ report 2013, accessed August 25,
2013%20FINAL%202013-04-11.pdf. 2013, https://www.concur.com/sites/default/files/lp/
25. Rogers, accessed August 9, 2013, http://newsroom. pdfs/concur_expense_iq_report_2013_v2.pdf.
rogers.com/news/13-01-14/Survey_shows_Canadians_
9. ESPN Home Run Tracker, accessed August 25, 2013,
give_themselves_top_marks_for_being_tech_savvy_
http://www.hittrackeronline.com/index.php?h=&p=
but_fall_short_when_put_to_the_test.aspx.
&b=Fenway%2BPark.
26. KevinMD.com, accessed August 11, 2013, http://www.
10. Statistics Canada, accessed August 26, 2013, http://
kevinmd.com/blog/2013/07/radiation-exposure-related-
www.statcan.gc.ca/daily-quotidien/120510/t120510a001-
rise-lawsuit-culture.html.
eng.htm.
27. Unofficial Overall Results, China World Summit Wing
Hotel Vertical Run, accessed August 12, 2013, http:// 11. e-Power, accessed August 26, 2013, http://epower.core-
www.verticalrun.cn/downloads/2013_CVR_Results_ mark.com/2013/04/regional-differences-influence-c-
Overall_Unofficial.pdf. store-shopping.
28. PPIHC, accessed August 12, 2013, http://livetiming.net/ 12. Kaiser Health News, accessed August 27, 2013, http://
ppihc. www.kaiserhealthnews.org/Stories/2013/May/28/
29. Taser FOI, accessed August 13, 2013, https://docs. medicare-state-geographic-variation-costs.aspx.
google.com/a/guardian.co.uk/spreadsheet/ccc?key= 13. World Poultry, accessed September 2, 2013, http://www.
0At6CC4x_yBnMdE5wMnpiQ1FmNWV0QVNHWlM worldpoultry.net/Broilers/Nutrition/2013/3/Factors-
2WGhKTWc#gid=0. affecting-feed-intake-of-chickens-1172230W.
30. TriplePundit, accessed August 13, 2013, http://www. 14. Natural Resources Canada, accessed September 2, 2013,
triplepundit.com/2013/05/san-francisco-achieves-highest- http://cfs.nrcan.gc.ca/pages/353.
recycling-rate. 15. Autism, accessed September 2, 2013, http://aut.sagepub.
31. The Week, accessed August 15, 2013, http://theweek. com/content/17/2/184.abstract.
com/article/index/248281/its-not-your-imagination-
16. SFGate, accessed September 4, 2013, http://blog.sfgate.
bmw-drivers-are-the-biggest-jerks.
com/topdown/2013/09/03/the-new-san-francisco-oakland-
32. American Meteor Society, accessed August 15, 2013, bay-bridge-wide-open-vistas-a-bicycle-path-and-a-6-4-
http://www.amsmeteors.org/fireballs/fireball-tracking- billion-price-tag.
system-analysis.
17. European Nuclear Society, accessed September 4, 2013,
33. Predictors of indoor BTEX concentrations in Canadian http://www.euronuclear.org/info/encyclopedia/n/
residences, accessed August 15, 2013, http://www.stat- nuclear-power-plant-world-wide.htm.
can.gc.ca/pub/82-003-x/2013005/article/11793-eng.pdf.
18. fox4kc.com, accessed September 4, 2013, http://fox4kc.
Chapter 11 com/2013/04/09/kan-couple-catches-record-breaking-
1. Based on information from Farm Labor, National Agri- catfish.
cultural Statistics Service, accessed August 19, 2013, 19. Int.J. Environ. Res. Public Health 2013, 10(2), 541–555;
http:/usda01.library.cornell.edu/usda/current/FarmLabo/ doi:10.3390/ijerph10020541 Article Assessment of the
FarmLabo-05-21-2013.pdf. Levels of Airborne Bacteria, Gram-Negative Bacteria,
2. ScienceDaily, accessed August 22, 2013, http://www. and Fungi in Hospital Lobbies Dong-Uk Park1,*, Jeong-
sciencedaily.com/releases/2013/06/130617110931.htm. Kwan Yeom2,3, Won Jae Lee3 and Kyeong-Min Lee1,4
3. National Atmospheric Deposition Program, accessed 20. Based on information from Farm Labor, National Agri-
August 23, 2013, http://nadp.sws.uiuc.edu/MDN/ cultural Statistics Service, accessed August 19, 2013,
mdndata.aspx. http://usda01.library.cornell.edu/usda/current/FarmLabo/
4. USA Today, accessed August 24, 2013, http://www. FarmLabo-05-21-2013.pdf.
usatoday.com/story/money/cars/2013/05/04/worst-traffic-
cities/2127661. Chapter 12
5. Existing U.S. Coal Plants, accessed August 24, 2013, 1. Wind Turbine Noise, accessed September 7, 2013, http://
http://www.sourcewatch.org/index.php/Existing_U.S._ ramblingsdc.net/wtnoise.html.
Coal_Plants#SO2_pollution_and_pollution_controls. 2. GCM Magazine, accessed September 7, 2013, http://
6. Statistics Canada, accessed August 24, 2013, http:// www.gcsaa.org/_common/templates/GcsaaTwoColumn-
www5.statcan.gc.ca/cansim/a47. Layout.aspx?id=6985&LangType=1033.
Notes and Data Sources N-15
3. Mother Jones, accessed September 9, 2013, http://www. 23. European Food Safety Authority, accessed November 3,
motherjones.com/environment/2013/01/lead-crime- 2013, http://www.efsa.europa.eu/en/efsajournal/
link-gasoline. doc/3137.pdf.
4. EurekAlert, accessed September 13, 2013, http://www. 24. Evolution, Medicine, and Public Health, accessed
eurekalert.org/pub_releases/2013-05/atx-sfa051313.php. November 3, 2013, http://emph.oxfordjournals.org/
5. Radiology, accessed September 18, 2013, http://radiol- content/2013/1/173.full.
ogy.rsna.org/content/early/2013/06/03/radiol.13130545. 25. PLOS One, accessed November 3, 2013, http://www.
abstract. plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.
6. Climate Central, accessed October 2, 2013, http://www. pone.0070902.
climatecentral.org/news/study-f inds-plant-growth- 26. Newswire, accessed November 3, 2013, http://www.
surges-as-co2-levels-rise-16094. nielsen.com/us/en/newswire/2013/new-study-confirms-
7. United Nations University, Working Paper Series, correlation-between-twitter-and-tv-ratings.html.
accessed October 2, 2013, http://www.merit.unu.edu/ 27. Morana RTD, accessed November 3, 2013, http://www.
publications/wppdf/2013/wp2013-001.pdf. morana-rtd.com/e-preservationscience/2013/Luciani-
8. Journal of Science and Medicine in Sport, accessed 13-03-2013.pdf.
S eptember 12, 2013, http://www.jsams.org/article/ 28. U.S. Department of Labor, Bureau of Labor Statistics,
S1440-2440%2812%2900207-1/abstract. accessed November 6, 2013, http://www.bls.gov/news.
9. Science Daily, accessed October 9, 2013, http://www. release/cewqtr.t03.htm.
sciencedaily.com/releases/2012/10/121010102156.htm. 29. ISRN Veterinary Science, accessed November 11, 2013,
10. Virginia Cooperative Extension, Small Grains in 2013, http://www.hindawi.com/isrn/veterinary.science/2013/
accessed October 9, 2013, http://www.pubs.ext.vt.edu/ 684353.
CSES/CSES-62/CSES-62_pdf.pdf. 30. MLB Team Payrolls, accessed November 12, 2013,
11. PubMed, accessed October 9, 2013, http://www.ncbi. http://www.stevetheump.com/Payrolls.htm; and Team
nlm.nih.gov/pubmed/22077818. Marketing Report, accessed November 12, 2013, http://
news.cincinnati.com/assets/AB20330243.pdf.
12. National Data Buoy Center, National Oceanic and Atmo-
spheric Administration, http://www.ndbc.noaa.gov. 31. Calorie Comparison of Soy Milk Brands, accessed
November 13, 2013, http://www.fitsugar.com/Calorie-
13. Feeding America, accessed October 9, 2013, http://
Comparison-Soy-Milk-Brands-24224831.
feedingamerica.org/press-room/press-releases/map-the-
meal-gap-2013.aspx. 32. Natural Hazards and Earth System Sciences, accessed
November 13, 2013, http://www.nat-hazards-earth-
14. Canada.com, accessed October 11, 2013, http://o.
syst-sci-discuss.net/1/3449/2013/nhessd-1-3449-2013-
canada.com/business/money/city-of-toronto-parking-
print.pdf.
lots-are-overflowing-costing-drivers-thousands.
33. Financial Forecast Center, accessed November 16, 2013,
15. Gao et al., Regression of NASCAR: Looking into Five
http://www.forecasts.org/data/price-data.htm.
Years of Jimmie Johnson, SAS Global Forum 2013,
accessed October 20, 2013, http://support.sas.com/ 34. GCSAA, GCM Magazine, accessed November 16, 2013,
resources/papers/proceedings13/439-2013.pdf. http://www.gcsaa.org/_common/templates/GcsaaTwo-
ColumnLayout.aspx?id=6985&LangType=1033.
16. DriverAverages.com, accessed October 20, 2013, http://
www.driveraverages.com/nascar_stats. 35. Avian Conservation and Ecology, accessed November 16,
2013, http://www.ace-eco.org/vol8/iss2/art10/#collision.
17. The Weather Underground, accessed October 20, 2013,
http://www.wunderg round.com/histor y/air por t/ 36. L. Xuanyu, The Varination Tendency Analysis on
KAPF/2013/10/20/DailyHistory.html. Contribution Rate to Economic Growth by Production
Factor Input in China, Management Science and Engi-
18. Science Daily, accessed October 22, 2013, http://www.
neering, vol. 7, no. 2, pp. 16–23, accessed November
sciencedaily.com/releases/2013/09/130904203703.htm.
17, 2013.
19. WRisk.com, accessed October 22, 2013, http://www.
37. National Weather Service Forecast Office, Honolulu, HI,
wxrisk.com/2012/11/final-winter-preview-2012-13.
accessed November 18, 2013, http://www.prh.noaa.gov/
20. F. He et al., Salt Intake Is Related to Soft Drink Con- hnl/climate/PHNL_rainfall.php; and Golden Gate
sumption in Children and Adolescents, Hypertension, Weather Services, accessed November 18, 2013, http://
51;629, 2008. ggweather.com/enso/oni.htm.
2 1. Owasco Watershed Lake Association, Inc. 38. Nutrition Express, accessed November 29, 2013, http://
22. USGS, accessed October 24, 2013, http://pubs.usgs.gov/ www.nutritionexpress.com/article+index/newsletters/
ds/774. february+2013/showarticle.aspx?id=1817.
N-16 Notes and Data Sources
39. @CBS Local, accessed November 29, 2013, http:// 52. H. Riojas-Rodriguez et al., Intellectual Function in Mex-
washington.cbslocal.com/2013/10/24/study-more-time- ican Children Living in a Mining Area and Environmen-
spent-online-means-less-work-sleep. tally Exposed to Manganese, Environmental Health
40. StateMaster, accessed November 30, 2013, http://www. Perspectives, vol. 118, pp. 1465–1470, 2010.
statemaster.com/graph/geo_wat_are-geography-water- 53. Based on information from Yanzhou Coal Mining Com-
area; Responsive Management, accessed November 30, pany Limited.
2013, http://www.responsivemanagement.com/download/ 54. 2014 APTA Fact Book Appendix, accessed July 28,
reports/WA_Fish_Marketing_Plan.pdf. 2014, http://www.apta.com/resources/s.
41. The Annals of Occupational Hygiene, accessed November 55. Based on information from S. K. Sinnakaudan et al., Mul-
30, 2013, http://annhyg.oxfordjournals.org/content/48/ tiple Linear Regression Model for Total Bed Material Load
6/547.full. Prediction, Journal of Hydraulic Engineering, May 2006.
56. D. Dumicic et al., Internet Purchases in European Union
Countries: Multiple Linear Regression Approach, Inter-
Chapter 12 WEB
national Journal of Social, Management, Economics,
42. Krejza, J., and Mariak, Z. Effect of Age on Cerebral
and Business Engineering, vol. 8, no. 3, 834–840, 2014.
Blood Flow Velocity in Patients After Aneurysmal
Subarachnoid Hemorrhage, Stroke, vol. 33, no. 2, 57. Based on information from J. Gidjunis, Sprawl Exceeds
pp. 640–642, 2002. Reach of Hydrants, USA Today, August 24, 2007.
43. D. G. Hall, E. J. Wenniger, and M. G. Hentz, Temperature 58. About 1100 Injured as Meterorite Hits Russia with Force
Studies with the Asian Citrus Psyllid, Diaphorina citri: of Atomic Bomb, Associated Press, February 15, 2013.
Cold Hardiness and Temperature Thresholds for Ovipo- 59. Based on information from C. Masters, Props to the New
sition. Journal of Insect Science, vol. 11, art 83, 2011. Turbos, Time Magazine, September 3, 2007.
44. C. Moran, Stress and Emergency Work Experience: A 60. Based on information from Y. S. Han et al., Intraocular
Non-linear Relationship, Disaster Prevention and Man- Pressure and Influencing Systemic Health Parameters in
agement, vol. 7, issue 1, pp. 38–46, 1998. a Korean Population, Indian Journal of Ophthalmology,
45. Wang, S. et al., Arsenic and Fluoride Exposure in Drink- vol. 62, issue 3, pp. 305–301, 2014.
ing Water: Children’s IQ and Growth in Shanyin County, 61. Based on information from CNN Interactive, Some High
Shanxi Province, China, Environmental Health Perspec- Fiber Dry Cereals. Accessed August 17, 2014, http://
tives, vol. 115, no. 4, pp. 643–647, 2007. www.cnn.com/FOOD/resources/food.for.thought/
46. Ramachandran S. et al., Circulation, as reported in USA grains/cereal/compare.dry.fiber.html
Today, July 23, 2007. 62. Simulated Carrying Capacities of Fish in Norwegian Fjord,
47. Ciani E. et al., Neurochemical Correlates of Nicotine Neu- Fisheries Oceanography, vol. 4, no. 1, pp. 17–32, 1995.
rotoxicity on Rat Habenulo-Interpeduncular Cholinergic 63. Swift, P. et al., Residential Street Typology and Injury
Neurons, Neuro Toxicology, vol. 26, pp. 467–474, 2006. Accident Frequency, presented at the Congress for the
48. Based on information from Study: Diet soda linked to New Urbanism, Denver, Colorado, June 1997.
heart risks, USA Today, 7/23/07. Couvreur et al., The
Linear Relationship Between the Proportion of Fresh Chapter 13
Grass in the Cow Diet, Milk Fatty Acid Composition, 1. Daily News, accessed December 2, 2013, http://www.
and Butter Properties, Journal of Dairy Science, vol. 89, nydailynews.com/new-york/new-yorkers-havelongest-
pp. 1956–1969, 2006. commute-times-article-1.1426047.
49. Based on a report of a study done by New Zealand’s 2. Forbes, accessed December 3, 2013, http://www.forbes.
National Research Centre for Growth and Development com/sites/hannylerner/2013/10/05/how-to-hire-the-
and the University of Southhampton in Britain, Yahoo right-employees-and-discover-the-toxic-ones.
News, July 25, 2007. 3. Huff Post Food, accessed December 13, 2013, http://
50. Based on information in the article by S. Mishra et al, Mod- www.huff ingtonpost.com/john-dick/thanksgiving-
eling of Yarn Strength Utilization in Cotton Woven Fabrics traditions_b_4297296.html.
Using Multiple Linear Regression, Journal of Engineered 4. American Association of Individual Investors, accessed
Fibers and Fabrics, vol. 9, issue 2, pp. 105–111, 2014. December 15, 2013, http://www.aaii.com/sentiment-
51. Fisher, J. B. et al.,Wood Vessel Diameter Is Related to survey.
Elevation and Genotype in the Hawaiian Tree Metrosideros 5. West Bay Yacht Club, accessed December 15, 2013,
polymorpha (Myrtaceae), American Journal of Botany, http://westbayyachtclub.org/assets/2013-wbyc-cruise-
vol. 94, pp. 709–715, 2007. preference-survey-results.pdf.
Notes and Data Sources N-17
6. Centers for Disease Control and Prevention, accessed 21. The Economics of Law Practice in Ohio in 2013, Ohio
December 15, 2013, http://www.cdc.gov/flu/weekly. State Bar Association, accessed December 18, 2013,
7. Statistics Canada, accessed December 15, 2013, http:// https://www.ohiobar.org/NewsAndPublications/
www.statcan.gc.ca/pub/75-006-x/2013001/article/ Documents/OSBA_EconOfLawPracticeOhio.pdf.
11775/info-eng.htm. 22. Silicon Valley Bank Wine Report, State of the Wine
8. Rasmussen Reports, accessed December 16, 2013, http:// Industry 2013, accessed December 18, 2013, www.svb.
www.rasmussenreports.com/public_content/lifestyle/ com/wine-report-2013-pdf.
general_lifestyle/july_2013/58_eat_at_a_restaurant_at_
least_once_a_week. Chapter 14
9. Switchboard, accessed December 16, 2013, http:// 1. How Much Damage Did the Deepwater Horizon Spill
switchboard.nrdc.org/blogs/kbenfield/new_realtors_ Do to the Gulf of Mexico?, Scientific American, accessed
community_prefere.html. December 19, 2013, http://www.scientificamerican.com/
10. The University of Maine, Miscellaneous Reports, Maine article.cfm?id5how-much-damage-deepwater-horizon-
Agricultural and Forest Experiment Station, accessed gulf-mexico.
December 16, 2013, http://digitalcommons.library. 2. EPA, Coastal Water Sampling, accessed December 20,
umaine.edu/cgi/viewcontent.cgi?article=1017&context= 2013, http://www.epa.gov/bpspill/water.html#mapreports.
aes_miscreports. 3. YCharts, accessed December 24, 2013, http://ycharts.
11. MSN Money, accessed December 16, 2013, http:// com/indicators/average_duration_of_unemployment.
money.msn.com/investing/2013-customer-service-hall- 4. Health Extremist, accessed December 24, 2013, http://
of-shame. www.healthextremist.com/are-eggs-good-for-you-30-
12. 2013 Capital Bikeshare Member Survey Report, reasons-to-eat-eggs.
accessed December 16, 2013, http://capitalbikeshare. 5. League of Denial: The NFL’s Concussion Crisis, accessed
com/assets/pdf/CABI-2013SurveyReport.pdf. December 26, 2013, http://www.pbs.org/wgbh/pages/
13. J. G. Berry et al., Hospital Utilization and Characteristics frontline/sports/league-of-denial/timeline-the-nfls-
of Patients Experiencing Recurrent Readmissions Within concussion-crisis.
Children’s Hospitals, The Journal of the American Medi- 6. NBC News Health, accessed December 26, 2013, http://
cal Association, 305(7):682–690, 2011. www.nbcnews.com/health/women-may-have-harder-
14. DraftHistory.com, accessed December 18, 2013, http:// concussion-recovery-men-2D11603652.
www.drafthistory.com/index.php/summary/summary. 7. USA Today, accessed December 26, 2013, http://www.
15. Public Health Agency of Canada, Statement of Seasonal usatoday.com/story/money/cars/2013/09/04/record-
Influenza Vaccine for 2013-2014, accessed December price-new-car-august/2761341.
18, 2013, http://www.phac-aspc.gc.ca/publicat/ccdr- 8. StreetsBlog.org, accessed December 26, 2013, http://
rmtc/13vol39/acs-dcc-4/#rec. www.streetsblog.org/2013/09/05/after-the-addition-of-
16. Centers for Disease Control and Prevention, Morbidity bike-lanes-and-plazas-manhattan-traffic-moves-faster.
and Mortality Weekly Report, accessed December 18, 9. American Heart Association, accessed December 26,
2013, http://www.cdc.gov/mmwr/preview/mmwrhtml/ 2013, http://www.heart.org/HEARTORG/Conditions/
mm5919a2.htm. Arrhythmia/AboutArrhythmia/Atrial-Fibrillation-
17. City of Oakland Shot Spotter Report, October 2013, Medications_UCM_423781_Article.jsp#.
accessed December 18, 2013, http://www2.oaklandnet. 10. National Heart, Lung, and Blood Institute, accessed
com/oakca1/groups/police/documents/webcontent/ December 27, 2013, http://www.nhlbi.nih.gov/health/
oak043887.pdf. health-topics/topics/lungtxp/risks.html.
18. USA Today Snapshot, accessed December 18, 2013, 11. Yahoo News Canada, accessed December 28, 2013,
http://www.usatoday.com. http://ca.news.yahoo.com/blogs/geekquinox/ontario-
19. Europe’s Most Desirable Logistics Locations, ProLogis, grand-river-tests-highest-world-artificial-sweeteners-
accessed December 19, 2013, http://www.portofantwerp. 200300835.html.
com/sites/portofantwerp/files/imce/WinningLocations 12. Mail Online, accessed December 27, 2013, http://www.
Europe_0.pdf. dailymail.co.uk/health/article-2179572/Night-shifts-
20. 2013 Canadian University Survey Consortium (CUSC): raise-risk-heart-attacks-strokes-40-cent.html.
First-Year Undergraduate Students, Carleton University, 13. Mark’s Daily Apple, accessed December 30, 2013, http://
accessed December 18, 2013, http://oirp.carleton.ca/ www.marksdailyapple.com/how-much-glucose-does-
surveys/CUSC2013_summary.pdf. your-brain-really-need/#axzz2oxZxwtXj.
N-18 Notes and Data Sources
14. CreditCards.com, accessed on PR Newswire, December 30. StarTribune, Lifestyle, accessed January 6, 2014, http://
30, 2013, http://www.prnewswire.com/news-releases/ www.startribune.com/lifestyle/208529101.html.
creditcardscom-weekly-credit-card-rate-report-credit- 31. Kellogg School of Management, accessed January 6,
card-interest-rates-rise-to-1506-percent-232734271. 2013, http://www.kellogg.northwestern.edu/programs/
html. fulltimemba/faqs.aspx.
15. Spending & Saving Tracker, accessed December 30, 32. Science Daily, accessed January 6, 2014, http://www.
2013, http://amexspendsave.mediaroom.com/index. sciencedaily.com/releases/2013/05/130515174407.htm.
php?s534135&item519#assets_123.
33. P. Frase, Sociology, CUNY Graduate Center, accessed
16. CBC News Health, accessed December 30, 2013, http:// January 6, 2014, http://www.peterfrase.com/category/
www.cbc.ca/news/health/millions-of-acres-of-china- social-science/statistical-graphics-social-science.
farmland-too-polluted-to-grow-food-1.2479102.
34. F. G. Silversides and K. Budgell, The Relationships
17. A Comparison of 2013 Dairy Policy Alternatives on Among Measures of Egg Albumen Height, pH, and
Dairy Markets, accessed December 30, 2013, http:// Whipping Volume, Poultry Science, accessed January 6,
www.nmpf.org/files/AMAP2013DSADFA.pdf. 2014, http://ps.fass.org/content/83/10/1619.long.
18. Opportunity Index, accessed December 31, 2013, http:// 35. Psychology Today, accessed on January 6, 2014, http://
opportunityindex.org/#4.00/40.00/-97.00. w w w. p s y c h o l og y t o d ay. c o m / bl og / o u r- g e n d e r-
19. Science Daily, accessed January 1, 2014, http://www. ourselves/201309/what-your-selfies-say-about-you.
sciencedaily.com/releases/2013/11/131112141236.htm. 36. Lake Powell Water Database, accessed January 8, 2014,
20. Sports Illustrated, Inside the NHL, accessed January 2, http://lakepowell.water-data.com.
2014, http://sportsillustrated.cnn.com/nhl/news/20130308/ 37. The Daily Herald, accessed January 8, 2014, http://www.
top-10-nhl-slap-shots-of-all-time. thedailyherald.com/index.php?option5com_content&
21. European Environmental Agency, accessed January 2, view5article&id545016:a-minute-with-met-opera-
2014, http://www.eea.europa.eu/highlights/co2-emissions- chief-peter-gelb-on-live-opera-broadcasts&catid518:
from-new-vans. entertainment&Itemid529.
22. WGMS and NSIDC, 1989, updated 2012, World Glacier 38. Environmental Health Perspectives, accessed January 8,
Inventory. Compiled and made available by the World 2014, http://www.ncbi.nlm.nih.gov/pmc/articles/
Glacier Monitoring Service, Zurich, Switzerland, and PMC2974689.
the National Snow and Ice Data Center, Boulder CO, 39. Huff Post, accessed January 8, 2014, http://www.huff
U.S.A., doi:10.7265/N5/NSIDC-WGI-2012-02. ingtonpost.com/2014/01/07/mimo-baby-monitor_
23. Flo Track, accessed January 3, 2014, http://www.flotrack. n_4556461.html.
org/coverage/250963-New-York-City-M arathon- 40. Speedski-Info, accessed January 8, 2014, http://www.
2013/article/22961-RESULTS-ING-New-York-City- speedski-info.com/index_E.php.
Marathon-2013. 41. Worldsteel Association, accessed January 9, 2014, http://
24. Supernova discovery statistics for 2013, accessed January www.worldsteel.org/statistics/crude-steel-production.
3, 2014, http://www.rochesterastronomy.org/sn2013/ html.
snstats.html. 42. Yahoo News, accessed January 8, 2014, http://gma.
25. S. Tobak, 9 Ways to Make a Million, accessed January 3, y a h o o . c o m / t a l l e s t - h o t e l - a m e r i c a - o p e n s - yo r k -
2014, http://smallbusiness.yahoo.com/advisor/9-ways- city-015502341–abc-news-travel.html.
to-make-a-million-203855561.html. 43. CBC News Toronto, accessed January 8, 2014, http://
26. CBC News, Ottawa, accessed January 6, 2014, http:// www.cbc.ca/news/canada/toronto/bosses-should-stop-
www.cbc.ca/news/canada/ottawa/propane-shipping- asking-workers-for-sick-notes-oma-head-says-1.2489885.
delays-hit-eastern-ontario-1.2483820. 44. E. Rabiei, U. Haberlandt, M. Sester, and D. Fitzner, Rain-
27. Mail Online, accessed January 6, 2014, http://www. fall Estimation Using Moving Cars as Rain Gauges—
dailymail.co.uk/news/article-2531945/Brad-Pitts- Laboratory Experiments. Hydrology and Earth System
Hurricane-Katrina-charity-fire-homes-rotting.html. Sciences, 17:4701–4712, 2013, accessed January 9, 2014.
28. IT World Canada, accessed January 6, 2014, http://www. 45. College Football Statistics, accessed January 9, 2014,
itworldcanada.com/article/canucks-split-on-government- http://www.cfbstats.com.
reading-email-survey/84227. 46. CNN Travel, accessed January 10, 2014, http://www.
29. CBS News, accessed January 6, 2014, http://www. cnn.com/2013/12/17/travel/four-innovative-ways-cut-
cbsnews.com/news/the-kings-popularity-constant. boarding-planes.
Tables Appendix
T-1
T-2 Tabl e s Appendi x
k50
n55 p
x 0.01 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95 0.99
0 0.9510 0.7738 0.5905 0.3277 0.2373 0.1681 0.0778 0.0313 0.0102 0.0024 0.0010 0.0003 0.0000
1 0.9990 0.9774 0.9185 0.7373 0.6328 0.5282 0.3370 0.1875 0.0870 0.0308 0.0156 0.0067 0.0005 0.0000
2 1.0000 0.9988 0.9914 0.9421 0.8965 0.8369 0.6826 0.5000 0.3174 0.1631 0.1035 0.0579 0.0086 0.0012 0.0000
3 1.0000 0.9995 0.9933 0.9844 0.9692 0.9130 0.8125 0.6630 0.4718 0.3672 0.2627 0.0815 0.0226 0.0010
4 1.0000 0.9997 0.9990 0.9976 0.9898 0.9688 0.9222 0.8319 0.7627 0.6723 0.4095 0.2262 0.0490
n 5 10 p
x 0.01 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95 0.99
0 0.9044 0.5987 0.3487 0.1074 0.0563 0.0282 0.0060 0.0010 0.0001 0.0000
1 0.9957 0.9139 0.7361 0.3758 0.2440 0.1493 0.0464 0.0107 0.0017 0.0001 0.0000 0.0000
2 0.9999 0.9885 0.9298 0.6778 0.5256 0.3828 0.1673 0.0547 0.0123 0.0016 0.0004 0.0001 0.0000
3 1.0000 0.9990 0.9872 0.8791 0.7759 0.6496 0.3823 0.1719 0.0548 0.0106 0.0035 0.0009 0.0000
4 0.9999 0.9984 0.9672 0.9219 0.8497 0.6331 0.3770 0.1662 0.0473 0.0197 0.0064 0.0001 0.0000
5 1.0000 0.9999 0.9936 0.9803 0.9527 0.8338 0.6230 0.3669 0.1503 0.0781 0.0328 0.0016 0.0001
6 1.0000 0.9991 0.9965 0.9894 0.9452 0.8281 0.6177 0.3504 0.2241 0.1209 0.0128 0.0010 0.0000
7 0.9999 0.9996 0.9984 0.9877 0.9453 0.8327 0.6172 0.4744 0.3222 0.0702 0.0115 0.0001
8 1.0000 1.0000 0.9999 0.9983 0.9893 0.9536 0.8507 0.7560 0.6242 0.2639 0.0861 0.0043
9 1.0000 0.9999 0.9990 0.9940 0.9718 0.9437 0.8926 0.6513 0.4013 0.0956
n 5 15 p
x 0.01 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95 0.99
n 5 20 p
x 0.01 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95 0.99
n 5 25 p
x 0.01 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95 0.99
k50
l
x 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50
0 0.9512 0.9048 0.8607 0.8187 0.7788 0.7408 0.7047 0.6703 0.6376 0.6065
1 0.9988 0.9953 0.9898 0.9825 0.9735 0.9631 0.9513 0.9384 0.9246 0.9098
2 1.0000 0.9998 0.9995 0.9989 0.9978 0.9964 0.9945 0.9921 0.9891 0.9856
3 1.0000 1.0000 0.9999 0.9999 0.9997 0.9995 0.9992 0.9988 0.9982
4 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999 0.9998
5 1.0000 1.0000 1.0000
l
x 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
0 0.5769 0.5488 0.5220 0.4966 0.4724 0.4493 0.4274 0.4066 0.3867 0.3679
1 0.8943 0.8781 0.8614 0.8442 0.8266 0.8088 0.7907 0.7725 0.7541 0.7358
2 0.9815 0.9769 0.9717 0.9659 0.9595 0.9526 0.9451 0.9371 0.9287 0.9197
3 0.9975 0.9966 0.9956 0.9942 0.9927 0.9909 0.9889 0.9865 0.9839 0.9810
4 0.9997 0.9996 0.9994 0.9992 0.9989 0.9986 0.9982 0.9977 0.9971 0.9963
5 1.0000 1.0000 0.9999 0.9999 0.9999 0.9998 0.9997 0.9997 0.9995 0.9994
6 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999
7 1.0000 1.0000
l
x 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
0 0.3329 0.3012 0.2725 0.2466 0.2231 0.2019 0.1827 0.1653 0.1496 0.1353
1 0.6990 0.6626 0.6268 0.5918 0.5578 0.5249 0.4932 0.4628 0.4337 0.4060
2 0.9004 0.8795 0.8571 0.8335 0.8088 0.7834 0.7572 0.7306 0.7037 0.6767
3 0.9743 0.9662 0.9569 0.9463 0.9344 0.9212 0.9068 0.8913 0.8747 0.8571
4 0.9946 0.9923 0.9893 0.9857 0.9814 0.9763 0.9704 0.9636 0.9559 0.9473
5 0.9990 0.9985 0.9978 0.9968 0.9955 0.9940 0.9920 0.9896 0.9868 0.9834
6 0.9999 0.9997 0.9996 0.9994 0.9991 0.9987 0.9981 0.9974 0.9966 0.9955
7 1.0000 1.0000 0.9999 0.9999 0.9998 0.9997 0.9996 0.9994 0.9992 0.9989
8 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999 0.9998 0.9998
9 1.0000 1.0000 1.0000 1.0000 1.0000
Tab l e s A pp end ix T-5
l
x 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0
0 0.1225 0.1108 0.1003 0.0907 0.0821 0.0743 0.0672 0.0608 0.0550 0.0498
1 0.3796 0.3546 0.3309 0.3084 0.2873 0.2674 0.2487 0.2311 0.2146 0.1991
2 0.6496 0.6227 0.5960 0.5697 0.5438 0.5184 0.4936 0.4695 0.4460 0.4232
3 0.8386 0.8194 0.7993 0.7787 0.7576 0.7360 0.7141 0.6919 0.6696 0.6472
4 0.9379 0.9275 0.9162 0.9041 0.8912 0.8774 0.8629 0.8477 0.8318 0.8153
5 0.9796 0.9751 0.9700 0.9643 0.9580 0.9510 0.9433 0.9349 0.9258 0.9161
6 0.9941 0.9925 0.9906 0.9884 0.9858 0.9828 0.9794 0.9756 0.9713 0.9665
7 0.9985 0.9980 0.9974 0.9967 0.9958 0.9947 0.9934 0.9919 0.9901 0.9881
8 0.9997 0.9995 0.9994 0.9991 0.9989 0.9985 0.9981 0.9976 0.9969 0.9962
9 0.9999 0.9999 0.9999 0.9998 0.9997 0.9996 0.9995 0.9993 0.9991 0.9989
10 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999 0.9999 0.9998 0.9998 0.9997
11 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999
12 1.0000 1.0000
l
x 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0
0 0.0450 0.0408 0.0369 0.0334 0.0302 0.0273 0.0247 0.0224 0.0202 0.0183
1 0.1847 0.1712 0.1586 0.1468 0.1359 0.1257 0.1162 0.1074 0.0992 0.0916
2 0.4012 0.3799 0.3594 0.3397 0.3208 0.3027 0.2854 0.2689 0.2531 0.2381
3 0.6248 0.6025 0.5803 0.5584 0.5366 0.5152 0.4942 0.4735 0.4532 0.4335
4 0.7982 0.7806 0.7626 0.7442 0.7254 0.7064 0.6872 0.6678 0.6484 0.6288
5 0.9057 0.8946 0.8829 0.8705 0.8576 0.8441 0.8301 0.8156 0.8006 0.7851
6 0.9612 0.9554 0.9490 0.9421 0.9347 0.9267 0.9182 0.9091 0.8995 0.8893
7 0.9858 0.9832 0.9802 0.9769 0.9733 0.9692 0.9648 0.9599 0.9546 0.9489
8 0.9953 0.9943 0.9931 0.9917 0.9901 0.9883 0.9863 0.9840 0.9815 0.9786
9 0.9986 0.9982 0.9978 0.9973 0.9967 0.9960 0.9952 0.9942 0.9931 0.9919
10 0.9996 0.9995 0.9994 0.9992 0.9990 0.9987 0.9984 0.9981 0.9977 0.9972
11 0.9999 0.9999 0.9998 0.9998 0.9997 0.9996 0.9995 0.9994 0.9993 0.9991
12 1.0000 1.0000 1.0000 0.9999 0.9999 0.9999 0.9999 0.9998 0.9998 0.9997
13 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999
14 1.0000 1.0000
T-6 Tabl es Append i x
l
x 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0
0 0.0166 0.0150 0.0136 0.0123 0.0111 0.0101 0.0091 0.0082 0.0074 0.0067
1 0.0845 0.0780 0.0719 0.0663 0.0611 0.0563 0.0518 0.0477 0.0439 0.0404
2 0.2238 0.2102 0.1974 0.1851 0.1736 0.1626 0.1523 0.1425 0.1333 0.1247
3 0.4142 0.3954 0.3772 0.3594 0.3423 0.3257 0.3097 0.2942 0.2793 0.2650
4 0.6093 0.5898 0.5704 0.5512 0.5321 0.5132 0.4946 0.4763 0.4582 0.4405
5 0.7693 0.7531 0.7367 0.7199 0.7029 0.6858 0.6684 0.6510 0.6335 0.6160
6 0.8786 0.8675 0.8558 0.8436 0.8311 0.8180 0.8046 0.7908 0.7767 0.7622
7 0.9427 0.9361 0.9290 0.9214 0.9134 0.9049 0.8960 0.8867 0.8769 0.8666
8 0.9755 0.9721 0.9683 0.9642 0.9597 0.9549 0.9497 0.9442 0.9382 0.9319
9 0.9905 0.9889 0.9871 0.9851 0.9829 0.9805 0.9778 0.9749 0.9717 0.9682
10 0.9966 0.9959 0.9952 0.9943 0.9933 0.9922 0.9910 0.9896 0.9880 0.9863
11 0.9989 0.9986 0.9983 0.9980 0.9976 0.9971 0.9966 0.9960 0.9953 0.9945
12 0.9997 0.9996 0.9995 0.9993 0.9992 0.9990 0.9988 0.9986 0.9983 0.9980
14 0.9999 0.9999 0.9998 0.9998 0.9997 0.9997 0.9996 0.9995 0.9994 0.9993
15 1.0000 1.0000 1.0000 0.9999 0.9999 0.9999 0.9999 0.9999 0.9998 0.9998
16 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9999
17 1.0000 1.0000
l
x 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
0 0.0041 0.0025 0.0015 0.0009 0.0006 0.0003 0.0002 0.0001 0.0001 0.0000
1 0.0266 0.0174 0.0113 0.0073 0.0047 0.0030 0.0019 0.0012 0.0008 0.0005
2 0.0884 0.0620 0.0430 0.0296 0.0203 0.0138 0.0093 0.0062 0.0042 0.0028
3 0.2017 0.1512 0.1118 0.0818 0.0591 0.0424 0.0301 0.0212 0.0149 0.0103
4 0.3575 0.2851 0.2237 0.1730 0.1321 0.0996 0.0744 0.0550 0.0403 0.0293
5 0.5289 0.4457 0.3690 0.3007 0.2414 0.1912 0.1496 0.1157 0.0885 0.0671
6 0.6860 0.6063 0.5265 0.4497 0.3782 0.3134 0.2562 0.2068 0.1649 0.1301
7 0.8095 0.7440 0.6728 0.5987 0.5246 0.4530 0.3856 0.3239 0.2687 0.2202
8 0.8944 0.8472 0.7916 0.7291 0.6620 0.5925 0.5231 0.4557 0.3918 0.3328
9 0.9462 0.9161 0.8774 0.8305 0.7764 0.7166 0.6530 0.5874 0.5218 0.4579
10 0.9747 0.9574 0.9332 0.9015 0.8622 0.8159 0.7634 0.7060 0.6453 0.5830
11 0.9890 0.9799 0.9661 0.9467 0.9208 0.8881 0.8487 0.8030 0.7520 0.6968
12 0.9955 0.9912 0.9840 0.9730 0.9573 0.9362 0.9091 0.8758 0.8364 0.7916
13 0.9983 0.9964 0.9929 0.9872 0.9784 0.9658 0.9486 0.9261 0.8981 0.8645
14 0.9994 0.9986 0.9970 0.9943 0.9897 0.9827 0.9726 0.9585 0.9400 0.9165
15 0.9998 0.9995 0.9988 0.9976 0.9954 0.9918 0.9862 0.9780 0.9665 0.9513
16 0.9999 0.9998 0.9996 0.9990 0.9980 0.9963 0.9934 0.9889 0.9823 0.9730
17 1.0000 0.9999 0.9998 0.9996 0.9992 0.9984 0.9970 0.9947 0.9911 0.9857
18 1.0000 0.9999 0.9999 0.9997 0.9993 0.9987 0.9976 0.9957 0.9928
19 1.0000 1.0000 0.9999 0.9997 0.9995 0.9989 0.9980 0.9965
20 1.0000 0.9999 0.9998 0.9996 0.9991 0.9984
21 1.0000 0.9999 0.9998 0.9996 0.9993
22 1.0000 0.9999 0.9999 0.9997
23 1.0000 0.9999 0.9999
24 1.0000 1.0000
Tab l e s A pp e nd ix T-7
0 z
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
23.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
23.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
23.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
23.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
23.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
22.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
22.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
22.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
22.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
22.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
22.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
22.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
22.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
22.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
22.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
21.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
21.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
21.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
21.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
21.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
21.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
21.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
21.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
21.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
21.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
20.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
20.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
20.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
20.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
20.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
20.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
20.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
20.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
20.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
20.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
T-8 Tabl es Appe ndi x
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
0 t,
a
n 0.20 0.10 0.05 0.025 0.01 0.005 0.001 0.0005 0.0001
This table contains critical values associated with the chi-square distribution, x2a,n, defined
by a and the degrees of freedom, n.
a
n 0.9999 0.9995 0.999 0.995 0.99 0.975 0.95 0.90
a
n 0.10 0.05 0.025 0.01 0.005 0.001 0.0005 0.0001
a 5 0.05 n1
n2 1 2 3 4 5 6 7 8 9 10 15 20 30 40 50 60 100
1 161.45 199.50 215.71 224.58 230.16 233.99 236.77 238.88 240.54 241.88 245.95 248.01 250.10 251.14 251.77 252.20 253.04
2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40 19.43 19.45 19.46 19.47 19.48 19.48 19.49
3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.70 8.66 8.62 8.59 8.58 8.57 8.55
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.86 5.80 5.75 5.72 5.70 5.69 5.66
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.62 4.56 4.50 4.46 4.44 4.43 4.41
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 3.94 3.87 3.81 3.77 3.75 3.74 3.71
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.51 3.44 3.38 3.34 3.32 3.30 3.27
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.22 3.15 3.08 3.04 3.02 3.01 2.97
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.01 2.94 2.86 2.83 2.80 2.79 2.76
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.85 2.77 2.70 2.66 2.64 2.62 2.59
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.72 2.65 2.57 2.53 2.51 2.49 2.46
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.62 2.54 2.47 2.43 2.40 2.38 2.35
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.53 2.46 2.38 2.34 2.31 2.30 2.26
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.46 2.39 2.31 2.27 2.24 2.22 2.19
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.40 2.33 2.25 2.20 2.18 2.16 2.12
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.35 2.28 2.19 2.15 2.12 2.11 2.07
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.31 2.23 2.15 2.10 2.08 2.06 2.02
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.27 2.19 2.11 2.06 2.04 2.02 1.98
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.23 2.16 2.07 2.03 2.00 1.98 1.94
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.20 2.12 2.04 1.99 1.97 1.95 1.91
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.18 2.10 2.01 1.96 1.94 1.92 1.88
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.15 2.07 1.98 1.94 1.91 1.89 1.85
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 2.13 2.05 1.96 1.91 1.88 1.86 1.82
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.11 2.03 1.94 1.89 1.86 1.84 1.80
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 2.09 2.01 1.92 1.87 1.84 1.82 1.78
30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.01 1.93 1.84 1.79 1.76 1.74 1.70
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 1.92 1.84 1.74 1.69 1.66 1.64 1.59
50 4.03 3.18 2.79 2.56 2.40 2.29 2.20 2.13 2.07 2.03 1.87 1.78 1.69 1.63 1.60 1.58 1.52
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.84 1.75 1.65 1.59 1.56 1.53 1.48
100 3.94 3.09 2.70 2.46 2.31 2.19 2.10 2.03 1.97 1.93 1.77 1.68 1.57 1.52 1.48 1.45 1.39
T-13
T-14
Table VII Critical Values for the F Distribution (Continued)
a 5 0.01 n1
n2 1 2 3 4 5 6 7 8 9 10 15 20 30 40 50 60 100
2 98.50 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40 99.43 99.45 99.47 99.47 99.48 99.48 99.49
3 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 27.23 26.87 26.69 26.50 26.41 26.35 26.32 26.24
4 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.55 14.20 14.02 13.84 13.75 13.69 13.65 13.58
5 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 10.05 9.72 9.55 9.38 9.29 9.24 9.20 9.13
6 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.56 7.40 7.23 7.14 7.09 7.06 6.99
7 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.31 6.16 5.99 5.91 5.86 5.82 5.75
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.52 5.36 5.20 5.12 5.07 5.03 4.96
9 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 4.96 4.81 4.65 4.57 4.52 4.48 4.41
10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.56 4.41 4.25 4.17 4.12 4.08 4.01
11 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.25 4.10 3.94 3.86 3.81 3.78 3.71
12 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.01 3.86 3.70 3.62 3.57 3.54 3.47
13 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10 3.82 3.66 3.51 3.43 3.38 3.34 3.27
14 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94 3.66 3.51 3.35 3.27 3.22 3.18 3.11
15 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.52 3.37 3.21 3.13 3.08 3.05 2.98
16 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.41 3.26 3.10 3.02 2.97 2.93 2.86
17 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.31 3.16 3.00 2.92 2.87 2.83 2.76
18 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51 3.23 3.08 2.92 2.84 2.78 2.75 2.68
19 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.15 3.00 2.84 2.76 2.71 2.67 2.60
20 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.09 2.94 2.78 2.69 2.64 2.61 2.54
21 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 3.31 3.03 2.88 2.72 2.64 2.58 2.55 2.48
22 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 2.98 2.83 2.67 2.58 2.53 2.50 2.42
23 7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 2.93 2.78 2.62 2.54 2.48 2.45 2.37
24 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17 2.89 2.74 2.58 2.49 2.44 2.40 2.33
25 7.77 5.57 4.68 4.18 3.85 3.63 3.46 3.32 3.22 3.13 2.85 2.70 2.54 2.45 2.40 2.36 2.29
30 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98 2.70 2.55 2.39 2.30 2.25 2.21 2.13
40 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80 2.52 2.37 2.20 2.11 2.06 2.02 1.94
50 7.17 5.06 4.20 3.72 3.41 3.19 3.02 2.89 2.78 2.70 2.42 2.27 2.10 2.01 1.95 1.91 1.82
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.35 2.20 2.03 1.94 1.88 1.84 1.75
100 6.90 4.82 3.98 3.51 3.21 2.99 2.82 2.69 2.59 2.50 2.22 2.07 1.89 1.80 1.74 1.69 1.60
Table VII Critical Values for the F Distribution (Continued)
a 5 0.01 n1
n2 1 2 3 4 5 6 7 8 9 10 15 20 30 40 50 60 100
2 998.50 999.00 999.17 999.25 999.30 999.33 999.36 999.37 999.39 999.40 999.43 999.45 999.47 999.47 999.48 999.48 999.49
3 167.03 148.50 141.11 137.10 134.58 132.85 131.58 130.62 129.86 129.25 127.37 126.42 125.45 124.96 124.66 124.47 124.07
4 74.14 61.25 56.18 53.44 51.71 50.53 49.66 49.00 48.47 48.05 46.76 46.10 45.43 45.09 44.88 44.75 44.47
5 47.18 37.12 33.20 31.09 29.75 28.83 28.16 27.65 27.24 26.92 25.91 25.39 24.87 24.60 24.44 24.33 24.12
6 35.51 27.00 23.70 21.92 20.80 20.03 19.46 19.03 18.69 18.41 17.56 17.12 16.67 16.44 16.31 16.21 16.03
7 29.25 21.69 18.77 17.20 16.21 15.52 15.02 14.63 14.33 14.08 13.32 12.93 12.53 12.33 12.20 12.12 11.95
8 25.41 18.49 15.83 14.39 13.48 12.86 12.40 12.05 11.77 11.54 10.84 10.48 10.11 9.92 9.80 9.73 9.57
9 22.86 16.39 13.90 12.56 11.71 11.13 10.70 10.37 10.11 9.89 9.24 8.90 8.55 8.37 8.26 8.19 8.04
10 21.04 14.91 12.55 11.28 10.48 9.93 9.52 9.20 8.96 8.75 8.13 7.80 7.47 7.30 7.19 7.12 6.98
11 19.69 13.81 11.56 10.35 9.58 9.05 8.66 8.35 8.12 7.92 7.32 7.01 6.68 6.52 6.42 6.35 6.21
12 18.64 12.97 10.80 9.63 8.89 8.38 8.00 7.71 7.48 7.29 6.71 6.40 6.09 5.93 5.83 5.76 5.63
13 17.82 12.31 10.21 9.07 8.35 7.86 7.49 7.21 6.98 6.80 6.23 5.93 5.63 5.47 5.37 5.30 5.17
14 17.14 11.78 9.73 8.62 7.92 7.44 7.08 6.80 6.58 6.40 5.85 5.56 5.25 5.10 5.00 4.94 4.81
15 16.59 11.34 9.34 8.25 7.57 7.09 6.74 6.47 6.26 6.08 5.54 5.25 4.95 4.80 4.70 4.64 4.51
16 16.12 10.97 9.01 7.94 7.27 6.80 6.46 6.19 5.98 5.81 5.27 4.99 4.70 4.54 4.45 4.39 4.26
17 15.72 10.66 8.73 7.68 7.02 6.56 6.22 5.96 5.75 5.58 5.05 4.78 4.48 4.33 4.24 4.18 4.05
18 15.38 10.39 8.49 7.46 6.81 6.35 6.02 5.76 5.56 5.39 4.87 4.59 4.30 4.15 4.06 4.00 3.87
19 15.08 10.16 8.28 7.27 6.62 6.18 5.85 5.59 5.39 5.22 4.70 4.43 4.14 3.99 3.90 3.84 3.71
20 14.82 9.95 8.10 7.10 6.46 6.02 5.69 5.44 5.24 5.08 4.56 4.29 4.00 3.86 3.77 3.70 3.58
21 14.59 9.77 7.94 6.95 6.32 5.88 5.56 5.31 5.11 4.95 4.44 4.17 3.88 3.74 3.64 3.58 3.46
22 14.38 9.61 7.80 6.81 6.19 5.76 5.44 5.19 4.99 4.83 4.33 4.06 3.78 3.63 3.54 3.48 3.35
23 14.20 9.47 7.67 6.70 6.08 5.65 5.33 5.09 4.89 4.73 4.23 3.96 3.68 3.53 3.44 3.38 3.25
24 14.03 9.34 7.55 6.59 5.98 5.55 5.23 4.99 4.80 4.64 4.14 3.87 3.59 3.45 3.36 3.29 3.17
25 13.88 9.22 7.45 6.49 5.89 5.46 5.15 4.91 4.71 4.56 4.06 3.79 3.52 3.37 3.28 3.22 3.09
30 13.29 8.77 7.05 6.12 5.53 5.12 4.82 4.58 4.39 4.24 3.75 3.49 3.22 3.07 2.98 2.92 2.79
40 12.61 8.25 6.59 5.70 5.13 4.73 4.44 4.21 4.02 3.87 3.40 3.14 2.87 2.73 2.64 2.57 2.44
50 12.22 7.96 6.34 5.46 4.90 4.51 4.22 4.00 3.82 3.67 3.20 2.95 2.68 2.53 2.44 2.38 2.25
60 11.97 7.77 6.17 5.31 4.76 4.37 4.09 3.86 3.69 3.54 3.08 2.83 2.55 2.41 2.32 2.25 2.12
100 11.50 7.41 5.86 5.02 4.48 4.11 3.83 3.61 3.44 3.30 2.84 2.59 2.32 2.17 2.08 2.01 1.87
T-15
T-16
Table VIII Critical Values for the Studentized Range Distribution
This table contains critical values associated with the Studentized range distribution, Qa,k,n, defined by a, and the degrees of freedom k and n,where k is the number of
degrees of freedom in the numerator (the number of treatment groups) and n is the number of degrees of freedom in the denominator.
a 5 0.05 k
n 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 6.085 8.331 9.798 10.881 11.734 12.434 13.027 13.538 13.987 14.387 14.747 15.076 15.375 15.650 15.905 16.143 16.365 16.573 16.769
3 4.501 5.910 6.825 7.502 8.037 8.478 8.852 9.177 9.462 9.717 9.946 10.155 10.346 10.522 10.686 10.838 10.980 11.114 11.240
4 3.926 5.040 5.757 6.287 6.706 7.053 7.347 7.602 7.826 8.027 8.208 8.373 8.524 8.664 8.793 8.914 9.027 9.133 9.233
5 3.635 4.602 5.218 5.673 6.033 6.330 6.582 6.801 6.995 7.167 7.324 7.465 7.596 7.716 7.828 7.932 8.030 8.122 8.208
6 3.460 4.339 4.896 5.305 5.629 5.895 6.122 6.319 6.493 6.649 6.789 6.917 7.034 7.143 7.244 7.338 7.426 7.509 7.587
7 3.344 4.165 4.681 5.060 5.359 5.606 5.815 5.997 6.158 6.302 6.431 6.550 6.658 6.759 6.852 6.939 7.020 7.097 7.169
8 3.261 4.041 4.529 4.886 5.167 5.399 5.596 5.767 5.918 6.053 6.175 6.287 6.389 6.483 6.571 6.653 6.729 6.801 6.870
9 3.199 3.948 4.415 4.755 5.024 5.244 5.432 5.595 5.738 5.867 5.983 6.089 6.186 6.276 6.359 6.437 6.510 6.579 6.644
10 3.151 3.877 4.327 4.654 4.912 5.124 5.304 5.460 5.598 5.722 5.833 5.935 6.028 6.114 6.194 6.269 6.339 6.405 6.467
11 3.113 3.820 4.256 4.574 4.823 5.028 5.202 5.353 5.486 5.605 5.713 5.811 5.901 5.984 6.062 6.134 6.202 6.265 6.325
12 3.081 3.773 4.199 4.508 4.750 4.950 5.119 5.265 5.395 5.510 5.615 5.710 5.797 5.878 5.953 6.023 6.089 6.151 6.209
14 3.033 3.701 4.111 4.407 4.639 4.829 4.990 5.130 5.253 5.363 5.463 5.554 5.637 5.714 5.785 5.852 5.915 5.973 6.029
15 3.014 3.673 4.076 4.367 4.595 4.782 4.940 5.077 5.198 5.306 5.403 5.492 5.574 5.649 5.719 5.785 5.846 5.904 5.958
16 2.998 3.649 4.046 4.333 4.557 4.741 4.896 5.031 5.150 5.256 5.352 5.439 5.519 5.593 5.662 5.726 5.786 5.843 5.896
17 2.984 3.628 4.020 4.303 4.524 4.705 4.858 4.991 5.108 5.212 5.306 5.392 5.471 5.544 5.612 5.675 5.734 5.790 5.842
18 2.971 3.609 3.997 4.276 4.494 4.673 4.824 4.955 5.071 5.173 5.266 5.351 5.429 5.501 5.567 5.629 5.688 5.743 5.794
19 2.960 3.593 3.977 4.253 4.468 4.645 4.794 4.924 5.037 5.139 5.231 5.314 5.391 5.462 5.528 5.589 5.647 5.701 5.752
20 2.950 3.578 3.958 4.232 4.445 4.620 4.768 4.895 5.008 5.108 5.199 5.282 5.357 5.427 5.492 5.553 5.610 5.663 5.714
25 2.913 3.523 3.890 4.153 4.358 4.526 4.667 4.789 4.897 4.993 5.079 5.158 5.230 5.297 5.359 5.417 5.471 5.522 5.570
30 2.888 3.487 3.845 4.102 4.301 4.464 4.601 4.720 4.824 4.917 5.001 5.077 5.147 5.211 5.271 5.327 5.379 5.429 5.475
40 2.858 3.442 3.791 4.039 4.232 4.388 4.521 4.634 4.735 4.824 4.904 4.977 5.044 5.106 5.163 5.216 5.266 5.313 5.358
50 2.841 3.416 3.758 4.002 4.190 4.344 4.473 4.584 4.681 4.768 4.847 4.918 4.983 5.043 5.098 5.150 5.199 5.245 5.288
100 2.806 3.365 3.695 3.929 4.109 4.256 4.379 4.484 4.577 4.659 4.733 4.800 4.862 4.918 4.971 5.020 5.066 5.108 5.149
200 2.789 3.339 3.664 3.893 4.069 4.212 4.332 4.435 4.525 4.605 4.677 4.742 4.802 4.857 4.908 4.955 4.999 5.041 5.080
300 2.783 3.331 3.654 3.881 4.056 4.198 4.317 4.419 4.508 4.587 4.659 4.723 4.782 4.837 4.887 4.934 4.978 5.019 5.057
400 2.780 3.327 3.649 3.875 4.050 4.191 4.309 4.411 4.500 4.578 4.649 4.714 4.772 4.826 4.876 4.923 4.967 5.007 5.046
500 2.779 3.324 3.645 3.872 4.046 4.187 4.305 4.406 4.494 4.573 4.644 4.708 4.766 4.820 4.870 4.917 4.960 5.001 5.039
Table VIII Critical Values for the Studentized Range Distribution (Continued)
a 5 0.01 k
n 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 14.035 19.019 22.293 24.717 26.628 28.199 29.528 30.677 31.687 32.585 33.395 34.129 34.802 35.421 35.995 36.529 37.028 37.496 37.937
3 8.260 10.616 12.169 13.324 14.240 14.997 15.640 16.198 16.689 17.128 17.524 17.884 18.214 18.519 18.802 19.065 19.311 19.543 19.761
4 6.511 8.118 9.173 9.958 10.582 11.099 11.539 11.925 12.264 12.566 12.840 13.089 13.318 13.530 13.726 13.909 14.081 14.242 14.394
5 5.702 6.976 7.806 8.421 8.913 9.321 9.669 9.971 10.239 10.479 10.695 10.893 11.075 11.243 11.399 11.544 11.681 11.809 11.930
6 5.243 6.331 7.033 7.556 7.974 8.318 8.611 8.869 9.097 9.300 9.485 9.653 9.808 9.951 10.084 10.208 10.325 10.434 10.538
7 4.948 5.919 6.543 7.006 7.373 7.678 7.940 8.167 8.368 8.548 8.711 8.859 8.996 9.124 9.242 9.353 9.456 9.553 9.645
8 4.745 5.635 6.204 6.625 6.960 7.238 7.475 7.681 7.864 8.028 8.177 8.312 8.437 8.552 8.659 8.760 8.854 8.942 9.026
9 4.595 5.428 5.957 6.347 6.658 6.915 7.134 7.326 7.495 7.647 7.785 7.910 8.026 8.133 8.233 8.326 8.413 8.495 8.573
10 4.482 5.270 5.769 6.136 6.428 6.669 6.875 7.055 7.214 7.356 7.485 7.603 7.712 7.813 7.906 7.994 8.076 8.153 8.226
11 4.392 5.146 5.621 5.970 6.247 6.476 6.671 6.842 6.992 7.127 7.250 7.362 7.465 7.560 7.649 7.732 7.810 7.883 7.952
12 4.320 5.046 5.502 5.836 6.101 6.321 6.507 6.670 6.814 6.943 7.060 7.167 7.265 7.356 7.441 7.520 7.594 7.664 7.731
13 4.261 4.964 5.404 5.727 5.981 6.192 6.372 6.528 6.666 6.791 6.903 7.006 7.100 7.188 7.269 7.345 7.417 7.484 7.548
14 4.210 4.895 5.322 5.634 5.881 6.085 6.258 6.409 6.543 6.664 6.772 6.871 6.962 7.047 7.125 7.199 7.268 7.333 7.394
15 4.167 4.836 5.252 5.556 5.796 5.994 6.162 6.309 6.438 6.555 6.660 6.757 6.845 6.927 7.003 7.074 7.141 7.204 7.264
16 4.131 4.786 5.192 5.488 5.722 5.915 6.079 6.222 6.348 6.461 6.564 6.658 6.743 6.824 6.897 6.967 7.032 7.093 7.151
17 4.099 4.742 5.140 5.430 5.659 5.847 6.007 6.147 6.270 6.380 6.480 6.572 6.656 6.733 6.806 6.873 6.937 6.997 7.053
18 4.071 4.703 5.094 5.379 5.603 5.787 5.944 6.081 6.201 6.309 6.407 6.496 6.579 6.655 6.725 6.791 6.854 6.912 6.967
19 4.046 4.669 5.054 5.333 5.553 5.735 5.888 6.022 6.141 6.246 6.342 6.430 6.510 6.585 6.654 6.719 6.780 6.837 6.891
20 4.024 4.639 5.018 5.293 5.509 5.687 5.839 5.970 6.086 6.190 6.285 6.370 6.449 6.523 6.591 6.654 6.714 6.770 6.823
25 3.942 4.527 4.884 5.143 5.346 5.513 5.654 5.777 5.885 5.982 6.070 6.150 6.223 6.291 6.355 6.414 6.469 6.521 6.571
30 3.889 4.454 4.799 5.048 5.242 5.401 5.536 5.653 5.756 5.848 5.932 6.008 6.078 6.142 6.202 6.258 6.311 6.360 6.407
40 3.825 4.367 4.695 4.931 5.114 5.265 5.392 5.502 5.599 5.685 5.764 5.835 5.900 5.961 6.017 6.069 6.118 6.165 6.208
50 3.787 4.316 4.634 4.863 5.040 5.185 5.308 5.414 5.507 5.590 5.665 5.734 5.796 5.854 5.908 5.958 6.005 6.050 6.092
100 3.714 4.216 4.516 4.730 4.896 5.031 5.144 5.242 5.328 5.405 5.474 5.537 5.594 5.648 5.697 5.743 5.786 5.826 5.864
200 3.714 4.216 4.516 4.730 4.896 5.031 5.144 5.242 5.328 5.405 5.474 5.537 5.594 5.648 5.697 5.743 5.786 5.826 5.864
300 3.666 4.152 4.440 4.645 4.803 4.931 5.039 5.132 5.213 5.286 5.351 5.410 5.464 5.514 5.560 5.603 5.644 5.682 5.717
400 3.661 4.144 4.431 4.634 4.791 4.919 5.026 5.118 5.199 5.271 5.335 5.394 5.448 5.498 5.543 5.586 5.626 5.664 5.699
500 3.657 4.139 4.425 4.628 4.784 4.911 5.018 5.110 5.190 5.262 5.327 5.385 5.438 5.488 5.533 5.576 5.616 5.653 5.688
T-17
T-18
Table VIII Critical Values for the Studentized Range Distribution (Continued)
a 5 0.001 k
n 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 44.666 60.323 70.586 78.162 84.127 89.022 93.650 97.285 100.480 103.325 105.886 108.211 110.340 112.300 114.115 115.805 117.385 118.867 120.263
3 18.275 23.298 26.609 29.075 31.030 32.645 34.016 35.327 36.389 37.338 38.194 38.974 39.688 40.347 40.959 41.529 42.062 42.564 43.036
4 12.174 14.965 16.798 18.225 19.333 20.253 21.037 21.719 22.323 22.862 23.350 23.795 24.204 24.581 24.932 25.259 25.566 25.854 26.126
5 9.710 11.671 12.959 13.924 14.695 15.335 15.882 16.358 16.780 17.158 17.500 17.811 18.098 18.402 18.651 18.884 19.102 19.307 19.501
6 8.431 9.955 10.965 11.719 12.322 12.824 13.254 13.629 13.961 14.260 14.530 14.777 15.004 15.215 15.411 15.593 15.765 15.927 16.079
7 7.649 8.933 9.761 10.388 10.883 11.316 11.674 11.988 12.265 12.515 12.742 12.949 13.139 13.316 13.480 13.634 13.778 13.914 14.043
8 7.130 8.252 8.980 9.523 9.948 10.317 10.625 10.894 11.133 11.347 11.559 11.740 11.906 12.060 12.203 12.337 12.463 12.582 12.694
9 7.130 8.252 8.980 9.523 9.948 10.317 10.625 10.894 11.133 11.347 11.559 11.740 11.906 12.060 12.203 12.337 12.463 12.582 12.694
10 6.486 7.411 8.007 8.451 8.805 9.100 9.353 9.574 9.770 9.954 10.106 10.245 10.387 10.512 10.629 10.737 10.840 10.936 11.027
11 6.274 7.137 7.688 8.099 8.427 8.700 8.934 9.138 9.320 9.483 9.631 9.767 9.892 10.017 10.121 10.218 10.309 10.394 10.475
12 6.106 6.917 7.442 7.820 8.128 8.383 8.602 8.793 8.963 9.116 9.254 9.381 9.498 9.607 9.708 9.803 9.892 9.976 10.055
13 5.969 6.740 7.234 7.595 7.885 8.126 8.333 8.513 8.674 8.818 8.949 9.068 9.179 9.281 9.377 9.466 9.550 9.630 9.705
14 5.855 6.593 7.070 7.410 7.692 7.914 8.111 8.282 8.434 8.571 8.696 8.810 8.915 9.012 9.103 9.188 9.268 9.343 9.414
15 5.760 6.470 6.920 7.257 7.517 7.742 7.924 8.088 8.234 8.365 8.483 8.592 8.693 8.786 8.873 8.954 9.030 9.102 9.170
16 5.678 6.365 6.799 7.125 7.377 7.585 7.769 7.923 8.063 8.189 8.303 8.407 8.504 8.593 8.676 8.754 8.828 8.897 8.963
17 5.614 6.274 6.695 7.010 7.254 7.457 7.629 7.783 7.921 8.037 8.147 8.248 8.341 8.427 8.508 8.583 8.654 8.720 8.783
18 5.550 6.201 6.609 6.909 7.147 7.343 7.511 7.658 7.781 7.908 8.017 8.116 8.199 8.283 8.361 8.433 8.502 8.566 8.628
19 5.493 6.129 6.527 6.820 7.051 7.243 7.407 7.550 7.676 7.790 7.894 7.990 8.079 8.162 8.238 8.302 8.369 8.431 8.491
20 5.444 6.065 6.455 6.741 6.967 7.154 7.314 7.453 7.577 7.687 7.788 7.880 7.966 8.046 8.121 8.190 8.256 8.318 8.376
25 5.264 5.840 6.196 6.456 6.662 6.831 6.976 7.102 7.213 7.314 7.404 7.487 7.558 7.629 7.696 7.758 7.816 7.871 7.924
30 5.154 5.698 6.033 6.277 6.469 6.628 6.763 6.880 6.984 7.077 7.161 7.238 7.309 7.375 7.436 7.494 7.548 7.598 7.646
40 5.022 5.527 5.837 6.062 6.239 6.385 6.508 6.615 6.710 6.795 6.872 6.942 7.006 7.066 7.121 7.173 7.222 7.268 7.312
50 4.946 5.426 5.725 5.939 6.107 6.245 6.361 6.463 6.552 6.632 6.705 6.771 6.832 6.888 6.940 6.989 7.035 7.078 7.119
100 4.795 5.244 5.512 5.706 5.855 5.978 6.083 6.173 6.252 6.323 6.387 6.445 6.499 6.548 6.594 6.637 6.678 6.715 6.751
200 4.723 5.151 5.408 5.596 5.738 5.854 5.952 6.038 6.110 6.178 6.237 6.292 6.342 6.388 6.431 6.471 6.509 6.544 6.577
300 4.700 5.122 5.375 5.556 5.696 5.814 5.910 5.993 6.066 6.131 6.189 6.244 6.291 6.335 6.379 6.418 6.455 6.489 6.522
400 4.688 5.107 5.358 5.538 5.677 5.791 5.890 5.972 6.044 6.108 6.166 6.219 6.267 6.312 6.355 6.393 6.427 6.460 6.494
500 4.681 5.098 5.348 5.527 5.665 5.778 5.874 5.959 6.031 6.095 6.152 6.205 6.253 6.297 6.338 6.376 6.412 6.448 6.479
Tab l e s A pp end i x T-19
n c1 c2 a n c1 c2 a n c1 c2 a n c1 c2 a n c1 c2 a
Table IX Critical Values for the Wilcoxon Signed-Rank Statistic (Continued)
n c1 c2 a n c1 c2 a n c1 c2 a n c1 c2 a
15 0 120 0.0000 16 0 136 0.0000 18 0 171 0.0000
17 0 153 0.0000
1 119 0.0001 1 135 0.0000 1 170 0.0000
1 152 0.0000
2 118 0.0001 2 134 0.0000 2 169 0.0000
2 151 0.0000
3 117 0.0002 3 133 0.0001 3 168 0.0000
3 150 0.0000
4 116 0.0002 4 132 0.0001 4 167 0.0000
4 149 0.0001
5 115 0.0003 5 131 0.0002 5 166 0.0000
5 148 0.0001
6 114 0.0004 6 130 0.0002 6 165 0.0001
6 147 0.0001
7 113 0.0006 7 129 0.0003 7 164 0.0001
7 146 0.0001
8 112 0.0008 8 128 0.0004 8 163 0.0001
8 145 0.0002
9 111 0.0010 9 127 0.0005 9 162 0.0001
9 144 0.0003
10 110 0.0013 10 126 0.0007 10 161 0.0002
10 143 0.0003
11 109 0.0017 11 125 0.0008 11 160 0.0002
11 142 0.0004
12 108 0.0021 12 124 0.0011 12 159 0.0003
12 141 0.0005
13 107 0.0027 13 123 0.0013 13 158 0.0003
13 140 0.0007
14 106 0.0034 14 122 0.0017 14 157 0.0004
14 139 0.0008
15 105 0.0042 15 121 0.0021 15 156 0.0005
15 138 0.0010
16 104 0.0051 16 120 0.0026 16 155 0.0006
16 137 0.0013
17 103 0.0062 17 119 0.0031 17 154 0.0008
17 136 0.0016
18 102 0.0075 18 118 0.0038 18 153 0.0010
18 135 0.0019
19 101 0.0090 19 117 0.0046 19 152 0.0012
19 134 0.0023
20 100 0.0108 20 116 0.0055 20 151 0.0014
20 133 0.0028
21 99 0.0128 21 115 0.0065 21 150 0.0017
21 132 0.0033
22 98 0.0151 22 114 0.0078 22 149 0.0020
22 131 0.0040
23 97 0.0177 23 113 0.0091 23 148 0.0024
23 130 0.0047
24 96 0.0206 24 112 0.0107 24 147 0.0028
24 129 0.0055
25 95 0.0240 25 111 0.0125 25 146 0.0033
25 128 0.0064
26 94 0.0277 26 110 0.0145 26 145 0.0038
26 127 0.0075
27 93 0.0319 27 109 0.0168 27 144 0.0045
27 126 0.0087
28 92 0.0365 28 108 0.0193 28 143 0.0052
28 125 0.0101
29 91 0.0416 29 107 0.0222 29 142 0.0060
29 124 0.0116
30 90 0.0473 30 106 0.0253 30 141 0.0069
30 123 0.0133
31 89 0.0535 31 105 0.0288 31 140 0.0080
31 122 0.0153
32 88 0.0603 32 104 0.0327 32 139 0.0091
32 121 0.0174
33 87 0.0677 33 103 0.0370 33 138 0.0104
33 120 0.0198
34 86 0.0757 34 102 0.0416 34 137 0.0118
34 119 0.0224
35 85 0.0844 35 101 0.0467 35 136 0.0134
35 118 0.0253
36 84 0.0938 36 100 0.0523 36 135 0.0152
36 117 0.0284
37 83 0.1039 37 99 0.0583 37 134 0.0171
37 116 0.0319
38 82 0.1147 38 98 0.0649 38 133 0.0192
38 115 0.0357
39 81 0.1262 39 97 0.0719 39 132 0.0216
39 114 0.0398
40 80 0.1384 40 96 0.0795 40 131 0.0241
40 113 0.0443
41 79 0.1514 41 95 0.0877 41 130 0.0269
41 112 0.0492
42 94 0.0964 42 111 0.0544 42 129 0.0300
43 93 0.1057 43 110 0.0601 43 128 0.0333
44 92 0.1156 44 109 0.0662 44 127 0.0368
45 91 0.1261 45 108 0.0727 45 126 0.0407
46 90 0.1372 46 107 0.0797 46 125 0.0449
47 89 0.1489 47 106 0.0871 47 124 0.0494
48 105 0.0950 48 123 0.0542
49 104 0.1034 49 122 0.0594
50 103 0.1123 50 121 0.0649
51 102 0.1217 51 120 0.0708
52 101 0.1317 52 119 0.0770
53 100 0.1421 53 118 0.0837
54 99 0.1530 54 117 0.0907
55 116 0.0982
56 115 0.1061
57 114 0.1144
58 113 0.1231
59 112 0.1323
60 111 0.1419
61 110 0.1519
Tab l e s A pp end i x T-21
Table IX Critical Values for the Wilcoxon Signed-Rank Statistic (Continued)
n c1 c2 a n c1 c2 a n c1 c2 a n c1 c2 a
19 0 190 0.0000 19 41 149 0.0145 20 0 210 0.0000 20 41 169 0.0077
1 189 0.0000 42 148 0.0162 1 209 0.0000 42 168 0.0086
2 188 0.0000 43 147 0.0180 2 208 0.0000 43 167 0.0096
3 187 0.0000 44 146 0.0201 3 207 0.0000 44 166 0.0107
4 186 0.0000 45 145 0.0223 4 206 0.0000 45 165 0.0120
5 185 0.0000 46 144 0.0247 5 205 0.0000 46 164 0.0133
6 184 0.0000 47 143 0.0273 6 204 0.0000 47 163 0.0148
7 183 0.0000 48 142 0.0301 7 203 0.0000 48 162 0.0164
8 182 0.0000 49 141 0.0331 8 202 0.0000 49 161 0.0181
9 181 0.0001 50 140 0.0364 9 201 0.0000 50 160 0.0200
10 180 0.0001 51 139 0.0399 10 200 0.0000 51 159 0.0220
11 179 0.0001 52 138 0.0437 11 199 0.0001 52 158 0.0242
12 178 0.0001 53 137 0.0478 12 198 0.0001 53 157 0.0266
13 177 0.0002 54 136 0.0521 13 197 0.0001 54 156 0.0291
14 176 0.0002 55 135 0.0567 14 196 0.0001 55 155 0.0319
15 175 0.0003 56 134 0.0616 15 195 0.0001 56 154 0.0348
16 174 0.0003 57 133 0.0668 16 194 0.0002 57 153 0.0379
17 173 0.0004 58 132 0.0723 17 193 0.0002 58 152 0.0413
18 172 0.0005 59 131 0.0782 18 192 0.0002 59 151 0.0448
19 171 0.0006 60 130 0.0844 19 191 0.0003 60 150 0.0487
20 170 0.0007 61 129 0.0909 20 190 0.0004 61 149 0.0527
21 169 0.0008 62 128 0.0978 21 189 0.0004 62 148 0.0570
22 168 0.0010 63 127 0.1051 22 188 0.0005 63 147 0.0615
23 167 0.0012 64 126 0.1127 23 187 0.0006 64 146 0.0664
24 166 0.0014 65 125 0.1206 24 186 0.0007 65 145 0.0715
25 165 0.0017 66 124 0.1290 25 185 0.0008 66 144 0.0768
26 164 0.0020 67 123 0.1377 26 184 0.0010 67 143 0.0825
27 163 0.0023 68 122 0.1467 27 183 0.0012 68 142 0.0884
28 162 0.0027 69 121 0.1562 28 182 0.0014 69 141 0.0947
29 161 0.0031 70 120 0.1660 29 181 0.0016 70 140 0.1012
30 160 0.0036 30 180 0.0018 71 139 0.1081
31 159 0.0041 31 179 0.0021 72 138 0.1153
32 158 0.0047 32 178 0.0024 73 137 0.1227
33 157 0.0054 33 177 0.0028 74 136 0.1305
34 156 0.0062 34 176 0.0032 75 135 0.1387
35 155 0.0070 35 175 0.0036 76 134 0.1471
36 154 0.0080 36 174 0.0042 77 133 0.1559
37 153 0.0090 37 173 0.0047
38 152 0.0102 38 172 0.0053
39 151 0.0115 39 171 0.0060
40 150 0.0129 40 170 0.0068
T-22 Tabl es Appe ndi x
m n c1 c2 a m n c1 c2 a m n c1 c2 a m n c1 c2 a
m n c1 c2 a m n c1 c2 a m n c1 c2 a m n c1 c2 a
m n c1 c2 a m n c1 c2 a m n c1 c2 a m n c1 c2 a
7 10 28 98 0.0001 8 9 36 108 0.0000 9 9 45 126 0.0000 10 10 55 155 0.0000
29 97 0.0001 37 107 0.0001 46 125 0.0000 56 154 0.0000
30 96 0.0002 38 106 0.0002 47 124 0.0001 57 153 0.0000
31 95 0.0004 39 105 0.0003 48 123 0.0001 58 152 0.0000
32 94 0.0006 40 104 0.0005 49 122 0.0002 59 151 0.0001
33 93 0.0010 41 103 0.0008 50 121 0.0004 60 150 0.0001
34 92 0.0015 42 102 0.0012 51 120 0.0006 61 149 0.0002
35 91 0.0023 43 101 0.0019 52 119 0.0009 62 148 0.0002
36 90 0.0034 44 100 0.0028 53 118 0.0014 63 147 0.0004
37 89 0.0048 45 99 0.0039 54 117 0.0020 64 146 0.0005
38 88 0.0068 46 98 0.0056 55 116 0.0028 65 145 0.0008
39 87 0.0093 47 97 0.0076 56 115 0.0039 66 144 0.0010
40 86 0.0125 48 96 0.0103 57 114 0.0053 67 143 0.0014
41 85 0.0165 49 95 0.0137 58 113 0.0071 68 142 0.0019
42 84 0.0215 50 94 0.0180 59 112 0.0094 69 141 0.0026
43 83 0.0277 51 93 0.0232 60 111 0.0122 70 140 0.0034
44 82 0.0351 52 92 0.0296 61 110 0.0157 71 139 0.0045
45 81 0.0439 53 91 0.0372 62 109 0.0200 72 138 0.0057
46 80 0.0544 54 90 0.0464 63 108 0.0252 73 137 0.0073
47 79 0.0665 55 89 0.0570 64 107 0.0313 74 136 0.0093
48 78 0.0806 56 88 0.0694 65 106 0.0385 75 135 0.0116
49 77 0.0966 57 87 0.0836 66 105 0.0470 76 134 0.0144
50 76 0.1148 58 86 0.0998 67 104 0.0567 77 133 0.0177
51 75 0.1349 59 85 0.1179 68 103 0.0680 78 132 0.0216
52 74 0.1574 60 84 0.1383 69 102 0.0807 79 131 0.0262
8 8 36 100 0.0001 8 10 36 116 0.0000 70 101 0.0951 80 130 0.0315
37 99 0.0002 37 115 0.0000 71 100 0.1112 81 129 0.0376
38 98 0.0003 38 114 0.0001 72 99 0.1290 82 128 0.0446
39 97 0.0005 39 113 0.0002 9 10 45 135 0.0000 83 127 0.0526
40 96 0.0009 40 112 0.0003 46 134 0.0000 84 126 0.0615
41 95 0.0015 41 111 0.0004 47 133 0.0000 85 125 0.0716
42 94 0.0023 42 110 0.0007 48 132 0.0001 86 124 0.0827
43 93 0.0035 43 109 0.0010 49 131 0.0001 87 123 0.0952
44 92 0.0052 44 108 0.0015 50 130 0.0002 88 122 0.1088
45 91 0.0074 45 107 0.0022 51 129 0.0003 89 121 0.1237
46 90 0.0103 46 106 0.0031 52 128 0.0005 90 120 0.1399
47 89 0.0141 47 105 0.0043 53 127 0.0007 91 119 0.1575
48 88 0.0190 48 104 0.0058 54 126 0.0011
49 87 0.0249 49 103 0.0078 55 125 0.0015
50 86 0.0325 50 102 0.0103 56 124 0.0021
51 85 0.0415 51 101 0.0133 57 123 0.0028
52 84 0.0524 52 100 0.0171 58 122 0.0038
53 83 0.0652 53 99 0.0217 59 121 0.0051
54 82 0.0803 54 98 0.0273 60 120 0.0066
55 81 0.0974 55 97 0.0338 61 119 0.0086
56 80 0.1172 56 96 0.0416 62 118 0.0110
57 79 0.1393 57 95 0.0506 63 117 0.0140
58 94 0.0610 64 116 0.0175
59 93 0.0729 65 115 0.0217
60 92 0.0864 66 114 0.0267
61 91 0.1015 67 113 0.0326
62 90 0.1185 68 112 0.0394
63 89 0.1371 69 111 0.0474
64 88 0.1577 70 110 0.0564
71 109 0.0667
72 108 0.0782
73 107 0.0912
74 106 0.1055
75 105 0.1214
76 104 0.1388
Table XI Critical Values for the Runs Test
This table contains cumulative probabilities associated with the runs test. Let m be
the number of observations in one category and n the number of observations in
the other category ( m # n ) , and V be the number of runs. The values in this table
are the probabilities P ( V # v ) if the order of observations is random.
v
m n 2 3 4 5 6 7 8 9
T-25
T-26
Table XI Critical Values for the Runs Test (Continued)
v
m n 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Alpha a A
Beta b B
Gamma g G
Delta d D
Epsilon P e E
Zeta z Z
Eta h H
Theta u q U
Iota i I
Kappa k K
Lambda l L
Mu m M
Nu n N
Xi j J
Omicron o O
Pi p
P
Rho r R
Sigma s ςς% S
Tau t T
Upsilon y Y
Phi f w
ϕ
ϕ F
Chi x X
Psi c C
Omega v V
Answers to Odd-Numbered Exercises
A-1
A-2 A n s w e r s t o O d d -Nu m b e red E x erc i s e s
country that sell comforters. Assign a number to each store, 2.19 Relative
and use a random number generator to select stores. Visit the Category Frequency requency
selected stores and count the number of comforters on
Comedy 7 0.1667
display. Use a random number generator to purchase one of
the comforters on display. Drama 10 0.2381
Educational 3 0.0714
Chapter Exercises Reality 7 0.1667
Soap 10 0.2381
1.41 a. Descriptive. b. Descriptive. c. Inferential.
Sports 5 0.1190
d. Inferential.
Total 42 1.0000
1.43 a. Descriptive. b. Inferential. c. Inferential.
d. Descriptive. 2.21 a. Relative
Issue Frequency frequency
1.45 Population: All adults. Sample: 400 adults selected.
Salary 50 0.1250
1.47 a. Observational study. b. 1000 people called. c. Not Health insurance 100 0.2500
a random sample. Not everyone has a phone. Not everyone with
Retirement benefits 75 0.1875
a phone has a listed number. People may choose not to respond.
People may not be home at the time of the call. Class size 60 0.1500
Temporary faculty 90 0.2250
1.49 a. Experimental study. b. Length of time until ice formed
Parking 25 0.0625
on the wings. c. Flip a coin for each plane. If heads, assign the
plane to one group. If tails, assign the plane to the other group. Total 400 1.0000
b. 100
1.51 a. Observational study. b. Pressure on each section.
c. Not a random sample. Only the deepest areas of the tunnel
were selected. 80
60
bottle.
40
20
Chapter 2
Section 2.1 0
S H R C T P
2.1 False. Contract issue
Section 2.2
2.15 True. Unlikely Very unlikely
2.17 True.
A n s w e r s t o Od d - Nu m b e r e d E xer c i ses A-3
b. Strawberry
b.
50 0.0
Al Tr Ex Pu Sp Bu El Ha Se Si
40 Product
Frequency
30 Training
Extinguishers
20
Pumps Alarms
10
Sprinklers
0
BD CT CC Pi St VO Building
Signaling
Ice cream materials
systems
Security
Electrical products
equipment
Hazmat
storage
A-4 A n s w e r s t o O d d -Nu m b e red Ex erc i s e s
2.31 0.35
0.30 Brick
Stucco
Relative frequency
0.25
0.20 Aluminum
0.15
0.10
Vinyl
Wood
0.05
0.00
Bw Vw Co Ta La
Product
40
2.37 a. Relative
Agency Frequency frequency
Frequency
30
Alamo 10 0.0571
20 Avis 25 0.1429
Budget 30 0.1714
10
Enterprise 40 0.2286
0
Hertz 35 0.2000
Al Br St Vi Wo Thrifty 20 0.1143
Siding Value 15 0.0857
Total 175 1.0000
A n s w e r s t o O d d -N u m b e r e d E xer c i ses A-5
d.
0.15
Relative frequency
Budget
Avis 0.10
Enterprise Alamo
0.05
Value
0.00
Thrifty M T W Th F Sa Su
Hertz Day of week (urban)
c. The shapes of the bar graphs are very similar. The highest
proportion of fatal vehicle crashes occur on Sunday and
2.39 a. Relative
Monday.
Day of week Frequency frequency
2.41 a.
Monday 557 0.1744
25
Tuesday 383 0.1199
Wednesday 348 0.1090
20
Thursday 351 0.1099
Friday 415 0.1299
15
Frequency
Saturday 464 0.1453
Sunday 676 0.2116
10
Total 3194 1.0000
0.25 5
0.20 0
Relative frequency
Do Wd RH Ho QP Tu DG Co Sq No Ra
0.15 Animal hunted
0.00
Dove
M T W Th F Sa Su
Day of week (rural)
Hog Raccoon
b. Relative Bird/snake
Day of week Frequency frequency
Squirrel/prairie dog
Monday 448 0.1618
Coyote
Tuesday 359 0.1296 Quail/pheasant Duck/goose
Wednesday 328 0.1185 Turkey
Thursday 330 0.1192
c. No. We do not know how many people are hunting each
Friday 386 0.1394
animal.
Saturday 404 0.1459
Sunday 514 0.1856 2.43 a. 0.1606, 0.0676, 0.0074, 0.3095, 0.4549
b. 0.1375, 0.0752, 0.0077, 0.2991, 0.4805
Total 2769 1.0000
A-6 A n s w e r s t o O d d -N u m b e red E x erc i s e s
0.3 3 3449
4 0111125555799
0.2 5 699
6 023444
0.1
7 0123679
8 258
0.0
B&E Tmv .5000 ,5000 Mi 9 3
Violation
b.
0 4 Stem 5 1
Section 2.3 1 16 Leaf 5 0.1
2 2378
2.45 False.
3 0355
2.47 True. 4 001222256668
2.49 5 00799
2 79 Stem 5 1
6 133445
3 Leaf 5 0.1
7 1134779
3 56669
8 269
4 112
9 3
4 779
5 01124 c. These two graphs are slightly different. However, they both
5 57789 suggest the same general shape, center, and variability. Typical
6 1444 value is 4.5.
6 68 2.57 a. Lower floors Upper floors
7 1
10 14
The center of the data is between 5.0 and 5.5. A typical value 10
is 5.2. 4300 11 1
99887777777666665555 11 55578
2.51
53 0344 Stem 5 100 3333322211111110000000 12 1244
53 799 Leaf 5 1 665 12 55556777899
54 111344 0 13 122234
54 566677777889 13 567888999
55 112334 14 02234
55 67777 14 5888
56 002 15 244
56 9
Stem 5 100, Leaf 5 1
The center of the data is between 545 and 550. A typical value b. The lower-floors distribution is more compact and has, on
is 547. average, smaller values. The upper-floors distribution has more
variability and has, on average, larger values.
2.53 a. 543, 543, 549. b. 574 c. The data tail off
slowly on the low end. d. There do not appear to be any 2.59 a.
1 00699 Stem 5 10
outliers.
2 244557 Leaf 5 1
3 11245678899
4 000011555669
5 2259
6 8
7 1
A n s w e r s t o O d d -N u m b e r e d Exer c i ses A-7
260 26
6
261 2
262 3 4
2.81 Cumulative 12
Relative relative
Class Frequency frequency frequency 10
Frequency
150–200 120 0.1500 0.3438
6
200–250 130 0.1625 0.5063
250–300 145 0.1813 0.6875 4
50 Total 30 1.0000
10
0
0 25 50 75 100 125 150 175 200 9
Observation 8
7
2.85 a. Cumulative
Frequency
6
Relative relative 5
Class Frequency frequency frequency 4
0–10 1 0.0167 0.0167 3
10–20 3 0.0500 0.0667 2
1
20–30 9 0.1500 0.2167
0
30–40 11 0.1833 0.4000 0 50 100 150 200 250 300 350 400 450 500 550
40–50 12 0.2000 0.6000 Cold cranking amps
50–60 10 0.1667 0.7667 b. Positively skewed, centered near 100, lots of variability, one
60–70 7 0.1167 0.8833 outlier: 514.
70–80 5 0.0833 0.9667 c. M < 110
80–90 2 0.0333 1.0000
Total 60 1.0000
A n s w e r s t o O d d - N u m b e r e d E xer c i ses A-9
15
110–115 40 0.200 0.625
10
115–120 25 0.125 0.750
120–125 20 0.100 0.850
5 125–130 15 0.075 0.925
130–135 10 0.050 0.975
0 135–140 5 0.025 1.000
0 3 6 9 12 15 18 21 24
Niacin intake (U.S.) Total 200 1.000
35 b. 80
30 70
60
25
50
Frequency
Frequency
20
40
15
30
10
20
5
10
0
0 3 6 9 12 15 18 21 24 0
100 105 110 115 120 125 130 135 140
Niacin intake (Europe)
Air gap
b. United States: unimodal, positively skewed. Europe: c. 0.425
u nimodal, negatively skewed. On average, it appears that
Europeans have a greater daily niacin intake. 2.95 a.
12
2.91 a. Solution Trail 10
Keywords: Histogram; shape; center; variability.
8
Frequency
16 0
70 90 110 130 150 170 190 210 230
14 Problems
12 b. The distribution is centered around 133, is skewed slightly
10 to the right, and has a lot of variability. c. Q1 < 120.5,
Frequency
b. Approximately bell-shaped, center around 0.32, little b. New equipment times tend to be smaller; the distribution is
variability. more compact. Traditional equipment times are more spread
out and tend to be larger. c. The new equipment times tend to
2.99 a. Cumulative be better, with shorter response times. The majority of the times
Relative relative are less than the traditional equipment times.
Class Frequency frequency frequency
2.103 a. Cumulative
70–75 4 0.0741 0.0741
Relative relative
75–80 12 0.2222 0.2963 Class Frequency frequency frequency
80–85 21 0.3889 0.6852
0–200 73 0.3946 0.3946
85–90 7 0.1296 0.8148
200–400 88 0.4757 0.8703
90–95 4 0.0741 0.8889
400–600 13 0.0703 0.9405
95–100 3 0.0556 0.9444
600–800 7 0.0378 0.9784
100–105 1 0.0185 0.9630
800–1000 2 0.0108 0.9892
105–110 0 0.0000 0.9630
1000–1200 1 0.0054 0.9946
110–115 1 0.0185 0.9815
1200–1400 0 0.0000 0.9946
115–120 0 0.0000 0.9815
1400–1600 0 0.0000 0.9946
120–125 0 0.0000 0.9815
1600–1800 0 0.0000 0.9946
125–130 1 0.0185 1.0000
1800–2000 0 0.0000 0.9946
Total 54 1.0000 2000–2200 0 0.0000 0.9946
b. 2200–2400 0 0.0000 0.9946
20 2400–2600 0 0.0000 0.9946
2600–2800 1 0.0054 1.0000
15 Total 185 1.0000
Frequency
90
10
80
70
5
60
Frequency
50
0 40
70 75 80 85 90 95 100 105 110 115 120 125 130
Noise level 30
20
c. 0.2963 d. 0.1852
10
2.101 a. New Traditional 0
0 400 800 1200 1600 2000 2400 2800
0 89 Stem 5 10 Time to comply
1 5 Leaf 5 0.1 b. The distribution is skewed to the right, centered at 218 with
87660 2 6 a lot of variability. c. A typical value is 220. There is one
765443200 3 34 outlier: 2600. d. 0.027
99774444322 4 02568 2.105 a.
110 5 579 30
43 6 122888
25
7 33567
8 033589 20
Frequency
9 178
15
10 034
11 3 10
12 033 5
13 03
0
14 16 3 4 5 6 7 8 9 10 11 12 13 14
Duration, placebo
A n s w e r s t o O d d -N u m b e r e d Exer c i ses A-11
30 Stem Leaf
25 2 8 Stem 5 1
3 357789 Leaf 5 0.1
20
4 01236667
Frequency
15 5 2222889
6 013447
10
7 01455
5 8 1333
9 345
0
3 4 5 6 7 8 9 10 11 12 13 14 10 579
Duration, vitamin C 11 11
b. Both graphs appear to be centered at about the same duration. 12 02
Both appear to be symmetric and bell-shaped. The placebo dura- 13 26
tions are slightly more compact than the vitamin C durations.
14 2
c. There is no graphical evidence to suggest that vitamin C
reduced the duration. 15
Frequency
100–110 1 0.02 0.02
110–120 2 0.04 0.06
5
120–130 9 0.18 0.24
130–140 7 0.14 0.38
140–150 9 0.18 0.56
0
150–160 9 0.18 0.74 2 4 6 8 10 12 14 16
Length
160–170 11 0.22 0.96
170–180 1 0.02 0.98 b. The stem-and-leaf plot and the histogram suggest
180–190 1 0.02 1.00 that the data are positively skewed. Most of the observations
Total 50 1.00 are in the lower tail of the distribution; the upper tail has
few observations. The center of the distribution (the middle
b.
1.0 number) is approximately 6.0. Most of the observations
are between 2 and 7, but the distribution is spread out
Cumulative rel. freq.
0.8
between 2.8 and 14.2. (The largest python captured was,
0.6
in fact, 14.3 feet, and is not part of the data set.) The propor-
tion of observations less than 14 is 0.98. There are no real
0.4 outliers, observations very far away from the rest.
0.2
Chapter 3
0.0
90 100 110 120 130 140 150 160 170 180 190 200 Section 3.1
Scores
3.1 a. Center, variability b. Centered or clustered
2.109 a. Cumulative 3.3 a. 82 b. 3474 c. 32 d. 2779 e. 164 f. 164
Relative relative
Class Frequency frequency frequency 3.5 a. 105.7 b. 13.1852 c. 6.9583 d. 0.1232
2–4 7 0.14 0.14 e. 22.4933 f. 17.7432
4–6 15 0.30 0.44 3.7 a. 6.6667, 7 b. 6.6364, 9 c. 10.6889, 7.7
6–8 11 0.22 0.66 d. 2107.69, 2109.1
8–10 7 0.14 0.80
3.9 a. Skewed left. b. Symmetric. c. Skewed left.
10–12 5 0.10 0.90 d. Skewed left.
12–14 4 0.08 0.98
14–16 1 0.02 1.00 3.11 a. 6 b. 0 c. No mode.
Total 50 1.00 3.13 a. 68.5238 b. 67.0 c. Slightly skewed right.
A-12 A n s w e r s t o Od d -Nu m bered E x er c i s e s
3.15 a. 813.556, 804.5 b. 924.667, 804.5. The mean is 3.53 a. s2 5 54.5763, s 5 7.3876, IQR 5 10.5
greater, pulled in the direction of the new, higher value. The b. s2 5 365.2921, s 5 19.1126, IQR 5 26.0 c. There is
median stays the same. more variability in the General Mills data set.
3.17 a. x 5 619.5, ~
b. Q1 5 265, Q3 5 1832, IQR 5 1567 c. g ( xi 2 x ) 5 0
x 5 620.0 b. 619.1667 3.55 a. s2 5 3,618,447.73, s 5 1902.2218.
c. Approximately symmetric.
3.19 a. 1.9225, 1.4738 b. The median is a better measure of 3.57 a. s2 5 2,086,130.87, s 5 1444.34
central tendency. There is one outlier pulling the mean in its b. s2 5 1,689,766.008, s 5 1299.91 c. The new sample
direction. variance is ( 0.9 ) 2 times the original sample variance. The new
3.21 a. 81.9, 81.0 b. 81.75 c. 84 sample standard deviation is 0.9 times the original sample
standard deviation.
3.23 a. 8.1979, 8.5 b. 8.5 c. 26.2333. The new mean is
3.2 times the original mean: (3.2)(8.1979). 3.59 a. s2 5 70.1449, s 5 8.3753 b. s2 5 14.0181,
s 5 3.7441 c. The new sample variance is (0.44704)2 times
3.25 a. 1634.58 b. 19,614.9. The new mean is 12 times the
the original sample variance. The new sample standard deviation
original mean: (12)(1634.58).
is 0.44704 times the original sample standard deviation.
3.27 a. 20.0444, 18.75 b. The sample mean and the
3.61 a. s2 5 3175.5667, s 5 56.3522 b. s2 5 3175.5667,
sample median are approximately the same, which suggests
s 5 56.3522 c. Same. d. s2y 5 s2x and sy 5 sx
that the shape of the distribution is approximately symmetric.
c. x 5 ~x 5 18.75 3.63 s2y 5 a2s2x , sy 5 0 a 0 sx
= 32 = =
3.29 a. x5 5 9414 b. x5 5 6250.75 3.65 a. p 5 0.9063 b. s2 5 0.0877 5 31 ( p 2 p2 )
= =
c. s2 5 0.0850 5 p 2 p2.
3.31 a. xF 5 57.1667, ~ x F 5 57.5 b. xC 5 13.9815
c. xC 5 ( xF 2 32 ) /1.8
Section 3.3
Section 3.2 3.67 True.
3.33 False. 3.69 True.
3.35 False. 3.71 False.
3.37 True. 3.73 True.
3.39 a. s2 5 323.7757, s 5 17.9938 3.75 a. (40.0, 60.0), 0.75 b. (320.5, 383.5), 0.8889
b. s2 5 479.7322, s 5 21.9028 c. (11.4, 22.6), 0.6094 d. (18.2125, 54.7875), 0.6735
c. s2 5 31.3735, s 5 5.6012 e. (95.5, 220.5), 0.84 f. ( 255.35, 254.65 ) , 0.8724
d. s2 5 2.4892, s 5 1.5777 g. ( 256.35, 59.75 ) , 0.8025
3.41 a. Q1 5 20, Q3 5 35, IQR 5 15 3.77 a. 3 b. 21.25 c. 0.8333 d. 21.1111
b. Q1 5 2.3, Q3 5 7.75, IQR 5 5.45 e. 1.6563 f. 0.4545 g. 22.2 h. 4.1143 i. 1.1111
c. Q1 5 221, Q3 5 213, IQR 5 8 j. 5.8125
d. Q1 5 44.1, Q3 5 59.2, IQR 5 15.1 3.79 a. 120.5 b. 90 c. 22 d. 30.5 e. 20.5 f. 3525
3.43 a. Increases. b. Increases. c. Does not affect. 3.81 a. (18.8, 32.4), (15.4, 35.8) b. Approximately 0.68
d. Does not affect.
3.83 a. (77, 89), (71, 95) b. At least 0.75 c. At most
3.45 a. s 5 279.0969 b. IQR 5 529 c. Probably IQR 0.1111 d. 0.95, 0.003
because there are several observations that are very large and
some that are very small. 3.85 a. 0.68 b. 0.16 c. 0.8385
3.47 a. s2 5 4.2958, s 5 2.0726, IQR 5 2.00 3.87 a. Of students nationwide, 7th–9th grade Carlisle High
b. s2 5 12.2632, s 5 3.5019, IQR 5 5.50 School students scored the same or better than 53% in mathe-
c. The More than two hours data set has more variability. matics and 69% in science on the ITBS. b. The median.
c. The seventh-grader did better than 99% of those who took
3.49 a. Q1 5 170, Q3 5 1187, IQR 5 1017 the exam.
b. s2 5 471,674.1238, s 5 686.7854 c. IQR 5 1086.5,
s2 5 991,485.9333 d. IQR is approximately the same. Add- 3.89 a. Solution Trail
ing one more outlier to the data set does not change the middle Keywords: Is there any reason to believe the general manager’s
50% much. However, s2 is sensitive to outliers, and is much claim is false?
larger in the expanded data set. Translation: Draw a conclusion.
3.51 a. East: CV 5 3.3932, CQV 5 2.6149; West: Concepts: Inference procedure; z-score.
CV 5 16.5767, CQV 5 13.7452 b. The West-side develop- Vision: Compute the z-score for the waiting time and determine
ment data have more variability. if it is reasonable, between 23 and 13.
A n s w e r s t o O d d -Nu m b e r e d E xer c i ses A-13
a a xi 2 nxb
1 n
5
s i51
a a xi 2 n a xi b 5 # 0 5 0
1 n 1 n 1 44 46 48 50 52 54 56 58 60
5
s i51 n i51 s
Centered near 48.5, positively skewed, two mild outliers.
1
Therefore, z 5 ( 0 ) 5 0 3.107 Approximately symmetric, centered near 1560, two mild
n
outliers.
g ( zi 5 z ) 2 5 gzi2
1 1
sz2 5 3.109 Males: Slightly skewed right, centered near 29, lots of
n21 n21
variability, 1 mild outlier. Females: Slightly skewed right, cen-
ga
1 xi 2 x 2 tered near 29, little variability, 1 mild outlier. Both centered
5 b
n21 s near 29, both slightly skewed right. Female data are more
g ( xi 2 x ) 2 d
1 1 compact.
5 c
s2 n 2 1 3.111 a.
1
5 2 s 2 5 1 1 sz 5 1
s
3.95 The interval two standard deviations from the mean is
10 25 40 55 70 85 100 115 130
(48, 96). A good minimum guaranteed life is 48 months.
Positively skewed, centered near 34, compact except for the two
Section 3.4 extreme outliers.
3.99 False.
3.101 a.
10 25 40 55 70 85 100 115 130
15 20 25 30 35
A-14 A n s w e r s t o O d d -N u m b ered E x er c i s e s
Low
Interval Proportion
21 22 23 24 25 26 27 28 29 30 31 32
Within 1s: (5.21, 5.46) 0.86
b. Miami: centered near 24.5, little variability, approximately
symmetric, 2 mild outliers. Denver: centered near 27, lots of Within 2s: (5.09, 5.58) 0.94
variability, approximately symmetric, no outliers. c. Both Within 3s: (4.96, 5.70) 0.97
distributions approximately symmetric. Denver has more
The box plot suggests that the distribution is positively skewed.
variability and the values are, on average, larger.
The proportions are a little closer to the empirical rule. There is
3.121 a. z 5 0.5714. This suggests that 1.54 is a reasonable still evidence to suggest that the new distribution is non-normal.
observation. This is not an unusual generating capacity.
b. z 5 22.8571. This z-score indicates that 1.3 is almost three 3.133 a. Almost all (0.997) 2 3 4’s have width between 1.69
standard deviations from the mean. Therefore, this is an and 1.81 inches. b. 1.79 is two standard deviations from the
unusual generating capacity. mean. This is a reasonable observation. There is no evidence to
suggest that the claim is false. c. 1.68 is more than three stan-
3.123 a. x 5 3.0003, ~x 5 2.995 b. The mean is approxi- dard deviations from the mean. This is a very unusual observa-
mately equal to the median, which suggests that the distribution tion. There is evidence to suggest that the claim is false.
A n s w e r s t o Od d - Nu m b e r e d Exer c i ses A-15
Interval Proportion A B A B
750 1750 2750 3750 4750 5750 6750 7750
e. S f. S
The distribution is centered near 1300, has lots of variability, is
positively skewed, with several mild and extreme outliers. This A B A B
description agrees with parts (a) and (b).
d. Phantom of the Opera: 10,807 performances as of January
14, 2014. This value should increase the values of the sample
mean, median, variance, and standard deviation.
y 5 1845.01 . x, ~y 5 1295 . ~x , s2y 5 2,452,146.9 . s2x ,
sy 5 1565.93 . sx
4.17 a. A 5 5 YNN, NYN, NNY 6 ,
B 5 5 YNN, NYN, NNY 6 ,
Chapter 4 C 5 5 YNN, NYN, NNY, YYN, YNY, NYY, YYY 6 ,
D 5 5 YYY, YYN, YNY, NYY 6
Section 4.1
b. A c D 5 5 NNY, NYN, NYY, YNN, YNY, YYN, YYY 6
4.1 False. c. Dr 5 5 NNN, NNY, NYN, YNN 6
d. B d C 5 5 NNY, NYN, YNN 6
4.3 True.
e. D 5 5 YYY, YYN, YNY, NYY 6
4.5 a. Not. b. Or. c. And.
4.19 a. EL
L
4.7 L RL
M EM
H
RH H
R EH
L BL
H CL
B BH E L
G M CM
L GL
H
H C
K GH CH
L KL M ML
H L
KH M MM
S = {RL, RH, BL, BH, GL, GH, KL, KH} H
P
4.9 52 MH
b. S 5 5 EL, EM, EH, CL, CM, CH, 4.31 a. S 5 {MCY, MCD, MCR, MNY, MND, MNR, FCY,
ML, MM, MH, PL, PM, PH 6 FCD, FCR, FNY, FND, FNR} b. A 5 The customer is male.
B 5 The customer orders fresh-cut fries and is a senior.
4.21 1D
D C 5 The customer is young.
T 1T D 5 The customer did not order fresh-cut fries.
B
1 4.33 a. There are infinitely many outcomes in this experi-
1B
ment. b. Some of the possible outcomes in this experiment
2 2D are H, BH, BBH, BBBH, BBBBH.
D
T 2T
4.35 a. S 5 5 1U, 2U, 3U, 4U, 5U, 6U, 1O, 2O, 3O,
B 4O, 5O, 6O 6 b. Br 5 5 4U, 5U, 6U, 4O, 5O, 6O 6
2B A c B 5 5 1U, 2U, 3U, 1O, 2O, 3O, 4O, 5O, 6O 6
A d B 5 5 1O, 2O, 3O 6 C d D 5 5 6O 6 A d C d D 5 5 2O 6
S 5 5 1D, 1T, 1B, 2D, 2T, 2B 6 ( A d D ) r 5 5 1U, 2U, 3U, 4U, 5U, 6U, 1O, 3O, 5O 6
4.23 a. I1 4.37 a. HWS
S
1 M HWM
I2
2 L
W
3 I3 HWL
4
5 I4 N HNS
S
H M HNM
I5
L
I
S1 HNL
1 S2 CWS
2 S
S 3 C M CWM
S3
4 L
W
5 S4 CWL
S5 N CNS
M S
M CNM
M1
L
1 M2 CNL
2
3 M3 b. S 5 5 HWS, HWM, HWL, HNS, HNM, HNL,
4
CWS, CWM, CWL, CNS, CNM, CNL 6
5 M4 c. A c B 5 5 HWS, CWS, HNS, CNS, CWM, CWL,
M5
CNM, CNL 6 B c C 5 S B d C 5 5 CWS, CNS 6
Cr 5 5 SWM, CWL, CNM, CNL 6
b. S 5 5 I1, I2, I3, I4, I5, S1, S2, S3, S4, S5,
M1, M2, M3, M4, M5 6
4.25 66 Section 4.2
4.27 a. S 5 5 1G, 1R, 1I, 2G, 2R, 2I, 3G, 3R, 3I, 4G, 4R, 4I 6 4.39 a. Very likely to occur. b. Very unlikely to occur.
b. A 5 5 1G, 2G, 3G, 4G 6 , B 5 5 2G, 2R, 2I 6 , c. The limiting relative frequency of occurrence of the event.
C 5 5 3G, 3R, 3I, 1I, 2I, 4I 6 , D 5 5 4G 6
4.41 False.
c. A c B 5 5 1G, 2G, 2R, 2I, 3G, 4G 6 , A d B 5 5 2G 6
4.43 P ( A d B )
4.29 a. S 5 5 0L, 1L, 2L, 3L, 4L, 0T, 1T, 2T, 3T, 4T 6
b. A 5 the patient is late. B 5 The patient has 3 or 4 cavities. 4.45 a. 0.5 b. 0.3333 c. 0.3333 d. 0.5
C 5 The patient has 1 or 3 cavities. D 5 The patient has 0 4.47 a. 0.85 b. 0.15 c. 0.4 d. 0.7
cavities. E 5 The patient has 0 cavities or is late. F 5 The
patient has 4 cavities and is on time. 4.49 a. 0.532 b. 0.468 c. 0.594 d. 0.771
A n s w e r s t o Od d - Nu m b e r e d E xer c i ses A-17
4.51 S 4.99 a. 1326 b. 0.0045 c. 0.0588 d. 0.2353
0.09 4.101 a. 4 b. 8 c. 16 d. 2n
A B
4.103 ( n 2 1 ) !
0.26 0.02 0.19
4.105 a. 1,345,860,629,046,814,650 b. 0.00001103
0.03
c. 0.0000017862
0.15 0.11
4.57 a. 0.4305 b. 0.5434 c. 0.9784 4.113 a. Valid. b. Rows: 0.17, 0.22, 0.21, 0.21, 0.19.
Columns: 0.74, 0.26. c. 0.12, 0.07, 0.02 d. 0.1923,
4.59 a. 1000 b. 0.01 c. 0.002 d. 0.001 0.9048, 0 e. 0.21 5 0.17 1 0.04
4.61 a. 0.84 b. 0.16 c. 0.40, 0.14 4.115 a. 0.466, 0.299 b. 0.135, 0.215 c. 0.3258,
4.63 a. 0.29 b. 0.25 c. 0.24 d. 0.95 0.4893, 0.4491 d. 0.534 5 0.145 1 0.174 1 0.215
S
4.65 a. G1G2, G1G3, G1G4, G1G5, G1G6, G1B1, G1B2, G2G3,
C1 C2 C3
G2G4, G2G5, G2G6, G2B1, G2B2, G3G4, G3G5, G3G6, G3B1, G3B2,
G4G5, G4G6, G4B1, G4B2, G5G6, G5B1, G5B2, G6B1, G6B2, B1B2
b. 0.5357 c. 0.4643 d. 0.0357
4.67 0.000976. Because this probability is so small, all 10 peo- 0.145 0.174 0.215
ple buying a plain bagel is a rare event. This suggests that the
assumption is wrong. There is evidence to suggest that the
demand for each type of bagel is not equal.
Section 4.3
4.117 a. 0.13, 0.36, 0.62. These events are not mutually
4.69 Permutation.
exclusive and exhaustive. b. 0, 0.15 c. 0.2419, 0.4167
4.71 Equally likely outcomes experiment. d. 0.2031, 0.7126, 0 e. 0.3056, 0.2778, 0.4167. Given B has
occurred, either 3, 4, or 5 must occur.
4.73 a. 1,680 b. 1,663,200 c. 11,880 d. 3,628,800
e. 10 f. 1 g. 72 h. 380 i. 9,900 4.119 a. 0.0784 b. 0.0588 c. 0.2353 d. 0.25
4.79 6840 4.125 a. 0.3441 b. 0.2672 c. 0.1062 d. 0.2471
e. 0.3779
4.81 a. 64,000 b. 0.0156 c. 59,280, 0.0121
4.127 a. 0.5385 b. 0.4615 c. 0.3
4.83 a. 216 b. 144 c. 36
4.129 a. 0.1166 b. 0.2930 c. 0.7070
4.85 a. 0.3626 b. 0.0088 c. 0.6374
4.131 a. 0.144 b. 0.4493 c. 0.7843
4.87 a. 19,958,400 b. 0.0152 c. 0.0909 d. P ( North 0 ( Good c Fair ) 5 0.1814
4.89 a. 455 b. 0.0220 c. 0.2637 d. 0.80 P ( Midwest 0 ( Good c Fair ) 5 0.3733
P ( South 0 ( Good c Fair ) 5 0.4453
4.91 a. 40,320 b. 0.125
The customer is most likely from the South.
4.93 a. 2550 b. 0.84 c. 0.4706
4.133 a. Arrival mode
4.95 a. 20 b. P ( two girls selected ) 5 0.10. Because this Bus Car Walk
probability is so small, there is evidence to suggest that the
process was not random. Carries 625 466 142 1233
Lunch
Buys 345 122 500 967
4.97 a. 3,628,800 b. 0.0222 c. 0.2 d. 0.0079
970 588 642 2200
A-18 A n s w e r s t o Od d -Nu m bered E x er c i s e s
b. Rel. Rel. Rel. 5.19 a. Continuous. b. Continuous. c. Discrete.
n freq. n freq. n freq. d. Continuous. e. Discrete. f. Discrete g. Discrete.
h. Continuous.
50 0.0600 100 0.0900 150 0.0800
200 0.0950 250 0.0800 300 0.0567
Section 5.2
350 0.0629 400 0.0775 450 0.0844
500 0.0600 550 0.0618 600 0.0733 5.21 True.
650 0.0492 700 0.0586 750 0.0640 5.23 All the possible values of X, the probability associated
800 0.0713 850 0.0541 900 0.0633 with each value.
950 0.0642 1000 0.0540 1050 0.0590 5.25 a. 0.07 b. 0.62, 0.42 c. 0.7 d. 0.38
1100 0.0755 1150 0.0635 1200 0.0692
5.27 a. 0.65, 0.35 b. 0.6 c. 0.4615
1250 0.0640 1300 0.0677 1350 0.0681
1400 0.0671 1450 0.0517 1500 0.0680 d. p(x)
1550 0.0516 1600 0.0613 1650 0.0709 0.3
0.08
0.06 5.29 x 1 2 3 4
0.04 p(x) 0.01 0.08 0.27 0.64
0.02
p(x)
0.00 0.6
0 500 1000 1500 2000
0.5
n
0.4
d. An estimate of the probability of winning 5 or more free 0.3
nights is 0.063. e. 0.0625
0.2
0.1
Chapter 5 0.0
1 2 3 4 x
Section 5.1
5.1 True. 5.31 a.
5.3 False. x 1 2 3 4 5 6
p(x) 0.0179 0.0536 0.1071 0.1786 0.2679 0.3750
0 # p ( x ) # 1 and gp ( x ) 5 1.
5.5 Sample space, real numbers.
5.7 Measurement.
b. 0.1786 c. 0.9286 d. 0.2857
5.9 a. Discrete. b. Continuous. c. Continuous. e. p(x)
d. Discrete. e. Discrete. f. Continuous. g. Discrete.
h. Discrete.
0.3
5.11 a. Discrete. b. Continuous. c. Continuous.
d. Discrete. e. Continuous. f. Discrete. 0.2
5.13 a. S 5 5 MM, MW, MB, MG, WM, WW, WB, 0.1
WG, BM, BW, BB, BG, GM, GW, GB, GG 6
b. 0, 1, 2. Discrete. X can assume only a finite number of values. 0.0
1 2 3 4 5 6 x
5.15 a. Discrete. b. Continuous. c. Discrete.
d. Continuous.
5.33 a. 0.35 b. 0.95 c. 0.04 d. 0.01 e. 0.5775
5.17 Continuous. Measuring acceleration.
A-20 A n s w e r s t o O d d -Nu m bered E x e r c i s e s
5 a x 2p ( x ) 2 2m a xp ( x ) 1 m2 a p ( x )
all x all x
b. 0.027 c. 0.657
all x all x all x
5.37 x 0 1 2
5 E ( X 2 ) 2 2mm 1 m2 5 E ( X 2 ) 2 m2
p(x) 0.4000 0.5333 0.0667
5.39 a. 0.1 b. 0.65 c. 0.3025 d. 0.0023 Section 5.4
e. y 50 100 150 200 250
5.71 False.
p( y) 0.55 0.35 0.07 0.02 0.01
5.73 True.
5.41 a. m 100 250 500 1000
5.75 True.
p(m) 0.0667 0.1333 0.4667 0.3333
5.77 The number of successes in n trials.
b. 0.1111
5.79 (1) n identical trials. (2) Each trial can result in only
5.43 a. y 0 1 2 3 4 one of two possible outcomes. (3) Outcomes are independent.
p( y) 0.0027 0.0416 0.2089 0.4305 0.3162 (4) Probability of a success is constant from trial to trial.
b. 0.9973 c. 0.5576 5.81 a. 0.2361 b. 0.0802 c. 0.0393 d. 0.0566
e. 0.7638
Section 5.3
5.83 a. < 1 b. 0.9995 c. 0.5118 d. 0.8047
5.45 True.
5.85 a. m 5 12, s2 5 7.2, s 5 2.6833
5.47 True.
5.49 a [ f ( x ) # p ( x ) ]
b. (9.3167, 14.6833), (6.6334, 17.3666), (3.9501, 20.0499)
c. 0.0009 d. 0.0172
all x
5.87 a. 0.0432 b. 0.9999 c. 18 d. 0.1230
5.51 m 5 7.20, s2 5 8.96, s 5 2.9933
5.89 a. 24 b. 0.9744 c. 0.0095
5.53 a. m 5 0, s2 5 270, s 5 16.4317
b. 1 c. 0.55, 0.45 5.91 a. m 5 45, s2 5 4.5, s 5 2.1213 b. 0.9421
c. 0.8304 d. Claim: p 5 0.9 1 X , B ( 50, 0.9 )
5.55 a. m 5 7.35, s2 5 24.6275, s 5 4.9626
Experiment: x 5 41
b. 0.4 c. 0.95
Likelihood: P ( X # 41 ) 5 0.0579
5.57 a. m 5 9.04 b. s2 5 3.0184, s 5 1.7374 Conclusion: There is no evidence to suggest the claim is false.
c. Solution Trail
Keywords: Within one standard deviation of the mean. 5.93 a. 0.1722 b. 0.2348
c. Claim: p 5 0.17 1 X , B ( 35, 0.17 )
Translation: In the interval ( m 2 s, m 1 s ) .
Experiment: x 5 3
Concepts: Probability distribution; probability statement.
Likelihood: P ( X # 3 ) 5 0.1315
Vision: Write the appropriate probability statement, find the
value(s) of X that lie in the interval, and add the corresponding Conclusion: There is no evidence to suggest the claim is false.
probabilities. 5.95 a. 0.2599 b. 0.4705 c. 0.5205
0.62 d. 0.82
5.97 a. 0.1297 b. 0.3438
5.59 a. m 5 15.55, s2 5 40.4475, s 5 6.3598 c. Claim: p 5 0.11 1 X , B ( 50, 0.11 )
b. 0.82 c. 0.92 d. 386 Experiment: x 5 12
2
5.61 a. m 5 7.998, s 5 6.706, s 5 2.5896 b. 0.6 Likelihood: P ( X $ 12 ) 5 0.0069 ( # 0.05 )
c. 0.000144 Conclusion: There is evidence to suggest the claim is false.
A n s w e r s t o O d d -N u m b e r e d E xer c i ses A-21
5.99 a. 0.3269 b. m 5 4.72, P ( X , m ) 5 0.4729 5.135 a. 0.4431 b. 0.9819 c. 0.4290, 0.9782 d. 0.4096,
c. 0.0097 0.9728 e. In a hypergeometric experiment, the probability of
a success changes on each trial. As N increases with M/N
5.101 a. 0.2880, 0.0640 b. 0.9744 c. 0.9898 d. n $ 8
constant, the hypergeometric probabilities approach the
5.103 a. 0.0014 b. 0.9659 c. 0.9064 corresponding binomial probabilities.
5.105 a. 0.1933 b. 0.9435 c. 0.1121 d. 0.00005268 5.137
a P(X 5 a) P ( Y 5 1 ) /a P(X # a) 1 2 P(Y 5 0)
Section 5.5 1 0.4 0.4 0.4 0.4
5.107 True. 2 0.24 0.24 0.64 0.64
3 0.144 0.144 0.784 0.784
5.109 True.
4 0.0864 0.0864 0.8704 0.8704
5.111 In a hypergeometric experiment, the probability of a 5 0.0518 0.0518 0.9222 0.9222
success changes on each trial. 6 0.0311 0.0311 0.9533 0.9533
5.113 a. 0.25 b. 0.4290 c. 0.0563 7 0.0187 0.0187 0.9720 0.9720
5.115 a. 0.4679 b. 0.1125 c. 0.3606 d. 0.9597
8 0.0112 0.0112 0.9832 0.9832
9 0.0067 0.0067 0.9899 0.9899
5.117 a. 4, 5, 6, 7, 8 b. m 5 6, s2 5 0.8, s 5 0.8944 10 0.0040 0.0040 0.9940 0.9940
c. 0.2462 d. 0.0385
5.119 a. Solution Trail P ( X 5 a ) 5 P ( Y 5 1 ) / a; P ( X # a ) 5 1 2 P ( Y 5 0 ) .
Keywords: Mean number of robberies per day; 4.32. a
a b ( 0.4 ) 1 ( 0.6 ) a21
Translation: Fixed time; l 5 4.32. P(Y 5 1) 1 a ( 0.4 )( 0.6 ) a21
5 5
Concepts: Poisson probability distribution. a a a
Vision: The mean of the Poisson distribution is given. Write a 5 ( 0.4 )( 0.6 ) a21 5 P ( X 5 a )
probability statement and convert to cumulative probability if
necessary. a
1 2 P ( Y 5 0 ) 5 1 2 a b ( 0.4 ) 0 ( 0.6 ) a 5 1 2 ( 1 )( 1 )( 0.6 ) a
0.1241 b. 0.0325 c. 0.00017689 0
5 1 2 ( 0.6 ) a 5 P ( X # a )
5.121 a. 0.1743 b. 0.7199 c. 0.0358 d. 0.2957
5.123 a. 0.0047 b. 0.3134 c. 0.3012 Chapter Exercises
5.125 a. 0.0821 b. 0.0042 c. 0.9580
5.139 a. m 5 20, s2 5 4, s 5 2 b. 0.8909
5.127 a. Solution Trail c. Claim: p 5 0.8 1 X , B ( 25, 0.8 )
Keywords: 15 lobstermen; 5 fined; 4 selected at random. Experiment: x 5 21
Translation: N 5 15; M 5 5 successes; 10 failures; sample Likelihood: P ( X $ 21 ) 5 0.4207
size n 5 4. Conclusion: There is no evidence to suggest the supervisor’s
Concepts: Hypergeometric probability distribution. claim is false.
Vision: Sampling without replacement, and the population is 5.141 a. 0.0573 b. 12.5 c. 0.1887
finite. Write a probability statement involving a hypergeometric
random variable and use the appropriate formula. 5.143 a. 0.1291 b. 0.8687 c. 0.0139
0.3297 b. 0.0037 c. 0.8462 5.145 a. 0.0774 b. 0.8041 c. 0.0238 d. 0.6476
5.129 a. 0.3679 b. 0.9810 5.147 a. 0.2205 b. 0.0567 c. 1.878 3 10212
c. Claim: l 5 1 1 X is a Poisson RV.
5.149 a. 0.0451 b. 0.4826
Experiment: x 5 6
c. Claim: p 5 0.81 1 X , B ( 30, 0.81 )
Likelihood: P ( X $ 6 ) 5 0.0005942
Experiment: x 5 18
Conclusion: There is evidence to suggest the claim is false.
Likelihood: P ( X # 18 ) 5 0.0062
5.131 a. 0.1839 b. 0.9963 Conclusion: There is evidence to suggest the claim (81%) is
c. Claim: l 5 1 1 X is a Poisson RV. false.
Experiment: x 5 3
5.151 a. 0.0498 b. 0.6472
Likelihood: P ( X $ 3 ) 5 0.0803 c. Claim: l 5 3 1 X is a Poisson RV.
Conclusion: There is no evidence to suggest the claim is false. Experiment: x 5 6
5.133 a. 0.0091 b. 0.1954 c. 0.0959 Likelihood: P ( X $ 6 ) 5 0.0839
A-22 A n s w e r s t o O d d -N u m bered E x er c i s e s
0.6
0.4
0.2 0 4z
P(Z # 4) < 1
21.0 20.5 0.5 1.0 x
6.33 a. 0.6827 b. 0.9544 c. 0.9974. These are the
b. 0.8183 empirical rule probabilities.
6.35
Section 6.2 a. b.
6.27 True.
6.29 Standardization.
6.31
a. b.
0.0251 z 0 1.2372 z
b 5 0.0251 b 5 1.2372
c. d.
0 2.16 z 0 2.16 z
P ( Z # 2.16 ) 5 0.9846 P ( Z , 2.16 ) 5 0.9846
c. d.
0 1.6449 z –2.3263 0 z
b 5 1.6449 b 5 22.3263
e. f.
–0.470 z 0 0.73 z
P ( Z # 20.47 ) 5 0.3192 P ( 0.73 . Z ) 5 0.7673
e. f.
0 5 z 4 0 z 34 x 50 62 70 x
P ( Z , 5 ) < 1 P ( Z # 24 ) < 0 0.1845 0.6730
A-24 A n s w e r s t o O d d -Nu mbered Ex er c i s e s
c. d. 6.71 x
85
75
65
55
45
35
32 45 x 76.95 77 x
25
0.0088 0.3085
15
–2 –1 0 1 2 z
e. f.
The curved graph and the possible outliers suggest the data are
from a nonnormal population.
6.73 Frequency histogram:
6.61 a. 89.4 b. 0.0039 c. 83.7, 86.3 d. 83.25 110
90
Section 6.3
22 21 0 1 2 z
6.65 True.
There is no evidence to suggest that the data are from a non-
6.67 True. normal population. The points appear to fall on a straight line.
The four methods do not suggest that the data are from a
6.69 True.
non-normal population.
A n s w e r s t o O d d -N u m b e r e d E xer c i ses A-25
There is a slight curve in the plot near the bottom left. The points appear to fall along a straight line. There is no evidence
However, there is probably not enough evidence to suggest to suggest that the data are from a non-normal population.
non-normality. The four methods indicate that there is no evidence to suggest
the data are from a non-normal population.
b. Backward empirical rule:
x 5 1480, s 5 482.75 6.79 Frequency histogram:
Interval Proportion
9
Within 1s: (997.25, 1962.75) 0.67
8
Within 2s: (514.50, 2445.50) 1.00 7
Within 3s: ( 31.75, 2928.25) 1.00
Frequency 6
5
The actual proportions are not far enough away from those
4
given by the empirical rule to suggest that the data are from a
3
non-normal population.
2
6.77 Frequency histogram: 1
17 18 19 20 21 22 23
Absolute magnitude
2
The histogram suggests that the data are from a non-normal
population.
Frequency
There is some evidence that these data are from a non-normal 6.89 a. f(x)
distribution. The histogram indicates several outliers, and the
0.25
normal probability plot exhibits some nonlinearity.
6.81 The normal probability plot suggests that the data are from 0.20
a non-normal population. The plot exhibits a nonlinear pattern.
0.15
6.83 Frequency histogram:
0.10
7 0.05
6
22 21 0 1 2 3 4 5 6 7 8 9 x
5
Frequency
600 7
6
575
Frequency
5
550
4
525 3
22 21 0 1 2 z
2
The normal probability plot suggests that the data are from
1
a non-normal population. The plot exhibits a nonlinear pattern.
0
The four methods suggest that the data are from a non-normal 0 2,000 4,000 6,000 8,000 10,000 12,000
population. Rice production
Backward empirical rule:
x 5 4620.07, s 5 3229.74
Section 6.4
Interval Proportion
6.85 True.
Within 1s: ( 1390.3, 7849.8 ) 0.60
6.87 a. 1 2 e2lx for x $ 0 b. e2lx for x $ 0 Within 2s: ( 21839.4, 11079.5 ) 0.97
Within 3s: ( 25069.1, 14309.3 ) 1.00
A n s w e r s t o O d d -N u m b e r e d Exer c i ses A-27
6
5 Section 7.1
4
3 7.1 Population. Sample.
2
7.3 True.
1
0 7.5 Obtain many values of the statistic and construct a
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
histogram.
Page coverage
Backward empirical rule: N
7.7 a b
x 5 0.1730, s 5 0.1289 n
Interval Proportion 7.9 a. Statistic. b. Parameter. c. Statistic. d. Parameter.
Within 1s: ( 0.0441, 0.3019 ) 0.87 e. Statistic.
Within 2s: ( 20.0847, 0.4307 ) 0.97 7.11 a. Statistic. b. Parameter. c. Parameter.
Within 3s: ( 20.2136, 0.5596 ) 0.97 d. Parameter. e. Statistic.
IQR/s 5 1.2415 7.13 a. Random samples will vary. b. Frequency histogram:
0.7
0.6 8
0.5 6
0.4 4
0.3 2
0.2 0
0.1 300 325 350 375 400 425 450 475
0.0 Sample mean
–2 –1 0 1 2 z
c. Approximately normal. Approximate mean: 380
There is some evidence that the data are from a non-normal d. m 5 379.7, almost the same.
population. The histogram and the normal probability plot
indicate an outlier, and the backward empirical rule proportions
are inconsistent.
A-28 A n s w e r s t o O d d -Nu m b ered E x er c i s e s
p(s2) 0.365 0.405 0.180 0.050 Graph of the probability density function:
Conclusion: There is no evidence to suggest that the claim is c. Histogram of the sample means:
false, that the mean weight of bags of potato chips is less than
100
12 ounces.
90
7.45 a. 0.0142 b. 0.0031 c. (87.5, 90.5) 80
70
7.47 a. 0.9431 b. 0.0569 c. 0.2635
Frequency
60
7.49 a. 0.0432 b. 0.3318 50
40
c. Claim: m 5 28 1 X , N ( 28, 72 / 50 )
d
30
Experiment: x 5 29.75 20
Likelihood: P ( X $ 29.75 ) 5 0.0385 10
Conclusion: There is evidence to suggest that the claim is false, 0
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
that the mean vertical leap has increased.
Sample mean
7.51 a. X5 , N ( 3.7, 0.2 ) , X20 , N ( 3.7, 0.05 )
The shape of the histogram is approximately normal, centered
b. Blue curve: X; red curve: X5; green curve: X20.
near the population mean 30.85.
Section 7.3
7.61 False.
7.63 True.
7.65 a. np 5 25 $ 5, n ( 1 2 p ) 5 75 $ 5,
= d
P , N ( 0.25, 0.0019 ) b. np 5 135 $ 5,
=
n ( 1 2 p ) 5 15 $ 5, P , N ( 0.90, 0.0006 )
d
e. np 5 30 $ 5, n ( 1 2 p ) 5 4970 $ 5,
d. 0.1030 =
P , N ( 0.006, 0.000001928 )
d
1.86 0.0001 1.88 0.0016 1.90 0.0176 c. 0.0489 d. 0.1697
1.92 0.1031 1.94 0.3368 1.96 0.6632 =
7.73 a. P , N ( 0.46, 0.003312 ) b. 0.1486
d
1.98 0.8953 2.00 0.9648 2.02 0.8953 c. 0.5690 d. (0.3118, 0.6082)
2.04 0.6632 2.06 0.3368 2.08 0.1031
7.75 a. 0.0881 b. 0.9634
2.10 0.0176 2.12 0.0016 2.14 0.0001 = d
c. Claim: p 5 0.125 1 P , N ( 0.125, 0.0007292 )
=
b. Graph of the OC curve: Experiment: p 5 0.06
=
Likelihood: P ( P # 0.06 ) 5 0.0080
p (accept)
Conclusion: There is evidence to suggest that the claim is false,
0.8 that the proportion of children in this age group with allergies
is less than 0.125.
0.6
7.77 a. 0.0745 b. 0.2799
= d
0.4 c. Claim: p 5 0.10 1 P , N ( 0.10, 0.0003 )
=
Experiment: p 5 0.16
0.2 =
Likelihood: P ( P $ 0.16 ) 5 0.000266
0.0 Conclusion: There is evidence to suggest that the claim is false,
1.86 1.90 1.94 1.98 2.02 2.06 2.10 2.14 m
that the funding rate has increased.
7.59 a. m 5 30.85 b. Answers will vary. 7.79 a. 0.0468 b. 0.1366 c. 0.4655
A-30 A n s w e r s t o Od d - Nu m bered E x e r c i s e s
=
7.81 a. 0.0066 b. 0.8781 7.99 a. P , N ( 0.12, 0.000422 )
d
7.83 a. 0.0863 b. 0.0649 c. 0.000477 Graph of the probability density function:
s P2
0.0025
0.0020
0.0015
0.0010
=
The graph suggests that the variance is a maximum when Experiment: p 5 0.09
p 5 0.50. =
Likelihood: P ( P # 0.09 ) 5 0.0722
Conclusion: There is no evidence to suggest that the claim is
Chapter Exercises false.
=
7.101 a. P , N ( 0.65, 0.0002275 ) , np 5 650 $ 5,
d
that the mean coefficient of static friction is greater than 0.5. d. n $ 74
7.89 a. Distribution of M: 7.107 a. X , N ( 800, 1252 / 40 )
d
Experiment: x 5 8.1 =
8.7 u 2: unbiased and small variance.
Likelihood: P ( X $ 8.1 ) 5 0.1719
8.9 The value of the unbiased estimator is, on average, u.
Conclusion: There is no evidence to suggest that the claim is
false. 8.11 0.8125
7.95 a. Statistic b. Parameter c. Statistic d. Parameter 8.15 a. 95 b. 104 c. (95, 104)
e. Statistic f. Statistic 8.17 a. 30.5 b. 24.4615 c. 26.5, 34.0
7.97 a. f1: underlying distribution; f2: distribution of the
sample mean. b. f1: underlying distribution; f2: distribution Section 8.2
of the sample mean. c. f1: underlying distribution;
8.19 False.
f2: distribution of the sample mean. d. f1: underlying
distribution; f2: distribution of the sample mean. 8.21 True.
A n s w e r s t o O d d -Nu m b e r e d Exer c i ses A-31
8.23 Wider. b. There is evidence to suggest that the mean coping skills
level is different for football and basketball players.
8.25 The probability that the confidence interval encloses the
The CIs do not overlap.
population parameter in repeated samplings. c. Football, 191; basketball, 151; hockey: 101.
8.27 a. (13.507, 17.693) b. (6232.2, 6411.8) 8.49 a. Cashew, (5.0591, 5.2809); filbert, (4.0737, 4.4063);
c. ( 251.06, 240.50 ) d. (0.0763, 0.0827) pecan, (2.3367, 2.8633). Cashews and pecans: yes. The
e. (36.287, 39.073) CIs do not overlap. Filberts and pecans: yes. The CIs
8.29 a. 9.7 b. 95% CI: (8.55, 10.85); 99.9% CI: (8.40, do not overlap. b. Cashew, (4.9852, 5.3548); filbert,
11.0). For a higher confidence level (all else being equal), (3.9628, 4.5172); pecan, (2.1611, 3.0389), Cashews and
the CI has to be larger. pecans: yes. The CIs do not overlap. Filberts and pecans:
yes. The CIs do not overlap.
8.31 (4.637, 5.343) s
8.51 a. Upper bound: x 1 za
8.33 a. (9.3713, 9.6851) b. (9.3220, 9.7344) !n
c. For a larger confidence level, the critical value is also s
b. Lower bound: x 2 za c. m , 2.4467
larger. This change makes the confidence interval in part (b) !n
wider.
8.53 This is the best CI because it is the narrowest
8.35 a. (92.8006, 111.1994) 100 ( 1 2 a ) % CI for m.
b. Solution Trail
Keywords: Is there any evidence? Section 8.3
Translation: Inference procedure. 8.55 True.
Concepts: 100 ( 1 2 a ) % CI for a population mean when s is
known. 8.57 False.
Vision: If the CI in part (a) includes 95, then there is no 8.59 a, n (the degrees of freedom)
evidence to suggest that m is more than 95. Use the four-step
8.61 a. 1.4759 b. 0.8569 c. 2.8609 d. 2.3646
inference procedure.
e. 2.9467 f. 5.2076 g. 3.7676 h. 22.2037 i. 1.7959
Claim: m 5 95
8.63 a. 1.0931 b. 1.5286 c. 3.1534 d. 2.4377
Experiment: x 5 102 e. 2.6981 f. 1.9921 g. 2.1150 h. 2.8652 i. 1.6794
Likelihood: A 99% confidence interval or interval of likely
values for m: (92.8006, 111.1994). 8.65 a. (0.1908, 0.2772) b. (217.5618, 301.6382)
c. (19.005, 26.695) d. (367.9725, 393.8275)
Conclusion: Because this CI includes 95, there is no evidence
e. (72.4005, 103.7995)
to suggest that m is greater than 95.
8.67 a. (0.2345, 0.2963) b. No. The CI in part (a) includes
8.37 a. (19.422, 22.078)
0.25. c. Frequency histogram:
b. Yes. The CI does not include 23.
There is evidence to suggest that the data are from a non- IQR/s 5 1.2131
normal population. This ratio is fairly close to 1.3.
8.69 a. (262.3195, 270.0971) b. Yes. The CI does not Normal probability plot:
x
include 275.4.
18
8.71 a. (15.3816, 18.0184)
b. Solution Trail 16
Keywords: Is there any evidence? 14
Translation: Inference procedure.
12
Concepts: A 100 ( 1 2 a ) % CI for a population mean when s
is unknown. 10
Vision: If 20 lies in (or is greater than) the CI from part (a), 8
then there is no evidence to suggest that the true mean length is
22 21 0 1 2 z
over 20 miles.
There is some evidence of a curve in this plot near the bottom left.
There is no evidence to suggest that the true mean length is
over 20 miles. The CI in part (a) suggests that the true mean There is insufficient evidence to suggest that the data are from
length is under 20 miles. a non-normal population.
8.73 a. (29.2575, 29.7908) b. Yes. The CI does not contain 30. City areas:
Frequency histogram:
8.75 a. (7.8303, 11.9697) b. Yes. The CI is below, and does
not include, 12. 8
8.77 a. (59.1498, 84.5168) b. (38.6665, 58.1668) 7
c. Yes. The CIs do not overlap. 6
Frequency
8.97 a. 461 b. 97 c. 338,244 d. 271 e. 68 40
8.99 a. Increases. b. Increases. c. Decreases.
d. Decreases. 20
Section 8.5 There is a slight curve to the points in this scatter plot.
8.123 False. There is no overwhelming evidence to suggest that the data are
from a non-normal population.
8.125 False.
8.137 a. (14.2571, 40.6222)
8.127 a. 9.2364 b. 61.0983 c. 26.2962 d. 35.4789
b. Frequency histogram:
e. 3.0535 f. 7.2609 g. 11.6886 h. 1.7349
8.129 a. 10.2829, 35.4789 b. 17.8867, 61.5812
c. 2.5582, 23.2093 d. 18.4927, 43.7730 6
e. 0.4844, 11.1433 f. 14.4012, 70.5881 5
Frequency
8.131 a. (1.347, 11.2313), (1.1588, 3.3513) 4
b. (31.2940, 197.4147), (5.5941, 14.0504)
c. (31.9769, 183.4121), (5.6548, 13.5430) 3
d. (3.0994, 26.5425), (1.7605, 5.1519) 2
e. (18.5739, 64.0850), (4.3097, 8.0053)
1
f. (5.8969, 36.9707), (2.4284, 6.0804)
0
8.133 a. (2.5912, 8.2250) b. (1.6097, 2.8679) 6 8 10 12 14 16 18 20 22 24
Relative humidity
8.135 a. (1.7229, 8.9829)
b. Frequency histogram:
The shape of the distribution does not appear to be normal.
Backward empirical rule:
4
x 5 13.0667, s 5 4.7411
Frequency
3 Interval Proportion
0
1 2 3 4 5 6 7 The actual proportions are close to those given by the
Time e mpirical rule.
IQR/s 5 1.6874
The shape of the distribution does not appear to be normal.
This ratio is not close to 1.3.
Backward empirical rule:
x 5 2.8294, s 5 1.8401 Normal probability plot:
Interval Proportion x
24
Within 1s: ( 0.9894, 4.6695 ) 0.67 22
Within 2s: ( 20.8507, 6.5096 ) 0.94 20
Within 3s: (22.6908, 8.3497 ) 1.00 18
16
The actual proportions are close to those given by the empirical 14
rule. 12
IQR /s 5 1.3152 10
This ratio is very close to 1.3. 8
6
Normal probability plot: 22 21 0 1 2 z
x
7 There is a slight curve to the points in this scatter plot.
6
There is evidence to suggest that the data are from a non-
5 normal population.
4
8.139 a. (4.1702, 23.9560) b. (18.1418, 171.9068)
3
c. No. The CIs overlap.
2
1 8.141 a. (0.5611, 2.2381) b. (15.5896, 71.8413)
c. Veteran. The CI for the variance for yardage gained is much
0
22 21 0 1 2 z narrower and covers smaller values.
A n s w e r s t o O d d -N u m b e r e d E xer c i ses A-35
8.143 a. (1589.5501, 7536.9105) b. (39.8692, 86.8154) 8.155 a. CI for s2:
c. Frequency histogram:
s2 ( n 2 1 ) s2 ( n 2 1 )
a , s2 , b
za/2 !2 ( n 2 1) 1 ( n 2 1) 2za/2 !2 ( n 2 1) 1 ( n 2 1)
14
12
b. i. (11.3763, 25.1102) ii. (11.6739, 26.7267)
iii. The interval based on the normal distribution is wider
10
Frequency
9.11 a. Not permissible. We never accept a null hypothesis. when they really do. Type II error: Decide that m 5 1200 (or
We fail to reject the null hypothesis. b. Permissible. greater) when the true mean is really less than 1200. That is,
c. Not permissible. We conduct a hypothesis test to try and find decide that the helmets meet the standard when they really do not.
evidence in favor of the alternative hypothesis.
9.51 a. H0: p 5 0.02, Ha: p . 0.02. b. Type I error: Decide
d. Permissible. e. Permissible. f. Permissible.
that p . 0.02 when the true proportion is really 0.02 (or less).
9.13 H0: p 5 0.11, Ha: p . 0.11 Type II error: Decide that p 5 0.02 (or less) when the true
proportion is really greater than 0.02.
9.15 H0: m 5 4.25, Ha: m , 4.25
9.53 a. H0: p 5 0.25, Ha: p . 0.25. b. Type I error: Decide
9.17 H0: s2 5 32, Ha: s2 , 32
that p . 0.25 when the true proportion is really 0.25 (or less).
9.19 H0: m 5 178 (minutes), Ha: m , 178 Type II error: Decide that p 5 0.25 (or less) when the true
proportion is really greater than 0.25. c. b ( 0.35 ) is smaller.
9.21 H0: p 5 0.65, Ha: p . 0.65 The probability of missing the difference between 0.25 and
9.23 H0: m 5 2000, Ha: m , 2000 0.35 is smaller than that of missing the difference between 0.25
and 0.27.
9.25 H0: m 5 525, Ha: m , 525
9.55 a. H0: m 5 6400, Ha: m . 6400. b. a 5 0.1. This
9.27 H0: m 5 25, Ha: m , 25 allows a larger error on the side of safety.
9.29 H0: m 5 1925, Ha: m , 1925
Section 9.3
9.31 H0: m 5 2666, Ha: m . 2666
9.57 False.
Section 9.2 9.59 False.
9.33 False. 9.61 True.
9.35 True. X 2 45.6
9.63 a. Z 5 b. i. Z $ 2.3263 ii. Z $ 1.96
9.37 a. Type I error. b. Correct decision. c. Type II error. 15 / !16
d. Type I error. iii. Z $ 1.6449 iv. Z $ 1.2816 v. Z $ 2.5758
9.39 a. Type I error. b. Type II error. c. Correct decision. vi. Z $ 3.2905
d. Correct decision. 9.65 a. 0.05 b. 0.005 c. 0.02 d. 0.01 d. 0.001
9.41 There is always a chance of making a mistake in any e. 0.0001
hypothesis test because we never examine the entire population, 9.67 a. 0.0001 b. 0.20 c. 0.01 d. 0.05 e. 0.0005
only a sample. f. 0.002
9.43 a. H0: m 5 140,000, Ha: m . 140,000. b. Type I error:
X 2 m0
Decide that the mean is greater than 140,000 when the true 9.69 a. H0: m 5 3.14; Ha: m , 3.14; TS: Z 5 ;
mean is 140,000 (or less). Type II error: Decide that the mean is s/ !n
140,000 (or less) when the true mean is greater than 140,000. RR: Z # 23.0902 b. The sample size is large and the
c. Drivers are more angry. Tolls do not have to be raised. population standard deviation is known. c. z 5 21.2588.
d. Transportation officials are more angry. They really need There is no evidence to suggest that the population mean is
additional tolls to fund planned repairs. less than 3.14.
9.45 a. H0: m 5 0.65, Ha: m . 0.65. b. Type I error: Decide 9.71 a. The test is right-tailed with a 5 0.05. The rejection
that m . 0.65 when the true mean is really 0.65 (or less). Type region should be RR: Z $ 1.6449. b. The numerator in the
II error: Decide that m 5 0.65 (or less) when the true mean is test statistic is reversed. It should be X 2 m0 . c. We never
really greater than 0.65. c. Type II error. If the mean current accept the null hypothesis, and we cannot prove the null
velocity is greater than 0.65, it is unsafe for swimmers. hypothesis is true. We simply fail to find evidence in favor of
d. Type I error. The race would be canceled, but the mean cur- the alternative hypothesis. d. The null hypothesis is always
rent is really safe. stated with an equals sign. e. The probabilities of type I
and type II errors are inversely related. However, this does not
9.47 a. Type I error: Decide that m . 0.4 when the true mean
mean they sum to 1.
is really 0.4 (or less). Type II error: Decide that m 5 0.4 (or
less) when the true mean is really greater than 0.4. b. Type II 9.73 Solution Trail
error. Chocolate is really increasing the level of antioxidants, Keywords: Is there any evidence; decreased; s 5 65.
but there is no evidence. Translation: Conduct a one-sided, left-tailed test about a
9.49 a. H0: m 5 1200, Ha: m , 1200. b. Type I error: population mean.
Decide that m , 1200 when the true mean is really 1200 (or Concepts: Hypothesis test concerning a population mean when
greater). That is, decide that the helmets do not meet the standard s is known.
A n s w e r s t o O d d -N u m b e r e d E xer c i ses A-37
X 2 m0 9.101 True.
9.83 H0: m 5 68; Ha: m . 68; TS: Z 5 ;
s/ !n 9.103 Reject the null hypothesis.
RR: Z $ 1.6449
z 5 1.4310. There is no evidence to suggest that the mean lock- 9.105 a. 0.0307 b. 0.0054 c. 0.1151 d. 0.2843
smith service charge is greater than $68. e. 0.00005 f. 0.8729
X 2 m0 Section 9.5
9.111 H0: m 5 10; Ha: m . 10; TS: Z 5
s/ !n 9.125 True.
z 5 2.6904, p 5 0.0036 # 0.05. There is evidence to suggest
that the mean is greater than 10. 9.127 False.
X 2 m0 X 2 m0
9.113 H0: m 5 1; Ha: m 2 1; TS: Z 5 9.129 a. T 5 b. (i) T $ 3.3649
s/ !n S/ !n
z 5 1.7678, p 5 0.0771 . 0.01. There is no evidence to ii. T $ 2.0739 iii. T $ 1.7459 iv. T $ 1.3125
suggest that the mean is different from 1. v. T $ 4.2968 vi. T $ 6.4420
X 2 m0 X 2 m0
9.115 H0: m 5 5.5; Ha: m . 5.5; TS: Z 5 9.131 a. T 5 b. i. 0 T 0 $ 3.1058
s/ !n S/ !n
z 5 2.2692, p 5 0.0116 # 0.05. There is evidence to suggest ii. 0 T 0 $ 1.3304 iii. 0 T 0 $ 2.0595 iv. 0 T 0 $ 1.7033
that the mean is greater than 5.5. v. 0 T 0 $ 5.9588 vi. 0 T 0 $ 2.3961
X 2 m0 X 2 m0
9.147 H0: m 5 23.1; Ha: m . 23.1; TS: T 5 ; 9.159 a. H0: m 5 4.75; Ha: m , 4.75; TS: T 5 ;
S/ !n S/ !n
RR: T $ 2.8965 RR: T # 22.4121
t 5 2.1429 , 2.8965; p 5 0.0322 . 0.01. There is no evi- t 5 22.4416 # 22.4121; p 5 0.0093 # 0.01. There is
dence to suggest that the mean hotel room occupation rate has evidence to suggest that the population mean width of little-
increased. If the test is significant, one cannot conclude the ad necks has decreased. b. This test is statistically significant
campaign caused the increase. even though the sample mean (4.66) is so close to the
X 2 m0 assumed population mean (4.75), because the sample size is
9.149 H0: m 5 3.2; Ha: m 2 3.2; TS: T 5 ; fairly large ( n 5 46 ) and the sample standard deviation is
S/ !n quite small ( s 5 0.25 ) relative to the sample mean.
RR: 0 T 0 $ 2.0860
c. 0.005 # p # 0.01
t 5 20.2770; 0 t 0 , 2.0860; p 5 0.7846 . 0.05. There is no
evidence to suggest that the mean physician density is different X 2 m0
9.161 a. H0: m 5 1.0; Ha: m . 1.0; TS: T 5 ;
from 3.2. S/ !n
X 2 m0 RR: T $ 1.7033
9.151 a. H0: m 5 9; Ha: m . 9; TS: T 5 ; t 5 2.0264 $ 1.7033; p 5 0.0264 # 0.05. There is evidence
S/ !n
RR: T $ 1.8595 to suggest that the true mean magnitude is greater than 1.0.
b. 0.025 # p # 0.05
t 5 0.8517 , 1.8595; p 5 0.2096 . 0.05. There is no
evidence to suggest that the mean floor area is greater than X 2 m0
9 million square feet. 9.163 a. H0: m 5 65; Ha: m . 65; TS: T 5 ;
S/ !n
b. p . 0.20
RR: T $ 2.6503
9.153 a. H0: m 5 1381; Ha: m 2 1381; t 5 0.4795 , 2.6503; p 5 0.3198 . 0.01. There is no
X 2 m0 evidence to suggest that the mean amount of aspartame is
TS: T 5 ; RR: 0 T 0 $ 2.1199 greater than 65 mg. b. p $ 0.20
S/ !n
t 5 2.4969; 0 t 0 $ 2.1199; p 5 0.0238 # 0.05. There 9.165 a. H0: l 5 4; Ha: l , 4; TS X 5 the number of
is evidence to suggest that the true mean loss as a result people locked out of their rooms; RR: X # 0 ( a < 0.0183 ) .
of a residential break-in has changed from $1381. b. x 5 2. There is no evidence to suggest that the mean number
b. 0.02 # p # 0.05 of people locked out of their rooms per day is less than 4.
c. p-Value illustration: c. P ( X # 2 ) 5 0.2381
Section 9.6
9.167 True.
9.169 False.
9.171 The hypothesis test is not valid.
9.173 Value of
RR the TS Conclusion
22.4969 0 2.4969 t
a. Z $ 1.6449 0.3033 Do not reject.
9.155 a. H0: m 5 159,350; Ha: m . 159,350; b. Z $ 1.2816 1.4882 Reject.
X 2 m0 c. Z $ 2.3263 2.3327 Reject.
TS: T 5 ; RR: T $ 2.7638 d. Z $ 1.9600 0.6872 Do not reject.
S/ !n
e. Z $ 2.3263 0.3307 Do not reject.
t 5 1.4855 , 2.7638; p 5 0.0841 . 0.01. There is no
evidence to suggest that the mean methane output per well per
day has increased. 9.175 Value of
b. 0.05 # p # 0.10 RR the TS Conclusion
X 2 m0 a. 0 Z 0 $ 2.2414 22.0142 Do not reject.
9.157 a. H0: m 5 13.5; Ha: m . 13.5; TS: T 5 ;
S/ !n b. 0 Z 0 $ 2.3263 2.3451 Reject.
RR: T $ 2.4999 c. 0 Z 0 $ 1.9600 22.0294 Reject.
t 5 3.0384 $ 2.4999; p 5 0.0029 # 0.01. There is evidence d. 0 Z 0 $ 2.8070 21.4025 Do not reject.
to suggest that the mean length of lionfish in this area is greater
e. 0 Z 0 $ 2.5758 2.6455 Reject.
than 13.5 inches. b. 0.001 # p # 0.005
A-40 A n s w e r s t o Od d - Nu m bered E x e r c i s e s
49-year-olds who obtain news every day from nightly 10.133 a. H0: s21 5 s22; Ha: s21 2 s22;
network news shows. TS: F 5 S 21 / S 22 ; RR: F # 0.17 or F $ 4.54
b. H0: p1 2 p2 5 0; Ha: p1 2 p2 , 0; b. f 5 4.8259 $ 4.54; p 5 0.0074. There is evidence to
= =
P1 2 P2 suggest that population variance 1 is different from population
TS: Z 5 = = ; RR: Z # 22.3263 variance 2. c. 0.002 # p # 0.02
"Pc ( 1 2 Pc ) A n11 1 n12 B
z 5 23.7319 # 22.3263; p 5 0.0001 # 0.01. There is 10.135 a. (0.3254, 3.5608) b. (0.4712, 5.8451)
evidence to suggest the population proportion of 18- to 29-year- c. (0.2703, 2.3458) d. (0.2843, 2.5079)
olds who obtain news every day from cable news shows is less
than the population proportion of 30- to 49-year-olds who 10.137 H0: s21 5 s22; Ha: s21 . s22; TS: F 5 S 12 / S 22;
obtain news every day from cable news shows. RR: F $ 2.39
c. H0: p1 2 p2 5 0; Ha: p1 2 p2 2 0; f 5 4.6944 $ 2.39; p 5 0.0019 # 0.05. There is evidence to
= = suggest that the population variance in the aerosol light absorp-
P1 2 P2
TS: Z 5 = = ; RR: 0 Z 0 $ 2.8070 tion coefficient is greater in Africa than in South America.
"Pc ( 1 2 Pc ) A n11 1 n12 B
10.139 H0: s21 5 s22; Ha: s21 2 s22;
z 5 22.0501; 0 z 0 , 2.8070; p 5 0.0404 . 0.005. There is no
TS: F 5 S 12 / S 22; RR: F # 0.43 or F $ 2.41
evidence to suggest that the population proportion of 18- to
f 5 0.4960; 0.43 , f , 2.41; p 5 0.1025 . 0.05. There is no
29-year-olds who obtain news every day from the Internet is
evidence to suggest that the population variance in checked-
different from the population proportion of 30- to 49-year-olds
baggage weight is different for these two airlines.
who obtain news every day from the Internet.
10.141 a. H0: s21 5 s22; Ha: s21 . s22;
10.121 H0: p1 2 p2 5 0.10; Ha: p1 2 p2 . 0.10;
= = TS: F 5 S 12 / S 22; RR: F $ 2.60
( P1 2 P2 ) 2 D0
TS: Z 5 = = = = RR: Z $ 2.3263 f 5 2.8681 $ 2.60; p 5 0.0054 # 0.01. There is evidence to
( ) ( )
#P1 1 n21 P1 1 P2 1 n22 P2 suggest that the population variance in winning times for an
z 5 0.8038 , 2.3263; p 5 0.2108 . 0.01. There is no ordinary race is greater than the population variance in winning
evidence to suggest that the population proportion of times for a stakes race. b. (1.1014, 7.4689)
homeowners planning a landscaping project is more than 0.10 10.143 a. H0: s21 5 s22; Ha: s21 2 s22;
greater than the population proportion of condominium owners TS: F 5 S 12 / S 22; RR: F # 0.40 or F $ 2.53
planning a landscaping project. f 5 0.5784; 0.40 , f , 2.53. There is no evidence to suggest a
difference in the population variance of times for men and
Section 10.5 women who finished this vertical run.
b. p $ 0.05; p < 0.2418
10.123 False.
10.145 H0: s21 5 s22; Ha: s21 , s22; TS: F 5 S 12 / S 22;
10.125 False.
RR: F # 0.42
10.127 The value of S 21 is, on average, s21. The value of S 22 is, f 5 0.3237 # 0.42; p 5 0.0014 # 0.01. There is evidence
on average, s22. to suggest that the variability in item weight at Jefferson is
greater that at New York Avenue.
10.129 a. 2.20 b. 3.15 c. 3.58 d. 4.99 e. 0.23
f. 0.12 g. 0.34 h. 0.25 10.147 H0: s21 5 s22; Ha: s21 2 s22;
TS: F 5 S 12 / S 22; RR: F # 0.30 or F $ 3.83
10.131 a. H0: s21 5 s22; Ha: s21 . s22;
f 5 1.1104; 0.30 , f , 3.83; p 5 0.8976 . 0.05. There is no
TS: F 5 S 12 / S 22; RR: F $ 1.94
evidence to suggest a difference in the variability of the span of
b. f 5 2.5448 $ 1.94. There is evidence to suggest that
suspension bridges in China and the United States.
population variance 1 is greater than population variance 2.
c. 0.01 # p # 0.05. p-Value illustration: 10.149 a. H0: s21 5 s22; Ha: s21 2 s22;
TS: F 5 S 12 / S 22; RR: F # 0.31 or F $ 3.53
f(x) f 5 2.7128; 0.31 , f , 3.53. There is no evidence to suggest
that the population variance in saccharin amount for the two
teas is different. b. 0.02 # p # 0.10; p < 0.0621
10.151 a. H0: s21 5 s22; Ha: s21 2 s22;
( S 21 / S 22 ) 2 3 ( n2 2 1 ) / ( n2 2 3 ) 4
TS: Z 5 ( ) 2 ( n1 1 n2 2 4 )
; RR: 0 Z 0 $ 1.96
" ( 2n1 n22 21 ) 1( n2 2 3 ) 2 ( n2 2 5 )
z 5 1.7059; 0 z 0 , 1.96; p 5 0.0880 . 0.05. There is no
evidence to suggest that the two population variances are
different. b. H0: s21 5 s22; Ha: s21 2 s22;
0 2.54 x TS: F 5 S 12 / S 22; RR: F # 0.48 or F $ 2.07
A-48 A n s w e r s t o Od d -Nu mbered E x er c i s e s
f 5 1.7763; 0.48 , f , 2.07; p 5 0.1212 . 0.05. There is f 5 0.1038 # 0.29; p 5 0.0001 # 0.02. There is evidence to
no evidence to suggest that the two population variances are suggest that the two population variances are different.
different. b. H0: m1 2 m2 5 0; Ha: m1 2 m2 , 0;
(X1 2 X2) 2 0
Chapter Exercises TS: Tr 5 ; RR: Tr # 21.7531
S 21 S 22
$Q n1 1 n2 R
10.153 H0: p1 2 p2 5 0; Ha: p1 2 p2 . 0;
= = tr 5 21.8697 # 21.7531; p 5 0.0406 # 0.05. There is
P1 2 P2 evidence to suggest that the population mean time to complete
TS: Z 5 = = ; RR: Z $ 2.3263
"Pc ( 1 2 Pc ) A n11 1 n12 B Form 1040 for the lower income level is less than the popula-
z 5 3.5139 $ 2.3263; p 5 0.0002 # 0.01. There is evidence tion mean time to complete Form 1040 for the higher income
to suggest that the true proportion of chest hits is greater in level. 0.025 # p # 0.05.
Lancashire than in West Mercia. = =
10.167 a. p1 5 0.3793, p2 5 0.2876.
= =
10.155 H0: m1 2 m2 5 0; Ha: m1 2 m2 2 0; n1 p1 5 132 $ 5, n1 ( 1 2 p1 ) 5 216 $ 5,
= =
( X1 2 X2 ) 2 0 n2 p2 5 65 $ 5, n2 ( 1 2 p2 ) 5 161 $ 5
TS: T 5 RR: 0 T 0 $ 2.7045 b. (0.0137, 0.1697) c. There is evidence to suggest that
2 1 1
#Sp A n1 1 n2 B the population proportion of online investors is different for
t 5 1.3083; 0 t 0 , 2.7045; p 5 0.1982 . 0.01. There is no these two portfolio classifications. 0 is not in the CI.
evidence to suggest that the mean amount of sap from trees in 10.169 For each test:
New York is different from the mean amount of sap from trees ( X1 2 X2 ) 2 0
Ha: m1 2 m2 5 0; Ha: m1 2 m2 2 0; TS: T 5
in Vermont. 2 1 1
#Sp A n1 1 n2 B
Single detached versus Double/duplex:
10.157 a. Archer. b. H0: mD 5 0; Ha: mD , 0; RR: 0 T 0 $ 2.7633. t 5 21.5545; 0 t 0 , 2.7633;
D 2 D0 p 5 0.1313 . 0.01. There is no evidence to suggest that the
TS: T 5 ; RR: T # 21.7959
SD / !n population mean toluene levels are different.
t 5 21.1829 . 21.7959; p 5 0.1309 . 0.05. There is no Single detached versus Apartment:
evidence to suggest that the population mean speed of a carbon RR: 0 T 0 $ 2.7564. t 5 23.2994; 0 t 0 $ 2.7564;
arrow is less than the population mean speed of an aluminum p 5 0.0026 # 0.01. There is evidence to suggest that the
arrow. c. 0.10 # p # 0.20. population mean toluene levels are different.
Double/duplex versus Apartment:
10.159 Ha: m1 2 m2 5 0; Ha: m1 2 m2 . 0;
RR: 0 T 0 $ 2.7564. t 5 22.1737; 0 t 0 , 2.7564;
( X 1 2 X2 ) 2 0 p 5 0.0380 . 0.01. There is no evidence to suggest that the
TS: T 5 RR: T $ 2.4286
2 1 1
#S1 A n1 1 n2 B population mean toluene levels are different.
t 5 2.1140 , 2.4286; p 5 0.0206 . 0.01. There is no 10.171 H0: p1 2 p2 5 0; Ha: p1 2 p2 , 0;
= =
evidence to suggest that the mean pier length in California is P1 2 P2
greater than the mean pier length in Florida. TS: Z 5 = = ; RR: Z $ 21.6449
"Pc ( 1 2 Pc ) A n11 1 n12 B
D 2 D0 z 5 21.7661 # 21.6449; p 5 0.0387 # 0.05. There is
10.161 H0: mD 5 0; Ha: mD 2 0; TS: T 5 ;
SD / !n evidence to suggest that the population proportion of obese
RR: 0 T 0 $ 2.0930 Denver residents is less than the population proportion of obese
t 5 0.4957; 0 t 0 , 2.0930; p 5 0.6258 . 0.05. There is no New Orleans residents. This suggests that the adults living in
evidence to suggest that the population mean moisture content Denver are thinner.
of bulk grain measured by chemical reaction and by distillation
is different.
= = Chapter 11
10.163 a. p1 5 0.1143, p2 5 0.0952.
= =
n1 p1 5 16 $ 5, n1 ( 1 2 p1 ) 5 124 $ 5, Section 11.1
= =
n2 p2 5 12 $ 5, n2 ( 1 2 p2 ) 5 114 $ 5
b. H0: p1 2 p2 5 0; Ha: p1 2 p2 2 0; 11.1 True.
= =
P1 2 P2 11.3 False.
TS: Z 5 = = ; RR: 0 Z 0 $ 2.5758
"Pc ( 1 2 Pc ) A n11 1 n12 B 11.5 s2
z 5 0.5054; 0 z 0 , 2.5758; p 5 0.6133 . 0.01. There is no 11.7 They are easier, faster, and more accurate.
evidence to suggest that the population proportion of stations in
noncompliance with the law is different near Los Angeles and 11.9 a. 9, 7, 6 b. 260, 229, 195 c. 21,602
near San Francisco.
11.11 a. 775, 745, 768, 753 b. 465,193
10.165 a. H0: s21 5 s22; Ha: s21 2 s22; c. 2808.95, 112.55, 2696.40 d. 37.5167, 168.525
TS: F 5 S 12 / S 22; RR: F # 0.29 or F $ 3.78 e. 0.2226
A n s w e r s t o O d d -N u m b e r e d Exer c i ses A-49
fAB 5 0.03 , 3.89; p 5 0.9710 . 0.05; there is no evidence fAB 5 0.74 , 1.68; p 5 0.7363 . 0.05; there is no evidence of
of interaction. fA 5 0.40 , 3.89; p 5 0.6787 . 0.05; interaction.
there is no evidence of an effect due to office. b. fA 5 57.75 $ 3.89; p , 0.0001; there is evidence of an
fB 5 5.66 $ 4.75; p 5 0.0287 # 0.05; there is evidence effect due to ADHD diagnosis. fB 5 1.82 . 1.68;
of an effect due to type of development. p 5 0.0290 # 0.05; there is evidence of an effect due
to county.
11.81 a.
c. Diagnosis of ADHD is associated with an increase in age at
Source of Sum of Degrees of Mean diagnosis for ASD. The mean age of all those not diagnosed
variation squares freedom square F p-Value with ADHD was 3.72, and with ADHD, 4.30.
Cover type 1.54 3 0.51 0.87 0.4770
Chapter Exercises
State 3.12 3 1.04 1.76 0.1953
Interaction 7.11 9 0.79 1.34 0.2919 11.89
Error 9.45 16 0.59 Source of Sum of Degrees of Mean
variation squares freedom square F p-Value
Total 21.22 31
Factor 106,568 3 35,522.67 4.67 0.0124
fAB 5 1.34 , 3.78; p 5 0.2934 . 0.01; there is no evidence of Error 151,997 20 7,599.85
interaction.
b. fA 5 0.87 , 5.29; p 5 0.4774 . 0.01; there is no evidence Total 258,565 23
of an effect due to cover type.
f 5 4.67 $ 3.10; p 5 0.0124 # 0.05; there is evidence to
fB 5 1.76 , 5.29; p 5 0.1954 . 0.01; there is no evidence of
suggest that at least two of the population mean wattages of
an effect due to state.
spotlights are different.
11.83 fAB 5 1.68 , 2.36; p 5 0.1542 . 0.05;
11.91 Significantly
there is no evidence of interaction. fA 5 7.23 . 3.26;
Difference Bonferroni CI different
p 5 0.0023 # 0.05; there is evidence of an effect due to
region. fB 5 19.71 $ 2.87; p , 0.0001; there is evidence m1 2 m2 ( 24.38, 8.58 ) No
of an effect due to road marking quality. m1 2 m3 ( 24.08, 8.88 ) No
11.85 a. m1 2 m4 ( 1.62, 14.58 ) Yes
m2 2 m3 ( 26.18, 6.78 ) No
Source of Sum of Degrees of Mean
m2 2 m4 ( 20.48, 12.48 ) No
variation squares freedom square F p-Value
m3 2 m4 ( 20.78, 12.18 ) No
Gender 261.33 1 261.33 10.52 0.0024
Altitude 241.17 3 80.39 3.24 0.0321 x4. x3. x2. x1.
Interaction 106.17 3 35.39 1.43 0.2497 98.2 104.3 104.6 106.7
Error 993.33 40 24.83
11.93 a. H0: m1 5 m2 5 m3 5 m4;
Total 1602.00 47 Ha: mi 2 mj for some i 2 j;
TS: F 5 MSA/ MSE; RR: F $ 3.24
fAB 5 1.43 , 2.84; p 5 0.2497 . 0.05; there is no evidence of
f 5 6.23 $ 3.24; p 5 0.0052 # 0.05. There is evidence
interaction.
to suggest that at least two of the population mean step heights
b. fA 5 10.52 $ 4.08; p 5 0.0024 # 0.05; there is evidence
are different.
of an effect due to gender. fB 5 3.24 $ 2.84;
p 5 0.0321 # 0.05; there is evidence of an effect due to b. Significantly
altitude. Difference Bonferroni CI different
11.87 a. m1 2 m2 ( 0.0139, 0.2892 ) Yes
m1 2 m3 ( 20.0346, 0.2407 ) No
Source of Sum of Degrees of Mean
m1 2 m4 ( 20.0276, 0.2477 ) No
variation squares freedom square F p-Value
m2 2 m3 ( 20.1862, 0.0892 ) No
ADHD 72.45 1 72.45 57.75 , 0.0001 ( 20.1792,
m2 2 m4 0.0962 ) No
County 38.72 17 2.28 1.82 0.0290 ( 20.1307,
m3 2 m4 0.1447 ) No
Interaction 15.70 17 0.92 0.74 0.7627
Error 225.82 180 1.25 x2. x4. x3. x1.
Total 352.69 215 0.2299 0.2714 0.2784 0.3815
A n s w e r s t o O d d -N u m b e r e d E xer c i ses A-53
20
Chapter 12 15
Section 12.1 10
12.1 Deterministic and random. 5
12.3 Homogeneity of variance. 0
20 40 60 80 100 x
12.5 The mean value of Y for x 5 x*; an observed value
of Y for x 5 x*. b. y 5 3.8619 1 0.1423x c. 14.5358
12.7 SST 5 SSR 1 SSE 12.27 a. y 5 24.4297 2 58.9778x b. 6.7364 c. 0.1599
12.33 a. Scatter plot of y versus x1: b. H0: There is no significant linear relationship.
Ha: There is a significant linear relationship.
y
TS: F 5 MSR/ MSE; RR: F $ 4.28
8 f 5 2.58 , 4.28; p 5 0.1219 . 0.05.
7 There is no evidence of a significant linear relationship.
6 c. 0.1007 d. 0.3174
5
12.43 a. H0: b1 5 0; Ha: b1 2 0; TS: T 5 B1 / SB1;
4
RR: 0 T 0 $ 2.0796
3
t 5 2.2980 $ 2.0796; p 5 0.0319 # 0.05.
2
There is evidence to suggest that b1 2 0, the regression
1
line is significant. b. (0.4209, 8.4361)
0 c. Yes. The CI does not include 0.
150 200 250 300 350 400 450 x1
As x1 increases, the value of y tends to level off. The relation- 12.45 a. H0: There is no significant linear relationship.
ship appears to be logarithmic. Ha: There is a significant linear relationship.
b. Values of x2: 1.3863, 3.9512, 4.7185, 4.7791, 2.8904, TS: F 5 MSR/ MSE; RR: F $ 4.60
3.1355, 5.6240, 5.7366, 5.8493, 2.7081, 4.0254, 4.4773, f 5 4.14 , 4.60. There is no evidence of a significant linear
5.2040, 5.4972, 1.6094, 5.312, 5.7462, 3.5835, 4.3175, 4.8903, relationship. p . 0.05, p 5 0.0614
5.3083, 5.5215, 5.6021, 5.7170, 5.8081 b. H0: b1 5 0; Ha: b1 2 0; TS: T 5 B1 / SB1;
c. Scatter plot of y versus x2: RR: 0 T 0 $ 2.1448
y
t 5 2.0337; 0 t 0 , 2.1448. There is no evidence to suggest that
b1 is different from 0. 0.05 # p # 0.10. 0.0614 c. t2 5 f
8
d. Same. These two hypothesis tests are testing the same
7 null hypothesis.
6
5 12.47 a. y 5 68.0071 2 8.7993x
4
Source of Sum of Degrees of Mean
3
variation squares freedom square F p-Value
2
1 Regression 7,462.54 1 7,462.54 10.35 0.0324
0 Error 2,884.77 4 721.19
2 3 4 5 x2
Total 10,347.31 5
d. y 5 20.8059 1 1.5603x2
ANOVA summary table: b. H0: There is no significant linear relationship.
Ha: There is a significant linear relationship.
Source of Sum of Degrees of Mean
TS: F 5 MSR / MSE; RR: F $ 7.71
variation squares freedom square F p-Value
f 5 10.35 $ 7.71; p 5 0.0324 # 0.05.
Regression 103.1611 1 103.1611 741.04 , 0.0001 There is evidence of a significant linear relationship.
Error 3.2019 23 0.1392 c. 0.7212
d. 20.8492. Negative relationship. b1 is negative.
Total 106.3629 24 e. r2 5 0.7212
12.49 a. y 5 2.9430 1 0.6925x
Section 12.2
12.35 Testing the hypothesis H0: b1 5 0. Source of Sum of Degrees of Mean
variation squares freedom square F p-Value
12.37 False.
Regression 0.1062 1 0.1062 0.40 0.5403
12.39 Between 21 and 11. Error 3.4909 13 0.2685
12.41 a. ANOVA summary table: Total 3.5971 14
Source of Sum of Degrees of Mean b. H0: There is no significant linear relationship.
variation squares freedom square F p-Value Ha: There is a significant linear relationship.
Regression 11691.9 1 11691.90 2.58 0.1219 TS: F 5 MSR/ MSE; RR: F $ 9.07
f 5 0.40 , 9.07; p 5 0.5403 . 0.01.
Error 104372.1 23 4537.92
There is no evidence of a significant linear relationship.
Total 116064.0 24 c. ( 26.7570, 12.6430 ) . No. The CI includes 0.
A-56 A n s w e r s t o O d d -Nu mbered E x er c i s e s
12.51 a. r 5 20.6434 b. There is a negative linear relation- There is a strong positive relationship between Eurasian snow
ship between mean annual temperature and depth of permafrost cover and AO anomaly. b. r 5 0.9953.
layer. As the mean annual temperature increases, the depth of This also suggests a very strong, positive relationship.
the permafrost layer decreases. c. y 5 29.4729 1 0.9925x. Students should use the current
winter. For reference, in 2012, x 5 11.1. Predicted AO: 1.544
12.53 a. Scatter plot of y versus x:
y 12.59 a. y 5 330.9302 1 26.8727x
ANOVA summary table:
6
Source of Sum of Degrees of Mean
variation squares freedom square F p-Value
5
Regression 625,214.03 1 625,214.03 18.69 0.0005
4 Error 535,323.75 16 33,457.73
Total 1,160,537.78 17
3
b. H0: There is no significant linear relationship.
15 20 25 30 x
Ha: There is a significant linear relationship.
There seems to be a negative linear relationship. As average TS: F 5 MSR/ MSE; RR: F $ 4.49
finish increases, total winnings decreases. b. r 5 20.8847. f 5 18.69 $ 4.49; there is evidence of a significant linear
This is consistent with part (a). r is negative, and close to 21. relationship p , 0.001.
c. y 5 7.4920 2 0.1605x
ANOVA summary table: c. H0: b1 5 25; Ha: b1 2 25;
B1 2 25
Source of Sum of Degrees of Mean TS: T 5 ; RR: 0 T 0 $ 2.1199
variation squares freedom square F p-Value SB1
t 5 0.3012; 0 t 0 , 2.1199.
Regression 15.5939 1 15.5939 46.84 , 0.0001 There is no evidence to suggest that b1 is different from 25.
Error 4.3282 13 0.3329 d. (8.7156, 45.0298)
Total 19.9221 14 12.61 a. y 5 58.2111 2 3.1972x
d. 9.3 ANOVA summary table:
12.83 a. Scatter plot of y versus x: 12.93 The variance is not constant.
y 12.95 a. 8.8557, 26.3342, 29.6692, 28.6693, 26.7549,
4.3457, 10.7254, 7.5009 b. 0.0001
40
12.97 a. No b. Yes c. Yes d. No
30 12.99 a. Normal probability plot:
e
20
450
10 350
0 250
0.0 0.2 0.4 0.6 0.8 x
150
There appears to be a weak positive linear relationship. This
supports the study; there is a relationship between linguistic 50
diversity
= and annual road fatalities. b. y 5 7.7428 1 25.3696x –2 –1 –50 1 2 z
b1 . 0, which supports the conclusion that there is a weak
–150
positive linear relationship. c. (12.1524, 20.2295)
b. There is quite a bit of evidence to suggest that the random
12.85 a. ANOVA summary table: error terms are not normal. The plot is curved (not linear) and
Source of Sum of Degrees of Mean has several outliers.
variation squares freedom square F p-Value 12.101 a. 29.4156, 223.1316, 213.0912, 23.76574,
Regression 853.50 1 853.50 8.99 0.0103 210.2607, 4.23426, 210.7988, 223.6600, 234.7145,
Error 1234.23 13 94.94 21.91207, 8.00498, 213.6152, 3.62451, 210.4535, 61.2238,
22.54092, 21.9411, 19.4999
Total 2087.73 14 b. Scatter plot of residuals versus x:
e
H0: There is no significant linear relationship.
Ha: There is a significant linear relationship.
TS: F 5 MSR/ MSE; RR: F $ 4.67 ( a 5 0.05 ) 40
f 5 8.99 $ 4.67; p 5 0.0103 # 0.05.
20
There is evidence of a significant linear relationship. As CPI
increases, so does ESI. b. 56.8902 c. (40.1554, 87.2630).
0
No. The PI does not include 90. 25 30 35 40 45 50 x
e 0.01
15
0.00
10 6 7 8 9 10 11 x
5 0.01
0.02
–2 –1 1 2 z
–5
–10 c. There is no overwhelming evidence to suggest that the simple
linear regression assumptions are invalid.
–15
–20 12.111 a. y 5 28.4546 1 3.3981x
Residuals: 6.3086, 5.3283, 24.8369, 20.0698, 20.7688,
There is evidence to suggest that the random error terms are not 4.2894, 26.1573, 3.3281, 7.3477, 22.5456, 26.4194,
normal. There is a nonlinear pattern in the graph. 24.2058, 1.2506, 7.9688, 27.0991, 0.0465, 24.6233,
12.107 a. 3.7425, 21.3811, 3.5112, 26.3979, 2.3609, 4.3086, 1.6873, 25.1379
5.9687, 25.3974, 3.0674, 22.0416, 0.4215, 3.5257, b. Normal probability plot:
a a 3 y i 2 ( b 0 1 b 1x i ) 4
Scatter plot of residuals versus x: n
=
n = =
( y i 2 y i ) 5
e i51 i51
5 a 3 y i 2 ( y 2 b 1x 1 b 1 x i ) 4
n = =
8
i51
5 a c y i 2 y 2 b1 a ( x i 2 x ) d
4 n = n
0
=
10 20 30 40 50 60 x i51 i51
24 5 ny 2 ny 2 b1 nx 2 nx ) 5 0
(
28 Section 12.5
212 12.119 The unknown parameters b0, b1, c, bk .
d. The graphs do not provide any evidence that the 12.121 Principle of least squares.
simple linear regression assumptions are invalid. The normal
probability plot is approximately linear, and the plot of 12.123 False.
residuals versus the predictor variable exhibits no 12.125 At least one of the predictor variables helps to explain
discernible pattern. some of the variation in the dependent variable.
12.115 a. y 5 94.1430 1 0.0768x 12.127 This assumes that only one test is conducted.
b. H0: There is no significant linear relationship. 12.129 The mean value of Y for x 5 x* and an estimate of an
Ha: There is a significant linear relationship. observed value of Y for x 5 x*.
TS: F 5 MSR/ MSE; RR: F $ 4.38
f 5 7.69 $ 4.38. There is evidence of a significant linear 12.131 a. 482.4 b. As x1 increases, y increases.
relationship. This suggests that as the soil EC increases, the c. 25.3 d. 0.1680
corn yield also increases. 12.133 a. y 5 12.7786 1 1.9638x1 1 7.4479x2
0.01 # p # 0.05. p 5 0.0121. b. 168.2178
0.01 # p # 0.05. d. r2 5 0.4705. Approximately 47% of the Scatter plot of residuals versus x3:
variation in y is explained by this regression model. e
40
12.137
30
a. y 5 246.2192 2 4.2153x1 1 16.4785x2 1 1.1186x3
20
b. H0: b1 5 b2 5 b3 5 0;
10
Ha: bi 2 0 for at least one i;
TS: F 5 MSR/ MSE; RR: F $ 4.76 2 4 6 8 x3
f 5 32.80 $ 4.76; p 5 0.0004 # 0.05. There is evidence 210
to suggest that at least one of the regression coefficients is 220
different from 0. The overall regression is significant. 230
d. b1: t 5 25.8458, p 5 0.0011. b2: t 5 3.9533, p 5 0.0075. 240
b3: t 5 0.6198, p 5 0.5582. x1 and x2 are significant predictors. This graph suggests a violation in the regression assumptions.
e. ( 2117.7319, 25.2936 ) . There is no evidence to suggest that Include a quadratic term, x23, to improve the model.
the constant regression coefficient is different from 0. The CI
includes 0. 12.141
a. y 5 20.0565 2 0.0007x1 1 0.0002x2 1 0.000002x3
12.139 H0: b1 5 b2 5 b3 5 0;
a. y 5 7.5139 2 14.9684x1 1 2.9118x2 2 0.9704x3 Ha: bi 2 0 for at least one i;
b. Normal probability plot: TS: F 5 MSR/ MSE; RR: F $ 3.29
e f 5 3.24 , 3.28; p 5 0.0519. It’s really close, but at the
40 a 5 0.05 level, there is no evidence to suggest that at least one
of the regression coefficients is different from 0. The overall
30
regression is not significant.
20
b. 0.0622 c. The t tests suggest that the price of gold is the
10 most important predictor.
22 21 1 2 z 12.143 a. ANOVA summary table:
210
220 Source of Sum of Degrees of Mean
230
variation squares freedom square F p-Value
240 Regression 799.95 3 266.65 29.95 , 0.0001
The graph suggests that the random error terms are not normal. Error 160.23 18 8.90
The pattern is nonlinear. Total 960.18 21
b. Scatter plot of residuals versus x1:
H0: b1 5 b2 5 b3 5 0;
e Ha: bi 2 0 for at least one i;
40 TS: F 5 MSR/ MSE; RR: F $ 3.16 ( a 5 0.05 )
30 f 5 29.95 $ 3.16. There is evidence to suggest that at least
20 one of the regression coefficients is different from 0. The
10 overall regression is significant.
0 b. As temperature increases, the bonding strength decreases.
10 12 14 16 18 20 x1
210 As contact area increases, the bonding strength increases. As
220 wood density increases, the bonding strength decreases.
230 c. b1: t 5 22.382, 0.02 # p # 0.05.
240 b2: t 5 8.473, p , 0.0002. b3: t 5 21.0736, 0.20 # p # 0.40.
Temperature and contact area are the most important
Scatter plot of residuals versus x2: (significant) predictor variables.
e 12.145 a.
40 y 5 0.0510 2 0.0028x1 1 0.3247x2 1 0.2777x3
30 1 0.2188x4 2 0.1029x5
20 b. ANOVA summary table:
10
0
Source of Sum of Degrees of Mean
25 30 35 40 x2 variation squares freedom square F p-Value
210
220 Regression 6.4709 5 1.2942 172.18 , 0.0001
230 Error 0.0301 4 0.0075
240
Total 6.5010 9
A-62 A n s w e r s t o Od d -N u m bered E x er c i s e s
1 1.00
0.75
1.50 1.75 2.00 2.25 2.50 2.75 3.00 x
21 0.50
0 1 2 3 4 5 6 x
22
There appears to be a negative linear relationship.
23 b. y 5 1.6096 2 0.0924x
H0: b1 5 0; Ha: b1 2 0;
24
TS: T 5 B1 / SB1; RR: 0 T 0 $ 2.3646.
e. The graphs in part (a) and (c) suggest the model could be t 5 22.6441; 0 t 0 $ 2.3646; p 5 0.0332 # 0.05. There is evi-
improved by adding a quadratic term, x2. dence to suggest that b1 2 0, the regression line is significant.
A-64 A n s w e r s t o O d d -N u m bered E x e r c i s e s
22 21 1 2 z Total 1695.44 48
20.1 There is evidence to suggest a significant linear relationship
( p 5 0.0095 ) .
20.2 Normal probability plot:
20.3 e
level is explained by this regression model. x 2 5 11.6 $ 7.8147; p 5 0.0089 # 0.05. There is evidence to
c. 33.1205, ( 23.45, 69.69 ) suggest that at least one of the population proportions differs
d. Normal probability plot: from its hypothesized value.
13.13 H0: p1 5 0.390, p2 5 0.305, p3 5 0.305;
e Ha: pi 2 pi0 for at least one i;
TS: X 2 5 a ( ni 2 ei ) 2 / ei; RR: X 2 $ 9.2103.
3
20
i51
x 2 5 14.6318 $ 9.2103; p 5 0.0007 # 0.01. There is evidence
10 to suggest that at least one of the percentages during this week
is different from the long-term percentages.
22 21 1 2 z 13.15 H0: p1 5 0.10, p2 5 0.20, p3 5 0.25, p4 5 0.05,
p5 5 0.40; Ha: pi 2 pi0 for at least one i;
TS: X 2 5 a ( ni 2 ei ) 2 / ei; RR: X 2 $ 9.4877.
210 5
220 i51
x 2 5 10.5927 $ 9.4877; p 5 0.0315 # 0.05. There is evidence
Scatter plot of residuals versus x: to suggest that at least one of the population proportions differs
from its hypothesized value.
e
13.17 H0: p1 5 0.14, p2 5 0.06, p3 5 0.37, p4 5 0.30,
20 p5 5 0.13; Ha: pi 2 pi0 for at least one i;
TS: X 2 5 a ( ni 2 ei ) 2 / ei; RR: X 2 $ 9.4877.
5
10
i51
x 2 5 6.0884 , 9.4877; p 5 0.1926 . 0.05. There is no
0
100 200 300 400 500 x evidence to suggest that any of the population proportions
differs from its hypothesized value.
210
13.19 H0: p1 5 0.09, p2 5 0.18, p3 5 0.07, p4 5 0.07,
220 p5 5 0.09, p6 5 0.03, p7 5 0.24, p8 5 0.15, p9 5 0.03,
p10 5 0.05; Ha: pi 2 pi0 for at least one i;
TS: X 2 5 a ( ni 2 ei ) 2 / ei; RR: X 2 $ 21.6660.
There is some evidence that the variance in the random error 10
13.21 H0: p1 5 0.40, p2 5 0.30, p3 5 0.15, p4 5 0.15; 13.37 a. 12.5916 b. 15.0863 c. 14.4494 d. 26.1245
Ha: pi 2 pi0 for at least one i;
TS: X 2 5 a ( ni 2 ei ) 2 / ei; RR: X 2 $ 11.3449.
4 13.39
Category
i51 1 2 3 4
2
x 5 4.9622 , 11.3449; p 5 0.1746 . 0.01. There is no
Population
1 18 14 18 15 65
evidence to suggest that any of the population proportions
differs from its hypothesized value. 2 25 21 16 12 74
3 32 33 26 28 119
13.23 a. H0: p1 5 0.52, p2 5 0.24, p3 5 0.14, p4 5 0.06,
75 68 60 55 258
p5 5 0.04; Ha: pi 2 pi0 for at least one i;
TS: X 2 5 a ( ni 2 ei ) 2 / ei; RR: X 2 $ 9.4877.
5
13.41 H0: Variable 1 and variable 2 are independent.
Ha: Variable 1 and variable 2 are dependent.
TS: X 2 5 a a
i51
4 4 (n 2 e )
x 2 5 26.6083 $ 9.4877. There is evidence to suggest that ij ij
; RR: X 2 $ 16.9190
at least one of the population proportions differs from its e
i51 j51 ij
hypothesized value. b. p 5 0.000024 c. The two housing x 2 5 16.8226 , 16.9190; p 5 0.0516 . 0.05.
types contributing most to the chi-square statistic are There is just barely no evidence to suggest that the two
single-family detached home, large yard, and apartment categorical variables are dependent.
or condominium. It could be that retirees in Florida like both
13.43 H0: The true product proportions are the same for
of these housing types.
all stores.
13.25 a. 0.13, 0.17, 0.34, 0.36. Ha: The true product proportions are not the same for
b. H0: p1 5 0.13, p2 5 0.17, p3 5 0.34, p4 5 0.36; all stores.
TS: X 2 5 a a
3 4 (n 2 e )
Ha: pi 2 pi0 for at least one i; ij ij
; RR: X 2 $ 16.8119
TS: X 2 5 a ( ni 2 ei ) 2 / ei; RR: X 2 $ 11.3449.
4
e
i51 j51 ij
i51 x 2 5 9.2674 , 16.8119; p 5 0.1591 . 0.01.
x 2 5 5.6952 , 11.3449; p 5 0.1274 . 0.01. There is no There is no evidence to suggest that the true proportion of
evidence to suggest that any of the population proportions each favorite differs by grocery store.
changed in 2014.
13.45 H0: The true funding opinion proportions are the
13.27 H0: p1 5 0.78, p2 5 0.11, p3 5 0.04, p4 5 0.02, same for all regions.
p5 5 0.01, p6 5 0.02, p7 5 0.02; Ha: pi 2 pi0 for at Ha: The true funding opinion proportions are not the same
least one i; for all regions.
TS: X2 5 a ( ni 2 ei ) 2 / ei; RR: X2 $ 12.5916. TS: X 2 5 a a
7 4 4 (n 2 e )
ij ij
; RR: X 2 $ 23.5894
i51 e
i51 j51 ij
x 2 5 1.0594 , 12.5916; p 5 0.9833 . 0.05. There is x 2 5 17.8692 , 23.5894; p 5 0.0367 # 0.05.
no evidence to suggest that any of the historical proportions There is evidence to suggest that funding opinion differs
have changed. by region.
13.29 a. 0.0228, 0.1359, 0.3413, 0.3413, 0.1359, 0.0228 13.47 H0: Food and wine are independent.
b. H0: p1 5 0.0228, p2 5 0.1359, p3 5 0.3413, Ha: Food and wine are dependent.
TS: X 2 5 a a
2 (n 2 e )
p4 5 0.3413, p5 5 0.1359, p6 5 0.0228; Ha: pi 2 pi0 2
ij ij
for at least one i; ; RR: X 2 $ 7.8794
e ij
13.51 H0: Age group and influenza type are independent. x 2 5 7.9711 , 9.4877; p 5 0.0926 . 0.05.
Ha: Age group and influenza type are dependent. There is no evidence to suggest that any of the true
TS: X 2 5 a a
5 4 (n 2 e )
ij ij population proportions differs from its hypothesized
; RR: X 2 $ 26.2170 value.
i51 j51 e ij
2
x 5 2505.6007 $ 26.2170; p , 0.0001. 13.63 H0: The true population proportions associated with
There is overwhelming evidence to suggest that age group transfer plans are the same for all companies.
and influenza type are dependent. Ha: The true population proportions associated with transfer
13.53 a. H0: Day and time of day are independent. plans are not the same for all companies.
TS: X 2 5 a a
5 4 (n 2 e )
ij ij
Ha: Day and time of day are dependent. ; RR: X 2 $ 21.0261
TS: X 2 5 a a
7 3 (n 2 e )
ij ij i51 j51 e
ij
; RR: X 2 $ 21.0261 2
x 5 18.3478 , 21.0261; p 5 0.1055 . 0.05.
i51 j51 e ij
2 There is no evidence to suggest that the true proportions
x 5 30.7707 $ 21.0261; p 5 0.0021 # 0.05.
associated with transfer plans are different for any of the
There is evidence to suggest that day of the week and time
populations.
of day are dependent.
b. The assumptions are not met. One cell has an expected 13.65 H0: The true proportions of each countertop type is the
count less than 5. same for all supply stores.
c. Monday, daytime. Ha: The true proportions of each countertop type is not the
13.55 H0: The true proportions for type of degree are the same for all supply stores.
TS: X 2 5 a a
3 4 (n 2 e )
same for each country. ij ij
; RR: X 2 $ 12.5916
Ha: The true proportions for type of degree are different for i51 j51 e ij
each country. x 2 5 10.2014 , 12.5916; p 5 0.1164 . 0.05.
TS: X 2 5 a a
2 7 (n 2 e )
ij ij There is no evidence to suggest that the true proportion
; RR: X 2 $ 22.4577
i51 j51 e ij of each type of countertop purchased differs for supply
x 2 5 24.4007 $ 22.4577; p 5 0.0004 # 0.001. stores.
There is evidence to suggest that the true proportions for 13.67 H0: Type of music and time spent shopping are
type of degree are different for each country. independent.
13.57 H0: Flossing frequency and brushing frequency Ha: Type of music and time spent shopping are dependent.
TS: X 2 5 a a
4 4 (n 2 e )
are independent. ij ij
Ha: Flossing frequency and brushing frequency are ; RR: X 2 $ 21.6660
i51 j51 e ij
dependent. x 2 5 26.6081 $ 21.6660; p 5 0.0016 # 0.01.
TS: X 2 5 a a
7 7 (n 2 e )
ij ij
; RR: X 2 $ 58.6192 There is evidence to suggest an association between
i51 j51 e ij music type and time spent shopping.
x 2 5 62.4388 $ 58.6192; p 5 0.0041 # 0.01.
There is evidence to suggest that flossing frequency and 13.69 H0: pi 5 0.10; Ha: pi 2 pi0 for at least one i;
~ 5 1.38; H : m
14.13 H0: m ~ , 1.38; 12 12 5.0
a
TS: X 5 the number of observations greater than 1.38. 229 29 15.0
x 5 2, p 5 0.0065 # 0.05. 227 27 14.0
There is evidence to suggest that the population median 226 26 13.0
concentration of norephinephrine in women with PD is less 238 38 18.0
than 1.38 nmol/L.
222 22 10.0
~ 5 100; H : m
14.15 H0: m ~ . 100;
a 236 36 17.0
TS: X 5 the number of observations greater than 100. 21 1 2.0
x 5 17, p 5 0.0320 # 0.05.
224 24 11.5
There is evidence to suggest that the median mileage is
greater than 100. 0 0 1.0
~ 5 40; H : m
~ . 40; 213 13 6.0
14.17 H0: m a
215 15 7.0
TS: X 5 the number of observations greater than 40.
x 5 11, p 5 0.4119 . 0.05. b. Absolute
There is no evidence to suggest that the median time to Difference difference Rank
complete this test is greater than 40 seconds.
1.4 1.4 16.0
~ 2m
14.19 H0: m ~ 5 0; H : m
~ 2m
~ 2 0;
1 2 a 1 2 20.2 0.2 2.5
TS: X 5 the number of pairwise difference greater than 0.
1.6 1.6 21.5
x 5 14, p 5 0.1153 . 0.05.
There is no evidence to suggest that the median chlorophyll 0.3 0.3 5.0
amount in surface water is different in April and August. 20.4 0.4 8.0
~ 2m
~ 5 0; H : m
~ 2m
~ . 0; 20.8 0.8 12.0
14.21 H0: m1 2 a 1 2
21.5 1.5 18.5
TS: X 5 the number of pairwise differences greater than 0.
x 5 13, p 5 0.0037 # 0.05. 20.6 0.6 10.0
There is evidence to suggest that the median VOC concentra- 0.1 0.1 1.0
tion is smaller when the scrubber is installed. 21.2 1.2 14.0
~ 5 4.6; H : m
14.23 a. H0: m ~ . 4.6; 0.8 0.8 12.0
a
TS: X 5 the number of observations greater than 4.6. 21.5 1.5 18.5
x 5 15, p 5 0.0669 . 0.05. 1.5 1.5 18.5
There is no evidence to suggest that the median survival is 20.4 0.4 8.0
greater than 4.6 years. 21.5 1.5 18.5
X 2 m0
b. H0: m 5 4.6; Ha: m . 4.6; TS: T 5 ; 1.6 1.6 21.5
S/ !n
RR: T $ 1.7109 21.3 1.3 15.0
t 5 2.6824 $ 1.7109; p 5 0.0065 # 0.05. 20.3 0.3 5.0
There is evidence to suggest that the median survival is 0.8 0.8 12.0
greater than 4.6 years. This result is contrary to part (a). 0.4 0.4 8.0
The analysis in part (a) seems more reasonable. It seems 20.3 0.3 5.0
unlikely that the survival-time distribution is normally
1.9 1.9 23.0
distributed.
0.2 0.2 2.5
A n s w e r s t o O d d - Nu m b e r e d E xer c i ses A-69
c. Absolute ~ 2m
14.31 H0: m ~ 5 0; H : m
~ 2m
~ 2 0;
1 2 a 1 2
Difference difference Rank TS: T 1 5 the sum of the ranks corresponding to the positive
3.0 3.0 16.5 differences di 2 0;
24.0 4.0 23.0 RR: T 1 # 43 or T 1 $ 167
1.0 1.0 5.0 t 1 5 150.5; p 5 0.0896 . 0.02. There is no evidence to
~ is different from m
suggest that m ~ .
23.0 3.0 16.5 1 2
22.0 2.0 9.5 ~ 5 3; H : m
14.33 H : m ~ 2 3;
0 a
21.0 1.0 5.0
TS: T 1 5 the sum of the ranks corresponding to the positive
24.0 4.0 23.0
differences xi 2 3;
24.0 4.0 23.0
RR: T 1 # 37 or T 1 $ 173
22.0 2.0 9.5
t 1 5 30.5 # 37; p 5 0.006 # 0.01. There is evidence to
23.0 3.0 16.5
suggest that the median renal blood-flow rate is different
22.0 2.0 9.5
22.0 2.0 9.5 from 3.
22.0 2.0 9.5 14.35 H0: m~ 5 15.06; H : m ~ , 15.06;
a
0.0 0.0 2.0 T 1 2 mT 1
3.0 3.0 16.5 TS: Z 5 ; RR: Z # 22.3263
sT 1
3.0 3.0 16.5
z 5 22.9335 # 22.3263; p 5 0.0017 # 0.01.
0.0 0.0 2.0
There is evidence to suggest that the median APR is less
3.0 3.0 16.5
than 15.06.
23.0 3.0 16.5
4.0 4.0 23.0 14.37 H0: m~ 2m ~ 5 0; H : m ~ 2m ~ 2 0;
1 2 a 1 2
23.0 3.0 16.5 T 1 2 mT 1
0.0 0.0 2.0 TS: Z 5 ; RR: 0 Z 0 $ 1.6449
sT 1
2.0 2.0 9.5
z 5 20.9256; 0 z 0 , 1.6449; p 5 0.3547 . 0.10.
1.0 1.0 5.0
There is no evidence to suggest a difference in median
4.0 4.0 23.0
collection times.
d. Absolute ~ 5 1; H : m ~ 2 1;
14.39 H0: m a
Difference difference Rank
T 1 2 mT 1
29.4 9.4 10.0 TS: Z 5 ; RR: Z $ 1.6449
sT 1
255.4 55.4 30.0
z 5 1.9088 $ 1.6499; p 5 0.0281 # 0.05. There is evidence
226.4 26.4 15.0
to suggest that the median cadmium concentration is greater
241.4 41.4 23.5
than 1.
22.4 2.4 3.0
2.6 2.6 4.0 14.41 a. H : m~ 5 0.30; H : m ~ . 0.30;
0 a
238.4 38.4 22.0 TS: X 5 the number of observations greater than 0.30. x 5 6;
227.4 27.4 16.0 p 5 0.2272 . 0.05. There is no evidence to suggest that the
234.4 34.4 20.0 median arsenic concentration is greater than 0.30.
9.6 9.6 11.0 b. H0: m~ 5 0.30; H : m ~ . 0.30;
a
6.6 6.6 7.5 TS: T 1 5 the sum of the ranks corresponding to the positive
250.4 50.4 28.0 differences xi 2 0.30;
242.4 42.4 25.0 RR: T 1 $ 101; t 1 5 105.5 $ 101; p 5 0.0253 # 0.05. There
41.4 41.4 23.5 is evidence to suggest that the median arsenic concentration is
4.6 4.6 6.0 greater than 0.30.
220.4 20.4 14.0 c. The conclusions in parts (a) and (b) are not the same.
217.4 17.4 13.0
The test in part (b) is probably more accurate because
251.4 51.4 29.0
it uses more of the information in the sample than the test
3.6 3.6 5.0
in part (a).
233.4 33.4 19.0
6.6 6.6 7.5
247.4 47.4 27.0
232.4 32.4 18.0 Section 14.3
1.6 1.6 2.0 14.43 False.
245.4 45.4 26.0
231.4 31.4 17.0 14.45 True.
236.4 36.4 21.0
14.47 True, if the populations are symmetric.
20.4 0.4 1.0
8.6 8.6 9.0
212.4 12.4 12.0
A-70 A n s w e r s t o O d d -N u mbered Ex er c i s e s
g d 2 3 ( n 1 1 ) ; RR: H $ 13.2767;
12 ri2 14.95 H0: The sequence of observations is random;
TS: H 5 c
n ( n 1 1 ) ni Ha : The sequence of observations is not random.
h 5 16.1591 $ 13.2767; p 5 0.0028 # 0.01. There is evi- TS: V 5 the number of runs.
dence to suggest that at least two of the populations are different. RR: V # 3 or V $ 7 (a 5 0.3682)
v 5 5. There is no evidence to suggest that the order of
14.73 H0: The three samples are from identical populations;
observations is not random.
Ha : At least two of the populations are different;
g d 2 3 ( n 1 1 ) ; RR: H $ 5.9915
12 ri2 14.97 H0: The sequence of observations is random;
TS: H 5 c Ha : The sequence of observations is not random.
n ( n 1 1 ) ni
h 5 6.5184 $ 5.9915; p 5 0.0384 # 0.05. There is evidence V 2 mV
TS: z 5 ; RR: 0 Z 0 $ 2.3263
to suggest that at least two of the populations are different. sV
z 5 1.2546; 0 z 0 , 2.3263; p 5 0.2096 . 0.02. There is no
14.75 H0: The four samples are from identical populations;
evidence to suggest that the order of observations is not random.
Ha : At least two of the populations are different;
g d 2 3 ( n 1 1 ) ; RR: H $ 7.8147
12 ri2 14.99 H0: The sequence of observations is random;
TS: H 5 c Ha : The sequence of observations is not random.
n ( n 1 1 ) ni
h 5 6.3338 , 7.8147; p 5 0.0965 . 0.05. There is no evi- V 2 mV
TS: Z 5 ; RR: 0 Z 0 $ 1.96
dence to suggest that the populations are different. sV
z 5 1.1051; 0 z 0 , 1.96; p 5 0.2691 . 0.05. There is no evi-
14.77 H0: The four samples are from identical populations;
dence to suggest that the order of observations is not random.
Ha : At least two of the populations are different;
g d 2 3 ( n 1 1 ) ; RR: H $ 11.3449
12 ri2 14.101 H0: The sequence of observations is random;
TS: H 5 c Ha : The sequence of observations is not random.
n ( n 1 1 ) ni
h 5 3.6953 , 11.3449; p 5 0.2963 . 0.01. There is no TS: V 5 the number of runs.
evidence to suggest that the populations are different. RR: V # 2 or V $ 9 (a 5 0.0711)
v 5 9. There is evidence to suggest the order of observations is
14.79 H0: The three samples are from identical populations; not random at the a 5 0.0711 level, but not for a 5 0.02.
Ha : At least two of the populations are different;
g d 2 3 ( n 1 1 ) ; RR: H $ 5.9915
12 ri2 14.103 H0: The sequence of observations is random;
TS: H 5 c Ha : The sequence of observations is not random.
n ( n 1 1 ) ni
V 2 mV
h 5 1.3713 , 5.9915; p 5 0.5038 . 0.05. There is no TS: z 5 ; RR: 0 Z 0 $ 1.96
evidence to suggest that the uncompressed depth populations sV
are different. p . 0.10. z 5 22.0370; 0 z 0 $ 1.96. There is evidence to suggest that the
order of observations is not random with respect to the median.
14.81 H0: The three samples are from identical populations; p 5 0.0416.
Ha : At least two of the populations are different;
g d 2 3 ( n 1 1 ) ; RR: H $ 5.9915
12 ri2 Section 14.6
TS: H 5 c
n ( n 1 1 ) ni
14.105 No assumptions.
h 5 22.5890 $ 5.9915; p , 0.0001. There is evidence to
suggest that at least two of the length-of-service populations 14.107 True.
are different. b. Public safety and communications. Support
services and communications. The corresponding rank sums 14.109 False.
are very far apart. 14.111 a. 0.5357; moderate positive relationship.
b. 0.3333; weak positive relationship. c. 0.0637; no
Section 14.5 definitive relationship. d. 0.2922; weak negative relationship.
14.83 False. 14.113 0.7714. There is a positive relationship between x and
y. As the price of a stateroom increases, so does the number of
14.85 1
days before sailing. Therefore, this suggests that cruise prices
14.87 When both m and n are greater than 10. are reduced at the last minute.
A-72 A n s w e r s t o O d d -Nu m bered E x er c i s e s
g d 2 3 ( n 1 1 ) ; RR: H $ 5.9915
12 ri2
5 TS: H 5 c
n ( n 1 1 ) ni
0 h 5 13.6603 $ 5.9915; p 5 0.0011 # 0.05. There is
5 10 15 20 x
evidence to suggest that at least two slate weight populations
The scatter plot suggests a quadratic relationship between total are different.
snowfall and number of customers. b. 0.4842. This value
14.137 H0: The four samples are from identical populations;
suggests a weak to moderate positive linear relationship
Ha : At least two of the populations are different;
between the variables. c. Spearman’s rank correlation
g d 2 3 ( n 1 1 ) ; RR: H $ 11.3449
coefficient can only test for a linear relationship between two 12 ri2
TS: H 5 c
variables. If there is some other type of relationship between n ( n 1 1 ) ni
the variables, as in this case, the value of RS can be misleading. h 5 14.2557 $ 11.3449; p 5 0.0026 # 0.01. There is
14.121 a. 20.3634. b. rS , 0, which suggests that as the evidence to suggest that at least two nozzle temperature
number of selfies increases, the IAS score decreases. Therefore, populations are different.
people who share more selfies tend to be less inclined to be 14.139 a. H0: The sequence of observations is random.
close and personal. Ha : The sequence of observations is not random.
TS: V 5 the number of runs; RR: v # 4 or V $ 12
Chapter Exercises ( a 5 0.0560 )
~ 5 11,000; H : m
~ , 11,000; v 5 10. There is no evidence to suggest that the order of obser-
14.123 H0: m a
vations is not random. b. p 5 0.4756
TS: the number of observations greater than 11,000.
RR: X # 3 (a 5 0.0287) 14.141 H0: The sequence of observations is random.
x 5 6 . 3; p 5 0.3953 . 0.05. There is no evidence to sug- Ha : The sequence of observations is not random.
gest that the median outflow is less than 11,000 cfs. V 2 mV
~ 5 2; H : m
~ 2 2; TS: Z 5 ; RR: 0 Z 0 $ 2.5758
14.125 H : m
0 a sV
TS: the number of observations greater than 2. z 5 1.7872; 0 z 0 , 2.5758; p 5 0.0739. There is no evidence to
RR: X # 5 or X $ 15 (a 5 0.0414) suggest that the order of observations is not random.
x 5 15 $ 15; p 5 0.0414 # 0.05. There is evidence to
suggest that the median coverage amount is different from 2. 14.143 0.7785. This value suggests a moderate positive linear
~ 5 630; H : m
~ , 630; relationship. As the number of miles from I-95 increases, so
14.127 a. H : m
0 a does the starting room price of a hotel room.
TS: T 1 5 the sum of the ranks corresponding to the
positive differences xi 2 630. RR: T 1 # 20 (a 5 0.0108) 14.145 0.8660. This value suggests a strong positive linear
t 1 5 17.5 # 20. There is evidence to suggest that the median relationship. Wiper frequency might be used to predict
spray height is less than 630. b. 0.0062 c. The distribution rainfall total.
is not assumed to be symmetric. ~ 2m
~ 5 0; H : m
~ 2m
~ , 0;
14.147 a. H0: m1 2 a 1 2
14.129 H : m ~ 2m ~ 5 0; H : m~ 2m ~ , 0; TS: X 5 the number of pairwise differences greater than 0.
0 1 2 a 1 2
TS: The sum of the ranks corresponding to the positive RR: X # 5 (a 5 0.0207)
differences di 2 0. RR: T 1 # 47 (a 5 0.0494) x 5 15. There is no evidence to suggest that the median refrig-
t 1 5 25 # 47; p 5 0.0033 # 0.05. There is evidence to suggest erant weight before service is less than the median refrigerant
that the median THM concentration is greater after swimming. weight after service. b. Because we cannot reject the null
A n s w e r s t o O d d -Nu m b e r e d Exer c i ses A-73
I-1
I-2 In d e x
0 t,
a
n 0.20 0.10 0.05 0.025 0.01 0.005 0.001 0.0005 0.0001
0 z
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
23.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
23.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
23.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
23.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
23.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
22.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
22.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
22.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
22.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
22.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
22.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
22.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
22.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
22.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
22.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
21.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
21.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
21.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
21.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
21.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
21.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
21.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
21.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
21.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
21.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
20.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
20.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
20.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
20.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
20.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
20.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
20.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
20.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
20.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
20.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
Table III Standard Normal Distribution Cumulative Probabilities (Continued)
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
www.whfreeman.com