Bhisham C. Gupta
H. Fred Walker
Introduction
types of data that require specific analyses that depend upon the types of data
we are working with. It is therefore important to distinguish between different types of data.
In Chapter 2 we discuss and provide examples for different types of data.
In addition, terminology such as population and sample is introduced.
In Chapter 3 we introduce several graphical methods found in descriptive
statistics. These graphical methods are some of the basic tools of statistical
quality control (SQC). These methods are also very helpful in understanding
the pertinent information contained in very large and complex datasets.
In Chapter 4 we learn about the numerical methods of descriptive statistics. Numerical methods that are applicable to both sample and population data provide us with quantitative or numerical measures. Such measures
further enlighten us about the information contained in the data.
In Chapter 5 we proceed to study the basic concepts of probability theory and see how probability theory relates to applied statistics. We also introduce the random experiment and define sample space and events. In addition,
we study certain rules of probability and conditional probability.
In Chapter 6 we introduce the concept of a random variable, which is a
vehicle used to assign some numerical values to all the possible outcomes of
a random experiment. We also study probability distributions and define the
mean and standard deviation of random variables. Specifically, we study
some special probability distributions of discrete random variables such as
Bernoulli, binomial, hypergeometric, and Poisson distributions, which are
encountered frequently in many statistical applications. Finally, we discuss
under what conditions (e.g., the Poisson process) these probability models
are applicable.
In Chapter 7 we continue studying probability distributions of random
variables. We introduce the continuous random variable and study its probability distribution. We specifically examine uniform, normal, exponential,
and Weibull continuous probability distributions. The normal distribution is
the backbone of statistics and is extensively used in achieving Six Sigma
quality characteristics. The exponential and Weibull distributions form an
important part of reliability theory. The hazard or failure rate function is also
introduced.
Having discussed probability distributions of data as they apply to discrete and continuous random variables in Chapters 6 and 7, in Chapter 8 we
expand our study to the probability distributions of sample statistics. In particular, we study the probability distribution of the sample mean and sample
proportion. We then study Student's t, chi-square, and F distributions. These
distributions are an essential part of inferential statistics and, therefore, of
applied statistics.
Estimation is an important component of inferential statistics. In Chapter
9 we discuss point estimation and interval estimation of population mean and
of difference between two population means, both when sample size is large
and when it is small. Then we study point estimation and interval estimation
of population proportion and of difference between two population proportions when the sample size is large. Finally, we study the estimation of a population variance, standard deviation, ratio of two population variances, and
ratio of two population standard deviations.
The tools and techniques covered in this book map onto the DMAIC phases as follows:

Phase      Tool or Technique                                        Chapter(s)
Define     Descriptive statistics: graphical methods,
           numerical descriptions                                   2, 3, 4
Measure    Sampling; point and interval estimation                  8, 9
Analyze    Probability; discrete and continuous distributions;
           hypothesis testing                                       5, 6 and 7, 10
Improve
Control
Preface
Applied Statistics for the Six Sigma Green Belt was written as a desk
reference and instructional aid for individuals involved with Six
Sigma project teams. As Six Sigma team members, green belts will
help select appropriate statistical tools, collect data for those tools, and assist
with data interpretation within the context of the Six Sigma methodology.
Composed of steps or phases titled Define, Measure, Analyze, Improve,
and Control (DMAIC), the Six Sigma methodology calls for the use of many
more statistical tools than is reasonable to address in one large book.
Accordingly, the intent of this book is to provide Green Belts with the benefit of a thorough discussion relating to the underlying concepts of basic statistics. More advanced topics of a statistical nature will be discussed in three
other books that, together with this book, will comprise a four-book series.
The other books in the series will discuss statistical quality control, introductory design of experiments and regression analysis, and advanced design
of experiments.
While it is beyond the scope of this book and series to cover the DMAIC
methodology specifically, we do focus this book and series on concepts,
applications, and interpretations of the statistical tools used during, and as part
of, the DMAIC methodology. Of particular interest in this book, and indeed
the other books in this series, is an applied approach to the topics covered
while providing a detailed discussion of the underlying concepts. This level
of detail in providing the underlying concepts is particularly important for
individuals lacking a recent study of applied statistics as well as for individuals who may never have had any formal education or training in statistics.
In fact, one very controversial aspect of Six Sigma training is that, in
many cases, this training is targeted at the Six Sigma Black Belt and is all too
commonly delivered to large groups of people with the assumption that all
trainees have a fluent command of the underlying statistical concepts and
theory. In practice this assumption commonly leads to a good deal of concern and discomfort for trainees, who quickly find it difficult to keep up with
and successfully complete Black Belt-level training. This concern and discomfort become even more serious when individuals involved with Six
Sigma training are expected to pass a written and/or computer-based examination that so commonly accompanies this type of training.
So if you are beginning to learn about Six Sigma and are either preparing for training or are supporting a Six Sigma team, the question is: How do
I get up to speed with applied statistics as quickly as possible so I can get the
most from training or add the most value to my Six Sigma team? The answer
to this question is simple and straightforward: get access to a book that provides a thorough and systematic discussion of applied statistics, a book that
uses the plain language of application rather than abstract theory, and a book
that emphasizes learning by examples. Applied Statistics for the Six Sigma
Green Belt has been designed to be just that book.
This book was organized so as to expose readers to applied statistics in
a thorough and systematic manner. We begin by discussing concepts that are
the easiest to understand and that will provide you with a solid foundation
upon which to build further knowledge. As we proceed with our discussion,
and as the complexity of the statistical tools increases, we fully intend that
our readers will be able to follow the discussion by understanding that the
use of any given statistical tool, in many cases, enables us to use additional
and more powerful statistical tools. The order of presentation of these tools
in our discussion then will help you understand how these tools relate to,
mutually support, and interact with one another. We will continue this logic
of the order in which we present topics in the remaining books in this series.
Getting the most benefit from this book, and in fact from the complete series
of books, is consistent with how many of us learn most effectively: start at
the beginning with less complex topics, proceed with our discussion of new
and more powerful statistical tools once we learn the basics, be sure to
cover all the statistical tools needed to support Six Sigma, and emphasize
examples and applications throughout the discussion.
So let us take a look together at Applied Statistics for the Six Sigma
Green Belt. What you will learn is that statistics aren't mysterious, they
aren't scary, and they aren't overly difficult to understand. As in learning any
topic, once you learn the basics it is easy to build on that knowledge; trying to start without a knowledge of the basics, however, is generally the
beginning of a difficult situation!
Chapter One
Setting the Context for Six Sigma
Figure 1.1 A Six Sigma process with Cp = 2: a ±1.5σ shift in the process mean gives Cpk = 1.5 and no more than 3.4 DPMO beyond the specification limits (LSL, USL).

Figure 1.2 A centered Six Sigma process (Cp = Cpk = 2), with 6σ from the process center to the LSL and 6σ to the USL.
by customers in the form of tolerances, and they describe the values to which products or services must conform to be considered good or acceptable.
There is more to this explanation, however. Again, looking at Figure 1.2,
we see that because the width of the distribution is so much smaller than the
width of the limits, it is possible for the location of the distribution to
move around, or vary, within the limits. This movement, or natural variation,
is inherent in any process, and so anticipating the movement is exactly what
we want to be able to do! In fact, we want as much room as possible for the
distribution to move within the limits so we do not risk the distribution moving outside these limits.
Now someone may ask, "Why would the distribution move around within the limits?" and "How much movement would we expect?" Both are interesting questions, and both help us better understand this concept
called 6σ as it refers to quality.
When a process is operating, whether that process involves manufacturing operations or service delivery, variation within that process is to be
expected. Variation occurs both in terms of the measures of dispersion (i.e.,
the width of a process) and measures of location (i.e., where the center of that
process lies). During normal operation we would expect the location of a
process (described numerically by the measure of location) to vary or move
±1.5 standard deviations. Herein lies the explanation of 6σ.
Our goal is to reduce the variability of any process as compared to the
process limits to a point where there is room for a ±1.5 standard deviation move, accounting for the natural variability of the process while containing all that variability within the limits. Such a case is referred to as a 6σ
level of quality, wherein no more than 3.4 DPMO would be expected to fall
outside the limits.
Figure 1.3 Six Sigma project flow. A champion team is formed, and potential projects are identified and evaluated. Projects that do not meet the selection criteria are dropped from consideration (or the selection criteria are reconsidered). Chartered projects proceed through the DMAIC phases (Define, Measure, Analyze, Improve, Control); the review at the end of each phase must be successfully completed before moving to the next phase. Completed projects are then verified against the financial payback criteria.
Figure 1.4 Six Sigma Black Belts and Six Sigma Green Belts.
responsibility for day-to-day operation of 6σ efforts. So we see a redefinition of responsibilities wherein the green belts no longer simply collect data
as prescribed by black belts, but rather green belts are rapidly being tasked
with collecting data and, more importantly, converting that data into useful
information.
Table 1.1 Average time to complete a process step.

22  26  25  29  23
29  20  27  24  27
27  28  24  26  25
29  24  31  23  26
Table 1.1 has several rows and columns of numbers. These numbers correspond to measurements of the average time to complete a process step. As
a collection of numbers, the data in Table 1.1 do not help us understand much
about the process. To really understand the process, we need to convert the
data into information, and to convert the data we use appropriate tools and techniques.
Table 1.2 Descriptive statistics.

Mean             25.52
Std Dev           3.0430248
Std Err Mean      0.608605
upper 95% Mean   26.776099
lower 95% Mean   24.263901
N                25

Figure 1.5 Histogram.
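The quantities in Table 1.2 can be computed directly from raw measurements. A minimal sketch follows, using the values legible in Table 1.1 (the book's full sample has N = 25, so these results will differ slightly from Table 1.2); the t-value 2.093 for 19 degrees of freedom is taken from a standard t table:

```python
import math
from statistics import mean, stdev

# Values legible in Table 1.1 (the full sample in the book has N = 25)
times = [22, 26, 25, 29, 23, 29, 20, 27, 24, 27,
         27, 28, 24, 26, 25, 29, 24, 31, 23, 26]

n = len(times)
xbar = mean(times)            # sample mean
s = stdev(times)              # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)         # standard error of the mean
t = 2.093                     # t(0.025, 19) from a standard t table
upper = xbar + t * se         # upper 95% confidence limit for the mean
lower = xbar - t * se         # lower 95% confidence limit for the mean

print(f"Mean {xbar:.2f}, Std Dev {s:.4f}, Std Err Mean {se:.4f}")
print(f"95% limits for the mean: ({lower:.3f}, {upper:.3f})")
```

The same upper and lower 95% limits reported in Table 1.2 come from exactly this calculation applied to the full sample.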
Chapter Two
Getting Started with Statistics

Data are classified by measurement scale as qualitative (nominal, ordinal) or quantitative (interval, ratio).

Chapter Three
Describing Data Graphically
In Chapter 2 we introduced descriptive statistics. In this and the next chapter we take a detailed look at the various methods that come under the
umbrella of descriptive statistics.
Commonly, practitioners applying statistics in a professional environment
become overwhelmed with large data sets they have collected. Occasionally,
practitioners even have difficulty understanding data sets because too many
or too few of the factors that influence a response variable of interest are
included in the data sets. In other cases, practitioners may doubt whether
the proper statistical technique was used to collect data. Consequently, the
information present in a selected data set may be biased or incomplete.
To avoid the situations described above, it is important to stay focused
on the purpose or need for collecting the data. By staying focused on the
purpose or need, it is much easier to ensure the use of appropriate data collection techniques and the selection of appropriate factors. Descriptive statistics are commonly used in applied statistics to help us understand the
information contained in large and complex data sets. Next, we continue our
discussion of descriptive statistics by considering an important tool called
the frequency distribution table.
of data points that belong to any particular category is called the frequency of
that category. For illustration, let us consider the following example.
Example 3.1 Consider a random sample of 110 small to midsize companies located in the midwestern United States. Classify them according to
their annual revenues (in millions of dollars).

Solution: We can classify the annual revenues into five categories:

Category 1: Annual revenue is under $250 million.
Category 2: Annual revenue is at least $250 million but less than $500 million.
Category 3: Annual revenue is at least $500 million but less than $750 million.
Category 4: Annual revenue is at least $750 million but less than $1,000 million.
Category 5: Annual revenue is $1,000 million or more.
The data collected are given in Table 3.1.
After tallying the data, we find that of the 110 companies, 30 belong in
the first category, 25 in the second category, 20 in the third category, 15 in
the fourth category, and 20 in the fifth category. The frequency distribution
table for these data is shown in Table 3.2.
Notes:
1. While preparing the frequency distribution table, we must
ensure that no data point belongs to more than one category and
that no data point is omitted from the count. In other words,
each data point must belong to only one category.
2. The total frequency is always equal to the total number of data
points in the data set. In the above example, the total frequency
is equal to 110.
Definition 3.1 A variable is a characteristic of the data under consideration. For example, in the Example 3.1 data, a company's annual revenues are under consideration, so revenue is a variable.
The information provided in the frequency distribution table, Table 3.2, can be
expanded if we include two more columns: a column of relative frequencies
and a column of cumulative frequencies. The column of relative frequencies is
Table 3.1 Annual revenues of 110 small to midsize companies in midwestern United
States.
1 4 3 5 3 4 1 2 3 4 3 1 5 3 4 2 1 1 4 5 5 3 5 2 1 2 1 2 3 3 2 1 5 3 2 1
1 1 2 2 4 5 5 3 3 1 1 2 1 4 1 1 1 4 4 5 2 4 1 4 4 2 4 3 1 1 4 4 1 1 2 1 5
3 1 1 2 5 2 3 1 1 2 1 1 2 2 5 3 2 2 5 2 5 3 5 5 3 2 3 5 2 3 5 5 2 3 2 5
Table 3.2 Frequency distribution table for 110 small to midsize companies in the midwestern United States.

Category number    Category/Class frequency
1                   30
2                   25
3                   20
4                   15
5                   20
Total              110
Table 3.3 Complete frequency distribution table for the 110 small to midsize companies in the midwestern United States.

Category    Frequency    Relative     Percentage    Cumulative
number                   frequency                  frequency
1            30          30/110       27.27          30
2            25          25/110       22.73          55
3            20          20/110       18.18          75
4            15          15/110       13.64          90
5            20          20/110       18.18         110
Total       110          1.00         100%
obtained by dividing the frequency of each class by the total frequency. The
column of the cumulative frequencies is obtained by adding the frequency of
each class to the frequencies of all the preceding classes so that the last entry
in this column is equal to the total frequency. Some practitioners like to use a
column of percentages instead of, or in addition to, the relative frequency column. The
percentage column is easy to obtain: just multiply each entry in the relative frequency column by 100. For example, the expanded (or complete) version of Table 3.2 is shown in Table 3.3.
Sometimes a data set is such that it consists of only a few distinct observations, which occur repeatedly. This kind of data is normally treated in the
same way as the categorical data. The categories are represented by the distinct observations. We illustrate this scenario with the following example.
Example 3.2 The following data show the number of coronary artery
bypass graft surgeries performed at a hospital in 24-hour periods during the
past 50 days. Bypass surgeries are usually performed when a patient has
multiple blockages or when the left main coronary artery is blocked.
1 2 1 5 4 2 3 1 5 4 3 4 6 2 3 3 2 2 3 5 2 5 3 4 3
1 3 2 2 4 2 6 1 2 6 6 1 4 5 4 1 4 2 1 2 5 2 2 4 3
Construct a complete frequency distribution table for these data.
Table 3.4 Complete frequency distribution table for the data in Example 3.2.

Category    Tally              Frequency    Relative     Percentage    Cumulative
number                                      frequency                  frequency
1           ///// ///           8            8/50         16             8
2           ///// ///// ///    13           13/50         26            21
3           ///// /////        10           10/50         20            31
4           ///// ////          9            9/50         18            40
5           ///// /             6            6/50         12            46
6           ////                4            4/50          8            50
Total                          50           1.00         100%
ber of data points in each class or category is about six or seven, or (2) we
may use Sturges' formula:

    m = 1 + 3.3 log n    (3.2)

where log is the base-10 logarithm. The class width is then given by

    Class width = R/m    (3.3)

where R is the range of the data.
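As a quick check of Sturges' formula, a short sketch (the helper name is ours): for the 50 observations of Example 3.9 the formula suggests roughly seven classes, rounding up to a whole number as the text recommends:

```python
import math

def sturges_classes(n):
    """Sturges' formula: m = 1 + 3.3 * log10(n), rounded up to a whole number."""
    return math.ceil(1 + 3.3 * math.log10(n))

print(sturges_classes(50))    # number of classes for 50 data points
print(sturges_classes(110))   # number of classes for the 110 companies of Example 3.1
```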
Table 3.5

Class        Tally         Frequency    Relative     Percentage    Cumulative
                                        frequency                  frequency
[110–117)    ///           3            3/40          7.5           3
[117–124)    ///// //      7            7/40         17.5          10
[124–131)    ///// ///     8            8/40         20.0          18
[131–138)    ///// //      7            7/40         17.5          25
[138–145)    ///// /       6            6/40         15.0          31
[145–152]    ///// ////    9            9/40         22.5          40
Total                      40           1.00         100%
and the larger is called the upper limit. Note that except for the last class, the
upper limit does not belong to the class. This means, for example, the data
point 117 will be assigned to class 2 and not class 1. This way no two classes have any common point, which ensures each data point will belong to only
one class. For simplification we will use mathematical notation to denote
the above classes as
[110–117), [117–124), [124–131), [131–138), [138–145), [145–152]
where customarily the symbol [ implies that the end point belongs to the class
and ) implies that the end point does not belong to the class. Then the frequency distribution table for the data in this example is as shown in Table 3.5.
Once data are placed in a frequency distribution table, they are referred to as
grouped data. Once the data are grouped, it is not possible to retrieve the
original data, so it is important to note that some information is lost when grouping data. As we shall see in the next chapter, by using grouped data
we cannot expect to get as accurate a result as we might expect by using
ungrouped data. In the next chapter we will also see that in order to calculate
certain quantities, such as the mean and variance, using grouped data we
need to define another quantity, the class mark or class midpoint, which is
defined as the average of the upper and the lower limits of a class. For example, the midpoint of class 1 in the above example is:
    Midpoint of class 1 = (110 + 117)/2 = 113.5    (3.4)
DOT PLOT
Description: A graphical tool used to provide visual information about the distribution of a single variable.
Example 3.4 The following data give the number of defective motors
received in 20 shipments:
8 12 10 16 6 25 21 15 17 5
26 21 29 8 10 21 10 17 15 13
Construct a dot plot for these data.
Solution: To construct a dot plot, first draw a horizontal line whose scale
begins at the smallest observation (5 in this case) or smaller and ends
with the largest observation (29 in this case) or larger (see Figure 3.1).
Dot plots usually are more useful when the sample size is small. A dot
plot gives us, for example, information about how far the data are scattered
and where most of the observations are concentrated. For instance, in the
above example, we see that the minimum and maximum numbers of defective motors received in any shipment were 5 and
29, respectively. Also, we can see that 75% of the time the number of defective motors per shipment was between 8 and 21, and so on.
Figure 3.1 Dot plot for the data on defective motors received in 20 shipments.
PIE CHART
Description: A graphical tool to study a population when it is divided into different categories. Each category is represented by a slice of the pie, with the angle at the center of the pie proportional to the frequency of the corresponding category.
Related tools: Bar chart.
Pie charts are commonly used to represent categories of a population that are
created by a characteristic of interest of that population. Examples include
allocation of federal budget by sector, revenues of a large manufacturing company by region, and technicians in a large corporation by qualification, that is,
high school diploma, associate degree, undergraduate degree, or graduate
degree, and so on. The pie chart helps us better understand at a glance the
composition of the population with respect to the characteristic of interest.
To construct a pie chart, divide a circle into slices such that each slice
represents a category proportional to the size of that category. Remember, the
total angle of the circle is 360 degrees. The angle of a slice corresponding to
a given category is determined as follows:
    Angle of a slice = (Relative frequency of the given category) × 360.
We illustrate the construction of a pie chart using the data in Example 3.5.
Example 3.5 In a manufacturing operation we are interested in better
understanding defect rates as a function of our various process steps. The
inspection points are initial cutoff, turning, drilling, and assembly. These
data are shown in Table 3.6. Construct a pie chart for these data.
Solution: The pie chart for these data is constructed by dividing the circle
into four slices. The angle of each slice is given in the last column of Table
3.6. The pie chart appears in Figure 3.2.
Table 3.6 Defect counts by inspection point.

Inspection point    Frequency    Relative frequency    Angle size
Initial cutoff        86          86/361                 85.75
Turning              182         182/361                181.50
Drilling              83          83/361                 82.75
Assembly              10          10/361                 10.00
Total                361         1.000                  360.00

Figure 3.2 Pie chart for the defect data in Table 3.6: Turning 50.4%, Initial cutoff 23.8%, Drilling 23.0%, Assembly 2.8%.
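The slice angles in Table 3.6 follow directly from the formula above. A short sketch with the defect counts by inspection point:

```python
# Defect counts by inspection point (Table 3.6)
freqs = {"Initial cutoff": 86, "Turning": 182, "Drilling": 83, "Assembly": 10}
total = sum(freqs.values())   # 361 defects in all

# Angle of a slice = (relative frequency) * 360
angles = {step: count / total * 360 for step, count in freqs.items()}
for step, angle in angles.items():
    share = freqs[step] / total
    print(f"{step:15s} {angle:7.2f} degrees ({share:.1%})")

# The slice angles must add up to the full circle
print(round(sum(angles.values()), 6))   # 360.0
```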
BAR CHART
Description: A graphical tool in which the frequency of each category of qualitative data is represented by a bar whose height is proportional to the frequency of the corresponding category.
Figure 3.3 Bar graph for annual revenues of a company over a period of five years (1998–2002).
Table 3.7 Frequency distribution table for the data in Example 3.7.

Category    Tally               Frequency    Relative     Cumulative
                                             frequency    frequency
A           ///// ///// ////    14           14/50        14
B           ///// ///// ///     13           13/50        27
C           ///// ////           9            9/50        36
D           ///// //             7            7/50        43
E           ///// //             7            7/50        50
Total                           50           1.00
To construct the bar chart we label intervals of equal length on the x-axis with the types of defects and then indicate the frequency of observations
associated with the defect within that interval. The observations are taken to
be equal to the frequency of the corresponding categories. The desired bar
graph appears in Figure 3.4, which shows that the defects of type A occur
most frequently, type B occur second most frequently, type C occur third
most frequently, and so on.
Example 3.8 The following data give the frequencies of the defect types of
Example 3.7 for auto parts manufactured over the same period in two plants
that have the same manufacturing capacity.
Defect type    A    B    C    D    E    Total
Plant I       14   13    9    7    7     50
Plant II      12   18   12    –    –     52
Figure 3.4 Bar chart for the types of defects in auto parts.
Figure 3.5 Bar charts for types of defects in auto parts manufactured in Plant I (P1) and Plant II (P2).
HISTOGRAM
Description: A graphical tool consisting of bars representing the frequencies or relative frequencies of classes or categories of quantitative data. The height of each bar is equal to the frequency or relative frequency of the corresponding class.
Example 3.9 The following data give the survival time (in hours) of 50
parts involved in a field test under extreme operating conditions:
60 100 130 100 115 30 60 145 75 80 89 57 64 92 87 110 180 195 175 179
159 155 146 157 167 174 87 67 73 109 123 135 129 141 154 166 179 37 49
68 74 89 87 109 119 125 56 39 49 190
a. Construct a frequency distribution table for the above data.
b. Construct frequency and relative frequency histograms for the
above data.
Solution:
a.
1. Find the range of the data:

   R = 195 − 30 = 165

2. Determine the number of classes:

   m = 1 + 3.3 log 50 ≈ 6.6

By rounding, we take the number of classes to be equal to 7.

3. Compute the class width:

   Class width = R/m = 165/7 = 23.57
Rounding this number up, we take the class width to be 24. As noted
earlier, we always round the class width up to a whole number or to any
other number that may be easy to work with. Note that if we were to round the
class width down, some of the observations might be left out of our count
and not belong to any class; consequently, the total frequency would be less
than n. The frequency distribution table for the data in this example is
shown in Table 3.8.
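The three steps above, range, number of classes, and class width, lead directly to the grouped frequencies. A sketch of the grouping procedure for the survival-time data (the index arithmetic is ours; the last class is closed on the right, as in the text):

```python
import math

# Survival times (in hours) of 50 parts from Example 3.9
times = [60, 100, 130, 100, 115, 30, 60, 145, 75, 80, 89, 57, 64, 92, 87,
         110, 180, 195, 175, 179, 159, 155, 146, 157, 167, 174, 87, 67, 73,
         109, 123, 135, 129, 141, 154, 166, 179, 37, 49, 68, 74, 89, 87,
         109, 119, 125, 56, 39, 49, 190]

R = max(times) - min(times)                   # range: 195 - 30 = 165
m = round(1 + 3.3 * math.log10(len(times)))   # Sturges' formula -> 7 classes
width = math.ceil(R / m)                      # round the class width up -> 24

lower = min(times)                            # the first class starts at 30
freqs = [0] * m
for x in times:
    # each value falls in class (x - lower) // width; min(..., m - 1) puts the
    # largest value (195) in the last class, which is closed on the right
    freqs[min((x - lower) // width, m - 1)] += 1

print("classes:", [(lower + i * width, lower + (i + 1) * width) for i in range(m)])
print("frequencies:", freqs, "total:", sum(freqs))
```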
Table 3.8 Frequency distribution table for the survival time of parts.

Class        Tally             Frequency    Relative     Cumulative
                                            frequency    frequency
[30–54)      /////              5            5/50         5
[54–78)      ///// ///// /     11           11/50        16
[78–102)     ///// ///          8            8/50        24
[102–126)    ///// //           7            7/50        31
[126–150)    ///// /            6            6/50        37
[150–174)    ///// /            6            6/50        43
[174–198]    ///// //           7            7/50        50
Total                          50           1.00
Figure 3.6 Frequency histogram for survival time of parts under extreme operating conditions.
Figure 3.7 Relative frequency histogram for survival time of parts under extreme operating conditions.
Figure 3.8 Frequency polygon for the data in Example 3.9.
end of class one open and include all the sparse observations that may be
present in the data in class 1.
Another graph that becomes the basis of probability distributions that we
will study in later chapters is called the frequency polygon or relative frequency polygon, depending upon which histogram is used to construct this
graph.
To construct the frequency or relative frequency polygon, first mark the
midpoints on the top ends of the rectangles of the corresponding histogram
Figure 3.9 Relative frequency polygon for the data in Example 3.9.
Figure 3.10 A frequency distribution curve, f(x).
and then simply join these midpoints. Note that we include classes with zero
frequencies at the lower as well as at the upper end of the histogram so that
we can connect the polygon with the x-axis. The curves obtained by joining
the midpoints are called the frequency or relative frequency polygons as the
case may be. The frequency polygon and the relative frequency polygon for
the data in Example 3.9 are shown in Figure 3.8 and Figure 3.9 respectively.
Sometimes a data set consists of a very large number of observations,
and that results in having a large number of classes of very small widths. In
such cases frequency polygons or relative frequency polygons become
smooth curves. For example, Figure 3.10 shows one such smooth curve.
Such curves are usually called frequency distribution curves and represent the probability distributions of continuous random variables. We will
study probability distributions of continuous random variables in Chapter 7.
Our comments on this topic indicate the importance of histograms, as they
eventually become the basis of probability distributions.
Figure 3.12 Cumulative frequency histogram for the data in Example 3.9.
Figure 3.13 Ogive curve for the survival data in Example 3.9.
LINE GRAPH
Description: A graphical tool used to display time series data.
A line graph, also known as time series graph, is commonly used to study
any changes that take place over time in the variable of interest. In a line
graph, time is marked on the horizontal axis or x-axis and the variable on the
vertical axis or y-axis. For illustration we use the data in Example 3.10.
Example 3.10 The following data give the number of lawn mowers sold (LMS) by
a garden shop in each month of one year:

Month    Jan.  Feb.  Mar.  April  May  June  July  Aug.  Sep.  Oct.  Nov.  Dec.
LMS       10    57    62    68     40   15    10    70    62    64    52    60

Figure 3.14 Line graph for the data on lawn mowers given in Example 3.10.
STEM AND LEAF DIAGRAM
Description: A graphical tool that displays data by splitting each observation into two parts, a stem and a leaf, so that no information in the original data is lost.
In other words, by preparing a stem and leaf diagram we do not lose any
information. We illustrate the construction of the stem and leaf diagram with
the following example.
Example 3.11 Use the data on survival time of parts in Example 3.9. The
data of Example 3.9 are reproduced in Table 3.9.
Solution: To create a stem and leaf diagram, we split each observation in
the data set into two parts, called the stem and the leaf. For this example we
split each observation at the unit place so that the digit at the unit place is a
leaf and the part to the left is a stem. For example, for observations 60 and
100 the stems and leaves are
Stem    Leaf
 6      0
10      0
To construct a complete stem and leaf diagram, list all the stems, without
repeating them, in a column, and then list all the leaves in a row against their
corresponding stems. For the data in this example, we have the diagram
shown in Figure 3.15.
shown in Figure 3.15.
(a)                             (b)
Depth  Stem  Leaf               Depth  Stem  Leaf
 3      3    0 7 9               3      3    0 7 9
 5      4    9 9                 5      4    9 9
 7      5    7 6                 7      5    6 7
12      6    0 0 4 7 8          12      6    0 0 4 7 8
15      7    5 3 4              15      7    3 4 5
21      8    0 9 7 7 9 7        21      8    0 7 7 7 9 9
22      9    2                  22      9    2
(4)    10    0 0 9 9            (4)    10    0 0 9 9
24     11    5 0 9              24     11    0 5 9
21     12    3 9 5              21     12    3 5 9
18     13    0 5                18     13    0 5
16     14    5 6 1              16     14    1 5 6
13     15    9 5 7 4            13     15    4 5 7 9
 9     16    7 6                 9     16    6 7
 7     17    5 9 4 9             7     17    4 5 9 9
 3     18    0                   3     18    0
 2     19    5 0                 2     19    0 5

Figure 3.15 Ordinary and ordered stem and leaf diagram for the data on survival
time for parts in extreme operating conditions in Example 3.9.
Note that in Figure 3.15(a) leaves occur in the same order as observations in the raw data. In Figure 3.15(b) leaves appear in the ascending order,
and that is why it is called an ordered stem and leaf diagram. By rotating
the stem and leaf diagram counterclockwise through 90 degrees, we see the
diagram can serve the same purpose as a histogram with stems serving the
role of classes, leaves as class frequencies, and rows of leaves as rectangles.
Unlike the frequency distribution table and histogram, the stem and leaf diagram can be used to determine, for example, what percentage of parts survived between 90 and 145 hours. Using the stem and leaf diagram we can see that 15 out of 50, or 30% of the parts, survived between 90 and 145 hours. Using either the frequency table or the histogram, this question cannot be answered, since the interval 90 to 145 does not include whole parts of classes 3 and 5. The first column in the diagram counts, from the top and from the bottom, the number of parts that survived up to and beyond a certain number of hours, respectively. For example, the entry in the fifth row from the top indicates that 15 parts survived less than 80 hours, whereas the entry in the fifth row from the bottom indicates that 13 parts survived at least 150 hours. The entry with parentheses indicates the row that contains the median value of the data. It is clear that we can easily retrieve the original data from the stem and leaf diagram. Thus, for example, the first row in Figure 3.15 consists of data points 30, 37, and 39. Ultimately, we can see the stem and leaf diagram is usually more informative than a frequency distribution table or a histogram.
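The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the book's procedure verbatim; the function name `stem_and_leaf` is my own, and sorting the data first produces the ordered diagram of Figure 3.15(b).

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_digits=1):
    """Split each observation into a stem (leading digits) and a
    one-digit leaf, then group the leaves under their stems."""
    divisor = 10 ** leaf_digits
    diagram = defaultdict(list)
    for x in sorted(data):          # sorting gives the ordered diagram
        stem, leaf = divmod(x, divisor)
        diagram[stem].append(leaf)
    return dict(diagram)

# The observations 60 and 100 from the text get stems 6 and 10, leaf 0 each.
d = stem_and_leaf([60, 100, 63, 67, 105])
for stem in sorted(d):
    print(f"{stem:3d} | {''.join(str(leaf) for leaf in d[stem])}")
```

Because every digit of every observation is kept, the original data can always be recovered from the returned dictionary, which is exactly the property the text emphasizes.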
Breaking Each Stem into Two or Five Parts. Quite often we deal with a large data set that is spread over a narrow range. When we prepare a stem and leaf diagram for data that do not show much variability, it becomes difficult to interpret, as there are too many leaves on the same stem. A stem and leaf diagram in which the stems have too many leaves tends to be less informative, since it is not as clear as one in which the stems do not have too many leaves. To illustrate this scenario, we consider the stem and leaf diagram in Example 3.12.
Example 3.12 A manufacturing company has been awarded a huge contract by the Defense Department to supply spare parts. In order to provide
these parts on schedule, the company needs to hire a large number of new
workers. To estimate how many workers to hire, representatives of the human
resources department took a random sample of 80 workers and found the
number of parts each worker produces per week. The data collected are
shown in Table 3.10.
Table 3.10 Number of parts produced by each worker per week.
66  82  75  68  71  74  76  81  73  73
74  87  70  69  89  93  68  68  86  92
79  87  91  81  84  85  92  80  85  86
65  70  77  87  64  63  75  89  62  65
61  90  67  62  69  92  63  69  74  71
69  74  80  93  73  76  83  67  69  83
82  66  71  85  86  65  76  91  87  68
77  89  78  73  84  90  81  72  83  85
Prepare a stem and leaf diagram for the data in Table 3.10.
Solution: The stem and leaf diagram for the data in Table 3.10 appears in
Figure 3.16.
The stem and leaf diagram in Figure 3.16 is not as informative as it would be if it had more stems so that each stem had fewer leaves. We can modify the diagram by breaking each stem into two parts so that the first part carries leaves 0 through 4 and the second one carries leaves 5 through 9. The modified stem and leaf diagram is shown in Figure 3.17.
Sometimes even a two-stem and leaf diagram is insufficient for illustrating the desired information, since some of the stems still have too many leaves. In such cases we can break each stem into five parts so that these
Stem   Leaf
 22    6 | 1223345556677888899999
(23)   7 | 00111233334444556667789
 35    8 | 00111223334455556667777999
  9    9 | 001122233
Figure 3.16 Ordered stem and leaf diagram for the data in Table 3.10.
Stem   Leaf
  6    6* | 122334
 22    6. | 5556677888899999
 36    7* | 00111233334444
 (9)   7. | 556667789
 35    8* | 001112233344
 23    8. | 55556667777999
  9    9* | 001122233
Figure 3.17 Ordered two-stem and leaf diagram for the data in Table 3.10.
Stem   Leaf
  1    6. | 1
  5    6t | 2233
  9    6f | 4555
 13    6s | 6677
 22    6* | 888899999
 27    7. | 00111
 32    7t | 23333
 38    7f | 444455
 (5)   7s | 66677
 37    7* | 89
 35    8. | 00111
 30    8t | 22333
 25    8f | 445555
 19    8s | 6667777
 12    8* | 999
  9    9. | 0011
  5    9t | 22233
Figure 3.18 Ordered five-stem and leaf diagram for the data in Table 3.10.
stems carry leaves 0 and 1, 2 and 3, 4 and 5, 6 and 7, and 8 and 9, respectively. The new stem and leaf diagram appears in Figure 3.18.
Note that the stem and leaf diagram in Figure 3.18 has become much simpler and accordingly more informative. It is also interesting to note that the labels t, f, and s used to denote the stems have real meanings: t stands for the leaves two and three, f for four and five, and s for six and seven.
SCATTER PLOT
A graphical tool used to plot and compare one variable against another variable.
Example 3.13 The cholesterol level and the systolic blood pressure of 30
randomly selected U.S. men in the age group of 40 to 50 years are given in
Table 3.11. Construct a scatterplot of this data and determine whether there
is any association between the cholesterol levels and systolic blood pressures.
Table 3.11 Cholesterol levels and Systolic BP of 30 randomly selected U.S. males.
Subject          1    2    3    4    5    6    7    8    9   10
Cholesterol (x)  195  180  220  160  200  220  200  183  139  155
Systolic BP (y)  130  128  138  122  140  148  142  127  116  123

Subject          11   12   13   14   15   16   17   18   19   20
Cholesterol (x)  153  164  171  143  159  167  162  165  178  145
Systolic BP (y)  119  130  128  120  121  124  118  121  124  115

Subject          21   22   23   24   25   26   27   28   29   30
Cholesterol (x)  245  198  156  175  171  167  142  187  158  142
Systolic BP (y)  145  126  122  124  117  122  112  131  122  120
Solution: Figure 3.19(a) shows the scatterplot of the data in Table 3.11.
This scatterplot clearly indicates that there is a fairly good upward linear
trend. Also, if we draw a straight line through the data points, we can see that
the data points are concentrated around the straight line within a narrow band.
The upward trend indicates a positive association between the two variables,
whereas the narrow width of the band indicates the strength of the association
is very strong. As the association between the two variables gets stronger, the
band enclosing the plotted points becomes narrower and narrower. A downward trend would indicate a negative association between the two variables. A numerical measure of association between two numerical variables is called the Pearson correlation coefficient, named after the English statistician Karl Pearson (1857–1936). The correlation coefficient between two numerical variables in sample data is usually denoted by r; the corresponding measure of association for population data is denoted by the Greek letter ρ (rho). The correlation coefficient is defined as
r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)² Σ(yi − ȳ)²]
  = [Σxiyi − (Σxi)(Σyi)/n] / √{[Σxi² − (Σxi)²/n][Σyi² − (Σyi)²/n]}    (3.5)
[Figure 3.19 shows eight scatterplots with their correlation coefficients: (a) r = .891, (b) r = −.891, (c) r = 1, (d) r = −1, (e) r = .518, (f) r = −.518, (g) r = .212, (h) r = −.212.]

Figure 3.19 MINITAB display depicting eight degrees of correlation: (a) represents strong positive correlation, (b) represents strong negative correlation, (c) represents positive perfect correlation, (d) represents negative perfect correlation, (e) represents positive moderate correlation, (f) represents negative moderate correlation, (g) represents a positive weak correlation, and (h) represents a negative weak correlation.
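Equation (3.5) can be coded directly. The sketch below is my own illustration (the helper name `pearson_r` is not from the book); it uses the computational (shortcut) form of (3.5) and checks it on synthetic data, where a perfect linear relation must give r = 1 and a perfectly decreasing one r = −1.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient, shortcut form of equation (3.5)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = sxy - sx * sy / n
    den = sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return num / den

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * v + 3 for v in xs]))    # perfect positive: 1.0
print(pearson_r(xs, [-2 * v + 3 for v in xs]))   # perfect negative: -1.0
```

Applied to the 30 cholesterol/blood-pressure pairs of Table 3.11, this function reproduces the strong positive correlation shown in Figure 3.19(a).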
4
Describing Data Numerically
MEASURES OF CENTRALITY
Measures of centrality include several numerical measures. The most commonly used of these are the mean, median, and mode. Related tools: measures of dispersion.
μ = (X1 + X2 + ⋯ + XN)/N = ΣXi/N    (4.1)

X̄ = (X1 + X2 + ⋯ + Xn)/n = ΣXi/n    (4.2)

where Σ (read as sigma) denotes summation over all the measurements, and N and n denote the population and sample sizes, respectively.
Example 4.1 The following data give the hourly wages (in dollars) of some
randomly selected workers in a manufacturing company:
8, 6, 9, 10, 8, 7, 11, 9, 8
Find the mean hourly wage of these workers.
Solution: Since the wages listed in these data are only for some of the workers in the company, they represent a sample. We have

n = 9
Σxi = 8 + 6 + 9 + 10 + 8 + 7 + 11 + 9 + 8 = 76

So the sample mean is

X̄ = Σxi/n = 76/9 = 8.44

In this example, the mean hourly wage of these employees is $8.44 an hour.
Example 4.2 The following data give the ages of all the employees in a city hardware store:
22, 25, 26, 36, 26, 29, 26, 26
Find the mean age of the employees in that hardware store.
Solution: Since the data give the ages of all the employees of the hardware store, we are interested in a population. Thus, we have

N = 8
Σxi = 22 + 25 + 26 + 36 + 26 + 29 + 26 + 26 = 216

So the population mean is

μ = Σxi/N = 216/8 = 27 years

In this example, the mean age of the employees in the hardware store is 27 years.
Note that even though the formulas for calculating the sample mean and the population mean are similar, it is important to make a clear distinction between the sample mean X̄ and the population mean μ for all application purposes.
Example 4.4 The following data describe the sales (in thousands of dollars) for 16 randomly selected sales personnel distributed throughout the
10, 8, 15, 12, 17, 7, 20, 19, 22, 25, 16, 15, 18, 250, 300, 12
Find the median sales of these individuals.
Solution:
Step 1. Observations in ascending order:
7 8 10 12 12 15 15 16 17 18 19 20 22 25 250 300
Ranks: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Step 2. Rank of the median = (16 + 1)/2 = 8.5
Step 3. Find the value corresponding to the 8.5th rank. Since
the rank of the median is not a whole number, in this case the
median is defined as the average of the values that correspond
to the ranks 8 and 9 (since rank 8.5 is between the ranks 8
and 9).
The median of the above data is Md = (16 + 17)/2 = 16.5.
Thus, the median sales of the given individuals is 16.5 thousand dollars.
It is important to note that the median does not have to be one of the values of the data set. Whenever the sample size is odd, the median is the center value, and whenever it is even, the median is the average of the two middle values, where the data are arranged in ascending order.
Finally, note that the data in the above example contain two values, 250 thousand and 300 thousand dollars, that seem to be the sales of top-performing sales personnel. These two large values may be considered extreme values.
In this case, the mean of these data is given by

X̄ = (7 + 8 + 10 + 12 + 12 + 15 + 15 + 16 + 17 + 18 + 19 + 20 + 22 + 25 + 250 + 300)/16 = 47.875
Since the mean of 47.875 is so much larger than the median of 16.5, it is
obvious that the mean of these data has been adversely affected by the
extreme values. Since the mean does not adequately represent the measure of
centrality of the data set, the median would more accurately identify center
locations of the data for this example.
Furthermore, if we replace the extreme values of 250 and 300 with, for example, 25 and 30, the median will not change, although the mean becomes $16,937.50. The new data obtained by replacing 250 and 300 with 25 and 30 do not contain any extreme values. Therefore, the new mean value is more consistent with the true average sales.
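The contrast between the mean and the median on these sales data can be reproduced in a short Python sketch (the `median` helper below follows the rank rule (n + 1)/2 described in the example; it is my own illustration, not the book's code):

```python
def median(data):
    """Median via the rank (n + 1)/2: the center value for odd n,
    the average of the two middle values for even n."""
    s = sorted(data)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2

sales = [10, 8, 15, 12, 17, 7, 20, 19, 22, 25, 16, 15, 18, 250, 300, 12]
print(median(sales))            # 16.5 -- unaffected by the two extreme values
print(sum(sales) / len(sales))  # 47.875 -- pulled up by 250 and 300
```

Running the same two lines after replacing 250 and 300 with 25 and 30 leaves the median at 16.5 while the mean drops to 16.9375, which is the robustness property the text describes.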
X̄w = (w1X1 + w2X2 + ⋯ + wnXn)/(w1 + w2 + ⋯ + wn) = ΣwiXi / Σwi    (4.3)

where w1, w2, ..., wn are the weights attached to X1, X2, ..., Xn, respectively.
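Equation (4.3) is easy to apply in code. The sketch below uses hypothetical grade points and credit hours as weights (the numbers are invented for illustration, not taken from the book's GPA example):

```python
def weighted_mean(values, weights):
    """Equation (4.3): sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical GPA: grade points (A = 4.0, B = 3.0, C = 2.0)
# weighted by the credit hours of each course.
grades = [4.0, 3.0, 2.0]
credits = [3, 4, 3]
print(weighted_mean(grades, credits))  # (12 + 12 + 6) / 10 = 3.0
```

With equal weights, `weighted_mean` reduces to the ordinary sample mean of equation (4.2).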
4.2.3 Mode
The mode of a data set is the value that occurs most frequently. Mode is the
least used measure of centrality. When products are produced via mass production, for example, clothes of certain sizes or rods of certain lengths, the
modal value is of great interest. Note that in any data set there may be no
mode or, conversely, there may be multiple modes. We denote the mode of a
data set by M0.
Example 4.6 Find the mode for the following data set:
3, 8, 5, 6, 10, 17, 19, 20, 3, 2, 11
Solution: In the given data set each value occurs once except 3, which occurs twice. Thus, the mode for this set is:

M0 = 3
Example 4.7 Find the mode for the following data set:
1, 7, 19, 23, 11, 12, 1, 12, 19, 7, 11, 23
Solution: Note that in this data set, each value occurs the same number of
times. Thus, in this data set there is no mode.
Example 4.8 Find modes for the following data set:
5, 7, 12, 13, 14, 21, 7, 21, 23, 26, 5
Solution: In this data set, 5, 7, and 21 occur twice and the rest of the values occur only once. Thus, in this example there are three modes, that is,

M0 = 5, 7, and 21
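The three mode cases of Examples 4.6 through 4.8 (one mode, no mode, several modes) can be handled by one small function; this is a sketch of my own, using the convention from Example 4.7 that a data set in which every value occurs equally often has no mode:

```python
from collections import Counter

def modes(data):
    """Return all modes of the data; an empty list means no mode
    (every value occurs equally often)."""
    counts = Counter(data)
    top = max(counts.values())
    if top == min(counts.values()):
        return []  # no value stands out from the rest
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 8, 5, 6, 10, 17, 19, 20, 3, 2, 11]))        # [3]
print(modes([1, 7, 19, 23, 11, 12, 1, 12, 19, 7, 11, 23]))  # []
print(modes([5, 7, 12, 13, 14, 21, 7, 21, 23, 26, 5]))      # [5, 7, 21]
```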
[Figure 4.1 panels: Symmetric (Mean = Median = Mode); Left-skewed (Mean < Median < Mode); Right-skewed (Mode < Median < Mean).]
Figure 4.1 Frequency distributions showing the shape and location of measures of
centrality.
Definition 4.5 A data set is right-skewed when values in the data set
that are smaller than the median occur with relatively higher frequency than those values that are greater than the median. The values greater than the median are scattered far from the median.
MEASURES OF DISPERSION
Measures of dispersion include several numerical measures. The most commonly used of these are the variance, standard deviation, range, and interquartile range. Related tools: coefficient of variation.
Figure 4.2 Two frequency distribution curves with equal mean, median and mode
values.
4.3.1 Range
The range of a data set is the easiest measure of dispersion to calculate.
Range is defined as follows:

Range = Largest value − Smallest value    (4.4)

Range is not a very efficient measure of dispersion, since it takes into consideration only the largest and the smallest values and none of the remaining observations. For example, if a data set has 100 distinct observations, it uses only two observations and ignores the remaining 98. As a rule of thumb, if the data set contains 10 or fewer observations, the range is considered a reasonably good measure of dispersion. For data sets larger than 10 observations, the range is not considered to be a very efficient measure of dispersion.
Example 4.9 The following data give the tensile strength (in psi) of a material sample submitted for inspection:
8538.24, 8450.16, 8494.27, 8317.34, 8443.99,
8368.04, 8368.94, 8424.41, 8427.34, 8517.64
Find the range for this data set.
Solution: The largest and the smallest values in the data set are 8538.24 and 8317.34, respectively. Therefore, the range for this data set is:

Range = 8538.24 − 8317.34 = 220.90
4.3.2 Variance
One of the most interesting pieces of information associated with any data is
how the values in the data set vary from one another. Of course, range can
give us some idea of variability. Unfortunately, range does not help us understand centrality. To better understand variability, we rely on more powerful
indicators such as variance, which is a value that focuses on how much individual observations within the data set deviate from their mean.
For example, if the values in the data set are x1, x2, x3, ..., xn and the mean is x̄, then x1 − x̄, x2 − x̄, x3 − x̄, ..., xn − x̄ are the deviations from the mean. It is then natural to find the sum of these deviations and argue that if this sum is large, the values differ too much from each other, and if this sum is small, they do not differ from each other too much. Unfortunately, this argument does not hold, since the sum of the deviations is always zero, no matter how much the values in the data set differ. This is true because some of the deviations are positive and some are negative, and when we take their sum they cancel each other.
Since we don't get any useful information from two sets of measures (i.e., one positive and one negative) that cancel each other, we can square these deviations and then take their sum. By taking the square we get rid of the negative deviations in the sense that they also become positive. The variance then becomes the average value of the sum of the squared deviations from the mean x̄. If the data set represents a population, the deviations are taken from the population mean μ. Thus, the population variance, denoted by σ² (read as sigma squared), is defined as:
σ² = (1/N) Σ (Xi − μ)²,  i = 1, 2, ..., N    (4.5)

If the data set represents a sample, the deviations are taken from the sample mean X̄, and the sample variance S² is defined either with the divisor n,

S² = (1/n) Σ (Xi − X̄)²    (4.6)

or, as used in the rest of this chapter, with the divisor n − 1:

S² = (1/(n − 1)) Σ (Xi − X̄)²    (4.7)

The standard deviation is the positive square root of the variance:

σ = +√σ²    (4.8)

For computations, the sample variance can be written in the shortcut form

S² = (1/(n − 1)) [ΣXi² − (ΣXi)²/n]    (4.9)

Note that one difficulty in using the variance as the measure of dispersion is that the units for measuring the variance are not the same as those used for the data values. Rather, variance is expressed as the square of the units used for the data values. For example, if the data values are dollar amounts, then the variance will be expressed in squared dollars. The standard deviation, which is measured in the same units as the data, is therefore computed from

S² = (1/(n − 1)) [ΣXi² − (ΣXi)²/n]    (4.10)

S = +√S²    (4.11)
Note: In general, random variables are denoted by uppercase letters and their
values by the corresponding lowercase letters.
Example 4.10 The following data give the length (in millimeters) of material chips removed during a machining operation:
4, 2, 5, 1, 3, 6, 2, 4, 3, 5
Calculate the variance and the standard deviation for the data.
Solution: There are three simple steps involved in calculating the variance of any data set.

Step 1. Calculate Σxi, the sum of all the data values. Thus we have

Σxi = 4 + 2 + 5 + 1 + 3 + 6 + 2 + 4 + 3 + 5 = 35

Step 2. Calculate ΣXi², the sum of squares of all the observations, that is,

ΣXi² = 4² + 2² + 5² + 1² + 3² + 6² + 2² + 4² + 3² + 5² = 145

Step 3. Since the sample size is n = 10, by inserting the values Σxi and ΣXi² calculated in Step 1 and Step 2 in formula 4.10, we have

S² = (1/(10 − 1)) [145 − (35)²/10] = (1/9)(145 − 122.5) = 2.5

The standard deviation is obtained by taking the square root of the variance, that is,

S = √2.5 = 1.58
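The three steps above translate directly into code. This is a minimal sketch (the function name is my own) of the shortcut formula, checked against the chip-length data of Example 4.10:

```python
from math import sqrt

def sample_variance(data):
    """Shortcut formula: S^2 = [sum(x^2) - (sum x)^2 / n] / (n - 1)."""
    n = len(data)
    s = sum(data)                       # Step 1: sum of the values
    ss = sum(x * x for x in data)       # Step 2: sum of the squares
    return (ss - s * s / n) / (n - 1)   # Step 3: plug into the formula

chips = [4, 2, 5, 1, 3, 6, 2, 4, 3, 5]
var = sample_variance(chips)
print(var, sqrt(var))   # variance 2.5, standard deviation about 1.58
```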
Notes:
1. It is important to remember that the value of S2, and therefore of
S, is always greater than zero, except where all the data values
are equal, in which case it is zero.
2. Sometimes the data values are so large that the calculations for computing the variance become quite cumbersome. In such cases one can code the data values by adding to (or subtracting from) each data value a constant, say C, and then calculate the variance/standard deviation of the coded data, since the variance/standard deviation does not change. This can easily be seen from the following discussion.

Let X1, X2, ..., Xn be a data set and let C ≠ 0 be any constant. Let Y1 = X1 + C, Y2 = X2 + C, ..., Yn = Xn + C. Then, clearly, we have Ȳ = X̄ + C. This means that the deviations of the Xi's from X̄ are the same as the deviations of the Yi's from Ȳ. Thus, the variance/standard deviation of the X's is the same as the variance/standard deviation of the Y's (S²y = S²x and Sy = Sx). This result implies that any shift in location of the data set does not affect the variance/standard deviation of the data.
Example 4.11 Find the variance and the standard deviation of the following data:

53, 60, 58, 64, 57, 56, 54, 55, 51, 61, 63

Solution: We compute the variance and the standard deviation of the new data set 3, 10, 8, 14, 7, 6, 4, 5, 1, 11, 13, which is obtained by subtracting 50 from each value of the original data. This will also be the variance and the standard deviation of the original data set. Thus, we have

Σxi = 3 + 10 + 8 + 14 + 7 + 6 + 4 + 5 + 1 + 11 + 13 = 82
Σxi² = 3² + 10² + 8² + 14² + 7² + 6² + 4² + 5² + 1² + 11² + 13² = 786

so that

S² = (1/(11 − 1)) [786 − (82)²/11] = (1/10)(786 − 611.27) = 17.473

and

S = √17.473 = 4.18
The variance and the standard deviation of the original data set are 17.473
and 4.18, respectively.
3. Any change in scale of the data does affect the variance/standard deviation. That is, if Yi = CXi (C ≠ 0), then Ȳ = CX̄. Therefore, it can be seen that

S²y = C²S²x and Sy = |C| Sx    (4.12)
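Both notes can be verified numerically. The sketch below (my own illustration) applies the shortcut formula to the data of Example 4.11, to the coded data shifted by 50, and to the data scaled by C = 3, confirming that a shift leaves the variance unchanged while scaling multiplies it by C²:

```python
def sample_variance(data):
    """Shortcut formula with divisor n - 1."""
    n = len(data)
    s, ss = sum(data), sum(x * x for x in data)
    return (ss - s * s / n) / (n - 1)

x = [53, 60, 58, 64, 57, 56, 54, 55, 51, 61, 63]
coded = [v - 50 for v in x]    # shift by a constant: variance unchanged
scaled = [3 * v for v in x]    # scale by C = 3: variance multiplied by 9

print(sample_variance(x))       # about 17.473
print(sample_variance(coded))   # same value as for x
print(sample_variance(scaled) / sample_variance(x))  # 9.0 = C^2
```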
4.4.1 Mean
In order to compute the mean of a grouped data set, the first step is to find the midpoint m, also known as the class mark, of each class, which is defined as:

m = (Lower limit + Upper limit) / 2

Then the population mean μG and the sample mean X̄G are defined as follows:

μG = Σfimi / N    (4.13)

X̄G = Σfimi / n    (4.14)
Note: From the entries in Table 4.1 one can observe that the difference
between the midpoints of any two consecutive classes is always the same as
the class width.
Solution: Using Formula 4.14, we have

X̄G = Σfimi / n = 1350/40 = 33.75
4.4.2 Median
To compute the median MG of grouped data, follow these steps:
Step 1. Determine the rank of the median, which is given by

Rank of MG = (n + 1) / 2
Table 4.1 Age distribution of a group of 40 people watching a basketball game.

Class          Frequency f    Midpoint m            f·m
10–under 20         8         (10 + 20)/2 = 15      120
20–under 30        10         (20 + 30)/2 = 25      250
30–under 40         6         (30 + 40)/2 = 35      210
40–under 50        11         (40 + 50)/2 = 45      495
50–under 60         5         (50 + 60)/2 = 55      275
               n = Σf = 40                     Σfimi = 1350
MG = L + (c / f) w    (4.15)

where

L = lower limit of the class containing the median
c = (n + 1)/2 − [sum of the frequencies of all classes preceding the class containing the median]
f = frequency of the class containing the median
w = class width
Example 4.14 Find the median of the grouped data given in Example 4.13.

Solution:

Step 1. Rank of the median = (40 + 1)/2 = 20.5

Step 2. Add the frequencies until the sum becomes greater than or equal to 20.5, that is,

8 + 10 + 6 = 24 ≥ 20.5

The class containing the median is 30–under 40.

Step 3.

MG = 30 + ((20.5 − (8 + 10))/6)(10) = 30 + (2.5/6)(10) = 34.17
4.4.3 Mode
To find the mode of grouped data is a simple exercise: just find the class with the highest frequency. The mode of the grouped data is equal to the midpoint of that class. Note that if there is more than one class with the highest, but equal, frequency, there is more than one mode, and those modes are equal to the midpoints of such classes.

In Example 4.13, the mode is equal to the midpoint of the class 40–under 50, since it has the highest frequency, 11. Thus,

Mode = (40 + 50)/2 = 45
4.4.4 Variance
The population and the sample variance of grouped data are computed by using the following formulas:

Population variance: σ²G = (1/N) [Σfimi² − (Σfimi)²/N]    (4.16)

Sample variance: S²G = (1/(n − 1)) [Σfimi² − (Σfimi)²/n]    (4.17)

For the data in Example 4.13, Σfimi² = 52800, so the sample variance is

S²G = (1/(40 − 1)) [52800 − (1350)²/40] = (1/39)[52800 − 45562.5] = (1/39)(7237.5) = 185.577

The population and the sample standard deviation are found by taking the square root of the corresponding variances. For example, the sample standard deviation for the data in Example 4.13 is

SG = √185.577 = 13.62
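The grouped-data formulas (4.14), (4.15), and (4.17) can be bundled into one routine. The sketch below is my own (the function name and the `(lower, upper, frequency)` encoding of Table 4.1 are illustrative choices), checked against the results computed above:

```python
def grouped_stats(classes):
    """classes: list of (lower, upper, frequency) tuples for grouped data.
    Returns (mean, median, sample variance) per (4.14), (4.15), (4.17)."""
    n = sum(f for _, _, f in classes)
    mids = [(lo + hi) / 2 for lo, hi, _ in classes]       # class marks
    fm = sum(f * m for (_, _, f), m in zip(classes, mids))
    fm2 = sum(f * m * m for (_, _, f), m in zip(classes, mids))
    mean = fm / n
    variance = (fm2 - fm * fm / n) / (n - 1)

    rank = (n + 1) / 2            # rank of the median
    cum = 0
    for (lo, hi, f), m in zip(classes, mids):
        if cum + f >= rank:       # class containing the median
            median = lo + ((rank - cum) / f) * (hi - lo)
            break
        cum += f
    return mean, median, variance

ages = [(10, 20, 8), (20, 30, 10), (30, 40, 6), (40, 50, 11), (50, 60, 5)]
print(grouped_stats(ages))   # mean 33.75, median about 34.17, variance about 185.58
```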
[Figures 4.3 through 4.6 survive only in fragments here. Figure 4.4 marks the interval 15.5 to 16.1, that is, 15.8 ± 2σ with 2σ = 0.3. Figure 4.5 marks the interval from X̄ − 3S to X̄ + 3S for X̄ = 35,700 and 3S = 7,500, that is, 28,200 to 43,200.]

Figure 4.6 Salary data.
Step 3. Find the data value that corresponds to the rank 11.2, which will be the 70th percentile. From Figure 4.6, we can easily see that the value of the 70th percentile is given by

70th percentile = 72(.8) + 73(.2) = 72.2
The percentile rank of a given data value x is found as

Percentile rank of x = (number of data values < x)(100) / (n + 1)    (4.18)
[Figure 4.7: the quartiles Q1, Q2, and Q3 divide the data into four parts of 25% each; they coincide with the 25th, 50th, and 75th percentiles.]

Figure 4.7 Quartiles and percentiles.
obtained by trimming 25% of the values from the bottom and 25% from the top. The interquartile range (IQR) is defined as

IQR = Q3 − Q1    (4.19)

Example 4.19 Find the interquartile range for the salary data of Example 4.18:

Salaries: 48, 51, 51, 52, 54, 55, 58, 62, 63, 69, 72, 73, 76, 85, 95
Solution: In order to find the interquartile range, we need to find the quartiles Q1 and Q3 or, equivalently, the 25th percentile and the 75th percentile. We can easily see that the ranks of the 25th and 75th percentiles are:

Rank of 25th percentile = (25/100)(15 + 1) = 4
Rank of 75th percentile = (75/100)(15 + 1) = 12

Thus, in this case, Q1 = 52 and Q3 = 73.

This means that the middle 50% of the engineers earn a salary between $52,000 and $73,000. The interquartile range in this example is

IQR = $73,000 − $52,000 = $21,000
Notes:
1. The interquartile range gives the range of variation among the
middle 50% of the population.
2. The interquartile range is potentially a more meaningful
measure of dispersion as it is not affected by the extreme
values that may be present in the data. By trimming 25%
of the data from the bottom and 25% from the top, we are
eliminating any extreme values that may be present in the
data set. The interquartile range is quite often used for comparing two or more data sets from similar studies.
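The rank rule (p/100)(n + 1), with interpolation between neighboring ranks when the rank is fractional, can be sketched in Python as follows (the `percentile` helper is my own name; the check uses the salary data of Example 4.19):

```python
def percentile(sorted_data, p):
    """Value at rank (p/100)(n + 1); interpolates between the two
    neighboring ranks when the rank is not a whole number."""
    n = len(sorted_data)
    rank = (p / 100) * (n + 1)
    lo = int(rank)
    frac = rank - lo
    if frac == 0 or lo >= n:
        return sorted_data[min(lo, n) - 1]
    # weight the two neighboring values by the fractional part
    return sorted_data[lo - 1] * (1 - frac) + sorted_data[lo] * frac

salaries = [48, 51, 51, 52, 54, 55, 58, 62, 63, 69, 72, 73, 76, 85, 95]
q1 = percentile(salaries, 25)
q3 = percentile(salaries, 75)
print(q1, q3, q3 - q1)   # Q1 = 52, Q3 = 73, IQR = 21
```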
BOX-WHISKER PLOT
A graphical tool that uses summary statistics: first quartile, second quartile, third quartile, and extreme data values that are located within a certain range.
[Figure 4.8: the box extends from Q1 to Q3 with the median Q2 inside; the inner fences lie 1.5 × IQR below Q1 and 1.5 × IQR above Q3, and the outer fences a further 1.5 × IQR beyond.]
Figure 4.8
Box-whisker plot.
points that fall within the inner fences. The lines from A to S
and D to L are called the whiskers.
Example 4.20 The following data give the noise level measured in decibels (a normal conversation between humans produces a noise level of about 75 decibels) produced by different machines in a large manufacturing plant:
85, 80, 88, 95, 115, 110, 105, 104, 89, 87, 96, 140, 75, 79, 99
Construct a box plot and see whether the data set contains any outliers.
Solution:
First we arrange the data in the ascending order and rank them from 1 to 15
(n 15)
Data values: 75, 79, 80, 85, 88, 89, 95, 96, 97, 99, 104, 105, 110, 115, 140
Ranks: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
We now find the ranks of the quartiles Q1, Q2, and Q3. Thus, we have

Rank of Q1 = (25/100)(15 + 1) = 4
Rank of Q2 = (50/100)(15 + 1) = 8
Rank of Q3 = (75/100)(15 + 1) = 12

Therefore the values of Q1, Q2, and Q3 are

Q1 = 85, Q2 = 96, Q3 = 105

The interquartile range is

IQR = Q3 − Q1 = 105 − 85 = 20

and

(1.5)IQR = (1.5)(20) = 30
Figure 4.9 shows the box plot for the above data. It shows that the data include one outlier. In this case, action should be taken to reduce the noise of the machine that produces a level of 140 decibels.
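The inner-fence rule used in this example is easy to automate. This sketch (function names are my own) finds the quartiles by the rank rule (p/100)(n + 1) and flags every value beyond Q1 − 1.5 × IQR or Q3 + 1.5 × IQR, reproducing the single outlier of 140 decibels:

```python
def outliers(data):
    """Values beyond the inner fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    s = sorted(data)
    n = len(s)

    def quartile(p):
        rank = (p / 100) * (n + 1)
        lo = int(rank)
        frac = rank - lo
        if frac == 0:
            return s[lo - 1]
        return s[lo - 1] * (1 - frac) + s[lo] * frac

    q1, q3 = quartile(25), quartile(75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in s if x < low or x > high]

noise = [85, 80, 88, 95, 115, 110, 105, 104, 89, 87, 96, 140, 75, 79, 99]
print(outliers(noise))   # [140]
```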
[Figure 4.9: box plot of the noise data. The box runs from Q1 = 85 to Q3 = 105 with the median at 96; the whiskers extend to 75 and 115; the inner fences sit at 55 and 135 and the outer fences at 25 and 165; the value 140 falls beyond the inner fence and is flagged as an outlier.]
Example 4.21 The following data give the number of persons who take the
bus during the off-peak time schedule from Grand Central Station to Lower
Manhattan in New York:
12 12 12 14 15 16 16 16 16 17 17 18 18 18 19 19 20 20 20 20
20 20 20 20 21 21 21 22 22 23 23 23 24 24 25 26 26 28 28 28
1. Find the mean, mode, and median for these data.
2. Prepare the box plot for the data.
3. Using results of part (1) and (2), verify whether the data are
symmetric or skewed. Examine whether the conclusion made
using the two methods about the shape of the distribution are
the same or not.
4. Using the box plot, determine whether the data contains any
outliers.
5. If in part (3) the conclusion is that the data are at least
approximately symmetric, find the standard deviation and
verify whether the empirical rule holds.
Solution:
1. The sample size in this problem is n = 40. Thus, we have

Mean X̄ = Σxi/n = 800/40 = 20
Mode = 20
Median = 20
2. To prepare the box plot, we first find the quartiles Q1, Q2, and Q3:

Rank of Q1 = (25/100)(40 + 1) = 10.25
Rank of Q2 = (50/100)(40 + 1) = 20.5
Rank of Q3 = (75/100)(40 + 1) = 30.75
Since the data presented in this problem are already in ascending order, we can easily see that the quartiles Q1, Q2, and Q3 are

Q1 = 17, Q2 = 20, Q3 = 23

Interquartile range:

IQR = Q3 − Q1 = 23 − 17 = 6
1.5(IQR) = 1.5(6) = 9

The box plot for the data is shown in Figure 4.10.
3. Both parts (1) and (2) lead us to the same conclusion: the data are symmetric.
[Figure 4.10: box plot for the data; the whiskers end at the smallest and largest observations, 12 and 28, and no point falls outside the inner fences.]
4. From the box plot in Figure 4.10, we see that the data do not
contain any outliers.
5. In part (3) we concluded that the data are symmetric. We proceed to calculate the standard deviation and then verify whether the empirical rule holds.

S² = (1/(40 − 1)) [(12² + ⋯ + 28²) − (12 + ⋯ + 28)²/40] = 708/39 = 18.1538
5
Probability
Solution: In this case there are six possible outcomes, which we list as {1,
2, 3, 4, 5, 6}.
Example 5.4 If in Example 5.3 two parts are selected, list all the possible outcomes.
the workers nominated, then the sample space will consist of eight possible
outcomes:
S {(MMM), (MMF), (MFM), (FMM), (MFF), (FMF), (FFM), (FFF)}
The sample spaces in all the examples considered so far are finite, which means each sample space contains a finite number of elements or sample points.
Definition 5.2 A set C is said to be countably infinite when there is a one-to-one correspondence between the set C and the set of all non-negative integers.
Many examples in our day-to-day life involve sample spaces that consist of a countably infinite number of elements.
Example 5.6 A lightbulb manufacturing company agrees to destroy all
bulbs produced until it produces a set quantity, say n, of bulbs at a defined
level of quality. Such an agreement is made in the context of negotiating a
sales contract wherein the purchaser wishes to ensure product of insufficient
quality does not enter the supply stream.
Solution: In this case, we observe the number of bulbs the company has to destroy. The sample space for this situation consists of a countably infinite number of sample points, S = {0, 1, 2, ...}, until a set number of bulbs produced meets or exceeds the desired level of quality.

The sample space S in this example contains the element or sample point zero, since it is possible that the first n bulbs produced by the company meet the desired quality level or standard, in which case the company does not need to destroy any bulbs.
Example 5.7 In Example 5.1 suppose that the Six Sigma Green Belt engineer decides to test the chips until she finds a defective chip. Determine the
sample space for this experiment.
Solution: The sample space S in this example will be S = {1, 2, 3, ...}. The defective chip may be found in the first trial or in the second trial or in the third trial, and so on.

In Examples 5.6 and 5.7, the sample spaces consist of a countably infinite number of elements.
Definition 5.3 A sample space S is considered discrete if it consists of either a finite or a countably infinite number of elements.
Definition 5.4 Any collection of sample points of a sample space
S, i.e., any subset of S, is called an event.
Example 5.8 Consider a sample space S = {DD, DN, ND, NN}. List all possible events in S.

Solution:
The possible subsets of S and consequently the possible events in this sample space S are
{ }, {DD}, {DN}, {ND}, {NN}, {DD, DN}, {DD, ND}, {DD, NN}, {DN, ND}, {DN, NN}, {ND, NN}, {DD, DN, ND}, {DD, DN, NN}, {DD, ND, NN}, {DN, ND, NN}, {DD, DN, ND, NN}
There are 16 total possible events in this sample space. In general, if a sample space consists of n sample points, then there are 2ⁿ possible events in the sample space. Each sample space contains two special events: { }, which does not contain any elements of S, an empty set known as the null event; and the event represented by the whole sample space S itself, which is known as the sure event.
In Example 5.8 we encourage you to note that the simple events {DD}, {DN}, {ND}, and {NN} are also listed as events. By definition, all simple events are also events; however, not all events are simple events. Events in a sample space S are usually denoted by the capital letters A, B, C, D, and so on. Event A is said to have occurred if the outcome of a random experiment is an element of A.
Example 5.9 Suppose in Example 5.2, a part is randomly selected and the
manufacturer of the part is found. Determine whether a given event has
occurred.
Solution: In this case, the sample space S is {1, 2, 3, 4, 5, 6}. Let the given
event in S be A = {1, 4, 5, 6}. Now we can say event A has occurred if the
manufacturer of the part is 1, 4, 5, or 6. Otherwise we say event A has not
occurred.
Example 5.10 Consider a random experiment that consists of testing a chip (D = defective, N = nondefective), then randomly selecting a part made by one of six manufacturers (numbered 1 through 6), and then testing another chip. Use a tree diagram to aid in describing and listing the sample points in the sample space of the experiment.
Solution: We use a tree diagram technique to describe and list the sample
points in the sample space of the experiment in this example. The rst trial
in this experiment can result in only two outcomes (D, N); the second in six
outcomes (1, 2, 3, 4, 5, or 6) and the third, again, can result in two possible
outcomes (D, N). The tree diagram associated with the experiment is shown
in Figure 5.1.
[Figure 5.1 shows the tree: the first trial branches into D and N, each of these branches into the manufacturers 1 through 6, and each of those branches again into D and N, giving the 2 × 6 × 2 = 24 sample points
S = {D1D, D1N, D2D, D2N, D3D, D3N, D4D, D4N, D5D, D5N, D6D, D6N, N1D, N1N, N2D, N2N, N3D, N3N, N4D, N4N, N5D, N5N, N6D, N6N}]
Figure 5.1 Tree diagram for an experiment of testing a chip, randomly selecting a
part, and testing another chip.
Suppose that to arrange these objects we allocate three slots so that each slot can accommodate only one object. The first slot can be filled with any of the three objects, or we can say that there are three ways to fill the first slot. After filling the first slot we are left with only two objects, which can be either A, B; A, C; or B, C. Now the second slot can be filled with either of the two remaining objects, meaning that there are two ways to fill the second slot. Once the first two slots are filled, we are left with only one object, so the third slot can be filled in only one way. Then, all three slots can be filled in 3 × 2 × 1 = 6 ways. This concept can be extended to any number of objects; that is, n distinct objects can be arranged in n × (n − 1) × (n − 2) × ... × 3 × 2 × 1 ways. The number n × (n − 1) × (n − 2) × ... × 3 × 2 × 1 is particularly important in mathematics and applied statistics and is denoted by the special symbol n! (read as n-factorial).
Definition 5.6 The n-factorial is the product of all the integers from n down to 1. That is,
n! = n × (n − 1) × (n − 2) × ... × 3 × 2 × 1  (5.1)
Example 5.11 Compute the following:
1. 8!
2. 10!
3. (13 − 5)!
Solution:
1. 8! = 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 40,320
2. 10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3,628,800
3. (13 − 5)! = 8! = 40,320
The number of permutations of n distinct objects is denoted by Pⁿₙ. If we are interested in arranging only r of the n objects (r ≤ n), the number of permutations is denoted by Pⁿᵣ. From our discussion above, we can see that
Pⁿₙ = n × (n − 1) × (n − 2) × ... × 3 × 2 × 1 = n!  (5.2)
Pⁿᵣ = n × (n − 1) × (n − 2) × ... × (n − r + 1) = n!/(n − r)!  (5.3)
Example 5.12 An access code for a security system consists of four positive digits (1 through 9). How many access codes are possible if each digit
can be used only once?
Solution: Since each digit can be used only once and the different orders of
four digits give different access codes, the total number of access codes is
equal to the number of arrangements of choosing four digits from nine digits and arranging four digits in all possible ways. That is,
Number of access codes = P⁹₄ = 9!/(9 − 4)! = 9!/5! = (9 × 8 × 7 × 6 × 5!)/5! = 3024
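Equations (5.2) and (5.3) can be checked numerically; a small Python sketch using only the standard library (the helper name is mine):

```python
import math

def permutations_count(n, r):
    """P(n, r) = n! / (n - r)!: arrangements of r objects chosen from n."""
    return math.factorial(n) // math.factorial(n - r)

print(permutations_count(3, 3))  # 3! = 6 arrangements of three objects
print(permutations_count(9, 4))  # 3024 access codes, as in Example 5.12
print(math.perm(9, 4))           # same result from the stdlib (Python 3.8+)
```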
Combinations
If we want to select r objects from a set of n objects, without giving any importance to the order in which these objects are selected, the total number of possible ways that r objects can be selected is called the number of combinations. It is usually denoted by Cⁿᵣ and sometimes by (ⁿᵣ). Clearly, each of the Cⁿᵣ combinations of r objects can be arranged in r! ways. Thus, the total number of permutations of n objects taken r at a time is r! × Cⁿᵣ. Thus, from Equation (5.3), we have
r! × Cⁿᵣ = n!/(n − r)!
that is,
Cⁿᵣ = n!/(r!(n − r)!)  (5.4)
For example, the number of ways of selecting three managers from 10 is
C¹⁰₃ = 10!/(3! × 7!) = (10 × 9 × 8 × 7!)/(3! × 7!) = (10 × 9 × 8)/(3 × 2 × 1) = 120
Said another way, the management team can select three managers out of the
10 possible managers in 120 possible ways.
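Equation (5.4) can be verified the same way; another short sketch (again, the helper name is mine):

```python
import math

def combinations_count(n, r):
    """C(n, r) = n! / (r! (n - r)!): selections of r from n, order ignored."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(combinations_count(10, 3))  # 120 ways to choose three managers from 10
print(math.comb(10, 3))           # stdlib equivalent (Python 3.8+)

# Each combination of 3 objects can be ordered in 3! ways, which recovers
# the permutation count: r! * C(n, r) = P(n, r)
print(math.factorial(3) * math.comb(10, 3) == math.perm(10, 3))  # True
```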
So far we have covered the problem of describing sample points in a sample space and in events belonging to that sample space. Now we need to study how to combine two or more events and describe the associated sample points. When we look more closely at probability theory in the next section, we will see that quite often we are interested in calculating the probability of events that are, in fact, combinations of two or more events. The combinations of events are accomplished by special operations known as unions, intersections, and complements. To define these operations we rely on Venn diagrams.
We will now describe how to represent a sample space and events in that
sample space with the help of a Venn diagram. In Venn diagrams, the sample
space is represented by a rectangle, whereas events are represented by regions
or parts of regions within the rectangle. Note that a region representing an
event encloses all the sample points in that event. For example, suppose S = {1, 2, 3, 4, 5, 6} and A = {2, 4, 5}. Then a Venn diagram, as shown in Figure 5.2, is drawn to represent the sample space S and the event A. Note that the region representing the event A encloses the sample points 2, 4, and 5.
Definition 5.7 An event in sample space S is called a null event if it does not contain any sample point, in which case it is usually denoted by ∅ (read as phi).
Figure 5.2 Venn diagram representing the sample space S = {1, 2, 3, 4, 5, 6} and the event A = {2, 4, 5}.
Example 5.14 Consider a group of 10 shop floor workers in a production/manufacturing operation. Suppose the group consists of nine men and
one woman. Let S be a sample space that consists of a set of all possible
groups of three workers. Determine an event A in S containing all groups of
three workers that have two women and one man.
Solution: Clearly no group of three workers can have two women, since there is only one woman in the larger group. Thus, the event A is the null event, that is, A = ∅.
Definition 5.8 Let S be a sample space and let A be an event in S.
Then the event A is called a sure event if it consists of all the sample
points in the sample space S.
Example 5.15 Let S be a sample space associated with a random experiment E. Then determine a sure event in the sample space S.
Solution: By definition a sure event must contain all the sample points that are in S. Thus, the only sure event is the sample space S itself.
As we saw above, a sample space and events can be represented by using
set notation. To more fully develop the basic concepts of probability theory
in an orderly fashion, it is important to first study some basic operations of set theory.
Basic Operations of Set Theory
Let S be a sample space and let A and B be any two events in S. Then the
basic operations of set theory are union, intersection, and complements:
Definition 5.9 The union of events A and B, denoted by A ∪ B (read as A union B), is defined as an event containing all the sample points that are in either A or B or both A and B, as illustrated in Figure 5.3.
Definition 5.10 The intersection of events A and B, denoted by A ∩ B (read as A intersection B), is defined as an event containing all the sample points that are in both A and B, as illustrated in Figure 5.4.
Definition 5.11 The complement of an event A in the sample space S, denoted by Ā (read as complement of A), is defined as an event containing all the sample points that are in S but not in A, as illustrated in Figure 5.5.
Figure 5.3 Venn diagram representing the union A ∪ B.
Figure 5.4 Venn diagram representing the intersection A ∩ B.
Figure 5.5 Venn diagram representing the complement Ā of the event A.
Example 5.16 Let a sample space S and two events A and B in S be defined as follows: S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, A = {1, 4, 6, 7, 8}, B = {5, 7, 9, 10}. Then determine A ∪ B, A ∩ B, Ā, and B̄.
Solution:
A ∪ B = {1, 4, 5, 6, 7, 8, 9, 10}
A ∩ B = {7}
Ā = {2, 3, 5, 9, 10}
B̄ = {1, 2, 3, 4, 6, 8}
Figure 5.6 Venn diagrams representing A ∪ B = {1, 4, 5, 6, 7, 8, 9, 10}, A ∩ B, Ā, and B̄.
Figure 5.7 Venn diagram representing two mutually exclusive events A and B.
Definition 5.12 Let S be a sample space and let A and B be any two events in S. Then the events A and B are called mutually exclusive if the event A ∩ B is a null event, that is, A ∩ B = ∅ (see Figure 5.7).
Example 5.17 Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, A = {1, 3, 5, 7, 9}, and B = {2, 4, 6, 8, 10}. Determine whether events A and B are mutually exclusive.
Solution: Clearly events A and B do not have any sample point in common, that is, A ∩ B = ∅. Therefore, events A and B are mutually exclusive. This means that events A and B cannot occur together.
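Python's built-in set type implements these operations directly, so the results of Examples 5.16 and 5.17 can be reproduced in a few lines:

```python
# Sample space and events from Example 5.16
S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
A = {1, 4, 6, 7, 8}
B = {5, 7, 9, 10}

print(sorted(A | B))  # union:           [1, 4, 5, 6, 7, 8, 9, 10]
print(sorted(A & B))  # intersection:    [7]
print(sorted(S - A))  # complement of A: [2, 3, 5, 9, 10]

# Mutually exclusive events share no sample point: their intersection is empty
odd, even = {1, 3, 5, 7, 9}, {2, 4, 6, 8, 10}
print(odd & even == set())  # True
```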
Thus, there are eight (2 × 2 × 2) equally likely sample points in the sample space S associated with this experiment, that is,
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
Example 5.20 Consider an experiment E of rolling two balanced dice and
observing the numbers that appear on the uppermost face. Determine the
sample space associated with the experiment E.
Solution: The sample space S in this experiment consists of 36 (6 × 6) equally likely sample points, that is,
S = {(1, 1), (1, 2), ..., (1, 6), (2, 1), (2, 2), ..., (2, 6), ..., (6, 1), (6, 2), ..., (6, 6)}
Example 5.21 Repeat the experiment in Example 5.20, but instead of
observing only the numbers that appear on the uppermost faces, observe the
sum of the numbers that appear on both dice. Then determine the sample
space S associated with the new experiment.
Solution: The sample space is S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. The sample points in this sample space correspond to the sample points in the sample space of Example 5.20 as follows:
2 = {(1, 1)}
3 = {(1, 2), (2, 1)}
4 = {(1, 3), (2, 2), (3, 1)}
5 = {(1, 4), (2, 3), (3, 2), (4, 1)}
6 = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}
7 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
8 = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
9 = {(3, 6), (4, 5), (5, 4), (6, 3)}
10 = {(4, 6), (5, 5), (6, 4)}
11 = {(5, 6), (6, 5)}
12 = {(6, 6)}
Note that the equations here express equality of events.
From the description of the sample points, we can see that the sample
points in the sample space are not equally likely. In fact, 3 is twice as likely to
appear as 2, 4 is three times as likely as 2 or 1.5 times as likely as 3, and so on.
Definition 5.14 Consider a random experiment E with sample
space S. The sample points that describe event A are said to be
favorable to the event A.
P(A) = nₐ/n  (5.5)
where n is the total number of equally likely sample points in S and nₐ is the number of sample points favorable to the event A. For instance, if nₐ = 20 of n = 60 equally likely sample points are favorable to A, then
P(A) = nₐ/n = 20/60 = 1/3
Example 5.24 Roll two balanced dice and observe the sum of the two numbers that appear on the uppermost faces. Let A be the event that the sum of
the numbers of the two dice is 7. Find the probability of the event A.
Solution: From Examples 5.20 and 5.21 we can see that n = 36 and nₐ = 6. Thus,
P(A) = nₐ/n = 6/36 = 1/6
Example 5.25 In Example 5.20, find the probability that the two dice show
the same number.
Solution: Let A be the event that the two dice show the same number. Then A = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}, that is, nₐ = 6. The probability that the two dice show the same number is
P(A) = nₐ/n = 6/36 = 1/6
Example 5.26 Consider a group of six workers, all of whom are born during the same nonleap year. Find the probability that no two workers have
the same birthday.
Solution: We represent the workers as W1, W2, W3, W4, W5, and W6 and
their birthdays as D1, D2, D3, D4, D5, and D6, respectively. The sample point
that represents the birthdays of these workers may be represented by (D1, D2,
D3, D4, D5, D6). Since each birthday could be any day of the year, using the
multiplication rule, the total number of sample points in the sample space is
given by
n = 365 × 365 × 365 × 365 × 365 × 365 = 365⁶
Now suppose that E is the event where no two of the six workers are born on the same day. Event E will occur if the first worker's birthday falls on any of the 365 days, the second worker's birthday falls on any of the remaining 364 days, the third worker's birthday falls on any of the remaining 363 days, and so on. The total number of sample points favorable to the event E, using the multiplication rule, is
nₐ = 365 × 364 × 363 × 362 × 361 × 360. Thus, in this example, we have
P(E) = nₐ/n = (365 × 364 × 363 × 362 × 361 × 360)/365⁶ = 0.959538
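The computation in this birthday example is easy to reproduce; a quick Python check:

```python
# P(no two of six workers share a birthday) = favorable / total
favorable = 1
for days in range(365, 365 - 6, -1):  # 365 * 364 * 363 * 362 * 361 * 360
    favorable *= days
total = 365 ** 6
p = favorable / total
print(p)  # approximately 0.9595
```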
Axiom 1. 0 ≤ P(A) ≤ 1
Axiom 2. P(S) = 1
Axiom 3. For mutually exclusive events A₁, A₂, ...,
P(A₁ ∪ A₂ ∪ ...) = P(A₁) + P(A₂) + ...
Please note that the first axiom states that the probability of an event A always assumes a value between 0 and 1 (inclusive). Axiom 2 states that the
event S is sure to happen; in other words, it is certain that the outcome of the
experiment E will be a sample point in the sample space S. Axiom 3 states
that the probability of occurrence of one or more of the mutually exclusive
events A1, A2, ..., An is just the sum of their respective probabilities.
From these axioms follow several important results that simplify the computation of probabilities of complex events. Here we state just a few of them. The proofs of these results are beyond the scope of this book.
Theorem 5.1 Let S be a sample space, and let A be any event in S. Then the sum of the probabilities of the event A and its complement Ā is one, that is,
P(A) + P(Ā) = 1  (5.6)
From this it follows that
P(Ā) = 1 − P(A)  (5.7)
and, taking A = S, that P(∅) = 1 − P(S) = 0.  (5.8)
Theorem 5.3 Let S be a sample space. Let A and B be any two events
(may or may not be mutually exclusive) in S. Then, we have
Figure 5.8 Venn diagram illustrating P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)  (5.9)
In particular, if A and B are mutually exclusive, so that P(A ∩ B) = 0, then
P(A ∪ B) = P(A) + P(B)  (5.10)
The result in Theorem 5.3 can easily be extended for more than two events.
For example, for any three events A, B and C we have
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
Example 5.27 Suppose a manufacturing plant has 100 workers, some of
whom are working on two projects, project 1 and project 2. Suppose 60
workers are working on project 1, 30 are working on project 2, and 20 workers are working on both the projects. Suppose a worker is selected randomly. What is the probability that he or she is working on at least one of the
projects?
Solution: Let A be the event that the selected worker is working on project 1, and let B be the event that the selected worker is working on project 2. We are interested in finding the probability that the worker is working on at least one project, that is, either on project 1 or on project 2 or on both projects. This is equivalent to finding the probability P(A ∪ B). From the information provided, we have
P(A) = 60/100, P(B) = 30/100, and P(A ∩ B) = 20/100
Therefore, from Formula (5.9), we have
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 60/100 + 30/100 − 20/100 = 70/100 = 7/10
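Exact arithmetic with Python's `fractions` module reproduces this application of the addition rule:

```python
from fractions import Fraction

# P(A), P(B), and P(A and B) from the two-project example
p_a = Fraction(60, 100)
p_b = Fraction(30, 100)
p_both = Fraction(20, 100)

# Addition rule, Equation (5.9)
p_union = p_a + p_b - p_both
print(p_union)  # 7/10
```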
given that B, and it should not be confused with A/B, which means A divided by B). The conditional probability may be defined as follows:
Definition 5.16 Let S be a sample space and let A and B be any two events in the sample space S. The conditional probability of the event A, given that the event B has already occurred, is
P(A | B) = P(A ∩ B)/P(B),  if P(B) ≠ 0  (5.11)
Similarly,
P(B | A) = P(A ∩ B)/P(A),  if P(A) ≠ 0  (5.12)
Example 5.28 The manufacturing department of a company hires technicians who are college graduates as well as technicians who are not college
graduates. Under the diversity program, the manager of any given department is very careful to hire both male and female technicians. The data in
Table 5.1 shows a classification of all technicians in a selected department
by qualification and gender.
In this case, the manager promotes one of the technicians to a supervisory position. If it is known that the promoted technician is a woman, then
what is the probability that she is a nongraduate? Find the probability that
the promoted technician is a nongraduate when it is not known that the promoted technician is a woman.
Solution: Let S be the sample space associated with this problem and let
A and B be two events defined as follows:
A: the promoted technician is a nongraduate
B: the promoted technician is a woman
We are interested in finding the conditional probability P(A | B).
Since any of the 100 technicians could be promoted, the sample space S
consists of 100 equally likely sample points. The sample points that
are favorable to the event A are 65 and those that are favorable to the event
B are 44. Also, the sample points favorable to both the events A and B are all
the women who are nongraduates and equal to 29. To describe this situation
we have
P(A) = 65/100, P(B) = 44/100, and P(A ∩ B) = 29/100
Therefore,
Table 5.1 Classification of technicians by qualification and gender.
          Graduates   Nongraduates   Total
Male          20           36          56
Female        15           29          44
Total         35           65         100
P(A | B) = P(A ∩ B)/P(B) = (29/100)/(44/100) = 29/44
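The same arithmetic in Python, working directly from the counts in Table 5.1:

```python
from fractions import Fraction

# Counts from Table 5.1: 100 technicians in all, 44 women,
# 65 nongraduates, 29 nongraduate women.
p_b = Fraction(44, 100)        # P(B): promoted technician is a woman
p_a_and_b = Fraction(29, 100)  # P(A and B): woman and nongraduate
p_a = Fraction(65, 100)        # P(A): nongraduate, gender unknown

p_a_given_b = p_a_and_b / p_b  # Equation (5.11)
print(p_a_given_b)             # 29/44
print(p_a)                     # 13/20, the unconditional probability
```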
Rearranging Equations (5.11) and (5.12), we obtain
P(A ∩ B) = P(B) P(A | B)  (5.13)
and
P(A ∩ B) = P(A) P(B | A)  (5.14)
Now, using the results in Equations (5.13) and (5.14), if P(A | B) = P(A) or P(B | A) = P(B), that is, if events A and B are independent, we can easily see that
P(A ∩ B) = P(A) P(B)
A consequence of this result is the following definition.
Definition 5.17 Let S be a sample space, and let A and B be any
two events in S. The events A and B are independent, if and only if
any one of the following is true:
1. P(A | B) = P(A)  (5.15)
2. P(B | A) = P(B)  (5.16)
3. P(A ∩ B) = P(A) P(B)  (5.17)
The conditions in Equations (5.15), (5.16) and (5.17) are equivalent in the
sense that if one is true then the other two are true.
Note that the results in Equations (5.13) and (5.14) are known as the multiplication rule.
Earlier in this chapter we learned about mutually exclusive events. Although it may seem that mutually exclusive events are the same as independent events, we encourage you to be aware that the two concepts are entirely different. Independence is a property that relates to the probability of events, whereas mutual exclusivity relates to the composition of events, that is, to the sample points present in the events. For example, if the events A and B are mutually exclusive and P(A) ≠ 0, P(B) ≠ 0, then P(A ∩ B) = P(∅) = 0 ≠ P(A)P(B), so the events are not independent.
Another method of calculating conditional probability without using the formulas in Equations (5.11) and (5.12) is by determining a new sample space, called the induced sample space, taking into consideration the information about the event that has already occurred. For instance, in Example 5.28, if we use the information that a woman has already been promoted, then it is determined that the technician who was promoted cannot be a man and, therefore, the new sample space, or the induced sample space, consists
A second classification of 100 technicians by qualification and gender:
          Graduates   Nongraduates   Total
Male          27           33          60
Female        18           22          40
Total         45           55         100
only of 44 sample points (the total number of women). Out of the 44 women technicians, 29 are nongraduates. The conditional probability P(A | B), the probability that a nongraduate technician has been promoted given that a female technician has been promoted, is then found directly as P(A | B) = 29/44.
For the workforce classified in the table above, the same formula gives
P(A | B) = P(A ∩ B)/P(B) = (22/100)/(40/100) = (11/50) × (5/2) = 11/20
6
Discrete Random Variables and Their Probability Distributions
In Chapter 5 we studied sample spaces, events, and basic concepts, including certain axioms of probability theory. We saw that the sample space associated with a random experiment E describes all possible outcomes of the experiment. In many applications such a description of outcomes is not sufficient to extract full information about the possible outcomes of the experiment. In such cases it is useful to assign a certain numerical value to each possible outcome. A variable known as a random variable does the assigning of numerical values to all the possible outcomes.
In this chapter we define random variables and study their probability distributions, means, and standard deviations. Then, we study some special probability distributions that are commonly encountered in various statistical applications.
Definition 6.2 A random variable that assumes a finite (or countably infinite) number of values is called a discrete random variable.
Definition 6.3 A random variable that assumes an uncountably infinite number of values is called a continuous random variable.
Examples of discrete random variables include the number of cars sold by a dealer, the number of new employees hired by a company, the number of parts produced by a machine, the number of defects in an engine of a
car, the number of patients admitted to a hospital, the number of telephone
calls answered by a receptionist, the number of applications sent out by a job
seeker, the number of games played by a batter before he makes a home run,
and so on.
To elaborate on the concept that a random variable assigns numerical values
to all the possible outcomes, we consider a simple example of rolling two
dice. Remember that experiments of rolling dice or tossing coins were used
in this text to introduce the concept of probability theory.
Example 6.1 Roll two fair dice and observe the numbers that show up on
the upper faces. Find the sample space of this experiment and then define a
random variable that assigns a numerical value to each sample point equal
to the sum of the points that show up on upper faces. Find all the values that
such a random variable assumes.
Solution: Obviously, when a fair die is rolled any one of the six possible
numbers (1, 2, 3, 4, 5, 6) can come up. Thus, the sample space when two dice
are rolled is as follows:
S {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), ..., (2,6), ..., (6,6)}
Let X be a random variable that assigns a numerical value to each sample point equal to the sum of the two numbers. We have
X(1, 1) = 2
X(1, 2) = X(2, 1) = 3
X(1, 3) = X(2, 2) = X(3, 1) = 4
X(1, 4) = X(2, 3) = X(3, 2) = X(4, 1) = 5
X(1, 5) = X(2, 4) = X(3, 3) = X(4, 2) = X(5, 1) = 6
X(1, 6) = X(2, 5) = X(3, 4) = X(4, 3) = X(5, 2) = X(6, 1) = 7
X(2, 6) = X(3, 5) = X(4, 4) = X(5, 3) = X(6, 2) = 8
X(3, 6) = X(4, 5) = X(5, 4) = X(6, 3) = 9
X(4, 6) = X(5, 5) = X(6, 4) = 10
X(5, 6) = X(6, 5) = 11
X(6, 6) = 12
The random variable X assumes the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12. X assumes only a finite number of values; therefore, it is a discrete random variable.
1. pᵢ ≥ 0  (6.1)
2. Σᵢ pᵢ = 1  (6.2)
Table 6.1 Probability function of a discrete random variable X.
X = x       x₁    x₂    x₃    x₄    ...    xₙ
P(X = x)    p₁    p₂    p₃    p₄    ...    pₙ
Table 6.2 Probability function of the sum of the numbers on two dice.
X = x       2     3     4     5     6     7     8     9     10    11    12
P(X = x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
We can easily verify that the probabilities in Table 6.2 satisfy the properties given in Equations (6.1) and (6.2). Thus, the probability function P(X = x) in Table 6.2 is a valid probability function.
Figure 6.1 Graphical representation of the probability function given in Table 6.2.
P(A) = Σ P(X = xᵢ)  (6.3)
where the summation is taken over all xᵢ in A. Thus, for example, let A in Example 6.1 be such that
A = {2, 4, 5, 8}.
Then, from Table 6.2, we have
P(A) = P(X = 2) + P(X = 4) + P(X = 5) + P(X = 8) = 1/36 + 3/36 + 4/36 + 5/36 = 13/36
In particular, if we define the subset A as
A = {x | x ≤ 6}
then, again using Table 6.2, we have
P(A) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) = 1/36 + 2/36 + 3/36 + 4/36 + 5/36 = 15/36 = 5/12.
In general, if we define A as
A = {xᵢ | xᵢ ≤ x}
then
P(A) = Σ_{xᵢ ≤ x} P(X = xᵢ) = Σ_{xᵢ ≤ x} f(xᵢ)  (6.4)
Equation (6.4) is clearly the sum of probabilities for all xᵢ that are less than or equal to x. The probability P(A) defined in Equation (6.4) is commonly known as a cumulative probability.
Figure 6.2 A probability function displayed in tabular form: values X = x with probabilities f(x) = P(X = x) summing to 1.
Figure 6.3 Graphical representation of the cumulative distribution function of Example 6.1.
F(x) = P(X ≤ x)  (6.6)
= Σ_{xᵢ ≤ x} f(xᵢ)  (6.7)
For the random variable X of Example 6.1 (the sum of two dice), the cumulative distribution function is
F(x) = 0 for x < 2
= 1/36 for 2 ≤ x < 3
= 3/36 for 3 ≤ x < 4
= 6/36 for 4 ≤ x < 5
= 10/36 for 5 ≤ x < 6
= 15/36 for 6 ≤ x < 7
= 21/36 for 7 ≤ x < 8
= 26/36 for 8 ≤ x < 9
= 30/36 for 9 ≤ x < 10
= 33/36 for 10 ≤ x < 11
= 35/36 for 11 ≤ x < 12
= 1 for 12 ≤ x
μ = E(X) = Σ x f(x)  (6.8)
that is, each value of the random variable X is multiplied by the corresponding probability, and these products are summed over all the possible values of the random variable X.
Definition 6.7 The variance of a discrete random variable X is defined as
σ² = V(X) = Σ (x − μ)² f(x)  (6.9)
The standard deviation of X is the positive square root of the variance, that is,
σ = √(Σ (x − μ)² f(x))  (6.10)
= √(Σ x² f(x) − μ²)  (6.11)
Example 6.4 Let a random variable X denote the number of defective parts produced per eight-hour shift by a machine. Experience shows that it produces between 0 and 4 defectives (inclusive) with the following probabilities:
X = x            0     1     2     3     4
f(x) = P(X = x)  0.1   0.4   0.25  0.20  0.05
Find the mean and the standard deviation of X.
Solution: Using Equation (6.8), the mean of X is
μ = E(X) = Σ x f(x) = 0(0.1) + 1(0.4) + 2(0.25) + 3(0.20) + 4(0.05) = 0 + 0.40 + 0.50 + 0.60 + 0.20 = 1.70
Now, using Equation (6.11), the standard deviation of the random variable X is
σ = √(Σ x² f(x) − μ²) = √(0²(0.1) + 1²(0.4) + 2²(0.25) + 3²(0.20) + 4²(0.05) − (1.70)²) = √(4.00 − 2.89) = √1.11 ≈ 1.05
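Equations (6.8) and (6.11) applied to this probability function, in Python:

```python
import math

# Probability function from Example 6.4
f = {0: 0.1, 1: 0.4, 2: 0.25, 3: 0.20, 4: 0.05}

mu = sum(x * p for x, p in f.items())                            # E(X), Eq. (6.8)
sigma = math.sqrt(sum(x * x * p for x, p in f.items()) - mu**2)  # Eq. (6.11)
print(round(mu, 2))     # 1.7
print(round(sigma, 2))  # 1.05
```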
Figure 6.4 Graphical representation of the probability function of Example 6.4; the mean μ = 1.70 is marked on the measurement axis.
P(X = 1) = p and P(X = 0) = 1 − p = q  (6.12)
so that the probability function of X can be written as
P(X = x) = pˣq¹⁻ˣ,  x = 0, 1  (6.13)
Note that P(X = x) ≥ 0 for x = 0, 1, and
P(X = 0) + P(X = 1) = p + q = 1.
Definition 6.8 A random variable X is said to have the Bernoulli distribution if its probability function is defined as
P(X = x) = pˣq¹⁻ˣ,  x = 0, 1; p + q = 1
The mean and standard deviation of a Bernoulli random variable are
μ = p and σ = √(pq)  (6.14)
P(X = x) = (ⁿₓ) pˣ qⁿ⁻ˣ,  x = 0, 1, 2, ..., n; p + q = 1  (6.15)
where n > 0 is the number of trials, x the number of successes, and p the probability of success. Practitioners sometimes call this probability distribution the binomial model. The n and p in Equation (6.15) are the parameters of the distribution.
Recall from Equation (5.4) that (ⁿₓ), pronounced "n choose x," is the number of combinations when x items are selected from a total of n items, and is equal to
(ⁿₓ) = n!/(x!(n − x)!)
For example,
(⁶₄) = 6!/(4!(6 − 4)!) = (6 × 5 × 4 × 3 × 2 × 1)/((4 × 3 × 2 × 1)(2 × 1)) = 15
Example 6.5 The probability is 0.80 that a randomly selected technician
will finish his or her project successfully. Let X be the number of technicians
among a randomly selected group of 10 technicians who will finish their
projects successfully. Find the probability distribution of the random variable X. Also, represent this probability distribution graphically.
Solution: It is clear that the random variable X in this example is distributed as binomial with n = 10 and p = 0.80. Thus, using the binomial probability function we have
f(0) = (¹⁰₀)(0.80)⁰(0.20)¹⁰ = .0000
f(1) = (¹⁰₁)(0.80)¹(0.20)⁹ = .0000
f(2) = (¹⁰₂)(0.80)²(0.20)⁸ = .0001
f(3) = (¹⁰₃)(0.80)³(0.20)⁷ = .0008
f(4) = (¹⁰₄)(0.80)⁴(0.20)⁶ = .0055
f(5) = (¹⁰₅)(0.80)⁵(0.20)⁵ = .0264
f(6) = (¹⁰₆)(0.80)⁶(0.20)⁴ = .0881
f(7) = (¹⁰₇)(0.80)⁷(0.20)³ = .2013
f(8) = (¹⁰₈)(0.80)⁸(0.20)² = .3020
f(9) = (¹⁰₉)(0.80)⁹(0.20)¹ = .2684
f(10) = (¹⁰₁₀)(0.80)¹⁰(0.20)⁰ = .1074
The graphical representation of the probability distribution is shown in
Figure 6.5.
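The eleven probabilities can be generated in one pass with the standard library; a compact Python sketch:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Equation (6.15): P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

dist = {x: binomial_pmf(x, 10, 0.80) for x in range(11)}
for x, prob in dist.items():
    print(x, round(prob, 4))  # matches f(0) through f(10) above
```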
Example 6.6 For the random variable X of Example 6.5, find the following probabilities:
1. P(X ≥ 3)
2. P(X ≤ 5)
3. P(4 ≤ X ≤ 6)
Figure 6.5 Graphical representation of the binomial probability distribution with n = 10 and p = 0.80.
Solution:
1. In this part we are interested in finding the cumulative probability P(X ≥ 3). Using the probabilities obtained in Example 6.5, we get
P(X ≥ 3) = P(X = 3) + P(X = 4) + ... + P(X = 10) = .0008 + .0055 + ... + .1074 = .9999
2. In this part we want to find the probability P(X ≤ 5). Again, using the probabilities in Example 6.5, we get
P(X ≤ 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = .0000 + .0000 + .0001 + .0008 + .0055 + .0264 = .0328.
3. Now we want to find the probability P(4 ≤ X ≤ 6). Thus we get
P(4 ≤ X ≤ 6) = P(X = 4) + P(X = 5) + P(X = 6) = .0055 + .0264 + .0881 = .1200.
6.3.3 Binomial Probability Tables
The tables of binomial probabilities for n = 1 to 15 and for some selected values of p are given in Table I of the Appendix. We illustrate the use of these
tables with the following example.
Example 6.8 The probability that the Food and Drug Administration
(FDA) will approve a new drug is 0.60. Suppose that five new drugs are submitted to FDA for its approval. Find the following probabilities:
1. Exactly three drugs are approved.
2. At most three drugs are approved.
3. At least three drugs are approved.
4. Between two and four (inclusive) drugs are approved.
Table 6.4 Portion of the binomial probability table for n = 5.
 x    p = .05   p = .60
 0     .774      .010
 1     .203      .077
 2     .022      .230
 3     .001      .346
 4     .000      .259
 5     .000      .078
Solution: To use the appropriate table, one must first determine the value of n. Then, the probability for given values of x and p is found at the intersection of the row corresponding to the given value of x and the column corresponding to the given value of p. In this example the probability for n = 5, p = 0.60, and x = 3 is shown in Table 6.4.
Thus, from Table 6.4, we have
1. P(x = 3) = 0.346
2. P(x ≤ 3) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3) = .010 + .077 + .230 + .346 = .663
3. P(x ≥ 3) = P(x = 3) + P(x = 4) + P(x = 5) = .346 + .259 + .078 = .683
4. In this part we want to find the probability P(2 ≤ x ≤ 4). Thus, from Table 6.4, we get
P(2 ≤ x ≤ 4) = P(x = 2) + P(x = 3) + P(x = 4) = .230 + .346 + .259 = .835
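The table lookups can be cross-checked against Equation (6.15) directly; a Python sketch (any differences in the last digit come from rounding in the printed table):

```python
from math import comb

def binomial_pmf(x, n, p):
    # Equation (6.15)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.60
print(round(binomial_pmf(3, n, p), 3))                          # 0.346
print(round(sum(binomial_pmf(x, n, p) for x in range(4)), 3))   # 0.663
print(round(sum(binomial_pmf(x, n, p) for x in (3, 4, 5)), 3))  # 0.683
print(round(sum(binomial_pmf(x, n, p) for x in (2, 3, 4)), 3))  # 0.835
```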
Mean and Standard Deviation of a Binomial Distribution
Definition 6.10 The mean and standard deviation of a binomial random variable X are
Mean: μ = E(X) = np  (6.16)
Standard deviation: σ = √(npq)  (6.17)
For the binomial random variable of Example 6.5, with n = 10 and p = 0.8, these give
μ = np = 10(0.8) = 8
σ = √(npq) = √(10(0.8)(0.2)) = √1.60 = 1.26
Example 6.10 The probability that a shopper entering a department store
will make a purchase is 0.30. Let X be the number of shoppers out of a total
of 30 shoppers who enter that department store within a certain period who
make a purchase. Use the formulas (6.16) and (6.17) of the mean and the
standard deviation of the binomial distribution to find the mean and the standard deviation of the random variable X.
Solution: Using formulas (6.16) and (6.17), we have
μ = np = 30(.30) = 9
σ = √(npq) = √(30(.30)(.70)) = √6.30 = 2.51
However, this probability can easily be found by using another probability distribution, known as the hypergeometric distribution.
Definition 6.11 A random variable X is said to be distributed as
hypergeometric if
P(X = x) = Cʳₓ × Cᴺ⁻ʳₙ₋ₓ / Cᴺₙ,  x = a, a + 1, ..., min(r, n)  (6.18)
where
a = max(0, n − N + r)
N = total number of objects (successes and failures) in the population
r = number of objects in the category of interest (successes) in the population
N − r = number of failures in the population
n = number of trials
x = number of successes in n trials
n − x = number of failures in n trials
Example 6.11 A Six Sigma Green Belt randomly selects two parts from a box containing 5 defective and 15 nondefective parts. He discards the box if one or both parts drawn are defective. What is the probability that he will:
1. Find one defective part?
2. Find two defective parts?
3. Reject the box?
Solution:
1. In this problem we have N = 20, r = 5, n = 2, and x = 1. Thus, we have
P(X = 1) = C⁵₁ × C¹⁵₁ / C²⁰₂ = (5 × 15)/190 = 75/190 = .3947
2. P(X = 2) = C⁵₂ × C¹⁵₀ / C²⁰₂ = (10 × 1)/190 = 10/190 = .0526
3. The box is rejected if one or both parts drawn are defective, so
P(reject) = P(X = 1) + P(X = 2) = .3947 + .0526 = .4473
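Equation (6.18) translated into Python with exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, r, n):
    """Equation (6.18): x successes in n draws, without replacement."""
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))

N, r, n = 20, 5, 2      # Example 6.11: 5 defectives among 20 parts, draw 2
p1 = hypergeom_pmf(1, N, r, n)
p2 = hypergeom_pmf(2, N, r, n)
print(p1, float(p1))    # 15/38, approximately .3947
print(p2, float(p2))    # 1/19, approximately .0526
print(float(p1 + p2))   # P(reject the box), approximately .4474
```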
1. In this case we have N = 100, r = 10, n = 5, and x = 0. The probability that the lot will be accepted is
P(X = 0) = C¹⁰₀ × C⁹⁰₅ / C¹⁰⁰₅ = [10!/(0!10!)] × [90!/(5!85!)] / [100!/(5!95!)] = (90 × 89 × 88 × 87 × 86)/(100 × 99 × 98 × 97 × 96) = .5837
2. In this case, we have N = 100, r = 4, n = 5, and x = 0. Therefore, the probability that the lot will be accepted is
P(X = 0) = C⁴₀ × C⁹⁶₅ / C¹⁰⁰₅ = [4!/(0!4!)] × [96!/(5!91!)] / [100!/(5!95!)] = (96 × 95 × 94 × 93 × 92)/(100 × 99 × 98 × 97 × 96) = .8119
Mean: μ = np  (6.19)
Standard deviation: σ = √(npq (N − n)/(N − 1))  (6.20)
where
N = total number of objects in the population
r = total number of objects in the category of interest
n = the sample size
p = r/N
q = 1 − p = (N − r)/N
For example, with N = 250, r = 8, and n = 20, so that p = 8/250 and q = 242/250, the standard deviation is
σ = √(npq (N − n)/(N − 1)) = √(20 × (8/250) × (242/250) × (250 − 20)/(250 − 1)) = √(20 × (8/250) × (242/250) × (230/249)) = .7565
P(X = x) = f(x) = e^(−λ) λˣ / x!,  x = 0, 1, 2, ...  (6.21)
where λ > 0 is the average number of occurrences per unit of time or space.
shall revisit this concept in Chapter 8. The Poisson distribution is also known
to be a distribution that deals with rare events, that is, events that occur with
a very small probability.
It can easily be shown that Equation (6.21) satisfies both properties of being a probability function, that is,
P(X = x) = f(x) ≥ 0
and
Σₓ P(X = x) = Σₓ f(x) = 1
Example 6.14 It is known from experience that 4% of the parts manufactured at a plant of a manufacturing company are defective. Use the Poisson
approximation to the binomial distribution to find the probability that in a lot
of 200 parts manufactured at that plant, seven parts will be defective.
Solution: Since n = 200 and p = .04, we have λ = np = 200(.04) = 8 (< 10). The Poisson approximation should give a satisfactory result. From formula (6.21), we get
P(X = 7) = f(7) = e^(−8) (8)⁷ / 7! = (.000335463)(2097152)/5040 = .1396
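As a numerical check of Equation (6.21) with λ = np = 8:

```python
import math

def poisson_pmf(x, lam):
    """Equation (6.21): P(X = x) = e**(-lam) * lam**x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 200 * 0.04  # lambda = np = 8
print(round(poisson_pmf(7, lam), 4))  # 0.1396
```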
Example 6.15 The number of breakdowns of a machine is a random variable having the Poisson distribution with λ = 2.2 breakdowns per month. Find the probability that the machine will work during any given month with:
(a) No breakdown
(b) One breakdown
(c) Two breakdowns
(d) At least two breakdowns
Solution: In this example we are given the number of breakdowns of the machine per unit time. The unit time in this problem is one month. The probabilities that we want to find are also of the number of breakdowns in one month. The parameter λ remains the same, that is, λ = 2.2. The probabilities in parts (a) through (d) can easily be found by using the probability function given in formula (6.21).
(a) P(X = 0) = e^(−2.2) (2.2)⁰ / 0! = e^(−2.2) = 0.1108, since 0! = 1
(b) P(X = 1) = e^(−2.2) (2.2)¹ / 1! = (0.1108)(2.2)/1 = 0.2438
(c) P(X = 2) = e^(−2.2) (2.2)² / 2! = 0.2681
(d) Since Σₓ P(X = x) = 1, the probability of at least two breakdowns is
P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − 0.1108 − 0.2438 = 0.6454
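All four probabilities in Example 6.15, computed from Equation (6.21):

```python
import math

def poisson_pmf(x, lam):
    # Equation (6.21)
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 2.2
p0, p1, p2 = (poisson_pmf(x, lam) for x in (0, 1, 2))
print(round(p0, 4))           # 0.1108
print(round(p1, 4))           # 0.2438
print(round(p2, 4))           # 0.2681
print(round(1 - p0 - p1, 4))  # P(X >= 2) = 0.6454
```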
Portion of the Poisson probability table for λ = 1.1, 1.9, and 2.0.
 x    λ = 1.1   λ = 1.9   λ = 2.0
 0     .333      .150      .135
 1     .366      .284      .271
 2     .201      .270      .271
 3     .074      .171      .180
 4     .020      .081      .090
 5     .005      .031      .036
 6     .001      .010      .012
 7     .000      .003      .004
 8     .000      .000      .001
 9     .000      .000      .000
Mean: E(X) = λ   (6.22)

Variance: V(X) = λ   (6.23)

Standard deviation: σ = √V(X) = √λ   (6.24)
7
Continuous Random Variables and Their Probability Distributions
Combining the above property of probability distributions with the characteristic of continuous random variables, we see that we cannot assign any nonzero probability when a continuous random variable takes an individual value, for otherwise it is not possible to keep the total probability equal to 1. Consequently, unlike the discrete random variables, the probability that a continuous random variable assumes any individual value is always zero. In the case of continuous random variables, we are always interested in finding the probability of the random variable taking any value in an interval rather than taking any individual value. For example, in the problem of time taken by a technician to finish a job, let a random variable X denote the time (in hours) taken by the technician to finish a given job. Then, we would be interested in finding, for example, the probability P(3.0 ≤ X ≤ 3.5). That is, we would be interested in finding the probability that she takes between 3 and 3.5 hours to finish the job rather than finding the probability of her taking 3 hours, 10 minutes, 15 seconds to finish the job. In this case, the chance of the event associated with completing the job in exactly three hours, ten minutes, and fifteen seconds is very remote and, therefore, the probability of such an event will be zero.
The probability function of a continuous random variable X, denoted by f(x), is usually known as the density function and is represented by a smooth curve. For example, a typical density function curve of a continuous random variable is shown in Figure 7.1.
The density function of a continuous random variable satisfies the following properties:

(i) f(x) ≥ 0

(ii) ∫_−∞^∞ f(x) dx = 1   (7.1)

Property (ii) says that the total area under the curve, which represents the total probability, is equal to 1. For example, the probability that the random variable X falls in an interval (a, b) is

P(a ≤ X ≤ b) = shaded area in Figure 7.1.
Note that if in Figure 7.1 we take a = b, then the shaded area will be 0. That implies P(X = a) = P(X = b) = 0, which confirms the point we made earlier in that the probability of a continuous random variable taking any individual value is 0. This fact leads us to another important result in that it does not matter whether the endpoints of an interval are included or not while calculating the probability. In other words, if X is a continuous random variable,

P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b)   (7.2)

The cumulative distribution function, denoted by F(x), is defined as

F(x) = P(X ≤ x)   (7.3)

In terms of the density function,

F(x) = ∫_−∞^x f(t) dt   (7.4)

f(x) = dF(x)/dx   (7.5)

P(a ≤ X ≤ b) = F(b) − F(a)   (7.6)
So far, we have had a general discussion about the probability distributions of continuous random variables. In the remainder of the chapter we are
going to discuss some special continuous probability distributions that we
encounter frequently in applied statistics.
f(x) = 1/(b − a)   for a ≤ x ≤ b
f(x) = 0   otherwise   (7.7)

Note that the density function f(x) in Equation (7.7) is constant for all values of x in the interval (a, b). Figure 7.3 shows the graphical representation of a uniform distribution of the random variable X distributed over the interval (a, b), where a < b.
The probability that the random variable X takes values in an interval (x1, x2), where a ≤ x1 < x2 ≤ b, is the shaded area in Figure 7.4 and is equal to

P(x1 ≤ X ≤ x2) = (x2 − x1)/(b − a)   (7.8)

For example, for X distributed uniformly over (2, 6),

P(3 ≤ X ≤ 5) = (5 − 3)/(6 − 2) = 2/4 = 0.5.
P(X ≥ 5) = P(5 ≤ X ≤ 6) = (6 − 5)/(6 − 2) = 1/4 = 0.25.
P(X ≤ 10) = (10 − 0)/(30 − 0) = 10/30 = 1/3

(b) P(X ≥ 20) = P(20 ≤ X ≤ 30) = (30 − 20)/(30 − 0) = 10/30 = 1/3

(c) P(12 ≤ X ≤ 22) = (22 − 12)/(30 − 0) = 10/30 = 1/3
Note that in each case, the probability turned out to be the same (i.e., 1/3).
This shows that, in a uniform distribution, the probability depends upon the
length of the interval and not on the location of the interval. In each case the
length of the interval was equal to 10.
7.2.1 Mean and Standard Deviation of the Uniform Distribution
Let X be a random variable distributed uniformly over an interval (a, b). The
mean and the standard deviation of the random variable X are given by
μ = (a + b)/2   (7.9)

σ = (b − a)/√12   (7.10)

The distribution function F(x) of a random variable X distributed uniformly over an interval (a, b) is defined as

F(x) = P(X ≤ x) = P(a ≤ X ≤ x) = (x − a)/(b − a)   (7.11)
Example 7.3 Let a random variable X denote the coffee break (in minutes)
that a technician takes every morning. Let the random variable X be uniformly distributed over an interval (0, 16). Find the mean and the standard
deviation of the distribution.
Solution: Using Equations (7.9) and (7.10), we get

μ = (a + b)/2 = (0 + 16)/2 = 8

σ = (b − a)/√12 = (16 − 0)/√12 = 4.619.
Example 7.4 In Example 7.3, find the following values of the distribution
function of the random variable X:
(a) F(3) (b) F(5) (c) F(12)
Solution: Using the result of Equation (7.11), we get
(a) F(3) = (x − a)/(b − a) = (3 − 0)/(16 − 0) = 3/16

(b) F(5) = (x − a)/(b − a) = (5 − 0)/(16 − 0) = 5/16

(c) F(12) = (x − a)/(b − a) = (12 − 0)/(16 − 0) = 12/16 = 3/4
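Examples 7.3 and 7.4 can be verified in a few lines; this is a minimal sketch for the uniform (0, 16) coffee-break model, with helper names of my own choosing:

```python
import math

a, b = 0.0, 16.0  # coffee-break length, uniform over (0, 16) minutes

mean = (a + b) / 2            # Equation (7.9)
sd = (b - a) / math.sqrt(12)  # Equation (7.10)

def F(x):
    # Distribution function of the uniform (a, b), Equation (7.11)
    return (x - a) / (b - a)
```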
f(x) = (1/(σ√(2π))) e^−(x − μ)²/(2σ²),  −∞ < x < ∞   (7.12)

where μ and σ > 0 are the two parameters of the distribution, π ≈ 3.1416 and e ≈ 2.71828. Also, note that μ and σ are the mean and standard deviation of the distribution. A random variable X having a normal distribution with mean μ and a standard deviation σ is usually written as X ~ N(μ, σ).
Some of the characteristics of the normal density function are the following:
1. The normal density function curve is bell-shaped and completely symmetric about its mean μ. For this reason the normal distribution is also known as a bell-shaped distribution.
2. The specific shape of the curve, whether it is more or less tall, is determined by its standard deviation σ.
3. The tails of the density function curve extend from −∞ to ∞.
4. The total area under the curve is 1.0. However, 99.74% of the area falls within three standard deviations of the mean μ.
5. The area under the normal curve to the right of μ is 0.5 and to the left of μ is also 0.5.
Figure 7.5 shows the normal density function curve of a random variable X with mean μ and standard deviation σ.
Figure 7.5 The normal density function curve with mean μ and standard deviation σ.

Figure 7.6 Curves representing the normal density function with different means, but with the same standard deviation.

Figure 7.7 Curves representing the normal density function with different standard deviations, but with the same mean.
Z = (X − μ)/σ   (7.13)

The new random variable Z is also distributed normally, but with mean 0 and a standard deviation 1. The distribution of the random variable Z is generally known as the standard normal distribution.
Definition 7.4 The normal distribution with mean 0 and standard
deviation 1 is known as the standard normal distribution and is usually written as N(0,1).
The values of the standard normal random variable Z, denoted by the
lower case letter z, are called the z-scores. For example, in Figure 7.8 the
points marked on the x-axis are the z-scores. The probability of the random
variable Z falling in an interval (a, b) is shown by the shaded area under the
standard normal curve in Figure 7.9. This probability is determined by using
a standard normal distribution table (see Table III of the appendix).
7.3.1 Standard Normal Distribution Table
The standard normal distribution, Table III of the appendix, lists the probabilities of the random variable Z for its values between z = 0.00 and z = 3.09. A small portion of this table is reproduced below in Table 7.1. The entries in the body of the table are the probabilities P(0 ≤ Z ≤ z), where z is some point in the interval (0, 3.09). These probabilities are also shown by the
Table 7.1 A portion of standard normal distribution Table III of the appendix.

Z	.00	.01	.02	.03	.04	.05	.06	.07	.08	.09
0.0	.0000	.0040	.0080	.0120	.0160	.0199	.0239	.0279	.0319	.0359
0.1	.0398	.0438	.0478	.0517	.0557	.0596	.0636	.0675	.0714	.0753
1.0	.3413	.3438	.3461	.3485	.3508	.3531	.3554	.3577	.3599	.3621
1.1	.3643	.3665	.3686	.3708	.3729	.3749	.3770	.3790	.3810	.3830
1.9	.4713	.4719	.4726	.4732	.4738	.4744	.4750	.4756	.4761	.4767
2.0	.4772	.4778	.4783	.4788	.4793	.4798	.4803	.4808	.4812	.4817
shaded area under the normal curve given at the top of Table III of the appendix. To read this table we mark the row and the column corresponding to the value of z to one decimal point and the second decimal point, respectively. Then the entry at the intersection of that row and column is the probability P(0 ≤ Z ≤ z). For example, the probability P(0 ≤ Z ≤ 2.09) is found by marking the row corresponding to z = 2.0 and the column corresponding to z = .09 (note that z = 2.09 = 2.0 + .09) and locating the entry at the intersection of the marked row and column, which in this case is equal to .4817.
The probabilities for the negative values of z are found, due to the symmetric property of the normal distribution, by finding the probabilities of the corresponding positive values of z. For example, P(−1.54 ≤ Z ≤ −0.50) = P(0.50 ≤ Z ≤ 1.54).
Example 7.5 Use the standard normal distribution table, Table III of the appendix, to find the following probabilities:
(a) P(1.0 ≤ Z ≤ 2.0) (b) P(−1.50 ≤ Z ≤ 0) (c) P(−2.2 ≤ Z ≤ −1.0)
Solution:
(a) From Figure 7.10 it is clear that
P(1.0 ≤ Z ≤ 2.0) = P(0 ≤ Z ≤ 2.0) − P(0 ≤ Z ≤ 1.0) = .4772 − .3413 = 0.1359
(b) By symmetry, P(−1.50 ≤ Z ≤ 0) = P(0 ≤ Z ≤ 1.50) = .4332
(c) P(−2.2 ≤ Z ≤ −1.0) = P(1.0 ≤ Z ≤ 2.2) = P(0 ≤ Z ≤ 2.2) − P(0 ≤ Z ≤ 1.0) = .4861 − .3413 = .1448

Figure 7.12 Two shaded areas showing P(−2.2 ≤ Z ≤ −1.0) = P(1.0 ≤ Z ≤ 2.2).
Thus, we have
P(−1.50 ≤ Z ≤ 0.80) = P(−1.50 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.80)
= P(0 ≤ Z ≤ 1.50) + P(0 ≤ Z ≤ 0.80)
= .4332 + .2881 = 0.7213
(b) The probability P(Z ≤ 0.70) is shown by the shaded area in Figure 7.14. This area is equal to the sum of the area to the left of z = 0 and the area between z = 0 and z = 0.7, which implies that:
P(Z ≤ 0.70) = P(Z ≤ 0) + P(0 ≤ Z ≤ 0.7) = 0.5 + .2580 = 0.7580
(d) By using the same argument as in part (b) and Figure 7.15, we get
P(Z ≥ −1.0) = P(−1.0 ≤ Z ≤ 0) + P(Z ≥ 0)
= P(0 ≤ Z ≤ 1.0) + P(Z ≥ 0)
= .3413 + .5 = 0.8413
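Instead of the printed table, the same probabilities can be computed from the error function available in Python's standard library; a small sketch (the `phi` helper is my own name for the standard normal CDF):

```python
import math

def phi(z):
    # Standard normal CDF, P(Z <= z), via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_a = phi(0.80) - phi(-1.50)  # P(-1.50 <= Z <= 0.80), about 0.7213
p_b = phi(0.70)               # P(Z <= 0.70), about 0.7580
p_d = 1 - phi(-1.0)           # P(Z >= -1.0), about 0.8413
```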
Example 7.7 Use Table III of the appendix to find the following probabilities:
(a) P(Z ≥ 2.15) (b) P(Z ≤ 2.15)
Solution:
(a) The desired probability P(Z ≥ 2.15) is equal to the shaded area under the normal curve to the right of z = 2.15, so that
P(Z ≥ 2.15) = 0.5 − P(0 ≤ Z ≤ 2.15) = 0.5 − .4842 = 0.0158
(b) P(Z ≤ 2.15) = 0.5 + P(0 ≤ Z ≤ 2.15) = 0.5 + .4842 = 0.9842
P((8 − 6)/4 ≤ (X − 6)/4 ≤ (14 − 6)/4) = P(0.5 ≤ Z ≤ 2.0)
= P(0 ≤ Z ≤ 2.0) − P(0 ≤ Z ≤ 0.50)
= 0.4772 − 0.1915 = 0.2857.
(b) Proceeding in the same manner as in part (a) and using Figure 7.20, we have
P(2.0 ≤ X ≤ 10.0) = P((2 − 6)/4 ≤ (X − 6)/4 ≤ (10 − 6)/4)
= P(−1.0 ≤ Z ≤ 1.0)
= P(−1.0 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 1.0)
= 2P(0 ≤ Z ≤ 1.0) = 2(0.3413) = 0.6826.
(c) Again, transforming X into Z and using Figure 7.21, we get
P(0 ≤ X ≤ 4.0) = P((0 − 6)/4 ≤ (X − 6)/4 ≤ (4 − 6)/4)
= P(−1.50 ≤ Z ≤ −0.50) = P(0.5 ≤ Z ≤ 1.50)
= P(0 ≤ Z ≤ 1.50) − P(0 ≤ Z ≤ 0.50)
= 0.4332 − 0.1915 = 0.2417
Example 7.9 Suppose a quality characteristic of a product is normally distributed with mean μ = 18 and standard deviation σ = 1.5. The specification limits furnished by the customer are (15, 21). Determine what percentage of the product meets the specifications set by the customer.

Solution: The percentage of product meeting the specifications is
100 × P(15 ≤ X ≤ 21) = 100 × P((15 − 18)/1.5 ≤ Z ≤ (21 − 18)/1.5)
= 100 × P(−2.0 ≤ Z ≤ 2.0)
= 100 [P(−2.0 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 2.0)]
= 100 × 2P(0 ≤ Z ≤ 2.0)
= 100 × 2(.4772) = 95.44%.
In this case, the percentage of product that will meet the specifications set by the customer is 95.44%.
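A computation of this kind, standardizing the specification limits and taking the area between them, can be sketched as follows (variable names are mine; the small difference from 95.44% comes from four-decimal table rounding):

```python
import math

def phi(z):
    # Standard normal CDF, P(Z <= z)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 18.0, 1.5
lsl, usl = 15.0, 21.0  # customer specification limits

# Fraction of product inside the specs, after standardizing
frac = phi((usl - mu) / sigma) - phi((lsl - mu) / sigma)
percent = 100 * frac
```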
f(x) = λe^−λx,  x ≥ 0   (7.14)

Figure 7.22 Graphs of exponential density function for λ = 0.1, 0.5, 1.0, and 2.0.

The mean and the standard deviation of the exponential distribution are

μ = 1/λ   (7.15)

and

σ = √V(X) = 1/λ   (7.16)

If X is exponential with parameter λ, then

P(X > x) = e^−λx   (7.17)

Equation (7.17) leads us to an important property known as the memoryless property of the exponential distribution. As an illustration of this property, we consider the following example.
Example 7.10 Let the breakdowns of a machine follow the Poisson process so that the random variable X denoting the number of breakdowns per unit of time has a Poisson distribution. Then, the time between any two consecutive failures is also a random variable, say T, which is distributed exponentially. Consider

P(T > t + t1 | T > t1) = P(T > t + t1, T > t1) / P(T > t1)

where P(T > t + t1, T > t1) means the probability that T > t + t1 and T > t1. When T > t + t1 it is automatically greater than t1. Thus, we have

P(T > t + t1, T > t1) = P(T > t + t1)

so that

P(T > t + t1 | T > t1) = P(T > t + t1) / P(T > t1) = e^−λ(t + t1) / e^−λt1 = e^−λt = e^−(0.1)t = e^−t/10, since λ = 0.1

Therefore, the probability P(T > t + t1 | T > t1) is the same as the probability P(T > t). This means that under the exponential model, the probability P(T > t) remains the same no matter from what point we measure the time t. In other words, it does not remember when the machine had its last breakdown. For this reason, the exponential distribution is known to have the memoryless property.
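The algebra above can be checked numerically: the conditional survival probability equals the unconditional one for any choice of t and t1. A minimal sketch, using the λ = 0.1 from the example (the particular t, t1 values are mine):

```python
import math

lam = 0.1  # breakdown rate per unit time, as in the example

def survival(t):
    # P(T > t) for an exponential random variable, Equation (7.17)
    return math.exp(-lam * t)

t, t1 = 4.0, 7.0
conditional = survival(t + t1) / survival(t1)  # P(T > t + t1 | T > t1)
unconditional = survival(t)                    # P(T > t)
```

Both quantities agree, which is exactly the memoryless property.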
From the above discussion we can see that it does not matter when we
start observing the system, since it does not take into account an aging factor. That is, whether the machine is brand new, or 20 years old, we have the
same result as long as we model the system using the same Poisson process
(or exponential model). In practice, however, this assumption is not always valid. For example, if we are investigating how tollbooths function during rush hours and nonrush hours, and we model both with the same Poisson process, then the results may not be very reliable. When there is a very clear distinction between the two scenarios, it makes more sense to model them by two different processes.
f(t) = λβ(λt)^(β−1) e^−(λt)^β,  t ≥ 0   (7.18)

F(t) = 1 − e^−(λt)^β   (7.19)

μ = (1/λ) Γ(1 + 1/β)   (7.20)

σ² = (1/λ²) [Γ(1 + 2/β) − Γ²(1 + 1/β)]   (7.21)
h(t) = λβ(λt)^(β−1)   (7.22)

P(T ≥ t) = e^−(λt)^β   (7.23)
Example 7.11 From data on a system, the parameters of a Weibull distribution are estimated to be λ = 0.00025 and β = 0.5, where t is measured in hours. Then, determine:
(a) The mean time before the system breaks down
(b) The probability P(T ≥ 5,000)
(c) The probability P(T ≥ 10,000)
(d) The probability P(T ≤ 10,000)
Solution:
(a) Using the expression in Equation (7.20) for the mean, we have
μ = (1/λ) Γ(1 + 1/β) = (1/0.00025) Γ(1 + 1/0.5) = (4,000) Γ(3) = (4,000)(2!) = 8,000 hours
(b) Using the expression in Equation (7.23), we have
P(T ≥ 5,000) = e^−((0.00025)(5,000))^0.5 = e^−(1.25)^0.5 = e^−1.118 = 0.3269
(c) Again, using the expression in Equation (7.23), we have
P(T ≥ 10,000) = e^−((0.00025)(10,000))^0.5 = e^−1.5811 = 0.2057
(d) The probability P(T ≤ 10,000) can be found using the result in part (c), that is,
P(T ≤ 10,000) = 1 − P(T ≥ 10,000) = 1 − 0.2057 = 0.7943
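All four answers in Example 7.11 follow from the mean formula (7.20) and the survival function (7.23); a short sketch using the standard library's gamma function (helper names are mine):

```python
import math

lam, beta = 0.00025, 0.5  # estimated Weibull parameters (t in hours)

# Mean time before breakdown, Equation (7.20)
mean = (1 / lam) * math.gamma(1 + 1 / beta)

def survival(t):
    # P(T >= t) = exp(-(lam * t)^beta), Equation (7.23)
    return math.exp(-((lam * t) ** beta))
```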
8
Sampling Distributions
In Chapters 6 and 7 we discussed distributions of data as they apply to discrete and continuous random variables. Now we will turn our attention to
sampling distributions, or the probability distributions of functions of
observations from a random sample. The sampling distributions we are going
to discuss below are frequently encountered in applied statistics.
One of the key functions of statistics is to draw conclusions about the
population based upon information contained in a random sample. To use
such sample information for drawing conclusions about a population, it is
necessary that we understand the relationship between numerical measures
of sample data (statistics) and numerical measures of population data
(parameters). The purpose of this and the next chapter is to establish and discuss the importance of these relationships between statistics and parameters.
As an illustration, consider that we have a population with an unknown
mean and we would like to gain some knowledge about that population. It
seems quite plausible that we could take a random sample of n observations
(X1, X2, ..., Xn) from this population and use the sample mean as a substitute
(called an estimator) of the population mean based on the following:
X̄ = (1/n) Σ_{i=1}^n X_i   (8.1)
Once we decide to use the sample mean as an estimator of the population mean, we have an immediate question about how well any particular estimate actually describes the true value of μ. Since the value of X̄ changes from sample to sample, how well the estimate describes the true value of μ depends upon the behavior of the sample mean X̄, which is a random variable known as a statistic. To see how this random variable X̄ behaves, we need to study its probability distribution. The probability distribution of X̄ is known as the sampling distribution of the sample mean. Generally speaking, a sampling distribution is the probability distribution of a statistic. For example, the probability distributions of the sample median, sample proportion, and sample variance are called the sampling distributions of the sample median, sample proportion, and sample variance, respectively.
μ = (1/N) Σ X_i = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5

σ² = (1/N) Σ X_i² − μ² = (1/6)(1 + 4 + 9 + 16 + 25 + 36) − (3.5)² = (1/6)(91) − (3.5)² = 2.917

σ = √2.917 = 1.708
Now suppose instead of considering the whole population we want to
consider a random sample of size 2 to draw conclusions about the entire population. To achieve this goal we must determine all possible samples of size
2. In this case, there are 15 possible samples, when sampling is done without replacement (under this sampling scheme any population element can
appear in a given sample only once). We list these samples with their respective means in Table 8.2.
Note that some of the samples have the same mean. The sample we will
draw will be selected randomly from the population, which means that each
one of the 15 samples has the same chance (1/15) of being drawn. Our
chances, or probability, of getting different sample means will vary depending upon the actual drawing of the sample. For example, the probability of
getting a sample mean of 2.5 is 2/15, whereas the probability of getting a
sample mean of either 1.5 or 2.0 is only 1/15 each. We list different sample
means with their respective chances or probabilities
in Table 8.3.
Table 8.3 gives the sampling distribution of X̄. If we select a random sample of size 2 from the population given in Table 8.1, we may draw any of the 15 possible samples. The sample mean X̄ can then assume any of the nine different values listed in Table 8.3.
Table 8.1 Population with its distribution for the experiment of rolling a fair die.

x	1	2	3	4	5	6
p(x)	1/6	1/6	1/6	1/6	1/6	1/6
Table 8.2 All possible samples of size 2 with their respective means.

Sample number	Sample	Sample mean
1	1, 2	1.5
2	1, 3	2.0
3	1, 4	2.5
4	1, 5	3.0
5	1, 6	3.5
6	2, 3	2.5
7	2, 4	3.0
8	2, 5	3.5
9	2, 6	4.0
10	3, 4	3.5
11	3, 5	4.0
12	3, 6	4.5
13	4, 5	4.5
14	4, 6	5.0
15	5, 6	5.5
Table 8.3 Sampling distribution of the sample mean X̄.

x̄	1.5	2.0	2.5	3.0	3.5	4.0	4.5	5.0	5.5
p(x̄)	1/15	1/15	2/15	2/15	3/15	2/15	2/15	1/15	1/15
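Tables 8.2 and 8.3 can be generated by enumerating the samples directly; the sketch below uses exact fractions so the probabilities match the table (structure and names are my own):

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

population = [1, 2, 3, 4, 5, 6]  # outcomes of a fair die

# All 15 samples of size 2 drawn without replacement (Table 8.2)
samples = list(combinations(population, 2))
means = [Fraction(a + b, 2) for a, b in samples]

# Sampling distribution of the sample mean (Table 8.3)
dist = {m: Fraction(c, len(samples)) for m, c in Counter(means).items()}

# Mean of the sampling distribution equals the population mean, 3.5
e_xbar = sum(m * p for m, p in dist.items())
```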
We can find the mean and variance of the sampling distribution of X̄ given in Table 8.3 as follows:

E(X̄) = Σ_x̄ x̄ p(x̄)   (8.2)

V(X̄) = Σ_x̄ x̄² p(x̄) − [E(X̄)]²   (8.3)

σ_X̄ = √V(X̄)   (8.4)
From this example we can see that the mean of the sample mean is equal to the population mean. Note that this is not a coincidence, but it is true in general. That is,

μ_X̄ = E(X̄) = μ   (8.5)

The variance of X̄ depends upon the size of the population and whether the sample has been taken with replacement or without replacement. If the population is finite, say of size N, and the sample of size n is taken without replacement, then we have the following formula:

σ²_X̄ = ((N − n)/(N − 1)) (σ²/n)   (8.6)
where the fraction (N − n)/(N − 1) is called the finite population correction factor. The standard deviation of X̄, usually known as the standard error of the sample mean, is found by taking the square root of the variance in Equation (8.6). If the population is infinite, or if the population is finite and the sample has been taken with replacement, we have the following formula:

σ²_X̄ = σ²/n   (8.7)

Also, if the population is finite and the sample of size n is taken without replacement such that

n/N ≤ 0.05   (8.8)

then, to a good approximation,

σ²_X̄ ≈ σ²/n   (8.9)
From these formulas we see that the variance of the sample mean is smaller than the population variance and, therefore, the spread of the sampling distribution of X̄ is smaller than the spread of the corresponding population distribution. This also implies that as n increases, the values of X̄ become more concentrated about the mean of X̄, which is the same as the population mean. This grouping of sample means about the population mean is the reason that, whenever possible, we like to take as large a sample as possible when we want to estimate the population mean.
In the above discussion we have not said anything about the shape of the probability distribution of the sample mean. The concept of shape as related to a probability distribution is very interesting and occupies an important place in statistical applications.
If the distribution of a population is normal, then the distribution of the statistic (X̄ − μ)/(σ/√n) is standard normal; if the population is not normal but the sample size is large, this statistic is approximately standard normal.
It is important to note that there is a clear distinction between the two cases when the population is normal and when it is not normal. When the population is normal, there is no restriction on the sample size, and the distribution of the sample mean X̄ is exactly normal. When the population is not normal, the sample size must be large for the distribution of X̄ to be approximately normal.
Example 8.2 The mean weight of a food entrée is 190g with a standard deviation of 14g. If a sample of 49 entrées is selected, then find:
(a) The probability that the sample mean weight will fall between
186g and 194g.
(b) The probability that the sample mean will be greater than
192g.
Solution: Since n = 49, the sample mean X̄ is approximately normally distributed with mean μ_x̄ = 190g and standard error σ_x̄ = 14/√49 = 2g. Thus, from Figure 8.1, we have
(a) P(186 ≤ X̄ ≤ 194) = P((186 − 190)/2 ≤ Z ≤ (194 − 190)/2) = P(−2 ≤ Z ≤ 2) = 2P(0 ≤ Z ≤ 2) = 2(.4772) = 0.9544
(b) Using the same argument as in part (a) and Figure 8.2, we have the following:
P(X̄ > 192) = P(Z > (192 − 190)/2) = P(Z > 1) = 0.1587
Example 8.3 In Example 8.2, find the probability in part (a) when the sample size is increased from 49 to 64.
Solution: The sample mean X̄ is now approximately normal with mean 190g and standard deviation σ_x̄ = 14/√64 = 14/8 = 1.75g. Continuing our Example 8.2 and using Figure 8.3, we have
P(186 ≤ X̄ ≤ 194) = P((186 − 190)/1.75 ≤ Z ≤ (194 − 190)/1.75)
= P(−2.28 ≤ Z ≤ 2.28) = 2P(0 ≤ Z ≤ 2.28)
= 2(0.4887) = 0.9774
Solution: Let X̄ be the sample mean. Then, we are interested in finding the probability
P(79.85 ≤ X̄ ≤ 80.15)
Since the sample size is large, the central limit theorem tells us that X̄ is approximately normally distributed with mean μ_x̄ = 80cm and standard deviation σ_x̄ = 0.6/√36 = 0.1cm. Thus, using Figure 8.5, we have
P(79.85 ≤ X̄ ≤ 80.15) = P((79.85 − 80)/0.1 ≤ Z ≤ (80.15 − 80)/0.1)
= P(−0.15/0.1 ≤ Z ≤ 0.15/0.1) = P(−1.5 ≤ Z ≤ 1.5)
= 2(0.4332) = 0.8664
Example 8.5 Suppose the mean hourly wage of all employees in a large semiconductor manufacturing facility is $50 with a standard deviation of $10. Let X̄ be the mean hourly wage of certain employees selected randomly from all the employees of this manufacturing facility. Find the approximate probability that the mean hourly wage X̄ falls between $48 and $52 when the number of selected employees is (a) 64, (b) 100.
Solution:
(a) The sample size 64 is large, so by use of the central limit theorem, X̄ is approximately normal with mean $50 and standard error
σ_x̄ = 10/√64 = 1.25
Using the mean and the standard deviation of the sample mean X̄ derived above and Figure 8.6, we have the following:
P(48 ≤ X̄ ≤ 52) = P((48 − 50)/1.25 ≤ Z ≤ (52 − 50)/1.25) = P(−1.6 ≤ Z ≤ 1.6) = 2P(0 ≤ Z ≤ 1.6) = 2(.4452) = 0.8904
(b) For n = 100, σ_x̄ = 10/√100 = 1, so that
P(48 ≤ X̄ ≤ 52) = P((48 − 50)/1 ≤ Z ≤ (52 − 50)/1) = P(−2 ≤ Z ≤ 2) = 2(.4772) = 0.9544
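The effect of sample size on the precision of X̄ can be sketched with a small helper (names are mine); the probability tightens as n grows because the standard error shrinks with √n:

```python
import math

def phi(z):
    # Standard normal CDF, P(Z <= z)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 50.0, 10.0  # population mean and sd of hourly wages

def prob_between(lo, hi, n):
    # P(lo <= Xbar <= hi) for a sample of size n, by the central limit theorem
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    return phi((hi - mu) / se) - phi((lo - mu) / se)

p64 = prob_between(48, 52, 64)    # (a) n = 64
p100 = prob_between(48, 52, 100)  # (b) n = 100
```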
(a) Here n/N = 64/500 = 0.128, so the sample size is greater than 5% of the population. By using the finite population correction factor, we have the following:
μ_x̄ = μ = 50 and σ_x̄ = (σ/√n) √((N − n)/(N − 1)) = (10/8) √((500 − 64)/(500 − 1)) = 1.25 √(436/499) = 1.168
P(48 ≤ X̄ ≤ 52) = P((48 − 50)/1.168 ≤ Z ≤ (52 − 50)/1.168) = P(−1.71 ≤ Z ≤ 1.71) = 2P(0 ≤ Z ≤ 1.71) = 2(.4564) = 0.9128
(b) For n = 100,
μ_x̄ = μ = 50 and σ_x̄ = (10/√100) √((500 − 100)/(500 − 1)) = √(400/499) = 0.895
P(48 ≤ X̄ ≤ 52) = P((48 − 50)/0.895 ≤ Z ≤ (52 − 50)/0.895) = P(−2.23 ≤ Z ≤ 2.23) = 2P(0 ≤ Z ≤ 2.23) = 2(.4871) = 0.9742
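The finite population correction from Equation (8.6) is easy to sketch in code (helper names are mine); for n = 64 out of N = 500 it shrinks the standard error from 1.25 to about 1.168:

```python
import math

def phi(z):
    # Standard normal CDF, P(Z <= z)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, N = 50.0, 10.0, 500  # finite population of 500 employees

def se_fpc(n):
    # Standard error with the finite population correction, Equation (8.6)
    return (sigma / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

se64 = se_fpc(64)  # about 1.168, versus 1.25 without the correction
p = phi((52 - mu) / se64) - phi((48 - mu) / se64)
```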
X_i = 1 with probability p
X_i = 0 with probability q = 1 − p   (8.10)

X̄ = (X_1 + X_2 + ... + X_n)/n = m/n   (8.11)
where m is the total number of successes in n Bernoulli trials. Thus, X̄ represents the proportion of successes in a sample of n trials. In other words, we may look upon X̄ as the sample proportion that is usually denoted by p̂ (read as "p hat"). Thus, the sampling distribution of the sample proportion is just the sampling distribution of the sample mean when the sample is taken from a Bernoulli population with mean p and variance pq.
From the above discussion and the result of the central limit theorem, we have that for a large n the sampling distribution of the sample proportion p̂ (sampling mean of a sample from the Bernoulli population) is approximately normal with mean p and variance pq/n.
It is important to note, however, that when the sample is taken from a Bernoulli population, the central limit theorem holds only when np ≥ 5 and nq ≥ 5. If p = 1/2, for example, the sampling distribution of the sample proportion is approximately normal even when n is as small as 10.
Example 8.7 Let X be a random variable distributed as binomial with parameters n and p, B(n, p), where n is the number of trials and p is the probability of success. Find the sampling distribution of the sample proportion p̂ when (a) n = 100, p = 0.25, and (b) n = 64, p = 0.5.
(a) We have the following:
np 100(.25) 25 5
nq 100(1.25) 100(.75) 75 5
148
Chapter Eight
so the central limit theorem holds. Thus, p is approximately normally distributed with mean and variance given as follows:
p = p = 0.25
2p =
pq (.25 )(.75 )
=
= .001875
n
100
p = p = .05
2p =
(8.12)
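The mean and variance of p̂ in Example 8.7 follow directly from p and n; a minimal sketch (the function and its guard are my own framing of the np ≥ 5, nq ≥ 5 condition):

```python
def proportion_sampling(n, p):
    # Approximate sampling distribution of p-hat when np >= 5 and nq >= 5
    q = 1 - p
    assert n * p >= 5 and n * q >= 5  # normal-approximation condition
    mean = p          # mean of p-hat
    var = p * q / n   # variance of p-hat
    return mean, var

m1, v1 = proportion_sampling(100, 0.25)  # part (a)
m2, v2 = proportion_sampling(64, 0.5)    # part (b)
```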
The variable X is said to be distributed as chi-square with n degrees of freedom and is written as χ²_n. The frequency distribution of chi-square with n degrees of freedom is given by

f(x) = [1/(2^(n/2) Γ(n/2))] e^−x/2 x^((n/2)−1),  x > 0
f(x) = 0,  otherwise   (8.13)
Chi-square density function curves for n = 4, 6, 8, and 10 degrees of freedom.
Xn,
(8.14)
2n
2
(8.15)
2
n,
of the variable 2
(8.16)
For example, with 18 degrees of freedom, P(χ²₁₈ ≥ 28.8693) = 0.05. For the lower tail we use the value χ²_{n,1−α}, defined by

P(χ²_n ≥ χ²_{n,1−α}) = 1 − α   (8.17)
(8.17)
That is, we rst nd the corresponding area under the upper tail (total
area under the 2 curve 1 ). Since Table V lists the values of the
variable 2 only for certain values of upper tail areas.
For example, if the random variable 2 is distributed with n 10 degrees
of freedom and 0.10 which we write as 210,1-0.10 or as 210,0.90 (see
Figure 8.14) then from Table V of the appendix, we have
210,0.90 4.86518 and P(210 4.86518) 0.10.
Example 8.8 Let a random variable χ² be distributed as chi-square with 20 degrees of freedom. Find the value of χ² such that (a) P(χ²₂₀ ≥ χ²) = 0.05 (b) P(χ²₂₀ ≤ χ²) = 0.025.
Solution:
(a) In this case we are given the area under the upper tail. Thus we can see the value of χ² directly from the table with n = 20 and α = 0.05. That is,
χ²₂₀,₀.₀₅ = 31.410
(b) In this case, we are given the area under the lower tail. We now first find the corresponding area under the upper tail, which is given by
1 − α = 1 − 0.025 = 0.975
Then, the value of χ² can directly be seen from Table V of the appendix with n = 20 and α = 0.975. That is,
χ²₂₀,₀.₉₇₅ = 9.591
S² = (1/(n − 1)) Σ (X_i − X̄)²   (8.18)

Then

(n − 1)S²/σ²   (8.19)

is distributed as chi-square with (n − 1) degrees of freedom.
X̄ is distributed as normal with mean μ and variance σ²/n. In this case, for a given X̄ only (n − 1) of the variables X₁, X₂, ..., X_n can vary freely; therefore, the degrees of freedom of χ² is (n − 1). The actual proof of this theorem is beyond the scope of this book.
Example 8.9 Suppose a tea packaging machine is calibrated so that the amount of tea it discharges is normally distributed with mean 1 pound (16 oz) and a standard deviation of 1.0 oz. Suppose we randomly select 21 packages and weigh the amount of tea in each package. If the variance of these 21 weights is denoted by S², then it may be of interest to find the values of c₁ and c₂ such that
P(c₁ ≤ S² ≤ c₂) = 0.95
The solution to this problem would enable us to calibrate the machine such that the value of the sample variance would be expected to fall between certain values with a very high probability.
Solution: From Theorem 8.3, we have that
(n − 1)S²/σ² is distributed as χ²₂₀
Thus, we have
P(c₁ ≤ S² ≤ c₂) = P((n − 1)c₁/σ² ≤ (n − 1)S²/σ² ≤ (n − 1)c₂/σ²) = 0.95
or
P((n − 1)c₁/σ² ≤ χ²₂₀ ≤ (n − 1)c₂/σ²) = 0.95
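Using the standard chi-square table values χ²₂₀,₀.₉₇₅ = 9.591 and χ²₂₀,₀.₀₂₅ = 34.170 for the central 95% region, c₁ and c₂ follow by solving (n − 1)c/σ² = bound. The Monte Carlo check below is my own sketch, not the book's method; it simulates the sample variance of 21 normal weights and confirms roughly 95% coverage:

```python
import math
import random

random.seed(1)

n, sigma = 21, 1.0
chi_lo, chi_hi = 9.591, 34.170        # chi-square 20-df 0.975 and 0.025 points
c1 = chi_lo * sigma ** 2 / (n - 1)    # lower limit for S^2
c2 = chi_hi * sigma ** 2 / (n - 1)    # upper limit for S^2

# Simulate: how often does S^2 of 21 normal weights land in (c1, c2)?
hits, trials = 0, 20000
for _ in range(trials):
    xs = [random.gauss(16.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance (8.18)
    if c1 <= s2 <= c2:
        hits += 1
coverage = hits / trials
```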
Let X be a standard normal random variable and Y an independent chi-square random variable with n degrees of freedom. Then the random variable

T = X / √(Y/n)   (8.20)

has the Student's t-distribution with n degrees of freedom, whose density function is

f(t) = [1/(√n B(1/2, n/2))] (1 + t²/n)^−(n+1)/2   (8.21)
where B(a, b) is called the beta function and is equal to Γ(a)Γ(b)/Γ(a + b). Like the standard normal distribution, the t-distribution is unimodal and symmetric about t = 0. The mean and the variance of the t-distribution with n degrees of freedom are

μ = 0, provided n > 1   (8.22)

and

σ² = n/(n − 2), provided n > 2   (8.23)
respectively.
Figure 8.15 gives a comparison of the frequency distribution function of the Student's t-distribution and the standard normal distribution. Note that as the degrees of freedom increase, the t-distribution tends to become more like the standard normal distribution. In most applications, as n becomes large (n ≥ 30), we use the standard normal distribution rather than the t-distribution. We will see this substitution of the normal distribution for the Student's t-distribution in Chapter 9 and subsequent chapters.
From Theorem 8.3 and the denition of the t-distribution, we have an
important result that is used quite frequently in applications.
Theorem 8.4 Let X₁, X₂, ..., X_n be a random sample from a normal population with mean μ and an unknown variance σ². Let X̄ and S² respectively be the sample mean and sample variance. Then the random variable

T = (X̄ − μ)/(S/√n)   (8.24)

has the t-distribution with (n − 1) degrees of freedom.
Figure 8.16 t-distribution with shaded area under the two tails equal to α: P(T ≥ t_{n,α}) = P(T ≤ −t_{n,α}) = α.
The restriction on these observations is that their sum must be equal to nX̄. Thus, we can select only (n − 1) variables freely, and the nth observation must be such that the total of all the observations equals nX̄, so that the degrees of freedom for the t-distribution in Theorem 8.4 is (n − 1).
Like the standard normal distribution tables, we have tables for the t-distributions. Let the random variable T be distributed as a t-distribution with n degrees of freedom. We then define the quantity t_{n,α} by

P(T ≥ t_{n,α}) = α   (8.25)

Table IV of the appendix lists the values of the quantity t_{n,α} for various values of n and α.
To find the value of t_{n,α} from Table IV of the appendix, first locate the value of n in the column of degrees of freedom and then locate the value of α (probability represented by the shaded area under the upper tail) at the top of Table IV. Then, the value of t_{n,α} is the entry found at the intersection of the row of n degrees of freedom and the column of the probability α.
Also, from Figure 8.16 we can see that P(T ≤ −t_{n,α}) = P(T ≥ t_{n,α}) = α.
Example 8.10 Find the value of t₁₅,₀.₀₅. A portion of Table IV is reproduced below.

df	.10	.05	.025	.01
14	1.345	1.761	2.145	2.624
15	1.341	1.753	2.131	2.602
16	1.337	1.746	2.120	2.583
17	1.333	1.740	2.110	2.567
18	1.330	1.734	2.101	2.552

Solution: The entry at the intersection of the row marked 15 degrees of freedom and the column marked .05 is 1.753. Thus, we have t₁₅,₀.₀₅ = 1.753.
If we now want the value of t_{n,α} such that the probability under the left tail is 0.05, then it is equal to −t_{n,0.05}. This is due to the fact that the t-distribution is symmetric about the origin.
Let X₁ and X₂ be two independent chi-square random variables with ν₁ and ν₂ degrees of freedom, respectively. Then the random variable

F = (X₁/ν₁) / (X₂/ν₂)   (8.26)

is distributed as F with ν₁ and ν₂ degrees of freedom, with density function

f(x) = [ν₁^(ν₁/2) ν₂^(ν₂/2) x^((ν₁/2)−1)] / [B(ν₁/2, ν₂/2) (ν₁x + ν₂)^((ν₁+ν₂)/2)], for x > 0
f(x) = 0, otherwise   (8.27)
The mean and variance of the F-distribution are

μ = ν₂/(ν₂ − 2), provided ν₂ > 2   (8.28)

σ² = [2ν₂²(ν₁ + ν₂ − 2)] / [ν₁(ν₂ − 2)²(ν₂ − 4)], provided ν₂ > 4   (8.29)
respectively.
Note that the mean of the F-distribution depends only on the degrees of freedom of the denominator.
Figure 8.17 shows the curve of the probability density function of a typical F-distribution with ν₁ and ν₂ degrees of freedom. Like the χ² random variable, the F random variable is also non-negative, and its distribution is right-skewed. The shape of the distribution changes as the degrees of freedom change.
Now consider two random samples X₁₁, X₁₂, ..., X₁ₙ₁ and X₂₁, X₂₂, ..., X₂ₙ₂ from two independent normal distributions with variances σ₁² and σ₂², respectively. Let S₁² and S₂² be the sample variances of the samples coming from these normal populations, respectively. From our previous discussion in Section 8.4, we know that (n₁ − 1)S₁²/σ₁² and (n₂ − 1)S₂²/σ₂² are independently distributed as chi-square with ν₁ = n₁ − 1 and ν₂ = n₂ − 1 degrees of freedom. In this case, we have the following theorem.
Theorem 8.5 Let X₁₁, X₁₂, ..., X₁ₙ₁ and X₂₁, X₂₂, ..., X₂ₙ₂ be two independent random samples from two normal populations N(μ₁, σ₁) and N(μ₂, σ₂). Let S₁² and S₂² be the sample variances, and let a new random variable X be defined as

X = (S₁²/σ₁²) / (S₂²/σ₂²)   (8.30)

Then X is distributed as F with ν₁ = n₁ − 1 and ν₂ = n₂ − 1 degrees of freedom.
Like tables for some other distributions, we also have tables for the F-distribution. Table VI of the appendix lists values of F_{ν₁,ν₂;α} for various values of ν₁, ν₂, and α (see Figure 8.18), such that

P(F ≥ F_{ν₁,ν₂;α}) = α   (8.31)

Suppose, for example, that we want to find the value of F₁₅,₂₀;₀.₀₅.
Solution: Locate and mark the column and row corresponding to the numerator degrees of freedom (ν₁ = 15) and the denominator degrees of freedom (ν₂ = 20). The entry at the intersection of the marked column and row corresponding to the value α = 0.05 is the desired value of F₁₅,₂₀;₀.₀₅. In this case, we have

F₁₅,₂₀;₀.₀₅ = 2.20

Note that the entries F_{ν₁,ν₂,α} in the table correspond only to the upper-tail areas. To find the entries corresponding to the lower-tail areas, which we denote by F_{ν₁,ν₂,1−α}, such that (see Figure 8.19)

P(F ≤ F_{ν₁,ν₂,1−α}) = 1 − α
158
Chapter Eight
1
Fv2 , v1 ,
(8.32)
1
F20,15,0.05
1
0.429.
2.33
P(S₁²/S₂² ≥ c) = P(F₉,₁₁ ≥ c) = 0.05

So we need to find the value of F₉,₁₁;₀.₀₅. From Table VI of the appendix, we have

F₉,₁₁;₀.₀₅ = 2.90

This means the probability that the variance of the first sample is greater than or equal to 2.90 times the variance of the second sample is only 0.05. We encourage the reader to verify that the probability that the variance of the first sample is less than or equal to 0.3448 times the variance of the second sample is also 0.05.
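The reciprocal identity in Equation (8.32) is easy to mechanize. A minimal Python sketch, with the upper-tail critical values hardcoded from a standard F table (the dictionary and helper function are illustrative, not from the text):

```python
# Lower-tail F critical values via the reciprocal identity
# F_{v1,v2;1-a} = 1 / F_{v2,v1;a}  (Equation 8.32).
# Upper-tail values below are transcribed from a standard F table.
F_UPPER = {
    (20, 15, 0.05): 2.33,   # F_{20,15;0.05}
    (9, 11, 0.05): 2.90,    # F_{9,11;0.05}
}

def f_lower_tail(v1, v2, alpha):
    """Return F_{v1,v2;1-alpha} using the reciprocal identity."""
    return 1.0 / F_UPPER[(v2, v1, alpha)]

print(round(f_lower_tail(15, 20, 0.05), 3))  # F_{15,20;0.95} = 1/2.33
print(round(f_lower_tail(11, 9, 0.05), 4))   # F_{11,9;0.95}  = 1/2.90
```

This reproduces the worked values 0.429 and 0.3448 above.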
e⁻⁵5⁰/0! + e⁻⁵5¹/1! + e⁻⁵5²/2! + e⁻⁵5³/3! + e⁻⁵5⁴/4! ≈ 0.4405
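The sum above is the Poisson cumulative probability P(X ≤ 4) for λ = 5. A minimal sketch of the computation:

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson random variable with mean lam."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i)
               for i in range(k + 1))

print(round(poisson_cdf(4, 5.0), 4))  # prints 0.4405
```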
[Figure residue: binomial probability histograms for p = 0.2, 0.3, 0.4, and 0.5 with x running from 0 to 15; the histograms become more symmetric and bell-shaped as p approaches 0.5.]
Exact binomial probabilities (n = 15) and their normal approximations for p = 0.4 and p = 0.5:

         p = 0.4                    p = 0.5
 x    Exact prob.  Normal app.   Exact prob.  Normal app.
 0      0.0005       0.0015        0.0000       0.0001
 1      0.0047       0.0070        0.0005       0.0008
 2      0.0219       0.0237        0.0032       0.0040
 3      0.0634       0.0613        0.0139       0.0145
 4      0.1268       0.1208        0.0417       0.0412
 5      0.1859       0.1814        0.0916       0.0902
 6      0.2066       0.2079        0.1527       0.1519
 7      0.1771       0.1815        0.1964       0.1973
 8      0.1181       0.1207        0.1964       0.1972
 9      0.0612       0.0613        0.1527       0.1519
10      0.0245       0.0237        0.0916       0.0902
11      0.0074       0.0070        0.0417       0.0412
12      0.0016       0.0015        0.0139       0.0145
13      0.0003       0.0003        0.0032       0.0040
14      0.0000       0.0000        0.0005       0.0008
15      0.0000       0.0000        0.0000       0.0001
Table 8.6 Showing the use of the continuity correction factor under different scenarios.

Probability using binomial distribution    Probability using normal approximation
P(a ≤ X ≤ b)                               P(a − 0.5 ≤ X ≤ b + 0.5)
P(a < X ≤ b)                               P(a + 0.5 ≤ X ≤ b + 0.5)
P(a ≤ X < b)                               P(a − 0.5 ≤ X ≤ b − 0.5)
P(a < X < b)                               P(a + 0.5 ≤ X ≤ b − 0.5)
P(X ≤ a)                                   P(X ≤ a + 0.5)
P(X < a)                                   P(X ≤ a − 0.5)
P(X ≥ a)                                   P(X ≥ a − 0.5)
P(X > a)                                   P(X ≥ a + 0.5)
P(X = a)                                   P(a − 0.5 ≤ X ≤ a + 0.5)
Solution:

(a) From Table I of the appendix, we have

P(7 ≤ X ≤ 12) = P(X = 7) + P(X = 8) + P(X = 9) + P(X = 10) + P(X = 11) + P(X = 12)
= 0.1659 + 0.1797 + 0.1597 + 0.1170 + 0.0710 + 0.0355
= 0.7289

(b) First we check whether we can use the normal approximation to the binomial distribution by verifying the conditions np ≥ 5 and n(1 − p) ≥ 5. From the given information, we have

np = 20(0.4) = 8 ≥ 5
n(1 − p) = 20(1 − 0.4) = 12 ≥ 5

In this case, both conditions necessary for using the normal approximation to the binomial distribution are satisfied. Thus, we can now proceed to calculate the approximate probability as follows:

μ = np = 20(0.4) = 8
σ² = npq = 20(0.4)(0.6) = 4.8
σ = √4.8 = 2.19

P(7 ≤ X ≤ 12) ≈ P(6.5 ≤ X ≤ 12.5) = P((6.5 − 8)/2.19 ≤ Z ≤ (12.5 − 8)/2.19)
= P(−0.68 ≤ Z ≤ 2.05) = P(−0.68 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 2.05)
= 0.2517 + 0.4798 = 0.7315

The difference between the approximate and the exact probability is 0.7315 − 0.7289 = 0.0026, which is clearly not very significant.
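Both calculations can be checked with a short script; `norm_cdf` (built on `math.erf`) and `binom_pmf` are illustrative helpers, not from the text, and small differences from the worked values come from rounding z to two decimals above:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(k, n, p):
    """Binomial point probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.4
# Exact probability P(7 <= X <= 12) from the binomial pmf.
exact = sum(binom_pmf(k, n, p) for k in range(7, 13))

# Normal approximation with continuity correction: P(6.5 <= X <= 12.5).
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
approx = norm_cdf((12.5 - mu) / sigma) - norm_cdf((6.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```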
Example 8.15 A fair coin is tossed 12 times. Find the exact and approximate probabilities of getting seven heads and compare the two probabilities.

Solution: Let X be a random variable that denotes the number of heads. In this case, X is distributed as binomial with n = 12 and p = 0.5. Using the binomial tables, the exact probability of getting seven heads is

P(X = 7) = 0.1934

Now we can calculate the approximate probability by using the normal approximation. Clearly both conditions np ≥ 5 and n(1 − p) ≥ 5 are satisfied, and
Figure 8.21 (a) Showing the normal approximation to the binomial. (b) Replacing the shaded area contained in the rectangles by the shaded area under the normal curve.
μ = np = 12(0.5) = 6
σ² = npq = 12(0.5)(0.5) = 3
σ = √3 = 1.73

Thus, we have

P(X = 7) = P(7 ≤ X ≤ 7) = P(7 − 0.5 ≤ X ≤ 7 + 0.5) = P(6.5 ≤ X ≤ 7.5)
= P((6.5 − 6)/1.73 ≤ Z ≤ (7.5 − 6)/1.73) = P(0.29 ≤ Z ≤ 0.87)
= P(0 ≤ Z ≤ 0.87) − P(0 ≤ Z ≤ 0.29) = 0.3078 − 0.1141 = 0.1937

In this case, the exact and approximate probabilities of getting 7 heads in 12 trials are almost equal.
9
Point and Interval Estimation
166
Chapter Nine
DESCRIPTION
POINT ESTIMATION
A method to find a single number, based
upon the information contained in a sample,
that comes close to an unknown parameter
value.
USE
TYPE OF DATA
DESIGN/APPLICATION
CONSIDERATIONS
SPECIAL COMMENTS/CONCERNS
RELATED TOOLS
Interval estimation.
Point estimation is a method to determine a suitable statistic, called an estimator. The estimator tells us how to arrive at a single value, called a point estimate, based on the observations contained in a sample. For example,

X̄ = (1/n) Σᵢ Xᵢ  (9.1)

is a point estimator of the population mean μ. Note that X̄ is one of the many possible estimators of the population mean and is usually denoted by μ̂ (read as "mu hat"). This estimator tells us how to use the sample data to arrive at a single value x̄, a point estimate of μ. To calculate x̄, we add all the observations in the sample and then divide the sum by n, the number of observations in the sample.
Since it is always possible to find many point estimators of a parameter, our immediate problem is to decide which estimator is good. A good estimator is one that would give, based on the information contained in a sample, an estimate that falls closer to the true value of the parameter to be estimated. But
the question at hand is how to identify a good estimator. There are certain
properties associated with a good estimator. An estimator that possesses more
of these properties is considered a better estimator. Here we will examine only
a couple of such properties, as the rest of them are beyond the scope of this
book.
9.1.1 Properties of Point Estimators
There are various properties of a good point estimator, such as unbiasedness, minimum variance, relative efficiency, consistency, and sufficiency. The properties that we are going to discuss in some detail are the following:
1. Unbiased
2. Minimum variance
Let f(x, θ) be the probability distribution of a population of interest with a parameter θ, which is unknown. Let X₁, X₂, ..., Xₙ be a random sample from the population of interest, and let θ̂ = θ̂(X₁, X₂, ..., Xₙ) be a point estimator of the unknown parameter θ.

Definition 9.1 The point estimator θ̂(X₁, X₂, ..., Xₙ) is said to be an unbiased estimator of θ if and only if E(θ̂) (i.e., the mean of θ̂) is equal to θ. If E(θ̂) ≠ θ, then θ̂ is a biased estimator of θ.

Note that θ̂ in Definition 9.1 is a statistic.
Example 9.1 Find an unbiased estimator of the population mean μ.

Solution: In Chapter 8, we saw that the mean of the sample mean X̄ is equal to the population mean, or E(X̄) = μ. Therefore, the sample mean X̄ is always an unbiased estimator of the population mean μ.

Example 9.2 Find an unbiased estimator of the population proportion p.

Solution: In Chapter 8, we saw that the mean of the sample proportion p̂ is equal to the population proportion p, that is, E(p̂) = p. Thus, the sample proportion p̂ is an unbiased estimator of the population proportion p.
If an estimator is not unbiased, the bias of the estimator is equal to the difference between E(θ̂) and θ. If E(θ̂) > θ, then θ̂ is said to be positively biased, and if E(θ̂) < θ, it is said to be negatively biased.

Having said this, there still remains one question to be answered. If the mean of a population is unknown and we take a random sample from this population, find its mean x̄, and use it as an estimate of μ, how do we know how close our estimate x̄ is to the true value of μ? Don't worry, the answer to this question is simple, but it depends upon the population size, the probability distribution of the population, and the sample size. However, before discussing the answer to this question, we would like to discuss the estimation of the population variance with the following example.
Example 9.3 Let us consider a population with probability distribution f(x, μ, σ²), where μ and σ² are the unknown population mean and variance, respectively. Then find an unbiased estimator of σ².

Solution: Let X₁, X₂, ..., Xₙ be a random sample from the population f(x, μ, σ²) and let S² be the sample variance. Then S² is an estimator of σ².
σ̂² = (1/n) Σ (Xᵢ − X̄)²  (9.2)

is a biased estimator of σ², whereas the sample variance

S² = (1/(n − 1)) Σ (Xᵢ − X̄)²  (9.3)

is an unbiased estimator of σ².
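That the divisor n − 1 removes the bias can be checked by brute force on a tiny population, enumerating every equally likely sample of size 2 drawn with replacement; the three-point population below is an arbitrary choice for illustration:

```python
from itertools import product
from statistics import mean

population = [0, 2, 4]
pop_mean = mean(population)
pop_var = mean((x - pop_mean) ** 2 for x in population)  # sigma^2 = 8/3

def average_estimate(divisor):
    """Average a variance estimator over all 9 equally likely samples
    of size n = 2 drawn with replacement from the population."""
    vals = []
    for s in product(population, repeat=2):
        xbar = mean(s)
        vals.append(sum((x - xbar) ** 2 for x in s) / divisor)
    return mean(vals)

biased = average_estimate(2)    # divisor n: expected value falls short
unbiased = average_estimate(1)  # divisor n - 1: expected value = sigma^2

print(pop_var, biased, unbiased)
```

The divisor-(n − 1) estimator averages exactly to the population variance, while the divisor-n estimator underestimates it.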
When the population is very large (effectively infinite), the margin of error of the estimate x̄ with probability 1 − α is

E = z_{α/2} σ/√n  (9.4)

and for a finite population of size N it is

E = z_{α/2} (σ/√n) √((N − n)/(N − 1))  (9.5)

where N and n are the population and the sample size, respectively. If σ is not known in Equation (9.4) or (9.5), we replace it by the sample standard deviation S, which is an estimator of σ. Thus, in this case the maximum differences between the estimate x̄ and the true value of μ in Equations (9.4) and (9.5), respectively, are given by

E = z_{α/2} S/√n  (9.6)

and

E = z_{α/2} (S/√n) √((N − n)/(N − 1))  (9.7)

Note that the margin of error E given in Equations (9.4)–(9.7) is with probability 1 − α.
Example 9.4 A manufacturing engineer wants to use the mean of a random sample of size n = 64 to estimate the average length of the rods being manufactured. If it is known that σ = 0.5 cm, find the margin of error with 95% probability.

Solution: Since the sample size is large, and assuming that the total number of rods manufactured at the given facility is quite large, from Equation (9.4) it follows that

E = z_{α/2} σ/√n = 1.96 × 0.5/√64 = 1.96 × 0.5/8 = 0.1225
From Equations (8.6) and (9.4) or (8.7) and (9.5) it follows that as the sample size increases the variance of the estimator of the parameter and the margin of error E decrease. Thus, if the variance is minimal the margin of error
will also be minimal. In general, it is true that an unbiased estimator with
minimum variance is a better estimator because it will result in an estimate
that is closer to the true value of the parameter. This makes the minimum
variance property of an estimator desirable.
Definition 9.2 Consider a population with probability density function f(x, θ), where θ is an unknown parameter. Let θ̂₁, θ̂₂, ..., θ̂ₙ be unbiased estimators of θ. Then an estimator θ̂ᵢ is said to be a minimum variance unbiased estimator of θ if the variance of θ̂ᵢ is smaller than the variance of any other unbiased estimator.
There are techniques to find the minimum variance unbiased estimator, if it exists, but these techniques are beyond the scope of this book. So we shall limit ourselves only to the following rule:

If we have more than one unbiased estimator (not necessarily all possible unbiased estimators) of θ, choose from these estimators the one that has the smallest standard error (the standard error is nothing but the standard deviation of the sampling distribution of the estimator).
Example 9.5 Let X₁, X₂, ..., Xₙ be a random sample from an infinite population. The variance of the sample median is

σ²_median = (1.25)² σ²_X̄ = (1.25)² σ²/n  (9.8)

which implies that the variance of the sample median is larger than the variance of the sample mean. Thus, between the two unbiased estimators of the population mean, we shall choose the sample mean X̄ as the better estimator of μ.

It is very interesting to note that if the population is normal, then X̄ is the minimum variance unbiased estimator of μ.
Example 9.6 In order to evaluate a new catalyst in a chemical production
process a chemist uses that catalyst in 30 batches. The final yield of the
chemical in each batch is recorded as follows:
72 74 71 78 84 80 79 75 77 76 74 78 88 78 70
72 84 82 80 75 73 76 78 84 83 85 81 79 76 72
(a) Find a point estimate of the final mean yield of the chemical.
(b) Find the standard error of the point estimator calculated in
part (a).
(c) Find with 95% probability the margin of error.
Solution: Since the sample size is large, all the results discussed above are
applicable to this problem. Also, note that when the population size is not
known as in this case we assume that the population is very large or at least
large enough that the sample size is less than 5% of the population size.
(a) To find a point estimate of the final mean yield, we find the sample mean, which is a point estimate of the final mean yield of the chemical. Thus, we have

μ̂ = x̄ = (72 + 74 + 71 + 78 + ... + 72)/30 = 77.8
(b) To find the standard error of the point estimate calculated in part (a), we first need to determine the sample standard deviation S, which is given by

S = √[ (1/(n − 1)) Σ (Xᵢ − X̄)² ]  (9.9)

= 4.6416

so that the standard error of the point estimate is

S/√n = 4.6416/√30 = 0.8474
(c) Since we want to find the margin of error with 95% probability, α = 0.05 and the population standard deviation is not known. Thus, substituting the values of z_{α/2} = z₀.₀₂₅ = 1.96, S = 4.6416, and n = 30 in Equation (9.6), we get the margin of error

E = 1.96 (4.6416/√30) = 1.6609

The value of the margin of error E shows that our estimate of the final mean yield of the chemical is quite good.
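Parts (a)–(c) of Example 9.6 can be reproduced directly from the data; a minimal sketch using the standard library:

```python
from math import sqrt
from statistics import mean, stdev

# Yield data for the 30 batches in Example 9.6.
yields = [72, 74, 71, 78, 84, 80, 79, 75, 77, 76, 74, 78, 88, 78, 70,
          72, 84, 82, 80, 75, 73, 76, 78, 84, 83, 85, 81, 79, 76, 72]

n = len(yields)
xbar = mean(yields)       # (a) point estimate of the mean yield
s = stdev(yields)         # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)          # (b) standard error of the sample mean
E = 1.96 * se             # (c) margin of error with 95% probability

print(xbar, round(s, 4), round(se, 4), round(E, 4))
```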
DESCRIPTION
INTERVAL ESTIMATION
A method to find two numbers forming an
interval, based upon the information
contained in a sample, that would contain
with certain probability the true value of an
unknown parameter.
USE
TYPE OF DATA
DESIGN/APPLICATION
CONSIDERATIONS
SPECIAL COMMENTS/CONCERNS
RELATED TOOLS
Then ℓ and u are commonly called the lower confidence limit (LCL) and the upper confidence limit (UCL), respectively. The probability (1 − α) is the confidence coefficient.

The difference u − ℓ is called the width or the size of the confidence interval.
9.2.1 Interpretation of a Confidence Interval
Note that the lower and upper confidence limits ℓ and u are statistics and therefore random variables, which means the interval (ℓ, u) will be different for different samples. Thus, it is possible that some of these confidence intervals may contain the true value of the unknown parameter while some others may not contain that value. This leads us to the following interpretation of the confidence interval with confidence coefficient 1 − α: if we take a large number of samples and for each sample we determine the confidence interval (ℓ, u), then the relative frequency with which these intervals will contain the true value of θ, say θ₀, is 1 − α.

Figure 9.1 shows that 2 out of 50 confidence intervals with confidence coefficient 0.95 (α = 0.05) do not contain θ₀, the true value of θ.
P(−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2}) = 1 − α  (9.11)

or

P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α  (9.12)
Equation (9.12) gives the confidence interval for the mean μ with confidence coefficient 1 − α. It is very interesting to note that the areas under the two tails shown in Figure 9.2 are each equal to α/2. In other words, we have divided α into two equal parts, such that half of α lies under one tail and the other half under the second tail. Technically speaking, we may divide α into two parts as we wish; for example, we may take one-third of α under one tail and the remaining two-thirds under the other tail. But traditionally we always divide α into two equal parts unless we have a very strong reason to do otherwise. Moreover, for a symmetric distribution, dividing α into two equal parts gives a slightly smaller confidence interval, which is one of the nicest properties of a confidence interval (the smaller, the better).
Thus, a two-sided confidence interval for μ with confidence coefficient (1 − α) is (ℓ, u), where ℓ = X̄ − z_{α/2} σ/√n and u = X̄ + z_{α/2} σ/√n are, respectively, the lower confidence limit (LCL) and the upper confidence limit (UCL).
As discussed above, suppose we decide not to divide α at all; that is, suppose α lies completely under one tail or the other, as shown in Figure 9.3(a) and Figure 9.3(b). Then we get confidence intervals that are known as one-sided confidence intervals. For example, from Figure 9.3(a), we have

P((X̄ − μ)/(σ/√n) ≥ −z_α) = 1 − α  (9.13)

or

P(μ ≤ X̄ + z_α σ/√n) = 1 − α  (9.14)

Figure 9.3 (a) Standard normal curve with lower-tail area equal to α. (b) Standard normal curve with upper-tail area equal to α.
obtained by simply replacing σ with S in the corresponding confidence interval for μ when σ is known. Thus, we have the following:

A two-sided confidence interval with confidence coefficient 1 − α is (ℓ, u), where

ℓ = X̄ − z_{α/2} S/√n and u = X̄ + z_{α/2} S/√n  (9.16)

A lower one-sided confidence interval is (ℓ, ∞), where

ℓ = X̄ − z_α S/√n  (9.17)

An upper one-sided confidence interval is (−∞, u), where

u = X̄ + z_α S/√n  (9.18)
From the given information, X̄ = 10 and S = 2. Also, the sample size n = 100 is large. Thus, using the confidence interval (ℓ, u), where

ℓ = X̄ − z_{α/2} S/√n = 10 − 1.96 (2/√100) = 9.608

u = X̄ + z_{α/2} S/√n = 10 + 1.96 (2/√100) = 10.392

Thus, a 95% confidence interval for the average time taken by the technician to complete one job is (9.608, 10.392) hours.
Example 9.9 Suppose that it is known that the standard deviation of workers' hourly wages in the auto industry is $5. A random sample of 64 workers had an average hourly wage of $30. Find a 99% confidence interval for the mean hourly wage μ.

Solution: In this case the sample size n = 64 is large, and the population standard deviation is σ = 5. Also, we are given X̄ = 30. Thus, a confidence interval for the mean hourly wage μ is (ℓ, u), where

ℓ = X̄ − z_{α/2} σ/√n = 30 − 2.575 (5/√64) = 30 − 1.61 = 28.39

u = X̄ + z_{α/2} σ/√n = 30 + 2.575 (5/√64) = 30 + 1.61 = 31.61

Thus, a 99% confidence interval for the mean hourly wage is (28.39, 31.61) dollars.
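The computation in Example 9.9 can be sketched as follows; `mean_ci_known_sigma` is an illustrative helper, not from the text:

```python
from math import sqrt

def mean_ci_known_sigma(xbar, sigma, n, z):
    """Two-sided CI for the mean with known sigma: xbar +/- z*sigma/sqrt(n)."""
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

# Example 9.9: n = 64, sigma = 5, xbar = 30, 99% confidence (z_{0.005} = 2.575).
lo, hi = mean_ci_known_sigma(30, 5, 64, 2.575)
print(round(lo, 2), round(hi, 2))  # prints 28.39 31.61
```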
Note: It is important to remember that the size of a confidence interval, which is defined as u − ℓ, will increase or decrease as the sample size decreases or increases.
9.3.2 Confidence Interval for Population Mean When the
Sample Size Is Small
The large-sample procedure for finding a confidence interval for the population mean that we discussed in the previous section does not make any assumptions about the distribution of the sampled population, except that the mean and the standard deviation of the population are μ and σ, respectively. The confidence interval (ℓ, u) was obtained using the fact that, for a large sample size, by the central limit theorem (see Chapter 8) X̄ is approximately normally distributed with mean μ and standard deviation σ/√n. When the sample size is small, we cannot apply the central limit theorem, so we assume that the sampled population is normally distributed. Note that when the population is normal, then irrespective of the sample size, the sample mean X̄ is also normally distributed with mean μ and standard deviation σ/√n. Thus, a confidence interval for μ with confidence coefficient 1 − α is (ℓ, u), where

ℓ = X̄ − z_{α/2} σ/√n and u = X̄ + z_{α/2} σ/√n  (9.19)
However, by replacing the population standard deviation σ with its estimator S, the quantity (X̄ − μ)/(S/√n) is no longer normally distributed. In Chapter 8 (Theorem 8.4), we saw that the pivotal quantity (X̄ − μ)/(S/√n) is distributed as Student's t-distribution with n − 1 degrees of freedom. Thus, from Figure 9.4, we have
P(−t_{n−1,α/2} ≤ (X̄ − μ)/(S/√n) ≤ t_{n−1,α/2}) = 1 − α

or

P(−t_{n−1,α/2} S/√n ≤ X̄ − μ ≤ t_{n−1,α/2} S/√n) = 1 − α

or

P(X̄ − t_{n−1,α/2} S/√n ≤ μ ≤ X̄ + t_{n−1,α/2} S/√n) = 1 − α

Thus, a two-sided small-sample confidence interval for μ with confidence coefficient 1 − α when σ is not known is (ℓ, u), where

ℓ = X̄ − t_{n−1,α/2} S/√n and u = X̄ + t_{n−1,α/2} S/√n  (9.20)
Lower and upper one-sided confidence intervals are (ℓ, ∞) and (−∞, u), respectively, where

ℓ = X̄ − t_{n−1,α} S/√n and u = X̄ + t_{n−1,α} S/√n  (9.21)
ℓ = 175 − t₁₅,₀.₀₂₅ (15/√16) = 175 − 2.131(3.75) = 175 − 7.99 = 167.01

u = 175 + t₁₅,₀.₀₂₅ (15/√16) = 175 + 7.99 = 182.99
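The same pattern works for the small-sample t interval, with the critical value t₁₅,₀.₀₂₅ = 2.131 hardcoded from a t table (pure Python has no t quantile function):

```python
from math import sqrt

def mean_ci_t(xbar, s, n, t_crit):
    """Small-sample CI for the mean: xbar +/- t_{n-1,a/2} * S / sqrt(n)."""
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half

# xbar = 175, S = 15, n = 16; t_{15,0.025} = 2.131 from a t table.
lo, hi = mean_ci_t(175, 15, 16, 2.131)
print(round(lo, 2), round(hi, 2))  # prints 167.01 182.99
```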
ℓ = 1875 − t₂₄,₀.₀₀₅ (100/√25) = 1875 − 2.797(20) = 1875 − 55.94 = 1819.06

u = 1875 + t₂₄,₀.₀₀₅ (100/√25) = 1875 + 2.797(20) = 1875 + 55.94 = 1930.94

Thus, a small-sample two-sided 99% confidence interval for μ is (1819.06, 1930.94).
ℓ = 1875 − t₂₄,₀.₀₁ (100/√25) = 1875 − 2.492(20) = 1875 − 49.84 = 1825.16

u = 1875 + 2.492(20) = 1875 + 49.84 = 1924.84

so that one-sided lower and upper 99% confidence intervals for μ are (1825.16, ∞) and (−∞, 1924.84), respectively.
[(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)  (9.22)

Using the central limit theorem and the result of Theorem 9.1, it can easily be shown that the pivotal quantity in Equation (9.22) is distributed as standard normal (i.e., with mean 0 and standard deviation 1). Thus, we have

P(−z_{α/2} ≤ [(X̄₁ − X̄₂) − (μ₁ − μ₂)]/√(σ₁²/n₁ + σ₂²/n₂) ≤ z_{α/2}) = 1 − α

or

P((X̄₁ − X̄₂) − z_{α/2}√(σ₁²/n₁ + σ₂²/n₂) ≤ μ₁ − μ₂ ≤ (X̄₁ − X̄₂) + z_{α/2}√(σ₁²/n₁ + σ₂²/n₂)) = 1 − α
Thus, a two-sided confidence interval for μ₁ − μ₂ with confidence coefficient 1 − α is

(X̄₁ − X̄₂ − z_{α/2}√(σ₁²/n₁ + σ₂²/n₂), X̄₁ − X̄₂ + z_{α/2}√(σ₁²/n₁ + σ₂²/n₂))  (9.23)

Lower and upper one-sided confidence intervals for μ₁ − μ₂ are

(X̄₁ − X̄₂ − z_α√(σ₁²/n₁ + σ₂²/n₂), ∞) and (−∞, X̄₁ − X̄₂ + z_α√(σ₁²/n₁ + σ₂²/n₂))  (9.24)

respectively.
Example 9.12 Suppose two independent random samples, one of 64
mechanical engineers and the other of 100 electrical engineers, showed that
the mean starting salaries of mechanical and electrical engineers are $36,250
and $40,760, respectively. Suppose it is known that the standard deviations of
starting salaries of mechanical and electrical engineers are $2240 and
$3000, respectively. Find a two-sided 95% confidence interval for 1 2.
Find the upper and lower one-sided 95% confidence intervals for 1 2.
Solution: From the given information, we have

n₁ = 64, X̄₁ = 36,250, σ₁ = 2240
n₂ = 100, X̄₂ = 40,760, σ₂ = 3000

Using Equation (9.23), a two-sided confidence interval for μ₁ − μ₂ with 95% confidence coefficient is given by
(36,250 − 40,760 − 1.96√(2240²/64 + 3000²/100), 36,250 − 40,760 + 1.96√(2240²/64 + 3000²/100))

= (−4510 − 804.32, −4510 + 804.32) = (−5314.32, −3705.68)
Similarly, using Equation (9.24) with z₀.₀₅ = 1.645, an upper one-sided 95% confidence interval for μ₁ − μ₂ is

(−∞, 36,250 − 40,760 + 1.645√(2240²/64 + 3000²/100)) = (−∞, −4510 + 675) = (−∞, −3835)
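Example 9.12 can be checked numerically; `diff_means_ci` is an illustrative helper, not from the text:

```python
from math import sqrt

def diff_means_ci(x1, x2, var1, var2, n1, n2, z):
    """CI for mu1 - mu2 with known variances (Equation 9.23 form)."""
    half = z * sqrt(var1 / n1 + var2 / n2)
    d = x1 - x2
    return d - half, d + half

# Example 9.12: mechanical vs. electrical engineers' starting salaries.
lo, hi = diff_means_ci(36250, 40760, 2240**2, 3000**2, 64, 100, 1.96)
print(round(lo, 2), round(hi, 2))

# One-sided upper limit with z_{0.05} = 1.645.
upper = 36250 - 40760 + 1.645 * sqrt(2240**2 / 64 + 3000**2 / 100)
print(round(upper))  # prints -3835
```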
(X̄₁ − X̄₂ − z_{α/2}√(S₁²/n₁ + S₂²/n₂), X̄₁ − X̄₂ + z_{α/2}√(S₁²/n₁ + S₂²/n₂))  (9.25)

Lower and upper one-sided confidence intervals are

(X̄₁ − X̄₂ − z_α√(S₁²/n₁ + S₂²/n₂), ∞) and (−∞, X̄₁ − X̄₂ + z_α√(S₁²/n₁ + S₂²/n₂))  (9.26)

respectively.
X̄₁ = 150, S₁ = 13; X̄₂ = 120, S₂ = 12

Find a 99% confidence interval for μ₁ − μ₂, the difference in the mean tensile strength of the two types of wires.

Solution: Since the population variances in this case are not known, a 99% confidence interval for μ₁ − μ₂ is obtained by using Equation (9.25). Thus, we have

(150 − 120 − 2.575√(13²/40 + 12²/40), 150 − 120 + 2.575√(13²/40 + 12²/40))

= (30 − 7.20, 30 + 7.20) = (22.80, 37.20)
[(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)  (9.27)

which is distributed as standard normal N(0, 1), that is, normal with mean 0 and standard deviation 1. So in this case the pivotal quantity is the same as when the sample size is large and the variances are known. Moreover, in the large-sample case this follows from the central limit theorem, while here it follows from the normality assumption. Thus, the confidence interval for μ₁ − μ₂ with confidence coefficient (1 − α) is exactly the same as in Equation (9.23). That is,

(X̄₁ − X̄₂ − z_{α/2}√(σ₁²/n₁ + σ₂²/n₂), X̄₁ − X̄₂ + z_{α/2}√(σ₁²/n₁ + σ₂²/n₂))  (9.28)
Lower and upper one-sided confidence intervals for μ₁ − μ₂ with confidence coefficient 1 − α are

(X̄₁ − X̄₂ − z_α√(σ₁²/n₁ + σ₂²/n₂), ∞)  (9.29)

and

(−∞, X̄₁ − X̄₂ + z_α√(σ₁²/n₁ + σ₂²/n₂))  (9.30)

respectively.
Example 9.14 A manager of a company wants to evaluate the technicians
at two of its plants. She took two samples, one from each plant, of sizes n1
13 and n2 17 technicians. Then she looked at the number of jobs each technician performed during a fixed period of time. From experience, the number
of jobs performed by all the technicians at the two plants are known to be
normally distributed with variances 12 21, 22 18.The data collected
produced the following summary statistics:
X1 27
12 21
X2 24
22 18
Find a 95% confidence interval for μ₁ − μ₂, the mean difference of the number of jobs performed by the technicians at the two plants.

Solution: The populations are normally distributed with known variances, and the sample sizes are small. To find the desired confidence interval we use Equation (9.28); that is,

(27 − 24 − 1.96√(21/13 + 18/17), 27 − 24 + 1.96√(21/13 + 18/17))

= (3 − 3.205, 3 + 3.205) = (−0.205, 6.205)
S_p² = [(n₁ − 1)S₁² + (n₂ − 1)S₂²] / (n₁ + n₂ − 2)  (9.31)

where S₁² and S₂² are the sample variances of the samples drawn from the two populations. In this case, the pivotal quantity that we use to find a confidence interval for μ₁ − μ₂ with confidence coefficient 1 − α is
[(X̄₁ − X̄₂) − (μ₁ − μ₂)] / [S_p √(1/n₁ + 1/n₂)]  (9.32)

which is distributed as Student's t with n₁ + n₂ − 2 degrees of freedom. Thus, a two-sided confidence interval for μ₁ − μ₂ with confidence coefficient 1 − α is

(X̄₁ − X̄₂ − t_{n₁+n₂−2,α/2} S_p √(1/n₁ + 1/n₂), X̄₁ − X̄₂ + t_{n₁+n₂−2,α/2} S_p √(1/n₁ + 1/n₂))  (9.33)
Lower and upper one-sided confidence intervals for μ₁ − μ₂ with confidence coefficient 1 − α are given by

(X̄₁ − X̄₂ − t_{n₁+n₂−2,α} S_p √(1/n₁ + 1/n₂), ∞)  (9.34)

and

(−∞, X̄₁ − X̄₂ + t_{n₁+n₂−2,α} S_p √(1/n₁ + 1/n₂))  (9.35)

respectively.
9.4.2.3 Both Variances σ₁² and σ₂² Are Unknown but Cannot Be Assumed to Be Equal

Under this scenario the population variances are again unknown, but they cannot be assumed to be equal (again, this assumption can be verified by using techniques discussed in Chapter 10). In this case, the pivotal quantity we use to find a confidence interval for μ₁ − μ₂ with confidence coefficient 1 − α is

[(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(S₁²/n₁ + S₂²/n₂)  (9.36)

The pivotal quantity in Equation (9.36) can be shown to be approximately distributed as Student's t-distribution with m degrees of freedom, where

m = (S₁²/n₁ + S₂²/n₂)² / [ (S₁²/n₁)²/(n₁ − 1) + (S₂²/n₂)²/(n₂ − 1) ]  (9.37)
Since the degrees of freedom must be a whole number, it is usually necessary to round the value of m in Equation (9.37). Thus, the two-sided confidence interval for μ₁ − μ₂ with confidence coefficient (1 − α) is given by

(X̄₁ − X̄₂ − t_{m,α/2}√(S₁²/n₁ + S₂²/n₂), X̄₁ − X̄₂ + t_{m,α/2}√(S₁²/n₁ + S₂²/n₂))  (9.38)

Lower and upper one-sided confidence intervals for μ₁ − μ₂ with confidence coefficient (1 − α) are given by

(X̄₁ − X̄₂ − t_{m,α}√(S₁²/n₁ + S₂²/n₂), ∞)  (9.39)

and

(−∞, X̄₁ − X̄₂ + t_{m,α}√(S₁²/n₁ + S₂²/n₂))  (9.40)

respectively.
Example 9.15 A pharmaceutical company sets two machines to fill 15-oz bottles with cough syrup. Two random samples of n₁ = 16 bottles from machine 1 and n₂ = 12 bottles from machine 2 are selected. The two samples yield the following sample statistics:

X̄₁ = 15.24, S₁² = 0.64; X̄₂ = 14.96, S₂² = 0.36

Find a 95% confidence interval for μ₁ − μ₂, the mean difference of the amount of cough syrup filled in bottles by the two machines. Assume that the two population variances are equal.
Solution: Since in this case the population variances are unknown but assumed to be equal, we first find the pooled estimate of the common variance. That is,

S_p² = [(16 − 1)(0.64) + (12 − 1)(0.36)]/(16 + 12 − 2) = (9.6 + 3.96)/26 = 0.5215

S_p = 0.722

Now, to determine the desired confidence interval using Equation (9.33), we have

(15.24 − 14.96 − t₂₆,₀.₀₂₅ S_p √(1/16 + 1/12), 15.24 − 14.96 + t₂₆,₀.₀₂₅ S_p √(1/16 + 1/12))

= (0.28 − 2.056(0.722)√(1/16 + 1/12), 0.28 + 2.056(0.722)√(1/16 + 1/12))

= (0.28 − 0.56, 0.28 + 0.56) = (−0.28, 0.84)
m = (0.64/16 + 0.36/12)² / [ (0.64/16)²/15 + (0.36/12)²/11 ] = 26

Note that in this case the degrees of freedom turned out to be the same as in Example 9.15, but this is not always the case. Thus, using the confidence interval in Equation (9.38), we get

(15.24 − 14.96 − t₂₆,₀.₀₂₅√(0.64/16 + 0.36/12), 15.24 − 14.96 + t₂₆,₀.₀₂₅√(0.64/16 + 0.36/12))

= (0.28 − 2.056√0.07, 0.28 + 2.056√0.07) = (0.28 − 0.54, 0.28 + 0.54) = (−0.26, 0.82)
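The pooled-variance and Welch degrees-of-freedom computations of Examples 9.15 and 9.16 can be sketched as follows (the helper names are illustrative):

```python
def pooled_var(n1, s1_sq, n2, s2_sq):
    """Pooled variance estimate (Equation 9.31 form)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def welch_df(n1, s1_sq, n2, s2_sq):
    """Approximate degrees of freedom m (Equation 9.37 form)."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a * a / (n1 - 1) + b * b / (n2 - 1))

# Examples 9.15/9.16: n1 = 16, S1^2 = 0.64; n2 = 12, S2^2 = 0.36.
sp2 = pooled_var(16, 0.64, 12, 0.36)
m = welch_df(16, 0.64, 12, 0.36)
print(round(sp2, 4), round(m))  # pooled variance and rounded Welch df
```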
the sampling distribution of p̂ is approximately normal with mean p and standard deviation √(p(1 − p)/n). This makes

(p̂ − p)/√(p(1 − p)/n)  (9.41)

a good candidate to be considered a pivotal quantity for estimating p, because it possesses all the characteristics of a pivotal quantity. Having said this, we are now ready to study the technique of finding a confidence interval for p. Note that throughout this section we assume that n is large (np ≥ 5, n(1 − p) ≥ 5). From our discussion in Chapter 8, we know that the pivotal quantity in Equation (9.41) is distributed approximately as standard normal N(0, 1). Thus, we have
P(−z_{α/2} ≤ (p̂ − p)/√(p(1 − p)/n) ≤ z_{α/2}) = 1 − α

or

P(−z_{α/2}√(p(1 − p)/n) ≤ p̂ − p ≤ z_{α/2}√(p(1 − p)/n)) = 1 − α

or

P(p̂ − z_{α/2}√(p(1 − p)/n) ≤ p ≤ p̂ + z_{α/2}√(p(1 − p)/n)) = 1 − α  (9.42)

where the two endpoints inside the probability statement are the LCL and the UCL, respectively.
Note that the quantity √(p(1 − p)/n) in the LCL and UCL is the standard error σ_p̂, which is unknown, since p is not known. But a good approximation of the standard error is found by substituting p̂ for p. Thus, we have

P(p̂ − z_{α/2}√(p̂(1 − p̂)/n) ≤ p ≤ p̂ + z_{α/2}√(p̂(1 − p̂)/n)) = 1 − α  (9.43)

so that a two-sided confidence interval for p with confidence coefficient 1 − α is

(p̂ − z_{α/2}√(p̂(1 − p̂)/n), p̂ + z_{α/2}√(p̂(1 − p̂)/n))  (9.44)

Lower and upper one-sided confidence intervals are (p̂_ℓ, 1) and (0, p̂_u), where

p̂_ℓ = p̂ − z_α√(p̂(1 − p̂)/n), p̂_u = p̂ + z_α√(p̂(1 − p̂)/n)  (9.45)
ℓ = 1/8 − 1.96√((1/8)(7/8)/400) = 0.1250 − 0.0324 = 0.0926

u = 1/8 + 1.96√((1/8)(7/8)/400) = 0.1250 + 0.0324 = 0.1574
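A minimal sketch of the proportion interval, using the numbers above:

```python
from math import sqrt

def proportion_ci(p_hat, n, z):
    """Large-sample CI for a proportion: p_hat +/- z*sqrt(p_hat(1-p_hat)/n)."""
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# p_hat = 1/8, n = 400, 95% confidence.
lo, hi = proportion_ci(1 / 8, 400, 1.96)
print(round(lo, 4), round(hi, 4))  # prints 0.0926 0.1574
```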
p̂₁ = (1/n₁) Σᵢ₌₁ⁿ¹ X₁ᵢ and p̂₂ = (1/n₂) Σⱼ₌₁ⁿ² X₂ⱼ  (9.46)
[(p̂₁ − p̂₂) − (p₁ − p₂)] / √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂)  (9.47)

is approximately distributed as standard normal N(0, 1). Thus, using the statistic in Equation (9.47) as the pivotal quantity for estimating (p₁ − p₂), we have

P(−z_{α/2} ≤ [(p̂₁ − p̂₂) − (p₁ − p₂)]/√(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂) ≤ z_{α/2}) = 1 − α
or

P((p̂₁ − p̂₂) − z_{α/2}√(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂) ≤ p₁ − p₂ ≤ (p̂₁ − p̂₂) + z_{α/2}√(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂)) = 1 − α
Note that the quantity √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂) in the LCL and UCL is unknown, since p₁ and p₂ are not known. Also, note that this quantity is the standard error of (p̂₁ − p̂₂). Thus, we estimate the standard error by substituting p̂₁ and p̂₂ for p₁ and p₂, which gives the two-sided confidence interval

((p̂₁ − p̂₂) − z_{α/2}√(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂), (p̂₁ − p̂₂) + z_{α/2}√(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂))  (9.48)
Lower and upper one-sided confidence intervals for (p₁ − p₂) with confidence coefficient 1 − α are given by

((p̂₁ − p̂₂) − z_α√(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂), 1)  (9.49)

and

(−1, (p̂₁ − p̂₂) + z_α√(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂))  (9.50)

respectively.
Example 9.18 Companies A and B claim that the new type of lightbulb
has a lifetime of more than 5,000 hours. In a random sample of 400 bulbs
manufactured by company A, 60 bulbs burned out before the guaranteed
period ended, and in a random sample of 500 bulbs manufactured by company B, 100 bulbs burned out before the guarantee period ended. Find a
point estimate and a 95% confidence interval for the true value of the difference (p1 p2), where p1 and p2 are the proportion of the bulbs manufactured by company A and company B, respectively, that burn out before
the guarantee period, that is, 5,000 hours.
Solution: From the given information, we have

p̂₁ = 60/400 = 3/20 and p̂₂ = 100/500 = 1/5

so a point estimate of (p₁ − p₂) is p̂₁ − p̂₂ = 3/20 − 1/5 = −0.05. We now want to find a 95% confidence interval for (p₁ − p₂). From Equation (9.48), we have

((p̂₁ − p̂₂) − z_{α/2}√(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂), (p̂₁ − p̂₂) + z_{α/2}√(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂))
Substituting the values of p̂₁ and p̂₂ in this relation, with z_{α/2} = 1.96 since α = 0.05, we have

LCL = (3/20 − 1/5) − 1.96√[(3/20)(1 − 3/20)/400 + (1/5)(1 − 1/5)/500] = −0.05 − 0.05 = −0.10

UCL = (3/20 − 1/5) + 1.96√[(3/20)(1 − 3/20)/400 + (1/5)(1 − 1/5)/500] = −0.05 + 0.05 = 0.00
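The arithmetic of Example 9.18 can be checked numerically; `diff_prop_ci` is an illustrative helper, not from the text:

```python
from math import sqrt

def diff_prop_ci(p1, n1, p2, n2, z):
    """CI for p1 - p2 (Equation 9.48 form)."""
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - half, d + half

# Example 9.18: 60/400 vs. 100/500 early burnouts, 95% confidence.
lo, hi = diff_prop_ci(60 / 400, 400, 100 / 500, 500, 1.96)
print(round(lo, 4), round(hi, 4))
```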
n = z²_{α/2} σ²/E²  (9.51)

n = (1.96)²(12)²/(3)² = 61.46 ≈ 62

The engineer should take a sample of size 62 to achieve the goal. Note that the value of n is always rounded up.
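The sample-size formulas in Equations (9.51) and (9.52) round up, which `math.ceil` handles directly; a minimal sketch using the values of the two worked examples:

```python
from math import ceil

def sample_size_mean(z, sigma, E):
    """n = z^2 * sigma^2 / E^2, rounded up (Equation 9.51 form)."""
    return ceil((z * sigma / E) ** 2)

def sample_size_two_means(z, sigma1, sigma2, E):
    """n per group for estimating mu1 - mu2 (Equation 9.52 form)."""
    return ceil(z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / E ** 2)

print(sample_size_mean(1.96, 12, 3))                # prints 62
print(sample_size_two_means(2.575, 2.0, 2.5, 1.2))  # prints 48 (Example 9.20)
```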
Case 2
Let θ = μ₁ − μ₂. In order to determine the sample size in this case, we assume that the sample sizes taken from the two populations are equal, that is, n₁ = n₂ = n. Then the margin of error with probability (1 − α) is given by

E = z_{α/2}√(σ₁²/n + σ₂²/n) = z_{α/2}√((σ₁² + σ₂²)/n)

Now, squaring both sides and doing some algebraic manipulation, we get

n = z²_{α/2}(σ₁² + σ₂²)/E²  (9.52)

where σ₁² and σ₂² are the variances of the populations under consideration.
Example 9.20 Suppose that we want to estimate the difference between two population means μ₁ and μ₂. Further suppose that we know σ₁ = 2.0 and σ₂ = 2.5. How large a sample should be taken so that with probability 99% our estimate is within 1.2 units of the true value of μ₁ − μ₂?

Solution: From the given information, we have

z_{α/2} = z₀.₀₀₅ = 2.575, σ₁ = 2.0, σ₂ = 2.5, E = 1.2

Using Equation (9.52), the desired sample size is

n = (2.575)²((2.0)² + (2.5)²)/(1.2)² = 47.197 ≈ 48
variance, we will have to find the sample variance, for which we would need to have a sample. But to have a sample we must know the sample size we are trying to find. Thus it becomes a vicious circle. To solve this problem we use one of two possible solutions.
a. We use some existing data on the same kind of study to
calculate the sample variance. Then we use the value of the
sample variance to determine the sample size n.
b. We take a preliminary sample, say of size n1, to calculate the
value of the sample variance. Then, we use this value of the
sample variance to determine the sample size n. Since we
already have a sample size n1, now we take another
supplemental sample of size n n1 and then combine the two
samples in order to get a full sample of size n.
Case 3
Let θ = p. In this case the margin of error E is given by

E = z_{α/2}√(p(1 − p)/n)

Now, squaring both sides and doing some algebraic manipulation, we get

n = z²_{α/2} p(1 − p)/E²  (9.53)
Case 4
Let θ = p₁ − p₂. In this case, we assume that the sample sizes taken from the two Bernoulli populations are equal, that is, n₁ = n₂ = n. Then the margin of error E is given by

E = z_{α/2}√(p₁(1 − p₁)/n + p₂(1 − p₂)/n)

Again, squaring both sides and doing some algebraic manipulations, we get the desired sample size n needed to have a margin of error no more than E with probability 1 − α:

n = z²_{α/2}[p₁(1 − p₁) + p₂(1 − p₂)]/E²  (9.54)

p₂ = 0.4
χ² = (n − 1)S²/σ²  (9.55)

P(χ²_{n−1,1−α/2} ≤ (n − 1)S²/σ² ≤ χ²_{n−1,α/2}) = 1 − α

Doing some further algebraic manipulations, we get

P((n − 1)S²/χ²_{n−1,α/2} ≤ σ² ≤ (n − 1)S²/χ²_{n−1,1−α/2}) = 1 − α  (9.56)

where the two endpoints inside the probability statement are the LCL and the UCL, respectively.
σℓ² = (n − 1)S²/χ²_{n−1,α/2} and σu² = (n − 1)S²/χ²_{n−1,1−α/2}  (9.57)
Similarly, starting from

P((n − 1)S²/σ² ≤ χ²_{n−1,α}) = 1 − α

and

P((n − 1)S²/σ² ≥ χ²_{n−1,1−α}) = 1 − α

we get lower and upper one-sided confidence intervals for σ² with confidence coefficient 1 − α as

((n − 1)S²/χ²_{n−1,α}, ∞) and (0, (n − 1)S²/χ²_{n−1,1−α})  (9.58)
respectively. Note that the confidence interval for the population standard deviation σ with confidence coefficient 1 − α is obtained by taking the square root of the corresponding confidence interval for σ². Thus, for example, a two-sided confidence interval for σ with confidence coefficient 1 − α is (σℓ, σu), where

σℓ = √[(n − 1)S²/χ²_{n−1,α/2}] and σu = √[(n − 1)S²/χ²_{n−1,1−α/2}]  (9.59)
Example 9.23 The time taken by a worker in a car manufacturing company to finish a paint job on a car is normally distributed with mean μ and variance σ². A random sample of 15 paint jobs is randomly selected and assigned to that worker, and the time taken by the worker to finish each job is jotted down. These data yield a sample standard deviation of S = 2.5 hours. Find 95% two-sided and one-sided lower and upper confidence intervals for the population standard deviation σ.

Solution: From the given information and using the chi-square distribution table (Table V of the appendix) and Figure 9.5, we have

S = 2.5, α = 0.05, n − 1 = 14

χ²₁₄,₀.₉₇₅ = 5.629, χ²₁₄,₀.₀₂₅ = 26.119

so the two-sided confidence limits for σ² are

σℓ² = (n − 1)S²/χ²₁₄,₀.₀₂₅ = 14(2.5)²/26.119 = 3.35

σu² = (n − 1)S²/χ²₁₄,₀.₉₇₅ = 14(2.5)²/5.629 = 15.54

Figure 9.5 Chi-square distribution with two tail areas each equal to 0.025.
198
Chapter Nine
2 =
u2 =
Therefore, one-sided lower and upper confidence intervals for σ² are (3.69, ∞) and (0, 13.32), respectively. The corresponding confidence intervals for the population standard deviation are found by taking the square root; that is, one-sided lower and upper 95% confidence intervals for σ are (1.92, ∞) and (0, 3.65), respectively.
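A quick numerical check of Example 9.23 in Python. The chi-square critical values are hard-coded from the table, since the standard library has no chi-square quantile function; this is a sketch, not the book's code.

```python
# Numerical check of Example 9.23 via equations (9.56)-(9.59).
from math import sqrt

n, S = 15, 2.5
chi2_lower, chi2_upper = 5.629, 26.119   # chi^2_{14,0.975}, chi^2_{14,0.025}

var_lcl = (n - 1) * S ** 2 / chi2_upper  # lower limit for sigma^2
var_ucl = (n - 1) * S ** 2 / chi2_lower  # upper limit for sigma^2
sd_ci = (sqrt(var_lcl), sqrt(var_ucl))   # interval for sigma

print(round(var_lcl, 2), round(var_ucl, 2))   # -> 3.35 15.54
```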
9.7.2 Confidence Interval for the Ratio of
Two Population Variances
In this section we consider two normal populations with unknown variances σ1² and σ2². We want to find a confidence interval for σ1²/σ2² with confidence coefficient 1 − α. Let X11, X12, ..., X1n1 and X21, X22, ..., X2n2 be random samples from the two independent populations, and let S1² and S2² be the corresponding sample variances. Then, from Theorem 8.5, it follows that the random variable

F = (S1²/σ1²)/(S2²/σ2²)    (9.60)
has an F-distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of freedom. Thus,

P(F_{ν1,ν2,1−α/2} ≤ (S1²/σ1²)/(S2²/σ2²) ≤ F_{ν1,ν2,α/2}) = 1 − α

or

P(F_{ν1,ν2,1−α/2} ≤ (S1²/S2²)(σ2²/σ1²) ≤ F_{ν1,ν2,α/2}) = 1 − α

Now, using F_{ν1,ν2,1−α/2} = 1/F_{ν2,ν1,α/2}, we get

P(F_{ν2,ν1,1−α/2} S1²/S2² ≤ σ1²/σ2² ≤ F_{ν2,ν1,α/2} S1²/S2²) = 1 − α    (9.61)

From Equation (9.61) it follows that a confidence interval for σ1²/σ2² with confidence coefficient 1 − α is

(F_{ν2,ν1,1−α/2} S1²/S2², F_{ν2,ν1,α/2} S1²/S2²)    (9.62)

The corresponding confidence interval for the ratio of the population standard deviations is found by taking the square root of the confidence limits in Equation (9.62), that is,

(√(F_{ν2,ν1,1−α/2} S1²/S2²), √(F_{ν2,ν1,α/2} S1²/S2²))    (9.63)
Lower and upper one-sided confidence intervals for the ratio of the population variances and population standard deviations with confidence coefficient 1 − α are

(F_{ν2,ν1,1−α} S1²/S2², ∞)  and  (0, F_{ν2,ν1,α} S1²/S2²)    (9.64)

and

(√(F_{ν2,ν1,1−α} S1²/S2²), ∞)  and  (0, √(F_{ν2,ν1,α} S1²/S2²))    (9.65)

respectively.
Example 9.24 Two random samples of sizes 13 and 16 are selected from a group of patients with hypertension. The patients in the two samples are independently treated with drugs A and B, respectively. After a full course of treatment, these patients are evaluated. The data collected at the time of evaluation yield sample standard deviations S1 = 6.5 mm Hg and S2 = 7.5 mm Hg. Assume that the two sets of data come from independent normal populations with variances σ1² and σ2², respectively. Determine 95% two-sided and one-sided confidence intervals for σ1²/σ2² and σ1/σ2.
Solution: From the given information and using the F-distribution table and Figure 9.6, we have

ν1 = 12,  ν2 = 15,  S1 = 6.5,  S2 = 7.5,  S1²/S2² = (6.5)²/(7.5)² = 0.7511

Figure 9.6 F-distribution curves: (a) shaded areas under the two tails each equal to 0.025; (b) shaded area under the left tail equal to 0.05; (c) shaded area under the right tail equal to 0.05.

Thus, a two-sided confidence interval for the ratio σ1²/σ2² of the two population variances is determined by substituting these values in Equation (9.62) (see Figure 9.6(a)), that is,

(F_{15,12,0.975} S1²/S2², F_{15,12,0.025} S1²/S2²) = ((0.3378)(0.7511), (3.18)(0.7511)) = (0.2537, 2.3884)
2
and for the ratio of the standard deviations 1/2 the two-sided condence
interval is found by taking the square root, which gives
(0.5037, 1.5454)
Using Equations (9.64) and (9.65) and Figures 9.6(b) and 9.6(c), we get
95% lower and upper one-sided condence intervals for 12/22 and 1/2 as
(0.3028, ), (0, 1.9679) and (0.5502, ), (0, 1.4028)
respectively.
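A numerical check of Example 9.24 along the same lines; the F critical values F_{15,12,0.975} = 0.3378 and F_{15,12,0.025} = 3.18 are hard-coded from tables, and this is only a sketch.

```python
# Numerical check of Example 9.24 via equations (9.62)-(9.63).
from math import sqrt

f_lo, f_hi = 0.3378, 3.18          # F_{15,12,0.975}, F_{15,12,0.025}
S1, S2 = 6.5, 7.5
ratio = S1 ** 2 / S2 ** 2          # S1^2/S2^2 = 0.7511

var_ratio_ci = (f_lo * ratio, f_hi * ratio)          # interval for sigma1^2/sigma2^2
sd_ratio_ci = (sqrt(var_ratio_ci[0]), sqrt(var_ratio_ci[1]))  # for sigma1/sigma2
```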
10
Hypothesis Testing

DESCRIPTION
HYPOTHESIS TESTING: A decision-making process, based upon the information contained in a sample, about whether an unknown population parameter can take some assigned value.
H0: μ = μ0    (10.1)
versus
H1: μ < μ0    (10.2)

where μ0 is known. Thus, under the null hypothesis it is believed that μ takes some known value μ0, whereas under the alternative hypothesis, our assertion, based on some theory or some new information, is that μ takes a value less than μ0. Should we have some different information, then that could lead us to another alternative hypothesis, namely

H1: μ > μ0    (10.3)

The alternative hypothesis H1: μ < μ0 or H1: μ > μ0 is called a one-tail alternative, whereas the alternative hypothesis

H1: μ ≠ μ0    (10.4)
is called a two-tail alternative. These names for the alternative hypotheses are
based on reasons that will become clear as we move forward. Having defined these terms, the question now is, how do we test these hypotheses? The most logical answer is that, to make any decision, we are going to use the information contained in a sample that has been drawn from the population with probability model f(x, θ), where θ is unknown. We consider some statistic, called the test statistic, say θ̂, which, for example, may be an estimator of θ. Using the sample data, we calculate the value of the test statistic. For certain values of the test statistic we favor the alternative hypothesis H1 and reject the null hypothesis H0, whereas for any other value of the test statistic we do not reject the null hypothesis H0. Thus, for example, consider the following hypothesis:

H0: θ = θ0 versus H1: θ < θ0

It seems quite reasonable that if the value of the test statistic θ̂ turns out to be too small, then we should favor the alternative hypothesis H1
and reject the null hypothesis H0; otherwise we should not reject H0. The decision of how small a value of θ̂ is too small can be made by considering the sample space of θ̂ and dividing it into two regions, so that if the value of θ̂ falls in the lower region, the shaded region in Figure 10.1(a), we reject H0; otherwise we do not reject H0. The region for which we reject H0 is usually known as the rejection region or critical region, and the region for which we do not reject the null hypothesis H0 is known as the acceptance region. The point separating these two regions is called the critical point.
Using the same argument, we can easily see that for the alternatives

H1: θ > θ0  and  H1: θ ≠ θ0

the hypothesis H1: θ > θ0 is favored for large values of θ̂, while the hypothesis H1: θ ≠ θ0 is favored when θ̂ is either very small or very large. Thus, the rejection regions will fall in the upper region, and in both the lower and upper regions, respectively. These regions are shown in Figures 10.1(b) and 10.1(c), respectively.
Now it should be clear why we call the alternatives θ < θ0 and θ > θ0 one-tail and the alternative θ ≠ θ0 two-tail: it is because of the location of the rejection regions. In the first two cases the rejection region is located on only one side, while in the third case it is located on both sides.
We have developed the above procedure of using the information contained in a sample, by means of a statistic, to make a decision about the
unknown parameters and consequently, about the population itself. Having
done this, the next question that one might ask is whether there are any risks
Figure 10.1 Critical points dividing the sample space of θ̂ into two regions, the rejection region and the acceptance region: (a) H1: θ < θ0; (b) H1: θ > θ0; (c) H1: θ ≠ θ0.
of committing any errors while making such decisions. The answer is yes.
There are two risks. One occurs when the null hypothesis is true but, based
on the information contained in the sample, we end up rejecting it. This type
of error is called type I error. The second kind of error is when the null
hypothesis is false or the alternative hypothesis is true but still we do not
reject the null hypothesis. This kind of error is called type II error. Note that
these errors cannot be eliminated, but they certainly can be minimized. For
that we will have to pay some price. We shall study this aspect of the problem a little later in this chapter.
Certain probabilities are associated with committing type I and type II
errors, which we denote by and , respectively. We may dene and as
follows:
α = P(rejecting H0 | H0 is true)    (10.5)
β = P(not rejecting H0 | H0 is false)    (10.6)

The four possible outcomes are summarized below:

                      H0 is true           H0 is false
Reject H0             Type I error (α)     Correct decision
Do not reject H0      Correct decision     Type II error (β)
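The meaning of α can be illustrated by simulation: repeatedly testing a true null hypothesis should produce type I errors at a rate close to α. A minimal sketch using only the standard library (the function name is my own):

```python
# Monte Carlo estimate of the type I error rate of a two-tail z-test
# when H0 is true (sampling from N(0, 1) and testing H0: mu = 0).
import random
from statistics import NormalDist, mean

def rejects(sample, mu0, sigma, alpha=0.05):
    """Two-tail z-test decision for one sample."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

random.seed(42)
reps = 2000
false_rejections = sum(
    rejects([random.gauss(0.0, 1.0) for _ in range(30)], mu0=0.0, sigma=1.0)
    for _ in range(reps)
)
rate = false_rejections / reps
print(rate)   # close to alpha = 0.05
```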
normal. In such cases the value of β can easily be obtained by using one of the following formulas:

β = P(Z > (μ0 − μ1)/(σ/√n) − z_α),  if H1: μ < μ0    (10.7)

β = P(Z < (μ0 − μ1)/(σ/√n) + z_α),  if H1: μ > μ0    (10.8)

β = P((μ0 − μ1)/(σ/√n) − z_{α/2} < Z < (μ0 − μ1)/(σ/√n) + z_{α/2}),  if H1: μ ≠ μ0    (10.9)

where μ1 is the value of μ under the alternative hypothesis.
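Formulas (10.7)–(10.9) translate directly into a small Python helper (a sketch; the `alternative` labels are my own, not the book's):

```python
# Type II error from equations (10.7)-(10.9), standard library only.
from statistics import NormalDist

def beta(mu0, mu1, sigma, n, alpha, alternative):
    """Return beta for the stated alternative hypothesis."""
    nd = NormalDist()
    d = (mu0 - mu1) / (sigma / n ** 0.5)
    if alternative == "less":            # H1: mu < mu0, eq. (10.7)
        return 1 - nd.cdf(d - nd.inv_cdf(1 - alpha))
    if alternative == "greater":         # H1: mu > mu0, eq. (10.8)
        return nd.cdf(d + nd.inv_cdf(1 - alpha))
    z = nd.inv_cdf(1 - alpha / 2)        # H1: mu != mu0, eq. (10.9)
    return nd.cdf(d + z) - nd.cdf(d - z)

# Power at mu1 = 1010 for the two-tail test with mu0 = 980, sigma = 120,
# n = 36, alpha = 0.01 (compare the table in Example 10.3 below).
print(round(1 - beta(980, 1010, 120, 36, 0.01, "two"), 2))   # -> 0.14
```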
Figure 10.2 OC-curves (β versus μ1) under the alternative hypotheses H1: μ < μ0, H1: μ > μ0, and H1: μ ≠ μ0.
From these formulas we can easily see that under the alternative hypothesis, if we set μ1 = μ0, then β will always be (1 − α) and the power of the test will be α.
If we plot a graph of the values of versus different values of 1 under
the alternative hypothesis, we get a curve known as the operating characteristic curve or simply the OC-curve. The OC-curves for different alternative
hypotheses are shown in Figure 10.2.
If we now plot a graph of power (1 − β) versus different values of μ1 under the alternative hypothesis, we get a curve known as the power curve. Power curves under different alternative hypotheses are shown in Figure 10.3.
It is quite clear that although the value of α is predetermined, the value of β is determined later, and it depends upon the alternative hypothesis. Remember, when the value of β is lower, the power of the test is higher and the test is, therefore, a better test. At this juncture one might ask whether one can ever assign some predetermined value to β as well. The answer is yes, but at a certain cost. What cost? The only cost is that the appropriate sample size
needed to achieve this goal may turn out to be quite large. For given values of α and β, the sample size n should be such that

n ≥ (z_α + z_β)² σ²/(μ1 − μ0)²    (10.10)

for a one-tail test, and

n ≥ (z_{α/2} + z_β)² σ²/(μ1 − μ0)²    (10.11)

for a two-tail test.
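Formulas (10.10) and (10.11) can be sketched as follows (the helper name is illustrative):

```python
# Smallest n meeting the alpha and beta goals, equations (10.10)/(10.11).
from math import ceil
from statistics import NormalDist

def sample_size(mu0, mu1, sigma, alpha, beta, two_tail=False):
    nd = NormalDist()
    za = nd.inv_cdf(1 - (alpha / 2 if two_tail else alpha))
    zb = nd.inv_cdf(1 - beta)
    return ceil((za + zb) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2)

# One-tail test with alpha = 0.05, beta = 0.10, sigma = 120, |mu1 - mu0| = 30.
print(sample_size(980, 1010, 120, 0.05, 0.10))   # -> 138
```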
Step 1 Define the null and alternative hypotheses.

(i) H0: μ = μ0 versus H1: μ < μ0,
(ii) H0: μ = μ0 versus H1: μ > μ0,
or
(iii) H0: μ = μ0 versus H1: μ ≠ μ0    (10.12)
Step 2 Assign some suitable value to α, say α = 0.05.

Step 3 Determine the suitable test statistic.
We consider the pivotal quantity

Z = (X̄ − μ)/(σ/√n)    (10.13)

for μ as a test statistic for testing hypotheses (i), (ii), or (iii) about μ.

Step 4 Determine the probability distribution of the test statistic.
Since we are assuming the sample size to be large (n ≥ 30), by the central limit theorem we know that the test statistic Z = (X̄ − μ)/(σ/√n) is distributed as standard normal, that is, normal with mean 0 and standard deviation 1.
Step 5 Find the rejection regions.
Since the location of the rejection region depends upon the alternative hypothesis and its size depends upon the size of the type I error, set in this case at α = 0.05, the rejection regions for all three hypotheses are as shown in Figure 10.4.

Figure 10.4 Rejection regions for hypotheses (i), (ii), and (iii): (i) Z < −1.645; (ii) Z > 1.645; (iii) |Z| > 1.96.

Note that because of the location of the rejection regions, the hypotheses (i), (ii), and (iii) are sometimes known as lower-tail, upper-tail, and two-tail hypotheses, respectively.
Step 6 Calculate the value of the test statistic and make the decision.
Now we take a random sample from the given population and calculate the value of the test statistic

Z = (X̄ − μ)/(σ/√n)

Note that in the test statistic σ and n are known, and X̄ is calculated using the sample data. The value of μ is always taken equal to μ0, since we always test a hypothesis under the assumption that the null hypothesis is true. Then, if the value of the test statistic falls in the rejection region, we contradict our assumption and reject H0. Otherwise, we do not reject H0.
Example 10.1 A random sample of 36 pieces of copper wire produced in a plant of a wire manufacturing company yields a mean tensile strength of X̄ = 950 psi. Suppose that the population of tensile strengths of all copper wires produced in that plant is distributed with mean μ and standard deviation σ = 120 psi. Test the statistical hypothesis

H0: μ = 980 versus H1: μ ≠ 980

at the α = 0.01 level of significance.
Solution:
Step 1 H0: μ = 980 versus H1: μ ≠ 980
Step 2 α = 0.01
Step 3 The test statistic is Z = (X̄ − μ)/(σ/√n).
Step 4 Since the sample size is large, the test statistic is distributed as standard normal.
Step 5 The test is two-tail, so with α = 0.01 the rejection region is |Z| > z_{0.005} = 2.575.
Step 6 The observed value of the test statistic is

Z = (X̄ − μ)/(σ/√n) = (950 − 980)/(120/√36) = −1.5

This value does not fall in the rejection region, so we do not reject the null hypothesis H0. In other words, the data seem to support the hypothesis that the mean tensile strength of the copper wires manufactured in that plant is 980 psi.
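Example 10.1 can be replicated in a few lines of Python (a sketch, not the book's code; the function name is my own):

```python
# One-sample z-test, reproducing Example 10.1.
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, alternative="two"):
    nd = NormalDist()
    z = (xbar - mu0) / (sigma / n ** 0.5)
    if alternative == "less":
        reject = z < nd.inv_cdf(alpha)
    elif alternative == "greater":
        reject = z > nd.inv_cdf(1 - alpha)
    else:
        reject = abs(z) > nd.inv_cdf(1 - alpha / 2)
    return z, reject

z, reject = z_test(950, 980, 120, 36, 0.01)
print(z, reject)   # -> -1.5 False
```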
Sometimes, instead of determining the rejection region (in step 5) and then verifying whether the value of the test statistic falls in the rejection region, we use a different method to make our decision. This method makes use of a quantity called the p-value.

Definition 10.1 The p-value of a test is the smallest value of α for which the null hypothesis H0 is rejected.

Given that the sample has been taken and the value z of the test statistic Z has been computed, the p-value may be determined as follows:

p-value = P(Z ≤ z),  if H1: μ < μ0    (10.14)
p-value = P(Z ≥ z),  if H1: μ > μ0    (10.15)
p-value = 2P(Z ≥ |z|),  if H1: μ ≠ μ0    (10.16)
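Equations (10.14)–(10.16) in code (a sketch; the `alternative` labels are my own):

```python
# p-values from equations (10.14)-(10.16).
from statistics import NormalDist

def p_value(z, alternative):
    nd = NormalDist()
    if alternative == "less":      # H1: mu < mu0
        return nd.cdf(z)
    if alternative == "greater":   # H1: mu > mu0
        return 1 - nd.cdf(z)
    return 2 * (1 - nd.cdf(abs(z)))  # H1: mu != mu0

# Two-tail p-value for the z = -1.5 observed in Example 10.1.
print(round(p_value(-1.5, "two"), 3))   # -> 0.134
```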
Example 10.3 For the test in Example 10.2, compute the power 1 − β at several values of μ under the alternative hypothesis and plot the power curve.

Solution: Since the hypothesis in Example 10.2 is a two-tail test, we calculate the value of β at μ1 = 860, 880, 900, 925, 950, 975, 985, 1010, 1035, 1060, 1080, and 1100. Note that we have selected these values in such a way that some of them are smaller and some are larger than the value of μ under the null hypothesis. This clearly satisfies the requirement of the alternative hypothesis that μ ≠ μ0, since μ0 = 980. Now using the formula given in Equation (10.9) for calculating the value of β, we have, for example, at μ1 = 860,

β = P((980 − 860)/20 − 2.575 < Z < (980 − 860)/20 + 2.575) = P(3.425 < Z < 8.575) ≈ 0

so that 1 − β ≈ 1.0. Proceeding in the same manner for the other values of μ1, we obtain:

μ1 = 860:  1 − β = 1.0000
μ1 = 880:  1 − β = 0.9923
μ1 = 900:  1 − β = 0.9229
μ1 = 925:  1 − β = 0.5695
μ1 = 950:  1 − β = 0.1412
μ1 = 975:  1 − β = 0.0124
μ1 = 985:  1 − β = 0.0124
μ1 = 1010: 1 − β = 0.1412
μ1 = 1035: 1 − β = 0.5695
μ1 = 1060: 1 − β = 0.9229
μ1 = 1080: 1 − β = 0.9923
μ1 = 1100: 1 − β = 1.0000
Now the power curve for the test in Example 10.2 is obtained by plotting the values of 1 − β versus the values of μ1. The power curve is shown in Figure 10.7.
It is important to remember that there is no analytic relationship between the type I error (α) and the type II error (β). That is, there does not exist any function f such that β = f(α). However, one can easily see that if everything except α and β is fixed, then as α increases β decreases, and as α decreases β increases.

Figure 10.7 Power curve (1 − β versus μ) for the test in Example 10.2.
Let X̄ be the sample mean and S² be the sample variance. Then we want to test one of the hypotheses defined below in step 1 at the α level of significance, say α = 0.05, assuming that the sample size is large (n ≥ 30).

Step 1 (i) H0: μ = μ0 versus H1: μ < μ0,
(ii) H0: μ = μ0 versus H1: μ > μ0,
or
(iii) H0: μ = μ0 versus H1: μ ≠ μ0

Step 2 α = 0.05

Step 3 We consider as a test statistic the pivotal quantity

Z = (X̄ − μ)/(S/√n)    (10.17)
for μ. Note that in section 10.2.1 we considered (X̄ − μ)/(σ/√n) as a pivotal quantity, since in that case we knew σ. But in the present case we do not know σ and, therefore, (X̄ − μ)/(σ/√n) is not a pivotal quantity, since a pivotal quantity does not contain any unknown parameter other than the one under consideration, which in this case is μ.
Note that in the test statistic n is known, X̄ and S are calculated using the sample data, and the value of μ is always taken equal to μ0, that is, the value of μ under the null hypothesis. Then, if the value of the test statistic falls in the rejection region, we contradict our assertion and reject H0. Otherwise we do not reject H0.
Figure 10.8 Rejection regions for hypotheses (i), (ii), and (iii): (i) Z < −1.645; (ii) Z > 1.645; (iii) |Z| > 1.96.
Solution:
Step 1 H0: μ = 61,000 versus H1: μ < 61,000

The observed value of the test statistic

Z = (X̄ − μ)/(S/√n)

is z = −2.0, which falls in the rejection region. Thus, we reject the null hypothesis H0.
The p-value for the test is given by

p-value = P(Z ≤ z) = P(Z ≤ −2.0) = 0.0228
Figure 10.9 Rejection region (Z < −1.645) under the lower-tail test with α = 0.05.
and the type II error at μ1 = 60,500 miles, using Equation (10.7), is given by

β = P(Z > (μ0 − μ1)/(S/√n) − z_α) = P(Z > (61,000 − 60,500)/(4,000/√64) − 1.645) = P(Z > −0.645) ≈ 0.74
sample sizes are large. Let X̄1 and X̄2 be the sample means of the samples from populations I and II, respectively. Then we are interested in testing one of the hypotheses defined below, in step 1, at the α level of significance.
To test these hypotheses when the variances are known, we go through the same six steps that we followed in section 10.2.1 for testing hypotheses about the mean of one population. Thus, we have

Step 1
(i) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 < 0,
(ii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 > 0,
or
(iii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0    (10.18)

The test statistic is the pivotal quantity for μ1 − μ2, that is,

Z = ((X̄1 − X̄2) − (μ1 − μ2))/√(σ1²/n1 + σ2²/n2)    (10.19)
Figure 10.10 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α = 0.05 level of significance: (i) Z < −1.645; (ii) Z > 1.645; (iii) |Z| > 1.96.
Since the sample sizes are large, using the central limit theorem and Theorem 9.1, we can easily show that the test statistic in Equation (10.19), that is,

Z = ((X̄1 − X̄2) − (μ1 − μ2))/√(σ1²/n1 + σ2²/n2)

is distributed as standard normal, that is, normal with mean 0 and standard deviation 1.

Step 5 Find the rejection regions.
As explained in section 10.2, the location of the rejection regions is determined by the alternative hypothesis, and their size is determined by the size α of the type I error. Using α = 0.05, the rejection region for each of the above hypotheses is shown in Figure 10.10.
Step 6 Now take two samples, one from each of the populations I and II, and calculate the sample means. Then calculate the observed value of the test statistic; if it falls in the rejection region, reject the null hypothesis H0. Otherwise do not reject H0.

Example 10.5 Suppose two random samples, one from each of population I and population II, with known variances σ1² = 23.4 and σ2² = 20.6, yielded the following sample statistics:

n1 = 50,  X̄1 = 38.5;  n2 = 45,  X̄2 = 35.8

Test at the α = 0.05 level of significance the hypothesis H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 > 0.
Solution:
Step 1 H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 > 0
Step 2 α = 0.05

Figure 10.11 Rejection region (Z > 1.645) under the upper tail with α = 0.05.

Step 3 The test statistic is

Z = ((X̄1 − X̄2) − (μ1 − μ2))/√(σ1²/n1 + σ2²/n2)

Step 4 Since the sample sizes are large, using the central limit theorem we can easily show that the test statistic is distributed as standard normal, that is, normal with mean 0 and standard deviation 1.
Step 5 The hypothesis in this example is clearly an upper-tail hypothesis. The rejection region is as shown in Figure 10.11.
Step 6 Substituting the values of X̄1 and X̄2, σ1² and σ2², and the value μ1 − μ2 = 0 under the null hypothesis into the test statistic, the observed value of the test statistic is

Z = ((38.5 − 35.8) − 0)/√(23.4/50 + 20.6/45) = 2.806

Clearly this value falls in the rejection region. Thus, we reject the null hypothesis of equal means. In other words, based upon the given information, we can conclude at the α = 0.05 level of significance that μ1 is greater than μ2.
The p-value in this example (see Equation (10.15)) can be found using the normal tables:

p-value = P(Z ≥ z) = P(Z ≥ 2.806) = 0.0026
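A numerical check of Example 10.5 (a sketch; the function name is illustrative):

```python
# Two-sample z statistic with known variances, equation (10.19) under H0.
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, x2, var1, var2, n1, n2):
    return (x1 - x2) / sqrt(var1 / n1 + var2 / n2)

z = two_sample_z(38.5, 35.8, 23.4, 20.6, 50, 45)
p = 1 - NormalDist().cdf(z)          # upper-tail p-value
print(round(z, 3))   # -> 2.806
```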
Example 10.6 A supplier furnishes two types of filament, type I and type II, to a manufacturer of electric bulbs. Suppose an electrical engineer in the manufacturing company wants to compare the average resistance of the two types of filament. In order to do so, he takes two samples, of size n1 = 36 of filament type I and size n2 = 40 of filament type II. The two samples yield sample means X̄1 = 7.35 Ohms and X̄2 = 7.65 Ohms, respectively. If from experience it is known that the standard deviations of the two filaments are σ1 = 0.50 Ohms and σ2 = 0.64 Ohms, respectively, test at the α = 0.05 level of significance the hypothesis H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0.

Figure 10.12 Rejection regions (|Z| > 1.96) under the two tails with α = 0.05.

Solution:
Step 1 H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0
Step 2 α = 0.05
Step 3 The test statistic for testing the above hypothesis is the pivotal quantity for μ1 − μ2, that is,

Z = ((X̄1 − X̄2) − (μ1 − μ2))/√(σ1²/n1 + σ2²/n2) = ((7.35 − 7.65) − 0)/√((0.50)²/36 + (0.64)²/40) = −2.28

The observed value of the test statistic falls in the rejection region, and we reject the null hypothesis H0. In other words, we conclude at the α = 0.05 level of significance that the two filaments have different resistances.
10.3.2 Population Variances Are Unknown
Consider two populations with probability distribution models f1(x) and f2(x), with means μ1 and μ2 and variances σ1² and σ2², respectively. Let X11, X12, X13, ..., X1n1 and X21, X22, X23, ..., X2n2 be random samples from populations I and II, respectively. Let X̄1 and X̄2 be the sample means and S1² and S2² be the sample variances of the two samples. Then we are interested in testing one of the hypotheses defined below in step 1 at the α level of significance.
The method for testing these hypotheses when the variances are unknown is exactly the same as when the variances are known, discussed in section 10.3.1, except that the population variances are replaced with the sample variances. We proceed as follows:

Step 1 Define the null and alternative hypotheses.

(i) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 < 0,
(ii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 > 0,
or
(iii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0    (10.20)

The test statistic is the pivotal quantity

Z = ((X̄1 − X̄2) − (μ1 − μ2))/√(S1²/n1 + S2²/n2)    (10.21)

Figure 10.13 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α = 0.05 level of significance: (i) Z < −1.645; (ii) Z > 1.645; (iii) |Z| > 1.96.
Example 10.7 Rotor shafts of the same diameter are being manufactured at two different facilities of a manufacturing company. A random sample of size n1 = 72 rotor shafts from one facility produced a mean diameter of 0.536 inch with a standard deviation of 0.007 inch, while a sample of size n2 = 60 from the second facility produced a mean diameter of 0.540 inch with a standard deviation of 0.01 inch.

(i) Test the null hypothesis H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0 at the α = 0.05 level of significance.
(ii) Find the p-value for the test in part (i).
(iii) Find the size of the type II error and the power of the test if the true value of μ1 − μ2 is −0.002.
Solution:
(i) The test is two-tail with α = 0.05, so the rejection region is |Z| > 1.96. The observed value of the test statistic is

Z = ((0.536 − 0.540) − 0)/√((0.007)²/72 + (0.01)²/60) = −2.61

which obviously falls in the rejection region. Thus, we reject the null hypothesis H0.
(ii) Since the test is a two-tail test, the p-value of the test is given by

p-value = 2P(Z ≥ 2.61) = 2(0.0045) = 0.009
(iii) Using Formula (10.9) for calculating the type II error β, with the true value (μ1 − μ2)1 = −0.002 and (μ1 − μ2)0 = 0 denoting the value under H0, we get

β = P(((μ1 − μ2)0 − (μ1 − μ2)1)/√(S1²/n1 + S2²/n2) − z_{α/2} ≤ Z ≤ ((μ1 − μ2)0 − (μ1 − μ2)1)/√(S1²/n1 + S2²/n2) + z_{α/2})
  = P((0 − (−0.002))/√((0.007)²/72 + (0.01)²/60) − 1.96 ≤ Z ≤ (0 − (−0.002))/√((0.007)²/72 + (0.01)²/60) + 1.96)
  = P(−0.65 ≤ Z ≤ 3.27) ≈ 0.74
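All three parts of Example 10.7 can be checked numerically (a sketch using only the standard library):

```python
# Numerical check of Example 10.7: z statistic, two-tail p-value, and beta.
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
se = sqrt(0.007 ** 2 / 72 + 0.01 ** 2 / 60)    # standard error of X1bar - X2bar
z = (0.536 - 0.540) / se                       # part (i)
p = 2 * (1 - nd.cdf(abs(z)))                   # part (ii)

# part (iii): beta at (mu1 - mu2)_1 = -0.002, from equation (10.9)
shift = (0 - (-0.002)) / se
zc = nd.inv_cdf(0.975)
beta = nd.cdf(shift + zc) - nd.cdf(shift - zc)
print(round(z, 2), round(beta, 2))   # -> -2.61 0.74
```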
circumstances the experimenter may have to evaluate whether it is more beneficial to take a larger sample or to take a smaller sample and accept somewhat less accurate results. So she may choose to take a smaller sample, or she may have no choice other than to take a smaller sample. In this and the next two sections we work with the problem of testing hypotheses when sample sizes are small.
In this section, we assume that the sample is drawn from a population distributed normally with an unknown mean μ and a variance σ² that may or may not be known.
10.4.1 Population Variance Is Known
Let X1, X2, X3, ..., Xn be a random sample from a normal population with an unknown mean μ and a known variance σ². Let X̄ be the sample mean. We would like to test one of the hypotheses defined in step 1 at the α level of significance. As in section 10.3, we use the six-step method to test these hypotheses.

Step 1 Define the null and alternative hypotheses.

(i) H0: μ = μ0 versus H1: μ < μ0,
(ii) H0: μ = μ0 versus H1: μ > μ0,
or
(iii) H0: μ = μ0 versus H1: μ ≠ μ0    (10.22)
Step 2 Assign a predetermined value to the type I error α, say α = 0.05.

Step 3 Determine a suitable test statistic.
Since the variance σ² is known and the population is normally distributed, we consider the pivotal quantity

Z = (X̄ − μ)/(σ/√n)

for μ as a test statistic.

Step 4 Find the probability distribution of the test statistic.
Since the sample has been drawn from a normal population with known variance, the test statistic in step 3 is distributed as standard normal, N(0, 1).
Step 5 Find the rejection region.
Using the arguments discussed in sections 10.2 and 10.3, it can be shown that the rejection regions are as shown in Figure 10.15.
Step 6 Calculate the observed value of the test statistic and make a decision.
Use the sample data to calculate the sample mean X̄. Then calculate the value of the test statistic under the null hypothesis H0: μ = μ0. If the observed value of the test statistic falls in the rejection region, then reject the null hypothesis H0. Otherwise do not reject H0.
Figure 10.15 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α = 0.05 level of significance: (i) Z < −1.645; (ii) Z > 1.645; (iii) |Z| > 1.96.
Figure 10.16 Rejection region (Z < −1.645) under the lower tail with α = 0.05.

The observed value of the test statistic is

Z = (X̄ − μ)/(σ/√n) = (68 − 75)/(10/√16) = −2.8

which falls in the rejection region. Thus, we reject the null hypothesis H0 and conclude that, based upon the data, the travel time is less than 75 minutes.
The p-value is given by

p-value = P(Z ≤ z) = P(Z ≤ −2.8) = 0.0026
To find the type II error at μ1 = 72, we proceed as follows:

β = P(Z > (μ0 − μ1)/(σ/√n) − z_α) = P(Z > (75 − 72)/(10/√16) − 1.645) = P(Z > −0.445) ≈ 0.67
Figure 10.17 Rejection regions (|Z| > 1.96) under the two tails with α = 0.05.

The observed value of the test statistic is

Z = (X̄ − μ)/(σ/√n) = (10.5 − 10)/(2.5/√25) = 1.0
which does not fall in the rejection region. Thus, we do not reject the null hypothesis H0. In other words, the data support the advising office's claim that μ = 10 at the α = 0.05 level of significance.
The p-value is given by

p-value = P(Z ≤ −z) + P(Z ≥ z) = 2P(Z ≥ |z|) = 2(0.1587) = 0.3174
10.4.2 Population Variance Is Unknown
In this section, as in Chapter 9, we shall invoke the use of Student's t-distribution with (n − 1) degrees of freedom. Note that we use the t-distribution only when all of the following conditions hold:

(i) The sampled population is at least approximately normal.
(ii) The sample size is small (n < 30).
(iii) The population variance σ² is unknown.

Let X1, X2, X3, ..., Xn be a random sample from a normal population with unknown mean μ and unknown variance σ². Let X̄ and S be the sample mean and the sample standard deviation, respectively. Then we want to test the following hypotheses about the mean μ, defined in step 1, at the α level of significance.
The six-step method to test these hypotheses is as follows:

Step 1 Define the null and alternative hypotheses.

(i) H0: μ = μ0 versus H1: μ < μ0,
(ii) H0: μ = μ0 versus H1: μ > μ0,
or
(iii) H0: μ = μ0 versus H1: μ ≠ μ0    (10.23)

The test statistic is

T = (X̄ − μ)/(S/√n)    (10.24)

which under H0 has a t-distribution with n − 1 degrees of freedom.

Figure 10.18 Rejection regions for testing hypotheses (i), (ii), and (iii) at the given α: (i) T < −t_{n−1,0.05}; (ii) T > t_{n−1,0.05}; (iii) |T| > t_{n−1,0.025}.
of the company's belief. Assume that the assembly times are normally distributed. Find the p-value.
Solution:
Step 1 H0: μ = 30 versus H1: μ > 30
Step 2 α = 0.05
Step 3 The test statistic is

T = (X̄ − μ)/(S/√n) = (33 − 30)/(6/√16) = 2.0
Since the value of the test statistic T = 2.0 falls in the rejection region, we reject the null hypothesis H0.
To find the exact p-value, we would have to integrate the density function of the t-distribution with 15 degrees of freedom between the limits 2.0 and ∞. This is beyond the scope of this book, so we content ourselves with finding a pair of values within which the p-value falls, which can be done simply by using the t-distribution table (Table IV of the appendix). To achieve this goal, we proceed as follows.
We find two entries in the t-distribution table with 15 degrees of freedom such that one value is just smaller and the other is just larger than the observed
Figure 10.19 Rejection region (T > 1.753) under the upper tail with α = 0.05.
value of the test statistic. Thus, for example, in this case these entries are 1.753 and 2.131. This implies that

P(T ≥ 2.131) < P(T ≥ 2.0) < P(T ≥ 1.753)

or

0.025 < p-value < 0.05

since the values 1.753 and 2.131 correspond to upper-tail areas of 0.05 and 0.025, respectively. Note that sometimes the value of the test statistic is either so small or so large that it is not possible to find two entries in the table that enclose it. Using the case above, suppose the observed value of the test statistic is t = 3.0. We find only one entry, 2.947, which is just smaller than t = 3.0, and there is no larger entry. In this case the p-value satisfies

p-value = P(T ≥ 3.0) < P(T ≥ 2.947) = 0.005

That is, the p-value is less than 0.005.
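The table-bracketing procedure above can be sketched as a small helper; the (tail area, critical value) pairs are the df = 15 row of a standard t-table, and the function name is my own.

```python
# Bracket P(T >= t_obs) between two tail areas using t-table entries
# for df = 15, ordered by decreasing tail area (increasing critical value).
TABLE_15 = [(0.10, 1.341), (0.05, 1.753), (0.025, 2.131),
            (0.01, 2.602), (0.005, 2.947)]

def bracket_p(t_obs, table):
    """Return (low, high) bounds on the upper-tail p-value."""
    low, high = 0.0, 1.0
    for area, crit in table:
        if crit <= t_obs:
            high = area        # t_obs is beyond this point, so p < area
        else:
            low = area         # t_obs is before this point, so p > area
            break
    return low, high

print(bracket_p(2.0, TABLE_15))   # -> (0.025, 0.05)
```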
(i) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 < 0,
(ii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 > 0,
or
(iii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0    (10.25)
We shall consider three possible scenarios:

1. Population variances σ1² and σ2² are known.
2. Population variances σ1² and σ2² are unknown, but we can assume that they are equal, that is, σ1² = σ2² = σ².
3. Population variances σ1² and σ2² are unknown, and we cannot assume that they are equal, that is, σ1² ≠ σ2².
For testing each of these hypotheses, we shall consider the pivotal quantity for μ1 − μ2 as the test statistic, which will depend upon whether the population variances are known. Since under the last two scenarios we shall be using Student's t-distribution, it is quite important to review the conditions under which we use it.

When to Use Student's t-Distribution under Scenarios 2 and 3:
1. The sampled populations are at least approximately normal.
2. The samples are independent and at least one of the sample sizes is small.
3. The population variances σ1² and σ2² are unknown.

Note that under scenario 2 we assume that σ1² = σ2² = σ², which implies that, as far as the variance is concerned, the two populations are identical. Thus, as discussed in Chapter 9, to estimate the common unknown variance σ² we use the information from both samples. Such an estimator, denoted by Sp² and usually known as the pooled estimator of σ², is defined as

Sp² = ((n1 − 1)S1² + (n2 − 1)S2²)/(n1 + n2 − 2)    (10.26)

Having said all this, we now proceed to consider, one by one, the three scenarios. Under scenario 1, where the variances are known, the test statistic is

Z = ((X̄1 − X̄2) − (μ1 − μ2))/√(σ1²/n1 + σ2²/n2)    (10.27)
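Equation (10.26) in code (a sketch; the function name is illustrative):

```python
# Pooled estimator of the common variance, equation (10.26).
def pooled_variance(n1, s1_sq, n2, s2_sq):
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Small illustration with made-up values n1 = 5, S1^2 = 4, n2 = 7, S2^2 = 9:
print(pooled_variance(5, 4, 7, 9))   # -> 7.0
```

Note that Sp² is a weighted average of the two sample variances, with weights proportional to the degrees of freedom n1 − 1 and n2 − 1.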
Solution:
Step 1 H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 > 0
Step 2 α = 0.01
Step 3 The test statistic is

Z = ((X̄1 − X̄2) − (μ1 − μ2))/√(σ1²/n1 + σ2²/n2) = ((92.8 − 90.1) − 0)/√((1.5)²/12 + (1.5)²/16) = 2.7/0.573 = 4.71

This value of the test statistic clearly falls in the rejection region. Thus, we reject the null hypothesis H0.

p-value = P(Z ≥ z) = P(Z ≥ 4.71) ≈ 0

Figure 10.20 Rejection region (Z > 2.33) under the upper tail with α = 0.01.
The type II error at the true value (μ1 − μ2)1 = 2, using Equation (10.8), is

β = P(Z < ((μ1 − μ2)0 − (μ1 − μ2)1)/√(σ1²/n1 + σ2²/n2) + z_0.01)
  = P(Z < (0 − 2)/√((1.5)²/12 + (1.5)²/16) + 2.33)
  = P(Z < −3.49 + 2.33) = P(Z < −1.16) ≈ 0.12
Under scenario 2, the hypotheses are

(i) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 < 0,
(ii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 > 0,
or
(iii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0    (10.28)

and the test statistic is

T = ((X̄1 − X̄2) − (μ1 − μ2))/(Sp √(1/n1 + 1/n2))    (10.29)

which under H0 has a t-distribution with n1 + n2 − 2 degrees of freedom, where the pooled estimator Sp² of the common variance σ² is given by

Sp² = ((n1 − 1)S1² + (n2 − 1)S2²)/(n1 + n2 − 2)    (10.30)

Figure 10.21 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α level of significance: (i) T < −t_{n1+n2−2,α}; (ii) T > t_{n1+n2−2,α}; (iii) |T| > t_{n1+n2−2,α/2}.
Figure 10.22 Rejection region (T > 2.048) under the upper tail with α = 0.025.

Using Sp² = 2.39, X̄1 = 92.7, X̄2 = 89.9, and the value of μ1 − μ2 under the null hypothesis, that is, μ1 − μ2 = 0, the observed value of the test statistic is

T = ((92.7 − 89.9) − 0)/(1.55 √(1/14 + 1/16)) = 4.94

Clearly this value of the test statistic falls in the rejection region. Thus, we reject the null hypothesis H0. In other words, we can say at the α = 0.025 level of significance that fuel one has a higher octane number.

p-value = P(T ≥ 4.94) < P(T ≥ 2.763) = 0.005

Thus, the p-value of the test is less than 0.005.
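Checking the arithmetic of this example numerically: carrying full precision through gives T ≈ 4.95, while the printed 4.94 comes from rounding Sp to 1.55 first. A sketch:

```python
# Pooled-variance t statistic for the example above, equation (10.29) under H0.
from math import sqrt

sp2 = 2.39                       # pooled variance given in the example
n1, n2 = 14, 16
t = ((92.7 - 89.9) - 0) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
print(round(t, 2))   # -> 4.95
```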
Under scenario 3, the hypotheses are

(i) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 < 0,
(ii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 > 0,
or
(iii) H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0    (10.31)

The test statistic is

T = ((X̄1 − X̄2) − (μ1 − μ2))/√(S1²/n1 + S2²/n2)    (10.32)

which has approximately a t-distribution with m degrees of freedom, where

m = (S1²/n1 + S2²/n2)² / [(S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1)]    (10.33)

Figure 10.23 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α level of significance: (i) T < −t_{m,α}; (ii) T > t_{m,α}; (iii) |T| > t_{m,α/2}.
Example 10.13 A new weight control company, A, claims that persons who use its program regularly for a certain period of time lose the same amount of weight as those who use the program of a well-established company, B, for the same period of time. A random sample of n1 = 12 persons who used company A's program lost on average 20 pounds with a standard deviation of 4 pounds, while another sample of n2 = 10 persons who used company B's program for the same period lost on average 22 pounds with a standard deviation of 3 pounds. Determine at the α = 0.01 level of significance whether the data provide sufficient evidence to support the claim made by company A. Find the p-value of the test. We assume that the two population variances are not equal.
Solution:
Step 1 H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0
Step 2 Assign a suitable predetermined value to α, say α = 0.01.
Step 3 We consider the pivotal quantity for μ1 − μ2 to be the test statistic, that is,

T = ((X̄1 − X̄2) − (μ1 − μ2))/√(S1²/n1 + S2²/n2)

Step 4 The degrees of freedom are

m = (S1²/n1 + S2²/n2)² / [(S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1)] = (16/12 + 9/10)² / [(16/12)²/11 + (9/10)²/9] = 4.988/0.2516 ≈ 20

Step 5 Since the test is two-tail with α = 0.01 and m = 20 degrees of freedom, the rejection region is |T| > t_{20,0.005} = 2.845.

Figure 10.24 Rejection region (|T| > 2.845) under the two tails with α = 0.01.
Step 6 The observed value of the test statistic is

T = ((20 − 22) − 0)/√(16/12 + 9/10) = −1.34

which does not fall in the rejection region. Thus, we do not reject the null hypothesis H0. In other words, at the α = 0.01 level of significance, the data support the claim of company A.
Since the test is a two-tail test, the p-value is given by

p-value = 2P(T ≥ 1.34)

But

P(T ≥ 1.725) < P(T ≥ 1.34) < P(T ≥ 1.325)

or

0.05 < P(T ≥ 1.34) < 0.10

so that

2(0.05) < 2P(T ≥ 1.34) < 2(0.10)

or

0.10 < p-value < 0.20

That is, the p-value of the test is somewhere between 10% and 20%.
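Equations (10.32) and (10.33) applied to the data of Example 10.13 (a sketch using only the standard library):

```python
# Unequal-variance (scenario 3) t statistic and degrees of freedom
# for Example 10.13.
from math import sqrt

n1, x1, s1 = 12, 20.0, 4.0
n2, x2, s2 = 10, 22.0, 3.0

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
m = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))   # eq. (10.33)
t = (x1 - x2) / sqrt(v1 + v2)                                    # eq. (10.32)
print(round(m), round(t, 2))   # -> 20 -1.34
```

Since |−1.34| < 2.845, the code reproduces the book's decision not to reject H0.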
from each population. Quite often for various reasons the experiments are
designed in such a way that the data are collected in pairs, that is, two observations are taken on the same subject and consequently the samples are not
independent. We encounter this kind of data in fields such as medicine, psychology, the chemical industry, and engineering. For example, a civil engineer
may divide each specimen of concrete into two parts and apply two drying
techniques, one technique to each part; a nurse collects blood samples to test
the serum-cholesterol level, divides each sample into two parts, and sends
one part to one lab and the second part to another lab; a psychologist treats
patients with some mental disorder and takes two observations on each
patient, one before the treatment and the other after the treatment; or a production engineer may want to increase the productivity of some product by
adjusting a machine differently, so she measures the productivity before and
after the adjustment. The data collected in this manner are usually known as
paired data. If the techniques of testing hypotheses discussed in section 10.5
are applied to paired data, our results may turn out to be inaccurate, since the
samples are not independent.
These kinds of data are also sometimes known as before and after data. To compare the two means in such cases, we use a test known as the paired t-test.
Let (X11, X21), (X12, X22), ..., (X1n, X2n) be a set of n paired observations
on n randomly selected individuals or items, where (X1i, X2i) is a pair of
observations on the ith individual or item. We assume that the samples (X11, X12, X13, ..., X1n) and (X21, X22, X23, ..., X2n) come from populations with means μ1 and μ2 and variances σ1² and σ2² respectively. Clearly the samples (X11, X12, X13, ..., X1n) and (X21, X22, X23, ..., X2n) are not independent, since each pair of observations (X1i, X2i) consists of two observations on the same individual. Finally we assume that the sample of differences between each pair, that is (d1, d2, d3, ..., dn), where di = X1i − X2i; i = 1, 2, 3, ..., n, comes from a normal population with mean μd = μ1 − μ2 and variance σd², where σd² is unknown. We are then interested in testing the following hypotheses:
(i) H0: μd = 0 versus H1: μd > 0
(ii) H0: μd = 0 versus H1: μd < 0     (10.34)
or
(iii) H0: μd = 0 versus H1: μd ≠ 0
Recall the discussion we had in section 10.5.2 concerning testing of hypotheses about one population mean μ. The problem of testing hypotheses about the mean μd falls in the same framework as the problem of testing hypotheses about the mean of a normal population with unknown variance.
It follows that the test statistic for testing any one of the above hypotheses is
T = (X̄d − μd)/(Sd/√n)
(10.35)
where X̄d and Sd are respectively the sample mean and the sample standard deviation of the sample of differences (d1, d2, d3, ..., dn). Assuming that the population of differences is normal, it follows that the test statistic T is distributed as Student's t-distribution with (n − 1) degrees of freedom. We further illustrate this method with the following example.
Example 10.14 A manager of a manufacturing company wants to evaluate
the effectiveness of a training program by measuring the productivity of
those workers who went through that training. The following data shows the
productivity scores before and after the training of 10 randomly selected
workers.
Worker            1    2    3    4    5    6    7    8    9   10
Before           75   78   76   80   79   83   70   72   72   74
After            79   77   80   85   80   84   78   76   70   80
di = X1i − X2i   −4    1   −4   −5   −1   −1   −8   −4    2   −6
Do the data provide sufficient evidence to indicate that the training is effective? Use α = 0.05. Find the p-value of the test.
Solution: We first calculate some sample statistics that we need for testing the desired hypothesis, that is,

X̄d = (Σ di)/n = −30/10 = −3

Sd² = [Σ di² − (Σ di)²/n]/(n − 1) = (1/9)(180 − 90) = 10

Sd = √10 = 3.162
To test the hypothesis we again follow the six-step technique.
Step 1 H0: μd = 0 versus H1: μd < 0
Step 2 Assign a suitable predetermined value to α; say α = 0.05.
Step 3 From our above discussion, the test statistic that we would use is

T = (X̄d − μd)/(Sd/√n)
Step 4 The test statistic is distributed as Student's t-distribution (we encourage readers to check the conditions needed to use the t-distribution) with n − 1 = 9 degrees of freedom.
Figure 10.25 Rejection region under the lower tail with α = 0.05 (critical point −t_{9,0.05} = −1.833).
Step 5 Since the test is a lower-tail test with α = 0.05, the rejection region is as shown in Figure 10.25.
Step 6 The observed value of the test statistic is

T = (−3 − 0)/(√10/√10) = −3.
Clearly the observed value of the test statistic falls in the rejection region.
Thus, we reject the null hypothesis. In other words the test does indicate, at the 0.05 level of significance, that the training program is effective.
The p-value of the test is given by
p-value = P(T ≤ −3) = P(T ≥ 3) < 0.01.
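The paired-difference computations above can be checked with a short Python sketch (added here, not part of the original text); it recomputes X̄d, Sd², and T from the before/after scores of Example 10.14:

```python
import math

# Productivity scores of the 10 workers in Example 10.14
before = [75, 78, 76, 80, 79, 83, 70, 72, 72, 74]
after = [79, 77, 80, 85, 80, 84, 78, 76, 70, 80]

d = [b - a for b, a in zip(before, after)]  # di = X1i - X2i
n = len(d)

dbar = sum(d) / n                                          # sample mean of the differences
sd2 = (sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1)  # sample variance of the differences
t_stat = (dbar - 0) / (math.sqrt(sd2) / math.sqrt(n))      # T = (Xbar_d - 0)/(Sd/sqrt(n))
```

This reproduces X̄d = −3, Sd² = 10, and T = −3.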
number of elements in the sample that possess the desired characteristic. Then
from Chapter 8 we know that X/n is a point estimator of p, that is, p̂ = X/n. We also know that for large n (np ≥ 5, n(1 − p) ≥ 5), the estimator p̂ is distributed approximately as normal with mean p and variance p(1 − p)/n.
Having said that, we are now ready to discuss the method of testing of a
hypothesis about the population proportion p. Under the assumption that the
sample size is large, we discuss the following hypotheses about the population proportion.
(i) H0: p = p0 versus H1: p > p0
(ii) H0: p = p0 versus H1: p < p0     (10.36)
or
(iii) H0: p = p0 versus H1: p ≠ p0
Since the method of testing these hypotheses follows the same six-step
technique that we used to test hypotheses about the population mean, we
illustrate the method with the following example.
Example 10.15 Environmentalists believe that sport utility vehicles (SUVs) consume an excessive amount of gasoline and are the biggest polluters of our environment. An environmental agency wants to find what proportion of vehicles on U.S. highways are SUVs. Suppose that a random sample of 500 vehicles collected from highways in various parts of the country showed that 120 out of 500 vehicles were SUVs. Do these data provide sufficient evidence that 25% of the total vehicles driven in the United States are SUVs? Use the α = 0.05 level of significance. Find the p-value of the test.
Solution: From the given information, we have
n = 500; X = Σ Xi = 120,
thus
p̂ = X/n = 120/500 = 0.24
Now to test the desired hypothesis we proceed as follows:
Step 1 H0: p = p0 = 0.25 versus H1: p ≠ 0.25
Step 2 α = 0.05
Step 3 We consider the pivotal quantity for p as the test
statistic, that is,
Z = (p̂ − p0)/√(p0(1 − p0)/n)
(10.37)
Figure 10.26 Rejection regions under the two tails with α = 0.05 (critical points ±1.96).
The observed value of the test statistic is

Z = (0.24 − 0.25)/√(0.25(1 − 0.25)/500) = −0.516,
which does not fall in the rejection region. Thus, we do not reject the null
hypothesis H0.
Since the test is a two-tail test, the p-value is given by
p-value = 2P(Z ≥ |z|) = 2P(Z ≥ 0.516) = 2(0.3030) = 0.6060
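The z statistic and two-tail p-value of Example 10.15 can be reproduced in a few lines of Python (a sketch added here; the standard normal CDF is obtained from the error function in the math module):

```python
import math

def norm_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, x, p0 = 500, 120, 0.25
p_hat = x / n                                    # 0.24

# Equation (10.37) evaluated under H0: p = p0
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

p_value = 2 * norm_cdf(-abs(z))                  # two-tail p-value
```

This gives z ≈ −0.516 and a p-value of about 0.606, matching the table-based value 0.6060 above.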
10.7.2 Testing of Statistical Hypotheses about the Difference
Between Two Population Proportions When Sample Sizes Are
Large
Consider two binomial populations with parameters n1, p1 and n2, p2 respectively. Then we are usually interested in testing hypotheses such as
(i) H0: p1 = p2 versus H1: p1 > p2
(ii) H0: p1 = p2 versus H1: p1 < p2     (10.38)
or
(iii) H0: p1 = p2 versus H1: p1 ≠ p2
The hypotheses in 10.38 may equivalently be written as
(i) H0: p1 − p2 = 0 versus H1: p1 − p2 > 0
(ii) H0: p1 − p2 = 0 versus H1: p1 − p2 < 0     (10.39)
or
(iii) H0: p1 − p2 = 0 versus H1: p1 − p2 ≠ 0
We illustrate the method of testing the above hypotheses with the following example.
Example 10.16 A computer assembling company gets all its chips from two suppliers. The company has experienced that both suppliers supply a certain proportion of defective chips. The company wants to test three hypotheses: (i) supplier I supplies a smaller proportion of defective chips, (ii) supplier I supplies a higher proportion of defective chips, or (iii) the two suppliers do not supply the same proportion of defective chips. To achieve this goal the company took a random sample from each supplier. It was found that in one sample of 500 chips, 12 were defective, and in the second sample of 600 chips, 20 were defective. For each of the above hypotheses use the 0.05 level of significance. Find the p-value for each test.
Solution: From the given data, we have
n1 = 500, X1 = 12, p̂1 = X1/n1 = 12/500 = 0.024
n2 = 600, X2 = 20, p̂2 = X2/n2 = 20/600 = 0.033
where X1 and X2 are the number of defective chips in samples 1 and 2 respectively. Now to test the desired hypotheses we proceed as follows:
Step 1
(i) H0: p1 − p2 = 0 versus H1: p1 − p2 < 0
(ii) H0: p1 − p2 = 0 versus H1: p1 − p2 > 0
or
(iii) H0: p1 − p2 = 0 versus H1: p1 − p2 ≠ 0
Step 2 α = 0.05
Step 3 We consider the pivotal quantity for p1 − p2 as the test statistic, that is,

Z = ((p̂1 − p̂2) − (p1 − p2)) / √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)     (10.40)
Step 4 Since n1p̂1 = 500(0.024) = 12 ≥ 5, and n1(1 − p̂1) = 500(1 − 0.024) = 488 ≥ 5, the sample size n1 is large. Similarly, we can verify that the sample size n2 is large. Thus, the test statistic Z in step 3 is distributed approximately as standard normal N(0,1).
Step 5 Since the test statistic is approximately normally distributed, the rejection regions for testing hypotheses (i), (ii), and (iii) at the α = 0.05 level of significance are as shown in Figure 10.27.
Step 6 Since under the null hypothesis p1 − p2 = 0, we substitute the values of p̂1, p̂2, and p1 − p2 = 0 in the numerator, and the values of p̂1 and p̂2 in the denominator.
Figure 10.27 Rejection regions for testing hypotheses (i), (ii), and (iii) at the 0.05 level of significance (critical points ±1.645 for the one-tail tests and ±1.96 for the two-tail test).
This gives

Z = (0.024 − 0.033)/√(0.024(0.976)/500 + 0.033(0.967)/600) ≈ −0.90     (10.41)
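A quick Python sketch (added here) evaluates the test statistic of Equation (10.40) from the raw counts of Example 10.16. Note that keeping the exact fractions gives Z ≈ −0.93, while the rounded proportions p̂1 = 0.024 and p̂2 = 0.033 give about −0.90:

```python
import math

# Defective-chip counts from the two suppliers in Example 10.16
n1, x1 = 500, 12
n2, x2 = 600, 20
p1_hat = x1 / n1    # 0.024
p2_hat = x2 / n2    # 0.0333...

# Equation (10.40) evaluated under H0: p1 - p2 = 0
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z = (p1_hat - p2_hat) / se
```

Since |z| is smaller than both 1.645 and 1.96, none of the three tests rejects H0 at the 0.05 level.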
mally with mean μ and unknown variance σ². Let X̄ be the sample mean and S² be the sample variance. Then from Chapter 8, under the assumption of the population being normal, we know that the pivotal quantity for σ², that is,

χ² = (n − 1)S²/σ²     (10.42)

is distributed as a chi-square random variable with (n − 1) degrees of freedom. Using this pivotal quantity as a test statistic, we can test hypotheses such as

(i) H0: σ² = σ0² versus H1: σ² > σ0²
(ii) H0: σ² = σ0² versus H1: σ² < σ0²     (10.43)
or
(iii) H0: σ² = σ0² versus H1: σ² ≠ σ0²
where σ0² > 0 is known. As we described earlier, the location of the rejection regions depends upon the alternative hypothesis. Thus, for example, the rejection regions for testing the hypotheses (i), (ii), and (iii) at the α level of significance are as shown in Figure 10.28.
Figure 10.28 Rejection regions under the chi-square distribution curve for testing hypotheses (i), (ii), and (iii) at the α level of significance (critical points χ²_{n−1,α} for (i), χ²_{n−1,1−α} for (ii), and χ²_{n−1,α/2}, χ²_{n−1,1−α/2} for (iii)).
Figure 10.29 Rejection region under the lower tail with α = 0.05 (critical point 13.848).
(i) H0: σ1²/σ2² = 1 versus H1: σ1²/σ2² > 1,
(ii) H0: σ1²/σ2² = 1 versus H1: σ1²/σ2² < 1,     (10.44)
or
(iii) H0: σ1²/σ2² = 1 versus H1: σ1²/σ2² ≠ 1.
Let X11, X12, X13, ..., X1n1 and X21, X22, X23, ..., X2n2 be random samples from two independent normal populations with variances σ1² and σ2². Then, using

F = S1²/S2²     (10.45)
as a test statistic we can test any of the hypotheses in Equation (10.44). The
rejection regions for testing the hypotheses (i), (ii), and (iii) at the α level of significance are as shown in Figure 10.30.
We illustrate the method of testing a hypothesis about the ratio of two
population variances with the following example.
Figure 10.30 Rejection regions under the F-distribution curve for testing hypotheses (i), (ii), and (iii) at the α level of significance.
Example 10.18 The quality of any process depends on the amount of variability present in the process, which we measure in terms of the variance of
the quality characteristic. For example, if we have to choose between two
similar processes, we would prefer the one with smaller variance. Any
process with smaller variance is more dependable and more predictable. In
fact, one of the most important criteria used to improve the quality of a
process or to achieve 6σ quality is to reduce the variance of the quality characteristic in the process. In practice, comparing the variances of two
processes is common. Suppose the following is the sample summary of samples from two independent processes. We assume that the quality characteristics in the two processes are normally distributed as N(1, 12) and N(2,
22) respectively.
n1 = 21    X̄1 = 15.4    S1² = 24.6
n2 = 16    X̄2 = 17.2    S2² = 16.4
Test at the α = 0.05 level of significance the hypothesis H0: σ1² = σ2² versus H1: σ1² ≠ σ2², which is equivalent to testing H0: σ1²/σ2² = 1 versus H1: σ1²/σ2² ≠ 1. Find the p-value for the test.
Solution:
Step 1 H0: σ1²/σ2² = 1 versus H1: σ1²/σ2² ≠ 1
Step 2 α = 0.05
Step 3 As explained in the introductory paragraph, the test statistic for testing the hypothesis in step 1 is
F = S1²/S2².
Step 4 The test statistic in step 3 is distributed as F with n1 − 1 and n2 − 1 degrees of freedom or, in this case, F_{20,15}.
Step 5 Since the test is a two-tail test, the rejection region is as shown in Figure 10.31. As noted in Chapter 8, the critical point under the right tail is F_{20,15;0.025}, which can be found directly from the F tables (Table VI of the appendix). However, the critical point under the lower tail is F_{20,15;1−0.025} or F_{20,15;0.975}, which cannot be found directly from the F tables. Thus, to find this value we use the following relation (see Equation (8.32)):

F_{ν1,ν2,1−α} = 1/F_{ν2,ν1,α}     (10.46)
Figure 10.31 Rejection regions under the two tails with α = 0.05 (critical points 0.389 and 2.76).
Figure 10.32 Rejection region under the right tail with α = 0.05 (critical point 2.33).
The observed value of the test statistic is F = S1²/S2² = 24.6/16.4 = 1.5. Clearly this value does not fall in the rejection region. Thus, we do not reject the null hypothesis H0.
Note that for tests about variances we can only find a range for the p-value. Thus, in this case we have
p-value = 2P(F ≥ f) = 2P(F ≥ 1.5) > 0.20.
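The F-ratio test of Example 10.18 is easy to script. The sketch below (added here) uses the critical points 2.76 = F_{20,15;0.025} and 2.57 = F_{15,20;0.025} read from the F table, together with the reciprocal relation of Equation (10.46):

```python
# Sample variances from Example 10.18
s1_sq, s2_sq = 24.6, 16.4
f_stat = s1_sq / s2_sq               # F = S1^2/S2^2 = 1.5

f_upper = 2.76                       # F(20,15; 0.025) from the F table
f_lower = 1.0 / 2.57                 # F(20,15; 0.975) = 1/F(15,20; 0.025), Eq. (10.46)

reject = f_stat < f_lower or f_stat > f_upper
```

Here f_lower evaluates to about 0.389, the left critical point shown in Figure 10.31, and reject comes out False, in agreement with the conclusion above.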
Example 10.19 Use the data of Example 10.18 to test the following hypothesis:
H0: σ1²/σ2² = 1 versus H1: σ1²/σ2² > 1
Solution: The only difference between this example and Example 10.18 is
that the alternative hypothesis is different. In this example the only change
that occurs is in the rejection region; everything else, including the value of
the test statistic, will be exactly the same. The rejection region in this case
will be only under the right tail, which can be determined directly from Table
VI of the appendix. Thus, the rejection region is as shown in Figure 10.32.
Since the value of the test statistic does not fall in the rejection region,
we do not reject the null hypothesis H0.
The p-value for the test in this example is given by
p-value = P(F ≥ f) = P(F ≥ 1.5) > 0.10.
−z_{α/2} < (X̄ − μ0)/(σ/√n) < z_{α/2}     (10.47)

or

−z_{α/2} σ/√n < X̄ − μ0 < z_{α/2} σ/√n

or

X̄ − z_{α/2} σ/√n < μ0 < X̄ + z_{α/2} σ/√n     (10.48)
From Equation (10.48) it follows that we do not reject the null hypothesis H0 if the value (μ0) of μ under the null hypothesis falls in the interval (X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n). This is equivalent to saying that we do not reject the null hypothesis H0 if the confidence interval (X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n) for μ with confidence coefficient 1 − α contains the value (μ0) of μ under the null hypothesis. Now, using the information contained in the sample summary, the confidence interval for μ with confidence coefficient 1 − α (in our case 95%) is
(X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n) = (10.5 − 1.96(2.5/√25), 10.5 + 1.96(2.5/√25)) = (9.52, 11.48)
This interval clearly contains 10, the value of μ under the null hypothesis. Thus, we do not reject the null hypothesis H0, and that was the conclusion we made in Example 10.8.
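The confidence-interval computation of Example 10.20 can be sketched in Python as follows (added here; the sample summary n = 25, X̄ = 10.5, σ = 2.5 and the null value μ0 = 10 are taken from the example):

```python
import math

n, xbar, sigma = 25, 10.5, 2.5
z_half = 1.96                        # z_{alpha/2} for a 95% confidence coefficient
mu0 = 10

margin = z_half * sigma / math.sqrt(n)
ci = (xbar - margin, xbar + margin)  # 95% confidence interval for mu

do_not_reject = ci[0] < mu0 < ci[1]  # H0 is not rejected when the interval contains mu0
```

This reproduces the interval (9.52, 11.48), which contains 10, so H0 is not rejected.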
We now consider a one-tail test.
Example 10.21 Referring back to Example 10.9, we have that the sampled population is normally distributed with an unknown mean μ and unknown standard deviation σ. Also, we are given that the sample size is small, with sample summary n = 16, X̄ = 33, S = 6. We want to test the hypothesis H0: μ = 30 versus H1: μ > 30 at the α = 0.05 level of significance, using a confidence interval with confidence coefficient 1 − α, that is, 95%.
Solution: Recall from Example 10.9 that the test statistic used to test the hypothesis
H0: μ = 30 versus H1: μ > 30
was
T = (X̄ − μ)/(S/√n). Thus, it is clear that we do not reject the null hypothesis H0 if the test statistic under the null hypothesis H0: μ = μ0 is such that

(X̄ − μ0)/(S/√n) < t_{n−1,α}     (10.49)
or

X̄ − μ0 < t_{n−1,α} S/√n

or

X̄ − t_{n−1,α} S/√n < μ0

In other words, we do not reject the null hypothesis H0 if the lower one-sided confidence interval

(X̄ − t_{n−1,α} S/√n, ∞)     (10.50)
with confidence coefficient 1 − α contains the value (μ0) of μ under the null hypothesis.
Now using the information contained in the sample and Equation (10.50), the lower one-sided confidence interval for μ with confidence coefficient 95% is

(X̄ − t_{n−1,α} S/√n, ∞) = (33 − 1.753(6/√16), ∞) = (30.3705, ∞)
This confidence interval clearly does not contain 30, the value of μ under the null hypothesis. Thus, we reject the null hypothesis H0, and this was the conclusion we made in Example 10.9.
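Similarly, the one-sided interval of Example 10.21 can be sketched in Python (added here; t_{15,0.05} = 1.753 is taken from the t table):

```python
import math

n, xbar, s = 16, 33.0, 6.0
t_crit = 1.753                             # t_{15, 0.05} from the t table
mu0 = 30

lower = xbar - t_crit * s / math.sqrt(n)   # lower one-sided 95% bound, Equation (10.50)
reject = not (lower < mu0)                 # reject H0 when (lower, infinity) misses mu0
```

Here lower evaluates to 30.3705, so the interval (30.3705, ∞) does not contain 30 and H0 is rejected.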
Having discussed these two examples, we now give the rule (see Equations (10.48) and (10.50)) and the confidence intervals to be used for testing the various hypotheses that we discussed earlier in this chapter.
Rule: Do not reject the null hypothesis H0 at the α level of significance if the corresponding confidence interval with confidence coefficient 1 − α, given in Table 10.2, contains the value of the parameter under the null hypothesis H0.
Table 10.2 Confidence intervals used for testing the various hypotheses discussed in this chapter.

One population mean:

H0: μ = μ0 vs. H1: μ < μ0
  (−∞, X̄ + z_α σ/√n) if σ is known
  (−∞, X̄ + z_α S/√n) if σ is unknown and n is large
  (−∞, X̄ + t_{n−1,α} S/√n) if σ is unknown and the population is normal

H0: μ = μ0 vs. H1: μ > μ0
  (X̄ − z_α σ/√n, ∞) if σ is known
  (X̄ − z_α S/√n, ∞) if σ is unknown and n is large
  (X̄ − t_{n−1,α} S/√n, ∞) if σ is unknown and the population is normal

H0: μ = μ0 vs. H1: μ ≠ μ0
  (X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n) if σ is known
  (X̄ − z_{α/2} S/√n, X̄ + z_{α/2} S/√n) if σ is unknown and n is large
  (X̄ − t_{n−1,α/2} S/√n, X̄ + t_{n−1,α/2} S/√n) if σ is unknown and the population is normal

Difference of two population means:

H0: μ1 − μ2 = 0 vs. H1: μ1 − μ2 < 0
  (−∞, X̄1 − X̄2 + z_α √(σ1²/n1 + σ2²/n2)) if σ1, σ2 are known
  (−∞, X̄1 − X̄2 + z_α √(S1²/n1 + S2²/n2)) if σ1, σ2 are unknown and n1, n2 are large
  (−∞, X̄1 − X̄2 + t_{n1+n2−2,α} Sp √(1/n1 + 1/n2)) if σ1, σ2 are unknown and σ1 = σ2
  (−∞, X̄1 − X̄2 + t_{m,α} √(S1²/n1 + S2²/n2)) if σ1, σ2 are unknown and σ1 ≠ σ2 (*)

H0: μ1 − μ2 = 0 vs. H1: μ1 − μ2 > 0
  (X̄1 − X̄2 − z_α √(σ1²/n1 + σ2²/n2), ∞) if σ1, σ2 are known
  (X̄1 − X̄2 − z_α √(S1²/n1 + S2²/n2), ∞) if σ1, σ2 are unknown and n1, n2 are large
  (X̄1 − X̄2 − t_{n1+n2−2,α} Sp √(1/n1 + 1/n2), ∞) if σ1, σ2 are unknown and σ1 = σ2
  (X̄1 − X̄2 − t_{m,α} √(S1²/n1 + S2²/n2), ∞) if σ1, σ2 are unknown and σ1 ≠ σ2 (*)

H0: μ1 − μ2 = 0 vs. H1: μ1 − μ2 ≠ 0
  (X̄1 − X̄2 − z_{α/2} √(σ1²/n1 + σ2²/n2), X̄1 − X̄2 + z_{α/2} √(σ1²/n1 + σ2²/n2)) if σ1, σ2 are known
  (X̄1 − X̄2 − z_{α/2} √(S1²/n1 + S2²/n2), X̄1 − X̄2 + z_{α/2} √(S1²/n1 + S2²/n2)) if σ1, σ2 are unknown and n1, n2 are large
  (X̄1 − X̄2 − t_{n1+n2−2,α/2} Sp √(1/n1 + 1/n2), X̄1 − X̄2 + t_{n1+n2−2,α/2} Sp √(1/n1 + 1/n2)) if σ1, σ2 are unknown and σ1 = σ2
  (X̄1 − X̄2 − t_{m,α/2} √(S1²/n1 + S2²/n2), X̄1 − X̄2 + t_{m,α/2} √(S1²/n1 + S2²/n2)) if σ1, σ2 are unknown and σ1 ≠ σ2 (*)

One population proportion (large sample):

H0: p = p0 vs. H1: p < p0
  (0, p̂ + z_α √(p0 q0/n)), where q0 = 1 − p0
H0: p = p0 vs. H1: p > p0
  (p̂ − z_α √(p0 q0/n), 1)
H0: p = p0 vs. H1: p ≠ p0
  (p̂ − z_{α/2} √(p0 q0/n), p̂ + z_{α/2} √(p0 q0/n))

Difference of two population proportions (large samples):

H0: p1 − p2 = 0 vs. H1: p1 − p2 < 0
  (−1, (p̂1 − p̂2) + z_α √(p̄q̄(1/n1 + 1/n2))), where p̄ = (n1 p̂1 + n2 p̂2)/(n1 + n2) and q̄ = 1 − p̄
H0: p1 − p2 = 0 vs. H1: p1 − p2 > 0
  ((p̂1 − p̂2) − z_α √(p̄q̄(1/n1 + 1/n2)), 1)
H0: p1 − p2 = 0 vs. H1: p1 − p2 ≠ 0
  ((p̂1 − p̂2) − z_{α/2} √(p̄q̄(1/n1 + 1/n2)), (p̂1 − p̂2) + z_{α/2} √(p̄q̄(1/n1 + 1/n2)))

One population variance (normal population):

H0: σ² = σ0² vs. H1: σ² < σ0²
  (0, (n − 1)S²/χ²_{n−1,1−α})
H0: σ² = σ0² vs. H1: σ² > σ0²
  ((n − 1)S²/χ²_{n−1,α}, ∞)
H0: σ² = σ0² vs. H1: σ² ≠ σ0²
  ((n − 1)S²/χ²_{n−1,α/2}, (n − 1)S²/χ²_{n−1,1−α/2})

Ratio of two population variances (normal populations):

H0: σ1²/σ2² = 1 vs. H1: σ1²/σ2² < 1
  (0, F_{n2−1,n1−1,α} S1²/S2²)
H0: σ1²/σ2² = 1 vs. H1: σ1²/σ2² > 1
  (F_{n2−1,n1−1,1−α} S1²/S2², ∞)
H0: σ1²/σ2² = 1 vs. H1: σ1²/σ2² ≠ 1
  (F_{n2−1,n1−1,1−α/2} S1²/S2², F_{n2−1,n1−1,α/2} S1²/S2²)

(*) m = (S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]
11
Computer Resources to
Support Applied Statistics
Using MINITAB and JMP Statistical Software
In the past two decades, the use of technology to analyze complicated data has increased substantially, which not only has made the analysis very simple, but also has reduced the time required to complete such analysis.
To facilitate statistical analysis many companies have acquired personal computer-based statistical software. Several PC-based software packages are
available, including BMDP, JMP, MINITAB, SAS, SPSS, and SYSTAT. A
great deal of effort has been expended in the development of these software
packages to create graphical user interfaces that allow users to complete statistical analysis without having to know a programming or scripting language.
We believe that publishing a book discussing applied statistics without
acknowledging and addressing the importance and usefulness of statistical
software would simply not be in our readers' best interests. Accordingly, in
this chapter we briefly discuss two popular statistical packages, MINITAB
and JMP. It is our explicit intent not to endorse either software package. Each
package has its strengths and weaknesses.
Chapter Eleven
Figure 11.1 The screen that appears first in the MINITAB environment, showing the menu commands, the Session window, the Data window, and the Project Manager window.
The menu bar provides the commands File, Edit, Data, Calc, Stat, Graph, Editor, Tools, Window, and Help. In the Data window,
there is one cell that is not labeled, whereas the rest of the cells are labeled
1, 2, 3, ... . In the unlabeled cell you can enter a variable name, such as part
name, shift, lot number, and so on. In the labeled cells you enter data, using
one cell for each data point. If a numerical observation is missing, MINITAB
will replace the missing value with a star (*).
Saving a Data File
The command File > Save Current Worksheet As allows saving the current data file. When you enter this command a dialog box titled Save Worksheet As appears. Type the file name in the box next to File Name, select the drive location for the file, and click Save.
Retrieving a Saved MINITAB Data File
Using the command File > Open Worksheet will prompt the dialog box Open Worksheet to appear. Select the drive and directory where the file was saved by clicking the down arrow next to the Look in box, enter the file name in the box next to File Name, and then click Open. The data will appear in the same format you had entered earlier.
Saving a MINITAB Project
Using the command File > Save Project saves the ongoing project in a MINITAB Project (MPJ) file to the designated directory with the name you chose. Saving the project saves all windows opened in the project, along with the contents of each window.
Print Options
To print the contents of any specific window you need to make the specific window active by clicking on it, then use the command File > Print Session Window... (Graph..., Worksheet...).
If you want to print multiple graphs on a single page, highlight the
graphs in the Graph folder in the Project Manager Window, right-click
and choose Print. The Print Multiple Graphs dialog box appears. To
change the page orientation of the multiple graphs, use File > Page Setup to adjust the printing options.
11.1.2 Calculating Descriptive Statistics
Column Statistics
First enter the desired data in the Worksheet window. Then, from the Menu command select Calc > Column Statistics. The statistics that can be displayed for the selected columns are Sum, Mean, Standard Deviation, Minimum, Maximum, Range, Median, Sum of squares, N total, N nonmissing, and N missing. All these choices appear in the dialog box shown in Figure 11.3. This dialog box appears immediately after you select the command Calc > Column Statistics. Note that using this command you can choose only one statistic at a time.
Example 11.1 Use the following steps to calculate any one of the statistics listed in the Column Statistics dialog box, using the following data:
8 9 7 6 5 6 8 9 8 9
Solution:
1. Enter the data in column C1 of the Data window.
2. Select Calc from the Menu command.
3. Click Column Statistics from the pull-down menu in the Calc
command menu.
4. Check in the dialog box Column Statistics the circle next to
the desired statistics; here we will use standard deviation.
5. Enter C1 in the box next to the input variable.
6. Click OK. The MINITAB output will appear in the session
window, as shown in Figure 11.3.
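MINITAB's Standard deviation choice computes the sample standard deviation. Readers who want to verify the value shown in the Session window can do so with a two-line Python sketch (added here; the ten values are the data of Example 11.1 as read from the flattened digits above):

```python
import statistics

data = [8, 9, 7, 6, 5, 6, 8, 9, 8, 9]   # data of Example 11.1

sd = statistics.stdev(data)              # sample (n - 1) standard deviation
```

Here sd evaluates to about 1.43.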
Row Statistics
From the Menu bar select Calc > Row Statistics. The statistics that can be
displayed for the selected rows are Sum, Mean, Standard deviation,
Minimum, Maximum, Range, Median, Sum of squares, N total, N nonmissing, and N missing. Note that Column Statistics and Row Statistics give
you exactly the same choices. Use the appropriate command Column
Statistics or Row Statistics depending upon the format of your data and
whether it is arranged in columns or rows.
Descriptive Statistics
From the Menu bar select Stat > Basic Statistics > Display Descriptive
Statistics. Statistics available for display are Mean, SE of mean, Standard
deviation, Variance, Coefficient of variation, Trimmed mean, Sum,
Minimum, Maximum, Range, N nonmissing, N missing, N total, Cumulative
Figure 11.3 MINITAB window showing input and output for Column Statistics.
Figure 11.4 MINITAB window showing various options available under Stat
command.
Graph variables just enter C1, C2, and so forth and click OK. A separate
graph is displayed for each variable. To display more than one graph, select
the Multiple Graphs option and choose the desired display option.
Example 11.3 Construct a histogram for the following data.

23 25 20 16 19 18 42 25 28 29
36 26 27 35 41 18 20 24 29 26
37 38 24 26 34 36 38 39 32 33
Solution: Use the following steps to draw any one of the graphs listed in the
pull-down menu in the Graph command.
1. Enter the data in column C1 of the Data window.
2. Select Graph from the Menu command.
3. Click Histogram from the pull-down menus available in the
Graph command menu.
4. Enter C1 into the Graph variables box and click OK.
5. The MINITAB output will appear in the Graph window, as
shown in Figure 11.5.
Figure 11.5 MINITAB display of histogram for the data given in Example 11.3.
Sometimes we are interested in constructing a histogram with a particular number of classes or intervals, say 5. Right-click on one of the bars in the
default histogram to bring up the Edit Bars dialog box shown in Figure 11.6.
In the Edit Bars dialog box select Binning tab and then under Interval
Type check the circle next to Midpoint. Under Interval Definition check
the circle next to Number of Intervals and enter the number of desired intervals in the box next to it, 5 in this example. Click OK. The output will appear
in the graph window as shown in Figure 11.7.
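The binning MINITAB performs here can be imitated in a few lines of Python. The sketch below (added here) uses midpoints 18, 24, 30, 36, 42 with class width 6 and half-open intervals [lower, upper); MINITAB's own rule for observations falling exactly on a class boundary may differ:

```python
# Data of Example 11.3
data = [23, 25, 20, 16, 19, 18, 42, 25, 28, 29,
        36, 26, 27, 35, 41, 18, 20, 24, 29, 26,
        37, 38, 24, 26, 34, 36, 38, 39, 32, 33]

midpoints = [18, 24, 30, 36, 42]   # five classes of width 6
counts = []
for mid in midpoints:
    lo, hi = mid - 3, mid + 3      # class runs from midpoint - 3 to midpoint + 3
    counts.append(sum(1 for x in data if lo <= x < hi))
```

With this convention the class frequencies are 6, 8, 5, 8, and 3, accounting for all 30 observations.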
Dotplot
First enter the data in one or more columns of the worksheet, depending
upon how many variables you have. For each variable use only one column.
Then use the menu command Graph > Dotplot. These commands prompt
a dialog box titled Dotplots to appear; it has seven options. Choose the
desired graph option and click OK. Then another dialog box appears, titled
Dotplot: One Y, Simple. Enter one or more variables into Graph variables.
If you have not entered the names of the variables in the data columns, under
Graph variables just enter C1, C2, and so forth and click OK. A separate
graph is displayed for each variable. To display more than one graph, select
the Multiple Graphs option and choose the desired display option.
Example 11.4 Construct a dotplot for the following data.

23 25 20 16 19 18 42 25 28 29
36 26 27 35 41 18 20 24 29 26
37 38 24 26 34 36 38 39 32 33
Figure 11.7 MINITAB display of histogram with 5 classes for the data in Example 11.3.
Solution:
1. Enter the data in column C1 of the Data window (same as
Example 11.3).
2. Select Graph from the Menu command.
3. Click Dotplot from the pull-down menus available in the
Graph command menu.
4. Select the Simple dotplot.
5. Enter C1 into the Graph variables box and click OK.
6. The MINITAB output will appear in the Graph window, as
shown in Figure 11.8.
Scatterplot
Enter the data in one or more columns of the worksheet, depending upon
how many variables you have. For each variable use only one column. Use
the menu command Graph > Scatterplot. These commands prompt a dialog box titled Scatterplots, which has seven options. Choose the desired graph option and click OK. Another dialog box appears, titled Scatterplot: Simple. Enter the names of the variables under y variable and x variable. If
you have not entered the names of the variables in the data columns, then do
so under y variable and x variable (enter the columns where you have entered
the data, say C1, C2, etc.) and then click OK. A separate graph is displayed
for each set of variables. To display more than one graph, select the Multiple
Graphs option and choose the desired option.
Example 11.5 The following data shows the test scores (x) and the job
evaluation scores (y) of 16 Six Sigma Green Belts. Prepare a scatter plot for
these data and interpret the result you observe in this graph.
x:  45   47   40   35   43   40   49   46
y: 9.2  8.2  8.5  7.3  8.2  7.5  8.2  7.3

x:  38   39   45   41   48   46   42   40
y: 7.4  7.5  7.7  7.5  8.8  9.2  9.0  8.1
Figure 11.8 MINITAB output of Dotplot for the data in Example 11.4.
Solution:
1. Enter the data in columns C1 and C2 of the Data window.
2. Select Graph from the Menu command.
3. Click Scatterplot from the pull-down menus available in the
Graph command menu.
4. Select the simple scatterplot and click OK.
5. Enter C2 and C1 under the y variable and x variable
respectively and click OK.
6. The MINITAB output will appear in the Graph window, as
shown in Figure 11.9.
The graph in Figure 11.9 shows that although the plotted points are not tightly clustered around a line, they still fall roughly along a line through them. This indicates that there is a moderate correlation between the test scores and the job evaluation scores of the Six Sigma Green Belts.
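The "moderate correlation" read off the scatterplot can be quantified. A Python sketch (added here) computes the Pearson correlation coefficient for the sixteen (x, y) pairs of Example 11.5:

```python
import math

# Test scores (x) and job evaluation scores (y) from Example 11.5
x = [45, 47, 40, 35, 43, 40, 49, 46, 38, 39, 45, 41, 48, 46, 42, 40]
y = [9.2, 8.2, 8.5, 7.3, 8.2, 7.5, 8.2, 7.3,
     7.4, 7.5, 7.7, 7.5, 8.8, 9.2, 9.0, 8.1]

n = len(x)
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
```

Here r comes out near 0.49, consistent with the moderate linear association visible in Figure 11.9.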
Box Whisker Plot
First, enter the data in one or more columns of the worksheet depending
upon how many variables you have. For each variable use only one column.
Then use the menu command Graph > Boxplot or Stat > EDA > Boxplot.
These commands prompt a dialog box titled Boxplot to appear with four
Figure 11.9 MINITAB output of Scatterplot for the data given in Example 11.5.
graph options. Choose the desired option and click OK. Then another dialog box appears, titled Boxplot: One Y, Simple. Enter one or more variables
into Graph variables. If you have not entered the names of the variables in
the data columns, then under Graph variables just enter C1, C2, and so
forth and click OK. A separate graph is displayed for each variable. To display more than one graph select the Multiple Graphs option and choose the
desired display option. The Box Plot will appear in the graph window.
Example 11.6 Construct a box whisker plot for the following data.

23 25 20 16 19 55 42 25 28 29
36 26 27 35 41 55 20 24 29 26
37 38 24 26 34 36 38 39 32 33
Solution:
1. Enter the data in column C1 of the Data window (same as
Example 11.3).
2. Select Graph from the Menu command.
3. Click Boxplot from the pull-down menus available in the
Graph command menu.
Figure 11.10 MINITAB display of box plot for the data in Example 11.6.
Graphical Summary
First enter the data in one or more columns of the worksheet, depending upon how many variables you have. For each variable use only one column. Then use the menu command Stat > Basic Statistics > Graphical Summary.
These commands prompt a dialog box titled Graphical Summary to appear.
Enter the names of the variables you want summarized under Variables. If
you have not entered the names of the variables in the data columns, then
under Variables just enter C1, C2, and so forth. In the box next to
Confidence Level enter the appropriate value of the confidence level and
click OK. This option provides both graphical and numerical descriptive statistics. A separate graph and summary statistics are displayed for each variable.
Example 11.7 Prepare a graphical summary of the following data.

23 25 20 16 19 55 42 25 28 29
36 26 27 35 41 55 20 24 29 26
37 38 24 26 34 36 38 39 32 33
Figure 11.11 MINITAB display of graphical summary for the data in Example 11.7.
Solution:
1. Enter the data in column C1 of the Data window (same as
Example 11.3).
2. Select Stat from the Menu command.
3. Click Basic Statistics and then Graphical Summary from the
pull-down menus available in the Stat command menu.
4. Enter C1 into the Graph variables box and click OK.
The MINITAB output will appear in the Summary window, as shown in
Figure 11.11.
Bar Chart
Enter the data containing categories and frequencies in columns C1 and C2.
Or if the categories and frequencies are not given, enter all the categorical
data in column C1 for the following example. From the Menu command
select Graph > Bar Chart. In the Bar Charts dialog box are three options
under Bars represent. Select option 1, Counts of unique values, if you
have one or more columns of categorical data (as you will in the following
example); select 2, A function of a variable, if you have one or more
columns of measurement data; or select option 3, Values from a table, if you
have one or more columns of summary data. For each of these options there
are several other options about the representation of the graph. Choose an
appropriate option and click OK. Now another dialog box appears, titled Bar
Chart: [description]. Enter the variable name(s) under Categorical variables. If you have not entered the names of the variables in the data columns,
under Graph variables just enter C1, C2, and so forth and then click OK. A
separate graph is displayed for each variable. To display more than one graph
select Multiple Graphs and choose the desired display option.
Example 11.8
Solution:
1. Enter the data in column C1 of the Data window.
2. Select Graph from the Menu command.
3. Click Bar Chart from the pull-down menus available in the
Graph command menu.
4. Select Counts of Unique Values and Simple from the options.
5. Click OK.
6. Enter C1 into the Categorical variables box.
7. Click OK.
8. The MINITAB output will appear in the graph window, as
shown in Figure 11.12.
Pie Chart
Enter the data containing categories and frequencies in columns C1 and C2.
If the categories and frequencies are not given, enter the categorical data in
column C1, as in the following example. From the Menu command select
Graph > Pie Chart. Choose Chart raw data when each row in the column
represents a single observation and Chart values from a table if the categories and frequencies are given. A slice in the pie is proportional to the
number of occurrences of a value in the column or the frequency of each category. Enter column C1 in the box next to Categorical variables. A separate
pie chart for each column is displayed, on the same graph. To display more
than one graph select the Multiple Graphs option and choose the required
display option. When category names exist in one column and summary data
exist in another column, use the Chart values from a table option. Enter
columns for Categorical variable and Summary variables.
Figure 11.12 MINITAB display of bar graph for the data in Example 11.8.
Example 11.9
Solution:
1. Enter the data in column C1 of the Data window.
2. Select Graph from the Menu command.
3. Click Pie Chart from the pull-down menus available in the
Graph command menu.
4. Select Chart raw data from the options.
5. Enter C1 into the Categorical variables box and click OK.
6. The MINITAB output will appear in the graph window, as
shown in Figure 11.13.
11.1.3 Probability Distributions
To calculate various probabilities, select from the Menu command Calc >
Probability Distributions and then the distribution of choice. This will bring
up a dialog box where you choose how the probabilities are calculated, such
as Probability, Cumulative probability, or Inverse cumulative probability.
Figure 11.13 MINITAB display of pie chart for the data in Example 11.9.
In the Optional storage box, enter the column in which you want to store the output. Then click OK.
Example 11.10 Let a random variable X be distributed as normal with mean
6 and standard deviation 4. Determine the probability P(8.0 ≤ X ≤ 14.0).

Solution: In order to determine the probability P(8.0 ≤ X ≤ 14.0), we
have to first find the probabilities P(X ≤ 8.0) and P(X ≤ 14.0). Then P(8.0
≤ X ≤ 14.0) = P(X ≤ 14.0) - P(X ≤ 8.0). To find the probabilities P(X ≤ 8.0)
and P(X ≤ 14.0) using MINITAB, we proceed as follows:
1. Enter the test values of 8 and 14 in column C1.
2. From the Menu bar select Calc > Probability Distributions >
Normal.
3. In the dialog box that appears, click the circle next to
Cumulative probability.
4. Enter 6 (the value of the mean) in the box next to Mean and 4
(the value of the standard deviation) in the box next to
Standard deviation.
5. Click the circle next to Input column and type C1 in the box
next to it.
6. Click OK.
7. In the session window, text will appear as follows, indicating
values of P(X ≤ 14.0) = 0.977250 and P(X ≤ 8.0) = 0.691462.
Thus, P(8.0 ≤ X ≤ 14.0) = P(X ≤ 14.0) - P(X ≤ 8.0) =
0.977250 - 0.691462 = 0.285788.
Normal with mean = 6 and standard deviation = 4

x      P( X <= x )
8      0.691462
14     0.977250
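As a cross-check outside MINITAB, the same cumulative probabilities can be computed with Python's standard library (a sketch, not part of the MINITAB procedure):

```python
from statistics import NormalDist

# Normal distribution with mean 6 and standard deviation 4
dist = NormalDist(mu=6, sigma=4)

p8 = dist.cdf(8.0)    # P(X <= 8.0)  ~ 0.691462
p14 = dist.cdf(14.0)  # P(X <= 14.0) ~ 0.977250

# P(8.0 <= X <= 14.0) = P(X <= 14.0) - P(X <= 8.0)
print(p14 - p8)  # ~ 0.285787 (MINITAB's 0.285788 subtracts the rounded values)
```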
Binomial Distribution
For binomial probability distributions, the same options are available as for
the normal probability distribution: Probability, Cumulative probability,
and Inverse cumulative probability.
From the Menu bar select Calc > Probability Distributions >
Binomial. This will prompt a dialog box titled Binomial Distribution to
appear. Click one of the options: Probability,
Cumulative probability, or Inverse cumulative probability. Enter the
Number of trials and Probability of success (0 ≤ p ≤ 1) to define the binomial distribution. Check the circle next to Input column (if you have more
than one value of x that you must enter in one of the data columns, say, C1)
and enter C1 in the box next to it. Or you may select the Input constant field
if you have only one value of x and enter that value in the box next to it. If
desired, use the Optional storage to enter the column in which you want to
store the output. Then click OK.
Example 11.11 The probability is 0.80 that a randomly selected Six Sigma
Green Belt will finish a project successfully. Let X be the number of Green
Belts who will finish successfully from a randomly selected group of 10
Green Belts. Find the probability distribution of the random variable X.
Solution: In order to find the probability distribution of the random variable X we need to find the probabilities P(X = x) for x = 0, 1, 2, ..., 10. To find these
probabilities using MINITAB we proceed as follows:
1. Enter the values 0, 1, 2, ...,10 in column C1.
2. From the Menu bar select Calc > Probability Distributions >
Binomial.
3. In the dialog box that appears, click the circle next to
Probability.
4. Enter 10 (the number of trials) in the box next to Number of
trials and 0.80 (the probability of success) in the box next to
Probability of success.
5. Click the circle next to Input column and type C1 in the box
next to it.
6. Click OK.
The desired probabilities will show up in the session window as:
Binomial with n = 10 and p = 0.8

x      P( X = x )
0      0.000000
1      0.000004
2      0.000074
3      0.000786
4      0.005505
5      0.026424
6      0.088080
7      0.201327
8      0.301990
9      0.268435
10     0.107374
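The same table can be reproduced directly from the binomial formula P(X = x) = C(n, x) p^x (1 - p)^(n - x), here sketched with Python's standard library:

```python
from math import comb

n, p = 10, 0.80
# P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)
probs = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}
for x, prob in probs.items():
    print(x, round(prob, 6))
```

The probabilities sum to 1, and the largest value occurs at x = 8, matching the MINITAB output above.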
Example 11.12 Consider the following data:

25 20 16 19 35 42 25 28 29 36 26 27 35 41 30 20
24 29 26 37 38 24 26 34 36 38 39 32 33 25 30

(a) Find a 95% confidence interval for the mean μ. (b) Test the hypothesis H0:
μ = 30 versus H1: μ ≠ 30 at the 5% level of significance.
Solution: Since the sample size n = 32 is greater than 30, it is considered
to be a large sample. Therefore, either to find a confidence interval or to test
a hypothesis we use the Z-statistic. Also, when the sample size is large, the
population standard deviation σ, if it is unknown as in this example, can be
replaced with the sample standard deviation S. Note that we can find the confidence interval and test the hypothesis by using the one procedure given
below.
1. Enter the data in column C1 of the Data window (Worksheet
window).
2. Since the population standard deviation is not known,
calculate the sample standard deviation of these data using one
of the MINITAB procedures discussed earlier. You will find
S = 6.83.
3. Select the Stat command and then click Basic Statistics >
1-Sample Z in the pull-down menu. This will prompt a dialog
box titled 1-Sample Z (Test and Confidence Interval).
4. Enter C1 in the box below Samples in columns. (If you had
summary statistics, sample mean and sample size, check the
circle next to Summarized data and enter in the boxes next to
it the appropriate values.) Enter the value of the standard
deviation (6.83 from Step 2).
5. Enter the value of the test mean under the null hypothesis, in
this case 30. (If you are not testing any hypothesis, leave it
empty.)
6. Check Options, which will prompt another dialog box to
appear. Enter the confidence level 95% in the box next to
Confidence level. Finally, next to Alternative, select one of the
three options: less than, not equal, or greater than. Click OK.
The MINITAB output will show up in the session window as:
One-Sample Z: C1
Test of mu = 30 vs not = 30
The assumed standard deviation = 6.83

Variable   N   Mean   StDev   SE Mean   95% CI
C1
Since the p-value for the test is 0.756, which is much greater than the
level of significance 5%, we do not reject the null hypothesis. The 95% confidence interval is given as (27.2586, 31.9914). Also, note that since the 95%
confidence interval contains the value of μ we were testing for, we do not
reject the null hypothesis at the 5% [(100 - 95)%] level of significance.
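The reported interval and p-value can be cross-checked with a short Python sketch; here the sample mean is recovered as the midpoint of the printed confidence interval, since a Z interval is centered at the sample mean:

```python
from statistics import NormalDist
import math

n, sigma, mu0 = 32, 6.83, 30
lo, hi = 27.2586, 31.9914            # 95% CI reported by MINITAB
xbar = (lo + hi) / 2                 # the Z interval is centered at x-bar
se = sigma / math.sqrt(n)

z = NormalDist().inv_cdf(0.975)      # ~ 1.96
print(xbar - z * se, xbar + z * se)  # reproduces the CI (to rounding)

z_stat = (xbar - mu0) / se
p_value = 2 * NormalDist().cdf(-abs(z_stat))
print(round(p_value, 3))             # ~ 0.756
```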
1-Sample t
In Chapters 9 and 10, we saw that if a small sample is taken from a normal
population with an unknown variance, we use the t-statistic for finding a confidence interval and testing a hypothesis about the mean. The MINITAB procedure for 1-Sample t is similar to the one for 1-Sample Z.
From the Menu bar select Stat > Basic Statistics > 1-Sample t. This
will prompt a dialog box titled 1-Sample t (Test and Confidence Interval)
to appear. Check the circle next to Samples in columns if you have entered
the raw data in columns. In the box below, select the columns containing the
sample data. Check the circle next to Summarized data if you have summary values for the sample, that is, sample size, sample mean, and sample
standard deviation, and enter those values. Enter the hypothesized mean in
the box next to Test mean. Select Options to prompt another dialog box,
where you enter the confidence level and the form of the alternative hypothesis in the boxes next to Confidence Level and Alternative,
respectively. Click OK in both dialog boxes. The MINITAB output, which
provides the confidence intervals and the p-value for the hypothesis testing,
will appear in the session window.
Example 11.13 Consider the following data from a population with an
unknown mean and unknown standard deviation:

23 25 20 16 19 35 42 25 28 29 36 26
27 35 41 30 20 24 29 26 37 38 24 26

(a) Find a 95% confidence interval for the mean μ. (b) Test the hypothesis H0:
μ = 28 versus H1: μ ≠ 28 at the 5% level of significance.
Solution: Follow the procedure discussed in the preceding paragraph and
use the same steps as in 1-Sample Z procedure. The MINITAB output will
appear in the session window as:
One-Sample T: C1
Test of mu = 28 vs not = 28

Variable   N   Mean   StDev   SE Mean   95% CI
C1
Since the p-value for the test is 0.797, which is much greater than the 5%
level of significance, we do not reject the null hypothesis. The 95% confidence interval is (25.3870, 31.3630).
Also, note that since the 95% confidence interval contains the test value of
μ, we do not reject the null hypothesis at the 5% [(100 - 95)%] level of significance.
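The t statistic behind this output can be recomputed from the raw data with Python's statistics module (a sketch: only the statistic is computed here, since the p-value of 0.797 requires the t distribution with 23 degrees of freedom):

```python
from statistics import mean, stdev
import math

data = [23, 25, 20, 16, 19, 35, 42, 25, 28, 29, 36, 26,
        27, 35, 41, 30, 20, 24, 29, 26, 37, 38, 24, 26]
n = len(data)
xbar = mean(data)             # sample mean
s = stdev(data)               # sample standard deviation
se = s / math.sqrt(n)         # standard error of the mean
t_stat = (xbar - 28) / se     # test statistic for H0: mu = 28
print(n, round(xbar, 3), round(t_stat, 2))  # 24 28.375 0.26
```

Note that 28.375 is also the midpoint of the reported interval (25.3870, 31.3630), as it should be.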
1 Proportion
For testing a hypothesis about one population proportion and for finding confidence intervals, use the following MINITAB procedure.
From the Menu bar select Stat > Basic Statistics > 1 Proportion. This
will prompt a dialog box entitled 1 Proportion (Test and Confidence
Interval) to appear. In the dialog box, check the circle next to Samples in
columns if you have entered the raw data in columns. Then, in the box
below, select the columns containing the sample data. Check the circle next to
Summarized data if you have sample summary values for the number of trials and successes (events) and enter those values. Select Options to prompt
another dialog box to appear. Enter the confidence level, the value of the proportion under the null hypothesis, and the alternative hypothesis in the boxes next to
Confidence level, Test proportion, and Alternative, respectively. Check the
box next to Use test and interval based on normal distribution if the sample size is large, that is, if np and nq are greater than or equal to 5. If the sample size is not large, do not check this box. Click OK in both dialog boxes.
The MINITAB output, which provides the confidence intervals and the p-value for the hypothesis testing, will appear in the session window.
Example 11.14 Several studies show that many industrial accidents can be
avoided if all safety precautions are strictly enforced. One such study showed
that 35 out of 50 accidents in one kind of industry could have been avoided
if all safety precautions were taken. If p denotes the proportion of accidents
that could be avoided by taking all safety precautions, find a 95% confidence
interval for p and test the hypothesis H0: p = 0.85 versus H1: p ≠ 0.85 at the
5% level of significance.
Solution: In this problem, sample summary values are given. Following
the procedure discussed in the above paragraph, enter the appropriate values
in the boxes of the dialog boxes. The MINITAB output will appear in the
session window as:
Sample   X    N    Sample p   95% CI                 Z-Value   P-Value
1        35   50   0.700000   (0.572980, 0.827020)   -2.97     0.003
Since the p-value is 0.003, which is much smaller than the 5% level of
significance, we reject the null hypothesis. The 95% confidence interval is
(0.572980, 0.827020). Also, note that since the 95% confidence interval does not
contain the value of p under the null hypothesis, we reject the null hypothesis at the 5% [(100 - 95)%]
level of significance.
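The large-sample computations behind this output can be sketched in Python; the test statistic uses the null proportion in its standard error, while the confidence interval uses the sample proportion:

```python
from statistics import NormalDist
import math

x, n, p0 = 35, 50, 0.85
phat = x / n                                  # 0.70

# Test statistic uses the null proportion in the standard error
z_stat = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * NormalDist().cdf(-abs(z_stat))
print(round(z_stat, 2), round(p_value, 3))    # -2.97 0.003

# The confidence interval uses the sample proportion instead
z = NormalDist().inv_cdf(0.975)
half = z * math.sqrt(phat * (1 - phat) / n)
print(round(phat - half, 6), round(phat + half, 6))
```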
11.1.5 Estimation and Testing of Hypotheses about Two
Population Means and Proportions
2-Sample t
In Chapters 9 and 10 we saw that if small samples are taken from two normal populations with unknown variances, we use the t-statistic for finding a confidence interval for the difference of two population means and for testing a
hypothesis about the two population means. We can achieve this goal by using the MINITAB
procedure for 2-Sample t.
From the Menu bar select Stat > Basic Statistics > 2-Sample t. This
will prompt a dialog box titled 2-Sample t (Test and confidence interval)
to appear. Check the circle next to Samples in one column if you have entered
the raw data in a single column, differentiated by subscript values in a second column. Enter columns C1 and C2 in the boxes next to Samples and
Subscripts, respectively. Check the circle next to Samples in different
columns if the data for the two samples are entered in two separate columns,
and enter C1 and C2 in the boxes next to First and Second. If you have summary data, check the circle next to Summarized data and enter for the two
samples the sample size, sample mean, and sample standard deviation.
Check the box next to Assume equal variances only if the variances of the
two populations can be assumed to be equal. Then select Options to prompt
another dialog box to appear. Enter the confidence level and the value of the
mean difference under the null hypothesis in the boxes next to Confidence
level and Test difference, and, depending on the alternative hypothesis, choose
less than, greater than, or not equal in the box next to Alternative. Then
click OK in both dialog boxes. The MINITAB output, which provides the
confidence intervals and the p-value for the hypothesis testing, will appear in
the session window.
Example 11.15 The following data give the summary statistics of scores
on productivity for two groups, one group who are Six Sigma Green Belts
and another group who are not:
Group   Sample size   Mean   Standard deviation
1       25            93     2.1
2       27            87     3.7

Find a 95% confidence interval for the difference between the two population
means. Test the hypothesis H0: μ1 - μ2 = 0 versus H1: μ1 - μ2 ≠ 0 at the 5%
level of significance. Assume that the variances of the two populations are equal.
Solution: In this problem sample summary values are given. Following the
procedure discussed in the above paragraph, enter the appropriate values in
boxes of the dialog boxes. Then click OK in both dialog boxes. The overall
procedure is similar to that in Example 11.14. The MINITAB output will
appear in the session window as:
Group   N    Mean    StDev   SE Mean
1       25   93.00   2.10    0.42
2       22   87.00   3.70    0.79
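With equal variances assumed, the pooled two-sample t statistic behind this analysis can be sketched from the summary values (a sketch using the group sizes 25 and 27 stated in the problem; the p-value requires the t distribution and is not computed here):

```python
import math

n1, x1, s1 = 25, 93.0, 2.1   # Six Sigma Green Belts
n2, x2, s2 = 27, 87.0, 3.7   # non-Green Belts

# Pooled variance combines the two sample variances
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)

t_stat = (x1 - x2) / se      # test statistic for H0: mu1 - mu2 = 0
print(round(t_stat, 2))      # ~ 7.11
```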
Example 11.16 The following data give the test scores before and after a
two-week training of a group of 15 Six Sigma Green Belts:

Before  83 87 83 78 76 89 79 83 86 90 81 77 74 86 89
After   87 85 89 77 79 84 90 88 83 92 87 81 83 79 85
Test the hypothesis H0: μd = 0 versus H1: μd ≠ 0 at the 5% level of significance. Find a 95% confidence interval for the population mean difference
between before and after test scores.
Solution: Since we have two test scores for each Green Belt, the two
samples are not independent. To test the hypothesis and to find the desired
confidence interval, we use the paired t procedure. The MINITAB output
appears in the session window as:
             N    Mean      StDev    SE Mean
C1           15   82.7333   5.1056   1.3182
C2           15   84.6000   4.3556   1.1246
Difference   15
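The paired-t quantities can be reproduced from the Before/After scores; Paired t evaluates the first sample minus the second (the p-value again requires the t distribution and is left out of this sketch):

```python
from statistics import mean, stdev
import math

before = [83, 87, 83, 78, 76, 89, 79, 83, 86, 90, 81, 77, 74, 86, 89]
after = [87, 85, 89, 77, 79, 84, 90, 88, 83, 92, 87, 81, 83, 79, 85]

diffs = [b - a for b, a in zip(before, after)]  # first sample minus second
n = len(diffs)
dbar = mean(diffs)                              # mean difference
sd = stdev(diffs)                               # std dev of the differences
t_stat = dbar / (sd / math.sqrt(n))             # statistic for H0: mu_d = 0
print(round(mean(before), 4), round(mean(after), 4), round(dbar, 4))
# 82.7333 84.6 -1.8667
```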
Sample   X    N     Sample p
1        8    120   0.066667
2        12   150   0.080000
From the Menu bar select Stat > Basic Statistics > 2 Variances. This
will prompt a dialog box titled 2 Variances to appear. Check the circle next
to Samples in one column if you have entered data into a single column,
with a second column of subscripts identifying the samples, and enter the
column references for the data and the subscripts in those boxes. Check the
circle next to Samples in different columns if the data for the two samples
are entered in two separate columns, and enter those column references next
to First and Second. If you have summary data, check the circle next to
Summarized data and enter sample sizes and sample variances in the appropriate boxes. Select Options to prompt another dialog box to appear. In this
dialog box, enter the confidence level (any number between 0 and 100; by
default, it is 95%). Then click OK in both dialog boxes. The MINITAB output, which provides the confidence intervals and the p-value for the hypothesis testing, will appear in the session window.
Example 11.18 Let σ1² and σ2² denote the variances of the serum cholesterol levels of elderly and young American men, respectively. For a sample of 32 elderly
American men, the sample standard deviation of serum cholesterol was 32.4;
for 36 young American men the sample standard deviation of serum cholesterol was 21.7. Do these data support, at the 5% level of significance, the
assumption that the variations of cholesterol levels in the two populations
are the same?
Solution: We are given the sample summaries. Following the procedure
discussed above, enter the appropriate values in the boxes of the dialog boxes.
Notice that the problem provides sample standard deviations rather than variances.
Click OK in both dialog boxes. The MINITAB output will appear in the session window as:
Sample   N    Lower     StDev   Upper
1        32   25.1974   32.4    44.9883
2        36   17.0998   21.7    29.4730
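The F statistic underlying this comparison is simply the ratio of the two sample variances (a sketch; deciding at the 5% level requires F critical values with 31 and 35 degrees of freedom, which MINITAB supplies):

```python
n1, s1 = 32, 32.4   # elderly American men
n2, s2 = 36, 21.7   # young American men

# F statistic: ratio of the sample variances
f_stat = s1**2 / s2**2
print(round(f_stat, 3))  # ~ 2.229
```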
Figure 11.14 MINITAB printout of 95% Bonferroni confidence intervals for standard
deviations.
Example 11.19 Use MINITAB's normality test to check whether the following data come from a normal population:

23 25 20 16 19 55 42 25 28 29 36 26 27 35 41
55 20 24 29 26 37 38 24 26 34 36 38 39 32 33
Solution: Enter the data in column C1 of the data window, then follow the
steps discussed in the Normality Test procedure. The MINITAB output, the
normal probability graph shown in Figure 11.15, will appear in the graph
window. If all the data points fall almost on a straight line, we can conclude
that the sample comes from a normal population. The decision whether all the
points fall almost on a straight line is somewhat subjective. Imagine a 10-year-old child putting a finger on the straight line: if all the points are hidden
under the finger, we can say the data pass the normality test. In this example,
all the data points except two fall on a straight line. Thus we
can assume that the data in this example pass the normality test. Moreover,
MINITAB also provides the p-value for one of the tests mentioned above.
For instance, we determined the p-value for the Anderson-Darling normality
test. Clearly the p-value is greater than 5%, the level of significance.
Therefore, we can assume the given data come from a normal population.
The normal probability plot can usually be created in two steps:
1. Arrange the data in the ascending order and rank them 1, 2, 3,
..., n, where n is the sample size.
2. Plot the ith-ranked observation against 100(i - 0.5)/n on
special graph paper, called normal-probability graph paper (see
Figure 11.15). If the plotted points fall on a straight line, it
implies that the sample comes from a normal population.
Note that the horizontal axis contains the data values and the vertical axis
contains the values of 100(i - 0.5)/n.
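The two steps above can be sketched in Python for the data of Example 11.19, pairing each ordered observation with its plotting position 100(i - 0.5)/n:

```python
data = [23, 25, 20, 16, 19, 55, 42, 25, 28, 29, 36, 26, 27, 35, 41,
        55, 20, 24, 29, 26, 37, 38, 24, 26, 34, 36, 38, 39, 32, 33]
n = len(data)

# Rank the ordered observations 1..n and compute 100*(i - 0.5)/n for each
points = [(x, 100 * (i - 0.5) / n) for i, x in enumerate(sorted(data), start=1)]
print(points[0], points[-1])  # smallest and largest plotting pairs
```

Plotting these pairs on normal-probability paper (data on the horizontal axis, positions on the vertical) gives the graph MINITAB produces.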
Figure 11.15 MINITAB display of normal probability graph for the data in Example
11.19.
Figure 11.16 The screen that appears first in the JMP environment.
and get into the JMP environment, you will see the image in Figure 11.16 on
your screen. The pull-down menus appear at the top of the screen.
Menu commands include
File Edit Tables Rows Cols DOE Analyze Graph Tools View Window
Help
By clicking any of these menu commands, we arrive at options included
in that command. For example, if we click on the File menu, we get the drop-down menu as shown in Figure 11.17. The first option, New, allows us to
create a data table as displayed in Figure 11.17.
Creating a Data Table
New data are entered in a Data Table. The strength of JMP as a statistical
analysis software package is derived from its ability to process data in
columns, more so than in rows. Data can be entered in one or more columns
depending upon the setup of the problem. By default, one active column
appears in the Data Table window. To add more columns, double-click on the
blank space to the right of the last column that was created. The first column
on the far left serves as an index of the number of cells. Labels can be entered
for each column by double-clicking on the top cell of each column and entering a label such as Part Name, Shift, Lot Number, Operator, or Machine. In
the labeled cells you can enter data using a single cell for a single data point.
Saving a Data Table
Using the File > Save As command allows users to save the current
Data Table. When you enter this command, a dialog box entitled Save JMP
File As appears. Type the file name in the box next to File Name, choose
the destination drive, and then click Save.
Retrieving a Saved JMP Data Table
Using the command File > Open will prompt the dialog box
Open Data Table to appear. Select the drive and directory where the file was
saved by clicking the down arrow next to the Look in box, enter the file
name in the box next to File Name, and click Open. The data will appear in
the same format it was last saved in.
Importing Data from the Internet
Using the command File > Internet Open... will prompt the dialog box
Internet Open Window to appear. The default protocol is HTTP; if you are
importing data from the Internet using an FTP site, then select ftp from the
drop-down menu. In the URL box, type the website address that contains
the data you want to import into JMP for analysis.
11.2.2 Calculating Descriptive Statistics
Column Statistics
First, enter the desired data in the active Data Table window. Then from the
Menu command select Tables > Summary. Select one or more data columns
in the box located on the left side of the dialog box to calculate descriptive
statistics. Then click on the Statistics button. A drop-down menu appears
with various options available to compute statistics for selected columns, such
as the sum, mean, standard deviation, minimum, maximum, range, median,
sum of squares, N total, N nonmissing, and N missing. All these choices of
statistics appear in the dialog box shown in Figure 11.18. Then click OK.
Example 11.20 Use the following steps to calculate any one of the statistics
listed in the dialog box titled Column Statistics, using the following data:
8 9 7 6 5 6 8 9 8 9
Solution:
1. Open a new Data Table under File > New > Data Table.
2. Enter the data in column 1 of the Data Table window.
3. Select Tables from the Menu command.
4. Click Summary from the pull-down menu available in the
Tables command menu.
5. Select column 1 in the box located on the left side of the dialog
box.
6. Under the Statistics option on the right side of the window,
choose the statistics that you want included in the summary.
Repeat this step until you have selected all the statistics that you
would like to be included in the summary.
7. Click OK.
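The summary statistics JMP reports here can be cross-checked in Python (a sketch assuming the example's data are the ten single-digit values 8, 9, 7, 6, 5, 6, 8, 9, 8, 9):

```python
from statistics import mean, median, stdev

data = [8, 9, 7, 6, 5, 6, 8, 9, 8, 9]  # assumed reading of the example data

print(sum(data), min(data), max(data))  # 75 5 9
print(mean(data), median(data))         # 7.5 8.0
print(round(stdev(data), 4))            # sample standard deviation
```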
Figure 11.18 JMP window showing input and output for Column Statistics.
Example 11.21 Construct a histogram for the following data:

23 25 20 16 19 18 42 25 28 29 36 26 27 35 41
18 20 24 29 26 37 38 24 26 34 36 38 39 32 33
Figure 11.20 JMP display of histogram for the data given in Example 11.21.
Solution: Take the following steps to generate a histogram from the data set
in Example 11.21.
1. Enter the data in column 1 of the Data Table.
2. Select Tables from the Menu command.
3. Click Summary from the pull-down menu.
4. Select column 1 in the Select Columns box.
5. Select Y, Columns from Cast Selected Columns into Roles.
6. Click OK from Action.
To make the graph layout horizontal, click on the red arrow to the left of
the column 1 title in the histogram window, shown in Figure 11.20. Select
Display Options > Horizontal Layout.
Stem and Leaf
Prepare a stem and leaf diagram for the data in Example 11.21.
1. Enter the data in column 1 of the Data Table.
2. Select Analyze > Distribution from the Menu command.
3. Click the column name under Select Columns.
4. Select Y, Columns from Cast Selected Columns into Roles.
5. Click OK from Action.
6. Select the red arrow to the left of the column name.
7. Select Stem and Leaf from the drop-down menu.
290
Chapter Eleven
Figure 11.21 JMP printout of stem and leaf for the data given in Example 11.21.
The stem and leaf diagram for the data in Example 11.21 is shown in
Figure 11.21.
Box Whisker Plot
Extending our discussion of the data in Example 11.21, JMP is capable of
generating a box whisker plot along with the histogram. In JMP, there are
two types of box plots: Outlier Box Plot and Quantile Box Plot. Outlier
Box Plot is generated by default. To generate the Quantile Box Plot, take
the following steps:
1. Right-click anywhere outside the histogram graph, shown in
Figure 11.20.
2. Select the Quantile Box Plot Option.
To set the graph in a horizontal position as shown in Figure 11.22, left-click the red arrow to the left of the title Distributions above the graph and
select the stack option.
Displayed above the histogram, we see both outlier and quantile box
plots. Data points that are considered outliers are presented as individual
points in the tails of the graphic while the interquartile range is indicated by
the width of the box. The single vertical line within the box represents the
Figure 11.22 JMP display of box plot with summary statistics for Example 11.21.
median of the data. The center of the diamond displays the mean, and the
width of the diamond indicates a 95% confidence interval for the mean. JMP
lists by default the quantiles and other related statistics (moments), such as the
mean and standard deviation, to the right of the graph.
Graphical Summary
First, enter the data in one or more columns of the Data Table, depending
upon whether you have data on one or more variables. For each variable use
only one column. Then using the menu command, select Tables >
Summary. These commands prompt a dialog box titled JMP: Summary to
appear. Select the appropriate columns by highlighting them. Under the
Statistics option select the appropriate statistics to display. Once all the statistics to be displayed are selected, select OK. This option provides both
graphical and numerical descriptive statistics. A separate graph and summary of statistics is displayed for each variable.
To fit a distribution to the histogram, click on the red arrow to the left of
Column 1 in the Distribution dialog box. This action provides a drop-down
menu. From the drop-down menu, select Fit Distribution > Normal (to fit
a normal distribution to the data, or choose any other desired
distribution such as Weibull, Exponential, or Poisson). Click the red arrow
by the side of Fitted Normal, and more options become available, such as
Goodness of Fit Test and Density Curve.
Example 11.22 Prepare a graphical summary for the following data:

23 25 20 16 19 55 42 25 28 29 36 26 27 35 41 55
20 24 29 26 37 38 24 26 34 36 38 39 32 33 25
Figure 11.23 JMP display of graphical summary for the data in Example 11.22.
Bar Chart. In the Cast Selected Columns into Roles box, select N from the
Statistics pull-down menu. Then select X, Level option and click OK. Note
that the type of data for the bar chart should be qualitative.
Example 11.23
Solution: Take the following steps to generate a bar chart as shown in Figure
11.24.
1. Enter the data in a column.
2. Under the Command Menu select Graph > Chart.
3. A dialog box titled Chart appears.
4. Select the desired column from the Select Columns box.
5. Under the Options box in this window, select Bar Chart from
the drop-down menu.
Figure 11.24 JMP display of bar graph for the data in Example 11.23.
Solution: Take the following steps to generate a pie chart as shown in Figure
11.25.
1. Enter the data in a column.
2. Under the Command Menu select Graph > Chart.
3. A dialog box entitled Chart appears.
4. Select the desired column from the Select Columns box.
5. Under the Options box in this window, select Pie Chart from
the drop-down menu.
6. Under Cast Selected Columns into Roles, click on the
Statistics button.
7. Select N from the Statistics drop-down menu.
8. Then select the X, Level option.
9. Click OK.
Figure 11.25 JMP printout of pie chart for the data in Example 11.24.
Example 11.25 Consider the following data:

23 25 20 16 19 35 42 25 28 29 36 26
27 35 41 30 20 24 29 26 37 38 24 26
Figure 11.26 JMP printout of 1 sample t-test for the data in Example 11.25.
Select the red arrow to the left of the Column 1 in the Distribution box.
Select Test mean. In the Specify Hypothesized Mean box, enter the value
of the mean under the null hypothesis. Then enter the value of the standard
deviation in the box next to Enter True Standard Deviation to do a z-test
rather than a t-test. The t-test is specified by default.
Example 11.26 Consider the following data from a population with an
unknown mean μ and standard deviation σ, which may or may not be known:

23 25 20 16 19 35 42 25 28 29 36 26 27 35 41 30
20 24 29 26 37 38 24 26 34 36 38 39 32 33 25 30
Figure 11.27 JMP printout of 1 sample z-test for the data in Example 11.26.
column. From the JMP starter window, select the 2-sample t-test. From
the Select Column option in the dialog box, select the column containing the
response, and then click on the Y Response in the Cast selected columns
into Roles box. Then select the second column, which contains the sample identification, from the same dialog box in the Select Column options, and then
click on the X grouping option in the Cast selected columns into Roles
box. Click OK. For more options, such as Displaying Quantiles, click on the
red arrow to the left of Oneway Analysis of Column 1 By Column 2.
Example 11.27 A company bought resistors from two suppliers, 26 resistors from the first supplier and 19 from the second supplier. The following
data show the coded values of the resistance of the resistors bought by the
company. Assuming that the samples come from two normal populations,
determine whether the mean resistance of one population is significantly different from the other. Use α = 0.05. Also, find a 95% confidence interval for the
difference of the two population means.
Sample 1 7.366 7.256 4.537 5.784 5.604 7.987 10.996 6.743 8.739 4.963 9.065 6.451 7.028
6.924 6.525 9.346 5.157 6.372 9.286 3.818 3.221 11.073 6.775 7.779 4.295 5.964
Sample 2 7.730 5.366 4.365 3.234 5.334 6.870 4.268 5.886 7.040 5.434 4.370 4.239 3.875
4.154 5.798 5.995 5.324 5.190 5.330
Solution:
1. Enter the data in a column in stacked form.
2. In the second column, identify the sample for each observation
that was entered in the first column.
3. Under the JMP starter window, click two sample t-test.
4. In the Select Columns box, select the data column, then click
Y, Response.
5. Select the sample identification column from Select Columns,
and then click X, Grouping.
6. Click OK.
The JMP output is as shown in Figure 11.28. The output gives a 95%
confidence interval (0.66840, 2.59939) for the difference of the two population means. The p-value for the test is 0.0014, which is less than the 5% level
of significance. Thus, we reject the null hypothesis and conclude that the
mean resistances of the resistors supplied by the two suppliers are significantly different.
Paired t
From the Menu bar select Analyze > Matched Pairs. Choose
Samples in columns if you have entered raw data in two columns. Once the
columns have been selected in the Select Columns box (select more than
one column by holding the Ctrl key), click on the Y, Paired Response button, and click OK.

Figure 11.28 JMP printout of 2-sample t-test for the data in Example 11.27.

A graphical display is provided for the paired t-test. Paired t evaluates
the first sample minus the second sample. The results also provide the confidence intervals.
Example 11.28 The following data give the test scores before and after a
two-week training of a group of 15 Six Sigma Green Belts:
Before  83 87 83 78 76 89 79 83 86 90 81 77 74 86 89
After   87 85 89 77 79 84 90 88 83 92 87 81 83 79 85
Test the hypothesis H0: μd = 0 versus H1: μd ≠ 0 at the 5% level of significance. Find a 95% confidence interval for the difference in the before and
after population means.
Solution:
1. Enter the before scores in one column and the after scores in a second column.
2. From the Menu bar, select Analyze > Matched Pairs > Paired t.
3. In the Select Columns box, select the two data columns (hold the Ctrl key to select both), and then click Y, Paired Response.
4. Click OK.
Chapter Eleven
Figure 11.29 JMP printout of paired t-test for the data in Example 11.28.
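The paired t computation behind that printout can be sketched with the standard library alone (this is our own illustration, not JMP output; the statistic is formed on the per-student differences):

```python
import math
import statistics

# Test scores from Example 11.28
before = [83, 87, 83, 78, 76, 89, 79, 83, 86, 90, 81, 77, 74, 86, 89]
after  = [87, 85, 89, 77, 79, 84, 90, 88, 83, 92, 87, 81, 83, 79, 85]

# Paired t works on the per-subject differences (after minus before)
d = [a - b for a, b in zip(after, before)]
n = len(d)
d_bar = statistics.mean(d)
t = d_bar / (statistics.stdev(d) / math.sqrt(n))
print(f"mean difference = {d_bar:.3f}, t = {t:.3f}")
```

Working on the differences rather than the raw columns is what distinguishes the paired test from the two-sample test of Example 11.27.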
Solution:
1. Enter the data in a column in a stacked form.
2. In the second column, identify the sample for each observation
that was entered in the first column.
3. Under the JMP starter window, click two sample t-test.
23 25 20 16 19 55 42 25 28 29 36 26 27 35 41
55 20 24 29 26 37 38 24 26 34 36 38 39 32 33
Solution: We used the normal quantile plot to verify if the sample comes
from a normal population. To construct the normal quantile plot, take the following steps.
1.
2.
3.
4.
Figure 11.31 JMP display of normal quantile graph for the data in Example 11.30. (The plot shows the data, from about 15 to 55, against the normal quantiles, with a dotted straight line whose slope represents the standard deviation and dotted red confidence curves on either side.)
Acknowledgments
Priya, for reminding him that there is always time for play. Fred would like to sincerely thank his wife, Julie, and sons, Carl and George, for their love, support, and patience as he worked on this and two previous books. Without their encouragement, such projects would not be possible or meaningful.
Bhisham C. Gupta
H. Fred Walker

Chapter 2
1. (a) quantitative (b) qualitative (c) quantitative (d) quantitative (e) qualitative (f) qualitative
2. (a) quantitative (b) qualitative (c) quantitative (d) qualitative (e) quantitative
3. (a) ordinal (b) nominal (c) nominal (d) ordinal (e) nominal
4. (a) interval (b) ratio (c) ratio (d) ratio
5. (a) nominal (b) ordinal (c) nominal (d) interval (e) ratio (f) ratio
6. Answers may vary:
(a) nominal: Types of apples in an apple orchard
(b) ordinal: Guidebook rating of local restaurants: poor, fair, good
(c) interval: Water temperature of Lake Michigan
(d) ratio: Cost of tickets to a baseball game
7. (a) nominal (b) ratio (c) ratio (d) ordinal (e) nominal (f) ratio (g) interval
8. (a) descriptive (b) descriptive (c) inferential (d) inferential (e) inferential (f) descriptive
9. Population: A collection of all conceivable individuals, elements, numbers, or entities which possess a characteristic of interest.
Sample: A portion of a population selected for study.
Random Sample: A sample in which every element of the population has an equal chance of being selected.
Representative Sample: A sample that has approximately the same distribution of characteristics as the population from which it was drawn.
Descriptive Statistics: A branch of statistics that uses techniques to organize, summarize, present, and interpret a data set to draw conclusions that do not go beyond the boundaries of the data set.
10. (a) inferential (b) descriptive (c) inferential (d) inferential (e) descriptive
11. (a) population (b) sample (c) sample (d) sample (e) population
(a) no (b) no (c) no (d) yes (e) yes
(f) no, since the persons listed in the telephone book may not all be eligible voters.
Chapter 3
1. (a) & (b)
Number of classes m = 1 + 3.3 log(30) = 5.87 ≈ 6
Class width = Range/m = 20/6 = 3.33 ≈ 4.0

Class Limit   Freq.   Rel. Freq.   Percent
[40-44)       5       0.1667       16.67%
[44-48)       7       0.2333       23.33%
[48-52)       5       0.1667       16.67%
[52-56)       6       0.2000       20.0%
[56-60)       6       0.2000       20.0%
[60-64)       1       0.0333       3.33%
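The class-count rule used above can be reproduced in a few lines of Python (a minimal sketch of the arithmetic only; the variable names are ours):

```python
import math

# Rule used above: m = 1 + 3.3 log10(n) classes
n = 30            # number of observations
data_range = 20   # max minus min

m = 1 + 3.3 * math.log10(n)    # 5.87, rounded up to 6 classes
width = data_range / round(m)  # 20/6 = 3.33, rounded up to 4
print(f"m = {m:.2f}, width = {width:.2f}")
```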
(c) Histogram of the annual salaries of Six Sigma Green Belt workers (frequency vs. annual salary, class boundaries 39.5 to 63.5).
2. (a)
Class Limit   Freq.
2             9
3             11
4             12
5             5
6             3
(b) Bar chart of the number of defective items (frequency vs. number of defective items).
(c) Dotplot of the number of defective items.
3. (a) & (b)
Class Limit   Freq.   Rel. Freq.   Percent
[40-45)       5       0.1667       16.67%
[45-50)       8       0.2667       26.67%
[50-55)       4       0.1333       13.33%
[55-60)       4       0.1333       13.33%
[60-65)       4       0.1333       13.33%
[65-70)       5       0.1667       16.67%
(c) Histogram of the number of computers assembled (frequency vs. number of computers assembled, 40 to 70).
(d) Frequency polygon of the number of computers assembled (class marks 42.5 to 67.5).
4. (a) Bar chart of R&D budget by facility location (Chicago, Detroit, Houston, New York, St. Louis).
(b)
Class       Freq.   Percentage           No. of Degrees
Chicago     3.5     3.5/27.1 = 12.9%     0.129 x 360 = 46
Detroit     5.4     19.9%                72
Houston     4.2     15.5%                56
New York    8.5     31.4%                113
St. Louis   5.5     20.3%                73
Pie chart: New York 31.4%, St. Louis 20.3%, Detroit 19.9%, Houston 15.5%, Chicago 12.9%.
5.
Freq.   Rel. Freq.   Percent
9       0.225        22.5%
12      0.300        30.0%
11      0.275        27.5%
8       0.200        20.0%
6. (a)
Class Limit   Freq.
1             10
2             10
3             13
4             17
0013355777799
01226689
33345679
0
11112233345666778999
0123344456799
0001111122233345555666799
00
Class Limit   Freq.   Cum. Freq.
[30-35)       10      10
[35-40)       10      20
[40-45)       8       28
[45-50)       5       33
[50-55)       15      48
[55-60)       12      60
(c) Cumulative frequency histogram of the diameter of ball bearings (mm), 30 to 60.
(d) Ogive of the diameter of ball bearings (mm), class marks 32 to 57.
10. Stem-and-leaf diagram of the amount of gasoline sold (in gallons); 50 | 1 = 501 gallons of gasoline:
50 | 11355
51 | 002233579
52 | 0234579
53 | 2344445699
54 | 01456
55 | 024566
56 | 012356788
57 | 022246679
11.
501 513 529 539 556 568
501 515 532 540 556 570
503 517 533 541 560 572
505 519 534 544 561 572
505 520 534 545 562 572
510 522 534 546 563 574
510 523 534 550 565 576
01111111122233344
556666777788888999
000000012222344444
555557788888999
00000111111122222222333333444444
011111111
222333
4455
66667777
88888999
00000001
22223
4444455555
77
88888999
000001111111
22222222333333
444444
512 524 535 552 566 576
512 525 536 554 567 577
513 527 539 555 568 579

Chapter 4
10. Box plot of the length of rods in cm.
Class Limit   Class Mark   Freq.
[40-44)       42           11
[44-48)       46           5
[48-52)       50           6
[52-56)       54           2
[56-60)       58           11
[60-64)       62           1
(b) Box plot of the ages in years.
(c) 60 % of the ages are within one standard deviation of the mean
100% of the ages are within two standard deviations of the mean
100% of the ages are within three standard deviations of the mean
13. (a) mean = 37.65, median = 38, standard deviation = 1.748, coefficient of variation = 4.64
(b) Box plot of the ages in weeks.
(c) 65% of all the ages are within one standard deviation of the mean
100% of all the ages are within two standard deviations of the mean
100% of all the ages are within three standard deviations of the mean
(d) The distribution appears to be skewed left.
14.
Categories   Freq.
35           5
36           8
37           6
38           7
39           5
40           9
mean of the grouped data = 37.65, standard deviation of the grouped data = 1.7475. In this case
note that the grouped mean and grouped standard deviation are equal to the actual mean and standard
deviation.
15. The box plots of the two data sets allow us to visually see the range of the data sets, the shape of their
distributions, median, and their quartile values, while also allowing us to draw both comparative and
individual conclusions. For example, the box plot of data set #1 shows that the data set has a range of
about 10, a slightly left-skewed distribution, the median lies at about 26, while Q1 and Q3 are about 24 and 28, respectively. The box plot of data set #2 shows that the data set has a range of about 18, the shape of the distribution is somewhat left-skewed, the median lies at about 51, while Q1 and Q3 are about 46 and 57, respectively.
Box plots of data set #1 and data set #2.
Chapter 5
1. (1) A ∪ B
(2) A ∩ B
(3) A^c ∩ B^c
(4) (A ∩ B^c) ∪ (A^c ∩ B)
(5) (A ∪ B)^c
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
7. P(A1 | B) = P(A1)P(B | A1) / [P(A1)P(B | A1) + ... + P(A4)P(B | A4)] = (0.2)(0.4) / [(0.2)(0.4) + ... + (0.3)(0.2)] = 0.08/0.39 = 0.2051
P(A2 | B) = 0.05
P(A3 | B) = 0.6
P(A4 | B) = 0.15
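The Bayes' theorem pattern used in these problems can be captured in one small function (our own sketch; the middle priors and likelihoods of problem 7 are not reproduced in the text, so the demo values below are hypothetical):

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: P(Ai | B) for each hypothesis Ai."""
    joints = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joints)  # P(B), by the law of total probability
    return [j / total for j in joints]

# Problem 7 gives the numerator (0.2)(0.4) = 0.08 and P(B) = 0.39,
# so P(A1 | B) = 0.08/0.39 = 0.2051:
print(round(0.08 / 0.39, 4))

# Hypothetical priors/likelihoods, for illustration only:
post = posterior([0.2, 0.25, 0.25, 0.3], [0.4, 0.5, 0.4, 0.2])
print([round(p, 4) for p in post])
```

However the priors and likelihoods are chosen, the posteriors returned always sum to 1.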
8. P(W | m) = P(W)P(m | W) / [P(D)P(m | D) + ... + P(W)P(m | W)] = (0.35)(0.2) / [(0.4)(0.1) + ... + (0.35)(0.2)] = 0.07/0.1475 = 0.4746
9. P(A2 | lost) = P(A2)P(lost | A2) / [P(A1)P(lost | A1) + ... + P(A4)P(lost | A4)] = (0.25)(0.3) / [(0.4)(0.15) + ... + (0.1)(0.4)] = 0.075/0.225 = 0.3333
10. (i) P(C2 | H) = P(C2)P(H | C2) / [P(C1)P(H | C1) + ... + P(C4)P(H | C4)] = (0.25)(0.75) / [(0.25)(0.9) + ... + (0.25)(0.5)] = 0.1875/0.6875 = 0.2727
(ii) P(C4 | T) = P(C4)P(T | C4) / [P(C1)P(T | C1) + ... + P(C4)P(T | C4)] = (0.25)(0.5) / [(0.25)(0.1) + ... + (0.25)(0.5)] = 0.125/0.3125 = 0.4000
11. (i) P(m | favor) = P(m)P(favor | m) / [P(m)P(favor | m) + P(f)P(favor | f)] = (0.55)(0.75) / [(0.55)(0.75) + (0.45)(0.4)] = 0.4125/0.5925 = 0.6962
P(M2 | np) = 0.1443
13. Let 0A = no accident, 1A = one accident, 2A = two or more accidents, A = accident:
P(0A | A) = P(0A)P(A | 0A) / [P(0A)P(A | 0A) + ... + P(2A)P(A | 2A)] = (0.6)(0.01) / [(0.6)(0.01) + ... + (0.15)(0.1)] = 0.006/0.0285 = 0.2105
14. W = White, AA = African American, H = Hispanic, A = Asian, Sci = Science Major
(i) P(W | Sci) = P(W)P(Sci | W) / [P(W)P(Sci | W) + ... + P(A)P(Sci | A)] = (0.4)(0.5) / [(0.4)(0.5) + ... + (0.15)(0.75)] = 0.2/0.4225 = 0.4734
(ii) P(A | Sci) = 0.2623
(iii) P(AA | Sci) = 0.1420
15. (i) P(A | D) = P(A)P(D | A) / [P(A)P(D | A) + ... + P(C)P(D | C)] = (0.45)(0.05) / [(0.45)(0.05) + ... + (0.30)(0.03)] = 0.0225/0.0365 = 0.6164
(ii) P(B | D) = 0.1370
(iii) P(C | D) = 0.2466
16. P(C4 | 2H) = P(C4)P(2H | C4) / [P(C1)P(2H | C1) + ... + P(C4)P(2H | C4)] = (0.2)(1.0) / [(0.2)(0.25) + ... + (0.2)(0.0)] = 0.2/0.35 = 0.5714
17. P(A | D^c) = P(A)P(D^c | A) / [P(A)P(D^c | A) + ... + P(C)P(D^c | C)] = (0.4)(0.98) / [(0.4)(0.98) + ... + (0.35)(0.96)] = 0.392/0.968 = 0.4050
Chapter 6
1. (b) P(X < 15) = P(14) + P(13) + ... + P(1) + P(0) = 0.1958
(c) P(X = 14) = 0.1091
(d) Let n = 20, p = 0.2, q = 0.80, since the probability of a car being from Maine is 0.2.
P(X = 8) = 0.0222
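Answers like P(X = 8) above come straight from the binomial probability function, which is easy to sketch with math.comb (our own helper, not from the book):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Part (d): n = 20, p = 0.2
print(round(binom_pmf(8, 20, 0.2), 4))  # 0.0222
```

Cumulative answers such as P(X < 15) are just sums of these terms over the relevant k.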
(b) P(X > 5) = 1 − P(X ≤ 5) = 1 − P(5) − P(4) − ... − P(1) − P(0) = 0.3348
(c) P(X < 8) = P(7) + P(6) + ... + P(1) + P(0) = 0.9427
(d) P(X = 0) = 0.0022

(b) P(X < 8) = P(7) + P(6) + ... + P(1) + P(0) = 0.0500
(c) P(10 ≤ X ≤ 12) = P(X ≤ 12) − P(X ≤ 9) = 0.8720 − 0.2731 = 0.5989
5. N = 100, n = 10, r = 8
(a) P(X ≥ 1) = 1 − P(X = 0) = 1 − C(8,0)C(92,10)/C(100,10) = 1 − (7.21 × 10^12)/(1.73 × 10^13) = 1 − 0.4166 = 0.5834
(b) P(X = 10) = 0.0000
(c) P(X = 9) = 0.0000
(d) P(X = 0) = 0.4166
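The hypergeometric probabilities above can be checked directly from the binomial coefficients (a sketch with our own helper name, standard library only):

```python
from math import comb

def hypergeom_pmf(k, N, n, r):
    """P(X = k): k successes in n draws, without replacement,
    from N items of which r are successes."""
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

# Problem 5: N = 100, n = 10, r = 8
p0 = hypergeom_pmf(0, N=100, n=10, r=8)
print(round(p0, 4), round(1 - p0, 4))
```

Note that P(X = 9) and P(X = 10) are exactly zero here, since only r = 8 successes exist in the population.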
6. mean = μ = np = 10(8/100) = 0.80
variance = σ² = [(N − n)/(N − 1)] npq = [(100 − 10)/(100 − 1)] (10)(8/100)(92/100) = 0.6691
standard deviation = σ = √[((N − n)/(N − 1)) npq] = √0.6691 = 0.8180
7. N = 12, n = 5, a = 5, b = 7
(a) P(X ≥ 2) = 1 − P(0) − P(1) = 1 − C(5,0)C(7,5)/C(12,5) − C(5,1)C(7,4)/C(12,5)
(b) P(X ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − C(7,0)C(5,5)/C(12,5) − C(7,1)C(5,4)/C(12,5) − C(7,2)C(5,3)/C(12,5)
(d) P(X = 0) = 0.0265
8. λ = (2000)(3/1000) = 6.0
(a) P(X ≥ 4) = 1 − P(X < 4) = 1 − P(3) − P(2) − P(1) − P(0) = 1 − 0.0025 − 0.0149 − 0.0446 − 0.0892 = 0.8488
(b) P(X ≤ 10) = P(10) + P(9) + ... + P(1) + P(0) = 0.9574
(c) P(5 ≤ X ≤ 8) = P(X ≤ 8) − P(X ≤ 4) = 0.874 − 0.446 = 0.428
(d) P(X < 2) = P(1) + P(0) = 0.0174
(e) P(X > 2) = 1 − P(2) − P(1) − P(0) = 1 − 0.0446 − 0.0149 − 0.0025 = 0.9380
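Each of those terms is the Poisson probability function, which can be sketched directly (our own helper, standard library only):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2000 * (3 / 1000)  # 6.0, as in problem 8
# Part (d): P(X < 2) = P(0) + P(1)
p = poisson_pmf(0, lam) + poisson_pmf(1, lam)
print(round(p, 4))  # 0.0174
```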
9. λ = (2/1)(5) = 10.0
(a) P(X < 8) = P(7) + P(6) + ... + P(1) + P(0) = 0.2202
(b) P(X ≥ 4) = 1 − P(X < 4) = 1 − P(3) − P(2) − P(1) − P(0) = 0.9897
(c) P(3 ≤ X ≤ 5) = P(X ≤ 5) − P(X ≤ 2) = 0.067 − 0.003 = 0.064
(d) P(X > 1) = 1 − P(X ≤ 1) = 1 − P(1) − P(0) = 0.9995
10. λ = (4/10)(25) = 10.0
(b) P(X ≤ 2) = P(2) + P(1) + P(0) = 0.0028
(c) P(2 ≤ X ≤ 6) = P(X ≤ 6) − P(X ≤ 1) = 0.130 − 0.000 = 0.130
(d) P(X < 6) = P(5) + P(4) + ... + P(1) + P(0) = 0.067
11. (a) This experiment can be studied using a binomial model because it satisfies all of the necessary conditions: there are a fixed number of trials, the trials are independent, each trial has only two possible outcomes, and in each trial the probability of success is the same.
(b) This experiment cannot be studied using a binomial model.
(c) This experiment cannot be studied using a binomial model. By the definition of a binomial experiment, one of the conditions that needs to be satisfied is that each trial has only two possible outcomes, success and failure. This particular experiment observes the number that appears on the die, which means there are six possible outcomes.
(d) This experiment cannot be studied using a binomial model because it fails to satisfy the condition of having a fixed number of trials, since we are not told from how many companies the manufacturing company is being selected.
12. n = 20, p = 0.60, q = 0.40
(a) P(X ≥ 5) = 1 − P(X < 5) = 1 − P(4) − P(3) − ... − P(1) − P(0) = 0.9987
(b) P(X ≤ 7) = P(7) + P(6) + ... + P(1) + P(0) = 0.021
(c) P(5 < X < 10) = P(X ≤ 9) − P(X ≤ 5) = 0.128 − 0.057 = 0.071
(d) P(X = 8) = 0.0355
(e) P(X ≤ 9) = P(9) + P(8) + ... + P(1) + P(0) = 0.128
13. λ = (1000)(2/500) = 4.0
(a) P(X = 5) = e^(−λ) λ^x / x! = e^(−4) 4^5 / 5! = (0.01832)(1024)/120 = 0.1563
(b) (i) P(X > 5) = 1 − P(X ≤ 5) = 1 − P(5) − P(4) − ... − P(1) − P(0) = 0.2149
(ii) P(X ≤ 6) = P(6) + P(5) + ... + P(1) + P(0) = 0.8893
(iii) P(4 ≤ X ≤ 8) = P(X ≤ 8) − P(X < 4) = 0.97864 − 0.42247 = 0.5562
14. N = 500, n = 20, r = 60 (one can also find this probability very easily by using MINITAB or JMP)
P(X ≤ 3) = P(0) + P(1) + P(2) + P(3) = C(60,0)C(440,20)/C(500,20) + C(60,1)C(440,19)/C(500,20) + C(60,2)C(440,18)/C(500,20) + C(60,3)C(440,17)/C(500,20)
15. (a) λ = 5; P(X = 4) = 0.1755
(b) λ = (2)(5) = 10.0; P(X > 7) = 1 − P(X ≤ 7) = 1 − P(7) − P(6) − ... − P(1) − P(0) = 0.7798
(c) λ = (5/60)(90) = 7.5
(d) λ = (2)(5) = 10.0
Chapter 7
= P(−1 < Z < 1) = P(Z < 1) − P(Z ≤ −1) = 0.8413 − 0.1587 = 0.6826
(c) P(15 < X < 25) = P((15 − 20)/2 < (X − 20)/2 < (25 − 20)/2) = P(−2.5 < Z < 2.5)
(b) P(X > 14) = P((X − 15)/1.5 > (14 − 15)/1.5) = P(Z > −0.67) = 0.5 + 0.2486 = 0.7486
(c) P(X < 15) = P((X − 15)/1.5 < (15 − 15)/1.5) = P(Z < 0) = 0.5000
3. μ = 33, σ = 3
(a) P(27 < X < 30) = P((27 − 30)/3 < (X − 30)/3 < (30 − 30)/3) = P(−1 < Z < 0) = P(Z < 0) − P(Z ≤ −1) = 0.5000 − 0.1587 = 0.3413
(b) P(27 < X < 35) = P((27 − 30)/3 < (X − 30)/3 < (35 − 30)/3)
(c) P(32 < X < 39) = P((32 − 33)/3 < (X − 33)/3 < (39 − 33)/3) = P(Z < 2) − P(Z ≤ −0.33) = 0.9772 − 0.3707 = 0.6065
4. μ = 0, σ = 1
(a) P(−1 ≤ Z ≤ 1) = P(Z ≤ 1) − P(Z ≤ −1) = 0.8413 − 0.1587 = 0.6826
(b) P(−2 ≤ Z ≤ 2) = P(Z ≤ 2) − P(Z < −2) = 0.9772 − 0.0228 = 0.9544
(c) P(−3 ≤ Z ≤ 3) = P(Z ≤ 3) − P(Z < −3) = 0.9987 − 0.0013 = 0.9974
5. From the empirical rule we know that approximately 68% of the data values will fall within one standard deviation of the mean, 95% will fall within two standard deviations of the mean, and 99.7% will fall within three standard deviations of the mean. It is clear that the results obtained in problem 4 are very close to the results obtained using the empirical rule.
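Instead of reading Φ(z) from a table, the standard normal CDF can be evaluated with the error function (a sketch; phi is our own name):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Problem 4: P(-k <= Z <= k) for k = 1, 2, 3
for k in (1, 2, 3):
    print(k, round(phi(k) - phi(-k), 4))
```

The three printed values reproduce the 68-95-99.7 pattern of the empirical rule.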
6. (b) P(Z ≥ −1.2) = 1 − P(Z < −1.2) = 0.8849
(c) P(−1.58 ≤ Z ≤ 2.40) = P(Z ≤ 2.40) − P(Z < −1.58) = 0.9918 − 0.0559 = 0.9359
(d) P(Z ≥ 1.96) = 1 − P(Z < 1.96) = 1 − 0.9750 = 0.0250
(e) P(Z ≤ −1.96) = 0.0250
7. λ = 1.5
(a) P(X > 2) = e^(−λx) = e^(−(1.5)(2)) = e^(−3) = 0.0498
(b) P(X < 4) = P(X ≤ 3) = 1 − e^(−(1.5)(3)) = 1 − e^(−4.5) = 0.9889
(c) P(2 < X < 4) = P(X < 4) − P(X ≤ 2) = 0.9889 − 0.9502 = 0.0387
(d) P(X < 0) = 0.0000
8. 4.5
9. mean = μ = (1/λ)Γ(1 + 1/α) = (1/0.01)Γ(1 + 1/0.5) = (100)(2) = 200
variance = σ² = (1/λ²)[Γ(1 + 2/α) − Γ²(1 + 1/α)] = (1/0.01²)[Γ(1 + 2/0.5) − Γ²(1 + 1/0.5)] = (10000)(20) = 200000
10. (a) P(X > 4500) = e^(−(λx)^(1/2)) = e^(−(0.01 × 4500)^(1/2)) = 0.0012
(b) P(X > 7000) = e^(−(0.01 × 7000)^(1/2)) = 0.00023
11. mean = μ = (1/λ)Γ(1 + 1/α) = (1/0.001)Γ(1 + 1/2) = (1000)(0.886226) = 886.226
variance = σ² = (1/λ²)[Γ(1 + 2/α) − Γ²(1 + 1/α)] = (1/0.001²)[Γ(1 + 2/2) − Γ²(1 + 1/2)] = (1000000)(0.2146034769) = 214603.4769
12. (a) P(X < 800) = P(X ≤ 799) = 1 − e^(−(λx)^2) = 1 − e^(−(0.001 × 799)^2) = 0.4719
(b) P(X > 1000) = e^(−(0.001 × 1000)^2) = 0.3679
(c) P(1000 < X < 1500) = P(X ≤ 1500) − P(X ≤ 1000) = [1 − e^(−(0.001 × 1500)^2)] − [1 − e^(−(0.001 × 1000)^2)] = 0.8943 − 0.6321 = 0.2622
13. λ = 0.2
(a) P(X > 7) = e^(−λx) = e^(−(0.2)(7)) = e^(−1.4) = 0.2466
(b) P(7 < X < 10) = P(X < 10) − P(X ≤ 7) = 0.8647 − 0.7534 = 0.1113
(c) P(X ≥ 8 | X > 5) = P(X ≥ 8 and X > 5)/P(X > 5) = P(X ≥ 8)/P(X > 5) = 0.2019/0.36788 = 0.5488
(d) P(X < 7) = 1 − P(X ≥ 7) = 1 − 0.2466 = 0.7534
14. P(X ≤ 5 + 9 | X ≥ 5) = P(X ≤ 9) = 0.4065
15. λ = 0.00125
(a) P(X = 700) = 0.00, since the probability for a continuous random variable to be equal to any single value is zero.
(c) P(600 < X < 900) = P(X ≤ 900) − P(X ≤ 600) = 0.6754 − 0.5274 = 0.1480
(d) P(X ≥ 650) = 0.4437
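The exponential survival probabilities in problem 13, including the memoryless conditional in part (c), can be checked in a few lines (our own sketch, standard library only):

```python
from math import exp

lam = 0.2  # rate parameter from problem 13

def exp_sf(x, lam):
    """Survival function P(X > x) = e^(-lam*x) for an exponential variable."""
    return exp(-lam * x)

print(round(exp_sf(7, lam), 4))                   # 0.2466
# Memorylessness: P(X >= 8 | X > 5) = P(X >= 3)
print(round(exp_sf(8, lam) / exp_sf(5, lam), 4))  # 0.5488
```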
Chapter 8
1. Because the sample size is large, the sampling distribution of x̄ is approximately normal with mean μx̄ = 28 and standard deviation σx̄ = 9/√36 = 1.5.
2. mean = μx̄ = μ = 18
variance = σ²x̄ = (σ²/n)[(N − n)/(N − 1)] = (25/5)[(50 − 5)/(50 − 1)] = 4.59. Note that the population is finite and the sample size is > 5% of the population.
standard deviation = √4.59 = 2.1424
3. (b) the standard error will decrease from 6/√10 to 8/√20
(d) the standard error will decrease from 9/√16 to 18/√24
4. μx̄ = 3000, σx̄ = 100/√16 = 25
5. μx̄ = 140, σx̄ = 35/√49 = 5
(i) P(X̄ > 145) = P((X̄ − 140)/5 > (145 − 140)/5) = P(Z > 1) = 1 − P(Z ≤ 1) = 1 − 0.8413 = 0.1587
(ii) P(X̄ < 140) = P(Z < 0) = 0.5000
(iii) P(132 < X̄ < 148) = P(−1.6 < Z < 1.6) = 0.9452 − 0.0548 = 0.8904
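Problem 5 is the central limit theorem in action; the same numbers fall out of a short sketch (phi is our own standard-normal-CDF helper):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 140, 35, 49
se = sigma / sqrt(n)         # 5.0, the standard error of the mean
z = (145 - mu) / se          # 1.0
print(round(1 - phi(z), 4))  # P(Xbar > 145) = 0.1587
```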
6. μx̄ = 120, σx̄ = 10/√36 = 1.6667
(i) P(X̄ > 122) = P((X̄ − 120)/1.6667 > (122 − 120)/1.6667) = P(Z > 1.20) = 1 − P(Z ≤ 1.20) = 1 − 0.8849 = 0.1151
(ii) P(X̄ < 115) = P(Z < −3.00) = 0.0013
(iii) P(116 < X̄ < 123) = P(−2.40 < Z < 1.80) = 0.9641 − 0.0082 = 0.9559
7. μx̄ = 70, σx̄ = 4/√36 = 0.6667
(i) P(X̄ > 75) = P((X̄ − 70)/0.6667 > (75 − 70)/0.6667) = P(Z > 7.50) ≈ 0.0000
(ii) P(X̄ < 70) = P(Z < 0.00) = 0.5000
(iii) P(70 < X̄ < 80) = P(0.00 < Z < 15.00) = P(Z ≤ 15.00) − P(Z ≤ 0.00) = 0.5000
8. (i) Since np > 5 and nq > 5, the sample proportion is approximately normal with standard deviation σp̂ = √(pq/n).
9. (i) In this problem we are given n = 100 and p = 0.5. Thus we have np > 5 and nq > 5; therefore, the sample proportion p̂ is approximately normal with mean μp̂ = p = 0.5 and standard deviation σp̂ = √(pq/n) = √((0.5)(0.5)/100) = 0.05.
10. (i) In this problem we are given n = 500 and p = 0.8. Thus we have np > 5 and nq > 5; therefore, the sample proportion p̂ is approximately normal with mean μp̂ = p = 0.8 and standard deviation σp̂ = √(pq/n) = √((0.8)(0.2)/500) = 0.0179.
11. (i) In this problem we have n = 100 and p = 0.5. Thus we have np > 5 and nq > 5; therefore, the sample proportion p̂ is approximately normal with mean μp̂ = p = 0.5 and standard deviation σp̂ = √(pq/n) = √((0.5)(0.5)/100) = 0.05.
(ii) P(χ²15 ≥ 6.2621) = 0.975
(iii) P(χ²15 ≤ 6.2621) = 0.025
(iv) P(χ²15 ≥ 7.2609) = 0.95
(v) P(χ²15 ≤ 7.2609) = 0.05
(i) F10,12,0.95 = 1/F12,10,0.05 = 0.3436
(ii) F8,10,0.975 = 1/F10,8,0.025 = 0.2326
(iii) F15,20,0.95 = 1/F20,15,0.05 = 0.4292
(iv) F20,15,0.99 = 1/F15,20,0.01 = 0.3236
Chapter 9
1. margin of error E = z0.025 (σ/√n) = 1.96 (1.5/√36) = 0.49
2. (a) Since E(X̄) = μ, the sample mean X̄ is always an unbiased estimator of the population mean μ. Therefore, the point estimate of the population mean wage is 25.
(b) standard error of the point estimate is σ/√n = 4/√49 = 0.5714
(c) margin of error E = z0.025 (σ/√n) = 1.96 (4/√49) = 1.12
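The margin-of-error arithmetic in problems 1 and 2 is one line of code (a minimal sketch):

```python
from math import sqrt

z_alpha2 = 1.96   # z_{0.025}
sigma, n = 1.5, 36

E = z_alpha2 * sigma / sqrt(n)
print(round(E, 2))  # 0.49
```

Swapping in sigma = 4 and n = 49 reproduces the 1.12 of problem 2(c).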
3. x̄ = 12, s = 0.6, zα/2 = z0.005 = 2.58, n = 64
x̄ ± zα/2 (s/√n) = 12 ± 2.58 (0.6/√64) = (11.8065, 12.1935)
(x̄1 − x̄2) ± zα/2 √(σ1²/n1 + σ2²/n2) = (203 − 240) ± 1.645 √(6²/36 + 8.5²/49) = −37 ± 2.5877 = (−39.5877 ≤ μ1 − μ2 ≤ −34.4123)
(x̄1 − x̄2) ± zα/2 √(σ1²/n1 + σ2²/n2) = (203 − 240) ± 2.58 √(6²/36 + 8.5²/49) = −37 ± 4.0585 = (−41.0585 ≤ μ1 − μ2 ≤ −32.9415)
(x̄1 − x̄2) ± zα/2 √(s1²/n1 + s2²/n2) = (295,000 − 305,000) ± 2.33 √(10,600²/100 + 12,800²/121) = −10,000 ± 3667.5485 = (−13,667.5485 ≤ μ1 − μ2 ≤ −6,332.4515)
s²p = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2), with n1 + n2 − 2 = 16 + 25 − 2
95% confidence interval for the difference between two population means:
(x̄1 − x̄2) ± t(n1+n2−2, α/2) Sp √(1/n1 + 1/n2) = (10.17 − 12.34) ± (2.021)(1.0023)√(1/16 + 1/25) = −2.17 ± 0.6485 = (−2.8185 ≤ μ1 − μ2 ≤ −1.5215)
7. 95% confidence interval for the difference between two population means:
(x̄1 − x̄2) ± zα/2 √(s1²/n1 + s2²/n2) = (79 − 86) ± 1.96 √(30/49 + 40/36) = −7 ± 2.573 = (−9.573 ≤ μ1 − μ2 ≤ −4.427)
98% lower confidence limit for the difference between two population means:
(x̄1 − x̄2) − zα √(s1²/n1 + s2²/n2) = (79 − 86) − 2.06 √(30/49 + 40/36) = −7 − 2.704 = −9.704; (−9.704 ≤ μ1 − μ2)
98% upper confidence limit for the difference between two population means:
(x̄1 − x̄2) + zα √(s1²/n1 + s2²/n2) = (79 − 86) + 2.06 √(30/49 + 40/36) = −7 + 2.704 = −4.296; (μ1 − μ2 ≤ −4.296)
8. 95% confidence interval for the difference between two population means:
(x̄1 − x̄2) ± tm,α/2 √(s1²/n1 + s2²/n2), where
m = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)] = (1.21²/16 + 0.85²/25)² / [(1.21²/16)²/(16 − 1) + (0.85²/25)²/(25 − 1)] = 24.44 ≈ 24
Then, (10.17 − 12.34) ± 2.069 √(1.21²/16 + 0.85²/25) = −2.17 ± 0.7179 = (−2.8879 ≤ μ1 − μ2 ≤ −1.4521)
9. A point estimate of p is p̂ = 18/900 = 0.02.
95% confidence interval: p̂ ± zα/2 √(p̂q̂/n) = 0.02 ± 1.96 √((0.02)(0.98)/900) = 0.02 ± 0.009 = (0.011 ≤ p ≤ 0.029)
90% lower confidence limit: p̂ − zα √(p̂q̂/n) = 0.02 − 1.645 √((0.02)(0.98)/900) = 0.02 − 0.00767 = 0.01233; (0.01233 ≤ p ≤ 1)
10. p̂ = 22/400 = 0.055, q̂ = 0.945, n = 400
95% confidence interval for the population proportion:
p̂ ± zα/2 √(p̂q̂/n) = 0.055 ± 1.96 √((0.055)(0.945)/400) = 0.055 ± 0.0223 = (0.0327 ≤ p ≤ 0.0773)
Lower one-sided limit: p̂ − zα √(p̂q̂/n) = 0.055 − 1.645 √((0.055)(0.945)/400) = 0.055 − 0.0187 = 0.0363; (0.0363 ≤ p ≤ 1)
Upper one-sided limit: p̂ + zα √(p̂q̂/n) = 0.055 + 1.645 √((0.055)(0.945)/400) = 0.055 + 0.0187 = 0.0737; (p ≤ 0.0737)
11. p̂1 = 40/800 = 0.05, q̂1 = 0.95, p̂2 = 50/600 = 0.083, q̂2 = 0.917
95% confidence interval for the difference between two population proportions:
(p̂1 − p̂2) ± zα/2 √(p̂1q̂1/n1 + p̂2q̂2/n2)
12. p̂1 = 72/120 = 0.6, q̂1 = 0.40, p̂2 = 110/150 = 0.73, q̂2 = 0.27
95% confidence interval for the difference between two population proportions:
(p̂1 − p̂2) ± zα/2 √(p̂1q̂1/n1 + p̂2q̂2/n2)
13. n = z²α/2 p̂q̂/E² = (1.96)²(0.02)(0.98)/(0.025)² = 120.47 ≈ 121
14. n = z²α/2 p̂q̂/E² = (1.96)²(0.02)(0.98)/(0.05)² = 30.12 ≈ 31
When we increase the margin of error from 0.025 to 0.05 we see that the sample size decreases by 90, from 121 to 31.
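The sample-size formula in problems 13 and 14 can be sketched directly (our own helper name; the result is rounded up, as in the text):

```python
from math import ceil

def sample_size(p, E, z=1.96):
    """n = z^2 p q / E^2, rounded up to the next whole unit."""
    return ceil(z**2 * p * (1 - p) / E**2)

print(sample_size(0.02, 0.025))  # 121
print(sample_size(0.02, 0.05))   # 31
```

Halving the margin of error quadruples the required sample size, which is exactly the 121-vs-31 contrast noted above.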
15. E = width/2 = 20/2 = 10, zα/2 = z0.02/2 = z0.01 = 2.33
n = z²α/2 σ²/E² = (2.33)²(30)²/(10)² = 48.86 ≈ 49
16. When we decrease the confidence coefficient from 98% to 95%, the new sample size is
n = z²α/2 σ²/E² = (1.96)²(30)²/(10)² = 34.57 ≈ 35
When we increase the standard deviation from 30 to 50, the new sample size is
n = z²α/2 σ²/E² = (1.96)²(50)²/(10)² = 96.04 ≈ 97
18. (i) n = z²α σ²/E² = (1.645)²(40)²/(20)² = 10.82 ≈ 11
(ii) n = (1.645)²(40)²/(35)² = 3.53 ≈ 4
(iii) n = (1.645)²(40)²/(45)² = 2.14 ≈ 3
(iv) n = (1.645)²(40)²/(65)² = 1.02 ≈ 2
19. n = z²α/2 (p1q1 + p2q2)/E²
20. F(n2−1, n1−1, α) = F14,9,0.05 = 3.03
= (15.2423 ≤ σ² ≤ 48.3828)
Lower one-sided confidence limit: σ ≥ √[(n − 1)s²/χ²(n−1, α)] = √[(25 − 1)(5)²/36.4151] = 4.0591
Upper one-sided confidence limit: σ ≤ √[(n − 1)s²/χ²(n−1, 1−α)] = √[(25 − 1)(5)²/13.8484] = 6.5823
The lower one-sided confidence limit, 4.0591, is greater than the lower two-sided confidence limit, 3.9043. Also, the upper one-sided confidence limit, 6.5823, is less than the upper two-sided confidence limit, 6.9557. The difference in these confidence limits has to do with the way that alpha was used in each of the equations. For the two-sided confidence interval we used α/2 on each side, and for the one-sided confidence limits we used α.
22. x̄1 = 27.5, s1 = 1.96, n1 = 10
95% confidence interval for the population variance:
(n1 − 1)s1²/χ²(n1−1, α/2) ≤ σ1² ≤ (n1 − 1)s1²/χ²(n1−1, 1−α/2) = (10 − 1)(1.96)²/19.0228 ≤ σ1² ≤ (10 − 1)(1.96)²/2.7004 = (1.8175 ≤ σ1² ≤ 12.8034)
One-sided confidence limits: (10 − 1)(1.96)²/21.666 = 1.5958 ≤ σ1², and σ1² ≤ (10 − 1)(1.96)²/2.0879
23. (n2 − 1)s2²/χ²(n2−1, α/2) ≤ σ2² ≤ (n2 − 1)s2²/χ²(n2−1, 1−α/2) = (15 − 1)(2.21)²/26.119 ≤ σ2² ≤ (15 − 1)(2.21)²/5.6287 = (2.618 ≤ σ2² ≤ 12.148)
One-sided confidence limits: (15 − 1)(2.21)²/29.1413 = 2.3464 ≤ σ2², and σ2² ≤ (15 − 1)(2.21)²/4.6604
24. F(n2−1, n1−1, α/2) = F14,9,0.025 = 3.80, and F(n2−1, n1−1, 1−α/2) = 1/F(n1−1, n2−1, α/2) = 1/F9,14,0.025 = 0.3115
95% confidence interval for the ratio of two variances:
(1.96²/2.21²)(0.3115) ≤ σ1²/σ2² ≤ (1.96²/2.21²)(3.80) = (0.2450 ≤ σ1²/σ2² ≤ 2.988)
25. 95% upper confidence interval for the ratio of two variances:
F(n2−1, n1−1, α) = F14,9,0.05 = 3.03
(0, (s1²/s2²) F14,9,0.05) = (0, (1.96²/2.21²)(3.03)) = (0, 2.3832)
95% lower confidence interval for the ratio of two variances:
((s1²/s2²)(1/F9,14,0.05), ∞) = ((1.96²/2.21²)(0.3773), ∞) = (0.2967, ∞)

Chapter 10
Rejection region: RR = {|Z| ≥ zα/2} = {|Z| ≥ 1.96}
Test statistic: Z = (X̄ − μ0)/(s/√n) = (450 − 500)/(50/√36) = −6.0
Since |−6.0| > 1.96, we reject the null hypothesis H0.
4. (a) p-value = P(|Z| ≥ |z|) = P(|Z| ≥ 6.0) = 0.0000. Since the p-value = 0.0000 < 0.05 = α, we reject the null hypothesis H0.
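The one-sample z statistic and its p-value are a short computation (our own sketch; phi again approximates the standard normal CDF via the error function):

```python
from math import sqrt, erf

def z_statistic(xbar, mu0, s, n):
    """Z = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

z = z_statistic(450, 500, 50, 36)
print(round(z, 2))  # -6.0

# Two-sided p-value from the standard normal CDF
phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
p_value = 2 * (1 - phi(abs(z)))
print(p_value < 0.05)  # True
```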
5. Rejection region: RR = {Z ≤ −z0.005} = {Z ≤ −2.58}
Test statistic: Z = (X̄ − μ0)/(s/√n) = (58.7 − 60)/(2.5/√100) = −5.2
Since −5.2 < −2.58, we reject the null hypothesis H0.
6. Rejection region: RR = {|Z| ≥ z0.025} = {|Z| ≥ 1.96}
Test statistic: Z = (X̄ − μ0)/(s/√n) = (18.2 − 18)/(1.2/√64) = 1.33
Since 1.33 < 1.96, we do not reject the null hypothesis H0.
Since the p-value = 0.1836 > 0.05 = α, we do not reject the null hypothesis H0.
Power of the test at μ = 18.5:
1 − β = 1 − P((μ0 − μ)/(s/√n) − zα/2 < Z < (μ0 − μ)/(s/√n) + zα/2) = 1 − P((18 − 18.5)/(1.2/√64) − 1.96 < Z < (18 − 18.5)/(1.2/√64) + 1.96)
Rejection region: RR = {|Z| ≥ z0.025} = {|Z| ≥ 1.96}
Test statistic: Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2) = [(73.54 − 74.29) − 0] / √(0.2²/50 + 0.15²/50) = −21.2132
Since |−21.2132| > 1.96, we reject the null hypothesis H0.
Test statistic: Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2) = [(68.8 − 81.5) − 0] / √(5.1²/49 + 7.4²/49) = −9.8918
Since −9.8918 < −1.645, we reject the null hypothesis H0.
(b) Type II error = P(Z > [(μ1 − μ2)0 − (μ1 − μ2)]/√(s1²/n1 + s2²/n2) − zα) = P(Z > [0 − (−5)]/√(5.1²/49 + 7.4²/49) − 1.645)
9. (a) H0: μ1 − μ2 = 0 vs. H1: μ1 − μ2 ≠ 0
Rejection region: RR = {|Z| ≥ z0.025} = {|Z| ≥ 1.96}
Test statistic: Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2) = [(68,750 − 74,350) − 0] / √(4,930²/55 + 5,400²/60) = −5.8135
Since |−5.8135| > 1.96, we reject the null hypothesis H0. In other words, we conclude at the 0.05 significance level that the two loan officers do not issue loans of equal value.
(b) H0: μ1 − μ2 = 0 vs. H1: μ1 − μ2 < 0
Rejection region: RR = {Z ≤ −z0.01} = {Z ≤ −2.33}
Test statistic: Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2) = [(68,750 − 74,350) − 0] / √(4,930²/55 + 5,400²/60) = −5.8135
Since −5.8135 < −2.33, we reject the null hypothesis H0. In other words, we conclude at the 0.01 significance level that on average the loans issued by officer one are smaller than the loans issued by officer two.
Rejection region: RR = {Z ≥ z0.01} = {Z ≥ 2.33}
Test statistic: Z = (X̄ − μ0)/(σ/√n) = (15.5 − 15)/(1.4/√64) = 2.86
Since 2.86 > 2.33, we reject the null hypothesis H0.
(b) Type II error = P(Z < (μ0 − μ1)/(σ/√n) + zα) = P(Z < (15 − 16)/(1.4/√64) + 2.33)
Rejection region: RR = {|T| ≥ t9,0.005} = {|T| ≥ 3.25}
Test statistic: T = (x̄ − μ0)/(s/√n) = (25.4 − 15)/(3.57/√10) = 9.21
Since 9.21 > 3.25, we reject the null hypothesis H0.
Rejection region: RR = {T ≤ −t15,0.05} = {T ≤ −1.753}
Test statistic: T = (x̄ − μ0)/(s/√n) = (4,858 − 5,000)/(575/√16) = −0.99
Since −0.99 > −1.753, we do not reject the null hypothesis H0.
Rejection region: RR = {|T| ≥ t24,0.005} = {|T| ≥ 2.797}
s²p = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = [(12 − 1)(9)² + (14 − 1)(10)²]/(12 + 14 − 2) = 91.2917
Test statistic: T = [(X̄1 − X̄2) − (μ1 − μ2)] / [Sp √(1/n1 + 1/n2)] = [(110 − 115) − 0] / [9.5547 √(1/12 + 1/14)] = −1.33
Since |−1.33| < 2.797, we do not reject the null hypothesis H0.
m = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)] = (9²/12 + 10²/14)² / [(9²/12)²/(12 − 1) + (10²/14)²/(14 − 1)] = 23.93 ≈ 24
Rejection region: RR = {|T| ≥ t24,0.005} = {|T| ≥ 2.797}
Test statistic: T = [(X̄1 − X̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2) = [(110 − 115) − 0] / √(9²/12 + 10²/14) = −1.34
Rejection region: RR = {Z ≤ −z0.01} = {Z ≤ −2.33}
Test statistic: Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2) = [(30.25 − 41.27) − 0] / √(4.5²/12 + 6.2²/11) = −4.84
(c) p-value = P(t18 ≥ 0.0992) > P(t18 ≥ 1.330) = 0.10. Thus, p-value > 0.10.
17. X̄d = Σdi/n = 70/10 = 7
S²d = [Σdi² − (Σdi)²/n]/(n − 1) = [540 − (70)²/10]/(10 − 1) = 5.5556
19. p̂1 = 250/400 = 0.625, p̂2 = 380/500 = 0.76
p̂ = (X1 + X2)/(n1 + n2) = (250 + 380)/(400 + 500) = 0.70
Test statistic: Z = (p̂1 − p̂2)/√(p̂q̂(1/n1 + 1/n2)) = (0.625 − 0.76)/√((0.70)(0.30)(1/400 + 1/500)) = −4.3916
Since −4.3916 < −1.645, we reject the null hypothesis H0.
(b) Hypothesis: H0: p1 − p2 = 0 vs. H1: p1 − p2 ≠ 0
Rejection region: RR = {|Z| ≥ z0.01} = {|Z| ≥ 2.33}
Test statistic: Z = (p̂1 − p̂2)/√(p̂1q̂1/n1 + p̂2q̂2/n2)
p̂ = (X1 + X2)/(n1 + n2) = (21 + 24)/(300 + 425) = 0.062
Hypothesis: H0: p1 − p2 = 0 vs. H1: p1 − p2 ≠ 0
Rejection region: RR = {|Z| ≥ z0.025} = {|Z| ≥ 1.96}
Test statistic: Z = [(p̂1 − p̂2) − (p1 − p2)] / √(p̂q̂/n1 + p̂q̂/n2) = (0.07 − 0.056) / √((0.062)(0.938)/300 + (0.062)(0.938)/425) = 0.7818
Since 0.7818 < 1.96, we do not reject the null hypothesis H0.
Rejection region: RR = {Z ≥ z0.05} = {Z ≥ 1.645}
Test statistic: Z = (p̂ − p)/√(pq/n) = (0.129 − 0.1)/√((0.1)(0.9)/140) = 1.14
Since 1.14 < 1.645, we do not reject the null hypothesis H0.
Rejection region: RR = {X² < χ²(n−1, 1−α)} = {X² < χ²17,0.95} = {X² < 8.6718}
Test statistic: X² = (n − 1)s²/σ0² = (18 − 1)(15.6)/20 = 13.26
Since 13.26 > 8.6718, we do not reject the null hypothesis H0.
25. n = 12, x̄ = 15.8583, s = 0.4757
Rejection region: RR = {X² ≤ χ²(n−1, 1−α/2)} ∪ {X² ≥ χ²(n−1, α/2)} = {X² ≤ χ²11,0.995} ∪ {X² ≥ χ²11,0.005} = {X² ≤ 2.6032} ∪ {X² ≥ 26.7569}
Test statistic: X² = (n − 1)s²/σ0² = (12 − 1)(0.4757)²/0.2 = 12.446
Since 2.6032 < 12.446 < 26.7569, we do not reject the null hypothesis H0.
26. Rejection region: RR = {F ≤ 1/F(n2−1, n1−1, α/2)} ∪ {F ≥ F(n1−1, n2−1, α/2)} = {F ≤ 1/F35,24,0.025} ∪ {F ≥ F24,35,0.025}
Note: Since neither 35 nor 24 can be found in the table we will use approximate values for F using a method commonly known as interpolation. Thus, we have RR = {F ≤ 1/2.15 = 0.4651} ∪ {F ≥ 2.02}
Test statistic: F = S1²/S2² = 1.475/0.878 = 1.67995
Since 0.4651 < 1.67995 < 2.02, we do not reject the null hypothesis H0.
27. Rejection region: RR = {F ≥ F(n1−1, n2−1, α)} = {F ≥ F24,35,0.05} = {F ≥ 1.82}
Note: Since neither 24 nor 35 can be found in the table we will use an approximate value by looking up the value for 25 and 35 degrees of freedom. This will give us a somewhat conservative rejection region.
Test statistic: F = S1²/S2² = 1.475/0.878 = 1.67995
Since 1.67995 < 1.82, we do not reject the null hypothesis H0.
28. Sample I: n1 = 12, x̄1 = 21.075, s1 = 1.8626
Sample II: n2 = 12, x̄2 = 22.5, s2 = 1.3837
Rejection region: RR = {F ≤ 1/F11,11,0.025} ∪ {F ≥ F11,11,0.025} = {F ≤ 0.2882} ∪ {F ≥ 3.47}
Test statistic: F = S1²/S2² = 1.8626²/1.3837² = 1.8124
Since 0.2882 < 1.8124 < 3.47, we do not reject the null hypothesis H0.
29. Portfolio I: n1 = 12, x̄1 = 1.125, s1 = 1.5621
Portfolio II: n2 = 12, x̄2 = 0.733, s2 = 2.7050
Rejection region: RR = {F ≤ 1/F11,11,0.025} ∪ {F ≥ F11,11,0.025} = {F ≤ 0.2882} ∪ {F ≥ 3.47}
Test statistic: F = S1²/S2² = 1.5621²/2.7050² = 0.3335
Since 0.2882 < 0.3335 < 3.47, we do not reject the null hypothesis H0.
30. Rejection region: RR = {F ≤ 1/F(n2−1, n1−1, α)} = {F ≤ 1/F11,11,0.05} = {F ≤ 0.3546}
Test statistic: F = S1²/S2² = 1.5621²/2.7050² = 0.3335
Since 0.3335 < 0.3546, we reject the null hypothesis H0.
Chapter 11
1.
Frequency   Rel. Freq.   Cumul. Freq.
7           7/40         7
5           5/40         12
7           7/40         19
5           5/40         24
6           6/40         30
4           4/40         34
6           6/40         40
40          1
Note: The frequency distribution table can vary depending upon how many classes you choose.
2.
Mean    StDev   Variance   CoefVar   Median   IQR
53.80   12.54   157.14     23.30     53.00    22.25
3. (i)
x    P(X = x)
0    0.001238
1    0.009285
2    0.033656
3    0.078532
4    0.132522
5    0.172279
6    0.179457
7    0.153821
8    0.110559
9    0.067564
10   0.035471
11   0.016123
12   0.006382
13   0.002209
14   0.000671
15   0.000179
16   0.000042
17   0.000009
18   0.000002
19 through 30   0.000000
(ii)
x    P(X <= x)
0    0.00124
1    0.01052
2    0.04418
3    0.12271
4    0.25523
5    0.42751
6    0.60697
7    0.76079
8    0.87135
9    0.93891
10   0.97438
11   0.99051
12   0.99689
13   0.99910
14   0.99977
15   0.99995
16   0.99999
17 through 30   1.00000
4. (a) Random Sample of size 100 from a Binomial Population with n = 60 and p = 0.7.
44
35
44
41
38
45
38
41
41
38
35
42
45
42
43
41
45
45
42
48
41
51
40
38
43
42
38
44
47
48
45
39
39
44
42
42
47
43
43
37
46
45
43
37
46
42
43
46
30
40
38
32
41
45
41
45
39
39
42
41
47
46
43
43
31
45
42
36
37
41
38
43
48
41
43
44
36
45
42
41
43
40
40
47
40
47
35
39
43
41
40
50
44
39
37
36
40
43
42
45
Note: Each time you generate random data using MINITAB, JMP, or any other software, the data will be different. You should use the above data to calculate some of the descriptive measures.
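A draw like the one in 4(a) can also be made with the standard library alone: each observation is the number of successes in n = 60 Bernoulli trials with p = 0.7. This is a sketch; any fresh run (or a different seed) gives numbers different from the listing above:

```python
import random

random.seed(1)   # arbitrary seed for reproducibility
sample = [sum(random.random() < 0.7 for _ in range(60)) for _ in range(100)]
print(len(sample), min(sample), max(sample))
```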
(b)

-2.37602   1.29775   0.00277   2.36307   0.75201  -1.78136  -0.15683   0.60029   0.81729  -0.63226
-0.59754   1.43915   1.09275  -0.65072   1.31089  -1.61429   1.67827  -0.73676   1.44223  -1.13921
-0.42632  -1.32405  -0.89732   0.27543   0.31205   1.09948  -1.50948  -0.35195  -1.53494  -0.07543
-0.47851  -0.05165   0.73783  -1.01755  -0.40007  -0.33038   1.47401  -1.80159  -1.52018  -1.18923
-0.27756   1.11539  -0.84455   0.74224  -1.45352  -1.63088   0.22896  -0.44742   1.18517   1.64296
 2.03639  -0.66084  -0.51547   2.55221   0.76959  -0.02838   1.02326   0.04541   1.39261   0.57836
 0.48845  -0.14035   1.03660   0.64137   0.32206  -0.22875  -0.13656  -1.37834  -0.13024  -0.20022
-0.93577  -2.05991  -1.08695   1.39269  -0.21355  -0.13393   1.13944  -0.13758   0.47625  -1.51160
-2.84154   1.12404   0.83947  -0.39440  -0.47810  -1.18853   0.61019  -0.05268   0.48313  -0.60043
-0.95162  -1.55669  -1.50416  -0.09011   0.10925   0.17342  -0.07181  -0.11341  -2.11343
Note: Each time you generate random data using MINITAB, JMP, or any other software, the data will be different. You should use the above data to calculate some of the descriptive measures.
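The analogous stdlib draw for 4(b) is 100 observations from the standard normal distribution; again a sketch, and any fresh run gives different numbers:

```python
import random

random.seed(2)   # arbitrary seed
sample = [random.gauss(0, 1) for _ in range(100)]
print(len(sample))
```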
P(Z <= z0) = 0.05, where z0 = -1.645
P(Z <= z0) = 0.10, where z0 = -1.28
P(Z <= z0) = 0.90, where z0 = 1.28
P(Z <= z0) = 0.95, where z0 = 1.645
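The four quantile statements above can be recomputed with the standard library's inverse normal cdf:

```python
from statistics import NormalDist

z = NormalDist()                    # standard normal
print(round(z.inv_cdf(0.05), 3))    # -1.645
print(round(z.inv_cdf(0.10), 2))    # -1.28
print(round(z.inv_cdf(0.90), 2))    #  1.28
print(round(z.inv_cdf(0.95), 3))    #  1.645
```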
Variable   N    Mean     StDev     SE Mean
Prob. #1   40   53.8000  12.5355   1.9820

90% CI: (50.5398, 57.0602)
95% CI: (49.9153, 57.6847)
99% CI: (48.6946, 58.9054)
(b)
Variable   N     Mean       St Dev     SE Mean
Prob. 4b   100   -0.110786  1.098759   0.109876

90% CI: (-0.291515, 0.069944)
95% CI: (-0.326139, 0.104568)
99% CI: (-0.393807, 0.172236)
Note: The confidence intervals in Problem 6(b) are strictly for the data set generated in Problem 4(b). If you generate another data set, then your confidence intervals will be different from the ones shown above.
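These are large-sample z intervals, x̄ ± z(α/2)·(s/√n); recomputing one from the summary statistics for Problem 1's data reproduces the output above to within rounding:

```python
from statistics import NormalDist

# z interval from the reported summary statistics (n = 40).
mean, se = 53.8000, 1.9820

def z_interval(conf):
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # e.g. about 1.96 for conf = 0.95
    return (mean - z * se, mean + z * se)

lo, hi = z_interval(0.95)                      # close to (49.9153, 57.6847)
```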
7. (a)
Sample     N    Mean   StDev   SE Mean
Sample 1   15   4.35   1.03    0.27
Sample 2   15   4.40   0.816   0.21

95% CI: (-0.752233, 0.645566)
Interpretation: Since the confidence interval includes 0, we do not reject the null hypothesis that the two population means are equal, at the 5% level of significance.

(b)
99% CI: (-0.998122, 0.891456)
Interpretation: Since the confidence interval includes 0, we do not reject the null hypothesis that the two population means are equal, at the 1% level of significance.
8. (a) H0: μ1 - μ2 = 0 versus Ha: μ1 - μ2 ≠ 0. Let α = 0.05.

Two-Sample t-Test: Sample 1 vs. Sample 2

Sample     N    Mean   StDev   SE Mean
Sample 1   15   4.35   1.03    0.27
Sample 2   15   4.40   0.816   0.21
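The test statistic for 8(a) follows directly from the summary statistics. Whether the solution pools the variances is not shown here, so this sketch uses the unpooled (Welch) form:

```python
from math import sqrt

n1, m1, s1 = 15, 4.35, 1.03
n2, m2, s2 = 15, 4.40, 0.816
t = (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)
print(round(t, 2))   # -0.15, far inside any usual acceptance region
```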
9(a)

17  11  13  14  14  12  18  13  16  12
10  11  21  15  12  16  16  21  18  18
15  19  20  19  15  19  17  16  18  14
10  15  17  14  16  16  23  17  14  14
12  17  15  14  13  16  19  14  17  13
13  15  15  17  15  15  10  15  16  15
15  15  12  15  19  11  17  16  15  17
20  12  15  15  16  21  16  16  16  10
19  16  19  19  16  16  15  15  15  18
22  21  18  15  15  23  11
(b) 5, 10, 11, 10
(c)

0.92686  1.05933  4.86450  0.89129  1.34490  4.58611  4.23555  0.72925  3.10352  9.80666
1.06287  0.48886  0.78756  1.14476  0.41525  0.99962  0.53098  0.58796  0.87946  6.17721
6.04005  3.52215  0.70621  4.47782  2.82171  2.20250  0.26019  0.49498  3.48382  0.51510
3.94087  0.63957  2.24939  3.60930  0.02941  2.20857  1.86662  5.91489  2.39399  2.11482
0.28320  1.70997  0.20455  1.72291  2.57827  3.19872  6.59576  2.73302  2.44231  0.24167
0.98836  0.85113  2.08334  0.55120  1.56717  2.43075  0.45812  2.49099  3.71435  1.19253
0.61162  0.19957  1.40573  1.93952  0.01378  2.31370  1.07929  1.46698  1.31533  1.86552
1.99617  1.79145  0.87150  0.02049  2.14500  5.15758  3.72229  1.65200  2.35870  2.27768
1.20566  5.05366  4.95891  2.73776  0.11289  2.93519  0.69056  0.20431  1.79161  1.47690
1.53913  0.11944  1.65551  0.94907  1.02282  5.37658  2.80358  0.28814  2.71166  2.00077
Note: Each time you generate random data using MINITAB, JMP, or any other software, the data will be different. You should use the above data to calculate some of the descriptive measures.
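One stdlib way to draw values like those in 9(c): a chi-square random variable with k degrees of freedom is Gamma(k/2, 2). The value k = 2 below is an assumption for illustration, since the problem's parameters are not repeated in this solution:

```python
import random

random.seed(3)   # arbitrary seed
k = 2            # degrees of freedom (assumed)
sample = [random.gammavariate(k / 2, 2) for _ in range(100)]
print(len(sample))
```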
10. (a) Stem-and-Leaf Display for the Data in Problem #1 (stems 3 through 8; cumulative depths 7, 18, (7), 15, 6, 3).
(b) Boxplot of Data from Problem #1 (data scale from 30 to 80).
11. Pie Chart of Data from Problem #11

Category 2: 20.0%
Category 3: 16.7%
Category 4: 23.3%
Category 5: 23.3%
Category 6: 16.7%
P(X <= x) = 0.932427
P(X <= x) = 0.999983
x = 4:  P(X <= x) = 0.198381
x = 12: P(X <= x) = 0.995549
P(X <= x) = 1.00000
P(X <= x) = 0.686036
P(X <= x) = 0.9849
P(X <= x) = 0.0151
(b) μ = 12 and σ = 4
P(8 <= X <= 16) = 0.8413 - 0.1587 = 0.6826

Cumulative Distribution Function
Normal with mean = 12 and standard deviation = 4
x = 16: P(X <= x) = 0.8413
x = 8:  P(X <= x) = 0.1587
(c) μ = 0 and σ = 1
P(-1.96 <= Z <= 1.96) = 0.9750 - 0.0250 = 0.95

Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1
x = 1.96:  P(X <= x) = 0.9750
x = -1.96: P(X <= x) = 0.0250
(d) μ = 0 and σ = 1
P(-1.645 <= Z <= 1.645) = 0.9500 - 0.0500 = 0.90

x = 1.645:  P(X <= x) = 0.9500
x = -1.645: P(X <= x) = 0.0500
(e) μ = 0 and σ = 1
P(-2.575 <= Z <= 2.575) = 0.9950 - 0.0050 = 0.99

Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1
x = 2.575:  P(X <= x) = 0.9950
x = -2.575: P(X <= x) = 0.0050
15.
Variable     N    Mean     St Dev   SE Mean   95% CI
Problem 15   32   29.6250  6.8285   1.2071    (27.1630, 32.0870)

P = 0.758
Interpretation: Since the p-value = 0.758 > α = 0.05, we do not reject the null hypothesis.
16.
SE Mean: 0.42, 0.71
Difference = μ1 - μ2
Estimate for μ1 - μ2: 6.00000
95% CI for μ1 - μ2: (4.30579, 7.69421)
Glossary
Alpha (α): The probability of rejecting a null hypothesis when it is true.
alternative hypothesis: A hypothesis formulated using new information.
arithmetic mean: The average of a data set.
bar graph: A graph representing the categories in a data set with bars of height equal to the frequencies of the corresponding categories.
Bernoulli trial: A trial that has only two possible outcomes, such as success and failure.
Beta (β): The probability of not rejecting a null hypothesis when the alternative is true.
binomial model or binomial distribution: A discrete distribution that is applicable whenever an experiment consists of n independent Bernoulli trials and the probability of an outcome, say success, is constant throughout the experiment.
bimodal distribution: A distribution that has two modes; a distribution with more modes is called a multimodal distribution; a distribution with one mode is called unimodal.
bound on error of estimation: The maximum difference between the estimated value and the true value of a parameter.
bimodal data: A data set that has two modes.
box and whisker plot: A plot that helps detect any outliers in the data and shows whether the data are skewed.
central limit theorem: A theorem stating that irrespective of the shape of the distribution of a population, the distribution of sample means is approximately normal when the sample size is large.
Chi-square distribution: The probability distribution of the sum of squares of n independent normal variables.
class: An interval that includes all observations in a quantitative data set within two values (for qualitative data, classes are generally defined by categories).
class frequency: The number of data points of a data set that belong to a certain class.
class limits: The two values that determine a class interval. The smaller value is the lower limit, and the larger value is the upper limit.
class midpoint: The average of the lower and upper class limits; also called the class mark.
class width: The difference between the upper and lower class limits.
coefficient of variation (CV): The ratio of the standard deviation to the mean, expressed as a percentage.
complement event: An event that consists of all those sample points that are in the sample space but not in the given event.
conditional probability: The probability of one event happening given that another event has already happened.
contingency table: A two-dimensional table that contains frequencies or count data belonging to various categories according to the classification dictated by two attributes.
continuous distribution: The probability distribution of a continuous random variable.
continuous random variable: A random variable that can take any value in one or more intervals.
correction factor of continuity: A correction made when a discrete distribution is approximated with a continuous distribution.
correlation coefficient: A unitless measure of the strength of the association between two numerical variables.
critical point: The point that separates the rejection region from the acceptance region.
cumulative frequency: The frequency of a class together with the frequencies of all the preceding classes.
data set: A collection of values collected through experimentation, sample surveys, or observations.
degrees of freedom: The number of variables in a sample that can vary independently.
dependent events: Events for which the probability of occurrence of certain events changes when information about the occurrence of other events is taken into consideration.
descriptive statistics: A group of methods used for organizing, summarizing, and representing data using tables, graphs, and summary statistics.
design of experiment (DOE): A well-structured plan for an experiment in which the experimenter can control the factors that might influence the outcome of the experiment.
discrete distribution: The probability distribution of a discrete random variable.
discrete random variable: A random variable that can take a finite or countably infinite number of values.
empirical rule: A rule that gives the percentage of data points that fall within a given number of standard deviations of the mean of a normally distributed data set.
equally likely events: Events such that no event can occur in preference to any other event.
estimation: The procedure of finding an estimator of an unknown parameter.
estimator: A rule (sample statistic) that tells us how to find a single value, known as an estimate, of an unknown parameter.
error of estimation: The difference between the estimated value and the true value of a parameter.
event: A set of one or more outcomes of a random experiment.
expected value: The weighted average of all the values that a random variable takes.
Figures
Figures 1.1-1.5, Figure 2.1, Figures 3.1-3.18, Figures 7.8-7.25, Figures 8.1-8.21, Figures 9.1-9.2

Tables
Tables 1.1-1.2, Tables 3.1-3.11, Table 4.1, Tables 5.1-5.2, Tables 6.1-6.5, Table 7.1, Tables 8.1-8.6, Tables 10.1-10.2, Tables I-VI
THE
NORMAL
LAW OF ERROR
STANDS OUT IN THE
EXPERIENCE OF MANKIND
AS ONE OF THE BROADEST
GENERALIZATIONS OF NATURAL
PHILOSOPHY IT SERVES AS THE
GUIDING INSTRUMENT IN RESEARCHES
IN THE PHYSICAL AND SOCIAL SCIENCES AND
IN MEDICINE AGRICULTURE AND ENGINEERING
IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE
INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION AND EXPERIMENT
W. J. Youden
Appendix
Table I
Table II
Table V
Table VI
Table I  Binomial probabilities.
Tabulated values are P(X = x) = C(n, x) p^x (1 - p)^(n - x).
p
n
1
2
3
x
0
1
0
1
2
0
1
2
3
0
1
2
3
4
0
1
2
3
4
5
0
1
2
3
4
5
6
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
8
0
1
2
3
4
5
6
7
8
9
.05
.950
.050
.902
.095
.003
.857
.136
.007
.000
.815
.171
.014
.000
.000
.774
.204
.021
.001
.000
.000
.735
.232
.031
.002
.000
.000
.000
.698
.257
.041
.004
.000
.000
.000
.000
.663
.279
.052
.005
.000
.000
.000
.000
.000
.630
.298
.063
.008
.001
.000
.000
.000
.000
.000
.10
.900
.100
.810
.180
.010
.729
.243
.027
.001
.656
.292
.048
.004
.000
.591
.328
.073
.008
.000
.000
.531
.354
.098
.015
.001
.000
.000
.478
.372
.124
.023
.003
.000
.000
.000
.430
.383
.149
.033
.005
.000
.000
.000
.000
.387
.387
.172
.045
.007
.001
.000
.000
.000
.000
.20
.800
.200
.640
.320
.040
.512
.384
.096
.008
.410
.410
.154
.025
.001
.328
.410
.205
.051
.006
.000
.262
.393
.246
.082
.015
.002
.000
.210
.367
.275
.115
.029
.004
.000
.000
.168
.335
.294
.147
.046
.009
.001
.000
.000
.134
.302
.302
.176
.066
.017
.003
.000
.000
.000
.30
.700
.300
.490
.420
.090
.343
.441
.189
.027
.240
.412
.265
.075
.008
.168
.360
.309
.132
.028
.003
.118
.302
.324
.185
.059
.010
.001
.082
.247
.318
.227
.097
.025
.004
.000
.058
.198
.296
.254
.136
.047
.010
.001
.000
.040
.156
.267
.267
.172
.073
.021
.004
.000
.000
.40
.600
.400
.360
.480
.160
.216
.432
.288
.064
.130
.346
.345
.154
.025
.078
.259
.346
.230
.077
.010
.047
.187
.311
.276
.138
.037
.004
.028
.131
.261
.290
.194
.077
.017
.002
.017
.089
.209
.279
.232
.124
.041
.008
.001
.010
.061
.161
.251
.251
.167
.074
.021
.004
.000
.50
.500
.500
.250
.500
.250
.125
.375
.375
.125
.062
.250
.375
.250
.063
.031
.156
.312
.312
.156
.031
.016
.094
.234
.313
.234
.094
.015
.008
.055
.164
.273
.273
.164
.055
.008
.004
.031
.109
.219
.273
.219
.110
.031
.004
.002
.018
.070
.164
.246
.246
.164
.070
.018
.002
.60
.400
.600
.160
.480
.360
.064
.288
.432
.216
.025
.154
.346
.346
.129
.010
.077
.230
.346
.259
.078
.004
.037
.138
.277
.311
.186
.047
.002
.017
.077
.194
.290
.261
.131
.028
.001
.008
.041
.124
.232
.279
.209
.089
.017
.000
.004
.021
.074
.167
.251
.251
.161
.060
.010
.70
.300
.700
.090
.420
.490
.027
.189
.441
.343
.008
.076
.264
.412
.240
.002
.028
.132
.308
.360
.168
.001
.010
.059
.185
.324
.302
.118
.000
.004
.025
.097
.227
.318
.247
.082
.000
.001
.010
.048
.136
.254
.296
.198
.057
.000
.000
.004
.021
.073
.172
.267
.267
.156
.040
.80
.200
.800
.040
.320
.640
.008
.096
.384
.512
.002
.026
.154
.409
.409
.000
.006
.051
.205
.410
.328
.000
.002
.015
.082
.246
.393
.262
.000
.000
.004
.029
.115
.275
.367
.210
.000
.000
.001
.009
.046
.147
.294
.335
.168
.000
.000
.000
.003
.017
.066
.176
.302
.302
.134
.90
.100
.900
.010
.180
.810
.001
.027
.243
.729
.000
.004
.048
.292
.656
.000
.001
.008
.073
.328
.590
.000
.000
.001
.015
.098
.354
.531
.000
.000
.000
.003
.023
.124
.372
.478
.000
.000
.000
.000
.005
.033
.149
.383
.430
.000
.000
.000
.000
.001
.007
.045
.172
.387
.387
.95
.050
.950
.003
.095
.902
.000
.007
.135
.857
.000
.001
.014
.171
.815
.000
.000
.001
.021
.204
.774
.000
.000
.000
.002
.031
.232
.735
.000
.000
.000
.000
.004
.041
.257
.698
.000
.000
.000
.000
.000
.005
.052
.279
.664
.000
.000
.000
.000
.000
.001
.008
.063
.298
.630
Continued
Table I
Binomial probabilities.
p
n
10
11
12
13
14
x
0
1
2
3
4
5
6
7
8
9
10
0
1
2
3
4
5
6
7
8
9
10
11
0
1
2
3
4
5
6
7
8
9
10
11
12
0
1
2
3
4
5
6
7
8
9
10
11
12
13
0
1
2
3
4
5
6
7
8
.05
.599
.315
.075
.010
.001
.000
.000
.000
.000
.000
.000
.569
.329
.087
.014
.001
.000
.000
.000
.000
.000
.000
.000
.540
.341
.099
.017
.002
.000
.000
.000
.000
.000
.000
.000
.000
.513
.351
.111
.021
.003
.000
.000
.000
.000
.000
.000
.000
.000
.000
.488
.359
.123
.026
.004
.000
.000
.000
.000
.10
.349
.387
.194
.057
.011
.002
.000
.000
.000
.000
.000
.314
.384
.213
.071
.016
.003
.000
.000
.000
.000
.000
.000
.282
.377
.230
.085
.021
.004
.001
.000
.000
.000
.000
.000
.000
.254
.367
.245
.100
.028
.006
.001
.000
.000
.000
.000
.000
.000
.000
.229
.356
.257
.114
.035
.008
.001
.000
.000
.20
.107
.268
.302
.201
.088
.026
.006
.001
.000
.000
.000
.086
.236
.295
.222
.111
.039
.010
.002
.000
.000
.000
.000
.069
.206
.283
.236
.133
.053
.016
.003
.001
.000
.000
.000
.000
.055
.179
.268
.246
.154
.069
.023
.006
.001
.000
.000
.000
.000
.000
.044
.154
.250
.250
.172
.086
.032
.009
.002
.30
.028
.121
.234
.267
.200
.103
.037
.009
.001
.000
.000
.020
.093
.200
.257
.220
.132
.057
.017
.004
.001
.000
.000
.014
.071
.168
.240
.231
.159
.079
.029
.008
.001
.000
.000
.000
.010
.054
.139
.218
.234
.180
.103
.044
.014
.003
.001
.000
.000
.000
.007
.041
.113
.194
.229
.196
.126
.062
.023
.40
.006
.040
.121
.215
.251
.201
.111
.042
.011
.002
.000
.004
.027
.089
.177
.237
.221
.147
.070
.023
.005
.001
.000
.002
.017
.064
.142
.213
.227
.177
.101
.042
.013
.003
.000
.000
.001
.011
.045
.111
.185
.221
.197
.131
.066
.024
.007
.001
.000
.000
.001
.007
.032
.085
.155
.207
.207
.157
.092
.50
.001
.010
.044
.117
.205
.246
.205
.117
.044
.010
.001
.001
.005
.027
.081
.161
.226
.226
.161
.081
.027
.005
.001
.000
.003
.016
.054
.121
.193
.226
.193
.121
.054
.016
.003
.000
.000
.002
.010
.035
.087
.157
.210
.210
.157
.087
.035
.010
.002
.000
.000
.001
.006
.022
.061
.122
.183
.210
.183
.60
.000
.002
.011
.042
.111
.201
.251
.215
.121
.040
.006
.000
.001
.005
.023
.070
.147
.221
.237
.177
.089
.027
.004
.000
.000
.003
.012
.042
.101
.177
.227
.213
.142
.064
.017
.002
.000
.000
.001
.007
.024
.066
.131
.197
.221
.184
.111
.045
.011
.001
.000
.000
.001
.003
.014
.041
.092
.157
.207
.70
.000
.000
.001
.009
.037
.103
.200
.267
.234
.121
.028
.000
.000
.001
.004
.017
.057
.132
.220
.257
.200
.093
.020
.000
.000
.000
.002
.008
.030
.079
.159
.231
.240
.168
.071
.014
.000
.000
.000
.001
.003
.014
.044
.103
.180
.234
.218
.139
.054
.010
.000
.000
.000
.000
.001
.007
.023
.062
.126
.80
.000
.000
.000
.001
.006
.026
.088
.201
.302
.268
.107
.000
.000
.000
.000
.002
.010
.039
.111
.222
.295
.236
.086
.000
.000
.000
.000
.001
.003
.016
.053
.133
.236
.283
.206
.069
.000
.000
.000
.000
.000
.001
.006
.023
.069
.154
.246
.268
.179
.055
.000
.000
.000
.000
.000
.000
.002
.010
.032
.90
.000
.000
.000
.000
.000
.002
.011
.057
.194
.387
.349
.000
.000
.000
.000
.000
.000
.003
.016
.071
.213
.384
.314
.000
.000
.000
.000
.000
.000
.001
.004
.021
.085
.230
.377
.282
.000
.000
.000
.000
.000
.000
.000
.001
.006
.028
.100
.245
.367
.254
.000
.000
.000
.000
.000
.000
.000
.000
.001
.95
.000
.000
.000
.000
.000
.000
.001
.011
.075
.315
.599
.000
.000
.000
.000
.000
.000
.000
.001
.014
.087
.329
.569
.000
.000
.000
.000
.000
.000
.000
.000
.002
.017
.099
.341
.540
.000
.000
.000
.000
.000
.000
.000
.000
.000
.003
.021
.111
.351
.513
.000
.000
.000
.000
.000
.000
.000
.000
.000
Continued
Continued
Table I
Binomial probabilities.
p
x
9
10
11
12
13
14
15 0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
.05
.000
.000
.000
.000
.000
.000
.463
.366
.135
.031
.005
.001
.000
.000
.000
.000
.000
.000
.000
.000
.000
.000
.10
.000
.000
.000
.000
.000
.000
.206
.343
.267
.129
.043
.011
.002
.000
.000
.000
.000
.000
.000
.000
.000
.000
.20
.000
.000
.000
.000
.000
.000
.035
.132
.231
.250
.188
.103
.043
.014
.004
.001
.000
.000
.000
.000
.000
.000
.30
.007
.001
.000
.000
.000
.000
.005
.031
.092
.170
.219
.206
.147
.081
.035
.012
.003
.001
.000
.000
.000
.000
.40
.041
.014
.003
.001
.000
.000
.001
.005
.022
.063
.127
.186
.207
.177
.118
.061
.025
.007
.002
.000
.000
.000
.50
.122
.061
.022
.006
.001
.000
.000
.001
.003
.014
.042
.092
.153
.196
.196
.153
.092
.042
.014
.003
.001
.000
.60
.207
.155
.085
.032
.007
.001
.000
.000
.000
.002
.007
.025
.061
.118
.177
.207
.186
.127
.063
.022
.005
.001
.70
.196
.229
.194
.113
.041
.007
.000
.000
.000
.000
.001
.003
.012
.035
.081
.147
.206
.219
.170
.092
.031
.005
.80
.086
.172
.250
.250
.154
.044
.000
.000
.000
.000
.000
.000
.001
.004
.014
.043
.103
.188
.250
.231
.132
.035
.90
.008
.035
.114
.257
.356
.229
.000
.000
.000
.000
.000
.000
.000
.000
.000
.002
.011
.043
.129
.267
.343
.206
.95
.000
.004
.026
.123
.359
.488
.000
.000
.000
.000
.000
.000
.000
.000
.000
.000
.001
.005
.031
.135
.366
.463
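Table I above can be spot-checked against its own formula; for example, n = 2 and p = .30 should give .490, .420, .090 for x = 0, 1, 2:

```python
from math import comb

def binom_pmf(n, x, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

row = [round(binom_pmf(2, x, 0.30), 3) for x in range(3)]
print(row)   # [0.49, 0.42, 0.09]
```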
Table II  Poisson probabilities.
Tabulated values are P(X = x) = e^(-λ) λ^x / x!.
x
0
1
2
3
4
5
6
7
0.1
.905
.091
.005
.000
.000
.000
.000
.000
0.2
.819
.164
.016
.001
.000
.000
.000
.000
0.3
.741
.222
.033
.003
.000
.000
.000
.000
0.4
.670
.268
.054
.007
.000
.000
.000
.000
0.5
.607
.303
.076
.013
.002
.000
.000
.000
x
0
1
2
3
4
5
6
7
8
9
1.1
.333
.366
.201
.074
.020
.005
.001
.000
.000
.000
1.2
.301
.361
.217
.087
.026
.006
.001
.000
.000
.000
1.3
.273
.354
.230
.100
.032
.008
.002
.000
.000
.000
1.4
.247
.345
.242
.113
.040
.011
.003
.001
.000
.000
1.5
.223
.335
.251
.126
.047
.014
.004
.001
.000
.000
x
0
1
2
3
4
5
6
7
8
9
10
11
12
2.1
.123
.257
.270
.189
.099
.042
.015
.004
.001
.000
.000
.000
.000
2.2
.111
.244
.268
.197
.108
.048
.017
.006
.002
.000
.000
.000
.000
2.3
.100
.231
.265
.203
.117
.054
.021
.007
.002
.001
.000
.000
.000
2.4
.091
.218
.261
.209
.125
.060
.024
.008
.003
.001
.000
.000
.000
2.5
.082
.205
.257
.214
.134
.067
.028
.010
.003
.001
.000
.000
.000
0.6
.549
.329
.099
.020
.003
.000
.000
.000
0.7
.497
.348
.122
.028
.005
.001
.000
.000
0.8
.449
.360
.144
.038
.008
.001
.000
.000
0.9
.407
.366
.165
.049
.011
.002
.000
.000
1.0
.368
.368
.184
.061
.015
.003
.001
.000
1.6
.202
.323
.258
.138
.055
.018
.005
.001
.000
.000
1.7
.183
.311
.264
.150
.064
.022
.006
.002
.000
.000
1.8
.165
.298
.268
.161
.072
.026
.008
.002
.001
.000
1.9
.150
.284
.270
.171
.081
.031
.010
.003
.001
.000
2.0
.135
.271
.271
.180
.090
.036
.012
.003
.001
.000
2.6
.074
.193
.251
.218
.141
.074
.032
.012
.004
.001
.000
.000
.000
2.7
.067
.182
.245
.221
.149
.080
.036
.014
.005
.001
.000
.000
.000
2.8
.061
.170
.238
.223
.156
.087
.041
.016
.006
.002
.001
.000
.000
2.9
.055
.160
.231
.224
.162
.094
.046
.019
.007
.002
.001
.000
.000
3.0
.050
.149
.224
.224
.168
.101
.050
.022
.008
.003
.001
.000
.000
Continued
Continued
Table II
Poisson probabilities.
x
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
4.1
.017
.068
.139
.190
.195
.160
.109
.064
.033
.015
.006
.002
.001
.000
.000
.000
4.2
.015
.063
.132
.185
.194
.163
.114
.069
.036
.017
.007
.003
.001
.000
.000
.000
4.3
.014
.058
.125
.180
.193
.166
.119
.073
.039
.019
.008
.003
.001
.000
.000
.000
4.4
.012
.054
.119
.174
.192
.169
.124
.078
.043
.021
.009
.004
.001
.001
.000
.000
4.5
.011
.050
.113
.169
.190
.171
.128
.082
.046
.023
.010
.004
.002
.001
.000
.000
4.6
.010
.046
.106
.163
.188
.173
.132
.087
.050
.026
.012
.005
.002
.001
.000
.000
4.7
.009
.043
.101
.157
.185
.174
.136
.091
.054
.028
.013
.006
.002
.001
.000
.000
4.8
.008
.040
.095
.152
.182
.175
.140
.096
.058
.031
.015
.006
.003
.001
.000
.000
4.9
.007
.037
.089
.146
.179
.175
.143
.100
.061
.033
.016
.007
.003
.001
.000
.000
5.0
.007
.034
.084
.140
.176
.176
.146
.104
.065
.036
.018
.008
.003
.001
.000
.000
5.6
.004
.021
.058
.108
.152
.170
.158
.127
.089
.055
.031
.016
.007
.003
.001
.000
.000
.000
5.7
.003
.019
.054
.103
.147
.168
.159
.130
.093
.059
.033
.017
.008
.004
.002
.001
.000
.000
5.8
.003
.018
.051
.099
.143
.166
.160
.133
.096
.062
.036
.019
.009
.004
.002
.001
.000
.000
5.9
.003
.016
.048
.094
.138
.163
.161
.135
.100
.065
.039
.021
.010
.005
.002
.001
.000
.000
6.0
.002
.015
.045
.089
.134
.161
.161
.138
.103
.069
.041
.023
.011
.005
.002
.001
.000
.000
6.6
.001
.010
.029
.065
.108
.142
.156
.147
.122
.089
.059
.035
.019
.010
.005
.002
.001
.000
.000
.000
6.7
.001
.008
.028
.062
.103
.139
.155
.148
.124
.092
.062
.038
.021
.011
.005
.002
.001
.000
.000
.000
6.8
.001
.008
.026
.058
.099
.135
.153
.149
.126
.095
.065
.040
.023
.012
.006
.003
.001
.000
.000
.000
6.9
.001
.007
.024
.055
.095
.131
.151
.149
.128
.098
.068
.043
.025
.013
.006
.003
.001
.001
.000
.000
7.0
.001
.007
.022
.052
.091
.128
.149
.149
.130
.101
.071
.045
.026
.014
.007
.003
.001
.001
.000
.000
x
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
5.1
.006
.031
.079
.135
.172
.175
.149
.109
.069
.039
.020
.009
.004
.002
.001
.000
.000
.000
5.2
.006
.029
.075
.129
.168
.175
.151
.113
.073
.042
.022
.010
.005
.002
.001
.000
.000
.000
5.3
.005
.027
.070
.124
.164
.174
.154
.116
.077
.045
.024
.012
.005
.002
.001
.000
.000
.000
5.4
.005
.024
.066
.119
.160
.173
.156
.120
.081
.049
.026
.013
.006
.002
.001
.000
.000
.000
5.5
.004
.022
.062
.113
.156
.171
.157
.123
.085
.052
.029
.014
.007
.003
.001
.000
.000
.000
x
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
6.1
.002
.014
.042
.085
.129
.158
.160
.140
.107
.072
.044
.024
.012
.006
.003
.001
.000
.000
.000
.000
6.2
.002
.013
.040
.081
.125
.155
.160
.142
.110
.076
.047
.026
.014
.007
.003
.001
.001
.000
.000
.000
6.3
.002
.012
.036
.077
.121
.152
.159
.144
.113
.079
.050
.029
.015
.007
.003
.001
.001
.000
.000
.000
6.4
.002
.011
.034
.073
.116
.149
.159
.145
.116
.083
.053
.031
.016
.008
.004
.002
.001
.000
.000
.000
6.5
.002
.010
.032
.069
.112
.145
.158
.146
.119
.086
.056
.033
.018
.009
.004
.002
.001
.000
.000
.000
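Table II can be spot-checked the same way; λ = 1.0 should give .368, .368, .184 for x = 0, 1, 2:

```python
from math import exp, factorial

def poisson_pmf(lam, x):
    return exp(-lam) * lam**x / factorial(x)

row = [round(poisson_pmf(1.0, x), 3) for x in range(3)]
print(row)   # [0.368, 0.368, 0.184]
```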
Table III  Standard normal distribution (tabulated values are areas between 0 and z).

z
0.0
0.1
0.2
0.3
0.4
0.5
.00
.0000
.0398
.0793
.1179
.1554
.1915
.01
.0040
.0438
.0832
.1217
.1591
.1950
.02
.0080
.0478
.0871
.1255
.1628
.1985
.03
.0120
.0517
.0910
.1293
.1664
.2019
.04
.0160
.0557
.0948
.1331
.1700
.2054
.05
.0199
.0596
.0987
.1368
.1736
.2088
.06
.0239
.0636
.1026
.1406
.1772
.2123
.07
.0279
.0675
.1064
.1443
.1808
.2157
.08
.0319
.0714
.1103
.1480
.1844
.2190
.09
.0359
.0753
.1141
.1517
.1879
.2224
0.6
0.7
0.8
0.9
1.0
.2257
.2580
.2881
.3159
.3413
.2291
.2611
.2910
.3186
.3438
.2324
.2642
.2939
.3212
.3461
.2357
.2673
.2967
.3238
.3485
.2389
.2704
.2995
.3264
.3508
.2422
.2734
.3023
.3289
.3531
.2454
.2764
.3051
.3315
.3554
.2486
.2794
.3078
.3340
.3577
.2517
.2823
.3106
.3365
.3599
.2549
.2852
.3133
.3389
.3621
1.1
1.2
1.3
1.4
1.5
.3643
.3849
.4032
.4192
.4332
.3665
.3869
.4049
.4207
.4345
.3686
.3888
.4066
.4222
.4357
.3708
.3907
.4082
.4236
.4370
.3729
.3925
.4099
.4251
.4382
.3749
.3944
.4115
.4265
.4394
.3770
.3962
.4131
.4279
.4406
.3790
.3980
.4147
.4292
.4418
.3810
.3997
.4162
.4306
.4429
.3830
.4015
.4177
.4319
.4441
1.6
1.7
1.8
1.9
2.0
.4452
.4554
.4641
.4713
.4772
.4463
.4564
.4649
.4719
.4778
.4474
.4573
.4656
.4726
.4783
.4484
.4582
.4664
.4732
.4788
.4495
.4591
.4671
.4738
.4793
.4505
.4599
.4678
.4744
.4798
.4515
.4608
.4686
.4750
.4803
.4525
.4616
.4693
.4756
.4808
.4535
.4625
.4699
.4761
.4812
.4545
.4633
.4706
.4767
.4817
2.1
2.2
2.3
2.4
2.5
.4821
.4861
.4893
.4918
.4938
.4826
.4864
.4896
.4920
.4940
.4830
.4868
.4898
.4922
.4941
.4834
.4871
.4901
.4925
.4943
.4838
.4875
.4904
.4927
.4945
.4842
.4878
.4906
.4929
.4946
.4846
.4881
.4909
.4931
.4948
.4850
.4884
.4911
.4932
.4949
.4854
.4887
.4913
.4934
.4951
.4857
.4890
.4916
.4936
.4952
2.6
2.7
2.8
2.9
3.0
.4953
.4965
.4974
.4981
.4987
.4955
.4966
.4975
.4982
.4987
.4956
.4967
.4976
.4982
.4987
.4957
.4968
.4977
.4983
.4988
.4959
.4969
.4977
.4984
.4988
.4960
.4970
.4978
.4984
.4989
.4961
.4971
.4979
.4985
.4989
.4962
.4972
.4979
.4985
.4989
.4963
.4973
.4980
.4986
.4990
.4964
.4974
.4981
.4986
.4990
For negative values of z the probabilities are found by using the symmetric property.
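The table entries are areas between 0 and z, which is the standard normal cdf minus one half; for example z = 1.00 should give .3413:

```python
from statistics import NormalDist

def area_0_to_z(z):
    return NormalDist().cdf(z) - 0.5

print(round(area_0_to_z(1.00), 4))   # 0.3413
```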
Table IV  Critical values of χ².
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
40
50
60
70
80
90
100
χ²(0.995)
0.00004
0.0100
0.0717
0.2070
0.4117
0.6757
0.9893
1.3444
1.7349
2.1559
2.6032
3.0738
3.5650
4.0747
4.6009
5.1422
5.6972
6.2648
6.8440
7.4339
8.0337
8.6427
9.2604
9.8862
10.5197
11.1603
11.8076
12.4613
13.1211
13.7867
20.7065
27.9907
35.5346
43.2752
51.1720
59.1963
67.3276
χ²(0.990)
0.00016
0.0201
0.1148
0.2971
0.5543
0.8720
1.2390
1.6465
2.0879
2.5582
3.0535
3.5706
4.1069
4.6604
5.2294
5.8122
6.4078
7.0149
7.6327
8.2604
8.8972
9.5425
10.1957
10.8564
11.5240
12.1981
12.8786
13.5648
14.2565
14.9535
22.1643
29.7067
37.4848
45.4418
53.5400
61.7541
70.0648
χ²(0.975)
0.00098
0.0506
0.2158
0.4844
0.8312
1.2373
1.6899
2.1797
2.7004
3.2470
3.8158
4.4038
5.0087
5.6287
6.2621
6.9077
7.5642
8.2308
8.9066
9.5908
10.2829
10.9823
11.6885
12.4011
13.1197
13.8439
14.5733
15.3079
16.0471
16.7908
24.4331
32.3574
40.4817
48.7576
57.1532
65.6466
74.2219
χ²(0.950)
0.00393
0.1026
0.3518
0.7107
1.1455
1.6354
2.1674
2.7326
3.3251
3.9403
4.5748
5.2260
5.8919
6.5706
7.2609
7.9616
8.6718
9.3905
10.1170
10.8508
11.5913
12.3380
13.0905
13.8484
14.6114
15.3791
16.1513
16.9279
17.7083
18.4926
26.5093
34.7642
43.1879
51.7393
60.3915
69.1260
77.9295
χ²(0.900)
0.01589
0.2107
0.5844
1.0636
1.6103
2.2041
2.8331
3.4895
4.1682
4.8652
5.5778
6.3038
7.0415
7.7895
8.5468
9.3122
10.085
10.865
11.6509
12.4426
13.2396
14.0415
14.8479
15.6587
16.4734
17.2919
18.1138
18.9392
19.7677
20.5992
29.0505
37.6886
46.4589
55.3290
64.2778
73.2912
82.3581
Continued
Continued
Table IV
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
40
50
60
70
80
90
100
χ²(0.050)
3.8415
5.9915
7.8147
9.4877
11.0705
12.5916
14.0671
15.5073
16.9190
18.3070
19.6751
21.0261
22.3621
23.6848
24.9958
26.2962
27.5871
28.8693
30.1435
31.4104
32.6705
33.9244
35.1725
36.4151
37.6525
38.8852
40.1133
41.3372
42.5569
43.7729
55.7585
67.5048
79.0819
90.5312
101.8795
113.1452
124.3421
χ²(0.025)
5.0239
7.3778
9.3484
11.1433
12.8325
14.4494
16.0128
17.5346
19.0228
20.4831
21.9200
23.3367
24.7356
26.1190
27.4884
28.8454
30.1910
31.5264
32.8523
34.1696
35.4789
36.7807
38.0757
39.3641
40.6465
41.9232
43.1944
44.4607
45.7222
46.9792
59.3417
71.4202
83.2976
95.0231
106.6285
118.1360
129.5613
χ²(0.010)
6.6349
9.2103
11.3449
13.2767
15.0863
16.8119
18.4753
20.0902
21.6660
23.2093
24.7250
26.2170
27.6883
29.1413
30.5779
31.9999
33.4087
34.8053
36.1908
37.5662
38.9321
40.2894
41.6384
42.9798
44.3141
45.6417
46.9630
48.2782
49.5879
50.8922
63.6907
76.1539
88.3794
100.4251
112.3288
124.1162
135.8070
χ²(0.005)
7.8794
10.5966
12.8381
14.8602
16.7496
18.5476
20.2777
21.9550
23.5893
25.1882
26.7569
28.2995
29.8194
31.3193
32.8013
34.2672
35.7185
37.1564
38.5822
39.9968
41.4010
42.7956
44.1813
45.5585
46.9278
48.2899
49.6449
50.9933
52.3356
53.6720
66.7659
79.4900
91.9517
104.2148
116.3210
128.2290
140.1697
Table V  Critical values of t (t with ν degrees of freedom and upper-tail area α).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
40
60
80
100
120
t.100
3.078
1.886
1.638
1.533
1.476
1.440
1.415
1.397
1.383
1.372
1.363
1.356
1.350
1.345
1.341
1.337
1.333
1.330
1.328
1.325
1.323
1.321
1.319
1.318
1.316
1.315
1.314
1.313
1.311
1.310
1.303
1.296
1.292
1.290
1.289
1.282
t.050
6.314
2.920
2.353
2.132
2.015
1.943
1.895
1.860
1.833
1.812
1.796
1.782
1.771
1.761
1.753
1.746
1.740
1.734
1.729
1.725
1.721
1.717
1.714
1.711
1.708
1.706
1.703
1.701
1.699
1.697
1.684
1.671
1.664
1.660
1.658
1.645
t.025
12.706
4.303
3.182
2.776
2.571
2.447
2.365
2.306
2.262
2.228
2.201
2.179
2.160
2.145
2.131
2.120
2.110
2.101
2.093
2.086
2.080
2.074
2.069
2.064
2.060
2.056
2.052
2.048
2.045
2.042
2.021
2.000
1.990
1.984
1.980
1.960
t.010
31.821
6.965
4.541
3.747
3.365
3.143
2.998
2.896
2.821
2.764
2.718
2.681
2.650
2.624
2.602
2.583
2.567
2.552
2.539
2.528
2.518
2.508
2.500
2.492
2.485
2.479
2.473
2.467
2.462
2.457
2.423
2.390
2.374
2.364
2.358
2.326
t.005
63.657
9.925
5.841
4.604
4.032
3.707
3.499
3.355
3.250
3.169
3.106
3.055
3.012
2.977
2.947
2.921
2.898
2.878
2.861
2.845
2.831
2.819
2.807
2.797
2.787
2.779
2.771
2.763
2.756
2.750
2.704
2.660
2.639
2.626
2.617
2.576
Critical points of t for lower tail areas are found by using the symmetric property.
t.0005
636.619
31.599
12.924
8.610
6.869
5.959
5.408
5.041
4.781
4.587
4.437
4.318
4.221
4.140
4.073
4.015
3.965
3.922
3.883
3.850
3.819
3.792
3.768
3.745
3.725
3.707
3.690
3.674
3.659
3.646
3.551
3.460
3.416
3.390
3.373
3.291
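The symmetry note for Table V in code form: for lower-tail areas, t(ν, 1 - α) = -t(ν, α). The dictionary below holds a few upper-tail values copied from the t.050 column of the table:

```python
# Upper-tail t.050 critical values from Table V.
T_050 = {5: 2.015, 10: 1.812, 30: 1.697}

def t_lower(df):
    # Lower-tail critical point by symmetry of the t distribution.
    return -T_050[df]

print(t_lower(10))   # -1.812
```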
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2, respectively (α = 0.10).
F_{ν1, ν2, α}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
25
30
35
40
45
50
60
70
80
90
100
1
39.86
8.53
5.54
4.54
4.06
3.78
3.59
3.46
3.36
3.29
3.23
3.18
3.14
3.10
3.07
3.05
3.03
3.01
2.99
2.97
2.92
2.88
2.85
2.84
2.82
2.81
2.79
2.78
2.77
2.76
2.76
2.71
2
49.50
9.00
5.46
4.32
3.78
3.46
3.26
3.11
3.01
2.92
2.86
2.81
2.76
2.73
2.70
2.67
2.64
2.62
2.61
2.59
2.53
2.49
2.46
2.44
2.42
2.41
2.39
2.38
2.37
2.36
2.36
2.30
3
53.59
9.16
5.39
4.19
3.62
3.29
3.07
2.92
2.81
2.73
2.66
2.61
2.56
2.52
2.49
2.46
2.44
2.42
2.40
2.38
2.32
2.28
2.25
2.23
2.21
2.20
2.18
2.16
2.15
2.15
2.14
2.08
4
55.83
9.24
5.34
4.11
3.52
3.18
2.96
2.81
2.69
2.61
2.54
2.48
2.43
2.39
2.36
2.33
2.31
2.29
2.27
2.25
2.18
2.14
2.11
2.09
2.07
2.06
2.04
2.03
2.02
2.01
2.00
1.94
5
57.24
9.29
5.31
4.05
3.45
3.11
2.88
2.73
2.61
2.52
2.45
2.39
2.35
2.31
2.27
2.24
2.22
2.20
2.18
2.16
2.09
2.05
2.02
2.00
1.98
1.97
1.95
1.93
1.92
1.91
1.91
1.85
6
58.20
9.33
5.28
4.01
3.40
3.05
2.83
2.67
2.55
2.46
2.39
2.33
2.28
2.24
2.21
2.18
2.15
2.13
2.11
2.09
2.02
1.98
1.95
1.93
1.91
1.90
1.87
1.86
1.85
1.84
1.83
1.77
7
58.91
9.35
5.27
3.98
3.37
3.01
2.78
2.62
2.51
2.41
2.34
2.28
2.23
2.19
2.16
2.13
2.10
2.08
2.06
2.04
1.97
1.93
1.90
1.87
1.85
1.84
1.82
1.80
1.79
1.78
1.78
1.72
8
59.44
9.37
5.25
3.95
3.34
2.98
2.75
2.59
2.47
2.38
2.30
2.24
2.20
2.15
2.12
2.09
2.06
2.04
2.02
2.00
1.93
1.88
1.85
1.83
1.81
1.80
1.77
1.76
1.75
1.74
1.73
1.67
9
59.86
9.38
5.24
3.94
3.32
2.96
2.72
2.56
2.44
2.35
2.27
2.21
2.16
2.12
2.09
2.06
2.03
2.00
1.98
1.96
1.89
1.85
1.82
1.79
1.77
1.76
1.74
1.72
1.71
1.70
1.69
1.63
10
60.19
9.39
5.23
3.92
3.30
2.94
2.70
2.54
2.42
2.32
2.25
2.19
2.14
2.10
2.06
2.03
2.00
1.98
1.96
1.94
1.87
1.82
1.79
1.76
1.74
1.73
1.71
1.69
1.68
1.67
1.66
1.60
Continued
Continued
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2, respectively (α = 0.10).
dfd
11
1 60.47
2 9.40
3 5.22
4 3.91
5 3.28
6 2.92
7 2.68
8 2.52
9 2.40
10 2.30
11 2.23
12 2.17
13 2.12
14 2.07
15 2.04
16 2.01
17 1.98
18 1.95
19 1.93
20 1.91
25 1.84
30 1.79
35 1.76
40 1.74
45 1.72
50 1.70
60 1.68
70 1.66
80 1.65
90 1.64
100 1.64
1.57
12
60.71
9.41
5.22
3.90
3.27
2.90
2.67
2.50
2.38
2.28
2.21
2.15
2.10
2.05
2.02
1.99
1.96
1.93
1.91
1.89
1.82
1.77
1.74
1.71
1.70
1.68
1.66
1.64
1.63
1.62
1.61
1.55
13
60.90
9.41
5.21
3.89
3.26
2.89
2.65
2.49
2.36
2.27
2.19
2.13
2.08
2.04
2.00
1.97
1.94
1.92
1.89
1.87
1.80
1.75
1.72
1.70
1.68
1.66
1.64
1.62
1.61
1.60
1.59
1.52
14
61.07
9.42
5.20
3.88
3.25
2.88
2.64
2.48
2.35
2.26
2.18
2.12
2.07
2.02
1.99
1.95
1.93
1.90
1.88
1.86
1.79
1.74
1.70
1.68
1.66
1.64
1.62
1.60
1.59
1.58
1.57
1.50
15
61.22
9.42
5.20
3.87
3.24
2.87
2.63
2.46
2.34
2.24
2.17
2.10
2.05
2.01
1.97
1.94
1.91
1.89
1.86
1.84
1.77
1.72
1.69
1.66
1.64
1.63
1.60
1.59
1.57
1.56
1.56
1.49
20
61.74
9.44
5.18
3.84
3.21
2.84
2.59
2.42
2.30
2.20
2.12
2.06
2.01
1.96
1.92
1.89
1.86
1.84
1.81
1.79
1.72
1.67
1.63
1.61
1.58
1.57
1.54
1.53
1.51
1.50
1.49
1.42
25
62.05
9.45
5.17
3.83
3.19
2.81
2.57
2.40
2.27
2.17
2.10
2.03
1.98
1.93
1.89
1.86
1.83
1.80
1.78
1.76
1.68
1.63
1.60
1.57
1.55
1.53
1.50
1.49
1.47
1.46
1.45
1.38
30
62.26
9.46
5.17
3.82
3.17
2.80
2.56
2.38
2.25
2.16
2.08
2.01
1.96
1.91
1.87
1.84
1.81
1.78
1.76
1.74
1.66
1.61
1.57
1.54
1.52
1.50
1.48
1.46
1.44
1.43
1.42
1.34
40
62.53
9.47
5.16
3.80
3.16
2.78
2.54
2.36
2.23
2.13
2.05
1.99
1.93
1.89
1.85
1.81
1.78
1.75
1.73
1.71
1.63
1.57
1.53
1.51
1.48
1.46
1.44
1.42
1.40
1.39
1.38
1.30
50
62.69
9.47
5.15
3.80
3.15
2.77
2.52
2.35
2.22
2.12
2.04
1.97
1.92
1.87
1.83
1.79
1.76
1.74
1.71
1.69
1.61
1.55
1.51
1.48
1.46
1.44
1.41
1.39
1.38
1.36
1.35
1.26
75
62.90
9.48
5.15
3.78
3.13
2.75
2.51
2.33
2.20
2.10
2.02
1.95
1.89
1.85
1.80
1.77
1.74
1.71
1.69
1.66
1.58
1.52
1.48
1.45
1.43
1.41
1.38
1.36
1.34
1.33
1.32
1.21
To find the critical value of F when α is under the lower tail, denoted by F_{ν1,ν2,1−α}, we use the following formula: F_{ν1,ν2,1−α} = 1/F_{ν2,ν1,α}. Example: F_{ν1,ν2,1−.10} = 1/F_{ν2,ν1,.10}.
100
63.01
9.48
5.14
3.78
3.13
2.75
2.50
2.32
2.19
2.09
2.01
1.94
1.88
1.83
1.79
1.76
1.73
1.70
1.67
1.65
1.56
1.51
1.47
1.43
1.41
1.39
1.36
1.34
1.32
1.30
1.29
1.18
∞
63.33
9.49
5.13
3.76
3.11
2.72
2.47
2.30
2.16
2.06
1.97
1.90
1.85
1.80
1.76
1.72
1.69
1.66
1.63
1.61
1.52
1.46
1.41
1.38
1.35
1.33
1.29
1.27
1.24
1.23
1.21
1.00
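The table footnote gives lower-tail F critical values via the reciprocal relation F_{ν1,ν2,1−α} = 1/F_{ν2,ν1,α}. A minimal sketch of that rule, using a few upper-tail values transcribed from the α = 0.10 table above:

```python
# Upper-tail F critical values F_{(v1, v2), .10}, transcribed from Table VI above.
F_10 = {(1, 2): 8.53, (2, 1): 49.50, (5, 8): 2.73, (8, 5): 3.34}

def f_lower_tail(v1: int, v2: int, table: dict = F_10) -> float:
    """F_{v1, v2, .90}, computed as 1 / F_{v2, v1, .10} per the table footnote."""
    return 1.0 / table[(v2, v1)]

# F_{5, 8, .90} = 1 / F_{8, 5, .10} = 1 / 3.34
print(round(f_lower_tail(5, 8), 3))  # -> 0.299
```

Swapping the degrees of freedom before inverting is the step most often missed when applying this rule by hand.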
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2, respectively (α = 0.05).
dfd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
25
30
35
40
45
50
60
70
80
90
100
1
161.45
18.51
10.13
7.71
6.61
5.99
5.59
5.32
5.12
4.97
4.84
4.75
4.67
4.60
4.54
4.49
4.45
4.41
4.38
4.35
4.24
4.17
4.12
4.09
4.06
4.03
4.00
3.98
3.96
3.95
3.94
3.84
2
199.50
19.00
9.55
6.94
5.79
5.14
4.74
4.46
4.26
4.10
3.98
3.89
3.81
3.74
3.68
3.63
3.59
3.55
3.52
3.49
3.39
3.32
3.27
3.23
3.20
3.18
3.15
3.13
3.11
3.10
3.09
3.00
3
215.71
19.16
9.28
6.59
5.41
4.76
4.35
4.07
3.86
3.71
3.59
3.49
3.41
3.34
3.29
3.24
3.20
3.16
3.13
3.10
2.99
2.92
2.87
2.84
2.81
2.79
2.76
2.74
2.72
2.71
2.70
2.60
4
224.58
19.25
9.12
6.39
5.19
4.53
4.12
3.84
3.63
3.48
3.36
3.26
3.18
3.11
3.06
3.01
2.96
2.93
2.90
2.87
2.76
2.69
2.64
2.61
2.58
2.56
2.53
2.50
2.49
2.47
2.46
2.37
5
230.16
19.30
9.01
6.26
5.05
4.39
3.97
3.69
3.48
3.33
3.20
3.11
3.03
2.96
2.90
2.85
2.81
2.77
2.74
2.71
2.60
2.53
2.49
2.45
2.42
2.40
2.37
2.35
2.33
2.32
2.31
2.21
6
233.99
19.33
8.94
6.16
4.95
4.28
3.87
3.58
3.37
3.22
3.09
3.00
2.92
2.85
2.79
2.74
2.70
2.66
2.63
2.60
2.49
2.42
2.37
2.34
2.31
2.29
2.25
2.23
2.21
2.20
2.19
2.10
7
236.77
19.35
8.89
6.09
4.88
4.21
3.79
3.50
3.29
3.14
3.01
2.91
2.83
2.76
2.71
2.66
2.61
2.58
2.54
2.51
2.40
2.33
2.29
2.25
2.22
2.20
2.17
2.14
2.13
2.11
2.10
2.01
8
238.88
19.37
8.85
6.04
4.82
4.15
3.73
3.44
3.23
3.07
2.95
2.85
2.77
2.70
2.64
2.59
2.55
2.51
2.48
2.45
2.34
2.27
2.22
2.18
2.15
2.13
2.10
2.07
2.06
2.04
2.03
1.94
9
240.54
19.38
8.81
6.00
4.77
4.10
3.68
3.39
3.18
3.02
2.90
2.80
2.71
2.65
2.59
2.54
2.49
2.46
2.42
2.39
2.28
2.21
2.16
2.12
2.10
2.07
2.04
2.02
2.00
1.99
1.97
1.88
10
241.88
19.40
8.79
5.96
4.74
4.06
3.64
3.35
3.14
2.98
2.85
2.75
2.67
2.60
2.54
2.49
2.45
2.41
2.38
2.35
2.24
2.16
2.11
2.08
2.05
2.03
1.99
1.97
1.95
1.94
1.93
1.83
Continued
Continued
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2, respectively (α = 0.05).
dfd
11
12
13
14
15
20
25
30
40
50
75
100
∞
1 243.0 243.9 244.7 245.4 246.0 248.0 249.3 250.1 251.1 251.8 252.6 253.0 254.3
2 19.40 19.41 19.42 19.42 19.43 19.45 19.46 19.46 19.47 19.48 19.48 19.49 19.50
3
8.76
8.74
8.73
8.71
8.70
8.66
8.63
8.62
8.59
8.58
8.56
8.55
8.53
4
5.94
5.91
5.89
5.87
5.86
5.80
5.77
5.75
5.72
5.70
5.68
5.66
5.63
5
4.70
4.68
4.66
4.64
4.62
4.56
4.52
4.50
4.46
4.44
4.42
4.41
4.37
6
4.03
4.00
3.98
3.96
3.94
3.87
3.83
3.81
3.77
3.75
3.73
3.71
3.67
7
3.60
3.57
3.55
3.53
3.51
3.44
3.40
3.38
3.34
3.32
3.29
3.27
3.23
8
3.31
3.28
3.26
3.24
3.22
3.15
3.11
3.08
3.04
3.02
2.99
2.97
2.93
9
3.10
3.07
3.05
3.03
3.01
2.94
2.89
2.86
2.83
2.80
2.77
2.76
2.71
10
2.94
2.91
2.89
2.86
2.85
2.77
2.73
2.70
2.66
2.64
2.60
2.59
2.54
11
2.82
2.79
2.76
2.74
2.72
2.65
2.60
2.57
2.53
2.51
2.47
2.46
2.40
12
2.72
2.69
2.66
2.64
2.62
2.54
2.50
2.47
2.43
2.40
2.37
2.35
2.30
13
2.63
2.60
2.58
2.55
2.53
2.46
2.41
2.38
2.34
2.31
2.28
2.26
2.21
14
2.57
2.53
2.51
2.48
2.46
2.39
2.34
2.31
2.27
2.24
2.21
2.19
2.13
15
2.51
2.48
2.45
2.42
2.40
2.33
2.28
2.25
2.20
2.18
2.14
2.12
2.07
16
2.46
2.42
2.40
2.37
2.35
2.28
2.23
2.19
2.15
2.12
2.09
2.07
2.01
17
2.41
2.38
2.35
2.33
2.31
2.23
2.18
2.15
2.10
2.08
2.04
2.02
1.96
18
2.37
2.34
2.31
2.29
2.27
2.19
2.14
2.11
2.06
2.04
2.00
1.98
1.92
19
2.34
2.31
2.28
2.26
2.23
2.16
2.11
2.07
2.03
2.00
1.96
1.94
1.88
20
2.31
2.28
2.25
2.22
2.20
2.12
2.07
2.04
1.99
1.97
1.93
1.91
1.84
25
2.20
2.16
2.14
2.11
2.09
2.01
1.96
1.92
1.87
1.84
1.80
1.78
1.71
30
2.13
2.09
2.06
2.04
2.01
1.93
1.88
1.84
1.79
1.76
1.72
1.70
1.62
35
2.07
2.04
2.01
1.99
1.96
1.88
1.82
1.79
1.74
1.70
1.66
1.63
1.56
40
2.04
2.00
1.97
1.95
1.92
1.84
1.78
1.74
1.69
1.66
1.61
1.59
1.51
45
2.01
1.97
1.94
1.92
1.89
1.81
1.75
1.71
1.66
1.63
1.58
1.55
1.47
50
1.99
1.95
1.92
1.89
1.87
1.78
1.73
1.69
1.63
1.60
1.55
1.52
1.44
60
1.95
1.92
1.89
1.86
1.84
1.75
1.69
1.65
1.59
1.56
1.51
1.48
1.39
70
1.93
1.89
1.86
1.84
1.81
1.72
1.66
1.62
1.57
1.53
1.48
1.45
1.35
80
1.91
1.88
1.84
1.82
1.79
1.70
1.64
1.60
1.54
1.51
1.45
1.43
1.32
90
1.90
1.86
1.83
1.80
1.78
1.69
1.63
1.59
1.53
1.49
1.44
1.41
1.30
100
1.89
1.85
1.82
1.79
1.77
1.68
1.62
1.57
1.52
1.48
1.42
1.39
1.28
∞
1.79
1.75
1.72
1.69
1.67
1.57
1.51
1.46
1.39
1.35
1.28
1.24
1.00
To find the critical value of F when α is under the lower tail, denoted by F_{ν1,ν2,1−α}, we use the following formula: F_{ν1,ν2,1−α} = 1/F_{ν2,ν1,α}. Example: F_{ν1,ν2,1−.10} = 1/F_{ν2,ν1,.10}.
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2, respectively (α = 0.025).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
25
30
35
40
45
50
60
70
80
90
100
1
647.79
38.51
17.44
12.22
10.01
8.81
8.07
7.57
7.21
6.94
6.72
6.55
6.41
6.30
6.20
6.12
6.04
5.98
5.92
5.87
5.69
5.57
5.49
5.42
5.38
5.34
5.29
5.25
5.22
5.20
5.18
5.02
2
799.50
39.00
16.04
10.65
8.43
7.26
6.54
6.06
5.71
5.46
5.26
5.10
4.97
4.86
4.77
4.69
4.62
4.56
4.51
4.46
4.29
4.18
4.11
4.05
4.01
3.97
3.93
3.89
3.86
3.84
3.83
3.69
3
864.16
39.17
15.44
9.98
7.76
6.60
5.89
5.42
5.08
4.83
4.63
4.47
4.35
4.24
4.15
4.08
4.01
3.95
3.90
3.86
3.69
3.59
3.52
3.46
3.42
3.39
3.34
3.31
3.28
3.26
3.25
3.12
4
899.58
39.25
15.10
9.60
7.39
6.23
5.52
5.05
4.72
4.47
4.28
4.12
4.00
3.89
3.80
3.73
3.66
3.61
3.56
3.51
3.35
3.25
3.18
3.13
3.09
3.05
3.01
2.97
2.95
2.93
2.92
2.79
5
921.85
39.30
14.88
9.36
7.15
5.99
5.29
4.82
4.48
4.24
4.04
3.89
3.77
3.66
3.58
3.50
3.44
3.38
3.33
3.29
3.13
3.03
2.96
2.90
2.86
2.83
2.79
2.75
2.73
2.71
2.70
2.57
6
937.11
39.33
14.73
9.20
6.98
5.82
5.12
4.65
4.32
4.07
3.88
3.73
3.60
3.50
3.41
3.34
3.28
3.22
3.17
3.13
2.97
2.87
2.80
2.74
2.70
2.67
2.63
2.59
2.57
2.55
2.54
2.41
7
948.22
39.36
14.62
9.07
6.85
5.70
4.99
4.53
4.20
3.95
3.76
3.61
3.48
3.38
3.29
3.22
3.16
3.10
3.05
3.01
2.85
2.75
2.68
2.62
2.58
2.55
2.51
2.47
2.45
2.43
2.42
2.29
8
956.66
39.37
14.54
8.98
6.76
5.60
4.90
4.43
4.10
3.85
3.66
3.51
3.39
3.29
3.20
3.12
3.06
3.01
2.96
2.91
2.75
2.65
2.58
2.53
2.49
2.46
2.41
2.38
2.35
2.34
2.32
2.19
9
963.28
39.39
14.47
8.90
6.68
5.52
4.82
4.36
4.03
3.78
3.59
3.44
3.31
3.21
3.12
3.05
2.98
2.93
2.88
2.84
2.68
2.57
2.50
2.45
2.41
2.38
2.33
2.30
2.28
2.26
2.24
2.11
10
968.63
39.40
14.42
8.84
6.62
5.46
4.76
4.30
3.96
3.72
3.53
3.37
3.25
3.15
3.06
2.99
2.92
2.87
2.82
2.77
2.61
2.51
2.44
2.39
2.35
2.32
2.27
2.24
2.21
2.19
2.18
2.05
Continued
Continued
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2, respectively (α = 0.025).
dfd
11
12
13
14
15
20
25
30
40
50
75
100
1 973.0 976.7 979.8 982.5 984.9 993.1 998.1 1001.4 1005.6 1008.1 1011.5 1013.2
2 39.41 39.41 39.42 39.43 39.43 39.45 39.46 39.46 39.47 39.48 39.48 39.49
3 14.37 14.34 14.30 14.28 14.25 14.17 14.12 14.08 14.04 14.01 13.97 13.96
4
8.79
8.75
8.71
8.68
8.66
8.56
8.50
8.46
8.41
8.38
8.34
8.32
5
6.57
6.52
6.49
6.46
6.43
6.33
6.27
6.23
6.18
6.14
6.10
6.08
6
5.41
5.37
5.33
5.30
5.27
5.17
5.11
5.07
5.01
4.98
4.94
4.92
7
4.71
4.67
4.63
4.60
4.57
4.47
4.40
4.36
4.31
4.28
4.23
4.21
8
4.24
4.20
4.16
4.13
4.10
4.00
3.94
3.89
3.84
3.81
3.76
3.74
9
3.91
3.87
3.83
3.80
3.77
3.67
3.60
3.56
3.51
3.47
3.43
3.40
10
3.66
3.62
3.58
3.55
3.52
3.42
3.35
3.31
3.26
3.22
3.18
3.15
11
3.47
3.43
3.39
3.36
3.33
3.23
3.16
3.12
3.06
3.03
2.98
2.96
12
3.32
3.28
3.24
3.21
3.18
3.07
3.01
2.96
2.91
2.87
2.82
2.80
13
3.20
3.15
3.12
3.08
3.05
2.95
2.88
2.84
2.78
2.74
2.70
2.67
14
3.09
3.05
3.01
2.98
2.95
2.84
2.78
2.73
2.67
2.64
2.59
2.56
15
3.01
2.96
2.92
2.89
2.86
2.76
2.69
2.64
2.59
2.55
2.50
2.47
16
2.93
2.89
2.85
2.82
2.79
2.68
2.61
2.57
2.51
2.47
2.42
2.40
17
2.87
2.82
2.79
2.75
2.72
2.62
2.55
2.50
2.44
2.41
2.35
2.33
18
2.81
2.77
2.73
2.70
2.67
2.56
2.49
2.44
2.38
2.35
2.30
2.27
19
2.76
2.72
2.68
2.65
2.62
2.51
2.44
2.39
2.33
2.30
2.24
2.22
20
2.72
2.68
2.64
2.60
2.57
2.46
2.40
2.35
2.29
2.25
2.20
2.17
25
2.56
2.51
2.48
2.44
2.41
2.30
2.23
2.18
2.12
2.08
2.02
2.00
30
2.46
2.41
2.37
2.34
2.31
2.20
2.12
2.07
2.01
1.97
1.91
1.88
35
2.39
2.34
2.30
2.27
2.23
2.12
2.05
2.00
1.93
1.89
1.83
1.80
40
2.33
2.29
2.25
2.21
2.18
2.07
1.99
1.94
1.88
1.83
1.77
1.74
45
2.29
2.25
2.21
2.17
2.14
2.03
1.95
1.90
1.83
1.79
1.73
1.69
50
2.26
2.22
2.18
2.14
2.11
1.99
1.92
1.87
1.80
1.75
1.69
1.66
60
2.22
2.17
2.13
2.09
2.06
1.94
1.87
1.82
1.74
1.70
1.63
1.60
70
2.18
2.14
2.10
2.06
2.03
1.91
1.83
1.78
1.71
1.66
1.59
1.56
80
2.16
2.11
2.07
2.03
2.00
1.88
1.81
1.75
1.68
1.63
1.56
1.53
90
2.14
2.09
2.05
2.02
1.98
1.86
1.79
1.73
1.66
1.61
1.54
1.50
100
2.12
2.08
2.04
2.00
1.97
1.85
1.77
1.71
1.64
1.59
1.52
1.48
∞
1.99
1.94
1.90
1.87
1.83
1.71
1.63
1.57
1.48
1.43
1.34
1.30
To find the critical value of F when α is under the lower tail, denoted by F_{ν1,ν2,1−α}, we use the following formula: F_{ν1,ν2,1−α} = 1/F_{ν2,ν1,α}. Example: F_{ν1,ν2,1−.10} = 1/F_{ν2,ν1,.10}.
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2, respectively (α = 0.01).
dfd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
25
30
35
40
45
50
60
70
80
90
100
1
4052
98.50
34.12
21.20
16.26
13.75
12.25
11.26
10.56
10.04
9.65
9.33
9.07
8.86
8.68
8.53
8.40
8.29
8.18
8.10
7.77
7.56
7.42
7.31
7.23
7.17
7.08
7.01
6.96
6.93
6.90
6.63
2
5000
99.00
30.82
18.00
13.27
10.92
9.55
8.65
8.02
7.56
7.21
6.93
6.70
6.51
6.36
6.23
6.11
6.01
5.93
5.85
5.57
5.39
5.27
5.18
5.11
5.06
4.98
4.92
4.88
4.85
4.82
4.61
3
5403
99.17
29.46
16.69
12.06
9.78
8.45
7.59
6.99
6.55
6.22
5.95
5.74
5.56
5.42
5.29
5.18
5.09
5.01
4.94
4.68
4.51
4.40
4.31
4.25
4.20
4.13
4.07
4.04
4.01
3.98
3.78
4
5625
99.25
28.71
15.98
11.39
9.15
7.85
7.01
6.42
5.99
5.67
5.41
5.21
5.04
4.89
4.77
4.67
4.58
4.50
4.43
4.18
4.02
3.91
3.83
3.77
3.72
3.65
3.60
3.56
3.53
3.51
3.32
5
5764
99.30
28.24
15.52
10.97
8.75
7.46
6.63
6.06
5.64
5.32
5.06
4.86
4.69
4.56
4.44
4.34
4.25
4.17
4.10
3.85
3.70
3.59
3.51
3.45
3.41
3.34
3.29
3.26
3.23
3.21
3.02
6
5859
99.33
27.91
15.21
10.67
8.47
7.19
6.37
5.80
5.39
5.07
4.82
4.62
4.46
4.32
4.20
4.10
4.01
3.94
3.87
3.63
3.47
3.37
3.29
3.23
3.19
3.12
3.07
3.04
3.01
2.99
2.80
7
5928
99.36
27.67
14.98
10.46
8.26
6.99
6.18
5.61
5.20
4.89
4.64
4.44
4.28
4.14
4.03
3.93
3.84
3.77
3.70
3.46
3.30
3.20
3.12
3.07
3.02
2.95
2.91
2.87
2.84
2.82
2.64
8
5981
99.37
27.49
14.80
10.29
8.10
6.84
6.03
5.47
5.06
4.74
4.50
4.30
4.14
4.00
3.89
3.79
3.71
3.63
3.56
3.32
3.17
3.07
2.99
2.94
2.89
2.82
2.78
2.74
2.72
2.69
2.51
9
6022
99.39
27.35
14.66
10.16
7.98
6.72
5.91
5.35
4.94
4.63
4.39
4.19
4.03
3.89
3.78
3.68
3.60
3.52
3.46
3.22
3.07
2.96
2.89
2.83
2.78
2.72
2.67
2.64
2.61
2.59
2.41
10
6056
99.40
27.23
14.55
10.05
7.87
6.62
5.81
5.26
4.85
4.54
4.30
4.10
3.94
3.80
3.69
3.59
3.51
3.43
3.37
3.13
2.98
2.88
2.80
2.74
2.70
2.63
2.59
2.55
2.52
2.50
2.32
Continued
Continued
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2, respectively (α = 0.01).
dfd
11
12
13
14
15
20
25
30
40
50
1 6083
6106
6130
6140
6157
6209
6240
6261
6287
6303
2
99.41
99.42
99.42
99.43
99.43
99.45
99.46
99.47
99.47
99.48
3
27.13
27.05
26.98
26.92
26.87
26.69
26.58
26.50
26.41
26.35
4
14.45
14.37
14.31
14.25
14.20
14.02
13.91
13.84
13.75
13.69
5
9.96
9.89
9.82
9.77
9.72
9.55
9.45
9.38
9.29
9.24
6
7.79
7.72
7.66
7.61
7.56
7.40
7.30
7.23
7.14
7.09
7
6.54
6.47
6.41
6.36
6.31
6.16
6.06
5.99
5.91
5.86
8
5.73
5.67
5.61
5.56
5.52
5.36
5.26
5.20
5.12
5.07
9
5.18
5.11
5.05
5.01
4.96
4.81
4.71
4.65
4.57
4.52
10
4.77
4.71
4.65
4.60
4.56
4.41
4.31
4.25
4.17
4.12
11
4.46
4.40
4.34
4.29
4.25
4.10
4.01
3.94
3.86
3.81
12
4.22
4.16
4.10
4.05
4.01
3.86
3.76
3.70
3.62
3.57
13
4.02
3.96
3.91
3.86
3.82
3.66
3.57
3.51
3.43
3.38
14
3.86
3.80
3.75
3.70
3.66
3.51
3.41
3.35
3.27
3.22
15
3.73
3.67
3.61
3.56
3.52
3.37
3.28
3.21
3.13
3.08
16
3.62
3.55
3.50
3.45
3.41
3.26
3.16
3.10
3.02
2.97
17
3.52
3.46
3.40
3.35
3.31
3.16
3.07
3.00
2.92
2.87
18
3.43
3.37
3.32
3.27
3.23
3.08
2.98
2.92
2.84
2.78
19
3.36
3.30
3.24
3.19
3.15
3.00
2.91
2.84
2.76
2.71
20
3.29
3.23
3.18
3.13
3.09
2.94
2.84
2.78
2.69
2.64
25
3.06
2.99
2.94
2.89
2.85
2.70
2.60
2.54
2.45
2.40
30
2.91
2.84
2.79
2.74
2.70
2.55
2.45
2.39
2.30
2.25
35
2.80
2.74
2.69
2.64
2.60
2.44
2.35
2.28
2.19
2.13
40
2.73
2.66
2.61
2.56
2.52
2.37
2.27
2.20
2.11
2.06
45
2.67
2.61
2.55
2.51
2.46
2.31
2.21
2.14
2.05
2.00
50
2.63
2.56
2.51
2.46
2.42
2.27
2.17
2.10
2.01
1.95
60
2.56
2.50
2.44
2.39
2.35
2.20
2.12
2.03
1.94
1.88
70
2.51
2.45
2.40
2.35
2.31
2.15
2.05
1.98
1.87
1.83
80
2.48
2.42
2.36
2.31
2.27
2.12
2.01
1.94
1.85
1.79
90
2.45
2.39
2.33
2.27
2.24
2.09
1.99
1.92
1.82
1.76
100
2.43
2.37
2.31
2.27
2.22
2.07
1.97
1.89
1.80
1.74
∞
2.25
2.18
2.12
2.08
2.04
1.88
1.77
1.70
1.59
1.52
To find the critical value of F when α is under the lower tail, denoted by F_{ν1,ν2,1−α}, we use the following formula: F_{ν1,ν2,1−α} = 1/F_{ν2,ν1,α}. Example: F_{ν1,ν2,1−.10} = 1/F_{ν2,ν1,.10}.
75
6320
99.49
26.28
13.61
9.17
7.02
5.79
5.00
4.45
4.05
3.74
3.50
3.31
3.15
3.01
2.90
2.80
2.71
2.64
2.57
2.33
2.17
2.06
1.98
1.92
1.87
1.79
1.74
1.70
1.67
1.65
1.42
100
6334
99.49
26.24
13.58
9.13
6.99
5.75
4.96
4.41
4.01
3.71
3.47
3.27
3.11
2.98
2.86
2.76
2.68
2.60
2.54
2.29
2.13
2.02
1.94
1.88
1.82
1.75
1.70
1.65
1.62
1.60
1.36
INDEX
Index Terms
Links
A
absolute probability
89
acceptance regions
204
aging factors
129
Alpha, defined
305
alternative hypotheses
202
alternatives, two-tail
203
131
203
305
267
292
xxi
305
39
associations, perfect
41
86
B
bar charts
23
237
2
Bernoulli distribution
102
Bernoulli populations
147
102
Bernoulli trials
101
Beta, defined
305
beta function
153
167
bimodal data
305
bimodal distribution
305
305
310
305
270
102
defined
305
mean
106
normal approximation to
159
point
102
Poisson approximation to
111
107
standard deviation
106
271
158
105
314
39
41
BMDP software
255
168
192
305
307
66
264
290
305
box-whisker plots
C
categorical data, graphical representation
22
97
117
121
141
305
71
23
267
292
305
box-whisker plots
66
264
290
305
categorical data
22
control charts
33
dot plots
20
39
262
15
histograms
27
35
260
307
JMP
291
288
33
Pareto chart
24
pie chart
22
probability function
97
scatter plots
21
39
268
309
39
27
34
summary information
266
291
33
tree diagram
75
Venn diagrams
294
310
310
320
chi-square distributions
148
270
classes
20
306
52
57
combination
77
79
complement
80
306
composite hypotheses
203
conditional probability
88
confidence coefficients
171
305
306
306
confidence intervals
See also interval estimation
defined
165
171
180
hypothesis testing
250
275
173
180
one-sided
174
176
172
187
195
198
177
180
two-sided
173
187
183
176
320
confidence limits
171
contingency tables
306
160
306
continuous distribution
115
306
94
115
117
306
control charts
33
xxi
140
160
168
306
40
306
critical points
204
306
critical regions
204
correction factors
correlation coefficient
97
117
cumulative frequencies
16
306
32
307
cumulative probabilities
96
curves
bell-shaped
frequency distribution
31
Ogive
32
308
operating characteristic
206
308
power
309
CV (coefficients of variation)
52
57
306
D
data
before and after
237
bimodal
305
bivariate
39
categorical
22
41
converting to information
defined
grouped
20
57
interval
12
307
nominal
12
308
307
data (Cont.)
numerical. See numerical data
ordinal
12
308
paired
237
309
12
15
qualitative
22
309
quantitative
See quantitative data
ratio
12
sets of
15
306
skewed
51
52
symmetric
51
310
types of
11
ungrouped
20
310
xvii
xxi
148
154
306
dependent events
91
306
descriptive statistics
10
15
306
94
97
99
45
52
60
306
72
107
discrete distributions
306
93
306
74
2
64
distribution functions
continuous random variables
117
cumulative
97
frequency
153
117
distributions
Bernoulli
102
bimodal
305
269
chi-square
148
270
continuous
115
306
93
14
306
exponential
129
270
307
F-
155
270
307
hypergeometric
107
10
307
110
14
270
309
39
307
discrete
305
320
95
rectangular distributions
118
of sample mean
140
51
skewed
67
symmetric
67
Snedecor's F-distribution
155
Students t-
153
230
15
34
tables
uniform
118
Weibull
132
311
xvii
306
dot plots
20
39
262
E
empirical rule
60
66
70
307
305
307
305
307
307
errors
of estimation
168
192
in hypothesis testing.
204
212
margin of
168
192
mean square
308
of point estimation
168
192
standard
140
310
type I
205
212
310
type II
205
212
estimators
137
307
defined
74
307
dependent
91
306
equally likely
307
independent
89
307
mutually exclusive
83
90
null
75
79
of random experiments
73
rare
308
110
representations of
75
simple
73
75
309
sure
75
80
310
expected frequencies
expected values
307
99
307
experiments
defined
307
deterministic
72
129
270
307
310
exponential models
extreme values
131
48
66
67
270
307
308
F
F critical values table
323
F-distributions
155
132
168
finite populations
first quartile
11
140
307
freedom, degrees of
148
frequencies, class
306
frequencies, cumulative
16
154
306
frequencies, expected
307
frequencies, relative
16
83
306
309
153
27
32
34
307
frequency polygons
27
30
33
307
glossary
305
11
Gosset, W. S.
153
xvii
20
57
307
H
hazard rate function
132
histograms
27
35
260
288
307
hypergeometric distributions
107
10
307
hypotheses, types of
202
209
305
308
273
237
confidence intervals
250
275
errors in
204
212
general concepts
203
in JMP
295
300
large samples
208
240
252
295
in MINITAB
273
normal population
238
253
254
208
223
238
250
250
274
296
276
295
one population proportion
240
244
paired t-test
237
201
purpose
201
small samples
223
steps in
207
216
229
242
276
247
280
300
I
Improve phase, tools/techniques
associated
xxi
independent events
89
independent samples
307
inertia, moments of
101
307
inferential statistics
10
infinite populations
11
information
307
52
64
intersection
80
307
interval data
12
307
165
171
192
52
64
308
interval estimation
308
J
JMP
basic functions
284
calculating statistics
286
292
290
291
displaying histograms
288
294
289
hypothesis testing
295
normality testing
301
paired t-test
298
300
L
LCL (lower confidence limits)
171
51
67
level of significance
205
limits, class
306
limits, confidence
171
limits, specification
308
307
line graphs
location, measures of
33
39
171
lower fences
308
lower-tail hypotheses
209
63
M
MAIC (Measure, Analyze, Improve, and
Control)
margin of error
168
marginal probability
308
marks, class
192
20
308
mean
arithmetic
305
Bernoulli distribution
102
binomial distribution
106
120
defined
308
99
exponential distribution
130
58
hypergeometric distribution
110
Poisson distribution
114
population
138
sample
138
uniform distribution
120
Weibull distribution
133
weighted
311
xxi
39
measures of centrality
defined
45
57
limitations of
52
308
48
51
58
mode
50
59
308
45
52
63
51
58
measures of dispersion
308
60
64
measures of location
measures of variability
median
2
308
48
memory-less properties
midpoints, class
130
20
306
MINITAB
calculating distributions
269
calculating statistics
258
267
264
262
266
260
displaying histograms
260
268
263
general use
255
264
265
273
276
280
normality testing
282
paired t-test
278
82
308
mode
50
moment of inertia
Motorola definition of Six Sigma
MSE (mean square error)
59
308
101
3
308
multiplication rule
77
90
83
90
nominal data
12
308
nonconditional probability
89
308
nonparametric statistics
308
normal distribution
calculating in MINITAB
270
148
defined
121
empirical rule
123
308
60
examples
124
generally
121
153
tables
123
319
normality testing
JMP
301
MINITAB
282
null event
null hypotheses
75
79
202
308
20
27
numerical data
graphical representations
interval estimation and
measures of
66
171
52
166
45
310
O
observations
308
205
308
206
308
32
308
OC (operating characteristic)
curves
Ogive curves
one-tail alternatives
203
one-tail tests
308
206
308
1
36
See also
stem and leaf diagrams
ordinal data
12
308
outliers
48
66
p-values
210
309
paired data
237
309
paired t-test
237
278
298
parameters
45
137
165
309
Pareto chart
24
294
309
67
308
Pearson correlation
coefficient
40
Pearson, Karl
40
percentiles
63
perfect association
41
permutations
77
pie charts
22
pivotal quantities
172
102
309
268
point estimation
See also hypothesis testing;
interval estimation
bias in
167
defined
165
description
166
69
errors of
168
192
examples
169
variance of
167
point value
310
305
307
169
309
111
158
Poisson distribution
110
270
114
317
Poisson process
111
131
309
population means
confidence intervals for large samples
180
183
differences between
216
138
population proportions
confidence intervals
187
242
estimating unknown
195
240
273
population variances
confidence intervals
195
formula for
54
60
208
216
223
229
250
hypothesis testing of one
244
247
280
300
213
219
226
232
populations
defined
10
309
types of
11
107
309
power, defined
309
205
140
probability
absolute
89
axiomatic approach
86
conditional
88
306
defined
71
72
83
marginal
308
nonconditional
89
random experiments
72
statistics and
72
theoretical
85
102
Binomial
103
115
exponential distributions
129
formula for
95
graphical representations
97
hypergeometric
108
normal
121
Poisson distributions
111
Snedecor's F-distributions
156
uniform
118
Weibull
132
114
317
xix
309
147
Q
qualitative data
defined
12
15
graphical representations
22
309
309
quantitative data
defined
12
18
graphical representations
20
309
27
34
307
309
66
171
52
166
64
R
random experiments defined
307
events of
73
probability and
72
random samples
11
309
random variables
Bernoulli
102
continuous
94
115
defined
93
309
discrete
93
94
117
306
97
99
306
standard normal
types
122
93
range spaces
95
range
52
115
53
309
range, interquartile
rare events
52
64
110
ratio data
12
rectangular distribution
118
rejection regions
204
205
309
relative frequencies
16
83
309
85
27
30
107
research hypotheses
202
52
67
xix
307
S
sample mean, probability distributions of
sample points
140
73
77
192
73
sample statistics
309
sample survey
309
sample variance
54
sampled populations
11
79
309
60
samples
defined
11
independent
307
replacement and
107
309
309
of sample mean
138
of sample proportion
147
Student's t-distribution
153
SAS software
255
scatter plots
21
second quartile
39
263
309
Set Theory
80
significance, level of
simple events
simple hypotheses
single-valued frequency distribution tables
205
308
73
75
309
203
18
Six Sigma
defined
methodology
xix
Motorola definition
statistical concept
steps in
xvii
tools/techniques
xxi
skewed data
310
Snedecor's F-distribution
155
255
310
303
2
255
standard deviation
Bernoulli distribution
102
binomial distribution
106
120
defined
310
99
exponential distribution
130
60
hypergeometric distribution
110
Poisson distribution
114
uniform distribution
120
standard error
140
310
265
122
statistical tools
255
303
286
calculating in MINITAB
258
defined
descriptive
45
137
10
15
306
goals of
165
inferential
10
nonparametric
308
probability and
72
sample
309
Statpages.net
303
Statsoftinc.com
303
307
27
34
289
310
Student's t-distribution
153
180
226
230
80
310
310
Sturges formula
19
sure events
75
survey, sample
309
symmetric data
51
symmetric distribution
67
SYSTAT software
310
255
T
t critical values table
322
t-distributions
153
180
226
t-test, paired
237
278
298
105
159
314
tables
binomial probability
230
tables (Cont.)
chi-square distribution
149
F critical value
323
frequency distribution
320
15
34
normal distribution
123
319
Poisson probability
114
317
Snedecor's F-distribution
157
Student's t-distribution
154
t critical values
322
target populations
11
test statistic
39
310
testing statistical
hypotheses
202
tests, types of
310
theoretical probability
third quartile
85
310
33
tree diagrams
75
two-tail alternatives
203
two-tail hypotheses
209
two-tail tests
310
type I error
205
212
310
type II error
205
212
310
U
UCL (upper confidence limits)
171
unbiased estimators
167
ungrouped data
310
20
uniform distributions
union
118
270
310
80
171
upper fences
310
upper-tail hypotheses
209
307
V
values
chi-square
320
expected
99
307
extreme
48
66
F critical
323
p-
210
point
309
t critical
322
variability, measures of
67
309
308
variables
defined
310
16
variances
defined
310
of point estimators
167
169
54
60
3
79
310
W
web-based statistical tools
303
Weibull distribution
132
weighted mean
311
width, class
306
Z
Z distribution
311
z-scores
123
311
308