A Course in Mathematical Statistics
Second Edition
George G. Roussas
Intercollege Division of Statistics
University of California
Davis, California
ACADEMIC PRESS
San Diego
New York
London Boston
Sydney Tokyo Toronto
ACADEMIC PRESS
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
1300 Boylston Street, Chestnut Hill, MA 02167, USA
http://www.apnet.com
ACADEMIC PRESS LIMITED
24-28 Oval Road, London NW1 7DX, UK
http://www.hbuk.co.uk/ap/
Preface to the Second Edition
This is the second edition of a book published for the first time in 1973 by Addison-Wesley Publishing Company, Inc., under the title A First Course in Mathematical Statistics. The first edition has been out of print for a number of years now, although its reprint in Taiwan is still available. That issue, however, is meant for circulation only in Taiwan.
The first issue of the book was very well received from an academic viewpoint. I have had the pleasure of hearing colleagues telling me that the book filled an existing gap between a plethora of textbooks of lower mathematical level and others of considerably higher level. A substantial number of colleagues, holding senior academic appointments in North America and elsewhere, have acknowledged to me that they made their entrance into the wonderful world of probability and statistics through my book. I have also heard of the book as being in a class of its own, and also as forming a collector's item, after it went out of print. Finally, throughout the years, I have received numerous inquiries as to the possibility of having the book reprinted. It is in response to these comments and inquiries that I have decided to prepare a second edition of the book.
This second edition preserves the unique character of the first issue of the book, whereas some adjustments are effected. The changes in this issue consist in correcting some rather minor factual errors and a considerable number of misprints, either kindly brought to my attention by users of the book or located by my students and myself. Also, the reissuing of the book has provided me with an excellent opportunity to incorporate certain rearrangements of the material.
One change occurring throughout the book is the grouping of exercises of each chapter in clusters added at the end of sections. Associating exercises with material discussed in sections clearly makes their assignment easier. In the process of doing this, a handful of exercises were omitted, as being too complicated for the level of the book, and a few new ones were inserted. In
Chapters 1 through 8, some of the materials were pulled out to form separate sections. These sections have also been marked by an asterisk (*) to indicate the fact that their omission does not jeopardize the flow of presentation and understanding of the remaining material.
Specifically, in Chapter 1, the concepts of a field and of a σ-field, and basic results on them, have been grouped together in Section 1.2*. They are still readily available for those who wish to employ them to add elegance and rigor in the discussion, but their inclusion is not indispensable. In Chapter 2, the number of sections has been doubled from three to six. This was done by discussing independence and product probability spaces in separate sections. Also, the solution of the problem of the probability of matching is isolated in a section by itself. The section on the problem of the probability of matching and the section on product probability spaces are also marked by an asterisk for the reason explained above. In Chapter 3, the discussion of random variables as measurable functions and related results is carried out in a separate section, Section 3.5*. In Chapter 4, two new sections have been created by discussing separately marginal and conditional distribution functions and probability density functions, and also by presenting, in Section 4.4*, the proofs of two statements, Statements 1 and 2, formulated in Section 4.1; this last section is also marked by an asterisk. In Chapter 5, the discussion of covariance and correlation coefficient is carried out in a separate section; some additional material is also presented for the purpose of further clarifying the interpretation of correlation coefficient. Also, the justification of relation (2) in Chapter 2 is done in a section by itself, Section 5.6*. In Chapter 6, the number of sections has been expanded from three to five by discussing in separate sections characteristic functions for the one-dimensional and the multidimensional case, and also by isolating in a section by itself definitions and results on moment-generating functions and factorial moment generating functions. In Chapter 7, the number of sections has been doubled from two to four by presenting the proof of Lemma 2, stated in Section 7.1, and related results in a separate section; also, by grouping together in a section marked by an asterisk definitions and results on independence. Finally, in Chapter 8, a new theorem, Theorem 10, especially useful in estimation, has been added in Section 8.5. Furthermore, the proof of Pólya's lemma and an alternative proof of the Weak Law of Large Numbers, based on truncation, are carried out in a separate section, Section 8.6*, thus increasing the number of sections from five to six.
In the remaining chapters, no changes were deemed necessary, except that in Chapter 13, the proof of Theorem 2 in Section 13.3 has been facilitated by the formulation and proof in the same section of two lemmas, Lemma 1 and Lemma 2. Also, in Chapter 14, the proof of Theorem 1 in Section 14.1 has been somewhat simplified by the formulation and proof of Lemma 1 in the same section.
Finally, a table of some commonly met distributions, along with their means, variances and other characteristics, has been added. The value of such a table for reference purposes is obvious, and needs no elaboration.
This book contains enough material for a year course in probability and statistics at the advanced undergraduate level, or for first-year graduate students not having been exposed before to a serious course on the subject matter. Some of the material can actually be omitted without disrupting the continuity of presentation. This includes the sections marked by asterisks, perhaps, Sections 13.4-13.6 in Chapter 13, and all of Chapter 14. The instructor can also be selective regarding Chapters 11 and 18. As for Chapter 19, it has been included in the book for completeness only.
The book can also be used independently for a one-semester (or even one quarter) course in probability alone. In such a case, one would strive to cover the material in Chapters 1 through 10 with the exclusion, perhaps, of the sections marked by an asterisk. One may also be selective in covering the material in Chapter 9.
In either case, presentation of results involving characteristic functions may be perfunctory only, with emphasis placed on moment-generating functions. One should mention, however, why characteristic functions are introduced in the first place, and therefore what one may be missing by not utilizing this valuable tool.
In closing, it is to be mentioned that this author is fully aware of the fact that the audience for a book of this level has diminished rather than increased since the time of its first edition. He is also cognizant of the trend of having recipes of probability and statistical results parading in textbooks, depriving the reader of the challenge of thinking and reasoning, instead delegating the thinking to a computer. It is hoped that there is still room for a book of the nature and scope of the one at hand. Indeed, the trend and practices just described should make the availability of a textbook such as this one exceedingly useful if not imperative.
G. G. Roussas
Davis, California
May 1996
Preface to the First Edition
the text. Also, there are more than 530 exercises, appearing at the end of the chapters, which are of both theoretical and practical importance.
The careful selection of the material, the inclusion of a large variety of topics, the abundance of examples, and the existence of a host of exercises of both theoretical and applied nature will, we hope, satisfy people of both theoretical and applied inclinations. All the application-oriented reader has to do is to skip some fine points of some of the proofs (or some of the proofs altogether!) when studying the book. On the other hand, the careful handling of these same fine points should offer some satisfaction to the more mathematically inclined readers.
The material of this book has been presented several times to classes of the composition mentioned earlier; that is, classes consisting of relatively mathematically immature, eager, and adventurous sophomores, as well as juniors and seniors, and statistically unsophisticated graduate students. These classes met three hours a week over the academic year, and most of the material was covered in the order in which it is presented with the occasional exception of Chapters 14 and 20, Section 5 of Chapter 5, and Section 3 of Chapter 9. We feel that there is enough material in this book for a three-quarter session if the classes meet three or even four hours a week.
At various stages and times during the organization of this book several students and colleagues helped improve it by their comments. In connection with this, special thanks are due to G. K. Bhattacharyya. His meticulous reading of the manuscripts resulted in many comments and suggestions that helped improve the quality of the text. Also thanks go to B. Lind, K. G. Mehrotra, A. Agresti, and a host of others, too many to be mentioned here. Of course, the responsibility in this book lies with this author alone for all omissions and errors which may still be found.
As the teaching of statistics becomes more widespread and its level of sophistication and mathematical rigor (even among those with limited mathematical training but yet wishing to know why and how) more demanding, we hope that this book will fill a gap and satisfy an existing need.
G. G. R.
Madison, Wisconsin
November 1972
Chapter 1
Basic Concepts of Set Theory

1.1 Some Definitions and Notation
Let S be a sample space with points s, and let A, A1, A2, ... denote subsets (events) of S; Aᶜ denotes the complement of A with respect to S. The union of the sets A1, A2, ..., denoted by A1 ∪ A2 ∪ ··· or ∪_{j=1}^∞ A_j, is defined by

∪_{j=1}^∞ A_j = {s ∈ S; s ∈ A_j for at least one j = 1, 2, ...}.

(Figure 1.3: A1 ∪ A2 is the shaded region.)

The intersection of the sets A1, A2, ..., An, denoted by A1 ∩ A2 ∩ ··· ∩ An or ∩_{j=1}^n A_j (and similarly for infinitely many sets), is defined by

∩_{j=1}^∞ A_j = {s ∈ S; s ∈ A_j for all j = 1, 2, ...}.

(Figure 1.4: A1 ∩ A2 is the shaded region.)

The difference A1 − A2 is defined by

A1 − A2 = {s ∈ S; s ∈ A1, s ∉ A2}.

Symmetrically,

A2 − A1 = {s ∈ S; s ∈ A2, s ∉ A1}.

The symmetric difference A1 Δ A2 is defined by

A1 Δ A2 = (A1 − A2) ∪ (A2 − A1).

Note that

A1 Δ A2 = (A1 ∪ A2) − (A1 ∩ A2).

Pictorially, this is shown in Fig. 1.6. It is worthwhile to observe that operations (4) and (5) can be expressed in terms of operations (1), (2), and (3). (Figure 1.6: A1 Δ A2 is the shaded area.)

When the sets involved are pairwise disjoint, their union is denoted by a plus sign: A1 + A2, A1 + ··· + An = Σ_{j=1}^n A_j and A1 + A2 + ··· = Σ_{j=1}^∞ A_j. The index set J over which unions and intersections are formed will usually be either the (finite) set {1, 2, ..., n}, or the (infinite) set {1, 2, ...}.
For any subsets A, A1, A2, A3, ... of S, the following properties hold:

1. Sᶜ = ∅, ∅ᶜ = S, (Aᶜ)ᶜ = A.
2. S ∪ A = S, ∅ ∪ A = A, A ∪ Aᶜ = S, A ∪ A = A.
3. S ∩ A = A, ∅ ∩ A = ∅, A ∩ Aᶜ = ∅, A ∩ A = A.

The previous statements are all obvious, as is the following: ∅ ⊆ A for every subset A of S. Also

4. A1 ∪ (A2 ∪ A3) = (A1 ∪ A2) ∪ A3, A1 ∩ (A2 ∩ A3) = (A1 ∩ A2) ∩ A3 (Associative laws)
5. A1 ∪ A2 = A2 ∪ A1, A1 ∩ A2 = A2 ∩ A1 (Commutative laws)
6. A ∪ (∩_j A_j) = ∩_j (A ∪ A_j), A ∩ (∪_j A_j) = ∪_j (A ∩ A_j) (Distributive laws)
There are two more important properties of the operations on sets which relate complementation to union and intersection. They are known as De Morgan's laws:

i) (∪_j A_j)ᶜ = ∩_j A_jᶜ,
ii) (∩_j A_j)ᶜ = ∪_j A_jᶜ.

As an example of a proof of these statements, we prove (i).
We wish to establish

a) (∪_j A_j)ᶜ ⊆ ∩_j A_jᶜ and
b) ∩_j A_jᶜ ⊆ (∪_j A_j)ᶜ.

We will then, by definition, have verified the desired equality of the two sets.

a) Let s ∈ (∪_j A_j)ᶜ. Then s ∉ ∪_j A_j, hence s ∉ A_j for any j. Thus s ∈ A_jᶜ for every j and therefore s ∈ ∩_j A_jᶜ.
b) Let s ∈ ∩_j A_jᶜ. Then s ∈ A_jᶜ for every j and hence s ∉ A_j for any j. Then s ∉ ∪_j A_j and therefore s ∈ (∪_j A_j)ᶜ.

The proof of (ii) is quite similar.
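De Morgan's laws are also easy to check mechanically on finite sets. The following minimal Python sketch does so; the sample space S and the subsets used are arbitrary illustrative choices, not taken from the text.

```python
# A minimal numerical check of De Morgan's laws on finite sets.
# S and the subsets A_j are arbitrary illustrative choices.
S = set(range(10))
subsets = [{1, 2, 3}, {2, 4, 6, 8}, {0, 3, 9}]

def complement(A):
    return S - A

union = set().union(*subsets)
intersection = set.intersection(*subsets)

# (union of the A_j)^c == intersection of the A_j^c
assert complement(union) == set.intersection(*[complement(A) for A in subsets])
# (intersection of the A_j)^c == union of the A_j^c
assert complement(intersection) == set().union(*[complement(A) for A in subsets])
print("De Morgan's laws verified on this example.")
```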
This section is concluded with the following:

DEFINITION 1  Let A1, A2, ... be a sequence of subsets of S. The limits inferior and superior of the sequence are defined by

A̲ = lim inf_n A_n = ∪_{n=1}^∞ ∩_{j=n}^∞ A_j,

and

Ā = lim sup_n A_n = ∩_{n=1}^∞ ∪_{j=n}^∞ A_j.
Exercises

1.1.1 Let A_j, j = 1, 2, 3 be arbitrary subsets of S. Determine whether each of the following statements is correct or incorrect.

i) (A1 ∪ A2) ∩ A2 = A2;
ii) (A1 ∪ A2) − A1 = A2;
iii) (A1 − A2) ∩ (A1 ∩ A2) = ∅;
iv) (A1 ∪ A2) ∩ (A2 ∪ A3) ∩ (A3 ∪ A1) = (A1 ∩ A2) ∪ (A2 ∩ A3) ∪ (A3 ∩ A1).
1.1.2 Let S = {(x, y)′ ∈ ℝ²; −5 ≤ x ≤ 5, 0 ≤ y ≤ 5, x, y = integers}, where prime denotes transpose, and define the subsets A_j, j = 1, ..., 7 of S as follows:

A1 = {(x, y)′ ∈ S; x = y};  A2 = {(x, y)′ ∈ S; x = −y};
A3 = {(x, y)′ ∈ S; x² = y²};  A4 = {(x, y)′ ∈ S; x² ≤ y²};
A5 = {(x, y)′ ∈ S; x² + y² ≤ 4};  A6 = {(x, y)′ ∈ S; x ≤ y²};
A7 = {(x, y)′ ∈ S; x² ≥ y}.

List the members of the sets just defined.

1.1.3 Refer to Exercise 1.1.2 and show that:

i) A1 ∩ (∪_{j=2}^7 A_j) = ∪_{j=2}^7 (A1 ∩ A_j);
ii) A1 ∪ (∩_{j=2}^7 A_j) = ∩_{j=2}^7 (A1 ∪ A_j);
iii) (∪_{j=1}^7 A_j)ᶜ = ∩_{j=1}^7 A_jᶜ;
iv) (∩_{j=1}^7 A_j)ᶜ = ∪_{j=1}^7 A_jᶜ

by listing the members of each one of the eight sets appearing on either side of each one of the relations (i)-(iv).

1.1.4 Let A, B and C be subsets of S and suppose that A ⊆ B and B ⊆ C. Then show that A ⊆ C; that is, the subset relationship is transitive. Verify it by taking A = A1, B = A3 and C = A4, where A1, A3 and A4 are defined in Exercise 1.1.2.
1.1.5
1.1.6 In terms of the events A1, A2, A3, and perhaps their complements, express each one of the following events:

i) B_i = {s ∈ S; s belongs to exactly i of A1, A2, A3}, where i = 0, 1, 2, 3;
ii) C = {s ∈ S; s belongs to all of A1, A2, A3};

1.1.8 Give a detailed proof of the second identity in De Morgan's laws; that is, show that

(∩_j A_j)ᶜ = ∪_j A_jᶜ.
1.1.9
1.1.10 For n = 1, 2, ..., define the sets

A_n = {(x, y)′ ∈ ℝ²; 3 + 1/n ≤ x < 6 − 2/n, 0 ≤ y ≤ 2 − 2/n}

and

B_n = {(x, y)′ ∈ ℝ²; x² + y² ≤ 3 + 1/n}.

1.1.11 For n = 1, 2, ..., define the sets

A_n = {x ∈ ℝ; 5 + 1/n < x < 20 − 1/n}  and  B_n = {x ∈ ℝ; 0 < x < 7 + 3/n}.

Then show that A_n ↑ and B_n ↓, so that lim_n A_n = A and lim_n B_n = B exist (by Exercise 1.1.9(iv)). Also identify the sets A and B.

1.1.12 Let A and B be subsets of S and for n = 1, 2, ..., define the sets A_n as follows: A_{2n−1} = A, A_{2n} = B. Then show that

lim inf_n A_n = A ∩ B,  lim sup_n A_n = A ∪ B.
1.2* Fields and σ-Fields*

* The reader is reminded that sections marked by an asterisk may be omitted without jeopardizing the understanding of the remaining material.

DEFINITION 2  A class (set) F of subsets of S is said to be a field if

(F1) F is a non-empty class;
(F2) A ∈ F implies that Aᶜ ∈ F;
(F3) A1, A2 ∈ F implies that A1 ∪ A2 ∈ F.

1.2.1 Consequences of the Definition of a Field

1. S, ∅ ∈ F.
2. If A_j ∈ F, j = 1, 2, ..., n, then ∪_{j=1}^n A_j ∈ F, ∩_{j=1}^n A_j ∈ F for any finite n. (That is, F is closed under finite unions and intersections. Notice, however, that A_j ∈ F, j = 1, 2, ... need not imply that their union or intersection is in F; for a counterexample, see consequence 2 on page 10.)

PROOF OF (1) AND (2)  (1) (F1) implies that there exists A ∈ F, and (F2) implies that Aᶜ ∈ F. By (F3), A ∪ Aᶜ = S ∈ F. By (F2), Sᶜ = ∅ ∈ F.
(2) The proof will be by induction on n and by one of De Morgan's laws. By (F3), if A1, A2 ∈ F, then A1 ∪ A2 ∈ F; hence the statement for unions is true for n = 2. (It is trivially true for n = 1.) Now assume the statement for unions is true for n = k − 1; that is, if

A1, A2, ..., A_{k−1} ∈ F, then ∪_{j=1}^{k−1} A_j ∈ F.

Then ∪_{j=1}^k A_j = (∪_{j=1}^{k−1} A_j) ∪ A_k ∈ F by (F3), and by induction, the statement for unions is true for any finite n. By observing that

∩_{j=1}^n A_j = (∪_{j=1}^n A_jᶜ)ᶜ,

we see that (F2) and the above statement for unions imply that if A1, ..., A_n ∈ F, then ∩_{j=1}^n A_j ∈ F for any finite n.
Let I be any non-empty index set (finite, or countably infinite, or uncountable), and let F_j, j ∈ I be fields of subsets of S. Define F by F = ∩_{j∈I} F_j = {A; A ∈ F_j for all j ∈ I}. Then F is a field. In particular, for any class C of subsets of S, the smallest field containing C (the field generated by C) is

F(C) = ∩_{j∈I} F_j,

where the intersection is taken over all fields F_j of subsets of S which contain C.
1.2.3

A field F is called a σ-field, and is usually denoted by A, when the defining property (F3) is strengthened to countable unions:

DEFINITION 3  A class A of subsets of S is said to be a σ-field if

(A1) A is a non-empty class;
(A2) A ∈ A implies that Aᶜ ∈ A;
(A3) A_j ∈ A, j = 1, 2, ... implies that ∪_{j=1}^∞ A_j ∈ A.

1.2.4 Examples of σ-Fields

In particular, let S be uncountable and let C4 be the class of all subsets A of S such that A or Aᶜ is countable. Then C4 is a σ-field:

i) Sᶜ = ∅ is countable, so C4 is non-empty.
ii) If A ∈ C4, then A or Aᶜ is countable. If A is countable, then (Aᶜ)ᶜ = A is countable, so that Aᶜ ∈ C4. If Aᶜ is countable, by definition Aᶜ ∈ C4.
iii) The proof of this statement requires knowledge of the fact that a countable union of countable sets is countable. (For a proof of this fact see page 36 in Tom M. Apostol's book Mathematical Analysis, published by Addison-Wesley, 1957.) Let A_j ∈ C4, j = 1, 2, .... Then either each A_j is countable, or there exists some A_j for which A_j is not countable but A_jᶜ is. In the first case, we invoke the previously mentioned theorem on the countable union of countable sets. In the second case, we note that

(∪_{j=1}^∞ A_j)ᶜ = ∩_{j=1}^∞ A_jᶜ,

which is countable, being contained in the countable set A_jᶜ.
Next, let C0 be the class of all intervals in ℝ:

C0 = {all intervals in ℝ} = {(−∞, x), (−∞, x], (x, ∞), [x, ∞), (x, y), (x, y], [x, y), [x, y]; x, y ∈ ℝ, x < y}.

The σ-field generated by C0 is denoted by B, and we call it the Borel σ-field (over the real line). The pair (ℝ, B) is called the Borel real line.
THEOREM 5  Each one of the classes C1, C2, ..., C8, consisting of all intervals of a single one of the eight types listed in C0, generates the Borel σ-field B. This follows from relations such as

∩_{n=1}^∞ (−∞, x + 1/n) = (−∞, x],
(x, ∞) = (−∞, x]ᶜ,  [x, ∞) = (−∞, x)ᶜ,
(x, y) = (−∞, y) − (−∞, x] = (−∞, y) ∩ (x, ∞),
(x, y] = (−∞, y] ∩ (x, ∞),  [x, y) = (−∞, y) ∩ [x, ∞),
[x, y] = (−∞, y] ∩ [x, ∞),

which express intervals of any one type in terms of intervals of the remaining types by means of complementation, countable intersection and difference.
Similarly, in the plane, let C0 be the class of all rectangles in ℝ²:

C0 = {all rectangles in ℝ²} = {(−∞, x) × (−∞, x′), (−∞, x) × (−∞, x′], (−∞, x] × (−∞, x′), (−∞, x] × (−∞, x′], (x, ∞) × (x′, ∞), ..., [x, ∞) × [x′, ∞), ..., (x, y) × (x′, y′), ..., [x, y] × [x′, y′]; x, y, x′, y′ ∈ ℝ, x < y, x′ < y′}.
Exercises

1.2.1 Verify that the classes defined in Examples 1, 2 and 3 on page 9 are fields.

1.2.2 Show that in the definition of a field (Definition 2), property (F3) can be replaced by (F3′) which states that if A1, A2 ∈ F, then A1 ∩ A2 ∈ F.

1.2.3 Show that in Definition 3, property (A3) can be replaced by (A3′), which states that if

A_j ∈ A, j = 1, 2, ..., then ∩_{j=1}^∞ A_j ∈ A.

1.2.4 Refer to Example 4 on σ-fields on page 10 and explain why S was taken to be uncountable.

1.2.5 Give a formal proof of the fact that the class A_A defined in Remark 1 is a σ-field.

1.2.6 Refer to Definition 1 and show that all three sets A̲, Ā and lim_n A_n, whenever it exists, belong to A, provided A_n ∈ A, n = 1, 2, ....

1.2.7 Let S = {1, 2, 3, 4} and define the class C of subsets of S as follows:

C = {∅, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {1, 2, 3}, {1, 3, 4}, {2, 3, 4}, S}.

Determine whether or not C is a field.
Chapter 2
Some Probabilistic Concepts and Results

2.1 Probability Functions and Some Basic Properties and Results
(In the terminology of Section 1.2, we require that the events associated with a sample space form a σ-field of subsets in that space.) It follows then that ∩_j A_j is also an event, and so is A1 − A2, etc. If the random experiment results in s and s ∈ A, we say that the event A occurs or happens. The union ∪_j A_j occurs if at least one of the A_j occurs, the intersection ∩_j A_j occurs if all A_j occur, A1 − A2 occurs if A1 occurs but A2 does not, etc.
The next basic quantity to be introduced here is that of a probability function (or of a probability measure).
DEFINITION 1  A probability function, denoted by P, is a (set) function which assigns to each event A a number denoted by P(A), called the probability of A, and satisfies the following requirements:

(P1) P is non-negative; that is, P(A) ≥ 0 for every event A.
(P2) P is normed; that is, P(S) = 1.
(P3) P is σ-additive; that is, for every collection of pairwise (or mutually) disjoint events A_j, j = 1, 2, ..., we have P(Σ_j A_j) = Σ_j P(A_j).

REMARK 1  If the collection of events is finite, (P3) takes the form

P(Σ_{j=1}^n A_j) = Σ_{j=1}^n P(A_j).

Actually, in such a case it is sufficient to assume that (P3′) holds for any two disjoint events; (P3′) follows then from this assumption by induction.

The following consequences of Definition 1 are immediate.

(C1) P(∅) = 0. In fact,

P(S) = P(S + ∅ + ∅ + ···) = P(S) + P(∅) + P(∅) + ···,

or

1 = 1 + P(∅) + P(∅) + ···

and

P(∅) = 0, since P(∅) ≥ 0 by (P1).
(C2) For any finite collection of pairwise disjoint events,

P(Σ_{j=1}^n A_j) = Σ_{j=1}^n P(A_j).

Indeed,

P(Σ_{j=1}^n A_j) = P(A_1 + ··· + A_n + ∅ + ∅ + ···) = Σ_{j=1}^n P(A_j) + P(∅) + ··· = Σ_{j=1}^n P(A_j).

(C3) For every event A, P(Aᶜ) = 1 − P(A). In fact, A + Aᶜ = S, so that

P(A + Aᶜ) = P(S), or P(A) + P(Aᶜ) = 1.

(C4) P is non-decreasing; that is, A1 ⊆ A2 implies P(A1) ≤ P(A2). In fact,

A2 = A1 + (A2 − A1),

hence

P(A2) = P(A1) + P(A2 − A1) ≥ P(A1).

If A1 ⊆ A2, then P(A2 − A1) = P(A2) − P(A1), but this is not true, in general.

(C5) 0 ≤ P(A) ≤ 1 for every event A. This follows from (C1), (P2), and (C4).

(C6) For any events A1, A2, P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2). In fact,

A1 ∪ A2 = A1 + (A2 − A1 ∩ A2).

Hence

P(A1 ∪ A2) = P(A1) + P(A2 − A1 ∩ A2) = P(A1) + P(A2) − P(A1 ∩ A2),

since A1 ∩ A2 ⊆ A2 implies

P(A2 − A1 ∩ A2) = P(A2) − P(A1 ∩ A2).

(C7) P is subadditive; that is,

P(∪_{j=1}^∞ A_j) ≤ Σ_{j=1}^∞ P(A_j)

and also

P(∪_{j=1}^n A_j) ≤ Σ_{j=1}^n P(A_j).
This part of the section is closed with the theorem below, which generalizes (C6) to any finite number of events.

THEOREM 1  For any events A_j, j = 1, 2, ..., n,

P(∪_{j=1}^n A_j) = Σ_{j=1}^n P(A_j) − Σ_{1≤j1<j2≤n} P(A_{j1} ∩ A_{j2}) + Σ_{1≤j1<j2<j3≤n} P(A_{j1} ∩ A_{j2} ∩ A_{j3}) − ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n).

PROOF  The proof is by induction on n. For n = 2 the assertion is (C6). Assume it to be true for n = k. We have

P(∪_{j=1}^{k+1} A_j) = P[(∪_{j=1}^k A_j) ∪ A_{k+1}]
= P(∪_{j=1}^k A_j) + P(A_{k+1}) − P[(∪_{j=1}^k A_j) ∩ A_{k+1}]
= P(∪_{j=1}^k A_j) + P(A_{k+1}) − P[∪_{j=1}^k (A_j ∩ A_{k+1})].   (1)

By the induction hypothesis,

P(∪_{j=1}^k A_j) = Σ_{j=1}^k P(A_j) − Σ_{1≤j1<j2≤k} P(A_{j1} ∩ A_{j2}) + ··· + (−1)^{k+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_k),

and, applied to the k events A_j ∩ A_{k+1}, j = 1, ..., k,

P[∪_{j=1}^k (A_j ∩ A_{k+1})] = Σ_{j=1}^k P(A_j ∩ A_{k+1}) − Σ_{1≤j1<j2≤k} P(A_{j1} ∩ A_{j2} ∩ A_{k+1}) + ··· + (−1)^{k+1} P(A_1 ∩ ··· ∩ A_k ∩ A_{k+1}).

Substituting these two expressions into (1) and grouping together the sums extending over intersections of the same number of events, we obtain

P(∪_{j=1}^{k+1} A_j) = Σ_{j=1}^{k+1} P(A_j) − Σ_{1≤j1<j2≤k+1} P(A_{j1} ∩ A_{j2}) + Σ_{1≤j1<j2<j3≤k+1} P(A_{j1} ∩ A_{j2} ∩ A_{j3}) − ··· + (−1)^{k+2} P(A_1 ∩ ··· ∩ A_{k+1}),

which is the assertion for n = k + 1.

THEOREM 2  If A_n, n = 1, 2, ..., is a monotone sequence of events, then

P(lim_n A_n) = lim_n P(A_n).

PROOF  Suppose first that A_n ↑. We recall that then

lim_n A_n = ∪_{j=1}^∞ A_j,

and that

∪_{j=1}^∞ A_j = A_1 + (A_2 − A_1) + (A_3 − A_2) + ···,

by the assumption that A_n ↑. Hence

P(lim_n A_n) = P(∪_{j=1}^∞ A_j) = P(A_1) + P(A_2 − A_1) + P(A_3 − A_2) + ··· + P(A_n − A_{n−1}) + ···
= lim_n [P(A_1) + P(A_2 − A_1) + ··· + P(A_n − A_{n−1})]
= lim_n [P(A_1) + P(A_2) − P(A_1) + P(A_3) − P(A_2) + ··· + P(A_n) − P(A_{n−1})]
= lim_n P(A_n).

Thus

P(lim_n A_n) = lim_n P(A_n).

Now suppose that A_n ↓. Then A_nᶜ ↑, so that, by the part just established,

P(∪_{j=1}^∞ A_jᶜ) = lim_n P(A_nᶜ).

Hence

P[(∩_{j=1}^∞ A_j)ᶜ] = lim_n [1 − P(A_n)], or equivalently, 1 − P(∩_{j=1}^∞ A_j) = 1 − lim_n P(A_n).

Thus

lim_n P(A_n) = P(∩_{j=1}^∞ A_j) = P(lim_n A_n),

as was to be seen.
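The inclusion-exclusion identity of Theorem 1 is easy to test numerically. Here is a minimal Python sketch, under the uniform probability on a small finite sample space; the concrete sample space and events are arbitrary choices made for the illustration.

```python
from itertools import combinations
from fractions import Fraction

# Uniform probability on a finite sample space; the events are arbitrary examples.
S = set(range(12))
events = [{0, 1, 2, 3}, {2, 3, 4, 5, 6}, {5, 6, 7}, {0, 7, 8, 9}]

def P(A):
    return Fraction(len(A), len(S))

# Left side: P(A_1 U ... U A_n).
lhs = P(set().union(*events))

# Right side: alternating sum over all non-empty index subsets.
rhs = Fraction(0)
n = len(events)
for r in range(1, n + 1):
    for idx in combinations(range(n), r):
        inter = set.intersection(*[events[j] for j in idx])
        rhs += (-1) ** (r + 1) * P(inter)

assert lhs == rhs
print("inclusion-exclusion checks:", lhs, "=", rhs)
```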
Exercises

2.1.1 If the events A_j, j = 1, 2, 3 are such that A_1 ⊆ A_2 ⊆ A_3 and P(A_1) = 1/4, P(A_2) = 5/12, P(A_3) = 7/12, compute the probability of the following events:

2.1.2 If two fair dice are rolled once, what is the probability that the total number of spots shown is

i) Equal to 5?
ii) Divisible by 3?

2.1.3 Twenty balls numbered from 1 to 20 are mixed in an urn and two balls are drawn successively and without replacement. If x_1 and x_2 are the numbers written on the first and second ball drawn, respectively, what is the probability that:

i) x_1 + x_2 = 8?
ii) x_1 + x_2 ≤ 5?

2.1.4 Define the events A, B, C as follows:

A = {x ∈ S; x is divisible by 7},
B = {x ∈ S; x = 3n + 10 for some positive integer n},
C = {x ∈ S; x² + 1 ≤ 375}.

Compute P(A), P(B), P(C), where P is the equally likely probability function on the events of S.

2.1.5 Let S be the set of all outcomes when flipping a fair coin four times and let P be the uniform probability function on the events of S. Define the events A, B as follows:

A = {s ∈ S; s contains more T's than H's},
B = {s ∈ S; any T in s precedes every H in s}.

Compute the probabilities P(A), P(B).
2.2 Conditional Probability

DEFINITION 2  Let A be an event such that P(A) > 0. Then the conditional probability, given A, is the (set) function denoted by P(·|A) and defined for every event B as follows:

P(B|A) = P(A ∩ B)/P(A).

P(B|A) is called the conditional probability of B, given A.

The set function P(·|A) is itself a probability function. In fact, P(B|A) ≥ 0 for every event B,

P(S|A) = P(S ∩ A)/P(A) = P(A)/P(A) = 1,

and, if B_j, j = 1, 2, ... are pairwise disjoint events, then

P(Σ_{j=1}^∞ B_j | A) = P[(Σ_{j=1}^∞ B_j) ∩ A]/P(A) = P[Σ_{j=1}^∞ (B_j ∩ A)]/P(A)
= [1/P(A)] Σ_{j=1}^∞ P(B_j ∩ A) = Σ_{j=1}^∞ P(B_j|A).

The conditional probability can be used in expressing the probability of the intersection of a finite number of events.

THEOREM 3  (Multiplicative Theorem.) Let A_j, j = 1, 2, ..., n be events such that P(A_1 ∩ ··· ∩ A_{n−1}) > 0. Then

P(∩_{j=1}^n A_j) = P(A_n | A_1 ∩ A_2 ∩ ··· ∩ A_{n−1}) P(A_{n−1} | A_1 ∩ ··· ∩ A_{n−2}) ··· P(A_2|A_1) P(A_1).
EXAMPLE 1  An urn contains 10 identical balls (except for color) of which five are black, three are red and two are white. Four balls are drawn without replacement. Find the probability that the first ball is black, the second red, the third white and the fourth black.

Let A_1 be the event that the first ball is black, A_2 be the event that the second ball is red, A_3 be the event that the third ball is white and A_4 be the event that the fourth ball is black. Then

P(A_1 ∩ A_2 ∩ A_3 ∩ A_4) = P(A_4 | A_1 ∩ A_2 ∩ A_3) P(A_3 | A_1 ∩ A_2) P(A_2 | A_1) P(A_1),

and by using the uniform probability function, we have

P(A_1) = 5/10,  P(A_2|A_1) = 3/9,  P(A_3|A_1 ∩ A_2) = 2/8,  P(A_4|A_1 ∩ A_2 ∩ A_3) = 4/7.

Thus the required probability is equal to 1/42 ≈ 0.0238.
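Readers who like to confirm such computations empirically may find the following minimal Python sketch useful; it checks the value 1/42 both exactly, via the multiplicative theorem, and approximately, by simulating the four draws (the urn encoding and names are our own illustrative choices).

```python
import random
from fractions import Fraction

# Exact value via the multiplicative theorem:
# P(A1) P(A2|A1) P(A3|A1&A2) P(A4|A1&A2&A3)
exact = Fraction(5, 10) * Fraction(3, 9) * Fraction(2, 8) * Fraction(4, 7)
print("exact:", exact, float(exact))   # 1/42 ~ 0.0238

# Monte Carlo check: draw 4 balls without replacement from the urn.
urn = ["B"] * 5 + ["R"] * 3 + ["W"] * 2
trials = 200_000
hits = sum(random.sample(urn, 4) == ["B", "R", "W", "B"] for _ in range(trials))
print("simulated:", hits / trials)
```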
It is often the case that an event B can be written as a sum of its intersections with the members of a partition {A_j} of S:

B = Σ_j (B ∩ A_j).

Hence

P(B) = Σ_j P(B ∩ A_j) = Σ_j P(B|A_j) P(A_j),

provided P(A_j) > 0, for all j. Thus we have the following theorem.

THEOREM 4  (Total Probability Theorem.) Let {A_j, j = 1, 2, ...} be a partition of S with P(A_j) > 0 for all j. Then, for any event B,

P(B) = Σ_j P(B|A_j) P(A_j).

Now, if P(B) > 0, then for any j,

P(A_j|B) = P(A_j ∩ B)/P(B) = P(B|A_j) P(A_j)/P(B) = P(B|A_j) P(A_j) / Σ_i P(B|A_i) P(A_i).

Thus

THEOREM 5  (Bayes Formula.) If {A_j, j = 1, 2, ...} is a partition of S with P(A_j) > 0 for all j, and if P(B) > 0, then

P(A_j|B) = P(B|A_j) P(A_j) / Σ_i P(B|A_i) P(A_i).
EXAMPLE 2  A multiple choice test question lists five alternative answers, of which only one is correct. If a student has done the homework, then he/she is certain to identify the correct answer; otherwise he/she chooses an answer at random. Let p denote the probability of the event A that the student does the homework and let B be the event that he/she answers the question correctly. Find the expression of the conditional probability P(A|B) in terms of p.

By noting that A and Aᶜ form a partition of the appropriate sample space, an application of Theorems 4 and 5 gives

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Aᶜ) P(Aᶜ)] = 1 · p / [1 · p + (1/5)(1 − p)] = 5p/(4p + 1).
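As a quick numerical illustration of the formula just derived (the snippet and its names are ours), the Python sketch below evaluates P(A|B) directly from Bayes' formula for a few values of p, showing how observing a correct answer raises the probability that the homework was done.

```python
# Bayes' formula for the multiple-choice example: P(B|A) = 1, P(B|A^c) = 1/5.
def posterior(p):
    return (1.0 * p) / (1.0 * p + 0.2 * (1.0 - p))

for p in (0.1, 0.3, 0.5, 0.9):
    closed_form = 5 * p / (4 * p + 1)   # the expression derived in the text
    assert abs(posterior(p) - closed_form) < 1e-12
    print(f"p = {p:.1f}  ->  P(A|B) = {posterior(p):.4f}")
```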
Next, let {A_i, i = 1, 2, ...} and {B_j, j = 1, 2, ...} be two partitions of S. Then

A_i = Σ_j (A_i ∩ B_j), i = 1, 2, ...,  B_j = Σ_i (A_i ∩ B_j), j = 1, 2, ...,

and

{A_i ∩ B_j, i = 1, 2, ..., j = 1, 2, ...}

is a partition of S. In fact,

(A_i ∩ B_j) ∩ (A_{i′} ∩ B_{j′}) = ∅ if (i, j) ≠ (i′, j′),

and

Σ_{i,j} (A_i ∩ B_j) = Σ_i A_i = S.

The expression P(A_i ∩ B_j) is called the joint probability of A_i and B_j. On the other hand, from

A_i = Σ_j (A_i ∩ B_j) and B_j = Σ_i (A_i ∩ B_j),

we get

P(A_i) = Σ_j P(A_i ∩ B_j) = Σ_j P(A_i|B_j) P(B_j),

provided P(B_j) > 0 for all j, and

P(B_j) = Σ_i P(A_i ∩ B_j) = Σ_i P(B_j|A_i) P(A_i),

provided P(A_i) > 0 for all i.
Exercises

2.2.1 If P(A|B) > P(A), then show that P(B|A) > P(B) (P(A)P(B) > 0).

in terms of n and p. Next show that if p is fixed but different from 0 and 1, then P_n(A|B) increases as n increases. Does this result seem reasonable?

2.2.6 If A_j, j = 1, 2, 3 are any events in S, show that {A_1, A_1ᶜ ∩ A_2, A_1ᶜ ∩ A_2ᶜ ∩ A_3, (A_1 ∪ A_2 ∪ A_3)ᶜ} is a partition of S.

2.2.7 Let {A_j, j = 1, ..., 5} be a partition of S and suppose that P(A_j) = j/15 and P(A|A_j) = (5 − j)/15, j = 1, ..., 5. Compute the probabilities P(A_j|A), j = 1, ..., 5.
2.2.8 A girls' club has on its membership rolls the names of 50 girls with the following descriptions:

20 blondes, 15 with blue eyes and 5 with brown eyes;
25 brunettes, 5 with blue eyes and 20 with brown eyes;
5 redheads, 1 with blue eyes and 4 with green eyes.

If one arranges a blind date with a club member, what is the probability that:

i) The girl is blonde?
ii) The girl is blonde, if it was only revealed that she has blue eyes?

2.2.9 Suppose that the probability that both of a pair of twins are boys is 0.30 and that the probability that they are both girls is 0.26. Given that the probability of a child being a boy is 0.52, what is the probability that:

i) The second twin is a boy, given that the first is a boy?
ii) The second twin is a girl, given that the first is a girl?

2.2.10 Three machines I, II and III manufacture 30%, 30% and 40%, respectively, of the total output of certain items. Of them, 4%, 3% and 2%, respectively, are defective. One item is drawn at random, tested and found to be defective. What is the probability that the item was manufactured by each one of the machines I, II and III?

2.2.11 A shipment of 20 TV tubes contains 16 good tubes and 4 defective tubes. Three tubes are chosen at random and tested successively. What is the probability that:

i) The third tube is good, if the first two were found to be good?
ii) The third tube is defective, if one of the other two was found to be good and the other one was found to be defective?

2.2.12 Suppose that a test for diagnosing a certain heart disease is 95% accurate when applied to both those who have the disease and those who do not. If it is known that 5 of 1,000 in a certain population have the disease in question, compute the probability that a patient actually has the disease if the test indicates that he does. (Interpret the answer by intuitive reasoning.)

2.2.13 Consider two urns U_j, j = 1, 2, such that urn U_j contains m_j white balls and n_j black balls. A ball is drawn at random from each one of the two urns and is placed into a third urn. Then a ball is drawn at random from the third urn. Compute the probability that the ball is black.

2.2.14 Consider the urns of Exercise 2.2.13. A balanced die is rolled and if an even number appears, a ball, chosen at random from U_1, is transferred to urn U_2. If an odd number appears, a ball, chosen at random from urn U_2, is transferred to urn U_1. What is the probability that, after the above experiment is performed twice, the number of white balls in the urn U_2 remains the same?

2.2.15 Consider three urns U_j, j = 1, 2, 3 such that urn U_j contains m_j white balls and n_j black balls. A ball, chosen at random, is transferred from urn U_1 to urn U_2 (color unnoticed), and then a ball, chosen at random, is transferred from urn U_2 to urn U_3 (color unnoticed). Finally, a ball is drawn at random from urn U_3. What is the probability that the ball is white?

2.2.16 Consider the urns of Exercise 2.2.15. One urn is chosen at random and one ball is drawn from it also at random. If the ball drawn was white, what is the probability that the urn chosen was urn U_1 or U_2?

2.2.17 Consider six urns U_j, j = 1, ..., 6 such that urn U_j contains m_j (≥ 2) white balls and n_j (≥ 2) black balls. A balanced die is tossed once and if the number j appears on the die, two balls are selected at random from urn U_j. Compute the probability that one ball is white and one ball is black.

2.2.18 Consider k urns U_j, j = 1, ..., k each of which contains m white balls and n black balls. A ball is drawn at random from urn U_1 and is placed in urn U_2. Then a ball is drawn at random from urn U_2 and is placed in urn U_3 etc. Finally, a ball is chosen at random from urn U_{k−1} and is placed in urn U_k. A ball is then drawn at random from urn U_k. Compute the probability that this last ball is black.
2.3 Independence

For any events A, B with P(A) > 0, we defined P(B|A) = P(A ∩ B)/P(A). Now P(B|A) may be > P(B), < P(B), or = P(B). As an illustration, consider an urn containing 10 balls, seven of which are red, the remaining three being black. Except for color, the balls are identical. Suppose that two balls are drawn successively and without replacement. Then (assuming throughout the uniform probability function) the conditional probability that the second ball is red, given that the first ball was red, is 6/9, whereas the conditional probability that the second ball is red, given that the first was black, is 7/9. Without any knowledge regarding the first ball, the probability that the second ball is red is 7/10. On the other hand, if the balls are drawn with replacement, the probability that the second ball is red, given that the first ball was red, is 7/10. This probability is the same even if the first ball was black. In other words, knowledge of the event which occurred in the first drawing provides no additional information in calculating the probability of the event that the second ball is red. Events like these are said to be independent.

As another example, revisit the two-children families example considered earlier, and define the events A and B as follows: A = "children of both genders," B = "older child is a boy." Then P(A) = P(B) = P(B|A) = 1/2. Again knowledge of the event A provides no additional information in calculating the probability of the event B. Thus A and B are independent.

More generally, let A, B be events with P(A) > 0. Then if P(B|A) = P(B), we say that the event B is (statistically or stochastically or in the probability sense) independent of the event A. If P(B) is also > 0, then it is easily seen that A is also independent of B. In fact,

P(A|B) = P(A ∩ B)/P(B) = P(B|A) P(A)/P(B) = P(B) P(A)/P(B) = P(A).

That is, if P(A), P(B) > 0, and one of the events is independent of the other, then this second event is also independent of the first. Thus, independence is a symmetric relation, and we may simply say that A and B are independent. In this case P(A ∩ B) = P(A)P(B) and we may take this relationship as the definition of independence of A and B. That is,
DEFINITION 3  The events A, B are said to be (statistically or stochastically or in the probability sense) independent if P(A ∩ B) = P(A)P(B).

Notice that this relationship is true even if one or both of P(A), P(B) = 0.

As was pointed out in connection with the examples discussed above, independence of two events simply means that knowledge of the occurrence of one of them helps in no way in re-evaluating the probability that the other event happens. This is true for any two independent events A and B, as follows from the equation P(A|B) = P(A), provided P(B) > 0, or P(B|A) = P(B), provided P(A) > 0. Events which are intuitively independent arise, for example, in connection with the descriptive experiments of successively drawing balls with replacement from the same urn with always the same content, or drawing cards with replacement from the same deck of playing cards, or repeatedly tossing the same or different coins, etc.

What actually happens in practice is to consider events which are independent in the intuitive sense, and then define the probability function P appropriately to reflect this independence.

The definition of independence generalizes to any finite number of events. Thus:

DEFINITION 4  The events A_j, j = 1, 2, ..., n are said to be (mutually or completely) independent if the following relationships hold:

P(A_{j1} ∩ ··· ∩ A_{jk}) = P(A_{j1}) ··· P(A_{jk})

for any k = 2, ..., n and any j_1, ..., j_k with 1 ≤ j_1 < j_2 < ··· < j_k ≤ n.
These relationships are

(n choose 2) + (n choose 3) + ··· + (n choose n) = 2ⁿ − (n choose 1) − (n choose 0) = 2ⁿ − n − 1

in number; they are the relationships characterizing the independence of A_j, j = 1, ..., n and they are all necessary. For example, for n = 3 we will have:

P(A_1 ∩ A_2 ∩ A_3) = P(A_1) P(A_2) P(A_3),
P(A_1 ∩ A_2) = P(A_1) P(A_2),
P(A_1 ∩ A_3) = P(A_1) P(A_3),
P(A_2 ∩ A_3) = P(A_2) P(A_3).

That these four relations are necessary for the characterization of independence of A_1, A_2, A_3 is illustrated by the following examples:

Let S = {1, 2, 3, 4}, P({1}) = ··· = P({4}) = 1/4, and set A_1 = {1, 2}, A_2 = {1, 3}, A_3 = {1, 4}. Then

A_1 ∩ A_2 = A_1 ∩ A_3 = A_2 ∩ A_3 = {1}, and A_1 ∩ A_2 ∩ A_3 = {1}.

Thus

P(A_1 ∩ A_2) = P(A_1 ∩ A_3) = P(A_2 ∩ A_3) = P(A_1 ∩ A_2 ∩ A_3) = 1/4.

Next,

P(A_1 ∩ A_2) = 1/4 = (1/2)(1/2) = P(A_1) P(A_2),
P(A_1 ∩ A_3) = 1/4 = (1/2)(1/2) = P(A_1) P(A_3),
P(A_2 ∩ A_3) = 1/4 = (1/2)(1/2) = P(A_2) P(A_3),

but

P(A_1 ∩ A_2 ∩ A_3) = 1/4 ≠ (1/2)(1/2)(1/2) = P(A_1) P(A_2) P(A_3).

Next, let S = {1, 2, 3, 4, 5}, and define the probabilities of the simple events by

P({1}) = 1/8, P({2}) = P({3}) = P({4}) = 3/16, P({5}) = 5/16.
Let

A_1 = {1, 2, 3}, A_2 = {1, 2, 4}, A_3 = {1, 3, 4}.

Then

A_1 ∩ A_2 = {1, 2}, A_1 ∩ A_2 ∩ A_3 = {1}.

Thus

P(A_1 ∩ A_2 ∩ A_3) = 1/8 = (1/2)(1/2)(1/2) = P(A_1) P(A_2) P(A_3),

but

P(A_1 ∩ A_2) = 5/16 ≠ (1/2)(1/2) = P(A_1) P(A_2).
THEOREM 6  If the events A_1, ..., A_n are independent, so are the events A_1′, ..., A_n′, where A_j′ is either A_j or A_jᶜ, j = 1, ..., n.

PROOF  The proof is done by (a double) induction. For n = 2, we have to show that P(A_1′ ∩ A_2′) = P(A_1′) P(A_2′). Indeed, let A_1′ = A_1 and A_2′ = A_2ᶜ. Then

P(A_1 ∩ A_2ᶜ) = P[A_1 ∩ (S − A_2)] = P(A_1 − A_1 ∩ A_2) = P(A_1) − P(A_1 ∩ A_2)
= P(A_1) − P(A_1) P(A_2) = P(A_1)[1 − P(A_2)] = P(A_1) P(A_2ᶜ).

Similarly if A_1′ = A_1ᶜ and A_2′ = A_2. For A_1′ = A_1ᶜ and A_2′ = A_2ᶜ,

P(A_1ᶜ ∩ A_2ᶜ) = P[(S − A_1) ∩ A_2ᶜ] = P(A_2ᶜ − A_1 ∩ A_2ᶜ) = P(A_2ᶜ) − P(A_1 ∩ A_2ᶜ)
= P(A_2ᶜ) − P(A_1) P(A_2ᶜ) = P(A_2ᶜ)[1 − P(A_1)] = P(A_2ᶜ) P(A_1ᶜ).

Next, assume the assertion to be true for k events and show it to be true for k + 1 events. That is, we suppose that P(A_1′ ∩ ··· ∩ A_k′) = P(A_1′) ··· P(A_k′), and we shall show that P(A_1′ ∩ ··· ∩ A_{k+1}′) = P(A_1′) ··· P(A_{k+1}′). First, assume that A_{k+1}′ = A_{k+1}, and we have to show that

P(A_1′ ∩ ··· ∩ A_k′ ∩ A_{k+1}) = P(A_1′) ··· P(A_k′) P(A_{k+1}).

This relationship is established by an internal induction on the number ℓ of the A_j′ which are complements. For ℓ = 1, taking A_1′ = A_1ᶜ (and A_j′ = A_j for j = 2, ..., k),

P(A_1ᶜ ∩ A_2 ∩ ··· ∩ A_k ∩ A_{k+1}) = P(A_2 ∩ ··· ∩ A_k ∩ A_{k+1}) − P(A_1 ∩ A_2 ∩ ··· ∩ A_k ∩ A_{k+1})
= P(A_2) ··· P(A_k) P(A_{k+1}) − P(A_1) P(A_2) ··· P(A_k) P(A_{k+1})
= P(A_2) ··· P(A_k) P(A_{k+1})[1 − P(A_1)] = P(A_1ᶜ) P(A_2) ··· P(A_k) P(A_{k+1}).
This is, clearly, true if A_1ᶜ is replaced by any other A_iᶜ, i = 2, ..., k. Now, for ℓ < k, assume that

P(A_1ᶜ ∩ ··· ∩ A_ℓᶜ ∩ A_{ℓ+1} ∩ ··· ∩ A_{k+1}) = P(A_1ᶜ) ··· P(A_ℓᶜ) P(A_{ℓ+1}) ··· P(A_{k+1}),

and show that the same relation holds with ℓ + 1 complemented events. Indeed,

P(A_1ᶜ ∩ ··· ∩ A_ℓᶜ ∩ A_{ℓ+1}ᶜ ∩ A_{ℓ+2} ∩ ··· ∩ A_{k+1})
= P(A_1ᶜ ∩ ··· ∩ A_ℓᶜ ∩ A_{ℓ+2} ∩ ··· ∩ A_{k+1}) − P(A_1ᶜ ∩ ··· ∩ A_ℓᶜ ∩ A_{ℓ+1} ∩ A_{ℓ+2} ∩ ··· ∩ A_{k+1})
= P(A_1ᶜ) ··· P(A_ℓᶜ) P(A_{ℓ+2}) ··· P(A_{k+1}) − P(A_1ᶜ) ··· P(A_ℓᶜ) P(A_{ℓ+1}) P(A_{ℓ+2}) ··· P(A_{k+1})
= P(A_1ᶜ) ··· P(A_ℓᶜ) P(A_{ℓ+2}) ··· P(A_{k+1})[1 − P(A_{ℓ+1})]
= P(A_1ᶜ) ··· P(A_ℓᶜ) P(A_{ℓ+1}ᶜ) P(A_{ℓ+2}) ··· P(A_{k+1}),

as was to be seen. It is also, clearly, true that the same result holds if the A_i's which are A_iᶜ are chosen in any one of the (k choose ℓ) possible ways of choosing ℓ out of k. Thus, we have shown that

P(A_1′ ∩ ··· ∩ A_k′ ∩ A_{k+1}) = P(A_1′) ··· P(A_k′) P(A_{k+1}).

Finally, under the assumption that

P(A_1′ ∩ ··· ∩ A_k′) = P(A_1′) ··· P(A_k′),

take A_{k+1}′ = A_{k+1}ᶜ, and show that

P(A_1′ ∩ ··· ∩ A_k′ ∩ A_{k+1}ᶜ) = P(A_1′) ··· P(A_k′) P(A_{k+1}ᶜ).
In fact,

P(A_1′ ∩ ··· ∩ A_k′ ∩ A_{k+1}ᶜ) = P[A_1′ ∩ ··· ∩ A_k′ ∩ (S − A_{k+1})]
= P(A_1′ ∩ ··· ∩ A_k′) − P(A_1′ ∩ ··· ∩ A_k′ ∩ A_{k+1})
= P(A_1′) ··· P(A_k′) − P(A_1′) ··· P(A_k′) P(A_{k+1})
(by the induction hypothesis and what was last proved)
= P(A_1′) ··· P(A_k′)[1 − P(A_{k+1})]
= P(A_1′) ··· P(A_k′) P(A_{k+1}ᶜ).
The notion of independence also carries over to experiments. Consider n experiments E_1, E_2, ..., E_n with sample spaces S_1, ..., S_n, and let the compound experiment (E_1, E_2, ..., E_n) = E_1 × E_2 × ··· × E_n have the sample space

S = S_1 × ··· × S_n = {(s_1, ..., s_n); s_j ∈ S_j, j = 1, 2, ..., n}.

The class of events consists of subsets of S, and the experiments are said to be independent if for all events B_j associated with experiment E_j alone, j = 1, 2, ..., n, it holds

P(B_1 ∩ ··· ∩ B_n) = P(B_1) ··· P(B_n).
Exercises

2.3.1 If A and B are disjoint events, then show that A and B are independent if and only if at least one of P(A), P(B) is zero.

2.3.2 Show that if the event A is independent of itself, then P(A) = 0 or 1.

2.3.3 If A, B are independent, A, C are independent and B ∩ C = ∅, then A, B + C are independent. Show, by means of a counterexample, that the conclusion need not be true if B ∩ C ≠ ∅.

2.3.4 For each j = 1, ..., n, suppose that the events A_1, ..., A_m, B_j are independent and that B_i ∩ B_j = ∅, i ≠ j. Then show that the events A_1, ..., A_m, Σ_{j=1}^n B_j are independent.

2.3.5 If A_j, j = 1, ..., n are independent events, show that

P(∪_{j=1}^n A_j) = 1 − Π_{j=1}^n P(A_jᶜ).

2.3.6 Jim takes the written and road driver's license tests repeatedly until he passes them. Given that the probability that he passes the written test is 0.9 and the road test is 0.6 and that tests are independent of each other, what is the probability that he will pass both tests on his nth attempt? (Assume that the road test cannot be taken unless he passes the written test, and that once he passes the written test he does not have to take it again, no matter whether he passes or fails his next road test. Also, the written and the road tests are considered distinct attempts.)

2.3.7 The probability that a missile fired against a target is not intercepted by an antimissile missile is 2/3. Given that the missile has not been intercepted, the probability of a successful hit is 3/4. If four missiles are fired independently, what is the probability that:

i) All will successfully hit the target?
ii) At least one will do so?

How many missiles should be fired, so that:
2.4 Combinatorial Results
In this section, useful results on combinatorics are derived; they rest on the following basic principle.

THEOREM 7  (Fundamental Principle of Counting.) Suppose that a task T is completed by carrying out k subtasks, and that the jth subtask can be performed in n_j different ways, j = 1, 2, ..., k. Then the task T can be performed in n_1 n_2 ··· n_k different ways.

The following examples illustrate the principle.

i) A man has five suits, three pairs of shoes and two hats. Then the number of different ways he can attire himself is 5 · 3 · 2 = 30.
ii) Consider the set S = {1, ..., N} and suppose that we are interested in finding the number of its subsets. In forming a subset, we consider for each element whether to include it or not. Then the required number is equal to the following product of N factors: 2 · 2 ··· 2 = 2^N.
iii) Let n_j = n(S_j) be the number of points of the sample space S_j, j = 1, 2, ..., k. Then the sample space S = S_1 × ··· × S_k has n(S) = n_1 ··· n_k sample points. Or, if n_j is the number of outcomes of the experiment E_j, j = 1, 2, ..., k, then the number of outcomes of the compound experiment E_1 × ··· × E_k is n_1 ··· n_k.

In the following, we shall consider the problems of selecting balls from an urn and also placing balls into cells which serve as general models of many interesting real life problems. The main results will be formulated as theorems and their proofs will be applications of the Fundamental Principle of Counting.

Consider an urn which contains n numbered (distinct, but otherwise identical) balls. If k balls are drawn from the urn, we say that a sample of size k was drawn. The sample is ordered if the order in which the balls are drawn is taken into consideration and unordered otherwise. Then we have the following result.
THEOREM 8

i) The number of ordered samples of size k with replacement is n^k.
ii) The number of ordered samples of size k without replacement is

P_{n,k} = n(n − 1) ··· (n − k + 1),

the number of permutations of k objects out of n.
iii) The number of unordered samples of size k without replacement is

C_{n,k} = (n choose k) = P_{n,k}/k! = n!/[k!(n − k)!].

iv) The number of unordered samples of size k with replacement is

N(n, k) = (n + k − 1 choose k).

For the sake of illustration of Theorem 8, let us consider the following examples.
EXAMPLE 4

i) Form all possible three digit numbers by using the numbers 1, 2, 3, 4, 5.
ii) Find the number of all subsets of the set S = {1, ..., N}.

In part (i), clearly, the order in which the numbers are selected is relevant. Then the required number is P_{5,3} = 5 · 4 · 3 = 60 without repetitions, and 5³ = 125 with repetitions.

In part (ii) the order is, clearly, irrelevant and the required number is (N choose 0) + (N choose 1) + ··· + (N choose N) = 2^N, as already found in Example 3.
EXAMPLE 5  An urn contains 8 balls numbered 1 to 8. Four balls are drawn. What is the probability that the smallest number is 3?

Assuming the uniform probability function, the required probabilities are as follows for the respective four possible sampling cases:

Order does not count / no replacements:

(5 choose 3)/(8 choose 4) = 1/7 ≈ 0.14;

Order does not count / replacements allowed:

(6 + 3 − 1 choose 3)/(8 + 4 − 1 choose 4) = 28/165 ≈ 0.17;

Order counts / no replacements:

(5 · 4 · 3) · 4/(8 · 7 · 6 · 5) = 1/7 ≈ 0.14;

Order counts / replacements allowed:

[(4 choose 1)5³ + (4 choose 2)5² + (4 choose 3)5 + (4 choose 4)]/8⁴ = 671/4,096 ≈ 0.16.
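Each of the four probabilities can be verified by brute-force enumeration of all samples. The Python sketch below (our own illustration, using the standard itertools module) counts the samples of size 4 from {1, ..., 8} whose smallest member is 3 under each sampling scheme.

```python
from itertools import (combinations, permutations, product,
                       combinations_with_replacement)
from fractions import Fraction

def prob(samples):
    samples = list(samples)
    favorable = sum(1 for s in samples if min(s) == 3)
    return Fraction(favorable, len(samples))

balls = range(1, 9)
print("unordered, no replacement:", prob(combinations(balls, 4)))                  # 1/7
print("unordered, replacement:   ", prob(combinations_with_replacement(balls, 4))) # 28/165
print("ordered, no replacement:  ", prob(permutations(balls, 4)))                  # 1/7
print("ordered, replacement:     ", prob(product(balls, repeat=4)))                # 671/4096
```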
EXAMPLE 6  What is the probability that a poker hand will have exactly one pair?

A poker hand is a 5-subset of the set of 52 cards in a full deck, so there are

(52 choose 5) = N = 2,598,960

different poker hands. We thus let S be a set with N elements and assign the uniform probability measure to S. A poker hand with one pair has two cards of the same face value and three cards whose faces are all different among themselves and from that of the pair. We arrive at a unique poker hand with one pair by completing the following tasks in order:

a) Choose the face value of the pair from the 13 available face values. This can be done in (13 choose 1) = 13 ways.
b) Choose two cards with the face value selected in (a). This can be done in (4 choose 2) = 6 ways.
c) Choose the three face values for the other three cards in the hand. Since there are 12 face values to choose from, this can be done in (12 choose 3) = 220 ways.
d) Choose one card (from the four at hand) of each face value chosen in (c). This can be done in 4 · 4 · 4 = 4³ = 64 ways.

By the Fundamental Principle of Counting, there are 13 · 6 · 220 · 64 = 1,098,240 poker hands with exactly one pair, so that the required probability is

1,098,240/2,598,960 ≈ 0.42.
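The counting in steps (a)-(d) is easy to reproduce with math.comb; the short Python sketch below (our illustration) recomputes the count of one-pair hands and the resulting probability.

```python
from math import comb

# Steps (a)-(d): face of the pair, its two suits,
# three other faces, and one suit for each of them.
one_pair = comb(13, 1) * comb(4, 2) * comb(12, 3) * 4 ** 3
total = comb(52, 5)

print(one_pair)          # 1,098,240
print(total)             # 2,598,960
print(one_pair / total)  # ~0.4226
```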
THEOREM 9

i) The number of ways in which n distinct balls can be distributed into k distinct cells is k^n.
ii) The number of ways that n distinct balls can be distributed into k distinct cells so that the jth cell contains n_j balls (n_j ≥ 0, j = 1, ..., k, Σ_{j=1}^k n_j = n) is

(n choose n_1, n_2, ..., n_k) = n!/(n_1! n_2! ··· n_k!).

iii) The number of ways that n indistinguishable balls can be distributed into k distinct cells is

(k + n − 1 choose n).

Furthermore, the number of ways that n indistinguishable balls can be distributed into k distinct cells so that no cell is empty is

(n − 1 choose k − 1).

PROOF

ii) The n_1 balls for the first cell can be chosen in (n choose n_1) ways, the n_2 balls for the second cell in (n − n_1 choose n_2) ways, and so on; multiplying out and cancelling, the product equals n!/(n_1! n_2! ··· n_k!).
iii) We represent the k cells by the k spaces between k + 1 vertical bars and the n balls by n stars. By fixing the two extreme bars, we are left with k + n − 1 bars and stars which we may consider as k + n − 1 spaces to be filled in by a bar or a star. Then the problem is that of selecting n spaces for the n stars which can be done in (k + n − 1 choose n) ways. As for the second part, we now have the condition that there should not be two adjacent bars. The n stars create n − 1 spaces and by selecting k − 1 of them in (n − 1 choose k − 1) ways to place the k − 1 bars, the result follows.
REMARK 5

i) The numbers n_j, j = 1, ..., k in the second part of the theorem are called occupancy numbers.
ii) The answer to (ii) is also the answer to the following different question: Consider n numbered balls such that n_j are identical among themselves and distinct from all others, n_j ≥ 0, j = 1, ..., k, Σ_{j=1}^k n_j = n. Then the number of different permutations is

(n choose n_1, n_2, ..., n_k).

Now consider the following examples for the purpose of illustrating the theorem.
EXAMPLE 7  Find the probability that, in dealing a bridge hand, each player receives one ace.

The number of possible bridge hands is

N = (52 choose 13, 13, 13, 13) = 52!/(13!)⁴.

Our sample space S is a set with N elements, and we assign the uniform probability measure. Next, the number of sample points for which each player, North, South, East and West, has one ace can be found as follows:

a) Deal the four aces, one to each player. This can be done in

(4 choose 1, 1, 1, 1) = 4!/(1! 1! 1! 1!) = 4! ways.

b) Deal the remaining 48 cards, 12 to each player. This can be done in

(48 choose 12, 12, 12, 12) = 48!/(12!)⁴ ways.

Thus the required probability is 4! · [48!/(12!)⁴]/N ≈ 0.105.
EXAMPLE 8  The eleven letters of the word MISSISSIPPI are scrambled and then arranged in some order.

i) What is the probability that the four I's are consecutive letters in the resulting arrangement?

There are eight possible positions for the first I and the remaining seven letters can be arranged in (7 choose 1, 4, 2) distinct ways. Thus the required probability is

8 · (7 choose 1, 4, 2)/(11 choose 1, 4, 4, 2) = 4/165 ≈ 0.02.

ii) What is the conditional probability that the four I's are consecutive (event A), given B, where B is the event that the arrangement starts with M and ends with S?

Since there are only six positions for the first I, we clearly have

P(A|B) = 6 · (5 choose 3, 2)/(9 choose 4, 3, 2) = 1/21 ≈ 0.05.

iii) What is the conditional probability of A, given C, where C is the event that the arrangement ends with the four S's consecutive at the end?

Since there are only four positions for the first I, we have

P(A|C) = 4 · (3 choose 1, 2)/(7 choose 1, 2, 4) = 4/35 ≈ 0.11.
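The multinomial counts in this example can be confirmed directly. In the Python sketch below (our illustration; the helper name is ours), a small function evaluates multinomial coefficients with math.factorial and reproduces the three probabilities.

```python
from math import factorial
from fractions import Fraction

def multinomial(n, *parts):
    # (n choose parts), with sum(parts) == n
    assert sum(parts) == n
    out = factorial(n)
    for k in parts:
        out //= factorial(k)
    return out

total = multinomial(11, 1, 4, 4, 2)                   # arrangements of MISSISSIPPI
p_i   = Fraction(8 * multinomial(7, 1, 4, 2), total)  # four I's consecutive
p_ii  = Fraction(6 * multinomial(5, 3, 2), multinomial(9, 4, 3, 2))
p_iii = Fraction(4 * multinomial(3, 1, 2), multinomial(7, 1, 2, 4))

print(p_i, float(p_i))      # 4/165 ~ 0.024
print(p_ii, float(p_ii))    # 1/21  ~ 0.048
print(p_iii, float(p_iii))  # 4/35  ~ 0.114
```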
Exercises

2.4.1 A combination lock can be unlocked by switching it to the left and stopping at digit a, then switching it to the right and stopping at digit b and, finally, switching it to the left and stopping at digit c. If the distinct digits a, b and c are chosen from among the numbers 0, 1, ..., 9, what is the number of possible combinations?

2.4.2 How many distinct groups of n symbols in a row can be formed, if each symbol is either a dot or a dash?

2.4.3 How many different three-digit numbers can be formed by using the numbers 0, 1, ..., 9?

2.4.4 Telephone numbers consist of seven digits, three of which are grouped together, and the remaining four are also grouped together. How many numbers can be formed if:

i) No restrictions are imposed?
ii) If the first three numbers are required to be 752?

2.4.5 A certain state uses five symbols for automobile license plates such that the first two are letters and the last three numbers. How many license plates can be made, if:

i) All letters and numbers may be used?
ii) No two letters may be the same?

2.4.6 Suppose that the letters C, E, F, F, I and O are written on six chips and placed into an urn. Then the six chips are mixed and drawn one by one without replacement. What is the probability that the word OFFICE is formed?

2.4.7 The 24 volumes of the Encyclopaedia Britannica are arranged on a shelf. What is the probability that:

i) All 24 volumes appear in ascending order?
ii) All 24 volumes appear in ascending order, given that volumes 14 and 15 appeared in ascending order and that volumes 1-13 precede volume 14?

2.4.8 If n countries exchange ambassadors, how many ambassadors are involved?

2.4.9 From among n eligible draftees, m men are to be drafted so that all possible combinations are equally likely to be chosen. What is the probability that a specified man is not drafted?

2.4.10 Show that

(n + 1 choose m + 1) = [(n + 1)/(m + 1)] (n choose m).
2.4.16

i) Six fair dice are tossed once. What is the probability that all six faces appear?
ii) Seven fair dice are tossed once. What is the probability that every face appears at least once?

2.4.17 A shipment of 2,000 light bulbs contains 200 defective items and 1,800 good items. Five hundred bulbs are chosen at random, are tested and the entire shipment is rejected if more than 25 bulbs from among those tested are found to be defective. What is the probability that the shipment will be accepted?

2.4.18 Show that

(M choose m) = (M − 1 choose m) + (M − 1 choose m − 1),

where M, m are positive integers and m < M.

2.4.19 Show that

Σ_{x=0}^r (m choose x)(n choose r − x) = (m + n choose r),

where

(k choose x) = 0 if x > k.

2.4.20 Show that

i) Σ_{j=0}^n (n choose j) = 2ⁿ;  ii) Σ_{j=0}^n (−1)^j (n choose j) = 0.
iii) The committee contains the same number of students from each class;
iv) The committee contains two male students and one female student from each class;
v) The committee chairman is required to be a senior;
vi) The committee chairman is required to be both a senior and male;
vii) The chairman, the secretary and the treasurer of the committee are all required to belong to different classes.

2.4.23 Refer to Exercise 2.4.22 and suppose that the committee is formed by choosing its members at random. Compute the probability that the committee to be chosen satisfies each one of the requirements (i)-(vii).
2.4.24 A fair die is rolled independently until all faces appear at least once. What is the probability that this happens on the 20th throw?

2.4.25 Twenty letters addressed to 20 different addresses are placed at random into the 20 envelopes. What is the probability that:

i) All 20 letters go into the right envelopes?
ii) Exactly 19 letters go into the right envelopes?
iii) Exactly 17 letters go into the right envelopes?

2.4.26 Suppose that each one of the 365 days of a year is equally likely to be the birthday of each one of a given group of 73 people. What is the probability that:

i) Forty people have the same birthdays and the other 33 also have the same birthday (which is different from that of the previous group)?
ii) If a year is divided into five 73-day specified intervals, what is the probability that the birthday of: 17 people falls into the first such interval, 23 into the second, 15 into the third, 10 into the fourth and 8 into the fifth interval?

2.4.27 Suppose that each one of n sticks is broken into one long and one short part. Two parts are chosen at random. What is the probability that:

i) One part is long and one is short?
ii) Both parts are either long or short?

The 2n parts are arranged at random into n pairs from which new sticks are formed. Find the probability that:

iii) The parts are joined in the original order;
iv) All long parts are paired with short parts.

2.4.29 Three cards are drawn at random and with replacement from a standard deck of 52 playing cards. Compute the probabilities P(A_j), j = 1, ..., 5, where the events A_j, j = 1, ..., 5 are defined as follows:
{s ∈ S; at least 2 cards in s are red},
{s ∈ S; exactly 1 card in s is an ace},
{s ∈ S; the first card in s is a diamond, ...}.

2.4.32 An urn contains n_R red balls, n_B black balls and n_W white balls. r balls are chosen at random and with replacement. Find the probability that:

2.4.33 Refer to Exercise 2.4.32 and discuss the questions (i)-(iii) for r = 3 and r_1 = r_2 = r_3 (= 1), if the balls are drawn at random but without replacement.

2.4.34 Suppose that all 13-card hands are equally likely when a standard deck of 52 playing cards is dealt to 4 people. Compute the probabilities P(A_j), j = 1, ..., 8, where among the events A_j, j = 1, ..., 8 are the following:

{s ∈ S; s consists only of diamonds},
{s ∈ S; s consists of 5 diamonds, 3 hearts, 2 clubs and 3 spades},
{s ∈ S; s consists of cards of exactly 2 suits},
{s ∈ S; s contains at least 2 aces},
{s ∈ S; s does not contain aces, tens and jacks},
{s ∈ S; s consists of cards of all different denominations}.
2.5* Product Probability Spaces

Consider two probability spaces (S_1, A_1, P_1) and (S_2, A_2, P_2), form the Cartesian product S = S_1 × S_2 of the sample spaces, and consider the class

C = {A_1 × A_2; A_1 ∈ A_1, A_2 ∈ A_2}, where A_1 × A_2 = {(s_1, s_2); s_1 ∈ A_1, s_2 ∈ A_2}.

More generally, for n probability spaces (S_j, A_j, P_j), j = 1, 2, ..., n, set

S = S_1 × ··· × S_n = {(s_1, ..., s_n); s_j ∈ S_j, j = 1, 2, ..., n},

C = {A_1 × ··· × A_n; A_j ∈ A_j, j = 1, 2, ..., n},

let A be the σ-field generated by C, and let P be the unique probability measure defined on A through the relationships

P(A_1 × ··· × A_n) = P_1(A_1) ··· P_n(A_n), A_j ∈ A_j, j = 1, 2, ..., n.

The probability measure P is usually denoted by P_1 × ··· × P_n and is called the product probability measure (with factors P_j, j = 1, 2, ..., n), and the probability space (S, A, P) is called the product probability space (with factors (S_j, A_j, P_j), j = 1, 2, ..., n). Then the experiments E_j, j = 1, 2, ..., n are said to be independent if P(B_1 ∩ ··· ∩ B_n) = P(B_1) ··· P(B_n), where B_j is defined by

B_j = S_1 × ··· × S_{j−1} × A_j × S_{j+1} × ··· × S_n, j = 1, 2, ..., n.

The definition of independent events carries over to σ-fields as follows. Let A_1, A_2 be two sub-σ-fields of A. We say that A_1, A_2 are independent if P(A_1 ∩ A_2) = P(A_1)P(A_2) for any A_1 ∈ A_1, A_2 ∈ A_2. More generally, the σ-fields A_j, j = 1, 2, ..., n (sub-σ-fields of A) are said to be independent if

P(∩_{j=1}^n A_j) = Π_{j=1}^n P(A_j) for any A_j ∈ A_j, j = 1, 2, ..., n.
Exercises

2.5.1 Form the Cartesian products A × B, A × C, B × C, A × B × C, where A = {stop, go}, B = {good, defective}, C = {(1, 1), (1, 2), (2, 2)}.

2.5.2 Show that A × B = ∅ if and only if at least one of the sets A, B is ∅.

2.5.3 If A ⊆ B, show that A × C ⊆ B × C for any set C.

2.5.4 Show that

i) (A × B)ᶜ = (A × Bᶜ) + (Aᶜ × B) + (Aᶜ × Bᶜ);
ii) (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D);
iii) (A × B) ∪ (C × D) = (A ∪ C) × (B ∪ D) − [(A ∩ Cᶜ) × (Bᶜ ∩ D) + (Aᶜ ∩ C) × (B ∩ Dᶜ)].
2.6* The Probability of Matchings

Let A_1, ..., A_M be events, and set

S_0 = 1,
S_1 = Σ_{j=1}^M P(A_j),
S_2 = Σ_{1≤j1<j2≤M} P(A_{j1} ∩ A_{j2}),
...,
S_r = Σ_{1≤j1<···<jr≤M} P(A_{j1} ∩ A_{j2} ∩ ··· ∩ A_{jr}),
...,
S_M = P(A_1 ∩ A_2 ∩ ··· ∩ A_M).

Let also

B_m = "exactly m of the events A_1, ..., A_M occur,"
C_m = "at least m of the events A_1, ..., A_M occur,"
D_m = "at most m of the events A_1, ..., A_M occur."

Then we have
THEOREM 10  With S_r, B_m, C_m and D_m defined as above,

P(B_m) = S_m − (m + 1 choose m) S_{m+1} + (m + 2 choose m) S_{m+2} − ··· + (−1)^{M−m} (M choose m) S_M,   (2)

which for m = 0 is

P(B_0) = S_0 − S_1 + S_2 − ··· + (−1)^M S_M;   (3)

and

P(C_m) = P(B_m) + P(B_{m+1}) + ··· + P(B_M),   (4)

and

P(D_m) = P(B_0) + P(B_1) + ··· + P(B_m).   (5)

For the proof of this theorem, all that one has to establish is (2), since (4) and (5) follow from it. This will be done in Section 5.6 of Chapter 5. For a proof where S is discrete the reader is referred to the book An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed., 1968, by W. Feller, pp. 99-100.
The following examples illustrate the above theorem.
EXAMPLE 9  (The matching problem.) Suppose that M items numbered 1 to M are arranged in random order in M positions numbered 1 to M, all M! orderings being equally likely, and let A_j be the event that a match occurs at the jth position (that is, item j lands in position j), j = 1, ..., M. For fixed positions j_1 < j_2 < ··· < j_r, the orderings which produce matches at all of them number (M − r)!, so that

S_r = (M choose r)(M − r)!/M! = 1/r!, r = 0, 1, ..., M.

This implies the desired results. The probability of at least one match is

P(C_1) = 1 − 1/2! + 1/3! − ··· + (−1)^{M+1}/M! ≈ 1 − e⁻¹ ≈ 0.63,

and the probability of exactly m matches is

P(B_m) = (1/m!)[1 − 1 + 1/2! − 1/3! + ··· + (−1)^{M−m}/(M − m)!]
= (1/m!) Σ_{k=0}^{M−m} (−1)^k/k! ≈ e⁻¹/m! for M − m large.
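The limits e⁻¹ and 1 − e⁻¹ ≈ 0.63 emerge quickly in simulation. In the Python sketch below (our illustration; matches are counted by comparing a random permutation with the identity), we estimate P(at least one match) and P(exactly m matches) for a modest M.

```python
import random
from math import exp, factorial

M, trials = 20, 100_000
counts = [0] * (M + 1)
for _ in range(trials):
    perm = random.sample(range(M), M)          # a random permutation
    matches = sum(1 for j, v in enumerate(perm) if j == v)
    counts[matches] += 1

print("P(at least one match) ~", 1 - counts[0] / trials,
      " (theory ~", 1 - exp(-1), ")")
for m in range(4):
    print(f"P(exactly {m}) ~ {counts[m] / trials:.4f}"
          f"   e^-1/{m}! = {exp(-1) / factorial(m):.4f}")
```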
EXAMPLE 10  Coupon collecting (case of sampling with replacement). Suppose that a manufacturer gives away in packages of his product certain items (which we take to be coupons), each bearing one of the integers 1 to M, in such a way that each of the M items is equally likely to be found in any package purchased. If n packages are bought, show that the probability that exactly m of the integers, 1 to M, will not be obtained is equal to

(M choose m) Σ_{k=0}^{M−m} (−1)^k (M − m choose k) [1 − (m + k)/M]^n.

DISCUSSION  Take the sample space to consist of all n-tuples of integers between 1 and M, with the uniform probability measure, and for k = 1, 2, ..., M, let A_k be the event that the integer k does not appear in the sample; that is,

A_k = {(z_1, ..., z_n); z_j integer, 1 ≤ z_j ≤ M, z_j ≠ k, j = 1, 2, ..., n}.

Then

P(A_k) = [(M − 1)/M]^n = (1 − 1/M)^n, k = 1, 2, ..., M,

P(A_{k1} ∩ A_{k2}) = [(M − 2)/M]^n = (1 − 2/M)^n, 1 ≤ k_1 < k_2 ≤ M,

and, in general,

P(A_{k1} ∩ A_{k2} ∩ ··· ∩ A_{kr}) = (1 − r/M)^n, 1 ≤ k_1 < k_2 < ··· < k_r ≤ M,

so that

S_r = (M choose r)(1 − r/M)^n, r = 0, 1, ..., M.   (6)

Let B_m be the event that exactly m of the integers 1 to M will not be found in the sample. Clearly, B_m is the event that exactly m of the events A_1, ..., A_M will occur. By relations (2) and (6), we have

P(B_m) = Σ_{r=m}^M (−1)^{r−m} (r choose m)(M choose r)(1 − r/M)^n
= (M choose m) Σ_{k=0}^{M−m} (−1)^k (M − m choose k)[1 − (m + k)/M]^n,   (7)

where, setting k = r − m, we have used the identity

(m + k choose m)(M choose m + k) = (M choose m)(M − m choose k).   (8)
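As a numerical check of formula (7) (our own illustration, using only the standard library), the sketch below evaluates the exact expression for P(B_m) and compares it with a simulation of n purchases for a small M.

```python
import random
from math import comb

def p_missing(m, M, n):
    # Formula (7): probability that exactly m of the integers 1..M are missing.
    return comb(M, m) * sum(
        (-1) ** k * comb(M - m, k) * (1 - (m + k) / M) ** n
        for k in range(M - m + 1)
    )

M, n, trials = 6, 12, 100_000
sim = [0] * (M + 1)
for _ in range(trials):
    seen = {random.randrange(M) for _ in range(n)}   # n coupons drawn uniformly
    sim[M - len(seen)] += 1

for m in range(4):
    print(f"m={m}: formula {p_missing(m, M, n):.4f}  simulated {sim[m] / trials:.4f}")
```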
Let A and B be two disjoint events. Then in a series of independent trials, show that:

P(A occurs before B occurs) = P(A)/[P(A) + P(B)].

Indeed, denoting by A_j, B_j the occurrences of A and B, respectively, on the jth trial, the event "A occurs before B occurs" is the sum of the events

A_1, A_1ᶜ ∩ B_1ᶜ ∩ A_2, A_1ᶜ ∩ B_1ᶜ ∩ A_2ᶜ ∩ B_2ᶜ ∩ A_3, ...,

so that

P(A occurs before B occurs)
= P[A_1 + (A_1ᶜ ∩ B_1ᶜ ∩ A_2) + (A_1ᶜ ∩ B_1ᶜ ∩ A_2ᶜ ∩ B_2ᶜ ∩ A_3) + ··· + (A_1ᶜ ∩ B_1ᶜ ∩ ··· ∩ A_nᶜ ∩ B_nᶜ ∩ A_{n+1}) + ···]
= P(A_1) + P(A_1ᶜ ∩ B_1ᶜ ∩ A_2) + P(A_1ᶜ ∩ B_1ᶜ ∩ A_2ᶜ ∩ B_2ᶜ ∩ A_3) + ···
= P(A) + P(Aᶜ ∩ Bᶜ) P(A) + P²(Aᶜ ∩ Bᶜ) P(A) + ··· (by Theorem 6)
= P(A)[1 + P(Aᶜ ∩ Bᶜ) + P²(Aᶜ ∩ Bᶜ) + ···]
= P(A) / [1 − P(Aᶜ ∩ Bᶜ)].

But

P(Aᶜ ∩ Bᶜ) = P[(A ∪ B)ᶜ] = 1 − P(A ∪ B) = 1 − P(A + B) = 1 − P(A) − P(B),

so that

1 − P(Aᶜ ∩ Bᶜ) = P(A) + P(B).

Therefore

P(A occurs before B occurs) = P(A)/[P(A) + P(B)],

as asserted.

It is possible to interpret B as a catastrophic event, and A as an event consisting of taking certain precautionary and protective actions upon the energizing of a signaling device. Then the significance of the above probability becomes apparent. As a concrete illustration, consider the following simple example (see also Exercise 2.6.3).
EXAMPLE 11  In repeated independent drawings, with replacement, from a standard deck of 52 playing cards, define the events A = "an ace occurs" and B = "a picture occurs."

Then P(A) = 4/52 = 1/13 and P(B) = 12/52 = 3/13, so that the probability that an ace occurs before a picture is

P(A)/[P(A) + P(B)] = (1/13)/(1/13 + 3/13) = 1/4.
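The "A before B" probability is also easy to simulate. In this Python sketch (ours; the deck encoding is an illustrative choice), each trial draws cards with replacement until an ace or a picture appears, and the observed frequency is compared with the theoretical value 1/4.

```python
import random

# 4 aces, 12 pictures, 36 other cards in a standard deck.
deck = ["ace"] * 4 + ["picture"] * 12 + ["other"] * 36

def ace_before_picture():
    while True:
        card = random.choice(deck)   # drawing with replacement
        if card != "other":
            return card == "ace"

trials = 200_000
wins = sum(ace_before_picture() for _ in range(trials))
print("simulated:", wins / trials, " theory:", 1 / 4)
```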
Exercises

2.6.1 Show that

(m + k choose m)(M choose m + k) = (M choose m)(M − m choose k),

as asserted in relation (8).

2.6.2 Verify the transition in (7) and that the resulting expression is indeed the desired result.

2.6.3 Consider the following game of chance. Two fair dice are rolled repeatedly and independently. If the sum of the outcomes is either 7 or 11, the player wins immediately, while if the sum is either 2 or 3 or 12, the player loses immediately. If the sum is either 4 or 5 or 6 or 8 or 9 or 10, the player continues rolling the dice until either the same sum appears before a sum of 7 appears in which case he wins, or until a sum of 7 appears before the original sum appears in which case the player loses. It is assumed that the game terminates the first time the player wins or loses. What is the probability of winning?
Chapter 3
On Random Variables and Their Distributions

3.2 Discrete Random Variables (and Random Vectors)

Let X be a discrete random variable taking the values x_j, j = 1, 2, ..., with probability distribution P_X. The probability density function (p.d.f.) f_X of X is defined by

f_X(x_j) = P_X({x_j}) (= P(X = x_j)) for x = x_j, j = 1, 2, ...,

and f_X(x) = 0 otherwise. Then

f_X(x) ≥ 0 and Σ_{x_j} f(x_j) = 1,

and, for any set B of values of X,

P(X ∈ B) = Σ_{x_j ∈ B} f(x_j).

Below, a number of important discrete random variables are presented.

3.2.1 Binomial
The Binomial random variable X takes the values

X(S) = {0, 1, 2, ..., n},

with

P(X = x) = f(x) = (n choose x) p^x q^{n−x}, x = 0, 1, ..., n,

where 0 < p < 1 and q = 1 − p. f is, in fact, a p.d.f., since f(x) ≥ 0 and

Σ_{x=0}^n f(x) = Σ_{x=0}^n (n choose x) p^x q^{n−x} = (p + q)^n = 1ⁿ = 1.

The underlying sample space is

S = {S, F} × ··· × {S, F} (n copies).

In particular, for n = 1, we have the Bernoulli or Point Binomial r.v. The r.v. X may be interpreted as representing the number of S's (successes) in the compound experiment E × ··· × E (n copies), where E is the experiment resulting in the sample space {S, F} and the n experiments are independent (or, as we say, the n trials are independent). f(x) is the probability that exactly x S's occur. In fact, f(x) = P(X = x) = P(of all n sequences of S's and F's with exactly x S's). The probability of one such sequence is p^x q^{n−x} by the independence of the trials and this also does not depend on the particular sequence we are considering. Since there are (n choose x) such sequences, the result follows.

The distribution of X is called the Binomial distribution and the quantities n and p are called the parameters of the Binomial distribution. We denote the Binomial distribution by B(n, p). Often the notation X ~ B(n, p) will be used to denote the fact that the r.v. X is distributed as B(n, p). Graphs of the p.d.f. of the B(n, p) distribution for selected values of n and p are given in Figs. 3.1 and 3.2.
f (x)
0.25
n 12
0.20
p
1
4
0.15
0.10
0.05
0
1 2 3 4 5 6 7 8 9 10 11 12 13
Figure 3.1 Graph of the p.d.f. of the Binomial distribution for n = 12, p = 14 .
()
f (1) = 0.1267
f (2 ) = 0.2323
f ( 3) = 0.2581
f ( 4 ) = 0.1936
f ( 5) = 0.1032
f (6) = 0.0401
f 0 = 0.0317
()
f ( 8 ) = 0.0024
f (9) = 0.0004
f (10 ) = 0.0000
f (11) = 0.0000
f (12 ) = 0.0000
f 7 = 0.0115
3.1 Soem
Discrete Random Variables
(and General
RandomConcepts
Vectors)
3.2
57
f (x)
0.25
0.20
n 10
0.15
p
1
2
0.10
0.05
1 2 3 4 5 6 7 8 9 10
Figure 3.2 Graph of the p.d.f. of the Binomial distribution for n = 10, p = 12 .
()
f (1) = 0.0097
f (2 ) = 0.0440
f ( 3) = 0.1172
f ( 4 ) = 0.2051
f ( 5) = 0.2460
()
f ( 7) = 0.1172
f ( 8 ) = 0.0440
f (9) = 0.0097
f (10 ) = 0.0010
f 0 = 0.0010
f 6 = 0.2051
3.2.2 Poisson
x
,
x!
x = 0, 1, 2, . . . ; > 0. f is, in fact, a p.d.f., since f(x) 0 and
( ) {
f ( x ) = e x! = e
x =0
) ()
P X = x = f x = e
X S = 0, 1, 2, ,
e = 1.
x =0
58
may be taken as the limit of Binomial distributions. Roughly speaking, suppose that X B(n, p), xwhere n is large and p is small. Then P X = x =
n x
( xn ) px (1 p) e np ( npx !) , x 0 . For the graph of the p.d.f. of the P()
distribution for = 5 see Fig. 3.3.
A visualization of such an approximation may be conceived by stipulating
that certain events occur in a time interval [0,t] in the following manner: events
occurring in nonoverlapping subintervals are independent; the probability
that one event occurs in a small interval is approximately proportional to its
length; and two or more events occur in such an interval with probability
approximately 0. Then dividing [0,t] into a large number n of small intervals of
length t/n, we have that the probability that exactly x events occur in [0,t] is
x
n x
approximately ( nx )( nt ) (1 nt ) , where is the factor of proportionality.
Setting pn = nt , we havex npn = t and Theorem 1 in Section 3.4 gives that
x
n x
( nx )( nt ) (1 nt ) e t (xt )! . Thus Binomial probabilities are approximated by
Poisson probabilities.
f (x)
0.20
0.15
0.10
0.05
0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
()
f (1) = 0.0337
f (2 ) = 0.0843
f ( 3) = 0.1403
f ( 4 ) = 0.1755
f ( 5) = 0.1755
f (6) = 0.1462
f ( 7) = 0.1044
f ( 8 ) = 0.0653
f 0 = 0.0067
()
f (10 ) = 0.0181
f (11) = 0.0082
f (12 ) = 0.0035
f (13) = 0.0013
f (14 ) = 0.0005
f (15) = 0.0001
f 9 = 0.0363
()
3.2
59
3.2.3 Hypergeometric
( ) {
m n
x r x
f x =
,
m + n
()
X S = 0, 1, 2, , r ,
m n
m + n
= 1.
r
f ( x ) = m + n x r x = m + n
1
x =0
x =0
( ) {
X S = 0, 1, 2, ,
r + x 1 x
f x = pr
q ,
x
()
r + x 1 x
pr
q =
x
x =0
1 q
f ( x ) = p
r
x =0
pr
= 1.
pr
(1 x)
n + j 1 j
=
x ,
j
j=0
x < 1.
The distribution of X is called the Negative Binomial distribution. This distribution occurs in situations which have as a model the following. A Binomial
experiment E, with sample space {S, F}, is repeated independently until exactly
r Ss appear and then it is terminated. Then the r.v. X represents the number
of times beyond r that the experiment is required to be carried out, and f(x) is
the probability that this number of times is equal to x. In fact, here S =
60
{all (r + x)-sequences of Ss and Fs such that the rth S is at the end of the
sequence}, x = 0, 1, . . . and f(x) = P(X = x) = P[all (r + x)-sequences as above
for a specied x]. The probability of one such sequence is pr1qxp by the
independence assumption, and hence
r + x 1 r 1 x
r r + x 1 x
f x =
q .
p q p= p
x
x
()
The above interpretation also justies the name of the distribution. For r = 1,
we get the Geometric (or Pascal) distribution, namely f(x) = pqx, x = 0, 1, 2, . . . .
3.2.5
Discrete Uniform
( ) {
X S = 0, 1, , n 1 ,
()
f x =
1
,
n
x = 0, 1, , n 1.
n5
Figure 3.4 Graph of the p.d.f. of a Discrete
Uniform distribution.
1
5
3.2.6
Multinomial
Here
X S = x = x1 , , xk ; x j 0, j = 1, 2, , k, x j = n,
j =1
k
n!
f x =
p1x 1 p2x 2 pkx k , p j > 0, j = 1, 2, , k, p j = 1.
x1! x 2 ! x k !
j =1
()
()
f ( x) =
X
x 1 , , xk
n!
p1x1 pkxk = p1 + + pk
x1! xk !
= 1n = 1,
61
probability that the outcome Oj occurs exactly xj times. In fact f(x) = P(X = x)
= P(all n-sequences which contain exactly xj Ojs, j = 1, 2, . . . , k). The probability of each one of these sequences is p1x1 pkxk by independence, and since
there are n!/( x1! xk ! ) such sequences, the result follows.
The fact that the r. vector X has the Multinomial distribution with parameters n and p1, . . . , pk may be denoted thus: X M(n; p1, . . . , pk).
REMARK 1 When the tables given in the appendices are not directly usable
because the underlying parameters are not included there, we often resort to
linear interpolation. As an illustration, suppose X B(25, 0.3) and we wish
to calculate P(X = 10). The value p = 0.3 is not included in the Binomial
Tables in Appendix III. However, 164 = 0.25 < 0.3 < 0.3125 = 165 and the
probabilities P(X = 10), for p = 164 and p = 165 are, respectively, 0.9703
and 0.8756. Therefore linear interpolation produces the value:
0.3 0.25
= 0.8945.
0.3125 0.25
Likewise for other discrete distributions. The same principle also applies
appropriately to continuous distributions.
REMARK 2 In discrete distributions, we are often faced with calculations of
the form x =1 x x . Under appropriate conditions, we may apply the following
approach:
x
x =1
x =1
x =1
= x x 1 =
d x
d
=
d
d
x=2
x x1
x =1
x2
=
d 1
1
Exercises
3.2.1 A fair coin is tossed independently four times, and let X be the r.v.
dened on the usual sample space S for this experiment as follows:
()
X s = the number of H s in s.
62
iii) X 1?
iii) X 20?
iii) 5 X 20?
3.2.3 A manufacturing process produces certain articles such that the probability of each article being defective is p. What is the minimum number, n, of
articles to be produced, so that at least one of them is defective with probability at least 0.95? Take p = 0.05.
3.2.4 If the r.v. X is distributed as B(n, p) with p > 12 , the Binomial Tables
in Appendix III cannot be used directly. In such a case, show that:
ii) P(X = x) = P(Y = n x), where Y B(n, q), x = 0, 1, . . . , n, and q = 1 p;
ii) Also, for any integers a, b with 0 a < b n, one has: P(a X b) =
P(n b Y n a), where Y is as in part (i).
3.2.5 Let X be a Poisson distributed r.v. with parameter . Given that
P(X = 0) = 0.1, compute the probability that X > 5.
3.2.6 Refer to Exercise 3.2.5 and suppose that P(X = 1) = P(X = 2). What is
the probability that X < 10? If P(X = 1) = 0.1 and P(X = 2) = 0.2, calculate the
probability that X = 0.
3.2.7 It has been observed that the number of particles emitted by a radioactive substance which reach a given portion of space during time t follows
closely the Poisson distribution with parameter . Calculate the probability
that:
iii) No particles reach the portion of space under consideration during
time t;
iii) Exactly 120 particles do so;
iii) At least 50 particles do so;
iv) Give the numerical values in (i)(iii) if = 100.
3.2.8 The phone calls arriving at a given telephone exchange within one
minute follow the Poisson distribution with parameter = 10. What is the
probability that in a given minute:
iii) No calls arrive?
iii) Exactly 10 calls arrive?
iii) At least 10 calls arrive?
3.2.9 (Truncation of a Poisson r.v.) Let the r.v. X be distributed as Poisson
with parameter and dene the r.v. Y as follows:
Find:
and Y = 0 otherwise.
63
) (
P X j = x j , j = 1, , k =
0 x j n j , j = 1, , k,
( ),
nj
j =1 x j
n1 + + nk
n
x j = n.
j =1
3.2.12 Refer to the manufacturing process of Exercise 3.2.3 and let Y be the
r.v. denoting the minimum number of articles to be manufactured until the
rst two defective articles appear.
ii) Show that the distribution of Y is given by
)(
P Y = y = p2 y 1 1 p
y2
y = 2, 3, ;
Show that the function f(x) = ( 12 )xIA(x), where A = {1, 2, . . .}, is a p.d.f.
3.2.14
()
()
f x = c x I A x ,
where
A = 0, 1, 2,
3.2.15 Suppose that the r.v. X takes on the values 0, 1, . . . with the following
probabilities:
()
f j =P X=j =
c
,
3j
j = 0, 1, ;
64
3.2.16 There are four distinct types of human blood denoted by O, A, B and
AB. Suppose that these types occur with the following frequencies: 0.45, 0.40,
0.10, 0.05, respectively. If 20 people are chosen at random, what is the probability that:
ii) All 20 people have blood of the same type?
ii) Nine people have blood type O, eight of type A, two of type B and one of
type AB?
3.2.17 A balanced die is tossed (independently) 21 times and let Xj be the
number of times the number j appears, j = 1, . . . , 6.
ii) What is the joint p.d.f. of the Xs?
ii) Compute the probability that X1 = 6, X2 = 5, X3 = 4, X4 = 3, X5 = 2, X6 = 1.
3.2.18 Suppose that three coins are tossed (independently) n times and
dene the r.v.s Xj, j = 0, 1, 2, 3 as follows:
X j = the number of times j H s appear.
Determine the joint p.d.f. of the Xjs.
3.2.19 Let X be an r.v. distributed as P(), and set E = {0, 2, . . . } and O = {1,
3, . . . }. Then:
ii) In terms of , calculate the probabilities: P(X E) and P(X O);
ii) Find the numerical values of the probabilities in part (i) for = 5. (Hint: If
k
k
SE = kE k! and SO = kO k! , notice that SE + SO = e, and SE SO =
e .)
3.2.20 The following recursive formulas may be used for calculating
Binomial, Poisson and Hypergeometric probabilities. To this effect, show
that:
iii) If X B(n, p), then f ( x + 1) = qp nx+x1 f ( x), x = 0, 1, . . . , n 1;
iii) If X P(), then f ( x + 1) = x+1 f ( x), x = 0, 1, . . . ;
iii) If X has the Hypergeometric distribution, then
f x+1 =
(m x)(r x) f x ,
()
(n r + x + 1)(x + 1)
x = 0, 1, , min m, r .
3.2.21
i) Suppose the r.v.s X1, . . . , Xk have the Multinomial distribution, and let
j be a xed number from the set {1, . . . , k}. Then show that Xj is distributed as
B(n, pj);
ii) If m is an integer such that 2 m k 1 and j1, . . . , jm are m distinct integers
from the set {1, . . . , k}, show that the r.v.s Xj , . . . , Xj have Multinomial
distributions with parameters n and pj , pj .
1
3.3
65
3.2.22 (Polyas urn scheme) Consider an urn containing b black balls and r
red balls. One ball is drawn at random, is replaced and c balls of the same color
as the one drawn are placed into the urn. Suppose that this experiment is
repeated independently n times and let X be the r.v. denoting the number of
black balls drawn. Then show that the p.d.f. of X is given by
[ ( )]
r (r + c ) [r + ( n x 1)c ]
.
)(
b b + c b + 2c b + x 1 c
n
P X =x =
x b + r b + r + c
)(
)]
b + r + 2c b + r + m 1 c
(This distribution can be used for a rough description of the spread of contagious diseases. For more about this and also for a certain approximation to the
above distribution, the reader is referred to the book An Introduction to
Probability Theory and Its Applications, Vol. 1, 3rd ed., 1968, by W. Feller, pp.
120121 and p. 142.)
( )
()
X S = , f x =
x
exp
2 2
2
x .
()
()
()
()
I 2 = f x dx = f x dx f y dy
x 2
y
1
1
=
exp
dx exp
2
2 2
2 2
1 1
e z
dz
dy
d ,
66
f (x)
0.8
0.5
0.6
0.4
1
0.2
2
1
2
0
2
N(, 2 )
Figure 3.5 Graph of the p.d.f. of the Normal distribution with = 1.5 and several values of .
I2 =
1
2
z 2 + 2
) 2 dz d = 1
2
0
e r
r dr d
1
2
e r
r dr
d = e r
r dr = e r
2
0
= 1;
()
max f x =
x
1
2
f ( x)dx = 1,
we conclude that the larger is, the more spread-out f(x) is and vice versa. It
is also clear that f(x) 0 as x . Taking all these facts into consideration,
we are led to Fig. 3.5.
The Normal distribution is a good approximation to the distribution of
grades, heights or weights of a (large) group of individuals, lifetimes of various
manufactured items, the diameters of hail hitting the ground during a storm,
the force required to punctuate a cardboard, errors in numerous measurements, etc. However, the main signicance of it derives from the Central Limit
Theorem to be discussed in Chapter 8, Section 8.3.
3.3.2
Gamma
( )
( ) (
X S = actually X S = 0,
))
3.3
67
Here
1
x 1 e x ,
f x =
0 ,
()
( )
x0
where () = 0 y 1 e y dy (which exists and is nite for > 0). (This integral
is known as the Gamma function.) The distribution of X is also called the
Gamma distribution and , are called the parameters of the distribution.
f (x )dx = ( )
1
x 1 e x dx =
( )
y 1 e y dy,
f (x )dx = ( ) 0
1
REMARK 3
y 1 e y dy =
( )
1
= 1.
( )
( ) (
)(
= 1 1 ,
and if is an integer, then
( ) (
)(
()
= 1 2 1 ,
where
()
() (
1 = e y dy = 1; that is, a = 1 !
0
We often use this notation even if is not an integer, that is, we write
( ) (
= 1 ! = y 1 e y dy for > 0.
0
1 1
= ! = .
2 2
We have
(1 2 )
1
= y
e y dy.
0
2
By setting
y1 2 =
t
2
, so that
y=
t2
, dy = t dt , t 0, .
2
68
we get
1
1
= 2 e t
0
2
t
t dt = 2 e t
dt = ;
that is,
1 1
= ! = .
2 2
From this we also get that
3 1 1
= =
, etc.
2
2
2
2
Graphs of the p.d.f. of the Gamma distribution for selected values of and
are given in Figs. 3.6 and 3.7.
The Gamma distribution, and in particular its special case the Negative
Exponential distribution, discussed below, serve as satisfactory models for
f (x)
1.00
1, 1
0.75
0.50
2, 1
0.25
4, 1
Figure 3.6 Graphs of the p.d.f. of the Gamma distribution for several values of , .
f (x)
1.00
0.75
2, 0.5
0.50
2, 1
0.25
2, 2
4
Figure 3.7 Graphs of the p.d.f. of the Gamma distribution for several values of , .
3.3
69
3.3.3 Chi-square
For = r/2, r 1, integer, = 2, we get what is known as the Chi-square
distribution, that is,
1
( r 2 )1 x 2
x
e ,
1
r 2
f x = 2 r 2
0,
()
( )
x>0
r > 0, integer.
x0
The distribution with this p.d.f. is denoted by r2 and r is called the number of
degrees of freedom (d.f.) of the distribution. The Chi-square distribution occurs
often in statistics, as will be seen in subsequent chapters.
e x ,
f x =
0 ,
()
x>0
x0
> 0,
which is known as the Negative Exponential distribution. The Negative Exponential distribution occurs frequently in statistics and, in particular, in waitingtime problems. More specically, if X is an r.v. denoting the waiting time
between successive occurrences of events following a Poisson distribution,
then X has the Negative Exponential distribution. To see this, suppose that
events occur according to the Poisson distribution P(); for example, particles
emitted by a radioactive source with the average of particles per time unit.
Furthermore, we suppose that we have just observed such a particle, and let X
be the r.v. denoting the waiting time until the next particle occurs. We shall
show that X has the Negative Exponential distribution with parameter .
To this end, it is mentioned here that the distribution function F of an r.v., to
be studied in the next chapter, is dened by F(x) = P(X x), x , and if X
is a continuous r.v., then dFdx( x ) = f ( x). Thus, it sufces to determine F here.
Since X 0, it follows that F(x) = 0, x 0. So let x > 0 be the the waiting time
for the emission of the next item. Then F(x) = P(X x) = 1 P(X > x). Since
is the average number of emitted particles per time unit, their average
x ( x )
= e x , since no
number during time x will be x. Then P( X > x) = e
0!
x
particles are emitted in (0, x]. That is, F(x) = 1 e , so that f(x) = ex. To
summarize: f(x) = 0 for x 0, and f(x) = ex for x > 0, so that X is distributed
as asserted.
0
70
( )
])
( ) [
X S = actually X S = , and
1 ,
f x =
0,
()
x
otherwise
< .
Clearly,
()
f (x )dx = dx = 1.
1
f x 0,
f (x)
1
3.3.6 Beta
( )
( ) ( ))
X S = actually X S = 0, 1 and
( )
( )()
x 1 1 x
f x =
0 elsewhere,
()
0<x<1
> 0, > 0.
3.3
71
( )()
= x 1 e x dx y 1 e y dy
0
0
( x+y )
= x 1 y 1 e
dx dy
0
( )
uy
y du
, dx =
, u 0, 1
2
1 u
1 u
and x + y =
y
,
1 u
becomes
=
u 1
(1 u)
u 1
1 u
+1
y 1 y 1 e
y + 1 e
y 1 u
y 1 u
du
(1 u)
dy
du dy.
u (1 u)
+ 1 e du d
(
(1 u)
1
= + 1 e d u 1 1 u
0
= +
) u
1
du
du;
that is,
( )() (
= +
) x (1 x)
1
dx
and hence
1
f (x )dx = ( )( ) 0 x (1 x )
dx = 1.
Graphs of the p.d.f. of the Beta distribution for selected values of and are
given in Fig. 3.9.
REMARK 4 For = = 1, we get the U(0, 1), since (1) = 1 and (2) = 1.
The distribution of X is also called the Beta distribution and occurs rather
often in statistics. , are called the parameters of the distribution
1
and the function dened by 0 x 1 (1 x ) 1 dx for , > 0 is called the Beta
function.
Again the fact that X has the Beta distribution with parameters and
may be expressed by writing X B(, ).
72
5
3
f (x)
2.5
3
3
2.0
1.5
2
2
1.0
0.2
0.4
0.6
0.8
1.0
Figure 3.9 Graphs of the p.d.f. of the Beta distribution for several values of , .
3.3.7 Cauchy
Here
( )
()
X S = and
f x =
2 + x
x , , > 0.
f ( x )dx =
2 + x
dx =
1
1
1 + x
[(
) ]
dx
1 dy
1
= arctan y = 1,
1 + y 2
upon letting
y=
x
, so that
dx
= dy.
The distribution of X is also called the Cauchy distribution and , are called
the parameters of the distribution (see Fig. 3.10). We may write X
Cauchy(, 2) to express the fact that X has the Cauchy distribution with
parameters and 2.
(The p.d.f. of the Cauchy distribution looks much the same as the Normal
p.d.f., except that the tails of the former are heavier.)
3.3.8 Lognormal
Here X(S) = (actually X(S) = (0, )) and
3.3
3.1 Soem
Continuous Random Variables
(and General
RandomConcepts
Vectors)
73
f (x)
0.3
0.2
0.1
2
1
f x = x 2
2 2
0,
()
x>0
x 0 where , > 0.
()
f x dx =
log x log
1
exp
x
2 2
dx
y log
1
=
exp
y 2 2
2 e
e y dy.
But this is the integral of an N(log , 2) density and hence is equal to 1; that
is, if X is lognormally distributed, then Y = log X is normally distributed with
parameters log and 2. The distribution of X is called Lognormal and , are
called the parameters of the distribution (see Fig. 3.11). The notation X
Lognormal(, ) may be used to express the fact that X has the Lognormal
distribution with parameters and .
(For the many applications of the Lognormal distribution, the reader is
referred to the book The Lognormal Distribution by J. Aitchison and J. A. C.
Brown, Cambridge University Press, New York, 1957.)
3.3.9 t
3.3.10 F
These distributions occur very often in Statistics (interval estimation, testing hypotheses, analysis of variance, etc.) and their
densities will be presented later (see Chapter 9, Section 9.2).
74
f (x)
0.8
2 0.5
1
0.6
1
e2
0.4
e
0.2
Figure 3.11 Graphs of the p.d.f. of the Lognormal distribution for several values of , .
f x1 , x 2 =
1
2 1 2 1
e q 2 ,
2
x 2
x 1 1 x 2 2 x 2 2
1
1
2
+
1 2 2
1 2 1
2
x 2
x1 1 x 2 2 x 2 2
1
1 2 q = 1
1
1 2 2
x 2
x1 1
2 x1 1
= 2
+ 1
.
1
1
2
Furthermore,
x2 2
x1 1 x 2 2
1
x 1
2 1
=
2
2
1
2
1
=
1
2
x1 1
1
x2 b ,
x 2 2 + 2
=
1
2
where
b = 2 +
2
x1 1 .
1
3.3
3.1 Soem
Continuous Random Variables
(and General
RandomConcepts
Vectors)
75
z
z f(x1, x 2 ) for z k
x2
x1
Thus
2
x b
2 x1 1
1 2 q = 2
+ 1
2
1
and hence
f x1 , x 2 dx 2 =
x
1
1
exp
2
2 1
2 1
x2 b
1
exp
2 2 1 2
2 2 1 2
2
x
1
1
exp
2
2 1
1
dx
2
1,
since the integral above is that of an N(b, 22(1 2)) density. Since the rst
factor is the density of an N(1, 12) random variable, integrating with respect
to x1, we get
f ( x1 , x 2 )dx1 dx 2 = 1.
REMARK 5
Normal, then
( )
( )
is N 1 , 12 ,
is N 2 , 22 .
f1 x1 = f x1 , x 2 dx 2
and similarly,
f2 x 2 = f x1 , x 2 dx1
76
As will be seen in Chapter 4, the p.d.f.s f1 and f2 above are called marginal
p.d.f.s of f.
The notation X N(1, 2, 12, 22, ) may be used to express the fact that
X has the Bivariate Normal distribution with parameters 1, 2, 12, 22, .
Then X1 N(1 , 12) and X2 N(2, 22 ).
Exercises
3.3.1
()
ii) max f x =
x
1
2
3.3.2 Let X be distributed as N(0, 1), and for a < b, let p = P(a < X < b). Then
use the symmetry of the p.d.f. f in order to show that:
iii)
iii)
iii)
iv)
If X N(0, 1), use the Normal Tables in Appendix III in order to show
77
between two consecutive such events. Show that the distribution of T is Negative Exponential with parameter by computing the probability that T > t.
3.3.7 Let X be an r.v. denoting the life length of a TV tube and suppose that
its p.d.f. f is given by:
f(x) = exI(0, )(x).
Compute the following probabilities:
P(j < X j + 1), j = 0, 1, . . . ;
P(X > t) for some t > 0;
P(X > s + t|X > s) for some s, t > 0;
Compare the probabilities in parts (ii) and (iii) and conclude that the
Negative Exponential distribution is memoryless;
iv) If it is known that P(X > s) = , express the parameter in terms of
and s.
iii)
iii)
iii)
iv)
3.3.8 Suppose that the life expectancy X of each member of a certain group
of people is an r.v. having the Negative Exponential distribution with parameter = 1/50 (years). For an individual from the group in question, compute
the probability that:
iii) He will survive to retire at 65;
iii) He will live to be at least 70 years old, given that he just celebrated his 40th
birthday;
iii) For what value of c, P(X > c) = 12 ?
3.3.9 Let X be an r.v. distributed as U(, ) ( > 0). Determine the values
of the parameter for which the following are true:
ii) P(1 < X < 2) = 0.75;
ii) P(|X| < 1) = P(|X| > 2).
3.3.10
B , = x 1 1 x
0
dx.
n m
dx =
p
n!
x m1 1 x
m1 nm ! 0
)(
= n p j 1 p
j
j= m
n j
n m
dx
3.3.12 Let X be an r.v. with p.d.f given by f(x) = 1/[(1 + x2)]. Calculate the
probability that X2 c.
78
3.3.13
iii)
iii)
iii)
iv)
f(x)
f(x)
f(x)
f(x)
x>0
1 < x 0;
x 1
3.3.16 Let X be an r.v. with p.d.f. given by 3.3.15(ii). Compute the probability that X > x.
3.3.17 Let X be the r.v. denoting the life length of a certain electronic device
expressed in hours, and suppose that its p.d.f. f is given by:
()
f x =
()
c
I 1,000 , 3,000 ] x .
xn [
iii) Show that it is a p.d.f. (called the Weibull p.d.f. with parameters
and );
iii) Observe that the Negative Exponential p.d.f. is a special case of a Weibull
p.d.f., and specify the values of the parameters for which this happens;
iii) For = 1 and = 12 , = 1 and = 2, draw the respective graphs of the
p.d.f.s involved.
(Note: The Weibull distribution is employed for describing the lifetime of
living organisms or of mechanical systems.)
3.4
3.1 The
Soem
Poisson
General
Distribution
Concepts
3.3.20
79
) (
) (
f x, y = c 25 x 2 y 2 I (0 ,5 ) x 2 + y 2 .
Determine the constant c and compute the probability that 0 < X2 + Y2 < 4.
3.3.21 Let X and Y be r.v.s whose joint p.d.f. f is given by f(x, y) =
cxyI(0,2)(0,5)(x, y). Determine the constant c and compute the following
probabilities:
i)
ii)
iii)
iv)
f x, y =
3.3.23
) (
1
cos y I A x, y ,
4
A = , , .
2 2
1 x
e ,
4
1 ,
f x = 8
x
1
,
2
0,
()
x0
0<x<2
x = 2, 3,
otherwise.
n = npn ,
for some > 0. Then the following theorem is true.
THEOREM 1
e
pn q n n
x
x!
80
PROOF
We have
) 1
n x n x n n 1 n x + 1 x n x
pn q n
pn q n =
x
x!
n n 1 n x + 1 n
n
1
x!
n
n
n n1 nx +1
nx
= 1 1
x!
n x
x
n
1
1 n
x
n
n
1
1
x 1 nx
n
n x!
1
x
1 n
e ,
n
x
n
x!
n
1
n
since, if n , then
n
1 e .
n
1 e .
n
m n
x r x
m + n
3.4
3.1 The
Soem
Poisson
General
Distribution
Concepts
m n
r
x r x
p x q r x ,
m + n
x
r
PROOF
81
x = 0, 1, 2, , r .
We have
m n
m! n! m + n r !
x r x
r!
=
m + n
m x ! n r x ! m + n ! x! r x !
)
)] (
)[ (
)
)
r m x + 1 m n r + x + 1 n
=
x
m+nr +1 m+n
) (
r m( m 1) [ m ( x 1)] n( n 1) [ n (r x 1)]
.
=
x
(m + n) [(m + n) (r 1)]
Both numerator and denominator have r factors. Dividing through by (m + n),
we get
m x
m
n r x r m m
1
x1
=
m + n
m + n m + n
x m + n m + n m + n
r
n n
n
1
r x 1
m+n
m + n m + n m + n
m+n
1
r 1
1 1
1
m + n
m + n
r
m
p x q r x ,
,n
x
since
m
p and hence
m+n
n
1 p = q.
m+n
REMARK 7
82
m n
x r x
m + n
r
by p x q r x
x
Exercises
3.4.1 For the following values of n, p and = np, draw graphs of B(n, p) and
P() on the same coordinate axes:
iii)
iii)
iii)
iv)
iv)
n=
n=
n=
n=
n=
10, p
16, p
20, p
24, p
24, p
=
=
=
=
=
4
16
2
16
2
16
1
16
2
16
, so that = 2.5;
, so that = 2;
, so that = 2.5;
, so that = 1.5;
, so that = 3.
3.4.2 Refer to Exercise 3.2.2 and suppose that the number of applicants is
equal to 72. Compute the probabilities (i)(iii) by using the Poisson approximation to Binomial (Theorem 1).
3.4.3 Refer to Exercise 3.2.10 and use Theorem 2 in order to obtain an
approximate value of the required probability.
( ) {
() }
X 1 T = s S ; X s T .
83
X 1 U T j = U X 1 T j .
j
j
( )
( )
(1)
( )
If T1 T2 = , then X 1 T1 X 1 T2 = .
(2)
X 1 T j = X 1 T j .
j
j
( )
(3)
X 1 I T j = I X 1 T j ,
j
j
(4)
( )
Also
( ) [ ( )] ,
(5)
X 1 T = S ,
(6)
X 1
(7)
X 1 T c = X 1 T
( )
() = .
( ) {
( )
X 1 D = A S ; A = X 1 T for some T D .
By means of (1), (5), (6) above, we immediately have
THEOREM 3
THEOREM 4
COROLLARY
84
( ) [ ( )] (
({
( ) })
PX B = P X 1 B = P X B = P s S ; X s B .
(8)
By the Corollary to Theorem 4, the sets X1(B) in S are actually events due to
the assumption that X is an r. vector. Therefore PX is well dened by (8); i.e.,
P[X1(B)] makes sense, is well dened. It is now shown that PX is a probability
measure on Bk . In fact, PX(B) 0, B Bk, since P is a probability measure.
Next, PX(k) = P[X 1(k)] = P(S ) = 1, and nally,
PX B j = P X 1 B j = P X 1 B j = P X 1 B j = PX B j .
j =1
j = 1
j =1
j =1
j =1
The probability measure PX is called the probability distribution function (or
just the distribution) of X.
( )
[ ( )]
( )
Exercises
3.5.1 Consider the sample space S supplied with the -eld of events A. For
an event A, the indicator IA of A is dened by: IA(s) = 1 if s A and IA(s) = 0
if s Ac.
iii) Show that IA is r.v. for any A A .
iii) What is the partition of S induced by IA?
iii) What is the -eld induced by IA?
3.5.2
Write out the proof of Theorem 1 by using (1), (5) and (6).
3.5.3
85
Chapter 4
i) Obvious.
ii) This means that x1 < x2 implies F(x1) F(x2). In fact,
85
86
x1 < x2
(, x ] (, x ]
implies
and hence
] (
( )
( )
Q , x1 Q , x 2 ; equivalently, F x1 F x 2 .
(, x ] (, x]
x n x implies
and hence
Q , x n Q , x
(, x ] ,
n
] ( )
so that Q , x n Q = 0
i) F(x) can be used to nd probabilities of the form P(a < X b); that is
() ()
P a<X b =F b F a .
In fact,
and
Thus
) (
() ()
( )
( )
F x = lim F x n
n
with x n x.
This limit always exists, since F(xn), but need not be equal to F(x+)(=limit
from the right) = F(x). The quantities F(x) and F(x) are used to express
the probability P(X = a); that is, P(X = a) = F(a) F(a). In fact, let xn
a and set A = (X = a), An = (xn < X a). Then, clearly, An A and hence
by Theorem 2, Chapter 2,
( )
( )
P An P A , or lim P x n < X a = P X = a ,
or
[ ( ) ( )] (
lim F a F x n = P X = a ,
n
87
or
()
( )
F a lim F x n = P X = a ,
or
() ( )
F a F a =P X =a .
It is known that a nondecreasing function (such as F) may have
discontinuities which can only be jumps. Then F(a) F(a) is the length of
the jump of F at a. Of course, if F is continuous then F(x) = F(x) and
hence P(X = x) = 0 for all x.
iii) If X is discrete, its d.f. is a step function, the value of it at x being dened
by
( ) f (x )
F x =
and
xjx
( )
( ) ( )
f x j = F x j F x j1 ,
( )=f
dF x
dx
(x)
F(x)
F(x)
1.00
1.00
0.80
0.60
0.80
0.60
0.40
0.40
0.20
0.20
x
0
(a) Binomial for n 6, p
1
.
4
(x)
F(x)
1.0
1.0
0
0
0.5
x
x
x .
x
2
1
0
(d) N(0, 1).
88
()
( )=f
dF x
()
F x = f t dt and
dx
(x),
STATEMENT 2
THEOREM 2
()
y =
e t
dt .
We have
X
P Y y = P
y = P X y +
y +
t
exp
2 2
dt =
1
2
u2 2
()
du = y ,
89
The transformation
of X.
THEOREM 3
of X is referred to as normalization
()
FY y = P Y y = P y X
1
Let x =
ex
dx = 2
ex
dx.
()
FY y = 2
e t 2 dt .
2 t
Hence
( )=
dFY y
dy
1
y
e y
1
2
(1 2 )1
e y 2 .
fY y = 21 2
0,
()
y>0
y 0,
and this is the p.d.f. of 21. (Observe that here we used the fact that ( 12 ) =
.)
Exercises
4.1.1 Refer to Exercise 3.2.13, in Chapter 3, and determine the d.f.s corresponding to the p.d.f.s given there.
4.1.2 Refer to Exercise 3.2.14, in Chapter 3, and determine the d.f.s corresponding to the p.d.f.s given there.
90
4.1.3 Refer to Exercise 3.3.13, in Chapter 3, and determine the d.f.s corresponding to the p.d.f.s given there.
4.1.4 Refer to Exercise 3.3.14, in Chapter 3, and determine the d.f.s corresponding to the p.d.f.s given there.
4.1.5 Let X be an r.v. with d.f. F. Determine the d.f. of the following r.v.s:
X, X 2, aX + b, XI[a,b) (X ) when:
i) X is continuous and F is strictly increasing;
ii) X is discrete.
4.1.6 Refer to the proof of Theorem 1 (iv) and show that we may assume
that xn (xn ) instead of xn (xn ).
4.1.7 Let f and F be the p.d.f. and the d.f., respectively, of an r.v. X. Then
show that F is continuous, and dF(x)/dx = f(x) at the continuity points x of f.
4.1.8
i) Show that the following function F is a d.f. (Logistic distribution) and
derive the corresponding p.d.f., f.
()
F x =
1+ e
1
, x , > 0, ;
(x + )
4.2
The 4.1
d.f. ofThe
a Random
Cumulative
Vector
Distribution
and Its Properties
Function
91
tested, what is the probability that at most 15 of them are defective? (Use the
required independence.)
4.1.13 A manufacturing process produces 12 -inch ball bearings, which are
assumed to be satisfactory if their diameter lies in the interval 0.5 0.0006 and
defective otherwise. A days production is examined, and it is found that the
distribution of the actual diameters of the ball bearings is approximately
normal with mean = 0.5007 inch and = 0.0005 inch. Compute the proportion of defective ball bearings.
4.1.14 If X is an r.v. distributed as N(, 2), nd the value of c (in terms of
and ) for which P(X < c) = 2 9P(X > c).
4.1.15 Refer to the Weibull p.d.f., f, given in Exercise 3.3.19 in Chapter 3 and
do the following:
i) Calculate the corresponding d.f. F and the reliability function (x) = 1
F(x);
f (x )
ii) Also, calculate the failure (or hazard) rate H x = ( x ) , and draw its graph
for = 1 and = 1 , 1, 2;
()
iii) For s and t > 0, calculate the probability P(X > s + t|X > t) where X is an r.v.
having the Weibull distribution;
iv) What do the quantities F(x), (x), H(x) and the probability in
part (iii) become in the special case of the Negative Exponential
distribution?
4.2 The d.f. of a Random Vector and Its PropertiesMarginal and Conditional
d.f.s and p.d.f.s
For the case of a two-dimensional r. vector, a result analogous to Theorem
1 can be established. So consider the case that k = 2. We then have X =
(X1, X2) and the d.f. F(or FX or FX1,X2) of X, or the joint distribution function
of X1, X2, is F(x1, x2) = P(X1 x1, X2 x2). Then the following theorem holds
true.
With the above notation we have
THEOREM 4
92
y
(x1, y 2)
y2
(x1, y 1)
y1
0
x1
(x2, y 2)
(x2, y 1)
x2
iv) If both x1, x2, , then F(x1, x2) 1, and if at least one of the x1, x2
, then F(x1, x2) 0. We express this by writing F(, ) = 1, F(, x2) =
F(x1, ) = F(, ) = 0, where < x1, x2 < .
PROOF
i) Obvious.
ii) V = P(x1 < X1 x2, y1 < X2 y2) and is hence, clearly, 0.
iii) Same as in Theorem 3. (If x = (x1, x2), and zn = (x1n, x2n), then zn x means
x1n x1, x2n x2).
iv) If x1, x2 , then (, x1] (, x2] R 2, so that F(x1, x2) P(S) = 1. If
at least one of x1, x2 goes () to , then (, x1] (, x2] , hence
( )
F x1 , x 2 P = 0.
REMARK 3
F x1 , = lim P X1 x1 , X 2 x n
x n
( )
= P X1 x1 , < X 2 < = P X1 x1 = F1 x1 .
Similarly F(, x2) = F2(x2) is the d.f. of the random variable X2. F1, F2 are called
marginal d.f.s.
It should be pointed out here that results like those discussed
in parts (i)(iv) in Remark 1 still hold true here (appropriately interpreted).
In particular, part (iv) says that F(x1, x2) has second order partial derivatives
and
REMARK 4
2
F x1 , x 2 = f x1 , x 2
x1x 2
) (
at continuity points of f.
For k > 2, we have a theorem strictly analogous to Theorems 3 and 6 and
also remarks such as Remark 1(i)(iv) following Theorem 3. In particular, the
analog of (iv) says that F(x1, . . . , xk) has kth order partial derivatives and
4.2
The 4.1
d.f. ofThe
a Random
Cumulative
Vector
Distribution
and Its Properties
Function
k
F x1 , , xk = f x1 , , xk
x1x2 xk
) (
93
at continuity points of f, where F, or FX, or FX1, , Xk, is the d.f. of X, or the joint
distribution function of X1, . . . , Xk. As in the two-dimensional case,
( )
F , , , x j , , , = F j x j
is the d.f. of the random variable Xj, and if m xjs are replaced by (1 < m < k),
then the resulting function is the joint d.f. of the random variables corresponding to the remaining (k m) Xjs. All these d.f.s are called marginal distribution functions.
In Statement 2, we have seen that if X = (X1, . . . , Xk) is an r. vector, then
Xj, j = 1, 2, . . . , k are r.v.s and vice versa. Then the p.d.f. of X, f(x) =
f(x1, . . . , xk), is also called the joint p.d.f. of the r.v.s X1, . . . , Xk.
Consider rst the case k = 2; that is, X = (X1, X2), f(x) = f(x1, x2) and set
(
(
)
)
f x1 , x 2
f1 x1 = x
f x1 , x 2 dx 2
f x1 , x 2
f2 x 2 = x
f x1 , x 2 dx1 .
( )
( )
f ( x ) = f ( x , x ) = 1,
1
x1
x1
x2
or
Similarly we get the result for f2. Furthermore, f1 is the p.d.f. of X1, and f2 is the
p.d.f. of X2. In fact,
( )
f x1 , x2 = f x1 , x2 = f1 x1
x B x
x B
P X 1 B = x B , x
f x1 , x2 dx1dx2 = f x1 , x2 dx2 dx1 = f1 x1 dx1 .
B
B
B
( )
Similarly f2 is the p.d.f. of the r.v. X2. We call f1, f2 the marginal p.d.f.s. Now
suppose f1(x1) > 0. Then dene f(x2|x1) as follows:
f x,x
) (f (x ) ) .
1
f x 2 x1 =
94
f ( x 2 x1 ) =
x2
1
f1 x1
( )
f ( x1 , x 2 ) =
( )
1
f1 x1 = 1,
f1 x1
( )
x2
f ( x 2 x1 )dx 2 = f ( x ) f ( x1 , x 2 )dx 2 = f ( x ) f1 ( x1 ) = 1.
1
1
1
1
1
f x1 x 2 =
and show that f(|x2) is a p.d.f. Furthermore, if X1, X2 are both discrete, the
f(x2|x1) has the following interpretation:
f x,x
P X =x, X =x
) (f (x ) ) = ( P(X = x ) ) = P(X
f x 2 x1 =
= x 2 X1 = x1 .
Hence P(X2 B|X1 = x1) = x B f(x2|x1). For this reason, we call f(|x2) the
conditional p.d.f. of X2, given that X1 = x1 (provided f1(x1) > 0). For a similar
reason, we call f(|x2) the conditional p.d.f. of X1, given that X2 = x2 (provided
f2(x2) > 0). For the case that the p.d.f.s f and f2 are of the continuous type,
the conditional p.d.f. f (x1|x2) may be given an interpretation similar to the
one given above. By assuming (without loss of generality) that h1, h2 > 0, one
has
2
1 2
1 2
+ h1 , x 2
)]
where F is the joint d.f. of X1, X2 and F2 is the d.f. of X2. By letting h1, h2 0
and assuming that (x1, x2) and x2 are continuity points of f and f2, respectively,
the last expression on the right-hand side above tends to f(x1, x2)/f2(x2) which
was denoted by f(x1|x2). Thus for small h1, h2, h1 f(x1|x2) is approximately equal
to P(x1 < X1 x1 + h1|x2 < X2 x2 + h2), so that h1 f(x1|x2) is approximately the
conditional probability that X1 lies in a small neighborhood (of length h1) of x1,
given that X2 lies in a small neighborhood of x2. A similar interpretation may
be given to f(x2|x1). We can also dene the conditional d.f. of X2, given X1 = x1,
by means of
4.2
The 4.1
d.f. ofThe
a Random
Cumulative
Vector
Distribution
and Its Properties
Function
95
f x x
2 1
F x2 x1 = x x
x f x x dx ,
2 1
2
fi , , i xi , , xi
1
f x1 , , xk
x , , x
=
f x ,
1 , xk dx j dx j .
j1
jt
There are
k k
k
k
+ + +
=2 2
1 2
k 1
such p.d.f.s which are also called marginal p.d.f.s. Also if xi1, . . . , xis are such
that fi1, . . . , it (xi1, . . . , xis) > 0, then the function (of xj1, . . . , xjt) dened by
f x j , , x j xi , , xi =
1
f x1 , , xk
fi , , i xi , , xi
1
is a p.d.f. called the joint conditional p.d.f. of the r.v.s Xj1, . . . , Xjt, given Xi1 =
xi1, , Xjs = xjs, or just given Xi1, . . . , Xis. Again there are 2k 2 joint conditional p.d.f.s involving all k r.v.s X1, . . . , Xk. Conditional distribution functions are dened in a way similar to the one for k = 2. Thus
)
f (x , , x x , , x )
)
F x j , , x j xi , , xi
t
j
j
i
i
= ( x , , x )( x , , x
x x f x ,
j , x j x i , , x i dx j dx j .
t
j1
j1
jt
j1
jt
jt
Let the r.v.s X1, . . . , Xk have the Multinomial distribution with parameters
n and p1, . . . , pk. Also, let s and t be integers such that 1 s, t < k and
s + t = k. Then in the notation employed above, we have:
96
n!
x
x
pi pi q n r ,
xi ! xi ! n r !
fi , , i xi , , xi =
ii)
i1
is
q = 1 pi + + pi , r = xi + + xi ;
1
that is, the r.v.s Xi1, . . . , Xis and Y = n (Xi1 + + Xis) have the Multinomial
distribution with parameters n and pi1, . . . , pis, q.
ii)
f x j1 , , x jt xi1 , , xi s
(n r )!
p j1
=
x j1 ! x jt ! q
x j1
xj
pj t
1 ,
q
r = xi1 + + xi s ;
that is, the (joint) conditional distribution of Xj1, . . . , Xjt given Xi1, . . . , Xis is
Multinomial with parameters n r and pj1/q, . . . , pjt/q.
DISCUSSION
i) Clearly,
(X
i1
) (
) (
) (
= xi , , X i = xi X i + + X i = r = n Y = r = Y = n r ,
1
so that
(X
i1
) (
= xi , , X i = xi = X i = xi , , X i = xi , Y = n r .
1
j1
f x j1 , , x jt xi1 , , xi s =
jt
i1
i1
p p p p
i
is
j1
jt
= 1
xi1 ! xi s ! x j1 ! x jt !
xi s
x j1
is
x jt
EXAMPLE 2
(n r )!
p j1
x j1 ! x jt ! q
i1
is
x j1 + + x jt
xi s
i1
p p q
i
is
1
xi ! xi ! n r !
1
s
xi1
(since n r = n ( x
x j1
n!
x
x
pi1i1 pi si s q n r
xi1 ! xi s ! n r !
k
1
n!
p1x pkx
=
x1! xk !
xi1
is
+ + xi s = x j1 + + x jt
x jt
pj
t ,
q
as was to be seen.
Let the r.v.s X1 and X2 have the Bivariate Normal distribution, and recall that
their (joint) p.d.f. is given by:
f x1 , x2 =
97
1
2 1 2 1 2
exp
2
2 1
2
x 2
x1 1 x2 2 x2 2
1
1
.
1
1 2 2
We saw that the marginal p.d.f.s f1, f2 are N(1, 21), N(2, 22), respectively;
that is, X1, X2 are also normally distributed. Furthermore, in the process of
proving that f(x1, x2) is a p.d.f., we rewrote it as follows:
f x1 , x 2 =
1
2 1 2
x
1
1
exp
2
2 1
1 2
b
2
exp
,
2
2 2 1
where
b = 2 +
2
x1 1 .
1
)=
Hence
f x 2 x1 =
f x1 , x 2
( )
f1 x1
2 2
x2 b
exp
2
2
1 2
2 2 1
which is the p.d.f. of an N(b, 22(1 2)) r.v. Similarly f(x1|x2) is seen to be the
p.d.f. of an N(b, 21(1 2)) r.v., where
b = 1 +
1
x2 2 .
2
Exercises
4.2.1 Refer to Exercise 3.2.17 in Chapter 3 and:
i) Find the marginal p.d.f.s of the r.v.s Xj, j = 1, , 6;
ii) Calculate the probability that X1 5.
4.2.2 Refer to Exercise 3.2.18 in Chapter 3 and determine:
ii) The marginal p.d.f. of each one of X1, X2, X3;
ii) The conditional p.d.f. of X1, X2, given X3; X1, X3, given X2; X2, X3, given
X1 ;
98
iii) The conditional p.d.f. of X1, given X2, X3; X2, given X3, X1; X3, given X1,
X2 .
If n = 20, provide expressions for the following probabilities:
iv) P(3X1 + X2 5);
v) P(X1 < X2 < X3);
vi) P(X1 + X2 = 10|X3 = 5);
vii) P(3 X1 10|X2 = X3);
viii) P(X1 < 3X2|X1 > X3).
4.2.3 Let X, Y be r.v.s jointly distributed with p.d.f. f given by f(x, y) = 2/c2
if 0 x y, 0 y c and 0 otherwise.
i) Determine the constant c;
ii) Find the marginal p.d.f.s of X and Y;
iii) Find the conditional p.d.f. of X, given Y, and the conditional p.d.f. of Y,
given X;
iv) Calculate the probability that X 1.
4.2.4 Let the r.v.s X, Y be jointly distributed with p.d.f. f given by f(x, y) =
exy I(0,)(0,) (x, y). Compute the following probabilities:
i) P(X x);
ii) P(Y y);
iii) P(X < Y);
iv) P(X + Y 3).
4.2.5
f x1 , x 2 , x3 = c 3 e
c x1 + x2 + x 3
I A x1 , x 2 , x3 ,
where
) (
) (
A = 0, 0, 0, ,
i) Determine the constant c;
ii) Find the marginal p.d.f. of each one of the r.v.s Xj, j = 1, 2, 3;
iii) Find the conditional (joint) p.d.f. of X1, X2, given X3, and the conditional
p.d.f. of X1, given X2, X3;
iv) Find the conditional d.f.s corresponding to the conditional p.d.f.s in (iii).
4.2.6
yx e y
,
f x y = x!
0,
( )
x = 0, 1, ; y 0
otherwise.
4.3
4.1 Quantiles
The Cumulative
and Modes
Distribution
of a Distribution
Function
99
i) Show that for each xed y, f(|y) is a p.d.f., the conditional p.d.f. of an r.v.
X, given that another r.v. Y equals y;
ii) If the marginal p.d.f. of Y is Negative Exponential with parameter = 1,
what is the joint p.d.f. of X, Y?
iii) Show that the marginal p.d.f. of X is given by f(x) = ( 12 )x+1 IA(x), where
A = {0, 1, 2, . . . }.
4.2.7 Let Y be an r.v. distributed as P() and suppose that the conditional
distribution of the r.v. X, given Y = n, is B(n, p). Determine the p.d.f. of X and
the conditional p.d.f. of Y, given X = x.
4.2.8 Consider the function f dened as follows:
f x1 , x 2 =
x 12 + x 22
1
1 3 3
exp
x1 x 2 I [ 1, 1][ 1, 1] x1 , x 2
+
2
2 4e
( )
f1 x1 = f x1 , x 2 dx 2
and
( )
f2 x 2 = f x1 , x 2 dx1
Let X be an r.v. distributed as U(0, 1) and let p = 0.10, 0.20, 0.30, 0.40, 0.50,
0.60, 0.70, 0.80 and 0.90. Determine the respective x0.10, x0.20, x0.30, x0.40, x0.50, x0.60,
x0.70, x0.80, and x0.90.
Since for 0 x 1, F(x) = x, we get: x0.10 = 0.10, x0.20 = 0.20, x0.30 = 0.30,
x0.40 = 0.40, x0.50 = 0.50, x0.60 = 0.60, x0.70 = 0.70, x0.80 = 0.80, and x0.90 = 0.90.
EXAMPLE 4
Let X be an r.v. distributed as N(0, 1) and let p = 0.10, 0.20, 0.30, 0.40, 0.50,
0.60, 0.70, 0.80 and 0.90. Determine the respective x0.10, x0.20, x0.30, x0.40, x0.50, x0.60,
x0.70, x0.80, and x0.90.
100
Typical cases:
F(x)
F(x)
1
p
xp
]
xp
(a)
(b)
F(x)
p
xp
0
(c)
F(x)
x
F(x)
1
1
p
xp
0
(d)
]
xp
(e)
Figure 4.3 Observe that the gures demonstrate that, as dened, xp need not be unique.
4.3
4.1 Quantiles
The Cumulative
and Modes
Distribution
of a Distribution
Function
THEOREM 5
101
n
f x = px qn x ,
x
()
0 < p < 1, q = 1 p, x = 0, 1, , n.
Consider the number (n + 1)p and set m = [(n + 1)p], where [y] denotes the
largest integer which is y. Then if (n + 1)p is not an integer, f(x) has a unique
mode at x = m. If (n + 1)p is an integer, then f(x) has two modes obtained for
x = m and x = m 1.
PROOF
For x 1, we have
n x n x
p q
x
=
f x1
n x 1 n x +1
p q
x 1
()
f x
n!
p x q n x
x! n x !
n!
p x 1 q n x +1
x 1! nx +1!
)(
nx +1 p
.
x
q
That is,
()
f x
f x1
nx +1 p
.
x
q
or
np xp + p > x xp, or
(n + 1) p > x.
Thus if (n + 1)p is not an integer, f(x) keeps increasing for x m and then
decreases so the maximum occurs at x = m. If (n + 1)p is an integer, then the
maximum occurs at x = (n + 1)p, where f(x) = f(x 1) (from above calculations). Thus
x = n+1 p1
x
,
x = 0, 1, 2, , > 0.
x!
Then if is not an integer, f(x) has a unique mode at x = []. If is an integer,
then f(x) has two modes obtained for x = and x = 1.
()
f x = e
PROOF
For x 1, we have
102
()
f x
f x1
e x x!
e
[ (x 1)!]
x 1
.
x
Hence f(x) > f(x 1) if and only if > x. Thus if is not an integer, f(x) keeps
increasing for x [] and then decreases. Then the maximum of f(x) occurs
at x = []. If is an integer, then the maximum occurs at x = . But in this case
f(x) = f(x 1) which implies that x = 1 is a second point which gives
the maximum value to the p.d.f.
Exercises
4.3.1 Determine the pth quantile xp for each one of the p.d.f.s given in
Exercises 3.2.1315, 3.3.1316 (Exercise 3.2.14 for = 14 ) in Chapter 3 if p =
0.75, 0.50.
4.3.2 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is,
f(c x) = f(c + x) for all x ). Then show that c is a median of f.
4.3.3 Draw four graphstwo each for B(n, p) and P()which represent
the possible occurrences for modes of the distributions B(n, p) and P().
4.3.4 Consider the same p.d.f.s mentioned in Exercise 4.3.1 from the point
of view of a mode.
LEMMA 1
4.1
4.4*The
Justication
Cumulativeof
Distribution
StatementsFunction
1 and 2
DEFINITION 2
LEMMA 2
103
DEFINITION 3
LEMMA 3
LEMMA 4
104
THEOREM 7
( )
where B1 = g 1 B B k
DEFINITION 5
LEMMA 5
0j
105
related from a measurability point of view. To this effect, we have the following result.
THEOREM 8
X 1 ( B) = (X B) = ( X j ( , x j ], j = 1, , k ) = I X j1 (( , x j ]) A .
j =1
Exercises
4.4.1 If X and Y are functions dened on the sample space S into the real line
, show that:
1
X
, provided X 0.
ii) Use part (i) in conjunction with Exercise 4.4.4(ii) to show that, if X and Y
are r.v.s, then so is the function XY , provided Y 0.
106
Chapter 5
g x f x , x = x1 , , xk
n
E g X = x
n
g x ,
f x1 , , xk dx1 dxk .
1 , xk
For n = 1, we get
[ ( )]
[ ( )] ( )
[(
)] (
()()
g x f x
E g X = x
g x1 , , xk f x1 , , xk dx1 dxk
[ ( )]
106
)(
107
and call it the mathematical expectation or mean value or just mean of g(X).
Another notation for E[g(X)] which is often used is g(X), or [g(X)], or just
, if no confusion is possible.
iii) For r > 0, the rth absolute moment of g(X) is denoted by E|g(X)|r and is
dened by:
( )
Eg X
g x f x , x = x1 , , xk
= x
r
g x ,
f x1 , , xk dx1 dxk .
1 , xk
() ()
) (
iii) For an arbitrary constant c, and n and r as above, the nth moment and rth
absolute moment of g(X) about c are denoted by E[g(X) c]n, E|g(X) c|r,
respectively, and are dened as follows:
[( ) ]
E g X c
g x c f x , x = x1 , , xk
= x
n
g x ,
1 , xk c f x1 , , xk dx1 dxk ,
[() ] () (
) ] (
[(
and
r
g x c f x , x = x1 , , xk
Eg X c = x
r
g x ,
1 , xk c f x1 , , xk dx1 dxk .
( )
()
()
For c = E[g(X)], the moments are called central moments. The 2nd central
moment of g(X), that is,
{(
) [ ( )]}
[ g( x) Eg(X )] f ( x),
E g X E g X
x = x1 , , xk
= x
g x ,
1 , xk Eg X
[(
( )] f ( x , , x )dx
dxk
is called the variance of g(X) and is also denoted by 2[g(X)], or g2( X ), or just
2, if no confusion is possible. The quantity + 2 [ g(X )] = [ g(X )] is called the
standard deviation (s.d.) of g(X) and is also denoted by g(X), or just , if no
confusion is possible. The variance of an r.v. is referred to as the moment of
inertia in Mechanics.
108
( )
E X
n
j
()
x nj f x =
x nj f x1 , , xk
x
( x , ,x )
=
xn f x ,
1 , xk dx1 dxk
j
x nj f j x j
x
=
x n f x dx
j
j j j
1
( )
( )
which is the nth moment of the r.v. Xj. Thus the nth moment of an r.v. X with
p.d.f. f is
( )
E Xn
()
x n f x
= x
x n f x dx.
()
For n = 1, we get
()
xf x
E X = x
xf x dx
( )
()
g( x)f ( x)dx. On the other hand, from the last expression above, E(X) =
g( x)f ( x)dx and set y = g(x). Suppose that g is differentiable and has an
inverse g1, and that some further conditions are met. Then
REMARK 2
1
1
g( x) f ( x)dx = yf [g ( y)] dy g ( y) dy.
109
Therefore the last integral above is equal to yfY ( y)dy, which is consonant
with the denition of E ( X ) = xf ( x) dx. (A justication of the above derivations is given in Theorem 2 of Chapter 9.)
2. For g as above, that is, g(X1, . . . , Xk) = X 1n X kn and n1 = = nj1 =
nj+1 = = nk = 0, nj = 1, and c = E(Xj), we get
1
E X j EX j
x j EX j f x , x = x1 , , xk
= x
x EX n f x ,
1 , xk dx1 dxk
j
j
n
x j EX j f j x j
= xj
x EX n f x dx
j
j
j
j
j
) ()
) (
) ( )
) ( )
which is the nth central moment of the r.v. Xj (or the nth moment of Xj about its
mean).
Thus the nth central moment of an r.v. X with p.d.f. f and mean is
E X EX
=E X
) f (x) = (x ) f (x)
) f (x)dx = (x ) f (x)dx.
x EX
= x
x EX
n
n
E X1 EX1 X k EX k
) ( )
) ( )
x j x j 1 x j n + 1 f j x j
x
E Xj Xj 1 Xj n+1 =
x x 1 x n + 1 f x dx
j
j
j
j
j j
[ (
)]
110
is the nth factorial moment of the r.v. Xj. Thus the nth factorial moment of an
r.v. X with p.d.f. f is
)()
)()
x x 1 x n + 1 f x
E X X 1 X n+1 = x
x x 1 x n + 1 f x dx.
[ (
5.1.2
)]
From the very denition of E[g(X)], the following properties are immediate.
(E1) E(c) = c, where c is a constant.
(E2) E[cg(X)] = cE[g(X)], and, in particular, E(cX) = cE(X) if X is an
r.v.
(E3) E[g(X) + d] = E[g(X)] + d, where d is a constant. In particular,
E(X + d) = E(X) + d if X is an r.v.
(E4) Combining (E2) and (E3), we get E[cg(X) + d] = cE[g(X)] + d,
and, in particular, E(cX + d) = cE(X) + d if X is an r.v.
(E4) E nj = 1 cj g j X = nj = 1 cj E g j X .
( )]
[ ( )]
n
E c j g j X = c j g j x1 , , xk f x1 , , xk dx1 dxk
j =1
j =1
( )
) (
)(
= c j g j x1 , , xk f x1 , , xk dx1 dxk
j =1
n
[ ( )]
= cjE gj X .
j =1
111
(E7) If E(Xn) exists (that is, E|X|n < ) for some n = 2, 3, . . . , then E(Xn)
also exists for all n = 1, 2, . . . with n < n.
[( ) ]
{[ ( ) ] [ ( ) ]}
= E[ g( X) Eg( X)] = [ g( X)].
2 g X +d = E g X +d E g X +d
2
2 g X = E g X Eg X
2
the equality before the last one being true because of (E4).
(V6) 2(X) = E[X(X 1)] + EX (EX)2, if X is an r.v., as is easily seen.
This formula is especially useful in calculating the variance of a
discrete r.v., as is seen below.
Exercises
5.1.1 Verify the details of properties (E1)(E7).
5.1.2 Verify the details of properties (V1)(V5).
5.1.3 For r < r, show that |X|r 1 + |X|r and conclude that if E|X|r < , then
E|X|r for all 0 < r < r.
112
5.1.4 Verify the equality (E[ g( X )] =) g( x) fX ( x)dx = yfY ( y)dy for the
case that X N(0, 1) and Y = g(X) = X2.
5.1.5 For any event A, consider the r.v. X = IA, the indicator of A dened by
IA(s) = 1 for s A and IA(s) = 0 for s Ac, and calculate EXr, r > 0, and also
2(X).
5.1.6 Let X be an r.v. such that
1
P X = c = P X = c = .
2
Calculate EX, 2(X) and show that
P X EX c =
5.1.7
( ).
2 X
c2
ii) For any constant c, show that E(X c)2 = E(X EX)2 + (EX c)2;
ii) Use part (i) to conclude that E(X c)2 is minimum for c = EX.
5.1.8
[ (
)]
[ ( )( )]
E[ X ( X 1)( X 2)( X 3)] = EX 6EX + 11EX
E X X 1 = EX 2 EX ; E X X 1 X 2 = EX 3 3EX 2 + 2EX ;
4
6EX .
0.304
0.287
0.208
0.115
0.061
0.019
0.006
5.1.11
green.
113
+
EX =
,
2
( )
X
2
( )
=
12
5.1.14 Let the r.v. X be distributed as U(, ). Calculate EXn for any positive
integer n.
5.1.15 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is,
f(c x) = f(c + x) for every x).
ii) Then if EX exists, show that EX = c;
ii) If c = 0 and EX2n+1 exists, show that EX2n+1 = 0 (that is, those moments of X
of odd order which exist are all equal to zero).
5.1.16 Refer to Exercise 3.3.13(iv) in Chapter 3 and nd the EX for those s
for which this expectation exists, where X is an r.v. having the distribution in
question.
5.1.17
()
f x =
()
I c ,c x .
c2 ( )
( )]
()
EX = 1 F x dx F x dx;
0
114
ii) Use the interpretation of the denite integral as an area in order to give a
geometric interpretation of EX.
5.1.19
)()
E X c = E X m + 2 c x f x dx;
m
ii) Utilize (i) in order to conclude that E|X c| is minimized for c = m. (Hint:
Consider the two cases that c m and c < m, and in each one split the
integral from to c and c to in order to remove the absolute value.
m
1
2
1
2
ii) Show that EX = 1 + , EX 2 = 1 + , so that
2
1
2 X = 1 + 2 1 + 2 ,
( )
()
Discrete Case
( )
n n1 !
px qn x
( x 1)!(n x)!
(n 1)!
( )( )
p q
= np
!
!
x
n
x
1
1
1
( ) [( ) ( )]
(n 1)! p q( ) = np p + q
= np
( )
x![( n 1) x]!
x =1
x 1
n 1 x 1
x =1
n 1
x= 0
n 1 x
n 1
= np.
5.2
Next,
[ (
115
Expectations
5.1 Moments
and Variances
of Random
of Some
Variables
R.V.s
)]
E X X 1
n
) x! nn! x ! p q
( )
n( n 1)( n 2)!
(
= x( x 1)
p p q
x( x 1)( x 2)![( n 2) ( x 2)]!
(n 2)!
( )( )
= n( n 1) p
p q
( x 2)![(n 2) ( x 2)]!
(n 2)! p q( )
= n( n 1) p
x![( n 2) x]!
= n( n 1) p ( p + q) = n( n 1) p .
= x x1
x= 0
n x
x2
)(
n 2 x2
x=2
n 2 x2
x2
x=2
n 2
n 2 x
x= 0
n 2
That is,
[ (
)] (
)]
( )
E X X 1 = n n 1 p2 .
Hence, by (V6),
( ) [ (
2 X = E X X 1 + EX EX
= n n 1 p 2 + np n 2 p 2
= n p np + np n p = np 1 p = npq.
2
( )
E X = xe
x= 0
x
x
x 1
= xe
= e
x! x =1
x x1 !
x =1 x 1 !
= e e = .
x = 0 x!
= e
Next,
[ (
)]
x
x!
x
x
= 2 e
= 2 .
x x1 x 2 !
x = 0 x!
E X X 1 = x x 1 e
x= 0
= x x 1 e
x=2
)(
116
5.2.2
Continuous Case
2n !
( ) 2( (n)!) ,
E X 2 n+1 = 0, E X 2 n =
n 0.
In particular, then
( )
( )
( )
2
= 1.
2 1!
2 X = E X2 =
E X = 0,
In fact,
E X 2 n+1 =
x 2 n+1 e x
dx.
But
2 n +1 x 2 2
dx = x 2n +1e x 2 dx + x 2n +1e x 2 dx
2
= y 2n +1e y 2 dy + x 2n +1e x 2 dx
2
= x 2n +1e x 2 dx + x 2n +1e x 2 dx = 0.
2
2n
ex
dx = 2 x 2 n e x
dx,
x 2n e x 2 dx = x 2n 1de x
2
= x 2n 1e x
2
0
= 2n 1
+ 2n 1
x 2n 2 e x 2 dx
x 2n 2 e x 2 dx,
2
( )
= (2 n 3)m
m2 n = 2 n 1 m2 n 2 , and similarly,
m 2 n 2
M
m2 = 1 m0
2 n 4
( ) () )
m0 = 1 since m0 = E X 0 = E 1 = 1 .
Multiplying them out, we obtain
5.2
Expectations
5.1 Moments
and Variances
of Random
of Some
Variables
R.V.s
117
)(
)
1 2 ( 2 n 3)( 2 n 2)( 2 n 1)( 2 n)
(2 n)!
=
=
2 ( 2 n 2)( 2 n)
(2 1) [2( n 1)](2 n)
(2 n)!
(2 n)! .
=
=
2 [1 ( n 1) n] 2 ( n!)
m2 n = 2 n 1 2 n 3 1
REMARK 4
X
2 X
E
= 0,
= 1.
But
X 1
E
= EX .
( )
Hence
1
E X = 0,
( )
X
1 2
2
= 2 X
( )
and then
( )
1 2
X = 1,
2
so that 2(X) = 2.
2. Let X be Gamma with parameters and . Then E(X) = and
2(X) = 2. In fact,
( )
EX =
( )
( )
xx 1 e x dx =
x de x =
( )
( )
x e x dx
x e x x 1 e x dx
0
0
( )
x 1 e x dx = .
118
Next,
( ) (1)
E X2 =
x +1 e x dx = 2 + 1
and hence
( )
2 X = 2 + 1 2 2 = 2 + 1 = 2 .
REMARK 5
I=
x dx
2 + x
I=
( )
1 x dx
1 1 d x
=
1 + x 2 2 1 + x 2
2
1
1 1 d 1+ x
=
log 1 + x 2
=
2
2
1+ x
1
=
,
2
I * = lim
1 A xdx
1
=
lim log 1 + x 2
2 A
1+ x
[ (
)]
A
A
1
lim log 1 + A 2 log 1 + A 2 = 0.
2 A
119
Thus the mean of the Cauchy exists in terms of principal value, but not in the
sense of our denition which requires absolute convergence of the improper
Riemann integral involved.
Exercises
5.2.1 If X is an r.v. distributed as B(n, p), calculate the kth factorial moment
E[X(X 1) (X k + 1)].
5.2.2 An honest coin is tossed independently n times and let X be the r.v.
denoting the number of Hs that occur.
iii) Calculate E(X/n), 2(X/n);
iii) If n = 100, nd a lower bound for the probability that the observed
frequency X/n does not differ from 0.5 by more than 0.1;
iii) Determine the smallest value of n for which the probability that X/n does
not differ from 0.5 by more 0.1 is at least 0.95;
iv) If n = 50 and P(|(X/n) 0.5| < c) 0.9, determine the constant c. (Hint: In
(ii)(iv), utilize Tchebichevs inequality.)
5.2.3 Refer to Exercise 3.2.16 in Chapter 3 and suppose that 100 people are
chosen at random. Find the expected number of people with blood of each one
of the four types and the variance about these numbers.
5.2.4 If X is an r.v. distributed as P(), calculate the kth factorial moment
E[X(X 1) (X k + 1)].
5.2.5 Refer to Exercise 3.2.7 in Chapter 3 and nd the expected number of
particles to reach the portion of space under consideration there during time
t and the variance about this number.
5.2.6 If X is an r.v. with a Hypergeometric distribution, use an approach
similar to the one used in the Binomial example in order to show that
EX =
mr
,
m+n
( )
2 X =
(
) .
(m + n) (m + n 1)
mnr m + n r
2
120
x
.
x!
e
f ( x)dx =
x= 0
Conclude that in this case, one may utilize the Incomplete Gamma tables (see,
for example, Tables of the Incomplete -Function, Cambridge University
Press, 1957, Karl Paerson, editor) in order to evaluate the d.f. of a Poisson
distribution at the points j = 1, 2, . . . .
5.2.9 Refer to Exercise 3.3.7 in Chapter 3 and suppose that each TV tube
costs $7 and that it sells for $11. Suppose further that the manufacturer sells an
item on money-back guarantee terms if the lifetime of the tube is less than c.
ii) Express his expected gain (or loss) in terms of c and ;
ii) For what value of c will he break even?
5.2.10 Refer to Exercise 4.1.12 in Chapter 4 and suppose that each bulb costs
30 cents and sells for 50 cents. Furthermore, suppose that a bulb is sold under
the following terms: The entire amount is refunded if its lifetime is <1,000 and
50% of the amount is refunded if its lifetime is <2,000. Compute the expected
gain (or loss) of the dealer.
5.2.11
then
( ) ( ),
( ) ( + + n)
+ +n
n = 1, 2, ;
X
1 = E
.
121
the right and if 1 < 0, the distribution is said to be skewed to the left. Then show
that:
iii) If the p.d.f. of X is symmetric about , then 1 = 0;
iii) The Binomial distribution B(n, p) is skewed to the right for p <
skewed to the left for p > 12 ;
1
2
and is
iii) The Poisson distribution P() and the Negative Exponential distribution
are always skewed to the right.
5.2.16
Let X be an r.v. with EX4 < and dene the (pure number) 2 by
4
X
2 = E
3,
( )
where = EX , 2 = 2 X .
()
G t = pj t j ,
1 t 1.
j =0
[ (
)]
) (
E X X 1 X k +1 =
dk
Gt
dt k
()
t =1
j 0, 0 < p < 1, q = 1 p
and
j
,
e
j!
j 0, > 0;
122
iv) Utilize (ii) and (iii) in order to calculate the kth factorial moments of X
being B(n, p) and X being P(). Compare the results with those found in
Exercises 5.2.1 and 5.2.4, respectively.
(
(
)
)
x f x x
2
2 1
E X 2 X1 = x1 = x
x f x x dx ,
2 1
2
2
x 2 E X 2 X1 = x1
2 X 2 X1 = x1 = x
x 2 E X 2 X1 = x1
[
[
)] f (x x )
)] f (x x )dx .
For example, if (X1, X2) has the Bivariate Normal distribution, then f(x2 |x1)
is the p.d.f. of an N(b, 22 (1 2)) r.v., where
b = 2 +
2
x1 1 .
1
Hence
2
x1 1 .
1
1
x2 2 .
2
E X 2 X1 = x1 = 2 +
Similarly,
E X1 X 2 = x 2 = 1 +
Let X1, X2 be two r.v.s with joint p.d.f f(x1, x2). We just gave the denition
of E(X2|X1 = x1) for all x1 for which f(x2|x1) is dened; that is, for all x1 for which
fX (x1) > 0. Then E(X2|X1 = x1) is a function of x1. Replacing x1 by X1 and writing
E(X2|X1) instead of E(X2|X1 = x1), we then have that E(X2|X1 ) is itself an r.v.,
and a function of X1. Then we may talk about the E[E(X2|X1)]. In connection
with this, we have the following properties:
1
5.3
Conditional
5.1 Moments of Random Variables
123
[(
)]
( )
E E X 2 X1 = x 2 f x 2 x1 dx 2 fX x1 dx1
) ( )
x 2 f x 2 x1 fX x1 dx 2 dx1
1
x 2 f x1 , x 2 dx 2 dx1 =
x 2 f x1 , x 2 dx1 dx 2
( )
( )
= x 2 f x1 , x 2 dx1 dx 2 = x 2 fX x 2 dx 2 = E X 2 .
Note that here all interchanges of order of integration are legitimate because of the absolute convergence of the integrals involved.
REMARK 7
[ ( )
] ( )(
E X 2 g X1 X1 = x1 = g x1 E X 2 X1 = x1
or
[ ( ) ] ( )(
E X 2 g X1 X1 = g X1 E X 2 X1 .
Again, restricting ourselves to the continuous case, we have
[ ( )
( )(
( )
E X 2 g X1 X1 = x1 = x 2 g x1 f x 2 x1 dx 2 = g x1
( ) (
x 2 f x 2 x1 dx 2
= g x1 E X 2 X1 = x1 .
In particular, by taking X2 = 1, we get
(CE2) For all x1 for which the conditional expectations below exist, we
have E[g(X1)|X1 = x1] = g(x1) (or E[g(X1)|X1] = g(X1)).
(CV) Provided the quantities which appear below exist, we have
[(
)]
( )
2 E X 2 X1 2 X 2
( ) ( )
= E X 2 , X1 = E X 2 X1 .
124
Then
( )
) = E{[X (X )] + [ (X ) ]}
= E[ X ( X )] + E[ ( X ) ] + 2 E{[ X ( X )][ ( X ) ]}.
(
2 X2 = E X2
Next,
{[
( )][ ( ) ]}
= E[ X ( X )] E[ ( X )] E( X ) + E[ ( X )]
= E{E[ X ( X ) X ]} E[ ( X )] E[ E( X X )]
+ E[ ( X )]
(by (CE1)),
E X 2 X1 X1
2
[ ( )] [ ( )]
[ ( )]
[ ( )] (by (CE2)),
E 2 X1 E 2 X1 E X1 + E X1
which is 0. Therefore
( )
( )]
2 X 2 = E X 2 X1
[( ) ]
+ E X1 ,
and since
( )]
E X 2 X1
we have
( ) [( )
0,
[(
)]
2 X 2 E X1 2 = 2 E X 2 X1 .
The inequality is strict unless
( )]
E X 2 X1
= 0.
But
( )]
E X 2 X1
( )]
( )]
= 2 X 2 X1 , since E X 2 X1 = = 0.
Exercises
5.3.1
5.3.2
5.4
125
f x, y =
2
n n+1
[( )( )
] ()[( )
E h X gY X=x =h x E gY X=x.
E g ( X)
[ ( ) ] [ c ].
Pg X c
PROOF
[ ( )]
)(
E g X = g x1 , , xk f x1 , , xk dx1 dxk
(
)(
f ( x , , x )dx
= g x1 , , xk f x1 , , xk dx1 dxk + c g x1 , , xk
A
A
1
dxk ,
126
[ ( )]
)(
)
c f ( x , , x )dx dx
= cP[ g(X ) A] = cP[ g(X ) c].
E g X g x1 , , xk f x1 , , xk dx1 dxk
A
5.4.1
EX
r
P X c = P X c r
.
cr
EX
P X c = P X c 2
c2
( )=
2 X
c
2
2
P X k
1
1
; equivalently, P X < k 1 2 .
2
k
k
LEMMA 1
( )
( )
E X = E Y = 0,
( )
( )
2 X = 2 Y = 1.
Then
( )
( )
E 2 XY 1 or, equivalently, 1 E XY 1,
and
( )
E( XY ) = 1
E XY = 1
PROOF
(
)
P(Y = X ) = 1.
P Y = X = 1,
if any only if
if any only if
We have
0 E X Y
= E X 2 2 XY + Y 2
( )
( )
= EX 2 2 E XY + EY 2 = 2 2 E XY
5.4
127
and
0 E X +Y
= E X 2 + 2 XY + Y 2
( )
( )
= EX + 2 E XY + EY 2 = 2 + 2 E XY .
2
) [E(X Y )] = E(X Y )
2 E( XY ) + EY = 1 2 + 1 = 0,
2 X Y = E X Y
= EX 2
P X = Y = 1.
THEOREM 2
[(
)]
)(
E 2 X 1 Y 2 12 22 ,
or, equivalently,
[(
)]
)(
1 2 E X 1 Y 2 1 2 ,
and
[(
)(
E X 1 Y 2
)] =
1
if and only if
P Y = 2 + 2 X 1 = 1
and
[(
)(
E X 1 Y 2
)] =
1
if and only if
P Y = 2 2 X 1 = 1.
1
PROOF
Set
X1 =
X 1
,
1
Y1 =
Y 2
.
2
128
E 2 X1Y1 1
if and only if
1 E X1Y1 1
becomes
[(
)(
E 2 X 1 Y 2
2
1
2
2
)] 1
if and only if
[(
)]
)(
1 2 E X 1 Y 2 1 2 .
The second half of the conclusion follows similarly and will be left as an
exercise (see Exercise 5.4.6).
A more familiar form of the CauchySchwarz inequality is
E2(XY) (EX2)(EY2). This is established as follows: Since the inequality is
trivially true if either one of EX2, EY2 is , suppose that they are both nite
and set Z = X Y, where is a real number. Then 0 EZ2 = (EX2)2
2[E(XY)] + EY2 for all , which happens if and only if E2(XY) (EX2)(EY2)
0 (by the discriminant test for quadratic equations), or E2(XY) (EX2)(EY2).
REMARK 9
Exercises
5.4.1
5.4.2 Let g be a (measurable) function dened on into (0, ). Then, for any
r.v. X and any > 0,
[( ) ]
P g X
( ).
Eg X
If furthermore g is even (that is, g(x) = g(x)) and nondecreasing for x 0, then
P X
( ).
g ( )
Eg X
5.4.3 For an r.v. X with EX = and 2(X) = 2, both nite, use Tchebichevs
inequality in order to nd a lower bound for the probability P(|X | < k).
Compare the lower bounds for k = 1, 2, 3 with the respective probabilities
when X N(, 2).
5.5
Covariance, Correlation
5.1 Coefcient
Moments of
and
Random
Its Interpretation
Variables
129
2
5.4.4 Let X be an r.v. distributed as 40
. Use Tchebichevs inequality
in order to nd a lower bound for the probability P(|(X/40) 1| 0.5),
and compare this bound with the exact value found from Table 3 in Appendix
III.
5.4.5 Refer to Remark 8 and show that if X is an r.v. with EX = (nite) such
that P(|X | c) = 0 for every c > 0, then P(X = ) = 1.
5.4.6 Prove the second conclusion of Theorem 2.
5.4.7 For any r.v. X, use the CauchySchwarz inequality in order to show
that E|X| E1/2X2.
[(
)(
X 1 Y 2 E X 1 Y 2
= E
=
1 2
1 2
E XY 1 2
.
=
1 2
( )
)] = Cov( X , Y )
1 2
2
X 1
1
Y = 2
2
X 1
1
130
Y
2
2
1
1
y 2
2
1
(x 1 )
y 2
2
1
(x 1 )
2
2
2
1
1
1
0
Figure 5.1
For 1 < < 1, 0, we say that X and Y are correlated (positively if > 0,
negatively if < 0). Positive values of may indicate that there is a tendency of
large values of Y to correspond to large values of X and small values of Y to
correspond to small values of X. Negative values of may indicate that small
values of Y correspond to large values of X and large values of Y to small
values of X. Values of close to zero may also indicate that these tendencies
are weak, while values of close to 1 may indicate that the tendencies are
strong.
The following elaboration sheds more light on the intuitive interpretation of the correlation coefcient (= (X, Y)) as a measure of co-linearity
of the r.v.s X and Y. To this end, for > 0, consider the line y = 2 + ( x 1 )
in the xy-plane and let D be the distance of the (random) point (X, Y)
from the above line. Recalling that the distance of the point (x0, y0) from the
a 2 + b 2 . we have in the present
line ax + by + c = 0 is given by ax0 + by0 + c
case:
2
D= X
since here a = 1, b =
1
Y + 1 2 1
2
2
1+
12
,
22
and c = 1 2 1 . Thus,
2
2
D = X 1 Y + 1 2 1
2
2
12
1
+
,
22
and we wish to evaluate the expected squared distance of (X, Y) from the line
y = 2 + ( x 1 ) ; that is, ED2. Carrying out the calculations, we nd
2
2
1
+ 22 D 2 = 22 X 2 + 12Y 2 2 1 2 XY + 2 2 1 2 2 1 X
) (
2 1 1 2 2 1 Y + 1 2 2 1 .
(1)
5.5
Covariance, Correlation
5.1 Coefcient
Moments of
and
Random
Its Interpretation
Variables
EX 2 = 12 + 12 , EY 2 = 22 + 22
131
( )
and E XY = 1 2 + 1 2 ,
we obtain
ED2 =
2 12 22
1
12 + 22
( > 0).
(2)
ED2 =
2 12 22
1+
12 + 22
( < 0).
(3)
ED2 =
2 12 22
1 .
12 + 22
(4)
132
D1 = AX 1 + BY1 + C1
noticing that
we obtain
ED12 = 1 +
2 AB
C2
+
.
A2 + B2 A2 + B2
(5)
2 AB
.
A2 + B2
(6)
A+ B
= A2 + B2 2 AB 0 A2 + B2 2 AB = A B ,
or equivalently,
1
2 AB
1.
A2 + B2
(7)
133
Exercises
5.5.1 Let X be an r.v. taking on the values 2, 1, 1, 2 each with probability
1 . Set Y = X2 and compute the following quantities: EX, 2(X), EY, 2(Y),
4
(X, Y).
5.5.2 Go through the details required in establishing relations (2), (3) and
(4).
5.5.3 Do likewise in establishing relation (5).
5.5.4 Refer to Exercise 5.3.2 (including the hint given there) and
iii) Calculate the covariance and the correlation coefcient of the r.v.s X and
Y;
iii) Referring to relation (4), calculate the expected squared distance of (X,
Y) from the appropriate line y = 2 + ( x 1 ) or y = 2 ( x 1 )
(which one?);
2
iii) What is the minimum expected squared distance of (X1, Y1) from
the appropriate line y = x or y = x (which one?) where X 1 = X and
Y1 = Y .
1
) .
n n+1
x = 2
x =1
5.5.5 Refer to Exercise 5.3.2 and calculate the covariance and the correlation coefcient of the r.v.s X and Y.
5.5.6 Do the same in reference to Exercise 5.3.3.
5.5.7 Repeat the same in reference to Exercise 5.3.4.
5.5.8 Show that (aX + b, cY + d) = sgn(ac)(X, Y), where a, b, c, d are
constants and sgn x is 1 if x > 0 and is 1 if x < 0.
5.5.9 Let X and Y be r.v.s representing temperatures in two localities, A and
B, say, given in the Celsius scale, and let U and V be the respective temperatures in the Fahrenheit scale. Then it is known that U and X are related as
follows: U = 59 X + 32, and likewise for V and Y. Fit this example in the model
of Exercise 5.5.8, and conclude that the correlation coefcients of X, Y and U,
V are identical, as one would expect.
5.5.10 Consider the jointly distributed r.v.s X, Y with nite second moments
and 2(X) > 0. Then show that the values and for which E[Y (X + )]2
is minimized are given by
134
= EY EX ,
( ) X, Y .
( )
(X )
Y
2 1 2
(x
=
2
x 2
x 1 1 x 2 2 x 2 2
1
1
+
2
1
1 2 2
1
2
2
1
(x
2 2 1 2
where b = 2 +
2
x 1 1 .
1
5.6* Justication
5.1 Moments
of Relation
of Random
(2) in Chapter
Variables
2
135
1 if s A
IA s =
c
0 if s A .
()
I I Aj = I A
n
j=1
j =1
(8)
n
j=1
Aj = I A ,
(9)
j =1
and, in particular,
IA = 1 IA.
(10)
Clearly,
( )
( )
E IA = P A
(11)
(1 X )(1 X ) (1 X ) = 1 H
1
( )
+ H 2 + 1 H r ,
(12)
where Hj stands for the sum of the products Xi Xi , where the summation
extends over all subsets {i1, i2, . . . , ij} of the set {1, 2, . . . , r}, j = 1, . . . , r. Let
, be such that: 0 < , and + r. Then the following is true:
1
Xi
j
+
X i H J =
H + ,
( )
(13)
where J = {i1, . . . , i} is the typical member of all subsets of size of the set
{1, 2, . . . , r}, H(J) is the sum of the products Xj Xj , where the summation
extends over all subsets of size of the set {1, . . . , r} J, and J is meant to
extend over all subsets J of size of the set {1, 2, . . . , r}.
The justication of (13) is as follows: In forming H+, we select ( + ) Xs
from the available r Xs in all possible ways, which is ( +r ). On the other hand,
for each choice of J, there are ( r ) ways of choosing Xs from the remaining
(r ) Xs. Since there are (r ) choices of J, we get (r )( r ) groups (products)
of ( + ) Xs out of r Xs. The number of different groups of ( + ) Xs out
1
136
(
) (
)(
r !
r!
r r
! r ! ! r !
=
r!
r
r !
+
!
+
=
( + )! = + .
! !
m +1
Jm
where the summation extends over all choices of subsets Jm = {i1, . . . , im} of the
set {1, 2, . . . , M} and Bm is the one used in Theorem 9, Chapter 2. Hence
IB = IA
i1
Jm
= IA IA 1 IA
i1
Jm
im
im + 1
( )
AiM
( )
( )
= I A I A 1 H1 J m + H 2 J m + 1
J
i1
im
M m
( )
H M m J m
(by (12)).
Since
IA
Jm
i1
m + k
I A H k Jm =
H m+ k
m
im
( )
(by (13)),
we have
m + 1
m + 2
I B = Hm
H m+1 +
H m+ 2 + 1
m
m
m
( )
M m
M
HM .
m
Taking expectations of both sides, we get (from (11) and the denition of Sr in
Theorem 9, Chapter 2)
m + 1
m + 2
P Bm = S m
S m+1 +
S m+ 2 + 1
m
m
( )
as was to be proved.
( )
M m
M
SM ,
m
5.6* Justication
5.1 Moments
of Relation
of Random
(2) in Chapter
Variables
2
137
(For the proof just completed, also see pp. 8085 in E. Parzens book
Modern Probability Theory and Its Applications published by Wiley, 1960.)
REMARK 10 In measure theory the quantity IA is sometimes called the characteristic function of the set A and is usually denoted by A. In probability
theory the term characteristic function is reserved for a different concept and
will be a major topic of the next chapter.
138
Chapter 6
Characteristic Functions,
Moment Generating Functions
and Related Theorems
6.1 Preliminaries
The main subject matter of this chapter is the introduction of the concept of
the characteristic function of an r.v. and the discussion of its main properties.
The characteristic function is a powerful mathematical tool, which is used
protably for probabilistic purposes, such as producing the moments of an r.v.,
recovering its distribution, establishing limit theorems, etc. To this end, recall
that for z , eiz = cos z + i sin z, i = 1 , and in what follows, i may
be treated formally as a real number, subject to its usual properties: i2 = 1,
i3 = i, i 4 = 1, i 5 = i, etc.
The sequence of lemmas below will be used to justify the theorems which
follow, as well as in other cases in subsequent chapters. A brief justication for
some of them is also presented, and relevant references are given at the end of
this section.
LEMMA A
( )
( )
g1 x j g 2 x j ,
j = 1, 2, ,
PROOF
LEMMA A
Let g1, g2 : [0, ) be such that g1(x) g2(x), x , and that a g1 (x)dx
exists for every a, b, with a < b, and that g 2 (x)dx < . Then
g1(x)dx
< .
b
PROOF
138
6.5
LEMMA B
139
Let g : {x1, x2, . . .} and xj|g(xj)| < . Then xj g(xj) also converges.
The result is immediate for nite sums, and it follows by taking the
limits of partial sums, which satisfy the inequality.
PROOF
LEMMA B
Let g : be such that a g(x)dx exists for every a, b, with a < b, and that
PROOF
E(X), or
probability 1) and E(Y) < . Then E(X) exists and E(Xn) n
equivalently,
( )
lim E X n = E lim X n .
n
REMARK 1
The next lemma gives conditions under which the operations of differentiation and taking expectations commute.
LEMMA D
X s; t Y s ,
t
( )
()
s S , t T .
Then
d
E X ; t = E X ; t , for all t T .
dt
t
[ ( )]
( )
The proofs of the above lemmas can be found in any book on real variables theory, although the last two will be stated in terms of weighting functions rather than expectations; for example, see Advanced Calculus, Theorem
2, p. 285, Theorem 7, p. 292, by D. V. Widder, Prentice-Hall, 1947; Real
Analysis, Theorem 7.1, p. 146, by E. J. McShane and T. A. Botts, Van
Nostrand, 1959; The Theory of Lebesgue Measure and Integration, pp. 6667,
by S. Hartman and J. Mikusinski, Pergamon Press, 1961. Also Mathematical
Methods of Statistics, pp. 4546 and pp. 6668, by H. Cramr, Princeton
University Press, 1961.
140
[ ]
()
X t = E e itX
[ ( ) ( ) ( ) ( )]
()
[ ( ) ( ) ( ) ( )]
[ ( ) ( )] [ ( ) ( )]
()
x
= x
itx
e f x dx =
cos tx f x + i sin tx f x dx
cos tx f x + i sin tx f x
x
= x
( )()
( )()
By Lemmas A, A, B, B, X(t) exists for all t . The ch.f. X is also called the
Fourier transform of f.
The following theorem summarizes the basic properties of a ch.f.
THEOREM 1
ii)
iii)
iv)
v)
vi)
vii)
dn
X t
dt n
()
PROOF
()
iii) X t + h X t = E e
( )
i t+ h X
[ (
e itX
)]
()
141
h0
h0
provided we can interchange the order of lim and E, which here can be
done by Lemma C. We observe that uniformity holds since the last expression on the right is independent of t.
iv) X+d(t) = Eeit(X+d) = E(eitXeitd) = eitd EeitX = eitd X(t).
v) cX(t) = Eeit(cX) = Eei(ct)X = X(ct).
vi) Follows trivially from (iv) and (v).
vii)
dn
dn
t = n Ee itX = E n e itX = E i n X n e itX ,
n X
dt
dt
t
()
()
( )
= inE X n .
t =0
REMARK
2
n
(i)
n d
dt n
In the following result, the ch.f. serves the purpose of recovering the
distribution of an r.v. by way of its ch.f.
THEOREM 2
(Inversion formula) Let X be an r.v. with p.d.f. f and ch.f. . Then if X is of the
discrete type, taking on the (distinct) values xj, j 1, one has
( )
i) f x j = lim
1
2T
itx j
()
t dt , j 1.
142
()
1 e ith itx
e t dt
T
ith
1
h0 T 2
()
1 itx
e (t)dt.
2
1
2T
itx j
itx
e itx f x k dt
e
k
it
x
x
1
(
) f x dt
=
e
k
2T T k
1 T it( x x )
e
dt
= f xk
2T T
k
()
t dt =
1
2T
( )
( )
( )
(the interchange of the integral and summations is valid here). That is,
1
2T
T e
itx j
()
( ) 21T
t dt = f x k
k
it x k x j
) dt .
(1)
But
it x k x j
) dt =
[cos t(x
(
= cos t ( x
) dt = cos t x
(
)]
x j + i sin t x k x j dt
)
(
)
x )dt , since sin z is an odd function. That is,
x )dt .
(2)
T
= cos t x k x j dt + i sin t x k x j dt
T
it x k x j
cos t x k x j dt =
=
=
1
xk x j
d sin t x k x j
[ (
sin T x k x j sin T x k x j
xk x j
2 sin T x k x j
xk x j
).
)]
143
Therefore,
1
2T
1,
(
e
it x k x j
) dt = sin T x x
( k j)
T x k x j
if
xk = x j
if
xk x j.
(3)
sinT ( x x )
x 1 x , a constant independent of T, and therefore, for
But
x x
sin T ( x x )
xk xj, lim T ( x x ) = 0 , so that
T
k
k
j
1
T 2T
lim
it x k x j
) dt = 1,
0,
if
xk = x j
if
xk x j.
(4)
lim
1
2T
T e
itx j
()
( ) 21T
t dt = lim f xk
T
( )
1
T 2T
= f xk lim
k
T e
dt
dt
it xk x j
it xk x j
( )
= f x j + lim
k j
1
2T
T e
it x k x j
) dt = f x ,
( j)
as was to be seen
ii) (ii) Strictly speaking, (ii) follows from (ii). We are going to omit (ii)
entirely and attempt to give a rough justication of (ii). The assumption that
itx
| (t )|dt < implies that e (t)dt exists, and is taken as follows for every
arbitrary but xed x :
()
()
(0 < )T T
(5)
But
()
()
T
e itx t dt = e itx e ity f y dydt
T
T
T
T
it ( y x )
it ( y x )
= e
f y dy dt = f y e
dt dy,
T
T
()
()
(6)
where the interchange of the order of integration is legitimate here. Since the
integral (with respect to y) is zero over a single point, we may assume in the
sequel that y x. Then
T
it y x
dt =
2 sin T y x
yx
),
(7)
144
itx
()
()
t dt = 2 lim f y
T
sin T y x
yx
) dy.
(8)
z sin z
e itx t dt = 2 lim f x +
dz
T
T z
()
()
()
= 2 f x = 2f x ,
by taking the
limit under the integral sign, and by using continuity of f and the
()
f x =
1
2
()
e itx t dt ,
as asserted.
EXAMPLE 1
Let X be B(n, p). In the next section, it will be seen that X(t) = (peit + q)n. Let
us apply (i) to this expression. First of all, we have
1
2T
()
( pe
e itx t dt =
1
2T
1
2T
r ( pe
1
2T
r p q
1
2T
r p q
+ q e itx dt
it
r = 0
n r
n r
e itx dt
)q
n r
r= 0
n
it
i r x t
dt
i( r x )
r= 0
r x
i r x t
i r x dt
1 n x n x T
p q T dt
2T x
i ( r x )T
i ( r x )T
n
n
e
e
1 n x n x
= p r qn r
+
p q 2T
T x
2
2Ti r x
r= 0 r
r x
n
sin r x T n x n x
n
+ p q .
= p r qn r
x
rx T
r= 0 r
r x
()
6.5
145
(One could also use (i) for calculating f(x), since is, clearly, periodic with
period 2.)
EXAMPLE 2
For an example of the continuous type, let X be N(0, 1). In the next section, we
2
2
will see that X(t) = et /2. Since |(t)| = et /2, we know that | (t )|dt < , so that
(ii) applies. Thus we have
( )
f x =
1
2
()
e itx t dt =
1
2
e itx e t
dt
1 ( 1 2 )( t + 2 itx )
1 ( 1 2 ) t + 2 t ( ix ) + ( ix ) ( 1 2 )( ix )
e
dt
=
e
e
dt
2
2
( 1 2 )x
( 1 2 )x
e
1
e
1
( 1 2 )( t + ix )
( 1 2 )u
dt
=
e
du
=
e
2
2
2
2
2
( )
1 2 x2
1 =
1
2
e x
as was to be shown.
THEOREM 3
PROOF The p.d.f. of an r.v. determines its ch.f. through the denition of the
ch.f. The converse, which is the involved part of the theorem, follows from
Theorem 2.
Exercises
6.2.1 Show that for any r.v. X and every t , one has |EeitX| E|eitX|(= 1).
(Hint: If z = a + ib, a, b , recall that z = a 2 + b 2 . Also use Exercise 5.4.7
in Chapter 5 in order to conclude that (EY)2 EY2 for any r.v. Y.)
6.2.2 Write out detailed proofs for parts (iii) and (vii) of Theorem 1 and
justify the use of Lemmas C, D.
6.2.3 For any r.v. X with ch.f. X, show that X(t) = X(t), t , where the bar
over X denotes conjugate, that is, if z = a + ib, a, b , then z = a ib.
6.2.4 Show that the ch.f. X of an r.v. X is real if and only if the p.d.f. fX of X
is symmetric about 0 (that is, fX(x) = fX(x), x ). (Hint: If X is real, then the
conclusion is reached by means of the previous exercise and Theorem 2. If fX
is symmetric, show that fX(x) = fX(x), x .)
146
6.2.5 Let X be an r.v. with p.d.f. f and ch.f. given by: (t) = 1 |t| if |t| 1
and (t) = 0 if |t| > 1. Use the appropriate inversion formula to nd f.
6.2.6 Consider the r.v. X with ch.f. (t) = e|t|, t , and utilize Theorem 2(ii)
in order to determine the p.d.f. of X.
6.3.1
Discrete Case
( )
()
Hence
()
d
X t
dt
= n pe it + q
t=0
n 1
= inp,
ipe it
t=0
()
= inp
t=0
d it
pe + q
dt
n 1
)(
= inp n 1 pe it + q
[( ) ]
e it
t= 0
n 2
ipe it e it + pe it + q
n 1
[( ) ]
ie it
t= 0
( )
= i 2 np n 1 p + 1 = np n 1 p + 1 = E X 2 ,
so that
[( )
( )
( )
( ) ( )
E X 2 = np n 1 p + 1 and 2 X = E X 2 EX
= n 2 p 2 np 2 + np n 2 p 2 = np 1 p = npq;
()
X t = e itx e
x= 0
( )
it
e
x
=e
x!
x!
x= 0
= e e e = e e
it
it
147
Hence
()
d
X t
dt
t=0
= e e
it
ie it
t= 0
= i ,
()
d
ie e e
dt
=
t=0
= ie
= ie
d e
e
dt
e
it
it
eit +it
+it
t=0
+it
t= 0
e it i + i
t= 0
( )
= i ( + 1) = ( + 1) = E ( X ),
= ie
e i + i
so that
( ) ( )
( )
2 X = E X 2 EX
= + 1 2 = ;
()
()
( )
( )
()
( X ) t = (1 ) X ( ) t = e it X t , and X t = e it ( X ) t .
So it sufces to nd the ch.f. of an N(0, 1) r.v. Y, say. Now
()
Y t =
=
1
2
1
2
e
e t
ity
e y
dy =
y it
1
2
dy = e t
y 2 2 ity
) 2 dy
2t 2
X t = exp it
.
2
()
148
Hence
()
d
X t
dt
2t 2
2
= exp it
i t
2
t =0
d2
X t
dt 2
()
( )
= i , so that E X = .
t =0
2t 2
2
= exp it
i t
2
t =0
2t 2
2 exp it
2
t =0
= i2 2 2 = i2 2 + 2 .
()
X t =
( ) 0
( ) 0
e itx x 1 e x dx =
x 1 e
x 1 it
dx.
y
dy
, dx =
,
1 it
1 it
y 0, .
y
( ) 0 (1 it ) 1
= 1 it
1 y
( ) 0
dy
1 it
y 1e y dy = 1 it
Therefore
()
d
X t
dt
=
t =0
(1 it )
= i ,
+1
t =0
=i
t =0
)
(1 it )
+1 2
= i 2 + 1 2 ,
+2
t =0
6.5
() (
X t = 1 2it
r 2
it
, X t = 1
()
149
,
it
respectively.
3. Let X be Cauchy distributed with = 0 and = 1. Then X(t) = e|t|. In
fact,
()
( )
1 1
1 cos tx
dx =
dx
2
1+ x
1 + x 2
i sin tx
2 cos tx
dx =
dx
+
2
0 1 + x2
1+ x
X t = e itx
( )
( )
because
( ) dx = 0,
sin tx
1 + x2
since sin(tx) is an odd function, and cos(tx) is an even function. Further, it can
be shown by complex variables theory that
( ) dx = e
cos tx
1 + x2
Hence
()
X t = e
Now
()
d
d t
X t = e
dt
dt
does not exist for t = 0. This is consistent with the fact of nonexistence of E(X ),
as has been seen in Chapter 5.
Exercises
6.3.1 Let X be an r.v. with p.d.f. f given in Exercise 3.2.13 of Chapter 3.
Derive its ch.f. , and calculate EX, E[X(X 1)], 2(X), provided they are
nite.
6.3.2 Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3.
Derive its ch.f. , and calculate EX, E[X(X 1)], 2(X), provided they are
nite.
6.3.3 Let X be an r.v. with p.d.f. f given by f(x) = e(x) I(,)(x). Find its ch.f.
, and calculate EX, 2(X), provided they are nite.
150
()
t =
pr
1 qe it
()
t =
e it e it
;
it
( )2
12
6.3.6 Consider the r.v. X with p.d.f. f given in Exercise 3.3.14(ii) of Chapter
3, and by using the ch.f. of X, calculate EX n, n = 1, 2, . . . , provided they are
nite.
X , ,
1
Xk
(t , , t ) = E[e
1
it 1 X 1 +it2 X 2 + + itk X k
],
t j ,
6.4
151
(t1, . . . , tk).
Xk
(c1t1, . . . , cktk).
Xk
it1d1 + + itkdk
vii) If the absolute (n1, . . . , nk)-joint moment, as well as all lower order joint
moments of X1, . . . , Xk are nite, then
n + +n
X , ,
t tn t tn
1
Xk
(t , , t )
1
= i
k
j=1 n j
E X 1n X kn ,
1
t 1 = =tk = 0
and, in particular,
n
X , ,
t nj
1
Xk
(t , , t )
( )
= i n E X nj ,
t 1 = =tk = 0
j = 1, 2, , k.
viii) If in the X1, . . . , Xk(t1, . . . , tk) we set tj1 = = tjn = 0, then the resulting
expression is the joint ch.f. of the r.v.s Xi1, . . . , Xim, where the js and the
is are different and m + n = k.
Multidimensional versions of Theorem 2 and Theorem 3 also hold true.
We give their formulations below.
THEOREM 2
(Inversion formula) Let X = (X1, . . . , Xk) be an r. vector with p.d.f. f and ch.f.
. Then
ii) f X , ,
1
Xk
1
x1 , , xk = lim
T 2T
X , ,
1
Xk
k
T
e it x it x
1 1
k k
(t , , t )dt
1
dt k ,
Xk
(x , , x )
1
1
= lim lim
h 0 T 2
1 e it j h
it j h
j =1
k
T
e it1x1 itk xk X 1 , ,
Xk
(t , , t )dt
1
dt k ,
if X is of the continuous type, with the analog of (ii) holding if the integral
of |X1, . . . , Xk(t1, . . . , tk)| is nite.
THEOREM 3
152
6.4.1
n!
p1x pkx .
x1! xk !
P X 1 = x1 , , X k = xk =
Then
X , ,
1
Xk
(t , , t ) = ( p e
1
it 1
+ + pk e it
).
n
In fact,
X , ,
Xk
(t , , t ) =
1
e it x + +it x
1 1
k k
x 1 , , xk
x 1 , , xk
n!
p1x pkx
x1! xk !
1
n!
p1e it
x1! xk !
= p1e it + + pk e it
1
x1
pk e it
xk
).
n
Hence
k
X , ,
t1 t k
1
Xk
(t , , t )
k
t 1 = = tk = 0
) (
= i k n n 1 n k + 1 p1 p2 pk .
= n n 1 n k + 1 i p1 pk p1e it1 +
+ pk e itk
Hence
n k
t 1 = = tk = 0
) (
) (
E X1 X k = n n 1 n k + 1 p1 p2 pk .
e itg ( x ) f x , x = x ,
1 , xk
itg ( X )
= x
g(X) t = E e
itg ( x , , x )
e
f x1 , , xk dx1 , , dxk .
()
()
Exercise
6.4.1 (CramrWold) Consider the r.v.s Xj, j = 1, . . . , k and for cj ,
j = 1, . . . , k, set
6.5
153
Yc = c j X j .
j =1
Then
ii) Show that Yc(t) = X1, . . . , Xk(c1t, . . . , ckt), t , and X1, . . . , Xk(c1, . . . , ck)
= Yc(1);
ii) Conclude that the distribution of the Xs determines the distribution of Yc
for every cj , j = 1, . . . , k. Conversely, the distribution of the Xs is
determined by the distribution of Yc for every cj , j = 1, . . . , k.
dn
MX t
dt n
()
=
t =0
dn
Ee tX
dt n
= E X n e tX
dn
= E n e tX
dt
t =0
t =0
t =0
( )
= E Xn .
This is the property from which the m.g.f. derives its name.
154
6.5.1
( )
()
()
d
MX t
dt
=
t =0
d
pe t + q
dt
= n pe t + q
t =0
n 1
( )
= np = E X ,
pe t
t =0
and
d2
MX t
dt 2
()
d
np pe t + q
dt
=
t =0
)(
n 1
= np n 1 pe t + q
et
t =0
n 2
pe t e t + pe t + q
n 1
et
t =0
( )
= n n 1 p 2 + np = n 2 p 2 np 2 + np = E X 2 ,
so that 2(X) = n2p2 np2 + np n2p2 = np(1 p) = npq.
2. If X P(), then MX(t) = ee , t . In fact,
t
()
M X t = e tx e
x= 0
( )
t
e
x
=e
x!
x!
x= 0
= e e e = e e .
t
Then
()
d
MX t
dt
=
t =0
( )
d e
= e t e e
==E X ,
e
t =0
dt
t =0
t
and
d2
MX t
dt 2
()
=
t =0
t
d
e t e e
dt
= e t e e + e t e e e t
t =0
( )
( )
t =0
= 1 + = E X 2 , so that 2 X = + 2 2 = .
6.5
t +
2t 2
2
155
1), then MX(t) = et /2, t . By the property for m.g.f. analogous to property (vi)
in Theorem 1,
2
t
t
M X t = e t M X , so that M X = e t M X t
()
()
()
d
MX t
dt
=
t= 0
d t +
e
dt
2t 2
2
=e
t +
t t 2
+
2
2t 2
2
= + 2t e
t=0
. Then
t +
2t 2
2
( )
==E X ,
t= 0
and
d2
MX t
dt 2
()
t +
d
= + 2t e
dt
t=0
2t 2
2
t= 0
= 2 + 2
t=0
t
t
2 t +
t +
2
2
= 2 e
+ + 2t e
2 2
2 2
( )
( )
= E X 2 , so that 2 X = 2 + 2 2 = 2 .
4. If X is distributed as Gamma with parameters and , then MX(t) =
(1 t), t < 1/. Indeed,
()
MX t =
( )
e tx x 1 e x dx =
(1 t )
( ) 0
y
1 t
( )
, dx =
dy
1 t
y 1 e y dy =
x 1 e
()
=
t =0
d
1 t
dt
(1 t )
( )
= = E X ,
t =0
dx.
d
MX t
dt
x 1 t
156
and
d2
MX t
dt 2
()
=
t =0
d
1 t
dt
) (
= + 1 2 1 t
t =0
( )
2
t =0
= + 1 = EX , so that X = .
2
In particular, for = 2r and = 2, we get the m.g.f. of the 2r, and its mean and
variance; namely,
() (
( )
( )
1
, E X = r , 2 X = 2r .
2
For = 1 and = 1 , we obtain the m.g.f. of the Negative Exponential
distribution, and its mean and variance; namely
MX t = 1 2t
r 2
, t<
1
1
, t < , EX = , 2 X = 2 .
t
1 1
M X t = E e tX = e tx
dx
1 + x2
1
1
1
1
> e tx
dx > tx
dx
2
0
0
1+ x
1 + x2
()
( )
MX t =
( )
()
( )
2 x dx
t du
t
lim log u .
=
=
2
1
2
u
2 x
1+ x
Thus for t > 0, MX(t) obviously is equal to . If t < 0, by using the limits , 0
in the integral, we again reach the conclusion that MX(t) = (see Exercise
6.5.9).
The examples just discussed exhibit all three cases regarding the
existence or nonexistence of an m.g.f. In Examples 1 and 3, the m.g.f.s exist
for all t ; in Examples 2 and 4, the m.g.f.s exist for proper subsets of ; and
in Example 5, the m.g.f. exists only for t = 0.
REMARK 4
()
( )
( )
X t = E t X , t , if E t X exists.
This function is sometimes referred to as the Mellin or MellinStieltjes transform of f. Clearly, X(t) = MX(log t) for t > 0.
Formally, the nth factorial moment of an r.v. X is taken from its factorial
m.g.f. by differentiation as follows:
6.5
dn
X t
dt n
()
157
[ (
)]
) (
= E X X 1 X n+1 .
t =1
In fact,
n X
dn
dn
X
X n
E
t
E
t
=
=
,
X
n t = E X X 1 X n+1 t
dt n
dt n
t
[ (
( )
()
provided Lemma D applies, so that the interchange of the order of differentiation and expectation is valid. Hence
dn
X t
dt n
()
[ (
)]
) (
= E X X 1 X n+1 .
t =1
(9)
The factorial m.g.f. derives its name from the property just established. As has already been seen in the rst two examples in Section 2 of
Chapter 5, factorial moments are especially valuable in calculating the variance of discrete r.v.s. Indeed, since
REMARK 5
( ) ( )
( )
( ) [ (
)] ( )
2 X = E X 2 EX , and E X 2 = E X X 1 + E X ,
we get
( )
[ (
)] ( ) ( )
2 X = E X X 1 + E X EX ;
that is, an expression of the variance of X in terms of derivatives of its factorial
m.g.f. up to order two.
Below we derive some factorial m.g.f.s. Property (9) (for n = 2) is valid in
all these examples, although no specic justication will be provided.
x= 0
x= 0
()
( )
= n n 1 p2 ,
Then
d2
X t
dt 2
()
) (
= n n 1 p 2 pt + q
t =1
n 2
()
X t = t e
x =0
( )
t
x
= e
= e e t = e t , t .
x!
x = 0 x!
158
Hence
d2
X t
dt 2
()
= 2 e t
t =1
t =1
( )
= 2 , so that 2 X = 2 + 2 = .
The m.g.f. of an r. vector X or the joint m.g.f. of the r.v.s X1, . . . , Xk,
denoted by MX or MX1, . . . , Xk, is dened by:
M X , , X t1 , , t k = E e t X + t X , t j , j = 1, 2, , k,
for those tjs in for which this expectation exists. If MX1, . . . , Xk(t1, . . . , tk) exists,
then formally X1, . . . , Xk(t1, . . . , tk) = MX1, . . . , Xk(it1, . . . , itk) and properties analogous to (i)(vii), (viii) in Theorem 1 hold true under suitable conditions. In
particular,
1
n + +n
MX , ,
t1n 1 t knk
1
Xk
(t , , t )
= E X 1n 1 X knk ,
t 1 = = tk = 0
(10)
6.5.3
Xk
(t , , t ) = ( p e
1
t1
+ + pk e t
),
n
t j ,
j = 1, , k.
In fact,
MX1 , ,
Xk
(t , , t ) = Ee
1
t 1 X 1 + +tk X k
( )
n!
p1e t1
x1! xk !
t X 1 + +tk X k
= e 1
x1
( )
),
pk e tk
= p1e t1 + + pk e tk
n!
p1x1 pkxk
x1! xk !
xk
MX
1 ,X 2
(t , t ) = exp t
1
1 1
+ 2t 2 +
1 2 2
1 t1 + 2 1 2 t1t 2 + 22 t 22
2
), t , t
1
. (11)
6.5
f x1 , x 2
=
159
)
1
2 1 2
2
x1 1 x 2 2 x 2 2
1 x1 1
exp
2
+
.
1 2 2
1 2
2 1
1 =
2
1 2 1 2
.
1 2
12
Therefore
22 1 2 x1 1
1
x1 1 x 2 2
12 x 2 2
1 2
x 1 x =
=
1
2
1
2
2
1
1 2
2 x
2
1
1
)(
2
2 1 2 x1 1 x 2 2 + 12 x 2 2
2
x 2
x1 1 x 2 2 x 2 2
1
1
+
.
1
1 2 2
()
f x =
1
2
1 2
1
exp x 1 x
2
) .
In this form, is the mean vector of X = (X1 X2), and is the covariance matrix
of X.
Next, for t = (t1 t2), we have
()
( )()
M X t = Ee t X = 2 exp t x f x dx
=
1
2
1 2
exp t x 2 ( x ) ( x )dx.
1
(11)
160
1
1
1
t + t t 2 t + t t 2 t x + x x .
2
)]
(13)
Focus on the quantity in the bracket, carry out the multiplication, and observe
1) = 1, xt = tx, t = t, and x 1 =
1x, to obtain
that = , (
( (
))]
2 t + t t 2t x + x 1 x = x + t 1 x + t .
(14)
()
MX t
1
1
= exp t + t t
2
2 1 2
exp x + t 1 x + t dx.
2
( (
))
( (
))
1
M X t = exp t + t t .
2
()
(15)
Observing that
12 1 2 t1
2 2
2 2
t t = t1 t 2
= 1 t1 + 2 1 2 t1t 2 + 2 t 2 ,
2
1 2 2 t 2
( )
Exercises
6.5.1 Derive the m.g.f. of the r.v. X which denotes the number of spots that
turn up when a balanced die is rolled.
6.5.2 Let X be an r.v. with p.d.f. f given in Exercise 3.2.13 of Chapter 3.
Derive its m.g.f. and factorial m.g.f., M(t) and (t), respectively, for those ts
for which they exist. Then calculate EX, E[X(X 1)] and 2(X), provided they
are nite.
6.5.3 Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3.
Derive its m.g.f. and factorial m.g.f., M(t) and (t), respectively, for those ts
for which they exist. Then calculate EX, E[X(X 1)] and 2(X), provided they
are nite.
6.5
161
6.5.4 Let X be an r.v. with p.d.f. f given by f(x) = e(x )I(,)(x). Find its
m.g.f. M(t) for those ts for which it exists. Then calculate EX and 2(X),
provided they are nite.
6.5.5 Let X be an r.v. distributed as B(n, p). Use its factorial m.g.f. in order
to calculate its kth factorial moment. Compare with Exercise 5.2.1 in Chapter
5.
6.5.6 Let X be an r.v. distributed as P(). Use its factorial m.g.f. in order to
calculate its kth factorial moment. Compare with Exercise 5.2.4 in Chapter 5.
6.5.7 Let X be an r.v. distributed as Negative Binomial with parameters
r and p.
i) Show that its m.g.f and factorial m.g.f., M(t) and (t), respectively, are
given by
()
MX t =
pr
(1 qe )
t
()
, t < log q, X t =
pr
(1 qt )
t <
1
;
q
()
Mt =
e t e t
;
t
12
) .
2
6.5.9 Refer to Example 3 in the Continuous case and show that MX(t) = for
t < 0 as asserted there.
6.5.10 Let X be an r.v. with m.g.f. M given by M(t) = e t + t , t ( ,
> 0). Find the ch.f. of X and identify its p.d.f. Also use the ch.f. of X in order
to calculate EX4.
2
6.5.11 For an r.v. X, dene the function by (t) = E(1 + t)X for those ts for
which E(1 + t)X is nite. Then, if the nth factorial moment of X is nite, show
that
(d
) ()
dt n t
t =0
[ (
)]
= E X X 1 X n+1 .
6.5.12 Refer to the previous exercise and let X be P(). Derive (t) and use
it in order to show that the nth factorial moment of X is n.
162
6.5.13 Let X be an r.v. with m.g.f. M and set K(t) = log M(t) for those ts for
which M(t) exists. Furthermore, suppose that EX = and 2(X) = 2 are both
nite. Then show that
()
d
Kt
dt
d2
Kt
dt 2
()
= and
t =0
= 2.
t =0
(The function K just dened is called the cumulant generating function of X.)
6.5.14 Let X be an r.v. such that EX n is nite for all n = 1, 2, . . . . Use the
expansion
xn
n = 0 n!
ex =
in order to show that, under appropriate conditions, one has that the m.g.f. of
X is given by
()
M t = EX n
n =0
) tn! .
n
6.5.15 If X is an r.v. such that EX n = n!, then use the previous exercise in
order to nd the m.g.f. M(t) of X for those ts for which it exists. Also nd the
ch.f. of X and from this, deduce the distribution of X.
6.5.16
(2k)! ,
2 k k!
EX 2k +1 = 0,
k = 0, 1, . . . . Find the m.g.f. of X and also its ch.f. Then deduce the distribution
of X. (Use Exercise 6.5.14)
6.5.17
) (
1
M t1 , t 2 = e t1 + t2 + 1 + e t1 + e t2 , t1 , t 2 .
6
3
Calculate EX1, 2(X1) and Cov(X1, X2), provided they are nite.
6.5.18 Refer to Exercise 4.2.5. in Chapter 4 and nd the joint m.g.f.
M(t1, t2, t3) of the r.v.s X1, X2, X3 for those t1, t2, t3 for which it exists. Also nd
their joint ch.f. and use it in order to calculate E(X1X2X3), provided the
assumptions of Theorem 1 (vii) are met.
6.5.19 Refer to the previous exercise and derive the m.g.f. M(t) of the r.v.
g(X1, X2, X3) = X1 + X2 + X3 for those ts for which it exists. From this, deduce
the distribution of g.
6.5
163
6.5.20 Let X1, X2 be two r.v.s with m.g.f. M and set K(t1, t2) = log M(t1, t2) for
those t1, t2 for which M(t1, t2) exists. Furthermore, suppose that expectations,
variances, and covariances of these r.v.s are all nite. Then show that for
j = 1, 2,
K t1 , t 2
t j
= EX j ,
t 1 =t2 = 0
2
K t1 , t 2
t1t 2
2
K t1 , t 2
t j2
( )
= 2 Xj ,
t 1 =t2 = 0
= Cov X 1 , X 2 .
t 1 =t2 = 0
6.5.21 Suppose the r.v.s X1, . . . , Xk have the Multinomial distribution with
parameters n and p1, . . . , pk, and let i, j, be arbitrary but xed, 1 i < j k.
Consider the r.v.s Xi, Xj, and set X = n Xi Xj, so that these r.v.s have the
Multinomial distribution with parameters n and pi, pj, p, where p = 1 pi pj.
ii) Write out the joint m.g.f. of Xi, Xj, X, and by differentiation, determine the
E(XiXj);
ii) Calculate the covariance of Xi, Xj, Cov(Xi, Xj), and show that it is negative.
6.5.22 If the r.v.s X1 and X2 have the Bivariate Normal distribution with
parameters 1, 2, 21, 22 and , show that Cov(X1, X2) 0 if 0, and
Cov(X1, X2) < 0 if < 0. Note: Two r.v.s X1, X2 for which Fx ,x (X1, X2)
Fx (X1)Fx (X2) 0, for all X1, X2 in , or Fx ,x (X1, X2) Fx (X1)Fx (X2) 0, for
all X1, X2 in , are said to be positively quadrant dependent or negatively
quadrant dependent, respectively. In particular, if X1 and X2 have the Bivariate
Normal distribution, it can be seen that they are positively quadrant dependent or negatively quadrant dependent according to whether 0 or < 0.
1
6.5.23
6.5.24
ii) If the r.v.s X1 and X2 have the Bivariate Normal distribution with parameters 1, 2, 21, 22 and , use their joint m.g.f. given by (11) and property
(10) in order to determine E(X1X2);
ii) Show that is, indeed, the correlation coefcient of X1 and X2.
6.5.25 Both parts of Exercise 6.4.1 hold true if the ch.f.s involved are replaced by m.g.f.s, provided, of course, that these m.g.f.s exist.
ii) Use Exercise 6.4.1 for k = 2 and formulated in terms of m.g.f.s in order to
show that the r.v.s X1 and X2 have a Bivariate Normal distribution if and
only if for every c1, c2 , Yc = c1X1 + c2X2 is normally distributed;
ii) In either case, show that c1X1 + c2X2 + c3 is also normally distributed for any
c3 .
164
Chapter 7
P X j Bj , j = 1, , k = P X j Bj .
j =1
REMARK 1
164
THEOREM 1
165
( )
x j , j = 1, , k.
( )
x j , j = 1, , k.
i) FX , , X x1 , , xk = FX x j , for all
1
j =1
k
j =1
PROOF
P X j Bj , j = 1, , k = P X j Bj , Bj , j = 1, , k.
j =1
( )
FX , , X x1 , , xk = FX x j .
1
j =1
The proof of the converse is a deep probability result, and will, of course,
be omitted. Some relevant comments will be made in Section 7.4, Lemma 3.
ii) For the discrete case, we set Bj = {xj}, where xj is in the range of Xj, j = 1, . . . ,
k. Then if Xj, j = 1, . . . , k are independent, we get
P X 1 = x1 , , X k = xk = P X j = x j ,
j =1
or
( )
f X , , X x1 , , xk = f X x j .
1
j =1
Let now
k
( )
f X , , X x1 , , xk = f X x j .
1
j =1
f X , , X x1 , , xk =
1
B1
( )
( )
f X x1 f X xk
1
= f X x j ,
j =1
B
j
( )
or
( )
FX , , X y1 , , yk = FX y j .
1
j =1
( )
f X , , X x1 , , xk = f X x j
1
and let
j =1
166
C j = , y j , y j . j = 1, , k.
Then integrating both sides of this last relationship over the set C1
Ck, we get
( )
FX , , X y1 , , yk = FX y j ,
1
j =1
( )
FX , , X x1 , , xk = FX x j
1
j =1
(that is, the Xjs are independent). Then differentiating both sides, we get
( )
f X , , X x1 , , xk = f X x j .
1
j =1
Consider independent r.v.s and suppose that gj is a function of the jth r.v.
alone. Then it seems intuitively clear that the r.v.s gj(Xj), j = 1, . . . , k ought to
be independent. This is, actually, true and is the content of the following
LEMMA 1
[ ( )]
( )
PROOF
(Factorization Theorem) The r.v.s Xj, j = 1, . . . , k are independent if and only if:
( )
X , , X t1 , , t k = X t j , for all t j , j = 1, , k.
1
j =1
167
PROOF
( )
f X , , X x1 , , xk = f X x j .
1
j =1
Hence
k
k
k it X
it X
X , , X x1 , , xk = E exp i t j X j = E e = Ee
j =1
j =1
j =1
by Lemmas 1 and 2, and this is kj=1X (tj). Let us assume now that
j
( )
X , , X t1 , , t k = X t j .
1
j =1
( )
j j
1
f X 1 , , X k ( x1 , , x k ) = lim
T 2T
( )
T
exp i t j x j
T
j =1
X 1 , , X k (t1 , , t k )dt1 dt k
1
= lim
T 2T
k
k
T
exp i t j x j X j t j ( dt1 dt k )
T
T
j =1
j =1
( )
k
1 T it j x j
k
e
X j t j dt j = f X x .
= lim
T 2T T
j =1 j ( j )
j =1
( )
That is, Xj, j = 1, . . . , k are independent by Theorem 1(ii). For the continuous
case, we have
it j h
1 T 1 e
it h
e X t j dt j , j = 1, , k,
h 0 T 2 T
it j h
and for the multidimensional case, we have (see Theorem 2(ii) in Chapter 6)
( )
f X x j = lim lim
j
f X 1 , , X k x1 , , xk
1
= lim lim
h 0 T 2
( )
k
T
1 e it j h
j =1
it j h
it j x j
X 1 , , X k t1 , , t k dt1 dt k
1
= lim lim
h 0 T 2
k
T
1 e it j h it j x j
e Xj t j
T
j =1
it j h
T
( )
dt1 dt k
k
1
= lim lim
h 0 T 2
j =1
= fX j ( xj ) ,
j =1
it h
1 e j it j x j
T it j h e X j t j dt j
( )
168
Let X1, X2 have the Bivariate Normal distribution. Then X1, X2 are independent if and only if they are uncorrelated.
PROOF
1,
X2
(x , x ) =
1
e q 2 ,
2 1 2 1 2
where
q=
1
1 2
2
x 2
x1 2 x2 2 x2 2
1
1
2
,
1
1 2 2
and
x 2
x
1
1
1
2
, f x =
2
fX 1 x1 =
exp
exp
X2
2
2
2
2
2
2 1
2 2
1
2
( )
( )
( )
( )
fX 1 , X 2 x1 , x 2 = fX 1 x1 fX 2 x 2 ,
that is, X1, X2 are independent. The converse is always true by Corollary 1 in
Section 7.2.
Exercises
7.1.1
[ ()
( )]
X (1) = min X 1 , , X n ,
X n = max X 1 , , X n ;
that is,
()
X (1) s = min X 1 s , , X n s ,
()
[ ()
( )]
X ( n ) s = max X 1 s , , X n s .
Then express the d.f. and p.d.f. of X(1), X(n) in terms of f and F.
7.1.2
Let the r.v.s X1, X2 have p.d.f. f given by f(x1, x2) = I(0,1) (0,1)(x1, x2).
ii) Show that X1, X2 are independent and identify their common distribution;
ii) Find the following probabilities: P(X1 + X2 < 13 ), P( X12 + X 22 < 14 ),
P(X1X2 > 12 ).
7.1.3
Let X1, X2 be two r.v.s with p.d.f. f given by f(x1, x2) = g(x1)h(x2).
Exercises
7.1 Stochastic Independence: Criteria of Independence
169
ii) Derive the p.d.f. of X1 and X2 and show that X1, X2 are independent;
ii) Calculate the probability P(X1 > X2) if g = h and h is of the continuous
type.
7.1.4 Let X1, X2, X3 be r.v.s with p.d.f. f given by f(x1, x2, x3) = 8x1x2x3 IA(x1,
x2, x3), where A = (0, 1) (0, 1) (0, 1).
ii) Show that these r.v.s are independent;
ii) Calculate the probability P(X1 < X2 < X3).
7.1.5 Let X1, X2 be two r.v.s with p.d.f f given by f(x1, x2) = cIA(x1, x2), where
A = {(x1, x2) 2; x12 + x 22 9}.
ii) Determine the constant c;
ii) Show that X1, X2 are dependent.
7.1.6 Let the r.v.s X1, X2, X3 be jointly distributed with p.d.f. f given by
1
f x1 , x 2 , x3 = I A x1 , x 2 , x3 ,
4
where
{(
)(
)(
)(
)}
A = 1, 0, 0 , 0, 1, 0 , 0, 0, 1 , 1, 1, 1 .
Then show that
ii) Xi, Xj, i j, are independent;
ii) X1, X2, X3 are dependent.
7.1.7 Refer to Exercise 4.2.5 in Chapter 4 and show that the r.v.s X1, X2, X3
are independent. Utilize this result in order to nd the p.d.f. of X1 + X2 and X1
+ X2 + X3 .
7.1.8 Let Xj, j = 1, . . . , n be i.i.d. r.v.s with p.d.f. f and let B be a (Borel) set
in .
iii) In terms of f, express the probability that at least k of the Xs lie in B for
some xed k with 1 k n;
iii) Simplify this expression if f is the Negative Exponential p.d.f. with parameter and B = (1/, );
iii) Find a numerical answer for n = 10, k = 5, = 12 .
7.1.9 Let X1, X2 be two independent r.v.s and let g: be measurable.
Let also Eg(X2) be nite. Then show that E[g(X2) | X1 = x1] = Eg(X2).
7.1.10 If Xj, j = 1, . . . , n are i.i.d. r.v.s with ch.f. and sample mean X ,
express the ch.f. of X in terms of .
7.1.11 For two i.i.d. r.v.s X1, X2, show that X X (t) = |X (t)|2, t . (Hint:
Use Exercise 6.2.3 in Chapter 6.)
1
170
7.1.12 Let X1, X2 be two r.v.s with joint and marginal ch.f.s X ,X , X and X .
Then X1, X2 are independent if and only if
1
1 ,X2
(t , t ) = (t ) (t ),
1
X1
X2
t1 , t 2 .
1 ,X2
(t, t ) = (t ) (t ),
X1
t ,
X2
E g j X j = g1 x1 gk xk
j =1
f X , , X x1 , , xk dx1 dxk
( )
( )
( )
)
= g ( x ) g ( x ) f ( x ) f ( x )dx dx
(by independence)
= g ( x ) f ( x )dx g ( x ) f ( x )dx
= E[ g ( X )] E[ g ( X )].
1
X1
Xk
X1
Xk
Now suppose that the gjs are complex-valued, and for simplicity, set gj(Xj) = Yj
= Yj1 + Yj2, j = 1, . . . , k. For k = 2,
[(
)(
)]
(
) (
)
= [ E(Y Y ) E(Y Y )] + i[ E(Y Y ) + E(Y Y )]
= [( EY )( EY ) ( EY )( EY )] + i[( EY )( EY ) ( EY )( EY )]
= ( EY + iEY )( EY + iEY ) = ( EY )( EY ).
= E Y11Y21 Y12Y22 + iE Y11Y22 + Y12Y21
11
11
11
21
12
22
21
12
12
21
11
22
12
22
22
22
11
21
12
21
7.2 Independence:
Proof of Lemma
2 andof
Related
Results
7.1 Stochastic
Criteria
Independence
171
[(
) ]
= E(Y Y )( EY ) (by the part just established)
= ( EY ) ( EY )( EY ) (by the induction hypothesis).
E Y1 Ym+1 = E Y1 Ym Ym+1
1
m+1
COROLLARY 1
The covariance of an r.v. X and of any other r.v. which is equal to a constant
c (with probability 1) is equal to 0; that is, Cov(X, c) = 0.
PROOF
COROLLARY 2
m+1
If the r.v.s X1 and X2 are independent, then they have covariance equal to 0,
provided their second moments are nite. In particular, if their variances are
also positive, then they are uncorrelated.
PROOF
In fact,
) ( )( )
= ( EX )( EX ) ( EX )( EX ) = 0,
Cov X1 , X 2 = E X1 X 2 EX1 EX 2
1
COROLLARY 3
i) For any k r.v.s Xj, j = 1, . . . , k with nite second moments and variances
2j = 2(Xj), and any constants cj, j = 1, . . . , k, it holds:
k
k
2 c j X j = c j2 j2 + ci c j Cov X i , X j
j =1
j =1
1i j k
= c 2j j2 + 2
j =1
)
)
ci c j Cov X i , X j .
1i < j k
= c 2j j2 + 2
j =1
ci c j i j ij .
1i < j k
(Bienaym equality).
172
PROOF
iii) Indeed,
2
k
k
2
c j X j = E c j X j E c j X j
j =1
j =1
j = 1
k
= E c j X j EX j
j = 1
k
= E c 2j X j EX j
j = 1
) + c c (X
= c 2j j2 +
j =1
j =1
i j
ci c j Cov X i , X j
1i j k
= c 2j j2 + 2
)(
ci c j Cov X i , X j
1i < j k
EX i X j EX j
This establishes part (i). Part (ii) follows by the fact that Cov(Xi, Xj) =
ijij = jiji.
iii) Here Cov (Xi, Xj) = 0, i j, either because of independence and Corollary
2, or ij = 0, in case j > 0, j = 1, . . . , k. Then the assertion follows from
either part (i) or part (ii), respectively.
iii) Follows from part (iii) for c1 = = ck = 1.
Exercises
7.2.1 For any k r.v.s Xj, j = 1, . . . , k for which E(Xj) = (nite) j = 1, . . . , k,
show that
k
(X
j =1
) = (X
2
j =1
+k X
= kS 2 + k X ,
where
2
1 k
1 k
2
X
and
S
=
j
Xj X .
k j =1
k j =1
7.2.2 Refer to Exercise 4.2.5 in Chapter 4 and nd the E(X1 X2), E(X1 X2 X3),
2(X1 + X2), 2(X1 + X2 + X3) without integration.
X=
7.2.4
n
n
3
E X j EX j = E X j EX j .
j = 1
j =1
Let Xj, j = 1, . . . , n be i.i.d. r.v.s with mean and variance 2, both nite.
7.3 Independence:
Some Consequences
7.1 Stochastic
Criteria of Independence
173
ii) In terms of , c and , nd the smallest value of n for which the probability
that X (the sample mean of the Xs) and differ in absolute value at most
by c is at least ;
ii) Give a numerical answer if = 0.90, c = 0.1 and = 2.
7.2.5 Let X1, X2 be two r.v.s taking on the values 1, 0, 1 with the following
respective probabilities:
f ( 1, 0) = , f ( 1, 1) =
( )
f (0, 1) = , f (0, 0) = 0,
f (0, 1) =
;
f (1, 1) = , f (1, 0) = ,
f (1, 1) = .
f 1, 1 = ,
, > 0, + =
1
.
4
X = Xj is B n, p , where n = n j .
j =1
j =1
(That is, the sum of independent Binomially distributed r.v.s with the same
parameter p and possibly distinct njs is also Binomially distributed.)
PROOF It sufces to prove that the ch.f. of X is that of a B(n, p) r.v., where
n is as above. For simplicity, writing j Xj instead of kj=1Xj, when this last
expression appears as a subscript here and thereafter, we have
()
()
()
X t = X t = X t = pe it + q
j
j =1
j =1
) = ( pe
nj
it
+q
()
X = X j is P , where = j .
j =1
j =1
174
(That is, the sum of independent Poisson distributed r.v.s is also Poisson
distributed.)
PROOF
We have
()
()
()
X t = X t = X t = exp j e it j
j
j =1
j =1
k
k
= exp e it j j = exp e it
j =1
j =1
(ii) We have
k
k
2 c 2t 2
X t = c X t = X cj t = exp icj t j j j
2
j =1
j =1
2
2
t
= exp it
2
()
j j
()
( )
with and 2 as in (ii) above. Hence X is N(, 2). (i) Follows from (ii) by
setting c1 = c2 = = ck = 1.
Now let Xj, j = 1, . . . , k be any k independent r.v.s with
( )
E X j = ,
( )
2 X j = 2,
j = 1, , k.
Set
X=
1 k
Xj.
k j =1
1
, 1 = = k = , and 12 = = k2 = 2
k
and get the rst conclusion. The second follows from the rst by the use of
Theorem 4, Chapter 4, since
c1 = = c k =
7.3 Independence:
Some Consequences
7.1 Stochastic
Criteria of Independence
k X
THEOREM 5
) = (X ) .
175
2 k
j =1
j =1
X = X j is r2 , where r = rj .
PROOF
We have
()
()
()
X t = X t = X t = 1 2it
j
j =1
j =1
rj 2
= 1 2it
r 2
COROLLARY 1
j =1
j
PROOF
is k2 .
By Lemma 1,
2
X j j
,
j = 1, , k
are 12 ,
j = 1, , k.
(X
j =1
) = ( X
2
j =1
+k X
= kS 2 + k X ,
where
S2 =
2
1 k
Xj X .
k j =1
We have
2
k X
Xj
=
j =1
kS 2
.
2
176
Now
X
j
j =1
k
is k2
k X
is 12 ,
by Theorem 3, Chapter 4. Then taking ch.f.s of both sides of the last identity
above, we get (1 2it)k/2 = (1 2it)1/2 kS / (t).
Hence kS / (t) = (1 2it)(k1)/2 which is the ch.f. of a 2k1 r.v.
2
REMARK 6
or
( ) (
2 k1 4
k1 2
, and 2 S 2 =
.
k
k2
The following result demonstrates that the sum of independent r.v.s
having a certain distribution need not have a distribution of the same kind, as
was the case in Theorems 25 above.
ES 2 =
THEOREM 6
Exercises
7.3.1
T = Xj,
j =1
= j .
j =1
7.4*
177
Independence
Classes of Events
andofRelated
Results
7.1
StochasticofIndependence:
Criteria
Independence
7.3.3 The life of a certain part in a new automobile is an r.v. X whose p.d.f.
is Negative Exponential with parameter = 0.005 day.
iii) Find the expected life of the part in question;
iii) If the automobile comes supplied with a spare part, whose life is an r.v. Y
distributed as X and independent of it, nd the p.d.f. of the combined life
of the part and its spare;
iii) What is the probability that X + Y 500 days?
7.3.4 Let X1, X2 be independent r.v.s distributed as B(n1, p1) and B(n2, p2),
respectively. Determine the distribution of the r.v.s X1 + X2, X1 X2 and X1
X2 + n2.
7.3.5 Let X1, X2 be independent r.v.s distributed as N(1, 21), and N(2, 22),
respectively. Calculate the probability P(X1 X2 > 0) as a function of 1, 2 and
1, 2. (For example, X1 may represent the tensile strength (measured in p.s.i.)
of a steel cable and X2 may represent the strains applied on this cable. Then
P(X1 X2 > 0) is the probability that the cable does not break.)
7.3.6 Let Xi, i = 1, . . . , m and Yj, j = 1, . . . , n be independent r.v.s such that
the Xs are distributed as N(1, 21) and the Ys are distributed as N(2, 22).
Then
ii) Calculate the probability P( X > Y ) as a function of m, n, 1, 2 and 1, 2;
ii) Give the numerical value of this probability for m = 10, n = 15, 1 = 2 and
21 = 22 = 6.
7.3.7 Let X1 and X2 be independent r.v.s distributed as 2r and 2r , respectively, and for any two constants c1 and c2, set X = c1X1 + c2X2. Under what
conditions on c1 and c2 is the r.v. X distributed as 2r ? Also, specify r.
1
X = j X j ,
j =1
Y = j X j ,
j =1
178
To start with, consider the probability space (S, A, P) and recall that k
events A1, . . . , Ak are said to be independent if for all 2 m k and all 1 i1
< < im k, it holds that P(Ai Ai ) = P(Ai ) P(Ai ). This denition
is extended to any subclasses of Cj, j = 1, . . . , k, as follows:
1
DEFINITION 2
DEFINITION 3
We say that the r.v.s Xj, j = 1, . . . , k are independent (in any one of the modes
mentioned in the previous denition) if the -elds induced by them are
independent.
From the very denition of X j1(B), for every Aj X j1(B) there exists Bj
B such that Aj = X j1(Bj), j = 1, . . . , k. The converse is also obviously true; that
is, X j1(Bj) X j1(B), for every Bj B, j = 1, . . . , k. On the basis of these
observations, the previous denition is equivalent to Denition 1. Actually,
Denition 3 can be weakened considerably, as explained in Lemma 3 below.
According to the following statement, in order to establish independence
of the r.v.s Xj, j = 1, . . . , k, it sufces to establish independence of the (much
smaller) classes Cj, j = 1, . . . , k, where Cj = X j1({(, x], x }). More
precisely,
LEMMA 3
Let
( )
A j = X j1 B
and C j = X j1
A= g X
Exercise
7.1 Stochastic Independence: Criteria of Independence
179
[ ( )] (B ),
A j* = g X j
j = 1, , k.
Then
A j* A j ,
j = 1, , k,
Exercise
7.4.1 Consider the probability space (S, A, P) and let A1, A2 be events. Set
X1 = IA , X2 = IA and show that X1, X2 are independent if and only if A1, A2 are
independent. Generalize it for the case of n events Aj, j = 1, . . . , n.
1
180
Chapter 8
i) We say that {Xn} converges almost surely (a.s.), or with probability one, to
a.s.
X as n , and we write Xn n
X, or Xn
X with probability 1,
n
or P[Xn
X]
=
1,
if
X
(s)
X(s)
for
all
s
S except possibly for
n
n
n
a subset N of S such that P(N) = 0.
c
a.s.
Thus Xn n
X means that for every > 0 and for every s N there
()
()
Xn s X s <
for all n N(, s). This type of convergence is also known as strong
convergence.
ii) We say that {Xn} converges in probability to X as n , and we write
P
Xn
0.
X is
n
equivalent to: P[|Xn X| ]
0 for every
n
n
> 0, then clearly P[|Xn X| ]
0.
n
180
8.1
181
X, if Fn(x)
n
F is continuous there exists N(, x) such that |Fn(x) F(x)| < for all
n N(, x). This type of convergence is also known as weak convergence.
d
REMARK 2 If Fn have p.d.f.s fn, then Xn
()
EXAMPLE 1
if
( )
x = 1 1 n
or
( )
x = 1+ 1 n
otherwise.
Fn x = 21 ,
1,
()
if
if
if
( )
1 (1 n) x < 1 + (1 n)
x 1 + (1 n).
x < 1 1 n
Fn
1
1
2
Figure 8.1
0 1
1
n
1
n
0 ,
F x =
1,
()
if
if
x<1
x 1,
which is a d.f.
Under further conditions on fn, f, it may be the case, however, that fn
converges to a p.d.f. f.
We now assume that E|Xn|2 < , n = 1, 2, . . . , Then:
iv) We say that {Xn} converges to X in quadratic mean (q.m.) as n , and we
2
write Xn nq.m.
0.
X, if E|Xn X|
n
q.m.
Thus Xn n
X
means
that:
For
every
> 0, there exists N() > 0 such
182
X, if P(An)
n
n
to 0, as n , but the events An themselves keep wandering around the
sample space S. Finally, convergence in quadratic mean simply signies that
the averages E|Xn X|2 converge to 0 as n .
Exercises
8.1.1
P X n = 1 = pn ,
P X n = 0 = 1 pn.
0?
Under what conditions on the pns does Xn
P
n
0 for every x .
n
Thus a convergent sequence of d.f.s need not converge to a d.f.
8.1.3 Let Xj, j = 1, . . . , n, be i.i.d. r.v.s such that EXj = , 2(Xj) = 2, both
n )2
nite. Show that E(X
0.
n
8.1.4 For n = 1, 2, . . . , let Xn, Yn be r.v.s such that E(Xn Yn)2
0 and
n
q.m.
suppose that E(Xn X)2
0
for
some
r.v.
X.
Then
show
that
Y
n
n n
X.
8.1.5 Let Xj , j = 1, . . . , n be independent r.v.s distributed as U(0, 1), and
set Yn = min(X1, . . . , Xn), Zn = max(X1, . . . , Xn), Un = nYn, Vn = n(1 Zn).
Then show that, as n , one has
P
i) Yn
0;
P
ii) Zn
1;
d
iii) Un
U;
d
iv) Vn
V, where U and V have the negative exponential distribution
with parameter = 1.
THEOREM 1
183
a.s.
P
i) Xn n
X implies Xn
X.
n
q.m.
P
ii) Xn n
X
implies
X
X.
n
n
P
d
iii) Xn
X implies Xn
conv. in q.m.
PROOF
i) Let A be the subset of S on which Xn
1
A = I U I X n+ r X < ,
k
k=1 n=1 r =1
X is given by
n
1
A c = U I U X n+ r X .
k
k=1 n=1 r =1
The sets A, Ac as well as those appearing in the remaining of this discussion are all events, and hence we can take their probabilities. By setting
1
Bk = I U X n+ r X ,
k
n=1 r =1
1
C n = U X n+ r X .
k
r =1
P(Cn)
1
1
X
n+ m
U X n+ r X ,
k r =1
k
so that
1
1
P X n+ m X P U X n+ r X = P C n
0
n
k
k
r =1
P
for every k 1. However, this is equalivalent to saying that Xn
X, as
n
was to be seen.
ii) By special case 1 (applied with r = 2) of Theorem 1, we have
( )
P Xn X >
E Xn X
184
Thus, if Xn nq.m.
n
n
P
0 for every > 0, or equivalently, Xn
X.
n
iii) Let x be a continuity point of F and let > 0 be given. Then we have
[X x ] = [X
[X
[X
since
[X
] [
x, X x + X n > x, X x
x + X n > x, X x
] [
x] [ X
] [
[X
X ,
> x, X x = X n > x, X x +
So
[X x ] [X
] [
] [
X Xn X .
] [
implies
] [
x Xn X
]
]
P X x P Xn x + P Xn X ,
or
() [
F x Fn x + P X n X .
Thus, if Xn
X, then we have by taking limits
P
n
()
(1)
(2)
F x lim
inf Fn x .
n
In a similar manner one can show that
() (
lim
sup Fn x F x + .
n
But (1) and (2) imply F(x ) lim inf Fn(x) lim sup Fn(x) F(x + ).
n
n
Letting 0, we get (by the fact that x is a continuity point of F) that
()
()
() ()
F x lim
inf Fn x lim
sup Fn x F x .
n
n
Hence lim Fn(x) exists and equals F(x). Assume now that P[X = c] = 1. Then
n
0 ,
F x =
1,
()
x<c
xc
] [
P X n c = P X n c
[
]
= P[ X c + ] P[ X
P[ X c + ] P[ X
= F (c + ) F (c ).
= P c Xn c +
]
c ]
< c
185
] ( ) ( )
lim P X n c F c + F c = 1 0 = 1.
n
Thus
P Xn c
1.
REMARK 4
not true.
EXAMPLE 2
Let S = (0, 1], and let P be the probability function which assigns to
subintervals of (0, 1] as measures of their length. (This is known as the
Lebesgue measure over (0, 1].) Dene the sequence X1, X2, . . . of r.v.s as
follows: For each k = 1, 2, . . . , divide (0, 1] into 2k1 subintervals of equal
length. These intervals are then given by
j 1 j
k 1 , k 1 ,
2
2
j = 1, 2, , 2 k 1.
j 1 j
s k1 , k1 and 0, otherwise.
2
2
We assert that the so constructed sequence X1, X2, . . . of r.v.s converges to
0 in probability, while it converges nowhere pointwise, not even for a single
s (0, 1]. In fact, by Theorem 1(ii), it sufces to show that Xn nq.m.
0;
that is, EX 2n
0.
For
any
n
1,
we
have
that
X
is
the
indicator
of
an
n
n
interval
j 1 j
k1 , k1
2
2
for some k and j as above. Hence EX 2n = 1/2k1. It is also clear that for m > n,
EX 2m 1/2k1. Since for every > 0, 1/2k1 < for all sufciently large k, the proof
that EX 2n
0 is complete.
n
P
X,
and
also
that
X
X
need
not
imply Xn n
X. That
n
a.s.
q.m.
Xn n
X
need
not
imply
that
X
X
is
seen
by
the
following
example.
n
n
EXAMPLE 3
0.
nI(0,1/n]. Then, clearly, Xn
n
X if and only if
( )
( )
E Xn
c, 2 X n
0.
n
n
In fact,
186
E Xn c
[(
) (
= E( X EX ) + ( EX
= ( X ) + ( EX c ) .
)]
c)
= E X n EX n + EX n c
2
0 and EXn
c.
n
n
n
REMARK 6 The following example shows that the converse of (iii) is not
true.
EXAMPLE 4
Let S = {1, 2, 3, 4}, and on the subsets of S, let P be the discrete uniform
function. Dene the following r.v.s:
()
()
()
()
X n 1 = X n 2 = 1, X n 3 = X n 4 = 0, n = 1, 2, ,
and
()
()
X 1 = X 2 = 0,
()
()
X 3 = X 4 = 1.
Then
()
()
X n s X s = 1 for all s S.
Hence Xn does not converge in probability to X, as n . Now,
FX n
0,
x = 21 ,
1,
()
x<0
0 x < 1,
x1
FX
0,
x = 21 ,
1,
x<0
0 x < 1,
x1
()
n
n
probability to X.
Very often one is confronted with the problem of proving convergence in
distribution. The following theorem replaces this problem with that of proving
convergence of ch.f.s, which is much easier to deal with.
n
THEOREM 2
(P. Lvys Continuity Theorem) Let {Fn} be a sequence of d.f.s, and let F be a
d.f. Let n be the ch.f. corresponding to Fn and be the ch.f. corresponding to
F. Then,
(t), for
i) If Fn (x)
n
n
every t .
ii) If n(t) converges, as n , and t , to a function g(t) which is continuous
at t = 0, then g is a ch.f., and if F is the corresponding d.f., then Fn (x)
n
F(x), for all continuity points x of F.
PROOF Omitted.
REMARK 7 The assumption made in the second part of the theorem above
according to which the function g is continuous at t = 0 is essential. In fact, let
Xn be an r.v. distributed as N(0, n), so that its ch.f. is given by n(t) = et n/2. Then
2
8.3
187
n(t)
=
n
2
n
n
n
1
for every x and F(x) = 2 , x , is not a d.f. of an r.v.
n
()
Exercises
8.2.1 (Rnyi) Let S = [0, 1) and let P be the probability function on subsets
of S, which assigns probability to intervals equal to their lengths. For n = 1,
2, . . . , dene the r.v.s Xn as follows:
j
j +1
N , if
s<
s
=
2N + 1
2N + 1
+j
0, otherwise,
j = 0, 1, . . . , 2N, N = 1, 2, . . . . Then show that
P
0;
i) Xn
n
ii) Xn(s)
0, s (0, 1);
iii) Xn (s)
n
iv) EXn
0.
n
XN
()
X, where X is an r.v.
n
n
distributed as P().
8.2.3 For n = 1, 2, . . . , let Xn be r.v.s having the negative binomial distribution with pn and rn such that pn
1, rn
n
n
n
d
(>0). Show that Xn
X,
where
X
is
an
r.v.
distributed
as
P(
).
(Use
n
ch.f.s.)
8.2.4 If the i.i.d. r.v.s Xj, j = 1, . . . , n have a Cauchy distribution, show that
P
n
there is no nite constant c for which X
c. (Use ch.f.s.)
n
8.2.5 In reference to the proof of Theorem 1, show that the set A of convergence of {Xn} to X is, indeed, expressed by A = Ik=1 Un=1 I r =1(|Xn+r X| < k1).
(Central Limit Theorem) Let X1, . . . , Xn be i.i.d. r.v.s with mean (nite) and
(nite and positive) variance 2. Let
188
n X
1 n
n
, and x =
X
,
G
x
=
P
x
j n
n j =1
Then Gn(x)
()
Xn =
()
1
2
e t 2 dt .
2
REMARK 8
i) We often express (loosely) the CLT by writing
n Xn
Sn E Sn
N 0, 1 , or
N 0, 1 ,
Sn
for large n, where
( )
( )
( )
n Xn
Sn = X j , since
j =1
)=S
( )
( ).
(S )
E Sn
n
ii) In part (i), the notation Sn was used to denote the sum of the r.v.s X1, . . . ,
Xn. This is a generally accepted notation, and we are going to adhere to
it here. It should be pointed out, however, that the same or similar
symbols have been employed elsewhere to denote different quantities
(see, for example, Corollaries 1 and 2 in Chapter 7, or Theorem 9 and
Corollary to Theorem 8 in this chapter). This point should be kept in mind
throughout.
iii) In the proof of Theorem 3 and elsewhere, the little o notation will be
employed as a convenient notation for the remainder in Taylor series
expansions. A relevant comment would then be in order. To this end, let
{an}, {bn}, n = 1, 2, . . . be two sequences of numbers. We say that {an} is o(bn)
(little o of bn) and we write an = o(bn), if an/bn
0. For example, if an
n
= n and bn = n2, then an = o(bn), since n/n2 = 1/n
0. Clearly, if an =
n
o(bn), then an = bno(1). Therefore o(bn) = bno(1).
iv) We recall the following fact which was also employed in the proof of
Theorem 3, Chapter 3. Namely, if an
a, then
n
n
an
e a.
1 +
n n
PROOF OF THEOREM 3 We may now begin the proof. Let gn be the ch.f.
of Gn and be the ch.f. of ; that is, (t) = et /2, t . Then, by Theorem 2, it
sufces to prove that gn(t)
n Xn
) = nX
1
n
Zj ,
j =1
1
n
j =1
Xj
8.3
189
()
gn t = g t
n jZ j
t 1
t = g jZ j
= g Z1
.
n n
()
Now consider the Taylor expansion of gz around zero up to the second order
term. Then
1
t
t2
t
1 t
gZ
g Z 0 +
g Z 0 + o .
= gZ 0 +
2! n
n
n
n
1
()
()
()
Since
()
()
( )
( )
()
g Z1 0 = 1, g Z 1 0 = iE Z1 = 0, g Z1 0 = i 2 E Z 12 = 1,
we get
t
t2
t2
t2
t2
t2
gZ
1
o
1
o
1
=
1
1 o 1 .
=
+
=
n
2n
2n n
2n
n
()
( )]
Thus
n
t2
1o 1 .
g n t = 1
2 n
()
[ ( )]
COROLLARY
(x) is uniform in x .
The convergence Gn(x)
n
(That is, for every x and every > 0 there exists N() > 0 independent of
x, such that |Gn(x) (x)| < for all n N() and all x simultaneously.)
8.3.1 Applications
1. If Xj, j = 1, . . . , n are i.i.d. with E(Xj) = , 2(Xj) = 2, the CLT is used
to give an approximation to P[a < Sn b], < a < b < +. We have:
190
( )
( )
( )
( )
( )
( )
( )
( )
a E Sn
Sn E Sn
b E Sn
P a < Sn b = P
<
Sn
S n
S n
a n S n E S n
b n
= P
<
Sn
n
n
Sn E Sn
Sn E Sn
b n
a n
P
= P
S n
S n
n
n
b* a * ,
( )
( )
( ) ( )
( )
( )
where
a* =
a n
, b* =
b n
.
n
n
(Here is where the corollary is utilized. The points a* and b* do depend on n,
and therefore move along as n . The above approximation would not be
valid if the convergence was not uniform in x .) That is, P(a < Sn b)
(b*) (a*).
2. Normal approximation to the Binomial. This is the same problem
as above, where now Xj, j = 1, . . . , n, are independently distributed as
B(1, p). We have = p, = pq . Thus:
( ) ( )
P a < Sn b b * a * ,
where
a* =
a np
npq
, b* =
b np
npq
()
()
1
2npq
ex
where
x=
r np
.
npq
Then it can be shown that fn(r)/n(x)
8.3
191
large n, we have, in particular, that fn(r) is close to n(x). That is, the probability
(nr)prqnr is approximately equal to the value
r np 2
exp
2
npq
2npq
of the normal density with mean np and variance npq for sufciently large n.
Note that this asymptotic relationship of the p.d.f.s is not implied, in general,
by the convergence of the distribution functions in the CLT.
1
To give an idea of how the correction term 21 comes in, we refer to Fig. 8.2
drawn for n = 10, p = 0.2.
0.3
N(2, 1.6)
0.2
0.1
0
Figure 8.2
Now
()
()
P 1 < S n 3 = P 2 S n 3 = fn 2 + fn 3
= shaded area,
while the approximation without correction is the area bounded by the normal
curve, the horizontal axis, and the abscissas 1 and 3. Clearly, the correction,
given by the area bounded by the normal curve, the horizontal axis and the
abscissas 1.5 and 3.5, is closer to the exact area.
To summarize, under the conditions of the CLT, and for discrete r.v.s,
b n
a n
P a < S n b
n
n
and
b + 0.5 n
a + 0.5 n
P a < S n b
n
n
192
((
P a S n bn = P a 1 < S n bn ,
(3)
( ) ( )
P a Sn b b * a *
where
a* =
and
a 1 n
, b* =
( ) ( )
P a S n b b a
b n
(4)
where
a =
a 0.5 n
, b =
b + 0.5 n
(5)
.
n
n
These expressions of a*, b* and a, b in (4) and (5) will be used in calculating
probabilities of the form (3) in the numerical examples below.
EXAMPLE 5
5 , nd P(45 Sn 55).
(Numerical) For n = 100 and p1 = 21 , p2 = 16
i) For p1 = 21 : Exact value: 0.7288
Normal approximation without correction:
1
44 100
2 = 6 = 1.2,
a* =
1 1
5
100
2 2
1
55 100
5
2
= = 1.
b* =
1 1
5
100
2 2
Thus
( ) ( )
() (
() ( )
b * a * = 1 1.2 = 1 + 1.2 1
= 0.841345 + 0.884930 1 = 0.7263.
Normal approximation with correction:
1
45 0.5 100
2 = 5.5 = 1.1
a =
5
1 1
100
2 2
1
55 + 0.5 100
2 = 5.5 = 1.1.
b =
5
1 1
100
2 2
8.3
193
Thus
( ) ( )
( ) (
( )
( ) ( )
so that (b) (a) = 0.0021.
so that b * a * = 0.0030.
a* = 2.75, b* = 4.15,
a = 2.86, b = 5.23,
Then:
n
n
and
a + 0.5 n
b + 0.5 n
P a < S n b
n
n
( ) ( )
P a Sn b b * a *
where
a* =
a 1 n
n
, b* =
b n
n
and
( ) ( )
P a S n b b a
where
a =
EXAMPLE 6
a 0.5 n
n
, b =
b + 0.5 n
n
194
11 16
( ) ( )
so that b a = 0.7851.
Exercises
8.3.1 Refer to Exercise 4.1.12 of Chapter 4 and suppose that another manufacturing process produces light bulbs whose mean life is claimed to be 10%
higher than the mean life of the bulbs produced by the process described in the
exercise cited above. How many bulbs manufactured by the new process must
be examined, so as to establish the claim of their superiority with probability
0.95?
8.3.2 A fair die is tossed independently 1,200 times. Find the approximate
probability that the number of ones X is such that 180 X 220. (Use the
CLT.)
8.3.3 Fifty balanced dice are tossed once and let X be the sum of the
upturned spots. Find the approximate probability that 150 X 200. (Use the
CLT.)
8.3.4 Let Xj, j = 1, . . . , 100 be independent r.v.s distributed as B(1, p). Find
the exact and approximate value for the probability P( j100
= 50). (For the
=1 Xj
latter, use the CLT.)
8.3.5 One thousand cards are drawn with replacement from a standard deck
of 52 playing cards, and let X be the total number of aces drawn. Find the
approximate probability that 65 X 90. (Use the CLT.)
8.3.6 A Binomial experiment with probability p of a success is repeated
1,000 times and let X be the number of successes. For p = 12 and p = 14, nd
the exact and approximate values of probability P(1,000p 50 X 1,000p
+ 50). (For the latter, use the CLT.)
8.3.7 From a large collection of bolts which is known to contain 3% defective bolts, 1,000 are chosen at random. If X is the number of the defective bolts
among those chosen, what is probability that this number does not exceed 5%
of 1,000? (Use the CLT.)
Exercises
195
8.3.8 Suppose that 53% of the voters favor a certain legislative proposal.
How many voters must be sampled, so that the observed relative frequency of
those favoring the proposal will not differ from the assumed frequency by
more than 2% with probability 0.99? (Use the CLT.)
8.3.9 In playing a game, you win or lose $1 with probability 12 . If you play the
game independently 1,000 times, what is the probability that your fortune (that
is, the total amount you won or lost) is at least $10? (Use the CLT.)
8.3.10 A certain manufacturing process produces vacuum tubes whose lifetimes in hours are independently distributed r.v.s with Negative Exponential
distribution with mean 1,500 hours. What is the probability that the total life of
50 tubes will exceed 75,000 hours? (Use the CLT.)
8.3.11 Let Xj, j = 1, . . . , n be i.i.d. r.v.s such that EXj = nite and 2(Xj) =
n | c) = 0.90. (Use
2 = 4. If n = 100, determine the constant c so that P(|X
the CLT.)
8.3.12 Let Xj, j = 1, . . . , n be i.i.d. r.v.s with EXj = nite and 2(Xj) = 2
(0, ).
n | k)
i) Show that the smallest value of the sample size n for which P(|X
2
p is given by n = [ k1 1 ( 1+2 p )] if this number is an integer, and n is
the integer part of this number increased by 1, otherwise. (Use the
CLT.);
ii) By using Tchebichevs inequality, show that the above value of n is given
by n = k (11 p) if this number is an integer, and n is the integer part of this
number increased by 1, otherwise;
iii) For p = 0.95 and k = 0.05, 0.1, 0.25, compare the respective values of n in
parts (i) and (ii).
2
196
< 1). It may be assumed that acceptance and rejection of admission offers by
the various students are independent events.
i) How many students n must be admitted, so that the probability P(|X c|
d) is maximum, where X is the number of students actually accepting an
admission, and d is a prescribed number?
ii) What is the value of n for c = 20, d = 2, and p = 0.6?
iii) What is the maximum value of the probability P(|X 20| 2) for p = 0.6?
Hint: For part (i), use the CLT (with continuity correction) in order to nd the
approximate value to P(|X c| d). Then draw the picture of the normal
curve, and conclude that the probability is maximized when n is close to c/p.
For part (iii), there will be two successive values of n suggesting themselves as
optimal values of n. Calculate the respective probabilities, and choose that
value of n which gives the larger probability.)
n
a.s.
n
The converse is also true, that is, if X
to some nite constant ,
n
then E(Xj) is nite and equal to .
Xn =
X1 + + X n P
.
n
n
PROOF
i) The proof is a straightforward application of Tchebichevs inequality under
the unnecessary assumption that the r.v.s also have a nite variance 2.
n = , 2(X
n) = 2/n, so that, for every > 0,
Then EX
P Xn
1 2
0 as n .
2 n
197
ii) This proof is based on ch.f.s (m.g.f.s could also be used if they exist). By
Theorems 1(iii) (the converse case) and 2(ii) of this chapter, in order to
P
n
prove that X
()
()
X t
t = e it , for t .
n
For simplicity, writing Xj instead of nj=1 Xj, when this last expression
appears as a subscript, we have
n
t t
X t = 1 nx t = x = X
n n
n
()
()
t
t
= 1 + i + o
n
n
t
t
= 1 + i + o 1
n
n
()
t
= 1 + i + o 1
e it .
n
n
( )]
Both laws of LLN hold in all concrete cases which we have studied except
for the Cauchy case, where E(Xj) does not exist. For example, in the Binomial
case, we have:
If Xj, j = 1, . . . , n are independent and distributed as B(1, p), then
Xn =
X1 + + X n
p
n
n
a.s.
Xn =
X1 + + X n
n
n
a.s.
For x ,
()
Fn x =
1
the number of X 1 , , X n x .
n
198
Fn is a step function which is a d.f. for a xed set of values of X1, . . . , Xn. It is
also an r.v. as a function of the r.v.s X1,, . . . , Xn, for each x. Let
1,
Yj x = Yj =
0,
Xj x
X j > x,
()
j = 1, , n.
Then, clearly,
()
Fn x =
1 n
Yj .
n j =1
On the other hand, Yj, j = 1, . . . , n are independent since the Xs are, and Yj is
B(1, p), where
()
p = P Yj = 1 = P X j x = F x .
Hence
n
n
E Yj = np = nF x , 2 Yj = npq = nF x 1 F x .
j =1
j =1
( )[
()
It follows that
[ ( )]
()
()
1
nF x = F x .
n
So for each x , we get by the LLN
E Fn x =
()
()
a.s.
Fn x n
F x ,
( )]
()
()
P
Fn x
F x.
n
0 = 1
n
a.s.
(that is, Fn(x) n
F(x)
uniformly
in
x
).
() ()
PROOF Omitted.
Exercises
8.4.1 Let Xj, j = 1, . . . , n be i.i.d. r.v.s and suppose that EX kj is nite for a
given positive integer k. Set
n
(k ) 1
X n = X kj
n j =1
for the kth sample moment of the distribution of the Xs and show that
P
n(k)
X
EX k1.
n
8.4.2 Let Xj, j = 1, . . . , n be i.i.d. r.v.s with p.d.f. given in Exercise 3.2.14 of
Chapter 3 and show that the WLLN holds. (Calculate the expectation by
means of the ch.f.)
199
n =
1 n
j.
n j =1
0.
n
Show that if the Xs are pairwise uncorrelated and 2j M(<), j 1, then the
generalized version of the WLLN holds.
8.4.4 Let Xj, j = 1, . . . , n be pairwise uncorrelated r.v.s such that
) (
P X j = j = P X j = j =
1
.
2
Show that for all s such that 0 < 1, the generalized WLLN holds.
8.4.5 Decide whether the generalized WLLN holds for independent r.v.s
such that the jth r.v. has the Negative Exponential distribution with parameter
j = 2j/2.
8.4.6 For j = 1, 2, . . . , let Xj be independent r.v.s such that Xj is distributed
as j2/ j . Show that the generalized WLLN holds.
8.4.7 For j = 1, 2, . . . , let Xj be independent r.v.s such that Xj is distributed
as P( j). If {1/nnj=1j} remains bounded, show that the generalized WLLN
holds.
( j ) a.s.
Xj,
X n n
(1 )
(k )
a.s.
g X1 , , X k .
j = 1, , k imply g X n , , X n n
200
PROOF Follows immediately from the denition of the a.s. convergence and
the continuity of g.
A similar result holds true when a.s. convergence is replaced by convergence in probability, but a justication is needed.
THEOREM 7
P
i) Let Xn, n 1, X and g be as in Theorem 7(i), and suppose that Xn
X.
n
P
Then g(Xn)
g(X).
n
ii) More generally, let again X (j)
n , Xj and g be as in Theorem 7(ii), and suppose
P
P
(k)
that X (j)
X
,
j
=
1,
.
.
.
,
k. Then g(X (1)
g(X1, . . . , Xk).
n
j
n , . . . , Xn )
n
n
PROOF
i) We have P(X ) = 1, and if Mn (Mn > 0), then P(X [Mn, Mn])
([
P X , M n
)] + [ X (M
n0 ,
n0
n0
>1 .
P X > M < 2.
g being continuous in , is uniformly continuous in [2M, 2M]. Thus for
every > 0, there exists (, M) = () (<1) such that |g(x) g(x)| < for
P
all x, x [2M, 2M] with |x x| < (). From Xn
X we have that
n
there exists N() > 0 such that
( )]
()
P Xn X < 2 , n N .
Set
() [
( )]
A1 = X M , A2 n = X n X < ,
and
()
( ) ( )
A3 n = g X n g X <
] (for n N ( )).
Then it is easily seen that on A1 A2(n), we have 2M < X < 2M, 2M < Xn
< 2M, and hence
()
()
A1 A2 n A3 n ,
()
()
Hence
[ ( )] ( ) [ ( )]
(for n N ( )).
( ) ( )
P g X n g X < .
201
P
P
X, Yn
Y, then
If Xn
n
n
P
i) Xn + Yn
X + Y.
n
P
ii) aXn + bYn
aX + bY (a, b constants).
n
P
iii) XnYn
XY.
n
P
PROOF In sufces to take g as follows and apply the second part of the
theorem:
i) g(x, y) = x + y,
ii) g(x, y) = ax + by,
iii) g(x, y) = xy,
y 0.
d
P
If Xn
X and Yn
c, constant, then
n
n
d
i) Xn +Yn
X + c,
n
d
ii) XnYn
cX,
n
d
iii) Xn/Yn
Equivalently,
()
()
i) P X n + Yn z = FX n +Yn z
FX + c z
n
(
()
c 0.
= P X + c z = P X z c = FX z c ;
()
ii) P X nYn z = FX Y z
FcX z
n
n n
z
z
c>0
P X = FX ,
c
c
= P cX z =
P X z = 1 F z ,
c < 0;
X
c
c
iii) P n z = FX n
Yn
Yn
(z) F (z)
X c
(
(
)
)
( )
( )
c>0
X
P X cz = FX cz ,
= P z =
c < 0,
c
P X cz = 1 FX cz ,
provided P(Yn 0) = 1.
202
1. In fact,
c|
)
1
for
every
> 0, or
n
n
n
P(c Yn c + )
1.
Thus,
if
we
choose
<
c,
we
obtain
the
result.
Next,
n
since P(Yn 0) = 1, we may divide by Yn except perhaps on a null set. Outside
this null set, we have then
X
P n z = P n z I Yn > 0 + P n z I Yn < 0
Yn
Yn
Yn
X
P n z I Yn > 0 + P Yn < 0 .
Yn
Xn
X
z = n z I Yn c + n z I Yn c <
Yn
Yn
Yn
Yn c
)U ( X
zYn
)I ( Y
c < .
(X
) [
)]
) (
) [
)]
zYn Yn c < X n z c + , if z 0,
and
(X
) (
(X
) (
) [
zYn Yn c < X n z c
)]
and hence
Xn
z Yn c
Yn
)U [X
)]
z c , z .
Thus
P n z P Yn c + P X n z c , z .
Yn
) [
)]
lim sup P n z FX z c , z .
n
Yn
[ ( )]
203
lim sup P n z FX zc , z .
n
Yn
( )
Next,
(6)
By choosing < c, we have that |Yn c| < is equivalent to 0 < c < Yn <
c + and hence
[X z(c )]I ( Y
c < n z , if
Yn
z 0,
[X z(c + )]I ( Y
c < n z , if
Yn
z < 0.
and
n
[X z(c )]I ( Y
n
c < n z
Yn
and hence
[X z(c )] ( Y
n
)U XY
z , z .
Thus
X
P X n z c P Yn c + P n z .
Yn
)] (
FX z c lim inf P n z , z .
n
Yn
[ ( )]
FX zc lim inf P n z , z .
Y
n
n
( )
Relations (6) and (7) imply that lim P(Xn/Yn z) exists and is equal to
n
FX zc = P X zc = P z = FX
c
( )
Thus
(z).
(7)
204
P n z = FX
Yn
Yn
(z) F (z),
n
z ,
X c
as was to be seen.
1 n
X j Xn
n j =1
1 n
X 2j X n2 .
n j =1
Next, the r.v.s X j2, j = 1, . . . , n are i.i.d., since the Xs are, and
( )
( ) (
E X 2j = 2 X j + EX j
= 2 + 2 , if
( )
( )
= E Xj , 2 =2 Xj
(which are assumed to exist). Therefore the SLLN and WLLN give the result
that
1 n
2 + 2
X 2j
n
n j =1
a.s.
2n
and also in probability. On the other hand, X
a.s.
and also in probability (by the same theorems just referred to). So we have
proved the following theorem.
THEOREM 9
REMARK 13 Of course,
P
S 2n
2
n
implies
2
n Sn P
1,
n 1 2 n
since n/(n 1)
1.
n
COROLLARY
TO THEOREM 8
If X1, . . . , Xn are i.i.d. r.v.s with mean and (positive) variance 2, then
n 1 Xn
Sn
PROOF In fact,
) N 0, 1
( )
d
n
and also
n Xn
n Xn
Sn
) N 0, 1 ,
( )
d
n
) N 0, 1 .
( )
d
n
205
by Theorem 3, and
n
Sn P
1,
n
n1
by Remark 13. Hence the quotient of these r.v.s which is
n 1 Xn
Sn
g(d)X as n .
0, or
P
equivalently, Xn d
0, and hence, by Theorem 7(i),
P
Xn d
0.
(8)
) ( )
( ) () (
g X n = g d + X n d g X *n ,
where X n* is an r.v. lying between d and Xn. Hence
[ ( ) ( )]
) (
cn g X n g d = cn X n d g X *n .
(9)
P
P
However, |X n* d| |Xn d|
d, and therefore,
( )
()
P
g X *n
g d.
(10)
Let the r.v.s X1, . . . , Xn be i.i.d. with mean and variance 2 (0, ),
and let g: be differentiable with derivative continuous at . Then, as
n ,
[ ( ) ( )]
[ ( )] .
n g Xn g
d N 0, g
d
n )
n(X
206
[ ( ) ( )]
[ ( )] .
d
n g Xn g
g X ~ N 0, g
()
Here = p, 2 = pq, and g(x) = x(1 x), so that g(x) = 1 2x. The result
follows.
[ (
Exercises
8.5.1 Use Theorem 8(ii) in order to show that if the CLT holds, then so does
the WLLN.
8.5.2 Refer to the proof of Theorem 7(i) and show that on the set A1
A2(n), we actually have 2M < X < 2M.
8.5.3 Carry out the proof of Theorem 7(ii). (Use the usual Euclidean
distance in k.)
( )
()
F < 2, F > 1 2 .
(11)
( ) ( )
F x j +1 F x j < 2,
j = 1, , r 1.
(12)
F(xj) implies that there exists Nj() > 0 such that for all
Next, Fn(xj)
n
n Nj(),
( ) ( )
Fn x j F x j < 2,
By taking
j = 1, , r .
8.6*
( ()
()
207
( ))
n N = max N1 , , N r ,
we then have that
( ) ( )
Fn x j F x j < 2,
j = 1, , r.
(13)
Let x0 = , xr+1 = . Then by the fact that F() = 0 and F() = 1, relation (11)
implies that
( ) ( )
( ) ( )
F x1 F x0 < 2, F x r +1 F x r < 2.
(14)
( ) ( )
F x j +1 F x j < 2,
j = 0, 1, , r .
(15)
( ) ( )
Fn x j F x j < 2,
j = 0, 1, , r + 1.
(16)
Next, let x be any real number. Then xj x < xj+1 for some j = 0, 1, . . . , r. By (15)
and (16) and for n N(), we have the following string of inequalities:
( )
( ) () ( ) ( )
< F (x ) + F (x) + F (x ) + .
Hence
()
()
( )
( )
0 F x + Fn x F x j+1 + F x j + 2 < 2
() ()
Fn x F x <
for every x .
(17)
()
Xj n
Xj > n
and
0,
if
Zj n = Zj =
X j , if
()
Xj n
X j > n,
j = 1, , n.
208
( )
( )
2 Y j = 2 Y1
( ) ( )
= E Y 12 EY1
= E X 12 I
= x2I
[X
( )
E Y 12
] ( X1 )
[ x n] ( x ) f ( x )dx
()
x 2 f x dx n
()
x f x dx n
()
x f x dx
= nE X1 ;
that is,
( )
2 Yj n E X 1 .
Next,
( )
( )
E Yj = E Y1 = E X 1 I
= xI
[X
(18)
] ( X1 )
[ x n ] ( x) f ( x)dx.
Now,
xI
xf ( x ),
n
[ x n] ( x ) f ( x ) < x f ( x ), xI [ x n] ( x ) f ( x )
and
()
x f x dx < .
Therefore
xI
xf ( x )dx =
n
[ x n] ( x ) f ( x )dx
( )
E Yj
.
n
1 n
n
P Yj EYj = P Yj E Yj n
j = 1
j =1
n j = 1
n
1
2 Yj
2
n
j =1
2
( )
n 2 Y1
n 2
n n E X 1
2
n 2 2
E X1
2
(19)
8.6*
209
(20)
Thus,
1 n
1 n
P Y j 2 = P Y1 E Y j + E Y1 2
n j=1
n j=1
( ) ( ( ) )
1 n
P Y j EY1 + EY1 2
n j=1
1 n
P Y j EY1 + P EY1
n j=1
2 E X1
1 n
P Y j 2 2 E X1
n j=1
for n large enough. Next,
(21)
) (
)
= P ( X > n)
P Zj 0 = P Zj > n
j
( x > n
f ( x)dx
()
f x dx
) ( )
f x dx
) ( )
f x dx +
( x > in >1
<
( x > n ) n
()
f x dx
()
1
x f x dx
n ( x > n )
1 2
<
n
x f x dx < 2
= , since
( x > n )
n
=
()
P Z j 0 nP Z j 0
j = 1
(22)
210
1 n
1 n
1 n
P X j 4 = P Y j + Z j 4
n j =1
n j = 1
n j = 1
1 n
1 n
P Y j + Z j 4
n j =1
n j = 1
1 n
1 n
P Y j 2 + P Z j 2
n j = 1
n j = 1
1 n
n
P Y j 2 + P Z j 0
n j = 1
j = 1
2 E X1 +
1 n
P X j 4 E X 1 + 3
n j = 1
for n sufciently large. Since this is true for every > 0, the result
follows.
n
a.s.
X does not necessarily imply that Xn n
X.
However,
the
following
is
always true.
THEOREM 11
P
If Xn
PROOF Omitted.
As an application of Theorem 11, refer to Example 2 and consider the
subsequence of r.v.s {X2 1}, where
k
X2
= I 2
k1
1
k 1 1
2
Then for > 0 and large enough k, so that 1/2k1 < , we have
1
P X 2 1 > = P X 2 1 = 1 = k1 < .
2
Hence the subsequence {X2 1} of {Xn} converges to 0 in probability.
) (
Exercises
Exercises
8.6.1 Use Theorem 11 in order to prove Theorem 7(i).
8.6.2 Do likewise in order to establish part (ii) of Theorem 7.
211
212
Chapter 9
9.1.1
Application 1: Transformations of
Discrete Random Variables
Let X be a discrete r.v. taking the values xj, j = 1, 2, . . . , and let Y = h(X). Then
Y is also a discrete r.v. taking the values yj, j = 1, 2, . . . . We wish to determine
fY(yj) = P(Y = yj), j = 1, 2, . . . . By taking B = {yj}, we have
{ ( ) }
A = xi; h xi = y j ,
and hence
( )
({ }) = P ( A) = f (x ),
fY y j = P Y = y j = PY y j
212
x i A
9.1
where
( )
213
fX x i = P X = x i .
EXAMPLE 1
( ) (
) (
)
= ( x = r ) + ( x = r ) = { r} + {r}.
A = h 1 B = x 2 = r 2 = x = r or
Thus
( )
( )
({ })
x=r
({ }) = 21n + 21n = 1n .
PY B = PX A = PX r + PX r
That is,
P Y = r 2 = 1 n, r = 1, , n.
EXAMPLE 2
{y = x
} {
+ 2 x 3; x = 0, 1, = 3, 0, 5, 12, .
From
x 2 + 2 x 3 = y,
we get
x 2 + 2 x y + 3 = 0, so that
x = 1 y + 4 .
Hence x = 1 + y + 4 , the root 1 y + 4 being rejected, since it is negative. Thus, if B = {y}, then
( ) {
A = h 1 B = 1 + y + 4 ,
and
( )
( )
PY B = P Y = y = PX A =
e 1+ y + 4
.
1 + y + 4 !
214
transformation, h1, exists and, of course, h1[h(x)] = x. For such a transformation, we have
()
) [( ) ]
= P{h [ h( X )] h ( y)}
= P( X x ) = F ( x ),
(
FY y = P Y y = P h X y
1
( ) [ ( ) ] { [ ( )]
= P[ X h ( y)] = P( X x )
= 1 P( X < x ) = 1 F ( x ),
( )}
FY y = P h X y = P h 1 h X h 1 y
1
where FX(x) is the limit from the left of FX at x; FX(x) = lim FX(y), y x.
Figure 9.1 points out why the direction of the inequality is reversed when h1 is applied if h in monotone decreasing.
REMARK 1
y
y h(x)
(y y0) corresponds,
under h, to (x x0)
y0
x0 h1 (y0 )
0
Figure 9.1
Let h: S T be one-to-one and monotone. Then FY(y) = FX(x) if h is increasing, and FY(y) = 1 FX(x) if h is decreasing, where x = h1(y) in either case.
REMARK 2 Of course, it is possible that the d.f. FY of Y can be expressed in
terms of the d.f. FX of X even though h does not satisfy the requirements of the
corollary above. Here is an example of such a case.
EXAMPLE 3
() (
) (
) [( ) ] (
FY y = P Y y = P h X y = P X 2 y = P y X y
) (
) ( y ) F ( y );
= P X y P X < y = FX
that is,
()
FY y = FX
( y ) F ( y )
9.1
215
We will now focus attention on the case that X has a p.d.f. and we will
determine the p.d.f. of Y = h(X), under appropriate conditions.
One way of going about this problem would be to nd the d.f. FY of the r.v.
Y by Theorem 1 (take B = (, y], y ), and then determine the p.d.f. fY of
Y, provided it exists, by differentiating (for the continuous case) FY at continuity points of fY. The following example illustrates the procedure.
EXAMPLE 4
()
()
FY y = FX
Next,
d
FX
dy
so that
( y ) F ( y ),
y 0.
( y ) = f ( y ) dyd
X
and
y=
( )
1
2 y
fX
( y) = 2
1
2 y
e y 2 ,
( )
d
1
1
FX y =
fX y =
e y 2 ,
dy
2 y
2 2 y
()
()
d
1
FY y = fY y =
dy
y
1
2
ey 2 =
( )Z
1
2
1
2
1
1
2
ey
( = ( )),
1
2
d 1
h y ,
y T
f h 1 y
fY y = X
dy
0,
otherwise.
For a sketch of the proof, let B = [c, d] be any interval in T and set A = h1(B).
Then A is an interval in S and
()
[ ( )]
()
) [( ) ] (
()
P Y B = P h X B = P X A = fX x dx.
A
Under the assumptions made, the theory of changing the variable in the
integral on the right-hand side above applies (see for example, T. M. Apostol,
216
[ ( )] dyd h ( y) dy.
P Y B = fX h 1 y
B
Let the r.v. X have a continuous p.d.f. fX on the set S on which it is positive, and
let y = h(x) be a (measurable) transformation dened on into , so that
Y = h(X) is an r.v. Suppose that h is one-to-one on S onto T (the image of S
under h), so that the inverse transformation x = h1(y) exists for y T. It
is further assumed that h1 is differentiable and its derivative is continuous
and 0 on T. Then the p.d.f. fY of Y is given by
d 1
h y ,
f h 1 y
fY y = X
dy
0,
[ ( )]
()
EXAMPLE 5
()
y T
otherwise.
()
h 1 y =
1
yb
a
Therefore,
()
fY y =
exp
()
d 1
1
h y = .
dy
a
and
2 2 a
y b
a
[ (
)]
y a + b
exp
2 a 2 2
2 a
which is the p.d.f. of a normally distributed r.v. with mean a + b and variance
a22. Thus, if X is N(, 2), then aX + b is N(a + b, a22).
Now it may happen that the transformation h satises all the requirements
of Theorem 2 except that it is not one-to-one from S onto T. Instead, the
following might happen: There is a (nite) partition of S, which we denote by
9.1
217
Let the r.v. X have a continuous p.d.f. fX on the set S on which it is positive, and
let y = h(x) be a (measurable) transformation dened on into , so that
Y = h(X) is an r.v. Suppose that there is a partition {Sj, j = 1, . . . , r} of S and
subsets Tj, j = 1, . . . , r of T (the image of S under h), which need not be distinct
or disjoint, such that rj=1Tj = T and that h dened on each one of Sj onto Tj ,
j = 1, . . . , r, is one-to-one. Let hj be the restriction of the transformation h to
1
Sj and let h1
j be its inverse, j = 1, . . . , r. Assume that h j is differentiable and
its derivative is continuous and 0 on Tj, j = 1, . . . , r. Then the p.d.f. fY of Y is
given by
r
j y fY y ,
fY y =
j =1
0,
() ()
()
y T
otherwise,
where for j = 1, . . . , r,
[ ( )] dyd h ( y) ,
()
fY y = fX h j 1 y
j
1
j
y T j ,
[ ( )] dyd h ( y) ;
()
fY y = fX h j 1 y
j
1
j
EXAMPLE 6
Consider the r.v. X and let Y = h(X) = X2. We want to determine the p.d.f. fY
of the r.v. Y. Here the conditions of Theorem 3 are clearly satised with
S1 = , 0 ,
S 2 = 0, ,
T1 = 0, ,
()
h11 y = y ,
()
h 21 y =
y,
so that
()
d 1
1
,
h1 y =
dy
2 y
Therefore,
()
T2 = 0,
1
d 1
, y > 0.
h2 y =
dy
2 y
218
()
( ) 2 1y ,
fY y = fX y
1
()
fY y =
1
fX
2 y
()
fY y = fX
2
( y) 2 1y ,
( y ) + f ( y ),
X
Exercises
9.1.1 Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3 and
determine the p.d.f. of the r.v. Y = X 3.
9.1.2 Let X be an r.v. with p.d.f. of the continuous type and set
Y = nj= 1cjIB (X), where Bj, j = 1, . . . , n, are pairwise disjoint (Borel) sets and cj,
j = 1, . . . , n, are constants.
i) Express the p.d.f. of Y in terms of that of X, and notice that Y is a discrete
r.v. whereas X is an r.v. of the continuous type;
ii) If n = 3, X is N(99, 5) and B1 = (95, 105), B2 = (92, 95) + (105, 107),
B3 = (, 92] + [107, ), determine the distribution of the r.v. Y dened
above;
iii) If X is interpreted as a specied measurement taken on each item of a
product made by a certain manufacturing process and cj, j = 1, 2, 3 are the
prot (in dollars) realized by selling one item under the condition that
X Bj, j = 1, 2, 3, respectively, nd the expected prot from the sale of one
item.
j
9.2
9.1 The
TheMultivariate
Univariate Case
219
()
f x =
x 2 e
( ),
1 2 x 2
x
EXAMPLE 7
()
FY y = P X1 + X 2 y =
{x +x
1
fX
1 ,X 2
(x , x )dx dx .
1
()
FY y =
( )
A,
220
where A is the area of that part of the square lying to the left of the line
x1 + x2 = y. Since for y + , A = (y 2)2/2, we get
( y 2 )
F ( y) =
2( )
2 < y + .
for
()
FY y =
( )
(2 y)
= 1 2 y
)
)
2
2
Thus we have:
0 ,
2
y 2
,
2
2
FY y =
2 y
1
2
1,
()
)
)
y 2
2 < y +
)
)
2
2
+ < y 2
y > 2 .
x2
2
x1
x2 y
x1 x2 y
2
x1
2
Figure 9.2
9.2
9.1 The
TheMultivariate
Univariate Case
EXAMPLE 8
221
fY ,Y y1 , y2 = P Y1 = y1 , Y2 = y2 = P X1 = y1 y2 , X 2 = y2 ,
1
) (
P X1 = y1 y2 P X 2 = y2
n1 y y n ( y y ) n2 y n
=
p q
q
p
y1 y2
y2
1
y2
n1 n2 y ( n + n ) y
;
=
p q
y1 y2 y2
1
that is
n1 n2 y ( n + n ) y
fY ,Y y1 , y2 =
,
p q
y1 y2 y2
1
0 y1 n1 + n2
u = max 0, y1 n1 y2 min y1 , n2 = .
Thus
( )
) f (y , y ) = p
fY y1 = P Y1 = y1 =
1
y2 =u
Y1 ,Y2
y1
( n + n ) y
1
n1 n2
.
1 y2 y2
y2 =u
Next, for the four possible values of the pair, (u, ), we have
y1
y
n1 n2
n2 n n1 n2
=
=
y =0 y1 y2 y2
y = y n y1 y2 y2
1 y2 y2
y2 = 0
n1
n2
y2 = y1 n 1
n1 n2 n1 + n2
=
;
y1 y2 y2 y1
that is, Y1 = X1 + X2 is B(n1 + n2, p). (Observe that this agrees with Theorem 2,
Chapter 7.)
Finally, with y1 and y2 as above, it follows that
n1 n2
y y2 y2
P Y2 = y2 Y1 = y1 = 1
,
n1 + n2
y1
222
THEOREM 2
() ( ()
( ))
y = h x = h1 x , , hk x
() ( ()
( ))
x = h 1 y = g1 y , , gk y
exists for y T .
()
g ji y =
g j y1 , , yk ,
yi
i, j = 1, , k
[ ( )]
[ ()
( )]
f h 1 y J = f g y ,
X
, gk y J ,
1
fY y = X
0,
()
y T
otherwise,
J=
g11
g 21
g12
g 22
g1k
g2 k
M
g k1
M
gk2
M
M
g kk
and is assumed to be 0 on T.
In Theorem 2, the transformation h transforms the k-dimensional r. vector X to the k-dimensional r. vector Y. In many applications,
however, the dimensionality m of Y is less than k. Then in order to determine
the p.d.f. of Y, we work as follows. Let y = (h1(x), . . . , hm(x)) and choose
another k m transformations dened on k into , hm+j, j = 1, . . . , k m,
say, so that they are of the simplest possible form and such that the
transformation
REMARK 4
h = h1 , , hm , hm + 1 , , hk
9.1 The
TheMultivariate
Univariate Case
9.2
EXAMPLE 9
223
fX
1 ,X 2
0 ,
(x , x ) (
1
< x1 , x 2 <
otherwise.
Y = X1 + X 2
y = x1 + x 2
, < x1 , x 2 < ; then 1
h: 1
Y2 = X 2 .
y2 = x 2
From h, we get
x1 = y1 y2
x2 = y2 .
Then
J=
1 1
=1
0
1
and also < y2 < . Since y1 y2 = x1, < x1 < , we have < y1 y2 < . Thus
the limits of y1, y2 are specied by < y2 < , < y1 y2 < . (See Figs. 9.3 and
9.4.)
x2
(Fig. 9.4)
S
x1
0
y2
y1 y2 c
y1 y2
2
0
y1 y2
2
y1
224
Thus we get
fY ,Y y1 , y2
1
0 ,
) (
otherwise.
Therefore
fY
y1 =
1
0,
( )
y1
dy2 =
y dy2 =
1
y1 2
2 y1
( )
for 2 < y1 +
otherwise.
fY1(y1)
Figure 9.5
1
( )
2
REMARK 5
EXAMPLE 10
2
y1
Let X1, X2 be i.i.d. r..s from U(1, ). Set Y1 = X1X2 and nd the p.d.f. of Y1.
Consider the transformation
y = x1 x 2
Y = X1 X 2
; then 1
h: 1
y2 = x 2
Y2 = X 2 .
From h, we get
Now
y
x1 = 1
y2
x = y
2
2
1 y1
1
and J = y2 y22 = .
y2
0
1
S = x1 , x2 ; f X
is transformed by h onto
1 ,X 2
( x , x ) > 0
1
y
T = y1 , y2 ; 1 < 1 < , 1 < y2 < .
y2
9.2
9.1 The
TheMultivariate
Univariate Case
225
y2 y1
y2
y1
1
0
y1
2
Figure 9.6
fY ,Y
1
y1 , y2 = 1
0,
) (
(y , y )
1
,
2
otherwise,
we have
fY
1
y1 =
1
1
( ) (
(
)
)
y1
dy2
1
=
y2
1
y1
dy2
1
=
y2
1
(2 log log y ),
1
y1 < 2 ;
that is
fY1
EXAMPLE 11
1
log y1 ,
2
1
1
2 log log y1 ,
y1 =
2
1
0,
( )
1 < y1 <
y1 < 2
otherwise.
Let X1, X2 be i.i.d. r..s from N(0, 1). Show that the p.d.f. of the r.v.
Y1 = X1/X2 is Cauchy with = 0, = 1; that is,
( )
fY y1 =
1
1
1
, y1 .
1 + y 21
We have
Y1 = X1/X2. Let Y2 = X2 and consider the transformation
226
y = x1 x2 , x2 0
h: 1
y2 = x2 ;
x = y1 y2
then 1
x2 = y2
and
J=
y2
y1
= y2 , so that
J = y2 .
fY1 ,Y2 y1 , y2 = fX 1 ,X 2 y1 y2 , y2 y2 =
and therefore
1
fY ( y ) =
2
1
y2 + 1 y2
y 21 y22 + y22
1
2
1
exp 2 y2 dy2 = 0 exp 2 y2 dy2 .
Set
(y
2
1
+1
2
)y
2
2
= t , so that
y22 =
2t
y +1
2
1
and
2y2 dy2 =
2dt
, or
y 21 + 1
y2 dy2 =
[ )
dt
, t 0, .
y +1
2
1
0 e
dt
1
1
1
1
e t dt = 2
,
= 2
y +1
y1 + 1
y1 + 1
2
1
since
0 e
dt = 1;
that is,
( )
fY y1 =
1
EXAMPLE 12
1
1
.
y12 + 1
x1
x = y1 y2
y =
h: 1 x1 + x2 , x1 , x2 > 0; then 1
x2 = y2 y1 y2 .
y = x + x
1
2
2
9.2
9.1 The
TheMultivariate
Univariate Case
227
Hence
J=
y2
y2
y1
= y2 y1 y2 + y1 y2 = y2
1 y1
J = y2 .
and
Next,
fX
1 ,X 2
x + x2
1
x 1 x2 1 exp 1
, x1 , x2 > 0,
1
x1 , x2 = 2 2
2
otherwise,
, > 0.
0,
( )()
y1 =
x1
1
=
1.
x1 + x 2 1 + x 2 x1
Thus 0 < y1 < 1 and, clearly, 0 < y2 < . Therefore, for 0 < y1 < 1, 0 < y2 < , we
get
fY1 ,Y2 y1 , y2 =
=
1
y1 1 y2 1 y2 1 1 y1
2 +
( )()
1
( )()
y1 1 1 y1
y
exp 2 y2
2
y2 + 1e y 2 2 .
Hence
( )
fY1 y1 =
1
y1 1 1 y1
2 +
( )()
y2 + 1e y2 2 dy2 .
0
But
y2 + 1e y
Therefore
fY
EXAMPLE 13
( )
()()
y 1 1 y1
y1 = a 1
0,
( )
dy2 = 2 + t + 1e t dt = 2 + + .
0 < y1 < 1
otherwise.
e x , x > 0
f x =
x 0.
0 ,
()
Set
Y1 =
X1
X1 + X 2
, Y2 =
, Y3 = X1 + X 2 + X 3
X1 + X 2
X1 + X 2 + X 3
228
x1
y1 =
x1 + x 2
x1 = y1 y2 y3
x1 + x 2
h: y2 =
, x1 , x 2 , x3 > 0; then x 2 = y1 y2 y3 + y2 y3
x1 + x 2 + x3
x = y y + y
2 3
3
3
y3 = x1 + x 2 + x3
and
y2 y3
y1 y3
J = y2 y3
y1 y2
y1 y3 + y3
y3
y1 y2 + y2 = y2 y32 .
y2 + 1
Now from the transformation, it follows that x1, x2, x3 (0, ) implies that
( )
( )
y1 0, 1 , y2 0, 1 , y3 0, .
Thus
fY ,Y
1
2 ,Y 3
2 y3
3
( y , y , y ) = 0y, y e
1
Hence
( ) y y e
(y ) = y y e
fY y1 =
1
fY
2 y3
3
dy2 dy3 = 1,
2 y3
3
0 < y1 < 1,
= 2 y2 , 0 < y2 < 1
and
( )
1 1
0 0
y dy
0
1 2 y
y3 e , 0 < y3 < .
2
3
Since
( ) ( ) ( )
9.2
9.1 The
TheMultivariate
Univariate Case
229
()
fX x =
( )
1 2 x2
x ,
1
( r 2 )1 y 2
,
y
e
1
1 2 )r
(
fY y = r 2
2
0,
()
y>0
( )
y 0.
1
t u
x =
t =
h:
y r ; then
r
y = u
u = y
and
u
J=
2 u r =
1
r
0
( )
fT ,U t , u =
=
1
2
( )
t2 u 2 r
1
( r 2 )1 u 2 u
u
e
.
r 2
r 22
r
( )
( )
2r r 2 2 r
(1 2 )( r +1)1
u
t2
exp 1 + .
r
2
Hence
()
fT t =
( )
2r r 2 2 r
(1 2 )( r +1)1
u
t2
exp 1 + du.
r
2
We set
1
u
t2
t2
t2
1
z
u
2
z
1
du
2
1
dz,
+
=
,
so
that
=
+
,
=
+
2
r
r
r
230
(1 2 )( r +1)1
1
2
2z
e z
dz
fT t =
2
0
r 2
1+ t r
1 + t2 r
2r r 2 2
(1 2 )( r +1)
(1 2 )( r +1)1
1
2
z
=
e z dz
1 2 )( r +1) 0
(
r 2
2
2r r 2 2
1+ t r
()
( )
( )
( )
( )
[ ( )]
( ) [ ( )]
1 2 r +1
r r 2 1 + t 2 r ( )( )
[ (r + 1)];
1
2
that is
(r + 1)]
() [
r r 2
fT t =
1
2
( ) [1 + (t r )](
2
)( )
1 2 r +1
t .
The probabilities P(T t) for selected values of t and r are given in tables (the
t-tables). (For the graph of fT, see Fig. 9.7.)
fT (t)
t (N(0, 1))
t5
0
Figure 9.7
The density of the F distribution with r1, r2 d.f. (Fr ,r ). Let the independent r..s
X and Y be distributed as 2r and 2r , respectively, and set F = (X/r1)/(Y/r2). The
r.v. F is said to have the F distribution with r1, r2 degrees of freedom (d.f.) and
is often denoted by Fr ,r .
We want to nd its p.d.f. We have:
1
fX
1
1
x = 2 r1 2 r1
0,
()
( )
( r 2 )1 x 2
e
, x>0
1
x 0,
9.2
9.1 The
TheMultivariate
Univariate Case
1
1
fY y = 2 r2 2 r
0,
()
( )
( r 2 )1 y 2
e
, y>0
2
y 0.
x r1
r
f =
x = 1 fz
h:
y r2 ; then
r2
z = y
y = z
and
r1
z
J = r2
0
r1
f r1
r2 = z, so that
r2
1
J =
r1
z.
r2
( )
f F ,Z f , z =
( r )( r )2
1
2 1
1
2 2
(1 2 )( r + r
1
r1
) r2
( r 2 )1
1
( r 2 )1 ( r 2 )1 ( r 2 )1
z
z
1
r
r
exp 1 fz e z 2 1 z
r2
2r2
r 2 ( r 2 )1
r1 r2
f
zr
(1 2 )( r + r )1
=
z
exp 1 f + 1 .
1
2
r
+
r
( )( )
2 r2
12 r1 12 r2 2
)
( )( )
1
Therefore
()
( )
(r r ) f (
=
(
( r ) ( r )2
fF f = fF ,Z f , z dz
0
r1 2
1
2 1
1
2 2
r1 2 1
)(
1 2 r1 + r2
(1 2 )( r + r )1
1
zr
exp 1 f + 1 dz.
2 r2
Set
1
r1
z r1
f + 1 = t , so that z = 2t f + 1 ,
2 r2
r
dz = 2 1 f + 1 dt , t 0, .
r2
231
232
(r r ) f (
f
=
()
(
( r ) ( r )2
fF
2 1 f + 1
r2
)(
1 2 r1 + r2
1
2 2
1
2 1
r1 2 1
r1 2
[ (r + r )](r r )
1
2
(1 2 )( r + r )1
1
1
2 2
( )(
1 2 r1 + r2 +1
e t dt
r1 2
( r )( r )
1
2 1
(1 2 )( r + r )1 r1
f + 1
)
r2
( r 2 )1
1
(1 2 )( r + r )
[1 + (r r ) f ]
1
Therefore
fF
[(
)]( )
( )( )
1 r + r r r
2 1 2 1 2
f =
12 r1 12 r2
0,
()
r1 2
( r 2 )1
1
(1 2 ) ( r + r )
[1 + (r r ) f ]
1
, for
f >0
for
f 0.
The probabilities P(F f) for selected values of f and r1, r2 are given by
tables (the F-tables). (For the graph of fF, see Fig. 9.8.)
fF ( f )
F10, 10
F10, 4
10
20
30
Figure 9.8
REMARK 6
9.1
The Univariate
Exercises
Case
233
() ()
()
y T
otherwise,
1
j
Jj =
g j11
g j21
g j12
g j22
g j1k
g j2 k
M
g jk1
M
g jk 2
M
M
g jkk
Exercises
9.2.1 Let X1, X2 be independent r.v.s taking on the values 1, . . . , 6 with
probability f(x) = 16 , x = 1, . . . , 6. Derive the distribution of the r.v. X1 + X2.
9.2.2 Let X1, X2 be r.v.s with joint p.d.f. f given by
f x1 , x 2 =
1
I A x1 , x 2 ,
where
A = x1 , x2 2 ; x 21 + x22 1.
Set Z2 = X 21 + X 22 and derive the p.d.f. of the r.v. Z2. (Hint: Use polar coordinates.)
9.2.3 Let X1, X2 be independent r.v.s distributed as N(0, 1). Then:
i) Find the p.d.f. of the r.v.s X1 + X2 and X1 X2;
ii) Calculate the probability P(X1 X2 < 0, X1 + X2 > 0).
9.2.4 Let X1, X2 be independent r.v.s distributed as Negative Exponential
with parameter = 1. Then:
234
X1 + X 2 , X1 X 2 , and X1 X 2 ;
ii) Show that X1 + X2 and X1/X2 are independent.
9.2.5
()
()
1
I 1, x .
x2 ( )
Determine the distribution of the r.v. X = X1/X2.
f x =
9.2.7
9.2.8
(Hint: For part (iv), use Stirlings formula (see, for example, W. Fellers book
An Introduction to Probability Theory, Vol. I, 3rd ed., 1968, page 50) which
states that, as n , (n)/(2)1/2n(2n1)/2en tends to 1.)
9.2.9 Let X1, X2 be independent r.v.s distributed as 2r and 2r , respectively,
and set X = X1 + X2, Y = X1/X2. Then show that:
1
r1
9.2.10
that:
235
9.2.11
( )
EX r = 0, r 2; 2 X r =
9.2.12
EX r1 ,r2 =
9.2.13
r
, r 3.
r 2
2r 22 r1 + r2 2
r2
, r2 3; 2 X r1 ,r2 =
, r2 5.
2
r2 2
r1 r2 2 r2 4
)(
Let Xr be an r.v. distributed as tr, and let fr be its p.d.f. Then show that:
x2
exp , x .
2
2
()
fr x
Fr ,r
1
1 2
r ;
r1
1
as r2 .
9.3.1 Preliminaries
A transformation h: k k which transforms the variables x1, . . . , xk to the
variables y1, . . . , yk in the following manner:
k
yi = cij xj , cij
j =1
real constants, i, j = 1, 2, , k
(1)
xi = dij y j , dij
j =1
real constants, i, j = 1, , k.
(2)
Let D = (dij) and * = |D|. Then, as is known from linear algebra (see also
Appendix 1), * = 1/. If, furthermore, the linear transformation above is
such that the column vectors (c1j, c2j, . . . , ckj), j = 1, . . . , k are orthogonal,
that is
236
j j
i =1
and
(3)
k
2
c
1
,
j
1
,
,
k
,
=
=
ij
i =1
then the linear transformation is called orthogonal. The orthogonality
relations (3) are equivalent to orthogonality of the row vectors (ci1, . . . , cik)
i = 1, . . . , k. That is,
k
cij ci j = 0
for
for i i
j =1
k
cij2 = 1, i = 1, , k.
j =1
cij ci j = 0
and
(4)
xi = c ji y j , i = 1, , k.
j =1
c
j =1
ji
k
k
k
k
k k
y j = c ji c jl xl = c ji c jl xl = xl c ji c jl = xi
l =1
j =1 l =1
j =1
j =1
l =1
yi = cij x j , then
j =1
xi = c ji y j , i = 1, , k.
j =1
According to what has been seen so far, the Jacobian of the transformation (1) is J = * = 1/, and for the case that the transformation is orthogonal,
we have J = 1, so that |J| = 1. These results are now applied as follows:
Consider the r. vector X = (X1, . . . , Xk) with p.d.f. fX and let S be the
subset of k over which fX > 0. Set
k
Yi = cij X j , i = 1, , k,
j =1
where we assume = |(cij)| 0. Then the p.d.f. of the r. vector Y = (Y1, . . . , Yk)
is given by
fY y1 , , yk
k
k
1
fX d1 j y j , , dkj y j ,
= j =1
j =1
otherwise,
0,
(y , , y )
1
fY y1 , , yk
k
k
fX c j 1 y j , , c jk y j ,
= j =1
j =1
0
,
otherwise.
(y , , y )
1
237
i =1
i =1
Y i2 = X 2i .
In fact,
2
k k
k
k
2
i
k k k
k k
k
= cij cil X j X l = X j X l cij cil
i =1
i =1 j =1 l =1
j =1 l =1
k
= X 2i
i =1
because
k
cij cil = 1
for
j = l and 0 for
j l.
i =1
Yi = cij X j ,
j =1
i = 1, , k,
X i = dij Yj ,
j =1
i = 1, , k,
fY y1 , , yk
k
k
1
fX d1 j y j , , ckj y j ,
= j =1
j =1
otherwise.
0,
(y , , y )
1
where T is the image of S under the given transformation. If, in particular, the
transformation is orthogonal, then
k
k
fX c j 1 y j , , c jk y j , y1 , , yk T
fY y1 , , yk = j =1
j =1
otherwise.
0,
Furthermore, in the case of orthogonality, we also have
238
X
j =1
2
j
= Y 2j .
j =1
Yi = cij X j , i = 1, , k.
j =1
Then the r.v.s Y1, . . . , Yk are also independent, normally distributed with
common variance 2 and means given by
k
( )
E Yi = cij j , i = 1, , k.
PROOF
j =1
fX x1 , , xk
1
1
=
exp
2
2
2
(x
i =1
2
i ,
and hence
fY y1 , , yk
1
1
exp
=
2 2
2
k
c ji y j i
i =1 j =1
k
Now
2
2
k
k k
k
c
y
=
c
y
+
c
y
ji j i ji j i
i ji j
i =1 j =1
j =1
i =1
j =1
k
k k k
= c ji cli y j yl + 2i 2 i c ji y j
i =1 j =1 l =1
j =1
k
= y j yl c ji cli + 2i 2 i c ji y j
j =1 l =1
k
i =1
i =1
j =1 i =1
= y 2j 2 c ji i y j + 2i
j =1
j =1 i =1
i =1
y j c ji i ,
j =1
i =1
k
239
k
2 k k
y
+
c
c
2
j ji jl i l c ji i y j
i =1 l =1
j =1
i =1
k
= y 2j 2 i c ji y j + i l c ji c jl
j =1
j =1 i =1
k
i =1 l =1
j =1
= y 2j 2 i c ji y j + 2i ,
j = 1 i =1
j =1
i =1
as was to be seen.
As a further application of Theorems 4 and 5, we consider the following
result. Let Z1, . . . , Zk be independent N(0, 1), and set
Y1 =
Y2 =
Y =
3
M
Y =
k
We thus have
Z1 +
k
1
2 1
1
3 2
Z1
2 1
1
Z1 +
k k 1
c1 j =
Z2 + +
3 2
Zk
Z2
2
Z2
Z1 + +
( )
i i1
3 2
Z3
k k 1
Zk 1
k 1
k k 1
Zk .
j = 1, , i 1, and
, for
i1
c ii =
, j = 1, , k, and for i = 2, , k
c ij =
( )
i i1
Hence
k
2
1j
j =1
k
= 1, and for i = 2, , k,
k
( )
( )
( ) ( )
2
i 1
1
c = c = i 1 i i 1 + i i 1
j =1
j =1
k
2
ij
2
ij
1 i 1
+
= 1,
i
i
while for i = 2, . . . , k, we get
=
c1 j cij =
j =1
1
k
cij =
j =1
1
k
cij =
j =1
i 1
1 i 1
k i i 1
i i 1
( )
( )
= 0,
240
c c
= cij clj
ij lj
j =1
and
if i < l ,
j =1
c c
ij lj
if i > l .
j =1
)(
i i1 l l 1
)(
i i1 l l 1
[(i 1) (i 1)] = 0,
[(l 1) (l 1)] = 0.
i =1
i =1
Y i2 = Z 2i
by Theorem 4.
Thus
k
i=2
i =1
Y i2 = Y i2 Y 12 = Z 2i
i =1
( kZ )
= Z kZ = Zi Z .
2
i
i =1
i =1
is independent of
Since Y1 is independent of ki=2Y 2i, we conclude that Z
k
)2. Thus we have the following theorem.
i = 1 (Zi Z
THEOREM 6
and S 2 are
Let X1, . . . , Xk be independent r..s distributed as N(, 2). Then X
independent.
PROOF
1
X
(Z
and
j =1
1
2
(X
j =1
Exercises
9.3.1
set:
1
2
X1 +
1
2
Y3 =
X 2 , Y2 =
1
6
X1 +
1
6
1
3
X1
X2 +
2
6
1
3
X3 .
X2 +
1
3
X3 ,
9.1
241
The Univariate
Exercises
Case
Then:
i) Show that the r.v.s Y1, Y2, Y3 are also independent normally distributed
with variance 2, and specify their respective means.
(Hint: Verify that the transformation is orthogonal, and then use Theorem
5);
ii) If 1 = 2 = 3 = 0, use a conclusion in Theorem 4 in order to show that
Y 21 + Y 22 + Y 23 2 23.
9.3.2 If the pair of r.v.s (X, Y) has the Bivariate Normal distribution with
parameters 1, 2, 21, 22, , that is, (X, Y) N(1, 2, 21, 22, ), then show that
X Y N(0, 0, 1, 1, ), and vice versa.
,
X 1
1
,V=
Y 2
2
, then:
12 = 12 + 22 + 2 1 2 , 22 = 12 + 22 2 1 2 , and 0 = 12 22 1 2 ;
iii) X + Y N(0, 21) and X Y N(0, 22);
iii) The r.v.s X + Y and X Y are independent if and only if 1 = 2. (Compare
with the latter part of Exercise 9.3.5.)
9.3.7 Let (X, Y) N(1, 2, 21, 22, ), and let c, d be constants with cd 0.
Then:
i) (cX, dY) N(c1, d2, c2 21, d2 22, ), with + if cd > 0, and if cd < 0;
ii) (cX + dY, cX dY) N(c1 + d2, c1 d2, 21, 22, 0), where
12 = c 2 12 + d 2 22 + 2 cd 1 2 , 22 = c 2 12 + d 2 22 2 cd 1 2 ,
and 0 =
c 2 12 d 2 22
1 2
c
d
= ;
2
242
iv) The r.v.s in part (iii) are distributed as N(c1 + d2, 21), and N(c1 d2,
22), respectively.
9.3.8
Let X be an r.v. with continuous d.f. F, and dene the r.v. Y by Y = F(X). Then
the distribution of Y is U(0, 1).
Let G be the d.f. of Y. We will show that G(y) = y, 0 < y < 1;
G(0) = 0; G(1) = 1. Indeed, let y (0, 1). Since F(x) 0 as x , there
exists a such that (0 )F(a) < y; and since F(x) 1 as x , there exists
> 0 such that y + < 1 and F(y) < F(y + )( 1). Set F(a) = c, y + = b,
and F(b) = d. Then the function F is continuous in the closed interval
[a, b] and all y of the form y + n (n 2 integer) lie in (c, d). Therefore, by the
Intermediate Value Theorem (see, for example, Theorem 3(ii) on page 95
in Calculus and Analytic Geometry, 3rd edition (1966), by George B.
Thomas, Addison-Wesley Publishing Company, Inc., Reading, Massachusetts)
there exist x0 and xn (n 2) in (a, b) such that F(x0) = y and F(xn) = y + n .
Then
PROOF
243
(X x ) [F (X ) F (x )] (since F is nondecreasing)
= [ F ( X ) y]
(since F (x ) = y)
0
F X < y +
n
( )
[ ( ) ( )]
= F X < F xn
X < xn
since F x n = y +
( )
) [ ( ) ] (
)
or y = F ( x ) G( y) F ( x ) = y + .
n
(
P X x0 P F X y P X x n ,
0
F
y
y
0
Figure 9.9
0 A x
Figure 9.10
Figure 9.11
244
[ ( )]
F F 1 y y.
1
(7)
Now assume that F (y) t. Then F[F (y)] F(t), since F is nondecreasing.
Combining this result with (7), we obtain y F(t).
Next assume, that y F(t). This means that t belongs to the set {x , F(x)
y} and hence F1(y) t. The proof of the lemma is completed.
By means of the above lemma, we may now establish the following result.
THEOREM 8
Let Y be an r.v. distributed as U(0, 1), and let F be a d.f. Dene the r.v. X by
X = F 1(Y), where F1 is dened by (5). Then the d.f. of X is F.
PROOF
We have
) [ ( ) ] [
( )]
()
P X x = P F 1 Y x = P Y F x = F x ,
where the last step follows from the fact that Y is distributed as U(0, 1) and the
one before it by Lemma 1.
As has already been stated, the theorem just proved provides a
specic way in which one can construct an r.v. X whose d.f. is a given d.f. F.
REMARK 7
Exercise
9.4.1 Let Xj, j = 1, . . . , n be independent r.v.s such that Xj has continuous
and strictly increasing d.f. Fj. Set Yj = Fj(Xj) and show that the r.v.
n
X = 2 log 1 Yj
j =1
is distributed as 22n.
Chapter 10
In this chapter we introduce the concept of order statistics and also derive
various distributions. The results obtained here will be used in the second part
of this book for statistical inference purposes.
(that is, for each s S, look at X1(s), X2(s), . . . , Xn(s), and then Yj(s) is dened
to be the jth smallest among the numbers X1(s), X2(s), . . . , Xn(s), j = 1, 2, . . . ,
n). It follows that Y1 Y2 Yn, and, in general, the Ys are not
independent.
We assume now that the Xs are of the continuous type with p.d.f. f such
that f(x) > 0, ( )a < x < b( ) and zero otherwise. One of the problems we
are concerned with is that of nding the joint p.d.f. of the Ys. By means of
Theorem 3, Chapter 9, it will be established that:
THEOREM 1
If X1, . . . , Xn are i.i.d. r.v.s with p.d.f. f which is positive for a < x < b and 0
otherwise, then the joint p.d.f. of the order statistics Y1, . . . , Yn is given by:
( )
( )
n! f y1 f yn ,
g y1 , , yn =
0,
PROOF The proof is carried out explicitly for n = 3, but it is easily seen, with
the proper change in notation, to be valid in the general case as well. In the rst
place, since for i j,
245
246
10
( )( )
P X i = X j = ( xi = x j ) f xi f x j dxi dx j = a
x
x f ( xi ) f ( x j )dxi dx j = 0,
j
j
and therefore P(Xi = Xj = Xk) = 0 for i j k, we may assume that the joint
p.d.f., f(, , ), of X1, X2, X3 is zero if at least two of the arguments x1, x2, x3 are
equal. Thus we have
( )( )( )
f x1 f x 2 f x3 , a < x1 x 2 x3 < b
f x1 , x 2 , x3 =
0,
otherwise.
Then we have
y1 = x1 ,
y2 = x 2 , y3 = x3
S132 :
y1 = x1 ,
y2 = x3 , y3 = x 2
S 213 : y1 = x 2 , y2 = x1 ,
y3 = x3
S 231 : y1 = x 2 , y2 = x3 , y3 = x1
S312 : y1 = x3 , y2 = x1 ,
S321 :
y3 = x 2
y1 = x3 , y2 = x 2 , y3 = x1 .
10.1
1 0 0
247
0 0 1
S123 : J 123 = 0 1 0 = 1
0 0 1
S231 : J 231 = 1 0 0 = 1
0 1 0
1 0 0
S132 : J 132 = 0 0 1 = 1;
0 1 0
0 1 0
S312 : J 312 = 0 0 1 = 1
1 0 0
0 1 0
S213 : J 213 = 1 0 0 = 1
0 0 1
0 0 1
S321 : J 321 = 0 1 0 = 1.
1 0 0
( )( )( ) ( )( )( ) ( )( )( )
( )( )( ) ( )( )( ) ( )( )( )
f y1 f y2 f y3 + f y1 f y3 f y2 + f y2 f y1 f y3
+ f y3 f y1 f y2 + f y2 f y3 f y1 + f y3 f y2 f y1 ,
g y1 , y2 , y3 =
0, otherwise.
This is,
( )( )( )
Notice that the proof in the general case is exactly the same. One has n!
regions forming S, one for each permutation of the integers 1 through n. From
the denition of a determinant and the fact that each row and column contains
exactly one 1 and the rest all 0, it follows that the n! Jacobians are either 1 or
1 and the remaining part of the proof is identical to the one just given except
one adds up n! like terms instead of 3!.
EXAMPLE 1
Let X1, . . . , Xn be i.i.d. r..s distributed as N(, 2 ). Then the joint p.d.f. of the
order statistics Y1, . . . , Yn is given by
n
1
1
g y1 , , yn = n!
exp
2
2
2
(y
j =1
2
,
Let X1, . . . , Xn be i.i.d. r..s distributed as U(, ). Then the joint p.d.f. of the
order statistics Y1, . . . , Yn is given by
g y1 , , yn =
n!
( )
248
10
Let X1, . . . , Xn be i.i.d. r.v.s with d.f. F and p.d.f. f which is positive and
continuous for ( ) a < x < b( ) and zero otherwise, and let Y1, . . . , Yn be
the order statistics. Then the p.d.f. gj of Yj, j = 1, 2, . . . , n, is given by:
n!
F yj
i) g j y j = j 1 ! n j !
0, otherwise.
( ) 1 F ( y )] f ( y ),
( ) ( )( )[ ] [
j 1
n j
a < yj < b
In particular,
( )] f ( y ),
n 1 F y1
i) g1 y1 =
0,
( )
n1
a < y1 < b
otherwise
and
( )
i) g n yn
[ ( )] f ( y ),
n F yn
=
0,
n1
a < yn < b
otherwise.
The joint p.d.f. gij of any Yi, Yj with 1 i < j n, is given by:
i 1
n!
F yi
F y j F yi
i1! j i1! n j !
ii) g ij yi , y j =
n j
f yi f y j , a < yi < y j < b
1 F y j
otherwise.
0,
( )(
( )]
( ) ( ) ( )]
)( )[ ] [
j i 1
( )( )
In particular,
)[ ( ) ( )]
n n 1 F yn F y1
ii) g1n y1 , yn =
0,
n 2
( )( )
f y1 f yn ,
dy
1
=
,
du f F 1 u
[ ( )]
( )
u 0, 1 .
10.1
249
[ ( )]
[ ( )] f [F (u )] 1 f [F (u )]
h u1 , , un = n! f F 1 u1 f F 1 un
for 0 < u1 < < un < 1 and equals 0 otherwise; that is, h(u1, . . . , un) = n! for
0 < u1 < < un < 1 and equals 0 otherwise. Hence for uj (0, 1),
( )
uj
h u j = n!
0
u2
uj
u n 1
dun du j +1 du1 du j1 .
The rst n j integrations with respect to the variables un, . . . , uj+1 yield
[1/(n j)!] (1 uj)nj and the last j 1 integrations with respect to the variables
u1, . . . , uj1 yield [1/(j 1)!] ujj1. Thus
( )
h uj =
n!
u jj1 1 u j
j 1! n j !
)(
n j
for uj (0, 1) and equals 0 otherwise. Finally, using once again the transformation Uj = F(Yj), we obtain
( )
g yj =
( ) 1 F ( y )] f ( y )
( )( )[ ] [
n!
F yj
j 1! n j !
j 1
n j
( )
( )
Gn yn = P Yn yn = P all X j s yn = F n yn .
( )
) [
( )] .
Then
( ) [
( )] [ f ( y )],
n1
g1 y1 = n 1 F y1
or
( ) [
( )] f ( y ).
g1 y1 = n 1 F y1
The proof of (ii) is similar to that of (i), and in fact the same method can be
used to nd the joint p.d.f. of any number of Yjs (see also Exercise
10.1.19).
EXAMPLE 3
x
,
F x =
1,
()
x
a<x<
x ,
250
10
and therefore
n!
= j 1! n j
0,
n!
= j 1! n j
0,
( ) (
gj yj
)(
)(
y j
j 1
nj
yj
1
,
a<yj <
otherwise
)!( )
(y
) ( y )
j 1
nj
a<yj <
otherwise.
Thus
y n1 1
n1
n
1
n
y1 ,
=
n
g1 y1 =
0,
y n1 1
n1
n
n n
=
yn ,
n
g n yn =
0,
( )
( )
g1n y1 , yn
y y1
n n 1 n
0,
n 2
< y1 <
otherwise,
n n1
< yn <
otherwise,
(y
y1
n 2
n!
y jj 1 1 y j
n
j
1
!
!
g j yj =
0,
( ) (
)(
n j
0 < yj < 1
otherwise.
(
() (
n+1
y jj1 1 y j
g j yj = j n j + 1
0,
( )
n j
0 < yj < 1
otherwise,
10.1
n 1 y
1
g1 y1 =
0,
nyn 1 ,
gn yn = n
0,
( )
n 1
251
0 < y1 < 1
otherwise,
0 < yn < 1
( )
otherwise
and
g1n y1 , yn
)(
n n 1 y y
n
1
=
0,
n 2
otherwise.
y = yn y1
z = y1 .
y = z
Then 1
yn = y + z
J = 1.
and hence
Therefore
( )
fY , Z y, z = g1n z, y + z
)[ (
) ( )]
= n n1 F y+ z F z
n 2
0 < y < b a
f z f y + z ,
a < z < b y
()(
fY y = n n 1
0,
()
n 2
b y
0 < y< ba
otherwise.
() (
fY y = n n 1
1 y
) (
y n 2 dz = n n 1 y n 2 1 y ,
0 < y < 1;
that is
n n 1 yn 2 1 y ,
fY y =
0,
()
0< y<1
otherwise.
Z=
U r
252
10
y
z =
u r
w = u r .
u = rw
y = z w
Then
Therefore
and hence
( )
J = r w.
( )
fZ , W z, w = fY z w fU rw r w ,
()
( )
( )
fZ z = fY z w fU rw r w dw,
0
Exercises
Throughout these exercises, Xj, j = 1, . . . , n, are i.i.d. r.v.s and Yj = X(j) is the
jth order statistic of the Xs. The r.v.s Xj, j = 1, . . . , n may represent various
physical quantities such as the breaking strength of certain steel bars, the
crushing strength of bricks, the weight of certain objects, the life of certain
items such as light bulbs, vacuum tubes, etc. From these examples, it is then
clear that the distribution of the Ys and, in particular, of Y1, Yn as well as
Yn Y1, are quantities of great interest.
10.1.1 Let Xj, j = 1, . . . , n be i.i.d. r.v.s with d.f. and p.d.f. F and f, respectively, and let m be the median of F. Use Theorem 2(i) in order to calculate
the probability that all Xjs exceed m; also calculate the probability P(Yn m).
10.1.2
()
( )
253
Exercises
( ) j +
=
EY j
( ) j(n j + 1) ;
(Y ) =
(n + 1) (n + 2)
2
and
n+1
ii) Derive EY1, 2(Y1), and EYn, 2(Yn) from part (i);
iii) What do the quantities in parts (i) and (ii) become for = 0 and
= 1? (Hint: In part (i), use the appropriate Beta p.d.f.s to facilitate
the integrations.)
10.1.5
i) Refer to the p.d.f. gij derived in Theorem 2(ii), and show that, if X1, . . . , Xn
are independent with distribution U(, ), then:
g1n y1 , yn =
n n1
( )
(y
y1
n 2
ii) Set Y = Yn Y1, and show that the p.d.f. of Y is given by:
()
fY y =
n n1
( )
( y) y
n 2
, 0 < y < ;
P a <Y < b =
( )
[( b)b
n1
) ]
a a n1 +
bn a n
( )
g ij yi , y j =
n!
yii1 y j yi
i1! j i1! n j !
( )(
)(
) (1 y )
j i1
n j
j i 1
dy =
i! j i 1 ! j
z,
j!
j +1
0 z (1 z)
1
E YY
i j =
i j +1
(n + 1)(n + 2)
n j
dz =
( j + 1)!(n j )! ;
(n + 2)!
254
10
Cov Yi , Yj =
i n j +1
)(
n+1
n+ 2
and Yi , Yj =
i n j +1
[(
)(
)]
i ni +1 j n j +1
1 2
n
z1 , , zn ; zj 0, j = 1, , n and
z
j =1
1.
10.1.9
i) Let the independent r.v.s X1, . . . , Xn have the Negative Exponential distribution with parameter . Then show that Y1 has the same distribution with
parameter n;
ii) Let F be the (common) d.f. of the independent r.v.s X1, . . . , Xn, and
suppose that their rst order statistic Y1 has the Negative Exponential
distribution with parameter n. Then F is the Negative Exponential d.f.
with parameter .
10.1.10 Let the independent r.v.s X1, . . . , Xn have the Negative Exponential distribution with parameter . Then:
i) Use Theorem 2(i) in order to show that the p.d.f. of Yn is given by:
( )
g n yn = ne y 1 e y
n
n1
, yn > 0 ;
ii) Let Y be the (sample) range; that is, Y = Yn Y1, and then use Theorem 2
(ii) in order to show that the p.d.f. of Y is given by:
() (
fY y = n 1 e y 1 e y
n 2
, y > 0;
Exercises
255
iii) Calculate the probability P(a < Y < b) for 0 < a < b;
iv) For a = 1/, b = 2/, and n = 10, nd a numerical value for the probability
in part (iii).
10.1.11 Let the independent r.v.s X1, . . . , Xn have the Negative Exponential distribution with parameter , and let 1 < j < n. Use Theorem 2(ii) and
Exercises 1.9(i) and 1.10(i) in order to determine:
i) The conditional p.d.f. of Yj, given Y1;
ii) The conditional p.d.f. of Yj, given Yn;
iii) The conditional p.d.f. of Yn, given Y1; and the conditional p.d.f. of Y1,
given Yn.
10.1.12 Let the independent r.v.s X1, . . . , Xn have the Negative Exponential distribution with parameter , and set: Z1 = Y1, Zj = Yj Yj1, j = 2, . . . , n.
Then:
i) For j = 1, . . . , n, show that Zj has the Negative Exponential distribution
with parameter (n j + 1), and that these r.v.s are independent;
ii) From the denition of the Zjs, it follows that Yj = Z1 + + Zj, j = 1, . . . ,
n. Use this expression and part (i) in order to conclude that:
EY j =
11
1
1
++
+
;
n j + 1
n n1
[ ( ) ]
P min X i X j c = exp n n 1 c 2 .
i j
i = 1, . . . , n;
n
n
n
c j Yj = j2 ci , ci constants;
j =1
j =1 i = j
2
iv) Also utilize parts (i) and (ii) in order to show that:
n
n
n
n
Cov ci Yi , d j Yj = i2 c j d j + 2 ck dl ,
i =1
i =1 j = i
1k < l n
j =1
where cj , dj constants.
Let Xj, j = 1, . . . , n be i.i.d. r.v.s. Then the sample median SM is dened as
follows:
256
10
SM
= 1
2 Y + Y
n+ 1
2
n
2
n +2
2
if n is odd
if n is even.
(*)
10.1.14 If Xj, j = 1, . . . , n are i.i.d. r.v.s, and n is odd, determine the p.d.f. of
SM when the underlying distribution is:
i) U(, );
ii) Negative Exponential with parameter .
10.1.15 If the r.v.s Xj, j = 1, . . . , n are independently distributed as N(, 2),
show that the p.d.f. of SM is symmetric about , where SM is dened by (*).
Without integration, conclude that ESM = .
10.1.16 For n odd, let the independent r.v.s Xj, j = 1, . . . , n have p.d.f. f with
median m. Then determine the p.d.f. of SM, and also calculate the probability
P(SM > m) in each one of the following cases:
i)
ii)
iii)
iv)
f(x) = 2xI(0,1)(x);
f(x) = 2(2 x)I(1,2)(x);
f(x) = 2(1 x)I(0,1)(x);
What do parts (i)(iii) become for n = 3?
10.1.17 Refer to Exercise 10.1.2 and derive the p.d.f. of SM, where SM is
dened by (*).
10.1.18 Let Xj, j = 1, . . . , 6 be i.i.d. r.v.s with p.d.f. f given by f(x) = 16 ,
x = 1, . . . , 6. Find the p.d.f.s of Y1 and Y6. Also, observe that P(Y1 = y) =
P(Y6 = 7 y), y = 1, . . . , 6.
10.1.19
Let X1, . . . , Xn be i.i.d. r..s with continuous d.f. F and let Zj = F(Yj), where Yj,
j = 1, 2, . . . , n are the order statistics. Then Z1, . . . , Zn are order statistics
from U(0, 1), and hence their joint p.d.f., h is given by:
n!, 0 < z1 < < zn < 1
h z1 , , zn =
0, otherwise.
257
The distributions of Zj, Z1, Zn and (Z1, Zn) are given in Example 3 of this
chapter.
Let now X be an r.v. with d.f. F. Consider a number p, 0 < p < 1. Then in
Chapter 4, a pth quantile, xp, of F was dened to be a number with the
following properties:
i) P(X xp) p and
ii) P(X xp) 1 p.
Now we would like to establish a certain theorem to be used in a subsequent chapter.
THEOREM 4
Let X1, . . . , Xn be i.i.d. r.v.s with continuous d.f. F and let Y1, . . . , Yn be the
order statistics. For p, 0 < p < 1, let xp be the (unique by assumption) pth
quantile. Then we have
j 1
P Yi x p Yj =
PROOF
k =i
( )p q
n
k
n k
, where q = 1 p.
j = 1, 2, , n.
( )
P W1 = 1 = P X1 x p = F x p = p.
Therefore
P at least i of X 1 , , X n x p =
k =i
( )p q
n
k
n k
or equivalently,
P Yi < x p = P Yi x p =
k= i
( )pq
n
k
n k
(
= P(Y x
) (
Y ) + P(Y < x ),
P Yi x p = P Yi x p , Y j x p + P Yi x p , Y j < x p
i
258
10
since
(Y
) (
) (
( )p q
< x p Yi x p .
Therefore
P Yi x p Y j = P Yi x p P Y j < x p .
P Yi x p Yj =
k =i
j 1
=
k =i
( )p q
n k
( )p q
n k
n
k
n
k
k= j
n
k
n k
Exercise
10.2.1 Let Xj, j = 1, . . . , n be i.i.d. r.v.s with continuous d.f. F. Use Theorem
3 and the relevant part of Example 3 in order to determine the distribution of
the r.v. F(Y1) and nd its expectation.
11.1
259
Chapter 11
Let X be an r.v. with p.d.f. f(; ) of known functional form but depending upon
an unknown r-dimensional constant vector = (1, . . . , r) which is called a
parameter. We let stand for the set of all possible values of and call it the
parameter space. So r, r 1. By F we denote the family of all p.d.f.s we
get by letting vary over ; that is, F = {f(; ); }.
Let X1, . . . , Xn be a random sample of size n from f(; ), that is, n
independent r.v.s distributed as X above. One of the basic problems of
statistics is that of making inferences about the parameter (such as estimating , testing hypotheses about , etc.) on the basis of the observed values
x1, . . . , xn, the data, of the r.v.s X1, . . . , Xn. In doing so, the concept of
sufciency plays a fundamental role in allowing us to often substantially condense the data without ever losing any information carried by them about the
parameter .
In most of the textbooks, the concept of sufciency is treated exclusively
in conjunction with estimation and testing hypotheses problems. We propose,
however, to treat it in a separate chapter and gather together here all relevant
results which will be needed in the sequel. In the same chapter, we also
introduce and discuss other concepts such as: completeness, unbiasedness and
minimum variance unbiasedness.
For j = 1, . . . , m, let Tj be (measurable) functions dened on n into and
not depending on or any other unknown quantities, and set T = (T1, . . . , Tm).
Then
T X 1 , , X n = T1 X 1 , , X n , , Tm X 1 , , X n
) ( (
))
259
260
11
= 1 , , r , = 1 , , r r ; j > 0, j = 1, , r
and j = 1
j =1
and
( )
f x; =
()
n!
1x1 rxr I A x =
x1! xr !
1x1 rxr 11 1 1 r 1
r 1
j =1
n!
x j ! n x1 xr 1 !
n rj =11x j
()
IA x ,
A = x = x1 , , xr r ; x j 0, j = 1, , r , x j = n.
j =1
For example, for r = 3, is that part of the plane through the points (1, 0, 0),
(0, 1, 0) and (0, 0, 1) which lies in the rst quadrant, whereas for r = 2, the
distribution of X = (X1, X2) is completely determined by that of X1 = X which
is distributed as B(n, 1) = B(n, ).
EXAMPLE 2
()
EXAMPLE 3
()
= 1 , 2 2 ; 1 , 2 > 0
(that is, the part of the plane above the horizontal axis) and
11.1
x
1
f x; =
exp
2 2
2
x
f x; =
exp
2 2
2
EXAMPLE 4
261
= 1 , , 5 5 ; 1 , 2 , 3 , 4 0, , 5 1, 1
and
( )
f x; =
1
2 1 2 1 2
e q 2 ,
where
2
x 2
x1 1 x2 2 x2 2
1
1
2
,
1
1 2 2
x = x1 , x2 .
q=
1
1 2
fX x j ; =
j
xj
(1 )
1 x j
( )
I A xj ,
j = 1, , n,
n
fT t ; = t 1
t
( )
n t
()
IB t ,
262
11
each one of the (nt) different ways in which the t successes can occur. Then, if
there are values of for which particular occurrences of the t successes can
happen with higher probability than others, we will say that knowledge of the
positions where the t successes occurred is more informative about than
simply knowledge of the total number of successes t. If, on the other hand, all
possible outcomes, given the total number of successes t, have the same
probability of occurrence, then clearly the positions where the t successes
occurred are entirely irrelevant and the total number of successes t provides all
possible information about . In the present case, we have
P X 1 = x1 , , X n = xn | T = t =
=
P X 1 = x1 , , X n = xn , T = t
P T = t
P X 1 = x1 , , X n = xn
P T = t
if x1 + + xn = t
x 1
1
1 x 1
x 1
n
n t
1
t
1 x n
n t
t 1
n t
n t
1
t
n t
1
n
t
if x1 + + xn = t and zero otherwise. Thus, we found that for all x1, . . . , xn such
that xj = 0 or 1, j = 1, . . . , n and
n
x j = t , P ( X 1 = x1 , , X n = xn | T = t ) = 1
j =1
n
t
Tj = Tj X 1 , , X n ,
j = 1, , m
11.1
263
P T* B T = t = P X 1 , , X n A T = t
T = T X 1 , , X n = T1 X 1 , , X n , , Tm X 1 , , X n
) ( (
))
is sufcient for if and only if the joint p.d.f. of X1, . . . , Xn factors as follows,
) [ (
) ](
f x1 , , xn ; = g T x1 , , xn ; h x1 , , xn ,
) [ (
) ](
f x1 , , xn ; = g T x1 , , xn ; h x1 , , xn ,
with g and h as described in the theorem. Clearly, it sufces to restrict attention to those ts for which P (T = t) > 0. Next,
[(
) ]
P T = t = P T X 1 , , X n = t = P X 1 = x1 , , X n = xn ,
where the summation extends over all (x1, . . . , xn) for which T(x1, . . . , xn) = t.
Thus
) ( )
= g(t; ) h( x , , x ).
( )(
P T = t = f x1 ; f xn ; = g t; h x1 , , xn
1
Hence
264
11
P X 1 = x1 , , X n = xn T = t
=
=
P X 1 = x1 , , X n = xn , T = t
P T = t
) = P (X
= x1 , , X n = xn
( )(
) = h( x , , x )
g(t; ) h( x , , x ) h( x , , x )
g t; h x1 , , xn
P T = t
P X 1 = x1 , , X n = xn T = t =
P X 1 = x1 , , X n = xn
P T = t
= k x1 , , xn , T x1 , , xn
)]
if and only if
(
)
= P (T = t)k[ x , , x , T( x , , x )].
f x1 ; f xn ; = P X 1 = x1 , , X n = xn
Setting
[(
) ]
g T x1 , , xn ; = P T = t
and h x1 , , xn
)]
= k x1 , , xn , T x1 , , xn ,
we get
) [ (
) ](
f x1 ; f xn ; = g T x1 , , xn ; h x1 , , xn ,
as was to be seen.
Continuous case: The proof in this case is carried out under some further
regularity conditions (and is not as rigorous as that of the discrete case). It
should be made clear, however, that the theorem is true as stated. A proof
without the regularity conditions mentioned above involves deeper concepts
of measure theory the knowledge of which is not assumed here. From Remark
1, it follows that m n. Then set Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, and assume
that there exist other n m statistics Tj = Tj(X1, . . . , Xn), j = m + 1, . . . , n, such
that the transformation
t j = Tj x1 , , xn ,
j = 1, , n,
is invertible, so that
x j = x j t, t m +1 , , t n ,
j = 1, , n, t = t1 , , t m .
11.1
265
) [ (
) ](
f x1 ; f xn ; = g T x1 , , xn ; h x1 , , xn .
Then
)
), , x (t, t
fT , T , , T t, t m +1 , , t n ;
m+1
( )[ (
= g(t; )h * (t , t
= g t; h x1 t, t m +1 , , t n
where we set
m +1
m +1
, , tn
)] J
, , tn ,
) [ (
( ) (
h * t, t m +1 , , t n = h x1 t, t m +1 , , t n , , xn t, t m +1 , , t n
)] J .
Hence
( )
( )
()
fT t; = g t; h * t, t m +1 , , t n dt m +1 dt n = g t; h ** t ,
where
()
h * * t = h * t, t m +1 , , t n dt m +1 dt n .
m +1
f t m +1 , , t n t; =
m +1
f x1 , , xn ; = fT , T , , T t, t m +1 , , t n ; J 1
m+1
) ( )
= f t m +1 , , t n t; fT t; J 1 .
But f(tm+1, . . . , tn|t; ) is independent of by Remark 1. So we may set
) (
f t m +1 , , t n t; J 1 = h * t m +1 , , t n ; t = h x1 , , xn .
If we also set
( ) [ (
) ]
fT t; = g T x1 , , xn ; ,
266
11
we get
) [ (
) ](
f x1 , , xn ; = g T x1 , , xn ; h x1 , , xn ,
as was to be seen.
COROLLARY
) [ (
) ](
)
[T ( x , , x )]; }h( x , , x )
f x1 , , xn ; = g T x1 , , xn ; h x1 , , xn
=g
[ ( )]
()
= 1 = 1 .
Hence
) [ (
) ](
) [(
) ](
f x1 , , xn ; = g T x1 , , xn ; h x1 , , xn
becomes
f x1 , , xn ; = g T x1 , , xn ; h x1 , , xn ,
where we set
) [
( )]
f x1 , , xn ; = f x1 , , xn ; 1
and
[(
[(
) ]
( )]
g T x1 , , xn ; = g T x1 , , xn ; 1 .
Thus, T is sufcient for the new parameter .
( )
f x; =
()
n!
1x rx I A x .
x1! xr !
1
Then, by Theorem 1, it follows that the statistic (X1, . . . , Xr) is sufcient for
= (1, . . . , r). Actually, by the fact that rj = 1j = 1 and rj = 1 xj = n, we also have
( )
f x; =
r 1
j =1
n!
x j ! n x1 xr 1 !
1x rx1 1 1 r 1
1
r1
n rj =11x j
()
IA x
11.1
267
from which it follows that the statistic (X1, . . . , Xr1) is sufcient for (1, . . . ,
r1). In particular, for r = 2, X1= X is sufcient for 1 = .
EXAMPLE 7
Let X1, . . . , Xn be i.i.d. r.v.s from U(1, 2). Then by setting x = (x1, . . . , xn)
and = (1, 2), we get
( )
f x; =
( )
I [
g1 x(1) , g 2 x( n ) , ,
( )
1,
) x(1) I ( , ] x( n )
][
where g1[x(1), ] = I[ , )(x(1)), g2[x(n), ] = I(, ](x(n)). It follows that (X(1), X(n)) is
sufcient for . In particular, if 1 = is known and 2 = , it follows that X(n)
is sufcient for . Similarly, if 2 = is known and 1 = , X(1) is sufcient for .
1
EXAMPLE 8
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2). By setting x = (x1, . . . , xn), = 1,
2 = 2 and = (1, 2), we have
n
1
1
f x; =
exp
2
2 2
( )
(x
j =1
2
1 .
But
n
(x
j =1
) = [( x
2
) (
x + x 1
j =1
)] = ( x
n
j =1
+ n x 1 ,
so that
n
1
1
f x; =
exp
2
2 2
( )
(x
j =1
2
n
x 1 .
2 2
n 2
f x; =
exp 1 exp 1
2
2 2
2
( )
x
j =1
1
2 2
x
j =1
2
j
2
1 n
X j X , and
n j =1
1 n
Xj
n j =1
268
11
S2 =
2
1 n
Xj X .
n j =1
Now consider the conditional p.d.f. of (X1, . . . , Xn1), given nj= 1Xj = yn. By
using the transformation
n
y j = x j , j = 1, , n 1, yn = x j ,
j =1
one sees that the above mentioned conditional p.d.f. is given by the quotient of
the following p.d.f.s:
n
1
1
exp
y1 1
2 2
2 2
+ + yn 1 1
2
+ yn y1 yn 1 1
and
1
exp
yn n1 .
2n 2
2 n 2
2n 2
2 2
1
exp
yn n1
2 n 2
n y1 1
2
n yn y1 yn 1 1
n yn 1 1
269
Exercises
Sufciency: Denition and Some Basic
Results
11.1
and
(y
n1
n y1 1
n yn 1 1
(
) ,
n yn y1 yn 1 1
2
= yn2 n y12 + + yn21 + yn y1 yn 1
independent of 1. Thus the conditional p.d.f. under consideration is independent of 1 but it does depend on 2. Thus nj= 1 Xj, or equivalently, X is not
sufcient for (1, 2). The concept of X being sufcient for 1 is not valid
unless 2 is known.
Exercises
11.1.1 In each one of the following cases write out the p.d.f. of the r.v. X and
specify the parameter space of the parameter involved.
i)
ii)
iii)
iv)
X
X
X
X
is distributed as Poisson;
is distributed as Negative Binomial;
is distributed as Gamma;
is distributed as Beta.
11.1.2 Let X1, . . . , Xn be i.i.d. r.v.s distributed as stated below. Then use
Theorem 1 and its corollary in order to show that:
i) nj= 1 Xj or X is a sufcient statistic for , if the Xs are distributed as
Poisson;
ii) nj= 1 Xj or X is a sufcient statistic for , if the Xs are distributed as
Negative Binomial;
iii) ( nj= 1 Xj, nj= 1 Xj) or ( nj= 1 Xj, X ) is a sufcient statistic for (1, 2) = (,
) if the Xs are distributed as Gamma. In particular, nj= 1 Xj is a sufcient
statistic for = if is known, and nj= 1 Xj or X is a sufcient statistic for
= if is known. In the latter case, take = 1 and conclude that nj= 1 Xj
or X is a sufcient statistic for the parameter = 1/ of the Negative
Exponential distribution;
iv) ( nj= 1 Xj, nj= 1 (1 Xj)) is a sufcient statistic for (1, 2) = (, ) if the Xs
are distributed as Beta. In particular, nj= 1 Xj or nj= 1 log Xj is a sufcient
statistic for = if is known, and nj= 1 (1 Xj) is a sufcient statistic for
= if is known.
11.1.3 (Truncated Poisson r.v.s) Let X1, X2 be i.i.d. r.v.s with p.d.f. f(; )
given by:
f 0 ; = e ,
( )
f ( x; ) = 0,
f 1; = e ,
f 2; = 1 e e ,
x 0, 1, 2,
270
11
11.1.4 Let X1, . . . , Xn be i.i.d. r.v.s with the Double Exponential p.d.f. f(; )
given in Exercise 3.3.13(iii) of Chapter 3. Then show that nj= 1|Xj| is a sufcient
statistic for .
11.1.5 If Xj = (X1j, X2j), j = 1, . . . , n, is a random sample of size n from the
Bivariate Normal distribution with parameter as described in Example 4,
then, by using Theorem 1, show that:
n
n
n
2
2
X1 , X 2 , X1 j , X 2 j , X1 j X 2 j
j =1
j =1
j =1
is a sufcient statistic for .
11.1.6 If X1, . . . , Xn is a random sample of size n from U(, ), (0, ),
show that (X(1), X(n)) is a sufcient statistic for . Furthermore, show that this
statistic is not minimal by establishing that T = max(|X1|, . . . , |Xn|) is also a
sufcient statistic for .
11.1.7
that
2
X j, X j
j =1
j =1
or X ,
X j2
j =1
n
f x; = e
()
I ( , ) x , ,
( )
()
( )
2
ii) f ( x; ) =
( x)I ( ) ( x), (0, );
i) f x; = x 1 I ( 0 , 1) x , 0, ;
0,
iii) f x; =
c
iv)) f x; =
c x
()
()
1 3 x
x e I ( 0 , ) x , 0, ;
6 4
+1
I ( c , ) x , 0, .
11.1
271
11.2 Completeness
In this section, we introduce the (technical) concept of completeness which we
also illustrate by a number of examples. Its usefulness will become apparent in
the subsequent sections. To this end, let X be a k-dimensional random vector
Rr, and let g: k be a (measurable) function,
with p.d.f. f(; ),
so that g(X) is an r.v. We assume that E g(X) exists for all and set
F = {f(.; ); }.
DEFINITION 2
With the above notation, we say that the family F (or the random vector X) is
complete if for every g as above, E g(X) = 0 for all implies that g(x) = 0
.
except possibly on a set N of xs such that P(X N) = 0 for all
The examples which follow illustrate the concept of completeness. Meanwhile let us recall that if nj= 0 cnj xnj = 0 for more than n values of x, then
cj = 0, j = 0, . . . , n. Also, if n = 0 cnxn = 0 for all values of x in an interval for
which the series converges, then cn = 0, n = 0, 1, . . . .
EXAMPLE 9
Let
n
F = f ; ; f x; = x 1
x
( ) (
n x
I A x , 0, 1 ,
()
( )
( )
()
n x
= 1
) g( x) nx
n
x =0
g( x ) x x = 0
x=0
for every (0, ), hence for more than n values of , and therefore
n
g x = 0, x = 0, 1, , n
x
()
Let
F = f ; ; f x; = e
I A x , 0, ,
x
!
( ) (
( )
()
E g X = g x e
x= 0
()
()
g x
x
x = 0
= e
x!
x = 0 x!
272
11
EXAMPLE 11
Let
1
F = f ; ; f x; =
I [ , ] x , , .
( ) (
()
( )
1
g x dx.
a
E g X =
()
Thus, if E g(X) = 0 for all (, ), then g(x)dx = 0 for all > which
intuitively implies (and that can be rigorously justied) that g(x) = 0 except
possibly on a set N of xs such that P (X N) = 0 for all , where X is an
r.v. with p.d.f. f(; ). The same is seen to be true if f(; ) is U(, ).
EXAMPLE 12
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2). If is known and = , it can be
shown that
x
1
exp
F = f ; ; f x; =
2 2
2
( ) (
,
x 2
1
F = f ; ; f x; =
exp
, 0,
2
2
is not complete. In fact, let g(x) = x . Then E g(X ) = E(X ) = 0 for all
(0, ), while g(x) = 0 only for x = . Finally, if both and 2 are unknown,
it can be shown that ( X , S2) is complete.
( ) (
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(; ), r and let T = (T1, . . . ,
Tm) be a sufcient statistic for , where Tj = Tj(X1, , Xn), j = 1, , m. Let
g(; ) be the p.d.f. of T and assume that the set S of positivity of g(; ) is the
same for all . Let V = (V1, . . . , Vk), Vj = Vj(X1, . . . , Xn), j = 1, . . . , k, be
any other statistic which is assumed to be (stochastically) independent of T.
Then the distribution of V does not depend on .
We have that for t S, g(t; ) > 0 for all and so f(v|t) is well
dened and is also independent of , by sufciency. Then
PROOF
) ( )( )
fV , T v, t; = f v t g t;
for all v and t S, while by independence
( )( )
fV , T v, t; = fV v; g t;
for all v and t. Therefore
Exercises
Sufciency: Denition and Some Basic
Results
11.1
273
( )( ) ( )( )
fV v; g t; = f v t g t;
for all v and t S. Hence fV(v; ) = f(v/t) for all v and t S; that is, fV(v; ) =
fV(v) is independent of .
REMARK 4
r and let
(Basu) Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(; ),
T = (T1, . . . , Tm) be a sufcient statistic of , where Tj = Tj(X1, . . . , Xn),
}
j = 1, . . . , m. Let g(; ) be the p.d.f. of T and assume that C = {g(; );
is complete. Let V = (V1, . . . , Vk), Vj = Vj(X1, . . . , Xn), j = 1, . . . , k be any other
statistic. Then, if the distribution of V does not depend on , it follows that V
and T are independent.
It sufces to show that for every t m for which f(v|t) is dened,
one has fV(v) = f(v|t), v k. To this end, for an arbitrary but xed v, consider
the statistic (T; v) = fV(v) f(v|T) which is dened for all ts except perhaps
for a set N of ts such that P (T N) = 0 for all . Then we have for the
continuous case (the discrete case is treated similarly)
PROOF
( ) ( )]
( )
()
E T ; v = E f V v f v T = f V v E f v T
()
(
)(
(v) f (v, t , , t ; )dt
( v ) f ( v ) = 0;
= fV v f v t1 , , t m g t1 , , t m ; dt1 dt m
= fV
= fV
dt m
that is, E(T; v) = 0 for all and hence (t; v) = 0 for all t Nc by
completeness (N is independent of v by the denition of completeness). So
fV(v) = f(v/t), t Nc, as was to be seen.
Exercises
11.2.1 If F is the family of all Negative Binomial p.d.f.s, then show that F is
complete.
11.2.2 If F is the family of all U(, ) p.d.f.s, (0, ), then show that F
is not complete.
11.2.3 (Basu) Consider an urn containing 10 identical balls numbered + 1,
+ 2, . . . , + 10, where = {0, 10, 20, . . . }. Two balls are drawn one by
one with replacement, and let Xj be the number on the jth ball, j = 1, 2. Use this
274
11
example to show that Theorem 2 need not be true if the set S in that theorem
does depend on .
11.3 UnbiasednessUniqueness
In this section, we shall restrict ourselves to the case that the parameter is realvalued. We shall then introduce the concept of unbiasedness and we shall
establish the existence and uniqueness of uniformly minimum variance unbiased statistics.
DEFINITION 3
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(; ), and let U = U(X1, . . . ,
Xn) be a statistic. Then we say that U is an unbiased statistic for if EU = for
every , where by EU we mean that the expectation of U is calculated by
using the p.d.f. f(; ).
We can now formulate the following important theorem.
THEOREM 4
i) That (T) is a function of the sufcient statistic T alone and does not
depend on is a consequence of the sufciency of T.
ii) That (T) is unbiased for , that is, E(T) = for every , follows
from (CE1), Chapter 5, page 123.
iii) This follows from (CV), Chapter 5, page 123.
The interpretation of the theorem is the following: If for some reason one
is interested in nding a statistic with the smallest possible variance within the
class of unbiased statistics of , then one may restrict oneself to the subclass of
the unbiased statistics which depend on T alone (with probability 1). This is so
because, if an unbiased statistic U is not already a function of T alone (with
probability 1), then it becomes so by conditioning it with respect to T. The
variance of the resulting statistic will be smaller than the variance of the
statistic we started out with by (iii) of the theorem. It is further clear that
the variance does not decrease any further by conditioning again with respect
to T, since the resulting statistic will be the same (with probability 1) by
(CE2), Chapter 5, page 123. The process of forming the conditional expectation of an unbiased statistic of , given T, is known as RaoBlackwellization.
11.1
11.3 UnbiasednessUniqueness
Sufciency: Denition
and Some Basic Results
275
The concept of completeness in conjunction with the RaoBlackwell theorem will now be used in the following theorem.
THEOREM 5
[ ( ) ( )]
E U T V T = 0, ,
or
( )
E T = 0, ,
EXAMPLE 13
Let X1, . . . , Xn be i.i.d. r.v.s from B(1, ), (0, 1). Then T = nj= 1 Xj is a
sufcient statistic for , by Example 5, and also complete, by Example 9. Now
X = (1/n)T is an unbiased statistic for and hence, by Theorem 5, UMV
unbiased for .
EXAMPLE 14
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2). Then if is known and = , we
have that T = nj= 1 Xj is a sufcient statistic for , by Example 8. It is also
complete, by Example 12. Then, by Theorem 5, X = (1/n)T is UMV unbiased
for , since it is unbiased for . Let be known and without loss of generality
set = 0 and 2 = . Then T = nj= 1 X 2j is a sufcient statistic for , by Example
8. Since T is also complete (by Theorem 8 below) and S2 = (1/n)T is unbiased
for , it follows, by Theorem 5, that it is UMV unbiased for .
Here is another example which serves as an application to both Rao
Blackwell and LehmannScheff theorems.
EXAMPLE 15
Let X1, X2, X3 be i.i.d. r.v.s from the Negative Exponential p.d.f. with parameter . Setting = 1/, the p.d.f. of the Xs becomes f(x; ) = 1/ex/, x > 0. We
have then that E(Xj) = and 2(Xj) = 2, j = 1, 2, 3. Thus X1, for example, is an
unbiased statistic for with variance 2. It is further easily seen (by Theorem
276
11
T 2
2 =
< 2 , 0, .
3
3
( )
and
Exercises
11.3.1 If X1, . . . , Xn is a random sample of size n from P(), then use
Exercise 11.1.2(i) and Example 10 to show that X is the (essentially) unique
UMV unbiased statistic for .
11.3.2 Refer to Example 15 and, by utilizing the appropriate transformation,
show that X is the (essentially) unique UMV unbiased statistic for .
()
()
where C() > 0, and also h(x) > 0 for x S, the set of positivity of f(x; ),
which is independent of . It follows that
( ) e
C 1 =
() ( )
Q T x
x S
()
hx
()
C 1 = e
S
() ( )
Q T x
()
h x dx
for the continuous case. If X1, . . . , Xn are i.i.d. r.v.s with p.d.f. f(; ) as above,
then the joint p.d.f. of the Xs is given by
n
f x1 , , xn ; = C n exp Q T x j h x1 h xn ,
j =1
x j , j = 1, , n, .
Some illustrative examples follow.
()
()
( ) ( )
( )
(2)
EXAMPLE 16
277
Let
n
f x; = x 1
x
n x
()
IA x ,
) (
f x; = 1
n
exp log
I A x , 0, 1 ,
x
1 x
()
() (
()
C = 1 , Q = log
EXAMPLE 17
, T x = x, h x = I A x .
1
x
()
()
()
Let now the p.d.f. be N(, 2). Then if is known and = , we have
1 2
exp
exp 2 x exp
x , ,
2
2 2
2
2
f x; =
()
C =
2
exp
,
2
2
2
,
2
()
Q =
1 2
T x = x, h x = exp
x .
2 2
2
If now is known and = , then we have
()
f x; =
()
2
1
exp
x , 0, ,
2
()
C =
1
2
()
, Q =
() (
1
, T x = x
2
()
and h x = 1.
Let X be an r.v. with p.d.f. f(; ), given by (1) and set C = {g(; );
}, where g(; ) is the p.d.f. of T(X). Then C is complete, provided
contains a non-degenerate interval.
Then the completeness of the families established in Examples 9 and 10
and the completeness of the families asserted in the rst part of Example 12
and the last part of Example 14 follow from the above theorem.
In connection with families of p.d.f.s of the one-parameter exponential
form, the following theorem holds true.
278
11
THEOREM 7
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. of the one-parameter exponential form.
Then
i) T* = nj= 1 T(Xj) is a sufcient statistic for .
ii) The p.d.f. of T* is of the form
( )
()
g t; = C n e
()
Q t
()
h* t ,
n
g t ; = C n exp Q T x j h x j
j = 1
j =1
n
Q ( ) t
Q t
= Cn e
h x j = C n e ( ) h * t ,
j =1
( )
()
()
()
( )
( )
( )
()
()
where
n
h * t = h x j .
j = 1
()
( )
Next, let the Xs be of the continuous type. Then the proof is carried out under
certain regularity conditions to be spelled out. We set Y1 = nj= 1 T(Xj) and let
Yj = Xj, j = 2, . . . , n. Then consider the transformation
n
n
y1 = T x j
T x1 = y1 T y j
j =1
j =2
hence
y = x , j = 2,
x = y , j = 2, . . . , n,
j
j
, n;
j
j
( )
( )
( )
and thus
n
1
x1 = T y1 T y j
j =2
x = y , j = 2, , n,
j
j
( )
x1
1
=
,
y1 T T 1 z
( )
z = y1 T y j ,
where
[ ( )]
279
j =2
( )
[ ( )]
T yj
x1
1
z
=
,
=
y j T T 1 z y j
T T 1 z
[ ( )]
x j
=1
y j
and
[ (z)]
T T
{ [ y T ( y ) T ( y )]}
T T
( ) { ( )[ ( )
+ T ( y ) + + T ( y )]}
( )
g y1 , , yn ; = C n exp Q y1 T y2 T yn
2
{ [
( )
( )]} h( y ) J
n
h T 1 y1 T y2 T yn
()
= Cn e
n
()
Q y1
j =2
{ [
( )
( )]}
h T 1 y1 T y2 T yn
( )
h yj J .
j =2
( )
{ [
( )
( )]}
h * y1 = h T 1 y1 T y2 T yn
( )
h y j J dy2 dyn ,
j =2
280
11
THEOREM 9
Let the r.v. X1, . . . , Xn be i.i.d. from a p.d.f. of the one-parameter exponential
form and let T* be dened by (i) in Theorem 7. Then, if V is any other statistic,
it follows that V and T* are independent if and only if the distribution of V
does not depend on .
PROOF In the rst place, T* is sufcient for , by Theorem 7(i), and the set
of positivity of its p.d.f. is independent of , by Theorem 7(ii). Thus the
assumptions of Theorem 2 are satised and therefore, if V is any statistic which
is independent of T*, it follows that the distribution of V is independent of .
For the converse, we have that the family C of the p.d.f.s of T* is complete, by
Theorem 8. Thus, if the distribution of a statistic V does not depend on ,
it follows, by Theorem 3, that V and T* are independent. The proof is
completed.
APPLICATION
1 n
Xj
n j =1
S2 =
and
1 n
Xj X
n j =1
are independent.
We treat as the unknown parameter and let 2 be arbitrary (>0)
but xed. Then the p.d.f. of the Xs is of the one-parameter exponential form
and T = X is both sufcient for and complete. Let
PROOF
V = V X1 , , X n = X j X .
j =1
(X
j =1
) = (Y Y ) .
2
j =1
But the distribution of nj= 1 (Yj Y )2 does not depend on , because P[nj= 1 (Yj
Y )2 B] is equal to the integral of the joint p.d.f. of the Ys over B and this
p.d.f. does not depend on .
Exercises
11.4.1 In each one of the following cases, show that the distribution of the
r.v. X is of the one-parameter exponential form and identify the various
quantities appearing in a one-parameter exponential family.
i) X is distributed as Poisson;
ii) X is distributed as Negative Binomial;
iii) X is distributed as Gamma with known;
11.1
11.5 Some
Multiparameter
Generalizations
Sufciency:
Denition
and Some
Basic Results
281
f x; =
x
1
x exp
I (0 , ) x , > 0 , > 0
()
(known).
f x; = C exp Q j Tj x h x ,
j = 1
( )
()
() () ()
) (
f x1 , , xr ; 1 , , r 1 = 1 1 r 1
j
n!
exp x j log
x ! x ! I A x1 , , xr ,
j =1
1
r 1
1
r
282
11
() (
C = 1 1 r 1 ,
()
Q j = log
j
, Tj x1 , , xr = x j , j = 1, , r 1,
1 1 r 1
and
h x1 , , xr =
EXAMPLE 19
n!
I A x1 , , xr .
x1! xr !
f x; 1 , 2 =
1 2
exp 1 exp 1 x
x ,
2 2
2
2 2
2 2
1
()
C =
2
1
exp 1 , Q1 = 1 , Q2 =
, T1 x = x,
2
2
2 2
2
2
2
()
()
T2 x = x 2
()
()
and h x = 1.
For multiparameter exponential families, appropriate versions of Theorems 6, 7 and 8 are also true. This point will not be pursued here, however.
Finally, if X1, . . . , Xn are i.i.d. r.v.s with p.d.f. f(; ), = (1, . . . , r)
r, not necessarily of an exponential form, the r-dimensional statistic U =
(U1, . . . , Ur), Uj = Uj(X1, . . . , Xn), j = 1, . . . , r, is said to be unbiased if EUj =
j, j = 1, . . . , r for all . Again, multiparameter versions of Theorems 49
may be formulated but this matter will not be dealt with here.
Exercises
11.5.1 In each one of the following cases, show that the distribution of the
r.v. X and the random vector X is of the multiparameter exponential form and
identify the various quantities appearing in a multiparameter exponential
family.
i) X is distributed as Gamma;
ii) X is distributed as Beta;
iii) X = (X1, X2) is distributed as Bivariate Normal with parameters as described in Example 4.
11.5.2 If the r.v. X is distributed as U(, ), show that the p.d.f. of X is not
of an exponential form regardless of whether one or both of , are unknown.
11.5.3 Use the not explicitly stated multiparameter versions of Theorems 6
and 7 to discuss:
11.1
Exercises
Sufciency: Denition and Some Basic
Results
283
()
px =
1+e
1
,
( + x )
Dose
x1
x2
...
xk
n1
n2
...
nk
Number of deaths
(Y)
Y1
Y2
...
Yk
x1, x2, . . . , xk and n1, n2, . . . , nk are known constants, Y1, Y2, . . . , Yk are
independent r.v.s; Yj is distributed as B(nj, p(xj)). Then show that:
ii) The joint distribution of Y1, Y2, . . . , Yk constitutes an exponential family;
ii) The statistic
k
k
Y
,
x
Y
j j j
j =1 j =1
is sufcient for = (, ).
(REMARK In connection with the probability p(x) given above, see also
Exercise 4.1.8 in Chapter 4.)
284
12
Point Estimation
Chapter 12
Point Estimation
12.1 Introduction
Let X be an r.v. with p.d.f. f(; ), where
r. If is known, we can
calculate, in principle, all probabilities we might be interested in. In practice,
however, is generally unknown. Then the problem of estimating arises; or
),
more generally, we might be interested in estimating some function of , g(
say, where g is (measurable and) usually a real-valued function. We now
). Let
proceed to dene what we mean by an estimator and an estimate of g(
X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(; ). Then
DEFINITION 1
Any statistic U = U(X1, . . . , Xn) which is used for estimating the unknown
) is called an estimator of g(
). The value U(x1, . . . , xn) of U for
quantity g(
).
the observed values of the Xs is called an estimate of g(
For simplicity and by slightly abusing the notation, the terms estimator and
estimate are often used interchangeably.
Exercise
12.1.1 Let X1, . . . , Xn be i.i.d. r.v.s having the Cauchy distribution with =
1 and unknown. Suppose you were to estimate ; which one of the estimators
would you choose? Justify your answer.
X1 , X
as a criterion of selection.)
(Hint: Use the distributions of X1 and X
284
12.2
Criteria12.3
for Selecting
Minimum Statistics
Variance
The CaseanofEstimator:
AvailabilityUnbiasedness,
of Complete Sufcient
285
DEFINITION 3
) is said to be estimable if
Let g be as above and suppose it is real-valued. g(
it has an unbiased estimator.
According to Denition 2, one could restrict oneself to the class of unbiased estimators. The interest in the members of this class stems from the
interpretation of the expectation as an average value. Thus if U =
), then, no matter what is,
U(X1, . . . , Xn) is an unbiased estimator of g(
).
the average value (expectation under ) of U is equal to g(
Although the criterion of unbiasedness does specify a class of estimators
with a certain property, this class is, as a rule, too large. This suggests that a
second desirable criterion (that of variance) would have to be superimposed
on that of unbiasedness. According to this criterion, among two estimators of
) which are both unbiased, one would choose the one with smaller
g(
variance. (See Fig. 12.1.) The reason for doing so rests on the interpretation
of variance as a measure of concentration about the mean. Thus, if U =
), then by Tchebichevs
U(X1, . . . , Xn) is an unbiased estimator of g(
inequality,
()
P U g 1
2U
.
2
Therefore the smaller 2U is, the larger the lower bound of the probability of
) becomes. A similar interpretation can be given
concentration of U about g(
by means of the CLT when applicable.
h1(u; )
h2(u; )
g( )
(a)
g( )
(b)
Figure 12.1 (a) p.d.f. of U1 (for a fixed ). (b) p.d.f. of U2 (for a fixed ).
286
12
Point Estimation
Following this line of reasoning, one would restrict oneself rst to the class
) and next to the subclass of unbiased estimaof all unbiased estimators of g(
tors which have nite variance under all . Then, within this restricted
class, one would search for an estimator with the smallest variance. Formalizing this, we have the following denition.
DEFINITION 4
Exercises
12.2.1 Let X be an r.v. distributed as B(n, ). Show that there is no unbiased
estimator of g( ) = 1/ based on X.
In discussing Exercises 12.2.212.2.4 below, refer to Example 3 in Chapter 10
and Example 7 in Chapter 11.
12.2.2 Let X1, . . . , Xn be independent r.v.s distributed as U(0, ), =
(0, ). Find unbiased estimators of the mean and variance of the Xs depending only on a sufcient statistic for .
12.2.3 Let X1, . . . , Xn be i.i.d. r.v.s from U(1, 2), 1 < 2 and nd unbiased
estimators for the mean (1 + 2)/2 and the range 2 1 depending only on a
sufcient statistic for (1, 2).
12.3
287
n+1
X
2 n + 1 (n )
and U 2 =
n+1
2 X ( n ) + X (1 ) .
5n + 4
Then show that both U1 and U2 are unbiased estimators of and that U2 is
uniformly better than U1 (in the sense of variance).
12.2.5 Let X1, . . . , Xn be i.i.d. r.v.s from the Double Exponential distribution f(x; ) = 21 e|x|, = . Then show that (X(1) + X(n))/2 is an unbiased
estimator of .
12.2.6 Let X1, . . . , Xm and Y1, . . . , Yn be two independent random samples
with the same mean and known variances 21 and 22, respectively. Then show
is an unbiased estimator of . Also
+ (1 c)Y
that for every c [0, 1], U = cX
nd the value of c for which the variance of U is minimum.
12.2.7 Let X1, . . . , Xn be i.i.d. r.v.s with mean and variance 2, both
is the minimum variance unbiased linear estimaunknown. Then show that X
tor of .
288
12
Point Estimation
THEOREM 1
EXAMPLE 1
2
1 n
Xj X ,
n 1 j =1
n
n
1 n
= X j2 nX 2 = X j n X j
n j =1
j =1
j =1
j =1
because Xj takes on the values 0 and 1 only and hence X 2j = Xj. By setting
T = nj= 1 Xj, we have then
n
Xj X
(X
j =1
=T
T2
T2
1
T
, so that U =
.
n
n 1
n
()
= 1
n x
+ n 1
n 1
n
+ 2 1
2
n 2
12.3
289
1 if
U=
0 if
X1 2
X 1 > 2.
(
= P (X
E U T = t = P U = 1T = t
2T = t =
)[
P X 1 2, X 1 + + X r = t
P T = t
1
P X 1 = 0, X 2 + + X r = t
P T = t
(
+P ( X
)
= t 2)]
+ P X 1 = 1, X 2 + + X r = t 1
= 2, X 2 + + X r
)[
) (
1
P X 1 = 0 P X 2 + + X r = t
P T = t
(
+P ( X
) (
= 2)P ( X
)
+ + X = t 2)]
n(r 1)
(
(1 )
(1 )
+ P X 1 = 1 P X 2 + + X r = t 1
nr
= t 1
t
+ n 1
n 1
n
+ 2 1
2
nr t
n 2
n r 1 t 2
n ( r 1)t + 2
t2
nr t
t
1
nr t
n r 1
n r 1 n n r 1
+ n
+
.
t 1 2 t 2
Therefore
1
n r 1 n n r 1
nr n r 1
T =
+ n
+
T T
T 1 2 T 2
( )
n r 1 t
n r 1 t 1
n ( r 1)t +1
1
t 1
nr
= t 1
t
290
12
Point Estimation
Consider certain events which occur according to the distribution P(). Then
the probability that no event occurs is equal to e. Let now X1, . . . , Xn (n 2)
be i.i.d. r.v.s from P(). Then the problem is that of nding a UMVU estimator of e.
Set
n
()
T = X j , = , g = e
j =1
and dene U by
1 if
U=
0 if
Then
X1 = 0
X 1 1.
) ()
E U = P U = 1 = P X1 = 0 = g ;
that is, U is an unbiased estimator of g( ). However, it does not depend on T
which is a complete, sufcient statistic for , according to Exercise 11.1.2(i)
and Example 10 in Chapter 11. It remains then for us to RaoBlackwellize U.
For this purpose we use the fact that the conditional distribution of X1, given
T = t, is B(t, 1/n). (See Exercise 12.3.1.) Then
t
1
E U T = t = P X1 = 0 T = t = 1 ,
n
so that
1
T = 1
n
( )
is a UMVU estimator of e.
EXAMPLE 4
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2) with 2 unknown and known. We
are interested in nding a UMVU estimator of .
2
1 n
Xj ,
n j =1
then nS2/ is 2n, so that nS/ is distributed as n. Then the expectation
E(nS/ ) can be calculated and is independent of ; call it cn (see Exercise
12.3.2). That is,
nS
nS
E
= c n , so that E
= ,
c n
12.3
291
S
E = ;
cn
Let again X1, . . . , Xn be i.i.d. r.v.s from N(, 2) with both and 2 unknown.
We are interested in nding UMVU estimators for each one of and 2.
) = , g2(
) = 2. By setting
Here = (, 2) and let g1(
S2 =
2
1 n
Xj X ,
n j =1
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2) with both and 2 unknown, and
set p for the upper pth quantile of the distribution (0 < p < 1). The problem is
that of nding a UMVU estimator of p.
Set = (, 2 ). From the denition of p, one has P (X1 p) = p. But
X p
p
P X 1 p = P 1
= 1
so that
p
= 1 p.
Hence
p
= 1 1 p and p = + 1 1 p .
292
12
Point Estimation
U=X+
S 1
1 p ,
cn
= and E (S/cn)
where cn is dened in Example 4. Then by the fact that E X
). Since U depends only on the
= (see Example 4), we have that E U = g(
, S2), it follows that U is a UMVU estimator
complete, sufcient statistic (X
of p.
Exercises
12.3.1 Let X1, . . . , Xn be i.i.d. r.v.s from P() and set T = nj= 1 Xj. Then show
that the conditional p.d.f. of X1, given T = t, is that of B(t, 1/n). Furthermore,
observe that the same is true if X1 is replaced by any one of the remaining Xs.
12.3.2
12.3.3 If X1, . . . , Xn are i.i.d. r.v.s from B(1, ), = (0, 1), by using
is the UMVU estimator of .
Theorem 1, show that X
12.3.4 If X1, . . . , Xn are i.i.d. r.v.s from P( ), = (0, ), use Theorem
1 in order to determine the UMVU estimator of .
12.3.5 Let X1, . . . , Xn be i.i.d. r.v.s from the Negative Exponential distribution with parameter = (0, ). Use Theorem 1 in order to determine the
UMVU estimator of .
12.3.6 Let X be an r.v. having the Negative Binomial distribution with
parameter = (0, 1). Find the UMVU estimator of g( ) = 1/ and
determine its variance.
12.3.7 Let X1, . . . , Xn be independent r.v.s distributed as N(, 1). Show that
2 (1/n) is the UMVU estimator of g( ) = 2.
X
12.3.8 Let X1, . . . , Xn be independent r.v.s distributed as N(, 2 ), where
both and 2 are unknown. Find the UMVU estimator of /.
12.3.9 Let (Xj, Yj), j = 1, . . . , n be independent random vectors having the
Bivariate Normal distribution with parameter = (1, 2, 1, 2, ). Find the
UMVU estimators of the following quantities: 12, 12, 2/1.
12.3.10 Let X be an r.v. denoting the life span of a piece of equipment. Then
the reliability of the equipment at time x, R(x), is dened as the probability
that X > x. If X has the Negative Exponential distribution with parameter
= (0, ), nd the UMVU estimator of the reliability R(x; ) on the basis
of n observations on X.
12.3.11
) (
( )
f x; = 1 , x = 0, 1, . . . , = 0, 1 ,
293
12.4 The Case Where Complete Sufcient Statistics Are Not Available or
May Not Exist: CramrRao Inequality
When complete, sufcient statistics are available, the problem of nding a
UMVU estimator is settled as in Section 3. When such statistics do not exist,
or it is not easy to identify them, one may use the approach described here in
searching for a UMVU estimator. According to this method, we rst establish
a lower bound for the variances of all unbiased estimators and then we attempt
to identify an unbiased estimator with variance equal to the lower bound
found. If that is possible, the problem is solved again. At any rate, we do have
a lower bound of the variances of a class of estimators, which may be useful for
comparison purposes.
The following regularity conditions will be employed in proving the main
result in this section. We assume that and that g is real-valued and
differentiable for all .
S f ( x1 ; ) f ( xn ; )dx1 dxn
294
12
Point Estimation
or
f ( x1 ; ) f ( xn ; )
S
or
S U ( x1 , . . . , xn )f ( x1 ; ) f ( xn ; ) dx1 dxn
U ( x1 , . . . , xn ) f ( x1 ; ) f ( xn ; )
THEOREM 2
[g ( )]
U
()
nI
()
, , where g =
()
dg
d
E U X 1 , . . . , X n
)(
()
= U x1 , . . . , xn f x1 ; f xn ; dx1 dxn = g .
S
(1)
[(
)]
f x1 ; f xn ;
f x1 ; f x j ; + f
=
j 1
f x j ; + +
f xn ;
j 2
=
f xj ;
j =1
n
1
=
j =1 f x j ;
n
=
log f
j =1
)
(
( x ; )
2
) f ( x ; )
j n
) f ( x ; )
i
ij
f x j ; f xi ;
i =1
n
x j ; f xi ; .
i =1
)
(2)
295
()
n
n
U
x
,
.
.
.
,
x
log
f
x
;
f xi ; dx1 dxn
n
j
S 1
S
j = 1
i = 1
n
(3)
= E U X 1 , . . . , X n
log f X j ; = E UV ,
j = 1
g =
where we set
log f X j ; .
j = 1
n
V = V X 1 , . . . , X n =
Next,
f x1 ; f xn ; dx1 dxn = 1.
S
) (
)(
()
Co U , V = E UV EU E V = E UV = g .
(4)
(5)
n
n
0 = E V = E
log f X j ; = E log f X j ;
j =1
j =1
= nE log f X 1 ; ,
so that
E log f X 1 ; = 0.
Therefore
n
n
2V = 2 log f X j ; = 2 log f X j ;
j = 1
j = 1
= n 2 log f X 1 ;
= nE log f X 1 ; = nE log f X ; .
(6)
296
12
Point Estimation
But
U , V =
( )
( U )( V )
Co U , V
) (
)(
C o U , V 2U 2V .
(7)
Taking now into consideration (5) and (6), relation (7) becomes
2
U nE log f X ; ,
[g ( )] ( )
2
or by means of (v),
2U
[g ( )]
[(
nE log f X ;
)]
[g ( )]
=
()
nI
(8)
The expression E[(/ )log f(X; )]2, denoted by I( ), is called Fishers information (about ) number; nE[(/ )log f(X; )]2 is the information (about )
contained in the sample X1, . . . , Xn.
(For an alternative way of calculating I( ), see Exercises 12.4.6 and 12.4.7.)
Returning to the proof of Theorem 2, we have that equality holds in (8) if
2
and only if C o(U, V) = ( 2U)( 2V) because of (7). By Schwarz inequality
(Theorem 2, Chapter 5), this is equivalent to
( )(
V = E V + k U EU with P probability 1,
(9)
where
()
k =
V
.
U
Furthermore, because of (i), the exceptional set for which (9) does not hold is
independent of and has P-probability 0 for all . Taking into consideration (4), the fact that EU = g() and the denition of V, equation (9) becomes
as follows:
n
log f X j ; = k U X 1 , . . . , X n g k
j =1
) () (
) ()()
(10)
log f X j ; = U X 1 , . . . , X n
j =1
) k( )d g( )k( )d + h ( X , . . . , X ),
1
297
where h(X
1, . . . , Xn) is the constant of the integration, or
n
log f x j ; = U x1 , . . . , xn
j =1
) k( )d g( )k( )d + h ( x , . . . , x ).
n
(11)
f ( x ; ) = C ( ) exp[Q( )U ( x , . . . , x )]h( x , . . . , x ),
n
(12)
j =1
where
[ ()() ] ()
()
()
C = exp g k d , Q = k d
and
[(
)]
h x1 , . . . , xn = exp h x1 , . . . , xn .
()
k =
V
,
U
then
f ( x ; ) = C ( ) exp[Q( )U ( x , . . . , x )]h( x , . . . , x )
n
j =1
outside a set N in n such that P[(X1, . . . , Xn) N] = 0 for all ; here C()
= exp[g( )k( )d] and Q( ) = k( )d. That is, the joint p.d.f. of the Xs is
of the one-parameter exponential family (and hence U is sufcient for ).
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(; ) and let g be an estimable realvalued function of . For an unbiased estimator U = U(X1, , Xn) of g( ), we
assume that regularity conditions (i)(vi) are satised. Then 2U is equal to
the CramrRao bound if and only if there exists a real-valued function of ,
d( ), such that U = g( ) + d( )V except perhaps on a set of P-probability
zero for all .
298
12
Point Estimation
[g ( )]
U
2
[g ( )]
U
, or
()
nI
2 V
[g ( )] = ( U )( V ).
2
But
[g ( )]
= C o U , V
(5).
by
Let X1, . . . , Xn be i.i.d. r.v.s from B(1, p), p (0, 1). By setting p = , we have
so that
f x; = x 1
, x = 0, 1
) (
1 x
Then
x 1 x
log f x; =
and
2
1 2
1
log f x; = 2 x +
Since
E X 2 = , E 1 X
(see Chapter 5), we have
= 1
(1 x)
[ (
)]
and E X 1 X = 0
1
E log f X ; =
,
2
x 1 x .
1
299
Let X1, . . . , Xn be i.i.d. r.v.s from P(), > 0. Again by setting = , we have
f x; = e
x
, x = 0, 1, . . . so that log f x; = + x log log x!.
x!
Then
log f x; = 1 +
and
2
1 2 2
log f x; = 1 + 2 x x.
2
Since E X = and E X = (1 + ) (see Chapter 5), we obtain
1
E log f X ; = ,
is an unbiased
so that the CramrRao bound is equal to /n. Since again X
is a UMVU estimator of .
estimator of with variance /n, we have that X
EXAMPLE 9
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2). Assume rst that 2 is known and
set = . Then
f x; =
x
exp
2 2
2
, x
and hence
1 x
log f x; = log
2 2
2
Next,
1 x
log f x; =
,
so that
2
1 x
log f x; = 2
.
Then
2
1
E log f X ; = 2 ,
300
12
Point Estimation
X
E =
= 1.
f x; =
x
exp
2
2
so that
x
1
1
log f x; = log 2 log
2
2
2
( )
and
1
log f x; =
+
2
2 2
Then
2
1
1 x
1 x
+ 2
log f x; = 2 2
4
2
4
X
X
E
= 1, E
= 3.
Therefore
2
1
E log f X ; = 2
2
2
and the CramrRao bound is 2 /n. Next,
X
j is n2
j =1
n X 2
n X 2
j
j
2
E
= n and
= 2n
j =1
j =1
12.3
301
is n21
n
n 1 j = 1
n 1
the CramrRao bound.
This then is an example of a case where a UMVU estimator does exist but
its variance is larger than the CramrRao bound.
Exercises
12.4.1 Let X1, . . . , Xn be i.i.d. r.v.s from the Gamma distribution with
known and = (0, ) unknown. Then show that the UMVU estimator
of is
U X1 , . . . , X n =
1
n
X
j =1
302
12
Point Estimation
S 2 f ( x; )dx = 0
2 f ( x; ) = 0.
or
()
2V
[(
[1 + b( )]
)
nE log f X ;
)]
, .
Here X is an r.v. with p.d.f. f(; ) and b( ) = db( )/d. (This inequality is
established along the same lines as those used in proving Theorem 2.)
12.5
Criteria12.3
for Selecting
anofEstimator:
The
Principle
The Case
Availability
of Maximum
Complete Likelihood
Sufcient Statistics
303
[(
L x1 , . . . , xn = max L x1 , . . . , xn ; ;
(X1, . . . , Xn) is called an ML estimator (MLE for short) of .
L x1 , . . . , xn = P X 1 = x1 , . . . , X n = xn ;
|x1, . . . , xn) is the probability of observing the xs which were
that is, L(
acutally observed. Then it is intuitively clear that one should select as an
estimate of that which maximizes the probability of observing the xs which
were actually observed, if such a exists. A similar interpretation holds true
|x1, . . . , xn) with the
for the case that the Xs are continuous by replacing L(
|x1, . . . , xn)dx1 dxn which represents the probability
probability element L(
(under P ) that Xj lies between xj and xj + dxj, j = 1, . . . , n.
In many important cases there is a unique MLE, which we then call the
MLE and which is often obtained by differentiation.
Although the principle of maximum likelihood does not seem to be justiable by a purely mathematical reasoning, it does provide a method for
producing estimates in many cases of practical importance. In addition, an
MLE is often shown to have several desirable properties. We will elaborate on
this point later.
The method of maximum likelihood estimation will now be applied to a
number of concrete examples.
EXAMPLE 10
L x1 , . . . , xn = e n
j =1 x j !
n
nj = 1 x j
and hence
n
n
304
12
Point Estimation
log L x1 , . . . , xn = 0 becomes n + nx = 0
2
1
L x1 , . . . , xn = nx 2 < 0 for all > 0
2
= = p1 , . . . , pr r ; p j > 0, j = 1, . . . , r and
j =1
= 1.
Then
L x1 , . . . , xr =
=
n!
p 1x prx
1
j =1 x j !
r
n!
j =1 x j !
p1x prx1 1 p1 pr 1
r1
xr
log L x1 , . . . , xr = log
n!
x!
j =1 j
+ x1 log p1 +
+ xr 1 log pr 1 + xr log 1 p1 pr 1 .
Differentiating with respect to pj, j = 1, . . . , r 1 and equating the resulting
expressions to zero, we get
xj
1
1
xr
= 0,
pj
pr
j = 1, . . . , r 1.
This is equivalent to
xj
pj
xr
,
pr
j = 1, . . . , r 1;
that is,
x1
x
x
= = r 1 = r ,
p1
pr 1 pr
and this common value is equal to
x1 + + xr 1 + xr n
= .
p1 + + pr 1 + pr 1
12.5
Criteria12.3
for Selecting
anofEstimator:
The
Principle
The Case
Availability
of Maximum
Complete Likelihood
Sufcient Statistics
305
Hence xj /pj = n and pj = xj /n, j = 1, . . . , r. It can be seen that these values of the
ps actually maximize the likelihood function, and therefore p j = xj /n, j = 1, . . . ,
r are the MLEs of the ps. (See Exercise 12.5.4.)
EXAMPLE 12
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2) with parameter = (, 2). Then
L x1 , . . . , xn
1
1
=
exp
2
2
2 2
(x
j =1
2
,
so that
1
2 2
(x
j =1
n
log L x1 , . . . , xn = 2 x = 0
n
1 n
+
log
L
x
,
.
.
.
,
x
xj
1
n
2
2 2 2 4 j =1
= 0.
Then
2
1 n
xj x
n j =1
are the roots of these equations. It is further shown that and 2 actually
maximize the likelihood function (see Exercise 12.5.5) and therefore
= x and 2 =
1 n
= x and 2 = x j x
n j =1
log L x1 , . . . , xn = 0.
n
2
log L x1 , . . . , xn = 2 < 0
2
log L x1 , . . . , xn = 0
is equal to
2
1 n
xj .
n j =1
306
12
Point Estimation
Next,
2
1 n 1
log L x1 , . . . , xn = 4 2
2
2
2
which, for equal to
(x
j =1
2
1 n
xj ,
n j =1
becomes
n
n
< 0.
n =
2
2 4
1
4
So
1 n
2 = x j
n j =1
L x1 , , xn =
( )
( )
( )
I [ , ) x(1) I ( , ] x( n ) .
Here the likelihood function is not differentiable with respect to and , but
it is, clearly, maximized when is minimum, subject to the conditions that
x(1) and x(n). This happens when = x(1) and = x(n). Thus = x(1) and
= x(n) are the MLEs of and , respectively.
In particular, if = c, = + c, where c is a given positive constant, then
L x1 , . . . , xn =
( 2c)
( )
( )
I [ c , ) x(1) I ( , +c ] x( n ) .
The likelihood function is maximized, and its maximum is 1/(2c)n, for any
such that c x(1) and + c x(n); equivalently, x(1) + c and x(n) c. This
shows that any statistic that lies between X(1) + c and X(n) c is an MLE of .
For example, 12 [X(1) + X(n)] is such a statistic and hence an MLE of .
If is known and = , or if is known and = , then, clearly, x(1) and
x(n) are the MLEs of and , respectively.
REMARK 4
iii) The MLE may be a UMVU estimator. This, for instance, happens in
Example 10, for in Example 12, and also for 2 in the same example
when is known.
12.5
307
Criteria12.3
for Selecting
anofEstimator:
The
Principle
The Case
Availability
of Maximum
Complete Likelihood
Sufcient Statistics
iii) The MLE need not be UMVU. This happens, e.g., in Example 12 for 2
when is unknown.
iii) The MLE is not always obtainable by differentiation. This is the case in
Example 13.
iv) There may be more than one MLE. This case occurs in Example 13 when
= c, = + c, c > 0
In the following, we present two of the general properties that an MLE enjoys.
THEOREM 4
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(; ), r, and let T = (T1, . . . ,
Tr), Tj = Tj(X1, . . . , Xn), j = 1, . . . , r be a sufcient statistic for = (1, . . . , r).
Then, if = (1, . . . , r) is the unique MLE , it follows that is a function
of T.
) [ (
) ](
f x1 ; f xn ; = g T x1 , . . . , xn ; h x1 , . . . , xn ,
where h is independent of .
Therefore
[(
max f x1 ; f xn ; ;
{[
) ]
= h x1 , . . . , xn max g T x1 , . . . , xn ; ; .
Thus, if a unique MLE exists, it will have to be a function of T, as it follows
from the right-hand side of the equation above.
REMARK 5 Notice that the conclusion of the theorem holds true in all
Examples 1013. See also Exercise 12.3.10.
Another optimal property of an MLE is invariance, as is proved in the
following theorem.
THEOREM 5
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(x; ), r, and let be dened
on onto * m and let it be one-to-one. Suppose is an MLE of . Then
). That is, an MLE is invariant under one-to-one
( ) is an MLE of (
transformations.
), so that = 1(
*). Then
PROOF Set * = (
) [ ( )
L x1 , . . . , xn = L 1 * x1 , . . . , xn ,
*|x1, . . . , xn). It follows that
call it L*(
[(
[ (
max L x1 , . . . , xn ; = max L * * x1 , . . . , xn ; * * .
By assuming the existence of an MLE, we have that the maximum at the
left-hand side above is attained at an MLE . Then, clearly, the right-hand
).
side attains its maximum at *, where * = ( ). Thus ( ) is an MLE of (
For instance, since
308
12
Point Estimation
1 n
xj x
n j =1
is the MLE of 2 in the normal case (see Example 12), it follows that
1 n
xj x
n j =1
is the MLE of .
Exercises
12.5.1 If X1, . . . , Xn are i.i.d. r.v.s from B(m, ), = (0, ), show that
/m is the MLE of .
X
12.5.2 If X1, . . . , Xn are i.i.d. r.v.s from the Negative Binomial distribution
) is the MLE of .
with parameter = (0, 1), show that r/(r + X
12.5.3 If X1, . . . , Xn are i.i.d. r.v.s from the Negative Exponential distribu ) is the MLE of .
tion with parameter = (0, ), show that 1/X
12.5.4 Refer to Example 11 and show that the quantities p j = xj/n, j = 1, . . . ,
r indeed maximize the likelihood function.
12.5.5 Refer to Example 12 and consider the case that both and 2 are
unknown. Then show that the quantities = x and
2 =
1 n
xj x
n j =1
f x; 1 , 2 =
x 1
1
exp
, x 1 , = 1 , 2 = 0, .
2
2
12.5.9
1
2
,+
1
2
),
12.6 Criteria
for Selecting
Decision-Theoretic
12.3
The Casean
of Estimator:
AvailabilityThe
of Complete
Sufcient Approach
Statistics
)(
309
1
= X 1 , . . . , X n = X ( n ) + cos 2 X 1 X (1) X ( n ) + 1 .
2
Then show that is an MLE of but it is not a function only of the sufcient
statistic (X(1), X(n)). (Thus Theorem 4 need not be correct if there exists more
than one MLE of the parameters involved. For this, see also the paper Maximum Likelihood and Sufcient Statistics by D. S. Moore in the American
Mathematical Monthly, Vol. 78, No. 1, January 1971, pp. 4245.)
DEFINITION 8
For estimating on the basis of X1, . . . , Xn and by using the decision function
, a loss function is a nonnegative function in the arguments and (x1, . . . , xn)
which expresses the (nancial) loss incurred when is estimated by
(x1, . . . , xn).
The loss functions which are usually used are of the following form:
[ (
)]
L ; x1 , . . . , xn = x1 , . . . , xn ,
or more generally,
[ (
)] ( )
L ; x1 , . . . , xn = x1 , . . . , xn
, k > 0;
[ (
)] [
L ; x1 , . . . , xn = x1 , . . . , xn
DEFINITION 9
)] .
2
[ (
R ; = E L ; X 1 , . . . , X n
[ (
)]
)] (
)] (
L ; x , . . . , x f x ; f x ; dx dx
1
n
1
n
1
n
=
L ; x1 , . . . , xn f x1 ; f xn ; .
x
x
1
[ (
310
12
Point Estimation
That is, the risk corresponding to a given decision function is simply the
average loss incurred if that decision function is used.
Two decision functions and * such that
[ (
)]
)]
R ; = E L ; X 1 , . . . , X n = E L ; * X 1 , . . . , X n = R ; *
DEFINITION 11
311
12.6 Criteria
for Selecting
Decision-Theoretic
12.3
The Casean
of Estimator:
AvailabilityThe
of Complete
Sufcient Approach
Statistics
R(; *)
R(; )
R(0; )
0
Figure 12.2
0
0
Figure 12.3
Within the class D1, the estimator is said to be minimax if for any other
estimator *, one has
[( )
[(
sup R ; ; sup R ; * ; .
Figure 12.2 illustrates the fact that a minimax estimator may be inadmissible.
Now one may very well object to the minimax principle on the grounds
that it gives too much weight to the maximum risk and entirely neglects its
other values. For example, in Fig. 12.3, whereas the minimax estimate is
slightly better at its maximum R(0; ), it is much worse than * at almost all
other points.
Legitimate objections to minimax principles like the one just cited
prompted the advancement of the concept of a Bayes estimate. To see what
this is, some further notation is required. Recall that , and suppose now
that is an r.v. itself with p.d.f. , to be called a prior p.d.f. Then set
( )()
( )()
R ; d
R = E R ; =
R ; .
()
DEFINITION 13
Within the class D2, the estimator is said to be a Bayes estimator (in the
decision-theoretic sense and with respect to the prior p.d.f. on ) if R( )
R(*) for any other estimator *.
It should be pointed out at the outset that the Bayes approach to estimation poses several issues that we have to reckon with. First, the assumption of
being an r.v. might be entirely unreasonable. For example, may denote the
(unknown but xed) distance between Chicago and New York City, which is
312
12
Point Estimation
to be determined by repeated measurements. This difculty may be circumvented by pretending that this assumption is only a mathematical device, by
means of which we expect to construct estimates with some tangible and
mathematically optimal properties. This granted, there still is a problem in
choosing the prior on . Of course, in principle, there are innitely many
such choices. However, in concrete cases, choices do suggest themselves. In
addition, when choosing we have the exibility to weigh the parameters the
way we feel appropriate, and also incorporate in it any prior knowledge we
might have in connection with the true value of the parameter. For instance,
prior experience might suggest that it is more likely that the true parameter
lies in a given subset of rather than in its complement. Then, in choosing ,
it is sensible to assign more weight in the subset under question than to its
complement. Thus we have the possibility of incorporating prior information
about or expressing our prior opinion about . Another decisive factor in
choosing is that of mathematical convenience; we are forced to select so
that the resulting formulas can be handled.
We should like to mention once and for all that the results in the following
two sections are derived by employing squared loss functions. It should be
emphasized, however, that the same results may be discussed by using other
loss functions.
) [
)] [
= x1 , . . . , xn , L ; = L ; x1 , . . . , xn = x1 , . . . , xn
)] .
2
[ (
R ; = E X 1 , . . . , X n
=
Therefore
()
)]
[ ( x , . . . , x )] f ( x ; ) f ( x ; )dx
2
)()
R = R ; d
x1 , . . . , xn
)]
}()
f x1 ; f xn ; dx1 dxn d
dxn .
12.7
Finding
Bayes Estimators
The Case of Availability of
Complete
Sufcient
Statistics
12.3
[ ( x , . . . , x )]
()(
313
) }
f x1 ; f xn ; d dx1 dxn .
(13)
(As can be shown, the interchange of the order of integration is valid here
because the integrand is nonnegative. The theorem used is known as the
Fubini theorem.)
From (13), it follows that if is chosen so that
[ ( x1 , . . . , xn )] ( ) f ( x1 ; ) f ( xn ; )d
2
[ ( x1 , . . . , xn )] ( ) f ( x1 ; ) f ( xn ; )d
= 2 ( x1 , . . . , xn ) f ( x1 ; ) f ( xn ; ) ( )d 2 ( x1 , . . . , xn )
f ( x1 ; ) f ( xn ; ) ( )d + 2 f ( x1 ; ) f ( xn ; ) ( )d ,
()
g t = at 2 2bt + c
(14)
(a > 0 )
which is minimized for t = b/a. (In fact, g(t) = 2at 2b = 0 implies t = b/a and
g(t) = 2a > 0.)
Thus the required estimate is given by
f ( x ; ) f ( x ; ) ( )d
) f x ; f x ; d .
( )()
( )
x1 , . . . , xn =
f ( x1 ; ) f ( x n ; ) ( )d < ,
0 < f ( x1 ; ) f ( x n ; ) ( )d < ,
and
f (x ; ) f (x
2
)()
; d < ,
f ( x ; ) f ( x ; ) ( )d
) f x ; f x ; d ,
( )()
( )
x1 , . . . , xn =
(15)
314
12
Point Estimation
f , x
f x;
f x ; f x ;
( ) (h(x) ) = ( h(x)) ( ) = ( ) h(x() ) ( ) ,
1
hx =
(16)
where
()
)()
)()
h x = f x; d = f x1 ; f xn ; d
for the case that is of the continuous type. By means of (15) and (16), it
follows then that the Bayes estimate of (in the decision-theoretic sense)
(x1, . . . , xn) is the expectation of with respect to its posterior p.d.f., that is,
( )
x1 , . . . , xn = h x d .
hx =
(17)
( )
( )()
+
1
1 1 ,
if 0, 1
=
0,
otherwise.
Now, from the denition of the p.d.f. of a Beta distribution with parameters and , we have
()
( )
12.3
12.7
Finding
Bayes Estimators
The Case of Availability of
Complete
Sufcient
Statistics
1
0 x (1 x)
1
( ) ( ),
( + )
dx =
315
(18)
and, of course () = ( 1)( 1). Then, for simplicity, writing njxj rather
than nj=1xj when this last expression appears as an exponent, we have
)()
j xj
n j x j
I 1 = f x1 ; f xn ; d
( ) 1 1 d
( )
( )
( ) ( )
( + )
(
)
(
)
=
d ,
1 )
(
( ) ( )
=
+n j x j 1
+ j x j 1
I1
+ x + n
(
)
=
( ) ( )
( + + n)
n
j =1
j =1
x j
(19)
Next,
)()
I 1 = f x1 ; f xn ; d
( ) 1 1 d
( )
( )
( ) ( )
( + )
(
)
(
)
=
d .
1 )
(
( ) ( )
=
n j x j
j xj
+n j x j 1
+ j x j +1 1
I2
+ x + 1 + n
(
)
=
( ) ( )
( + + n + 1)
n
j =1
j =1
x j
(20)
x1 , . . . , xn
(
(
n
+ + n + j =1 x j + 1 + n x j
j =1
=
;
=
n
++n
+ + n + 1 + j =1 x j
that is,
x1 , . . . , xn
n
j =1
xj +
n+ +
(21)
316
12
Point Estimation
x1 , . . . , xn
n
j =1
xj + 1
n+ 2
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 1). Take to be N(, 1), where is
known. Then
)()
1 n
exp
xj
2
j =1
I 1 = f x1 ; f xn ; d
( )
2
n +1
exp
2
1 n 2
exp
x
n +1
2 j =1
2
1
( )
n + 1 2 2 nx + d .
exp
[( )
But
(n + 1)
)]
nx +
2 nx + = n + 1 2 2
n+1
) (
2
2
nx +
nx +
nx +
= n + 1 2 2
n+1
n+1
n+1
2
2
nx +
nx +
= n + 1
.
n+1
n+1
Therefore
I1 =
1
n+1
( 2 )
nx +
1 n 2
exp x j + 2
n+1
2 j =1
nx +
1
d
exp
2
n+1
2 1 n + 1
2 1 n+1
nx +
1
1
1 n 2
2
=
exp
x
+
.
n
j
n + 1
n + 1 2
2 j =1
( )
(22)
12.3
317
Next,
)()
1 n
exp
x j
2 j = 1
I 2 = f x1 ; f xn ; d
( )
2
1
n+1
n +1
exp
2
nx +
1 n
exp x 2j + 2
n+1
2 j =1
( 2 )
1
nx +
d
exp
2
n+1
2 1 n + 1
2 1 n+1
nx + nx +
1
1
1 n 2
2
=
.
exp x j +
n + 1 n + 1
n + 1 2
2 j =1
( )
(23)
x1 , . . . , xn =
nx +
.
n+1
(24)
Exercises
12.7.1
318
12
Point Estimation
( x1 , . . . , xn ) = h( x)d ,
Suppose there is a prior p.d.f. on such that for the Bayes estimate dened
by (15) the risk R( ; ) is independent of . Then is minimax.
PROOF
one has
319
12.8
Finding Minimax
The Case of Availability
of Complete
SufcientEstimators
Statistics
12.3
R( ; ) ( )d R( ; *) ( )d
for any estimate *. But R(; ) = c by assumption. Hence
[( )
[(
)()
sup R ; ; = c R ; * d sup R ; * ;
for any estimate *. Therefore is minimax. The case that is of the discrete
type is treated similarly.
The theorem just proved is illustrated by the following example.
EXAMPLE 16
X +
R ; = E
n+ +
(n + + )
By taking = =
n 2 2 2 + 2 n + 2 .
1
n and denoting by * the resulting estimate, we have
2
( + )
n = 0, 2 2 + 2 n = 0,
so that
R ; * =
2
n+ +
) (
2
4 n+ n
4 1+ n
Since R(; *) is independent of , Theorem 6 implies that
* x1 , . . . , xn =
is minimax.
EXAMPLE 17
1
n
2 nx + 1
2
=
n+ n
2 1+ n
j =1
xj +
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2), where 2 is known and = .
of was UMVU. It can
It was shown (see Example 9) that the estimator X
be shown that it is also minimax and admissible. The proof of these latter two
facts, however, will not be presented here.
Now a UMVU estimator has uniformly (in ) smallest risk when its
competitors lie in the class of unbiased estimators with nite variance. However, outside this class there might be estimators which are better than a
UMVU estimator. In other words, a UMVU estimator need not be admissible.
Here is an example.
320
12
Point Estimation
EXAMPLE 18
Let X1, . . . , Xn be i.i.d. r.v.s from N(0, 2). Set 2 = . Then the UMVU
estimator of is given by
U=
1 n
X j2 .
n j =1
(See Example 9.) Its variance (risk) was seen to be equal to 2 2/n; that is,
R(; U) = 2 2/n. Consider the estimator = U. Then its risk is
R ; = E U
[(
)]
) (
= E U + 1
[(
2
n + 2 2 2 n + n .
n
The value = n/(n + 2) minimizes this risk and the minimum risk is equal to
2 2/(n + 2) < 2 2/n for all . Thus U is not admissible.
Exercise
12.8.1 Let X1, . . . , Xn be independent r.v.s from the P( ) distribution, and
consider the loss function L(; ) = [ (x)]2/. Then for the estimate (x) =
calculate the risk R(; ) = 1/E [ (X)]2, and conclude that (x) is
x,
minimax.
p j = p j , = 1 , . . . , r , j = 1, . . . , k.
()
[X
=
k
j =1
( )] .
np ()
np j
Often the ps are differentiable with respect to the s, and then the minimization can be achieved, in principle, by differentiation. However, the actual
solution of the resulting system of r equations is often tedious. The solution
may be easier by minimizing the following modied 2 expression:
12.3
12.9
Other Methods
of Estimation
The Case of Availability
of Complete
Sufcient
Statistics
[X
=
k
2
mod
( )] ,
np j
j =1
Xj
321
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2), where both and 2 are unknown.
By the method of moments, we have
X =
n
2
1 n
1
2
2
2
2
=
,
hence
,
X
X
Xj X .
j
n
n j =1
j =1
EXAMPLE 20
Let X1, . . . , Xn be i.i.d. r.v.s from U(, ), where both and are unknown.
Since
+
EX 1 =
2
( )
and X 1
2
( )
=
12
+
X =
2
n
1 X2 =
j
n
12
j =1
where
) + ( + )
2
+ = 2 X
, or
= S 12 ,
322
12
Point Estimation
S=
1 n
Xj X
n j =1
Hence = X S 3 , = X + S 3 .
Exercises
12.9.1 Let X1, . . . , Xn be independent r.v.s distributed as U( a, + b),
where a, b > 0 are known and = . Find the moment estimator of and
calculate its variance.
12.9.2 If X1, . . . , Xn are independent r.v.s distributed as U(, ), =
(0, ), does the method of moments provide an estimator for ?
12.9.3 If X1, . . . , Xn are i.i.d. r.v.s from the Gamma distribution with param 2/S2 and = S2/X
are the moment estimators of
eters and , show that = X
and , respectively, where
S2 =
12.9.4
2
1 n
Xj X .
n j =1
()
2
x I ( 0 , ) x , = 0, .
2
Find the moment estimator of .
f x; =
12.9.5 Let X1, . . . , Xn be i.i.d. r.v.s from the Beta distribution with parameters , and nd the moment estimators of and .
12.9.6
12.3
12.10
Properties
of Estimators
The
CaseAsymptotically
of Availability Optimal
of Complete
Sufcient
Statistics
323
THEOREM 8
PROOF For the proof of the theorem the reader is referred to Remark 5,
Chapter 8.
DEFINITION 15
( ))
d
n Vn
N 0, 2 , as n ,
( P )
P
it follows that Vn n
(see Exercise 12.10.1).
DEFINITION 16
324
12
Point Estimation
THEOREM 9
log L X 1 , . . . , X n = 0
has a root *n = *(X1, . . . , Xn), for each n, such that the sequence {*n} of
estimators is BAN and the variance of its limiting normal distribution is equal
to the inverse of Fishers information number
2
I = E log f X ; ,
()
EXAMPLE 21
iii) Let X1, . . . , Xn be i.i.d. r.v.s from B(1, ). Then, by Exercise 12.5.1, the
, which we denote by X
n here. The weak and strong
MLE of is X
n follows by the WLLN and SLLN, respectively (see
consistency of X
2
n X j 2 is I 1 2 = 2 4 see Example 9 .
n j =1
It can further be shown that in all cases (i)(iii) just considered the regularity conditions not explicitly mentioned in Theorem 9 are satised, and
therefore the above sequences of estimators are actually BAN.
( )
12.3
12.11Sufcient
Closing Statistics
Remarks
The Case of Availability of Complete
325
Exercise
12.10.1 Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(; ); and let {Vn}
d
= {Vn(X1, . . . , Xn)} be a sequence of estimators of such that n(Vn )
( P )
2
Y as n , where Y is an r.v. distributed as N(0, ( )). Then show that
P
Vn n
. (That is, asymptotic normality of {Vn} implies its consistency in
probability.)
{U } = {U ( X , . . . , X )}
n
and
{V } = {V ( X , . . . , X )}
n
be two sequences of estimators of . Then we say that {Un} and {Vn} are
asymptotically equivalent if for every ,
) n
P
n U n Vn 0.
For an example, suppose that the Xs are from B(1, ). It has been shown
n (= X
) and this
(see Exercise 12.3.3) that the UMVU estimator of is Un = X
coincides with the MLE of (Exercise 12.5.1). However, the Bayes estimator
of , corresponding to a Beta p.d.f. , is given by
Vn
j =1
Xj +
n+ +
(25)
Wn
j =1
Xj + n 2
n+ n
(26)
That is, four different methods of estimation of the same parameter provided three different estimators. This is not surprising, since the criteria
of optimality employed in the four approaches were different. Next, by the
N(0, (1 )), and it can also be shown (see Exercise 11.1), that n(Vn )
d
values of and . It can also be shown (see Exercise 12.11.2) that n(Un Vn)
P
n
0. Thus {Un} and {Vn} are asymptotically equivalent according to De
nition 17. As for Wn, it can be established (see Exercise 12.11.3) that n(Wn
d
W, as n , where W is an r.v. distributed as N( 1 , (1 )).
)
2
(P )
326
12
Point Estimation
Thus {Un} and {Wn} or {Vn} and {Wn} are not even comparable on the basis of
Denition 17.
Finally, regarding the question as to which estimator is to be selected in a
given case, the answer would be that this would depend on which kind of
optimality is judged to be most appropriate for the case in question.
Although the preceding comments were made in reference to the Binomial case, they are of a general nature, and were used for the sake of deniteness only.
Exercises
12.11.1 In reference to Example 14, the estimator Vn given by (25) is the
Bayes estimator of , corresponding to a prior Beta p.d.f. Then show that
estimator of , whereas the estimator Vn is given by (25). Then show that n(Un
P
Vn) 0.
n
12.11.3 In reference to Example 14, Wn, given by (26), is the minimax
d W as n , where W is an
estimator of . Then show that n(Wn )
( P )
r.v. distributed as (N 21 , (1 ).)
327
Chapter 13
Testing Hypotheses
A statement regarding the parameter , such as , is called a (statistical) hypothesis (about ) and is usually denoted by H (or H0). The statement
that c (the complement of with respect to ) is also a (statistical)
hypothesis about , which is called the alternative to H (or H0) and is usually
denoted by A. Thus
( )
H H 0 :
A : c .
Often hypotheses come up in the form of a claim that a new product, a
new technique, etc., is more efcient than existing ones. In this context, H (or
H0) is a statement which nullies this claim and is called a null hypothesis.
0}, then H is called a simple
If contains only one point, that is, = {
hypothesis, otherwise it is called a composite hypothesis. Similarly for
alternatives.
Once a hypothesis H is formulated, the problem is that of testing H on the
basis of the observed values of the Xs.
DEFINITION 2
A randomized (statistical) test (or test function) for testing H against the
alternative A is a (measurable) function dened on n, taking values in [0, 1]
and having the following interpretation: If (x1, . . . , xn) is the observed value of
(X1, . . . , Xn) and (x1, . . . , xn) = y, then a coin, whose probability of falling
327
328
13
Testing Hypotheses
x1 , . . . , xn
1 if
=
0 if
(x
(x
) B
, . . . , x ) B .
, . . . , xn
In this case, the (Borel) set B in n is called the rejection or critical region and
Bc is called the acceptance region.
In testing a hypothesis H, one may commit either one of the following two
kinds of errors: to reject H when actually H is true, that is, the (unknown)
parameter does lie in the subset specied by H; or to accept H when H is
actually false.
DEFINITION 3
329
13.213.3
Testing
UMPaTests
Simple
forHypothesis
Testing Certain
Against
Composite
a SimpleHypotheses
Alternative
effective or if it has harmful side effects, the loss sustained by the company due
to an immediate obsolescence of the product, decline of the companys image,
etc., will be quite severe. On the other hand, failure to market a truly better
drug is an opportunity loss, but that may not be considered to be as serious as
the other loss. If a decision is to be made on the basis of a number of clinical
trials, the null hypothesis H should be that the cure rate of the new drug is no
more than 60% and A should be that this cure rate exceeds 60%.
We notice that for a nonrandomized test with critical region B, we have
= P X 1 , . . . , X n B = 1 P X 1 , . . . , X n B
+ 0 P X 1 , . . . , X n Bc = E X 1 , . . . , X n ,
()
and the same can be shown to be true for randomized tests (by an appropriate
application of property (CE1) in Section 3 of Chapter 5). Thus
() ()
= = E X 1 , . . . , X n ,
DEFINITION 4
(1)
A level- test which maximizes the power among all tests of level is said to
be uniformly most powerful (UMP). Thus is a UMP, level- test if (i) sup
); ] = and (ii) (
) *(
), c for any other test * which
[(
satises (i).
If c consists of a single point only, a UMP test is simply called most
powerful (MP). In many important cases a UMP test does exist.
Exercise
13.1.1 In the following examples indicate which statements constitute a
simple and which a composite hypothesis:
i) X is an r.v. whose p.d.f. f is given by f(x) = 2e2xI(0,)(x);
ii) When tossing a coin, let X be the r.v. taking the value 1 if head appears and
0 if tail appears. Then the statement is: The coin is biased;
iii) X is an r.v. whose expectation is equal to 5.
330
13
Testing Hypotheses
x1 , . . . , xn
(
(
)
)
(
(
)
)
(
(
)
)
(
(
)
)
1, if f x1 ; 1 f xn ; 1 > Cf x1 ; 0 f xn ; 0
= , if f x1 ; 1 f xn ; 1 = Cf x1 ; 0 f xn ; 0
(2)
0, otherwise,
E0 X 1 , . . . , X n = .
(3)
Then, for testing H against A at level , the test dened by (2) and (3) is MP
within the class of all tests whose level is .
The proof is presented for the case that the Xs are of the continuous type,
since the discrete case is dealt with similarly by replacing integrals by summation signs.
PROOF
dz = dx1 dxn ,
z = x1 , . . . , xn ,
Z = X1 , . . . , X n
( )
()
P Dc = P Z T c = f0 z dz = 0,
0
Tc
( )
[ ( ) ( )] [ ( ) ( )]
= P {[ f (Z) > Cf (Z)]I D} + P {[ f (Z) = Cf (Z)]I D}
= P
= P
f ( Z)
( )
=
C
D
I
Z
f
( )
( )
f Z
= P
> C I D + P
f
Z
(4)
UMPaTests
forHypothesis
Testing Certain
Composite
13.213.3
Testing
Simple
Against
a SimpleHypotheses
Alternative
331
a(C )
1
Figure 13.1
a(C)
a(C )
C
( ) ( ) [
( )] [
( )] ( ) ( )
P Y = C = G C G C = 1 a C 1 a C = a C a C ,
0
( )
) ( )
E Z = P Y > C0 = a C0 = ,
0
as was to be seen.
Next, we assume that C0 is a discontinuity point of a. In this case, take
again C = C0 in (2) and also set
( )
a(C ) a(C )
a C0
0
(so that 0 1). Again we assert that the resulting test is of level . In the
present case, (4) becomes as follows:
( )
E Z = P Y > C0 + P Y = C0
0
( )
= a C0 +
( ) a C a C = .
[ ( ) ( )]
a(C ) a(C )
a C0
332
13
Testing Hypotheses
( )
a(C ) a(C )
a C0
0
() () } (
)
{
= {z ; (z ) * (z ) < 0} = ( * < 0).
B + = z n ; z * z > 0 = * > 0 ,
B
(
(
) (
) (
) (
) (
) (
) (
)
)
B+ = > * = 1 = = f1 Cf0
B = < * = 0 = = f1 Cf0 .
(5)
Therefore
which is equivalent to
(6)
But
( )
( )
( )
= E Z E * Z = E * Z 0,
0
(7)
and similarly,
( )
( )
( )
* Z = 1 * 1 .
(8)
1) 0, or (
1) *(
1). This
Relations (6), (7) and (8) yield (
1) *(
completes the proof of the theorem.
1) is at least . That is,
The theorem also guarantees that the power (
13.213.3
Testing
UMPaTests
Simple
forHypothesis
Testing Certain
Against
Composite
a SimpleHypotheses
Alternative
COROLLARY
333
1) .
Let be dened by (2) and (3). Then (
The test *(z) = is of level , and since is most powerful, we have
1) *(
1) = .
(
PROOF
REMARK 1
[ (
]] ( ) ( )
= [1 a(b )] [1 a(b )] = a(b ) a(b ) = 0.
P Y b1 , b2 G b2 G b1
0
ii) The theorem shows that there is always a test of the structure (2) and (3)
which is MP. The converse is also true, namely, if is an MP level test,
then necessarily has the form (2) unless there is a test of size < with
power 1.
This point will not be pursued further here.
The examples to be discussed below will illustrate how the theorem is
actually used in concrete cases. In the examples to follow, = {0, 1} and the
problem will be that of testing a simple hypothesis against a simple alternative
at level of signicance . It will then prove convenient to set
R z; 0 , 1 =
( ) (
)
f (x ; ) f (x ; )
f x1 ; 1 f xn ; 1
1
Let X1, . . . , Xn be i.i.d. r.v.s from B(1, ) and suppose 0 < 1. Then
log z; 0 , 1 = x log
1 1
1
+ n x log
,
1 0
0
where x = nj= 1 xj and therefore, by the fact that 0 < 1, R(z; 0, 1) > C is
equivalent to
(
(
)
)
1 1 0
1 1
.
x > C0 , where C0 = log C n log
log
1 0
0 1 1
334
13
Testing Hypotheses
z = ,
0,
j =1 x j > C0
n
j =1 x j = C0
n
if
()
if
(9)
otherwise,
( )
E Z = P X > C0 + P X = C0 = ,
0
(10)
and X = nj= 1 Xj is B(n, i), i = 0, 1. If 0 > 1, the inequality signs in (9) and (10)
are reversed.
For the sake of deniteness, let us take 0 = 0.50, 1= 0.75, = 0.05 and
n = 25. Then
is equivalent to
For C0 = 17, we have, by means of the Binomial tables, P0.5(X 17) = 0.9784
and P0.5(X = 17) = 0.0323. Thus is dened by 0.9784 0.0323 = 0.95, whence
= 0.8792. Therefore the MP test in this case is given by (2) with C0 = 17 and
= 0.882. The power of the test is P0.75(X > 17) + 0.882 P0.75(X = 17) = 0.8356.
EXAMPLE 2
Let X1, . . . , Xn be i.i.d. r.v.s from P() and suppose 0 < 1. Then
logR z; 0 , 1 = x log
1
n 1 0 ,
0
where
n
x = xj
j =1
and hence, by using the assumption that 0 < 1, one has that R(z; 0, 1) > C is
equivalent to x > C0 , where
n(
log Ce
C0 =
log 1 0
1
1,
z = ,
0,
()
if
if
n
j =1
n
j =1
x j > C0
x j = C0
otherwise,
(11)
335
13.213.3
Testing
UMPaTests
Simple
forHypothesis
Testing Certain
Against
Composite
a SimpleHypotheses
Alternative
( )
E0 Z = P0 X > C0 + P0 X = C0 = ,
(12)
and X = nj=1 Xj is P(ni), i = 0, 1. If 0 > 1, the inequality signs in (11) and (12)
are reversed.
As an application, let us take 0 = 0.3, 1 = 0.4, = 0.05 and n = 20. Then
(12) becomes
By means of the Poisson tables, one has that for C0 = 10, P0.3(X 10) = 0.9574
and P0.3(X = 10) = 0.0413. Therefore is dened by 0.9574 0.0413 = 0.95,
whence = 0.1791.
Thus the test is given by (11) with C0 = 10 and = 0.1791. The power of the
test is
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 1) and suppose 0 < 1. Then
logR z; 0 , 1 =
1 n
x j 0
2 j =1
) (x
2
2
1
C0 =
n 0 + 1
1 log C
+
n 1 0
2
1,
z =
0,
()
if
x > C0
(13)
otherwise,
where C0 is determined by
( )
E Z = P X > C0 = ,
0
(14)
[(
) (
)] [ (
)]
) (
whence C0 = 0.03. Therefore the MP test in this case is given by (13) with
C0 = 0.03. The power of the test is
[(
] [ ( )
336
13
Testing Hypotheses
EXAMPLE 4
Let X1, . . . , Xn be i.i.d. r.v.s from N(0, ) and suppose 0 < 1. Here
logR z; 0 , 1 =
1
1 0
x + log 0 ,
2 01
2
1
where x = nj=1 x2j, so that, by means of 0 < 1, one has that R(z; 0, 1) > C is
equivalent to x > C0, where
C0 =
201
log C 1 .
1 0
0
1,
z =
0,
()
if
n
j =1
x 2j > C0
(15)
otherwise,
where C0 is determined by
n
E P = E X j2 > C0 = ,
j =1
()
(16)
whence C0 = 150.264. Thus the test is given by (15) with C0 = 150.264. The
power of the test is
X 150.264
2
P16 X > 150.264 = P16
>
= P 20 > 9.3915 = 0.977.
16
16
Exercises
13.2.1 If X1, . . . , X16 are independent r.v.s, construct the MP test of the
hypothesis H that the common distribution of the Xs is N(0, 9) against the
alternative A that it is N(1, 9) at level of signicance = 0.05. Also nd
the power of the test.
13.2.2 Let X1, . . . , Xn be independent r.v.s distributed as N(, 2), where
is unknown and is known. Show that the sample size n can be determined so
that when testing the hypothesis H : = 0 against the alternative A : = 1, one
has predetermined values for and . What is the numerical value of n if
= 0.05, = 0.9 and = 1?
337
x!
2x
C} for
(Hint: Observe that the function x!/2x is nondecreasing for x integer 1.)
) (
g z; = f x1 ; f x1 ; ,
z = x1 , . . . , xn .
(17)
338
13
Testing Hypotheses
DEFINITION 5
The family {g(; ); } is said to have the monotone likelihood ratio (MLR)
property in V if the set of zs for which g(z; ) > 0 is independent of and there
exists a (measurable) function V dened in n into such that whenever ,
with < then: (i) g(; ) and g(; ) are distinct and (ii) g(z; )/g(z; )
is a monotone function of V(z).
Note that the likelihood ratio (LR) in (ii) is well dened except perhaps on
a set N of zs such that P(Z N) = 0 for all . In what follows, we will
always work outside such a set.
An important family of p.d.f.s having the MLR property is a oneparameter exponential family.
PROPOSITION 1
()
f x: =C e
() ( )
Q T x
()
hx,
where C() > 0 for all and the set of positivity of h is independent
of . Suppose that Q is increasing. Then the family {g( ;); } has the MLR
property in V, where V(z) = nj=1 T(xj) and g( ; ) is given by (17). If Q is
decreasing, the family has the MLR property in V = V.
PROOF
We have
()
g z : = C0 e
() ()
Q V z
()
h* z ,
where C0() = C (), V(z) = T(xj) and h*(z) = h(x1) h(xn). Therefore on
the set of zs for which h*(z) > 0 (which set has P-probability 1 for all ),
one has
n
n
j=1
( ) = C ( )e ( ) ( ) = C ( ) e[
() ()
g ( z; )
C ( )
C ( )e
Q V z
g z;
Q V z
( ) ( )] ( )
.
Q Q V z
Now for < , the assumption that Q is increasing implies that g(z; )/g(z; )
is an increasing function of V(z). This completes the proof of the rst assertion.
The proof of the second assertion follows from the fact that
From examples and exercises in Chapter 11, it follows that all of the
following families have the MLR property: Binomial, Poisson, Negative Binomial, N(, 2) with 2 known and N(, ) with known, Gamma with = and
known, or = and known. Below we present an example of a family
which has the MLR property, but it is not of a one-parameter exponential
type.
EXAMPLE 5
Consider the Logistic p.d.f. (see also Exercise 4.1.8(i), Chapter 4) with parameter ; that is,
f x; =
e x
(1 + e )
x
x , = .
(18)
339
Then
( )=e
f ( x; )
f x;
1 + e x
x
1+ e
and
( ) < f ( x ; )
f ( x; )
f ( x ; )
f x;
if and only if
2
x
1 + e x
1 + e
<
e
.
x
x
1+e
1+e
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(x; ), and let the family {g(;
); } have the MLR property in V, where g(; ) is dened in (17). Let 0
and set = { ; 0}. Then for testing the (composite) hypothesis
H : against the (composite) alternative A : c at level of signicance ,
there exists a test which is UMP within the class of all tests of level . In the
case that the LR is increasing in V(z), the test is given by
1,
z = ,
0,
()
()
V ( z) = C
if V z > C
if
otherwise,
( )
(19)
[() ]
[() ]
E 0 Z = P 0 V Z > C + P 0 V Z = C = .
(19)
If the LR is decreasing in V(z), the test is taken from (19) and (19) with
the inequality signs reversed.
The proof of the theorem is a consequence of the following two lemmas.
LEMMA 1
Under the assumptions made in Theorem 2, the test dened by (19) and (19)
is MP (at level ) for testing the (simple) hypothesis H0 : = 0 against the
(composite) alternative A : c among all tests of level .
Let be an arbitrary but xed point in c and consider the problem
of testing the above hypothesis H0 against the (simple) alternative A : = at
level . Then, by Theorem 1, the MP test is given by
PROOF
1,
z = ,
0,
()
if
if
( ) ( )
g( z; ) = C g( z; )
g z; > C g z; 0
otherwise,
340
13
Testing Hypotheses
( )
E 0 Z = .
[ ( )]
[V ( z)] = C
V z > C
()
()
( )
V z > 1 C = C0
V z = C0 .
if and only if
if and only if
(20)
In addition,
{ [ ( )] }
( )
{ [ ( )] }
E 0 Z = P 0 V Z > C + P 0 V Z = C
[()
[()
= P 0 V Z > C0 + P 0 V Z = C0 .
z = ,
0,
()
V ( z) = C
if V z > C0
()
if
(21)
otherwise,
and
[()
( )
[()
E0 Z = P0 V Z > C0 + P0 V Z = C 0 = ,
(21)
Under the assumptions made in Theorem 2, and for the test function dened
by (19) and (19), we have E(Z) for all .
Let be an arbitrary but xed point in and consider the problem
of testing the (simple) hypothesis H : = against the (simple) alternative
A0(= H0) : = 0 at level () = E(Z). Once again, by Theorem 1, the MP test
is given by
PROOF
1,
z = ,
0,
()
if
if
( ) ( )
g( z; ) = C g( z; )
g z; 0 > C g z;
0
otherwise,
( )
{ [ ( )] }
{ [ ( )] }
( )
E Z = P V Z > C + P V Z = C = .
341
z = ,
0,
()
( )
[()
()
V ( z) = C
if V z > C0
if
(22)
otherwise,
[()
] ( )
E Z = P V Z > C0 + P V Z = C0 = ,
(22)
where C0 = 1(C ).
Replacing 0 by in the expression on the left-hand side of (19) and
comparing the resulting expression with (22), one has that C0 = C and = .
Therefore the tests and are identical. Furthermore, by the corollary to
Theorem 1, one has that () , since is the power of the test .
PROOF OF THEOREM 2
{
}
= {all level tests for testing H : = }.
Then, clearly, C C0. Next, the test , dened by (19) and (19), belongs in C
by Lemma 2, and is MP among all tests in C0, by Lemma 1. Hence it is MP
among tests in C. The desired result follows.
For the symmetric case where = { ; 0}, under the
assumptions of Theorem 2, a UMP test also exists for testing H : against
A : c. The test is given by (19) and (19) if the LR is decreasing in V(z) and
by those relationships with the inequality signs reversed if the LR is increasing
in V (z). The relevant proof is entirely analogous to that of Theorem 2.
REMARK 2
COROLLARY
()
()
342
13
Testing Hypotheses
()
()
0
0
in Remark 2, is increasing for those s for which it is less than 1 (see Figs. 13.2
and 13.3, respectively).
Another problem of practical importance is that of testing
H : = ; 1 or 2
()
f x; = C e
() ( )
Q T x
()
(23)
hx,
z = i
0,
()
()
if C1 < V z < C 2
()
if V z = C i
(i = 1, 2) (C
< C2
(24)
otherwise,
( )
( ) ]
[()
P [V (Z) = C ] = , i = 1, 2,
E Z = P C1 < V Z < C 2 + 1P V Z = C1
i
+ 2
]
()
( )
and V z = T x j .
j =1
(25)
343
()
1
1
0
2
If Q is decreasing, the test is given again by (24) and (25) with C1 < V (z) < C2
replaced by V(z) < C1 or V(z) > C2.
It can also be shown that (in nondegenerate cases) the function () =
E(Z), for the problem discussed in Theorem 3, increases for 0 and
decreases for 0 for some 1 < 0 < 2 (see also Fig. 13.4).
Theorems 2 and 3 are illustrated by a number of examples below. In order
to avoid trivial repetitions, we mention once and for all that the hypotheses to
be tested are H : = { ; 0} against A : c and H : = { ;
1 or 2} against A : c; 0, 1, 2 and 1 < 2. The level of
signicance is (0 < < 1).
EXAMPLE 6
()
V z = xj
()
and Q = log
j =1
z = ,
0,
()
if
if
j = 1 x j > C
n
j = 1 x j = C
n
(26)
otherwise,
( )
E Z = P X > C + P X = C = ,
0
(27)
and
n
X = Xj
j =1
is
B n, .
For a numerical application, let 0 = 0.5, = 0.01 and n = 25. Then one
has
344
13
Testing Hypotheses
27
143
27
P0.75 X = 18 = 0.5923.
143
z = i
0,
()
if
n
j=1
(i = 1, 2)
x j = Ci
otherwise,
( )
E i Z = P i C1 < X < C 2 + 1 P i X = C1 + 2 P i X = C 2 = , i = 1, 2.
(
(C
)
(
)
< X < C ) + P (X = C ) +
(
)
(X = C ) = 0.05.
1 0.75
2 0.75
1 = 2 =
205
0.4904.
418
( )
[ (
)]
205
P0.5 X = 10 + P0.5 X = 15 = 0.6711.
418
Let X1, . . . , Xn be i.i.d. r.v.s from P(), = (0, ). Here V(z) = nj= 1 xj and
Q() = log is increasing. Therefore the UMP test for testing H is again given
by (26) and (27), where now X is P(n).
For a numerical example, take 0 = 0.5, = 0.05 and n = 10. Then, by means
of the Poisson tables, we nd C = 9 and
182
0.5014.
363
345
()
V z = xj
()
Q =
and
j =1
1,
z =
0 ,
if x > C
otherwise,
()
where C is determined by
( )
E 0 Z = P 0 X > C = ,
n C
.
= 1
For instance, for = 2 and 0 = 20, = 0.05 and n = 25, one has C = 20.66. For
= 21, the power of the test is
()
( )
21 = 0.8023.
On the other hand, for testing H the UMP test is given by
1,
z =
0,
()
if C1 < x < C 2
otherwise,
( )
E Z = P C1 < X < C 2 = ,
i
i = 1, 2.
n C
2
=
()
N (0 ,
2
n
) n (C
N (0,
0
2
n
0
346
13
Testing Hypotheses
N (0,
2
n
C1 0 C2
Let X1, . . . , Xn be i.i.d. r.v.s from N(, ) with known. Then V(z) =
nj= 1 (xj )2 and Q() = 1/(2) is increasing. Therefore for testing H, the UMP
test is given by
1,
z =
0,
()
j = 1( x j )
n
if
>C
otherwise,
where C is determined by
n
E Z = P X j
j = 1
0
( )
> C = .
()
= 1 P n2 < C
) (independent of !).
(See also Figs. 13.8 and 13.9; 2n stands for an r.v. distribution as 2n.)
For a numerical example, take 0 = 4, = 0.05 and n = 25. Then one has
C = 150.608, and for = 12, the power of the test is (12) = 0.980.
On the other hand, for testing H the UMP test is given by
1,
z =
0,
()
if C1 < j = 1 x j
n
< C2
otherwise,
n2
n2
0
C/0
C/0
E Z = P C1 < X j
j =1
i
( )
< C2 = ,
347
i = 1, 2.
C
C
= P n2 < 2 P n2 < 1
()
(independent of !).
) (
2
2
P 25
< C 2 P 25
< C1 = 0.01,
2 C2
2 C1
P 25
<
P 25 < = 0.01
3
3
and C1, C2 are determined from the Chi-square tables (by trial and error).
Exercises
13.3.1 Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f given below. In each case,
show that the joint p.d.f. of the Xs has the MLR property in V = V(x1, . . . , xn)
and identity V.
i) f x; =
1 x
x e I ( 0 , ) x , = 0, , = known
()
( )
(> 0);
x
r + x 1
ii) f x; = r
1 I A x , A = 0, 1, . . . , = 0, 1 .
x
) ()
( )
13.3.2 Refer to Example 8 and show that, for testing the hypotheses H and
H mentioned there, the power of the respective tests is given by
n C
= 1
()
and
n C
2
=
()
) n (C
as asserted.
13.3.3 The length of life X of a 50-watt light bulb of a certain brand may be
assumed to be a normally distributed r.v. with unknown mean and known
s.d. = 150 hours. Let X1, . . . , X25 be independent r.v.s distributed as X and
suppose that x = 1,730 hours. Test the hypothesis H : = 1,800 against the
alternative A : < 1,800 at level of signicance = 0.01.
348
13
Testing Hypotheses
C
C
C
= 1 P n2 < and = P n2 < 2 P n2 < 1
()
()
as asserted.
13.3.6 Let X1, . . . , X25 be independent r.v.s distributed as N(0, 2). Test the
hypothesis H : 2 against the alternative A : > 2 at level of signicance =
2
0.05. What does the relevant test become for 25
j = 1xj = 120, where xj is the
observed value of Xj, j = 1, . . . , 25.
13.3.7 In a certain university 400 students were chosen at random and it was
found that 95 of them were women. On the basis of this, test the hypothesis H
that the proportion of women is 25% against the alternative A that is less than
25% at level of signicance = 0.05. Use the CLT in order to determine the
cut-off point.
13.3.8 Let X1, . . . , Xn be independent r.v.s distributed as B(1, p). For testing
the hypothesis H : p 12 against the alternative A : p > 12 , suppose that = 0.05
and ( 87 ) = 0.95. Use the CLT in order to determine the required sample
size n.
13.3.9 Let X be an r.v. distributed as B(n, ), = (0, 1).
i) Derive the UMP test for testing the hypothesis H : 0 against the
alternative A : > 0 at level of signicance .
ii) What does the test in (i) become for n = 10, 0 = 0.25 and = 0.05?
iii) Compute the power at 1 = 0.375, 0.500, 0.625, 0.750, 0.875.
Now let 0 = 0.125 and = 0.1 and suppose that we are interested in securing
power at least 0.9 against the alternative 1 = 0.25.
iv) Determine the minimum sample size n required by using the Binomial
tables (if possible) and also by using the CLT.
13.3.10 The number X of fatal trafc accidents in a certain city during a year
may be assumed to be an r.v. distributed as P(). For the latest year x = 4,
whereas for the past several years the average was 10. Test whether it has been
an improvement, at level of signicance = 0.01. First, write out the expression for the exact determination of the cut-off point, and secondly, use the
CLT for its numerical determination.
13.4
13.3 UMPU
UMP Tests for Testing Certain Composite Hypotheses
349
13.3.11 Let X be the number of times that an electric light switch can be
turned on and off until failure occurs. Then X may be considered to be an r.v.
distributed as Negative Binomial with r = 1 and unknown p. Let X1, . . . , X15 be
independent r.v.s distributed as X and suppose that x = 15,150. Test the
hypothesis H : p = 104 against the alternative A : p > 104 at level of signicance
= 0.05.
13.3.12 Let X1, . . . , Xn be independent r.v.s with p.d.f. f given by
f x; =
()
1 x
e
I (0 , ) x ,
= 0, .
i) Derive the UMP test for testing the hypothesis H : 0 against the alternative A : < 0 at level of signicance ;
ii) Determine the minimum sample size n required to obtain power at least
0.95 against the alternative 1 = 500 when 0 = 1,000 and = 0.05.
DEFINITION 7
350
13
Testing Hypotheses
()
f x; = C e
( )
T x
()
.
hx,
(28)
z = i ,
0,
()
or V (z ) > C
()
V (z ) = C , (i = 1, 2) (C
if V z < C1
if
< C2
otherwise,
( )
Ei Z = ,
i = 1, 2 for H ,
and
( )
[ ( ) ( )]
( )
E Z = , E V Z Z = E V Z
0
for H 0 .
(Recall that z = (x1, . . . , xn), Z = (X1, . . . , Xn) and V(z) = nj= 1 T(xj).)
Furthermore, it can be shown that the function () = E(Z),
(except for degenerate cases) is decreasing for 0 and increasing for 0
for some 1 < 0 < 2 (see also Fig. 13.10).
We would expect that cases like Binomial, Poisson and Normal
would fall under Theorem 4, while they seemingly do not. However, a simple
reparametrization of the families brings them under the form (28). In fact, by
Examples and Exercises of Chapter 11 it can be seen that all these families are
of the exponential form
REMARK 4
()
1
0
1
0
2
13.4
13.3 UMPU
UMP Tests for Testing Certain Composite Hypotheses
()
f x; = C e
() ( )
Q T x
351
()
hx.
i) For the Binomial case, Q() = log[/(1 )]. Then by setting log[/(1 )]
= , the family is brought under the form (28). From this transformation,
we get = e/(1 + e ) and the hypotheses 1 2, = 0 become
equivalently, 1 2, = 0, where
i = log
i
,
1 i
i = 0, 1, 2.
ii) For the Poisson case, Q() = log and the transformation log = brings
the family under the form (28). The transformation implies = e and the
hypotheses 1 2, = 0 become, equivalently, 1 2, = 0 with
i = log i, i = 0, 1, 2.
iii) For the Normal case with known and = , Q() = (1/2) and the factor
1/2 may be absorbed into T(x).
iv) For the Normal case with known and 2 = , Q() = 1/(2) and the
transformation 1/(2) = brings the family under the form (28) again.
Since = 1/(2), the hypotheses 1 2 and = 0 become, equivalently, 1 2 and = 0, where i = 1/(2i), i = 0, 1, 2.
As an application to Theorem 4 and for later reference, we consider the
following example. The level of signicance will be .
EXAMPLE 10
Suppose X1, . . . , Xn are i.i.d. r.v.s from N(, 2). Let be known and set
= . Suppose that we are interested in testing the hypothesis H : = 0 against
the alternative A : 0. In the present case,
()
T x =
1
x,
2
so that
()
( )
V z = T x j =
j =1
1
2
j =1
n
x.
2
1,
z =
0 ,
()
n
x < C1
2
otherwise,
if
or
n
x > C2
2
( )
[ ( ) ( )]
( )
E Z = , E V Z Z = E V Z .
0
352
13
Testing Hypotheses
z = 1,
0,
()
n x 0
if
otherwise,
) < C
n x 0
or
) > C
where
C1 =
C1
n
n0
n0
C 2
, C 2 =
.
On the other hand, under H, n(X
0)/ is N(0, 1). Therefore, because of
symmetry C1 = C2 = C, say (C > 0). Also
n x 0
) < C
n x 0
or
) >C
is equivalent to
n x
0
>C
and, of course, [n(X
0)/]2 is 21, under H. By summarizing then, we have
1,
z =
0,
n x
0
if
otherwise,
()
>C
where C is determined by
P 12 > C = .
In many situations of practical importance, the underlying p.d.f. involves
a real-valued parameter in which we are exclusively interested, and in
addition some other real-valued parameters 1, . . . , k in which we have no
interest. These latter parameters are known as nuisance parameters. More
explicitly, the p.d.f. is of the following exponential form:
[ ()
+ T ( x) + + T ( x )]h( x),
(
f x; , 1 , . . . , k = C , 1 , . . . , k exp T x
1 1
(29)
13.3 UMP
13.5 Tests
Testing
forthe
Testing
Parameters
Certainof
Composite
a Normal Hypotheses
Distribution
H1 : =
H1 : =
H 2 : =
H 3 : =
H 4 : =
{ ; }
{ ; }
{ ; or }A ( A) :
{ ; }
{ }
353
0
0
, i = 1, . . . , 4.
(30)
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. given by (29). Then, under some
additional regularity conditions, there exist UMPU tests with level of signicance for testing any one of the hypotheses Hi(H1) against the alternatives
Ai(A1), i = 1, . . . , 4, respectively.
Because of the special role that normal populations play in practice, the
following two sections are devoted to presenting simple tests for the hypotheses specied in (30). Some of the tests will be arrived at again on the basis of
the principle of likelihood ratio to be discussed in Section 7. However, the
optimal character of the tests will not become apparent by that process.
Exercises
13.4.1 A coin, with probability p of falling heads, is tossed independently 100
times and 60 heads are observed. Use the UMPU test for testing the hypothesis H : p = 12 against the alternative A : p 12 at level of signicance = 0.1.
13.4.2 Let X1, X2, X3 be independent r.v.s distributed as B(1, p). Derive the
UMPU test for testing H : p = 0.25 against A : p 0.25 at level of signicance .
Determine the test for = 0.05.
354
13
Testing Hypotheses
Whenever convenient, we will also use the notation z and Z instead of (x1, . . . ,
xn) and (X1, . . . , Xn), respectively. Finally, all tests will be of level .
z =
0,
()
if
(x
n
j =1
>C
(31)
otherwise,
where C is determined by
P n21 > C 02 = ,
(32)
is UMP. The test given by (31) and (32) with reversed inequalities is UMPU
for testing H1 : 0 against A1 : < 0.
The power of the tests is easily determined by the fact that (1/2) nj= 1
)2 is 2n1 when obtains (that is, is the true s.d.). For example, for
(Xj X
n = 25, 0 = 3 and = 0.05, we have for H1, C/9 = 36.415, so that C = 327.735.
The power of the test at = 5 is equal to P(224 > 13.1094) = 0.962.
For H1, C/9 = 13.848, so that C = 124.632, and the power at = 2 is P 224
(< 31.158) = 0.8384.
PROPOSITION 3
z =
0,
if C1 < j =1 x j x
n
()
< C2
(33)
otherwise,
i = 1, 2,
(34)
is UMPU. The test given by (33) and (34), where the inequalities C1 < nj=1
(xj x)2 < C2 are replaced by
n
(
j =1
xj x
< C1
or
(x
j =1
> C2 ,
2 C1
2 C2
P 24 > P 24 >
= 0.05
4
4
24
24
9
9
13.3 UMP
13.5 Tests
Testing
forthe
Testing
Parameters
Certainof
Composite
a Normal Hypotheses
Distribution
355
z =
0,
()
(x
n
if
j =1
< C1
(x
n
or
j =1
> C2
otherwise,
c 2 02
c 1 02
()
g t dt =
1 c
n 1 c
02
02
()
tg t dt = 1 ,
n x 0
()
t z =
PROPOSITION 5
1
xj x
n 1 j =1
(35)
()
()
if t z > C
(36)
otherwise,
where C is determined by
P t n1 > C = ,
(37)
is UMPU. The test given by (36) and (37) with reversed inequalities is UMPU
for testing H1 : 0 against A1 : < 0; t(z) is given by (35). (See also Figs. 13.11
and 13.12; tn1 stands for an r.v. distributed as tn1.)
For n = 25 and = 0.05, we have P(t24 > C) = 0.05; hence C = 1.7109 for H1,
and C = 1.7109 for H1.
PROPOSITION 6
1,
z =
0,
()
()
if t z < C
otherwise,
()
or t z > C
(C > 0)
356
13
Testing Hypotheses
tn 1
tn 1
0
where C is determined by
P t n1 > C = 2,
tn 1
2
2
C
Figure 13.13 H4 : = 0, A4 : 0.
Exercises
13.5.1 The diameters of bolts produced by a certain machine are r.v.s distributed as N(, 2). In order for the bolts to be usable for the intended
purpose, the s.d. must not exceed 0.04 inch. A sample of size 16 is taken and
is found that s = 0.05 inch. Formulate the appropriate testing hypothesis
problem and carry out the test if = 0.05.
13.5.2 Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2), where both and are
unknown.
i) Derive the UMPU test for testing the hypothesis H : = 0 against the
alternative A : 0 at level of signicance ;
)2 = 24.8, and = 0.05.
ii) Carry out the test if n = 25, 0 = 2, 25
j = 1 (xj x
13.5.3 Discuss the testing hypothesis problem in Exercise 13.3.4 if both
and are unknown.
13.6
13.3 Comparing
UMP Teststhe
forParameters
Testing Certain
of Two
Composite
Normal Distributions
Hypotheses
357
100
= 1,752
and
j =1
2
j
= 31, 157.
j =1
Make the appropriate assumptions and test the hypothesis H that the manufacturers claim is correct against the appropriate alternative A at level of
signicance = 0.01.
13.5.5 The diameters of certain cylindrical items produced by a machine are
r.v.s distributed as N(, 0.01). A sample of size 16 is taken and is found that
x = 2.48 inches. If the desired value for is 2.5 inches, formulate the appropriate testing hypothesis problem and carry out the test if = 0.05.
Z = X1 , . . . , X m ,
W = Y1 , . . . , Yn
z = x1 , . . . , xm ,
w = y1 , . . . , yn
358
13
Testing Hypotheses
Fn 1, m 1
Fn 1, m 1
C0
0 C0
1,
z, w =
0,
j =1 ( y j y )
2
m
i =1 ( xi x )
if
>C
(38)
otherwise,
where C is determined by
P Fn1,m1 > C0 = ,
C0 =
(m 1)C ,
(n 1)
(39)
is UMPU. The test given by (38) and (39) with reversed inequalities is UMPU
for testing H : 0 against A1 : < 0. (See also Figs. 13.14 and 13.15; Fn1,m1
stands for an r.v. distributed as Fn1,m1.)
The power of the test is easily determined by the fact that
1 n
Yj Y
22 j = 1
1 m
Xj X
12 i = 1
n1
=
m1
1 m1
n1
(Yj Y )
j =1
(Xi X )
i =1
is Fn1,m1 distributed when obtains. Thus the power of the test depends only
on . For m = 25, n = 21, 0 = 2 and = 0.05, one has P(F20,24 > 5C/12) = 0.05,
hence 5C/12 = 2.0267 and C = 4.8640 for H1; for H1,
5C
12
P F20 ,24 <
= P F24 ,20 >
= 0.05
12
5C
V z, w =
1
0
m
( yj y)
j =1
( xi x )
i =1
1
+
0
( yj y)
j =1
.
2
(40)
13.6
13.3 Comparing
UMP Teststhe
forParameters
Testing Certain
of Two
Composite
Normal Distributions
Hypotheses
PROPOSITION 8
359
if V z, w < C1
or V z, w > C 2
otherwise,
P C1 < B1
< C 2 = P C1 < B1
< C2 = 1 ,
1
1
n1) , ( m1)
n+1) , ( m1)
(
(
2
2
2
2
is UMPU; V(z, w) is dened by (40). (Br ,r stands for an r.v. distributed as Beta
with r1, r2 degrees of freedom.) For the actual determination of C1, C2, we use
the incomplete Beta tables. (See, for example, New Tables of the Incomplete
Gamma Function Ratio and of Percentage Points of the Chi-Square and Beta
Distributions by H. Leon Harter, Aerospace Research Laboratories, Ofce of
Aerospace Research; also, Tables of the Incomplete Beta-Function by Karl
Pearson, Cambridge University Press.)
1
t z, w =
yx
(
m
i =1
xi x
+ j =1 y j y
n
(41)
1,
z, w =
0,
if t z, w > C
(42)
otherwise,
where C is determined by
P t m+ n 2 > C0 = , C0 = C
m+n2
,
1 m + 1 n
) ( )
(43)
is UMPU. The test given by (42) and (43) with reversed inequalities is UMPU
for testing H1 : 0 against A1 : < 0; t(z, w) is given by (41). The determination
of the power of the test involves a non-central t-distribution, as was also the
case in Propositions 5 and 6.
For example, for m = 15, n = 10 and = 0.05, one has for H1 :
P(t23 > C 23 6 ) = 0.05; hence C 23 6 = 1.7139 and C = 0.1459. For H1,
C = 0.1459.
PROPOSITION 10
360
13
Testing Hypotheses
1,
z, w =
0,
if t z, w < C
or t z, w > C
otherwise,
where C is determined by
P t m+ n 2 > C0 = 2 ,
C0 as above, is UMPU.
Again with m = 15, n = 10 and = 0.05, one has P(t23 > C 23 6 ) = 0.025
and hence C 23 6 = 2.0687 and C = 0.1762.
Once again the determination of the power of the test involves the noncentral t-distribution.
In Propositions 9 and 10, if the variances are not equal, the tests
presented above are not UMPU. The problem of comparing the means of two
normal densities when the variances are unequal is known as the Behrens
Fisher problem. For this case, various tests have been proposed but we will not
discuss them here.
REMARK 6
Exercises
13.6.1 Let Xi, i = 1, . . . , 9 and Yj, j = 1, . . . , 10 be independent random
samples from the distributions N(1, 21) and N(2, 22), respectively. Suppose
that the observed values of the sample s.d.s are sX = 2, sY = 3. At level of
signicance = 0.05, test the hypothesis: H : 1 = 2 against the alternative A : 1
2 and nd the power of the test at 1 = 2, 2 = 3. (Compute the value of the
test statistic, and set up the formulas for determining the cut-off points and the
power of the test.)
13.6.2 Let Xj, j = 1, . . . , 4 and Yj, j = 1, . . . , 4 be two independent random
samples from the distributions N(1, 21) and N(2, 22), respectively. Suppose
that the observed values of the Xs and Ys are as follows:
x1 = 10.1,
x 2 = 8.4,
x3 = 14.3,
x 4 = 11.7,
y1 = 9.0,
y2 = 8.2,
y3 = 12.1,
y4 = 10.3.
x 2 = 0.125,
x3 = 0.121,
x 4 = 0.117,
x5 = 0.120
y1 = 0.114,
y2 = 0.115,
y3 = 0.119,
y4 = 0.120,
y5 = 0.110.
361
of the test statistic, and set up the formulas for determining the cut-off points
and the power of the test.)
13.6.4 Refer to Exercise 13.6.2 and suppose it is known that 1 = 4 and 2 =
3. Test the hypothesis H that the two means do not differ by more than 1 at
level of signicance = 0.05.
13.6.5 The breaking powers of certain steel bars produced by processes A
and B are r.v.s distributed as normal with possibly different means but the
same variance. A random sample of size 25 is taken from bars produced by
each one of the processes, and it is found that x = 60, sX = 6, y = 65,
sY = 7. Test whether there is a difference between the two processes at the level
of signicance = 0.05.
13.6.6 Refer to Exercise 13.6.3, make the appropriate assumptions, and
test the hypothesis H : 1 = 2 against the alternative A : 1 2 at level of
signicance = 0.05.
13.6.7 Let Xi, i = 1, . . . , n and Yi, i = 1, . . . , n be independent random
samples from the distributions N(1, 21) and N(2, 22), respectively, and suppose that all four parameters are unknown. By setting Zi = Xi Yi, we have
that the paired r.v.s Zi, i = 1, . . . , n, are independent and distributed as N(,
2) with = 1 2 and 2 = 21 + 22. Then one may use Propositions 5 and 6 to
test hypotheses about .
Test the hypotheses H1 : 0 against A1 : > 0 and H2 : = 0 against
A2 : 0 at level of signicance = 0.05 for the data given in (i) Exercise 13.6.2;
(ii) Exercise 13.6.3.
), where, of course, L(
) = max[L(
); ].
)/L(
used rather than L(
(When we talk about a statistic, it will be understood that the observed values
have been replaced by the corresponding r.v.s although the same notation will
)/L( ) is too small,
be employed.) In terms of this statistic, one rejects H if L(
that is, C, where C is specied by the desired size of the test. For obvious
reasons, the test is called a likelihood ratio (LR) test. Of course, the test
362
13
Testing Hypotheses
()
P 2 log x G x ,
x 0 for all ,
363
(Testing the mean of a normal distribution) Let X1, . . . , Xn be i.i.d. r.v.s from
N(, 2), and consider the following testing hypotheses problems.
i) Let be known and suppose we are interested in testing the hypothesis
H : = { 0}.
Since the MLE of is = x (see Example 12, Chapter 12), we have
( )
=
L
1
exp
2
2
(x
j =1
2
x
and
( )
L =
1
2
1
exp
2
2
(x
j =1
2
0 .
n
x 0
2
1,
z =
0,
n x
0
if
otherwise,
()
>C
where C is determined by
P 12 > C = .
(Recall that z = (x1, . . . , xn).)
Notice that this is consistent with Theorem 6. It should also be pointed
out that this test is the same as the test found in Example 10 and therefore
the present test is also UMPU.
ii) Consider the same problem as in (i) but suppose now that is also unknown. We are still interested in testing the hypothesis H : = 0 which now
is composite, since is unspecied.
364
13
Testing Hypotheses
1 n
x j 0
n j =1
and 2 =
( )
=
L
1
exp
2
2
2
x =
(x
j =1
e n
and
( )
L =
1
2
1
exp
2
2
2
0 =
(x
j =1
e n 2 .
Then
2
= 2
(x x )
=
(x )
n 2
or
j=1
2 n
j=1
But
n
(x
j =1
) = (x
2
j =1
+ n x 0
and therefore
2 n
n x 0
1
= 1+
n1 1 n
xj x
n 1 j =1
1
1
t2
= 1 +
,
n 1
where
()
t =t z =
n x 0
1 n
xj x
n 1 j =1
.
2
Then < 0 is equivalent to t2 > C for a certain constant C. That is, the LR
test is equivalent to the test
1,
z =
0,
()
if t < C
otherwise,
or t > C
365
where C is determined by
P t n1 > C = 2.
Notice that, by Proposition 6, the test just derived is UMPU.
EXAMPLE 12
( )
2
m +n
1
1
exp
2
m n
1 2
2 1
(x )
1
1
2 22
i =1
(y
j =1
2
2 .
1, = x , 2 , = y , 2 =
1 m
xi x
m + n i =1
) + (y
2
j =1
2
y ,
( )
=
L
1
2
m +n
m +n
Under = {
= (1, 2, ); 1 = 2 , > 0}, we have
n
mx + ny
1 m
,
+
x
yj =
m + n i =1
m+ n
j =1
1 m +n
1 m
k =
+
x
y j =
m + n k =1
m + n i =1
j =1
and
2 =
1 m +n
vk v
m + n k =1
1 m
xi
m + n i =1
j =1
Therefore
( )
L =
It follows that
1
2
m +n
m +n
) + (y
2
2
.
366
13
Testing Hypotheses
m +n
and
2 m +n
2
.
2
Next
( xi ) = [( xi x ) + ( x )] = ( xi x )
m
i =1
i =1
m
i =1
mn 2
) = (y
= xi x
i =1
+ m x
(x y) ,
2
(m + n)
(y
j =1
j =1
m2 n
(x y) ,
2
(m + n)
so that
(m + n) = (m + n)
2
mn
xy
m+ n
) = (x x) + (y
2
i =1
j =1
2
mn
xy .
m+ n
2 m+ n
t2
= 1 +
,
m + n 2
where
t=
mn
xy
m+ n
m
1
xi x
m + n 2 i = 1
) + (y
2
j =1
2
y .
if t < C
or t > C
(C > 0)
otherwise,
where C is determined by
P t m+ n 2 > C = 2,
and z = (x1, . . . , xm), w = (y1, . . . , yn), because, under H, t is distributed as
tm+n2. We notice that the test above is the same as the UMPU test found in
Proposition 10.
367
ii) Now we are interested in testing the hypothesis H : 1 = 2 (= un = (1, 2, 1, 2); 1, 2 , 1, 2 > 0}, we have
specied). Under = {
1 m
1, = x , 2 , = y , 12, = xi x
m i =1
and
v 2
1 n
22, = y j y ,
n j =1
1, = 1, , 2 , = 2 ,
and
1 m
xi x
m + n i =1
2 =
) + (y
2
j =1
2
y .
Therefore
( )
=
L
( 2 ) ( ) ( )
m +n
m 2
2
1 ,
2
2 ,
n 2
m +n
and
( )
L =
( 2 ) ( )
m +n
1
( m +n ) 2
e
,
( m +n ) 2
so that
( ) ( )
=
( )( )
2
1 ,
m 2
2
2 ,
m +n
n 2
(m + n)(
m +n
) 2 m
i =1 xi x
m
mm 2 nn 2 i = 1 xi x
(
m + n)
(
=
m +n
mm 2 nn
) (y
2
j =1
2
y
) (
2
n
+ j = 1 y j y
j =1
2
m
m 1 i = 1 xi x m 1
2
n
n1
j = 1 y j y n 1
(
(
)
)
2
m
1 + m 1 i = 1 xi x m 1
2
n
n1
j = 1 y j y n 1
(
(
)
)
m 2
2
yj y
m 2
( m +n )
( m +n )
368
13
Testing Hypotheses
(m + n)
( n+ m)
m1
n1
m m 2 nn 2
m1
f
1 +
n1
m 2
( m+ n)
m1
n1
m1
f
1 +
n1
m 2
( m+ n)
<C
Setting g( f) for the left hand side of this last inequality, we have that g(0) = 0
and g(f) 0 as f . Furthermore, it can be seen (see Exercise 13.7.4) that
g(f) has a maximum at the point
fmax =
( );
n( m 1)
m n1
()
f < C1
or
f > C2
P Fm1,n1 < C1 or
Fm1,n1 > C 2 =
( ) ( )
and g C1 = g C 2 .
0 C1
2
C2
Figure 13.16
369
Exercises
13.7.1 Refer to Exercise 13.4.2 and use the LR test to test the hypothesis
H : p = 0.25 against the alternative A : p 0.25. Specically, set (t) for the
likelihood function, where t = x1 + x2 + x3, and:
i) Calculate the values (t) for t = 0, 1, 2, 3 as well as the corresponding
probabilities under H;
ii) Set up the LR test, in terms of both (t) and t;
iii) Specify the (randomized) test of level of signicance = 0.05;
iv) Compare the test in part (iii) with the UMPU test constructed in Exercise
13.4.2.
13.7.2 A coin, with probability p of falling heads, is tossed 100 times and 60
heads are observed. At level of signicance = 0.1:
i) Test the hypothesis H : p = 12 against the alternative A : p 12 by using the
LR test and employ the appropriate approximation to determine the cutoff point;
ii) Compare the cut-off point in part (i) with that found in Exercise 13.4.1.
13.7.3 If X1, . . . , Xn are i.i.d. r.v.s from N(, 2), derive the LR test and the
test based on 2 log for testing the hypothesis H : = 0 rst in the case that
is known and secondly in the case that is unknown. In the rst case,
compare the test based on 2 log with that derived in Example 11.
13.7.4
()
gt =
m1
t
n1
m1
t
1 +
n1
m 2
( m+ n)
[ ()
max g t ; t 0 =
mm 2
[1 + (m n)](
m+ n 2
(
(
)
)
m n1
0,
n m1
(
(
)
)
m n 1
and decreasing in
, .
n m 1
370
13
Testing Hypotheses
P X1 = x1 , . . . , X k = x k =
n!
p1x pkx ,
x1! x k !
1
= = p1 , . . . , pk ; p j > 0, j = 1, . . . , k,
p
j =1
= 1.
We may suspect that the ps have certain specied values; for example, in
the case of a die, the die may be balanced. We then formulate this as a
hypothesis and proceed to test it on the basis of the data. More generally, we
may want to test the hypothesis that lies in a subset of .
0} = {(p10, . . . , pk0)}. Then, under ,
Consider the case that H : = {
L =
( )
n!
p10x pkx0 ,
x1! x k !
( )
n!
p 1x p kx ,
x1! x k !
while, under ,
=
L
where pj = xj/n are the MLEs of pj, j = 1, . . . , k (see Example 11, Chapter 12).
Therefore
p
= n j 0
j =1 x j
k
xj
and H is rejected if 2 log > C. The constant C is determined by the fact that
2 log is asymptotically 2k1 distributed under H, as it can be shown on the
basis of Theorem 6, and the desired level of signicance .
Now consider r events Ai, i = 1, . . . , r which form a partition of the sample
space S and let {Bj, j = 1, . . . , s} be another partition of S. Let pij = P(Ai Bj)
and let
s
j =1
i =1
pi . = pij , p. j = pij .
13.8 Applications
13.3of UMP
LR Tests:
TestsContingency
for Testing Certain
Tables,Composite
Goodness-of-Fit
Hypotheses
Tests
371
p = p
i.
.j
i =1
j =1
= pij = 1.
i =1 j =1
Furthermore, the events {A1, . . . , Ar} and {B1, . . . , Bs} are independent if and
only if pij = pi. p.j, i = 1, . . . , r, j = 1, . . . , s.
A situation where this set-up is appropriate is the following: Certain
experimental units are classied according to two characteristics denoted
by A and B and let A1, . . . , Ar be the r levels of A and B1, . . . , Br be the J
levels of B. For instance, A may stand for gender and A1, A2 for male and
female, and B may denote educational status comprising the levels B1 (elementary school graduate), B2 (high school graduate), B3 (college graduate),
B4 (beyond).
We may think of the rs events Ai Bj being arranged in an r s rectangular
array which is known as a contingency table; the event Ai Bj is called the
(i, j)th cell.
Again consider n experimental units classied according to the characteristics A and B and let Xij be the number of those falling into the (i, j)th cell. We
set
s
Xi . = X ij
and X . j = X ij .
j =1
i =1
i.
i =1
= X . j = n.
j =1
ij
= pi ,
j =1
ij
= qj ,
i =1
j =1
i =1
xi . = xij , x. j = xij ,
372
13
Testing Hypotheses
the likelihood function takes the following forms under and , respectively.
Writing i,j instead of ir= 1 js= 1, we have
( ) n!x ! p
xij
ij
L =
( ) n!x ! ( p q )
L =
n!
x
x
, pi q j =
x
!
i , j ij i j
ij
i ,j
ij
i ,j
xij
i ,j
ij
i ,j
ij
n!
x
pi
x
!
i , j ij i
i.
x
qj
j
.j
since
, pi
xij
i j
q j = pi q j = pix q!x q sx
xij
xij
xij
i1
is
= pix q1x q sx = pix
i
i
i
i
i1
is
x
qj
j
xij
n
x. j
xi .
, q j , =
,
n
n
, p i , =
( )
=
L
xij
x
n! xi .
i , j xij ! i n
xij
n!
, L =
i , j xij ! i , j n
i.
( )
x x
. j
j n
.j
and hence
x xi . x x. j
i . . j
i n j n
x
nij
i ,j
xij
x j
xi
xi . x j
j
i
nn xij ij
x
i ,j
=
2
j =1
(X
np j
np j
373
H : = 0 = p10 , . . . , pk 0 ,
{ } (
k
=
2
(x
npj 0
npj 0
j =1
and reject H if 2 is too large, in the sense of being greater than a certain
constant C which is specied by the desired level of the test. It can further be
shown that, under , 2 is asymptotically distributed as 2k1. In fact, the present
test is asymptotically equivalent to the test based on 2 log .
For the case of contingency tables and the problem of testing independence there, we have
=
2
(x
ij
npi q j
npi q j
i ,j
=
2
(x
ij
np i , p j ,
np i , q j ,
i ,j
Exercises
13.8.1 Show that p ij , =
this section.
xij
n
, p i , =
xi .
n
, q j , =
x j
n
374
13
Testing Hypotheses
13.8.3 A die is cast 600 times and the numbers 1 through 6 appear with the
frequencies recorded below.
100
94
103
89
110
104
12
10
13.8.6 It is often assumed that I.Q. scores of human beings are normally
distributed. Test this claim for the data given below by choosing appropriately
the Normal distribution and taking = 0.05.
x 90 90 < x 100 100 < x 110 110 < x 120 120 < x 130 x > 130
10
18
23
22
18
(Hint: Estimate and 2 from the grouped data; take the midpoints for the
nite intervals and the points 65 and 160 for the leftmost and rightmost
intervals, respectively.)
13.8.7 Consider a group of 100 people living and working under very similar
conditions. Half of them are given a preventive shot against a certain disease
and the other half serve as control. Of those who received the treatment, 40
did not contract the disease whereas the remaining 10 did so. Of those not
treated, 30 did contract the disease and the remaining 20 did not. Test effectiveness of the vaccine at the level of signicance = 0.05.
13.3
13.9 UMP
Decision-Theoretic
Tests for Testing
Viewpoint
Certain Composite
of Testing Hypotheses
375
WARD
Totals
Favor
Proposal
37
29
32
21
119
Do not favor
proposal
63
71
68
79
281
100
100
100
100
400
Totals
if z B
otherwise.
376
13
Testing Hypotheses
0,
L ; = L1 ,
L2 ,
if and = 0, or c and = 1.
if and = 1
if c
and = 0,
) ( ) (
) (
) (
R ; = L ; 1 P Z B + L ; 0 P Bc ,
or
L1P Z B ,
if
R ; =
(44)
c
if c .
L2 P Z B ,
1} and P (Z B) = , P (Z B) = , we have
0}, c = {
In particular, if = {
L1 ,
if = 0
(45)
R ; =
if = 1 .
L2 1 ,
As in the point estimation case, we would like to determine a decision
function for which the corresponding risk would be uniformly (in ) smaller
than the risk corresponding to any other decision function *. Since this is not
feasible, except for trivial cases, we are led to minimax decision and Bayes
decision functions corresponding to a given prior p.d.f. on . Thus in the case
0} and c = {
1}, is minimax if
that = {
[(
) (
)]
[(
) (
)]
max R 0 ; , R 1 ; max R 0 ; * , R 1 ; *
L1P0 Z B = L2 P1 Z Bc
) (equivalently, R( ; ) = R( ; )).
(46)
377
13.3
13.9 UMP
Decision-Theoretic
Tests for Testing
Viewpoint
Certain Composite
of Testing Hypotheses
1,
z =
0,
()
z B
if
(47)
otherwise,
is minimax.
For simplicity, set P0 and P1 for P and P , respectively, and similarly
0; ) and R(
1; ). Also set P0(Z B) = and
R(0; ), R(1; ) for R(
P1(Z Bc) = 1 . The relation (45) implies that
PROOF
( )
R 0; = L1
and R 1; = L2 1 .
R 0; * = L1P0 Z A
and R 1; * = L2 P1 Z Ac .
Consider R(0; ) and R(0; *) and suppose that R(0; *) R(0; ). This is
equivalent to L1P0(Z A) L1P0(Z B), or
P0 Z A .
Then Theorem 1 implies that P1(Z A) P1(Z B) because the test dened
by (47) is MP in the class of all tests of level . Hence
) (
P1 Z Ac P1 Z Bc , or L2 P1 Z Ac L2 P1 Z Bc ,
( )
R 0; * R 0; , then R 1; R 1; * .
(48)
[(
) (
)]
( )
[ ( ) ( )]
max R 0; * , R 1; * = R 1; * R 1; = max R 0; , R 1; ,
(49)
whereas if R(0; ) < R(0; *), then
[(
) (
)] (
[ ( ) ( )]
378
13
Testing Hypotheses
THEOREM 8
1,
z =
0,
0
if z B
otherwise,
()
PROOF
()
R = L1P0 Z B p0 + L2 P1 Z Bc p1
0
= p0 L1P0 Z B + p1L2 1 P1 Z B
)]
)]
(51)
)]
p1L2 + p0 L1 f z; 0 p1L2 f z; 1 dz
B
p1L2 + p0 L1 f z; 0 p1L2 f z; 1
zB
)]
for the discrete case. In either case, it follows that the which minimizes R ()
is given by
0
1,
z =
0,
0
()
if
p0 L1 f z; 0 p1L2 f z; 1 < 0
otherwise;
equivalently,
1,
z =
0,
0
z B
otherwise,
()
if
where
pL
B = z n ; f z; 1 > 0 1 f z; 0 ,
p1L2
as was to be seen.
It follows that the Bayesian decision function is an LR test and
is, in fact, the MP test for testing H against A at the level P0(Z B), as follows
by Theorem 1.
REMARK 8
13.3
13.9 UMP
Decision-Theoretic
Tests for Testing
Viewpoint
Certain Composite
of Testing Hypotheses
379
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 1). We are interested in determining
the minimax decision function for testing the hypothesis H : = 0 against the
alternative A : = 1. We have
( ) = exp[n( )x ] ,
1
f ( z; )
exp n( )
f z; 1
2
1
2
0
[(
)]
x > C0 ,
where
C0 =
1
log C
1 + 0 +
2
n 1 0
(for > ).
L1P X > C0 = L2 P X C0 .
0
L1 P 0 X > C0 = L2 P 1 X < C0
becomes
or
P
) (
)]
n X 1 < 5 C0 1 = 2 P
n X 0 > 5C0 ,
or
) [
( )]
( ) (
1,
z =
0,
()
if x > 0.53
otherwise.
) [ (
380
13
Testing Hypotheses
) [ (
)]
) (
Thus
[(
)]
) (
max R 0 ; , R 1 ; = 0.0235,
corresponding to the minimax given above.
EXAMPLE 14
Refer to Example 13 and determine the Bayes decision function corresponding to 0 = {p0, p1 }.
From the discussion in the previous example it follows that the Bayes
decision function is given by
1,
z =
0,
0
()
x > C0
if
otherwise,
where
C0 =
1
log C
1 + 0 +
2
n 1 0
and C
p0 L1
.
p1L2
if x > 0.55
otherwise.
()
EXAMPLE 15
Let X1, . . . , Xn be i.i.d. r.v.s from B(1, ). We are interested in determining the
minimax decision function for testing H : = 0 against A : = 1.
We have
( ) =
f ( z; )
f z; 1
0
1 1
0 1 0
n x
, where x = x j ,
j =1
(1 ) > C ,
(1 )
0
13.3
13.9 UMP
Decision-Theoretic
Tests for Testing
Viewpoint
Certain Composite
of Testing Hypotheses
381
where
C 0 = log C n log
1 1
.
1 0
(1 ) = 3
(1 )
0
( > 1)
1 1
C0 = log C n log
1 0
log
( ).
(1 )
1 1 0
0
Next, X = Xj is B(n, ) and for C0 = 13, we have P0.5(X > 13) = 0.0577 and
P0.75(X > 13) = 0.7858, so that P0.75(X 13) = 0.2142. With the chosen values of
L1 and L2, it follows then that relation (46) is satised. Therefore the minimax
decision function is determined by
n
j=1
1,
(z ) =
0,
if x > 13
otherwise.
382
14
Sequential Procedures
Chapter 14
Sequential Procedures
Next, suppose we observe the r.v.s Z1, Z2, . . . one after another, a single
one at a time (sequentially), and we stop observing them after a specied event
occurs. In connection with such a sampling scheme, we have the following
denition.
DEFINITION 2
382
14.1
383
()
(1)
()
()
()
S N s = Z1 s + + Z N ( s ) s ,
s S
(2)
are of interest and one of the problems associated with them is that of nding
the expectation ESN of the r.v. SN. Under suitable regularity conditions, this
expectation is provided by a formula due to Wald.
THEOREM 1
(3)
i) For j 2,
( >)m = E Y
j 1
[(
)]
)(
= E E Yj N = E Yj N = n P N = n
n =1
)(
)(
= E Yj N = n P N = n + E Y j N = n P N = n .
n =1
n= j
(4)
)(
m = m P N = n + E Y j N = n P N = n
n =1
n =j
384
14
Sequential Procedures
or
)(
mP N j = E Yj N = n P N = n .
n =j
(5)
)(
mP N 1 = m = E Y1 = E Y1 N = n P N = n .
n =1
Therefore
E( Y j N = n)P( N = n) = mP( N j ),
j 1,
n= j
and hence
E( Y
j =1n = j
)(
N = n P N = n = m P N j = m jP N = j = mEN ,
j =1
j =1
(6)
jn
n =1 j =1
and
) (
jn
j =1 n = j
are equal. That this is, indeed, the case follows from part (i) and calculus
results (see, for example, T.M. Apostol, Theorem 1242, page 373, in
Mathematical Analysis, Addison-Wesley, 1957).
PROOF OF THEOREM 1
end, we have
[(
)]
)(
)( )
ETN = E E TN N = E TN N = n P N = n
n=1
= E Y j N = n P N = n
n=1
j=1
n
n
E Y j N = n P N = n = E Y j N = n P N = n
j=1
n=1
n=1 j=1
)(
= E Yj N = n P N = n
j=1 n= j
)(
14.1
ii) Here
[(
385
)]
)(
ETN = E E TN N = E TN N = n P N = n
n =1
n
n
= E Yj N = n P N = n = E Yj N = n P N = n
n =1 j =1
n =1 j =1
)(
)(
= E Yj N = n P N = n .
j =1 n = j
(7)
E(Y
j =1 n = j
)(
)(
N = n P N = n E Yj N = n P N = n <
j =1 n = j
[ ( )]
)(
0 = EYj = E E Yj N = E Yj N = n P N = n ,
n =1
(8)
)(
)(
)(
0 = E Yj N = n P N = n + E Yj N = n P N = n
n =1
n= j
= E Yj N = n P N = n .
n= j
(9)
This is so because the event (N = n) depends only on Y1, . . . , Yn, so that, for
j > n, E(Yj|N = n) = EYj = 0. Therefore (9) yields
E(Y j N = n)P( N = n) = 0,
j 2.
(10)
j 1.
(11)
n= j
E(Y j N = n)P( N = n) = 0,
n= j
E(Y
j =1n = j
)(
N = n P N = n = 0.
(12)
386
14
Sequential Procedures
THEOREM 2
Let Z1, Z2, . . . be i.i.d. r.v.s such that P(Zj = 0) 1. Set Sn = Z1 + + Zn and
for two constants C1, C2 with C1 < C2, dene the r. quantity N as the smallest n
for which Sn C1 or Sn C2; set N = if C1 < Sn < C2 for all n. Then there exist
c > 0 and 0 < r < 1 such that
P N n cr n
for all n.
(13)
The assumption P(Zj = 0) 1 implies that P(Zj > 0) > 0, or P(Zj < 0)
> 0. Let us suppose rst that P(Zj > 0) > 0. Then there exists > 0 such that
P(Zj > ) = > 0. In fact, if P(Zj > ) = 0 for every > 0, then, in particular,
P(Zj > 1/n) = 0 for all n. But (Zj > 1/n) (Zj > 0) and hence 0 = P(Zj > 1/n)
P(Zj > 0) > 0, a contradiction.
Thus for the case that P(Zj > 0) > 0, we have that
PROOF
P Z j > = > 0.
such that
(14)
With C1, C2 as in the theorem and as in (14), there exists a positive integer m
such that
m > C 2 C1.
(15)
P Z j > C 2 C1 > m
j = k +1
for k 0.
(16)
We have
k +m
I (Z
j =k +1
k +m
k +m
(17)
the rst inclusion being obvious because there are m Zs, each one of which is
greater than , and the second inclusion being true because of (15). Thus
k +m
k +m
k +m
P Z j > C 2 C1 P I Z j > = P Z j > = m ,
j = k +1
j = k +1
j = k +1
the inequality following from (17) and the equalities being true because of the
independence of the Zs and (14). Clearly
k 1
Skm = Z jm +1 + + Z( j + 1)m .
j =0
implies
Z jm +1 + + Z( j +1)m C 2 C1 ,
j = 0, 1, . . . , k 1.
(18)
14.1
(N km + 1) (C
< S j < C 2 , j = 1, . . . , km
k 1
387
I Z jm +1 + + Z( j +1)m C 2 C1 ,
j= 0
the rst inclusion being obvious from the denition of N and the second one
following from (18). Therefore
[
= P[ Z
]
C C ]
k 1
P N km + 1 P I Z jm +1 + + Z( j +1)m C 2 C1
j = 0
k 1
j =0
k 1
+ + Z( j +1)m
jm +1
) (
1m = 1m
j =0
),
k
the last inequality holding true because of (16) and the equality before it by the
independence of the Zs. Thus
) (
P N km + 1 1 m .
(19)
m 1/m
) (
)
) = c (1 )
P N n P N km + 1 1 m
=
(1 )
m
= cr
(k + 1 )m
(1
k+ 1
1 m
(k + 1 )m
cr n ;
these inequalities and equalities are true because of the choice of k, relation
(19) and the denition of c and r. Thus for the case that P(Zj > 0) > 0, relation
(13) is established. The case P(Zj < 0) > 0 is treated entirely symmetrically, and
also leads to (13). (See also Exercise 14.1.2.) The proof of the theorem is then
completed.
The theorem just proved has the following important corollary.
COROLLARY
Under the assumptions of Theorem 2, we have (i) P(N < ) = 1 and (ii)
EN < .
PROOF
( )
( )
P A = P lim An = lim P An
n
388
14
Sequential Procedures
n=1
n=1
EN = nP N = n = P N n cr n = c r n
n=1
n=1
=c
r
< ,
1r
as was to be seen.
The r.v. N is positive integer-valued and it might also take on the
value but with probability 0 by the rst part of the corollary. On the other
hand, from the denition of N it follows that for each n, the event (N = n)
depends only on the r.v.s Z1, . . . , Zn. Accordingly, N is a stopping time by
Denition 1 and Remark 1.
REMARK 2
Exercises
14.1.1
14.1.2
In Theorem 2, assume that P(Zj < 0) > 0 and arrive at relation (13).
n = n X1 , . . . , X n ; 0, 1 =
( ) ( ).
f (X ) f (X )
f1 X1 f1 X n
0
14.1
Some Basic
14.2 Theorems
SequentialofProbability
SequentialRatio
Sampling
Test
389
We shall use the same notation n for n (x1, . . . , xn; 0, 1), where x1, . . . , xn are
the observed values of X1, . . . , Xn.
For testing H against A, consider the following sequential procedure: As
long as a < n < b, take another observation, and as soon as n a, stop
sampling and accept H and as soon as n b, stop sampling and reject H.
By letting N stand for the smallest n for which n a or n b, we have that
N takes on the values 1, 2, . . . and possibly , and, clearly, for each n, the event
(N = n) depends only on X1, . . . , Xn. Under suitable additional assumptions,
we shall show that the value is taken on only with probability 0, so that N will
be a stopping time.
Then the sequential procedure just described is called a sequential probability ratio test (SPRT) for obvious reasons.
In what follows, we restrict ourselves to the common set of positivity of
f0 and f1, and for j = 1, . . . , n, set
Z j = Z j X j ; 0, 1 = log
( ),
f (X )
f1 X j
0
so that log n = Z j .
j =1
Clearly, the Zjs are i.i.d. since the Xs are so, and if Sn = nj= 1Zj, then N is
redened as the smallest n for which Sn log a or Sn log b.
At this point, we also make the assumption that Pi[f0(X1) f1(X1)] > 0 for
i = 0, 1; equivalently, if C is the set over which f0 and f1 differ, then it is assumed
that C f0(x)dx > 0 and C f1(x)dx > 0 for the continuous case. This assumption is
equivalent to Pi(Z1 0) > 0 under which the corollary to Theorem 2 applies.
Summarizing, we have the following result.
PROPOSITION 1
Let X1, X2, . . . be i.i.d. r.v.s with p.d.f. either f0 or else f1, and suppose that
n =
( ) ( ),
f (X ) f (X )
f1 X1 f1 X n
0
Z j = log
( ),
f (X )
f1 X j
0
j = 1, . . . , n
and
n
Sn = Z j = log n .
j =1
For two numbers a and b with 0 < a < b, dene the random quantity N as the
smallest n for which n a or n b; equivalently, the smallest n for which
Sn log a or Sn log b for all n. Then
i = 0, 1.
390
14
Sequential Procedures
[(
) (
= P0 1 b + a < 1 < b, 2 b +
(
= P ( b) + P ( a < < b,
+ P ( a < < b, . . . , a <
and
n1
b +
< b, n b +
)
a) +
(20)
[(
) (
= P1 1 a + a < 1 < b, 2
(
= P ( a ) + P ( a < < b,
+ P ( a < < b, . . . , a <
n1
a +
< b, n a + .
(21)
Relations (20) and (21) allow us to determine theoretically the cut-off points
a and b when and are given.
In order to nd workable values of a and b, we proceed as follows. For
each n, set
fin = f x1 , . . . , x n ; i , i = 0, 1
( )
( )
f11 x1
f
b
T1 = x1 ; 11 a , T1= x1 ;
f01
f01 x1
(22)
and for n 2,
f
T n = x1 , . . . , x n n ; a < 1 j < b, j = 1, . . . , n 1 and
f0 j
f
T n = x1 , . . . , x n n ; a < 1 j < b, j = 1, . . . , n 1 and
f0 j
f1n
a , (23)
f0 n
f1n
b. (24)
f0 n
14.1
Some Basic
14.2 Theorems
SequentialofProbability
SequentialRatio
Sampling
Test
391
In other words, T n is the set of points in n for which the SPRT terminates
with n observations and accepts H, while T n is the set of points in n for which
the SPRT terminates with n observations and rejects H.
In the remainder of this section, the arguments will be carried out for the
case that the Xjs are continuous, the discrete case being treated in the same
way by replacing integrals by summation signs. Also, for simplicity, the differentials in the integrals will not be indicated.
From (20), (22) and (23), one has
= f0n .
n =1
T n
= f0 n
n=1
T n
1
f1n.
b n=1 T n
(25)
Pi N = n = fin + fin , i = 0, 1,
T n
T n
and by Proposition 1,
1 = Pi N = n = fin + fin , i = 0, 1.
n=1
n=1
T n
n=1
T n
(26)
T n
n=1
T n
T f1n = .
n=1
b,
(27)
and in a very similar way (see also Exercise 14.2.1), we also obtain
1 1
a.
(28)
(29)
, b .
1
Relation (29) provides us with a lower bound and an upper bound for the
actual cut-off points a and b, respectively.
Now set
, and b =
1
(30)
and suppose that the SPRT is carried out by employing the cut-off points a
and b given by (30) rather than the original ones a and b. Furthermore, let
392
14
Sequential Procedures
and 1 be the two types of errors associated with a and b. Then replacing
, , a and b by , , a and b, respectively, in (29) and also taking into
consideration (30), we obtain
1
1
a =
1
1
= b
and
and hence
1
1
1
1
1
1
and
(31)
That is,
and 1
1
.
1
(32)
(1 )(1 ) (1 )(1 )
or
(1 ) + (1 ) +
and ,
and ,
+ 1 + 1 .
(33)
Summarizing the main points of our derivations, we have the following result.
PROPOSITION 2
For testing H against A by means of the SPRT with prescribed error probabilities and 1 such that < < 1, the cut-off points a and b are determined
by (20) and (21). Relation (30) provides approximate cut-off points a and b
with corresponding error probabilities and 1 , say. Then relation (32)
provides upper bounds for and 1 and inequality (33) shows that their
sum + (1 ) is always bounded above by + (1 ).
From (33) it follows that > and 1 > 1 cannot happen
simultaneously. Furthermore, the typical values of and 1 are such as 0.01,
0.05 and 0.1, and then it follows from (32) that and 1 lie close to and
1 , respectively. For example, for = 0.01 and 1 = 0.05, we have <
0.0106 and 1 < 0.0506. So there is no serious problem as far as and 1
are concerned. The only problem which may arise is that, because a and b
are used instead of a and b, the resulting and 1 are too small compared
to and 1 , respectively. As a consequence, we would be led to taking a
much larger number of observations than would actually be needed to obtain
and . It can be argued that this does not happen.
REMARK 3
Exercise
14.2.1 Derive inequality (28) by using arguments similar to the ones employed in establishing relation (27).
14.1
14.3 Some
Optimality
BasicofTheorems
the SPRT-Expected
of Sequential
Sample
Sampling
Size
393
n= 2
n a or n b , i = 0, 1.
(34)
Thus formula (34) provides the expected sample size of the SPRT under both
H and A, but the actual calculations are tedious. This suggests that we should
try to nd an approximate value to Ei N, as follows. By setting A = log a and
B = log b, we have the relationships below:
(a <
< b, j = 1, . . . , n 1, n a or b
= A < Zi < B, j = 1, . . . , n 1,
i =1
j
and
Zi A or
i =1
i =1
) (
B , n 2
a or 1 b = Z1 A or Z1 B .
(35)
(36)
From the right-hand side of (35), all partial sums Zi, j = 1, . . . , n 1 lie
between A and B and it is only the ni=1Zi which is either A or B, and this is
due to the nth observation Zn. We would then expect that ni= 1Zi would not be
too far away from either A or B. Accordingly, by letting SN = Ni= 1Zi, we are led
to assume as an approximation that SN takes on the values A and B with
respective probabilities
j
i=1
Pi S N A
But
and Pi S N B ,
i = 0, 1.
P0 S N A = 1 , P0 S N B =
and
P1 S N A = 1 , P1 S N B = .
394
14
Sequential Procedures
Therefore we obtain
E0 S N 1 A + B and E1 S N 1 A + B.
(37)
E0 N
(1 )A + B ,
E0 Z1
E1 N
(1 )A + B .
E1Z1
(38)
In the SPRT with error probabilities and 1 , the expected sample size EiN,
i = 0, 1 is given by (34). If furthermore Ei|Z1| < and EiZ1 0, i = 0, 1, relation
(38) provides approximations to EiN, i = 0, 1.
Actually, in order to be able to calculate the approximations
given by (38), it is necessary to replace A and B by their approximate values
taken from (30), that is,
REMARK 4
A log a = log
1
1
and B log b =
(39)
In utilizing (39), we also assume that < < 1, since (30) was derived under
this additional (but entirely reasonable) condition.
Exercises
14.3.1 Let X1, X2, . . . be independent r.v.s distributed as P(), =
(0, ). Use the SPRT for testing the hypothesis H : = 0.03 against the alternative A : = 0.05 with = 0.1, 1 = 0.05. Find the expected sample sizes under
both H and A and compare them with the xed sample size of the MP test for
testing H against A with the same and 1 as above.
14.3.2 Discuss the same questions as in the previous exercise if the Xjs
are independently distributed as Negative Exponential with parameter
= (0, ).
14.1
395
What we explicitly do, is to set up the formal SPRT and for selected
numerical values of and 1 , calculate a, b, upper bounds for and
1 , estimate EiN, i = 0, 1, and nally compare the estimated EiN, i = 0, 1
with the size of the xed sample size test with the same error probabilities.
EXAMPLE 1
f x; = x 1
1 x
, x = 0, 1, = 0, 1 .
j X j
1 1
1 0
n j X j
1 1
A n log
1
0
log
( )
(1 )
1 1 0
0
1 1
< X j < B n log
0
1
j =1
log
( ).
(1 )
1 1 0
0
(40)
Next,
Z1 = log
( ) = X log (1 ) + log 1
1
f (X )
(1 )
f1 X1
so that
EiZ1 = i log
( ) + log 1
1
(1 )
1 1 0
0
, i = 0, 1.
(41)
For a numerical application, take = 0.01 and 1 = 0.05. Then the cut-off
points a and b are approximately equal to a and b, respectively, where a and
b are given by (30). In the present case,
a =
0.05
0.05
0.95
=
0.0505 and b =
= 95.
1 0.01 0.99
0.01
For the cut-off points a and b, the corresponding error probabilities and
1 are bounded as follows according to (32):
a
0.01
0.05
0.0105 and 1
0.0505.
0.95
0.99
5
= 1.29667 and B log 95 = 1.97772.
99
At this point, let us suppose that 0 = 38 and 1 = 48 . Then
A log
(42)
396
14
Sequential Procedures
log
( ) = log 5 = 0.22185
3
(1 )
1 1 0
0
and log
1 1
4
= log = 0.09691,
1 0
5
(43)
On the other hand, the MP test for testing H against A based on a xed
sample size n is given by (9) in Chapter 13. Using the normal approximation,
we nd that for the given = 0.01 and = 0.95, n has to be equal to 244.05.
Thus both E0N and E1N compare very favorably with it.
EXAMPLE 2
Let X1, X2, . . . be i.i.d. r.v.s with p.d.f. that of N(, 1). Then
n
1
n = exp 1 0 X j n 12 20
2
j =1
and we continue sampling as long as
n 2
2
A + 2 1 0
) (
n
0 < X j < B + 21 20
2
j =1
) (
0 .
(44)
Next,
Z1 = log
( )=
(
f (X )
f1 X1
0 X1
1 2
1 02 ,
2
so that
1 2
1 02 , i = 0, 1.
(45)
2
By using the same values of and 1 as in the previous example, we have
the same A and B as before. Taking 0 = 0 and 1 = 1, we have
EiZ1 = i 1 0
Now the xed sample size MP test is given by (13) in Chapter 13. From this
we nd that n 15.84. Again both E0N and E1N compare very favorably with
the xed value of n which provides the same protection.
15.2
Some Examples
397
Chapter 15
Condence RegionsTolerance
Intervals
A random interval is a nite or innite interval, where at least one of the end
points is an r.v.
DEFINITION 2
Let L(X1, . . . , Xn) and U(X1, . . . , Xn) be two statistics such that L(X1, . . . , Xn)
U(X1, . . . , Xn). We say that the r. interval [L(X1, . . . , Xn), U(X1, . . . , Xn)] is
a condence interval for with condence coefcient 1 (0 < < 1) if
[(
)]
P L X1 , . . . , X n U X1 , . . . , X n 1
for all .
(1)
Also we say that U(X1, . . . , Xn) and L(X1, . . . , Xn) is an upper and a lower
condence limit for , respectively, with condence coefcient 1 , if for all
,
)]
P < U X1 , . . . , X n 1
and
[(
P L X1 , . . . , X n < 1 .
(2)
397
398
15
)]
P L X1 , . . . , X n + P U X1 , . . . , X n
[(
)]
)]
= P L X1 , . . . , X n U X1 , . . . , X n + 1,
it follows that, if L(X1, . . . , Xn) and U(X1, . . . , Xn) is a lower and an upper
condence limit for , respectively, each with condence coefcient 1 12 ,
then [L(X1, . . . , Xn), U(X1, . . . , Xn)] is a condence interval for with condence coefcient 1 . The length l(X1, . . . , Xn) of this condence interval is
l = l(X1, . . . , Xn) = U(X1, . . . , Xn) L(X1, . . . , Xn) and the expected length is
E l, if it exists.
Now it is quite possible that there exist more than one condence interval
for with the same condence coefcient 1 . In such a case, it is obvious
that we would be interested in nding the shortest condence interval within a
certain class of condence intervals. This will be done explicitly in a number of
interesting examples.
At this point, it should be pointed out that a general procedure for
constructing a condence interval is as follows: We start out with an r.v.
Tn() = T(X1, . . . , Xn; ) which depends on and on the Xs only through a
sufcient statistic of , and whose distribution, under P , is completely determined. Then Ln = L(X1, . . . , Xn) and Un = U(X1, . . . , Xn) are some rather
simple functions of Tn() which are chosen in an obvious manner.
The examples which follow illustrate the point.
Exercise
15.1.1
15.2
Some Examples
399
dence interval (and also the shortest condence interval within a certain class)
for with condence coefcient 1 .
EXAMPLE 1
Let X1, . . . , Xn be i.i.d. r.v.s from N(, 2). First, suppose that is known, so
)/. Then Tn()
that is the parameter, and consider the r.v. Tn() = n(X
depends on the Xs only through the sufcient statistic X of and its distribution is N(0, 1) for all .
Next, determine two numbers a and b (a < b) such that
( ) ]
P a N 0, 1 b = 1 .
which is equivalent to
P a
n X
(3)
) b = 1
P X b
X a
= 1 .
n
n
Therefore
, X a
(4)
X b
n
n
is a condence interval for with condence coefcient 1 . Its length is
equal to (b a) /n. From this it follows that, among all condence intervals
with condence coefcient 1 which are of the form (4), the shortest one is
that for which b a is smallest, where a and b satisfy (3). It can be seen (see
also Exercise 15.2.1) that this happens if b = c (> 0) and a = c, where c is the
upper /2 quantile of the N(0, 1) distribution which we denote by z /2. Therefore the shortest condence interval for with condence coefcient 1
(and which is of the form (4)) is given by
(5)
, X + z 2
X z 2
.
n
n
Next, assume that is known, so that 2 is the parameter, and consider
the r.v.
( )
Tn 2 =
2
1 n
nSn2
2
,
where
S
=
Xj .
n
2
n j =1
P a n2 b = 1 .
From (6), we obtain
(6)
400
15
nS 2
P a 2n b = 1
which is equivalent to
nS 2
nS 2
P n 2 n = 1 .
a
b
2
Therefore
nSn2
,
nSn2
(7)
db b 2
=
.
da a 2
(8)
Now, letting Gn and gn be the d.f. and the p.d.f. of the 2n, relation (6) becomes
Gn(b) Gn(a) = 1 . Differentiating it with respect to a, one obtains
g (a) = 0,
( ) db
da
gn b
()
()
db gn a
=
.
da gn b
or
Thus (8) becomes a2gn(a) = b2gn(b). By means of this result and (6), it follows
that a and b are determined by
()
()
a 2 gn a = b2 gn b
and
a g n (t )dt = 1 .
b
(9)
For the numerical solution of (9), tables are required. Such tables are available
(see Table 678 in R. F. Tate and G. W. Klett, Optimum condence intervals
for the variance of a normal distribution, Journal of the American Statistical
Association, 1959, Vol. 54, pp. 674682) for n = 2(1)29 and 1 = 0.90, 0.95,
0.99, 0.995, 0.999).
To summarize then, the shortest (both in actual and expected length)
condence interval for 2 with condence coefcient 1 (and which is of the
form (7)) is given by
15.2
nSn2
,
Some Examples
401
nSn2
,
a
40.646
2
25S25
.
13.120
25S25
,
45.7051 14.2636
Let X1, . . . , Xn be i.i.d. r.v.s from the Gamma distribution with parameter
and a known positive integer, call it r. Then nj= 1Xj is a sufcient statistic for
(see Exercise 11.1.2(iii), Chapter 11). Furthermore, for each j = 1, . . . , n, the
r.v. 2Xj / is 22r, since
2 X
(t ) =
Xj
2t
1
=
1 2it
2r 2
Therefore
()
Tn =
2 n
Xj
j =1
is 22rn for all > 0. Now determine a and b (0 < a < b) such that
P a 22rn b = 1 .
(10)
2 n
P a X j b = 1
j =1
which is equivalent to
n
n
P 2 X j b 2 X j a = 1 .
j =1
j =1
402
15
b
a
1 1 n
l = 2 X j ,
a b j =1
(11)
1 1
E l = 2 rn .
a b
()
()
a 2 g 2 rn a = b 2 g 2 rn b
a g 2 rn (t )dt = 1 .
b
and
2 7 X
j =1 j ,
49.3675
2 j =1 X j
.
16.5128
2 7 X
j =1 j ,
44.461
2 j =1 X j
,
15.308
Let X1, . . . , Xn be i.i.d. r.v.s from the Beta distribution with = 1 and =
unknown.
Then nj= 1Xj, or jn= 1 log Xj is a sufcient statistic for . (See Exercise
11.1.2(iv) in Chapter 11.) Consider the r.v. Yj = 2 log Xj. It is easily seen that
its p.d.f. is 21 exp(yj /2), yj > 0, which is the p.d.f. of a 22. This shows that
()
j =1
j =1
Tn = 2 log X j = Yj
is distributed as 22n. Now determine a and b (0 < a < b) such that
P a 22n b = 1 .
(12)
15.2
Some Examples
403
P a 2 log X j b = 1
j =1
which is equivalent to
n
n
P a 2 log X j b 2 log X j = 1 .
j =1
j =1
a
b
.
,
n
2 n log X
2 j =1 log X j
j
j =1
(13)
ab
2 j =1 log X j
n
Considering dl/da = 0 in conjunction with (12) in the same way as it was done
in Example 2, we have that the shortest (both in actual and expected length)
condence interval (which is of the form (13)) is found by numerically solving
the equations
()
()
g 2n a = g 2n b
and
a g 2n (t )dt = 1 .
b
Let X1, . . . , Xn be i.i.d. r.v.s from U(0, ). Then Yn = X(n) is a sufcient statistic
for (see Example 7, Chapter 11) and its p.d.f. gn is given by
( )
gn yn =
n n 1
yn , 0 yn
n
Consider the r.v. Tn() = Yn/. Its p.d.f. is easily seen to be given by
()
hn t = nt n 1 ,
0 t 1.
() ]
P a Tn b = nt n 1dt = bn an = 1 .
a
(14)
404
15
,
a
b
(15)
1 da 1
dl
= X (n ) 2
+ ,
db
a db b 2
while by way of (14), da/db = bn1/an1, so that
dl
an +1 bn +1
= X (n )
.
db
b 2 an +1
Since this is less than 0 for all b, l is decreasing as a function of b and its
minimum is obtained for b = 1, in which case a = 1/n, by means of (14).
Therefore the shortest (both in actual and expected length) condence interval with condence coefcient 1 (which is the form (15)) is given by
X (n )
X n ,
.
(
)
1 n
Exercises
15.2.1 Let be the d.f. of the N(0, 1) distribution and let a and b with a < b be
such that (b) (a) = (0 < < 1). Show that b a is minimum if b = c (> 0)
and a = c. (See also the discussion of the second part of Example 1.)
15.2.2 Let X1, . . . , Xn be independent r.v.s having the Negative Exponential
distribution with parameter = (0, ), and set U = ni= 1Xi.
ii) Show that the r.v. U is distributed as Gamma with parameters (n, ) and
that the r.v. 2U/ is distributed as 22n;
15.2
Some Exercises
Examples
405
ii) Use part (i) to construct a condence interval for with condence coefcient 1 . (Hint: Use the parametrization f (x; ) = 1/ ex/, x > 0).
15.2.3
ii) If the r.v. X has the Negative Exponential distribution with parameter
= (0, ), show that the reliability R(x; ) = P(X > x) (x > 0) is equal to ex/;
ii) If X1, . . . , Xn is a random sample from the distribution in part (i) and U =
in= 1Xi, then (by Exercise 15.2.2(i)) 2U/ is distributed as 22n. Use this fact
and part (i) of this exercise to construct a condence interval for R(x; )
with condence coefcient 1 . (Hint: Use the parametrization f (x; ) =
1/ ex/, x > 0).
15.2.4
iii) Show that a condence interval for , based on R, with condence coefcient 1 is of the form [R, R/c], where c is a root of the equation
[ ( )]
c n 1 n n 1 c =
iii) Show that the expected length of the shortest condence interval in Example 4 is shorter than that of the condence interval in (ii) above. (Hint: Use
the parametrization f (x; ) = 1/ ex/, x > 0).
15.2.5
()
22 ;
, Y1 ,
Y1
2n
406
15
iii) The shortest condence interval of the form given in (ii) is taken for a and
b satisfying the equations
a g 2n (t )dt = 1
b
()
()
a 2 g 2n a = b 2 g 2n b ,
and
f x; =
1 x
e ,
2
= 0, .
15.2.8 Consider the independent random samples X1, . . . , Xm from N(1, 21)
and Y1, . . . , Yn from N(2, 22), where 1, 2 are known and 1, 2 are unknown,
and let the r.v. Tm,n( 1 2) be dened by
Tm, n 1 2 =
(X
) (
Yn 1 2
) (
m +
2
1
2
2
).
2 2
2 2
X m Yn b 1 + 2 , X m Yn a 1 + 2 ,
m
n
m
n
15.2.9 Refer to Exercise 15.2.8, but now suppose that 1, 2 are known and
1, 2 are unknown. Consider the r.v.
2 S2
Tm, n 1 = 12 n2
2 2 Sm
and show that a condence interval for 21/ 22, based on Tm,n(1/2), with
condence coefcient 1 is given by
Sm2
Sm2
a 2 , b 2 ,
Sn
Sn
407
where 0 < a < b are such that P(a Fn,m b) = 1 . In particular, the equaltails condence interval is provided by the last expression above with
a = Fn,m; /2 and b = Fn,m; /2, where F n,m;
/2 and Fn,m; /2 are the lower and the upper
/2 quantiles, respectively, of Fn,m.
15.2.10 Let X1, . . . , Xm and Y1, . . . , Yn be independent random samples
from the Negative Exponential distributions with parameters 1 and 2, respectively, and set U = mi= 1Xi, V = ni= 1Yj. Then (by Exercise 15.2.2(i)) the independent r.v.s 2U/1 and 2V/2 are distributed as 22m and 22n, respectively, so
that the r.v. 2V2 2U1 is distributed as F2n,2m. Use this result in order to construct a
condence interval for 1/2 with condence coefcient 1 . (Hint: Employ
the parametrization used in Exercise 15.2.2.)
2
1 n
X j Xn .
n 1 j =1
only through the sufcient statistic (X n, S 2n1) of (, 2). Basing the condence
interval in question on Tn(), which is tn1 distributed, and working as in
Example 1, we obtain a condence interval of the form
Sn 1
S
, X n a n 1 .
Xn b
n
n
(16)
Sn 1
S
, X n + t n 1; 2 n 1 ,
X n t n 1; 2
n
n
(17)
408
15
where tn1; /2 is the upper /2 quantile of the tn1 distribution. For instance, for
n = 25 and 1 = 0.95, the corresponding condence interval for is taken
n 0.41278S24,
from (17) with t 24;0.025 = 2.0639. Thus we have approximately [X
X n + 0.41278S24].
Suppose now that we wish to construct a condence interval for 2. To this
end, modify the r.v. Tn( 2) of Example 1 as follows:
n1 S
( ) ( )
n 1
T n 2 =
()
and
n 1 Sn21
n 1 Sn21
,
(18)
,
b
a
and the shortest condence interval of this form is taken when a and b are
numerical solutions of the equations
()
a 2 gn 1 a = b 2 gn 1 b
a gn 1 (t )dt = 1 .
b
Thus with n and 1 as above, one has, by means of the tables cited in
Example 1, a = 13.5227 and b = 44.4802, so that the corresponding interval
approximately is equal to [0.539S 224, 1.775S 224].
EXAMPLE 6
(X
(m 1)S
Tm, n 1 2 =
) (
Yn 1 2
+ n 1 S n21 1 1
+
m+n2
m n
2
m1
X Y t
n
m+n 2; a
m
(X
Yn + t m + n 2 ; a
(m 1)S
+ n 1 S n2 1 1
1
+ ,
m+ n2
m n
(m 1)S
2
m1
+ n 1 S n2 1 1
1
+ .
m+ n2
m n
2
m1
409
For instance, for m = 13, n = 14 and 1 = 0.95, we have t25;0.025 = 2.0595, so that
the corresponding interval approximately is equal to
(X
13
2 S2
Tm, n 1 = 12 2n1
2 2 S m1
which is distributed as Fn1,m1. Now determine two numbers a and b with 0 < a
< b and such that
P a Fn 1,m 1 b = 1 .
Then
P
12 Sn21
a 2 2 b = 1 ,
2 Sm 1
or
P
Sm2 1 12
Sm2 1
a
= 1 .
2
2
Sn21
Sn 1 2
Sm2 1
Fn 1,m 1;
Sn21
where Fn1,m1; /2 and Fn1,m1; /2 are the lower and the upper /2-quantiles of
Fn1,m1. The point Fn1,m1; /2 is read off the F-tables and the point Fn1,m1; /2 is
given by
Fn1,m 1; 2 =
1
.
Fm 1,n 1; 2
S122
S122
0.3171 2 , 3.2388 2 .
S13
S13
410
15
Exercise
15.3.1 Let X1, . . . , Xn be independent r.v.s distributed as N(, 2). Derive a
condence interval for with condence coefcient 1 when is unknown.
n Xn
(n 1)S
2
n 1
and
( ) ]
P c N 0, 1 c = 1
P a n21 b = 1 .
and
n Xn
n 1 Sn21
P , c
c, a
n Xn
n 1 Sn21
= 1 .
= P , c
c P , a
Equivalently,
P , X n
c 2 2
,
n
(n 1)S
2
n 1
(n 1)S
2
n 1
= 1 .
(19)
For the observed values of the Xs, we have the condence region for (, 2)
indicated in Fig. 15.1. The quantities a, b and c may be determined so that
the resulting intervals are the shortest ones, both in actual and expected
lengths.
Now suppose again that is real-valued. In all of the examples considered
so far the r.v.s employed for the construction of condence intervals had an
exact and known distribution. There are important examples, however, where
this is not the case. That is, no suitable r.v. with known distribution is available
which can be used for setting up condence intervals. In cases like this, under
411
2
2
confidence region for
(, 2 )' with confidence
coefficient 1
1 n
(xj
a j
1
( xn ) 2
2
1
b
xn ) 2
c 2 2
n
(xj xn) 2
j1
Figure 15.1
1 n
X j Xn
n j =1
n
2
X n z
Sn
2
, X n + z
Sn
.
n
(20)
Let X1, . . . , Xn be i.i.d. r.v.s from B(1, p). The problem is that of constructing
a condence interval for p with approximate condence coefcient 1 . Here
n(1 X
n), so that (20) becomes
S 2n = X
X z
EXAMPLE 9
Xn 1 Xn
2
),
X n + z
Xn 1 Xn
.
Let X1, . . . , Xn be i.i.d. r.v.s from P(). Then a condence interval for with
approximate condence coefcient 1 is provided by
X n z
Xn
.
n
The two-sample problem also ts into this scheme, provided both means
and variances (known or not) are nite.
412
15
We close this section with a result which shows that there is an intimate
relationship between constructing condence regions and testing hypotheses.
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f(x; ), r. For each * let
*): = * at level of
us consider the problem of testing the hypothesis H(
*) stand for the acceptance region in n. Set Z =
signicance , and let A(
(X1, . . . , Xn), z = (x1, . . . , xn), and dene the region T(z) in as follows:
() {
( )}
T z = : z A .
(21)
In other words. T(z) is that subset of with the following property: On the
) is accepted for T(z). From (21), it is obvious that
basis of z, every H(
()
z A
()
if and only if T z .
Therefore
( )]
( )]
P T Z = P Z A 1 ,
so that T(Z) is a condence region for with condence coefcient 1 . Thus
we have the following theorem.
THEOREM 1
Exercises
15.4.1 Let X1, . . . , Xn be i.i.d. r.v.s with (nite) unknown mean and (nite)
known variance 2, and suppose that n is large.
iii) Use the CLT to construct a condence interval for with approximate
condence coefcient 1 ;
iii) What does this interval become if n = 100, = 1 and = 0.05?
iii) Refer to part (i) and determine n so that the length of the condence
interval is 0.1, provided = 1 and = 0.05.
15.4.2 Refer to the previous problem and suppose that both and 2 are
unknown. Then a condence interval for with approximate condence coefcient 1 is given be relation (20).
iii) What does this interval become for n = 100 and = 0.05?
iii) Show that the length of this condence interval tends to 0 in probability
(and also a.s.) as n ;
iii) Discuss part (i) for the case that the underlying distribution is B(1, ),
= (0, 1) or P(), = (0, ).
15.5
15.2Tolerance
Some Examples
Intervals
413
414
15
If we notice that for the observed values t1 and t 2 of T1 and T2, respectively,
F(t 2) F(t1) is the portion of the distribution mass of F which lies in the interval
(t1, t 2], the concept of a tolerance interval has an interpretation analogous to
that of a condence interval. Namely, suppose the r. experiment under consideration is carried out independently n times and let (t1, t 2] be the resulting
interval for the observed values of the Xs. Suppose now that this is repeated
independently N times, so that we obtain N intervals (t1, t2]. Then as N gets
larger and larger, at least 100 of the N intervals will cover at least 100p
percent of the distribution mass of F.
Now regarding the actual construction of tolerance intervals, we have the
following result.
THEOREM 2
Let X1, . . . , Xn be i.i.d. r.v.s with p.d.f. f of the continuous type and let Yj = X( j),
j = 1, . . . , n be the order statistics. Then for any p (0, 1) and 1 i < j n, the
r. interval (Yi, Yj] is a 100 percent tolerance interval of 100p percent of F,
where is determined as follows:
1
()
= g j i d ,
p
P Z j Zi p = .
(22)
and Wk = Zk Zk 1 ,
k = 2, . . . , n.
0 < wk , k = 1, . . . , n, w1 + + wn < 1
otherwise.
) (
Z j Zi = W1 + + Wj W1 + + Wi = Wi +1 + + Wj .
415
15.5
15.2Tolerance
Some Examples
Intervals
n!,
g 1 , . . . , k =
0,
otherwise.
It follows then from Theorem 2(i) in Chapter 10, that the marginal p.d.f. gr is
given by
n!
v r 1 1 v
gr v = r 1 ! n r
0,
() (
)(
n r
0<v<1
otherwise.
(
() (
n+1
v r 1 1 v
gr v = r n r + 1
0,
()
n r
0<v<1
otherwise.
g (v)dv = .
1
[( ) ] [
( )] [ ( )]
P F X p = P X F 1 p = F F 1 p = p,
so that (, X] does cover at most p of the distribution mass of F with
probability p, and
[( )
[( )
[ ( )]
P F X > 1 p = 1 P F X 1 p = 1 F F 1 1 p = 1 1 p = p,
416
16
Chapter 16
y j = 1 + 2 x j ,
j = 1, . . . , n
416
(n 2),
(1)
y j = 1 + 2 x j + + k +1 x kj ,
j = 1, . . . , n
(2 k n 1),
( )
417
(2)
( )
(n 2k + 1),
(3)
Yj = 1 + 2 x j + e j ,
Yj = 1 + 2 x j + + k +1 x kj + e j ,
and
j = 1, . . . , n n 2 ,
(1)
(
cos( kt ) +
j = 1, . . . , n 2 k n 1 ,
Y j = 1 + 2 cos t j + 3 sin t j + + 2 k
2 k+1
j = 1, . . . , n
(2)
( )
sin kt j , + e j ,
(n 2k + 1).
(3)
At this point, one observes that the models appearing in relations (1)(3)
are special cases of the following general model:
Y1 = x11 1 + x 21 2 + + x p1 p + e1
Y2 = x12 1 + x 22 2 + + x p 2 p + e 2
Yn = x1n 1 + x 2 n 2 + + x pn p + e n ,
Yj = xij i + e j ,
j = 1, . . . , n with
(4)
i =1
By setting
Y = Y1 , . . . , Yn , = 1 , . . . , p , e = e1 , . . . , en
and
x11 x21 x p1
x11 x12 x1n
x
x12 x22 x p 2
x22 x2n
21
X=
, so that X =
x p1 x p 2 x pn
x1n x2n x pn
relation (4) is written as follows in matrix notation:
Y = X + e.
(5)
The model given by (5) is called the general linear model (linear because the
parameters 1, . . . , p enter the model in their rst powers only). At this point,
it should be noted that what one actually observes is the r. vector Y, whereas
the r. vector e is unobservable.
418
16
DEFINITION 1
Let C = (Zij) be an n k matrix whose elements Zij are r.v.s. Then by assuming
the EZij are nite, the EC is dened as follows. EC = (EZij). In particular, for
Z = (Z1, . . . , Zn), we have EZ = (EZ1, . . . , EZn), and for C = (Z EZ)(Z
EZ), we have EC = E[(Z EZ) (Z EZ)]. This last expression is denoted by
/ z and is called the variancecovariance matrix of Z, or just the covariance
matrix of Z. Clearly the (i, j)th element of the n n matrix / z is Co(Zi, Zj), the
covariance of Zi and Zj, so that the diagonal elements are simply the variances
of the Zs.
Since the r.v.s ej, j = 1, . . . , n are r. errors, it is reasonable to assume that
Eej = 0 and that 2(ej) = 2, j = 1, . . . , n. Another assumption about the es
which is often made is that they are uncorrelated, that is, Co(ei, ej) = 0, for
i j. These assumptions are summarized by writing E(e) = 0 and / e = 2 In,
where In is the n n unit matrix.
By then taking into consideration Denition 1 and the assumptions just
made, our model in (5) becomes as follows:
Y = X + e, EY = X = , / Y = 2 I n ,
(6)
Y = e = e j2 ,
2
j =1
is minimum.
DEFINITION 2
, is
Any value of which minimizes the squared norm Y 2, where = X
called a least square estimator (LSE) of and is denoted by .
The norm of an m-dimensional vector v = (v1, . . . , vm), denoted by v, is
the usual Euclidean norm, namely
16.2
419
m
v = j2
j =1
1 2
For the pictorial illustration of the principle of LS, let p = 2, x1j = 1 and
x2j = xj, j = 1, . . . , n, so that j = 1 + 2xj, j = 1, . . . , n. Thus (xj, j), j = 1, . . . ,
n are n points on the straight line = 1 + 2x and the LS principle species that
1 and 2 be chosen so that jn= 1(Yj j)2 be minimum; Yj is the (observable) r.v.
corresponding to xj, j = 1, . . . , n. (See also Fig. 16.1.)
(The values of 1 and 2 are chosen in order to minimize the quantity
(Y1 1)2 + + (Y5 5)2.)
, we have that
From (1, . . . , n) = = X
p
j = xij i ,
j = 1, . . . , n
i =1
and
n
Y = Yj j
2
j =1
p
n
= Yj xij i ,
j =1
i =1
1 2 x
Y5
Y5 5
5
Y4
Y4 4
4
3
Y3
Y3 3
2
Y2
Y2 2
x1
x2
Y1 1
Figure 16.1
0
1
Y1
x3
x4
x5
420
16
S Y, = 0, = 1, . . . , p
v
j =1
i =1
j =1
j =1 i =1
(7)
j =1
(7)
Actually, the set of LSEs of coincides with the set of solutions of the normal
equations, as the following theorem shows. The normal equations provide a
method for the actual calculation of LSEs.
THEOREM 1
Any LSE of is a solution of the normal equations and any solution of the
normal equations is an LSE.
PROOF
We have
x11 x 21 x p1 1
x12 x 22 x p 2 2
= X =
M
x1n x 2 n x pn p
= x11 1 + x 21 2 + + x p1 p , x12 1 + x 22 2 +
+ x p 2 p , . . . , x1n 1 + x 2 n 2 + + x pn p
+ p x p1 , x p 2 , . . . , x pn
+ 2 x 21 , x 22 , . . . , x 2 n
= 11 + 2 2 + + p p ,
= j j
with j , j = 1, . . . , p as above.
(8)
j=1
16.2
421
Vn
Y X' Y
0
Vr
(p), and Vr Vn. Of course, Y Vn and from (8), it follows that Vr. Let
be the projection of Y into Vr. Then = pj= 1 jj, where j, j = 1, . . . , p may
is, however) but may be chosen to be functions
not be uniquely determined (
2 = Y 2
of Y since is a function of Y. Now, as is well known, Y X
= , and
becomes minimum if = . Thus is an LSE of if and only if X
PROOF
422
16
/ AV = A / V A ,
(10)
/ AV = E AV E AV AV E AV = E A V EV V EV A
= A E V EV V EV A = A / V A .
( )][
( )]
)(
)(
/ = 2S 1X S 1X = 2S 1XX S 1 = 2S 1S S 1
( )
( )
= 2 I p S 1 = 2S 1
because S and hence S1 is symmetric, so that (S1) = S1. This completes the
proof of the theorem.
The following denition will prove useful in the sequel.
DEFINITION 3
LEMMA 1
Let = c
be an estimable function, so that there exists a Vn such that
E(aY) = identically in . Furthermore, let d be the projection of a into Vr.
Then:
i) E(dY) = .
ii) 2(aY) 2 (dY).
iii) dY = c
for any LSE of .
Y) = , then
iv) If is any other nonrandom vector in Vr such that E(
= d.
PROOF
a Y = d + b Y = d Y + b Y
and therefore
( )
= E a Y = E d Y + b EY = E d Y + b X .
16.2
423
of X. Hence E(dY) = .
ii) From the decomposition a = d + b mentioned in (i), it follows that a2 =
d2 + b2. Next, by means of (10),
( )
2 a Y = a / Y a = 2 a
=2 d
+2 b .
Since also
2 d
( )
= 2 d Y
( )
2 aY = 2 dY + 2 b
( )
2 aY 2 dY .
identically in . But
iii) By (i), E(dY) = = c
E d Y = d EY = d X ,
= c
identically in . Hence dX = c. Next, with = X
, the
so that dX
projection of Y into Vr, one has d(Y ) = 0, since d Vr. Therefore
d Y = d = d X = c .
iv) Finally, let Vr be such that E(
Y) = . Then we have
) (
[(
) ] (
0 = E Y E d Y = E d Y = d X .
That is, (
d)X
= 0 identically in and hence (
d)X = 0, or
d)X = 0 which is equivalent to saying that d Vr. So, both d
(
Vr and d Vr and hence d = 0. Thus = d, as was to be seen.
Part (iii) of Lemma 1 justies the following denition.
DEFINITION 4
THEOREM 3
424
16
COROLLARY
is estimable,
Suppose that rank X = p. Then for any c Vp, the function = c
has the smallest variance in the class of all linear in
and hence its LSE = c
Y and unbiased estimators of . In particular, the same is true for each j,
j = 1, . . . , p, where = (1, . . . , p).
The rst part follows immediately by the fact that = S1XY.
The particular case follows from the rst part by taking c to have all its
components equal to zero except for the jth one which is equal to one, for
j = 1, . . . , n.
PROOF
( )
2 Z j = 2 , j = 1, . . . , n.
(11)
j = 0, j = r + 1, . . . , n.
By recalling that is the projection of Y into Vr, we have
n
j=1
j=1
Y = Z j j and = Z j j ,
so that
(12)
Y =
Z
j
j = r +1
n
n
= Zj j Zj j =
j = r +1
j = r +1
425
2
j
(13)
j = r +1
In the model described in (6) with the assumption that r < n, an unbiased
estimator for 2, 2, is provided by Y 2/(n r), where is the projection
of Y into Vr and r ( p) is the rank of X. We may refer to 2 as the LSE
of 2.
Now suppose that rank X = p, so that = S1XY by Theorem 2. Next, the
rows of X are j, where j, j = 1, . . . , p are the column vectors of X, and j
Vp, j = 1, . . . , p. Therefore ji = 0 for all j = 1, . . . , p and i = p + 1, . . . , n. Since
j, j = 1, . . . , n are the columns of P, it follows that the last n p elements in
all p rows of the p n matrix XP are all equal to zero. Now from the
transformation Z = PY one has Y = P1Z. But P 1 = P as follows from the fact
that PP = In. Thus Y = PZ and therefore = S1XPZ. Because of the special
form of the matrix XP menitoned above, it follows that is a function of Zj,
j = 1, . . . , p only (see also Exercise 16.3.1), whereas
2 =
n
1
Z j2
n p j = p +1
( )
by 13 .
Let rank X = p (< n). Then the LSEs and 2 are functions only of the
transformed r.v.s Zj, j = 1, . . , , p and Zj, j = p + 1, . . . , n, respectively.
From the last theorem above, it follows that in order for
us to be able to actually calculate the LSE 2 of 2, we would have to rewrite
Y 2 in a form appropriate for calculation. To this end, we have
REMARK 1
)(
) (Y X )
2
Y = Y X = Y X
= Y X Y X = Y Y Y X XY + XX
= Y Y Y X + XX XY .
2
Y = Y Y XY = Yj2 XY.
j =1
(14)
426
16
Finally, denoting by rv the vth element of the p 1 vector XY, one has
n
rv = xvj Yj , v = 1, . . . , p
(15)
j =1
j =1
v =1
2
Y = Yj2 v rv .
(16)
1 x1
1 x
2
X =
1 xn
and = 1 , 2 .
Next,
1
XX =
x1
1
1
1 1
x2 xn
x1
x2
xj
j = 1
xn
x
j
= S,
x
j =1
j =1
n
2
j
so that the normal equations are given by (7) with S as above and
1 1 1
n
XY =
Y1 , Y2 , . . . , Yn = Yj , x j Yj .
x1 x2 xn
j =1
j =1
Now
2
n
n
n
2
S = n x j2 x j = n x j x ,
j =1
j =1
j =1
so that
n
(x
j =1
0,
(17)
427
provided that not all xs are equal. Then S1 exists and is given by
S 1 =
1
n
n x j x
j =1
n 2
xj
j =n1
x j
j =1
x j
j =1
.
(18)
It follows that
1
= 1 = S 1XY = n
2
n x j x
j =1
n 2 n n n
x j Yj x j x j Yj
j =1 j =1 j =1 j =1
,
n n
n
x j Yj + n x j Yj
j =1
j =1 j =1
so that
n 2 n n n
x
Y
x
x
Y
j
j
j
j
j
j =1 j =1 j =1 j =1
,
1 =
n
2
n x j x
j =1
n
n
n
n x j Yj x j Yj
1
1
1
=
=
=
j
j
j
2 =
and
n
2
n x j x
j =1
n
n
n
2
n x j x = n x 2j x j .
j =1
j =1
j =1
(19)
But
n
n
n n
n x j Yj x j Yj = n x j x Yj Y ,
j =1 j =1
j =1
j =1
)(
( x x )(Y Y ) .
(x x)
n
2 =
j =1
j =1
(19)
428
16
(19)
The expressions of 1 and 2 given by (19) and (19), respectively, are more
compact, but their expressions given by (19) are more appropriate for actual
calculations.
In the present case, (15) gives in conjunction with (17)
n
r1 = Yj
and r2 = x j Yj ,
j =1
j =1
2
Y = Yj2 1 Yj 2 x j Yj .
j =1
j =1
j =1
(20)
=
2
n2
(21)
x 7 = 70
x8 = 70
x 9 = 70
x10 = 90
x11 = 90
Y1 = 37 Y7 = 20
Y2 = 43 Y8 = 26
Y3 = 30 Y9 = 22
Y4 = 32 Y10 = 15
Y5 = 27 Y11 = 19
x12 = 90 Y6 = 34 Y12 = 20.
Then relation (19) provides us with the estimates 1 = 46.3833 and 2 = 0.3216,
and (20) and (21) give the estimate 2 = 14.8939.
Exercises
16.3.1 Referring to the proof of the corollary to Theorem 4, elaborate on the
assertion that is a function of the r.v.s Zj, j = 1, . . . , r alone.
16.3.2
16.3.3
( )
E 1 = 1 , E 2 = 2 , 2 2 =
(x
n
j =1
429
and that 2 and 1 are normally distributed if the Ys are normally distributed.
16.3.4 i) Use relation (18) to show that
2 j = 1 x j2
n
( )
1 =
2
nj = 1 x j x
n
( )
, 2 2 =
(x
n
j =1
( )
( )
, 2 2 =
2 1 =
n
2
n
j =1
x j2
and, by assuming that n is even and xj[x, x], j = 1, . . . , n for some x > 0,
conclude that 2(2) becomes a minimum if half of the xs are chosen equal
to x and the other half equal to x (if that is feasible). (It should be pointed
out, however, that such a choice of the xswhen feasibleneed not be
optimal. This is the case, for example, when there is doubt about the
linearity of the model.)
16.3.5 Consider the linear model Yj = 1 + 2xj + 3x 2j + ej, j = 1, . . . , n under
the usual assumptions and bring it under the form (6). Then for n = 5 and the
data given in the table below nd:
i) The LSEs of and 2;
ii) The covariance of (the LSE of );
iii) An estimate of the covariance found in (ii).
x
1.0
1.5
1.3
2.5
1.7
430
16
(C ): Y = X + e,
e: N 0, 2 I n , rank X = r p < n .
(22)
The assumption that e: N(0, 2In) simply states that the r.v.s ej, j = 1, . . . , n are
uncorrelated normal (equivalently, independent normal) with mean zero and
variance 2.
and it was seen in (8) that
Now from (22), we have that = EY = X
Vr, the r-dimensional vector space generated by the column vectors of X.
This means that the coordinates of are not allowed to vary freely but they
satisfy n r independent linear relationships. However, there might be some
evidence that the components of satisfy 0 < q r additional independent
linear relationships. This is expressed by saying that actually belongs in
Vrq, that is, an (r q)-dimensional vector subspace of Vr. Thus we hypothesize
that
(q < r )
H : Vr q Vr
(23)
and denote by c = C H, that is, the conditions that our model satises if in
addition to (C), we also assume H.
For testing the hypothesis H, we are going to use the Likelihood Ratio
(LR) test. For this purpose, denote by fY(y; , 2), or fY Y (y1, . . . , yn; , 2)
the joint p.d.f. of the Ys and let SC and Sc stand for the minimum of
1, . . . ,
S y, = y X = y j EYj
2
j =1
fY y : ,
1
1
=
exp
2
2
2
2
n
( y j EYj )
j =1
1
1
S y , .
exp
=
2
2 2
2
(24)
From (24), it is obvious that for a xed 2, the maximum of fY(Y; , 2) with
respect to , under C, is obtained when S(y, ) is replaced by Sc. Thus in order
to maximize fY(y; , 2) with respect to both and 2, under C, it sufces to
maximize with respect to 2 the quantity
n
1
SC ,
exp
2
2
or its logarithm
n
n
S 1
log 2 log 2 C 2 .
2
2
2
( )
431
Differentiating with respect to 2 this last expression and equating the derivative to zero, we obtain
S
n 1 SC 1
+
= 0, so that 2 = C .
2
4
2
2
n
The second derivative with respect to 2 is equal to n/(2 4) (SC/ 6) which for
2 = 2 becomes n3/(2SC2 ) < 0. Therefore
n 2
n
max fY y; , 2 =
C
2SC
n
exp .
2
(25)
n
exp ,
2
(26)
c
2Sc
n 2
S
= c
SC
n 2
(27)
where
SC = min S Y, = Y C
= Y X C ,
(28)
C = LSE of under C ,
and
S c = min S Y, = Y c
c
= Y X c ,
(29)
c = LSE of under c .
()
g = 2 n 1.
(30)
Then
( )=2 (
dg
d
n+ 2
< 0,
432
16
so that g() is decreasing. Thus < 0 if and only if g() > g( 0).
Taking into consideration relations (27) and (30), the last inequality
becomes
n r S c SC
> F0 ,
q
SC
where F0 is determined by
n r S c SC
PH
> F0 = .
q
S
C
Therefore the LR test is equivalent to the test which rejects H whenever
F > F0 , where F =
n r Sc SC
q
SC
and F0 = Fq , n r .
jx
(31)
The statistics SC and Sc are given by (28) and (29), respectively, and the
distribution of F, under H, is Fq,nr, as is shown in the next section.
Now although the F-test in (31) is justied on the basis that it is equivalent
to the LR test, its geometric interpretation illustrated by Fig. 16.3 below
illuminates it even further. We have that C is the best estimator of under
C and c is the best estimator of under c. Then the F-test rejects H
whenever C and c differ by too much; equivalently, whenever Sc SC is too
large (when measured in terms of SC).
Suppose now that rank X = p (< n), and let be the unique (and unbiased)
) = 0 because Y , one has
LSE of . By the facet that (Y )(
that the joint p.d.f. of the Ys is given by the following expression, where y has
been replaced by Y:
n
2
1
1
2
X .
exp
n
p
+
X
2
2 2
Vn
SC squared norm of
Y
Sc squared norm of
Vr q
c
Sc S C squared norm of
Vr
16.5
433
Under conditions (C) described in (22) and the assumption that rank X = p (<
n), it follows that the LSEs j of j, j = 1, . . . , p have the smallest variance in
the class of all unbiased estimators of j, j = 1, . . . , n (that is, regardless of
whether they are linear in Y or not).
For j = 1, . . . , p, j is an unbiased (and linear in Y) estimate of j. It
is also of minimum variance (by the LehmannScheff theorem) as it depends
only on a sufcient and complete statistic.
PROOF
Exercises
16.4.1 Show that the MLE and the LSE of 2, 2, and 2, respectively, are
related as follows:
nr 2
2 =
and that 2 = 2
N + xj x , 2 ; xj ,
j = 1, . . . , n
1 n
xj
n j =1
434
16
j = 1, . . . , q.
2
j
j = r +1
Y = jZj
Z ,
and c =
j =1
j = q +1
= Z j2 +
j =1
Z j2 .
j = r +1
Hence
q
Sc SC = Z j2 ,
j =1
so that
nr
F =
q
j = 1Z j2 = j = 1Z j2 q .
n
n
j = r +1Z j2 j = r +1Z j2 n r
q
Z j2 is 2 q2
j =1
and
Z j2 is 2 n2r .
j = r +1
It follows that, under H (H), the statistic F is distributed as Fq, nr. The distribution of F, under the alternatives, is non-central Fq, nr which is dened in
2
terms of a nr
and a non-central 2q distribution. For these denitions, the
reader is referred to Appendix II.
From the derivations in this section and previous results, one has the
following theorem.
16.5
THEOREM 6
435
Assume the model described in (22) and let rank X = p (< n). Then the LSEs
and 2 of and 2, respectively, are independent.
It is an immediate consequence of the corollary to Theorem 4 and
Theorem 5 in Chapter 9.
PROOF
Refer to Example 1 and suppose that the xs are not all equal, and that the Ys
are normally distributed. It follows that rank X = r = 2, and the regression line
is y = 1 + 2x in the xy-plane. Without loss of generality, we may assume that
x1 x2. Then i = 1 + 2 xi, i = 1, 2 are linearly independent, and all j, j = 3, . . . ,
n are linear combinations of 1 and 2. Therefore V2.
Now, suppose we are interested in testing the following hypothesis about
the slope of the regression line; namely H1: 2 = 20, where 20 is a given
number. Hypothesis H1 is equivalent to the hypothesis H1 : i = 1 + 20 xi, i = 1,
2, from which it follows that, under H1 (or H1 ), V1. Thus, r q = 1, or q =
2Cx, 2C = jn= 1(xj x)2(Yj Y
)/
1. The LSE of 1 and 2 are 1C = Y
n
2
20x, 2c = 20.
j = 1(xj x) , whereas under H1 (or H1 , the LSE become 1c = Y
)2 22C jn= 1(xj x)2. Likewise,
Then C = XC and SC = Y C 2 = jn= 1(Yj Y
2
n
2
) + 20 (20 22C) jn= 1(xj x)2. It
c = X
c and Sc = Y c = j = 1(Yj Y
2
n
follows that Sc SC = (2C 20) j = 1(xj x)2, and the test statistic is
F = n1 2 ScSCSC ~ F1, n2 , and the cut-off point is F0 = F1,n2; .
Next, suppose we are interested in testing the hypothesis H2: 1 = 10, 2 =
20, where 10 and 20 are given numbers. Hypothesis H2 is equivalent to the
hypothesis H2: i = 10 + 20 xi, i = 1, 2, from which it follows that, under H2
(or H2 ), V0. Thus, r q = 0, or q = 2. Clearly, 1c = 10, 2c = 20, so that
c = X
c and Sc = jn= 1(Yj 10 20xj)2. It follows that the test statistic
here is F = n22 ScSCSC ~ F2 , n2 , where SC was given above. The cut-off point is
F0 = F2,n2;.
As mentioned earlier, the linear model adopted in this chapter can also be
used for prediction purposes. The following example provides an instance of a
prediction problem and its solution.
EXAMPLE 3
436
16
x0 x
1 1
.
Z2 = 2 + + n
m n
2
x j x
j =1
(32)
2 2
Y0 Y0
x0 x
1 1
+ + n
2
n 2 m n
x j x
j =1
[Y st
0
n 2 ; 2
, Y0 + st n 2 ;
],
where s =
x0 x
1 1
+ + n
2
n 2 m n
x j x
j =1
For a numerical example, refer to the data used in Example 1 and let x0 = 60,
m = 1 and = 0.05. Then Y 0 = 27.0873, s = 1.2703 and t10;0.025 = 2.2281, so that the
prediction interval for Y0 is given by [24.2570, 29.9176].
Exercises
16.5.1 From the discussion in Section 16.5, it follows that the distribution of
[(n r) 2]/2 is 2nr. Thus the statistic 2 can be used for testing hypotheses
about 2 and also for constructing condence intervals for 2.
ii) Set up a condence interval for 2 with condence coefcient 1 ;
ii) What is this condence interval in the case of Example 2 when n = 27 and
= 0.05?
16.5.2 Refer to Example 2 and:
i) Use Exercises 16.3.3 and 16.3.4 to show that
n 1
j = 1 x j2 ( 2 2 )
n
and
Exercises
437
n 1
j =1 ( x j x )
n
and
2
2
are distributed as tn2.
Thus the r.v.s guring in (ii) may be used for testing hypotheses
about 1 and 2 and also for constructing condence intervals for 1 and
2;
iii) Set up the test for testing the hypothesis H : 1 = 0 (the regression line
passes through the origin) against A : 1 0 at level and also construct a
1 condence interval for 1;
iv) Set up the test for testing the hypothesis H : 2 = 0 (the Ys are independent
of the xs) against A : 2 0 at level and also construct a 1 condence
interval for 2;
v) What do the results in (iii) and (iv) become for n = 27 and = 0.05?
16.5.3
16.5.4 Refer to Example 3 and suppose that the r.v.s Y0i = 1 + 2x0 + ei,
i = 1, . . . , m corresponding to an unknown point x0 are observed. It is
assumed that the r.v.s Yj, j = 1, . . . , n and Y0i, i = 1, . . . , m are all
independent.
ii) Derive the MLE x0 of x0;
ii) Set V = Y0 1 2x0, where Y0 is the sample mean of the Y0is and show
that the r.v.
V V
(m + n) (m + n 3)
2
,
2
where
2
x0 x
1 1
= + + n
m n
j = 1 x j x
2
V
and
2 =
1 n
Yj 1 2 x j
m + n j = 1
m
2
+ Y0i Y0 ,
i =1
is distributed as tm+n3.
16.5.5 Refer to the model considered in Example 1 and suppose that the xs
and the observed values of the Ys are given by the following table:
438
16
10
15
20
25
30
0.10
0.21
0.30
0.35
0.44
0.62
i) Find the LSEs of 1, 2 and 2 by utilizing the formulas (19), (19) and
(21), respectively;
ii) Construct condence intervals for 1, 2 and 2 with condence coefcient
1 = 0.95 (see Exercises 16.5.1 and 16.5.2(ii));
iii) On the basis of the assumed model, predict Y0 at x0 = 17 and construct a
prediction interval for Y0 with condence coefcient 1 = 0.95 (see
Example 3).
16.5.6 The following table gives the reciprocal temperatures (x) and the
corresponding observed solubilities of a certain chemical substance.
3.80
3.72
3.67
3.60
3.54
1.27
1.32
1.50
1.20
1.26
1.10
1.07
0.82
0.84
0.80
0.65
0.57
0.62
Assume the model considered in Example 1 and discuss questions (i) and (ii)
of the previous exercise. Also discuss question (iii) of the same exercise for
x0 = 3.77.
16.5.7 Let Zj, j = 1, . . . , n be independent r.v.s, where Zj is distributed as
N(j, 2). Suppose that j = 0 for j = r + 1, . . . , n whereas 1, . . . , r, 2 are
parameters. Then derive the LR test for testing the hypothesis H : 1 = 1
against the alternative A : 1 1 at level of signicance .
16.5.8 Consider the r.v.s of Exercise 16.4.2 and transform the Ys to Zs by
means of an orthogonal transformation P whose rst two rows are
n
2
x1 x
x x 1
1
,..., n
,...,
, s x2 = x j x .
sx n
sx
j =1
n
Then:
ii) Show that the Zs are as in Exercise 16.5.7 with r = 2, 1 = sx, 2 = ;
ii) Set up the test mentioned in Exercise 16.5.7 and then transform the Zs
back to Ys. Also compare the resulting test with the test mentioned in
Exercise 16.4.2.
Exercises
439
440
17
Analysis of Variance
Chapter 17
Analysis of Variance
17.1 One-way Layout (or One-way Classication) with the Same Number of
Observations Per Cell
The models to be discussed in the present chapter are special cases of the
general model which was studied in the previous chapter. In this section, we
consider what is known as a one-way layout, or one-way classication, which
we introduce by means of a couple of examples.
440
17.1
EXAMPLE 1
441
Consider I machines, each one of which is manufactured by I different companies but all intended for the same purpose. A purchaser who is interested in
acquiring a number of these machines is then faced with the question as to
which brand he should choose. Of course his decision is to be based on the
productivity of each one of the I different machines. To this end, let a worker
run each one of the I machines for J days each and always under the same
conditions, and denote by Yij his output the jth day he is running the ith
machine. Let i be the average output of the worker when running the ith
machine and let eij be his error (variation) the jth day when he is running the
ith machine. Then it is reasonable to assume that the r.v.s eij are normally
distributed with mean 0 and variance 2. It is further assumed that they are
independent. Therefore the Yijs are r.v.s themselves and one has the following model.
Yij = i + eij
EXAMPLE 2
( )
j = 1, . . . , J ( 2).
for i = 1, . . . , I 2 ;
(1)
(
e = (e , . . . , e ; e
= ( , . . . , )
Y = Y11 , . . . , Y1 J ; Y21 , . . . , Y2 J ; . . . ; YI , . . . , YI
11
1J
21
, . . . , e2 J ; . . . ; eI , . . . , eI
442
17
Analysis of Variance
J1
1
2
(i, jth)
cell
I1
I
Figure 17.1
I 4448
64447
1 0 0 0
J
1 0 0 0
0 1 0 0
0 J
X = 0 1 0 0
0 0 0 1
J
0 0 0 1
17.1
443
0 0
0
0
0
0
= JIp
0 J
and
J
J
J
XY = Y1 j , Y2 j , . . . , YIJ ,
j =1
j =1
j =1
1 J
1 J
= S 1 XY = Y1 j , Y2 j , . . . , YIJ .
J j =1
J j =1
J j =1
i = 1, . . . , I .
(2)
n r Sc SC I J 1 Sc SC
=
.
q
I 1
SC
SC
(3)
Now, under H, the model becomes Yij = + eij and the LSE of is obtained by
differentiating with respect to the expression
Y c
= Yij .
i =1 j =1
= Y. . ,
where
Y. . =
1
IJ
Y .
ij
i =1 j =1
(4)
444
17
Analysis of Variance
SC = Y C
) = (Y
) = (Y
= Yij ij ,C
i =1 j =1
Yi .
ij
i =1 j =1
and
Sc = Y c
= Yij ij ,c
i =1 j =1
ij
Y. . .
i =1 j =1
(Y
ij
Yi .
j =1
so that
) = Y
2
2
ij
JY i2. ,
j =1
) = Y
2
2
ij
i =1 j =1
J Y i2. .
(5)
IJY .2 . ,
(6)
i =1
Likewise,
J
) = Y
2
2
ij
i =1 j =1
2
Sc SC = J Y i2. IJY .2 . = J Yi2. IY .2 . = J Yi . Y. . ,
i =1
i =1
i =1
since
Y. . =
1 I
1 I 1 J
Y
ij = Yi . .
I i =1 J j =1 I i =1
That is,
Sc SC = SSH,
(7)
where
I
SSH = J Yi . Y. .
i =1
= J Y i2. IJY .2 . .
i =1
F =
I J 1 SS H MS H
=
,
I 1 SS e
MS e
(8)
where
SS H
SS e
,
MS e =
I 1
I ( J 1)
and SSH and SSe are given by (7) and (5), respectively. These expressions are
also appropriate for actual calculations. Finally, according to Theorem 4 of
Chapter 16, the LSE of 2 is given by
MS H =
2 =
SS e
.
I ( J 1)
(9)
17.1
445
degrees of
freedom
sums of squares
I
between groups
SS H = J (Y i . Y. . )
I1
mean squares
MS H =
i =1
within groups
i =1 j =1
total
SS e = Y ij Y i .
J
SS T = Y ij Y. .
i =1 j =1
I( J 1)
IJ 1
MS e =
SS H
I 1
SS e
I( J 1)
From (5), (6) and (7) it follows that SST = SSH + SSe. Also from
(6) it follows that SST stands for the sum of squares of the deviations of the Yijs
from the grand (sample) mean Y. .. Next, from (5) we have that, for each i,
Jj = 1(Yij Yi.)2 is the sum of squares of the deviations of Yij, j = 1, . . . , J within
the ith group. For this reason, SSe is called the sum of squares within groups.
On the other hand, from (7) we have that SSH represents the sum of squares of
the deviations of the group means Yi. from the grand mean Y. . (up to the factor
J). For this reason, SSH is called the sum of squares between groups. Finally,
SST is called the total sum of squares for obvious reasons, and as mentioned
above, it splits into SSH and SSe. Actually, the analysis of variance itself derives
its name because of such a split of SST.
Now, as follows from the discussion in Section 5 of Chapter 16, the
2
quantities SSH and SSe are independently distributed, under H, as 2 I1
and
2 2
2 2
I(J1), respectively. Then SST is IJ1 distributed, under H. We may
summarize all relevant information in a table (Table 1) which is known as an
Analysis of Variance Table.
REMARK 1
EXAMPLE 3
82
83
75
79
78
Y21 =
Y22 =
Y23 =
Y24 =
Y25 =
61
62
67
65
64
Y31 = 78
Y31 = 72
Y33 = 74
Y34 = 75
Y35 = 72
We have then
1 = 79.4,
2 = 63.8,
3 = 74.2
and MSH = 315.5392, MSe = 7.4, so that F = 42.6404. Thus for = 0.05, F2,12;0.05,
= 3.8853 and the hypothesis H : 1 = 2 = 3 is rejected. Of course, 2 = MSe = 7.4.
446
17
Analysis of Variance
Exercise
17.1.1 Apply the one-way layout analysis of variance to the data given in the
table below.
A
10.0
9.1
9.2
11.5
10.3
8.4
11.7
9.4
9.4
i =1
j =1
i = j = 0.
Finally, it is assumed, as is usually the case, that the r. errors eij, i = 1, . . . , I;
j = 1, . . . , J are independent N(0, 2). Thus the assumed model is then
Yij = + i + j + eij ,
where
i =1
j =1
i = j = 0
(10)
17.2
447
is the contribution of the ith fertilized and the jth column effect is the contribution of the jth variety of the commodity in question.
From the preceding two examples it follows that the outcome Yij is affected by two factors, machines and workers in Example 4 and fertilizers and
varieties of agricultural commodity in Example 5. The I objects (machines or
fertilizers) and the J objects (workers or varieties of an agricultural commodity) associated with these factors are also referred to as levels of the factors.
The same interpretation and terminology is used in similar situations throughout this chapter.
In connection with model (10), there are the following three problems to
be solved: Estimation of ; i, i = 1, . . . , I; j, j = 1, . . . , J; testing the hypothesis
HA : 1 = = I = 0 (that is, there is no row effect), HB : 1 = = J = 0 (that is,
there is no column effect) and estimation of 2.
We rst show that model (10) is a special case of the model described in
(6) of Chapter 16. For this purpose, we set
(
e = (e , . . . , e ; e , . . . , e ; . . . ; e
= ( ; , . . . , ; , . . . , )
Y = Y11 , . . . , Y1 J ; Y21 , . . . , Y2 J ; . . . ; YI 1 , . . . , J IJ
1J
11
2J
21
I1
, . . . , e IJ
and
1
1
X =
1
1
1
I 4448
64447
1 0 0 0
1 0 0 0
1 0 0 0
J 4444
6444
47
8
1 0 0 0
0 1 0 0
J
0 0 0 0 1
0 1 0
0 1 0
0 1 0
1 0 0
0 1 0
0 0 0
0
0
0
0
J
1
0
0
0
0
0 0 1
0 0 1
1 0 0
0 1 0
0 0 1
0
J
0 0 1
448
17
Analysis of Variance
n = IJ
with
and
p = I + J + 1.
It can be shown (see also Exercise 17.2.1) that X is not of full rank but rank
X = r = I + J 1. However, because of the two independent restrictions
I
=
i
i =1
= 0,
j =1
imposed on the parameters, the normal equations still have a unique solution,
as is found by differentiation.
In fact,
( )
S Y, = Yij. i j
i =1 j =1
and
S Y, = 0
( )
S Y, = 0
a i
implies
i = Yi . Y. ., where Yi. is given by (2) and (/j)S(Y, ) = 0 implies
j = Y.j Y. ., where
1 I
Yij .
I i =1
Summarizing these results, we have then that the LSEs of , i and j are,
respectively,
Y. j =
= Y. . , i = Yi. Y. . , i = 1, . . . , I , j = Y. j Y. . , j = 1, . . . , J .
(11)
Y. j =
1 I
Yij , j = 1, . . . , J .
I i =1
(12)
EY = = X : 1 , . . . , I ; 1 , . . . , J
Vr , where r = I + J 1.
(Y
ij
i =1 j =1
17.2
A = Y. . = , j, A = Y. j Y. . = j , j = 1, . . . , J .
449
(13)
Therefore relations (28) and (29) in Chapter 16 give by means of (11) and (12)
SC = Y C
= Yij ij ,C
i =1 j =1
) = (Y
2
Yi . Y. j + Y. .
ij
i =1 j =1
and
Sc = Y c
A
2
A
= Yij ij ,c
i =1 j =1
) = (Y
I
ij
Y. j .
i =1 j =1
[(
) (
SC = SSe = Yij Y. j Yi . Y. .
i =1 j =1
I
= Yij Y. j
i =1 j =1
)]
J Yi . Y. .
i =1
(14)
because
I
i =1
j =1
= J Yi . Y. . .
i =1
Therefore
I
i =1
i =1
Sc SC = SS A , where SS A = J 2i = J Yi . Y. .
A
= J Yi 2. IJY.2. .
(15)
i =1
It follows that for testing HA, the F statistic, to be denoted here by FA, is given
by
FA =
(I 1)( J 1) SS
I 1
SSe
MS A
,
MSe
(16)
where
MS A =
SS A
,
I 1
MS e =
SS e
I 1 J 1
)(
and SSA, SSe are given by (15) and (14), respectively. (However, for an expression of SSe to be used in actual calculations, see (20) below.)
450
17
Analysis of Variance
(I 1)( J 1) SS
J 1
SSe
MSB
,
MSe
(17)
j =1
j =1
SSB = Sc SC = I j2 = I Y. j Y. .
B
= I Y. 2j IJY.2. .
(18)
j =1
The quantities SSA and SSB are known as sums of squares of row effects and
column effects, respectively.
Finally, if we set
I
SST = Yij Y. .
i =1 j =1
) = Y
2
2
ij
IJY.2. ,
(19)
i =1 j =1
we show below that SST = SSe + SSA + SSB from where we get
SSe = SST SSA SSB.
(20)
Relation (20) provides a way of calculating SSe by way of (15), (18) and (19).
Clearly,
I
[(
) (
) (
SSe = Yij Y. . Yi . Y. . Y. j Y. .
i =1 j =1
I
= Yij Y. .
i =1 j =1
J
+ I Y. j Y. .
j =1
I
i =1
+ J Yi . Y. .
2
)(
2 Yij Y. . Yi . Y. .
i =1 j =1
)(
)(
2 Yij Y. . Y. j Y. .
i =1 j =1
)]
+ 2 Yi . Y. . Y. j Y. . = SSTT SS A SSB
i =1 j =1
because
I
i =1
j =1
= J Yi . Y. .
i =1
= SS A ,
451
Exercises
Table 2 Analysis of Variance for Two-way Layout with One Observation Per Cell
source of
variance
rows
columns
i =1
i =1
j =1
j =1
SS A = J 2i = J (Y i . Y. . )
SS B = I 2j = I Y. j Y. .
I
residual
degrees of
freedom
sums of squares
SS e = Y ij Y i . Y. j + Y. .
i =1 j =1
SS T = Y ij Y. .
total
i =1 j =1
(Y
ij
i =1 j =1
)(
mean squares
I1
MS A =
SS A
I 1
J1
MS B =
SS B
J 1
( I 1) ( J 1)
MS e =
IJ 1
Y. . Y. j Y. . = Y. j Y. .
j =1
j =1
) (Y
= I Y. j Y. .
SS e
( I 1)( J 1)
ij
Y. .
i =1
= SSB
and
I
(Y
i.
i =1 j =1
)(
Y. . Y. j Y. . = Yi . Y. .
i =1
) (Y
.j
j =1
Y. . = 0.
The pairs SSe, SSA and SSe, SSB are independent 2 2 distributed r.v.s with
certain degrees of freedom, as a consequence of the discussion in Section 5 of
Chapter 16. Finally, the LSE of 2 is given by
2 = MSe.
(21)
Exercises
17.2.1 Show that rank X = I + J 1, where X is the matrix employed in
Section 2.
17.2.2 Apply the two-way layout with one observation per cell analysis of
variance to the data given in the following table (take = 0.05).
452
17
Analysis of Variance
(22)
where
I
= =
i
i =1
j =1
j =1
ij
= ij = 0
i =1
(
e = (e
= (
and
11
, . . . , e11K ; . . . ; e1 J 1 , . . . , e1 JK ; . . . ; e IJ 1 , . . . , e IJK
, . . . , 1 J ; . . . ; I 1 , . . . . IJ
IJ 4444448
64444447
1 0 0 0
K
1 0 0 0
0 1 0 0
K
0 1 0 0
J
6447448
0 0 1 0 0
K
X = 0 0 1 0 0
I 1 J +1
64444
7
44448
0 0 1 0 0
K
0 0 1 0 0
1
0 0
K
0 0
1
453
with n = IJK
and p = IJ,
(22)
so that model (22) is a special case of model (6) in Chapter 16. From the form
of X it is also clear that rank X = r = p = IJ; that is, X is of full rank (see also
Exercise 17.3.1). Therefore the unique LSEs of the parameters involved are
obtained by differentiating with respect to ij the expression
( )
S Y, = Yijk ij .
We have then
i =1 j =1k =1
454
17
Analysis of Variance
ij = Yij. ,
i = 1. . . . , I ;
j = 1, . . . , J .
(23)
Next, from the fact that ij = + i + j + ij and on the basis of the assumptions
made in (22), we have
= . ., i = i. . ., j = .j . ., ij = ij i. .j + . .,
(24)
by employing the dot notation already used in the previous two sections.
From (24) we have that , i, j and ij are linear combinations of the parameters ij. Therefore, by the corollary to Theorem 3 in Chapter 16, they are
, i, j, ij, are given by the above-mentioned linear
estimable, and their LSEs
combinations, upon replacing ij by their LSEs. It is then readily seen that
j = Y. j. Y. . . ,
= Y. . . , i = Yi. . Y. . . ,
ij = Yij. Yi. . Y. j. + Y. . . ,
(25)
i = 1, . . . , I ; j = 1, . . . , J .
ij =
+ i + j, + ij. Therefore
Now from (23) and (25) it follows that
I
SC = Yijk ij
i =1 j =1k =1
Next,
i =1 j =1 k =1
) (
+ ( ) + ( ) + (
Yijk ij = Yijk i j ij +
i
and hence
( )
S Y, = Yijk ij
i =1 j =1k =1
I
+ JK i i
i =1
= Yijk i j ij .
= SC + IJK
J
+ IK j j
j =1
ij
+ ij
+ K ij ij ,
i =1 j =1
(26)
because, as is easily seen, all other terms are equal to zero. (See also Exercise
17.3.2.)
From identity (26) it follows that, under the hypothesis
HA :1 = = I = 0,
the LSEs of the remaining parameters remain the same as those given in (25).
It follows then that
I
i =1
i =1
Sc A = SC + JK i2 , so that Sc A SC = JK i2 .
Thus for testing the hypothesis HA the sum of squares to be employed are
I
= Yijk2 K Yij2 ,
i =1 j =1k =1
and
i =1 j =1
(27)
i =1
i =1
Sc SC = SS A = JK i2 = JK Yi . . Y. . .
A
455
= JK Yi2. . IJKY.2. . .
(28)
i =1
IJ K 1 SS A MS A
=
,
I 1 SS e
MS e
FA =
(29)
where
SS A
,
I 1
MS A =
MS e =
SS e
IJ K 1
IJ K 1 SS B MS B
=
,
J 1 SS e
MS e
FB =
(30)
where
MSB =
SSB
J 1
j =1
j =1
SSB = IK j2 = IK Y. j . Y. . .
and
= IK Y. 2j . IJKY.2. . .
(31)
j =1
i = 1, . . . , I; j = 1, . . . , J,
arguments similar to the ones used before yield the F statistic, which now is
given by
F AB =
where
IJ K 1
SS AB MS AB
=
,
MS e
I 1 J 1 SS e
)(
(32)
456
17
Analysis of Variance
MS AB =
SS AB
I 1 J 1
)(
and
i =1 j =1
= K Yij . Yi . . Y. j . + Y. . . .
i =1 j =1
SS AB = K ij2
2
(33)
(However, for an expression of SSAB suitable for calculations, see (35) below.)
Finally, by setting
I
SST = Yijk Y. . .
i =1 j =1k =1
I
= Yijk2 IJKY.2. . ,
(34)
i =1 j =1k =1
we can show (see Exercise 17.3.3) that SST = SSe + SSA + SSB + SSAB, so that
SSAB = SST SSe SSA SSB.
(35)
Relation (35) is suitable for calculating SSAB in conjunction with (27), (28), (31)
and (34).
Of course, the LSE of 2 is given by
2 = MSe.
(36)
Once again the main results of this section are summarized in a table, Table 3.
The number of degrees of freedom of SST is calculated by those of SSA,
SSB, SSAB and SSe, which can be shown to be independently distributed as 2 2
r.v.s with certain degrees of freedom.
EXAMPLE 6
18
20
50
53
34
36
40
17
X121
X122
X123
X124
X221
X222
X223
X224
= 64
= 49
= 35
= 62
= 40
= 63
= 35
= 63
X131
X132
X133
X134
X231
X232
X233
X234
=
=
=
=
=
=
=
=
61
73
62
90
56
61
58
73
3 = 16.2084;
22 = 1.4166,
1 = 2.5417,
11 = 0.7917,
23 = 2.2084
Exercises
457
Table 3 Analysis of Variance for Two-way Layout with K ( 2) Observations Per Cell
source of
variance
A main effects
B main effects
i =1
i =1
j =1
j =1
SS A = JK i2 = JK (Y i . . + Y. . . )
i =1 j =1
SS e = Y ijk Y ij .
i =1 j =1 k =1
total
SS AB = K 2ij = K Yij . Yi . . Y. j . + Y. . .
i =1 j =1
error
SS B = IK j2 = IK Y. j . Y. . .
I
AB interactions
degrees of
freedom
sums of squares
SS T = Y ijk Y. . .
i =1 j =1 k =1
mean squares
I1
MS A =
SS A
I 1
J1
MS B =
SS B
J 1
( I 1)( J 1)
IJ ( K 1)
MS AB =
MS e =
IJK 1
SS AB
( I 1)( J 1)
SS e
IJ ( K 1)
and
FA = 0.8471,
Thus for = 0.05, we have F1,18;0.05 = 4.4139 and F2,18;0.05 = 3.5546; we accept HA,
reject HB and accept HAB. Finally, we have 2 = 183.0230.
The models analyzed in the previous three sections describe three experimental designs often used in practice. There are many others as well. Some of
them are taken from the ones just described by allowing different numbers of
observations per cell, by increasing the number of factors, by allowing the row
effects, column effects and interactions to be r.v.s themselves, by randomizing
the levels of some of the factors, etc. However, even a brief study of these
designs would be well beyond the scope of this book.
Exercises
17.3.1
17.3.
17.3.2
458
17
Analysis of Variance
17.3.3 Show that SST = SSe + SSA + SSB + SSAB, where SSe, SSA, SSB, SSAB and
SST are given by (27), (28), (31), (33) and (34), respectively.
17.3.4 Apply the two-way layout with two observations per cell analysis of
variance to the data given in the table below (take = 0.05).
110
128
48
123
19
95
117
60
138
94
214
183
115
114
129
217
187
127
156
125
208
183
130
225
114
119
195
164
194
109
) (
1
1
1 + 2 + 3 4 + 5 + 6 ,
3
3
1
1
or
1 + 3 + 5 2 + 4 + 6 etc.
3
3
We observe that these quantities are all of the form
i j , i j, or
) (
ci i
i =1
with
ci = 0.
i =1
Any linear combination = iI= 1 cii of the s, where ci, i = 1, . . . , I are known
constants such that iI= 1ci = 0, is said to be a contrast among the parameters i,
i = 1, . . . , I.
Let = iI= 1cii be a contrast among the s and let
459
I
1 I
= ci Yi ., 2 = ci2 MSe and S 2 = I 1 FI 1,n I ;a ,
J i =1
i =1
where n = IJ. We will show in the sequel that the interval [ S ( ), +
S ( )] is a condence interval with condence coefcient 1 for all contrasts . Next, consider the following denition.
( )
DEFINITION 2
THEOREM 1
= ci i ,
i =1
c i = 0,
i =1
so that
1 I
2 = ci2 MSe ,
J i =1
where MSe is given in Table 1. Then the interval [ S ( ), + S ( )] is a
condence interval simultaneously for all contrasts with condence coefcients 1 , where S2 = (I 1)FI 1,nI; and n = IJ.
( )
f c1 , . . . , c I =
1 I 2
ci
J i =1
c (Y
i
i =1
i.
ci = 0.
i =1
Now, clearly, f(c1, . . . , cI) = f(c1, . . . , cI) for any > 0. Therefore the maximum (minimum) of f(c1, . . . , cI), subject to the restraint
I
ci = 0,
i =1
is the same with the maximum (minimum) of f( c1, . . . , cI) = f(c1, . . . , cI),
ci = ci, i = 1, . . . , I subject to the restraints
460
17
Analysis of Variance
c = 0
i
i =1
and
1 I 2
ci = 1.
J i =1
q c1 , . . . , c I = ci Yi . i ,
i =1
ci = 0
ci2 = J .
and
i =1
i =1
Thus the points which maximize (minimize) q(c1, . . . , cI) are to be found on
the circumference of the circle which is the intersection of the sphere
I
ci2 = J
i =1
ci = 0
i =1
which passes through the origin. Because of this it is clear that q(c1, . . . , cI) has
both a maximum and a minimum. The solution of the problem in question will
be obtained by means of the Lagrange multipliers. To this end, one considers
the expression
I
I
I
h = h c1 , . . . , c I ; 1 , 2 = ci Yi . i + 1 ci + 2 ci2 J
i =1
i =1
i =1
h
= Yk . k + 1 + 2 2 ck = 0, k = 1, . . . , I
ck
I
h
= ci = 0
1 i =1
h
2
= c i J = 0.
2 i =1
(37)
ck =
1
k Yk . 1
2 2
k = 1, . . . , I .
(38)
461
1 = . Y. .
2 =
and
2 J
Replacing these values in (38), we have
J k . Yk . + Y. .
ck =
k = 1, . . . , I .
. Yi . + Y. . .
i =1
J i =1 i . Yi . + Y. .
Next,
I
(Y
k.
k =1
)(
)[(
) + ( Y ) (
k k . Yk . + Y. . = k Yk .
k =1
I
= k Yk .
k =1
) (
Yk . . Y. .
I
..
Therefore
J i =1 i . Yi . + Y. .
I
[(
) (
I
i =1 i
c Yi . i
)]
0.
1 J i =1 ci2
I
J i =1 i . Yi . + Y. .
I
Yk .
2
I . Y. .
= k Yk . . Y. .
k =1
)]
k =1
I
= k Yk .
k =1
I
(39)
ci = 0.
i =1
J i . Yi . + Y. .
i =1
[(
) (
= J Yi . Y. . i .
i =1
)]
is 2 2I1 distributed (see also Exercise 17.4.1) and also independent of SSe
which is 2 2nI distributed. (See Section 17.1.) Therefore
J i =1 i . Yi . + Y. .
I
) (I 1)
MSe
is FI1,nI distributed and thus
(I 1)F
I
= P J i =1 i . Yi . + Y. .
MSe J i =1 i . Yi . + Y. .
I
I 1 , n I ;
(I 1)F
I 1 , n I ;a
MSe = 1 .
(40)
462
17
Analysis of Variance
(I 1)F
(I 1)F
1 J i = 1ci2 MSe = 1 ,
I 1 , n I ;
I 1 , n I ;
( )]
( )
P S + S = 1 ,
for all contrasts , as was to be seen. (This proof has been adapted from the
paper A simple proof of Scheffs multiple comparison theorem for contrasts
in the one-way layout by Jerome Klotz in The American Statistician, 1969,
Vol. 23, Number 5.)
In closing, we would like to point out that a similar theorem to the one just
proved can be shown for the two-way layout with (K 2) observations per cell
and as a consequence of it we can construct condence intervals for all contrasts among the s, or the s, or the s.
Exercises
17.4.1
17.4.2 Refer to Exercise 17.1.1 and construct condence intervals for all
contrasts of the s (take 1 = 0.95).
18.1 Introduction
463
Chapter 18
18.1 Introduction
In this chapter, we introduce the Multivariate Normal distribution and establish some of its fundamental properties. Also, certain estimation and independence testing problems closely connected with it are discussed.
Let Yj, j = 1, . . . , m be i.i.d. r.v.s with common distribution N(0, 1).
Then we know that for any constants cj, j = 1, . . . , m and the r.v. mj= 1 cjYj +
is distributed as N(, mj= 1 c2j ). Now instead of considering one (nonhomogeneous) linear combination of the Ys, consider k such combinations;
that is,
m
X i = cij Yj + i ,
i = 1, . . . , k,
(1)
j =1
or in matrix notation
X = CY + ,
(2)
where
X = X1 , . . . , X k ,
Y = Y1 , . . . , Ym ,
DEFINITION 1
and
( )( )
= ( , . . . , ) .
C = c ij k m ,
1
463
464
18
From (2) and relation (10), Chapter 16, it follows that EX = and / x =
C
/ YC = CImC = CC; that is,
EX = ,
/ x or just / = CC .
(3)
We now proceed to nding the ch.f. x of the r. vector X. For t = (t1, . . . , tk)
k, we have
()
)]
k
t CY = t j c j 1 , . . . ,
j =1
t j c jm (Y1 , . . . , Ym )
(4)
j =1
k
= t j c j 1 Y1 + + t j c jm Ym
j =1
j =1
and hence
E exp it CY = Y t j c j 1 + + Y t j c jm
j =1
j =1
because
2
2
1 k
1 k
= exp t j c j 1 t j c j m
2 j =1
2 j =1
= exp t CCt
2
(5)
t
c
+
+
j j1
t j c j m = t CCt.
j =1
j =1
The ch.f. of the r. vector X = (X1, . . . , Xk), which has the k-Variate Normal
distribution with mean and covariance matrix / , is given by
1
(6)
x t = exp it t / t .
2
()
X ~ N , / .
18.1 Introduction
exp x / 1 x , x k ,
2
() ( )
fX x = 2
PROOF
k 2
1 2
465
(7)
gives
Y = C 1 X .
Therefore
[ (
()
)] C
fX x = fY C 1 x
( )
= 2
k 2
1 k
exp y 2j C 1 .
2 j =1
But
y = ( x ) (C
k
2
j
j =1
C 1 = C
and (C ) (C ) = (C ) C = (CC ) = / .
1
(8)
Therefore
() ( )
fX x = 2
1
C 1 exp x / 1 x
2
k 2
) .
() ( )
fX x = 2
k 2
1 2
1
exp x / 1 x
2
) ,
as was to be seen.
A k-Variate Normal distribution with p.d.f. given by (7) is called
a non-singular k-Variate Normal. The use of the term non-singular corre/ | 0; that is, the fact that / is of full rank.
sponds to the fact that |
REMARK 2
COROLLARY 1
In the theorem, let k = 2. Then X = (X1, X2) and the joint p.d.f. of X1, X2 is the
Bivariate Normal p.d.f.
By Remark 1, both X1 and X2 are normally distributed and let X1~
N(1, 21) and X2 N(2, 22). Also let be the correlation coefcient of X1 and
X2. Then their covariance matrix / is given by
PROOF
12
/ =
1 2
1 2
22
and hence |
/ | = 21 22(1 2), so that
/ 1 =
12 22 1 2
22
1 2
1 2
.
12
466
18
Therefore
12 22 1 2 x / 1 x
)(
22
= x1 1 , x 2 2
1 2
1 2 x1 1
21 x 2 2
x 1
= x1 1 22 x 2 2 1 2 , x1 1 1 2 + x 2 2 12 1
x2 2
((
)(
) )
= x1 1 22 2 x1 1 x 2 2 1 2 + x 2 2 12 .
Hence
fX
1 ,X 2
(x , x ) =
1
1
2 1 2
as was to be shown.
COROLLARY 2
exp
2
2
1
2 1
x1 1
2
x 2
2
x1 1 x 2 2 + 2
,
1 2
2
)(
The (normal) r.v.s Xi, i = 1, . . . , k are independent if and only if they are
uncorrelated.
The r.v.s Xi, i = 1, . . . , k are uncorrelated if and only if / is a
diagonal matrix and its diagonal elements are the variances of the Xs. Then
/ | = 21 2n. On the other hand, |
/ | / 1 is also a diagonal matrix with the jth
|
2
diagonal element given by ij i , so that / 1 itself is a diagonal matrix with the
jth diagonal element being given by 1/ 2j. It follows that
PROOF
(x , . . . , x ) =
k
fX
1 , . . . , Xk
i =1
1
exp
xi i
2
2 i
2 i
1
)
2
Exercises
18.1.1 Use Denition 1 herein in order to conclude that the LSE of in (9)
of Chapter 16 has the n-Variate Normal distribution with mean and
covariance matrix 2S1. In particular, ( 1, 2), given by (19) and (19) of
467
Chapter 16, have the Bivariate Normal distribution with means and variances
E 1 = 1, E 2 = 2 and
( )
1 =
2
2 j = 1 x 2j
n
nj = 1 x j x
n
( )
, 2 2 =
(x
n
j =1
x
n x
j =1
n
j =1
18.1.2
.
2
j
PROOF
For t m, we have
()
[ ( )] (
( )
( )
1
1
Y t = expi A t A t / A t = expit A t A/ A t
2
2
and this last expression is the ch.f. of the m-Variate Normal with mean A and
/ A, as was to be seen. The particular case follows from
covariance matrix A
the general one just established.
()
THEOREM 4
( )
( ) ( )
( )
X = c j X j is N c j j , c 2j / j
j =1
j =1
j =1
468
18
()
()
( )
X t = c X t = Xj c j t .
But
j =1
j =1
1
X c j t = expi c jt j c jt / j c jt
2
1
= expit c j j t c 2j / j t ,
2
( )
( )
(
so that
( ) ( )
n
1 n
X t = exp it c j j t c 2j / j t.
2 j =1
j = 1
()
COROLLARY
1 n
Xj.
n j =1
is N(
, (1/n)
Then X
/ ).
PROOF
THEOREM 5
, / ) and set Q = (X )
/ 1(X ).
Let X = (X1, . . . , Xk) be non-singular N(
2
Then Q is an r.v. distributed as k.
PROOF
For t , we have
exp x / 1 x dx
2
()
k
(2 )
= 1 2it
k 2
1 2
k
)( )
k 2
1 2
exp x / 1 x 1 2it dx
2
) ( )
k 2
k 2
)(
1 2
/
1 2it
1
1
/
exp x
x
2
1 2it
since
) dx,
k
/
= 1 2it / .
1 2it
Now the integrand in the last integral above can be looked upon as the p.d.f.
of a k-Variate Normal with mean and covariance matrix / /(1 2it). Hence
18.3
469
the integral is equal to one and we conclude that Q (t) = (1 2it)k/2 which is the
ch.f. of 2k.
Notice that Theorem 5 generalizes a known result for the onedimensional case.
REMARK 4
Exercise
18.2.1 Consider the k-dimensional random vectors Xn = (X1n, . . . , Xkn), n =
1, 2, . . . and X = (X1, . . . , Xk) with d.f.s Fn, F and ch.f.s n, , respectively.
Then we say that {Xn} converges in distribution to X as n , and we write
Xn
d X, if Fn ( x)
F ( x) for all x k for which F is continuous (see
n
n
also Denition 1(iii) in Chapter 8). It can be shown that a multidimensional
version of Theorem 2 in Chapter 8 holds true. Use this result (and also
d
X, if and only if
Theorem 3 in Chapter 6) in order to prove that X n
n
k
d
d
X, for every = (, . . . , k) . In particular, X n
X n
X,
n
n
, / ) if and only if {
Xn} converges in distribution
where X is distributed as N(
and
as n , to an r.v. Y which is distributed as Normal with mean
/ for every k.
variance
, / )
For j = 1, . . . , n, let Xj = (Xj1, . . . , Xjk) be independent, non-singular N(
r. vectors and set
1 n
X = X 1 , . . . , X k , where X i = X ji , i = 1, . . . , k,
n j =1
and
( )
)(
Then
and S are sufcient for (
, / );
i) X
470
18
( )
fX ,Y x, y =
1
2 1 2 1
1
q=
1 2
e q 2 ,
2
x 2
x 1 y 2 y 2
1
.
+
2
1
1 2 2
2 1 2
1 2
e Q 2 ,
where
n
Q = qj
j =1
and
1
qj =
1 2
2
x 2
x j 1 y j 2 y j 2
1
j
2
+
,
1
1 2 2
j = 1, . . . , n. (9)
For testing H, we are going to employ the LR test. And although the
MLEs of the parameters involved are readily given by Theorem 6, we
) for log f(
)
choose to derive them directly. For this purpose, we set g(
considered as a function of the parameter , where the parameter space
is given by
= = 1 , 2 , 12 , 22 , 5 ; 1 , 2 ; 12 , 22 > 0; = 0 .
We have
() (
g = g = g 1 , 2 , 12 , 22 , ; x1 , . . . , xn , y1 , . . . , yn
= n log 2
n
n
n
1
log 12 log 22 log 1 2 q j ,
2
2
2
2 j =1
(10)
18.3
471
1
y
x
2
1 =
2
1
2
1
.
1
1
2 =
x
y.
1
2
1
2
Sy =
1
yj y
n j =1
(12)
2
1 n
xj x ,
n j =1
1 n
and Sxy = x j x y j y .
n j =1
Sx =
(11)
)(
(13)
Then, differentiating g with respect to 21 and 22, equating the partial derivatives to zero and replacing 1 and 2 by 1 and 2, respectively, we obtain after
some simplications
1
Sx
Sxy = 1 2
1 2
12
1
2
Sy
Sxy = 1 .
1 2
22
(14)
1
2
1
1
Sx
S xy + 2 S y +
S xy = 0.
2
2
1 2
2 1 2
1 1
(15)
In (14) and (15), solving for 21, 22 and , we obtain (see also Exercise 18.3.4)
12 = S x , 22 = S y , =
S xy
(16)
S x Sy
It can further be shown (see also Exercise 18.3.5) that the values of the
parameters given by (12) and (16) actually maximize f (equivalently, g) and the
maximum is given by
e 1
=
max f ; = L
S 2xy
S
S
1
x y
Sx Sy
[()
] ()
(17)
472
18
It follows that the MLEs of 1, 2, 21, 22 and , under , are given by (12) and
(16), which we may now denote by 1,, 2,, 21,, 22, and . That is,
1, = x , 2 , = y , 12, = Sx , 22 , = Sy , =
Sxy
(18)
Sx Sy
Under (that is, for = 0), it is seen (see also Exercise 18.3.6) that the MLEs
of the parameters involved are given by
1, = x , 2 , = y , 12, = Sx , 22 , = Sy
(19)
and
e 1
max f ; = L =
2 S S
x y
[()
] ()
(20)
Replacing the xs and ys by Xs and Ys, respectively, in (17) and (20), we have
that the LR statistic is given by
S2
= 1 XY
S X SY
n 2
= 1 R2
n 2
(21)
)(
R = X j X Yj Y
j =1
(X
j =1
) (Y Y )
2
(22)
j =1
(23)
In (23), in order to be able to determine the cut-off point c0, we have to know
the distribution of R under H. Now although the p.d.f. of the r.v. R can be
derived, this p.d.f. is none of the usual ones. However, if we consider the
function
( )
W =W R =
n 2R
1 R2
(24)
Reject H
(25)
18.3
473
)(
R x = x j x Yj Y
j =1
(x
j =1
) (Y Y )
2
(26)
j =1
Let also
vj = x j x
) (x
j =1
so that
n
= 0 and
j =1
Let also Wx = n 2R x
2
j
= 1.
(27)
j =1
R x = R v = v j Yj
j =1
2
j
nY 2 ,
j =1
Wx = W v = v j Yj
j =1
2
n
n
Y j2 nY 2 v j Yj
j =1
j =1
( n 2) .
(28)
We have that Yj, j = 1, . . . , n are independent N(2, 22). Now if we consider the
N(0, 1) r.v.s Yj = (Yj 2)/ 2, j = 1, . . . , n and replace Yj by Yj in (28), it is seen
(see also Exercise 18.3.9) that Wx = W*v remains unchanged. Therefore we may
assume that the Ys are themselves independent N(0, 1). Next consider the
transformation
1
1
Y1 + +
Yn
Z1 =
n
n
Z = v Y + + v Y .
1 1
n n
2
Then because of (1/n)2 + + (1/n)2 = n/n = 1 and also because of (27),
this transformation can be completed to an orthogonal transformation (see
Theorem 8.I(i) in Appendix I) and let Zj, j = 3, . . . , n be the remaining Zs.
Then by Theorem 5, Chapter 9, it follows that the r.v.s Zj, j = 1, . . . , n
are independent N(0, 1). Also nj= 1 Y 2j = nj= 1 Z2j by Theorem 4, Chapter 9. By
means of the transformation in question, the statistic in (28) becomes
Z2 jn= 3 Z j2 ( n 2) . Therefore the distribution of Wx, equivalently the
distribution of W, given Xj = xj, j = 1, . . . , n is tn2. Since this distribution is
independent of the xs, it follows that the unconditional distribution of W is
tn2. Thus we have the following result.
THEOREM 7
474
18
()
1 r 2 ( n 2 ) 2 , 1 < r < 1.
PROOF From W = n 2 r/1 R2, it follows that R and W have the same
sign, that is, RW 0. Solving for R, one has then R = W/W 2 + n 2. By setting
w = n 2 r/1 r2, one has dw/dr = n 2(1 r2)3/2, whereas
fW
Therefore
n 1
2
w =
1
n 2 n 2
2
( )
w2
1
+
n 2
n 1 2
n 1 2
, w .
n 2 r dw
fR r = fW
1 r 2 dr
()
n 1
2
=
1
n 2 n 2
2
n 2 r2
1 +
n 2 1 r2
)(
n 2 1 r2
3 2
n 1
2
1 r 2 ( n 2 ) 2 ,
=
1
n 2
2
as was to be shown.
The p.d.f. of R when 0 can also be obtained, but its expression is rather
complicated and we choose not to go into it.
We close this chapter with the following comment. Let X be a k, / ). Then its ch.f. is given by (6).
dimensional random vector distributed as N(
Furthermore, if / is non-singular, then the N(, / ) distribution has a p.d.f.
which is given by (7). However, this is not the case if / is singular. In this latter
case, the distribution is called singular, and it can be shown that it is concentrated in a hyperplane of dimensionality less than k.
18.1 Introduction
Exercises
475
Exercises
18.3.1
18.3.2
18.3.3
18.3.4 Show that 21, 22, and given by (16) is indeed the solution of the
system of the equations in (14) and (15).
18.3.5
2
g
i j
()
,
=
where
= 1 , 2 , 12 , 22 ,
= 1 , 2 , 12 , 22 ,
and 1, 2, 21, 22 and are given by (12) and (16). Let D = (dij), i, j = 1, . . . , 5
and denote by D5k the determinant obtained from D by deleting the last k
rows and columns, k = 1, . . . , 5, D0 = 1. Then show that the six numbers D0,
D1, . . . , D5 are alternately positive and negative. This result together with the
fact that dij, i, j = 1, . . . , 5 are continuous functions of implies that the
quantities given by (18) are, indeed, the MLEs of the parameters 1, 2, 21, 22
and , under . (See for example Mathematical Analysis by T. M. Apostol,
Addison-Wesley, 1957, Theorem 7.9, pp. 151152.)
18.3.6 Show that the MLEs of 1, 2, 21, and 22, under , are indeed given
by (19).
18.3.7 Show that 2 1, where is the sample correlation coefcient given
by (22).
18.3.8 Verify relation (28).
18.3.9 Show that the statistic Wx (= W*v ) in (28) remains unchanged if the
r.v.s Yj are replaced by the r.v.s
Y j =
Yj 2
, j = 1, . . . , n.
2
and S dened in Theorem 6 and, by using
18.3.10 Refer to the quantities X
Basus theorem (Theorem 3, Chapter 11), show that they are independent.
476
19
Quadratic Forms
Chapter 19
Quadratic Forms
19.1 Introduction
In this chapter, we introduce the concept of a quadratic form in the variables
xj, j = 1, . . . , n and then conne attention to quadratic forms in which the xjs
are replaced by independent normally distributed r.v.s Xj, j = 1, . . . , n. In this
latter case, we formulate and prove a number of standard theorems referring
to the distribution and/or independence of quadratic forms.
A quadratic form, Q, in the variables xj, j = 1, . . . , n is a homogeneous
quadratic (second degree) function of xj, j = 1, . . . , n. That is,
n
Q = cij xi x j ,
i =1 j =1
where here and in the sequel the coefcients of the xs are always assumed to
be real-valued constants. By setting x = (x1, . . . , xn) and C = (cij), we can write
Q = xCx. Now Q is a 1 1 matrix and hence Q = Q, or (xCx) = xCx = xCx.
Therefore Q = 12 (xCx + xCx) = xAx, where A = 12 (C + C); that is to say, if
A = (aij), then aij = 12 (cij + cji), so that aij = aji. Thus A is symmetric. We can then
give the following denition.
DEFINITION 1
Q = cij xi x j ,
(1)
i =1 j =1
476
(2)
477
DEFINITION 3
The quadratic form Q = xCx is called positive denite if xCx > 0 for every
x 0; it is called negative denite if xCx < 0 for every x 0 and positive
semidenite if xCx 0 for every x. A symmetric n n matrix C is called positive
denite, negative denite or positive semidenite if the quadratic form associated with it, Q = xCx, is positive denite, negative denite or positive
semidenite, respectively.
DEFINITION 4
C = C.
where
(Cochran) Let
k
X X = Qi ,
(3)
i =1
where for i = 1, . . . , k, Qi are quadratic forms in X with rank Qi = ri. Then the
r.v.s Qi are independent 2r if and only if ki= 1ri = n.
i
(i ) (i )
(i )
(i ) (i )
(i )
Qi = 1 b11 X 1 + + b1n X n + + r b r 1 X 1 + + b r n X n ,
(4)
k
where 1(i), . . . , (i)
r are either 1 or 1. Now i = 1ri = n and let B be the n n
matrix dened by
i
478
19
Quadratic Forms
b(1) b(1)
1n
11
b(1) b(1)
rn
r1
B=
.
k)
(
(k )
b11 b1n
(k )
(k )
br 1 br n
1
Qi = (BX )D(BX ),
(5)
i =1
Qi = X X
(BX)D(BX) = X (BDB)X.
and
i =1
X X = X BDB X
identically in X.
Hence BDB = In. From the denition of D, it follows that ||D|| = 1, so that rank
D = n. Let r = rank B. Then, of course, r n. Also n = rank In = rank (BDB)
r, so that r = n. It follows that B is nonsingular and therefore the relationship
BDB = In implies D = (B)1B1 = (BB)1. On the other hand, for any
nonsingular square matrix M, MM is positive denite (by Theorem 10.I(ii)
in Appendix I) and so is (MM)1. Thus (BB)1 is positive denite and hence
so is D. From the form of D, it follows then that all diagonal elements of
D are equal to 1, which implies that D = In and hence BB = In; that is to
say, B is orthogonal. Set Y = BX. By Theorem 5, Chapter 9, it follows that, if
Y = (Y1, . . . , Yn), then the r.v.s Yj, j = 1, . . . , n are independent N(0, 1). Also
the fact that D = In and the transformation Y = BX imply, by means of (4), that
Q1 is equal to the sum of the squares of the rst r1, Ys, Q2 is the sum of the
squares of the next r2 Ys, . . . , Qk is the sum of the squares of the last rk Ys.
It follows that, for i = 1, . . . , k, Qi are independent 2r . The proof is
completed.
i
APPLICATION 1
2
2
n Z
n Z Z
Zj
j
=
+
j =1
j =1
equivalently,
) ;
2
j =1
j =1
X j2 = X j X
479
+ n X .
1
2
Now
2
1 n
n X = X j = X C 2 X,
n j =1
2
1
2
where C2 has its elements identically equal to 1/n, so that rank C2 = 1. Next it
can be shown (see also Exercise 19.2.1) that
n
(X
j =1
= X C1X,
where C1 is given by
n1 n
1 n
C1 =
1 n
1 n
(n 1)
1 n
1 n
1 n
n 1 n
and that rank C1 = n 1. Then Theorem 1 applies with k = 2 and gives that
)2 and (nX
)2 are independent distributed as 2n1 and 21, respec nj= 1(Xj X
)2 is distributed as 2n1 and is
tively. Thus it follows that (1/ 2)nj= 1(Zj Z
.
independent of Z
The following theorem refers to the distribution of a quadratic form in the
independent N(0, 1) r.v.s Xj, j = 1, . . . , n. Namely,
THEOREM 2
rank C + rank I n C = n.
(6)
Also
X X = X CX + X I n C X.
(7)
Then Theorem 1 applies with k = 2 and gives that XCX is 2r (and also
X(In C)X is 2nr).
Assume now that Q = XCX is 2r . Then we rst show that rank C = r. By
Theorem 11.I(iii) in Appendix I, there exists an orthogonal matrix P such that
if Y = P1X (equivalently, X = PY), then
( ) ( )
Q = X CX = PY C PY = Y P CP Y = j Y j2 ,
j =1
(8)
480
19
Quadratic Forms
[(1 2i t ) (1 2i t )]
1
1 2
(9)
(1 2it )
r 2
(10)
From (8)(10), one then has that (see also Exercise 19.2.2)
1 = = m = 1
m = r.
and
(11)
It follows then that rank C = r. We now show that C2 = C. From (8), one has
that PCP is diagonal and, by (11), its diagonal elements are either 1 or 0.
Hence PCP is idempotent. Thus
) = (PCP)(PCP)
= P C(PP )CP = P CI CP = P C P.
P CP = P CP
That is,
P CP = P C 2P.
1
(12)
Refer to Application 1. It can be shown (see also Exercise 19.2.3) that C1 and
)2 and (n 1/2X
)2,
C2 are idempotent. Then Theorem 2 implies that nj=1(Xj X
or equivalently
1
2
(Z
j =1
and
n Z
PROOF
COROLLARY 2
481
(PCP)
( )
= P C PP CP = P CCP = P CP
THEOREM 3
Suppose that XX = Q1 + Q2, where Q1, Q2 are quadratic forms in X, and let Q1
be 2r . Then Q2 is 2nr and Q1, Q2 are independent.
1
Q2 = X X Q1 = X X X C1X = X I n C1 X
and (In C1) = In 2C1 + C = In C1, that is, In C1 is idempotent. Also rank
C1 + rank (In C1) = n by Theorem 12.I(iii) in Appendix I, so that rank (In C1)
= n r1. We have then that rank Q1 + rank Q2 = n, and therefore Theorem 1
applies and gives the result.
2
APPLICATION 3
2
1
independent of Z .
The following theorem is also of interest.
THEOREM 4
Qi = Y Bi Y,
Bi = P Ci P, i = 1, 2.
where
(Y , . . . ,Y )(Y , . . . ,Y ) = Y
1
2
j
= Q1 + Q2 .
(13)
j =1
482
19
Quadratic Forms
follows that Q*1 and Q*2 are functions of Yj, j = 1, . . . , r only. From the
orthogonality of P, we have that the r.v.s Yj, j = 1, . . . , r are independent
N(0, 1). On the other hand, Q*1 is 2r by Corollary 2 to Theorem 2. These
facts together with (13) imply that Theorem 3 applies (with n = r) and provides
the desired result.
1
Suppose that Q = ki= 1Qi, where Q and Qi, i = 1, . . . , k ( 2) are quadratic forms
in X. Furthermore, let Q be 2r, let Qi be 2r , i = 1, . . . , k 1 and let Qk be
positive semidenite. Then Qk is 2r , where
i
k 1
rk = r ri ,
and
Qi , i = 1, . . . , k
i =1
are independent.
The proof is by induction. For k = 2 the conclusion is true by Theorem 4. Let the theorem hold for k = m and show that it also holds for m + 1. We
write
PROOF
m 1
Q = Qi + Qm ,
where
i =1
Qm = Qm + Qm +1 .
m 1
rm = r ri ,
i =1
and
Q1 , . . . , Qm 1 , Qm
are independent, by the induction hypothesis. Thus Q*m = Qm + Qm+1, where Q*m
is 2r* , Qm is 2r and Qm+1 is positive semidenite. Once again Theorem 4 applies
and gives that Qm+1 is 2r , where
m
m+1
rm +1 = rm rm = r ri ,
i =1
and that Qm and Qm+1 are independent. It follows that Qi, i = 1, . . . , m + 1 are
also independent and the proof is concluded.
The theorem below gives a necessary and sufcient condition for independence of two quadratic forms. More precisely, we have the following
result.
THEOREM 6
483
C 1 I n C 1 C 2 = C 2 I n C 1 C 2 = 0.
Then Theorem 12.I(iii), in Appendix I, implies that
(14)
X X = X C1X + X C 2 X + X I n C1 C 2 X.
(15)
Then relations (14), (15) and Theorem 1 imply that XC1X = Q1, XC2X = Q2
(and X(In C1 C2)X) are independent.
Let now Q1, Q2 be independent. Since Q1 is 2r and Q2 is 2r , it follows that
Q1 + Q2 = X(C1 + C2)X is 2r +r . That is, X(C1 + C2)X is a quadratic form in X
distributed as 2r +r . Thus C1 + C2 is idempotent by Theorem 2. So the matrices
C1, C2 and C1 C2 are all idempotent. Then Theorem 12.I(iv), in Appendix I,
applies and gives that C1C2 (= C2C1) = 0. This concludes the proof of the
theorem.
1
REMARK 1 Consider the quadratic forms XC1X and XC2X guring in Applications 13. Then, according to the conclusion reached in discussing those
applications, XC1X and XC2X are distributed as 2n1 and 21, respectively.
This should imply that C1C2 = 0, by Theorem 6. This is, indeed, the case as is
easily seen.
Exercises
19.2.1
(X
j =1
= X C1X
as asserted there.
19.2.2
19.2.3 Refer to Application 2 and show that the matrices C1 and C2 are both
idempotent.
+ e, where X is of full rank
19.2.4 Consider the usual linear model Y = X
p, and let = S1XY be the LSE of . Write Y as follows: Y = X + (Y X )
and show that:
ii) ||Y||2 = YXS1XY + ||Y X ||2;
ii) The r.v.s YXS1XY and ||Y X ||2 are independent, the rst being distributed as noncentral 2p and the second as 2np.
484
19
Quadratic Forms
19.2.5 Let X1, X2, X3 be independent r.v.s distributed as N(0, 1) and let the
r.v. Q be dened by
1
5 X 12 + 2 X 22 + 5 X 23 + 4 X 1 X 2 2 X 1 X 3 + 4 X 2 X 3 .
6
Then nd the distribution of Q and show that Q is independent of the r.v.
3j = 1X 2j Q.
Q=
20.1
Nonparaetric Estimation
485
Chapter 20
Nonparametric Inference
. This is
consistent, that is, consistent in the probability sense. Thus X n
n
n is strongly consistent,
so by the WLLNs. It has also been mentioned that X
a.s.
namely, X n n
. This is justied on the basis of the SLLNs.
Let us suppose now that the Xs also have nite (and positive) variance 2
which presently is assumed to be known. Then, according to the CLT,
n Xn
) d Z N 0, 1 .
( )
n
486
20
Nonparametric Inference
n Xn
P z 2
z 2
= P X n z 2
X n + z
1 ,
n n
so that [Ln, Un] is a condence interval for with asymptotic condence coefcient 1 ; here
Ln = L X 1 , . . . , X n = X n z
and
U n = U X 1 , . . . , X n = X n + z
Next, suppose that 2 is unknown and set S 2n for the sample variance of the
Xs; namely,
S 2n =
2
1 n
X j Xn .
n j =1
Then the WLLNs and the SLLNs, properly applied, ensured that S 2n, viewed
as an estimator of 2, was both a weakly and strongly consistent estimator of 2.
Also by the corollary to Theorem 9 of Chapter 8, it follows that
n Xn
Sn
) d Z N 0, 1 .
( )
n
By setting
Ln = L X 1 , . . . , X n = X n z
Sn
2
and
U n = U X 1 , . . . , X n = X n + z
Sn
,
n
we have that [L*n, U*
n ] is a condence interval for with asymptotic condence
coefcient 1 .
Clearly, the examples mentioned so far are cases of nonparametric
point and interval estimation. A further instance of point nonparametric
estimation is provided by the following example. Let F be the (common
and unknown) d.f. of the Xis and set Fn for their sample or empirical d.f.; that
is,
2
20.2 Nonparametric
20.1 Nonparaetric
Estimation Estimation
of a P.D.F.
( )
Fn x; s =
() ]
()
1
the number of X 1 s , . . . , X n s x , x , s S .
n
487
(1)
We often omit the random element s and write Fn(x) rather than Fn(x; s). Then
it was stated in Chapter 8 (see Theorem 6) that
( )
()
Fn x; a.s.
F x
n
uniformly in
x .
(2)
Thus Fn(x; ) is a strongly consistent estimator of F(x) and for almost all s S
and every > 0, we have
( )
()
( )
Fn x; s F x Fn x; s + ,
()
f x = lim
) (
).
F x+h F xh
h 0
2h
Thus, for (0 <) h sufciently small, the quantity [F(x + h) F(x h)]/2h should
be close to f(x). This suggests estimating f(x) by the (known) quantity
()
fn x =
However,
).
Fn x + h Fn x h
2h
488
20
Nonparametric Inference
()
fn x =
Fn x + h Fn x h
2h
1 the number of X1 , . . . , X n x + h
=
2h
n
the number of X1 , . . . , X n x h
1 the number of X1 , . . . , X n in x h, x + h
;
2h
n
that is
1 the number of X1 , . . . , X n in x h, x + h
fn x =
n
2h
and it can be further easily seen (see Exercise 20.2.1) that
()
1 n x Xj
fn x =
,
K
nh j=1 h
()
(3)
1
, if x 1, 1
K x = 2
0, otherwise.
Thus the proposed estimator fn(x) of f(x) is expressed in terms of a known
p.d.f. K by means of (3). This expression also suggests an entire class of
estimators to be introduced below. For this purpose, let K be any p.d.f. dened
on into itself and satisfying the following properties:
()
{ ()
lim xK x = 0 as x
K x = K x , x .
( )
()
(4)
()
hn
0.
n
(5)
For each x and by means of K and {hn}, dene the r.v. fn(x; s), to be
shortened to fn(x), as follows:
()
1
fn x =
nhn
x Xj
.
hn
K
j =1
(6)
Let Xj, j = 1, . . . , n be i.i.d. r.v.s with (unknown) p.d.f. f and let K be a p.d.f.
satisfying conditions (4). Also, let {hn} be a sequence of positive constants
20.2 Nonparametric
20.1 Nonparaetric
Estimation Estimation
of a P.D.F.
489
satisfying (5) and for each x let fn(x) be dened by (6). Then for any x
at which f is continuous, the r.v. fn(x), viewed as an estimator of f(x), is
asymptotically unbiased in the sense that
( ) n ( )
Efn x
f x .
Now let {hn} be as above and also satisfying the following requirement:
nhn
.
(7)
Under the same assumptions as those in Theorem 1 and the additional condition (7), for each x at which f is continuous, the estimator fn(x) of f(x) is
consistent in quadratic mean in the sense that
() ()
E fn x f x
0.
n
( ) [ ( )] d Z N 0, 1 .
( )
n
[ f ( x )]
fn x E fn x
n
(8)
Under the same assumptions as those in Theorem 1 and also condition (8),
()
()
s.
fn x a.
f x
n
uniformly in x ,
()
K x =
1
2
ex
490
20
Nonparametric Inference
Then, clearly,
()
ex
{ ()
xe x
< xe x 2 = x e x 2 .
Now consider the expansion et = 1 + tet for some 0 < < 1, and replace t by
x/2. We get then
1
0
1 x 2 x
1 x + e
2
2
2
0.
0, so that
and therefore xe x 2
In
a
similar
way xe x 2
x
x
lim xK(x) = 0 as x . Since also K(x) = K(x), condition (4) is satised.
0, nhn2 = n n1 2 = n1 2
and
Let us now take hn = 1/n1/4. Then 0 < hn
n
n
3 4
nhn = n
.
Thus
the
estimator
given
by
(6)
has
all
properties
stated in
n
Theorems 14. This estimator here becomes as follows:
x
x
=
e x 2 1 + x 2 e x
( )
( )
||
()
fn x =
1
2 n3 4
xX
j
exp 2 n1 2
j =1
Exercise
20.2.1 Let Xj, j = 1, . . . , n be i.i.d. r.v.s and for some h > 0 and any x ,
dene fn(x) as follows:
1 the number of X1 , . . . , X n in x h, x + h
fn x =
.
n
2h
Then show that
()
1 n x Xj
fn x =
,
K
nh j = 1 h
()
20.3
20.1Some
Nonparaetric
Nonparametric
Estimation
Tests
491
testing hypotheses problems about F also arise and are of practical importance. Thus we may be interested in testing the hypothesis H : F = F0, a given
d.f., against all possible alternatives. This hypothesis can be tested by utilizing
the chi-square test for goodness of t discussed in Chapter 13, Section 8. The
chi-square test is the oldest nonparametric test regarding d.f.s. Alternatively,
the sample d.f. Fn may also be used for testing the same hypothesis as above.
In order to be able to employ the test proposed below, we have to make the
supplementary (but mild) assumption that F is continuous. Thus the hypothesis to be tested here is
H : F = F0 , a given continuous d.f.,
A : F F0
()
()
Dn = sup Fn x F0 x ; x ,
(9)
where Fn is the sample d.f. dened by (1). Then, under H, it follows from (2)
0. Therefore we would reject H if Dn > C and would accept it
that Dn a.s.
n
P Dn > C H = .
(10)
nDn x H
(1) e
j
2 j 2 x 2
, x 0.
(11)
j =
Thus for large n, the right-hand side of (11) may be used for the purpose of
determining C by way of (10). For moderate values of n (n 100) and selected
s ( = 0.10, 0.05, 0.025, 0.01, 0.005), there are tables available which facilitate
the calculation of C. (See, for example, Handbook of Statistical Tables by D. B.
Owen, Addison-Wesley, 1962.) The test employed above is known as the
Kolmogorov one-sample test.
The testing hypothesis problem just described is of limited practical importance. What arise naturally in practice are problems of the following type:
Let Xi, i = 1, . . . , m be i.i.d. r.v.s with continuous but unknown d.f. F and let
Yj, j = 1, . . . , n be i.i.d. r.v.s with continuous but unknown d.f. G. The two
random samples are assumed to be independent and the hypothesis of interest
here is
H : F = G.
One possible alternative is the following:
(12)
A: F G
(in the sense that F(x) G(x) for at least one x ).
492
20
Nonparametric Inference
()
()
Dm,n = sup Fm x Gn x ; x ,
(13)
where Fm, Gn are the sample d.f.s of the Xs and Ys, respectively. Under H,
F = G, so that
()
( ) [ ( ) ( )] [ ( ) ( )]
F ( x ) F ( x ) + G ( x ) G( x ) .
Fm x Gn x = Fm x F x Gn x G x
m
Hence
() ()
() ()
() ()
() ()
sup Fm x F x ; x a.s.
0, sup Gn x G x ; x a.s.
0.
m
a.s.
P Dm,n > C H = .
(14)
N Dm ,n x H
(1) e
j
2 j 2x 2
as m, n , x 0,
(15)
j =
(16)
in the sense that F(x) G(x) with strict inequality for at least one x , and
A : F < G,
(17)
in the sense that F(x) G(x) with strict inequality for at least one x .
For testing H against A, we employ the statistic D+m,n dened by
{ ()
()
Dm+ ,n = sup Fm x Gn x ; x
20.4
493
and reject H if D+m,n > C+. The cut-off point C+ is determined through the
relation
P Dm+ ,n > C + H =
N Dm+ ,n x 1 e 2 x
as m, n , x ,
as can be shown. Here N is as before, that is, N = mn/(m + n). Similarly, for
testing H against A, we employ the statistic D m,n dened by
{ ()
()
Dm ,n = sup Gn x Fm x ; x
and reject H if Dm,n < C . The cut-off point C is determined through the
relation
P Dm ,n < C H =
by utilizing the fact that
P
N Dm ,n x 1 e 2 x
as m, n , x .
For relevant tables, the reader is referred to the reference cited earlier in this
section. The last three tests based on the statistics Dm,n, D+m,n and Dm,n are
known as KolmogorovSmirnov two-sample tests.
494
20
Nonparametric Inference
ordering the Xs and Ys, we have strict inequalities with probability equal to
one.
For testing the hypothesis H specied above, we are going to use either
one of the rank sum statistics RX, RY dened by
m
( )
( )
RX = R X i , RY = R Yj
i =1
j =1
(18)
N
, 1 < k < .
m
( )
N
m
X < C ,
(19)
(20)
P X < C H = .
()
(21)
and set
m
U = u X i Yj .
i =1 j =1
(22)
Then U is, clearly, the number of times a Y precedes an X and it can be shown
(see Exercise 20.4.2) that
U = mn +
) R
n n+1
2
= RX
).
m m+1
2
(23)
20.4
495
mn m + n + 1
mn
, 2 U =
.
2
12
Then the r.v. (U EU)/( (U )) converges in distribution to an r.v. Z distributed as N(0, 1) as m, n and therefore, for large m, n, the limiting
distribution (along with the continuity correction for better precision) may be
used for determining the cut-off point C by means of (20).
A special interesting case, where the rank sum tests of the present section
are appropriate, is that where the d.f. G of the Ys is assumed to be of the form
( )
EU =
()
G x = F x , x
50
Y
53
Y
65
X
71
Y
74
X
78
X
82
X
110
Y ,
( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )
where an X or Y below a number means that the number is coming from the
X or Y sample, respectively. From this, we nd that
( )
( )
( )
( )
R(Y ) = 5, R(Y ) = 3, R(Y ) = 2, so that
( )
( )
R X 1 = 7, R X 2 = 4, R X 3 = 6, R X 4 = 1, R X 5 = 8; R Y1 = 9,
2
RX = 26, RY = 19.
We also nd that
U = 4 + 4 + 3 + 0 = 11.
(Incidentally, these results check with (23) and Exercise 20.4.1.) Now
N = 5 + 4 = 9 and
( ) ()
N
m
9
5
1
,
126
496
20
Nonparametric Inference
5
0.04 .
126
Then for testing H against A (given by (16)), we would reject for small values
of RX, or equivalently (by means of (23)), for small values of U. For the given
m, n, and for the observed value of RX (or U), H is accepted. (See tables on
p. 341 of the reference cited in Section 20.3.)
Exercises
20.4.1 Consider the two independent random samples Xi, i = 1, . . . , m and
Yj, j = 1, . . . , n and let R(Xi) and (Yj) be the ranks of Xi and Yj, respectively,
in the combined sample of the Xs and Ys. Furthermore, let RX and RY be
dened by (18). Then show that
RX + RY =
),
N N +1
where N = m + n.
( )
20.6
Relative Asymptotic
20.1 Nonparaetric
Efficiency
Estimation
of Tests
497
Also set Z = nj=1Zj and p = P(Xj < Yj). Then, clearly, Z is distributed as B(n, p)
and the hypothesis H above is equivalent to testing p = 12 . Depending on the
type of the alternatives, one would use the two-sided or the appropriate onesided test.
Some cases where the sign test just described is appropriate is when one is
interested in comparing the effectiveness of two different drugs used for the
treatment of the same disease, the efciency of two manufacturing processes
producing the same item, the response of n customers regarding their preferences towards a certain consumer item, etc.
Of course, there is also the one-sample Sign test available, but we will not
discuss it here.
For the sake of an illustration, consider the following numerical example.
EXAMPLE 3
and
10
= 6.
j =1
498
20
Nonparametric Inference
cance is . Formally, we may also employ the t-test (see (36), Chapter 13) for
testing the same hypothesis against the same specied alternative at the same
level . Let n* be the (common) sample size required in order to achieve a
power equal to by employing the t-test. We further assume that the limit of
n*/n, as n , exists and is independent of and . Denote this limit by e.
Then this quantity e is the Pitman asymptotic relative efciency of the
WilcoxonMannWhitney test (or of the Sign test, depending on which one is
used) relative to the t-test. Thus, if we use the WilcoxonMannWhitney test
and if it so happens that e = 13 , then this means that the WilcoxonMann
Whitney test requires approximately three times as many observations as the
t-test in order to achieve the same power. However, if e = 5, then the
WilcoxonMannWhitney test requires approximately only one-fth as many
observations as the t-test in order to achieve the same power.
It has been found in the literature that the asymptotic efciency of the
WilcoxonMannWhitney test relative to the t-test is 3/ 0.95 when the
underlying distribution is Normal, 1 when the underlying distribution is Uniform and when the underlying distribution is Cauchy.
I.3
499
Appendix I
x + y = y + x,
( x + y ) + z = x + ( y + z ) = x + y + z.
The product x of the vector x = (x1, . . . , xn) by the real number (scalar) is
the vector dened by x = (x1, . . . , xn). For any two vectors x, y in Vn and
any two scalars , , the following properties are immediate:
( + )x = x + x, (x) = (x) = x,
(x + y) = x + y , 1x = x.
x + y = x + y,
The inner (or scalar) product xy of any two vectors x = (x1, . . . , xn), y =
(y1, . . . , yn) is a scalar and is dened as follows:
499
500
Appendix I
x y = x j y j .
j =1
For any three vectors x, y and z in Vn and any scalars , , the following
properties are immediate:
x y = y x, x y = x y = x y , x y + z = x y + x z,
( ) ( )
( )
x x 0 and x x = 0
if and only if
x = 0;
also if
x y = 0 for every y Vn , then x = 0.
x = x .
Two vectors x, y in Vn are said to be orthogonal (or perpendicular), and we
write x y, if xy = 0. A vector x in Vn is said to be orthogonal (or perpendicular) to a subset U of Vn, and we write x U, if xy = 0 for every y U.
A (nonempty) subset V of Vn is a vector space, which is a subspace of Vn,
denoted by V Vn, if for any vectors x, y in V and any scalars and , x +
y is also in V. Thus, for example, the straight line 1x1 + 2x2 = 0 in the plane,
being the set {x = (x1, x2) V2; 1x1 + 2x2 = 0} is a subspace of V2. It is shown
easily that for any given set of vectors xj, j = 1, . . . , r in Vn, V dened by
r
V = y Vn ; y = j x j , j , j = 1, . . . ,
j =1
is a subspace of Vn.
The vectors xj, j = 1, . . . , r in Vn are said to span (or generate) the subspace
r
V Vn if every vector y in V may be written as follows: y = j=1
jxj for some
scalars j, j = 1, . . . , r.
For any positive integer m < n, the m-dimensional vector space Vm may be
considered as a subspace of Vn by enlarging the m-tuples to n-tuples and
identifying the appropriate components with zero in the resulting n-tuples.
Thus, for examples, the x-axis in the plane may be identied with the set {x =
(x1, x2) V2; x1 , x2 = 0} which is a subspace of V2. Similarly the y-axis in the
plane may be identied with the set {y = (y1, y2) V2; y1 = 0, y2 } which is
a subspace of V2; the xy-plane in the three-dimensional space may be identied with the set {z = (x1, x2, x3) V3; x1, x2 , x3= 0} which is a subspace of
V3, etc.
From now on, we shall assume that the above-mentioned identication has been made and we shall write Vm Vn to indicate that Vm is a
subspace of Vn.
501
I.2I.3 Some
Basic
Theorems
Denitions
onAbout
VectorMatrices
Spaces
= 1,
i = 1, . . . , k.
e1 = 1, 0, . . . , 0 , e 2 = 0, 1, 0, . . . , 0 , . . . , e n = 0, . . . , 0, 1 ,
For any positive integer n, consider any subspace V Vn. Then V has a basis
and any two bases in V have the same number of vectors, say, m (the
dimension of V ). In particular, the dimension of Vn is n and m n.
THEOREM 2.I
THEOREM 3.I
THEOREM 4.I
502
Appendix I
A m n
a11
a
= 21
am 1
a12
a22
am 2
a1n
a2 n
amn
(A + B) + C = A + (B + C) = A + B + C.
The product A of the matrix A = (aij) by the scalar is the matrix dened by
A = (aij). The transpose A of the m n matrix A = (aij) is the n m matrix
dened by A = (aij). Thus the rows and columns of A are equal to the columns
and rows of A, respectively. If A is a square matrix and A = A, then A is called
symmetric. Clearly, for a symmetric matrix the elements symmetric with
respect to the main diagonal of A are equal; that is, aij = aji for all i and j. For
any m n matrices A, B and any scalars , , the following properties are
immediate:
A = A, A = A , A + B = A + B.
( )
( )
I.3
503
BA is not dened unless r = m and even then, it is not true, in general, that
AB = BA. For example, take
0 0
1 1
A=
, B =
.
0 1
0 0
Then
0 0
0 1
AB =
, BA =
,
0 0
0 0
so that AB BA. The products AB, BA are always dened for all square
matrices of the same order.
Let A be an m n matrix, let B, C be two n r matrices and let D be an
r k matrix. Then for any scalars , and , the following properties are
immediate:
(
)
( B + C)D = BD + CD, (A)B = A(B) = (AB) = AB,
(AB) = BA , (AB)D = A(BD).
IA = AI = A, 0A = A0 = 0, A B + C = AB + AC,
By means of the last property, we may omit the parentheses and set ABD for
(AB)D = A(BD).
Let A be an m n matrix and let ri, i = 1, . . . , m, cj, j = 1, . . . , n stand for
the row and column vectors of A, respectively. Then it can be shown that the
largest number of independent r-vectors is the same as the largest number of
independent c-vectors and this common number is called the rank of the matrix
A. Thus the rank of A, to be denoted by rank A, is the common dimension of
the two vector spaces spanned by the r-vectors and the c-vectors. Always rank
A min(m, n) and if equality occurs, we say that A is non-singular or of full
rank; otherwise A is called singular.
Let now |A| stand for the determinant of the square matrix A, dened only
for square matrices, say m m, by the expression
A = a1i a2j amp ,
where the aij are the elements of A and the summation extends over all
permutations (i, j, . . . , p) of (1, 2, . . . , m). The plus sign is chosen if the
permutation is even and the minus sign if it is odd. For further elaboration, see
any of the references cited at the end of this appendix. It can be shown that A
is nonsingular if and only is |A| 0. It can also be shown that if |A| 0, there
exists a unique matrix, to be denoted by A1, such that AA1 = A1A = I. The
matrix A1 is called the inverse of A. Clearly, (A1)1 = A.
Let A be a square matrix of order n such that AA = AA = I. Then A is
said to be orthogonal. Let ri and ci, i = 1, . . . , n stand for the row and column
504
Appendix I
r iri = ri
= ci
THEOREM 6.I
i) Let A, B be any two matrices of the same order. Then |AB| = |BA| = |A| |B|.
ii) For any diagonal matrix A of order n, |A| = nj=1aj, where aj, j = 1, . . . , n are
the diagonal elements of A.
iii) For any (square) matrix A, |A| = |A|.
iv) For any orthogonal matrix A, |A| is either 1 or 1.
v) Let A, B be matrices of the same order and suppose that B is orthogonal,
Then |BAB| = |BAB| = |A|.
vi) For any matrix A for which |A| 0, |A1| = |A|1.
THEOREM 7.I
I.4
THEOREM 8.I
505
i) Let r1, r2 be two vectors in Vn such that r1r2 = 0 and ||r1|| = ||r2|| = 1. Then there
exists an n n orthogonal matrix, the rst two rows of which are equal to
r1, r2.
(For a concrete example, see the application after Theorem 5 in
Chapter 9.)
ii) Let x be a vector in Vn, let A be an n n orthogonal matrix and set y = Ax.
Then xx = yy, so that ||x|| = ||y||.
iii) For every symmetric matrix A there is an orthogonal matrix B (of the
same order as that of A) such that the matrix BAB is diagonal (and its
diagonal elements are the characteristic roots of A).
THEOREM 9.I
( )
and
( )
( )
j = 1, . . . , n.
506
Appendix I
b x ,
ij
i = 1, . . . , r
j =1
such that
2
Q = i bij x j ,
j =1
i =1
r
where i is either 1 or 1, i = 1, . . . , r.
iii) Let Q be as in (ii). There exists an orthogonal matrix B such that if
m
y = B 1 x, then Q = j y 2j ,
j =1
Q = y 2j .
j =1
rank A
j =1
= rank A j .
j =1
In particular,
rank A 1 + rank I A 1 = n
I.4
507
and
508
Appendix II
Appendix II
( )
ft t ; =
r;
( )
r +1 2
( )
r 2
( r 1)
1 x
exp x + t
dx, t .
r
2
508
509
() ()
f x; = Pj fr + 2j x , x 0,
2
r;
j =0
where
()
Pj = e
( 2)
2
2 2
j!
and
fr + 2 j
r2+ 2j ,
is the p.d.f. of
j = 0, 1, . . . .
F=
F =
X r1
.
Y r2
fF
r1 , r2 :
(f; ) = e
cj
( 2)
2
j!
j =0
1 r 1+ j
2 1
(1 + f )
( r + r )+ j
1
where
1
r1 + r2 + j
2
,
cj =
1
1
r1 + j r2
2
2
j = 0, 1, . . . .
f 0,
510
Appendix II
REMARKS
Tables
511
Appendix III
Tables
j p (1 p )
j
n j
j =0
n
[Table 1 entries: four-decimal cumulative binomial probabilities for n = 2, ..., 25, k = 0, 1, ..., n, and p = 1/16, ..., 8/16.]
Table 2. The Cumulative Poisson Distribution. The tabulated quantity is

\sum_{j=0}^{k} e^{-\lambda} \frac{\lambda^j}{j!},

for selected values of \lambda from 0.001 to 10.
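As with Table 1, the entries follow directly from the formula; a brief added example (not from the book), reproducing the entry for λ = 1.0 and k = 0, which is e^{-1} ≈ 0.367879:

from math import exp, factorial

def poisson_cdf(k, lam):
    # Cumulative Poisson probability as tabulated in Table 2.
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))

print(round(poisson_cdf(0, 1.0), 6))   # 0.367879, matching the table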
[Table 2 entries: cumulative Poisson probabilities for λ = 0.001 to 10.00.]
Table 3. The Normal Distribution. The tabulated quantity is

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt,

and for negative arguments one uses \Phi(-x) = 1 - \Phi(x).
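Φ can be evaluated through the error function via Φ(x) = (1 + erf(x/√2))/2; a short added sketch (not in the original) reproducing the entry Φ(1.00) = 0.841345:

from math import erf, sqrt

def Phi(x):
    # Standard normal c.d.f. as tabulated in Table 3.
    return 0.5 * (1 + erf(x / sqrt(2)))

print(round(Phi(1.00), 6))   # 0.841345, matching the table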
[Table 3 entries: Φ(x) to six decimal places for x = 0.00 to 3.99 in steps of 0.01.]
Table 4. Percentage Points of the t-Distribution. Let t_r be a random variable having the t-distribution with r degrees of freedom. The tabulated quantities are the numbers x for which

P(t_r \le x) = \gamma,

for r = 1, 2, \ldots, 90 and \gamma = 0.75, 0.90, 0.95, 0.975, 0.99, 0.995.
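These quantiles can be reproduced with scipy (an illustrative aside, not part of the book); the two calls below recover the entries for (r, γ) = (1, 0.975) and (10, 0.95):

from scipy import stats

# Quantiles x with P(t_r <= x) = gamma, as tabulated in Table 4.
print(round(stats.t.ppf(0.975, df=1), 4))   # 12.7062, matching the table
print(round(stats.t.ppf(0.95, df=10), 4))   # 1.8125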
[Table 4 entries: t-quantiles to four decimal places for r = 1, ..., 90 and the listed values of γ.]
Table 5. Percentage Points of the Chi-Square Distribution. Let \chi^2_r be a random variable having the chi-square distribution with r degrees of freedom. The tabulated quantities are the numbers x for which

P(\chi^2_r \le x) = \gamma,

for r = 1, 2, \ldots, 45 and \gamma = 0.005, 0.01, 0.025, 0.05, 0.10, 0.25, 0.75, 0.90, 0.95, 0.975, 0.99, 0.995.
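Likewise for the chi-square quantiles (an added sketch, not from the original); the calls below recover the entries for (r, γ) = (1, 0.95) and (2, 0.99):

from scipy import stats

# Quantiles x with P(chi2_r <= x) = gamma, as tabulated in Table 5.
print(round(stats.chi2.ppf(0.95, df=1), 3))   # 3.841, matching the table
print(round(stats.chi2.ppf(0.99, df=2), 3))   # 9.210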
[Table 5 entries: chi-square quantiles for r = 1, ..., 45 and the listed values of γ.]
Table 6. Percentage Points of the F-Distribution. Let F_{r_1, r_2} be a random variable having the F-distribution with r_1, r_2 degrees of freedom. Then the tabulated quantities are the numbers x for which

P(F_{r_1, r_2} \le x) = \gamma,

for \gamma = 0.500, 0.750, 0.900, 0.950, 0.975, 0.990, 0.995.
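The F-quantiles can be checked the same way (an illustrative sketch, not from the book); the calls below recover the entries for γ = 0.95 with (r1, r2) = (1, 1) and (2, 2):

from scipy import stats

# Quantiles x with P(F_{r1,r2} <= x) = gamma, as tabulated in Table 6.
print(round(stats.f.ppf(0.95, dfn=1, dfd=1), 2))   # 161.45, matching the table
print(round(stats.f.ppf(0.95, dfn=2, dfd=2), 3))   # 19.000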
[Table 6 entries: F-quantiles to about five significant digits for selected combinations of degrees of freedom r1 (columns) and r2 (rows).]
These tables have been adapted from Donald B. Owens Handbook of Statistical Tables,
published by Addison-Wesley, by permission of the publishers.
r2
542
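Any entry of Table 6 is simply a quantile of the F-distribution, so the table can be regenerated numerically. The following is a minimal sketch, assuming SciPy is available; the helper name f_critical_values is illustrative and not from the book.

from scipy.stats import f

levels = [0.500, 0.750, 0.900, 0.950, 0.975, 0.990, 0.995]

def f_critical_values(r1, r2):
    # Each Table 6 entry is the p-quantile of F with (r1, r2) degrees of freedom.
    return [f.ppf(p, r1, r2) for p in levels]

if __name__ == "__main__":
    # e.g., the column for r1 = 10 numerator and r2 = 12 denominator d.f.
    for p, g in zip(levels, f_critical_values(10, 12)):
        print(f"P(F <= {g:.4f}) = {p}")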
Table 7  Table of Selected Discrete and Continuous Distributions and Some of their Characteristics

Distribution; p.d.f.; mean; variance:

Binomial, B(n, p): f(x) = \binom{n}{x} p^x q^{n−x}, x = 0, 1, …, n; 0 < p < 1, q = 1 − p. Mean: np. Variance: npq.
(Bernoulli, B(1, p): f(x) = p^x q^{1−x}, x = 0, 1. Mean: p. Variance: pq.)
Poisson, P(λ): f(x) = e^{−λ} λ^x/x!, x = 0, 1, …; λ > 0. Mean: λ. Variance: λ.
Hypergeometric: f(x) = \binom{m}{x}\binom{n}{r−x} / \binom{m+n}{r}, where x = 0, 1, …, min(r, m). Mean: mr/(m + n). Variance: mnr(m + n − r)/[(m + n)²(m + n − 1)].
Negative Binomial: f(x) = \binom{r+x−1}{x} p^r q^x, x = 0, 1, …; 0 < p < 1, q = 1 − p. Mean: rq/p. Variance: rq/p².
(Geometric: f(x) = pq^x, x = 0, 1, …. Mean: q/p. Variance: q/p².)
Multinomial: f(x₁, …, x_k) = [n!/(x₁! x₂! ⋯ x_k!)] p₁^{x₁} ⋯ p_k^{x_k}, x₁ + ⋯ + x_k = n; p_j > 0, j = 1, 2, …, k, p₁ + p₂ + ⋯ + p_k = 1. Vector of expectations: (np₁, …, np_k). Vector of variances: (np₁q₁, …, np_k q_k), q_j = 1 − p_j, j = 1, …, k.
Normal, N(μ, σ²): f(x) = [1/(σ√(2π))] exp[−(x − μ)²/(2σ²)], x ∈ ℝ; μ ∈ ℝ, σ > 0. Mean: μ. Variance: σ².
(Standard Normal: f(x) = [1/√(2π)] exp(−x²/2), x ∈ ℝ. Mean: 0. Variance: 1.)
Gamma: f(x) = [1/(Γ(α)β^α)] x^{α−1} exp(−x/β), x > 0; α, β > 0. Mean: αβ. Variance: αβ².
Chi-square: f(x) = {1/[Γ(r/2)2^{r/2}]} x^{(r/2)−1} exp(−x/2), x > 0; r > 0 integer. Mean: r. Variance: 2r.
Negative Exponential: f(x) = λe^{−λx}, x > 0; λ > 0. Mean: 1/λ. Variance: 1/λ².
Uniform, U(α, β): f(x) = 1/(β − α), α ≤ x ≤ β. Mean: (α + β)/2. Variance: (β − α)²/12.
Beta: f(x) = {Γ(α + β)/[Γ(α)Γ(β)]} x^{α−1}(1 − x)^{β−1}, 0 < x < 1; α, β > 0. Mean: α/(α + β). Variance: αβ/[(α + β)²(α + β + 1)].
Cauchy: f(x) = (1/π)·σ/[σ² + (x − μ)²], x ∈ ℝ; μ ∈ ℝ, σ > 0. Mean and variance do not exist.
Bivariate Normal: f(x₁, x₂) = [1/(2πσ₁σ₂√(1 − ρ²))] exp(−q/2), where q = [1/(1 − ρ²)]{[(x₁ − μ₁)/σ₁]² − 2ρ[(x₁ − μ₁)/σ₁][(x₂ − μ₂)/σ₂] + [(x₂ − μ₂)/σ₂]²}, x₁, x₂ ∈ ℝ; μ₁, μ₂ ∈ ℝ, σ₁, σ₂ > 0, −1 ≤ ρ ≤ 1. Vector of expectations: (μ₁, μ₂). Vector of variances: (σ₁², σ₂²).
k-Variate Normal: f(x) = (2π)^{−k/2} |Σ|^{−1/2} exp[−(1/2)(x − μ)′Σ^{−1}(x − μ)], x ∈ ℝ^k; μ ∈ ℝ^k, Σ: k × k non-singular symmetric matrix. Mean vector: μ. Covariance matrix: Σ.

Distribution; characteristic function φ; moment generating function M:

Binomial, B(n, p): φ(t) = (pe^{it} + q)^n, t ∈ ℝ; M(t) = (pe^t + q)^n, t ∈ ℝ.
(Bernoulli, B(1, p): φ(t) = pe^{it} + q, t ∈ ℝ; M(t) = pe^t + q, t ∈ ℝ.)
Poisson, P(λ): φ(t) = exp(λe^{it} − λ), t ∈ ℝ; M(t) = exp(λe^t − λ), t ∈ ℝ.
Negative Binomial: φ(t) = p^r/(1 − qe^{it})^r, t ∈ ℝ; M(t) = p^r/(1 − qe^t)^r, t < −log q.
(Geometric: φ(t) = p/(1 − qe^{it}), t ∈ ℝ; M(t) = p/(1 − qe^t), t < −log q.)
Multinomial: φ(t₁, …, t_k) = (p₁e^{it₁} + ⋯ + p_k e^{it_k})^n, t₁, …, t_k ∈ ℝ; M(t₁, …, t_k) = (p₁e^{t₁} + ⋯ + p_k e^{t_k})^n, t₁, …, t_k ∈ ℝ.
Normal, N(μ, σ²): φ(t) = exp(iμt − σ²t²/2), t ∈ ℝ; M(t) = exp(μt + σ²t²/2), t ∈ ℝ.
(Standard Normal: φ(t) = exp(−t²/2), t ∈ ℝ; M(t) = exp(t²/2), t ∈ ℝ.)
Gamma: φ(t) = (1 − iβt)^{−α}, t ∈ ℝ; M(t) = (1 − βt)^{−α}, t < 1/β.
Chi-square: φ(t) = (1 − 2it)^{−r/2}, t ∈ ℝ; M(t) = (1 − 2t)^{−r/2}, t < 1/2.
Negative Exponential: φ(t) = λ/(λ − it), t ∈ ℝ; M(t) = λ/(λ − t), t < λ.
Uniform, U(α, β): φ(t) = (e^{itβ} − e^{itα})/[it(β − α)], t ∈ ℝ; M(t) = (e^{tβ} − e^{tα})/[t(β − α)], t ∈ ℝ.
Cauchy (μ = 0, σ = 1): φ(t) = exp(−|t|), t ∈ ℝ.
Bivariate Normal: φ(t₁, t₂) = exp[iμ₁t₁ + iμ₂t₂ − (1/2)(σ₁²t₁² + 2ρσ₁σ₂t₁t₂ + σ₂²t₂²)], t₁, t₂ ∈ ℝ; M(t₁, t₂) = exp[μ₁t₁ + μ₂t₂ + (1/2)(σ₁²t₁² + 2ρσ₁σ₂t₁t₂ + σ₂²t₂²)], t₁, t₂ ∈ ℝ.
k-Variate Normal: φ(t) = exp[it′μ − (1/2)t′Σt], t ∈ ℝ^k; M(t) = exp[t′μ + (1/2)t′Σt], t ∈ ℝ^k.
F, A: field and σ-field (of events)
F(C), σ(C): the field and the σ-field generated by the class of sets C
(S, A): measurable space
(ℝ^k, B^k), k ≥ 1: k-dimensional Euclidean space with its Borel σ-field; (ℝ¹, B¹) = (ℝ, B)
P, (S, A, P): probability measure; probability space
I_A: indicator of the set A
X⁻¹(B): inverse image of the set B under X
A_X or σ(X): the σ-field induced by X
(X ∈ B) = [X ∈ B] = X⁻¹(B)
r.v., r. vector, r. experiment, r. sample, r. interval, r. error: random variable, random vector, random experiment, random sample, random interval, random error
X(S): the range of X
X_(j) or Y_j: the j-th order statistic
B(n, p), P(λ), N(μ, σ²), χ²_r, U(α, β) or R(α, β), t_r, F_{r1,r2}: the Binomial, Poisson, Normal, Chi-square, Uniform (Rectangular), (Student's) t- and F-distributions
χ²_{r;α}, t_{r;α}, F_{r1,r2;α}: the critical points for which P(χ²_r > χ²_{r;α}) = α, etc.
E(X) or EX or μ(X) or μ_X or just μ: expectation (mean) of X
σ²(X) (σ(X)) or σ²_X (σ_X) or just σ² (σ): variance (standard deviation) of X
Cov(X, Y), ρ(X, Y): covariance and correlation coefficient of X and Y
φ_X or φ_{X₁, …, X_n}, or just φ: characteristic function(s)
M_X or M_{X₁, …, X_n}, or just M: moment generating function(s)
X̄: sample mean
a.s., →_P, →_d, →_{q.m.}: convergence almost surely, in probability, in distribution, in quadratic mean
UMV (UMVU): uniformly minimum variance (unbiased)
ML (MLE): maximum likelihood (estimator, estimate)
(UMP) MP (UMPU): (uniformly) most powerful (unbiased)
(MLR) LR: (monotone) likelihood ratio
SPRT: sequential probability ratio test
LSE: least squares estimate (estimator)
Chapter 1

1.1.1.
1.1.2. A1 = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)},
A2 = {(5, 5), (4, 4), (3, 3), (2, 2), (1, 1), (0, 0)},
A5 = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)}.
1.1.9. ⋂_{n=1}^{∞} ⋃_{j=n}^{∞} …
1.1.11. ⋃_{n=1}^{∞} ⋂_{j=n}^{∞} …
1.2.2. Let A1, A2 ∈ F. Then A1ᶜ, A2ᶜ ∈ F by (F2). Also A1ᶜ ∪ A2ᶜ ∈ F by (F3). But A1ᶜ ∪ A2ᶜ = (A1 ∩ A2)ᶜ. Thus (A1 ∩ A2)ᶜ ∈ F and hence A1 ∩ A2 ∈ F by (F2).
1.2.7.

Chapter 2

2.1.1.
2.1.2.
2.1.3.
2.1.4.
2.2.7.
2.2.8.
2.2.9.
2.2.11.
2.2.12. 19/218; (1/3) Σ_{j=1}^{6} [m_j n_j /((m_j + n_j)(m_j + n_j − 1))].
2.2.17.
2.3.1.
2.3.6. Σ_{j=1}^{n−1} …
2.4.1. 720.
2.4.2. 2^n.
2.4.3. 900.
2.4.4.
2.4.6. 1/360.
2.4.7.
2.4.8. n(n − 1).
2.4.12. 29/56.
2.4.13. (2n)!.
2.4.14. (1/2)^{2n} …
2.4.15. Σ_{j=0}^{…} \binom{n}{j} p^j (1 − p)^{n−j}.
2.4.21. Σ_{j=5}^{10} \binom{10}{j} (1/5)^j (4/5)^{10−j}.
2.4.29. Σ_{j=n+1}^{2n} \binom{2n}{j} (1/2)^{2n}; … = 392/2067 ≈ 0.18965; P(A5) = 507/5724 ≈ 0.08857.
2.4.31. (i) 2…; (ii) 4…; (iii) Σ_{j=2}^{…} …; (iv) 384; (v) 4\binom{13}{5}.
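As a quick arithmetic check of the tail sum given for Exercise 2.4.21, the sum Σ_{j=5}^{10} \binom{10}{j}(1/5)^j(4/5)^{10−j} is just a binomial upper-tail probability. A minimal sketch, assuming SciPy is available:

from scipy.stats import binom

# P(X >= 5) for X ~ B(10, 1/5): the tail sum in the answer to 2.4.21
print(binom.sf(4, 10, 0.2))   # approximately 0.0328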
2.6.3. 244/495.

Chapter 3

3.2.1.
3.2.2.
3.2.5.
3.2.6. e^{−4}; f(x) = \binom{4}{x} 2^{−4}, x = 0, 1, …, 4.
3.2.10. 1 − Σ_{x=0}^{9} \binom{400}{x}\binom{1200}{25−x} / \binom{1600}{25}.
3.2.14. c = 1.
3.2.15.
3.3.5. a = 3.94, b = 18.3.
3.3.7.
3.3.8.
3.3.9. (i) … = 2; (ii) … = 3.
3.3.12. (2/π) tan⁻¹(c/σ).
3.3.15.
3.3.16. exp(−x³).
3.3.21.
3.4.2.
3.4.3. 0.0713.

Chapter 4

4.1.10.
4.1.11.
4.1.13. (i) 0.584.
4.1.14. c = μ + 1.15σ.
4.2.1. (i) f_j(x) = \binom{21}{x} (1/6)^x (5/6)^{21−x}, j = 1, …, 6; (ii) 1 − Σ_{x=0}^{4} \binom{21}{x} (1/6)^x (5/6)^{21−x}.
4.2.3. (i) c = any real > 0; (ii) f_X(x) = 2(c − x)/c², x ∈ [0, c], f_Y(y) = 2y/c², y ∈ [0, c]; (iii) f(x|y) = 1/y, x ∈ [0, y], y ∈ (0, c], f(y|x) = 1/(c − x), 0 ≤ x ≤ y < c; (iv) (2c − 1)/c², by assuming that c > 1/2.
4.2.4. (i) 1 − e^{−x}, x > 0; (ii) 1 − e^{−y}, y > 0; (iii) 1/2; (iv) 1 − 4e^{−3}.

Chapter 5
5.1.6. 0, c².
5.1.12. (i) 2, 4; (ii) 2.
5.2.1. n(n − 1) ⋯ (n − k + 1)p^k.
5.2.2.
5.2.3.
5.2.4. k.
5.2.10. $0.075.
5.2.13. exp(μ + σ²/2), [exp(σ²) − 1] exp(2μ + σ²).
5.2.15. (ii) γ₁ = [np(p − 1)(2p − 1)]/σ³, σ² = np(1 − p), so that γ₁ < 0 if p < 1/2 and γ₁ > 0 if p > 1/2; (iii) 1/2, 2.
5.2.16.
5.3.2. n.
5.3.3. 7/12, 11/144, 7/12, 11/144, (3y + 2)/(6y + 3), y ∈ (0, 1), (6y² + 6y + 1)/[2(6y + 3)²], y ∈ (0, 1).
5.3.4.
5.4.5. 0, 2.5, 2.5, 2.25, 0.
5.5.1. P(X = 0) = P(⋂_{n=1}^{∞} (X < 1/n)) = P(lim_{n→∞} (X < 1/n)) = lim_{n→∞} P(X < 1/n) = 1.
Chapter 6

6.2.10.
6.2.11.
6.5.1. φ(t) = (1/6) Σ_{j=1}^{6} e^{ijt}, t ∈ ℝ.
6.5.2.
6.5.4.
6.5.5. η(t) = [pt + (1 − p)]^n, t ∈ ℝ.
6.5.12. φ(t) = e^{…t}, t ∈ ℝ.
6.5.15. M(t) = 1/(1 − t), t ∈ (−1, 1), φ(t) = 1/(1 − it), f(x) = e^{−x}, x > 0.
6.5.17.
6.5.24. C = [σ₁², ρσ₁σ₂; ρσ₁σ₂, σ₂²], φ(t₁, t₂) = exp(it′μ − (1/2)t′Ct), E(X₁X₂) = ρσ₁σ₂ + μ₁μ₂.

Chapter 7
7.1.2. (i) f_{X^{(1)}}(x₁) = I_{(0,1)}(x₁), f_{X^{(n)}}(x₂) = I_{(0,1)}(x₂); (ii) 1/18, π/16, (1 − log 2)/2.
7.1.8. (i) Σ_{j=k}^{n} \binom{n}{j} p^j (1 − p)^{n−j}, p = P(X₁ ∈ B); (ii) p = 1/e; (iii) Σ_{j=5}^{10} \binom{10}{j} e^{−j}(1 − e^{−1})^{10−j} ≈ 0.3057, independently of λ.
7.2.2.
7.2.4.
7.3.3.

Chapter 8
8.1.1. p_n → 0 as n → ∞.
8.1.3. E(X̄_n − λ)² = σ²(X̄_n) = λ/n → 0 as n → ∞, where the X_j's are P(λ).
8.1.4.
8.3.2.
8.3.3.
8.3.5.
8.3.7. 0.999928.
8.3.8. 4,146.
8.3.11. c = 0.329.
8.3.15. n = 123.
8.3.16. 26.
8.4.3. E(X̄_n − μ̄_n)² = (1/n²) E[Σ_{j=1}^{n} (X_j − μ_j)]² = (1/n²) Σ_{j=1}^{n} E(X_j − μ_j)² = (1/n²) Σ_{j=1}^{n} σ_j² ≤ (1/n²)·nM = M/n → 0 as n → ∞.
8.4.6. σ²(X_j) = σ² for all j, so that (1/n²) Σ_{j=1}^{n} σ_j² = σ²/n, and then Exercise 8.4.3 applies.
8.4.7. σ²(X_j) = √j, so that E(X̄_n − μ̄_n)² = (1/n²) Σ_{j=1}^{n} σ_j² = (1/n²) Σ_{j=1}^{n} √j ≤ (1/n²)·n√n = 1/√n → 0 as n → ∞.

Chapter 9

(ii) P(Y = c₁) = 0.9596, P(Y = c₂) = 0.0393, P(Y = c₃) = 0.0011; (iii) 0.9596c₁ + 0.0393c₂ + 0.0011c₃.
9.1.2.
9.1.3. N((9/5)μ + 32, (81/25)σ²).
9.1.6. f_Z(z) = 1/(π√(1 − z²)), z ∈ (−1, 1).
9.1.8. f_X(x) = (1/2) I_{(0,1)}(x) + (1/(2x²)) I_{(1,∞)}(x), and 0 otherwise.
9.2.8. (i) Use EX^k = (r₂/r₁)^k Γ(r₁/2 + k) Γ(r₂/2 − k) / [Γ(r₁/2) Γ(r₂/2)];
(ii) y = 1/(1 + (r₁/r₂)x) gives x = (r₂/r₁)(1 − y)/y, 0 < y < 1, and dx/dy = −(r₂/r₁)(1/y²), so that
f_Y(y) = f_X[(r₂/r₁)(1 − y)/y]·(r₂/r₁)(1/y²) = {Γ[(r₁ + r₂)/2]/[Γ(r₁/2)Γ(r₂/2)]} y^{(r₂/2)−1}(1 − y)^{(r₁/2)−1}
after cancellations. This last expression is the p.d.f. of the Beta distribution with parameters r₂/2 and r₁/2.
(iii) By (ii) and for r₁ = r₂ (= r), 1/(1 + X) is B(r/2, r/2), which is symmetric about 1/2. Hence P(X ≤ 1) = P(1/(1 + X) ≥ 1/2) = 1/2; (iv) Set Y = r₁X. Then
f_Y(y) = {Γ[(r₁ + r₂)/2]/[Γ(r₁/2)Γ(r₂/2)r₂^{r₁/2}]} y^{(r₁/2)−1}(1 + y/r₂)^{−(r₁+r₂)/2} → [1/(Γ(r₁/2)2^{r₁/2})] y^{(r₁/2)−1} e^{−y/2} as r₂ → ∞,
since Γ[(r₁ + r₂)/2]/[Γ(r₂/2)(r₂/2)^{r₁/2}] → 1; the limit is the p.d.f. of χ²_{r₁}.
9.2.13. f_r(t) = {Γ[(r + 1)/2]/[√(πr)Γ(r/2)]}(1 + t²/r)^{−(r+1)/2}. As r → ∞, the first two terms on the right-hand side (expanded by Stirling's formula) converge to 1, and the remaining terms converge to e^{−t²/2}/√(2π), the p.d.f. of N(0, 1).
9.3.5. (i) (U + V, U − V)′ ~ N((0, 0)′, [2σ²(1 + ρ), 0; 0, 2σ²(1 − ρ)]); (ii) (X + Y, X − Y)′ ~ N((μ₁ + μ₂, μ₁ − μ₂)′, [2σ²(1 + ρ), 0; 0, 2σ²(1 − ρ)]).
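The F–Beta relation used in 9.2.8 is easy to verify numerically. The following sketch (assuming SciPy) checks that, for X ~ F(r₁, r₂), the variable Y = 1/(1 + (r₁/r₂)X) has the Beta(r₂/2, r₁/2) distribution, via the event identity Y ≤ y ⟺ X ≥ (r₂/r₁)(1 − y)/y:

from scipy.stats import beta, f

r1, r2, y = 6.0, 10.0, 0.35
x = (r2 / r1) * (1 - y) / y          # Y <= y  <=>  X >= (r2/r1)(1 - y)/y
lhs = beta.cdf(y, r2 / 2, r1 / 2)    # P(Y <= y) under Beta(r2/2, r1/2)
rhs = f.sf(x, r1, r2)                # P(X >= x) under F(r1, r2)
assert abs(lhs - rhs) < 1e-9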
10.1.1.
10.1.2. c = −log[1 − (0.9)^{1/3}].
10.1.4. (ii) EY₁ = α + (β − α)/(n + 1), EY_n = α + n(β − α)/(n + 1), σ²(Y₁) = σ²(Y_n) = n(β − α)²/[(n + 1)²(n + 2)];
(iii) EY₁ = 1/(n + 1), EY_n = n/(n + 1), σ²(Y₁) = σ²(Y_n) = n/[(n + 1)²(n + 2)].
For the converse, e^{−nλt} = P(Y₁ > t) = P(X_j > t, j = 1, …, n) = [P(X₁ > t)]^n, so that P(X₁ > t) = e^{−λt}. Thus the common distribution of the X's is the Negative Exponential distribution with parameter λ.
10.1.9.
10.1.14. With k = (n + 1)/2,
{(2k − 1)!/[((k − 1)!)²(β − α)^{2k−1}]} (y − α)^{k−1}(β − y)^{k−1}, y ∈ (α, β),
and
{(2k − 1)!/[((k − 1)!)²]} (1 − e^{−y})^{k−1}(e^{−y})^k, y > 0.
10.1.17.
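The order-statistic means in 10.1.4(iii) are easy to confirm by simulation. A minimal sketch, assuming NumPy is available:

import numpy as np

# For n i.i.d. U(0,1) r.v.'s, E Y1 = 1/(n+1) and E Yn = n/(n+1).
rng = np.random.default_rng(0)
n, reps = 5, 200_000
samples = rng.uniform(size=(reps, n))
y1, yn = samples.min(axis=1), samples.max(axis=1)
print(y1.mean(), 1 / (n + 1))   # both close to 0.1667
print(yn.mean(), n / (n + 1))   # both close to 0.8333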
Chapter 11

11.1.10.
11.2.2. Set T = Σ_{j=1}^{n} X_j. Then T is P(nλ), sufficient for λ (Exercise 11.1.2(i)) and complete (Example 10). Finally, E(T/n) = λ for every λ ∈ Ω = (0, ∞).
11.3.1. f_X(x) = … parameter(s).
11.5.2.
Chapter 12

12.2.2. [(n + 1)/(2n)] X_(n), …
12.2.3. [X_(1) + X_(n)]/2, …
12.3.2. c_n = √2 Γ[(n + 1)/2]/[√n Γ(n/2)]; [(n + 1)/(n − 1)][X_(n) − X_(1)]; … Σ_{j=1}^{n} (X_j − X̄)² …
12.3.9. Σ_{j=1}^{n} (X_j − X̄)(Y_j − Ȳ)/(n − 1), [(n − 3)/2] … Σ_{j=1}^{n} (X_j − X̄)².
12.5.6. log(X̄/n).
12.5.7. X_(1); (X_(1), X̄).
12.5.8. Σ_{j=1}^{r} X_j / n.
12.5.9. exp(−x/X̄).
12.9.1. E X̄ = (a + b)/2, σ²(X̄) = (b − a)²/(12n).
12.9.4. 3(X₁ + X₂)/2.
12.9.6. X̄ − √3 S and X̄ + √3 S, where S² = (1/n) Σ_{j=1}^{n} (X_j − X̄)².
12.11.2. √n(U_n − V_n) = [(α + β) Σ_{j=1}^{n} X_j − nα]/[√n(α + β + n)], so that E[√n(U_n − V_n)] = [(α + β)nθ − nα]/[√n(α + β + n)] → 0 and σ²[√n(U_n − V_n)] = nθ(1 − θ)[(α + β)/√n(α + β + n)]² → 0 as n → ∞.
Chapter 13

13.2.1.
13.2.2. n = 9.
12.2.4.
13.3.1. V = Σ_{j=1}^{n} x_j in both cases.
13.3.4.
13.3.7.
13.3.8. n = 14.
13.3.10. accepted.
13.3.12. (i) Reject H if Σ_{j=1}^{n} x_j …
13.4.1. H is rejected.
13.5.1.
13.5.4.
13.5.5.
13.7.2.
13.8.3.
13.8.4.
13.8.7.

Chapter 14

14.3.1. 870.
14.3.2.
Chapter 15

15.2.4. (i) f_R(r) = n(n − 1)r^{n−2}(θ − r)/θ^n, r ∈ (0, θ); (iii) the expected length of the shortest confidence interval in Example 4 is ℓ = nθ(c^{−1/n} − 1)/(n + 1); the expected length of the confidence interval in (ii) is ℓ′ = (n − 1)θ(1 − c)/[c(n + 1)], and the required inequality may be seen to be true.
15.4.1. (i) [X̄_n − z_{α/2}σ/√n, X̄_n + z_{α/2}σ/√n]; (ii) [X̄₁₀₀ − 0.196, X̄₁₀₀ + 0.196]; (iii) n = 1537.
15.4.2. (i) [X̄₁₀₀ − 0.0196S₁₀₀, X̄₁₀₀ + 0.0196S₁₀₀], S²₁₀₀ = Σ_{j=1}^{100} (X_j − X̄₁₀₀)²/100; (ii) … S_n/√n …
15.4.4.
15.4.5. σ known, μ unknown: [X̄_n − z_{α/2}σ/√n, X̄_n + z_{α/2}σ/√n]; μ known, σ unknown: [nS²_n/C₂, nS²_n/C₁], where C₁, C₂ are determined by P(χ²_n < C₁ or χ²_n > C₂) = α.
15.4.7. P(Y_i ≤ x_p ≤ Y_j) = Σ_{k=i}^{j−1} \binom{10}{k} p^k (1 − p)^{10−k} = 1 − α. Let p = 0.25 and (i, j) = (2, 9), (3, 4), (4, 7). Then 1 − α = 0.756, 0.2503, 0.2206, respectively. For p = 0.50 and (i, j) as above, 1 − α = 0.9786, 0.1172, 0.6562, respectively. […, …], [0.8302, 2.0698].
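The coverage probabilities quoted in 15.4.7 are binomial interval probabilities and can be reproduced directly. A minimal sketch, assuming SciPy is available; the helper name coverage is illustrative:

from scipy.stats import binom

def coverage(i, j, p, n=10):
    # P(Y_i <= x_p <= Y_j) = P(i <= K <= j - 1) for K ~ B(n, p)
    return binom.cdf(j - 1, n, p) - binom.cdf(i - 1, n, p)

print(round(coverage(2, 9, 0.25), 4))   # 0.7559, i.e. the 0.756 in the text
print(round(coverage(3, 4, 0.25), 4))   # 0.2503
print(round(coverage(2, 9, 0.50), 4))   # 0.9785 ~ 0.9786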
Chapter 16

16.3.5. (i) 0.280, 0.268; (ii) [4.6, 3.3, 0.5; 3.3, 2.67, 0.43; 0.5, 0.43, 0.07].
16.4.2. (i) Reject H if |β̂ − β₀| [Σ_{j=1}^{n} (x_j − x̄)²]^{1/2} / [nσ̂²/(n − 2)]^{1/2} > t_{n−2;α/2}, where β̂ = Σ_{j=1}^{n} (x_j − x̄)Y_j / Σ_{j=1}^{n} (x_j − x̄)², σ̂² = Σ_{j=1}^{n} [Y_j − α̂ − β̂(x_j − x̄)]²/n, α̂ = Ȳ;
(ii) [β̂ − t_{n−2;α/2} √(nσ̂²/[(n − 2) Σ_{j=1}^{n} (x_j − x̄)²]), β̂ + t_{n−2;α/2} √(nσ̂²/[(n − 2) Σ_{j=1}^{n} (x_j − x̄)²])].
16.5.1. (i) …; (ii) …
16.5.2. (iii) Reject H if … > t_{n−2;α/2}; [… − b√(σ̂²/n), … − a√(σ̂²/n)], where a, b : P(a ≤ t ≤ b) = 1 − α; (iv) Reject H if … > t_{n−2;α/2}; the interval is as in (iii) with σ̂²/n replaced by σ̂² Σ_{j=1}^{n} x_j²/n², a, b as in (iii).
16.5.5. Reject H₁ if
|α̂₁ − α̂₂| / {σ̂ [Σ_{i=1}^{m} x_i²/(m Σ_{i=1}^{m} (x_i − x̄)²) + Σ_{j=1}^{n} x′_j²/(n Σ_{j=1}^{n} (x′_j − x̄′)²)]^{1/2}} > t_{m+n−4;α/2},
and reject H₂ if
|β̂₁ − β̂₂| / {σ̂ [1/Σ_{i=1}^{m} (x_i − x̄)² + 1/Σ_{j=1}^{n} (x′_j − x̄′)²]^{1/2}} > t_{m+n−4;α/2}.

Chapter 17
17.1.1. SSH = 0.9609 (d.f. = 2), MSH = 0.48045, SSe = 8.9044 (d.f. = 6), MSe = 1.48407, SST = 9.8653 (d.f. = 8).
17.2.2. SSA = 34.6652 (d.f. = 2), MSA = 17.3326, SSB = 12.2484 (d.f. = 3), MSB = 4.0828, SSe = 12.0016 (d.f. = 6), MSe = 2.0003, SST = 58.9152 (d.f. = 11).
17.4.1. Ȳᵢ − μᵢ ~ N(0, σ²/J), i = 1, …, I, independent. Since (1/I) Σ_{i=1}^{I} (Ȳᵢ − μᵢ) = Ȳ − μ̄, we have that Σ_{i=1}^{I} [(Ȳᵢ − μᵢ) − (Ȳ − μ̄)]² …
Chapter 18

18.1.3. Write μ = (μ^{(1)′}, μ^{(2)′})′ and Σ = [Σ₁₁, Σ₁₂; Σ₂₁, Σ₂₂]. Then the conditional distribution of X^{(1)}, given X^{(2)} = x^{(2)}, is the m-variate Normal with parameters μ^{(1)} + Σ₁₂Σ₂₂^{−1}(x^{(2)} − μ^{(2)}) and Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁.
18.3.7. In the inequality (Σ_{j=1}^{n} λ_j δ_j)² ≤ (Σ_{j=1}^{n} λ_j²)(Σ_{j=1}^{n} δ_j²), λ_j, δ_j ∈ ℝ, j = 1, …, n, set λᵢ = Xᵢ − X̄, δⱼ = Yⱼ − Ȳ.
Chapter 19
5 / 6 1 / 3 1 / 6
19.2.5. Q = XCX, where X = (X1, X2, X3), C = 1 / 3 1 / 3
1 / 3
1 / 6 1 / 3
5 / 6
Chapter 20

20.4.1. R_X + R_Y = Σ_{i=1}^{m} R(Xᵢ) + Σ_{j=1}^{n} R(Yⱼ) = 1 + 2 + ⋯ + N = N(N + 1)/2.
Eu(Xᵢ − Yⱼ) = Eu²(Xᵢ − Yⱼ) = P(Xᵢ > Yⱼ) = 1/2, so that σ²[u(Xᵢ − Yⱼ)] = 1/4, and Cov[u(Xᵢ − Yⱼ), u(X_k − Y_l)] = 0 for i ≠ k, j ≠ l, while Cov[u(Xᵢ − Yⱼ), u(X_k − Y_l)] = 1/3 − 1/4 = 1/12 for i = k, j ≠ l or i ≠ k, j = l. The result follows.
20.4.3.
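The rank sums in 20.4.1 are the ingredients of the Wilcoxon-Mann-Whitney rank sum test. A minimal illustration, assuming SciPy and NumPy are available (the simulated data are illustrative, not from the book):

import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=20)   # sample of m X's
y = rng.normal(0.5, 1.0, size=25)   # sample of n Y's, shifted location
stat, pvalue = ranksums(x, y)       # rank-sum test of equal distributions
print(stat, pvalue)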
INDEX
A
Absolutely continuous, 54, 55
Acceptance region, 332
Analysis of variance, 440
one-way layout (classification) in, 440
tables for, 445, 451, 457
two-way layout (classification) in, 446, 452
Asymptotic efficiency (efficient), 327
Asymptotic relative efficiency, 327, 497
Pitman, 498
of the sign test, 498
of the t-test, 498
of the Wilcoxon-Mann-Whitney test, 498
B
Basis, 501
orthonormal, 501
Bayes, decision function, 380, 382
estimator (estimate), 315–317, 322
formula, 24
Behrens-Fisher problem, 364
Best linear predictor, 134
Beta distribution, 71
expectation of, 120, 543
graphs of, 72
moments of, 120
parameters of, 71
p.d.f. of, 70, 542
variance of, 543
Beta function, 71
Bias, 306
Bienaymé equality, 171
Binomial distribution, 56
approximation by Poisson, 58, 79
as an approximation to Hypergeometric, 81
Rayleigh, 78
Rectangular, 70
skewed, 121
skewness, 120
Standard Normal, 65
t-, 223
Triangular, 228
Uniform, 60, 70, 543–544
Weibull, 78, 285
E
Effects
column, 446, 450
main, 457
row, 446, 450
Efficiency, relative asymptotic, 327. See also
Pitman asymptotic relative efficiency
Eigenvalues, 477, 504
Error(s), sum of squares of, 418
type-I, 332
type-II, 332
Estimable, function, 422
Estimation
by the method of moments, 324
decision-theoretic, 313
least square, 326
maximum likelihood, 307
minimum chi-square, 324
nonparametric, 485
point, 288
Estimator(s) (estimate(s))
admissible, 314
a.s. consistent, 327
asymptotically efficient, 327
asymptotically equivalent, 329
asymptotically normal, 327, 489
asymptotically optimal properties of, 326
asymptotically unbiased, 489
Bayes, 315–317, 322
best asymptotically normal, 327
consistent in probability, 327
consistent in quadratic mean (q.m.), 489
criteria for selecting, 289, 306, 313
efcient, 305
essentially complete class of, 314
inadmissible, 315
least square, 418
maximum likelihood, 307
minimax, 314, 322–323
strongly consistent, 327, 485–487, 489
unbiased, 289
uniformly minimum variance unbiased
(UMVU), 292, 297
weakly consistent, 327, 485–486
Event(s)
certain (or sure), 14
composite, 14
dependent, 33
happens (or occurs), 15
impossible, 14
independent, 28
completely, 28
in the sense of probability, 28
mutually, 28
pairwise, 28
statistically, 28
stochastically, 28
null, 15
simple, 14
Expectation, 107–108
basic properties of, 110
conditional, 123
mathematical, 107–108
of a matrix of r.v.'s, 418
of Beta distribution, 120, 543
of Binomial distribution, 114, 542
of Chi-square distribution, 118, 542
of Gamma distribution, 117, 542
of Geometric distribution, 119, 542
of Hypergeometric distribution, 119, 542
of Lognormal distribution, 120
of Negative Binomial distribution, 119, 542
of Negative Exponential distribution, 118,
543
of Normal distribution, 117, 542
of Poisson distribution, 115, 542
of Standard Normal distribution, 116
of Uniform distribution, 113, 543
of Weibull distribution, 114
properties of conditional, 123
Experiment(s)
binomial, 55, 59
composite (or compound), 32
dependent, 33
deterministic, 14
independent, 32, 46
multinomial, 60
random, 14
Exponential distribution(s), 280
multiparameter, 285–286
Negative Exponential, 69
one-dimensional parameter, 280, 282
U(α, β) is not an, 286
F
F-distribution, 73, 233
critical values of, 532–541
expectation of, 238
graph of, 236
noncentral, 434, 509
the p.d.f. of, 234
variance of, 258
F-test, 432
geometric interpretation of, 432
Factorial moment, 110
generating function of a, 153
of Binomial distribution, 157
H
Hazard rate, 91
Hypergeometric distribution, 59, 542
approximation to, 81
expectation of, 119, 542
Multiple, 63
p.d.f. of, 59, 542
variance of, 119, 542
Hypothesis(-es)
alternative, 331
composite, 331, 353
linear, 416
null, 331
simple, 331
statistical, 331
testing a simple, 333, 337
testing a composite, 341
I
Independence, 27
complete (or mutual), 28
criteria of, 164
in Bivariate Normal distribution, 168
in the sense of probability, 28
of classes, 177–178
of events, 28
of random experiments, 32, 46
of r.v.'s, 164, 178
of sample mean and sample variance in the
Normal distribution, 244, 284, 479, 481
of σ-fields, 46, 178
pairwise, 28
statistical, 28
stochastic, 28
Independent
Binomial r.v.'s, 173
Chi-square r.v.'s, 175
classes, 178
completely, 28
events, 28
in the sense of probability, 28
mutually, 28
Normal r.v.'s, 174–175
pairwise, 28
Poisson r.v.'s, 173
random experiments, 32, 46
r.v.'s, 164, 178
σ-fields, 46, 178
statistically, 28
stochastically, 28
Indicator, 84
function, 135
Indistinguishable balls, 38
Inequality(-ies)
Cauchy-Schwarz, 127
Cramér-Rao, 297–298
Markov, 126
moment, 125
probability, 125
Tchebichev's, 126
Inference(s), 263
Interaction(s), 452, 457
Intersection, 2
Invariance, 307, 493
Inverse image, 82
Inversion formula, 141, 151
J
Jacobian, 226, 237, 240
Joint, ch.f., 150
conditional distribution, 95, 467
conditional p.d.f., 95
d.f., 91, 94
moment, 107
m.g.f., 158
probability, 25
p.d.f., 95
p.d.f. of order statistics, 249
K
Kolmogorov
one sample test, 491
-Smirnov, two-sample test, 493
Kurtosis of a distribution, 121
of Double Exponential
distribution, 121
of Uniform distribution, 121
L
Laplace transform, 153
Latent roots, 481, 504
Laws of Large Numbers (LLNs), 198
Strong (SLLNs), 198, 200
Weak (WLLNs), 198–200, 210
Least squares, 418
estimator, 418, 420
estimator in the case of full rank, 421
Lebesgue measure, 186, 328
Leptokurtic distribution, 121
Level, of factor, 447
of significance, 332
Likelihood, function, 307
ratio, 365
Likelihood ratio test, 365, 430, 432
applications of, 374
in Normal distribution(s), 367–372
interpretation of, 366
Likelihood statistic, 365
asymptotic distribution of, 366
Limit
inferior, 5
of a monotone sequence of sets, 5
superior, 5
theorems, basic, 180
theorems, further, 202
565
566
Index
Linear
dependence, 129
hypothesis, 416
interpolation, 61
model, 424
canonical reduction of, 424
regression functions, 418
Lognormal distribution, 73
expectation of, 120
graphs of, 74
parameters of, 73
p.d.f. of, 73
variance of, 120
Loss function, 313, 380
squared, 313
Least Square Estimate (LSE), of μᵢ's, 443
of μ, αᵢ, βⱼ, 448–449
of γᵢⱼ etc., 454
of σ², 425, 444, 451, 456
M
Matching problem, 48
Matrix(-ices), 502
characteristic polynomial of a, 504
characteristic (latent) roots of a, 504–505
determinant of a square, 503
diagonal, 502, 504–506
diagonal element of a, 502
dimensions of a, 502
eigenvalues of a, 504
elements of a, 502
equal, 502
idempotent, 504, 506–507
identity, 502
inverse, 503
negative definite (semidefinite), 504–506
nonsingular, 503–506
of full rank, 503
order of a, 502, 505–507
orthogonal, 503–506
positive definite (semidefinite), 504–506
product by a scalar, 502
product of, 502
rank of a, 503, 505–507
singular, 503
some theorems about, 504
square, 502, 504–505
sum of, 502
symmetric, 502, 505–507
transpose, 502
unit, 502
zero, 502
Maximum likelihood, 306
and Bayes estimation, 318
estimator (estimate) (MLE), 307
in Binomial distribution, 309
in Bivariate Normal distribution, 472, 475
in Multinomial distribution, 308–309, 312
in Negative Binomial distribution, 312
n-th, 106–107
of an r.v., 106, 108
of Beta distribution, 120
of inertia, 107
of Standard Normal, 116
r-th absolute, 107
sample, 325
Monotone likelihood ratio (MLR) property,
342
of Exponential distribution, 342
of Logistic distribution, 342–343
testing under, 343
Multicomparison method, 458
Multinomial distribution, 60–61, 64, 95–96, 542
ch.f. of, 152, 544
experiment, 60
m.g.f. of, 158, 544
parameters of, 60–61
p.d.f. of, 60
vector of expectations of, 542
vector of variances, 542
Multivariate (or k-Variate) Normal
distribution, 463–465, 543–544
ch.f. of, 544
estimation of μ and Σ, 469
m.g.f. of, 544
nonsingular, 465
parameters of, 464
p.d.f. of, 465, 543
singular, 474
some properties of, 467
N
Negative Binomial distribution, 59
ch.f. of, 150, 543
expectation of, 119, 542
factorial m.g.f., 161
m.g.f., 161, 543
p.d.f. of, 59, 542
variance of, 119, 542
Negative Exponential distribution, 69
ch.f. of, 544
expectation of, 118, 543
m.g.f. of, 156, 544
parameter of, 69
p.d.f. of, 69, 543
variance of, 118, 543
Neyman-Pearson, 331, 366
fundamental lemma, 334
Noncentrality parameter, 508–509
Noncomplete distributions, 276–277
Nonexponential distribution, 286
Nonnormal Bivariate distribution, 99
Nonparametric, estimation, 485
estimation of a p.d.f., 487
inference, 485
tests, 490
Normal distribution, 65, 542
Probability(-ies)
consequences of the definition of, 15–16
dening properties of, 15
densities, 85
density function, 54
distribution, 53, 84
distribution function, 53, 84
function, 15
uniform, 17
inequalities, 125
integral transform, 246
joint, 25
Kolmogorov definition of, 17
marginal, 25
measure(s), 15
product of, 46
of coverage of a population quantile, 260
of matchings, 47
of type-I error, 332
of type-II error, 332
relative frequency definition of, 17
space, 15
statistical definition of, 17
Probability density function(s) (p.d.f.(s)), 53–54
conditional, 91, 94
convolution of, 224
F-, 233–234
joint, 93
joint conditional, 95
marginal, 91, 93, 95
of Bernoulli (or Point Binomial)
distribution, 56, 542
of Beta distribution, 70, 543
of Binomial distribution, 55, 542
of Bivariate Normal distribution, 74, 97,
134, 465–466, 543
of Cauchy distribution, 72, 543
of Chi-square distribution, 69, 542
of continuous Uniform distribution, 70, 543
of discrete Uniform distribution, 60
of Double Exponential distribution, 78
of Gamma distribution, 67, 542
of Geometric distribution, 60, 542
of Hypergeometric distribution, 59, 542
of Lognormal distribution, 73
of Maxwell distribution, 78
of Multinomial distribution, 60, 542
of Multivariate Normal distribution, 465,
543
of Negative Binomial distribution, 59, 542
of Negative Exponential distribution, 69,
543
of Normal distribution, 65, 542
of Pareto distribution, 78
of Pascal distribution, 60
of Poisson distribution, 57, 542
of Rayleigh distribution, 78
of sample range, 255
of Standard Normal distribution, 65
of t-distribution, 233
of Triangular distribution, 228
posterior, 318
prior, 315, 380
Product probability
measures, 46
spaces, 45–46
Q
Quadratic form(s), 476, 504, 506
negative definite (semidefinite), 477
positive definite (semidefinite), 477
rank of a, 477
some theorems on, 477
Quantile, p-th, 99, 261
R
Random variable(s) (r.v.(s)), 53
absolute moments of a, 107
absolutely continuous, 54
as a measurable function, 82–83
Bernoulli (or Point Binomial), 56
Beta, 70
Binomial, 55
Cauchy, 72
central moment of a, 107
ch.f. of a, 138, 140
Chi-square, 69
completely correlated, 129
continuous, 53–54
criteria of independence of, 164–167
cumulative distribution function of a, 85
degenerate, 183
dependent, 164
discrete, 53–55
discrete Uniform, 60
distribution function of a, 85
distribution of a, 53
expectation of a, 107–108
functions of, 107–108
Gamma, 66, 117
Geometric, 119, 542
Hypergeometric, 59, 119
independent, 178
linearly related, 129
Lognormal, 72
mathematical expectation of a, 107–108
mean (or mean value) of a, 107–108
m.g.f. of a, 138, 153, 158
moments of a, 107–108
Negative Binomial, 59
Negative Exponential, 69
negatively correlated, 129
Normal, 65
normalization of a, 89
pairwise uncorrelated, 171
Poisson, 57
Poisson, truncated, 62
Sample(s)
range, 255
size, 35
space, 14
unordered, 35
Sampling
sequential, 382
with replacement, 35–36, 49, 82
without replacement, 35–36, 48–49, 59, 63, 82
Scalar, 499, 502
Scheffé criterion, 459
Sequential
analysis, 383
probability ratio test (SPRT), 388–389
expected sample size, 393–394
for Binomial distribution, 395
for Normal distribution, 396
optimality of, 393
with prescribed error probabilities, 392
procedures, 382
sampling, 383
Set(s)
basic, 1
complement of, 1
decreasing sequence of, 5
difference of, 3
disjoint, 3
element of, 1
empty, 3
equal, 3
increasing sequence of, 5
inferior limit of a sequence of, 5
intersection of, 2
limit of a sequence of, 5
measure of a, 102
member of a, 1
monotone sequence of, 5
mutually disjoint, 3
of convergence of a sequence of r.v.'s, 183
open, 102–103
operations, 1
pairwise disjoint, 3
properties of operations on, 4
superior limit of a sequence of, 5
symmetric difference of, 3
union of, 2
universal, 1
Shift, 499
σ-additive, 15
σ-field(s)
Borel, 12
consequences of the definition of a, 10
dependent, 46
discrete, 10
examples of, 10
generated by, 11
independent, 46
induced by, 178
of events, 15
k-dimensional Borel, 13
trivial, 10
two-dimensional Borel, 13
Sign test
one-sample, 497
two-sample, 496
Significantly different from zero, 459
Skewed, to the left (to the right), 121
Skewness, 120
of Binomial distribution, 121
of Negative Exponential distribution, 121
of Poisson distribution, 121
Space(s)
measurable, 11
probability, 15
product probability, 45–46
sample, 14
Standard
deviation, 107, 109
Normal distribution, 65
Statistic(s), 263
complete, 292
m-dimensional, 263
minimal sufficient, 272
of uniformly minimum variance (UMV),
279
sufficient, 266, 270, 292
U-, 495
unbiased, 278
Stopping time, 382
Strong Law of Large Numbers (SLLNs), 198,
200
Studentized range, 256
Subadditive, 16
Subset, 1
proper, 1
Sub-σ-fields
independent, 46
Sufficiency, 263–264
definition of, 264
some basic results on, 264
Sufficient statistic(s), 266, 270, 282, 292
complete, 291
for Bernoulli distribution, 274
for Beta distribution, 273
for Bivariate Normal distribution, 274
for Cauchy distribution, 272
for Double Exponential distribution, 274
for Exponential distribution, 282
for Gamma distribution, 273
for Multinomial distribution, 270
for Multivariate Normal distribution, 469
for Negative Binomial distribution, 273
for Negative Exponential distribution, 273
for Normal distribution, 271
for Poisson distribution, 273
for Uniform distribution, 271
for Weibull distribution, 285
m-dimensional, 263
minimal, 272
Sums of squares
between groups, 445
total, 440
within groups, 445
T
Tables, 511
of critical values for Chi-square
distribution, 529–531
of critical values for F-distribution, 532–541
of critical values for Student's
t-distribution, 526–528
of the cumulative Binomial distribution,
511–519
of the cumulative Normal distribution, 523–525
of the cumulative Poisson distribution, 520–522
t (Student's)-distribution, 73, 233–234
critical values of, 526–528
expectation of, 239
graph of, 234
noncentral, 360, 364, 508
noncentrality parameter, 508
p.d.f. of, 234
variance of, 239
Test(s), 311
about mean(s) in Normal distribution(s),
339, 349, 355, 359–360, 363–364, 383–384
about variance(s) in Normal distribution(s),
340, 350, 358–359, 361–363
Bayes, in Normal distribution, 384
chi-square, 377
consistent, 379
equal-tail, 359, 372
F-, 432
function, 331
goodness-of-fit, 374
Kolmogorov one-sample, 491
Kolmogorov-Smirnov two-sample test, 493
level-, 333
level of signicance of a, 332
likelihood ratio (LR), 365, 367–372
minimax in Binomial distribution, 385
minimax in Normal distribution, 384
most powerful (MP), 333, 338–340
nonparametric, 490–493
nonrandomized, 332
of independence in Bivariate Normal
distribution, 470
one-sample Wilcoxon-Mann-Whitney, 495
power of a, 332
randomized, 331
rank, 493
rank sum, 495
relative asymptotic efficiency of a, 497
sign, 496–497
size of a, 332
statistical, 331
U
U statistic, 495
Unbiasedness, 278, 286, 289
Unbiased statistic, 278
asymptotically, 489
unique, 279
Uniform distribution, 60, 70
ch.f. of, 150, 544
continuous, 70
discrete, 60
expectation of, 113, 543
graphs of, 60, 70
m.g.f. of, 161, 544
parameters of, 70
p.d.f. of, 60, 70, 544
variance of, 113, 543
Uniformly minimum variance unbiasedness,
279, 290
Uniformly minimum variance unbiased
(UMVU) estimator(s) (estimate(s)), 290–291
in Binomial distribution, 292–294, 296
in Bivariate Normal distribution, 296
in Gamma distribution, 305
in Geometric distribution, 297
in Negative Binomial distribution, 296
in Negative Exponential distribution, 296
in Normal distribution, 294–295, 303–305
in Poisson distribution, 294, 303
Uniformly minimum variance unbiased
statistic(s), 279
in Binomial distribution, 279
in Negative Exponential
distribution, 279–280
in Normal distribution, 279
in Poisson distribution, 280
Uniformly most powerful (UMP) test(s), 333,
343, 345
in Binomial distribution, 347
in distributions with the MLR property,
343
in Exponential distributions, 345–346
in Normal distribution, 349–350, 358
in Poisson distribution, 348
Uniformly most powerful unbiased (UMPU)
test(s), 353, 357
in Exponential distributions, 354
in Normal distributions, 355, 357–360
in the presence of nuisance parameters,
357–364
Uniform probability measure, 60
Union of sets, 2
V
Variance, 107, 109
basic properties of, 111
covariance of a matrix of r.v.'s, 418
interpretation of, 109
minimum, 289
of Beta distribution, 120, 543
of Binomial distribution, 114, 542
of Chi-square distribution, 118, 542
of Gamma distribution, 117, 542
of Geometric distribution, 119, 542
of Hypergeometric distribution, 119, 542
of Lognormal distribution, 120
of Negative Binomial distribution, 119,
542
of Negative Exponential distribution, 118,
543
of Normal distribution, 117, 543
of Poisson distribution, 115, 542
of Standard Normal distribution, 116, 542
of Uniform distribution, 113, 543
of Weibull distribution, 114
Vector(s), 499
components of a, 499
equal, 499
inner (scalar) product of, 499
linearly dependent, 501
linearly independent, 501
n-dimensional, 499
norm (or length) of a, 500