Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

Contents

i

**A Course in Mathematical Statistics
**

Second Edition

ii

Contents

This Page Intentionally Left Blank

Contents

iii

**A Course in Mathematical Statistics
**

Second Edition

George G. Roussas

Intercollege Division of Statistics University of California Davis, California

ACADEMIC PRESS

San Diego New York

• •

London • Boston Sydney • Tokyo • Toronto

iv

Contents

This book is printed on acid-free paper. Copyright © 1997 by Academic Press

∞

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

ACADEMIC PRESS 525 B Street, Suite 1900, San Diego, CA 92101-4495, USA 1300 Boylston Street, Chestnut Hill, MA 02167, USA http://www.apnet.com ACADEMIC PRESS LIMITED 24–28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/

Library of Congress Cataloging-in-Publication Data Roussas, George G. A course in mathematical statistics / George G. Roussas.—2nd ed. p. cm. Rev. ed. of: A ﬁrst course in mathematical statistics. 1973. Includes index. ISBN 0-12-599315-3 1. Mathematical statistics. I. Roussas, George G. First course in mathematical statistics. II. Title. QA276.R687 1997 96-42115 519.5—dc20 CIP

Printed in the United States of America 96 97 98 99 00 EB 9 8 7 6 5 4 3 2 1

Contents

v

To my wife and sons

vi

Contents

This Page Intentionally Left Blank

Contents

vii

Contents

Preface to the Second Edition xv Preface to the First Edition xviii

Chapter 1

**Basic Concepts of Set Theory 1
**

1.1 1.2* Some Deﬁnitions and Notation Exercises 5 Fields and σ -Fields 8 1

Chapter 2

**Some Probabilistic Concepts and Results 14
**

2.1 2.2 2.3 2.4 2.5* Probability Functions and Some Basic Properties and Results 14 Exercises 20 Conditional Probability 21 Exercises 25 Independence 27 Exercises 33 Combinatorial Results 34 Exercises 40 Product Probability Spaces 45 Exercises 47

vii

viii

Contents

2.6*

The Probability of Matchings 47 Exercises 52

Chapter 3

**On Random Variables and Their Distributions 53
**

3.1 3.2 3.3 3.4 Some General Concepts 53 Discrete Random Variables (and Random Vectors) 55 Exercises 61 Continuous Random Variables (and Random Vectors) 65 Exercises 76 The Poisson Distribution as an Approximation to the Binomial Distribution and the Binomial Distribution as an Approximation to the Hypergeometric Distribution 79 Exercises 82 Random Variables as Measurable Functions and Related Results 82 Exercises 84

3.5*

Chapter 4

**Distribution Functions, Probability Densities, and Their Relationship 85
**

4.1 The Cumulative Distribution Function (c.d.f. or d.f.) of a Random Vector—Basic Properties of the d.f. of a Random Variable 85 Exercises 89 The d.f. of a Random Vector and Its Properties—Marginal and Conditional d.f.’s and p.d.f.’s 91 Exercises 97 Quantiles and Modes of a Distribution 99 Exercises 102 Justiﬁcation of Statements 1 and 2 102 Exercises 105

4.2

4.3 4.4*

Chapter 5

**Moments of Random Variables—Some Moment and Probability Inequalities 106
**

5.1 5.2 5.3 Moments of Random Variables 106 Exercises 111 Expectations and Variances of Some R.V.’s 114 Exercises 119 Conditional Moments of Random Variables 122 Exercises 124

Contents

ix

5.4

5.5 5.6*

Some Important Applications: Probability and Moment Inequalities 125 Exercises 128 Covariance, Correlation Coefﬁcient and Its Interpretation 129 Exercises 133 Justiﬁcation of Relation (2) in Chapter 2 134

Chapter 6

**Characteristic Functions, Moment Generating Functions and Related Theorems 138
**

6.1 6.2 6.3 6.4 6.5 Preliminaries 138 Deﬁnitions and Basic Theorems—The One-Dimensional Case 140 Exercises 145 The Characteristic Functions of Some Random Variables 146 Exercises 149 Deﬁnitions and Basic Theorems—The Multidimensional Case 150 Exercises 152 The Moment Generating Function and Factorial Moment Generating Function of a Random Variable 153 Exercises 160

Chapter 7

**Stochastic Independence with Some Applications 164
**

7.1 7.2 7.3 7.4* Stochastic Independence: Criteria of Independence 164 Exercises 168 Proof of Lemma 2 and Related Results 170 Exercises 172 Some Consequences of Independence 173 Exercises 176 Independence of Classes of Events and Related Results 177 Exercise 179

Chapter 8

**Basic Limit Theorems 180
**

8.1 8.2 8.3 8.4 Some Modes of Convergence 180 Exercises 182 Relationships Among the Various Modes of Convergence 182 Exercises 187 The Central Limit Theorem 187 Exercises 194 Laws of Large Numbers 196 Exercises 198

x

Contents

8.5 8.6*

Further Limit Theorems 199 Exercises 206 Pólya’s Lemma and Alternative Proof of the WLLN 206 Exercises 211

Chapter 9

**Transformations of Random Variables and Random Vectors 212
**

9.1 9.2 9.3 9.4 The Univariate Case 212 Exercises 218 The Multivariate Case 219 Exercises 233 Linear Transformations of Random Vectors 235 Exercises 240 The Probability Integral Transform 242 Exercise 244

Chapter 10

**Order Statistics and Related Theorems 245
**

10.1 10.2 Order Statistics and Related Distributions 245 Exercises 252 Further Distribution Theory: Probability of Coverage of a Population Quantile 256 Exercise 258

Chapter 11

**Sufﬁciency and Related Theorems 259
**

11.1 11.2 11.3 11.4 Sufﬁciency: Deﬁnition and Some Basic Results 260 Exercises 269 Completeness 271 Exercises 273 Unbiasedness—Uniqueness 274 Exercises 276 The Exponential Family of p.d.f.’s: One-Dimensional Parameter Case 276 Exercises 280 Some Multiparameter Generalizations 281 Exercises 282

11.5

Chapter 12

**Point Estimation 284
**

12.1 Introduction 284 Exercise 284

Goodness-of-Fit Tests 370 Exercises 373 13.8 Finding Minimax Estimators 318 Exercise 320 12.6 Criteria for Selecting an Estimator: The DecisionTheoretic Approach 309 12.Contents xi 12.3 13.8 .1 General Concepts of the Neyman-Pearson Testing Hypotheses Theory 327 Exercise 329 Testing a Simple Hypothesis Against a Simple Alternative 329 Exercises 336 UMP Tests for Testing Certain Composite Hypotheses 337 Exercises 347 UMPU Tests for Testing Certain Composite Hypotheses 349 Exercises 353 Testing the Parameters of a Normal Distribution 353 Exercises 356 Comparing the Parameters of Two Normal Distributions 357 Exercises 360 Likelihood Ratio Tests 361 Exercises 369 Applications of LR Tests: Contingency Tables.11 Closing Remarks 325 Exercises 326 Chapter 13 Testing Hypotheses 327 13.10 Asymptotically Optimal Properties of Estimators 322 Exercise 325 12.5 Criteria for Selecting an Estimator: The Maximum Likelihood Principle 302 Exercises 308 12.6 13.2 Criteria for Selecting an Estimator: Unbiasedness.4 The Case Where Complete Sufﬁcient Statistics Are Not Available or May Not Exist: Cramér-Rao Inequality 293 Exercises 301 12.3 The Case of Availability of Complete Sufﬁcient Statistics 287 Exercises 292 12.4 13.7 13.9 Other Methods of Estimation 320 Exercises 322 12.7 Finding Bayes Estimators 312 Exercises 317 12. Minimum Variance 285 Exercises 286 12.2 13.5 13.

4 16.2 16.9 Decision-Theoretic Viewpoint of Testing Hypotheses 375 Chapter 14 Sequential Procedures 382 14.5 Conﬁdence Intervals 397 Exercise 398 Some Examples 398 Exercises 404 Conﬁdence Intervals in the Presence of Nuisance Parameters 407 Exercise 410 Conﬁdence Regions—Approximate Conﬁdence Intervals 410 Exercises 412 Tolerance Intervals 413 Chapter 16 The General Linear Hypothesis 416 16.4 15.2 One-way Layout (or One-way Classiﬁcation) with the Same Number of Observations Per Cell 440 Exercise 446 Two-way Layout (Classiﬁcation) with One Observation Per Cell 446 Exercises 451 .2 15.1 16.1 15.3 16.3 14.1 14.5 Introduction of the Model 416 Least Square Estimators—Normal Equations 418 Canonical Reduction of the Linear Model—Estimation of σ2 Exercises 428 Testing Hypotheses About η = E(Y) 429 Exercises 433 Derivation of the Distribution of the F Statistic 433 Exercises 436 424 Chapter 17 Analysis of Variance 17.2 14.3 15.4 Some Basic Theorems of Sequential Sampling 382 Exercises 388 Sequential Probability Ratio Test 388 Exercise 392 Optimality of the SPRT-Expected Sample Size 393 Exercises 394 Some Examples 394 Chapter 15 Conﬁdence Regions—Tolerance Intervals 397 15.xii Contents 13.1 440 17.

3 20.4 Two-way Layout (Classiﬁcation) with K (≥ 2) Observations ≥ Per Cell 452 Exercises 457 A Multicomparison method 458 Exercises 462 Chapter 18 The Multivariate Normal Distribution 463 18. and F-Distributions II.f.1 Noncentral t-Distribution 508 508 .3 17.4 Basic Deﬁnitions in Vector Spaces 499 Some Theorems on Vector Spaces 501 Basic Deﬁnitions About Matrices 502 Some Theorems About Matrices and Quadratic Forms 504 Appendix II Noncentral t-.d.2 Introduction 476 Some Theorems on Quadratic Forms 477 Exercises 483 Chapter 20 Nonparametric Inference 485 20.1 18. χ 2-.5 20.1 I.Contents xiii 17.2 20.3 Introduction 463 Exercises 466 Some Properties of Multivariate Normal Distributions 467 Exercise 469 Estimation of μ and Σ and a Test of Independence 469 / Exercises 475 Chapter 19 Quadratic Forms 476 19.6 Nonparametric Estimation 485 Nonparametric Estimation of a p. 487 Exercise 490 Some Nonparametric Tests 490 More About Nonparametric Tests: Rank Tests 493 Exercises 496 Sign Test 496 Relative Asymptotic Efﬁciency of Tests 498 Appendix I Topics from Vector and Matrix Algebra 499 I.1 19.3 I.1 20.2 18.4 20.2 I.

2 II.3 Noncentral x2-Distribution 508 Noncentral F-Distribution 509 Appendix III Tables 1 2 3 4 5 6 7 511 The Cumulative Binomial Distribution 511 The Cumulative Poisson Distribution 520 The Normal Distribution 523 Critical Values for Student’s t-Distribution 526 Critical Values for the Chi-Square Distribution 529 Critical Values for the F-Distribution 532 Table of Selected Discrete and Continuous Distributions and Some of Their Characteristics 542 Some Notation and Abbreviations 545 Answers to Selected Exercises 547 Index 561 .xiv Contents II.

. as being too complicated for the level of the book. either kindly brought to my attention by users of the book or located by my students and myself. throughout the years. That issue. The ﬁrst issue of the book was very well received from an academic viewpoint. In the process of doing this. holding senior academic appointments in North America and elsewhere. under the title A First Course in Mathematical Statistics. I have also heard of the book as being in a class of its own. One change occurring throughout the book is the grouping of exercises of each chapter in clusters added at the end of sections. Also. I have received numerous inquiries as to the possibility of having the book reprinted. and a few new ones were inserted. The changes in this issue consist in correcting some rather minor factual errors and a considerable number of misprints. however. after it went out of print. The ﬁrst edition has been out of print for a number of years now. is meant for circulation only in Taiwan. It is in response to these comments and inquiries that I have decided to prepare a second edition of the book. although its reprint in Taiwan is still available. a handful of exercises were omitted. whereas some adjustments are affected. Inc. I have had the pleasure of hearing colleagues telling me that the book ﬁlled an existing gap between a plethora of textbooks of lower mathematical level and others of considerably higher level. the reissuing of the book has provided me with an excellent opportunity to incorporate certain rearrangements of the material. In xv . Finally. and also as forming a collector’s item. A substantial number of colleagues. have acknowledged to me that they made their entrance into the wonderful world of probability and statistics through my book. Associating exercises with material discussed in sections clearly makes their assignment easier. This second edition preserves the unique character of the ﬁrst issue of the book.Contents xv Preface to the Second Edition This is the second edition of a book published for the ﬁrst time in 1973 by Addison-Wesley Publishing Company.

in Chapter 1. . the proofs of two statements. this last section is also marked by an asterisk. thus increasing the number of sections from ﬁve to six. In Chapter 6. along with their means. and also by isolating in a section by itself deﬁnitions and results on momentgenerating functions and factorial moment generating functions.3 has been facilitated by the formulation and proof in the same section of two lemmas. the proof of Pólya’s lemma and an alternative proof of the Weak Law of Large Numbers. Furthermore. Also.6*. variances and other characteristics. are carried out in a separate section. the number of sections has been expanded from three to ﬁve by discussing in separate sections characteristic functions for the one-dimensional and the multidimensional case.1.2*. Lemma 1 and Lemma 2. They are still readily available for those who wish to employ them to add elegance and rigor in the discussion. also. This was done by discussing independence and product probability spaces in separate sections. the proof of Theorem 2 in Section 13. except that in Chapter 13. the number of sections has been doubled from three to six.1. In Chapter 5.xvi Contents the Second Edition Preface to Chapters 1 through 8. In Chapter 2.4*. in Chapter 14. Statements 1 and 2. Speciﬁcally. The value of such a table for reference purposes is obvious.5*. by grouping together in a section marked by an asterisk deﬁnitions and results on independence. two new sections have been created by discussing separately marginal and conditional distribution functions and probability density functions. Also. a new theorem. Finally. the number of sections has been doubled from two to four by presenting the proof of Lemma 2. has been added in Section 8. the proof of Theorem 1 in Section 14. in Chapter 8. These sections have also been marked by an asterisk (*) to indicate the fact that their omission does not jeopardize the ﬂow of presentation and understanding of the remaining material. and also by presenting.1 has been somewhat simpliﬁed by the formulation and proof of Lemma 1 in the same section. no changes were deemed necessary. In Chapter 7. stated in Section 7. Finally. the justiﬁcation of relation (2) in Chapter 2 is done in a section by itself. the concepts of a ﬁeld and of a σ-ﬁeld. Also. but their inclusion is not indispensable. a table of some commonly met distributions. Theorem 10. Section 3. especially useful in estimation. Section 5. and needs no elaboration. In the remaining chapters. and basic results on them. The section on the problem of the probability of matching and the section on product probability spaces are also marked by an asterisk for the reason explained above. some additional material is also presented for the purpose of further clarifying the interpretation of correlation coefﬁcient. some of the materials were pulled out to form separate sections. has been added. in Section 4. In Chapter 4.5. Section 8. the solution of the problem of the probability of matching is isolated in a section by itself. and related results in a separate section. the discussion of random variables as measurable functions and related results is carried out in a separate section. have been grouped together in Section 1.6*. In Chapter 3. the discussion of covariance and correlation coefﬁcient is carried out in a separate section. formulated in Section 4. based on truncation.

The book can also be used independently for a one-semester (or even one quarter) course in probability alone. In such a case. Sections 13.6 in Chapter 13. it has been included in the book for completeness only. One may also be selective in covering the material in Chapter 9. As for Chapter 19. In either case. perhaps. it is to be mentioned that this author is fully aware of the fact that the audience for a book of this level has diminished rather than increased since the time of its ﬁrst edition.Preface to the Second Edition Contents xvii This book contains enough material for a year course in probability and statistics at the advanced undergraduate level. however. It is hoped that there is still room for a book of the nature and scope of the one at hand. with emphasis placed on moment-generating functions. G. one would strive to cover the material in Chapters 1 through 10 with the exclusion. depriving the reader of the challenge of thinking and reasoning instead delegating the “thinking” to a computer. California May 1996 . The instructor can also be selective regarding Chapters 11 and 18. why characteristic functions are introduced in the ﬁrst place. or for ﬁrst-year graduate students not having been exposed before to a serious course on the subject matter. He is also cognizant of the trend of having recipes of probability and statistical results parading in textbooks. perhaps. Roussas Davis. and therefore what one may be missing by not utilizing this valuable tool. Some of the material can actually be omitted without disrupting the continuity of presentation. G. Indeed.4–13. This includes the sections marked by asterisks. and all of Chapter 14. the trend and practices just described should make the availability of a textbook such as this one exceedingly useful if not imperative. In closing. of the sections marked by an asterisk. One should mention. presentation of results involving characteristic functions may be perfunctory only.

There are basically two streams of textbooks on mathematical statistics that are currently on the market. Part 1 (Chapters 1–10) deals with probability and distribution theory. One category is the advanced level texts which demonstrate the statistical theories in their full generality and mathematical rigor.xviii Contents Preface to the First Edition This book is designed for a ﬁrst-year course in mathematical statistics at the undergraduate level. The advanced texts are inaccessible to them. although it is not formally so divided. in general—with no prior knowledge of statistics. measure theory. whereas the intermediate texts deliver much less than they hope to learn in a course of mathematical statistics. whereas Part xviii . for that purpose. mathematical background of the reader (for example. This has been made possible by developing the fundamentals of modern probability theory and the accompanying mathematical ideas at the beginning of this book so as to prepare the reader for an understanding of the material presented in the later chapters. and results are often stated without proofs or with partial proofs that fail to satisfy an inquisitive mind. The other category consists of intermediate level texts. real and complex analysis). as well as for ﬁrst-year graduate students in statistics—or graduate students. so that students without a sophisticated mathematical background can assimilate a fairly broad spectrum of the theorems and results from mathematical statistics. readers with a modest background in mathematics and a strong motivation to understand statistical concepts are left somewhere in between. The present book attempts to bridge the gap between the two categories. where the concepts are demonstrated in terms of intuitive reasoning. Thus. Some advanced calculus—perhaps taken concurrently—would be helpful for the complete appreciation of some ﬁne points. A typical three-semester course in calculus and some familiarity with linear algebra should sufﬁce for the understanding of most of the mathematical aspects of this book. they require a high level. This book consists of two parts.

Finally. statements of the Laws of Large Numbers and several proofs of the Weak Law of Large Numbers. Special attention is given to estimators derived by the principles of unbiasedness. There are more than 120 examples and applications discussed in detail in . although historically logical. This we consider to be one of the distinctive characteristics of this part. there are special chapters on sequential procedures. The second part of the book opens with an extensive chapter on sufﬁciency. the point estimation problem is taken up and is discussed in great detail and as large a generality as is allowed by the level of this book. . the systematic employment of characteristic functions—rather than moment generating functions—with all the well-known advantages of the former over the latter. along with the examples (most of them numerical) and the illustrative ﬁgures. the above two topics are approached either separately or in the reverse order from the one used here. Next. More precisely. In our opinion. These latter topics are available only in more advanced texts. . a discussion of exponential families. the Multivariate Normal distribution. and also the deﬁnition of a random variable as a measurable function. Here. Other important features are as follows: a detailed and systematic discussion of the most useful distributions along with ﬁgures and various approximations for several of them. which is pedagogically unsound. including all common modes of convergence and their relationship. the establishment of several moment and probability inequalities. are introduced. and also an extensive chapter on transformations of random variables with numerous illustrative examples discussed in detail. and further useful limit theorems. and nonparametric inference. This allows us to state and prove fundamental results in their full generality that would otherwise be presented vaguely using statements such as “it may be shown that . . For the convenience of the reader. . An abundance of examples is also found in this chapter. conﬁdence regions—tolerance intervals.” “it can be proved that . The following chapter is devoted to testing hypotheses problems. in Part 1 the concepts of a ﬁeld and σ-ﬁeld. . an extensive chapter on limit theorems.” etc. a complete statement and proof of the Central Limit Theorem (in its classical form). the book also includes an appendix summarizing all necessary results from vector and matrix algebra. Other features are a complete formulation and treatment of the general Linear Hypothesis and the discussion of the Analysis of Variance as an application of it. In many textbooks of about the same level of sophistication as the present book. A few of the proofs of theorems and some exercises have been drawn from recent publications in journals. quadratic forms. uniform minimum variance and the maximum likelihood and minimax principles. The concept of sufﬁciency is usually treated only in conjunction with estimation and testing hypotheses problems. the reader ﬁnds a discussion of families of probability density functions which have the monotone likelihood ratio property and. in particular. this does not do justice to such an important concept as that of sufﬁciency. .Contents Preface to the First Edition xix 2 (Chapters 11–20) is devoted to statistical inference.

we hope. On the other hand. His meticulous reading of the manuscripts resulted in many comments and suggestions that helped improve the quality of the text. that is. The careful selection of the material. These classes met three hours a week over the academic year. special thanks are due to G. All the application-oriented reader has to do is to skip some ﬁne points of some of the proofs (or some of the proofs altogether!) when studying the book. satisfy people of both theoretical and applied inclinations. and adventurous sophomores. Mehrotra. Section 5 of Chapter 5. Also thanks go to B. Madison. Lind. We feel that there is enough material in this book for a threequarter session if the classes meet three or even four hours a week. and the existence of a host of exercises of both theoretical and applied nature will. G. and a host of others. Wisconsin November 1972 . The material of this book has been presented several times to classes of the composition mentioned earlier. A. R. K. eager. G. Bhattacharyya. As the teaching of statistics becomes more widespread and its level of sophistication and mathematical rigor (even among those with limited mathematical training but yet wishing to know “why” and “how”) more demanding. which are of both theoretical and practical importance. At various stages and times during the organization of this book several students and colleagues helped improve it by their comments. too many to be mentioned here. Also. and Section 3 of Chapter 9. the responsibility in this book lies with this author alone for all omissions and errors which may still be found. the abundance of examples. Agresti. the inclusion of a large variety of topics. the careful handling of these same ﬁne points should offer some satisfaction to the more mathematically inclined readers. as well as juniors and seniors. there are more than 530 exercises. G.xx Preface to Contents the First Edition the text. In connection with this. K. and most of the material was covered in the order in which it is presented with the occasional exception of Chapters 14 and 20. we hope that this book will ﬁll a gap and satisfy an existing need. Of course. appearing at the end of the chapters. classes consisting of relatively mathematically immature. and statistically unsophisticated graduate students.

but s2 ∉S ′. or that it belongs to S is expressed by writing s ∈ S. S S' s1 s2 Figure 1. or space (which may be different from situation to situation).1). Sets are denoted by capital letters. or universal set. S ′ ⊂ S. or that S′ is contained in S.1 Some Deﬁnitions and Notation A set S is a (well deﬁned) collection of distinct objects which we denote by s. These concepts can be illustrated pictorially by a drawing called a Venn diagram (Fig.1 Set Operations 1. since s2 ∈S. we have s ∈ S. will be considered and all other sets in question will be subsets of S. denoted by Ac.1. and write S′ ⊆ S. The negation of the statement is expressed by writing s ∉ S.) 1 . From now on a basic. 1. an element of S. and we write S′ ⊂ S. if S′ ⊆ S and there exists s ∈ S such that s ∉ S′. 1. s ∉ A}.1 S ′ ⊆ S. while lower case letters are used for elements of sets. We say that S′ is a subset of S. The complement (with respect to S) of the set A. to be denoted by S.1. in fact. S′ is said to be a proper subset of S. 1. The fact that s is a member of S.1 Some Deﬁnitions and Notation 1 Chapter 1 Basic Concepts of Set Theory 1. (See Fig. if for every s ∈ S′.2. is deﬁned by Ac = {s ∈ S.

n j =1 For n = 2. j = 1.3 A1 ∪ A2 is the shaded region. . s ∈ A j for at least one j = 1. This deﬁnition extends to an inﬁnite number of sets. j =1 ∞ S Figure 1. 2. A1 A2 .3. . j =1 n is deﬁned by U A j = {s ∈S . s ∈ A j for all j = 1. n j =1 For n = 2.2 1 Basic Concepts of Set Theory A Ac S Figure 1. this is pictorially illustrated in Fig.4. The deﬁnition extends to an inﬁnite number of sets. j =1 ∞ S Figure 1. ⋅ ⋅ ⋅ }. one has I A j = {s ∈S . ⋅ ⋅ ⋅ }. s ∈ A j for at least one j = 1. 2. n}. . . A1 A2 3. The intersection of the sets Aj. . Thus for denumerably many sets. n. 2. to be denoted by A1 ∩ A2 ∩ ⋅ ⋅ ⋅ ∩ An is deﬁned by or I Aj . one has U A j = {s ∈S . . to be denoted by A1 ∪ A2 ∪ ⋅ ⋅ ⋅ ∪ An or U Aj . . this is pictorially illustrated in Fig. n.2 Ac is the shaded region. n}. The union of the sets Aj. 2. j =1 n I A j = {s ∈S . 2. Thus for denumerably many sets. 2. s ∈ A j for all j = 1. ⋅ ⋅ ⋅ .4 A1 ∩ A2 is the shaded region. 1. . 2. ⋅ ⋅ ⋅ . j = 1. 1.

Two sets A1. if both A1 ⊆ A2 and A2 ⊆ A1. I j A j. ( ) A1 A2 1. A2 are said to be disjoint if A1 ∩ A2 = ∅. . j = 1. 1. A1 − A 2 2 1 ≠ A2 − A1. and U A j. A2 − A1 is \\\\. Two sets A1. Note that ( ) ( ) ( ) A1 Δ A2 = A1 ∪ A2 − A1 ∩ A2 . s ∈ A1 . The sets Aj.5. 2. and we write A1 = A2. A1 + ⋅ ⋅ ⋅ + An = ∑ A j and j =1 n A1 + A2 + ⋅ ⋅ ⋅ = ∑ A j j =1 ∞ instead of A1 ∪ A2. where we do not wish to specify the range of j. We will write U j A j. A2 − A1 = s ∈ S .5 A1 − A2 is ////.1 Some Deﬁnitions and Notation 3 4. s ∉ A2 . Pictorially. A2 are said to be equal. . and (3). it is customary to write A1 + A2 . 1. are said to be pairwise or mutually disjoint if Ai ∩ Aj = ∅ for all i ≠ j (Fig. { { } } Symmetrically. this is shown in Fig. s ∈ A2 . In such a case.1. 1. Note that A1 − A2 = A1 ∩ Ac .7).6 A1 Δ A2 is the shaded area. s ∉ A1 .) S Figure 1. It is worthwhile to observe that operations (4) and (5) can be expressed in terms of operations (1). respectively. The difference A1 − A2 is deﬁned by A1 − A2 = s ∈ S . and that. ∑ j A j. which n j =1 ∞ j =1 . S Figure 1. The symmetric difference A1 Δ A2 is deﬁned by A1 Δ A2 = A1 − A2 ∪ A2 − A1 . U A j. (2). in general. . A1 A2 5.1.2 Further Deﬁnitions and Notation A set which contains no elements is called the empty set and is denoted by ∅. A2 − A1 = A2 ∩ Ac . (See Fig.6.

There are two more important properties of the operation on sets which relate complementation to union and intersection. S ∩ A = A. S Figure 1.7 A1 and A2 are disjoint. The previous statements are all obvious as is the following: ∅ ⊆ A for every subset A of S. . S c = ∅. PROOF OF (i) We wish to establish and b) I jAc ⊆ ( U j Aj)c. we prove (i). . A ∩ A = A. A1 ∪ A2 = A2 ∪ A1 A1 ∩ A2 = A2 ∩ A1 6. . The following identity is a useful tool in writing a union of sets as a sum of disjoint sets.1. A ∩ (∪j Aj) = ∪j (A ∩ Aj) A ∪ (∩j Aj) = ∩j (A ∪ Aj) } (Associative laws) (Commutative laws) (Distributive laws) } } are easily seen to be true. ⎝ j ⎠ j ⎛ ⎞ c ⎜ I Aj⎟ = U Aj . A ∩ Ac = ∅. that is.}. 2. Also A1 ∪ A2 = A1 + A2 for the same reason. They are known as De Morgan’s laws: i ) ) ⎛ ⎞ c ⎜ U Aj⎟ = I Aj . A ∪ A = A. . A1 A2 1. or the (inﬁnite) set {1. S ∪ A = S. j a) ( U jAj)c ⊆ I jAc j . . Also 4. (Ac)c = A. An identity: UA j j c c c = A1 + A1 ∩ A2 + A1 ∩ A2 ∩ A3 + ⋅ ⋅ ⋅ . A1 ∪ (A2 ∪ A3) = (A1 ∪ A2) ∪ A3 A1 ∩ (A2 ∩ A3) = (A1 ∩ A2) ∩ A3 5.4 1 Basic Concepts of Set Theory will usually be either the (ﬁnite) set {1. 2. . . ∅ ∪ A = A. ∅c = S. 3. A ∪ Ac = S. ∅ ∩ A = ∅. A1 ∩ A2 = ∅. 2. n}.3 Properties of the Operations on Sets 1. ⎝ j ⎠ j c c ii As an example of a set theoretic proof.

is said to be a monotone sequence of sets if: ii) A1 ⊆ A2 ⊆ A3 ⊆ · · · (that is. a) Let s ∈ ( U jAj)c. . Exercises 1. . The proof of (ii) is quite similar. then lim An = U An . . U Aj . Thus s ∈ Ac for every j j and therefore s ∈ I jAc. ¯ respectively. n = 1. . or ii) A1 A2 A3 · · · (that is. n→∞ n=1 More generally.1. . for any sequence {An}. Then j j s ∉ U jAj and therefore s ∈( U jAj)c. ▲ This section is concluded with the following: DEFINITION 1 The sequence {An}. and n→∞ n=1 ∞ ∞ ii) If An↓. . have veriﬁed the desired equality of the two sets. 2. iii) (A1 − A2) ∪ A2 = A2. iii) (A1 ∩ A2) ∩ (A1 − A2) = ∅. iii) (A1 ∪ A2) − A1 = A2. .1 Some Deﬁnitions and Notation Exercises 5 We will then. 2. . Then s ∈ Ac for every j and hence s ∉ Aj for any j. to be denoted by An↓). An is decreasing. we deﬁne A = lim inf An = U n→∞ ∞ n=1 j= n I Aj. . Then s ∉ U jAj. hence s ∉ Aj for any j. by deﬁnition. An is increasing. to be denoted by An↑). iv) (A1 ∪ A2) ∩ (A2 ∪ A3) ∩ (A3 ∪ A1) = (A1 ∩ A2) ∪ (A2 ∩ A3) ∪ (A3 ∩ A1). then lim An = I An . n = 1.1 Let Aj. The sequence {An} has a limit if A = A. The limit of a monotone sequence is deﬁned as follows: ii) If An↑. 2. ∞ ∞ and A = lim sup An = I n→∞ ∞ n=1 j= n ¯ The sets A and A are called the inferior limit and superior limit. of the sequence {An}. j b) Let s ∈ I jAc. 3 be arbitrary subsets of S.1. j = 1. Determine whether each of the following statements is correct or incorrect.

1.2. 2.1. 7 of S as follows: ⎧ ⎫ A1 = ⎨ x. 1. ⎝ j =2 ⎠ j =2 ( ) ⎞ 7 ⎛ 7 iii) A1 ∪ ⎜ I A j ⎟ = I A1 ∪ A j .5 Establish the distributive laws stated on page 4.A3 and A4 are deﬁned in Exercise 1.2 Let S = {(x. . x ( ) ⎫ ≤ y 2 ⎬. ⎩ ⎭ ⎧ ⎫ A5 = ⎨ x.1.6 1 Basic Concepts of Set Theory 1.1. B = A3 and C = A4. A3 . y ′ ∈ S . ⎩ ⎭ ⎧ ⎫ A7 = ⎨ x. and perhaps their complements. x = y⎬. where i = 0. 0 y 5. . that is. x 2 ≥ y⎬. where A1. y ′ ∈ S . . ⎬ ⎭ 2 ( ) List the members of the sets just deﬁned.2 and show that: ⎞ 7 ⎛ 7 iii) A1 ∩ ⎜ U A j ⎟ = U A1 ∩ A j . y ′ ∈ S .3 Refer to Exercise 1. ⎩ ⎭ ( ) ( ) ( ) ⎧ A2 = ⎨ x. Verify it by taking A = A1. express each one of the following acts: iii) Bi = {s ∈ S. . y ⎩ ⎧ A6 = ⎨ x. x. x 2 = y 2 ⎬. where prime denotes transpose. the subset relationship is transitive. j ⎝ j =1 ⎠ j =1 7 ⎞ ⎛ 7 iv) ⎜ I A j ⎟ = U Ac j ⎝ j =1 ⎠ j =1 c c by listing the members of each one of the eight sets appearing on either side of each one of the relations (i)–(iv). ⎝ j =2 ⎠ j =2 ( ) 7 ⎛ 7 ⎞ iii) ⎜ U A j ⎟ = I Ac . iii) C = {s ∈ S. −5 x 5. j = 1. A2. y ′ ∈ S . A3 }. s belongs to exactly i of A1. x 2 + y 2 ≤ 4 ⎬. ⎬ ⎭ ( )′ ∈ S . y ⎩ ⎧ A4 = ⎨ x. ⎭ ⎫ ′ ∈S . 1. y = integers}.1. A2. 1. A3. .4 Let A. ⎩ ⎭ ⎧ ⎫ A3 = ⎨ x. y)′ ∈ 2.1. 3}. Then show that A ⊆ C. and deﬁne the subsets Aj. s belongs to all of A1.6 In terms of the acts A1. 1. x = − y⎫. B and C be subsets of S and suppose that A ⊆ B and B ⊆ C. y ⎩ ( )′ ∈S .1. 1. A2. x ≤ y2 .

. n ⎭ 2 . s belongs to none of A1.8 Give a detailed proof of the second identity in De Morgan’s laws. 1.10 Let S = follows: 2 and deﬁne the subsets An.1. y ′ ∈ ⎩ ⎧ Bn = ⎨ x. Bn. s belongs to at least 1 of A1. n n n ⎭ 1⎫ 2 . iv) F = {s ∈ S.1. deﬁne the sets An as follows: A2n−1 = A. of S as ⎧ 1 An = ⎨x ∈ . A3}. .1 Some Deﬁnitions and Notation Exercises 7 iii) D = {s ∈ S. Bn↓ B and identify A and B. y ′ ∈ ⎩ ( ) ( ) 1 2 1⎫ ≤ x < 6 − . n⎭ ⎩ Then show that An↑ and Bn↓. A2. Also identify the sets A and B. n = 1. A3 }. ¯ iii) A = {s ∈ S. 1.9(iv)). .7 Establish the identity stated on page 4.12 Let A and B be subsets of S and for n = 1.1. 0 < x < 7 + ⎬. A2. of S as ⎧ An = ⎨ x.1. A2n = B. s belongs to at most 2 of A1. x 2 + y 2 ≤ 3 ⎬. . Then show that lim inf An = A ∩ B. so that lim An = A and lim Bn = B exist (by n→∞ n→∞ Exercise 1. 2. show that ⎛ ⎞ c ⎜ I Aj⎟ = U Aj . n = 1. . 1.1.11 Let S = follows: and deﬁne the subsets An. 2. . ⎝ j ⎠ j 1. ¯ iii) A ⊆ A. 1. then A = A = lim An . n→∞ lim sup An = A ∪ B. Bn. n⎭ ⎧ 3⎫ Bn = ⎨x ∈ . ¯ ¯ iv) If {An} is a monotone sequence.1. − 5 + < x < 20 − n ⎩ 1⎫ ⎬. A3}. iv) E = {s ∈ S. n→∞ . A2. .9 Refer to Deﬁnition 1 and show that c iii) A = {s ∈ S. n→∞ ¯ 1. s belongs to inﬁnitely many A’s}. 3+ Then show that An↑ A. that is. . .1. 2. 0 ≤ y ≤ 2 − 2 ⎬.1. . s belongs to all but ﬁnitely many A’s}.

⋅ ⋅ ⋅ . A2. S c = ∅ ∈ F. . . need not imply that their union or intersection is in F. By the associative law for unions of sets. . Notice. A2 . If Aj ∈ F. j = 1. . ∅ ∈ F. . if (F1) F is a non-empty class. . (It is trivially true for n = 1. A ∪ Ac = S ∈ F. A2 ∈ F implies that A1 ∪ A2 ∈ F (that is. Ak ∈ F. 2. and derive some basic results.) PROOF OF (1) AND (2) (1) (F1) implies that there exists A ∈ F and (F2) implies that Ac ∈ F. that Aj ∈ F. . if A1 . . . and is denoted by F. hence the statement for unions is true for n = 2. Uk= 1 A j ∈ F. (F2) A ∈ F implies that Ac ∈ F (that is. ⎛ k−1 ⎞ A j = ⎜ U A j ⎟ ∪ Ak .2* Fields and σ-Fields In this section. n. we introduce the concepts of a ﬁeld and of a σ-ﬁeld. then U Aj ∈F . that is. In=1 A j ∈ F for any ﬁnite n.1 Consequences of the Deﬁnition of a Field 1. S. see consequence 2 on page 10. Since Ak ∈ F.) Now assume the statement for unions is true for n = k − 1. F is closed under ﬁnite unions and intersections. I j ⎝ j =1 ⎠ j =1 n c * The reader is reminded that sections marked by an asterisk may be omitted without jeo* pardizing the understanding of the remaining material. however. By (F3). then A1 ∪ A2 ∈ F. F is closed under pairwise unions). for a counterexample. j j (That is. . (2) The proof will be by induction on n and by one of the De Morgan’s laws. then Un=1 A j ∈ F. F is closed under complementation).8 1 Basic Concepts of Set Theory 1. By observing that ⎞ ⎛ n A j = ⎜ U Ac ⎟ . present a number of examples. 2. DEFINITION 2 A class (set) of subsets of S is said to be a ﬁeld. By (F2).2. j =1 k−1 Consider A1. . if A1. the statement for unions is true for any ﬁnite n. 1. (F3) implies that j 1 k ⎛ k−1 ⎞ ⎜ U A j ⎟ ∪ Ak = U A j ∈ F ⎝ j =1 ⎠ j =1 and by induction. . By (F3). 2. A2 ∈ F. j = 1. U ⎝ j =1 ⎠ j =1 k − By the induction hypothesis. (F3) A1. Ak−1 ∈ F .

The other two possibilities follow just as in (b). or whose complements are ﬁnite. C1 = {∅. A2 ∈ Fj for every j ∈ I. Hence A1 ∪ A2 ∈ C4.1 Some1. then In=1 A j ∈ F for any ﬁnite n. If Ac is ﬁnite. (We say that F is generated by C and write F = F(C). C2 = {all subsets of S} is a ﬁeld (discrete ﬁeld). then (Ac)c = A is ﬁnite and hence Ac ∈ C4 also. j ∈I ( ) . j ∈ I} be the class of all ﬁelds containing C and deﬁne F(C) by F C = I F j. ii) If A ∈ F. A2 are ﬁnite. A2 ∈ F. . so that S. As an example. PROOF THAT C4 IS A FIELD i) Since S c = ∅ is ﬁnite.) PROOF Clearly. A. j ∈ I be ﬁelds of subsets of S. Next. Then A1 or Ac is ﬁnite and A2 or Ac is ﬁnite. . or uncountable). ▲ j 1. so that C4 is non-empty. Then A or Ac is ﬁnite. Then (A1 ∪ A2)c = A1 ∩ Ac is ﬁnite 2 c since A1 is. Ac}. 1 2 a) Suppose that A1. . is a ﬁeld. Hence (F1). iii) Suppose that A1. then A1. ii) Suppose that A ∈ C4. or countably inﬁnite. Deﬁne F by F = I j ∈I F j = {A. ∅ ∈ Fj for every j ∈ I. S ∈ C4. 3. S.2* Fields and Notation Deﬁnitions and σ -Fields 9 we see that (F2) and the above statement for unions imply that if A1. for some ∅ ⊂ A ⊂ S. Then A1 ∪ A2 ∈ Fj for every j ∈ I. so that Ac ∈ F. THEOREM 1 Let I be any non-empty index set (ﬁnite. A2 ∈ C4. An ∈ F. Then F is a ﬁeld. c c b) Suppose that A1. then Ac ∈ C4. ▲ We now formulate and prove the following theorems about ﬁelds. then A ∈ Fj for every j ∈ I. Let S be inﬁnite (countably so or not) and let C4 be the class of subsets of S which are ﬁnite. If A is ﬁnite.2 Examples of Fields 1. 2.1. Thus Ac ∈ Fj for every j ∈ I. . A2 are both ﬁnite. Then there is a unique minimal ﬁeld F containing C. A ∈ Fj for all j ∈ I}. C4 = {A ⊂ S. (F2). that is. so that A1 ∪ A2 ∈ C4. iii) If A1. and let Fj. and hence A1 ∪ A2 ∈ F. ▲ THEOREM 2 Let C be an arbitrary class of subsets of S. ∅ ∈ F and hence F is non-empty. (F3) are satisﬁed. A or Ac is ﬁnite}. C is contained in the discrete ﬁeld. let {Fj. Then A1 ∪ A2 is ﬁnite. PROOF i) S. S} is a ﬁeld (trivial ﬁeld).2. 4. C3 = {∅. we shall verify that C4 is a ﬁeld.

we invoke the previously mentioned theorem j on the countable union of countable sets. published by Addison-Wesley. Ac} for some ∅ ⊂ A ⊂ S is a σ-ﬁeld.) Let Aj. then (Ac)c = A is countable. . . and is denoted by A. 3. take S = (−∞. . since it is the intersection of sets. 2. we note that ∞ ⎛∞ ⎞ Aj ⎟ = I Ac . . A is closed under j denumerable intersections).3 Consequences of the Deﬁnition of a σ -Field 1. 1957. As an example. a σ-ﬁeld is a ﬁeld. 2. . then A or Ac is countable. Take S to be uncountable and deﬁne C4 as follows: C4 = {all subsets of S which are countable or whose complements are countable}. . Thus A is inﬁnite and j furthermore so is Ac. . iii) The proof of this statement requires knowledge of the fact that a countable union of countable sets is countable. In the second case. of all integers. A. . or there exists some Aj for which Aj is not countable but Ac is.10 1 Basic Concepts of Set Theory By Theorem 1. 4. and deﬁne Aj = {all integers in [−j. j ⎜U ⎝j = 1 ⎠ j = 1 c which is countable. If Aj ∈ A. j = 1. j = 1. whereas Aj ∈ F for all j. . ii) If A ∈ C4. j]}. Hence F = F(C). ∈ A. if it is a ﬁeld and furthermore (F3) is replaced by (A3): If Aj ∈ A. . A is closed under denumerable unions). PROOF i) Sc = ∅ is countable. F(C) is a ﬁeld containing C. then I ∞= 1 A j ∈ A (that is. 2. since it is the intersection of all ﬁelds containing C. one of which is countable. so that Ac ∈ C4. j = 1. 2. say. Then U ∞= 1 A j is the set A. Hence A ∉ F. by deﬁnition Ac ∈ C4. ∞). By deﬁnition. (For proof of this fact see page 36 in Tom M. If A is countable. It is obviously the smallest such ﬁeld. Apostol’s book Mathematical Analysis. C1 = {∅. . C2 = {all subsets of S} is a σ -ﬁeld (discrete σ-ﬁeld). . .2. In the ﬁrst case.4 Examples of σ -Fields 1. 2. so C4 is non-empty. ▲ . C3 = {∅. and is unique. 1. j = 1. Then either each Aj is countable. then U ∞= 1 A j j ∈ A (that is. In fact. . we prove that C4 is a σ-ﬁeld. If Ac is countable. 1. S} is a σ-ﬁeld (trivial σ-ﬁeld). 2. but the converse is not true.2. in Example 4 on page 9. . ▲ DEFINITION 3 A class of subsets of S is said to be a σ-ﬁeld. S.

1 Some1.5 Special Cases of Measurable Spaces 1. ( x. y ∈ . [ x. we denote this σ-ﬁeld by B and call . ∈ Aj for every j ∈ I and hence U ∞= 1 A j j ∈ A j. for every j ∈ I.⎫ } = ⎪ x. then for certain technical reasons. Deﬁne σ C = I all σ -ﬁelds containing C . if S is countable (that is. x < y ⎪. ii) If A ∈ A. then A1. PROOF i) S. ∅ ∈ Aj for every j ∈ I and hence they belong in A. Let S be (the set of real numbers. REMARK 1 ( ) { } 1. ▲ For later use. be σ-ﬁelds. iii) If A1. . which now plays the role of the entire space. However.2* Fields and Notation Deﬁnitions and σ -Fields 11 We now introduce some useful theorems about σ-ﬁelds. ( x. By Theorem 3. x.) PROOF Clearly. if S is uncountable. In both cases. . A2. Then A is a σ-ﬁeld. THEOREM 3 Let I be as in Theorem 1. ﬁnite or denumerably inﬁnite). y). Deﬁne A by A = I j∈I Aj = {A. y . or otherwise known as the real line) and deﬁne C0 as follows: C0 = all intervals in { ⎧( −∞. the pair (S. Hence A = σ(C). then AA = {C. we take the σ-ﬁeld to be “smaller” than the discrete one. ∞). . and let Aj. x ). x ]. ▲ j THEOREM 4 Let C be an arbitrary class of subsets of S. A) is called a measurable space. y . C is contained in the discrete σ-ﬁeld. where complements of sets are formed with respect to A. then A ∈ Aj for every j ∈ I. A2. . we note that if A is a σ-ﬁeld and A ∈ A. ⎨ ⎬ ⎪( ⎪ ][ )[ ] ⎩ ⎭ By Theorem 4. In all that follows. Uniqueness and minimality again follow from the deﬁnition of σ(C ). Thus Ac ∈ A.2.2. A ∈ Aj for all j ∈ I}. . ∈ A. y . This is easily seen to be true by the distributive property of intersection over union (see also Exercise 1. ( −∞. ∞). j ∈ I. (We say that A is the σ-ﬁeld generated by C and write A = σ(C). x. σ(C ) is a σ-ﬁeld which obviously contains C. thus U ∞= 1 A j ∈ A . . x. so that Ac ∈ Aj for every j ∈ I. C = B ∩ A for some B ∈ A} is a σ-ﬁeld. there is a σ-ﬁeld A = σ(C0). Then there is a unique minimal σ-ﬁeld A containing C.5). we will always take A to be the discrete σ-ﬁeld. .1.

y] ∩ (x. . x. y) = (−∞. ∞). . . x < y}.12 1 Basic Concepts of Set Theory it the Borel σ-ﬁeld (over the real line). . x ]. . . y] = (−∞. In the case of C′j. where for j = 1. ∞) ∈σ (C ). x ]. x] = (−∞. consider monotone sequences of rational numbers convergent to given irrationals x. . y) ∩ (x. Then (−∞. x ∈ }. Clearly. y). 8. 8. ∞). Thus. x. C1 = C2 C3 C4 C5 C6 C7 C8 Also the classes C ′j. xn) ∈ σ (C7). ▲ . j = 1. C′ are two classes of subsets of S such that C ⊆ C′. y ∈ . ∞) ∈σ (C ). it sufﬁces to prove that B ⊆ σ (Cj). . = {( −∞. = {( −∞. . . y are restricted to the j rational numbers. j = 1. . . Since (x. it sufﬁces to show that C0 ⊆ σ (Cj). x. x ) = (−∞. . B) is called the Borel real line. C′ is deﬁned the same way as Cj is except that x. (x. x ∈ }. y). Next. y] = (−∞. n But PROOF {(x. y ∈ . As an example. y ∈ . . 2. x. then σ (C) ⊆ σ (C′). C0 ⊆ σ(C′j). B ⊆ σ (C′j). ∞) = (−∞. we show that C0 ⊆ σ (C7). c 7 it also follows that (x. = {[ x. = {( x. 8. y] ∩ [x. x < y}. x < y}. x) . x] . 8 generate the Borel σ-ﬁeld. 2. and in order to prove this. xn) ∈ σ (C7) and hence I∞=1 (−∞. THEOREM 5 Each one of the following classes generates the Borel σ-ﬁeld. if C. x ∈ }. ∞) = (−∞. y]. ∞) ∈σ (C ). = {[ x. . y) − (−∞. ∞) ∈ σ(C7). [x. ∞) ∈σ (C ). y. 8. = {[ x. . c [x. y ∈ . y) = (−∞. x] ∈ σ(C7) for every x ∈ . [x. j = 1. [x. in order to prove the theorem. n n=1 ∞ Thus (−∞. j = 1. The pair ( . = {( x. . . x < y}. Consider xn ↓ x. . 7 7 7 Thus C0 ⊆ σ (C7). I (−∞. x ∈ }. . y) ∩ [x. 2. ∞). y]. (x. x ).

1. 2. n ≥ 1. which states that if Aj ∈ A. 2 . 1.2. ∞) × (x ′.2. ⋅ ⋅ ⋅ then I Aj ∈ A. belong to A. Let S = × × · · · × = k (k copies of ) and deﬁne C0 in a way similar to that in (2) above. ∞).2.2 Show that in the deﬁnition of a ﬁeld (Deﬁnition 2). 2. 1 . Let S = × = 2 and deﬁne C0 as follows: 2 C0 = all rectangles in { } = {(−∞. y′ ∈ . .2. 2. [x. Determine whether or not C is a ﬁeld.4 Refer to Example 4 on σ-ﬁelds on page 10 and explain why S was taken to be uncountable. 4 . x) × (−∞. {1. ∞). j =1 ∞ 1. 4 . y′). 1.2. S }. x ′]. 4}. y. 1. A theorem similar to Theorem 5 holds here too.6 Refer to Deﬁnition 1 and show that all three sets A. n→∞ ¯ whenever it exists. 3.8 Complete the proof of the remaining parts in Theorem 5. belong to A provided An. property (A3) can be replaced by (A3′). x ′. ⋅ ⋅ ⋅ . x ′). x ′]. 3 . property (F3) can be replaced by (F3′) which states that if A1. A and lim An.1 Verify that the classes deﬁned in Examples 1. 3. [x. x ′ < y′}. x] × (−∞. 4}. 1.1. ¯ 1. A2 ∈ F. 4 . x. x) × (−∞. j = 1. (−∞. 2. 1. 3. A theorem similar to Theorem 5 holds here too. x ′). 2.2. 3. ⋅ ⋅ ⋅ . The σ-ﬁeld generated by C0 is denoted by B2 and is the two-dimensional Borel σ-ﬁeld. 4} and deﬁne the class C of subsets of S as follows: C = ∅. 2 . then A1 ∩ A2 ∈ F. x] × (−∞.7 Let S = {1. ⋅ ⋅ ⋅ . 1. The σ-ﬁeld generated by C0 is denoted by Bk and is the k-dimensional Borel σ-ﬁeld.2. (−∞. 1. y) × (x ′. 2 and 3 on page 9 are ﬁelds. 3}. Exercises 1. (x.5 Give a formal proof of the fact that the class AA deﬁned in Remark 1 is a σ-ﬁeld. y′]. x < y. (−∞.1 Some Deﬁnitions and Notation Exercises 13 2. ∞) × [x ′.2.3 Show that in Deﬁnition 3. y] × [x ′. 3 . { {} {} {} { } { } { } { } { } { } {1. (x. {2. 3 .

and $(1. for example. then so is their union U j Aj. recording the number of telephone calls which arrive at a telephone exchange within a speciﬁed period of time. drawing a card from a standard deck of playing cards. Accordingly. then so is its complement Ac. Also. etc. recording the heights of individuals in a certain class. given the conditions under which the experiment is carried out. .000 at the annual rate of 5% will yield $1. 2. is called a random experiment.050 after one year.14 2 Some Probabilistic Concepts and Results Chapter 2 Some Probabilistic Concepts and Results 2. if A is an event. except that it is known to be one of a set of possible outcomes. The set of all possible outcomes of a random experiment is called a sample space and is denoted by S. An experiment for which the outcome cannot be determined. rolling a die. if Aj. If. a container of pure water is brought to a temperature of 100°C and 760 mmHg of atmospheric pressure the outcome is that the water will boil. 14 . The elements s of S are called sample points. S and ∅ are always events. Examples of random experiments are tossing a coin. the outcome is completely determined. are events. An experiment is a deterministic experiment if. and are called the sure or certain event and the impossible event. j = 1.05)n × 1. Also. we require that. Certain subsets of S are called events.1 Probability Functions and Some Basic Properties and Results Intuitively by an experiment one pictures a procedure being carried out under a certain set of conditions whereby the procedure can be repeated any number of times under the same set of conditions. counting the number of defective items produced by a certain manufacturing process within a certain period of time. and upon completion of the procedure certain results are observed. Only random experiments will be considered in this book. while an event containing at least two sample points is called a composite event. respectively. . Events of the form {s} are called simple events. The class of all events has got to be sufﬁciently rich in order to be meaningful. . a certiﬁcate of deposit of $1.000 after n years when the (constant) interest rate is compounded.

P) (or (S. ( ) ( ) ( ) ( ) or 1 = 1+ P ∅ + ⋅ ⋅ ⋅ ( ) and P ∅ = 0.4 Combinatorial 15 (In the terminology of Section 1. . If the random experiment results in s and s ∈ A. we have n ⎛ n ⎞ P⎜ ∑ A j ⎟ = ∑ P A j .1 Consequences of Deﬁnition 1 (C1) P(∅) = 0.1 Probability Functions and Some Basic Properties and Results 2. .1. there are only ﬁnitely many events and hence. P(S) = 1. . in particular. A1 − A2 occurs if A1 occurs but A2 does not. REMARK 1 If S is ﬁnite. P)) is known as a probability space. . The next basic quantity to be introduced here is that of a probability function (or of a probability measure). (P3′) follows then from this assumption by induction. . that is. S = S + ∅ + ⋅ ⋅ ⋅ . In fact. . that is for any event Aj. that is. . 2. etc. possibly ∅. for every collection of pairwise (or mutually) disjoint events Aj. Then (P3) is reduced to: (P3′) P is ﬁnitely additive. ﬁnitely many pairwise disjoint events. Aj. (P2) P is normed. P(A) ≥ 0. . i ≠ j. j = 1. . then every subset of S is an event (that is. A. The triple (S. DEFINITION 1 A probability function denoted by P is a (set) function which assigns to each event A a number denoted by P(A).) (C2) P is ﬁnitely additive. we say that the event A occurs or happens. (P3) P is σ-additive. j = 1. class of events. 2. with probability 0 is called a null event. for every event A. we have P(Σj Aj) = Σj P(Aj). .) It follows then that I j Aj is also an event. (So P(∅) = 0. ⎝ j =1 ⎠ j =1 ( ) Actually.2. in such a case it is sufﬁcient to assume that (P3′) holds for any two disjoint events.2. 2. and so is A1 − A2. and satisﬁes the following requirements: (P1) P is non-negative. . that is. j = 1. ( ) since P(∅) ≥ 0. A is taken to be the discrete σ-ﬁeld). n. that is. we require that the events associated with a sample space form a σ-ﬁeld of subsets in that space. The U j Aj occurs if at least one of the Aj occurs. for every collection of pairwise disjoint events. n such that Ai ∩ Aj = ∅. . etc. In such a case. 2. called the probability of A. the I j Aj occurs if all Aj occur. This is the axiomatic (Kolmogorov) deﬁnition of probability. so that P S = P S +∅+ ⋅⋅⋅ = P S +P ∅ + ⋅⋅⋅ . Any event. .

that is A1 ⊆ A2 implies P(A1) ≤ P(A2). since A + Ac = S. (P2). then P(A2 − A1) = P(A2) − P(A1). P Σ jn= 1Aj = P Σ j∞= 1PAj = Σ j∞= 1P Aj = ( ) ( ) ( ) (C3) For every event A. ( ) hence P A2 = P A1 + P A2 − A1 . P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2). (C6) For any events A1. A2. In fact. In fact. ( ) ( ) ( ) and therefore P(A2) ≥ P(A1). ⎝ j =1 ⎠ j =1 ( ) Indeed. P A + Ac = P S . but this is not true. A1 ∪ A2 = A1 + A2 − A1 ∩ A2 . in general. (C7) P is subadditive. P(Ac) = 1 − P(A). j ≥ n + 1. Hence P A1 ∪ A2 = P A1 + P A2 − A1 ∩ A2 1 2 1 ( ) ( ) ( ) ( ) = P( A ) + P( A ) − P( A ∩ A ). ( ) for Aj = 0. ⎞ ∞ ⎛∞ P⎜ U A j ⎟ ≤ ∑ P A j ⎝ j =1 ⎠ j =1 ( ) ( ) ( ( ) ) and also n ⎛ n ⎞ P⎜ U A j ⎟ ≤ ∑ P A j . or ( ) () P A + P Ac = 1. This follows from (C1). (C4) P is a non-decreasing function. ( ) ( ) so that P(Ac) = 1 − P(A). Σ jn= 1P Aj . In fact. 2 since A1 ∩ A2 ⊆ A2 implies P A2 − A1 ∩ A2 = P A2 − P A1 ∩ A2 . and (C4). (C5) 0 ≤ P(A) ≤ 1 for every event A. A2 = A1 + A2 − A1 . REMARK 2 If A1 ⊆ A2. ⎝ j =1 ⎠ j =1 ( ) . that is.16 2 Some Probabilistic Concepts and Results n ⎛ n ⎞ P⎜ ∑ A j ⎟ = ∑ P A j .

ﬁnite or not. we have n ⎛ n ⎞ P⎜ U A j ⎟ = ∑ P A j − ∑ P A j ∩ A j ⎝ j =1 ⎠ j =1 1≤ j < j ≤ n ( ) ∑ ( 1 2 ) ) + P Aj ∩ Aj ∩ Aj 1 2 1≤ j1 < j2 < j3 ≤ n ( 1 2 3 ) − ⋅ ⋅ ⋅ + −1 ( ) n+1 P A1 ∩ A2 ∩ ⋅ ⋅ ⋅ ∩ An . . With this deﬁnition. j =1 (P3) and (C2). . supplied with a class of events A. it is called the probability of A. The relative frequency deﬁnition of probability is this: Let S be any sample space.2. Such a probability function is called a uniform probability function. however. The relative frequency deﬁnition of probability provides. and (C4). This deﬁnition is adequate as long as S is ﬁnite and the simple events {sj}. j = 1. may be assumed to be “equally likely. and deﬁne P as P({sj}) = 1/n. A random experiment associated with the sample space S is carried out n times. A special case of a probability space is the following: Let S = {s1. the statement is trivial. ( PROOF (By induction on n). this deﬁnition satisﬁes (P1). n. this classical deﬁnition together with the following relative frequency (or statistical) deﬁnition of probability served as a motivation for arriving at the axioms (P1)–(P3) in the Kolmogorov deﬁnition of probability.” but it breaks down if either of these two conditions is not satisﬁed.4 Combinatorial 17 This follows from the identities U A j = A1 + ( A1c ∩ A2 ) + ⋅ ⋅ ⋅ + ( A1c ∩ ⋅ ⋅ ⋅ ∩ Anc−1 ∩ An ) + ⋅ ⋅ ⋅ . j = 1. as n → ∞. We now state and prove some general theorems about probability functions. THEOREM 1 (Additive Theorem) For any ﬁnite number of events. . For n = 1. an intuitively satisfactory interpretation of the concept of probability. (P2) and (P3′). . 2. If. P clearly satisﬁes (P1)–(P3′) and this is the classical deﬁnition of probability. and prove it for n = k + 1. Now assume the result to be true for n = k. . . .1 Probability Functions and Some Basic Properties and Results 2. 2. . . let the class of events be the class of all subsets of S. Let n(A) be the number of times that the event A occurs. and we have proven the case n = 2 as consequence (C6) of the deﬁnition of probability functions. s2. j =1 n ∞ U A j = A1 + ( A1c ∩ A2 ) + ⋅ ⋅ ⋅ + ( A1c ∩ ⋅ ⋅ ⋅ ∩ Anc−1 ∩ An ). sn}. . Neither the classical deﬁnition nor the relative frequency deﬁnition of probability is adequate for a deep study of probability theory. n. Clearly. . lim[n(A)/n] exists. . and is denoted by P(A). . However. respectively.

⎠ ⎞ (1) k ⎛ k ⎞ P⎜ U A j ∩ Ak+1 ⎟ = ∑ P A j ∩ Ak+1 − ∑ P A j1 ∩ A j2 ∩ Ak+1 ⎝ j =1 ⎠ j =1 1≤ j1 < j2 ≤ k ( ) ( ) ( ) ) + 1≤ j1 < j2 < j3 ≤ k ∑ k P A j1 ∩ A j2 ∩ A j3 ∩ Ak+1 − ⋅ ⋅ ⋅ ( ) + −1 + −1 Replacing this in (1).18 2 Some Probabilistic Concepts and Results We have ⎞ ⎛⎛ k ⎛ k+1 ⎞ ⎞ P⎜ U A j ⎟ = P⎜ ⎜ U A j ⎟ ∪ Ak+1 ⎟ ⎝ j =1 ⎠ ⎠ ⎝ ⎝ j =1 ⎠ ⎛⎛ k ⎞ ⎛ k ⎞ ⎞ = P⎜ U A j ⎟ + P Ak+1 − P⎜ ⎜ U A j ⎟ ∩ Ak+1 ⎟ ⎝ j =1 ⎠ ⎝ ⎝ j =1 ⎠ ⎠ ( ) ⎡k = ⎢∑ P A j − ∑ P A j1 ∩ A j2 1≤ j1 < j2 ≤ k ⎢ ⎣ j =1 + ∑ P A j1 ∩ A j2 ∩ A j3 − ⋅ ⋅ ⋅ ( ) k+1 ( ) 1≤ j1 < j2 < j3 ≤ k ( ) + −1 k+1 ( ) ⎛ k ⎞ P A1 ∩ A2 ∩ ⋅ ⋅ ⋅ ∩ Ak ⎤ + P Ak+1 − P⎜ U A j ∩ Ak+1 ⎟ ⎥ ⎦ ⎝ j =1 ⎠ ( ) ( ) ( ) = ∑ P Aj − j =1 + 1≤ ji < j2 < j3 ≤ k ( ) ∑ P( A ∩ A ) ∑ P( A ∩ A ∩ A ) − ⋅ ⋅ ⋅ 1≤ ji < j2 ≤ k j1 j1 j2 j2 j3 k+1 + −1 But ( ) ⎛ k P A1 ∩ A2 ⋅ ⋅ ⋅ ∩ Ak − P⎜ U A j ∩ Ak+1 ⎝ j =1 ( ) ( )⎟ . ( ) k ⎡ ⎤ ⎛ k+1 ⎞ k+1 P⎜ U A j ⎟ = ∑ P A j − ⎢ ∑ P A j ∩ A j + ∑ P A j ∩ Ak+1 ⎥ ⎝ j =1 ⎠ j =1 ⎢1≤ j < j ≤ k ⎥ j =1 ⎣ ⎦ ⎡ ⎤ + ⎢ ∑ P A j ∩ A j ∩ A j + ∑ P A j ∩ A j ∩ Ak+1 ⎥ 1≤ j < j ≤ k ⎢ ⎥ ⎣1≤ j < j < j ≤ k ⎦ ( ) 1 2 ( 1 2 ) ( ) 1 2 ( 1 2 3 ) ( 1 2 ) 3 1 2 − ⋅ ⋅ ⋅ + −1 + ( ) [ P( A ∩ ⋅ ⋅ ⋅ ∩ A ) k+1 1 k 1≤ j1 < j2 < ⋅ ⋅ ⋅ < jk − 1 ∑ ⎤ P A j ∩ ⋅ ⋅ ⋅ ∩ A j ∩ Ak+1 ⎥ ≤k ⎥ ⎦ ( 1 k− 1 ) . we get ( ) ( ) 1≤ j1 < j2 ⋅ ⋅ ⋅ jk − 1 ≤ k k+1 ∑ P A j1 ∩ ⋅ ⋅ ⋅ ∩ A jk− 11 ∩ Ak+1 ( P A1 ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 .

2. Hence ) ( ) ⎛∞ ⎞ P lim An = P⎜ U A j ⎟ = P A1 + P A2 − A1 n→∞ ⎝ j =1 ⎠ ( ) ( ) ( 2 1 ) n n−1 ( ) = lim[ P( A ) + P( A − A ) + ⋅ ⋅ ⋅ + P( A − A )] = lim[ P( A ) + P( A ) − P( A ) + P( A ) − P( A ) + ⋅ ⋅ ⋅ + P( A ) − P( A )] = lim P( A ). as n → ∞. n→∞ n→∞ ( ) ( ) Now let An ↓. by the assumption that An ↑. so that n c lim An = U Ac . Then lim An = U A j. n→∞ n→∞ ( ) ( ) PROOF Let us ﬁrst assume that An ↑. Then Ac ↑.1 Probability Functions and Some Basic Properties and Results 2. j n→∞ j=1 ∞ . n→∞ j=1 ∞ We recall that UA j =1 ∞ j c = A1 + A1c ∩ A2 + A1c ∩ A2 ∩ A3 + ⋅ ⋅ ⋅ ( ( ) ( ) = A1 + A2 − A1 + A3 − A2 + ⋅ ⋅ ⋅ . An ↑ or An ↓. ▲ ( ) Let {An} be a sequence of events such that. + P A3 − A2 + ⋅ ⋅ ⋅ + P An − An−1 + ⋅ ⋅ ⋅ n→∞ n→∞ 1 1 2 1 3 2 n n−1 n→∞ n ( ) Thus P lim An = lim P An . Then.4 Combinatorial 19 ( ) P( A ∩ ⋅ ⋅ ⋅ ∩ A ∩ A ) = ∑ P( A ) − ∑ P( A ∩ A ) + ∑ P( A ∩ A ∩ A ) − ⋅ ⋅ ⋅ + −1 k+1 j=1 k+ 2 1 k k+1 j 1≤ j1 < j2 ≤ k+1 j1 j1 j2 1≤ j1 < j2 < j3 ≤ k+1 j2 j3 + −1 THEOREM 2 ( ) k+ 2 P A1 ∩ ⋅ ⋅ ⋅ ∩ Ak+1 . P lim An = lim P An .

x 2 + 1 ≤ 375 . what is the probability that: i) x1 + x2 = 8? ii) x1 + x2 ≤ 5? 2.1.2 If two fair dice are rolled once. c ⎡⎛ ∞ ⎞ ⎤ ⎛∞ ⎞ ⎢ I A j ⎥ = lim 1 − P An . what is the probability that the total number of spots shown is i) Equal to 5? ii) Divisible by 3? 2. 2. respectively. ▲ This theorem will prove very useful in many parts of this book. and C by: A = x ∈ S . { B = {x ∈ S . compute the probability of the following events: 12 c c c A ∩ A2 . or 1 − P I A j = 1 − lim P An . 2.3 Twenty balls numbered from 1 to 20 are mixed in an urn and two balls are drawn successively and without replacement. c 1 1 4 . A1c ∩ A2 ∩ A3c . j = 1. { } } . P(A3) = 7 . n→∞ n→∞ ⎝ j =1 ⎠ ( ) ( ) and the theorem is established. A2 ∩ A3 . 3 are such that A1 ⊂ A2 ⊂ A3 and P(A1) = 5 P(A2) = 12 .1. j n→∞ n→∞ ⎝ j=1 ⎠ ( ) ( ) or equivalently. A1c ∩ A3 . B. x = 3n + 10 } C = x ∈ S .1. P ⎜ ⎟ ⎥ n→∞ ⎜ ⎟ n→∞ ⎢⎝ j = 1 ⎠ ⎝ j =1 ⎠ ⎦ ⎣ [ ( )] ( ) Thus ⎞ ⎛∞ lim P An = P⎜ I A j ⎟ = P lim An .1 If the events Aj. for some positive integer n .4 Let S = {x integer. x is divisible by 7 . A1 ∩ A2 ∩ A3c . Exercises 2. If x1 and x2 are the numbers written on the ﬁrst and second ball drawn.1. 1 ≤ x ≤ 200} and deﬁne the events A.20 2 Some Probabilistic Concepts and Results Hence ⎞ ⎛∞ c c P lim An = P⎜ U Ac ⎟ = lim P An .

Assuming the uniform probability function. consider a balanced die and suppose that the sides bearing the numbers 1. 2.1.2 Conditional Probability In this section.6 Suppose that the events Aj. B as follows: { } B = {s ∈ S .5 Let S be the set of all outcomes when ﬂipping a fair coin four times and let P be the uniform probability function on the events of S. Before the formal deﬁnition of conditional probability is given. we shall introduce the concepts of conditional probability and stochastic independence. suppose that the die is rolled once as 6 before and all that we can observe is the color of the upward side but not the number on it (for example. .2 Conditional Probability 2. . Letting B stand for the event that number 6 appears and A for the event that the uppermost side is red. namely. The die is rolled once and we are asked for the probability that the upward side is the one bearing the number 6. the answer now is 1 .1. 2. what is the probability that the number on the uppermost side is 6. whereas the remaining three sides are painted black. j = 1. To this end. s contains more T ’s than H ’s . where P is the equally likely probability function on the events of S. by assuming the uniform probability function. clearly. P(B). j=1 ∞ Use Deﬁnition 1 in Chapter 1 and Theorem 2 in this chapter in order to show ¯ that P(A ) = 0. we may be observing the die from a considerable distance. given the information that the uppermost side was painted red. the above- . any T in s precedes every H in s}. . 1 . we shall attempt to provide some intuitive motivation for it. are such that ∑ P( A j ) < ∞. A = s ∈ S . and use Deﬁnition 1 in Chapter 1 and Theorem 2 herein in order to show that P A ≤ lim inf P An ≤ lim sup P An ≤ P A . . P(B). Next.1. 2.4 Combinatorial Results 21 Compute P(A). the answer is. The same question as above is asked. Deﬁne the events A. Again. Compute the probabilities P(A). . j = 1.2. 2.7 Consider the events Aj. 4 and 6 are painted red. so that the color is visible but not the numbers on the sides). This latter probability is called the condi3 tional probability of the number 6 turning up. 2. P(C). n→∞ n→∞ ( ) ( ) ( ) ( ) 2. .

Suppose further (although this is not exactly correct) that: P({bb}) = P({bg}) = P({gb}) = P({gg}) = 1 . be events such that ⎛ n−1 ⎞ P ⎜ I A j ⎟ > 0. and bg. ) The conditional probability can be used in expressing the probability of the intersection of a ﬁnite number of events. DEFINITION 2 Let A be an event such that P(A) > 0. Then P(A|B) = P(A ∩ B)/ P(B) = 1 . where b stands for boy and g for girl. j = 1. 2. A sample space for this experiment is the following: S = {bb. THEOREM 3 (Multiplicative Theorem) Let Aj. . Next. Or suppose that. it sufﬁces to prove the P(·|A) satisﬁes (P1)–(P3). . for example. .22 2 Some Probabilistic Concepts and Results mentioned conditional probability is denoted by P(B|A). one is led to the following deﬁnition of conditional probability. ⎝ j =1 ⎠ Then . . we observe two-children families in a certain locality. . gb. PSA = P S∩A P A ( ) (P( A) ) = P(( A)) = 1. given A. given A. i ≠ j. and deﬁne the events A and B as follows: A = “children of one gender” = 4 {bb. are events such that Ai ∩ Aj = ∅. P(B|A) is called the conditional probability of B. indicates that the boy is older than the girl. bg. is the (set) function denoted by P(·|A) and deﬁned for every event B as follows: P BA = P A∩ B ( ) (P( A) ) . . and we observe that this is equal to the quotient P(A ∩ B)/P(A). for the purposes of a certain study. j = 1. gb}. The set function P(·|A) is actually a probability function. To see this. we have ⎡⎛ ∞ ⎤ ⎞ ⎤ ⎡ ∞ ⎛ ∞ ⎞ P ⎢⎝ ∑j = 1 A j ⎠ ∩ A⎥ P ⎣∑j = 1 A j ∩ A ⎦ ⎥ ⎣ ⎦= ⎢ P⎜ ∑ A j A⎟ = P A P A ⎝ j =1 ⎠ ( ) = 1 P A ( ) ∑ P( A j =1 ∞ j ∩A =∑ j =1 ) ∞ P Aj ∩ A P A ( ( ( ) ) ( ) )= ∑ P( A ∞ j =1 j A. 3 From these and other examples. B = “at least one boy” = {bb. bg. n. . clearly. and if Aj. gg}. gg}. Then the conditional probability. 2. We have: P(B|A) 0 for every event B. and record the gender of the children. .

8 P A4 A1 ∩ A2 ∩ A3 = Thus the required probability is equal to 1 42 ( ) 4 . Such a collection of events is called a partition of S. The partition is ﬁnite or (denumerably) inﬁnite. 2. . . This point is illustrated by the following simple example. Under the condition that P(B) > 0. THEOREM 4 (Total Probability Theorem) Let {Aj. Find the probability that the ﬁrst ball is black. For any event.) The value of the above formula lies in the fact that. j j ( ) ( ) ( )( ) provided P(Aj) > 0. .0238. it is easier to calculate the conditional probabilities on the right-hand side. .2. ( ) (The proof of this theorem is left as an exercise. and by using the uniform probability function. Let A1 be the event that the ﬁrst ball is black.2 Combinatorial Results 23 ⎛ n ⎞ P⎜ I A j ⎟ = P An A1 ∩ A2 ∩ ⋅ ⋅ ⋅ ∩ An−1 ⎝ j =1 ⎠ ( ) ( )( ) × P An−1 A1 ∩ ⋅ ⋅ ⋅ ∩ An− 2 ⋅ ⋅ ⋅ P A2 A1 P A1 . in general. Then P A1 ∩ A2 ∩ A3 ∩ A4 ( = P A4 A1 ∩ A2 ∩ A3 P A3 A1 ∩ A2 P A2 A1 P A1 . 10 P A2 A1 = ( ) 3 . 2. 9 P A3 A1 ∩ A2 = ( ) 2 . REMARK 3 EXAMPLE 1 An urn contains 10 identical balls (except for color) of which ﬁve are black. i ≠ j. the above formula . .4 Conditional Probability 2. we clearly have: B = ∑ B ∩ Aj . 7 0. . j = 1. and Σj Aj = S. the second red. j = 1. for all j. three are red and two are white. . see Exercise 2. j = 1. Then for B ∈ A. we have P(B) = ΣjP(B|Aj)P(Aj). accordingly. as the events Aj are ﬁnitely or (denumerably) inﬁnitely many. Thus we have the following theorem. j ( ) Hence P B = ∑ P B ∩ Aj =∑ P B Aj P Aj . .2.4. . A2 be the event that the second ball is red. 2. . A3 be the event that the third ball is white and A4 be the event that the fourth ball is black. be events such that Ai ∩ Aj = ∅. the third white and the fourth black. This formula gives a way of evaluating P(B) in terms of P(B|Aj) and P(Aj). . Now let Aj. Four balls are drawn without replacement. } be a partition of S with P(Aj) > 0. we have ( ) )( )( )( ) P A1 = ( ) 5 . all j.

otherwise he/she chooses an answer at random. .24 2 Some Probabilistic Concepts and Results can be “reversed” to provide an expression for P(Aj|B). ∑ j j j j j i i i (Bayes Formula) If {Aj. . j i i j i = 1.92. respectively. 1 4p+1 1⋅ p + 1 − p 5 ( ) Furthermore. . Of course. ∑ P( B A ) P( A ) P B Aj P Aj i i i ( It is important that one checks to be sure that the collection {Aj. Then.} {Bj. . . j ≥ 1} forms a partition of S. 0. ⋅ ⋅ ⋅ . for example. that P(A|B) is approximately equal to: 0. We may consider. P Aj B = Thus THEOREM 5 ( P B A ) P( A ) P B A ) P( A ) P A ∩B ) ( P(B) ) = ( P(B) = (P(B A )P( A ) . 2. ⋅ ⋅ ⋅ . clearly.83 and 0. we ﬁnd. of which only one is correct. . j Bj and ( ) = ∑ ( B ∩ A ). 2. j = 1. }. . j = 1. then he/she is certain to identify the correct answer. For example. an application of Theorems 4 and 5 gives P AB = ( ) P BA P A +P BA P A c ( )( ) ( P BA P A ( )( ) )( ) c = 1⋅ p 5p = . Let p denote the probability of the event A that the student does the homework and let B be the event that he/she answers the question correctly. Find the expression of the conditional probability P(A|B) in terms of p. REMARK 4 EXAMPLE 2 A multiple choice test question lists ﬁve alternative answers.} is a partition of S and P(Aj) > 0.3. By noting that A and Ac form a partition of the appropriate sample space. 2. as only then are the above theorems true. and if P(B) > 0. . for p = 0. . . two partitions {Ai. . ⋅ ⋅ ⋅ .7. then P Aj B = ( ) )( ). it is easily seen that P(A|B) = P(A) if and only if p = 0 or 1. j = 1.5. j = 1. 0. . 2. In fact. 2. i = 1. . ⋅ ⋅ ⋅} i . there is no reason to restrict ourselves to one partition of S only. . {A ∩ B . 2. 2. If a student has done the homework. j = 1. The following simple example serves as an illustration of Theorems 4 and 5. 2.68. 0. j = 1. . . Ai = ∑ Ai ∩ B j . 2. i = 1. .

2. by means of counterexamples. The probabilities P(Ai).2. On the other hand.3 If A ∩ B = ∅ and P(A + B) > 0. i ( ) we get P Ai = ∑ P Ai ∩ B j = ∑ P Ai B j P B j .2.2.1 If P(A|B) > P(A).5 Suppose that a multiple choice test lists n alternative answers of which only one is correct. A and B be deﬁned as in Example 2 and ﬁnd Pn(A|B) (P(A)P(B) > 0). P(Bj) are called marginal probabilities. . . 2. . j ) ≠ (i′.4 Combinatorial Results Exercises 25 is a partition of S. . and P B j = ∑ P Ai ∩ B j = ∑ P B j Ai P Ai . .2. j = 1. 2. i = 1.2 Show that: i) P(Ac|B) = 1 − P(A|B). Exercises 2. then show that P(B|A) > P(B) 2. j i j The expression P(Ai ∩ Bj) is called the joint probability of Ai and Bj.2. express the probabilities P(A|A + B) and P(B|A + B) in terms of P(A) and P(B). i j i j i. We have analogous expressions for the case of more than two partitions of S. ii) P(A ∪ B|C) = P(A|C) + P(B|C) − P(A ∩ B|C). j ′) i i and ∑ ( A ∩ B ) = ∑ ∑ ( A ∩ B ) = ∑ A = S. . . j j ( ) ( ) ( ( ) ) ( ( )( ) )( ) provided P(Bj) > 0. iv) P(C|A + B) = P(C|A) + P(C|B). i i provided P(Ai) > 0. Also show.2.4 Use induction to prove Theorem 3. 2. Let p. . 2. from Ai = ∑ Ai ∩ Bj j ( ) and Bj = ∑ Ai ∩ Bj . that the following equations need not be true: iii) P(A|Bc) = 1 − P(A|B). In fact. . (A ∩ B ) ∩ (A i j i′ ∩ B j ′ = ∅ if ) (i.

. 30% and 40%. 15 with blue eyes and 5 with brown eyes. If it is known that 5 of 1. 5 with blue eyes and 20 with brown eyes.2. If one arranges a blind date with a club member. One item is drawn at random.12 Suppose that a test for diagnosing a certain heart disease is 95% accurate when applied to both those who have the disease and those who do not. 2. what is the probability that: i) The second twin is a boy. 5. Of them. Compute the probabilities P(Aj|A).26 2 Some Probabilistic Concepts and Results in terms of n and p. 4%. 5} be a partition of S and suppose that P(Aj) = j/15 and P(A|Aj) = (5 − j)/15.13 Consider two urns Uj. j = 1.6 If Aj. (A1 ∪ A2 ∪ A3)c} is a partition of S. if one of the other two was found to be good and the other one was found to be defective? 2. j = 1. . of the total output of certain items. . 2. compute the probability that a patient actually has the disease if the test indicates that he does.) 2.10 Three machines I. what is the probability that: i) The girl is blonde? ii) The girl is blonde.000 in a certain population have the disease in question.26. respectively. 25 brunettes. II and III? 2. if it was only revealed that she has blue eyes? 2. Next show that if p is ﬁxed but different from 0 and 1. . . such that urn Uj contains mj white balls and nj black balls.7 Let {Aj. j = 1.9 Suppose that the probability that both of a pair of twins are boys is 0.2. tested and found to be defective. . 2. 3 are any events in S.52. 3% and 2%. are defective.2. then Pn(A|B) increases as n increases. (Interpret the answer by intuitive reasoning.2.11 A shipment of 20 TV tubes contains 16 good tubes and 4 defective tubes. II and III manufacture 30%. j = 1. given that the ﬁrst is a boy? ii) The second twin is a girl. . . Ac ∩ Ac ∩ 1 1 2 A3. Three tubes are chosen at random and tested successively. . A ball is drawn at random from each one of the two urns and . j = 1.30 and that the probability that they are both girls is 0. show that {A1. given that the ﬁrst is a girl? 2. What is the probability that: i) The third tube is good. . . Given that the probability of a child being a boy is 0. . Ac ∩ A2. 5.2. What is the probability that the item was manufactured by each one of the machines I.2. Does this result seem reasonable? 2. if the ﬁrst two were found to be good? ii) The third tube is defective. respectively.2. 5 redheads.8 A girl’s club has on its membership rolls the names of 50 girls with the following descriptions: 20 blondes.2. 1 with blue eyes and 4 with green eyes. 2.

k each of which contain m white balls and n black balls. 2. Then a ball is drawn at random from urn U2 and is placed in urn U3 etc. < P(B). On the other hand. is transferred from urn U2 to urn U3 (color unnoticed). chosen at random from U1. is transferred to urn U1.13.4 Combinatorial Results 2. Compute the probability that the ball is black. .2. Then (assuming throughout the uniform probability function) the conditional probability that the second ball is red.17 Consider six urns Uj. we deﬁned P(B|A) = P(A ∩ B)/P(A). the remaining three being black. is 7 . A balanced die is rolled and if an even number appears. j = 1.3 Independence 27 is placed into a third urn. Now P(B|A) may be >P(B). In other words.15 Consider three urns Uj. the number of white balls in the urn U2 remains the same? 2.2. A ball. chosen at random. Finally. a ball is chosen at random from urn Uk−1 and is placed in urn Uk. . j = 1. As an illustration.2. given that the ﬁrst was black. and then a ball. is transferred from urn U1 to urn U2 (color unnoticed). If the ball drawn was white. is 6 . . Finally. Compute the probability that one ball is white and one ball is black. if the balls are drawn with replacement. 3 such that urn Uj contains mj white balls and nj black balls. If an odd number appears. 2.2. is transferred to urn U2. a ball.2. Compute the probability that this last ball is black.14 Consider the urns of Exercise 2.16 Consider the urns of Exercise 2. What is the probability that. two balls are selected at random from urn Uj. seven of which are red. given that the ﬁrst ball was red. j = 1. a ball. . .3 Independence For any events A.15.2. the probability 10 7 that the second ball is red. the probability that the second ball is red is 7 . . is 10 . or = P(B). chosen at random from urn U2. Then a ball is drawn at random from the third urn. given that the ﬁrst ball was red. whereas the conditional probability 9 that the second ball is red. what is the probability that the urn chosen was urn U1 or U2? 2. What is the probability that the ball is white? 2. A ball is then drawn at random from urn Uk. 2. 2.18 Consider k urns Uj. This probability is the same even if the ﬁrst ball was black. 6 such that urn Uj contains mj (≥ 2) white balls and nj (≥ 2) black balls. Suppose that two balls are drawn successively and without replacement. chosen at random.2. knowledge of the event which occurred in the ﬁrst drawing provides no additional information in . . . A ball is drawn at random from urn U1 and is placed in urn U2. One urn is chosen at random and one ball is drawn from it also at random. B with P(A) > 0. Except for color. consider an urn containing 10 balls.2. A balanced die is tossed once and if the number j appears on the die. the balls are identical. after the above experiment is performed twice. Without any 9 knowledge regarding the ﬁrst ball. a ball is drawn at random from urn U3.

and deﬁne the events A and B as follows: A = “children of both genders. . 2. P(B) > 0. as follows from the equation P(A|B) = P(A). for example. . Again 2 knowledge of the event A provides no additional information in calculating the probability of the event B. then this second event is also independent of the ﬁrst. The deﬁnition of independence generalizes to any ﬁnite number of events. In fact. . . What actually happens in practice is to consider events which are independent in the intuitive sense. P(B) = 0. . etc. 2. Thus. Thus: DEFINITION 4 The events Aj.” B = “older child is a boy. . . These events are said to be pairwise independent if P(Ai ∩ Aj) = P(Ai)P(Aj) for all i ≠ j. in connection with the descriptive experiments of successively drawing balls with replacement from the same urn with always the same content. As was pointed out in connection with the examples discussed above. . and one of the events is independent of the other. provided P(A) > 0. .” Then P(A) = P(B) = P(B|A) = 1 . or drawing cards with replacement from the same deck of playing cards. then it is easily seen that A is also independent of B. . B be events with P(A) > 0. Events like these are said to be independent. That is. Then if P(B|A) = P(B). Notice that this relationship is true even if one or both of P(A). independence of two events simply means that knowledge of the occurrence of one of them helps in no way in re-evaluating the probability that the other event happens. . j = 1. if P(A). n and j1. provided P(B) > 0. and we may simply say that A and B are independent. .28 2 Some Probabilistic Concepts and Results calculating the probability of the event that the second ball is red. independence is a symmetric relation. and then deﬁne the probability function P appropriately to reﬂect this independence. . As another example. DEFINITION 3 The events A. or P(B|A) = P(B). n such that 1 ≤ j1 < j2 < · · · < jk ≤ n. or repeatedly tossing the same or different coins. jk = 1. . we say that the even B is (statistically or stochastically or in the probability sense) independent of the event A. More generally. If P(B) is also > 0. This is true for any two independent events A and B. . In this case P(A ∩ B) = P(A)P(B) and we may take this relationship as the deﬁnition of independence of A and B. . Events which are intuitively independent arise. revisit the two-children families example considered earlier. That is. . n are said to be (mutually or completely) independent if the following relationships hold: P Aj ∩ ⋅ ⋅ ⋅ ∩ Aj = P Aj ⋅ ⋅ ⋅ P Aj 1 k 1 ( ) ( ) ( ) k for any k = 2. Thus A and B are independent. B are said to be (statistically or stochastically or in the probability sense) independent if P(A ∩ B) = P(A)P(B). P AB = P B A)P( A) P B P A P A∩ B ( ) (P(B) ) = ( P(B) = ( P)(B() ) = P( A). let A.

. 1 1 1 = ⋅ = P A1 P A2 . P( A ∩ A ) = P( A )P( A ). for n = 3 we will have: P A1 ∩ A2 ∩ A3 = P A1 P A2 P A3 . The converse need not be true.4 Combinatorial Results 2. 5}. j = 1. 4}. j = 1. {} P A1 ∩ A2 = P A1 ∩ A3 = P A2 ∩ A3 = P A1 ∩ A2 ∩ A3 = ( ) ( ) ( ) ( ) 1 . and Thus {} A1 ∩ A2 ∩ A3 = 1 . 4}. P 2 8 3 5 ({ }) = P({3}) = P({4}) = 16 . 3. and set A1 = {1. . 4 A3 = {1. Then A1 ∩ A2 = A1 ∩ A3 = A2 ∩ A3 = 1 . 3}. . 3. P( A ∩ A ) = P( A )P( A ). .3 Independence 29 It follows that. Also there are ⎛ n⎞ ⎛ n⎞ ⎛ n⎞ ⎛ n⎞ ⎛ n⎞ n n ⎜ ⎟ +⎜ ⎟ + ⋅ ⋅ ⋅ +⎜ ⎟ =2 −⎜ ⎟ −⎜ ⎟ =2 −n−1 ⎝ 2 ⎠ ⎝ 3⎠ ⎝ n⎠ ⎝ 1⎠ ⎝ 0⎠ relationships characterizing the independence of Aj. A3 is illustrated by the following examples: Let S = {1. 2}. 2. then they are pairwise independent. . 2. 2. . 4. and deﬁne P as follows: P 1 = ({ }) 1 . A2 = {1. 4 2 2 P A1 ∩ A2 = ( ( ( ) ( ) ( ) ) ) ( ) ( ) ( ) ( ) but P A1 ∩ A2 ∩ A3 = ( ) 1 1 1 1 ≠ ⋅ ⋅ = P A1 P A2 P A3 . 4 Next. n and they are all necessary. 4 2 2 1 1 1 P A1 ∩ A3 = = ⋅ = P A1 P A3 . 4 2 2 1 1 1 P A1 ∩ A3 = = ⋅ = P A2 P A3 . 4 2 2 2 ( ) ( ) ( ) Now let S = {1. . P({5}) = 16 . . For example. n are mutually independent. as the example below illustrates. 1 3 1 3 2 3 2 3 That these four relations are necessary for the characterization of independence of A1. if the events Aj. .2. 1 2 1 2 ( ) ( ) ( ) ( ) P( A ∩ A ) = P( A )P( A ). P({1}) = · · · = P({4}) = 1 . A2.

{ } { } { } { } A1 ∩ A2 ∩ A3 = 1 . assume 1 k+1 1 k+1 that A′ = Ak+1. 4 . Then A1 ∩ A2 = 1. . we have to show that P(A′ ∩ A′ ) = P(A′ )P(A′ ). An are independent. . 8 2 2 2 ( ) ( ) ( ) but P A1 ∩ A2 = ( ) 5 1 1 ≠ ⋅ = P A1 P A2 . . 2. . THEOREM 6 If the events A1. P(A′ ∩ A′ ) = 1 2 1 2 1 2 1 1 2 P(Ac ∩ Ac ) = P[(S − A1) ∩ Ac ] = P(Ac − A1 ∩ Ac ) = P(Ac ) −P(A1 ∩ Ac ) = P(Ac ) 1 2 2 2 2 2 2 2 − P(A1)P(Ac ) = P(Ac )[1 − P(A1)] = P(Ac )P(Ac ) = P(A′ )P(A′ ). j = 1. n. 16 2 2 ( ) ( ) The following result. . we suppose that P(A′ ∩ · · · ∩ A′ ) = P(A′ ) · · · P(A′ ). . i = 2. That is. A′ . First. 2. and we have to show that k+1 PROOF P A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak+1 = P A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 ′ ′ 1 k k+1 ( ) ( ) = P( A′) ⋅ ⋅ ⋅ P( A′ )P( A ). j j The proof is done by (a double) induction. 3. . let A′ = A1 and A′ = Ac . . then i P A1c ∩ A2 ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 = P S − A1 ∩ A2 ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 = P A2 ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 − A1 ∩ A2 ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 2 k k+1 1 2 k ( ( ) = P( A ∩ ⋅ ⋅ ⋅ ∩ A ∩ A ) − P( A ∩ A ∩ ⋅ ⋅ ⋅ ∩ A ∩ A ) = P( A ) ⋅ ⋅ ⋅ P( A ) P( A ) − P( A ) P( A ) ⋅ ⋅ ⋅ P( A ) P( A ) = P( A ) ⋅ ⋅ ⋅ P( A )P( A )[1 − P( A )] = P( A )P( A ) ⋅ ⋅ ⋅ P( A )P( A ). 1 2 2 2 2 1 Next. k. is often used by many authors without any reference to it. ) This relationship is established also by induction as follows: If A′ = Ac and 1 1 A′ = Ai. 4 . . regarding independence of events. where 1 n A′ is either Aj or Ac. {} Thus P A1 ∩ A2 ∩ A3 = ( ) 1 1 1 1 = ⋅ ⋅ = P A1 P A2 P A3 . 3 . . so are the events A′ . For A′ = Ac and A′ = Ac . A3 = 1. Then 1 2 1 2 1 2 2 P(A′ ∩ A′ ) = P(A1 ∩ Ac ) = P[A1 ∩ (S − A2)] = P(A1 − A1 ∩ A2) = P(A1) − 1 2 2 P(A1 ∩ A2) = P(A1) − P(A1)P(A2) = P(A1)[1 − P(A2)] = P(A′ )P(Ac ) = P(A′ )P(A′ ). For n = 2. . assume the assertion to be true for k events and show it to be true for k + 1 events. . It is the theorem below. 1 k 1 k and we shall show that P(A′ ∩ · · · ∩ A′ ) = P(A′ ) · · · P(A′ ). k+1 2 2 k k k+1 k+1 1 2 k k+1 1 c 1 2 k k+1 ) [( ] .30 2 Some Probabilistic Concepts and Results Let A1 = 1. . 2 . . . 1 1 2 2 Similarly if A′ = Ac and A′ = A2. Indeed. A2 = 1. .

. ′ ′ ( ) ( ) ( ) ( ) . clearly. and show that k c k+1 c c P A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 = P A1′ ⋅ ⋅ ⋅ P Ak P Ak+1 . Now. . ′ ′ ( ) ( ) take A′ +1 = A . . assume that c P A1 ∩ ⋅ ⋅ ⋅ ∩ Alc ∩ Al+1 ∩ ⋅ ⋅ ⋅ ∩ Ak +1 ( c = P A1 ⋅ ⋅ ⋅ P Alc P Al+1 ⋅ ⋅ ⋅ P Ak +1 ( ) ( )( ) c l +1 ( ) ) and show that P A1c ∩ ⋅ ⋅ ⋅ ∩ Alc ∩ Alc+1 ∩ Al+ 2 ∩ ⋅ ⋅ ⋅ ∩ Ak +1 =P A Indeed.3 Independence 31 This is. we have shown that P A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 = P A1′ ⋅ ⋅ ⋅ P Ak P Ak+1 . ′ ′ ( ) ( ) ( ) ( ( ) ) Finally. true if Ac is replaced by any other Ac. . true that the same result holds if the A′’s i which are Ac are chosen in any one of the (k) possible ways of choosing out i of k. for 1 i < k. Thus.2. k +1 ) P A1c ∩ ⋅ ⋅ ⋅ ∩ Alc ∩ Alc+1 ∩ Al+ 2 ∩ ⋅ ⋅ ⋅ ∩ Ak+1 ( = P A1c ∩ ⋅ ⋅ ⋅ ∩ Alc ∩ S − Al+1 ∩ Al+ 2 ∩ ⋅ ⋅ ⋅ ∩ Ak+1 = P A ∩ ⋅ ⋅ ⋅ ∩ A ∩ Al+ 2 ∩ ⋅ ⋅ ⋅ ∩ Ak+1 c 1 c l [ ( ( ( ) ) ) ) ] − A1c ∩ ⋅ ⋅ ⋅ ∩ Alc ∩ Al+1 ∩ Al+ 2 ∩ ⋅ ⋅ ⋅ ∩ Ak+1 = P A ∩ ⋅ ⋅ ⋅ ∩ A ∩ Al+ 2 ∩ ⋅ ⋅ ⋅ ∩ Ak+1 c 1 − P A ∩ ⋅ ⋅ ⋅ ∩ A ∩ Al+1 ∩ Al+ 2 ∩ ⋅ ⋅ ⋅ ∩ Ak+1 c 1 c l = P A1c ⋅ ⋅ ⋅ P Alc P Al+ 2 ⋅ ⋅ ⋅ P Ak+1 c 1 c l l +1 l+ 2 (by the induction hypothesis) = P( A ) ⋅ ⋅ ⋅ P( A )P( A ) ⋅ ⋅ ⋅ P( A )[1 − P( A )] = P( A ) ⋅ ⋅ ⋅ P( A ) P( A ) ⋅ ⋅ ⋅ P( A ) P( A ) = P( A ) ⋅ ⋅ ⋅ P( A )P( A )P( A ) ⋅ ⋅ ⋅ P( A ). c 1 c 1 c l c l l+ 2 l+ 2 k+1 k+1 l +1 c l +1 c 1 c l c l +1 l+ 2 k+1 ( ) ( )( ) ( ) − P( A ) ⋅ ⋅ ⋅ P( A ) P( A ) P( A ) ⋅ ⋅ ⋅ P( A ) k+1 ( c l ) as was to be seen. under the assumption that P A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak = P A1′ ⋅ ⋅ ⋅ P Ak . c 1 c l l+ 2 ( ( ) ⋅ ⋅ ⋅ P( A )P( A )P( A ) ⋅ ⋅ ⋅ P( A ). clearly. i = 2.4 Combinatorial Results 2. k. It is also.

c P A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 = P A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ S − Ak+1 ′ ′ ( ( ) = P( A′ ∩ ⋅ ⋅ ⋅ ∩ A′ ) − P( A′ ∩ ⋅ ⋅ ⋅ ∩ A′ ∩ A ) = P( A′) ⋅ ⋅ ⋅ P( A′ ) − P( A′) ⋅ ⋅ ⋅ P( A′ )P( A ) (by the induction hypothesis and what was last proved) = P( A′) ⋅ ⋅ ⋅ P( A′ )[1 − P( A )] = P( A′) ⋅ ⋅ ⋅ P( A′ )P( A ). ⋅ ⋅ ⋅ . . E2) of experiments. The above deﬁnitions generalize in a straightforward manner to any ﬁnite number of experiments. The notion of independence also carries over to experiments. or repeatedly tossing the same or different coins etc. ⋅ ⋅ ⋅ . and all events B2 associated E2 alone. n}. j = 1. where S = S1 × ⋅ ⋅ ⋅ × S n = {(s . class of events. What actually happens in practice is to start out with two experiments E1. ⋅ ⋅ ⋅ . . for j = 1. E . = P A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak − A1′ ∩ ⋅ ⋅ ⋅ ∩ Ak ∩ Ak+1 ′ ′ 1 1 k 1 k k+1 k 1 k k+1 1 1 k k k+1 c k+1 ) [( ( ))] This completes the proof of the theorem. are n experiments with corresponding sample spaces Sj and probability functions Pj on the respective classes of events. . we say that the experiments E1 and E2 are independent if P(B1 ∩ B2) = P(B1)P(B2) for all events B1 associated with E1 alone. also denoted by E1 × E2.. . . s ∈S . s2). s1 ∈ S1. clearly. Thus. . 2. E ) = E 1 2 n 1 × E2 × ⋅ ⋅ ⋅ × En has sample space S. If S stands for this sample space. S = S1 × S2 = {(s1. such as the descriptive experiments (also mentioned above) of successively drawing balls with replacement from the same urn with always the same content. ▲ Now. 2. E2 which are intuitively independent. The corresponding events are. then the compound experiment (E .32 2 Some Probabilistic Concepts and Results In fact. s2 ∈ S2}. and the experiments are said to be independent if for all events Bj associated with experiment Ej alone. then. 2. j = 1. in terms of P1 and P2. n. 2. j = 1. class of events. subsets of S. P1) and (S2. P2). or drawing cards with replacement from the same deck of playing cards. on the class of events in the space S1 × S2 so that it reﬂects the intuitive independence. . Thus. 1 n j j The class of events are subsets of S. let Ej be an experiment having the sample space Sj. and then deﬁne the probability function P. One may look at the pair (E1. n. and then the question arises as to what is the appropriate sample space for this composite or compound experiment. s ). of course. if Ej. and have the corresponding probability spaces (S1. it holds . .

j = 1. by means of a counterexample. what is the probability that he will pass both tests on his nth attempt? (Assume that the road test cannot be taken unless he passes the written test. . the probability function P is deﬁned.4 For each j = 1. . . . then P(A) = 0 or 1. so that: . we mention that events and experiments which are not independent are said to be dependent. B + C are independent.1 If A and B are disjoint events. . and that once he passes the written test he does not have to take it again.6 Jim takes the written and road driver’s license tests repeatedly until he passes them. i ≠ j. .3. . . the probability of a successful hit is 3 . 4 what is the probability that: i) All will successfully hit the target? ii) At least one will do so? How many missiles should be ﬁred.7 The probability that a missile ﬁred against a target is not intercepted by 2 an antimissile missile is 3 . Also.2 Show that if the event A is independent of itself. . 2.3. ( ) ( ) ( ) Again. B are independent. . . j 2.6 and that tests are independent of each other. Then show that the events A1. then A. .3. . . Am. then show that A and B are independent if and only if at least one of P(A). Σn= 1Bj are independent. A. . C are independent and B ∩ C = ∅. . Exercises 2. no matter whether he passes or fails his next road test. 2. j = 1. 2.5 If Aj. suppose that the events A1.3. n. If four missiles are ﬁred independently. j = 1. Bj are independent and that Bi ∩ Bj = ∅. Show. In closing this section.3 If A.3. . the written and the road tests are considered distinct attempts. n are independent events.2. . Am.9 and the road test is 0. j ⎝ j =1 ⎠ j =1 ( ) 2. . . n. 2. . Given that the probability that he passes the written test is 0. n. P(B) is zero. . show that n ⎞ ⎛ n P⎜ U A j ⎟ = 1 − ∏ P Ac . that the conclusion need not be true if B ∩ C ≠ ∅. 2. Given that the missile has not been intercepted. . in terms of Pj.4 Combinatorial Results Exercises 33 P B1 ∩ ⋅ ⋅ ⋅ ∩ Bn = P B1 ⋅ ⋅ ⋅ P Bn . .3. on the class of events in S so that to reﬂect the intuitive independence of the experiments Ej.) 2.3.

At least m circuits are closed with m as in part (iii). Also examples illustrating the theorems of previous sections will be presented.4 Combinatorial Results In this section.99? 2. . . we will restrict ourselves to ﬁnite sample spaces and uniform probability functions. say? 1 2 A n B 2. Player B wins.3.8 Two fair dice are rolled repeatedly and independently. Exactly m circuits are closed for 0 ≤ m ≤ n. and the game is terminated. If the circuits close independently of each other and with respective probabilities pi.34 2 Some Probabilistic Concepts and Results iii) At least one is not intercepted with probability ≥ 0. Player A wins. Compute the probabilities that: i) ii) iii) iv) v) The game terminates on the nth throw and player A wins. . Does the game terminate ever? 2. . Some combinatorial results will be needed and we proceed to derive them here.95? iv) At least one successfully hits its target with probability ≥ 0.9 Electric current is transmitted from point A to point B provided at least one of the circuits #1 through #n below is closed. while the ﬁrst time that a total of 6 appears. determine the probability that: i) ii) iii) iv) v) Exactly one circuit is closed. player A wins. The ﬁrst time a total of 10 appears. i = 1. At least one circuit is closed. . The same for player B. n. player B wins.3. What do parts (i)–(iv) become for p1 = · · · = pn = p.

since by combining each one of the n1 ways of performing subtask T1 with each one of the n2 ways of performing subtask T2. 2. iii) Let nj = n(Sj) be the number of points of the sample space Sj. j The assertion is true for k = 2. ii) Consider the set S = {1. × Ek is n1 . since by combining each one of the ∏ m 1 n j ways of performing the ﬁrst m subtasks with each one of j= nm+1 ways of performing substask Tm+1. 2. . . . . . and in particular. ii) The number of unordered samples of size k is . The main results will be formulated as theorems and their proofs will be applications of the Fundamental Principle of Counting. j = 1. forms the backbone of the results in this section. we consider for each element whether to include it or not. . THEOREM 7 Let a task T be completed by carrying out all of the subtasks Tj. provided the sampling is done without replacement. and is equal to nk if the sampling is done with replacement. . The reasoning is the same as in the step just completed. k. k. j = 1. we obtain n1n2 as the total number of ways of performing task T. If k balls are drawn from the urn. ▲ PROOF ( ) The following examples serve as an illustration to Theorem 7. assume the result to be true for k = m and establish it for k = m + 1. . we obtain ∏ m 1 n j × nm+1 = ∏ jm+1 n j for j= =1 the total number of ways of completing task T. . Then the required number is equal to the following product of N factors 2 · · · 2 = 2N. k. . 2. Pn. we shall consider the problems of selecting balls from an urn and also placing balls into cells which serve as general models of many interesting real life problems. and let it be possible to perform the subtask Tj in nj (different) ways. . j = 1. . if nj is the number of outcomes of the experiment Ej. nk. Then the total number of ways the task T may be performed is given by ∏ k= 1 n j . known as the Fundamental Principle of Counting.4 Combinatorial Results 35 The following theorem. . . k.k (permutations of k objects out of n. . N} and suppose that we are interested in ﬁnding the number of its subsets. EXAMPLE 3 i) A man has ﬁve suits. Then we have the following result. but otherwise identical) balls. . Then the number of different ways he can attire himself is 5 · 3 · 2 = 30. . . Consider an urn which contains n numbered (distinct. . . . if k = n. Next. 2. three pairs of shoes and two hats. j = 1. .2. we say that a sample of size k was drawn. In the following. THEOREM 8 i) The number of ordered samples of size k is n(n − 1) · · · (n − k + 1) = Pn.n = 1 · 2 · · · n = n!). In forming a subset. The sample is ordered if the order in which the balls are drawn is taken into consideration and unordered otherwise. . then the number of outcomes of the compound experiment E1 × . . Then the sample space S = S1 × · · · × Sk has n(S) = n1 · · · nk sample points. Or.

. j = 1. take k out of n + k − 1. . irrelevant and the required number is (N) 0 N N + ( 1) + · · · + (N) = 2N. However. Since for every sample of size k one can form k! ordered samples of the same size. and 53 = 125 with repetitions. Golomb and appeared in the American Mathematical Monthly. [See also Theorem 9(iii). k . In part (i). . Hence the desired result. N}. . if order counts. “repeat (k − 1)st lowest numbered card.k n! = C n . then Pn. 1968. .k = ⎜ ⎟ = k! ⎝ k⎠ k! n − k ! ( ) if the sampling is done without replacement. ▲ For the sake of illustration of Theorem 8. k ⎠ ⎝ ( ) as was to be seen. ii) For the ﬁrst part. without replacement so that there will be at least one out of 1. 2.k.36 2 Some Probabilistic Concepts and Results ⎛ n⎞ Pn . . n. let us consider the following examples. as already found in Example 3. we have that.3 = 5 · 4 · 3 = 60 without repetitions. and then apply the instructions.] PROOF i) The ﬁrst part follows from Theorem 7 by taking nj = (n − j + 1). k = ⎜ ⎟ k ⎠ ⎝ ( ) if the sampling is done with replacement. consider the n balls to be cards numbered from 1 to n and adjoin k − 1 extra cards numbered from n + 1 to n + k − 1 and bearing the respective instructions: “repeat lowest numbered card.k = xk!.” Then a sample of size k without replacement from this enlarged (n + k − 1)-card deck corresponds uniquely to a sample of size k from the original deck with replacement. The proof of the second part may be carried out by an appropriate induction method. by the ﬁrst part. 75. . clearly. . . k.” . if x is the required number. . . 3. 530.) Thus. 4. 5. W. . j = 1. 2.” “repeat 2nd lowest numbered card. . . . we choose to present the following short alternative proof which is due to S. . clearly. k. this number is Pn. . . the required number is ⎛ n + k − 1⎞ ⎜ ⎟ = N n. p. Then the required number is P5. and the second part follows from the same theorem by taking nj = n. In part (ii) the order is. For clarity. the order in which the numbers are selected is relevant. and is equal to ⎛ n + k − 1⎞ N n. . EXAMPLE 4 i(i) Form all possible three digit numbers by using the numbers 1. . (ii) Find the number of all subsets of the set S = {1. (That is.

We thus let S be a set with N elements and assign the uniform probability measure to S. d) Choose one card (from the four at hand) of each face value chosen in (c).14. This can be done in (13) = 13 ways. This can be done in (4 ) 2 = 6 ways. . so there are ⎛ 52⎞ ⎜ ⎟ = N = 2. We arrive at a unique poker hand with one pair by completing the following tasks in order: a) Choose the face value of the pair from the 13 available face values. Since there are 12 face values to choose from. 8⋅7⋅6⋅5 7 ⎛ 4⎞ 3 ⎛ 4⎞ 2 ⎛ 4⎞ ⎛ 4⎞ ⎜ 1⎟ ⋅ 5 + ⎜ 2⎟ ⋅ 5 + ⎜ 3⎟ ⋅ 5 + ⎜ 4⎟ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠ 671 Order counts/replacements allowed: ⎝ ⎠ = ≈ 0.16. 1 b) Choose two cards with the face value selected in (a). c) Choose the three face values for the other three cards in the hand.4 Combinatorial Results 37 EXAMPLE 5 An urn contains 8 balls numbered 1 to 8. This can be done in 4 · 4 · 4 = 43 = 64 ways. this can be done in (12) = 220 3 ways.17. 84 4. What is the probability that the smallest number is 3? Assuming the uniform probability function. ⎛ 8 + 4 − 1⎞ 165 ⎟ ⎜ 4 ⎠ ⎝ Order does not count/replacements not allowed: Order does not count/replacements allowed: Order counts/replacements not allowed: (5 ⋅ 4 ⋅ 3)4 = 1 ≈ 0. A poker hand with one pair has two cards of the same face value and three cards whose faces are all different among themselves and from that of the pair. 960 ⎝ 5⎠ different poker hands. the required probabilities are as follows for the respective four possible sampling cases: ⎛ 5⎞ ⎜ ⎟ ⎝ 3⎠ 1 = ≈ 0. ⎛ 8⎞ 7 ⎜ ⎟ ⎝ 4⎠ ⎛ 6 + 3 − 1⎞ ⎟ ⎜ 3 ⎠ ⎝ 28 = ≈ 0.14. Four balls are drawn.2. 096 EXAMPLE 6 What is the probability that a poker hand will have exactly one pair? A poker hand is a 5-subset of the set of 52 cards in a full deck. 598.

j = 1. . The n stars create n − 1 spaces and by selecting k − 1 of them in n−1 ways to place the k−1 k − 1 bars. k. the required probability is equal to 1. By ﬁxing the two extreme bars. there are 13 · 6 · 220 · 64 = 1. . Σk nj = n) j=1 is ⎛ ⎞ n n! =⎜ . As for the second part. ⎝ k − 1⎠ PROOF i) Obvious. this number becomes ⎛ n − 1⎞ ⎜ ⎟. Then the problem is that of selecting n spaces for the n n stars which can be done in k+n −1 ways. n1! n2 ! ⋅ ⋅ ⋅ nk ! ⎝ n1 . where the jth group contains exactly nj balls with nj as above. This can be done in the following number of ways: ⎛ n ⎞ ⎛ n − n1 ⎞ ⎛ n − n1 − ⋅ ⋅ ⋅ − nk−1 ⎞ n! ⎜ ⎟⎜ ⎟ ⋅⋅⋅⎜ ⎟ = n !n ! ⋅ ⋅ ⋅ n !. 598. since there are k places to put each of the n balls. nk ⎝ ⎠ ⎝ n1 ⎠ ⎝ n2 ⎠ 1 2 k iii) We represent the k cells by the k spaces between k + 1 vertical bars and the n balls by n stars. ii) The number of ways that n distinct balls can be distributed into k distinct cells so that the jth cell contains nj balls (nj ≥ 0. . nk ⎟ ⎠ iii) The number of ways that n indistinguishable balls can be distributed into k distinct cells is ⎛ k + n − 1⎞ ⎜ ⎟. ii) This problem is equivalent to partitioning the n balls into k groups. n ⎠ ⎝ Furthermore.240 poker hands with one pair. Hence. if n ≥ k and no cell is to be empty. by Theorem 6.38 2 Some Probabilistic Concepts and Results Then.098.42. n2 . 240 ≈ 0. ⋅ ⋅ ⋅ . by assuming the uniform probability measure. we are left with k + n − 1 bars and stars which we may consider as k + n − 1 spaces to be ﬁlled in by a bar or a star. 2. 960 THEOREM 9 i) The number of ways in which n distinct balls can be distributed into k distinct cells is kn. 098. . ▲ ( ) ( ) . we now have the condition that there should not be two adjacent bars. the result follows.

in dealing a bridge hand. ⎝ 1. East and West. ⋅ ⋅ ⋅ . Furthermore. This can be done in ⎛ ⎞ 48 48! ways.10 and 0. 1. nj ≥ 0. . Next. EXAMPLE 7 Find the probability that. . .11. 13. 12 to each player. This can be done in ⎛ ⎞ 4 4! ⎜ ⎟ = 1! 1! 1! 1! = 4! ways. Thus the required 1. EXAMPLE 8 The eleven letters of the word MISSISSIPPI are scrambled and then arranged in some order. The number of possible bridge hands is ⎛ ⎞ 52 52! N =⎜ . . North.2. each player receives one ace. 4 . it can be seen that this probability lies between 0. . 13⎠ 13! ( ) Our sample space S is a set with N elements and assign the uniform probability measure. ii) The answer to (ii) is also the answer to the following different question: Consider n numbered balls such that nj are identical among themselves and distinct from all others. ⎟= 4 ⎝ 13. ⎝ n1 . one to each player. k. Σk nj = n. ⎜ ⎟= 4 ⎝ 12. . j = 1. 1. 12. Then the number of j=1 different permutations is ⎛ ⎞ n ⎜ ⎟. n2 . j = 1. 2 probability is ( ) . k in the second part of the theorem are called occupancy numbers. 12. South. . 13. i) What is the probability that the four I’s are consecutive letters in the resulting arrangement? There are eight possible positions for the ﬁrst I and the remaining seven letters can be arranged in 7 distinct ways. 1⎠ b) Deal the remaining 48 cards. . 12⎠ 12! ( ) Thus the required number is 4!48!/(12!)4 and the desired probability is 4!48!(13!)4/[(12!)4(52!)]. has one ace can be found as follows: a) Deal the four aces. the number of sample points for which each player.4 Combinatorial Results 39 REMARK 5 i) The numbers nj. nk ⎠ Now consider the following examples for the purpose of illustrating the theorem.

if each symbol is either a dot or a dash? 2. . we clearly have P AB = ( ) ⎛ 5⎞ 6⎜ ⎟ ⎝ 2⎠ ⎛ 9 ⎞ ⎟ ⎜ ⎝ 4.02. 4. where B is the event that the arrangement starts with M and ends with S? Since there are only six positions for the ﬁrst I. as deﬁned above. 21 iii) What is the conditional probability of A. and the remaining four are also grouped together.3 How many different three-digit numbers can be formed by using the numbers 0. 2⎠ 4 = ≈ 0.4.4 Telephone numbers consist of seven digits. ⎛ 11 ⎞ 165 ⎜ ⎟ ⎝ 1. what is the number of possible combinations? 2. 2. 1.05. 4⎠ = 4 ≈ 0. where C is the event that the arrangement ends with four consecutive S’s? Since there are only four positions for the ﬁrst I. 9. . 9? 2.4. . switching it to the left and stopping at digit c. . . it is clear that P AC = ( ) ⎛ 3⎞ 4⎜ ⎟ ⎝ 2⎠ ⎛ 7 ⎞ ⎜ ⎟ ⎝ 1. b and c are chosen from among the numbers 0. 35 Exercises 2. 3. 4. 2⎠ = 1 ≈ 0.40 2 Some Probabilistic Concepts and Results ⎛ 7 ⎞ 8⎜ ⎟ ⎝ 1. given C. 4.4. How many numbers can be formed if: i) No restrictions are imposed? ii) If the ﬁrst three numbers are required to be 752? . three of which are grouped together.11. ﬁnally. . .2 How many distinct groups of n symbols in a row can be formed. then switching it to the right and stopping at digit b and.4. . 2⎠ ii) What is the conditional probability that the four I’s are consecutive (event A). given B.1 A combination lock can be unlocked by switching it to the left and stopping at digit a. If the distinct digits a. 1.

4.11 Consider ﬁve line segments of length 1.6 Suppose that the letters C. Then the six chips are mixed and drawn one by one without replacement. I and O are written on six chips and placed into an urn.10 Show that ⎛ n + 1⎞ ⎟ ⎜ ⎝ m + 1⎠ n+1 = . What is the probability that a majority will show up in the meeting? 2.13 In how many ways can a committee of 2n + 1 people be seated along one side of a table.8 If n countries exchange ambassadors.4. m men are to be drafted so that all possible combinations are equally likely to be chosen. m+1 ⎛ n⎞ ⎜ ⎟ ⎝ m⎠ 2.4. 5. What is the probability that their product is a negative number? 2. a committee member attends the meeting if an H appears.4 Combinatorial Results Exercises 41 2.4.4. What is the probability that: i) All 24 volumes appear in ascending order? ii) All 24 volumes appear in ascending order.2. if the chairman must sit in the middle? 2.4.9 From among n eligible draftees. What is the probability that a speciﬁed man is not drafted? 2.4. if each one of them tosses the coin independently n times? . how many ambassadors are involved? 2.14 Each of the 2n members of a committee ﬂips a fair coin in deciding whether or not to attend a meeting of the committee.5 A certain state uses ﬁve symbols for automobile license plates such that the ﬁrst two are letters and the last three numbers. F.4. 3.15 If the probability that a coin falls H is p (0 < p < 1). if: i) All letters and numbers may be used? ii) No two letters may be the same? 2. F.4. E. What is the probability that a triangle can be formed by using these three chosen line segments? 2.12 From 10 positive and 6 negative numbers. What is the probability that the word “OFFICE” is formed? 2.4.7 The 24 volumes of the Encyclopaedia Britannica are arranged on a shelf. 7 and 9 and choose three of them at random. 3 numbers are chosen at random and without repetitions. given that volumes 14 and 15 appeared in ascending order and that volumes 1–13 precede volume 14? 2.4. How many license plates can be made. what is the probability that two people obtain the same number of H’s.

What is the probability that the shipment will be accepted? 2.000 light bulbs contains 200 defective items and 1. For each question there are supplied 5 different answers (of which only one is correct). i) ⎛ n⎞ ∑ ⎜ j⎟ = 2n.4. ⎝ ⎠ j j=0 n ⎛ n⎞ 2.4.17 A shipment of 2. j=0 ⎝ ⎠ n ii ) ∑ (−1) ⎜ j ⎟ = 0.16 i) Six fair dice are tossed once.42 2 Some Probabilistic Concepts and Results 2. The student is required to answer correctly at least 25 questions in order to pass the test.18 Show that ⎛ M ⎞ ⎛ M − 1⎞ ⎛ M − 1⎞ ⎜ ⎟ =⎜ ⎟ +⎜ ⎟.4.22 A student committee of 12 people is to be formed from among 100 freshmen (60 male + 40 female).20 Show that x > k. ⎝ m ⎠ ⎝ m ⎠ ⎝ m − 1⎠ where N. Five hundred bulbs are chosen at random. What is the probability that every face appears at least once? 2.21 A student is given a test consisting of 30 questions. 70 juniors (46 male + 24 female). ii) Seven students are male and ﬁve female. are tested and the entire shipment is rejected if more than 25 bulbs from among those tested are found to be defective.4.19 Show that ∑ ⎜ x ⎟ ⎜ r − x⎟ = ⎜ ⎝ ⎠⎝ ⎠ ⎝ x =0 r ⎛ m⎞ ⎛ n ⎞ ⎛ m + n⎞ ⎟. .4.800 good items. 80 sophomores (50 male + 30 female). What is the probability that all six faces appear? ii) Seven fair dice are tossed once. and 40 seniors (28 male + 12 female). 2. If he knows the right answers to the ﬁrst 20 questions and chooses an answer to the remaining questions at random and independently of each other. what is the probability that he will pass the test? 2.4. Find the total number of different committees which can be formed under each one of the following requirements: i) No restrictions are imposed on the formation of the committee.4. m are positive integers and m < M. r ⎠ where ⎛ k⎞ ⎜ ⎟ = 0 if ⎝ x⎠ 2.

15 into the third.4. the secretary and the treasurer of the committee are all required to belong to different classes.23 Refer to Exercise 2. 2. iv) All long parts are paired with short parts. 5 are deﬁned as follows: . iv) The committee contains two male students and one female student from each class. What is the probability that this happens on the 20th throw? 2.25 Twenty letters addressed to 20 different addresses are placed at random into the 20 envelopes. what is the probability that the birthday of: 17 people falls into the ﬁrst such interval.4. j = 1. 5.26 Suppose that each one of the 365 days of a year is equally likely to be the birthday of each one of a given group of 73 people. .4.27 Suppose that each one of n sticks is broken into one long and one short part. Two parts are chosen at random.4.22 and suppose that the committee is formed by choosing its members at random.4 Combinatorial Results Exercises 43 iii) The committee contains the same number of students from each class. Find the probability that: iii) The parts are joined in the original order. 2. . What is the probability that: i) One part is long and one is short? ii) Both parts are either long or short? The 2n parts are arranged at random into n pairs from which new sticks are formed. j = 1.2.29 Three cards are drawn at random and with replacement from a standard deck of 52 playing cards. where the events Aj. What is the probability that: i) Forty people have the same birthdays and the other 33 also have the same birthday (which is different from that of the previous group)? ii) If a year is divided into ﬁve 73-day speciﬁed intervals.28 Derive the third part of Theorem 9 from Theorem 8(ii). . What is the probability that: i) All 20 letters go into the right envelopes? ii) Exactly 19 letters go into the right envelopes? iii) Exactly 17 letters go into the right envelopes? 2. vii) The chairman. Compute the probability that the committee to be chosen satisﬁes each one of the requirements (i)–(vii). 10 into the fourth and 8 into the ﬁfth interval? 2.4.4. Compute the probabilities P(Aj). . 2.4. . . vi) The committee chairman is required to be both a senior and male. . 23 into the second.24 A fair die is rolled independently until all faces appear at least once. .4. v) The committee chairman is required to be a senior. 2.

44 2 Some Probabilistic Concepts and Results A1 = s ∈ S . 1 card in s is a diamond. where the events Aj. Three cards are of three suits and the other two of the remaining suit.4. s consists of 5 diamonds. r1 balls are red. = {s ∈ S . . one is a king. 2.31 Consider hands of 5 cards from a standard deck of 52 playing cards. At least one ball is red. s does not contain aces. . { the second is a heart and the third is a club . tens and jacks}. s consists only of diamonds}. s consists of 1 color cards . A2 A3 A4 { } = {s ∈ S . . All ﬁve cards are of the same suit.30 Refer to Exercise 2. } 2. Find the number of all 5-card hands which satisfy one of the following requirements: i) ii) iii) iv) v) Exactly three cards are of one color. s contains at least 2 aces}.4. = {s ∈ S . = {s ∈ S . . j = 1. = {s ∈ S . 1 is a heart and 1 is a club . at least 2 cards in s are red}. 2 clubs and 3 spades}.32 and discuss the questions (i)–(iii) for r = 3 and r1 = r2 = r3 (= 1). . exactly 1 card in s is an ace}. j = 1. = {s ∈ S . 8. } A5 = s ∈ S . . 2. all 3 cards in s are black . r2 balls are black and r3 balls are white (r1 + r2 + r3 = r). j = 1. Two cards are aces. = {s ∈ S .32 An urn contains nR red balls. r balls are chosen at random and with replacement. There are balls of all three colors.29 and compute the probabilities P(Aj).4. Find the probability that: i) ii) iii) iv) All r balls are red. 8 are deﬁned as follows: A1 = s ∈ S . .4.4. A2 A3 A4 A5 A6 { } = {s ∈ S . . 2. . if the balls are drawn at random but without replacement. .4.34 Suppose that all 13-card hands are equally likely when a standard deck of 52 playing cards is dealt to 4 people. . 5 when the cards are drawn at random but without replacement. . Compute the probabilities P(Aj). one is a queen and one is a jack. 3 hearts.4.33 Refer to Exercise 2. . At least two of the cards are aces. the ﬁrst card in s is a diamond. s consists of cards of exactly 2 suits}. nB black balls and nW white balls. 2.

A1 ∈ A1 . A j = s ∈ S .34 and for j = 0. For j = 0.4. s contains exactly j tens .5* Product Probability Spaces The concepts discussed in Section 2. . P(A ∩ C) = P(A)P(C).4 Combinatorial Spaces 45 A7 = s ∈ S . P(Aj|A). 1. . A2. C are pairwise independent but not mutually independent.4. A2 ∈ A2 .2. B and C as follows: A = s ∈S . s consists of 3 aces. A8 { } = {s ∈ S . P(B ∩ C) = P(B)P(C). 2. 2 kings and exactly 7 red cards . deﬁne the events Aj and also A as follows: { } A = {s ∈ S .35 Refer to Exercise 2. 1. P(A ∩ B ∩ C) ≠ P(A)P(B)P(C). s begins with a speciﬁc letter . 4. The appropriate σ-ﬁeld A of events in S is deﬁned as follows: First deﬁne the class C by: C = A1 × A2 . compute the probabilities P(Aj).5 Product Probability Results 2. s has the speciﬁed letter mentioned in the deﬁnition of A in the middle entry . 2. Then show that: i) ii) iii) iv) P(A ∩ B) = P(A)P(B). Deﬁne the events A. compare the numbers P(Aj).3 can be stated precisely by utilizing more technical language. s 1 2 1 1 } 2 ∈ A2 . s ). Thus. P1) and (S2.4. then the compound experiment (E1. if we consider the experiments E1 and E2 with respective probability spaces (S1. { { { } ( } ) } Thus the events A. B. s ∈ A . } . s contains exactly 7 red cards}. where A1 × A2 = { {(s . . E2) = E1 × E2 has sample space S = S1 × S2 as deﬁned earlier. 2. .36 Let S be the set of all n3 3-letter words of a language and let P be the equally likely probability function on the events of S. C = s ∈S . . A1. B = s ∈S . s has exactly two of its letters the same . P(Aj|A) and also P(A). s consists of cards of all different denominations}. . P2). 4. . .

. . j = 1. s ). n with corresponding probability spaces (Sj. . . 2. En) = E1 × · · · × En has probability space (S. Pj). 2. . . notice that the factor σ-ﬁelds Aj.46 2 Some Probabilistic Concepts and Results Then A is taken to be the σ-ﬁeld generated by C (see Theorem 4 in Chapter 1). the compound experiment (E1. j = 1. . j = 1. The experiments E1 and E2 are then said to be independent if P(B1 ∩ B2) = P(B1)P(B2) for all events B1 and B2 as deﬁned above. The probability measure P is usually denoted by P1 × · · · × Pn and is called the product probability measure (with factors Pj. j = 1. 2. The deﬁnition of independent events carries over to σ-ﬁelds as follows. Then independence of the experiments Ej. . n (sub-σ-ﬁelds of A) are said to be independent if n ⎞ ⎛ n P⎜ I A j ⎟ = ∏ P A j ⎝ j =1 ⎠ j =1 ( ) for any A j ∈ A j . where C = A1 × ⋅ ⋅ ⋅ × An . the σ-ﬁelds Aj. 2. . . 2. ⋅ ⋅ ⋅ . 1 n j j A is the σ-ﬁeld generated by the class C. j = 1. At this point. s ∈S . . . ⋅ ⋅ ⋅ . n amounts to independence of the corresponding σ-ﬁelds Aj. . It can be shown that P determines uniquely a probability measure on A (by means of the so-called Carathéodory extension theorem). . Pj). A2 be two sub-σ-ﬁelds of A. and P is the unique probability measure deﬁned on A through the relationships { } P A1 × ⋅ ⋅ ⋅ × An = P A1 ⋅ ⋅ ⋅ P An . j = 1. 2. . where Bj is deﬁned by ( ) ( ) ( ) B j = S1 × ⋅ ⋅ ⋅ × S j − 1 × A j × S j + 1 × ⋅ ⋅ ⋅ × S n . . where S = S1 × ⋅ ⋅ ⋅ × S n = {(s . n. . Then the experiments Ej. P) is called the product probability space (with factors (Sj. . Of course. Aj. j = 1. ⋅ ⋅ ⋅ . j = 1. . j = 1. . More generally. deﬁne on C the set function P by P(A1 × A2) = P1(A1)P2(A2). . A j ∈ A j . Next. 2. A. P). . . A. ⋅ ⋅ ⋅ . . A2 ∈ A2. . n. . n}. n may be considered as sub-σ-ﬁelds of the product σ-ﬁeld A by identifying Aj with Bj. This probability measure is usually denoted by P1 × P2 and is called the product probability measure (with factors P1 and P2). 2. A j ∈ A j . 2. n (looked upon as sub-σ-ﬁelds of the product σ-ﬁeld A). n. A. . . . n). A2 are independent if P(A1 ∩ A2) = P(A1)P(A2) for any A1 ∈ A1. j = 1. P) is called the product probability space (with factors (Sj. Aj. Pj). 2). . σ-ﬁelds which are not independent are said to be dependent. . . n are said to be independent if P(B1 ∩ · · · ∩ B2) = P(B1) · · · P(B2). n). 2. For n experiments Ej. It is to be noted that events which refer to E1 alone are of the form B1 = A1 × S2. 2. and the probability space (S. . j = 1. 2. Aj. j = 1. We say that A1. . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . A2 ∈ A2. . and the probability space (S. and those referring to E2 alone are of the form B2 = S1 × A2. j = 1. 2. n . . where the Bj’s are deﬁned above. . A1 ∈ A1. Let A1. j = 1.

( ) 1≤ j1 < j2 < ⋅ ⋅ ⋅ < jr ≤ M ∑ P A j1 ∩ A j2 ∩ ⋅ ⋅ ⋅ ∩ A jr . go}. B × C. 2. The theorem is then illustrated by means of two interesting examples. an important result. . defective). is established providing an expression for the probability of occurrence of exactly m events out of possible M events. S1 = ∑ P A j . show that A × C ⊆ B × C for any set C. iii) (A × B) ∪ (C × D) = (A ∪ C) × (B ∪ D) − [(A ∩ Cc) × (Bc ∩ D) + (Ac ∩ C) × (B ∩ Dc )]. M occur. ( ) S M = P A1 ∩ A2 ∩ ⋅ ⋅ ⋅ ∩ AM .2. j =1 M ( ) S2 = M Sr = M 1≤ j1 < j2 ≤ M ∑ P A j1 ∩ A j2 .5. B = {good. Theorem 10.2 Show that A × B = ∅ if and only if at least one of the sets A.4 Show that i) (A × B)c = (A × Bc) + (Ac × B) + (Ac × Bc). 2. (1. 1). j = 1.5. 2). Consider M events Aj.3 If A ⊆ B. For this purpose. ii) (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D). A × B × C. 2)}.1 Form the Cartesian products A × B. C = {(1. Let also ( ) Bm = exactly ⎫ ⎪ C m = at least ⎬m of the events A j .5. Dm = at most ⎪ ⎭ Then we have . (2. some additional notation is needed which we proceed to introduce. . 2. j = 1. where A = {stop. B is ∅.4 Combinatorial Results 47 Exercises 2. A × C. . ⋅ ⋅ ⋅ .6* The Probability of Matchings In this section. 2.5.6* The Probability of Matchings 2. 2. 2. . M and set S0 = 1.

EXAMPLE 9 The matching problem (case of sampling without replacement). z2. the event Ak that a match will occur in the kth urn may be written Ak = {(z1. . .48 2 Some Probabilistic Concepts and Results THEOREM 10 With the notation introduced above ⎛ m + 1⎞ ⎛ m + 2⎞ P Bm = S m − ⎜ ⎟ S m+1 + ⎜ ⎟ S m+ 2 − ⋅ ⋅ ⋅ + −1 ⎝ m ⎠ ⎝ m ⎠ ( ) ( ) M −m ⎛ M⎞ ⎜ ⎟ SM ⎝ m⎠ (2) which for m = 0 is P B0 = S0 − S1 + S 2 − ⋅ ⋅ ⋅ + −1 and ( ) ( ) M SM . ( ) ( ) ( ) ( ) (4) and P Dm = P B0 + P B1 + ⋅ ⋅ ⋅ + P Bm . I. with one ball in each urn. zM)′ ∈ M. 99–100. Feller.6 of Chapter 5. 1 ≤ zj . pp. ( ) ( ) ( ) ( ) (5) For the proof of this theorem. zj integer. for m = 0. since (4) and (5) follow from it. .63 M! for large M. . . numbered 1 to M. i) Show the probability of at least one match is 1− 1 1 + − ⋅ ⋅ ⋅ + −1 2! 3! ( ) M +1 1 ≈ 1 − e −1 ≈ 0. (3) P C m = P Bm + P Bm+1 + ⋅ ⋅ ⋅ + P BM . all that one has to establish is (2). . Vol. and ii) exactly m matches will occur. . . Suppose that we have M urns. . 3rd ed. For k = 1. . zM) whose jth component represents the number of the ball inserted in the jth urn. . a match is said to have occurred. The following examples illustrate the above theorem. 2. . . M. . write an M-tuple (z1. 1. To describe the distribution of the balls among the urns. M is 1 ⎛ 1 1 ⎜ 1 − 1 + − + ⋅ ⋅ ⋅ + −1 ⎜ m! ⎝ 2! 3! = DISCUSSION ( ) M −m ( ⎞ 1 ⎟ M − m !⎟ ⎠ ) 1 M −m ∑ −1 m! k = 0 ( ) k 1 1 −1 e ≈ k! m! for M − m large. This will be done in Section 5. Let M balls numbered 1 to M be inserted randomly in the urns. . 1968.. . 2. For a proof where S is discrete the reader is referred to the book An Introduction to Probability Theory and Its Applications. by W. If a ball is placed into the urn bearing the same number as the ball.

We now deﬁne the events A1. ⎛ M⎞ M − r ! 1 Sr = ⎜ ⎟ = . M⎠ ⎝ M ⎠ ⎝ ( ) 2 n n k = 1. one of which is the following.6* The Probability of Matchings 2. that is. kr. each bearing one of the integers 1 to M. . 1 to M. . k1 = 1. . z j integer. . Suppose that a manufacturer gives away in packages of his product certain items (which we take to be coupons). z j ≠ k. j = 1. exactly m urns remain empty after the n balls have been distributed)? DISCUSSION To describe the coupons found in the n packages purchased. . ⋅ ⋅ ⋅ . n P Ak ∩ Ak 1 ( ) ⎛ M − 2⎞ ⎛ 2⎞ =⎜ ⎟ = ⎜1 − ⎟ . k2. zk = k}. M ⎠ M⎠ ⎝ ⎝ n n and. whose jth component zj represents the number of the coupon found in the jth package purchased. If n packages are bought. in general. · · · . 2. j = 1. . 2. M and any r unequal integers k1. what is the probability that there will be exactly m urns in which no ball was placed (that is. EXAMPLE 10 ( ) Coupon collecting (case of sampling with replacement). . . M. . n k2 = k1 + 1. 2. numbered 1 to M. ⋅ ⋅ ⋅ . zn ∈ n . For k = 1. . z2. M. . A2. r! ⎝ r ⎠ M! This implies the desired results. . It is clear that for any integer r = 1. ⋅ ⋅ ⋅ . from 1 to M. ′ ⎧ Ak = ⎨ z1 . will not be obtained is equal to k ⎛ M − m⎞ ⎛ ⎛ M⎞ M −m m + k⎞ ⎟ ⎜1 − M ⎟ . ⎜ ⎟ ∑ −1 ⎜ ⎠ ⎝ k ⎠⎝ ⎝ m⎠ k = 0 ( ) n Many variations and applications of the above problem are described in the literature. If n distinguishable balls are distributed among M urns. 2. M . in such a way that each of the M items is equally likely to be found in any package purchased. . . 2. . we write an n-tuple (z1. zn). . Ak is the event that the number k will not appear in the sample.. P Ak1 ∩ Ak2 ∩ ⋅ ⋅ ⋅ ∩ Akr = It then follows that Sr is given by ( M −r ! ) ( M! ) . . . show that the probability that exactly m of the integers. 1 ≤ z j ≤ M . .2. ⎭ ⎛ M − 1⎞ ⎛ 1⎞ P Ak = ⎜ ⎟ = ⎜1 − ⎟ . ⎩ It is easy to see that we have the following results: ( ) ⎫ n⎬. . ⋅ ⋅ ⋅ . AM. .4 Combinatorial Results 49 ≤ M. ⋅ ⋅ ⋅ .

AM will occur. ⋅ ⋅ ⋅ . . n k2 = k1 + 1. . . M . ⎠ ⎝ m ⎠ k=0 ⎝ k ⎠⎝ ( ) n (7) by setting r − m = k and using the identity ⎛ m + k⎞ ⎛ M ⎞ ⎛ M ⎞ ⎛ M − m⎞ ⎜ ⎟⎜ ⎟ = ⎜ ⎟⎜ ⎟. . For i = 1. clearly. 2. r = 0. required the event is the sum of the events c c A1 . . By relations (2) and (6). Then in a series of independent trials. A1c ∩ B1c ∩ A2 . show that: P A occurs before B occurs = PROOF ( ) P A P A +P B ( ) ( ) ( ) .50 2 Some Probabilistic Concepts and Results ⎛ r ⎞ P Ak ∩ Ak ∩ ⋅ ⋅ ⋅ ∩ Ak = ⎜ 1 − ⎟ . . Bm is the event that exactly m of the events A1. A1c ∩ B1c ∩ A2 ∩ B2 ∩ A3 . M⎠ ⎝ r ⎠⎝ n (6) Let Bm be the event that exactly m of the integers 1 to M will not be found in the sample. c c A1c ∩ B1c ∩ ⋅ ⋅ ⋅ ∩ An ∩ Bn ∩ An+1 .” Then. 2. THEOREM 11 Let A and B be two disjoint events. ⋅ ⋅ ⋅ and therefore . ⋅ ⋅ ⋅ . 1. This section is concluded with the following important result stated as a theorem. M⎠ ⎝ ( 1 2 r ) n k1 = 1.” Bi = “B occurs on the ith trial. ⋅ ⋅ ⋅ . ⎝ m ⎠ ⎝ m + k⎠ ⎝ m ⎠ ⎝ k ⎠ (8) This is the desired result. Clearly. ⋅ ⋅ ⋅ . we have P Bm = ( ) ∑( ) −1 r =m M r −m ⎛ r ⎞ ⎛ M⎞ ⎛ r ⎞ ⎜ ⎟ ⎜ ⎟ ⎜1 − M ⎟ ⎠ ⎝ m⎠ ⎝ r ⎠ ⎝ n k ⎛ M − m⎞ ⎛ ⎛ M⎞ M −m m + k⎞ = ⎜ ⎟ ∑ −1 ⎜ ⎟ ⎜1 − M ⎟ . ⋅ ⋅ ⋅ . n M kr = kr −1 + 1. deﬁne the events Ai and Bi as follows: Ai = “ A occurs on the ith trial. Thus the quantities Sr are given by ⎛ M⎞ ⎛ r ⎞ S r = ⎜ ⎟ ⎜ 1 − ⎟ . n. . .

2.6* The Probability of Matchings 2.4 Combinatorial Results

51

P A occurs before B occurs

(

)

c c = P A1 + A1c ∩ B1c ∩ A2 + A1c ∩ B1c ∩ A2 ∩ B2 ∩ A3 c c + ⋅ ⋅ ⋅ + A1c ∩ B1c ∩ ⋅ ⋅ ⋅ ∩ An ∩ Bn ∩ An+1 + ⋅ ⋅ ⋅

[ (

(

) (

)

]

)

c c = P A1 + P A1c ∩ B1c ∩ A2 + P A1c ∩ B1c ∩ A2 ∩ B2 ∩ A3 c c + ⋅ ⋅ ⋅ + P A1c ∩ B1c ∩ ⋅ ⋅ ⋅ ∩ An ∩ Bn ∩ An+1 + ⋅ ⋅ ⋅ c 1 c 1 c 1 c 1 c 2

) = P( A ) + P( A ∩ B ) P( A ) + P( A ∩ B ) P( A ∩ B ) P( A ) + ⋅ ⋅ ⋅ + P( A ∩ B ) ⋅ ⋅ ⋅ P( A ∩ B )P( A ) + ⋅ ⋅ ⋅ (by Theorem 6) = P( A) + P( A ∩ B )P( A) + P ( A ∩ B )P( A) + ⋅ ⋅ ⋅ + P ( A ∩ B )P( A) ⋅ ⋅ ⋅ = P( A)[1 + P( A ∩ B ) + P ( A ∩ B ) + ⋅ ⋅ ⋅ + P ( A ∩ B ) + ⋅ ⋅ ⋅]

1 2 c 2 3 c 1 c 1 c n c n n+1 c c 2 c c n c c

c c 2 c c n c c

( ) (

(

) (

)

=P A

1 ( )1−P A ∩B

(

c

c

)

.

But

c P Ac ∩ Bc = P ⎡ A ∪ B ⎤ = 1 − P A ∪ B ⎢ ⎥ ⎣ ⎦ = 1− P A+ B = 1− P A − P B ,

(

)

(

)

(

)

( ) ( ) ( )

so that

1 − P Ac ∩ Bc = P A + P B .

Therefore

(

) ( ) ()

)

**P A occurs before B occurs =
**

as asserted. ▲

(

P A

P A +P B

( ) ( )

( )

,

It is possible to interpret B as a catastrophic event, and A as an event consisting of taking certain precautionary and protective actions upon the energizing of a signaling device. Then the signiﬁcance of the above probability becomes apparent. As a concrete illustration, consider the following simple example (see also Exercise 2.6.3).

52

2

Some Probabilistic Concepts and Results

EXAMPLE 11

In repeated (independent) draws with replacement from a standard deck of 52 playing cards, calculate the probability that an ace occurs before a picture.

Let A = “an ace occurs,” B = “a picture occurs.”

4 13

Then P(A) = 1 4 = 13 13 = 1 . 4

4 52

=

1 13

and P(B) =

12 52

=

, so that P(A occurs before B occurs)

Exercises

2.6.1 Show that

⎛ m + k⎞ ⎛ M ⎞ ⎛ M ⎞ ⎛ M − m⎞ ⎜ ⎟⎜ ⎟ = ⎜ ⎟⎜ ⎟, ⎝ m ⎠ ⎝ m + k⎠ ⎝ m ⎠ ⎝ k ⎠

as asserted in relation (8). 2.6.2 Verify the transition in (7) and that the resulting expression is indeed the desired result. 2.6.3 Consider the following game of chance. Two fair dice are rolled repeatedly and independently. If the sum of the outcomes is either 7 or 11, the player wins immediately, while if the sum is either 2 or 3 or 12, the player loses immediately. If the sum is either 4 or 5 or 6 or 8 or 9 or 10, the player continues rolling the dice until either the same sum appears before a sum of 7 appears in which case he wins, or until a sum of 7 appears before the original sum appears in which case the player loses. It is assumed that the game terminates the ﬁrst time the player wins or loses. What is the probability of winning?

3.1 Soem General Concepts

53

Chapter 3

On Random Variables and Their Distributions

**3.1 Some General Concepts
**

Given a probability space (S, class of events, P), the main objective of probability theory is that of calculating probabilities of events which may be of importance to us. Such calculations are facilitated by a transformation of the sample space S, which may be quite an abstract set, into a subset of the real line with which we are already familiar. This is, actually, achieved by the introduction of the concept of a random variable. A random variable (r.v.) is a function (in the usual sense of the word), which assigns to each sample point s ∈ S a real number, the value of the r.v. at s. We require that an r.v. be a wellbehaving function. This is satisﬁed by stipulating that r.v.’s are measurable functions. For the precise deﬁnition of this concept and related results, the interested reader is referred to Section 3.5 below. Most functions as just deﬁned, which occur in practice are, indeed, r.v.’s, and we leave the matter to rest here. The notation X(S) will be used for the set of values of the r.v. X, the range of X. Random variables are denoted by the last letters of the alphabet X, Y, Z, etc., with or without subscripts. For a subset B of , we usually denote by (X ∈ B) the following event in S: (X ∈ B) = {s ∈ S; X(s) ∈ B} for simplicity. In particular, (X = x) = {s ∈ S; X(s) = x}. The probability distribution function (or just the distribution) of an r.v. X is usually denoted by PX and is a probability function deﬁned on subsets of as follows: PX (B) = P(X ∈ B). An r.v. X is said to be of the discrete type (or just discrete) if there are countable (that is, ﬁnitely many or denumerably inﬁnite) many points in , x1, x2, . . . , such that PX({xj}) > 0, j ≥ 1, and Σj PX({xj})(= Σj P(X = xj)) = 1. Then the function fX deﬁned on the entire by the relationships:

fX x j = PX x j

( )

({ })(= P(X = x ))

j

for x = x j ,

53

54

3

On Random Variables and Their Distributions

**and fX(x) = 0 otherwise has the properties:
**

fX x ≥ 0

()

for all x, and

∑ f (x ) = 1.

X j j

Furthermore, it is clear that

P X ∈B =

(

) ∑ f (x ).

X j x j ∈B

Thus, instead of striving to calculate the probability of the event {s ∈ S; X(s) ∈ B}, all we have to do is to sum up the values of fX(xj) for all those xj’s which lie in B; this assumes, of course, that the function fX is known. The function fX is called the probability density function (p.d.f.) of X. The distribution of a discrete r.v. will also be referred to as a discrete distribution. In the following section, we will see some discrete r.v.’s (distributions) often occurring in practice. They are the Binomial, Poisson, Hypergeometric, Negative Binomial, and the (discrete) Uniform distributions. Next, suppose that X is an r.v. which takes values in a (ﬁnite or inﬁnite but proper) interval I in with the following qualiﬁcation: P(X = x) = 0 for every single x in I. Such an r.v. is called an r.v. of the continuous type (or just a continuous r.v.). Also, it often happens for such an r.v. to have a function fX satisfying the properties fX(x) ≥ 0 for all x ∈ I, and P(X ∈ J) = ∫J fX(x)dx for any sub-interval J of I. Such a function is called the probability density function (p.d.f.) of X in analogy with the discrete case. It is to be noted, however, that here fX(x) does not represent the P(X = x)! A continuous r.v. X with a p.d.f. fX is called absolutely continuous to differentiate it from those continuous r.v.’s which do not have a p.d.f. In this book, however, we are not going to concern ourselves with non-absolutely continuous r.v.’s. Accordingly, the term “continuous” r.v. will be used instead of “absolutely continuous” r.v. Thus, the r.v.’s to be considered will be either discrete or continuous (= absolutely continuous). Roughly speaking, the idea that P(X = x) = 0 for all x for a continuous r.v. may be interpreted that X takes on “too many” values for each one of them to occur with positive probability. The fact that P(X = x) also follows formally by x the fact that P(X = x) = ∫ x fX(y)dy, and this is 0. Other interpretations are also possible. It is true, nevertheless, that X takes values in as small a neighborhood of x as we please with positive probability. The distribution of a continuous r.v. is also referred to as a continuous distribution. In Section 3.3, we will discuss some continuous r.v.’s (distributions) which occur often in practice. They are the Normal, Gamma, Chi-square, Negative Exponential, Uniform, Beta, Cauchy, and Lognormal distributions. Reference will also be made to t and F r.v.’s (distributions). Often one is given a function f and is asked whether f is a p.d.f. (of some r.v.). All one has to do is to check whether f is non-negative for all values of its argument, and whether the sum or integral of its values (over the appropriate set) is equal to 1.

3.2

Discrete Random Variables (and General Concepts 3.1 Soem Random Vectors)

55

When (a well-behaving) function X is deﬁned on a sample space S and takes values in the plane or the three-dimensional space or, more generally, in the k-dimensional space k, it is called a k-dimensional random vector (r. vector) and is denoted by X. Thus, an r.v. is a one-dimensional r. vector. The distribution of X, PX, is deﬁned as in the one-dimensional case by simply replacing B with subsets of k. The r. vector X is discrete if P(X = xj) > 0, j = 1, 2, . . . with Σj P(X = xj) = 1, and the function fX(x) = P(X = xj) for x = xj, and fX(x) = 0 otherwise is the p.d.f. of X. Once again, P(X ∈ B) = ∑ xj∈B fX(xj) for B subsets of k. The r. vector X is (absolutely) continuous if P(X = x) = 0 for all x ∈ I, but there is a function fX deﬁned on k such that:

fX x ≥ 0

()

for all x ∈

k

, and P X ∈ J = ∫ fX x dx

J

(

)

()

for any sub-rectangle J of I. The function fX is the p.d.f. of X. The distribution of a k-dimensional r. vector is also referred to as a k-dimensional discrete or (absolutely) continuous distribution, respectively, for a discrete or (absolutely) continuous r. vector. In Sections 3.2 and 3.3, we will discuss two representative multidimensional distributions; namely, the Multinomial (discrete) distribution, and the (continuous) Bivariate Normal distribution. We will write f rather than fX when no confusion is possible. Again, when one is presented with a function f and is asked whether f is a p.d.f. (of some r. vector), all one has to check is non-negativity of f, and that the sum of its values or its integral (over the appropriate space) is equal to 1.

**3.2 Discrete Random Variables (and Random Vectors)
**

3.2.1 Binomial

The Binomial distribution is associated with a Binomial experiment; that is, an experiment which results in two possible outcomes, one usually termed as a “success,” S, and the other called a “failure,” F. The respective probabilities are p and q. It is to be noted, however, that the experiment does not really have to result in two outcomes only. Once some of the possible outcomes are called a “failure,” any experiment can be reduced to a Binomial experiment. Here, if X is the r.v. denoting the number of successes in n binomial experiments, then

X S = 0, 1, 2, ⋅ ⋅ ⋅ , n ,

( ) {

}

⎛ n⎞ P X = x = f x = ⎜ ⎟ p x q n− x , ⎝ x⎠

(

) ()

where 0 < p < 1, q = 1 − p, and x = 0, 1, 2, . . . , n. That this is in fact a p.d.f. follows from the fact that f(x) ≥ 0 and

∑ f ( x ) = ∑ ⎜ x ⎟ p x q n− x = ( p + q ) ⎝ ⎠

x =0 x =0

n

n

⎛ n⎞

n

= 1n = 1.

56

3

On Random Variables and Their Distributions

**The appropriate S here is:
**

S = S, F × ⋅ ⋅ ⋅ × S, F

{

}

{

}

(n copies).

In particular, for n = 1, we have the Bernoulli or Point Binomial r.v. The r.v. X may be interpreted as representing the number of S’s (“successes”) in the compound experiment E × · · · × E (n copies), where E is the experiment resulting in the sample space {S, F} and the n experiments are independent (or, as we say, the n trials are independent). f(x) is the probability that exactly x S’s occur. In fact, f(x) = P(X = x) = P(of all n sequences of S’s and F’s with exactly x S’s). The probability of one such a sequence is pxqn−x by the independence of the trials and this also does not depend on the particular n sequence we are considering. Since there are ( x ) such sequences, the result follows. The distribution of X is called the Binomial distribution and the quantities n and p are called the parameters of the Binomial distribution. We denote the Binomial distribution by B(n, p). Often the notation X ∼ B(n, p) will be used to denote the fact that the r.v. X is distributed as B(n, p). Graphs of the p.d.f. of the B(n, p) distribution for selected values of n and p are given in Figs. 3.1 and 3.2.

f (x) 0.25 0.20 0.15 0.10 0.05 0 1 2 3 4 5 6 7 8 9 10 11 12 13 x n p 12

1 4

1 Figure 3.1 Graph of the p.d.f. of the Binomial distribution for n = 12, p = –. 4

**() f (1) = 0.1267 f (2 ) = 0.2323 f ( 3) = 0.2581 f ( 4 ) = 0.1936 f ( 5) = 0.1032 f (6) = 0.0401
**

f 0 = 0.0317

() f ( 8 ) = 0.0024 f (9) = 0.0004 f (10 ) = 0.0000 f (11) = 0.0000 f (12 ) = 0.0000

f 7 = 0.0115

3.2

3.1 Soem Random Vectors) Discrete Random Variables (and General Concepts

57

f (x) 0.25

0.20 0.15 0.10 0.05

n p

10

1 2

0

1 2 3 4 5 6 7 8 9 10

x

1 Figure 3.2 Graph of the p.d.f. of the Binomial distribution for n = 10, p = –. 2

**() f (1) = 0.0097 f (2 ) = 0.0440 f ( 3) = 0.1172 f ( 4 ) = 0.2051 f ( 5) = 0.2460
**

f 0 = 0.0010

**() f ( 7) = 0.1172 f ( 8 ) = 0.0440 f (9) = 0.0097 f (10 ) = 0.0010
**

f 6 = 0.2051

3.2.2 Poisson

λx , x! x = 0, 1, 2, . . . ; λ > 0. f is, in fact, a p.d.f., since f(x) ≥ 0 and

X S = 0, 1, 2, ⋅ ⋅ ⋅ ,

( ) {

}

P X = x = f x = e −λ

(

) ()

e λ = 1.

∑ f ( x ) = e λ ∑ x! = e

− x =0 x =0

∞

∞

λx

−λ

The distribution of X is called the Poisson distribution and is denoted by P(λ). λ is called the parameter of the distribution. Often the notation X ∼ P(λ) will be used to denote the fact that the r.v. X is distributed as P(λ). The Poisson distribution is appropriate for predicting the number of phone calls arriving at a given telephone exchange within a certain period of time, the number of particles emitted by a radioactive source within a certain period of time, etc. The reader who is interested in the applications of the Poisson distribution should see W. Feller, An Introduction to Probability Theory, Vol. I, 3rd ed., 1968, Chapter 6, pages 156–164, for further examples. In Theorem 1 in Section 3.4, it is shown that the Poisson distribution

58

3

On Random Variables and Their Distributions

may be taken as the limit of Binomial distributions. Roughly speaking, suppose that X ∼ B(n, p), xwhere n is large and p is small. Then P X = x = n− x ( xn ) px (1 − p) ≈ e − np ( np!) , x ≥ 0 . For the graph of the p.d.f. of the P(λ) x distribution for λ = 5 see Fig. 3.3. A visualization of such an approximation may be conceived by stipulating that certain events occur in a time interval [0,t] in the following manner: events occurring in nonoverlapping subintervals are independent; the probability that one event occurs in a small interval is approximately proportional to its length; and two or more events occur in such an interval with probability approximately 0. Then dividing [0,t] into a large number n of small intervals of length t/n, we have that the probability that exactly x events occur in [0,t] is x n− x approximately ( n )( λnt ) (1 − λnt ) , where λ is the factor of proportionality. x Setting pn = λnt , we have npn = λt and Theorem 1 in Section 3.4 gives that x x n− x ( nx )( λnt ) (1 − λnt ) ≈ e − λt (λxt ) . Thus Binomial probabilities are approximated by ! Poisson probabilities.

(

)

f (x) 0.20 0.15 0.10 0.05 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 x

Figure 3.3 Graph of the p.d.f. of the Poisson distribution with λ = 5.

**() f (1) = 0.0337 f (2 ) = 0.0843 f ( 3) = 0.1403 f ( 4 ) = 0.1755 f ( 5) = 0.1755 f (6) = 0.1462 f ( 7) = 0.1044 f ( 8 ) = 0.0653
**

f 0 = 0.0067

**() f (10 ) = 0.0181 f (11) = 0.0082 f (12 ) = 0.0035 f (13) = 0.0013 f (14 ) = 0.0005 f (15) = 0.0001
**

f 9 = 0.0363 f n is negligible for n ≥ 16.

()

3.2

Discrete Random Variables (and General Concepts 3.1 Soem Random Vectors)

59

3.2.3 Hypergeometric

X S = 0, 1, 2, ⋅ ⋅ ⋅ , r ,

( ) {

}

⎛ m⎞ ⎛ n ⎞ ⎜ ⎟⎜ ⎟ ⎝ x ⎠ ⎝ r − x⎠ f x = , ⎛ m + n⎞ ⎜ ⎟ ⎝ r ⎠

()

where ( m) = 0, by deﬁnition, for x > m. f is a p.d.f., since f(x) ≥ 0 and x

∑ f ( x ) = ⎛ m + n⎞ ∑ ⎜ x ⎟ ⎜ r − x⎟ = ⎛ m + n⎞ ⎜ ⎝ ⎠⎝ ⎠ ⎝

1 1

x =0

r

r

⎛ m⎞ ⎛ n ⎞

⎜ ⎝

r

⎟ ⎠

x =0

⎜ ⎝

r

⎟ ⎠

⎛ m + n⎞ ⎟ = 1. r ⎠

The distribution of X is called the Hypergeometric distribution and arises in situations like the following. From an urn containing m red balls and n black balls, r balls are drawn at random without replacement. Then X represents the number of red balls among the r balls selected, and f(x) is the probability that this number is exactly x. Here S = {all r-sequences of R’s and B’s}, where R stands for a red ball and B stands for a black ball. The urn/balls model just described is a generic model for situations often occurring in practice. For instance, the urn and the balls may be replaced by a box containing certain items manufactured by a certain process over a speciﬁed period of time, out of which m are defective and n meet set speciﬁcations.

3.2.4 Negative Binomial

X S = 0, 1, 2, ⋅ ⋅ ⋅ ,

( ) {

}

⎛ r + x − 1⎞ x f x = pr ⎜ ⎟q , x ⎠ ⎝

()

0 < p < 1, q = 1 − p, x = 0, 1, 2, . . . . f is, in fact, a p.d.f. since f(x) ≥ 0 and

x =0

∑ f (x) = p ∑ ⎜ ⎝

r

∞

⎛ r + x − 1⎞ x pr ⎟q = x ⎠ x =0 1− q

∞

(

)

r

=

pr = 1. pr

This follows by the Binomial theorem, according to which 1

(1 − x)

n

∞ ⎛ n + j − 1⎞ j = ∑⎜ ⎟x , j ⎠ j=0 ⎝

x < 1.

The distribution of X is called the Negative Binomial distribution. This distribution occurs in situations which have as a model the following. A Binomial experiment E, with sample space {S, F}, is repeated independently until exactly r S’s appear and then it is terminated. Then the r.v. X represents the number of times beyond r that the experiment is required to be carried out, and f(x) is the probability that this number of times is equal to x. In fact, here S =

60

3

On Random Variables and Their Distributions

{all (r + x)-sequences of S’s and F’s such that the rth S is at the end of the sequence}, x = 0, 1, . . . and f(x) = P(X = x) = P[all (r + x)-sequences as above for a speciﬁed x]. The probability of one such sequence is pr−1qxp by the independence assumption, and hence

⎛ r + x − 1⎞ r −1 x r ⎛ r + x − 1⎞ x f x =⎜ ⎟q . ⎟p q p= p ⎜ x ⎠ x ⎠ ⎝ ⎝

()

The above interpretation also justiﬁes the name of the distribution. For r = 1, we get the Geometric (or Pascal) distribution, namely f(x) = pqx, x = 0, 1, 2, . . . .

3.2.5

Discrete Uniform

X S = 0, 1, ⋅ ⋅ ⋅ , n − 1 ,

( ) {

}

f x =

()

1 , n

x = 0, 1, ⋅ ⋅ ⋅ , n − 1.

**This is the uniform probability measure. (See Fig. 3.4.)
**

f (x)

1 5

n

5

Figure 3.4 Graph of the p.d.f. of a Discrete Uniform distribution.

0

1

2

3

4

x

3.2.6

Here

Multinomial

k ⎧ ⎫ ⎪ ⎪ X S = ⎨x = x1 , ⋅ ⋅ ⋅ , xk ′ ; x j ≥ 0, j = 1, 2, ⋅ ⋅ ⋅ , k, ∑ x j = n⎬, ⎪ ⎪ j =1 ⎩ ⎭ k n! x1 x2 xk f x = p1 p2 ⋅ ⋅ ⋅ pk , p j > 0, j = 1, 2, ⋅ ⋅ ⋅ , k, ∑ p j = 1. x1! x 2 ! ⋅ ⋅ ⋅ x k ! j =1

()

(

)

()

That f is, in fact, a p.d.f. follows from the fact that

∑ f ( x) = ∑

X

x 1 ,⋅⋅⋅ , xk

n! x p1x1 ⋅ ⋅ ⋅ pk k = p1 + ⋅ ⋅ ⋅ + pk x1! ⋅ ⋅ ⋅ xk !

(

)

n

= 1n = 1,

where the summation extends over all xj’s such that xj ≥ 0, j = 1, 2, . . . , k, Σ k xj = n. The distribution of X is also called the Multinomial distribution and j=1 n, p1, . . . , pk are called the parameters of the distribution. This distribution occurs in situations like the following. A Multinomial experiment E with k possible outcomes Oj, j = 1, 2, . . . , k, and hence with sample space S = {all n-sequences of Oj’s}, is carried out n independent times. The probability of the Oj’s occurring is pj, j = 1, 2, . . . k with pj > 0 and ∑ k=1 p j = 1 . Then X is the j random vector whose jth component Xj represents the number of times xj the outcome Oj occurs, j = 1, 2, . . . , k. By setting x = (x1, . . . , xk)′, then f is the

3.1 Soem General Exercises Concepts

61

probability that the outcome Oj occurs exactly xj times. In fact f(x) = P(X = x) = P(“all n-sequences which contain exactly xj Oj’s, j = 1, 2, . . . , k). The probx x ability of each one of these sequences is p1 1 ⋅ ⋅ ⋅ pk k by independence, and since there are n!/( x1! ⋅ ⋅ ⋅ xk ! ) such sequences, the result follows. The fact that the r. vector X has the Multinomial distribution with parameters n and p1, . . . , pk may be denoted thus: X ∼ M(n; p1, . . . , pk).

REMARK 1 When the tables given in the appendices are not directly usable because the underlying parameters are not included there, we often resort to linear interpolation. As an illustration, suppose X ∼ B(25, 0.3) and we wish to calculate P(X = 10). The value p = 0.3 is not included in the Binomial 4 5 Tables in Appendix III. However, 16 = 0.25 < 0.3 < 0.3125 = 16 and the 4 5 probabilities P(X = 10), for p = 16 and p = 16 are, respectively, 0.9703 and 0.8756. Therefore linear interpolation produces the value:

0.9703 − 0.9703 − 0.8756 ×

(

)

0.3 − 0.25 = 0.8945. 0.3125 − 0.25

**Likewise for other discrete distributions. The same principle also applies appropriately to continuous distributions.
**

REMARK 2 In discrete distributions, we are often faced with calculations of the form ∑∞=1 xθ x . Under appropriate conditions, we may apply the following x approach:

∑ xθ

x =1

∞

x

= θ ∑ xθ x −1 = θ ∑

x =1 x =1

∞

∞

d x d θ =θ dθ dθ

∞ x=2

∑θ

x =1

∞

x

=θ

d ⎛ θ ⎞ θ ⎜ ⎟= dθ ⎝ 1 − θ ⎠ 1−θ

Similarly for the expression ∑

x x−1 θ

(

)

(

)

2

.

x−2

.

Exercises

3.2.1 A fair coin is tossed independently four times, and let X be the r.v. deﬁned on the usual sample space S for this experiment as follows:

X s = the number of H ’ s in s.

()

iii) What is the set of values of X? iii) What is the distribution of X? iii) What is the partition of S induced by X? 3.2.2 It has been observed that 12.5% of the applicants fail in a certain screening test. If X stands for the number of those out of 25 applicants who fail to pass the test, what is the probability that:

What is the minimum number. so that at least one of them is defective with probability at least 0.05. Calculate the probability that: iii) No particles reach the portion of space under consideration during time t. 3. 3.62 3 On Random Variables and Their Distributions iii) X ≥ 1? iii) X ≤ 20? iii) 5 ≤ X ≤ 20? 3.1 and P(X = 2) = 0.2.2. show that: ii) P(X = x) = P(Y = n − x).8 The phone calls arriving at a given telephone exchange within one minute follow the Poisson distribution with parameter λ = 10. q). n.v.1.2. ii) Also. .v. for any integers a.4 If the r.2.9 (Truncation of a Poisson r. Given that P(X = 0) = 0. .7 It has been observed that the number of particles emitted by a radioactive substance which reach a given portion of space during time t follows closely the Poisson distribution with parameter λ. where Y ∼ B(n. iii) At least 50 particles do so. n. one has: P(a ≤ X ≤ b) = P(n − b ≤ Y ≤ n − a). iii) Exactly 120 particles do so.95? Take p = 0. Find: .v. X is distributed as B(n. 3. compute the probability that X > 5. . with parameter λ. p) with p > 1 . In such a case. What is the probability that X < 10? If P(X = 1) = 0. and q = 1 − p.2. X be distributed as Poisson with parameter λ and deﬁne the r. where Y is as in part (i).6 Refer to Exercise 3. Y as follows: Y = X if X ≥ k a given positive integer ( ) and Y = 0 otherwise. x = 0. iv) Give the numerical values in (i)–(iii) if λ = 100.5 and suppose that P(X = 1) = P(X = 2).3 A manufacturing process produces certain articles such that the probability of each article being defective is p. 3. 3.v. of articles to be produced. the Binomial Tables 2 in Appendix III cannot be used directly.) Let the r. .5 Let X be a Poisson distributed r.2. 1. What is the probability that in a given minute: iii) No calls arrive? iii) Exactly 10 calls arrive? iii) At least 10 calls arrive? 3.2. b with 0 ≤ a < b ≤ n. calculate the probability that X = 0.v.2.2.

ii) Show that the distribution of Y is given by P Y = y = p2 y − 1 1 − p ( ) ( )( ) y−2 . . . 1. If X stands for the r. j = 3k + 1.2. . and let Xj be the r. () () where A = 0.1 Soem General Exercises Concepts 63 ii) P(Y = y). . k. . j = 1. ) 0 ≤ x j ≤ n j .14 Show that the function f(x) = ( 1 )xIA(x). () ( ) c .2. 3j j = 0. . k = 0.? f x = cα x I A x . consider an urn containing nj balls with the number j written on them. .200 are undergraduates and the remaining are graduate students. n balls are drawn at random and without replacement.600 students.}. k = 0. iii) P(X ∈ A). 3. . where B = {j. . . ii) P(Y = 0).f. 3. . y = 2. of whom 1.05. 1.3 and let Y be the r.v. .v. 2. k is given by P X j = x j .2. . 25 names are chosen at random.d. . what is the probability that X ≥ 10? 3.10 A university dormitory system houses 1.2. 2 For what value of c is the function f deﬁned below a p. From the combined list of their names. ⋅ ⋅ ⋅ . . 3. k. j =1 3. Then show that the joint distribution of Xj. . 2.}. j = 2k + 1. j = 1. denoting the number of graduate students among the 25 chosen. . ⋅ ⋅ ⋅ .13 3. . 1. . . ∑ x j = n.d. with the following probabilities: f j =P X=j = iii) Determine the constant c.15 Suppose that the r. X takes on the values 0. j = 1. . . . y = k. where A = {1. ii) Calculate the probability P(Y ≥ 100) for p = 0. denoting the minimum number of articles to be manufactured until the ﬁrst two defective articles appear. denoting the number of balls among the n ones with the number j written on them. ⋅ ⋅ ⋅ .f. iv) P(X ∈ B).12 Refer to the manufacturing process of Exercise 3. where A = {j.v. 1.v.2. 1. 3.2. ⋅ ⋅ ⋅ . . k + 1. ⋅ ⋅ ⋅ { } (0 < α < 1). is a p. k = ( ) ( ∏ k k nj j =1 x j n1 + ⋅ ⋅ ⋅ + nk n ( ).}.11 (Multiple Hypergeometric distribution) For j = 1.2. Compute the following probabilities: iii) P(X ≥ 10). .3.

. . 0.10. ⋅ ⋅ ⋅ . . and SE − SO = −λ e . respectively. show that: p iii) If X ∼ B(n. . . Determine the joint p. Then show that Xj is distributed as B(n. j = 0. . x = 0. . . k}. X4 = 3. B and AB. . 2. . . distributed as P(λ).v. . . 1. 0. . calculate the probabilities: P(X ∈ E) and P(X ∈ O).d. . eight of type A. Xk have the Multinomial distribution. ii) Find the numerical values of the probabilities in part (i) for λ = 5. A. . .20 The following recursive formulas may be used for calculating Binomial. . iii) If X has the Hypergeometric distribution.2. show that the r. jm are m distinct integers from the set {1. x = 0. . and set E = {0. what is the probability that: ii) All 20 people have blood of the same type? ii) Nine people have blood type O. ii) What is the joint p. . min m. X3 = 4.21 ( ) (m − x)(r − x) f x . . then f ( x + 1) = x +1 f ( x). notice that SE + SO = eλ. 1. of the Xj’s. . pj).f.2. To this effect. 3. Poisson and Hypergeometric probabilities. Suppose that these types occur with the following frequencies: 0.d. and let j be a ﬁxed number from the set {1. p). X2 = 5.’s Xj . .v. 3. . If 20 people are chosen at random. Xj have Multinomial distributions with parameters n and pj .’s Xj. X5 = 2. () (n − r + x + 1)(x + 1) x = 0. then f x+1 = 3.) 3. . . . }.2. . x +1 λ iii) If X ∼ P(λ). 1 m 1 m . . 6.’s X1. (Hint: If k k SE = Σ k∈E λk! and SO = Σ k∈O λk! .16 There are four distinct types of human blood denoted by O. 3 as follows: X j = the number of times j H ’ s appear. . k}.19 Let X be an r. 3. X6 = 1.2. . then f ( x + 1) = q n− x f ( x). · · · pj . ii) If m is an integer such that 2 ≤ m ≤ k − 1 and j1. r .64 3 On Random Variables and Their Distributions 3.40.17 A balanced die is tossed (independently) 21 times and let Xj be the number of times the number j appears. 1. . } and O = {1. 1.v.v. of the X’s? ii) Compute the probability that X1 = 6. . .f. Then: ii) In terms of λ.05.2. j = 1. . { } i) Suppose the r. . . n − 1. 0.18 Suppose that three coins are tossed (independently) n times and deﬁne the r.2. two of type B and one of type AB? 3. 2.45. .

denoting the number of black balls drawn.1 Normal (or Gaussian) X S = ⎡ x−μ exp⎢− ⎢ 2σ 2 2πσ ⎢ ⎣ 1 ( ) . μ ∈ . is replaced and c balls of the same color as the one drawn are placed into the urn. σ > 0). () ∞ ∞ ∞ I 2 = ⎡ ∫ f x dx ⎤ = ∫ f x dx ∫ f y dy ⎢ −∞ ⎥ −∞ −∞ ⎣ ⎦ ⎡ x−μ 2⎤ ⎡ 1 ∞ ⎢− ⎥ dx ⋅ 1 ∞ exp⎢− y − μ = exp ⎢ ⎢ 2πσ ∫−∞ σ ∫−∞ 2σ 2 ⎥ 2σ 2 ⎢ ⎥ ⎢ ⎣ ⎦ ⎣ () 2 () ) () ( ( ) 2 ⎤ ⎥ dy ⎥ ⎥ ⎦ = 1 1 ⋅ 2π σ ∫ ∞ −∞ e −z 2 2 σ dz ⋅ 1 σ ∫ ∞ −∞ e −υ 2 2 σ dυ . Feller. Then show that the p. so that υ ∈ (−∞. by W. In fact. Thus .v.2. 3rd ed. upon letting (x − μ)/σ = z. ) ) ) × b + r + 2c ⋅ ⋅ ⋅ b + r + m − 1 c [ ( )] (This distribution can be used for a rough description of the spread of contagious diseases. 120–121 and p. σ2 = variance. We say that X is distributed as normal (μ. For μ = 0.f x = () ( ) 2 ⎤ ⎥. pp. σ2 are called the parameters of the distribution of X which is also called the Normal distribution (μ = mean. 1). For more about this and also for a certain approximation to the above distribution.f. 1968.d. and (y − μ)/σ = υ. σ = 1. we get what is known as the Standard Normal distribution.3. Clearly f(x) > 0.1 Soem Random Vectors) 65 3. of X is given by b b + c b + 2c ⋅ ⋅ ⋅ b + x − 1 c ⎛ n⎞ P X =x =⎜ ⎟ ⎝ x⎠ b + r b + r + c ( )( ( ( ) ( )( [ ( )] × r (r + c ) ⋅ ⋅ ⋅ [r + ( n − x − 1)c ] . 142. σ2). so that z ∈ (−∞. ⎥ ⎥ ⎦ x∈ . where μ.) 3. Vol. the reader is referred to the book An Introduction to Probability Theory and Its Applications.22 (Polya’s urn scheme) Consider an urn containing b black balls and r red balls. denoted ∞ by N(0. ∞). σ2).3 Continuous Random Variables (and General Concepts 3.3 Continuous Random Variables (and Random Vectors) 3. denoted by N(μ. 1.. Suppose that this experiment is repeated independently n times and let X be the r.3. One ball is drawn at random. that I = ∫−∞ f x dx = 1 is proved by showing that 2 I = 1. ∞).

Taking all these facts into consideration. heights or weights of a (large) group of individuals. the force required to punctuate a cardboard. lifetimes of various manufactured items.4 1 0.3.3. However. From the fact that max f x = x∈ () 1 2πσ and the fact that ∫−∞ f ( x)dx = 1. the main signiﬁcance of it derives from the Central Limit Theorem to be discussed in Chapter 8. errors in numerous measurements. The Normal distribution is a good approximation to the distribution of grades. Section 8. we are led to Fig. ∞ we conclude that the larger σ is. It is easily seen that f(x) is symmetric about x = μ. 3. that is.6 0.8 0.2 Gamma X S = ( ) (actually X (S ) = (0.5. that is.d. I2 = 1 and hence I = 1.2 2 2 1 0 1 N( . since f(x) > 0. Or I2 = 1 2π ∫ 0 e −r 2 2 r dr ∫ 2π 0 dθ = ∫ e − r 0 ∞ 2 2 r dr = − e − r 2 2 ∞ 0 = 1. f(μ − x) = f(μ + x) and that f(x) attains its maximum at x = μ which is equal to 1/( 2πσ ). the diameters of hail hitting the ground during a storm.5 3 4 5 x Figure 3. etc.66 3 On Random Variables and Their Distributions f (x) 0.5 and several values of σ. the more spread-out f(x) is and vice versa.5 Graph of the p. 2 2) 0. I2 = 1 2π ∫ ∫ ∞ ∞ ∞ −∞ −∞ e − z 2 +υ 2 ( ) 2 dz dν = 1 2π ∫ ∫ 0 ∞ 2π 0 e −r 2 2 r dr dθ by the standard transformation to polar coordinates.f. It is also clear that f(x) → 0 as x → ±∞. ∞)) . of the Normal distribution with μ = 1. 3.

⎩ () ( ) x > 0 α > 0. 0 ( ) ( ) ∞ For later use. dx = β dy. and if α is an integer.3 Continuous Random Variables (and General Concepts 3. x≤0 where Γ(α) = ∫0 yα −1 e − y dy (which exists and is ﬁnite for α > 0). upon letting x/β = y. ∞). 2 ( ) . f(x) ≥ 0 and that ∫−∞ f(x)dx = 1 is seen as follows. y ∈ (0. ( ) ( )( ) ( ) ( )( ) () where Γ 1 = ∫ e − y dy = 1. then Γ α = α −1 α − 2 ⋅ ⋅ ⋅ Γ 1 . by integrating by parts. that is. Γα ( ) REMARK 3 One easily sees.3. that is. t ∈ 0. Γ a = α − 1 ! 0 () ∞ () ( ) We often use this notation even if α is not an integer. ∞ . that Γ α = α −1 Γ α −1 .1 Soem Random Vectors) 67 Here ⎧ 1 x α −1 e − x β . ⎪ f x = ⎨Γ α βα ⎪0 . ∞ ∫ f (x )dx = Γ(α )β α ∫ ∞ 1 ∞ −∞ 0 x α −1 e − x β dx = 1 Γ α βα ( ) ∫ ( ) ∞ 0 yα −1 e − y β α dy. 2⎠ 0 ⎝ By setting y1 2 = t 2 . (This integral is known as the Gamma function. β > 0. we write Γ α = α − 1 ! = ∫ yα −1 e − y dy for α > 0. ∞ Clearly. x = βy. that is. β are called the parameters of the distribution. we show that ⎛ 1⎞ ⎛ 1⎞ Γ⎜ ⎟ = ⎜ − ⎟ ! = π . ⎝ 2⎠ ⎝ 2⎠ We have ∞ − (1 2 ) ⎛ 1⎞ Γ⎜ ⎟ = ∫ y e − y dy. so that y= t2 . dy = t dt . ∫−∞ f (x )dx = Γ(α ) ∫0 ∞ 1 ∞ yα −1 e − y dy = 1 ⋅ Γ α = 1.) The distribution of X is also called the Gamma distribution and α.

of the Gamma distribution for several values of α.7 Graphs of the p.6 and 3. 1 1. and in particular its special case the Negative Exponential distribution. β. serve as satisfactory models for f (x) 1. β.50 0. that is. The Gamma distribution. ⎝ 2⎠ ⎝ 2⎠ From this we also get that ⎛ 3⎞ 1 ⎛ 1⎞ π Γ⎜ ⎟ = Γ⎜ ⎟ = . of the Gamma distribution for selected values of α and β are given in Figs.68 3 On Random Variables and Their Distributions we get ∞1 ⎛ 1⎞ Γ⎜ ⎟ = 2 ∫ e −t 0 t 2⎠ ⎝ 2 2 t dt = 2 ∫ e − t 0 ∞ 2 2 dt = π .f.00 0.7. . etc.f. 3.d.75 0.75 0.d.50 2.25 0 1 2 3 4 5 6 7 8 x Figure 3. of the Gamma distribution for several values of α. 0.25 2.00 0.f. 1 2.6 Graphs of the p.d. 2⎠ 2 ⎝ 2⎠ 2 ⎝ Graphs of the p. discussed below. 2 0 1 2 3 4 5 x Figure 3. 1 4.5 2. 1 0. f (x) 1. ⎛ 1⎞ ⎛ 1⎞ Γ⎜ ⎟ = ⎜ − ⎟ ! = π .

denoting the waiting time between successive occurrences of events following a Poisson distribution. To this end. f x =⎨ ⎩0 . To summarize: f(x) = 0 for x ≤ 0. if X is an r.3 Continuous Random Variables (and General Concepts 3.3. integer. which is known as the Negative Exponential distribution. their average − λx ( λx ) = e − λx . is denoted by χ r and r is called the number of degrees of freedom (d. then X has the Negative Exponential distribution. suppose that events occur according to the Poisson distribution P(λ). Since λ is the average number of emitted particles per time unit.f. 2 The distribution with this p. so that X is distributed as asserted.3 Chi-square For α = r/2. 3.d. Thus. we get what is known as the Chi-square distribution. The Chi-square distribution occurs often in statistics.f. β = 2.. To see this. for example. integer. β = 1/λ.1 Soem Random Vectors) 69 describing lifetimes of various manufactured items.4 Negative Exponential For α = 1. then dFdxx ) = f ( x).. we obtain the Chi-square and the Negative Exponential distributions given below. since no number during time x will be λx. Furthermore. () ( ) x>0 x≤0 r > 0. it is mentioned here that the distribution function F of an r.v. it follows that F(x) = 0. is deﬁned by F(x) = P(X ≤ x). and f(x) = λe−λx for x > 0.) of the distribution. x]. as well as in statistics. in particular.v. x ≤ 0. So let x > 0 be the the waiting time for the emission of the next item. so that f(x) = λe−λx.v. to be studied in the next chapter. More speciﬁcally. x ∈ . we get ⎧λ e − λx . That is.3. For speciﬁc choices of the parameters α and β in the Gamma distribution. Since X ≥ 0. denoting the waiting time until the next particle occurs. among other things. F(x) = 1 − e . and if X ( is a continuous r.3. We shall show that X has the Negative Exponential distribution with parameter λ. ⎧ 1 ( r 2 )−1 − x 2 x e . 0 . we suppose that we have just observed such a particle. ⎪ 1 r 2 f x = ⎨Γ 2 r 2 ⎪ ⎩0.v. in waitingtime problems. The Negative Exponential distribution occurs frequently in statistics and. and let X be the r. it sufﬁces to determine F here. Then F(x) = P(X ≤ x) = 1 − P(X > x). Then P( X > x) = e 0! −λx particles are emitted in (0. r ≥ 1. 3. as will be seen in subsequent chapters. particles emitted by a radioactive source with the average of λ particles per time unit. that is. () x>0 x≤0 λ > 0.

(See Fig. β]. β) distribution. 3. ⎪ f x =⎨ ⎪0. 1)) and ( ) β −1 ⎧Γ α +β ⎪ x α −1 1 − x f x = ⎨Γ α Γ β ⎪ ⎩0 elsewhere. The interpretation of this distribution is that subintervals of [α. β) or Rectangular R(α.8 Graph of the p. ∞ Clearly. β). ∞ 1 β The distribution of X is also called Uniform or Rectangular (α. we may use the notation X ∼ Γ(α. ⎧1 β − α . ⎩ () Clearly. 0<x<1 α > 0. f x ≥ 0.3.v.d. f (x) 1 Figure 3. β) or 2 X ∼ NE(λ). and α and β are the parameters of the distribution. respectively. are assigned the same probability of being observed regardless of their location. or X ∼ χ r in order to denote the fact that X is distributed as Gamma with parameters α and β. f(x) ≥ 0. That ∫−∞ f ( x)dx = 1 is seen as follows.8.3. 0 x 3. () ∫−∞ f (x )dx = β − α ∫α dx = 1.) The fact that the r. 3. of the same length. () ( ) ( )() .f. β).5 Uniform U(α. β > 0.70 3 On Random Variables and Their Distributions Consonant with previous notation. or Negative Exponental with parameter λ. β) X S = ( ) (actually X (S ) = [α . β ]) and ( ) α≤x≤β otherwise α < β. or Chi-square with r degrees of freedom. . of the U(α.6 Beta X S = ( ) (actually X (S ) = (0. X has the Uniform distribution with parameters α and β may be denoted by X ∼ U(α.

1 Soem Random Vectors) 71 ∞ ∞ Γ α Γ β = ⎛ ∫ x α −1 e − x dx⎞ ⎛ ∫ y β −1 e − y dy⎞ ⎝ 0 ⎠⎝ 0 ⎠ ∞ ∞ −( x+y ) = ∫ ∫ x α −1 y β −1 e dx dy 0 0 ( )() which. Γ α Γ β =Γ α +β and hence ( )() ( ( )∫ x (1 − x) 1 α −1 β −1 0 dx α −1 ∫−∞ f (x )dx = Γ(α )Γ(β ) ∫0 x (1 − x ) ∞ 1 Γ α +β ) β −1 dx = 1. since Γ(1) = 1 and Γ(2) = 1. ∞). 1 2 1− u 1− u ( ) ( ) − y 1− u and x + y = y . REMARK 4 For α = β = 1. β are called the parameters of the distribution 1 and the function deﬁned by ∫0 x α −1 (1 − x )β −1 dx for α.d. β).3. upon setting u = x/(x + y). Let y/(1 − u) = υ. . β > 0 is called the Beta function. so that x= uy y du . Then the integral is =∫ ∞ 1 0 ∞ 0 α ∫ u (1 − u) −1 β −1 υ α + β −1 e −υ du dυ = ∫ υ α + β −1 e −υ dυ ∫ uα −1 1 − u 0 0 =Γ α +β that is. Again the fact that X has the Beta distribution with parameters α and β may be expressed by writing X ∼ B(α. 1). dx = .3 Continuous Random Variables (and General Concepts 3. 3.9. 1− u becomes =∫ =∫ ∞ 1 0 ∫ ∫ uα −1 0 (1 − u) ( 1− u α −1 yα −1 y β −1 e yα + β −1 e ( ) y du (1 − u) 2 dy ∞ 1 0 uα −1 − y 1− u 0 ) α +1 ( ) du dy. of the Beta distribution for selected values of α and β are given in Fig. we get the U(0.f. u ∈ 0. Graphs of the p. ( )∫ u 1 0 α −1 ( (1 − u) 1 ) β −1 du β −1 du. so that y = υ(1 − u). α. υ ∈ (0. The distribution of X is also called the Beta distribution and occurs rather often in statistics. dy = (1 − u)dυ.

0 2 2 1.8 1.2 0.f. −∞ π ∫−∞ 1 + y 2 π upon letting y= x−μ . ∞)) and .72 3 On Random Variables and Their Distributions f (x) 2.4 0.8 Lognormal Here X(S) = (actually X(S) = (0. We may write X ∼ Cauchy(μ.f.d.f. μ ∈ .3. of the Cauchy distribution looks much the same as the Normal p.) 3. σ The distribution of X is also called the Cauchy distribution and μ.9 Graphs of the p.5 3 3 5 3 2. β. σ > 0. x ∈ . 3.6 0.0 0 0. 3. Clearly.d. of the Beta distribution for several values of α.0 x Figure 3.7 Cauchy Here X S = ( ) and f x = () ∞ σ 1 ⋅ π σ2 + x−μ ( ) 2 .3. f(x) > 0 and ∫−∞ f ( x )dx = π ∫−∞ ∞ σ 1 σ2 + x−μ ( ) 2 dx = 1 ∞ 1 σπ ∫−∞ 1 + x − μ σ [( ) ] 2 dx = ∞ 1 ∞ dy 1 = arctan y = 1.5 1. (The p.d. except that the tails of the former are heavier.. so that σ dx = dy. σ are called the parameters of the distribution (see Fig.10). σ2) to express the fact that X has the Cauchy distribution with parameters μ and σ2.

becomes ⎡ y − log α 1 = exp⎢− ∫ y ⎢ 2β 2 β 2π −∞ e ⎢ ⎣ 1 ∞ ( ) 2 ⎤ ⎥ e y dy.3. letting x = ey.11 Bivariate Normal Here X(S) = 2 (that is. ⎧ ⎡ ⎪ 1 exp⎢− log x − log α ⎪ ⎢ f x = ⎨ xβ 2π 2β 2 ⎢ ⎣ ⎪ ⎪0. (For the many applications of the Lognormal distribution. ∞). y ∈ (−∞. β) may be used to express the fact that X has the Lognormal distribution with parameters α and β . testing hypotheses. 1957.3 3.3. The distribution of X is called Lognormal and α. Aitchison and J. We close this section with an example of a continuous random vector. ⎩ () ( ) 2 ⎤ ⎥. β 2) density and hence is equal to 1.1 2 1 0 1 2 x Figure 3. 3.3 0. Section 9.2 0. β are called the parameters of the distribution (see Fig. Now f(x) ≥ 0 and ∫ () ∞ −∞ f x dx = 1 β 2π ∫0 ∞ ⎡ log x − log α 1 exp⎢− ⎢ x 2β 2 ⎢ ⎣ ( ) 2 ⎤ ⎥ dx ⎥ ⎥ ⎦ which. A. ⎥ ⎥ ⎦ But this is the integral of an N(log α. C. σ = 1. then Y = log X is normally distributed with parameters log α and β 2. etc.d.1 Soem Random Vectors) Continuous Random Variables (and General Concepts 73 f (x) 0.10 F These distributions occur very often in Statistics (interval estimation.f. the reader is referred to the book The Lognormal Distribution by J. analysis of variance.3. 3. The notation X ∼ Lognormal(α. that is.) and their densities will be presented later (see Chapter 9.11). ⎥ ⎥ ⎦ x>0 x ≤ 0 where α . X is a 2-dimensional random vector) with . β > 0. if X is lognormally distributed. Cambridge University Press.) 3.10 Graph of the p.9 t 3.2). Brown. of the Cauchy distribution with μ = 0.3. dx = eydy. so that log x = y. New York.

12. The distribution of X is also called the Bivariate Normal distribution and the quantities μ1.11 Graphs of the p.74 3 On Random Variables and Their Distributions f (x) 0. σ1 ( ) . x2) > 0.8 0. x2 ∈ q= 1 ( ) 1 2π σ 1 σ 2 1 − ρ 2 e −q 2 . of the Lognormal distribution for several values of α. ⎛ x2 − μ2 ⎞ ⎛ x1 − μ1 ⎞ x 2 − μ 2 1 x − μ1 − ⋅ ρσ 2 ⋅ 1 ⎜ ⎟ − ρ⎜ ⎟= σ2 σ2 σ1 ⎝ σ2 ⎠ ⎝ σ1 ⎠ = where b = μ2 + 1 σ2 ⎡ ⎛ x1 − μ1 ⎞ ⎤ 1 x2 − b .f.6 1 2 1 0. ⎢ x 2 − ⎜ μ 2 + ρσ 2 ⎟⎥ = σ1 ⎠ ⎦ σ 2 ⎝ ⎢ ⎥ ⎣ ( ) ρσ 2 x1 − μ1 . ⎝ σ1 ⎠ ⎦ ⎝ σ1 ⎠ ⎢ ⎥ ⎣⎝ σ 2 ⎠ 2 ( ) 2 Furthermore. 3. −1 < ρ < 1 and 2 ⎡⎛ x − μ ⎞ 2 ⎛ x − μ1 ⎞ ⎛ x 2 − μ2 ⎞ ⎛ x 2 − μ2 ⎞ ⎤ 1 ⎢⎜ 1 +⎜ − 2ρ ⎜ 1 ⎟ ⎥ ⎟ ⎟⎜ ⎟ ⎝ σ1 ⎠⎝ σ2 ⎠ ⎝ σ2 ⎠ ⎥ 1 − ρ 2 ⎢⎝ σ 1 ⎠ ⎣ ⎦ with μ1. (See Fig. β.2 e2 e 0 1 2 3 4 x Figure 3.5 0. f(x1. σ2 > 0. σ2. ρ are called the parameters of the distribution. σ1. μ2. σ1. f x1 . μ2 ∈ . . That ∫∫ f(x1. x 2 = where x1.d.) Clearly. x2)dx1dx2 = 1 is seen as follows: 2 ( 2 ⎡⎛ x − μ ⎞ 2 ⎛ x − μ1 ⎞ ⎛ x 2 − μ 2 ⎞ ⎛ x 2 − μ 2 ⎞ ⎤ 1 1 − ρ 2 q = ⎢⎜ 1 − 2 ρ⎜ 1 +⎜ ⎟ ⎟⎜ ⎟ ⎟ ⎥ ⎢⎝ σ 1 ⎠ ⎝ σ1 ⎠ ⎝ σ 2 ⎠ ⎝ σ 2 ⎠ ⎥ ⎦ ⎣ ) ⎡⎛ x − μ 2 ⎞ ⎛ x1 − μ1 ⎞ ⎤ 2 ⎛ x1 − μ1 ⎞ = ⎢⎜ 2 ⎟ − ρ⎜ ⎟⎥ + 1− ρ ⎜ ⎟ .4 0.

of the Bivariate Normal distribution. x 2 ) for z k 0 x2 x1 Figure 3. Since the ﬁrst 2 factor is the density of an N(μ1.1 Soem Random Vectors) Continuous Random Variables (and General Concepts 75 z z f(x1. k Thus ( and hence ⎛ x − b⎞ 2 ⎛ x1 − μ1 ⎞ 1 − ρ2 q = ⎜ 2 ⎟ ⎟ + 1− ρ ⎜ ⎝ σ2 ⎠ ⎝ σ1 ⎠ ) 2 ( ) 2 ∫−∞ ( ∞ f x1 . x 2 dx1 −∞ ( ) ∞ ( ) 2 is N μ 2 . f2 x 2 = ∫ f x1 . we get ∫−∞ ∫−∞ f ( x1 . integrating with respect to x1. x 2 dx 2 = ) ⎡ x −μ 1 1 exp⎢− 2 ⎢ 2σ 1 2πσ 1 ⎢ ⎣ 1 ×∫ ∞ −∞ ⎤ ⎥ ⎥ ⎥ ⎦ 2 ⎡ 1 ⎢− x 2 − b exp ⎢ 2σ 2 1 − ρ 2 2πσ 2 1 − ρ 2 2 ⎢ ⎣ 2 ( ) ( ( ) ) ⎤ ⎥ dx ⎥ 2 ⎥ ⎦ = ⎡ x −μ 1 1 exp⎢− ⎢ 2σ 12 2πσ 1 ⎢ ⎣ 1 ( ) 2 ⎤ ⎥ ⋅ 1. σ 1 ) random variable. then f1 x1 = ∫ f x1 . σ 2 .3. σ 2 (1 − ρ2)) density. ∞ ∞ REMARK 5 From the above derivations. ) . x2) is Bivariate Normal. ⎥ ⎥ ⎦ 2 since the integral above is that of an N(b.3 3. it follows that. if f(x1.d. ( ( ) and similarly. x 2 dx 2 −∞ ( ) ∞ ( ) is N μ1 .f. x 2 )dx1 dx 2 = 1.12 Graph of the p. σ 12 .

the values γ = P(X ≤ x) are given for r ranging from 1 to 45.68269. denoting the time which lapses .9973. σ 12.’s f1 and f2 above are called marginal p. σ 2 ). From the entries of the table.’s of f.d. 3. ii) max f x = x∈ () 1 2πσ .3.v.d.3. Select some values of γ and record the corresponding values of x for a set of increasing values of r. and for a < b.3. Let T be the r. Use Table 5 in Appendix III in order to determine the numbers a and b for which the following are true: ii) P(X < a) = P(X > b).6 Consider certain events which in every time interval [t1.2 Let X be distributed as N(0. of the N(μ. ρ.f. 2 The notation X ∼ N(μ1.1 Let f be the p. 2 2 Then X1 ∼ N(μ1 . μ2. f in order to show that: iii) iii) iii) iv) For 0 ≤ a < b. ρ) may be used to express the fact that 2 X has the Bivariate Normal distribution with parameters μ1. 1).90. use the Normal Tables in Appendix III in order to show (See Normal Tables in Appendix III for the deﬁnition of Φ.3 that: iii) P(−1 < X < 1) = 0. the values of x increase along with the number of degrees of freedom r. σ 2 .4 Let X be a χ r . p = Φ(−a) − Φ(−b). For a ≤ b < 0. 2 3. σ 12. for a ﬁxed γ. Exercises 3. In Table 5. 2 3. P(−c < X < c) = 2Φ(c) − 1. let p = P(a < X < b). ii) P(a < X < b) = 0.d. Then use the symmetry of the p.3. observe that. and for selected values of γ.76 3 On Random Variables and Their Distributions As will be seen in Chapter 4.f.5 Let X be an r.f.d. distributed as χ10. σ 1 ) and X2 ∼ N(μ2.9545. t2] (0 < t1 < t2) occur independently for nonoverlapping intervals according to the Poisson distribution P(λ(t2 − t1)).3. the p. 3.v. iii) P(−2 < X < 2) = 0. For c > 0. For a ≤ 0 < b.3. iii) P(−3 < X < 3) = 0. p = Φ(b) − Φ(a).) 3. Appendix III. σ2) distribution and show that: ii) f is symmetric about μ. If X ∼ N(0. μ2. σ 2 . p = Φ(b) + Φ(−a) − 1. 1).f.

compute the probability that: iii) He will survive to retire at 65. t > 0. β = ∫ x α −1 1 − x 0 ( ) 1 ( ) β −1 dx.9 Let X be an r. Determine the values of the parameter α for which the following are true: ii) P(−1 < X < 2) = 0.11 Establish the following identity: ⎛ n − 1 ⎞ p m−1 n⎜ 1− x ⎟∫ x ⎝ m − 1⎠ 0 ( ) n− m dx = ( p n! m−1 ∫0 x 1 − x m−1 n−m ! )( ) ( ) n− m dx ⎛ ⎞ = ∑ ⎜ n⎟ p j 1 − p ⎝j⎠ j= m n ( ) n− j . P(X > s + t|X > s) for some s. with p. 1. .10 Refer to the Beta distribution and set: B α . iii) For what value of c. 3. For an individual from the group in question.v.d.3.d. f is given by: f(x) = λe−λxI(0. j = 0.3. ii) P(|X| < 1) = P(|X| > 2).3.12 Let X be an r.f. ∞)(x).f given by f(x) = 1/[π(1 + x2)].3. iv) If it is known that P(X > s) = α. express the parameter λ in terms of α and s. iii) He will live to be at least 70 years old. Calculate the probability that X2 ≤ c.v.v.75. P(X > c) = 1 ? 2 3. . . α). 3. 3. P(X > t) for some t > 0.8 Suppose that the life expectancy X of each member of a certain group of people is an r. . Show that the distribution of T is Negative Exponential with parameter λ by computing the probability that T > t. Compare the probabilities in parts (ii) and (iii) and conclude that the Negative Exponential distribution is “memoryless”. having the Negative Exponential distribution with parameter λ = 1/50 (years). denoting the life length of a TV tube and suppose that its p.3. . Compute the following probabilities: iii) iii) iii) iv) P(j < X ≤ j + 1).1 Soem General Exercises Concepts 77 between two consecutive such events.3.7 Let X be an r. 3. β) = B(β.3. distributed as U(−α.v. given that he just celebrated his 40th birthday. α) (α > 0). 3. Then show that B(α.

f. iii) Observe that the Negative Exponential p. 3. ⎩ 3 x>0 −1 < x ≤ 0.d.f. ⎪ ii) f(x) = ⎨− cx.π/2)(x). Compare the answer with that of Exercise 3. is a special case of a Weibull p. (Note: The Weibull distribution is employed for describing the lifetime of living organisms or of mechanical systems.000 ] x . ⎪0 . and specify the values of the parameters for which this happens.d.d. 3.17 Let X be the r.15(ii).f.3.3.3. Compute the probability that X > x. x ≤ −1 ii) f(x) = cx2e−x I(0. ii) f(x) = xe−xI(0.f. β = 1 and β = 2.d.13 iii) iii) iii) iv) f(x) f(x) f(x) f(x) Show that the following functions are p.14 Show that the following functions are p. given that X > s.f.’s: = xe−x /2I(0.’s involved.∞)(x).7(iii).3.f. ∞)(x) (Raleigh distribution).∞)(x) (Maxwell’s distribution). A = (c.f. ii) Calculate the probability that the life span of one electronic device of the type just described is at least 2. iii) For α = 1 and β = 1 .d.v.3.15(ii) and compute the probability that X exceeds s + t. α.v.f.16 Let X be an r.d. 2 a = ( c )( c )α+1IA(x).. x > 0 (α.f.d.’s? ⎧ce −6 x . x 2 2 3.15 For what values of the constant c are the following functions p. 3.3.d.3. and suppose that its p. given by 3.f. denoting the life length of a certain electronic device expressed in hours. f is given by: f x = () c I 1.3.78 3 On Random Variables and Their Distributions 3.’s: ii) f(x) = cos xI(0.∞) (x). c > 0 (Pareto distribution). draw the respective graphs of the 2 p. 3. 3.d. with p.3. with parameters α and β). β > 0). and: β iii) Show that it is a p. 3. (called the Weibull p.18 Refer to Exercise 3. xn [ () ii) Determine the constant c in terms of n.d.3.000 hours.19 Consider the function f(x) = αβxβ −1e−α x . = 1 e−|x−μ| (Double Exponential). ∞).) . = 2 / π x2e−x /2I(0.000 .

3.4 The Poisson Distribution 3.2)×(0. ⋅ ⋅ ⋅ . P(1 < X < 2. ⎥ .5)(x. ⎪⎝ 2 ⎠ ⎪0 .: f x. 3. THEOREM 1 With the above notation and assumptions.’s having the joint p.22 Verify that the following function is a p.5 ) x 2 + y 2 .3. y . To this end.v. . ⎝ 2 2⎦ ( ] 3.f.20 Let X and Y be r. for some λ > 0.f. 1.f. 2 < Y < 4). P(X > Y).21 Let X and Y be r.’s whose joint p. Y > 5). we have x ⎛ n ⎞ x n− x −λ λ ⎯ ⎯ ⎜ ⎟ pn q n ⎯n→∞→ e ⎝ x⎠ x! for each ﬁxed x = 0.d. and we assume that as n → ∞.d.3. y = c 25 − x 2 − y 2 I (0 . Determine the constant c and compute the probability that 0 < X2 + Y2 < 4.3. ⎪4 ⎪1 . pn → 0 and that λ n = npn → λ . 3. Determine the constant c and compute the following probabilities: i) ii) iii) iv) P( 1 < X < 1. π × ⎜ − . ⋅ ⋅ ⋅ otherwise. f is given by f(x. 2. Then the following theorem is true. y). 2 P(X < 2. y) = cxyI(0.3.d.23 (A mixed distribution) Show that the following function is a p. consider a sequence of Binomial distributions.1 Soem General Concepts 79 3.v. so that the nth distribution has probability pn of a success. ⎪ f x = ⎨8 x ⎪⎛ 1 ⎞ ⎪⎜ ⎟ . y = ( ) 1 cos y I A x. we ﬁrst establish rigorously the assertion made earlier that the Poisson distribution may be obtained as the limit of Binomial distributions. 4π ( ) ( ) ⎛ π π⎤ A = −π . () 3. f given by: f x. 0 < Y < 3). ( ) ( ) ( ) 3.4 The Poisson Distribution as an Approximation to the Binomial Distribution and the Binomial Distribution as an Approximation to the Hypergeometric Distribution In this section.d. ⎩ x≤0 0<x<2 x = 2.f. ⎧1 x ⎪ e .

Then we can approximate x them by e−λ(λx/x!). . x n⎠ ⎛ x! ⎝ λn ⎞ ⎜1 − ⎟ n⎠ ⎝ since. we can go one step further in approximating the Binomial by the appropriate Poisson distribution according to Theorem 1. and then. where we replace λ be np.n → p. n → ∞ and suppose that m/(m + n) = pm. if need be. This is true for all x = 0. 0 < p < 1. provided p is small. . We also meet with difﬁculty in calculating probabilities in the Hypergeometric distribution REMARK 6 ⎛ m⎞ ⎛ n ⎞ ⎜ ⎟⎜ ⎟ ⎝ x ⎠ ⎝ r − x⎠ ⎛ m + n⎞ ⎜ ⎟ ⎝ r ⎠ if m.80 3 On Random Variables and Their Distributions PROOF We have ⎛ n ⎞ x n− x n n − 1 ⋅ ⋅ ⋅ n − x + 1 x n− x pn q n ⎜ ⎟ pn q n = ⎝ x⎠ x! = = n n − 1 ⋅ ⋅ ⋅ n − x + 1 ⎛ λn ⎞ ⎛ λn ⎞ ⎜ ⎟ ⎜1 − ⎟ x! n⎠ ⎝ n⎠ ⎝ x ( ( ( ) ) ) ( ( ( ) ) n− x n n−1 ⋅ ⋅ ⋅ n−x +1 nx )⋅ 1 λ x! x n ⎛ λ ⎞ 1 ⋅ ⎜1 − n ⎟ ⋅ x n⎠ ⎛ ⎝ λ ⎞ 1− n ⎟ ⎜ n⎠ ⎝ n ⎛ = 1⋅ ⎜1 − ⎝ x ⎛ 1⎞ x − 1⎞ λ n ⋅ ⋅ ⋅ ⎜1 − ⎟ ⎟ n⎠ n ⎠ x! ⎝ n ⎛ λ ⎞ 1 λx −λ × ⎜1 − n ⎟ ⋅ ⎯n→∞→ ⎯ ⎯ e . if λn → λ. . Thus we have THEOREM 2 Let m. n⎠ ⎝ n This is merely a generalization of the better-known fact that ⎛ λ⎞ −λ ⎜1 − ⎟ → e . n. 2. . n are large. Then . then ⎛ λn ⎞ −λ ⎜1 − ⎟ → e . ▲ n⎠ ⎝ n The meaning of the theorem is the following: If n is large the probabilities ( n )pxqn−x are not easily calculated. What we do is approximate the Hypergeometric distribution by an appropriate Binomial distribution. 1.

⋅ ⋅ ⋅ . we can approximate the probabilities . 2.n→∞→⎜ ⎟ p q .4 The Poisson Distribution 3. ⎯⎯ ⎝ x⎠ −1 ( ) ( ) ) since m → p and hence m+n REMARK 7 n → 1 − p = q.1 Soem General Concepts 81 ⎛ m⎞ ⎛ n ⎞ ⎜ ⎟⎜ ⎟ ⎛ r⎞ ⎝ x ⎠ ⎝ r − x⎠ → ⎜ ⎟ p x q r− x . We have ⎛ m⎞ ⎛ n ⎞ ⎜ ⎟⎜ ⎟ m! n! m + n − r ! ⎝ x ⎠ ⎝ r − x⎠ r! ⋅ = ⎛ m + n⎞ m − x ! n − r − x ! m + n ! x! r − x ! ⎜ ⎟ ⎝ r ⎠ ( )[ ( ( ( ) )] ( ) ( ) ⎛ r⎞ m − x + 1 ⋅ ⋅ ⋅ m n − r + x + 1 ⋅ ⋅ ⋅ n =⎜ ⎟ ⎝ x⎠ m+n−r +1 ⋅ ⋅ ⋅ m+n ) ( ⎛ r ⎞ m( m − 1) ⋅ ⋅ ⋅ [ m − ( x − 1)] ⋅ n( n − 1) ⋅ ⋅ ⋅ [ n − (r − x − 1)] . Dividing through by (m + n). ⎛ m + n⎞ ⎝ x⎠ ⎜ ⎟ ⎝ r ⎠ PROOF x = 0. m+n The meaning of the theorem is that if m. 1. r . =⎜ ⎟ ⎝ x⎠ (m + n) ⋅ ⋅ ⋅ [(m + n) − (r − 1)] Both numerator and denominator have r factors. we get ⎛ m⎞ ⎛ x ⎞ ⎜ ⎟⎜ ⎟ ⎛ m ⎝ n ⎠ ⎝ r − x⎠ ⎛ r ⎞ ⎛ m ⎞ ⎛ m 1 ⎞ x−1 ⎞ = ⎜ ⎟⎜ − − ⎟⎜ ⎟ ⋅⋅⋅⎜ ⎟ ⎛ m + n⎞ ⎝ m + n m + n⎠ ⎝ x⎠ ⎝ m + n ⎠ ⎝ m + n m + n ⎠ ⎜ ⎟ ⎝ r ⎠ ⎛ n ⎞⎛ n ⎛ n 1 ⎞ r − x − 1⎞ ×⎜ − − ⎟⎜ ⎟ ⋅⋅⋅⎜ ⎟ m+n ⎠ ⎝ m + n⎠ ⎝ m + n m + n⎠ ⎝m+n ⎡ ⎛ ⎛ 1 ⎞ r − 1 ⎞⎤ × ⎢1 ⋅ ⎜ 1 − ⎟ ⋅ ⋅ ⋅ ⎜1 − ⎟⎥ m + n⎠ m + n⎠ ⎥ ⎝ ⎢ ⎝ ⎣ ⎦ ⎛ r ⎞ x r− x ⎯m.3. n are large.

2 and suppose that the number of applicants is equal to 72. . p 16. deﬁne the inverse image of T. so that λ = 3. X : S → T.82 3 On Random Variables and Their Distributions ⎛ m⎞ ⎛ n ⎞ ⎟ ⎜ ⎟⎜ ⎝ x ⎠ ⎝ r − x⎠ ⎛ m + n⎞ ⎟ ⎜ ⎝ r ⎠ ⎛ r⎞ by ⎜ ⎟ p x q r − x ⎝ x⎠ by setting p = m/(m + n). P) and let T be a space and X be a function deﬁned on S into T. so that λ = 2. denoted by X−1(T). that is.2. . 2.2 Refer to Exercise 3.5* Random Variables as Measurable Functions and Related Results In this section. 3. under X.4.1 For the following values of n. 3. This is true for all x = 0. 1. . m m and n are large (m. so that λ = 2. . then the probabilities of having exactly x successes in r n (independent) trials are approximately equal under both sampling schemes. Compute the probabilities (i)–(iii) by using the Poisson approximation to Binomial (Theorem 1). A. p 24. It is to be x r−x observed that ( xr )( mm n ) (1 − mm n ) is the exact probability of having exactly x + + successes in r trials when sampling is done with replacement. remains constant. . X s ∈T .3 Refer to Exercise 3. If. Consider the probability space (S. so that the probability of a success. p 24.5. .5. random variables and random vectors are introduced as special cases of measurable functions.10 and use Theorem 2 in order to obtain an approximate value of the required probability. 3. The Hypergeometric distribu+ tion is appropriate when sampling is done without replacement. . however.4. so that λ = 2. p = = = = = 4 16 2 16 2 16 1 16 2 16 . draw graphs of B(n. For T ⊆ T. p and λ = np. mm n . as follows: X −1 T = s ∈ S . p) and P(λ) on the same coordinate axes: iii) iii) iii) iv) iv) n= n= n= n= n= 10.2. p 20.4.5. Certain results related to σ-ﬁelds are also derived. n → ∞) and n remains approximately constant ( m → c = p / q). ( ) { () } . r. so that λ = 1. Exercises 3. .

the following is immediate. B) and X is (A. (5). D)-measurable if and only if X−1(C) ⊆ A.v. or just measurable if there is no confusion possible. and just X if k = 1. vector).3. D) = ( . ⎝ j ⎠ j ( ) c (4) (5) (6) (7) X −1 T c = X −1 T X −1 T = S . If (T. Bk). ( ) (∅) = ∅. Let now D be a σ-ﬁeld of subsets of T and deﬁne the class X−1(D) of subsets of S as follows: X −1 D = A ⊆ S .). if (T. By means of (1). D) = ( k. More generally. we say that X is a k-dimensional random vector (r. ⎝ j ⎠ j ( ) ( ) (2) ( ) (3) Also ⎞ ⎛ X −1 ⎜ I T j ⎟ = I X −1 T j . Then the following properties are immediate consequences of the deﬁnition (and the fact X is a function): ⎛ ⎞ X −1 ⎜ U T j ⎟ = U X −1 T j . Bk)measurable. Then C* is a σ-ﬁeld. Then X is (A. then X −1 T1 ∩ X −1 T2 = ∅. where k = × ×···× (k copies of ). A = X −1 T for some T ∈D . The above theorem is the reason we require measurability in our deﬁnition of a random variable. B)measurable. D)-measurable. is well deﬁned. (6) above. In particular. where C is a class of subsets of T. X −1 ( ) [ ( )] .1 Soem General Concepts 83 This set is also denoted by [X ∈ T] or (X ∈ T). and X is (A. Let D = σ(C). If X−1(D) ⊆ A. A random variable is a one-dimensional random vector. ⎝ j ⎠ j ( ) (1) If T1 ∩ T2 = ∅. to be deﬁned below. On the basis of the properties (1)–(7) of X−1. Hence by (1) and (2) we have ⎞ ⎛ X −1 ⎜ ∑ T j ⎟ = ∑ X −1 T j . we say that X is a random variable (r. It guarantees that the probability distribution function of a random vector. we immediately have THEOREM 3 ( ) { ( ) } The class X−1(D) is a σ-ﬁeld of subsets of S. In this latter case. X is a random variable if and only if . THEOREM 4 COROLLARY Deﬁne the class C* of subsets of T as follows: C* = {T ⊆ T. X−1(T) = A for some A ∈ A}.5* Random Variables as Measurable Functions and Related Results 3. then we say that X is (A. we shall write X if k ≥ 1.

since P is a probability measure. deﬁne on Bk the PX B = P X −1 B = P X ∈ B = P s ∈ S . P[X−1(B)] makes sense. . 2. C ′j. The classes Co. Bk ). The σ-ﬁeld C* of Theorem 4 has the property that C* ⊇ C. or X−1(Cj). Therefore PX is well deﬁned by (8). It is now shown that PX is a probability measure on Bk . vector. . Cj. ⎝ j =1 ⎠ ⎝ j =1 ⎠ ⎥ j =1 ⎢ ⎣ j =1 ⎦ j =1 ⎣ ⎦ The probability measure PX is called the probability distribution function (or just the distribution) of X.5.5.84 3 On Random Variables and Their Distributions X−1(Co). A. . or X−1(C ′j) ⊆ A. ∞ ⎡ ⎛ ∞ ⎞ ⎛ ∞ ⎞⎤ ⎡∞ ⎤ ∞ PX ⎜ ∑ B j ⎟ = P ⎢ X −1 ⎜ ∑ B j ⎟ ⎥ = P ⎢∑ X −1 B j ⎥ = ∑ P X −1 B j = ∑ PX B j . 8. and ﬁnally. For an event A. Then C* ⊇ D = σ(C) and hence X−1(C*) ⊇ X−1(D). B ∈ Bk. D) = measurability. . iii) What is the partition of S induced by IA? iii) What is the σ-ﬁeld induced by IA? 3. (5) and (6). iii) Show that IA is r. i. But X−1(C*) ⊆ A. . and similarly for the case of k-dimensional random vectors.5. the indicator IA of A is deﬁned by: IA(s) = 1 if s ∈ A and IA(s) = 0 if s ∈ Ac. By the Corollary to Theorem 4. P) → ( set function PX as follows: k . is well deﬁned. PX(B) ≥ 0. j = 1. . Thus X−1(D) ⊆ A. . PX( k) = P[X −1( k)] = P(S ) = 1..3 Write out the proof of Theorem 1 by using (1). The converse is a direct consequence of the deﬁnition of (A. X s ∈ B .e. the sets X−1(B) in S are actually events due to the assumption that X is an r. In fact. . ▲ PROOF Now. .v. Write out the proof of Theorem 2. by means of an r. ( ) [ ( )] ( ) ({ ( ) }) (8) ( ) [ ( )] ( ) Exercises 3. vector X: (S. Next. j = 1. 8 are deﬁned in Theorem 5 and the paragraph before it in Chapter 1.1 Consider the sample space S supplied with the σ-ﬁeld of events A.2 3. for any A ∈ A .

Chapter 4

Distribution Functions, Probability Densities, and Their Relationship

4.1 The Cumulative Distribution Function (c.d.f. or d.f.) of a Random Vector—Basic Properties of the d.f. of a Random Variable

The distribution of a k-dimensional r. vector X has been defined through the relationship P_X(B) = P(X ∈ B), where B is a subset of ℝᵏ. In particular, one may choose B to be an "interval" in ℝᵏ; i.e., B = {y ∈ ℝᵏ; y ≤ x} in the sense that, if x = (x₁, . . . , x_k)′ and y = (y₁, . . . , y_k)′, then yⱼ ≤ xⱼ, j = 1, . . . , k. For such a choice of B, P_X(B) is denoted by F_X(x) and is called the cumulative distribution function of X (evaluated at x), or just the distribution function (d.f.) of X. We omit the subscript X if no confusion is possible. Thus the d.f. F of an r. vector X is an ordinary point function defined on ℝᵏ (and taking values in [0, 1]).

Now we restrict our attention to the case k = 1 and prove the following basic properties of the d.f. of an r.v.

THEOREM 1 The distribution function F of a random variable X satisfies the following properties:
i) 0 ≤ F(x) ≤ 1, x ∈ ℝ;
ii) F is nondecreasing;
iii) F is continuous from the right;
iv) F(x) → 0 as x → −∞, and F(x) → 1 as x → +∞. We express this by writing F(−∞) = 0, F(+∞) = 1.

PROOF In the course of this proof, we set Q for the distribution P_X of X, for the sake of simplicity.
i) Obvious.
ii) This means that x₁ < x₂ implies F(x₁) ≤ F(x₂).

In fact, x₁ < x₂ implies (−∞, x₁] ⊂ (−∞, x₂], and hence Q(−∞, x₁] ≤ Q(−∞, x₂]; equivalently, F(x₁) ≤ F(x₂).

iii) This means that, if xₙ ↓ x, then F(xₙ) ↓ F(x). In fact, xₙ ↓ x implies (−∞, xₙ] ↓ (−∞, x], and hence Q(−∞, xₙ] ↓ Q(−∞, x] by Theorem 2, Chapter 2; equivalently, F(xₙ) ↓ F(x).

iv) Let xₙ → −∞. We may assume that xₙ ↓ −∞ (see also Exercise 4.1.6). Then (−∞, xₙ] ↓ ∅, so that Q(−∞, xₙ] ↓ Q(∅) = 0 by Theorem 2, Chapter 2; equivalently, F(xₙ) → 0. Similarly, if xₙ → +∞, we may assume xₙ ↑ ∞. Then (−∞, xₙ] ↑ ℝ, and hence Q(−∞, xₙ] ↑ Q(ℝ) = 1 by Theorem 2, Chapter 2; equivalently, F(xₙ) → 1. ▲

REMARK 1
i) F(x) can be used to find probabilities of the form P(a < X ≤ b); that is,

$$P(a < X \le b) = F(b) - F(a).$$

In fact, (−∞ < X ≤ a) ⊆ (−∞ < X ≤ b) and (a < X ≤ b) = (−∞ < X ≤ b) − (−∞ < X ≤ a), so that

$$P(a < X \le b) = P(-\infty < X \le b) - P(-\infty < X \le a) = F(b) - F(a).$$

ii) The limit from the left of F(x) at x, denoted by F(x−), is defined as follows:

$$F(x-) = \lim_{n\to\infty} F(x_n) \quad \text{with } x_n \uparrow x.$$

This limit always exists, since F(xₙ)↑, but need not be equal to F(x+) (= limit from the right) = F(x). The quantities F(x) and F(x−) are used to express the probability P(X = a); that is, P(X = a) = F(a) − F(a−). In fact, let xₙ ↑ a and set A = (X = a), Aₙ = (xₙ < X ≤ a). Then, clearly, Aₙ ↓ A, and hence by Theorem 2, Chapter 2, P(Aₙ) ↓ P(A); equivalently,

$$\lim_{n\to\infty}\big[F(a) - F(x_n)\big] = P(X = a), \quad \text{or} \quad \lim_{n\to\infty} P(x_n < X \le a) = P(X = a),$$

or F(a) − F(a−) = P(X = a).

Graphs of d.f.'s of several distributions are given in Fig. 4.1.

[Figure 4.1 Examples of graphs of c.d.f.'s: (a) Binomial for n = 6; (b) Poisson for λ = 2; (c) U(α, β); (d) N(0, 1), where F(x) = Φ(x).]

iii) If X is discrete, its d.f. is a "step" function, the value of it at x being defined by

$$F(x) = \sum_{x_j \le x} f(x_j) \quad \text{and} \quad f(x_j) = F(x_j) - F(x_{j-1}),$$

where it is assumed that x₁ < x₂ < · · · .

It is known that a nondecreasing function (such as F) may have discontinuities which can only be jumps. Then F(a) − F(a−) is the length of the jump of F at a. Of course, if F is continuous, then F(x) = F(x−) and hence P(X = x) = 0 for all x.

iv) If X is of the continuous type, its d.f. F is continuous. Furthermore,

$$\frac{dF(x)}{dx} = f(x)$$

at continuity points of f, as is well known from calculus.
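The identities of Remark 1 are easy to check numerically. The sketch below (illustrative, not from the text; the distribution and points are chosen arbitrarily) uses the d.f. of a B(6, 1/2) r.v. to compute P(a < X ≤ b) = F(b) − F(a) and recovers a point mass as the jump F(a) − F(a−).

```python
from scipy.stats import binom

n, p = 6, 0.5
F = lambda x: binom.cdf(x, n, p)        # d.f. of X ~ B(6, 1/2)

a, b = 1, 4
print(F(b) - F(a))                       # P(1 < X <= 4)
print(sum(binom.pmf(k, n, p) for k in (2, 3, 4)))   # same value

# P(X = a) as the jump of F at a: F(a) - F(a-); for integer support,
# the left limit F(a-) equals F(a - 1).
print(F(2) - F(1), binom.pmf(2, n, p))   # both equal P(X = 2)
```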

Through the relations

$$F(x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{and} \quad \frac{dF(x)}{dx} = f(x) \ \text{at continuity points of } f,$$

we see that if f is continuous, f determines F (f ⇒ F) and F determines f (F ⇒ f); that is, F ⇔ f.

Often one has to deal with functions of an r. vector itself. In such cases, since we operate in a probability framework, the resulting entities have got to be r. vectors; that is, well-behaving functions of r. vectors are r. vectors. The following statement is to this effect, whereas the precise statement and justification are given as Theorem 7 on page 104.

STATEMENT 1 Let X be a k-dimensional r. vector defined on the sample space S, and let g be a (well-behaving) function defined on ℝᵏ and taking values in ℝᵐ. Then g(X) is defined on the underlying sample space S, takes values in ℝᵐ, and is an r. vector. (That is, well-behaving functions of r. vectors are r. vectors.) In particular, g(X) is an r. vector if g is continuous.

Now a k-dimensional r. vector X may be represented in terms of its coordinates; i.e., X = (X₁, . . . , X_k)′, where Xⱼ, j = 1, . . . , k are real-valued functions defined on S. The question then arises as to how X and Xⱼ, j = 1, . . . , k are related from the point of view of being r. vectors. The answer is provided by the following statement. Its precise formulation and justification is given as Theorem 8 below.

STATEMENT 2 Let X and Xⱼ, j = 1, . . . , k be functions defined on the sample space S and taking values in ℝᵏ and ℝ, respectively, and let X = (X₁, . . . , X_k)′. Then X is an r. vector if and only if Xⱼ, j = 1, . . . , k are r.v.'s.

The following two theorems provide applications of Statement 1.

THEOREM 2 Let X be an N(μ, σ²)-distributed r.v. and set Y = (X − μ)/σ. Then Y is an r.v. and its distribution is N(0, 1).

PROOF In the first place, Y is an r.v. by Statement 1. Then it suffices to show that the d.f. of Y is Φ, where

$$\Phi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\,dt.$$

We have

$$P(Y \le y) = P\Big(\frac{X-\mu}{\sigma} \le y\Big) = P(X \le y\sigma + \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{y\sigma+\mu} \exp\Big[-\frac{(t-\mu)^2}{2\sigma^2}\Big]dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-u^2/2}\,du = \Phi(y),$$

where we let u = (t − μ)/σ in the transformation of the integral. ▲
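A quick empirical check of Theorem 2 (a sketch with illustrative parameter values, not part of the original text): standardizing N(μ, σ²) draws should reproduce the N(0, 1) d.f. Φ.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
x = rng.normal(mu, sigma, size=100_000)   # X ~ N(mu, sigma^2)
y = (x - mu) / sigma                      # Y = (X - mu)/sigma

# Compare the empirical d.f. of Y with Phi at a few points.
for t in (-1.0, 0.0, 1.5):
    print(t, (y <= t).mean(), norm.cdf(t))   # empirical vs Phi(t); close
```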

REMARK 2 The transformation (X − μ)/σ of X is referred to as normalization of X.

THEOREM 3
i) Let X be an N(0, 1)-distributed r.v. Then Y = X² is distributed as χ²₁.
ii) If X is an N(μ, σ²)-distributed r.v., then the r.v. ((X − μ)/σ)² is distributed as χ²₁.

PROOF
i) We will show that the p.d.f. of Y is that of a χ²₁-distributed r.v., by deriving the d.f. of Y first and then differentiating it in order to obtain f_Y. To this end, let us observe first that Y is an r.v., on account of Statement 1. Next, for y > 0, we have

$$F_Y(y) = P(Y \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{y}}^{\sqrt{y}} e^{-x^2/2}\,dx = 2 \cdot \frac{1}{\sqrt{2\pi}} \int_{0}^{\sqrt{y}} e^{-x^2/2}\,dx.$$

Let x = √t. Then dx = dt/(2√t), t ∈ (0, y], and

$$F_Y(y) = 2 \cdot \frac{1}{\sqrt{2\pi}} \int_{0}^{y} \frac{1}{2\sqrt{t}}\, e^{-t/2}\,dt.$$

Hence

$$\frac{dF_Y(y)}{dy} = \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{y}}\, e^{-y/2} = \frac{1}{\sqrt{2\pi}}\, y^{(1/2)-1} e^{-y/2}.$$

Since f_Y(y) = 0 for y ≤ 0 (because F_Y(y) = 0, y ≤ 0), it follows that

$$f_Y(y) = \begin{cases} \dfrac{1}{\sqrt{\pi}\, 2^{1/2}}\, y^{(1/2)-1} e^{-y/2}, & y > 0 \\ 0, & y \le 0, \end{cases}$$

and this is the p.d.f. of χ²₁. (Observe that here we used the fact that Γ(½) = √π.)

ii) If X is N(μ, σ²), then (X − μ)/σ is N(0, 1) by Theorem 2, and hence ((X − μ)/σ)² is distributed as χ²₁, by part (i). ▲
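Theorem 3(i) can likewise be checked by simulation; the following sketch (illustrative only, not from the text) compares the empirical d.f. of Y = X² with the χ²₁ d.f.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
y = rng.standard_normal(100_000) ** 2     # Y = X^2 with X ~ N(0, 1)

for t in (0.5, 1.0, 3.84):
    print(t, (y <= t).mean(), chi2.cdf(t, df=1))   # empirical vs chi^2_1 d.f.
```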

Exercises

4.1.1 Refer to Exercise 3.2.13 in Chapter 3 and determine the d.f.'s corresponding to the p.d.f.'s given there.

4.1.2 Refer to Exercise 3.2.14 in Chapter 3 and determine the d.f.'s corresponding to the p.d.f.'s given there.

4.1.3 Refer to Exercise 3.3.13 in Chapter 3 and determine the d.f.'s corresponding to the p.d.f.'s given there.

4.1.4 Refer to Exercise 3.3.14 in Chapter 3 and determine the d.f.'s corresponding to the p.d.f.'s given there.

4.1.5 Let X be an r.v. with d.f. F. Determine the d.f. of the following r.v.'s: −X, X², aX + b, XI_{[a,b)}(X) when:
i) X is continuous and F is strictly increasing;
ii) X is discrete.

4.1.6 Refer to the proof of Theorem 1(iv) and show that we may assume that xₙ ↓ −∞ (xₙ ↑ ∞) instead of xₙ → −∞ (xₙ → ∞).

4.1.7 Let f and F be the p.d.f. and the d.f., respectively, of an r.v. X. Then show that F is continuous, and dF(x)/dx = f(x) at the continuity points x of f.

4.1.8
i) Show that the following function F is a d.f. (Logistic distribution) and derive the corresponding p.d.f. f:

$$F(x) = \frac{1}{1 + e^{-(\alpha x + \beta)}}, \quad x \in \mathbb{R}, \ \alpha > 0, \ \beta \in \mathbb{R};$$

ii) Show that f(x) = αF(x)[1 − F(x)].

4.1.9 Refer to Exercise 3.3.17 in Chapter 3 and determine the d.f. F corresponding to the p.d.f. f given there. Write out the expressions of F and f for n = 2 and n = 3.

4.1.10 If X is an r.v. distributed as N(3, 0.25), use Table 3 in Appendix III in order to compute the following probabilities:
i) P(X < −1);
ii) P(X > 2.5);
iii) P(−0.5 < X < 1.3).

4.1.11 The distribution of IQ's of the people in a given group is well approximated by the Normal distribution with μ = 105 and σ = 20. What proportion of the individuals in the group in question has an IQ:
i) At least 150?
ii) At most 80?
iii) Between 95 and 125?

4.1.12 A certain manufacturing process produces light bulbs whose life length (in hours) is an r.v. X distributed as N(2,000, 200²). A light bulb is supposed to be defective if its lifetime is less than 1,800. If 25 light bulbs are tested, what is the probability that at most 15 of them are defective? (Use the required independence.)

4.1.13 A manufacturing process produces ½-inch ball bearings, which are assumed to be satisfactory if their diameter lies in the interval 0.5 ± 0.0006 and defective otherwise. A day's production is examined, and it is found that the distribution of the actual diameters of the ball bearings is approximately normal with mean μ = 0.5007 inch and σ = 0.0005 inch. Compute the proportion of defective ball bearings.

4.1.14 If X is an r.v. distributed as N(μ, σ²), find the value of c (in terms of μ and σ) for which P(X < c) = 2 − 9P(X > c).

4.1.15 Refer to the Weibull p.d.f. f given in Exercise 3.3.19 in Chapter 3 and do the following:
i) Calculate the corresponding d.f. F and the reliability function R(x) = 1 − F(x);
ii) Also, calculate the failure (or hazard) rate H(x) = f(x)/R(x), and draw its graph for α = 1 and β = ½, 1, 2;
iii) For s and t > 0, calculate the probability P(X > s + t | X > t), where X is an r.v. having the Weibull distribution;
iv) What do the quantities F(x), R(x), H(x) and the probability in part (iii) become in the special case of the Negative Exponential distribution?

4.2 The d.f. of a Random Vector and Its Properties—Marginal and Conditional d.f.'s and p.d.f.'s

For the case of a two-dimensional r. vector, a result analogous to Theorem 1 can be established. So consider the case that k = 2. We then have X = (X₁, X₂)′, and the d.f. F (or F_X or F_{X₁,X₂}) of X, or the joint distribution function of X₁, X₂, is F(x₁, x₂) = P(X₁ ≤ x₁, X₂ ≤ x₂), x₁, x₂ ∈ ℝ. Then the following theorem holds true.

With the above notation we have

THEOREM 4
i) 0 ≤ F(x₁, x₂) ≤ 1, x₁, x₂ ∈ ℝ;
ii) The variation of F over rectangles with sides parallel to the axes, given in Fig. 4.2, is ≥ 0;
iii) F is continuous from the right with respect to each of the coordinates x₁, x₂, or both of them jointly;
iv) If both x₁, x₂ → ∞, then F(x₁, x₂) → 1, and if at least one of x₁, x₂ goes to −∞, then F(x₁, x₂) → 0. We express this by writing F(∞, ∞) = 1 and F(−∞, x₂) = F(x₁, −∞) = F(−∞, −∞) = 0, where −∞ < x₁,

x₂ < ∞.

[Figure 4.2 The variation V of F over the rectangle with vertices (x₁, y₁), (x₂, y₁), (x₁, y₂), (x₂, y₂) is: F(x₁, y₁) + F(x₂, y₂) − F(x₁, y₂) − F(x₂, y₁).]

PROOF
i) Obvious.
ii) V = P(x₁ < X₁ ≤ x₂, y₁ < X₂ ≤ y₂) and is hence, clearly, ≥ 0.
iii) Same as in Theorem 3.
iv) If both x₁, x₂ ↑ ∞, then (−∞, x₁] × (−∞, x₂] ↑ ℝ², so that F(x₁, x₂) → P(S) = 1. If at least one of x₁, x₂ goes (↓) to −∞, then (−∞, x₁] × (−∞, x₂] ↓ ∅, hence F(x₁, x₂) → P(∅) = 0. (If x = (x₁, x₂)′ and zₙ = (x₁ₙ, x₂ₙ)′, then zₙ ↓ x means x₁ₙ ↓ x₁, x₂ₙ ↓ x₂.) ▲

REMARK 3 The function

$$F(x_1, \infty) = \lim_{x_n \uparrow \infty} P(X_1 \le x_1,\ X_2 \le x_n) = P(X_1 \le x_1,\ -\infty < X_2 < \infty) = P(X_1 \le x_1) = F_1(x_1)$$

is the d.f. of the random variable X₁. Similarly, F(∞, x₂) = F₂(x₂) is the d.f. of the random variable X₂. F₁, F₂ are called marginal d.f.'s.

REMARK 4 It should be pointed out here that results like those discussed in parts (i)–(iv) in Remark 1 still hold true here (appropriately interpreted). In particular, the analog of part (iv) says that F(x₁, x₂) has second order partial derivatives and

$$\frac{\partial^2 F(x_1, x_2)}{\partial x_1 \partial x_2} = f(x_1, x_2)$$

at continuity points of f.

For k > 2, we have a theorem strictly analogous to Theorems 3 and 6 and also remarks such as Remark 1(i)–(iv) following Theorem 3. In particular, the analog of (iv) says that F(x₁, . . . , x_k) has kth order partial derivatives and

$$\frac{\partial^k}{\partial x_1 \partial x_2 \cdots \partial x_k} F(x_1, \ldots, x_k) = f(x_1, \ldots, x_k)$$

at continuity points of f, where F, or F_X, or F_{X₁,···,X_k}, is the d.f. of X, or the joint distribution function of X₁, . . . , X_k. As in the two-dimensional case,

$$F(\infty, \ldots, \infty, x_j, \infty, \ldots, \infty) = F_j(x_j)$$

is the d.f. of the random variable Xⱼ, and if m of the xⱼ's are replaced by ∞ (1 < m < k), then the resulting function is the joint d.f. of the random variables corresponding to the remaining (k − m) Xⱼ's. All these d.f.'s are called marginal distribution functions.

In Statement 2, we have seen that if X = (X₁, . . . , X_k)′ is an r. vector, then Xⱼ, j = 1, 2, . . . , k are r.v.'s and vice versa. Then the p.d.f. of X, f(x) = f(x₁, . . . , x_k), is also called the joint p.d.f. of the r.v.'s X₁, . . . , X_k. Consider first the case k = 2; that is, X = (X₁, X₂)′, f(x) = f(x₁, x₂), and set

$$f_1(x_1) = \sum_{x_2} f(x_1, x_2) \quad \text{or} \quad f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2,$$

$$f_2(x_2) = \sum_{x_1} f(x_1, x_2) \quad \text{or} \quad f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1,$$

for the discrete and continuous case, respectively. Then f₁, f₂ are p.d.f.'s. In fact, f₁(x₁) ≥ 0 and

$$\sum_{x_1} f_1(x_1) = \sum_{x_1}\sum_{x_2} f(x_1, x_2) = 1, \quad \text{or} \quad \int_{-\infty}^{\infty} f_1(x_1)\,dx_1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\,dx_1 = 1.$$

Similarly we get the result for f₂. Furthermore, f₁ is the p.d.f. of X₁, and f₂ is the p.d.f. of X₂. In fact,

$$P(X_1 \in B) = \sum_{x_1 \in B}\sum_{x_2} f(x_1, x_2) = \sum_{x_1 \in B} f_1(x_1), \quad \text{or} \quad P(X_1 \in B) = \int_B\Big[\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\Big]dx_1 = \int_B f_1(x_1)\,dx_1.$$

Similarly f₂ is the p.d.f. of X₂. We call f₁, f₂ the marginal p.d.f.'s. Now suppose f₁(x₁) > 0. Then define f(x₂|x₁) as follows:

$$f(x_2 | x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}.$$

This is considered as a function of x₂, x₁ being an arbitrary, but fixed, value of X₁ (f₁(x₁) > 0). Then f(·|x₁) is a p.d.f. In fact, f(x₂|x₁) ≥ 0 and

$$\sum_{x_2} f(x_2 | x_1) = \frac{1}{f_1(x_1)} \sum_{x_2} f(x_1, x_2) = \frac{f_1(x_1)}{f_1(x_1)} = 1, \quad \int_{-\infty}^{\infty} f(x_2 | x_1)\,dx_2 = \frac{1}{f_1(x_1)} \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 = \frac{f_1(x_1)}{f_1(x_1)} = 1.$$

In a similar fashion, if f₂(x₂) > 0, we define f(x₁|x₂) by f(x₁|x₂) = f(x₁, x₂)/f₂(x₂) and show that f(·|x₂) is a p.d.f.

Furthermore, if X₁, X₂ are both discrete, f(x₂|x₁) has the following interpretation:

$$f(x_2 | x_1) = \frac{f(x_1, x_2)}{f_1(x_1)} = \frac{P(X_1 = x_1,\ X_2 = x_2)}{P(X_1 = x_1)} = P(X_2 = x_2 \,|\, X_1 = x_1).$$

Hence P(X₂ ∈ B | X₁ = x₁) = Σ_{x₂∈B} f(x₂|x₁). For this reason, we call f(·|x₁) the conditional p.d.f. of X₂, given that X₁ = x₁ (provided f₁(x₁) > 0). For a similar reason, we call f(·|x₂) the conditional p.d.f. of X₁, given that X₂ = x₂ (provided f₂(x₂) > 0).

For the case that the p.d.f.'s f and f₂ are of the continuous type, the conditional p.d.f. f(x₁|x₂) may be given an interpretation similar to the one given above. By assuming (without loss of generality) that h₁, h₂ > 0, one has

$$\frac{1}{h_1} P(x_1 < X_1 \le x_1 + h_1 \,|\, x_2 < X_2 \le x_2 + h_2) = \frac{(1/h_1 h_2)\, P(x_1 < X_1 \le x_1 + h_1,\ x_2 < X_2 \le x_2 + h_2)}{(1/h_2)\, P(x_2 < X_2 \le x_2 + h_2)}$$

$$= \frac{(1/h_1 h_2)\big[F(x_1 + h_1, x_2 + h_2) - F(x_1, x_2 + h_2) - F(x_1 + h_1, x_2) + F(x_1, x_2)\big]}{(1/h_2)\big[F_2(x_2 + h_2) - F_2(x_2)\big]},$$

where F is the joint d.f. of X₁, X₂ and F₂ is the d.f. of X₂. By letting h₁, h₂ → 0 and assuming that (x₁, x₂)′ and x₂ are continuity points of f and f₂, respectively, the last expression on the right-hand side above tends to f(x₁, x₂)/f₂(x₂), which was denoted by f(x₁|x₂). Thus for small h₁, h₂, h₁f(x₁|x₂) is approximately equal to P(x₁ < X₁ ≤ x₁ + h₁ | x₂ < X₂ ≤ x₂ + h₂), so that h₁f(x₁|x₂) is approximately the conditional probability that X₁ lies in a small neighborhood (of length h₁) of x₁, given that X₂ lies in a small neighborhood of x₂. A similar interpretation may be given to f(x₂|x₁).
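For discrete X₁, X₂ the marginal and conditional p.d.f.'s above are just row and column operations on the joint probability table. A minimal sketch (the table values are made up for illustration, not taken from the text):

```python
import numpy as np

# Joint p.d.f. f(x1, x2) on {0,1,2} x {0,1}; rows index x1, columns x2.
f = np.array([[0.10, 0.20],
              [0.30, 0.15],
              [0.05, 0.20]])
assert abs(f.sum() - 1.0) < 1e-12

f1 = f.sum(axis=1)            # marginal of X1: sum over x2
f2 = f.sum(axis=0)            # marginal of X2: sum over x1

# Conditional p.d.f. f(x2 | x1) = f(x1, x2)/f1(x1), one row per value of x1.
f_x2_given_x1 = f / f1[:, None]
print(f1, f2)
print(f_x2_given_x1.sum(axis=1))   # each conditional row sums to 1
```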

We can also define the conditional d.f. of X₂, given X₁ = x₁, by means of

$$F(x_2 | x_1) = \sum_{x_2' \le x_2} f(x_2' | x_1) \quad \text{or} \quad F(x_2 | x_1) = \int_{-\infty}^{x_2} f(x_2' | x_1)\,dx_2',$$

for the discrete and continuous case, respectively, and similarly for F(x₁|x₂).

The concepts introduced thus far generalize in a straightforward way for k > 2. Thus if X = (X₁, . . . , X_k)′ with p.d.f. f(x₁, . . . , x_k), then we have called f(x₁, . . . , x_k) the joint p.d.f. of the r.v.'s X₁, . . . , X_k. If we sum (integrate) over t of the variables x₁, . . . , x_k, keeping the remaining s fixed (t + s = k), the resulting function is the joint p.d.f. of the r.v.'s corresponding to the remaining s variables; that is,

$$f_{i_1, \ldots, i_s}(x_{i_1}, \ldots, x_{i_s}) = \sum_{x_{j_1}, \ldots, x_{j_t}} f(x_1, \ldots, x_k) \quad \text{or} \quad \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_k)\,dx_{j_1} \cdots dx_{j_t}.$$

There are

$$\binom{k}{1} + \binom{k}{2} + \cdots + \binom{k}{k-1} = 2^k - 2$$

such p.d.f.'s, which are also called marginal p.d.f.'s. Also, if x_{i₁}, . . . , x_{i_s} are such that f_{i₁,···,i_s}(x_{i₁}, . . . , x_{i_s}) > 0, then the function (of x_{j₁}, . . . , x_{j_t}) defined by

$$f(x_{j_1}, \ldots, x_{j_t} \,|\, x_{i_1}, \ldots, x_{i_s}) = \frac{f(x_1, \ldots, x_k)}{f_{i_1, \ldots, i_s}(x_{i_1}, \ldots, x_{i_s})}$$

is a p.d.f., called the joint conditional p.d.f. of the r.v.'s X_{j₁}, . . . , X_{j_t}, given X_{i₁} = x_{i₁}, . . . , X_{i_s} = x_{i_s}, or just given X_{i₁}, . . . , X_{i_s}. Again there are 2ᵏ − 2 joint conditional p.d.f.'s involving all k r.v.'s X₁, . . . , X_k. Conditional distribution functions are defined in a way similar to the one for k = 2. Thus

$$F(x_{j_1}, \ldots, x_{j_t} \,|\, x_{i_1}, \ldots, x_{i_s}) = \sum_{(x_{j_1}', \ldots, x_{j_t}') \le (x_{j_1}, \ldots, x_{j_t})} f(x_{j_1}', \ldots, x_{j_t}' \,|\, x_{i_1}, \ldots, x_{i_s})$$

or

$$F(x_{j_1}, \ldots, x_{j_t} \,|\, x_{i_1}, \ldots, x_{i_s}) = \int_{-\infty}^{x_{j_1}} \cdots \int_{-\infty}^{x_{j_t}} f(x_{j_1}', \ldots, x_{j_t}' \,|\, x_{i_1}, \ldots, x_{i_s})\,dx_{j_1}' \cdots dx_{j_t}'.$$

We now present two examples of marginal and conditional p.d.f.'s, one taken from a discrete distribution and the other taken from a continuous distribution.

EXAMPLE 1 Let the r.v.'s X₁, . . . , X_k have the Multinomial distribution with parameters n and p₁, . . . , p_k. Also, let s and t be integers such that 1 ≤ s, t < k and s + t = k. Then in the notation employed above, we have:

i)

$$f_{i_1, \ldots, i_s}(x_{i_1}, \ldots, x_{i_s}) = \frac{n!}{x_{i_1}! \cdots x_{i_s}!\,(n-r)!}\, p_{i_1}^{x_{i_1}} \cdots p_{i_s}^{x_{i_s}} q^{n-r}, \quad q = 1 - (p_{i_1} + \cdots + p_{i_s}), \quad r = x_{i_1} + \cdots + x_{i_s};$$

that is, the r.v.'s X_{i₁}, . . . , X_{i_s} and Y = n − (X_{i₁} + · · · + X_{i_s}) have the Multinomial distribution with parameters n and p_{i₁}, . . . , p_{i_s}, q.

ii)

$$f(x_{j_1}, \ldots, x_{j_t} \,|\, x_{i_1}, \ldots, x_{i_s}) = \frac{(n-r)!}{x_{j_1}! \cdots x_{j_t}!} \Big(\frac{p_{j_1}}{q}\Big)^{x_{j_1}} \cdots \Big(\frac{p_{j_t}}{q}\Big)^{x_{j_t}};$$

that is, the (joint) conditional distribution of X_{j₁}, . . . , X_{j_t}, given X_{i₁}, . . . , X_{i_s}, is Multinomial with parameters n − r and p_{j₁}/q, . . . , p_{j_t}/q.

DISCUSSION
i) Clearly,

$$(X_{i_1} = x_{i_1}, \ldots, X_{i_s} = x_{i_s}) \subseteq (X_{i_1} + \cdots + X_{i_s} = r) = (n - Y = r) = (Y = n - r),$$

so that

$$(X_{i_1} = x_{i_1}, \ldots, X_{i_s} = x_{i_s}) = (X_{i_1} = x_{i_1}, \ldots, X_{i_s} = x_{i_s},\ Y = n - r).$$

Denoting by O the outcome which is the grouping of all n outcomes distinct from those designated by i₁, . . . , i_s, we have that the probability of O is q, and the number of its occurrences is Y. Thus, the r.v.'s X_{i₁}, . . . , X_{i_s} and Y are distributed as asserted.

ii) We have

$$f(x_{j_1}, \ldots, x_{j_t} \,|\, x_{i_1}, \ldots, x_{i_s}) = \frac{f(x_1, \ldots, x_k)}{f_{i_1, \ldots, i_s}(x_{i_1}, \ldots, x_{i_s})} = \frac{\dfrac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}}{\dfrac{n!}{x_{i_1}! \cdots x_{i_s}!\,(n-r)!}\, p_{i_1}^{x_{i_1}} \cdots p_{i_s}^{x_{i_s}} q^{n-r}}$$

$$= \frac{(n-r)!}{x_{j_1}! \cdots x_{j_t}!} \Big(\frac{p_{j_1}}{q}\Big)^{x_{j_1}} \cdots \Big(\frac{p_{j_t}}{q}\Big)^{x_{j_t}}$$

(since n − r = n − (x_{i₁} + · · · + x_{i_s}) = x_{j₁} + · · · + x_{j_t}), as was to be seen.
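Example 1(ii) says that conditioning a Multinomial on some of its coordinates leaves a Multinomial on the rest, with cell probabilities renormalized by q. The sketch below (parameters chosen purely for illustration) verifies the identity f(x_j | x_i) = f(x)/f_i(x_i) numerically for k = 3, s = 1, t = 2, where the single conditioning coordinate is itself B(n, p₁).

```python
from scipy.stats import multinomial, binom

n, p = 10, [0.2, 0.3, 0.5]                 # (X1, X2, X3) ~ Multinomial(n, p)
x1, x2, x3 = 2, 3, 5                       # a point with x1 + x2 + x3 = n

joint = multinomial.pmf([x1, x2, x3], n, p)
marg1 = binom.pmf(x1, n, p[0])             # X1 alone is B(n, p1)

q = 1 - p[0]                               # renormalizing constant
cond = multinomial.pmf([x2, x3], n - x1, [p[1] / q, p[2] / q])

print(joint / marg1, cond)                 # equal, as Example 1(ii) asserts
```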

EXAMPLE 2 Let the r.v.'s X₁ and X₂ have the Bivariate Normal distribution, and recall that their (joint) p.d.f. is given by:

$$f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp\Bigg\{ -\frac{1}{2(1-\rho^2)} \Bigg[ \Big(\frac{x_1 - \mu_1}{\sigma_1}\Big)^2 - 2\rho \Big(\frac{x_1 - \mu_1}{\sigma_1}\Big)\Big(\frac{x_2 - \mu_2}{\sigma_2}\Big) + \Big(\frac{x_2 - \mu_2}{\sigma_2}\Big)^2 \Bigg] \Bigg\}.$$

We saw that the marginal p.d.f.'s f₁, f₂ are N(μ₁, σ₁²), N(μ₂, σ₂²), respectively; that is, X₁, X₂ are also normally distributed. Furthermore, in the process of proving that f(x₁, x₂) is a p.d.f., we rewrote it as follows:

$$f(x_1, x_2) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\Bigg[-\frac{(x_1 - \mu_1)^2}{2\sigma_1^2}\Bigg] \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2 \sqrt{1-\rho^2}} \exp\Bigg[-\frac{(x_2 - b)^2}{2\sigma_2^2(1-\rho^2)}\Bigg],$$

where

$$b = \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1).$$

Hence

$$f(x_2 | x_1) = \frac{f(x_1, x_2)}{f_1(x_1)} = \frac{1}{\sqrt{2\pi}\,\sigma_2 \sqrt{1-\rho^2}} \exp\Bigg[-\frac{(x_2 - b)^2}{2\sigma_2^2(1-\rho^2)}\Bigg],$$

which is the p.d.f. of an N(b, σ₂²(1 − ρ²)) r.v. Similarly, f(x₁|x₂) is seen to be the p.d.f. of an N(b′, σ₁²(1 − ρ²)) r.v., where

$$b' = \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2).$$
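The conditional-normal formulas of Example 2 are easy to exercise numerically. In the sketch below (all parameter values illustrative, not from the text), draws of X₂ whose paired X₁ falls near a fixed x₁ have approximately mean b = μ₂ + ρ(σ₂/σ₁)(x₁ − μ₁) and variance σ₂²(1 − ρ²).

```python
import numpy as np

rng = np.random.default_rng(2)
mu1, mu2, s1, s2, rho = 1.0, -1.0, 2.0, 1.5, 0.6

cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
x = rng.multivariate_normal([mu1, mu2], cov, size=500_000)

x1 = 2.0                                   # condition on X1 near x1
sel = np.abs(x[:, 0] - x1) < 0.05          # thin slab around x1
b = mu2 + rho * (s2 / s1) * (x1 - mu1)     # conditional mean
v = s2**2 * (1 - rho**2)                   # conditional variance

print(x[sel, 1].mean(), b)                 # approximately equal
print(x[sel, 1].var(), v)                  # approximately equal
```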

Exercises

4.2.1 Refer to Exercise 3.2.17 in Chapter 3 and:
i) Find the marginal p.d.f.'s of the r.v.'s Xⱼ, j = 1, 2, 3;
ii) Calculate the probability that X₁ ≥ 5.

4.2.2 Refer to Exercise 3.2.18 in Chapter 3 and determine:
i) The marginal p.d.f.'s of the r.v.'s Xⱼ, j = 1, 2, 3;
ii) The conditional p.d.f. of X₁, X₂, given X₃; of X₁, X₃, given X₂; of X₂, X₃, given X₁;
iii) The conditional p.d.f. of X₁, given X₂, X₃; of X₂, given X₁, X₃; of X₃, given X₁, X₂.
If n = 20, provide expressions for the following probabilities:
iv) P(3X₁ + X₂ ≤ 5);
v) P(X₁ < X₂ < X₃);
vi) P(X₁ + X₂ = 10 | X₃ = 5);
vii) P(3 ≤ X₁ ≤ 10 | X₂ = X₃);
viii) P(X₁ < 3X₂ | X₁ > X₃).

4.2.3 Let X, Y be r.v.'s jointly distributed with p.d.f. f given by f(x, y) = 2/c² if 0 ≤ x ≤ y, 0 ≤ y ≤ c, and 0 otherwise.
i) Determine the constant c;
ii) Find the marginal p.d.f.'s of X and Y;
iii) Find the conditional p.d.f. of X, given Y, and of Y, given X;
iv) Calculate the probability that X ≤ 1.

4.2.4 Let X, Y be r.v.'s jointly distributed with p.d.f. f given by f(x, y) = e^{−x−y} I_{(0,∞)×(0,∞)}(x, y). Compute the following probabilities:
i) P(X ≤ x);
ii) P(Y ≤ y);
iii) P(X < Y);
iv) P(X + Y ≤ 3).

4.2.5 If the joint p.d.f. f of the r.v.'s Xⱼ, j = 1, 2, 3, is given by

$$f(x_1, x_2, x_3) = c^3 e^{-c(x_1 + x_2 + x_3)} I_A(x_1, x_2, x_3), \quad \text{where } A = (0, \infty) \times (0, \infty) \times (0, \infty),$$

i) Determine the constant c;
ii) Find the marginal p.d.f. of each one of the r.v.'s Xⱼ, j = 1, 2, 3;
iii) Find the conditional (joint) p.d.f. of X₁, X₂, given X₃, and the conditional p.d.f. of X₁, given X₂, X₃;
iv) Find the conditional d.f.'s corresponding to the conditional p.d.f.'s in (iii).

4.2.6 Consider the function given below:

$$f(x | y) = \begin{cases} \dfrac{y^x e^{-y}}{x!}, & x = 0, 1, 2, \ldots, \ y \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$

i) Show that for each fixed y, f(·|y) is a p.d.f., the conditional p.d.f. of an r.v. X, given that another r.v. Y equals y;
ii) If the marginal p.d.f. of Y is Negative Exponential with parameter λ = 1, what is the joint p.d.f. of X, Y?
iii) Show that the marginal p.d.f. of X is given by f(x) = (½)^{x+1} I_A(x), where A = {0, 1, 2, . . .}.

4.2.7 Let Y be an r.v. distributed as P(λ) and suppose that the conditional distribution of the r.v. X, given Y = n, is B(n, p). Determine the p.d.f. of X.

4.2.8 Consider the function f defined as follows:

$$f(x_1, x_2) = \frac{1}{2\pi} \exp\Big(-\frac{x_1^2 + x_2^2}{2}\Big) + \frac{x_1^3 x_2^3}{4\pi e}\, I_{[-1,1]\times[-1,1]}(x_1, x_2)$$

and show that:
i) f is a non-Normal Bivariate p.d.f.;
ii) Both marginal p.d.f.'s

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \quad \text{and} \quad f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1$$

are Normal p.d.f.'s.

4.3 Quantiles and Modes of a Distribution

Let X be an r.v. with d.f. F and consider a number p such that 0 < p < 1. A pth quantile of the r.v. X, or of its d.f. F, is a number denoted by x_p and having the following property: P(X ≤ x_p) ≥ p and P(X ≥ x_p) ≥ 1 − p. For p = 0.25 we get a quartile of X, or its d.f., and for p = 0.5 we get a median of X, or its d.f. For illustrative purposes, consider the following simple examples.

EXAMPLE 3 Let X be an r.v. distributed as U(0, 1) and let p = 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80 and 0.90. Determine the respective quantiles x₀.₁₀, x₀.₂₀, x₀.₃₀, x₀.₄₀, x₀.₅₀, x₀.₆₀, x₀.₇₀, x₀.₈₀ and x₀.₉₀. Since for 0 ≤ x ≤ 1, F(x) = x, we get: x₀.₁₀ = 0.10, x₀.₂₀ = 0.20, x₀.₃₀ = 0.30, x₀.₄₀ = 0.40, x₀.₅₀ = 0.50, x₀.₆₀ = 0.60, x₀.₇₀ = 0.70, x₀.₈₀ = 0.80 and x₀.₉₀ = 0.90.
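Examples 3 and 4 (the latter follows below) amount to inverting the d.f.; a sketch (illustrative, not part of the original text) reproducing both sets of quantiles with a standard library:

```python
from scipy.stats import uniform, norm

ps = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
print([round(uniform.ppf(p), 2) for p in ps])   # U(0,1): x_p = p
print([round(norm.ppf(p), 3) for p in ps])      # N(0,1): -1.282, ..., 1.282
```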

EXAMPLE 4 Let X be an r.v. distributed as N(0, 1) and let p = 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80 and 0.90. Determine the respective quantiles. From the Normal Tables (Table 3 in Appendix III), by linear interpolation and symmetry, we find: x₀.₁₀ = −1.282, x₀.₂₀ = −0.842, x₀.₃₀ = −0.524, x₀.₄₀ = −0.253, x₀.₅₀ = 0, x₀.₆₀ = 0.253, x₀.₇₀ = 0.524, x₀.₈₀ = 0.842, and x₀.₉₀ = 1.282.

Knowledge of quantiles x_p for several values of p provides an indication as to how the unit probability mass is distributed over the real line. In Fig. 4.3 various cases are demonstrated for determining graphically the pth quantile of a d.f.

[Figure 4.3 Typical cases (a)–(e) for the graphical determination of the pth quantile of a d.f. Observe that the figures demonstrate that, as defined, x_p need not be unique.]

Let X be an r.v. with a p.d.f. f. Then a mode of f, if it exists, is any number which maximizes f(x). In case f is a p.d.f. which is twice differentiable, a mode can be found by differentiation. This process breaks down in the discrete cases. The following theorems answer the question for two important discrete cases.

THEOREM 5 Let X be B(n, p); that is,

$$f(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n, \quad 0 < p < 1, \quad q = 1 - p.$$

Consider the number (n + 1)p and set m = [(n + 1)p], where [y] denotes the largest integer which is ≤ y. Then if (n + 1)p is not an integer, f(x) has a unique mode at x = m. If (n + 1)p is an integer, then f(x) has two modes obtained for x = m and x = m − 1.

PROOF For x ≥ 1, we have

$$\frac{f(x)}{f(x-1)} = \frac{\binom{n}{x} p^x q^{n-x}}{\binom{n}{x-1} p^{x-1} q^{n-x+1}} = \frac{\dfrac{n!}{x!(n-x)!}\, p^x q^{n-x}}{\dfrac{n!}{(x-1)!(n-x+1)!}\, p^{x-1} q^{n-x+1}} = \frac{n-x+1}{x} \cdot \frac{p}{q}.$$

That is,

$$\frac{f(x)}{f(x-1)} = \frac{n-x+1}{x} \cdot \frac{p}{q}.$$

Hence f(x) > f(x − 1) (f is increasing) if and only if (n − x + 1)p > x(1 − p), or np − xp + p > x − xp, or (n + 1)p > x. Thus if (n + 1)p is not an integer, f(x) keeps increasing for x ≤ m and then decreases, so the maximum occurs at x = m. If (n + 1)p is an integer, then the maximum occurs at x = (n + 1)p, where f(x) = f(x − 1) (from the above calculations). Thus

$$x = (n+1)p - 1$$

is a second point which gives the maximum value. ▲
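A brute-force check of Theorem 5 (an illustrative sketch, not from the text): compare the argmax of the B(n, p) p.d.f. with m = [(n + 1)p], including a case where (n + 1)p is an integer and two modes occur.

```python
import numpy as np
from scipy.stats import binom

def modes(n, p):
    # Return all x at which the B(n, p) p.d.f. attains its maximum.
    f = binom.pmf(np.arange(n + 1), n, p)
    return np.flatnonzero(np.isclose(f, f.max()))

print(modes(10, 0.3), int(np.floor(11 * 0.3)))   # [3], m = 3 (unique mode)
print(modes(9, 0.5))                              # [4 5]: (n+1)p = 5 is an
                                                  # integer; modes m and m - 1
```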

THEOREM 6 Let X be P(λ); that is,

$$f(x) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots, \quad \lambda > 0.$$

Then if λ is not an integer, f(x) has a unique mode at x = [λ]. If λ is an integer, then f(x) has two modes obtained for x = λ and x = λ − 1.

PROOF For x ≥ 1, we have

$$\frac{f(x)}{f(x-1)} = \frac{e^{-\lambda} \lambda^x / x!}{e^{-\lambda} \lambda^{x-1} / (x-1)!} = \frac{\lambda}{x}.$$

Hence f(x) > f(x − 1) if and only if λ > x. Thus if λ is not an integer, f(x) keeps increasing for x ≤ [λ] and then decreases. Then the maximum of f(x) occurs at x = [λ]. If λ is an integer, then the maximum occurs at x = λ. But in this case f(x) = f(x − 1), which implies that x = λ − 1 is a second point which gives the maximum value to the p.d.f. ▲

Exercises

4.3.1 Determine the pth quantile x_p for each one of the p.d.f.'s given in Exercises 3.2.13–15 and 3.3.13–16 (Exercise 3.3.14 for α = ¼) in Chapter 3 if p = 0.75, 0.50.

4.3.2 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is, f(c − x) = f(c + x) for all x ∈ ℝ). Then show that c is a median of f.

4.3.3 Draw four graphs—two each for B(n, p) and P(λ)—which represent the possible occurrences for modes of the distributions B(n, p) and P(λ).

4.3.4 Consider the same p.d.f.'s mentioned in Exercise 4.3.1 from the point of view of a mode.

4.4* Justification of Statements 1 and 2

In this section, a rigorous justification of Statements 1 and 2 made in Section 4.1 will be presented. For this purpose, some preliminary concepts and results are needed and will be also discussed.

DEFINITION 1 A set G in ℝ is called open if for every x in G there exists an open interval containing x and contained in G. It follows from this definition that an open interval is an open set; clearly, the entire real line ℝ is an open set, and so is the empty set (in a vacuous manner).

LEMMA 1 Every open set in ℝ is measurable.

PROOF Let G be an open set in ℝ, and for each x ∈ G, consider an open interval centered at x and contained in G. Clearly, the union over x, as x varies in G, of such intervals is equal to G. The same is true if we consider only those intervals corresponding to all rationals x in G. These intervals are countably many and each one of them is measurable; then so is their union. ▲

DEFINITION 2 A set G in ℝᵐ, m ≥ 1, is called open if for every x in G there exists an open cube in ℝᵐ containing x and contained in G; by the term open "cube" we mean the Cartesian product of m open intervals of equal length.

LEMMA 2 Every open set in ℝᵐ is measurable.

PROOF It is analogous to that of Lemma 1. Indeed, let G be an open set in ℝᵐ, and for each x ∈ G, consider an open cube centered at x and contained in G. The union over x, as x varies in G, of such cubes clearly is equal to G. The same is true if we restrict ourselves to x's in G whose m coordinates are rationals. Then the resulting cubes are countably many, and therefore their union is measurable, since so is each cube. ▲

DEFINITION 3 Recall that a function g: S ⊆ ℝ → ℝ is said to be continuous at x₀ ∈ S if for every ε > 0 there exists a δ = δ(ε, x₀) > 0 such that |x − x₀| < ε implies |g(x) − g(x₀)| < δ. The function g is continuous in S if it is continuous for every x ∈ S. It follows from the concept of continuity that ε → 0 implies δ → 0.

LEMMA 3 Let g: ℝ → ℝ be continuous. Then g is measurable.

PROOF By Theorem 5 in Chapter 1 it suffices to show that g⁻¹(G) are measurable sets for all open intervals G in ℝ. Set B = g⁻¹(G). Thus if B = ∅, the assertion is valid, so let B ≠ ∅ and let x₀ be an arbitrary point of B, so that g(x₀) ∈ G. Since g(x₀) ∈ G and G is open, by choosing ε sufficiently small, we can make δ so small that (g(x₀) − δ, g(x₀) + δ) ⊂ G. Thus, for such a choice of ε and δ, x ∈ (x₀ − ε, x₀ + ε) implies that g(x) ∈ (g(x₀) − δ, g(x₀) + δ), so that g(x) ∈ G. But B(= g⁻¹(G)) is the set of all x ∈ ℝ for which g(x) ∈ G. As all x ∈ (x₀ − ε, x₀ + ε) have this property, it follows that (x₀ − ε, x₀ + ε) ⊂ B. Since x₀ is arbitrary in B, it follows that B is open. Then by Lemma 1, it is measurable. ▲

The concept of continuity generalizes, of course, to Euclidean spaces of higher dimensions.

DEFINITION 4 A function g: S ⊆ ℝᵏ → ℝᵐ (k, m ≥ 1) is said to be continuous at x₀ ∈ S if for every ε > 0 there exists a δ = δ(ε, x₀) > 0 such that ||x − x₀|| < ε implies ||g(x) − g(x₀)|| < δ. Here ||x|| stands for the usual norm in ℝᵏ; i.e., for x = (x₁, . . . , x_k)′, ||x|| = (Σᵢ₌₁ᵏ xᵢ²)^{1/2}, and similarly for the other quantities. The function g is continuous in S if it is continuous for every x ∈ S. Once again, from the concept of continuity it follows that ε → 0 implies δ → 0, and then a result analogous to the one in Lemma 3 also holds true.

LEMMA 4 Let g: ℝᵏ → ℝᵐ be continuous. Then g is measurable.

PROOF The proof is similar to that of Lemma 3; the details are presented here for the sake of completeness. By Lemma 2, it suffices to show that g⁻¹(G) are measurable sets for all open cubes G in ℝᵐ. Set B = g⁻¹(G). If B = ∅ the assertion is true, and therefore suppose that B ≠ ∅ and let x₀ be an arbitrary point of B, so that g(x₀) ∈ G. Continuity of g at x₀ implies that for every ε > 0 there exists a δ = δ(ε, x₀) > 0 such that ||x − x₀|| < ε implies ||g(x) − g(x₀)|| < δ; equivalently, x ∈ S(x₀, ε) implies g(x) ∈ S(g(x₀), δ), where S(c, r) stands for the open sphere with center c and radius r. Since g(x₀) ∈ G and G is open, we can choose ε so small that the corresponding δ is sufficiently small to imply that g(x) ∈ S(g(x₀), δ) ⊂ G. Thus, for such a choice of ε and δ, x ∈ S(x₀, ε) implies that g(x) ∈ G. At this point, observe that there is clearly a cube containing x₀ and contained in S(x₀, ε); call it C(x₀, ε). Then C(x₀, ε) ⊂ B, since B(= g⁻¹(G)) is the set of all x ∈ ℝᵏ for which g(x) ∈ G. Since x₀ is arbitrary in B, it follows that B is open, and hence, by Lemma 2, it is also measurable. The proof is completed. ▲

DEFINITION 5 For j = 1, . . . , k, the jth projection function gⱼ is defined by: gⱼ: ℝᵏ → ℝ and gⱼ(x) = gⱼ(x₁, . . . , x_k) = xⱼ.

It so happens that projection functions are continuous; that is,

LEMMA 5 The coordinate functions gⱼ, j = 1, . . . , k, as defined above, are continuous.

PROOF For an arbitrary point x₀ in ℝᵏ, consider x ∈ ℝᵏ such that ||x − x₀|| < ε for some ε > 0. This is equivalent to ||x − x₀||² < ε², or Σⱼ₌₁ᵏ (xⱼ − x₀ⱼ)² < ε², which implies that (xⱼ − x₀ⱼ)² < ε² for j = 1, . . . , k, or |xⱼ − x₀ⱼ| < ε. This last expression is equivalent to |gⱼ(x) − gⱼ(x₀)| < ε. Thus the definition of continuity of gⱼ is satisfied here for δ = ε. ▲

We may now proceed with the justification of Statement 1.

THEOREM 7 Let X: (S, A) → (ℝᵏ, Bᵏ) be a random vector, and let g: (ℝᵏ, Bᵏ) → (ℝᵐ, Bᵐ) be measurable. Then g(X): (S, A) → (ℝᵐ, Bᵐ) and is a random vector. (That is, measurable functions of random vectors are random vectors.)

PROOF To prove that [g(X)]⁻¹(B) ∈ A if B ∈ Bᵐ, we have

$$\big[g(X)\big]^{-1}(B) = X^{-1}\big[g^{-1}(B)\big] = X^{-1}(B_1), \quad \text{where } B_1 = g^{-1}(B) \in \mathcal{B}^k$$

by the measurability of g. Also, X⁻¹(B₁) ∈ A since X is measurable. The proof is completed. ▲

To this theorem, we have the following

COROLLARY Let X be as above and g be continuous. Then g(X) is a random vector. (That is, continuous functions of random vectors are random vectors.)

PROOF The continuity of g implies its measurability by Lemma 4, and therefore the theorem applies and gives the result. ▲

Now consider a k-dimensional function X defined on the sample space S. Then X may be written as X = (X₁, . . . , X_k)′, where Xⱼ, j = 1, . . . , k are real-valued functions. The question then arises as to how X and Xⱼ, j = 1, . . . , k are

related from a measurability point of view. To this effect, we have the following result.

THEOREM 8 Let X = (X₁, . . . , X_k)′: (S, A) → (ℝᵏ, Bᵏ). Then X is an r. vector if and only if Xⱼ, j = 1, . . . , k are r.v.'s.

PROOF Suppose X is an r. vector and let gⱼ, j = 1, . . . , k be the coordinate functions defined on ℝᵏ. Then gⱼ's are continuous by Lemma 5 and therefore measurable by Lemma 4. Then for each j = 1, . . . , k, gⱼ(X) = gⱼ(X₁, . . . , X_k) = Xⱼ is measurable and hence an r.v.

Next, assume that Xⱼ, j = 1, . . . , k are r.v.'s. To show that X is an r. vector, by special case 3 in Section 2 of Chapter 1, it suffices to show that X⁻¹(B) ∈ A for each B = (−∞, x₁] × · · · × (−∞, x_k], x₁, . . . , x_k ∈ ℝ. Indeed,

$$X^{-1}(B) = (X \in B) = \big(X_j \in (-\infty, x_j],\ j = 1, \ldots, k\big) = \bigcap_{j=1}^{k} X_j^{-1}\big((-\infty, x_j]\big) \in \mathcal{A},$$

since Xⱼ are r.v.'s. The proof is completed. ▲

Exercises

4.4.1 If X and Y are functions defined on the sample space S into the real line ℝ, show that:

$$\{s \in S;\ X(s) + Y(s) < x\} = \bigcup_{r \in Q} \Big[\{s \in S;\ X(s) < r\} \cap \{s \in S;\ Y(s) < x - r\}\Big],$$

where Q is the set of rationals in ℝ.

4.4.2 Use Exercise 4.4.1 in order to show that, if X and Y are r.v.'s, then so is the function X + Y.

4.4.3
i) If X is an r.v., then show that so is the function −X;
ii) Use part (i) and Exercise 4.4.2 to show that, if X and Y are r.v.'s, then so is the function X − Y.

4.4.4
i) If X is an r.v., then show that so is the function X²;
ii) Use the identity XY = ½(X + Y)² − ½(X² + Y²) in conjunction with part (i) and Exercises 4.4.2 and 4.4.3(ii) to show that, if X and Y are r.v.'s, then so is the function XY.

4.4.5
i) If X is an r.v., then show that so is the function 1/X, provided X ≠ 0;
ii) Use part (i) in conjunction with Exercise 4.4.4(ii) to show that, if X and Y are r.v.'s, then so is the function X/Y, provided Y ≠ 0.

Chapter 5

Moments of Random Variables—Some Moment and Probability Inequalities

5.1 Moments of Random Variables

In the definitions to be given shortly, the following remark will prove useful.

REMARK 1 We say that the (infinite) series Σ_x h(x), where x = (x₁, . . . , x_k)′ varies over a discrete set in ℝᵏ, k ≥ 1, converges absolutely if Σ_x |h(x)| < ∞. Also, we say that the integral

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(x_1, \ldots, x_k)\,dx_1 \cdots dx_k$$

converges absolutely if

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \big|h(x_1, \ldots, x_k)\big|\,dx_1\,dx_2 \cdots dx_k < \infty.$$

In what follows, when we write (infinite) series or integrals, it will always be assumed that they converge absolutely. In this case, we say that the moments to be defined below exist.

Let X = (X₁, . . . , X_k)′ be an r. vector with p.d.f. f, and consider the (measurable) function g: ℝᵏ → ℝ, so that g(X) = g(X₁, . . . , X_k) is an r.v. Then we give the

DEFINITION 1
i) For n = 1, 2, . . . , the nth moment of g(X) is denoted by E[g(X)]ⁿ and is defined by:

$$E\big[g(X)\big]^n = \sum_{x}\big[g(x)\big]^n f(x), \quad x = (x_1, \ldots, x_k)', \quad \text{or} \quad \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \big[g(x_1, \ldots, x_k)\big]^n f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k,$$

for the discrete and continuous case, respectively. For n = 1, we get

$$E\big[g(X)\big] = \sum_{x} g(x) f(x) \quad \text{or} \quad \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_k) f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k$$

and call it the mathematical expectation or mean value or just mean of g(X). Another notation for E[g(X)] which is often used is μ_{g(X)}, or μ[g(X)], or just μ, if no confusion is possible.

ii) For r > 0, the rth absolute moment of g(X) is denoted by E|g(X)|ʳ and is defined by:

$$E\big|g(X)\big|^r = \sum_{x}\big|g(x)\big|^r f(x), \quad x = (x_1, \ldots, x_k)', \quad \text{or} \quad \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \big|g(x_1, \ldots, x_k)\big|^r f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k.$$

iii) For an arbitrary constant c, and n and r as above, the nth moment and rth absolute moment of g(X) about c are denoted by E[g(X) − c]ⁿ, E|g(X) − c|ʳ, respectively, and are defined as follows:

$$E\big[g(X) - c\big]^n = \sum_{x}\big[g(x) - c\big]^n f(x) \quad \text{or} \quad \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \big[g(x_1, \ldots, x_k) - c\big]^n f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k,$$

and

$$E\big|g(X) - c\big|^r = \sum_{x}\big|g(x) - c\big|^r f(x) \quad \text{or} \quad \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \big|g(x_1, \ldots, x_k) - c\big|^r f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k.$$

For c = E[g(X)], the moments are called central moments. The 2nd central moment of g(X), that is,

$$E\big\{g(X) - E\big[g(X)\big]\big\}^2 = \sum_{x}\big[g(x) - Eg(X)\big]^2 f(x) \quad \text{or} \quad \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \big[g(x_1, \ldots, x_k) - Eg(X)\big]^2 f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k,$$

is called the variance of g(X) and is also denoted by σ²[g(X)], or σ²_{g(X)}, or just σ², if no confusion is possible. The variance of an r.v. is referred to as the moment of inertia in Mechanics. The quantity +√(σ²[g(X)]) = σ[g(X)] is called the standard deviation (s.d.) of g(X) and is also denoted by σ_{g(X)}, or just σ, if no confusion is possible.

5.1.1 Important Special Cases

1. Let g(X₁, . . . , X_k) = X₁^{n₁} · · · X_k^{n_k}, where nⱼ ≥ 0 are integers. Then E(X₁^{n₁} · · · X_k^{n_k}) is called the (n₁, . . . , n_k)-joint moment of X₁, . . . , X_k. In particular, for n₁ = · · · = n_{j−1} = n_{j+1} = · · · = n_k = 0, nⱼ = n, we get

$$E(X_j^n) = \sum_{x} x_j^n f(x_1, \ldots, x_k) = \sum_{x_j} x_j^n f_j(x_j) \quad \text{or} \quad \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_j^n f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k = \int_{-\infty}^{\infty} x_j^n f_j(x_j)\,dx_j,$$

which is the nth moment of the r.v. Xⱼ. Thus the nth moment of an r.v. X with p.d.f. f is

$$E(X^n) = \sum_{x} x^n f(x) \quad \text{or} \quad \int_{-\infty}^{\infty} x^n f(x)\,dx.$$

For n = 1, we get

$$E(X) = \sum_{x} x f(x) \quad \text{or} \quad \int_{-\infty}^{\infty} x f(x)\,dx,$$

which is the mathematical expectation or mean value or just mean of X. This quantity is also denoted by μ_X or μ(X) or just μ when no confusion is possible.

The quantity μ_X can be interpreted as follows: It follows from the definition that if X is a discrete uniform r.v., then μ_X is just the arithmetic average of the possible outcomes of X. Also, if one recalls from physics or elementary calculus the definition of center of gravity and its physical interpretation as the point of balance of the distributed mass, the interpretation of μ_X as the mean or expected value of the random variable is the natural one, provided the probability distribution of X is interpreted as the unit mass distribution.

REMARK 2 In Definition 1, suppose X is a continuous r.v. Then E[g(X)] = ∫_{−∞}^{∞} g(x)f(x)dx. On the other hand, from the last expression above, E(X) = ∫_{−∞}^{∞} xf(x)dx. There seems to be a discrepancy between these two definitions. More specifically, in the definition of E[g(X)], one would expect to use the p.d.f. of g(X) rather than that of X. Actually, the definition of E[g(X)], as given, is correct and its justification is roughly as follows: Consider E[g(X)] = ∫_{−∞}^{∞} g(x)f(x)dx and set y = g(x). Suppose that g is differentiable and has an inverse g⁻¹, and that some further conditions are met. Then

$$\int_{-\infty}^{\infty} g(x) f(x)\,dx = \int_{-\infty}^{\infty} y\, f\big[g^{-1}(y)\big] \left|\frac{d}{dy} g^{-1}(y)\right| dy.$$

On the other hand, if f_Y is the p.d.f. of Y, then f_Y(y) = f[g⁻¹(y)]|d/dy g⁻¹(y)|, so that the last integral above is equal to ∫_{−∞}^{∞} y f_Y(y)dy, which is consonant with the definition of E(X) = ∫_{−∞}^{∞} xf(x)dx. (A justification of the above derivations is given in Theorem 2 of Chapter 9.)
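A small sketch (illustrative, not part of the text) of Definition 1 for n = 1: E[g(X)] computed directly from the p.d.f. of X, both for a discrete X and, via numerical integration, for a continuous X.

```python
import numpy as np
from scipy.integrate import quad

# Discrete: X with P(X = x) = f(x); E g(X) = sum of g(x) f(x).
xs = np.array([0, 1, 2])
fs = np.array([0.2, 0.5, 0.3])
g = lambda x: (x - 1) ** 2
print(np.sum(g(xs) * fs))                  # E g(X)

# Continuous: X ~ N(0, 1); E g(X) = integral of g(x) f(x) dx.
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
val, _ = quad(lambda x: g(x) * f(x), -np.inf, np.inf)
print(val)                                 # E (X - 1)^2 = Var X + 1 = 2
```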

2. For g as above and c = E(Xⱼ), we get

$$E(X_j - EX_j)^n = \sum_{x} (x_j - EX_j)^n f(x_1, \ldots, x_k) = \sum_{x_j} (x_j - EX_j)^n f_j(x_j)$$

or

$$E(X_j - EX_j)^n = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_j - EX_j)^n f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k = \int_{-\infty}^{\infty} (x_j - EX_j)^n f_j(x_j)\,dx_j,$$

which is the nth central moment of the r.v. Xⱼ (or the nth moment of Xⱼ about its mean). Thus the nth central moment of an r.v. X with p.d.f. f and mean μ is

$$E(X - EX)^n = E(X - \mu)^n = \sum_{x} (x - \mu)^n f(x) \quad \text{or} \quad \int_{-\infty}^{\infty} (x - \mu)^n f(x)\,dx.$$

In particular, for n = 2, the 2nd central moment of X is denoted by σ_X² or σ²(X) or just σ² when no confusion is possible, and is called the variance of X. Its positive square root σ_X or σ(X) or just σ is called the standard deviation (s.d.) of X.

As in the case of μ_X, σ_X² has a physical interpretation also. Its definition corresponds to that of the second moment, or moment of inertia. One recalls that a large moment of inertia means the mass of the body is spread widely about its center of gravity. Likewise a large variance corresponds to a probability distribution which is not well concentrated about its mean value.

3. For g(X₁, . . . , X_k) = (X₁ − EX₁)^{n₁} · · · (X_k − EX_k)^{n_k}, the quantity

$$E\Big[(X_1 - EX_1)^{n_1} \cdots (X_k - EX_k)^{n_k}\Big]$$

is the (n₁, . . . , n_k)-central joint moment of X₁, . . . , X_k, or the (n₁, . . . , n_k)-joint moment of X₁, . . . , X_k about their means.

4. For g(X₁, . . . , X_k) = Xⱼ(Xⱼ − 1) · · · (Xⱼ − n + 1), j = 1, . . . , k, the quantity

$$E\big[X_j(X_j - 1) \cdots (X_j - n + 1)\big] = \sum_{x_j} x_j(x_j - 1) \cdots (x_j - n + 1) f_j(x_j) \quad \text{or} \quad \int_{-\infty}^{\infty} x_j(x_j - 1) \cdots (x_j - n + 1) f_j(x_j)\,dx_j$$

is the nth factorial moment of the r.v. Xⱼ. Thus the nth factorial moment of an r.v. X with p.d.f. f is

$$E\big[X(X - 1) \cdots (X - n + 1)\big] = \sum_{x} x(x - 1) \cdots (x - n + 1) f(x) \quad \text{or} \quad \int_{-\infty}^{\infty} x(x - 1) \cdots (x - n + 1) f(x)\,dx.$$

5.1.2 Basic Properties of the Expectation of an R.V.

From the very definition of E[g(X)], the following properties are immediate.

(E1) E(c) = c, where c is a constant.

(E2) E[cg(X)] = cE[g(X)], and, in particular, E(cX) = cE(X) if X is an r.v.

(E3) E[g(X) + d] = E[g(X)] + d, where d is a constant; in particular, E(X + d) = E(X) + d if X is an r.v.

(E4) Combining (E2) and (E3), we get E[cg(X) + d] = cE[g(X)] + d, and, in particular, E(cX + d) = cE(X) + d if X is an r.v.

(E4′) E[Σⱼ₌₁ⁿ cⱼgⱼ(X)] = Σⱼ₌₁ⁿ cⱼE[gⱼ(X)]. In fact, in the continuous case, we have

$$E\Bigg[\sum_{j=1}^{n} c_j g_j(X)\Bigg] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Bigg[\sum_{j=1}^{n} c_j g_j(x_1, \ldots, x_k)\Bigg] f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k$$

$$= \sum_{j=1}^{n} c_j \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g_j(x_1, \ldots, x_k) f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k = \sum_{j=1}^{n} c_j E\big[g_j(X)\big].$$

The discrete case follows similarly. In particular,

(E4″) E(Σⱼ₌₁ⁿ cⱼXⱼ) = Σⱼ₌₁ⁿ cⱼE(Xⱼ), where Xⱼ, j = 1, . . . , n, are r.v.'s (with finite expectations).

(E5) If X ≥ 0, then E(X) ≥ 0. Consequently, by means of (E5) and (E4″), we get that

(E5′) If X ≥ Y, then E(X) ≥ E(Y), where X and Y are r.v.'s (with finite expectations).

(E6) |E[g(X)]| ≤ E|g(X)|.

(E7) If E|X|ʳ < ∞ for some r > 0, then E|X|^{r′} < ∞ for all 0 < r′ < r. This is a consequence of the obvious inequality |X|^{r′} ≤ 1 + |X|ʳ and (E5′). In particular, since for n = 1, 2, . . . , |Xⁿ| = |X|ⁿ, it follows that

(E7′) If E(Xⁿ) exists (that is, E|X|ⁿ < ∞) for some n = 2, 3, . . . , then E(X^{n′}) also exists for all n′ = 1, 2, . . . , with n′ < n.

5.1.3 Basic Properties of the Variance of an R.V.

Regarding the variance, the following properties are easily established by means of the definition of the variance.

(V1) σ²(c) = 0, where c is a constant.

(V2) σ²[cg(X)] = c²σ²[g(X)], and, in particular, σ²(cX) = c²σ²(X) if X is an r.v.

(V3) σ²[g(X) + d] = σ²[g(X)], where d is a constant; in particular, σ²(X + d) = σ²(X) if X is an r.v. In fact,

$$\sigma^2\big[g(X) + d\big] = E\big\{\big[g(X) + d\big] - E\big[g(X) + d\big]\big\}^2 = E\big[g(X) - Eg(X)\big]^2 = \sigma^2\big[g(X)\big].$$

(V4) Combining (V2) and (V3), we get σ²[cg(X) + d] = c²σ²[g(X)], and, in particular, σ²(cX + d) = c²σ²(X) if X is an r.v.

(V5) σ²[g(X)] = E[g(X)]² − [Eg(X)]². In fact,

$$\sigma^2\big[g(X)\big] = E\big\{g(X) - E\big[g(X)\big]\big\}^2 = E\Big\{\big[g(X)\big]^2 - 2g(X)Eg(X) + \big[Eg(X)\big]^2\Big\}$$

$$= E\big[g(X)\big]^2 - 2\big[Eg(X)\big]^2 + \big[Eg(X)\big]^2 = E\big[g(X)\big]^2 - \big[Eg(X)\big]^2,$$

the equality before the last one being true because of (E4′). In particular,

(V5′) σ²(X) = E(X²) − (EX)², if X is an r.v.

(V6) σ²(X) = E[X(X − 1)] + EX − (EX)². This formula is especially useful in calculating the variance of a discrete r.v., as is seen below.
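(V5′) and (V6) are convenient in computations; the sketch below (values made up for illustration, not from the text) checks both against the defining formula on a small discrete distribution.

```python
import numpy as np

xs = np.array([0, 1, 2, 3])
fs = np.array([0.1, 0.4, 0.3, 0.2])        # a p.d.f.; the fs sum to 1

EX   = np.sum(xs * fs)
EX2  = np.sum(xs**2 * fs)
Efac = np.sum(xs * (xs - 1) * fs)          # E[X(X - 1)], factorial moment

var_direct = np.sum((xs - EX)**2 * fs)     # variance by definition
print(var_direct, EX2 - EX**2, Efac + EX - EX**2)   # (V5') and (V6) agree
```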

Exercises

5.1.1 Verify the details of properties (E1)–(E7).

5.1.2 Verify the details of properties (V1)–(V5).

5.1.3 For r′ < r, show that |X|^{r′} ≤ 1 + |X|ʳ and conclude that if E|X|ʳ < ∞, then E|X|^{r′} < ∞ for all 0 < r′ < r.

5.1.4 Verify the equality

$$\Big(E\big[g(X)\big] =\Big) \int_{-\infty}^{\infty} g(x) f_X(x)\,dx = \int_{-\infty}^{\infty} y f_Y(y)\,dy$$

for the case that X ∼ N(0, 1) and Y = g(X) = X².

5.1.5 For any event A, consider the r.v. X = I_A, the indicator of A defined by I_A(s) = 1 for s ∈ A and I_A(s) = 0 for s ∈ Aᶜ, and calculate EXʳ, r > 0, and also σ²(X).

5.1.6 Let X be an r.v. such that P(X = −c) = P(X = c) = ½. Calculate EX, σ²(X) and show that

$$P\big(|X - EX| \le c\big) = \frac{\sigma^2(X)}{c^2}.$$

5.1.7 Let X be an r.v. with finite EX.
i) For any constant c, show that E(X − c)² = E(X − EX)² + (EX − c)²;
ii) Use part (i) to conclude that E(X − c)² is minimum for c = EX.

5.1.8 Let X be an r.v. such that EX⁴ < ∞. Then show that
i) E(X − EX)³ = EX³ − 3(EX)(EX²) + 2(EX)³;
ii) E(X − EX)⁴ = EX⁴ − 4(EX)(EX³) + 6(EX)²(EX²) − 3(EX)⁴.

5.1.9 If EX⁴ < ∞, show that:

$$E\big[X(X-1)\big] = EX^2 - EX, \quad E\big[X(X-1)(X-2)\big] = EX^3 - 3EX^2 + 2EX,$$

$$E\big[X(X-1)(X-2)(X-3)\big] = EX^4 - 6EX^3 + 11EX^2 - 6EX.$$

(These relations provide a way of calculating EXᵏ, k = 2, 3, 4 by means of the factorial moments E[X(X − 1)], E[X(X − 1)(X − 2)], E[X(X − 1)(X − 2)(X − 3)].)

5.1.10 Let X be the r.v. denoting the number of claims filed by a policyholder of an insurance company over a specified period of time. On the basis of an extensive study of the claim records, it may be assumed that the distribution of X is as follows:

x:    0      1      2      3      4      5      6
f(x): 0.304  0.287  0.208  0.115  0.061  0.019  0.006

i) Calculate the EX and the σ²(X);
ii) What premium should the company charge in order to break even?
iii) What should be the premium charged if the company is to expect to come ahead by $M for administrative expenses and profit?

5.1.11 A roulette wheel has 38 slots of which 18 are red, 18 black, and 2 green.
i) Suppose a gambler is placing a bet of $M on red. What is the gambler's expected gain or loss and what is the standard deviation?
ii) If the same bet of $M is placed on green and if $kM is the amount the gambler wins, calculate the expected gain or loss and the standard deviation;
iii) For what value of k do the two expectations in parts (i) and (ii) coincide?
iv) Does this value of k depend on M?
v) How do the respective standard deviations compare?

5.1.12 Let X be an r.v. such that P(X = j) = (½)ʲ, j = 1, 2, . . . .
i) Compute EX, E[X(X − 1)];
ii) Use (i) in order to compute σ²(X).

5.1.13 If X is an r.v. distributed as U(α, β), show that

$$EX = \frac{\alpha + \beta}{2}, \quad \sigma^2(X) = \frac{(\alpha - \beta)^2}{12}.$$

5.1.14 Let the r.v. X be distributed as U(α, β). Calculate EXⁿ for any positive integer n.

5.1.15 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is, f(c − x) = f(c + x) for every x).
i) Then if EX exists, show that EX = c;
ii) If c = 0 and EX^{2n+1} exists, show that EX^{2n+1} = 0 (that is, those moments of X of odd order which exist are all equal to zero).

5.1.16 Refer to Exercise 3.3.13(iv) in Chapter 3 and find the EX for those α's for which this expectation exists, where X is an r.v. having the distribution in question.

5.1.17 Let X be an r.v. with p.d.f. f given by

$$f(x) = \frac{c - |x|}{c^2}\, I_{(-c,c)}(x).$$

Compute EXⁿ for any positive integer n.

5.1.18 Let X be an r.v. with finite expectation and d.f. F.
i) Show that

$$EX = \int_{0}^{\infty} \big[1 - F(x)\big]\,dx - \int_{-\infty}^{0} F(x)\,dx;$$

2. Then E(X) = np. so that β⎠ β⎠ ⎝ ⎝ ⎡ ⎛ ⎛ 2⎞ 1⎞⎤ σ 2 X = ⎢Γ ⎜ 1 + ⎟ − Γ 2 ⎜ 1 + ⎟ ⎥ σ 2 β . then: ⎛ ⎛ 1⎞ 2⎞ 1β 2 β ii) Show that EX = Γ ⎜ 1 + ⎟ α .) 5. Let X be B(n.v.v.1. In fact. show that E X − c = E X − m + 2 ∫ c − x f x dx. m ∞ 1 Then the fact that ∫−∞ f ( x)dx = ∫m f ( x)dx = 2 and simple manipulations c prove part (i). σ 2(X) = npq. 5. 2 β = 1 and β = 2.f.2 Expectations and Variances of Some r. n n ⎛ n⎞ n! E X = ∑ x⎜ ⎟ p x q n − x = ∑ x px qn −x x! n − x ! x = 0 ⎝ x⎠ x =1 ( ) =∑ x =1 n ( x − 1)!(n − x)! (n − 1)! ( )( ) p q = np∑ x − 1)![( n − 1) − ( x − 1)]! ( (n − 1)! p q( ) = np p + q = np∑ ( ) x![( n − 1) − x]! n x −1 n −1 − x −1 x =1 n −1 x n −1 − x x= 0 n n−1 ! ( ) ( ) px qn −x n −1 = np.19 Let X be an r. β⎠ β ⎠⎥ ⎝ ⎢ ⎝ ⎣ ⎦ ∞ where recall that the Gamma function Γ is deﬁned by Γ γ = ∫0 t γ −1 e − t dt . c ii) If m is a median of f and c is any constant. f.d. X is distributed according to the Weibull distribution (see Exercise 4.v.1. EX 2 = Γ ⎜ 1 + ⎟ α .’s 5. and in each one split the integral from −∞ to c and c to ∞ in order to remove the absolute value. (Hint: Consider the two cases that c ≥ m and c < m. . ii) Determine the numerical values of EX and σ 2(X) for α = 1 and β = 1 . For part (ii). γ > 0. ( ) () 5. of the continuous type with ﬁnite EX and p. p).20 If the r. m ( )() ii) Utilize (i) in order to conclude that E|X − c| is minimized for c = m.1 Discrete Case 1.114 5 Moments of Random Variables—Some Moment and Probability Inequalities ii) Use the interpretation of the deﬁnite integral as an area in order to give a geometric interpretation of EX.1.15 in Chapter 4). observe that ∫m (c − x) f ( x)dx ≥ 0 whether c ≥ m or c < m.

Next,

$$E\big[X(X-1)\big] = \sum_{x=0}^{n} x(x-1)\binom{n}{x} p^x q^{n-x} = \sum_{x=2}^{n} x(x-1) \frac{n(n-1)(n-2)!}{x(x-1)(x-2)!\big[(n-2)-(x-2)\big]!} p^2 p^{x-2} q^{(n-2)-(x-2)}$$

$$= n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!\big[(n-2)-(x-2)\big]!} p^{x-2} q^{(n-2)-(x-2)} = n(n-1)p^2 \sum_{x=0}^{n-2} \frac{(n-2)!}{x!\big[(n-2)-x\big]!} p^x q^{n-2-x} = n(n-1)p^2 (p+q)^{n-2} = n(n-1)p^2.$$

That is, E[X(X − 1)] = n(n − 1)p². Hence, by (V6),

$$\sigma^2(X) = E\big[X(X-1)\big] + EX - (EX)^2 = n(n-1)p^2 + np - n^2p^2 = n^2p^2 - np^2 + np - n^2p^2 = np(1-p) = npq.$$

2. Let X be P(λ). Then E(X) = σ²(X) = λ. In fact,

$$E(X) = \sum_{x=0}^{\infty} x e^{-\lambda} \frac{\lambda^x}{x!} = \sum_{x=1}^{\infty} x e^{-\lambda} \frac{\lambda^x}{x(x-1)!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$

Next,

$$E\big[X(X-1)\big] = \sum_{x=0}^{\infty} x(x-1) e^{-\lambda} \frac{\lambda^x}{x!} = \sum_{x=2}^{\infty} x(x-1) e^{-\lambda} \frac{\lambda^x}{x(x-1)(x-2)!} = \lambda^2 e^{-\lambda} \sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} = \lambda^2.$$

Hence EX² = λ² + λ, so that, by (V6), σ²(X) = λ² + λ − λ² = λ.

REMARK 3 One can also prove that the nth factorial moment of X is λⁿ; that is,

$$E\big[X(X-1) \cdots (X-n+1)\big] = \lambda^n.$$
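The closed forms E(X) = np, σ²(X) = npq and E(X) = σ²(X) = λ can be confirmed directly from the p.d.f.'s; a short illustrative sketch (parameter values arbitrary, and the Poisson sum is truncated where the tail is negligible):

```python
import numpy as np
from scipy.stats import binom, poisson

n, p = 12, 0.3
xs = np.arange(n + 1)
f = binom.pmf(xs, n, p)
print(np.sum(xs * f), n * p)                                    # E X = np
print(np.sum(xs**2 * f) - np.sum(xs * f)**2, n * p * (1 - p))   # var = npq

lam = 4.0
xs = np.arange(200)                          # truncated support; tail ~ 0
f = poisson.pmf(xs, lam)
print(np.sum(xs * f), np.sum(xs**2 * f) - np.sum(xs * f)**2)    # both ~ lambda
```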

5.2.2 Continuous Case

1. Let X be N(0, 1). Then

$$E(X^{2n+1}) = 0, \quad n \ge 0, \quad E(X^{2n}) = \frac{(2n)!}{2^n n!},$$

so that, in particular, E(X) = 0 and σ²(X) = E(X²) = 2!/(2·1!) = 1. In fact,

$$E(X^{2n+1}) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2n+1} e^{-x^2/2}\,dx.$$

But

$$\int_{-\infty}^{\infty} x^{2n+1} e^{-x^2/2}\,dx = \int_{-\infty}^{0} x^{2n+1} e^{-x^2/2}\,dx + \int_{0}^{\infty} x^{2n+1} e^{-x^2/2}\,dx = -\int_{0}^{\infty} y^{2n+1} e^{-y^2/2}\,dy + \int_{0}^{\infty} x^{2n+1} e^{-x^2/2}\,dx = 0,$$

as is easily seen. Thus E(X^{2n+1}) = 0. Next,

$$\int_{-\infty}^{\infty} x^{2n} e^{-x^2/2}\,dx = 2\int_{0}^{\infty} x^{2n} e^{-x^2/2}\,dx,$$

and

$$\int_{0}^{\infty} x^{2n} e^{-x^2/2}\,dx = -\int_{0}^{\infty} x^{2n-1}\,de^{-x^2/2} = -x^{2n-1} e^{-x^2/2}\Big|_0^{\infty} + (2n-1)\int_{0}^{\infty} x^{2n-2} e^{-x^2/2}\,dx = (2n-1)\int_{0}^{\infty} x^{2n-2} e^{-x^2/2}\,dx.$$

If we set m₂ₙ = E(X²ⁿ), we get then

$$m_{2n} = (2n-1)\,m_{2n-2}, \quad m_{2n-2} = (2n-3)\,m_{2n-4}, \quad \ldots \quad m_2 = 1 \cdot m_0,$$

and m₀ = 1, since m₀ = E(X⁰) = E(1) = 1. Multiplying them out, we obtain

$$m_{2n} = (2n-1)(2n-3) \cdots 1 = \frac{1 \cdot 2 \cdots (2n-3)(2n-2)(2n-1)(2n)}{2 \cdots (2n-2)(2n)} = \frac{(2n)!}{(2 \cdot 1) \cdots \big[2(n-1)\big](2 \cdot n)} = \frac{(2n)!}{2^n\big[1 \cdots (n-1)n\big]} = \frac{(2n)!}{2^n n!}.$$

REMARK 4 Let now X be N(μ, σ²). Then (X − μ)/σ is N(0, 1). Hence

$$E\Big(\frac{X - \mu}{\sigma}\Big) = 0 \quad \text{and} \quad \sigma^2\Big(\frac{X - \mu}{\sigma}\Big) = 1.$$

But

$$E\Big(\frac{X - \mu}{\sigma}\Big) = \frac{1}{\sigma} EX - \frac{\mu}{\sigma}. \quad \text{Hence} \quad \frac{1}{\sigma} EX - \frac{\mu}{\sigma} = 0,$$

so that E(X) = μ. Next,

$$\sigma^2\Big(\frac{X - \mu}{\sigma}\Big) = \frac{1}{\sigma^2}\, \sigma^2(X), \quad \text{and then} \quad \frac{1}{\sigma^2}\, \sigma^2(X) = 1,$$

so that σ²(X) = σ².
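The formula E(X²ⁿ) = (2n)!/(2ⁿn!) can be checked against numerical integration; a sketch (illustrative only, not from the text):

```python
import math
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) p.d.f.

for n in (1, 2, 3, 4):
    moment, _ = quad(lambda x: x**(2 * n) * phi(x), -np.inf, np.inf)
    closed = math.factorial(2 * n) / (2**n * math.factorial(n))
    print(2 * n, round(moment, 6), closed)    # 2: 1, 4: 3, 6: 15, 8: 105
```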

2. Let X be Gamma with parameters α and β. Then E(X) = αβ and σ²(X) = αβ². In fact,

$$EX = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_{0}^{\infty} x \cdot x^{\alpha-1} e^{-x/\beta}\,dx = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_{0}^{\infty} x^{\alpha} e^{-x/\beta}\,dx = -\frac{\beta}{\Gamma(\alpha)\beta^{\alpha}} \Big( x^{\alpha} e^{-x/\beta}\Big|_0^{\infty} - \alpha \int_{0}^{\infty} x^{\alpha-1} e^{-x/\beta}\,dx \Big)$$

$$= \alpha\beta \cdot \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_{0}^{\infty} x^{\alpha-1} e^{-x/\beta}\,dx = \alpha\beta.$$

Next,

$$E(X^2) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_{0}^{\infty} x^{\alpha+1} e^{-x/\beta}\,dx = \beta^2 \alpha(\alpha + 1),$$

and hence

$$\sigma^2(X) = \beta^2\alpha(\alpha + 1) - \alpha^2\beta^2 = \alpha\beta^2(\alpha + 1 - \alpha) = \alpha\beta^2.$$

REMARK 5
i) If X is χ²ᵣ, that is, if α = r/2, β = 2, we get E(X) = r, σ²(X) = 2r.
ii) If X is Negative Exponential, that is, if α = 1, β = 1/λ, we get E(X) = 1/λ, σ²(X) = 1/λ².

3. Let X be Cauchy. Then E(Xⁿ) does not exist for any n ≥ 1. For example, for n = 1, we have

$$I = \frac{\sigma}{\pi} \int_{-\infty}^{\infty} \frac{x\,dx}{\sigma^2 + (x - \mu)^2}.$$

For simplicity, we set μ = 0, σ = 1 and we have

$$I = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{x\,dx}{1 + x^2} = \frac{1}{\pi}\Bigg(\frac{1}{2}\int_{-\infty}^{\infty} \frac{d(x^2)}{1 + x^2}\Bigg) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{d(1 + x^2)}{1 + x^2} = \frac{1}{2\pi} \log\big(1 + x^2\big)\Big|_{-\infty}^{\infty} = \infty - \infty,$$

which is an indeterminate form. Thus the Cauchy distribution is an example of a distribution without a mean.

REMARK 6 In somewhat advanced mathematics courses, one encounters sometimes the so-called Cauchy Principal Value Integral. It is an improper integral in which the limits are taken symmetrically. This coincides with the improper Riemann integral when the latter exists, and it often exists even if the Riemann integral does not. In terms of the principal value integral, we have, for the Cauchy mean,

$$I^* = \lim_{A\to\infty} \frac{1}{\pi} \int_{-A}^{A} \frac{x\,dx}{1 + x^2} = \frac{1}{2\pi} \lim_{A\to\infty} \log\big(1 + x^2\big)\Big|_{-A}^{A} = \frac{1}{2\pi} \lim_{A\to\infty} \Big[\log\big(1 + A^2\big) - \log\big(1 + A^2\big)\Big] = 0.$$

Thus the mean of the Cauchy exists in terms of principal value, but not in the sense of our definition, which requires absolute convergence of the improper Riemann integral involved.
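The non-existence of the Cauchy mean shows up vividly in simulation: sample means do not settle down as the sample size grows, in contrast to what the law of large numbers gives for distributions with a finite mean. A sketch (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_cauchy(1_000_000)

# Running sample means at increasing sample sizes: no convergence.
for m in (10**2, 10**3, 10**4, 10**5, 10**6):
    print(m, x[:m].mean())
```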

distributed as Negative Binomial with parameters r and p. (Hint: In (ii)–(iv).2. 5.2. determine the constant c.2. ﬁnd a lower bound for the probability that the observed frequency X/n does not differ from 0. Exercises 5. if X has the Geometric distribution. (m + n) (m + n − 1) mnr m + n − r 2 5.2. p). 5. m+n σ2 X = ( ) ( ) .6 If X is an r. but not in the sense of our deﬁnition which requires absolute convergence of the improper Riemann integral involved. with a Hypergeometric distribution.7 Let X be an r.v.5 by more 0. 5.2 An honest coin is tossed independently n times and let X be the r. 5. iii) Determine the smallest value of n for which the probability that X/n does not differ from 0. σ 2(X) = rq/p2. Find the expected number of people with blood of each one of the four types and the variance about these numbers. distributed as P(λ). Then show that . calculate the kth factorial moment E[X(X − 1) · · · (X − k + 1)].5| < c) ≥ 0. show that EX = rq/p.8 Let f be the Gamma density with parameters α = n.v.1.2.2. denoting the number of H’s that occur.7 in Chapter 3 and ﬁnd the expected number of particles to reach the portion of space under consideration there during time t and the variance about this number.2.95.2.1 is at least 0.16 in Chapter 3 and suppose that 100 people are chosen at random. iii) If n = 100. utilize Tchebichev’s inequality.v. ii) Use (i) in order to show that EX = q/p and σ 2(X) = q/p2.3 Refer to Exercise 3.5 by more than 0. ii) By working as in the Binomial example. calculate the kth factorial moment E[X(X − 1) · · · (X − k + 1)].4 If X is an r. iv) If n = 50 and P(|(X/n) − 0.5 Refer to Exercise 3.9. use an approach similar to the one used in the Binomial example in order to show that EX = mr .5. σ 2(X/n). 5. iii) Calculate E(X/n).) 5. β = 1.v. distributed as B(n.2.1 Moments of Random Exercises Variables 119 Thus the mean of the Cauchy exists in terms of principal value.1 If X is an r.v.2.

3. .2.2. If γ1 > 0. of a Poisson distribution at the points j = 1. ii) Show that EX n = ( ) ( ).000 and 50% of the amount is refunded if its lifetime is <2. Karl Paerson.2. ⋅ ⋅ ⋅ .2.v. Compute the proportion of the residents who consume more than 15 × 103 cubic feet monthly.v.000. having the Beta distribution with parameters α and β. 5. compute EX. . X and is a measure of asymmetry of the distribution.v.10 Refer to Exercise 4. Suppose further that the manufacturer sells an item on money-back guarantee terms if the lifetime of the tube is less than c.v.12 Let X be an r.2. for example. Furthermore. 2. the distribution is said to be skewed to .11 then If X is an r.7 in Chapter 3 and suppose that each TV tube costs $7 and that it sells for $11. . suppose that a bulb is sold under the following terms: The entire amount is refunded if its lifetime is <1. X is distributed as Lognormal with parameters α and β . Tables of the Incomplete Γ-Function.9 Refer to Exercise 3. σ 2(X).1. 5.f.12 in Chapter 4 and suppose that each bulb costs 30 cents and sells for 50 cents. Then show that E|X| = ∞. 1957. 5.2.v.13 If the r. ii) Express his expected gain (or loss) in terms of c and λ. 2. ii) Use (i) in order to ﬁnd EX and σ 2(X).2. Γ (α )Γ (α + β + n) Γ α +β Γ α +n n = 1. with ﬁnite third moment and set μ = EX. 5.14 Suppose that the average monthly water consumption by the residents of a certain community follows the Lognormal distribution with μ = 104 cubic feet and σ = 103 cubic feet monthly. 5. 5. Compute the expected gain (or loss) of the dealer. Deﬁne the (dimensionless quantity. ⎝ σ ⎠ 3 γ1 is called the skewness of the distribution of the r. σ 2 = σ 2(X). pure number) γ1 by ⎛ X − μ⎞ γ 1 = E⎜ ⎟ .120 5 Moments of Random Variables—Some Moment and Probability Inequalities n −1 −λ ∫λ f ( x)dx = ∑0 e x= ∞ λx . editor) in order to evaluate the d.15 Let X be an r. one may utilize the Incomplete Gamma tables (see. Cambridge University Press. distributed as Cauchy with parameters μ and σ 2. x! Conclude that in this case. . ii) For what value of c will he break even? 5.

λ > 0.f. 0 < p < 1.v.v. ⎝ σ ⎠ 4 where μ = EX .v. then E X X −1 ⋅ ⋅ ⋅ X −k +1 = [ ( ) ( )] dk Gt dt k () t =1 . iii) Show that if |EX| < ∞.13(iii) in Chapter 3). Set G t = ∑ pj t j . ( ) γ2 is called the kurtosis of the distribution of the r. The function G is called the generating function of the sequence {pj}. q = 1 − p . taking on the values j with probability pj = P(X = j). 5. j ≥ 0. p) is skewed to the right for p < skewed to the left for p > 1 . .2.16 Let X be an r. 1) p. 2 1 2 and is iii) The Poisson distribution P(λ) and the Negative Exponential distribution are always skewed to the right. β). X and is a measure of “peakedness” of this distribution.d. . j =0 () ∞ − 1 ≤ t ≤ 1. .2. iii) Also show that if |E[X(X − 1) · · · (X − k + 1)]| < ∞. is a measure of reference. iii) The Binomial distribution B(n.3. the distribution is called platykurtic. ii) γ2 > 0 if X has the Double Exponential distribution (see Exercise 3. then γ1 = 0.1 Moments of Random Exercises Variables 121 the right and if γ1 < 0. Then show that: iii) If the p. If γ2 > 0.f. Then show that: ii) γ2 < 0 if X is distributed as U(α. σ 2 = σ 2 X . the distribution is called leptokurtic and if γ2 < 0.5. where the N(0. . with EX4 < ∝ and deﬁne the (pure number) γ2 by ⎛ X − μ⎞ γ 2 = E⎜ ⎟ − 3. ⎨e j! ⎭ ⎩ j ≥ 0. of X is symmetric about μ. 1. j = 0. 5.d. j ≥ 0. ⎪⎝ j ⎠ ⎪ ⎩ ⎭ and ⎧ − λ λj ⎫ ⎬. iii) Find the generating function of the sequences ⎧⎛ n⎞ j n− j ⎫ ⎪ ⎪ ⎨⎜ ⎟ p q ⎬.17 Let X be an r. then EX = d/dt G(t)|t = 1. the distribution is said to be skewed to the left.

1 and 5. Thus 1 n 1 m 1 m ⎧∑ x f x x 2 2 1 ⎪ E X 2 X1 = x1 = ⎨ x ⎪ ∞ x f x x dx .f f(x1. . X2)′ has the Bivariate Normal distribution. and a function of X1.f. σ2 ( ) Let X1. 5.f. . respectively. . xj |xi .2. X2 be two r. σ 2 (1 − ρ2)) r. where b = μ2 + ρσ 2 x1 − μ1 .122 5 Moments of Random Variables—Some Moment and Probability Inequalities iv) Utilize (ii) and (iii) in order to calculate the kth factorial moments of X being B(n.4. the resulting moments are called conditional moments. f of the r. . p) and X being P(λ). x2). . that is. . In connection with this.3 Conditional Moments of Random Variables If.’s with joint p. vector X is replaced by a conditional p.2..d. in the preceding deﬁnitions. 2 1 2 ⎩∫−∞ 2 ⎧ ⎪∑ x 2 − E X 2 X1 = x1 ⎪ σ 2 X 2 X1 = x1 = ⎨ x ⎪ ∞ ⎪∫−∞ x 2 − E X 2 X1 = x1 ⎩ ( ( ) ) 2 2 [ [ ( ( ) ) ( )] f (x x ) 2 2 1 ( )] f (x x )dx . we have the following properties: 1 .v.f.d. . σ1 ( ) Similarly. . 2 2 1 2 For example. then f(x2 |x1) 2 is the p. Then E(X2|X1 = x1) is a function of x1.v.. σ1 ( ) Hence E X 2 X1 = x1 = μ 2 + ( ) ρσ 2 x1 − μ1 .d. E X1 X 2 = x 2 = μ1 + ( ) ρσ 1 x2 − μ2 . xi .v. . Then we may talk about the E[E(X2|X1)]. the p. if (X1. We just gave the deﬁnition of E(X2|X1 = x1) for all x1 for which f(x2|x1) is deﬁned. . . Compare the results with those found in Exercises 5. Replacing x1 by X1 and writing E(X2|X1) instead of E(X2|X1 = x1).d. and they are functions of xi . xi ). . of an N(b. for all x1 for which fX (x1) > 0. we then have that E(X2|X1 ) is itself an r. f(xj .

then E[E(X2|X1)] = E(X2) (that is. we have [ ( ) ] ( )( ∞ −∞ ) E X 2 g X1 X1 = x1 = ∫ x 2 g x1 f x 2 x1 dx 2 = g x1 = g x1 E X 2 X1 = x1 .5. We have ∞ ∞ E E X 2 X1 = ∫ ⎡ ∫ x 2 f x 2 x1 dx 2 ⎤ fX x1 dx1 ⎢ −∞ ⎥ −∞ ⎣ ⎦ 1 [( )] ( ) ( ) =∫ =∫ ∞ −∞ −∞ ∞ ∫ ∫ ∞ x 2 f x 2 x1 fX x1 dx 2 dx1 1 ( ( ) ( ) ) ) ∞ −∞ −∞ x 2 f x1 .3 Conditional Moments of Random Variables 5. in question). It sufﬁces to establish the property for the continuous case only. In particular. ( ) ( ) ( ) .v. by taking X2 = 1. g(X1) be a (measurable) function of X1 and let that E(X2) exists. x 2 dx1 dx 2 2 ( ) ∞ ∞ ∞ = ∫ x 2 ⎛ ∫ f x1 . the expectation of the conditional expectation of an r. REMARK 7 (CE2) Let X1. restricting ourselves to the continuous case.3. X2 be two r. is the same as the (unconditional) expectation of the r. Again. Set μ = E X 2 .v. we have E X 2 g X1 X1 = x1 = g x1 E X 2 X1 = x1 [ ( ) ] ( )( ) or E X 2 g X1 X1 = g X1 E X 2 X1 .’s. x 2 dx1 ⎞ dx 2 = ∫ x 2 fX x 2 dx 2 = E X 2 . (CV) Provided the quantities which appear below exist. Then for all x1 for which the conditional expectations below exist.v. we get [ ( ) ] ( ) ( ( )( ) ) ( )∫ ∞ −∞ x 2 f x 2 x1 dx 2 ( ) (CE2′) For all x1 for which the conditional expectations below exist. −∞ −∞ ⎝ −∞ ⎠ ( ( ) ( ) Note that here all interchanges of order of integration are legitimate because of the absolute convergence of the integrals involved.1 Some Basic Properties of the Conditional Expectation (CE1) If E(X2) and E(X2|X1) exist.1 123 5. x 2 dx 2 dx1 = ∫ ∞ −∞ −∞ ∫ ∞ x 2 f x1 . we have E[g(X1)|X1 = x1] = g(x1) (or E[g(X1)|X1] = g(X1)). for the proof for the discrete case is quite analogous. we have σ 2 E X 2 X1 ≤ σ 2 X 2 [( )] ( ) and the inequality is strict. unless X2 is a function of X1 (on a set of probability one). φ X1 = E X 2 X1 .

since E X 2 − φ X1 = μ − μ = 0. σ 2 X 2 ≥ E φ X1 − μ 2 = σ 2 E X 2 X1 . Y be jointly distributed with p.3. Therefore σ 2 X 2 = E X 2 − φ X1 and since [ ( )] 2 + E φ X1 − μ . ( 2 2 2 1 1 2 2 1 1 2 1 1 Next.v.124 5 Moments of Random Variables—Some Moment and Probability Inequalities Then σ 2 X2 = E X2 − μ 2 ( ) ) = E{[X − φ (X )] + [φ (X ) − μ ]} = E[ X − φ ( X )] + E[φ ( X ) − μ ] + 2 E{[ X − φ ( X )][φ ( X ) − μ ]}. given by .d. Let the r. Exercises 5.3.’s X. The inequality is strict unless E X 2 − φ X1 ( ) [( ) ] [( [ )] [ ( )] 2 = 0. 2 which is 0. by Remark 8.1 5. which follows. E X 2 − φ X1 φ X1 − μ 2 2 1 {[ ( )][ ( ) ]} = E[ X φ ( X )] − E[φ ( X )] − μE( X ) + μE[φ ( X )] = E{E[ X φ ( X ) X ]} − E[φ ( X )] − μE[ E( X X )] + μE[φ ( X )] (by (CE1)). [ ( )] ( )] Thus σ 2[X2 − φ(X1)] = 0 and therefore X2 = φ(X1) with probability one. 2 [( ) ] E X 2 − φ X1 we have [ ( )] ≥ 0.2 Establish properties (CEI) and (CE2) for the discrete case. 1 2 1 2 2 1 1 1 2 1 1 and this is equal to E φ 2 X1 − E φ 2 X1 − μE φ X1 + μE φ X1 [ ( )] [ ( )] ( ) [ ( )] [ ( )] (by (CE2)). But E X 2 − φ X1 [ ( )] 2 = σ 2 X 2 − φ X1 .f.

∑ x =1 x = 6 ) ( 5. y).1)×(0. . Y be r. σ 2(X). and let c > 0. .’s with p. y = ( ) 2 n n+1 ( ) if y = 1. σ 2(X|Y = y). Then show that into E h X gY X=x =h x E gY X=x.5. E(X|Y = y).d. . f given by f(x. 5. .d. . . ⋅ ⋅ ⋅ .d. Calculate the following quantities: EX.d. g be (measurable) functions on itself such that E[h(X)g(Y)] and Eg(X) exist. σ 2(Y). Hint: Recall that ∑ n=1 x = ( 2 ) . y) = λ2e−λ(x+y)I(0.3. xk)′ ∈ .v. with ﬁnite EX.∞)(x. Y and let h. .v. and 0 otherwise. y) = (x + y)I(0. . Then Pg X ≥c ≤ E g ( X) [ ( ) ] [ c ]. EY.f.3 Let X. ⋅ ⋅ ⋅ . y).5 Let X be an r. Y.v. EY. (Assume the existence of all p.f.3. ⋅ ⋅ ⋅ . where A = {(x1. σ 2(X|Y = y). xk) ≥ c}. . vector and g ≥ 0 be a real-valued (measurable) function deﬁned on k. ⋅ ⋅ ⋅ . . n.) 5.v. x = 1. f. g(x1. Then . x )dx k ( )( ) ) ( ) ⋅ ⋅ ⋅ dxk . Then ∞ ∞ E g X = ∫ ⋅ ⋅ ⋅ ∫ g x1 . so that g(X) is an r. σ 2(X). Then for any r. xk A A 1 k 1 ( )( × f ( x .1 Moments Moment Inequalities 125 f x.3. PROOF Assume X is continuous with p. xk dx1 ⋅ ⋅ ⋅ dxk + ∫ c g x1 . E(X|Y = y). and x n( n + 1)( 2 n + 1) n 2 .4 Let X.4 Some Important Applications: Probability and Moment Inequalities THEOREM 1 Let X be a k-dimensional r. show that E[E(X|Y)] = EX. ⋅ ⋅ ⋅ . [( )( ) ] ()[( ) ] 5. Y be r.’s X. .f. xk dx1 ⋅ ⋅ ⋅ dxk −∞ −∞ [ ( )] = ∫ g x1 . σ 2(Y). xk f x1 .’s needed. E(Y|X = x). .∞)×(0.6 Consider the r.4 Some Important Applications: Probability andof Random Variables 5. . Compute the following n n+1 quantities: E(X|Y = y). x.v. ⋅ ⋅ ⋅ .’s with p.f.3. . . Calculate the following quantities: EX. xk f x1 .v.1)(x.. f given by f(x. 5. .

P X − μ < kσ ≥ 1 − 2 . with mean μ and variance σ 2 = 0. imply then that P(X = μ) = 1 (see also Exercise 5.126 5 Moments of Random Variables—Some Moment and Probability Inequalities E g X ≥ ∫ g x1 . The proof is completely analogous if X is of the discrete type. ( ) ( ) and ( ) E( XY ) = −1 PROOF E XY = 1 if any only if if any only if ( ) P(Y = − X ) = 1. 2.v. and take g(X) = |X − μ|r. μ = E(X).6). ( ) ( ) Then E 2 XY ≤ 1 or. Then EX −μ r P X − μ ≥ c = P⎡ X − μ ≥ cr ⎤ ≤ .v.4. xk dx1 ⋅ ⋅ ⋅ dxk A A 1 k 1 k [ ( )] Hence P[g(X) ≥ c] ≤ E[g(X)]/c. equivalently.’s such that E X = E Y = 0. We have 0 ≤ E X −Y ( ) 2 = E X 2 − 2 XY + Y 2 ( = EX 2 − 2 E XY + EY 2 = 2 − 2 E XY ( ) ) ( ) . x )dx ⋅ ⋅ ⋅ dx = cP[ g(X ) ∈ A] = cP[ g(X ) ≥ c]. 2 k k [ ] Let X be an r. then P X − μ ≥ kσ ≤ REMARK 8 [ ] 1 1 . ⋅ ⋅ ⋅ . This result and Theorem 2. ( 5. Let X be an r. − 1 ≤ E XY ≤ 1. xk f x1 .v. In Markov’s inequality replace r by 2 to obtain 2 ⎛ ⎞ EX −μ P X − μ ≥ c ⎜ = P ⎡ X − μ ≥ c 2 ⎤⎟ ≤ ⎢ ⎥ ⎝ ⎣ ⎦⎠ c2 [ ] 2 = σ2 X c 2 ( )=σ 2 2 c . P Y = X = 1. ⋅ ⋅ ⋅ . Chapter 2.4. ( ) ( ) σ 2 X = σ 2 Y = 1. r > 0. In particular.1 Special Cases of Theorem 1 r 1. LEMMA 1 Let X and Y be r. ⎢ ⎥ ⎣ ⎦ cr [ ] This is known as Markov’s inequality. equivalently. ⋅ ⋅ ⋅ . Then Tchebichev’s inequality gives: P[|X − μ| ≥ c] = 0 for every c > 0. ▲ )( ) ≥ c ∫ f ( x . This is known as Tchebichev’s inequality. all one has to do is to replace integrals by summation signs. if c = kσ.

−σ 1σ 2 ≤ E X − μ1 Y − μ 2 ≤ σ 1σ 2 . μ2 and (positive) variances σ 12 . σ2 . equivalently. and if P(Y = −X) = 1. that is. Then 2 E 2 X − μ1 Y − μ 2 ≤ σ 12σ 2 . and E X − μ1 Y − μ 2 [( )( )] [( )( )] = σ σ 1 2 if and only if ⎡ ⎤ σ P ⎢Y = μ 2 + 2 X − μ1 ⎥ = 1 σ1 ⎣ ⎦ ( ) and E X − μ1 Y − μ 2 [( )( )] = −σ σ 1 2 if and only if ⎡ ⎤ σ P ⎢Y = μ 2 − 2 X − μ1 ⎥ = 1. so that −1 ≤ E(XY) ≤ 1.5.1 Moments Moment Inequalities 127 and 0 ≤ E X +Y ( ) 2 = E X 2 + 2 XY + Y 2 2 ( = EX + 2 E XY + EY 2 = 2 + 2 E XY . then E(XY) = −EY2 = −1. so that P X = −Y = 1. σ 2 . Then E(XY) = EY2 = 1. Finally. let E(XY) = 1. Conversely. ▲ THEOREM 2 ( ) (Cauchy–Schwarz inequality) Let X and Y be two random variables with 2 means μ1. σ1 ⎣ ⎦ ( ) PROOF Set X1 = X − μ1 . Now let P(Y = X) = 1. let E(XY) = −1. respectively. Then σ 2 X −Y = E X −Y = EX 2 ( ) ( ) − [E(X − Y )] = E(X − Y ) − 2 E( XY ) + EY = 1 − 2 + 1 = 0. σ1 Y1 = Y − μ2 . [( )( )] or. Then σ 2(X + Y) = 2 + 2E(XY) = 2 − 2 = 0.4 Some Important Applications: Probability andof Random Variables 5. P(X = Y) = 1. ( ) ) ( ) Hence E(XY) ≤ 1 and −1 ≤ E(XY). 2 2 2 2 so that P(X = Y) = 1 by Remark 8.

1 Establish Theorem 1 for the discrete case. g(−x) = g(x)) and nondecreasing for x ≥ 0. . Then. X and any ε > 0. This is established as follows: Since the inequality is trivially true if either one of EX2. and hence E 2 X1Y1 ≤ 1 ( ) if and only if −1 ≤ E X1Y1 ≤ 1 becomes E 2 X − μ1 Y − μ 2 ( ) [( )( σ σ 2 1 2 2 )] ≤ 1 if and only if −σ 1σ 2 ≤ E X − μ1 Y − μ 2 ≤ σ 1σ 2 . ∞). for any 5.4.4.4.v. use Tchebichev’s inequality in order to ﬁnd a lower bound for the probability P(|X − μ| < kσ). Compare the lower bounds for k = 1. both ﬁnite.4. If furthermore g is even (that is. 2. EY2 is ∞. into (0.3 For an r. 3 with the respective probabilities when X ∼ N(μ. suppose that they are both ﬁnite and set Z = λX − Y. then P X ≥ε ≤ Eg X ( ). P g X ≥ε ≤ [( ) ] ( ) Eg X ε ( ). REMARK 9 [( )( )] Exercises 5. Y1 are as in the previous lemma.v.6). The second half of the conclusion follows similarly and will be left as an exercise (see Exercise 5. or E2(XY) ≤ (EX2)(EY2). which happens if and only if E2(XY) − (EX2)(EY2) ≤ 0 (by the discriminant test for quadratic equations). ▲ A more familiar form of the Cauchy–Schwarz inequality is E2(XY) ≤ (EX2)(EY2).128 5 Moments of Random Variables—Some Moment and Probability Inequalities Then X1. where λ is a real number. Then 0 ≤ EZ2 = (EX2)λ2 − 2[E(XY)]λ + EY2 for all λ.2 Let g be a (measurable) function deﬁned on r. X with EX = μ and σ 2(X) = σ 2. g (ε ) 5. σ 2).

To this end. with EX = μ (ﬁnite) such that P(|X − μ| ≥ c) = 0 for every c > 0. 5. which are assumed to be positive. = σ 1σ 2 [( )( )] = Cov( X . Y) or ρX. From this stems the signiﬁcance of ρ as a measure of linear dependence between X and Y. E[(X − μ1)(Y − μ2)]. Correlation Coefﬁcient of Random Variables 5. and compare this bound with the exact value found from Table 3 in Appendix III. we say that X and Y are uncorrelated. then the covariance of (X − μ1)/σ1. that is.v.4.5).Y or ρ12 or just ρ if no confusion is possible. 5. σ2 are the standard deviations of X and Y. then P(X = μ) = 1. 1)-joint central mean. and ρ = −1 if and only if Y = μ2 − σ2 X − μ1 σ1 ( ) with probability 1. the (1. 5. ⎡⎛ X − μ1 ⎞ ⎛ Y − μ 2 ⎞ ⎤ E X − μ1 Y − μ 2 ρ = E ⎢⎜ ⎟⎜ ⎟⎥ = σ 1σ 2 ⎢⎝ σ 1 ⎠ ⎝ σ 2 ⎠ ⎥ ⎣ ⎦ E XY − μ1 μ 2 . is called the covariance of X. and ρ = 1 if and only if Y = μ2 + σ2 X − μ1 σ1 ( ) with probability 1.v. while if ρ = ±1.’s and provide an interpretation for the latter.5 Covariance.v. we say that X and Y are completely correlated (positively if ρ = 1.1. If σ1. we have that ρ2 ≤ 1. X. for two r.4.7 For any r.v. use the Cauchy–Schwarz inequality in order to show that E|X| ≤ E1/2X2.’s X and Y with means μ1.4. Y ) σ 1σ 2 ( ) From the Cauchy–Schwarz inequality. Use Tchebichev’s inequality in order to ﬁnd a lower bound for the probability P(|(X/40) − 1| ≤ 0. we introduce the concepts of covariance and correlation coefﬁcient of two r. μ2.6 Prove the second conclusion of Theorem 2. that is.5 Covariance. Y and is denoted by Cov(X. .1 Moments and Its Interpretation 129 2 5.5. distributed as χ 40 . 5. Correlation Coefﬁcient and Its Interpretation In this section. Y and is denoted by ρ(X. So ρ = ±1 means X and Y are linearly related. 5.v.Y). (See Fig. that is −1 ≤ ρ ≤ 1.4 Let X be an r.) If ρ = 0.4. negatively if ρ = −1). (Y − μ2)/σ2 is called the correlation coefﬁcient of X.5 Refer to Remark 8 and show that if X is an r.

Recalling that the distance of the point (x0. Y)) as a measure of co-linearity of the r. To this end. ( ) ( ) 2 (1) Taking the expectations of both sides in (1) and recalling that .130 5 Moments of Random Variables—Some Moment and Probability Inequalities Y 2 2 1 1 2 y y 2 2 1 (x (x 1) 2 2 2 1 1) 2 1 1 0 Figure 5. Y) from the line y = μ 2 + σ ( x − μ 1 ) . negatively if ρ < 0). y0) from the a 2 + b 2 . σ2 σ2 2 ⎡ ⎛σ μ ⎞⎤ σ D = ⎢ X − 1 Y + ⎜ 1 2 − μ1 ⎟ ⎥ σ2 ⎝ σ2 ⎠⎦ ⎢ ⎥ ⎣ 2 2 ⎛ σ2⎞ 1 + 12 ⎟ .’s X and Y. Positive values of ρ may indicate that there is a tendency of large values of Y to correspond to large values of X and small values of Y to correspond to small values of X. Values of ρ close to zero may also indicate that these tendencies are weak. we have in the present line ax + by + c = 0 is given by ax0 + by0 + c case: 2 1 D= X − ⎛σ μ ⎞ σ1 Y + ⎜ 1 2 − μ1 ⎟ σ2 ⎝ σ2 ⎠ 1+ σ 12 . Negative values of ρ may indicate that small values of Y correspond to large values of X and large values of Y to small values of X. ρ ≠ 0. for ρ > 0.v. while values of ρ close to ±1 may indicate that the tendencies are strong. Thus. b = − σ1 σμ and c = 1 2 − μ1 . ⎜ σ2 ⎠ ⎝ and we wish to evaluate the expected squared distance of (X. The following elaboration sheds more light on the intuitive interpretation of the correlation coefﬁcient ρ(= ρ(X. we say that X and Y are correlated (positively if ρ > 0. 2 σ2 since here a = 1. we ﬁnd σ 1 (σ 2 1 2 2 + σ 2 D 2 = σ 2 X 2 + σ 12Y 2 − 2σ 1σ 2 XY + 2σ 2 σ 1 μ 2 − σ 2 μ1 X ) ( ) − 2σ 1 σ 1 μ 2 − σ 2 μ1 Y + σ 1 μ 2 − σ 2 μ1 . that is. Y) from the above line.1 X 1 For −1 < ρ < 1. Carrying out the calculations. consider the line y = μ 2 + σ ( x − μ 1 ) σ in the xy-plane and let D be the distance of the (random) point (X. ED2.

Thereσ σ fore. published in the American Statistician. determine the coefﬁcients A. His approach is somewhat different and is outlined below. look at the point (X1. (2) Working likewise for the case that ρ < 0.1 Moments and Its Interpretation 131 2 2 EX 2 = σ 12 + μ12 .’s X and Y. ( ) we obtain ED2 = 2 2σ 12σ 2 1− ρ 2 σ 12 + σ 2 ( ) (ρ > 0). which is equivalent to saying that the pairs (X. we move from the xy-plane to the x1y1-plane. Y) tend to be arranged along the line y = μ 2 + σ ( x − μ 1 ) . the “normalized” − r. exploiting the interpretation of an expectation as an average.v. The preceding discussion is based on the paper “A Direct Development of the Correlation Coefﬁcient” by Leo Katz. For ρ = 0. by the observation just made. consider the r. Y) may lie anywhere in the xy-plane. Y1) is minimum. Y1) and seek the line Ax1 + By1 + C = 0 from which the expected squared distance of (X1. 2 σ 12 + σ 2 ( ) (4) At this point. regardless of the value of ρ. Y) lies on the line y = μ2 + σ ( x − μ 1 ) or y = μ 2 − σ ( x − μ 1 ). page 170. (3) For ρ = 1 or ρ = −1. Vol. Y) tend to be arranged along the line y = μ 2 − σ ( x − μ 1 ) .’s X and Y. so that ED1 is minimum. respectively (with probability 1).5.v. It is in this sense that ρ is a measure of co-linearity of the r. These points get closer and closer σ to this line as ρ gets closer to 1. For ρ < 0.v. That 2 is. B and C. and lie on this line for ρ = −1. relation (4) indicates the following: For ρ > 0. unlike the original r. the pairs (X. and lie on the line for ρ = 1.5 Covariance.’s X1 and Y1 are dimensionless. where 2 1 2 1 2 1 2 1 1 1 2 2 . In this latter plane.’s X1 and Y1 as deﬁned in the proof of Theorem 2. we already know that (X. the expected distance is constantly equal to σ 2 2 2 σ 12 σ 2 /( σ 12 + σ 2 ) from either one of the lines y = μ 2 + σ ( x − μ 1 ) and σ y = μ 2 − σ ( x − μ 1 ) . EY 2 = σ 2 + μ 2 and E XY = ρσ 1σ 2 + μ1 μ 2 . the σ pairs (X. These points get closer and closer to this line as ρ gets closer to −1. we get ED2 = 2 2σ 12σ 2 1+ ρ 2 σ 12 + σ 2 ( ) (ρ < 0). First. Correlation Coefﬁcient of Random Variables 5. relations (2) and (3) are summarized by the expression 2 2 1 1 ED2 = 2 2σ 12σ 2 1− ρ .v. Through the transformations x1 = x σ μ y−μ and y1 = σ . 29 (1975).

the +B 2 main diagonal. ED1 is minimized for A22AB2 = 1 and the minimum is 1 + ρ. Also observe that both minima of ED1 (for ρ > 0 and ρ < 0). and y1 = −x1. there is no minimizing line (through the origin) Ax1 + By1 = 0. taking expectations. σ respectively. if ρ = 0. for ρ = 0. namely. and A22AB2 = 1 if and only if A = −B. +B 2 If ρ < 0. and EX1 = EY1 = 0. the ED1 is constantly equal to 1. A2 + B2 (7) From (6) and (7). −1 ≤ 2 AB ≤ 1. A22AB2 = −1 if and only if A = +B B. ED1 is minimized for A22AB2 = −1 and the minimum is 1 − ρ. ( ) we obtain ED12 = 1 + 2 ABρ C2 + 2 . The corresponding lines are y1 = x1. the ED1 is minimized for the line y1 = x1. for 2 2 ρ < 0.132 5 Moments of Random Variables—Some Moment and Probability Inequalities D1 = AX 1 + BY1 + C1 noticing that 2 A2 + B2 . 2 1 2 1 . there is no minimizing line. Then. A2 + B2 A + B2 (5) 2 Clearly. observe that − A+ B 2 ABρ . 2 To summarize: For ρ > 0. However. we conclude that: 2 If ρ > 0. for ED1 to be minimized it is necessary that C = 0. and its constant value 1 (for ρ = 0) are expressed by a single form. the expression to be minimized is ED12 = 1 + At this point. the interpretation of ρ as a measure of co-linearity (of X1 and Y1) is argued as above. with the lines y = μ 2 + σ ( x − μ 1 ) σ and y = μ 2 − σ ( x − μ 1 ) being replaced by the lines y1 = x1 and y1 = −x1. and E X1Y1 = ρ. ( ) ( ) ( ) 2 or equivalently. Expanding D1 . 1 − |ρ|. A2 + B2 (6) ( ) 2 = − A2 + B2 − 2 AB ≤ 0 ≤ A2 + B2 − 2 AB = A − B . ED1 = 1. the ED1 is minimized for the line y1 = −x1. by (5). From this point on. EX12 = EY12 = 1. +B 2 Finally.

5.1 Moments of Random Exercises Variables

133

Exercises

5.5.1 Let X be an r.v. taking on the values −2, −1, 1, 2 each with probability 1 . Set Y = X2 and compute the following quantities: EX, σ 2(X), EY, σ 2(Y), 4 ρ(X, Y). 5.5.2 Go through the details required in establishing relations (2), (3) and (4). 5.5.3 Do likewise in establishing relation (5). 5.5.4 Refer to Exercise 5.3.2 (including the hint given there) and iii) Calculate the covariance and the correlation coefﬁcient of the r.v.’s X and Y; iii) Referring to relation (4), calculate the expected squared distance of (X, Y) from the appropriate line y = μ 2 + σ ( x − μ 1 ) or y = μ 2 − σ ( x − μ 1 ) σ σ (which one?);

2 2 1 1

iii) What is the minimum expected squared distance of (X1, Y1) from the appropriate line y = x or y = −x (which one?) where X 1 = Xσ− μ and − Y1 = Yσ μ .

1 1 2 2

⎛ ⎜ Hint: Recall that ⎜ ⎝

⎡n n+1 ∑x = ⎢ 2 ⎢ x =1 ⎣

n 3

(

) ⎤ .⎞ ⎥ ⎟

2

⎥ ⎟ ⎦ ⎠

5.5.5 Refer to Exercise 5.3.2 and calculate the covariance and the correlation coefﬁcient of the r.v.’s X and Y. 5.5.6 Do the same in reference to Exercise 5.3.3. 5.5.7 Repeat the same in reference to Exercise 5.3.4. 5.5.8 Show that ρ(aX + b, cY + d) = sgn(ac)ρ(X, Y), where a, b, c, d are constants and sgn x is 1 if x > 0 and is −1 if x < 0. 5.5.9 Let X and Y be r.v.’s representing temperatures in two localities, A and B, say, given in the Celsius scale, and let U and V be the respective temperatures in the Fahrenheit scale. Then it is known that U and X are related as 9 follows: U = 5 X + 32, and likewise for V and Y. Fit this example in the model of Exercise 5.5.8, and conclude that the correlation coefﬁcients of X, Y and U, V are identical, as one would expect. 5.5.10 Consider the jointly distributed r.v.’s X, Y with ﬁnite second moments ˆ ˆ and σ 2(X) > 0. Then show that the values α and β for which E[Y − (αX + β)]2 is minimized are given by

134

5

Moments of Random Variables—Some Moment and Probability Inequalities

ˆ ˆ β = EY − αEX ,

ˆ α=

( ) ρ X, Y . ( ) σ (X )

σY

**ˆ ˆ ˆ (The r.v. Y = α X + β is called the best linear predictor or Y, given X.)
**

5.5.11 If the r.v.’s X1 and X2 have the Bivariate Normal distribution, show that the parameter ρ is, actually, the correlation coefﬁcient of X1 and X2. (Hint: Observe that the exponent in the joint p.d.f. of X1 and X2 may be written as follows:

1

2 ⎡⎛ x − μ ⎞ 2 ⎛ x 1 − μ1 ⎞ ⎛ x 2 − μ2 ⎞ ⎛ x 2 − μ2 ⎞ ⎤ 1 ⎢⎜ 1 ⎟ ⎥ ⎟ +⎜ ⎟⎜ ⎟ − 2ρ ⎜ ⎢⎝ σ 1 ⎠ ⎝ σ1 ⎠⎝ σ2 ⎠ ⎝ σ2 ⎠ ⎥ ⎦ ⎣

2 1 − ρ2

(x =

(

)

2 1

1

− μ1 2σ

)

2

+

(x

2

−b

)

2 2

2⎛ σ 2 1 − ρ 2 ⎞ ⎝ ⎠

,

where b = μ 2 + ρ

σ2 x 1 − μ1 . σ1

(

)

This facilitates greatly the integration in calculating E(X1X2). 5.5.12 If the r.v.’s X1 and X2 have jointly the Bivariate Normal distribution 2 with parameters μ1, μ2, σ 12 , σ 2 and ρ, calculate E(c1X1 + c2X2) and σ2(c1X1 + c2X2) in terms of the parameters involved, where c1 and c2 are real constants. 5.5.13 For any two r.v.’s X and Y, set U = X + Y and V = X − Y. Then

iii) Show that P(UV < 0) = P(|X| < |Y|); iii) If EX2 = EY2 < ∞, then show that E(UV) = 0; iii) If EX2, EY2 < ∞ and σ 2(X) = σ 2(Y), then U and V are uncorrelated. 5.5.14 If the r.v.’s Xi, i = 1, . . . , m and Yj, j = 1, . . . , n have ﬁnite second moments, show that

n ⎛m ⎞ m n Cov⎜ ∑ X i , ∑ Yj ⎟ = ∑ ∑ Cov X i , Yj . ⎝ i =1 ⎠ i =1 j =1 j =1

(

)

**5.6* Justiﬁcation of Relation (2) in Chapter 2
**

As a ﬁnal application of the results of this chapter, we give a general proof of Theorem 9, Chapter 2. To do this we remind the reader of the deﬁnition of the concept of the indicator function of a set A.

5.6* Justiﬁcation of Relation (2) in Chapter 2 5.1 Moments of Random Variables

135

Let A be an event in the sample space S. Then the indicator function of A, denoted by IA, is a function on S deﬁned as follows:

⎧1 if s ∈ A IA s = ⎨ c ⎩0 if s ∈ A .

()

**The following are simple consequences of the indicator function: I I Aj = ∏ I A
**

n j=1

n

j =1

j

(8)

I and, in particular,

∑

n j=1

Aj = ∑ I A ,

j =1

j

n

(9)

IA = 1− IA.

c

(10)

Clearly,

E IA = P A

( )

r

( )

1

(11)

and for any X1, . . . , Xr, we have

(1 − X )(1 − X ) ⋅ ⋅ ⋅ (1 − X ) = 1 − H

1 2

+ H 2 − ⋅ ⋅ ⋅ + −1 H r ,

1 j

( )

r

(12)

where Hj stands for the sum of the products Xi · · · Xi , where the summation extends over all subsets {i1, i2, . . . , ij} of the set {1, 2, . . . , r}, j = 1, . . . , r. Let α, β be such that: 0 < α, β and α + β ≤ r. Then the following is true:

∑ Xi

jα

1

⎛α + β⎞ ⋅ ⋅ ⋅ X iα H β J α = ⎜ ⎟ Hα + β , ⎝ α ⎠

( )

(13)

where Jα = {i1, . . . , iα} is the typical member of all subsets of size α of the set {1, 2, . . . , r}, Hβ(Jα) is the sum of the products Xj · · · Xj , where the summation extends over all subsets of size β of the set {1, . . . , r} − Jα, and ∑J is meant to extend over all subsets Jα of size α of the set {1, 2, . . . , r}. The justiﬁcation of (13) is as follows: In forming Hα+β, we select (α + β) X’s r from the available r X’s in all possible ways, which is (α + β ). On the other hand, r −α for each choice of Jα, there are ( β ) ways of choosing β X’s from the remaining r r − (r − α) X’s. Since there are (α ) choices of Jα, we get (α )( r βα ) groups (products) of (α + β) X’s out of r X’s. The number of different groups of (α + β) X’s out

1

β

α

136

5

**Moments of Random Variables—Some Moment and Probability Inequalities
**

r − r of r X’s is (α + β ). Thus among the (α )( r βα ) groups of (α + β) X’s out of r X’s, the number of distinct ones is given by

r −α ! r! ⎛ r ⎞ ⎛r − α⎞ ⎜ ⎟⎜ ⎟ ⎝ α ⎠ ⎝ β ⎠ α ! r − α ! β! r − α − β ! = r! ⎛ r ⎞ ⎜ ⎟ α + β ! r −α − β ! ⎝α + β⎠

(

( ) ( )(

)

)

=

This justiﬁes (13). Now clearly,

(α + β )! = ⎛α + β ⎞ .

α ! β!

⎜ ⎟ ⎝ α ⎠

(

)

**Bm = ∑ Ai ∩ ⋅ ⋅ ⋅ ∩ Ai ∩ Aic ∩ ⋅ ⋅ ⋅ ∩ Aic ,
**

1 m m +1 M

Jm

where the summation extends over all choices of subsets Jm = {i1, . . . , im} of the set {1, 2, . . . , M} and Bm is the one used in Theorem 9, Chapter 2. Hence

IB = ∑ IA

m

Jm

i1 ∩

⋅ ⋅ ⋅ ∩ Aim ∩ Aic + 1 ∩ ⋅ ⋅ ⋅ ∩ AicM m

= ∑ IA ⋅ ⋅ ⋅ IA 1− IA

Jm

i1 im i1 im

(

im + 1

) ⋅ ⋅ ⋅ (1 − I ) (by (8), (9), (10))

AiM

= ∑ I A ⋅ ⋅ ⋅ I A ⎡1 − H1 J m + H 2 J m − ⋅ ⋅ ⋅ + −1 ⎢ ⎣ J

m

( )

( )

( )

M −m

H M −m J m ⎤ ⎥ ⎦

( )

(by (12)).

Since

∑IA

Jm

i1

⎛ m + k⎞ ⋅ ⋅ ⋅ I A H k Jm = ⎜ ⎟ H m+ k ⎝ m ⎠

im

( )

(by (13)),

( )

M −m

we have

⎛ m + 1⎞ ⎛ m + 2⎞ I B = Hm − ⎜ ⎟ H m+1 + ⎜ ⎟ H m+ 2 − ⋅ ⋅ ⋅ + −1 ⎝ m ⎠ ⎝ m ⎠

m

⎛ M⎞ ⎜ ⎟ HM . ⎝ m⎠

Taking expectations of both sides, we get (from (11) and the deﬁnition of Sr in Theorem 9, Chapter 2)

⎛ m + 1⎞ ⎛ m + 2⎞ P Bm = S m − ⎜ ⎟ S m+1 + ⎜ ⎟ S m+ 2 − ⋅ ⋅ ⋅ + −1 ⎝ m ⎠ ⎝ m ⎠

( )

( )

M −m

⎛ M⎞ ⎜ ⎟ SM , ⎝ m⎠

as was to be proved.

5.6* Justiﬁcation of Relation (2) in Chapter 2 5.1 Moments of Random Variables

137

(For the proof just completed, also see pp. 80–85 in E. Parzen’s book Modern Probability Theory and Its Applications published by Wiley, 1960.)

REMARK 10 In measure theory the quantity IA is sometimes called the characteristic function of the set A and is usually denoted by χA. In probability theory the term characteristic function is reserved for a different concept and will be a major topic of the next chapter.

138

6

Characteristic Functions, Moment Generating Functions and Related Theorems

Chapter 6

Characteristic Functions, Moment Generating Functions and Related Theorems

6.1 Preliminaries

The main subject matter of this chapter is the introduction of the concept of the characteristic function of an r.v. and the discussion of its main properties. The characteristic function is a powerful mathematical tool, which is used proﬁtably for probabilistic purposes, such as producing the moments of an r.v., recovering its distribution, establishing limit theorems, etc. To this end, recall that for z ∈ , eiz = cos z + i sin z, i = −1 , and in what follows, i may be treated formally as a real number, subject to its usual properties: i2 = −1, i3 = −i, i 4 = 1, i 5 = i, etc. The sequence of lemmas below will be used to justify the theorems which follow, as well as in other cases in subsequent chapters. A brief justiﬁcation for some of them is also presented, and relevant references are given at the end of this section.

LEMMA A

**Let g1, g2 : {x1, x2, . . .} → [0, ∞) be such that
**

g1 x j ≤ g 2 x j ,

( )

( )

j = 1, 2, ⋅ ⋅ ⋅ ,

**and that ∑xj g2(xj) < ∞. Then ∑xj g1(xj) < ∞.
**

PROOF LEMMA A′

If the summations are ﬁnite, the result is immediate; if not, it follows by taking the limits of partial sums, which satisfy the inequality.

b

Let g1, g2 : → [0, ∞) be such that g1(x) ≤ g2(x), x ∈ , and that ∫a g1 (x)dx ∞ ∞ exists for every a, b, ∈ with a < b, and that ∫−∞ g 2 (x)dx < ∞. Then ∫−∞ g1(x)dx < ∞.

PROOF

Same as above replacing sums by integrals.

138

6.5

The Moment Generating Function 6.1 Preliminaries

139

LEMMA B

Let g : {x1, x2, . . .} →

PROOF

and ∑xj|g(xj)| < ∞. Then ∑xj g(xj) also converges.

The result is immediate for ﬁnite sums, and it follows by taking the limits of partial sums, which satisfy the inequality.

LEMMA B′

**Let g : → be such that ∫a g(x)dx exists for every a, b, ∈ ∞ ∞ ∫−∞ |g( x ) |dx < ∞. Then ∫−∞ g(x)dx also converges.
**

b

with a < b, and that

PROOF

Same as above replacing sums by integrals.

The following lemma provides conditions under which the operations of taking limits and expectations can be interchanged. In more advanced probability courses this result is known as the Dominated Convergence Theorem.

LEMMA C

Let {Xn}, n = 1, 2, . . . , be a sequence of r.v.’s, and let Y, X be r.v.’s such that |Xn(s)| ≤ Y(s), s ∈ S, n = 1, 2, . . . and Xn(s) → X(s) (on a set of s’s of ⎯ ⎯ probability 1) and E(Y) < ∞. Then E(X) exists and E(Xn) ⎯n→∞→ E(X), or equivalently,

lim E X n = E lim X n .

n→∞ n→∞

( )

(

)

REMARK 1

The index n can be replaced by a continuous variable.

The next lemma gives conditions under which the operations of differentiation and taking expectations commute.

LEMMA D

For each t ∈ T (where T is or an appropriate subset of it, such as the interval [a, b]), let X(·; t) be an r.v. such that (∂∂t)X(s; t) exists for each s ∈S and t ∈T. Furthermore, suppose there exists an r.v. Y with E(Y) < ∞ and such that

∂ X s; t ≤ Y s , ∂t

( )

()

s ∈ S , t ∈T .

Then

⎡∂ ⎤ d E X ⋅; t = E ⎢ X ⋅; t ⎥, for all t ∈T . dt ⎣ ∂t ⎦

[ ( )]

( )

The proofs of the above lemmas can be found in any book on real variables theory, although the last two will be stated in terms of weighting functions rather than expectations; for example, see Advanced Calculus, Theorem 2, p. 285, Theorem 7, p. 292, by D. V. Widder, Prentice-Hall, 1947; Real Analysis, Theorem 7.1, p. 146, by E. J. McShane and T. A. Botts, Van Nostrand, 1959; The Theory of Lebesgue Measure and Integration, pp. 66–67, by S. Hartman and J. Mikusinski, Pergamon Press, 1961. Also Mathematical ´ Methods of Statistics, pp. 45–46 and pp. 66–68, by H. Cramér, Princeton University Press, 1961.

140

6

Characteristic Functions, Moment Generating Functions and Related Theorems

**6.2 Deﬁnitions and Basic Theorems—The One-Dimensional Case
**

Let X be an r.v. with p.d.f. f. Then the characteristic function of X (ch.f. of X), denoted by φX (or just φ when no confusion is possible) is a function deﬁned on , taking complex values, in general, and deﬁned as follows:

φ X t = E e itX

()

[ ]

⎧∑ e itx f x = ∑ cos tx f x + i sin tx f x ⎪ x =⎨ x ∞ ∞ itx ⎪ e f x dx = ∫−∞ ∫−∞ cos tx f x + i sin tx f x dx ⎩ ⎧∑ cos tx f x + i ∑ sin tx f x ⎪ x =⎨ x ∞ ∞ ⎪ cos tx f x dx + i sin tx f x dx. ∫−∞ ⎩∫−∞

[ ( ) ( ) ( ) ( )] () [ ( ) ( ) ( ) ( )] [ ( ) ( )] [ ( ) ( )]

() ( )() ( )()

By Lemmas A, A′, B, B′, φX(t) exists for all t ∈ . The ch.f. φX is also called the Fourier transform of f. The following theorem summarizes the basic properties of a ch.f.

THEOREM 1

(Some properties of ch.f’s) ii) iii) iv) v) vi) vii) i) φX(0) = 1. |φX(t)| ≤ 1. φX is continuous, and, in fact, uniformly continuous. φX+d(t) = eitdφX(t), where d is a constant. φcX(t) = φX(ct), where c is a constant. φcX+d(t) = eitdφX(ct). dn φX t dt n

()

= inE(Xn), n = 1, 2, . . . , if E|Xn| < ∞.

t =0

PROOF

i) φX(t) = EeitX. Thus φX(0) = Eei0X = E(1) = 1. ii) |φX(t)| = |EeitX| ≤ E|eitX| = E(1) = 1, because |eitX| = 1. (For the proof of the inequality, see Exercise 6.2.1.) iii) φ X t + h − φ X t = E ⎡ e ⎢ ⎣

(

)

()

i t+ h X

( )

**= E e itX e ihX − 1 ≤ E e itX e ihX − 1 = E e ihX − 1.
**

Then

[ (

− e itX ⎤ ⎥ ⎦

)]

(

)

6.2 Deﬁnitions and Basic Theorems—The One-Dimensional Case 6.5 The Moment Generating Function

141

**lim φ X t + h − φ X t ≤ lim E e ihX − 1 = E lim e ihX − 1 = 0,
**

h→0 h→0 h→0

(

)

()

[

]

provided we can interchange the order of lim and E, which here can be done by Lemma C. We observe that uniformity holds since the last expression on the right is independent of t. iv) φX+d(t) = Eeit(X+d) = E(eitXeitd) = eitd EeitX = eitd φX(t). v) φcX(t) = Eeit(cX) = Eei(ct)X = φX(ct). vi) Follows trivially from (iv) and (v). vii) ⎛ ∂n ⎞ dn dn φ t = n Ee itX = E⎜ n e itX ⎟ = E i n X n e itX , n X dt dt ⎝ ∂t ⎠

()

(

)

provided we can interchange the order of differentiation and E. This can be done here, by Lemma D (applied successively n times to φX and its n − 1 ﬁrst derivatives), since E|X n| < ∞ implies E|X k| < ∞, k = 1, . . . , n (see Exercise 6.2.2). Thus dn φX t dt n

REMARK 2 n

n d

dt n

()

= inE X n .

t =0

( )

(−i)

From part (vii) of the theorem we have that E(Xn) = ϕX(t)|t = 0, so that the ch.f. produces the nth moment of the r.v.

If X is an r.v. whose values are of the form x = a + kh, where a, h are constants, h > 0, and k runs through the integral values 0, 1, . . . , n or 0, 1, . . . , or 0, ±1, . . . , ±n or 0, ±1, . . . , then the distribution of X is called a lattice distribution. For example, if X is distributed as B(n, p), then its values are of the form x = a + kh with a = 0, h = 1, and k = 0, 1, . . . , n. If X is distributed as P(λ), or it has the Negative Binomial distribution, then again its values are of the same form with a = 0, h = 1, and k = 0, 1, . . . . If now φ is the ch.f. of X, it can be shown that the distribution of X is a lattice distribution if and only if |φ(t)| = 1 for some t ≠ 0. It can be readily seen that this is indeed the case in the cases mentioned above (for example, φ(t) = 1 for t = 2π). It can also be shown that the distribution of X is a lattice distribution, if and only if the ch.f. φ is periodic with period 2π (that is, φ(t + 2π) = φ(t), t ∈ ).

REMARK 3

**In the following result, the ch.f. serves the purpose of recovering the distribution of an r.v. by way of its ch.f.
**

THEOREM 2

(Inversion formula) Let X be an r.v. with p.d.f. f and ch.f. φ. Then if X is of the discrete type, taking on the (distinct) values xj, j ≥ 1, one has i) f x j = lim

( )

T →∞

1 2T

∫

T

−T

e

− itx j

φ t dt , j ≥ 1.

()

If X is of the continuous type, then

142

6

Characteristic Functions, Moment Generating Functions and Related Theorems

ii) f x = lim lim

()

1 h→0 T →∞ 2π

∫

**1 − e − ith − itx e φ t dt −T ith
**

T

()

**and, in particular, if ∫−∞ |φ (t )|dt < ∞, then (f is bounded and continuous and) ′ ii′) f(x) = 1 ∞ − itx ∫ e φ(t)dt. 2π −∞
**

j

∞

PROOF (outline) i) The ch.f. φ is continuous, by Theorem 1(iii), and since − itx T so is e−itxj, it follows that the integral ∫−T e φ (t)dt exists for every T(> 0). We have then

1 2T

∫

T

−T

e

− itx j

φ t dt =

()

⎡ − itx ⎤ ∑ e itx f x k ⎥ dt ⎢e k ⎣ ⎦ T ⎡ ⎤ it ( x − x ) 1 f x k ⎥ dt = ∑e 2T ∫−T ⎢ k ⎣ ⎦ 1 T it( x − x ) e dt = ∑ f xk 2T ∫−T k 1 2T

∫

T

j

k

−T

( )

k

j

( )

j

( )

k

(the interchange of the integral and summations is valid here). That is,

1 2T

But

∫−T e

T

− itx j

φ t dt = ∑ f x k

k

()

( ) 21 ∫ T

(

T

−T

e

it x k − x j

(

) dt .

(1)

∫

T

−T

e

it x k − x j

(

) dt =

∫

T

−T

e

it x k − x j

(

( = ∫ cos t ( x ) dt = cos t x ( ∫

T −T T −T T −T

∫

T

−T

[cos t(x

k

k

− x j + i sin t x k − x j dt

T −T

)

= ∫ cos t x k − x j dt + i ∫ sin t x k − x j dt

j

k

**) ( ) − x )dt , since sin z is an odd function. That is, − x )dt . (2)
**

j

)]

If xk = xj, the above integral is equal to 2T, whereas, for xk ≠ xj,

∫

T

−T

cos t x k − x j dt = = =

(

)

1 xk − x j

∫

T

−T

d sin t x k − x j

(

) )]

sin T x k − x j − sin −T x k − x j 2 sin T x k − x j xk − x j

(

)

(

xk − x j

[ (

).

6.2 Deﬁnitions and Basic Theorems—The One-Dimensional Case 6.5 The Moment Generating Function

143

Therefore,

1 2T

k j

∫

k j

T

−T

( e

it x k − x j

⎪ ) dt = ⎪ sin T x − x ( k j) ⎨ ⎪ ⎪ T xk − x j ⎩

⎧1,

if , if

xk = x j xk ≠ x j.

(

)

(3)

**sinT ( x − x ) ≤ x 1 x , a constant independent of T, and therefore, for But − x −x sin T ( x − x ) xk ≠ xj, lim T ( x − x ) = 0 , so that T →∞
**

k j j

k

k

j

1 T →∞ 2T lim

∫

T

−T

e

it x k − x j

(

⎪ ) dt = ⎧1,

⎨ ⎪ ⎩0 ,

if if

xk = x j xk ≠ x j.

(4)

By means of (4), relation (1) yields

T →∞

lim

1 2T

∫−T e

T

− itx j

φ t dt = lim ∑ f xk

T →∞ k

()

( ) 21 ∫ T

1 T →∞ 2T

T

−T T

e

it xk − x j

(

) )

dt dt

= ∑ f xk lim

k

( )

(

∫−T e

it xk − x j

(

**(the interchange of the limit and summation is legitimate here)
**

= f x j + ∑ lim

k≠ j

( )

T →∞

1 2T

∫−T e

T

it x k − x j

) dt = f x , ( j)

as was to be seen ii) (ii′) Strictly speaking, (ii′) follows from (ii). We are going to omit (ii) entirely and attempt to give a rough justiﬁcation of (ii′). The assumption that ∞ ∞ − itx ∫−∞ |φ (t )|dt < ∞ implies that ∫−∞ e φ (t)dt exists, and is taken as follows for every arbitrary but ﬁxed x ∈ :

∫

But

e − itxφ t dt = lim ∫ e − itxφ t dt . −∞ (0 < )T→∞ −T

∞

()

T

()

(5)

∫

∞ T e − itxφ t dt = ∫ e − itx ⎡ ∫ e ity f y dy⎤dt ⎢ −∞ ⎥ −T −T ⎦ ⎣ T T ∞ it ( y − x ) ∞ it ( y − x ) =∫ ∫ e f y dy dt = ∫ f y ⎡ ∫ e dt ⎤dy, ⎢ −T ⎥ −T −∞ −∞ ⎦ ⎣ T

()

()

()

()

(6)

where the interchange of the order of integration is legitimate here. Since the integral (with respect to y) is zero over a single point, we may assume in the sequel that y ≠ x. Then

∫−T

T

e

it y − x

(

)

dt =

2 sin T y − x y−x

(

),

(7)

as was seen in part (i). By means of (7), relation (6) yields

144

6

Characteristic Functions, Moment Generating Functions and Related Theorems

∫−∞ e

∫

∞

∞

− itx

φ t dt = 2 lim ∫ f y

T →∞ −∞

()

∞

()

sin T y − x

y−x

(

) dy.

(8)

**Setting T(y − x) = z, expression (8) becomes
**

−∞ ∞ ⎛ z ⎞ sin z e − itxφ t dt = 2 lim ∫ f ⎜ x + ⎟ dz T →∞ −∞ ⎝ T⎠ z

()

= 2 f x π = 2πf x ,

by taking the limit under the integral sign, and by using continuity of f and the ∞ fact that ∫−∞ sin z dz = π . Solving for f(x), we have z

()

∫

()

f x =

as asserted.

EXAMPLE 1

()

1 2π

∞

−∞

e − itxφ t dt ,

()

Let X be B(n, p). In the next section, it will be seen that φX(t) = (peit + q)n. Let us apply (i) to this expression. First of all, we have 1 2T

∫

T

−T

e − itxφ t dt = = = =

()

1 2T 1 2T 1 2T 1 2T +

∫

T

−T

( pe

⎡

n

it

+ q e − itx dt

it

)

n

∫ ⎢∑ ⎜ r ⎟ ( pe ⎢ ⎝ ⎠

T −T

⎛ n⎞

⎣r = 0

)q

r

n −r

⎤ e − itx ⎥ dt ⎥ ⎦ dt

T i r− x t

∑ ⎜ r⎟ p q ∫ ⎝ ⎠

r n −r r= 0 n

n

⎛ n⎞ ⎛ n⎞

T

−T

e

i r− x t

(

)

∑ ⎜ r⎟ p q ⎝ ⎠

r r= 0 r≠ x

n −r

∫ i( r − x )

1

−T

e

(

)

i r − x dt

(

)

1 ⎛ n⎞ x n − x T ⎜ ⎟ p q ∫−T dt 2T ⎝ x ⎠

i ( r − x )T − i ( r − x )T n ⎛ n⎞ e −e 1 ⎛ n⎞ x n − x = ∑ ⎜ ⎟ p r qn − r + ⎜ ⎟ p q 2T 2T ⎝ x ⎠ 2Ti r − x r= 0 ⎝ r ⎠ r≠ x

(

)

n sin r − x T ⎛ n⎞ x n − x ⎛ n⎞ +⎜ ⎟p q . = ∑ ⎜ ⎟ p r qn − r ⎝x⎠ r−x T r= 0 ⎝ r ⎠ r≠ x

(

(

)

)

**Taking the limit as T → ∞, we get the desired result, namely
**

⎛ n⎞ f x = ⎜ ⎟ p x q n− x . ⎝ x⎠

()

6.5

The Moment GeneratingExercises Function

145

**(One could also use (i′) for calculating f(x), since φ is, clearly, periodic with period 2π.)
**

EXAMPLE 2

For an example of the continuous type, let X be N(0, 1). In the next section, we 2 2 ∞ will see that φX(t) = e−t /2. Since |φ(t)| = e−t /2, we know that ∫−∞ |φ (t )|dt < ∞, so that (ii′) applies. Thus we have f x = =

( )

1 2π

∫

∞

−∞

e − itxφ t dt =

2

()

1 2π

∫

∞

−∞

e − itx e − t

2

2

dt

2 2 2

**⎤ ⎡ 1 ∞ − ( 1 2 )( t + 2 itx ) 1 ∞ − ( 1 2 ) ⎣ t + 2 t ( ix ) + ( ix ) ⎦ ( 1 2 )( ix ) ⎥ ⎢ e dt = e e dt 2π ∫−∞ 2π ∫−∞ −( 1 2 )x −( 1 2 )x ∞ ∞ e 1 e 1 − ( 1 2 )( t + ix ) −( 1 2 )u dt = du = e ∫−∞ 2π ∫−∞ 2π e 2π 2π
**

2 2 2 2

=

e

− 1 2 x2

( )

2π

⋅1 =

1 2π

e −x

2

2

,

**as was to be shown.
**

THEOREM 3

(Uniqueness Theorem) There is a one-to-one correspondence between the characteristic function and the p.d.f. of a random variable.

PROOF The p.d.f. of an r.v. determines its ch.f. through the deﬁnition of the ch.f. The converse, which is the involved part of the theorem, follows from Theorem 2.

Exercises

6.2.1 Show that for any r.v. X and every t ∈ , one has |EeitX| ≤ E|eitX|(= 1). (Hint: If z = a + ib, a, b ∈ , recall that z = a 2 + b 2 . Also use Exercise 5.4.7 in Chapter 5 in order to conclude that (EY)2 ≤ EY2 for any r.v. Y.) 6.2.2 Write out detailed proofs for parts (iii) and (vii) of Theorem 1 and justify the use of Lemmas C, D. ¯ 6.2.3 For any r.v. X with ch.f. φX, show that φ−X(t) = φ X(t), t ∈ , where the bar over φX denotes conjugate, that is, if z = a + ib, a, b ∈ , then z = a − ib. ¯ 6.2.4 Show that the ch.f. φX of an r.v. X is real if and only if the p.d.f. fX of X is symmetric about 0 (that is, fX(−x) = fX(x), x ∈ ). (Hint: If φX is real, then the conclusion is reached by means of the previous exercise and Theorem 2. If fX is symmetric, show that f−X(x) = fX(−x), x ∈ .)

146

6

Characteristic Functions, Moment Generating Functions and Related Theorems

6.2.5 Let X be an r.v. with p.d.f. f and ch.f. φ given by: φ(t) = 1 − |t| if |t| ≤ 1 and φ(t) = 0 if |t| > 1. Use the appropriate inversion formula to ﬁnd f. 6.2.6 Consider the r.v. X with ch.f. φ(t) = e−|t|, t ∈ , and utilize Theorem 2(ii′) in order to determine the p.d.f. of X.

**6.3 The Characteristic Functions of Some Random Variables
**

In this section, the ch.f.’s of some distributions commonly occurring will be derived, both for illustrative purposes and for later use.

6.3.1

Discrete Case

n n x n ⎛ n⎞ ⎛ n⎞ φ X t = ∑ e itx ⎜ ⎟ p x q n − x = ∑ ⎜ ⎟ pe it q n − x = pe it + q , ⎝ x⎠ x= 0 x = 0 ⎝ x⎠

1. Let X be B(n, p). Then φX(t) = (peit + q)n. In fact,

()

( )

(

)

Hence

d φX t dt

so that E(X ) = np. Also, d2 φX t dt 2

()

= n pe it + q

t=0

(

)

n −1

ipe it

t=0

= inp,

()

= inp

t=0

d ⎡ it pe + q dt ⎢ ⎣

(

)

n −1

⎤ e it ⎥ ⎦ t= 0

n −2

⎡ = inp⎢ n − 1 pe it + q ⎣

(

)(

)

ipe it e it + pe it + q

(

)

n −1

= i 2 np n − 1 p + 1 = − np n − 1 p + 1 = − E X 2 , so that

E X 2 = np n − 1 p + 1 and σ 2 X = E X 2 − EX = n 2 p 2 − np 2 + np − n 2 p 2 = np 1 − p = npq;

[( ) ]

]

[( ) ]

( )

( )

2

⎤ ie it ⎥ ⎦ t= 0

( )

[( )

( )

( ) ( )

that is, σ 2(X ) = npq. it 2. Let X be P(λ). Then φX(t) = eλe −λ. In fact,

φ X t = ∑ e itx e − λ

x= 0

()

∞

it ∞ λe λx −λ =e ∑ x! x! x= 0

( )

x

= e − λ e λe = e λe

it

it

−λ

.

6.3 The Characteristic 6.5 The Moment Generating Variables Functions of Some Random Function

147

Hence

d φX t dt

so that E(X ) = λ. Also,

d2 φX t dt 2

()

t=0

= e λe

it

−λ

iλe it

t= 0

= iλ ,

()

=

t=0

d iλe − λ e λe dt d λe e dt ⋅e

(

it

+it

)

t=0

**= iλe − λ = iλe = iλe
**

2 −λ −λ

it

+it

λeit +it λ

⋅ λe it ⋅ i + i

( ) = i λ (λ + 1) = − λ (λ + 1) = − E ( X ),

⋅ e λi + i

2

(

t= 0

)

t= 0

so that

σ 2 X = E X 2 − EX

that is, σ 2(X ) = λ.

( )

( ) ( )

2

= λ λ + 1 − λ2 = λ ;

(

)

6.3.2 Continuous Case

1. Let X be N(μ, σ 2). Then φX(t) = eitμ−(σ t /2), and, in particular, if X is 2 N(0, 1), then φX(t) = e−t /2. If X is N(μ, σ 2), then (X − μ)/σ, is N(0, 1). Thus

22

**φ( X − μ ) σ t = φ(1 σ ) X −( μ σ ) t = e − itμ σ φ X t σ , and φ X t σ = e itμ σ φ( X − μ ) σ t .
**

So it sufﬁces to ﬁnd the ch.f. of an N(0, 1) r.v. Y, say. Now

()

()

( )

( )

()

φY t =

=

()

1 2π 1 2π

2

∫−∞ e

e −t

2

∞

ity

e −y

∞

2

2

dy =

1 2π

∫−∞ e

2

∞

− y 2 − 2 ity

(

) 2 dy

2

∫−∞ e

− y − it

(

)

2

2

dy = e − t

2

.

Hence φX(t/σ) = eitμ/σ e−t /2 and replacing t/σ by t, we get, ﬁnally: ⎛ σ 2t 2 ⎞ φ X t = exp⎜ itμ − . 2 ⎟ ⎝ ⎠

()

148

6

Characteristic Functions, Moment Generating Functions and Related Theorems

Hence

d φX t dt

()

⎛ σ 2t 2 ⎞ = exp⎜ itμ − iμ − σ 2t 2 ⎟ ⎝ ⎠ t =0

(

)

= iμ , so that E X = μ .

t =0

( )

d2 φX t dt 2

()

⎛ σ 2t 2 ⎞ = exp⎜ itμ − iμ − σ 2t 2 ⎟ ⎝ ⎠ t =0

(

)

2

= i2μ 2 − σ 2 = i2 μ 2 + σ 2 .

(

)

⎛ σ 2t 2 ⎞ − σ 2 exp⎜ itμ − 2 ⎟ ⎝ ⎠ t =0

Then E(X 2) = μ 2 + σ 2 and σ 2(X) = μ2 + σ 2 − μ2 = σ 2. 2. Let X be Gamma distributed with parameters α and β. Then φX(t) = (1 − iβt)−α. In fact,

φX t =

()

1 Γ α βα

( ) ∫0

x=

∞

e itx x α −1 e − x β dx =

1 Γ α βα

( ) ∫0

∞

x α −1 e

− x 1− iβt β

(

)

dx.

**Setting x(1 − iβt) = y, we get
**

y dy , dx = , 1 − iβt 1 − iβt y ∈ 0, ∞ .

[

)

**Hence the above expression becomes
**

1 Γ α βα y ( ) ∫0 (1 − iβt )α −1 = 1 − iβt

∞

1

α −1 − y β

e

dy 1 − iβt

(

)

−α

1 Γ α βα

( ) ∫0

=

t =0

∞

yα −1e − y β dy = 1 − iβt

(

)

−α

.

Therefore

d φX t dt

so that E(X ) = αβ, and

d2 φX dt 2

2

()

(1 − iβt )

iαβ

α +1

t =0

= iαβ ,

=i

t =0

2

2

α α +1 β2

α +2

t =0

) (1 − iβt )

2

(

= i 2α α + 1 β 2 ,

(

)

so that E(X ) = α(α + 1)β . Thus σ (X) = α(α + 1)β2 − α2β2 = αβ2. For α = r/2, β = 2, we get the corresponding quantities for χ2, and for r α = 1, β = 1/λ, we get the corresponding quantities for the Negative Exponential distribution. So

6.5

**The Moment GeneratingExercises Function
**

−1

149

φ X t = 1 − 2it

() (

)

−r 2

⎛ it ⎞ , φX t = ⎜ 1 − ⎟ λ⎠ ⎝

()

=

λ , λ − it

respectively. 3. Let X be Cauchy distributed with μ = 0 and σ = 1. Then φX(t) = e−|t|. In fact,

φ X t = ∫ e itx

()

**1 1 1 ∞ cos tx dx = ∫ dx 2 −∞ π 1+ x π −∞ 1 + x 2 i ∞ sin tx 2 ∞ cos tx dx = ∫ dx + ∫ −∞ 1 + x 2 π π 0 1 + x2
**

∞

( )

( )

( )

because 1 + x2 since sin(tx) is an odd function, and cos(tx) is an even function. Further, it can be shown by complex variables theory that

−∞

∫

∞

sin tx

( ) dx = 0,

∫

Hence

∞

cos tx

0

1 + x2

( ) dx = π e

2

−t

.

φX t = e

Now

()

−t

.

d d −t φX t = e dt dt does not exist for t = 0. This is consistent with the fact of nonexistence of E(X ), as has been seen in Chapter 5.

()

Exercises

6.3.1 Let X be an r.v. with p.d.f. f given in Exercise 3.2.13 of Chapter 3. Derive its ch.f. φ, and calculate EX, E[X(X − 1)], σ 2(X), provided they are ﬁnite. 6.3.2 Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3. Derive its ch.f. φ, and calculate EX, E[X(X − 1)], σ 2(X), provided they are ﬁnite. 6.3.3 Let X be an r.v. with p.d.f. f given by f(x) = λe−λ(x−α) I(α,∞)(x). Find its ch.f. φ, and calculate EX, σ 2(X), provided they are ﬁnite.

150

6

Characteristic Functions, Moment Generating Functions and Related Theorems

6.3.4 Let X be an r.v. distributed as Negative Binomial with parameters r and p. i) Show that its ch.f., φ, is given by

φt =

()

pr

(

1 − qe it

)

r

;

ii) By differentiating φ, show that EX = rq/p and σ 2(X) = rq/p2; iii) Find the quantities mentioned in (i) and (ii) for the Geometric distribution. 6.3.5 Let X be an r.v. distributed as U(α, β). ii) Show that its ch.f., φ, is given by

φt =

()

e itβ − e itα ; it β − α

(

)

+ ii) By differentiating φ, show that EX = α 2 β and σ 2 ( X ) =

(α − β )2

12

.

6.3.6 Consider the r.v. X with p.d.f. f given in Exercise 3.3.14(ii) of Chapter 3, and by using the ch.f. of X, calculate EX n, n = 1, 2, . . . , provided they are ﬁnite.

**6.4 Deﬁnitions and Basic Theorems—The Multidimensional Case
**

In this section, versions of Theorems 1, 2 and 3 are presented for the case that the r.v. X is replaced by a k-dimensional r. vector X. Their interpretation, usefulness and usage is analogous to the ones given in the one-dimensional case. To this end, let now X = (X1, . . . , Xk)′ be a random vector. Then the ch.f. of the r. vector X, or the joint ch.f. of the r.v.’s X1, . . . , Xk, denoted by φX or φX1, . . . , Xk, is deﬁned as follows:

φX , ⋅ ⋅ ⋅ ,

1

Xk

(t , ⋅ ⋅ ⋅ , t ) = E[e

1 k

it 1 X 1 +it2 X 2 + ⋅ ⋅ ⋅ + itk X k

],

tj ∈ ,

j = 1, 2, . . . , k. The ch.f. φX1, . . . , Xk always exists by an obvious generalization of Lemmas A, A′ and B, B′. The joint ch.f. φX1, . . . , Xk satisﬁes properties analogous to properties (i)–(vii). That is, one has

THEOREM 1′

(Some properties of ch.f.’s) i′) φX1, . . . , Xk(0, . . . , 0) = 1. ′ ii′) |φX1, . . . , Xk(t1, . . . , tk)| ≤ 1. ′ iii′) φX1, . . . , Xk is uniformly continuous. ′

6.4

Deﬁnitions and Basic Theorems—The Multidimensional Case 6.5 The Moment Generating Function

151

**iv′) φX1+d1, . . . , Xk+dk(t1, . . . , tk) = eit1d1+ · · · +itkdkφX1, . . . , ′ v′) φc1X1, . . . , ckXk(t1, . . . , tk) = φX1, . . . , ′ vi′) φc1X1+ d1, . . . , ckXk + dk(t1, . . . , tk) = e ′
**

Xk it1d1 + · · · + itkdk

Xk

(t1, . . . , tk).

(c1t1, . . . , cktk).

φX1, . . . , Xk(c1t1, . . . , cktk).

vii′) If the absolute (n1, . . . , nk)-joint moment, as well as all lower order joint ′ moments of X1, . . . , Xk are ﬁnite, then

∂ n + ⋅ ⋅ ⋅ +n φX , ⋅ ⋅ ⋅ , ∂t tn ⋅ ⋅ ⋅ ∂t tn

1 k 1 k 1 1 k

Xk

(t , ⋅ ⋅ ⋅ , t )

1 k t 1 = ⋅ ⋅ ⋅ =tk = 0

= i∑

k j=1 n j

n n E X1 ⋅ ⋅ ⋅ X k ,

1 k

(

)

and, in particular,

∂n φX , ⋅ ⋅ ⋅ , ∂t n j

1

Xk

(t , ⋅ ⋅ ⋅ , t )

1 k t 1 = ⋅ ⋅ ⋅ =tk = 0

= inE X n , j

( )

j = 1, 2, ⋅ ⋅ ⋅ , k.

viii) If in the φX1, . . . , Xk(t1, . . . , tk) we set tj1 = · · · = tjn = 0, then the resulting expression is the joint ch.f. of the r.v.’s Xi1, . . . , Xim, where the j’s and the i’s are different and m + n = k. Multidimensional versions of Theorem 2 and Theorem 3 also hold true. We give their formulations below.

THEOREM 2 ′

**(Inversion formula) Let X = (X1, . . . , Xk)′ be an r. vector with p.d.f. f and ch.f. φ. Then ii) f X , ⋅ ⋅ ⋅ ,
**

1

Xk

(

⎛ 1 ⎞ x1 , ⋅ ⋅ ⋅ , xk = lim ⎜ ⎟ T →∞ ⎝ 2T ⎠

)

k

∫

1

T

−T

⋅ ⋅ ⋅ ∫ e − it x − ⋅ ⋅ ⋅ −it x

1 1

T

k k

−T

× φX , ⋅ ⋅ ⋅ ,

1

Xk

(t , ⋅ ⋅ ⋅ , t )dt

k

1

⋅ ⋅ ⋅ dt k ,

**if X is of the discrete type, and ii) f X , ⋅ ⋅ ⋅ ,
**

1

Xk

(x , ⋅ ⋅ ⋅ , x )

1 k

⎛ 1 ⎞ = lim lim ⎜ ⎟ h →0 T →∞ ⎝ 2π ⎠

k

∫

T

−T

⋅ ⋅ ⋅∫

Xk

T

−T

∏⎜

1

× e − it1x1 − ⋅ ⋅ ⋅ −itk xk φ X 1 , ⋅ ⋅ ⋅ ,

(t , ⋅ ⋅ ⋅ , t )dt

k

⎛ 1 − e − it j h ⎞ ⎟ it j h ⎠ j =1 ⎝

k 1

⋅ ⋅ ⋅ dt k ,

**if X is of the continuous type, with the analog of (ii′) holding if the integral of |φX1, . . . , Xk(t1, . . . , tk)| is ﬁnite.
**

THEOREM 3 ′

(Uniqueness Theorem) There is a one-to-one correspondence between the ch.f. and the p.d.f. of an r. vector. The justiﬁcation of Theorem 1′ is entirely analogous to that given for Theorem 1, and so is the proof of Theorem 2′. As for Theorem 3′, the fact that the p.d.f. of X determines its ch.f. follows from the deﬁnition of the ch.f. That the ch.f. of X determines its p.d.f. follows from Theorem 2′.

PROOFS

152

6

Characteristic Functions, Moment Generating Functions and Related Theorems

6.4.1

The Ch.f. of the Multinomial Distribution

**Let X = (X1, . . . , Xk)′ be Multinomially distributed; that is,
**

P X 1 = x1 , ⋅ ⋅ ⋅ , X k = xk =

(

)

n! x x p1 ⋅ ⋅ ⋅ pk . x1! ⋅ ⋅ ⋅ xk !

1 k

Then

φX , ⋅ ⋅ ⋅ ,

1

Xk

(t , ⋅ ⋅ ⋅ , t ) = ( p e

1 k 1

1 1

it 1

+ ⋅ ⋅ ⋅ + pk e it

k

).

n

1 k

In fact,

φX , ⋅ ⋅ ⋅ ,

1

Xk

(t , ⋅ ⋅ ⋅ , t ) = ∑

1 k

e it x + ⋅ ⋅ ⋅ +it x

k k

x 1 , ⋅ ⋅ ⋅ , xk

n! x x × p1 ⋅ ⋅ ⋅ pk x1! ⋅ ⋅ ⋅ xk !

1

=

x 1 , ⋅ ⋅ ⋅ , xk

∑

n! p1e it x1! ⋅ ⋅ ⋅ xk !

k

(

)

x1

⋅ ⋅ ⋅ pk e it

(

k

)

xk

= p1e it + ⋅ ⋅ ⋅ + pk e it

1

(

).

n

Hence

∂k φX , ⋅ ⋅ ⋅ , ∂t1 ⋅ ⋅ ⋅ ∂t k

1

Xk

(t , ⋅ ⋅ ⋅ , t )

1 k

= n n − 1 ⋅ ⋅ ⋅ n − k + 1 i p1 ⋅ ⋅ ⋅ pk p1e it1 + ⋅ ⋅ ⋅

k

(

) (

)

t 1 = ⋅ ⋅ ⋅ = tk = 0

+ pk e itk

)

(

n −k t 1 = ⋅ ⋅ ⋅ = tk = 0

= i k n n − 1 ⋅ ⋅ ⋅ n − k + 1 p1 p2 ⋅ ⋅ ⋅ pk .

(

) (

)

Hence

E X1 ⋅ ⋅ ⋅ X k = n n − 1 ⋅ ⋅ ⋅ n − k + 1 p1 p2 ⋅ ⋅ ⋅ pk .

(

) (

)

(

)

Finally, the ch.f. of a (measurable) function g(X) of the r. vector X = (X1, . . . , Xk)′ is deﬁned by:

⎧ e itg ( x ) f x , x = x , ∑ 1 ⋅ ⋅ ⋅ , xk ′ ⎡e itg ( X ) ⎤ = ⎪ x φg(X) t = E ⎥ ⎨ ⎢ ∞ itg ( x , ⋅ ⋅ ⋅ , x ) ⎦ ⎪ ∞ ⎣ ⋅⋅⋅ e f x1 , ⋅ ⋅ ⋅ , xk dx1 , ⋅ ⋅ ⋅ , dxk . ⎩∫−∞ ∫−∞

()

()

(

)

1

k

(

)

Exercise

6.4.1 (Cramér–Wold) Consider the r.v.’s Xj, j = 1, . . . , k and for cj ∈ j = 1, . . . , k, set ,

6.5

The Moment Generating Function

153

Yc = ∑ c j X j .

j =1

k

Then ii) Show that φYc(t) = φX1, . . . , Xk(c1t, . . . , ckt), t ∈ = φYc(1); , and φX1, . . . , Xk(c1, . . . , ck)

ii) Conclude that the distribution of the X’s determines the distribution of Yc for every cj ∈ , j = 1, . . . , k. Conversely, the distribution of the X’s is determined by the distribution of Yc for every cj ∈ , j = 1, . . . , k.

**6.5 The Moment Generating Function and Factorial Moment Generating Function of a Random Variable
**

The ch.f. of an r.v. or an r. vector is a function deﬁned on the entire real line and taking values in the complex plane. Those readers who are not well versed in matters related to complex-valued functions may feel uncomfortable in dealing with ch.f.’s. There is a partial remedy to this potential problem, and that is to replace a ch.f. by an entity which is called moment generating function. However, there is a price to be paid for this: namely, a moment generating function may exist (in the sense of being ﬁnite) only for t = 0. There are cases where it exists for t’s lying in a proper subset of (containing 0), and yet other cases, where the moment generating function exists for all real t. All three cases will be illustrated by examples below. First, consider the case of an r.v. X. Then the moment generating function (m.g.f.) MX (or just M when no confusion is possible) of a random variable X, which is also called the Laplace transform of f, is deﬁned by MX(t) = E(etX), t ∈ , if this expectation exists. For t = 0, MX(0) always exists and equals 1. However, it may fail to exist for t ≠ 0. If MX(t) exists, then formally φX(t) = MX(it) and therefore the m.g.f. satisﬁes most of the properties analogous to properties (i)–(vii) cited above in connection with the ch.f., under suitable conditions. In particular, property (vii) in Theorem 1 yields d M X (t ) t =0 = E ( X n ) , provided Lemma D applies. In fact, dt

n n

dn MX t dt n

()

=

t =0

dn Ee tX dt n

(

)

= E X n e tX

(

)

⎛ dn ⎞ = E ⎜ n e tX ⎟ ⎝ dt ⎠ t =0 t =0 = E Xn .

t =0

( )

This is the property from which the m.g.f. derives its name.

e t =0 dt t =0 t ( ) () = t =0 t d λe t e λe −λ dt ( ) = λ e t e λe −λ + e t e λe −λ λe t t t = λ 1 + λ = E X 2 . clearly. 6. although no justiﬁcation will be supplied.’s . 1.g. If X ∼ B(n. then MX(t) = eλe −λ.g.f.f. 2.. p).G.f. It is instructive to derive them in order to see how conditions are imposed on t in order for the m. t t Then d MX t dt and d2 MX t dt 2 () = t =0 d λe −λ = λe t e λe −λ =λ=E X . then MX(t) = (pet + q)n.154 6 Characteristic Functions.’s.5.V. ( ) d2 MX t dt 2 () = t =0 d ⎡ np pe t + q dt ⎢ ⎣ ( ) n −1 ⎤ et ⎥ ⎦ t =0 pe t e t + pe t + q ⎡ = np⎢ n − 1 pe t + q ⎣ ( )( ) n −2 ( ) n −1 = n n − 1 p 2 + np = n 2 p 2 − np 2 + np = E X 2 . If X ∼ P(λ).1 The M. as it would be formulated for an m. t ∈ n n x n ⎛ n⎞ ⎛ n⎞ M X t = ∑ e tx ⎜ ⎟ p x q n − x = ∑ ⎜ ⎟ pe t q n − x = pe t + q .’s of Some R. Indeed. In fact. t ∈ t ( ) ( ) ⎤ et ⎥ ⎦ t =0 . so that σ 2 X = λ + λ2 − λ2 = λ . Moment Generating Functions and Related Theorems Here are some examples of m. ( ) ( ) t =0 ( ) t =0 ( ) . ⎝ x⎠ x= 0 x = 0 ⎝ x⎠ () ( ) ) ( ) which. d MX t dt and () = t =0 d pe t + q dt ( ) n t =0 = n pe t + q ( n −1 pe t t =0 = np = E X .g.F. is applicable in all these examples. is ﬁnite for all t ∈ Then . to be ﬁnite. x M X t = ∑ e tx e − λ x= 0 () ∞ t ∞ λe λx −λ =e ∑ x! x! x= 0 ( ) t = e − λ e λe = e λe −λ . so that σ 2(X) = n2p2 − np2 + np − n2p2 = np(1 − p) = npq. It so happens that part (vii) of Theorem 1.

then MX(t) = (1 − βt)−α.6. ∞). so that x = expression is equal to . then MX(t) = et /2. t < 1/β. t ∈ . we get. Then d MX t dt () t =0 d 1 − βt dt ( ) −α t =0 = αβ = E X . Replacing t by σt. Then by setting x(1 − βt) = y. If X ∼ N(μ. so that M X ⎜ ⎟ = e μt σ M X − μ t ⎝σ ⎠ ⎝σ ⎠ σ σ () () for all t ∈ . or equivalently. analogous to property (vi) in Theorem 1. the above 1 (1 − βt ) α ⋅ 1 Γ α βα ( ) ∫0 = ∞ yα −1 e − y β dy = (1 − βt ) α . 4. t ∈ . If X is distributed as Gamma with parameters α and β.g. if X ∼ N(0. 2 3. provided 1 − βt > 0. σ 2). in particular. Then σ 2t 2 2 t= 0 () = t= 0 d μt + e dt σ 2t 2 2 t=0 = μ + σ 2t e ( ) μt + =μ=E X . ⎛t⎞ ⎛t⎞ M X − μ t = e − μt σ M X ⎜ ⎟ . and. By the property for m. Indeed. so that σ 2 X = σ 2 + μ 2 − μ 2 = σ 2 . and y ∈ [0. ( ) . ( ) d2 MX t dt 2 () μt + d ⎡ = ⎢ μ + σ 2t e dt ⎢ t=0 ⎣ ( ) σ 2t 2 2 ⎤ ⎥ ⎥ ⎦ t= 0 σ t σ t ⎡ 2 μt + μt + 2 2 = ⎢σ 2 e + μ + σ 2t e ⎢ ⎣ 2 2 ( ) 2 2 = E X 2 .f. M X (t ) = e d MX t dt and σ 2t 2 2 . ﬁnally. ( ) ( ) ⎤ ⎥ = σ 2 + μ2 ⎥ ⎦ t=0 MX t = () 1 Γ α βα ( ) ∫ 1 ∞ 0 e tx x α −1 e − x β dx = y 1− βt 1 Γ α βα ( ) ∫ dy 1− βt ∞ 0 x α −1 e − x 1− βt β ( ) dx. Therefore μt ⎛t⎞ MX ⎜ ⎟ = e σ et ⎝σ ⎠ 2 2 =e μt + μt t 2 + σ 2 . t < 1/β. then M X (t ) = e tμ + σ 2t 2 2 .5 The Moment Generating Function 155 1). dx = .

f. X is taken from its factorial m. EX = . More precisely. Formally. of the Negative Exponential distribution. of the χ 2. REMARK 4 For an r. we also deﬁne what is known as its factorial moment generating function. Clearly.’s exist for proper subsets of . the m. t < λ .9). t 2π ∫ ∞ 0 ( ) The examples just discussed exhibit all three cases regarding the existence or nonexistence of an m. we again reach the conclusion that MX(t) = ∞ (see Exercise 6.’s exist for all t ∈ . 1 . and this equals 2 x dx t ∞ du t = ∫1 u = 2π lim log u . and its mean and r variance. in Examples 2 and 4. 2 ( ) ( ) ( ) r In particular.f.f. ∞ 1 1 M X t = E e tX = ∫ e tx dx −∞ π 1 + x2 1 ∞ 1 1 ∞ 1 > ∫ e tx dx > ∫ tx dx 2 π 0 π 0 1+ x 1 + x2 MX t = () ( ) () ( ) ( ) if t > 0. and its mean and variance.g. we obtain the m.g.g. we get the m. by using the limits −∞.v.g.f. σ 2 X = 2 . Let X have the Cauchy distribution with parameters μ and σ. for α = 2 and β = 2. so that σ X = αβ .156 6 Characteristic Functions. ηX(t) = MX(log t) for t > 0. X. In Examples 1 and 3. the m. the m. ηX (or just η when no confusion is possible) of an r. σ 2 X = 2r . exists only for t = 0. MX(t) obviously is equal to ∞. for z > 0. let μ = 0.f.f. 2 1 For α = 1 and β = λ . if E t X exists. σ = 1.g.f. E X = r . and without loss of generality. by differentiation as follows: () ( ) ( ) . and in Example 5. t ∈ . Moment Generating Functions and Related Theorems and d2 MX t dt 2 () = αβ t =0 d 1 − βt dt ( ) − α −1 t =0 2 = α α + 1 β 2 1 − βt ( ) ( 2 ) −α − 2 t =0 2 = α α + 1 β = EX .v.g. This function is sometimes referred to as the Mellin or Mellin–Stieltjes transform of f. λ −t λ λ 5. t< ( ) ( ) λ 1 1 . namely MX t = 1 − 2t −r 2 () ( ) . X is deﬁned by: ηX t = E t X . If t < 0. namely. Then the MX(t) exists only for t = 0. the nth factorial moment of an r.g. since ez > z. 2 x →∞ 2π 1+ x Thus for t > 0.5.v. the factorial m.f. 0 in the integral.g. In fact.

5 The Moment Generating Function 157 dn ηX t dt n In fact.G.F. so that the interchange of the order of differentiation and expectation is valid. an expression of the variance of X in terms of derivatives of its factorial m. () = E X X −1 ⋅ ⋅ ⋅ X − n+1 . ηX t = ∑ t e x =0 () ∞ x −λ ∞ λt λx = e −λ ∑ = e − λ e λt = e λt −λ . Hence dn ηX t dt n REMARK 5 () = E X X −1 ⋅ ⋅ ⋅ X − n+1 . As has already been seen in the ﬁrst two examples in Section 2 of Chapter 5. derives its name from the property just established.’s of some R. t =1 [ ( ) ( )] ⎛ ∂n ⎞ dn dn η X t = n E t X = E⎜ n t X ⎟ = E X X − 1 ⋅ ⋅ ⋅ X − n + 1 t X − n . and E X 2 = E X X − 1 + E X . If X ∼ P(λ). then ηX(t) = eλt − λ. then ηX(t) = (pt + q)n. In fact. up to order two.g.f. since σ 2 X = E X 2 − EX . although no speciﬁc justiﬁcation will be provided.V.2 The Factorial M.f. t ∈ . t =1 [ ( ) ( )] (9) The factorial m.’s 1.v.f. If X ∼ B(n. 2.6. x! x = 0 x! ( ) x .’s. p). Below we derive some factorial m. [ ( )] ( ) ( ) 2 6. factorial moments are especially valuable in calculating the variance of discrete r. Indeed.g. ⎝ x⎠ ⎝ x⎠ x= 0 x= 0 () ( ) ( ) Then d2 ηX t dt 2 () = n n − 1 p 2 pt + q t =1 ( ) ( ) n −2 = n n − 1 p2 . In fact. Property (9) (for n = 2) is valid in all these examples. t ∈ . ( ) so that σ 2(X) = n(n − 1)p2 + np − n2p2 = npq. n n x n ⎛ n⎞ ⎛ n⎞ η X t = ∑ t x ⎜ ⎟ p x q n − x = ∑ ⎜ ⎟ pt q n − x = pt + q .5.’s.g. dt n dt ⎝ ∂t ⎠ () ( ) [ ( ) ( ) ] provided Lemma D applies. that is. we get ( ) ( ) ( ) ( ) 2 ( ) [ ( )] ( ) σ 2 X = E X X − 1 + E X − EX . t ∈ .

. . (11) An analytical derivation of this formula is possible. Xk. t ) 1 k n = E X 1n 1 ⋅ ⋅ ⋅ X k k .f. . of an r.d. nk are non-negative integers.G.f. 1 1 k k 1 k ( ) ( ) ∂ n + ⋅ ⋅ ⋅ +n MX . . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . vectors. xk where the summation is over all integers x1. . . . . . . . Below. t ⎥ ⎦ 1 2 ∈ . Xk have jointly the Multinomial distribution with parameters n and p1. . If the r. k.3 The M. n nk ∂t1 1 ⋅ ⋅ ⋅ ∂t k 1 k 1 Xk (t . ⋅ ⋅ ⋅ . k. the above derivations hold true for all tj ∈ . . .v. . MX1 .f. . . is 1 2 MX 1 . (viii) in Theorem 1′ hold true under suitable conditions. t ) = ( p e 1 k 1 t1 + ⋅ ⋅ ⋅ + pk e t k ). If the r.158 6 Characteristic Functions. ( ) The m. ⋅ ⋅ ⋅ . for those tj’s in for which this expectation exists. pk. . Recall that the joint p. .’s X1. tk) exists. X t1 .5. is deﬁned by: M X . . Xk (t . If MX1. . Xk. . Xk(it1. . .g. . 2.g. . ⋅ ⋅ ⋅ . . .’s of Some R. . .’s X1 and X2 have the Bivariate Normal distribution with parameters μ1.’s X1. n tj ∈ .F. so that σ 2 X = λ2 + λ − λ2 = λ . then MX . . . . ⋅ ⋅ ⋅ . then formally φX1. Xk(t1. . .v.f. Vectors 1. . In fact. itk) and properties analogous to (i′)–(vii′). In particular. which is more elegant and compact. μ2. .v. Clearly. 6.’s of r. of the r. . ⋅ ⋅ ⋅ . . . j = 1. 1 Xk (t .f. 2. k. vector X or the joint m. Xk(t1.g.g. . . ⋅ ⋅ ⋅ . but we prefer to use the matrix approach. . j = 1. . denoted by MX or MX1. Moment Generating Functions and Related Theorems Hence d2 ηX t dt 2 () = λ2 e λt −λ t =1 t =1 = λ2 . . tk) = MX1. then their joint m. t ) = Ee 1 k t 1 X 1 + ⋅ ⋅ ⋅ +tk X k = ∑e 1 x1 t X 1 + ⋅ ⋅ ⋅ +tk X k n! x p1x1 ⋅ ⋅ ⋅ pk k x1! ⋅ ⋅ ⋅ xk ! =∑ n! p1e t1 x1! ⋅ ⋅ ⋅ xk ! ( ) ⋅ ⋅ ⋅ pk e tk n = p1e t1 + ⋅ ⋅ ⋅ + pk e tk ( ( ) ). ⋅ ⋅ ⋅ . of X1 and X2 is given by . t . . we present two examples of m. σ 2 .X 2 (t . j = 1. t j ∈ . . t 1 = ⋅ ⋅ ⋅ = tk = 0 ( ) (10) where n1. . . σ 2 and ρ. t k = E e t X + ⋅ ⋅ ⋅ t X . . . t ) = exp⎢μ t 1 2 ⎡ ⎣ 1 1 + μ2t 2 + 1 2 2 2 2 σ 1 t1 + 2 ρσ 1σ 2 t1t 2 + σ 2 t 2 2 ( )⎤ . . ⋅ ⋅ ⋅ . xk ≥ 0 with x1 + · · · + xk = n.

⎢⎝ σ 1 ⎠ ⎝ σ1 ⎠ ⎝ σ 2 ⎠ ⎝ σ 2 ⎠ ⎥ ⎦ ⎣ Therefore the p. for t = (t1 t2)′. we have M X t = Ee t ′ X = ∫ 2 exp t′ x f x dx = 1 2π ∑ 1 2 () ∫ ( )() ) ( 2 ⎡ 1 exp ⎢t′ x − x − μ ′ ∑ −1 x − μ 2 ⎣ ( ) ⎥ dx . Σ−1. is written as follows in matrix notation: f x = () 1 2π ∑ 1 2 ⎡ 1 exp⎢− x − μ ′ ∑ −1 x − μ ⎣ 2 ( ) ( )⎥ . ⎦ ⎤ (11) The exponent may be written as follows: . ⎝ σ1 ⎠ ⎝ σ 2 ⎠ ⎝ σ 2 ⎠ ⎥ ⎪ 1 − ρ2 ⎪ 2 ⎢⎝ σ 1 ⎠ ⎦ ⎭ ⎣ ⎩ 2πσ 1σ 2 Set x = (x1 x2)′. μ = (μ1 μ2)′. is |Σ| = σ 2 σ 2(1 − ρ2). is Σ Σ 1 2 ∑ −1 = Therefore 2 1 ⎛ σ 2 − ρσ 1σ 2 ⎞ ⎜ ⎟. and Σ is the covariance matrix of X. and the inverse.6. Next. and ⎛σ 2 ρσ σ ⎞ =⎜ 1 ∑ ρσ σ σ12 2 ⎟ . ∑ ⎝ − ρσ 1σ 2 σ 12 ⎠ ( x − μ ′ ∑ −1 x − μ = = ) ( ) 2 ⎛ σ 2 − ρσ 1σ 2 ⎞ ⎛ x1 − μ1 ⎞ 1 x1 − μ1 x 2 − μ 2 ⎜ ⎟⎜ ⎟ ∑ σ 12 ⎠ ⎝ x 2 − μ 2 ⎠ ⎝ − ρσ 1σ 2 ( ) σ σ 1− ρ 2 1 2 2 ( 1 2 ) ⎡σ 2 x − μ 1 ⎢ 2 1 ⎣ ( ) 2 2 − 2 ρσ 1σ 2 x1 − μ1 x 2 − μ 2 + σ 12 x 2 − μ 2 ⎤ ⎥ ⎦ ( )( ) ( ) = 1 1 − ρ2 2 ⎡⎛ x − μ ⎞ 2 ⎛ x − μ1 ⎞ ⎛ x 2 − μ 2 ⎞ ⎛ x 2 − μ 2 ⎞ ⎤ 1 ⎢⎜ 1 +⎜ − 2 ρ⎜ 1 ⎟ ⎟⎜ ⎟ ⎟ ⎥. μ is the mean vector of X = (X1 X2)′. x 2 = ( ) 1 2⎫ ⎧ 2 ⎛ x1 − μ1 ⎞ ⎛ x 2 − μ 2 ⎞ ⎛ x 2 − μ 2 ⎞ ⎤ ⎪ ⎪ 1 ⎡⎛ x1 − μ1 ⎞ exp⎨− ⎢⎜ ⎟ − 2 ρ⎜ ⎟⎜ ⎟ +⎜ ⎟ ⎥ ⎬. |Σ|. ⎦ ⎤ In this form.5 The Moment Generating Function 159 f x1 .d.f. ⎝ 1 2 2 ⎠ Then the determinant of Σ.

μ′t = t′μ. is.g. Thus ⎛ ⎞ 1 M X t = exp⎜ μ ′t + t ′ ∑ t ⎟ . since it is the integral of a Bivariate Normal distribution with mean vector μ + Σt and covariance matrix Σ. 2 ⎝ ⎠ 2 [ ( ) ( )] (13) Focus on the quantity in the bracket. (Σ−1)′ = Σ−1.g.1 Derive the m. E[X(X − 1)] and σ 2(X).d. Exercises 6. Then calculate EX.g. By means of (13) and (14). 2 ⎝ ⎠ () (15) Observing that ⎛ σ 12 ρσ 1σ 2 ⎞ ⎛ t1 ⎞ 2 2 2 2 t ′ ∑ t = t1 t 2 ⎜ ⎟ ⎜ ⎟ = σ 1 t1 + 2 ρσ 1σ 2 t1t 2 + σ 2 t 2 . Then calculate EX. carry out the multiplication. respectively. and observe that Σ′ = Σ. provided they are ﬁnite.f.v.f. for those t’s for which they exist. with p. to obtain Σ ′ ′ ′ Σ 2 μ ′t + t ′ ∑ t − 2t ′ x + x − μ ′ ∑ −1 x − μ = x − μ + ∑ t ′ ∑ −1 x − μ + ∑ t . X which denotes the number of spots that turn up when a balanced die is rolled. M(t) and η(t).d. Derive its m.5.g. x′t = t′x.13 of Chapter 3.160 6 Characteristic Functions. the second factor above is equal to 1.f..2. f given in Exercise 3.f.v. and x′Σ −1μ = μ′Σ−1x. 2 ⎝ ρσ 1σ 2 σ 2 ⎠ ⎝ t 2 ⎠ ( ) it follows that the m. M(t) and η(t). the m. and factorial m. for those t’s for which they exist. respectively. of the r.2 Let X be an r. in (12) becomes ( ) ( ) [ ( ) ( ( ))] (14) MX t () ∫ 2 ⎛ ⎞ 1 1 = exp⎜ μ ′t + t ′ ∑ t ⎟ 2 ⎝ ⎠ 2π ∑ 1 2 ⎡ 1 ⎤ exp⎢− x − μ + ∑ t ′ ∑ −1 x − μ + ∑ t ⎥ dx.f.f. Moment Generating Functions and Related Theorems ⎛ ⎞ 1 1 −1 ⎜ μ ′t + t ′ ∑ t ⎟ − 2 μ ′t + t ′ ∑ t − 2 t ′ x + x − μ ′ ∑ x − μ . with p. .g.5. E[X(X − 1)] and σ 2(X).g.g. provided they are ﬁnite. and factorial m.14 of Chapter 3. given by (11).3 Let X be an r..f. 6. ⎣ 2 ⎦ ( ( )) ( ( )) However.f. indeed. Derive its m.f. 6.v. f given in Exercise 3.5.2.

show that EX = α + β and σ 2(X) = ( 2 α −β 12 ) .f. distributed as P(λ). deﬁne the function γ by γ (t) = E(1 + t)X for those t’s for which E(1 + t)X is ﬁnite. Then calculate EX and σ 2(X). t < − log q. respectively. ii) Show that its m.f.5 The Moment GeneratingExercises Function 161 6. iii) Find the quantities mentioned in parts (i) and (ii) for the Geometric distribution. with m. show that (d n dt n γ t ) () t =0 = E X X −1 ⋅ ⋅ ⋅ X −n+1 . of X and identify its p. show that EX = rq/p and σ 2(X) = rq/p2.7 Let X be an r.d. p).f.g..v.f.4 in Chapter 5.f. of X in order to calculate EX4.∞)(x).5.v. Use its factorial m. M.v.8 Let X be an r. Also use the ch. 6.5. Use its factorial m.6 Let X be an r. Then.g.4 Let X be an r. q ii) By differentiation. . M(t) and η(t).g. 2 6.v.f.5.2.1 in Chapter 5. in order to calculate its kth factorial moment.g.5.d. Find the ch. ηX t = () pr (1 − qt ) r .f. t ∈ (α ∈ .g. if the nth factorial moment of X is ﬁnite.f and factorial m.5.g. X.5. M(t) for those t’s for which it exists.v. with p.2.5 Let X be an r.f.6. β). Find its m. t < 1 . distributed as Negative Binomial with parameters r and p. distributed as B(n. in order to calculate its kth factorial moment. provided they are ﬁnite.5.g. M given by M(t) = eα t +β t . 6.10 Let X be an r.9 Refer to Example 3 in the Continuous case and show that MX(t) = ∞ for t < 0 as asserted there. Compare with Exercise 5.f. β > 0). f given by f(x) = λe−λ(x −α)I(α..11 For an r.12 Refer to the previous exercise and let X be P(λ). 2 6.5. are given by MX t = () pr (1 − qe ) t r . Derive γ (t) and use it in order to show that the nth factorial moment of X is λn. i) Show that its m. distributed as U(α. 6. 6. t β −α ( ) ii) By differentiation. [ ( ) ( )] 6.f. 6. Compare with Exercise 5.5.v.v. is given by Mt = () e tβ − e tα .

of X and also its ch. .’s with m. provided the assumptions of Theorem 1′ (vii′) are met.14) 6. X2). . with m. .f.5.5. Furthermore. . given by ⎡1 ⎤ 1 M t1 . Moment Generating Functions and Related Theorems 6. such that EX n is ﬁnite for all n = 1.5. M and set K(t) = log M(t) for those t’s for which M(t) exists.13 Let X be an r. Use the expansion ex = ∑ xn n = 0 n! ∞ in order to show that. X2 be two r.g.v. X2. . 6. and use it in order to calculate E(X1X2X3). X2.g. of X is given by M t = ∑ EX n n =0 () ∞ ( ) tn! .18 Refer to Exercise 4.f.v. Also ﬁnd the ch. Find the m.’s X1.19 Refer to the previous exercise and derive the m. t2. X3 for those t1.f. g(X1. such that EX n = n!. 6.f. 1. suppose that EX = μ and σ 2(X) = σ 2 are both ﬁnite. t1 . t3 for which it exists. Then deduce the distribution of X. provided they are ﬁnite.17 Let X1. of X and from this. 2.162 6 Characteristic Functions. X3) = X1 + X2 + X3 for those t’s for which it exists.5. t2.2.v. k = 0.g.v.f. M(t1. .5. M(t) of X for those t’s for which it exists.g.14 Let X be an r.v. then use the previous exercise in order to ﬁnd the m.f. t3) of the r.g.15 If X is an r. 6. such that EX 2k = 2 k ⋅ k! (2k)! . 6 ⎣3 ⎦ ( ) ( ) ( ) 2 Calculate EX1. n 6. t 2 ∈ . EX 2k +1 = 0. . Then show that d Kt dt () = μ and t =0 d2 Kt dt 2 () = σ 2.) 6. Also ﬁnd their joint ch. t 2 = ⎢ e t1 + t2 + 1 + e t1 + e t2 ⎥ . under appropriate conditions.v.v. From this. .f.f.g.f. deduce the distribution of g. deduce the distribution of X. one has that the m.f. in Chapter 4 and ﬁnd the joint m. .g.5.5.5.16 Let X be an r. M(t) of the r. t =0 (The function K just deﬁned is called the cumulant generating function of X.5. σ 2(X1) and Cov(X1. (Use Exercise 6.

. given by (11) and property 1 2 (10) in order to determine E(X1X2).v. indeed. . and 1 2 Cov(X1.g. . .’s have the Multinomial distribution with parameters n and pi.’s with m.g.f. it can be seen that they are positively quadrant dependent or negatively quadrant dependent according to whether ρ ≥ 0 or ρ < 0. that these m. μ2. and let i.f. ii) If the r. X2 for which Fx . X 2 .’s X1 and X2 have the Bivariate Normal distribution with parameters μ1. determine the E(XiXj). M and set K(t1. 6.4.5 The Moment GeneratingExercises Function 163 6. t 2 ∂t j ( ) = EX j . c2 ∈ . of Xi.g. of course.g. if X1 and X2 have the Bivariate Normal distribution.22 If the r.v. for all X1. X2 be two r.20 Let X1.v.v.5. so that these r. provided. and show that it is negative. ∂ K t1 . 6. p.25 Both parts of Exercise 6.5.v. t2) = log M(t1.f. σ 2 and ρ. show that c1X1 + c2X2 + c3 is also normally distributed for any c3 ∈ . X2) − Fx (X1)Fx (X2) ≤ 0. t 1 =t2 = 0 ( ) 6. Xj.f.4. .5.f. variances.v.’s exist.21 Suppose the r. X2) − Fx (X1)Fx (X2) ≥ 0. ii) Show that ρ is. X.’s. Consider the r. 2. t2 for which M(t1.’s X1.’s X1.5. pk. ( ) ∂2 K t1 . be arbitrary but ﬁxed. Xj).v. for all X1.x (X1. t 2 ∂t j2 ( ) t 1 =t2 = 0 =σ 2 Xj . the correlation coefﬁcient of X1 and X2.1 hold true if the ch. X2) < 0 if ρ < 0.6. X2) ≥ 0 if ρ ≥ 0.x (X1.g. σ 2. t2) for those t1. t 1 =t2 = 0 ∂2 K t1 .g.f. 1 ≤ i < j ≤ k.’s involved are replaced by m. σ 2 and ρ. . . In particular. . . Yc = c1X1 + c2X2 is normally distributed. pj. j. t 2 ∂t1∂t 2 ( ) = Cov X 1 . X2 in . Furthermore.5.’s X1 and X2 have a Bivariate Normal distribution if and only if for every c1.’s are all ﬁnite. respectively.23 6. Then show that for j = 1. use their joint m. show that Cov(X1.’s in order to show that the r. and covariances of these r. Cov(Xi. where p = 1 − pi − pj.f. are said to be positively quadrant dependent or negatively quadrant dependent. ii) Write out the joint m.5. ii) Calculate the covariance of Xi. t2) exists. Note: Two r. and set X = n − Xi − Xj. or Fx .’s Xi. μ2.’s X1 and X2 have the Bivariate Normal distribution with parameters μ1. Xk have the Multinomial distribution with parameters n and p1. ii) In either case. suppose that expectations.1 for k = 2 and formulated in terms of m. Xj.24 Verify the validity of relation (13). and by differentiation. 1 2 1 2 1 2 1 2 6. Xj.v. σ 2. X2 in . ii) Use Exercise 6.v.

. and let P be a probability function deﬁned on the class of events. .v. j = 1. as well as in subsequent chapters. based primarily on independence. xj ∈ . . . . .) REMARK 1 (i) The sets Bj. j =1 ( ) k ( ) The r.3). (See Lemma 3 in Section 7. . j = 1.4) also applies to mdimensional r. . Independence carries over to r. as will be seen below.v. taking Bj = (−∞. k may not be chosen entirely arbitrarily.v.’s are said to be dependent. Independence of r. k = ∏ P X j ∈ Bj . in essence. . the not-so-rigorous deﬁnition of independence of r. . .4. the concept of independence of events was deﬁned and was heavily used there. In Chapter 2 (Section 2. and several applications.v. For example. j = 1.164 7 Stochastic Independence with Some Applications Chapter 7 Stochastic Independence with Some Applications 7. and two criteria of independence are also discussed. . vectors when (and B in Deﬁnition 3) is replaced by m (Bm). . and is the most basic assumption made in this book. .v. consider a class of events associated with this space.) (ii) Deﬁnition 1 (as well as Deﬁnition 3 in Section 7. .v. but there is plenty of leeway in their choice. are said to be independent if every ﬁnite subcollection of them is a collection of independent r.’s.’s Xj. . Non-independent r. j = 1.’s also. j = 1. k are said to be independent if. for sets Bj ⊆ . ⋅ ⋅ ⋅ . In this section. . are discussed in subsequent sections. DEFINITION 1 The r. 164 . k would be sufﬁcient.’s is presented.1 Stochastic Independence: Criteria of Independence Let S be a sample space. .’s. (See also Deﬁnition 3 in Section 7. . it holds P X j ∈ Bj .4. and the comment following it. j = 1.’s Xj. A third criterion of independence.4. k. reduces to that of events. 2.v. A rigorous treatment of some results is presented in Section 7. xj].

X x1 . . X x1 . ⋅ ⋅ ⋅ . k are independent if and only if any one of the following two (equivalent) conditions holds: i) FX . we have: Let f X . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . xk = ∏ FX x j . . then P X j ∈ Bj . yj]. X x1 . ⋅ ⋅ ⋅ . . X k = xk = ∏ P X j = x j . . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . for all 1 k ( ) j =1 j ( ) ( PROOF ii) If Xj. Lemma 3. j = 1. k which gives j FX . . k. k. . ⋅ ⋅ ⋅ . where xj is in the range of Xj.v. ⋅ ⋅ ⋅ .’s Xj. j = 1. 1 k ( ( ) ) k j =1 j ( ) ( ) Let now f X . j =1 ( ) k ( ) Therefore Xj. X x1 . Then if Xj. ⋅ ⋅ ⋅ . . of course. we get B1 × ⋅ ⋅ ⋅ ×B ∑ f X . we get P X 1 = x1 . ⋅ ⋅ ⋅ . j = 1. . ⋅ ⋅ ⋅ . ii) f X . k are independent. j =1 ⎢ B ⎥ ⎣ ⎦ j ( ) j j FX . ⋅ ⋅ ⋅ . xk = 1 k ( ) k ⋅ ⋅ ⋅ ×B ∑ f X x1 ⋅ ⋅ ⋅ f X xk 1 k ( ) ( ) k or 1 k k ⎡ ⎤ = ∏ ⎢ ∑ f X x j ⎥. . j =1 ( ) k ( ) or f X . . this is true for Bj = (−∞. . X x1 . be omitted. k. . xk = ∏ f X x j . . j = 1. xk = ∏ f X x j . X x1 .4. ⋅ ⋅ ⋅ . k. k. . j = 1. ⋅ ⋅ ⋅ . xj]. . ⋅ ⋅ ⋅ . Bj ⊆ . . For the continuous case. Some relevant comments will be made in Section 7. . and will.7. j = 1. k = ∏ P X j ∈ Bj . x j ∈ . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . yk = ∏ FX y j . xk = ∏ f X x j 1 k ( ) k j =1 j ( ) and let . . . we set Bj = {xj}. ⋅ ⋅ ⋅ . j = 1. j = 1. ⋅ ⋅ ⋅ . k are independent. ii) For the discrete case. for all 1 k ( ) k j =1 k j ( ) x j ∈ . j = 1. ⋅ ⋅ ⋅ . .1 Stochastic Independence: Criteria of Independence 165 THEOREM 1 (Factorization Theorem) The r. · · · . ⋅ ⋅ ⋅ . j = 1. j = 1. xk = ∏ FX x j . X y1 . j =1 ( ) k ( ) The proof of the converse is a deep probability result. 1 k k j =1 j Then for any sets Bj = (−∞. . X x1 . xj ∈ 1 k . j =1 ( ) k ) In particular. . . yj ∈ B1 × . xk = ∏ f X x j . . k are independent by (i).

⋅ ⋅ ⋅ . Then it seems intuitively clear that the r.v. . X t1 . k are independent. vectors) are independent r. .’s. t k = ∏ φ X t j .v.4. ⋅ ⋅ ⋅ .v. . k are deﬁned on m into . k and let gj : → be (measurable) functions.v. .’s (r. X y1 . j = 1. The same conclusion holds if the gj’s are complex-valued. . X x1 . ▲ Independence of r. . j = 1. k are r. . k are independent by (i). . ⋅ ⋅ ⋅ . Then the r. xk = ∏ f X x j .v. The same conclusion holds if the r. functions of independent r.v. . the Xj’s are independent).d. ⋅ ⋅ ⋅ . true and is the content of the following LEMMA 1 For j = 1. are needed in the proof of Theorem 1′ below. as well as Lemma 1.v. . . REMARK 3 THEOREM 1′ (Factorization Theorem) The r. so that gj(Xj). . . ▲ 1 k ( ) k j =1 j ( ) It is noted that this step also is justiﬁable (by means of calculus) for the continuity points of the p.’s gj(Xj). ⋅ ⋅ ⋅ .’s.v.f. yk = ∏ FX y j . ⋅ ⋅ ⋅ . . . alone. LEMMA 2 Consider the r. . . y j ∈ . Next. X x1 .v. . . ⋅ ⋅ ⋅ . we get f X . k.v. . k are r. we get FX . ( ) [ ( )] PROOF See Section 7. xk = ∏ FX x j 1 k ( ) k j =1 j ( ) (that is. j = 1.’s are replaced by m-dimensional r. j = 1.v. k.’s Xj.166 7 Stochastic Independence with Some Applications C j = −∞. assume that FX . ⋅ ⋅ ⋅ . .’s gj(Xj).2.’s. . k are also independent. j = 1. .’s Xj be independent and consider (measurable) functions gj : → . . vectors. . 1 k ( ] ( ) k j =1 j ( ) so that Xj. k. . This is.v. actually. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . . only. . . j = 1. . j = 1.’s and suppose that gj is a function of the jth r. REMARK 2 Consider independent r. . . and the functions gj. k are independent if and only if: φ X . . .’s also has the following consequence stated as a lemma. j = 1. . j = 1. .’s Xj. we have ⎡ k ⎤ k E ⎢∏ g j X j ⎥ = ∏ E g j X j . y j . so that gj(Xj). Then. if the r. k ought to be independent. j = 1. let the r. Then differentiating both sides. .v. . 1 k ( ) k j =1 j ( ) . j = 1. (That is.v. Then integrating both sides of this last relationship over the set C1 × · · · × Ck. for all t j ∈ . ▲ The converse of the above statement need not be true as will be seen later by examples.) PROOF See Section 7. .’s Xj. Both this lemma. . ⎢ j =1 ⎥ j =1 ⎣ ⎦ provided the expectations considered exist.

xk = E ⎜ exp⎜ i ∑ t j X j ⎟ ⎟ = E ⎜ ∏ e ⎟ = ∏ Ee ⎝ j =1 ⎠ j =1 ⎠⎠ ⎝ j =1 ⎝ 1 k ( ) j j j j by Lemmas 1 and 2. X x1 . ⋅ ⋅ ⋅ . x k ) = lim ⎜ ⎟ T →∞ ⎝ 2T ⎠ ⎛ 1 ⎞ = lim ⎜ ⎟ T →∞ ⎝ 2T ⎠ k ∫ k T −T × φ X 1 . then by Theorem 1(ii). ⋅ ⋅ ⋅ . j = 1.1 Stochastic Independence: Criteria of Independence 167 PROOF If X1. h →0 T →∞ 2π ∫−T it j h and for the multidimensional case. . X k t1 . For the continuous case. X k (t1 . ⋅ ⋅ ⋅ . . j = 1. ⋅ ⋅ ⋅ .7. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . t k = ∏ φ X t j . . . T →∞ 2T ∫−T and for the multidimensional case. . f X . we have (see Theorem 2′(i) in Chapter 6) j ( ) j j j ( ) ⎛ 1 ⎞ f X 1 . . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . Let us assume now that j=1 j φ X . . T →∞ 2T ⎦ j =1 j =1 ⎣ ( ) That is. ⋅ ⋅ ⋅ . k are independent by Theorem 1(ii). t k dt1 ⋅ ⋅ ⋅ dt k ⎛ 1 ⎞ = lim lim ⎜ ⎟ h →0 T →∞ ⎝ 2π ⎠ × dt1 ⋅ ⋅ ⋅ dt k k ⎡ 1 = ∏ ⎢lim lim h →0 T →∞ 2π j =1 ⎣ ⎢ − it h ⎤ 1 − e j − it j x j ∫−T it j h e φ X j t j dt j ⎥ ⎥ ⎦ T k ( ) ⎞ ⎟ ⎠ ∫−T T ⋅ ⋅ ⋅∫ ⎤ ⎡ 1 − e − it j h − it j x j ∏ ⎢ it h e φ X j t j ⎥ −T j =1 ⎣ ⎥ ⎢ j ⎦ T k ( ) ( ) = ∏ fX j ( xj ) . and this is Πk φX (tj). ⋅ ⋅ ⋅ . k are independent. k. 1 k ( ) k j =1 j ( ) Hence k ⎛ ⎞⎞ ⎛ k ⎛ k it X ⎞ it X φ X . j =1 k . ⋅ ⋅ ⋅ . t k )dt1 ⋅ ⋅ ⋅ dt k ⎛ k ⎞ T ⋅ ⋅ ⋅ ∫ exp⎜ −i ∑ t j x j ⎟ −T ⎝ j =1 ⎠ ∫ ⎛ k ⎞ k T ⋅ ⋅ ⋅ ∫ exp⎜ −i ∑ t j x j ⎟ ∏ φ X j t j ( dt1 ⋅ ⋅ ⋅ dt k ) −T −T ⎝ j =1 ⎠ j =1 T ( ) k 1 T − it j x j ⎡ ⎤ k = ∏ ⎢ lim ∫− T e φ X j t j dt j ⎥ = ∏ fX j ( x j ) . xk = ∏ f X x j . k. ⋅ ⋅ ⋅ . X k ( x1 . j = 1. j = 1. we have (see Theorem 2′(ii) in Chapter 6) f X x j = lim lim j ( ) − it j h j j ( ) T f X 1 . X x1 . xk ( ) ⎛ 1 ⎞ = lim lim ⎜ ⎟ h →0 T →∞ ⎝ 2π ⎠ k ∫−T T ⋅ ⋅ ⋅∫ ∏⎜ −T j =1 k ⎛ 1 − e − it j h ⎝ it j h e − it j x j × φ X 1 . Xj. 1 k ( ) k j =1 j ( ) For the discrete case. X k x1 . we have 1 T 1− e − it h e φ X t j dt j . we have (see Theorem 2(i) in Chapter 6) 1 T − it x f X x j = lim e φ X t j dt j . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . X t1 . .

d. . ⎥ ⎥ ⎦ fX 1 . PROOF We have seen that (see Bivariate Normal in Section 3. .2 Let the r. that is. X(n) in terms of f and F.f. P( X12 + X 2 < 1 ). . j = 1. then ( ) 1 ( ) ( ) ( ) 2 ⎤ ⎥. so that ρ = 0.’s can be formulated. . Set X (1) = min X 1 . X n . x2) = I(0.i. x ) = 1 2 1 2πσ 1σ 2 1 − ρ 2 e −q 2 . r.d.1 Let Xj. ii) Show that X1.d.’s with p. X n s . ( ) () [ () ( )] () [ () ( )] Then express the d. The converse is always true by Corollary 1 in Section 7. X1. if X1. .2. of X(1). . X2 are uncorrelated. X2 are independent. ⋅ ⋅ ⋅ . and p. ( ) ( ) ( ) that is. f given by f(x1. 2 ii) Find the following probabilities: P(X1 + X2 < 1 ).’s X1. X2 (x .f.f. Then X1. .’s with p.1) × (0.v. x2) = g(x1)h(x2).1.f. X2 are independent and identify their common distribution.1)(x1. ⋅ ⋅ ⋅ .f.d. X n . k by Theorem 1(ii). . x 2 = fX 1 x1 ⋅ fX 2 x 2 . if the m. ▲ Exercises 7. X2 have p. 7. X2 are independent if and only if they are uncorrelated. f given by f(x1.f.1. where q= 1 1 − ρ2 2 ⎡⎛ x − μ ⎞ 2 ⎛ x − μ 2 ⎞ ⎛ x2 − μ 2 ⎞ ⎛ x2 − μ 2 ⎞ ⎤ 1 ⎢⎜ 1 − 2ρ⎜ 1 −⎜ ⎟ ⎟⎜ ⎟ ⎟ ⎥. X2 be two r. X n s . X ( n ) s = max X 1 s .168 7 Stochastic Independence with Some Applications which again establishes independence of Xj.f.3 of Chapter 3) fX 1. j = 1.v.f. ( ) X n = max X 1 .d.1. ⋅ ⋅ ⋅ .’s exist. ▲ A version of this theorem involving m. F. X (1) s = min X 1 s . f x = fX 1 x1 = exp⎢− exp⎢− X2 2 2 ⎢ ⎢ 2σ 12 ⎥ 2σ 2 2πσ 1 2πσ 2 ⎥ ⎢ ⎢ ⎦ ⎣ ⎣ Thus. n be i. x2). 3 4 1 P(X1X2 > 2 ). ⎢⎝ σ 1 ⎠ ⎝ σ1 ⎠⎝ σ 2 ⎠ ⎝ σ 2 ⎠ ⎥ ⎦ ⎣ and ⎡ x −μ 2⎤ ⎡ x −μ 1 1 1 2 2 ⎥. . REMARK 4 COROLLARY Let X1. 7.g.g.v. X 2 x1 . X2 have the Bivariate Normal distribution. ⋅ ⋅ ⋅ .3 Let X1. f and d.

1 . X2.2.d.f.i. X2 are dependent.1. 4 where ( ) ( ) A = 1. 1.9 Let X1. x 2 . with parameter λ and B = (1/λ. 1.1. λ = 1 . X2 be two independent r. 0. x3).3 in Chapter 6.f. X2.v. of X1 and X2 and show that X1.d. 1). are independent. . ii) X1. x2)′ ∈ 2. . express the ch. . of X in terms of φ. (Hint: . ii) Determine the constant c. 7.f. Let also Eg(X2) be ﬁnite. x2).1. r.’s with ch.’s and let g: → be measurable. r.’s X1. x3 . ii) Show that these r.’s X1. 7. 0 . where 2 A = {(x1.f.d.1 Stochastic Independence: Criteria of Independence 169 ii) Derive the p. k = 5. 1) × (0. iii) In terms of f. of X1 + X2 and X1 + X2 + X3 .Exercises 7.f.) 1 2 1 {( )( )( )( )} . 1 .4 Let X1. 0 . φ and sample mean X . x3) = 8x1x2x3 IA(x1.i.v.’s with p. 1. . iii) Find a numerical answer for n = 10. Then show that ii) Xi. 0. X3 be jointly distributed with p. X3 are dependent. f and let B be a (Borel) set in .’s with p. i ≠ j. j = 1. f given by 1 f x1 .5 in Chapter 4 and show that the r. x 2 .5 Let X1. ii) Calculate the probability P(X1 > X2) if g = h and h is of the continuous type. 1) × (0.f.f. Then show that E[g(X2) | X1 = x1] = Eg(X2).d.f.1. X3 be r.d.2. 0. 7. .’s with p.8 Let Xj. x3 = I A x1 .v.v. ii) Calculate the probability P(X1 < X2 < X3).1. . r. 2 7.f f given by f(x1. X2.d. X2. f given by f(x1. x2. .10 If Xj. 7.i.v. t ∈ Use Exercise 6.v. Xj.7 Refer to Exercise 4.11 For two i. express the probability that at least k of the X’s lie in B for some ﬁxed k with 1 ≤ k ≤ n.d.d.’s X1. X2 be two r. x12 + x 2 ≤ 9}. 7. . show that φX −X (t) = |φX (t)|2. 7.v. x2. iii) Simplify this expression if f is the Negative Exponential p. n are i.1.’s are independent. ii) Show that X1. 7.1. X2.v.d. X3 are independent. Utilize this result in order to ﬁnd the p. X2 are independent. x2) = cIA(x1.v.1.d. n be i. j = 1. where A = (0.6 Let the r. ∞). 0.

⋅ ⋅ ⋅ .2 Proof of Lemma 2 and Related Results We now proceed with the proof of Lemma 2. Indeed.’s with joint and marginal ch.1. show that φX (t. t 2 ∈ . Thus. ⋅ ⋅ ⋅ . = E Y11Y21 − Y12Y22 + iE Y11Y22 + Y12Y21 11 21 12 22 11 22 12 21 11 21 12 22 11 22 12 21 11 12 21 22 1 2 [( )( )] Next. . . k. j = 1. t ) = φ (t )φ (t ). . so that we use integrals. By an example. 1 2 X1 1 X2 2 1 .X2 t1 . assume the result to be true for k = m and establish it for k = m + 1. X2.170 7 Stochastic Independence with Some Applications 7. E Y1Y2 = E Y11 + iY12 Y21 + iY22 ( ) ( ) ( ) = [ E(Y Y ) − E(Y Y )] + i[ E(Y Y ) + E(Y Y )] = [( EY )( EY ) − ( EY )( EY )] + i[( EY )( EY ) − ( EY )( EY )] = ( EY + iEY )( EY + iEY ) = ( EY )( EY ).f. φX and φX . set gj(Xj) = Yj = Yj1 + Yj2.X2 (t . For k = 2. PROOF OF LEMMA 2 ⎡ k ⎤ ∞ ∞ E ⎢∏ g j X j ⎥ = ∫ ⋅ ⋅ ⋅ ∫ g1 x1 ⋅ ⋅ ⋅ gk xk −∞ −∞ ⎢ j =1 ⎥ ⎣ ⎦ × f X . and for simplicity.’s involved are continuous.v. t ) = φ (t )φ (t ). . X2 be two r. 1 ( ∞ ∞ −∞ −∞ 1 1 k X1 1 Xk k 1 k ∞ ∞ −∞ 1 1 X1 1 1 −∞ k k Xk k k 1 1 k k Now suppose that the gj’s are complex-valued. . Suppose that the r. X1 X2 t∈ . xk dx1 ⋅ ⋅ ⋅ dxk ( ) ( ) k ( ) k ) = ∫ ⋅ ⋅ ⋅ ∫ g ( x ) ⋅ ⋅ ⋅ g ( x ) f ( x ) ⋅ ⋅ ⋅ f ( x )dx ⋅ ⋅ ⋅ dx (by independence) = ⎡ ∫ g ( x ) f ( x )dx ⎤ ⋅ ⋅ ⋅ ⎡ ∫ g ( x ) f ( x )dx ⎤ ⎢ ⎢ ⎥ ⎥ ⎣ ⎣ ⎦ ⎦ = E[ g ( X )] ⋅ ⋅ ⋅ E[ g ( X )].X . Then X1.’s φX . does not imply independence of X1. X2 are independent if and only if 1 1 1 2 φX 1 . Replace integrals by sums in the discrete case.12 Let X1. X x1 . 7.v.

Thus uncorrelated r. if the r. ▲ m+1 1 m m+1 [( COROLLARY 1 The covariance of an r.’s are independent or have pairwise covariances 0 (are pairwise uncorrelated).v.7. that is. . the corollary to Theorem 1 after the proof of part (iii). then: ⎛ k ⎞ k σ 2 ⎜ ∑ c j X j ⎟ = ∑ c 2σ j2 + ∑ ci c jσ iσ j ρij j ⎝ j =1 ⎠ j =1 1≤i ≠ j ≤k = ∑ c 2σ j2 + 2 j j =1 k 1≤i < j ≤k ∑ ci c jσ iσ j ρij . k. c) = 0. ( ii) If also σj > 0. and ρij = ρ(Xi.’s in general are not independent.’s Xj. . and j j j j iii′) σ 2(∑k= 1Xj) = ∑k= 1σ 2 ′ j j j (Bienaymé equality).1 Stochastic Independence: Criteria of Independence 171 E Y1 ⋅ ⋅ ⋅ Ym+1 = E Y1 ⋅ ⋅ ⋅ Ym Ym+1 1 m ( ) ) ] = E(Y ⋅ ⋅ ⋅ Y )( EY ) (by the part just established) = ( EY ) ⋅ ⋅ ⋅ ( EY )( EY ) (by the induction hypothesis). . . . X j . k.v. then they are uncorrelated. . j = 1. . ▲ The converse of the above corollary need not be true.v. Xj). . . it holds: j ⎛ k ⎞ k σ 2 ⎜ ∑ c j X j ⎟ = ∑ c j2σ j2 + ∑ ci c j Cov X i . . if their variances are also positive. k with ﬁnite second moments and variances σ 2 = σ 2(Xj). however. X 2 = E X1 X 2 − EX1 EX 2 1 2 1 ( ) ) ( )( ) = ( EX )( EX ) − ( EX )( EX ) = 0. Cov(X. Y) = Cov(X.’s X1 and X2 are independent. Y)/σ(X)σ(Y).v. . In particular. 2 ( by independence and Lemma 2. PROOF Cov(X. . j = 1.v. PROOF In fact. X and of any other r. which is equal to a constant c (with probability 1) is equal to 0. X j ⎝ j =1 ⎠ j =1 1≤i ≠ j ≤k ( ) ) = ∑ c 2σ j2 + 2 j j =1 k 1≤i < j ≤k ∑ ci c j Cov X i . provided their second moments are ﬁnite. then they have covariance equal to 0.v. j = 1. In particular.) REMARK 5 COROLLARY 3 i) For any k r. ▲ COROLLARY 2 If the r. . The second assertion follows since ρ(X. i ≠ j. c) = E(cX) − (Ec)(EX) = cEX − cEX = 0.2 Proof of Lemma 2 and Related Results 7. (See. then: iii) σ2(∑k= 1cjXj) = ∑k= 1c2σ 2. and any constants cj. Cov X1 .

⎢ j =1 ⎥ j =1 ⎣ ⎦ Let Xj. X j ( ) ) i = ∑ c 2σ j2 + 2 j j =1 This establishes part (i).2 Refer to Exercise 4. ▲ (since Cov( X .v. Then show that n ⎡n ⎤ 3 E ⎢∑ X j − EX j ⎥ = ∑ E X j − EX j . Xj) = 0. ( ) 3 ( ) 7. . . Then the assertion follows from either part (i) or part (ii). . 2 ( ) 2 where 1 k 1 k X j and S 2 = ∑ X j − X . .’s with mean μ and variance σ2. j = 1.2. Xj) = σiσjρij = σjσiρji. j = 1. . . σ2(X1 + X2).3 Let Xj. X= ( ) 7. . . X ) = Cov( X .v. n be i. both ﬁnite. ′ iii′) Follows from part (iii) for c1 = · · · = ck = 1. . respectively.v. .4 . ∑ k j =1 k j =1 7. . .2. . . . .2. r.d.’s with ﬁnite moments of third order. X )). either because of independence and Corollary 2. k. X j ( Exercises 7. iii) Here Cov (Xi. k for which E(Xj) = μ (ﬁnite) j = 1. j = 1. i j j 1≤i < j ≤k ∑ ci c j Cov X i . Part (ii) follows by the fact that Cov(Xi.172 7 Stochastic Independence with Some Applications PROOF ii′i) Indeed. . . or ρij = 0. .1 For any k r. show that ∑ (X j =1 k j −μ ) = ∑ (X 2 j =1 k j −X ) 2 +k X −μ ( ) 2 = kS 2 + k X − μ . j = 1. k. σ2(X1 + X2 + X3) without integration. n be independent r. . E(X1 X2 X3). i ≠ j.5 in Chapter 4 and ﬁnd the E(X1 X2).2.’s Xj.i. ′ 2 ⎡k ⎛ k ⎞ ⎛ k ⎞⎤ 2 σ ⎜ ∑ c j X j ⎟ = E ⎢∑ c j X j − E ⎜ ∑ c j X j ⎟ ⎥ ⎝ j =1 ⎠ ⎝ j =1 ⎠⎥ ⎢ j =1 ⎦ ⎣ ⎡k = E ⎢∑ c j X j − EX j ⎢ j =1 ⎣ ⎡k = E ⎢∑ c 2 X j − EX j j ⎢ ⎣ j =1 ( ) ⎤ ⎥ ⎥ ⎦ 2 2 ( ) + ∑ c c (X i j i≠ j i ⎤ − EX i X j − EX j ⎥ ⎥ ⎦ )( ) = ∑ c 2σ j2 + j j =1 k k 1≤i ≠ j ≤k ∑ ci c j Cov X i .2. in case σj > 0.

we will restrict ourselves to the case of ch. when they exist. .v. of X is that of a B(n.f. .’s taking on the values −1. j = 1. X2 are dependent. f ( −1. − 1) = α ( ) f (0.’s alone. j =1 j =1 k () k .v.7. f −1.1 Stochastic Independence: Criteria of Independence 173 ii) In terms of α. The m.1 and σ = 2. For simplicity. . of a B(n. c and σ.v. 7. 1) = β .’s. j = 1. 1 with the following respective probabilities: f ( −1.f.’s with the same parameter p and possibly distinct nj’s is also Binomially distributed. . 7. 0) = β . X2 be two r. where n = ∑ n j .) PROOF It sufﬁces to prove that the ch. the sum of independent Binomially distributed r.f. we have φ X t = φ Σ X t = ∏ φ X t = ∏ pe it + q j j () () k j =1 j () k j =1 ( ) = ( pe nj it +q ) n which is the ch.f.. without explicitly mentioning it. where λ = ∑ λ j . β > 0. ii) X1. p) r. . f (0. ﬁnd the smallest value of n for which the probability that X (the sample mean of the X’s) and μ differ in absolute value at most by c is at least α. as will be seen below. p). can be used in the same way as the ch.90. f (1. Then ch. where n is as above. 0.f.’s involved are independent. f (0. .3 Some Consequences of Independence The basic assumption throughout this section is that the r. so that ρ = 0. − 1) = α . ii) Give a numerical answer if α = 0. 1 = α . − 1) = β . when this last j=1 expression appears as a subscript here and thereafter. f (1. k and independent. 1) = α . k and independent. THEOREM 2 Let Xj be B(nj. c = 0. 0) = β . Then X = ∑ Xj is B n. 0) = 0.5 Let X1. ▲ THEOREM 3 Let Xj be P(λj). as we desired to prove. j =1 j =1 k ( ) k (That is. the conclusions of the theorems will be reached by way of Theorem 3 in Chapter 6. writing ∑j Xj instead of ∑k Xj. . α + β = 1 .3 Some Consequences 7.’s. X2) = 0. .2. Then X = ∑ X j is P λ . α .g.f. p .’s are used very effectively in deriving certain “classic” theorems. However..v. f (1. p) r.v. In all cases. 4 Then show that: ii) Cov(X1.

. σ 2 = ∑k= 1c2σ 2. k j =1 By assuming that the X’s are normal. Hence X is N(μ.’s with E X j = μ. σ2/k). PROOF In (ii) of Theorem 4. we set c1 = ⋅ ⋅ ⋅ = c k = 1 2 . Chapter 4. . σ 2). j = 1. σ 2 = ∑k= 1σ 2. ▲ Now let Xj. or equivalently.174 7 Stochastic Independence with Some Applications (That is. ( ) j = 1. . σ2). . X= 1 k ∑Xj. k. μ1 = ⋅ ⋅ ⋅ = μ k = μ . σ2).) PROOF (ii) We have k k ⎡ ⎛ σ 2 c 2t 2 ⎞ ⎤ φ X t = φ Σ c X t = ∏ φ X cj t = ∏ ⎢exp⎜ icj t μ j − j j ⎟ ⎥ 2 ⎠⎥ j =1 j =1 ⎢ ⎝ ⎦ ⎣ 2 2⎞ ⎛ σ t = exp⎜ itμ − 2 ⎟ ⎝ ⎠ () j j j () j ( ) with μ and σ2 as in (ii) above. be N(μj. . Then j ii) X = ∑k= 1Xj is N(μ.) PROOF We have φ X t = φ Σ X t = ∏ φ X t = ∏ exp λ j e it − λ j j j () () k j =1 j () k j =1 ( ) k ⎛ k ⎞ = exp⎜ e it ∑ λ j − ∑ λ j ⎟ = exp λe it − λ ⎝ j =1 ⎠ j =1 ( ) which is the ch. we get COROLLARY Let Xj be N(μ. k be any k independent r. where μ = ∑k= 1μj. j = 1.’s is also Poisson distributed. . . the sum of independent Poisson distributed r. . [ k ( X − μ)]/σ is N(0. 1). the sum of independent Normally distributed r. The second follows from the ﬁrst by the use of Theorem 4. σ 2). of a P(λ) r.v. σ 2). more generally. (i) Follows from (ii) by setting c1 = c2 = · · · = ck = 1. j j j j ii) X = ∑k= 1cjXj is N(μ. j = 1. . j j j j j (That is.f. and. Then X is N(μ. .v.v.’s is Normally distributed. k and independent. Set ( ) σ 2 X j = σ 2.v. since . . k and independent. ▲ THEOREM 4 Let Xj. and σ 12 = ⋅ ⋅ ⋅ = σ k = σ 2 k and get the ﬁrst conclusion. . where μ = ∑k= 1cjμj. ⋅ ⋅ ⋅ .

j = 1. . σ 2). j = 1. . . and by Theorem 3. k are independent. Chapter 4.7. . k and independent.v. Xj. j = 1. j = 1. j =1 j =1 k k PROOF We have φ X t = φ Σ X t = ∏ φ X t = ∏ 1 − 2it j j () () k j =1 j () k j =1 ( ) − rj 2 = 1 − 2it ( ) −r 2 which is the ch. Then kS2/σ 2 is χ 2 . σ2 k ▲ THEOREM 5 Let Xj be χ 2 . . . k and independent. .) COROLLARY 2 Let Xj. . . σ2) and independent. σ2 . then it will be shown that X and S2 are independent.f. where r = ∑ rj . k−1 PROOF We have 2 ⎡ k X −μ ⎛ Xj − μ⎞ =⎢ ∑⎜ σ ⎟ ⎢ σ ⎠ j =1 ⎝ ⎣ k ( )⎤ ⎥ ⎥ ⎦ 2 + kS 2 . k. see Theorem 6. k and independent. k are N(μ. k. ( ) 2 where S2 = 2 1 k ∑ Xj − X . Then the following useful identity is easily established: ∑ (X j =1 k j −μ ) =∑ ( X 2 j =1 k j −X ) 2 +k X −μ ( ) 2 = kS 2 + k X − μ . ⎝ ⎠ j 2 j = 1. . . . . j = 1. . . . . . PROOF By Lemma 1. (For this. . k j =1 ( ) If. ⋅ ⋅ ⋅ .3 Some Consequences 7. Then r j X = ∑ X j is χ r2 . ⋅ ⋅ ⋅ . j = 1. ▲ 2 r COROLLARY 1 Let Xj be N(μj. be N(μ.’s such that E(Xj) = μ. Thus Theorem 3 applies and gives the result. .1 Stochastic Independence: Criteria of Independence 175 k X −μ ( σ ) = (X − μ ) . ⎛ X j − μj ⎞ ⎜ σ ⎟ . σ 2). Chapter 9. of a χ r. . ▲ Now let Xj. . j = 1. k be any k r.v. ⎛ X j − μj ⎞ ⎜ σ ⎟ ⎝ ⎠ j 2 are χ 12 . Then j k ⎛ X −μ ⎞ j j X = ∑⎜ σj ⎟ ⎠ j =1 ⎝ 2 2 is χ k . . in particular. .

r have the Geometric distribution with parameter p. is B(t. j = 1. .176 7 Stochastic Independence with Some Applications Now ⎛ X − μ⎞ ∑ ⎜ jσ ⎟ ⎠ j =1 ⎝ k 2 2 is χ k by Corollary 1 above. . . . The second statement is immediate. Then X = ∑k= 1Xj is kY. .v.d. . . THEOREM 6 ( ) ( ) Let Xj. and σ 2 S 2 = σ . is the Multinomial p. k k2 The following result demonstrates that the sum of independent r.v. of a χ 2 r. let Xj be independent r. .f. .’s Xj.2 If the independent r. . . and ⎡ k X −μ ⎤ ⎥ is χ12 . .v. n.d. ⎢ ⎥ ⎢ σ ⎦ ⎣ by Theorem 3. σ = 1.v.f. we get (1 − 2it)−k/2 = (1 − 2it)−1/2 φkS /σ (t). . k be independent r. Chapter 4. . and hence. .v. as was the case in Theorems 2–5 above. where Y is Cauchy with μ j = 0. ▲ k−1 2 2 2 2 ( ) 2 REMARK 6 It thus follows that.3. σ = 1. which is the ch.’s having the Cauchy distribution with parameters μ = 0 and σ = 1. given T = t. X/k = X is Cauchy with μ = 0. and σ 2 ⎜ 2 ⎟ = 2 k − 1 . . σ = 1. of Xj.’s having a certain distribution need not have a distribution of the same kind. . j = 1. where Y is Cauchy with μ = 0. X = X1 + · · · + Xr has the Negative Binomial distribution with parameters r and p. 7.’s distributed as P(λj).’s of both sides of the last identity above. Hence φkS /σ (t) = (1 − 2it)−(k−1)/2 which is the ch. j = 1. j =1 n Then show that ii) The conditional p. . n. λ1j /λ). with parameters t and pj = λj/λ. j = 1. .3.f.f.1 For j = 1. . . j =1 n λ = ∑λj . . of kY. ▲ PROOF j 1 Exercises 7. We have φX(t) = φ∑ Xj(t) = [φX (t)]k = (e−|t|)k = e−k|t|. ⎝ σ ⎠ ⎝ σ ⎠ ( ) or ES 2 = 2 k−1 4 k−1 2 σ . show that the r.f. . n. and set T = ∑Xj. Then taking ch.f. . ii) The conditional joint p.d. of Xj. given T = t. n.v. ⎛ kS 2 ⎞ ⎛ kS 2 ⎞ E⎜ 2 ⎟ = k − 1. j = 1. .

σ2. σ2. Lemma 3.5 Let X1. 1 2 Then ii) Calculate the probability P( X > Y ) as a function of m. p1) and B(n2.v. set X = c1X1 + c2X2. is also stated. . and for any two constants c1 and c2. . m and Yj. 1 2 respectively.v.3 The life of a certain part in a new automobile is an r.’s distributed as N(μ1.f.1 Stochastic Independence: Criteria of Independence 177 7. respectively.d. μ2 and σ1. (For example. .3. is Negative Exponential with parameter λ = 0. σ 2 ) and the Y’s are distributed as N(μ2.v.3.v.005 day. n be independent r. of the combined life of the part and its spare. which allows us to present a proof of Lemma 1.f. we give an alternative deﬁnition of independence of r.s. σ 2 ). . ii) Under what conditions on the α’s and β’s are the r. Determine the distribution of the r.d. r 1 2 7.’s. respecr r tively.7 Let X1 and X2 be independent r. . σ 2 ). .d. which provides a parsimonious way of checking independence of r.v. . Y.’s X1 + X2.’s such that the X’s are distributed as N(μ1.’s X and Y independent? 7.’s. j = 1. . X2 be independent r. X1 may represent the tensile strength (measured in p.’s distributed as B(n1. specify r.3.) of a steel cable and X2 may represent the strains applied on this cable. X1 − X2 and X1 − X2 + n2. iii) Find the expected life of the part in question. iii) If the automobile comes supplied with a spare part. j =1 n where the α’s and β’s are constants.7. Y distributed as X and independent of it.3. j = 1. Then P(X1 − X2 > 0) is the probability that the cable does not break. j =1 n Y =∑ β j X j . i = 1. An additional result.v. 1 2 7. and N(μ2. iii) What is the probability that X + Y ≥ 500 days? 7. μ1. X distributed as χ 2? Also. . . X whose p. 7. Then ii) Find the p. ﬁnd the p.’s X.v. σ2) and set X = ∑α j X j . n = 15.3.v. . ii) Give the numerical value of this probability for m = 10.v.v. Under what conditions on c1 and c2 is the r. X2 be independent r.f. n.’s of the r. μ2 and σ1. n be independent r.i. . Calculate the probability P(X1 − X2 > 0) as a function of μ1.v.4* Independence of Classes of Events and Related Results In this section.v. .) 7.’s distributed as χ 2 and χ 2 .v.8 Let Xj. σ 2 ).4 Let X1. μ1 = μ2 and σ 2 = σ 2 = 6.’s distributed as N(μ. p2).4* Independence of Classes of Events and Related Results 7. whose life is an r.3.6 Let Xi.

. k are independent (in any one of the modes mentioned in the previous deﬁnition) if the σ-ﬁelds induced by them are independent. . . . . k. . P) and recall that k events A1. ⋅ ⋅ ⋅ . the σ-ﬁeld induced by X. Deﬁnition 3 can be weakened considerably. .’s Xj. . and if g(X) is a measurable function of X and Ag(X) = [g(X)]−1(B). . . . j = 1. The converse is also obviously true. . . j = 1. j = 1. By Deﬁnition 3. . Ak are independent. . More precisely.178 7 Stochastic Independence with Some Applications To start with. In the ﬁrst place. so are Aj. . k. . k. . But PROOF OF LEMMA 1 A= g X [ ( )] (B) = X [g (B)] = X (B′).’s Xj. that is. . k.’s. . . let A ∈ Ag(X).v. k are (stochastically) independent (or independent in the probability sense. the events A1. . as explained in Lemma 3 below. . k. That independence of those σ-ﬁelds is implied by independence of the classes Cj. j = 1. From the very deﬁnition of X j−1(B). . if X is an r. we will have the σ-ﬁelds induced by them which we denote by Aj = X j−1(B). . Ak are said to be independent if for all 2 ≤ m ≤ k and all 1 ≤ i1 < · · · < im ≤ k. LEMMA 3 Let A j = X j−1 B ( ) and C j = X j−1 ({(−∞. k means independence of the σ-ﬁelds. . . . ▲ PROOF We may now proceed with the proof of Lemma 1. . . Then there exists B ∈ B such that A = [g(X)]−1 (B).v. or statistically independent) if for every Aj ∈ Cj. the previous deﬁnition is equivalent to Deﬁnition 1. . x]. independence of the r.v. k. Chapter 3) that X−1(B) is a σ-ﬁeld. . A. j = 1. for every Bj ∈B. .v. j = 1. In fact. then Ag(X) ⊆ AX. . j = 1. X j−1(Bj) ∈ X j−1(B). k. j = 1.v. Then if Cj are independent. k. . as follows: 1 m 1 m DEFINITION 2 We say that Cj. . It is an immediate consequence of this deﬁnition that subclasses of independent classes are independent. . it holds that P(Ai ∩ · · · ∩ Ai ) = P(Ai ) · · · P(Ai ). . j = 1. it sufﬁces to establish independence of the (much “smaller”) classes Cj. . and AX = X−1(B). j = 1. . . . DEFINITION 3 We say that the r. . j = 1.’s Xj. . . . j = 1.’s Xj. Thus. k. in order to establish independence of the r. . The next step is to carry over the deﬁnition of independence to r. . . k. k. if we consider the r. Actually. . . . . −1 −1 −1 −1 . On the basis of these observations. . . Then we have seen (Theorem 1. . This deﬁnition is extended to any subclasses of Cj. consider the probability space (S.v. x]. j = 1. . To this end. is an involved result in probability theory and it cannot be discussed here. where Cj = X j−1({(−∞. . let X be a random variable. for every Aj ∈ X j−1(B) there exists Bj ∈B such that Aj = X j−1(Bj). . sub-σ-ﬁeld of A. x ∈ }). j = 1. x ∈ }). According to the following statement. .

n. j = 1. k. ⋅ ⋅ ⋅ . Set X1 = IA . . B′ ∈ B. Let now Aj = Xj−1(B) and A j* = g X j [ ( )] (B ). Then A j* ⊆ A j . are independent. j = 1. . j = 1. It follows that X−1 (B′) ∈AX and thus.1 Consider the probability space (S. . k. j = 1. . so are A*. ⋅ ⋅ ⋅ . . . and since Aj. 1 2 . k. A. k. ▲ j Exercise 7. . P) and let A1. X2 are independent if and only if A1. .1 Stochastic Independence: Criteria of Independence 179 where B′ = g−1 (B) and by the measurability of g. X2 = IA and show that X1. −1 j = 1. A2 be events. Generalize it for the case of n events Aj. · · · . A ∈AX.4.Exercise 7. A2 are independent.

s.’s and the r. if Xn(s) ⎯ → X(s) for all s ∈ S except possibly for ⎯ ⎯ n →∞ n →∞ a subset N of S such that P(N) = 0. This type of convergence is also known as strong convergence. then Xn ⎯P → X is ⎯ n→∞ equivalent to: P[|Xn − X| ≤ ε] ⎯ → 1. and some comments are provided as to their nature. δ ) > 0 ⎯ n→∞ such that P[|Xn − X| > ε] < δ for all n ≥ N(ε. ii) We say that {Xn} converges in probability to X as n → ∞. . P[|Xn − X| > ε] ⎯ → 0. δ > 0 there exists N(ε. A.).v. Thus Xn ⎯⎯→ X means that for every ε > 0 and for every s ∈ N c there n→∞ exists N(ε. . ⎯ ⎯ n →∞ n→∞ Thus Xn ⎯P → X means that: For every ε. s) > 0 such that Xn s − X s < ε for all n ≥ N(ε. and we write Xn ⎯P → X. n = 1. δ ). ⎯ n →∞ n→∞ or P[Xn ⎯ → X] = 1.’s four kinds of convergence are deﬁned. P)). X are deﬁned on the probability space (S.s. DEFINITION 1 i) We say that {Xn} converges almost surely (a. if for every ε > 0. to a. s). then clearly P[|Xn − X| ≥ ε] ⎯ → 0. Also if P[|Xn − X| > ε] ⎯ → 0 for every ⎯ ⎯ n →∞ n →∞ ε > 0.s. . a. ⎯ n →∞ 180 . X as n → ∞. 2. For such a sequence of r.180 8 Basic Limit Theorems Chapter 8 Basic Limit Theorems 8. the sequence of the r.v.1 Some Modes of Convergence Let {Xn}.v. or with probability one. () () REMARK 1 Since P[|Xn − X| > ε] + P[|Xn − X| ≤ ε] = 1. or Xn ⎯ → X with probability 1. and we write Xn ⎯⎯→ X. An illustrative example is also discussed. be a sequence of random variables and let X be a random variable deﬁned on the sample space S supplied with a class of events A and a probability function P (that is.

→ X. . f.d. ⎯ n →∞ Next. x < 1− 1 n 1 1 2 Fn Figure 8.f. then Xn ⎯d → X does not necessarily imply ⎯ n→∞ the convergence of fn(x) to a p. clearly. Under further conditions on fn. .1 0 1 1 n 1 1 1 n One sees that Fn(x) ⎯ → F(x) for all x ≠ 1. x).f.f.8. if E|Xn − X|2 ⎯ → 0. ⎪ ⎩ () if if if ( ) 1 − (1 n) ≤ x < 1 + (1 n) x ≥ 1 + (1 n).f. ⎯ ⎯ n →∞ n→∞ Thus Xn ⎯d → X means that: For every ε > 0 and every x for which ⎯ n→∞ F is continuous there exists N(ε.’s fn.f. This type of convergence is also known as weak convergence. however. .d. that fn converges to a p. and we write Xn ⎯q. We now assume that E|Xn|2 < ∞. ⎪1 ⎪ Fn x = ⎨ 2 . . n = 1.1 Some Modes of Convergence 181 Let now Fn = FX . the d.f. Fn corresponding to fn is given by ⎧0. Then: iv) We say that {Xn} converges to X in quadratic mean (q.f. consider the p. ⎯ ⎯ n →∞ n→∞ Thus Xn ⎯q. .m. where F(x) is deﬁned by ⎯ n →∞ ⎧0 . REMARK 2 If Fn have p.. ⎪1. For n = 1.m. Then. and we write Xn ⎯d → X.d. F = FX.→ X means that: For every ε > 0. 2. Then n iii) We say that {Xn} converges in distribution to X as n → ∞.d. . which is a d. if Fn(x) ⎯ → F(x) for all x ∈ for which F is continuous. () if if x<1 x ≥ 1. ⎪ fn x = ⎨ 2 ⎪0. there exists N(ε) > 0 such ⎯ n→∞ that E|Xn − X|2 < ε for all n ≥ N(ε). fn(x) ⎯ → f(x) = 0 for all x ∈ and f(x) is not a p. F x =⎨ ⎩1. ⎩ () if x = 1− 1 n ( ) or x = 1+ 1 n ( ) EXAMPLE 1 otherwise.m.d. . f. 2. . it may be the case.’s deﬁned by ⎧1 . x) such that |Fn(x) − F(x)| < ε for all n ≥ N(ε. . as the following example illustrates.) as n → ∞.

8. is of a different nature. one has i) Yn ⎯P → 0. . Then show that.f. P n→∞ ( ) ⎯ Under what conditions on the pn’s does Xn ⎯ → 0? 8. . . . but the events An themselves keep wandering around the sample space S. ⎯ iv) Vn ⎯d → V.f.182 8 Basic Limit Theorems REMARK 3 Almost sure convergence is the familiar pointwise convergence of the sequence of numbers {Xn(s)} for every s outside of an event N of probability zero (a null event).m. . . Then show that Fn(x) ⎯ → 0 for every x ∈ .1. σ 2(Xj) = σ 2. Then show that Yn ⎯q. however. ⎯ n →∞ 8. . . . n.v. . 2.2 For n = 1.1 For n = 1. . ⎯ ii) Zn ⎯P → 1. r.’s such that P X n = 1 = pn . where U and V have the negative exponential distribution ⎯ with parameter λ = 1.’s such that EXj = μ. let Xn be independent r. . Zn = max(X1. 1).1. Convergence in distribution is also a pointwise convergence of the sequence of numbers {Fn(x)} for every x for which F is continuous.f. . we have that Xn ⎯P → X.1. . j = 1. . .4 For n = 1. So the sequence of numbers {P(An)} tends ⎯ ⎯ n →∞ n→∞ to 0. .1. ⎯ n →∞ Thus a convergent sequence of d. Yn be r. . Convergence in probability.3 Let Xj.1. with d.’s need not converge to a d.5 Let Xj .v.’s such that E(Xn − Yn)2 ⎯ → 0 and ⎯ n →∞ suppose that E(Xn − X)2 ⎯ → 0 for some r. |Xn(s) − X(s)| > ε} for an arbitrary but ﬁxed ε > 0. ( ) P X n = 0 = 1 − pn. n be independent r. Show that E(Xn − μ)2 ⎯ → 0.d. both ¯ ﬁnite. . . Xn). j = 1. and set Yn = min(X1. By setting An = {s ∈S.i. X. Vn = n(1 − Zn). . Un = nYn. as n → ∞.’s distributed as U(0. Fn given by Fn(x) = 0 if x < n and Fn(x) = 1 if x ≥ n. be i. convergence in quadratic mean simply signiﬁes that the averages E|Xn − X|2 converge to 0 as n → ∞.v. Finally. . . let Xn. ⎯ iii) Un ⎯d → U.v.→ ⎯ ⎯ n →∞ n→∞ X. let Xn be an r. 8. as n → ∞.v. Xn). . . 2. n.2 Relationships Among the Various Modes of Convergence The following theorem states the relationships which exist among the various modes of convergence. . . if P(An) ⎯ → 0. . 8. 2. . Exercises 8.v.

in dist. so that P(Bk) → P(Ac). 2. Chapter 2. But for any ﬁxed positive integer m.2 Relationships Among the Various Modes of Convergence 183 THEOREM 1 a.s. Cn↓Bk. . which is equivalent to saying that P(Ac) = 0. in prob. Chapter a. if a. ⎯ ⎯ n→∞ n→∞ P d iii) Xn ⎯ → X implies Xn ⎯ → X. again. To summarize. k⎠ r =1 ⎝ Hence P(Cn)↓P(Bk) = 0 by Theorem 2. However. ⇒ conv. By setting ∞ ∞ ⎛ 1⎞ Bk = I U ⎜ X n+ r − X ≥ ⎟ . and therefore P(Bk) = 0. that is. Next. n→∞ it is clear that for every ﬁxed k. as k → ∞. k ≥ 1.m. P[X = c] = 1 for some constant c.s. where ∞ ⎛ 1⎞ C n = U ⎜ X n+ r − X ≥ ⎟ . k⎠ k=1 n=1 r =1 ⎝ The sets A. ⎯ n→∞ n→∞ ii) Xn ⎯q. ii) By special case 1 (applied with r = 2) of Theorem 1.s.2.4) that ∞ ∞ ∞ ⎛ 1⎞ A = I U I ⎜ X n+ r − X < ⎟ . k⎠ n=1 r =1 ⎝ we have Bk↑Ac. i) Xn ⎯⎯→ X implies Xn ⎯P → X. Xn ⎯⎯→ X. Thus if Xn ⎯⎯→ X.s.m. we have ( ) P Xn − X > ε ≤ [ ] E Xn − X 2 ε2 . as ⎯ n→∞ was to be seen. in q. and as n → ∞.→ X implies Xn ⎯P → X. PROOF i) Let A be the subset of S on which Xn ⎯ → X. Then it is not hard to see ⎯ n →∞ (see Exercise 8.8. k⎠ k=1 n=1 r =1 ⎝ so that the set Ac for which Xn ⎯ → X is given by ⎯ n →∞ ∞ ∞ ∞ ⎛ 1⎞ A c = U I U ⎜ X n+ r − X ≥ ⎟ . this is equalivalent to saying that Xn ⎯P → X. then P(Ac) = 0. ⇑ conv. one has that n→∞ P(Cn) ⎯ → 0. conv. Ac as well as those appearing in the remaining of this discussion are all events. In terms of a diagram this is a. by Theorem 2. ⎜ k ⎠ r =1 ⎝ k⎠ ⎝ so that ⎡∞ ⎛ ⎛ 1⎞ 1⎞⎤ P⎜ X n+ m − X ≥ ⎟ ≤ P ⎢U ⎜ X n+ r − X ≥ ⎟ ⎥ = P C n ⎯ → 0 ⎯ n→∞ k⎠ k⎠⎥ ⎝ ⎢ r =1 ⎝ ⎣ ⎦ for every k ≥ 1. ⇒ conv. and hence we can take their probabilities. ⎯ n →∞ ∞ ⎛ ⎛ 1⎞ 1⎞ X n+ m − X ≥ ⎟ ⊆ U ⎜ X n+ r − X ≥ ⎟ . The converse is also true if X is degener⎯ ⎯ n→∞ n→∞ ate.

X ≤ x − ε since ] [ ≤ x] ∪ [ X ] [ ⊆ [X n n ] [ ≤ x + X n > x.m. then E|Xn − X|2 ⎯ → 0 implies P[|Xn − X| > ε] ⎯ → ⎯ ⎯ ⎯ n →∞ n →∞ n→∞ 0 for every ε > 0. [ ] [ ) ] [ or F x − ε ≤ Fn x + P X n − X ≥ ε . we get (by the fact that x is a continuity point of F) that F x ≤ lim inf Fn x ≤ lim→∞ Fn x ≤ F x . So ] [ ] ] [X ≤ x − ε ] ⊆ [X implies ≤ x ∪ Xn − X ≥ ε ] [ ] ] P X ≤ x − ε ≤ P Xn ≤ x + P Xn − X ≥ ε . n→∞ n→∞ Letting ε → 0. x ≠ c. n →∞ In a similar manner one can show that lim sup Fn x ≤ F x + ε . We have ⎯ n→∞ P X n − c ≤ ε = P −ε ≤ X n − c ≤ ε [ ] [ [ ] = P[ X ≤ c + ε ] − P[ X ≥ P[ X ≤ c + ε ] − P[ X = F (c + ε ) − F (c − ε ). − X ≥ − x + ε − X ≥ ε ⊆ Xn − X ≥ ε . ⎯ n→∞ iii) Let x ∈ [X ≤ x − ε ] = [X ⊆ [X ⊆ [X [X n be a continuity point of F and let ε > 0 be given. X ≤ x − ε + X n > x. sup n →∞ n Hence lim Fn(x) exists and equals F(x). Assume now that P[X = c] = 1. ] ] ] > x. X ≤ x − ε n −X ≥ε . X ≤ x − ε = X n > x. Thus.→ X. if Xn ⎯ → X. n →∞ ( ) () (1) () ( ) (2) But (1) and (2) imply F(x − ε) ≤ lim inf Fn(x) ≤ lim sup Fn(x) ≤ F(x + ε). Xn ⎯P → X. () x<c x≥c and our assumption is that Fn(x) ⎯ → F(x). = P c − ε ≤ Xn ≤ c + ε n n n n ] n n ] ≤ c − ε] < c−ε . if Xn ⎯q.184 8 Basic Limit Theorems Thus. Then n→∞ () () () () ⎧0 . Then we have n n n ≤ x. F x =⎨ ⎩1. then we have by taking limits ⎯ P n→∞ ( () [ ] F x − ε ≤ lim inf Fn x . We must show that ⎯ n →∞ Xn ⎯P → c. or equivalently.

m. 1]. It is also clear that for m > n. . . the proof m that EX 2 ⎯ → 0 is complete. clearly. These intervals are then given by ⎛ j −1 j ⎤ ⎜ k −1 . a. . ⎝2 2 ⎦ j = 1.’s within this group. EX 2 ⎯ → 0. ⎯ n→∞ that is. . and for n ≥ 1. 1]. otherwise.s.v. In fact. 1] as measures of their length. . . then: Xn ⎯q. it sufﬁces to show that Xn ⎯q. not even for a single s ∈ (0. in the following way: There are (2k − 1) − (2k−1 − 1) = 2k−1 r. 1] into 2k−1 subintervals of equal length. We deﬁne the jth r. 2. so that Xn ⎯q. For each k = 1. we deﬁne a group of 2k−1 r. ⎯ n →∞ n ⎯ The example just discussed shows that Xn ⎯P → X need not imply that n→∞ a. we get lim P X n − c ≤ ε ≥ F c + ε − F c − ε = 1 − 0 = 1. ⎯ ⎯ n →∞ n →∞ In fact. (This is known as the Lebesgue measure over (0.2 Relationships Among the Various Modes of Convergence 185 Since c − ε.v. k−1 ⎥ and 0. 1].8. .v. n→∞ n→∞ EXAMPLE 3 Let S and P be as in Example 2. n→∞ [ ] ( ) ( ) ] n→∞ Thus P X n − c ≤ ε ⎯ → 1. .’s converges to 0 in probability.v. Hence EX 2 = 1/2k−1.s. . ▲ ⎯ [ REMARK 4 not true. of r. EXAMPLE 2 It is shown by the following example that the converse in (i) is Let S = (0. Xn ⎯⎯→ X need not imply that Xn ⎯ ⎯→ X is seen by the following example. in this group to be equal to 1 for ⎛ j −1 j ⎤ s ∈ ⎜ k−1 .m. . k−1 ⎥ ⎝2 2 ⎦ for some k and j as above.m. .→ 0.m. if P[X = c] = 1. and let P be the probability function which assigns to subintervals of (0. by Theorem 1(ii). whose subscripts range from 2k−1 to 2k − 1.’s. Then. c + ε are continuity points of F.) Deﬁne the sequence X1.→ X need not imply Xn ⎯⎯→ X. X2. ⋅ ⋅ ⋅ . 2. Since for every ε > 0. That ⎯ n→∞ n→∞ n→∞ a.1/n]. . let Xn be deﬁned by Xn = ⎯ ⎯ nI(0.m. while it converges nowhere pointwise.’s as follows: For each k = 1. n EX 2 ≤ 1/2k−1. σ 2 X n ⎯ → 0. q. Xn ⎯⎯→ X. ⎝2 2 ⎦ We assert that the so constructed sequence X1. For any n ≥ 1. X2. ( ) ( ) . n →∞ n n→∞ REMARK 5 In (ii). and also that Xn ⎯q. .→ X if and only if ⎯ n→∞ E X n ⎯ → c.s. divide (0. k −1 ⎥.v. of r. 2 k −1. 1/2k−1 < ε for all sufﬁciently large k. .→ 0. Xn ⎯ → 0 but EX 2 = n(1/n) = 1. we have that Xn is the indicator of an ⎯ n →∞ n interval ⎛ j −1 j ⎤ ⎜ k−1 . 2.

REMARK 7 The assumption made in the second part of the theorem above according to which the function g is continuous at t = 0 is essential. and let F be a d. corresponding to Fn and φ be the ch. Then. ⋅ ⋅ ⋅ .v. Deﬁne the following r. as n → ∞. and t ∈ .v. let P be the discrete uniform function. () () X 3 = X 4 = 1. Hence Xn does not converge in probability to X. corresponding to F. ii) If φn(t) converges. to a function g(t) which is continuous at t = 0. n THEOREM 2 (P. Now. for all continuity points x of F.’s: X n 1 = X n 2 = 1.f.f. trivially. ⎪1. x≥1 FX () ⎧0. ⎩ n x<0 0 ≤ x < 1. The following theorem replaces this problem with that of proving convergence of ch. Very often one is confronted with the problem of proving convergence in distribution. that is. then g is a ch.f. () () FX n () ⎧0. so that its ch. let Xn be an r. 4}. () () Then X n s − X s = 1 for all s ∈ S. Lévy’s Continuity Theorem) Let {Fn} be a sequence of d.f..f. then Fn (x) ⎯ → ⎯ n →∞ F(x). as n → ∞. () () () () and X 1 = X 2 = 0. is given by φn(t) = e−t n/2. EXAMPLE 4 Let S = {1. then φn(t) ⎯ → φ(t). Xn ⎯d → X.. ⎩ x<0 0 ≤ x < 1.’s. but Xn does not converge in ⎯ n→∞ probability to X. and on the subsets of S. ⎪1. n = 1. which is much easier to deal with. ⎪1 x = ⎨2 . 2.f.186 8 Basic Limit Theorems E Xn − c ( ) 2 Hence E(Xn − c)2 ⎯ → 0 if and only if σ 2(Xn) ⎯ → 0 and EXn ⎯ → c. PROOF Omitted. distributed as N(0. Let φn be the ch. X n 3 = X n 4 = 0. 3. FX (x) ⎯ → FX(x) for all ⎯ n →∞ continuity points of FX. ⎪1 x = ⎨2 . 2. Then 2 . Thus. In fact. x≥1 so that FX (x) = FX(x) for all x ∈ . ⎯ ⎯ i) If Fn (x) ⎯ → F(x) for all continuity points x of F. ⎯ ⎯ ⎯ n →∞ n →∞ n →∞ ) ( = E( X − EX ) + ( EX = σ ( X ) + ( EX − c ) .f. for n →∞ n →∞ every t ∈ .’s. 2 n n 2 2 n n = E X n − EX n + EX n − c n [( )] − c) 2 2 REMARK 6 The following example shows that the converse of (iii) is not true. and if F is the corresponding d. n).f.

d.2. 1).) n→∞ 8. .2. 2. . pn). 2. . . The conclusion in (ii) does not hold here because ⎛X ⎛ x ⎞ x ⎞ 1 FX x = P X n ≤ x = P⎜ n ≤ ⎯ ⎟ = Φ⎜ ⎟ ⎯→2 n→∞ ⎝ n ⎝ n⎠ n⎠ 1 for every x ∈ and F(x) = – . of an r. N = 1. .’s with mean μ (ﬁnite) and (ﬁnite and positive) variance σ 2. let Xn be r. ⎩ j = 0. show that the set A of conver1 ∞ ∞ gence of {Xn} to X is. .f. . Then show that ⎯ i) Xn ⎯P → 0. .v. n→∞ iv) EXn ⎯ → 0. . where npn = λn ⎯ → λ(>0).v. For n = 1. THEOREM 3 (Central Limit Theorem) Let X1.’s.’s having the negative binomial distribution with pn and rn such that pn ⎯ → 1. . . otherwise. so that g is not continuous ⎯ n→∞ at 0. 8. Then. expressed by A = Ik =1 Un =1 I ∞=1(|Xn+r − X| < –). distributed as P(λ). so that rn(1 − pn) = λn ⎯ → ⎯ ⎯ ⎯ n→∞ n→∞ n→∞ λ(>0). ⎯ ⎯ n→∞ n→∞ distributed as P(λ). s ∈ (0. which assigns probability to intervals equal to their lengths. if t ≠ 0. r k 8. . x ∈ .’s distributed as B(n. j = 1. (Use ch.2. r.v.2 For n = 1. if ≤s< s =⎨ 2N + 1 2N + 1 +j ⎪0.1 (Rényi) Let S = [0. show that Xn ⎯d → X.4 If the i.v. 2. where X is an r.5 In reference to the proof of Theorem 1.8.3 The Central Limit Theorem We are now ready to formulate and prove the celebrated Central Limit Theorem (CLT) in its simplest form. 2. 1).f. deﬁne the r.’s. n→∞ ii) Xn(s) ⎯ → 0 for any s ∈ [0.i. Let . . where X is an r. .v.’s Xj.v.i. . . and g(0) = 1. indeed. .d.v. . . ⎯ n→∞ ⎯ iii) Xn (s) ⎯ → 0.’s. . 2 n () ( ) Exercises 8. ⎯ n→∞ XN 2 () 2 8. (Use ⎯ n→∞ ch.) 8. 2N.’s Xn as follows: ⎧ j j +1 ⎪N . n have a Cauchy distribution. . is not a d. .3 For n = 1. Show that Xn ⎯d → X. let Xn be r. . show that ¯ ⎯ there is no ﬁnite constant c for which Xn ⎯P → c. 1.2. .v. .2. 1) and let P be the probability function on subsets of S. by using ch. Xn be i. rn ⎯ → ∞. . r. .3 The Central Limit Theorem 187 φn(t) ⎯ → g(t). where g(t) = 0.f.f. .

This will imply that Gn(x) → Φ(x). however. Corollaries 1 and 2 in Chapter 7. it sufﬁces to prove that gn(t) ⎯ → φ(t). We have 2 n Xn − μ ( σ ) = nX = 1 n n − nμ n σ n = 1 n ∑ n Xj − μ j =1 σ ∑ Zj . Chapter 3. iv) We recall the following fact which was also employed in the proof of Theorem 3. since j =1 n n Xn − μ ( σ )=S ( ). then an = bno(1). n ⎠ n→∞ ⎝ n PROOF OF THEOREM 3 We may now begin the proof. 1 . be two sequences of numbers. by Theorem 2. . . .f.f. We say that {an} is o(bn) (little o of bn) and we write an = o(bn). Let gn be the ch. Then. . j =1 . For example. Xn.v. iii) In the proof of Theorem 3 and elsewhere. the “little o” notation will be employed as a convenient notation for the remainder in Taylor series expansions. 1 . that is. if an/bn ⎯ → 0. if an ⎯ n→∞ = n and bn = n2. ⎯ n→∞ x ∈ . 2.188 8 Basic Limit Theorems ⎤ ⎡ n X −μ 1 n n X j . or Theorem 9 and Corollary to Theorem 8 in this chapter). and Φ x = ∑ ⎥ ⎢ n j =1 σ ⎦ ⎣ Then Gn(x) ⎯ → Φ(x) for every x in . then ⎯ n→∞ ⎛ an ⎞ a ⎯ ⎜1 + ⎟ ⎯ → e . σ σ Sn for large n. for example. t ∈ . that the same or similar symbols have been employed elsewhere to denote different quantities (see. A relevant comment would then be in order. or ≈ N 0. then an = o(bn). let {an}. of Φ. . φ(t) = e−t /2.’s X1. since n/n2 = 1/n ⎯ → 0. 2 REMARK 8 i) We often express (loosely) the CLT by writing n Xn − μ Sn − E Sn ≈ N 0. Clearly. t ∈ . Namely. n = 1. {bn}. σ (S ) − E Sn n ii) In part (i). the notation Sn was used to denote the sum of the r. if an ⎯ → a. It should be pointed out. Therefore o(bn) = bno(1). where ( ) ( ) ( ) ( ) n ( ) Sn = ∑ X j . . To this end. and we are going to adhere to it here. . of Gn and φ be the ch. This is a generally accepted notation. ⎯ n→∞ Xn = () ( ) () 1 2π ∫ x −∞ e − t 2 dt . Gn x = P ⎢ ≤ x ⎥. if an = ⎯ n→∞ o(bn). This point should be kept in mind throughout.

⎟ = gZ 0 + 2! ⎝ n ⎠ ⎝ ⎠ ⎝ n⎠ n 1 1 () 1 () 2 1 () Since 2 g Z1 0 = 1. with E(Zj) = 0. σ 2(Xj) = σ 2.i. 2 COROLLARY ⎯ The convergence Gn(x) ⎯ → Φ(x) is uniform in x ∈ . of ⎯ n→∞ Φ. which is the ch. g Z1 0 = i 2 E Z 1 = −1. . for simplicity. n→∞ (That is. j = 1.3 The Central Limit Theorem 189 where Zj = (Xj − μ)/σ. −∞ < a < b < +∞. Then ⎛ t ⎞ ⎛ t2 ⎞ t 1⎛ t ⎞ gZ ⎜ gZ 0 + ⎜ ′′ ′ ⎟ g Z 0 + o⎜ n ⎟ . j = 1. σ 2(Zj) = E(Z2j) = 1.8. such that |Gn(x) − Φ(x)| < ε for all n ≥ N(ε) and all x ∈ simultaneously.1 Applications 1.d.i. . . writing Σ j Zj instead of Σ n Zj. 8. . . the CLT is used to give an approximation to P[a < Sn ≤ b]. ′ ′′ () () ( ) () ( ) we get ⎛ t ⎞ ⎛ t2 ⎞ t2 t2 t2 t2 gZ ⎜ 1− o 1 . ▲ The theorem just established has the following corollary. we have gn t = g t () ( n Σ jZ j ) () ⎛ t ⎞ ⎡ ⎛ 1 ⎞⎤ t = g Σ jZ j ⎜ ⎟ = ⎢ g Z1 ⎜ ⎟⎥ .3.6*. with E(Xj) = μ. n are i. . =1− + o⎜ ⎟ = 1 − + o 1 =1− ⎟ 2n 2n n 2n ⎝ n⎠ ⎝ n⎠ 1 () n [ ( )] Thus ⎫ ⎧ t2 ⎪ ⎪ 1−o 1 ⎬ .) PROOF It is an immediate consequence of Lemma 1 in Section 8. n are i. when this last expression j=1 appears as a subscript.d. ⎝ n ⎠ ⎣ ⎝ n ⎠⎦ ⎥ ⎢ 1 n Now consider the Taylor expansion of gz around zero up to the second order term. gn(t) ⎯ → e−t /2. . for every x ∈ and every ε > 0 there exists N(ε) > 0 independent of x. which along with the theorem itself provides the justiﬁcation for many approximations. If Xj. Hence.f. g n t = ⎨1 − ⎪ ⎪ 2n ⎭ ⎩ () [ ( )] Taking limits as n → ∞ we have. . We have: . g Z1 0 = iE Z1 = 0. ▲ The following examples are presented for the purpose of illustrating the theorem and its corollary.

5. where .5 and b + 0. respectively. The above approximation would not be valid if the convergence was not uniform in x ∈ . p). if in the expressions of a* and b* we replace a and b by a + 0. For a given n. To start with. and therefore move along as n → ∞. the approximation is best for p = – and 2 1 deteriorates as p moves away from – . 2. b* = REMARK 9 It is seen that the approximation is fairly good provided n and 1 p are such that npq ≥ 20. the Normal approximation to the Binomial distribution presented above can be improved. (This is the De Moivre theorem. . [ ] ( ) ( ) ( ) ( ) ( ) ( ) a* = ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) where . n. This is called the continuity correction. Also. where a* = a − np npq . Normal approximation to the Binomial. The points a* and b* do depend on n. Thus: P a < Sn ≤ b ≈ Φ b * − Φ a * . We have μ = p.) That is. j = 1. In the following we give an explanation of the continuity correction. b* = b − nμ ( ) ( ) ( ) b − np npq . npq Then it can be shown that fn(r)/φn(x) ⎯ → 1 and this convergence is uniform ⎯ n→∞ for all x’s in a ﬁnite interval [a. . b]. P(a < Sn ≤ b) ≈ Φ(b*) − Φ(a*). where now Xj. are independently distributed as B(1. . . let ⎛ n⎞ fn r = ⎜ ⎟ p r q n− r . σ n σ n (Here is where the corollary is utilized. a − nμ .190 8 Basic Limit Theorems ⎡ a − E Sn Sn − E Sn b − E Sn ⎤ ⎥ P a < Sn ≤ b = P⎢ < ≤ σ Sn σ Sn ⎥ ⎢ σ Sn ⎣ ⎦ ⎡ a − nμ S n − E S n b − nμ ⎤ ⎥ ≤ = P⎢ < σ Sn ⎢ σ n σ n ⎥ ⎦ ⎣ ⎤ ⎡ Sn − E Sn ⎡ Sn − E Sn b − nμ a − nμ ⎤ ⎥ ⎥ − P⎢ = P⎢ ≤ ≤ ⎢ σ Sn ⎢ σ Sn σ n ⎥ σ n ⎥ ⎦ ⎦ ⎣ ⎣ ≈ Φ b* − Φ a * .) Thus for x= r − np . This is the same problem as above. and let φ n x = ⎝r ⎠ () () 1 2πnpq e−x 2 2 . σ = pq . Some numerical examples will shed some 2 light on these points.

that fn(r) is close to φn(x). ⎛ b + 0. and for discrete r.8.5 − nμ ⎞ P a < S n ≤ b ≈ Φ⎜ ⎟ − Φ⎜ ⎟ ⎝ ⎠ ⎝ ⎠ σ n σ n ( ) with continuity correction.5. given by the area bounded by the normal curve. we ﬁrst rewrite the expression as follows: .5 − nμ ⎞ ⎛ a + 0. 1.v. and the abscissas 1 and 3. in general. the horizontal axis. in particular.5 and 3.6) 0. Note that this asymptotic relationship of the p.3 N(2. 1 1 To give an idea of how the correction term – comes in. we refer to Fig. To summarize.’s is not implied. 8.2 0. while the approximation without correction is the area bounded by the normal curve. the probability (n)prqn−r is approximately equal to the value r ⎡ r − np 2 ⎤ ⎥ exp⎢− ⎢ 2 npq ⎥ 2πnpq ⎥ ⎢ ⎦ ⎣ of the normal density with mean np and variance npq for sufﬁciently large n. Clearly. the horizontal axis and the abscissas 1. In particular. ( ) 0.’s.1 0 Figure 8. is closer to the exact area.d. the correction.2 1 2 3 4 5 Now P 1 < S n ≤ 3 = P 2 ≤ S n ≤ 3 = fn 2 + fn 3 = shaded area.2 2 drawn for n = 10. under the conditions of the CLT.v. by the convergence of the distribution functions in the CLT. for integer-valued r. ⎛ b − nμ ⎞ ⎛ a − nμ ⎞ P a < S n ≤ b ≈ Φ⎜ ⎟ − Φ⎜ ⎟ ⎝ σ n ⎠ ⎝ σ n ⎠ and ( ) ( ) () () ( ) without continuity correction. we have. That is.3 The Central Limit Theorem 191 large n.’s and probabilities of the form P(a ≤ Sn ≤ bn).f.2. p = 0.

2 − 1 = 0. ﬁnd P(45 ≤ Sn ≤ 55). where (5) .841345 + 0.1.5 = −1.7263. 2 1 i) For p1 = – : Exact value: 0. Normal approximation with correction: 1 45 − 0.2 = Φ 1 + Φ 1.1 a′ = 5 1 1 100 ⋅ ⋅ 2 2 1 55 + 0. EXAMPLE 5 1 5 – (Numerical) For n = 100 and p1 = – .5 − nμ Normal approximation without correction: 1 44 − 100 ⋅ 2 = 6 = −1.884930 − 1 = 0.7288 2 a′ = a − 0. b* = 1 1 5 100 ⋅ ⋅ 2 2 Thus Φ b * − Φ a * = Φ 1 − Φ −1. b* and a′. (4) P a ≤ S n ≤ b ≈ Φ b′ − Φ a ′ ( ) ( ) ( ) with continuity correction. b′ in (4) and (5) will be used in calculating probabilities of the form (3) in the numerical examples below.5 − nμ . a* = 1 1 5 100 ⋅ ⋅ 2 2 1 55 − 100 ⋅ 2 = 5 = 1.5 − 100 ⋅ 2 = 5. p2 = 16 . where a* = and σ n . b* = b − nμ σ n .5 − 100 ⋅ 2 = − 5.2. b′ = 5 1 1 100 ⋅ ⋅ 2 2 ( ) ( ) () ( ) () ( ) . and then apply the above approximations in order to obtain: P a ≤ Sn ≤ b ≈ Φ b * − Φ a * ( ) (( ) ) (3) ( ) ( ) ( ) a − 1 − nμ without continuity correction. b′ = b + 0.5 = 1.192 8 Basic Limit Theorems P a ≤ S n ≤ bn = P a − 1 < S n ≤ bn . σ n σ n These expressions of a*.

Then: so that Φ b * − Φ a * = 0. a′ = 2. Error with correction: 0.7288 − 0. Probabilities of the form P(a ≤ Sn ≤ b) are approximated as follows: P a ≤ Sn ≤ b ≈ Φ b * − Φ a * ( ) ( ) ( ) nλ without continuity correction.0025. ( ) ( ) so that Φ(b′) − Φ(a′) = 0. This is the same problem as in (1). . b* = b − nλ nλ . where a* = and P a ≤ S n ≤ b ≈ Φ b′ − Φ a ′ a − 1 − nλ . (Numerical) For nλ = 16.7838.5 − nλ nλ .5 − nλ ⎞ ⎛ b + 0. σ = λ .5 − nλ . j = 1.5 − nλ ⎞ P a < S n ≤ b ≈ Φ⎜ ⎟ ⎟ − Φ⎜ ⎝ ⎠ ⎝ ⎠ nλ nλ ( ) with continuity correction. ﬁnd P(12 ≤ Sn ≤ 21).3 The Central Limit Theorem 193 Thus Φ b′ − Φ a ′ = Φ 1. working as above. ( ) ( ) ( ) ( ) ( ) Error without correction: 0. Error with correction: 0.1 − 1 = 2 × 0.86. . a* = 2. b′ = b + 0. Error without correction: 0.0002. ⎛ a + 0.15. .23.0030.8. We have: Exact value: 0. 3. where a′ = EXAMPLE 6 a − 0. where now Xj. .1 = 2 Φ 1. b′ = 5.0000. we get: – Exact value: 0.7263 = 0.864334 − 1 = 0.7288 − 0. b* = 4.7286 = 0. 5 ii) For p2 = 16 .0021. Thus ⎛ b − nλ ⎞ ⎛ a − nλ ⎞ P a < S n ≤ b ≈ Φ⎜ ⎟ − Φ⎜ ⎟ ⎝ nλ ⎠ ⎝ nλ ⎠ and ( ) without continuity correction.7286. We have μ = λ.75.1 − Φ −1.0021. . Normal approximation to Poisson. n are independent P(λ).0030. ( ) ( ) ( ) nλ with continuity correction.

.) 8.25) − Φ(−1. 11 − 16 ( ) ( ) Error without correction: 0. what is probability that this number does not exceed 5% of 1. (For the = latter.v.194 8 Basic Limit Theorems Normal approximation without correction: 5 21 − 16 5 = −1.200 times.3.7 From a large collection of bolts which is known to contain 3% defective bolts. (Use the CLT.25.95? 8. Find the exact and approximate value for the probability P( ∑j1001 X j = 50). b′ = 1. Find the approximate probability that 65 ≤ X ≤ 90.3.3 Fifty balanced dice are tossed once and let X be the sum of the upturned spots. Find the approximate probability that the number of ones X is such that 180 ≤ X ≤ 220. How many bulbs manufactured by the new process must be examined.3. so that Φ b′ − Φ a ′ = 0. 1. Exercises 8. j = 1. .25. (Use the CLT. If X is the number of the defective bolts among those chosen.7851. .) 8.6 A Binomial experiment with probability p of a success is repeated 1 1 1. a* = =− Normal approximation with correction: a ′ = −1.3.) 8. p).000 times and let X be the number of successes.4 Let Xj.000p + 50).) 8.) . 100 be independent r. Find the approximate probability that 150 ≤ X ≤ 200.0049.7887.1 Refer to Exercise 4.25) − 1 = 2 × 0. and let X be the total number of aces drawn.000p − 50 ≤ X ≤ 1. (For the latter.3.894350 − 1 = 0. Error with correction: 0. use the CLT. ﬁnd 2 4 the exact and approximate values of probability P(1.12 of Chapter 4 and suppose that another manufacturing process produces light bulbs whose mean life is claimed to be 10% higher than the mean life of the bulbs produced by the process described in the exercise cited above. .3.) 8.000 are chosen at random. (Use the CLT. so as to establish the claim of their superiority with probability 0. 4 4 16 16 so that Φ(b*) − Φ(a*) = Φ(1.2 A fair die is tossed independently 1.3.5 One thousand cards are drawn with replacement from a standard deck of 52 playing cards. use the CLT.1. b* = = = 1.25) = 2Φ(1.125.000? (Use the CLT.375. For p = – and p = –.’s distributed as B(1.0013.

. n be i. r. ii) By using Tchebichev’s inequality. . If n = 100. show that the above value of n is given 1 by n = k (1− p) if this number is an integer.15 Let Xj.’s with EXj = μ ﬁnite and σ 2(Xj) = σ2 ∈ (0. How many voters must be sampled. n ii) n[( X −Y )2−( μ − μ )] is asymptotically distributed as N(0.099? (Use Exercise 8. . . n be the diameters of n ball bearings.8 Suppose that 53% of the voters favor a certain legislative proposal. j = 1.i. 100p% of the students admitted will. ∞). compare the respective values of n in parts (i) and (ii). . If you play the 2 game independently 1. . both ﬁnite.25σ) = 0. From past experience it follows that.10 A certain manufacturing process produces vacuum tubes whose lifetimes in hours are independently distributed r.3.) 8.12 Let Xj.v. both ﬁnite. . 0. n. .v.3. and n is the integer part of this number increased by 1. you win or lose $1 with probability – . .0005 inch. .’s such that EXj = μ ﬁnite and σ2(Xj) = ¯ σ2 = 4.d. n be i.500 hours. (Use the CLT.13 in Chapter 4 and let Xj. ¯ i) Show that the smallest value of the sample size n for which P(|Xn − μ| ≤ kσ) 2 1 ≥ p is given by n = [ k Φ −1 ( 1+ p )] if this number is an integer. n be independent r. n be i.5 inch and σ = 0.99? (Use the CLT. 0. r.) 8.3.14 Let Xj. . determine the constant c so that P(|Xn − μ| ≤ c) = 0.d.i. .12.) 8.3. j = 1.’s from the same distribution with EXj = EYj = μ and σ 2(Xj) = σ 2(Yj) = σ 2.v. . so that the observed relative frequency of those favoring the proposal will not differ from the assumed frequency by more than 2% with probability 0.1. σ 2(Xj) = σ 2.3. Determine ¯ ¯ the sample size n so that P(|Xn − Yn| ≤ 0. σ 2 ( X n − Yn ) = 2 σ .95. and n is 2 the integer part of this number increased by 1.12.Exercises 195 8.’s such that the X’s are identically distributed with EXj = μ1. the total amount you won or lost) is at least $10? (Use the CLT. iii) For p = 0. j = 1.13 Refer to Exercise 4. What is the probability that the total life of 50 tubes will exceed 75.90.25. otherwise.).i.0001) = 0.v. . on the average.) 8. 2 8. what is the ¯ minimum value of n for which P(|Xn − μ| ≤ 0. .3. . If EXj = 0.000 hours? (Use the CLT. j = 1. actually.) 8. (Use the CLT.9 In playing a game. what is the probability that your fortune (that is.000 times. . Yj = 1. . . r.) 1 8. accept an admission offer (0 < p . .3. j = 1.3. 1). . Yj.05. Show that: i) E ( X n − Yn ) = μ 1 − μ 2 . and the Y’s are identically distributed with EYj = μ2 ﬁnite and σ 2(Yj) = σ 2.11 Let Xj. .16 An academic department in a university wishes to admit c ﬁrst-year graduate students. .3. (Use Exercise 8. .3.v. . otherwise.3.1.’s with Negative Exponential distribution with mean 1. n. σ 2 n n 1 2 8. . .d. . j = 1.95 and k = 0.

6? Hint: For part (i).s.υ. Xn = PROOF Omitted.s. so that the probability P(|X − c| ≤ d) is maximum. It may be assumed that acceptance and rejection of admission offers by the various students are independent events. We distinguish two categories of LLN: the strong LLN (SLLN) in which the convergence involved is strong (a. . use the CLT (with continuity correction) in order to ﬁnd the approximate value to P(|X − c| ≤ d). .’s with (ﬁnite) mean μ.) 8. then Xn = X1 + ⋅ ⋅ ⋅ + X n P ⎯ → μ. . where the convergence involved is convergence in probability. are i. there will be two successive values of n suggesting themselves as optimal values of n. ¯ ¯ Then EXn = μ. j = 1.’s also have a ﬁnite variance σ2. . that is. and choose that value of n which gives the larger probability.d.i. .4 Laws of Large Numbers This section concerns itself with certain limit theorems which are known as laws of large numbers (LLN).d. and p = 0. n→∞ n→∞ THEOREM 5 (WLLN) If Xj. ε2 n . n→∞ then E(Xj) is ﬁnite and equal to μ. The latter are the weak LLN. r. n.v. where X is the number of students actually accepting an admission. ¯ ¯ ⎯ Of course. it is presented in a higher level probability course. then X1 + ⋅ ⋅ ⋅ + X n a. . ⎯ n→∞ n PROOF i) The proof is a straightforward application of Tchebichev’s inequality under the unnecessary assumption that the r. . d = 2. so that. P Xn − μ ≥ ε ≤ [ ] 1 σ2 → 0 as n → ∞.s. n are i.s. .6? iii) What is the maximum value of the probability P(|X − 20| ≤ 2) for p = 0.υ. if Xn ⎯⎯→ to some ﬁnite constant μ. For part (iii). n→∞ n a.196 8 Basic Limit Theorems < 1). r.). and the weak LLN (WLLN). that is.i. Then draw the picture of the normal curve. i) How many students n must be admitted. σ2(Xn) = σ2/n. ¯ The converse is also true.’s with (ﬁnite) mean μ. j = 1. ▲ a. ⎯⎯→ μ. Xn ⎯⎯→ μ implies Xn ⎯P → μ. and conclude that the probability is maximized when n is close to c/p. and d is a prescribed number? ii) What is the value of n for c = 20. THEOREM 4 (SLLN) If Xj. for every ε > 0. Calculate the respective probabilities.

n are independent and distributed as P(λ). .4. when this last expression j=1 appears as a subscript. we have n ⎛ t ⎞ ⎡ ⎛ t ⎞⎤ φ X t = φ1 nΣx t = φ Σx ⎜ ⎟ = ⎢φ X ⎜ ⎟ ⎥ ⎝ n⎠ ⎣ ⎝ n⎠ ⎦ ⎢ ⎥ n () j () j 1 ⎡ ⎛ t ⎞⎤ t = ⎢1 + iμ + o⎜ ⎟ ⎥ n ⎝ n⎠ ⎦ ⎢ ⎥ ⎣ n ⎡ ⎤ t t = ⎢1 + iμ + o 1 ⎥ n n ⎣ ⎦ () n ⎡ ⎤ t = ⎢1 + iμ + o 1 ⎥ ⎯ → e iμt . . writing ΣXj instead of Σ n Xj. .s.1 An Application of SLLN and WLLN Let Xj.s. .8. p).f. . ⎯ n () n →∞ () For simplicity. . For example.d. j = 1. . n be i. then Xn = X1 + ⋅ ⋅ ⋅ + X n ⎯→ p ⎯ n→∞ n a.i. then: Xn = and also in probability. with d.f. for t ∈ . . X n ≤ x . . [ ( )] n Both laws of LLN hold in all concrete cases which we have studied except for the Cauchy case.’s could also be used if they exist).4 Laws of Large Numbers 197 ii) This proof is based on ch. F.’s. . in order to ¯ ⎯ prove that Xn ⎯P → μ. For the Poisson case we have: If Xj. ⋅ ⋅ ⋅ . By Theorems 1(iii) (the converse case) and 2(ii) of this chapter.6*. ▲ ⎯ n →∞ ⎣ n ⎦ REMARK 10 An alternative proof of the WLLN. j = 1. Fn x = () 1 the number of X 1 . without the use of ch.f. . is denoted by Fn and is deﬁned as follows: For x ∈ . and also in probability. we have: If Xj.g. . The sample or empirical d. 8. n [ ] .f. where E(Xj) does not exist. in the Binomial case. as will be seen. is presented in Lemma 1 in Section 8.’s (m. X1 + ⋅ ⋅ ⋅ + X n ⎯→λ ⎯ n→∞ n a. it sufﬁces to prove that n→∞ φ X t ⎯ → φ μ t = e itμ . j = 1. The underlying idea there is that of truncation.f. n are independent and distributed as B(1.

d. as a function of the r. . we have P ⎡sup Fn x − F x .v.s. σ 2 ⎜ ∑ Yj ⎟ = npq = nF x 1 − F x .i. for each x. (that is. . ⎝ j =1 ⎠ ⎝ j =1 ⎠ () ( )[ ( )] It follows that 1 nF x = F x . r. Yj. r. j = 1.s. n So for each x ∈ .’s and suppose that EX k is ﬁnite for a j given positive integer k. .d. ⎪ Yj x = Yj = ⎨ ⎪0. Let ⎧1.4. 1 n→∞ 8.v. n→∞ [ ( )] () () () () Fn x ⎯P → F x .f. ⋅ ⋅ ⋅ . . . . ⎯ n→∞ () () Actually. . more is true. given in Exercise 3. Set n (k ) 1 Xn = ∑Xk j n j =1 for the kth sample moment of the distribution of the X’s and show that ¯ (k) ⎯ X n ⎯P → EX k. n be i. j = 1. . p). (Calculate the expectation by means of the ch.’s X1. It is also an r. . . .2. we get by the LLN E Fn x = a. . Fn x = () ) 1 n ∑ Yj . . ..1 Let Xj. n→∞ { () () } PROOF Omitted. ( ( ) () Hence ⎛ n ⎞ ⎛ n ⎞ E ⎜ ∑ Yj ⎟ = np = nF x .) .f. j = 1. Fn x ⎯⎯→ F x . clearly. and Yj is B(1. n be i. n are independent since the X’s are. .d. Then.v.’s with p. . ⎩ () Xj ≤ x X j > x.v.i. Xn.14 of Chapter 3 and show that the WLLN holds. Fn(x) ⎯⎯→ F(x) uniformly in x ∈ ). .4. Xn. n j =1 On the other hand. n.198 8 Basic Limit Theorems Fn is a step function which is a d. . . Exercises 8.f. for a ﬁxed set of values of X1. x ∈ ⎯ → 0⎤ = 1 ⎯ n →∞ ⎥ ⎢ ⎦ ⎣ a. Namely. j = 1. . THEOREM 6 (Glivenko–Cantelli Lemma) With the above notation. where p = P Yj = 1 = P X j ≤ x = F x .2 Let Xj.

4.s. and g: k (1) (k) → is continuous.v. n→∞ n→∞ ( ii) More generally. 8. 8. X n ⎯⎯→ X j . ⎝ ⎠ n →∞ ( ) .6 For j = 1.5 Further Limit Theorems In this section. ⋅ ⋅ ⋅ . . . . k. a. . has the Negative Exponential distribution with parameter λj = 2j/2. so that g(Xn). . we present some further limit theorems which will be used occasionally in the following chapters. n be r.’s such that the jth r. and let g: → be continuous.’s. THEOREM 7 i) Let Xn. . ⋅ ⋅ ⋅ . .4 Let Xj. . 8. Suppose that EXj = μj. . 8. X k .’s.5 Further Limit Theorems 199 8.v. and X be r.4. n →∞ (1 ) (k ) a.v.4. j ≥ 1. 2. all ﬁnite. a. Xk) are r.v. . n ≥ 1. .4.’s such that P X j = −α j = P X j = α j = ( ) ( ) 1 .8. Show that the generalized WLLN holds. σ 2(Xj) = σ 2. 2. j = 1. n be pairwise uncorrelated r. . ⋅ ⋅ ⋅ . and g(X) are r.v. . show that the generalized WLLN j=1 holds. Then Xn ⎯⎯→ X implies g(Xn) ⎯⎯→ g(X).3 Let Xj. j and set μn = 1 n ∑ μj.v. . . . . if for j = 1. the generalized WLLN holds. .4.v. ⎯ n→∞ Show that if the X’s are pairwise uncorrelated and σ 2 ≤ M(<∞). . let Xj be independent r. . n j =1 Then a generalized version of the WLLN states that X n − μ n ⎯P → 0. . 2 Show that for all α’s such that 0 < α ≤ 1. j = 1. k imply g ⎛ X n .’s. X n ) and g(X1. . .v. 8.’s such that Xj is distributed as χ j2/ j .v. X n ⎞ ⎯⎯→ g X 1 .s.’s such that Xj is distributed as P( λj). let Xj be independent r. so that g(X n . n ≥ 1.v. X nj). . .7 For j = 1. n ≥ 1.’s.5 Decide whether the generalized WLLN holds for independent r. . . . If {1/nΣn λj} remains bounded.s. j = 1. and Xj are r.’s which need be neither independent nor identically distributed.s. then the j generalized version of the WLLN holds. then ( j ) a.

and if Mn ↑ ∞(Mn > 0). That is. for n ≥ N(ε). . n ≥ 1. THEOREM 7 ′ i) Let Xn. Thus for every ε > 0. Xj and g be as in Theorem 7(ii). [ ( ) ( ) ] (for n ≥ N (ε )). () () which implies that c c A3 n ⊆ A1 ∪ Ac n . let again X (j). . Xk). . . 2M] with |x′ − x″| < δ(ε). there exists δ(ε. ▲ A similar result holds true when a. . M) = δ(ε) (<1) such that |g(x′) − g(x″)| < ε for all x′. and suppose n (k) that X (j) ⎯P → Xj. 2 () () Hence c c P A3 n ≤ P A1 + P Ac n ≤ ε 2 + ε 2 = ε 2 [ ( )] ( ) [ ( )] (for n ≥ N (ε )). P g X n − g X ≥ ε < ε. . ⎯ n→∞ ii) More generally.200 8 Basic Limit Theorems PROOF Follows immediately from the deﬁnition of the a. [ ( ) ( ) ] . Set [ ( )] () A1 = X ≤ M . . X n ) ⎯P → g(X1. Then g(X (1). . convergence is replaced by convergence in probability. ∞ )]) = P( X > M ) < ε 2 (M n0 n0 >1 . . j = 1. and hence A1 ∩ A2 n ⊆ A3 n . ⎯ ⎯ n n n→∞ n→∞ PROOF i) We have P(X ∈ ) = 1. k. A2 n = X n − X < δ ε . x″ ∈ [−2M. .s. ) P X > M < ε 2. g being continuous in . and A3 n = g X n − g X < ε [ ] () [ ( )] () Then it is easily seen that on A1 ∩ A2(n). Thus there exist n0 sufﬁciently large such that ⎯ n→∞ P X ∈ − ∞. is uniformly continuous in [−2M. From Xn ⎯P → X we have that ⎯ n→∞ there exists N(ε) > 0 such that ( ) P Xn − X ≥ δ ε < ε 2 . ⎯ n→∞ Then g(Xn) ⎯P → g(X). and suppose that Xn ⎯P → X. convergence and the continuity of g. but a justiﬁcation is needed. . we have −2M < X < 2M. Mn]) ⎯ → 1. −2M < Xn < 2M. X and g be as in Theorem 7(i). . − M n 0 Deﬁne M = Mn . 2M].s. n ≥ N ε . then P(X ∈ [−Mn. we then have ([ ( 0 )] + [ X ∈(M n0 .

n→∞ PROOF In sufﬁces to take g as follows and apply the second part of the theorem: i) g(x. y) = xy. b constants). y ≠ 0. y) = ax + by. ii) g(x.3 and 8.5 Further Limit Theorems 201 The proof is completed. constant. Yn ⎯P → Y.) ii) It is carried out along the same lines as the proof of part (i). c⎠ ⎝ c⎠ ⎪ ⎝ = P cX ≤ z = ⎨ ⎪ P ⎛ X ≥ z ⎞ = 1 − F ⎛ z −⎞ . ( ) ( ) = P X + c ≤ z = P X ≤ z − c = FX z − c . then ⎯ ⎯ n→∞ n→∞ i) Xn +Yn ⎯d → X + c. ⎯ n→∞ ii) XnYn ⎯d → cX. y) = x/y.6. COROLLARY ⎯ ⎯ If Xn ⎯P → X.8. X⎜ ⎟ ⎟ ⎪ ⎜ c⎠ ⎝c ⎠ ⎩ ⎝ ( ) Yn ⎯ (z) ⎯ → F (z) n→∞ X c c>0 ⎛X ⎞ ⎧P X ≤ cz = FX cz . ⎝ c ⎠ ⎪P X ≥ cz = 1 − FX cz − . (See also Exercise 8. provided P(Yn ≠ 0) = P(Y ≠ 0) = 1. ⎯ n→∞ Equivalently. then n→∞ n→∞ P i) Xn + Yn ⎯ → X + Y. iii) g(x. (See also Exercises 8. THEOREM 8 If Xn ⎯d → X and Yn ⎯P → c. iv) g(x.) ▲ The following corollary to Theorem 7′ is of wide applicability. c < 0.2. ⎯ n→∞ iii) XnYn ⎯P → XY. ⎯ n→∞ ii) aXn + bYn ⎯P → aX + bY (a. ▲ The following is in itself a very useful theorem. y) = x + y. ⎯ n→∞ ⎯ iv) Xn/Yn ⎯P → X/Y. ⎪ = P⎜ ≤ z⎟ = ⎨ c < 0. ⎯ n→∞ iii) Xn/Yn ⎯d → X/c.6. provided P(Yn ≠ 0) = 1. ( () () ) ( () () ) ( ) ⎛X ⎞ iii) P⎜ n ≤ z⎟ = FX n ⎝ Yn ⎠ ⎧ ⎛ ⎛ z⎞ z⎞ c>0 ⎪P⎜ X ≤ ⎟ = FX ⎜ ⎟ . ⎩ provided P(Yn ≠ 0) = 1. ( ( ) ) ( ) ( ) .5.1. i) P X n + Yn ≤ z = FX n +Yn z ⎯ → FX + c z ⎯ n→∞ ii) P X nYn ≤ z = FX Y z ⎯ → FcX z ⎯ n→∞ n n c ≠ 0.

202 8 Basic Limit Theorems REMARK 11 Of course. PROOF As an illustration of how the proof of this theorem is carried out. Outside this null set. we obtain ⎛X ⎞ lim sup P⎜ n ≤ z⎟ ≤ FX z c ± ε . we will be interested in the limit of the above probabilities as n → ∞. z∈ . z ∈ . ( )] Thus ⎛X ⎞ P⎜ n ≤ z⎟ ≤ P Yn − c ≥ ε + P X n ≤ z c ± ε . we may divide by Yn except perhaps on a null set. Thus. ≤ zYn ∩ Yn − c < ε ⊆ X n ≤ z c − ε . Next. or ⎯ ⎯ n→∞ n→∞ P(c − ε ≤ Yn ≤ c + ε) ⎯ → 1. We ﬁrst notice that Yn ⎯P → c (>0) implies that P(Yn > 0) ⎯ → 1. if we choose ε < c. if z ≥ 0. z ∈ . ⎠ ⎢⎝ Yn ⎥ ⎣ ⎦ In the following. ) ( ) ( ) [ ) [ ( ( )] and n That is. The case where c < 0 is treated similarly. ) But |Yn − c| < ε is equivalent to c − ε < Yn < c + ε. we have then ⎡⎛ X ⎤ ⎡⎛ X ⎤ ⎛X ⎞ ⎞ ⎞ P ⎜ n ≤ z⎟ = P ⎢⎜ n ≤ z⎟ I Yn > 0 ⎥ + P ⎢⎜ n ≤ z⎟ I Yn < 0 ⎥ ⎝ Yn ⎠ ⎠ ⎠ ⎢ ⎥ ⎢ ⎥ ⎣⎝ Yn ⎦ ⎣⎝ Yn ⎦ ⎡⎛ X ⎤ ⎞ ≤ P ⎢⎜ n ≤ z⎟ I Yn > 0 ⎥ + P Yn < 0 . if z < 0. for every z ∈ )] ( (X n ≤ zYn ∩ Yn − c < ε ⊆ X n ≤ z c ± ε ) ( ) [ n )] and hence ⎛ Xn ⎞ ≤ z⎟ ⊆ Yn − c ≥ ε ⎜ ⎝ Yn ⎠ ( )U [X ≤ z c ±ε . ⎝ Yn ⎠ ( ) [ ( )] Letting n → ∞ and taking into consideration the fact that P(|Yn − c| ≥ ε) → 0 and P[Xn ≤ z(c ± ε)] → FX[z(c ± ε)]. n→∞ ⎝ Yn ⎠ [ ( )] . In fact. We have then ( ) ( ) ( ) ( ) ⎛ Xn ⎞ ⎛X ⎞ ⎛X ⎞ ≤ z⎟ = ⎜ n ≤ z⎟ I Yn − c ≥ ε + ⎜ n ≤ z⎟ I Yn − c < ε ⎜ ⎝ Yn ⎠ ⎝ Yn ⎠ ⎝ Yn ⎠ ( ) ( ) ⊆ Yn − c ≥ ε ( )U ( X n ≤ zYn )I ( Y n −c <ε . ⎯ n→∞ since P(Yn ≠ 0) = 1. Therefore (X (X n ≤ zYn ∩ Yn − c < ε ⊆ X n ≤ z c + ε . ⎯ ⎯ n→∞ n→∞ Yn ⎯P → c is equivalent to P(|Yn − c| ≤ ε) ⎯ → 1 for every ε > 0. we assume that Yn > 0. we obtain the result. Since P(Yn < 0) → 0. . we proceed to establish (iii) under the (unnecessary) additional assumption that FX is continuous and for the case that c > 0. FX(z/c−) = FX(z/c) and FX(cz−) = FX(cz). if F is continuous.

n→∞ ⎝ Yn ⎠ [ ( )] ( ) Since. z ∈ . we obtain ⎛X ⎞ FX z c ± ε ≤ lim inf P⎜ n ≤ z⎟ . as ε → 0. Thus . That is. if ⎝ Yn ⎠ ) z < 0. n→∞ ⎝ Yn ⎠ ( ) (6) Next. n [X ≤ z(c ± ε )]I ( Y and hence ⎛X ⎞ − c < ε ⊆ ⎜ n ≤ z⎟ ⎝ Yn ⎠ ) [X ≤ z(c ± ε )] ⊆ ( Y n n −c ≥ε )U ⎛⎜⎝ X Y ) n n ⎞ ≤ z⎟ . z ∈ . and [X ≤ z(c + ε )]I ( Y n n ⎛X ⎞ − c < ε ⊆ ⎜ n ≤ z⎟ . FX[z(c ± ε)] → FX(zc). if ⎝ Yn ⎠ ) z ≥ 0. ⎝ Yn ⎠ [ ( )] ( Letting n → ∞ and taking into consideration the fact that P(|Yn − c| ≥ ε) → 0 and P[Xn ≤ z(c ± ε)] → Fx[z(c ± ε)]. Yn n→∞ ⎝ ⎠ (7) Relations (6) and (7) imply that lim P(Xn/Yn ≤ z) exists and is equal to n→∞ ⎛X ⎞ FX zc = P X ≤ zc = P⎜ ≤ z⎟ = FX ⎝ c ⎠ ( ) ( ) c (z). for every z ∈ n . as ε → 0. we have that |Yn − c| < ε is equivalent to 0 < c − ε < Yn < c + ε and hence [X ≤ z(c ± ε )] = [X ≤ z(c ± ε )] ∩ ( Y − c ≥ ε ) + [X ≤ z(c ± ε )] ∩ (Y − c < ε) ⊆ (Y − c ≥ ε) ∪ [ X ≤ z( c ± ε )] ∩ ( Y − c < ε ). z ∈ . z ∈ . we have ⎛X ⎞ FX zc ≤ lim inf P⎜ n ≤ z⎟ . By choosing ε < c. n n n n n n n n [X ≤ z(c − ε )]I ( Y n n ⎛X ⎞ − c < ε ⊆ ⎜ n ≤ z⎟ . FX[z(c ± ε)] → FX(zc).8. ⎠ Thus ⎛X ⎞ P X n ≤ z c ± ε ≤ P Yn − c ≥ ε + P⎜ n ≤ z⎟ .5 Further Limit Theorems 203 Since. we have ⎛X ⎞ lim sup P⎜ n ≤ z⎟ ≤ FX zc .

’s with mean μ and (positive) variance σ2. ⎯ n→∞ COROLLARY TO THEOREM 8 If X1. and also in probability. and also in probability (by Theorems probability. . REMARK 12 Theorem 8 is known as Slutsky’s theorem. ▲ n Yn ⎯ (z) ⎯ → F (z). . 1 .s. .i. r. S 2 ⎯P → σ 2 ⎯ n n→∞ implies 2 n Sn P ⎯ → 1.’s X j2.i. . and also in n→∞ 2 ¯ n ⎯ → μ2 a. 1 ⎯ ( ) d n→∞ and also n Xn − μ Sn ( ) ⎯ → N 0. . and also in probability (by the same theorems just referred to). n. j = 1. . Xn are i. n→∞ X c z∈ . . then n − 1 Xn − μ Sn ( ) ⎯ → N 0. X 2 ⎯ → μ2 a. ⎯ ( ) d n→∞ PROOF In fact. and E X 2j = σ 2 X j + EX j ( ) ( ) ( ) 2 = σ 2 + μ 2 . j = 1. n j =1 Next. Then S 2 ⎯ → σ 2 a. On the other hand. Therefore the SLLN and WLLN give the result that 1 n ⎯→ ∑ X 2j ⎯→∞ σ 2 + μ 2 n n j =1 a.v. r.s. . . . if μ = E Xj . . be i. ⎯ n n→∞ REMARK 13 Of course. n are i. So we have proved the following theorem. .. . the r. ¯n ⎯ and also in probability. .d. ⎯ ( ) d n→∞ . if Xj. THEOREM 9 Let Xj.i. σ2 =σ2 Xj ( ) ( ) (which are assumed to exist). σ 2(Xj) = σ 2.’s with E(Xj) = μ.v. Now.204 8 Basic Limit Theorems ⎛X ⎞ P⎜ n ≤ z⎟ = FX ⎝ Yn ⎠ as was to be seen. since the X’s are.d. . are i.v.’s. r.s. ⎯ n − 1 σ 2 n→∞ since n/(n − 1) ⎯ → 1.d. . 1 . . j = 1.s. n Xn − μ ( σ ) ⎯ → N 0. .s. n.i. Thus 1 n ∑ X 2j − X n2 → σ 2 + μ 2 − μ 2 = σ 2 n j =1 a.d. n. . we have seen that the sample variance 2 Sn = 1 n ∑ X j − Xn n j =1 ( ) 2 = 1 n ∑ X 2j − X n2 . j = 1.v. and hence X ⎯ n→∞ 7(i) and 7′(i)). .

Xn − d ⎯ → 0. Then. * * ⎯ ⎯ by Theorem 7′(i) again. THEOREM 10 For n = 1. and hence. . and let cn(Xn − d) ⎯d → X as n → ∞. . . let Xn and X be r. By assumption.5 Further Limit Theorems 205 by Theorem 3. and Sn P ⎯ → 1. . applies and gives ¯ n(Xn − μ) ⎯d → X ∼ N(0. or ⎯ ⎯ P equivalently. σ g ′ μ ⎯ ⎝ [ ( ) ( )] [ ( )] ⎞⎟⎠ . all limits are taken as n → ∞. and therefore.v. so that Xn ⎯P → d.’s. Hence * cn g X n − g d = cn X n − d g ′ X * . σ 2). we have cn(Xn − d) g′(X n ⎯d → g′(d)X. . . Finally. ∞). n ( ) () ( ) ( ) ) ( However. ⎯ PROOF In this proof. expand g(Xn) around d according to Taylor’s formula in order to obtain g X n = g d + X n − d g′ X * . ⎛ n g X n − g μ ⎯d → N ⎜ 0. n where X n is an r. Then. . as n → ∞. 2 PROOF By the CLT. .’s which is n − 1 Xn − μ Sn n ( ) converges in distribution to N(0. ⎯ (8) Next.i.v. ⎯ n By assumption. [ ( ) ( )] ( ) (9) g X * ⎯P → g d . ▲ The following result is based on theorems established above and it is of signiﬁcant importance. 1) as n → ∞. |X n − d| ≤ |Xn − d| ⎯P → 0 by (8). ▲ COROLLARY ( ) () (10) Let the r. Then cn[g(Xn) − g(d)] ⎯ ⎯d → g′(d)X as n → ∞. and let g: → be differentiable with derivative continuous at μ. convergence (10) and Theorem 8(ii). by Theorem 8(ii). and let its derivative g′(x) be continuous at a point d.v. with mean μ ∈ and variance σ 2 ∈ (0. This result and relation (9) complete the proof of the *) ⎯ theorem. Hence the quotient of these r. so that the theorem ⎯ . ⎯ n→∞ n−1 σ by Remark 13. 2.d. −1 cn(Xn −d) ⎯d → X and cn → 0. Xn − d ⎯d → 0. Xn be i. by Theorem 9. let g: → be differentiable. by Theorem 7′(i).’s X1. lying between d and Xn.8. let cn be constants such that 0 ≠ cn → ∞. ⎯ X n − d ⎯P → 0.v.

(12) ⎯ Next. then so does the WLLN. then. That is.5. PROOF Since F(x) → 0 as x → −∞.3 Carry out the proof of Theorem 7′(ii). there exists an interval [α. and let F be ⎯ n→∞ continuous. n in the corollary are distributed as B(1. x ∈ . . r . j = 1. . 8. β].6* Pólya’s Lemma and Alternative Proof of the WLLN The following lemma is an analytical result of interest in its own right. By taking .5.v. It was used in the corollary to Theorem 3 to conclude uniform convergence. (Use the usual Euclidean distance in k. 2 n X n 1 − X n − pq ⎯d → N ⎛ 0. for every ε > 0 there exists N(ε) > 0 such that n ≥ N(ε) implies that |Fn(x) − F(x)| < ε for every x∈ . Let F and {Fn} be d.206 8 Basic Limit Theorems ⎛ n g X n − g μ ⎯d → g ′ μ X ~ N ⎜ 0. Fn(xj) ⎯ → F(xj) implies that there exists Nj(ε) > 0 such that for all n→∞ n ≥ Nj(ε). F β > 1 − ε 2 .1 Use Theorem 8(ii) in order to show that if the CLT holds. ⋅ ⋅ ⋅ . we actually have −2M < X < 2M. . LEMMA 1 (Pólya). β ] such that F α < ε 2. 2 ▲ APPLICATION If the r. σ g ′ μ ⎯ ⎝ [ ( ) ( )] () [ ( )] ⎞⎟⎠. p). ( ) ( ) j = 1. and g(x) = x(1 − x).) 8. Then the convergence is uniform in x ∈ .f. . r − 1. so that g′(x) = 1 − 2x. pq 1 − 2 p ⎞ . Fn x j − F x j < ε 2. as x → ∞.2 Refer to the proof of Theorem 7′(i) and show that on the set A1 ∩ A2(n). 8. ( ) ( ) j = 1.’s such that Fn(x) ⎯ → F(x). ⎯ ⎝ ⎠ Here μ = p. and F(x) → 1. β] such that F x j +1 − F x j < ε 2. ( ) () (11) The continuity of F implies its uniform continuity in [α. ⋅ ⋅ ⋅ .’s Xj. as n → ∞. [ ( ) ] ( ) Exercises 8. Then there is a ﬁnite partition α = x1 < x2 < · · · < xr = β of [α. The result follows.5. σ 2 = pq.

and is due to Khintchine. (13) Let x0 = −∞. j = 0. ⎩ () Xj ≤δ ⋅n Xj >δ ⋅n and ⎧0. of the X’s. r + 1. we deﬁne ⎧X j . . . r. By (15) and (16) and for n ≥ N(ε). ( ) ( ) j = 1. j = 1. (15) Also (13) trivially holds for j = 0 and j = r + 1. let x be any real number. 1. Thus for n ≥ N(ε). . Xj = Yj + Zj. Let us restrict ourselves to the continuous case and let f be the (common) p. Fn x j − F x j < ε 2.v. it was also used by Markov.d. n. xr+1 = ∞. ⋅ ⋅ ⋅ . j = 1. . relation (11) implies that F x1 − F x0 < ε 2. r. if ⎪ Yj n = Yj = ⎨ if ⎪0 . Then. . () () for every x ∈ . 1. Then. ( ) ( ) ( ) ( ) (14) Thus.8. ⋅ ⋅ ⋅ .’s. a proof of the WLLN (Theorem 5) is presented without using ch. we have the following string of inequalities: F x j − ε 2 < Fn x j ≤ Fn x ≤ Fn x j+1 < F x j−1 + ε 2 j j+1 ( ) ( ) () ( ) ( ) < F (x ) + ε ≤ F (x) + ε ≤ F (x ) + ε . F x r +1 − F x r < ε 2. () ( ) ( ) Hence 0 ≤ F x + ε − Fn x ≤ F x j+1 + ε − F x j + ε 2 < 2 ε () and therefore |Fn(x) − F(x)| < ε. ( ) ( ) j = 0. Then xj ≤ x < xj+1 for some j = 0. (17) ALTERNATIVE PROOF OF THEOREM 5 We proceed as follows: For any δ > 0. by means of (12) and (14). Then by the fact that F(−∞) = 0 and F(∞) = 1. ⋅ ⋅ ⋅ . if ⎪ Zj n = Zj = ⎨ ⎪ X j . . we then have that () ( () ( )) Fn x j − F x j < ε 2. clearly. 1. . we have Fn x − F x < ε Relation (17) concludes the proof of the lemma.f. ▲ Below. we have that F x j +1 − F x j < ε 2. r . ⋅ ⋅ ⋅ .6* Pélya’s Lemma and Alternative Proof of the WLLN 207 n ≥ N ε = max N1 ε . n.f. The basic idea is that of suitably truncating the r. . ⋅ ⋅ ⋅ . if ⎩ () Xj ≤δ ⋅n X j > δ ⋅ n. . N r ε . we have ( ) ( ) (16) Next.’s involved. that is.

208 8 Basic Limit Theorems σ 2 Y j = σ 2 Y1 ( ) = E Y 12 − EY1 2 = E X1 ⋅ I = ∫ x2I −∞ ∞ { ( ) ( ) [X 1 ( ) 2 ≤ E Y 12 ≤δ ⋅n ] ( X1 ) [ x ≤δ ⋅n] ( x ) f ( x )dx } ( ) =∫ that is. that is. xI and ⎯ n→∞ [ x ≤δ ⋅n] ( x ) f ( x ) < x f ( x ). σ 2 Yj ≤ δ ⋅ n ⋅ E X 1 . ∫ Therefore ∞ −∞ x f x dx < ∞. ∞ () ∫ ∞ −∞ xI ⎯ n→∞ [ x ≤δ ⋅n] ( x ) f ( x )dx ⎯ → ∫−∞ xf ( x )dx = μ E Y j ⎯ → μ. } Now. Next. ⎯ n→∞ by Lemma C of Chapter 6. δ ⋅n − δ ⋅n x 2 f x dx ≤ δ ⋅ n∫ () δ ⋅n − δ ⋅n x f x dx ≤ δ ⋅ n∫ () ∞ −∞ x f x dx () = δ ⋅ nE X1 . xI [ x ≤δ ⋅n] ( x ) f ( x ) ⎯ → xf ( x ). ( ) (19) ⎡ n ⎤ ⎡1 n ⎤ ⎛ n ⎞ P ⎢ ∑ Yj − EYj ≥ ε ⎥ = P ⎢ ∑ Yj − E ⎜ ∑ Yj ⎟ ≥ nε ⎥ ⎢ j =1 ⎥ ⎝ j =1 ⎠ ⎢ ⎥ ⎣ n j =1 ⎦ ⎣ ⎦ ≤ = ≤ = ⎛ n ⎞ 1 σ 2 ⎜ ∑ Yj ⎟ 2 nε ⎝ j =1 ⎠ 2 nσ 2 Y1 2 n ε2 nδ ⋅ n ⋅ E X 1 ( ) δ E X1 ε2 n 2ε 2 . ( ) (18) E Yj = E Y1 = E X 1 I = ∫ xI −∞ ∞ ( ) ( ) { [X 1 ≤δ ⋅n ] ( X1 ) [ x ≤δ ⋅n ] ( x) f ( x)dx.

8. ⎤ δ ⎡1 n P ⎢ ∑ Yj − EY1 ≥ ε ⎥ ≤ 2 E X 1 . Next. So P(Zj ≠ 0) ≤ δ/n and hence ⎡n ⎤ P ⎢∑ Z j ≠ 0 ⎥ ≤ nP Z j ≠ 0 ≤ δ ⎢ j =1 ⎥ ⎣ ⎦ ( ) (22) . ⎤ ⎡⎛ 1 n ⎡1 n ⎤ ⎞ P ⎢ ∑ Y j − μ ≥ 2 ε ⎥ = P ⎢ ⎜ ∑ Y1 − E Y j ⎟ + E Y1 − μ ≥ 2 ε ⎥ ⎠ ⎥ ⎢ ⎝ n j=1 ⎢ ⎥ ⎣ n j=1 ⎦ ⎦ ⎣ ( ) ( ( ) ) ⎡1 n ⎤ ≤ P ⎢ ∑ Y j − EY1 + EY1 − μ ≥ 2 ε ⎥ ⎢ n j=1 ⎥ ⎣ ⎦ ⎡1 n ⎤ ≤ P ⎢ ∑ Y j − EY1 ≥ ε ⎥ + P EY1 − μ ≥ ε ⎢ n j=1 ⎥ ⎣ ⎦ δ ≤ 2 E X1 ε [ ] for n sufﬁciently large. since ∫ ( x >δ ⋅n ) n () () for n sufﬁciently large. that is. ⎥ ε ⎢ n j =1 ⎦ ⎣ (20) Thus. by (19) and (20). that is. P Zj ≠ 0 = P Zj > δ ⋅ n j (21) ( ) ( =∫ =∫ =∫ <∫ = ) = P ( X > δ ⋅ n) − δ ⋅n −∞ f x dx + ∫ ( x >δ ⋅n ( x >∫ in >1 f ( x)dx () f x dx ) ( ) f x dx ) ( ) ∞ δ ⋅n x ( x >δ ⋅n ) δ ⋅ n f x dx () 1 x f x dx δ ⋅ n ∫( x >δ ⋅n ) 1 2 δ < δ ⋅n δ x f x dx < δ 2 = .6* Pólya’s Lemma and Alternative Proof of the WLLN 209 by (18). ⎤ δ ⎡1 n P ⎢ ∑ Y j − μ ≥ 2 ε ⎥ ≤ 2 E X1 ⎥ ε ⎢ n j=1 ⎦ ⎣ for n large enough.

nk ↑ ∞. As an application of Theorem 11. it was stated that Xn ⎯P → ⎯ n→∞ a. in Remark 3. the following is n→∞ always true.v. k → ∞) ⎯ n→∞ a. so that 1/2k−1 < ε.’s {X2 −1}. THEOREM 11 If Xn ⎯P → X. Since this is true for every ε > 0. X does not necessarily imply that Xn ⎯⎯→ X. the result follows. such that Xnk ⎯⎯→ X. we have 1 P X 2 −1 > ε = P X 2 −1 = 1 = k−1 < ε. Then for ε > 0 and large enough k. (22). ( k ) ( k ) k . where k X2 k −1 = I⎛ 2 k−1 −1 ⎤ ⎜ k − 1 ⋅ 1⎥ ⎝ 2 ⎥ ⎦ . we get ⎤ ⎡1 n P ⎢ ∑ X j − μ ≥ 4ε ⎥ ≤ εE X 1 + ε 3 ⎥ ⎢ n j =1 ⎦ ⎣ for n sufﬁciently large. ▲ This section is concluded with a result relating convergence in probability and a.s. convergence. Replacing δ by ε3. by (21). Thus.210 8 Basic Limit Theorems for n sufﬁciently large.s.s. refer to Example 2 and consider the subsequence of r. for example. n→∞ PROOF Omitted. ⎤ ⎤ ⎡1 n ⎡1 n 1 n P ⎢ ∑ X j − μ ≥ 4ε ⎥ = P ⎢ ∑ Y j + ∑ Z j − μ ≥ 4ε ⎥ n j =1 ⎥ ⎢ n j =1 ⎥ ⎢ n j =1 ⎦ ⎦ ⎣ ⎣ ⎡1 n ⎤ 1 n ≤ P ⎢ ∑ Y j − μ + ∑ Z j ≥ 4ε ⎥ n j =1 ⎢ n j =1 ⎥ ⎣ ⎦ ⎡1 n ⎤ ⎤ ⎡1 n ≤ P ⎢ ∑ Y j − μ ≥ 2ε ⎥ + P ⎢ ∑ Z j ≥ 2ε ⎥ ⎥ ⎢ n j =1 ⎥ ⎢ n j =1 ⎦ ⎣ ⎣ ⎦ n n ⎤ ⎡1 ⎤ ⎡ ≤ P ⎢ ∑ Y j − μ ≥ 2ε ⎥ + P ⎢ ∑ Z j ≠ 0⎥ ⎥ ⎢ n j =1 ⎥ ⎢ ⎦ ⎣ j =1 ⎦ ⎣ δ ≤ 2 E X1 + δ ε for n sufﬁciently large. 2 Hence the subsequence {X2 −1} of {Xn} converges to 0 in probability. However. More precisely. then there is a subsequence {nk} of {n} (that is.

2 Do likewise in order to establish part (ii) of Theorem 7′. 8.Exercises 211 Exercises 8. .6.1 Use Theorem 11 in order to prove Theorem 7′(i).6.

. We wish to determine fY(yj) = P(Y = yj). Thus we have the following theorem. B (Borel) subset of . That is. . X as follows: for any (Borel) subset B of .v.1 The Univariate Case The problem we are concerned with in this section in its simplest form is the following: Let X be an r. Let PX. where A = h−1(B) = {x ∈ . .1 Application 1: Transformations of Discrete Random Variables Let X be a discrete r.1. so that Y = h(X) is an r. and let h: → be a (measurable) function. 2. Y is determined by the distribution PX of the r. THEOREM 1 Let X be an r. PY(B) = PX(A). X i x i ∈A 212 . .212 9 Transformations of Random Variables and Random Vectors Chapter 9 Transformations of Random Variables and Random Vectors 9. Now (Y ∈ B) = [h(X) ∈ B] = (X ∈ A).v. PY be the distributions of X and Y. 2. { ( ) } X and hence fY y j = P Y = y j = PY y j ( ) ( ) ({ }) = P ( A) = ∑ f (x ). j = 1. 9. taking the values xj. PY(B) = P(Y ∈ B). where A = h−1(B). h(x) ∈ B}. we have A = xi.v.v.v. respectively. we want to determine the distribution of Y. . . . so that Y = h(X) is an r. Given the distribution of X.v. . Therefore PY(B) = P(Y ∈ B) = P(X ∈ A) = PX(A). . .v. and let h be a (measurable) function on into . By taking B = {yj}. Then Y is also a discrete r. taking the values yj. j = 1. . 2. . Then the distribution PY of the r.v. h xi = y j . PX(B) = P(X ∈ B). j = 1. and let Y = h(X).

Then the inverse . 12. X. r = ±1. x=r Thus PY B = PX A = PX −r + PX r That is. Thus.f. and let Y = X2. . . . x 2 + 2 x − 3 = y. n. then A = h −1 B = x 2 = r 2 = x = − r or ( ) ( ) ( ) = ( x = r ) + ( x = − r ) = {− r} + {r}. . Y above. This is easily done if the transformation h is one-to-one from S onto T and monotone (increasing or decreasing). x = 0. under h: that is. we have P(Y = 12) = e−λλ3/3!. n each with probability 1/2n. FY. 1. it sufﬁces to determine its d. Then Y takes on the values 1.9. . 4.v. for y = 12. . −1. . −1 + y + 4 ! ) For example. It is a fact. and PY B = P Y = y = PX A = ( ) { ) } ( ) ( ( ) ( e − λ ⋅ λ−1+ y + 4 . EXAMPLE 2 ( ) Let X be P(λ) and let Y = h(X) = X2 + 2X − 3. .1 The Univariate Case 213 where EXAMPLE 1 fX x i = P X = x i . in determining the distribution PY of the r. Then Y takes on the values {y = x 2 + 2 x − 3. ⋅ ⋅ ⋅ = −3. the set to which S is transformed by h.. then A = h −1 B = −1 + y + 4 . . By “one-to-one” it is meant that for each y ∈T. so that ( ) x = −1 ± y + 4 .) Thus. ( ) ( ) Let X take on the values −n. ±n. ( ) ( ) ({ }) ({ }) = 21n + 21n = 1 . (A ﬁrst indication that such a result is feasible is provided by Lemma 3 in Chapter 7. . vectors. The same is true for r. ⋅ ⋅ ⋅ . 1. 0. Hence x = −1 + y + 4 . . the root −1 − y + 4 being rejected. ⋅ ⋅ ⋅ . . . n2 with probability found as follows: If B = {r2}. n P Y = r 2 = 1 n. r = 1. . 5. X is uniquely determined by its d. if B = {y}.v. proved in advanced probability courses. . since it is negative. that the distribution PX of an r. . } { } From we get x 2 + 2 x − y + 3 = 0. there is only one x ∈S such that h(x) = y.f. where S is the set of values of X for which fX is positive and T is the image of S.

and FY(y) = 1 − FX(x−) if h is decreasing. . X ( )} where FX(x−) is the limit from the left of FX at x. we have FY y = P Y ≤ y = P h X ≤ y −1 −1 () ) [( ) ] = P{h [ h( X )] ≤ h ( y)} = P( X ≤ x ) = F ( x ).1 points out why the direction of the inequality is reversed when h−1 is applied if h in monotone decreasing. h−1. REMARK 1 y y y0 (y y0) corresponds. h−1[h(x)] = x. Here is an example of such a case. under h. of course.f. ( X where x = h (y) and h is increasing.1 h(x) x0 h 1 (y0 ) x Thus we have the following corollary to Theorem 1. For such a transformation. FY of Y can be expressed in terms of the d. REMARK 2 Of course.f. Then FY(y) = FX(x) if h is increasing. FX(x−) = lim FX(y). ( ) [( ) ] ( ) ( () ) ( y ) − F (− y −). Then for y ≥ 0. y ↑ x. where x = h−1(y) in either case. it is zero for y < 0. FY y = P Y ≤ y = P h X ≤ y = P X 2 ≤ y = P − y ≤ X ≤ y = P X ≤ y − P X < − y = FX () ( that is.214 9 Transformations of Random Variables and Random Vectors transformation. EXAMPLE 3 Let Y = h(X) = X2. COROLLARY Let h: S → T be one-to-one and monotone. it is possible that the d. of course. X X ) ( ) FY y = FX ( y ) − F (− y −) for y ≥ 0 and. to (x x0) 0 Figure 9. In the case where h is decreasing. FX of X even though h does not satisfy the requirements of the corollary above. we have −1 FY y = P h X ≤ y = P h −1 h X ≥ h −1 y −1 ( ) [ ( ) ] { [ ( )] = P[ X ≥ h ( y)] = P( X ≥ x ) = 1 − P( X < x ) = 1 − F ( x − ). exists and. Figure 9.

v.f. which agrees with Theorem 3. The following example illustrates the procedure. Y by Theorem 1 (take B = (−∞. Chapter 4. FY of the r. assume that X is N(0. Let X be an r. Then A is an interval in S and ( )Z 1 2 1 1 2 y e−y 2 ( π = Γ( )). 1 2 () [ ( )] () P Y ∈ B = P h X ∈ B = P X ∈ A = ∫ fX x dx. d] be any interval in T and set A = h−1(B).d. the p. we know that () 2 FY y = FX () X ( y ) − F (− y ).d.v.v.9. fY of Y is given by the following expression: ⎧ d −1 h y . Let y = h(x) be a (measurable) transformation deﬁned on into which is one-to-one on the set S onto the set T (the image of S under h). A ( ) [( ) ] ( ) () Under the assumptions made. 1). fY of Y. let B = [c. fX is continuous on the set S of positivity of fX. y ∈ ). We recognize it as being the p. of Y = h(X). d FX dy d ( y ) = f ( y ) dy y= 1 2 y fX ( y) = 2 1 −1 2 1 2π y e −y 2 . M. of a χ 2 distributed 1 r. 2π Then.f. under appropriate conditions. EXAMPLE 4 In Example 3. provided it exists.v. Set Y = h(X). so that Y is an r.d. and then determine the p. if Y = X2. T.f. dy 2 y 2 2π y ( ) () ( ) Γ d 1 FY y = fY y = dy y () 1 2π e−y 2 = y ≥ 0 and zero otherwise. ⎩ For a sketch of the proof.f. y]. the theory of changing the variable in the integral on the right-hand side above applies (see for example. whose p. by differentiating (for the continuous case) FY at continuity points of fY.d. and so that d 1 1 FX − y = − fX − y = − e −y 2 . Next. Then the inverse transformation x = h−1(y) exists for y ∈ T.f. Under the above assumptions.d. y ∈T ⎪ f h −1 y fY y = ⎨ X dy ⎪0.f.d. One way of going about this problem would be to ﬁnd the d. X y ≥ 0. and we will determine the p. Another approach to the same problem is the following. .1 The Univariate Case 215 We will now focus attention on the case that X has a p.f. otherwise. so that 1 −x 2 fX x = e . Apostol. It is further assumed that h−1 is differentiable and its derivative is continuous and different from zero on T.

We have 2 h −1 y = () 1 y−b a ( ) and d −1 1 h y = . Suppose that h is one-to-one on S onto T (the image of S under h). It is further assumed that h−1 is differentiable and its derivative is continuous and ≠ 0 on T. fY y = () ⎡ exp⎢− ⎢ 2πσ ⎢ ⎣ 1 1 ( y −b a = ⎡ ⎢ − y − aμ + b exp⎢ 2 a 2σ 2 2π a σ ⎢ ⎣ [ ( −μ ⎤ 1 ⎥⋅ 2σ 2 ⎥ a ⎥ ⎦ ) )] 2 ⎤ ⎥ ⎥ ⎥ ⎦ which is the p.d. that fY has the expression given above. Now it may happen that the transformation h satisﬁes all the requirements of Theorem 2 except that it is not one-to-one from S onto T. 1957.f. then aX + b is N(aμ + b. σ2). a2σ2). the following might happen: There is a (ﬁnite) partition of S. with mean aμ + b and variance a2σ2. of a normally distributed r. of an r. if X is N(μ. clearly. of the r. EXAMPLE 5 Let X be N(μ. d That is. fX on the set S on which it is positive.v. Then the p.d. Thus we have the following theorem. b ∈ . Instead. σ ) and let y = h(x) = ax + b. Y.f. ⎩ () [ ( )] () y ∈T otherwise.v. dy a () Therefore.d.d. THEOREM 2 Let the r. which we denote by . 216 and 270–271) and gives −1 −1 ∫A fX (x )dx = ∫B fX [h ( y)] dy h ( y) dy.216 9 Transformations of Random Variables and Random Vectors Mathematical Analysis. P(Y ∈ B) = P[X ∈ h−1(B)] ≤ P(X ∈ Sc) = 0.v. and let y = h(x) be a (measurable) transformation deﬁned on into . a 0.f. where a.v. P Y ∈ B = ∫ fX h −1 y B ( ) d [ ( )] dy h ( y) dy. ⎪ f h −1 y fY y = ⎨ X dy ⎪0. X have a continuous p. are constants. so that Y = aX + b. Thus. fY of Y is given by ⎧ d −1 h y .f. Addison-Wesley.f. satisﬁes the conditions of Theorem 2.d.v. Here the transformation h: → . pp. so that Y = h(X) is an r. so that the inverse transformation x = h−1(y) exists for y ∈ T. We wish to determine the p. −1 Since for (measurable) subsets B of T c. it follows from the deﬁnition of the p. for any interval B in T.

∞ ( ) by assuming that fX(x) > 0 for every x ∈ . j EXAMPLE 6 Consider the r. . . . r. Tj). fX on the set S on which it is positive. . Assume that h−1 is differentiable and j j its derivative is continuous and ≠ 0 on Tj. . j = 1. () h −1 y = 2 () y. . Y. This result simply says that for each one of the r pairs of regions (Sj.v. . . which need not be distinct or disjoint. we work as we did in Theorem 2 in order to ﬁnd fY y = fX h −1 y j j () d [ ( )] dy h ( y) . fY of Y is given by ⎧ r ⎪ δ y fY y . .f. ∞ . . . Next. − h1 1 y = − y .d. where for j = 1. j = 1. fY y = fX h −1 y j j () d [ ( )] dy h ( y) .v. . fY of the r. we can establish the following theorem. r. . X and let Y = h(X) = X2.v. . . . j = 1. we ﬁnd fY(y) by summing up the corresponding fY (y)’s. . . h2 y = dy 2 y () . . . r. . fY y = ⎨∑ j j =1 ⎪0. ( ) T1 = 0. and let y = h(x) be a (measurable) transformation deﬁned on into . ( ] S 2 = 0.f. ⎩ () () () j y ∈T otherwise. −1 j y ∈T j . . . j = 1. Then by an argument similar to the one used in proving Theorem 2.1 The Univariate Case 217 {Sj. . . X have a continuous p. Then the p. . but the Tj’s need not be disjoint) such that h: Sj → Tj. . j = 1. j=1 j = 1. [ ) T2 = 0. Here the conditions of Theorem 3 are clearly satisﬁed with S1 = −∞. r. . () 1 d −1 . . THEOREM 3 Let the r. r is one-to-one. The following example will serve to illustrate the point. .v. such that ∪ r Tj = T and that h deﬁned on each one of Sj onto Tj .d. and δj(y) = 1 if y ∈ Tj and δj(y) = 0 otherwise. . r. . y > 0. r} of S and subsets Tj. which we denote by Tj. . .d. j = 1. . . and there are r subsets of T. r of T (the image of S under h). j = 1. Let hj be the restriction of the transformation h to Sj and let h−1 be its inverse. . 0 . We want to determine the p.f. ∞ . . . (note that r Tj = T. h1 y = − dy 2 y Therefore. j=1 j = 1. . j = 1. r (0 ≤ k ≤ r). so that Y = h(X) is an r. so that d −1 1 . r}. . . −1 j then if a y in T belongs to k of the regions Tj.9. is one-to-one. r. Suppose that there is a partition {Sj. . .

1.f. If X is distributed as N(μ. Use the ch. ﬁrst by determining their d.5 If the r. 9. j . where Y = eX.f.f. we then get fY y = () ( y ) + f (− y )⎤⎦⎥. 105). f given in Exercise 3. iii) If X is interpreted as a speciﬁed measurement taken on each item of a product made by a certain manufacturing process and cj. 3 are the proﬁt (in dollars) realized by selling one item under the condition that X ∈ Bj. . and secondly directly.d. X provided ±√y are continuity points of fX.d.1.d. X is N(99. ii) If n = 3. 1/(X + 1).v. ﬁrst by determining its d. 9. Y be r.f.’s in part (i) become for α = 0 and β = 1? iii) For α = 0 and β = 1. i) Express the p.’s. ﬁnd the expected proﬁt from the sale of one item. of a χ 2 r. σ ).d. . j j = 1. Y = X 3. respectively.3 Let X. and secondly directly. ﬁrst by determining their d. X2 +1.v. and notice that Y is a discrete r. are independent and distributed as the r.f. of the continuous type and set Y = ∑ n= 1cjIB (X). 92] + [107.’s Y. . j = 1. eX. and for y > 0. and secondly directly. respectively.v. .f. .1.1.’s Yj. . log X (for α > 0). n. j = 1.v. ∞).f. . Exercises 9. whereas X is an r.2. if X is N(0. B2 = (92. Y.v.d.v.f. . as we also saw in Example 1 4 in a different way.14 of Chapter 3 and determine the p. β): i) Derive the p. j = 1. 3.d.v. 1).4 If the r. of each one of the r.1.d. Z. of Y in terms of that of X.. 95) + (105.1 Let X be an r.v. ﬁnd the p. n. Y deﬁned above. . of Y. .f.218 9 Transformations of Random Variables and Random Vectors fY y = fX − y 1 () ( ) 2 1y . of the r. let Y = log X and suppose that the r.v. 107). ii) What do the p.’s: aX + b (a > 0). Z = log X. determine the p.d. j = 1. 2.f. with p.v.2 Let X be an r. 1 ⎡ f ⎢ X 2 y⎣ fY y = fX 2 () ( y) 2 1y . B3 = (−∞.f. approach to determine the p.f.’s. j 9. Then it is known that Y = – X 5 2 + 32.v. 9. X is distributed as Negative Exponential with parameter λ. X is distributed as U(α. n. are pairwise disjoint (Borel) sets and cj.d. with p.v. .’s of the following r.f. determine the distribution of the r. 2. are constants. In particular. .. of −∑n= 1Yj. where Bj.’s representing the temperature of a certain object 9 in degrees Celsius and Fahrenheit.v.v.f.d. 5) and B1 = (95. we arrive at the conclusion that fY(y) is the p. of the continuous type.

Z = sin X. show that the r.v.1. –π). vector and let h: k → m be a (measurable) function. Y = [(X − μ)/σ]2 is distributed as χ 2.v.f.v.1. set Y = X/(1 + X) and determine the r p.f. We have FY y = P X1 + X 2 ≤ y = ∫ ∫ {x +x 1 () ( ) 2 ≤y } fX 1 . 2 9. X is distributed as Cauchy with μ = 0 and σ = 1.10 Let X be an r. X is distributed as N(μ. show.1. β. . The proof of this theorem is carried out in exactly the same way as that of Theorem 1. by means of a transformation.v. Then the distribution PY of the r. that the r. .d.1. X is distributed as U(− –π. . THEOREM 1′ Let X = (X1.1.7 If the r.6 If the r. Xk)′ be a k-dimensional r. –π). x )dx dx . Derive the distribution of the 1 r.11 Suppose that the velocity X of a molecule of mass m is an r.2 The Multivariate Case 9.12 If the r.v. . 9.f.’s distributed as U(α. 2 9. 1). of Y.v. Y = tan−1 X is distributed as U(−–π. X2 be independent r. Y = X1 + X2. EXAMPLE 7 Let X1.3.v. vector X as follows: For any (Borel) subset B of m. of the r.2 The Multivariate Case What has been discussed in the previous section carries over to the multidimensional case with the appropriate modiﬁcations.d. FY y = () 1 (β − α ) ⋅ A. x∈ and show that the r. We wish to determine the d.X 2 (x . 2 2 9. σ2).1 The Univariate 219 1 1 9.v. and Y = 2X/β.v.9 If the r.f. 1 9. Also ﬁnd the distribution of the r.d.v. so that Y = h(X) is an r.v.1. vector Y is uniquely determined by its d. 9. we see that for y ≤ 2α. where A = h−1(B). vector. distributed as χ 2. f given by f x = () 1 2π x −2 e −1 2 x 2 ( ). 9.v. the distribution PY of the r. 1 2 1 2 2 From Fig.f.13(ii) of Chapter 3. PY(B) = PX(A).v. with p. show that Y ∼ χ 2α.1.8 If X is an r. Y = tan X is 2 2 distributed as Cauchy. 9.2.9. FY. For 2α < y ≤ 2 β . vector Y is determined by the distribution PX of the r. f given in Exercise 3. β). X has the Gamma distribution with parameters α. FY(y) = 0. As in the univariate case.v. provided 2α is an integer. with p. . Y = 1/X is distributed as N(0.v. show that the 1 1 r. Y = –mX2 (which is the kinetic energy of the molecule).

v.’s. ⎪ 2 ⎪2 β − α FY y = ⎨ 2β − y ⎪ ⎪1 − ⎪ 2 β −α ⎪ ⎩1.’s. α + β < y ≤ 2β y > 2 β. x2 2 x1 x2 y x1 x2 y 0 2 x1 2 Figure 9.220 9 Transformations of Random Variables and Random Vectors where A is the area of that part of the square lying to the left of the line x1 + x2 = y. A = (y − 2α)2/2. REMARK 3 1 2 1 2 1 2 1 2 .f. Thus we have: ⎧0 . We also write fX +X = fX * fX for the corresponding p.’s (not necessarily U(α. X2 and is denoted by FX +X = FX * FX .f.2 The d.d.’s of X1.f. Since for y ≤ α + β. ( () ( ( ) ) y ≤ 2α 2α < y ≤ α + β 2 2 ( ) ) .v. of X1 + X2 for any two independent r. For α + β < y ≤ 2β. we have FY y = () 1 (β − α ) 2 ⎡ ⎢ β −α ⎢ ⎢ ⎣ ( ) 2 (2 β − y) − 2 2 ⎤ ⎥ = 1 − 2β − y ⎥ 2 β −α ⎥ ⎦ ( ( ) ) 2 2 . ⎪ 2 ⎪ y − 2α . we get ( y − 2α ) F ( y) = 2(β − α ) Y 2 2 for 2α < y ≤ α + β . β) distributed) is called the convolution of the d. These concepts generalize to any (ﬁnite) number of r.

Y y1 . X2 be B(n2. n2 = υ. of Y1. for the four possible values of the pair. Y2 = y2 = P X1 = y1 − y2 . 1 2 ( ) ( ) ( ) since X1 = Y1 − Y2 and X2 = Y2. X 2 = y2 . it follows that ⎛ n1 ⎞ ⎛ n2 ⎞ ⎜ ⎟⎜ ⎟ ⎝ y − y2 ⎠ ⎝ y2 ⎠ P Y2 = y2 Y1 = y1 = 1 ..f. and the conditional p. with y1 and y2 as above. Y1 = X1 + X2 is B(n1 + n2.d. 1 − y2 ⎠ ⎝ y2 ⎠ Next. of p!. p) and independent. given Y1 = y1. =⎜ ⎟⎜ ⎟ p q ⎝ y1 − y2 ⎠ ⎝ y2 ⎠ 1 1 2 1 −y2 that is ⎛ n1 ⎞ ⎛ n2 ⎞ y ( n + n )− y fY .) Finally. p).9.d.2 The Multivariate Case 9. fY . .f. Chapter 7. We next have two theorems analogous to Theorems 2 and 3 in Section 1.1 The Univariate 221 EXAMPLE 8 Let X1 be B(n1. y ) = p y2 =u Y1 .f. y2 = ⎜ .d. (Observe that this agrees with Theorem 2.d. y1 − n1 ≤ y2 ≤ min y1 . independent. υ). y2 = P Y1 = y1 .f. p). of Y1. Let Y1 = X1 + X2 and Y2 = X2. ⎝ y1 − y2 ⎠ ⎝ y2 ⎠ ⎝ y1 ⎠ that is. this is equal to P X1 = y1 − y2 P X 2 = y2 1 2 1 1 ( ) ( ) 2 2 2 ⎛ n1 ⎞ y − y n −( y − y ) ⎛ n2 ⎞ y n =⎜ ⋅⎜ ⎟ p q q ⎟p ⎝ y1 − y2 ⎠ ⎝ y2 ⎠ ⎛ n1 ⎞ ⎛ n2 ⎞ y ( n + n )− y . ⎛ n1 + n2 ⎞ ⎜ ⎟ ⎝ y1 ⎠ ( ) the hypergeometric p. Furthermore. ⎩ ( ) ( 1 ) fY y1 = P Y1 = y1 = ( ) ( ) ∑ f (y . Y2 and also the marginal p.Y y1 . by independence. of Y2. That is. (u. ⎟⎜ ⎟ p q ⎝ y1 − y2 ⎠ ⎝ y2 ⎠ 1 2 ( ) 1 1 2 1 Thus 1 ⎧0 ≤ y1 ≤ n1 + n2 ⎪ ⎨ ⎪u = max 0. we have y2 = 0 ∑ ⎜y ⎝ y1 ⎛ y ⎛ n1 ⎞ ⎛ n2 ⎞ ⎞ ⎛ n2 ⎞ n ⎛ n1 ⎞ ⎛ n2 ⎞ ⎟⎜ ⎟ ⎟⎜ ⎟ = ∑ ⎜ ⎟⎜ ⎟ = ∑ ⎜ y =0 ⎝ y1 − y2 ⎠ ⎝ y2 ⎠ y = y −n ⎝ y1 − y2 ⎠ ⎝ y2 ⎠ 1 − y2 ⎠ ⎝ y2 ⎠ n1 1 2 2 1 1 = y2 = y1 −n 1 ∑ n2 ⎛ n1 ⎞ ⎛ n2 ⎞ ⎛ n1 + n2 ⎞ ⎟⎜ ⎟ = ⎜ ⎜ ⎟. We want to ﬁnd the joint p.Y2 1 2 2 υ y1 q ( n + n )− y 2 1 y2 =u ∑⎜ y ⎝ υ ⎛ n1 ⎞ ⎛ n2 ⎞ ⎟⎜ ⎟.

A number of examples will be presented to illustrate the application of Theorem 2′ as well as of the preceding remark. ⋅ ⋅ ⋅ . . .f. so that they are of the simplest possible form and such that the transformation REMARK 4 h = h1 .f. fX on the set S on which it is positive. we work as follows. . gk y J . the dimensionality m of Y is less than k. j = 1. . .222 9 Transformations of Random Variables and Random Vectors THEOREM 2 ′ Let the k-dimensional r. vector X have continuous p. . hk x () ( () ( )) ′ be a (measurable) transformation deﬁned on k into k. of Y. hm + 1 . where Y = (Y1. . ∂yi ( ) i.d. . so that Y = h(X) is a k-dimensional r. gk y () ( () ( )) ′ exists for y ∈T . 1 fY y = ⎨ X ⎪0. Then in order to determine the p. Then the p. Ym.d.f. . .f. ⋅ ⋅ ⋅ . say. In many applications. j = 1.f. . ⋅ ⋅ ⋅ .d. . . . Suppose that h is one-to-one on S onto T (the image of S under h). In Theorem 2′. vector Y. . ⎩ () [ ( )] [ () g12 g 22 M gk2 ( )] y ∈T otherwise. Ym + 1. k − m. yk . we have the p. however. . . .d. so that the inverse transformation x = h −1 y = g1 y . j = 1. ⋅ ⋅ ⋅ . we obtain the p. It is further assumed that the partial derivatives g ji y = () ∂ g j y1 . Ym)′ and Ym + j = hm + j(X). of Y. where the Jacobian J is a function of y and is deﬁned as follows J= g11 g 21 M g k1 ⋅ ⋅ ⋅ g1k ⋅ ⋅ ⋅ g2 k M M ⋅ ⋅ ⋅ g kk and is assumed to be ≠ 0 on T. ⋅ ⋅ ⋅ . . . k exist and are continuous on T. vector. . Let y = (h1(x). . . the transformation h transforms the k-dimensional r. . Then by applying Theorem 2′. . hm+j. hm . Yk)′. ⋅ ⋅ ⋅ . . . . j = 1. ⎪ X ⋅ ⋅ ⋅ . fY of Y is given by ⎧ f h −1 y J = f g y . k − m. Set Z = (Y1. k − m.d. . hm(x))′ and choose another k − m transformations deﬁned on k into . hk ( ) ′ satisﬁes the assumptions of Theorem 2′. and let y = h x = h1 x . fZ of Z and then integrating out the last k − m arguments ym+j. vector X to the k-dimensional r.

y2 are speciﬁed by α < y2 < β. Then J= 1 −1 =1 0 1 and also α < y2 < β. α < x1 . ) 2 . β).9.4 T = image of S under the transformation h. Since y1 − y2 = x1. .’s distributed as U(α.) x2 h (Fig. α < x1 .i. then ⎨ 1 h: ⎨ 1 ⎩Y2 = X 2 . X2 be i. α < x1 < β. Set Y1 = X1 + X2 and ﬁnd the p. x 2 < β . (See Figs. of Y1.f. 9. α < y1 − y2 < β.X2 (x1. fX1. x2)′.4) T Figure 9. 9.1 The Univariate 9.d.v.4. ⎩ y2 = x 2 From h.3 S = {(x1. x 2 < β otherwise. x2) > 0} S 0 x1 y2 y1 y2 c y1 y2 T 2 0 y1 y2 2 y1 Figure 9. r. x ) ( 1 2 ⎧ 1 ⎪ = ⎨ β −α ⎪ ⎩0 . Thus the limits of y1.3 and 9. we have α < y1 − y2 < β.d. we get ⎧ x1 = y1 − y2 ⎪ ⎨ ⎪ ⎩ x 2 = y2 .2 The Multivariate Case 223 EXAMPLE 9 Let X1. We have fX 1 . Consider the transformation ⎧Y = X1 + X 2 ⎧ y = x1 + x 2 .X 2 (x .

. x ) > 0⎫ ⎬ ⎭ 1 2 ⎧ ⎫ y T = ⎨ y1 .υ.Y y1 . α < y1 − y2 < β otherwise.d.d. 1 < y2 < β ⎬. α < y2 < β . ) 2 . x2 ′ . then ⎨ 1 h: ⎨ 1 ⎩ y2 = x 2 ⎩Y2 = X 2 .i.5 2 0 2 y1 REMARK 5 EXAMPLE 10 This density is known as the triangular p. of Y1. we get ⎧ y ⎪ x1 = 1 y2 ⎨ ⎪x = y 2 ⎩ 2 1 y1 − 2 1 and J = y2 y2 = . for 2α < y1 ≤ α + β for α + β < y1 < 2 β 2 otherwise. X2 be i. The graph of fY is given in Fig. y2 0 1 Now ⎧ S = ⎨ x1 . Let X1. From h. f X ⎩ is transformed by h onto ( ) 1 . 9.’s from U(1. β).d.f.f. Set Y1 = X1X2 and ﬁnd the p. 1 fY1(y1) 1 ( ) Figure 9. r. Consider the transformation ⎧ y = x1 x 2 ⎧Y = X1 X 2 . y2 ⎩ ⎭ ( ) . Therefore ⎧ 1 ⎪ ⎪ β −α ⎪ y1 = ⎨ 1 ⎪ ⎪ β −α ⎪0. 2α < y1 < 2 β . ⎩ fY 1 ( ) ( ( ) ) 2 ∫α y1 −α dy2 = ( y1 − 2α 2 ∫y −β dy2 = 1 β (β − α ) β −α 2 β − y1 ) 2 .X 2 ( x .5. y2 ′ .224 9 Transformations of Random Variables and Random Vectors Thus we get fY . y2 1 2 ( ) ( ⎧ 1 ⎪ = ⎨ β −α ⎪ ⎩0 . 1 < 1 < β .

9. 1 β ≤ y1 < β 2 . X2 be i.2 The Multivariate Case 9.6 y1 T y2 y1 1 2 y1 fY .’s from N(0. y1 = ⎨ 2 ⎪ β −1 ⎪ ⎪ ⎪0 . 2 (y . ⎪ 2 ⎪ β −1 ⎪ ⎪ 1 2 log β − log y1 .d.υ. Show that the p. ) ( ) 2 1 .6. y ) 1 2 ′ ∈T otherwise. σ = 1. we have fY 1 ( ) ( ( ⎧ 1 ⎪ ⎪ β −1 y1 = ⎨ ⎪ 1 ⎪ β −1 ⎩ ) ) 2 ∫ ∫ y1 1 dy2 1 = y2 β −1 ( ) 2 log y1 . y1 ∈ . ⎩ fY1 ( ) ( ( ) ) 1 < y1 < β ( ) β ≤ y1 < β 2 otherwise.v.i.9. that is. since y2 1 0 Figure 9. 1). π 1 + y2 1 We have Y1 = X1/X2.1 The Univariate 225 (See Fig.) Thus. fY y1 = 1 ( ) 1 1 ⋅ . Y1 = X1/X2 is Cauchy with μ = 0. of the r.Y 1 2 ( ⎧ 1 ⎪ y1 . EXAMPLE 11 Let X1.f. that is ⎧ 1 log y1 . y2 = ⎨ β − 1 ⎪ ⎩0. r.d. 1 < y1 < β β 2 y1 β dy2 1 = y2 β −1 ( ) 2 (2 log β − log y ). Let Y2 = X2 and consider the transformation .

and J= y2 0 y1 1 ⎧ x = y1 y2 then ⎨ 1 ⎩ x2 = y2 J = y2 . y2 ⋅ y2 = and therefore 1 fY ( y ) = 2π 1 1 ( ) ( ) 2 2 2 ⎛ y1 y2 + y2 ⎞ 1 exp⎜ − ⎟ y2 2π 2 ⎝ ⎠ ⎡ y2 + 1 y2 ⎤ 2 2 ⎛ y 2 y2 + y2 ⎞ 1 2 1 ∞ 1 ∫−∞ exp⎜ − 2 ⎟ y2 dy2 = π ∫0 exp ⎢− 2 ⎥y2 dy2 . Set Y1 = X1/(X1 + X2) and prove that Y1 is distributed as Beta with parameters α. 1 ∞ −t dt = 1. 2). x2 ≠ 0 h: ⎨ 1 ⎩ y2 = x2 .υ. so that 2 y2 = 2t y +1 2 1 2y2 dy2 = 2dt . y +1 2 1 [ ) Thus we continue as follows: 1 π since ∫0 e ∞ −t ∞ dt 1 1 1 1 −t = ⋅ 2 ∫0 e dt = π ⋅ y 2 + 1 . ⎪y = x + x 1 2 ⎩ 2 . ⎢ ⎥ ⎝ ⎠ ⎣ ⎦ ∞ ( ) Set (y and 2 1 +1 2 )y 2 2 = t . = y2 . or y2 + 1 1 y2 dy2 = dt . x1 . β. ∞ . so that Since −∞ < x1. π y1 + 1 y +1 1 2 1 ∫0 e that is.226 9 Transformations of Random Variables and Random Vectors ⎧ y = x1 x2 . π y1 + 1 fY y1 = EXAMPLE 12 ( ) Let X1.Y2 y1 . we have fY1 . x2 > 0. y2 < ∞.’s distributed as Gamma with parameters (α. then ⎨ 1 ⎩ x2 = y2 − y1 y2 . x2 < ∞ implies −∞ < y1. X2 be independent r. respectively. y2 = fX 1 .X 2 y1 y2 . 2) and (β. t ∈ 0. We set Y2 = X1 + X2 and consider the transformation: ⎧ x1 ⎧ x = y1 y2 ⎪y = h: ⎨ 1 x1 + x2 . 1 1 ⋅ 2 .

x1 + x 2 1 + x 2 x1 ( ) Thus 0 < y1 < 1 and. Y2 = . y2 = = ( ) 1 β α yα −1 y2 −1 y2 −1 1 − y1 1 Γ α Γ β 2 α +β ( )() ( )() 1 ( ) β −1 ⎛ y ⎞ exp⎜ − 2 ⎟ y2 ⎝ 2⎠ Γα Γβ2 α +β α y1 −1 1 − y1 ( ) β −1 α y2 + β −1e − y 2 2 .Y2 y1 . But ∫0 Therefore ∞ yα + β −1e − y 2 2 2 dy2 = 2 α + β ∫ t α + β −1e − t dt = 2 α + β Γ α + β . y1 = 0 and for x1 → ∞. it follows that for x1 = 0. α . r.9. X3 be i. 0 < y2 < ∞. 0 < y2 < ∞. Therefore. x2 > 0. for 0 < y1 < 1.υ. ⎩0 . () Set Y1 = X1 X1 + X 2 .i. ( ) ()() ( ) β −1 . x > 0 f x =⎨ x ≤ 0.1 The Univariate 227 Hence J= y2 − y2 y1 = y2 − y1 y2 + y1 y2 = y2 1 − y1 and J = y2 . clearly. x1 .2 The Multivariate Case 9. 0 ∞ ( ) fY 1 ( ) ⎧Γ α +β ⎪ yα −1 1 − y1 y1 = ⎨ Γ a Γ β 1 ⎪ ⎩0. fX 1 .X 2 ( ⎧ ⎛ x + x2 ⎞ 1 β xα −1 x2 −1 exp⎜ − 1 ⎪ ⎟ . β > 0. Y3 = X1 + X 2 + X 3 X1 + X 2 X1 + X 2 + X 3 .d. EXAMPLE 13 Let X1. x1 1 = → 1. ⎩0. Next. Hence fY1 y1 = ( ) 1 α y1 −1 1 − y1 Γ α Γ β 2α +β ( )() ∞ 0 ( ) β −1 α × ∫ y2 + β −1e − y2 2 dy2 . α β 1 x1 . we get fY1 .’s with density ⎧e − x . X2. 0 < y1 < 1 otherwise. ) ( )() y1 = From the transformation. x2 = ⎨ Γ α Γ β 2 2 2 ⎠ ⎝ ⎪ otherwise.

it follows that x1.Y2 . y . x1 . x3 ∈ (0. x3 > 0. 0 < y3 < ∞ otherwise. The functional forms of fY . 1). Y3 is distributed as Gamma with α = 3. y3 = fY1 y1 fY2 y2 fY3 y3 . Y2. 2 3 fY1 . 1 3 . y e ⎨ ⎪ 1 2 3 ⎧ ⎩ 2 −y3 3 . and Y1. Hence fY y1 = ∫ 1 fY and 2 ( ) ∫y y e (y ) = ∫ ∫ y y e ∞ 1 0 ∞ 1 0 0 2 2 0 2 2 −y3 3 dy2 dy3 = 1. ( ) 2 ( ) ( ) Thus fY . 0 < y2 < 1. 1 . Y3 are independent.Y 1 2 . y ) = ⎪0. x 2 . 0 < y1 < 1.228 9 Transformations of Random Variables and Random Vectors and prove that Y1 is U(0. y2 ∈ 0. β = 1. Y2. ∞ 0 0 < y1 < 1. fY verify the rest. ∞ . y2 . Y3 is established. ∞) implies that y1 ∈ 0. x2. 0 < y3 < ∞. 3 2 −y3 3 2 dy1dy3 = y2 ∫ y3 e − y dy3 = 2 y2 . 1 . ( ) ( ) ( ) ( ) the independence of Y1. y3 ∈ 0. Consider the transformation ⎧ x1 ⎪ y1 = x1 + x 2 ⎪ ⎧x1 = y1 y2 y3 x1 + x 2 ⎪ ⎪ h: ⎨ y2 = .Y 3 y ( y .Y3 y1 . 0 < y2 < 1 2 2 fY y3 = ∫ ∫ y2 y3 e − y dy1dy2 = y3 e − y 3 3 ( ) 1 1 3 0 0 ∫ y dy 0 2 1 2 = Since 1 2 −y y3 e . then ⎨x 2 = − y1 y2 y3 + y2 y3 x1 + x 2 + x3 ⎪ ⎪x = − y y + y 2 3 3 ⎩ 3 ⎪ ⎪ ⎩ y3 = x1 + x 2 + x3 and y2 y3 J = − y2 y3 0 y1 y3 − y1 y3 + y3 − y3 y1 y2 2 − y1 y2 + y2 = y2 y3 . − y2 + 1 Now from the transformation.

x∈ .U t . u = = Hence fT t = ∫ We set ( ) 1 2π e − t2 u 2 r ( ) ⋅ 1 ( r 2 )−1 − u 2 u u e . we get fT .f. r 2 Γr 22 r ( ) 2πr Γ r 2 2 r ( ) 1 2 u (1 2 )( r +1)−1 ⎡ u⎛ t2 ⎞⎤ exp⎢− ⎜ 1 + ⎟ ⎥. and set T = r X/√Y/r.υ. so that u = 2z⎜ 1 + ⎟ . then ⎨ r ⎪y = u ⎪u = y ⎩ ⎩ and u J= r 0 t 2 u r = 1 u r . () ( ) Set U = Y and consider the transformation ⎧ x ⎧ 1 t u ⎪x = ⎪t = h: ⎨ y r . 1) and χ2. du = 2 ⎜ 1 + ⎟ dz.2 The Multivariate Case 9. T is said to have the (Student’s) t-distribution with r degrees of freedom (d. ∞). y>0 y ≤ 0. r ⎠⎥ ⎢ 2⎝ ⎣ ⎦ ⎛ ⎛ u⎛ t2 ⎞ t2 ⎞ t2 ⎞ 1 + ⎟ = z.) and is often denoted by tr. We want to ﬁnd its p. Then for t ∈ . ⎧ 1 ( r 2 )−1 − y 2 . u > 0.d.1 The Univariate 229 9.f.’s X and Y be distributed as N(0. Therefore we continue as follows: −1 −1 .v. respectively.1 Application 2: The t and F Distributions The density of the t distribution with r degrees of freedom (tr).2. We have: fX x = () 1 2π e − 1 2 x2 ( ) . 2⎜ r⎠ r⎠ r⎠ ⎝ ⎝ ⎝ and z ∈[0. r ⎠⎦ ⎢ ⎥ ⎣ 2⎝ () ∞ 0 2πr Γ r 2 2 r ( ) 1 2 u (1 2 )( r +1)−1 ⎡ u⎛ t2 ⎞⎤ exp⎢− ⎜ 1 + ⎟ ⎥du. The r. y e ⎪ 1 (1 2 ) r fY y = ⎨ Γ r 2 2 ⎪ ⎩0.9. Let the independent r.

The r r r. t∈ .r .230 9 Transformations of Random Variables and Random Vectors (1 2 )( r +1)−1 ⎡ ⎤ 1 2 ⎢ 2z ⎥ e −z dz fT t = ∫ 2 0 r 2 ⎢ ⎥ 1+ t r 1 + t2 r 2πr Γ r 2 2 ⎣ ⎦ (1 2 )( r +1) ∞ (1 2 )( r +1)−1 1 2 z = e − z dz (1 2 )( r +1) ∫0 r 2 2 2πr Γ r 2 2 1+ t r () ∞ ( ) ( ) ( ) ( ) [ ( )] 1 = 1 2 r +1 πr Γ r 2 1 + t 2 r ( )( ) 1 ( ) [ ( )] Γ 1 2 Γ [ (r + 1)].7 t The density of the F distribution with r1. r2 d.7. . Let the independent r. 1 2 that is fT t = (r + 1)] () [ πr Γ r 2 ( ) [1 + (t r )]( 2 1 1 2 r +1 )( ) . and set F = (X/r1)/(Y/r2). 1)) t5 0 Figure 9.f.r ).) fT (t) t (N(0.v. ⎩ ( ) 2 x ( r 2 )−1 − x 2 e . (For the graph of fT.f. r2 degrees of freedom (d.’s X and Y be distributed as χ 2 and χ 2 . We want to ﬁnd its p. We have: 1 2 1 2 1 2 fX () ⎧ 1 ⎪ 1 x = ⎨ Γ 2 r1 2 r1 ⎪0.) and is often denoted by Fr . x>0 1 x ≤ 0. respectively.f. (Fr . F is said to have the F distribution with r1. 9.d. The probabilities P(T ≤ t) for selected values of t and r are given in tables (the t-tables).υ. see Fig.

1 2 )( r + r ) ( ⎠⎦ ⎢ 2 ⎝ r2 ⎥ Γ 1 r1 Γ 1 r2 2 ⎣ 2 2 ) ( )( ) 1 ( 1 1 2 1 2 Therefore fF f = ∫ fF . and consider the transformation ⎧ ⎧ x r1 r ⎪f = ⎪x = 1 fz h: ⎨ y r2 . 2 ⎝ r2 r2 ⎠ ⎝ ⎠ ⎛r ⎞ dz = 2 ⎜ 1 f + 1⎟ dt . ⎝ r2 ⎠ −1 −1 [ ) Thus continuing. y>0 2 y ≤ 0. ⎩ () ( ) 2 2 y ( r 2 )−1 − y 2 e . z = ( ) 1 Γ ( r )Γ( r )2 1 2 1 1 2 2 (1 2 )( r + r 1 2 ⎛ r1 ⎞ ) ⎜ r2 ⎟ ⎝ ⎠ ( r 2 )−1 1 f ( r 2 )−1 ( r 2 )−1 ( r 2 )−1 z z 1 1 2 ⎛ r ⎞ r × exp⎜ − 1 ⎟ fz e − z 2 1 z r2 ⎝ 2r2 ⎠ r 2 ( r 2 )−1 r1 r2 f ⎡ z⎛r ⎞⎤ (1 2 )( r + r )−1 = z exp⎢− ⎜ 1 f + 1⎟ ⎥. ∞ . then ⎨ r2 ⎪z = y ⎪y = z ⎩ ⎩ and r1 z J = r2 0 r1 f r1 r2 = z.2 The Multivariate Case 9. z > 0.1 The Univariate 231 ⎧ 1 ⎪ 1 fY y = ⎨ Γ 2 r2 2 r ⎪0. z dz 0 r1 2 1 2 () ( ) (r r ) f ( = ( Γ ( r )Γ ( r )2 ∞ 1 2 1 1 2 2 r1 2 −1 1 2 r1 + r2 ) )( ) ∫ ∞ 0 z (1 2 )( r + r )−1 1 2 ⎡ z⎛r ⎞⎤ exp⎢− ⎜ 1 f + 1⎟ ⎥dz. We set Z = Y. so that r2 1 J = r1 z.Z f .9. we get: f F .Z f . r2 For f. ⎠⎥ ⎢ 2 ⎝ r2 ⎣ ⎦ Set ⎞ ⎛ r1 ⎞ z ⎛ r1 ⎜ f + 1⎟ = t . t ∈ 0. we have . so that z = 2t ⎜ f + 1⎟ .

fF () ⎧Γ 1 r + r r r ⎪ 2 1 2 1 2 ⎪ f =⎨ Γ 1 r1 Γ 1 r2 2 2 ⎪ ⎪0. 10 F10.r. . .8 10 20 30 f REMARK 6 i) If F is distributed as Fr .) fF ( f ) F10. fX on the set S on which it is positive. 9. ⎩ [( )]( ) ( )( ) r1 2 ⋅ f 1 ( r 2 )−1 1 [1 + (r r ) f ] 2 (1 2 ) ( r + r ) 1 2 . THEOREM 3 ′ Let the k-dimensional r.r . j = 1. 1/F is distributed as Fr . so that T = X/√Y/r is r distributed as tr. hk(x))′ be a (measurable) transformation deﬁned on k into k. r of T (the image of S under h). see Fig. . such that jr= 1 Tj = T and that h deﬁned on each one of Sj onto Tj. clearly. vector. vector X have continuous p. which need not be distinct or disjoint. The probabilities P(F ≤ f) for selected values of f and r1. Y are independent. so that Y = h(X) is a k-dimensional r. .8.232 9 Transformations of Random Variables and Random Vectors fF (r r ) f ( f) = ( ( Γ ( r )Γ ( r )2 r1 2 1 2 1 2 1 1 2 2 r1 2 −1 1 2 r1 + r2 ) )( t ⎞ (1 2 )( r + r )−1 ⎛ r1 ⎜ f + 1⎟ ) ⎝ r2 ⎠ 2 1 2 1 2 − 1 2 r1 + r2 +1 ( )( ) ⎛r ⎞ × 2 ⎜ 1 f + 1⎟ ⎝ r2 ⎠ = Therefore Γ 1 2 1 2 −1 ∫ 1 ∞ (1 2 )( r + r )−1 r1 2 0 e − t dt f 1 [ (r + r )](r r ) Γ ( r )Γ( r ) 1 2 1 1 2 2 2 ⋅ ( r 2 )−1 1 [1 + (r r ) f ] 2 (1 2 )( r + r ) 1 2 . 4 0 Figure 9. Suppose that there is a partition {Sj. . since X 2 is χ 2.f.d. then. 1). . (For the graph of fF. . 1 1 2 2 1 We consider the multidimensional version of Theorem 3. for for f >0 f ≤ 0. . . j = 1. and let y = h(x) = (h1(x). r2 are given by tables (the F-tables). ii) If X is N(0. . . . Y is χ 2 and X. . the n T 2 is distributed as F1.r . r} of S and subsets Tj.

r exist and for each j. . .4 Let X1. .v. x 2 + x2 ≤ 1⎬.f. 9.2 Let X1. . f Y y = ⎨∑ j j =1 ⎪0.2. . δj(y) = 1 if y ∈T and δj(y) = 0 otherwise. . and the Jacobians Jj which are functions of y are deﬁned by g j11 g j21 M g jk1 g j12 g j22 M g jk 2 ⋅ ⋅ ⋅ g j1k ⋅ ⋅ ⋅ g j2 k M M ⋅ ⋅ ⋅ g jkk −1 j Jj = .’s taking on the values 1. X1 + X2 > 0). .d. and are assumed to be ≠ 0 on Tj. j = 1. gjk(y))′ be its inverse. l = 1. . Exercises 9. . . 6.’s with joint p. .f. .v. . j = 1.d. x = 1. X2 be independent r. 6 with 1 probability f(x) = –.d. i. Then the p.2.) 9. . · · · . i. j = 1. .3 Let X1. X2 be independent r. . . . f given by f x1 . . r.’s X1 + X2 and X1 − X2. . of the r.d. y ∈Tj. . Derive the distribution of the r. x 2 = ( ) 1 I A x1 .’s distributed as Negative Exponential with parameter λ = 1.2. j = 1. of the r.v. . . r. . . Z2.9. r. . Then: . l = 1. . fY of Y is given by ⎧ r ⎪ δ y fY y . k are continuous. Assume that j the partial derivatives gjil(y) = (∂/∂yl)gji(y1. . . k. . 1 ⎭ Set Z2 = X 2 + X 2 and derive the p. ⎩ () j () () j y ∈T otherwise. π ( ) where ′ ⎧ A = ⎨ x1 . . where for j = 1. . X1 + X2.v. X2 be r. r is one-to-one.1 Let X1. In the next chapter (Chapter 10) on order statistics we will have the opportunity of applying Theorem 3′. .2. . x 2 . Then: i) Find the p. r.’s distributed as N(0. x2 ∈ ⎩ ( ) 2 ⎫ 2 .1 The Univariate Case Exercises 233 j = 1. fY (y) = fX[h (y)]|Jj|. . .v. 6 9.f. 1). X2 be independent r.v. . . ii) Calculate the probability P(X1 − X2 < 0. (Hint: Use polar co1 2 ordinates. . gjil. yk). . . Let hj be the restriction of the transformation h to Sj and let h −1(y) = (gj1(y).v. . .f. .

page 50) which states that.d.r distribution.f. () () ii) For r = 1.v.2.5 Let X1. X is distributed as Fr .v.v.r . Then show i) The r.v. Y = is distributed as Beta.v.2.v. 9. use Stirling’s formula (see.f. X is distributed as χ 2 +r (as anticipated). of Y = is Beta. X 2 + X 2 has the Negative Exponential distribution with parameter 1 2 λ = 1/2σ2.f.) 9.f.v. r iii) The r. show that the p.’s of the r.234 9 Transformations of Random Variables and Random Vectors ii) Derive the p.’s X1. X2 be independent r.2. Then: i) Derive the p.d. respectively. 1+ X2 r ( ) 9.d.f. Vol.’s are independent or not.7 Let X be an r. r ( ) 1 (Hint: For part (iv). show that its median is equal to 1. 1 + r1 r2 X iv) The p. 9.2. then: 1 2 i) Find its expectation and variance.v..6 Let the independent r. Y = X1/X2.2.d.’s X 2 + X 2 and X1/X2 are independent. Γ(n)/(2π)1/2n(2n−1)/2e−n tends to 1. ii) Show that X1 + X2 and X1/X2 are independent. X2 be independent r. X1 − X 2 .d.’s of the following r. 1 2 . W.’s X and Y are independent.’s distributed as χ 2 and χ 2 . x2 ( ) Determine the distribution of the r.’s distributed as U(α.v. 1 ii) Also show that the r. r r and set X = X1 + X2. 1968.2.v. and X1 X 2 . 9.∞ x .d. 3rd ed.v.. σ2). X = X1/X2. for example.8 If the r. f given by f x = 1 I 1. Y is distributed as –Z.v.v.d.’s: X1 + X 2 . α + 1). as n → ∞. distributed as tr.v. X1/X2 has the Cauchy distribution with μ = 0 and σ = 1.v. r 1 2 ii) The r. 2 1 2 r1 9. of X becomes a Cauchy p.v. X2 be independent r.f.’s X1 + X2 and X1 − X2.9 Let X1. 1 iii) The p. Then show that: 1 2 i) The r. ii) Determine whether these r. X2 have p. where Z has the Fr . of r1 X converges to that of χ 2 as r2 → ∞. I.v. ii) The r.10 that: Let X1. iii) The r. Feller’s book An Introduction to Probability Theory. ii) If r1 = r2.f.’s distributed as N(0.

respectively. .’s distributed as χ 2 and Fr . . we can uniquely solve for the x’s in (1) and get xi = ∑ dij y j . . Then show that: EX r = 0.v. furthermore. If. and let Δ = |C| be the determinant of C. σ 2 X r1 . Then. 9. If Δ ≠ 0. (2) Let D = (dij) and Δ* = |D|. . i. . dij j =1 k real constants. the linear transformations.r2 = 2r 2 r1 + r2 − 2 r2 2 .d.r . distributed as tr. ⎝ 2⎠ 2π 1 (Hint: Use Stirling’s formula given as a hint in Exercise 9. C = (cij). ⋅ ⋅ ⋅ .1 Preliminaries A transformation h: k → k which transforms the variables x1.r2 = .α r1 1 as r2 → ∞. i.v.r be r.9.1 of Univariate Case 235 9.v. ⋅ ⋅ ⋅ . c2j.f. . Then show that: EX r1 . .α be deﬁned by: P(Xr ≥ χ 2 .2.14 Let Xr and Xr . cij j =1 k real constants.α) = α.α) r r = α. . k (1) is called a linear transformation. j = 1. . . r −2 Let Xr . We ﬁrst introduce some needed notation and terminology.3 Linear Transformations of Random Vectors In this section we will restrict ourselves to a special and important class of transformations.r .11 Let Xr be an r. r for α ∈ (0.8(iv). j = 1. j = 1.3 Linear TransformationsTheRandom Vectors 9. r ≥ 3. r2 ≥ 3. σ 2 X r = 9. Then show that: 1 1 1 1 2 1 1 2 1 1 1 2 1 2 Fr . Let C be the k × k matrix whose elements are cij. k.r ≥ Fr .r . x ∈ . Then show that: fr x ⎯ → ⎯ r →∞ () 2 ⎛ x2 ⎞ exp⎜ − ⎟ . .2. 2 r2 − 2 r1 r2 − 2 r2 − 4 ( ) ( ( )( ) ) 9. . yk in the following manner: yi = ∑ cij xj . xk to the variables y1. k are orthogonal.α → 1 2 χ r . 1). .2. That is. let χ 2 .12 1 2 1 2 ( ) r . and let fr be its p.v. distributed as tr. 9. . r2 ≥ 5. P(Xr .r be an r.α and Fr . . that is .r 1 2 .2.3. ckj)′.r . as is known from linear algebra (see also Appendix 1).) 9.13 Let Xr be an r. the linear transformation above is such that the column vectors (c1j.2. Δ* = 1/Δ. distributed as Fr . r ≥ 2. 2. . and.

. . (y . vector X = (X1. ⋅ ⋅ ⋅ . . k. ⋅ ⋅ ⋅ . . vector Y = (Y1. Xk)′ with p. . for an orthogonal transformation. ⎩0. . The orthogonality relations (3) are equivalent to orthogonality of the row vectors (ci1. i = 1. k. so that xi = ∑ c ji y j . . the Jacobian of the transformation (1) is J = Δ* = 1/Δ. That is. Also in the case of an orthogonal transformation. Yk)′ is given by fY y1 . of the r. j =1 k where we assume Δ = |(cij)| ≠ 0. ⋅ ⋅ ⋅ . cik)′ i = 1. . we have J = ±1. . These results are now applied as follows: Consider the r. . . i = 1. then j =1 k xi = ∑ c ji y j .236 9 Transformations of Random Variables and Random Vectors ⎫ j ≠ j ′⎪ ⎪ i =1 and (3) ⎬ k 2 cij = 1. ⋅ ⋅ ⋅ . j = 1. y ) 1 k ′ ∈T where T is the image of S under the transformation in question. j = 1. ⎪ ⎪ j =1 ⎭ ∑ cij ci′ j = 0 k (4) It is known from linear algebra that |Δ| = 1 for an orthogonal transformation. and for the case that the transformation is orthogonal. .f. i = 1.d. i. ∑ dkj y j ⎟ ⋅ . if the transformation is orthogonal. yk ( ) k ⎧ ⎛ k ⎞ 1 ⎪ fX ⎜ ∑ d1 j y j . fX and let S be the subset of k over which fX > 0. k. . ⎪ ∑ ⎪ i =1 ⎭ then the linear transformation is called orthogonal. ⋅ ⋅ ⋅ . Set Yi = ∑ cij X j . if yi = ∑ cij x j . j =1 k This is seen as follows: ∑c j =1 k ji k k ⎛ k ⎞ ⎛ k ⎞ k k y j = ∑ c ji ⎜ ∑ c jl xl ⎟ = ∑ ∑ c ji c jl xl = ∑ xl ⎜ ∑ c ji c jl ⎟ = xi ⎝ l =1 ⎠ j =1 l =1 ⎝ j =1 ⎠ j =1 l =1 by means of (3).f. so that |J| = 1. . k. ⋅ ⋅ ⋅ . . ⋅ ⋅ ⋅ . . . ⋅ ⋅ ⋅ . k. . . . = ⎨ ⎝ j =1 ⎠ Δ j =1 ⎪ otherwise. . j =1 k According to what has been seen so far.d. ∑ cij ci j′ = 0 k for and ⎫ for i ≠ i ′ ⎪ ⎪ j =1 ⎬ k ∑ cij2 = 1. i = 1. we have dij = cji. k. Thus. Then the p. k. In particular.

⎩0. of the r.f. = ⎨ ⎝ j =1 ⎠ Δ j =1 ⎪ otherwise. k. .f. ⋅ ⋅ ⋅ . ⎩ (y . ⋅ ⋅ ⋅ . (y . . If. in the case of orthogonality. vector Y = (Y1. yk = ⎨ ⎝ j =1 ⎠ j =1 ⎪ otherwise. . ⋅ ⋅ ⋅ . . vector X = (X1.d. ∑ c jk y j ⎟ . . where |(cij)| = Δ ≠ 0. We formulate these results as a theorem. k ⎛ k ⎞⎛ k ⎛ k ⎞ ⎞ ∑ Y = ∑ ⎜ ∑ cij X j ⎟ = ∑ ⎜ ∑ cij X j ⎟ ⎜ ∑ cil X l ⎟ ⎠ ⎠ ⎝ l =1 ⎠ i =1 i =1 ⎝ j =1 i =1 ⎝ j =1 k k 2 i k k k k k ⎞ ⎛ k = ∑ ∑ ∑ cij cil X j X l = ∑ ∑ X j X l ⎜ ∑ cij cil ⎟ ⎝ i =1 ⎠ i =1 j =1 l =1 j =1 l =1 2 = ∑X2 i i =1 k because ∑ cij cil = 1 i =1 k for j = l and 0 for j ≠ l. ⋅ ⋅ ⋅ . then k ⎧ ⎛ k ⎞ ′ ⎪ fX ⎜ ∑ c j 1 y j . j =1 k i = 1. y ) 1 k ′ ∈T where T is the image of S under the given transformation. Yk)′ is fY y1 . otherwise. = ⎨ ⎝ j =1 ⎠ j =1 ⎪ 0. i =1 i =1 k k In fact. . the transformation is orthogonal. fX which is > 0 on S ⊆ Set Yi = ∑ cij X j . y ) 1 k ′ ∈T Another consequence of orthogonality of the transformation is that ∑ Y i2 = ∑ X 2i . ⋅ ⋅ ⋅ . y1 . ⎩0. k. ⋅ ⋅ ⋅ .3 Linear TransformationsTheRandom Vectors 9.1 of Univariate Case 237 fY y1 . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . and the p. ⋅ ⋅ ⋅ .9. in particular. Xk)′ with p. THEOREM 4 Consider the r. i = 1. ⋅ ⋅ ⋅ . yk ∈T fY y1 . yk ( ) k ⎧ ⎛ k ⎞ 1 ⎪ fX ⎜ ∑ d1 j y j . ⋅ ⋅ ⋅ . Then X i = ∑ dij Yj . Furthermore. ∑ ckj y j ⎟ ⋅ . j =1 k k . we also have ( ) ( ) .d. yk ( ) k ⎧ ⎛ k ⎞ ⎪ fX ⎜ ∑ c j 1 y j . . . ∑ c jk y j ⎟ .

j j =1 k The following theorem is an application of Theorem 4 to the normal case. . ⋅ ⋅ ⋅ .v. . ⋅ ⋅ ⋅ . THEOREM 5 Let the r. k. k. . j =1 k Then the r. normally distributed with common variance σ2 and means given by E Yi = ∑ cij μ j . . ⋅ ⋅ ⋅ . . xk ( ⎛ 1 ⎞ ⎡ 1 =⎜ ⎟ exp ⎢− 2 ⎝ 2πσ ⎠ ⎣ 2σ k ∑ (x i =1 k i 2⎤ − μ i ⎥. we have fX x1 .’s Xi be N(μi. . ⎥ ⎦ Now 2 2 ⎤ k k ⎡⎛ k ⎛ k ⎞ ⎞ ⎢ ∑ c ji y j + μ 2 − 2 μ i ∑ c ji y j ⎥ c ji y j − μ i ⎟ = ∑ ⎜ ∑⎜∑ i ⎟ ⎥ ⎠ ⎠ i =1 ⎝ j =1 i = 1 ⎢⎝ j = 1 j =1 ⎦ ⎣ k k ⎛ k k ⎞ = ∑ ⎜ ∑ ∑ c ji cli y j yl + μ 2 − 2 μ i ∑ c ji y j ⎟ i ⎠ i =1 ⎝ j =1 l =1 j =1 k = ∑ ∑ y j yl ∑ c ji cli + ∑ μ 2 − 2 ∑ ∑ μ i c ji y j i j =1 l =1 k i =1 i =1 j =1 i =1 k k k k k k = ∑ y 2 − 2 ∑ ∑ c ji μ i y j + ∑ μ 2 j i j =1 j =1 i =1 i =1 k k k and this is equal to k ⎞ ⎛ ∑ ⎜ y j − ∑ c ji μi ⎟ . . . . ⎦ ) and hence fY y1 . . σ2). yk ( ) k ⎡ ⎛ 1 ⎞ 1 exp ⎢− =⎜ ⎟ ⎢ 2σ 2 ⎝ 2πσ ⎠ ⎣ ⎞ ⎛ k ∑ ⎜ ∑ c ji y j − μi ⎟ ⎠ i =1 ⎝ j =1 k 2 ⎤ ⎥. Consider the orthogonal transformation Yi = ∑ cij X j . i = 1. j =1 ( ) ) k PROOF With X = (X1. . ⋅ ⋅ ⋅ .v. .’s Y1. and independent. Yk are also independent. i = 1. Yk)′. i = 1. . Xk)′ and Y = (Y1. . ⎠ j =1 ⎝ i =1 k 2 since expanding this last expression we get: . . k.238 9 Transformations of Random Variables and Random Vectors ∑X j =1 k 2 j = ∑Y 2 . .

and i i−1 ( ) i−1 i i−1 2 1j ( ) = . . . . and for i = 2. and set ⎧ ⎪Y1 = ⎪ ⎪ ⎪Y2 = ⎪ ⎪ ⎨Y = ⎪ 3 ⎪ ⎪ M ⎪Y = ⎪ k ⎪ ⎩ We thus have 1 k 1 Z1 + 1 k Z2 + ⋅ ⋅ ⋅ + 1 2 ⋅1 1 3⋅ 2 Z2 Z2 − 2 3⋅ 2 1 k k −1 Z3 k −1 1 k Zk 2 ⋅1 1 3⋅ 2 1 Z1 − Z1 + k k −1 ( ) Z1 + ⋅ ⋅ ⋅ + ( ) Zk −1 − k k −1 ( ) Zk . j = 1. ⋅ ⋅ ⋅ . we get = ∑ c1 j cij = j =1 k 1 k ∑ cij = j =1 k 1 k ∑ cij = j =1 i ⎛ i −1 1 ⎜ i −1 − ⎜ i i −1 k⎝ i i −1 ( ) ( ) ⎞ ⎟ = 0. for j = 1. ⎟ ⎠ . . . . k i 2 ij i −1 1 ∑c = ∑c = i − 1 ⋅ i i − 1 + i i − 1 j =1 j =1 2 ij ( ) ( ) ( ) ( ) 2 1 i −1 + = 1.3 Linear TransformationsTheRandom Vectors 9. Zk be independent N(0. and for i = 2.9. c1 j = c ij = c ii = − Hence 1 k . k. i − 1. . i i while for i = 2. ⋅ ⋅ ⋅ . Let Z1. k. k 1 . ⋅ ⋅ ⋅ . we consider the following result. 1).1 of Univariate Case 239 k ⎛ 2 k k ⎞ y j + ∑ ∑ c ji c jl μ i μ l − 2 ∑ c ji μ i y j ⎟ ∑⎜ ⎠ i =1 l =1 j =1 ⎝ i =1 k = ∑ y 2 − 2 ∑ ∑ μ i c ji y j + ∑ ∑ μ i μ l ∑ c ji c jl j j =1 k j =1 i =1 k k i =1 l =1 k j =1 k k k k k k = ∑ y 2 − 2 ∑ ∑ μ i c ji y j + ∑ μ 2 . . ▲ As a further application of Theorems 4 and 5. k. ∑c j =1 k k k = 1. i j j =1 j = 1 i =1 i =1 as was to be seen. ⋅ ⋅ ⋅ .

’s distributed as N(μi. and for i > l. 3. . Thus we have the following theorem. we conclude that Z is independent of i=2 i k ¯ ∑i = 1 (Zi − Z )2. by Theorem 4. ∑c c For i < l.240 9 Transformations of Random Variables and Random Vectors and for i. X2 + 1 3 X3 . Xk be independent r. σ 2). . . let Xi be independent r. Hence X and S are independent. [(l − 1) − (l − 1)] = 0. Yk are independent. THEOREM 6 ¯ Let X1. k(i ≠ l). σ2). 2. Then the Z’s are as above. . . . . . k. that Y1. . . N(0.υ. . Y2 = − 1 6 X1 + 1 6 1 3 X1 − 2 6 1 3 X3 . we have ∑c c and j =1 k ij lj = ∑ cij clj j =1 i if i < l . .3. . j = 1.’s distributed as N(μ. It follows. and hence Z= 1 X −μ σ ( ) and 2 ∑ (Z j =1 k j −Z ) 2 = 1 σ2 ∑ (X j =1 k j −X ) 2 are independent. by Theorem 5. ( ¯ Since Y1 is independent of ∑k Y 2. Y3 = X2 + .1 set: For i = 1.v. and Y1 = − 1 2 X1 + 1 2 X 2 . . . Then X and S 2 are independent. l = 2. this is 1 i i−1 l l −1 ( )( ) Thus the transformation is orthogonal. ( )( ) [(i − 1) − (i − 1)] = 0. ▲ ¯ Exercises 9. PROOF Set Zj = (Xj − μ)/σ. this is 1 i i−1 l l −1 j =1 l ij lj if i > l . . 1). and that ∑ Y i2 = ∑ Z 2i i =1 i =1 k k Thus ∑ Y i2 = ∑ Y i2 − Y 12 = ∑ Z 2i − i=2 i =1 k i =1 k 2 i 2 i =1 i =1 k k k ( kZ ) ) 2 2 = ∑ Z − kZ = ∑ Zi − Z .

for σ 2 = σ 2 = σ2. 1.9. σ 2. 0. Y3 are also independent normally distributed with variance σ2.υ. X − Y)′ N(0.3. μ2. 2(1 − ρ)). ρ0). σ 2 1 . σ 2. σ 2. then: i) Determine the distribution of the r. cμ1 − dμ2.v.’s are independent.’s Y1. ρ).v. Y) N(μ1.v. c2. 9. σ 2. ii) In particular. Y) has the Bivariate Normal distribution with parameters μ1.V= Y − μ2 σ2 . ρ0). and specify their respective means. c2σ 2. then show that 1 2 1 2 X −μ Y −μ ′ N(0. 1. 1 2 N(cμ1 + dμ2. 0. X − Y.1 The Univariate Case Exercises 241 Then: i) Show that the r. 9. τ 2 = σ 1 + σ 2 − 2 ρσ 1σ 2 .’s 1 2 X + Y. and let c. specify the distributions of the r. where 1 2 2 2 2 2 2 τ 1 = σ 1 + σ 2 + 2 ρσ 1σ 2 .3. 1. say.v. and ρ0 = σ 1 − σ 2 τ 1τ 2 . τ 2 = c 2σ 1 + d 2σ 2 − 2 ρcdσ 1σ 2 . dμ2.2 If the pair of r. μ2. d2.3. (Hint: Verify that the transformation is orthogonal. U − V. μ2. τ 2). ρ). 0. 2(1 − ρ)). σ 2. and X + Y. Y)′ N(μ1. and show that r. ±ρ).3 If (X. (Compare with the latter part of Exercise 9. .’s X + Y and X − Y are independent if and only if σ1 = σ2. ρ0). ii) If μ1 = μ2 = μ3 = 0.6 Let (X.3. 0. τ 2. Y)′ Then: i) (cX. where 1 2 ii) (cX + dY. 9. ρ). 0.3.7 Let (X. dY) N(0. Y)′ iii) (X + Y. ρ). 1 2 N(cμ1. σ 2. c d iii) The r. 9. (X. τ 2) and X − Y 1 N(0. σ 2. τ 2. σ 2. Y2. with +ρ if cd > 0.v.v. and ρ0 = −ρ if cd < 0. 2 2 and ρ0 = 2 c 2σ 1 − d 2σ 2 2 τ1τ 2 . and then use Theorem 5).5. σ 2.3.v.’s cX + dY and cX − dY are independent if and only if =±σ . show that X + Y N(0. d2σ 2. 2 iii) The r. 0. σ 2. ρ). cX − dY)′ 2 2 2 2 τ 1 = c 2σ 1 + d 2σ 2 + 2 ρcdσ 1σ 2 .) 9.’s U + V. 1 2 3 3 9. 1. and c. dY)′ N(μ1. 1. then show that (cX. X − Y . X − Y are independent. and U = 1 2 X − μ1 σ1 ( 1 2 σ1 σ2 ) N(0. and show that these r.’s are independent. ρ). ρ. Y)′ N(0. Y)′ N(0. use a conclusion in Theorem 4 in order to show that Y 2 + Y 2 + Y 2 σ 2χ 2. d be constants with cd ≠ 0. τ 2.5 If (X. and −ρ if cd < 0.3. where ρ0 = ρ if cd > 0.4 If (X. and vice versa.’s (X. Then: 1 2 N(0. d are constants with cd ≠ 0. τ 2. ρ). that is. 1. 2 2 2 ( ) iii) X + Y N(0. μ2.

ρ).7 and: i) Provide an expression for the probability P(cX + dY > λ). b) such that F(x0) = y and F(xn) = y + – .f.v. (in the absolutely continuous case). 3rd edition (1966). .. . Next. Then: 1 2 ¯ ¯ i) Determine the distribution of the r. Set X = F−1(Y). We will show that G(y) = y.242 9 Transformations of Random Variables and Random Vectors iv) The r. by George B. σ2 = 0. b] and all y of the form y + – (n ≥ 2 integer) lie in (c. n Then PROOF . with d. Theorem 3(ii) on page 95 in Calculus and Analytic Geometry.v. Vol. G(1) = 1. F. 49 (1976) No. Reading.v. and deﬁne the r. 1). Inc. F. . let (Xj. Thomas. 1).’s in part (iii) are distributed as N(cμ1 + dμ2. there exists ε > 0 such that y + ε < 1 and F(y) < F(y + ε)(≤ 1). respectively. in several instances a statement has been made to the effect that X is an r. we derive two main results.f. μ2 = 1. 2 9. Then the distribution of Y is U(0.9 For j = 1.d. The proof presented is an adaptation of the discussion in the Note “The Probability Integral Transformation: A Simple Proof” by E. if X is any r. and if Y = F(X). μ2. μ1 = 3. 0 < y < 1.8 Refer to Exercise 9. pages 242–243.f. d = 3. τ 2). for example. by the n Intermediate Value Theorem (see. and F(b) = d. say? 1 2 9.3. can actually be constructed. ii) What does this distribution become for μ1 = μ2 and σ 2 = σ 2 = σ2. 5. 1 τ 2).3. . and N(cμ1 − dμ2. Set F(a) = c. F.v.f. Schuster. vectors with distribution N(μ1. G(0) = 0. According to the ﬁrst result. F.3. let y ∈ (0. ii) Give the numerical value of the probability in part (i) for c = 2. Then X F.4 The Probability Integral Transform In this short section. Indeed. published in Mathematics Magazine. Then the function F is continuous in the closed interval ε [a. σ 2. Therefore. 9. there exists a such that (0 ≤)F(a) < y. λ = 15. Yj)′ be independent r.f. with continuous d. y + ε = b. 1). The second result resolves this question as follows: Let F be any d.v. Since F(x) → 0 as x → −∞. The name of this section is derived from the transformation Y = F(X). and ρ = −0. then surprisingly enough Y U(0. THEOREM 7 Let X be an r. Y by Y = F(X).9. and since F(x) → 1 as x → ∞. with continuous d. d). of Y.v. σ 2.5.f.5. σ1 = 1. The question then arises as to whether such an r. X − Y . 1). Let G be the d. Addison-Wesley Publishing Company.v. since F is represented by the integral of a p.5. n. Massachusetts) ε there exist x0 and xn (n ≥ 2) in (a. and let Y U(0.

f. n ( 0 n P X ≤ x0 ≤ P F X ≤ y ≤ P X ≤ x n . Next.11. with d.9 x 0 A x Figure 9. (See also Figs. being a d.f.11 . for each x ∈ .) ) [ ( ) ] ( ) ε or y = F ( x ) ≤ G( y) ≤ F ( x ) = y + . The proof is completed. G(1−) = lim G(y) = lim y = 1. From this deﬁnition it is then clear that when F is strictly increasing. Hence Letting n → ∞. ⎛ ⎜ since F x n = y + ⎝ ( ) ε⎞ ⎟ n⎠ That is (X ≤ x0) ⊆ [F(X) ≤ y] ⊆ (X ≤ xn). we obtain G(y) = y. G(0) = lim G(y) = lim y = 0. 9. G is right-continuous. as y ↓ 0. as y ↑ 1. so that G(1) = 1.v. Thus. there is exactly one y ∈ (0. Set y = F(x) and deﬁne F−1 as follows: (5) F −1 ( y) = inf {x ∈ .10 B 0 x Figure 9. we need some notation and a preliminary result. F. To this end.9. 1 F 1 1 y F y y F 0 Figure 9. F ( x) = y}. Finally. 1) such that F(x) = y.1 The Univariate Case 243 (X ≤ x ) ⊆ [F (X ) ≤ F (x )] (since F is nondecreasing) = [ F ( X ) ≤ y] (since F (x ) = y) 0 0 0 ⎡ ε⎤ ⊆ ⎢F X < y + ⎥ n⎦ ⎣ ( ) = F X < F xn ⊆ X < xn [ ( ) ( )] ( ) (by the fact that F is nondecreasing and by contradiction).10 and 9. then the above deﬁnition becomes as follows: (6) F −1 ( y) = inf {x ∈ . It is also clear that. 9. ▲ For the formulation and proof of the second result.4 The Probability Integral Transform 9. let X be an r.9. if F is continuous. F ( x) ≥ y}.

F(x) ≥ y}. Then the d. Fj.v. X by X = F −1(Y). . where F−1 is deﬁned by (5). PROOF We have P X ≤ x = P F −1 Y ≤ x = P Y ≤ F x = F x . is a given d. THEOREM 8 [ ( )] (7) Let Y be an r. since F is nondecreasing. We have F−1(y) = inf {x ∈ .v. of X is F.’s such that Xj has continuous and strictly increasing d. The proof of the lemma is completed. by the right continuity of F. REMARK 7 Exercise 9. LEMMA 1 Let F −1 be deﬁned by (5).f.v. F(x) ≥ y} and hence F−1(y) ≤ t. 1) and the one before it by Lemma 1. 1). that y ≤ F(t). we may now establish the following result.1 Let Xj. X = −2 ∑ log 1 − Yj j =1 n ( ) is distributed as χ 2 . . Then F−1(y) ≤ t if and only if y ≤ F(t). Therefore there exists xn ∈ {x ∈ .f.f. F(xn) ≥ y} such that xn ↓ F−1(y). This means that t belongs to the set {x ∈ . Set Yj = Fj(Xj) and show that the r. ▲ As has already been stated. 2n . Hence F(xn) → F[F −1(y)]. . and let F be a d. . Deﬁne the r.244 9 Transformations of Random Variables and Random Vectors We now establish the result to be employed. F. Then F[F (y)] ≤ F(t). ( ) [ ( ) ] [ ( )] () where the last step follows from the fact that Y is distributed as U(0. ▲ By means of the above lemma.f. −1 −1 Now assume that F (y) ≤ t. Next assume. j = 1. distributed as U(0.v.v. the theorem just proved provides a speciﬁc way in which one can construct an r. and PROOF F F −1 y ≥ y. Combining this result with (7). we obtain y ≤ F(t). X whose d. n be independent r.f.4.

.d. Xn(s). X2. r. Xn is denoted by X(j). . ⋅ ⋅ ⋅ . .’s with d. j = 1.d. but it is easily seen.’s with p.1 Order Statistics and Related Distributions Let X1. X2(s).f. ⎩ ( ) ( ) ( ) a < y1 < y2 < ⋅ ⋅ ⋅ < yn < b otherwise. . 10. j = 1. Xn be i.d. We assume now that the X’s are of the continuous type with p. F. Chapter 9. The jth order statistic of X1. .f.d. the Y’s are not independent. In the ﬁrst place. for each s ∈S. . or Yj. Xn(s). f which is positive for a < x < b and 0 otherwise. . . and is deﬁned as follows: Yj = jth smallest of the X 1 . r. look at X1(s). ⎪ g y1 .i. . X 2 . . By means of Theorem 3′. . X n . to be valid in the general case as well. f such that f(x) > 0. . of the Y’s. and. One of the problems we are concerned with is that of ﬁnding the joint p. then the joint p. . n). PROOF The proof is carried out explicitly for n = 3. Xn are i. . ⋅ ⋅ ⋅ . The results obtained here will be used in the second part of this book for statistical inference purposes. (that is. of the order statistics Y1. . 245 . .f. for easier writing. in general. ⋅ ⋅ ⋅ . with the proper change in notation.v. . . .d. X2(s). . yn = ⎨ ⎪0.Chapter 10 Order Statistics and Related Theorems In this chapter we introduce the concept of order statistics and also derive various distributions. since for i ≠ j. 2. . .v. .f.d.i. Yn is given by: ⎧n! f y1 ⋅ ⋅ ⋅ f yn . . It follows that Y1 ≤ Y2 ≤ · · · ≤ Yn. . .f. (−∞ ≤)a < x < b(≤ ∞) and zero otherwise. n. it will be established that: THEOREM 1 If X1. and then Yj(s) is deﬁned to be the jth smallest among the numbers X1(s). . X2. .

i ≠ j ≠ k. j . y3 = x1 y1 = x3 . y2 = x 2 . a < x i < b. we may assume that the joint p. x3 = ⎨ ⎪0. 2. a < x1 ≠ x 2 ≠ x3 < b ⎪ f x1 . ⎩ ( ) ( )( )( ) Thus f(x1. x3 = y3 S132 : x1 = y1 . ⎭ Let Sijk ⊂ S be deﬁned by ′ ⎧ ⎫ S ijk = ⎨ x1 . y3 = x3 y2 = x3 . x3 are equal. x2. x 2 = y2 . x 2 = y2 . y3 = x 2 y3 = x3 y3 = x 2 S 213 : y1 = x 2 . f(·. x 2 . x3 = y3 S 231 : x1 = y3 . x3) is positive on the set S. ·. x2.d. x3 = y2 S 213 : x1 = y2 . ⎩ ⎭ ( ) Then we have S = S123 + S132 + S 213 + S 231 + S312 + S321 .f. y2 = x 2 . we have then: S123 : x1 = y1 . x3 = y1 . j j and therefore P(Xi = Xj = Xk) = 0 for i ≠ j ≠ k. y3 = x1 . Solving for the x’s.. x 2 . ·). x3 . y2 = x3 . x 2 = y3 . x 2 = y3 . The Jacobians are thus given by: . y1 = x1 . X3 is zero if at least two of the arguments x1. y2 = x1 . x3 all different ⎬. x 2 . x 2 . 3. x3 ∈ ⎩ ( ) 3 ⎫ . Now on each one of the Sijk’s there exists a one-to-one transformation from the x’s to the y’s deﬁned as follows: S123 : S132 : y1 = x1 . y2 = x1 . of X1. S312 : y1 = x3 . x 2 = y1 . x3 = y2 S312 : x1 = y2 . 3. x1 . i. x 2 = y1 . x3 = y1 S321 : x1 = y3 .246 10 Order Statistics and Related Theorems P X i = X j = ∫∫ ( xi = x j ) f xi f x j dxi dx j = ∫ a ( ) ( )( ) b x ∫ x f ( xi ) f ( x j )dxi dx j = 0. Thus we have ⎧ f x1 f x 2 f x3 . otherwise. i = 1. k = 1. S321 : S 231 : y1 = x 2 . where ′ ⎧ S = ⎨ x1 . X2. 2. a < x i < x j < x k < b⎬.

y3 = ⎨ ⎪ a < y1 < y2 < y3 < b ⎪ ⎩0.1 Order Statistics and Related Distributions 247 1 0 0 S123 : J 123 = 0 1 0 = 1 0 0 1 1 0 0 S132 : J 132 = 0 0 1 = −1. ( ) ( )( )( ) ( )( )( ) ( )( )( ) ( )( )( ) ( )( )( ) ( )( )( ) This is. ▲ ⎩ ( ) ( )( )( ) Notice that the proof in the general case is exactly the same. Then the joint p. and Theorem 3′. . .’s distributed as N(μ. y2 . r. . g y1 . EXAMPLE 2 Let X1.d. . .’s distributed as U(α. β). Then the joint p. 0 1 0 0 1 0 S213 : J 213 = 1 0 0 = −1 0 0 1 0 0 1 S231 : J 231 = 1 0 0 = 1 0 1 0 0 1 0 S312 : J 312 = 0 0 1 = 1 1 0 0 0 0 1 S321 : J 321 = 0 1 0 = −1.υ.d.d. . of the order statistics Y1. otherwise. ⎧3! f y1 f y2 f y3 . . ⋅ ⋅ ⋅ .f.d. Yn is given by ⎛ 1 ⎞ ⎡ 1 g y1 . . . From the deﬁnition of a determinant and the fact that each row and column contains exactly one 1 and the rest all 0. One has n! regions forming S. EXAMPLE 1 Let X1. . σ 2 ). y2 . Yn is given by g y1 .υ. Xn be i.i. . ⎥ ⎦ ) if −∞ < y1 < · · · < yn < ∞ and zero otherwise. r. Chapter 9. ⋅ ⋅ ⋅ . 1 0 0 Hence |J123| = · · · = |J321| = 1. yn = ( ) n! (β − α ) n . .f. one for each permutation of the integers 1 through n. . of the order statistics Y1. y3 = ⎨ ⎪0.i. . yn = n!⎜ ⎟ exp ⎢− 2 ⎝ 2πσ ⎠ ⎢ 2σ ⎣ ( ) n ∑ (y j =1 n j 2⎤ − μ ⎥. a < y1 < y2 < y3 < b ⎪ g y1 . . gives ⎧ f y1 f y2 f y3 + f y1 f y3 f y2 + f y2 f y1 f y3 ⎪ ⎪+ f y3 f y1 f y2 + f y2 f y3 f y1 + f y3 f y2 f y1 . otherwise. it follows that the n! Jacobians are either 1 or −1 and the remaining part of the proof is identical to the one just given except one adds up n! like terms instead of 3!.10. . . Xn be i.

Hence if u = F(y). gij of any Yi. j = 1.d.’s with d.d. . of any number of the Yj’s. ⎩ ( ) ( )[ ( ) ( )] n −2 f y1 f yn . . . Then the p. Since f is positive in (a. yn) = n!f(y1) · · · f(yn) for a < y1 < · · · < yn < b and equals 0 otherwise. . j = 1. ⎧ ⎪n n − 1 F yn − F y1 ii′) g1n y1 . yn = ⎨ ′ ⎪0. THEOREM 2 Let X1. . .f. From Theorem 1. In particular. gj of Yj. . b). . 1) and PROOF dy 1 = . 1 . . is given by: i −1 ⎧ n! F yi F y j − F yi ⎪ ⎪ i−1! j −i−1! n− j ! ii) g ij yi .d.d. ( ) . n. ( )( ) a < y1 < yn < b otherwise. b) and therefore F −1 exists in this interval. . we have that g(y1. otherwise. b). u ∈(0.f. Yj with 1 ≤ i < j ≤ n. [ ( )] f ( y ). . ⎧ ⎪n 1 − F y1 i′) g1 y1 = ⎨ ′ ⎪0. it follows that F is strictly increasing in (a. −1 du f F u [ ( )] u ∈ 0.f. . of each Yj. ⎩0. 2. .f. . The joint p. ( ) ( )( )[ ] [ j −1 n− j j j a < yj < b ( ) [ ( )] f ( y ). as well as the joint p. then y = F −1 (u).v. and let Y1. . ⎩ ( ) 1 − F ( y )] f ( y ).248 10 Order Statistics and Related Theorems if α < y1 < · · · < yn < β and zero otherwise.d.f.d. . y ∈(a. n−1 1 n−1 a < y1 < b otherwise and i″) g n yn ″ ( ) ⎧ ⎪ n F yn =⎨ ⎪ ⎩0. Yn be the order statistics. ( ) ( )( [ ( )] ( ) ( ) ( )] )( )[ ] [ ( )( ) j − i −1 In particular. n a < yn < b otherwise. As a partial answer to this problem. . y j = ⎨ n− j ⋅ f yi f y j . f which is positive and continuous for (−∞ ≤) a < x < b(≤ ∞) and zero otherwise. . .i. r.f. Xn be i. is given by: ⎧ n! F yj ⎪ i) g j y j = ⎨ j − 1 ! n − j ! ⎪ ⎩0. n. F and p. we have the following theorem. . Another interesting problem is that of ﬁnding the marginal p. a < yi < y j < b ⎪× 1 − F y j ⎪ otherwise.

f. uj+1 yield [1/(n − j)!] (1 − uj)n−j and the last j − 1 integrations with respect to the variables u1. 1 − G1 y1 = P Y1 > y1 = P all X j ’s > y1 = 1 − F y1 ( ) ( ) ( ) [ ( )] . ( ) ( ) ( ) ( ) Thus gn(yn) = n[F(yn)]n−1 f(yn). using once again the transformation Uj = F(Yj). h of the U’s is given by h u1 . n 1 The proof of (ii) is similar to that of (i). h u j = n! ∫ ⋅ ⋅ ⋅∫ 0 ( ) uj 0 ∫ 1 uj ⋅ ⋅ ⋅∫ 1 u n− 1 dun ⋅ ⋅ ⋅ du j +1 du1 ⋅ ⋅ ⋅ du j−1 . Similarly.1 Order Statistics and Related Distributions 249 Therefore by setting Uj = F(Yj). ⋅ ⋅ ⋅ . . . b) and 0 otherwise. Thus h uj = ( ) ( n! u jj−1 1 − u j j −1! n− j ! )( ) ( ) n− j for uj ∈(0. one has that the joint p. of any number of Yj’s (see also Exercise 10. ▲ EXAMPLE 3 Refer to Example 2. . 1) and equals 0 otherwise. . The ﬁrst n − j integrations with respect to the variables un. un = n! f F −1 u1 ⋅ ⋅ ⋅ f F −1 un ( ) [ ( )] u2 [ ( )] f [F (u )] ⋅ 1⋅ f [F (u )] ⋅ −1 −1 1 n for 0 < u1 < · · · < un < 1 and equals 0 otherwise. () . Hence for uj ∈(0. . .d. . (i′) and (i″) follow from (i) by setting j = 1 and j = n.d. This completes the proof of (i). F x =⎨ ⎪β − α ⎪1. . . . we obtain g yj = ( ) ( ) 1 − F ( y )] f ( y ) ( )( )[ ] [ n! F yj j −1! n− j ! j −1 n− j j j for yj ∈ (a.10.19). Finally. that is. . n Then − g1 y1 = n 1 − F y1 ( ) [ ( )] [− f ( y )].f. j = 1. h(u1. 1). un) = n! for 0 < u1 < · · · < un < 1 and equals 0 otherwise. . n. Then ⎧0 . n−1 1 or g1 y1 = n 1 − F y1 ( ) [ ( )] f ( y ). . respectively.1. uj−1 yield [1/(j − 1)!] ujj−1. ⎩ x ≤α a<x<β x ≥ β. Of course. and in fact the same method can be used to ﬁnd the joint p. . . ⎪ ⎪x −α . An alternative and easier way of establishing (i′) and (i″) is the following: Gn yn = P Yn ≤ yn = P all X j ’s ≤ yn = F n yn . .

⎩ ⎛y j −α⎞ ⎜ ⎟ !⎝ β − α ⎠ j −1 gj yj ( ) ( ( )( )( ) ⎛β − yj⎞ ⎜ ⎟ ⎝ β −α ⎠ j −1 n−j 1 . 0 < yj < 1 otherwise. a<yj <β otherwise. ⎩ ( ) ( ) ( ) α < y1 < β otherwise. In particular. ( ) ( ) ( ) α < yn < β otherwise. ( ) n− 2 1 ( β −α ) 2 = n n−1 ( ( β −α ) ) n (y − y1 ) n− 2 . which is the density of a Beta distribution with parameters α = j. ⎩ ( ) ( () ( ) ) ( ) n− j . β = n − j + 1. ⎩ ⎧ ⎛ y − α ⎞ n−1 1 n−1 n ⎪n⎜ n = yn − α . this becomes ⎧ Γ n+1 ⎪ y jj−1 1 − y j ⎪ g j yj = ⎨Γ j Γ n − j + 1 ⎪ ⎪0 . these formulas simplify as follows: ⎧ n! y jj −1 1 − y j ⎪ j −1 ! n− j ! g j yj = ⎨ ⎪ ⎩0. ( ) ( )( ) ( ) n− j . β = 1. 0 < yj < 1 otherwise. Since Γ(m) = (m − 1)!. Thus ⎧ ⎛ β − y ⎞ n−1 1 n−1 n 1 ⎪n β − y1 . a < y1 < yn < β otherwise. ⎩ ⎧ n! ⎪ ⎪ = ⎨ j −1! n− j ⎪ ⎪0. n g1n y1 . ⎪ ⎟ n g n yn = ⎨ ⎝ β − α ⎠ β − α β −α ⎪ ⎪0 . Likewise . yn ( ) ⎧ ⎛ y − y1 ⎞ ⎪n n − 1 ⎜ n ⎟ ⎝ β −α ⎠ ⎪ =⎨ ⎪ ⎪ ⎩0. β −α n−j a<yj <β otherwise )!(β − α ) n (y j −α ) (β − y ) j . for α = 0. = ⎪ n g1 y1 = ⎨ ⎜ β − α ⎟ β − α ⎝ ⎠ β −α ⎪ ⎪0 .250 10 Order Statistics and Related Theorems and therefore ⎧ n! ⎪ ⎪ = ⎨ j −1! n− j ⎪ ⎪0.

v.10. Y = Yn − Y1 is called the (sample) range and is of some interest in statistical applications. y + z ( ) ( ) ) ( )] n −2 = n n−1 F y+ z − F z ( )[ ( ⎧0 < y < b − a f z f y + z . ⎧n n − 1 yn − 2 1 − y . Let now U be χ and independent of the sample range Y. Z.⎨ ⎩a < z < b − y ()( ) and zero otherwise. fY . b− y n −2 a 0 < y< b−a otherwise. We are interested in deriving the distribution of the r. ( ) ( ) ) 0 < y < 1. To this end. ⎩ () ( )∫ [F ( y + z) − F (z)] f (z) f ( y + z)dz. 0 < y1 < 1 otherwise.v. ⎩ ⎧nyn −1 . Therefore ⎧y = z Then ⎨ 1 ⎩ yn = y + z and hence J = 1. ⎩ ( ) ( ) n −1 . z = g1n z. ( ) 0 < yn < 1 otherwise and g1n y1 . ⎩ () ( ) ( 0< y<1 otherwise. The distribution of Y is found as follows. 0 < y1 < yn < 1 otherwise. one obtains ⎧ ⎪ fY y = ⎨n n − 1 ⎪0.v. ⎪ gn yn = ⎨ n ⎪0. if X is an r. 1). yn ( ) ⎧n n − 1 y − y ⎪ n 1 =⎨ ⎪ ⎩0. Integrating with respect to z. ( )( ) n −2 . then fY y = n n − 1 that is () ( )∫ 1− y 0 y n− 2 dz = n n − 1 y n− 2 1 − y . Consider the transformation ⎧ y = yn − y1 ⎨ ⎩z = y1 .1 Order Statistics and Related Distributions 251 ⎧n 1 − y ⎪ 1 g1 y1 = ⎨ ⎪0. Z y. ⎪ fY y = ⎨ ⎪0. distributed as U(0. In particular. we consider the transformation . Set 2 r Z= Y U r . The r.

d. of the r. Xn are i. . also calculate the probability P(Yn ≤ m). . . . it is then clear that the distribution of the Y’s and. the crushing strength of bricks. . Then ⎧u = rw ⎪ ⎨ ⎪y = z w ⎩ and hence J = r w. . are i. Y2/Y1. j = 1. and n = 2.’s and Yj = X(j) is the jth order statistic of the X’s. ( ) ( ) ( ) if 0 < z.f. of Y1. n may represent various physical quantities such as the breaking strength of certain steel bars. W z.f. .1. β). Use Theorem 2(i″) in order to calculate the probability that all Xj’s exceed m.1. Donald B. for example. Integrating out w. Xj.d.3 If the independent r.v. Z is called the Studentized range.f. . . j = 1. 10.’s with d.d. f given by: − ( x −θ ) f x =e I (θ . in particular. 10. β = 1. then: i) Calculate the probability that all X’s are greater than (α + β)/2. and p. . respectively. then the r. 1) and Y is as above. 10. Xn are distributed as U(α.v. Therefore fZ . X3 be independent r.’s X1. .) ( ) ( ) Exercises Throughout these exercises. 0 () ∞ if 0 < z < ∞ and zero otherwise.4 Let Xj.d. derive the p. the weight of certain objects.2 Let X1. .f. n.’s with p. N(0.v. Then: i) Use the p. r.i. the life of certain items such as light bulbs. r.v.v.v.d. derived in Example 3 in order to show that . . are quantities of great interest. F and f. j = 1.v. 144–149. . . pp.’s X1.90. n be i. . X2. . ii) In particular.252 10 Order Statistics and Related Theorems ⎧ y ⎪z = u r ⎨ ⎪ ⎩w = u r . β). . w = fY z w fU rw r w . The r. ∞ ) x1 . etc. .v. Owen’s Handbook of Statistical Tables. From these examples. we get fZ z = ∫ fY z w fU rw r w dw. for α = 0. vacuum tubes.’s distributed as U(α. (See. .1. and let m be the median of F. n be independent r.1 Let Xj.v. 10. Yn as well as Yn − Y1.d.d. w < ∞ and zero otherwise. published by Addison-Wesley.’s Xj.f. . . . . () ( ) Use Theorem 2(i″) in order to determine the constant c = c(θ ) for which P(θ < Y3 < c) = 0.1.i. Its density is given by fZ(z) above and the values of the points zα for which P(Z > zα) = α are given by tables for selected values of α. Now if the r. j = 1.i.

in the present case: g ij yi . 1). i) Refer to the p. and let 1 ≤ i < j ≤ n.f. . use the appropriate Beta p. β).f. j! ) j +1 ∫0 z (1 − z) 1 n− j dz = ( j + 1)!(n − j )! . show that: P a <Y < b = ( ) (β − α ) [(β − α − b)b − β − α − a a n−1 + ( ) ] (β − α ) bn − a n n .d.d. . . σ 2(Yn) from part (i). then: g1n y1 . 0 < yi < y j < 1.f. Xn be independent r. ii) Integrating by parts repeatedly.v. if X1. α < y1 < yn < β . . . gij derived in Theorem 2(ii). and show that. gij derived in Theorem 2(ii). of Y is given by: fY y = () n n−1 (β − α ) n n ( (β − α − y) y n−1 n− 2 .Exercises 253 EY j (β − α ) j + α = n+1 and σ 2 (β − α ) j(n − j + 1) . and EYn. . show that: i ∫0 y ( z − y) z j − i −1 dy = i! j − i − 1 ! j z. (n + 2)! iii) Use parts (i) and (ii) to show that E YY j = i ( ) (n + 1)(n + 2) i j +1 ( ) .5 i) Refer to the p. iii) For a and b with 0 < a < b ≤ β − α. (Y ) = (n + 1) (n + 2) 2 j 2 ii) Derive EY1. 0 < y < β − α. and show that.’s distributed as U(0. σ 2(Y1). ii) Set Y = Yn − Y1.d. y j = ( ) ( )( ( n! yii−1 y j − yi i−1! j −i−1! n− j ! )( ) ( ) (1 − y ) j− i−1 j n− j .6 Let X1. iii) What do the quantities in parts (i) and (ii) become for α = 0 and β = 1? (Hint: In part (i).d. . and show that the p. iv) What do the quantities in parts (i)–(iii) become for α = 0 and β = 1? 10.’s to facilitate the integrations. .1.) 10.1. yn = ( ) n n−1 (β − α ) ) n ( ) n (y n − y1 ) n− 2 . Xn are independent with distribution U(α. .f.

Then show that Y1 has the same distribution with parameter nλ. Yn). yn > 0 . . this set is a triangle in 10. Xn. . 1). j = 2.8 Let the independent r. Xn have the Negative Exponential distribution with parameter λ. . .f. of Y1.d.’s X1. .f. that is. . .’s X1.f. Then F is the Negative Exponential d. v) From part (iv).v. given Y1. . zn ∈ ⎪ ⎩ ( ) n . with parameter λ . .1. ii) The conditional p. 10. .v. .f. of the independent r. . ⎪ ⎭ (For n = 2.’s X1. of Y is given by: fY y = n − 1 λe − λy 1 − e − λy () ( ) ( ) n− 2 . and deﬁne the r. Xn be distributed as U(0. n and 2 ∑z j =1 n j ⎫ ⎪ ≤ 1⎬. .1.’s X1. 10.v. 10. n are uniformly distributed over the set ⎧ ′ ⎪ ⎨ z1 . . . . . Then: i) Use Theorem 2(i′) in order to show that the p. . . . ii) Let Y be the (sample) range. zj ≥ 0.d. . .d. Y = Yn − Y1.f.7 Let the independent r. . 1). of Yj.10 Let the independent r. j = 1.4(i). Yj = ( ) i n− j +1 [( ( i n−i +1 j n− j +1 )( ) )] 1 2 . .f. . iii) The conditional p.’s X1. Xn be distributed as U(0.1.d.9 . Then use the result in Theorem 1 in order to show that the r. and then use Theorem 2 (ii′) in order to show that the p.f. ⋅ ⋅ ⋅ . given Yn. of Yn. j = 1. . and Exercise 1. n as follows: Z1 = Y1. Zj = Yj − Yj−1.254 10 Order Statistics and Related Theorems iv) By means of part (iii) and Exercise 1. and the conditional p. ⋅ ⋅ ⋅ . . n. of Yj.f.d.’s Zj. ii) Let F be the (common) d. . and suppose that their ﬁrst order statistic Y1 has the Negative Exponential distribution with parameter nλ. show that: Cov Yi . Xn have the Negative Exponential distribution with parameter λ.) i) Let the independent r. and let 1 < j < n. Yj = ( ) i n− j +1 ( ( n+1 )( 2 ) n+ 2 ) and ρ Yi . given Yn. Yn) and ρ(Y1.v.’s Zj. . . Use the relevant results in Example 3. derive Cov(Y1. of Yn is given by: g n yn = nλe − λy 1 − e λy n ( ) ( n ) n−1 .d.1.v. . j = 1. . given Y1.6(i) in order to derive: i) The conditional p. .v. y > 0.v.

12 Let the independent r. r.1.f. given Yn. Use Theorem 2(ii) and Exercises 1.1. [ ( ) ] i = 1.i. . . and n = 10.d. and the conditional p. ci ∈ constants. of Yn. . Then: i) For j = 1. . Xn have the Negative Exponential distribution with parameter λ.’s. given Y1. . it follows that Yj = Z1 + · · · + Zj. n. ⎢ ⎥ ⎣ i≠ j ⎦ 10.10(i) in order to determine: i) The conditional p. .v. .’s X1.’s X1. Xn have the Negative Exponential distribution with parameter λ.f.1. of Yj. . 10. i 2 i ii) For 1 ≤ i ≤ j < n.v. ⎝ i =1 ⎠ i =1 ⎝ j = i ⎠ 1≤k < l ≤n j =1 where cj ∈ .v. . .f. . . n. Yj) = ∑k = 1σ k . b = 2/λ. n be i.9(i) and 1. ii) From the deﬁnition of the Zj’s. . . ﬁnd a numerical value for the probability in part (iii). . Zj = Yj − Yj−1. n. given Yn. for c > 0: P ⎡min X i − X j ≥ c ⎤ = exp − λn n − 1 c 2 . dj ∈ constants. n. . given Y1.1. ii) The conditional p. Let Xj. iv) For a = 1/λ. n − j + 1⎠ λ ⎝n n−1 iii) Use part (i) in order to show that. ∑ d j Yj ⎟ = ∑ σ i2 ⎜ ∑ c j d j + 2 ∑ ck dl ⎟ .f. . of Yj. Use this expression and part (i) in order to conclude that: EY j = 1⎛1 1 1 ⎞ +⋅⋅⋅+ ⎜ + ⎟. iii) The conditional p.d.d. iii) From parts (i) and (ii). . .12 and show that: i) σ2(Yj) = ∑ ji = 1 σ i2 . . . and let 1 < j < n. .11 Let the independent r.d. of Y1. conclude that: ⎛ n ⎞ n ⎛ n ⎞ σ ⎜ ∑ c j Yj ⎟ = ∑ σ j2 ⎜ ∑ ci ⎟ . . . Cov(Yi. Then the sample median SM is deﬁned as follows: .d. j = 1. and set: Z1 = Y1. j = 1. show that Zj has the Negative Exponential distribution with parameter (n − j + 1)λ.v. where σ 2 = [λ(n − i + 1)]−2. . 10. j = 2. and that these r. . .’s are independent.13 Refer to Exercise 10. . ⎝ j =1 ⎠ j =1 ⎝ i = j ⎠ 2 2 iv) Also utilize parts (i) and (ii) in order to show that: n ⎛ n ⎞ n ⎛ n ⎞ Cov⎜ ∑ ci Yi .Exercises 255 iii) Calculate the probability P(a < Y < b) for 0 < a < b.

.’s Xj. y = 1. f(x) = 2(1 − x)I(0.. .2 Further Distribution Theory: Probability of Coverage of a Population Quantile It has been shown in Theorem 7. F and let Zj = F(Yj). 1). . of SM when the underlying distribution is: i) U(α.d. 1 10. This result in conjunction with Theorem 1 of the present chapter gives the following theorem. Also. where Yj. where SM is deﬁned by (*).f. . . 1).’s with continuous d. otherwise. . . . r. . Without integration. Find the p. . 10.v. show that the p.1. 6.1. n are i.f. where SM is deﬁned by (*).d. .1. conclude that ESM = μ. Y = F(X) is U(0. and hence their joint p. let the independent r. .1. .2 and derive the p. that if X is an r.υ. . j = 1. .d.d. 10. 10. .’s Xj. Then Z1. ( ) . observe that P(Y1 = y) = P(Y6 = 7 − y). zn = ⎨ ⎩0.19 Carry out the proof of part (ii) of Theorem 2.i.f. n are the order statistics. .15 If the r. of SM. F.v. . .d. j = 1. 6 x = 1. . f(x) = 2(2 − x)I(1.v. h is given by: ⎧n!.16 For n odd. 2. 6. .f. .d. 10. and n is odd.1)(x). 0 < z1 < ⋅ ⋅ ⋅ < zn < 1 h z1 . determine the p. . Xn be i. Then determine the p. .’s. . f given by f(x) = –.14 If Xj.d.f. of SM. THEOREM 3 Let X1. j = 1. then the r. . . of SM is symmetric about μ. .256 10 Order Statistics and Related Theorems SM ⎧Y ⎪ = ⎨1 ⎪ 2 ⎛Y + Y ⎞ ⎠ ⎩ ⎝ n+ 1 2 n 2 n +2 2 if n is odd if n is even.f.1.f. Chapter 9.f.2)(x).i.1. ii) Negative Exponential with parameter λ.1.’s of Y1 and Y6.v. . (*) 10.v.i. 6 be i. and also calculate the probability P(SM > m) in each one of the following cases: i) ii) iii) iv) f(x) = 2xI(0.d. r.f. f with median m. j = 1. . j = 1. β).f.1)(x). n are independently distributed as N(μ. .17 Refer to Exercise 10.’s with p. What do parts (i)–(iii) become for n = 3? 10. . ⋅ ⋅ ⋅ . . . n have p. r.v. .18 Let Xj. . with continuous d.d. σ 2).d. Zn are order statistics from U(0.d.

. 2. Then the Wj’s are independent. let xp be the (unique by assumption) pth quantile. we get P Yi ≤ x p = P Yi ≤ x p . Zn and (Z1.f. to each ordering of the Xj’s. ) . r. Because F is nondecreasing. n. . . . Then in Chapter 4. j = 1. THEOREM 4 Let X1. a pth quantile. . Consider a number p. n. .υ. Then we have P Yi ≤ x p ≤ Yj = ∑ k =i ( ) j −1 ( )p q n k k n −k . 1). p). X n ≤ x p = ∑ k =i ( ) ( )p q n k k k n− k n −k . since P W1 = 1 = P X1 ≤ x p = F x p = p. . j = 1. of the Zj’s is the one given above follows from Theorem 1 of this chapter.d. ⋅ ⋅ ⋅ . ( ) ( ) ( ) n Therefore P at least i of X 1 . For p. n. Zn) are given in Example 3 of this chapter.’s distributed as B(1. Let now X be an r. Now we would like to establish a certain theorem to be used in a subsequent chapter. . Next. Yn be the order statistics.v. . .2 Further Distribution Theory: Probability of Coverage of a Population Quantile 257 Set Wj = F(Xj). X j > x p . Wn are i. F. 2. . where q = 1 − p. n as follows: ⎧1. . by Theorem 7. and conversely. ▲ PROOF The distributions of Zj. xp.v. ⎩ j = 1.i. . . . . since the Xj’s are. 2. j = 1. . X(1) ≤ X(2) ≤ · · · ≤ X(n) there corresponds the ordering F(X(1)) ≤ F(X(2)) ≤ · · · ≤ F(X(n)) of the F(Xj)’s. r. 0 < p < 1. .f. .d. X j ≤ x p ⎪ Wj = ⎨ ⎪0. 2.’s with continuous d. PROOF Deﬁne the r. with d.i. . P Yi < x p = P Yi ≤ x p = ∑ k= i ( ) ( ) n ( )pq n k .’s Wj. F and let Y1.f. Xn be i. Y j < x p i p j i p ( ) ( = P(Y ≤ x ) ( ≤ Y ) + P(Y < x ). or equivalently. .d. Z1. ⋅ ⋅ ⋅ . of F was deﬁned to be a number with the following properties: i) P(X ≤ xp) ≥ p and ii) P(X ≥ xp) ≥ 1 − p. . and also distributed as U(0. Therefore W(j) = F(Yj). 0 < p < 1. Y j ≥ x p + P Yi ≤ x p . Then W1. . That the joint p. Chapter 9. for 1 ≤ i < j ≤ n.v.10. .

. .’s with continuous d. this gives P Yi ≤ x p ≤ Yj = ∑ k =i ( ) n ( )p q n k k n −k −∑ k= j ( )p q n k k n −k =∑ k =i j −1 ( )p q n k k n −k . ) ( ( ) P Yi ≤ x p ≤ Y j = P Yi ≤ x p − P Y j < x p .f. n be i. . F(Y1) and ﬁnd its expectation.v. ▲ Exercise 10. .i.v.258 10 Order Statistics and Related Theorems since (Y Therefore j < x p ⊆ Yi ≤ x p .1 Let Xj.d. F. j = 1.2. ( ) ) ( n ) By means of (1). Use Theorem 3 and the relevant part of Example 3 in order to determine the distribution of the r. r. .

Tm)′. θ ∈ Ω}. etc. of the r. we also introduce and discuss other concepts such as: completeness. . . f(·. let Tj be (measurable) functions deﬁned on n into and not depending on θ or any other unknown quantities. . however. . . . . θ). . X n = T1 X 1 . . m. We let Ω stand for the set of all possible values of θ and call it the parameter space.v.11. xn. n independent r. the data. Xn). By F we denote the family of all p. we shall often write T instead of T(X1. For j = 1. . .’s distributed as X above. .) on the basis of the observed values x1. We propose. the concept of sufﬁciency is treated exclusively in conjunction with estimation and testing hypotheses problems. X n . with p. Then ′ T X 1 . . In the same chapter. So Ω ⊆ r. . and set T = (T1. Also. . by slightly abusing the notation. F = {f(·. One of the basic problems of statistics is that of making inferences about the parameter θ (such as estimating θ. In most of the textbooks. Tm X 1 . θ) of known functional form but depending upon an unknown r-dimensional constant vector θ = (θ1. . . . Xn) rather than T(X1. . unbiasedness and minimum variance unbiasedness.v. Let X1. . we shall write θ rather than θ when r = 1. ⋅ ⋅ ⋅ . .’s we get by letting θ vary over Ω. . ⋅ ⋅ ⋅ . . .d. . . Xn be a random sample of size n from f(·. We shall write T(X1.’s X1. .1 Sufﬁciency: Deﬁnition and Some Basic Results 259 Chapter 11 Sufficiency and Related Theorems Let X be an r. the concept of sufﬁciency plays a fundamental role in allowing us to often substantially condense the data without ever losing any information carried by them about the parameter θ. Likewise. .f. that is. .v. In doing so. testing hypotheses about θ. ⋅ ⋅ ⋅ . θ). ⋅ ⋅ ⋅ . θr)′ which is called a parameter. that is. X n ( ) ( ( ) ( )) is called an m-dimensional statistic. . r ≥ 1. .d. to treat it in a separate chapter and gather together here all relevant results which will be needed in the sequel. . Xn. . . . 259 . Xn) if m = 1. .f.

EXAMPLE 1 Let X = (X1. θ1 < θ2} (that is. Ω = {(θ1. ( n! ) r × θ1x1 ⋅ ⋅ ⋅ θ rx−−11 1 − θ1 − ⋅ ⋅ ⋅ − θ r −1 ( ) −1 n − Σ r= 1 x j j () r ⎧ ⎫ ′ ⎪ ⎪ A = ⎨x = x1 . θ2 = σ2. j = 1. θ 2)′. Ω is that part of the plane through the points (1. r. . j = 1. .’s.1 Sufﬁciency: Definition and Some Basic Results Let us consider ﬁrst some illustrative examples of families of p. Then by setting θ1 = μ. 0) and (0. . θ r . x j ≥ 0. ( ) () [ ] EXAMPLE 3 Let X be N(μ. if β is known and α = θ. ⎪ ⎪ j =1 ⎩ ⎭ For example. θ r ∈ ⎩ r ⎫ ⎪ and ∑ θ j = 1⎬ ⎪ j =1 ⎭ ( ) ( ) r . θ 2 .260 11 Sufﬁciency and Related Theorems The basic notation and terminology introduced so far is enough to allow us to proceed with the main part of the present chapter. A = θ1 . θ = ( ) n! θ1x1 ⋅ ⋅ ⋅ θ rxr I A x = x1! ⋅ ⋅ ⋅ xr ! () ∏ r −1 j =1 x j ! n − x1 − ⋅ ⋅ ⋅ − xr −1 ! IA x . . θ = IA x . ⋅ ⋅ ⋅ . 11. then Ω = (α.d. whereas for r = 2. θ2 ∈ . β). the part of the plane above the horizontal axis) and ( ) . θ1. Xr)′ have the Multinomial distribution. we have ′ ′ ⎧ θ = θ1 . 1) which lies in the ﬁrst quadrant. . the distribution of X = (X1. we have θ = (θ1. . 1. the part of the plane above the main diagonal) and 1 f x. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . . θ1) = B(n. 0). Then by setting θj = pj. θ 2 − θ1 ( ) () [ ] If α is known and we put β = θ. θ −α Similarly. X2)′ is completely determined by that of X1 = X which is distributed as B(n. ∑ x j = n⎬. xr ∈ r . we have θ = (θ1. 0.f. θ j > 0. θ1 ∈ . θ2 = β. (0. By setting θ1 = α. θ 2 > 0⎬ ⎩ ⎭ (that is. for r = 3. Ω = ⎨ θ1 . θ 2 ∈ 2 . θ2)′. σ2). θ2)′ ∈ 2. r and f x. ⋅ ⋅ ⋅ . 0. θ = IA x . ′ ⎧ ⎫ Ω = ⎨ θ1 . θ). θ . j = 1. r . A = α. ⋅ ⋅ ⋅ . . ( ) EXAMPLE 2 Let X be U(α. ∞) and 1 f x.

v. ⎥ ⎥ ⎦ ⎤ ⎥. Set T = ∑n= 1 Xj. ⋅ ⋅ ⋅ . . . θ = where ( ) 1 2πσ 1σ 2 1 − ρ 2 e −q 2 . Then the following question arises: Can we say more about θ if we know how many successes occurred and where rather than merely how many successes occurred? The answer to this question will be provided by the following argument.i. θ 5 ∈ −1. ( ) ) j = 1. θ). θ2 = μ2. θ = θ j ( ) xj (1 − θ ) 1− x j I A xj .11. EXAMPLE 5 Let X1. θ = exp⎢− ⎢ 2σ 2 2πσ ⎢ ⎣ Similarly if μ is known and σ 2 = θ. given that T = t. . θ5)′ and 1 2 ′ ⎧ Ω = ⎨ θ1 . we have then θ = (θ1. X2)′ have the Bivariate Normal distribution. Given that the number of successes is t. n. . that is. so that Ω j ⎛ n⎞ fT t . n. . Setting θ1 = μ1. . . n. θ = exp ⎢− ⎢ 2θ 2 2πθ 2 ⎢ ⎣ If σ is known and we set μ = θ. We suppose that the Binomial experiment in question is performed and that the observed values of Xj are xj. θ). . θ5 = ρ. 1}. 1. Xn be i. θ 5 ∈ ⎩ ( ) 5 ⎫ . . θ 2 ∈ . . . . 1). θ = ⎜ ⎟ θ t 1 − θ ⎝t⎠ ( ) ( n− t IB t . fX x j . . j = 1. . j = 1. θ4 = σ 2. . . we label as a success the outcome 1. . then Ω = and ( ) 1 ( ) 2 ⎤ ⎥. t = 0.1 Sufﬁciency: Deﬁnition and Some Basic Results 261 ⎡ x −θ 1 f x. r. 1. ⎟ ⎟⎜ ⎢⎝ σ 1 ⎠ ⎝ σ1 ⎠⎝ σ 2 ⎠ ⎝ σ 2 ⎠ ⎥ ⎦ ⎣ ′ x = x1 . . ⋅ ⋅ ⋅ . . Then the problem is to make some kind of inference about θ on the basis of xj. . As usual. θ ∈Ω = (0. Then T is B(n. . θ1 .’s from B(1. . 1 1 − ρ2 ( ) Before the formal deﬁnition of sufﬁciency is given. θ 3 . where A = {0. ( ) 1 ( ) 2 EXAMPLE 4 Let X = (X1. n. θ 4 ∈ 0. ∞ .d. n}. that is. . . θ3 = σ 2. x2 . ⎥ ⎥ ⎦ ⎡ x −θ f x. ﬁnd the probability of . q= 2 ⎡⎛ x − μ ⎞ 2 ⎛ x − μ1 ⎞ ⎛ x 2 − μ 2 ⎞ ⎛ x 2 − μ 2 ⎞ ⎤ 1 ⎢⎜ 1 − 2ρ⎜ 1 +⎜ ⎟ ⎟ ⎥. () where B = {0. an example will be presented to illustrate the underlying motivation. 1 ⎬ ⎭ ( ) ( ) and f x.

⋅ ⋅ ⋅ . T = t Pθ X 1 = x1 . . f(·. xn such that xj = 0 or 1.d. r. given T = t. θr)′ ∈ Ω ⊆ let T = (T1. . m are statistics. where Pθ denotes the probability function associated with the p. . If.f. . .262 11 Sufﬁciency and Related Theorems each one of the (n) different ways in which the t successes can occur. except perhaps for a set N in m of values of t such that Pθ (T ∈ N) = 0 for all θ ∈ Ω. we found that for all x1. . ⋅ ⋅ ⋅ . . . .v. θ = (θ1. . for almost all (a. . . Pθ[(X1. j = 1. given the total number of successes t. ⋅ ⋅ ⋅ . n be i. f(·.i. on the other hand. then clearly the positions where the t successes occurred are entirely irrelevant and the total number of successes t provides all possible information about θ. we will say that knowledge of the positions where the t successes occurred is more informative about θ than simply knowledge of the total number of successes t. ⋅ ⋅ ⋅ . . θ). or for the parameter θ. . and therefore the total number of successes t alone provides all possible information about θ. T being a sufﬁcient statistic for θ implies that every (measurable) set A in n. . Xn)′. ⋅ ⋅ ⋅ . and ( ) j = 1. . X n . . . ⋅ ⋅ ⋅ . is independent of θ for all values of t (actually.a. where Tj = Tj X 1 . . X n = xn | T = t = = ( ) Pθ X 1 = x1 . This example motivates the following deﬁnition of a sufﬁcient statistic. and this is equal to θ x 1 −θ 1 ( ⎛ n⎞ t ⎜ ⎟θ 1 − θ ⎝t⎠ ) 1− x 1 ⋅ ⋅ ⋅ θ x 1 −θ n ( ) 1− x n ( ) n− t = θt 1 −θ ⎛ n⎞ t ⎜ ⎟θ 1 − θ ⎝t⎠ ( ) n− t ( ) n− t = 1 ⎛ n⎞ ⎜ ⎟ ⎝t⎠ if x1 + · · · + xn = t and zero otherwise. Thus. X n = xn .d. . θ). . . Tm)′.f. . . . We say that T is an m-dimensional sufﬁcient statistic for the family F = {f(·. . n and ∑ x j = t . REMARK 1 Thus.d. Pθ ( X 1 = x1 .a. . . all possible outcomes. if t there are values of θ for which particular occurrences of the t successes can happen with higher probability than others.’s with p. In the present case. j = 1. DEFINITION 1 Let Xj. θ)). X n = xn | T = t ) = 1 j =1 n ⎛ n⎞ ⎜ ⎟ ⎝t⎠ independent of θ. . if the conditional distribution of (X1. we have Pθ X 1 = x1 . Then.)t. that is. have the same probability of occurrence. θ ∈ Ω}. Xn)′ ∈ A|T = t] is independent of θ for a. X n = xn Pθ T = t ( ( Pθ T = t ( ) ) ( ) ) if x1 + ⋅ ⋅ ⋅ + xn = t and zero otherwise. r .

t. Xn)′ is always a sufﬁcient statistic for θ. . . THEOREM 1 (Fisher–Neyman factorization theorem) Let X1. . x′ ) = t. given T = t. Xn be i. . Deﬁnition 1 above does not seem appropriate for identifying a sufﬁcient statistic. more is true. X n = T1 X 1 . . f x1 .d. . Clearly. ⋅ ⋅ ⋅ . . ( ) ( ) Clearly. Namely. xn . . . θr)′ ∈ Ω ⊆ r. In connection with this. . x ′ ). . . θ = g T x1 . ⋅ ⋅ ⋅ . θ). .’s with p. θ = ∑ g t. . Pθ T = t = Pθ T X 1 . x′ )′ for which T(x′. . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . Then ′ ⎤ ⎡ Pθ T* ∈ B T = t = Pθ ⎢ X 1 . . . . xn) = t. . ⋅ ⋅ ⋅ . .a. Discrete case: In the course of this proof. . . . ⋅ ⋅ ⋅ . xn only through T and h is (entirely) independent of θ. . X n . if T* = (T*. PROOF f x1 . ( ) [ ( ) ]( ) where g depends on x1. We ﬁnally remark that X = (X1. ⋅ ⋅ ⋅ . · · · .a. This can be done quite easily by means of the following theorem. xn .f. To see this. xn . ( ( )( ) Hence . xn ′ ′ ′ ′ 1 n ( ) [( ) ] ( ) ( ) ) ( ) = g(t. θ h x1 . An m-dimensional statistic ′ T = T X 1 . Actually. it sufﬁces to restrict attention to those t’s for which Pθ (T = t) > 0. ′ ′ where the summation extends over all (x′. .v. . X n = xn .i. ⋅ ⋅ ⋅ . T*)′ is any k-dimensional 1 k statistic. θ = g T x1 . 1 n 1 n Thus Pθ T = t = ∑ f x1 . . of X1. θ = (θ1. Xn factors as follows. ⋅ ⋅ ⋅ . X n = t = ∑ Pθ X 1 = x1 . X n ( ) ( ( ) ( )) is sufﬁcient for θ if and only if the joint p. xn . it should be pointed out at the outset that by doing so we restrict attention only to those x1. let B be any (measurable) set in k and let A = T*−1 (B). θ ⋅ ⋅ ⋅ f xn . . Assume that the factorization holds. Tm X 1 . we are going to use the notation T(x1. xn . . .11.d. . ⋅ ⋅ ⋅ .d. xn for which T(x1. then the conditional distribution of T*. ⋅ ⋅ ⋅ . that is. . . . . θ h x1 .1 Sufﬁciency: Deﬁnition and Some Basic Results 263 t. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . The proof is given separately for the discrete and the continuous case. ⋅ ⋅ ⋅ . θ)∑ h( x ′ . f(·. . ⋅ ⋅ ⋅ . r. . xn) = t. t. θ h x1 .f. . . X n ∈ A T = t⎥ ⎣ ⎦ and this is independent of θ for a. ( ) [ ( ) ]( ) with g and h as described in the theorem. . is independent of θ for a. xn . Next.

⋅ ⋅ ⋅ . xn )]. ⋅ ⋅ ⋅ . . T x1 . . . . Continuous case: The proof in this case is carried out under some further regularity conditions (and is not as rigorous as that of the discrete case). so that x j = x j t. ⋅ ⋅ ⋅ . j = m + 1. X n = xn ( = k x1 . θ = Pθ T = t [( ) ] = k x1 . . ⋅ ⋅ ⋅ . x ) g t. θ)∑ h( x . . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . x )]. xn . ( ) . .264 11 Sufﬁciency and Related Theorems Pθ X 1 = x1 . t m +1 . A proof without the regularity conditions mentioned above involves deeper concepts of measure theory the knowledge of which is not assumed here. . Xn). T = t Pθ T = t ( ) ( )( ) = h( x . ⋅ ⋅ ⋅ . T(x1. θ h x1 . . X n = xn . Xn = xn|T = t) is independent of θ . call it k[x1. xn . . . xn . ( ) ( ) [ ( ) ]( as was to be seen. it follows that m ≤ n. From Remark 1. ⋅ ⋅ ⋅ . . Then Pθ (X1 = x1. θ ⋅ ⋅ ⋅ f xn . xn [ Pθ T = t ( ( ) ) )] if and only if f x1 . 1 n Setting g T x1 . ⋅ ⋅ ⋅ . . t = t1 . It should be made clear. ⋅ ⋅ ⋅ . . X n = xn Pθ T = t ( ) ) and this is independent of θ. . ⋅ ⋅ ⋅ . . . θ ⋅ ⋅ ⋅ f xn . T( x . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . [ ( ) and h x1 . n. xn . ⋅ ⋅ ⋅ . x ) ∑ h( x . . ⋅ ⋅ ⋅ . . X n = xn T = t = ( ) Pθ X 1 = x1 . Then Pθ X 1 = x1 . x ) g(t. θ = Pθ X 1 = x1 . θ = g T x1 . m. θ h x1 . . T x1 . n. . ( ) ′ j = 1. X n = xn θ 1 n ( ) ( ) ( ) = P (T = t)k[ x . . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . xn . . Then set Tj = Tj(X1. j = 1. ⋅ ⋅ ⋅ . let T be sufﬁcient for θ. xn 1 1 n n 1 n ( ) ) = P (X θ 1 = x1 . xn ( ( )] ) ) we get f x1 . ⋅ ⋅ ⋅ . t m . . ⋅ ⋅ ⋅ . X n = xn T = t = = ( Pθ X 1 = x1 . x . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . . such that the transformation t j = Tj x1 . ( ) j = 1. Xn). is invertible. that the theorem is true as stated. . . and assume that there exist other n − m statistics Tj = Tj(X1. however. t n . . Now. xn . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . xn . n. ⋅ ⋅ ⋅ . xn.

j = 1. t m +1 .11. T t. ⋅ ⋅ ⋅ . there is a one-to-one correspondence between T. xn . . T . θ = g T x1 . t . θ h x1 . Let ﬁrst f x1 . ( ) [ ( ) ] . ⋅ ⋅ ⋅ . . t n . Xn. . t m +1 . ⋅ ⋅ ⋅ . is independent of θ. ⋅ ⋅ ⋅ . θ J −1 . . −∞ −∞ ( ) ∞ ∞ ( ) ( ∞ −∞ ) ( ) where h * * t = ∫ ⋅ ⋅ ⋅ ∫ h * t. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ .t ) ( )g(t. . t m +1 . . θ)h * (t . . t n dt m +1 ⋅ ⋅ ⋅ dt n = g t. t m+1 n ( = g t. xn .t h * t. θ = ( g t. ⋅ ⋅ ⋅ . t n t. . t n Hence ( ) [ ( ) ( )] J . . ⋅ ⋅ ⋅ . t n . −∞ () ∞ ( ) That is. and that the respective Jacobian J (which is independent of θ) is different from 0. is independent of θ. ⋅ ⋅ ⋅ . It follows that the conditional distribution of T. ⋅ ⋅ ⋅ . ( ) ( ) [ ( ) ]( ) Then fT . t m +1 . θ) is independent of θ by Remark 1. Tm+1. is independent of θ. it follows that the conditional distribution of X1. ) ) ). So we may set f t m +1 . t = h x1 . . θ h ** t . . n. θ h * t . t n dt m +1 ⋅ ⋅ ⋅ dt n . () fT t. . T . . xn . θ ⋅ ⋅ ⋅ f xn . t . ( ) ( m+1 n ( ) ( ) ) ( ) ( ) ( ) If we also set fT t. Tn. Xn. t n = h x1 t. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . the conditional distribution of Tm+1. one has f x1 . tn )] J where we set h * t. t n . t n m +1 . . θ) = g(t. ⋅ ⋅ ⋅ . . xn t. . given T = t. t m +1 . θ J −1 = f t m +1 . Tn. θ . t n t. Tm+1. ⋅ ⋅ ⋅ . θ = ∫ ⋅ ⋅ ⋅ ∫ g t. xn . . x (t. i. and X1. θ = fT . t n t. . tn|t. by using the inverse transformation of the one used in the ﬁrst part of this proof. . t m +1 . fT(t. ⋅ ⋅ ⋅ . T t. θ fT t. ⋅ ⋅ ⋅ . Then.1 Sufﬁciency: Deﬁnition and Some Basic Results 265 It is also assumed that the partial derivatives of xj with respect to ti. θ J −1 = h * t m +1 . ⋅ ⋅ ⋅ . xn . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . Since. . t n . That is. θ)h**(t) and hence f t m +1 . Let now T be sufﬁcient for θ. ⋅ ⋅ ⋅ . by assumption. θ = g T x1 . . Tn. . . ⋅ ⋅ ⋅ . θ h x1 t. . exist and are continuous. t n m +1 . given T = t. t m +1 . tn . θ h * t. given T = t. t m +1 . · · · . . ⋅ ⋅ ⋅ . θ ( )[ ( = g(t. (θ)h * *(⋅t)⋅ ⋅ ) = ( h * *(t⋅ )⋅ ⋅ ) m +1 n m +1 n which is independent of θ. . But f(tm+1.

Actually. ⋅ ⋅ ⋅ . ( ) [( ) ]( ( ) [ ( )] ( )] ˜ Thus. ⋅ ⋅ ⋅ . by the fact that ∑rj = 1θj = 1 and ∑rj = 1 xj = n. where ψ: r → r is one-to-one θ (and measurable). ⋅ ⋅ ⋅ . xn . xn =g φ ( ) [ ( { −1 ) ]( ) ˜ [T( x . xn . .266 11 Sufﬁciency and Related Theorems we get f x1 . θr)′. xn . where f x. EXAMPLE 6 Refer to Example 1. ⋅ ⋅ ⋅ . . ▲ [( ) ] [( ) We now give a number of examples of determining sufﬁcient statistics by way of Theorem 1 in some interesting cases. . it follows that the statistic (X1. θ h x1 . θ = g T x1 . . θ h x1 . ⋅ ⋅ ⋅ . θ = g T x1 . ˜ θ = ψ −1 ψ θ = ψ −1 θ . x )]. θ h x1 . xn . ⋅ ⋅ ⋅ . so that ˜ the inverse φ−1 exists. ⋅ ⋅ ⋅ . T is sufﬁcient for the new parameter θ . [ ( )] () Hence f x1 . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . . . x ) 1 n 1 n ˜ which shows that T is sufﬁcient for θ. xn . Next. θ = ( ) n! θ1x ⋅ ⋅ ⋅ θ rx I A x . θ = g T x1 . θ h x1 . θ = g T x1 . θ = f x1 . xn . ⋅ ⋅ ⋅ . xn . θ}h( x . we have that T = φ(T) is also ˜ sufﬁcient for θ and T is sufﬁcient for θ = ψ (θ). Thus f x1 . Then. ⋅ ⋅ ⋅ . . xn . θ = g T x1 . if T is sufﬁcient for θ. ⋅ ⋅ ⋅ . Xr)′ is sufﬁcient for θ = (θ1. ⋅ ⋅ ⋅ . ψ −1 θ and ˜ ˜ ˜ g T x1 . where we set ˜ ˜ ˜ f x1 . xn . ⋅ ⋅ ⋅ . xn . ( as was to be seen. xn . ⋅ ⋅ ⋅ . by Theorem 1. . xn becomes ( ) [ ( ) ]( ) ) ˜ ˜ ˜ ˜ f x1 . ˜ PROOF We have T =φ−1[φ (T)] = φ−1( T ). x1! ⋅ ⋅ ⋅ xr ! 1 r () Then. xn . ⋅ ⋅ ⋅ . we also have f x. ψ −1 θ . θ = ( ) ∏ r −1 j =1 1 x j ! n − x1 − ⋅ ⋅ ⋅ − xr −1 ! r−1 ( n! ) × θ1x ⋅ ⋅ ⋅ θ rx−1 1 − θ1 − ⋅ ⋅ ⋅ − θ r −1 ( ) −1 n − Σ r= 1 x j j IA x () . xn . ▲ COROLLARY ) [ ( ) ]( ) Let φ: m → m ((measurable and independent) of θ) be one-to-one. xn . ⋅ ⋅ ⋅ .

. Since also j ⎛ 1 ⎞ ⎛θ ⎛ nθ 2 ⎞ f x. . EXAMPLE 7 Let X1. θr−1)′. ∞ ) x(1) I ( −∞ . .’s from U(θ1. . . . and n j =1 ( ) 1 n ∑ Xj − μ n j =1 ( ) 2 is sufﬁcient for θ2 = θ if θ1 = μ is known. .’s from N(μ. xn)′ and θ = (θ1. θ = ⎜ ⎟ exp⎜ − 1 ⎟ exp⎜ 1 ⎜ 2πθ ⎟ ⎝ 2θ 2 ⎠ ⎝ θ2 ⎝ 2 ⎠ n ( ) ∑x j =1 n j − 1 2θ 2 ∑x j =1 n 2 j ⎞ ⎟. θ2). then ∑n= 1 (Xj − μ)2 is sufﬁcient for θ. Xn be i. Xr−1)′ is sufﬁcient for (θ1. ⎠ it follows that. σ2 = θ2 and θ = (θ1.1 Sufﬁciency: Deﬁnition and Some Basic Results 267 from which it follows that the statistic (X1. . j whereas if θ1 = μ is known and θ2 = θ. θ g 2 x( n ) . if θ1 = α is known and θ2 = θ. . . ⎥ ⎦ ) But ∑ (x j =1 n j − θ1 ) = ∑ [( x j =1 j − x + x − θ1 ) ( )] = ∑ ( x 2 j =1 j −x ) 2 + n x − θ1 . In the examples just discussed it so happens that the dimensionality of the sufﬁcient statistic is the same as the dimensionality of the REMARK 2 . as j follows from the form of f(x. .v. . it follows that X(n) is sufﬁcient for θ.d.11. X(n))′ is sufﬁcient for θ. 2θ 2 ⎥ ⎦ ( ) It follows that ( X . .v.i. . 2 2 [ ][ ] where g1[x(1). . if θ2 = σ2 is known and θ1 = θ. σ2). r. θ2)′. θ) at the beginning of this example. Similarly. μ = θ1.i. θ = = ( ) 1 (θ (θ 2 − θ1 1 − θ1 ) ) n I [θ 1. . X1= X is sufﬁcient for θ1 = θ. we get f x. r. ∑n= 1(Xj − X )2)′ is sufﬁcient for θ. By setting x = (x1. θ] = I(−∞. . we have ⎛ 1 ⎞ ⎡ 1 f x. g2[x(n). θ = ⎜ exp ⎢− ⎜ 2πθ ⎟ ⎟ ⎢ ⎣ 2θ 2 ⎝ 2 ⎠ n ( ) 2 n ∑ (x j =1 n n j 2⎤ − θ 1 ⎥. . In particular. where S2 = 2 1 n ∑ X j − X . 1 EXAMPLE 8 Let X1. X(1) is sufﬁcient for θ. for r = 2. ( ) 2 so that ⎛ 1 ⎞ ⎡ 1 f x. θ .d. θ ](x(n)). In particular. . . xn)′. then ∑n= 1Xj is sufﬁcient for θ. if θ2 = β is known and θ1 = θ. . . . Xn be i. θ] = I[θ . θ = ⎜ ⎟ exp ⎢− ⎜ 2πθ ⎟ ⎢ ⎣ 2θ 2 ⎝ 2 ⎠ n ( ) ∑ (x j =1 n j −x ) 2 − 2⎤ n x − θ 1 ⎥. By the corollary to Theorem 1. Then by setting x = (x1. θ ] x( n ) 2 ( ) ( ) n g1 x(1) . θ2)′. . it also follows that ( X . ∞)(x(1)). S2)′ is sufﬁcient for θ. It follows that (X(1).

of (X1. . . . . θ). . . is a sufﬁcient statistic for θ = (θ1. θr)′. . r. . it can be shown that no sufﬁcient statistic of smaller dimensionality other than the (sufﬁcient) statistic (X1. ⋅ ⋅ ⋅ . yn = ∑ x j .’s from N(θ1. If m is the smallest number for which T = (T1.f.i. .d. In a situation like this.’s F = {f(·. . j = 1. Xn are i. .i. . .268 11 Sufﬁciency and Related Theorems parameter. S2)′ is sufﬁcient for (θ1. . For example. . The notion of sufﬁciency is connected with a family of p. given Tj = tj. i ≠ j. r. if all other θi. . n − 1. 2πnθ 2 ⎣ 2 nθ 2 ⎦ 1 ( ) This quotient is equal to ( 2πnθ 2 2πθ 2 ) n ⎧ 1 ⎡ exp ⎨ ⎢ yn − nθ1 ⎩ 2 nθ 2 ⎣ ( ) 2 − n y1 − θ1 ( ) 2 − ⋅ ⋅ ⋅ − n yn −1 − θ1 ( ) 2 2 ⎫ − n yn − y1 − ⋅ ⋅ ⋅ − yn −1 − θ1 ⎤ ⎬ ⎥⎭ ⎦ ( ) . is not in conformity with the deﬁnition of a sufﬁcient statistic.f. otherwise Tj is to be either sufﬁcient for the above family F or not sufﬁcient at all. θ ∈Ω}. . suppose that X1. where REMARK 3 S2 = 2 1 n ∑ Xj − X . the number of the real-valued statistics which are jointly sufﬁcient for the parameter θ coincides with the number of independent coordinates of θ. In Deﬁnition 1. Xn)′. . . .’s: ⎛ 1 ⎞ ⎧ 1 ⎡ exp ⎨− ⎜ ⎢ y1 − θ1 ⎜ 2πθ ⎟ ⎟ ⎩ 2θ 2 ⎣ ⎝ 2 ⎠ n ( ) 2 + ⋅ ⋅ ⋅ + yn −1 − θ1 ( ) 2 and 2 ⎫ + yn − y1 − ⋅ ⋅ ⋅ − yn −1 − θ1 ⎤ ⎬ ⎥⎭ ⎦ ( ) 2⎤ ⎡ 1 exp⎢− yn − nθ1 ⎥. . are known. Xn−1)′. . .d. j =1 n one sees that the above mentioned conditional p. is independent of θj. . . .d. . Or to put it differently. is given by the quotient of the following p. .’s from the Cauchy distribution with parameter θ = (μ. . then T is called a minimal sufﬁcient statistic for θ. . .v. Xn)′ exists.f. m. suppose that m = r and that the conditional distribution of (X1.d.d. θ2). Tm)′. As an example. Then ( X . Tj = Tj(X1. n j =1 ( ) Now consider the conditional p. Ω and we may talk about Tj being sufﬁcient for θj. . . j = 1. however. . . However. one may be tempted to declare that Tj is sufﬁcient forθj. this need not always be the case. Xn are i. . if X1. This outlook. given ∑n= 1Xj = yn.d. By j using the transformation y j = x j . σ 2)′. Xn).f. θ2)′.v.

f(·.’s distributed as stated below. r.1 In each one of the following cases write out the p. Π n= 1 Xj is a sufﬁcient j statistic for α = θ if β is known. i) ii) iii) iv) X X X X is distributed as Poisson. and ∑n= 1 Xj or X is a sufﬁcient statistic for j β = θ if α is known. Π n= 1 Xj or −∑n= 1 log Xj is a sufﬁcient j j statistic for α = θ if β is known. 11.11. and Π n= 1 (1 − Xj) is a sufﬁcient statistic for j β = θ if α is known. X is not j sufﬁcient for (θ1. θ2)′ = (α. ( ) where θ > 0. is distributed as Negative Binomial.d.f.2 Let X1. x ≠ 0. . β)′ if the X’s j j are distributed as Beta. . θ = 1 − e −θ − θe −θ .d. take α = 1 and conclude that ∑n= 1 Xj j ˜ or X is a sufﬁcient statistic for the parameter θ = 1/θ of the Negative Exponential distribution. . j j j β)′ if the X’s are distributed as Gamma. ( ) ( ) f ( x. In the latter case.’s) Let X1. is distributed as Gamma. Π n= 1 (1 − Xj))′ is a sufﬁcient statistic for (θ1. is distributed as Beta. Thus the conditional p. ii) ∑n= 1 Xj or X is a sufﬁcient statistic for θ.v. 1.’s with p. .1. θ2)′.v.v. f 2. Then show that X1 + X2 is not a sufﬁcient statistic for θ.d. θ = e −θ . In particular. or equivalently. Then use Theorem 1 and its corollary in order to show that: i) ∑n= 1 Xj or X is a sufﬁcient statistic for θ. if the X’s are distributed as j Negative Binomial. θ ) = 0. Xn be i. under consideration is independent of θ1 but it does depend on θ2.1. iv) (Π n= 1 Xj. ∑n= 1 Xj)′ or (Π n= 1 Xj. .1. θ = θe −θ . 2.d. In particular. θ) given by: f 0 .f. X2 be i.d. iii) (Π n= 1 Xj. 11. of the r. f 1.v. X and specify the parameter space Ω of the parameter involved.1 Exercises Sufﬁciency: Deﬁnition and Some Basic Results 269 and (y n − nθ1 ) 2 − n y1 − θ1 ( ) 2 − ⋅ ⋅ ⋅ − n yn −1 − θ1 ( ) 2 − n yn − y1 − ⋅ ⋅ ⋅ − yn −1 − θ1 2 2 2 2 = yn − n⎡ y1 + ⋅ ⋅ ⋅ + yn −1 + yn − y1 − ⋅ ⋅ ⋅ − yn −1 ⎢ ⎣ independent of θ1. if the X’s are distributed as j Poisson.3 (Truncated Poisson r. ⎥ ⎦ ) 2 Exercises 11. ( ( ) ⎤.i. r. Thus ∑n= 1 Xj. X )′ is a sufﬁcient statistic for (θ1. θ2)′ = (α.f.i. The concept of X being sufﬁcient for θ1 is not valid unless θ2 is known.

∞ . .1. θ ∈ 0. r. Furthermore. . .d. () show that X(1) is a sufﬁcient statistic for θ. ⎝ ⎞ ∑ X j2 ⎟ ⎠ j =1 n ′ is a sufﬁcient statistic for θ.270 11 Sufﬁciency and Related Theorems 11.8 If X1. ∞ ) x .f. . Xn are i. Then show that T = (T1.f. ∑ X1 j .v. ∞ ) x . . θ2). . 11. f x. θ ) = (θ − x)I ( ) ( x).5 If Xj = (X1j. 11. . T2)′ is a sufﬁcient statistic for θ. 11.3. X(n))′ is a sufﬁcient statistic for θ. . θ = e ( ) − x −θ ( ) I (θ .7 that If X1. f(·.13(iii) of Chapter 3. θ ∈ 0. .v.1. ∑ X 2 j . is a random sample of size n from the Bivariate Normal distribution with parameter θ as described in Example 4. Xn is a random sample of size n from N(θ. ﬁnd a sufﬁcient statistic for θ. . .9 Let X1. θ = ( ) 1 3 −x θ x e I ( 0 . θ = ⎜ ⎟ ⎜ ⎟ ) ⎝ c ⎠⎝ x⎠ ( ) I ( c .d. 11.1.i. show that (X(1). and set T1 for the number of X’s which are equal to 0 and T2 for the number of X’s which are equal to 1. . . X2j)′. .1. j = 1. . Xn is a random sample of size n from U(−θ. θ = θxθ −1 I ( 0 . θ ∈ . 2 0. ∞ ) x . θ). .4 Let X1. θ iii) f x.d. Xn be i. θ) given in Exercise 3. 11. Xn is a random sample of size n with p. show that: ′ n n n ⎛ ⎞ 2 2 ⎜ X1 .d. . ∑ X1 j X 2 j ⎟ ⎝ ⎠ j =1 j =1 j =1 is a sufﬁcient statistic for θ. X 2 . r.’s with p. θ) given below.1. () ( ) . ∞).10 If X1. Then show that ∑n= 1|Xj| is a sufﬁcient j statistic for θ.’s with the Double Exponential p. 11. ∞ . . θ ∈(0. θ ∈ . . ∞).6 If X1. θ ∈ 0. Xn be a random sample of size n from the Bernoulli distribution. |Xn|) is also a sufﬁcient statistic for θ.f. . .d. show n ⎛ n ⎞ 2 ⎜∑ X j. show that this statistic is not minimal by establishing that T = max(|X1|. n. . by using Theorem 1. f(·. . . . θ ∈(0. . ( ) () ( ) 2 ii) f ( x. 6θ 4 θ +1 () ( ) ⎛θ ⎞⎛ c ⎞ iv) f x. . .i. .1. then. 1) x . ∑ X j ⎟ ⎝ j =1 ⎠ j =1 ′ ⎛ or ⎜ X . θ i) f x.1. . ∞ . . .

∞) implies g(x)/x! = 0 for x = 0. . . we introduce the (technical) concept of completeness which we also illustrate by a number of examples. x = 0. Meanwhile let us recall that if ∑n= 0 cn−j xn−j = 0 for more than n values of x. Thus Eθ g(X) = 0 for all θ ∈ (0. . Ω so that g(X) is an r. . θ). then j cj = 0. and therefore ⎛ n⎞ g x ⎜ ⎟ = 0. .f. . ⎝ x⎠ ⎪ ⎪ ⎭ ⎩ where A = {0. . Ω The examples which follow illustrate the concept of completeness.}. . ⋅ ⋅ ⋅ . .θ ∈Ω}. . EXAMPLE 9 Let ⎫ ⎧ n −x ⎛ n⎞ ⎪ ⎪ F = ⎨ f ⋅. . . EXAMPLE 10 Let ⎧ ⎫ θx ⎪ ⎪ F = ⎨ f ⋅. 1 ⎬. if ∑∞ = 0 cnxn = 0 for all values of x in an interval for n which the series converges. hence for more than n values of ρ. . 1.2 Completeness Some Basic Results 271 11. n. . 1) is equivalent to ∑ g( x )⎜ x⎟ ρ x = 0 ⎝ ⎠ x=0 n ⎛ n⎞ for every ρ ∈ (0. Eθ g(X) = 0 for all θ ∈ Ω implies that g(x) = 0 except possibly on a set N of x’s such that Pθ(X ∈ N) = 0 for all θ ∈Ω. θ . ( ) ( ) ( ) () n n ( ) ⎛ ⎞ n ⎛ n⎞ Eθ g X = ∑ g x ⎜ ⎟ θ x 1 − θ ⎝ x⎠ x= 0 ( ) () ( ) n −x = 1−θ ( ) ∑ g( x)⎜ n⎟ ρ ⎝ x⎠ x =0 x . f x. θ . 1. To this end. j = 0. In fact. ∞ ⎬. θ = e −θ I A x . . .11. θ ∈Ω ⊆ Rr. . We assume that Eθ g(X) exists for all θ ∈Ω and set F = {f(. n}.. . . . In fact. . ∞). where ρ = θ/(1 − θ). then cn = 0. . ( ) ( ∞ ) () ( ) Eθ g X = ∑ g x e −θ x= 0 ( ) () ∞ g x θx θx = 0 = e −θ ∑ x! x = 0 x! () for θ ∈ (0. n.2 Completeness In this section. n = 0. f(·. . DEFINITION 2 With the above notation. Then F is complete. θ = ⎜ ⎟ θ x 1 − θ I A x . x! ⎪ ⎪ ⎩ ⎭ where A = {0. θ ∈ 0.d. . θ ∈ 0. Also. n ⎝ x⎠ () which is equivalent to g(x) = 0. let X be a k-dimensional random vector with p. . f x. . θ ).1 Sufﬁciency: Deﬁnition and 11.v. Its usefulness will become apparent in the subsequent sections. . 1. 1. Then F is complete. 1. and this is equivalent to g(x) = 0 for x = 0. we say that the family F (or the random vector X) is complete if for every g as above. 1. and let g: k → be a (measurable) function. x = 0. 1.

if Eθ g(X) = 0 for all θ ∈(α.d. Tm)′ be a sufﬁcient statistic for θ.v. f(·. it can be shown that ⎧ ⎡ x −θ 1 ⎪ exp ⎢− F = ⎨ f ⋅. Xn). ( ) ( ) ( ) ( ) In the following. then ( ) ( ) ( ) 2 ⎤ ⎥. θ for all v and t ∈ S. θ ∈ α . θ −α ⎭ ⎩ Then F is complete. Vk)′.f. θ − a ∫α () Thus. then ∫θ g(x)dx = 0 for all θ > α which α intuitively implies (and that can be rigorously justiﬁed) that g(x) = 0 except possibly on a set N of x’s such that Pθ (X ∈N) = 0 for all θ ∈ Ω. ( ) ( ) () ( ) Eθ g X = ( ) 1 θ g x dx. θ ∈ 0 . In fact. σ2). j = 1. . Then Eθ g(X ) = Eθ(X − μ) = 0 for all θ ∈ (0. . θ = I [α . Then the distribution of V does not depend on θ. Xn be i. while g(x) = 0 only for x = μ. θ).d. f x. THEOREM 2 Let X1. . θ . θ for all v and t. by sufﬁciency. Xn be i. t. ∞ ⎬. θ = ⎢ 2σ 2 2πσ ⎪ ⎢ ⎣ ⎩ is complete. θ . . . θ) > 0 for all θ ∈ Ω and so f(v|t) is well deﬁned and is also independent of θ. we establish two theorems which are useful in certain situations. f x. f x. θ ∈ Ω ⊆ r and let T = (T1. θ ∈ ⎥ ⎥ ⎦ ⎫ ⎪ ⎬ ⎪ ⎭ ⎧ ⎫ ⎡ x−μ 2⎤ 1 ⎪ ⎥. where X is an r. if both μ and σ2 are unknown. k. ∞).’s from N(μ. Let g(·. . r. . We have that for t ∈S. Let V = (V1. j = 1. θ = f v t g t. . r. . θ . while by independence ( ) ( )( ) fV . ∞ ⎪ ⎢− F = ⎨ f ⋅. Xn). . θ) is the same for all θ ∈ Ω.i. . θ) is U(θ. Finally.272 11 Sufﬁciency and Related Theorems EXAMPLE 11 Let ⎫ ⎧ 1 F = ⎨ f ⋅. In fact. of T and assume that the set S of positivity of g(·.v. θ ] x . θ). with p.i. θ = exp ⎬ ⎢ 2θ ⎥ 2πθ ⎪ ⎪ ⎥ ⎢ ⎦ ⎣ ⎩ ⎭ is not complete. θ) be the p. let g(x) = x − μ. . it can be shown that ( X .f. T v.f. . f(·.v. . ∞).’s with p. . . The same is seen to be true if f(·. be any other statistic which is assumed to be (stochastically) independent of T.d. . · · · . .d. Vj = Vj(X1. · · · .d. g(t. If σ is known and μ = θ. β ). t. . . Then PROOF fV . Therefore ( ) ( )( ) . T v. . θ g t. where Tj = Tj(X1. EXAMPLE 12 Let X1. . S2)′ is complete. θ = fV v. If μ is known and σ2 = θ. . m. .

completeness. θ) = f(v/t) for all v and t ∈S. j = 1. Tm)′ be a sufﬁcient statistic of θ. . . . then show that F is complete.’s with p. Vk)′. Two balls are drawn one by one with replacement.3 (Basu) Consider an urn containing 10 identical balls numbered θ + 1. }. t m . and let Xj be the number on the jth ball. . θ). .2 If F is the family of all U(−θ. .’s. ⋅ ⋅ ⋅ .1 If F is the family of all Negative Binomial p. and stochastic independence. if the distribution of V does not depend on θ. θ + 10. v ∈ k. . θ ∈Ω} Ω is complete.i. Under certain regularity conditions. .v.2. it follows that V and T are independent. . Vj = Vj(X1. v) = 0 for all θ ∈ Ω and hence φ(t. where Tj = Tj(X1. . Eθφ(T. ⋅ ⋅ ⋅ . . .d. Then.d. r.d. . . . Let g(·. θ dt1 ⋅ ⋅ ⋅ dt m = fV = fV −∞ V −∞ 1 m 1 () ( )( (v) − ∫ ⋅ ⋅ ⋅ ∫ f (v. 11. . . θ g t. ∞). m. Use this .f. one has fV(v) = f(v|t). k be any other statistic.11.2.d. v) = fV(v) − f(v|T) which is deﬁned for all t’s except perhaps for a set N of t’s such that Pθ (T ∈N) = 0 for all θ ∈Ω. θ + 2. t . t . θ ( )( ) ( )( ) for all v and t ∈ S. [ ( ) ( )] () ( ) ) ⋅ ⋅ ⋅ dt m that is. . as was to be seen. v = Eθ f V v − f v T = f V v − E θ f v T ∞ ∞ −∞ ∞ −∞ ∞ ( ) = fV v − ∫ ⋅ ⋅ ⋅ ∫ f v t1 . . So fV(v) = f(v/t). ▲ REMARK 4 The theorem need not be true if S depends on θ. . then show that F is not complete. θ) be the p.f. Let V = (V1. f(·. j = 1. . for an arbitrary but ﬁxed v.f. of T and assume that C = {g(·. Then we have for the continuous case (the discrete case is treated similarly) PROOF Eθ φ T . Hence fV(v. . θ). ⋅ ⋅ ⋅ . .d. fV(v. . t m g t1 . θ ∈ (0. Xn). . . . THEOREM 3 (Basu) Let X1. 10. . . 11. the converse of Theorem 2 is true and also more interesting. Xn).1 Exercises Sufﬁciency: Deﬁnition and Some Basic Results 273 fV v.f. 20. Xn be i. . where θ ∈ Ω = {0. θ) p. 2. . t ∈Nc. . that is. It sufﬁces to show that for every t ∈ m for which f(v|t) is deﬁned. It relates sufﬁciency. θ ∈Ω ⊆ r and let Ω T = (T1. To this end. consider the statistic φ(T.’s. θ) = fV(v) is independent of θ. θ)dt ( v ) − f ( v ) = 0.2. v) = 0 for all t ∈ Nc by completeness (N is independent of v by the deﬁnition of completeness). j = 1. . θ = f v t g t. . . ▲ Exercises 11.

θ ∈Ω ⊆ . m. PROOF THEOREM 4 i) That φ(T) is a function of the sufﬁcient statistic T alone and does not depend on θ is a consequence of the sufﬁciency of T. then it becomes so by conditioning it with respect to T. θ ∈Ω. . since the resulting statistic will be the same (with probability 1) by (CE2′). Then we say that U is an unbiased statistic for θ if EθU = θ for every θ ∈Ω. . . r. We shall then introduce the concept of unbiasedness and we shall establish the existence and uniqueness of uniformly minimum variance unbiased statistics. follows from (CE1).v. 2 2 iii) σ θ[φ(T)] < σ θ(U). ii) φ(T) is an unbiased statistic for θ. . where by EθU we mean that the expectation of U is calculated by using the p.f. . f(·. Set φ(t) = Eθ(U|T = t).d. r. (Rao–Blackwell) Let X1. . . Xn be i. . . ▲ The interpretation of the theorem is the following: If for some reason one is interested in ﬁnding a statistic with the smallest possible variance within the class of unbiased statistics of θ.3 Unbiasedness—Uniqueness In this section. Chapter 5. θ).v. page 123. if an unbiased statistic U is not already a function of T alone (with probability 1). It is further clear that the variance does not decrease any further by conditioning again with respect to T. f(·.’s with p. Tm)′. f(·. .’s with p. θ).i. provided EθU2 < ∞. Chapter 5. .f. ii) That φ(T) is unbiased for θ. . Xn) be an unbiased statistic for θ which is not a function of T alone (with probability 1).d. . . .i. . given T. The variance of the resulting statistic will be smaller than the variance of the statistic we started out with by (iii) of the theorem. . .d. Xn) be a statistic.274 11 Sufﬁciency and Related Theorems example to show that Theorem 2 need not be true if the set S in that theorem does depend on θ. . . Tj = Tj(X1. Xn).d. 11. φ(T) is a function of the sufﬁcient statistic T alone. Then we have that: i) The r.f. is known as Rao–Blackwellization. . . page 123. and let T = (T1. that is. be a sufﬁcient statistic for θ. we shall restrict ourselves to the case that the parameter is realvalued. θ). . . Eθφ(T) = θ for every θ ∈Ω. page 123. then one may restrict oneself to the subclass of the unbiased statistics which depend on T alone (with probability 1). We can now formulate the following important theorem. Xn be i. θ ∈Ω ⊆ and let U = U(X1. . . iii) This follows from (CV). . Chapter 5. j = 1. . DEFINITION 3 Let X1. . Let U = U(X1.v. The process of forming the conditional expectation of an unbiased statistic of θ. This is so because.d.

is an unbiased statistic for θ with variance θ2. Then T = ∑n= 1 Xj is a j sufﬁcient statistic for θ. Xn be i.d. θ). . Let X1. ( ) An unbiased statistic for θ which is of minimum variance in the class of all unbiased statistics of θ is called a uniformly minimum variance (UMV) unbiased statistic of θ (the term “uniformly” referring to the fact that the variance is minimum for all θ ∈ Ω). we have then EθU(T) = EθV(T) = θ. . θ ∈ Ω. θ).11. θ) be its p.f. It is further easily seen (by Theorem . if V = V(T) is another unbiased statistic for θ. r. Then T = ∑n= 1 X 2 is a sufﬁcient statistic for θ. Some illustrative examples follow.3 Unbiasedness—Uniqueness Sufﬁciency: Deﬁnition and Some Basic Results 275 The concept of completeness in conjunction with the Rao–Blackwell theorem will now be used in the following theorem. Then U is the unique unbiased statistic for θ with the smallest variance in the class of all unbiased statistics for θ in the sense that. . Setting θ = 1/λ. that it is UMV unbiased for θ. and also complete. by Example 12. EXAMPLE 13 EXAMPLE 14 EXAMPLE 15 Let X1. be a sufﬁcient statistic for θ and let g(·.d. Let X1. . 2.f. with parameter λ. Let U = U(T) be an unbiased statistic for θ and suppose that EθU2 < ∞ for all θ ∈Ω. by Example 5.i. By the Rao–Blackwell theorem. we have φ(t) = 0 for all t ∈Rm except possibly on a set N of t’s such that Pθ(T ∈N) = 0 for all θ ∈Ω. . σ2). It is also j complete. the p. Xn). θ).d. j = 1. it sufﬁces to restrict ourselves in the class of unbiased statistics of θ which are functions of T alone. Thus X1. . θ ∈Ω ⊆ . . it follows.’s from B(1.v. . Here is another example which serves as an application to both Rao– Blackwell and Lehmann–Scheffé theorems. r. θ). . Then if σ is known and μ = θ. and let F = {f(·. 1). . Set C = {g(·. where φ(T) = U(T) − V(T). Now X = (1/n)T is an unbiased statistic for θ and hence. then U(t) = V(t) (except perhaps on a set N of t’s such that Pθ(T ∈N) = 0 for all θ ∈Ω). since it is unbiased for θ. THEOREM 5 (Uniqueness theorem: Lehmann–Scheffé) Let X1. . θ ∈Ω}. θ ∈ Ω. θ ∈ Ω. 3. by Theorem 5.f. Then by completeness of C.d. ▲ Ω DEFINITION 4 [ ( ) ( )] or Eθ φ T = 0. .’s from N(μ. . . equivalently. θ ∈(0. .f. Let T = (T1. UMV unbiased for θ. Xn be i.d. Then.1 11. . r.v. Tm)′. X = (1/n)T is UMV unbiased for θ.i. Xn be i. . j = 1. we have that T = ∑n= 1 Xj is a sufﬁcient statistic for θ. X2. θ ∈Ω} and assume that C is complete. . for example. r.v. m. Let μ be known and without loss of generality set μ = 0 and σ2 = θ.i. By the unbiasedness of U and V. We 2 have then that Eθ(Xj) = θ and σ θ(Xj) = θ2.’s with p. PROOF Eθ U T − V T = 0. by Example j j 8. . by Theorem 5.d. . f(·. by Example 9. . by Example 8.v. by Theorem 5. x > 0. Tj = Tj(X1.’s from the Negative Exponential p. .d. X3 be i. θ) = 1/θe−x/θ.d. of the X’s becomes f(x. .i. Since T is also complete (by Theorem 8 below) and S2 = (1/n)T is unbiased for θ. .

f. then use Exercise 11. of the X’s is given by n ⎡ ⎤ f x1 . θ ∈ Ω. θ ∈Ω and also h(x) > 0 for x ∈S. it is clear that. Xn are i. (One.’s: One-Dimensional Parameter Case A large class of p. ∞ . Thus Eθ(X1|T) = T/3 which is what we were after. θ ∈ 0. then the joint p. . one has that their common value is T/3. it sufﬁces to Rao–Blackwellize X1.276 11 Sufﬁciency and Related Theorems 8 below) that T = X1 + X2 + X3 is a sufﬁcient statistic for θ and it can be shown that it is also complete. . θ). . . . θ = C θ e hx.2(i) and Example 10 to show that X is the (essentially) unique UMV unbiased statistic for θ.f. . by symmetry. which is independent of θ. To actually ﬁnd the UMV unbiased statistic for θ.v.4 The Exponential Family of p.1. θ) as above.d. (1) ( ) () () ( ) where C(θ) > 0. To this end. x ∈ . θ ∈Ω ⊆ . 3 ⎝ 3⎠ ( ) ( ) Exercises 11. one sees that ⎛T ⎞ Eθ ⎜ ⎟ = θ ⎝ 3⎠ and ⎛T ⎞ θ2 σ θ2 ⎜ ⎟ = < θ 2 .d. . f(·. j = 1. n. ⋅ ⋅ ⋅ . one has Eθ(X1|T) = Eθ(X2|T) = Eθ(X3|T).3.1 If X1. the set of positivity of f(x.3.d.i.2 Refer to Example 15 and. show that X is the (essentially) unique UMV unbiased statistic for θ. of course. θ = C n θ exp ⎢Q θ ∑ T x j ⎥ h x1 ⋅ ⋅ ⋅ h xn . Since X1 is not a function of T.d.’s depending on a real-valued parameter θ is of the following form: Q (θ )T ( x ) f x. ⋅ ⋅ ⋅ . ( ) () () ( ) ( ) ( ) (2) . Some illustrative examples follow.d. xn . Since also their sum is equal to Eθ(T|T) = T.f.’s with p. . 11. Xn is a random sample of size n from P(θ). and ( ) ∑e x ∈S S Q θ T x () ( ) hx () C −1 θ = ∫ e () Q θ T x () ( ) h x dx () for the continuous case. r. ⎢ ⎥ j =1 ⎣ ⎦ x j ∈ . by utilizing the appropriate transformation. 11. If X1. It follows that C −1 θ = for the discrete case. arrives at the same result by using transformations.) Just for the sake of verifying the Rao–Blackwell theorem. one then knows that X1 is not the UMV unbiased statistic for θ.f.

f.11. θ). Then the completeness of the families established in Examples 9 and 10 and the completeness of the families asserted in the ﬁrst part of Example 12 and the last part of Example 14 follow from the above theorem.d. θ ∈ Ω ⊆ given by (1) and set C = {g(·.f. σ2). ∞ . ⎝ 2θ ⎠ 2πθ 1 ( ) ( ) and hence it is again of the exponential form with Cθ = () 1 2πθ . θ ∈ 0. More precisely. can also be written as follows. . . () If the parameter space Ω of a one-parameter exponential family of p. 1 −θ ⎝ x⎠ () () () Let now the p. Qθ =− () 1 . In connection with families of p. 2⎟ ⎝ 2σ ⎠ 2πσ 1 Qθ = () θ .’s contains a non-degenerate interval. h x = exp⎜ − x ⎟.4 The Exponential Family of p. θ ∈Ω}. θ) is the p. θ).f.d. θ ∈ 0.f. θ = 1 − θ ( ) ( ) n ⎡⎛ θ ⎞ ⎤⎛ n⎞ exp⎢⎜ log I A x . we have f x. 2⎟ ⎝σ ⎠ ⎝ 2σ 2 ⎠ ⎝ 2σ ⎠ 2πσ 1 and hence is of the exponential form with Cθ = () ⎛ θ2 ⎞ exp⎜ − . θ ∈ . provided Ω contains a non-degenerate interval. . 1.’s of the one-parameter exponential form.d.d.d. n}.1 Sufﬁciency: Deﬁnition and Some Basic Results 277 EXAMPLE 16 Let ⎛ n⎞ f x. where g(·. THEOREM 6 Let X be an r.d. θ = ⎜ ⎟ θ x 1 − θ ⎝ x⎠ ( ) ( ) n− x IA x .f. the following theorem holds true. be N(μ. it can be shown that the family is complete. then we have () () f x. of T(X). . Then if σ is known and μ = θ. This p. . h x = ⎜ ⎟ I A x .f. θ = ( ) 2⎞ ⎛ 1 exp⎜ − x − μ ⎟ .v. Q θ = log EXAMPLE 17 () ( ) ) n () ⎛ n⎞ θ . ⎝ 2σ 2 ⎠ 2 If now μ is known and σ = θ. ⎟ x⎥ 1 − θ ⎠ ⎥⎜ x ⎟ ⎢⎝ ⎣ ⎦⎝ ⎠ () ( ) and hence is of the exponential form with C θ = 1 − θ .d.’s: One-Dimensional Parameter Case 11. θ = ( ⎛ θ2 ⎞ ⎛ θ ⎞ ⎛ 1 2⎞ exp⎜ − exp⎜ 2 x⎟ exp⎜ − x ⎟. the following result can be proved. Then C is complete. σ2 ⎛ 1 2⎞ T x = x. with p. () where A = {0. T x = x−μ 2θ () ( ) 2 and h x = 1.f. 1 . f x. T x = x. f(·.

i.278 11 Sufﬁciency and Related Theorems THEOREM 7 Let X1.f. . θ = ∑ C n θ exp ⎢Q θ ∑ T x j ⎥∏ h x j ⎢ ⎥ j =1 j =1 ⎣ ⎦ ⎡ n ⎤ Q (θ ) t Q θ t = Cn θ e ∑ ⎢∏ h x j ⎥ = C n θ e ( ) h * t .v. ⎪ x = y . xn. and then so is T*.d. of T* is of the form g t. . We set Y1 = ∑n= 1 T(Xj) and let j Yj = Xj. xn)′ for which ∑n= 1T(xj) = t. Xn be i. . j ⎩ j ( ) where we assume that y = T(x) is one-to-one and hence the inverse T −1 exists. j j ⋅ ⋅ ⋅ . j = 2. . . . . . ⋅ ⋅ ⋅ . ⎢ ⎥ ⎣ j =1 ⎦ ( ) () () ( ) ( ) () ( ) () () where ⎡ n ⎤ h * t = ∑ ⎢∏ h x j ⎥.f. . PROOF i) This is immediate from (2) and Theorem 1. j = 2. . of the one-parameter exponential form. ⎢ ⎥ ⎣j = 1 ⎦ () ( ) Next. Then consider the transformation n n ⎧ ⎧ ⎪ y1 = ∑ T x j ⎪T x1 = y1 − ∑ T y j j =1 j =2 ⎪ ⎪ ⎪ ⎪ hence ⎨ ⎨ ⎪ y = x . Then the proof is carried out under certain regularity conditions to be spelled out. . ii) First. Thus j n ⎡ ⎤ n g t . n. . Then we have g(t. . . let the X’s be of the continuous type. . . . () where the set of positivity of h*(t) is independent of θ. Next. Then i) T* = ∑n= 1 T(Xj) is a sufﬁcient statistic for θ.’s with p. suppose that the X’s are discrete. n. j ii) The p. j = 2. . θ) = Pθ (T* = t) = ∑f(x1. . r. . where the summation extends over all (x1. n. ⎪ j ⎪ j ⎪ ⎪ ⎩ ⎩ ( ) ( ) ( ) and thus n ⎧ ⎡ ⎤ −1 ⎪ x1 = T ⎢ y1 − ∑ T y j ⎥ ⎨ ⎢ ⎥ j =2 ⎣ ⎦ ⎪ x = y . θ). .d.d. j = 2. θ = C n θ e ( ) () Q θ t () h* t . n.

Then the following result concerning the completeness of C follows from Theorem 6. . REMARK 5 We next set C = {g(·. n. THEOREM 8 The family C = {g(·. θ = C n θ exp Q θ y1 − T y2 − ⋅ ⋅ ⋅ − T yn 2 n ( ( ) { ( )[ ( ) + T ( y ) + ⋅ ⋅ ⋅ + T ( y )]} ( ) j × h T −1 y1 − T y2 − ⋅ ⋅ ⋅ − T yn = Cn θ e n { [ ( ) () Q θ y1 () h T −1 y1 − T y2 − ⋅ ⋅ ⋅ − T yn { [ ( )]}∏ h( y ) J n j =2 ( ) ( )]} × ∏ h yj J . Therefore. . j. .11. the joint p.f. ▲ The above proof goes through if y = T(x) is one-to-one on each set of a ﬁnite partition of . n and ∂xj/∂yi = 0 for 1 < i. θ) is the p. .d. yn. provided Ω contains a non-degenerate interval. j =2 ( ) So if we integrate with respect to y2. by t. . ∂y1 T ′ T −1 z [ ( )] where z = y1 − ∑ T y j . . . .1 Sufﬁciency: Deﬁnition and Some Basic Results 279 ∂x1 1 = .d.’s: One-Dimensional Parameter Case 11. we have T ′ yj ∂x1 1 ∂z =− . of Y1.d. . 3. . . i ≠ j. .f.4 The Exponential Family of p. set h * y1 = ∫ ⋅ ⋅ ⋅ ∫ h T −1 y1 − T y2 − ⋅ ⋅ ⋅ − T yn −∞ −∞ ( ) ∞ ∞ { [ ( ) ( )]} × ∏ h y j J dy2 ⋅ ⋅ ⋅ dyn . where g(·. ⋅ ⋅ ⋅ . . . Yn is given by g y1 . Now as a consequence of Theorems 2. . Since for j = 2. we arrive at the desired result. θ ∈ Ω}. θ ∈Ω} is complete. j =2 n ( ) provided we assume that the derivative T ′ of T exists and T′[T −1(z)] ≠ 0. j =2 n ( ) and replace y1.f. we obtain the following result. = ∂y j T ′ T −1 z ∂y j T ′ T −1 z [ ( )] 1 −1 ( ) [ ( )] and ∂x j =1 ∂y j for j = 2. we have that J= T′ T [ (z)] ) = T′ T { [ y − T ( y ) − ⋅ ⋅ ⋅ − T ( y )]} −1 1 2 n 1 . yn . 7 and 8. of the sufﬁcient statistic T*. .

v.i. Then X= are independent. σ2). σ2).d. and the set of positivity of its p.’s from N(μ. X is of the one-parameter exponential form and identify the various quantities appearing in a one-parameter exponential family.f.f. Xn be i. . . by Theorem 7(i).280 11 Sufﬁciency and Related Theorems THEOREM 9 Let the r. r. show that the distribution of the r. For the converse. T* is sufﬁcient for θ. if the distribution of a statistic V does not depend on θ. is independent of θ.v. it follows. if and only if the distribution of V does not depend on θ. because P[∑n= 1 (Yj j j − Y )2 ∈ B] is equal to the integral of the joint p. iii) X is distributed as Gamma with β known. Thus. ▲ Exercises 11.f. of the X’s is of the one-parameter exponential form and T = X is both sufﬁcient for θ and complete.d.v.d. . ⋅ ⋅ ⋅ . by Theorem 9. from a p. ii) X is distributed as Negative Binomial.i. . X1.d. X n = ∑ X j − X . we have ∑ (X j =1 n j −X ) = ∑ (Y − Y ) . that V and T* are independent. ▲ APPLICATION Let X1.f. we have that the family C of the p. by Theorem 7(ii). of the one-parameter exponential form and let T* be deﬁned by (i) in Theorem 7.1 In each one of the following cases. The proof is completed. . by Theorem 8. σ2) implies that Yj = Xj − θ is N(0. . 2 2 j j =1 n But the distribution of ∑n= 1 (Yj − Y )2 does not depend on θ. . 2 j =1 ( ) n ( ) Then V and T will be independent. it follows that the distribution of V is independent of θ. Let V = V X1 . Now Xj being N(θ. by Theorem 3. Xn be i. does not depend on θ. Thus the assumptions of Theorem 2 are satisﬁed and therefore. PROOF 1 n ∑Xj n j =1 and S2 = 1 n ∑ Xj − X n j =1 ( ) 2 We treat μ as the unknown parameter θ and let σ2 be arbitrary (>0) but ﬁxed. .4. if V is any statistic which is independent of T*.d.f.’s of T* is complete. of the Y’s over B and this p.d. Then. i) X is distributed as Poisson. if V is any other statistic. .f. Since Y = X − θ.d.d. PROOF In the ﬁrst place. Then the p. it follows that V and T* are independent if and only if the distribution of V does not depend on θ.

r and ∑r= 1 xj = n}. .f. which is θ independent of θ. EXAMPLE 18 Let X = (X1. .v. iv′) X is distributed as Beta with α known. . Then f x1 . the set of positivity of f(·. xr)′ ∈ r. of the X’s. xr . . . . Xk be i.f.f. Xr)′ have the multinomial p. . r. θ = (θ1.v. xr .’s? 11. k. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . j iii) Is f(·. θ) given by f x. 11. θ > 0 . .5 Some Multiparameter Generalizations Sufﬁciency: Deﬁnition and Some Basic Results 281 iii′) X is distributed as Gamma with α known. 1 − θ1 − ⋅ ⋅ ⋅ − θ r −1 ⎟ x1! ⋅ ⋅ ⋅ xr ! ⎝ j =1 ⎠ ( ) where A = {(x1. . Thus this p.i. θ r −1 = 1 − θ1 − ⋅ ⋅ ⋅ − θ r −1 ( ) ( ) n ⎛ r ⎞ θj n! × exp⎜ ∑ x j log × I A x1 . ii) Completeness in the Beta and Gamma distributions when one of the parameters is unknown and the other is known. or that the p. · · · . C(θ) > 0. .f. 12 (for μ = θ and σ known). j is of exponential form with . 15.d. ∞ ) x . . .3 Use Theorems 6 and 7 to discuss: ii) The completeness established or asserted in Examples 9..1 11.f. Xn be i.4. .i. ⎢ ⎥ ⎣j =1 ⎦ ( ) () () () () where x = (x1.’s with p.f.2 Let X1. i) Show that f(·. . belongs to the r-parameter exponential family if it is of the following form: ⎡ r ⎤ f x.5 Some Multiparameter Generalizations Let X1. xj ∈ . θ = C θ exp ⎢∑ Q j θ Tj x ⎥ h x . . f(·.f. . j = 1. . 11. . θ) a member of a one-parameter exponential family of p.d. The following are examples of multiparameter exponential families.4. iv) X is distributed as Beta with β known. . . .d. ii) Show that ∑n= 1 X γj is a sufﬁcient statistic for θ . θ ∈ Ω and h(x) > 0 for x ∈ S. θ).’s and set X = (X1.11.d. Xk)′.d. γ > 0 ⎠ ( ) () (known). θ1 . xk)′. . . . θ = ⎛ xγ γ γ −1 x exp⎜ − θ ⎝ θ ⎞ ⎟ I (0 . . . We say that the joint p.d.d.d. r. θ) is indeed a p. ⋅ ⋅ ⋅ . 10. xj ≥ 0. . . θr)′ ∈ Ω ⊆ Rr. j = 1.d. . k ≥ 1. . . of X. .

v. . . . θ = (θ1. . θr)′ ∈ Ω ⊆ r. j = 1.5. Tj x1 . ⋅ ⋅ ⋅ . . show that the p. Finally. j = 1. j = 1. X2)′ is distributed as Bivariate Normal with parameters as described in Example 4. .282 11 Sufﬁciency and Related Theorems C θ = 1 − θ1 − ⋅ ⋅ ⋅ − θ r −1 . of X is not of an exponential form regardless of whether one or both of α. β).v.d. r.3 Use the not explicitly stated multiparameter versions of Theorems 6 and 7 to discuss: . Q1 θ = 1 . This point will not be pursued here. Again.2 If the r. . xr = x j .f. β are unknown.v. . ⋅ ⋅ ⋅ . θ 1 . . . Xn). Ur)′. . 2θ 2 ⎠ 2θ 2 θ2 ⎝ 2πθ 2 1 () () T2 x = − x 2 () and h x = 1. xr . θ). . is of exponential form with Cθ = () ⎛ θ2 ⎞ 1 θ exp⎜ − 1 ⎟ . . Uj = Uj(X1. multiparameter versions of Theorems 4–9 may be formulated but this matter will not be dealt with here.f. .i. () For multiparameter exponential families. 11. appropriate versions of Theorems 6. ⋅ ⋅ ⋅ . Exercises 11. ii) X is distributed as Beta. . .’s with p.f. x1! ⋅ ⋅ ⋅ xr ! ( ) Let X be N(θ1. r − 1. . . iii) X = (X1. if X1. the r-dimensional statistic U = (U1. Q2 = .5.d. however. θ2). . . not necessarily of an exponential form. r for all θ ∈ Ω. i) X is distributed as Gamma. X and the random vector X is of the multiparameter exponential form and identify the various quantities appearing in a multiparameter exponential family. xr = EXAMPLE 19 ( ) n! I A x1 . . 7 and 8 are also true. . θ 2 = ( ) ⎛ θ2⎞ ⎛θ 1 2⎞ exp⎜ − 1 ⎟ exp⎜ 1 x − x ⎟. is said to be unbiased if EθUj = θj. f x. Xn are i.5. X is distributed as U(α. . Then. 2θ 2 ⎠ ⎝ θ2 2πθ 2 ⎝ 2θ 2 ⎠ 1 and hence this p. 11. f(·. r. T1 x = x.d. show that the distribution of the r.1 In each one of the following cases. 1 − θ1 − ⋅ ⋅ ⋅ − θ r −1 ( ) h x1 . ⋅ ⋅ ⋅ . . Q j θ = log and () ( ) n () θj .d.

.1 Exercises Sufﬁciency: Deﬁnition and Some Basic Results 283 ii) The completeness asserted in Example 15 when both parameters are unknown. Dose Number of animals used (n) Number of deaths (Y) x1 x2 . (REMARK In connection with the probability p(x) given above. . . β ∈ are unknown parameters. . Yk constitutes an exponential family. Yk are independent r.’s. 11. ii) The statistic ′ k ⎛ k ⎞ Yj . β)′. In an experiment. k different doses of the drug are considered.. .. nk Y1 Y2 .4 (A bio-assay problem) Suppose that the probability of death p(x) is related to the dose x of a certain drug in the following manner px = () 1+e 1 . ∑ x j Yj ⎟ ⎜∑ ⎝ j =1 j =1 ⎠ is sufﬁcient for θ = (α.v. − (α + βx ) where α > 0. . The resulting data can be presented in a table as follows. each dose is applied to a number of animals and the number of deaths among them is recorded. . . nk are known constants. Yj is distributed as B(nj. . Y1. . Yk x1. Y2. xk and n1. . . . see also Exercise 4. . ii) Completeness in the Beta and Gamma distributions when both parameters are unknown.. Then show that: ii) The joint distribution of Y1.. p(xj)). . x2.1. Y2.) .8 in Chapter 4. ..5.11. xk n1 n2 .. n2.

we might be interested in estimating some function of θ. θ is generally unknown. If θ is known. or more generally. . however.) 284 .1 Let X1. θ For simplicity and by slightly abusing the notation.f. . .f.d. g(θ ). .d. The value U(x1. . . . Then the problem of estimating θ arises. r. with p. Let θ X1. which one of the estimators ¯ X1. f(·.’s with p. r. θ ). .’s having the Cauchy distribution with σ = 1 and μ unknown. Then DEFINITION 1 Any statistic U = U(X1. . f(·. xn) of U for θ θ the observed values of the X’s is called an estimate of g(θ ).1.v. θ say. Xn) which is used for estimating the unknown quantity g(θ ) is called an estimator of g(θ ). the terms estimator and estimate are often used interchangeably.v.284 12 Point Estimation Chapter 12 Point Estimation 12. θ ). ¯ (Hint: Use the distributions of X1 and X as a criterion of selection.i. in principle. . .1 Introduction Let X be an r. . Suppose you were to estimate μ. Xn be i. Xn be i.d. In practice. . we can Ω calculate. X would you choose? Justify your answer. where g is (measurable and) usually a real-valued function.i. Exercise 12. .v. where θ ∈Ω ⊆ r. .d. all probabilities we might be interested in. We now proceed to deﬁne what we mean by an estimator and an estimate of g(θ ). .

one would choose the one with smaller θ variance.12. . Thus if U = U(X1. . This suggests that a second desirable criterion (that of variance) would have to be superimposed on that of unbiasedness.3 The Case of Availability of Complete Minimum Variance 285 12. Minimum Variance From Deﬁnition 1. of U1 (for a fixed θ ).d. Xn) is called an unbiased estimator of g(θ ) if Eθ U(X1. θ the average value (expectation under θ ) of U is equal to g(θ ).1 (a) p. DEFINITION 2 Let g be as above and suppose that it is real-valued. Thus. . . ) h2(u. According to Deﬁnition 2.Sufﬁcient Statistics 12. Then the estimator U = U(X1.f. According to this criterion. 12. . . h1(u.2 Criteria for Selecting an Estimator: Unbiasedness. . Xn) is an unbiased estimator of g(θ ). too large. (b) p. . Xn) = θ g(θ ) for all θ ∈ Ω. if U = U(X1.) The reason for doing so rests on the interpretation of variance as a measure of concentration about the mean. . g(θ ) is said to be estimable if θ it has an unbiased estimator. . . ) 0 g( ) u 0 g( ) (b) u (a) Figure 12. as a rule. Thus the question arises as to how a class of estimators is to be selected. .f. In this chapter. Xn) is an unbiased estimator of g(θ ). one could restrict oneself to the class of unbiased estimators. then by Tchebichev’s θ inequality. DEFINITION 3 Pθ U − g θ ≤ ε ≥ 1 − [ () ] 2 σ θU . . . it is obvious that in order to obtain a meaningful estimator of g(θ ). this class is.1. one would have to choose that estimator from a speciﬁed class of θ estimators having some optimal properties. no matter what θ ∈ Ω is. then. among two estimators of g(θ ) which are both unbiased. ε2 2 Therefore the smaller σ θU is. . we will devote ourselves to discussing those criteria which are often used in selecting a class of estimators. . A similar interpretation can be given θ by means of the CLT when applicable. θ Although the criterion of unbiasedness does specify a class of estimators with a certain property. The interest in the members of this class stems from the interpretation of the expectation as an average value. of U2 (for a fixed θ ). (See Fig.d. the larger the lower bound of the probability of concentration of U about g(θ ) becomes.2 Criteria for Selecting an Estimator: Unbiasedness. θ Let g be as above and suppose it is real-valued. .

286 12 Point Estimation Following this line of reasoning. .3. the Cramér–Rao inequality. That is. In the second method just described. Lest we give the impression that UMVU estimators are all-important. within this restricted class.3. Using the second approach. . . The second approach is appropriate when a complete sufﬁcient statistic is not readily available. . θ ∈ Ω = (0.2. one would search for an estimator with the smallest variance. is instrumental.’s from U(θ1. . θ In many cases of interest a UMVU estimator does exist. if U1 = U1(X1. Once one decides to restrict oneself to the class of all unbiased estimators with ﬁnite variance.) It is more effective. . one would restrict oneself ﬁrst to the class of all unbiased estimators of g(θ ) and next to the subclass of unbiased estimaθ tors which have ﬁnite variance under all θ ∈ Ω. . the corollary to Theorem 2.d.2–12.v. r. In discussing Exercises 12.4 below. θ1 < θ2 and ﬁnd unbiased estimators for the mean (θ1 + θ2)/2 and the range θ2 − θ1 depending only on a sufﬁcient statistic for (θ1.3 Let X1. 12. 12.11 and 12. .2. and then would try to determine an estimator whose variance is equal to this lower bound.2 Let X1. we have the following deﬁnition. where the UMVU estimators involved behave in a rather ridiculous fashion. θ2)′. θ ). . . Find unbiased estimators of the mean and variance of the X’s depending only on a sufﬁcient statistic for θ. Xn be independent r. to be established below.1 Let X be an r. Exercises 12. ∞). An estimator U = U(X1.12. DEFINITION 4 Let g be estimable. . refer to Example 3 in Chapter 10 and Example 7 in Chapter 11. the problem arises as to how one would go about searching for a UMVU estimator (if such an estimator exists). .’s distributed as U(0.v.v. . Then. (Regarding sufﬁciency see.2. then σ θ U1 ≥ σ θ U for all θ ∈ Ω. There are two approaches which may be used. Show that there is no unbiased estimator of g(θ ) = 1/θ based on X. θ ). . distributed as B(n.2. . Xn be i. θ2). . The ﬁrst is appropriate when complete sufﬁcient statistics are available and provides us with a UMVU estimator. in that it does provide a lower bound for the variances of all unbiased estimators regardless of the existence or not of a complete sufﬁcient statistic.2.i. . Xn) is any other unbiased estimator 2 2 of g(θ ). however. one would ﬁrst determine a lower bound for the variances of all estimators in the class under consideration. Formalizing this. we refer the reader to Exercises 12. Xn) is said to be a uniformly minimum variance unbiased (UMVU) estimator of g(θ ) if it is unbiased and θ has the smallest variance within the class of all unbiased estimators of g(θ ) θ under all θ ∈ Ω.

i. . let T = (T1.d. Also ﬁnd the value of c for which the variance of U is minimum. . U say.4 Let X1.6 Let X1. . Set φ(T) = Eθ(U|T). r. . To this end. Chapter 11) (or more precisely. . . Xn be i. . θ ∈ Ω = . respectively. 12. . Then Rao– Blackwellize it and obtain φ(T). j = 1.v. the unbiased estimator φ(T) is the one with uniformly minimum variance in the class of all unbiased estimators. . Then. . Thus in the presence of a sufﬁcient statistic. Namely. 12. . . . both ¯ unknown. Xn be i.d. Then show that (X(1) + X(n))/2 is an unbiased estimator of θ. . where g is assumed to be real-valued.3 The Case of Availability of Complete Sufﬁcient Statistics The ﬁrst approach described above will now be looked into in some detail.12. . r. Tm)′. . 12.2. . .2. . .2. in searching for a UMVU estimator of g(θ). U = cX + (1 − c)Y is an unbiased estimator of θ. provided an unbiased estimator with ﬁnite variance exists. Yn be two independent random samples with the same mean θ and known variances σ 2 and σ 2.3 The Case of Availability of Complete Sufﬁcient Statistics 287 12. .’s from the U(θ.2. be a statistic which is sufﬁcient for θ and let U = U(X1. r.’s with mean μ and variance σ 2. . Xn). θ ∈ Ω = (0. 2θ ). . Chapter 11) (or rather.v. .d. Xn) be an unbiased estimator of g(θ). by the Lehmann–Scheffé theorem (Theorem 5. Notice that the method just described not only secures the existence of a UMVU estimator.i. φ(T) is also an unbiased estimator of g(θ) and θ 2 2 furthermore σ θ (φ) ≤ σ θ U for all θ ∈Ω with equality holding only if U is a φ Ω function of T (with Pθ-probability 1).’s from the Double Exponential distribu1 tion f(x. Tj = Tj(X1. one starts out with any unbiased estimator of g(θ) with θ ﬁnite variance. . m. Thus we have the following result. . Then show 1 2 ¯ ¯ that for every c ∈ [0. 1]. Then by the θ Rao–Blackwell theorem (Theorem 4. . It is essentially unique in the sense that any other UMVU estimators will differ from φ(T) only on a set of Pθ -probability zero for all θ ∈ Ω. . .7 Let X1.v. . This is the required estimator. the Rao–Blackwell theorem tells us that.i. but also produces it. θ ) = 2 e−|x−θ|. ∞) distribution and set U1 = n+1 X 2 n + 1 (n ) and U 2 = n+1 2 X ( n ) + X (1 ) . assuming that such an estimator exists. an obvious modiﬁcation of it). an obvious modiﬁcation of it). it sufﬁces to restrict ourselves to the class of those unbiased θ estimators which depend on T alone. . . Xn be i. . . 12. . 5n + 4 [ ] Then show that both U1 and U2 are unbiased estimators of θ and that U2 is uniformly better than U1 (in the sense of variance). Xm and Y1. . Next. assume that T is also complete.5 Let X1. Then show that X is the minimum variance unbiased linear estimator of μ. .

Then φ(T) is a UMVU estimator of g(θ) and is essentially unique. if it exists. . On the basis of r independent r. if we set p = θ. θ Tm)′. . The variance of the X’s is equal to pq. Thus U is an unbiased estimator of g(θ). . . Set . so that T = ∑jr= 1 Xj is B(nr. n − 1 j =1 EXAMPLE 1 ( ) then Eθ U = g(θ).288 12 Point Estimation THEOREM 1 Let g be as in Deﬁnition 2 and assume that there exists an unbiased estimator U = U(X1.’s from B(1. . distributed as B(n. r are independent B(n. . Now the r. θ may represent the probability of an item being defective. the problem is that of ﬁnding a UMVU estimator for g(θ). Xn be i. j = 1. By setting j T = ∑n= 1 Xj. Set φ(T) = Eθ (U|T). . θ This theorem will be illustrated by a number of concrete examples. . Xn) of g(θ) with ﬁnite variance. . . θ ). sufﬁcient statistic for θ by Examples 6 and 9 in Chapter 11. . . Therefore U is a UMVU estimator of the variance of the X’s according to Theorem 1. θ) and set 2 ⎛ n⎞ g θ = Pθ X ≤ 2 = ∑ ⎜ ⎟ θ x 1 − θ x = 0 ⎝ x⎠ () ( ) ( ) n −x = 1−θ ( ) n + nθ 1 − θ ( ) n −1 ⎛ n⎞ + ⎜ ⎟θ 2 1 − θ ⎝ 2⎠ ( ) n −2 . n n ⎛1 n ⎞ = ∑ X j2 − nX 2 = ∑ X j − n⎜ ∑ X j ⎟ ∑ ⎝ n j =1 ⎠ j =1 j =1 j =1 because Xj takes on the values 0 and 1 only and hence X 2 = Xj. Furthermore.i. . if the rule for rejection is this: Choose at random n (≥2) items from the lot and then accept the entire lot if the number of observed defective items is ≤2. .v. . if it exists. Xr distributed as X. we would like to ﬁnd a UMVU estimator of g(θ ). For example. Therefore. sufﬁcient statistic for θ. when chosen at random from a lot of such items. We know that. Furthermore. . . Let X1. . EXAMPLE 2 Let X be an r. .v. .v. p) and suppose we wish to ﬁnd a UMVU estimator of the variance of the X’s.v. . let T = (T1.’s X1. so that U = . r. Xn). θ ). Then g(θ ) represents the probability of accepting the entire lot. we have then j n ( Xj − X ) 2 2 ∑ (X j =1 n j −X ) 2 =T − T2 T2⎞ 1 ⎛ T− . if U= 2 1 n ∑ Xj − X .’s Xj. . m be a sufﬁcient statistic for θ and suppose that it is also complete. . n n−1⎜ n⎟ ⎝ ⎠ But T is a complete. . . . . The problem is that of ﬁnding a UMVU estimator of g(θ ). . j = 1. if the experiment just described is repeated independently r times.d. θ ∈ Ω = (0. T is a complete. Tj = Tj(X1. 1) and g(θ) = θ (1 − θ ).

Then Eθ U = g(θ ) but it is not a function of T. X 2 + ⋅ ⋅ ⋅ + X r = t − 1 1 = 2. we have Eθ U T = t = Pθ U = 1T = t θ 1 ( ) ( = P (X = ) ) ≤ 2T = t = Pθ X 1 ≤ 2. X 1 + ⋅ ⋅ ⋅ + X r = t Pθ T = t ( ( ) ) ( +P ( X θ 1 Pθ X 1 = 0.12.3 The Case of Availability of Complete Sufﬁcient Statistics 289 ⎧1 if U=⎨ ⎩0 if X1 ≤ 2 X 1 > 2. X 2 + ⋅ ⋅ ⋅ + X r ) = t − 2)] = ( +P ( X θ 1 Pθ X 1 = 0 Pθ X 2 + ⋅ ⋅ ⋅ + X r = t Pθ T = t ( ( ) ( ) + Pθ X 1 = 1 Pθ X 2 + ⋅ ⋅ ⋅ + X r = t − 1 1 ) ( = 2)P ( X θ 2 ⎡⎛ nr ⎞ = ⎢⎜ ⎟ θ t 1 − θ ⎢ ⎣⎝ t ⎠ ( ) nr −t + nθ 1 − θ ( ) n −1 ⎛ n r − 1 ⎞ t −1 n ( r −1)−t +1 ⎜ ⎟θ 1 − θ ⎝ t −1 ⎠ ( ⎤ ⎥ ⎥ ⎣ ⎦ ⎢ −1 ) + ⋅ ⋅ ⋅ + X = t − 2)] ⎡ ⎛ n(r − 1)⎞ ( ⎢(1 − θ ) ⎜ ⎟ θ (1 − θ ) r n t n r −1 −t ) ⎝ t ⎠ ) ( ) ⎛ n⎞ +⎜ ⎟ θ 2 1 − θ ⎝ 2⎠ ( ) n −2 ⎛ n r − 1 ⎞ t −2 n ( r −1)−t + 2 ⎤ ⎥ ⎜ ⎟θ 1 − θ ⎥ ⎝ t−2 ⎠ ⎦ ( ) ( ) ⎡⎛ nr ⎞ = ⎢⎜ ⎟ θ t 1 − θ ⎢ ⎣⎝ t ⎠ ( ) nr −t ⎤ t ⎥ θ 1−θ ⎥ ⎦ −1 ( ) nr −t ⎛ n r − 1 ⎞ ⎛ n⎞ ⎛ n r − 1 ⎞ ⎤ + n⎜ ⎟ + ⎜ ⎟⎜ ⎟ ⎥. ⎝ t − 1 ⎠ ⎝ 2⎠ ⎝ t − 2 ⎠ ⎥ ⎦ ( ) ( ) ⎡⎛ n r − 1 ⎞ ⎢⎜ ⎟ ⎢⎝ t ⎠ ⎣ ( ) Therefore −1 ⎛ n r − 1 ⎞ ⎛ n⎞ ⎛ n r − 1 ⎞ ⎤ ⎛ nr ⎞ ⎡⎛ n r − 1 ⎞ φ T = ⎜ ⎟ ⎢⎜ ⎟ + n⎜ ⎟ + ⎜ ⎟⎜ ⎟⎥ ⎝ T ⎠ ⎢⎝ T ⎠ ⎝ T − 1 ⎠ ⎝2⎠ ⎝ T − 2 ⎠ ⎥ ⎣ ⎦ ( ) ( ) ( ) ( ) . X 2 + ⋅ ⋅ ⋅ + X r = t Pθ T = t ( )[ )[ ( ) + Pθ X 1 = 1. Then one obtains the required estimator by Rao–Blackwellization of U. To this end.

if we set j=1 n S2 = 2 1 n ∑ Xj − μ . EXAMPLE 4 Let X1. . it does not depend on T which is a complete. (See Exercise 12. n⎠ ⎝ ( ) ( ) () ( ) ( ) t so that ⎛ 1⎞ φ T = ⎜1 − ⎟ n⎠ ⎝ ( ) T is a UMVU estimator of e−λ. . So. EXAMPLE 3 Consider certain events which occur according to the distribution P(λ). that is. r. Xn be i. λ = θ . .v. Set T = ∑ X j .2(i) and Example 10 in Chapter 11.) Then ⎛ 1⎞ Eθ U T = t = Pθ X1 = 0 T = t = ⎜ 1 − ⎟ . . ⎛ nS ⎞ ⎛ nS ⎞ Eθ ⎜ ′ ⎟ = c n .i. However. 1/n). For this purpose we use the fact that the conditional distribution of X1. . Xn (n ≥ 2) be i. It remains then for us to Rao–Blackwellize U. sufﬁcient statistic for θ. according to Exercise 11. By Corollary 5. σ 2) with σ 2 unknown and μ known. Then the problem is that of ﬁnding a UMVU estimator of e−λ. . Eθ U = Pθ U = 1 = Pθ X1 = 0 = g θ . Then the probability that no event occurs is equal to e−λ. Let now X1.290 12 Point Estimation is a UMVU estimator of g(θ) by Theorem 1.’s from N(μ.3. Chapter 7. Then the expectation n – – Eθ(√nS/√θ ) can be calculated and is independent of θ . given T = t. is B(t. .2).1. so that √nS/√θ is distributed as χn. r. call it c′ (see Exercise n 12.’s from P(λ). U is an unbiased estimator of g(θ ).v. . g θ = e −θ j =1 n () and deﬁne U by ⎧1 if U=⎨ ⎩0 if Then X1 = 0 X 1 ≥ 1. n j =1 ( ) – – then nS2/θ is χ2 .3. ′ ⎝ θ ⎠ ⎝ cn ⎠ – Setting ﬁnally cn = c′n /√n.d.d. That is. we obtain .i.1. – Set σ 2 = θ and let g(θ) = √θ . We are interested in ﬁnding a UMVU estimator of σ. so that Eθ ⎜ ⎟ = θ. we have that 1/θ∑ n (Xj − μ)2 is χ 2 .

Chapter 11.3 The Case of Availability of Complete Sufﬁcient Statistics 291 ⎛S⎞ Eθ ⎜ ⎟ = θ . g2(θ) = σ 2. respectively.12. S2)′ is a sufﬁcient statistic for θ. Then by setting g(θ) = μ + σ Φ−1(1 − p). S/cn is an unbiased estimator of g(θ ). Φ−1(1 − p) is a uniquely determined number. Chapter 11. sufﬁcient statistic (X. Since this estimator depends on the complete. one has Pθ (X1 ≥ ξp) = p. it follows that they are UMVU estimators. (See Example 12. since p is given. . it is complete. EXAMPLE 6 Let X1. . . σ 2) with both μ and σ 2 unknown. and set ξp for the upper pth quantile of the distribution (0 < p < 1). Chapter 11) S2 alone. S 2)′. EXAMPLE 5 Let again X1. By setting θ θ S2 = 2 1 n ∑ Xj − X .’s from N(μ.3(ii). σ 2) with both μ and σ 2 unknown. . r. r. . ⎝σ ⎠ Therefore ⎛ nS 2 ⎞ Eθ ⎜ = σ 2. ⎝ σ ⎠ Hence ξp − μ = Φ −1 1 − p and ξ p = μ + σ Φ −1 1 − p . . Set θ = (μ. Let θ ( ) ( ) .5.i. n − 1⎟ ⎝ ⎠ So U1 and U2 are unbiased estimators of μ and σ 2. σ2 )′.) Let U1 = X and U2 2 = nS /(n − 1). The problem is that of ﬁnding a UMVU estimator of ξp. Here θ = (μ. ⎛ nS 2 ⎞ Eθ ⎜ 2 ⎟ = n − 1. Since they ¯ depend only on the complete. Clearly. Xn be i. σ Of course. it follows that S/cn is a UMVU estimator of σ. . Xn be i. We are interested in ﬁnding UMVU estimators for each one of μ and σ 2. n j =1 ( ) ¯ we have that (X. .v. Eθ U1 = μ.d.) ¯ Furthermore. sufﬁcient statistic (see Example 8 and Exercise 11. our problem is that of ﬁnding a UMVU θ estimator of g(θ). σ ⎠ ⎝ σ ⎝ ⎠ ( ) so that ⎛ ξp − μ ⎞ Φ⎜ ⎟ = 1 − p. σ 2)′ and let g1(θ) = μ. ⎝ cn ⎠ that is. But ⎛ X − μ ξp − μ ⎞ ⎛ ξp − μ ⎞ Pθ X 1 ≥ ξ p = Pθ ⎜ 1 ≥ ⎟ = 1 − Φ⎜ σ ⎟ .d.i.’s from N(μ. By Remark 5 in Chapter 7.v. From the deﬁnition of ξp. (See Example 8.

σ2. . .d. θ ).3. f x.v. we have that Eθ U = g(θ). 12. Then the reliability of the equipment at time x.8 Let X1. n be independent random vectors having the Bivariate Normal distribution with parameter θ = (μ1. cn ( ) ¯ where cn is deﬁned in Example 4.2 Refer to Example 4 and evaluate the quantity c′n mentioned there. Find the UMVU estimator of g(θ ) = 1/θ and determine its variance.’s from P(θ ). Xn be i.i. .v. θ ∈Ω = (0. . of X1. σ 2 ). . . 1). that is. 12. sufﬁcient statistic (X. given T = t. r.d.1 Let X1.4 If X1. .v. it follows that U is a UMVU estimator of ξp. having the Geometric distribution.v. observe that the same is true if X1 is replaced by any one of the remaining X’s. .10 Let X be an r.3.i. . Xn be i.3. ∞).v. Furthermore. 12.f.3.d. S2)′.v. 12. r. Yj)′. . If X has the Negative Exponential distribution with parameter θ ∈Ω = (0.3. use Theorem 1 in order to determine the UMVU estimator of θ. 12.3 If X1. μ2.’s distributed as N(μ.9 Let (Xj.6 Let X be an r.3. ∞).’s from the Negative Exponential distribution with parameter θ ∈Ω = (0.v. 1. θ = θ 1 − θ . denoting the life span of a piece of equipment. Use Theorem 1 in order to determine the UMVU estimator of θ. Xn are i. .v. . j = 1. . . Exercises 12. . Since U depends only on the θ ¯ complete. Then show j that the conditional p. σ1. 12. . where both μ and σ2 are unknown. ﬁnd the UMVU estimator of the reliability R(x. ( ) ( ) x ( ) .i. 12.3. ρ)′.3. . r. . Then by the fact that Eθ X = μ and Eθ (S/cn) = σ (see Example 4). 12. Find the UMVU estimators of the following quantities: ρσ1σ2. . . show that X is the UMVU estimator of θ. r.292 12 Point Estimation U=X+ S −1 Φ 1− p .3. 12.’s distributed as N(θ. 12.v. ∞). is deﬁned as the probability that X > x.d.5 Let X1. ρσ2/σ1. . having the Negative Binomial distribution with parameter θ ∈ Ω = (0.11 Let X be an r. . θ ) on the basis of n observations on X. . Xn are i.i. Find the UMVU estimator of μ/σ. 1 . 1). . μ1μ2. by using ¯ Theorem 1.7 Let X1.3. Xn be independent r. Show that ¯ X 2 − (1/n) is the UMVU estimator of g(θ ) = θ 2. . . . θ ∈Ω = (0. is that of B(t.’s from P(λ) and set T = ∑n= 1 Xj. θ ∈ Ω = 0. . . 1). Xn be independent r. 1/n). .3. .’s from B(1. . R(x). x = 0.d.

3. 12.1.v.1 Regularity Conditions Let X be an r. Deﬁne U(X ) by U(X) = (−1)X. If that is possible.f. where θ ∈Ω = (0.3. and suppose that X is distributed as P(θ ). we ﬁrst establish a lower bound for the variances of all unbiased estimators and then we attempt to identify an unbiased estimator with variance equal to the lower bound found. Then it is assumed that iii) f(x.) 12. iii) (∂/∂θ ) f(x. θ ). θ ) is positive on a set S independent of θ ∈ Ω.2. one may use the approach described here in searching for a UMVU estimator.13 Use Example 11. the problem is solved again. θ ) exists for all θ ∈ Ω and all x ∈ S except possibly on a set N ⊂ S which is independent of θ and such that Pθ (X ∈ N ) = 0 for all θ ∈ Ω. ∞) is the number of calls arriving at the telephone exchange under consideration within a 15 minute period. According to this method. 12.5 is not UMVU.3. show that U(X) is a UMVU estimator of θ and conclude that it is an unreasonable one. f(·.12.2 is actually UMVU.14 Use Exercise 11.2. Then show that U(X) is the UMVU estimator of g(θ ) and conclude that it is an entirely unreasonable estimator. iv) ∫S ⋅⋅⋅ ∫S f ( x1 . iii) Ω is an open interval in (ﬁnite or not).12 Let X be an r.3 The Case of Availability of Complete Sufﬁcient Not Exist 293 and let U(X) be deﬁned as follows: U(X ) = 1 if X = 0 and U(X ) = 0 if X ≠ 0. which may be useful for comparison purposes.d. we do have a lower bound of the variances of a class of estimators. Then the number of calls which arrive at the given telephone exchange within 30 minutes is an r. denoting the number of telephone calls which arrive at a given telephone exchange. or it is not easy to identify them. Chapter 11. in order to show that the unbiased estimator constructed in Exercise 12. θ ∈ Ω ⊆ . Thus Pθ (Y = 0) = e−2θ = g(θ ). (Hint: Use Theorem 1. When such statistics do not exist.4 The Case Where Complete Sufﬁcient Statistics Are Not Available or May Not Exist: Cramér–Rao Inequality When complete.4. 12. We assume that Ω ⊆ and that g is real-valued and differentiable for all θ ∈ Ω. sufﬁcient statistics are available.v.v. 12. At any rate. θ ) ⋅ ⋅ ⋅ f ( xn .4.4 The Case Where Complete Sufﬁcient Statistics Are Not Available or May Statistics 12. θ )dx1 ⋅ ⋅ ⋅ dxn . as can be shown. By using Theorem 1. the problem of ﬁnding a UMVU estimator is settled as in Section 3. in order to conclude that the unbiased estimator constructed in Exercise 12. with p. Y distributed as P(2θ). Chapter 11. The following regularity conditions will be employed in proving the main result in this section.

) Let X1. Hence we need only consider the case where σ 2 U < ∞ and I(θ ) θ < ∞ for all θ ∈ Ω. iv) Eθ [(∂/∂θ )log f (X. θ ⎥ ∏ f xi .’s with p. xn f x1 . θ ) S may be differentiated under the integral or summation sign. Xn) is any unbiased estimator of g(θ ). the inequality is trivially true θ for those θ’s. X n S S ( = ∫ ⋅ ⋅ ⋅ ∫ U x1 . . Then we have the following theorem.294 12 Point Estimation or ∑ S ⋅⋅⋅ ∑ f ( x1 . . θ )⎥ ⎥ i i≠j ⎤ ⎦ ( ) ( ⎤ n ∂ f x j . (Cramér–Rao inequality. ∂θ ⎢ j =1 ⎣ ⎡n 1 = ⎢∑ ⎢ j =1 f x j . We have Eθ U X 1 . to be denoted by I(θ ). . θ ⎥ ∏ f xi . .d. xn )f ( x1 . . . . θ + ⎢ f =⎢ ⎣ ∂θ ⎦ j ≠1 ⎣ ∂θ ⎡∂ ×∏ f x j . . where U(X1. . is >0 for all θ ∈ Ω. θ ) S S THEOREM 2 may be differentiated under the integral or summation si gn. one has σ θ2 [g ′(θ )] U≥ nI θ 2 () . f(·.d. θ dx1 ⋅ ⋅ ⋅ dxn = g θ . . Now restricting ourselves to S. θ ⋅ ⋅ ⋅ f xn . θ ∂θ ⎡∂ ⎤ ⎡∂ f x1 . θ ) ⋅ ⋅ ⋅ f ( xn . θ )⎥ 2 ⎤ ⎦ j ( ) ( )⎥ ∏ f ( x .f. . respectively. r. . θ ⎣ ⎡n ∂ = ⎢∑ log f ⎢ ⎣ j =1 ∂θ ( θ )∏ f ( x . Xn) of g(θ ). . since the discrete case is treated entirely similarly with integrals replaced by summation signs. θ ∂θ ⎥ i =1 ⎦ ⎤ n x j . . . xn ) f ( x1 . θ ) ⎦ j ≠n ⎤ n ⎡ ∂ = ∑⎢ f xj . . . . θ )]2. θ ⎥∏ f x j . θ ⋅ ⋅ ⋅ f xn . . θ ) ⋅ ⋅ ⋅ f ( xn .i. θ + ⋅ ⋅ ⋅ + ⎢ f xn . . . . θ ) dx1 ⋅ ⋅ ⋅ dxn ∑ ⋅ ⋅ ⋅ ∑U ( x1 . we have ( ) )( ) ( ) () (1) ∂ f x1 . θ ) and assume that the regularity conditions (i)–(vi) are fulﬁlled. respectively. ⎥ ⎦ i =1 ( ) ( ) (2) ) ( ) . vi) or ∫S ⋅⋅⋅ ∫S U ( x1 . . Xn be i. . . PROOF If σ 2 U = ∞ or I(θ ) = ∞ for some θ ∈ Ω. θ j ≠2 ⎣ ∂θ [( ) ( )] ( ) ( ) ( x . θ ∈ Ω. θ . .v. . Then for any unbiased estimator U = U(X1. Also it sufﬁces to discuss the continuous case only. where g ′ θ = () dg θ dθ () . θ ) ⋅ ⋅ ⋅ f ( xn . .

θ dx1 ⋅ ⋅ ⋅ dxn = 1. ∂θ ⎪ ⎢ ⎥⎭ ⎣ j =1 ⎦⎪ ⎩ ( ) ( ) ( ) ( ) ( ) ( ) where we set Vθ = Vθ X 1 .4 The Case Where Complete Sufﬁcient Statistics Are Not Available or May Statistics 12. θ ⎥ . . θ ⎥ = 0. θ . θ ⎥ = ∑ σ θ2 ⎢ log f X j . S ( ) ( ) Therefore differentiating both sides with respect to θ by virtue of (iv). θ ⎥ ⎢ j = 1 ∂θ ⎥ j = 1 ⎣ ∂θ ⎦ ⎣ ⎦ ⎡∂ ⎤ 2 = nσ θ ⎢ log f X 1 . θ dx1 ⋅ ⋅ ⋅ dxn ∫S S ⎢ j = 1 ∂θ ⎥ i =1 ⎣ ⎦ n ⎫ ⎧ ⎡ ⎤⎪ ∂ ⎪ (3) = Eθ ⎨U X 1 . ⎣ ∂θ ⎦ ( ) ( ) ( ) so that ⎡∂ ⎤ Eθ ⎢ log f X 1 . ⎣ ∂θ ⎣ ∂θ ⎦ ⎦ ( ) 2 ( ) 2 (6) . j = 1 ∂θ n ( ) ∫S ⋅ ⋅ ⋅ ∫ f x1 . θ ⎥ = ∑ Eθ ⎢ log f X j . X n = ∑ Next. . xn ⎢∑ log f x j . θ ⎥ ⎢ j =1 ∂θ ⎥ j =1 ⎣ ∂θ ⎦ ⎣ ⎦ ⎡∂ ⎤ = nEθ ⎢ log f X 1 . . ( ) ( ) ( )( ) ( ) () (5) From (4) and the deﬁnition of Vθ. we obtain g′ θ = ∫ ⋅ ⋅ ⋅ () ⎡n ∂ ⎤ n U x1 . θ ⋅ ⋅ ⋅ f xn . . ⎡n ∂ ⎤ n 0 = ∫ ⋅ ⋅ ⋅ ∫ ⎢∑ log f x j . it follows that ( ) ( ) (4) Coυθ U . θ ⎥∏ f xi . . θ ⎥ ⎣ ∂θ ⎦ ( ) ( ) ( ) ⎡∂ ⎡∂ ⎤ ⎤ = nEθ ⎢ log f X 1 . . S S ⎢ ⎥ ⎣ j = 1 ∂θ ⎦ i =1 From (3) and (4). θ ⎥. . and employing (2). . . ⎣ ∂θ ⎦ ( ) Therefore ⎡n ∂ ⎤ n ⎡∂ ⎤ σ θ2Vθ = σ θ2 ⎢∑ log f X j . .12. θ ⎥∏ f xi . θ dx1 ⋅ ⋅ ⋅ dxn = Eθ Vθ . . ( ) ∂ log f X j .3 The Case of Availability of Complete Sufﬁcient Not Exist 295 Differentiating with respect to θ both sides of (1) on account of (vi) and utilizing (2). . it further follows that ⎡n ∂ ⎤ n ⎡∂ ⎤ 0 = Eθ Vθ = Eθ ⎢∑ log f X j . θ ⎥ = nEθ ⎢ log f X . Vθ = Eθ UVθ − EθU Eθ Vθ = Eθ UVθ = g ′ θ . θ ⎥ ⎬ = Eθ UVθ . X n ⎢∑ log f X j .

the fact that EθU = g(θ) and the deﬁnition of Vθ.) Returning to the proof of Theorem 2. X n j =1 n ( ) ( ˜ )∫ k(θ )dθ − ∫ g(θ )k(θ )dθ + h( X . relation (7) becomes [g ′(θ )] ( ) 2 2 θ ⎡∂ ⎤ ≤ σ U nEθ ⎢ log f X . . . where kθ =± ( )( ) (9) () σ θ Vθ . 1 n . . this is equivalent to Vθ = Eθ Vθ + k θ U − EθU with Pθ −probability 1. . see Exercises 12. Integrating (10) (with respect to θ ) and assuming that the indeﬁnite integrals ∫k(θ )dθ and ∫g(θ )k(θ )dθ exist. is called Fisher’s information (about θ ) number. Chapter 5). . Vθ θ θ θ and ρ (U.4. Xn) ∈ N] = 0 for all θ ∈ Ω.6 and 12. . . θ = U X 1 . ▲ DEFINITION 5 The expression Eθ[(∂/∂θ )log f(X. we have that equality holds in (8) if 2 2 2 and only if C oυθ(U. Vθ) ≤ 1. . . . Vθ = 2 θ ( ) ( ) (σ U )(σ V ) Coυ U . θ )]2. σ θU Furthermore. . we obtain log ∏ f X j . . . Vθ) = (σ θU)(σ θVθ) because of (7). because of (i). . equation (9) becomes as follows: n ∂ log ∏ f X j . (For an alternative way of calculating I(θ ). θ = k θ U X 1 . the exceptional set for which (9) does not hold is independent of θ and has Pθ-probability 0 for all θ ∈ Ω. ∂θ ⎣ ⎦ ( ) 2 or by means of (v). . .7. σ θ2U ≥ nEθ ∂ ∂θ log f X . which is equivalent to 2 2 C o υθ U . . Vθ ≤ σ θ U σ θ Vθ . θ )]2 is the information (about θ ) contained in the sample X1. . denoted by I(θ ). Xn. θ [( [g ′(θ )] ) 2 ( )] 2 [g ′(θ )] = nI θ 2 () .296 12 Point Estimation But ρθ U . X n − g θ k θ ∂θ j =1 ( ) () ( ) ()() (10) outside a set N in n such that Pθ[(X1. By Schwarz inequality (Theorem 2. (8) The proof of the theorem is completed.4. X ). nEθ[(∂/∂θ )log f(X. θ ⎥ . . 2 ( ) ( )( ) (7) Taking now into consideration (5) and (6). . Taking into consideration (4).

That is. COROLLARY [( )] If in Theorem 2 equality occurs for some unbiased estimator U = U (X1. . 1 n (11) Exponentiating both sides of (11). THEOREM 3 Let X1. · · · . . . such that U = g(θ ) + d(θ )Vθ except perhaps on a set of Pθ-probability zero for all θ ∈ Ω. . f(·. provided certain conditions are met.d.3 The Case of Availability of Complete Sufﬁcient Not Exist 297 ˜ where h(X1. () [ ()() ] () ) () ( Thus. . . . . x ) n j 1 n 1 n j =1 outside a set N in n such that Pθ[(X1. . . . Then σ θ U is equal to the Cramér–Rao bound if and only if there exists a real-valued function of θ.d. . θ ) = C (θ ) exp[Q(θ )U ( x . we also have the following important result. σ θU ∏ f ( x . . . r. . . Q θ = ∫ k θ dθ and ˜ h x1 . . . of the X’s is of the one-parameter exponential family (and hence U is sufﬁcient for θ). . . . x )]h( x . . . we 2 assume that regularity conditions (i)–(vi) are satisﬁed. .d. xn = exp h x1 . PROOF Under the regularity conditions (i)–(vi). xn j =1 n ( ) ( ˜ )∫ k(θ )dθ − ∫ g(θ )k(θ )dθ + h( x . More precisely. we have that . then the joint p. .f. the joint p. . we have the following result. For an unbiased estimator U = U(X1. . d(θ ).d. . .i. xn . here C(θ) = exp[−∫g(θ )k(θ )dθ] and Q(θ ) = ∫k(θ )dθ. . .4 The Case Where Complete Sufﬁcient Statistics Are Not Available or May Statistics 12. or log ∏ f x j . Xn) ∈N] = 0 for all θ ∈Ω. .v. . . . θ ) = C (θ ) exp[Q(θ )U ( x .f. . ∫g(θ )k(θ )dθ exist. Xn) of g(θ ) and if the indeﬁnite integrals ∫ k(θ )dθ. . . Xn) is the “constant” of the integration. In connection with the Cramér–Rao bound. . . .’s with p.f. where kθ =± then () σ θ Vθ . but this will not be discussed here. . . . n j 1 n 1 n j =1 (12) where C θ = exp − ∫ g θ k θ dθ . Xn be i. if equality occurs in the Cramér–Rao inequality for some unbiased estimator. x )]h( x .12. θ ) and let g be an estimable realvalued function of θ. REMARK 1 Theorem 2 has a certain generalization for the multiparameter case. x ). of the X’s is of the one-parameter exponential form. we obtain ∏ f ( x . θ = U x1 . . . Xn) of g(θ ). x ). . .

we get that U = g(θ ) + d(θ )Vθ except perhaps on a set of Pθ-probability 0 for all θ ∈ Ω. θ ⎥ = .v. By setting p = θ. ⎥ ⎢ ∂θ θ 1−θ ⎦ ⎣ so that the Cramér–Rao bound is equal to θ(1 − θ )/n. ( ) ) ( ) x 1− x ∂ log f x. 2 θ 2 θ θ But 2 2 2 Thus σ θU is equal to the Cramér–Rao bound if and only if C θ(U. 1 so that Then log f x. we have f x. θ ⎥ = 2 x + θ ⎣ ∂θ ⎦ 1−θ ( ) 2 ( ) 2 (1 − x) 2 − 2 x 1− x . the exceptional set for which this relationship does not hold is independent of θ and has Pθ-probability 0 for all θ ∈ Ω.’s from B(1. Xn be i. or [g ′(θ )] σ U≥ 2 θ 2 σ θ2 Vθ . θ = θ x 1 − θ ( ) ( ) ( 1− x . . The proof of the theorem is completed. Vθ 2 ( ) by (5). or equivalently. a(θ ) and d(θ ). Furthermore. r. θ 1−θ ( ) ( ) Since Eθ X 2 = θ . θ = x log θ + 1 − x log 1 − θ . ▲ [g ′(θ )] 2 = C o υθ U . The following three examples serve to illustrate Theorem 2. Eθ 1 − X (see Chapter 5). . p ∈ (0. Vθ) = (σ θU) 2 × (σ θVθ).298 12 Point Estimation [g ′(θ )] σ U≥ 2 θ 2 nI θ () 2 . ( ) ( . if and only if U = a(θ ) + d(θ )Vθ with Pθ-probability 1 for some functions of θ. Taking expectations and utilizing the unbiasedness of U and relation (4). because of (i).d. EXAMPLE 7 Let X1. . The checking of the regularity conditions is left as an exercise. Then σ θU is equal to the Cramér–Rao bound if and only if [g ′(θ )] = (σ U )(σ V ). θ = − ∂θ θ 1−θ ( ) and ⎡∂ ⎤ 1 2 1 ⎢ log f x. x = 0.i. p). 2 2 since nI(θ ) = σ θVθ by (6). . we have ( ) 2 = 1−θ and Eθ X 1 − X = 0 2 [ ( ) )] ⎤ ⎡∂ 1 E ⎢ log f X . 1).

Assume ﬁrst that σ 2 is known and set μ = θ. λ > 0. .’s from P(λ). 1. we obtain ( ) 2 ⎡∂ ⎤ 1 Eθ ⎢ log f X . θ = ( ) ⎡ x −θ exp ⎢− ⎢ 2σ 2 2πσ ⎢ ⎣ 1 ( ) ( 2 ⎤ ⎥. Next. θ = . θ ⎥ = 2 . . r.12. . ( ) 2 EXAMPLE 9 Let X1. Then f x. .v. θ = log⎜ ⎟− 2σ 2 ⎝ 2πσ ⎠ ( ) ) 2 . ∂θ θ ⎣ ⎦ ¯ so that the Cramér–Rao bound is equal to θ/n. ∂θ θ θ ⎣ ⎦ 2 Since Eθ X = θ and Eθ X = θ (1 + θ ) (see Chapter 5). . r. .’s from N(μ. x! ( ) x ∂ log f x. θ = e − θ Then ( ) θx .v. 1 x −θ ∂ log f x. Xn be i. . θ = −1 + ∂θ θ ( ) and ⎡∂ ⎤ 1 2 2 ⎢ log f x. we have f x.3 The Case of Availability of Complete Sufﬁcient Not Exist 299 2 ¯ ¯ Now X is an unbiased extimator of θ and its variance is σ θ(X ) = ¯ is a UMVU θ(1 − θ )/n. σ 2). . θ = −θ + x log θ − log x!.4 The Case Where Complete Sufﬁcient Statistics Are Not Available or May Statistics 12. so that log f x. . that is. . we have that X is a UMVU estimator of θ.i.i. θ ⎥ = . . Xn be i. EXAMPLE 8 Let X1. θ ⎥ = 1 + 2 x − x. σ ⎝ σ ⎠ ⎣ ∂θ ⎦ ( ) 2 2 Then ⎤ ⎡∂ 1 Eθ ⎢ log f X .d. Again by setting λ = θ. Since again X is an unbiased ¯ estimator of θ with variance θ/n. ⎥ ⎢ ∂θ σ ⎦ ⎣ 2 ( ) . Therefore X estimator of θ. ∂θ σ σ ( ) so that ⎡∂ ⎤ 1 ⎛ x −θ⎞ ⎢ log f x. x = 0. x ∈ ⎥ ⎥ ⎦ and hence ⎛ 1 ⎞ x −θ log f x. θ ⎥ = 2 ⎜ ⎟ . equal to the Cramér–Rao bound.d.

This was also shown in Example 5. 1) and hence ⎛ X −θ⎞ Eθ = ⎜ ⎟ = 1.300 12 Point Estimation since (X − θ )/σ is N(0. Chapter 7). so that 2 ⎡ n ⎛ X − μ⎞2⎤ ⎡n ⎛ ⎞ ⎤ j ⎢∑ ⎥ = n and σ 2 ⎢∑ X j − μ ⎥ = 2 n Eθ θ ⎟ ⎟ ⎢ j =1 ⎜ ⎢ j =1 ⎜ ⎝ ⎝ θ ⎠ ⎥ θ ⎠ ⎥ ⎣ ⎦ ⎣ ⎦ . Then ⎡∂ ⎤ 1 1 ⎛ x − μ⎞ 1 ⎛ x − μ⎞ ⎢ log f x. Eθ ⎜ ⎟ = 3. that is. θ ⎥ = 2 − 2 ⎜ ⎟ + 2⎜ ⎟ 4θ 2θ ⎝ θ ⎠ 4θ ⎝ θ ⎠ ⎣ ∂θ ⎦ – and since (X − μ)/√θ is N(0. X is a UMVU estimator.) ⎡∂ ⎤ 1 Eθ ⎢ log f X . Next. ⎝ θ ⎠ ⎝ θ ⎠ Therefore 2 2 4 (See Chapter 5. θ ⎥ = 2 2θ ⎣ ∂θ ⎦ 2 and the Cramér–Rao bound is 2θ /n. the Cramér–Rao bound.) ¯ Thus the Cramér–Rao bound is σ 2/n. Once again. ⎝ σ ⎠ 2 (See Chapter 5. we obtain ( ) 2 2 4 ⎛ X − μ⎞ ⎛ X − μ⎞ Eθ ⎜ ⎟ = 1. θ = ( ) ⎡ x−μ exp ⎢− ⎢ 2θ 2πθ ⎢ ⎣ 1 ( ) 2 ⎤ ⎥. 1). ¯ Therefore. ( ) ⎛ X − μ⎞ ∑ ⎜ j ⎟ is χ n2 j =1 ⎝ θ ⎠ n 2 (see ﬁrst corollary to Theorem 5. ⎥ ⎥ ⎦ so that x−μ 1 1 log f x. Then f x. Suppose now that μ is known and set σ 2 = θ. θ = − log 2π − log θ − 2 2 2θ ( ) ( ) ( ) 2 and x−μ ∂ 1 log f x. X is an unbiased esti2 mate of θ and its variance is equal to σ /n. θ = − + ∂θ 2θ 2θ 2 ( ) ( ) 2 .

As a matter of fact. ∞) unknown. 1].eff. clearly. Thus (1/n)∑n (Xj − μ)2 is a UMVU estimator of θ. This then is an example of a case where a UMVU estimator does exist but its variance is larger than the Cramér–Rao bound. takes values in (0. . Now it has been seen in Example 5 that 1 n ∑ Xj − X n − 1 j =1 ( ) 2 is a UMVU estimator of θ 2. Then show that the UMVU estimator of θ is U X1 . ⎢ ⎥ ⎣ n − 1 j =1 ⎦ the Cramér–Rao bound. we assume that both μ and σ2 are unknown and set μ = θ1. . it follows that ⎡ 1 n 2⎤ 2θ 2 2θ 2 σ θ2 ⎢ ∑ X j − X ⎥ = n −21 > n2 .12. . X n = ( ) 1 nα ∑X j =1 n j . REMARK 2 Corollary D in Chapter 6 indicates the sort of conditions which would guarantee the fulﬁllment of the regularity conditions (iv) and (vi).i. we arrive at the same 2 conclusion by treating θ1 as a constant and θ2 as the (unknown) parameter θ and calculating the Cramér–Rao bound. Chapter 7). . By using the generalization we spoke of in Remark 1. Therefore (1/n)∑n (Xj − μ)2 is an unbiased j=1 estimator of θ and its variance is 2θ 2/n.3 The Case of Availability of Complete Sufﬁcient Exercises Statistics 301 (see Remark 5 in Chapter 7). . j=1 Finally.1 Let X1. It is known as relative efﬁciency (r. then the quantity σ θU/(σ θU*) may serve as θ a measure of expressing the efﬁciency of U* relative to that of U.d.) of U* and. σ 2 = θ 2. Xn be i. equal to the Cramér–Rao bound. Thus if U is a UMVU estimator of g(θ ) and U* is any θ 2 2 other unbiased estimator of g(θ ). provided by Theorem 2. r. . .4. . Suppose that we are interested in ﬁnding a UMVU estimator of θ2.v. Since ⎛X −X⎞ ∑⎜ j ⎟ ⎜ j =1 ⎝ θ2 ⎟ ⎠ n 2 2 is χ n −1 (see second corollary to Theorem 5. it can be seen that the Cramér– Rao bound is again equal to 2θ 2/n.’s from the Gamma distribution with α known and β = θ ∈ Ω (0. Exercises 12. ( ) A UMVU estimator of g(θ ) is also called an efﬁcient estimator of g(θ ) (in the sense of variance).

θ ∈ Ω ⊆ .) 12.5 Criteria for Selecting an Estimator: The Maximum Likelihood Principle So far we have concerned ourselves with the problem of ﬁnding an estimator on the basis of the criteria of unbiasedness and minimum variance.5 and investigate whether the Cramér–Rao bound is attained. Furthermore. .302 12 Point Estimation and its variance attains the Cramér–Rao bound.4. on a set N ⊂ S with Pθ(X ∈ N) = 0 for all θ ∈ Ω.d.v. θ ) = 0. Xn be i.f. 12.f.4.v. Xn be i. . θ ) and b′(θ ) = db(θ )/dθ. f(·.4. . r.5 Refer to Exercise 12.2 Refer to Exercise 12. perhaps.6 and investigate whether the Cramér–Rao bound is attained.6 Assume conditions (i) and (ii) listed just before Theorem 2. .3. θ ).’s with p.4. 12. . S ∂2 ⎡ ∂2 ⎤ Then show that I θ = − Eθ ⎢ 2 log f X .3.d. 12. () ( ) 12. θ [( [1 + b′(θ )] ) 2 ( )] 2 . . θ) exists for all θ ∈ Ω and all x ∈ S except.d.1–12.8 Let X1. .i. .i. θ ∈ Ω. ⎢ ⎥ ⎣ ∂θ ⎦ 12. Let X1. . under the regularity conditions (i)–(vi) preceding Theorem 2—where (vi) is assumed to hold true for all estimators for which the integral (sum) is ﬁnite—one has σ θ2V ≥ nEθ ∂ ∂θ log f X . . θ ∈Ω ⊆ r and consider the Ω . f(·.f. recalculate I(θ ) and the Cramér–Rao bound by utilizing Exercise 12. θ).’s with p.d.4. .3 Refer to Exercise 12.3. .4. 12. Show that. Another principle which is very often used is that of the maximum likelihood.4. r.7 and show that the Cramér–Rao bound is not attained for the UMVU estimator of g(θ ) = θ 2. θ )dx = 0 ∂2 or ∑ ∂θ 2 f ( x.v. Then b(θ ) is called the bias of V.4.4.11 and investigate whether the Cramér–Rao bound is attained. write EθV = θ + b(θ ). and also ∂ suppose that the ∂θ f(x.4. suppose that. respectively. with p. 2 2 ∫S ∂θ 2 f ( x. 12. θ ⎥.7 In Exercises 12.4 Refer to Exercise 12. Xn) of θ for which EθV is ﬁnite. Here X is an r. (This inequality is established along the same lines as those used in proving Theorem 2.d.3.6 where appropriate. f(·. For an estimator V = V(X1.4.

xn) with the θ probability element L(θ |x1. j = 1. . Xn) is called an ML estimator (MLE for short) of θ. EXAMPLE 10 ( ) ( ) Let X1. . . . if such a θ exists. . xn) and call it the likelihood function. . . an MLE is often shown to have several desirable properties. . . xn .d. . X n = xn . . . . A similar interpretation holds true for the case that the X’s are continuous by replacing L(θ |x1. suppose ﬁrst that the X’s are discrete. xn = max L θ x1 . Then it is intuitively clear that one should select as an estimate of θ that θ which maximizes the probability of observing the x’s which were actually observed. we denote it by L(θ |x1.i. . θ ∈ Ω . x > 0 is strictly increasing. . . . it sufﬁces θ to maximize log L(θ |x1. . θ). . n. In many important cases there is a unique MLE. . . xn = e − nθ ( ) 1 ∏ j =1 x j ! n θ ∑n= 1 x j j and hence ⎛ n ⎞ ⎛ n ⎞ log L θ x1 . .f. . . . . . . . L(θ |x1. . . Then L θ x1 . xn = − log⎜ ∏ x j !⎟ − nθ + ⎜ ∑ x j ⎟ log θ . We will elaborate on this point later. . θ DEFINITION 6 ˆ ˆ The estimate θ = θ (x1. . that is. which we then call the MLE and which is often obtained by differentiation. it does provide a method for producing estimates in many cases of practical importance. This is much more convenient to work with. xn) is the probability of observing the x’s which were θ acutally observed.5 Criteria12. .v.d. . . . . . . . . of the X’s f(x1. in order to maximize (with respect to θ ) L(θ |x1. ⎝ j =1 ⎠ ⎝ j =1 ⎠ ( ) Therefore the likelihood equation . ( ) [( ) ] REMARK 3 Since the function y = log x. . . The method of maximum likelihood estimation will now be applied to a number of concrete examples. xn)dx1 · · · dxn which represents the probability θ (under Pθ ) that Xj lies between xj and xj + dxj. Treating the x’s as if they were constants and looking at this joint p. . . . as a function of θ. .f. . Then L θ x1 . . xn). xn = Pθ X 1 = x1 . Although the principle of maximum likelihood does not seem to be justiﬁable by a purely mathematical reasoning.12. θ) · · · f(xn. . xn) is called a maximum likelihood estimate (MLE) of θ if ˆ L θ x1 . ˆ θ(X1.’s from P(θ ). .3 Selecting anof Availability of Maximum Likelihood Statistics for The Case Estimator: The Complete Sufﬁcient Principle 303 joint p. Xn be i. . . . . . In addition. . r. . . . In order to give an intuitive interpretation of a MLE. as θ will become apparent from examples to be discussed below. xn) in the case that Ω ∈ . . . .d.

. . · · · . . . . . . . p1 pr −1 pr and this common value is equal to x1 + ⋅ ⋅ ⋅ + xr −1 + xr n = . where Ω is the (r − 1)-dimensional hyperplane in r deﬁned by ⎧ ′ ⎪ Ω = ⎨θ = p1 . p1 + ⋅ ⋅ ⋅ + pr −1 + pr 1 . . p j > 0. in particular. . r − 1. Differentiating with respect to pj. that is. . . r and ∑p j =1 r j ⎫ ⎪ = 1⎬. . Next. xr = log ( ) n! ∏ r x! j =1 j + x1 log p1 + ⋅ ⋅ ⋅ + xr −1 log pr −1 + xr log 1 − p1 − ⋅ ⋅ ⋅ − pr −1 . . xr = = where n = ∑ jr= 1 xj. . log L θ x1 . . xn = − nx 2 < 0 for all θ > 0 2 ∂θ θ ˜ and hence. . . .v. . Thus θ = ¯ is the MLE of θ. for θ = θ. r − 1 and equating the resulting expressions to zero. we get xj This is equivalent to xj pj = xr . . pr)′ ∈ Ω.’s with parameter θ = (p1. ( ) ∂2 1 L θ x1 . . . . j = 1. xn = 0 becomes – n + nx = 0 ∂θ θ ¯ which gives θ˜ = x. . . . . . . j = 1. . . Xr be multinomially distributed r. x1 x x = ⋅ ⋅ ⋅ = r −1 = r . . ( ) 1 1 − xr = 0. . . . x EXAMPLE 11 ( ) Let X1. r − 1. . pj pr j = 1.304 12 Point Estimation 1 ∂ log L θ x1 . ⎪ ⎭ Then L θ x1 . . . . Then ( ) n! ∏ j =1 x j ! r x p 1 ⋅ ⋅ ⋅ prx 1 r n! ∏ j =1 x j ! r p1x ⋅ ⋅ ⋅ prx−1 1 − p1 − ⋅ ⋅ ⋅ − pr −1 1 r−1 ( ) xr . pr j = 1. pr ∈ ⎪ ⎩ ( ) r .

12. . ∂θ In this case it is readily seen that n ∂2 log L θ x1 .5 Criteria12. then the root of ( ) ( ) ∂ log L θ x1 . . . .) EXAMPLE 12 Let X1. respectively.d. j = 1. . xn = − n log 2π − n log σ 2 − ( ) 1 2σ 2 ∑ (x j =1 n j −μ . . Now. . and therefore pj = xj /n.3 Selecting anof Availability of Maximum Likelihood Statistics for The Case Estimator: The Complete Sufﬁcient Principle 305 Hence xj /pj = n and pj = xj /n. ) 2 Differentiating with respect to μ and σ 2 and equating the resulting expressions to zero. Then L θ x1 .5) and therefore ˜ ˜ μ = x and σ 2 = ( ( ) ) 1 n ˆ ˆ μ = x and σ 2 = ∑ x j − x n j =1 2 are the MLE’s of μ and σ 2.4. . . . . if μ is known and we set σ 2 = θ. xn = 2 x − μ = 0 ∂μ σ ∂ n 1 n + log L θ x1 . (See Exercise 12. . if we assume that σ 2 is known and set μ = θ. . . . ⎥ ⎦ ) so that log L θ x1 . σ 2)′. xn = 0. . . n j =1 ( ) ( ) . It can be seen that these values of the ˆ p’s actually maximize the likelihood function. . σ 2) with parameter θ = (μ. . . . It is further shown that μ and σ 2 actually ˜ ˜ maximize the likelihood function (see Exercise 12. . xn = − 2 < 0 2 ∂θ σ and hence μ = ¯ is the MLE of μ. Then 2 1 n ∑ xj − x n j =1 are the roots of these equations. r.’s from N(μ.5. . Xn be i. . . . xn = − ∑ xj − μ ∂σ 2 2σ 2 2σ 4 j =1 ( ) ( ) ( ) ( ) 2 = 0. . . . . then we have again that ¯ is the root of the equation μ=x ˜ ∂ log L θ x1 .v. . . . xn = 0 ∂θ is equal to 2 1 n ∑ xj − μ . . . . r are the MLE’s of the p’s. xn ( ) ⎛ 1 ⎞ ⎡ 1 =⎜ ⎟ exp ⎢− 2 ⎢ 2σ ⎝ 2πσ 2 ⎠ ⎣ n ∑ (x j =1 n j 2⎤ − μ ⎥. .5. j = 1. we obtain ∂ n log L θ x1 . r. .i. . . ˆ x On the other hand.

for any θ such that θ − c ≤ x(1) and θ + c ≥ x(n). r. 1 For example. β )′ ∈ Ω which is the part of the plane above the main diagonal. Here θ = (α.306 12 Point Estimation Next. 2 If β is known and α = θ. or if α is known and β = θ. .’s from U(α. . Xn be i. This. ∂2 1 ⎡n 1 log L θ x1 . subject to the conditions that ˆ α ≤ x(1) and β ≥ x(n).v. . REMARK 4 iii) The MLE may be a UMVU estimator. ( ) ( ) The likelihood function is maximized. ∞ ) x(1) I ( −∞ .d. x(1) and x(n) are the MLE’s of α and β. then L θ x1 . . This shows that any statistic that lies between X(1) + c and X(n) − c is an MLE of θ. . and also for σ 2 in the same example ˆ ˆ when μ is known. θ +c ] x( n ) . but it is. Thus α = x(1) and ˆ ˆ ˆ β = x(n) are the MLE’s of α and β. This happens when α = x(1) and β = x(n). . Then L θ x1 . for σ equal to ( ) ∑ (x j =1 n j 2⎤ −μ ⎥ ⎥ ⎦ ) 2 1 n ∑ xj − μ . . . . then. if α = θ − c. maximized when β − α is minimum. respectively. equivalently. for μ in Example 12. clearly. β = θ + c. ∞ ) x(1) I ( −∞ . clearly. n j =1 ( ) becomes 1 σ4 ⎛n ⎞ n < 0. ⎜ − n⎟ = − 2 ⎝ ⎠ 2σ 4 So 1 n ˆ σ 2 = ∑ xj − μ n j =1 ( ) 2 is the MLE of σ 2 in this case. ( ) ( ) Here the likelihood function is not differentiable with respect to α and β. . In particular. . for instance. happens in Example 10. –[X(1) + X(n)] is such a statistic and hence an MLE of θ. xn = 4 ⎢ − 2 2 ∂θ σ ⎣2 σ ⎢ 2 which. and its maximum is 1/(2c)n. β ).i. θ ≤ x(1) + c and θ ≥ x(n) − c. EXAMPLE 13 Let X1. xn = ( ) 1 ( 2c) n I [θ −c . β ] x( n ) . xn = ( ) 1 (β − α ) n I [α . . ⋅ ⋅ ⋅ . respectively. where c is a given positive constant. .

. . ▲ ) {[ ) ( ] ) ] } REMARK 5 Notice that the conclusion of the theorem holds true in all Examples 10–13. .d. and let φ be deﬁned ˆ on Ω onto Ω* ⊆ m and let it be one-to-one. Suppose θ is an MLE of θ. xn). . θ ∈ Ω ⊆ r.10. See also Exercise 12. . . since ( ) [ ( ) ] [( ) ] [ ( ) ] . iii) The MLE is not always obtainable by differentiation.f. f(·. so that θ = φ−1(θ *)..d. . Then ˆ ) is an MLE of φ(θ ). This is the case in Example 13.i. This case occurs in Example 13 when α = θ − c. This happens. . it follows that θ is a function of T. . f(x. PROOF Set θ * = φ(θ ). θr)′. THEOREM 5 Let X1. θˆr)′ is the unique MLE θ. we present two of the general properties that an MLE enjoys. .f. where h is independent of θ. . PROOF Since T is sufﬁcient. and let T = (T1. we have that the maximum at the ˆ left-hand side above is attained at an MLE θ. . . xn = L φ −1 θ * x1 . . THEOREM 4 Let X1. . . xn max g T x1 .v. if a unique MLE exists. . . Therefore ( ) ( ) [ ( ( ) ]( ) max f x1 . . ˆ ˆ ˆ Then. θ ). . an MLE is invariant under one-to-one φ(θ θ transformations. . clearly. . θ h x1 .v. . xn . Another optimal property of an MLE is invariance.d. . . . . . . . . xn . . . in Example 12 for σ 2 when μ is unknown. . xn . Thus. θ ∈ Ω [( ( ) = h x1 .3. it will have to be a function of T.’s with p. r. as is proved in the following theorem. xn . . Thus φ(θ) is an MLE of φ(θ). θ ∈ Ω = max L * θ * x1 . Xn be i. . . Then θ θ call it L*(θ *|x1. max L θ x1 . j = 1. Tr)′. .12. r be a sufﬁcient statistic for θ = (θ1. iv) There may be more than one MLE. where θ* = φ(θ). r. θ . θ ∈ Ω . . if θ = (θ1. θ ). θ ∈ Ω ⊆ r. θ . θ* ∈ Ω * .3 Selecting anof Availability of Maximum Likelihood Statistics for The Case Estimator: The Complete Sufﬁcient Principle 307 ˆ iii) The MLE need not be UMVU. . . . xn . . θ = g T x1 . . Then. . .g. Xn). . . e. c > 0 In the following.5 Criteria12. θ ⋅ ⋅ ⋅ f xn . as it follows from the right-hand side of the equation above. β = θ + c. . θ ⋅ ⋅ ⋅ f xn .’s with p.d. . It follows that θ L θ x1 . ▲ θ For instance. By assuming the existence of an MLE. . the right-hand ˆ ˆ ˆ ˆ side attains its maximum at θ*. Theorem 1 in Chapter 11 implies the following factorization: f x1 . . . . .i. . . Tj = Tj(X1. That is. . . . . . xn . Xn be i. . .

12. Then show that the quantities μ = ¯ and ˜ x ˜ σ2 = 1 n ∑ xj − x n j =1 ( ) 2 indeed maximize the likelihood function. . . .5. .5.2. Xn be i. and ﬁnd the MLE of θ. and let X be the r. Chapter 11. 12.’s with p. θ1. j = 1.8 12.10 and ﬁnd the MLE of the reliability (x. show that ¯ X /m is the MLE of θ.θ+ 1 2 ). θ).v. .5.d.10 Let X1. .6 Suppose that certain particles are emitted by a radioactive source (whose strength remains the same over a long period of time) according to a Poisson distribution with parameter θ during a unit of time. . θ ∈ Ω ⊆ . 12. ∞ . r. θ 2 ∈Ω = θ2 θ2 ⎠ ⎝ ( ) × 0. ∞). show that r/(r + X ) is the MLE of θ. .’s from the U (θ − distribution. . 12.i.d.5 Refer to Example 12 and consider the case that both μ and σ 2 are unknown. denoting the number of times that no particles were emitted. 12.v. .4 Refer to Example 11 and show that the quantities pj = xj/n. . 1 2 12. r. . r. it follows that 1 n ∑ xj − x n j =1 ( ) 2 is the MLE of σ. .2 If X1. Refer to Exercise 12. Find the MLE of θ in terms of X.1 If X1.i.’s from the Negative Binomial distribution ¯ with parameter θ ∈ Ω = (0.4. and let .’s from B(m.v. θ1 .v.d.3 If X1. θ2.308 12 Point Estimation 1 n ∑ xj − x n j =1 ( ) 2 is the MLE of σ 2 in the normal case (see Example 12). .v. θ ∈ Ω = (0.5. . .5. ( ) Find the MLE’s of θ1. r.5. ∞). θ = θ1 . Exercises 12.5.f.5.i. . The source in question is observed for n time units.i.i. show that 1/X ) is the MLE of θ. . . Xn are i.5. x ≥ θ1 . .3. Xn be i.’s from the Negative Exponential distribu¯ tion with parameter θ ∈ Ω = (0. r. θ ). 1). ˆ r indeed maximize the likelihood function.7 Let X1. θ2) given by f x. Xn are i.d.d. Xn are i. 12. 12.9 Refer to Exercise 11.5. . f(·.d. . . θ 2 = ( ) ′ ⎛ x − θ1 ⎞ 1 exp⎜ − ⎟ .v. .

. . xn = θ − δ x1 . For this. δ x1 . we will restrict ourselves to a real-valued parameter. xn)′ is called a decision. . . k > 0. θ ∈ Ω ⊆ . f(·. xn). θ . January 1971. ·) is denoted by R (·. Vol. .d.3 The Case of Availability of Decision-Theoretic Statistics 309 ⎛ 1⎞ ˆ ˆ θ = θ X 1 . S. . . . see also the paper Maximum Likelihood and Sufﬁcient Statistics by D. δ x . 2⎠ ⎝ ˆ Then show that θ is an MLE of θ but it is not a function only of the sufﬁcient statistic (X(1). . that is. n into . . or more generally. . X n ( ) ⎧ ∞ ⋅ ⋅ ⋅ ∞ L θ . . r. . . . Xn and by using the decision function δ. . xn f x1 . No. X(n))′. . Xn be i. . . θ ⋅ ⋅ ⋅ f x . . . .v. ·) and is deﬁned by R θ . pp. δ = Eθ L θ . .) ( ) ( )( ) 12. . . . . For estimating θ on the basis of X1. δ x1 . . . . . . x ⎩x 1 n [ ( [ ( [ ( )] )] ( )] ( ) ) ( ( ) ) . . . . . δ X 1 . . θ ⋅ ⋅ ⋅ f xn . θ dx ⋅ ⋅ ⋅ dx 1 n 1 n 1 n ∫−∞ ⎪∫−∞ =⎨ ⎪∑ ⋅ ⋅ ⋅ ∑ L θ . xn) which expresses the (ﬁnancial) loss incurred when θ is estimated by δ(x1. .12. . x f x . . xn . . δ x1 . . . . Moore in the American Mathematical Monthly. xn DEFINITION 9 [ ( )] [ ( )] . 78. In this section. . xn = υ θ θ − δ x1 . .d.’s with p. xn = θ − δ x1 . L θ . . . . . or L[·. . 1. [ ( )] ( ) L θ . xn [ ( )] ( ) ( ) k . δ(x1. .6 Criteria for Selecting an Estimator: The Complete Sufﬁcient Approach 12. . DEFINITION 7 DEFINITION 8 A decision function (or rule) δ is a (measurable) function deﬁned on The value δ(x1. X n = ⎜ X ( n ) − ⎟ + cos 2 X 1 X (1) − X ( n ) + 1 .i. . The most convenient form of a loss function is the squared loss function. . . .6 Criteria for Selecting an Estimator: The Decision-Theoretic Approach We will ﬁrst develop the general theory underlying the decision-theoretic method of estimation and then we will illustrate the theory by means of concrete examples. . xn) of δ at (x1. . . Our problem is that of estimating θ. 2 The risk function corresponding to the loss function L(·. . . . The loss functions which are usually used are of the following form: L θ . . . So let X1. a loss function is a nonnegative function in the arguments θ and δ(x1. . θ ). (Thus Theorem 4 need not be correct if there exists more than one MLE of the parameters involved. δ x1 .f. 42–45. . xn)] is taken to be a convex function of θ. .

for it might so happen that . it sufﬁces to restrict ourselves to an essentially complete class of estimators. However. searching for an estimator with some optimal properties. we believe that the approach adopted here is more pedagogic and easier for the reader to follow. where DEFINITION 11 A class D of estimators of θ is said to be essentially complete if for any estimator δ* of θ not in D one can ﬁnd an estimator δ in D such that R(θ. Since for any two equivalent estimators δ and δ* we have R(θ . Thus. clearly. δ = Eθ L θ . δ * X 1 . δ). . . . . Xn). in this case. there are two principles on which our search may be based. some authors discuss UMVU estimators as a special case within the decision-theoretic approach as just mentioned. δ ) ≤ R(θ. . Actually. . . the risk corresponding to a given decision function is simply the average loss incurred if that decision function is used. where DEFINITION 10 ( ) [ ( )] [ ( )] ( ) The estimator δ of θ is said to be admissible if there is no other estimator δ* of θ such that R(θ. δ ) = R(θ. δ*) = R(θ. and its goodness will be judged on the basis of its risk R(·. Unfortunately. such estimators do not exist except in trivial cases. X n = R θ . is called a minimax (from minimizing the maximum) estimator. assumed that a certain loss function is chosen and then kept ﬁxed throughout. . Two decision functions δ and δ* such that R θ . R(θ. δ*) for all θ ∈Ω. . δ ) for all θ ∈ Ω. that is. However. if we restrict ourselves only to the class of unbiased estimators with ﬁnite variance and take the loss function to be the squared loss function (see paragraph following Deﬁnition 8). δ ) becomes simply the variance of δ(X1. It is. To start with. Once this has been done the question arises as to which member of this class is to be chosen as an estimator of θ. xn) will be called an estimate of θ. . Such an estimator. . . Setting aside the fruitless search for an estimator which would uniformly (in θ ) minimize the risk within the entire class of admissible estimators. δ X 1 . δ * for all θ ∈ Ω are said to be equivalent.310 12 Point Estimation That is. while we may still conﬁne ourselves to the essentially complete class of estimators. δ*) for any other estimator δ* within the class and for all θ ∈ Ω. This problem has already been discussed in Section 3 and Section 4. to minimize the maximum (over θ ) risk. In the present context of (point) estimation. The ﬁrst is to look for an estimator which minimizes the worst which could happen to us. . . . then. we conﬁne our attention to an essentially complete class of admissible estimators. δ*) ≤ R(θ. An apparently obvious answer to this question would be to choose an estimator δ such that R(θ. δ ) for all θ ∈ Ω with strict inequality for at least one θ. we may not rule out inadmissible estimators. of course. X n = Eθ L θ . the decision δ = δ(x1. if it exists. However. we ﬁrst rule out those estimates which are not admissible (inadmissible). The criterion proposed above for selecting δ then coincides with that of ﬁnding a UMVU estimator. .

Then we have the following deﬁnition: DEFINITION 12 Within the class D1.d.f. minimax R R(•. the assumption of θ being an r. For example. δ ) is ﬁnite for all θ ∈ Ω.2 illustrates the fact that a minimax estimator may be inadmissible. Recall that Ω ⊆ . and suppose now that θ is an r.6 Criteria for Selecting an Estimator: The Complete Sufﬁcient Approach 12. Such a δ is called a Bayes estimator of θ. For example. to be called a prior p. Then set ⎧ R θ .2. Now one may very well object to the minimax principle on the grounds that it gives too much weight to the maximum risk and entirely neglects its other values. ). δ ). one has sup R θ . It should be pointed out at the outset that the Bayes approach to estimation poses several issues that we have to reckon with. itself with p.3 0 a minimax estimator is inadmissible. Let D2 be the class of all estimators for which R(δ ) is ﬁnite for a given prior p. it is clear that R(δ ) is simply the average (with respect to λ) risk over the entire parameter space Ω when the estimator δ is employed. in Fig. θ ∈ Ω . δ λ θ dθ ⎪∫Ω R δ = Eλ R θ . λ on Ω. provided it exists. ⎩Ω Assuming that the quantity just deﬁned is ﬁnite. the estimator δ is said to be minimax if for any other estimator δ*. δ . *) R(•.f. might be entirely unreasonable. we restrict our attention to the class D1 of all estimators for which R(θ. Figure 12. δ = ⎨ ⎪∑ R θ . Then it makes sense to choose that δ for which R(δ) ≤ R(δ*) for any other δ*. the estimator δ is said to be a Bayes estimator (in the decision-theoretic sense and with respect to the prior p. δ λ θ . θ may denote the (unknown but ﬁxed) distance between Chicago and New York City.12. Then [( ) ] [( ) ] () ( ) ( )() ( )() DEFINITION 13 Within the class D2.d. θ ∈ Ω ≤ sup R θ . To see what this is. λ. ) 0 Figure 12. ) R( 0. which is . whereas the minimax estimate δ is slightly better at its maximum R(θ0. λ on Ω) if R(δ ) ≤ R(δ*) for any other estimator δ*. 12.v. it is much worse than δ* at almost all other points. First. Legitimate objections to minimax principles like the one just cited prompted the advancement of the concept of a Bayes estimate.d. δ * . 12. (See Fig.f.3. some further notation is required.d.) Instead.2 0 Figure 12.v.f.3 The Case of Availability of Decision-Theoretic Statistics 311 R R(•.

Then.7 Finding Bayes Estimators Let X1. . L θ . . xn Let θ be an r. .f. In addition.d. 12. θ ) ⋅ ⋅ ⋅ f ( x . it is sensible to assign more weight in the subset under question than to its complement.d. . by means of which we expect to construct estimates with some tangible and mathematically optimal properties. θ dx1 ⋅ ⋅ ⋅ dxn λ θ dθ ( ) [ θ − δ x1 . δ = Eθ θ − δ X 1 . . and consider the δ = δ x1 . prior experience might suggest that it is more likely that the true parameter lies in a given subset of Ω rather than in its complement. . f(·. . . xn . For instance. However. We should like to mention once and for all that the results in the following two sections are derived by employing squared loss functions. when choosing λ we have the ﬂexibility to weigh the parameters the way we feel appropriate. . since the discrete case is handled similarly with the integrals replaced by summation signs. . We consider the continuous case. with prior p. in principle. δ = L θ . however. δ λ θ dθ Ω ∞ = ∫ ⎧∫ ⋅ ⋅ ⋅ ⎨ Ω ⎩ −∞ () ( )() ∫ ∞ −∞ × f x1 . θ ). Then we are interested in determining δ so that it will be a Bayes estimate (of θ in the decision-theoretic sense). θ )dx 2 1 n 1 n )] 2 1 ⋅ ⋅ ⋅ dxn . .i. Xn be i. . θ ⋅ ⋅ ⋅ f xn . there still is a problem in choosing the prior λ on Ω.’s with p. X n =∫ Therefore ∞ −∞ ( ) [ ( ⋅⋅⋅ ∫ ∞ −∞ [θ − δ ( x . . . .d. .f. Thus we have the possibility of incorporating prior information about θ or expressing our prior opinion about θ. there are inﬁnitely many such choices. . We have ( ) ( ) [ ( )] [ ( )] . . Of course. and also incorporate in it any prior knowledge we might have in connection with the true value of the parameter. Another decisive factor in choosing λ is that of mathematical convenience. . xn ( ( ) )] 2 }() . we are forced to select λ so that the resulting formulas can be handled. . . x )] f ( x . δ x1 . . That is. in concrete cases. 2 R θ . in choosing λ.312 12 Point Estimation to be determined by repeated measurements. . that the same results may be discussed by using other loss functions.v. for an estimate . This difﬁculty may be circumvented by pretending that this assumption is only a mathematical device. r. . It should be emphasized. . xn = θ − δ x1 .v. R δ = ∫ R θ . . λ. θ ∈ Ω ⊆ squared loss function. . . This granted. choices do suggest themselves.

we have the following theorem: THEOREM 6 A Bayes estimate δ(x1. (In fact. xn )] λ (θ ) f ( x1 . .) From (13). then R (δ ) is also minimized.12. )() for each (x1. θ )λ (θ )dθ < ∞. . θ )λ (θ )dθ + ∫ θ 2 f ( x1 . xn )] λ (θ ) f ( x1 .f. θ ) ⋅ ⋅ ⋅ f ( xn . . . . . θ λ θ dθ < ∞.7 Finding Bayes Estimators The Case of Availability of Complete Sufﬁcient Statistics ∞ −∞ 313 =∫ ⋅⋅⋅ × λ θ f x1 .) Thus the required estimate is given by () (a > 0 ) δ x1 . xn)′. θ )dθ = δ 2 ( x1 . ( )() ∫ ( ) Ω 1 n Ω 1 n Formalizing this result. . Ω and ∫ θ f (x . λ on Ω for which ∫Ω θf ( x1 .d. . θ λ θ dθ . g′(t) = 2at − 2b = 0 implies t = b/a and g″(t) = 2a > 0.3 12. θ ) ⋅ ⋅ ⋅ f ( x n . . . xn) (of θ) corresponding to a prior p. . Integrals in (15) are to be replaced by summation signs if λ is of the discrete type. Ω Ω and the right-hand side of (14) is of the form g t = at 2 − 2bt + c (14) which is minimized for t = b/a. . ()( ⎧ ∫ ⎨∫ [θ − δ ( x . . xn)′. . . θ ) ⋅ ⋅ ⋅ f ( xn . is given by δ x1 . . . . θ )λ (θ )dθ − 2δ ( x1 . . . ( )() ∫ ( ) Ω 1 n Ω 1 n (15) provided λ is of the continuous type. xn ) Ω × ∫ θf ( x1 . . . . θ ⋅ ⋅ ⋅ f x . . . θ )λ (θ )dθ ) ∫ f x . xn = ( θf ( x . the interchange of the order of integration is valid here because the integrand is nonnegative. . . . . . But ∫Ω [θ − δ ( x1 . . . θ ) ⋅ ⋅ ⋅ f ( x . xn )∫ f ( x1 . θ λ θ dθ . x )] ⎩ ∞ −∞ Ω 1 n 2 ) ( ) } (13) (As can be shown. θ ) ⋅ ⋅ ⋅ f ( x n . θ ) ⋅ ⋅ ⋅ f ( xn . . θ ) ⋅ ⋅ ⋅ f (x 2 Ω 1 n . θ ) ⋅ ⋅ ⋅ f ( xn . θ )λ (θ )dθ . xn = ( θf ( x . The theorem used is known as the Fubini theorem. . . it follows that if δ is chosen so that is minimized for each (x1. . θ )dθ 2 2 ∫Ω [θ − δ ( x1 . θ ⋅ ⋅ ⋅ f xn . . θ dθ dx1 ⋅ ⋅ ⋅ dxn . 0 < ∫ f ( x1 . θ ⋅ ⋅ ⋅ f x . θ ) ⋅ ⋅ ⋅ f ( x . θ ) ⋅ ⋅ ⋅ f ( xn . . . θ )λ (θ )dθ < ∞. θ )λ (θ )dθ ) ∫ f x . . .

f.f. . we determine the conditional p. xn) is the expectation of θ with respect to its posterior p. . of θ given by (16) and corresponding to the prior p. xn)′ and denoting by h(·|x) the posterior p. if it exists.v. from the deﬁnition of the p. Thus when no prior knowledge about θ is available (which is expressed by taking λ(θ ) = c.i. θ ). . θ λ θ f x . Setting x = (x1. EXAMPLE 14 Let X1. . is simply that value of θ which maximizes h(·|x). of θ. given X1 = x1. We choose λ to be the Beta density with parameters α and β. .d. ⎧Γ α +β β −1 ⎪ θ α −1 1 − θ . n. . j = 1. . . . . . θ λ θ ( ) (h(x) ) = ( h(x)) ( ) = ( ) h(x() ) ( ) . that is. Another Bayesian estimate of θ could be provided by the median of h(·|x). As will be seen.d. 1 n (16) for the case that λ is of the continuous type. .314 12 Point Estimation Now.f.d. . θ ∈ Ω.d. .f. if θ ∈ 0. we have () ( ) ( )() ( ) ( ) . x f x. ⎩ Now. let h(·|x) be the posterior p. θ ⋅⋅⋅ f x .’s from B(1. θ ⋅ ⋅ ⋅ f xn . .f. h x = ∫ f x. or the mode of h(·|x). of θ. . This is called the posterior p. Xn = xn. xn = ∫ θh θ x dθ . . we have then hθx = where f θ. θ ∈ Ω). if it exists. θ λ θ dθ Ω Ω () ( )() ( ) ( )() δ x1 . otherwise. .d..f. (17) Now let us suppose that Ω is bounded and let λ be constant on Ω. if the observed value of Xj is xj. let us make the following observation regarding the maximum likelihood and the Bayesian approach to estimation problems.d. the likelihood function is maximized if and only if the posterior p.d. To this end.d.d.f. Ω ( ) ( ) REMARK 6 At this point. . this observation establishes a link between maximum likelihood and Bayes estimates and provides insight into each other. say. 1). h(·|x) may be written as follows: hθx = L θ x)λ (θ ) ( ) ( h(x) . Since f(x. θ λ θ dθ = ∫ f x1 . λ. θ ∈ Ω = (0. By means of (15) and (16). r. is.f. . of θ and represents our revised opinion about θ after new evidence (the observed X’s) has come in. that is. . 1 ⎪ λ θ = ⎨Γ α Γ β ⎪ ⎪0. Xn be i. . it follows then that the Bayes estimate of θ (in the decision-theoretic sense) δ(x1. Then it follows from (17) that the MLE of θ. of a Beta distribution with parameters α and β. . . Some examples follow. λ(θ ) = c. θ ) = L(θ |x).

of course Γ(γ) = (γ − 1)Γ(γ − 1). (1 − θ )( ∫θ Γ (α )Γ ( β ) = Γ α +β 1 ∑ j xj n −∑ j x j ( ) ( )() α −1 β −1 0 1 α +∑ j x j −1 β +n − ∑ j x j −1 0 which by means of (18) becomes as follows: I1 ( ) ⋅ Γ⎛α + ∑ x ⎞ Γ⎛ β + n − ∑ ⎝ ⎠ ⎝ = Γ (α )Γ ( β ) Γ (α + β + n) Γ α +β n j =1 j n j =1 xj ⎞ ⎠ . θ ⋅ ⋅ ⋅ f xn . by virtue of (15). . θ λ θ dθ Ω ( ) θθ 1 − θ θ 1 − θ dθ ( ) ( ) ∫ Γ (α )Γ ( β ) Γ (α + β ) ) ( ) = dθ . we have j=1 I 1 = ∫ f x1 . for simplicity. . Then.7 Finding Bayes Estimators The Case of Availability of Complete Sufﬁcient Statistics 315 α −1 ∫0 x (1 − x) 1 β −1 dx = Γα Γβ ( ) ( ). xn ( ) ∑ = n j =1 xj + α n+α + β . ( ) n Γ α + β + n Γ ⎛ α + ∑ j =1 x j + 1⎞ α + ∑n x j ⎝ ⎠ j =1 = .3 12. Γ (α + β ) (18) and. . (20) Relations (19) and (20) imply. I 1 = ∫ θf x1 . θ ⋅ ⋅ ⋅ f xn . (21) . writing ∑nxj rather j than ∑n xj when this last expression appears as an exponent. . = n α +β+n Γ α + β + n + 1 Γ ⎛ α + ∑ j =1 x j ⎞ ⎝ ⎠ ( ( ) ) δ x1 . .12. . θ λ θ dθ Ω ( ) θ 1 − θ θ 1 − θ dθ ( ) ( ) ∫ Γ (α )Γ ( β ) Γ (α + β ) ) ( ) = dθ . (19) Next. . . δ x1 . (1 − θ )( ∫θ Γ (α )Γ ( β ) = Γ α +β 1 ∑ j xj n −∑ j x j ( ) ( )() α −1 β −1 0 1 α +∑ j x j +1 −1 β +n −∑ j x j −1 0 Once more relation (18) gives I2 ( ) ⋅ Γ⎛α + ∑ x + 1⎞ Γ⎛ β + n − ∑ ⎝ ⎠ ⎝ = Γ (α )Γ ( β ) Γ (α + β + n + 1) Γ α +β n j =1 j n j =1 xj ⎞ ⎠ . xn that is.

EXAMPLE 15 ( ) ∑ = n j =1 xj + 1 n+ 2 . In this case the corresponding Bayes estimate is δ x1 . θ ⋅ ⋅ ⋅ f xn . Take λ to be N(μ. .i.316 12 Point Estimation REMARK 7 We know (see Remark 4 in Chapter 3) that if α = β = 1.’s from N(θ. Let X1. 1). 1). . × ∫ exp ⎨− −∞ ⎩ 2 ⎭ [( ) ) ) ) ( )] But (n + 1)θ 2 ⎛ nx + μ ⎞ − 2 nx + μ θ = n + 1 ⎜ θ 2 − 2 θ⎟ n+1 ⎠ ⎝ ) ( ( ( Therefore 2 2 ⎡ ⎛ nx + μ ⎞ ⎛ nx + μ ⎞ ⎤ nx + μ = n + 1 ⎢θ 2 − 2 −⎜ θ +⎜ ⎟ ⎟ ⎥ n+1 ⎢ ⎝ n+1 ⎠ ⎝ n+1 ⎠ ⎥ ⎦ ⎣ 2 2⎤ ⎡⎛ ⎛ nx + μ ⎞ nx + μ ⎞ = n + 1 ⎢⎜ θ − ⎟ ⎥. θ λ θ dθ Ω ( ) ( )() ( = ( ) 2π 1 n +1 ⎡ 1 n exp ⎢− ∑ x j − θ ∫−∞ ⎢ 2 j =1 ⎣ ∞ ) 2 ⎡ θ−μ ⎤ ⎢ ⎥ exp ⎢− 2 ⎥ ⎦ ⎢ ⎣ ( ) 2 ⎤ ⎥ dθ ⎥ ⎥ ⎦ = ( ) ( ⎡ 1⎛ n ⎞⎤ exp ⎢− ⎜ ∑ x 2 + μ 2 ⎟ ⎥ j n +1 ⎠⎥ ⎢ 2 ⎝ j =1 ⎣ ⎦ 2π 1 ∞ ⎧ 1 ⎫ n + 1 θ 2 − 2 nx + μ θ ⎬dθ . Xn be i. then the Beta distribution becomes U(0. 1). j n ⎢ n + 1 ⎥⎪ n + 1 2π ⎪ 2 ⎢ j =1 ⎥ ⎣ ⎦⎭ ⎩ ) ∞ ( ) ( ) ( ) (22) . r. Then I 1 = ∫ f x1 .d. . where μ is known. . . ⎟ −⎜ n+1 ⎠ ⎢⎝ ⎝ n+1 ⎠ ⎥ ⎦ ⎣ I1 = 1 n+1 ( 2π ) ( 1 1 n ⎧ ⎡ nx + μ ⎪ 1⎢ n 2 exp ⎨− ∑ x j + μ 2 − ⎢ n+1 ⎪ 2 ⎢ j =1 ⎣ ⎩ ( ) 2 ⎤⎫ ⎥⎪ ⎥⎬ ⎥⎪ ⎦⎭ ⎡ ⎤ 2 ⎢ ⎛ nx + μ ⎞ ⎥ 1 × θ− ⎟ ⎥dθ ∫−∞ exp ⎢− 2 ⎜ n+1 ⎠ ⎥ ⎢ 2 1 n+1 ⎝ 2π 1 n + 1 ⎢ ⎥ ⎣ ⎦ 2 ⎤⎫ ⎧ ⎡ nx + μ ⎥ ⎪ 1 1 ⎪ 1 n = exp ⎨− ⎢∑ x 2 + μ 2 − ⎬.v. . xn as follows from (21). . .

. iii) Construct a 100(1 − α)% Bayes conﬁdence interval for θ . assign equal probabilities to the two tails. xn = ( ) nx + μ .f. where c(x) is determined by the requirement that the Pλ-probability of this set is equal to 1 − α.12. determine a set {θ ∈ (0. iii) Derive the Bayes estimate in (24) as the mean of the posterior p.7.) 12.2 Refer to Example 15 and: iii) Determine the posterior p. (Hint: For simplicity. that is. θ λ θ dθ Ω ( ) ( )() ( = ( ) 2π 1 n+1 1 n +1 ⎡ 1 n θ exp ⎢− ∑ x j − θ ∫−∞ ⎢ 2 j =1 ⎣ ∞ ) 2 ⎡ θ−μ ⎤ ⎢ ⎥ exp ⎢− 2 ⎥ ⎦ ⎢ ⎣ ( ) 2 ⎤ ⎥dθ ⎥ ⎥ ⎦ = ( 2π ) ( 1 1 n ⎧ ⎡ nx + μ ⎪ 1 n exp ⎨− ⎢∑ x 2 + μ 2 − ⎢ j =1 j n+1 ⎪ 2⎢ ⎣ ⎩ ( ) 2 ⎤⎫ ⎥⎪ ⎥⎬ ⎥⎪ ⎦⎭ ⎤ ⎡ 2 ⎢ ⎛ 1 nx + μ ⎞ ⎥ × θ− ⎟ ⎥dθ ∫−∞ θ exp ⎢− 2 ⎜ n+1 ⎠ ⎥ ⎢ 2 1 n+1 ⎝ 2π 1 n + 1 ⎥ ⎢ ⎦ ⎣ 2 ⎤⎫ ⎧ ⎡n nx + μ ⎥ ⎪ nx + μ 1 1 ⎪ 1 = . h(θ |x). θ ⋅ ⋅ ⋅ f xn . I 2 = ∫ θf x1 .f.7. h(θ|x). iii) Derive the Bayes estimate in (21) as the mean of the posterior p. . . h(θ |x). iii) Construct the equal-tail 100(1 − α)% Bayes conﬁdence interval for θ. h(θ |x) ≥ c(x)}. δ x1 . exp ⎨− ⎢∑ x j2 + μ 2 − ⎬ n ⎢ n + 1 ⎥⎪ n + 1 n + 1 2π ⎪ 2 ⎢ j =1 ⎥⎭ ⎦ ⎣ ⎩ ) ∞ ( ) ( ) ( ) (23) By means of (22) and (23).d. n+1 (24) Exercises 12. one has. on account of (15).1 Refer to Example 14 and: iii) Determine the posterior p.d.d. 1). . h(θ|x).f.d. .f.3 The Case of Availability of Complete Sufﬁcient Exercises Statistics 317 Next.

12. iv) Do parts (i)–(iii) for any sample size n.d. Ω provided λ is of the continuous type. r. .v. with parameters α = θ and β = 1.4 Let X be an r. .d. iii) Derive the Bayes estimates δ(x) for the loss functions L(θ.d. λ on Ω such that for the Bayes estimate δ deﬁned by (15) the risk R(θ . xn ) = ∫ θh(θ x)dθ . .’s with p. THEOREM 7 Suppose there is a prior p. .f.f. Then. δ) = [θ − δ (x)]2/θ .d. δ ) = [θ − δ (x)]2/θ. on Ω. δ ) is independent of θ. λ of θ be the Negative Exponential with parameter τ. on the basis of X: iii) Determine the posterior p. iii) Derive the Bayes estimates δ(x) for the loss functions L(θ. and as has been already observed. λ of θ be Negative Exponential with parameter τ. and let the prior p. θ ∈ Ω (⊆ ) and let λ be a prior p. on the basis of X: iii) Determine the posterior p.f. .8 Finding Minimax Estimators Although there is no general method for deriving minimax estimates. δ ) = [θ − δ (x)]2 as well as L(θ . δ ) = [θ − δ(x)]2 as well as L(θ. then it is easily seen that 2Y ∼ χ2 . having the Beta p.7. . Then the posterior p. iv) Do parts (i)–(iv) for any sample size n when λ is Gamma with parameters k (positive integer) and β. h(θ|x).d.i. f(·.d. of θ.f. Then δ is minimax.f.v.f. . iv) Do parts (i)–(iii) for any sample size n. .f.3 Let X be an r. .7. .f. this can be achieved in many instances by means of the Bayes method described in the previous section.) 2k β 12. is given by (16). .d.318 12 Point Estimation 12.d.d. the Bayes estimate of θ (in the decision-theoretic sense) is given by δ ( x1 . Xn be i. . . (Hint: If Y is distributed as Gamma with parameters k and β.f. . Then. Then we have the following result. h(θ |x). iii) Construct the equal-tail 100(1 − α)% Bayes conﬁdence interval for θ . xn)′ = x. . iii) Construct the equal-tail 100(1 − α)% Bayes conﬁdence interval for θ . .d. h(·|x). . distributed as P(θ ). θ ). and let the prior p.v. PROOF one has By the fact that δ is the Bayes estimate corresponding to the prior λ. given X = (X1. Let X1. Xn)′ = (x1.

12. a UMVU estimator need not be admissible. Xn and λ be as in Example 14. Here is an example. δ * . outside this class there might be estimators which are better than a UMVU estimator. Now by setting X = ∑n= 1Xj and taking into considj eration that Eθ X = nθ and Eθ X2 = nθ(1 − θ + nθ). δ )λ (θ )dθ ≤ ∫Ω R(θ .8 Finding Sufﬁcient Statistics The Case of Availability of Complete Minimax Estimators 319 ∫Ω R(θ . It can be shown that it is also minimax and admissible. But R(θ. δ * = 4 n+ n 4 1+ n Since R(θ. Xn be i. . 2α 2 + 2αβ − n = 0. δ * x1 . we have 2 (α + β ) ) ( 2 − n = 0. ⎥ ⎦ ⎭ ( ) By taking α = β = 1 – √n and denoting by δ* the resulting estimate. will not be presented here. Now a UMVU estimator has uniformly (in θ ) smallest risk when its competitors lie in the class of unbiased estimators with ﬁnite variance. . Theorem 6 implies that ( α2 n+α + β ) 2 = ( n ) ( 2 = 1 ) 2 .i. In other words. The case that λ is of the discrete type is treated similarly. The proof of these latter two facts. so that R θ. r. σ 2). EXAMPLE 17 ( ) ∑ n j =1 1 n 2 nx + 1 2 = n+ n 2 1+ n xj + ( ) Let X1. . we obtain ⎛ X +α ⎞ R θ . . Then the corresponding Bayes estimate δ is given by (21). where σ 2 is known and μ = θ. δ = Eθ ⎜ θ − ⎟ n+α + β⎠ ⎝ ( ) 2 = 1 (n + α + β ) 2 ⎧⎡ ⎨⎢ α + β ⎩⎣ ( ) 2 ⎫ − n⎤θ 2 − 2α 2 + 2αβ − n θ + α 2 ⎬. δ . .’s from N(μ. EXAMPLE 16 [( ) ] ( )() [( ) ] Let X1.d. . . xn = is minimax. δ * λ θ dθ ≤ sup R θ . . . δ *)λ (θ )dθ for any estimate δ*. .3 12. ▲ The theorem just proved is illustrated by the following example. however. However. . ¯ It was shown (see Example 9) that the estimator X of θ was UMVU. Therefore δ is minimax. . δ*) is independent of θ. θ ∈ Ω Ω for any estimate δ*.v. θ ∈ Ω = c ≤ ∫ R θ . Hence sup R θ . δ ) = c by assumption. .

U) = 2θ 2/n. j = 1. δ ) = [θ − δ (x)]2/θ. k. . Then its risk is R θ . . Xn be independent r. that is. Let Xj be the number of trials which result in Aj and let pj be the probability that any one of the trials results in Aj.) Its variance (risk) was seen to be equal to 2θ 2/n. θ r . δ = Eθ αU − θ ( ) ( ) 2 = Eθ α U − θ + α − 1 θ [( ) ( )] 2 = θ2 n + 2 α 2 − 2 nα + n . ′ p j = p j θ . n [( ) ] The value α = n/(n + 2) minimizes this risk and the minimum risk is equal to 2θ 2/(n + 2) < 2θ 2/n for all θ. .’s from the P(θ ) distribution. Exercise 12. and then the minimization can be achieved.v. . However.8. Thus U is not admissible. . np (θ) − np j θ j 2 Often the p’s are differentiable with respect to the θ’s. σ 2). . δ ) = 1/θEθ [θ − δ (X)]2. . .v. by differentiation. in principle.’s from N(0. The solution may be easier by minimizing the following modiﬁed χ2 expression: . . calculate the risk R(θ. . . Set σ 2 = θ. the actual solution of the resulting system of r equations is often tedious.9 Other Methods of Estimation Minimum chi-square method. k. The probabilities pj may be functions of r parameters.1 Let X1. . consider n independent repetitions of an experiment whose possible outcomes are the k pairwise disjoint events Aj. R(θ. r. j = 1. and conclude that δ (x) is minimax. Then for the estimate δ (x) = ¯ x. θ = θ1 . . This method of estimation is applicable in situations which can be described by a Multinomial distribution. . . . Namely. 12. Xn be i. () ( ) Then the present method of estimating θ consists in minimizing some measure of discrepancy between the observed X’s and the expected values of them. Then the UMVU estimator of θ is given by U= 1 n ∑ X j2 . .i.d. and consider the loss function L(θ.320 12 Point Estimation EXAMPLE 18 Let X1. n j =1 (See Example 9. One such measure is the following: χ 2 [X =∑ k j =1 j ( )] . that is. . . . Consider the estimator δ = αU.

θ ) and for a positive integer r. f(·. we have ⎧X = μ ⎪ n 2 1 n ⎨1 ˆ ˆ ∑ X j2 = σ 2 + μ 2 . mr will be estimated by the corresponding sample moment 1 n ∑ X jr . θ r .i. By the method of moments. . r.d. where both α and β are unknown. n j =1 ( ) the solution of which (if possible) will provide estimators for θj. j = 1. The problem is that of estimating mr. hence μ = X . r. r.i. they enjoy some asymptotic optimal properties as well. where both μ and σ 2 are unknown. . all Xj > 0. . r. of course.v. Since α +β EX 1 = 2 (see Chapter 5). .d. . . σ 2). . . assume that EX r = mr is ﬁnite.v.3 12. ⎪n j =1 ⎩ j =1 ( ) EXAMPLE 20 Let X1. . . Xn be i.d. EXAMPLE 19 Let X1. . . . Then we consider the following system 1 n k ∑ X j = mk θ1 . Xn be i. .9 Other Methods of Estimation The Case of Availability of Complete Sufﬁcient Statistics 321 2 χ mod [X =∑ k j =1 j − np j θ Xj ( )] . According to the present method. . Xn be i. . . .i.’s from N(μ. . Under suitable regularity conditions. . k = 1. the resulting estimators can be shown to have some asymptotic optimal properties.d. σ 2 = n ∑ X j − X .10. . β ).) The method of moments. . n j =1 The resulting moment estimates are always unbiased and. . 2 provided. .v.f. . r. . we have ⎧ α +β ⎪X = 2 ⎪ ⎨ n ⎪1 X2 = α − β ⎪n ∑ j 12 ⎩ j =1 and σ X 1 2 ( ) (α − β ) = 12 2 ( ) + (α + β ) 2 2 4 ⎧β + α = 2 X ⎪ . Let X1. under suitable regularity conditions. . . (See Section 12. . k. On the other hand the theoretical moments are also functions of θ = (θ1. .’s from U(α. ⎩ where . θr)′.’s with p. . j = 1. . or ⎨ ⎪β − α = S 12 .12.

’s distributed as U(−θ. . f(·.5 Let X1.322 12 Point Estimation S= 1 n ∑ Xj − X n j =1 ( ) 2 . β.’s distributed as U(θ − a. where S2 = 12. Exercises 12.9.’s with p. Xn are i.d. θ ) given by f x. ∞ . 12. . . 12.1 Let X1. β = X + S 3 .’s from the Beta distribution with parameters α. .v.9.9. . we see that the moment estimators α. Another obvious disadvantage of this method is that it fails when no moments exist (as in the case of the Cauchy distribution). Xn be i.’s from the Gamma distribution with paramˆ ¯ eters α and β. θ + b). Xn are independent r. θ2 Find the moment estimator of θ. . .9. show that α = X 2/S2 and β = S2/X are the moment estimators of ˆ ¯ α and β. does the method of moments provide an estimator for θ? 12.f. θ ) x . respectively. . X2 be independent r. β and ﬁnd the moment estimators of α and β.i. This is a drawback of the method of moment estimation. θ ). where a. r.2 If X1. .v. .d.4 2 1 n ∑ Xj − X .v. ˆ ˆ REMARK 8 In Example 20. are not functions of the sufﬁcient statistic (X(1). respectively.3 If X1. or when not enough moments exist. Xn be independent r. ∞). . θ = 2 θ − x I ( 0 . θ ∈ Ω = 0. X(n))′ of (α.10 Asymptotically Optimal Properties of Estimators So far we have occupied ourselves with the problem of constructing an estimator on the basis of a sample of ﬁxed size n.i.9. b > 0 are known and θ ∈ Ω = . This method is applicable when the underlying distribution is of a certain special form and it will be discussed in detail in Chapter 16. 12.5. θ ∈ Ω = (0. .9. and having one or more of the .v. r.7 and ﬁnd the moment estimators of θ1 and θ2. Find the moment estimator of θ and calculate its variance.6 Refer to Exercise 12. ( ) ( ) () ( ) 12. . n j =1 ( ) Let X1. β of α. . β)′. Least square method. .v. ˆ ˆ Hence α = X − S 3 .d. .

σ 2(θ)). θ ∈ Ω ⊆ . we have the following deﬁnitions. . . r.” The following theorem provides a criterion for a sequence of estimates to be consistent. we have the following result. asymptotic properties can be associated with an estimator. consistent (or strongly consistent) if Vn ⎯⎯→ θ as n → ∞.1). {Vn} = {V(X1. ⎯ ( Pθ ) θ it follows that Vn ⎯→∞ θ (see Exercise 12.10 Asymptotically of Complete Sufﬁcient Statistics The Case of Availability Optimal Properties of Estimators 323 following properties: Unbiasedness. ⎯→ n ( ) ( ( )) P DEFINITION 16 The sequence of estimators of θ. . – is said to be asymptotically normal N(0. σ2(θ)/n). where X is distributed (under Pθ) as N(0. . Xn be i. It is said to be a.d. . as n → ∞. ⎯ a.12. Xn)}. If n Vn − θ ⎯d → N 0. If however. .s. EθVn → θ and σ 2Vn → 0. Xn)}. . θ ). √n(Vn − θ ) d ⎯ → X for all θ ∈ Ω. then some additional. Pθ for all θ ∈ Ω. is said to be consistent P in probability (or weakly consistent) if Vn ⎯ θ → θ as n → ∞. . {Vn} = {V(X1. To this effect. f(·.10. properly normalized. for all θ ∈ Ω. In connection with the concepts introduced above. σ2(θ)). Let X1. . .3 12. ▲ DEFINITION 15 The sequence of estimators of θ.) From now on. A BAN sequence of estimators is also called asymptotically efﬁcient (with respect to the variance).f. Xn)}. .d.i. (See ⎯ ( Pθ ) Chapter 8. Chapter 8. the sample size n may increase indeﬁnitely. as n → ∞. {Vn} = {V(X1. . the (intuitively optimal) property associated with an MLE. σ 2 θ .s. is said to be best asymptotically normal (BAN) if: ii) iIt is asymptotically normal and ii) The variance σ 2(θ) of its limiting normal distribution is smallest for all θ ∈ Ω in the class of all sequences of estimators which satisfy (i). . minimax. if.’s with p. . then Vn ⎯ θ → θ. the term “consistent” will be used in the sense of “weakly consistent. (uniformly) minimum variance. THEOREM 8 ⎯ If. as n → ∞. . DEFINITION 14 The sequence of estimators of θ.v. θ P PROOF For the proof of the theorem the reader is referred to Remark 5. (See Chapter 8.) This is often expressed (loosely) by writing Vn ≈ N(θ. The relative asymptotic efﬁciency of any other sequence of estimators which satisﬁes (i) only is expressed by the quotient of the smallest variance mentioned in (ii) to the variance of the asymptotic normal distribution of the sequence of estimators under consideration. . . minimum average risk (Bayes).

and therefore the above sequences of estimators are actually BAN. Appropriate regularity conditions ensure that these exceptional θ ’s are only “a few” (in the sense of their set having Lebesgue measure zero).f. θ ∈ Ω ⊆ . .v. .i. this topic will not be touched upon here. Xn). .’s from P(θ ). however. the proof of Theorem 9 is beyond the scope of this book. Xn are i. . Xn are i. . . respectively (see consistency of Xn – ¯ Chapter 8). ¯ ¯ iii) The same is true of the MLE X = Xn of μ and (1/n)∑n (Xj − μ)2 of σ 2 if j=1 X1. . σ 2) with one parameter known and the other unknown (see Example 12).d. . ⎢ ⎥ ⎣ n j =1 ⎦ It can further be shown that in all cases (i)–(iii) just considered the regularity conditions not explicitly mentioned in Theorem 9 are satisﬁed. r. distributed as the X’s above.’s with p. for each n. .v. Then. Xn be i. 1) by the CLT (see Chapter 8).5.v. the ¯ ¯ MLE of θ is X. f(·. which we denote by Xn here.i. r. .i. Also. The variance of the (normal) distribu– — tion of √n(Xn − μ) is I−1(μ) = σ 2. . . That √n(Xn − θ) is asymptotically normal N(0. . θ * will be an MLE or the MLE. such that the sequence {θ*} of n n estimators is BAN and the variance of its limiting normal distribution is equal to the inverse of Fisher’s information number ( ) ⎡∂ ⎤ I θ = Eθ ⎢ log f X . .d. Xn be i.d. X n = 0 ∂θ has a root θ * = θ*(X1. The weak and strong ¯ follows by the WLLN and SLLN. The fact that there can be exceptional θ’s. ( ) ( ) ( ) . Then. follows from the fact that n ( X n − θ ) θ (1 − θ ) is asymptotically N(0. and therefore it will be omitted. However.i. ⎣ ∂θ ⎦ () ( ) 2 where X is an r.d.’s from N(μ. . I−1(θ)). In smooth cases. r. by Exercise 12. . . θ ⎥ . . with the variance of limiting normal distribution being equal to I−1(θ) = θ (see Example 8).v. Examples have been n constructed.’s from B(1.324 12 Point Estimation THEOREM 9 Let X1. .1. then the MLE X = Xn of θ (see Example 10) is both (strongly) consistent and asymptotically normal by the same reasoning as above. r. . . for which {θ ∗} does not satisfy (ii) of Deﬁnition 16 for n some exceptional θ ’s. and the variance of the limiting normal distribution of ⎡1 n ⎤ 2 n ⎢ ∑ X j − μ − σ 2 ⎥ is I −1 σ 2 = 2σ 4 see Example 9 . . .d. has prompted the introduction of other criteria of asymptotic efﬁciency. where I(θ) = 1/[θ(1 − θ)] (see Example 7). along with other considerations. EXAMPLE 21 iii) Let X1. ¯ ¯ iii) If X1.v. if certain suitable regularity conditions are satisﬁed. θ). the likelihood equation ∂ log L θ X 1 . θ). .

) 12. .11. distributed as ( Pθ ) – N(0.11 Closing Remarks The Case of Availability of Complete Sufﬁcient Statistics 325 Exercise 12.12. where Z is an r. Thus {Un} and {Vn} are asymptotically equivalent according to Deﬁ⎯→ n – nition 17. θ).v.3) that the UMVU estimator of θ is Un = Xn (= X ) and this coincides with the MLE of θ (Exercise 12. it can be established (see Exercise 12. This is not surprising.3. It can also be shown (see Exercise 12. .d. X )} n n 1 n and {V } = {V ( X . θ ). θ ∈ Ω ⊆ and let {Vn} – d ⎯ = {Vn(X1. . X )} n n be two sequences of estimators of θ.v. Next.f. f(·.f. . that √n(Vn − θ) d ⎯ → Z.i.1).’s with p. as n → ∞. not functions of n) ⎯ ( Pθ ) – values of α and β. and it can also be shown (see Exercise 11.5. four different methods of estimation of the same parameter θ provided three different estimators. σ (θ )).i. r.11. for any arbitrary but ﬁxed (that is. √n(Un − θ) ⎯d → Z.’s with p. . by the – ⎯ CLT. asymptotic normality of {Vn} implies its consistency in ⎯→ n probability. Then show that Pθ Vn ⎯→∞ θ.d. . .f. .d. It has been shown ¯ ¯ (see Exercise 12. .1 Let X1. distributed as N( 2 − θ. θ ). . . . where W is an r.3) that √n(Wn − d 1 ⎯ θ) ⎯ → W. However.v.v.11 Closing Remarks The following deﬁnition serves the purpose of asymptotically comparing two estimators. As for Wn. corresponding to a Beta p. is given by ( ) n→∞ P Vn and the minimax estimator is Wn ∑ = n n j =1 Xj +α n+α + β . Xn be i. n U n − Vn ⎯ θ → 0. the Bayes estimator of θ. DEFINITION 17 Let X1.1). since the criteria of optimality employed in the four approaches were different.v. . . (P ) θ . (26) That is. θ(1 − θ)). . distributed as N(0.d. r. suppose that the X’s are from B(1. . as n → ∞. ⎯ For an example.10.3 12. f(·. . Then we say that {Un} and {Vn} are asymptotically equivalent if for every θ ∈ Ω. Xn be i. . Xn)} be a sequence of estimators of θ such that √n(Vn − θ ) ⎯ → ( Pθ ) 2 Y as n → ∞. as n → ∞. (That is. where Y is an r. λ. . . θ ∈ Ω ⊆ and let 1 n {U } = {U ( X .d.2) that √n(Un − Vn) Pθ ⎯→∞ 0. (25) ∑ = j =1 Xj + n 2 n+ n . θ(1 − θ)).

the answer would be that this would depend on which kind of optimality is judged to be most appropriate for the case in question. is the minimax – ⎯ estimator of θ. Un = Xn is the UMVU (and also the ML) – estimator of θ.3 In reference to Example 14. the estimator Vn given by (25) is the Bayes estimator of θ. Although the preceding comments were made in reference to the Binomial case.v.11. θ (1 − θ )).1 In reference to Example 14.326 12 Point Estimation Thus {Un} and {Wn} or {Vn} and {Wn} are not even comparable on the basis of Deﬁnition 17. given by (26). regarding the question as to which estimator is to be selected in a given case. they are of a general nature.2 In reference to Example 14. Wn.d. θ(1 − θ ). (P ) ¯ 12. and were used for the sake of deﬁniteness only. Then show that √n(Un Pθ − Vn) ⎯ → 0.11. Exercises 12. ⎯ n →∞ 12. where Z is an r. where W is an ( Pθ ) 1 r.11. distributed as N(0. whereas the estimator Vn is given by (25). corresponding to a prior Beta p.v. Finally. Then show that – ⎯ √n(Vn − θ) ⎯d → Z as n → ∞. distributed as (N 2 − θ. Then show that √n(Wn − θ) ⎯d → W as n → ∞.f.) θ .

. then a coin. 1] and having the following interpretation: If (x1. otherwise it is called a composite hypothesis. DEFINITION 2 ( ) A randomized (statistical) test (or test function) for testing H against the alternative A is a (measurable) function φ deﬁned on n. DEFINITION 1 A statement regarding the parameter θ. a new technique. X1. If ω contains only one point. .1 General Concepts of the Neyman–Pearson Testing Hypotheses Theory In this section.i. the problem is that of testing H on the basis of the observed values of the X’s. .’s deﬁned on a probability space (S. that is. which is called the alternative to H (or H0) and is usually denoted by A. In this context.13. Similarly for alternatives. θ). . Pθ). whose probability of falling 327 . .d. Often hypotheses come up in the form of a claim that a new product.v. Once a hypothesis H is formulated. H (or H0) is a statement which nulliﬁes this claim and is called a null hypothesis. .f. xn)′ is the observed value of (X1. . . . . 13. Xn)′ and φ(x1.d. xn) = y. . is called a (statistical) hypothesis (about θ) and is usually denoted by H (or H0). f(·. θ ∈ Ω ⊆ r and having p. we introduce the basic concepts of testing hypotheses theory. then H is called a simple θ hypothesis. such as θ ∈ ω ⊂ Ω. taking values in [0. r. etc. class of events. ω = {θ0}. . . . Xn will be i. is more efﬁcient than existing ones.3 UMP Tests for Testing Certain Composite Hypotheses 327 Chapter 13 Testing Hypotheses Throughout this chapter. Thus H H 0 : θ ∈ω A : θ ∈ω c . The statement that θ ∈ ω c (the complement of ω with respect to Ω) is also a (statistical) hypothesis about θ. . ..

. . with a ﬁxed sample size. c n In this case. the (Borel) set B in n is called the rejection or critical region and Bc is called the acceptance region. Thus a nonrandomized test has the following form: ⎧ ⎪1 if =⎨ ⎪0 if ⎩ ′ φ x1 . in fact. β(θ) is the probability of an error. . If.01. 0. . α is the smallest upper bound of the type-I error probabilities. xn ( ) (x (x 1 . The function β restricted to ωc is called the power function of the test and β(θ) is called the power of the test at θ ∈ωc. then the test φ is called a nonrandomized test. The sup [β(θ). In testing a hypothesis H. . DEFINITION 3 Let β(θ) = Pθ (rejecting H). . it fails to be more . 1 − β(θ) with θ ∈ ωc is the probability of θ accepting H. or to accept H when H is actually false.328 13 Testing Hypotheses heads is y. calculated under the assumpθ tion that H is true. Clearly. is tossed and H is rejected or accepted when heads or tails appear. Thus for θ ∈ ωc. . For example. Unfortunately. one may commit either one of the following two kinds of errors: to reject H when actually H is true. Of course. . . θ ∈ Ω. Thus for θ ∈ ω. . .10) and then derive tests which maximize the power. x ) ∈B . the probability of typeθ II error. . This will be done explicitly in this chapter for a number of interesting cases. θ the probability of type-I error. . the decision maker is somewhat conservative for holding the null hypothesis as true unless there is overwhelming evidence from the data that it is false. the (unknown) parameter θ does lie in the subset ω speciﬁed by H. He/she believes that the consequence of wrongly rejecting the null hypothesis is much more severe to him/ her than that of wrongly accepting it. maximizing the power is equivalent to minimizing the type-II error probability. in general. On the basis of limited experimentation. . From the consideration of potential losses due to wrong decisions (which may or may not be quantiﬁable in monetary terms). . The reason for this course of action is that the roles played by H and A are not at all symmetric. namely.005. 0. namely. In the particular case where y can be either 0 or 1 for all (x1.05. Then θ θ β(θ) with θ ∈ ω is the probability of rejecting H. θ ∈ω] is denoted θ ω θ ω by α and is called the level of signiﬁcance or size of the test. It is also plain that one would desire to make α as small as possible (preferably 0) and at the same time to make the power as large as possible (preferably 1). What the classical theory of testing hypotheses does is to ﬁx the size α at a desirable level (which is usually taken to be 0. xn 1 ) ∈B ′ . that is. this cannot be done. xn)′. suppose a pharmaceutical company is considering the marketing of a newly developed drug for treatment of a disease for which the best available drug in the market has a cure rate of 60%. 0. 1 − β(θ) represents the probability of an error. respectively. so that 1 − β(θ) = Pθ (accepting H). . calculated under the assumption that H is false. the research division claims that the new drug is more effective.

f(·.. the p. . (1) A level-α test which maximizes the power among all tests of level α is said to be uniformly most powerful (UMP).. In many important cases a UMP test does exist. Exercise 13. . level-α test if (i) sup [βφ(θ). that is.3 UMPaTests for Testing Certain Composite Hypotheses Testing Simple Hypothesis Against a Simple Alternative 329 effective or if it has harmful side effects.1 In the following examples indicate which statements constitute a simple and which a composite hypothesis: i) X is an r.f. In actuality. We set f0 = f(·. X n .d.1. .v. 13.f. . . .’s f0 and f1 need not 0 1 . ii) When tossing a coin. .’s with p. . On the other hand. the loss sustained by the company due to an immediate obsolescence of the product. failure to market a truly better drug is an opportunity loss. . If a decision is to be made on the basis of a number of clinical trials.v.f. Xn be i. X n ∈ B⎥ ⎣ ⎣ ⎦ ⎦ ′ ⎡ ⎤ + 0 ⋅ Pθ ⎢ X 1 . . Ω may consist of more θ than two points but we focus attention only on two of its points. whose expectation is equal to 5. etc. .2 Testing a Simple Hypothesis Against a Simple Alternative In the present case. r.d.∞)(x). θ). X n . DEFINITION 4 () () ( ) θ ∈Ω .v. taking the value 1 if head appears and 0 if tail appears. iii) X is an r. . θ ∈ ωc for any other test φ* which θ θ θ satisﬁes (i).d.’s. If ωc consists of a single point only.13. . In such a formulation. . Thus φ is a UMP. θ θ we want to test the hypothesis that the underlying p. We notice that for a nonrandomized test with critical region B. θ1) and let X1.v. . .f.i. Ω = {θ0. . but that may not be considered to be as serious as the other loss. will be quite severe. . . Thus β φ θ = β θ = Eθ φ X 1 . f1 = f(·. . . of the X’s is f0 against the alternative that it is f1. a UMP test is simply called most powerful (MP). X n ∈ Bc ⎥ = Eθφ X 1 .d. The problem is that of testing the hypothesis H : Ω θ ∈ ω = {θ0} against the alternative A : θ ∈ ω c = {θ1} at level α. X n ∈ B⎥ = 1 ⋅ Pθ ⎢ X 1 . whose p. f is given by f(x) = 2e−2xI(0. θ1}. we take Ω to consist of two points only. let X be the r. Let fθ and fθ be two given p. we have ′ ′ ⎡ ⎡ ⎤ ⎤ β θ = Pθ ⎢ X 1 . θ ∈Ω. decline of the company’s image. . Then the statement is: The coin is biased. which can be labeled as θ0 and θ1. In other words.213. θ0).d.d. . ⎣ ⎦ () ( ) ( ) ( ) ( ) and the same can be shown to be true for randomized tests (by an appropriate application of property (CE1) in Section 3 of Chapter 5). θ ∈ ω] = α and (ii) βφ(θ) ≥ βφ*(θ).f. . the null hypothesis H should be that the cure rate of the new drug is no more than 60% and A should be that this cure rate exceeds 60%.

.d. . X n = α . θ) · · · f(Xn. . for testing H against A at level α. In connection with this testing problem. . .f. . .i. θ1 ⋅ ⋅ ⋅ f xn . let T be the set of points z in n such that f0(z) > 0 and let Dc = Z−1(T c). θ1 > Cf x1 . . θ).d. we are going to prove the following result. ⎩ Eθ0 φ X 1 . ( ) Z = X1 . ( ( ) ) ( ( ) ) ( ( ) ) ( ( ) ) where the constants γ (0 ≤ γ ≤ 1) and C(>0) are determined so that ( ) (3) Then. respectively. otherwise. if f x1 . Xn be i. xn ( ) ⎧1. z = x1 . θ). . we set ′ dz = dx1 ⋅ ⋅ ⋅ dxn . θ 0 ⎪ (2) ⎪0. θ ∈ Ω = {θ0. f(·. θ) for f(x1. θ1 = Cf x1 . θ1}. r.f. . 0 Eθ φ Z = Pθ f1 Z > Cf0 Z + γPθ f1 Z = Cf0 Z 0 0 0 ( ) [ ( ) ( )] [ ( ) ( )] = P {[ f (Z) > Cf (Z)]I D} + γP {[ f (Z) = Cf (Z)]I D} θ0 1 0 θ0 1 0 ⎫ ⎧⎡ f Z ⎤ ⎪ 1 ⎪ = Pθ ⎨⎢ > C ⎥I D⎬ + γPθ ⎢ f0 Z ⎥ ⎪⎣ ⎪ ⎦ ⎩ ⎭ 0 = Pθ = Pθ 0 0 ⎫ ⎧ ⎡ f ( Z) ⎤ ( ) ⎪ ⎪ ⎢ = C ⎥ I D⎬ ⎨ ⎢ f ( Z) ⎥ ( ) ⎪ ⎪⎣ ⎦ ⎭ ⎩ [(Y > C )I D] + γP [(Y = C )I D] (Y > C ) + γP (Y = C ). .330 13 Testing Hypotheses even be members of a parametric family of p. . they may be any p.We are interested in testing the hypothesis H : θ θ = θ0 against the alternative A : θ = θ1 at level α (0 < α < 1). .d. . . in particular. Then Pθ Dc = Pθ Z ∈T c = ∫ f0 z dz = 0. . θ 0 ⋅ ⋅ ⋅ f xn . 1 0 0 θ0 θ0 (4) .’s with p. θ1 ⋅ ⋅ ⋅ f xn . PROOF For convenient writing. θ).d.’s which are of interest to us.v. θ 0 ⋅ ⋅ ⋅ f xn . if f x1 . Let φ be the test deﬁned as follows: φ x1 . 0 0 ( ) ( ) Tc () and therefore in calculating Pθ -probabilities we may redeﬁne and modify r. xn .’s on the set Dc. X n ( ) ′ and f(z. . Thus we have. . θ) · · · f(xn. THEOREM 1 (Neyman–Pearson Fundamental Lemma) Let X1. f(X1. f(Z. . Next.f.’s. θ). the test deﬁned by (2) and (3) is MP within the class of all tests whose level is ≤α. since the discrete case is dealt with similarly by replacing integrals by summation signs.v. . θ 0 ⎪ ⎪ = ⎨γ . The proof is presented for the case that the X’s are of the continuous type.

0 ( ) ( ) ( ) [ 0 ( )] [ ( )] ( ) ( ) and a(C) = 1 for C < 0.) At this point. α = a(C0) and if in (2) C is replaced by C0 and γ = 0. a is nonincreasing and continuous from the right. (See Fig. Futhermore. Y. First. a(C0) = a(C0 −). C0 is a continuity point of the function a.3 UMPaTests for Testing Certain Composite Hypotheses Testing Simple Hypothesis Against a Simple Alternative 331 a(C ) 1 Figure 13. Again we assert that the resulting test is of level α. take again C = C0 in (2) and also set γ = ( ) a(C − ) − a(C ) α − a C0 0 0 (so that 0 ≤ γ ≤ 1). Now let a(C) = Pθ (Y > C). there are two cases to consider. [ ( ) ( )] a(C − ) − a(C ) α − a C0 0 0 0 0 ) ( ) . a(∞) = 0.v. G is nondecreasing and continuous from the right. we have G(−∞) = 0..1 a(C ) a(C ) 0 C where Y = f1(Z)/f0(Z) on D and let Y be arbitrary (but measurable) on Dc. In this case. (4) becomes as follows: Eθ φ Z = Pθ Y > C0 + γPθ Y = C0 0 0 0 ( ) ( = a C0 + ( ) ( ) a C − − a C = α. Since G is a d. 0 0 ( ) ( ) ( ) as was to be seen. In the present case. Then. we assume that C0 is a discontinuity point of a.213. so that G(C) = 1 − a(C) = Pθ (Y ≤ C) is the d.f. 13.1 represents the graph of a typical function a. since Pθ (Y ≥ 0) = 1 Figure 13. the resulting test is of level α. 0 0 Pθ Y = C = G C − G C − = 1 − a C − 1 − a C − = a C − − a C . These properties of G imply that the function a is such that a(−∞) = 1. in this case (4) becomes Eθ φ Z = Pθ Y > C0 = a C0 = α .13.1.f. Next. Now for any α (0 < α < 1) there exists C0 (≥0) such that a(C0) ≤ α ≤ a(C0 −). In fact. of the r. that is. G(∞) = 1.

(7) and (8) yield βφ(θ1) − βφ*(θ1) ≥ 0. Then B+ ∩ B− = ∅ and. (3) is satisﬁed. . ∫ which is equivalent to [φ(z) − φ * (z)][ f (z) − Cf (z)]dz ≥ 0. That is. φ (z ) − φ * (z ) < 0} = (φ − φ * < 0). ▲ The theorem also guarantees that the power βφ(θ1) is at least α. 1 θ1 θ1 φ 1 1 (8) Relations (6). 1 0 1 n ∫ But n [φ(z) − φ * (z)]f (z)dz ≥ C ∫ [φ(z) − φ * (z)] f (z)dz. Now it remains for us to show that the test so deﬁned is MP. clearly. 0 0 0 ( ) ( ) ( ) φ* (7) and similarly. let φ* be any test of level ≤α and set B+ = z ∈ B − { = {z ∈ ( ( n n () () } ( ) . B+ = φ > φ * ⊆ φ = 1 ∪ φ = γ = f1 ≥ Cf0 ⎫ ⎪ ⎬ − B = φ < φ * ⊆ φ = 0 ∪ φ = γ = f1 ≤ Cf0 . This θ θ θ θ completes the proof of the theorem. 0 0 n (6) ∫ n [φ(z) − φ * (z)] f (z)dz = ∫ φ(z) f (z)dz − ∫ φ * (z) f (z)dz 0 n 0 = Eθ φ Z − Eθ φ * Z = α − Eθ φ * Z ≥ 0. we have that with C = C0. That is. and γ = ( ) a(C − ) − a(C ) α − a C0 0 0 (which it is to be interpreted as 0 whenever is of the form 0/0).332 13 Testing Hypotheses Summarizing what we have done so far. ∫ n [φ(z) − φ * (z)] f (z)dz = E φ(Z) − E φ * (Z) = β (θ ) − β (θ ).⎪ ⎭ Therefore ) ( ) ( ) ( ) ( ) ( ) ( ) ) (5) ∫ n [φ(z) − φ * (z)][ f (z) − Cf (z)]dz = ∫ [φ (z ) − φ * (z )][ f (z ) − Cf (z )]dz + ∫ [φ (z ) − φ * (z )][ f (z ) − Cf (z )]dz 1 0 B+ 1 0 B− 1 0 n and this is ≥0 on account of (5). φ z − φ * z > 0 = φ − φ* > 0 . as described in the theorem. That is. To see this. the test deﬁned by (2) is of level α. as deﬁned above. θ . or βφ(θ1) ≥ βφ*(θ1).

θ0. θ1} and the problem will be that of testing a simple hypothesis against a simple alternative at level of signiﬁcance α. then φ necessarily has the form (2) unless there is a test of size <α with power 1. then γ = 0 again and any C in (b1. if φ is an MP level α test.’s from B(1. It will then prove convenient to set R z.v. we have θ βφ(θ1) ≥ βφ*(θ1) = α. Xn be i. ▲ θ PROOF REMARK 1 i) The determination of C and γ is essentially unique. Ω = {θ0. r. θ ) ⋅ ⋅ ⋅ f (x . then γ = 0 and C is the unique point for which a(C) = α. This is so because Pθ Y ∈ b1 . α) and parallel to the C-axis has only one point in common with the graph of a. Also it is often more convenient to work with log R(z. EXAMPLE 1 Let X1. if the (straight) line through the point (0. θ ) f x1 . θ1) > 0. . The examples to be discussed below will illustrate how the theorem is actually used in concrete cases. b2 ≤ G b2 − G b1 0 [ ( ]] ( ) ( ) = [1 − a(b )] − [1 − a(b )] = a(b ) − a(b ) = 0. Then βφ(θ1) ≥ α. θ0. R(z. θ The test φ*(z) = α is of level α. of course. θ0. θ1 1 0 n 0 whenever the denominator is greater than 0. where x = Σ n= 1 xj and therefore. Finally. Then log 1−θ θ (z. θ ) = x log θ + (n − x) log 1 − θ 1 0 0 1 1 0 . 2 1 2 1 ii) The theorem shows that there is always a test of the structure (2) and (3) which is MP.i. θ1) itself. . θ . and since φ is most powerful. θ1 ⋅ ⋅ ⋅ f xn . if C = C0 is a discontinuity point of a.213. θ1) > C is j equivalent to θ1 1 − θ0 ⎛ 1 − θ1 ⎞ .13. by the fact that θ0 < θ1. R(z. This point will not be pursued further here. θ 0 . say. Next. b2]. x > C0 . provided. where C0 = ⎜ log C − n log ⎟ log 1 − θ0 ⎠ ⎝ θ0 1 − θ1 ( ( ) ) . then both C and γ are uniquely deﬁned the way it was done in the proof of the theorem. In fact. b2] can be chosen without affecting the level of the test. . θ) and suppose θ0 < θ1.3 UMPaTests for Testing Certain Composite Hypotheses Testing Simple Hypothesis Against a Simple Alternative 333 COROLLARY Let φ be deﬁned by (2) and (3). In the examples to follow. θ1) rather than R(z. . namely. The converse is also true. θ0.d. θ1 = ( ) ( ) ( ) f (x . if the above (straight) line coincides with part of the graph of a corresponding to an interval (b1.

⎪ ⎪0. the inequality signs in (9) and (10) j are reversed.v.75. EXAMPLE 2 Let X1. Thus the MP test is deﬁned by ⎧1. θi).i.882.95. Therefore the MP test in this case is given by (2) with C0 = 17 and γ = 0. 0 0 0 ( ) ( ) ( ) (10) and X = Σn= 1 Xj is B(n. let us take θ0 = 0.05 and n = 25.8356. The power of the test is P0.95.5 X = C0 = 0.75(X > 17) + 0. . ( ) ( ) For C0 = 17. ⎪ ⎪ φ z = ⎨γ .5 X ≤ C0 − γP0.75(X = 17) = 0.8792.5 X = C0 = 1 − P0. where n(θ −θ log ⎡Ce ⎢ ⎣ C0 = log θ1 θ0 1 0 ( ) )⎤ ⎥ ⎦. whence γ = 0. by using the assumption that θ0 < θ1.05 = P0.9784 and P0. θ0.5(X ≤ 17) = 0. we have.’s from P(θ) and suppose θ0 < θ1. θ 0 . . Then logR z.5 X = C0 ( ) ( ) ( ) ( ) is equivalent to P0. ⎩ if if () ∑ j =1 x j > C0 n ∑ j =1 x j = C0 n (9) otherwise. P0.5 X > C0 + γP0. r. θ0 ( ) where x = ∑ xj j =1 n and hence.50.882 P0. . by means of the Binomial tables. θ1) > C is equivalent to x > C0 . where C0 and γ are determined by Eθ φ Z = Pθ X > C0 + γPθ X = C0 = α . θ1 = x log ( ) θ1 − n θ1 − θ 0 .5 X ≤ C0 + γP0. i = 0. . Xn be i. α = 0. ⎪ ⎪ φ z = ⎨γ .0323γ = 0. ⎪ ⎪0.334 13 Testing Hypotheses Thus the MP test is given by ⎧1. Thus γ is deﬁned by 0.9784 − 0. 1. . If θ0 > θ1. For the sake of deﬁniteness. ⎩ if if () ∑ ∑ n j =1 n j =1 x j > C0 x j = C0 (11) otherwise. Then 0. θ1= 0. one has that R(z.5(X = 17) = 0.0323.d.

3(X ≤ 10) = 0.3(X = 10) = 0. 0 ( ) ( ) (14) ¯ and X is N(θi. Therefore the MP test in this case is given by (13) with C0 = 0. the inequality signs in (13) and (14) are reversed. ( ) [( ) ( ) )] [ ( ) ( )] whence C0 = 0.3. θ0. one has that for C0 = 10.05 and n = 20. i = 0.95.d.13. θ1 = 0. ( ) ( ) ( ) (12) and X = Σn Xj is P(nθi). () 0 if x > C0 otherwise.95. 1 1 ( ) [( ] [ ( ) ] . Let.0413.213. the inequality signs in (11) and (12) j=1 are reversed. EXAMPLE 3 ( ) ( ) Let X1. Thus the test is given by (11) with C0 = 10 and γ = 0.v.1791. If θ0 > θ1.’s from N(θ. α = 0. As an application. Thus the MP test is given by ⎧1.9574 − 0. θ1.4 X > 10 + 0. whence γ = 0.4. Then (14) gives P−1 X > C0 = P−1 3 X + 1 > 3 C0 + 1 = P N 0.4 X = 10 = 0. The power of the test is P X > 0.001 and n = 9.3 X ≤ C0 − γP0. . (13) where C0 is determined by Eθ φ Z = Pθ X > C0 = α .91 = 0.i. 1. = 1. ( ) ( ) By means of the Poisson tables.1791 P0. α = 0.9982. let us take θ0 = 0. φ z =⎨ ⎩0.0413γ = 0. 1/n). Therefore γ is deﬁned by 0. θ0 = −1. .2013. Then logR z. Xn be i. P0. θ 0 . θ1) > C is equivalent to x > C0.03. . i = 0. 1. 1 > 3 C0 + 1 = 0.001. 1 > −2. for example. θ1 = ( ) 1 n ⎡ ∑ ⎣ x j − θ0 2 j =1 ⎢ ( ) − (x 2 j 2 − θ1 ⎤ ⎥ ⎦ ) ¯ and therefore R(z.03. Then (12) becomes P0.3 UMPaTests for Testing Certain Composite Hypotheses Testing Simple Hypothesis Against a Simple Alternative 335 where C0 and γ are determined by Eθ0 φ Z = Pθ0 X > C0 + γPθ0 X = C0 = α . .9574 and P0. 1) and suppose θ0 < θ1. where C0 = n θ0 + θ1 1 ⎡ log C ⎢ + n ⎢ θ1 − θ0 2 ⎣ ( )⎤ ⎥ ⎥ ⎦ by using the fact that θ0 < θ1. The power of the test is P0.03 = P 3 X − 1 > −2. r.3 X = C0 = 0.1791. If θ0 > θ1.91 = P N 0.

θ1 = ( ) 1 θ1 − θ 0 θ x + log 0 . Then (16) becomes ⎛X C ⎞ ⎛ 2 C ⎞ P4 X > C0 = P4 ⎜ > 0 ⎟ = P ⎜ χ 20 > 0 ⎟ = 0. θ1 − θ0 θ0 ⎠ ⎝ Thus the MP test in the present case is given by ⎧1. . r. θ) and suppose θ0 < θ1.’s from N(0. 4 ⎠ 4 ⎠ ⎝ 4 ⎝ X ( ) whence C0 = 150. ⎝ j =1 ⎠ 0 () 0 (16) and θ is distributed as χ2 . construct the MP test of the hypothesis H that the common distribution of the X’s is N(0. α = 0. the inequaln j j i ity signs in (15) and (16) are reversed. θ1 = 16. What is the numerical value of n if α = 0.264.v. Xn be i. . where C0 is determined by ⎛ n ⎞ Eθ φ P = Eθ ⎜ ∑ X j2 > C0 ⎟ = α . .2. i = 0.336 13 Testing Hypotheses EXAMPLE 4 Let X1.i. X16 are independent r. where C0 = ⎛ 2θ0θ1 θ ⎞ log ⎜C 1 ⎟ . . . . θ0. one has predetermined values for α and β.2 Let X1. . . 1. If θ0 > θ1.’s distributed as N(μ. . . θ 0 .977.05. . 13. Also ﬁnd the power of the test. where μ is unknown and σ is known. let θ0 = 4. 9) at level of signiﬁcance α = 0.v.05.01.’s.d. β = 0. θ1) > C is j=1 j equivalent to x > C0. The power of the test is ⎛ X 150.01 and n = 20. ⎪ φ z =⎨ ⎪0. where X = Σn= 1 X2.264 = P ⎜ > 16 16 ⎟ = P χ 20 > 9. Thus the test is given by (15) with C0 = 150. so that.1 If X1. one has that R(z. 9) against the alternative A that it is N(1.9 and σ = 1? .264.v.264 ⎞ 2 P X > 150. Here logR z. For an example. ⎩ () if ∑ n j =1 x 2 > C0 j (15) otherwise. σ2). Xn be independent r. by means of θ0 < θ1. . 2θ 0θ1 2 θ1 where x = Σn x2.3915 = 0.2. 16 ⎠ ⎝ 16 ( ) ( ) Exercises 13. Show that the sample size n can be determined so that when testing the hypothesis H : μ = 0 against the alternative A : μ = 1.

. over the [0. X100 be independent r. It will prove convenient to set g z. In the following. where μ is unknown and σ is known.7 Let X be an r.05. . . 13. f which can be either f0 or else f1.d.’s having the monotone likelihood ratio property. . whose p.3 UMP Tests for Testing Certain Composite Hypotheses In the previous section an MP test was constructed for the problem of testing a simple hypothesis against a simple alternative.’s distributed as N(μ.f.2. . construct the MP test of the hypothesis H : μ = 3. or the Triangular p. This will be shown in the present section.2.v. σ2 = 4 against the alternative A : μ = 3. 13.f.05.13.6 Let X be an r.’s with p. If ¯ x = 3. θ ⋅ ⋅ ⋅ f x1 . in most problems of practical interest. r. .2. 1) p. Construct the MP test of the hypothesis H : β = 2 against the alternative A : β = 3 at level of signiﬁcance 0.d.5. construct the MP test of the hypothesis H : f = f0 against the alternative A : f = f1 at level of signiﬁcance α = 0. .v.2. where f0 is P(1) and f1 is the Geometric p.3 UMP Tests for Testing Certain Composite Hypotheses 337 13.f. .’s distributed as Gamma with α = 10 and β unknown.v. x! 2x ≥ C} for (Hint: Observe that the function x!/2x is nondecreasing for x integer ≥1. f1(x) = 4 − 4x for 1 ≤ x ≤ 1 and 0 otherwise).v. f1(x) = 4x for 0 ≤ x < 1 .v. θ). Xn be independent r.d. However.i. . In cases like this it so happens that for certain families of distributions and certain H and A. with p. σ2). . . 1] interval.d.36 × some positive number C. .2.’s distributed as N(μ. θ . 1.f. ( ) (17) Also Z = (X1. For testing the hypothesis 2 H : f = f0 against the alternative A : f = f1: i) Show that the rejection region is deﬁned by: {x ≥ 0 integer. This deﬁnition is somewhat more restrictive than the one found in more advanced textbooks but it is sufﬁcient for our purposes. with p = 1 .d. denoted by f1 (that is. . θ = f x1 . UMP tests do exist.f. . . . . Xn)′. show that α can get arbitrarily small and β arbitrarily large for sufﬁciently large n. . θ ∈ Ω ⊆ .f. 13.3 Let X1. f(·. . denoted by f0.01. . is either the U(0.) 13. Xn be i. we give the deﬁnition of a family of p. . On the basis of one 2 2 observation on X. Let X1. σ2 = 4 at level of signiﬁcance α = 0. ii) Determine the level of the test α when C = 3. 13. σ2).v.f. at least one of the hypotheses H or A is composite. .d. .4 Let X1.2. X30 be independent r.d. For testing the hypothesis H : μ = μ1 against the alternative A : μ = μ2.d. ( ) ( ) ( ) ′ z = x1 . xn .5 Let X1. . .

σ2) with σ2 known and N(μ. If Q is j=1 decreasing. θ) > 0 is independent of θ and there exists a (measurable) function V deﬁned in n into such that whenever θ. θ ′ ( ) = C (θ ′)e ( ) ( ) = C (θ ′) e[ () () g ( z. f x. or β = θ and α known. θ ∈ Ω} is said to have the monotone likelihood ratio (MLR) property in V if the set of z’s for which g(z. (18) . PROPOSITION 1 Consider the exponential family f x: θ =C θ e ( ) () Q θ T x () ( ) hx. θ ′) are distinct and (ii) g(z.338 13 Testing Hypotheses DEFINITION 5 The family {g(·. that is. one has g z. θ ) C (θ ) C (θ )e Q θ′ V z 0 0 Q θ V z 0 0 Q θ ′ −Q θ V z ( ) ( )] ( ) . but it is not of a one-parameter exponential type. θ) is a monotone function of V(z). This completes the proof of the ﬁrst assertion.d. the assumption that Q is increasing implies that g(z. θ) and g(·. () where C(θ) > 0 for all θ ∈ Ω ⊆ and the set of positivity of h is independent of θ.8(i). In what follows. θ) with μ known. Note that the likelihood ratio (LR) in (ii) is well deﬁned except perhaps on a set N of z’s such that Pθ(Z ∈ N) = 0 for all θ ∈ Ω. Now for θ < θ′.’s having the MLR property is a oneparameter exponential family. where V(z) = Σn T(xj) and g(· .d. Chapter 4) with parameter θ . Suppose that Q is increasing. () where C0(θ) = C (θ). V(z) = Σ T(xj) and h*(z) = h(x1) · · · h(xn). we will always work outside such a set. x ∈ . ▲ From examples and exercises in Chapter 11. θ′)/g(z. Therefore on the set of z’s for which h*(z) > 0 (which set has Pθ-probability 1 for all θ). θ ′ ∈ Ω with θ < θ ′ then: (i) g(·. Below we present an example of a family which has the MLR property.f.θ). Poisson. the family has the MLR property in V′ = −V. θ ∈ Ω} has the MLR property in V. θ) is given by (17). An important family of p. Negative Binomial.f. EXAMPLE 5 Consider the Logistic p. The proof of the second assertion follows from the fact that [Q(θ ′) − Q(θ )]V (z) = [Q(θ ) − Q(θ ′)]V ′(z). θ) is an increasing function of V(z). N(θ. (see also Exercise 4. Then the family {g(· . it follows that all of the following families have the MLR property: Binomial. θ = ( ) e − x −θ (1 + e ) − x −θ 2 . PROOF We have g z : θ = C0 θ e n n j=1 ( ) () Q θ V z () () h* z . θ ∈Ω = . θ). Gamma with α = θ and β known. θ ′)/g(z.1.

the MP test φ′ is given by PROOF ⎧1. θ) is deﬁned in (17). θ ) 2 if and only if θ −θ ′ ⎛ 1 + e − x −θ ⎞ ⎛ 1 + e − x ′−θ ⎞ < e θ −θ ′ ⎜ .’s with p. . θ ) f ( x ′. . ⎩ if if g z. In the case that the LR is increasing in V(z). θ ′ ) f ( x. θ0 () ( ) ( ) g( z. θ ) e θ −θ ′ ⎛ 1 + e − x −θ ⎞ ⎜ − x −θ ′ ⎟ ⎝1+ e ⎠ 2 and f x. θ ) 0 otherwise. ⎩ if V z > C if () () V ( z) = C (19) otherwise. by Theorem 1. θ ′ ( ) < f ( x ′. Let θ0 ∈ Ω and set ω = {θ ∈ Ω. ⎪ ⎪ φ ′ z = ⎨γ . θ ≤ θ0}.f. The proof of the theorem is a consequence of the following two lemmas.f. . Then. ⎪ ⎪ φ z = ⎨γ . the last inequality is equivalent to e−x < e−x′ or −x < −x′. r. Therefore if θ < θ′. . Xn be i. θ ∈ Ω} have the MLR property in V. θ). there exists a test φ which is UMP within the class of all tests of level ≤α. θ ′ > C ′g z. the test φ deﬁned by (19) and (19′) is MP (at level α) for testing the (simple) hypothesis H0 : θ = θ0 against the (composite) alternative A : θ ∈ ω c among all tests of level ≤ α. ( ) [() ] [() ] (19′) If the LR is decreasing in V(z). LEMMA 1 Under the assumptions made in Theorem 2. ⎜ − x −θ ′ ⎟ − x ′−θ ′ ⎟ ⎝1+e ⎠ ⎝1+e ⎠ 2 However. where g(·. f(x. this is equivalent to e−x(e−θ − e−θ′ ) < e−x′(e−θ − e−θ′ ).i. θ). ⎪ ⎪0 . .d. For families of p. the test is given by ⎧1. θ ∈ Ω ⊆ and let the family {g(·.’s having the MLR property. where C and γ are determined by Eθ 0 φ Z = Pθ 0 V Z > C + γPθ 0 V Z = C = α . THEOREM 2 Let X1.3 UMP Tests for Testing Certain Composite Hypotheses 339 Then f x. ⎪ ⎪0 . Then for testing the (composite) hypothesis H : θ ∈ ω against the (composite) alternative A : θ ∈ω c at level of signiﬁcance α.v.13. This shows that the family {f(·. θ ∈ } has the MLR property in −x. Let θ′ be an arbitrary but ﬁxed point in ω c and consider the problem of testing the above hypothesis H0 against the (simple) alternative A′ : θ = θ′ at level α. θ ′ ( )=e f ( x. the test is taken from (19) and (19′) with the inequality signs reversed.d. we have the following important theorem. θ).d. θ ′) = C ′g( z.

⎪ ⎪ φ ′ z = ⎨γ . ⎪ ⎭ () () ( ) (20) Eθ 0 φ ′ Z = Pθ 0 ψ V Z > C ′ + γ ′ θ 0 ψ V Z = C ′ P = Pθ 0 V Z > C0 + γ ′Pθ 0 V Z = C0 . ⎪ ⎪ φ ′ z = ⎨γ . where C′ and γ′ are determined by Eθ ′φ ′ Z = Pθ ′ ψ V Z > C ′ + γ ′ θ′ ψ V Z = C ′ = α θ ′ . Then in the case under consideration ψ is deﬁned on into itself and is increasing. we have that C = C0 and γ = γ ′ and the test given by (19) and (19′) is UMP for testing H0 : θ = θ0 against A : θ ∈ ω c (at level α). θ0 > C ′g z. and Eθ0 φ ′ Z = Pθ0 V Z > C0 + γ ′ Pθ0 V Z = C 0 = α . by Theorem 1. ▲ LEMMA 2 Under the assumptions made in Theorem 2. θ ′) otherwise. It follows from (21) and (21′) that the test φ′ is independent of θ′ ∈ ω c. ( ) [() ] [() ] (21′) so that C0 = C and γ ′ = γ by means of (19) and (19′).340 13 Testing Hypotheses where C′ and γ′ are deﬁned by Eθ 0 φ ′ Z = α . θ ) = C ′g( z. ( ) Let g(z. θ0) = ψ [V(z)]. ( ) [() { [ ( )] } ] [() { [ ( )] } ] Therefore the test φ′ deﬁned above becomes as follows: ⎧1. In other words. Let θ′ be an arbitrary but ﬁxed point in ω and consider the problem of testing the (simple) hypothesis H′ : θ = θ′ against the (simple) alternative A0(= H0) : θ = θ0 at level α(θ′) = Eθ′φ(Z). ⎪ ⎪0 . ⎪ ⎪0 . the MP test φ′ is given by PROOF ⎧1. if and only if if and only if V z > ψ −1 C ′ = C0 ⎫ ⎪ ⎬ V z = C0 . P ( ) { [ ( )] } { [ ( )] } ( ) . Therefore [ ( )] ψ [V ( z)] = C ′ ψ V z > C′ In addition. ⎩ if V z > C0 if 0 () () V ( z) = C (21) otherwise. we have Eθ′φ(Z) ≤ α for all θ′ ∈ ω. θ ′ 0 () ( ) ( ) g( z. Once again. θ′ )/g(z. ⎩ if if g z. and for the test function φ deﬁned by (19) and (19′).

C ⊆ C0. θ = C θ e hx. Also for testing H : θ ∈ ω = {θ ∈ Ω. j PROOF It is immediate on account of Proposition 1 and Remark 2. Let X1. by Lemma 1. In all tests. θ) given by Q (θ )T ( x ) f x. . V(z) = Σn= 1 T(xj). . the test φ. The test is given by (19) and (19′) if the LR is decreasing in V(z) and by those relationships with the inequality signs reversed if the LR is increasing in V (z). clearly.f. C0 0 0 { } = {all level α tests for testing H : θ = θ }.v. Next. Hence it is MP among tests in C. for the problem discussed in Theorem 2 and also the symmetric situation mentioned . f(·. there is a test φ which is UMP within the class of all tests of level ≤α. This test is given by (19) and (19′) if Q is increasing and by (19) and (19′) with reversed inequality signs if Q is decreasing. and is MP among all tests in C0. . ▲ REMARK 2 For the symmetric case where ω = {θ ∈ Ω. under the assumptions of Theorem 2. θ ∈ Ω. ▲ It can further be shown that the function β(θ) = Eθφ(Z). the test φ′ above also becomes as follows: ⎧1. ▲ PROOF OF THEOREM 2 Deﬁne the classes of test C and C0 as follows: C = all level α tests for testing H : θ ≤ θ0 . a UMP test also exists for testing H : θ ∈ ω against A : θ ∈ ω c. there is a test φ which is UMP within the class of all tests of level ≤α. θ ≤ θ0} against A : θ ∈ ω c at level α. Eθ ′φ ′ Z = Pθ ′ V Z > C0′ + γ ′Pθ ′ V Z = C0′ = α θ ′ . ⎪ ⎪ φ ′ z = ⎨γ .d. Furthermore. deﬁned by (19) and (19′). one has that α(θ′) ≤ α. ( ) [() ] [() ] ( ) (22′) where C′0 = ψ −1(C′ ).’s with p. Then for testing H : θ ∈ ω = {θ ∈ Ω. The desired result follows. θ ≤ θ0} against A : θ ∈ω c at level of signiﬁcance α.13.3 UMP Tests for Testing Certain Composite Hypotheses 341 On account of (20). ⎩ if V z > C0′ if () () V ( z) = C ′ 0 (22) otherwise.i. Replacing θ0 by θ′ in the expression on the left-hand side of (19′) and comparing the resulting expression with (22′). Xn be i. This test is given by (19) and (19′) if Q is decreasing and by those relationships with reversed inequality signs if Q is increasing. ⎪ ⎪0 .d. COROLLARY ( ) () () where Q is strictly monotone. since α is the power of the test φ′. ′ Therefore the tests φ′ and φ are identical. θ ≥ θ0}. by the corollary to Theorem 1. belongs in C by Lemma 2. . r. one has that C0 = C and γ ′ = γ. Then. The relevant proof is entirely analogous to that of Theorem 2.

respectively). This result is stated as a theorem below but its proof is not given. θi 2 ] () n j =1 and V z = ∑ T x j . θ = C θ e ( ) () Q θ T x () ( ) hx. ⎪ ⎪ φ z = ⎨γ i ⎪ ⎪0.342 13 Testing Hypotheses ( ) 1 1 ( ) 0 0 0 0 Figure 13. θ). In the case that Q is increasing.d.2 and 13. θ may represent a dose of a certain medicine and θ1. γ2 are determined by Eθ φ Z = Pθ C1 < V Z < C 2 + γ 1Pθ V Z = C1 i i i ( ) [ +γ 2 ( ) ] [() P [V (Z) = C ] = α . If θ ≤ θ1 the dose is rendered harmless but also useless. () (23) where Q is assumed to be strictly monotone and θ ∈ Ω = . is increasing for those θ’s for which it is less than 1 (see Figs. ⎩ if C1 < V z < C 2 if V z = C i otherwise. . θ ≤ θ1 or θ ≥ θ 2 { } against A : θ ∈ω c. θ ≤ θ1 or θ ≥ θ2}. Then for testing the (composite) hypothesis H : θ ∈ ω against the (composite) alternative A : θ ∈ ω c at level of signiﬁcance α. there exists a UMP test φ.’s with p. since this would rather exceed the scope of this book. Set ω = {θ ∈ Ω. r. One may then hypothesize that the dose in question is either useless or harmful and go about testing the hypothesis. then a UMP test for the testing problem above does exist. i = 1. Another problem of practical importance is that of testing H : θ ∈ω = θ ∈ Ω. Xn be i. If the underlying distribution of the relevant measurements is assumed to be of a certain exponential form. f(·.v. A : θ > θ0 Figure 13. θ2 ∈ Ω and θ1 < θ2.d. 2. φ is given by ⎧1.3.i. θ2 are the limits within which θ is allowed to vary. A : θ < θ0 in Remark 2. 13. () () () (i = 1.2 H : θ ≤ θ0. where θ1. given by f x. For instance. C2 and γ1. θ2 ∈ Ω and θ1 < θ2.f. . . where θ1. THEOREM 3 Let X1. ( ) (25) .3 H : θ ≥ θ0. 2) (C 1 < C2 ) (24) where C1. whereas if θ ≥ θ2 the dose becomes harmful. .

. θ1. on account of the corollary to Theorem 2. EXAMPLE 6 Let X1.4). A : θ1 < θ < θ2.01 and n = 25. θ ∈ Ω for the problem discussed in Theorem 3. The level of signiﬁcance is α(0 < α < 1). the test is given again by (24) and (25) with C1 < V (z) < C2 replaced by V(z) < C1 or V(z) > C2.i. If Q is decreasing. Xn be i. 0 0 0 ( ) ( ) ( ) (27) and X = ∑Xj j =1 n is B n. θ . ⎪ ⎪ φ z = ⎨γ . θ ≤ θ1 or θ ≥ θ2} against A′ : θ ∈ωc.d. the UMP test for testing H is given by ⎧1.13.’s from B(1. θ ∈ Ω = (0. let θ0 = 0. . r. Here V z = ∑ xj j =1 () n and Q θ = log () θ 1−θ is increasing since θ/(1 − θ) is so. θ2 ∈Ω and θ1 < θ2. 1). ⎪ ⎪0. increases for θ ≤ θ0 and decreases for θ ≥ θ0 for some θ1 < θ0 < θ2 (see also Fig. . we mention once and for all that the hypotheses to be tested are H : θ ∈ω = {θ ∈Ω. θ ≤ θ0} against A : θ ∈ωc and H′ : θ ∈ω = {θ ∈ Ω. where C and γ are determined by Eθ φ Z = Pθ X > C + γPθ X = C = α . θ). Theorems 2 and 3 are illustrated by a number of examples below.3 UMP Tests for Testing Certain Composite Hypotheses 343 ( ) 1 0 1 0 2 Figure 13. It can also be shown that (in nondegenerate cases) the function β(θ) = Eθφ(Z). Then one has .5.v. . ⎩ if if () ∑j = 1 x j > C n ∑j = 1 x j = C n (26) otherwise. ( ) For a numerical application. Therefore. α = 0.4 H : θ ≤ θ1 or θ ≥ θ2. 13. In order to avoid trivial repetitions. θ0.

25 X = C1 + γ 2 P0. α = 0. C2 and γ1.75 1 2 0.25.5 X = 15 = 0.75 X = 18 = 0. ⎩ if C1 < ∑ j=1 x j < C 2 n () if ∑ n j=1 x j = Ci (i = 1. take θ0 = 0. r.v. The Binomial tables provided the values C = 18 and γ = test at θ = 0. 363 .i.d.75 X > 18 + ( ) ( ) 27 P0.5. The power of the βφ 0. Xn be i. θ ∈Ω = (0.344 13 Testing Hypotheses P0. for testing H′ the UMP test is given by ⎧1 ⎪ ⎪ φ z = ⎨γ i ⎪ ⎪0.75.5 is 205 ≈ 0. One has then P0.75 1 2 1 0. 418 [ ( ) ( )] Let X1. one has after some simpliﬁcations 416γ 1 + 2γ 2 = 205 2γ 1 + 416γ 2 = 205. 2) ( ) otherwise. . take θ1 = 0.25 X = C 2 = 0. ( ) ( ) ( ) Again for a numerical application. θ2 = 0.05 P0. . .5 X > C + γP0. i = 1.5923. Therefore the UMP test for testing H is again given by (26) and (27).75 = P0. 418 βφ 0. 2 For C1 = 10 and C2 = 15. For a numerical example.05 and n = 25.75 is 27 143 ( ) ( ) . α = 0. where now X is P(nθ).05 and n = 10. by means of the Poisson tables. from which we obtain γ1 = γ 2 = The power of the test at θ = 0. we ﬁnd C = 9 and γ = 182 ≈ 0. ∞).’s from P(θ).25 C1 < X < C 2 + γ 1 P0. 143 ( ) By virtue of Theorem 3.5014.05. γ2 deﬁned by Eθ i φ Z = Pθ i C1 < X < C 2 + γ 1 Pθ i X = C1 + γ 2 Pθ i X = C 2 = α .5 X = C = 0.5 10 < X < 15 + EXAMPLE 7 ( ) ( ) 205 P0. Here V(z) = Σn= 1 xj and j Q(θ) = log θ is increasing.4904. 2.6711.5 X = 10 + P0.5 = P0. with C1. . Then.01.75 ( (C ) ( ) < X < C ) + γ P (X = C ) + γ P ( ) (X = C ) = 0.

6 H : θ ≥ θ0. . r. σ2) with σ2 known. 2 n ) 0 C N ( 0. α = 0.8023. . . ⎥ σ ⎦ 1 ) N ( 0.6. Here V z = ∑ xj j =1 () n and Qθ = () 1 θ σ2 is increasing.5 and 13. βφ θ = 1 − Φ ⎢ ⎢ ⎥ σ ⎣ ⎦ For instance. (See also Fig. i i ( ) ( ) i = 1. σ2/n).3 UMP Tests for Testing Certain Composite Hypotheses 345 The power of the test at θ = 1 is βφ(1) = 0.7.v. Xn be i. for σ = 2 and θ0 = 20. ( ) ( ) ¯ and X is N(θ.’s from N(θ. .66. Figure 13. () if x > C otherwise.13. On the other hand. for testing H′ the UMP test is given by ⎧1. φ z =⎨ ⎩0. 2 n ) 0 0 C 0 Figure 13. For θ = 21. A : θ < θ0. where C is determined by Eθ 0 φ Z = Pθ 0 X > C = α . the power of the test is () ( ) βφ 21 = 0. ( ) () if C1 < x < C 2 otherwise.5 H : θ ≤ θ0. is given by ⎡ n C −θ ⎤ ⎥.i. A : θ > θ0.6048. 2. φ z =⎨ ⎩0 .05 and n = 25. Therefore for testing H the UMP test is given by (dividing by n) ⎧1. one has C = 20.) The power of the test is given by ⎡ n C −θ 2 βφ θ = Φ ⎢ ⎢ σ ⎣ () ( ) ⎤ − Φ ⎡ n (C ⎥ ⎢ ⎥ ⎦ ⎢ ⎣ −θ ⎤ ⎥. as is easily seen. C2 are determined by Eθ φ Z = Pθ C1 < X < C 2 = α .d. 13. (See also Figs.) The power of the test. EXAMPLE 8 Let X1. where C1. . 13.

take θ0 = 4. the UMP j test is given by ⎧1. Xn be i.’s from N(μ. For instance. C2 = 0.8 H : θ ≤ θ0.i.344. the power of the test is βφ(12) = 0.8 and 13. α = 0. is given by 2 βφ θ = 1 − P χ n < C θ () ( ) (independent of μ!). A : θ > θ0. χ2 stands for an r. ⎩ () if C1 < ∑j = 1 x j − μ n ( ) 2 < C2 otherwise. Then one has C = 150.v.05 and n = 25. Figure 13. A : θ < θ0.980. EXAMPLE 9 Let X1. ⎪ φ z =⎨ ⎪0.d. θ2 = 1. θ) with μ known. ⎪ φ z =⎨ ⎪ ⎩0.05 and n = 25. α = 0.610. r. for σ = 2 and θ1 = −1. C2 are determined by 2 n 2 n 0 C/ 0 0 C/ 0 Figure 13. as is easily seen. ⎤ > C⎥ = α. distribution as χ2.) n n For a numerical example. and for θ = 12. 13.9.346 13 Testing Hypotheses N ( 0. for testing H′ the UMP test is given by ⎧1. .7 H ′ : θ ≤ θ1 or θ ≥ θ2. Therefore for testing H. On the other hand. A′ : θ1 < θ < θ2. ⎥ ⎦ where C is determined by ⎡n Eθ φ Z = Pθ ⎢∑ X j − μ ⎢ j =1 ⎣ 0 ( ) 0 ( ) 2 The power of the test. 2 n ) 0 C1 0 C2 Figure 13. . . one has C1 = −0.v. () if ∑j = 1( x j − μ ) n 2 >C otherwise.344. . the power of the test is βφ(0) = 0. . (See also Figs. where C1.608. and for θ = 0. Then V(z) = Σn= 1 (xj − μ)2 and Q(θ) = −1/(2θ) is increasing.9 H : θ ≥ θ0.

Exercises 13. . .f. ( ) ( ) ⎛ 2 C ⎞ ⎛ 2 C ⎞ P⎜ χ 25 < 2 ⎟ − P⎜ χ 25 < 1 ⎟ = 0. α = known Γα ( ) () ( ) (> 0).d. .3 UMP Tests for Testing Certain Composite Hypotheses Exercises n ⎡ Eθ φ Z = Pθ ⎢C1 < ∑ X j − μ ⎢ j =1 ⎣ i 347 ( ) i ( ) 2 ⎤ < C2 ⎥ = α . θ2 = 3. .v. α = 0. . θ = θ r ⎜ ⎟ 1 − θ I A x .d. . with unknown mean μ and known s.d.3 The length of life X of a 50-watt light bulb of a certain brand may be assumed to be a normally distributed r.i. . 2. . show that the joint p. θ = ( ) θ α α −1 −θx x e I ( 0 . .∞ ) x . for θ1 = 1.1 Let X1. . .’s with p. of the X’s has the MLR property in V = V(x1. as is easily seen. A = 0. The power of the test. ( ) x ⎛ r + x − 1⎞ ii) f x. r. 13.3. i) f x. For instance.3.2 Refer to Example 8 and show that.01. θ ∈ Ω = 0.01 and n = 25. .01. X25 be independent r.800 against the alternative A : μ < 1.800 at level of signiﬁcance α = 0. C2 are determined from the Chi-square tables (by trial and error). for testing the hypotheses H and H′ mentioned there.v. In each case. θ ∈ Ω = 0. f given below. σ = 150 hours.730 hours.3. . xn) and identity V.’s distributed as X and ¯ suppose that x = 1.d.f. ∞ . . ⎥ ⎦ i = 1. 1. 1 .01 3⎠ 3⎠ ⎝ ⎝ and C1. we have 2 2 P χ 25 < C 2 − P χ 25 < C1 = 0. Xn be i. is given by ⎛ 2 C ⎞ ⎛ 2 C ⎞ βθ θ = P ⎜ χ n < 2 ⎟ − P ⎜ χ n < 1 ⎟ θ ⎠ θ⎠ ⎝ ⎝ () (independent of μ!). . the power of the respective tests is given by ⎡ n C −θ βφ θ = 1 − Φ ⎢ ⎢ σ ⎣ () ( )⎤ ⎥ ⎥ ⎦ −θ ⎤ ⎥ ⎥ σ ⎦ 1 and ⎡ n C −θ 2 βφ θ = Φ ⎢ ⎢ σ ⎣ () ( ) ⎤ − Φ ⎡ n (C ⎥ ⎢ ⎥ ⎦ ⎢ ⎣ ) as asserted. .v. Let X1. Test the hypothesis H : μ = 1. .13. x ⎠ ⎝ ( ) ( ) () { } 13.

625. 25.4 The rainfall X at a certain station during a year may be assumed to be a normally distributed r.01.05. and secondly. σ = 3 inches and unknown mean μ. .9.500.9 Let X be an r. . iv) Determine the minimum sample size n required by using the Binomial tables (if possible) and also by using the CLT.d. Test the hypothesis H : μ = 30 against the alternative A : μ < 30 at level of signiﬁcance α = 0. . the power of the respective tests is given by ⎛ 2 C⎞ ⎛ 2 C ⎞ ⎛ 2 C ⎞ βφ θ = 1 − P⎜ χ n < ⎟ and βφ θ = P⎜ χ n < 2 ⎟ − P⎜ χ n < 1 ⎟ θ⎠ θ ⎠ θ⎠ ⎝ ⎝ ⎝ () () as asserted.95. What does the relevant test become for Σ25 1x2 = 120.6 Let X1.v. 13.’s distributed as N(0.375. p). x10 = 25. 0.5 Refer to Example 9 and show that.3.9 against the alternative θ1 = 0. .9.05.7 In a certain university 400 students were chosen at random and it was found that 95 of them were women. For testing the hypothesis H : p ≤ 1 against the alternative A : p > 1 . .0.3.4.8. 0. x4 = 29.05? iii) Compute the power at θ1 = 0. x7 = 30.v. For the past 10 years. x8 = 28. .3. use the CLT for its numerical determination. 13. Now let θ0 = 0. On the basis of this. . X25 be independent r.125 and α = 0.3. .2.05. . x9 = 31. θ ∈Ω = (0. 13. distributed as P(λ).v. i) Derive the UMP test for testing the hypothesis H : θ ≤ θ0 against the alternative A : θ > θ0 at level of signiﬁcance α.1.25 and α = 0. Test the hypothesis H : σ ≤ 2 against the alternative A : σ > 2 at level of signiﬁcance α = 0.3. suppose that α = 0.’s distributed as B(1. . x3 = 27.10 The number X of fatal trafﬁc accidents in a certain city during a year may be assumed to be an r. write out the expression for the exact determination of the cut-off point. distributed as B(n. σ2). the record provides the following rainfalls: x1 = 30. test the hypothesis H that the proportion of women is 25% against the alternative A that is less than 25% at level of signiﬁcance α = 0. Use the CLT in order to determine the cut-off point. θ).v. with s. Xn be independent r. . 13. x2 = 34.05 2 2 7 and β( 8 ) = 0.348 13 Testing Hypotheses 13. 1). where xj is the j= j observed value of Xj. First.875. j = 1. For the latest year x = 4.v. Test whether it has been an improvement. ii) What does the test in (i) become for n = 10.25.3. θ0 = 0. 13.750.5.3. 0. at level of signiﬁcance α = 0. Use the CLT in order to determine the required sample size n. x5 = 35.7.3.1 and suppose that we are interested in securing power at least 0.8 Let X1. whereas for the past several years the average was 10. for testing the hypotheses H and H′ mentioned there. 0. x6 = 26. . . 13.

d. 13. . f given by f x.’s distributed as X and suppose that x = 15. Also under the assumptions of Theorem 2. for testing H : θ ∈ω = {θ ∈Ω. Then Ω for testing the hypothesis H : θ ∈ω against the alternative A : θ ∈ ωc at level of ω signiﬁcance α. . a UMP test does not exist any longer. . . it was stated that under the assumptions of Theorem 3. . .4 UMPU Tests for Testing Certain Composite Hypotheses 13. a UMP test exists. ∞ . ii) Determine the minimum sample size n required to obtain power at least 0. . .∞ ) x . Next. one would have to restrict oneself to a smaller class of tests. A UMP test is always UMPU. .3. θ = ( ) 1 −x θ e I (0 .d. f(·. θ () θ ∈ Ω = 0. Xn) ≥ α for all θ ∈ωc.13. if the roles of H and A are interchanged.05. DEFINITION 7 In the notation of Deﬁnition 6. Test the hypothesis H : p = 10−4 against the alternative A : p > 10−4 at level of signiﬁcance α = 0.95 against the alternative θ1 = 500 when θ0 = 1.v. . Xn) ≤ α for all θ ∈ ω and Eθφ(X1. . Thus there is no unique test which is UMP for all θ ≠ θ0. This is so because the test given by (19) and (19′) is UMP for θ > θ0 but is worse than the trivial test φ(z) = α for θ < θ0. It is then somewhat surprising that. in the ﬁrst place it is unbiased because it is at least as powerful as the test which is identically equal to α. r. a test φ based on X1. . . . for testing H0 : θ = θ0 against A″ : θ ≠ θ0 a UMP does not exist.f. . In fact.12 Let X1. Xn be i. a test is said to be uniformly most powerful unbiased (UMPU) if it is UMP within the class of all unbiased tests. .’s with p. Let X1.f. DEFINITION 6 Let X1. Xn be independent r.d.v. .v. .3.4 UMPU Tests for Testing Certain Composite Hypotheses In Section 13.11 Let X be the number of times that an electric light switch can be turned on and off until failure occurs.’s with p. 13. the deﬁning property of an unbiased test is that the type-I error probability is at most α and the power of the test is at least α. REMARK 3 . θ). This leads us to introducing the concept of an unbiased test. θ ∈Ω and let ω ⊂ Ω ⊆ r.000 and α = 0. . distributed as Negative Binomial with r = 1 and unknown p.3. . it is UMPU because it is UMP within a class including the class of unbiased tests. The above observations suggest that in order to ﬁnd a test with some optimal property.3 UMP 349 13. .05. ω That is. θ ≤ θ1 or θ ≥ θ2} against A : θ ∈ω c.i. Xn is said to be unbiased if Eθφ(X1. . Then X may be considered to be an r. .v. . X15 be ¯ independent r.150. ( ) i) Derive the UMP test for testing the hypothesis H : θ ≥ θ0 against the alternative A : θ < θ0 at level of signiﬁcance α .

Xn be i.i.350 13 Testing Hypotheses For certain important classes of distributions and certain hypotheses. by Examples and Exercises of Chapter 11 it can be seen that all these families are of the exponential form REMARK 4 ( ) 1 0 1 0 2 Figure 13. Then for testing the hypothesis H : θ ∈ω against A : θ ∈ ωc and the hypothesis H0 : θ ∈ω0 against A0 : θ ∈ω c at level of signiﬁcance α. there exist UMPU tests which are given by ⎧1. We would expect that cases like Binomial. . θ) given by f x. γi. ( ) 0 i = 1.v. A : θ < θ1 or θ > θ2. The following theorem covers cases of this sort. but it will be presented without a proof. while they seemingly do not. . . In fact. THEOREM 4 Let X1. However. . .d. . . Poisson and Normal would fall under Theorem 4. . 2 for H . . (Recall that z = (x1. xn)′. f(·. Z = (X1.) j Furthermore.10). 13. i = 1. θ ∈Ω (except for degenerate cases) is decreasing for θ ≤ θ0 and increasing for θ ≥ θ0 for some θ1 < θ0 < θ2 (see also Fig. UMPU tests do exist. 2 are given by Eθi φ Z = α . where the constants Ci. θ = C θ e ( ) () θT x ( ) hx.10 H : θ1 ≤ θ ≤ θ2. . Eθ V Z φ Z = αEθ V Z 0 0 ( ) [ ( ) ( )] ( ) for H 0 .d. θ1 ≤ θ ≤ θ2} and ω0 = {θ0}. . (28) Let ω = {θ ∈Ω. . () θ ∈Ω ⊆ . 2) (C 2 i 1 < C2 ) otherwise. a simple reparametrization of the families brings them under the form (28). (i = 1. Xn)′ and V(z) = Σn= 1 T(xj). θ2 ∈ Ω and θ1 < θ2.’s with p.f. . ⎪ ⎪ φ z = ⎨γ i . and Eθ φ Z = α . ⎩ if V z < C1 if () or V (z ) > C () V (z ) = C . it can be shown that the function βφ(θ) = Eθφ(Z). ⎪ ⎪0. r. θ1. where θ0.

where τ i = log θi . Q(θ) = (1/σ2)θ and the factor 1/σ2 may be absorbed into T(x). i = 0. iii) For the Normal case with σ known and μ = θ. As an application to Theorem 4 and for later reference. τ = τ0. 1 − θi i = 0.3 UMP 351 f x. where τi = −1/(2θi). equivalently. 0 0 0 ( ) [ ( ) ( )] ( ) Now φ can be expressed equivalently as follows: . σ2). σ2 Therefore. . Let σ be known and set μ = θ. Q(θ) = −1/(2θ) and the transformation −1/(2θ) = τ brings the family under the form (28) again. τ1 ≤ τ ≤ τ2 and τ = τ0. Eθ V Z φ Z = αEθ V Z . ii) For the Poisson case. τ1 ≤ τ ≤ τ2. () i) For the Binomial case. τ1 ≤ τ ≤ τ2. In the present case. or n x > C2 σ2 where C1. the UMPU test is as follows: ⎧ ⎪1.4 UMPU Tests for Testing Certain Composite Hypotheses 13. Then by setting log[θ/(1 − θ)] = τ. The transformation implies θ = eτ and the hypotheses θ1 ≤ θ ≤ θ2. Since θ = −1/(2τ). T x = so that V z = ∑T x j = j =1 () 1 x. The level of signiﬁcance will be α. . equivalently. 2.v. ⎩ () if n x < C1 σ2 otherwise. by Theorem 4. σ2 () n ( ) 1 σ2 ∑x j =1 n j = n x. θ = C θ e ( ) () Q θ T x () ( ) hx. 1. 1. i = 0. the hypotheses θ1 ≤ θ ≤ θ2 and θ = θ0 become.’s from N(μ. θ = θ0 become equivalently. φ z =⎨ ⎪0 . Q(θ) = log[θ/(1 − θ)]. r. 2. . θ = θ0 become. 2. Suppose that we are interested in testing the hypothesis H : θ = θ0 against the alternative A : θ ≠ θ0. we consider the following example. iv) For the Normal case with μ known and σ2 = θ. EXAMPLE 10 Suppose X1.i. From this transformation. . the family is brought under the form (28). Q(θ) = log θ and the transformation log θ = τ brings the family under the form (28). 1. Xn are i. C2 are determined by Eθ φ Z = α . we get θ = eτ/(1 + eτ ) and the hypotheses θ1 ≤ θ ≤ θ2.d. τ = τ0 with τi = log θi.13.

ϑ 1 . More explicitly. ϑ 1 . In many situations of practical importance. the p. By summarizing then. . .f. ϑ k = C θ . . ϑ k exp θT x 1 1 k k k ( ) ( ) [ () + ϑ T ( x) + ⋅ ⋅ ⋅ + ϑ T ( x )]h( x). . . . ϑ1. ⎩ ⎡ n x −θ 0 if ⎢ ⎢ σ ⎣ otherwise. − ′ σ σ n – ¯ On the other hand. . . is of the following exponential form: f x.f. ϑk are left unspeciﬁed. φ z =⎨ ⎪ ⎪0. Also 1 2 n x − θ0 ( σ ) < −C or n x − θ0 ( σ ) >C is equivalent to ⎡ n x −θ 0 ⎢ ⎢ σ ⎣ ( )⎤ ⎥ ⎥ ⎦ 2 >C – ¯ and. These latter parameters are known as nuisance parameters. . . . the underlying p. Then the (composite) hypotheses of interest are the following ones. .d. θ2 ∈Ω with θ1 < θ2. ( ) < C′ 1 or n x − θ0 ( σ ) > C′ 2 where C1′ = σC1 n − nθ0 nθ0 σC 2 .352 13 Testing Hypotheses ⎧ ⎪ φ z = ⎨1. of course. . because of symmetry C′ = −C′ = −C. () if n x − θ0 σ otherwise. . . θ1. . ⎪ ⎩0. [√n(X − θ0)/σ]2 is χ2 . . Therefore. C2 = . . under H. and in addition some other real-valued parameters ϑ1. under H. Let θ0. . 1). ( ) (29) where θ ∈Ω ⊆ . . we have 1 ⎧ ⎪ ⎪1. ϑk are real-valued and h(x) > 0 on a set independent of all parameters involved. say (C > 0). √n(X − θ0)/σ is N(0. ϑk in which we have no interest. θ . () ( )⎤ ⎥ ⎥ ⎦ 2 >C where C is determined by P χ12 > C = α . involves a real-valued parameter θ in which we are exclusively interested. . where ϑ1.d.

there exist UMPU tests with level of signiﬁcance α for testing any one of the hypotheses Hi(H′ ) against the alternatives 1 Ai(A′ ).4.13. Use the UMPU test for testing the hypothesis H : p = 1 against the alternative A : p ≠ 1 at level of signiﬁcance α = 0.’s with p. THEOREM 5 Let X1.d.05. where both μ and σ2 are unknown.v.25 at level of signiﬁcance α. θ ≤ θ or θ ≥ θ }⎪A ( A′) :θ ∈ω ⎪ {θ ∈Ω. i = 1.25 against A : p ≠ 0. except for the ﬁrst one which is UMP.’s from N(μ.v. Xn be i. . θ ≤ θ ≤ θ } ⎪ ⎪ {θ } ⎪ ⎭ 0 0 1 2 i i 1 2 0 c .f. This is a consequence of Theorem 5 (except again for the UMP test). . . . Also the remaining (unspeciﬁed) regularity conditions in Theorem 5 can be shown to be satisﬁed here. r.’s distributed as B(1. However. with probability p of falling heads. Derive the UMPU test for testing H : p = 0. θ ≥ θ } ⎪ ⎬ {θ ∈Ω. . Some of the tests will be arrived at again on the basis of the principle of likelihood ratio to be discussed in Section 7. .5 Testing the Parameters of a Normal Distribution In the present section. (30) We may now formulate the following theorem. is tossed independently 100 times and 60 heads are observed. .d.5 Testing Testing Certain Composite Distribution 353 ⎧H1 :θ ∈ω = ⎪ ⎪H1′ :θ ∈ω = ⎪ ⎪ ⎨H 2 :θ ∈ω = ⎪ ⎪H 3 :θ ∈ω = ⎪ ⎪H :θ ∈ω = ⎩ 4 ⎫ {θ ∈Ω. X2.3 UMP Tests forthe Parameters of a Normal Hypotheses 13. X3 be independent r. under some additional regularity conditions.i.2 Let X1. whose proof is omitted. Xn are assumed to be i. 13. the family is brought under the form (29). 2 2 13. the following two sections are devoted to presenting simple tests for the hypotheses speciﬁed in (30). .d.1. Then. . the other serving as a nuisance parameter. 1 Because of the special role that normal populations play in practice. respectively. σ2). . 4. 4. . and therefore the conclusion of the theorem holds. . Determine the test for α = 0. . i = 1.4. Exercises 13. X1. .i. r. p). . . .v. All tests to be presented below are UMPU. One of the parameters at a time will be the parameter of interest. Under appropriate reparametrization. given by (29). θ ≤ θ } ⎪ ⎪ {θ ∈Ω.1 A coin. as indicated in Remark 5. the optimal character of the tests will not become apparent by that process.

PROPOSITION 3 For testing H2 : σ ≤ σ1 or σ ≥ σ2.415. ⎩ () if ∑ (x n j =1 j −x ) 2 >C (31) otherwise.8384. Again. C1. 2. so that C = 124.354 13 Testing Hypotheses Whenever convenient. σ is the true s. 1 1 The power of the tests is easily determined by the fact that (1/σ2) Σn= 1 j ¯ (Xj − X )2 is χ2 when σ obtains (that is.).632.735. is UMPU for testing H3 : σ1 ≤ σ ≤ σ2 against A3 : σ < σ1 or σ > σ2. The test given by (33) and (34). ( ) (32) is UMP. the test given by ⎧1. all tests will be of level α. For example.5. 24 For H′ . against A2 : σ1 < σ < σ2.d.1094) = 0. C/9 = 36. j n−1 For example. σ0 = 3 and α = 0. Xn)′. ⎩ () if C1 < ∑ j =1 x j − x n ( ) 2 < C2 (33) otherwise. and similarly for (34). . σ1 = 2. we will also use the notation z and Z instead of (x1. xn)′ and (X1. . The power of the test at σ = 5 is equal to P(χ2 > 13. . the power of the tests is determined by the fact that ¯ (1/σ2) Σn= 1 (Xj − X )2 is χ2 when σ obtains. respectively. where C is determined by 2 P χ n−1 > C σ 02 = α . so that C = 327.158) = 0. C2 are determined by 2 P C1 σ i2 < χ n−1 < C 2 σ i2 = α . .962.1 Tests about the Variance PROPOSITION 2 For testing H1 : σ ≤ σ0 against A1 : σ > σ0. C2 are determined by ⎧ ⎛ 2 C1 ⎞ ⎛ 2 C2 ⎞ ⎪P⎜ χ 24 > ⎟ − P⎜ χ 24 > ⎟ = 0. where C1. . σ2 = 3 and α = 0. ⎪ φ z =⎨ ⎪0. The test given by (31) and (32) with reversed inequalities is UMPU for testing H′ : σ ≥ σ0 against A′ : σ < σ0. . 13. ⎪ φ z =⎨ ⎪0.05 ⎟ ⎜ 24 ⎟ ⎪ ⎜ 24 9⎠ 9 ⎠ ⎝ ⎩ ⎝ . for n−1 n = 25. . where the inequalities C1 < Σn j=1 ¯) (xj − x 2 < C2 are replaced by ∑( j =1 n xj − x ) 2 < C1 or ∑ (x j =1 n j −x ) 2 > C2 . Finally.05. ( ) i = 1. and the power at σ = 2 is P χ2 24 1 (< 31.848.05. we have for H1. C/9 = 13. . the test given by ⎧1. (34) is UMPU. for H2 and for n = 25.05 4⎠ 4 ⎠ ⎝ ⎝ ⎪ ⎨ ⎪P⎛ χ 2 > C1 ⎞ − P⎛ χ 2 > C 2 ⎞ = 0.

PROPOSITION 4 For testing H4 : σ = σ0 against A4 : σ ≠ σ0. distributed as tn−1.7109 for H′ . () if t z > C otherwise.v. we have P(t24 > C) = 0. it is a close approximation to the UMPU test when n is large. we set t z = () n x − μ0 n ( ) ) 2 1 ∑ xj − x n − 1 j =1 ( . t(z) is given by (35). REMARK 5 13. 1 PROPOSITION 6 For testing H4 : μ = μ0 against A4 : μ ≠ μ0. The test given by (36) and (37) with reversed inequalities is UMPU for testing H′ : μ ≥ μ0 against A′ : μ < σ0. ⎪ φ z =⎨ ⎪ ⎩0. UMPU tests exist in a simple form and are explicitly given for the following three cases: μ ≤ μ0. the test given by ⎧1.11 1 1 and 13. 1 () g being the p. C2 are determined by ∫ 2 c2 σ0 2 c1 σ0 g t dt = () 1 c n − 1 ∫c 2 2 σ0 2 σ0 tg t dt = 1 − α . 13. the test given by ⎧1. () or t z > C () (C > 0) . μ ≥ μ0 and μ = μ0. of a χ2 distribution.05. (See also Figs.5. tn−1 stands for an r. () if ∑ (x n j =1 j −x ) 2 < C1 or ∑ (x n j =1 j −x ) 2 > C2 otherwise.) For n = 25 and α = 0.05. the test given by ⎧1. () if t z < −C otherwise. ⎪ φ z =⎨ ⎪ ⎩0. The popular equal tail test is not UMPU. is UMPU.5 Testing Testing Certain Composite Distribution 355 from the Chi-square tables (by trial and error). To facilitate the writing. where C1. ( ) (37) is UMPU.12.f. ⎪ φ z =⎨ ⎪ ⎩0. and C = −1.7109 for H1. () (36) where C is determined by P t n−1 > C = α . (35) PROPOSITION 5 For testing H1 : μ ≤ μ0 against A1 : μ > μ0.13. n−1 The power of the test is determined as in the previous cases.d. hence C = 1.2 Tests about the Mean In connection with the problem of testing the mean.3 UMP Tests forthe Parameters of a Normal Hypotheses 13.

σ must not exceed 0. .i. σ0 = 2. A sample of size 16 is taken and is found that s = 0. r. i) Derive the UMPU test for testing the hypothesis H : σ = σ0 against the alternative A : σ ≠ σ0 at level of signiﬁcance α. A1 : μ > μ0.5.3. Figure 13.3 Discuss the testing hypothesis problem in Exercise 13. the determination of the power involves what is known as non-central t-distribution. 1 1 where C is determined by P t n−1 > C = α 2. In both these last two propositions.’s from N(μ.d. (See also Fig.1 The diameters of bolts produced by a certain machine are r.12 H ′ : μ ≥ μ0. In order for the bolts to be usable for the intended purpose.v.356 13 Testing Hypotheses tn 1 tn 1 0 C C 0 Figure 13. 13. .05 inch.5. . the s. . A′ : μ < μ0.13 H4 : μ = μ0. σ2).05.d. and α = 0. A4 : μ ≠ μ0. . Xn be i.4 if both μ and σ are unknown. Exercises 13. Formulate the appropriate testing hypothesis problem and carry out the test if α = 0. which is deﬁned in Appendix II.8. where both μ and σ are unknown. Σ25 1 (xj − x 2 = 24.) tn 2 1 Figure 13. 2 C 0 C For example. for n = 25 and α = 0. we have C = 2.2 Let X1.v. t(z) is given by (35).’s distributed as N(μ.0639. 13.13. ( ) is UMPU.05. σ2). j= 13.5.04 inch.11 H1 : μ ≤ μ0.05. ¯) ii) Carry out the test if n = 25.

For convenient writing. Each time either μ or τ will be the parameter of interest. respectively. we shall employ the notation ′ Z = X1 . Yn be i. 0.13. of the X’s and Y’s and reparametrizing the family along the lines suggested in Remark 4 reveals that this joint p. . yn ( ) for their observed values. Set μ = μ1 − μ2 and τ = σ2/ 2 σ2.v. 157. . .’s 1 from N(μ2. in either one of the parameters μ or τ.05. . and ′ z = x1 .01). . For some of these hypotheses. . . 13. Yn ′ ( ) ′ for the X’s and Y’s.i. . the tests have a simple form to be explicitly mentioned below.v.d. .6. σ2) and let Y1. .5 The diameters of certain cylindrical items produced by a machine are r. it can be shown that the additional (but unspeciﬁed) regularity conditions of Theorem 5 are satisﬁed and therefore there exist UMPU tests for the hypotheses speciﬁed in (30).v. ( ) W = Y1 . Xm be i. .’s distributed as N(μ.01.5. If the desired value for μ is 2. . has the form (29).5 inches.752 and ∑x j =1 100 2 j = 31. r. .f.5. .1 Comparing the Variances of Two Normal Densities PROPOSITION 7 For testing H1 : τ ≤ τ0 against A1 : τ > τ0. .f. xm . In order to check his claim. X m . formulate the appropriate testing hypothesis problem and carry out the test if α = 0. . It is assumed that the two random samples are independent 2 and that all four parameters involved are unknown. A sample of size 16 is taken and is found that ¯ x = 2. r. 100 packages are chosen at random from a large lot and it is found that ∑x j =1 100 j = 1.4 A manufacturer claims that packages of certain goods contain 18 ounces.d.6 Comparing the Parameters of Two Normal Distributions Let X1. . Furthermore.d. . .6 Comparing the Parameters of Two Normal Distributions 13. 13. the test given by . σ2).’s from N(μ1. . Make the appropriate assumptions and test the hypothesis H that the manufacturer’s claim is correct against the appropriate alternative A at level of signiﬁcance α = 0. ( ) w = y1 . . The problem to be discussed in this section is that of testing certain 1 hypotheses about μ and τ. Writing down the joint p. the remaining parameters serving as nuisance parameters. .3 UMP Tests for Testing Certain Composite Hypotheses 357 13. .i.48 inches.d. . 13.

.0825 and hence C = 1.m−1 > C0 = α . ⎧ ⎪1.m−1 distributed when τ obtains. τ0 = 2 and α = 0. 13.m−1. Now set 1 τ0 m i =1 V z. m 1 Fn 1.v.15.14 and 13. n = 21. ′ ′ Figure 13.24 < ⎟ = P⎜ F24 .05. where C is determined by P Fn−1.14 H1 : τ ≤ τ0. w = ⎨ ⎪ ⎪0.20 > ⎟ = 0.24 > 5C/12) = 0.8640 for H1.m−1 1 stands for an r. ⎩ ( ) if ∑ j =1 ( y j − y ) 2 m ∑i =1 ( xi − x ) n 2 >C (38) otherwise. 1 ⎛ ⎛ 5C ⎞ 12 ⎞ P⎜ F20 . For m = 25. one has P(F20. A1 : τ > τ0. distributed as Fn−1.05 12 ⎠ 5C ⎠ ⎝ ⎝ implies 12/5C = 2. w = ( ) ∑ ( yj − y) j =1 2 n 2 ∑ ( xi − x ) 1 + τ0 ∑ ( yj − y) j =1 n . The test given by (38) and (39) with reversed inequalities is UMPU for testing H′ : τ ≥ τ0 against A′ : τ < τ0.) The power of the test is easily determined by the fact that 1 n Yj − Y 2 ∑ σ 2 j =1 1 m ∑ Xj − X σ 12 i = 1 ( ) 2 n−1 = m−1 ( ) 2 1 m−1 τ n−1 ∑ (Yj − Y ) j =1 m i =1 n 2 ∑ (Xi − X ) 2 is Fn−1. Fn−1. m 1 0 C0 0 C0 Figure 13. (See also Figs. A1 : τ < τ0. hence 5C/12 = 2. for H′ .1525. (n − 1)τ 0 (39) is UMPU. ⎪ φ z.05. Thus the power of the test depends only on τ.0267 and C = 4.358 13 Testing Hypotheses Fn 1. 2 (40) Then we have the following result.15 H 1 : τ ≥ τ0. ( ) C0 = (m − 1)C .

7139 and C = 0. w = ⎨ ⎪0. w < C1 otherwise.6 Comparing the Parameters of Two Normal Distributions 13. C0 = C ( ) ( m+n−2 . also. one has for H1 : P(t23 > C 23 × 6 ) = 0.1459. the test given by . For H′ .13. 1 ( m−1) ⎢ ⎥ ⎢ ⎥ 2 2 2 2 ⎣ ⎦ ⎣ ⎦ is UMPU. w > C otherwise. as was also the case in Propositions 5 and 6. The test given by (42) and (43) with reversed inequalities is UMPU for testing H′ : μ ≥ 0 against A′ : μ < 0. 1 ( m−1) ( n+1) . ⎪ φ z. w = y−x ( ) ∑ ( m i =1 xi − x ) 2 + ∑ j =1 y j − y n ( ) 2 . n = 10 and α = 0. ( n−1) . 1 C = −0.2 Comparing the Means of Two Normal Densities In the present context. Cambridge University Press. ( ) if t z.6.r stands for an r. Aerospace Research Laboratories. The determination 1 1 of the power of the test involves a non-central t-distribution. t(z. r2 degrees of freedom. distributed as Beta with r1. for m = 15. ( ) or V z.v. 1 m + 1 n ) ( ) (43) is UMPU. the test given by ⎧1.05.1459. w = ⎨ ⎪ ⎩0. for example. w > C 2 ( ) where C1. hence C 23 × 6 = 1.) 1 2 13. C2. 1 2 PROPOSITION 9 For testing H1 : μ ≤ 0 against A1 : μ > 0. Ofﬁce of Aerospace Research. V(z. ⎩ ( ) if V z. ( ) (42) where C is determined by P t m+ n− 2 > C0 = α . (41) We shall also assume that σ2 = σ2 = σ2 (unspeciﬁed). C2 are determined by ⎡ ⎤ ⎡ ⎤ P ⎢C1 < B1 < C 2 ⎥ = P ⎢C1 < B1 < C2 ⎥ = 1 − α . w) is deﬁned by (40). (Br . ⎪ φ z. where μ = μ2 − μ1. Tables of the Incomplete Beta-Function by Karl Pearson. New Tables of the Incomplete Gamma Function Ratio and of Percentage Points of the Chi-Square and Beta Distributions by H. w) is given by (41).3 UMP Tests for Testing Certain Composite Hypotheses 359 PROPOSITION 8 For testing H4 : τ = τ0 against A4 : τ ≠ τ0. (See. PROPOSITION 10 For testing H4 : μ = 0 against A4 : μ ≠ 0.) For the actual determination of C1. we use the incomplete Beta tables. it will be convenient to set t z.05. For example. the test given by ⎧1. Leon Harter.

respectively.05. . .0. . . y4 = 0. test the hypothesis: H : σ1 = σ2 against the alternative A : σ1 ≠ σ2 and ﬁnd the power of the test at σ1 = 2. 9 and Yj. ( ) or t z.120 y5 = 0. σ2) and N(μ2.120. . sY = 3.4. σ2) and N(μ2. . 10 be independent random samples from the distributions N(μ1. . y2 = 8.114. At level of signiﬁcance α = 0. j = 1.3. Test the hypothesis H : σ1 = σ2 against the alternative A : σ1 ≠ σ2 at level of signiﬁcance α = 0. . Once again the determination of the power of the test involves the noncentral t-distribution. Suppose 1 2 that the observed values of the X’s and Y’s are as follows: x1 = 10. various tests have been proposed but we will not discuss them here. and set up the formulas for determining the cut-off points and the power of the test. y2 = 0.125. w = ⎨ ⎪0. ⎪ φ z. one has P(t23 > C 23 × 6 ) = 0. Make the appropriate assumptions and test the hypothesis H : σ1 = σ2 against the alternative A : σ1 ≠ σ2 at level of signiﬁcance α = 0. In Propositions 9 and 10. . σ2 = 3.119. x 2 = 8.1762. REMARK 6 Exercises 13.’s are sX = 2.117. j = 1.3. y1 = 9. and set up the formulas for determining the cut-off points and the power of the test.05.6. Suppose 1 2 that the observed values of the sample s. .115. i = 1. . σ2). Again with m = 15.110. . x3 = 14. (Compute the value of the test statistic.2 Let Xj. x3 = 0. w < −C otherwise.1 Let Xi. w > C ( ) where C is determined by P t m+ n− 2 > C0 = α 2 . σ2). y4 = 10. 4 be two independent random samples from the distributions N(μ1.1. is UMPU.118.025 and hence C 23 × 6 = 2. .121. j = 1. ( ) C0 as above. (Compute the value .05.7. 4 and Yj.360 13 Testing Hypotheses ⎧1.6. x5 = 0.05.2. y3 = 0. x 4 = 0. the tests presented above are not UMPU. (Compute the value of the test statistic. if the variances are not equal. . x 4 = 11. x 2 = 0.1.0687 and C = 0. . n = 10 and α = 0. ⎩ ( ) if t z. respectively.3 Five resistance measurements are taken on two test pieces and the observed values (in ohms) are as follows: x1 = 0.d.) 13.) 13. y3 = 12. . For this case. The problem of comparing the means of two normal densities when the variances are unequal is known as the Behrens– Fisher problem.6. y1 = 0.

θ ∈ Ω ⊆ r and let ω ⊂ Ω. 13. and sup1 2 pose that all four parameters are unknown.) However.6. Now. Test whether there is a difference between the two processes at the level of signiﬁcance α = 0.) 13.i.6 Refer to Exercise 13.4 Refer to Exercise 13. . The problem can be reduced to it though by the following device.3. . however. be employed.13. i = 1. . Of course. . Test the hypotheses H1 : μ ≤ 0 against A1 : μ > 0 and H2 : μ = 0 against A2 : μ ≠ 0 at level of signiﬁcance α = 0. . 13. .5 The breaking powers of certain steel bars produced by processes A and B are r.6. In practice. n and Yi. n. Then for setting up a test. θ) · · · f(xn. 13. ˆ used rather than L(ωc)/L(ω where. L(Ω) = max[L(θ). r.05 for the data given in (i) Exercise 13.d.7 Let Xi. the test is called a likelihood ratio (LR) test. n be independent random samples from the distributions N(μ1. and L(ωc) = f(x1.6. θ) whenever θ ∈ ω. θ) · · · f(xn.6. where C is speciﬁed by the desired size of the test. . one rejects H if L(ω ω that is. ω Ω θ ω (When we talk about a statistic. θ) when ω ω θ is varying over ωc. and test the hypothesis H : μ1 = μ2 against the alternative A : μ1 ≠ μ2 at level of signiﬁcance α = 0. i = 1. A random sample of size 25 is taken from bars produced by ¯ ¯ each one of the processes. ≤C. f(·. L(ω) ω ˆ) is to be replaced by L(ω = max[L(θ). θ). . (ii) Exercise 13. we have that the paired r. sY = 7. σ2).7 Likelihood Ratio Tests 361 of the test statistic.7 Likelihood Ratio Tests Let X1.3 UMP Tests for Testing Certain Composite Hypotheses 13. . sX = 6. then L(ω) and L(ωc) are completely determined and for testing H : θ ∈ ω against ω ω A : θ ∈ ωc.05. and it is found that x = 60. if ω and ωc contain more than one point each.’s distributed as normal with possibly different means but the same variance. then neither L(ω) nor ω L(ωc) is determined by H and A and the above method of testing does not ω apply. it will be understood that the observed values have been replaced by the corresponding r.6.6. . Set L(ω) = f(x1. .2 and suppose it is known that σ1 = 4 and σ2 = 3.3. Xn be i.f. the test . . . θ ∈ Ω].v. i = 1. respectively. 13. and set up the formulas for determining the cut-off points and the power of the test. θ ∈ ω] and L(ωc) is to be replaced by ω θ ω ˆ L(ωc) = max[L(θ).6.05. are independent and distributed as N(μ. make the appropriate assumptions. y = 65. the statistic L(ωc)/L(Ω) is ω ω Ω ω ˆ ˆ).v. the MP test rejects when the likelihood ratio (LR) L(ωc)/L(ω) is too ω ω large (greater than or equal to a constant C determined by the size of the test.’s with p. θ ∈ ωc]. For obvious reasons.05. .’s although the same notation will ˆ ˆ)/L(Ω) is too small.’s Zi.v. one would compare the ω θ ˆ) ˆ ˆ quantities L(ω and L(ωc). σ2) with μ = μ1 − μ2 and σ2 = σ2 + σ2.6. of course. σ2 ) and N(μ2.d.) In terms of this statistic. Then one may use Propositions 5 and 6 to 1 2 test hypotheses about μ. when both ω and ωc consist of a single point. Test the hypothesis H that the two means do not differ by more than 1 at level of signiﬁcance α = 0. .2. By setting Zi = Xi − Yi.v. .

. under suitable regularity conditions. reω spectively. (Notice that 0 < λ ≤ 1. First is the problem of determining the cut-off point C and second is the problem of ˆ actually determining L(ω and L(Ω). The ﬁrst difﬁculty is removed at the ˆ) ω Ω asymptotic level. Should λ be too θ far away from 1. where Ω is an r-dimenΩ sional subset of r and let ω be an m-dimensional subset of Ω. that is.f. of observing x1. . respectively. . L(Ω) and L(Ω)dx1 · · · dxn represent the maximum probability for Ω Ω the discrete and continuous case. one is apt to encounter two sorts of difﬁculties. H is simple and then ˆ) ω no problem exists. ( ) () x ≥ 0 for all θ ∈ω. under certain regularity conditions. Furthermore. r−m provided θ ∈ω . . . the quantities L(ω and L(Ω) ˆ) ω ω Ω will tend to be close together (by an assumed continuity (in θ) of the likelihood function L(θ | x1. The ˆ problem of ﬁnding L(Ω) is essentially that of ﬁnding the MLE of θ. the asymptotic distribution of −2 log λ. if θ ∈ω. . the data would tend to discredit H. In the following. . as follows: The quantity L(ω and the ˆ) ω ˆ)dx probability element L(ω 1 · · · dxn for the discrete and continuous case. it does provide a uniﬁed method for producing tests. ω Pθ −2 log λ ≤ x → G x . . under H.) ω Ω Also the statistic −2 log λ rather than λ itself is employed. xn without ˆ restrictions on θ.362 13 Testing Hypotheses speciﬁed by the Neyman–Pearson fundamental lemma is also a likelihood ratio test. the asymptotic distribution of −2 log λ is χ2 . as speciﬁed by H. does not depend on θ. where C is determined by the desired level of the test. a theorem referring to the asymptotic distribution of −2 log λ is stated (but not proved) and then a number of illustrative examples are discussed. Also in addition to its intuitive interpretation. . xn)). ˆ The notation λ = L(ω ˆ)/L(Ω) has been in wide use. Then in terms of this statistic. ˆ Now the likelihood ratio test which rejects H whenever L(ω ˆ)/L(Ω) is too ω Ω small has an intuitive interpretation. is the maximum probability of observing x1. ˆ ˆ Similarly. f(·. the likelihood ratio test happens to coincide with or to be close to other tests which are known to have some well deﬁned optimal properties such as being UMP or being UMPU. as n → ∞. .d. . In carrying out the likelihood ratio test in actuality. Thus. in many cases of practical interest and for a ﬁxed sample size. xn if θ lies in ω. Suppose also that the set of positivity of the p. in the sense that we may use as an approximation (for sufﬁciently large n) the limiting distribution of −2 log λ for specifying C. Calculating Ω L(ω is a much harder problem. θ). In many cases. is known. the reason being that. .f. Then under some additional regularity conditions.v. this test is equivalent to the LR test. it enjoys some asymptotic optimal properties as well. Of course. Xn be i.i. r. and therefore λ will be close to 1. .’s with p. and therefore H is to be rejected.d.d. THEOREM 6 Let X1. . . however. In spite of the apparent difﬁculties that a likelihood ratio test may present. θ ∈Ω. one rejects H whenever −2 log λ > C. . .

.’s from N(μ. .d. it is much easier to determine the distribution of −2 log λ rather than that of λ. In fact. φ z =⎨ ⎪ ⎪0.i. r−m Since in using the LR test. .3 UMP Tests for Testing Certain Composite Hypotheses 13. .) Notice that this is consistent with Theorem 6. and consider the following testing hypotheses problems. (Recall that z = (x1. or some other test equivalent to it. we have ˆ LΩ = and ˆ Lω = ( ) ( 1 2πσ ) n n ⎡ 1 exp ⎢− 2 ⎢ ⎣ 2σ ∑ (x j =1 n j 2⎤ −x ⎥ ⎥ ⎦ ) ( ) ( 1 2πσ ) ⎡ 1 exp ⎢− 2 ⎢ 2σ ⎣ ∑ (x j =1 n j 2⎤ − μ 0 ⎥. ⎥ ⎦ ) In this example. n x − μ0 σ2 ( ) 2 () ( )⎤ ⎥ ⎥ ⎦ 2 >C where C is determined by P χ12 > C = α .f. ⎩ ⎡ n x−μ 0 if ⎢ ⎢ σ ⎣ otherwise. Also the level of signiﬁcance will always be α. r. σ2). ii) Consider the same problem as in (i) but suppose now that σ is also unknown. Xn be i. xn)′. .v. Chapter 12). since σ is unspeciﬁed. the alternative A speciﬁes that θ ∈ ωc . ˆ x Since the MLE of μ is μΩ = ¯ (see Example 12. EXAMPLE 11 (Testing the mean of a normal distribution) Let X1.7 Likelihood Ratio Tests 363 where G is the d. this will not have to be mentioned explicitly in the sequel. ( ) . of a χ2 distribution. −2 log λ = and the LR test is equivalent to ⎧ ⎪ ⎪1. It should also be pointed out that this test is the same as the test found in Example 10 and therefore the present test is also UMPU. . i) Let σ be known and suppose we are interested in testing the hypothesis H : μ ∈ω = {μ 0}. .13. . We are still interested in testing the hypothesis H : μ = μ0 which now is composite.

() if t < − C otherwise. That is. Then ⎛σ2 ⎞ ˆ λ =⎜ Ω⎟ ˆ2 ⎝ σω ⎠ But n 2 or λ 2 n ∑ (x − x ) = ∑ (x − μ ) n j=1 j n j=1 j 2 2 . Chapter 12). Therefore ˆ LΩ = ( ) ( 1 ˆ 2πσ Ω ) n n ⎡ 1 exp ⎢− ˆ2 ⎢ ⎣ 2σ Ω ∑ (x j =1 n j 2⎤ −x ⎥= ⎥ ⎦ ) ( ( 0 1 ˆ 2πσ Ω ) ) n e −n 2 and ˆ Lω = ( ) ( 1 ˆ 2πσ ω ) ⎡ 1 exp ⎢− ˆ2 ⎢ ⎣ 2σ ω ∑ (x j =1 n j 2⎤ − μ0 ⎥ = ⎥ ⎦ ) 1 ˆ 2πσ ω n e −n 2 . 1 n ˆ2 σ Ω = ∑ xj − x n j =1 . n − 1⎟ ⎝ ⎠ −1 where t =t z = () n x − μ0 ( ) ) 2 1 n ∑ xj − x n − 1 j =1 ( . or t > C .364 13 Testing Hypotheses Now the MLE’s of σ2. μ ∈ θ {θ = (μ. σ > 0} and ω = ( ) 2 ˆ2 and σ ω = 1 n ∑ x j − μ0 n j =1 ( ) 2 (see Example 12. σ > 0} are. under Ω = {θ = (μ. the LR test is equivalent to the test ⎧1. σ)′. ∑ (x j =1 n j − μ0 ) = ∑ (x 2 j =1 n j −x ) 2 + n x − μ0 ( ) 2 and therefore λ 2 n ⎡ 2 ⎢ n x − μ0 ⎢1 + 1 = ⎢ n−1 1 n ⎢ ∑ xj − x n − 1 j =1 ⎢ ⎣ ( ) ( ) ⎤ ⎥ ⎥ 2⎥ ⎥ ⎥ ⎦ −1 ⎛ t2 ⎞ = ⎜1 + . respectively. μ = μ0. φ z =⎨ ⎩0. Then λ < λ0 is equivalent to t2 > C for a certain constant C. σ)′.

In the present case. σ2).’s from N(μ2. μ2. μ2. . one has υ= and ˆ2 σω = Therefore n ⎞ 1 m +n 1 ⎛m ˆ ∑ υ k = m + n ⎜ ∑ xi + ∑ y j ⎟ = μ ω m + n k =1 ⎝ i =1 ⎠ j =1 1 m +n ∑ vk − v m + n k =1 ( ) 2 = 1 ⎡m ˆ ⎢∑ xi − μ ω m + n ⎢ i =1 ⎣ ( ) + ∑ (y 2 j =1 n j 2⎤ ˆ − μ ω ⎥. . . Under ω = {θ = (μ1. . Under Ω = {θ = (μ1. σ)′. ⎥ ⎦ ) i) Assume that σ1 = σ2 = σ unknown and we are interested in testing the hypothesis H : μ1 = μ2 (= μ unspeciﬁed). θ σ > 0}. ⎜ m + n ⎝ i =1 ⎠ j =1 and by setting υk = xk.7 Likelihood Ratio Tests 365 where C is determined by P t n−1 > C = α 2.’s from N(μ1. we have n ⎞ mx + ny 1 ⎛m ∑ xi + ∑ y j ⎟ = m + n . . . n.f. r. by Proposition 6. k = 1.v. μ 2 .13. ⎥ ⎦ ) ˆ LΩ = ( ) ( 1 ˆ 2πσ Ω ) m +n e − m +n ( ) 2 . EXAMPLE 12 ( ) (Comparing the parameters of two normal distributions) Let X1.d. . .3 UMP Tests for Testing Certain Composite Hypotheses 13. σ2) and Y1. ⎥ ⎦ ) ˆ Lω = It follows that ( ) ( 1 ˆ 2πσ ω ) m +n e − m +n ( ) 2 . of the X’s and Y’s is given by ( ) 2π 1 m +n ⎡ 1 1 exp ⎢− 2 m n σ1 σ 2 ⎢ 2σ 1 ⎣ ∑ (x − μ ) i 1 i =1 m 2 − 1 2 2σ 2 ∑ (y j =1 m j 2⎤ − μ 2 ⎥. Xm be i.i. Yn be i.d. μ2 ∈ .Ω = y . μ1. . . r. . Suppose 1 2 that the X’s and Y’s are independent and consider the following testing hypotheses problems.v. m and υm+k = yk. μ1 = μ2 ∈ θ ˆ μω = .d. .i. σ)′. the test just derived is UMPU. k = 1. . . . Notice that. σ > 0}. the joint p.Ω = x . . σ Ω = as is easily seen. the MLE’s of the parameters involved are given by ˆ ˆ ˆ2 μ1. . Therefore 1 ⎡m ⎢ ∑ xi − x m + n ⎣ i =1 ⎢ ( ) + ∑ (y 2 j =1 n j 2⎤ − y ⎥.

w = ⎨ ⎪ ⎩0. 2 and in a similar manner ∑ (y j =1 n j ˆ − μω ) = ∑ (y 2 j =1 n −y ) 2 + m2 n (m + n) m 2 (x − y) .366 13 Testing Hypotheses m +n ˆ ⎛σ ⎞ λ =⎜ Ω⎟ ˆ ⎝ σω ⎠ and λ 2 m +n ( ) = ˆ2 σΩ . w = (y1. ˆ2 σω Next ˆ ˆ ∑ ( xi − μω ) = ∑ [( xi − x ) + ( x − μω )] = ∑ ( xi − x ) m 2 m v 2 m 2 i =1 i =1 m i =1 ˆ + m x − μω ( ) 2 = ∑ xi − x i =1 ( ) 2 + mn 2 (m + n) j 2 (x − y) . . m+ n ( ) λ where t= mn x−y m+ n 2 m+ n ( ) ⎛ ⎞ t2 = ⎜1 + . because. 2 2 n so that ˆ (m + n)σˆ = (m + n)σ 2 ω 2 Ω + mn x−y m+ n ( ) = ∑ (x − x) + ∑ (y 2 i i =1 j =1 j −y ) 2 + It follows then that 2 mn x−y . . and z = (x1. . yn)′. ( ) . t is distributed as tm+n−2. ⎥ ⎦ ) Therefore the LR test which rejects H whenever λ < λ0 is equivalent to the following test: ⎧1. ⎪ φ z. . . . xm)′. m + n − 2⎟ ⎝ ⎠ −1 ( ) ⎡m 1 ⎢∑ xi − x m + n − 2 ⎣i =1 ⎢ ( ) + ∑ (y 2 j =1 n j 2⎤ −y ⎥. ( ) if t < −C otherwise. . We notice that the test φ above is the same as the UMPU test found in Proposition 10. . or t > C (C > 0) where C is determined by P t m+ n− 2 > C = α 2. under H.

Ω = ∑ xi − x m i =1 ( ) 2 and v 2 1 n ˆ2 σ 2 .Ω . we have θ 1 m ˆ ˆ ˆ μ1. σ1.3 UMP Tests for Testing Certain Composite Hypotheses 13.Ω 2 ω m +n 2 n 2 = (m + n)( m +n ) 2⎡ m ⎢∑i =1 xi − x ⎣ ( ) ∑ (y 2 n j =1 j 2 −y ⎤ ⎥ ⎦ n ) m 2 ⎧ m mm 2 nn 2 ⎨⎡∑i = 1 xi − x ⎣ ⎩⎢ ( ) 2 2 n + ∑j = 1 y j − y ⎤ ⎥ ⎦ ( ) ∑ ( j =1 2⎫ yj − y ⎬ ⎭ ) ( m +n ) 2 (m + n)( = m +n 2 ) 2 mm 2 nn ⋅ 2 m ⎡ ⎤ ⎢ m − 1 ⋅ ∑i = 1 xi − x m − 1 ⎥ 2 n ⎢ n−1 ⎥ ∑j = 1 y j − y n − 1 ⎦ ⎢ ⎥ ⎣ ( ( ) ) m 2 2 m ⎡ ⎤ ⎢1 + m − 1 ⋅ ∑i = 1 xi − x m − 1 ⎥ 2 n ⎢ ⎥ n−1 ∑j = 1 y j − y n − 1 ⎦ ⎢ ⎥ ⎣ ( ( ) ) ( m +n ) 2 . σ2)′. μ2 ∈ . n j =1 ( ) while under ω = {θ = (μ1. μ1.ω = μ1. σ1.7 Likelihood Ratio Tests 367 ii) Now we are interested in testing the hypothesis H : σ1 = σ2 (= σ unspeciﬁed). μ2.Ω and ˆ2 σω = Therefore ˆ LΩ = 1 ⎡m ⎢ ∑ xi − x m + n ⎣ i =1 ⎢ ( ) + ∑ (y 2 j =1 n j 2⎤ − y ⎥. ( m +n ) 2 (σˆ ) (σˆ ) λ= (σˆ )( ) 2 1 .Ω = ∑ y j − y .ω = μ 2 . σ1 = σ2 > 0}. σ 12. σ2)′.Ω = x . θ ˆ ˆ ˆ ˆ μ1. σ1. μ2 ∈ . σ2 > 0}.Ω 1 ⋅ 1 n 2 e − m +n ( ) 2 and ˆ Lω = so that ( ) ( 2π ) (σˆ ) m +n 2 ω 1 ⋅ 1 − ( m +n ) 2 e .13. μ1. μ 2 .Ω m 2 2 2 . μ 2 . ⎥ ⎦ ) ( ) ( 2π ) (σˆ ) (σˆ ) m +n 2 1 .Ω = y .Ω m 2 2 2 . μ2. Under Ω = {θ = (μ1.

Therefore the constants C1 and C2 are uniquely determined by the following requirements: P Fm−1. such that P Fm−1.n−1. Now. is equivalent to the test based on f and rejecting H if ⎛m−1 ⎜ ⎝ n−1 ⎞ f⎟ ⎠ m 2 ⎛ m−1 ⎞ f⎟ ⎜1 + n−1 ⎠ ⎝ ( m+ n) 2 <C for a certain constant C.n−1 < C1 or ( Fm−1. Furthermore.368 13 Testing Hypotheses = (m + n) ( n+ m) 2 m m 2 nn 2 ⋅ ⎛m−1 ⎜ ⎝ n−1 ⎞ f⎟ ⎠ m 2 ⎛ m−1 ⎞ f⎟ ⎜1 + n−1 ⎠ ⎝ ( m+ n) 2 . if in the expression of f the x’s and y’s are replaced by X’s and Y’s. ∞). Setting g( f) for the left hand side of this last inequality. n( m − 1) m n−1 f < C1 or f > C2 it is increasing between 0 and fmax and decreasing in (fmax.16. 13.) Fm 2 1. it follows that. which rejects H whenever λ < λ0.n−1 < C1 = P Fm−1.7. it can be seen (see Exercise 13. i= j Therefore the LR test. in practice the C1 and C2 are determined so as to assign probability α/2 to each one of the two tails of the Fm−1. Therefore g f < C if and only if () for certain speciﬁed constants C1 and C2. we have that g(0) = 0 and g(f) → 0 as f → ∞.4) that g(f) has a maximum at the point fmax = ( ).n−1 > C 2 = α ) and g C1 = g C 2 .n−1 distribution. ( ) ( ) (See also Fig. F is distributed as Fm−1. n 1 2 Figure 13. under H. and denote by F the resulting statistic.16 0 C1 C2 . ( ) ( ) However. respectively. ¯ ¯) where f = [Σm 1(xi − x 2/(m − 1)]/[Σn= 1(yj − y)2/(n − 1)].n−1 > C 2 = α 2. that is.

4. Xn are i.7. iii) Specify the (randomized) test of level of signiﬁcance α = 0.05. 13.2. n ≥ 2. 13. 3 as well as the corresponding probabilities under H.7.1 Refer to Exercise 13. is tossed 100 times and 60 heads are observed. and: i) Calculate the values λ(t) for t = 0. t ≥ 0. set λ(t) for the likelihood function.25.7. .’s from N(μ.13. iv) Compare the test in part (iii) with the UMPU test constructed in Exercise 13. n m−1 ⎢ ⎣ ( ( )⎤ ⎥ )⎥ ⎦ ⎡m n − 1 ⎞ and decreasing in ⎢ . max g t .3 UMP Tests for Testing Certain Composite Hypotheses Exercises 369 Exercises 13. .3 If X1.1.25 against the alternative A : p ≠ 0.2 A coin. 13. . ii) Compare the cut-off point in part (i) with that found in Exercise 13. derive the LR test and the test based on −2 log λ for testing the hypothesis H : σ = σ0 ﬁrst in the case that μ is known and secondly in the case that μ is unknown. where t = x1 + x2 + x3. t ≥ 0 = [ () ] [1 + (m n)]( mm 2 m+ n 2 ) and that g is increasing in ⎡ m n−1 ⎢0.v. with probability p of falling heads. ∞⎟ . compare the test based on −2 log λ with that derived in Example 11. ii) Set up the LR test. ⎟ ⎢n m − 1 ⎠ ⎣ ( ( ) ) .2 and use the LR test to test the hypothesis H : p = 0. m.4. and show that g(t) → 0 as t → ∞. 1. in terms of both λ(t) and t.i. At level of signiﬁcance α = 0. Speciﬁcally.d.4. integers.1: i) Test the hypothesis H : p = 1 against the alternative A : p ≠ 1 by using the 2 2 LR test and employ the appropriate approximation to determine the cutoff point.7. . 2. r. σ2).4 Consider the function ⎛m−1 ⎞ t⎟ ⎜ ⎝ n−1 ⎠ ⎛ m−1 ⎞ t⎟ ⎜1 + n−1 ⎠ ⎝ m 2 gt = () ( m+ n) 2 . In the ﬁrst case.

. We then formulate this as a hypothesis and proceed to test it on the basis of the data. . . . . . . . j = ∑ pij . as it can be shown on the k−1 basis of Theorem 6. let pj be the (constant) probability that each one of the trials will result in the outcome Oj and denote by Xj the number of trials which result in Oj. r which form a partition of the sample space S and let {Bj. . The constant C is determined by the fact that −2 log λ is asymptotically χ2 distributed under H. ⎪ ⎭ We may suspect that the p’s have certain speciﬁed values. i = 1. . Σk= 1xj = n and j ( ) n! x p1x ⋅ ⋅ ⋅ pk . in the case of a die. x1! ⋅ ⋅ ⋅ x k ! 1 k ⎧ ′ ⎪ Ω = ⎨θ = p1 . we may want to test the hypothesis that θ lies in a subset ω of Ω. Now consider r events Ai. P X1 = x1 . . We consider an r. . . . experiment which may result in k possibly different outcomes denoted by Oj. . j = 1. . pk . k. x1! ⋅ ⋅ ⋅ x k ! 1 k ˆ where pj = xj/n are the MLE’s of pj. = ∑ pij .8 Applications of LR Tests: Contingency Tables. . . p j > 0. . under Ω. . . j = 1. . . x1! ⋅ ⋅ ⋅ x k ! 1 k while. . . . Then. . j = 1. More generally. ˆ LΩ = ( ) n! ˆ ˆx p1x ⋅ ⋅ ⋅ pk . . that is. the die may be balanced. In n independent repetitions of the experiment. Then the joint distribution of the X’s is the Multinomial distribution. ⎪ ⎩ ( ) ∑p j =1 k j ⎫ ⎪ = 1⎬. . s} be another partition of S. j = 1. Goodness-of-Fit Tests Now we turn to a slightly different testing hypotheses problem. . and the desired level of signiﬁcance α. . . . . j =1 i =1 s r . Chapter 12). for example. Therefore ⎛p ⎞ λ = n ∏⎜ j 0 ⎟ j =1 ⎝ x j ⎠ k n xj and H is rejected if −2 log λ > C. . k. where the LR is also appropriate. . under ω. j = 1.370 13 Testing Hypotheses 13. . pk0)′}. X k = x k = where xj ≥ 0. . Consider the case that H : θ ∈ ω = {θ0} = {(p10. k. j = 1. k. k (see Example 11. . . p. θ ˆ Lω = ( ) n! x x p10 ⋅ ⋅ ⋅ pk0 . Let pij = P(Ai ∩ Bj) and let pi .

Ar} and {B1. p. . ∑p i =1 r ij = qj . r. j = 1. . . j = ∑ X ij . . s. . i = 1. . . . . . x. . . . A2 for male and female. . pij > 0. .13. . j =1 s Let θ = (pij. pi.3 LR Tests: for Testing Tables. . . . θ r. . j =1 i =1 s r . i =1 j =1 r s . . s such that H : pij = piqj. . = ∑ X . r. . B2 (high school graduate). . . . A may stand for gender and A1. . . . . . . . Goodness-of-Fit Tests 371 Then. = ∑ xij . We set Xi . speciﬁed by H. . if xij is the observed value of Xij and if we set xi . = P(Ai). . . . . . j)th cell. Ar be the r levels of A and B1. . . Br be the J levels of B. Namely. . . it follows that the set ω. Bs} are independent if and only if pij = pi. and B may denote educational status comprising the levels B1 (elementary school graduate). i =1 j =1 r s Furthermore. . . j = n. r. Since for i = 1. Then the set Ω of all possible values of θ is an (rs −1)-dimensional hyperplane in rs. B4 (beyond). i = 1. . . . r. . r − 1 and j = 1. = ∑ X ij j =1 s and X . Σir= 1 Σs= 1 pij = 1}. . We may think of the rs events Ai ∩ Bj being arranged in an r × s rectangular array which is known as a contingency table. i = 1. the problem of interest is that of testing whether the characteristics A and B are independent. i =1 r It is then clear that ∑X i =1 r i. j = ∑ xij . That is. r. . Next. i = 1. is an (r + s − 2)-dimensional subset of Ω. j = 1. . . . . j)th cell. s)′.j = ∑ ∑ pij = 1.j = P(Bj) and ∑p = ∑p i. . we want to test the existence of probabilities pi. the event Ai ∩ Bj is called the (i. . j Under the above set-up. j = 1.j. . the events {A1. . A situation where this set-up is appropriate is the following: Certain experimental units are classiﬁed according to two characteristics denoted by A and B and let A1. . . . s. For instance. . s. j = 1. j = 1. . clearly. j = 1. p. s)′ ∈ rs. Ω = {θ = (pij. i = 1. . . . .8 Applications of UMP TestsContingencyCertain Composite Hypotheses 13. qj. . . Again consider n experimental units classiﬁed according to the characteristics A and B and let Xij be the number of those falling into the (i. . B3 (college graduate). . s − 1 we have the r + s − 2 independent linear relationships ∑p j =1 s ij = pi . i = 1.

ˆ pij . ⎟ ⎥ ⎢∏ ⎜ . j xij ! ⎢ i ⎝ n ⎠ ⎣ as is easily seen (see also Exercise 13.j ⎤ ⎥ ⎥ ⎦ and hence λ= ⎡ ⎛ x ⎞ xi . j ⎟ ⎥⎢ j ⎝ n ⎠ ⎦⎣ . respectively. j xi . It can be shown that the (unspeciﬁed) assumptions of Theorem 6 are fulﬁlled in the present case and therefore −2 log λ is asymptotically χ2.1). n n x n! ⎡ ⎛ xi . n! ⎛ x ⎜ ∏ pi xij ! ⎝ i ∏i .ω = .j instead of Πir= 1 Πjs= 1. ∏ pi . j ⎟ ⎥ ⎢ i ⎝ n ⎠ ⎥⎢ j ⎝ n ⎠ ⎥ ⎦⎣ ⎣ ⎦ ⎛x ⎞ ∏ ⎜ nij ⎟ ⎠ i . ˆ . ⎞ ⎢∏ ⎜ ⎟ ∏i .j ⎝ xij = ⎛ ⎛ x⋅ j ⎞ xi ⋅ ⎞ ⎜ ∏ xi . respectively. i j xij x q j = ∏ ∏ pi q j = ∏ pix q!x ⋅ ⋅ ⋅ q s xij xij xij i⋅ i1 is i j i ⎛ ⎞⎛ ⎛ x x ⎞ = ⎜ ∏ pix ⎟ ⎜ ∏ q1 ⋅ ⋅ ⋅ q s ⎟ = ⎜ ∏ pix ⎝ i ⎠⎝ i ⎠ ⎝ i i⋅ i1 is i⋅ ⎞⎛ x ⎟ ⎜ ∏ qj ⎠⎝ j ⋅j ⎞ ⎟. under Ω and ω.j ij ij xij ij .9. Now in a multinomial situation. as described at the beginning of this section and in connection with the estimation problem. under ω. j =1 .W = xij n ˆ .j xij = n! x x ∏ pi q j = xij ! i . Therefore ˆ LΩ = ( ) ⎛x ⎞ n! ˆ ∏ ⎜ nij ⎟ . pi . Writing Πi. in a sense.j . j ⎞⎛ x ⎞ ⎟ ⎜ ∏ qj ⎟ ⎠⎝ j ⎠ .j ij i .j ( ) ∏n!x ! ∏ ( p q ) i j i . Recall that χ =∑ 2 k (X j − np j np j ) 2 . j xij ! i .372 13 Testing Hypotheses the likelihood function takes the following forms under Ω and ω. pi and qi are.j ij i . j ⎝ ⎠ xij ( ) i. j ∏i . Hence the test for H can be carried out explicitly. we have LΩ = Lω = since ( ) ∏n!x ! ∏ p i . f where f = (rs − 1) − (r + s − 2) = (r − 1)(s − 1) according to Theorem 6. ⎠ Now the MLE’s of pij. L ω = ∏i . Chapter 12) that certain chi-square statistics were appropriate.ω = x. ⎤⎡ ⎛ x ⎞ x ⎥ ⎢∏ ⎜ . ⎟ ⎜ ∏ x⋅ j ⎟ ⎠⎝ j ⎝ i ⎠ nn ∏ xij ij x i . it was seen (see Section 12. j i. j ⎤ ⎢∏ ⎜ i . ⎤ ⎡ ⎛ x ⎞ x.8. q j .

⎩ ⎭ where θ = (p1. q j .2 and test the hypothesis formulated there at the speciﬁed level of signiﬁcance by using a χ 2-goodness-of-ﬁt test. The constant ˆ ˆ C is to be determined by the signiﬁcance level and the fact that the asymptotic 2 distribution of χ ω . compare the cut-off point with that found in Exercise 13. is χ2 with f = (r − 1)(s − 1).1 Show that pij . However.v.7.8. By replacω ing them by their MLE’s.Ω = this section. In fact.2–13. 2 2 By means of χ ω . .2 Refer to Exercise 13.ω ) 2 . Also.7.8. Tests based on chi-square statistics are known as chi-square tests or goodness-of-ﬁt tests for obvious reasons. For the case of contingency tables and the problem of testing independence there. we consider { } ( k ) χ =∑ 2 ω j =1 (x j − npj 0 npj 0 ) 2 and reject H if χ2 is too large. . xij n ˆ . . qj.ω p j . That is. one can test H by rejecting it whenever χ ω > C. pi . χ2 is not a statistic since it involves the parameters pi.ω ˆ ˆ npi .ω = x⋅ j n as claimed in the discussion in In Exercises 13. . .ω q j . as can be shown. pk)′.9 below. . pk 0 ⎬. where ω is as in the previous case in connection with the contingency tables. It can further be shown that. we obtain the statistic χ =∑ 2 ˆ ω i . in the sense of being greater than a certain ω constant C which is speciﬁed by the desired level of the test. can be used for testing the hypothesis ′⎫ ⎧ H : θ ∈ω = θ 0 = ⎨ p10 . Once ˆ f more this test is asymptotically equivalent to the corresponding test based on −2 log λ.j (x ij ˆ ˆ − npi . χ2 is asymptotically distributed as χ2 . under ω. the test to be used will be the appropriate χ2 test. .3 UMP Tests for Testing Certain Composite Hypotheses Exercises 373 This χ2 r. n ˆ . Exercises ˆ 13. we have χ =∑ 2 ω i .j (x ij − npi q j npi q j ) 2 .8. under ω.13.2(i). . .8. 13. the present k−1 ω test is asymptotically equivalent to the test based on −2 log λ.ω = xi .

B for grades in [75.05. respectively. Half of them are given a preventive shot against a certain disease and the other half serve as control. two different varieties of a certain species are crossed and a speciﬁc characteristic of the offspring can only occur at three levels A. Use the data given below to check the assumption that the data is coming from an N(75. Of those who received the treatment. 12 and 12 . 30 did contract the disease and the remaining 20 did not. In a certain class. 89]. 1 100 2 94 3 103 4 89 5 110 6 104 At the level of signiﬁcance α = 0. 100]. 6.3 A die is cast 600 times and the numbers 1 through 6 appear with the frequencies recorded below. scores of human beings are normally distributed.8. C for grades in [60. respectively. Test the validity of the proposed model at the level of signiﬁcance α = 0.8. B and C. say. Of those not treated. B and C. A 3 B 12 C 10 D 4 F 1 13. employ the appropriate χ2 test and take α = 0. For this purpose.) 13.8. 18.Q. Test this claim for the data given below by choosing appropriately the Normal distribution and taking α = 0. 13. x ≤ 90 90 < x ≤ 100 100 < x ≤ 110 110 < x ≤ 120 120 < x ≤ 130 x > 130 10 18 23 22 18 9 (Hint: Estimate μ and σ2 from the grouped data. . the probabili1 3 8 ties for A. take the midpoints for the ﬁnite intervals and the points 65 and 160 for the leftmost and rightmost intervals.5 Course work grades are often assumed to be normally distributed. 49]. 59] and F for grades in [0. suppose that letter grades are given in the following manner: A for grades in [90.7 Consider a group of 100 people living and working under very similar conditions. Test effectiveness of the vaccine at the level of signiﬁcance α = 0.1. respectively. According to a proposed model.05.05.8. B and C are 12 .8.374 13 Testing Hypotheses 13. and 36 fall into levels A.6 It is often assumed that I. D for grades in [50. 40 did not contract the disease whereas the remaining 10 did so.05. 92) distribution. test the fairness of the die. Out of 60 offsprings. 13.4 In a certain genetic experiment. 74].

100).13.8 On the basis of the following scores. test whether the fractions of voters favoring the legislative proposal under consideration differ in the four wards. Xn be independent r.9 From each of four political wards of a city with approximately the same number of voters. . ω xn)′. 100 voters were chosen at random and their opinions were asked regarding a certain legislative proposal. 75).8. Chapter 12.d. f(·. in the present context a non-randomized decision function δ = δ(z) is deﬁned as follows: ⎧1.3 and 13. WARD 1 Favor Proposal Do not favor proposal Totals 37 63 100 2 29 71 100 3 32 68 100 4 21 79 100 Totals 119 281 400 13. appropriately taken.v.i.’s with p. . [85. [80.2.3 UMP Tests for Testing Certain Composite Hypotheses 13.3.05. θ ∈ Ω ⊆ r.’s with p. .f. Then the hypothesis to be tested is H : θ ∈ ω against the alternative A : θ ∈ωc. converges to 1 as n → ∞. . 80). .10 Let X1. [90.2. . [75. Let X1. δ (z ) = ⎨ ⎩0. . the reader is referred to Section 6. Refer to the previous exercises and ﬁnd at least one test which enjoys the property of consistency.8.v. Let B be a critical region. 85). On the basis of the data given below.9 Decision-Theoretic Viewpoint of Testing 375 13. r. .d.d.8. test whether there are gender-associated differences in mathematical ability (as is often claimed!). . θ ∈ Ω ⊆ r. θ). Boys: 80 96 98 87 75 83 70 92 97 82 Girls: 82 90 84 70 80 97 76 90 88 86 (Hint: Group the grades into the following six intervals: [70. 90). f(·. θ). 13. if z ∈ B otherwise. and let ω be a (measurable) subset of Ω. Speciﬁcally. . evaluated at any ﬁxed θ ∈ Ω. check whether the consistency property is satisﬁed with regards to Exercises 13.9 Decision-Theoretic Viewpoint of Testing Hypotheses For the deﬁnition of a decision. . For testing a hypothesis H against an alternative A at level of signiﬁcance α. . a test φ is said to be consistent if its power βφ. .f. Xn be i. loss and risk function. Then by setting z = (x1. Take α = 0.) 13.05. Take α = 0.

a decision function in the present framework is simply a test function. θ ∈ Ω = {θ0. if ω = {θ0}. R(θ .d. . R θ1 . we have θ θ ( ( ( ) 0 1 ⎧L1α . . Regarding the existence of minimax decision functions. By setting Z = (X1. We are θ interested in testing the hypothesis H : θ = θ0 against the alternative A : θ = θ1 at level α.376 13 Testing Hypotheses We shall conﬁne ourselves to non-randomized decision functions only.’s X1. . Thus in the case θ θ that ω = {θ0} and ωc = {θ1}.v. Deﬁne the subset B of n as follows: B = {z ∈ n. Clearly. θ1}. θ1). f(·. we have THEOREM 7 Let X1. is either f(·. 0 Pθ Ζ ∈ Bc . . 0 1 (46) Then the decision function δ deﬁned by .d.d. δ ≤ max R θ 0 . or θ ∈ω c and δ = 1. θ0)} and assume that there is a determination of the constant C such that L1Pθ0 Z ∈ B = L2 Pθ1 Z ∈ Bc ( ) ( ) (equivalently. ω = {θ1} and Pθ (Z ∈ B) = α. ⎩ As in the point estimation case. δ . δ * [( ) ( )] [( ) ( )] for any other decision function δ*. Xn)′. By setting f0 = f(·. θ1). f(z.’s with p. Also an appropriate loss function. corresponding to δ. on Ω. if θ = θ 0 ⎪ (45) R θ. R θ1 . . δ * . The notation φ instead of δ could be used if one wished.f. ⎩ c In particular. . we have the result below. we are led to minimax decision and Bayes decision functions corresponding to a given prior p.v. we would like to determine a decision function δ for which the corresponding risk would be uniformly (in θ) smaller than the risk corresponding to any other decision function δ*. ( ) if θ ∈ω and δ = 0. .i. ⎪ L2 1 − β . δ = ⎨L1 .f. 1 Pθ Z ∈ B + L θ. Pθ (Z ∈ B) = β. ⎪ ⎩L2 . θ0) or else f(·. δ = ⎨ if θ = θ1 .f. ⎪ L θ. δ ) = R(θ . . θ1) > Cf(z. Since this is not feasible. The r. r. L2 > 0. or ( ) ( ) ( ) ) ( ) ) ( ) ⎧L1Pθ Z ∈ B . . . ⎪L2 Pθ Z ∈ B . θ0) and f1 = f(·. δ is minimax if ( ) ( ) max R θ 0 . where L1. δ = L θ. .d. if θ ∈ω and δ = 1 if θ ∈ω c and δ = 0. the corresponding risk function is R θ. except for trivial cases. δ = ⎨ (44) c if θ ∈ω c . . Xn is a sample whose p. θ). if θ ∈ω ⎪ R θ. δ )). Xn be i. is of the following form: ⎧0.

δ) for R(θ0.3 UMP Tests for Testing Certain Composite Hypotheses 13. Then and let δ* be the corresponding R 0.13. δ = max R 0. REMARK 7 [( ) ( )] ( ) ( ) [ ( ) ( )] We close this section with a consideration of the Bayesian approach. ( ( ) ( ) ( ) (48) Since by assumption (0. as was to be seen. δ * > R 0. R 1. δ*) and suppose that R(0. then [( ) ( )] ( ) ( ) [ ( ) ( )] max R 0. δ) = (1. δ = max R 0. δ) and R(0.f. δ). δ = L2 1 − β . is the MP test of level P0 (Z ∈ B) constructed in Theorem 1. The relation (45) implies that 0 1 R 0. δ . This is equivalent to L1P0(Z ∈ A) ≤ L1P0(Z ∈ B). δ * = L2 P1 Z ∈ Ac . we have . δ . δ . (50) Relations (49) and (50) show that δ is minimax. δ) < R(0. δ*) ≤ R(0. δ*). (47) is minimax. θ More precisely. R 1. δ . δ * ≥ R 0. δ * = L1P0 Z ∈ A ( ) ( ) ( and R 1. δ). () if z ∈B otherwise. δ = L1α ( ) and R 1. ▲ It follows that the minimax decision function deﬁned by (46) is an LR test and. δ * .9 Decision-Theoretic Viewpoint of Testing 377 ⎧1. or P0 Z ∈ A ≤ α . ) ( ) Consider R(0. respectively. δ ≤ R 1. θ1}. ( ) ( ) ) ( ) ( ) or equivalently. corresponding to a given p. δ * . PROOF For simplicity. δ*) ≥ R(1. In connection with this it is shown that. δ * . and similarly R(0. δ . R(1. Also set P0(Z ∈ B) = α and θ θ P1(Z ∈ Bc) = 1 − β. R 1. R(1. δ). in fact. or L2 P1 Z ∈ Ac ≥ L2 P1 Z ∈ Bc . if R 0.d. δ * ≥ R 1. δ * = R 1. δ). R 1. ) Then Theorem 1 implies that P1(Z ∈ A) ≤ P1(Z ∈ B) because the test deﬁned by (47) is MP in the class of all tests of level ≤α. set P0 and P1 for Pθ and Pθ . That is. then R 1. δ z =⎨ ⎩0. (49) whereas if R(0. we have max R 0. on Ω = {θ0. δ * ≤ R 0. there is always a Bayes decision function which is actually an LR test. δ) and R(θ1. δ). n ( ) ( ( ) Let A be any other (measurable) subset of decision function. Hence P1 Z ∈ Ac ≥ P1 Z ∈ Bc .

f z. a decision rule which minimizes the average risk R(θ0. θ ∈ Ω = {θ0. Then by virtue of (44). we have 0 Rλ δ = L1P0 Z ∈ B p0 + L2 P1 Z ∈ Bc p1 0 () ( = p0 L1P0 Z ∈ B + p1L2 1 − P1 Z ∈ B = p1L2 + p0 L1P0 Z ∈ B − p1L2 P1 Z ∈ B and this is equal to p1L2 + ∫ p0 L1 f z. θ 0 ⎬. θ 1 > ( ) ( ) It follows that the Bayesian decision function is an LR test and is. θ 0 − p1L2 f z. δλ z = ⎨ ⎩0. PROOF Let Rλ (δ) be the average risk corresponding to λ0. . 0 () n if z ∈B otherwise. ⎫ p0 L1 f z. θ1 z∈B [ ( ) ( )] 0 for the discrete case.i. r. Xn be i. ▲ REMARK 8 .378 13 Testing Hypotheses THEOREM 8 Let X1. the MP test for testing H against A at the level P0(Z ∈ B). δ)p0 + R(θ1. θ1} and let λ0 = {p0. Then for testing the hypothesis H : θ = θ0 against the alternative A : θ = θ1. δλ z = ⎨ ⎩0.f. and by employing the simpliﬁed notation used in the proof of Theorem 7. θ1 dz B [ ( ) ) ( [ ( ) ( ) ( )] )] (51) [ ( ) ( )] for the continuous case and equal to p1L2 + ∑ p0 L1 f z. and is given by θ θ 0 ⎧1. ⎧1. . θ0)} and C = p0L1/p1L2. 0 () if p0 L1 f z. . θ). θ 0 − p1L2 f z.v. in fact. f(z. f(·. θ p1} (0 < p0 < 1) be a probability distribution on Ω. θ 0 − p1L2 f z. it follows that the δ which minimizes Rλ (δ) is given by ⎧1. as follows by Theorem 1. there exists a Bayes decision function δλ corresponding to λ0 = {p0.’s with p. 0 () if z ∈ B otherwise.d. θ1 < 0 ( ) ( ) otherwise. that is. In either case. where B = {z ∈ Rn.d. . . ⎪ δλ z = ⎨ ⎪ ⎩0. δ)p1. p1}. equivalently. p1L2 ⎭ where ⎧ B = ⎨z ∈ ⎩ as was to be seen. θ1) > Cf(z.

( ) ] or Φ 5C0 − 5 = 2 1 − Φ 5C0 . . We have ( ) = exp[n(θ − θ )x ] .5. We are interested in determining the minimax decision function δ for testing the hypothesis H : θ = θ0 against the alternative A : θ = θ1. EXAMPLE 13 Let X1. or 2 Φ 5C0 − Φ 5 − 5C0 = 1 Hence C0 = 0. The type-I error probability of this test is Pθ X > 0. . 1 > 0. δ z =⎨ ⎩0. θ 1 0 1 0 ⎣2 2 1 2 0 ⎦ so that f(z. n = 25 and L1 = 5.53 × 5 = 1 − Φ 2. Then L1 Pθ 0 X > C0 = L2 Pθ 1 X < C0 ( ) ( ) becomes Pθ 1 X < C0 = 2Pθ0 X > C0 . ( ) ( ) or Pθ 1 [ n X − θ1 < 5 C0 − 1 = 2 Pθ ( ) ( )] 0 [ n X − θ0 > 5C0 .3 UMP Tests for Testing Certain Composite Hypotheses 13. Therefore the minimax decision function is given by ( ) [ ( )] ( ) ( ) ⎧1. 0 1 ( ) ( ) As a numerical example.13. L2 = 2.004 0 ( ) [ ( ) ] ( ) .65 = 1 − 0. θ ) exp ⎢ n(θ − θ )⎥ f z. . Xn be i. ⎡1 ⎤ f ( z.996 = 0.i.v.53.53 otherwise.’s from N(θ. () if x > 0.9 Decision-Theoretic Viewpoint of Testing 379 The following examples are meant as illustrations of Theorems 7 and 8.53 = P N 0. . θ1 = 1. 1). as is found by the Normal tables. where C0 = 1 log C θ1 + θ0 + 2 n θ1 − θ0 ) ( ) (for θ > θ ). 1 2 Then condition (46) becomes L1Pθ X > C0 = L2 Pθ X ≤ C0 . r.d. θ0) is equivalent to ⎡1 ⎤ exp n θ1 − θ0 x > C exp⎢ n θ12 − θ02 ⎥ or ⎣2 ⎦ [( )] ( ( ) x > C0 . take θ0 = 0. θ1) > Cf(z.

53 = P N 0.9906. θ1) > Cf(z. 1 } is given by 0 3 ⎧1.02 and R θ1 . p1L2 2 Suppose p0 = 3 . 0 () if x > C0 otherwise. From the discussion in the previous example it follows that the Bayes decision function is given by ⎧1.0235. corresponding to the minimax δ given above. ( ) ( ) Thus max R θ 0 .380 13 Testing Hypotheses and the power of the test is Pθ 1 X > 0. 1 > 5 0. δ . p1 }. θ (1 − θ ) 0 1 0 0 1 . θ0) is equivalent to x log (1 − θ )θ > C ′ . . 1) > 2. 1 } is equal to 0. R θ1 .555451 (≈0. We are interested in determining the minimax decision function δ for testing H : θ = θ0 against A : θ = θ1. δλ z = ⎨ ⎩0. j =1 n so that f(z. Xn be i.i. δλ z = ⎨ ′ ⎩0. δ = 2. Therefore relation (51) gives that the Bayes risk 2 corresponding to { 3 . EXAMPLE 14 [( ) ( )] Refer to Example 13 and determine the Bayes decision function corresponding to λ0 = {p0.35 = 0. where x = ∑ x j . θ).0235.25) = 0.009 = 0. ( ) [ ( ) ( )] ( ) Therefore relation (44) gives R θ 0 . Then C = 4 and C0 = 0.0202.5 × 0. . θ 1 0 ⎞ ⎛ 1 − θ1 ⎞ ⎟ ⎜ ⎟ 0 ⎠ ⎝ 1 − θ0 ⎠ 1 x n −x . 3 1 EXAMPLE 15 Let X1.75) = 0.d.53 − 1 = Φ 2.55 otherwise. p1 = 1 . . 0 () if x > 0.55) = P[N(1.25] = Φ(2. .9878. We have ( ) = ⎛θ ⎜ f ( z. r. δ = 0.’s from B(1.55).v. where C0 = 1 log C θ1 + θ0 + 2 n θ1 − θ0 ( ) ( ) and C p0 L1 . 1) > − X 2. δ = 5 × 0. Therefore the 3 2 Bayes decision function corresponding to λ′ = { 3 .75] = ¯ 1 − Φ(2.003 and the power of the test is Pθ (X > 0. θ ) ⎝ θ f z.004 = 0. 0 ¯ The type-I error probability of this test is Pθ (X > 0.55) = P[N(0.

if x > 13 otherwise.75(X ≤ 13) = 0. 1 − θ0 Let now θ0 = 0. it follows then that relation (46) is satisﬁed. Then (1 − θ )θ = 3 θ (1 − θ ) 0 1 0 1 ( > 1) ( ).856. Therefore the minimax decision function is determined by ⎧1.2142. the minimax risk is equal to 0.7858. so that P0. where ⎛ 1 − θ1 ⎞ C0 = ⎜ log C − n log ⎟ 1 − θ0 ⎠ ⎝ n j=1 log Next.1071. With the chosen values of L1 and L2. Furthermore.2142 = 0. δ (z ) = ⎨ ⎩0. we have P0.5.3 UMP Tests for Testing Certain Composite Hypotheses 13. θ (1 − θ ) θ1 1 − θ0 0 1 and therefore f(z. θ1 = 0. θ1) > Cf(z. L2 = 0.5 × 0.13.5(X > 13) = 0.5.75.0577 and P0.75(X > 13) = 0.9 Decision-Theoretic Viewpoint of Testing 381 where C 0′ = log C − n log 1 − θ1 . θ0) is equivalent to x > C0. X = Σ Xj is B(n. n = 20 and L1 = 1071/577 ≈ 1. . θ) and for C0 = 13.

as a rule.’s Z1. DEFINITION 1 Let {Zn} be a sequence of r. say. . .’s. . but that is interpreted as 0 and no problem arises. it is advantageous not to ﬁx the sample size in advance. a single one at a time (sequentially). indeed. suppose we observe the r. . 382 . we have the following deﬁnition. REMARK 1 Next.382 14 Sequential Procedures Chapter 14 Sequential Procedures 14. Zn alone. but to keep sampling and terminate the experiment according to a (random) stopping time.’s Z1. and we stop observing them after a speciﬁed event occurs. Now. one after another. on the basis of the outcomes. then the relevant random experiment was supposed to have been independently repeated n times and ﬁnally. in many other cases this is. . Z2. in the point estimation and testing hypotheses problems the sample size n was ﬁxed beforehand. DEFINITION 2 A sampling procedure which terminates according to a stopping time is called a sequential procedure. In the latter case. for each n. . . the term ∞ · 0 appears. In certain circumstances. Zn. Thus.v. A stopping time (deﬁned on this sequence) is a positive integer-valued r. for example. . . In such a case and when forming EN.v. the random sample Z1.v. whereas in some situations the random experiment under consideration cannot be repeated at will. N such that. a point estimate or a test was constructed with certain optimal properties. .v.1 Some Basic Theorems of Sequential Sampling In all of the discussions so far. . the event (N = n) depends on the r. the case. In connection with such a sampling scheme. that we have dealt with was assumed to be of ﬁxed size n. a stopping time N is also allowed to take the value ∞ but with probability equal to zero.

SN. THEOREM 1 (Wald’s lemma for sequential analysis) For j ≥ 1. (∞ >)m = E Y j −1 n =1 j = E E Yj N = ∑ E Yj N = n P N = n n =1 [( )] ∞ ( )( ) ) (4) = ∑ E Yj N = n P N = n + ∑ E Y j N = n P N = n . where ZN is deﬁned as follows: the value of Z N at s ∈ S is equal to Z N ( s ) s . j n ii) ∑∞ = 1∑n= 1E(|Yj||N = n)P(N = n) = ∑∞= 1∑∞ = jE(|Yj||N = n)P(N = n). The proof of the theorem is simpliﬁed by ﬁrst formulating and proving a lemma. so that EZj = μ is also ﬁnite. n j j n PROOF i) For j ≥ 2. are independent. E(|Yj||N = n) = 0 for those n’s for which P(N = n) = 0). EYj = 0 and have (common) ﬁnite absolute moment of ﬁrst order to be denoted by m. let Zj be independent r. where SN is deﬁned by (2) and ZN is deﬁned by (1). . . Let N be a stopping time. E(|Yj||N = n) = E|Yj| = m. Yn and hence. Also set TN = Y1 + · · · + YN.’s Y1.v. Quite often the partial sums SN = Z1 + · · · + ZN deﬁned by S N s = Z1 s + ⋅ ⋅ ⋅ + Z N ( s ) s . where YN and TN are deﬁned in a way similar to the way ZN and SN are deﬁned by (1) and (2). set Yj = Zj − μ. (3) In all that follows. deﬁned on the sequence {Zj}. Then we will show that ETN < ∞ and ETN = 0. that is. () (1) () () () s ∈S (2) are of interest and one of the problems associated with them is that of ﬁnding the expectation ESN of the r. . respectively. this expectation is provided by a formula due to Wald. E|Yj| = m < ∞. for j > n. Y2. given N = n. j ≥ 1. LEMMA 1 In the notation introduced above: i) ∑∞= 1∑∞ = jE(|Yj||N = n)P(N = n) = m E N(<∞). j ≥ 1.v. it is assumed that all conditional expectations. Then E|SN| < ∞ and ESN = μEN.1 Some Basic Theorems of Sequential Sampling 383 Thus a sequential procedure terminates with the r. . are ﬁnite for all n for which P(N = n) > 0.’s (not necessarily identically distributed) with identical ﬁrst moments such that E|Zj| = M < ∞.v. . and assume that EN is ﬁnite. We set E(Yj|N = n) = 0 (accordingly.14. n= j ( )( ) ) ∞ ( )( ) The event (N = n) depends only on Y1. ZN. . Therefore (4) becomes m = m∑ P N = n + ∑ E Y j N = n P N = n n =1 n =j j −1 ( ∞ ( )( . Under suitable regularity conditions. Then the r.v. . For this purpose.

Apostol. in Mathematical Analysis. it sufﬁces to show (3). ∞ n= j j ≥ 1. Theorem 12–42. 1957). j j Relation (6) establishes part (i). and hence ∑ ∑ E( Y ∞ ∞ j =1n = j j N = n P N = n = m∑ P N ≥ j = m∑ jP N = j = mEN . To this end. That this is. we have ETN = E E TN N = ∑ E TN N = n P N = n n=1 ∞ ⎛ n ⎞ = ∑ E⎜ ∑ Y j N = n⎟ P N = n n=1 ⎝ j=1 ⎠ [( )] ∞ ( )( )( ) ( ) ) ∞ ∞ n ⎛ n ⎞ ≤ ∑ E⎜ ∑ Y j N = n⎟ P N = n = ∑ ∑ E Y j N = n P N = n ⎝ j=1 ⎠ n=1 n=1 j=1 ( ( )( ) ) (by Lemma 1(ii)) = mEN (< ∞) (by Lemma 1(i)). indeed. T. page 373. j =1 j =1 )( ) ∞ ( ) ∞ ( ) (6) where the equality ∑∞=1P(N ≥ j) = ∑∞=1 jP(N = j) is shown in Exercise 14.1. ii) By setting pjn = E(|Yj||N = n)P(N = n).M. as mP N ≥ 1 = m = E Y1 = ∑ E Y1 N = n P N = n . ▲ PROOF OF THEOREM 1 Since TN = SN − μN. ( ) ( ) and ∑∑ p j =1 n = j ∞ ∞ jn = p11 + p12 + ⋅ ⋅ ⋅ + p22 + p23 + ⋅ ⋅ ⋅ + ⋅ ⋅ ⋅ + pnn + pn . Addison-Wesley. the case follows from part (i) and calculus results (see.1. for example. = ∑ ∑ E Yj N = n P N = n j=1 n= j ∞ ∞ ( )( .n +1 + ⋅ ⋅ ⋅ + ⋅ ⋅ ⋅ ( ) ( ) ( ) are equal. n =1 ( ) ∞ ( )( ) Therefore ∑ E( Y j N = n)P( N = n) = mP( N ≥ j ). n =j ( ) ∞ ( )( ) (5) Equality (5) is also true for j = 1.384 14 Sequential Procedures or mP N ≥ j = ∑ E Yj N = n P N = n . this part asserts that n =1 j =1 ∑∑ p ∞ n jn = p11 + p12 + p22 + ⋅ ⋅ ⋅ + p1n + p2n + ⋅ ⋅ ⋅ + pnn + ⋅ ⋅ ⋅ .

If C1 < Sn(s) < C2 for all n. (11) Summing up over j ≥ 1 in (11). .v. . E(Yj|N = n) = EYj = 0. for each s ∈S. ▲ Now consider any r. . . . for j ≥ 1. . so that. Set Sn = Z1 + · · · + Zn and deﬁne the random quantity N as follows: N is the smallest value of n for which Sn ≤ C1 or Sn ≥ C2. In other words. and let C1. since ∑ ∑ E(Y ∞ ∞ j =1 n = j j N = n P N = n ≤ ∑ ∑ E Yj N = n P N = n < ∞ j =1 n = j )( ) ∞ ∞ ( )( ) by Lemma 1(i). )( ) (12) Relations (7) and (12) complete the proof of the theorem. . If C1 < Sn < C2 for all n. ∞ n= j j ≥ 1. Z2. for j > n. and ﬁnd the ﬁrst n. Yn. Next. then set N = ∞.’s Z1. Therefore ∑ E(Y j N = n)P( N = n) = 0. this is also true for j = 1. Therefore (9) yields ∑ E(Y j N = n)P( N = n) = 0. C2 be two constants such that C1 < C2. n= j ) This is so because the event (N = n) depends only on Y1.14. then set N(s) = ∞. (10) By (8). for j ≥ 2. j =1 n = j ∞ ( )( ) This last equality holds by Lemma 1(ii). ∞ n= j j ≥ 2. N(s). . N = N(s). Then we have the following result. the value of N at s. we have then ∑ ∑ E(Y ∞ ∞ j =1n = j j N = n P N = n = 0. n =1 [ ( )] )( )( ∞ ( )( ) (8) whereas. for which SN(s) ≤ C1 or SN(s) ≥ C2. 0 = EYj = E E Yj N = ∑ E Yj N = n P N = n . relation (8) becomes as follows: 0 = ∑ E Yj N = n P N = n + ∑ E Yj N = n P N = n n =1 ∞ n= j j −1 ( ( ) ∞ ( )( ) (9) = ∑ E Yj N = n P N = n . is assigned as follows: Look at Sn(s) for n ≥ 1.1 Some Basic Theorems of Sequential Sampling 385 ii) Here ETN = E E TN N = ∑ E TN N = n P N = n n =1 [( ∞ )] ∞ ( )( ) ∞ ∞ n ⎛ n ⎞ = ∑ E ⎜ ∑ Yj N = n⎟ P N = n = ∑ ∑ E Yj N = n P N = n ⎠ n =1 ⎝ j =1 n =1 j =1 ( ) ( )( ) (7) = ∑ ∑ E Yj N = n P N = n . say.

then. 1. . . Thus for the case that P(Zj > 0) > 0. ⎝ j = k +1 ⎠ ⎢ ⎥ ⎣j = k +1 ⎦ j = k +1 the inequality following from (17) and the equalities being true because of the independence of the Z’s and (14). k − 1. Let us suppose ﬁrst that P(Zj > 0) > 0. km. P(Zj > 1/n) = 0 for all n. (13) The assumption P(Zj = 0) ≠ 1 implies that P(Zj > 0) > 0. Thus ⎛ k +m ⎞ ⎡ k +m ⎤ k +m P ⎜ ∑ Z j > C 2 − C1 ⎟ ≥ P ⎢ I Z j > ε ⎥ = ∏ P Z j > ε = δ m . (16) j =k +1 I (Z j ⎛ k +m ⎞ ⎛ k +m ⎞ > ε ⊆ ⎜ ∑ Z j > mε ⎟ ⊆ ⎜ ∑ Z j > C 2 − C1 ⎟ . a contradiction. C2 with C1 < C2. km implies Z jm +1 + ⋅ ⋅ ⋅ + Z( j +1)m ≤ C 2 − C1 . ⎝ j =k +1 ⎠ ⎝ j =k +1 ⎠ ) (17) the ﬁrst inclusion being obvious because there are m Z’s. . deﬁne the r. . With C1. . . there exists a positive integer m such that ( ) (14) mε > C 2 − C1. j = 0. . . i = 1. . C2 as in the theorem and ε as in (14). which is in contradiction to C1 < Si < C2. set N = ∞ if C1 < Sn < C2 for all n. . .386 14 Sequential Procedures THEOREM 2 Let Z1.v. Then there exists ε > 0 such that P(Zj > ε) = δ > 0. .d. Then there exist c > 0 and 0 < r < 1 such that P N ≥ n ≤ cr n PROOF ( ) for all n. if for some j = 0. or P(Zj < 0) > 0. in particular. i = 1. 1. quantity N as the smallest n for which Sn ≤ C1 or Sn ≥ C2. r. if P(Zj > ε) = 0 for every ε > 0. . we shall show that ⎛ k +m ⎞ P ⎜ ∑ Z j > C 2 − C1 ⎟ > δ m ⎝ j = k +1 ⎠ We have k +m (15) for k ≥ 0. we have that There exists ε > 0 such that P Z j > ε = δ > 0. be i.’s such that P(Zj = 0) ≠ 1. In fact. Z2. For such an m. Set Sn = Z1 + · · · + Zn and for two constants C1. Next. . j =0 k −1 [ ] Now we assert that C1 < Si < C 2 . Clearly ( ) ( ) Skm = ∑ Z jm +1 + ⋅ ⋅ ⋅ + Z( j + 1)m . (18) This is so because. . . .i. . k − 1. we suppose that Zjm+1 + · · · + Z(j+1)m > C2 − C1. this inequality together with Sjm > C1 would imply that S(j+1)m > C2. and the second inclusion being true because of (15). . . each one of which is greater than ε. But (Zj > 1/n) ↑ (Zj > 0) and hence 0 = P(Zj > 1/n) → P(Zj > 0) > 0.

choose k so that km < n ≤ (k + 1)m. Since also P A = P lim An = lim P An n→∞ n→∞ ( ) ( ) ( ) . we have (i) P(N < ∞) = 1 and (ii) EN < ∞. m m 1/m ( ) ( ) k (19) Now set c = 1/(1 − δ ). clearly. and also leads to (13). r = (1 − δ ) . We have then P N ≥ n ≤ P N ≥ km + 1 ≤ 1 − δ m = ( ) ( ) ( m (1 − δ ) m 1 (1 − δ ) ) = c ⎡(1 − δ ) ⎢ ⎣ k k+ 1 m 1 m ⎤ ⎥ ⎦ (k + 1 )m = cr (k + 1 )m ≤ cr n .1.2. k the last inequality holding true because of (16) and the equality before it by the independence of the Z’s.14. . Then. . km ⊆ I Z jm +1 + ⋅ ⋅ ⋅ + Z( j +1)m ≤ C 2 − C1 . PROOF i) Set A = (N = ∞) and An = (N ≥ n). these inequalities and equalities are true because of the choice of k. . Thus P N ≥ km + 1 ≤ 1 − δ m . (See also Exercise 14. j = 1. j= 0 k −1 [ ) ] the ﬁrst inclusion being obvious from the deﬁnition of N and the second one following from (18). Therefore ⎧k −1 ⎪ P N ≥ km + 1 ≤ P ⎨I Z jm +1 + ⋅ ⋅ ⋅ + Z( j +1)m ≤ C 2 − C1 ⎪j = 0 ⎩ ( ) [ = ∏ P[ Z k −1 j =0 k −1 j =0 jm +1 + ⋅ ⋅ ⋅ + Z( j +1)m ⎪ ⎬ ]⎫ ⎪ ⎭ ≤ C −C ] 2 1 ≤ ∏ 1−δm = 1−δm ( ) ( ). and for a given n. The case P(Zj < 0) > 0 is treated entirely symmetrically. relation (13) is established.) The proof of the theorem is then completed. A = A1 A2 · · · . COROLLARY Under the assumptions of Theorem 2.1 Some Basic Theorems of Sequential Sampling 387 (N ≥ km + 1) ⊆ (C 1 < S j < C 2 . we have A = lim An and hence n→∞ ∞ n=1 An. relation (19) and the deﬁnition of c and r. ▲ The theorem just proved has the following important corollary. . Thus for the case that P(Zj > 0) > 0.

We are going to consider only the problem of sequentially testing a simple hypothesis against a simple alternative as a way of illustrating the application of sequential procedures in a concrete problem. let X1.1. assume that P(Zj < 0) > 0 and arrive at relation (13). as was to be shown. X n . .2 For a positive integer-valued r.v. N show that EN = ∑∞ P(N ≥ n). b. be two numbers (to be determined later) such that 0 < a < b.d. f (X ) ⋅ ⋅ ⋅ f (X ) f1 X1 ⋅ ⋅ ⋅ f1 X n 0 1 0 n . . the event (N = n) depends only on the r. Accordingly. n=1 In Theorem 2. Zn. so that P(A) = 0.v.388 14 Sequential Procedures by Theorem 2 in Chapter 2. and for each n. f0(x) > 0} = {x ∈ . r. N is a stopping time by Deﬁnition 1 and Remark 1. ii) We have EN = ∑ nP N = n = ∑ P N ≥ n ≤ ∑ cr n = c ∑ r n n=1 n=1 n=1 n=1 ∞ ( ) ∞ ( ) ∞ ∞ =c r < ∞. N is positive integer-valued and it might also take on the value ∞ but with probability 0 by the ﬁrst part of the corollary. On the other hand. . either f0 or else f1.v. the mathematical machinery involved is well beyond the level of this book. . Let a. consider the ratio λ n = λ n X1 .’s with p. .i. Thus lim P(An) = 0. f1(x) > 0}.’s Z1.f. 1 = ( ) ( ) ( ). . sampling according to a stopping time is.1 14.2 Sequential Probability Ratio Test Although in the point estimation and testing hypotheses problems discussed in Chapter 12 and 13. in general.1.d. . 1−r as was to be seen. . . But P(An) ≤ cr n by the theorem. 14. be i. 0. In order to simplify matters. X2. . . from the deﬁnition of N it follows that for each n. REMARK 2 Exercises 14. proﬁtable. and suppose that we are interested in testing the (simple) hypothesis H: the true density is f0 against the (simple) alternative A: the true density is f1.v. we also assume that {x ∈ . at level of signiﬁcance α (0 < α < 1) without ﬁxing in advance the sample size n. ▲ The r. To this end. respectively (as well as in the interval estimation problems to be dealt with in Chapter 15).

. the smallest n for which Sn ≤ log a or Sn ≥ log b for all n. if C is the set over which f0 and f1 differ. This assumption is equivalent to Pi(Z1 ≠ 0) > 0 under which the corollary to Theorem 2 applies. 1. we have that N takes on the values 1.’s with p. PROPOSITION 1 Let X1.i. f (X ) ⋅ ⋅ ⋅ f (X ) f1 X1 ⋅ ⋅ ⋅ f1 X n 0 1 0 n Z j = log ( ). 0. xn. .1 Some Basic Theorems ofProbability Ratio Test 14. . . In what follows. r. f1 x > 0 () } { () } and that Pi [f0(X1) ≠ f1(X1)] > 0. By letting N stand for the smallest n for which λn ≤ a or λn ≥ b. . . 1). deﬁne the random quantity N as the smallest n for which λn ≤ a or λn ≥ b. . we restrict ourselves to the common set of positivity of f0 and f1. f (X ) f1 X j 0 j j = 1. set λn = and ( ) ( ).i. Summarizing. Under suitable additional assumptions. and suppose that {x ∈ . and. Thus. xn are the observed values of X1. . . ( ) i = 0. we also make the assumption that Pi[f0(X1) ≠ f1(X1)] > 0 for i = 0. take another observation. of course. . . clearly. 1. i = 0.d. .d. f (X ) f1 X j 0 j so that log λn = ∑ Z j . j =1 n Clearly. we shall show that the value ∞ is taken on only with probability 0. 1 = log ( ) ( ). .d.14. j =1 n For two numbers a and b with 0 < a < b. . . n Sn = ∑ Z j = log λn . At this point. where x1.2 Sequential Sequential Sampling 389 We shall use the same notation λn for λn (x1. 2. Xn. for each n. . then N is j redeﬁned as the smallest n for which Sn ≤ log a or Sn ≥ log b. i = 0. equivalently. the Zj’s are i. the event (N = n) depends only on X1. and as soon as λn ≤ a. stop sampling and reject H. n. and for j = 1. The implication of Pi(N < ∞) = 1. we have the following result. . so that N will be a stopping time. . then it is assumed that ∫C f0(x)dx > 0 and ∫C f1(x)dx > 0 for the continuous case. .v. . X2. . . be i. f0 x > 0 = x ∈ . equivalently. . and if Sn = ∑n= 1Zj. consider the following sequential procedure: As long as a < λn < b. . regardless of whether the true density is f0 or f1. 1. . . . since the X’s are so. the proposition assures us that N is actually a stopping time with ﬁnite expectation. either f0 or else f1. stop sampling and accept H and as soon as λn ≥ b. . 0. For testing H against A. Xn. .f. set Z j = Z j X j . For each n. and possibly ∞. Then Pi N < ∞ = 1 and Ei N < ∞. 1 is. . that the SPRT described above will . Then the sequential procedure just described is called a sequential probability ratio test (SPRT) for obvious reasons.

and let α < β < 1. j = 1. λ 2 ≥ b + ⋅ ⋅ ⋅ ( = P (λ ≥ b) + P ( a < λ < b. . λ + P ( a < λ < b. . respectively. . However. λ 2 1 ( ( = P (λ ≤ a ) + P ( a < λ < b. . the actual identiﬁcation presents difﬁculties. . λ + P ( a < λ < b. x n ∈ ′ ⎪ ⎩ ⎧ ′ ⎪ T n = ⎨ x1 . . and the use of approximate values is often necessary. . . λ n ≤ a + ⋅ ⋅ ⋅ . a < λ 0 1 0 1 0 1 [( ) ( ) ) + a < λ1 < b. . 11 ≤ a ⎬. the determination of a and b was postponed until later. a< . . a < λ n−1 < b. T1′′= ⎨x1 ∈ . . set fin = f x1 . . From their own deﬁnition. (23) f0 j f0 n ⎪ ⎭ ⎫ f f ⎪ n . . .390 14 Sequential Procedures terminate with probability one and acceptance or rejection of H. 1 ( ) and in terms of them. . . . we shall see what is the exact determination of a and b. deﬁne T ′n and T ″ as below. (24) f0 j f0 n ⎪ ⎭ n . At this point. . we proceed as follows. let α and 1 − β be prescribed ﬁrst and second type of errors. i = 0. in testing H against A. as will be seen. . . x n ∈ ′′ ⎪ ⎩ ( ) ( ) (22) ( ( ) ) ⎫ f1 j f ⎪ < b. . . . . . . a < 1 j < b. i . . . . To start with. a < λ n−1 < b. . f01 f01 x1 ⎭ ⎩ ⎪ ⎪ ⎩ ⎭ and for n ≥ 2. In the formulation of the proposition above. we have α = P rejecting H when H is true ( = P0 λ1 ≥ b + a < λ1 < b. a < λ 1 1 1 1 1 1 [( ) ( ) ≤ a) + ⋅ ⋅ ⋅ ) + a < λ1 < b. n − 1 and 1n ≥ b⎬. . namely n ⎧ ⎫ f11 x1 ⎫ ⎧ f ⎪ ⎪ ≥ b⎬ T1′ = ⎨x1 ∈ . For each n. . λ n ≤ a + ⋅ ⋅ ⋅ 2 ) ] (21) ≤a +⋅⋅⋅ < b. In order to ﬁnd workable values of a and b. λ n ≥ b + ⋅ ⋅ ⋅ 2 ≥b +⋅⋅⋅ < b. . λ n ≥ b + ⋅ ⋅ ⋅ ) ) ] (20) n−1 ) and 1 − β = P accepting H when H is false = P λ1 ≤ a + a < λ1 < b. at least from theoretical point of view. . n−1 ) Relations (20) and (21) allow us to determine theoretically the cut-off points a and b when α and β are given. j = 1. ⎧ ′ ⎪ T n = ⎨ x1 . regardless of the true underlying density. . n − 1 and 1n ≤ a ⎬. . x n .

1). b n=1 ∫T ′′ (25) On the other hand. n=1 n ∞ Relation (25) becomes then α ≤ β b. (22). and in a very similar way (see also Exercise 14. we clearly have Pi N = n = ∫ fin + ∫ fin . 1. Also. one has α = ∑ ∫ f0n . (28) 1−β β (29) . n =1 Tn ′′ ∞ But on T ″. 1. From (20).14. 1 = ∑ Pi N = n = ∑ ∫ fin + ∑ ∫ fin . In the remainder of this section. for simplicity.2.1 Some Basic Theorems ofProbability Ratio Test 14. i = 0. Furthermore. the arguments will be carried out for the case that the Xj’s are continuous. T′ n T ′′ n ( ) and by Proposition 1. b≤ . let α ′ .2 Sequential Sequential Sampling 391 In other words. n=1 n=1 T′ n n=1 T ′′ n ∞ ( ) ∞ ∞ (26) From (21). the discrete case being treated in the same way by replacing integrals by summation signs. so that f0n ≤ (1/b)f1n. so that n=1 T′ n n=1 T ′′ n ∞ ∞ ∑ ∫T′′ f1n = β. (22) and (23). the differentials in the integrals will not be indicated. 1−α α Relation (29) provides us with a lower bound and an upper bound for the actual cut-off points a and b. (24) and (26) (with i = 1). while T ″ is the set of points in n for which n the SPRT terminates with n observations and rejects H. we also obtain (27) 1−α ≥ 1− β From (27) and (28) it follows then that ( ) a. f1n/fon ≥ b. i = 0. and b′ = 1−α α so that 0 < a′ < b′ by the assumption α < β < 1 . T ′n is the set of points in n for which the SPRT terminates with n observations and accepts H. Now set α≥ ( 1− β β . a′ = ) (30) and suppose that the SPRT is carried out by employing the cut-off points a′ and b′ given by (30) rather than the original ones a and b. respectively. Therefore n α = ∑ ∫ f0 n ≤ n=1 T ′′ n ∞ 1 ∞ ∑ n f1n. we have 1 − β = ∑ ∫ f1n = 1 − ∑ ∫ f1n .

0506. a and b by α′. β ′. we have α ′ < 0. for α = 0. It can be argued that this does not happen. Then replacing α. PROPOSITION 2 ) (33) Summarizing the main points of our derivations. From (33) it follows that α ′ > α and 1 − β ′ > 1 − β cannot happen simultaneously. in (29) and also taking into consideration (30). we have the following result. (1 − β ′) − α + αβ ′ ≤ (1 − β ) − α ′ + α ′β ( ) ( and by adding them up.01. respectively. Then relation (32) provides upper bounds for α ′ and 1 − β ′ and inequality (33) shows that their sum α ′ + (1 − β′) is always bounded above by α + (1 − β).2. α′ + 1 − β′ ≤ α + 1 − β .392 14 Sequential Procedures and 1 − β ′ be the two types of errors associated with a′ and b′. a′ and b′. 1−α (32) (1 − α )(1 − β ′) ≤ (1 − β )(1 − α ′) or and α ′β ≤ αβ ′. we obtain 1 − β′ 1−β ≤ a′ = 1 −α′ 1−α and hence 1 − β′ ≤ and β β′ = b′ ≤ α α′ α α β′ ≤ . As a consequence. For testing H against A by means of the SPRT with prescribed error probabilities α and 1 − β such that α < β < 1. respectively. α′ ≤ From (31) we also have α β and 1 − β ′ ≤ 1−β . respectively. β. and then it follows from (32) that α ′ and 1 − β ′ lie close to α and 1 − β. β β (31) 1−β 1−β 1 −α′ ≤ 1−α 1−α ( ) and α ′ ≤ That is. Furthermore. say. . we would be led to taking a much larger number of observations than would actually be needed to obtain α and β. 0. the cut-off points a and b are determined by (20) and (21).05.01 and 1 − β = 0. the typical values of α and 1 − β are such as 0. REMARK 3 Exercise 14.1 Derive inequality (28) by using arguments similar to the ones employed in establishing relation (27). The only problem which may arise is that. because a′ and b′ are used instead of a and b. and − αβ ′ ≤ −α ′β . Relation (30) provides approximate cut-off points a′ and b′ with corresponding error probabilities α ′ and 1 − β ′. the resulting α ′ and 1 − β ′ are too small compared to α and 1 − β.1.05 and 0.0106 and 1 − β′ < 0. For example. So there is no serious problem as far as α ′ and 1 − β′ are concerned.

whose proof is omitted as being well beyond the scope of this book. we have the relationships below: (a < λ and j < b. . . . all partial sums ∑ Zi.1 Some Basicof the SPRT-Expected Sample Size 14. 1. it minimizes E0N and E1N) among all tests (sequential or not) with error probabilities bounded above by α and 1 − β and for which the expected sample size is ﬁnite under both H and A. Thus formula (34) provides the expected sample size of the SPRT under both H and A. j = 1. . . we are led i to assume as an approximation that SN takes on the values A and B with respective probabilities j i=1 (λ 1 ≤ a or λ1 ≥ b = Z1 ≤ A or Z1 ≥ B .3 Optimality of the SPRT-Expected Sample Size An optimal property of the SPRT is stated in the following theorem. . The remaining part of this section is devoted to calculating the expected sample size of the SPRT with given error probabilities. . This suggests that we should try to ﬁnd an approximate value to Ei N.3 Optimality Theorems of Sequential Sampling 393 14. . . as follows. the SPRT with error probabilities α and 1 − β minimizes the expected sample size under both H and A (that is. but the actual calculations are tedious. and also ﬁnding approximations to the expected sample size. n= 2 ( ) ∞ ( λ n ≤ a or λ n ≥ b . n − 1 lie between A and B and it is only the ∑ n Zi which is either ≤A or ≥B. n ≥ 2 ⎠ (35) From the right-hand side of (35). . P0 S N ≥ B = α P S N ≤ A = 1 − β . 1 1 ( ) ( ) ( ) ( ) . ⎝ i =1 ∑ Zi ≤ A or i =1 ⎞ ≥ B⎟ . . Then we clearly have Ei N = ∑ nPi N = n = 1Pi N = 1 + ∑ nPi N = n n=1 n= 2 ∞ ( ) ( ) ∞ ( ) ) (34) = Pi λ1 ≤ a or λ1 ≥ b + ∑ nPi a < λ j < b. . and let N be the associated stopping time. . j = 1. ) ( ) (36) Pi S N ≤ A ( ) and Pi S N ≥ B . . We would then expect that ∑ n= 1Zi would not be i too far away from either A or B. j = 1. P S N ≥ B = β. n − 1. . ( ) i = 0. So consider the SPRT with error probabilities α and 1 − β. λn ≤ a or λ ≥ b j n ) ∑Z i =1 n i ⎛ = ⎜ A < ∑ Zi < B. n − 1. j = 1. THEOREM 3 For testing H against A. By setting A = log a and B = log b. i = 0. But and P0 S N ≤ A = 1 − α . by letting SN = ∑ N= 1Zi. . and this is i=1 due to the nth observation Zn. n − 1. 1.14. Accordingly.

f. f(·. Find the expected sample sizes under both H and A and compare them with the ﬁxed sample size of the MP test for testing H against A with the same α and 1 − β as above. Hence.’s X1. Theorem 1 gives EiSN = (EiN)(EiZ1). relation (38) provides approximations to EiN. Thus in the present case f0 = f(·. If furthermore Ei|Z1| < ∞ and EiZ1 ≠ 0. 1. the problem is that of testing H : θ = θ0 against A : θ = θ1 by means of the SPRT with error probabilities α and 1 − β. 14.05. . θ0) and f1 = f(·. if also Ei Z1 ≠ 0. then EiN = (EiSN)/(EiZ1). since (30) was derived under this additional (but entirely reasonable) condition. .3. i = 0.d.v.394 14 Sequential Procedures Therefore we obtain E0 S N ≈ 1 − α A + αB and E1 S N ≈ 1 − β A + βB. Exercises 14.d. and for θ0. θ1 ∈ Ω with θ0 < θ1.4 Some Examples This chapter is closed with two examples. be independent r. θ ∈ Ω = (0. θ). this becomes E0 N ≈ (1 − α )A + αB . that is. X2. α (39) In utilizing (39). E0 Z1 E1 N ≈ (1 − β )A + βB . 1 is given by (34). i = 0. the expected sample size EiN. in order to be able to calculate the approximations given by (38).3. are i. with p.2 Discuss the same questions as in the previous exercise if the Xj’s are independently distributed as Negative Exponential with parameter θ ∈ Ω = (0. i = 0. 1 − β = 0. By virtue of (37). . 1. 1. by assuming that Ei |Z1| < ∞. REMARK 4 A ≈ log a ′ = log 1−β 1−α and B ≈ log b′ = β . In both. ( ) ( ) (37) On the other hand.1 Let X1. it is necessary to replace A and B by their approximate values taken from (30).05 with α = 0. i = 0. θ1). . E1Z1 (38) Thus we have the following result.i. θ ∈ Ω ⊆ . Actually. PROPOSITION 3 In the SPRT with error probabilities α and 1 − β. Use the SPRT for testing the hypothesis H : θ = 0. ∞).v. we also assume that α < β < 1. .03 against the alternative A : θ = 0. the r.’s distributed as P(θ). X2. 14.1. . . ∞).

calculate a′. 0. 99 3 4 At this point. θ (1 − θ ) θ1 1 − θ 0 0 1 1 0 (40) Next.95 = ≈ 0.01 and 1 − β = 0. relation (39) gives 5 = −1. is to set up the formal SPRT and for selected numerical values of α and 1 − β.1 Some Basic Theorems of Sequential Examples 14. a′ = 0. (41) For a numerical application. X2. Z1 = log ( ) = X log θ (1 − θ ) + log 1 − θ 1 −θ f (X ) θ (1 − θ ) f1 X1 0 1 0 1 1 0 1 . EXAMPLE 1 Let X1.0505. In the present case.05. 1.14. estimate EiN. i = 0.99 Next. and ﬁnally compare the estimated EiN.0505 and b′ = = 95. 1. ( ) Then the test statistic λn is given by ⎛θ ⎞ λn = ⎜ 1 ⎟ ⎝ θ2 ⎠ ∑j X j ⎛ 1 − θ1 ⎞ ⎜ ⎟ ⎝ 1 − θ0 ⎠ n −∑ j X j and we continue sampling as long as ⎛ 1 − θ1 ⎞ ⎟ ⎜ A − n log 1 − θ0 ⎠ ⎝ log ( ) θ (1 − θ ) θ1 1 − θ 0 0 1 n ⎛ 1 − θ1 ⎞ < ∑ X j < ⎜ B − n log ⎟ 1 − θ0 ⎠ ⎝ j =1 log ( ). the corresponding error probabilities α ′ and 1 − β′ are bounded as follows according to (32): a′ ≤ 0. Then 8 8 A ≈ log (42) .01 0.95 0. θ ∈ Ω = 0. be i. θ = θ x 1 − θ ( ) ( ) 1− x .05 0. so that EiZ1 = θ i log θ1 1 − θ0 0 ( ) + log 1 − θ 1 −θ θ (1 − θ ) 1 1 0 .f.d.i. .29667 and B ≈ log 95 = 1.’s with p.01 0. . let us suppose that θ0 = – and θ1 = –.05 0. i = 0. 1. f x.d.0105 and 1 − β ′ ≤ ≈ 0. respectively.97772. 1 with the size of the ﬁxed sample size test with the same error probabilities. i = 0.4 Some Sampling 395 What we explicitly do.01 For the cut-off points a′ and b′. Then the cut-off points a and b are approximately equal to a′ and b′. .05 ≈ 0. b′. take α = 0. 1 − 0.99 0.v. upper bounds for α ′ and 1 − β′. where a′ and b′ are given by (30). 1 . r. x = 0.

13716 and E1Z1 = 0.53 and E1 N ≈ 3.f. EXAMPLE 2 Let X1. we ﬁnd that for the given α = 0.’s with p. (43) Finally.d. be i.09691. .5 and E1Z1 = 0. X2. 2 ( ) 1 2 2 θ 1 − θ 0 .d. 1).01 and β = 0. Thus both E0N and E1N compare very favorably with it. .05. . we have EiZ1 = θ i θ1 − θ0 − ( ) ( ) E0 Z1 = −0. by means of (42) and (43). n has to be equal to 244. relation (38) gives E0 N ≈ 92. Now the ﬁxed sample size MP test is given by (13) in Chapter 13. Thus relation (38) gives E0 N ≈ 2. r.396 14 Sequential Procedures log θ1 1 − θ0 0 ( ) = log 5 = 0. we have E0 Z1 = −0.4 On the other hand.63.95. (45) 2 By using the same values of α and 1 − β as in the previous example. 1 − θ0 5 so that by means of (41). i = 0 . we have the same A and B as before.22185 3 θ (1 − θ ) 1 and log 1 − θ1 4 = log = −0. that of N(θ. From this we ﬁnd that n ≈ 15. Using the normal approximation.i. Taking θ0 = 0 and θ1 = 1. 1.84. the MP test for testing H against A based on a ﬁxed sample size n is given by (9) in Chapter 13.v. Again both E0N and E1N compare very favorably with the ﬁxed value of n which provides the same protection. ) (44) Next.5.5 and E1 N ≈ 129. .014015. Then n ⎡ ⎤ 1 2 λn = exp ⎢ θ1 − θ 0 ∑ X j − n θ 1 − θ 2 ⎥ 0 2 ⎢ ⎥ j =1 ⎣ ⎦ and we continue sampling as long as ( ) ( ) ⎡ n 2 2 ⎤ ⎢A + 2 θ 1 − θ 0 ⎥ ⎣ ⎦ ( ) (θ 1 n ⎡ ⎤ n − θ 0 < ∑ X j < ⎢B + θ 2 − θ 2 ⎥ 1 0 2 j =1 ⎣ ⎦ ) ( ) (θ 1 − θ0 . Z1 = log so that ( )= θ ( f (X ) f1 X1 0 1 1 − θ0 X1 − ) 1 2 2 θ 1 −θ0 .

. Xn)] is a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α (0 < α < 1) if Pθ L X1 . we return to the estimation problem. f(·. X n ≥ 1 − α [( ) ( )] for all θ ∈ Ω. Xn) ≤ U(X1. .v. . Xn) and U(X1. . First. . . . . . . Xn)] is a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α. Xn) is an upper and a lower conﬁdence limit for θ. with conﬁdence coefﬁcient 1 − α. X n ≤ θ < ∞ ≥ 1 − α . . U(X1. we consider the case that θ is a real-valued parameter and proceed to deﬁne what is meant by a random interval and a conﬁdence interval. we considered the problem of estimating g(θ) by a statistic (based on θ the X’s) having certain optimality properties. if the probability is at least 1 − α that the 397 . We say that the r. Xn) and L(X1. . Xn). In the present chapter.1 Conﬁdence Intervals Let X1. . .2 Some Examples 397 Chapter 15 Conﬁdence Regions—Tolerance Intervals 15. . Pθ −∞ < θ ≤ U X1 . U(X1. . . θ That is. interval [L(X1. [ ( )] [( ) ] (2) Thus the r. . . .f.d. . . . . . In Chapter 12. Xn). Xn) be two statistics such that L(X1. . if for all θ ∈ Ω. . .d. . Let L(X1. r. . Xn).v. . . Xn be i. .i. . X n ≤ θ ≤ U X1 . . . . .’s with p. θ) θ ∈ Ω ⊆ r. interval [L(X1. . . . . . we considered the problem of point estimation of a real-valued function of θ. X n ≥ 1 − α and Pθ L X1 . (1) Also we say that U(X1. . where at least one of the end points is an r. but in a different context. . . . . . . g(θ). . . . respectively. DEFINITION 1 DEFINITION 2 A random interval is a ﬁnite or inﬁnite interval. . . . . . .15. .

. . In all of the examples in the present section. . . so that we obtain N intervals. . and whose distribution. . . . j = 1. at least (1 − α)N of the N intervals will cover the true parameter θ. . . . Xn. . . Tn(θ) = T(X1. 15. . . . Xn). Xn) is a lower and an upper 1 conﬁdence limit for θ. . . A similar interpretation holds true for an upper and a lower conﬁdence limit of θ. . each with conﬁdence coefﬁcient 1 − – α. . . The length l(X1.2 Some Examples We now proceed with the discussion of certain concrete cases. . . . Then Ln = L(X1. . . . . The interpretation of this statement is as follows: Suppose that the r. The examples which follow illustrate the point. . X n ≤ θ ≤ U X1 . . . xn). X n + 1. Xn)] covers the parameter θ no matter what θ ∈ Ω is. U(x1. if L(X1. Xn)] is a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α. . it should be pointed out that a general procedure for constructing a conﬁdence interval is as follows: We start out with an r. under Pθ . . . This will be done explicitly in a number of interesting examples. . U(X1. . . . .1.1 Establish the relation claimed in Remark 1 above. . . respectively. Xn) = U(X1. Then. interval [L(X1. . . Xn). REMARK 1 By relations (1) and (2) and the fact that Pθ θ ≥ L X1 . Xn) and Un = U(X1. . n. . . . . Xn) − L(X1. . . . . . . . In such a case. the problem is that of constructing a conﬁ- . Xn) of this conﬁdence interval is l = l(X1. Now it is quite possible that there exist more than one conﬁdence interval for θ with the same conﬁdence coefﬁcient 1 − α. . if it exists. Xn) and U(X1. . experiment under consideration is carried out independently n times. . Xn) and the expected length is Eθ l. . . Suppose now that this process is repeated independently N times. it is obvious that we would be interested in ﬁnding the shortest conﬁdence interval within a certain class of conﬁdence intervals. . . . [( ( ) )] [ ( ( )] )] it follows that. .v. . 2 then [L(X1. and if xj is the observed value of Xj. . . . . . θ) which depends on θ and on the X’s only through a sufﬁcient statistic of θ. is completely determined. X n [ = Pθ L X1 . . Xn) are some rather simple functions of Tn(θ) which are chosen in an obvious manner. . . Exercise 15. U(X1. construct the interval [L(x1. X n + Pθ θ ≤ U X1 . At this point. . . . .398 15 Conﬁdence Regions—Tolerance Intervals r. . . xn)]. as N gets larger and larger. .

.2. so that σ 2 is the parameter. 2 n j =1 σ ( ) ¯ Then T n(σ 2) depends on the X’s only through the sufﬁcient statistic S 2 of σ 2 n and its distribution is χ 2 for all σ 2.’s from N(μ.v. X + zα 2 ⎢ X − zα 2 ⎥. suppose that σ is known. we have which is equivalent to ⎡ Pμ ⎢a ≤ ⎢ ⎣ n X −μ ( ) ≤ b⎤ = 1 − α ⎥ ⎥ ⎦ ⎛ σ σ ⎞ Pμ ⎜ X − b ≤ μ ≤ X −a ⎟ = 1 − α. ⎝ n n⎠ Therefore ⎡ σ σ ⎤ . σ 2). First. Next. where a and b satisfy (3). where c is the upper α /2 quantile of the N(0. . assume that μ is known. r. we obtain . Then Tn(μ) ¯ of μ and its distribudepends on the X’s only through the sufﬁcient statistic X tion is N(0. EXAMPLE 1 Let X1. Its length is equal to (b − a)σ /√n. . the shortest one is that for which b − a is smallest. among all conﬁdence intervals with conﬁdence coefﬁcient 1 − α which are of the form (4). ⎢ ⎥ n n⎦ ⎣ Next. Tn(μ) = √n(X − μ)/σ. 1 ≤ b = 1 − α . From this it follows that. Tn σ 2 = ( ) 2 2 1 n nSn 2 . 1) distribution which we denote by zα /2. so ¯ that μ is the parameter. X −a (4) ⎢X − b ⎥ ⎢ ⎥ n n⎦ ⎣ is a conﬁdence interval for μ with conﬁdence coefﬁcient 1 − α. n Now determine two numbers a and b (0 < a < b) such that 2 P a ≤ χn ≤ b = 1 − α. [ ( ) ] σ (3) From (3).v. ( ) (6) From (6). and consider the r.15. where Sn = ∑ X j − μ . Therefore the shortest conﬁdence interval for μ with conﬁdence coefﬁcient 1 − α (and which is of the form (4)) is given by ⎡ σ σ ⎤ (5) . It can be seen (see also Exercise 15. determine two numbers a and b (a < b) such that P a ≤ N 0.v. 1) for all μ.i.d.2 Some Examples 399 dence interval (and also the shortest conﬁdence interval within a certain class) for θ with conﬁdence coefﬁcient 1 − α. and consider the r. .1) that this happens if b = c (> 0) and a = −c. Xn be i.

it clearly follows that that a for which l is shortest is given by n dl/da = 0 which is equivalent to db b 2 = . we work as follows.999). a ⎠ ⎝ b 2 Therefore 2 ⎡ nSn . Differentiating it with respect to a.995.” Journal of the American Statistical Association. it is obvious that a and b are not independent of each other but the one is a function of the other.90. From (6).f. 54. Tate and G. b (9) For the numerical solution of (9). W. pp. da n or db gn a = . 0. although there are inﬁnite pairs of numbers a and b satisfying (6). it follows that a and b are determined by a 2 gn a = b2 gn b () () and ∫a g n (t )dt = 1 − α .f. da gn b () () Thus (8) becomes a2gn(a) = b2gn(b). The expected length is equal to (1/a − 1/b)nσ 2. ⎢ ⎢ ⎣ b 2 nSn ⎤ ⎥ a ⎦ ⎥ (7) is a conﬁdence interval for σ 2 with conﬁdence coefﬁcient 1 − α and its length is equal to (1/a − 1/b)nS 2 . F. tables are required. this is not the best choice because then the n corresponding interval (7) is not the shortest one. the shortest (both in actual and expected length) conﬁdence interval for σ 2 with conﬁdence coefﬁcient 1 − α (and which is of the form (7)) is given by . one obtains gn b ( ) db − g (a) = 0. relation (6) becomes n Gn(b) − Gn(a) = 1 − α.99. 674–682) for n = 2(1)29 and 1 − α = 0.95. Such tables are available (see Table 678 in R. of the χ 2 . n Now. Klett. 0.d. Vol. 0.400 15 Conﬁdence Regions—Tolerance Intervals ⎛ ⎞ nS 2 Pσ ⎜ a ≤ 2n ≤ b⎟ = 1 − α σ ⎝ ⎠ 2 which is equivalent to ⎛ nS 2 nS 2 ⎞ Pσ ⎜ n ≤ σ 2 ≤ n ⎟ = 1 − α . letting Gn and gn be the d. By means of this result and (6). However. For the determination of the shortest conﬁdence interval. 0. and the p. Since the length of the conﬁdence interval in (7) is l = (1/a − 1/b)nS 2 . 1959. To summarize then. da a 2 (8) Now. So let b = b(a). in practice they are often chosen by assigning mass α /2 to each one of the tails of the χ 2 distribution. “Optimum conﬁdence intervals for the variance of a normal distribution.

n. . ⎝ j =1 ⎠ j =1 Therefore a conﬁdence interval with conﬁdence coefﬁcient 1 − α is given by . As a numerical application.96. for the equal-tails conﬁdence interval given by (7). call it r.d. r. since 2r φ2 X Therefore j β (t ) = φ Xj ⎛ 2t ⎞ 1 ⎜ ⎟= ⎝β⎠ 1 − 2it ( ) 2r 2 (see Chapter 6).2636 ⎥ ⎣ ⎦ and the ratio of their lengths is approximately 1. Xn be i.1. 2Xj /β is χ 2 .7051 14. Furthermore. Tn β = () 2 n ∑Xj β j =1 is χ 2 for all β > 0.i. .2 2 ⎡ nSn . Chapter 11).07.646.646 ⎣ 2 25S25 ⎤ ⎥.v. ⎢ ⎥ ⎢ 45. Now determine a and b (0 < a < b) such that 2rn 2 P a ≤ χ 2 rn ≤ b = 1 − α . σ = 1.392.15. the shortest conﬁdence interval is equal to 2 2 ⎡ 25S25 25S25 ⎤ . ( ) (10) From (10). EXAMPLE 2 Let X1. so that the equal-tails conﬁdence interval itself is given by 2 ⎡ 25S25 . . Next. ⎢ ⎢ b ⎣ 2 nSn ⎤ ⎥. . and 1 − α = 0.120 ⎥ ⎦ On the other hand. X + 0. . Then ∑ n= 1Xj is a sufﬁcient statistic for j β (see Exercise 11.2(iii). 13. so that (5) gives [X − 0. ⎢ ⎢ 40. Then ¯ ¯ zα/2 = 1.392]. we obtain ⎛ ⎞ 2 n Pβ ⎜ a ≤ ∑ X j ≤ b⎟ = 1 − α β j =1 ⎝ ⎠ which is equivalent to n ⎛ n ⎞ Pβ ⎜ 2 ∑ X j b ≤ β ≤ 2 ∑ X j a⎟ = 1 − α . for each j = 1.’s from the Gamma distribution with parameter β and α a known positive integer.95. .v. a ⎥ ⎦ Some Examples 401 where a and b are determined by (9). .120 and b = 40. we have a = 13. the r. let n = 25. .

’s from the Beta distribution with β = 1 and α = θ unknown. It follows then that the shortest (both in actual and expected length) conﬁdence interval with conﬁdence coefﬁcient 1 − α (which is of the form (11)) is given by (11) with a and b determined by a 2 g 2 rn a = b 2 g 2 rn b () () and ∫a g 2 rn (t )dt = 1 − α . is not the shortest among those of the form (11). a = 16. ⎝ a b⎠ As in the second part of Example 1.461 ⎢ ⎣ 2 ∑ j =1 X j ⎤ ⎥. .308 ⎥ ⎥ ⎦ 7 so that the ratio of their length is approximately equal to 1.f. Xn be i. we have.95. r. or −∑ jn= 1 log Xj is a sufﬁcient statistic for θ. Yj = −2θ log Xj. . of a χ 2. .) Consider the r. one has to minimize l subject to (10). Then ∏ n= 1Xj. This shows that 2 2 Tn θ = −2θ ∑ log X j = ∑ Yj j =1 j =1 () n n is distributed as χ 2 . ⎥. EXAMPLE 3 Let X1. respectively. which is the p.2(iv) in Chapter 11. it follows that the equal-tails conﬁdence interval.d. ( ) (12) .1.3675 ⎢ ⎣ The equal-tails conﬁdence interval is 2 ∑ j =1 X j ⎤ ⎥. is – exp(−yj /2). Thus the corresponding shortest conﬁdence interval is then ⎡2 7 X ⎢ ∑ j =1 j . But this is the same problem as the one we solved in the second part of Example 1.f.v. yj > 0. ⎢ 49. .d.402 15 Conﬁdence Regions—Tolerance Intervals n ⎡2 n X 2 ∑ j =1 X j ⎤ ⎢ ∑ j =1 j . ⎢ ⎥ b a ⎢ ⎥ ⎦ ⎣ Its length and expected length are.d.5128 ⎥ ⎥ ⎦ 7 ⎡2 7 X ⎢ ∑ j =1 j . 15.v.5128 and b = 49. ⎢ 44. for n = 7.i. Now determine a and b (0 < a < b) such that 2n 2 P a ≤ χ 2n ≤ b = 1 − α . (11) ⎛ 1 1⎞ n l = 2⎜ − ⎟ ∑ X j . r = 2 and 1 − α = 0.3675.075. It is easily seen that 1 its p. (See Exercise j 11. by means of the tables cited in Example 1. 16. b For instance. In order to determine the shortest conﬁdence interval. ⎝ a b ⎠ j =1 ⎛ 1 1⎞ Eβ l = 2 βrn⎜ − ⎟ . which is customarily employed.

2 Some Examples 403 From (12). θ).f. ⎝ ⎠ j =1 j =1 Therefore a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α is given by ⎤ ⎡ a b ⎥.357 and b = 71. Chapter 10). Chapter 11) and its p.d. EXAMPLE 4 Let X1. a [ () ] b (14) . is easily seen to be given by hn t = nt n −1 . .v. Consider the r.’s from U(0. Then Yn = X(n) is a sufﬁcient statistic for θ (see Example 7. b However. () Then deﬁne a and b with 0 ≤ a < b ≤ 1 and such that Pθ a ≤ Tn θ ≤ b = ∫ nt n −1dt = bn − an = 1 − α .d.d. . . Tn(θ) = Yn/θ. the equal-tails conﬁdence interval for θ is given by (13) with a = 32.v. for n = 25 and 1 − α = 0. Its p. we obtain n ⎛ ⎞ Pθ ⎜ a ≤ −2θ ∑ log X j ≤ b⎟ = 1 − α ⎝ ⎠ j =1 which is equivalent to n n ⎛ ⎞ Pθ ⎜ a −2 ∑ log X j ≤ θ ≤ b −2 ∑ log X j ⎟ = 1 − α .95. gn is given by gn yn = ( ) n n −1 yn . Considering dl/da = 0 in conjunction with (12) in the same way as it was done in Example 2. 0 ≤ yn ≤ θ θn (by Example 3.f. r. Xn be i. For example.i. − n ⎢ 2 n log X 2 ∑ j =1 log X j ⎥ j ⎥ ⎢ ⎦ ⎣ ∑ j =1 Its length is equal to l= a−b 2 ∑ j =1 log X j n (13) .420. no tables which would facilitate this solution are available. ⎢− . we have that the shortest (both in actual and expected length) conﬁdence interval (which is of the form (13)) is found by numerically solving the equations g 2n a = g 2n b () () and ∫a g 2n (t )dt = 1 − α . .15. 0 ≤ t ≤ 1.

) 15. . Guenther on “Shortest conﬁdence intervals” in The American Statistican. by means of (14). and the exercises just mentioned. we get Pθ(a ≤ Yn/θ ≤ b) = 1 − α which is equivalent to Pθ[X(n)/b ≤ θ ≤ X(n)/a] = 1 − α. 1) distribution and let a and b with a < b be such that Φ(b) − Φ(a) = γ (0 < γ < 1). Show that b − a is minimum if b = c (> 0) and a = −c.5–15.2. The inclusion of the discussions in relation to shortest conﬁdence intervals in the previous examples. l is decreasing as a function of b and its minimum is obtained for b = 1.098X(32)].2 Let X1. we have approximately [X(32). U is distributed as Gamma with parameters (n.7 at the end of this section are treated along the same lines with the examples already discussed and provide additional interesting cases. . 2n .v. Therefore a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α is given by ⎡ X (n ) X (n ) ⎤ ⎥ ⎢ . 2U/θ is distributed as χ 2 .v. ( ) α 1 n ⎥.’s having the Negative Exponential distribution with parameter θ ∈ Ω = (0. θ) and that the r. Therefore the shortest (both in actual and expected length) conﬁdence interval with conﬁdence coefﬁcient 1 − α (which is the form (15)) is given by ⎡ X (n ) ⎤ ⎢X n . so that dl an +1 − bn +1 = X (n ) . Number 1. ⎢ ⎥ ⎣ ⎦ For example. of the N(0. in which case a = α 1/n.2. Vol.95. C. 23.1 Let Φ be the d. 1. db ⎝ a db b 2 ⎠ while by way of (14). ∞). From this. Xn be independent r. .404 15 Conﬁdence Regions—Tolerance Intervals From (14). has been motivated by a paper by W.v.f. (See also the discussion of the second part of Example 1. . 1969.2. we have (15) ⎛ 1 da 1 ⎞ dl = X (n ) ⎜ − 2 + ⎟. i ii) Show that the r. da/db = bn−1/an−1. a ⎥ ⎢ b ⎦ ⎣ and its length is l = (1/a − 1/b)X(n). Exercises 15.2. and set U = ∑ n= 1Xi. for n = 32 and 1 − α = 0. where shortest conﬁdence intervals exist. db b 2 an +1 Since this is less than 0 for all b. Exercises 15.

. θ = e I (θ . x > 0). 15. (Hint: Use the parametrization f (x. ⎢Y1 − 2n ⎢ ⎥ ⎣ ⎦ where χ 2. 15.f. 1) such that P (c ≤ R ≤ 1) = 1 − α. . Y1 ⎥. x > 0). Chapter 11. θ) = 1/θ e−x/θ. X has the Negative Exponential distribution with parameter θ ∈ Ω = (0. Xn be i. .∞ ) x .2(i)) 2U/θ is distributed as χ 2 . θ) with conﬁdence coefﬁcient 1 − α. Use this fact 2n and part (i) of this exercise to construct a conﬁdence interval for R(x.v. Tn(θ) = 2n(Y1 − θ) is distributed as χ 2. where c is a root of the equation c n −1 n − n − 1 c = α iii) Show that the expected length of the shortest conﬁdence interval in Example 4 is shorter than that of the conﬁdence interval in (ii) above. 2n j j iii) A conﬁdence interval for θ. Y1 − (a/2n)]. . Then show that: iii) The p.d. .v. then (by Exercise 15. x > 0). 2Y/a].2. given in Exercise 11. Then: iii) Find the distribution of R. Tn(θ) = 2Y/θ is distributed as χ 2 . . 2 iii) A conﬁdence interval for θ. Xn be independent r.d. θ) = 1/θ e−x/θ.d. iv) The shortest conﬁdence interval of the form given in (iii) is provided by 2 ⎡ ⎤ χ 2 . 15. (Hint: Take cε(0.v.3 ii) If the r.2. given by − ( x −θ ) f x. θ ∈Ω = [ ( )] ( ) () and set Y1 = X(1).2 Some Exercises Examples 405 ii) Use part (i) to construct a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α.i.2. r.v.’s having the Weibull p. ii) If X1.α .2.4. where Y = ∑ n= 1X γ.2. .6 Let X1. . (Hint: Use the parametrization f (x. with conﬁdence coefﬁcient 1 − α is of the form [2Y/b. based on R.v.15. . based on Tn(θ). 2 2 15. (Hint: Use the parametrization f (x.d.α is the upper αth quantile of the χ 2 distribution. θ) = Pθ(X > x) (x > 0) is equal to e−x/θ. .5 Let X1. g of Y1 is given by g(y) = ne−n(y−θ )I(θ.) θ θ iii) Show that a conﬁdence interval for θ. R/c]. θ) = 1/θ e−x/θ.2. .f. based on Tn(θ). . with conﬁdence coefﬁcient 1 − α is of the form [Y1 − (b/2n). .4 Refer to Example 4 and set R = X(n) − X(1). show that the reliability R(x. Xn is a random sample from the distribution in part (i) and U = ∑ in= 1Xi.∞)(y) iii) The r. ∞). Then show that: iii) The r. with conﬁdence coefﬁcient 1 − α is of the form [R.f.’s with p.

.406 15 Conﬁdence Regions—Tolerance Intervals iii) The shortest conﬁdence interval of the form given in (ii) is taken for a and b satisfying the equations ∫a g 2n (t )dt = 1 − α b and a 2 g 2n a = b 2 g 2n b . σ 2). r.d.8. .2. based on Tm.8 Consider the independent random samples X1.d. . ∞ .’s with p.f. 2θ θ ∈ Ω = 0. 2n 15. Xn be i. Sn ⎥ ⎢ Sn ⎣ ⎦ . based on T m. σ 2) 1 and Y1. Yn from N(μ2. 2 and let the r. . 15.9 Refer to Exercise 15. μ2 are unknown.v. with conﬁdence coefﬁcient 1 − α is given by ⎡ σ2 σ2 σ2 σ2 ⎤ ⎢ X m − Yn − b 1 + 2 . σ2 are known and μ1.2. ⎛ σ ⎞ σ 2 S2 Tm. where σ1. σ2 are unknown. θ = ( ) 1 x θ e .v. . Tn(θ) = 2Y is distributed as χ 2 . .v. m n m n ⎥ ⎢ ⎦ ⎣ where a and b are such that Φ(b) − Φ(a) = 1 − α. ii) A conﬁdence interval for μ 1 − μ 2. ( ) Then show that: ii) The r.f. .d. 2n j θ ii) and (iii) as in Exercise 15.2.2. Tm.7 Let X1. but now suppose that μ 1. with 1 2 conﬁdence coefﬁcient 1 − α is given by 2 2 ⎡ Sm Sm ⎤ ⎢ a 2 . μ 2 are known and σ1. of the χ 2 distribution.i. . b 2 ⎥. () () where g2n is the p. .n(μ1 − μ2). . Xm from N(μ1.n(σ1/σ2).2. X m − Yn − a 1 + 2 ⎥. Consider the r. . 2 15. n μ1 − μ 2 = Then show that: ( ) (X m − Yn − μ1 − μ 2 2 1 (σ ) ( m + σ ) ( 2 2 n ) ).n(μ 1 − μ 2) be deﬁned by Tm. ( ) ( ) ii) The shortest conﬁdence interval of the aforementioned form is provided by the last expression above with − a = b = zα . where Y = ∑n= 1|Xj|. given by f x.v.6. n ⎜ 1 ⎟ = 12 n 2 ⎝ σ 2 ⎠ σ 2 Sm ¯ and show that a conﬁdence interval for σ 2/σ 2. .

EXAMPLE 5 Refer to Example 1 and suppose that both μ and σ are unknown. the equaltails conﬁdence interval is provided by the last expression above with a = F′ α /2 and b = Fn. Yn be independent random samples from the Negative Exponential distributions with parameters θ1 and θ2.3 Conﬁdence Intervals in the Presence of Nuisance Parameters So far we have been concerned with the problem of constructing a conﬁdence interval for a real-valued parameter when no other parameters are present.15. some other (nuisance) parameters do appear in the p.f. First. (Hint: Employ the parametrization used in Exercise 15.α 2 n −1 ⎥. .2m.v.m.’s 2U/θ1 and 2V/θ2 are distributed as χ 2 and χ 2 . so 2m 2n that the r.10 Let X1. we obtain a conﬁdence interval of the form ⎡ Sn −1 S ⎤ . an argument similar to the one employed in Example 1 implies that the shortest (both in actual and expected length) conﬁdence interval of the form (16) is given by ⎡ Sn −1 S ⎤ . .2.2(i)) the indei= i pendent r. .v. V = ∑ n= 1Yj. of Fn. . ⎢Xn − b ⎢ ⎥ n n⎦ ⎣ (16) Furthermore. in addition to the real-valued parameter of main interest.α 2 ⎢ ⎥ n n⎦ ⎣ (17) . Then (by Exercise 15. The examples below illustrate the relevant procedure.m. Tn(μ) of Example 1 and replace σ 2 by its usual estimator 2 Sn −1 = 2 1 n ∑ X j − Xn . respectively.2. respectively. 15. consider the r. S 2 )′ of (μ. and working as in Example 1.α /2 and Fn. and set U = ∑ m 1Xi. . X n + t n −1.2. σ 2)′. respectively. X n − a n −1 ⎥.v. . Basing the conﬁdence only through the sufﬁcient statistic (X n−1 interval in question on Tn(μ). In particular. α /2 quantiles.2 Some Examples Nuisance Parameters 407 where 0 < a < b are such that P(a ≤ Fn. However. In such cases.d. . For this purpose. Xm and Y1. ⎢ X n − t n −1.3 Conﬁdence Intervals in the Presence of15. under consideration. .) 15. we suppose that we are interested in constructing a conﬁdence interval for μ. T′(μ) = √n(X n − μ)/Sn−1 which depends on the X’s n ¯n.α /2 are the lower and the upper ′ n. we replace the nuisance parameters by appropriate estimators and then proceed as before. which is tn−1 distributed. in many interesting examples.m.m ≤ b) = 1 − α.m.α /2. where F n. Use this result in order to construct a 2 1 conﬁdence interval for θ1/θ2 with conﬁdence coefﬁcient 1 − α.m.2. 2θV 2θU is distributed as F2n. n − 1 j =1 ( ) ¯ Thus we obtain the new r.v.

as in the ﬁrst case of Example 1 (and also Example 5). . where all μ 1. so that the corresponding interval approximately is equal to [0. σ 2) and Y1. m+ n−2 ⎝ m n⎠ ⎥ ⎦ 2 m−1 ( ) . Proceeding as in the corresponding case of n−1 Example 1. Consider the r.775S 2 ]. σ1 and σ2 are unknown. for n = 25 and 1 − α = 0.4802. T n(σ 2) of Example 1 as follows: T n′ σ 2 = n−1 S ( ) ( σ) 2 n −1 . suppose that a conﬁdence interval for μ 1 − μ 2 is desired. . one has the following conﬁdence interval for σ 2: 2 2 ⎡ n − 1 Sn −1 n − 1 Sn −1 ⎤ ⎢ ⎥. . modify the r. b a ⎢ ⎥ ⎣ ⎦ and the shortest conﬁdence interval of this form is taken when a and b are numerical solutions of the equations ( ) ( ) a 2 gn −1 a = b 2 gn −1 b () () and ∫a gn −1 (t )dt = 1 − α . a = 13. ¯′ so that T n(σ 2) is χ 2 distributed. To this ¯ end. by means of the tables cited in Example 1.v. n(μ 1 − μ 2) is distributed as tm+n−2. a ⎢ m ⎢ ⎣ ( ) 2 (m − 1)S (m − 1)S 2 + n − 1 Sn −1 ⎛ 1 1⎞ ⎜ + ⎟.5227 and b = 44. Tm.41278S24]. ¯n + 0. .539S 2 .408 15 Conﬁdence Regions—Tolerance Intervals where tn−1. 24 24 EXAMPLE 6 Consider the independent r. we have to assume that σ1 = σ2 = σ. . a ) 2 2 ⎤ + n − 1 Sn −1 ⎛ 1 1⎞⎥ ⎜ + ⎟ ⎥.v.95.025 = 2. . For this purpose. b Thus with n and 1 − α as above. . .α /2 is the upper α /2 quantile of the tn−1 distribution.0.41278S24. samples X1. Then Tm. m+ n−2 ⎝ m n⎠ 2 m−1 ( ) (X m − Yn + t m + n − 2 .0639. the corresponding conﬁdence interval for μ is taken ¯ from (17) with t 24. 1. 1 Yn from N(μ 2. σ 2). say (unspeciﬁed). μ 2. 2 First. one has. For instance. Xm from N(μ 1.n(μ 1 − μ 2) is given by ⎡ ⎢X −Y −t n m+n −2. X Suppose now that we wish to construct a conﬁdence interval for σ 2. the shortest (both in actual and expected length) conﬁdence interval based on Tm. n μ1 − μ 2 = ( ) (X (m − 1)S m − Yn − μ1 − μ 2 2 + n − 1 S n−1 ⎛ 1 1 ⎞ ⎜ + ⎟ m+n−2 ⎝ m n⎠ 2 m−1 ( ) ( ) ) . Thus we have approximately [X n − 0. (18) . Thus.

Fm −1.025 = 3.15.m−1.v.2 Some Examples Nuisance Parameters 409 For instance. for m = 13. we have F13. X − Y + 0.0. we have t25. b 2 ⎥. so that the corresponding interval approximately is equal to ⎡ X − Y − 0.2388 2 ⎥.95./0.α /2 is read off the F-tables and the point F′−1.α 2 = 1 . The point Fn−1.1586 12S 2 + 13S 2 ⎤. ⎢ Sn −1 ⎣ 2 Sm −1 Fn −1. 2 ⎛ σ ⎞ σ 2 Sn Tm.α /2 and Fn−1.0.m −1. n and 1 − α.2388 and F12. S13 S13 ⎥ ⎢ ⎣ ⎦ .n −1.0595.m −1 ≤ b = 1 − α . ( ) Then Pσ or Pσ σ2 2 ⎛ Sm −1 σ 12 S2 ⎞ a 2 ≤ 2 ≤ b m −1 ⎟ = 1 − α .025 = 2. ⎥ ⎦ where F′−1. Now determine two numbers a and b with 0 < a < b and such that P a ≤ Fn −1.α 2 Sn −1 2 ⎤ ⎥. ⎜ 2 Sn −1 ⎠ ⎝ Sn −1 σ 2 1 σ2 2 ⎛ ⎞ σ 12 Sn −1 ⎜ a ≤ 2 2 ≤ b⎟ = 1 − α . the equal-tails conﬁdence interval is provided by 2 ⎡ Sm −1 ⎢ 2 Fn′−1. we consider the r.m−1. for the previous values of m.1586 12S 2 + 13S 2 . n ⎜ 1 ⎟ = 12 2 −1 ⎝ σ 2 ⎠ σ 2 S m−1 ( ) ( ) which is distributed as Fn−1.13.12. n = 14 and 1 − α = 0.025 = 3.m −1.m−1.m −1.m−1.m−1.1532.α /2 are the lower and the upper α /2-quantiles of n Fn−1.α 2 .m−1.3 Conﬁdence Intervals in the Presence of15. so that the corresponding interval approximately is equal to 2 2 ⎡ S12 S12 ⎤ ⎢0. Sn −1 ⎦ ⎢ ⎥ ⎣ Sn −1 In particular. 14 12 13 13 14 12 13 ⎥ ⎢ ⎣ 13 ⎦ 2 2 If our interest lies in constructing a conﬁdence interval for σ 1/σ 2. 3. σ 2 Sm −1 ⎝ ⎠ 1 Therefore a conﬁdence interval for σ 2/σ 2 is given by 1 2 2 2 ⎡ Sm −1 Sm −1 ⎤ ⎢ a 2 .α 2 Thus.3171 2 .α /2 is n given by Fn′−1.

This will be illustrated by means of the following example. σ 2)′ indicated in Fig.v. σ 2).1. we have the conﬁdence region for (μ. 1) and χ 2 . ( ) ( ) ( ) ( ) ⎡ Pμ . In cases like this. . which are independently distributed as N(0. σ 2)′.’s n Xn − μ ( σ ) and (n − 1)S σ2 2 n −1 .3. n (n − 1)S b 2 n −1 ≤σ2 ≤ (n − 1)S a 2 n −1 ⎤ ⎥ = 1 − α. where this is not the case. b and c may be determined so that the resulting intervals are the shortest ones.410 15 Conﬁdence Regions—Tolerance Intervals Exercise 15. To this end. ⎥ ⎦ (19) For the observed values of the X’s.1 Let X1. .v.) Here the problem is that of constructing a conﬁdence region in 2 for (μ. ( ) From these relationships.v. ⎢ ⎥ σ σ2 ⎥ ⎢ ⎦ ⎣ ⎣ ⎦ Equivalently. respectively.’s employed for the construction of conﬁdence intervals had an exact and known distribution. . . The quantities a. EXAMPLE 7 (Refer to Example 5. n−1 deﬁne the constants c (> 0). however.4 Conﬁdence Regions—Approximate Conﬁdence Intervals The concept of a conﬁdence interval can be generalized to that of a conﬁdence region in the case that θ is a multi-dimensional parameter. under . 1 ≤ c = 1 − α [ ( ) ] and 2 P a ≤ χ n −1 ≤ b = 1 − α . consider the r. Now suppose again that θ is real-valued. no suitable r. Xn be independent r. 15.’s distributed as N(μ.σ ⎢− c ≤ ≤ c. There are important examples. a and b (0 < a < b) by P − c ≤ N 0. with known distribution is available which can be used for setting up conﬁdence intervals. both in actual and expected lengths. Next. In all of the examples considered so far the r. That is.σ ⎢ μ − X n ⎢ ⎣ ( ) 2 ≤ c 2σ 2 .v. we obtain 2 ⎤ ⎡ n Xn − μ n − 1 Sn −1 Pμ . a ≤ ≤ b⎥ ⎥ ⎢ σ σ2 ⎦ ⎣ 2 ⎡ ⎤ ⎤ ⎡ n Xn − μ n − 1 Sn −1 ⎢− c ≤ ⎥ × Pμ . Derive a conﬁdence interval for σ with conﬁdence coefﬁcient 1 − α when μ is unknown.σ ≤c ≤ b⎥ = 1 − α .σ ⎢a ≤ = Pμ . 15.

d. so that (20) becomes n ⎡ ⎢X − z α ⎢ n ⎢ ⎣ Xn 1 − Xn 2 ( n ). Xn be i. n The two-sample problem also ﬁts into this scheme. . ⎥ n⎦ (20) As an application.i. . EXAMPLE 8 Let X1. 2 )' with confidence coefficient 1 ( 2 xn ) 2 1 b n j 1 (xj xn ) 2 0 Figure 15. if we assume that σ is known. ⎡ ⎢ X n − zα ⎢ ⎣ Sn 2 n . Then a conﬁdence interval for λ with approximate conﬁdence coefﬁcient 1 − α is provided by X n ± zα 2 Xn . .’s from P(λ). . ⎥ n ⎥ ⎦ ( ) EXAMPLE 9 Let X1. Then since 2 Sn = 1 n ∑ X j − Xn n j =1 ( ) 2 ⎯n⎯ → σ 2 ⎯ →∞ ¯ in probability. .2 Some Examples 2 2 411 1 aj n 1 (xj c2 n 2 xn ) 2 confidence region for ( . .1 x appropriate conditions.i. then a conﬁdence interval for μ with approximate conﬁdence coefﬁcient 1 − α is given by (5). . 1) for large n.v. Suppose now that σ is also unknown. conﬁdence intervals can be constructed by way of the CLT. provided n is sufﬁciently large. 1) for large n and therefore a conﬁdence interval for μ with approximate conﬁdence coefﬁcient 1 − α is given by (20) below. Then the CLT applies and gives that √n(X n − μ)/σ is approximately N(0.15.v. . . r.’s from B(1. X n + zα 2 Xn 1 − Xn ⎤ ⎥. r. The problem is that of constructing a conﬁdence interval for p with approximate conﬁdence coefﬁcient 1 − α. ¯ respectively. we have that √n(X n − μ)/Sn is again approximately N(0. .’s with ﬁnite mean and variance μ and σ 2.i. .4 Conﬁdence Regions—Approximate Conﬁdence Intervals 15. So let X1. Xn be i. r. provided n is sufﬁciently large. Here ¯ ¯ S 2 = X n(1 − X n). consider the Binomial and Poisson distributions. Xn be i. Thus. provided both means and variances (known or not) are ﬁnite. X n + zα 2 Sn ⎤ ⎥. . .v.d. p).d.

and deﬁne the region T(z) in Ω as follows: T z = θ ∈Ω : z ∈ A θ . . .05. r. θ ∈ Ω ⊆ r.’s with p. provided σ = 1 and α = 0. Xn be i. xn)′. consider the problem of testing H(θ *) : θ = θ * at level α and let A(θ *) Ω θ θ be the acceptance region.d. Set Z = (X1.’s with (ﬁnite) unknown mean μ and (ﬁnite) known variance σ 2. Exercises 15. Then a conﬁdence interval for μ with approximate conﬁdence coefﬁcient 1 − α is given be relation (20). .412 15 Conﬁdence Regions—Tolerance Intervals We close this section with a result which shows that there is an intimate relationship between constructing conﬁdence regions and testing hypotheses. . f(x.1. . .4. f(x. r. . . . .d.1 Let X1.4. () { ( )} (21) In other words. . . it is obvious that θ z ∈A θ () if and only if θ ∈T z . iii) What does this interval become for n = 100 and α = 0. For each θ * ∈Ω.f. z = (x1.v. Xn be i. θ ∈ Ω ⊆ r. Xn)′. r.i. . Xn)′. θ ∈ Ω = (0. z = (x1. .d. () Therefore Pθ θ ∈T Z = Pθ Z ∈ A θ ≥ 1 − α . . .s. From (21). . .’s with p. θ ∈ Ω = (0.f. . . θ). . 15. σ = 1 and α = 0. . . . every H(θ) is accepted for θ ∈ T(z). and let A(θ *) stand for the acceptance region in n.d. and suppose that n is large. Xn be i. .05? iii) Refer to part (i) and determine n so that the length of the conﬁdence interval is 0.d. Let X1.v.2 Refer to the previous problem and suppose that both μ and σ 2 are unknown.) as n → ∞. iii) Use the CLT to construct a conﬁdence interval for μ with approximate conﬁdence coefﬁcient 1 − α . θ ). iii) What does this interval become if n = 100.i.v. so that T(Z) is a conﬁdence region for θ with conﬁdence coefﬁcient 1 − α. xn)′. . T(z) is that subset of Ω with the following property: On the basis of z. . and deﬁne T(z) by (21). θ). iii) Discuss part (i) for the case that the underlying distribution is B(1. Thus we have the following theorem. Set Z = θ (X1. Then T(Z) is a conﬁdence region for θ with conﬁdence coefﬁcient 1 − α.05? iii) Show that the length of this conﬁdence interval tends to 0 in probability (and also a. . . For each θ * ∈ Ω let us consider the problem of testing the hypothesis H(θ *): θ = θ * at level of θ signiﬁcance α. THEOREM 1 [ ( )] [ ( )] Let X1.i. ∞). 1) or P(θ).

. of θ. we say that the interval (T1. sample with a p.v. xp] is a prediction interval for θ with p conﬁdence coefﬁcient 1 − 2p. we have the following deﬁnition. . .4. Xn) and T2 = T2(X1.i. Use Theorem 4 in Chapter 10 to construct a conﬁdence interval for the pth quantile of F. since θ is considered to be an r. (The term prediction interval rather than conﬁdence interval is more appropriate here. DEFINITION 3 Let X1.05. .d. γ < 1. . . .i.) 15.’s with continuous d. of known functional form and depending on a parameter θ. Xn be independent r. . T2] is a 100γ percent tolerance interval of 100p percent of F if P[F(T2) − F(T1) ≥ p] ≥ γ. where p = 0. is Beta with parameters α + ∑ n= 1xj and β + n − ∑ n= 1xj.5? ¯ 15.’s with a (nonparametric) d.f. f of the X’s is not of known functional form. F and let T1 = T1(X1.f. given x1.d. ∞) and suppose that n is large.v.4. the problem was that of constructing a conﬁdence interval for θ with a preassigned conﬁdence coefﬁcient. Chapter 12.7 Refer to Example 15.4. Then for the case that θ were real-valued. .6 Refer to Example 14. Thus the Bayes method of estimation considered in Section 7 of Chapter 12 also leads to the construction of prediction intervals for θ. rather than a parameter. Then the concept of a conﬁdence interval. . More precisely. n = 9. .f. is N((nx + μ)/(n + 1). Xn were an r. What does this interval become for p = 0. . .d.f. . . Chapter 12. This problem was solved for certain cases. Instead. j). .2 Some Examples 413 15. 0. and show that the posterior p. . r. .d. μ = 1 and x = 1. xn. xn. 15. of the Beta p p.75.2. and show that the posterior p.5 Let X1. Xn be i. .5 Tolerance Intervals 15.4. . 1/(n + 1)).’s having the Negative Exponential distribution with parameter θ ∈ Ω = (0. given x1. we assumed that X1. Use the CLT to construct a conﬁdence interval for θ with approximate conﬁdence coefﬁcient 1 − α.50.5 Tolerance Intervals In the sections discussed so far.f. as given in Deﬁnition 2.6 to ﬁnd a prediction interval for θ with conﬁdence coefﬁcient 1 − p. becomes meaningless in the present context. . F. . For p and γ with 0 < p.d. . Xn be i.3 Let X1.25. . 15. Xn ) be two statistics of the X’s such that T1 ≤ T2. it follows that [x′. .v.v.15. Now we suppose that the p. .4 Construct conﬁdence intervals as in Example 1 by utilizing Theorem 1. .d.f. Compare this interval with that constructed in Exercise 15. 15.d. Also identify the conﬁdence coefﬁcient 1 − α if n = 10 for various values of the pair (i. we assume a nonparametric model. .4. Then work as in Exercise ¯ 15. 0. . mentioned above.4. .2. . it is replaced by what is known as a tolerance interval.f. . of θ. that is. respectively. r. . Thus j j if x′ and xp are the lower and the upper pth quantiles.

of the sum of any consecutive r W’s is the same. If we set Zk = F(Yk). r. Namely. Then as N gets larger and larger. k = 2. n. so that we obtain N intervals (t1. . f of the continuous type and let Yj = X( j). . t2]. . THEOREM 2 Let X1. For this purpose. . . interval (Yi. n.v.d. ( ) 0 < wk . Accordingly.414 15 Conﬁdence Regions—Tolerance Intervals If we notice that for the observed values t1 and t 2 of T1 and T2. this becomes P Z j − Zi ≥ p = γ . . . it sufﬁces to determine the p. p 1 () gj−i being the p. n.f. use the transformation Vk = W1 + · · · Wk.f. . . k = 1. wn = ⎨ ⎩0. 1 − γ is read off the Incomplete Beta tables. of a Beta distribution with parameters α = j − i and β = n − j + i + 1. experiment under consideration is carried out independently n times and let (t1. t 2]. .’s with p. . .d.f. .) PROOF We wish to show that P[F(Yj) − F(Yi) ≥ p] = γ. Set W1 = Z1 and Wk = Zk − Zk −1 . we also have Z j − Zi = W1 + ⋅ ⋅ ⋅ + Wj − W1 + ⋅ ⋅ ⋅ + Wi = Wi +1 + ⋅ ⋅ ⋅ + Wj . .f. j = 1. Actually. g w1 . respectively. . . F(t 2) − F(t1) is the portion of the distribution mass of F which lies in the interval (t1. w1 + ⋅ ⋅ ⋅ + wn < 1 otherwise.i. From the transformation above. α and β. ( ) ( ) Thus it sufﬁces to ﬁnd the p. Then for any p ∈ (0. t 2] be the resulting interval for the observed values of the X’s. k = 1. . if we set j − i = r. where γ is determined as follows: γ = ∫ g j − i υ dυ . . . we have the following result. Now regarding the actual construction of tolerance intervals. Yj] is a 100γ percent tolerance interval of 100p percent of F. . . n. at least 100 γ of the N intervals will cover at least 100p percent of the distribution mass of F. . of Zj − Zi.f. . suppose the r. This suggests that we shall have to ﬁnd the p. .d. ( ) (22) Then the determinant of the transformation is easily seen to be 1 and therefore Theorem 3 in Chapter 10 gives ⎧n!.f. k = 1. . 1) and 1 ≤ i < j ≤ n. Then we see that formally we go back to the Z’s and therefore . then it is clear that the p.d. the concept of a tolerance interval has an interpretation analogous to that of a conﬁdence interval.d.d. the r.d. . Suppose now that this is repeated independently N times. of W1 + · · · + Wr. . n be the order statistics. of Wi+1 + · · · + Wj. (For selected values of p. . Xn be i.

interval (−∞.d. But this is the p. By taking into consideration that Γ(m) = (m − 1)!. . it follows that (22) is true. this can be rewritten as follows: ⎧ Γ n+1 ⎪ v r −1 1 − v ⎪ gr v = ⎨ Γ r Γ n − r + 1 ⎪ ⎪0.f. It follows then from Theorem 2(i) in Chapter 10. Since this is also the p. as was to be seen. F. and the r. the r. () ( )( ) ( ) n −r .f. each with probability equal to p. . Then. provided γ is determined by ∫ g (v)dv = γ . of Zj − Zi.d. of a Beta distribution with parameters α = r and β = n − r + 1. it follows that for any p ∈ (0. so that F is strictly increasing. ⎩ () ( () ( ) ) ( ) n −r . ∞) does cover at least 1 − p of the distribution mass of F with probability p. 0<v<1 otherwise.f. υ k = ⎨ ⎩0. . 1). X] does cover at most p of the distribution mass of F with probability p. b) with −∞ ≤ a < b ≤ ∞. P F X ≤ p = P X ≤ F −1 p = F F −1 p = p. that the marginal p. In fact. and P F X > 1 − p = 1 − P F X ≤ 1 − p = 1 − F F −1 1 − p = 1 − 1 − p = p. with d. ∞) covers at least 100(1 − p) percent of the distribution mass of F. interval (X.v.15. X] covers at most 100p percent of the distribution mass of F. ▲ Let now f be positive in (a. [( ) ] [ ] [( ) ( )] [ ( )] [( ) ] [ ( )] ( ) so that (X.f. 0<v<1 otherwise. 1 p r This completes the proof of the theorem.5 Tolerance Intervals 15.d. ( ) 0 < v1 < ⋅ ⋅ ⋅ < vk < 1 otherwise. so that (−∞.2 Some Examples 415 ⎧n!. . . gr is given by ⎧ n! v r −1 1 − v ⎪ gr v = ⎨ r − 1 ! n − r ⎪ ⎩0. if X is an r. g υ1 .

416 16 The General Linear Hypothesis Chapter 16 The General Linear Hypothesis 16. as is seen by inspection. the data is periodic and it is ﬁt best by a trigonometric polynomial. and secondly. . . .1 Introduction of the Model In the present chapter. . . . . yj is actually an observed value of a r. One question which naturally arises is how we draw a line in the xy-plane which would ﬁt the data best in a certain sense. n as closely as possible. . that is. that is. First. or 416 . and still in others. The underlying idea in all these cases is that. However. for the sake of completeness. . Assume that a certain aspect of the experiment depends on the temperature and let yj be some measurements taken at the temperatures xj. . y j = β1 + β 2 x j . n which can be represented as points in the xy-plane. j = 1. due to random errors in taking measurements. n which need not be all distinct but they are not all identical either. j = 1. n would be (exactly) related as follows for the three cases considered above. j = 1. For the introduction of the model. . . . the pairs (xj. they lie approximately on a straight line. it reveals the pattern according to which the y’s depend on the x’s. j = 1. Yj. If it were not for the r. . . j = 1. The reason that this line-ﬁtting problem is important is twofold. . . . it can be used for prediction purposes. . . a polynomial of higher degree would seem to ﬁt the data better. Then one has the n pairs (xj. which would pass through the pairs (xj. consider a certain chemical or physical experiment which is carried out at each one of the (without appreciable error) selected temperatures xj. j = 1. yj). n are approximately linearly related. n (n ≥ 2). j = 1. Quite often. . . . In other cases. . . . n. . (1) for some values of the parameters β1. j = 1. yj). the pairs (xj. .v. β2. . . yj). the reader is assumed to be familiar with the basics of linear algebra. n. . . a brief exposition of the results employed herein is given in Appendix I. yj). errors.

errors ej. . . . + e j . n ( ) (3) for some values of the parameters β1. . . βk+1.v. In the presence of r. j = 1. . . β p . . so that X ′ = ⎜ 12 X= ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎟ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜x ⎟ ⎝ x p1 x p 2 ⋅ ⋅ ⋅ x pn ⎠ ⎝ 1n x2n ⋅ ⋅ ⋅ x pn ⎠ relation (4) is written as follows in matrix notation: Y = X ′β + e. it should be noted that what one actually observes is the r. . βp enter the model in their ﬁrst powers only). .1 Introduction of the Model 417 y j = β1 + β 2 x j + ⋅ ⋅ ⋅ + β k +1 x k . . . . n n ≥ 2 . . (n ≥ 2k + 1). . j = 1. n (2 ≤ k ≤ n − 1). (5) The model given by (5) is called the general linear model (linear because the parameters β1. . . n with p ≤ n and most often p < n. . . . . . j j = 1. . . (4) By setting ′ ′ ′ Y = Y1 . vector e is unobservable. . respectively: Yj = β 1 + β 2 x j + e j . . . . j = 1. . . n. ( ) (2) for some values of the parameters β1. . . . At this point. . n (n ≥ 2k + 1). . . β2k+1. .’s. ﬁnally. the y’s appearing in (1)–(3) are observed values of the following r. ( ) ) (2′) (3′) At this point. . . . . . j = 1. one observes that the models appearing in relations (1′)–(3′) are special cases of the following general model: Y1 = x11 β1 + x 21 β 2 + ⋅ ⋅ ⋅ + x p1 β p + e1 Y2 = x12 β1 + x 22 β 2 + ⋅ ⋅ ⋅ + x p 2 β p + e 2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ Yn = x1n β1 + x 2 n β 2 + ⋅ ⋅ ⋅ + x pn β p + e n . . . n 2 ≤ k ≤ n − 1 . or. j and Y j = β1 + β 2 cos t j + β3 sin t j + ⋅ ⋅ ⋅ + β 2 k ( cos( kt ) + β sin kt j . . en ( ) ( ) ( ) and ⎛ x11 x21 ⋅ ⋅ ⋅ x p1 ⎞ ⎛ x11 x12 ⋅ ⋅ ⋅ x1n ⎞ ⎜ ⎟ ⎜x x x22 ⋅ ⋅ ⋅ x p 2 ⎟ ⋅ ⋅ ⋅ x2n ⎟ ⎜ 21 x22 ⎟ . or in a more compact form Yj = ∑ xij β i + e j . vector Y. . . y j = β1 + β 2 cos t j + β3 sin t j + ⋅ ⋅ ⋅ + β 2 k cos kt j + β 2 k+1 sin kt j . . j 2 k+1 ( ) (1′) Yj = β1 + β 2 x j + ⋅ ⋅ ⋅ + β k +1 x k + e j . j = 1. Yn .16. . i =1 p j = 1. . . whereas the r. e = e1 . β = β1 . . .

. / where In is the n × n unit matrix. . namely . Clearly the (i. vector. n are r.v. Y − η = e = ∑ e j2 . so that the difference between what we expect and what we actually observe is minimum. we would expect to have η = X′β. .418 16 The General Linear Hypothesis DEFINITION 1 Let C = (Zij) be an n × k matrix whose elements Zij are r. . Then by assuming the EZij are ﬁnite. is the usual Euclidean norm. denoted by v . errors. Then the β principle of least squares (LS) calls for determining β. Another assumption about the e’s which is often made is that they are uncorrelated. . our model in (5) becomes as follows: / Y = X ′β + e. .v. we have EZ = (EZ1. j = 1. Zn)′. we have EC = E[(Z − EZ) (Z − EZ)′]. (6) where e is an n × 1 r. for Z = (Z1. This last expression is denoted by Σ z and is called the variance–covariance matrix of Z. . where η = X′β. More precisely. . It should also be mentioned in passing that the expectations ηj of the r. and β is a p × 1 vector of parameters. βp. . and for C = (Z − EZ)(Z − EZ)′. This is done in the following sections. . . . ej) = 0. . . vm)′. or just the covariance / matrix of Z. . .’s. j = 1.v. j)th element of the n × n matrix Σ z is Coυ(Zi.’s Yj. . it is reasonable to assume that Eej = 0 and that σ 2(ej) = σ 2. β is to be determined so that the sum of squares of errors. ∑ Y = σ 2 I n . is β ˆ called a least square estimator (LSE) of β and is denoted by β. that is. Zj). . the / covariance of Zi and Zj. j = 1. . n. The norm of an m-dimensional vector v = (v1. 2 2 j =1 n is minimum. so that the diagonal elements are simply the variances of the Z’s. so that Y is an n × 1 r.2 Least Square Estimators—Normal Equations According to the model assumed in (6). the EC is deﬁned as follows. In particular. n are linearly related to the β’s and are called linear regression functions. EZn)′. . This motivates the title of the present chapter. . EC = (EZij). . there are p + 1 parameters β1. X′ is an n × p (p ≤ n) matrix of known constants. This is the model we are going to concern ourselves with from now on. By then taking into consideration Deﬁnition 1 and the assumptions just made. . Coυ(ei. Since the r. vector. . . These assumptions are summarized by writing E(e) = 0 and Σ e = σ 2 In. . In the model represented by (6). . EY = X ′β = η. for i ≠ j. . 16. DEFINITION 2 Any value of β which minimizes the squared norm Y − η 2. σ 2 and the problem is that of estimating these parameters and also testing certain hypotheses about the β’s.’s ej. β whereas what we actually observe is Y = X′β + e = η + e for some β.

. so that ηj = β1 + β2xj.16. . n. . . ηn)′ = η = X′β. ⎠ j =1 ⎝ i =1 2 which we denote by S(Y. corresponding to xj.1. x1j = 1 and x2j = xj.2 Least Square Estimators—Normal Equations 419 ⎛m ⎞ v = ⎜ ∑ υ j2 ⎟ ⎝ j =1 ⎠ 1 2 . Then any LSE is a root of the equations 1 2x Y5 Y5 5 5 Y4 4 3 Y4 4 Y3 Y3 3 Y2 x1 2 2 Y2 x2 0 1 x3 x4 x5 x Y1 Figure 16. n are n points on the straight line η = β1 + β2x and the LS principle speciﬁes that β1 and β2 be chosen so that Σ jn= 1(Yj − ηj)2 be minimum. . let p = 2.1 1 Y1 . . . . ηj)′. . (See also Fig. . . . .v. n and Y − η = ∑ Yj − η j 2 j =1 n ( ) 2 p n ⎛ ⎞ = ∑ ⎜ Yj − ∑ xij β i ⎟ . Yj is the (observable) r. For the pictorial illustration of the principle of LS. j = 1. . 16. Thus (xj. . j = 1. . . n. . we have that β η j = ∑ xij β i . .) (The values of β1 and β2 are chosen in order to minimize the quantity (Y1 − η1)2 + · · · + (Y5 − η5)2.) From (η1. j = 1. i =1 p j = 1. . β). . . j = 1. n. . .

β = 0. . . Now p p n ⎛ n n ⎞ ∂ S Y. .420 16 The General Linear Hypothesis ∂ S Y. . . THEOREM 1 ˆ Any LSE β of β is a solution of the normal equations and any solution of the normal equations is an LSE. x pn ( ) ′ ( ) ′ +⋅⋅⋅ = β1ξ1 + β 2 ξ 2 + ⋅ ⋅ ⋅ + β p ξ p . x p 2 . . . . . . . . p ∂β v ( ) which are known as the normal equations. or Sβ = XY. β = 2 ∑ ⎜ Yj − ∑ xij β j ⎟ 1 − xvj = −2 ∑ xvj Yj + 2 ∑ ∑ xvj xij β i . ∂β v ⎠ j =1 ⎝ i =1 j =1 j =1 i =1 ( ) ( ) so that the normal equations become ∑ ∑ xvj xij βi = ∑ xvjYj . v = 1. . . j = 1. . . . Thus η = ∑ β jξ j j=1 p with ξ j . where ξj is the jth column of X′. . . . . the set of LSE’s of β coincides with the set of solutions of the normal equations. . (7′) Actually. ν = 1. ⋅ ⋅ ⋅ . j = 1. . . where S = XX ′. p as above. x1n ( ( ) ′ ) ′ + β 2 x 21 . . n. . . j =1 i =1 j =1 n p n (7) The equations in (7) are written as follows in matrix notation XX ′β = XY. The normal equations provide a method for the actual calculation of LSE’s. as the following theorem shows. x12 β1 + x 22 β 2 + ⋅ ⋅ ⋅ + x p 2 β p . . (8) Let Vn be the n-dimensional vector space n and let r (≤ p) be the rank of X (= rank X′). ξp is of dimension r . . x12 . x1n β1 + x 2 n β 2 + ⋅ ⋅ ⋅ + x pn β p = β1 x11 . x 2 n + β p x p1 . Then the vector space Vr generated by ξ1. p. . . PROOF We have ⎛ x11 x 21 ⋅ ⋅ ⋅ x p1 ⎞ ⎛ β1 ⎞ ⎜ ⎟⎜ ⎟ ⎜ x12 x 22 ⋅ ⋅ ⋅ x p 2 ⎟ ⎜ β 2 ⎟ η = X ′β = ⎜ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅⎟ ⎜ M ⎟ ⎟⎜ ⎟ ⎜ ⎟ ⎜x ⎝ 1n x 2 n ⋅ ⋅ ⋅ x pn ⎠ ⎝ β p ⎠ = x11 β1 + x 21 β 2 + ⋅ ⋅ ⋅ + x p1 β p . x 22 . .

p may j ˆ not be uniquely determined (η is. . Then S = XX′ is a p × p symmetric matrix of rank p. however) but may be chosen to be functions η ˆ of Y since η is a function of Y. which is the matrix notation for the normal = 0. . This completes the proof of the theorem. this LSE is linear in Y. this last condition is equivalent to X(Y − X′β) β ˆ = XY. . Of course. see Fig. where S = XX ′. p. j = 1. . an equivalent condition this is equivalent to saying that Y − X′β β ˆ ˆ to it is that Y − X′β ⊥ ξj. Y ∈ Vn and from (8). j = 1. and Vr ⊆ Vn. . This is part of the following result.2. . it follows that η ∈ Vr. where βj. Furthermore. . rank X = p. and β ˆ ˆ ⊥ Vr. . . unbiased and has covariance matrix given by Σ βˆ = σ 2S−1. (≤p). . namely β = S−1XY.2 Least Square Estimators—Normal Equations 421 Vn Y X' ˆ Y ˆ Y ˆ 0 Vr Figure 16. Y − X′β 2 = Y − η 2 β ˆ ˆ ˆ becomes minimum if η = η. as is well known. . j = 1. THEOREM 2 ˆ If rank X = p. Clearly. p. (For a pictorial illustration of some of the arguments used in the proof of the theorem. That it is linear in Y follows immediately from (9). p. From β β ˆ the deﬁnition of ξj. so that S−1 exists. Then η = Σ p= 1βjξj. . Now.) ▲ In the course of the proof of the last theorem. / PROOF ˆ The existence and uniqueness of the LSE β and the fact that it is given by (9) have already been established. . . Thus β is an LSE of β if and only if X′β = η. . Next. Therefore the normal ˆ equations in (7′) provide a unique solution. Let ˆ ˆ ˆ ˆ η be the projection of Y into Vr. or ξ ′j(Y − X′β) = 0. ( ) p . or equivalently. XX′β β equations. 16. j = 1. it was seen that there exists ˆ at least one LSE β or β and by the theorem itself the totality of LSE’s coincides with the set of the solutions of the normal equations. its unbiasedness is checked thus: ˆ Eβ = E S −1 XY = S −1 X E Y = S −1 XX ′β = S −1Sβ = I β = β.2 (n = 3.16. . then there exists a unique LSE β of β given by the expression ˆ (9) β = S −1 XY. Now a special but important case is that where X is of full rank. r = 2). that is.

let d be the projection of a into Vr. Relation (10) is established as follows: ′⎫ ⎧ ′ ⎤ ⎡ / ∑ AV = E ⎨ AV − E AV AV − E AV ⎬ = E ⎢ A V − EV V − EV A ′ ⎥ ⎣ ⎦ ⎩ ⎭ ′⎤ ⎡ / = A E ⎢ V − EV V − EV ⎥ A ′ = A ∑ V A ′. if there exists a nonrandom n × 1 vector a such that E(a′Y) = ψ identically in β. This completes the proof of the theorem. ˆ ˆ iii) d′Y = c′β for any LSE β of β. ⎣ ⎦ [ ( )][ )( ( )] ( )( ) ( ) In the present case. (10) where V is an n × 1 r. that is. the following result holds. ii) σ 2(a′Y) ≥ σ 2 (d′Y). for the calculation of the covariance of β. PROOF LEMMA 1 i) The vector a can be written uniquely as follows: a = d + b. vector with ﬁnite covariances and A is an m × n matrix of constants. In connection with estimable functions. A = S−1X and V = Y with Σ Y = σ 2In. so that / ′ ′ ′ /ˆ ∑ β = σ 2S −1X S −1X = σ 2S −1XX ′ S −1 = σ 2S −1S S −1 ( ) ( ) ( ) = σ 2 I p S −1 = σ 2S −1 because S and hence S−1 is symmetric. Hence ′ a ′ Y = d + b Y = d ′ Y + b′ Y ( ) and therefore ψ = E a ′ Y = E d ′ Y + b′ EY = E d ′ Y + b′ X ′β. β iv) If α is any other nonrandom vector in Vr such that E(α ′Y) = ψ. β A parametric function ψ is called estimable if it has an unbiased. DEFINITION 3 For a known p × 1 vector c. Let ψ = c′β be an estimable function. Then ψ is called a parametric function. ( ) ( ) ( ) . we need the following auxiliary result: / / ∑ AV = A ∑ V A ′. so that (S−1)′ = S−1. set ψ = c′β. where b ⊥ Vr. so that there exists a ∈ Vn such that β E(a′Y) = ψ identically in β. Furthermore. linear (in Y) estimator. then α α = d. Then: i) E(d′Y) = ψ.422 16 The General Linear Hypothesis ˆ Finally. ▲ The following deﬁnition will prove useful in the sequel.

j = 1. Next. and let d be the projection of a into Vr. Thus α = d. let α ∈ Vr be such that E(α′Y) = ψ. p. (α′ − d′)X′β = 0 identically in β and hence (α′ − d′)X′ = 0. . ˆ ˆ so that d′Xβ = c′β identically in β. But ′ β ( ) ( ) E d ′ Y = d ′EY = d ′ X ′β. β ˆ ˆ where β is any LSE of β. We are now able to formulate and prove the following basic result. That is. E(d′Y) = ψ = c′β identically in β. and let d be the projection of a into Vr. iii) By (i). / σ 2 a′Y = a′∑Ya = σ 2 a Since also 2 ( ) 2 =σ2 d 2 +σ2 b . Thus there exists a ∈ Vn such that E(a′Y) β ˆ ˆ = ψ identically in β.16. Since ψ is estimable there exists a ∈ Vn such that E(a′Y) = ψ identically in β. linear (in Y) estimator ψ of ψ is called the LSE of ψ. that σ 2(b′Y) ≥ ˆ. DEFINITION 4 ( ) ( ) [( ) ] ( ) Let ψ = c′β be an estimable function. iv) Finally. Then its LSE ψ has the smallest variance in the class of all linear in Y and unbiased estimators of ψ. both α − d α ∈ Vr and α − d ⊥ Vr and hence α − d = 0. Hence E(d′Y) = ψ. it follows that a 2 = d 2 + b 2.2 Least Square Estimators—Normal Equations 423 But b′X′ = 0 since b ⊥ Vr and thus b ⊥ ξj. Then the unbiased. or α β α (α − d)′X′ = 0 which is equivalent to saying that α − d ⊥ Vr. ▲ Part (iii) of Lemma 1 justiﬁes the following deﬁnition. the β β β ˆ projection of Y into Vr. we have = σ 2 d ′Y ( ) ) 2 σ 2 a′Y = σ 2 d′Y + σ 2 b from which we conclude that ( ) ( σ 2 a′Y ≥ σ 2 d′Y . . it follows. Since d′Y = ψ the proof is complete. as was to be seen. So. with η = X′β. since d ∈ Vr. one has d′(Y − η ) = 0. Then we have α ( ) 0 = E α ′ Y − E d ′ Y = E α ′ − d ′ Y = α ′ − d ′ X ′β. by Lemma 1. Next. . Therefore ˆ ˆ ˆ d ′ Y = d ′ η = d ′ X ′β = c ′β. the column vectors – of X′. σ 2(d′Y). ii) From the decomposition a = d + b mentioned in (i). by means of (10). Hence d′X′ = c′. 2 σ2 d by (10) again. ▲ PROOF THEOREM 3 . Then if b′Y is any other linear in Y and unbiased estimator of ψ. Set ψ = c′β(= d′Y). . (Gauss–Markov) Assume the model described in (6) and let ψ be an estimable ˆ function.

. we obtain a certain reduction of the linear model under consideration. . . . one has that Y = Σ jn= 1Zjαj for certain r. α αr. . . ˆ ˆ j = 1. ( ) (11) Next.’s Zj. Zn)′. . n. . . Now since Y ∈ Vn. we also obtain an estimator of the variance σ2. n to be speciﬁed below. where r = rank X. . n. . r). j = r + 1. . . and as a by-product of it. . . . we have (12) ˆ Y = ∑ Z jα j and η = ∑ Z jα j . ▲ 16. . .424 16 The General Linear Hypothesis COROLLARY Suppose that rank X = p. p. in the previous section we solved the problem of estimating β by means of the LS principle. It follows that α i′ Y = Σ jn= 1Zjα i′αj. . a basis for which α i′αj = 0. From the deﬁnition of P. . . αr} be an orthonormal basis for Vr (that is. so that relation (10) gives / ∑ Z = Pσ 2 I n P ′ = σ 2 I n . . ζn)′. . In the present section. αn} for Vn. . . j = 1. . . Then for any c ∈ Vp. j = 1. βp)′. i = 1. ˆ = (β1. β ˆ ˆ and hence its LSE ψ = c′β has the smallest variance in the class of all linear in β ˆ Y and unbiased estimators of ψ. where β PROOF ˆ The ﬁrst part follows immediately by the fact that β = S−1XY. so that Zi = α i′Y. . . Then ζ = E(PY) = Pη. . the same is true for each βj. j = 1. . the function ψ = c′β is estimable. where η ∈ Vr. where Z = (Z1. . . For this. . . . n. n. It follows η then that ζ j = 0. .3 Canonical Reduction of the Linear Model—Estimation of σ 2 Assuming the model described in (6). Thus σ 2 Z j = σ 2 . By letting P be the matrix with rows the vectors α i′. Recall that Vr is the r-dimensional vector space generated by the column vectors of X′. j=1 j=1 n r so that . let EZ = ζ = (ζ1. Then this basis can be extended to an orthonormal basis {α1. . The particular case follows from the ﬁrst part by taking c to have all its components equal to zero except for the jth one which is equal to one. . . for j = 1. it will have to be assumed that r < n as will become apparent in the sequel. . ˆ By recalling that η is the projection of Y into Vr. . . . . . . Let {α1. . i = 1. . .v. so that Vr ⊆ Vn. . . . . the last n equations are summarized as follows: Z = PY. . i ≠ j and αj 2 = 1. αr+1. it is clear that PP′ = In. In particular. n. .

(Here is where we use the assumption that r < n in order to ensure that n − r > 0. . an unbiased ˆ ˆ estimator for σ 2. we have REMARK 1 2 ˆ ˆ ˆ Y − η = Y − X ′β = Y − X ′β 2 ˆ ˆ ˆ ˆ ˆ ˆ = Y ′ − β ′ X Y − X ′β = Y ′ Y − Y ′ X ′β − β ′ XY + β ′ XX ′β ˆ ˆ ˆ = Y ′ Y − Y ′ X ′β + β ′ XX ′β − XY . . . n. we get that EZ 2 = σ 2. . . n. . (13) From (11) and (12). . j = 1. j = 1. n are the columns of P′. so that (13) gives j ˆ ˆ E Y − η 2 = (n − r)σ 2. .v.) Thus we have shown the following result. . . To this end.1). Therefore ξ ′αi = 0 for all j = 1. it follows that in order for us to be able to actually calculate the LSE σ 2 of σ 2.’s Zj. . j = 1. the rows of X are ξ ′. p. Then the LSE’s β and σ 2 are functions only of the ˜ transformed r. so that β = S−1XY by Theorem 2. . that is. j =1 n (14) . . . j = p + 1. We may refer to σ 2 as the LSE of σ 2. Now from the transformation Z = PY one has Y = P−1Z. . ( ) By summarizing these results. . Because of the special ˆ form of the matrix XP′ menitoned above. p only (see also Exercise 16. p and Zj. Next. j = 1. Hence β ˆ ˆ ˆ ˆ ˆ Y′X′β = (Y′X′β)′ = β′XY. . . . j = 1. n. whereas ˜ σ2 = n 1 ∑ Z j2 n − p j = p +1 by 13 . Therefore 2 ˆ ˆ ˆ Y − η = Y ′ Y − β ′ XY = ∑ Yj2 − β ′ XY. respectively. Thus Y = P′Z and therefore β = S−1XP′Z. . where η is the projection ˜ ˜ of Y into Vr and r (≤ p) is the rank of X. a number. . Hence Y − η 2/(n − r) is an unbiased estimator of σ 2. σ 2. On the other hand. . p are the column vectors of X′. . . we would have to rewrite ˜ ˆ Y − η 2 in a form appropriate for calculation. But P −1 = P′ as follows from the fact ˆ that PP′ = In. . ˆ Now suppose that rank X = p. . is provided by Y − η 2/(n − r). we have then COROLLARY ˆ Let rank X = p (< n). .16. p and i = p + 1. . . XX′β − XY = 0 since XX′β = XY β β β β by the normal equations (7′). Since j αj. . . . .3. . it follows that the last n − p elements in all p rows of the p × n matrix XP′ are all equal to zero. ( )( ( ( ) ˆ ) ( Y − X ′ β) ′ ) ˆ But Y′X′β is (1 × n) × (n × p) × (p × 1) = 1 × 1. where ξj. THEOREM 4 In the model described in (6) with the assumption that r < n. and ξj ∈ j Vp.3 Canonical Reduction of the Linear Model—Estimation of σ 2 425 ˆ Y−η = 2 j = r +1 ∑Z α j n 2 j ′ ⎛ n ⎞ ⎛ n ⎞ = ⎜ ∑ Zjα j ⎟ ⎜ ∑ Zjα j ⎟ = ⎝ j = r +1 ⎠ ⎝ j = r +1 ⎠ j = r +1 ∑Z n 2 j . it follows that β is a function of Zj. j = r + 1. From the last theorem above. . .

. . j =1 v =1 n p (16) As an application of some of the results obtained so far. . . . ⎝ x1 x2 ⋅ ⋅ ⋅ xn ⎠ ⎝ j =1 ⎠ j =1 ( ) (17) Now n n ⎛ n ⎞ 2 S = n∑ x j2 − ⎜ ∑ x j ⎟ = n∑ x j − x . consider the following example. ′ and β = β1 . j = 1. one has rv = ∑ xvj Yj . Yn = ⎜ ∑ Yj . Clearly. . . . ⎝ j =1 ⎠ j =1 j =1 2 ( ) so that ∑ (x j =1 n j −x ) 2 ≠ 0. . if i = j and δij = 0 if i ≠ j). EXAMPLE 1 Let Yj = β1 + β2xj + ej.426 16 The General Linear Hypothesis Finally. ( ) ⎛1 XX ′ = ⎜ ⎝ x1 ⎛1 ⎜1 ⎜ 1 ⋅ ⋅ ⋅ 1 ⎞ ⎜⋅ ⎟⎜ x2 ⋅ ⋅ ⋅ xn ⎠ ⎜ ⋅ ⎜⋅ ⎜ ⎝1 x1 ⎞ x2 ⎟ ⎛ ⎟ n ⋅ ⋅⎟ ⎜ =⎜ n ⎟ ⋅ ⋅⎟ ⎜ ⎜ ∑ xj ⋅ ⋅⎟ ⎝ j = 1 ⎟ xn ⎠ ⎟ = S. where Eej = 0 and E(eiej) = σ 2δij. . . ⎟ ∑x ⎟ ⎠ j =1 j =1 n 2 j ∑x ⎟ j n ⎞ so that the normal equations are given by (7′) with S as above and ′ n ⎞ ⎛1 1 ⋅ ⋅ ⋅ 1 ⎞ ′ ⎛ n XY = ⎜ ⎟ Y1 . this example ﬁts the model described in (6) by taking ⎛ 1 x1 ⎞ ⎜1 x ⎟ 2 ⎜ ⎟ ⎜⋅ ⋅ ⎟ X′ = ⎜ ⎟ ⎜⋅ ⋅ ⎟ ⎜⋅ ⋅ ⎟ ⎜ ⎟ ⎝ 1 xn ⎠ Next. v = 1. ∑ x j Yj ⎟ . denoting by rv the vth element of the p × 1 vector XY. p j =1 n (15) and therefore (14) becomes as follows: 2 ˆ ˆ Y − η = ∑ Yj2 − ∑ β v rv . n (δij = 1. . . i. Y2 . β 2 .

⎜⎛ n ⎞⎛ n ⎞ ⎟ n ⎜ − ∑ x j ∑ Yj + n∑ x j Yj ⎟ ⎟ ⎟⎜ ⎜⎜ ⎟ j =1 ⎝ ⎝ j =1 ⎠ ⎝ j =1 ⎠ ⎠ so that ⎞ ⎫ ⎛ n 2⎞⎛ n ⎞ ⎛ n ⎞⎛ n ∑ x j ⎟ ⎜ ∑ Yj ⎟ − ⎜ ∑ x j ⎟ ⎜ ∑ x j Yj ⎟ ⎪ ⎜ ⎠ ⎪ ⎝ j =1 ⎠ ⎝ j =1 ⎠ ⎝ j =1 ⎠ ⎝ j =1 ˆ . so that ˆ β2 = ∑ ( x − x )(Y − Y ) .16.⎪ β1 = n 2 ⎪ n∑ x j − x ⎪ j =1 ⎪ n n n ⎛ ⎞⎛ ⎞ ⎪ n∑ x j Yj − ⎜ ∑ x j ⎟ ⎜ ∑ Yj ⎟ ⎬ ⎝ j =1 ⎠ ⎝ j =1 ⎠ j =1 ˆ β2 = and ⎪ n 2 ⎪ n∑ x j − x ⎪ j =1 ⎪ 2 ⎪ n n ⎛ n ⎞ 2 n∑ x j − x = n∑ x 2 − ⎜ ∑ x j ⎟ . Then S−1 exists and is given by ⎛ n 2 ⎜ ∑ xj 1 ⎜ j =n ⎜ ⎜ −∑ x j ⎝ j =1 n ⎞ −∑ x j ⎟ j =1 ⎟. ⎝ j =1 ⎠ ⎝ j =1 ⎠ j =1 j =1 ( )( ) as is easily seen. ∑ (x − x) n j =1 i j n 2 j =1 j (19′) . ⎟ n ⎟ ⎠ S −1 = 1 n∑ x j − x j =1 n ( ) 2 (18) It follows that ˆ ⎛β ⎞ 1 ˆ β = ⎜ 1 ⎟ = S −1XY = n ˆ ⎠ ⎝ β2 n∑ x j − x j =1 ( ) 2 ⎛⎛ n 2⎞⎛ n ⎞ ⎛ n ⎞⎛ n ⎞⎞ ⎜ ⎜ ∑ x j ⎟ ⎜ ∑ Yj ⎟ − ⎜ ∑ x j ⎟ ⎜ ∑ x j Yj ⎟ ⎟ ⎝ j =1 ⎠ ⎝ j =1 ⎠ ⎝ j =1 ⎠ ⎝ j =1 ⎠⎟ × ⎜ .3 Canonical Reduction of the Linear Model—Estimation of σ 2 427 provided that not all x’s are equal. ⎪ j ⎪ ⎝ j =1 ⎠ j =1 j =1 ⎭ ( ) (19) ( ) ( ) But n n ⎛ n ⎞⎛ n ⎞ n∑ x j Yj − ⎜ ∑ x j ⎟ ⎜ ∑ Yj ⎟ = n∑ x j − x Yj − Y .

σ 2 β 2 = ( ) ∑ (x n j =1 σ2 j −x ) 2 . are more compact.3 Verify relation (19″).3.3833 and β 2 = −0. take n = 12 and the x’s and Y’s as follows: x1 = 30 x 2 = 30 x3 = 30 x 4 = 50 x5 = 50 x 6 = 50 x 7 = 70 x8 = 70 x 9 = 70 x10 = 90 x11 = 90 Y1 = 37 Y7 = 20 Y2 = 43 Y8 = 26 Y3 = 30 Y9 = 22 Y4 = 32 Y10 = 15 Y5 = 27 Y11 = 19 x12 = 90 Y6 = 34 Y12 = 20.2) that ˆ ˆ β = Y − β x.428 16 The General Linear Hypothesis It is also veriﬁed (see also Exercise 16. ⎝ j =1 ⎠ ⎝ j =1 ⎠ j =1 (20) Since also r = p = 2. Eβ 2 = β 2 .’s Zj. j =1 n so that (16) becomes n n ⎞ ˆ ⎛ n ⎞ 2 ˆ ⎛ ˆ Y − η = ∑ Yj2 − β1 ⎜ ∑ Yj ⎟ − β 2 ⎜ ∑ x j Yj ⎟ .1 Referring to the proof of the corollary to Theorem 4. 2 and (20) and (21) give the estimate σ = 14. (15) gives in conjunction with (17) r1 = ∑ Yj j =1 n and r2 = ∑ x j Yj . . elaborate on the ˆ assertion that β is a function of the r. the LSE of σ 2 is given by ˜ σ = 2 ˆ Y−η n−2 2 .8939. that ˆ ˆ ˆ Eβ1 = β1 . .3. In the present case. r alone. . j = 1. but their expressions given by (19) are more appropriate for actual calculations.v. Show directly. . respectively. ˜ Exercises 16. (21) For a numerical example.3. ˆ ˆ Then relation (19) provides us with the estimates β 1 = 46.2 16.3216. by means of (19′) and (19″).3. 1 2 (19″) ˆ ˆ The expressions of β1 and β2 given by (19″) and (19′). 16.

vector Y. . Those assumptions did not specify any particular kind of distribution for the r. (It should be pointed out.5 3 1. n represent errors in taking measurements. however.v.3 4 2. σ 2 β2 = σ 2 β1 = n 2 ( ) ( ) σ2 ∑ n j =1 x j2 and. ¯ Again refer to Example 1 and suppose that x = 0. Then for n = 5 and the data given in the table below ﬁnd: i) The LSE’s of β and σ 2. x]. n for some x > 0.4 Testing Hypotheses About η E(Y) The assumptions made in (6) were adequate for the derivation of the results obtained so far. j = 1. ˆ conclude that σ 2(β2) becomes a minimum if half of the x’s are chosen equal to x and the other half equal to −x (if that is feasible). .” This is the case. iii) An estimate of the covariance found in (ii). However.16. . . j = 1. .4 Testing Hypotheses About E(Y) 429 ˆ ˆ and that β2 and β1 are normally distributed if the Y’s are normally distributed. by assuming that n is even and xj∈[−x. for example. that such a choice of the x’s—when feasible—need not be “optimal.4 i) Use relation (18) to show that ˆ σ β1 = 2 ( ) σ 2 ∑j = 1 x j2 n n∑j = 1 x j − x n ( ) 2 ˆ . when there is doubt about the linearity of the model.’s ej.7 16. then β1 and β2 are uncorrelated. σ 2 β2 = ( ) ∑ (x n j =1 σ2 j −x ) 2 ˆ ˆ and that. .3. . x y 1 1. ii) Conclude that β iii) Show that σ ˆ ˆ . j = 1. .5 5 1. . . . such an assumption will have to be made now. . Since the r.5 Consider the linear model Yj = β1 + β2xj + β3x 2 + ej.0 2 1.) 16.3. 16. it . Then ¯ ˆ1 is normally distributed if the Y’s are normally distributed. n under j the usual assumptions and bring it under the form (6). vector e and hence the r. ˆ ii) The covariance of β (the LSE of β). if x = 0. in order to be able to carry out tests about η.

. rank X = r ≤ p < n . σ 2) with respect to both β and σ 2. n S y. . we have then (C ): Y = X′β + e. For testing the hypothesis H. denote by fY(y. The assumption that e: N(0. e: N 0. under C. we are going to use the Likelihood Ratio (LR) test. This means that the coordinates of η are not allowed to vary freely but they satisfy n − r independent linear relationships. n are uncorrelated normal (equivalently. β) is replaced by Sc. it sufﬁces to maximize with respect to σ 2 the quantity ⎛ 1 ⎞ ⎛ ⎞ 1 SC ⎟ . For this purpose. the r-dimensional vector space generated by the column vectors of X′. under C. This is expressed by saying that η actually belongs in Vr−q. σ 2 I n . we also assume H. the maximum of fY(Y.’s ej. σ 2). Denoting by (C) the set of conditions assumed so far.d. β = y − X ′β = ∑ y j − EYj 2 j =1 ( ) n ( ) 2 under C and c. . that is. σ ( 2 ) ⎛ 1 ⎞ ⎡ 1 =⎜ ⎟ exp ⎢− 2 2 ⎢ ⎝ 2πσ ⎠ ⎣ 2σ n n ∑ ( y j − EYj ) j =1 n 2 ⎤ ⎥ ⎥ ⎦ ⎛ 1 ⎞ ⎡ 1 ⎤ S y . =⎜ ⎟ exp ⎢− 2 ⎝ 2πσ 2 ⎠ ⎣ 2σ ⎦ ( ) (24) From (24). However. or fY Y (y1. Thus we hypothesize that ( ) ( ) (22) H : η ∈ Vr −q ⊂ Vr (q < r ) (23) and denote by c = C ∩ H.v. an (r − q)-dimensional vector subspace of Vr. is obtained when S(y. 2 2 2 σ ( ) . .f. σ 2) with respect to β.430 16 The General Linear Hypothesis is reasonable to assume that they are normally distributed. . . . Now from (22). of the Y’s and let SC and Sc stand for the minimum of 1. j = 1. β. We have fY y : β. Thus in order to maximize fY(y. β. β. β ⎥. . we have that η = EY = X′β and it was seen in (8) that β η ∈ Vr. respectively. there might be some evidence that the components of η satisfy 0 < q ≤ r additional independent linear relationships. β. . independent normal) with mean zero and variance σ 2. the conditions that our model satisﬁes if in addition to (C). ⎜ ⎟ exp⎜ − 2 ⎝ 2σ 2 ⎠ ⎝ 2πσ ⎠ n or its logarithm − n n S 1 log 2π − log σ 2 − C 2 . . σ 2In) simply states that the r. σ 2) the joint p. . . that is. yn. it is obvious that for a ﬁxed σ 2.

16. The LR test rejects H whenever λ < λ 0. ⎝ 2⎠ (26) so that the LR statistic λ is given by ⎛S ⎞ λ=⎜ c⎟ ⎝ SC ⎠ where −n 2 (27) ˆ SC = min S Y. 2 (29) ˆ β c = LSE of β under c . respectively. ˆ The actual calculation of SC and Sc is done by means of (16). set g λ = λ−2 n − 1. Then () (30) dg λ dλ ( )=−2λ ( n − n+ 2 ) n < 0. We will show in the following that the LR test is equivalent to a certain “F-test” based on a statistic whose distribution is the F-distribution. σ 2 = ⎜ ⎟ C ⎝ 2πSC ⎠ ( ) n 2 ⎛ n⎞ exp⎜ − ⎟ . so that σ 2 = C . 2 (28) ˆ βC = LSE of β under C . we obtain − S n 1 SC 1 + = 0. . Therefore ¯ ⎛ n ⎞ max fY y. β = Y − ηC C ( ) 2 ˆ = Y − X ′ βC . σ 2 = ⎜ ⎟ c ⎝ 2πSc ⎠ ( ) n 2 ⎛ n⎞ exp⎜ − ⎟ . β = Y − ηc c ( ) 2 ˆ = Y − X ′β c . To this end. one also has ⎛ n ⎞ max fY y. where λ 0 is deﬁned. ⎝ 2⎠ (25) In an entirely analogous way. Now the problem which arises is that of determining the distribution of λ (at least) under H. 2 4 2σ 2 σ n The second derivative with respect to σ 2 is equal to n/(2σ 4) − (SC/σ 6) which for 2 σ2 = σ 2 becomes −n3/(2SC) < 0. under H. where η is ˆ ˆ replaced by η C and η c. and ˆ S c = min S Y. so that the level of the test is α. with speciﬁed degrees of freedom. β.4 Testing Hypotheses About E(Y) 431 Differentiating with respect to σ 2 this last expression and equating the derivative to zero. β.

d. and the distribution of F. r − q = 1). and let β be the unique (and unbiased) ˆ η ˆ ˆ ˆ LSE of β. r = 2. its geometric interpretation illustrated by Fig.432 16 The General Linear Hypothesis so that g(λ) is decreasing. Thus λ < λ 0 if and only if g(λ) > g(λ 0). one has that the joint p. n − r .3 below ˆ illuminates it even further.3 (n = 3. Taking into consideration relations (27) and (30). equivalently. the last inequality becomes n − r S c − SC > F0 . . We have that ηC is the “best” estimator of η under ˆ C and η c is the “best” estimator of η under c. ⎜ ⎟ 2 ⎢ ⎝ 2πσ 2 ⎠ ⎦⎭ ⎩ 2σ ⎣ n ( ) Vn SC squared norm of Y Sc squared norm of Vr 0 q ˆC ˆc Vr Sc SC squared norm of Figure 16. respectively. q SC where F0 is determined by ⎛ n − r S c − SC ⎞ PH ⎜ > F0 ⎟ = α . jx (31) The statistics SC and Sc are given by (28) and (29).f. under H. where F = n − r Sc − SC q SC and F0 = Fq . of the Y’s is given by the following expression. Now although the F-test in (31) is justiﬁed on the basis that it is equivalent to the LR test. By the facet that (Y − η )′(η − η) = 0 because Y − η ⊥ η − η. Then the F-test rejects H ˆ ˆ whenever η C and ηc differ by too much. as is shown in the next section. where y has been replaced by Y: 2 ⎤⎫ ⎛ 1 ⎞ ⎧ 1 ⎡ ˆ ˜ exp ⎨− n − p σ 2 + X ′β − X ′β ⎥ ⎬. is Fq. q SC ⎝ ⎠ Therefore the LR test is equivalent to the test which rejects H whenever F > F0 . 16. whenever Sc − SC is too large (when measured in terms of SC). ˆ Suppose now that rank X = p (< n).n−r.

x= and β.5 Derivation of the Distribution of the F Statistic 433 ˆ This shows that (β. are ˆ ˜ related as follows: n−r 2 ˆ ˜ σ2 = σ n ˆ and that σ 2 = σ 2 where σ 2 is given in Section 16. . . . . n (that is. It can be shown that β ˜ β it is also complete (this follows from the multiparameter version of Theorem 6 in Chapter 11). p. ¯ 16. . let {αq+r. . . Following similar arguments to those in Section 16. j = 1. . αr} for Vr and then to an orthonormal α basis {α1. are known constants. Vr and Vn(r < n) which are related as follows: Vr−q ⊂ Vr ⊂ Vn. . . . . . . It is also of minimum variance (by the Lehmann–Scheffé theorem) as it depends only on a sufﬁcient and complete statistic. αq. σ 2 are parameters. αr+1.1 Show that the MLE and the LSE of σ 2. and σ 2. . . αq+1. αr. . n ii) Derive the LR test for testing the hypothesis H : γ = γ0 against the alternative A : γ ≠ γ0 at level of signiﬁcance α. we have then the following result. αq+1.5 Derivation of the Distribution of the F Statistic Consider the three vector spaces Vr−q. xj . . αn} for Vn. . . .2 Let Yj. σ 2 . n be independent r. . γ. . By sufﬁciency and completeness. Then: 1 n ∑ xj n j =1 ( ( ) ) j = 1. βj is an unbiased (and linear in Y) estimate of βj. .16. ▲ Exercises 16. . Let P be the n × n orthogonal α . . . j = 1. it follows that the LSE’s βj of βj. PROOF ˆ For j = 1.4. . .3. σ 2) is a sufﬁcient statistic for (β. respectively. .v. αr} be an orthonormal basis for Vr−q and extend it to an α orthonormal basis {α1. . σ 2). . . . regardless of whether they are linear in Y or not). αq. σ 2.’s where Yj is distributed as N β + γ xj − x . . . . ii) Set up a conﬁdence interval for γ with conﬁdence coefﬁcient 1 − α. j = 1. . 16. . . . .4.4. . . THEOREM 5 Under conditions (C) described in (22) and the assumption that rank X = p (< ˆ n). p have the smallest variance in the class of all unbiased estimators of βj.

The distribution of F. n. . .3. n Hence Sc − SC = ∑ Z j2 . n n ∑j = r +1Z j2 ∑j = r +1Z j2 n − r q q Now since the Y’s are independent and the transformation P is orthogonal. . so that ζj = 0. (See Theorem 5. true. Now if H is true. . so that PP′ = In. . j = 1. as follows from the transformation ζ = Pη. n by (11). H′). j =1 q so that n−r F = q ∑j = 1Z j2 = ∑j = 1Z j2 q .) Since also σ 2(Zj) = σ 2. j = 1. . . . Thus ζj = 0. . we will further have that η ∈ Vr−q. j j ˆ ˆ since η c is the projection of η into Vr−q. one has the following theorem. ′ consider the transformation Z = PY and set EZ = ζ = (ζ1. the q reader is referred to Appendix II. . Y = ∑ α jZj j =1 n 2 j = 1. From the derivations in this section and previous results. ∑ Z j2 is σ 2 χ q2 j =1 q and j = r +1 ∑ Z j2 is σ 2 χ n2−r . Thus the hypothesis H is equivalent η to the hypothesis H ′ : ζ j = 0. The converse is also. it follows that Z’s are also independent. it follows that. clearly. Chapter 9. . . . Therefore (29) gives ˆ Sc = Y − ηc 2 = ∑ Z j2 + j =1 q j = r +1 ∑ Z j2 . ˆ SC = Y − ηC On the other hand. n−r which is deﬁned in 2 terms of a χ n−r and a non-central χ 2 distribution. . . . ζn)′. the statistic F is distributed as Fq. n−r. . By (13) and (28).434 16 The General Linear Hypothesis matrix with rows the vectors α J. under the alternatives. . ˆ and ηc = j = q +1 ∑α Z . j = 1. . η where η ∈ Vr. j = r + 1. . under H (or equivalently. under H (H′). . For these deﬁnitions. q. Then ζ = Pη. = j = r +1 ∑Z r n 2 j . . . is non-central Fq. n It follows that. . q. As in Section 16. . n.

. ˆ ˆ ˆ ˆ The r. 1 Next. . Let Y0 be their sample mean. n to the r. ˆ ¯ ¯ 1. under H1 (or H′). and the cut-off point is F0 = F1. Hypothesis H1 is equivalent to the hypothesis H′: ηi = β1 + β20 xi. suppose we are interested in testing the hypothesis H2: β1 = β10. the linear model adopted in this chapter can also be used for prediction purposes.α.5 Derivation of the Distribution of the F Statistic 435 THEOREM 6 Assume the model described in (22) and let rank X = p (< n). The LSE of β1 and β2 are β1C = Y − β2Cx β2C = Σ jn= 1(xj − x 2(Yj − Y )/ ¯) ˆ ¯ Σ jn= 1(xj − x 2. and that the Y’s are normally distributed. Then ηi = β1 + β2 xi. Thus. Therefore η ∈ V2.α . i = 1. or q = 2. j = 3. it is also assumed that if the Y0i’s were actually observed. respectively. ¯) ˆ ˆ ¯ ˆ ˆ ¯) ηc = X′βc and Sc = Y − ηc 2 = Σn= 1(Yj − Y )2 + β20 · (β20 − 2β2C)Σ jn= 1(xj − x 2. from which it follows that. Thus. . β2c = β20. the LSE become β1c = Y − β20x β2c = β20. Without loss of generality. . It β j ˆ follows that Sc − SC = (β2C − β20)2 Σ jn= 1(xj − x 2. n are linear combinations of η1 and η2.4(i) and (18). . ¯) ¯. Furthermore. m are to be observed at the point x0.v. . we should like to emphasize that the transformation of the r. r − q = 1.’s Yj. β1c = β10. r − q = 0. Then it follows ˆ 0 = β1 + β2x0. ▲ PROOF Finally.’s Zj. n−2 . i = 1. η ∈ V1. The following example provides an instance of a prediction problem and its solution. Clearly. Let x0 ≠ xj. so that 2 ˆ ˆ ηc = X′βc and Sc = Σ jn= 1(Yj − β10 − β20xj)2. n and suppose that the independent r. n−2 . It is an immediate consequence of the corollary to Theorem 4 and Theorem 5 in Chapter 9. The problem is that of predicting Y0 and also constructing a prediction interval for Y0. suppose we are interested in testing the following hypothesis about the slope of the regression line. Then the LSE’s ˆ ˜ β and σ 2 of β and σ2. where β20 is a given number. n is only a technical device for deriving the distribution of F and also for proving unbiasedness of the LSE of σ 2. EXAMPLE 3 Refer to Example 2. . j = 1. namely H1: β2 = β20.3) that . . η ∈ V0. we ﬁnd (see also Exercise 16.16. or q = 1 ˆ ˆ ¯. . by ˆ that EY means of Exercise 16.’s Y0i = β1 + β2x0 + ei. Y0 is to be predicted by Y0. As mentioned earlier. 2. where β10 and β20 are given numbers. . Likewise. This section is closed with two examples. are independent. It follows that rank X = r = 2. The cut-off point is F0 = F2. It follows that the test statistic β − here is F = n2 2 ScS−CSC ~ F2 .v. and all ηj. where Y0 = β1 + β2x0. ˆ 1 ˆ ˆ 2C ¯ ˆ ˆ Then η C = X′βC and SC = Y − η C 2 = Σ jn= 1(Yj − Y )2 − β 2 Σ jn= 1(xj − x 2. i = 1. then EZ = 0. . . j = 1. β2 = β20. i = 1. For actually carrying out the test and also for calculating the LSE of σ 2. whereas under H1 (or H′. they would be independent of the Y’s. Hypothesis H2 is equivalent to the hypothesis H′: ηi = β10 + β20 xi. and the test statistic is ¯) F = n−2 ScS−CSC ~ F1. . .v. where SC was given above. under H2 2 ˆ ˆ (or H′). . the Y’s rather than the Z’s are used. Now. we may assume that x1 ≠ x2. .n−2. 2 are linearly independent. j = 1. from which it follows that. 1 2.n−2. Thus.5. Of course. .v. . and the regression line is y = β1 + β2x in the xy-plane.5. EXAMPLE 2 Refer to Example 1 and suppose that the x’s are not all equal. . . if we set Z = Y0 − Y0.

05? 16. Thus the statistic σ 2 can be used for testing hypotheses ˜ ˜ n−r about σ 2 and also for constructing conﬁdence intervals for σ 2.2570.025 = 2.9176].1 From the discussion in Section 16. α 2 ].5.5.4 to show that ˆ n β − β1 ( σ ) and ˆ ∑j = 1 x j2 (β 2 − β 2 ) n σ . ii) Set up a conﬁdence interval for σ 2 with conﬁdence coefﬁcient 1 − α. it follows that the distribution of [(n − r)σ 2]/σ2 is χ 2 . Then Y0 = 27. so that a prediction interval for Y0 with conﬁdence coefﬁcient 1 − α is provided by [Yˆ − st 0 n −2 .05. s = 1. 29.2703 and t10. Exercises 16. refer to the data used in Example 1 and let x0 = 60.0.436 16 The General Linear Hypothesis ⎡ ⎤ 2 ⎢ ⎥ x0 − x 1 1 2 ⎥.0873. σZ = σ 2 ⎢ + + n ⎢m n 2⎥ ⎢ ∑ xj − x ⎥ ⎢ ⎥ j =1 ⎣ ⎦ ( ) ( ) (32) It follows by Theorem 6 that Z σZ ˆ σ2 σ2 = ˆ Y0 − Y0 ⎤ ⎡ 2 ⎥ ⎢ x0 − x ˜ σ ⎢1 1 ⎥ + + n 2⎥ n− 2 ⎢m n ⎢ ∑ xj − x ⎥ ⎥ ⎢ j =1 ⎦ ⎣ 2 ( ) ( ) is tn−2 distributed.5. ⎤ ⎡ 2 ⎥ ⎢ x0 − x ˜ σ ⎢1 1 ⎥ + + n 2⎥ n− 2 ⎢m n ⎢ ∑ xj − x ⎥ ⎥ ⎢ j =1 ⎦ ⎣ 2 where s = ( ) ( ) For a numerical example.2281.2 Refer to Example 2 and: i) Use Exercises 16. so that the prediction interval for Y0 is given by [24. ii) What is this conﬁdence interval in the case of Example 2 when n = 27 and α = 0. Y0 + st n − 2 .3.3 and 16.3. α 2 ˆ . ˆ m = 1 and α = 0.

.v. . . . v) What do the results in (iii) and (iv) become for n = 27 and α = 0. 16. Thus the r. n and Y0i. provided x = 0. It is assumed that the r. and ∑ j =1 ( x j − x ) n 2 (βˆ 2 − β2 ) (m + n)σˆˆ (m + n − 3) σ 2 V σV .3 Verify relation (32). m corresponding to an unknown point x0 are observed. i = 1.Exercises 437 are distributed as N(0.v. . . . ii) Derive the MLE x0 of x0.1 to show that ˆ n β − β1 ( ) ˜ ˜ σ2 σ2 are distributed as tn−2. ¯ ii) Use Theorem 6 and Exercise 16. ˆ ˆ1 − β2x0. . 16.’s Y0i = β1 + β2x0 + ei.5. . ⎥ i =1 ⎦ ( ) is distributed as tm+n−3.5. where Y0 is the sample mean of the Y0i’s and show ˆ ii) Set V = Y0 − β that the r. .v.’s Yj. i = 1.4 Refer to Example 3 and suppose that the r. m are all independent.5.5.05? 16. j = 1.v. iv) Set up the test for testing the hypothesis H′ : β2 = 0 (the Y’s are independent of the x’s) against A′ : β2 ≠ 0 at level α and also construct a 1 − α conﬁdence interval for β2.5 Refer to the model considered in Example 1 and suppose that the x’s and the observed values of the Y’s are given by the following table: . iii) Set up the test for testing the hypothesis H : β1 = 0 (the regression line passes through the origin) against A : β1 ≠ 0 at level α and also construct a 1 − α conﬁdence interval for β1.’s ﬁguring in (ii) may be used for testing hypotheses about β1 and β2 and also for constructing conﬁdence intervals for β1 and β2. . 2 where 2 ⎡ x0 − x ⎢1 +1+ σ =σ ⎢ n m n ∑j = 1 x j − x ⎢ ⎣ 2 V 2 ( ( ) ) ⎤ ⎥ 2⎥ ⎥ ⎦ and ˆ ˆ σ2 = 1 ⎡n ˆ ˆ ⎢ ∑ Yj − β 1 − β 2 x j m + n ⎣ j =1 ⎢ ( ) 2 m 2⎤ + ∑ Y0i − Y0 ⎥. . 1).

35 25 0.2 and transform the Y’s to Z’s by means of an orthogonal transformation P whose ﬁrst two rows are n 2 ⎛ x1 − x x − x⎞ ⎛ 1 1 ⎞ 2 . iii) On the basis of the assumed model. .5. 16. respectively. j = 1. 16. n be independent r.62 i) Find the LSE’s of β1.5.32 1.4.8 Consider the r. Then derive the LR test for testing the hypothesis H : ζ1 = ζ1 against the alternative A : ζ1 ≠ ζ1 at level of signiﬁcance α. Also discuss question (iii) of the same exercise for x0 = 3. . .4. σ 2 are parameters.. . Also compare the resulting test with the test mentioned in Exercise 16.2(ii)).7 with r = 2.v..80 1. .44 30 0. σ 2). n whereas ζ1. .5. x 3. .20 1.77.5. where Zj is distributed as N(ζj.. sx = ∑ x j − x .95 (see Exercises 16.07 3. predict Y0 at x0 = 17 and construct a prediction interval for Y0 with conﬁdence coefﬁcient 1 − α = 0.10 1.54 0. β2 and σ 2 by utilizing the formulas (19′). Suppose that ζj = 0 for j = r + 1. ii) Construct conﬁdence intervals for β1..438 16 The General Linear Hypothesis x y 5 0..50 3.72 1.84 0.5.’s. 16.7 Let Zj. β2 and σ 2 with conﬁdence coefﬁcient 1 − α = 0..65 0. ζ1 = γ sx. ⎜ ⎟ . ζ2 = β. (19″) and (21).26 3. . .7 and then transform the Z’s back to Y’s..21 15 0.62 y Assume the model considered in Example 1 and discuss questions (i) and (ii) of the previous exercise. . . ζr.57 0.6 The following table gives the reciprocal temperatures (x) and the corresponding observed solubilities of a certain chemical substance.60 0.’s of Exercise 16.27 1.5.95 (see Example 3).1 and 16.30 20 0. .82 0.80 3. .67 1.. sx ⎠ ⎝ n ⎝ sx j =1 n⎠ ( ) Then: ii) Show that the Z’s are as in Exercise 16.v. n . ii) Set up the test mentioned in Exercise 16.5.2.10 10 0. ⎜ ⎟.

.5. H2 : β2 = β * 1 2 against the corresponding alternatives A1 : β1 ≠ β *. σ 2).1 to test the hypotheses H1 : β1 = β *.3. i = 1. . . j 1 2 j j. n. . .Exercises 439 16. .’s distributed as N(0.3. Use Exercises 16. Y* = β * + β *x* + e* j = 1. 16. m.9 Let Yi = β1 + β2xi + ei. .3.4 and 16.5. . A2 : β2 ≠ β * at level of 1 2 signiﬁcance α.v. where the e’s and e*’s are independent r. .

Comparison of the yields of a chemical substance by using different catalytic methods. Comparison of the wearing of different materials.1 One-way Layout (or One-way Classiﬁcation) with the Same Number of Observations Per Cell The models to be discussed in the present chapter are special cases of the general model which was studied in the previous chapter. 17. Identiﬁcation of the melting point of a metal by using different thermometers. Comparison of a certain brand of gasoline with and without an additive by using it in several cars.440 17 Analysis of Variance Chapter 17 Analysis of Variance The Analysis of Variance techniques discussed in this chapter can be used to study a great variety of problems of practical interest. etc. we consider what is known as a one-way layout. which we introduce by means of a couple of examples. etc. Below. Comparison of the strengths of certain objects made of different batches of some material. In this section. 440 . Crop yields corresponding to different soils and fertilizers. Below we mention a few such problems. Comparison of the effect of different types of oil on the wear of several piston rings. or one-way classiﬁcation. Comparison of test scores from different schools and different teachers. we discuss some statistical models which make the comparisons mentioned above possible. Crop yields corresponding to different soil treatment. Comparison of different brands of gasoline by using them in several cars.

e .’s eij are normally distributed with mean 0 and variance σ 2. . . . Therefore the Yij’s are r. . . Suppose that the same agricultural commodity (some sort of a grain. . consider I · J identical plots arranged in an I × J orthogonal array.v. . I. Of course his decision is to be based on the productivity of each one of the I different machines. . 17. etc. . . . Y2 J . I. σ 2 ( ) for i = 1.1 One-way Layout (or One-way Classiﬁcation) 441 EXAMPLE 1 Consider I machines. so that the yield Yij of the jth plot treated by the ith kind of fertilizer is given by (1). i = 1. . Then it is reasonable to assume that the r. fertilizers. σ 2). and denote by Yij his output the jth day he is running the ith machine. . . let a worker run each one of the I machines for J days each and always under the same conditions. In connection with model (1). μ )′ . .17. . In such a case there are formed IJ rectangles in the resulting rectangular array which are also referred to as cells (see also Fig. One may envision the I objects (machines. . . we denote by μi the average yield of each one of the J plots in the ith row. unspeciﬁed) (that is. . etc. etc. or the I kind of fertilizers) and estimation of σ 2. . . . All other conditions assumed to be the same. . . . It is further assumed that they are independent.) as being represented by the I spaces between I + 1 horizontal (straight) lines and the J objects (days.’s eij. . To this end. eI ′ ) ) .) as being represented by the J spaces between J + 1 vertical (straight) lines. . . . YI . (1) ( ) j = 1.1). each one of which is manufactured by I different companies but all intended for the same purpose. Yij = μ i + eij where eij are independent N 0. . . .) is planted in all I · J plots and that the plants in the ith row are treated by the ith kind of I available fertilizers. . . Y21 . . e β = ( μ . J are independent N(0. . . . . eI . e2 J . there is no difference between the I machines. . the problem is that of comparing the I different kinds of fertilizers with a view to using the most appropriate one on a large scale. I ≥ 2 . .v. .’s themselves and one has the following model. . Y1 J . . The same interpretation and terminology is used in similar situations throughout this chapter.v. . plots. EXAMPLE 2 For an agricultural example. . J (≥ 2). . and let eij stand for the variation of the yield from plot to plot in the ith row. j = 1. . Let μi be the average output of the worker when running the ith machine and let eij be his “error” (variation) the jth day when he is running the ith machine. . YI ′ 11 1 1J I 21 ( e = (e . tomatoes. . . . i = 1. testing the hypothesis: H : μ1 = · · · = μI (=μ. A purchaser who is interested in acquiring a number of these machines is then faced with the question as to which brand he should choose. . I. . . Set Y = Y11 . . . . Once again. . . i = 1. Then it is again reasonable to assume that the r. there are three basic problems we are interested in: Estimation of μi.

that is. Next. (0. . 0)′. clearly. the I vectors (1. jth) cell I I Figure 17. X . 0. 0. . independent and any other row vector in X′ is a linear combination of them. (0. . . Thus we have the model described in (6) β of Chapter 16 with n = IJ and p = I. . . 0. · · · . . . . . Thus rank X′ = I (= p). 0)′.1 1 ⎛ 6444 I 444 7 8 ⎜1 0 0 ⋅ ⋅ ⋅ 0 ⎜ J ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜1 0 0 ⋅ ⋅ ⋅ 0 ⎜ ⎜0 1 0 ⋅ ⋅ ⋅ 0 ⎫ ⎪ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⎬ J ⎪ X′ = ⎜ 0 1 0 ⋅ ⋅ ⋅ 0 ⎭ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ ⎜0 0 ⋅ ⋅ ⋅ 0 1 ⎫ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎪ J ⎬ ⎜ ⎜0 0 ⋅ ⋅ ⋅ 0 1 ⎪ ⎝ ⎭ ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ Then it is clear that Y = X′β + e. 1)′ are. . .442 17 Analysis of Variance 1 1 2 2 j J 1 J i (i. 0. 1.

. (2) ( ) (3) Now. according to (31) in Chapter 16. . . μ I . . Therefore. . by (9). . ⎝ j =1 ⎠ j =1 j =1 so that. . 2 i =1 j =1 I J ( ) One has then the (unique) solution ˆ μ = Y. ⎜ ⎟ ⎜ ⎟ ⎝ ⎠ so that. . I have uniquely determined LSE’s which have all the properties mentioned in Theorem 5 of the same chapter. we observe that ⎛J 0 ⎜0 J S = XX ′ = ⎜ ⎜⋅ ⋅ ⎜ ⎝0 0 0 ⋅ ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0⎞ ⋅ 0⎟ ⎟ = JIp ⋅ ⋅⎟ ⎟ 0 J⎠ and J J ⎛ J ⎞′ XY = ⎜ ∑ Y1 j . Then by Theorem 2. . . . . J j =1 J j =1 ⎠ ⎝ J j =1 Therefore the LSE’s of the μ’s are given by 1 J ˆ μ i = Yi . = 1 IJ ∑ ∑Y . ∑ Y2 j . . μ 2 . the F statistic for testing H is given by F = n − r Sc − SC I J − 1 Sc − SC = . μi = 1. . . Chapter 16. = ∑ Yij . under H. . ij i =1 j =1 I J (4) Therefore relations (28) and (29) in Chapter 16 give . μ1 . . That is. μ 2 . J j =1 Next. . ⎛1 J ⎞′ 1 J 1 J ˆ β = S −1 XY = ⎜ ∑ Y1 j . Chapter 16. . the model becomes Yij = μ + eij and the LSE of μ is obtained by differentiating with respect to μ the expression Y − ηc 2 = ∑ ∑ Yij − μ . I . . . μ I ⎟ . . . . q I −1 SC SC i = 1.17. ∑ YIJ ⎟ . . ∑ Y2 j . r − q = 1 and hence q = r − 1 = p − 1 = I − 1. . under the hypothesis H : μ1 = · · · = μI (= μ. . one has ′ ⎛ 64748 64748 J J 64 J 4 ⎞ 7 8 η = EY = ⎜ μ1 .1 One-way Layout (or One-way Classiﬁcation) 443 is of full rank. . .. . . In order to determine the explicit expression of them. where Y. η ∈ V1. . unspeciﬁed). . ∑ YIJ ⎟ . . where Yi . .

. where SST = ∑ ∑ Yij − Y. − Y. ∑⎝ ∑ I i =1 ⎜ J j =1 ⎠ I i =1 (7) ( ) 2 = J ∑ Y i2. − IY . I − 1 SS e MS e ( ) (8) SS H SS e . I ( J − 1) (9) . i =1 j =1 I J ( ) = ∑ ∑Y i =1 j =1 2 ij − IJY . Sc − SC = SSH. ⎝ i =1 ⎠ i =1 i =1 ( ) since Y. by means of (5) and (6). respectively. the LSE of σ 2 is given by MS H = ˜ σ2 = SS e . i =1 I ⎞ 1 I 1 I ⎛1 J Yij ⎟ = ∑ Yi . .C i =1 j =1 I J ( ) = ∑ ∑ (Y 2 i =1 j =1 2 I J ij I J ij − Yi . SC = SSe .2 . . . These expressions are also appropriate for actual calculations. ) 2 2 ˆ = ∑ ∑ Yij − ηij .444 17 Analysis of Variance ˆ SC = Y − ηC and ˆ Sc = Y − ηc But for each ﬁxed i. . according to Theorem 4 of Chapter 16. − Y.2 . . . Sc = SST . . . − IJY . MS e = I −1 I ( J − 1) and SSH and SSe are given by (7) and (5). i =1 I (5) Likewise. where SSe = ∑ ∑ Yij − Yi .2 .c i =1 j =1 I J ( ) = ∑ ∑ (Y i =1 j =1 2 ij − Y. . = J ⎜ ∑ Yi2. i =1 j =1 ( ) = ∑ ∑Y 2 i =1 j =1 2 I J I J 2 ij − J ∑ Y i2. = That is. 2 ˆ = ∑ ∑ Yij − ηij . .2 . where SSH = J ∑ Yi . one has I I ⎛ I ⎞ 2 Sc − SC = J ∑ Y i2. Finally. − IJY . ⎟ = J ∑ Yi . i =1 I Therefore the F statistic given in (3) becomes as follows: F = where I J − 1 SS H MS H = . . I ) = ∑Y 2 j =1 J J − JY i2. ) 2 ∑ (Y so that j =1 J ij − Yi . (6) so that.

4. respectively. ˆ μ3 = 74. ) i =1 I mean squares between groups I−1 MS H = SS H I −1 within groups SS e = ∑ ∑ Y ij − Y i . J within the ith group. J = 5 and let Y11 = Y12 = Y13 = Y14 = Y15 = We have then 82 83 75 79 78 Y21 = Y22 = Y23 = Y24 = Y25 = 61 62 67 65 64 Y31 = 78 Y31 = 72 Y33 = 74 Y34 = 75 Y35 = 72 ˆ μ1 = 79. for each i. . σ 2 = MSe = 7.4. . Thus for α = 0. . REMARK 1 EXAMPLE 3 For a numerical example. Now. Finally. On the other hand.12.6404. under H. and as mentioned above. ˆ μ 2 = 63. ∑ Jj = 1(Yij − Yi.4. it splits into SSH and SSe. as σ 2χ I−1 and 2 2 2 2 σ χ I(J−1).)2 is the sum of squares of the deviations of Yij. SST is called the total sum of squares for obvious reasons. (6) and (7) it follows that SST = SSH + SSe. from (7) we have that SSH represents the sum of squares of the deviations of the group means Yi. .8853 and the hypothesis H : μ1 = μ2 = μ3 is rejected. from the grand mean Y. take I = 3. − Y.17. . Of course.2 and MSH = 315. j = 1. i =1 j =1 ( ) 2 IJ − 1 — From (5). For this reason. For this reason. from (5) we have that.8.05. so that F = 42.0. SSe is called the sum of squares within groups. under H. i =1 j =1 I J I J ( ) 2 I( J − 1) MS e = SS e I( J − 1) total SS T = ∑ ∑ Y ij − Y. MSe = 7. ˜ = 3. Then SST is σ χ IJ−1 distributed. We may summarize all relevant information in a table (Table 1) which is known as an Analysis of Variance Table. (up to the factor J). F2. the 2 quantities SSH and SSe are independently distributed. . .1 One-way Layout (or One-way Classiﬁcation) 445 Table 1 Analysis of Variance for One-Way Layout source of variance degrees of freedom 2 sums of squares SS H = J ∑ (Y i . the analysis of variance itself derives its name because of such a split of SST. Next. . .. Actually. as follows from the discussion in Section 5 of Chapter 16. Also from (6) it follows that SST stands for the sum of squares of the deviations of the Yij’s from the grand (sample) mean Y.5392.05. SSH is called the sum of squares between groups.

4 17. . J are independent N(0. and called the jth column effect. one variety in each plot. j = 1. it is assumed.5 11. errors eij.2 8.446 17 Analysis of Variance Exercise 17. . plus a contribution αi due to the ith row (ith machine). . Here the ith row effect . Then the yield of the jth variety of the commodity in question treated by the ith fertilizer is an r. It is further assumed that the I row effects and also the J column effects cancel out each other in the sense that ∑ α i = ∑ β j = 0. . consider the I machines mentioned there and also J workers from a pool of available workers. Yij such that Yij = μij + eij. . i =1 j =1 I J Finally. where ∑αi = ∑ β j = 0 i =1 j =1 I J (10) and eij. . and suppose that J different varieties of a certain agricultural commodity are planted in each one of the I rows.” His actual daily output is then an r. . .3 9.1 10. . as is usually the case. EXAMPLE 5 Consider the identical I · J plots described in Example 2. plus a contribution βj due to the jth worker. Yij which is assumed again to have the structure described in (10). At this point it is assumed that each μij is equal to a certain quantity μ.v. the grand mean. EXAMPLE 4 Referring to Example 1. and called the ith row effect. I.4 9. . . i = 1. that the r. σ2). . Each one of the J workers is assigned to each one of the I machines which he runs for one day.4 C 9. j = 1. . I (≥ 2). . Thus the assumed model is then Yij = μ + α i + β j + eij .2 Two-way Layout (Classiﬁcation) with One Observation Per Cell The model to be employed in this paragraph will be introduced by an appropriate modiﬁcation of Examples 1 and 2. J (≥ 2) are independent N(0. . Then all J plots in the ith row are treated by the ith of I different kinds of fertilizers.7 B 9. σ 2). Let μij be the daily output of the jth worker when running the ith machine and let eij be his “error. i = 1.0 11.1 Apply the one-way layout analysis of variance to the data given in the table below.1.v. . A 10.

. . . i = 1. . The I objects (machines or fertilizers) and the J objects (workers or varieties of an agricultural commodity) associated with these factors are also referred to as levels of the factors. . . we set Y = Y11 . . . The same interpretation and terminology is used in similar situations throughout this chapter. . βj. HB : β1 = · · · = βJ = 0 (that is. β )′ 1 I 1 J . . . . αi. . machines and workers in Example 4 and fertilizers and varieties of agricultural commodity in Example 5. . Y21 . β . Y1 J . . . e IJ ′ ) ) and ⎛ ⎜1 ⎜ ⎜1 ⎜⋅ ⎜ ⎜1 ⎜ ⎜1 ⎜1 ⎜ ⎜⋅ X′ = ⎜ 1 ⎜ ⎜⋅ ⎜⋅ ⎜ ⎜⋅ ⎜ ⎜1 ⎜1 ⎜ ⎜⋅ ⎜1 ⎝ I 644474448 1 0 0 ⋅ ⋅ ⋅ 0 1 0 0 ⋅ ⋅ ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1 0 0 ⋅ ⋅ ⋅ 0 0 1 0 ⋅ 0 1 0 ⋅ ⋅ ⋅ ⋅ ⋅ 0 1 0 ⋅ ⋅ ⋅ ⋅ 0 0 ⋅ 0 ⋅ ⋅ ⋅ 0 0 ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 644474448 4J 4 1 0 0 ⋅ ⋅ ⋅ 0 0 1 0 ⋅ ⋅ ⋅ 0 J ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 0 0 ⋅ ⋅ 0 1 1 0 0 ⋅ 0 1 0 ⋅ ⋅ ⋅ ⋅ ⋅ 0 0 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ ⋅ 0⎫ ⎪ 0⎪ ⎬J ⋅ ⎪ 1⎪ ⎭ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ 0 1 0 ⋅ ⋅ 0 1 ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ 0 1 ⋅ 1 0 0 ⋅ ⋅ 0 1 0 ⋅ ⋅ ⋅ 0 ⋅ 0 ⋅ 0⎫ ⎪ ⋅ 0⎪ ⎬J ⋅ ⋅ ⋅ ⋅ ⋅ ⎪ 0 ⋅ ⋅ 0 1⎪ ⎭ ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . . . . For this purpose.2 Two-way Layout (Classiﬁcation) with One Observation Per Cell 447 is the contribution of the ith fertilized and the jth column effect is the contribution of the jth variety of the commodity in question. α . . . . α . . . . YI 1 . there is no row effect). . . e . J IJ ′ 11 1J 21 2J I1 ( e = (e . . e . . . J. . I. From the preceding two examples it follows that the outcome Yij is affected by two factors. . there is no column effect) and estimation of σ 2. . . In connection with model (10). j = 1. e β = ( μ . there are the following three problems to be solved: Estimation of μ. . . . . .17. testing the hypothesis HA : α1 = · · · = αI = 0 (that is. . . . e . Y2 J . . . . We ﬁrst show that model (10) is a special case of the model described in (6) of Chapter 16. . . .

A. . . I i =1 Summarizing these results. . j = 1 I ∑ Yij . ... I are given by (2). where Yi. ∂ S Y. i = 1. is again given by (4). .β = 0 β ∂a i implies α i = Yi . . Then. . β = 0 ∂μ ( ) ˆ implies μ = Y. βJ Consider the hypothesis ( ) ′ ∈ Vr . α i = Yi. as is found by differentiation. . . − Y. It can be shown (see also Exercise 17. . . j = 1. respectively. . I . . . α I .j − Y. − Y. because of the two independent restrictions ∑α = ∑ β i i =1 j =1 I J j = 0. .2. imposed on the parameters. HA : α1 = · · · = αI = 0. . I i =1 (12) Now we turn to the testing hypotheses problems. so that qA = I − 1. . S Y. j = ˆ ˆ ˆ μ = Y. . . is given by (2) and (∂/∂βj)S(Y. Y. is given by (4) and Y. . .. Y. (11) where Yi . However. η ∈ Vr−q. . S(Y. . . . β j = Y. under HA. the normal equations still have a unique solution. where r = I + J − 1. . . Next. In fact. j − Y. where Y. j = 1. β1 . . . . we determine the LSE’s of μ and βj.1) that X′ is not of full rank but rank X′ = r = I + J − 1. . one has . . β) = 0 implies ˆ ˆ βj = Y. β = ∑ ∑ Yij. . i = 1. J . where r − qA = J. to be ˆ ˆ denoted by μA and βj. That is.. under HA again. J . . We have EY = η = X ′ μ : α1 . where ( ) 1 I ∑ Yij .448 17 Analysis of Variance and then we have Y = X ′β + e with n = IJ and p = I + J + 1. respectively. αi and βj are. we have then that the LSE’s of μ. β) becomes ∑ ∑ (Y i =1 j =1 I J ij − μ − βj ) 2 from where by differentiation. . − μ − α i − β j i =1 j =1 ( ) I J ( ) 2 and ∂ S Y.

− Y. . . j i =1 j =1 ) − J ∑ Yi . MSe (16) where MS A = SS A . . .2. = β j . − Y. − Y. = μ . ) 2 Now SC can be rewritten as follows: SC = SSe = ∑ ∑ Yij − Y. (However. − Y. . j − Yi . β j. i =1 I ( ) 2 (14) because ∑ ∑ (Yij − Y. see (20) below. . . . i =1 j =1 I J I J [( ( ) ( 2 )] 2 = ∑ ∑ Yij − Y. − Y. 2 i =1 I ( ) Therefore ˆi Sc − SC = SS A . to be denoted here by FA. . j = 1. j − Y. . A I I i =1 I i =1 ( ) 2 = J ∑ Yi 2 − IJY.2 Two-way Layout (Classiﬁcation) with One Observation Per Cell 449 ˆ ˆ ˆ ˆ μ A = Y.17.c i =1 j =1 J ( A ) = ∑ ∑ (Y J i =1 j =1 ij − Y. . respectively. SSe are given by (15) and (14). is given by FA = (I − 1)( J − 1) SS I −1 A SSe = MS A . j + Y. I −1 MS e = ( SS e I −1 J −1 )( ) and SSA. . .C i =1 j =1 I J ( I ) = ∑ ∑ (Y 2 i =1 j =1 2 I I J ij − Yi . J . (13) Therefore relations (28) and (29) in Chapter 16 give by means of (11) and (12) ˆ SC = Y − ηC and ˆ Sc = Y − ηc A 2 ˆ = ∑ ∑ Yij − ηij . i =1 (15) It follows that for testing HA. where SS A = J ∑ α 2 = J ∑ Yi . j ) i =1 j =1 i =1 j =1 I J I J = J ∑ Yi . j . )∑ (Yij − Y. the F statistic. ) 2 2 A ˆ = ∑ ∑ Yij − ηij . A = Y. for an expression of SSe to be used in actual calculations. − Y. .) . . ) = ∑ (Yi . . − Y. j )(Yi .

j − Y. . − Y. j − Y. . ) i =1 j =1 i =1 j =1 I J I J = J ∑ Yi . . . − Yi . Finally. if we set SST = ∑ ∑ Yij − Y. − Y. SSe = ∑ ∑ Yij − Y. . . . . . . to be denoted here by FB. i =1 j =1 I J I J [( ( ) ( 2 2 ) ( )] 2 = ∑ ∑ Yij − Y. . − Y. 2 − IJY. . . .450 17 Analysis of Variance Next. j − Y. . . is given by FB = (I − 1)( J − 1) SS J −1 J J B SSe = MSB . − Y. . = SSTT − SS A − SSB i =1 j =1 ( )( because ∑ ∑ (Yij − Y. )(Yi .2. B j =1 j =1 ( ) 2 = I ∑ Y. . Clearly. i =1 j =1 ( )( ) − 2 ∑ ∑ Yij − Y. MSe (17) where MSB = SSB /(J − 1) and ˆ SSB = Sc − SC = I ∑ β j2 = I ∑ Y. respectively. we ﬁnd in an entirely symmetric way that the F statistic. i =1 I ( ) 2 = SS A . for testing the hypothesis HB : β1 = · · · = βJ = 0. (19) we show below that SST = SSe + SSA + SSB from where we get SSe = SST − SSA − SSB. j − Y. j =1 I J ) − 2 ∑ ∑ Yij − Y. − Y. (18) and (19).2. − Y. (20) Relation (20) provides a way of calculating SSe by way of (15). . − Y. i =1 I J I ( ) 2 + I ∑ Y. . . i =1 j =1 J ( ) + J ∑ Yi . i =1 j =1 I J ( )( ) ) + 2 ∑ ∑ Yi . Y. )∑ (Yij − Y. − Y. j − Y. Y. ) = ∑ (Yi . Yi . j j =1 J (18) The quantities SSA and SSB are known as sums of squares of row effects and column effects. i =1 j =1 I J ( ) = ∑ ∑Y 2 i =1 j =1 I J 2 ij − IJY. .

. ) = I ∑ Y. j − Y. − Y. ) The pairs SSe.Exercises 451 Table 2 Analysis of Variance for Two-way Layout with One Observation Per Cell source of variance degrees of freedom 2 sums of squares ˆi SS A = J ∑ α 2 = J ∑ (Y i . j =1 J )( ) J ( )∑ (Y i =1 I ij − Y. . This section is closed by summarizing the basic results in Table 2 above. i =1 j =1 I J ( ) 2 ( I − 1) × ( J − 1) IJ − 1 MS e = ( I − 1)( J − 1) — SS e total SS T = ∑ ∑ Y ij − Y. = ∑ Y. . . j =1 ( ) 2 = SSB and ∑ ∑ (Y i =1 j =1 I J i. Y. Finally. . j − Y. j − Y. the LSE of σ 2 is given by ˜ σ 2 = MSe. = 0. Y. . as a consequence of the discussion in Section 5 of Chapter 16. 17. . (21) Exercises 17.05). − Y. where X′ is the matrix employed in Section 2.2 Apply the two-way layout with one observation per cell analysis of variance to the data given in the following table (take α = 0. . i =1 )( ) I ( )∑ (Y j =1 J . j − Y. = ∑ Yi . SSA and SSe.1 Show that rank X′ = I + J − 1. j + Y. .2. . − Y. . . SSB are independent σ 2χ 2 distributed r.’s with certain degrees of freedom. j − Y. − Y.v.2. j j =1 j =1 I J J J ( ) 2 J−1 MS B = residual SS e = ∑ ∑ Y ij − Y i . .j − Y. . ) i =1 i =1 I I mean squares rows I−1 MS A = SS A I −1 SS B J −1 columns ˆ SS B = I ∑ β 2 = I ∑ Y. i =1 j =1 ( ) 2 ∑ ∑ (Y i =1 j =1 I J ij − Y.

there are no interactions present). . where I J J I (22) ∑α = ∑ β = ∑γ i j i =1 j =1 j =1 ij = ∑ γ ij = 0 i =1 for all i and j and eijk. σ 2). e1 JK . . . . . . . . . . . . . . j = 1. . . i = 1. . the plot where the jth agricultural commodity was planted and it was treated by the ith fertilizer (in connection with Example 5). j)th plot. . . μ1 J . . . . . . the means μij. J need not be additive any longer. . such as fertilizers and varieties of agricultural commodities. . μ I 1 . . . Y1 J 1 . on the average. . . YIJK ′ 111 11 ( e = (e β = (μ . Y11K . In other words. . . I.452 17 Analysis of Variance 3 −1 1 7 2 2 5 0 4 4 2 0 17. . . . J (that is. YIJ 1 . . Once again the problems of main interest are estimation of μ. . e1 J 1 . By setting Y = Y111 . except for the grand mean μ and the row and column effects αi and βj. that is.3 Two-way Layout (Classiﬁcation) with K (≥ 2) Observations Per Cell ≥ In order to introduce the model of this section. . . . j = 1. . . .μ IJ ′ ) ) ) and . . e11K . which in the previous section added up to make μij. It is not unreasonable to assume that. j = 1. . . . . k = 1. . testing the hypotheses: HA : α1 = · · · = αI = 0. . I (≥ 2). . . However. . In the present case. the relevant model will have the form Yijk = μij + eijk. and estimation of σ 2. j = 1. . βj and γij. . J (≥ 2). i = 1. This amounts to saying that we observe the yields Yijk. consider Examples 4 and 5 and suppose that K (≥ 2) observations are taken in each one of the IJ cells. . . . . . . . . . K of K identical plots with the (i. I. . . e IJK ′ . . . respectively. . i = 1. . . . . . . Thus our present model is as follows: Yijk = μ + αi + β j + γij + eijk. . e IJ 1 . k = 1. K (≥ 2) are independent N(0. . . or workers and machines. . . . αi. . i = 1. these interactions cancel out each other and we shall do so. . . HB : β1 = · · · = βJ = 0 and HAB : γij = 0. . . J. or we allow the jth worker to run the ith machine for K days instead of one day (Example 4). . . I. . Y1 JK . . . we may now allow interactions γij among the various factors involved.

From the form of X′ it is also clear that rank X′ = r = p = IJ.17. ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ (22′) so that model (22′) is a special case of model (6) in Chapter 16. that is. β = ∑ ∑ ∑ Yijk − μ ij .3 Two-way Layout (Classiﬁcation) with K (≥ 2) Observations Per Cell 453 ⎛ 644444 7444444 4IJ 8 ⎜1 0 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ K ⎜1 0 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⎜ ⎜0 1 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⎫ ⎪ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎬ K ⎜0 1 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⎪ ⎭ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ J ⎜ 6447448 ⎜0 ⋅ ⋅ ⋅ 0 1 0 ⋅ ⋅ ⋅ ⋅ 0 ⎜ K ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ X′ = ⎜ 0 ⋅ ⋅ ⋅ 0 1 0 ⋅ ⋅ ⋅ ⋅ 0 ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ ⎛ I −1⎞ J +1 ⎝ ⎜ 64444 ⎠ 4444 7 8 ⎜0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 1 0 ⋅ ⋅ ⋅ 0 ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ K ⎜0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 1 0 ⋅ ⋅ ⋅ 0 ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ 1⎫ ⎜0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎪ K ⎬ ⎜ ⎜0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 1⎪ ⎝ ⎭ it is readily seen that Y = X′β + e β with n = IJK and p = IJ.1). Therefore the unique LSE’s of the parameters involved are obtained by differentiating with respect to μij the expression S Y. 2 i =1 j =1k =1 ( ) I J K ( ) We have then .3. X′ is of full rank (see also Exercise 17.

. . . − μ.j − μ. are given by the above-mentioned linear combinations. . . βj and γij are linear combinations of the parameters μij. .. . (24) by employing the “dot” notation already used in the previous two sections. . − μ. ˆ β j = Y. . − Yi.3.) From identity (26) it follows that. . i = 1. j. j = 1. . under the hypothesis HA :α1 = · · · = αI = 0. the LSE’s of the remaining parameters remain the same as those given in (25). α i = Yi. J . − Y. αi = μi. .. i =1 j =1k =1 I J K I J K ( ) 2 2 = ∑ ∑ ∑ Yijk − K ∑ ∑ Yij2 .. by the corollary to Theorem 3 in Chapter 16. . so that Sc A − SC = JK ∑ α i2 . 2 i =1 j =1 I J ( ) (26) because. . from the fact that μij = μ + αi + βj + γij and on the basis of the assumptions made in (22). (See also Exercise 17. Therefore. i =1 j =1k =1 i =1 j =1 I J (27) and . + γij. (23) Next. It is then readily seen that ˆ ˆ μ = Y. . ˆ ˆ ˆ ˆ ˆ Now from (23) and (25) it follows that μij = μ + α i + βj. they are ˆ ˆ ˆ ˆ estimable. . + Y. i =1 i =1 I I Thus for testing the hypothesis HA the sum of squares to be employed are SC = SSe = ∑ ∑ ∑ Yijk − Yij . .2. I . . γij. . βj = μ . upon replacing μij by their LSE’s. . . αi. .j + μ. . as is easily seen. . − Y. − Y. Therefore ˆ SC = ∑ ∑ ∑ Yijk − μ ij Next. and their LSE’s μ. It follows then that ˆ ˆ Sc A = SC + JK ∑ α i2 . j = 1. (25) i = 1. I . we have μ = μ. j. From (24) we have that μ. . . .454 17 Analysis of Variance ˆ μ ij = Yij. i =1 j =1 k =1 I J K ( ) 2 ˆ ˆ ˆ ˆ Yijk − μ ij = Yijk − μ − α i − β j − γˆ ij + μ − μ i i j j ij ( and hence ) ( ˆ ˆ + (α − α ) + ( β − β ) + (γˆ ) 2 ) + γ ij ) S Y. β = ∑ ∑ ∑ Yijk − μ ij i =1 j =1k =1 I ( ) I J K ( ˆ = SC + IJK μ − μ J ( ) 2 ˆ + JK ∑ α i − α i i =1 ( ) 2 ˆ + IK ∑ β j − β j j =1 ( ) 2 + K ∑ ∑ γˆ ij − γ ij . . . i =1 j =1k =1 I J K γˆ ij = Yij. J . . ( ) 2 ˆ ˆ ˆ = ∑ ∑ ∑ Yijk − μ − α i − β j − γˆ ij . .. . α i. γij = μij − μi. . . all other terms are equal to zero. βj. .

. which now is given by F AB = ( SS AB MS AB = . so that. we get I − 1 independent linear relationships which the IJ components of η satisfy and hence r − qA = IJ − (I − 1). . MS e I − 1 J − 1 SS e IJ K − 1 ( )( ) ) (32) where . = αi. A I I i =1 i =1 ( ) 2 = JK ∑ Yi2. − Y. . (31) arguments similar to the ones used before yield the F statistic. .17. . under HA. Thus qA = I − 1 since r = IJ.2. . SSe are given by (28) and (27). . . μi . . . − IJKY. . we observe that μi. I. . i = 1. 2. . − Y. − μ. . . . J. i =1 I (28) For the purpose of determining the dimension r − qA of the vector space in which η = EY lies under HA. Therefore the F statistic in the present case is FA = IJ K − 1 SS A MS A = . .2. we ﬁnd in an entirely symmetric way that the F statistic to be employed is given by FB = where MSB = SSB J −1 J j =1 IJ K − 1 SS B MS B = . . . j . I − 1 SS e MS e SS A . I −1 SS e IJ K − 1 ( ) (29) where MS A = MS e = ( ) and SSA. For i = 1.3 Two-way Layout (Classiﬁcation) with K (≥ 2) Observations Per Cell 455 ˆ Sc − SC = SS A = JK ∑ α i2 = JK ∑ Yi . respectively. I. . . . . . I − 1. For testing the hypothesis HB : β1 = · · · = βJ = 0. − μ. = 0. J − 1 SS e MS e 2 ( ) (30) and ˆ SSB = IK ∑ β j2 = IK ∑ Y. − IJKY. j = 1. . . j Also for testing the hypothesis HAB : γ ij = 0. i = 1. j =1 j =1 J J ( ) = IK ∑ Y. . .

5416. ˆ β3 = 16.3.7916. so that SSAB = SST − SSe − SSA − SSB. . Of course. (36) Once again the main results of this section are summarized in a table. (35) Relation (35) is suitable for calculating SSAB in conjunction with (27). .5416. i =1 j =1k =1 I J K I J K ( ) 2 2 = ∑ ∑ ∑ Yijk − IJKY.’s with certain degrees of freedom.3) that SST = SSe + SSA + SSB + SSAB. j .5417. .) Finally. (28). for an expression of SSAB suitable for calculations. .4166. ˆ γ22 = 1. β1 = −17.2084 ˆ ˆ ˆ α 2 = −2. see (35) below. by setting SST = ∑ ∑ ∑ Yijk − Y. SSB. γ ˆ γ23 = −2. EXAMPLE 6 For a numerical application.8334. γ21 = 0.456 17 Analysis of Variance MS AB = ( SS AB I −1 J −1 )( J ) and 2 SS AB = K ∑ ∑ γˆ ij i =1 j =1 I J = K ∑ ∑ Yij . ˆ12 = −1. consider two drugs (I = 2) administered in three dosages (J = 3) to three groups each of which consists of four (K = 4) subjects. . − Y. 2 i =1 j =1 I ( ) (33) (However. . which can be shown to be independently distributed as σ 2χ 2 r.4167. .7917.v. − Yi . ˆ ˆ γ 18 20 50 53 34 36 40 17 X121 X122 X123 X124 X221 X222 X223 X224 = 64 = 49 = 35 = 62 = 40 = 63 = 35 = 63 X131 X132 X133 X134 X231 X232 X233 X234 = = = = = = = = 61 73 62 90 56 61 58 73 . Certain measurements are taken on the subjects and suppose they are as follows: X111 = X112 = X113 = X114 = X211 = X212 = X213 = X214 = For this data we have ˆ μ = 50. β2 = 0. the LSE of σ 2 is given by ˜ σ 2 = MSe. . γ13 = 2. SSAB and SSe. ˆ α 1 = 2. + Y. i =1 j =1k =1 (34) we can show (see Exercise 17. The number of degrees of freedom of SST is calculated by those of SSA.2. Table 3. ˆ11 = −0.2084. (31) and (34).2083.0416.

˜ The models analyzed in the previous three sections describe three experimental designs often used in practice. − Y. . by allowing the row effects. .0. even a brief study of these designs would be well beyond the scope of this book. 17. . we have σ 2 = 183. + Y.4139 and F2. − Yi . . etc. we accept HA.v. ij i =1 j =1 i =1 j =1 I J K ( ) 2 ( I − 1)( J − 1) IJ ( K − 1) MS AB = ( I − 1)( J − 1) SS e IJ ( K − 1) SS AB error SS e = ∑ ∑ ∑ Y ijk − Y ij . FB = 12.18. we have F1.1 17. Exercises 17.’s themselves.05 = 4.0230. . Some of them are taken from the ones just described by allowing different numbers of observations per cell.8471. by randomizing the levels of some of the factors. . ) i =1 i =1 J J I I I−1 B main effects ˆ SS B = IK ∑ β j2 = IK ∑ Y. where X′ is the matrix employed in Section Verify identity (26). by increasing the number of factors. column effects and interactions to be r. Finally.18. − Y.5546. + Y. j =1 j =1 I J I J ( ) 2 J−1 MS B = AB interactions SS AB = K ∑ ∑ γˆ 2 = K ∑ ∑ Yij .1038.3. .05 = 3.Exercises 457 Table 3 Analysis of Variance for Two-way Layout with K (≥ 2) Observations Per Cell source of variance degrees of freedom 2 sums of squares mean squares MS A = SS A I −1 SS B J −1 A main effects ˆ SS A = JK ∑ α i2 = JK ∑ (Y i . There are many others as well.05. i =1 j =1 k =1 I J K ( ) 2 MS e = total SS T = ∑ ∑ ∑ Y ijk − Y.0. reject HB and accept HAB.3. However. Thus for α = 0. FAB = 0.1641. . i =1 j =1 k =1 ( ) 2 IJK − 1 — and FA = 0. . . j .3. j . .2 Show that rank X′ = IJ.

(28). . (31). 3 3 1 1 or μ1 + μ 3 + μ 5 − μ 2 + μ 4 + μ 6 etc.4 Apply the two-way layout with two observations per cell analysis of variance to the data given in the table below (take α = 0. the natural quantities to look into are of the following sort: 1 1 μ1 + μ 2 + μ 3 − μ 4 + μ 5 + μ 6 . I.3. DEFINITION 1 Any linear combination ψ = ∑iI= 1 ciμi of the μ’s. . i = 1. (33) and (34). SSB. . . SSAB and SST are given by (27). . 17. is said to be a contrast among the parameters μi. 3 3 We observe that these quantities are all of the form μ i − μ j . For the sake of simplicity. i = 1.4 A Multicomparison Method Consider again the one-way layout with J (≥ 2) observations per cell described in Section 17. 110 95 214 217 208 119 128 117 183 187 183 195 48 60 115 127 130 164 123 138 114 156 225 194 19 94 129 125 114 109 17. unspeciﬁed) we decided to reject it on the basis of the available data. . No conclusions are reached as to which speciﬁc μ’s may be unequal. where ci. let us suppose that I = 6. i ≠ j. we simply conclude that the μ’s are not all equal. . respectively. where SSe.3. or ( ) ( ) ( ) ( ) ∑ ci μ i i =1 6 with ∑ ci = 0. After rejecting H.05). SSA.458 17 Analysis of Variance 17. i =1 6 This observation gives rise to the following deﬁnition.1 and suppose that in testing the hypothesis H : μ1 = · · · = μI (= μ.3 Show that SST = SSe + SSA + SSB + SSAB. The multicomparison method described in this section sheds some light on this problem. . In rejecting H. I are known constants such that ∑iI= 1ci = 0. Let ψ = ∑iI= 1ciμi be a contrast among the μ’s and let .

. . I) the quantity PROOF f c1 . . ψ + Sσ (ψ )] is a conﬁdence interval simultaneously for all contrasts ψ with conﬁdence coefﬁcients 1 − α. Now. consider the following deﬁnition. clearly. cI). . Now it can be shown that the F test rejects the hypothesis H if and only if ˆ there is at least one contrast ψ such that ψ is signiﬁcantly different from zero. . . J i =1 ˆ ˆ ˆ ˆ ˆ ˆ where MSe is given in Table 1. . Next. if |ψ | > Sσ (ψ ). . Then the interval [ψ − Sσ (ψ ). cI) = f(γc1.n − I . . . . We say that ψ is signiﬁcantly different from zero. .n−I. i =1 I so that 1 I ˆ ˆ σ 2 ψ = ∑ ci2 MSe . . . . σ 2 ψ = ∑ ci2 MSe and S 2 = I − 1 FI −1. ( ) Consider the problem of maximizing (minimizing) (with respect to ci. γcI) for any γ > 0. The conﬁdence intervals in question are provided by the following theorem. . . 1 c′i = γ ci. J i =1 i =1 ˆ ˆ ˆ ˆ where n = IJ.a .. ψ + Sσ ˆ ˆ ˆ ˆ (ψ )] does not contain zero. f(c1. . ψ + ˆ ˆ Sσ(ψ )] is a conﬁdence interval with conﬁdence coefﬁcient 1 − α for all contrasts ψ. . ˆ ˆ ˆ ˆ ˆ ˆ according to the S (for Scheffé) criterion.1 and let ψ = ∑ ci μ i . − μi ) subject to the contrast constraint ∑ ci = 0. c I = ( ) 1 1 I 2 ∑ ci J i =1 I ∑ c (Y i i =1 I i.17. ( ) ( ) DEFINITION 2 ˆ Let ψ and ψ be as above. . I subject to the restraints i =1 I . i = 1. . . . i =1 I ∑ c i = 0. . γ cI) = f(c′. .4 A Multicomparison Method 459 I 1 I ˆ ˆ ˆ ψ = ∑ ci Yi . We will show in the sequel that the interval [ψ − Sσ(ψ ). . Therefore the maximum (minimum) of f(c1. . Thus following the rejection of H one would construct a conﬁdence interval for each contrast ψ and then would proceed to ﬁnd out which contrasts are responsible for the rejection of H starting with the simplest contrasts ﬁrst. i = 1. . . THEOREM 1 Refer to the one-way layout described in Section 17. . . subject to the restraint i =1 ∑ ci = 0. if the interval [ψ − Sσ(ψ ). cI′). where S2 = (I − 1)FI −1. is the same with the maximum (minimum) of f(γ c1. .α and n = IJ. equivalently.

. . To this end. . − μ i . . I ⎪ ∂ck ⎪ I ⎪ ∂h ⎪ = ∑ ci = 0 ⎬ ∂λ1 i =1 ⎪ I ⎪ ∂h 2 = ∑ c i − J = 0. . . . . . Because of this it is clear that q(c1. . We have ⎫ ∂h = Yk . ⎪ ⎪ ∂λ 2 i =1 ⎭ (37) Solving for ck in (37). I and λ1. . . cI) has both a maximum and a minimum. . . k = 1. . (38) Then the last two equations in (37) provide us with . . . λ2. . c I = ∑ ci Yi .460 17 Analysis of Variance ∑ c′ = 0 i i =1 I and 1 I 2 ∑ ci′ = 1. . . we get ck = 1 μ k − Yk . i =1 ( ) I ( ) subject to the constraints ∑ ci = 0 i =1 I I and ∑ ci2 = J . − μ k + λ1 + 2 λ 2 ck = 0. i = 1. cI) are to be found on the circumference of the circle which is the intersection of the sphere ∑ ci2 = J i =1 and the plane ∑ ci = 0 i =1 I which passes through the origin. λ1 . − μ i + λ1 ⎜ ∑ ci ⎟ + λ 2 ⎜ ∑ ci2 − J ⎟ ⎠ ⎝ i =1 ⎝ i =1 ⎠ i =1 ( ) ( ) and maximizes (minimizes) it with respect to ci. − λ1 2λ 2 ( ) k = 1. I . . J i =1 Hence the problem becomes that of maximizing (minimizing) the quantity q c1 . λ 2 = ∑ ci Yi . · · · . The solution of the problem in question will be obtained by means of the Lagrange multipliers. i =1 I Thus the points which maximize (minimize) q(c1. one considers the expression I ⎞ ⎛ I ⎛ I ⎞ h = h c1 . . . c I . .

. . n − I . . + Y. ⎡ I = − ⎢∑ μ k − Yk . . = − ∑ μ k − Yk . .) Therefore n−I J ∑i =1 μ i − μ . − Yi . − μ i I ( ) ) 2 1 J ∑i =1 ci2 I for all ci. . + Y. . k ) ( )] ) = − ∑ μ k − Yk . + Y. − μ i − μ . . . .1) and also independent of SSe I−1 which is σ 2χ 2 distributed. − Yk . ( (39) ∑ ci = 0. − μ k μ k − μ . . + Y. − Y. I ( ⎡ I = P ⎢ J ∑i =1 μ i − μ . k =1 ) + ( μ − Y )∑ ( μ k =1 − Yk . we have ck = ± J μ k − μ . . ) 2 ( ) ) 2 J ∑i =1 μ i − μ . . Next.. (See Section 17.17. Therefore k =1 I [( ) ( )] 2 ≤ 0. i =1 I [( ) ( )] 2 is σ 2χ 2 distributed (see also Exercise 17. − Yi . − Yi . .4. − Yi . − Yi . + Y. ∑ (Y k =1 I k. I . ( . . − J ∑i =1 μ i − μ . − Yi . − Y. + Y.4 A Multicomparison Method 461 2 J Replacing these values in (38). . + Y. I ( ) (I − 1) ) 2 MSe is FI−1. . k =1 I )( ) I ( ( )[(μ 2 k − Yk . ⎣ k =1 ( ) 2 2⎤ − I μ . k = 1. − Yi . − μ . + Y. I ( ) 2 ≤ ∑ I i =1 i c Yi . I λ1 = μ . − Yk . ⎦ (40) . ⎥ ⎦ ( ) = − ∑ μ k − Yk . Now we observe that i =1 J ∑ μ i − μ . − μ . − Yi . and λ2 = ± 1 ∑ (μ i =1 I i − μ . . . + Y.α MSe ≤ − J ∑i =1 μ i − μ . I such that I ≤ J ∑i =1 μ i − μ . ⎣ ( ) 2 ≤ (I − 1)F I −1 .n−I distributed and thus ⎡ P ⎢− ⎣ ⎤ ⎥ ⎦ (I − 1)F I −1 .1. . . i = 1. I . − Y. + Y. i =1 I ( ) 2 = J ∑ Yi . . − Y. n − I . − Y. . . .a ⎤ MSe ⎥ = 1 − α .

4 is distributed as σ 2χ 2 .1. − μ i I I I ⎤ 1 J ∑i = 1ci2 MSe ⎥ = 1 − α .) ▲ In closing. under the null hypothesis.95). i = 1.1 and construct conﬁdence intervals for all contrasts of the μ’s (take 1 − α = 0. we would like to point out that a similar theorem to the one just proved can be shown for the two-way layout with (K ≥ 2) observations per cell and as a consequence of it we can construct conﬁdence intervals for all contrasts among the α’s. ⎦ ( ) (I − 1)F I −1 . n − I . − Yi . . . . . 1969. for all contrasts ψ.462 17 Analysis of Variance From (40) and (39) it follows then that ⎡ P ⎢− ⎣ ≤ (I − 1)F I −1 .2 Refer to Exercise 17. + Y.4. mentioned in SecI tion 17. or the γ’s. ˆ ˆ ˆ ˆ ˆ ˆ P ψ − S σ ψ ≤ ψ ≤ ψ + Sσ ψ = 1 − α . I−1 ( ) 2 17. 23. as was to be seen.α 1 J ∑i = 1ci2 MSe ≤ ∑i = 1ci Yi . Vol.α for all ci.4. n − I . or equivalently. [ ( ) ( )] Exercises 17. Number 5. . I such that ∑iI= 1ci = 0. (This proof has been adapted from the paper “A simple proof of Scheffé’s multiple comparison theorem for contrasts in the one-way layout” by Jerome Klotz in The American Statistician. .1 Show that the quantity J ∑i =1 μ i − μ . or the β’s.

463 . r. . . certain estimation and independence testing problems closely connected with it are discussed. . . . . k are jointly normally distributed.1 Introduction In this chapter. m and μ the r. it follows that if Xi. k or the distribution of the r. k. . r. Also.’s Xi. . .1 Introduction 463 Chapter 18 The Multivariate Normal Distribution 18. then any subset of them also is a set of jointly normally distributed r. . ( Y = Y1 . . i = 1. μ ) . ∑m 1 cjYj + j= μ is distributed as N(μ. . ∑m 1 c2). .d. (1) (2) or in matrix notation X = CY + μ.18.’s distributed as N(0. .’s. Let Yj.’s with common distribution N(0.’s Xi. Let Yj. where ′ X = X1 . .v.i. DEFINITION 1 ( ) ′ ) C = c ij k × m . . j = 1. i = 1. .v.v. .v. m be i. Ym . . k-Variate) Normal. X i = ∑ cij Yj + μ i . vector X. j = 1. m be i. . . . k Thus we can give the following deﬁnition. . . that is. 1 and ( )( ) ′ μ = (μ . REMARK 1 From Deﬁnition 1. is called Multivariate (or more speciﬁcally. . 1). consider k such combinations. . . . i = 1. .v. . . j =1 m i = 1.d. . Then we know that for any constants cj. 1) and let the r.v. k. we introduce the Multivariate Normal distribution and establish some of its fundamental properties. be deﬁned by (1) or (2). j = 1. or the r. . . . . respectively. . X k .i. Now instead of considering one (nonj= j homogeneous) linear combination of the Y’s. . Then the joint distribution of the r. . . vector X.

1) and set X = CY + μ. ⎝ j =1 () ( ) [ ( )] ⎞ ( ) (4) ∑ t j c jm ⎟ (Y1 . / / Σ x or just Σ = CC ′. . φx of the r. We now proceed to ﬁnding the ch. THEOREM 1 The ch.d. of the r. .f. j = 1. and therefore the distribution of X. / Now we shall establish the following interesting result. vector X.f. . we have the following result. r. 2 ⎝ ⎠ From (6) it follows that φx. But ⎛ k t ′CY = ⎜ ∑ t j c j 1 . ( ) where μ and Σ are the parameters of the distribution. Ym ) ⎠ j =1 k ′ ⎞ ⎛ k ⎞ ⎛ k = ⎜ ∑ t j c j 1 ⎟ Y1 + ⋅ ⋅ ⋅ + ⎜ ∑ t j c jm ⎟ Ym ⎠ ⎝ j =1 ⎠ ⎝ j =1 and hence ⎛ k ⎞ ⎛ k ⎞ E exp it ′CY = φY ⎜ ∑ t j c j 1 ⎟ + ⋅ ⋅ ⋅ + φY ⎜ ∑ t j c jm ⎟ ⎝ j =1 ⎠ ⎝ j =1 ⎠ ( ) 1 m because 2 2 ⎡ ⎛ k ⎞ ⎛ k ⎞ ⎤ ⎢− 1 ∑ t j c j 1 − ⋅ ⋅ ⋅ − 1 ∑ t j c j m ⎥ = exp ⎟ ⎟ ⎥ ⎢ 2 ⎜ j =1 2 ⎜ j =1 ⎝ ⎠ ⎝ ⎠ ⎦ ⎣ ⎛ 1 ⎞ = exp⎜ − t ′CC′t⎟ ⎝ 2 ⎠ 2 2 (5) ⎛ k ⎞ ⎛ k ⎞ t j c j 1 ⎟ + ⋅ ⋅ ⋅ + ⎜ ∑ t j c j m ⎟ = t ′CC′t. / Σ EX = μ. Then the p. Xk)′.v. fx of X exists and is given by . we have ( ) (3) φ X t = E exp it ′X = E exp it ′ CY + μ = exp it ′μE exp it ′CY . is completely determined by means of its mean μ and covariance matrix Σ. tk)′ ∈ k . . ⎜∑ ⎝ j =1 ⎠ ⎝ j =1 ⎠ Therefore by means of (3)–(5).464 18 The Multivariate Normal Distribution From (2) and relation (10). . THEOREM 2 Let Yj. it follows that EX = μ and Σx = / CΣYC′ = CImC′ = CC′. . . Chapter 16. is given by / ⎛ ⎞ 1 / (6) φ x t = exp⎜ it ′μ − t ′Σt ⎟ .d. k be i. . . a fact analogous / to that of a Univariate Normal distribution. . Σ . . .i. . . that is.f. which has the k-Variate Normal distribution with mean μ and covariance matrix Σ.’s with distribution N(0. . . vector X = (X1. . . where C is a k × k non-singular matrix. . For t = (t1. . This fact justiﬁes the following notation: () / X ~ N μ.

−1 −1 −1 −1 −1 −1 −1 (8) fX x = 2π () ( ) −k 2 ′ ⎡ 1 / C −1 exp⎢− x − μ Σ −1 x − μ ⎣ 2 ( ) ( )⎥ .f. 2 σ1 ⎟ ⎠ and hence |Σ| = σ 2σ 2(1 − ρ2). X2 is the Bivariate Normal p.d. ⎦ 1 – 2 ⎤ Finally. X2)′ and the joint p. so that ||C|| = |Σ| . σ 2).f. given by (7) is called a non-singular k-Variate Normal. Also let ρ be the correlation coefﬁcient of X1 and 1 2 X2. Thus / / / Σ Σ 2 fX x = 2π () ( ) −k 2 / Σ −1 2 ′ ⎡ 1 / exp⎢− x − μ Σ −1 x − μ ⎣ 2 ( ) ( )⎥ . ′ / and (C ) (C ) = (C ′) C = (CC ′) = Σ . that is. x ∈ ⎣ 2 ⎦ where Σ = CC′ and |Σ| denotes the determinant of Σ. Y = C −1 X − μ . so that / Σ 1 2 / Σ −1 = 2 σ 1σ 2 1 − ρ 2 2 ( 1 ) ⎛ σ2 2 ⎜ ⎝ − ρσ 1σ 2 . from Σ = CC′. The use of the term non-singular correΣ sponds to the fact that |Σ| ≠ 0. one has |Σ| = |C| |C′| = |C| . let k = 2. ⎝ 2 j =1 ⎠ ∑ y = ( x − μ) (C k j =1 C −1 = C Therefore −1 ) (C )(x − μ) (see also Exercise 18.f. which. of X1. / / / Σ fX x = 2π () ( ) −k 2 / Σ −1 2 ( ) ( ) k . / / REMARK 2 COROLLARY 1 In the theorem. Then their covariance matrix Σ is given by / 2 ⎛ σ1 / Σ=⎜ ⎝ ρσ 1σ 2 ρσ 1σ 2 ⎞ σ2 ⎟ 2 ⎠ − ρσ 1σ 2 ⎞ .2). both X1 and X2 are normally distributed and let X1~ N(μ1.d.d. (7) PROOF From X = CY + μ we get CY = X − μ. ⎦ ⎤ as was to be seen. ▲ A k-Variate Normal distribution with p.18.1 Introduction 465 ′ ⎡ 1 ⎤ / exp⎢− x − μ Σ −1 x − μ ⎥. σ 2) and X2 ∼ N(μ2. PROOF By Remark 1. gives ( ) Therefore fX x = fY C −1 x − μ But ′ () 2 j [ ( ′ )] C −1 −1 = 2π ( ) −k 2 ⎛ 1 k ⎞ exp⎜ − ∑ y 2j ⎟ C −1 . since C is non-singular. Then X = (X1.1. the fact that Σ is of full rank.

▲ COROLLARY 2 2 ⎛ x − μ 2 ⎞ ⎤⎫ 2ρ ⎪ x1 − μ1 x 2 − μ 2 + ⎜ 2 ⎟ ⎥ ⎬. REMARK 3 Exercises ˆ 18. since independence implies noncorrelation in any case. given by (19″) and (19′) of .1. − x1 − μ1 ρσ 1σ 2 + x 2 − μ 2 σ 1 ⎜ 1 2 ⎟ ⎝ x2 − μ2 ⎠ (( ( ) ( ) ( ) ) 2 = x1 − μ1 σ 2 − 2 x1 − μ1 x 2 − μ 2 ρσ 1σ 2 + x 2 − μ 2 σ 1 . i = 1. The r. In particular.1 Use Deﬁnition 1 herein in order to conclude that the LSE β of β in (9) of Chapter 16 has the n-Variate Normal distribution with mean β and ˆ ˆ covariance matrix σ 2S−1.v. k are independent if and only if they are uncorrelated. . On the other hand. k are uncorrelated if and only if Σ is a / diagonal matrix and its diagonal elements are the variances of the X’s.’s Xi. as it has been seen elsewhere. It is also to be noted that noncorrelation without normality need not imply independence. . β 2)′. (β 1. so that Σ−1 itself is a diagonal matrix with the / i jth diagonal element being given by 1/σ 2.v. . x ) = 1 2 1 2πσ 1σ 2 − ⎧ 1 ⎪ exp⎨− 2 2 1−ρ ⎪ 2 1−ρ ⎩ ( ) ⎡⎛ x1 − μ1 ⎞ ⎢⎜ ⎟ ⎢⎝ σ 1 ⎠ ⎣ 2 as was to be shown. Xk (x . . Then |Σ| = σ 2 · · · σ2 . . σ 1σ 2 ⎝ σ 2 ⎠ ⎥⎪ ⎦⎭ ( )( ) The (normal) r. It follows that j PROOF fX 1 .466 18 The Multivariate Normal Distribution Therefore ′ 2 / σ 1 σ 2 1 − ρ 2 x − μ Σ −1 x − μ 2 ( )( ) ) ( ) − ρσ 1σ 2 ⎞ ⎛ x1 − μ1 ⎞ σ 2 ⎟ ⎜ x2 − μ2 ⎟ ⎠ ⎠⎝ 1 ⎛ σ2 2 = x1 − μ1 . . . x 2 − μ 2 ⎜ ⎝ − ρσ 1σ 2 ( ( ) ⎛ x − μ1 ⎞ 2 = x1 − μ1 σ 2 − x 2 − μ 2 ρσ 1σ 2 . . .’s Xi. . . 2 ) 2 ( )( ) ( ) 2 Hence fX 1 . x ) = ∏ k 1 k i =1 ⎡ 1 exp ⎢− xi − μ i 2 ⎢ 2σ i 2πσ i ⎣ 1 ( )⎥ ⎥ 2 ⎤ ⎦ and this establishes the independence of the X’s.X 2 (x . |Σ| Σ−1 is also a diagonal matrix with the jth / / Σ Σ / 1 n diagonal element given by ∏i≠j σ 2. . . . i = 1. . ▲ The really important part of the corollary is that noncorrelation plus normality implies independence. .

.2 Some Properties of Multivariate Normal Distributions 18. / μ Σ In particular. . 1 m 1 n 18. . is Multivariate Normal and specify its parameters. 2 j 18. all i1. . .v. Xj (1 ≤ m < k. vectors and let μ / cj be constants. .2 Verify relation (8). the r.1 Introduction 467 Chapter 16. . have the Bivariate Normal distribution with means and variances ˆ ˆ Eβ 1 = β1.1. . of the m-Variate Normal with mean Aμ and covariance matrix AΣA′. . we have () [ ( )] ( ( ) ) ( ) ( ) ′ ′ ⎡ ⎤ ⎡ ⎤ 1 1 / / φ Y t = exp⎢i A ′t μ − A ′t Σ A ′t ⎥ = exp⎢it ′ Aμ − t ′ AΣA ′ t ⎥ 2 2 ⎣ ⎦ ⎣ ⎦ and this last expression is the ch. ⎣ ⎦ so that by means of (6). . . the r. ∑ c 2 Σ j ⎟ j / ⎝ j =1 ⎠ j =1 j =1 . . jn). say. im ≠ from all j1.2 Some Properties of Multivariate Normal Distributions In this section we establish some of the basic properties of a Multivariate Normal distribution. . and Y has the Univariate Normal distribution with mean α′μ and variance μ / Σ α′Σα.3 Let the random vector X = (X1. Σ) (not necessarily non-singular). . if m = 1. . . . given Xj .18. . The particular case follows from / Σ the general one just established. Σj) k-dimensional r. vector Y deﬁned by Y = AX has the m-Variate Normal distribution with mean Aμ and covariance matrix AΣA′.f. Σ) and μ / suppose that Σ is non-singular. Then the r. Y = α′X. . as was to be seen. vector n n ⎛ n ⎞ X = ∑ c j X j is N ⎜ ∑ c j μ j . . . Then show that the conditional joint distribu/ tion of Xi . let Xj be independent N(μj. . Then for any μ / m × k constant matrix A = (αij). . σ 2 β2 = ( ) j n j =1 ∑ (x n j =1 σ2 j −x ) 2 and correlation coefﬁcient equal to − 18. ▲ () ( ) ( ) ( ) ( ) THEOREM 4 For j = 1. ∑ x n∑ x n j =1 . . . m + n = k. Y is a linear combination of the X’s. Eβ 2 = β2 and ˆ σ β1 = 2 ( ) σ 2 ∑j = 1 x 2j n n∑j = 1 x j − x n ( ) 2 ˆ . Xk)′ be N(μ. PROOF For t ∈ m . . . THEOREM 3 Let X = (X1. Xk)′ be distributed as N(μ. Xi . we have ′ ⎤ ⎡ φ Y t = E exp t ′Y = E exp t ′AX = E ⎢exp A ′t X ⎥ = φ X A ′t . . n.1.

. ⎥ ⎦ ⎤ −k / Σ / = 1 − 2it Σ . . vectors and let μ / X= ¯ Σ Then X is N(μ. distributed as χ 2 . (1/n)Σ).’s). of a k-Variate Normal with mean μ and covariance matrix Σ/(1 − 2it). . let Xj be independent N(μ. . n. Σ) and set Q = (X − μ)′Σ−1(X − μ).468 18 The Multivariate Normal Distribution (a result parallel to a known one for r. j =1 j j () n () n j =1 ( ) But so that ′ ′ ⎡ ⎤ 1 / φ X c j t = exp⎢i c jt μ j − c jt Σ j c jt ⎥ 2 ⎣ ⎦ ⎡ ⎤ 1 / = exp⎢it ′ c jμ j − t ′ c 2j Σ j t ⎥. n j =1 In the theorem. . 1 − 2it Now the integrand in the last integral above can be looked upon as the p.v. n. . Σj = Σ and cj = 1/n. .d.v. Xk)′ be non-singular N(μ. we have φ X t = ∏ φc X t = ∏ φ Xj c j t . . ▲ ⎠ 2 ⎝ j =1 ⎠ ⎦ ⎥ ⎢ ⎣ ⎝ j =1 () COROLLARY For j = 1. j = 1. Hence / ( ) . PROOF For t ∈ k and the independence of the Xj’s. . 2 ⎣ ⎦ j ( ) ( ) ( ( ) ( ) ) ( ) ⎡ ⎛ n ⎞ 1 ⎛ n ⎞ ⎤ / φ X t = exp ⎢it ′⎜ ∑ c j μ j ⎟ − t ′⎜ ∑ c 2j Σ j ⎟ t⎥. Σ) k-dimensional r. . .f. / μ / Σ Then Q is an r. taken μj = μ. / μ PROOF 1 n ∑Xj. . ▲ / / THEOREM 5 Let X = (X1. we have ′ ⎡ ⎤ / φQ t = E exp itQ = ∫ exp⎢it x − μ Σ −1 x − μ ⎥ 2π ⎣ ⎦ ′ ⎡ 1 ⎤ / × exp⎢− x − μ Σ −1 x − μ ⎥ dx ⎣ 2 ⎦ () ( ) k ( ) ( )( ) )( −k 2 / Σ −1 2 ( ) ( ) =∫ k (2π ) −k 2 / Σ −1 2 ′ ⎡ 1 ⎤ / exp⎢− x − μ Σ −1 x − μ 1 − 2it ⎥ dx 2 ⎣ ⎦ ( ) ( ) = 1 − 2it ( ) ∫ ( ) −k 2 k 2π −k 2 / Σ 1 − 2it −1 2 since −1 ⎡ 1 ′⎛ Σ ⎞ / × exp⎢− x − μ ⎜ ⎟ x−μ ⎢ 2 ⎝ 1 − 2it ⎠ ⎣ ( ) ( )⎥ dx. k PROOF For t ∈ .

respectively. of χ 2 . . . Y which is distributed as Normal with mean λ′μ and μ variance λ′Σλ for every λ ∈ k. Xk)′ with d.v.f. and then we proceed with a certain testing hypothesis problem. n j =1 ( ) and S = Sij . . .2. . . φ. . Xkn)′. respectively. providing estimators for μ and / Σ. . . λk)′ ∈ k. Xjk)′ be independent. . . .1 Consider the k-dimensional random vectors Xn = (X1n.3 Estimation of μ and ∑ and Test of Independence / 18. μ / ¯ and S/(n − 1) are unbiased estimators of μ and Σ. n. k. . . / Σ 18. .v. j = 1. for every λ = (λ. i = 1. .’s X and Y is the Bivariate Normal distribution. X k . Σ) μ / r. F and ch. to an r. and we write X n ⎯d → X. . n = 1. It can be shown that a multidimensional version of Theorem 2 in Chapter 8 holds true.f.18. X n ⎯d → X. . REMARK 4 Exercise 18. respectively. where X i = ∑ X ji . ▲ k Notice that Theorem 5 generalizes a known result for the onedimensional case. .f. / Now suppose that the joint distribution of the r. non-singular N(μ. THEOREM 6 For j = 1. vectors and set ′ 1 n X = X 1 . .3 Estimation of μ and Σ and a Test of Independence / First we formulate a theorem without proof. .1 Introduction 469 the integral is equal to one and we conclude that φQ (t) = (1 − 2it)−k/2 which is the ch. k =1 ( ) n ( )( ) Then ¯ i) X and S are sufﬁcient for (μ. if and only if n→∞ ⎯ λ ′ X n ⎯d → λ ′ X. k. . Σ). . ii) X / ¯ iii) X and S/n are MLE’s of μ and Σ.’s Fn. . Use this result (and also ⎯ Theorem 3′ in Chapter 6) in order to prove that X n ⎯d → X. .’s φn. . . where Sij = ∑ X ki − X i X kj − X j . . . . ⎯ n→∞ n→∞ where X is distributed as N(μ. . . Σ) if and only if {λ′Xn} converges in distribution μ / λ as n → ∞. 2. i. That is. . and X = (X1. let Xj = (Xj1. . . if Fn ( x) ⎯ → F ( x) for all x ∈ k for which F is continuous (see ⎯ ⎯ n→∞ n→∞ also Deﬁnition 1(iii) in Chapter 8). . In particular. Then we say that {Xn} converges in distribution to X as n → ∞.

j = 1. sample of size n(Xj. f. . n. σ 2 > 0. ρ ∈ 2 ⎩ ( ) 5 We have 2 g = g θ = g μ1 . y = ( ) 1 2πσ 1σ 2 1 − ρ 2 e −q 2 . x1 . σ 2 > 0. σ 2 . Thus the problem of testing independence for X and Y becomes that of testing the hypothesis H : ρ = 0. where the parameter space Ω is given by ′ ⎧ 2 Ω = ⎨θ = μ1 . . μ 2 . μ 2 . y1 . σ 1 . μ 2 ∈ . 1 q= 1 − ρ2 2 ⎡⎛ x − μ ⎞ 2 ⎛ x − μ1 ⎞ ⎛ y − μ 2 ⎞ ⎛ y − μ 2 ⎞ ⎤ 1 ⎢⎜ ⎟ ⎥. ⎟ +⎜ ⎟⎜ ⎟ − 2 ρ⎜ ⎢⎝ σ 1 ⎠ ⎝ σ1 ⎠ ⎝ σ 2 ⎠ ⎝ σ 2 ⎠ ⎥ ⎣ ⎦ Then by Corollary 2 to Theorem 2. . . ρ = 0 ⎬. ⎟ ⎠ n and 1 qj = 1 − ρ2 2 ⎡⎛ x − μ ⎞ 2 ⎛ x j − μ1 ⎞ ⎛ y j − μ 2 ⎞ ⎛ y j − μ 2 ⎞ ⎤ 1 ⎥ ⎢ j − 2 ρ⎜ ⎟⎜ ⎟ +⎜ ⎟ . the parameter space ω becomes ′ ⎧ 2 ω = ⎨θ = μ1 . σ 1 . For this purpose. μ 2 ∈ . . . consider an r. 2 2 2 2 2 j =1 ( ) n . n.470 18 The Multivariate Normal Distribution fX . 2 ⎭ whereas under H. 2 ⎭ ⎫ 2 . the r. . xn . σ 2 . . . ρ. Then their joint p. . Yj). we set g(θ) for log f(θ) θ θ considered as a function of the parameter θ ∈ Ω. For this purpose. σ 1 . . ⎢⎜ σ 1 ⎟ ⎝ ⎠ ⎝ σ1 ⎠ ⎝ σ 2 ⎠ ⎝ σ 2 ⎠ ⎥ ⎦ ⎣ j = 1. μ1 . we choose to derive them directly.v.’s X and Y are independent if and only if they are uncorrelated. μ 2 . − 1 < p < 1⎬.d. . σ 1 . yn 2 () ( ) (10) = n log 2π − n n n 1 2 log σ 1 − log σ 2 − log 1 − ρ 2 − ∑ q j . σ 1 . . we are going to employ the LR test. ρ ∈ 2 ⎩ ( ) 5 ⎫ 2 .f. And although the MLE’s of the parameters involved are readily given by Theorem 6. . σ 2 . . . μ1 .. is given by ⎛ 1 ⎜ ⎜ 2πσ σ 1 − ρ 2 ⎝ 1 2 where Q = ∑ qj j =1 n ⎞ ⎟ e −Q 2 .Y x. (9) For testing H. from the Bivariate Normal under consideration.

3.4) 1 2 ˜2 ˜2 ˜ σ 1 = S x . 2 ⎜ 2 σ 1σ 2 σ 2 ⎠ σ 1σ 2 1 − ρ ⎝σ1 S xy S x Sy (15) In (14) and (15).) (11) Solving system (11) for μ1 and μ2.5) that the values of the parameters given by (12) and (16) actually maximize f (equivalently. equating the partial deriva1 2 tives to zero and replacing μ1 and μ2 by μ1 and μ 2. Now let us set Sx = Sy = 1 ∑ yj − y n j =1 n 2 1 n ∑ xj − x . we obtain (see also Exercise 18. ⎪ σ 1σ 2 σ2 2 ⎭ (See also Exercise 18. σ1 σ2 σ1 σ2 ⎪ ⎭ (See also Exercise 18. (16) It can further be shown (see also Exercise 18. . n j =1 1 n and Sxy = ∑ x j − x y j − y . solving for σ 2. ρ = . . .3. n are given by (9).3. σ 2 and ρ. σ 2 = Sy . differentiating g with respect to ρ and equating the partial derivative to zero. Differentiating (10) with respect to μ1 and μ2 and equating the partial derivatives to zero. differentiating g with respect to σ 2 and σ 2. g) and the maximum is given by ⎛ ⎜ ⎜ e −1 ˆ max f θ . we get ˜ ˜ μ1 = x .1 Introduction 471 where qj. we get after some simpliﬁcations ρ 1 ρ 1 ⎫ y− x⎪ μ2 − μ1 = σ2 σ1 σ2 σ1 ⎪ ⎬.3) ρ− ⎞ ρ ⎛ 1 2ρ 1 1 Sx − S xy + 2 S y ⎟ + S xy = 0.1. ⎟ ⎟ ⎠ n (17) . respectively. we obtain after ˜ ˜ some simpliﬁcations ⎫ ρ 1 Sx − Sxy = 1 − ρ 2 ⎪ 2 σ 1σ 2 σ1 ⎪ ⎬. ρ 1 2 ⎪ Sy − Sxy = 1 − ρ . .3 Estimation of μ and ∑ and Test of Independence / 18. ρ 1 ρ 1 ⎪ μ1 − μ2 = x− y.2.3. we obtain after some simpliﬁcations (see also Exercise 18. θ ∈ Ω = L Ω = ⎜ S2 ⎜ 2π Sx Sy 1 − xy ⎜ Sx Sy ⎝ [() ] () ⎞ ⎟ ⎟ ⎟ .3. μ 2 = y.) (14) Next. n j =1 (12) ( ) ( ) 2 ( )( ) (13) Then. j = 1.18.

it follows that R 2 ≤ 1. σ 1.3.Ω ˆ 2. θ ∈ω = L ω = ⎜ ⎜ 2π S S x y ⎝ ⎞ ⎟ .7. by differentiation. so that PH(W < −c or W > c) = α. it is seen (see also Exercise 18.Ω = x . σ 2 Ω. μ 2 .Ω = y . R can be derived.3. (24) it is easily seen.f. σ 2.ω = Sy and ⎛ e −1 ˆ max f θ . this p. that W is an increasing function of R . That is.Ω. σ 2 and ρ. that is. for ρ = 0). respectively.Ω = Sx . (25) where c is determined. of the r. if we consider the function W =W R = ( ) n − 2R 1 − R2 .6) that the MLE’s of the parameters involved are given by ˆ ˆ ˆ2 ˆ2 μ1. under Ω. we get by means of (21).Ω.472 18 The Multivariate Normal Distribution It follows that the MLE’s of μ1. in order to be able to determine the cut-off point c0. . which we may now denote by μ1. that this test is equivalent to rejecting H whenever R 2 > c0 . R < − c0 or R > c0 . is none of the usual ones. c0 = 1 − λ2 n . σ 2 . μ 2 . so that PH(λ < λ 0) = α. σ 2 Ω and ρ Ω.ω = Sx . ⎟ ⎠ n (19) [() ] () n 2 (20) Replacing the x’s and y’s by X’s and Y’s.f.d. However.Ω = Sy . we have to know the distribution of R under H. ρΩ = Sxy Sx Sy .v.ω = x . we have that the LR statistic λ is given by ⎛ S2 ⎞ λ = ⎜ 1 − XY ⎟ S X SY ⎠ ⎝ = 1 − R2 ( ) n 2 . It is shown in the sequel that the distribution of W under H is tn−2 and hence c is readily determined. R = ∑ X j − X Yj − Y j =1 n ( )( ) ∑ (X j =1 n j −X ) ∑ (Y − Y ) 2 j j =1 n 2 . (See also Exercise 18. the test in (23) is equivalent to the following test Reject H whenever W < −c or W > c . Therefore. where λ 0 is determined. (21) where R is the sample correlation coefﬁcient.d. equivalently.Ω ˆ ˆ ˆ ˆ2 ˆ2 ˆ μ1. (22) From (22). 0 (23) In (23). Now although the p. (18) Under ω (that is.ω = y . σ 2 . ˆ Ω ˆ Ω ˆ 1. μ2. μ2. in (17) and (20).) Therefore by the fact that the LR test rejects H whenever λ < λ 0. are given by (12) and 1 2 (16). σ 1.

18. one rejects H whenever W < −c or W > c. Thus we have the following result. where W is given by (24). 1) r.9) that Wx = W* remains unchanged.v. . . Therefore the distribution of Wx. j = 1. By j j j means of the transformation in question.3. so that (see also Exercise 18. . . 1). It is readily seen that n R x = R ∗ = ∑ v j Yj v j =1 ∑Y j =1 2 j − nY 2 . j = 1. 1).8) ∗ Wx = W v = ∑ v j Yj j =1 n 2 ⎡n ⎛ n ⎞ ⎤ ⎢∑ Y j2 − nY 2 − ∑ v j Yj ⎥ ⎜ ⎟ ⎥ ⎢ j =1 ⎝ j =1 ⎠ ⎦ ⎣ ( n − 2) . . . Chapter 9. n are independent N(μ2. this transformation can be completed to an orthogonal transformation (see Theorem 8. equivalently the distribution of W.I(i) in Appendix I) and let Zj. j = 1.3 Estimation of μ and ∑ and Test of Independence / 18. Since this distribution is independent of the x’s. the statistic in (28) becomes Z2 Σ jn= 3 Z j2 ( n − 2) . Therefore we may v assume that the Y’s are themselves independent N(0. Next consider the transformation ⎧ 1 1 Y1 + ⋅ ⋅ ⋅ + Yn ⎪Z1 = ⎨ n n ⎪Z = v Y + ⋅ ⋅ ⋅ + v Y . n is tn−2. . THEOREM 7 For testing H : ρ = 0 against A : ρ ≠ 0 at level of signiﬁcance α. . it follows that the unconditional distribution of W is tn−2. . it is seen j j (see also Exercise 18. j = 1.’s Zj. j = 1. .’s Y′ = (Yj − μ2)/ σ2. Now if we consider the 2 N(0. .1 Introduction 473 Suppose Xj = xj. . . the sample correlation . . n and replace Yj by Y′ in (28). j = 3. . . given Xj = xj. . (26) Let also vj = x j − x so that n ( ) ∑ (x j =1 n j =1 j −x ) 2 . (27) Let also Wx = n − 2R x 1 − R 2x . n be the remaining Z’s. . 1 1 n n ⎩ 2 Then because of (1/√n)2 + · · · + (1/√n)2 = n/n = 1 and also because of (27). . Then by Theorem 5. it follows that the r. σ 2). . . n are independent N(0. Also ∑n= 1 Y 2 = ∑n= 1 Z2j by Theorem 4. . ∑v j =1 j = 0 and ∑v n 2 j = 1. Chapter 9.3. n and that ∑n= 1(xj − x )2 > 0 and set ¯ j R x = ∑ x j − x Yj − Y j =1 n ( )( ) ∑ (x j =1 n n j −x ) ∑ (Y − Y ) 2 j j =1 n 2 . .v. . (28) We have that Yj.

Σ) distribution has a p. = ⎡1 ⎤ π Γ⎢ n − 2 ⎥ ⎣2 ⎦ ( ) ( ) ( ) as was to be shown. . if Σ is non-singular. However. ⎛ n − 2 r ⎞ dw fR r = fW ⎜ ⎟ ⎝ 1 − r 2 ⎠ dr () ⎡1 ⎤ Γ⎢ n − 1 ⎥ 2 ⎣ ⎦ = ⎡1 π n − 2Γ ⎢ n − 2 ⎣2 ( ) ( ) ⎡ n − 2 r2 ⎢1 + ⎤⎢ n − 2 1− r2 ⎥⎣ ⎦ ( ( )( ) ) ⎤ ⎥ ⎥ ⎦ − n −1 2 ( ) n − 2 1− r2 ( ) −3 2 ⎡1 ⎤ Γ⎢ n − 1 ⎥ 2 ⎣ ⎦ 1 − r 2 ( n 2 )− 2 . one has the following corollary.d. then the N(μ. of the correlation coefﬁcient R is given by ⎡1 Γ⎢ n − 1 1 ⎣2 fR r = π Γ⎡ 1 n − 2 ⎢2 ⎣ () ( )⎤ ⎥ ) ( ⎦ 1 − r 2 ( n 2 ) − 2 . one has dw/dr = √n − 2(1 − r2)−3/2.474 18 The Multivariate Normal Distribution coefﬁcient R is given by (22) and the cut-off point c is determined from P(tn−2 > c) = α/2 by the fact that the distribution of W. under H.d. of R when ρ ≠ 0 can also be obtained. In this latter / case. w∈ .f. To this last theorem. whereas fW Therefore ( ) ⎡1 ⎤ Γ⎢ n − 1 ⎥ ⎣2 ⎦ w = ⎡1 π n − 2Γ ⎢ n − 2 ⎣2 ( ) ( ) ⎛ w2 ⎞ 1+ n − 2⎟ ⎤⎜ ⎝ ⎠ ⎥ ⎦ − n −1 2 ( ) . ▲ The p.f. Then its ch. that is. is tn−2. RW ≥ 0. ⎤ ⎥ ⎦ ( ) PROOF From W = √n − 2 r/√1 − R2. this is not the case if Σ is singular. Solving for R. / / which is given by (7). COROLLARY The p.f. We close this chapter with the following comment. the distribution is called singular. μ / Furthermore.f. it follows that R and W have the same sign.d. By setting w = √n − 2 r/√1 − r2. one has then R = W/√W 2 + n − 2. Σ). − 1 < r < 1. and it can be shown that it is concentrated in a hyperplane of dimensionality less than k. is given by (6). but its expression is rather complicated and we choose not to go into it. Let X be a kdimensional random vector distributed as N(μ.

. i.6 Show that the MLE’s of μ1.3. σ2 ¯ 18.18. μ 2 . Then show that the six numbers D0.’s Yj are replaced by the r. σ 2 . are indeed given 1 2 by (19).3 Verify relation (11).v. (See for example Mathematical Analysis by T. . 2 ≤ 1. ˜ θ=θ where 2 θ = μ1 . σ 2. 1957. Let D = (dij). Theorem 7. . 5 ˜ ˜ ˜1 ˜2 ˜ and denote by D5−k the determinant obtained from D by deleting the last k rows and columns. σ 1 . Verify relation (15). i. j = 1.9. . .8 Verify relation (28).4 Show that σ 2.v.3. 18. . . show that they are independent. .5 Consider g given by (10) and set dij = ∂2 gθ ∂θ i ∂θ j () . . indeed. Chapter 11). . M. and σ 2. j = 1. the MLE’s of the parameters μ1.) 18. D0 = 1.7 Show that by (22). ρ 2 ( ) ′ ˜ ˜ ˜ ˜2 ˜2 ˜ θ = μ1 .3. Apostol. σ 2 . Y j′ = Yj − μ 2 . under ω. 5 are continuous functions of θ implies that the quantities given by (18) are. 18.3. by using Basu’s theorem (Theorem 3. k = 1. . ρ ( ) ′ and μ1. μ 2 . . σ 2. σ 2 1 2 and ρ. pp.3. This result together with the fact that dij. Addison-Wesley. where is the sample correlation coefﬁcient given 18.3. μ2. .10 Refer to the quantities X and S deﬁned in Theorem 6 and. 18. D1. n. 151–152. . . . σ 2. and ρ given by (16) is indeed the solution of the ˜1 ˜2 ˜ system of the equations in (14) and (15).3. Verify relation (14). μ2. .2 18.3.3.9 Show that the statistic Wx (= W*) in (28) remains unchanged if the v r.’s . j = 1.1 18. σ 2 and ρ are given by (12) and (16). μ2.3. . under Ω. σ 1 . .1 Introduction Exercises 475 Exercises 18. . σ 2. 5. D5 are alternately positive and negative. 18.

i =1 j =1 n n where here and in the sequel the coefﬁcients of the x’s are always assumed to be real-valued constants. that is to say. where A = – (C + C′). n.476 19 Quadratic Forms Chapter 19 Quadratic Forms 19. j = 1. That is. . . . j = 1. in the variables xj. . Q. . . Q = ∑ ∑ cij xi x j .v. Q. i. n. j = 1. In matrix notation. . In this latter case. we introduce the concept of a quadratic form in the variables xj. xn)′ and C = (cij). . . j = 1. n. Thus A is symmetric. j = 1. . . . we formulate and prove a number of standard theorems referring to the distribution and/or independence of quadratic forms. 1 1 Therefore Q = – (x′C′x + x′Cx) = x′Ax. j = 1. i =1 j =1 n n (1) where cij ∈ follows: and cij = cji. . . . . By setting x = (x1. We can then 2 give the following deﬁnition. . .’s Xj. if 2 2 1 A = (aij). DEFINITION 1 A (real) quadratic form. . j = 1. . in the variables xj. n is a homogeneous quadratic (second degree) function of xj. (2) 476 . n. . . . then aij = – (cij + cji). or (x′Cx)′ = x′C′x = x′Cx. . . . . . . . we can write Q = x′Cx. so that aij = aji. A quadratic form. n and then conﬁne attention to quadratic forms in which the xj’s are replaced by independent normally distributed r. Q = ∑ ∑ cij xi x j .1 Introduction In this chapter. . Now Q is a 1 × 1 matrix and hence Q′ = Q. . n is a homogeneous quadratic function of xj. (1) becomes as Q = x ′Cx.

Therefore if for i = 1. we suppose that ∑k= 1ri = n and show that for i = 1. C = (cij) and C′ = C (which expresses the symmetry of C). DEFINITION 2 For an n × n matrix C. negative deﬁnite or positive semideﬁnite. n are independently distributed as N(0. ⎝ ⎠ ⎝ ⎠ 2 2 i i i i (4) (i) where δ 1 . . Qi are j j n independent χ 2 . k. . r i i We have that X′X = ∑ n= 1X 2 is χ 2 . .2 Some Theorems on Quadratic Forms 477 where x = (x1. δ (i) are either 1 or −1. xn)′. or X: Q = X ′CX. respectively.’s Qi are independent χ 2 if and only if ∑k= 1ri = n. . . . it is called negative deﬁnite if x′Cx < 0 for every x ≠ 0 and positive semideﬁnite if x′Cx ≥ 0 for every x. j = 1.19. . r i Next. negative deﬁnite or positive semideﬁnite if the quadratic form associated with it. Xn)′. By Theorem 11. k. . then the rank of C is also called the rank of Q. . A symmetric n × n matrix C is called positive deﬁnite. . Now ∑k= 1ri = n and let B be the n × n r i matrix deﬁned by . . DEFINITION 3 DEFINITION 4 19. . . . To this end. Q = x′Cx. . . j = 1. . ∑k= 1ri = n. where C′ = C. .v. THEOREM 1 (Cochran) Let X ′X = ∑ Qi . there exist ri linear forms in the X’s such that PROOF i i (i ) (i ) (i ) (i ) (i ) (i ) Qi = δ 1 ⎛ b11 X 1 + ⋅ ⋅ ⋅ + b1n X n ⎞ + ⋅ ⋅ ⋅ + δ r ⎛ b r 1 X 1 + ⋅ ⋅ ⋅ + b r n X n ⎞ .v. 1) and we set X = (X1. . n. .2 Some Theorems on Quadratic Forms Throughout this section.I(ii) in Appendix I. . k. one has that Qi = X′CiX. is positive deﬁnite. . . Qi are quadratic forms in X with rank Qi = ri. The n roots of the equation |C − λIn| = 0 are called characteristic or latent roots or eigenvalues of C. .’s Xj. The quadratic form Q = x′Cx is called positive deﬁnite if x′Cx > 0 for every x ≠ 0. We then replace x by X in (2) and obtain the following quadratic form in Xj. Then the r. . . Consider the matrix Ci. i =1 k (3) where for i = 1. then because of (3). If Q = x′Cx. the polynomial (in λ) |C − λIn| is of degree n and is called the characteristic polynomial of C. . . . Qi are i independent χ 2 . . where Ci is an n × n r symmetric matrix with rank Ci = ri. Some theorems related to such quadratic forms will now be established. . . it is assumed that the r.

The proof is r completed. . then the r. . that is to say. Also the fact that D = In and the transformation Y = BX imply. From the deﬁnition of D. so that r = n. . Yn)′. . On the other hand. n.I(ii) in Appendix I) and so is (MM′)−1. It has been seen elsewhere that 2 2 ⎡ n Z −μ n ⎛Z −Z⎞ ⎛ Zj − μ ⎞ j = ∑⎜ +⎢ ∑⎜ σ ⎟ σ ⎟ ⎢ σ ⎠ ⎠ j =1 ⎝ j =1 ⎝ ⎣ n ( )⎤ . . Also n = rank In = rank (B′DB) ≤ r. of course. By Theorem 5. r ∑ Qi = X ′X i =1 k and (BX)′D(BX) = X ′(B′DB)X. Let r = rank B. by means of (4). ) identically in X. . It follows that B is nonsingular and therefore the relationship B′DB = In implies D = (B′)−1B−1 = (BB′)−1. Q2 is the sum of the squares of the next r2 Y’s.i. . ▲ i APPLICATION 1 For j = 1. it follows that. 1). . . j = 1. k. which implies that D = In and hence B′B = In. On the other hand.v. . Qi are independent χ 2 . MM′ is positive deﬁnite (by Theorem 10. . . B is orthogonal. so that rank D = n. k. . From the form of D.478 19 Quadratic Forms ⎧b(1) ⋅ ⋅ ⋅ b(1) ⎫ 1n ⎪ ⎪ 11 ⎪b(1) ⋅ ⋅ ⋅ b(1) ⎪ rn ⎪ ⎪ r1 B=⎨ . Thus (BB′)−1 is positive deﬁnite and hence so is D. .’s distributed as N(μ. . Therefore (5) gives X ′X = X ′ B′DB X ( Hence B′DB = In. Qk is the sum of the squares of the last rk Y’s. (k ) (k ) ⎬ ⎪b11 ⋅ ⋅ ⋅ b1n ⎪ ⎪ (k ) (k ) ⎪ ⎪br 1 ⋅ ⋅ ⋅ br n ⎪ ⎭ ⎩ 1 1 k k Then by (4) and the deﬁnition of B. that Q1 is equal to the sum of the squares of the ﬁrst r1. δ (i) .v. n are independent N(0. for i = 1. so that the X’s are i. ⎥ 2 ⎥ ⎦ equivalently. . let Zj be independent r. . . . it follows that ||D|| = 1. . σ 2) and set Xj = (Zj − μ)/σ. . . . r ≤ n. .d. .’s Yj. it is clear that ∑ Qi = (BX )′D(BX ). . i = 1. Then. if Y = (Y1. It follows that. Y’s. distributed as N(0. Chapter 9. . i =1 i k (5) where D is an n × n diagonal matrix with diagonal elements equal to (i) δ 1 . Set Y = BX. 1). it follows then that all diagonal elements of D are equal to 1. for any nonsingular square matrix M. . .

then Q = X ′CX = PY ′C PY = Y ′ P ′CP Y = ∑ λ j Y j2 .19. Then Theorem 1 applies with k = 2 and gives that ¯ ¯ ∑ n= 1(Xj − X )2 and (√nX )2 are independent distributed as χ 2 and χ 2. 1) r. Namely. Suppose that C is idempotent and that rank C = r. . The following theorem refers to the distribution of a quadratic form in the independent N(0. ( ) (7) Then Theorem 1 applies with k = 2 and gives that X′CX is χ 2 (and also r X′(In − C)X is χ 2 ).I(iii) in Appendix I. Then we ﬁrst show that rank C = r. ( ) (6) Also X ′X = X ′CX + X ′ I n − C X. Thus it follows that (1/σ 2)∑n= 1(Zj − Z )2 is distributed as χ 2 and is j n−1 ¯ independent of Z . Then Q is distributed as χ 2 if and only r if C is idempotent (that is. Then by Theorem 12. where C1 is given by ⎧ n−1 n ⎪ ⎪ −1 n C1 = ⎨ ⋅ ⎪ ⎪ −1 n ⎩ ( ) (n − 1) ⋅ −1 n ⋅⋅⋅ n ⋅⋅⋅ ⋅⋅⋅ ⋅⋅⋅ −1 n ( −1 n ⎫ ⎪ −1 n ⎪ ⎬ ⋅ ⎪ n − 1 n⎪ ⎭ ) and that rank C1 = n − 1. j = 1. n. . THEOREM 2 Consider the quadratic form Q = X′CX.2. there exists an orthogonal matrix P such that if Y = P−1X (equivalently. . so that rank C2 = 1.’s Xj.v. n−r Assume now that Q = X′CX is χ 2. respecj n−1 1 ¯ tively. j =1 ( ) ( ) ( ) m (8) . we have PROOF rank C + rank I n − C = n. ⎝ ⎠ 1 2 2 Now ⎞ ⎛ ⎞ 1⎛ n ⎜ n X ⎟ = ⎜ ∑ X j ⎟ = X ′C 2 X.1) that ∑ (X j =1 n j −X ) 2 = X ′C1X.2 Some Theorems on Quadratic Forms 479 ∑ X j2 = ∑ X j − X j =1 j =1 n n ( ) 2 ⎛ ⎞ + ⎜ n X⎟ . X = PY). C2 = C) and rank C = r. . Next it can be shown (see also Exercise 19. ⎝ ⎠ n ⎝ j =1 ⎠ 1 2 2 2 where C2 has its elements identically equal to 1/n.I(iii) in Appendix I. By r Theorem 11.

. ▲ APPLICATION 2 Refer to Application 1. (11) It follows then that rank C = r. . so that j X′CX is equal to ∑ jr= 1Y 2. respectively.f. so that the Y 2’s are independent χ 2. j=1 or equivalently 1 σ2 ∑ (Z j =1 n j −Z ) 2 and ⎡ n Z −μ ⎢ ⎢ σ ⎣ ( )⎤ ⎥ ⎥ ⎦ 2 are distributed as χ 2 and χ 2. This completes the proof of the theorem. one concludes that C = C2. . . its diagonal elements are either 1 or 0. Yn)′ = Y and λj. one has that Q = X′CX is equal to ∑ jr= 1Y 2. (10) From (8)–(10). We now show that C2 = C. . COROLLARY 1 If the quadratic form Q = X′CX is distributed as χ 2. Hence P′CP is idempotent. j Thus X′CX ≥ 0 for every X. Therefore the ch. n−1 1 To this theorem there are the following two corollaries which will be employed in the sequel. m are the nonzero characteristic roots of C.480 19 Quadratic Forms where (Y1. By the orthogonality of P. . . −1 −1 (12) Multiplying by P′ and P on the left and right. Xn)′ and (Y1. . . one then has that (see also Exercise 19.2) and m = r. by (11). 1 j j evaluated at t. Q is χ 2 by assumption. (9) On the other hand. evaluated at t. Then Theorem 2 implies that ∑ n (Xj − X )2 and (√n 1/2X )2. respectively. . 1) (Theorem 5. Yn)′ = Y = P−1X.2.2. . Chapter 9). one has that P′CP is diagonal and.f. so that its ch. of ∑ m= 1λjY 2. PROOF From (8) and (10). j = 1. . P ′CP = P ′C 2P. It can be shown (see also Exercise 19. . ▲ . From (8). .. where X = (X1. is given by [(1 − 2iλ t ) ⋅ ⋅ ⋅ (1 − 2iλ t )] 1 m −1 2 . 2 2 n ( That is. the Y’s are independent N(0. is r given by (1 − 2it ) λ 1 = ⋅ ⋅ ⋅ = λm = 1 −r 2 .3) that C1 and ¯ ¯ C2 are idempotent. then it is positive r semideﬁnite. Thus P ′CP = P ′CP ) = (P′CP)(P′CP) = P ′C(PP ′)CP = P ′CI CP = P ′C P. as was to be seen. . both sides of (12). .

I in Appendix I. For i = 1. where Bi = P ′Ci P.2 Some Theorems on Quadratic Forms 481 COROLLARY 2 Let P be an orthogonal matrix and consider the transformation Y = P−1X. . We have (P′CP) 2 2 = P ′C PP ′ CP = P ′CCP = P ′CP ( ) since C = C. . That rank P′CP = r follows from Theorem 9. . 2. Then the assumption that Q1 is χ 2 implies (by r Theorem 2) that C1 is idempotent and rank C1 = r1. Q1 and Q2 are quadratic forms in X. ▲ 2 2 1 APPLICATION 3 ¯ ¯ Refer to Application 1. . 1). Also rank C1 + rank (In − C1) = n by Theorem 12. r PROOF By the theorem. From this result and (13). ▲ THEOREM 3 Suppose that X′X = Q1 + Q2. . In − C1 is idempotent. Then. B1 and B2. X = PY). Q2 are independent. so is the quadratic form Q* = Y′(P′CP)Y. ∑ j=1(Xj − X n−1 ¯ ¯ (√nX )2. r 1 2 THEOREM 4 PROOF Let Q = X′CX. then Q is transformed into Y′(P′CP)Y = ∑jr= 1Y 2. and let Q1 be χ 2 . that is. Furthermore. let Q1 be χ 2 and let Q2 be positive semideﬁnite. Then. Next PROOF 1 Q2 = X ′X − Q1 = X ′X − X ′C1X = X ′ I n − C1 X ( ) and (In − C1) = In − 2C1 + C = In − C1. 2. it sufﬁces to show that P′CP is idempotent and that its rank is r.19. Then Q2 is χ 2 and Q1. Suppose that Q = Q1 + Q2. and therefore Theorem 1 applies and gives the result.Y ) = ∑ Y 1 r 1 r j =1 r 2 j = Q∗ + Q∗ . that is.I(iv) in Appendix I. Then r r Q2 is χ 2 . We have then that rank Q1 + rank Q2 = n. Then if the quadratic form Q = X′CX is χ 2. so that rank (In − C1) = n − r1. r n−r 1 1 Let Q1 = X′C1X. where Q. . (1/σ 2)∑n= 1(Zj − Z )2 is distributed as χ 2 and is j n−1 ¯. where Q1.Y )′(Y . C is idempotent and rank C = r. 1 2 (13) By Corollary 1 to Theorem 2. . it follows that (√nX )2 is χ 2. . it 1 2 . Qi∗ = Y ′Bi Y. By Theorem 11. Since √nX is N(0. or equivalently. Hence the result. by Theorem 2. 1 n ¯)2 is distributed as χ 2 and is independent of by Theorem 3. The equation Q = Q1 + Q2 implies (Y . let Qi = X′CiX and let Q* be the quadratic j i form in Y into which Qi is transformed under P. Q2 are quadratic forms in X. where r2 = r − r1. let Q be χ 2. and Q1.I(iii) in Appendix I.I(iv) in Appendix I. it follows that there exists an orthogonal matrix P such that if Y = P−1X (equivalently. i = 1. it follows that C1 is positive semideﬁnite and so is C2 by assumption. Therefore by Theorem 10. Q2 are independent. Q* and Q* are positive semideﬁnite. Thus once again. independent of Z The following theorem is also of interest.

. it follows that Q* is m positive semideﬁnite. By our assumptions and Corollary 1 to Theorem 2. Qm −1 . . . k (≥ 2) are quadratic forms i in X. we have that the r. On the other hand. . Then Qk is χ 2 . By the fact that r i . 2. . Once again Theorem 4 applies r* r and gives that Qm+1 is χ 2 . r are independent N(0. and for i = 1. m + 1 are also independent and the proof is concluded. Hence Q* is χ 2 . . 2. Then Q1 and Q2 are independent if and only if C1C2 = 0.’s Yj. i =1 m and that Qm and Qm+1 are independent. More precisely.v. let Q be χ 2. r* m m ∗ rm = r − ∑ ri . . Q* is χ 2 by Corollary 2 to Theorem 2. Furthermore. i = 1. We write PROOF ∗ Q = ∑ Qi + Qm . suppose that C1C2 = 0. . . . r only.482 19 Quadratic Forms follows that Q* and Q* are functions of Yj. . by the induction hypothesis. . i = 1. . . These 1 r facts together with (13) imply that Theorem 3 applies (with n = r) and provides the desired result. . where Q* m m is χ 2 . . . j = 1. It follows that Qi. 1) and Qi χ 2 . . . . .’s Yj. let Qi be χ 2 . that is. σ 2). i =1 m −1 where ∗ Qm = Qm + Qm +1 . From the 1 2 orthogonality of P. . THEOREM 5 Suppose that Q = ∑k= 1Qi. . i = 1. i =1 k −1 and Qi . let Qi be quadratic forms in Y = (Y1. . THEOREM 6 Consider the independent r. ▲ The theorem below gives a necessary and sufﬁcient condition for independence of two quadratic forms. . For k = 2 the conclusion is true by Theorem 4. . where r i k rk = r − ∑ ri . Qi = Y′CiY. . i =1 m −1 and ∗ Q1 . j = 1. ▲ 1 This last theorem generalizes as follows. . . . . where Q and Qi. . i = 1. we have the following result. . Qm is χ 2 and Qm+1 is positive semideﬁnite. Let the theorem hold for k = m and show that it also holds for m + 1. . i = 1. Yn)′. Qm are independent. k − 1 and let Qk be r r positive semideﬁnite. Thus Q* = Qm + Qm+1. where r m m m+1 ∗ rm +1 = rm − rm = r − ∑ ri . n. k are independent. j = 1. PROOF The proof is presented only for the special case that Yj = Xj N(0. 1). The proof is by induction.v. . To this end. . where Yj is distributed as N(μj.

in Appendix I. 2 are idempotent. X′(C1 + C2)X is a quadratic form in X r distributed as χ 2 +r . the case as is easily seen. n−p . Write Y as follows: Y = X′β + (Y − X′β) and show that: ˆ ii) ||Y||2 = Y′X′S−1XY + ||Y − X′β||2.4 Consider the usual linear model Y = X′β + e. This concludes the proof of the theorem. Next by the symmetry of Ci. So the matrices r C1. Then Theorem 12.3 Refer to Application 2 and show that the matrices C1 and C2 are both idempotent. it follows that r r Q1 + Q2 = X′(C1 + C2)X is χ 2 +r . 19.’s Y′X′S−1XY and ||Y − X′β||2 are independent. Since Q1 is χ 2 and Q2 is χ 2 . where X is of full rank β ˆ ˆ ˆ p. That is.2. r that is.19.2 Some Theorems on Quadratic Forms Exercises 483 Qi is distributed as χ 2 . 19.v. according to the conclusion reached in discussing those applications. i = 1. Thus C1 + C2 is idempotent by Theorem 2. C 2 = Ci. n−1 1 This should imply that C1C2 = 0. Then Theorem 12. This is.2. On the other hand. 2. we have X ′X = X ′C1X + X ′C 2 X + X ′ I n − C1 − C 2 X. Q2 be independent. clearly. 19. Exercises 19. ˆ ii) The r.1 Refer to Application 1 and show that ∑ (X j =1 n j −X ) 2 = X ′C1X as asserted there. respectively. (15) and Theorem 1 imply that X′C1X = Q1. by Theorem 6. ▲ 1 2 1 2 1 2 REMARK 1 Consider the quadratic forms X′C1X and X′C2X ﬁguring in Applications 1–3.2. it follows (by Theorem 2) that Ci. X′C2X = Q2 (and X′(In − C1 − C2)X) are independent. applies and gives that C1C2 (= C2C1) = 0. ( ) (14) ( ) (15) Then relations (14).2 Justify the equalities asserted in (11). i = 1. implies that ( ) ( ) rank C1 + rank C 2 + rank I n − C1 − C 2 = n. Let now Q1. X′C1X and X′C2X are distributed as χ 2 and χ 2. C2 and C1 ± C2 are all idempotent. the ﬁrst being distrib2 uted as noncentral χ p and the second as χ 2 .2. indeed. in Appendix I. and let β = S−1XY be the LSE of β. one has C2C1 = C′ C′1 = 2 i (C1C2)′ = 0′ = 0. Therefore i C 1 I n − C 1 − C 2 = C 2 I n − C 1 − C 2 = 0.I(iv).I(iii). Then.

v.2. X3 be independent r. k. . .v. .δ if and only if ∑k= 1ri = n. . are independent.v. Qi i are quadratic forms in Y. . . .’s β 1. i = 1.v. where 2 2 ˜ σ is the LSE of σ . . .2. and set Y = (Y1. . . let Yj be independent r. . X2. μ = (μ1. .5 Let X1. 19.’s Qi are independent χ 2 . 1) and let the r. .7 For j = 1.2. .’s β 2. . [Hint: The proof is presented along the same lines as that of Theorem 1. μn)′. ∑3= 1X 2 − Q.v. k.6 Refer to Example 1 in Chapter 16 and (by using Theorem 6 herein) ˆ ˜ ˆ ˜ show that the r. 2 3 6 Then ﬁnd the distribution of Q and show that Q is independent of the r.’s distributed as N(0.v.v. where for i = 1. . where the noncentrality r i parameter δi = μ′Ciμ. . with rank Qi = ri. n. 1).484 19 Quadratic Forms 19. Yj being distributed as N(μj.] i i ( ) . Then show that the r. Let Y′Y = ∑k= 1Qi. σ 2. j j Q= 19. Q be deﬁned by 1 2 5X 1 + 2 X 2 + 5X 2 + 4 X1 X 2 − 2 X1 X 3 + 4 X 2 X 3 . Qi = Y′CiY. .’s. Yn)′. . as well as the r. σ 2.

that is. X n ⎯⎯→ μ . That is. .1 Nonparaetric Estimation 485 Chapter 20 Nonparametric Inference In this chapter. we discuss brieﬂy some instances of nonparametric. is weakly Then it has been shown that X ⎯ consistent. let Xj. The only assumption made is that the X’s have a ﬁnite (unknown) ¯ mean μ. r. N 0. inferences which are made without any assumptions regarding the functional form of the underlying distributions. consistent in the probability sense. n be i. This is justiﬁed on the basis of the SLLN’s. 1 n Xn = ∑ X j .s. namely. that is. Let X n be the sample mean of the X’s. 20. . .i.v. n→∞ σ Thus if zα/2 is the upper α/2 quantile of the N(0. It has also been mentioned that X n is strongly consistent. j = 1. a. 1 . n →∞ Let us suppose now that the X’s also have ﬁnite (and positive) variance σ 2 which presently is assumed to be known. Then. To this end. . distribution-free inference.d.1 Nonparametric Estimation At the beginning. we should like to mention a few cases of nonparametric estimation which have already been discussed in previous chapters although the term “nonparametric” was not employed there. The ﬁrst part of the chapter is devoted to nonparametric estimation and the remaining part of it to nonparametric testing of hypotheses. viewed as an estimator of μ. or more properly. n j =1 ¯n.’s with certain distribution about which no functional form is stipulated. according to the CLT.20. 1) distribution. This is n→∞ ¯ so by the WLLN’s. then n Xn − μ ( ) ⎯d→ Z ⎯ ( ) 485 . Thus X n ⎯P → μ .

X n = X n − zα n and ( ) Sn 2 n . .f. U*] is a conﬁdence interval for μ with asymptotic conﬁdence n n coefﬁcient 1 − α. Let F be the (common and unknown) d. ( ) L∗ = L∗ X 1 . suppose that σ 2 is unknown and set S 2 for the sample variance of the n X’s. .486 20 Nonparametric Inference ⎤ ⎡ n Xn − μ P ⎢−zα 2 ≤ ≤ zα 2 ⎥ ⎥ ⎢ σ ⎦ ⎣ ⎡ σ = P ⎢ X n − zα 2 ≤ μ ≤ X n + zα ⎢ n ⎣ ( ) 2 σ ⎤ ⎯ ⎥ ⎯ → 1 − α. 1 . ensured that S 2 . A further instance of point nonparametric estimation is provided by the following example. . X n = X n − zα and U n = U X 1 . . X n = X n + zα ( ) σ 2 n ( ) σ 2 . here Ln = L X 1 . . Un] is a conﬁdence interval for μ with asymptotic conﬁdence coefﬁcient 1 − α. . . S2 = n 2 1 n ∑ X j − Xn . . viewed n as an estimator of σ 2. was both a weakly and strongly consistent estimator of σ 2. Also by the corollary to Theorem 9 of Chapter 8. ⎥ n ⎦ n→∞ so that [Ln. n j =1 ( ) Then the WLLN’s and the SLLN’s. n we have that [L*. namely. . 2 U ∗ = U ∗ X 1 . n Next. the examples mentioned so far are cases of nonparametric point and interval estimation. of the Xi’s and set Fn for their sample or empirical d. . . X n = X n + zα n ( ) Sn . . .f. it follows that n Xn − μ Sn By setting ( ) ⎯d→ Z ⎯ n→∞ N 0. . that is.. Clearly. properly applied. . .

is assumed to be unknown. f which is assumed to be of the continuous type. which appeared in The Annals of Mathematical Statistics. n be i.f.d. j = 1.f. ·) is a strongly consistent estimator of F(x) and for almost all s ∈ S and every ε > 0.d. ( ) () ( ) provided n ≥ n(ε.d. . then f x = lim h →0 () F x+h −F x−h 2h ( ) ( ( ).20. recall that if F is the d. an unknown d. . (2) Thus Fn(x. which. x ∈ . s + ε .d. First we shall try to give a motivation to the estimates to be employed in the sequel. namely that of constructing tolerance intervals. s = ( ) 1 the number of X 1 s .F. corresponding to the p. j = 1. . The relevant proofs can be found in the paper “On estimation of a probability density function and mode.f.’s Xj.f. we shall consider the problem of estimation of an unknown p. 1065–1076. . 487 Fn x. A signiﬁcant amount of work has been done regarding the estimation of f(x). we have Fn x.’s with (the common) p. 20.i. is assumed to be the unknown one F. F was estimated by the sample d. Fn based on the i.f. . we report some of these results without proofs. Then it was stated in Chapter 8 (see Theorem 6) that Fn x. This suggests estimating f(x) by the (known) quantity ˆ fn x = () Fn x + h − Fn x − h 2h ( ) ). X n s ≤ x . . s ∈S .d.→ F x ⎯ n→∞ ( ) () uniformly in x∈ . In this section.i. .v. . f.v. At the end of the previous section. . Thus. We close this section by observing that Section 5 of Chapter 15 is concerned with another nonparametric aspect. . 33 (1962). . To this end. To this end. pp.2 Nonparametric Estimation of a p. . . r.f.f. s) independent of x ∈ . s). However.2 Nonparametric Estimation Estimation 20.f. the quantity [F(x + h) − F(x − h)]/2h should be close to f(x).” by E. r. s − ε ≤ F x ≤ Fn x. Parzen. ⋅ ⎯a. of course. n whose (common) d.D. n [ () () ] (1) We often omit the random element s and write Fn(x) rather than Fn(x. x ∈ . for (0 <) h sufﬁciently small.d. let Xj. In the present section.s. Vol.1 Nonparaetric of a P.

.’s with (unknown) p. . K by means of (3). .i. . j = 1. n be i. ⎪ ⎭ Next.f.d.1) that () ( ] (3) 1 n ⎛ x − Xj⎞ ˆ fn x = . x ∈ . For this purpose.f. otherwise.: ⎧ 1 . satisfying conditions (4). X n ≤ x − h ⎞ ⎟ n ⎠ 1 the number of X1 . r.d. let {hn} be a sequence of positive constants . hn ⎟ ⎠ (6) Then we may formulate the following results. . f and let K be a p.f. . .v. . . deﬁned on into itself and satisfying the following properties: () ( ] ⎫ ⎪ ⎪ lim xK x = 0 as x → ∞ ⎬ ⎪ K −x = K x . .488 20 Nonparametric Inference ˆ fn x = () Fn x + h − Fn x − h ( ) ( ) 2h 1 ⎛ the number of X1 . x + h ˆ fn x = n 2h and it can be further easily seen (see Exercise 20. X n in x − h. ⎩ ˆn(x) of f(x) is expressed in terms of a known Thus the proposed estimator f p.d. s). x ∈ { () ( ) () }<∞ (4) () hn ⎯ → 0. . . let {hn} be a sequence of positive constants such that sup K x . . deﬁne the r.f. if x ∈ −1. as follows: 1 ˆ fn x = nhn () ∑ K⎜ ⎝ j =1 n ⎛ x− Xj ⎞ . let K be any p. ∑ K⎝ nh j=1 ⎜ h ⎟ ⎠ () where K is the following p.d. X n ≤ x + h = ⎜ 2h ⎝ n − = the number of X1 . .f.v. . 2h n ( ] that is 1 the number of X1 .2. to be For each x ∈ shortened to fˆn(x). x + h . 1 ⎪ K x = ⎨2 ⎪0. .d. . . Also. X n in x − h. This expression also suggests an entire class of estimators to be introduced below. fˆn(x. . THEOREM 1 Let Xj.d. ⎯ n→∞ (5) and by means of K and {hn}. .

As an illustration.’s of the type K satisfying conditions (4). 2 2 satisﬁed.1 Nonparaetric of a P. As for the sequence {hn}. ⎯ Now let {hn} be as above and also satisfying the following requirement: nhn ⎯ → ∞. when properly normalized. if it happens to be known that f belongs to a class of p.D.f.d.⎯→ f x ⎯ n→∞ () () uniformly in x ∈ .v. provided f is uniformly continuous. is also asymptotically normal.f.f. Take K x = () 1 2π e−x 2 2 . THEOREM 2 Under the same assumptions as those in Theorem 1 and the additional condition (7). . is asymptotically unbiased in the sense that ˆ Efn x ⎯ → f x .F. as the following theorem states. ⎯ n→∞ The estimator fˆn(x). n with (unknown) p. consider the following example. r. ˆ fn x ⎯a. if K is taken to be the 1 1 p. s. ⎯ n→∞ ( ) n→∞ ( ) (7) Then the following results hold true. .d. viewed as an estimator of f(x).i. . the estimator fˆn(x) of f(x) is consistent in quadratic mean in the sense that ˆ E fn x − f x () () 2 ⎯ → 0. In closing this section.’s which are uniformly continuous. for each x ∈ f is continuous. ( ) Finally. . For example. fˆn(x). there is plenty of ﬂexibility in choosing it. of the N(0.d. EXAMPLE 1 Consider the i.d. 1 . j = 1. clearly. 489 satisfying (5) and for each x ∈ let fˆn(x) be deﬁned by (6). – ) p. THEOREM 3 Under the same assumptions as those in Theorem 2. THEOREM 4 Under the same assumptions as those in Theorem 1 and also condition (8). n ⎯ n→∞ (8) we may show the following result.20. the r. these conditions are.. Then for any x ∈ at which f is continuous.f. for each x ∈ at which f is continuous.d. then by choosing the sequence {hn} of positive constants to tend to zero and also such that nh 2 ⎯ → ∞.v. or the U(− – .’s Xj.d. it should be pointed out that there are many p. . 1).f. ˆ ˆ fn x − E fn x n at which ( ) [ ( )] ⎯d→ Z ⎯ n→∞ ˆ σ [ f ( x )] N 0. f.2 Nonparametric Estimation Estimation 20.

j = 1.1. x + h ˆ fn x = . r. . X n in x − h. n be i.3 Some Nonparametric Tests Let Xj.’s with unknown d. . 1) and 0 otherwise.2. However. one has ex < ex . ∑ K⎝ nh j = 1 ⎜ h ⎟ ⎠ () 1 where K(x) is – if x ∈ [−1. . j = 1.d.490 20 Nonparametric Inference Then. and replace t by x/2. ⎥ ⎥ ⎦ Exercise 20. 2 ⎯ ⎯ Let us now take hn = 1/n1/4. so that sup K x . . In a similar way xe − x 2 ⎯ → 0. so that e−x /2 < e−x/2 and hence 2 2 2 K x = () 1 e−x 2 ≤ 1 { () } < ∞. Now consider the expansion et = 1 + teθt for some 0 < θ < 1. .1 Let Xj. n 2h Then show that () ( ] 1 n ⎛ x− Xj ⎞ ˆ fn x = .v.f. Fn may be used for the purpose of estimating F. This estimator here becomes as follows: x x = e x 2 1 + x 2 eθ x ( ) 2 = ( ) | | || ˆ fn x = () 1 2π n3 4 ⎡ x−X j ∑ exp ⎢− 2 n1 2 ⎢ j =1 ⎢ ⎣ n ( ) 2 ⎤ ⎥. condition (4) is satisﬁed. 1 the number of X1 . . x ∈ 2π 2π Next.i.i. . nhn = n n1 2 = n1 2 ⎯ → ∞ and n→∞ n→∞ 3 4 nhn = n ⎯ → ∞. . the sample d. 2 20. . . for x > 1.v. n be i. . As was seen in Section 20. F. Since also K(−x) = K(x).d. so that x→∞ x→−∞ lim xK(x) = 0 as x → ∞. We get then 1 ⎯ →0 ⎯ 1 θ x 2 x→∞ 1 x + e 2 2 2 ⎯ ⎯ and therefore xe − x 2 ⎯ → 0. Thus the estimator given by (6) has all properties stated in ⎯ n→∞ Theorems 1–4. .f. clearly. .’s and for some h > 0 and any x ∈ deﬁne fˆn(x) as follows: . xe − x 2 2 < xe − x 2 = x e x 2 . . Then 0 < hn ⎯ → 0. r.

Dn = sup Fn x − F0 x . x ≥ 0. 0. ) (10) In order for this determination to be possible.1 Nonparaetric Estimation 491 testing hypotheses problems about F also arise and are of practical importance.v. . a given continuous d.01. Thus we may be interested in testing the hypothesis H : F = F0. m be i. (9) where Fn is the sample d. 1962. Then. It has been shown in the literature that P ( nDn ≤ x H ⎯ → ⎯ n→∞ ) j = −∞ ∑ (−1) e j ∞ −2 j 2 x 2 . Handbook of Statistical Tables by D. 0 Let α be the level of signiﬁcance.3 Some Nonparametric Tests 20. This hypothesis can be tested by utilizing the chi-square test for goodness of ﬁt discussed in Chapter 13.’s.i. the sample d.i. there are tables available which facilitate the calculation of C. . Therefore we would reject H if Dn > C and would accept it otherwise. Fn may also be used for testing the same hypothesis as above.f. we would have to know the distribution of Dn.v. What arise naturally in practice are problems of the following type: Let Xi.025. for example. against all possible alternatives. under H.→ 0.v. . .005).f. the right-hand side of (11) may be used for the purpose of determining C by way of (10). . The chi-square test is the oldest nonparametric test regarding d. Addison-Wesley. Owen.. j = 1. r.) The test employed above is known as the Kolmogorov one-sample test. a given d. i = 1.f.f. . it follows from (2) ⎯ that Dn ⎯a. F and let Yj. For moderate values of n (n ≤ 100) and selected α’s (α = 0. . x ∈ { ( () () }. Section 8.20. G. Alternatively. against the alternative A : F ≠ F0 (in the sense that F (x) ≠ F (x) for at least one x ∈ ).. 0. under H.s.05. Deﬁne the r. we have to make the supplementary (but mild) assumption that F is continuous.d. (See. n be i. In order to be able to employ the test proposed below.d. The testing hypothesis problem just described is of limited practical importance. The two random samples are assumed to be independent and the hypothesis of interest here is H : F = G. Thus the hypothesis to be tested here is H : F = F0 . One possible alternative is the following: (12) A: F ≠ G (in the sense that F(x) ≠ G(x) for at least one x ∈ ). or of some known multiple of it.f. . . deﬁned by (1).’s with continuous but unknown d.’s with continuous but unknown d. (11) Thus for large n. 0. B. Dn as follows.10. The constant C is to be determined through the relationship n→∞ P Dn > C H = α . r.f.f. 0.

For moderate values of m and n (such as m = n ≤ 40). in the sense that F(x) ≤ G(x) with strict inequality for at least one x ∈ For testing H against A′. Thus.n > C H = α .s. we have that Dm. for large m and n. namely. n → ∞. the right-hand side of (15) may be used for the purpose of determining C by way of (14).v. n { () () } ⎯⎯→ 0.n ⎯a.’s of the X’s and Y’s. the following two alternatives are also of interest.492 20 Nonparametric Inference The hypothesis is to be tested at level α.n as follows: Dm. so that Fm x − Gn x = Fm x − F x − Gn x − G x m n () ( ) [ ( ) ( )] [ ( ) ( )] ≤ F ( x ) − F ( x ) + G ( x ) − G( x ) . Hence Dm. { () () } . A′ : F > G. F = G.n + Dm. m→∞ sup Gn x − G x . (15) where N = mn/(m + n).n = sup Fm x − Gn x . m→∞ In other words.→ 0 as m.n = sup Fm x − Gn x . (See reference cited above in connection with the one-sample Kolmogorov test. Deﬁne the r. The constant C is determined by means of the relation P Dm. under H.n ≤ x H → ) j = −∞ ∑ (−1) e j ∞ −2 j 2x 2 as m. respectively. Dm. or some known multiple of it. and (17) . In connection with this it has been shown in the literature that P ( N Dm .) In addition to the alternative A : F ≠ G just considered. x ∈ }.s.n > C and accepting it otherwise. and this suggests ⎯ rejecting H if Dm. x ∈ whereas sup Fm x − F x . x ∈ { () () }. x ∈ { () () } ⎯⎯→ 0. x ∈ (16) . there are tables available which facilitate the calculation of C. x ∈ { () () } + sup{G (x) − G(x) . a.n ≤ sup Fm x − F x . Under H. Gn are the sample d. ( ) (14) Once again the actual determination of C requires the knowledge of the distribution of Dm. (13) where Fm. x ≥ 0. n → ∞. in the sense that F(x) ≥ G(x) with strict inequality for at least one x ∈ A′′ : F < G.f. we employ the statistic D+ deﬁned by m. a.s.n.

the conclusion should be the same if the X’s and Y’s are multiplied by the same positive constant. .4 More About Nonparametric Tests: Rank Tests 20. The rank of Xi in the combined sample of X’s and Y’s.f. This is a special case of what is known as invariance under monotone transformations.n relation + P Dm. we employ the statistic D − deﬁned by m. The problem is that of testing the hypothesis H : F = G against various alternatives at level of signiﬁcance α. x ∈ .4 More About Nonparametric Tests: Rank Tests Consider again the two-sample problem discussed in the latter part of the previous section. . N = mn/(m + n). we should reach the same conclusion regarding the rejection or acceptance of H regardless of the scale used in measuring the X’s and Y’s.1 Nonparaetric Estimation 493 and reject H if D+ > C+. Now it seems reasonable that in testing H on the basis of the X’s and Y’s.n ≤ x → 1 − e −2 x ) 2 as m. N (= m + n) which corresponds to the position of Xi after the X’s and Y’s have been ordered according to their size.n known as Kolmogorov–Smirnov two-sample tests. it follows that in . x ∈ { () () } and reject H if D− < C −.n − Dm. for testing H against A″.n > C + H = α ( ) by utilizing the fact that P ( + N Dm. .n. the rank R(Yj) of Yj in the combined sample of the X’s and Y’s is deﬁned in a similar fashion. Of course. (That is.n < C − H = α ( ) by utilizing the fact that P ( − N Dm. By the assumption of continuity of F and G. n → ∞. Here N is as before. . Namely. that is. n be two independent random samples with continuous d.’s F and G. to be denoted by (Xi).n relation − P Dm.) This is done by employing the ranks of the X’s and Y’s in the combined sample rather than their actual values. . the reader is referred to the reference cited earlier in this section. let Xi. j = 1. is that integer among the numbers 1. The cut-off point C − is determined through the m. . The last three tests based on the statistics Dm. . as can be shown.n ≤ x → 1 − e −2 x ) 2 as m. m and Yj. . . respectively. n → ∞. .20. . 20. i = 1.n = sup Gn x − Fm x . For relevant tables. Similarly.n m. The cut-off point C+ is determined through the m. x ∈ . D+ and D− are m. .

(19) (20) where C′ is deﬁned. we are going to use either one of the rank sum statistics RX. if z < 0 () (21) and set U = ∑ ∑ u X i − Yj . let the r. the number of times a Y precedes an X and it can be shown (see Exercise 20. R(Xm)) are equally likely each having probability 1/(m ). . so that RX would tend to take on small values. RY deﬁned by RX = ∑ R X i . Under A′. this procedure is facilitated by tables (see reference cited in previous section). A. N The rejection region then is deﬁned as follows: Consider all these ( m ) values and for each one of them form the rank sum RX. all (m ) N values of (R(X1). (See Exercise 20. i =1 j =1 m n ( ) (22) Then U is. whereas for large values of m and n it becomes unmanageable. As an example. . namely. under H. . we have strict inequalities with probability equal to one. . x ∈ . Next.4.4. we would reject H in favor of A′ if X < C ′. 1 < k < ⎜ ⎟. (23) . respectively.v. ) N Theoretically the determination of C′ is a simple matter. P(X ≤ x) ≥ P(Y ≤ x). as they are speciﬁed by (12). ⎝ m⎠ There are three alternatives of interest to consider. A′ and A″. so that P ( X < C ′ H = α.) In the present case. it is customary to take the level of signiﬁcance α as follows: α= ( ) N m k ⎛ N⎞ . (16) and (17). Y be distributed as the X’s and Y’s. RY = ∑ R Yj i =1 j =1 m ( ) n ( ) (18) because RX + RY = N(N + 1)/2 (ﬁxed).1. Then the rejection region consists of the k smallest values of these rank sums.’s X. if z > 0 uz =⎨ ⎩0. for this latter case. clearly. The remaining two alternatives are treated in a similar fashion. and let us consider alternative A′. For small values of m and n (n ≤ m ≤ 10). the normal approximation to be discussed below may be employed. For testing the hypothesis H speciﬁed above.2) that U = mn + n n+1 2 ( ) −R Y = RX − m m+1 2 ( ). consider the function u deﬁned as follows: ⎧1. respectively.494 20 Nonparametric Inference ordering the X’s and Y’s. accordingly. and for reasons to become apparent soon. as is easily seen.

their discussion here would be beyond the purposes of the present chapter. x∈ () ( ) for some unknown Δ ∈ . respectively.20. Y1 = 110.3) that under H. RY = 19. This test is known as the two-sample Wilcoxon–Mann–Whitney test. A special interesting case. Now it can be shown (see Exercise 20. is that where the d.4. As before. 2 3 4 ( ) ( ) ( ) ( ) R(Y ) = 5. X3 = 74. F is assumed to be unknown but continuous. these results check with (23) and Exercise 20. 1) as m. we say that G is a shift of F (to the right if Δ > 0 and to the left if Δ < 0).v.4 More About Nonparametric Tests: Rank Tests 20. R(Y ) = 3. we ﬁnd that R X 1 = 7. In this case. R X 5 = 8. 2 12 Then the r. However. we should like to mention that there is also the onesample Wilcoxon–Mann–Whitney test. for large m. we obtain ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 45 X 50 Y 53 Y 65 X 71 Y 74 X 78 X 82 X 110 Y . as well as other one-sample and twosample rank tests available.1 Nonparaetric Estimation 495 Therefore the test in (19) can be expressed equivalently in terms of the U statistic. σ2 U = .1. n. Combining these values and ordering them according to their size. From this. X2 = 65. R Y1 = 9. R X 4 = 1.v. Y3 = 53. As an illustration. (Incidentally. Y4 = 50. consider the following numerical example. R X 3 = 6. EXAMPLE 2 Let m = 5. Y2 = 71. R X 2 = 4. where an X or Y below a number means that the number is coming from the X or Y sample. A′ : F > G and A″ : F < G are equivalent to Δ ≠ 0. the limiting distribution (along with the continuity correction for better precision) may be used for determining the cut-off point C′ by means of (20). X4 = 45. In closing this section. respectively.) Now N = 5 + 4 = 9 and ( ) () N m 9 5 1 = 1 = 1 . R(Y ) = 2. n → ∞ and therefore. Δ > 0 and Δ < 0.f. Z distributed as N(0.4. mn m + n + 1 mn . 126 . Then the hypothesis H : F = G is equivalent to testing Δ = 0 and the alternatives A : F ≠ G. n = 4 and suppose that X1 = 78. G of the Y’s is assumed to be of the form EU = ( ) ( ) G x =F x−Δ . We also ﬁnd that U = 4 + 4 + 3 + 0 = 11. so that ( ) ( ) RX = 26. X5 = 82. (U − EU)/(σ (U )) converges in distribution to an r. where the rank sum tests of the present section are appropriate.

. . Then show that. . For the given m.1 Consider the two independent random samples Xi. we brieﬂy mention another nonparametric test—the twosample Sign test.) α= ( ) Exercises 20. The two random samples are assumed to be independent and the hypothesis H to be tested is H : F = G. j = 1. let RX and RY be deﬁned by (18). we assume. .4.’s with continuous d. 2 12 ( ) ( ) 20. . . or equivalently (by means of (23)). More precisely. . 126 Then for testing H against A′ (given by (16)). that the underlying distributions are continuous. mn m + n + 1 mn EU = .v. . σ2 U = .d. m and Yj. j = 1. 20.’s with continuous d. if X j < Y j ⎪ Zj = ⎨ ⎪0. j = 1. 341 of the reference cited in Section 20. n and set ⎧1. G.f. i = 1.3 Let Xi. .496 20 Nonparametric Inference and let us take 5 ≈ 0. ⎩ . To this end. 20. .d. j = 1. under H. . . Furthermore. m and Yj.v. n are also i. . H is accepted.4. F and that Yj. respectively. .i. j = 1. as we have also done in previous sections. we would reject for small values of RX. . α and for the observed value of RX (or U). . n. . (See tables on p. n are i. for small values of U. . . r. we suppose that Xj. in the combined sample of the X’s and Y’s. . r.04 . n and let R(Xi) and (Yj) be the ranks of Xi and Yj. In order to avoid distribution related difﬁculties. where N = m + n. . . Then establish (23). Then show that RX + RY = N N +1 2 ( ). if X j > Y j . . n be two independent random samples and let U be deﬁned by (22).3.i.4.2 Let RX and RY be as in the previous exercise and let U be deﬁned by (22). which is easily applicable in many situations of practical importance. Yj). . consider the n pairs (Xj. . . i = 1.5 Sign Test In this section. .f. .

Y9 = 98.1 and if we are interested in testing H : F = G against A : F ≠ G 1 1 (equivalently. X10 = 85 and Y1 = 50. one would use the two-sided or the appropriate onesided test. X 6 = 48. the efﬁciency of two manufacturing processes producing the same item. Z10 = 0. respectively. Some cases where the sign test just described is appropriate is when one is interested in comparing the effectiveness of two different drugs used for the treatment of the same disease.20. say. j = 1. Y7 = 76. but we will not discuss it here. let Xi. or A″ : F < G. Z8 = 1. Y5 = 74. X 9 = 90. . p = – against p ≠ –). . Then Z1 = 0. . p) j=1 1 and the hypothesis H above is equivalent to testing p = –. Y2 = 100.’s with continuous d. we would accept H. etc. X 8 = 75.6 Relative Asymptotic Efficiency of Tests 20. namely. Then. consider the following numerical example. against a speciﬁed alternative when one of the above-mentioned tests is employed. X 3 = 64. Z2 = 1. The hypothesis to be tested is H : F = G and the alternative may be either A : F ≠ G. m and Yj. This is obtained by introducing what is known as the Pitman asymptotic relative efﬁciency of tests. there is also the one-sample Sign test available. .i. i = 1. Depending on the 2 type of the alternatives. Y6 = 64. the response of n customers regarding their preferences towards a certain consumer item. Y4 = 96. Z is distributed as B(n. Y10 = 40. . Thus. . if α = 0. Z5 = 0. Z9 = 1.’s F and G. r. Y8 = 83. 2 2 20.1 Nonparaetric Estimation 497 Also set Z = ∑n Zj and p = P(Xj < Yj).f.d.6 Relative Asymptotic Efﬁciency of Tests Consider again the testing hypothesis problem discussed in the previous sections. X 2 = 68. n be i. For the precise deﬁnition of this concept. Z4 = 1. X 7 = 100. we would like to have some measure on the basis of which we could judge the performance of the test in question at least in an asymptotic sense. . Z6 = 1. clearly. Of course. so that 10 ∑Z j =1 j = 6. The level of signiﬁ- . In employing either the Wilcoxon–Mann–Whitney test or the Sign test in the problem just described.v. or A′ : F > G. X 4 = 90. suppose that the two sample sizes are the same and let n be the sample size needed in order to obtain a given power β. For the sake of an illustration. Y3 = 70. Z7 = 0. X 5 = 83. EXAMPLE 3 Let n = 10 and suppose that X1 = 73. Z3 = 1. .

We further assume that the limit of n*/n. Formally. Chapter 13) for testing the same hypothesis against the same speciﬁed alternative at the same level α. if e = 5. 1 when the underlying distribution is Uniform and ∞ when the underlying distribution is Cauchy. Then this quantity e is the Pitman asymptotic relative efﬁciency of the Wilcoxon–Mann–Whitney test (or of the Sign test. exists and is independent of α and β. depending on which one is used) relative to the t-test. Thus.95 when the underlying distribution is Normal. Let n* be the (common) sample size required in order to achieve a power equal to β by employing the t-test. if we use the Wilcoxon–Mann–Whitney test 1 and if it so happens that e = – . we may also employ the t-test (see (36). then this means that the Wilcoxon–Mann– 3 Whitney test requires approximately three times as many observations as the t-test in order to achieve the same power. then the Wilcoxon–Mann–Whitney test requires approximately only one-ﬁfth as many observations as the t-test in order to achieve the same power.498 20 Nonparametric Inference cance is α. It has been found in the literature that the asymptotic efﬁciency of the Wilcoxon–Mann–Whitney test relative to the t-test is 3/π ≈ 0. However. as n → ∞. Denote this limit by e. .

. Two vectors x = (x1. n. xn)′ by the real number α (scalar) is the vector deﬁned by αx = (αx1.3 Basic Deﬁnitions About Matrices 499 Appendix I Topics from Vector and Matrix Algebra I. . × . . . ′ (αx + βy) = αx′ + βy ′. the following properties are immediate: α x + y = αx + αy. . y = (y1. . . The set of all n-dimensional vectors is denoted by Vn. . . xn)′. . . n factors) in our previous notation. ( x + y ) + z = x + ( y + z ) = x + y + z. to be denoted by O. α (βx) = β (αx) = αβx. . yn)′ is the vector deﬁned by x + y = (x1 + y1. . xn + yn)′. y and z in Vn. xn)′. . . . . All vectors will be column vectors but for typographical convenience. xn)′. . . y in Vn and any two scalars α. . Only vectors with real components will be considered. The zero vector. β. . xj ∈ . is the vector all of whose components are equal to 0. yn)′ are said to be equal if xj = yj. For any three vectors x. ( ) (α + β )x = αx + βx. . . The product αx of the vector x = (x1. . . . . . The sum x + y of two vectors x = (x1. and Vn is called the (real) n-dimensional vector space. . . . This deﬁnition is extended in an obvious manner to any ﬁnite number of vectors. . xn)′. . n. . . . The inner (or scalar) product x′y of any two vectors x = (x1. . Thus Vn = n (= × . the following properties are immediate: x + y = y + x. yn)′ is a scalar and is deﬁned as follows: 499 . . Thus x = (x1. y = (y1. . 1x = x. . .1 Basic Deﬁnitions in Vector Spaces For a positive integer n. . . j = 1. . .I. . . they will be written in the form of a row with a prime (′) to indicate transpose. . j = 1. αxn)′. . For any two vectors x. . y = (y1. x is said to be an n-dimensional vector with real components if it is an n-tuple of real numbers.

α j ∈ . x2 ∈ . then x = 0. β. x2)′ ∈ V2. the m-dimensional vector space Vm may be considered as a subspace of Vn by enlarging the m-tuples to n-tuples and identifying the appropriate components with zero in the resulting n-tuples. j = 1. we shall assume that the above-mentioned identiﬁcation has been made and we shall write Vm ⊆ Vn to indicate that Vm is a subspace of Vn. . Two vectors x. . A (nonempty) subset V of Vn is a vector space. for examples. . x3)′ ∈ V3.500 Appendix I Topics from Vector and Matrix Algebra x ′y = ∑ x j y j . For any positive integer m < n. if x′y = 0. denoted by V ⊆ Vn. . x3= 0} which is a subspace of V3. Thus. . and we write x ⊥ y. . x ′ αy = αx y = α x ′y . x ′ αy + βz = αx ′y + βx ′z. The vectors xj. j = 1. r in Vn are said to span (or generate) the subspace r V ⊆ Vn if every vector y in V may be written as follows: y = Σj=1αjxj for some scalars αj. . y in Vn are said to be orthogonal (or perpendicular). x2 = 0} which is a subspace of V2. the following property is immediate: αx = α x . ⎪ j =1 ⎩ ⎫ ⎪ r⎬ ⎪ ⎭ is a subspace of Vn. x1. A vector x in Vn is said to be orthogonal (or perpendicular) to a subset U of Vn. being the set {x = (x1. x1 ∈ . It is shown easily that for any given set of vectors xj. x2. The norm (or length) ||x|| of a vector x in Vn is a non-negative number and is deﬁned by ||x|| = (x′x)1/2. . which is a subspace of Vn. the x-axis in the plane may be identiﬁed with the set {x = (x1. j = 1. x2)′ ∈ V2. j =1 n For any three vectors x. etc. From now on. y in V and any scalars α and β. and we write x ⊥ U. the xy-plane in the three-dimensional space may be identiﬁed with the set {z = (x1. if for any vectors x. ( ) ( ) ( ) ( ) x ′x ≥ 0 and x ′x = 0 if and only if x = 0. For any vector in Vn and any scalar α. y2)′ ∈ V2. y2 ∈ } which is a subspace of V2. . the following properties are immediate: ′ x ′y = y ′x. . . . . . y = ∑ α j x j . j = 1. . y and z in Vn and any scalars α. r. V deﬁned by r ⎧ ⎪ V = ⎨y ∈Vn . r in Vn. αx + βy is also in V. also if x ′y = 0 for every y ∈Vn . Similarly the y-axis in the plane may be identiﬁed with the set {y = (y1. Thus. . y1 = 0. if x′y = 0 for every y ∈ U. α1x1 + α2x2 = 0} is a subspace of V2. for example. the straight line α1x1 + α2x2 = 0 in the plane. .

m (the dimension of V ). as z varies in Vm. . . . . ||x − w|| has a minimum value obtained for w = u. n} form an orthonormal basis in Vn. . . u are called the projections of x into Vm and Ur. and as w varies in Ur. k} are said to form an orthonormal basis in V if they form a basis in V and also are pairwise orthogonal and of norm one. Let m. . 1. 0. . u ∈ U r. I.I THEOREM 3.I For any positive integer n. we gather together for easy reference those results about vector spaces used in this book. 0. 0 . . k in the subspace V ⊆ Vn are said to be linearly independent if there are no scalars αj. . . . . consider any subspace V ⊆ Vn. n} for Vn. . . j = 1.I . j = 1. THEOREM 2. . Furthermore. 0. . . or to the vectors of any set of vectors in V spanning V. . . . let x be a vector in Vn and let V be a subspace of Vn. The vectors v. respectively. . | x − z|| has a minimum value obtained for z = v. THEOREM 1. . . k.3 Basic Deﬁnitions Vector Spaces 501 The vectors xj . j = 1. . any vector x in Vn may be written (decomposed) uniquely as follows: x = v + u with v ∈ Vm. j = 1. k which are not all zero for which Σk= 1αjxj = 0. Then U is an r-dimensional subspace Ur of Vn with r = n − m and is called the orthocomplement (or orthogonal complement) of Vm in Vn. It can be shown that the number of vectors in any basis of V is the same and this is the largest number of linearly independent vectors in V. . . say. . e n = 0. j = 1. that is. . 0 . e 2 = 0. j = 1. . Then V has a basis and any two bases in V have the same number of vectors.I. ( ) ( ) ( ) it is clear that the vectors {ej. otherwise they are said to be linearly dependent. . and ||x||2 = ||v||2 + ||u||2. . The vectors {xj. n be any positive integers with m < n and let {xj. Let n be any positive integer.2 Some Theorems on Vector Spaces In this section. . n be any positive integers with m < n and let Vm be a subspace of Vn of dimension m.2 Some Theorems onAbout Matrices I. . .I THEOREM 4. 1 . . . Let m. Let U be the set of vectors in Vn each of which is orthogonal to Vm. i = 1. Finally. Then x ⊥ V if and only if x is orthogonal to the vectors of a basis for V. For example. i x ′x i = x i i 2 = 1. . A basis for j the subspace V ⊆ Vn is any set of linearly independent vectors which span V. In particular. by taking ′ ′ ′ e1 = 1. This number is called the dimension of V. x′xj = 0 for i ≠ j. . Then this basis can be extended to an orthonormal basis {xj. m} be an orthonormal basis for Vm. the dimension of Vn is n and m ≤ n. . . . . .

. m. αA = αA ′. The product αA of the matrix A = (aij) by the scalar α is the matrix deﬁned by αA = (αaij). αA + βB = αA ′ + βB′. . j = 1. the following properties are immediate: ′ ′ ′ A ′ = A. .502 Appendix I Topics from Vector and Matrix Algebra I. m. n of a square matrix of order n are called the elements of the main diagonal of A. aij = aji for all i and j. However. It follows that a vector in Vn is simply an n × 1 matrix. A zero matrix will be denoted by 0 regardless of its dimensions. ( ) ( ) ( ) The product AB of the m × n matrix A = (aij) by the n = r matrix B = (bij) is the m × r matrix deﬁned as follows: AB = (cij). The proper notation for a unit matrix of order n is In. Thus ⎛ a11 ⎜a = ⎜ 21 ⎜⋅⋅⋅ ⎜ ⎝ am 1 a12 a22 ⋅⋅⋅ am 2 ⋅ ⋅ ⋅ a1n ⎞ ⋅ ⋅ ⋅ a2 n ⎟ ⎟ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅⎟ ⎟ ⋅ ⋅ ⋅ amn ⎠ A m ×n The numbers aij. B = (bij) is the m = n matrix deﬁned by A + B = (aij + bij). For any m × n matrices A. (A + B) + C = A + (B + C) = A + B + C. For any m × n matrices A. then A is called diagonal. n are called the elements of the matrix. A matrix is said to be a square matrix if m = n and then n is called the order of the matrix. . . if all of the elements off the main diagonal are 0). where cij = Σn = 1aikbkj. A unit (or identity) matrix is a square matrix in which all diagonal elements are equal to 1 and all other elements are equal to 0. that is. For brevity. Two m × n matrices are said to be equal if they have identical elements.3 Basic Deﬁnitions About Matrices Let m. This deﬁnition is extended in an obvious manner to any ﬁnite number of m × n matrices. the following properties are immediate: A + B = B + A. . The sum A + B of two m × n matrices A = (aij). β. Then a (real) m × n matrix Am×n is a rectangular array of mn real numbers arranged in m rows and n columns. n be any positive integers. Clearly. The transpose A′ of the m × n matrix A = (aij) is the n × m matrix deﬁned by A′ = (aij). n for an m × n matrix. . The product k . . we shall often write simply I and the order is to be understood from the context. The elements aii. i = 1. If A is a square matrix and A′ = A. i = 1. If aij = 0 for all i ≠ j (that is. . A zero matrix is one in which all the elements are equal to zero. Thus the rows and columns of A′ are equal to the columns and rows of A. respectively. . or just the diagonal elements of A. j = 1. for a symmetric matrix the elements symmetric with respect to the main diagonal of A are equal. . . . . B and any scalars α. m and n are called the dimensions of the matrix. then A is called symmetric. . Thus I = (δij). we shall write A = (aij). i = 1. where δij = 1 if i = j and equals 0 if i ≠ j. . . and only real matrices will be considered in the sequel. B and C. . . .

m. . By means of the last property. For further elaboration. . Then A is said to be orthogonal. The matrix A−1 is called the inverse of A. . n stand for the row and column vectors of A. i = 1. 2. m). Then for any scalars α. Always rank A ≤ min(m. . Let A be an m × n matrix and let ri. there exists a unique matrix. . . by the expression A = ∑ ± a1i a2j ⋅ ⋅ ⋅ amp . . Thus the rank of A. . . see any of the references cited at the end of this appendix. n) and if equality occurs. . that AB = BA. otherwise A is called singular. . p) of (1. . IA = AI = A. It can be shown that A is nonsingular if and only is |A| ≠ 0. Then it can be shown that the largest number of independent r-vectors is the same as the largest number of independent c-vectors and this common number is called the rank of the matrix A. say m × m. we may omit the parentheses and set ABD for (AB)D = A(BD). . It can also be shown that if |A| ≠ 0. β and γ. . i = 1. . BA = ⎜ ⎟. . (AB)′ = B′A ′. . (αA)B = A(αB) = α (AB) = αAB. take ⎛ 0 0⎞ ⎛ 1 1⎞ A=⎜ ⎟. 0A = A0 = 0. is the common dimension of the two vector spaces spanned by the r-vectors and the c-vectors. . we say that A is non-singular or of full rank. such that AA−1 = A−1A = I. Let now |A| stand for the determinant of the square matrix A. The products AB. Let A be an m × n matrix.3 Basic Deﬁnitions About Matrices 503 BA is not deﬁned unless r = m and even then. C be two n × r matrices and let D be an r × k matrix. let B. to be denoted by A−1. A β B + γ C = βAB + γAC. ⎝ 0 1⎠ ⎝ 0 0⎠ Then ⎛ 0 0⎞ ⎛ 0 1⎞ AB = ⎜ ⎟ . in general. where the aij are the elements of A and the summation extends over all permutations (i. B = ⎜ ⎟. (AB)D = A(BD). j = 1. to be denoted by rank A.I. BA are always deﬁned for all square matrices of the same order. Clearly. . Let ri and ci. the following properties are immediate: ( ) (β B + γ C)D = β BD + γ CD. ⎝ 0 0⎠ ⎝ 0 0⎠ so that AB ≠ BA. . For example. n stand for the row and column . respectively. cj. (A−1)−1 = A. it is not true. deﬁned only for square matrices. Let A be a square matrix of order n such that A′A = AA′ = I. j. The plus sign is chosen if the permutation is even and the minus sign if it is odd.

.4 Some Theorems About Matrices and Quadratic Forms Those theorems about matrices used in this book are gathered together here for easy reference. v) Let A. Finally. or positive semideﬁnite if its characteristic roots λj.) iii) Let A be a non-singular square matrix. n} and {cj. they are always real for symmetric matrices. THEOREM 5.I Although all matrices considered here are matrices with real elements. n are j the diagonal elements of A.I Let A. . j = 1. However. That is. . {rj. .504 Appendix I Topics from Vector and Matrix Algebra vectors of the matrix A of order n. where aj. Then |B′AB| = |BAB′| = |A|. j = 1. . . Then A′. or eigenvalues of A. j = 1. it should be noted that their characteristic roots will. j = 1.) iv) If A is symmetric non-singular. (See (iii). where λ is a scalar. . C be any m × n. iii) For any (square) matrix A. (See (iv) of Theorem 6. respectively. then so is A−1. . . Then it is immediate that |A − λI| is a polynomial in λ of degree n and is called the characteristic polynomial of A. . THEOREM 7. ii) Every orthogonal matrix is non-singular.I. n × r. B be any two matrices of the same order. . . . REMARK 1. The matrix A is said to be positive deﬁnite. Then the orthogonality of A is equivalent to the following properties: r i′ri = ri 2 = ci 2 = c i′c i = 1 and ri′rj = c i′c j = 0 for i ≠ j. B. . . negative deﬁnite. n satisfy the following inequalities λj > 0. |A| = |A′|.I . i) A square matrix A is non-singular if and only if |A| ≠ 0. v) Let A. |A−1| = |A|−1. be complex numbers. . iv) For any orthogonal matrix A. |A| is either 1 or −1. Then the m × m matrix AB is non-singular and (AB)−1 = B−1A−1.I THEOREM 6. consider the determinant |A − λI|. in particular (by taking C = Ir).I. The n roots of the equation |A − λI| = 0 are called the characteristic (or latent) roots. i) Let A. B be non-singular m × m matrices. n. ii) For any diagonal matrix A of order n. . Then (ABC)′ = C′B′A′ and. A−1 are also non-singular. r × s matrices. λj < 0. . For a square matrix A of order n. (AB)′ = B′A′. λj ≥ 0. I. . a square matrix A is said to be idempotent if A2 = A. j = 1. (vi) of Theorem 6. respectively. Then |AB| = |BA| = |A| |B|. in general. B be matrices of the same order and suppose that B is orthogonal. . n} are orthonormal bases of Vn. vi) For any matrix A for which |A| ≠ 0. |A| = Πn=1aj.

m × m and n × n matrices. THEOREM 10. see the application after Theorem 5 in Chapter 9. a diagonal matrix is positive deﬁnite if and only if its diagonal elements are all positive. Then rank BA = rank AC = rank BAC = rank A. respectively. . respectively. B and C be m × n. Then x′x = y′y. . let A be an n × n orthogonal matrix and set y = Ax. THEOREM 9. rank (B′AB) = rank (BAB′) = rank A if m = n and B is orthogonal. m × m and n × n matrices. . 1 2 (For a concrete example. iv) A matrix A of order n is positive deﬁnite (semideﬁnite. . AA′ is positive deﬁnite (and symmetric). . n. and suppose that B. respectively. rank A = number of nonzero characteristic roots of A. rank C . <0. In particular. i. . B and C be m × n. . ( ) ( ) ( ) iv) Let A.I i) For any square matrix A. v) For any matrix A. respectively. the ﬁrst two rows of which are equal to r′ .I. Then A is positive deﬁnite if and only if |Aj| > 0.I i) If A is positive deﬁnite. n × r and r × k matrices. Then rank (BAC) = rank A. r′ . . n. ⎝ a j 1 ⋅ ⋅ ⋅ a jj ⎠ j = 1. iii) For every symmetric matrix A there is an orthogonal matrix B (of the same order as that of A) such that the matrix B′AB is diagonal (and its diagonal elements are the characteristic roots of A). and suppose that B. negative deﬁnite. . A−1 exists and is also positive deﬁnite. r2 be two vectors in Vn such that r′ r2 = 0 and ||r1|| = ||r2|| = 1. . C are non-singular. B and C be m × n. .) ii) Let x be a vector in Vn. so that ||x|| = ||y||. .) if and only if x′Ax > 0 (≥0. respectively) for every x ∈Vn with x ≠ 0. ( ( iii) Let A. rank B. Then rank AB ≤ min rank A. rank B ( ) ) ( ) ) and rank ABC ≤ min rank A. C are non-singular. iii) Let A = (aij). rank AA ′ = rank A ′A = rank A = rank A ′. Then there 1 exists an n × n orthogonal matrix. .I i) Let r1.3 Basic Deﬁnitions About Matrices About Matrices and Quadratic Forms 505 THEOREM 8. n and deﬁne Aj by ⎛a ⋅ ⋅ ⋅ a1 j ⎞ A j = ⎜ 11 ⎟. j = 1. ii) For any nonsingular square matrix A. In particular. j = 1.4 Some Theorems I. ( ) ( ) ii) Let A.

such that AiAj = 0 for 1 ≤ i < j ≤ m. . For the deﬁnition of a quadratic form. then Q = ∑ λ j y 2 . j j =1 r Finally. If x′Ax = x′x identically in x ∈ Vn. The following theorem refers to quadratic forms. . THEOREM 11. then Σm= 1Aj is idempotent and j ∑ rank A j =1 m j ⎛m ⎞ = rank ⎜ ∑ A j ⎟ . ⎝ j =1 ⎠ i =1 r 2 where δi is either 1 or −1. m are the nonzero characteristic roots of A. . There exists an orthogonal matrix B such that if y = B −1 x. ii) A diagonal matrix whose (diagonal) elements are either 1 or 0 is idempotent. . vi) The characteristic roots of a positive deﬁnite (semideﬁnite) matrix are positive (nonnegative). iii) Let Q be as in (ii). ii) Consider the quadratic form Q = x′Ax. j = 1. then Q = ∑ y2 . then B′AB is positive semideﬁnite. i = 1. ⎝ j =1 ⎠ In particular. the reader is referred to Deﬁnition 1. There exists an orthogonal matrix B such that if y = B−1x. .I i) Let A be a symmetric matrix of order n. . ij j j =1 n i = 1. j j =1 m where λj. then A = I. rank A 1 + rank I − A 1 = n ( ) . . r such that ⎛ n ⎞ Q = ∑ δ i ⎜ ∑ bij x j ⎟ . m are symmetric idempotent matrices of order n. . we formulate the following results referring to idempotent matrices: THEOREM 12. . . iv) Let Q be as in (ii) and suppose that A is idempotent and rank A = r. . . . j = 1. where A is of order n. .I i) The characteristic roots of an idempotent matrix are either 1 or 0. . iii) If Aj. r. and suppose that rank A = r. Chapter 19.506 Appendix I Topics from Vector and Matrix Algebra v) If A is a positive semideﬁnite matrix of order n and B is a non-singular matrix of order n. . Then there exist r linear forms in the x′s ∑b x .

( ) . j The proof of the theorems formulated in this appendix may be found in most books of linear algebra. 3d ed. 1957. 1968. Rao. Lang. m are symmetric idempotent matrices of the same order and Σm= 1Aj is also idempotent. C. Linear Algebra. S. 1961.4 Some Theorems I. Addison-Wesley. . and F. R. MacMillan. Vol.3 Basic Deﬁnitions About Matrices About Matrices and Quadratic Forms 507 and rank A 1 + rank A 2 + rank I − A 1 − A 2 = n. Scheffé. Chapter 1. I. Wiley. Wiley. j = 1. . Linear Statistical Inference and Its Applications. see also C. H. the AiAj = 0 for 1 ≤ i < j ≤ m. Addison-Wesley. iv) If Aj. Graybill. A. The Analysis of Variance.I. 1959. A Survey of Modern Algebra. Perlis. see Birkhoff and MacLane. Murdoch. . For example. McGraw-Hill. S. 1965. Appendices I and II. An Introduction to Linear Statistical Models. Linear Algebra for Undergraduates. Theory of Matrices. .. 1965. For a brief exposition of most results from linear algebra employed in statistics. D. Chapter 1. 1952. Wiley.

as well as an r. ⎥ 2⎢ ⎝ r ⎠ ⎥⎪ ⎪ ⎢ ⎣ ⎦⎭ ⎩ II. respectively.v. X = Σjr= 1X 2 was distributed as χ 2. . r T = X (Y r ) was the (Student’s) t-distribution with r d. t ∈ .1 Noncentral t-Distribution It was seen in Chapter 9. 1) and χ2. having this distribution.f.’s X1. is given by ft ′ δ t . T ′ is said to have the noncentral t-distribution with r d. Then the distribution of the r. Xr were independent normally distributed r.v.’s distributed as N(δ. . . . and set r T ′ = X (Y r ) . μr.v.v. Xr be indepenj r dent normally distributed with variance 1 but means μ1.f.’s with variance 1 and mean 0. χ2 and F Distributions II.v. Using the deﬁnition of a t′:δ r. This distribution. . if often denoted by t′:δ.v.d.. Application 2.2 Noncentral χ 2-Distribution It was seen in Chapter 7 (see corollary to Theorem 5) that if X1. Let now the r. The r. δ = r. it can r r be found by well known methods that its p. .508 Appendix II Noncentral t. .f.v. that if the independent r. . . and noncentrality parameter δ. 1) and χ2. X* = Σjr= 1X 2 is said to be the noncentral j 508 . . ( ) 2 ( ) 1 Γr 2 r +1 2 ( ) ∫ πr ∞ 0 x ( r −1) 2 2 ⎧ ⎡ ⎞ ⎤⎫ ⎪ ⎪ 1⎢ ⎛ x × exp ⎨− x + ⎜ t − δ ⎟ ⎥ ⎬dx. χ2 and F Distributions Appendix II Noncentral t. . . respectively. then the distribution of the r.v. respectively.v.’s X and Y were distributed as N(0. Now let X and Y be independent r. then the r.v.

but it does not have any simple closed form. This distribution. . This distribution.v. and its p. is often denoted by F′ . 2 r.d. and noncentrality parameter δ. F= X r1 . . and set r 1 2 F′ = X r1 . 1. ( ) ∞ j =0 () () where Pj δ = e () −δ 2 2 (δ 2) 2 j j! and fr + 2 j is the p. respectively.f.v. . one has f χ ′ δ x. is a mixture of χ2-distributions with Poisson weights. .v.f. Using the deﬁnition of a χ′r.d. having this distribution. x ≥ 0. and noncentrality parameter δ.v.d.3 Noncentral F-Distribution In Chapter 9. . j = 0. .f.f. Y r2 1 2 where X and Y were independent r. r2 :δ (f.r . is often j 2 2 denoted by χ′r:δ. where δ 2 = Σjr= 1 μ2. 1. It can be seen that this p.δ r.f.3 Noncentral F-Distribution 509 chi-square distribution with r d. δ) = e −δ 2 2 ∑ cj j =0 ∞ (δ 2) 2 j f 1 r −1+ j 2 1 1 2 j! (1 + f ) ( r + r )+ j 1 2 . r r Suppose now that the r.v. Y r2 Then the distribution of F ′ is said to be the noncentral F-distribution with r1 and r2 d. . II. having this distribution. of χ r2+ 2j .d.. and also an r. .δ.f. was deﬁned as the distribution of the r. δ = ∑ Pj δ fr + 2j x .’s X and Y are independent and distributed as χ′r2:δ and χ2 . ⎡1 ⎤ Γ ⎢ r1 + r2 + j ⎥ 2 ⎣ ⎦ .’s distributed as χ 2 and χ 2 . More precisely. respectively. is given by the following expression: 1 2 fF ′ where r1 .II.. and also an r. Application 2. the F-distribution with r1 and r2 d. one can ﬁnd its p. f ≥ 0. . cj = ⎛1 ⎞ ⎛1 ⎞ Γ ⎜ r1 + j⎟ Γ ⎜ r2 ⎟ ⎝2 ⎠ ⎝2 ⎠ ( ) j = 0. which does not have r any simple closed form.v.f.

χ 2 and F-distributions. we obtain the t. Addison-Wesley. Handbook of Statistical Tables by D. namely. χ2 and F Distributions REMARKS (i) By setting δ = 0 in the noncentral t. χ 2 and F-distributions. B. (ii) Tables for the noncentral t. χ 2 and F-distributions. .510 Appendix II Noncentral t. the latter distributions may also be called central t. In view of this. 1962. χ 2 and F-distributions are given in a reference cited elsewhere. Owen. respectively.

9929 0.Tables 511 Appendix III Tables Table 1 The Cumulative Binomial Distribution The tabulated quantity is ∑ ⎜ j ⎟ p (1 − p ) ⎝ ⎠ j j =0 k ⎛ n⎞ n− j .3541 0.9023 1.9687 1.9570 0.9211 0.0000 0.0625 0.0000 0.8793 0.6836 0.1780 0.8809 0.4219 0.9785 0.0000 0.9375 1.9308 0.9934 1.0000 p 4/16 0.9512 0.0000 0.8200 0.3906 0.0000 0.9695 1.9656 0.0000 7/16 0.5625 0.9163 1.9648 1. n 2 k 0 1 2 0 1 2 3 0 1 2 3 4 0 1 2 3 4 5 1/16 0.9840 1.0000 0.1526 0.6296 0.5188 0.0000 3 4 5 511 .6875 0.7242 0.3164 0.9773 0.3125 0.9947 0.1001 0.7500 1.7681 0.9978 0.1250 0.9998 1.9642 0.9802 1.9905 1.7725 0.0000 0.5364 0.4358 0.0000 0.3164 0.8594 1.0000 0.9990 1.0000 0.5000 0.8086 1.9926 1.0000 6/16 0.7656 0.5129 0.0000 0.9844 0.0000 8/16 0.9839 0.7627 0.2753 0.9492 0.7749 0.9999 1.0000 0.8750 1.0000 1.0000 0.0000 0.7383 0.9961 1.0000 0.9998 1.5027 0.0312 0.7248 0.5933 0.8437 0.4116 0.1536 0.6160 0.9961 1.0000 1.0000 0.0000 1.6328 0.1875 0.6699 0.2373 0.9844 1.9634 1.0000 0.9989 1.0000 0.9888 0.0000 0.0563 0.8125 0.9077 0.9970 1.2500 0.9991 1.8240 0.9844 1.0000 0.0000 5/16 0.0000 2/16 0.8965 0.9065 0.0000 0.8381 0.9375 1.2234 0.4727 0.6602 0.9473 1.9998 1.0000 0.9980 1.8484 0.2441 0.9988 1.8789 0.0954 0.5000 0.0000 3/16 0.0000 0.3250 0.3815 0.5862 0.0000 0.

8528 0.1445 0.8951 0.0563 0.9102 0.0000 1.9952 0.0000 0.7650 0.9159 0.0233 0.9227 0.9453 0.9693 0.7826 0.9964 8/16 0.6861 0.0000 0.2314 0.9954 0.9999 1.9790 0.0751 0.1254 0.0000 0.0000 1.0898 0.0078 0.9511 0.0000 1.9995 1.0000 1.0000 1.0000 0.8281 0.9991 1.9748 0.6943 0.0000 0.9866 0.3927 0.0020 0.2266 0.9998 6/16 0.8805 0.9929 0.8335 0.9640 0.0000 1.9998 1.6365 0.0000 0.9805 0.9975 0.0000 0.9327 0.0100 0.0178 0.9900 0.0000 1.0000 0.5201 0.8626 0.9955 0.7363 0.0637 0.5000 0.5967 0.9991 0.0000 0.0000 1.7734 0.9300 0.0000 0.9830 0.7630 0.7644 0.3437 0.2440 0.0000 1.0000 1.1094 0.4748 0.9994 1.9958 0.9851 0.6160 0.9994 1.1780 0.0625 0.1001 0.0000 1.1795 0.9892 0.3770 0.0000 0.4147 0.5458 0.9926 0.0000 0.9727 0.9733 0.9944 0.7570 0.9428 0.9998 1.9999 1.0156 0.9995 1.4753 0.0000 1.0000 0.0317 0.3633 0.9922 0.9978 0.2877 0.7564 0.6007 0.0000 0.0056 0.0000 0.9192 0.3412 0.9871 0.0000 0.0000 2/16 0.9990 7/16 0.6389 0.0000 1.9996 1.0000 1.9001 0.3436 0.2110 0.9868 0.0000 1.6562 0.9577 0.8343 0.9817 0.9219 0.9999 1.4488 0.0000 1.0107 0.9994 1.8275 0.0091 0.0000 1.9998 1.0000 1.9375 0.8628 0.9922 1.1056 0.0000 1.6514 0.0000 0.0000 1.9846 0.0726 0.5406 0.3007 0.0931 0.0000 1.9624 0.0000 1.1350 0.2932 0.2539 0.1148 0.0000 0.8741 0.9505 0.0000 1.4467 0.9977 0.3003 0.8535 0.7152 0.9694 0.0000 3/16 0.4299 0.9616 0.6367 0.9979 0.0000 0.6186 0.0000 1.5062 0.9690 0.0032 0.0195 0.3036 0.0373 0.7759 0.0451 0.1719 0.5960 0.9998 1.0499 0.0000 0.7208 0.9930 1.9648 0.8238 0.5256 0.9868 0.2338 0.9725 0.1335 0.9999 1.9260 0.0000 0.9983 0.9999 1.0000 1.9991 1.8728 0.9893 7 8 9 10 .0000 0.9955 0.2631 0.9945 0.6789 0.9988 0.5369 0.0000 0.0000 0.9972 1.9860 0.9656 0.9118 0.5339 0.6346 0.5594 0.9335 0.9773 0.9980 1.0000 0.0039 0.6230 0.8906 0.9996 1.0000 0.9389 0.1899 0.0000 1.0000 0.9888 0.0000 1.0000 1.0010 0.0724 0.8306 0.4669 0.7461 0.9537 0.4449 0.0236 0.9294 0.2422 0.0000 1.3907 0.8572 0.0000 1.9849 0.1747 0.7707 0.9969 1.7006 0.3936 0.0352 0.512 Appendix III Tables Table 1 (continued ) n 6 k 0 1 2 3 4 5 6 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 1/16 0.1142 0.9999 1.3697 0.9988 1.9260 0.3671 0.9081 0.8555 0.0343 0.0000 1.9318 0.9997 1.3501 0.9803 0.9965 0.9987 0.9961 1.0000 0.9965 0.0000 0.6785 0.0000 0.0000 0.0000 0.2742 0.9995 1.0547 0.9865 0.9998 1.9976 0.6506 0.2817 0.1308 0.0000 1.0000 1.0000 p 4/16 0.8851 0.9938 0.9958 0.1679 0.0000 5/16 0.8862 0.0000 1.9985 0.9998 1.9996 1.1937 0.0278 0.9545 0.9997 1.9990 1.0146 0.6114 0.9970 0.9922 0.0000 1.9709 0.0596 0.6872 0.7834 0.1543 0.9987 1.9150 0.5000 0.9987 0.7854 0.9999 1.5245 0.8725 0.9844 1.

Tables

513

Table 1

(continued )

n 10 11

k 9 10 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 8 9 10 11 12 0 1 2 3 4 5 6 7 8 9 10 11 12 13 0 1

1/16 1.0000 1.0000 0.4917 0.8522 0.9724 0.9965 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.4610 0.8297 0.9649 0.9950 0.9995 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.4321 0.8067 0.9565 0.9931 0.9992 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.4051 0.7833

2/16 1.0000 1.0000 0.2302 0.5919 0.8503 0.9610 0.9927 0.9990 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 0.2014 0.5467 0.8180 0.9472 0.9887 0.9982 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.1762 0.5035 0.7841 0.9310 0.9835 0.9970 0.9996 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.1542 0.4626

3/16 1.0000 1.0000 0.1019 0.3605 0.6589 0.8654 0.9608 0.9916 0.9987 0.9999 1.0000 1.0000 1.0000 1.0000 0.0828 0.3120 0.6029 0.8267 0.9429 0.9858 0.9973 0.9996 1.0000 1.0000 1.0000 1.0000 1.0000 0.0673 0.2690 0.5484 0.7847 0.9211 0.9778 0.9952 0.9992 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 0.0546 0.2312

p 4/16 1.0000 1.0000 0.0422 0.1971 0.4552 0.7133 0.8854 0.9657 0.9924 0.9988 0.9999 1.0000 1.0000 1.0000 0.0317 0.1584 0.3907 0.6488 0.8424 0.9456 0.9857 0.9972 0.9996 1.0000 1.0000 1.0000 1.0000 0.0238 0.1267 0.3326 0.5843 0.7940 0.9198 0.9757 0.9944 0.9990 0.9999 1.0000 1.0000 1.0000 1.0000 0.0178 0.1010

5/16 1.0000 1.0000 0.0162 0.0973 0.2816 0.5329 0.7614 0.9068 0.9729 0.9943 0.9992 0.9999 1.0000 1.0000 0.0111 0.0720 0.2240 0.4544 0.6900 0.8613 0.9522 0.9876 0.9977 0.9997 1.0000 1.0000 1.0000 0.0077 0.0530 0.1765 0.3824 0.6164 0.8078 0.9238 0.9765 0.9945 0.9991 0.9999 1.0000 1.0000 1.0000 0.0053 0.0388

6/16 0.9999 1.0000 0.0057 0.0432 0.1558 0.3583 0.6014 0.8057 0.9282 0.9807 0.9965 0.9996 1.0000 1.0000 0.0036 0.0291 0.1135 0.2824 0.5103 0.7291 0.8822 0.9610 0.9905 0.9984 0.9998 1.0000 1.0000 0.0022 0.0195 0.0819 0.2191 0.4248 0.6470 0.8248 0.9315 0.9795 0.9955 0.9993 0.9999 1.0000 1.0000 0.0014 0.0130

7/16 0.9997 1.0000 0.0018 0.0170 0.0764 0.2149 0.4303 0.6649 0.8473 0.9487 0.9881 0.9983 0.9999 1.0000 0.0010 0.0104 0.0504 0.1543 0.3361 0.5622 0.7675 0.9043 0.9708 0.9938 0.9992 1.0000 1.0000 0.0006 0.0063 0.0329 0.1089 0.2565 0.4633 0.6777 0.8445 0.9417 0.9838 0.9968 0.9996 1.0000 1.0000 0.0003 0.0038

8/16 0.9990 1.0000 0.0005 0.0059 0.0327 0.1133 0.2744 0.5000 0.7256 0.8867 0.9673 0.9941 0.9995 1.0000 0.0002 0.0032 0.0193 0.0730 0.1938 0.3872 0.6128 0.8062 0.9270 0.9807 0.9968 0.9998 1.0000 0.0001 0.0017 0.0112 0.0461 0.1334 0.2905 0.5000 0.7095 0.8666 0.9539 0.9888 0.9983 0.9999 1.0000 0.0001 0.0009

12

13

14

514

Appendix III Tables

Table 1

(continued )

n 14

k 2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13

1/16 0.9471 0.9908 0.9988 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.3798 0.7596 0.9369 0.9881 0.9983 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.3561 0.7359 0.9258 0.9849 0.9977 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000

2/16 0.7490 0.9127 0.9970 0.9953 0.9993 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.1349 0.4241 0.7132 0.8922 0.9689 0.9930 0.9988 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.1181 0.3879 0.6771 0.8698 0.9593 0.9900 0.9981 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000

3/16 0.4960 0.7404 0.8955 0.9671 0.9919 0.9985 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0444 0.1981 0.4463 0.6946 0.8665 0.9537 0.9873 0.9972 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0361 0.1693 0.3998 0.6480 0.8342 0.9373 0.9810 0.9954 0.9991 0.9999 1.0000 1.0000 1.0000 1.0000

p 4/16 0.2811 0.5213 0.7415 0.8883 0.9167 0.9897 0.9978 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 0.0134 0.0802 0.2361 0.4613 0.6865 0.8516 0.9434 0.9827 0.9958 0.9992 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 0.0100 0.0635 0.1971 0.4050 0.6302 0.8103 0.9204 0.9729 0.9925 0.9984 0.9997 1.0000 1.0000 1.0000

5/16 0.1379 0.3181 0.5432 0.7480 0.8876 0.9601 0.9889 0.9976 0.9996 1.0000 1.0000 1.0000 1.0000 0.0036 0.0283 0.1069 0.2618 0.4729 0.6840 0.8435 0.9374 0.9799 0.9949 0.9990 0.9999 1.0000 1.0000 1.0000 1.0000 0.0025 0.0206 0.0824 0.2134 0.4069 0.6180 0.7940 0.9082 0.9666 0.9902 0.9977 0.9996 0.9999 1.0000

6/16 0.0585 0.1676 0.3477 0.5637 0.7581 0.8915 0.9615 0.9895 0.9979 0.9997 1.0000 1.0000 1.0000 0.0009 0.0087 0.0415 0.1267 0.2801 0.4827 0.6852 0.8415 0.9352 0.9790 0.9947 0.9990 0.9999 1.0000 1.0000 1.0000 0.0005 0.0057 0.0292 0.0947 0.2226 0.4067 0.6093 0.7829 0.9001 0.9626 0.9888 0.9974 0.9995 0.9999

7/16 0.0213 0.0756 0.1919 0.3728 0.5839 0.7715 0.8992 0.9654 0.9911 0.9984 0.9998 1.0000 1.0000 0.0002 0.0023 0.0136 0.0518 0.1410 0.2937 0.4916 0.6894 0.8433 0.9364 0.9799 0.9952 0.9992 0.9999 1.0000 1.0000 0.0001 0.0014 0.0086 0.0351 0.1020 0.2269 0.4050 0.6029 0.7760 0.8957 0.9609 0.9885 0.9975 0.9996

8/16 0.0065 0.0287 0.0898 0.2120 0.3953 0.6047 0.7880 0.9102 0.9713 0.9935 0.9991 0.9999 1.0000 0.0000 0.0005 0.0037 0.0176 0.0592 0.1509 0.3036 0.5000 0.6964 0.8491 0.9408 0.9824 0.9963 0.9995 1.0000 1.0000 0.0000 0.0003 0.0021 0.0106 0.0384 0.1051 0.2272 0.4018 0.5982 0.7728 0.8949 0.9616 0.9894 0.9979

15

16

Tables

515

Table 1

(continued )

n 16

k 14 15 16 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 0 1 2 3 4

1/16 1.0000 1.0000 1.0000 0.3338 0.7121 0.9139 0.9812 0.9969 0.9996 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.3130 0.6885 0.9013 0.9770 0.9959 0.9994 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.2934 0.6650 0.8880 0.9722 0.9947

2/16 1.0000 1.0000 1.0000 0.1033 0.3542 0.6409 0.8457 0.9482 0.9862 0.9971 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0904 0.3228 0.6051 0.8201 0.9354 0.9814 0.9957 0.9992 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0791 0.2938 0.5698 0.7933 0.9209

3/16 1.0000 1.0000 1.0000 0.0293 0.1443 0.3566 0.6015 0.7993 0.9180 0.9728 0.9927 0.9984 0.9997 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0238 0.1227 0.3168 0.5556 0.7622 0.8958 0.9625 0.9889 0.9973 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0193 0.1042 0.2804 0.5108 0.7235

p 4/16 1.0000 1.0000 1.0000 0.0075 0.0501 0.1637 0.3530 0.5739 0.7653 0.8929 0.9598 0.9876 0.9969 0.9994 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 0.0056 0.0395 0.1353 0.3057 0.5187 0.7175 0.8610 0.9431 0.9807 0.9946 0.9988 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0042 0.0310 0.1113 0.2631 0.4654

5/16 1.0000 1.0000 1.0000 0.0017 0.0149 0.0631 0.1724 0.3464 0.5520 0.7390 0.8725 0.9484 0.9828 0.9954 0.9990 0.9998 1.0000 1.0000 1.0000 1.0000 0.0012 0.0108 0.0480 0.1383 0.2920 0.4878 0.6806 0.8308 0.9247 0.9721 0.9915 0.9979 0.9996 0.9999 1.0000 1.0000 1.0000 1.0000 0.0008 0.0078 0.0364 0.1101 0.2440

6/16 1.0000 1.0000 1.0000 0.0003 0.0038 0.0204 0.0701 0.1747 0.3377 0.5333 0.7178 0.8561 0.9391 0.9790 0.9942 0.9987 0.9998 1.0000 1.0000 1.0000 0.0002 0.0025 0.0142 0.0515 0.1355 0.2765 0.4600 0.6486 0.8042 0.9080 0.9640 0.9885 0.9970 0.9994 0.9999 1.0000 1.0000 1.0000 0.0001 0.0016 0.0098 0.0375 0.1040

7/16 1.0000 1.0000 1.0000 0.0001 0.0008 0.0055 0.0235 0.0727 0.1723 0.3271 0.5163 0.7002 0.8433 0.9323 0.9764 0.9935 0.9987 0.9998 1.0000 1.0000 0.0000 0.0005 0.0034 0.0156 0.0512 0.1287 0.2593 0.4335 0.6198 0.7807 0.8934 0.9571 0.9860 0.9964 0.9993 0.9999 1.0000 1.0000 0.0000 0.0003 0.0021 0.0103 0.0356

8/16 0.9997 1.0000 1.0000 0.0000 0.0001 0.0012 0.0064 0.0245 0.0717 0.1662 0.3145 0.5000 0.6855 0.8338 0.9283 0.9755 0.9936 0.9988 0.9999 1.0000 0.0000 0.0001 0.0007 0.0038 0.0154 0.0481 0.1189 0.2403 0.4073 0.5927 0.7597 0.8811 0.9519 0.9846 0.9962 0.9993 0.9999 1.0000 0.0000 0.0000 0.0004 0.0022 0.0096

17

18

19

516

Appendix III Tables

Table 1

(continued )

n 19

k 5 6 7 8 9 10 11 12 13 14 15 16 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 0 1 2 3 4 5 6 7 8

1/16 0.9992 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.2751 0.6148 0.8741 0.9670 0.9933 0.9989 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.2579 0.6189 0.8596 0.9612 0.9917 0.9986 0.9998 1.0000 1.0000

2/16 0.9757 0.9939 0.9988 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0692 0.2669 0.5353 0.7653 0.9050 0.9688 0.9916 0.9981 0.9997 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0606 0.2422 0.5018 0.7366 0.8875 0.9609 0.9888 0.9973 0.9995

3/16 0.8707 0.9500 0.9840 0.9957 0.9991 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0157 0.0883 0.2473 0.4676 0.6836 0.8431 0.9351 0.9776 0.9935 0.9984 0.9997 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0128 0.0747 0.2175 0.4263 0.6431 0.8132 0.9179 0.9696 0.9906

p 4/16 0.6678 0.8251 0.9225 0.9713 0.9911 0.9977 0.9995 0.9999 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0032 0.0243 0.0913 0.2252 0.4148 0.6172 0.7858 0.8982 0.9591 0.9861 0.9961 0.9991 0.9998 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0024 0.0190 0.0745 0.1917 0.3674 0.5666 0.7436 0.8701 0.9439

5/16 0.4266 0.6203 0.7838