
**A Course in Mathematical Statistics**

Second Edition

George G. Roussas
Intercollege Division of Statistics
University of California
Davis, California

ACADEMIC PRESS
San Diego • New York • Boston • London • Sydney • Tokyo • Toronto


This book is printed on acid-free paper. ∞

Copyright © 1997 by Academic Press

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

ACADEMIC PRESS
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
1300 Boylston Street, Chestnut Hill, MA 02167, USA
http://www.apnet.com

ACADEMIC PRESS LIMITED
24–28 Oval Road, London NW1 7DX, UK
http://www.hbuk.co.uk/ap/

Library of Congress Cataloging-in-Publication Data
Roussas, George G.
A course in mathematical statistics / George G. Roussas.—2nd ed.
p. cm.
Rev. ed. of: A first course in mathematical statistics. 1973.
Includes index.
ISBN 0-12-599315-3
1. Mathematical statistics. I. Roussas, George G. First course in mathematical statistics. II. Title.
QA276.R687 1997
519.5—dc20          96-42115
CIP

Printed in the United States of America
96 97 98 99 00 EB 9 8 7 6 5 4 3 2 1


To my wife and sons


Contents

Preface to the Second Edition xv
Preface to the First Edition xviii

Chapter 1

**Basic Concepts of Set Theory 1**

1.1 Some Definitions and Notation 1
Exercises 5
1.2* Fields and σ-Fields 8

Chapter 2

**Some Probabilistic Concepts and Results 14**

2.1 Probability Functions and Some Basic Properties and Results 14
Exercises 20
2.2 Conditional Probability 21
Exercises 25
2.3 Independence 27
Exercises 33
2.4 Combinatorial Results 34
Exercises 40
2.5* Product Probability Spaces 45
Exercises 47


2.6* The Probability of Matchings 47
Exercises 52

Chapter 3

**On Random Variables and Their Distributions 53**

3.1 Some General Concepts 53
3.2 Discrete Random Variables (and Random Vectors) 55
Exercises 61
3.3 Continuous Random Variables (and Random Vectors) 65
Exercises 76
3.4 The Poisson Distribution as an Approximation to the Binomial Distribution and the Binomial Distribution as an Approximation to the Hypergeometric Distribution 79
Exercises 82
3.5* Random Variables as Measurable Functions and Related Results 82
Exercises 84

Chapter 4

**Distribution Functions, Probability Densities, and Their Relationship 85**

4.1 The Cumulative Distribution Function (c.d.f. or d.f.) of a Random Vector—Basic Properties of the d.f. of a Random Variable 85
Exercises 89
4.2 The d.f. of a Random Vector and Its Properties—Marginal and Conditional d.f.’s and p.d.f.’s 91
Exercises 97
4.3 Quantiles and Modes of a Distribution 99
Exercises 102
4.4* Justification of Statements 1 and 2 102
Exercises 105

Chapter 5

**Moments of Random Variables—Some Moment and Probability Inequalities 106**

5.1 Moments of Random Variables 106
Exercises 111
5.2 Expectations and Variances of Some R.V.’s 114
Exercises 119
5.3 Conditional Moments of Random Variables 122
Exercises 124

5.4 Some Important Applications: Probability and Moment Inequalities 125
Exercises 128
5.5 Covariance, Correlation Coefficient and Its Interpretation 129
Exercises 133
5.6* Justification of Relation (2) in Chapter 2 134

Chapter 6

**Characteristic Functions, Moment Generating Functions and Related Theorems 138**

6.1 Preliminaries 138
6.2 Definitions and Basic Theorems—The One-Dimensional Case 140
Exercises 145
6.3 The Characteristic Functions of Some Random Variables 146
Exercises 149
6.4 Definitions and Basic Theorems—The Multidimensional Case 150
Exercises 152
6.5 The Moment Generating Function and Factorial Moment Generating Function of a Random Variable 153
Exercises 160

Chapter 7

**Stochastic Independence with Some Applications 164**

7.1 Stochastic Independence: Criteria of Independence 164
Exercises 168
7.2 Proof of Lemma 2 and Related Results 170
Exercises 172
7.3 Some Consequences of Independence 173
Exercises 176
7.4* Independence of Classes of Events and Related Results 177
Exercise 179

Chapter 8

**Basic Limit Theorems 180**

8.1 Some Modes of Convergence 180
Exercises 182
8.2 Relationships Among the Various Modes of Convergence 182
Exercises 187
8.3 The Central Limit Theorem 187
Exercises 194
8.4 Laws of Large Numbers 196
Exercises 198

8.5 Further Limit Theorems 199
Exercises 206
8.6* Pólya’s Lemma and Alternative Proof of the WLLN 206
Exercises 211

Chapter 9

**Transformations of Random Variables and Random Vectors 212**

9.1 The Univariate Case 212
Exercises 218
9.2 The Multivariate Case 219
Exercises 233
9.3 Linear Transformations of Random Vectors 235
Exercises 240
9.4 The Probability Integral Transform 242
Exercise 244

Chapter 10

**Order Statistics and Related Theorems 245**

10.1 Order Statistics and Related Distributions 245
Exercises 252
10.2 Further Distribution Theory: Probability of Coverage of a Population Quantile 256
Exercise 258

Chapter 11

**Sufficiency and Related Theorems 259**

11.1 Sufficiency: Definition and Some Basic Results 260
Exercises 269
11.2 Completeness 271
Exercises 273
11.3 Unbiasedness—Uniqueness 274
Exercises 276
11.4 The Exponential Family of p.d.f.’s: One-Dimensional Parameter Case 276
Exercises 280
11.5 Some Multiparameter Generalizations 281
Exercises 282

Chapter 12

**Point Estimation 284**

12.1 Introduction 284
Exercise 284

12.2 Criteria for Selecting an Estimator: Unbiasedness, Minimum Variance 285
Exercises 286
12.3 The Case of Availability of Complete Sufficient Statistics 287
Exercises 292
12.4 The Case Where Complete Sufficient Statistics Are Not Available or May Not Exist: Cramér-Rao Inequality 293
Exercises 301
12.5 Criteria for Selecting an Estimator: The Maximum Likelihood Principle 302
Exercises 308
12.6 Criteria for Selecting an Estimator: The Decision-Theoretic Approach 309
12.7 Finding Bayes Estimators 312
Exercises 317
12.8 Finding Minimax Estimators 318
Exercise 320
12.9 Other Methods of Estimation 320
Exercises 322
12.10 Asymptotically Optimal Properties of Estimators 322
Exercise 325
12.11 Closing Remarks 325
Exercises 326

Chapter 13

**Testing Hypotheses 327**

13.1 General Concepts of the Neyman-Pearson Testing Hypotheses Theory 327
Exercise 329
13.2 Testing a Simple Hypothesis Against a Simple Alternative 329
Exercises 336
13.3 UMP Tests for Testing Certain Composite Hypotheses 337
Exercises 347
13.4 UMPU Tests for Testing Certain Composite Hypotheses 349
Exercises 353
13.5 Testing the Parameters of a Normal Distribution 353
Exercises 356
13.6 Comparing the Parameters of Two Normal Distributions 357
Exercises 360
13.7 Likelihood Ratio Tests 361
Exercises 369
13.8 Applications of LR Tests: Contingency Tables, Goodness-of-Fit Tests 370
Exercises 373
13.9 Decision-Theoretic Viewpoint of Testing Hypotheses 375

Chapter 14

**Sequential Procedures 382**

14.1 Some Basic Theorems of Sequential Sampling 382
Exercises 388
14.2 Sequential Probability Ratio Test 388
Exercise 392
14.3 Optimality of the SPRT-Expected Sample Size 393
Exercises 394
14.4 Some Examples 394

Chapter 15

**Confidence Regions—Tolerance Intervals 397**

15.1 Confidence Intervals 397
Exercise 398
15.2 Some Examples 398
Exercises 404
15.3 Confidence Intervals in the Presence of Nuisance Parameters 407
Exercise 410
15.4 Confidence Regions—Approximate Confidence Intervals 410
Exercises 412
15.5 Tolerance Intervals 413

Chapter 16

**The General Linear Hypothesis 416**

16.1 Introduction of the Model 416
16.2 Least Square Estimators—Normal Equations 418
16.3 Canonical Reduction of the Linear Model—Estimation of σ² 424
Exercises 428
16.4 Testing Hypotheses About η = E(Y) 429
Exercises 433
16.5 Derivation of the Distribution of the F Statistic 433
Exercises 436

Chapter 17

**Analysis of Variance 440**

17.1 One-way Layout (or One-way Classification) with the Same Number of Observations Per Cell 440
Exercise 446
17.2 Two-way Layout (Classification) with One Observation Per Cell 446
Exercises 451
17.3 Two-way Layout (Classification) with K (≥ 2) Observations Per Cell 452
Exercises 457
17.4 A Multicomparison Method 458
Exercises 462

Chapter 18

**The Multivariate Normal Distribution 463**

18.1 Introduction 463
Exercises 466
18.2 Some Properties of Multivariate Normal Distributions 467
Exercise 469
18.3 Estimation of μ and Σ and a Test of Independence 469
Exercises 475

Chapter 19

**Quadratic Forms 476**

19.1 Introduction 476
19.2 Some Theorems on Quadratic Forms 477
Exercises 483

Chapter 20

**Nonparametric Inference 485**

20.1 Nonparametric Estimation 485
20.2 Nonparametric Estimation of a p.d.f. 487
Exercise 490
20.3 Some Nonparametric Tests 490
20.4 More About Nonparametric Tests: Rank Tests 493
Exercises 496
20.5 Sign Test 496
20.6 Relative Asymptotic Efficiency of Tests 498

Appendix I

**Topics from Vector and Matrix Algebra 499**

I.1 Basic Definitions in Vector Spaces 499
I.2 Some Theorems on Vector Spaces 501
I.3 Basic Definitions About Matrices 502
I.4 Some Theorems About Matrices and Quadratic Forms 504

Appendix II

**Noncentral t-, χ²-, and F-Distributions 508**

II.1 Noncentral t-Distribution 508
II.2 Noncentral χ²-Distribution 508
II.3 Noncentral F-Distribution 509

Appendix III

**Tables 511**

1 The Cumulative Binomial Distribution 511
2 The Cumulative Poisson Distribution 520
3 The Normal Distribution 523
4 Critical Values for Student’s t-Distribution 526
5 Critical Values for the Chi-Square Distribution 529
6 Critical Values for the F-Distribution 532
7 Table of Selected Discrete and Continuous Distributions and Some of Their Characteristics 542

Some Notation and Abbreviations 545

Answers to Selected Exercises 547

Index 561

Preface to the Second Edition

This is the second edition of a book published for the first time in 1973 by Addison-Wesley Publishing Company, Inc., under the title A First Course in Mathematical Statistics. The first edition has been out of print for a number of years now, although its reprint in Taiwan is still available; that issue, however, is meant for circulation only in Taiwan.

The first issue of the book was very well received from an academic viewpoint. I have had the pleasure of hearing colleagues telling me that the book filled an existing gap between a plethora of textbooks of lower mathematical level and others of considerably higher level. A substantial number of colleagues, holding senior academic appointments in North America and elsewhere, have acknowledged to me that they made their entrance into the wonderful world of probability and statistics through my book. I have also heard of the book as being in a class of its own, and also as forming a collector’s item, after it went out of print. Throughout the years, I have received numerous inquiries as to the possibility of having the book reprinted. It is in response to these comments and inquiries that I have decided to prepare a second edition of the book.

This second edition preserves the unique character of the first issue of the book, whereas some adjustments are effected. The changes in this issue consist in correcting some rather minor factual errors and a considerable number of misprints, either kindly brought to my attention by users of the book or located by my students and myself. Also, the reissuing of the book has provided me with an excellent opportunity to incorporate certain rearrangements of the material.

One change occurring throughout the book is the grouping of exercises of each chapter in clusters added at the end of sections. Associating exercises with material discussed in sections clearly makes their assignment easier. In the process of doing this, a handful of exercises were omitted, as being too complicated for the level of the book, and a few new ones were inserted.

In Chapters 1 through 8, some of the materials were pulled out to form separate sections. These sections have also been marked by an asterisk (*) to indicate the fact that their omission does not jeopardize the flow of presentation and understanding of the remaining material. They are still readily available for those who wish to employ them to add elegance and rigor in the discussion, but their inclusion is not indispensable.

Specifically, in Chapter 1, the concepts of a field and of a σ-field, and basic results on them, have been grouped together in Section 1.2*. In Chapter 2, the number of sections has been doubled from three to six. This was done by discussing independence and product probability spaces in separate sections; also, the solution of the problem of the probability of matching is isolated in a section by itself. The section on the problem of the probability of matching and the section on product probability spaces are also marked by an asterisk for the reason explained above. In Chapter 3, the discussion of random variables as measurable functions and related results is carried out in a separate section, Section 3.5*. In Chapter 4, two new sections have been created by discussing separately marginal and conditional distribution functions and probability density functions, and also by presenting, in Section 4.4*, the proofs of two statements, Statements 1 and 2, formulated in Section 4.3. In Chapter 5, the discussion of covariance and correlation coefficient is carried out in a separate section, and some additional material is also presented for the purpose of further clarifying the interpretation of correlation coefficient; also, the justification of relation (2) in Chapter 2 is done in a section by itself, Section 5.6*. In Chapter 6, the number of sections has been expanded from three to five by discussing in separate sections characteristic functions for the one-dimensional and the multidimensional case, and also by isolating in a section by itself definitions and results on moment-generating functions and factorial moment generating functions. In Chapter 7, the number of sections has been doubled from two to four by presenting the proof of Lemma 2, stated in Section 7.1, and related results in a separate section, and also by grouping together in a section marked by an asterisk definitions and results on independence of classes of events and related results. Finally, in Chapter 8, a new theorem, Theorem 10, especially useful in estimation, has been added in Section 8.5. Also, the proof of Pólya’s lemma and an alternative proof of the Weak Law of Large Numbers, based on truncation, are carried out in a separate section, Section 8.6*, thus increasing the number of sections from five to six; this last section is also marked by an asterisk.

In the remaining chapters, no changes were deemed necessary, except that in Chapter 13, the proof of Theorem 2 in Section 13.3 has been facilitated by the formulation and proof in the same section of two lemmas, Lemma 1 and Lemma 2. Also, in Chapter 14, the proof of Theorem 1 in Section 14.1 has been somewhat simplified by the formulation and proof of Lemma 1 in the same section.

Finally, a table of some commonly met distributions, along with their means, variances and other characteristics, has been added. The value of such a table for reference purposes is obvious, and needs no elaboration.

This book contains enough material for a year course in probability and statistics at the advanced undergraduate level, or for first-year graduate students not having been exposed before to a serious course on the subject matter. Some of the material can actually be omitted without disrupting the continuity of presentation. This includes the sections marked by asterisks, Sections 13.4–13.6 in Chapter 13, and all of Chapter 14. The instructor can also be selective regarding Chapters 11 and 18. As for Chapter 19, it has been included in the book for completeness only.

The book can also be used independently for a one-semester (or even one-quarter) course in probability alone. In such a case, one would strive to cover the material in Chapters 1 through 10 with the exclusion, perhaps, of the sections marked by an asterisk. One may also be selective in covering the material in Chapter 9.

In either case, presentation of results involving characteristic functions may be perfunctory only, with emphasis placed on moment-generating functions. One should mention, however, why characteristic functions are introduced in the first place, and therefore what one may be missing by not utilizing this valuable tool.

In closing, it is to be mentioned that this author is fully aware of the fact that the audience for a book of this level has diminished rather than increased since the time of its first edition. He is also cognizant of the trend of having recipes of probability and statistical results parading in textbooks, depriving the reader of the challenge of thinking and reasoning, instead delegating the “thinking” to a computer. Indeed, the trend and practices just described should make the availability of a textbook such as this one exceedingly useful if not imperative. It is hoped that there is still room for a book of the nature and scope of the one at hand.

G. G. Roussas
Davis, California
May 1996

Preface to the First Edition

This book is designed for a first-year course in mathematical statistics at the undergraduate level, as well as for first-year graduate students in statistics—or graduate students, in general—with no prior knowledge of statistics. A typical three-semester course in calculus and some familiarity with linear algebra should suffice for the understanding of most of the mathematical aspects of this book. Some advanced calculus—perhaps taken concurrently—would be helpful for the complete appreciation of some fine points.

There are basically two streams of textbooks on mathematical statistics that are currently on the market. One category is the advanced level texts which demonstrate the statistical theories in their full generality and mathematical rigor; for that purpose, they require a high level, mathematical background of the reader (for example, measure theory, real and complex analysis). The other category consists of intermediate level texts, where the concepts are demonstrated in terms of intuitive reasoning, and results are often stated without proofs or with partial proofs that fail to satisfy an inquisitive mind. Thus, readers with a modest background in mathematics and a strong motivation to understand statistical concepts are left somewhere in between. The advanced texts are inaccessible to them, whereas the intermediate texts deliver much less than they hope to learn in a course of mathematical statistics. The present book attempts to bridge the gap between the two categories, so that students without a sophisticated mathematical background can assimilate a fairly broad spectrum of the theorems and results from mathematical statistics. This has been made possible by developing the fundamentals of modern probability theory and the accompanying mathematical ideas at the beginning of this book so as to prepare the reader for an understanding of the material presented in the later chapters.

This book consists of two parts, although it is not formally so divided. Part 1 (Chapters 1–10) deals with probability and distribution theory, whereas Part 2 (Chapters 11–20) is devoted to statistical inference.

More precisely, in Part 1 the concepts of a field and σ-field, and also the definition of a random variable as a measurable function, are introduced. This allows us to state and prove fundamental results in their full generality that would otherwise be presented vaguely using statements such as “it may be shown that . . . ,” “it can be proved that . . . ,” etc. This we consider to be one of the distinctive characteristics of this part. Other important features are as follows: a detailed and systematic discussion of the most useful distributions along with figures and various approximations for several of them; the establishment of several moment and probability inequalities; the systematic employment of characteristic functions—rather than moment generating functions—with all the well-known advantages of the former over the latter; an extensive chapter on limit theorems, including all common modes of convergence and their relationship, a complete statement and proof of the Central Limit Theorem (in its classical form), statements of the Laws of Large Numbers and several proofs of the Weak Law of Large Numbers, and further useful limit theorems; and also an extensive chapter on transformations of random variables with numerous illustrative examples discussed in detail.

The second part of the book opens with an extensive chapter on sufficiency. The concept of sufficiency is usually treated only in conjunction with estimation and testing hypotheses problems. In many textbooks of about the same level of sophistication as the present book, the above two topics are approached either separately or in the reverse order from the one used here, which is pedagogically unsound, although historically logical. In our opinion, this does not do justice to such an important concept as that of sufficiency. Next, the point estimation problem is taken up and is discussed in great detail and as large a generality as is allowed by the level of this book. Special attention is given to estimators derived by the principles of unbiasedness, uniform minimum variance and the maximum likelihood and minimax principles. An abundance of examples is also found in this chapter. The following chapter is devoted to testing hypotheses problems. Here, along with the examples (most of them numerical) and the illustrative figures, the reader finds a discussion of families of probability density functions which have the monotone likelihood ratio property and, in particular, a discussion of exponential families. These latter topics are available only in more advanced texts. Other features are a complete formulation and treatment of the general Linear Hypothesis and the discussion of the Analysis of Variance as an application of it. In addition, there are special chapters on sequential procedures, confidence regions—tolerance intervals, the Multivariate Normal distribution, quadratic forms, and nonparametric inference.

A few of the proofs of theorems and some exercises have been drawn from recent publications in journals. For the convenience of the reader, the book also includes an appendix summarizing all necessary results from vector and matrix algebra.

There are more than 120 examples and applications discussed in detail in the text, which are of both theoretical and practical importance. Also, there are more than 530 exercises, appearing at the end of the chapters, of both theoretical and applied nature. The careful selection of the material, the inclusion of a large variety of topics, the abundance of examples, and the existence of a host of exercises will, we hope, satisfy people of both theoretical and applied inclinations. All the application-oriented reader has to do is to skip some fine points of some of the proofs (or some of the proofs altogether!) when studying the book. On the other hand, the careful handling of these same fine points should offer some satisfaction to the more mathematically inclined readers.

The material of this book has been presented several times to classes of the composition mentioned earlier; that is, classes consisting of relatively mathematically immature, eager, and adventurous sophomores, as well as juniors and seniors, and statistically unsophisticated graduate students. These classes met three hours a week over the academic year, and most of the material was covered in the order in which it is presented, with the occasional exception of Chapters 14 and 20, Section 5 of Chapter 5, and Section 3 of Chapter 9. We feel that there is enough material in this book for a three-quarter session if the classes meet three or even four hours a week.

As the teaching of statistics becomes more widespread and its level of sophistication and mathematical rigor (even among those with limited mathematical training but yet wishing to know “why” and “how”) more demanding, we hope that this book will fill a gap and satisfy an existing need.

At various stages and times during the organization of this book several students and colleagues helped improve it by their comments. In connection with this, special thanks are due to G. K. Bhattacharyya; his meticulous reading of the manuscripts resulted in many comments and suggestions that helped improve the quality of the text. Also thanks go to B. Lind, K. G. Mehrotra, A. Agresti, and a host of others, too many to be mentioned here. Of course, the responsibility in this book lies with this author alone for all omissions and errors which may still be found.

G. G. Roussas
Madison, Wisconsin
November 1972

Chapter 1

Basic Concepts of Set Theory

1.1 Some Definitions and Notation

A set S is a (well defined) collection of distinct objects which we denote by s. The fact that s is a member of S, or that it belongs to S, is expressed by writing s ∈ S. The negation of the statement is expressed by writing s ∉ S. We say that S′ is a subset of S, or that S′ is contained in S, and write S′ ⊆ S, if for every s ∈ S′, we have s ∈ S. S′ is said to be a proper subset of S, and we write S′ ⊂ S, if S′ ⊆ S and there exists s ∈ S such that s ∉ S′. Sets are denoted by capital letters, while lower case letters are used for elements of sets. These concepts can be illustrated pictorially by a drawing called a Venn diagram (Fig. 1.1). From now on a basic, or universal set, or space (which may be different from situation to situation), to be denoted by S, will be considered, and all other sets in question will be subsets of S.

[Figure 1.1: S′ ⊆ S; in fact, S′ ⊂ S, since s2 ∈ S but s2 ∉ S′.]

1.1.1 Set Operations

1. The complement (with respect to S) of the set A, denoted by Aᶜ, is defined by Aᶜ = {s ∈ S; s ∉ A}. (See Fig. 1.2.)

2. The union of the sets Aj, j = 1, 2, . . . , n, to be denoted by A1 ∪ A2 ∪ · · · ∪ An or ⋃_{j=1}^n Aj, is defined by

⋃_{j=1}^n Aj = {s ∈ S; s ∈ Aj for at least one j = 1, 2, . . . , n}.

For n = 2, this is pictorially illustrated in Fig. 1.3. The definition extends to an infinite number of sets. Thus for denumerably many sets, one has

⋃_{j=1}^∞ Aj = {s ∈ S; s ∈ Aj for at least one j = 1, 2, . . .}.

3. The intersection of the sets Aj, j = 1, 2, . . . , n, to be denoted by A1 ∩ A2 ∩ · · · ∩ An or ⋂_{j=1}^n Aj, is defined by

⋂_{j=1}^n Aj = {s ∈ S; s ∈ Aj for all j = 1, 2, . . . , n}.

For n = 2, this is pictorially illustrated in Fig. 1.4. The definition extends to an infinite number of sets. Thus for denumerably many sets, one has

⋂_{j=1}^∞ Aj = {s ∈ S; s ∈ Aj for all j = 1, 2, . . .}.

[Figure 1.2: Aᶜ is the shaded region. Figure 1.3: A1 ∪ A2 is the shaded region. Figure 1.4: A1 ∩ A2 is the shaded region.]

4. The difference A1 − A2 is defined by

A1 − A2 = {s ∈ S; s ∈ A1, s ∉ A2}.

Symmetrically,

A2 − A1 = {s ∈ S; s ∈ A2, s ∉ A1}.

Note that A1 − A2 = A1 ∩ A2ᶜ, A2 − A1 = A2 ∩ A1ᶜ, and that, in general, A1 − A2 ≠ A2 − A1. (See Fig. 1.5.)

5. The symmetric difference A1 Δ A2 is defined by

A1 Δ A2 = (A1 − A2) ∪ (A2 − A1).

Note that

A1 Δ A2 = (A1 ∪ A2) − (A1 ∩ A2).

Pictorially, this is shown in Fig. 1.6. It is worthwhile to observe that operations (4) and (5) can be expressed in terms of operations (1), (2), and (3).

[Figure 1.5: A1 − A2 is ////; A2 − A1 is \\\\. Figure 1.6: A1 Δ A2 is the shaded area.]

1.1.2 Further Definitions and Notation

A set which contains no elements is called the empty set and is denoted by ∅. Two sets A1, A2 are said to be disjoint if A1 ∩ A2 = ∅. Two sets A1, A2 are said to be equal, and we write A1 = A2, if both A1 ⊆ A2 and A2 ⊆ A1. The sets Aj, j = 1, 2, . . . , are said to be pairwise or mutually disjoint if Ai ∩ Aj = ∅ for all i ≠ j (Fig. 1.7). In such a case, it is customary to write

A1 + A2,  A1 + · · · + An = ∑_{j=1}^n Aj  and  A1 + A2 + · · · = ∑_{j=1}^∞ Aj

instead of A1 ∪ A2, ⋃_{j=1}^n Aj, and ⋃_{j=1}^∞ Aj, respectively. We will write ⋃_j Aj, ∑_j Aj, ⋂_j Aj, where we do not wish to specify the range of j, which will usually be either the (finite) set {1, 2, . . . , n} or the (infinite) set {1, 2, . . .}.

[Figure 1.7: A1 and A2 are disjoint; A1 ∩ A2 = ∅.]
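The set operations above can be illustrated concretely. The short sketch below is an editorial addition (mine, not part of the original text) that mirrors each definition with Python's built-in set operators; the particular sets chosen are arbitrary.

```python
S = frozenset(range(8))                                  # the basic space S
A1, A2 = frozenset({0, 1, 2, 3}), frozenset({2, 3, 4, 5})
comp = lambda A: S - A                                   # 1. complement with respect to S

assert A1 | A2 == frozenset({0, 1, 2, 3, 4, 5})          # 2. union
assert A1 & A2 == frozenset({2, 3})                      # 3. intersection
assert A1 - A2 == A1 & comp(A2) == frozenset({0, 1})     # 4. difference: A1 − A2 = A1 ∩ A2ᶜ
assert A1 - A2 != A2 - A1                                # differences need not coincide
assert A1 ^ A2 == (A1 - A2) | (A2 - A1)                  # 5. symmetric difference, by definition
assert A1 ^ A2 == (A1 | A2) - (A1 & A2)                  # equivalent expression for A1 Δ A2
assert A1 & frozenset({6, 7}) == frozenset()             # A1 and {6, 7} are disjoint
```

Each assertion is exactly one of the identities or definitions stated in the text, so the block doubles as a quick sanity check of them on a finite example.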

1.1.3 Properties of the Operations on Sets

1. Sᶜ = ∅, ∅ᶜ = S, (Aᶜ)ᶜ = A.
2. S ∪ A = S, ∅ ∪ A = A, A ∪ Aᶜ = S, A ∪ A = A.
3. S ∩ A = A, ∅ ∩ A = ∅, A ∩ Aᶜ = ∅, A ∩ A = A.

The previous statements are all obvious, as is the following: ∅ ⊆ A for every subset A of S. Also

4. A1 ∪ (A2 ∪ A3) = (A1 ∪ A2) ∪ A3, A1 ∩ (A2 ∩ A3) = (A1 ∩ A2) ∩ A3 (Associative laws)
5. A1 ∪ A2 = A2 ∪ A1, A1 ∩ A2 = A2 ∩ A1 (Commutative laws)
6. A ∩ (⋃_j Aj) = ⋃_j (A ∩ Aj), A ∪ (⋂_j Aj) = ⋂_j (A ∪ Aj) (Distributive laws)

are easily seen to be true.

The following identity is a useful tool in writing a union of sets as a sum of disjoint sets.

An identity:

⋃_j Aj = A1 + A1ᶜ ∩ A2 + A1ᶜ ∩ A2ᶜ ∩ A3 + · · · .

There are two more important properties of the operations on sets which relate complementation to union and intersection. They are known as De Morgan’s laws:

(i) (⋃_j Aj)ᶜ = ⋂_j Ajᶜ,
(ii) (⋂_j Aj)ᶜ = ⋃_j Ajᶜ.

As an example of a set theoretic proof, we prove (i).

PROOF OF (i). We wish to establish

a) (⋃_j Aj)ᶜ ⊆ ⋂_j Ajᶜ and
b) ⋂_j Ajᶜ ⊆ (⋃_j Aj)ᶜ.

a) Let s ∈ (⋃_j Aj)ᶜ. Then s ∉ ⋃_j Aj; hence s ∉ Aj for any j. Thus s ∈ Ajᶜ for every j and therefore s ∈ ⋂_j Ajᶜ.

b) Let s ∈ ⋂_j Ajᶜ. Then s ∈ Ajᶜ for every j and hence s ∉ Aj for any j. Then s ∉ ⋃_j Aj and therefore s ∈ (⋃_j Aj)ᶜ.

We have then verified the desired equality of the two sets. The proof of (ii) is quite similar. ▲

This section is concluded with the following:

DEFINITION 1. The sequence {An}, n = 1, 2, . . . , is said to be a monotone sequence of sets if:

(i) A1 ⊆ A2 ⊆ A3 ⊆ · · · (that is, An is increasing, to be denoted by An↑), or
(ii) A1 ⊇ A2 ⊇ A3 ⊇ · · · (that is, An is decreasing, to be denoted by An↓).

The limit of a monotone sequence is defined as follows:

(i) If An↑, then lim_{n→∞} An = ⋃_{n=1}^∞ An, and
(ii) If An↓, then lim_{n→∞} An = ⋂_{n=1}^∞ An.

More generally, for any sequence {An}, n = 1, 2, . . . , we define

A̲ = lim inf_{n→∞} An = ⋃_{n=1}^∞ ⋂_{j=n}^∞ Aj

and

Ā = lim sup_{n→∞} An = ⋂_{n=1}^∞ ⋃_{j=n}^∞ Aj.

The sets A̲ and Ā are called the inferior limit and superior limit, respectively, of the sequence {An}. The sequence {An} has a limit if A̲ = Ā.

Exercises

1.1.1 Let Aj, j = 1, 2, 3, be arbitrary subsets of S. Determine whether each of the following statements is correct or incorrect:

(i) (A1 − A2) ∪ A2 = A2;
(ii) (A1 ∪ A2) − A1 = A2;
(iii) (A1 ∩ A2) ∩ (A1 − A2) = ∅;
(iv) (A1 ∪ A2) ∩ (A2 ∪ A3) ∩ (A3 ∪ A1) = (A1 ∩ A2) ∪ (A2 ∩ A3) ∪ (A3 ∩ A1).

1.1.2 Let S = {(x, y)′ ∈ ℝ²; −5 ≤ x ≤ 5, 0 ≤ y ≤ 5, x, y = integers}, where prime denotes transpose, and define the subsets Aj, j = 1, . . . , 7, of S as follows:

A1 = {(x, y)′ ∈ S; x = y};
A2 = {(x, y)′ ∈ S; x = −y};
A3 = {(x, y)′ ∈ S; x² = y²};
A4 = {(x, y)′ ∈ S; x ≤ y²};
A5 = {(x, y)′ ∈ S; x² + y² ≤ 4};
A6 = {(x, y)′ ∈ S; x² ≤ y};
A7 = {(x, y)′ ∈ S; x² ≥ y}.

List the members of the sets just defined.

1.1.3 Refer to Exercise 1.1.2 and show that:

(i) A1 ∩ (⋃_{j=2}^7 Aj) = ⋃_{j=2}^7 (A1 ∩ Aj);
(ii) A1 ∪ (⋂_{j=2}^7 Aj) = ⋂_{j=2}^7 (A1 ∪ Aj);
(iii) (⋃_{j=1}^7 Aj)ᶜ = ⋂_{j=1}^7 Ajᶜ;
(iv) (⋂_{j=1}^7 Aj)ᶜ = ⋃_{j=1}^7 Ajᶜ

by listing the members of each one of the eight sets appearing on either side of each one of the relations (i)–(iv).

1.1.4 Let A, B and C be subsets of S and suppose that A ⊆ B and B ⊆ C. Then show that A ⊆ C; that is, the subset relationship is transitive. Verify it by taking A = A1, B = A3 and C = A4, where A1, A3 and A4 are defined in Exercise 1.1.2.

1.1.5 Establish the distributive laws stated on page 4.

1.1.6 In terms of the sets A1, A2, A3, and perhaps their complements, express each one of the following sets:

(i) Bi = {s ∈ S; s belongs to exactly i of A1, A2, A3}, where i = 0, 1, 2, 3;
(ii) C = {s ∈ S; s belongs to all of A1, A2, A3};
(iii) D = {s ∈ S; s belongs to none of A1, A2, A3};
(iv) E = {s ∈ S; s belongs to at most 2 of A1, A2, A3};
(v) F = {s ∈ S; s belongs to at least 1 of A1, A2, A3}.

1.1.7 Establish the identity stated on page 4.

1.1.8 Give a detailed proof of the second identity in De Morgan's laws; that is, show that (⋂j Aj)ᶜ = ⋃j Ajᶜ.

1.1.9 Refer to Definition 1 and show that:
i) A̲ = {s ∈ S; s belongs to all but finitely many A's};
ii) Ā = {s ∈ S; s belongs to infinitely many A's};
iii) A̲ ⊆ Ā;
iv) if {An} is a monotone sequence, then A̲ = Ā = lim_{n→∞} An.

1.1.10 Let S = ℝ² and define the subsets An, Bn, n = 1, 2, . . . , of S as follows:

An = {(x, y)′ ∈ ℝ²; 3 + 1/n ≤ x < 6 − 1/n, 0 ≤ y ≤ 2 − 2/n},
Bn = {(x, y)′ ∈ ℝ²; x² + y² ≤ 3 + 1/n}.

Then show that An↑ A, Bn↓ B and identify A and B.

1.1.11 Let S = ℝ and define the subsets An, Bn, n = 1, 2, . . . , of S as follows:

An = {x ∈ ℝ; −5 + 1/n < x < 20 − 1/n},
Bn = {x ∈ ℝ; 0 < x < 7 + 3/n}.

Then show that An↑ and Bn↓, so that lim_{n→∞} An = A and lim_{n→∞} Bn = B exist (by Exercise 1.1.9(iv)). Also identify the sets A and B.

1.1.12 Let A and B be subsets of S and for n = 1, 2, . . . , define the sets An as follows: A_{2n−1} = A, A_{2n} = B. Then show that

lim inf_{n→∞} An = A ∩ B,  lim sup_{n→∞} An = A ∪ B.
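The monotonicity claims of Exercise 1.1.11 can be probed numerically: once a point enters some An it never leaves, and once a point leaves some Bn it never returns. A sketch under these assumptions (the probe points −4.99 and 7.001 and the cutoff indices are illustrative choices of ours):

```python
# Probing Exercise 1.1.11: A_n = (-5 + 1/n, 20 - 1/n), B_n = (0, 7 + 3/n).
def in_A(n, x):
    return -5 + 1/n < x < 20 - 1/n

def in_B(n, x):
    return 0 < x < 7 + 3/n

# A_n increasing: once -4.99 belongs to some A_n, it belongs to all later ones.
joined = [n for n in range(1, 10001) if in_A(n, -4.99)]
assert joined == list(range(joined[0], 10001))

# B_n decreasing: 7.001 eventually drops out of B_n and stays out.
stays = [n for n in range(1, 10001) if in_B(n, 7.001)]
assert stays == list(range(1, len(stays) + 1))

# Consistent with the limits A = (-5, 20) and B = (0, 7]:
assert in_A(10000, -4.99) and in_B(10000, 7.0) and not in_B(10000, 7.001)
```

The point 7.0 never leaves any Bn, while 7.001 does, which is the numerical trace of B = (0, 7] being a half-open interval.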

1.2* Fields and σ-Fields

In this section, we introduce the concepts of a field and of a σ-field, present a number of examples, and derive some basic results.*

DEFINITION 2 A class (set) of subsets of S is said to be a field, and is denoted by F, if
(F1) F is a non-empty class;
(F2) A ∈ F implies that Aᶜ ∈ F (that is, F is closed under complementation);
(F3) A1, A2 ∈ F implies that A1 ∪ A2 ∈ F (that is, F is closed under pairwise unions).

1.2.1 Consequences of the Definition of a Field

1. S, ∅ ∈ F.
2. If Aj ∈ F, j = 1, 2, . . . , n, then ⋃_{j=1}^n Aj ∈ F and ⋂_{j=1}^n Aj ∈ F for any finite n. (That is, F is closed under finite unions and intersections. Notice, however, that Aj ∈ F, j = 1, 2, . . . need not imply that their union or intersection is in F; for a counterexample, see consequence 2 on page 10.)

PROOF OF (1) AND (2) (1) (F1) implies that there exists A ∈ F, and (F2) implies that Aᶜ ∈ F. By (F3), A ∪ Aᶜ = S ∈ F. By (F2), Sᶜ = ∅ ∈ F.
(2) The proof will be by induction on n and by one of the De Morgan's laws. By (F3), if A1, A2 ∈ F, then A1 ∪ A2 ∈ F; hence the statement for unions is true for n = 2. (It is trivially true for n = 1.) Now assume the statement for unions to be true for n = k − 1; that is, if A1, A2, . . . , A_{k−1} ∈ F, then ⋃_{j=1}^{k−1} Aj ∈ F. Consider A1, A2, . . . , Ak ∈ F. By the associative law for unions of sets,

⋃_{j=1}^k Aj = (⋃_{j=1}^{k−1} Aj) ∪ Ak.

By the induction hypothesis, ⋃_{j=1}^{k−1} Aj ∈ F. Since Ak ∈ F, (F3) implies that (⋃_{j=1}^{k−1} Aj) ∪ Ak = ⋃_{j=1}^k Aj ∈ F, and by induction, the statement for unions is true for any finite n. By observing that

⋂_{j=1}^n Aj = (⋃_{j=1}^n Ajᶜ)ᶜ,

we see that (F2) and the above statement for unions imply that if A1, . . . , An ∈ F, then ⋂_{j=1}^n Aj ∈ F for any finite n. ▲

* The reader is reminded that sections marked by an asterisk may be omitted without jeopardizing the understanding of the remaining material.
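For a finite class of subsets of a finite S, properties (F1)–(F3) can be checked exhaustively. A Python sketch (the class tested is the four-member field {∅, S, A, Aᶜ} appearing among the examples of fields; the helper name `is_field` is ours):

```python
from itertools import combinations

S = frozenset({1, 2, 3, 4})

def is_field(F, S):
    """Exhaustive check of (F1)-(F3) for a finite class F of subsets of S."""
    if not F:                              # (F1): non-empty
        return False
    if any(S - A not in F for A in F):     # (F2): closed under complementation
        return False
    if any(A | B not in F for A, B in combinations(F, 2)):
        return False                       # (F3): closed under pairwise unions
    return True

A = frozenset({1, 2})
C3 = {frozenset(), S, A, S - A}            # the field {∅, S, A, A^c}
assert is_field(C3, S)
assert not is_field({frozenset(), A, S}, S)  # A^c missing: (F2) fails
```

By consequence (2) above, closure under pairwise unions plus complementation already yields closure under all finite unions and intersections, so nothing more needs testing.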

1.2.2 Examples of Fields

1. C1 = {∅, S} is a field (trivial field).
2. C2 = {all subsets of S} is a field (discrete field).
3. C3 = {∅, S, A, Aᶜ}, for some ∅ ⊂ A ⊂ S, is a field.
4. Let S be infinite (countably so or not) and let C4 be the class of subsets of S which are finite, or whose complements are finite; that is, C4 = {A ⊂ S; A or Aᶜ is finite}.

PROOF THAT C4 IS A FIELD
i) Since Sᶜ = ∅ is finite, S ∈ C4, so that C4 is non-empty.
ii) Suppose that A ∈ C4. Then A or Aᶜ is finite. If A is finite, then (Aᶜ)ᶜ = A is finite and hence Aᶜ ∈ C4 also. If Aᶜ is finite, then, by definition, Aᶜ ∈ C4.
iii) Suppose that A1, A2 ∈ C4. Then A1 or A1ᶜ is finite, and A2 or A2ᶜ is finite.
a) Suppose that A1, A2 are both finite. Then A1 ∪ A2 is finite, so that A1 ∪ A2 ∈ C4.
b) Suppose that A1ᶜ is finite. Then (A1 ∪ A2)ᶜ = A1ᶜ ∩ A2ᶜ is finite, since A1ᶜ is. Hence A1 ∪ A2 ∈ C4.
The other two possibilities follow just as in (b). Hence (F1), (F2), (F3) are satisfied. ▲

We now formulate and prove the following theorems about fields.

THEOREM 1 Let I be any non-empty index set (finite, or countably infinite, or uncountable), and let Fj, j ∈ I, be fields of subsets of S. Define F by F = ⋂_{j∈I} Fj = {A; A ∈ Fj for all j ∈ I}. Then F is a field.

PROOF
i) S, ∅ ∈ Fj for every j ∈ I, so that S, ∅ ∈ F and hence F is non-empty.
ii) If A ∈ F, then A ∈ Fj for every j ∈ I. Thus Aᶜ ∈ Fj for every j ∈ I, so that Aᶜ ∈ F.
iii) If A1, A2 ∈ F, then A1, A2 ∈ Fj for every j ∈ I. Then A1 ∪ A2 ∈ Fj for every j ∈ I, and hence A1 ∪ A2 ∈ F. ▲

THEOREM 2 Let C be an arbitrary class of subsets of S. Then there is a unique minimal field F containing C. (We say that F is generated by C and write F = F(C).)

PROOF Clearly, C is contained in the discrete field. Next, let {Fj, j ∈ I} be the class of all fields containing C and define F(C) by

F(C) = ⋂_{j∈I} Fj.

By Theorem 1, F(C) is a field containing C. It is obviously the smallest such field, since it is the intersection of all fields containing C, and is unique. Hence F = F(C). ▲

DEFINITION 3 A class of subsets of S is said to be a σ-field, and is denoted by A, if it is a field and furthermore (F3) is replaced by (A3): If Aj ∈ A, j = 1, 2, . . . , then ⋃_{j=1}^∞ Aj ∈ A (that is, A is closed under denumerable unions).

1.2.3 Consequences of the Definition of a σ-Field

1. If Aj ∈ A, j = 1, 2, . . . , then ⋂_{j=1}^∞ Aj ∈ A (that is, A is closed under denumerable intersections).
2. By definition, a σ-field is a field, but the converse is not true. In fact, in Example 4 on page 9, take S = (−∞, ∞), and define Aj = {all integers in [−j, j]}, j = 1, 2, . . . . Then ⋃_{j=1}^∞ Aj is the set A, say, of all integers. Thus A is infinite and furthermore so is Aᶜ. Hence A ∉ F, whereas Aj ∈ F for all j.

1.2.4 Examples of σ-Fields

1. C1 = {∅, S} is a σ-field (trivial σ-field).
2. C2 = {all subsets of S} is a σ-field (discrete σ-field).
3. C3 = {∅, S, A, Aᶜ} for some ∅ ⊂ A ⊂ S is a σ-field.
4. Take S to be uncountable and define C4 as follows: C4 = {all subsets of S which are countable or whose complements are countable}. As an example, we prove that C4 is a σ-field.

PROOF
i) Sᶜ = ∅ is countable, so C4 is non-empty.
ii) If A ∈ C4, then A or Aᶜ is countable. If A is countable, then (Aᶜ)ᶜ = A is countable, so that Aᶜ ∈ C4; if Aᶜ is countable, then, by definition, Aᶜ ∈ C4.
iii) The proof of this statement requires knowledge of the fact that a countable union of countable sets is countable. (For a proof of this fact see page 36 in Tom M. Apostol's book Mathematical Analysis, published by Addison-Wesley, 1957.) Let Aj ∈ C4, j = 1, 2, . . . . Then either each Aj is countable, or there exists some Aj for which Aj is not countable but Ajᶜ is. In the first case, we invoke the previously mentioned theorem on the countable union of countable sets. In the second case, we note that

(⋃_{j=1}^∞ Aj)ᶜ = ⋂_{j=1}^∞ Ajᶜ,

which is countable, since it is the intersection of sets, one of which is countable. ▲

We now introduce some useful theorems about σ-fields.

THEOREM 3 Let I be as in Theorem 1, and let Aj, j ∈ I, be σ-fields. Define A by A = ⋂_{j∈I} Aj = {A; A ∈ Aj for all j ∈ I}. Then A is a σ-field.

PROOF
i) S, ∅ ∈ Aj for every j ∈ I and hence they belong in A.
ii) If A ∈ A, then A ∈ Aj for every j ∈ I, so that Aᶜ ∈ Aj for every j ∈ I. Thus Aᶜ ∈ A.
iii) If A1, A2, . . . ∈ A, then A1, A2, . . . ∈ Aj for every j ∈ I and hence ⋃_{n=1}^∞ An ∈ Aj for every j ∈ I; thus ⋃_{n=1}^∞ An ∈ A. ▲

THEOREM 4 Let C be an arbitrary class of subsets of S. Then there is a unique minimal σ-field A containing C. (We say that A is the σ-field generated by C and write A = σ(C).)

PROOF Clearly, C is contained in the discrete σ-field. Define

σ(C) = ⋂ {all σ-fields containing C}.

By Theorem 3, σ(C) is a σ-field which obviously contains C. Uniqueness and minimality again follow from the definition of σ(C). Hence A = σ(C). ▲

REMARK 1 For later use, we note that if A is a σ-field and A ∈ A, then AA = {C; C = B ∩ A for some B ∈ A} is a σ-field, where complements of sets are formed with respect to A, which now plays the role of the entire space. This is easily seen to be true by the distributive property of intersection over union (see also Exercise 1.2.5).

In all that follows, if S is countable (that is, finite or denumerably infinite), we will always take A to be the discrete σ-field. However, if S is uncountable, then for certain technical reasons, we take the σ-field to be "smaller" than the discrete one. In either case, the pair (S, A) is called a measurable space.

1.2.5 Special Cases of Measurable Spaces

1. Let S be ℝ (the set of real numbers, or otherwise known as the real line) and define C0 as follows:

C0 = {all intervals in ℝ} = {(−∞, x), (−∞, x], (x, ∞), [x, ∞), (x, y), (x, y], [x, y), [x, y]; x, y ∈ ℝ, x < y}.

By Theorem 4, there is a σ-field A = σ(C0); we denote this σ-field by B and call

it the Borel σ-field (over the real line). The pair (ℝ, B) is called the Borel real line.

THEOREM 5 Each one of the following classes generates the Borel σ-field:

C1 = {(x, y]; x, y ∈ ℝ, x < y},
C2 = {[x, y); x, y ∈ ℝ, x < y},
C3 = {[x, y]; x, y ∈ ℝ, x < y},
C4 = {(x, y); x, y ∈ ℝ, x < y},
C5 = {(x, ∞); x ∈ ℝ},
C6 = {[x, ∞); x ∈ ℝ},
C7 = {(−∞, x); x ∈ ℝ},
C8 = {(−∞, x]; x ∈ ℝ}.

Also the classes C′j, j = 1, 2, . . . , 8, generate the Borel σ-field, where for j = 1, 2, . . . , 8, C′j is defined the same way as Cj is except that x, y are restricted to the rational numbers.

PROOF Clearly, each Cj is contained in B, so that σ(Cj) ⊆ B, j = 1, 2, . . . , 8. Thus, in order to prove the theorem, it suffices to prove that B ⊆ σ(Cj), j = 1, 2, . . . , 8, and in order to prove this, it suffices to show that C0 ⊆ σ(Cj), j = 1, 2, . . . , 8. As an example, we show that C0 ⊆ σ(C7). Consider xn ↓ x. Then (−∞, xn) ∈ σ(C7) and hence ⋂_{n=1}^∞ (−∞, xn) ∈ σ(C7). But

⋂_{n=1}^∞ (−∞, xn) = (−∞, x].

Thus (−∞, x] ∈ σ(C7) for every x ∈ ℝ. Since (x, ∞) = (−∞, x]ᶜ and [x, ∞) = (−∞, x)ᶜ, it also follows that (x, ∞), [x, ∞) ∈ σ(C7). Next,

(x, y) = (−∞, y) ∩ (x, ∞), (x, y] = (−∞, y] ∩ (x, ∞), [x, y) = (−∞, y) ∩ [x, ∞), [x, y] = (−∞, y] ∩ [x, ∞),

so that C0 ⊆ σ(C7). In the case of the classes C′j, one uses in addition the fact that, if C, C′ are two classes of subsets of S such that C ⊆ C′, then σ(C) ⊆ σ(C′), and one considers monotone sequences of rational numbers convergent to given irrationals x, y. ▲

2. Let S = ℝ² = ℝ × ℝ and define C0 as follows:

C0 = {all rectangles in ℝ²} = {(−∞, x) × (−∞, x′), (−∞, x] × (−∞, x′], (x, ∞) × (x′, ∞), [x, ∞) × [x′, ∞), (x, y) × (x′, y′), (x, y] × (x′, y′], [x, y) × [x′, y′), [x, y] × [x′, y′]; x, y, x′, y′ ∈ ℝ, x < y, x′ < y′}.

The σ-field generated by C0 is denoted by B² and is the two-dimensional Borel σ-field. A theorem similar to Theorem 5 holds here too.

3. Let S = ℝᵏ = ℝ × ℝ × · · · × ℝ (k copies of ℝ) and define C0 in a way similar to that in (2) above. The σ-field generated by C0 is denoted by Bᵏ and is the k-dimensional Borel σ-field. A theorem similar to Theorem 5 holds here too.

Exercises

1.2.1 Verify that the classes defined in Examples 1, 2 and 3 on page 9 are fields.

1.2.2 Show that in the definition of a field (Definition 2), property (F3) can be replaced by (F3′) which states that if A1, A2 ∈ F, then A1 ∩ A2 ∈ F.

1.2.3 Show that in Definition 3, property (A3) can be replaced by (A3′), which states that if Aj ∈ A, j = 1, 2, . . . , then ⋂_{j=1}^∞ Aj ∈ A.

1.2.4 Refer to Example 4 on σ-fields on page 10 and explain why S was taken to be uncountable.

1.2.5 Give a formal proof of the fact that the class AA defined in Remark 1 is a σ-field.

1.2.6 Refer to Definition 1 and show that all three sets A̲, Ā and lim_{n→∞} An, whenever it exists, belong to A provided An, n ≥ 1, belong to A.

1.2.7 Let S = {1, 2, 3, 4} and define the class C of subsets of S as follows:

C = {∅, {1}, {2}, {3}, {4}, …, {1, 3, 4}, {2, 3, 4}, S}.

Determine whether or not C is a field.

1.2.8 Complete the proof of the remaining parts in Theorem 5.

Chapter 2

Some Probabilistic Concepts and Results

2.1 Probability Functions and Some Basic Properties and Results

Intuitively by an experiment one pictures a procedure being carried out under a certain set of conditions whereby the procedure can be repeated any number of times under the same set of conditions, and upon completion of the procedure certain results are observed. An experiment is a deterministic experiment if, given the conditions under which the experiment is carried out, the outcome is completely determined. If, for example, a container of pure water is brought to a temperature of 100°C and 760 mmHg of atmospheric pressure, the outcome is that the water will boil. Also, a certificate of deposit of $1,000 at the annual rate of 5% will yield $1,050 after one year, and $(1.05)ⁿ × 1,000 after n years when the (constant) interest rate is compounded. An experiment for which the outcome cannot be determined, except that it is known to be one of a set of possible outcomes, is called a random experiment. Only random experiments will be considered in this book. Examples of random experiments are tossing a coin, rolling a die, drawing a card from a standard deck of playing cards, recording the number of telephone calls which arrive at a telephone exchange within a specified period of time, counting the number of defective items produced by a certain manufacturing process within a certain period of time, recording the heights of individuals in a certain class, etc.

The set of all possible outcomes of a random experiment is called a sample space and is denoted by S. The elements s of S are called sample points. Certain subsets of S are called events. Events of the form {s} are called simple events, while an event containing at least two sample points is called a composite event. S and ∅ are always events, and are called the sure or certain event and the impossible event, respectively. The class of all events has got to be sufficiently rich in order to be meaningful. Accordingly, we require that, if A is an event, then so is its complement Aᶜ. Also, if Aj, j = 1, 2, . . . are events, then so is their union ⋃j Aj.

(In the terminology of Section 1.2, we require that the events associated with a sample space form a σ-field of subsets in that space.) It follows then that ⋂j Aj is also an event, and so is A1 − A2, etc. If the random experiment results in s and s ∈ A, we say that the event A occurs or happens. The ⋃j Aj occurs if at least one of the Aj occurs, the ⋂j Aj occurs if all Aj occur, A1 − A2 occurs if A1 occurs but A2 does not, etc.

The next basic quantity to be introduced here is that of a probability function (or of a probability measure).

DEFINITION 1 A probability function denoted by P is a (set) function which assigns to each event A a number denoted by P(A), called the probability of A, and satisfies the following requirements:
(P1) P is non-negative; that is, P(A) ≥ 0, for every event A.
(P2) P is normed; that is, P(S) = 1.
(P3) P is σ-additive; that is, for every collection of pairwise (or mutually) disjoint events Aj, j = 1, 2, . . . , we have P(Σj Aj) = Σj P(Aj).

This is the axiomatic (Kolmogorov) definition of probability. The triple (S, class of events, P) (or (S, A, P)) is known as a probability space.

REMARK 1 If S is finite, then every subset of S is an event (that is, A is taken to be the discrete σ-field). In such a case, there are only finitely many events and hence, in particular, finitely many pairwise disjoint events. Then (P3) is reduced to:
(P3′) P is finitely additive; that is, for every collection of pairwise disjoint events Aj, j = 1, 2, . . . , n, we have P(Σ_{j=1}^n Aj) = Σ_{j=1}^n P(Aj).
Actually, in such a case it is sufficient to assume that (P3′) holds for any two disjoint events; (P3′) follows then from this assumption by induction.

2.1.1 Consequences of Definition 1

(C1) P(∅) = 0. In fact, S = S + ∅ + · · · , so that P(S) = P(S + ∅ + · · ·) = P(S) + P(∅) + · · · , or 1 = 1 + P(∅) + · · · , and P(∅) = 0, since P(∅) ≥ 0. Any event, possibly ∅, with probability 0 is called a null event.
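For a finite S with equally likely sample points, Definition 1 can be illustrated directly by taking P(A) = (number of points in A)/n. A Python sketch with exact rational arithmetic (the die-roll space is an illustrative choice):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                 # one roll of a die, equally likely points

def P(event):
    """P(A) = (number of sample points in A) / n, with exact arithmetic."""
    return Fraction(len(set(event) & S), len(S))

assert all(P(E) >= 0 for E in (set(), {1}, {2, 4, 6}))   # (P1) non-negative
assert P(S) == 1                                         # (P2) normed
A, B = {1, 2}, {5}                                       # disjoint events
assert P(A | B) == P(A) + P(B)                           # (P3') additivity
print(P({2, 4, 6}))  # 1/2
```

Using `Fraction` rather than floating point keeps the additivity checks exact, which matters when comparing probabilities for equality.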

(C2) P is finitely additive; that is, for any finite n and any events Aj, j = 1, 2, . . . , n, such that Ai ∩ Aj = ∅, i ≠ j, we have

P(Σ_{j=1}^n Aj) = Σ_{j=1}^n P(Aj).

Indeed, taking Aj = ∅ for j ≥ n + 1, P(Σ_{j=1}^n Aj) = P(Σ_{j=1}^∞ Aj) = Σ_{j=1}^∞ P(Aj) = Σ_{j=1}^n P(Aj).

(C3) For every event A, P(Aᶜ) = 1 − P(A). In fact, since A + Aᶜ = S, P(A + Aᶜ) = P(S), or P(A) + P(Aᶜ) = 1, so that P(Aᶜ) = 1 − P(A).

(C4) P is a non-decreasing function; that is, A1 ⊆ A2 implies P(A1) ≤ P(A2). In fact, A2 = A1 + (A2 − A1), hence P(A2) = P(A1) + P(A2 − A1), and therefore P(A2) ≥ P(A1).

REMARK 2 If A1 ⊆ A2, then P(A2 − A1) = P(A2) − P(A1); but this is not true in general, that is, without the assumption A1 ⊆ A2.

(C5) 0 ≤ P(A) ≤ 1 for every event A. This follows from (C1), (P2), and (C4).

(C6) For any events A1, A2, P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2). In fact, A1 ∪ A2 = A1 + (A2 − A1 ∩ A2). Hence

P(A1 ∪ A2) = P(A1) + P(A2 − A1 ∩ A2) = P(A1) + P(A2) − P(A1 ∩ A2),

since A1 ∩ A2 ⊆ A2 implies P(A2 − A1 ∩ A2) = P(A2) − P(A1 ∩ A2).

(C7) P is subadditive; that is,

P(⋃_{j=1}^∞ Aj) ≤ Σ_{j=1}^∞ P(Aj)  and also  P(⋃_{j=1}^n Aj) ≤ Σ_{j=1}^n P(Aj).

This follows from the identities

⋃_{j=1}^∞ Aj = A1 + (A1ᶜ ∩ A2) + · · · + (A1ᶜ ∩ · · · ∩ A_{n−1}ᶜ ∩ An) + · · · ,
⋃_{j=1}^n Aj = A1 + (A1ᶜ ∩ A2) + · · · + (A1ᶜ ∩ · · · ∩ A_{n−1}ᶜ ∩ An),

(P3) and (C2), respectively, and (C4).
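The consequences (C3)–(C7) are easy to stress-test for a uniform probability function on a finite space by drawing random events. An illustrative sketch (the space size and number of trials are arbitrary):

```python
from fractions import Fraction
from random import Random

rng = Random(0)
U = set(range(12))                        # finite sample space
P = lambda E: Fraction(len(E), len(U))    # uniform probability, E a subset of U

for _ in range(200):
    A = {s for s in U if rng.random() < 0.5}
    B = {s for s in U if rng.random() < 0.5}
    assert P(U - A) == 1 - P(A)                    # (C3) complement rule
    if A <= B:
        assert P(A) <= P(B)                        # (C4) monotonicity
    assert P(A | B) == P(A) + P(B) - P(A & B)      # (C6)
    assert P(A | B) <= P(A) + P(B)                 # (C7) subadditivity, n = 2
```

Such a check does not prove anything, of course, but it is a quick way to catch a mis-stated identity before attempting a proof.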

A special case of a probability space is the following: Let S = {s1, s2, . . . , sn}, let the class of events be the class of all subsets of S, and define P as P({sj}) = 1/n, j = 1, 2, . . . , n. Such a probability function is called a uniform probability function. P clearly satisfies (P1)–(P3′) and this is the classical definition of probability. This definition is adequate as long as S is finite and the simple events {sj}, j = 1, 2, . . . , n, may be assumed to be "equally likely," but it breaks down if either of these two conditions is not satisfied. However, this classical definition together with the following relative frequency (or statistical) definition of probability served as a motivation for arriving at the axioms (P1)–(P3) in the Kolmogorov definition of probability. The relative frequency definition of probability is this: Let S be any sample space, finite or not, supplied with a class of events A. A random experiment associated with the sample space S is carried out n times. Let n(A) be the number of times that the event A occurs. If, as n → ∞, lim[n(A)/n] exists, it is called the probability of A, and is denoted by P(A). Clearly, this definition satisfies (P1), (P2) and (P3′).

Neither the classical definition nor the relative frequency definition of probability is adequate for a deep study of probability theory. The relative frequency definition of probability provides, however, an intuitively satisfactory interpretation of the concept of probability. We now state and prove some general theorems about probability functions.

THEOREM 1 (Additive Theorem) For any finite number of events, we have

P(⋃_{j=1}^n Aj) = Σ_{j=1}^n P(Aj) − Σ_{1≤j1<j2≤n} P(Aj1 ∩ Aj2) + Σ_{1≤j1<j2<j3≤n} P(Aj1 ∩ Aj2 ∩ Aj3) − · · · + (−1)^{n+1} P(A1 ∩ A2 ∩ · · · ∩ An).

PROOF (By induction on n.) For n = 1, the statement is trivial, and we have proven the case n = 2 as consequence (C6) of the definition of probability functions. Now assume the result to be true for n = k, and prove it for n = k + 1.

We have

P(⋃_{j=1}^{k+1} Aj) = P[(⋃_{j=1}^k Aj) ∪ A_{k+1}]
  = P(⋃_{j=1}^k Aj) + P(A_{k+1}) − P[(⋃_{j=1}^k Aj) ∩ A_{k+1}]
  = [Σ_{j=1}^k P(Aj) − Σ_{1≤j1<j2≤k} P(Aj1 ∩ Aj2) + Σ_{1≤j1<j2<j3≤k} P(Aj1 ∩ Aj2 ∩ Aj3) − · · ·
    + (−1)^{k+1} P(A1 ∩ A2 ∩ · · · ∩ Ak)] + P(A_{k+1}) − P[⋃_{j=1}^k (Aj ∩ A_{k+1})].   (1)

But, by the induction hypothesis applied to the k events Aj ∩ A_{k+1}, j = 1, . . . , k,

P[⋃_{j=1}^k (Aj ∩ A_{k+1})] = Σ_{j=1}^k P(Aj ∩ A_{k+1}) − Σ_{1≤j1<j2≤k} P(Aj1 ∩ Aj2 ∩ A_{k+1})
  + Σ_{1≤j1<j2<j3≤k} P(Aj1 ∩ Aj2 ∩ Aj3 ∩ A_{k+1}) − · · ·
  + (−1)^k Σ_{1≤j1<···<j_{k−1}≤k} P(Aj1 ∩ · · · ∩ Aj_{k−1} ∩ A_{k+1})
  + (−1)^{k+1} P(A1 ∩ · · · ∩ Ak ∩ A_{k+1}).

Replacing this in (1), we get

P(⋃_{j=1}^{k+1} Aj) = Σ_{j=1}^{k+1} P(Aj) − [Σ_{1≤j1<j2≤k} P(Aj1 ∩ Aj2) + Σ_{j=1}^k P(Aj ∩ A_{k+1})]
  + [Σ_{1≤j1<j2<j3≤k} P(Aj1 ∩ Aj2 ∩ Aj3) + Σ_{1≤j1<j2≤k} P(Aj1 ∩ Aj2 ∩ A_{k+1})] − · · ·
  + (−1)^{k+1} [P(A1 ∩ · · · ∩ Ak) + Σ_{1≤j1<···<j_{k−1}≤k} P(Aj1 ∩ · · · ∩ Aj_{k−1} ∩ A_{k+1})]
  + (−1)^{k+2} P(A1 ∩ · · · ∩ A_{k+1})

  = Σ_{j=1}^{k+1} P(Aj) − Σ_{1≤j1<j2≤k+1} P(Aj1 ∩ Aj2) + Σ_{1≤j1<j2<j3≤k+1} P(Aj1 ∩ Aj2 ∩ Aj3) − · · · + (−1)^{k+2} P(A1 ∩ · · · ∩ A_{k+1}). ▲

THEOREM 2 Let {An} be a sequence of events such that, as n → ∞, An ↑ or An ↓. Then

P(lim_{n→∞} An) = lim_{n→∞} P(An).

PROOF Let us first assume that An ↑. Then lim_{n→∞} An = ⋃_{j=1}^∞ Aj. We recall that

⋃_{j=1}^∞ Aj = A1 + (A1ᶜ ∩ A2) + (A1ᶜ ∩ A2ᶜ ∩ A3) + · · · = A1 + (A2 − A1) + (A3 − A2) + · · · ,

by the assumption that An ↑. Hence

P(lim_{n→∞} An) = P(⋃_{j=1}^∞ Aj) = P(A1) + P(A2 − A1) + P(A3 − A2) + · · ·
  = lim_{n→∞} [P(A1) + P(A2 − A1) + · · · + P(An − A_{n−1})]
  = lim_{n→∞} [P(A1) + P(A2) − P(A1) + P(A3) − P(A2) + · · · + P(An) − P(A_{n−1})] = lim_{n→∞} P(An).

Thus P(lim_{n→∞} An) = lim_{n→∞} P(An). Now let An ↓. Then Anᶜ ↑, so that lim_{n→∞} Anᶜ = ⋃_{j=1}^∞ Ajᶜ. Hence

P(lim_{n→∞} Anᶜ) = P(⋃_{j=1}^∞ Ajᶜ) = lim_{n→∞} P(Anᶜ),

or equivalently,

P[(⋂_{j=1}^∞ Aj)ᶜ] = lim_{n→∞} [1 − P(An)], or 1 − P(⋂_{j=1}^∞ Aj) = 1 − lim_{n→∞} P(An).

Thus

lim_{n→∞} P(An) = P(⋂_{j=1}^∞ Aj) = P(lim_{n→∞} An),

and the theorem is established. ▲

This theorem will prove very useful in many parts of this book.
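The alternating sum of Theorem 1 can be checked against a direct computation of the union on a finite space. A sketch with a uniform probability function (the four events below are arbitrary illustrative sets):

```python
from fractions import Fraction
from itertools import combinations

U = set(range(20))
P = lambda E: Fraction(len(E), len(U))    # uniform probability on a finite space

def incl_excl(events):
    """Right-hand side of Theorem 1: alternating sum over all intersections."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for combo in combinations(events, r):
            inter = set.intersection(*map(set, combo))
            total += (-1) ** (r + 1) * P(inter)
    return total

events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {0, 5, 6}, {2, 3, 7}]
assert incl_excl(events) == P(set().union(*events))
print(incl_excl(events))  # 2/5
```

The double loop visits each of the 2ⁿ − 1 non-empty subfamilies once, mirroring the index sets 1 ≤ j1 < j2 < · · · ≤ n in the statement of the theorem.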

Exercises

2.1.1 If the events Aj, j = 1, 2, 3 are such that A1 ⊂ A2 ⊂ A3 and P(A1) = 1/4, P(A2) = 5/12, P(A3) = 7/12, compute the probability of the following events:

A1ᶜ ∩ A2,  A1ᶜ ∩ A3,  A2ᶜ ∩ A3,  A1 ∩ A2ᶜ ∩ A3ᶜ,  A1ᶜ ∩ A2 ∩ A3ᶜ.

2.1.2 If two fair dice are rolled once, what is the probability that the total number of spots shown is
i) equal to 5?
ii) divisible by 3?

2.1.3 Twenty balls numbered from 1 to 20 are mixed in an urn and two balls are drawn successively and without replacement. If x1 and x2 are the numbers written on the first and second ball drawn, respectively, what is the probability that:
i) x1 + x2 = 8?
ii) x1 + x2 ≤ 5?

2.1.4 Let S = {x integer; 1 ≤ x ≤ 200} and define the events A, B, and C by:

A = {x ∈ S; x is divisible by 7},
B = {x ∈ S; x = 3n + 10, for some positive integer n},
C = {x ∈ S; x² + 1 ≤ 375}.

Compute P(A), P(B), P(C), where P is the equally likely probability function on the events of S.
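Answers to enumeration exercises such as 2.1.2 can be verified with a short script, assuming, as the exercise does, that the 36 ordered outcomes of two fair dice are equally likely:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely ordered pairs
P = lambda event: Fraction(len(event), len(S))

A = [s for s in S if sum(s) == 5]          # total equal to 5
B = [s for s in S if sum(s) % 3 == 0]      # total divisible by 3

print(P(A))  # 1/9
print(P(B))  # 1/3
```

The same pattern (build the sample space, filter the event, count) handles Exercises 2.1.3 and 2.1.4 as well.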

2.1.5 Let S be the set of all outcomes when flipping a fair coin four times and let P be the uniform probability function on the events of S. Define the events A, B as follows:

A = {s ∈ S; s contains more T's than H's},
B = {s ∈ S; any T in s precedes every H in s}.

Compute the probabilities P(A), P(B).

2.1.6 Suppose that the events Aj, j = 1, 2, . . . are such that

Σ_{j=1}^∞ P(Aj) < ∞.

Use Definition 1 in Chapter 1 and Theorem 2 in this chapter in order to show that P(Ā) = 0.

2.1.7 Consider the events Aj, j = 1, 2, . . . , and use Definition 1 in Chapter 1 and Theorem 2 herein in order to show that

P(A̲) ≤ lim inf_{n→∞} P(An) ≤ lim sup_{n→∞} P(An) ≤ P(Ā).

2.2 Conditional Probability

In this section, we shall introduce the concepts of conditional probability and stochastic independence. Before the formal definition of conditional probability is given, we shall attempt to provide some intuitive motivation for it. To this end, consider a balanced die and suppose that the sides bearing the numbers 1, 4 and 6 are painted red, whereas the remaining three sides are painted black. The die is rolled once and we are asked for the probability that the upward side is the one bearing the number 6. Assuming the uniform probability function, the answer is, clearly, 1/6. Next, suppose that the die is rolled once as before and all that we can observe is the color of the upward side but not the number on it (for example, we may be observing the die from a considerable distance, so that the color is visible but not the numbers on the sides). The same question as above is asked, namely, what is the probability that the number on the uppermost side is 6. Again, by assuming the uniform probability function, the answer now is 1/3. This latter probability is called the conditional probability of the number 6 turning up, given the information that the uppermost side was painted red. Letting B stand for the event that number 6 appears and A for the event that the uppermost side is red, the above-mentioned conditional probability is denoted by P(B|A), and we observe that this is equal to the quotient P(A ∩ B)/P(A).

Or suppose that, for the purposes of a certain study, we observe two-children families in a certain locality, and record the gender of the children. A sample space for this experiment is the following: S = {bb, bg, gb, gg}, where b stands for boy and g for girl, and bg, for example, indicates that the boy is older than the girl. Suppose further (although this is not exactly correct) that: P({bb}) = P({bg}) = P({gb}) = P({gg}) = 1/4, and define the events A and B as follows: A = "children of one gender" = {bb, gg}, B = "at least one boy" = {bb, bg, gb}. Then P(A|B) = P(A ∩ B)/P(B) = 1/3.

From these and other examples, one is led to the following definition of conditional probability.

DEFINITION 2 Let A be an event such that P(A) > 0. Then the conditional probability, given A, is the (set) function denoted by P(·|A) and defined for every event B as follows:

P(B|A) = P(A ∩ B)/P(A).

P(B|A) is called the conditional probability of B, given A.

The set function P(·|A) is actually a probability function. To see this, it suffices to prove that P(·|A) satisfies (P1)–(P3). We have: P(B|A) ≥ 0 for every event B, clearly. Next,

P(S|A) = P(S ∩ A)/P(A) = P(A)/P(A) = 1;

and if Aj, j = 1, 2, . . . are events such that Ai ∩ Aj = ∅, i ≠ j, we have

P(Σ_{j=1}^∞ Aj | A) = P[(Σ_{j=1}^∞ Aj) ∩ A]/P(A) = P[Σ_{j=1}^∞ (Aj ∩ A)]/P(A)
  = (1/P(A)) Σ_{j=1}^∞ P(Aj ∩ A) = Σ_{j=1}^∞ P(Aj ∩ A)/P(A) = Σ_{j=1}^∞ P(Aj|A).

The conditional probability can be used in expressing the probability of the intersection of a finite number of events.

THEOREM 3 (Multiplicative Theorem) Let Aj, j = 1, 2, . . . , n, be events such that

P(⋂_{j=1}^{n−1} Aj) > 0.

Then

P(⋂_{j=1}^n Aj) = P(An | A1 ∩ A2 ∩ · · · ∩ A_{n−1}) P(A_{n−1} | A1 ∩ · · · ∩ A_{n−2}) · · · P(A2 | A1) P(A1).

(The proof of this theorem is left as an exercise; see Exercise 2.2.4.)

REMARK 3 The value of the above formula lies in the fact that, in general, it is easier to calculate the conditional probabilities on the right-hand side. This point is illustrated by the following simple example.

EXAMPLE 1 An urn contains 10 identical balls (except for color) of which five are black, three are red and two are white. Four balls are drawn without replacement. Find the probability that the first ball is black, the second red, the third white and the fourth black.

Let A1 be the event that the first ball is black, A2 be the event that the second ball is red, A3 be the event that the third ball is white and A4 be the event that the fourth ball is black. Then

P(A1 ∩ A2 ∩ A3 ∩ A4) = P(A4 | A1 ∩ A2 ∩ A3) P(A3 | A1 ∩ A2) P(A2 | A1) P(A1),

and by using the uniform probability function, we have

P(A1) = 5/10,  P(A2|A1) = 3/9,  P(A3 | A1 ∩ A2) = 2/8,  P(A4 | A1 ∩ A2 ∩ A3) = 4/7.

Thus the required probability is equal to 1/42 ≈ 0.0238.

Now let Aj, j = 1, 2, . . . , be events such that Ai ∩ Aj = ∅, i ≠ j, and Σj Aj = S. Such a collection of events is called a partition of S. The partition is finite or (denumerably) infinite, accordingly, as the events Aj are finitely or (denumerably) infinitely many. For any event B, we clearly have: B = Σj (B ∩ Aj). Hence

P(B) = Σj P(B ∩ Aj) = Σj P(B|Aj) P(Aj),

provided P(Aj) > 0, for all j. Thus we have the following theorem.

THEOREM 4 (Total Probability Theorem) Let {Aj, j = 1, 2, . . .} be a partition of S with P(Aj) > 0, all j. Then for B ∈ A, we have P(B) = Σj P(B|Aj) P(Aj).

This formula gives a way of evaluating P(B) in terms of P(B|Aj) and P(Aj), j = 1, 2, . . . . Under the condition that P(B) > 0, the above formula can be "reversed" to provide an expression for P(Aj|B), j = 1, 2, . . . . In fact,

P(Aj|B) = P(Aj ∩ B)/P(B) = P(B|Aj) P(Aj)/P(B) = P(B|Aj) P(Aj) / Σi P(B|Ai) P(Ai).

Thus

THEOREM 5 (Bayes Formula) If {Aj, j = 1, 2, . . .} is a partition of S and P(Aj) > 0, j = 1, 2, . . . , and if P(B) > 0, then

P(Aj|B) = P(B|Aj) P(Aj) / Σi P(B|Ai) P(Ai).

REMARK 4 It is important that one checks to be sure that the collection {Aj, j ≥ 1} forms a partition of S, as only then are the above theorems true.

The following simple example serves as an illustration of Theorems 4 and 5.

EXAMPLE 2 A multiple choice test question lists five alternative answers, of which only one is correct. If a student has done the homework, then he/she is certain to identify the correct answer; otherwise he/she chooses an answer at random. Let p denote the probability of the event A that the student does the homework and let B be the event that he/she answers the question correctly. Find the expression of the conditional probability P(A|B) in terms of p.

By noting that A and Aᶜ form a partition of the appropriate sample space, an application of Theorems 4 and 5 gives

P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Aᶜ) P(Aᶜ)] = 1 · p / [1 · p + (1/5)(1 − p)] = 5p/(4p + 1).

Furthermore, it is easily seen that P(A|B) = P(A) if and only if p = 0 or 1. For example, for p = 0.1, 0.3, 0.5, 0.7, we find that P(A|B) is approximately equal to 0.36, 0.68, 0.83 and 0.92, respectively.
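Example 2's closed form P(A|B) = 5p/(4p + 1) follows from Bayes' formula with P(B|A) = 1 and P(B|Aᶜ) = 1/5. A sketch verifying the algebra with exact rationals (the helper name is ours, and `n` generalizes the number of alternatives as in Exercise 2.2.5):

```python
from fractions import Fraction

def p_A_given_B(p, n=5):
    """Bayes' formula for Example 2: P(B|A) = 1, P(B|A^c) = 1/n alternatives."""
    p = Fraction(p)
    return (1 * p) / (1 * p + Fraction(1, n) * (1 - p))

# Agrees with the closed form 5p/(4p + 1) for n = 5:
for p in (Fraction(1, 10), Fraction(3, 10), Fraction(1, 2), Fraction(7, 10)):
    assert p_A_given_B(p) == 5 * p / (4 * p + 1)

assert p_A_given_B(0) == 0 and p_A_given_B(1) == 1   # P(A|B) = P(A) iff p = 0 or 1
print(float(p_A_given_B(Fraction(1, 2))))  # 0.833...
```

Increasing `n` with `p` fixed makes the computed conditional probability larger, which is the monotonicity asked about in Exercise 2.2.5.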

We may consider, for example, two partitions {Ai, i = 1, 2, ...} and {Bj, j = 1, 2, ...}. Then, clearly,

Ai = Σj (Ai ∩ Bj), i = 1, 2, ...,  Bj = Σi (Bj ∩ Ai), j = 1, 2, ...,

and {Ai ∩ Bj, i = 1, 2, ..., j = 1, 2, ...} is a partition of S. In fact,

(Ai ∩ Bj) ∩ (Ai′ ∩ Bj′) = ∅  if  (i, j) ≠ (i′, j′),

and

Σi Σj (Ai ∩ Bj) = Σi Ai = S.

From Ai = Σj (Ai ∩ Bj) and Bj = Σi (Ai ∩ Bj), we get

P(Ai) = Σj P(Ai ∩ Bj) = Σj P(Ai | Bj)P(Bj), provided P(Bj) > 0, all j,

and

P(Bj) = Σi P(Ai ∩ Bj) = Σi P(Bj | Ai)P(Ai), provided P(Ai) > 0, all i.

The expression P(Ai ∩ Bj) is called the joint probability of Ai and Bj, and the probabilities P(Ai), P(Bj) are called marginal probabilities. We have analogous expressions for the case of more than two partitions of S.

Exercises

2.2.1  If P(A | B) > P(A), then show that P(B | A) > P(B) (P(A)P(B) > 0).

2.2.2  Show that:
i) P(Ac | B) = 1 − P(A | B);
ii) P(A ∪ B | C) = P(A | C) + P(B | C) − P(A ∩ B | C).
Also show, by means of counterexamples, that the following equations need not be true:
iii) P(A | Bc) = 1 − P(A | B);
iv) P(C | A + B) = P(C | A) + P(C | B), where A ∩ B = ∅.

2.2.3  If A ∩ B = ∅ and P(A + B) > 0, express the probabilities P(A | A + B) and P(B | A + B) in terms of P(A) and P(B).

2.2.4  Use induction to prove Theorem 3.

2.2.5  Suppose that a multiple choice test lists n alternative answers of which only one is correct. Let p, A and B be defined as in Example 2 and find Pn(A | B) in terms of n and p. Next show that if p is fixed but different from 0 and 1, then Pn(A | B) increases as n increases. Does this result seem reasonable?

2.2.6  If Aj, j = 1, 2, 3 are any events in S, show that {A1, A1c ∩ A2, A1c ∩ A2c ∩ A3, (A1 ∪ A2 ∪ A3)c} is a partition of S.

2.2.7  Let {Aj, j = 1, ..., 5} be a partition of S and suppose that P(Aj) = j/15 and P(A | Aj) = (5 − j)/15, j = 1, ..., 5. Compute the probabilities P(Aj | A), j = 1, ..., 5.

2.2.8  A girl's club has on its membership rolls the names of 50 girls with the following descriptions: 20 blondes, 15 with blue eyes and 5 with brown eyes; 25 brunettes, 5 with blue eyes and 20 with brown eyes; 5 redheads, 1 with blue eyes and 4 with green eyes. If one arranges a blind date with a club member, what is the probability that:
i) The girl is blonde?
ii) The girl is blonde, if it was only revealed that she has blue eyes?

2.2.9  Suppose that the probability that both of a pair of twins are boys is 0.30 and that the probability that they are both girls is 0.26. Given that the probability of a child being a boy is 0.52, what is the probability that:
i) The second twin is a boy, given that the first is a boy?
ii) The second twin is a girl, given that the first is a girl?

2.2.10  Three machines I, II and III manufacture 30%, 30% and 40%, respectively, of the total output of certain items. Of them, 4%, 3% and 2%, respectively, are defective. One item is drawn at random, tested and found to be defective. What is the probability that the item was manufactured by each one of the machines I, II and III?

2.2.11  A shipment of 20 TV tubes contains 16 good tubes and 4 defective tubes. Three tubes are chosen at random and tested successively. What is the probability that:
i) The third tube is good, if the first two were found to be good?
ii) The third tube is defective, if one of the other two was found to be good and the other one was found to be defective?

2.2.12  Suppose that a test for diagnosing a certain heart disease is 95% accurate when applied to both those who have the disease and those who do not. If it is known that 5 of 1,000 in a certain population have the disease in question, compute the probability that a patient actually has the disease if the test indicates that he does. (Interpret the answer by intuitive reasoning.)

2.2.13  Consider two urns Uj, j = 1, 2, such that urn Uj contains mj white balls and nj black balls. A ball is drawn at random from each one of the two urns and is placed into a third urn. Then a ball is drawn at random from the third urn. Compute the probability that the ball is black.

2.2.14  Consider the urns of Exercise 2.2.13. A balanced die is rolled and if an even number appears, a ball, chosen at random from urn U1, is transferred to urn U2. If an odd number appears, a ball, chosen at random from urn U2, is transferred to urn U1. What is the probability that, after the above experiment is performed twice, the number of white balls in the urn U2 remains the same?

2.2.15  Consider three urns Uj, j = 1, 2, 3 such that urn Uj contains mj white balls and nj black balls. One urn is chosen at random and one ball is drawn from it also at random. If the ball drawn was white, what is the probability that the urn chosen was urn U1 or U2?

2.2.16  Consider the urns of Exercise 2.2.15. A ball, chosen at random from urn U1, is transferred from urn U1 to urn U2 (color unnoticed), and then a ball, chosen at random from urn U2, is transferred from urn U2 to urn U3 (color unnoticed). Finally, a ball is drawn at random from urn U3. What is the probability that the ball is white?

2.2.17  Consider six urns Uj, j = 1, ..., 6 such that urn Uj contains mj (≥ 2) white balls and nj (≥ 2) black balls. A balanced die is tossed once and if the number j appears on the die, two balls are selected at random from urn Uj. Compute the probability that one ball is white and one ball is black.

2.2.18  Consider k urns Uj, j = 1, ..., k each of which contains m white balls and n black balls. A ball is drawn at random from urn U1 and is placed in urn U2. Then a ball is drawn at random from urn U2 and is placed in urn U3, etc. Finally, a ball is chosen at random from urn Uk−1 and is placed in urn Uk, and a ball is then drawn at random from urn Uk. Compute the probability that this last ball is black.

2.3 Independence

For any events A, B with P(A) > 0, we defined P(B | A) = P(A ∩ B)/P(A). Now P(B | A) may be > P(B), < P(B), or = P(B). As an illustration, consider an urn containing 10 balls, seven of which are red, the remaining three being black. Except for color, the balls are identical. Suppose that two balls are drawn successively and without replacement. Then (assuming throughout the uniform probability function) the conditional probability that the second ball is red, given that the first ball was red, is 6/9, whereas the conditional probability that the second ball is red, given that the first was black, is 7/9. Without any knowledge regarding the first ball, the probability that the second ball is red is 7/10. On the other hand, if the balls are drawn with replacement, the probability that the second ball is red, given that the first ball was red, is 7/10. This probability is the same even if the first ball was black. In other words, knowledge of the event which occurred in the first drawing provides no additional information in calculating the probability of the event that the second ball is red.
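The three probabilities in the illustration above can be verified by enumerating the 10 · 9 equally likely ordered pairs of draws without replacement. The sketch below is ours; the event helpers are hypothetical names.

```python
from fractions import Fraction
from itertools import permutations

# Urn of the illustration: 7 red and 3 black balls, two ordered draws
# without replacement; all 90 ordered pairs of ball indices equally likely.
balls = ["red"] * 7 + ["black"] * 3
pairs = list(permutations(range(10), 2))

def prob(event, given=None):
    """P(event | given) by counting over the (restricted) uniform space."""
    pool = [p for p in pairs if given is None or given(p)]
    return Fraction(sum(1 for p in pool if event(p)), len(pool))

second_red = lambda p: balls[p[1]] == "red"
first_red = lambda p: balls[p[0]] == "red"
first_black = lambda p: balls[p[0]] == "black"

print(prob(second_red))               # 7/10 (no knowledge of the first draw)
print(prob(second_red, first_red))    # 2/3, i.e., 6/9
print(prob(second_red, first_black))  # 7/9
```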

Events like these are said to be independent. More formally, if P(B | A) = P(B), we say that the event B is (statistically or stochastically or in the probability sense) independent of the event A. If P(B) is also > 0, then it is easily seen that A is also independent of B. In fact,

P(A | B) = P(A ∩ B)/P(B) = P(B | A)P(A)/P(B) = P(B)P(A)/P(B) = P(A).

That is, independence is a symmetric relation: if one of the events is independent of the other, then this second event is also independent of the first, and we may simply say that A and B are independent. In this case P(A ∩ B) = P(A)P(B), and we may take this relationship as the definition of independence of A and B. That is:

DEFINITION 3  The events A, B are said to be (statistically or stochastically or in the probability sense) independent if P(A ∩ B) = P(A)P(B).

Notice that this relationship is true even if one or both of P(A), P(B) is equal to 0.

As was pointed out in connection with the examples discussed above, independence of two events simply means that knowledge of the occurrence of one of them helps in no way in re-evaluating the probability that the other event happens. This is true for any two independent events A and B, as follows from the equation P(A | B) = P(A), provided P(B) > 0, or P(B | A) = P(B), provided P(A) > 0.

As another example, revisit the two-children families example considered earlier, and define the events A and B as follows: A = "children of both genders," B = "older child is a boy." Then P(A) = P(B) = P(B | A) = 1/2; thus A and B are independent.

Events which are intuitively independent arise, for example, in connection with the descriptive experiments of successively drawing balls with replacement from the same urn with always the same content, or drawing cards with replacement from the same deck of playing cards, or repeatedly tossing the same or different coins, etc. What actually happens in practice is to consider events which are independent in the intuitive sense, and then define the probability function P appropriately to reflect this independence.

The definition of independence generalizes to any finite number of events. Thus:

DEFINITION 4  The events Aj, j = 1, 2, ..., n are said to be (mutually or completely) independent if the following relationships hold:

P(Aj1 ∩ ··· ∩ Ajk) = P(Aj1) ··· P(Ajk)

for any k = 2, ..., n and j1, ..., jk = 1, 2, ..., n such that 1 ≤ j1 < j2 < ··· < jk ≤ n. These events are said to be pairwise independent if P(Ai ∩ Aj) = P(Ai)P(Aj) for all i ≠ j.

It follows that, if the events Aj, j = 1, 2, ..., n are mutually independent, then they are pairwise independent. The converse need not be true, as the example below illustrates. Also, there are

(n choose 2) + (n choose 3) + ··· + (n choose n) = 2^n − (n choose 1) − (n choose 0) = 2^n − n − 1

relationships characterizing the independence of Aj, j = 1, ..., n, and they are all necessary. For example, for n = 3 we will have:

P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3),
P(A1 ∩ A2) = P(A1)P(A2),
P(A1 ∩ A3) = P(A1)P(A3),
P(A2 ∩ A3) = P(A2)P(A3).

That these four relations are necessary for the characterization of independence of A1, A2, A3 is illustrated by the following examples:

Let S = {1, 2, 3, 4}, P({1}) = ··· = P({4}) = 1/4, and set A1 = {1, 2}, A2 = {1, 3}, A3 = {1, 4}. Then

A1 ∩ A2 = A1 ∩ A3 = A2 ∩ A3 = A1 ∩ A2 ∩ A3 = {1}.

Thus

P(A1 ∩ A2) = 1/4 = (1/2)(1/2) = P(A1)P(A2),
P(A1 ∩ A3) = 1/4 = (1/2)(1/2) = P(A1)P(A3),
P(A2 ∩ A3) = 1/4 = (1/2)(1/2) = P(A2)P(A3),

but

P(A1 ∩ A2 ∩ A3) = 1/4 ≠ (1/2)(1/2)(1/2) = P(A1)P(A2)P(A3).
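The first counterexample above can be checked mechanically. The short Python sketch below (variable names ours) confirms that the three pairwise relations hold while the triple relation fails.

```python
from fractions import Fraction

# S = {1,2,3,4} with the uniform measure; A1 = {1,2}, A2 = {1,3}, A3 = {1,4}.
S = {1, 2, 3, 4}
P = lambda E: Fraction(len(E), len(S))
A1, A2, A3 = {1, 2}, {1, 3}, {1, 4}

assert P(A1 & A2) == P(A1) * P(A2)    # pairwise independence holds
assert P(A1 & A3) == P(A1) * P(A3)
assert P(A2 & A3) == P(A2) * P(A3)
# ... but the triple condition fails: 1/4 is not 1/8.
assert P(A1 & A2 & A3) != P(A1) * P(A2) * P(A3)
print(P(A1 & A2 & A3), P(A1) * P(A2) * P(A3))  # 1/4 1/8
```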

Next, let S = {1, 2, 3, 4, 5}, and define P as follows:

P({1}) = 1/8,  P({2}) = P({3}) = P({4}) = 3/16,  P({5}) = 5/16.

Let A1 = {1, 2, 3}, A2 = {1, 2, 4}, A3 = {1, 3, 4}. Then A1 ∩ A2 = {1, 2} and A1 ∩ A2 ∩ A3 = {1}. Thus

P(A1 ∩ A2 ∩ A3) = 1/8 = (1/2)(1/2)(1/2) = P(A1)P(A2)P(A3),

but

P(A1 ∩ A2) = 5/16 ≠ (1/2)(1/2) = P(A1)P(A2).

The following result, regarding independence of events, is often used by many authors without any reference to it. It is the theorem below.

THEOREM 6  If the events A1, ..., An are independent, so are the events A1′, ..., An′, where Aj′ is either Aj or Ajc, j = 1, 2, ..., n.

PROOF  The proof is done by (a double) induction. For n = 2, we have to show that P(A1′ ∩ A2′) = P(A1′)P(A2′). First, let A1′ = A1 and A2′ = A2c. Then

P(A1′ ∩ A2′) = P(A1 ∩ A2c) = P[A1 ∩ (S − A2)] = P(A1 − A1 ∩ A2) = P(A1) − P(A1 ∩ A2) = P(A1) − P(A1)P(A2) = P(A1)[1 − P(A2)] = P(A1)P(A2c) = P(A1′)P(A2′).

Similarly if A1′ = A1c and A2′ = A2. For A1′ = A1c and A2′ = A2c,

P(A1c ∩ A2c) = P[(S − A1) ∩ A2c] = P(A2c − A1 ∩ A2c) = P(A2c) − P(A1 ∩ A2c) = P(A2c) − P(A1)P(A2c) = P(A2c)[1 − P(A1)] = P(A2c)P(A1c) = P(A1′)P(A2′).

Next, assume the assertion to be true for k events and show it to be true for k + 1 events. That is, we suppose that P(A1′ ∩ ··· ∩ Ak′) = P(A1′) ··· P(Ak′), and we shall show that P(A1′ ∩ ··· ∩ Ak+1′) = P(A1′) ··· P(Ak+1′). First, assume that Ak+1′ = Ak+1, and we have to show that

P(A1′ ∩ ··· ∩ Ak′ ∩ Ak+1′) = P(A1′ ∩ ··· ∩ Ak′ ∩ Ak+1) = P(A1′) ··· P(Ak′)P(Ak+1).

This relationship is established also by induction as follows: If A1′ = A1c and Ai′ = Ai, i = 2, ..., k, then

P(A1c ∩ A2 ∩ ··· ∩ Ak ∩ Ak+1) = P[(S − A1) ∩ A2 ∩ ··· ∩ Ak ∩ Ak+1]
= P(A2 ∩ ··· ∩ Ak ∩ Ak+1 − A1 ∩ A2 ∩ ··· ∩ Ak ∩ Ak+1)
= P(A2 ∩ ··· ∩ Ak ∩ Ak+1) − P(A1 ∩ A2 ∩ ··· ∩ Ak ∩ Ak+1)
= P(A2) ··· P(Ak)P(Ak+1) − P(A1)P(A2) ··· P(Ak)P(Ak+1)
= P(A2) ··· P(Ak)P(Ak+1)[1 − P(A1)]
= P(A1c)P(A2) ··· P(Ak)P(Ak+1).

This is, clearly, true if A1c is replaced by any other Aic, i = 2, ..., k. Now, for ℓ < k, assume that

P(A1c ∩ ··· ∩ Aℓc ∩ Aℓ+1 ∩ ··· ∩ Ak+1) = P(A1c) ··· P(Aℓc)P(Aℓ+1) ··· P(Ak+1),

and show that

P(A1c ∩ ··· ∩ Aℓc ∩ Aℓ+1c ∩ Aℓ+2 ∩ ··· ∩ Ak+1) = P(A1c) ··· P(Aℓc)P(Aℓ+1c)P(Aℓ+2) ··· P(Ak+1).

Indeed,

P(A1c ∩ ··· ∩ Aℓc ∩ Aℓ+1c ∩ Aℓ+2 ∩ ··· ∩ Ak+1)
= P[A1c ∩ ··· ∩ Aℓc ∩ (S − Aℓ+1) ∩ Aℓ+2 ∩ ··· ∩ Ak+1]
= P(A1c ∩ ··· ∩ Aℓc ∩ Aℓ+2 ∩ ··· ∩ Ak+1 − A1c ∩ ··· ∩ Aℓc ∩ Aℓ+1 ∩ Aℓ+2 ∩ ··· ∩ Ak+1)
= P(A1c ∩ ··· ∩ Aℓc ∩ Aℓ+2 ∩ ··· ∩ Ak+1) − P(A1c ∩ ··· ∩ Aℓc ∩ Aℓ+1 ∩ Aℓ+2 ∩ ··· ∩ Ak+1)
= P(A1c) ··· P(Aℓc)P(Aℓ+2) ··· P(Ak+1) − P(A1c) ··· P(Aℓc)P(Aℓ+1)P(Aℓ+2) ··· P(Ak+1)  (by the induction hypothesis)
= P(A1c) ··· P(Aℓc)P(Aℓ+2) ··· P(Ak+1)[1 − P(Aℓ+1)]
= P(A1c) ··· P(Aℓc)P(Aℓ+1c)P(Aℓ+2) ··· P(Ak+1),

as was to be seen. It is also, clearly, true that the same result holds if the ℓ Ai′'s which are Aic are chosen in any one of the (k choose ℓ) possible ways of choosing ℓ out of k. Thus, we have shown that

P(A1′ ∩ ··· ∩ Ak′ ∩ Ak+1) = P(A1′) ··· P(Ak′)P(Ak+1).

Finally, under the assumption that P(A1′ ∩ ··· ∩ Ak′) = P(A1′) ··· P(Ak′), take Ak+1′ = Ak+1c, and show that

P(A1′ ∩ ··· ∩ Ak′ ∩ Ak+1c) = P(A1′) ··· P(Ak′)P(Ak+1c).

In fact,

P(A1′ ∩ ··· ∩ Ak′ ∩ Ak+1c) = P[A1′ ∩ ··· ∩ Ak′ ∩ (S − Ak+1)]
= P(A1′ ∩ ··· ∩ Ak′ − A1′ ∩ ··· ∩ Ak′ ∩ Ak+1)
= P(A1′ ∩ ··· ∩ Ak′) − P(A1′ ∩ ··· ∩ Ak′ ∩ Ak+1)
= P(A1′) ··· P(Ak′) − P(A1′) ··· P(Ak′)P(Ak+1)  (by the induction hypothesis and what was last proved)
= P(A1′) ··· P(Ak′)[1 − P(Ak+1)] = P(A1′) ··· P(Ak′)P(Ak+1c).

This completes the proof of the theorem. ▲

Now, the notion of independence also carries over to experiments. Thus, if E1 and E2 are two experiments which are intuitively independent, such as the descriptive experiments (also mentioned above) of successively drawing balls with replacement from the same urn with always the same content, or drawing cards with replacement from the same deck of playing cards, or repeatedly tossing the same or different coins, then we say that the experiments E1 and E2 are independent if P(B1 ∩ B2) = P(B1)P(B2) for all events B1 associated with E1 alone and all events B2 associated with E2 alone.

What actually happens in practice is to start out with two experiments E1, E2 which are intuitively independent, and then the question arises as to what is the appropriate sample space for the composite or compound experiment, also denoted by E1 × E2, obtained by looking at the pair (E1, E2) of experiments. To this end, let Ej be an experiment having the sample space Sj, j = 1, 2, with corresponding probability spaces (S1, class of events, P1) and (S2, class of events, P2). If S stands for the sample space of the compound experiment, then, clearly, S = S1 × S2 = {(s1, s2); s1 ∈ S1, s2 ∈ S2}. The corresponding events are, of course, subsets of S, and the probability function P is defined, in terms of P1 and P2, on the class of events in the space S1 × S2 so that it reflects the intuitive independence.

The above definitions generalize in a straightforward manner to any finite number of experiments. Thus, if Ej, j = 1, 2, ..., n, are n experiments with corresponding sample spaces Sj and probability functions Pj on the respective classes of events, then the compound experiment

(E1, E2, ..., En) = E1 × E2 × ··· × En

has sample space S, where

S = S1 × ··· × Sn = {(s1, ..., sn); sj ∈ Sj, j = 1, 2, ..., n}.

The class of events consists of subsets of S, and the experiments are said to be independent if for all events Bj associated with experiment Ej alone, j = 1, 2, ..., n, it holds that

P(B1 ∩ ··· ∩ Bn) = P(B1) ··· P(Bn).
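Theorem 6 can be illustrated on a small compound experiment. The sketch below (Python; the choice of events is ours) verifies, for two independent die throws, that replacing either independent event by its complement preserves the product factorization.

```python
from fractions import Fraction
from itertools import product

# Compound experiment: two independent throws of a balanced die,
# uniform measure on the 36-point product space.
S = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), len(S))

A = {s for s in S if s[0] % 2 == 0}   # first throw even:     P(A) = 1/2
B = {s for s in S if s[1] > 4}        # second throw 5 or 6:  P(B) = 1/3
assert P(A & B) == P(A) * P(B)        # A and B are independent

# Theorem 6: all four combinations of A or A^c with B or B^c factor as well.
for A1 in (A, S - A):
    for B1 in (B, S - B):
        assert P(A1 & B1) == P(A1) * P(B1)
print("complements preserve independence")
```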

Again, the probability function P is defined, in terms of Pj, j = 1, 2, ..., n, on the class of events in S so as to reflect the intuitive independence of the experiments Ej, j = 1, 2, ..., n.

In closing this section, we mention that events and experiments which are not independent are said to be dependent.

Exercises

2.3.1  If A and B are disjoint events, then show that A and B are independent if and only if at least one of P(A), P(B) is zero.

2.3.2  Show that if the event A is independent of itself, then P(A) = 0 or 1.

2.3.3  If A, B are independent, A, C are independent and B ∩ C = ∅, then A, B + C are independent. Show, by means of a counterexample, that the conclusion need not be true if B ∩ C ≠ ∅.

2.3.4  For each j = 1, 2, ..., n, suppose that the events A1, ..., Am, Bj are independent and that Bi ∩ Bj = ∅, i ≠ j. Then show that the events A1, ..., Am, Σ_{j=1}^n Bj are independent.

2.3.5  If Aj, j = 1, 2, ..., n are independent events, show that

P(∪_{j=1}^n Aj) = 1 − ∏_{j=1}^n P(Ajc).

2.3.6  Jim takes the written and road driver's license tests repeatedly until he passes them. Given that the probability that he passes the written test is 0.9, that the probability that he passes the road test is 0.6, and that the tests are independent of each other, what is the probability that he will pass both tests on his nth attempt? (Assume that the road test cannot be taken unless he passes the written test, and that once he passes the written test he does not have to take it again, no matter whether he passes or fails his next road test; also, the written and the road tests are considered distinct attempts.)

2.3.7  The probability that a missile fired against a target is not intercepted by an antimissile missile is 2/3. Given that the missile has not been intercepted, the probability of a successful hit is 3/4. If four missiles are fired independently, what is the probability that:
i) All will successfully hit the target?
ii) At least one will do so?
How many missiles should be fired, so that:

iii) At least one is not intercepted with probability ≥ 0.95?
iv) At least one successfully hits its target with probability ≥ 0.99?

2.3.8  Two fair dice are rolled repeatedly and independently. The first time a total of 10 appears, player A wins, while the first time that a total of 6 appears, player B wins, and the game is terminated. Compute the probabilities that:
i) The game terminates on the nth throw and player A wins;
ii) The same for player B;
iii) Player A wins;
iv) Player B wins;
v) Does the game terminate ever?

2.3.9  Electric current is transmitted from point A to point B provided at least one of the circuits #1 through #n below is closed. If the circuits close independently of each other and with respective probabilities pi, i = 1, ..., n, determine the probability that:
i) Exactly one circuit is closed;
ii) At least one circuit is closed;
iii) Exactly m circuits are closed for 0 ≤ m ≤ n;
iv) At least m circuits are closed with m as in part (iii);
v) What do parts (i)–(iv) become for p1 = ··· = pn = p, say?

[Figure: circuits #1 through #n connected in parallel between points A and B.]

2.4 Combinatorial Results

In this section, we will restrict ourselves to finite sample spaces and uniform probability functions. Some combinatorial results will be needed and we proceed to derive them here. Also, examples illustrating the theorems of previous sections will be presented.

The following theorem, known as the Fundamental Principle of Counting, forms the backbone of the results in this section.

THEOREM 7  Let a task T be completed by carrying out all of the subtasks Tj, j = 1, 2, ..., k, and let it be possible to perform the subtask Tj in nj (different) ways, j = 1, 2, ..., k. Then the total number of ways the task T may be performed is given by ∏_{j=1}^k nj.

PROOF  The assertion is true for k = 2, since by combining each one of the n1 ways of performing subtask T1 with each one of the n2 ways of performing subtask T2, we obtain n1n2 as the total number of ways of performing task T. Next, assume the result to be true for k = m and establish it for k = m + 1. The reasoning is the same as in the step just completed: by combining each one of the ∏_{j=1}^m nj ways of performing the first m subtasks with each one of the n_{m+1} ways of performing subtask T_{m+1}, we obtain ∏_{j=1}^m nj × n_{m+1} = ∏_{j=1}^{m+1} nj for the total number of ways of completing task T. ▲

The following examples serve as an illustration to Theorem 7.

EXAMPLE 3
i) A man has five suits, three pairs of shoes and two hats. Then the number of different ways he can attire himself is 5 · 3 · 2 = 30.
ii) Consider the set S = {1, ..., N} and suppose that we are interested in finding the number of its subsets. In forming a subset, we consider for each of the N elements whether to include it or not. Then the required number is equal to the following product of N factors: 2 · 2 ··· 2 = 2^N.
iii) Let nj = n(Sj) be the number of points of the sample space Sj, j = 1, 2, ..., k. Then the sample space S = S1 × ··· × Sk has n(S) = n1 ··· nk sample points. Or, if nj is the number of outcomes of the experiment Ej, j = 1, 2, ..., k, then the number of outcomes of the compound experiment E1 × ··· × Ek is n1 ··· nk.

In the following, we shall consider the problems of selecting balls from an urn and also of placing balls into cells, which serve as general models of many interesting real life problems. Consider an urn which contains n numbered (distinct, but otherwise identical) balls. If k balls are drawn from the urn, we say that a sample of size k was drawn. The sample is ordered if the order in which the balls are drawn is taken into consideration, and unordered otherwise. Then we have the following result.

THEOREM 8
i) The number of ordered samples of size k is n(n − 1) ··· (n − k + 1) = P_{n,k} (permutations of k objects out of n; in particular, if k = n, P_{n,n} = 1 · 2 ··· n = n!), provided the sampling is done without replacement, and is equal to n^k if the sampling is done with replacement.
ii) The number of unordered samples of size k is

P_{n,k}/k! = C_{n,k} = (n choose k) = n!/[k!(n − k)!]

if the sampling is done without replacement, and is equal to

N_{n,k} = (n + k − 1 choose k)

if the sampling is done with replacement. [See also Theorem 9(iii).]

PROOF
i) The first part follows from Theorem 7 by taking nj = n − j + 1, j = 1, ..., k, and the second part follows from the same theorem by taking nj = n, j = 1, ..., k.
ii) For the first part, clearly, if x is the required number, then P_{n,k} = x · k!, since for every sample of size k one can form k! ordered samples of the same size. Hence the desired result. The proof of the second part may be carried out by an appropriate induction method. However, we choose to present the following short alternative proof, which is due to S. W. Golomb and appeared in the American Mathematical Monthly 75 (1968), p. 530. For clarity, consider the n balls to be cards numbered from 1 to n, and adjoin k − 1 extra cards numbered from n + 1 to n + k − 1 and bearing the respective instructions: "repeat lowest numbered card," "repeat 2nd lowest numbered card," ..., "repeat (k − 1)st lowest numbered card." Then a sample of size k without replacement from this enlarged (n + k − 1)-card deck corresponds uniquely to a sample of size k from the original deck with replacement. (That is, take k out of n + k − 1 without replacement, so that there will be at least one out of 1, 2, ..., n, and then apply the instructions.) Thus, by the first part, the required number is

(n + k − 1 choose k) = N_{n,k},

as was to be seen. ▲

For the sake of illustration of Theorem 8, let us consider the following examples.

EXAMPLE 4
i) Form all possible three digit numbers by using the numbers 1, 2, 3, 4, 5. Here, clearly, the order in which the numbers are selected is relevant, and the required number is P_{5,3} = 5 · 4 · 3 = 60 without repetitions, and 5^3 = 125 with repetitions.
ii) Find the number of all subsets of the set S = {1, ..., N}. Here the order is, clearly, irrelevant, and the required number is (N choose 0) + (N choose 1) + ··· + (N choose N) = 2^N, as already found in Example 3.
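The counts of Theorem 7 (Example 3(i)) and all four sampling schemes of Theorem 8 can be confirmed by direct enumeration. The sketch below pairs each `itertools` enumerator with the corresponding closed-form count.

```python
from itertools import combinations, combinations_with_replacement, permutations, product
from math import comb, perm

# Theorem 7 / Example 3(i): 5 suits x 3 pairs of shoes x 2 hats = 30 outfits.
outfits = list(product(range(5), range(3), range(2)))
assert len(outfits) == 5 * 3 * 2 == 30

# Theorem 8, for n = 5 balls and samples of size k = 3.
n, k = 5, 3
assert len(list(permutations(range(n), k))) == perm(n, k) == 60    # ordered, no replacement
assert len(list(product(range(n), repeat=k))) == n ** k == 125     # ordered, with replacement
assert len(list(combinations(range(n), k))) == comb(n, k) == 10    # unordered, no replacement
assert len(list(combinations_with_replacement(range(n), k))) \
       == comb(n + k - 1, k) == 35                                 # unordered, with replacement
print("all counts verified")
```

The last line is exactly Golomb's correspondence: `combinations_with_replacement` over n symbols is counted by C(n + k − 1, k).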

EXAMPLE 5  An urn contains 8 balls numbered 1 to 8. Four balls are drawn. What is the probability that the smallest number drawn is 3? Assuming the uniform probability function, the required probabilities are as follows for the respective four possible sampling cases:

Order does not count / replacements not allowed: (5 choose 3)/(8 choose 4) = 1/7 ≈ 0.14.

Order does not count / replacements allowed: (6 + 3 − 1 choose 3)/(8 + 4 − 1 choose 4) = 28/165 ≈ 0.17.

Order counts / replacements not allowed: 4(5 · 4 · 3)/(8 · 7 · 6 · 5) = 1/7 ≈ 0.14.

Order counts / replacements allowed: [(4 choose 1)5^3 + (4 choose 2)5^2 + (4 choose 3)5 + (4 choose 4)]/8^4 = 671/4,096 ≈ 0.16.

EXAMPLE 6  What is the probability that a poker hand will have exactly one pair? A poker hand is a 5-subset of the set of 52 cards in a full deck, so there are

(52 choose 5) = N = 2,598,960

different poker hands. We thus let S be a set with N elements and assign the uniform probability measure to S. A poker hand with one pair has two cards of the same face value and three cards whose faces are all different among themselves and from that of the pair. We arrive at a unique poker hand with one pair by completing the following tasks in order:
a) Choose the face value of the pair from the 13 available face values. This can be done in (13 choose 1) = 13 ways.
b) Choose two cards with the face value selected in (a). This can be done in (4 choose 2) = 6 ways.
c) Choose the three face values for the other three cards in the hand. Since there are 12 face values to choose from, this can be done in (12 choose 3) = 220 ways.
d) Choose one card (from the four at hand) of each face value chosen in (c). This can be done in 4 · 4 · 4 = 4^3 = 64 ways.

Hence, by Theorem 7, there are 13 · 6 · 220 · 64 = 1,098,240 poker hands with one pair, and the required probability is equal to 1,098,240/2,598,960 ≈ 0.42.
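The four answers of Example 5 can be rechecked by brute-force enumeration of every sample, one `itertools` enumerator per sampling scheme (a sketch; the dictionary keys are our own labels).

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement, permutations, product

# Example 5: balls numbered 1..8, four drawn; P(smallest number drawn = 3)
# under each of the four sampling schemes, by direct enumeration.
balls = range(1, 9)
cases = {
    "unordered, no replacement": list(combinations(balls, 4)),
    "unordered, with replacement": list(combinations_with_replacement(balls, 4)),
    "ordered, no replacement": list(permutations(balls, 4)),
    "ordered, with replacement": list(product(balls, repeat=4)),
}
results = {}
for name, samples in cases.items():
    results[name] = Fraction(sum(1 for s in samples if min(s) == 3), len(samples))
    print(name, results[name])
# 1/7, 28/165, 1/7, 671/4096 respectively
```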

THEOREM 9
i) The number of ways in which n distinct balls can be distributed into k distinct cells is k^n.
ii) The number of ways that n distinct balls can be distributed into k distinct cells so that the jth cell contains nj balls (nj ≥ 0, j = 1, ..., k, Σ_{j=1}^k nj = n) is

n!/(n1! n2! ··· nk!) = (n choose n1, n2, ..., nk).

iii) The number of ways that n indistinguishable balls can be distributed into k distinct cells is

(k + n − 1 choose n).

Furthermore, if n ≥ k and no cell is to be empty, this number becomes

(n − 1 choose k − 1).

PROOF
i) Obvious, since there are k places to put each of the n balls.
ii) This problem is equivalent to partitioning the n balls into k groups, where the jth group contains exactly nj balls, with nj as above. This can be done in the following number of ways:

(n choose n1)(n − n1 choose n2) ··· (n − n1 − ··· − n_{k−1} choose nk) = n!/(n1! n2! ··· nk!).

iii) We represent the k cells by the k spaces between k + 1 vertical bars, and the n balls by n stars. By fixing the two extreme bars, we are left with k + n − 1 bars and stars, which we may consider as k + n − 1 spaces to be filled in by a bar or a star. Then the problem is that of selecting the n spaces for the n stars, which can be done in (k + n − 1 choose n) ways. As for the second part, we now have the condition that there should not be two adjacent bars. The n stars create n − 1 spaces, and by selecting k − 1 of them, in (n − 1 choose k − 1) ways, to place the k − 1 bars, the result follows. ▲
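Parts (ii) and (iii) of Theorem 9 can be verified numerically; the sketch below (helper name `multinomial` is ours) checks a multinomial coefficient directly and confirms the stars-and-bars count by enumerating all placements of n distinct balls and collecting the distinct occupancy vectors.

```python
from itertools import product
from math import comb, factorial

def multinomial(n, parts):
    """n! / (n1! n2! ... nk!), the count in Theorem 9(ii)."""
    assert sum(parts) == n
    out = factorial(n)
    for p in parts:
        out //= factorial(p)
    return out

# Theorem 9(ii): 6 distinct balls into cells with occupancy numbers (3, 2, 1).
assert multinomial(6, (3, 2, 1)) == 60

# Theorem 9(iii): n indistinguishable balls into k distinct cells, checked by
# enumerating all k^n placements and keeping only the occupancy vectors.
n, k = 4, 3
occupancies = {tuple(placement.count(c) for c in range(k))
               for placement in product(range(k), repeat=n)}
assert len(occupancies) == comb(k + n - 1, n) == 15
print(multinomial(6, (3, 2, 1)), len(occupancies))  # 60 15
```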

REMARK 5
i) The numbers nj, j = 1, ..., k, in the second part of the theorem are called occupancy numbers.
ii) The answer to (ii) is also the answer to the following different question: Consider n numbered balls such that nj are identical among themselves and distinct from all others, nj ≥ 0, j = 1, ..., k, Σ_{j=1}^k nj = n. Then the number of different permutations is (n choose n1, n2, ..., nk).

Now consider the following examples for the purpose of illustrating the theorem.

EXAMPLE 7  Find the probability that, in dealing a bridge hand, each player receives one ace. The number of possible bridge hands is

N = (52 choose 13, 13, 13, 13) = 52!/(13!)^4.

Our sample space S is a set with N elements, and we assign the uniform probability measure to it. Next, the number of sample points for which each player, North, South, East and West, has one ace can be found as follows:
a) Deal the four aces, one to each player. This can be done in (4 choose 1, 1, 1, 1) = 4!/(1! 1! 1! 1!) = 4! ways.
b) Deal the remaining 48 cards, 12 to each player. This can be done in (48 choose 12, 12, 12, 12) = 48!/(12!)^4 ways.

Thus the required number is 4!48!/(12!)^4, and the desired probability is 4!48!(13!)^4/[(12!)^4(52!)]. Furthermore, it can be seen that this probability lies between 0.10 and 0.11.
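The bound stated at the end of Example 7 can be confirmed with exact integer arithmetic (a sketch; variable names ours):

```python
from fractions import Fraction
from math import factorial

# Example 7: probability that each bridge player receives exactly one ace.
f = factorial
hands = f(52) // f(13) ** 4                # (52 choose 13,13,13,13)
favorable = f(4) * (f(48) // f(12) ** 4)   # deal the aces, then 12 cards each
p = Fraction(favorable, hands)
print(float(p))          # ≈ 0.105, indeed between 0.10 and 0.11
assert 0.10 < float(p) < 0.11
```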

EXAMPLE 8  The eleven letters of the word MISSISSIPPI are scrambled and then arranged in some order.

i) What is the probability that the four I's are consecutive letters in the resulting arrangement? There are eight possible positions for the first I, and the remaining seven letters (one M, four S's and two P's) can be arranged in (7 choose 1, 4, 2) distinct ways. Thus the required probability is

8(7 choose 1, 4, 2)/(11 choose 1, 4, 4, 2) = 4/165 ≈ 0.02.

ii) What is the conditional probability that the four I's are consecutive (event A), given B, where B is the event that the arrangement starts with M and ends with S? Since there are only six positions for the first I, we clearly have

P(A | B) = 6(5 choose 2)/(9 choose 4, 3, 2) = 1/21 ≈ 0.05.

iii) What is the conditional probability of A, as defined above, given C, where C is the event that the arrangement ends with four consecutive S's? Since there are only four positions for the first I, it is clear that

P(A | C) = 4(3 choose 2)/(7 choose 1, 4, 2) = 4/35 ≈ 0.11.

Exercises

2.4.1  A combination lock can be unlocked by switching it to the left and stopping at digit a, then switching it to the right and stopping at digit b and, finally, switching it to the left and stopping at digit c. If the distinct digits a, b and c are chosen from among the numbers 0, 1, ..., 9, what is the number of possible combinations?

2.4.2  How many distinct groups of n symbols in a row can be formed, if each symbol is either a dot or a dash?

2.4.3  How many different three-digit numbers can be formed by using the numbers 0, 1, ..., 9?

2.4.4  Telephone numbers consist of seven digits, three of which are grouped together, and the remaining four are also grouped together. How many numbers can be formed if:
i) No restrictions are imposed?
ii) The first three numbers are required to be 752?
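The three probabilities of Example 8 above can be rechecked with exact arithmetic; the helper `arrangements` is our own name for the multinomial count of Remark 5(ii).

```python
from fractions import Fraction
from math import factorial

def arrangements(counts):
    """Distinct arrangements of letters with the given multiplicities."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# MISSISSIPPI: one M, four I's, four S's, two P's.
total = arrangements((1, 4, 4, 2))                      # 34650 arrangements
# (i) Four consecutive I's: 8 block positions x arrangements of M,S,S,S,S,P,P.
p_i = Fraction(8 * arrangements((1, 4, 2)), total)
# (ii) Given "starts with M, ends with S": 9 free slots hold I,I,I,I,S,S,S,P,P.
p_ii = Fraction(6 * arrangements((3, 2)), arrangements((4, 3, 2)))
# (iii) Given "ends with SSSS": 7 free slots hold M,I,I,I,I,P,P.
p_iii = Fraction(4 * arrangements((1, 2)), arrangements((1, 4, 2)))
print(p_i, p_ii, p_iii)  # 4/165 1/21 4/35
```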

13 In how many ways can a committee of 2n + 1 people be seated along one side of a table. E. 5.4 Combinatorial Results Exercises 41 2.10 Show that ⎛ n + 1⎞ ⎟ ⎜ ⎝ m + 1⎠ n+1 = . what is the probability that two people obtain the same number of H’s. how many ambassadors are involved? 2. if: i) All letters and numbers may be used? ii) No two letters may be the same? 2. What is the probability that a triangle can be formed by using these three chosen line segments? 2.14 Each of the 2n members of a committee ﬂips a fair coin in deciding whether or not to attend a meeting of the committee. m men are to be drafted so that all possible combinations are equally likely to be chosen.4.4.4. F. What is the probability that a speciﬁed man is not drafted? 2. What is the probability that a majority will show up in the meeting? 2.5 A certain state uses ﬁve symbols for automobile license plates such that the ﬁrst two are letters and the last three numbers.4.7 The 24 volumes of the Encyclopaedia Britannica are arranged on a shelf. How many license plates can be made. 3 numbers are chosen at random and without repetitions. What is the probability that their product is a negative number? 2.9 From among n eligible draftees. 7 and 9 and choose three of them at random. What is the probability that: i) All 24 volumes appear in ascending order? ii) All 24 volumes appear in ascending order.12 From 10 positive and 6 negative numbers.4.2.4. 3.4. m+1 ⎛ n⎞ ⎜ ⎟ ⎝ m⎠ 2. if each one of them tosses the coin independently n times? . a committee member attends the meeting if an H appears. F.15 If the probability that a coin falls H is p (0 < p < 1).4.4. I and O are written on six chips and placed into an urn.4.4. given that volumes 14 and 15 appeared in ascending order and that volumes 1–13 precede volume 14? 2.8 If n countries exchange ambassadors.6 Suppose that the letters C. What is the probability that the word “OFFICE” is formed? 2.11 Consider ﬁve line segments of length 1. 
if the chairman must sit in the middle? 2. Then the six chips are mixed and drawn one by one without replacement.

2.4.16 i) Six fair dice are tossed once. What is the probability that all six faces appear? ii) Seven fair dice are tossed once. What is the probability that every face appears at least once?

2.4.17 A shipment of 2,000 light bulbs contains 200 defective items and 1,800 good items. Five hundred bulbs are chosen at random, are tested and the entire shipment is rejected if more than 25 bulbs from among those tested are found to be defective. What is the probability that the shipment will be accepted?

2.4.18 Show that (M choose m) = (M−1 choose m) + (M−1 choose m−1), where M, m are positive integers and m < M.

2.4.19 Show that Σ_{x=0}^r (m choose x)(n choose r−x) = (m+n choose r), where (k choose x) = 0 if x > k.

2.4.20 Show that: i) Σ_{j=0}^n (n choose j) = 2^n; ii) Σ_{j=0}^n (−1)^j (n choose j) = 0.

2.4.21 A student is given a test consisting of 30 questions. For each question there are supplied 5 different answers (of which only one is correct). The student is required to answer correctly at least 25 questions in order to pass the test. If he knows the right answers to the first 20 questions and chooses an answer to the remaining questions at random and independently of each other, what is the probability that he will pass the test?

2.4.22 A student committee of 12 people is to be formed from among 100 freshmen (60 male + 40 female), 80 sophomores (50 male + 30 female), 70 juniors (46 male + 24 female), and 40 seniors (28 male + 12 female). Find the total number of different committees which can be formed under each one of the following requirements: i) No restrictions are imposed on the formation of the committee. ii) Seven students are male and five female.

iii) The committee contains the same number of students from each class. iv) The committee contains two male students and one female student from each class. v) The committee chairman is required to be a senior. vi) The committee chairman is required to be both a senior and male. vii) The chairman, the secretary and the treasurer of the committee are all required to belong to different classes.

2.4.23 Refer to Exercise 2.4.22 and suppose that the committee is formed by choosing its members at random. Compute the probability that the committee to be chosen satisfies each one of the requirements (i)–(vii).

2.4.24 A fair die is rolled independently until all faces appear at least once. What is the probability that this happens on the 20th throw?

2.4.25 Twenty letters addressed to 20 different addresses are placed at random into the 20 envelopes. What is the probability that: i) all 20 letters go into the right envelopes? ii) exactly 19 letters go into the right envelopes? iii) exactly 17 letters go into the right envelopes?

2.4.26 Suppose that each one of the 365 days of a year is equally likely to be the birthday of each one of a given group of 73 people. What is the probability that: i) forty people have the same birthday and the other 33 also have the same birthday (which is different from that of the previous group)? ii) if a year is divided into five 73-day specified intervals, that the birthday of: 17 people falls into the first such interval, 23 into the second, 15 into the third, 10 into the fourth and 8 into the fifth interval?

2.4.27 Suppose that each one of n sticks is broken into one long and one short part. Two parts are chosen at random. Find the probability that: i) one part is long and one is short; ii) both parts are either long or short. The 2n parts are arranged at random into n pairs from which new sticks are formed. What is the probability that: iii) the parts are joined in the original order? iv) all long parts are paired with short parts?

2.4.28 Derive the third part of Theorem 9 from Theorem 8(ii).

2.4.29 Three cards are drawn at random and with replacement from a standard deck of 52 playing cards. Compute the probabilities P(Aj), j = 1, ..., 5, where the events Aj are defined as follows:

A1 = {s ∈ S; the first card in s is a diamond, the second is a heart and the third is a club},
A2 = {s ∈ S; 1 card in s is a diamond, 1 is a heart and 1 is a club},
A3 = {s ∈ S; all 3 cards in s are black},
A4 = {s ∈ S; at least 2 cards in s are red},
A5 = {s ∈ S; exactly 1 card in s is an ace}.

2.4.30 Refer to Exercise 2.4.29 and compute the probabilities P(Aj), j = 1, ..., 5 when the cards are drawn at random but without replacement.

2.4.31 Consider hands of 5 cards from a standard deck of 52 playing cards. Find the number of all 5-card hands which satisfy one of the following requirements: i) exactly three cards are of one color; ii) all five cards are of the same suit; iii) three cards are of three suits and the other two of the remaining suit; iv) two cards are aces, one is a king, one is a queen and one is a jack; v) at least two of the cards are aces.

2.4.32 An urn contains nR red balls, nB black balls and nW white balls. There are balls of all three colors; r balls are chosen at random and with replacement. Find the probability that: i) all r balls are red; ii) at least one ball is red; iii) r1 balls are red, r2 balls are black and r3 balls are white (r1 + r2 + r3 = r); iv) there are balls of all three colors among the r chosen.

2.4.33 Refer to Exercise 2.4.32 and discuss the questions (i)–(iii) for r = 3 and r1 = r2 = r3 (= 1), if the balls are drawn at random but without replacement.

2.4.34 Suppose that all 13-card hands are equally likely when a standard deck of 52 playing cards is dealt to 4 people. Compute the probabilities P(Aj), j = 1, ..., 8, where the events Aj are defined as follows:

A1 = {s ∈ S; s consists of 5 diamonds, 3 hearts, 2 clubs and 3 spades},
A2 = {s ∈ S; s consists only of diamonds},
A3 = {s ∈ S; s consists of 1 color cards},
A4 = {s ∈ S; s consists of cards of exactly 2 suits},
A5 = {s ∈ S; s contains at least 2 aces},
A6 = {s ∈ S; s does not contain aces, tens and jacks},

A7 = {s ∈ S; s consists of 3 aces, 2 kings and exactly 7 red cards},
A8 = {s ∈ S; s consists of cards of all different denominations}.

2.4.35 Refer to Exercise 2.4.34 and for j = 0, 1, ..., 4, define the events Aj and also A as follows:

Aj = {s ∈ S; s contains exactly j tens},
A = {s ∈ S; s contains exactly 7 red cards}.

For j = 0, 1, ..., 4, compute the probabilities P(Aj), P(Aj|A) and also P(A), and compare the numbers P(Aj), P(Aj|A).

2.4.36 Let S be the set of all n^3 3-letter words of a language and let P be the equally likely probability function on the events of S. Define the events A, B and C as follows:

A = {s ∈ S; s begins with a specific letter},
B = {s ∈ S; s has the specified letter mentioned in the definition of A in the middle entry},
C = {s ∈ S; s has exactly two of its letters the same}.

Then show that: i) P(A ∩ B) = P(A)P(B); ii) P(A ∩ C) = P(A)P(C); iii) P(B ∩ C) = P(B)P(C); iv) P(A ∩ B ∩ C) ≠ P(A)P(B)P(C). Thus the events A, B, C are pairwise independent but not mutually independent.

2.5* Product Probability Spaces

The concepts discussed in Section 2.3 can be stated precisely by utilizing more technical language. Thus, if we consider the experiments E1 and E2 with respective probability spaces (S1, A1, P1) and (S2, A2, P2), then the compound experiment (E1, E2) = E1 × E2 has sample space S = S1 × S2 as defined earlier. The appropriate σ-field A of events in S is defined as follows: First define the class C by:

C = {A1 × A2; A1 ∈ A1, A2 ∈ A2}, where A1 × A2 = {(s1, s2); s1 ∈ A1, s2 ∈ A2}.

Then A is taken to be the σ-field generated by C (see Theorem 4 in Chapter 1). Next, define on C the set function P by P(A1 × A2) = P1(A1)P2(A2). It can be shown that P determines uniquely a probability measure on A (by means of the so-called Carathéodory extension theorem). This probability measure is usually denoted by P1 × P2 and is called the product probability measure (with factors P1 and P2), and the probability space (S, A, P) is called the product probability space (with factors (Sj, Aj, Pj), j = 1, 2).

It is to be noted that events which refer to E1 alone are of the form B1 = A1 × S2, A1 ∈ A1, and those referring to E2 alone are of the form B2 = S1 × A2, A2 ∈ A2. The experiments E1 and E2 are then said to be independent if P(B1 ∩ B2) = P(B1)P(B2) for all events B1 and B2 as defined above.

For n experiments Ej, j = 1, 2, ..., n with corresponding probability spaces (Sj, Aj, Pj), the compound experiment (E1, ..., En) = E1 × ··· × En has probability space (S, A, P), where

S = S1 × ··· × Sn = {(s1, ..., sn); sj ∈ Sj, j = 1, 2, ..., n},

A is the σ-field generated by the class C, where

C = {A1 × ··· × An; Aj ∈ Aj, j = 1, 2, ..., n},

and P is the unique probability measure defined on A through the relationships

P(A1 × ··· × An) = P1(A1) ··· Pn(An), Aj ∈ Aj, j = 1, 2, ..., n.

The probability measure P is usually denoted by P1 × ··· × Pn and is called the product probability measure (with factors Pj, j = 1, 2, ..., n), and the probability space (S, A, P) is called the product probability space (with factors (Sj, Aj, Pj), j = 1, 2, ..., n). Then the experiments Ej, j = 1, 2, ..., n are said to be independent if P(B1 ∩ ··· ∩ Bn) = P(B1) ··· P(Bn), where the Bj's are defined by

Bj = S1 × ··· × Sj−1 × Aj × Sj+1 × ··· × Sn, Aj ∈ Aj, j = 1, 2, ..., n.

The definition of independent events carries over to σ-fields as follows. Let A1, A2 be two sub-σ-fields of A. We say that A1, A2 are independent if P(A1 ∩ A2) = P(A1)P(A2) for any A1 ∈ A1, A2 ∈ A2. More generally, the σ-fields Aj, j = 1, 2, ..., n (sub-σ-fields of A) are said to be independent if

P(A1 ∩ A2 ∩ ··· ∩ An) = P(A1)P(A2) ··· P(An) for any Aj ∈ Aj, j = 1, 2, ..., n.

σ-fields which are not independent are said to be dependent. At this point, notice that the factor σ-fields Aj, j = 1, 2, ..., n may be considered as sub-σ-fields of the product σ-field A by identifying Aj with Bj, where Bj is defined above. Then independence of the experiments Ej, j = 1, 2, ..., n amounts to independence of the corresponding σ-fields Aj, j = 1, 2, ..., n (looked upon as sub-σ-fields of the product σ-field A).
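The construction can be illustrated on a pair of small finite experiments. In the sketch below the two factor measures are made up for the example, and exact rational arithmetic keeps the independence check exact:

```python
from fractions import Fraction
from itertools import product

# Hypothetical factor experiments: (S1, P1) and (S2, P2).
P1 = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
P2 = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}

# Product measure on S = S1 x S2: P(A1 x A2) = P1(A1) P2(A2),
# extended additively to all subsets of the finite product space.
P = {(s1, s2): P1[s1] * P2[s2] for s1, s2 in product(P1, P2)}

def prob(event):
    return sum(P[s] for s in event)

assert prob(set(P)) == 1   # total mass one

# Events referring to one factor alone: B1 = A1 x S2, B2 = S1 x A2.
B1 = {s for s in P if s[0] == "H"}
B2 = {s for s in P if s[1] in ("a", "b")}

# Under the product measure such events are independent:
assert prob(B1 & B2) == prob(B1) * prob(B2)
```

For a finite product space the Carathéodory extension is just this additive extension over singletons, which is why the dictionary of point masses suffices here.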

Exercises

2.5.1 Form the Cartesian products A × B, A × C, B × C, A × B × C, where A = {stop, go}, B = {good, defective} and C = {(1, 1), (1, 2), (2, 2)}.

2.5.2 Show that A × B = ∅ if and only if at least one of the sets A, B is ∅.

2.5.3 If A ⊆ B, show that A × C ⊆ B × C for any set C.

2.5.4 Show that:
i) (A × B)^c = (A × B^c) + (A^c × B) + (A^c × B^c);
ii) (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D);
iii) (A × B) ∪ (C × D) = (A ∪ C) × (B ∪ D) − [(A ∩ C^c) × (B^c ∩ D) + (A^c ∩ C) × (B ∩ D^c)].

2.6* The Probability of Matchings

In this section, an important result, Theorem 10, is established providing an expression for the probability of occurrence of exactly m events out of M possible events. The theorem is then illustrated by means of two interesting examples. For this purpose, some additional notation is needed which we proceed to introduce. Consider M events Aj, j = 1, 2, ..., M and set

S0 = 1,
S1 = Σ_{j=1}^M P(Aj),
S2 = Σ_{1≤j1<j2≤M} P(Aj1 ∩ Aj2),
···
Sr = Σ_{1≤j1<j2<···<jr≤M} P(Aj1 ∩ Aj2 ∩ ··· ∩ Ajr),
···
SM = P(A1 ∩ A2 ∩ ··· ∩ AM).

Let also

Bm = "exactly m of the events Aj, j = 1, 2, ..., M occur",
Cm = "at least m of the events Aj, j = 1, 2, ..., M occur",
Dm = "at most m of the events Aj, j = 1, 2, ..., M occur".

Then we have

THEOREM 10  With the notation introduced above,

P(Bm) = Sm − (m+1 choose m) S_{m+1} + (m+2 choose m) S_{m+2} − ··· + (−1)^{M−m} (M choose m) S_M,   (2)

which for m = 0 is

P(B0) = S0 − S1 + S2 − ··· + (−1)^M S_M.   (3)

Furthermore,

P(Cm) = P(Bm) + P(B_{m+1}) + ··· + P(B_M),   (4)

and

P(Dm) = P(B0) + P(B1) + ··· + P(Bm).   (5)

For the proof of this theorem, all that one has to establish is (2), since (4) and (5) follow from it. This will be done in Section 6 of Chapter 5. For a proof where S is discrete the reader is referred to the book An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed., 1968, by W. Feller, pp. 99–100.

The following examples illustrate the above theorem.

EXAMPLE 9  The matching problem (case of sampling without replacement). Suppose that we have M urns, numbered 1 to M. Let M balls numbered 1 to M be inserted randomly in the urns, with one ball in each urn. If a ball is placed into the urn bearing the same number as the ball, a match is said to have occurred.

i) Show that the probability of at least one match is

1 − 1/2! + 1/3! − ··· + (−1)^{M+1} (1/M!) ≈ 1 − e^{−1} ≈ 0.63

for large M, and ii) that exactly m matches will occur, for m = 0, 1, 2, ..., M, with probability

(1/m!) [1 − 1 + 1/2! − 1/3! + ··· + (−1)^{M−m} 1/(M − m)!] = (1/m!) Σ_{k=0}^{M−m} (−1)^k/k! ≈ e^{−1}/m!

for M − m large.

DISCUSSION  To describe the distribution of the balls among the urns, write an M-tuple (z1, z2, ..., zM)′ ∈ ℝ^M whose jth component represents the number of the ball inserted in the jth urn. For k = 1, 2, ..., M, the event Ak that a match will occur in the kth urn may be written

Ak = {(z1, ..., zM)′ ∈ ℝ^M; zj integer, 1 ≤ zj ≤ M, j = 1, ..., M, zk = k}.

It is clear that for any integer r = 1, 2, ..., M and any r unequal integers k1, k2, ..., kr from 1 to M,

P(Ak1 ∩ Ak2 ∩ ··· ∩ Akr) = (M − r)!/M!.

It then follows that Sr is given by

Sr = (M choose r) (M − r)!/M! = 1/r!.

This implies the desired results.

EXAMPLE 10  Coupon collecting (case of sampling with replacement). Suppose that a manufacturer gives away in packages of his product certain items (which we take to be coupons), each bearing one of the integers 1 to M, in such a way that each of the M items is equally likely to be found in any package purchased. If n packages are bought, show that the probability that exactly m of the integers, 1 to M, will not be obtained is equal to

(M choose m) Σ_{k=0}^{M−m} (−1)^k (M−m choose k) (1 − (m + k)/M)^n.

Many variations and applications of the above problem are described in the literature, one of which is the following. If n distinguishable balls are distributed among M urns, numbered 1 to M, what is the probability that there will be exactly m urns in which no ball was placed (that is, exactly m urns remain empty after the n balls have been distributed)?

DISCUSSION  To describe the coupons found in the n packages purchased, we write an n-tuple (z1, z2, ..., zn)′ ∈ ℝ^n, whose jth component zj represents the number of the coupon found in the jth package purchased. We now define the events A1, A2, ..., AM as follows: for k = 1, 2, ..., M, Ak is the event that the number k will not appear in the sample; that is,

Ak = {(z1, ..., zn)′ ∈ ℝ^n; zj integer, 1 ≤ zj ≤ M, zj ≠ k, j = 1, 2, ..., n}.

It is easy to see that we have the following results:

P(Ak) = ((M − 1)/M)^n = (1 − 1/M)^n, k = 1, 2, ..., M,

P(Ak1 ∩ Ak2) = ((M − 2)/M)^n = (1 − 2/M)^n, 1 ≤ k1 < k2 ≤ M,

and, in general,

P(Ak1 ∩ Ak2 ∩ ··· ∩ Akr) = (1 − r/M)^n, 1 ≤ k1 < k2 < ··· < kr ≤ M.

Thus the quantities Sr are given by

Sr = (M choose r) (1 − r/M)^n, r = 0, 1, ..., M.   (6)

Let Bm be the event that exactly m of the integers 1 to M will not be found in the sample. Clearly, Bm is the event that exactly m of the events A1, ..., AM will occur. By relations (2) and (6), we have

P(Bm) = Σ_{r=m}^M (−1)^{r−m} (r choose m) (M choose r) (1 − r/M)^n
      = (M choose m) Σ_{k=0}^{M−m} (−1)^k (M−m choose k) (1 − (m + k)/M)^n,   (7)

by setting r − m = k and using the identity

(m+k choose m)(M choose m+k) = (M choose m)(M−m choose k).   (8)

This is the desired result.

This section is concluded with the following important result stated as a theorem.

THEOREM 11  Let A and B be two disjoint events. Then, in a series of independent trials, show that

P(A occurs before B occurs) = P(A)/[P(A) + P(B)].

PROOF  For i = 1, 2, ..., define the events Ai and Bi as follows: Ai = "A occurs on the ith trial," Bi = "B occurs on the ith trial." Then, clearly, the required event is the sum of the events

A1, A1^c ∩ B1^c ∩ A2, A1^c ∩ B1^c ∩ A2^c ∩ B2^c ∩ A3, ..., A1^c ∩ B1^c ∩ ··· ∩ An^c ∩ Bn^c ∩ A_{n+1}, ...

and therefore

P(A occurs before B occurs)
= P[A1 + (A1^c ∩ B1^c ∩ A2) + (A1^c ∩ B1^c ∩ A2^c ∩ B2^c ∩ A3) + ··· + (A1^c ∩ B1^c ∩ ··· ∩ An^c ∩ Bn^c ∩ A_{n+1}) + ···]
= P(A1) + P(A1^c ∩ B1^c ∩ A2) + P(A1^c ∩ B1^c ∩ A2^c ∩ B2^c ∩ A3) + ··· + P(A1^c ∩ B1^c ∩ ··· ∩ An^c ∩ Bn^c ∩ A_{n+1}) + ···
= P(A1) + P(A1^c ∩ B1^c)P(A2) + P(A1^c ∩ B1^c)P(A2^c ∩ B2^c)P(A3) + ··· + P(A1^c ∩ B1^c) ··· P(An^c ∩ Bn^c)P(A_{n+1}) + ···   (by Theorem 6)
= P(A) + P(A^c ∩ B^c)P(A) + P^2(A^c ∩ B^c)P(A) + ··· + P^n(A^c ∩ B^c)P(A) + ···
= P(A)[1 + P(A^c ∩ B^c) + P^2(A^c ∩ B^c) + ··· + P^n(A^c ∩ B^c) + ···]
= P(A) · 1/[1 − P(A^c ∩ B^c)].

But

P(A^c ∩ B^c) = P[(A ∪ B)^c] = 1 − P(A ∪ B) = 1 − P(A + B) = 1 − P(A) − P(B),

so that

1 − P(A^c ∩ B^c) = P(A) + P(B).

Therefore

P(A occurs before B occurs) = P(A)/[P(A) + P(B)],

as asserted. ▲

It is possible to interpret B as a catastrophic event, and A as an event consisting of taking certain precautionary and protective actions upon the energizing of a signaling device. Then the signiﬁcance of the above probability becomes apparent. As a concrete illustration, consider the following simple example (see also Exercise 2.6.3).


EXAMPLE 11

In repeated (independent) draws with replacement from a standard deck of 52 playing cards, calculate the probability that an ace occurs before a picture.

Let A = "an ace occurs," B = "a picture occurs." Then P(A) = 4/52 = 1/13 and P(B) = 12/52 = 3/13, so that

P(A occurs before B occurs) = (1/13) / (1/13 + 3/13) = 1/4.
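The geometric-series argument in the proof of Theorem 11 can be traced numerically for this example: partial sums of P(A)[1 + q + q² + ···], with q = P(Aᶜ ∩ Bᶜ), converge to 1/4. A sketch in exact rational arithmetic:

```python
from fractions import Fraction

# Example 11: A = "an ace" and B = "a picture" on a single draw.
pA, pB = Fraction(4, 52), Fraction(12, 52)
neither = 1 - pA - pB          # P(A^c ∩ B^c) = 9/13 on one draw

# Partial sums of P(A) * [1 + q + q^2 + ...] with q = P(A^c ∩ B^c):
partial = sum(pA * neither ** k for k in range(200))
assert abs(partial - Fraction(1, 4)) < Fraction(1, 10**20)

# Closed form from Theorem 11:
assert pA / (pA + pB) == Fraction(1, 4)
```

After 200 terms the remainder of the series is (1/4)(9/13)^200, which is far below the tolerance used in the assertion.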

Exercises

2.6.1 Show that

(m+k choose m)(M choose m+k) = (M choose m)(M−m choose k),

as asserted in relation (8).

2.6.2 Verify the transition in (7) and that the resulting expression is indeed the desired result.

2.6.3 Consider the following game of chance. Two fair dice are rolled repeatedly and independently. If the sum of the outcomes is either 7 or 11, the player wins immediately, while if the sum is either 2 or 3 or 12, the player loses immediately. If the sum is either 4 or 5 or 6 or 8 or 9 or 10, the player continues rolling the dice until either the same sum appears before a sum of 7 appears, in which case he wins, or until a sum of 7 appears before the original sum appears, in which case the player loses. It is assumed that the game terminates the first time the player wins or loses. What is the probability of winning?
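For small M and n, the formulas of Examples 9 and 10 can be verified against brute-force enumeration. A sketch (exact arithmetic throughout; the function names are ours):

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, factorial

# Example 9: P(exactly m matches) = (1/m!) * sum_{k=0}^{M-m} (-1)^k / k!.
def match_prob(M, m):
    s = sum(Fraction((-1) ** k, factorial(k)) for k in range(M - m + 1))
    return s / factorial(m)

M = 6
for m in range(M + 1):
    count = sum(1 for p in permutations(range(M))
                if sum(p[i] == i for i in range(M)) == m)
    assert Fraction(count, factorial(M)) == match_prob(M, m)

# Example 10: P(exactly m of the M coupon numbers never appear in n draws)
#           = C(M, m) * sum_k (-1)^k C(M-m, k) ((M-m-k)/M)^n.
def coupon_prob(M, n, m):
    return comb(M, m) * sum((-1) ** k * comb(M - m, k)
                            * Fraction(M - m - k, M) ** n
                            for k in range(M - m + 1))

M, n = 4, 6
for m in range(M + 1):
    count = sum(1 for z in product(range(M), repeat=n)
                if M - len(set(z)) == m)
    assert Fraction(count, M ** n) == coupon_prob(M, n, m)
```

The enumerations (720 permutations, 4096 coupon sequences) are tiny, so both identities are checked exactly for every m.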


Chapter 3

On Random Variables and Their Distributions

**3.1 Some General Concepts
**

Given a probability space (S, class of events, P), the main objective of probability theory is that of calculating probabilities of events which may be of importance to us. Such calculations are facilitated by a transformation of the sample space S, which may be quite an abstract set, into a subset of the real line ℝ with which we are already familiar. This is, actually, achieved by the introduction of the concept of a random variable. A random variable (r.v.) is a function (in the usual sense of the word), which assigns to each sample point s ∈ S a real number, the value of the r.v. at s. We require that an r.v. be a well-behaving function. This is satisfied by stipulating that r.v.'s are measurable functions. For the precise definition of this concept and related results, the interested reader is referred to Section 3.5 below. Most functions as just defined which occur in practice are, indeed, r.v.'s, and we leave the matter to rest here. The notation X(S) will be used for the set of values of the r.v. X, the range of X. Random variables are denoted by the last letters of the alphabet X, Y, Z, etc., with or without subscripts. For a subset B of ℝ, we usually denote by (X ∈ B) the following event in S: (X ∈ B) = {s ∈ S; X(s) ∈ B} for simplicity. In particular, (X = x) = {s ∈ S; X(s) = x}. The probability distribution function (or just the distribution) of an r.v. X is usually denoted by PX and is a probability function defined on subsets of ℝ as follows: PX(B) = P(X ∈ B). An r.v. X is said to be of the discrete type (or just discrete) if there are countably many points in ℝ (that is, finitely many or denumerably infinitely many), x1, x2, . . . , such that PX({xj}) > 0, j ≥ 1, and Σj PX({xj}) (= Σj P(X = xj)) = 1. Then the function fX defined on the entire ℝ by the relationships:

fX(xj) = PX({xj}) (= P(X = xj)) for x = xj,


and fX(x) = 0 otherwise, has the properties:

fX(x) ≥ 0 for all x, and Σj fX(xj) = 1.

Furthermore, it is clear that

P(X ∈ B) = Σ_{xj ∈ B} fX(xj).

Thus, instead of striving to calculate the probability of the event {s ∈ S; X(s) ∈ B}, all we have to do is to sum up the values of fX(xj) for all those xj's which lie in B; this assumes, of course, that the function fX is known. The function fX is called the probability density function (p.d.f.) of X. The distribution of a discrete r.v. will also be referred to as a discrete distribution. In the following section, we will see some discrete r.v.'s (distributions) often occurring in practice. They are the Binomial, Poisson, Hypergeometric, Negative Binomial, and the (discrete) Uniform distributions. Next, suppose that X is an r.v. which takes values in a (finite or infinite but proper) interval I in ℝ with the following qualification: P(X = x) = 0 for every single x in I. Such an r.v. is called an r.v. of the continuous type (or just a continuous r.v.). Also, it often happens for such an r.v. to have a function fX satisfying the properties fX(x) ≥ 0 for all x ∈ I, and P(X ∈ J) = ∫_J fX(x)dx for any sub-interval J of I. Such a function is called the probability density function (p.d.f.) of X, in analogy with the discrete case. It is to be noted, however, that here fX(x) does not represent the P(X = x)! A continuous r.v. X with a p.d.f. fX is called absolutely continuous, to differentiate it from those continuous r.v.'s which do not have a p.d.f. In this book, however, we are not going to concern ourselves with non-absolutely continuous r.v.'s. Accordingly, the term "continuous" r.v. will be used instead of "absolutely continuous" r.v. Thus, the r.v.'s to be considered will be either discrete or continuous (= absolutely continuous). Roughly speaking, the fact that P(X = x) = 0 for all x for a continuous r.v. may be interpreted to mean that X takes on "too many" values for each one of them to occur with positive probability. The fact that P(X = x) = 0 also follows formally from the fact that P(X = x) = ∫_x^x fX(y)dy, and this is 0. Other interpretations are also possible.
It is true, nevertheless, that X takes values in as small a neighborhood of x as we please with positive probability. The distribution of a continuous r.v. is also referred to as a continuous distribution. In Section 3.3, we will discuss some continuous r.v.’s (distributions) which occur often in practice. They are the Normal, Gamma, Chi-square, Negative Exponential, Uniform, Beta, Cauchy, and Lognormal distributions. Reference will also be made to t and F r.v.’s (distributions). Often one is given a function f and is asked whether f is a p.d.f. (of some r.v.). All one has to do is to check whether f is non-negative for all values of its argument, and whether the sum or integral of its values (over the appropriate set) is equal to 1.
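The checks described in the last paragraph — non-negativity and total mass 1 — together with the use of fX to obtain P(X ∈ B), look as follows for a hypothetical discrete distribution (the particular values are made up for illustration):

```python
from fractions import Fraction

# A candidate p.d.f. for a discrete r.v. (illustrative values):
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Is f a p.d.f.? Non-negative everywhere, and its values sum to 1:
assert all(v >= 0 for v in f.values())
assert sum(f.values()) == 1

# P(X in B) is obtained by summing f over the x_j's lying in B:
def prob(B):
    return sum(fx for x, fx in f.items() if x in B)

assert prob({2, 3}) == Fraction(1, 2)   # P(X >= 2)
assert prob(set(f)) == 1
```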


When a (well-behaving) function X is defined on a sample space S and takes values in the plane or the three-dimensional space or, more generally, in the k-dimensional space ℝ^k, it is called a k-dimensional random vector (r. vector) and is denoted by X. Thus, an r.v. is a one-dimensional r. vector. The distribution of X, PX, is defined as in the one-dimensional case by simply replacing B with subsets of ℝ^k. The r. vector X is discrete if P(X = xj) > 0, j = 1, 2, . . . , with Σj P(X = xj) = 1, and the function fX(x) = P(X = xj) for x = xj, fX(x) = 0 otherwise, is the p.d.f. of X. Once again, P(X ∈ B) = Σ_{xj ∈ B} fX(xj) for subsets B of ℝ^k. The r. vector X is (absolutely) continuous if P(X = x) = 0 for all x ∈ I, but there is a function fX defined on ℝ^k such that:

fX(x) ≥ 0 for all x ∈ ℝ^k, and P(X ∈ J) = ∫_J fX(x)dx

for any sub-rectangle J of I. The function fX is the p.d.f. of X. The distribution of a k-dimensional r. vector is also referred to as a k-dimensional discrete or (absolutely) continuous distribution, respectively, for a discrete or (absolutely) continuous r. vector. In Sections 3.2 and 3.3, we will discuss two representative multidimensional distributions; namely, the Multinomial (discrete) distribution, and the (continuous) Bivariate Normal distribution. We will write f rather than fX when no confusion is possible. Again, when one is presented with a function f and is asked whether f is a p.d.f. (of some r. vector), all one has to check is non-negativity of f, and that the sum of its values or its integral (over the appropriate space) is equal to 1.
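The same checks carry over verbatim to a 2-dimensional discrete r. vector; here is a sketch with a made-up p.d.f. on four points of the plane:

```python
from fractions import Fraction

# Hypothetical p.d.f. of a discrete 2-dimensional r. vector X = (X1, X2):
f = {(0, 0): Fraction(1, 6), (0, 1): Fraction(1, 3),
     (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 6)}

assert all(v >= 0 for v in f.values())   # non-negativity
assert sum(f.values()) == 1              # total mass 1

# P(X in B) for a subset B of the plane is again a plain sum:
B = {x for x in f if x[0] == x[1]}       # the event X1 = X2
assert sum(f[x] for x in B) == Fraction(1, 3)
```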

**3.2 Discrete Random Variables (and Random Vectors)
**

3.2.1 Binomial

The Binomial distribution is associated with a Binomial experiment; that is, an experiment which results in two possible outcomes, one usually termed as a “success,” S, and the other called a “failure,” F. The respective probabilities are p and q. It is to be noted, however, that the experiment does not really have to result in two outcomes only. Once some of the possible outcomes are called a “failure,” any experiment can be reduced to a Binomial experiment. Here, if X is the r.v. denoting the number of successes in n binomial experiments, then

X(S) = {0, 1, 2, ···, n},

P(X = x) = f(x) = (n choose x) p^x q^{n−x},

where 0 < p < 1, q = 1 − p, and x = 0, 1, 2, . . . , n. That this is in fact a p.d.f. follows from the fact that f(x) ≥ 0 and

Σ_{x=0}^n f(x) = Σ_{x=0}^n (n choose x) p^x q^{n−x} = (p + q)^n = 1^n = 1.


The appropriate S here is:

S = {S, F} × ··· × {S, F}   (n copies).

In particular, for n = 1, we have the Bernoulli or Point Binomial r.v. The r.v. X may be interpreted as representing the number of S's ("successes") in the compound experiment E × · · · × E (n copies), where E is the experiment resulting in the sample space {S, F} and the n experiments are independent (or, as we say, the n trials are independent). f(x) is the probability that exactly x S's occur. In fact, f(x) = P(X = x) = P(of all sequences of S's and F's with exactly x S's). The probability of one such sequence is p^x q^{n−x} by the independence of the trials, and this does not depend on the particular sequence we are considering. Since there are (n choose x) such sequences, the result follows. The distribution of X is called the Binomial distribution and the quantities n and p are called the parameters of the Binomial distribution. We denote the Binomial distribution by B(n, p). Often the notation X ∼ B(n, p) will be used to denote the fact that the r.v. X is distributed as B(n, p). Graphs of the p.d.f. of the B(n, p) distribution for selected values of n and p are given in Figs. 3.1 and 3.2.

Figure 3.1  Graph of the p.d.f. of the Binomial distribution for n = 12, p = 1/4.

f(0) = 0.0317    f(7) = 0.0115
f(1) = 0.1267    f(8) = 0.0024
f(2) = 0.2323    f(9) = 0.0004
f(3) = 0.2581    f(10) = 0.0000
f(4) = 0.1936    f(11) = 0.0000
f(5) = 0.1032    f(12) = 0.0000
f(6) = 0.0401


Figure 3.2  Graph of the p.d.f. of the Binomial distribution for n = 10, p = 1/2.

f(0) = 0.0010    f(6) = 0.2051
f(1) = 0.0097    f(7) = 0.1172
f(2) = 0.0440    f(8) = 0.0440
f(3) = 0.1172    f(9) = 0.0097
f(4) = 0.2051    f(10) = 0.0010
f(5) = 0.2460
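The B(n, p) p.d.f. and the tabulated values of Fig. 3.1 can be reproduced directly; a sketch (`binom_pdf` is our own helper, not the book's notation):

```python
from math import comb

def binom_pdf(n, p, x):
    # f(x) = C(n, x) p^x q^(n - x), with q = 1 - p
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, 0.25                    # the case graphed in Fig. 3.1
f = [binom_pdf(n, p, x) for x in range(n + 1)]

assert abs(sum(f) - 1) < 1e-12     # (p + q)^n = 1
assert round(f[0], 4) == 0.0317    # matches the tabulated f(0)
assert round(f[3], 4) == 0.2581    # the mode, as in the figure
```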

3.2.2 Poisson

X(S) = {0, 1, 2, ···},

P(X = x) = f(x) = e^{−λ} λ^x/x!,   x = 0, 1, 2, . . . ;  λ > 0.

f is, in fact, a p.d.f., since f(x) ≥ 0 and

Σ_{x=0}^∞ f(x) = e^{−λ} Σ_{x=0}^∞ λ^x/x! = e^{−λ} e^λ = 1.

The distribution of X is called the Poisson distribution and is denoted by P(λ). λ is called the parameter of the distribution. Often the notation X ∼ P(λ) will be used to denote the fact that the r.v. X is distributed as P(λ). The Poisson distribution is appropriate for predicting the number of phone calls arriving at a given telephone exchange within a certain period of time, the number of particles emitted by a radioactive source within a certain period of time, etc. The reader who is interested in the applications of the Poisson distribution should see W. Feller, An Introduction to Probability Theory, Vol. I, 3rd ed., 1968, Chapter 6, pages 156–164, for further examples. In Theorem 1 in Section 3.4, it is shown that the Poisson distribution

3 On Random Variables and Their Distributions

may be taken as the limit of Binomial distributions. Roughly speaking, suppose that X ∼ B(n, p), where n is large and p is small. Then

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} \approx e^{-np}\,\frac{(np)^x}{x!}, \quad x \ge 0.$$

For the graph of the p.d.f. of the P(λ) distribution for λ = 5, see Fig. 3.3. A visualization of such an approximation may be conceived by stipulating that certain events occur in a time interval [0, t] in the following manner: events occurring in nonoverlapping subintervals are independent; the probability that one event occurs in a small interval is approximately proportional to its length; and two or more events occur in such an interval with probability approximately 0. Then, dividing [0, t] into a large number n of small intervals of length t/n, we have that the probability that exactly x events occur in [0, t] is approximately $\binom{n}{x}\left(\frac{\lambda t}{n}\right)^x\left(1 - \frac{\lambda t}{n}\right)^{n-x}$, where λ is the factor of proportionality. Setting $p_n = \frac{\lambda t}{n}$, we have $np_n = \lambda t$, and Theorem 1 in Section 3.4 gives that

$$\binom{n}{x}\left(\frac{\lambda t}{n}\right)^x\left(1 - \frac{\lambda t}{n}\right)^{n-x} \approx e^{-\lambda t}\,\frac{(\lambda t)^x}{x!}.$$

Thus Binomial probabilities are approximated by Poisson probabilities.

Figure 3.3 Graph of the p.d.f. of the Poisson distribution with λ = 5. [Graph omitted; the plotted values are:]

f(0) = 0.0067, f(1) = 0.0337, f(2) = 0.0843, f(3) = 0.1403, f(4) = 0.1755, f(5) = 0.1755, f(6) = 0.1462, f(7) = 0.1044, f(8) = 0.0653, f(9) = 0.0363, f(10) = 0.0181, f(11) = 0.0082, f(12) = 0.0035, f(13) = 0.0013, f(14) = 0.0005, f(15) = 0.0001; f(n) is negligible for n ≥ 16.
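Both the tabulated Poisson values and the Binomial-to-Poisson approximation can be checked numerically. A sketch (illustrative; the choice n = 1000, p = 0.005, so that np = 5, is ours):

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for X ~ P(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Fig. 3.3 value: f(4) for lambda = 5
print(round(poisson_pmf(4, 5), 4))  # 0.1755

# B(n, p) with n large and p small is close to P(np):
n, p = 1000, 0.005  # np = 5
print(abs(binomial_pmf(4, n, p) - poisson_pmf(4, 5)) < 1e-3)  # True
```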


3.2.3 Hypergeometric

X(S) = {0, 1, 2, . . . , r},

$$f(x) = \frac{\binom{m}{x}\binom{n}{r-x}}{\binom{m+n}{r}},$$

where $\binom{m}{x} = 0$, by definition, for x > m. f is a p.d.f., since f(x) ≥ 0 and

$$\sum_{x=0}^{r} f(x) = \frac{1}{\binom{m+n}{r}}\sum_{x=0}^{r}\binom{m}{x}\binom{n}{r-x} = \frac{1}{\binom{m+n}{r}}\binom{m+n}{r} = 1.$$

The distribution of X is called the Hypergeometric distribution and arises in situations like the following. From an urn containing m red balls and n black balls, r balls are drawn at random without replacement. Then X represents the number of red balls among the r balls selected, and f(x) is the probability that this number is exactly x. Here S = {all r-sequences of R’s and B’s}, where R stands for a red ball and B stands for a black ball. The urn/balls model just described is a generic model for situations often occurring in practice. For instance, the urn and the balls may be replaced by a box containing certain items manufactured by a certain process over a speciﬁed period of time, out of which m are defective and n meet set speciﬁcations.
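The urn calculation can be sketched numerically (illustrative; the urn with m = 6 red and n = 4 black balls and r = 5 draws is an invented example). Conveniently, Python’s `math.comb` returns 0 when the lower index exceeds the upper, matching the convention $\binom{m}{x} = 0$ for x > m:

```python
from math import comb

def hypergeom_pmf(x, m, n, r):
    """P(X = x) red balls when r balls are drawn without replacement
    from an urn with m red and n black balls."""
    return comb(m, x) * comb(n, r - x) / comb(m + n, r)

probs = [hypergeom_pmf(x, 6, 4, 5) for x in range(6)]
print([round(p, 4) for p in probs])  # [0.0, 0.0238, 0.2381, 0.4762, 0.2381, 0.0238]
print(round(sum(probs), 10))         # 1.0
```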

3.2.4 Negative Binomial

X(S) = {0, 1, 2, . . .},

$$f(x) = p^r \binom{r+x-1}{x} q^x, \qquad 0 < p < 1,\ q = 1 - p,\ x = 0, 1, 2, \dots.$$

f is, in fact, a p.d.f., since f(x) ≥ 0 and

$$\sum_{x=0}^{\infty} f(x) = p^r \sum_{x=0}^{\infty} \binom{r+x-1}{x} q^x = \frac{p^r}{(1-q)^r} = \frac{p^r}{p^r} = 1.$$

This follows by the Binomial theorem, according to which

$$\frac{1}{(1-x)^n} = \sum_{j=0}^{\infty} \binom{n+j-1}{j} x^j, \qquad |x| < 1.$$

The distribution of X is called the Negative Binomial distribution. This distribution occurs in situations which have as a model the following. A Binomial experiment E, with sample space {S, F}, is repeated independently until exactly r S’s appear and then it is terminated. Then the r.v. X represents the number of times beyond r that the experiment is required to be carried out, and f(x) is the probability that this number of times is equal to x. In fact, here S =


{all (r + x)-sequences of S’s and F ’s such that the rth S is at the end of the sequence}, x = 0, 1, . . . , and f(x) = P(X = x) = P[all (r + x)-sequences as above for a specified x]. The probability of one such sequence is $p^{r-1} q^x p$ by the independence assumption, and since the first r + x − 1 positions contain r − 1 S’s and x F ’s, there are $\binom{r+x-1}{x}$ such sequences. Hence

$$f(x) = \binom{r+x-1}{x} p^{r-1} q^x p = p^r \binom{r+x-1}{x} q^x.$$

The above interpretation also justiﬁes the name of the distribution. For r = 1, we get the Geometric (or Pascal) distribution, namely f(x) = pqx, x = 0, 1, 2, . . . .
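The Negative Binomial p.d.f., and its Geometric special case r = 1, can be checked numerically. A sketch (illustrative; the parameter values are invented, and the infinite sum is truncated where the tail is negligible):

```python
from math import comb

def negbin_pmf(x, r, p):
    """P(X = x) trials beyond r until the r-th success: p^r * C(r+x-1, x) * q^x."""
    q = 1 - p
    return p**r * comb(r + x - 1, x) * q**x

# Sums to 1 (truncated; the tail beyond x = 200 is negligible for p = 0.3):
print(round(sum(negbin_pmf(x, 3, 0.3) for x in range(200)), 6))  # 1.0

# r = 1 reduces to the Geometric distribution f(x) = p * q^x:
print(negbin_pmf(4, 1, 0.3) == 0.3 * 0.7**4)  # True
```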

3.2.5 Discrete Uniform

$$X(S) = \{0, 1, \dots, n-1\}, \qquad f(x) = \frac{1}{n}, \quad x = 0, 1, \dots, n-1.$$

This is the uniform probability measure. (See Fig. 3.4.)

Figure 3.4 Graph of the p.d.f. of a Discrete Uniform distribution (shown for n = 5, so that f(x) = 1/5 at x = 0, 1, 2, 3, 4). [Graph omitted.]

3.2.6 Multinomial

Here

$$X(S) = \Big\{\mathbf{x} = (x_1, \dots, x_k)';\ x_j \ge 0,\ j = 1, 2, \dots, k,\ \sum_{j=1}^{k} x_j = n\Big\},$$

$$f(\mathbf{x}) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \qquad p_j > 0,\ j = 1, 2, \dots, k,\ \sum_{j=1}^{k} p_j = 1.$$

That f is, in fact, a p.d.f. follows from the fact that

$$\sum_{\mathbf{x}} f(\mathbf{x}) = \sum_{x_1, \dots, x_k} \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k} = (p_1 + \cdots + p_k)^n = 1^n = 1,$$

where the summation extends over all xj’s such that xj ≥ 0, j = 1, 2, . . . , k, and $\sum_{j=1}^{k} x_j = n$. The distribution of X is also called the Multinomial distribution and n, p1, . . . , pk are called the parameters of the distribution. This distribution occurs in situations like the following. A Multinomial experiment E with k possible outcomes Oj, j = 1, 2, . . . , k, and hence with sample space S = {all n-sequences of Oj’s}, is carried out n independent times. The probability of Oj occurring is pj, j = 1, 2, . . . , k, with pj > 0 and $\sum_{j=1}^{k} p_j = 1$. Then X is the random vector whose jth component Xj represents the number of times xj the outcome Oj occurs, j = 1, 2, . . . , k. With x = (x1, . . . , xk)′, f(x) is the


probability that the outcome Oj occurs exactly xj times, j = 1, 2, . . . , k. In fact, f(x) = P(X = x) = P(all n-sequences which contain exactly xj Oj’s, j = 1, 2, . . . , k). The probability of each one of these sequences is $p_1^{x_1} \cdots p_k^{x_k}$ by independence, and since there are $n!/(x_1! \cdots x_k!)$ such sequences, the result follows. The fact that the r. vector X has the Multinomial distribution with parameters n and p1, . . . , pk may be denoted thus: X ∼ M(n; p1, . . . , pk).
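A minimal sketch of the Multinomial p.d.f. (illustrative; the example of a balanced die tossed n = 6 times with each face appearing exactly once is ours):

```python
from math import factorial

def multinomial_pmf(xs, ps):
    """P(X1 = x1, ..., Xk = xk) for X ~ M(n; p1, ..., pk), with n = x1 + ... + xk."""
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)   # multinomial coefficient n!/(x1! ... xk!)
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p**x
    return prob

# Each of the six faces exactly once in six tosses: 6!/6^6
print(round(multinomial_pmf([1] * 6, [1 / 6] * 6), 5))  # 0.01543
```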

REMARK 1 When the tables given in the appendices are not directly usable because the underlying parameters are not included there, we often resort to linear interpolation. As an illustration, suppose X ∼ B(25, 0.3) and we wish to calculate P(X = 10). The value p = 0.3 is not included in the Binomial Tables in Appendix III. However, 4/16 = 0.25 < 0.3 < 0.3125 = 5/16, and the probabilities P(X = 10) for p = 4/16 and p = 5/16 are, respectively, 0.9703 and 0.8756. Therefore linear interpolation produces the value:

$$0.9703 - (0.9703 - 0.8756) \times \frac{0.3 - 0.25}{0.3125 - 0.25} = 0.8945.$$

Likewise for other discrete distributions. The same principle also applies appropriately to continuous distributions.
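The interpolation in Remark 1 is ordinary linear interpolation between two tabulated values. A sketch (the function name is ours; the four numbers are those quoted in the remark):

```python
def lin_interp(p, p_lo, p_hi, f_lo, f_hi):
    """Linearly interpolate a tabulated quantity between p_lo and p_hi."""
    return f_lo + (f_hi - f_lo) * (p - p_lo) / (p_hi - p_lo)

# Remark 1: tabulated probabilities 0.9703 (p = 4/16) and 0.8756 (p = 5/16)
print(round(lin_interp(0.3, 4 / 16, 5 / 16, 0.9703, 0.8756), 4))  # 0.8945
```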

REMARK 2 In discrete distributions, we are often faced with calculations of the form $\sum_{x=1}^{\infty} x\theta^x$. Under appropriate conditions, we may apply the following approach:

$$\sum_{x=1}^{\infty} x\theta^x = \theta \sum_{x=1}^{\infty} x\theta^{x-1} = \theta \sum_{x=1}^{\infty} \frac{d}{d\theta}\theta^x = \theta\,\frac{d}{d\theta}\sum_{x=1}^{\infty} \theta^x = \theta\,\frac{d}{d\theta}\Big(\frac{\theta}{1-\theta}\Big) = \frac{\theta}{(1-\theta)^2}.$$

Similarly for the expression $\sum_{x=2}^{\infty} x(x-1)\theta^{x-2}$.
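The closed form in Remark 2 is easy to sanity-check numerically. A sketch (θ = 0.4 is an arbitrary choice with |θ| < 1, and the series is truncated where the tail is negligible):

```python
theta = 0.4
lhs = sum(x * theta**x for x in range((1), 200))  # truncated series sum_x x*theta^x
rhs = theta / (1 - theta)**2                      # closed form theta/(1-theta)^2
print(abs(lhs - rhs) < 1e-9)  # True
```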

Exercises

3.2.1 A fair coin is tossed independently four times, and let X be the r.v. deﬁned on the usual sample space S for this experiment as follows:

X(s) = the number of H’s in s.

i) What is the set of values of X?
ii) What is the distribution of X?
iii) What is the partition of S induced by X?

3.2.2 It has been observed that 12.5% of the applicants fail in a certain screening test. If X stands for the number of those out of 25 applicants who fail to pass the test, what is the probability that:

calculate the probability that X = 0. .2.2. the Binomial Tables 3. In such a case. one has: P(a ≤ X ≤ b) = P(n − b ≤ Y ≤ n − a). so that at least one of them is defective with probability at least 0. b with 0 ≤ a < b ≤ n. iii) At least 50 particles do so. .95? Take p = 0. compute the probability that X > 5.2.) Let the r.2. show that: ii) P(X = x) = P(Y = n − x). x = 0.1.6 Refer to Exercise 3. n. 1.v. n. Calculate the probability that: iii) No particles reach the portion of space under consideration during time t.62 3 On Random Variables and Their Distributions iii) X ≥ 1? iii) X ≤ 20? iii) 5 ≤ X ≤ 20? 3. What is the probability that in a given minute: iii) No calls arrive? iii) Exactly 10 calls arrive? iii) At least 10 calls arrive? 3. ii) Also. .2. and q = 1 − p. X is distributed as B(n.v.3 A manufacturing process produces certain articles such that the probability of each article being defective is p.8 The phone calls arriving at a given telephone exchange within one minute follow the Poisson distribution with parameter λ = 10. for any integers a. X be distributed as Poisson with parameter λ and deﬁne the r. of articles to be produced.9 (Truncation of a Poisson r.4 If the r.v.2. Find: .2. Given that P(X = 0) = 0. 3. 3.v. What is the minimum number. with parameter λ. 3. .5 Let X be a Poisson distributed r. iv) Give the numerical values in (i)–(iii) if λ = 100. where Y is as in part (i).05.7 It has been observed that the number of particles emitted by a radioactive substance which reach a given portion of space during time t follows closely the Poisson distribution with parameter λ.2. What is the probability that X < 10? If P(X = 1) = 0.1 and P(X = 2) = 0. where Y ∼ B(n.v. p) with p > 1 2 in Appendix III cannot be used directly. q).5 and suppose that P(X = 1) = P(X = 2). . iii) Exactly 120 particles do so. 3. Y as follows: Y = X if X ≥ k a given positive integer ( ) and Y = 0 otherwise.2.

is a p. 1. 1. 3j j = 0. what is the probability that X ≥ 10? 3.v. j = 3k + 1. If X stands for the r. 1. .f. . ⋅ ⋅ ⋅ . y = k. k.200 are undergraduates and the remaining are graduate students. j =1 3.f. ∑ x j = n. 3. ) 0 ≤ x j ≤ n j . . . where A = { j. consider an urn containing nj balls with the number j written on them. . 3. iv) P(X ∈ B). .? f x = cα x I A x . denoting the number of balls among the n ones with the number j written on them. k + 1. . where B = { j. .2. j = 1. k is given by P X j = x j .2. . 2.13 3. . From the combined list of their names. 1.05. Then show that the joint distribution of Xj. y = 2. 2 For what value of c is the function f deﬁned below a p.v.10 A university dormitory system houses 1. . . X takes on the values 0.600 students.3. () ( ) c . . .1 Soem General Exercises Concepts 63 ii) P(Y = y).}. . where A = {1.12 Refer to the manufacturing process of Exercise 3.d. j = 1. n balls are drawn at random and without replacement. .15 Suppose that the r.11 (Multiple Hypergeometric distribution) For j = 1. . denoting the minimum number of articles to be manufactured until the ﬁrst two defective articles appear. ii) Show that the distribution of Y is given by P Y = y = p2 y − 1 1 − p ( ) ( )( ) y−2 .2. 3. k = 0. ⋅ ⋅ ⋅ . . with the following probabilities: f j =P X=j = iii) Determine the constant c.2. . of whom 1.3 and let Y be the r. . j = 2k + 1. ii) Calculate the probability P(Y ≥ 100) for p = 0. j = 1.v.d. k = 0. k.2. Compute the following probabilities: iii) P(X ≥ 10). 25 names are chosen at random. . . 2. and let Xj be the r. () () where A = 0.2.}. ⋅ ⋅ ⋅ . iii) P(X ∈ A). . 3. ⋅ ⋅ ⋅ . denoting the number of graduate students among the 25 chosen. . 1. ⋅ ⋅ ⋅ { } (0 < α < 1).14 Show that the function f(x) = ( 1 )xIA(x). ii) P(Y = 0).v. .}. k = ( ) ( ∏ k k nj j =1 x j n1 + ⋅ ⋅ ⋅ + nk n ( ).2.

. respectively. show that: p n− x iii) If X ∼ B(n. Xk have the Multinomial distribution. notice that SE + SO = eλ.2. of the Xj’s. 6. Then show that Xj is distributed as B(n. then f x+1 = 3. 1. . .d. }. .64 3 On Random Variables and Their Distributions 3. . . . . . Then: ii) In terms of λ. x = 0. 1. n − 1. show that the r. . 0. Determine the joint p. iii) If X has the Hypergeometric distribution. . p).2. . 3.v. . jm are m distinct integers from the set {1. and set E = {0. j = 1. . ii) What is the joint p. ⋅ ⋅ ⋅ . . · · · pj .19 Let X be an r. k}.2. . 0. .f.05. X5 = 2. .v. r . . of the X’s? ii) Compute the probability that X1 = 6. two of type B and one of type AB? 3. . () (n − r + x + 1)(x + 1) x = 0. ii) Find the numerical values of the probabilities in part (i) for λ = 5. Suppose that these types occur with the following frequencies: 0. 3. Xj have Multinomial distributions with parameters n and pj . . . . and let j be a ﬁxed number from the set {1. calculate the probabilities: P(X ∈ E) and P(X ∈ O).45. 2.40. pj).’s X1. (Hint: If k k SE = Σ k∈E λk! and SO = Σ k∈O λk! . 3 as follows: X j = the number of times j H ’ s appear. Poisson and Hypergeometric probabilities.21 ( ) (m − x)(r − x) f x .v. .10. X2 = 5. min m.18 Suppose that three coins are tossed (independently) n times and deﬁne the r. . X3 = 4. . ii) If m is an integer such that 2 ≤ m ≤ k − 1 and j1. distributed as P(λ).20 The following recursive formulas may be used for calculating Binomial. .17 A balanced die is tossed (independently) 21 times and let Xj be the number of times the number j appears. To this effect. eight of type A. 0. .) 3.16 There are four distinct types of human blood denoted by O.’s Xj. x +1 λ iii) If X ∼ P(λ).v. . 1. 1. 2. 3.’s Xj . X4 = 3.2. .f. A.2. then f ( x + 1) = x +1 f ( x).2. 1 m 1 m . B and AB. . k}.d. what is the probability that: ii) All 20 people have blood of the same type? ii) Nine people have blood type O. . . . X6 = 1. { } i) Suppose the r. } and O = {1. 
and SE − SO = −λ e . If 20 people are chosen at random. . then f ( x + 1) = q f ( x). . . . x = 0. j = 0.

Clearly f(x) > 0.22 (Polya’s urn scheme) Consider an urn containing b black balls and r red balls.f. Feller. 1). ⎥ ⎥ ⎦ x ∈ ޒ.3. Thus . σ > 0). 3rd ed. denoted by N(μ. 1968. In fact.v. 1. ∞). For more about this and also for a certain approximation to the above distribution. μ ∈ ޒ. of X is given by b b + c b + 2c ⋅ ⋅ ⋅ b + x − 1 c ⎛ n⎞ P X =x =⎜ ⎟ ⎝ x⎠ b + r b + r + c ( )( ( ( ) ( )( [ ( )] × r (r + c ) ⋅ ⋅ ⋅ [r + ( n − x − 1)c ] .3 Continuous Random Variables 3. ∞).1 Soem (and General Random Concepts Vectors) 65 3. upon letting (x − μ)/σ = z.d.3. σ2). Suppose that this experiment is repeated independently n times and let X be the r. so that υ ∈ (−∞. σ2 are called the parameters of the distribution of X which is also called the Normal distribution (μ = mean. so that z ∈ (−∞.3 Continuous Random Variables (and Random Vectors) 3. σ2). pp. For μ = 0. f x = ⎡ x−μ exp⎢− ⎢ 2σ 2 2πσ ⎢ ⎣ 1 ( ) () ( ) 2 ⎤ ⎥. that I = ∫−∞ f x dx = 1 is proved by showing that 2 I = 1. σ2 = variance. and (y − μ)/σ = υ. by W. 142. σ = 1. Vol.2. denoting the number of black balls drawn. where μ.. is replaced and c balls of the same color as the one drawn are placed into the urn. we get what is known as the Standard Normal distribution. () ∞ ∞ ∞ I2 = ⎡ f x dx ⎤ = ∫ f x dx ∫ f y dy ∫ ⎢ ⎥ −∞ −∞ −∞ ⎣ ⎦ ⎡ x−μ 2⎤ ⎡ y−μ 1 ∞ 1 ∞ ⎢ ⎥ = exp − dx ⋅ ∫ exp⎢− ⎢ ⎢ 2πσ ∫−∞ σ −∞ 2σ 2 ⎥ 2σ 2 ⎢ ⎥ ⎢ ⎣ ⎦ ⎣ () 2 () ) () ( ( ) 2 ⎤ ⎥ dy ⎥ ⎥ ⎦ = 1 1 ⋅ 2π σ ∫ ∞ −∞ e −z 2 2 σ dz ⋅ 1 σ ∫ ∞ −∞ e −υ 2 2 σ dυ .1 Normal (or Gaussian) X S = ޒ. Then show that the p.) 3. One ball is drawn at random. the reader is referred to the book An Introduction to Probability Theory and Its Applications. We say that X is distributed as normal (μ. 120–121 and p. ) ) ) × b + r + 2c ⋅ ⋅ ⋅ b + r + m − 1 c [ ( )] (This distribution can be used for a rough description of the spread of contagious diseases. denoted ∞ by N(0.

3. lifetimes of various manufactured items. etc. Taking all these facts into consideration. However. It is also clear that f(x) → 0 as x → ±∞.2 Gamma X S = ޒactually X S = 0. The Normal distribution is a good approximation to the distribution of grades. that is. f(μ − x) = f(μ + x) and that f(x) attains its maximum at x = μ which is equal to 1/( 2πσ ).5. It is easily seen that f(x) is symmetric about x = μ.f.8 0. Section 8. ∞ ( ) ( ( ) ( )) . From the fact that max f x = x∈ ޒ () 1 2πσ and the fact that ∫−∞ f ( x)dx = 1.5 ϭ1 ϭ2 0 1 2 N ( . ∞ we conclude that the larger σ is. the main signiﬁcance of it derives from the Central Limit Theorem to be discussed in Chapter 8.d. heights or weights of a (large) group of individuals. the diameters of hail hitting the ground during a storm. that is.66 3 On Random Variables and Their Distributions f (x) 0. since f(x) > 0.6 0. of the Normal distribution with μ = 1.4 0. the force required to punctuate a cardboard.3. we are led to Fig.3. errors in numerous measurements. Or I2 = 1 2π ∫ 0 e −r 2 2 r dr ∫ 2π 0 dθ = ∫ e − r 0 ∞ 2 2 r dr = − e − r 2 2 ∞ 0 = 1. I2 = 1 2π ∫ ∫ ∞ ∞ ∞ −∞ −∞ e − z 2 +υ 2 ( ) 2 dz dν = 1 2π ∫ ∫ 0 ∞ 2π 0 e −r 2 2 r dr dθ by the standard transformation to polar coordinates. 2 ) 3 4 5 x Figure 3. the more spread-out f(x) is and vice versa. 3.5 Graph of the p. I2 = 1 and hence I = 1.2 Ϫ2 Ϫ1 ϭ 0.5 and several values of σ.

Γα ( ) REMARK 3 One easily sees. x≤0 where Γ(α) = ∫0 yα −1 e − y dy (which exists and is ﬁnite for α > 0). that is. ⎝ 2⎠ ⎝ 2⎠ We have ∞ − (1 2 ) ⎛ 1⎞ Γ⎜ ⎟ = ∫ y e − y dy. f(x) ≥ 0 and that ∫−∞ f(x)dx = 1 is seen as follows. ∞ Clearly. and if α is an integer. ⎩ () ( ) x > 0 α > 0. 0 2 ⎝ ⎠ By setting y1 2 = t 2 . we show that ⎛ 1⎞ ⎛ 1⎞ Γ⎜ ⎟ = ⎜ − ⎟ ! = π . ∫−∞ f (x )dx = Γ(α ) ∫0 ∞ 1 ∞ yα −1 e − y dy = 1 ⋅ Γ α = 1.) The distribution of X is also called the Gamma distribution and α. ⎪ f x = ⎨Γ α βα ⎪0 . so that y= t2 . ( ) ( )( ) ( ) ( )( ) () where Γ 1 = ∫ e − y dy = 1. dy = t dt . 2 ( ) . upon letting x/β = y. that Γ α = α −1 Γ α −1 . that is. β > 0. dx = β dy.1 Soem (and General Random Concepts Vectors) 67 Here ⎧ 1 x α −1 e − x β . by integrating by parts.3. x = βy.3 Continuous Random Variables 3. (This integral is known as the Gamma function. t ∈ 0. ∞ . y ∈ (0. β are called the parameters of the distribution. we write Γ α = α − 1 ! = ∫ yα −1 e − y dy for α > 0. then Γ α = α −1 α − 2 ⋅ ⋅ ⋅ Γ 1 . ∞). Γ a = α − 1 ! 0 () ∞ () ( ) We often use this notation even if α is not an integer. 0 ( ) ( ) ∞ For later use. ∞ ∫ f (x )dx = Γ(α )β α ∫ ∞ 1 ∞ −∞ 0 x α −1 e − x β dx = 1 Γ α βα ( ) ∫ ( ) ∞ 0 yα −1 e − y β α dy. that is.

f.00 0.f. 3.68 3 On Random Variables and Their Distributions we get ∞1 ⎛ 1⎞ Γ⎜ ⎟ = 2 ∫ e −t 0 2 t ⎝ ⎠ 2 2 t dt = 2 ∫ e − t 0 ∞ 2 2 dt = π .  ϭ 1 0.  ϭ 1 ␣ ϭ 1.  ϭ 0. . and in particular its special case the Negative Exponential distribution. ⎝ 2⎠ ⎝ 2⎠ From this we also get that ⎛ 3⎞ 1 ⎛ 1⎞ π Γ⎜ ⎟ = Γ⎜ ⎟ = .75 0.50 0.d.00 0. of the Gamma distribution for several values of α.d. of the Gamma distribution for selected values of α and β are given in Figs. β.7 Graphs of the p.5 ␣ ϭ 2.d. serve as satisfactory models for f (x) 1.6 Graphs of the p. β. discussed below. that is.6 and 3.  ϭ 2 4 0 1 2 3 5 x Figure 3.25 ␣ ϭ 4. etc. f (x) 1. 2 2 2 2 ⎝ ⎠ ⎝ ⎠ Graphs of the p.7. of the Gamma distribution for several values of α. ⎛ 1⎞ ⎛ 1⎞ Γ⎜ ⎟ = ⎜ − ⎟ ! = π .50 ␣ ϭ 2.25 ␣ ϭ 2.f. The Gamma distribution.  ϭ 1 0 1 2 3 4 5 6 7 8 x Figure 3.  ϭ 1 ␣ ϭ 2.75 0.

we get what is known as the Chi-square distribution. as well as in statistics. is deﬁned by F(x) = P(X ≤ x). More speciﬁcally. To this end. x ≤ 0. Furthermore. The Chi-square distribution occurs often in statistics. and if X (x ) = f ( x). Then P( X > x) = e 0! −λx particles are emitted in (0. it follows that F(x) = 0. 3. We shall show that X has the Negative Exponential distribution with parameter λ.) of the distribution. The Negative Exponential distribution occurs frequently in statistics and.4 Negative Exponential For α = 1.. denoting the waiting time between successive occurrences of events following a Poisson distribution. so that f(x) = λe−λx. r ≥ 1. 3.f. it is mentioned here that the distribution function F of an r. and f(x) = λe−λx for x > 0. x]. Then F(x) = P(X ≤ x) = 1 − P(X > x). β = 2.v. denoting the waiting time until the next particle occurs.f. that is.3. x ∈ ޒ. integer.d. integer.. Thus. in waitingtime problems. particles emitted by a radioactive source with the average of λ particles per time unit.v.3. to be studied in the next chapter. f x =⎨ ⎩0 . in particular. For speciﬁc choices of the parameters α and β in the Gamma distribution. Since λ is the average number of emitted particles per time unit.1 Soem (and General Random Concepts Vectors) 69 describing lifetimes of various manufactured items. so that X is distributed as asserted. ⎧ 1 ( r 2 )−1 − x 2 x e . for example. and let X be the r. since no number during time x will be λx. is denoted by χ r and r is called the number of degrees of freedom (d. we obtain the Chi-square and the Negative Exponential distributions given below. That is. then dFdx Since X ≥ 0. To see this. if X is an r.3 Continuous Random Variables 3.v. among other things.3. is a continuous r.3 Chi-square For α = r/2. β = 1/λ. as will be seen in subsequent chapters. suppose that events occur according to the Poisson distribution P(λ). To summarize: f(x) = 0 for x ≤ 0. ⎪ 1 r 2 f x = ⎨Γ 2 r 2 ⎪ ⎩0. 0 . F(x) = 1 − e . 
it sufﬁces to determine F here.v. () x>0 x≤0 λ > 0. which is known as the Negative Exponential distribution. we suppose that we have just observed such a particle. 2 The distribution with this p. So let x > 0 be the the waiting time for the emission of the next item. we get ⎧λ e − λx . then X has the Negative Exponential distribution. their average − λx ( λx ) = e − λx . () ( ) x>0 x≤0 r > 0.

β). β). 3. ( ) ( ( ) [ ]) () ( ) α≤x≤β otherwise α < β. The interpretation of this distribution is that subintervals of [α.) The fact that the r. of the same length.70 3 On Random Variables and Their Distributions Consonant with previous notation. () ∫−∞ f (x )dx = β − α ∫α dx = 1. f x ≥ 0. f (x) 1 Ϫ␣ Figure 3. we may use the notation X ∼ Γ(α. respectively. 1 and ⎧Γ α +β ⎪ x α −1 1 − x f x = ⎨Γ α Γ β ⎪ ⎩0 elsewhere. (See Fig. β]. and α and β are the parameters of the distribution. β) or 2 X ∼ NE(λ).3. 3.5 Uniform U(α. β) or Rectangular R(α. ∞ 1 β The distribution of X is also called Uniform or Rectangular (α.f. That ∫−∞ f ( x)dx = 1 is seen as follows. .8 Graph of the p. β) X S = ޒactually X S = α . or Chi-square with r degrees of freedom. X has the Uniform distribution with parameters α and β may be denoted by X ∼ U(α. β > 0. 0 ␣  x 3. f(x) ≥ 0. are assigned the same probability of being observed regardless of their location. of the U(α. ( ) ( ( ) ( )) ) β −1 () ( ) ( )() ( . or X ∼ χ r in order to denote the fact that X is distributed as Gamma with parameters α and β.6 Beta X S = ޒactually X S = 0. or Negative Exponental with parameter λ. 0<x<1 α > 0. ∞ Clearly.v. β) distribution.d.3. f x =⎨ ⎪ ⎩0.8. Clearly. β and ⎧ ⎪1 β − α .

Let y/(1 − u) = υ. Graphs of the p. Γ α Γ β =Γ α +β and hence ( )() ( ( )∫ x (1 − x) 1 α −1 β −1 0 dx α −1 ∫−∞ f (x )dx = Γ(α )Γ(β ) ∫0 x (1 − x ) ∞ 1 Γ α +β ) β −1 dx = 1. Again the fact that X has the Beta distribution with parameters α and β may be expressed by writing X ∼ B(α. u ∈ 0. The distribution of X is also called the Beta distribution and occurs rather often in statistics. α.d. since Γ(1) = 1 and Γ(2) = 1. β are called the parameters of the distribution 1 and the function deﬁned by ∫0 x α −1 (1 − x )β −1 dx for α. so that x= uy y du . upon setting u = x/(x + y). β > 0 is called the Beta function. υ ∈ (0.3 Continuous Random Variables 3. so that y = υ(1 − u). β).9. ( )∫ u 1 0 α −1 ( (1 − u) 1 ) β −1 du β −1 du. dy = (1 − u)dυ.f. dx = . 1− u becomes =∫ =∫ ∞ 1 0 ∫ ∫ uα −1 0 (1 − u) ( 1− u α −1 yα −1 y β −1 e yα + β −1 e ( ) y du (1 − u) 2 dy ∞ 1 0 uα −1 − y 1− u 0 ) α +1 ( ) du dy. of the Beta distribution for selected values of α and β are given in Fig. ∞). 1). we get the U(0.3. . Then the integral is =∫ ∞ 1 0 ∞ 0 α ∫ u (1 − u) −1 β −1 υ α + β −1 e −υ du dυ = ∫ υ α + β −1 e −υ dυ ∫ uα −1 1 − u 0 0 =Γ α +β that is.1 Soem (and General Random Concepts Vectors) 71 ∞ ∞ Γ α Γ β = ⎛ ∫ x α −1 e − x dx⎞ ⎛ ∫ y β −1 e − y dy⎞ ⎝ 0 ⎠⎝ 0 ⎠ ∞ ∞ −( x+y ) = ∫ ∫ x α −1 y β −1 e dx dy 0 0 ( )() which. REMARK 4 For α = β = 1. 3. 1 2 1− u 1− u ( ) ( ) − y 1− u and x + y = y .

β.4 0.6 0.5 1. Clearly. x ∈ ޒ.5 ␣ϭ3 ϭ3 ␣ϭ5 ϭ3 2..9 Graphs of the p. so that σ dx = dy. μ ∈ ޒ.d.d. of the Beta distribution for several values of α.0 0 0. σ The distribution of X is also called the Cauchy distribution and μ. of the Cauchy distribution looks much the same as the Normal p. 3.8 1. σ2) to express the fact that X has the Cauchy distribution with parameters μ and σ2.72 3 On Random Variables and Their Distributions f (x) 2. (The p.f.0 x Figure 3.7 Cauchy Here X S = ޒand ( ) f x = () ∞ σ 1 ⋅ π σ2 + x−μ ( ) 2 .) 3.8 Lognormal Here X(S) = ( ޒactually X(S) = (0.d.3.10). We may write X ∼ Cauchy(μ.f. ∞)) and . 3.3.f. except that the tails of the former are heavier.0 ␣ϭ2 ϭ2 1. σ > 0.2 0. f(x) > 0 and ∫−∞ f ( x )dx = π ∫−∞ ∞ σ 1 σ2 + x−μ ( ) 2 dx = 1 ∞ 1 σπ ∫−∞ 1 + x − μ σ [( ) ] 2 dx = ∞ 1 ∞ dy 1 = arctan y = 1. σ are called the parameters of the distribution (see Fig. −∞ π ∫−∞ 1 + y 2 π upon letting y= x−μ .

) and their densities will be presented later (see Chapter 9.3.11 Bivariate Normal Here X(S) = ޒ2 (that is. Section 9. 3.1 Soem Concepts Continuous Random Variables (and General Random Vectors) 73 f (x) 0.9 t 3. () ( ) 2 ⎤ ⎥.3. X is a 2-dimensional random vector) with . so that log x = y.) 3. the reader is referred to the book The Lognormal Distribution by J. ⎧ ⎡ ⎪ 1 exp⎢− log x − log α ⎪ ⎢ f x = ⎨ xβ 2π 2β 2 ⎢ ⎣ ⎪ ⎪ ⎩0. ⎥ ⎥ ⎦ x>0 x ≤ 0 where α . analysis of variance. The notation X ∼ Lognormal(α. New York. y ∈ (−∞. Cambridge University Press. that is.2). then Y = log X is normally distributed with parameters log α and β 2. ⎥ ⎥ ⎦ But this is the integral of an N(log α. Aitchison and J. We close this section with an example of a continuous random vector. A. C. The distribution of X is called Lognormal and α.10 Graph of the p. β 2) density and hence is equal to 1. 3.f.3.1 Ϫ2 Ϫ1 0 1 2 x Figure 3. becomes ⎡ y − log α 1 = exp − ∫ y ⎢ ⎢ 2β 2 β 2π −∞ e ⎢ ⎣ 1 ∞ ( ) 2 ⎤ ⎥ e y dy.3 3. ∞). (For the many applications of the Lognormal distribution. if X is lognormally distributed. dx = eydy. Brown.3 0. β are called the parameters of the distribution (see Fig. Now f(x) ≥ 0 and ∫ () ∞ −∞ f x dx = 1 β 2π ∫0 ∞ ⎡ log x − log α 1 exp⎢− ⎢ x 2β 2 ⎢ ⎣ ( ) 2 ⎤ ⎥ dx ⎥ ⎥ ⎦ which.d.10 F ͖ These distributions occur very often in Statistics (interval estimation. 1957. letting x = ey. testing hypotheses. β) may be used to express the fact that X has the Lognormal distribution with parameters α and β . etc.3. σ = 1. of the Cauchy distribution with μ = 0.2 0.11). β > 0.

(See Fig. x2 ∈ .5 0 1 2 3 4 x Figure 3.d. 3. σ2 > 0. ⎢ x 2 − ⎜ μ 2 + ρσ 2 ⎟⎥ = σ σ ⎝ ⎠ ⎢ ⎥ 1 2 ⎣ ⎦ ( ) ρσ 2 x1 − μ1 .74 3 On Random Variables and Their Distributions f (x) 0.12. μ2.ޒσ1. ρ are called the parameters of the distribution. x2)dx1dx2 = 1 is seen as follows: 2 ( 2 ⎡⎛ x − μ ⎞ 2 ⎛ x1 − μ1 ⎞ ⎛ x 2 − μ 2 ⎞ ⎛ x 2 − μ 2 ⎞ ⎤ 1 1 − ρ 2 q = ⎢⎜ 1 − 2 ρ + ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎥ ⎢⎝ σ 1 ⎠ ⎝ σ1 ⎠ ⎝ σ 2 ⎠ ⎝ σ 2 ⎠ ⎥ ⎦ ⎣ ) ⎡⎛ x − μ 2 ⎞ ⎛ x1 − μ1 ⎞ ⎤ 2 ⎛ x1 − μ1 ⎞ = ⎢⎜ 2 ⎟ − ρ⎜ ⎟⎥ + 1− ρ ⎜ ⎟ . σ2. That ∫∫ ޒf(x1. −1 < ρ < 1 and q= 2 ⎡⎛ x − μ ⎞ 2 ⎛ x 1 − μ1 ⎞ ⎛ x 2 − μ2 ⎞ ⎛ x 2 − μ2 ⎞ ⎤ 1 ⎢⎜ 1 2 + − ρ ⎟ ⎥ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎝ σ1 ⎠⎝ σ2 ⎠ ⎝ σ2 ⎠ ⎥ 1 − ρ 2 ⎢⎝ σ 1 ⎠ ⎣ ⎦ 1 with μ1. x2) > 0. β. where x1.2 ␣ ϭ e2 ␣ϭe 1 ␣ϭ1  2 ϭ 0. f(x1.f. ⎛ x2 − μ2 ⎞ ⎛ x1 − μ1 ⎞ x 2 − μ 2 1 x − μ1 − ⋅ ρσ 2 ⋅ 1 ⎜ ⎟ − ρ⎜ ⎟= σ2 σ2 σ1 ⎝ σ2 ⎠ ⎝ σ1 ⎠ = where b = μ2 + 1 σ2 ⎡ ⎛ x1 − μ1 ⎞ ⎤ 1 x2 − b .6 0. σ1. σ1 ( ) .8 0. x 2 = ( ) 1 2π σ 1 σ 2 1 − ρ 2 e −q 2 . ⎝ σ1 ⎠ ⎥ ⎝ σ1 ⎠ ⎢ ⎣⎝ σ 2 ⎠ ⎦ 2 ( ) 2 Furthermore. f x1 . μ2 ∈ ޒ.4 0.) Clearly. of the Lognormal distribution for several values of α. The distribution of X is also called the Bivariate Normal distribution and the quantities μ1.11 Graphs of the p.

integrating with respect to x1. ( ( ) and similarly. σ 2 (1 − ρ2)) density.3 3. ∞ ∞ REMARK 5 From the above derivations. ⎥ ⎥ ⎦ 2 since the integral above is that of an N(b. σ 2 .12 Graph of the p. x 2 dx 2 −∞ ( ) ∞ ( ) is N μ1 .d. of the Bivariate Normal distribution. x 2 ) for z Ͼ k 0 x2 x1 Figure 3. k Thus ( and hence ⎛ x − b⎞ 2 ⎛ x1 − μ1 ⎞ 1 − ρ2 q = ⎜ 2 ⎟ ⎟ + 1− ρ ⎜ ⎝ σ2 ⎠ ⎝ σ1 ⎠ ) 2 ( ) 2 ∫−∞ ( ∞ f x1 . x2) is Bivariate Normal. we get ∫−∞ ∫−∞ f ( x1 . x 2 dx1 −∞ ( ) ∞ ( ) 2 is N μ 2 . f2 x 2 = ∫ f x1 . σ 12 .1 Soem Concepts Continuous Random Variables (and General Random Vectors) 75 z z ϭ f (x1.f. ) . if f(x1. Since the ﬁrst 2 factor is the density of an N(μ1. it follows that. x 2 )dx1 dx 2 = 1. σ 1 ) random variable.3. then f1 x1 = ∫ f x1 . x 2 dx 2 = ) ⎡ x −μ 1 1 exp⎢− 2 ⎢ 2σ 1 2πσ 1 ⎢ ⎣ 1 ×∫ ∞ −∞ ⎤ ⎥ ⎥ ⎥ ⎦ 2 ⎡ x2 − b 1 ⎢ exp − ⎢ 2σ 2 1 − ρ 2 2πσ 2 1 − ρ 2 2 ⎢ ⎣ 2 ( ) ( ( ) ) ⎤ ⎥ dx ⎥ 2 ⎥ ⎦ = ⎡ x −μ 1 1 exp⎢− 2 ⎢ 2 σ 2πσ 1 1 ⎢ ⎣ 1 ( ) 2 ⎤ ⎥ ⋅ 1.

v.d. If X ∼ N(0. and for a < b.3. the values γ = P(X ≤ x) are given for r ranging from 1 to 45.6 Consider certain events which in every time interval [t1. ii) max f x = x∈ ޒ () 1 2πσ . Then use the symmetry of the p.3.1 Let f be the p. σ 2 . For a ≤ 0 < b. let p = P(a < X < b). 1). t2] (0 < t1 < t2) occur independently for nonoverlapping intervals according to the Poisson distribution P(λ(t2 − t1)). For a ≤ b < 0. From the entries of the table.f.3 that: iii) P(−1 < X < 1) = 0. for a ﬁxed γ. σ2) distribution and show that: ii) f is symmetric about μ. ρ) may be used to express the fact that 2 X has the Bivariate Normal distribution with parameters μ1. distributed as χ10 . μ2.68269. p = Φ(b) − Φ(a).3. ii) P(a < X < b) = 0. 2 2 Then X1 ∼ N(μ1 .d.76 3 On Random Variables and Their Distributions As will be seen in Chapter 4.3.f. Select some values of γ and record the corresponding values of x for a set of increasing values of r. and for selected values of γ.9973. p = Φ(−a) − Φ(−b). σ 12. μ2. 2 3.5 Let X be an r. Exercises 3. Let T be the r.9545.’s f1 and f2 above are called marginal p.f.3. 3. p = Φ(b) + Φ(−a) − 1. ρ. 2 3.d.4 Let X be a χ r . 3. of the N(μ.d.3. f in order to show that: iii) iii) iii) iv) For 0 ≤ a < b. Appendix III.f. σ 2 . Use Table 5 in Appendix III in order to determine the numbers a and b for which the following are true: ii) P(X < a) = P(X > b). observe that. 1). iii) P(−2 < X < 2) = 0.v. denoting the time which lapses . σ 12.2 Let X be distributed as N(0. the p. In Table 5. iii) P(−3 < X < 3) = 0. P(−c < X < c) = 2Φ(c) − 1.’s of f.90. the values of x increase along with the number of degrees of freedom r. use the Normal Tables in Appendix III in order to show (See Normal Tables in Appendix III for the deﬁnition of Φ. 2 The notation X ∼ N(μ1. σ 2 ).) 3. For c > 0. σ 1 ) and X2 ∼ N(μ2.

β = ∫ x α −1 1 − x 0 ( ) 1 ( ) β −1 dx.3. iv) If it is known that P(X > s) = α.7 Let X be an r. distributed as U(−α.3.3. . 3. with p. 3. Determine the values of the parameter α for which the following are true: ii) P(−1 < X < 2) = 0. .3. t > 0. For an individual from the group in question. j = 0. given that he just celebrated his 40th birthday. Calculate the probability that X2 ≤ c. f is given by: f(x) = λe−λxI(0. Compare the probabilities in parts (ii) and (iii) and conclude that the Negative Exponential distribution is “memoryless”. 3. α) (α > 0).75. P(X > s + t|X > s) for some s. iii) For what value of c. iii) He will live to be at least 70 years old.11 Establish the following identity: ⎛ n − 1 ⎞ p m−1 n⎜ 1− x ⎟∫ x ⎝ m − 1⎠ 0 ( ) n− m dx = ( p n! x m−1 1 − x ∫ m−1 n−m ! 0 )( ) ( ) n− m dx ⎛ ⎞ = ∑ ⎜ n⎟ p j 1 − p ⎝j⎠ j= m n ( ) n− j .9 Let X be an r.1 Soem General Exercises Concepts 77 between two consecutive such events. compute the probability that: iii) He will survive to retire at 65. 3. α).d. 1.3. . P(X > c) = 1 ? 2 3.v.10 Refer to the Beta distribution and set: B α . P(X > t) for some t > 0.v.v. having the Negative Exponential distribution with parameter λ = 1/50 (years). denoting the life length of a TV tube and suppose that its p. ii) P(|X| < 1) = P(|X| > 2). express the parameter λ in terms of α and s.12 Let X be an r.8 Suppose that the life expectancy X of each member of a certain group of people is an r.d. Show that the distribution of T is Negative Exponential with parameter λ by computing the probability that T > t.3.f given by f(x) = 1/[π(1 + x2)]. . ∞)(x). Compute the following probabilities: iii) iii) iii) iv) P(j < X ≤ j + 1).3. 3.f. β) = B(β. Then show that B(α.v. .

3.3.13 Show that the following functions are p.d.f.'s:
i) f(x) = cos x I_{(0, π/2)}(x);
ii) f(x) = xe^{−x} I_{(0, ∞)}(x).

3.3.14 Show that the following functions are p.d.f.'s:
i) f(x) = xe^{−x²/2} I_{(0, ∞)}(x) (Rayleigh distribution);
ii) f(x) = \sqrt{2/π} x²e^{−x²/2} I_{(0, ∞)}(x) (Maxwell's distribution);
iii) f(x) = (1/2) e^{−|x−μ|} (Double Exponential).

3.3.15 For what values of the constant c are the following functions p.d.f.'s?
i) f(x) = ce^{−6x} for x > 0, −cx for −1 < x ≤ 0, and 0 for x ≤ −1;
ii) f(x) = cx²e^{−x³} I_{(0, ∞)}(x).

3.3.16 Let X be an r.v. with p.d.f. f given by f(x) = (α/c)(c/x)^{α+1} I_A(x), A = (c, ∞), c > 0 (Pareto distribution). Compute the probability that X > x.

3.3.17 Let X be the r.v. denoting the life length of a certain electronic device expressed in hours, and suppose that its p.d.f. f is given by:

f(x) = (c/x^n) I_{[1,000, 3,000]}(x).

i) Determine the constant c in terms of n;
ii) Calculate the probability that the life span of one electronic device of the type just described is at least 2,000 hours.

3.3.18 Refer to Exercise 3.3.15(ii) and compute the probability that X exceeds s + t, given that X > s. Compare the answer with that of Exercise 3.3.7(iii).

3.3.19 Consider the function f(x) = αβx^{β−1}e^{−αx^β}, x > 0 (α, β > 0), and:
i) Show that it is a p.d.f. (called the Weibull p.d.f. with parameters α and β);
ii) Observe that the Negative Exponential p.d.f. is a special case of a Weibull p.d.f., and specify the values of the parameters for which this happens;
iii) For α = 1 and β = 1/2, β = 1 and β = 2, draw the respective graphs of the p.d.f.'s involved.
(Note: The Weibull distribution is employed for describing the lifetime of living organisms or of mechanical systems.)
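Part (i) of Exercise 3.3.19 — that the Weibull density integrates to 1 — can be checked numerically; a minimal sketch with a plain midpoint rule (the parameter choice α = 1, β = 2 is ours, for illustration only):

```python
from math import exp

def weibull_pdf(x, a, b):
    # f(x) = a * b * x**(b - 1) * exp(-a * x**b), x > 0, as in Exercise 3.3.19
    return a * b * x ** (b - 1) * exp(-a * x ** b)

def midpoint_integral(f, lo, hi, n=100000):
    # crude midpoint rule; adequate for this smooth, rapidly decaying integrand
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

total = midpoint_integral(lambda x: weibull_pdf(x, 1.0, 2.0), 0.0, 10.0)
print(total)   # close to 1; with beta = 1 the density is the Negative Exponential
```

Setting b = 1 in `weibull_pdf` gives a·e^{−ax}, which is the Negative Exponential p.d.f. of part (ii).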

3.3.20 Let X and Y be r.v.'s having the joint p.d.f. f given by:

f(x, y) = c(25 − x² − y²) I_{(0, 5)}((x² + y²)^{1/2}).

Determine the constant c and compute the probability that 0 < X² + Y² < 4.

3.3.21 Let X and Y be r.v.'s whose joint p.d.f. f is given by f(x, y) = cxy I_{(0, 2)×(0, 5)}(x, y). Determine the constant c and compute the following probabilities:
i) P(1/2 < X < 1, 0 < Y < 3);
ii) P(1 < X < 2, 2 < Y < 4);
iii) P(X < 2, Y > 5);
iv) P(X > Y).

3.3.22 Verify that the following function is a p.d.f.:

f(x, y) = (1/(4π)) cos y I_A(x, y),  A = (−π, π] × (−π/2, π/2].

3.3.23 (A mixed distribution) Show that the following function is a p.d.f.:

f(x) = (1/4)e^x for x ≤ 0;  x/8 for 0 < x < 2;  (1/2)^x for x = 2, 3, · · · ;  and 0 otherwise.

3.4 The Poisson Distribution as an Approximation to the Binomial Distribution and the Binomial Distribution as an Approximation to the Hypergeometric Distribution

In this section, we first establish rigorously the assertion made earlier that the Poisson distribution may be obtained as the limit of Binomial distributions. To this end, consider a sequence of Binomial distributions, so that the nth distribution has probability p_n of a success, and we assume that as n → ∞, p_n → 0 and that λ_n = np_n → λ, for some λ > 0. Then the following theorem is true.

THEOREM 1  With the above notation and assumptions, we have

\binom{n}{x} p_n^x q_n^{n−x} → e^{−λ} λ^x/x!  as n → ∞, for each fixed x = 0, 1, 2, · · · .
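The limit asserted in Theorem 1 can be watched converge numerically; a small sketch comparing Binomial(n, λ/n) probabilities with their Poisson limit (λ = 2 is an arbitrary illustrative choice):

```python
from math import comb, exp, factorial

def binom_pmf(n, p, x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(lam, x):
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.0
errors = []
for n in (10, 100, 1000):
    p = lam / n   # p_n -> 0 while n * p_n = lambda stays fixed
    worst = max(abs(binom_pmf(n, p, x) - poisson_pmf(lam, x)) for x in range(15))
    errors.append(worst)
    print(n, worst)   # the worst-case difference over x shrinks as n grows
```

This is exactly the setting of the theorem: p_n = λ/n → 0 with λ_n = np_n held at λ.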

PROOF  We have

\binom{n}{x} p_n^x q_n^{n−x} = [n(n − 1) · · · (n − x + 1)/x!] p_n^x q_n^{n−x}
 = [n(n − 1) · · · (n − x + 1)/x!] (λ_n/n)^x (1 − λ_n/n)^{n−x}
 = [n(n − 1) · · · (n − x + 1)/n^x] · (λ_n^x/x!) · (1 − λ_n/n)^n · (1 − λ_n/n)^{−x}
 = 1 · (1 − 1/n) · · · (1 − (x − 1)/n) · (λ_n^x/x!) · (1 − λ_n/n)^n · (1 − λ_n/n)^{−x} → e^{−λ} λ^x/x!  as n → ∞,

since, if λ_n → λ, then (1 − λ_n/n)^n → e^{−λ}. This is merely a generalization of the better-known fact that (1 − λ/n)^n → e^{−λ}. ▲

REMARK 6  The meaning of the theorem is the following: If n is large, the probabilities \binom{n}{x} p^x q^{n−x} are not easily calculated. Then we can approximate them by e^{−λ}(λ^x/x!), where we replace λ by np, provided p is small. This is true for all x = 0, 1, 2, . . . , n.

We also meet with difficulty in calculating probabilities in the Hypergeometric distribution

\binom{m}{x}\binom{n}{r−x} / \binom{m+n}{r}

if m, n are large. What we do is approximate the Hypergeometric distribution by an appropriate Binomial distribution, and then, if need be, we can go one step further in approximating the Binomial by the appropriate Poisson distribution according to Theorem 1. Thus we have

THEOREM 2  Let m, n → ∞ and suppose that m/(m + n) = p_{m,n} → p, 0 < p < 1. Then

\binom{m}{x}\binom{n}{r−x} / \binom{m+n}{r} → \binom{r}{x} p^x q^{r−x},  x = 0, 1, 2, · · · , r.

PROOF  We have

\binom{m}{x}\binom{n}{r−x} / \binom{m+n}{r} = [m!/(x!(m − x)!)] · [n!/((r − x)!(n − r + x)!)] · [r!(m + n − r)!/(m + n)!]
 = \binom{r}{x} [(m − x + 1) · · · m · (n − r + x + 1) · · · n] / [(m + n − r + 1) · · · (m + n)]
 = \binom{r}{x} {m(m − 1) · · · [m − (x − 1)] · n(n − 1) · · · [n − (r − x − 1)]} / {(m + n) · · · [(m + n) − (r − 1)]}.

Both numerator and denominator have r factors. Dividing through by (m + n), we get

\binom{m}{x}\binom{n}{r−x} / \binom{m+n}{r}
 = \binom{r}{x} [m/(m + n)][m/(m + n) − 1/(m + n)] · · · [m/(m + n) − (x − 1)/(m + n)]
  × [n/(m + n)][n/(m + n) − 1/(m + n)] · · · [n/(m + n) − (r − x − 1)/(m + n)]
  × {1 · [1 − 1/(m + n)] · · · [1 − (r − 1)/(m + n)]}^{−1}
 → \binom{r}{x} p^x q^{r−x}  as m, n → ∞,

since m/(m + n) → p and hence n/(m + n) → 1 − p = q. ▲

REMARK 7  The meaning of the theorem is that if m, n are large, we can approximate the probabilities \binom{m}{x}\binom{n}{r−x}/\binom{m+n}{r} by \binom{r}{x} p^x q^{r−x} by setting p = m/(m + n).
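Theorem 2's approximation can likewise be observed numerically; a minimal sketch (r = 5 and p = 0.3 are arbitrary illustrative choices):

```python
from math import comb

def hyper_pmf(m, n, r, x):
    # P(X = x) under the Hypergeometric distribution of the text
    return comb(m, x) * comb(n, r - x) / comb(m + n, r)

def binom_pmf(r, p, x):
    return comb(r, x) * p ** x * (1 - p) ** (r - x)

r, p = 5, 0.3
errors = []
for total in (20, 200, 2000):
    m = int(p * total)            # so that m / (m + n) = p
    n = total - m
    worst = max(abs(hyper_pmf(m, n, r, x) - binom_pmf(r, p, x)) for x in range(r + 1))
    errors.append(worst)
    print(total, worst)
```

As Remark 7 (continued below) explains, this is the statement that for a large population, sampling without replacement is approximately the same as sampling with replacement.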

It is to be observed that \binom{r}{x}[m/(m + n)]^x[1 − m/(m + n)]^{r−x} is the exact probability of having exactly x successes in r trials when sampling is done with replacement, so that the probability of a success, m/(m + n), remains constant. The Hypergeometric distribution is appropriate when sampling is done without replacement. If, however, m and n are large (m, n → ∞) and m/n remains approximately constant (→ c = p/q), then the probabilities of having exactly x successes in r (independent) trials are approximately equal under both sampling schemes. This is true for all x = 0, 1, 2, · · · , r.

Exercises

3.4.1 For the following values of n, p and λ = np, draw graphs of B(n, p) and P(λ) on the same coordinate axes:
i) n = 10, p = 4/16, so that λ = 2.5;
ii) n = 16, p = 2/16, so that λ = 2;
iii) n = 20, p = 2/16, so that λ = 2.5;
iv) n = 24, p = 1/16, so that λ = 1.5;
v) n = 24, p = 2/16, so that λ = 3.

3.4.2 Refer to Exercise 3.2.2 and suppose that the number of applicants is equal to 72. Compute the probabilities (i)–(iii) by using the Poisson approximation to Binomial (Theorem 1).

3.4.3 Refer to Exercise 3.2.10 and use Theorem 2 in order to obtain an approximate value of the required probability.

3.5* Random Variables as Measurable Functions and Related Results

In this section, random variables and random vectors are introduced as special cases of measurable functions. Certain results related to σ-fields are also derived. Consider the probability space (S, A, P) and let T be a space and X be a function defined on S into T, that is, X : S → T. For T ⊆ T, define the inverse image of T, under X, denoted by X^{−1}(T), as follows:

X^{−1}(T) = {s ∈ S; X(s) ∈ T}.

This set is also denoted by [X ∈ T] or (X ∈ T). Then the following properties are immediate consequences of the definition (and the fact that X is a function):

(1)  X^{−1}(∪_j T_j) = ∪_j X^{−1}(T_j);
(2)  If T_1 ∩ T_2 = ∅, then X^{−1}(T_1) ∩ X^{−1}(T_2) = ∅.

Hence by (1) and (2) we have

(3)  X^{−1}(Σ_j T_j) = Σ_j X^{−1}(T_j).

Also

(4)  X^{−1}(∩_j T_j) = ∩_j X^{−1}(T_j);
(5)  X^{−1}(T^c) = [X^{−1}(T)]^c;
(6)  X^{−1}(T) = S;
(7)  X^{−1}(∅) = ∅.

Let now D be a σ-field of subsets of T and define the class X^{−1}(D) of subsets of S as follows:

X^{−1}(D) = {A ⊆ S; A = X^{−1}(T) for some T ∈ D}.

By means of (1), (5), (6) above, we immediately have

THEOREM 3  The class X^{−1}(D) is a σ-field of subsets of S.

If X^{−1}(D) ⊆ A, we say that X is (A, D)-measurable, or just measurable if there is no confusion possible. If (T, D) = (ℝ, B) and X is (A, B)-measurable, we say that X is a random variable (r.v.). More generally, if (T, D) = (ℝ^k, B^k), where ℝ^k = ℝ × · · · × ℝ (k copies of ℝ), and X is (A, B^k)-measurable, we say that X is a k-dimensional random vector (r. vector). In this latter case, we shall write X if k ≥ 1, and just X if k = 1. A random variable is a one-dimensional random vector.

The above theorem is the reason we require measurability in our definition of a random variable. It guarantees that the probability distribution function of a random vector, to be defined below, is well defined.

THEOREM 4  Define the class C* of subsets of T as follows: C* = {T ⊆ T; X^{−1}(T) = A for some A ∈ A}. Then C* is a σ-field.

COROLLARY  Let D = σ(C), where C is a class of subsets of T. Then X is (A, D)-measurable if and only if X^{−1}(C) ⊆ A. In particular, X is a random variable if and only if X^{−1}(C_0) ⊆ A, or X^{−1}(C_j) ⊆ A, or X^{−1}(C′_j) ⊆ A, j = 1, 2, . . . , 8, and similarly for the case of k-dimensional random vectors.

The classes C_0, C_j, C′_j, j = 1, . . . , 8 are defined in Theorem 5 and the paragraph before it in Chapter 1.

PROOF  The σ-field C* of Theorem 4 has the property that C* ⊇ C. Then C* ⊇ σ(C) = D and hence X^{−1}(C*) ⊇ X^{−1}(D). But X^{−1}(C*) ⊆ A. Thus X^{−1}(D) ⊆ A. The converse is a direct consequence of the definition of (A, D)-measurability. ▲

Now, by means of an r. vector X : (S, A, P) → (ℝ^k, B^k), define on B^k the set function P_X as follows:

P_X(B) = P[X^{−1}(B)] = P(X ∈ B) = P({s ∈ S; X(s) ∈ B}),  B ∈ B^k.  (8)

By the Corollary to Theorem 4, the sets X^{−1}(B) in S are actually events due to the assumption that X is an r. vector. Therefore P_X is well defined by (8); that is, P[X^{−1}(B)] makes sense. It is now shown that P_X is a probability measure on B^k. In fact, P_X(B) ≥ 0, B ∈ B^k, since P is a probability measure. Next, P_X(ℝ^k) = P[X^{−1}(ℝ^k)] = P(S) = 1. Finally,

P_X(Σ_{j=1}^∞ B_j) = P[X^{−1}(Σ_{j=1}^∞ B_j)] = P[Σ_{j=1}^∞ X^{−1}(B_j)] = Σ_{j=1}^∞ P[X^{−1}(B_j)] = Σ_{j=1}^∞ P_X(B_j).

The probability measure P_X is called the probability distribution function (or just the distribution) of X.

Exercises

3.5.1 Consider the sample space S supplied with the σ-field of events A. For an event A, the indicator I_A of A is defined by: I_A(s) = 1 if s ∈ A and I_A(s) = 0 if s ∈ A^c.
i) Show that I_A is r.v. for any A ∈ A;
ii) What is the partition of S induced by I_A?
iii) What is the σ-field induced by I_A?

3.5.2 Write out the proof of Theorem 2.

3.5.3 Write out the proof of Theorem 1 by using (1), (2), (5) and (6).
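Definition (8) — the distribution P_X as the set function B ↦ P[X^{−1}(B)] — can be made concrete on a finite probability space; a small sketch (the fair-die example is ours, chosen only for illustration):

```python
from fractions import Fraction

# A fair die as a finite probability space; X(s) = s % 2 maps S into T = {0, 1}.
S = [1, 2, 3, 4, 5, 6]
P = {s: Fraction(1, 6) for s in S}

def X(s):
    return s % 2

def preimage(B):
    # the inverse image X^{-1}(B) = {s in S : X(s) in B}
    return [s for s in S if X(s) in B]

def P_X(B):
    # the distribution of X, definition (8): P_X(B) = P[X^{-1}(B)]
    return sum(P[s] for s in preimage(B))

print(P_X({1}))      # P(X = 1) = 1/2 (the odd faces)
print(P_X({0, 1}))   # the whole range of X has probability 1
```

On a finite space every subset is an event, so measurability is automatic; the definition above is the finite shadow of (8).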

Chapter 4

Distribution Functions, Probability Densities, and Their Relationship

4.1 The Cumulative Distribution Function (c.d.f. or d.f.) of a Random Vector—Basic Properties of the d.f. of a Random Variable

The distribution of a k-dimensional r. vector X has been defined through the relationship: P_X(B) = P(X ∈ B), where B is a subset of ℝ^k. In particular, one may choose B to be an "interval" in ℝ^k; i.e., B = {y ∈ ℝ^k; y ≤ x} in the sense that, if x = (x_1, . . . , x_k)′ and y = (y_1, . . . , y_k)′, then y_j ≤ x_j, j = 1, . . . , k. For such a choice of B, P_X(B) is denoted by F_X(x) and is called the cumulative distribution function of X (evaluated at x), or the joint distribution function of X_1, . . . , X_k, or just the distribution function (d.f.) of X. We omit the subscript X if no confusion is possible. Thus, the d.f. of an r. vector X is an ordinary point function defined on ℝ^k (and taking values in [0, 1]).

Now we restrict our attention to the case k = 1 and prove the following basic properties of the d.f. F of an r.v. X.

THEOREM 1  The distribution function F of a random variable X satisfies the following properties:
i) 0 ≤ F(x) ≤ 1, x ∈ ℝ;
ii) F is nondecreasing;
iii) F is continuous from the right;
iv) F(x) → 0 as x → −∞, and F(x) → 1 as x → +∞. We express this by writing F(−∞) = 0, F(+∞) = 1.

PROOF  In the course of this proof, we set Q for the distribution P_X of X, for the sake of simplicity. We have then:
i) Obvious.
ii) This means that x_1 < x_2 implies F(x_1) ≤ F(x_2).

In fact, x_1 < x_2 implies (−∞, x_1] ⊂ (−∞, x_2], and hence Q(−∞, x_1] ≤ Q(−∞, x_2]; that is, F(x_1) ≤ F(x_2).

iii) This means that, if x_n ↓ x, then F(x_n) ↓ F(x). In fact, x_n ↓ x implies (−∞, x_n] ↓ (−∞, x], and hence Q(−∞, x_n] → Q(−∞, x], by Theorem 2, Chapter 2; equivalently, F(x_n) ↓ F(x), since F(x_n)↓.

iv) Let x_n → −∞. We may assume that x_n ↓ −∞ (see also Exercise 4.1.6). Then (−∞, x_n] ↓ ∅, so that Q(−∞, x_n] ↓ Q(∅) = 0, by Theorem 2, Chapter 2; equivalently, F(x_n) → 0. Similarly, if x_n → +∞, we may assume x_n ↑ ∞. Then (−∞, x_n] ↑ ℝ and hence Q(−∞, x_n] ↑ Q(ℝ) = 1, by Theorem 2, Chapter 2; equivalently, F(x_n) → 1. ▲

Graphs of d.f.'s of several distributions are given in Fig. 4.1.

REMARK 1
i) F(x) can be used to find probabilities of the form P(a < X ≤ b). In fact,

(a < X ≤ b) = (−∞ < X ≤ b) − (−∞ < X ≤ a)  and  (−∞ < X ≤ a) ⊆ (−∞ < X ≤ b),

so that P(a < X ≤ b) = P(−∞ < X ≤ b) − P(−∞ < X ≤ a) = F(b) − F(a); that is,

P(a < X ≤ b) = F(b) − F(a).

ii) The limit from the left of F(x) at x, denoted by F(x−), is defined as follows:

F(x−) = lim_{n→∞} F(x_n)  with  x_n ↑ x.

This limit always exists, but need not be equal to F(x+) (= limit from the right) = F(x). The quantities F(x) and F(x−) are used to express the probability P(X = a); that is, P(X = a) = F(a) − F(a−). In fact, let x_n ↑ a and set A = (X = a), A_n = (x_n < X ≤ a). Then, clearly, A_n ↓ A and hence, by Theorem 2, Chapter 2, P(A_n) ↓ P(A); equivalently,

lim_{n→∞} P(x_n < X ≤ a) = P(X = a),  or  lim_{n→∞} [F(a) − F(x_n)] = P(X = a),  or  F(a) − F(a−) = P(X = a).

Then F(a) − F(a−) is the length of the jump of F at a. It is known that a nondecreasing function (such as F) may have discontinuities which can only be jumps. Of course, if F is continuous, then F(x) = F(x−) and hence P(X = x) = 0 for all x.

iii) If X is discrete, its d.f. is a "step" function, the value of it at x being defined by

F(x) = Σ_{x_j ≤ x} f(x_j)  and  f(x_j) = F(x_j) − F(x_{j−1}),

where it is assumed that x_1 < x_2 < · · · .

iv) If X is of the continuous type, its d.f. F is continuous. Furthermore,

dF(x)/dx = f(x)

at continuity points of f.

[Figure 4.1  Examples of graphs of d.f.'s: (a) Binomial for n = 6, p = 1/4; (b) Poisson for λ = 2; (c) U(α, β), for which F(x) = 0 for x < α, (x − α)/(β − α) for α ≤ x ≤ β, and 1 for x > β; (d) N(0, 1), for which F(x) = Φ(x).]
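Remark 1's identity P(X = a) = F(a) − F(a−) — the jump of the step d.f. — can be illustrated with the Binomial d.f. graphed in Fig. 4.1(a); a small sketch:

```python
from math import comb

n, p = 6, 0.25   # the Binomial of Fig. 4.1(a)

def f(x):
    # Binomial probabilities f(x) = P(X = x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def F(x):
    # the "step" d.f.: F(x) = sum of f(j) over integers j <= x
    return sum(f(j) for j in range(0, int(x) + 1)) if x >= 0 else 0.0

a = 3
jump = F(a) - F(a - 1)   # F(a) - F(a-) for an integer-valued r.v.
print(jump)
print(f(a))              # the jump of F at a equals P(X = a)
```

For an integer-valued r.v., F(a−) = F(a − 1), which is what the sketch uses.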

. by Statement 1. j = 1. Xk). Let X be an N(μ. k are real-valued functions deﬁned on S. . () . vectors. . vectors. since we operate in a probability framework. where Φ y = We have ⎞ ⎛X −μ P Y ≤ y = P⎜ ≤ y⎟ = P X ≤ yσ + μ ⎠ ⎝ σ () 1 2π ∫ y −∞ e −t 2 2 dt . vector if g is continuous. . . . Then g(X) is deﬁned on the underlying sample space S. Now a k-dimensional r. i. . of Y is Φ. as is well known from calculus. g(X) is an r. Often one has to deal with functions of an r. . .v. . STATEMENT 1 Let X be a k-dimensional r. j = 1. . (That is. Xk)′. Its precise formulation and justiﬁcation is given as Theorem 7 on page 104. Then it sufﬁces to show that the d. k are r. we see that if f is continuous.’s. where Xj. respectively. The question then arises as to how X and Xj. . takes values in ޒm. Y is an r. Then Y is an r. j = 1. . Through the relations F x = ∫ f t dt and −∞ () x () dF x dx ( )=f (x). . f determines F( f ⇒ F ) and F determines f (F ⇒ f ). σ 2)-distributed r. and Their Relationship at continuity points of f. PROOF X −μ σ THEOREM 2 . the resulting entities have got to be r. The following statement is to this effect. vector itself. .v.. and let g be a (well-behaving) function deﬁned on ޒk and taking values in ޒm. vector deﬁned on the sample space S. F ⇔ f. .v. and is an r. j = 1. Two important applications of this are the following two theorems. .v. k are related from the point of view of being r. and its In the ﬁrst place. and let X = (X1. . . Then X is an r. In such cases. The following two theorems provide applications of Statement 1. ( ) ( ) 1 2π = ∫ 2πσ −∞ 1 yσ + μ ⎡ t−μ exp⎢− ⎢ 2σ 2 ⎢ ⎣ ( ) 2 ⎤ ⎥ dt = ⎥ ⎥ ⎦ ∫−∞e y −u2 2 du = Φ y . whereas the precise statement and justiﬁcation are given as Theorem 8 below. . Probability Densities. vectors are r. .e. k be functions deﬁned on the sample space S and taking values in ޒk and ޒ.f.) In particular. vector X may be represented in terms of its coordinates. The answer is provided by the following statement. 
well-behaving functions of r. vectors. vector. and set Y = distribution is N(0. that is. . 1).88 4 Distribution Functions. X = (X1. . . STATEMENT 2 Let X and Xj. vector if and only if Xj.

2 Refer to Exercise 3. THEOREM 3 X −μ σ of X is referred to as normalization (i) We will show that the p. ⎪ fY y = ⎨ π 21 2 ⎪0.’s corresponding to the p.f. . for y > 0. 1)-distributed r. Since fY( y) = 0 for y ≤ 0 (because FY( y) = 0. 2 2 X − μ (ii) If X is a N ( μ.d. and determine the d.d. and determine the d. in Chapter 3.d.13. σ )-distributed r. of Y ﬁrst and then differentating it in order to obtain fY.v. ( σ ) is distributed as χ 2 1.v.’s given there. let us observe ﬁrst that Y is an r.) ᭡ Exercises 4. in Chapter 3. of Y is that of a χ 2 1-distributed r. To this end.f. ᭡ REMARK 2 The transformation of X.1 Refer to Exercise 3. FY y = P Y ≤ y = P − y ≤ X ≤ = 1 2π () ( ) ( y ) y 0 ∫ y − y e−x 2 2 dx = 2 ⋅ 1 2π ∫ e−x 2 2 dx.4.f. y] and FY y = 2 ⋅ () 1 2π ∫ y 1 2 t 0 e − t 2 dt . (Observe that here we used the fact that Γ( 2 ) = π .14.’s given there. of χ 2 1.2. Then dx = dt/2 t . then the r. y ≤ 0).v.’s corresponding to the p.f. by deriving the d. it follows that ⎧ 1 (1 2 )−1 − y 2 y e .d.f.1.2. Let x = t . Next.1. ⎩ () y>0 y ≤ 0. Hence dFY y dy ( )= 1 2π 1 y e −y 2 = 1 2π y (1 2 )−1 e −y 2 .f..v. on account of Statement 1. we have PROOF i(i) Let X be an N(0.v. Then Y = X 2 is distributed as χ 2 1. 4. t ∈ (0.f.1 The Cumulative DistributionExercises Function 89 where we let u = (t − μ)/σ in the transformation of the integral. 1 and this is the p.

respectively. in Chapter 3.10 If X is an r. 4.f.90 4 Distribution Functions.13.800.3).d.f.5 < X < 1.f. 4.3 Refer to Exercise 3. and the d.4 Refer to Exercise 3.d. Determine the d. Then show that F is continuous.f. 4.5).d. and Their Relationship 4.’s corresponding to the p.1.v. and determine the d. X distributed as N(2. 0. A light bulb is supposed to be defective if its lifetime is less than 1. β ∈ . 4. x ∈ ޒ.14. ii) X is discrete. 4.7 Let f and F be the p.6 Refer to the proof of Theorem 1 (iv) and show that we may assume that xn ↓ −∞ (xn ↑ ∞) instead of xn → −∞(xn → ∞).1. of the following r. 4. Probability Densities.3.11 The distribution of IQ’s of the people in a given group is well approximated by the Normal distribution with μ = 105 and σ = 20. (Logistic distribution) and derive the corresponding p.1.1. α > 0.25).v. aX + b.f. X 2.v. distributed as N(3. iii) P(−0.5 Let X be an r.. and dF(x)/dx = f(x) at the continuity points x of f.3.000. XI[a. ii) P(X > 2.f. f.17 in Chapter 3 and determine the d. F corresponding to the p. 4.f. use Table 3 in Appendix III in order to compute the following probabilities: i) P(X < −1)..1. Write out the expressions of F and f for n = 2 and n = 3. 4. X.b) (X ) when: i) X is continuous and F is strictly increasing. 2002). f given there.8 i) Show that the following function F is a d. What proportion of the individuals in the group in question has an IQ: i) At least 150? ii) At most 80? iii) Between 95 and 125? 4. If 25 light bulbs are . in Chapter 3.1.f.1.v. F x = () 1+ e 1 . and determine the d.’s given there. F.9 Refer to Exercise 3.12 A certain manufacturing process produces light bulbs whose life length (in hours) is an r.f.d.f. ޒ − (αx + β ) ii) Show that f(x) =αF(x)[1 − F(x)].1.’s: −X.f.3.v. with d.1.1.d. of an r.’s corresponding to the p.f.’s given there.

x2 ∈ ޒ. of The a Random Cumulative Vector Distribution and Its Properties Function 91 tested.. or the joint distribution function of X1. So consider the case that k = 2. given in Fig. calculate the failure (or hazard) rate H x = ( ޒx ) . f. H(x) and the probability in part (iii) become in the special case of the Negative Exponential distribution? 2 4. and it is found that the distribution of the actual diameters of the ball bearings is approximately normal with mean μ = 0. σ 2). what is the probability that at most 15 of them are defective? (Use the required independence. is ≥ 0. or both of them jointly.5 ± 0.f. of a Random Vector and Its Properties—Marginal and Conditional d. vector. ii) The variation of F over rectangles with sides parallel to the axes.d. 4. is F(x1.13 A manufacturing process produces 1 -inch ball bearings.f. X2)′ and the d.f.2 The d.0006 and defective otherwise.f. We then have X = (X1. and draw its graph for α = 1 and β = 1 .f. Compute the proportion of defective ball bearings. 4. () iii) For s and t > 0. x1. 4. given in Exercise 3.3.’s For the case of a two-dimensional r.14 If X is an r. distributed as N(μ. 1. X2 ≤ x2).5007 inch and σ = 0.d. f (x ) ii) Also.2 The 4.X2) of X. Then the following theorem holds true. iii) F is continuous from the right with respect to each of the coordinates x1.1. F and the reliability function (ޒx) = 1 − F(x). x2) = P(X1 ≤ x1. A day’s production is examined. x2) ≤ 1.v.15 Refer to the Weibull p.0005 inch. X2. x2.2.f. F(or FX or FX1.’s and p. (ޒx). which are 2 assumed to be satisfactory if their diameter lies in the interval 0.19 in Chapter 3 and do the following: i) Calculate the corresponding d.1. having the Weibull distribution. iv) What do the quantities F(x).) 4. calculate the probability P(X > s + t|X > t) where X is an r.1 d.4.f.1. With the above notation we have THEOREM 4 i) 0 ≤ F(x1. ﬁnd the value of c (in terms of μ and σ) for which P(X < c) = 2 − 9P(X > c). .v. 2. 
a result analogous to Theorem 1 can be established.

PROOF
i) Obvious.
ii) V = P(x_1 < X_1 ≤ x_2, y_1 < X_2 ≤ y_2) and is hence, clearly, ≥ 0.
iii) Same as in Theorem 3. (If x = (x_1, x_2)′ and z_n = (x_{1n}, x_{2n})′, then z_n ↓ x means x_{1n} ↓ x_1, x_{2n} ↓ x_2.)
iv) If both x_1, x_2 ↑ ∞, then (−∞, x_1] × (−∞, x_2] ↑ ℝ², so that F(x_1, x_2) → P(S) = 1. If at least one of x_1, x_2 goes (↓) to −∞, then (−∞, x_1] × (−∞, x_2] ↓ ∅, and hence F(x_1, x_2) → P(∅) = 0. ▲

[Figure 4.2  The variation V of F over the rectangle is: F(x_1, y_1) + F(x_2, y_2) − F(x_1, y_2) − F(x_2, y_1); the corners (x_1, y_1), (x_2, y_2) carry a + sign and the corners (x_2, y_1), (x_1, y_2) a − sign.]

REMARK 3  The function F(x_1, ∞) = F_1(x_1) is the d.f. of the random variable X_1. In fact,

F(x_1, ∞) = lim_{x_n↑∞} P(X_1 ≤ x_1, X_2 ≤ x_n) = P(X_1 ≤ x_1, −∞ < X_2 < ∞) = P(X_1 ≤ x_1) = F_1(x_1).

Similarly, F(∞, x_2) = F_2(x_2) is the d.f. of the random variable X_2. F_1, F_2 are called marginal d.f.'s.

REMARK 4  It should be pointed out here that results like those discussed in parts (i)–(iv) of Remark 1 still hold true here (appropriately interpreted). In particular, the d.f. F(x_1, x_2) has second order partial derivatives and

∂²F(x_1, x_2)/∂x_1∂x_2 = f(x_1, x_2)

at continuity points of f.

For k > 2, we have a theorem strictly analogous to Theorems 3 and 6 and also remarks such as Remark 1(i)–(iv) following Theorem 3. In particular, the analog of (iv) says that F(∞, . . . , ∞) = 1, and if at least one of x_1, . . . , x_k → −∞, then F(x_1, . . . , x_k) → 0. Also, F(x_1, . . . , x_k) has kth order partial derivatives and

∂^k F(x_1, . . . , x_k)/∂x_1∂x_2 · · · ∂x_k = f(x_1, . . . , x_k)

at continuity points of f.
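Theorem 4(ii) — the variation of F over a rectangle equals the probability of the rectangle and so is nonnegative — can be computed directly; a minimal sketch for two independent U(0, 1) r.v.'s (a concrete choice of F made here purely for illustration):

```python
def F(x, y):
    # joint d.f. of two independent U(0, 1) r.v.'s: F(x, y) = x * y on the unit square
    def clamp(t):
        return max(0.0, min(t, 1.0))
    return clamp(x) * clamp(y)

def variation(F, x1, x2, y1, y2):
    # V = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)
    #   = P(x1 < X1 <= x2, y1 < X2 <= y2) >= 0
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

v = variation(F, 0.2, 0.5, 0.1, 0.7)
print(v)   # equals (0.5 - 0.2) * (0.7 - 0.1) here, and is nonnegative
```

The sign pattern of the four corner terms is the one shown in Fig. 4.2.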

.d.d. ⋅ ⋅ ⋅ . As in the two-dimensional case.f. In fact. . ⋅ ⋅ ⋅ . .’s and vice versa. j = 1.v. . ⎩∫−∞ ( ) ( ) 2 ( ) ( ( ) ) 1 ( ) Then f1. xk = f x1 .f. of X. where F. . X = (X1. Furthermore. Then deﬁne f(x2|x1) as follows: f x 2 x1 = ( f x. In fact.f. . Xk. . x2 = ∑ f1 x1 ⎪ x ∈B x ∈ ޒ x ∈B P X 1 ∈ B = ⎨x ∈B . We call f1. In Statement 2. of the r. . . .d. of the random variable Xj. xk ∂x1∂x2 ⋅ ⋅ ⋅ ∂xk ( ) ( ) ) at continuity points of f. x2 dx1dx2 = ∫ ∫ f x1 . of X. we have seen that if X = (X1. ⋅ ⋅ ⋅ . x 2 dx 2 ∫ ⎩ −∞ ⎧∑ f x1 . f2 are p. or the joint distribution function of X1. Now suppose f1(x1) > 0. of The a Random Cumulative Vector Distribution and Its Properties Function 93 ∂k F x1 .4.f. B ޒ B ⎩B ޒ ( ) ( ) ( ) ( ) 1 2 1 2 ( ) [ 1 ( ) ] ( ) Similarly f2 is the p. of the random variables corresponding to the remaining (k − m) Xj’s. Xk)′ is an r. f1(x1) ≥ 0 and ∑ f ( x ) = ∑ ∑ f ( x . . x ∈ޒ ⎪∫ ∫ f x1 . . and f2 is the p.v.f.’s.’s are called marginal distribution functions. . of X1. is also called the joint p.x ) (f (x ) ) .v.f.’s X1. 1 1 1 2 x1 x1 x2 or ∫−∞ f1 (x1 )dx1 = ∫−∞ ∫−∞ f (x1 .f. · · · . 1 2 1 1 . All these d. X2. x 2 dx1 .d. ∞. and if m xj’s are replaced by ∞ (1 < m < k).1 d. vector. then Xj. x ) = 1.d. . . f(x) = f(x1. x2) and set ⎧∑ f x1 . . ∞ ∞ ∞ Similarly we get the result for f2. then the resulting function is the joint d. k are r.d.f. X2)′. x 2 ⎪ f1 x1 = ⎨ x ∞ ⎪ f x1 . ⋅ ⋅ ⋅ . f2 the marginal p. . f(x) = f(x1. x2 dx2 dx1 = ∫ f1 x1 dx1 .f. f1 is the p. F ∞. Xk. x2 = ∑ ∑ f x1 .f. Consider ﬁrst the case k = 2. xk). of X2. or FX1. is the d. ∞ = F j x j ( ( ) is the d. x j . or FX. ⎧ ∑ f x1 . that is.’s. . ∞. Then the p.2 The 4.f. . 2.f. x 2 )dx1 dx 2 = 1. .d. Xk. of the r. x 2 ⎪ f2 x 2 = ⎨ x ∞ ⎪ f x1 .

This is considered as a function of x_2, x_1 being an arbitrary, but fixed, value of X_1 ( f_1(x_1) > 0). Then f(·|x_1) is a p.d.f. In fact, f(x_2|x_1) ≥ 0 and

Σ_{x_2} f(x_2|x_1) = [1/f_1(x_1)] Σ_{x_2} f(x_1, x_2) = [1/f_1(x_1)] · f_1(x_1) = 1,  or  ∫_{−∞}^{∞} f(x_2|x_1) dx_2 = [1/f_1(x_1)] ∫_{−∞}^{∞} f(x_1, x_2) dx_2 = [1/f_1(x_1)] · f_1(x_1) = 1.

In a similar fashion, if f_2(x_2) > 0, we define f(x_1|x_2) by:

f(x_1|x_2) = f(x_1, x_2)/f_2(x_2)

and show that f(·|x_2) is a p.d.f. Furthermore, if X_1, X_2 are both discrete, the f(x_2|x_1) has the following interpretation:

f(x_2|x_1) = f(x_1, x_2)/f_1(x_1) = P(X_1 = x_1, X_2 = x_2)/P(X_1 = x_1) = P(X_2 = x_2|X_1 = x_1).

Hence P(X_2 ∈ B|X_1 = x_1) = Σ_{x_2 ∈ B} f(x_2|x_1). For this reason, we call f(·|x_1) the conditional p.d.f. of X_2, given that X_1 = x_1 (provided f_1(x_1) > 0). For a similar reason, we call f(·|x_2) the conditional p.d.f. of X_1, given that X_2 = x_2 (provided f_2(x_2) > 0).

For the case that the p.d.f.'s f and f_2 are of the continuous type, f(x_1|x_2) may be given an interpretation similar to the one given above. By assuming (without loss of generality) that h_1, h_2 > 0, one has

(1/h_1)P(x_1 < X_1 ≤ x_1 + h_1 | x_2 < X_2 ≤ x_2 + h_2)
 = (1/(h_1h_2))P(x_1 < X_1 ≤ x_1 + h_1, x_2 < X_2 ≤ x_2 + h_2) / [(1/h_2)P(x_2 < X_2 ≤ x_2 + h_2)]
 = (1/(h_1h_2))[F(x_1 + h_1, x_2 + h_2) − F(x_1, x_2 + h_2) − F(x_1 + h_1, x_2) + F(x_1, x_2)] / {(1/h_2)[F_2(x_2 + h_2) − F_2(x_2)]},

where F is the joint d.f. of X_1, X_2 and F_2 is the d.f. of X_2. By letting h_1, h_2 → 0 and assuming that (x_1, x_2)′ and x_2 are continuity points of f and f_2, respectively, the last expression on the right-hand side above tends to f(x_1, x_2)/f_2(x_2), which was denoted by f(x_1|x_2). Thus for small h_1, h_2, h_1 f(x_1|x_2) is approximately equal to P(x_1 < X_1 ≤ x_1 + h_1|x_2 < X_2 ≤ x_2 + h_2), so that h_1 f(x_1|x_2) is approximately the conditional probability that X_1 lies in a small neighborhood (of length h_1) of x_1, given that X_2 lies in a small neighborhood of x_2. A similar interpretation may be given to f(x_2|x_1). We can also define the conditional d.f. of X_2, given X_1 = x_1, by means of the conditional p.d.f. f(·|x_1).
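The definitions above — marginal p.d.f.'s by summing the joint p.d.f., conditional p.d.f.'s by dividing by a marginal — can be traced on a tiny discrete joint distribution; a sketch (the table values are ours, chosen only so that everything sums to 1):

```python
from fractions import Fraction

# an illustrative joint p.d.f. f(x1, x2) on {0, 1} x {0, 1, 2}
f = {(0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 12),
     (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 6), (1, 2): Fraction(1, 6)}

def f1(x1):
    # marginal p.d.f. of X1: sum f(x1, x2) over x2
    return sum(v for (a, _), v in f.items() if a == x1)

def cond(x2, x1):
    # conditional p.d.f. f(x2 | x1) = f(x1, x2) / f1(x1), for f1(x1) > 0
    return f[(x1, x2)] / f1(x1)

print(f1(0), f1(1))                            # the marginal of X1
print(sum(cond(x2, 0) for x2 in (0, 1, 2)))    # a genuine p.d.f.: sums to 1
```

Exact rational arithmetic makes it visible that the conditional probabilities sum to exactly 1, as shown in the text.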

’s involving all k r. . xk) the joint p. · · · . ⋅ ⋅ ⋅ . . . . ⋅ ⋅ ⋅ .v. . . x ′j x i . that is.v. ⋅ ⋅ ⋅ .f. . xk fi .d.2 The 4. . . Conditional distribution functions are deﬁned in a way similar to the one for k = 2.’s corresponding to the remaining s variables. . .’s Xj1. Also. we have: . . . then the function (of xj1. of The a Random Cumulative Vector Distribution and Its Properties Function 95 ⎧ ∑ f x′ x 2 1 ⎪ F x2 x1 = ⎨x ′ ≤ x ⎪ x f x ′ x dx ′ .d.d.d. xis are such that fi1. . . . . ⋅ ⋅ ⋅ .’s. 2 1 2 ⎩∫−∞ ( ) 2 2 ( ) 2 ( ) and similarly for F(x1|x2).f.v. . ⋅ ⋅ ⋅ . Again there are 2k − 2 joint conditional p. one taken from a discrete distribution and the other taken from a continuous distribution.v. xk keeping the remaining s ﬁxed (t + s = k). . .f. The concepts introduced thus far generalize in a straightforward way for k > 2. x i dx ′j ⋅ ⋅ ⋅ dx ′j .f. xi 1 s 1 ( s ) ⎧ ∑ f x1 . . Then in the notation employed above. . . .d.d. then we have called f(x1. called the joint conditional p. . xjt) deﬁned by f x j .f. of the r. or just given Xi1. the resulting function is the joint p. ⋅ ⋅ ⋅ .’s X1. EXAMPLE 1 Let the r. . . ⋅ ⋅ ⋅ . . xi = 1 t 1 s ( ) f x1 . . i xi . . of the r.’s which are also called marginal p.4. ⋅ ⋅ ⋅ . . ⎪ ∫−∞ j ⋅ ⋅ ⋅ . x ′ )≤( x . ⋅ ⋅ ⋅ .’s X1. ⋅ ⋅ ⋅ . . . . fi .v. Thus if X = (X1.f. ⋅ ⋅ ⋅ . . X2.d. pk.d. Xk have the Multinomial distribution with parameters n and p1. x ′ x . Xk. . . . x ) ) t t 1 s 1 ) 1 t We now present two examples of marginal and conditional p.d. . let s and t be integers such that 1 ≤ s. x j xi . t < k and s + t = k.f. ⋅ ⋅ ⋅ . xi 1 t 1 ( ⎧ ∑ j j i i ⎪ ⎪ = ⎨( x ′ . If we sum (integrate) over t of the variables x1. ⋅ ⋅ ⋅ .’s X1. .1 d. . given Xi1 = xi1. i xi . Thus F x j . . . ⎩∫−∞ 1 t 1 s j1 jt j1 jt j1 jt ( ) f (x ′ . Xis. it (xi1. . . xk dx j ⋅ ⋅ ⋅ dx j . . . f (x1. x j xi . xi 1 s ( ( ) 1 s ) is a p. . xk ⎪x . ⋅ ⋅ ⋅ .f. . Xjs = xjs. x = ⎨ ⋅⋅⋅ ⎪ ∞⋅⋅⋅ ∞ f x . . .f. x ⎪ x ⋅ ⋅ ⋅ x f x′ . xk).’s. Xk)′ with p. . 1 ⋅ ⋅ ⋅ . xis) > 0. 
Also if xi1. of the r. . .f. ⎩∫−∞ ∫−∞ j1 jt ( ) ( ) 1 t There are ⎛ k⎞ ⎛ k⎞ ⎛ k ⎞ k ⎜ ⎟ +⎜ ⎟ + ⋅⋅⋅ +⎜ ⎟ =2 −2 ⎝ 1⎠ ⎝ 2 ⎠ ⎝ k − 1⎠ such p. . . ⋅ ⋅ ⋅ . Xjt. Xk. .
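The pattern of Example 1 — lower-dimensional distributions extracted from a Multinomial — can be spot-checked numerically: summing a trinomial joint p.d.f. over one coordinate should leave a Binomial marginal. A sketch (n = 6 and the p's are arbitrary illustrative values):

```python
from math import comb, factorial

def trinomial_pmf(n, p, counts):
    # joint p.d.f. of (X1, X2, X3) ~ Multinomial(n; p1, p2, p3)
    x1, x2, x3 = counts
    if x1 + x2 + x3 != n:
        return 0.0
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p[0] ** x1 * p[1] ** x2 * p[2] ** x3

def binom_pmf(n, q, x):
    return comb(n, x) * q ** x * (1 - q) ** (n - x)

n, p = 6, (0.2, 0.3, 0.5)
for x1 in range(n + 1):
    marginal = sum(trinomial_pmf(n, p, (x1, x2, n - x1 - x2)) for x2 in range(n - x1 + 1))
    assert abs(marginal - binom_pmf(n, p[0], x1)) < 1e-12
print("marginal of X1 is Binomial(n, p1)")
```

This is the k = 3, s = 1 case of the general fact derived in Example 1, with q = 1 − p_1 playing the role of the grouped outcome O.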

Probability Densities. . X i = xi . xi s ( ) ⎛ p j1 ⎞ = ⎟ x j1 ! ⋅ ⋅ ⋅ x jt ! ⎜ ⎝ q⎠ (n − r )! x j1 ⎛ pj ⎞ t ⋅ ⋅ ⋅⎜ 1 ⎟ . . . ii) We have f x j1 . . q. Xjt given Xi1. 1 s s 1 1 s s ) ( ) Denoting by O the outcome which is the grouping of all n outcomes distinct from those designated by i1. x⋅ ⋅)⋅ ) = f ((x . . that is. the (joint) conditional distribution of Xj1. Y = n − r . . . is given by: .x f x . 1 s 1 s ( 1 s ( ) ) i1 is 1 s that is. ⋅ ⋅ ⋅ . Xis is Multinomial with parameters n − r and pj1/q. is . . . .’s X1 and X2 have the Bivariate Normal distribution. ⋅ ⋅ ⋅ . X i = xi ⊆ X i + ⋅ ⋅ ⋅ + X i = r = n − Y = r = Y = n − r . X i = xi = X i = xi . ⋅⋅ ⋅⋅ ⋅⋅ . xi ! ⋅ ⋅ ⋅ xi ! n − r ! q = 1 − pi + ⋅ ⋅ ⋅ + pi .x f x.v. ) ( ⋅f ⋅( ⋅x . xi = 1 s 1 s ( ) n! x x pi ⋅ ⋅ ⋅ pi q n − r . and Their Relationship ii) fi .’s Xi1. .d. . . .v. x jt xi1 . . ⎝ q⎠ xj r = xi1 + ⋅ ⋅ ⋅ + xi s . . xi s = ⎛ n! x1 xk ⎞ p1 =⎜ ⋅ ⋅ ⋅ pk ⎟ ⎝ x1! ⋅ ⋅ ⋅ xk ! ⎠ ( . . . . . x jt xi1 . the r.’s Xi1. . ⋅ ⋅ ⋅ . Xis and Y = n − (Xi1 + · · · + Xis) have the Multinomial distribution with parameters n and pi1. . and recall that their (joint) p. ⎝ q⎠ EXAMPLE 2 as was to be seen. ⋅ ⋅ ⋅ . Let the r. the r. pjt/q. ⋅ ⋅ ⋅ . . and the number of its occurrences is Y. we have that the probability of O is q. x . 1 s s 1 s ) ( ) ( ) ( ) so that (X i1 = xi . . . r = xi + ⋅ ⋅ ⋅ + xi . ii) f x j1 . . x )) j1 jt i1 is 1 k i1 is i1 is ⎛ ⎞ n! x x pi1i1 ⋅ ⋅ ⋅ pi si s q n − r ⎟ ⎜ ⎜ x !⋅ ⋅ ⋅ x ! n−r ! ⎟ is ⎝ i1 ⎠ ( ) ⎛ p ⋅ ⋅ ⋅ p ⋅p ⋅ ⋅ ⋅ p i is j1 jt =⎜ 1 ⎜ xi1 ! ⋅ ⋅ ⋅ xi s ! x j1 ! ⋅ ⋅ ⋅ x jt ! ⎝ xi1 xi s x j1 x jt = ⎛ p j1 ⎞ ⎟ x j1 ! ⋅ ⋅ ⋅ x jt ! ⎜ ⎝ q⎠ (n − r )! (since n − r = n − ( x x jt ⎞ ⎟ ⎟ ⎠ ⎛p ⋅⋅⋅ p q ⎞ i is ⎜ 1 ⎟ ⎜ xi ! ⋅ ⋅ ⋅ xi ! n − r ! ⎟ 1 s ⎝ ⎠ xi1 xi s x j1 + ⋅ ⋅ ⋅ + x jt ( ) i1 + ⋅ ⋅ ⋅ + xi s = x j1 + ⋅ ⋅ ⋅ + x jt ) ) x j1 ⎛ pj ⎞ ⋅⋅⋅⎜ t⎟ . ⋅ ⋅ ⋅ . . ⋅ ⋅ ⋅ .v. ⋅ ⋅ ⋅ . . ⋅ ⋅ ⋅ . Thus.f.x . i xi . ⋅ ⋅ ⋅ . pis. DISCUSSION i) Clearly.96 4 Distribution Functions. . Xis and Y are distributed as asserted. (X i1 = xi . .

f(x_1, x_2) = [1/(2πσ_1σ_2 √(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [((x_1 − μ_1)/σ_1)² − 2ρ((x_1 − μ_1)/σ_1)((x_2 − μ_2)/σ_2) + ((x_2 − μ_2)/σ_2)²] }.

We saw that the marginal p.d.f.'s f_1, f_2 are N(μ_1, σ_1²), N(μ_2, σ_2²), respectively; that is, X_1, X_2 are also normally distributed. Furthermore, in the process of proving that f(x_1, x_2) is a p.d.f., we rewrote it as follows:

f(x_1, x_2) = [1/(√(2π) σ_1)] exp[−(x_1 − μ_1)²/(2σ_1²)] · [1/(√(2π) σ_2 √(1 − ρ²))] exp[−(x_2 − b)²/(2σ_2²(1 − ρ²))],

where b = μ_2 + ρ(σ_2/σ_1)(x_1 − μ_1). Hence

f(x_2 | x_1) = f(x_1, x_2)/f_1(x_1) = [1/(√(2π) σ_2 √(1 − ρ²))] exp[−(x_2 − b)²/(2σ_2²(1 − ρ²))],

which is the p.d.f. of an N(b, σ_2²(1 − ρ²)) r.v. Similarly f(x_1 | x_2) is seen to be the p.d.f. of an N(b′, σ_1²(1 − ρ²)) r.v., where b′ = μ_1 + ρ(σ_1/σ_2)(x_2 − μ_2).

Exercises

4.2.1 Refer to Exercise 3.2.17 in Chapter 3 and:
 i) Find the marginal p.d.f.'s of the r.v.'s X_j, j = 1, 2, 3;
ii) Calculate the probability that X_1 ≥ 5.

4.2.2 Refer to Exercise 3.2.18 in Chapter 3 and determine:
 i) The marginal p.d.f. of each one of X_1, X_2, X_3;
ii) The conditional p.d.f. of X_1, X_2, given X_3; X_1, X_3, given X_2; X_2, X_3, given X_1;

iii) The conditional p.d.f. of X_1, given X_2, X_3; X_2, given X_3, X_1; X_3, given X_1, X_2.

If n = 20, provide expressions for the following probabilities:
 iv) P(3X_1 + X_2 ≤ 5);
  v) P(X_1 < X_2 < X_3);
 vi) P(X_1 + X_2 = 10 | X_3 = 5);
vii) P(3 ≤ X_1 ≤ 10 | X_2 = X_3);
viii) P(X_1 < 3X_2 | X_1 > X_3).

4.2.3 Let X, Y be r.v.'s jointly distributed with p.d.f. f given by f(x, y) = 2/c² if 0 ≤ x ≤ y, 0 ≤ y ≤ c, and 0 otherwise.
  i) Determine the constant c;
 ii) Find the marginal p.d.f.'s of X and Y;
iii) Find the conditional p.d.f. of X, given Y, and of Y, given X;
 iv) Calculate the probability that X ≤ 1.

4.2.4 Let the r.v.'s X_1, X_2, X_3 be jointly distributed with p.d.f. f given by

f(x_1, x_2, x_3) = c³ e^{−c(x_1 + x_2 + x_3)} I_A(x_1, x_2, x_3), where A = (0, ∞) × (0, ∞) × (0, ∞).

  i) Determine the constant c;
 ii) Find the marginal p.d.f. of each one of the r.v.'s X_j, j = 1, 2, 3;
iii) Find the conditional (joint) p.d.f. of X_1, X_2, given X_3; X_1, X_3, given X_2; X_2, X_3, given X_1; and the conditional p.d.f. of X_1, given X_2, X_3; X_2, given X_1, X_3; X_3, given X_1, X_2;
 iv) Find the conditional d.f.'s corresponding to the conditional p.d.f.'s in (iii).

4.2.5 If the joint p.d.f. of the r.v.'s X, Y is given by f(x, y) = e^{−x−y} I_{(0,∞)×(0,∞)}(x, y), compute the following probabilities:
  i) P(X ≤ x);
 ii) P(Y ≤ y);
iii) P(X < Y);
 iv) P(X + Y ≤ 3).

4.2.6 Consider the function given below:

f(x | y) = y^x e^{−y} / x!  for x = 0, 1, 2, ..., y ≥ 0, and 0 otherwise.
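Probabilities like those in Exercise 4.2.5 can be spot-checked numerically. A minimal sketch (not part of the text; the simulation setup is an assumption for illustration only): under f(x, y) = e^(−x−y) on (0, ∞) × (0, ∞), the r.v.'s X and Y behave as independent standard Negative Exponentials, so that P(X < Y) = 1/2 and P(X + Y ≤ 3) = 1 − 4e^(−3).

```python
import math
import random

# Under f(x, y) = e^(-x-y) on (0, inf) x (0, inf), X and Y are independent
# standard Negative Exponential r.v.'s, so probabilities such as those in
# Exercise 4.2.5 can be estimated by Monte Carlo simulation.
random.seed(0)
N = 200_000
pairs = [(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(N)]

p_x_less_y = sum(x < y for x, y in pairs) / N        # exact value: 1/2
p_sum_le_3 = sum(x + y <= 3 for x, y in pairs) / N   # exact value: 1 - 4e^(-3)
exact_sum_le_3 = 1 - 4 * math.exp(-3)
```

The closed forms follow by direct integration of the joint p.d.f.; the simulation is only a sanity check on them.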

  i) Show that for each fixed y, f(· | y) is a p.d.f.;
 ii) If the marginal p.d.f. of Y is Negative Exponential with parameter λ = 1, what is the joint p.d.f. of X, Y?
iii) Show that the marginal p.d.f. of X is given by f(x) = (1/2)^{x+1} I_A(x), where A = {0, 1, 2, . . .}.

4.2.7 Let Y be an r.v. distributed as P(λ) and suppose that the conditional distribution of the r.v. X, given Y = n, is B(n, p). Determine the p.d.f. of X and the conditional p.d.f. of Y, given X = x.

4.2.8 Consider the function f defined as follows:

f(x_1, x_2) = [1/(2π)] exp[−(x_1² + x_2²)/2] + [x_1³ x_2³/(4πe)] I_{[−1,1]×[−1,1]}(x_1, x_2)

and show that:
 i) f is a non-Normal Bivariate p.d.f.;
ii) Both marginal p.d.f.'s

f_1(x_1) = ∫ f(x_1, x_2) dx_2  and  f_2(x_2) = ∫ f(x_1, x_2) dx_1  (integrals over ℝ)

are Normal p.d.f.'s.

4.3 Quantiles and Modes of a Distribution

Let X be an r.v. with d.f. F and consider a number p such that 0 < p < 1. A pth quantile of the r.v. X, or of its d.f. F, is a number denoted by x_p and having the following property: P(X ≤ x_p) ≥ p and P(X ≥ x_p) ≥ 1 − p. For p = 0.25 we get a quartile of X, or of its d.f., and for p = 0.50 we get a median of X, or of its d.f. For illustrative purposes, consider the following simple examples.

EXAMPLE 3   Let X be an r.v. distributed as U(0, 1) and let p = 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80 and 0.90. Determine the respective x_{0.10}, x_{0.20}, x_{0.30}, x_{0.40}, x_{0.50}, x_{0.60}, x_{0.70}, x_{0.80} and x_{0.90}.

Since for 0 ≤ x ≤ 1, F(x) = x, we get: x_{0.10} = 0.10, x_{0.20} = 0.20, x_{0.30} = 0.30, x_{0.40} = 0.40, x_{0.50} = 0.50, x_{0.60} = 0.60, x_{0.70} = 0.70, x_{0.80} = 0.80, and x_{0.90} = 0.90.

EXAMPLE 4   Let X be an r.v. distributed as N(0, 1) and let p = 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80 and 0.90. Determine the respective x_{0.10}, x_{0.20}, x_{0.30}, x_{0.40}, x_{0.50}, x_{0.60}, x_{0.70}, x_{0.80} and x_{0.90}.

From the Normal Tables (Table 3 in Appendix III), by linear interpolation and symmetry, we find: x_{0.10} = −1.282, x_{0.20} = −0.842, x_{0.30} = −0.524, x_{0.40} = −0.253, x_{0.50} = 0, x_{0.60} = 0.253, x_{0.70} = 0.524, x_{0.80} = 0.842, and x_{0.90} = 1.282.

Knowledge of quantiles x_p for several values of p provides an indication as to how the unit probability mass is distributed over the real line. In Fig. 4.3 various cases are demonstrated for determining graphically the pth quantile of a d.f.

[Figure 4.3: Typical cases (a)–(e) of a d.f. F, illustrating the graphical determination of the pth quantile x_p.]

Observe that the figures demonstrate that, as defined, x_p need not be unique.

Let X be an r.v. with a p.d.f. f. Then a mode of f, if it exists, is any number which maximizes f(x). In case f is a p.d.f. which is twice differentiable, a mode can be found by differentiation. This process breaks down in the discrete cases. The following theorems answer the question for two important discrete cases.
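The quantiles of Examples 3 and 4 can be reproduced numerically; a minimal sketch (not part of the text), using Python's statistics.NormalDist for the inverse d.f. of N(0, 1):

```python
from statistics import NormalDist

# The pth quantile x_p of N(0, 1) is the unique solution of F(x_p) = p,
# obtained here from the inverse d.f.; compare with the table values of
# Example 4 (x_0.90 = 1.282, etc.).  For U(0, 1), F(x) = x gives x_p = p.
norm = NormalDist()  # N(0, 1)
quantiles = {p / 10: norm.inv_cdf(p / 10) for p in range(1, 10)}
```

By symmetry of the N(0, 1) density about 0, x_p = −x_{1−p}, which the computed dictionary reflects.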

THEOREM 5   Let X be B(n, p); that is,

f(x) = (n choose x) p^x q^{n−x},  x = 0, 1, ..., n,  0 < p < 1,  q = 1 − p.

Consider the number (n + 1)p and set m = [(n + 1)p], where [y] denotes the largest integer which is ≤ y. Then if (n + 1)p is not an integer, f(x) has a unique mode at x = m. If (n + 1)p is an integer, then f(x) has two modes obtained for x = m and x = m − 1.

PROOF   For x ≥ 1, we have

f(x)/f(x − 1) = [(n choose x) p^x q^{n−x}] / [(n choose x−1) p^{x−1} q^{n−x+1}];

that is,

f(x)/f(x − 1) = [(n − x + 1)/x] · (p/q).

Hence f(x) > f(x − 1) ( f is increasing) if and only if (n − x + 1)p > x(1 − p), or np − xp + p > x − xp, or (n + 1)p > x. Thus if (n + 1)p is not an integer, f(x) keeps increasing for x ≤ m and then decreases, so the maximum occurs at x = m. If (n + 1)p is an integer, then the maximum occurs at x = (n + 1)p, where f(x) = f(x − 1) (from above calculations). Thus

x = (n + 1)p − 1

is a second point which gives the maximum value. ᭡

THEOREM 6   Let X be P(λ); that is,

f(x) = e^{−λ} λ^x / x!,  x = 0, 1, 2, ...,  λ > 0.

Then if λ is not an integer, f(x) has a unique mode at x = [λ]. If λ is an integer, then f(x) has two modes obtained for x = λ and x = λ − 1.

PROOF   For x ≥ 1, we have

f(x)/f(x − 1) = [e^{−λ} λ^x / x!] / [e^{−λ} λ^{x−1} / (x − 1)!] = λ/x.

Hence f(x) > f(x − 1) if and only if λ > x. Thus if λ is not an integer, f(x) keeps increasing for x ≤ [λ] and then decreases; then the maximum of f(x) occurs at x = [λ]. If λ is an integer, then the maximum occurs at x = λ. But in this case f(x) = f(x − 1), which implies that x = λ − 1 is a second point which gives the maximum value to the p.d.f. ᭡

Exercises

4.3.1 Determine the pth quantile x_p for each one of the p.d.f.'s given in Exercises 3.13–16 (Exercise 3.14 for α = 1/4) in Chapter 3 if p = 0.25, 0.50, 0.75.

4.3.2 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is, f(c − x) = f(c + x) for all x ∈ ℝ). Then show that c is a median of f.

4.3.3 Draw four graphs, two each for B(n, p) and P(λ), which represent the possible occurrences for modes of the distributions B(n, p) and P(λ).

4.3.4 Consider the same p.d.f.'s mentioned in Exercise 4.3.1 from the point of view of a mode.

4.4* Justification of Statements 1 and 2

In this section, a rigorous justification of Statements 1 and 2 made in Section 4.1 will be presented. For this purpose, some preliminary concepts and results are needed and will be also discussed.

DEFINITION 1   A set G in ℝ is called open if for every x in G there exists an open interval containing x and contained in G. Without loss of generality, such intervals may be taken to be centered at x.

It follows from this definition that an open interval is an open set, the entire real line ℝ is an open set, and so is the empty set (in a vacuous manner).

LEMMA 1   Every open set in ℝ is measurable.

PROOF   Let G be an open set in ℝ, and for each x ∈ G, consider an open interval centered at x and contained in G. Clearly, the union over x, as x varies in G, of such intervals is equal to G. The same is true if we consider only those intervals corresponding to all rationals x in G. These intervals are countably many and each one of them is measurable; then so is their union. ᭡

DEFINITION 2   A set G in ℝ^m, m ≥ 1, is called open if for every x in G there exists an open cube in ℝ^m containing x and contained in G; by the term open "cube" we mean the Cartesian product of m open intervals of equal length. Without loss of generality, such cubes may be taken to be centered at x.

LEMMA 2   Every open set in ℝ^m is measurable.

PROOF   It is analogous to that of Lemma 1. Indeed, let G be an open set in ℝ^m, and for each x ∈ G, consider an open cube centered at x and contained in G. The union over x, as x varies in G, of such cubes clearly is equal to G. The same is true if we restrict ourselves to x's in G whose m coordinates are rationals. Then the resulting cubes are countably many, and therefore their union is measurable, since so is each cube. ᭡

DEFINITION 3   Recall that a function g: S ⊆ ℝ → ℝ is said to be continuous at x_0 ∈ S if for every ε > 0 there exists a δ = δ(ε, x_0) > 0 such that |x − x_0| < ε implies |g(x) − g(x_0)| < δ. The function g is continuous in S if it is continuous for every x ∈ S. It follows from the concept of continuity that ε → 0 implies δ → 0.

LEMMA 3   Let g: ℝ → ℝ be continuous. Then g is measurable.

PROOF   By Theorem 5 in Chapter 1 it suffices to show that g^{−1}(G) are measurable sets for all open intervals G in ℝ. Set B = g^{−1}(G). Thus if B = ∅, the assertion is valid, so let B ≠ ∅ and let x_0 be an arbitrary point of B, so that g(x_0) ∈ G. Continuity of g at x_0 implies that for every ε > 0 there exists δ = δ(ε, x_0) > 0 such that |x − x_0| < ε implies |g(x) − g(x_0)| < δ; equivalently, x ∈ (x_0 − ε, x_0 + ε) implies g(x) ∈ (g(x_0) − δ, g(x_0) + δ). Since g(x_0) ∈ G and G is open, by choosing ε sufficiently small, we can make δ so small that (g(x_0) − δ, g(x_0) + δ) ⊂ G. Thus, for such a choice of ε and δ, x ∈ (x_0 − ε, x_0 + ε) implies that g(x) ∈ (g(x_0) − δ, g(x_0) + δ) ⊂ G, so that g(x) ∈ G. But B(= g^{−1}(G)) is the set of all x ∈ ℝ for which g(x) ∈ G, and all x ∈ (x_0 − ε, x_0 + ε) have this property; hence (x_0 − ε, x_0 + ε) ⊂ B. Since x_0 is arbitrary in B, it follows that B is open; then by Lemma 1, it is measurable. ᭡

The concept of continuity generalizes, of course, to Euclidean spaces of higher dimensions, and then a result analogous to the one in Lemma 3 also holds true.

DEFINITION 4   A function g: S ⊆ ℝ^k → ℝ^m (k, m ≥ 1) is said to be continuous at x_0 ∈ ℝ^k if for every ε > 0 there exists a δ = δ(ε, x_0) > 0 such that ||x − x_0|| < ε implies ||g(x) − g(x_0)|| < δ. Here ||x|| stands for the usual norm in ℝ^k; i.e., for x = (x_1, ..., x_k)′, ||x|| = (Σ_{i=1}^k x_i²)^{1/2}, and similarly for the other quantities. The function g is continuous in S if it is continuous for every x ∈ S. Once again, from the concept of continuity it follows that ε → 0 implies δ → 0.

LEMMA 4   Let g: ℝ^k → ℝ^m be continuous. Then g is measurable.

PROOF   The proof is similar to that of Lemma 3; the details are presented here for the sake of completeness. By Lemma 2, it suffices to show that g^{−1}(G) are measurable sets for all open cubes G in ℝ^m. Set B = g^{−1}(G). If B = ∅ the assertion is true, and therefore suppose that B ≠ ∅ and let x_0 be an arbitrary point of B. Continuity of g at x_0 implies that for every ε > 0 there exists a δ = δ(ε, x_0) > 0 such that ||x − x_0|| < ε implies ||g(x) − g(x_0)|| < δ; that is, x ∈ S(x_0, ε) implies that g(x) ∈ S(g(x_0), δ), where S(c, r) stands for the open sphere with center c and radius r. Since g(x_0) ∈ G and G is open, we can choose ε so small that the corresponding δ is sufficiently small to imply that S(g(x_0), δ) ⊂ G. Thus, for such a choice of ε and δ, x ∈ S(x_0, ε) implies that g(x) ∈ G; that is, S(x_0, ε) ⊂ B. At this point, observe that it is clear that there is a cube containing x_0 and contained in S(x_0, ε); call it C(x_0, ε). Then C(x_0, ε) ⊂ B, and therefore B is open. By Lemma 2, it is also measurable. ᭡

DEFINITION 5   For j = 1, ..., k, the jth projection function g_j is defined by: g_j: ℝ^k → ℝ and g_j(x) = g_j(x_1, ..., x_k) = x_j.

It so happens that projection functions are continuous; that is,

LEMMA 5   The coordinate functions g_j, j = 1, ..., k, as defined above, are continuous.

PROOF   For an arbitrary point x_0 in ℝ^k, consider x ∈ ℝ^k such that ||x − x_0|| < ε for some ε > 0. This is equivalent to ||x − x_0||² < ε², or Σ_{j=1}^k (x_j − x_{0j})² < ε², which implies that (x_j − x_{0j})² < ε² for j = 1, ..., k, or |x_j − x_{0j}| < ε, j = 1, ..., k. This last expression is equivalent to |g_j(x) − g_j(x_0)| < ε, j = 1, ..., k. Thus the definition of continuity of g_j is satisfied here for δ = ε. ᭡

THEOREM 7   Let X: (S, A) → (ℝ^k, B^k) be a random vector, and let g: (ℝ^k, B^k) → (ℝ^m, B^m) be measurable. Then g(X): (S, A) → (ℝ^m, B^m) and is a random vector. (That is, measurable functions of random vectors are random vectors.)

PROOF   To prove that [g(X)]^{−1}(B) ∈ A if B ∈ B^m, we have

[g(X)]^{−1}(B) = X^{−1}[g^{−1}(B)] = X^{−1}(B_1), where B_1 = g^{−1}(B) ∈ B^k

by the measurability of g. Also, X^{−1}(B_1) ∈ A since X is measurable. The proof is completed. ᭡

To this theorem, we have the following

COROLLARY   Let X be as above and g be continuous. Then g(X) is a random vector. (That is, continuous functions of random vectors are random vectors.)

PROOF   The continuity of g implies its measurability by Lemma 4, and therefore the theorem applies and gives the result. ᭡

We may now proceed with the justification of Statement 1. Consider a k-dimensional function X defined on the sample space S. Then X may be written as X = (X_1, ..., X_k)′, where X_j, j = 1, ..., k, are real-valued functions. The question then arises as to how X and X_j, j = 1, ..., k, are related from a measurability point of view.

To this effect, we have the following result.

THEOREM 8   Let X = (X_1, ..., X_k)′: (S, A) → (ℝ^k, B^k). Then X is an r. vector if and only if X_j, j = 1, ..., k, are r.v.'s.

PROOF   Suppose X is an r. vector, and let g_j, j = 1, ..., k, be the coordinate functions defined on ℝ^k. The g_j's are continuous by Lemma 5 and therefore measurable by Lemma 4. Then for each j = 1, ..., k, g_j(X) = g_j(X_1, ..., X_k) = X_j is measurable and hence an r.v. Next, assume that X_j, j = 1, ..., k, are r.v.'s. To show that X is an r. vector, by special case 3 in Section 2 of Chapter 1, it suffices to show that X^{−1}(B) ∈ A for each B = (−∞, x_1] × ··· × (−∞, x_k], x_1, ..., x_k ∈ ℝ. Indeed,

X^{−1}(B) = (X ∈ B) = (X_j ∈ (−∞, x_j], j = 1, ..., k) = ∩_{j=1}^k X_j^{−1}((−∞, x_j]) ∈ A.

The proof is completed. ᭡

Exercises

4.4.1 If X and Y are functions defined on the sample space S into the real line ℝ, show that:

{s ∈ S; X(s) + Y(s) < x} = ∪_{r ∈ Q} [{s ∈ S; X(s) < r} ∩ {s ∈ S; Y(s) < x − r}],

where Q is the set of rationals in ℝ.

4.4.2 Use Exercise 4.4.1 in order to show that, if X and Y are r.v.'s, then so is the function X + Y.

4.4.3  i) If X is an r.v., then show that so is the function −X;
ii) Use part (i) and Exercise 4.4.2 to show that, if X and Y are r.v.'s, then so is the function X − Y.

4.4.4  i) If X is an r.v., then show that so is the function X²;
ii) Use the identity XY = (1/2)(X + Y)² − (1/2)(X² + Y²) in conjunction with part (i) and Exercises 4.4.2 and 4.4.3(ii) to show that, if X and Y are r.v.'s, then so is the function XY.

4.4.5  i) If X is an r.v., then show that so is the function 1/X, provided X ≠ 0;
ii) Use part (i) in conjunction with Exercise 4.4.4(ii) to show that, if X and Y are r.v.'s, then so is the function X/Y, provided Y ≠ 0.

106   5  Moments of Random Variables—Some Moment and Probability Inequalities

Chapter 5

Moments of Random Variables—Some Moment and Probability Inequalities

5.1 Moments of Random Variables

In the definitions to be given shortly, the following remark will prove useful.

REMARK 1   We say that the (infinite) series Σ_x h(x), where x = (x_1, ..., x_k)′ varies over a discrete set in ℝ^k, converges absolutely if Σ_x |h(x)| < ∞. Also, we say that the integral

∫···∫ h(x_1, ..., x_k) dx_1 ··· dx_k  (over ℝ^k)

converges absolutely if

∫···∫ |h(x_1, ..., x_k)| dx_1 ··· dx_k < ∞.

In what follows, when we write (infinite) series or integrals it will always be assumed that they converge absolutely. In this case, we say that the moments to be defined below exist.

Let X = (X_1, ..., X_k)′ be an r. vector, k ≥ 1, with p.d.f. f, and consider the (measurable) function g: ℝ^k → ℝ, so that g(X) = g(X_1, ..., X_k) is an r.v. Then we give the

DEFINITION 1
 i) For n = 1, 2, ..., the nth moment of g(X) is denoted by E[g(X)]^n and is defined by:

E[g(X)]^n = Σ_x [g(x)]^n f(x), x = (x_1, ..., x_k)′, in the discrete case, and

E[g(X)]^n = ∫···∫ [g(x_1, ..., x_k)]^n f(x_1, ..., x_k) dx_1 ··· dx_k in the continuous case.

For n = 1, we get

E[g(X)] = Σ_x g(x) f(x), or ∫···∫ g(x_1, ..., x_k) f(x_1, ..., x_k) dx_1 ··· dx_k,

and call it the mathematical expectation or mean value or just mean of g(X). Another notation for E[g(X)] which is often used is μ[g(X)], or μ_{g(X)}, or just μ, if no confusion is possible.

ii) For r > 0, the rth absolute moment of g(X) is denoted by E|g(X)|^r and is defined by:

E|g(X)|^r = Σ_x |g(x)|^r f(x), x = (x_1, ..., x_k)′, or ∫···∫ |g(x_1, ..., x_k)|^r f(x_1, ..., x_k) dx_1 ··· dx_k.

iii) For an arbitrary constant c, and n and r as above, the nth moment and rth absolute moment of g(X) about c are denoted by E[g(X) − c]^n, E|g(X) − c|^r, respectively, and are defined as follows:

E[g(X) − c]^n = Σ_x [g(x) − c]^n f(x), or ∫···∫ [g(x_1, ..., x_k) − c]^n f(x_1, ..., x_k) dx_1 ··· dx_k,

E|g(X) − c|^r = Σ_x |g(x) − c|^r f(x), or ∫···∫ |g(x_1, ..., x_k) − c|^r f(x_1, ..., x_k) dx_1 ··· dx_k.

For c = E[g(X)], the moments are called central moments. The 2nd central moment of g(X), that is,

E[g(X) − Eg(X)]² = Σ_x [g(x) − Eg(X)]² f(x), or ∫···∫ [g(x_1, ..., x_k) − Eg(X)]² f(x_1, ..., x_k) dx_1 ··· dx_k,

is called the variance of g(X) and is also denoted by σ²[g(X)], or σ²_{g(X)}, or just σ², if no confusion is possible. The variance of an r.v. is referred to as the moment of inertia in Mechanics. The quantity +√(σ²[g(X)]) = σ[g(X)] is called the standard deviation (s.d.) of g(X) and is also denoted by σ_{g(X)}, or just σ, if no confusion is possible.

5.1.1 Important Special Cases

1. Let g(X_1, ..., X_k) = X_1^{n_1} ··· X_k^{n_k}, where n_j ≥ 0 are integers. Then E(X_1^{n_1} ··· X_k^{n_k}) is called the (n_1, ..., n_k)-joint moment of X_1, ..., X_k. In particular, for n_1 = ··· = n_{j−1} = n_{j+1} = ··· = n_k = 0, n_j = n, we get

E(X_j^n) = Σ_{x_j} x_j^n f_j(x_j), or ∫ x_j^n f_j(x_j) dx_j,

which is the nth moment of the r.v. X_j. Thus the nth moment of an r.v. X with p.d.f. f is

E(X^n) = Σ_x x^n f(x), or ∫ x^n f(x) dx.

For n = 1, we get

E(X) = Σ_x x f(x), or ∫ x f(x) dx,

which is the mathematical expectation or mean value or just mean of X. This quantity is also denoted by μ_X or μ(X) or just μ when no confusion is possible. The quantity μ_X can be interpreted as follows: It follows from the definition that if X is a discrete uniform r.v., then μ_X is just the arithmetic average of the possible outcomes of X. Also, if one recalls from physics or elementary calculus the definition of center of gravity and its physical interpretation as the point of balance of the distributed mass, the interpretation of μ_X as the mean or expected value of the random variable is the natural one, provided the probability distribution of X is interpreted as the unit mass distribution.

REMARK 2   In Definition 1, suppose X is a continuous r.v. Then E[g(X)] = ∫ g(x) f(x) dx. On the other hand, from the last expression above, E(X) = ∫ x f(x) dx. There seems to be a discrepancy between these two definitions. More specifically, in the definition of E[g(X)], one would expect to use the p.d.f. of g(X) rather than that of X. Actually, the definition of E[g(X)], as given, is correct and its justification is roughly as follows: Consider E[g(X)] = ∫ g(x) f(x) dx and set y = g(x). Suppose that g is differentiable and has an inverse g^{−1}, and that some further conditions are met. Then

∫ g(x) f(x) dx = ∫ y f[g^{−1}(y)] |(d/dy) g^{−1}(y)| dy.

On the other hand, if f_Y is the p.d.f. of Y = g(X), then

f_Y(y) = f[g^{−1}(y)] |(d/dy) g^{−1}(y)|.

Therefore the last integral above is equal to ∫ y f_Y(y) dy, which is consonant with the definition of E(X) = ∫ x f(x) dx. (A justification of the above derivations is given in Theorem 2 of Chapter 9.)

2. For g(X_1, ..., X_k) = X_1^{n_1} ··· X_k^{n_k} with n_1 = ··· = n_{j−1} = n_{j+1} = ··· = n_k = 0, n_j = n, and c = E(X_j), we get

E(X_j − EX_j)^n = Σ_{x_j} (x_j − EX_j)^n f_j(x_j), or ∫ (x_j − EX_j)^n f_j(x_j) dx_j,

which is the nth central moment of the r.v. X_j (or the nth moment of X_j about its mean). Thus the nth central moment of an r.v. X with p.d.f. f and mean μ is

E(X − EX)^n = E(X − μ)^n = Σ_x (x − μ)^n f(x), or ∫ (x − μ)^n f(x) dx.

In particular, for n = 2 the 2nd central moment of X is denoted by σ_X² or σ²(X) or just σ² when no confusion is possible, and is called the variance of X. Its positive square root σ_X or σ(X) or just σ is called the standard deviation (s.d.) of X. As in the case of μ_X, σ_X² has a physical interpretation also. Its definition corresponds to that of the second moment, or moment of inertia. One recalls that a large moment of inertia means the mass of the body is spread widely about its center of gravity. Likewise a large variance corresponds to a probability distribution which is not well concentrated about its mean value.

3. For g(X_1, ..., X_k) = (X_1 − EX_1)^{n_1} ··· (X_k − EX_k)^{n_k}, the quantity

E[(X_1 − EX_1)^{n_1} ··· (X_k − EX_k)^{n_k}]

is the (n_1, ..., n_k)-central joint moment of X_1, ..., X_k, or the (n_1, ..., n_k)-joint moment of X_1, ..., X_k about their means.

4. For g(X_1, ..., X_k) = X_j(X_j − 1) ··· (X_j − n + 1), the quantity

E[X_j(X_j − 1) ··· (X_j − n + 1)] = Σ_{x_j} x_j(x_j − 1) ··· (x_j − n + 1) f_j(x_j), or ∫ x_j(x_j − 1) ··· (x_j − n + 1) f_j(x_j) dx_j,

is the nth factorial moment of the r.v. X_j. Thus the nth factorial moment of an r.v. X with p.d.f. f is

E[X(X − 1) ··· (X − n + 1)] = Σ_x x(x − 1) ··· (x − n + 1) f(x), or ∫ x(x − 1) ··· (x − n + 1) f(x) dx.

5.1.2 Basic Properties of the Expectation of an R.V.

From the very definition of E[g(X)], the following properties are immediate.

(E1) E(c) = c, where c is a constant.

(E2) E[cg(X)] = cE[g(X)], and, in particular, E(cX) = cE(X) if X is an r.v.

(E3) E[g(X) + d] = E[g(X)] + d, where d is a constant; and, in particular, E(X + d) = E(X) + d if X is an r.v.

(E4) Combining (E2) and (E3), we get E[cg(X) + d] = cE[g(X)] + d, and, in particular, E(cX + d) = cE(X) + d if X is an r.v.

(E4′) E[Σ_{j=1}^n c_j g_j(X)] = Σ_{j=1}^n c_j E[g_j(X)]. In fact, in the continuous case, we have

E[Σ_{j=1}^n c_j g_j(X)] = ∫···∫ [Σ_{j=1}^n c_j g_j(x_1, ..., x_k)] f(x_1, ..., x_k) dx_1 ··· dx_k
= Σ_{j=1}^n c_j ∫···∫ g_j(x_1, ..., x_k) f(x_1, ..., x_k) dx_1 ··· dx_k = Σ_{j=1}^n c_j E[g_j(X)].

The discrete case follows similarly. In particular,

(E4″) E(Σ_{j=1}^n c_j X_j) = Σ_{j=1}^n c_j E(X_j).

(E5) If X ≥ 0, then E(X) ≥ 0. Consequently, by means of (E5) and (E4″), we get that

(E5′) If X ≥ Y, then E(X) ≥ E(Y), where X and Y are r.v.'s (with finite expectations).

(E6) |E[g(X)]| ≤ E|g(X)|.

(E7) If E|X|^r < ∞ for some r > 0, then E|X|^{r′} < ∞ for all 0 < r′ < r. This is a consequence of the obvious inequality |X|^{r′} ≤ 1 + |X|^r and (E5′). Furthermore, since for n = 1, 2, ..., we have |X^n| = |X|^n, it follows, by means of (E6), that

(E7′) If E(X^n) exists (that is, E|X|^n < ∞) for some n = 2, 3, ..., then E(X^{n′}) also exists for all n′ = 1, 2, ..., with n′ < n.

5.1.3 Basic Properties of the Variance of an R.V.

Regarding the variance, the following properties are easily established by means of the definition of the variance.

(V1) σ²(c) = 0, where c is a constant.

(V2) σ²[cg(X)] = c²σ²[g(X)], and, in particular, σ²(cX) = c²σ²(X), if X is an r.v.

(V3) σ²[g(X) + d] = σ²[g(X)], where d is a constant; and, in particular, σ²(X + d) = σ²(X), if X is an r.v. In fact, as is easily seen,

σ²[g(X) + d] = E{[g(X) + d] − E[g(X) + d]}² = E[g(X) − Eg(X)]² = σ²[g(X)].

(V4) Combining (V2) and (V3), we get σ²[cg(X) + d] = c²σ²[g(X)], and, in particular, σ²(cX + d) = c²σ²(X), if X is an r.v.

(V5) σ²[g(X)] = E[g(X)]² − [Eg(X)]². In fact,

σ²[g(X)] = E[g(X) − Eg(X)]² = E{[g(X)]² − 2g(X)Eg(X) + [Eg(X)]²}
= E[g(X)]² − 2[Eg(X)]² + [Eg(X)]² = E[g(X)]² − [Eg(X)]²,

the equality before the last one being true because of (E4′). In particular,

(V5′) σ²(X) = E(X²) − (EX)², if X is an r.v., and

(V6) σ²(X) = E[X(X − 1)] + EX − (EX)², as is easily seen. This formula is especially useful in calculating the variance of a discrete r.v., as is seen below.

Exercises

5.1.1 Verify the details of properties (E1)–(E7).

5.1.2 Verify the details of properties (V1)–(V5).

5.1.3 For r′ < r, show that |X|^{r′} ≤ 1 + |X|^r and conclude that if E|X|^r < ∞, then E|X|^{r′} < ∞ for all 0 < r′ < r.
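Properties (V5′) and (V6) are easy to verify numerically on any discrete p.d.f.; a minimal sketch, with an arbitrary illustrative distribution (the numbers are not from the text):

```python
# An arbitrary discrete p.d.f. used only to illustrate (V5') and (V6).
xs  = [0, 1, 2, 3]
pxs = [0.1, 0.2, 0.3, 0.4]

EX   = sum(x * p for x, p in zip(xs, pxs))
EX2  = sum(x * x * p for x, p in zip(xs, pxs))
EXX1 = sum(x * (x - 1) * p for x, p in zip(xs, pxs))   # E[X(X - 1)]

var_direct = sum((x - EX) ** 2 * p for x, p in zip(xs, pxs))  # definition
var_v5 = EX2 - EX**2          # (V5'): sigma^2(X) = E(X^2) - (EX)^2
var_v6 = EXX1 + EX - EX**2    # (V6):  sigma^2(X) = E[X(X-1)] + EX - (EX)^2
```

All three expressions agree, which is exactly the content of (V5′) and (V6).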

5.1.4 Verify the equality (E[g(X)] =) ∫ g(x) f_X(x) dx = ∫ y f_Y(y) dy for the case that X ∼ N(0, 1) and Y = g(X) = X².

5.1.5 For any event A, consider the r.v. X = I_A, the indicator of A, defined by I_A(s) = 1 for s ∈ A and I_A(s) = 0 for s ∈ A^c, and calculate EX^r, r > 0, and also σ²(X).

5.1.6 Let X be an r.v. such that

P(X = −c) = P(X = c) = 1/2.

Calculate EX, σ²(X) and show that

P(|X − EX| ≤ c) = σ²(X)/c².

5.1.7 Let X be an r.v. with finite EX.
 i) For any constant c, show that E(X − c)² = E(X − EX)² + (EX − c)²;
ii) Use part (i) to conclude that E(X − c)² is minimum for c = EX.

5.1.8 Let X be an r.v. such that EX⁴ < ∞. Then show that
 i) E(X − EX)³ = EX³ − 3(EX)(EX²) + 2(EX)³;
ii) E(X − EX)⁴ = EX⁴ − 4(EX)(EX³) + 6(EX)²(EX²) − 3(EX)⁴.

5.1.9 If EX⁴ < ∞, show that:

E[X(X − 1)] = EX² − EX,
E[X(X − 1)(X − 2)] = EX³ − 3EX² + 2EX,
E[X(X − 1)(X − 2)(X − 3)] = EX⁴ − 6EX³ + 11EX² − 6EX.

(These relations provide a way of calculating EX^k, k = 2, 3, 4 by means of the factorial moments E[X(X − 1)], E[X(X − 1)(X − 2)], E[X(X − 1)(X − 2)(X − 3)].)

5.1.10 Let X be the r.v. denoting the number of claims filed by a policyholder of an insurance company over a specified period of time. On the basis of an extensive study of the claim records, it may be assumed that the distribution of X is as follows:

x      0      1      2      3      4      5      6
f(x)   0.304  0.287  0.208  0.115  0.061  0.019  0.006

  i) Calculate the EX and the σ²(X);
 ii) What premium should the company charge in order to break even?
iii) What should be the premium charged if the company is to expect to come ahead by $M for administrative expenses and profit?
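Part (i) of Exercise 5.1.10 reduces to two finite sums; a sketch of that computation (using (V5′)):

```python
# The claims p.d.f. of Exercise 5.1.10; part (i) is two finite sums.
claims = {0: 0.304, 1: 0.287, 2: 0.208, 3: 0.115, 4: 0.061, 5: 0.019, 6: 0.006}

total = sum(claims.values())                       # must equal 1
EX    = sum(x * f for x, f in claims.items())      # mean number of claims
EX2   = sum(x * x * f for x, f in claims.items())
varX  = EX2 - EX**2                                # via (V5')
```

The mean EX is also the break-even premium asked for in part (ii), since the expected payout per policyholder equals the expected number of claims times the (unit) cost per claim under the stated model.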

5.1.11 A roulette wheel has 38 slots, of which 18 are red, 18 black, and 2 green.
  i) Suppose a gambler is placing a bet of $M on red. What is the gambler's expected gain or loss and what is the standard deviation?
 ii) If the same bet of $M is placed on green and if $kM is the amount the gambler wins, calculate the expected gain or loss and the standard deviation;
iii) For what value of k do the two expectations in parts (i) and (ii) coincide?
 iv) Does this value of k depend on M?
  v) How do the respective standard deviations compare?

5.1.12 Let X be an r.v. such that P(X = j) = (1/2)^j, j = 1, 2, . . . .
 i) Compute EX, E[X(X − 1)];
ii) Use (i) in order to compute σ²(X).

5.1.13 If X is an r.v. distributed as U(α, β), show that

EX = (α + β)/2,  σ²(X) = (α − β)²/12.

5.1.14 Let the r.v. X be distributed as U(α, β). Calculate EX^n for any positive integer n.

5.1.15 Let X be an r.v. with p.d.f. f given by f(x) = (|x|/c²) I_{(−c,c)}(x). Compute EX^n for any positive integer n, and E|X|^r, r > 0.

5.1.16 Refer to Exercise 3.13(iv) in Chapter 3 and find the EX for those α's for which this expectation exists, where X is an r.v. having the distribution in question.

5.1.17 Let X be an r.v. with p.d.f. f symmetric about a constant c (that is, f(c − x) = f(c + x) for every x).
 i) Then if EX exists, show that EX = c;
ii) If c = 0 and EX^{2n+1} exists, show that EX^{2n+1} = 0 (that is, those moments of X of odd order which exist are all equal to zero).

5.1.18 Let X be an r.v. with finite expectation and d.f. F.
 i) Show that

EX = ∫_0^∞ [1 − F(x)] dx − ∫_{−∞}^0 F(x) dx;

ii) Use the interpretation of the definite integral as an area in order to give a geometric interpretation of EX.

5.1.19 Let X be an r.v. of the continuous type with finite EX and p.d.f. f.
 i) If m is a median of f and c is any constant, show that

E|X − c| = E|X − m| + 2 ∫_m^c (c − x) f(x) dx;

ii) Utilize (i) in order to conclude that E|X − c| is minimized for c = m.

(Hint: Consider the two cases that c ≥ m and c < m, and in each one split the integral from −∞ to c and c to ∞ in order to remove the absolute value. Then the fact that ∫_{−∞}^m f(x) dx = ∫_m^∞ f(x) dx = 1/2 and simple manipulations prove part (i). For part (ii), observe that ∫_m^c (c − x) f(x) dx ≥ 0 whether c ≥ m or c < m.)

5.1.20 If the r.v. X is distributed according to the Weibull distribution (see Exercise 4.1.15 in Chapter 4), then:
 i) Show that

EX = Γ(1 + 1/β) α^{1/β},  EX² = Γ(1 + 2/β) α^{2/β},

so that

σ²(X) = [Γ(1 + 2/β) − Γ²(1 + 1/β)] α^{2/β},

where recall that the Gamma function Γ is defined by Γ(γ) = ∫_0^∞ t^{γ−1} e^{−t} dt, γ > 0;
ii) Determine the numerical values of EX and σ²(X) for α = 1 and β = 1/2, β = 1 and β = 2.

5.2 Expectations and Variances of Some R.V.'s

5.2.1 Discrete Case

1. Let X be B(n, p). Then E(X) = np, σ²(X) = npq. In fact,

E(X) = Σ_{x=0}^n x (n choose x) p^x q^{n−x} = Σ_{x=1}^n x [n!/(x!(n − x)!)] p^x q^{n−x}
= Σ_{x=1}^n [n!/((x − 1)!(n − x)!)] p^x q^{n−x}
= np Σ_{x=1}^n [(n − 1)!/((x − 1)!((n − 1) − (x − 1))!)] p^{x−1} q^{(n−1)−(x−1)}
= np Σ_{x=0}^{n−1} [(n − 1)!/(x!((n − 1) − x)!)] p^x q^{(n−1)−x} = np(p + q)^{n−1} = np.

Next,

Next,

E[X(X − 1)] = ∑_{x=0}^n x(x − 1) C(n, x) p^x q^{n−x} = ∑_{x=2}^n x(x − 1) [n!/(x!(n − x)!)] p^x q^{n−x}
= n(n − 1)p² ∑_{x=2}^n {(n − 2)!/[(x − 2)!((n − 2) − (x − 2))!]} p^{x−2} q^{(n−2)−(x−2)}
= n(n − 1)p² ∑_{x=0}^{n−2} C(n − 2, x) p^x q^{n−2−x} = n(n − 1)p²(p + q)^{n−2} = n(n − 1)p².

That is, E[X(X − 1)] = n(n − 1)p², so that

σ²(X) = E[X(X − 1)] + EX − (EX)² = n(n − 1)p² + np − n²p² = n²p² − np² + np − n²p² = np(1 − p) = npq.

2. Let X be P(λ). Then E(X) = σ²(X) = λ. In fact,

E(X) = ∑_{x=0}^∞ x e^{−λ} λ^x/x! = ∑_{x=1}^∞ x e^{−λ} λ^x/x! = λ e^{−λ} ∑_{x=1}^∞ λ^{x−1}/(x − 1)! = λ e^{−λ} e^λ = λ.

Next,

E[X(X − 1)] = ∑_{x=0}^∞ x(x − 1) e^{−λ} λ^x/x! = ∑_{x=2}^∞ x(x − 1) e^{−λ} λ^x/x! = λ² e^{−λ} ∑_{x=2}^∞ λ^{x−2}/(x − 2)! = λ² e^{−λ} e^λ = λ².

Hence E(X²) = λ² + λ, so that, by (V6), σ²(X) = E[X(X − 1)] + EX − (EX)² = λ² + λ − λ² = λ.

REMARK 3   One can also prove that the nth factorial moment of X is λⁿ; that is, E[X(X − 1) ··· (X − n + 1)] = λⁿ.
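The Binomial and Poisson moments just derived can be confirmed by direct summation over the respective probability mass functions; a minimal Python sketch (illustrative only, with arbitrarily chosen parameter values):

```python
# Direct-summation check: for X ~ B(n, p), EX = np and Var X = npq;
# for X ~ P(lam), the mean and the variance both equal lam.
from math import comb, exp, factorial

n, p = 10, 0.3
q = 1 - p
pmf_b = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
mean_b = sum(x * f for x, f in enumerate(pmf_b))
var_b = sum(x * x * f for x, f in enumerate(pmf_b)) - mean_b**2

lam = 2.5
K = 60  # truncation point; the tail of the Poisson series beyond it is negligible
pmf_p = [exp(-lam) * lam**x / factorial(x) for x in range(K)]
mean_p = sum(x * f for x, f in enumerate(pmf_p))
var_p = sum(x * x * f for x, f in enumerate(pmf_p)) - mean_p**2

print(mean_b, var_b)  # ≈ np = 3.0 and npq = 2.1
print(mean_p, var_p)  # both ≈ lam = 2.5
```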

5.2.2 Continuous Case

1. Let X be N(0, 1). Then

E(X^{2n+1}) = 0, n ≥ 0,   and   E(X^{2n}) = (2n)!/(2ⁿ n!), n ≥ 1.

In particular, E(X) = 0 and σ²(X) = E(X²) = 2!/(2 · 1!) = 1. In fact,

E(X^{2n+1}) = (1/√(2π)) ∫_{−∞}^∞ x^{2n+1} e^{−x²/2} dx.

But

∫_{−∞}^∞ x^{2n+1} e^{−x²/2} dx = ∫_{−∞}^0 x^{2n+1} e^{−x²/2} dx + ∫_0^∞ x^{2n+1} e^{−x²/2} dx
= −∫_0^∞ y^{2n+1} e^{−y²/2} dy + ∫_0^∞ x^{2n+1} e^{−x²/2} dx = 0,

as is easily seen (set y = −x in the first integral). Thus E(X^{2n+1}) = 0. Next,

∫_{−∞}^∞ x^{2n} e^{−x²/2} dx = 2 ∫_0^∞ x^{2n} e^{−x²/2} dx,

and

∫_0^∞ x^{2n} e^{−x²/2} dx = −∫_0^∞ x^{2n−1} d(e^{−x²/2}) = −x^{2n−1} e^{−x²/2}|₀^∞ + (2n − 1) ∫_0^∞ x^{2n−2} e^{−x²/2} dx
= (2n − 1) ∫_0^∞ x^{2n−2} e^{−x²/2} dx.

Thus, if we set m_{2n} = E(X^{2n}), we get

m_{2n} = (2n − 1) m_{2n−2},
m_{2n−2} = (2n − 3) m_{2n−4},
. . .
m₂ = 1 · m₀,   m₀ = 1 (since m₀ = E(X⁰) = E(1) = 1).

Multiplying them out, we obtain

m_{2n} = (2n − 1)(2n − 3) ··· 1 = [1 · 2 ··· (2n − 3)(2n − 2)(2n − 1)(2n)] / [2 ··· (2n − 2)(2n)]
= (2n)!/{(2 · 1) ··· [2(n − 1)](2 · n)} = (2n)!/(2ⁿ n!).

REMARK 4   Let now X be N(μ, σ²). Then (X − μ)/σ is N(0, 1). Hence

E[(X − μ)/σ] = 0,   σ²[(X − μ)/σ] = 1.

But

E[(X − μ)/σ] = (1/σ) E(X) − μ/σ,

so that (1/σ) E(X) − μ/σ = 0, that is, E(X) = μ. Next,

σ²[(X − μ)/σ] = (1/σ²) σ²(X),

and then (1/σ²) σ²(X) = 1, so that σ²(X) = σ².

2. Let X be Gamma with parameters α and β. Then E(X) = αβ and σ²(X) = αβ². In fact,

E(X) = [1/(Γ(α)β^α)] ∫_0^∞ x · x^{α−1} e^{−x/β} dx = [1/(Γ(α)β^α)] ∫_0^∞ x^α e^{−x/β} dx,

and an integration by parts (∫_0^∞ x^α d(e^{−x/β}), with x^α e^{−x/β} vanishing at both limits) gives

E(X) = [αβ/(Γ(α)β^α)] ∫_0^∞ x^{α−1} e^{−x/β} dx = αβ,

since [1/(Γ(α)β^α)] ∫_0^∞ x^{α−1} e^{−x/β} dx = 1.
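The formula E(X^{2n}) = (2n)!/(2ⁿ n!) for the even moments of the N(0, 1) distribution can be checked by numerical integration; a Python sketch (an illustration, not part of the text):

```python
# Check E(X^{2n}) = (2n)!/(2^n n!) for X ~ N(0, 1) by Simpson's rule
# on [-12, 12], where the integrand is negligible outside.
from math import exp, factorial, pi, sqrt

def simpson(g, a, b, n=20_000):  # n must be even
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(g(a + i * h) for i in range(2, n, 2))
    return s * h / 3

for m in range(5):  # moments of order 2m
    val = simpson(lambda x: x**(2 * m) * exp(-x * x / 2) / sqrt(2 * pi), -12, 12)
    formula = factorial(2 * m) / (2**m * factorial(m))
    print(2 * m, round(val, 6), formula)  # 0:1, 2:1, 4:3, 6:15, 8:105
```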

Next,

E(X²) = [1/(Γ(α)β^α)] ∫_0^∞ x^{α+1} e^{−x/β} dx = β²α(α + 1),

and hence

σ²(X) = β²α(α + 1) − α²β² = αβ²(α + 1 − α) = αβ².

REMARK 5
i) If X is χ²_r, that is, if α = r/2, β = 2, we get E(X) = r, σ²(X) = 2r.
ii) If X is Negative Exponential, that is, if α = 1, β = 1/λ, we get E(X) = 1/λ, σ²(X) = 1/λ².

3. Let X be Cauchy. Then E(Xⁿ) does not exist for any n ≥ 1. For example, for n = 1, we get

I = (σ/π) ∫_{−∞}^∞ x dx/[σ² + (x − μ)²].

For simplicity, we set μ = 0, σ = 1 and we have

I = (1/π) ∫_{−∞}^∞ x dx/(1 + x²) = (1/π) · (1/2) ∫_{−∞}^∞ d(1 + x²)/(1 + x²)
= (1/2π) log(1 + x²)|_{−∞}^∞ = (1/2π)(∞ − ∞),

which is an indeterminate form. Thus the Cauchy distribution is an example of a distribution without a mean.

REMARK 6   In somewhat advanced mathematics courses, one sometimes encounters the so-called Cauchy Principal Value Integral. It is an improper integral in which the limits are taken symmetrically, and it often exists even if the Riemann integral does not; it coincides with the improper Riemann integral when the latter exists. As an example, in terms of the principal value integral, we have, for σ = 1, μ = 0,

I* = lim_{A→∞} (1/π) ∫_{−A}^A x dx/(1 + x²) = lim_{A→∞} (1/2π) log(1 + x²)|_{−A}^A
= lim_{A→∞} (1/2π)[log(1 + A²) − log(1 + A²)] = 0.
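The two phenomena discussed above—a vanishing symmetric (principal value) integral, yet a nonexistent mean—can be seen numerically for the standard Cauchy density; a Python sketch (illustrative only):

```python
# For the standard Cauchy density f(x) = 1/(pi (1 + x^2)):
# the symmetric integral of x f(x) over [-A, A] is 0 for every A,
# while the integral of |x| f(x) over [-A, A] equals log(1 + A^2)/pi,
# which grows without bound as A -> infinity, so E|X| = infinity.
from math import log, pi

def f(x):
    return 1.0 / (pi * (1.0 + x * x))

def trapezoid(g, a, b, n=200_000):
    h = (b - a) / n
    return (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))) * h

for A in (10.0, 100.0, 1000.0):
    pv = trapezoid(lambda x: x * f(x), -A, A)        # symmetric part: ~0
    am = trapezoid(lambda x: abs(x) * f(x), -A, A)   # grows like log(1 + A^2)/pi
    print(A, round(pv, 10), round(am, 4), round(log(1 + A * A) / pi, 4))
```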

v.9. utilize Tchebichev’s inequality. determine the constant c.2.) 5. iv) If n = 50 and P(|(X/n) − 0.2.2. use an approach similar to the one used in the Binomial example in order to show that EX = mr . (Hint: In (ii)–(iv). 5.5 by more than 0. Find the expected number of people with blood of each one of the four types and the variance about these numbers. β = 1.2.2.2 An honest coin is tossed independently n times and let X be the r.v. if X has the Geometric distribution. 5. 5. distributed as Negative Binomial with parameters r and p. calculate the kth factorial moment E[X(X − 1) · · · (X − k + 1)]. ii) Use (i) in order to show that EX = q/p and σ 2(X) = q/p2. ii) By working as in the Binomial example.7 in Chapter 3 and ﬁnd the expected number of particles to reach the portion of space under consideration there during time t and the variance about this number. σ 2(X) = rq/p2.4 If X is an r.v.v. 5. (m + n) (m + n − 1) mnr m + n − r 2 5.3 Refer to Exercise 3.95.7 Let X be an r. iii) Determine the smallest value of n for which the probability that X/n does not differ from 0.2. distributed as P(λ).1. iii) Calculate E(X/n).8 Let f be the Gamma density with parameters α = n.5| < c) ≥ 0. with a Hypergeometric distribution. iii) If n = 100. show that EX = rq/p. ﬁnd a lower bound for the probability that the observed frequency X/n does not differ from 0. p). m+n σ2 X = ( ) ( ) . but not in the sense of our deﬁnition which requires absolute convergence of the improper Riemann integral involved.16 in Chapter 3 and suppose that 100 people are chosen at random.2.5. 5.v.5 Refer to Exercise 3. Exercises 5. distributed as B(n.1 If X is an r.5 by more 0.2. denoting the number of H’s that occur.6 If X is an r. calculate the kth factorial moment E[X(X − 1) · · · (X − k + 1)]. Then show that . σ 2(X/n).1 is at least 0.2.2.1 Moments of Random Exercises Variables 119 Thus the mean of the Cauchy exists in terms of principal value.

2.v. 2.3.9 Refer to Exercise 3. 5.v. . ⎝ σ ⎠ 3 γ1 is called the skewness of the distribution of the r.v. ⋅ ⋅ ⋅ . 5. ii) For what value of c will he break even? 5. 5.14 Suppose that the average monthly water consumption by the residents of a certain community follows the Lognormal distribution with μ = 104 cubic feet and σ = 103 cubic feet monthly.000 and 50% of the amount is refunded if its lifetime is <2. editor) in order to evaluate the d. σ 2 = σ 2(X). X is distributed as Lognormal with parameters α and β . for example.12 in Chapter 4 and suppose that each bulb costs 30 cents and sells for 50 cents.10 Refer to Exercise 4. .11 then If X is an r. having the Beta distribution with parameters α and β.1. compute EX. Furthermore.2. 2.12 Let X be an r.2. pure number) γ1 by ⎛ X − μ⎞ γ 1 = E⎜ ⎟ . 5. Then show that E|X| = ∞.2. distributed as Cauchy with parameters μ and σ 2. Γ (α )Γ (α + β + n) Γ α +β Γ α +n n = 1.120 5 Moments of Random Variables—Some Moment and Probability Inequalities n −1 e −λ ∫λ f ( x)dx = ∑ x= 0 ∞ λx .7 in Chapter 3 and suppose that each TV tube costs $7 and that it sells for $11. Cambridge University Press. . Compute the expected gain (or loss) of the dealer. of a Poisson distribution at the points j = 1.v.13 If the r. Tables of the Incomplete Γ-Function. with ﬁnite third moment and set μ = EX. If γ1 > 0. the distribution is said to be skewed to . σ 2(X). ii) Use (i) in order to ﬁnd EX and σ 2(X). . 5.f.v. 1957. ii) Express his expected gain (or loss) in terms of c and λ. Suppose further that the manufacturer sells an item on money-back guarantee terms if the lifetime of the tube is less than c.000.15 Let X be an r.2. 5.2. one may utilize the Incomplete Gamma tables (see. X and is a measure of asymmetry of the distribution.2. x! Conclude that in this case. Deﬁne the (dimensionless quantity. suppose that a bulb is sold under the following terms: The entire amount is refunded if its lifetime is <1. 
Compute the proportion of the residents who consume more than 15 × 103 cubic feet monthly. ii) Show that EX n = ( ) ( ). Karl Paerson.

p) is skewed to the right for p < skewed to the left for p > 1 . 5.v.13(iii) in Chapter 3). The function G is called the generating function of the sequence {pj}. the distribution is called leptokurtic and if γ2 < 0. then E X X −1 ⋅ ⋅ ⋅ X −k +1 = [ ( ) ( )] dk Gt dt k () t =1 . ( ) γ2 is called the kurtosis of the distribution of the r. j =0 () ∞ − 1 ≤ t ≤ 1.1 Moments of Random Exercises Variables 121 the right and if γ1 < 0. . . iii) Find the generating function of the sequences ⎧ ⎪⎛ n⎞ j n− j ⎫ ⎪ ⎨⎜ ⎟ p q ⎬.f. . the distribution is said to be skewed to the left. taking on the values j with probability pj = P(X = j). of X is symmetric about μ. the distribution is called platykurtic.2.17 Let X be an r. iii) Show that if |EX| < ∞. then EX = d/dt G(t)|t = 1. iii) Also show that if |E[X(X − 1) · · · (X − k + 1)]| < ∞.d.5. ⎨e j! ⎭ ⎩ j ≥ 0. Then show that: iii) If the p. ⎝ σ ⎠ 4 where μ = EX . σ 2 = σ 2 X . with EX4 < ∝ and deﬁne the (pure number) γ2 by ⎛ X − μ⎞ γ 2 = E⎜ ⎟ − 3.3. 1) p. j ≥ 0. is a measure of reference.d. where the N(0. . then γ1 = 0. If γ2 > 0.2. j = 0.f. j ≥ 0.16 Let X be an r. X and is a measure of “peakedness” of this distribution.v. Set G t = ∑ pj t j .v. 2 1 2 and is iii) The Poisson distribution P(λ) and the Negative Exponential distribution are always skewed to the right. iii) The Binomial distribution B(n. ii) γ2 > 0 if X has the Double Exponential distribution (see Exercise 3. 5. q = 1 − p . λ > 0. β). Then show that: ii) γ2 < 0 if X is distributed as U(α. 1. 0 < p < 1. j ⎝ ⎠ ⎪ ⎪ ⎩ ⎭ and ⎧ − λ λj ⎫ ⎬.

σ1 ( ) Similarly. .2. Replacing x1 by X1 and writing E(X2|X1) instead of E(X2|X1 = x1). for all x1 for which fX (x1) > 0. In connection with this. E X1 X 2 = x 2 = μ1 + ( ) ρσ 1 x2 − μ2 . . the resulting moments are called conditional moments. σ2 ( ) Let X1. σ 2 b = μ2 + ρσ 2 x1 − μ1 . . and a function of X1. . Then we may talk about the E[E(X2|X1)]. the p. .v.122 5 Moments of Random Variables—Some Moment and Probability Inequalities iv) Utilize (ii) and (iii) in order to calculate the kth factorial moments of X being B(n.d.d..1 and 5. X2)′ has the Bivariate Normal distribution.3 Conditional Moments of Random Variables If. .f. Thus 1 n 1 m 1 m ⎧∑ x f x x 2 2 1 ⎪ E X 2 X1 = x1 = ⎨ x ⎪ ∞ x f x x dx . . X2 be two r. then f(x2 |x1) 2 (1 − ρ2)) r. we have the following properties: 1 .f f(x1. f(xj . we then have that E(X2|X1 ) is itself an r.v.4. vector X is replaced by a conditional p. xi ). p) and X being P(λ). . and they are functions of xi . .d. f of the r. . . x2). if (X1.’s with joint p.d. 2 2 1 2 For example.2. xj |xi . of an N(b.f. 2 1 2 ⎩∫−∞ 2 ⎧ ∑ x 2 − E X 2 X1 = x1 ⎪ ⎪ σ 2 X 2 X1 = x1 = ⎨ x ⎪ ∞ x − E X 2 X1 = x1 ⎪ ⎩∫−∞ 2 ( ( ) ) 2 2 [ [ ( ( ) ) ( )] f (x x ) 2 2 1 ( )] f (x x )dx . where is the p. respectively. σ1 ( ) Hence E X 2 X1 = x1 = μ 2 + ( ) ρσ 2 x1 − μ1 . Compare the results with those found in Exercises 5. in the preceding deﬁnitions. 5. Then E(X2|X1 = x1) is a function of x1. that is.f. xi .v.. We just gave the deﬁnition of E(X2|X1 = x1) for all x1 for which f(x2|x1) is deﬁned. .

x 2 dx1 dx 2 2 ( ) ∞ ∞ ∞ = ∫ x 2 ⎛ ∫ f x1 . then E[E(X2|X1)] = E(X2) (that is. in question).1 Moments of Random Variables 123 5. REMARK 7 (CE2) Let X1. Again. the expectation of the conditional expectation of an r. We have ∞ ∞ E E X 2 X1 = ∫ ⎡ x 2 f x 2 x1 dx 2 ⎤ fX x1 dx1 ∫ ⎢ ⎥ −∞ ⎣ −∞ ⎦ 1 [( )] ( ) ( ) =∫ =∫ ∞ −∞ −∞ ∞ ∫ ∫ ∞ x 2 f x 2 x1 fX x1 dx 2 dx1 1 ( ( ) ( ) ) ) ∞ −∞ −∞ x 2 f x1 . x 2 dx 2 dx1 = ∫ ∞ −∞ −∞ ∫ ∞ x 2 f x1 .3. we have σ 2 E X 2 X1 ≤ σ 2 X 2 [( )] ( ) and the inequality is strict. by taking X2 = 1. we get [ ( ) ] ( ) ( ( )( ) ) ( )∫ ∞ −∞ x 2 f x 2 x1 dx 2 ( ) (CE2′) For all x1 for which the conditional expectations below exist. we have [ ( ) ] ( )( ∞ −∞ ) E X 2 g X1 X1 = x1 = ∫ x 2 g x1 f x 2 x1 dx 2 = g x1 = g x1 E X 2 X1 = x1 . unless X2 is a function of X1 (on a set of probability one).’s. (CV) Provided the quantities which appear below exist.1 Some Basic Properties of the Conditional Expectation (CE1) If E(X2) and E(X2|X1) exist.3 Conditional 5.v. restricting ourselves to the continuous case.v. x 2 dx1 ⎞ dx 2 = ∫ x 2 fX x 2 dx 2 = E X 2 .v. In particular. is the same as the (unconditional) expectation of the r. we have E[g(X1)|X1 = x1] = g(x1) (or E[g(X1)|X1] = g(X1)). −∞ −∞ ⎝ −∞ ⎠ ( ( ) ( ) Note that here all interchanges of order of integration are legitimate because of the absolute convergence of the integrals involved. Then for all x1 for which the conditional expectations below exist. for the proof for the discrete case is quite analogous. we have E X 2 g X1 X1 = x1 = g x1 E X 2 X1 = x1 [ ( ) ] ( )( ) or E X 2 g X1 X1 = g X1 E X 2 X1 . X2 be two r. ( ) ( ) ( ) . φ X1 = E X 2 X1 .5. Set μ = E X 2 . g(X1) be a (measurable) function of X1 and let that E(X2) exists. It sufﬁces to establish the property for the continuous case only.
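Property (CE1)—that the expectation of the conditional expectation equals the unconditional expectation—is easy to verify on a small discrete example. The joint p.m.f. below is a made-up (hypothetical) table, not one from the text:

```python
# Check E[E(X2|X1)] = E(X2) for a small discrete joint p.m.f. f(x1, x2).
f = {(0, 1): 0.1, (0, 2): 0.2, (1, 1): 0.3, (1, 2): 0.25, (2, 1): 0.15}

# Marginal p.m.f. of X1.
fX1 = {}
for (x1, x2), p in f.items():
    fX1[x1] = fX1.get(x1, 0.0) + p

def cond_exp_X2(x1):  # E(X2 | X1 = x1) = sum_x2 x2 f(x1, x2) / fX1(x1)
    return sum(x2 * p for (a, x2), p in f.items() if a == x1) / fX1[x1]

lhs = sum(cond_exp_X2(x1) * p1 for x1, p1 in fX1.items())  # E[E(X2|X1)]
rhs = sum(x2 * p for (_, x2), p in f.items())              # E(X2)
print(lhs, rhs)  # both equal 1.45
```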

σ 2 X 2 ≥ E φ X1 − μ 2 = σ 2 E X 2 X1 . E X 2 − φ X1 φ X1 − μ 2 2 1 {[ ( )][ ( ) ]} = E[ X φ ( X )] − E[φ ( X )] − μE( X ) + μE[φ ( X )] = E{E[ X φ ( X ) X ]} − E[φ ( X )] − μE[ E( X X )] + μE[φ ( X )] (by (CE1)). Therefore σ 2 X 2 = E X 2 − φ X1 and since [ ( )] 2 + E φ X1 − μ . which follows. Let the r. since E X 2 − φ X1 = μ − μ = 0.f. [ ( )] ( )] Thus σ 2[X2 − φ(X1)] = 0 and therefore X2 = φ(X1) with probability one.3.’s X. Exercises 5. The inequality is strict unless E X 2 − φ X1 ( ) [( ) ] [( [ )] [ ( )] 2 = 0. But E X 2 − φ X1 [ ( )] 2 = σ 2 X 2 − φ X1 . by Remark 8.v. ( 2 2 2 1 1 2 2 1 1 2 1 1 Next. Y be jointly distributed with p. given by . 1 2 1 2 2 1 1 1 2 1 1 and this is equal to E φ 2 X1 − E φ 2 X1 − μE φ X1 + μE φ X1 [ ( )] [ ( )] ( ) [ ( )] [ ( )] (by (CE2)).2 Establish properties (CEI) and (CE2) for the discrete case.124 5 Moments of Random Variables—Some Moment and Probability Inequalities Then σ 2 X2 = E X2 − μ 2 ( ) ) = E{[X − φ (X )] + [φ (X ) − μ ]} = E[ X − φ ( X )] + E[φ ( X ) − μ ] + 2 E{[ X − φ ( X )][φ ( X ) − μ ]}.1 5. 2 which is 0.3.d. 2 [( ) ] E X 2 − φ X1 we have [ ( )] ≥ 0.

∞)×(0. ∑ x =1 x = 6 ) ( 5. Hint: Recall that ∑ n x =1 x = 2 n n + 1 2 n + 1 ( )( ) n 2 .v. σ 2(X|Y = y).5. . σ 2(X|Y = y). Then show that E h X gY X=x =h x E gY X=x. f given by f(x. ⋅ ⋅ ⋅ .f.d. where A = {(x1.3. g be (measurable) functions on ޒinto itself such that E[h(X)g(Y)] and Eg(X) exist. y). . with ﬁnite EX.v. 5. ⋅ ⋅ ⋅ . (Assume the existence of all p. vector and g ≥ 0 be a real-valued (measurable) function deﬁned on ޒk. E(X|Y = y). Then Pg X ≥c ≤ E g ( X) [ ( ) ] [ c ]. ⋅ ⋅ ⋅ . Y and let h. xk f x1 . . σ 2(Y).1 Moments andof Moment Random Inequalities Variables 125 f x.6 Consider the r. . PROOF Assume X is continuous with p. y = ( ) 2 n n+1 ( ) if y = 1. xk dx1 ⋅ ⋅ ⋅ dxk + ∫ c g x1 . . Y. [( )( ) ] ()[( ) ] 5. g(x1. E(X|Y = y). xk) ≥ c}..’s with p. .f. y). . Compute the following n ( n + 1) .d.3 Let X. x. ⋅ ⋅ ⋅ . show that E[E(X|Y)] = EX. f. Then ∞ ∞ E g X = ∫ ⋅ ⋅ ⋅ ∫ g x1 . y) = λ2e−λ(x+y)I(0.4 Let X. and quantities: E(X|Y = y). Calculate the following quantities: EX.4 Some Important Applications: Probability and Moment Inequalities THEOREM 1 Let X be a k-dimensional r. xk dx1 ⋅ ⋅ ⋅ dxk −∞ −∞ [ ( )] = ∫ g x1 .3. Calculate the following quantities: EX.v. ⋅ ⋅ ⋅ . . 5. x )dx ( )( ) ) ( ) ⋅ ⋅ ⋅ dxk .1)(x. Y be r. . x = 1. .f. xk f x1 .f.∞)(x.v. σ 2(Y). . . .) 5.d.v. Then for any r. and 0 otherwise.d. xk)′ ∈ ޒk.v. xk A A 1 k 1 ( )( × f ( x . and let c > 0. n. . σ 2(X).4 Some Important Applications: Probability 5. σ 2(X). EY. . Y be r. so that g(X) is an r.5 Let X be an r.1)×(0.’s with p.’s needed. . E(Y|X = x). EY. Then . f given by f(x.3. y) = (x + y)I(0. ⋅ ⋅ ⋅ .’s X.3.

and take g(X) = |X − μ|r.4. with mean μ and variance σ 2 = 0. This result and Theorem 2.4. 2. − 1 ≤ E XY ≤ 1. In particular. This is known as Tchebichev’s inequality.v.1 Special Cases of Theorem 1 r 1. P Y = X = 1. In Markov’s inequality replace r by 2 to obtain EX −μ 2 ⎛ 2 ⎤⎞ P X − μ ≥ c ⎜ = P⎡ X − μ ≥ c ≤ ⎟ ⎢ ⎥⎠ ⎝ ⎣ ⎦ c2 [ ] 2 = σ2 X c 2 ( )=σ 2 2 c . if c = kσ.126 5 Moments of Random Variables—Some Moment and Probability Inequalities E g X ≥ ∫ g x1 . ▲ )( ) ≥ c ∫ f ( x . x )dx ⋅ ⋅ ⋅ dx = cP[ g(X ) ∈ A] = cP[ g(X ) ≥ c]. 2 k k [ ] Let X be an r. Then Tchebichev’s inequality gives: P[|X − μ| ≥ c] = 0 for every c > 0. ( ) ( ) σ 2 X = σ 2 Y = 1. r > 0. ⎢ ⎥ ⎣ ⎦ cr [ ] This is known as Markov’s inequality. We have 0 ≤ E X −Y ( ) 2 = E X 2 − 2 XY + Y 2 ( = EX 2 − 2 E XY + EY 2 = 2 − 2 E XY ( ) ) ( ) . then P X − μ ≥ kσ ≤ REMARK 8 [ ] 1 1 .6). Chapter 2. all one has to do is to replace integrals by summation signs. ⋅ ⋅ ⋅ . μ = E(X). ⋅ ⋅ ⋅ . Then EX −μ r ≤ P X − μ ≥ c = P⎡ X − μ ≥ cr ⎤ . ⋅ ⋅ ⋅ . equivalently. P X − μ < kσ ≥ 1 − 2 . ( ) ( ) Then E 2 XY ≤ 1 or. Let X be an r. ( 5.’s such that E X = E Y = 0. ( ) ( ) and ( ) E( XY ) = −1 PROOF E XY = 1 if any only if if any only if ( ) P(Y = − X ) = 1. LEMMA 1 Let X and Y be r. equivalently.v.v. imply then that P(X = μ) = 1 (see also Exercise 5. The proof is completely analogous if X is of the discrete type. xk f x1 . xk dx1 ⋅ ⋅ ⋅ dxk A A 1 k 1 k [ ( )] Hence P[g(X) ≥ c] ≤ E[g(X)]/c.
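Tchebichev's inequality can be checked exactly against a distribution whose tail probabilities are available in closed form; a Python sketch (an illustration, not part of the text) using the Negative Exponential distribution with λ = 1, for which μ = σ = 1:

```python
# Tchebichev: P(|X - mu| >= k sigma) <= 1/k^2, checked exactly for
# X ~ Negative Exponential with lambda = 1 (so mu = sigma = 1).
from math import exp

def tail(k):  # P(|X - 1| >= k), X >= 0 with P(X > t) = e^{-t}
    p = exp(-(1.0 + k))             # P(X >= 1 + k)
    if k < 1.0:
        p += 1.0 - exp(-(1.0 - k))  # P(X <= 1 - k), empty when k >= 1
    return p

for k in (0.5, 1.0, 2.0, 3.0):
    bound = 1.0 / k**2
    print(k, round(tail(k), 4), round(bound, 4))
    assert tail(k) <= bound  # the exact tail never exceeds the bound
```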

σ1 Y1 = Y − μ2 . [( )( )] or. ▲ THEOREM 2 ( ) (Cauchy–Schwarz inequality) Let X and Y be two random variables with 2 . Then σ 2 X −Y = E X −Y = EX 2 ( ) ( ) − [E(X − Y )] = E(X − Y ) − 2 E( XY ) + EY = 1 − 2 + 1 = 0. respectively. σ 2 2 E 2 X − μ1 Y − μ 2 ≤ σ 12σ 2 . let E(XY) = 1. Then E(XY) = EY2 = 1.4 Some Important Applications: Probability 5. let E(XY) = −1.5. Then σ 2(X + Y) = 2 + 2E(XY) = 2 − 2 = 0. ( ) ) ( ) Hence E(XY) ≤ 1 and −1 ≤ E(XY). that is. so that −1 ≤ E(XY) ≤ 1. Then means μ1. so that P X = −Y = 1. then E(XY) = −EY2 = −1. Now let P(Y = X) = 1. and if P(Y = −X) = 1. and E X − μ1 Y − μ 2 [( )( )] [( )( )] = σ σ 1 2 if and only if ⎡ ⎤ σ P ⎢Y = μ 2 + 2 X − μ1 ⎥ = 1 σ 1 ⎣ ⎦ ( ) and E X − μ1 Y − μ 2 [( )( )] = −σ σ 1 2 if and only if ⎡ ⎤ σ P ⎢Y = μ 2 − 2 X − μ1 ⎥ = 1. σ1 ⎣ ⎦ ( ) PROOF Set X1 = X − μ1 . equivalently. 2 2 2 2 so that P(X = Y) = 1 by Remark 8. −σ 1σ 2 ≤ E X − μ1 Y − μ 2 ≤ σ 1σ 2 . σ2 . Finally. μ2 and (positive) variances σ 12 .1 Moments andof Moment Random Inequalities Variables 127 and 0 ≤ E X +Y ( ) 2 = E X 2 + 2 XY + Y 2 2 ( = EX + 2 E XY + EY 2 = 2 + 2 E XY . Conversely. P(X = Y) = 1.

4. ▲ A more familiar form of the Cauchy–Schwarz inequality is E2(XY) ≤ (EX2)(EY2). use Tchebichev’s inequality in order to ﬁnd a lower bound for the probability P(|X − μ| < kσ). X and any ε > 0. The second half of the conclusion follows similarly and will be left as an exercise (see Exercise 5.6). g(−x) = g(x)) and nondecreasing for x ≥ 0. 5. If furthermore g is even (that is. Then 0 ≤ EZ2 = (EX2)λ2 − 2[E(XY)]λ + EY2 for all λ.2 Let g be a (measurable) function deﬁned on ޒinto (0. This is established as follows: Since the inequality is trivially true if either one of EX2. Compare the lower bounds for k = 1. X with EX = μ and σ 2(X) = σ 2.v. g (ε ) 5.4. both ﬁnite. where λ is a real number. suppose that they are both ﬁnite and set Z = λX − Y. Then.1 Establish Theorem 1 for the discrete case.4. or E2(XY) ≤ (EX2)(EY2). Y1 are as in the previous lemma. which happens if and only if E2(XY) − (EX2)(EY2) ≤ 0 (by the discriminant test for quadratic equations). P g X ≥ε ≤ [( ) ] ( ) Eg X ε ( ). and hence E 2 X1Y1 ≤ 1 ( ) if and only if −1 ≤ E X1Y1 ≤ 1 becomes E 2 X − μ1 Y − μ 2 ( ) [( )( σ σ 2 1 2 2 )] ≤ 1 if and only if −σ 1σ 2 ≤ E X − μ1 Y − μ 2 ≤ σ 1σ 2 . for any r. EY2 is ∞. REMARK 9 [( )( )] Exercises 5. 2.4. 3 with the respective probabilities when X ∼ N(μ. σ 2). ∞). then P X ≥ε ≤ Eg X ( ).128 5 Moments of Random Variables—Some Moment and Probability Inequalities Then X1.v.3 For an r. .

σ2 are the standard deviations of X and Y. distributed as χ 40 . and ρ = 1 if and only if Y = μ2 + σ2 X − μ1 σ1 ( ) with probability 1.v. Y) or ρX.1 Coefﬁcient Moments of and Random Its Interpretation Variables 129 2 5. and compare this bound with the exact value found from Table 3 in Appendix III. we say that X and Y are uncorrelated. So ρ = ±1 means X and Y are linearly related. that is.6 Prove the second conclusion of Theorem 2. Correlation Coefﬁcient and Its Interpretation In this section. 5. .Y). we have that ρ2 ≤ 1.) If ρ = 0. (See Fig. From this stems the signiﬁcance of ρ as a measure of linear dependence between X and Y.1.4 Let X be an r.v.5. μ2. we say that X and Y are completely correlated (positively if ρ = 1. 5. that is.’s X and Y with means μ1. To this end. = σ 1σ 2 [( )( )] = Cov( X . which are assumed to be positive. X. Y and is denoted by Cov(X. negatively if ρ = −1).’s and provide an interpretation for the latter. then the covariance of (X − μ1)/σ1. 1)-joint central mean.4. while if ρ = ±1. 5.5 Covariance.4. use the Cauchy–Schwarz inequality in order to show that E|X| ≤ E1/2X2. for two r.v. is called the covariance of X.v.5 Covariance. E[(X − μ1)(Y − μ2)]. (Y − μ2)/σ2 is called the correlation coefﬁcient of X. and ρ = −1 if and only if Y = μ2 − σ2 X − μ1 σ1 ( ) with probability 1.5). Y and is denoted by ρ(X. If σ1. we introduce the concepts of covariance and correlation coefﬁcient of two r. 5.7 For any r. 5.4. then P(X = μ) = 1. that is −1 ≤ ρ ≤ 1.5 Refer to Remark 8 and show that if X is an r.v. Use Tchebichev’s inequality in order to ﬁnd a lower bound for the probability P(|(X/40) − 1| ≤ 0. the (1. with EX = μ (ﬁnite) such that P(|X − μ| ≥ c) = 0 for every c > 0.Y or ρ12 or just ρ if no confusion is possible. Y ) σ 1σ 2 ( ) From the Cauchy–Schwarz inequality.4. ⎡⎛ X − μ1 ⎞ ⎛ Y − μ 2 ⎞ ⎤ E X − μ1 Y − μ 2 ρ = E ⎢⎜ ⎟⎜ ⎟⎥ = σ 1σ 2 ⎢ ⎣⎝ σ 1 ⎠ ⎝ σ 2 ⎠ ⎥ ⎦ E XY − μ1 μ 2 . Correlation 5.
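The definitions of covariance and correlation coefficient can be exercised on a small joint p.m.f., along with two facts proved or assigned in this chapter: −1 ≤ ρ ≤ 1, and ρ(aX + b, cY + d) = sgn(ac) ρ(X, Y) (Exercise 5.5.8). The joint table below is a hypothetical example, not one from the text:

```python
# Covariance and correlation coefficient from a joint p.m.f., with checks
# that |rho| <= 1 and that rho(aX + b, cY + d) = sgn(ac) * rho(X, Y).
from math import sqrt

pts = {(-1, -1): 0.2, (-1, 1): 0.1, (0, 0): 0.2, (1, 1): 0.35, (1, -1): 0.15}

def E(g):  # expectation of g(X, Y)
    return sum(g(x, y) * p for (x, y), p in pts.items())

def rho(f1, f2):  # correlation coefficient of f1(X, Y) and f2(X, Y)
    m1, m2 = E(f1), E(f2)
    v1 = E(lambda x, y: (f1(x, y) - m1) ** 2)
    v2 = E(lambda x, y: (f2(x, y) - m2) ** 2)
    cov = E(lambda x, y: (f1(x, y) - m1) * (f2(x, y) - m2))
    return cov / sqrt(v1 * v2)

r = rho(lambda x, y: x, lambda x, y: y)
r_affine = rho(lambda x, y: 3 * x + 2, lambda x, y: -5 * y + 1)  # a = 3, c = -5
print(r, r_affine)  # r_affine = -r, and |r| <= 1
```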

b = − σ1 σμ and c = 1 2 − μ1 . Y) from the line 2 y = μ2 + σ σ ( x − μ 1 ) . negatively if ρ < 0). while values of ρ close to ±1 may indicate that the tendencies are strong. ( ) ( ) 2 (1) Taking the expectations of both sides in (1) and recalling that . ρ ≠ 0. consider the line y = μ 2 + σ σ ( x − μ1 ) in the xy-plane and let D be the distance of the (random) point (X. The following elaboration sheds more light on the intuitive interpretation of the correlation coefﬁcient ρ(= ρ(X.v. Y) from the above line. Positive values of ρ may indicate that there is a tendency of large values of Y to correspond to large values of X and small values of Y to correspond to small values of X. ED .’s X and Y. we have in the present line ax + by + c = 0 is given by ax0 + by0 + c case: 2 1 D= X − ⎛σ μ ⎞ σ1 Y + ⎜ 1 2 − μ1 ⎟ σ2 ⎝ σ2 ⎠ 1+ σ 12 .130 5 Moments of Random Variables—Some Moment and Probability Inequalities Y 2 ϩ 2 1 1 2 y ϭ 2 ϩ y ϭ 2 Ϫ 2 1 2 1 (x Ϫ 1 ) (x Ϫ 1 ) 2 Ϫ 2 1 1 0 1 X Figure 5. Thus. To this end. Recalling that the distance of the point (x0. Y)) as a measure of co-linearity of the r. Values of ρ close to zero may also indicate that these tendencies are weak. we say that X and Y are correlated (positively if ρ > 0. σ2 σ2 2 ⎡ ⎛σ μ ⎞⎤ σ D = ⎢ X − 1 Y + ⎜ 1 2 − μ1 ⎟ ⎥ σ2 ⎝ σ2 ⎠⎥ ⎢ ⎣ ⎦ 2 2 ⎛ σ 12 ⎞ 1 . Carrying out the calculations.1 For −1 < ρ < 1. + ⎜ 2⎟ σ2 ⎝ ⎠ and we wish to evaluate the expected squared distance of (X. y0) from the a 2 + b 2 . we ﬁnd 1 (σ 2 1 2 2 +σ2 D2 = σ 2 X 2 + σ 12Y 2 − 2σ 1σ 2 XY + 2σ 2 σ 1 μ 2 − σ 2 μ1 X ) ( ) − 2σ 1 σ 1 μ 2 − σ 2 μ1 Y + σ 1 μ 2 − σ 2 μ1 . for ρ > 0. 2 σ2 since here a = 1. Negative values of ρ may indicate that small values of Y correspond to large values of X and large values of Y to small values of X. that is.

’s X1 and Y1 as deﬁned in the proof of Theorem 2. Through the transformations x1 = x σ y−μ and y1 = σ .v. 2 σ 12 + σ 2 ( ) (4) At this point. by the observation just made.’s X and Y. exploiting the interpretation of an expectation as an average. Y) lies on the line y = σ μ2 + σ σ ( x − μ 1 ) or y = μ 2 − σ ( x − μ 1 ). the “normalized” −μ r.’s X1 and Y1 are dimensionless. His approach is somewhat different and is outlined below. Therefore. Correlation 5. relations (2) and (3) are summarized by the expression 2 2 1 1 ED2 = 2 2σ 12σ 2 1− ρ . Y1) is minimum.v. These points get closer and closer to this line as ρ gets closer to −1.v. For ρ = 0. consider the r.5. 29 (1975).’s X and Y. First. Y1) and seek the line Ax1 + By1 + C = 0 from which the expected squared distance of (X1. The preceding discussion is based on the paper “A Direct Development of the Correlation Coefﬁcient” by Leo Katz. (3) For ρ = 1 or ρ = −1. where 2 1 2 1 2 1 2 1 1 1 2 2 . ( ) we obtain ED2 = 2 2σ 12σ 2 1− ρ 2 σ 12 + σ 2 ( ) (ρ > 0). we get ED2 = 2 2σ 12σ 2 1+ ρ 2 σ 12 + σ 2 ( ) (ρ < 0). Y) may lie anywhere in the xy-plane. For ρ < 0. B and C.5 Covariance. the expected distance is constantly equal to σ 2 2 2 σ 12 σ 2 /( σ 12 + σ 2 ) from either one of the lines y = μ 2 + σ ( x − μ 1 ) and σ y = μ 2 − σ ( x − μ 1 ) . (2) Working likewise for the case that ρ < 0. we move from the xy-plane to the x1y1-plane. These points get closer and closer to this line as ρ gets closer to 1. Y) tend to be arranged along the line y = μ 2 + σ σ ( x − μ 1 ) . Vol. regardless of the value of ρ. determine the coefﬁcients A. That 2 is. the σ pairs (X. Y) tend to be arranged along the line y = μ 2 − σ ( x − μ 1 ) . In this latter plane. look at the point (X1. respectively (with probability 1). published in the American Statistician. we already know that (X. page 170. so that ED1 is minimum. It is in this sense that ρ is a measure of co-linearity of the r. and lie on this line for ρ = −1. 
EY 2 = σ 2 + μ2 and E XY = ρσ 1σ 2 + μ1 μ 2 . and lie on the line for ρ = 1. the pairs (X.v.1 Coefﬁcient Moments of and Random Its Interpretation Variables 131 2 2 EX 2 = σ 12 + μ12 . which is equivalent to saying that the pairs (X. unlike the original r. relation (4) indicates the following: For ρ > 0.

Then. If ρ > 0. the ED1 is constantly equal to 1. ED1 + B2 2 If ρ < 0. However. and its constant value 1 (for ρ = 0) are expressed by a single form. ED1 = 1. and y1 = −x1. A22AB = −1 if and only if A = + B2 B. 1 − |ρ|. and A2 + B2 . Expanding D1 EX1 = EY1 = 0. if ρ = 0. the expression to be minimized is ED12 = 1 + At this point. there is no minimizing line (through the origin) Ax1 + By1 = 0. ED1 is minimized for A22AB = 1 and the minimum is 1 + ρ. A2 + B2 (6) ( ) 2 = − A2 + B2 − 2 AB ≤ 0 ≤ A2 + B2 − 2 AB = A − B . A2 + B2 A2 + B2 (5) 2 Clearly. and E X1Y1 = ρ. Also observe that both minima of ED1 (for ρ > 0 and ρ < 0). for 2 2 ρ < 0. A2 + B2 (7) From (6) and (7). observe that − A+ B 2 ABρ . The corresponding lines are y1 = x1. taking expectations. the interpretation of ρ as a measure of co-linearity (of X1 and Y1) is argued as above. −1 ≤ 2 AB ≤ 1. for ED1 to be minimized it is necessary that C = 0. the ED1 is minimized for the line y1 = −x1. there is no minimizing line. we conclude that: 2 is minimized for A22AB = −1 and the minimum is 1 − ρ. 2 To summarize: For ρ > 0. namely. the ED1 is minimized for the line y1 = x1. and A22AB = 1 if and only if A = − B . with the lines y = μ 2 + σ σ ( x − μ1 ) x − μ and y = μ 2 − σ being replaced by the lines y = x and y1 = −x1. the + B2 2 main diagonal. for ρ = 0. EX12 = EY12 = 1. ( ) ( ) ( ) 2 or equivalently. + B2 2 Finally.132 5 Moments of Random Variables—Some Moment and Probability Inequalities D1 = AX 1 + BY1 + C1 noticing that 2 . ( ) we obtain ED12 = 1 + 2 ABρ C2 + . 2 1 2 1 . ( ) 1 1 1 σ respectively. by (5). From this point on.


Exercises

5.5.1 Let X be an r.v. taking on the values −2, −1, 1, 2 each with probability 1/4. Set Y = X² and compute the following quantities: EX, σ²(X), EY, σ²(Y), ρ(X, Y).

5.5.2 Go through the details required in establishing relations (2), (3) and (4).

5.5.3 Do likewise in establishing relation (5).

5.5.4 Refer to Exercise 5.3.2 (including the hint given there) and
i) Calculate the covariance and the correlation coefficient of the r.v.'s X and Y;
ii) Referring to relation (4), calculate the expected squared distance of (X, Y) from the appropriate line y = μ2 + (σ2/σ1)(x − μ1) or y = μ2 − (σ2/σ1)(x − μ1) (which one?);
iii) What is the minimum expected squared distance of (X1, Y1) from the appropriate line y = x or y = −x (which one?), where X1 = (X − μ1)/σ1 and Y1 = (Y − μ2)/σ2?

(Hint: Recall that ∑_{x=1}^n x³ = [n(n + 1)/2]².)

5.5.5 Refer to Exercise 5.3.2 and calculate the covariance and the correlation coefﬁcient of the r.v.’s X and Y. 5.5.6 Do the same in reference to Exercise 5.3.3. 5.5.7 Repeat the same in reference to Exercise 5.3.4. 5.5.8 Show that ρ(aX + b, cY + d) = sgn(ac)ρ(X, Y), where a, b, c, d are constants and sgn x is 1 if x > 0 and is −1 if x < 0. 5.5.9 Let X and Y be r.v.’s representing temperatures in two localities, A and B, say, given in the Celsius scale, and let U and V be the respective temperatures in the Fahrenheit scale. Then it is known that U and X are related as 9 follows: U = 5 X + 32, and likewise for V and Y. Fit this example in the model of Exercise 5.5.8, and conclude that the correlation coefﬁcients of X, Y and U, V are identical, as one would expect. 5.5.10 Consider the jointly distributed r.v.’s X, Y with ﬁnite second moments ˆ for which E[Y − (αX + β)]2 ˆ and β and σ 2(X) > 0. Then show that the values α is minimized are given by


β̂ = EY − α̂ EX,   α̂ = [σ(Y)/σ(X)] ρ(X, Y).

(The r.v. Ŷ = α̂X + β̂ is called the best linear predictor of Y, given X.)

5.5.11 If the r.v.’s X1 and X2 have the Bivariate Normal distribution, show that the parameter ρ is, actually, the correlation coefﬁcient of X1 and X2. (Hint: Observe that the exponent in the joint p.d.f. of X1 and X2 may be written as follows:

{1/[2(1 − ρ²)]} {[(x1 − μ1)/σ1]² − 2ρ[(x1 − μ1)/σ1][(x2 − μ2)/σ2] + [(x2 − μ2)/σ2]²}
= (x1 − μ1)²/(2σ1²) + (x2 − b)²/[2σ2²(1 − ρ²)],

where b = μ2 + ρ(σ2/σ1)(x1 − μ1).

This facilitates greatly the integration in calculating E(X1X2).)

5.5.12 If the r.v.'s X1 and X2 have jointly the Bivariate Normal distribution with parameters μ1, μ2, σ1², σ2² and ρ, calculate E(c1X1 + c2X2) and σ²(c1X1 + c2X2) in terms of the parameters involved, where c1 and c2 are real constants.

5.5.13 For any two r.v.'s X and Y, set U = X + Y and V = X − Y. Then
i) Show that P(UV < 0) = P(|X| < |Y|);
ii) If EX² = EY² < ∞, then show that E(UV) = 0;
iii) If EX², EY² < ∞ and σ²(X) = σ²(Y), then U and V are uncorrelated.

5.5.14 If the r.v.'s Xi, i = 1, . . . , m and Yj, j = 1, . . . , n have finite second moments, show that

Cov(∑_{i=1}^m Xi, ∑_{j=1}^n Yj) = ∑_{i=1}^m ∑_{j=1}^n Cov(Xi, Yj).

**5.6* Justiﬁcation of Relation (2) in Chapter 2
**

As a ﬁnal application of the results of this chapter, we give a general proof of Theorem 9, Chapter 2. To do this we remind the reader of the deﬁnition of the concept of the indicator function of a set A.


Let A be an event in the sample space S. Then the indicator function of A, denoted by IA, is a function on S deﬁned as follows:

I_A(s) = 1 if s ∈ A,   I_A(s) = 0 if s ∈ Aᶜ.

The following are simple consequences of the definition of the indicator function:

I_{∩_{j=1}^n A_j} = ∏_{j=1}^n I_{A_j},        (8)

I_{∑_{j=1}^n A_j} = ∑_{j=1}^n I_{A_j},        (9)

and, in particular,

I_{Aᶜ} = 1 − I_A.        (10)
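These identities are easy to confirm by brute force on a small sample space; a minimal Python sketch (the sets are arbitrary choices, not from the text), recalling that the sum in (9) denotes a union of pairwise disjoint events:

```python
# Brute-force check of the indicator identities (8)-(10).
S = range(12)
A1 = {0, 1, 2, 3, 4}
A2 = {3, 4, 5, 6}

def ind(A, s):
    return 1 if s in A else 0

for s in S:
    # (8): the indicator of an intersection is the product of indicators.
    assert ind(A1 & A2, s) == ind(A1, s) * ind(A2, s)
    # (10): the indicator of the complement.
    assert ind(set(S) - A1, s) == 1 - ind(A1, s)

# (9): for a *sum* (union of pairwise disjoint events), indicators add.
B1, B2 = {0, 1}, {5, 6}
for s in S:
    assert ind(B1 | B2, s) == ind(B1, s) + ind(B2, s)
print("identities (8)-(10) hold")
```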

Clearly,

E(I_A) = P(A),        (11)

and for any X1, . . . , Xr, we have

(1 − X_1)(1 − X_2) ··· (1 − X_r) = 1 − H_1 + H_2 − ··· + (−1)^r H_r,        (12)

where H_j stands for the sum of the products X_{i_1} ··· X_{i_j}, where the summation extends over all subsets {i_1, i_2, . . . , i_j} of the set {1, 2, . . . , r}, j = 1, . . . , r. Let α, β be such that 0 < α, β and α + β ≤ r. Then the following is true:

$$\sum_{J_\alpha} X_{i_1}\cdots X_{i_\alpha}\, H_\beta\bigl(J_\alpha\bigr) = \binom{\alpha+\beta}{\alpha} H_{\alpha+\beta}, \tag{13}$$

where $J_\alpha = \{i_1, \ldots, i_\alpha\}$ is the typical member of all subsets of size α of the set {1, 2, . . . , r}, $H_\beta(J_\alpha)$ is the sum of the products $X_{j_1}\cdots X_{j_\beta}$, where the summation extends over all subsets of size β of the set $\{1, \ldots, r\} - J_\alpha$, and $\sum_{J_\alpha}$ is meant to extend over all subsets $J_\alpha$ of size α of the set {1, 2, . . . , r}.

The justification of (13) is as follows: In forming $H_{\alpha+\beta}$, we select (α + β) X's from the available r X's in all possible ways, which is $\binom{r}{\alpha+\beta}$. On the other hand, for each choice of $J_\alpha$, there are $\binom{r-\alpha}{\beta}$ ways of choosing β X's from the remaining (r − α) X's. Since there are $\binom{r}{\alpha}$ choices of $J_\alpha$, we get $\binom{r}{\alpha}\binom{r-\alpha}{\beta}$ groups (products) of (α + β) X's out of r X's. The number of different groups of (α + β) X's out of r X's is $\binom{r}{\alpha+\beta}$. Thus among the $\binom{r}{\alpha}\binom{r-\alpha}{\beta}$ groups of (α + β) X's out of r X's, the number of times each distinct one appears is given by

$$\binom{r}{\alpha}\binom{r-\alpha}{\beta}\Big/\binom{r}{\alpha+\beta} = \frac{r!}{\alpha!\,(r-\alpha)!}\cdot\frac{(r-\alpha)!}{\beta!\,(r-\alpha-\beta)!}\cdot\frac{(\alpha+\beta)!\,(r-\alpha-\beta)!}{r!} = \frac{(\alpha+\beta)!}{\alpha!\,\beta!} = \binom{\alpha+\beta}{\alpha}.$$

This justifies (13). Now clearly,

$$B_m = \sum_{J_m} A_{i_1}\cap\cdots\cap A_{i_m}\cap A_{i_{m+1}}^c\cap\cdots\cap A_{i_M}^c,$$

where the summation extends over all choices of subsets Jm = {i1, . . . , im} of the set {1, 2, . . . , M} and Bm is the one used in Theorem 9, Chapter 2. Hence

$$
\begin{aligned}
I_{B_m} &= \sum_{J_m} I_{A_{i_1}\cap\,\cdots\,\cap A_{i_m}\cap A_{i_{m+1}}^c\cap\,\cdots\,\cap A_{i_M}^c} \\
&= \sum_{J_m} I_{A_{i_1}}\cdots I_{A_{i_m}}\bigl(1 - I_{A_{i_{m+1}}}\bigr)\cdots\bigl(1 - I_{A_{i_M}}\bigr) \qquad \text{(by (8), (9), (10))} \\
&= \sum_{J_m} I_{A_{i_1}}\cdots I_{A_{i_m}}\Bigl[1 - H_1\bigl(J_m\bigr) + H_2\bigl(J_m\bigr) - \cdots + (-1)^{M-m} H_{M-m}\bigl(J_m\bigr)\Bigr] \qquad \text{(by (12))}.
\end{aligned}
$$

Since

$$\sum_{J_m} I_{A_{i_1}}\cdots I_{A_{i_m}} H_k\bigl(J_m\bigr) = \binom{m+k}{m} H_{m+k} \qquad \text{(by (13))},$$

we have

$$I_{B_m} = H_m - \binom{m+1}{m}H_{m+1} + \binom{m+2}{m}H_{m+2} - \cdots + (-1)^{M-m}\binom{M}{m}H_M.$$

Taking expectations of both sides, we get (from (11) and the deﬁnition of Sr in Theorem 9, Chapter 2)

$$P\bigl(B_m\bigr) = S_m - \binom{m+1}{m}S_{m+1} + \binom{m+2}{m}S_{m+2} - \cdots + (-1)^{M-m}\binom{M}{m}S_M,$$

as was to be proved.


(For the proof just completed, also see pp. 80–85 in E. Parzen’s book Modern Probability Theory and Its Applications published by Wiley, 1960.)

REMARK 10 In measure theory the quantity IA is sometimes called the characteristic function of the set A and is usually denoted by χA. In probability theory the term characteristic function is reserved for a different concept and will be a major topic of the next chapter.
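The identity just proved can be checked mechanically on a small finite sample space. The sketch below (our illustration, not from the book) builds an arbitrary probability function and events $A_1, \ldots, A_M$, and verifies $P(B_m) = \sum_{k=m}^{M}(-1)^{k-m}\binom{k}{m}S_k$ exactly:

```python
from itertools import combinations
from math import comb
from random import random, sample, seed

# Exact check of P(B_m) = S_m - C(m+1,m) S_{m+1} + ... + (-1)^(M-m) C(M,m) S_M,
# where B_m is the event that exactly m of A_1, ..., A_M occur.
# The sample space, probabilities and events below are arbitrary.
seed(0)
outcomes = list(range(8))
w = [random() for _ in outcomes]
p = [x / sum(w) for x in w]                          # a probability function
M = 4
A = [set(sample(outcomes, 4)) for _ in range(M)]     # events A_1, ..., A_M

def prob(event):
    return sum(p[s] for s in event)

def S(r):
    # S_r = sum of P(A_{i_1} ∩ ... ∩ A_{i_r}) over all r-subsets of {1, ..., M}
    return sum(prob(set.intersection(*(A[i] for i in J)))
               for J in combinations(range(M), r))

lhs, rhs = [], []
for m in range(1, M + 1):
    B_m = {s for s in outcomes if sum(s in Ai for Ai in A) == m}
    lhs.append(prob(B_m))
    rhs.append(sum((-1) ** (k - m) * comb(k, m) * S(k) for k in range(m, M + 1)))
assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
```

The two sides agree to rounding error for every m, exactly as the indicator-function argument predicts.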


Chapter 6

Characteristic Functions, Moment Generating Functions and Related Theorems

6.1 Preliminaries

The main subject matter of this chapter is the introduction of the concept of the characteristic function of an r.v. and the discussion of its main properties. The characteristic function is a powerful mathematical tool, which is used profitably for probabilistic purposes, such as producing the moments of an r.v., recovering its distribution, establishing limit theorems, etc. To this end, recall that for z ∈ ℝ, $e^{iz} = \cos z + i\sin z$, $i = \sqrt{-1}$, and in what follows, i may be treated formally as a real number, subject to its usual properties: i² = −1, i³ = −i, i⁴ = 1, i⁵ = i, etc. The sequence of lemmas below will be used to justify the theorems which follow, as well as in other cases in subsequent chapters. A brief justification for some of them is also presented, and relevant references are given at the end of this section.

LEMMA A

Let $g_1, g_2 : \{x_1, x_2, \ldots\} \to [0, \infty)$ be such that

$$g_1\bigl(x_j\bigr) \le g_2\bigl(x_j\bigr), \qquad j = 1, 2, \cdots,$$

and that $\sum_{x_j} g_2(x_j) < \infty$. Then $\sum_{x_j} g_1(x_j) < \infty$.

PROOF If the summations are finite, the result is immediate; if not, it follows by taking the limits of partial sums, which satisfy the inequality. ᭡

LEMMA A′

Let $g_1, g_2 : \mathbb{R} \to [0, \infty)$ be such that $g_1(x) \le g_2(x)$, $x \in \mathbb{R}$, and that $\int_a^b g_1(x)\,dx$ exists for every $a, b \in \mathbb{R}$ with $a < b$, and that $\int_{-\infty}^{\infty} g_2(x)\,dx < \infty$. Then $\int_{-\infty}^{\infty} g_1(x)\,dx < \infty$.

PROOF

Same as above replacing sums by integrals. ᭡


LEMMA B

Let $g : \{x_1, x_2, \ldots\} \to \mathbb{R}$ and $\sum_{x_j} |g(x_j)| < \infty$. Then $\sum_{x_j} g(x_j)$ also converges.

PROOF The result is immediate for finite sums, and it follows by taking the limits of partial sums, which satisfy the inequality. ᭡

LEMMA B′

Let $g : \mathbb{R} \to \mathbb{R}$ be such that $\int_a^b g(x)\,dx$ exists for every $a, b \in \mathbb{R}$ with $a < b$, and that $\int_{-\infty}^{\infty} |g(x)|\,dx < \infty$. Then $\int_{-\infty}^{\infty} g(x)\,dx$ also converges.

PROOF

Same as above replacing sums by integrals. ᭡

The following lemma provides conditions under which the operations of taking limits and expectations can be interchanged. In more advanced probability courses this result is known as the Dominated Convergence Theorem.

LEMMA C

Let $\{X_n\}$, n = 1, 2, . . . , be a sequence of r.v.'s, and let Y, X be r.v.'s such that $|X_n(s)| \le Y(s)$, $s \in S$, n = 1, 2, . . . , and $X_n(s) \to X(s)$ (on a set of s's of probability 1) and $E(Y) < \infty$. Then $E(X)$ exists and $E(X_n) \to E(X)$ as $n \to \infty$, or equivalently,

$$\lim_{n\to\infty} E\bigl(X_n\bigr) = E\Bigl(\lim_{n\to\infty} X_n\Bigr).$$

REMARK 1

The index n can be replaced by a continuous variable.

The next lemma gives conditions under which the operations of differentiation and taking expectations commute.

LEMMA D

For each $t \in T$ (where T is ℝ or an appropriate subset of it, such as the interval [a, b]), let $X(\cdot\,; t)$ be an r.v. such that $(\partial/\partial t)X(s; t)$ exists for each $s \in S$ and $t \in T$. Furthermore, suppose there exists an r.v. Y with $E(Y) < \infty$ and such that

$$\left|\frac{\partial}{\partial t} X(s; t)\right| \le Y(s), \qquad s \in S,\ t \in T.$$

Then

$$\frac{d}{dt}E\bigl[X(\cdot\,; t)\bigr] = E\left[\frac{\partial}{\partial t}X(\cdot\,; t)\right], \qquad \text{for all } t \in T.$$

The proofs of the above lemmas can be found in any book on real variables theory, although the last two will be stated in terms of weighting functions rather than expectations; for example, see Advanced Calculus, Theorem 2, p. 285, Theorem 7, p. 292, by D. V. Widder, Prentice-Hall, 1947; Real Analysis, Theorem 7.1, p. 146, by E. J. McShane and T. A. Botts, Van Nostrand, 1959; The Theory of Lebesgue Measure and Integration, pp. 66–67, by S. Hartman and J. Mikusiński, Pergamon Press, 1961. Also Mathematical Methods of Statistics, pp. 45–46 and pp. 66–68, by H. Cramér, Princeton University Press, 1961.


**6.2 Deﬁnitions and Basic Theorems—The One-Dimensional Case
**

Let X be an r.v. with p.d.f. f. Then the characteristic function of X (ch.f. of X), denoted by φX (or just φ when no confusion is possible), is a function defined on ℝ, taking complex values in general, and defined as follows:

$$
\varphi_X(t) = E\bigl[e^{itX}\bigr] =
\begin{cases}
\displaystyle\sum_x e^{itx} f(x) = \sum_x \cos(tx)\, f(x) + i\sum_x \sin(tx)\, f(x) \\[2mm]
\displaystyle\int_{-\infty}^{\infty} e^{itx} f(x)\,dx = \int_{-\infty}^{\infty} \cos(tx)\, f(x)\,dx + i\int_{-\infty}^{\infty} \sin(tx)\, f(x)\,dx.
\end{cases}
$$

By Lemmas A, A′, B, B′, φX(t) exists for all t ∈ ℝ. The ch.f. φX is also called the Fourier transform of f. The following theorem summarizes the basic properties of a ch.f.
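The definition can be made concrete numerically. The sketch below (ours, not the book's) computes the ch.f. of a Poisson(λ) r.v. from the defining series $\varphi_X(t) = \sum_x e^{itx}f(x)$ and checks it against the closed form $e^{\lambda(e^{it}-1)}$ derived in Section 6.3, as well as the basic properties φ(0) = 1 and |φ(t)| ≤ 1:

```python
import cmath
import math

# Numerical illustration: the defining series of the ch.f. of a Poisson(lam)
# r.v. matches the closed form exp(lam*(e^{it} - 1)), and satisfies
# phi(0) = 1 and |phi(t)| <= 1.
lam = 2.5

def phi_from_def(t, terms=100):
    total, term = 0j, math.exp(-lam)       # term = f(0) = e^{-lam}
    for x in range(terms):
        total += cmath.exp(1j * t * x) * term
        term *= lam / (x + 1)              # f(x+1) = f(x) * lam/(x+1)
    return total

def phi_closed(t):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

diffs = [abs(phi_from_def(t) - phi_closed(t)) for t in (-3.0, -1.0, 0.0, 0.7, 2.0)]
assert max(diffs) < 1e-10
assert abs(phi_from_def(0.0) - 1) < 1e-12
assert all(abs(phi_from_def(t)) <= 1 + 1e-12 for t in (-3.0, 0.4, 1.9))
```

Truncating the series at 100 terms is harmless here because the Poisson tail beyond x = 100 is negligible for λ = 2.5.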

THEOREM 1

(Some properties of ch.f.'s)
i) φX(0) = 1.
ii) |φX(t)| ≤ 1.
iii) φX is continuous, and, in fact, uniformly continuous.
iv) $\varphi_{X+d}(t) = e^{itd}\varphi_X(t)$, where d is a constant.
v) $\varphi_{cX}(t) = \varphi_X(ct)$, where c is a constant.
vi) $\varphi_{cX+d}(t) = e^{itd}\varphi_X(ct)$.
vii) $\dfrac{d^n}{dt^n}\varphi_X(t)\Big|_{t=0} = i^n E\bigl(X^n\bigr)$, n = 1, 2, . . . , if $E|X^n| < \infty$.

PROOF

i) φX(t) = Ee^{itX}. Thus φX(0) = Ee^{i0X} = E(1) = 1.
ii) |φX(t)| = |Ee^{itX}| ≤ E|e^{itX}| = E(1) = 1, because |e^{itX}| = 1. (For the proof of the inequality, see Exercise 6.2.1.)
iii) We have

$$\bigl|\varphi_X(t+h) - \varphi_X(t)\bigr| = \Bigl|E\bigl[e^{i(t+h)X} - e^{itX}\bigr]\Bigr| = \Bigl|E\bigl[e^{itX}\bigl(e^{ihX} - 1\bigr)\bigr]\Bigr| \le E\bigl|e^{itX}\bigr|\,\bigl|e^{ihX} - 1\bigr| = E\bigl|e^{ihX} - 1\bigr|.$$

Then

[ (

− e itX ⎤ ⎥ ⎦

)]

(

)


$$\lim_{h\to 0}\bigl|\varphi_X(t+h) - \varphi_X(t)\bigr| \le \lim_{h\to 0} E\bigl|e^{ihX} - 1\bigr| = E\Bigl[\lim_{h\to 0}\bigl|e^{ihX} - 1\bigr|\Bigr] = 0,$$

provided we can interchange the order of lim and E, which here can be done by Lemma C. We observe that uniformity holds since the last expression on the right is independent of t.
iv) $\varphi_{X+d}(t) = Ee^{it(X+d)} = E\bigl(e^{itX}e^{itd}\bigr) = e^{itd}Ee^{itX} = e^{itd}\varphi_X(t)$.
v) $\varphi_{cX}(t) = Ee^{it(cX)} = Ee^{i(ct)X} = \varphi_X(ct)$.
vi) Follows trivially from (iv) and (v).
vii)

$$\frac{d^n}{dt^n}\varphi_X(t) = \frac{d^n}{dt^n} Ee^{itX} = E\Bigl(\frac{\partial^n}{\partial t^n} e^{itX}\Bigr) = E\bigl(i^n X^n e^{itX}\bigr),$$

provided we can interchange the order of differentiation and E. This can be done here, by Lemma D (applied successively n times to φX and its n − 1 first derivatives), since E|Xⁿ| < ∞ implies E|Xᵏ| < ∞, k = 1, . . . , n (see Exercise 6.2.2). Thus

$$\frac{d^n}{dt^n}\varphi_X(t)\Big|_{t=0} = i^n E\bigl(X^n\bigr). \ \text{᭡}$$

REMARK 2 From part (vii) of the theorem we have that $E(X^n) = (-i)^n \dfrac{d^n}{dt^n}\varphi_X(t)\Big|_{t=0}$, so that the ch.f. produces the nth moment of the r.v.

REMARK 3 If X is an r.v. whose values are of the form x = a + kh, where a, h are constants, h > 0, and k runs through the integral values 0, 1, . . . , n, or 0, 1, . . . , or 0, ±1, . . . , ±n, or 0, ±1, . . . , then the distribution of X is called a lattice distribution. For example, if X is distributed as B(n, p), then its values are of the form x = a + kh with a = 0, h = 1, and k = 0, 1, . . . , n. If X is distributed as P(λ), or it has the Negative Binomial distribution, then again its values are of the same form with a = 0, h = 1, and k = 0, 1, . . . . If now φ is the ch.f. of X, it can be shown that the distribution of X is a lattice distribution if and only if |φ(t)| = 1 for some t ≠ 0. It can be readily seen that this is indeed the case in the cases mentioned above (for example, φ(t) = 1 for t = 2π). It can also be shown that the distribution of X is a lattice distribution if and only if the ch.f. φ is periodic with period 2π (that is, φ(t + 2π) = φ(t), t ∈ ℝ).

**In the following result, the ch.f. serves the purpose of recovering the distribution of an r.v. by way of its ch.f.**

THEOREM 2

(Inversion formula) Let X be an r.v. with p.d.f. f and ch.f. φ. Then if X is of the discrete type, taking on the (distinct) values xj, j ≥ 1, one has

$$\text{i)}\qquad f\bigl(x_j\bigr) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} e^{-itx_j}\varphi(t)\,dt, \qquad j \ge 1.$$

If X is of the continuous type, then


$$\text{ii)}\qquad f(x) = \lim_{h\to 0}\lim_{T\to\infty} \frac{1}{2\pi}\int_{-T}^{T} \frac{1 - e^{-ith}}{ith}\, e^{-itx}\varphi(t)\,dt$$

and, in particular, if $\int_{-\infty}^{\infty} |\varphi(t)|\,dt < \infty$, then (f is bounded and continuous and)

$$\text{ii′)}\qquad f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt.$$

PROOF (outline) i) The ch.f. φ is continuous, by Theorem 1(iii), and since so is $e^{-itx_j}$, it follows that the integral $\int_{-T}^{T} e^{-itx_j}\varphi(t)\,dt$ exists for every T (> 0). We have then

$$
\frac{1}{2T}\int_{-T}^{T} e^{-itx_j}\varphi(t)\,dt
= \frac{1}{2T}\int_{-T}^{T}\Bigl[e^{-itx_j}\sum_k e^{itx_k} f\bigl(x_k\bigr)\Bigr]dt
= \frac{1}{2T}\int_{-T}^{T}\Bigl[\sum_k e^{it(x_k - x_j)} f\bigl(x_k\bigr)\Bigr]dt
= \sum_k f\bigl(x_k\bigr)\,\frac{1}{2T}\int_{-T}^{T} e^{it(x_k - x_j)}\,dt
$$

(the interchange of the integral and summations is valid here). That is,

$$\frac{1}{2T}\int_{-T}^{T} e^{-itx_j}\varphi(t)\,dt = \sum_k f\bigl(x_k\bigr)\,\frac{1}{2T}\int_{-T}^{T} e^{it(x_k - x_j)}\,dt. \tag{1}$$

But

$$
\begin{aligned}
\int_{-T}^{T} e^{it(x_k - x_j)}\,dt
&= \int_{-T}^{T}\bigl[\cos t\bigl(x_k - x_j\bigr) + i\sin t\bigl(x_k - x_j\bigr)\bigr]dt \\
&= \int_{-T}^{T} \cos t\bigl(x_k - x_j\bigr)\,dt + i\int_{-T}^{T} \sin t\bigl(x_k - x_j\bigr)\,dt
= \int_{-T}^{T} \cos t\bigl(x_k - x_j\bigr)\,dt,
\end{aligned}
$$

since sin z is an odd function. That is,

$$\int_{-T}^{T} e^{it(x_k - x_j)}\,dt = \int_{-T}^{T} \cos t\bigl(x_k - x_j\bigr)\,dt. \tag{2}$$

If xk = xj, the above integral is equal to 2T, whereas, for xk ≠ xj,

$$
\int_{-T}^{T} \cos t\bigl(x_k - x_j\bigr)\,dt
= \frac{1}{x_k - x_j}\int_{-T}^{T} d\bigl[\sin t\bigl(x_k - x_j\bigr)\bigr]
= \frac{\sin T\bigl(x_k - x_j\bigr) - \sin(-T)\bigl(x_k - x_j\bigr)}{x_k - x_j}
= \frac{2\sin T\bigl(x_k - x_j\bigr)}{x_k - x_j}.
$$


Therefore,

$$\frac{1}{2T}\int_{-T}^{T} e^{it(x_k - x_j)}\,dt = \begin{cases} 1, & \text{if } x_k = x_j \\[1mm] \dfrac{\sin T\bigl(x_k - x_j\bigr)}{T\bigl(x_k - x_j\bigr)}, & \text{if } x_k \ne x_j. \end{cases} \tag{3}$$

But $\Bigl|\dfrac{\sin T(x_k - x_j)}{x_k - x_j}\Bigr| \le \dfrac{1}{|x_k - x_j|}$, a constant independent of T, and therefore, for $x_k \ne x_j$, $\lim_{T\to\infty} \dfrac{\sin T(x_k - x_j)}{T(x_k - x_j)} = 0$, so that

$$\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} e^{it(x_k - x_j)}\,dt = \begin{cases} 1, & \text{if } x_k = x_j \\ 0, & \text{if } x_k \ne x_j. \end{cases} \tag{4}$$

By means of (4), relation (1) yields

$$
\begin{aligned}
\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} e^{-itx_j}\varphi(t)\,dt
&= \lim_{T\to\infty}\sum_k f\bigl(x_k\bigr)\,\frac{1}{2T}\int_{-T}^{T} e^{it(x_k - x_j)}\,dt \\
&= \sum_k f\bigl(x_k\bigr)\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} e^{it(x_k - x_j)}\,dt \\
&\qquad\text{(the interchange of the limit and summation is legitimate here)} \\
&= f\bigl(x_j\bigr) + \sum_{k\ne j} f\bigl(x_k\bigr)\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} e^{it(x_k - x_j)}\,dt = f\bigl(x_j\bigr),
\end{aligned}
$$

as was to be seen.
ii), ii′) Strictly speaking, (ii′) follows from (ii). We are going to omit (ii) entirely and attempt to give a rough justification of (ii′). The assumption that $\int_{-\infty}^{\infty} |\varphi(t)|\,dt < \infty$ implies that $\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt$ exists, and is taken as follows for every arbitrary but fixed $x \in \mathbb{R}$:

$$\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt = \lim_{(0<)\,T\to\infty}\int_{-T}^{T} e^{-itx}\varphi(t)\,dt. \tag{5}$$

But

$$
\int_{-T}^{T} e^{-itx}\varphi(t)\,dt = \int_{-T}^{T} e^{-itx}\Bigl[\int_{-\infty}^{\infty} e^{ity} f(y)\,dy\Bigr]dt
= \int_{-T}^{T}\int_{-\infty}^{\infty} e^{it(y-x)} f(y)\,dy\,dt
= \int_{-\infty}^{\infty} f(y)\Bigl[\int_{-T}^{T} e^{it(y-x)}\,dt\Bigr]dy, \tag{6}
$$

where the interchange of the order of integration is legitimate here. Since the integral (with respect to y) is zero over a single point, we may assume in the sequel that y ≠ x. Then

$$\int_{-T}^{T} e^{it(y-x)}\,dt = \frac{2\sin T(y-x)}{y-x}, \tag{7}$$

as was seen in part (i). By means of (7), relation (6) yields


$$\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt = 2\lim_{T\to\infty}\int_{-\infty}^{\infty} f(y)\,\frac{\sin T(y-x)}{y-x}\,dy. \tag{8}$$

Setting T(y − x) = z, expression (8) becomes

$$\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt = 2\lim_{T\to\infty}\int_{-\infty}^{\infty} f\Bigl(x + \frac{z}{T}\Bigr)\frac{\sin z}{z}\,dz = 2f(x)\,\pi = 2\pi f(x),$$

by taking the limit under the integral sign, and by using continuity of f and the fact that $\int_{-\infty}^{\infty} \frac{\sin z}{z}\,dz = \pi$. Solving for f(x), we have

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt,$$

as asserted. ᭡

EXAMPLE 1

Let X be B(n, p). In the next section, it will be seen that $\varphi_X(t) = \bigl(pe^{it} + q\bigr)^n$. Let us apply (i) to this expression. First of all, we have

$$
\begin{aligned}
\frac{1}{2T}\int_{-T}^{T} e^{-itx}\varphi(t)\,dt
&= \frac{1}{2T}\int_{-T}^{T} \bigl(pe^{it} + q\bigr)^n e^{-itx}\,dt \\
&= \frac{1}{2T}\int_{-T}^{T}\Bigl[\sum_{r=0}^{n}\binom{n}{r}\bigl(pe^{it}\bigr)^r q^{n-r}\Bigr] e^{-itx}\,dt \\
&= \frac{1}{2T}\sum_{r=0}^{n}\binom{n}{r} p^r q^{n-r}\int_{-T}^{T} e^{i(r-x)t}\,dt \\
&= \frac{1}{2T}\sum_{\substack{r=0 \\ r\ne x}}^{n}\binom{n}{r} p^r q^{n-r}\,\frac{1}{i(r-x)}\, e^{i(r-x)t}\Big|_{-T}^{T} + \frac{1}{2T}\binom{n}{x} p^x q^{n-x}\int_{-T}^{T} dt \\
&= \sum_{\substack{r=0 \\ r\ne x}}^{n}\binom{n}{r} p^r q^{n-r}\,\frac{e^{i(r-x)T} - e^{-i(r-x)T}}{2Ti(r-x)} + \binom{n}{x} p^x q^{n-x} \\
&= \sum_{\substack{r=0 \\ r\ne x}}^{n}\binom{n}{r} p^r q^{n-r}\,\frac{\sin(r-x)T}{(r-x)T} + \binom{n}{x} p^x q^{n-x}.
\end{aligned}
$$

Taking the limit as T → ∞, we get the desired result, namely

$$f(x) = \binom{n}{x} p^x q^{n-x}.$$


**(One could also use (i′ ) for calculating f (x), since φ is, clearly, periodic with period 2π.)
**
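That periodicity makes Example 1 easy to check numerically. In the sketch below (ours, not the book's), since the Binomial ch.f. has period 2π and x is an integer, the limit in inversion formula (i) reduces to an average over a single period, which a simple midpoint rule evaluates essentially exactly:

```python
import cmath
from math import comb, pi

# Numerical companion to Example 1: averaging e^{-itx} * phi(t) over one
# period of phi(t) = (p e^{it} + q)^n recovers f(x) = C(n, x) p^x q^{n-x}.
n, p = 6, 0.3
q = 1 - p

def phi(t):
    return (p * cmath.exp(1j * t) + q) ** n

def f_inverted(x, steps=256):
    # (1/(2*pi)) * integral over [-pi, pi] of e^{-itx} phi(t) dt, midpoint rule;
    # exact up to rounding because the integrand is a trigonometric polynomial.
    h = 2 * pi / steps
    s = 0j
    for k in range(steps):
        t = -pi + (k + 0.5) * h
        s += cmath.exp(-1j * t * x) * phi(t)
    return (s * h / (2 * pi)).real

recovered = [f_inverted(x) for x in range(n + 1)]
exact = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
assert max(abs(a - b) for a, b in zip(recovered, exact)) < 1e-12
```

The midpoint rule is exact here (up to rounding) because the integrand is a trigonometric polynomial of low degree.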

EXAMPLE 2

For an example of the continuous type, let X be N(0, 1). In the next section, we will see that $\varphi_X(t) = e^{-t^2/2}$. Since $|\varphi(t)| = e^{-t^2/2}$, we know that $\int_{-\infty}^{\infty}|\varphi(t)|\,dt < \infty$, so that (ii′) applies. Thus we have

$$
\begin{aligned}
f(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\varphi(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx} e^{-t^2/2}\,dt \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(t^2 + 2itx)}\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{1}{2}\left[t^2 + 2t(ix) + (ix)^2\right]}\, e^{\frac{1}{2}(ix)^2}\,dt \\
&= \frac{e^{-\frac{1}{2}x^2}}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(t + ix)^2}\,dt = \frac{e^{-\frac{1}{2}x^2}}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{1}{2}u^2}\,du = \frac{e^{-\frac{1}{2}x^2}}{2\pi}\,\sqrt{2\pi}
= \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},
\end{aligned}
$$

as was to be shown.
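Formula (ii′) can also be evaluated numerically. The sketch below (our illustration) integrates $e^{-itx}e^{-t^2/2}/(2\pi)$ over a large window; by symmetry the sine part cancels, so only the cosine part is computed, and the result matches the N(0, 1) density:

```python
from math import cos, exp, pi, sqrt

# Numerical check of inversion formula (ii') for phi(t) = e^{-t^2/2}:
# the imaginary (sine) part of e^{-itx} phi(t) is odd in t, so only the
# cosine part contributes to the integral.
def f_inverted(x, T=12.0, steps=4000):
    h = 2 * T / steps
    s = 0.0
    for k in range(steps):
        t = -T + (k + 0.5) * h
        s += cos(t * x) * exp(-t * t / 2)
    return s * h / (2 * pi)

xs = (0.0, 0.5, 1.0, 2.3)
errors = [abs(f_inverted(x) - exp(-x * x / 2) / sqrt(2 * pi)) for x in xs]
assert max(errors) < 1e-8
```

Truncating at T = 12 is safe because the Gaussian tail beyond that point is negligibly small.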

THEOREM 3

(Uniqueness Theorem) There is a one-to-one correspondence between the characteristic function and the p.d.f. of a random variable.

PROOF The p.d.f. of an r.v. determines its ch.f. through the deﬁnition of the ch.f. The converse, which is the involved part of the theorem, follows from Theorem 2. ᭡

Exercises

6.2.1 Show that for any r.v. X and every t ∈ ℝ, one has |Ee^{itX}| ≤ E|e^{itX}| (= 1). (Hint: If z = a + ib, a, b ∈ ℝ, recall that $|z| = \sqrt{a^2 + b^2}$. Also use Exercise 5.4.7 in Chapter 5 in order to conclude that (EY)² ≤ EY² for any r.v. Y.)

6.2.2 Write out detailed proofs for parts (iii) and (vii) of Theorem 1 and justify the use of Lemmas C, D.

6.2.3 For any r.v. X with ch.f. φX, show that $\varphi_{-X}(t) = \bar{\varphi}_X(t)$, t ∈ ℝ, where the bar over φX denotes conjugate, that is, if z = a + ib, a, b ∈ ℝ, then $\bar{z} = a - ib$.

6.2.4 Show that the ch.f. φX of an r.v. X is real if and only if the p.d.f. fX of X is symmetric about 0 (that is, fX(−x) = fX(x), x ∈ ℝ). (Hint: If φX is real, then the conclusion is reached by means of the previous exercise and Theorem 2. If fX is symmetric, show that f−X(x) = fX(−x), x ∈ ℝ.)


6.2.5 Let X be an r.v. with p.d.f. f and ch.f. φ given by: φ(t) = 1 − |t| if |t| ≤ 1 and φ(t) = 0 if |t| > 1. Use the appropriate inversion formula to find f.

6.2.6 Consider the r.v. X with ch.f. φ(t) = e^{−|t|}, t ∈ ℝ, and utilize Theorem 2(ii′) in order to determine the p.d.f. of X.

**6.3 The Characteristic Functions of Some Random Variables
**

In this section, the ch.f.’s of some distributions commonly occurring will be derived, both for illustrative purposes and for later use.

6.3.1

Discrete Case

1. Let X be B(n, p). Then $\varphi_X(t) = \bigl(pe^{it} + q\bigr)^n$. In fact,

$$\varphi_X(t) = \sum_{x=0}^{n} e^{itx}\binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n}\binom{n}{x}\bigl(pe^{it}\bigr)^x q^{n-x} = \bigl(pe^{it} + q\bigr)^n.$$

Hence

$$\frac{d}{dt}\varphi_X(t)\Big|_{t=0} = n\bigl(pe^{it} + q\bigr)^{n-1} ipe^{it}\Big|_{t=0} = inp,$$

so that E(X) = np. Also,

$$
\begin{aligned}
\frac{d^2}{dt^2}\varphi_X(t)\Big|_{t=0}
&= inp\,\frac{d}{dt}\Bigl[\bigl(pe^{it} + q\bigr)^{n-1} e^{it}\Bigr]\Big|_{t=0} \\
&= inp\Bigl[(n-1)\bigl(pe^{it} + q\bigr)^{n-2} ipe^{it}\, e^{it} + \bigl(pe^{it} + q\bigr)^{n-1}\, ie^{it}\Bigr]\Big|_{t=0} \\
&= i^2 np\bigl[(n-1)p + 1\bigr] = -np\bigl[(n-1)p + 1\bigr] = -E\bigl(X^2\bigr),
\end{aligned}
$$

so that

$$E\bigl(X^2\bigr) = np\bigl[(n-1)p + 1\bigr] \quad\text{and}\quad \sigma^2(X) = E\bigl(X^2\bigr) - \bigl(EX\bigr)^2 = n^2p^2 - np^2 + np - n^2p^2 = np(1-p) = npq;$$

that is, σ²(X) = npq.

2. Let X be P(λ). Then $\varphi_X(t) = e^{\lambda e^{it} - \lambda}$. In fact,

$$\varphi_X(t) = \sum_{x=0}^{\infty} e^{itx} e^{-\lambda}\frac{\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{\bigl(\lambda e^{it}\bigr)^x}{x!} = e^{-\lambda}\, e^{\lambda e^{it}} = e^{\lambda e^{it} - \lambda}.$$


Hence

$$\frac{d}{dt}\varphi_X(t)\Big|_{t=0} = e^{\lambda e^{it} - \lambda}\, i\lambda e^{it}\Big|_{t=0} = i\lambda,$$

so that E(X) = λ. Also,

$$
\frac{d^2}{dt^2}\varphi_X(t)\Big|_{t=0}
= \frac{d}{dt}\Bigl[i\lambda\, e^{\lambda e^{it} - \lambda + it}\Bigr]\Big|_{t=0}
= i\lambda\, e^{\lambda e^{it} - \lambda + it}\bigl(\lambda e^{it}\cdot i + i\bigr)\Big|_{t=0}
= i^2\lambda(\lambda + 1) = -\lambda(\lambda + 1) = -E\bigl(X^2\bigr),
$$

so that

$$\sigma^2(X) = E\bigl(X^2\bigr) - \bigl(EX\bigr)^2 = \lambda(\lambda + 1) - \lambda^2 = \lambda;$$

that is, σ²(X) = λ.
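Theorem 1(vii) can be checked numerically by finite differences. The sketch below (our illustration, not the book's) differences the Poisson ch.f. at 0 and recovers E(X) = λ and E(X²) = λ(λ + 1):

```python
import cmath

# Finite-difference check of Theorem 1(vii) for the Poisson ch.f.
# phi(t) = exp(lam*(e^{it} - 1)): phi'(0) = i*lam and phi''(0) = -lam*(lam+1),
# so E(X) = phi'(0)/i and E(X^2) = phi''(0)/i^2.
lam = 1.7

def phi(t):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

h = 1e-5
d1 = (phi(h) - phi(-h)) / (2 * h)                 # ~ phi'(0)
d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h ** 2     # ~ phi''(0)
EX = (d1 / 1j).real                               # E(X) = lam
EX2 = (d2 / 1j ** 2).real                         # E(X^2) = lam*(lam + 1)
var = EX2 - EX ** 2                               # sigma^2(X) = lam
assert abs(EX - lam) < 1e-6
assert abs(EX2 - lam * (lam + 1)) < 1e-4
assert abs(var - lam) < 1e-3
```

The second difference is accurate only to about 1e-5 here because of rounding in floating point, which is why the tolerances are loose.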

6.3.2 Continuous Case

1. Let X be N(μ, σ²). Then $\varphi_X(t) = e^{it\mu - \sigma^2 t^2/2}$, and, in particular, if X is N(0, 1), then $\varphi_X(t) = e^{-t^2/2}$. If X is N(μ, σ²), then (X − μ)/σ is N(0, 1). Thus

$$\varphi_{(X-\mu)/\sigma}(t) = \varphi_{(1/\sigma)X - (\mu/\sigma)}(t) = e^{-it\mu/\sigma}\varphi_X\bigl(t/\sigma\bigr), \quad\text{and}\quad \varphi_X\bigl(t/\sigma\bigr) = e^{it\mu/\sigma}\varphi_{(X-\mu)/\sigma}(t).$$

So it suffices to find the ch.f. of an N(0, 1) r.v. Y, say. Now

$$
\varphi_Y(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ity} e^{-y^2/2}\,dy
= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(y^2 - 2ity)/2}\,dy
= e^{-t^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(y-it)^2/2}\,dy = e^{-t^2/2}.
$$

Hence $\varphi_X(t/\sigma) = e^{it\mu/\sigma}\, e^{-t^2/2}$ and replacing t/σ by t, we get, finally:

$$\varphi_X(t) = \exp\Bigl(it\mu - \frac{\sigma^2 t^2}{2}\Bigr).$$


Hence

$$\frac{d}{dt}\varphi_X(t)\Big|_{t=0} = \exp\Bigl(it\mu - \frac{\sigma^2 t^2}{2}\Bigr)\bigl(i\mu - \sigma^2 t\bigr)\Big|_{t=0} = i\mu, \quad\text{so that } E(X) = \mu,$$

and

$$\frac{d^2}{dt^2}\varphi_X(t)\Big|_{t=0} = \Bigl[\exp\Bigl(it\mu - \frac{\sigma^2 t^2}{2}\Bigr)\bigl(i\mu - \sigma^2 t\bigr)^2 - \sigma^2\exp\Bigl(it\mu - \frac{\sigma^2 t^2}{2}\Bigr)\Bigr]\Big|_{t=0} = i^2\mu^2 - \sigma^2 = i^2\bigl(\mu^2 + \sigma^2\bigr).$$

Then E(X²) = μ² + σ² and σ²(X) = μ² + σ² − μ² = σ².

2. Let X be Gamma distributed with parameters α and β. Then $\varphi_X(t) = (1 - i\beta t)^{-\alpha}$. In fact,

$$\varphi_X(t) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty e^{itx} x^{\alpha-1} e^{-x/\beta}\,dx = \frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty x^{\alpha-1} e^{-x(1 - i\beta t)/\beta}\,dx.$$

Setting x(1 − iβt) = y, we get

$$x = \frac{y}{1 - i\beta t}, \qquad dx = \frac{dy}{1 - i\beta t}, \qquad y \in [0, \infty).$$

Hence the above expression becomes

$$\frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty \frac{y^{\alpha-1}}{(1 - i\beta t)^{\alpha-1}}\, e^{-y/\beta}\,\frac{dy}{1 - i\beta t} = \frac{1}{(1 - i\beta t)^\alpha}\cdot\frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty y^{\alpha-1} e^{-y/\beta}\,dy = \bigl(1 - i\beta t\bigr)^{-\alpha}.$$

Therefore

$$\frac{d}{dt}\varphi_X(t)\Big|_{t=0} = \frac{i\alpha\beta}{(1 - i\beta t)^{\alpha+1}}\Big|_{t=0} = i\alpha\beta,$$

so that E(X) = αβ, and

$$\frac{d^2}{dt^2}\varphi_X(t)\Big|_{t=0} = \frac{i^2\alpha(\alpha+1)\beta^2}{(1 - i\beta t)^{\alpha+2}}\Big|_{t=0} = i^2\alpha(\alpha+1)\beta^2,$$

so that E(X²) = α(α + 1)β². Thus σ²(X) = α(α + 1)β² − α²β² = αβ². For α = r/2, β = 2, we get the corresponding quantities for $\chi^2_r$, and for α = 1, β = 1/λ, we get the corresponding quantities for the Negative Exponential distribution. So

$$\varphi_X(t) = (1 - 2it)^{-r/2}, \qquad \varphi_X(t) = \Bigl(1 - \frac{it}{\lambda}\Bigr)^{-1} = \frac{\lambda}{\lambda - it},$$

respectively.

3. Let X be Cauchy distributed with μ = 0 and σ = 1. Then $\varphi_X(t) = e^{-|t|}$. In fact,

$$
\varphi_X(t) = \int_{-\infty}^{\infty} e^{itx}\,\frac{1}{\pi}\,\frac{1}{1 + x^2}\,dx
= \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\cos(tx)}{1 + x^2}\,dx + \frac{i}{\pi}\int_{-\infty}^{\infty}\frac{\sin(tx)}{1 + x^2}\,dx
= \frac{2}{\pi}\int_0^\infty \frac{\cos(tx)}{1 + x^2}\,dx,
$$

because $\int_{-\infty}^{\infty}\frac{\sin(tx)}{1 + x^2}\,dx = 0$, since sin(tx) is an odd function, and cos(tx) is an even function. Further, it can be shown by complex variables theory that

$$\int_0^\infty \frac{\cos(tx)}{1 + x^2}\,dx = \frac{\pi}{2}\, e^{-|t|}.$$

Hence

$$\varphi_X(t) = e^{-|t|}.$$

Now

$$\frac{d}{dt}\varphi_X(t) = \frac{d}{dt}\, e^{-|t|}$$

does not exist for t = 0. This is consistent with the fact of nonexistence of E(X), as has been seen in Chapter 5.

Exercises

6.3.1 Let X be an r.v. with p.d.f. f given in Exercise 3.2.13 of Chapter 3. Derive its ch.f. φ, and calculate EX, E[X(X − 1)], σ 2(X ), provided they are ﬁnite. 6.3.2 Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3. Derive its ch.f. φ, and calculate EX, E[X(X − 1)], σ 2(X ), provided they are ﬁnite. 6.3.3 Let X be an r.v. with p.d.f. f given by f(x) = λe−λ(x−α) I(α,∞)(x). Find its ch.f. φ, and calculate EX, σ 2(X ), provided they are ﬁnite.


6.3.4 Let X be an r.v. distributed as Negative Binomial with parameters r and p.

i) Show that its ch.f., φ, is given by

$$\varphi(t) = \frac{p^r}{\bigl(1 - qe^{it}\bigr)^r};$$

ii) By differentiating φ, show that EX = rq/p and σ²(X) = rq/p²;
iii) Find the quantities mentioned in (i) and (ii) for the Geometric distribution.

6.3.5 Let X be an r.v. distributed as U(α, β).

i) Show that its ch.f., φ, is given by

$$\varphi(t) = \frac{e^{it\beta} - e^{it\alpha}}{it(\beta - \alpha)};$$

ii) By differentiating φ, show that $EX = \dfrac{\alpha + \beta}{2}$ and $\sigma^2(X) = \dfrac{(\alpha - \beta)^2}{12}$.

6.3.6 Consider the r.v. X with p.d.f. f given in Exercise 3.3.14(ii) of Chapter 3, and by using the ch.f. of X, calculate EX n, n = 1, 2, . . . , provided they are ﬁnite.

**6.4 Deﬁnitions and Basic Theorems—The Multidimensional Case
**

In this section, versions of Theorems 1, 2 and 3 are presented for the case that the r.v. X is replaced by a k-dimensional r. vector X. Their interpretation, usefulness and usage are analogous to the ones given in the one-dimensional case. To this end, let now X = (X1, . . . , Xk)′ be a random vector. Then the ch.f. of the r. vector X, or the joint ch.f. of the r.v.'s X1, . . . , Xk, denoted by φX or φX1, . . . , Xk, is defined as follows:

$$\varphi_{X_1, \ldots, X_k}\bigl(t_1, \ldots, t_k\bigr) = E\bigl[e^{it_1X_1 + it_2X_2 + \cdots + it_kX_k}\bigr], \qquad t_j \in \mathbb{R},\ j = 1, 2, \ldots, k.$$

The ch.f. φX1, . . . , Xk always exists by an obvious generalization of Lemmas A, A′ and B, B′. The joint ch.f. φX1, . . . , Xk satisfies properties analogous to properties (i)–(vii). That is, one has

THEOREM 1′

(Some properties of ch.f.’s) i′) φX1, . . . , Xk(0, . . . , 0) = 1. ii′) |φX1, . . . , Xk(t1, . . . , tk)| ≤ 1. iii′) φX1, . . . , Xk is uniformly continuous.


iv′) $\varphi_{X_1+d_1, \ldots, X_k+d_k}(t_1, \ldots, t_k) = e^{it_1d_1 + \cdots + it_kd_k}\,\varphi_{X_1, \ldots, X_k}(t_1, \ldots, t_k)$.
v′) $\varphi_{c_1X_1, \ldots, c_kX_k}(t_1, \ldots, t_k) = \varphi_{X_1, \ldots, X_k}(c_1t_1, \ldots, c_kt_k)$.
vi′) $\varphi_{c_1X_1+d_1, \ldots, c_kX_k+d_k}(t_1, \ldots, t_k) = e^{it_1d_1 + \cdots + it_kd_k}\,\varphi_{X_1, \ldots, X_k}(c_1t_1, \ldots, c_kt_k)$.
vii′) If the absolute (n1, . . . , nk)-joint moment, as well as all lower order joint moments of X1, . . . , Xk are finite, then

$$\frac{\partial^{n_1 + \cdots + n_k}}{\partial t_1^{n_1}\cdots\partial t_k^{n_k}}\,\varphi_{X_1, \ldots, X_k}\bigl(t_1, \ldots, t_k\bigr)\Big|_{t_1 = \cdots = t_k = 0} = i^{\sum_{j=1}^{k} n_j}\, E\bigl(X_1^{n_1}\cdots X_k^{n_k}\bigr),$$

and, in particular,

$$\frac{\partial^n}{\partial t_j^n}\,\varphi_{X_1, \ldots, X_k}\bigl(t_1, \ldots, t_k\bigr)\Big|_{t_1 = \cdots = t_k = 0} = i^n E\bigl(X_j^n\bigr), \qquad j = 1, 2, \ldots, k.$$

viii) If in the φX1, . . . , Xk(t1, . . . , tk) we set tj1 = · · · = tjn = 0, then the resulting expression is the joint ch.f. of the r.v.'s Xi1, . . . , Xim, where the j's and the i's are different and m + n = k.

Multidimensional versions of Theorem 2 and Theorem 3 also hold true. We give their formulations below.

THEOREM 2 ′

(Inversion formula) Let X = (X1, . . . , Xk)′ be an r. vector with p.d.f. f and ch.f. φ. Then

$$\text{i)}\qquad f_{X_1, \ldots, X_k}\bigl(x_1, \ldots, x_k\bigr) = \lim_{T\to\infty}\Bigl(\frac{1}{2T}\Bigr)^k \int_{-T}^{T}\cdots\int_{-T}^{T} e^{-it_1x_1 - \cdots - it_kx_k}\,\varphi_{X_1, \ldots, X_k}\bigl(t_1, \ldots, t_k\bigr)\,dt_1\cdots dt_k,$$

if X is of the discrete type, and

$$\text{ii)}\qquad f_{X_1, \ldots, X_k}\bigl(x_1, \ldots, x_k\bigr) = \lim_{h\to 0}\lim_{T\to\infty}\Bigl(\frac{1}{2\pi}\Bigr)^k \int_{-T}^{T}\cdots\int_{-T}^{T} \prod_{j=1}^{k}\Bigl(\frac{1 - e^{-it_jh}}{it_jh}\Bigr)\, e^{-it_1x_1 - \cdots - it_kx_k}\,\varphi_{X_1, \ldots, X_k}\bigl(t_1, \ldots, t_k\bigr)\,dt_1\cdots dt_k,$$

if X is of the continuous type, with the analog of (ii′) holding if the integral of $|\varphi_{X_1, \ldots, X_k}(t_1, \ldots, t_k)|$ is finite.

THEOREM 3 ′

(Uniqueness Theorem) There is a one-to-one correspondence between the ch.f. and the p.d.f. of an r. vector.

PROOFS The justification of Theorem 1′ is entirely analogous to that given for Theorem 1, and so is the proof of Theorem 2′. As for Theorem 3′, the fact that the p.d.f. of X determines its ch.f. follows from the definition of the ch.f. That the ch.f. of X determines its p.d.f. follows from Theorem 2′. ᭡


6.4.1

The Ch.f. of the Multinomial Distribution

Let X = (X1, . . . , Xk)′ be Multinomially distributed; that is,

$$P\bigl(X_1 = x_1, \ldots, X_k = x_k\bigr) = \frac{n!}{x_1!\cdots x_k!}\, p_1^{x_1}\cdots p_k^{x_k}.$$

Then

$$\varphi_{X_1, \ldots, X_k}\bigl(t_1, \ldots, t_k\bigr) = \bigl(p_1e^{it_1} + \cdots + p_ke^{it_k}\bigr)^n.$$

In fact,

$$
\varphi_{X_1, \ldots, X_k}\bigl(t_1, \ldots, t_k\bigr)
= \sum_{x_1, \ldots, x_k} e^{it_1x_1 + \cdots + it_kx_k}\,\frac{n!}{x_1!\cdots x_k!}\, p_1^{x_1}\cdots p_k^{x_k}
= \sum_{x_1, \ldots, x_k} \frac{n!}{x_1!\cdots x_k!}\bigl(p_1e^{it_1}\bigr)^{x_1}\cdots\bigl(p_ke^{it_k}\bigr)^{x_k}
= \bigl(p_1e^{it_1} + \cdots + p_ke^{it_k}\bigr)^n.
$$

Hence

$$
\frac{\partial^k}{\partial t_1\cdots\partial t_k}\,\varphi_{X_1, \ldots, X_k}\bigl(t_1, \ldots, t_k\bigr)\Big|_{t_1 = \cdots = t_k = 0}
= n(n-1)\cdots(n-k+1)\, i^k p_1\cdots p_k \bigl(p_1e^{it_1} + \cdots + p_ke^{it_k}\bigr)^{n-k}\Big|_{t_1 = \cdots = t_k = 0}
= i^k n(n-1)\cdots(n-k+1)\, p_1p_2\cdots p_k.
$$

Hence

$$E\bigl(X_1\cdots X_k\bigr) = n(n-1)\cdots(n-k+1)\, p_1p_2\cdots p_k.$$

Finally, the ch.f. of a (measurable) function g(X) of the r. vector X = (X1, . . . , Xk)′ is defined by:

$$
\varphi_{g(\mathbf{X})}(t) = E\bigl[e^{itg(\mathbf{X})}\bigr] =
\begin{cases}
\displaystyle\sum_{\mathbf{x}} e^{itg(\mathbf{x})} f(\mathbf{x}), & \mathbf{x} = \bigl(x_1, \ldots, x_k\bigr)' \\[2mm]
\displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{itg(x_1, \ldots, x_k)} f\bigl(x_1, \ldots, x_k\bigr)\,dx_1\cdots dx_k.
\end{cases}
$$

Exercise

6.4.1 (Cramér–Wold) Consider the r.v.'s Xj, j = 1, . . . , k and for cj ∈ ℝ, j = 1, . . . , k, set

$$Y_c = \sum_{j=1}^{k} c_j X_j.$$

Then

i) Show that $\varphi_{Y_c}(t) = \varphi_{X_1, \ldots, X_k}(c_1t, \ldots, c_kt)$, t ∈ ℝ, and $\varphi_{X_1, \ldots, X_k}(c_1, \ldots, c_k) = \varphi_{Y_c}(1)$;
ii) Conclude that the distribution of the X's determines the distribution of Yc for every cj ∈ ℝ, j = 1, . . . , k. Conversely, the distribution of the X's is determined by the distribution of Yc for every cj ∈ ℝ, j = 1, . . . , k.

**6.5 The Moment Generating Function and Factorial Moment Generating Function of a Random Variable
**

The ch.f. of an r.v. or an r. vector is a function defined on the entire real line and taking values in the complex plane. Those readers who are not well versed in matters related to complex-valued functions may feel uncomfortable in dealing with ch.f.'s. There is a partial remedy to this potential problem, and that is to replace a ch.f. by an entity which is called the moment generating function. However, there is a price to be paid for this: namely, a moment generating function may exist (in the sense of being finite) only for t = 0. There are cases where it exists for t's lying in a proper subset of ℝ (containing 0), and yet other cases, where the moment generating function exists for all real t. All three cases will be illustrated by examples below.

First, consider the case of an r.v. X. Then the moment generating function (m.g.f.) MX (or just M when no confusion is possible) of a random variable X, which is also called the Laplace transform of f, is defined by MX(t) = E(e^{tX}), t ∈ ℝ, if this expectation exists. For t = 0, MX(0) always exists and equals 1. However, it may fail to exist for t ≠ 0. If MX(t) exists, then formally φX(t) = MX(it) and therefore the m.g.f. satisfies most of the properties analogous to properties (i)–(vii) cited above in connection with the ch.f., under suitable conditions. In particular, property (vii) in Theorem 1 yields $\frac{d^n}{dt^n}M_X(t)\big|_{t=0} = E(X^n)$, provided Lemma D applies. In fact,

$$\frac{d^n}{dt^n}M_X(t)\Big|_{t=0} = \frac{d^n}{dt^n}E\bigl(e^{tX}\bigr)\Big|_{t=0} = E\Bigl(\frac{d^n}{dt^n}\,e^{tX}\Bigr)\Big|_{t=0} = E\bigl(X^n e^{tX}\bigr)\Big|_{t=0} = E\bigl(X^n\bigr).$$

This is the property from which the m.g.f. derives its name.
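The moment-producing property can be verified numerically before turning to the examples. The sketch below (ours, not the book's) differences the B(n, p) m.g.f. at t = 0 and recovers E(X) = np and E(X²) = n(n − 1)p² + np:

```python
from math import exp

# Numeric check of the moment-producing property for the B(n, p) m.g.f.
# M(t) = (p e^t + q)^n: the first two derivatives at t = 0 give E(X) and E(X^2).
n, p = 5, 0.4
q = 1 - p

def M(t):
    return (p * exp(t) + q) ** n

h = 1e-5
EX = (M(h) - M(-h)) / (2 * h)                 # ~ M'(0) = n*p
EX2 = (M(h) - 2 * M(0) + M(-h)) / h ** 2      # ~ M''(0) = n*(n-1)*p^2 + n*p
var = EX2 - EX ** 2                           # ~ n*p*q
assert abs(EX - n * p) < 1e-6
assert abs(EX2 - (n * (n - 1) * p ** 2 + n * p)) < 1e-3
assert abs(var - n * p * q) < 1e-3
```

Since the m.g.f. is real-valued, no complex arithmetic is needed, which is precisely the practical advantage mentioned above.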

Here are some examples of m.g.f.'s. It is instructive to derive them in order to see how conditions are imposed on t in order for the m.g.f. to be finite. It so happens that part (vii) of Theorem 1, as it would be formulated for an m.g.f., is applicable in all these examples, although no justification will be supplied.

6.5.1 The M.G.F.'s of Some R.V.'s

1. If X ∼ B(n, p), then $M_X(t) = \bigl(pe^t + q\bigr)^n$, t ∈ ℝ. In fact,

$$M_X(t) = \sum_{x=0}^{n} e^{tx}\binom{n}{x}p^xq^{n-x} = \sum_{x=0}^{n}\binom{n}{x}\bigl(pe^t\bigr)^xq^{n-x} = \bigl(pe^t + q\bigr)^n,$$

which, clearly, is finite for all t ∈ ℝ. Then

$$\frac{d}{dt}M_X(t)\Big|_{t=0} = n\bigl(pe^t + q\bigr)^{n-1}pe^t\Big|_{t=0} = np = E(X),$$

and

$$
\begin{aligned}
\frac{d^2}{dt^2}M_X(t)\Big|_{t=0}
&= \frac{d}{dt}\Bigl[np\bigl(pe^t + q\bigr)^{n-1}e^t\Bigr]\Big|_{t=0}
= np\Bigl[(n-1)\bigl(pe^t + q\bigr)^{n-2}pe^t\,e^t + \bigl(pe^t + q\bigr)^{n-1}e^t\Bigr]\Big|_{t=0} \\
&= n(n-1)p^2 + np = n^2p^2 - np^2 + np = E\bigl(X^2\bigr),
\end{aligned}
$$

so that σ²(X) = n²p² − np² + np − n²p² = np(1 − p) = npq.

2. If X ∼ P(λ), then $M_X(t) = e^{\lambda e^t - \lambda}$, t ∈ ℝ. In fact,

$$M_X(t) = \sum_{x=0}^{\infty} e^{tx}e^{-\lambda}\frac{\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{\bigl(\lambda e^t\bigr)^x}{x!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda e^t - \lambda},$$

which, clearly, is finite for all t ∈ ℝ. Then

$$\frac{d}{dt}M_X(t)\Big|_{t=0} = \lambda e^t e^{\lambda e^t - \lambda}\Big|_{t=0} = \lambda = E(X),$$

and

$$\frac{d^2}{dt^2}M_X(t)\Big|_{t=0} = \frac{d}{dt}\Bigl[\lambda e^t e^{\lambda e^t - \lambda}\Bigr]\Big|_{t=0} = \lambda\Bigl[e^t e^{\lambda e^t - \lambda} + e^t e^{\lambda e^t - \lambda}\,\lambda e^t\Bigr]\Big|_{t=0} = \lambda(1 + \lambda) = E\bigl(X^2\bigr),$$

so that σ²(X) = λ + λ² − λ² = λ.

3. If X ∼ N(μ, σ²), then $M_X(t) = e^{t\mu + \sigma^2t^2/2}$, t ∈ ℝ; in particular, if X ∼ N(0, 1), then $M_X(t) = e^{t^2/2}$. If X is N(μ, σ²), then (X − μ)/σ is N(0, 1). By the property for m.g.f.'s analogous to property (vi) in Theorem 1,

$$M_{(X-\mu)/\sigma}(t) = e^{-\mu t/\sigma}M_X\Bigl(\frac{t}{\sigma}\Bigr), \quad\text{so that}\quad M_X\Bigl(\frac{t}{\sigma}\Bigr) = e^{\mu t/\sigma}M_{(X-\mu)/\sigma}(t)$$

for all t ∈ ℝ. Since, for an N(0, 1) r.v., $M(t) = e^{t^2/2}$, we obtain

$$M_X\Bigl(\frac{t}{\sigma}\Bigr) = e^{\mu t/\sigma}\,e^{t^2/2} = e^{\frac{\mu t}{\sigma} + \frac{t^2}{2}}.$$

Replacing t by σt, we get, finally,

$$M_X(t) = e^{\mu t + \frac{\sigma^2t^2}{2}}.$$

Then

$$\frac{d}{dt}M_X(t)\Big|_{t=0} = \bigl(\mu + \sigma^2t\bigr)e^{\mu t + \frac{\sigma^2t^2}{2}}\Big|_{t=0} = \mu = E(X),$$

and

$$\frac{d^2}{dt^2}M_X(t)\Big|_{t=0} = \Bigl[\sigma^2e^{\mu t + \frac{\sigma^2t^2}{2}} + \bigl(\mu + \sigma^2t\bigr)^2e^{\mu t + \frac{\sigma^2t^2}{2}}\Bigr]\Big|_{t=0} = \sigma^2 + \mu^2 = E\bigl(X^2\bigr),$$

so that σ²(X) = σ² + μ² − μ² = σ².

4. If X is distributed as Gamma with parameters α and β, then $M_X(t) = (1 - \beta t)^{-\alpha}$, t < 1/β. Indeed,

$$M_X(t) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty e^{tx}x^{\alpha-1}e^{-x/\beta}\,dx = \frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty x^{\alpha-1}e^{-x(1-\beta t)/\beta}\,dx.$$

Then by setting x(1 − βt) = y, so that $x = \frac{y}{1-\beta t}$, $dx = \frac{dy}{1-\beta t}$, y ∈ [0, ∞), provided 1 − βt > 0, or equivalently, t < 1/β, the above expression is equal to

$$\frac{1}{(1-\beta t)^\alpha}\cdot\frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty y^{\alpha-1}e^{-y/\beta}\,dy = \frac{1}{(1-\beta t)^\alpha}.$$

Then

$$\frac{d}{dt}M_X(t)\Big|_{t=0} = \alpha\beta\bigl(1-\beta t\bigr)^{-\alpha-1}\Big|_{t=0} = \alpha\beta = E(X),$$

156   6   Characteristic Functions, Moment Generating Functions and Related Theorems

    (d²/dt²) M_X(t)|_{t=0} = αβ (d/dt)(1 − βt)^{−α−1} |_{t=0} = α(α + 1)β² (1 − βt)^{−α−2} |_{t=0} = α(α + 1)β² = E X²,

so that σ²(X) = α(α + 1)β² − α²β² = αβ².

In particular, for α = r/2 and β = 2, we obtain the m.g.f. of the χ²_r r.v., namely

    M_X(t) = (1 − 2t)^{−r/2},   t < 1/2,

and E X = r, σ²(X) = 2r. For α = 1 and β = 1/λ, we get the m.g.f. of the Negative Exponential distribution, namely

    M_X(t) = λ/(λ − t),   t < λ,

and E X = 1/λ, σ²(X) = 1/λ².

3. Let X have the Cauchy distribution with parameters μ and σ, and without loss of generality, let μ = 0, σ = 1. Then M_X(t) exists only for t = 0. In fact,

    M_X(t) = E e^{tX} = (1/π) ∫_{−∞}^∞ e^{tx} [1/(1 + x²)] dx
           > (1/π) ∫₀^∞ e^{tx} [1/(1 + x²)] dx > (1/π) ∫₀^∞ tx [1/(1 + x²)] dx

if t > 0, since e^z > z for z > 0, and this equals

    (t/2π) ∫₀^∞ [2x/(1 + x²)] dx = (t/2π) ∫₁^∞ du/u = (t/2π) lim_{x→∞} log(1 + x²) = ∞.

Thus for t > 0, M_X(t) = ∞. If t < 0, by using the limits −∞, 0 in the integral, we again reach the conclusion that M_X(t) = ∞ (see Exercise 6.5.9).

The examples just discussed exhibit all three cases regarding the existence or nonexistence of an m.g.f.: the m.g.f.'s of Examples 1 and 2 of the discrete case and of Example 1 of the continuous case exist (are finite) for all t ∈ ℝ; the m.g.f. of Example 2 of the continuous case exists for t in a proper subset of ℝ; and the m.g.f. of Example 3 of the continuous case exists only for t = 0.

REMARK 4   For an r.v. X, we also define what is known as its factorial moment generating function. More precisely, the factorial m.g.f. η_X (or just η when no confusion is possible) of an r.v. X is defined by:

    η_X(t) = E t^X,   t ∈ ℝ,   if E t^X exists.

This function is sometimes referred to as the Mellin or Mellin–Stieltjes transform of f. Clearly, η_X(t) = M_X(log t) for t > 0. Formally, the nth factorial moment of an r.v. X is taken from its factorial m.g.f. by differentiation as follows:
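The Gamma moments just derived, and the χ²_r special case α = r/2, β = 2, can be confirmed symbolically; a sketch (not part of the original text):

```python
import sympy as sp

t, a, b = sp.symbols('t alpha beta', positive=True)

# Gamma(alpha, beta) m.g.f., valid for t < 1/beta
M = (1 - b*t)**(-a)

EX  = sp.diff(M, t).subs(t, 0)        # alpha*beta
EX2 = sp.diff(M, t, 2).subs(t, 0)     # alpha*(alpha + 1)*beta**2
var = sp.simplify(EX2 - EX**2)        # reduces to alpha*beta**2

# chi-square_r special case: alpha = r/2, beta = 2 gives mean r, variance 2r
r = sp.Symbol('r', positive=True)
EX_chi  = EX.subs({a: r/2, b: 2})
var_chi = var.subs({a: r/2, b: 2})

assert sp.simplify(EX - a*b) == 0
assert sp.simplify(var - a*b**2) == 0
```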

6.5  The Moment Generating Function   157

In fact,

    (dⁿ/dtⁿ) η_X(t) = (dⁿ/dtⁿ) E t^X = E[(∂ⁿ/∂tⁿ) t^X] = E[X(X − 1) · · · (X − n + 1) t^{X−n}],

provided Lemma D applies, so that the interchange of the order of differentiation and expectation is valid. Hence

    (dⁿ/dtⁿ) η_X(t)|_{t=1} = E[X(X − 1) · · · (X − n + 1)].   (9)

REMARK 5   The factorial m.g.f. derives its name from the property just established. As has already been seen in the first two examples in Section 2 of Chapter 5, factorial moments are especially valuable in calculating the variance of discrete r.v.'s. Indeed, since σ²(X) = E X² − (E X)², and E X² = E[X(X − 1)] + E X, we get

    σ²(X) = E[X(X − 1)] + E X − (E X)²,

that is, an expression of the variance of X in terms of derivatives of its factorial m.g.f. up to order two. Below we derive some factorial m.g.f.'s. Property (9) (for n = 2) is valid in all these examples, although no specific justification will be provided.

6.5.2  The Factorial M.G.F.'s of Some R.V.'s

1. If X ~ B(n, p), then η_X(t) = (pt + q)^n, t ∈ ℝ. In fact,

    η_X(t) = Σ_{x=0}^n t^x (n choose x) p^x q^{n−x} = Σ_{x=0}^n (n choose x) (pt)^x q^{n−x} = (pt + q)^n.

Then

    (d²/dt²) η_X(t)|_{t=1} = n(n − 1)p² (pt + q)^{n−2} |_{t=1} = n(n − 1)p²,

so that σ²(X) = n(n − 1)p² + np − n²p² = npq.

2. If X ~ P(λ), then η_X(t) = e^{λt − λ}, t ∈ ℝ. In fact,

    η_X(t) = Σ_{x=0}^∞ t^x e^{−λ} λ^x/x! = e^{−λ} Σ_{x=0}^∞ (λt)^x/x! = e^{−λ} e^{λt} = e^{λt − λ}.
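Relation (9) for n = 2, combined with σ²(X) = E[X(X − 1)] + E X − (E X)², can be verified symbolically for both examples above; a sketch (not part of the original text):

```python
import sympy as sp

t, n, p, lam = sp.symbols('t n p lambda', positive=True)
q = 1 - p

# Binomial factorial m.g.f. eta(t) = (pt + q)^n
eta_bin = (p*t + q)**n
fact2 = sp.diff(eta_bin, t, 2).subs(t, 1)     # E[X(X-1)] = n(n-1)p^2
mean  = sp.diff(eta_bin, t).subs(t, 1)        # E X = np
var_bin = sp.simplify(fact2 + mean - mean**2) # relation (9) with n = 2: npq

# Poisson factorial m.g.f. eta(t) = e^{lambda t - lambda}
eta_poi = sp.exp(lam*t - lam)
m_poi = sp.diff(eta_poi, t).subs(t, 1)        # lambda
f_poi = sp.diff(eta_poi, t, 2).subs(t, 1)     # lambda^2
var_poi = sp.simplify(f_poi + m_poi - m_poi**2)  # lambda

assert sp.simplify(var_bin - n*p*q) == 0
assert sp.simplify(var_poi - lam) == 0
```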

158   6   Characteristic Functions, Moment Generating Functions and Related Theorems

Hence

    (d²/dt²) η_X(t)|_{t=1} = λ² e^{λt − λ} |_{t=1} = λ²,

so that σ²(X) = λ² + λ − λ² = λ.

6.5.3  The M.G.F.'s of Some R. Vectors

The m.g.f. of an r. vector X or the joint m.g.f. of the r.v.'s X1, . . . , Xk, denoted by M_X or M_{X1, . . . , Xk}, is defined by:

    M_{X1, . . . , Xk}(t1, . . . , tk) = E e^{t1X1 + · · · + tkXk},

for those tj's in ℝ for which this expectation exists. If M_{X1, . . . , Xk}(t1, . . . , tk) exists, then formally φ_{X1, . . . , Xk}(t1, . . . , tk) = M_{X1, . . . , Xk}(it1, . . . , itk), and properties analogous to (i′)–(vii′), (viii) in Theorem 1′ hold true under suitable conditions. In particular,

    ∂^{n1 + · · · + nk} M_{X1, . . . , Xk}(t1, . . . , tk)/∂t1^{n1} · · · ∂tk^{nk} |_{t1 = · · · = tk = 0} = E(X1^{n1} · · · Xk^{nk}),   (10)

where n1, . . . , nk are non-negative integers. Below, we present two examples of m.g.f.'s of r. vectors.

1. If the r.v.'s X1, . . . , Xk have jointly the Multinomial distribution with parameters n and p1, . . . , pk, then

    M_{X1, . . . , Xk}(t1, . . . , tk) = (p1e^{t1} + · · · + pke^{tk})^n,   tj ∈ ℝ, j = 1, . . . , k.

In fact,

    M_{X1, . . . , Xk}(t1, . . . , tk) = E e^{t1X1 + · · · + tkXk}
      = Σ e^{t1x1 + · · · + tkxk} [n!/(x1! · · · xk!)] p1^{x1} · · · pk^{xk}
      = Σ [n!/(x1! · · · xk!)] (p1e^{t1})^{x1} · · · (pke^{tk})^{xk}
      = (p1e^{t1} + · · · + pke^{tk})^n,

where the summation is over all integers x1, . . . , xk ≥ 0 with x1 + · · · + xk = n. Clearly, the above derivations hold true for all tj ∈ ℝ, j = 1, . . . , k.

2. If the r.v.'s X1 and X2 have the Bivariate Normal distribution with parameters μ1, μ2, σ1², σ2² and ρ, then their joint m.g.f. is

    M_{X1,X2}(t1, t2) = exp[μ1t1 + μ2t2 + (1/2)(σ1²t1² + 2ρσ1σ2t1t2 + σ2²t2²)],   t1, t2 ∈ ℝ.   (11)

An analytical derivation of this formula is possible, but we prefer to use the matrix approach, which is more elegant and compact. Recall that the joint p.d.f. of X1 and X2 is given by

6.5  The Moment Generating Function   159

    f(x1, x2) = [1/(2πσ1σ2√(1 − ρ²))] e^{−q/2},

where

    q = [1/(1 − ρ²)] [((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)²].

Set x = (x1 x2)′, μ = (μ1 μ2)′, and

    Σ = ( σ1²      ρσ1σ2 )
        ( ρσ1σ2   σ2²    ).

Then the determinant of Σ, |Σ|, is |Σ| = σ1²σ2²(1 − ρ²), and the inverse, Σ⁻¹, is

    Σ⁻¹ = (1/|Σ|) (  σ2²      −ρσ1σ2 )
                  ( −ρσ1σ2     σ1²   ).

Therefore

    (x − μ)′Σ⁻¹(x − μ)
      = (1/|Σ|) (x1 − μ1, x2 − μ2) (  σ2²     −ρσ1σ2 ) ( x1 − μ1 )
                                   ( −ρσ1σ2    σ1²   ) ( x2 − μ2 )
      = [1/(σ1²σ2²(1 − ρ²))] [σ2²(x1 − μ1)² − 2ρσ1σ2(x1 − μ1)(x2 − μ2) + σ1²(x2 − μ2)²]
      = [1/(1 − ρ²)] [((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)²].

Therefore the p.d.f. is written as follows in matrix notation:

    f(x) = [1/(2π|Σ|^{1/2})] exp[−(1/2)(x − μ)′Σ⁻¹(x − μ)].

In this form, μ is the mean vector of X = (X1 X2)′, and Σ is the covariance matrix of X. Next, for t = (t1 t2)′, we have

    M_X(t) = E e^{t′X} = ∫_{ℝ²} exp(t′x) f(x) dx
           = [1/(2π|Σ|^{1/2})] ∫_{ℝ²} exp[t′x − (1/2)(x − μ)′Σ⁻¹(x − μ)] dx.   (12)

The exponent may be written as follows:

160   6   Characteristic Functions, Moment Generating Functions and Related Theorems

    t′x − (1/2)(x − μ)′Σ⁻¹(x − μ)
      = (μ′t + (1/2) t′Σt) − (1/2)[2μ′t + t′Σt − 2t′x + (x − μ)′Σ⁻¹(x − μ)].   (13)

Focus on the quantity in the bracket, carry out the multiplication, and observe that Σ′ = Σ, (Σ⁻¹)′ = Σ⁻¹, μ′t = t′μ, x′t = t′x, and x′Σ⁻¹μ = μ′Σ⁻¹x, to obtain

    2μ′t + t′Σt − 2t′x + (x − μ)′Σ⁻¹(x − μ) = [x − (μ + Σt)]′ Σ⁻¹ [x − (μ + Σt)].   (14)

By means of (13) and (14), the m.g.f. in (12) becomes

    M_X(t) = exp(μ′t + (1/2) t′Σt) · [1/(2π|Σ|^{1/2})] ∫_{ℝ²} exp{−(1/2)[x − (μ + Σt)]′Σ⁻¹[x − (μ + Σt)]} dx.

However, the second factor above is equal to 1, since it is the integral of a Bivariate Normal distribution with mean vector μ + Σt and covariance matrix Σ. Thus

    M_X(t) = exp(μ′t + (1/2) t′Σt).   (15)

Observing that

    t′Σt = (t1 t2) ( σ1²      ρσ1σ2 ) ( t1 ) = σ1²t1² + 2ρσ1σ2t1t2 + σ2²t2²,
                   ( ρσ1σ2    σ2²   ) ( t2 )

it follows that the m.g.f. given by (15) is, indeed, the m.g.f. given by (11).

Exercises

6.5.1   Derive the m.g.f. of the r.v. X which denotes the number of spots that turn up when a balanced die is rolled.

6.5.2   Let X be an r.v. with p.d.f. f given in Exercise 3.2.13 of Chapter 3. Derive its m.g.f. and factorial m.g.f., M(t) and η(t), respectively, for those t's for which they exist. Then calculate EX, E[X(X − 1)] and σ²(X), provided they are finite.

6.5.3   Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3. Derive its m.g.f. and factorial m.g.f., M(t) and η(t), respectively, for those t's for which they exist. Then calculate EX, E[X(X − 1)] and σ²(X), provided they are finite.

Exercises   161

6.5.4   Let X be an r.v. distributed as B(n, p). Use its factorial m.g.f. in order to calculate its kth factorial moment. Compare with Exercise 5.2.1 in Chapter 5.

6.5.5   Let X be an r.v. distributed as P(λ). Use its factorial m.g.f. in order to calculate its kth factorial moment. Compare with Exercise 5.2.4 in Chapter 5.

6.5.6   Let X be an r.v. distributed as Negative Binomial with parameters r and p.
i) Show that its m.g.f. and factorial m.g.f. are given, respectively, by

    M_X(t) = p^r/(1 − qe^t)^r,  t < −log q,   and   η_X(t) = p^r/(1 − qt)^r,  t < 1/q;

ii) By differentiation, show that EX = rq/p and σ²(X) = rq/p²;
iii) Find the quantities mentioned in parts (i) and (ii) for the Geometric distribution.

6.5.7   Let X be an r.v. distributed as U(α, β).
i) Show that its m.g.f. is given by

    M(t) = (e^{tβ} − e^{tα})/[t(β − α)];

ii) By differentiation, show that EX = (α + β)/2 and σ²(X) = (α − β)²/12.

6.5.8   Let X be an r.v. with p.d.f. f given by f(x) = λe^{−λ(x−α)} I_{(α,∞)}(x). Find its m.g.f., M(t), for those t's for which it exists. Then calculate EX and σ²(X).

6.5.9   Refer to Example 3 in the Continuous case and show that M_X(t) = ∞ for t < 0, as asserted there.

6.5.10   Let X be an r.v. with m.g.f. M given by M(t) = e^{αt + βt²}, t ∈ ℝ (α ∈ ℝ, β > 0). Find the ch.f. of X and identify its p.d.f. Also use the ch.f. of X in order to calculate EX⁴.

6.5.11   For an r.v. X, define the function γ by γ(t) = E(1 + t)^X for those t's for which E(1 + t)^X is finite. Then, if the nth factorial moment of X is finite, show that

    (dⁿ/dtⁿ) γ(t)|_{t=0} = E[X(X − 1) · · · (X − n + 1)].

6.5.12   Refer to the previous exercise and let X be P(λ). Derive γ(t) and use it in order to show that the nth factorial moment of X is λⁿ.

162   6   Characteristic Functions, Moment Generating Functions and Related Theorems

6.5.13   Let X be an r.v. with m.g.f. M and set K(t) = log M(t) for those t's for which M(t) exists. Furthermore, suppose that EX = μ and σ²(X) = σ² are both finite. Then show that

    (d/dt) K(t)|_{t=0} = μ   and   (d²/dt²) K(t)|_{t=0} = σ².

(The function K just defined is called the cumulant generating function of X.)

6.5.14   Let X be an r.v. such that EXⁿ is finite for all n = 1, 2, . . . . Use the expansion

    e^x = Σ_{n=0}^∞ xⁿ/n!

in order to show that, under appropriate conditions (provided the assumptions of Theorem 1′ (vii′) are met), one has that the m.g.f. of X is given by

    M(t) = Σ_{n=0}^∞ (EXⁿ) tⁿ/n!.

6.5.15   If X is an r.v. such that EXⁿ = n!, n = 1, 2, . . . , find the m.g.f. M(t) of X for those t's for which it exists. Then deduce the distribution of X. (Use Exercise 6.5.14.)

6.5.16   Let X be an r.v. such that

    EX^{2k} = (2k)!/(2^k · k!),   EX^{2k+1} = 0,   k = 0, 1, . . . .

Find the m.g.f. of X and also its ch.f. Then deduce the distribution of X.

6.5.17   Let X1, X2 be two r.v.'s with m.g.f. given by

    M(t1, t2) = [(1/3)(e^{t1 + t2} + 1) + (1/6)(e^{t1} + e^{t2})]²,   t1, t2 ∈ ℝ.

Calculate EX1, σ²(X1) and Cov(X1, X2), provided they are finite.

6.5.18   Refer to Exercise 4.2.5 in Chapter 4 and find the joint m.g.f. M(t1, t2, t3) of the r.v.'s X1, X2, X3 for those t1, t2, t3 for which it exists. Also find their joint ch.f. and use it in order to calculate E(X1X2X3).

6.5.19   Refer to the previous exercise and derive the m.g.f. M(t) of the r.v. g(X1, X2, X3) = X1 + X2 + X3 for those t's for which it exists. From this, deduce the distribution of g.

Exercises   163

6.5.20   Let X1, X2 be two r.v.'s with m.g.f. M and set K(t1, t2) = log M(t1, t2) for those t1, t2 for which M(t1, t2) exists. In addition, suppose that the expectations, variances, and covariances of these r.v.'s are all finite. Then show that for j = 1, 2,

    ∂K(t1, t2)/∂tj |_{t1 = t2 = 0} = EXj,    ∂²K(t1, t2)/∂tj² |_{t1 = t2 = 0} = σ²(Xj),

and

    ∂²K(t1, t2)/∂t1∂t2 |_{t1 = t2 = 0} = Cov(X1, X2).

6.5.21   Suppose the r.v.'s X1, . . . , Xk have the Multinomial distribution with parameters n and p1, . . . , pk, and let i, j be arbitrary but fixed, 1 ≤ i < j ≤ k. Consider the r.v.'s Xi, Xj, and set X = n − Xi − Xj, p = 1 − pi − pj.
i) Show that the r.v.'s Xi, Xj, X have the Multinomial distribution with parameters n and pi, pj, p;
ii) Write out the joint m.g.f. of Xi, Xj, X, and by differentiation, determine the E(XiXj);
iii) Calculate the covariance of Xi, Xj, Cov(Xi, Xj), and show that it is negative.

6.5.22   If the r.v.'s X1 and X2 have the Bivariate Normal distribution with parameters μ1, μ2, σ1², σ2² and ρ:
i) Use their joint m.g.f. given by (11) and property (10) in order to determine E(X1X2);
ii) Show that ρ is, indeed, the correlation coefficient of X1 and X2.

6.5.23   If the r.v.'s X1 and X2 have the Bivariate Normal distribution with parameters μ1, μ2, σ1², σ2² and ρ, show that Cov(X1, X2) ≥ 0 if ρ ≥ 0, and Cov(X1, X2) < 0 if ρ < 0.

Note: Two r.v.'s X1, X2 are said to be positively quadrant dependent if, for all x1, x2 in ℝ,

    F_{X1,X2}(x1, x2) − F_{X1}(x1)F_{X2}(x2) ≥ 0,

and negatively quadrant dependent if, for all x1, x2 in ℝ,

    F_{X1,X2}(x1, x2) − F_{X1}(x1)F_{X2}(x2) ≤ 0.

In particular, if X1 and X2 have the Bivariate Normal distribution, it can be seen that they are positively quadrant dependent or negatively quadrant dependent according to whether ρ ≥ 0 or ρ < 0.

6.5.24   Verify the validity of relation (13).

6.5.25   Both parts of Exercise 6.4.1 hold true if the ch.f.'s involved are replaced by m.g.f.'s, provided, of course, that these m.g.f.'s exist.
i) Use Exercise 6.4.1 for k = 2, formulated in terms of m.g.f.'s, in order to show that the r.v.'s X1 and X2 have a Bivariate Normal distribution if and only if, for every c1, c2 ∈ ℝ, Yc = c1X1 + c2X2 is normally distributed;
ii) In either case, show that c1X1 + c2X2 + c3 is also normally distributed for any c3 ∈ ℝ.
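Several of the claims of this section can be verified symbolically: property (10) applied to the Multinomial m.g.f. (illustrated for k = 3, so p3 = 1 − p1 − p2), the agreement of the exponent of (15) with that of (11), and the cumulant relations of Exercise 6.5.20. A sketch (not part of the original text), using sympy:

```python
import sympy as sp

t1, t2, t3 = sp.symbols('t1 t2 t3')
m1, m2, s1, s2, rho, n, p1, p2 = sp.symbols(
    'mu1 mu2 sigma1 sigma2 rho n p1 p2', positive=True)

# (10) on the Multinomial m.g.f. for k = 3: E(X1 X2) = n(n-1)p1p2
Mult = (p1*sp.exp(t1) + p2*sp.exp(t2) + (1 - p1 - p2)*sp.exp(t3))**n
EX1X2 = sp.diff(Mult, t1, t2).subs({t1: 0, t2: 0, t3: 0})

# (15) written out for k = 2: mu't + t'(Sigma)t/2 is the exponent in (11)
t = sp.Matrix([t1, t2])
mu = sp.Matrix([m1, m2])
Sig = sp.Matrix([[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]])
expo = (mu.T*t + (t.T*Sig*t)/2)[0, 0]
f_exponent = m1*t1 + m2*t2 + (s1**2*t1**2 + 2*rho*s1*s2*t1*t2 + s2**2*t2**2)/2

# Exercise 6.5.20: derivatives of K = log M at 0 give means, variances, covariance
K = sp.log(sp.exp(f_exponent))
mean1 = sp.diff(K, t1).subs({t1: 0, t2: 0})      # mu1
var1  = sp.diff(K, t1, 2).subs({t1: 0, t2: 0})   # sigma1**2
cov   = sp.diff(K, t1, t2).subs({t1: 0, t2: 0})  # rho*sigma1*sigma2

assert sp.simplify(EX1X2 - n*(n - 1)*p1*p2) == 0
assert sp.simplify(expo - f_exponent) == 0
assert sp.simplify(cov - rho*s1*s2) == 0
```

In particular, the last assertion is the computation behind Exercise 6.5.22(ii): the mixed second derivative of log M recovers Cov(X1, X2) = ρσ1σ2.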


j =1 ( ) k ) In particular. X x1 .7. j = 1. 1 k k j =1 j Then for any sets Bj = (−∞. k = ∏ P X j ∈ Bj . xk = ∏ FX x j . ii) f X . ⋅ ⋅ ⋅ . xk = ∏ f X x j .v. for all 1 k ( ) k j =1 k j ( ) x j ∈ ޒ. . . ⋅ ⋅ ⋅ . yk = ∏ FX y j . we get B1 × ⋅ ⋅ ⋅ ×B ∑ f X . we have: Let f X . . ⋅ ⋅ ⋅ . . x j ∈ ޒ. . . ii) For the discrete case. . ⋅ ⋅ ⋅ . . X x1 . j =1 ( ) k ( ) or f X . Bj ⊆ ޒ. Lemma 3. ⋅ ⋅ ⋅ . X x1 . X x1 .4. Some relevant comments will be made in Section 7. k. we set Bj = {xj}. be omitted.1 Stochastic Independence: Criteria of Independence 165 THEOREM 1 (Factorization Theorem) The r. Then if Xj. where xj is in the range of Xj. ⋅ ⋅ ⋅ . j =1 ( ) k ( ) Therefore Xj. . ⋅ ⋅ ⋅ . . X x1 . k. . j = 1. xj ∈ ޒ. then P X j ∈ Bj . . · · · . yj ∈ ޒ. X y1 . we get P X 1 = x1 . xk = ∏ f X x j . . 1 k ( ( ) ) k j =1 j ( ) ( ) Let now f X . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . . j = 1. j = 1. ⋅ ⋅ ⋅ . k. xk = 1 k ( ) k B1 × ⋅ ⋅ ⋅ ×B ∑ f X x1 ⋅ ⋅ ⋅ f X xk 1 k ( ) ( ) k or 1 k k ⎡ ⎤ = ∏ ⎢ ∑ f X x j ⎥. . k are independent. ⋅ ⋅ ⋅ . . of course. j = 1. 1 k ( ) k j =1 j ( ) The proof of the converse is a deep probability result. k. j = 1. j = 1. k are independent. ⋅ ⋅ ⋅ . . j = 1. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . k are independent if and only if any one of the following two (equivalent) conditions holds: i) FX . k. yj]. . for all 1 k ( ) j =1 j ( ) ( PROOF ii) If Xj. xk = ∏ FX x j . ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ . j =1 ⎢ ⎥ ⎣B ⎦ j ( ) j j FX . For the continuous case. xj]. j = 1. ⋅ ⋅ ⋅ . xk = ∏ f X x j . . k which gives FX . . X x1 . and will. xk = ∏ f X x j 1 k ( ) k j =1 j ( ) and let . ⋅ ⋅ ⋅ .’s Xj. . ⋅ ⋅ ⋅ . j = 1. . X x1 . X k = xk = ∏ P X j = x j . . j = 1. . this is true for Bj = (−∞. k are independent by (i). ⋅ ⋅ ⋅ .

⋅ ⋅ ⋅ . j = 1. ⋅ ⋅ ⋅ . ⋅ ⋅ ⋅ .f.v. we get f X .d.’s. . y j ∈ ޒ. j = 1. . ⋅ ⋅ ⋅ .) PROOF See Section 7. k are r.’s Xj.’s Xj. LEMMA 2 Consider the r. .166 7 Stochastic Independence with Some Applications C j = −∞. 1 k ( ] ( ) k j =1 j ( ) so that Xj. . t k = ∏ φ X t j . k. X t1 .’s. ( ) [ ( )] PROOF See Section 7. let the r. k are independent. k. we get FX . the Xj’s are independent). j = 1. . vectors. .v. assume that FX . .v. .’s gj(Xj). . ▲ The converse of the above statement need not be true as will be seen later by examples. Then differentiating both sides. . .’s (r.v.4.’s Xj. . vectors) are independent r. . k ought to be independent. Both this lemma. for all t j ∈ ޒ. . . ⋅ ⋅ ⋅ . . we have ⎡ k ⎤ k E ⎢∏ g j X j ⎥ = ∏ E g j X j .v. j = 1.’s are replaced by m-dimensional r.v. k and let gj : ޒ → ޒbe (measurable) functions. . k are independent if and only if: φ X .v.2. This is. j = 1.v. . true and is the content of the following LEMMA 1 For j = 1. k. X x1 . .v. k are also independent.’s. . ▲ Independence of r. ▲ 1 k ( ) k j =1 j ( ) It is noted that this step also is justiﬁable (by means of calculus) for the continuity points of the p. ⋅ ⋅ ⋅ . j = 1. xk = ∏ FX x j 1 k ( ) k j =1 j ( ) (that is. are needed in the proof of Theorem 1′ below. . X y1 . Then it seems intuitively clear that the r. . Next. REMARK 3 THEOREM 1′ (Factorization Theorem) The r. . The same conclusion holds if the r.’s Xj be independent and consider (measurable) functions gj : ޒ → ޒ. . j = 1. k are independent by (i). only. .v. . . . ⋅ ⋅ ⋅ . xk = ∏ f X x j . and the functions gj. 1 k ( ) k j =1 j ( ) .’s and suppose that gj is a function of the jth r. .v.v. j = 1. Then integrating both sides of this last relationship over the set C1 × · · · × Ck. X x1 . as well as Lemma 1. alone. (That is. . j = 1.v. Then. ⋅ ⋅ ⋅ . The same conclusion holds if the gj’s are complex-valued. ⋅ ⋅ ⋅ . . ⎢ ⎥ ⎣ j =1 ⎦ j =1 provided the expectations considered exist. y j . k are r. . k are deﬁned on ޒm into ޒ. 
166   7   Stochastic Independence with Some Applications

    Cj = (−∞, yj],   yj ∈ ℝ, j = 1, . . . , k.

Then integrating both sides of this last relationship over the set C1 × · · · × Ck, we get

    F_{X1, . . . , Xk}(y1, . . . , yk) = Π_{j=1}^k F_{Xj}(yj),   yj ∈ ℝ, j = 1, . . . , k,

so that Xj, j = 1, . . . , k are independent by (i). Next, assume that

    F_{X1, . . . , Xk}(x1, . . . , xk) = Π_{j=1}^k F_{Xj}(xj)

(that is, the Xj's are independent). Then differentiating both sides, we get

    f_{X1, . . . , Xk}(x1, . . . , xk) = Π_{j=1}^k f_{Xj}(xj). ▲

It is noted that this step also is justifiable (by means of calculus) for the continuity points of the p.d.f.'s only.

REMARK 2   Consider independent r.v.'s Xj, j = 1, . . . , k and let gj : ℝ → ℝ be (measurable) functions. Then it seems intuitively clear that the r.v.'s gj(Xj), j = 1, . . . , k ought to be independent. This is, actually, true and is the content of the following

LEMMA 1   For j = 1, . . . , k, let the r.v.'s Xj be independent and consider (measurable) functions gj : ℝ → ℝ, so that gj(Xj), j = 1, . . . , k are r.v.'s. Then the r.v.'s gj(Xj), j = 1, . . . , k are also independent. The same conclusion holds if the r.v.'s are replaced by m-dimensional r. vectors, and the functions gj, j = 1, . . . , k are defined on ℝ^m into ℝ.

PROOF   See Section 7.4. ▲

REMARK 3   Independence of r.v.'s also has the following consequence stated as a lemma. Both this lemma, as well as Lemma 1, are needed in the proof of Theorem 1′ below.

LEMMA 2   Consider the r.v.'s Xj, j = 1, . . . , k, and let gj : ℝ → ℝ be (measurable) functions, so that gj(Xj), j = 1, . . . , k are r.v.'s. Then, if the r.v.'s (r. vectors) are independent, we have

    E[Π_{j=1}^k gj(Xj)] = Π_{j=1}^k E[gj(Xj)],

provided the expectations considered exist. The same conclusion holds if the gj's are complex-valued.

PROOF   See Section 7.2. ▲

The converse of the above statement need not be true, as will be seen later by examples.

THEOREM 1′   (Factorization Theorem) The r.v.'s Xj, j = 1, . . . , k are independent if and only if:

    φ_{X1, . . . , Xk}(t1, . . . , tk) = Π_{j=1}^k φ_{Xj}(tj),   for all tj ∈ ℝ, j = 1, . . . , k.
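Lemma 2 can be illustrated on a small discrete example (the r.v.'s and the functions g1, g2 below are arbitrary choices, not taken from the text): under independence, the expectation of the product of g1(X1) and g2(X2) factors into the product of the expectations.

```python
from fractions import Fraction as F

# Hypothetical independent r.v.'s: X1 uniform on {1,2,3}, X2 uniform on {1,2}
sup1, sup2 = [1, 2, 3], [1, 2]
g1 = lambda x: x*x      # g1(x) = x^2
g2 = lambda x: 2**x     # g2(x) = 2^x

# Left side: E[g1(X1) g2(X2)] over the product (independence) joint p.m.f.
lhs = sum(F(1, 3)*F(1, 2)*g1(x1)*g2(x2) for x1 in sup1 for x2 in sup2)
# Right side: E[g1(X1)] * E[g2(X2)]
rhs = sum(F(1, 3)*g1(x) for x in sup1) * sum(F(1, 2)*g2(x) for x in sup2)

assert lhs == rhs       # both equal 14 here
```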

7.1  Stochastic Independence: Criteria of Independence   167

PROOF   If X1, . . . , Xk are independent, then by Theorem 1(ii),

    f_{X1, . . . , Xk}(x1, . . . , xk) = Π_{j=1}^k f_{Xj}(xj).

Hence

    φ_{X1, . . . , Xk}(t1, . . . , tk) = E[exp(i Σ_{j=1}^k tjXj)] = E(Π_{j=1}^k e^{itjXj}) = Π_{j=1}^k E e^{itjXj} = Π_{j=1}^k φ_{Xj}(tj)

by Lemmas 1 and 2. Let us assume now that

    φ_{X1, . . . , Xk}(t1, . . . , tk) = Π_{j=1}^k φ_{Xj}(tj),   tj ∈ ℝ, j = 1, . . . , k.

For the continuous case, we have (see Theorem 2 in Chapter 6)

    f_{Xj}(xj) = lim_{h→0} lim_{T→∞} (1/2π) ∫_{−T}^T [(1 − e^{−itjh})/(itjh)] e^{−itjxj} φ_{Xj}(tj) dtj,

and for the multidimensional case (see Theorem 2′ in Chapter 6),

    f_{X1, . . . , Xk}(x1, . . . , xk)
      = lim_{h→0} lim_{T→∞} (1/2π)^k ∫_{−T}^T · · · ∫_{−T}^T Π_{j=1}^k [(1 − e^{−itjh})/(itjh)] e^{−itjxj} × φ_{X1, . . . , Xk}(t1, . . . , tk) dt1 · · · dtk
      = Π_{j=1}^k {lim_{h→0} lim_{T→∞} (1/2π) ∫_{−T}^T [(1 − e^{−itjh})/(itjh)] e^{−itjxj} φ_{Xj}(tj) dtj}
      = Π_{j=1}^k f_{Xj}(xj).

That is, f_{X1, . . . , Xk}(x1, . . . , xk) = Π_{j=1}^k f_{Xj}(xj), so that Xj, j = 1, . . . , k are independent by Theorem 1(ii). For the discrete case, we have (see Theorem 2 in Chapter 6)

    f_{Xj}(xj) = lim_{T→∞} (1/2T) ∫_{−T}^T e^{−itjxj} φ_{Xj}(tj) dtj,

and for the multidimensional case (see Theorem 2′ in Chapter 6),

    f_{X1, . . . , Xk}(x1, . . . , xk)
      = lim_{T→∞} (1/2T)^k ∫_{−T}^T · · · ∫_{−T}^T exp(−i Σ_{j=1}^k tjxj) φ_{X1, . . . , Xk}(t1, . . . , tk) dt1 · · · dtk
      = lim_{T→∞} (1/2T)^k ∫_{−T}^T · · · ∫_{−T}^T exp(−i Σ_{j=1}^k tjxj) Π_{j=1}^k φ_{Xj}(tj) dt1 · · · dtk
      = Π_{j=1}^k [lim_{T→∞} (1/2T) ∫_{−T}^T e^{−itjxj} φ_{Xj}(tj) dtj] = Π_{j=1}^k f_{Xj}(xj),

168   7   Stochastic Independence with Some Applications

which again establishes independence of Xj, j = 1, . . . , k by Theorem 1(ii). ▲

REMARK 4   A version of this theorem involving m.g.f.'s can be formulated, if the m.g.f.'s exist.

COROLLARY   Let X1, X2 have the Bivariate Normal distribution. Then X1, X2 are independent if and only if they are uncorrelated.

PROOF   We have seen that (see Bivariate Normal in Section 3.3 of Chapter 3)

    f_{X1,X2}(x1, x2) = [1/(2πσ1σ2√(1 − ρ²))] e^{−q/2},

where

    q = [1/(1 − ρ²)] [((x1 − μ1)/σ1)² − 2ρ((x1 − μ1)/σ1)((x2 − μ2)/σ2) + ((x2 − μ2)/σ2)²],

and

    f_{X1}(x1) = [1/(√(2π)σ1)] exp[−(x1 − μ1)²/(2σ1²)],   f_{X2}(x2) = [1/(√(2π)σ2)] exp[−(x2 − μ2)²/(2σ2²)].

Thus, if X1, X2 are uncorrelated, so that ρ = 0, then

    f_{X1,X2}(x1, x2) = f_{X1}(x1) · f_{X2}(x2),

that is, X1, X2 are independent. The converse is always true by Corollary 1 in Section 7.2. ▲

Exercises

7.1.1   Let Xj, j = 1, . . . , n be i.i.d. r.v.'s with p.d.f. f and d.f. F. Set

    X(1) = min(X1, . . . , Xn),   X(n) = max(X1, . . . , Xn);

that is, X(1)(s) = min[X1(s), . . . , Xn(s)], X(n)(s) = max[X1(s), . . . , Xn(s)]. Then express the d.f. and p.d.f. of X(1), X(n) in terms of f and F.

7.1.2   Let the r.v.'s X1, X2 have p.d.f. f given by f(x1, x2) = I_{(0,1)×(0,1)}(x1, x2).
i) Show that X1, X2 are independent and identify their common distribution;
ii) Find the following probabilities: P(X1 + X2 < 1/3), P(X1² + X2² < 1/4), P(X1X2 > 1/2).

7.1.3   Let X1, X2 be two r.v.'s with p.d.f. f given by f(x1, x2) = g(x1)h(x2).

7.1  Stochastic Independence: Criteria of Independence   169

i) Derive the p.d.f.'s of X1 and X2 and show that X1, X2 are independent;
ii) Calculate the probability P(X1 > X2) if g = h and h is of the continuous type.

7.1.4   Let X1, X2, X3 be r.v.'s with p.d.f. f given by f(x1, x2, x3) = 8x1x2x3 I_A(x1, x2, x3), where A = (0, 1) × (0, 1) × (0, 1).
i) Show that these r.v.'s are independent;
ii) Calculate the probability P(X1 < X2 < X3).

7.1.5   Let X1, X2, X3 be r.v.'s with p.d.f. f given by f(x1, x2, x3) = (1/4) I_A(x1, x2, x3), where

    A = {(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)}.

i) Show that Xi, Xj, i ≠ j, are independent;
ii) Show that X1, X2, X3 are dependent.

7.1.6   Let the r.v.'s X1, X2 have p.d.f. f given by f(x1, x2) = c I_A(x1, x2), where A = {(x1, x2)′ ∈ ℝ²; x1² + x2² ≤ 9}.
i) Determine the constant c;
ii) Show that X1, X2 are dependent.

7.1.7   Refer to Exercise 4.2.5 in Chapter 4 and show that the r.v.'s X1, X2, X3 are independent.

7.1.8   Let Xj, j = 1, . . . , n be i.i.d. r.v.'s with p.d.f. f and let B be a (Borel) set in ℝ.
i) In terms of f, express the probability that at least k of the X's lie in B for some fixed k with 1 ≤ k ≤ n;
ii) Simplify this expression if f is the Negative Exponential p.d.f. with parameter λ and B = (1/λ, ∞);
iii) Find a numerical answer for n = 10, k = 5, λ = 1/2.

7.1.9   Let X1, X2 be two independent r.v.'s and let g : ℝ → ℝ be measurable. Let also Eg(X2) be finite. Then show that E[g(X2) | X1 = x1] = Eg(X2).

7.1.10   If Xj, j = 1, . . . , n are i.i.d. r.v.'s with ch.f. φ and sample mean X̄, express the ch.f. of X̄ in terms of φ.

7.1.11   For two i.i.d. r.v.'s X1, X2 with ch.f. φ, show that φ_{X1−X2}(t) = |φ(t)|², t ∈ ℝ. (Hint: Use Exercise 6.2.3 in Chapter 6.)
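Exercise 7.1.5 is the classical example of pairwise independence without joint independence. A direct numerical check over the four-point distribution (a sketch, not part of the original text):

```python
from itertools import product

# Uniform distribution (prob 1/4 each) on the four points of Exercise 7.1.5
pts = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]

def pr(event):
    """Probability of an event, given as a predicate on (x1, x2, x3)."""
    return sum(1 for p in pts if event(p)) / len(pts)

# Pairwise independence: P(Xi = a, Xj = b) = P(Xi = a) P(Xj = b) for i != j
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for a, b in product([0, 1], repeat=2):
        joint = pr(lambda p: p[i] == a and p[j] == b)
        prod_marg = pr(lambda p: p[i] == a) * pr(lambda p: p[j] == b)
        assert joint == prod_marg

# But X1, X2, X3 are jointly dependent:
lhs = pr(lambda p: p == (1, 1, 1))                                   # 1/4
rhs = pr(lambda p: p[0] == 1) * pr(lambda p: p[1] == 1) * pr(lambda p: p[2] == 1)  # 1/8
assert lhs != rhs
```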

170   7   Stochastic Independence with Some Applications

7.1.12   Let X1, X2 be two r.v.'s with joint and marginal ch.f.'s φ_{X1,X2}, φ_{X1} and φ_{X2}. Then X1, X2 are independent if and only if

    φ_{X1,X2}(t1, t2) = φ_{X1}(t1)φ_{X2}(t2),   t1, t2 ∈ ℝ.

By an example, show that

    φ_{X1,X2}(t, t) = φ_{X1}(t)φ_{X2}(t),   t ∈ ℝ,

does not imply independence of X1, X2.

7.2  Proof of Lemma 2 and Related Results

We now proceed with the proof of Lemma 2.

PROOF OF LEMMA 2   Suppose that the r.v.'s involved are continuous, so that we use integrals; replace integrals by sums in the discrete case. We have

    E[Π_{j=1}^k gj(Xj)] = ∫_{−∞}^∞ · · · ∫_{−∞}^∞ g1(x1) · · · gk(xk) f_{X1, . . . , Xk}(x1, . . . , xk) dx1 · · · dxk
      = ∫_{−∞}^∞ · · · ∫_{−∞}^∞ g1(x1) · · · gk(xk) f_{X1}(x1) · · · f_{Xk}(xk) dx1 · · · dxk   (by independence)
      = [∫_{−∞}^∞ g1(x1) f_{X1}(x1) dx1] · · · [∫_{−∞}^∞ gk(xk) f_{Xk}(xk) dxk]
      = E[g1(X1)] · · · E[gk(Xk)].

Now suppose that the gj's are complex-valued, and for simplicity, set gj(Xj) = Yj = Yj1 + iYj2, j = 1, . . . , k. For k = 2,

    E(Y1Y2) = E[(Y11 + iY12)(Y21 + iY22)]
      = [E(Y11Y21) − E(Y12Y22)] + i[E(Y11Y22) + E(Y12Y21)]
      = [(EY11)(EY21) − (EY12)(EY22)] + i[(EY11)(EY22) + (EY12)(EY21)]
      = (EY11 + iEY12)(EY21 + iEY22) = (EY1)(EY2).

Next, assume the result to be true for k = m and establish it for k = m + 1.

7.2  Proof of Lemma 2 and Related Results   171

Indeed,

    E(Y1 · · · Ym+1) = E[(Y1 · · · Ym)Ym+1]
      = E(Y1 · · · Ym)(EYm+1)   (by the part just established)
      = (EY1) · · · (EYm)(EYm+1)   (by the induction hypothesis). ▲

COROLLARY 1   The covariance of an r.v. X and of any other r.v. which is equal to a constant c (with probability 1) is equal to 0; that is, Cov(X, c) = 0.

PROOF   Cov(X, c) = E(cX) − (Ec)(EX) = cEX − cEX = 0. ▲

COROLLARY 2   If the r.v.'s X1 and X2 are independent, then they have covariance equal to 0, provided their second moments are finite. In particular, if their variances are also positive, then they are uncorrelated.

PROOF   In fact,

    Cov(X1, X2) = E(X1X2) − (EX1)(EX2) = (EX1)(EX2) − (EX1)(EX2) = 0,

by independence and Lemma 2. The second assertion follows since ρ(X, Y) = Cov(X, Y)/[σ(X)σ(Y)]. ▲

REMARK 5   The converse of the above corollary need not be true. Thus uncorrelated r.v.'s in general are not independent. (See, however, the corollary to Theorem 1′ after the proof of Theorem 1′.)

COROLLARY 3
i) For any k r.v.'s Xj, j = 1, . . . , k with finite second moments and variances σj² = σ²(Xj), j = 1, . . . , k, and any constants cj, j = 1, . . . , k, it holds:

    σ²(Σ_{j=1}^k cjXj) = Σ_{j=1}^k cj²σj² + Σ_{1≤i≠j≤k} cicj Cov(Xi, Xj)
                       = Σ_{j=1}^k cj²σj² + 2 Σ_{1≤i<j≤k} cicj Cov(Xi, Xj).

ii) If also σj > 0, j = 1, . . . , k, and ρij = ρ(Xi, Xj), i ≠ j, then:

    σ²(Σ_{j=1}^k cjXj) = Σ_{j=1}^k cj²σj² + Σ_{1≤i≠j≤k} cicjσiσjρij
                       = Σ_{j=1}^k cj²σj² + 2 Σ_{1≤i<j≤k} cicjσiσjρij.

iii) If the r.v.'s are independent or have pairwise covariances 0 (are pairwise uncorrelated), then:

    σ²(Σ_{j=1}^k cjXj) = Σ_{j=1}^k cj²σj².

In particular,

iii′) σ²(Σ_{j=1}^k Xj) = Σ_{j=1}^k σj²   (Bienaymé equality).

r. j = 1. . i ≠ j. j = 1. respectively. .v.5 in Chapter 4 and ﬁnd the E(X1 X2). j = 1. . both ﬁnite.’s with mean μ and variance σ2. .i.2. X )).’s Xj.v.4 . . Then the assertion follows from either part (i) or part (ii).d.2. 2 ( ) 2 where 1 k 1 k 2 X and S = ∑ j ∑ Xj − X .172 7 Stochastic Independence with Some Applications PROOF ii′i) Indeed.3 Let Xj. X= ( ) 7. Xj) = 0. n be i. k. either because of independence and Corollary 2. . . . σ2(X1 + X2 + X3) without integration. k j =1 k j =1 7. E(X1 X2 X3).2 Refer to Exercise 4. show that ∑ (X j =1 k j −μ ) = ∑ (X 2 j =1 k j −X ) 2 +k X −μ ( ) 2 = kS 2 + k X − μ . X j ( ) ) i 2 = ∑ c2 jσ j + 2 j =1 This establishes part (i). X ) = Cov( X .v.’s with ﬁnite moments of third order. n be independent r. 2 ⎡k ⎛ k ⎞ ⎛ k ⎞⎤ 2 σ ⎜ ∑ c j X j ⎟ = E ⎢∑ c j X j − E ⎜ ∑ c j X j ⎟ ⎥ ⎝ j =1 ⎠ ⎝ j =1 ⎠⎥ ⎢ ⎦ ⎣ j =1 ⎡k = E ⎢∑ c j X j − EX j ⎢ ⎣ j =1 ⎡k = E ⎢∑ c 2 j X j − EX j ⎢ ⎣ j =1 ( ) ⎤ ⎥ ⎥ ⎦ 2 2 ( ) + ∑ c c (X i j i≠ j i ⎤ − EX i X j − EX j ⎥ ⎥ ⎦ )( ) 2 = ∑ c2 jσ j + j =1 k k 1≤i ≠ j ≤k ∑ ci c j Cov X i . in case σj > 0. .2. . . Xj) = σiσjρij = σjσiρji. iii′) Follows from part (iii) for c1 = · · · = ck = 1. iii) Here Cov (Xi. . k. X j ( Exercises 7. . . ⎢ ⎥ j =1 ⎣ j =1 ⎦ Let Xj. i j j 1≤i < j ≤k ∑ ci c j Cov X i . σ2(X1 + X2). j = 1. . Then show that n ⎡n ⎤ 3 E ⎢∑ X j − EX j ⎥ = ∑ E X j − EX j . . ▲ (since Cov( X . or ρij = 0. . ( ) 3 ( ) 7. k for which E(Xj) = μ (ﬁnite) j = 1. . . .1 For any k r. Part (ii) follows by the fact that Cov(Xi.2.2.

7.3  Some Consequences of Independence   173

7.2.4   Let Xj, j = 1, . . . , n be i.i.d. r.v.'s with mean μ and variance σ², both finite.
i) In terms of c, σ and α, find the smallest value of n for which the probability that X̄ (the sample mean of the X's) and μ differ in absolute value at most by c is at least α;
ii) Give a numerical answer if α = 0.90, c = 0.1 and σ = 2.

7.2.5   Let X1, X2 be two r.v.'s taking on the values −1, 0, 1 with the following respective probabilities:

    f(−1, −1) = α,   f(−1, 0) = β,   f(−1, 1) = α,
    f(0, −1) = β,    f(0, 0) = 0,    f(0, 1) = β,
    f(1, −1) = α,    f(1, 0) = β,    f(1, 1) = α,

where α, β > 0 and α + β = 1/4. Then show that:
i) Cov(X1, X2) = 0, so that ρ(X1, X2) = 0;
ii) X1, X2 are dependent.

7.3  Some Consequences of Independence

The basic assumption throughout this section is that the r.v.'s involved are independent. Then ch.f.'s are used very effectively in deriving certain "classic" theorems, as will be seen below. The m.g.f.'s, when they exist, can be used in the same way as the ch.f.'s. In all cases, the conclusions of the theorems will be reached by way of Theorem 3 in Chapter 6, without explicitly mentioning it. For simplicity, we will restrict ourselves to the case of ch.f.'s alone, writing Σj Xj instead of Σ_{j=1}^k Xj, when this last expression appears as a subscript here and thereafter.

THEOREM 2   Let Xj be B(nj, p), j = 1, . . . , k and independent. Then X = Σj Xj is B(n, p), where n = Σ_{j=1}^k nj. (That is, the sum of independent Binomially distributed r.v.'s with the same parameter p and possibly distinct nj's is also Binomially distributed.)

PROOF   It suffices to prove that the ch.f. of X is that of a B(n, p) r.v., where n is as above. We have

    φ_X(t) = φ_{Σj Xj}(t) = Π_{j=1}^k φ_{Xj}(t) = Π_{j=1}^k (pe^{it} + q)^{nj} = (pe^{it} + q)^n,

which is the ch.f. of a B(n, p) r.v., as we desired to prove. ▲
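Theorem 2, and Theorem 3 below, can be checked numerically by convolving the p.m.f.'s of two independent summands; the parameter values in this sketch (not part of the original text) are arbitrary:

```python
from math import comb, exp, factorial

# Theorem 2: X1 ~ B(3, p), X2 ~ B(5, p) independent  =>  X1 + X2 ~ B(8, p)
p = 0.3
def binom_pmf(n, p, x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

for s in range(9):
    conv = sum(binom_pmf(3, p, x) * binom_pmf(5, p, s - x)
               for x in range(s + 1) if x <= 3 and s - x <= 5)
    assert abs(conv - binom_pmf(8, p, s)) < 1e-12

# Theorem 3: X1 ~ P(1.5), X2 ~ P(2.5) independent  =>  X1 + X2 ~ P(4.0)
def pois_pmf(lam, x):
    return exp(-lam) * lam**x / factorial(x)

for s in range(20):
    conv = sum(pois_pmf(1.5, x) * pois_pmf(2.5, s - x) for x in range(s + 1))
    assert abs(conv - pois_pmf(4.0, s)) < 1e-12
```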

(That is, the sum of independent Poisson distributed r.v.'s is also Poisson distributed.)

PROOF We have

φ_X(t) = φ_{Σⱼ Xⱼ}(t) = ∏_{j=1}^k φ_{Xⱼ}(t) = ∏_{j=1}^k exp(λⱼe^{it} − λⱼ) = exp(e^{it} Σ_{j=1}^k λⱼ − Σ_{j=1}^k λⱼ) = exp(λe^{it} − λ),

which is the ch.f. of a P(λ) r.v. ▲

THEOREM 4 Let Xⱼ be N(μⱼ, σⱼ²), j = 1, . . . , k and independent. Then

ii) X = Σ_{j=1}^k Xⱼ is N(μ, σ²), where μ = Σ_{j=1}^k μⱼ, σ² = Σ_{j=1}^k σⱼ²;

ii) more generally, X = Σ_{j=1}^k cⱼXⱼ is N(μ, σ²), where μ = Σ_{j=1}^k cⱼμⱼ, σ² = Σ_{j=1}^k cⱼ²σⱼ².

(That is, the sum of independent Normally distributed r.v.'s is Normally distributed.)

PROOF (ii) We have

φ_X(t) = φ_{Σⱼ cⱼXⱼ}(t) = ∏_{j=1}^k φ_{Xⱼ}(cⱼt) = ∏_{j=1}^k exp(icⱼtμⱼ − σⱼ²cⱼ²t²/2) = exp(itμ − σ²t²/2)

with μ and σ² as in (ii) above, and this is the ch.f. of an N(μ, σ²) r.v. (i) Follows from (ii) by setting c₁ = c₂ = · · · = c_k = 1. ▲

Now let Xⱼ, j = 1, . . . , k be any k independent r.v.'s with E(Xⱼ) = μ, σ²(Xⱼ) = σ², j = 1, . . . , k, and set

X̄ = (1/k) Σ_{j=1}^k Xⱼ.

By assuming that the X's are normal, we get

COROLLARY Let Xⱼ be N(μ, σ²), j = 1, . . . , k and independent. Then X̄ is N(μ, σ²/k), or equivalently, [√k(X̄ − μ)]/σ is N(0, 1).

PROOF In (ii) of Theorem 4, we set

c₁ = · · · = c_k = 1/k,  μ₁ = · · · = μ_k = μ,  σ₁² = · · · = σ_k² = σ²

and get the first conclusion. The second follows from the first by the use of Theorem 4, Chapter 4, since
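The characteristic-function computation in the proof of Theorem 4(ii) is an algebraic identity, so it can be checked numerically at a few values of t. A sketch (all parameters arbitrary, not from the text):

```python
import cmath

# Product of the N(mu_j, sigma_j^2) ch.f.'s evaluated at c_j * t, compared
# with the N(mu, sigma^2) ch.f., mu = sum c_j mu_j, sigma^2 = sum c_j^2 sigma_j^2.
mus, sigmas, cs = [1.0, -2.0, 0.5], [1.0, 2.0, 0.7], [0.5, 1.0, -3.0]
mu = sum(c * m for c, m in zip(cs, mus))
var = sum((c * s) ** 2 for c, s in zip(cs, sigmas))
max_err = 0.0
for t in (-1.3, -0.2, 0.0, 0.4, 2.0):
    prod = 1.0 + 0.0j
    for c, m, s in zip(cs, mus, sigmas):
        prod *= cmath.exp(1j * c * t * m - (s * c * t) ** 2 / 2)
    max_err = max(max_err, abs(prod - cmath.exp(1j * t * mu - var * t ** 2 / 2)))
assert max_err < 1e-9
```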

[√k(X̄ − μ)]/σ = (X̄ − μ)/(σ/√k). ▲

THEOREM 5 Let Xⱼ be χ²_{rⱼ}, j = 1, . . . , k and independent. Then

X = Σ_{j=1}^k Xⱼ is χ²_r, where r = Σ_{j=1}^k rⱼ.

PROOF We have

φ_X(t) = φ_{Σⱼ Xⱼ}(t) = ∏_{j=1}^k φ_{Xⱼ}(t) = ∏_{j=1}^k (1 − 2it)^{−rⱼ/2} = (1 − 2it)^{−r/2},

which is the ch.f. of a χ²_r r.v. ▲

COROLLARY 1 Let Xⱼ be N(μⱼ, σⱼ²), j = 1, . . . , k and independent. Then

X = Σ_{j=1}^k ((Xⱼ − μⱼ)/σⱼ)² is χ²_k.

PROOF The r.v.'s (Xⱼ − μⱼ)/σⱼ, j = 1, . . . , k are N(0, 1) and independent, so that their squares ((Xⱼ − μⱼ)/σⱼ)², j = 1, . . . , k are χ²₁ (see Theorem 3, Chapter 4) and, by Lemma 1, independent. Thus Theorem 5 applies and gives the result. ▲

Now let Xⱼ, j = 1, . . . , k be any k r.v.'s such that E(Xⱼ) = μ (finite), j = 1, . . . , k. Then the following useful identity is easily established:

Σ_{j=1}^k (Xⱼ − μ)² = Σ_{j=1}^k (Xⱼ − X̄)² + k(X̄ − μ)² = kS² + k(X̄ − μ)²,

where

S² = (1/k) Σ_{j=1}^k (Xⱼ − X̄)².

If, in particular, the Xⱼ's are N(μ, σ²) and independent, then it will be shown that X̄ and S² are independent. (For this, see Theorem 6, Chapter 9.)

COROLLARY 2 Let Xⱼ, j = 1, . . . , k be N(μ, σ²) and independent. Then kS²/σ² is χ²_{k−1}.

PROOF We have

Σ_{j=1}^k ((Xⱼ − μ)/σ)² = [√k(X̄ − μ)/σ]² + kS²/σ².
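Corollary 2, together with the moment computations for the χ²_{k−1} distribution, implies E(kS²/σ²) = k − 1 and σ²(kS²/σ²) = 2(k − 1), and this is easy to probe by simulation. A sketch (μ, σ, k, the seed and the number of replications are all our arbitrary choices):

```python
import random
import statistics

random.seed(0)
k, mu, sigma = 5, 10.0, 2.0
vals = []
for _ in range(100_000):
    xs = [random.gauss(mu, sigma) for _ in range(k)]
    xbar = sum(xs) / k
    S2 = sum((x - xbar) ** 2 for x in xs) / k   # S^2 as defined in the text
    vals.append(k * S2 / sigma ** 2)            # should be ~ chi-square_{k-1}
m, v = statistics.fmean(vals), statistics.pvariance(vals)
# chi-square with k - 1 degrees of freedom: mean k - 1, variance 2(k - 1)
assert abs(m - (k - 1)) < 0.05
assert abs(v - 2 * (k - 1)) < 0.3
```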

. ⎝ σ ⎠ ⎝ σ ⎠ ( ) or ES 2 = 2 k−1 4 k−1 2 σ .d. ▲ PROOF j 1 Exercises 7.176 7 Stochastic Independence with Some Applications Now ⎛ Xj − μ⎞ ∑⎜ σ ⎟ ⎠ j =1 ⎝ k 2 2 is χ k by Corollary 1 above. j = 1. σ = 1.v. . ⎢ ⎥ ⎢ σ ⎦ ⎣ by Theorem 3.3. which is the ch. j = 1. and hence. k be independent r. j = 1.d. . . . n. ⎛ kS 2 ⎞ ⎛ kS 2 ⎞ E⎜ 2 ⎟ = k − 1. . Then X = ∑k j = 1Xj is kY. σ = 1. k k2 The following result demonstrates that the sum of independent r. and σ 2 ⎜ 2 ⎟ = 2 k − 1 .v. . .f. j = 1. We have φX(t) = φ∑ Xj(t) = [φX (t)]k = (e−|t|)k = e−k|t|. . 7. .’s distributed as P(λj). of Xj. . j =1 n Then show that ii) The conditional p. let Xj be independent r. of a χ 2 k−1 r. n. given T = t. Then taking ch. given T = t.’s having a certain distribution need not have a distribution of the same kind. . where Y is Cauchy with μ = 0. ▲ 2 2 2 2 ( ) 2 REMARK 6 It thus follows that.f. X/k = X is Cauchy with μ = 0.f. and set T = ∑Xj. . σ = 1. Hence φkS /σ (t) = (1 − 2it)−(k−1)/2 which is the ch.d. . ii) The conditional joint p. Chapter 4.f. . . of kY. j =1 n λ = ∑λj .’s Xj. X = X1 + · · · + Xr has the Negative Binomial distribution with parameters r and p. . .v.’s having the Cauchy distribution with parameters μ = 0 and σ = 1. .f. and ⎡ k X −μ ⎤ ⎥ is χ12 . is the Multinomial p. . with parameters t and pj = λj/λ. and σ 2 S 2 = σ . is B(t. λ1 j /λ). we get (1 − 2it)−k/2 = (1 − 2it)−1/2 φkS /σ (t). .v.f. where Y is Cauchy with μ = 0. . of Xj. .’s of both sides of the last identity above. j = 1.3. r have the Geometric distribution with parameter p. . n.2 If the independent r.1 For j = 1. as was the case in Theorems 2–5 above. The second statement is immediate. . THEOREM 6 ( ) ( ) Let Xj.v.v. n. show that the r.

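Theorem 6 says that the sample mean of n i.i.d. standard Cauchy r.v.'s is again standard Cauchy, so, for instance, P(|X̄| ≤ 1) stays at 1/2 no matter how large n is — quite unlike what a law of large numbers would give. A simulation sketch (n, the number of replications and the seed are arbitrary; Cauchy variates are generated by the probability integral transform):

```python
import math
import random

random.seed(1)

def cauchy():
    # standard Cauchy via the probability integral transform of U(0, 1)
    return math.tan(math.pi * (random.random() - 0.5))

n, reps = 20, 50_000
hits = 0
for _ in range(reps):
    xbar = sum(cauchy() for _ in range(n)) / n
    hits += abs(xbar) <= 1.0
# Xbar is itself standard Cauchy, so P(|Xbar| <= 1) = 1/2 for every n.
assert abs(hits / reps - 0.5) < 0.02
```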
Under what conditions on c1 and c2 is the r. μ2 and σ1. is Negative Exponential with parameter λ = 0. . p1) and B(n2. σ2. n. μ2 and σ1. .v. ii) Under what conditions on the α’s and β’s are the r. we give an alternative deﬁnition of independence of r.’s X1 + X2.d. iii) If the automobile comes supplied with a spare part. i = 1.’s such that 2 the X’s are distributed as N(μ1.d. and N(μ2. Y.f. iii) What is the probability that X + Y ≥ 500 days? 7. X2 be independent r.v. σ2) and set X = ∑α j X j .’s of the r. ﬁnd the p.3.5 Let X1. j = 1.v. specify r.) 7. which allows us to present a proof of Lemma 1. set X = c1X1 + c2X2. .’s distributed as N(μ1. iii) Find the expected life of the part in question.3. . and for any two constants c1 and c2.d. of the combined life of the part and its spare. μ1. σ 2 1) and the Y’s are distributed as N(μ2. m and Yj. X whose p.7. . is also stated. n be independent r.1 Stochastic Independence: Criteria Independence 177 7.005 day. σ 2). . X1 − X2 and X1 − X2 + n2. .’s distributed as N(μ.s. .7 Let X1 and X2 be independent r. . n be independent r.i. Then P(X1 − X2 > 0) is the probability that the cable does not break. 1 2 7.3. j =1 n where the α’s and β’s are constants.8 Let Xj. Calculate the probability P(X1 − X2 > 0) as a function of μ1.f. . Y distributed as X and independent of it. μ1 = μ2 and 2 σ2 1 = σ 2 = 6.v. X distributed as χ 2 r ? Also.v.4* Independence of Classes of Events and Related Results In this section.3 The life of a certain part in a new automobile is an r. p2). ii) Give the numerical value of this probability for m = 10.3. respectively.’s distributed as χ 2 r and χ r . (For example.v. σ2.6 Let Xi.4* Independence of Classes of Events andof Related Results 7. 2 7.v. j = 1. n = 15.3.) of a steel cable and X2 may represent the strains applied on this cable. Determine the distribution of the r. X2 be independent r.v. . . σ 2 1). Then ii) Find the p.’s.v.’s X. An additional result.4 Let X1. 
X1 may represent the tensile strength (measured in p.’s X and Y independent? 7. Then ii) Calculate the probability P( X > Y ) as a function of m.’s.v. respectively. j =1 n Y =∑ β j X j . . 2 7.’s distributed as B(n1.v. respectively.3.f. which provides a parsimonious way of checking independence of r. σ 2). whose life is an r. Lemma 3.v.v.

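For the cable interpretation above: by Theorem 4, X₁ − X₂ is N(μ₁ − μ₂, σ₁² + σ₂²), so P(X₁ − X₂ > 0) = Φ((μ₁ − μ₂)/√(σ₁² + σ₂²)). A small numerical sketch (the function name and the test values are ours, not the text's):

```python
from math import erf, sqrt

def Phi(x):
    # standard Normal d.f. via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def prob_no_break(mu1, mu2, s1, s2):
    # X1 - X2 is N(mu1 - mu2, s1^2 + s2^2) by Theorem 4, hence
    # P(X1 - X2 > 0) = Phi((mu1 - mu2) / sqrt(s1^2 + s2^2)).
    return Phi((mu1 - mu2) / sqrt(s1 ** 2 + s2 ** 2))

assert abs(prob_no_break(5, 5, 1, 1) - 0.5) < 1e-12   # equal means: 1/2
assert prob_no_break(10, 5, 1, 1) > 0.99              # much stronger cable
```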
To start with, consider the probability space (S, A, P) and recall that k events A₁, . . . , A_k are said to be independent if for all 2 ≤ m ≤ k and all 1 ≤ i₁ < · · · < i_m ≤ k, it holds that

P(A_{i₁} ∩ · · · ∩ A_{i_m}) = P(A_{i₁}) · · · P(A_{i_m}).

This definition is extended to any subclasses Cⱼ, j = 1, . . . , k, of A as follows:

DEFINITION 2 We say that Cⱼ, j = 1, . . . , k are (stochastically) independent (or independent in the probability sense, or statistically independent) if for every Aⱼ ∈ Cⱼ, j = 1, . . . , k, the events A₁, . . . , A_k are independent. It is an immediate consequence of this definition that subclasses of independent classes are independent.

The next step is to carry over the definition of independence to r.v.'s. To this end, let X be a random variable. Then we have seen (Theorem 1, Chapter 3) that X⁻¹(B) is a σ-field, a sub-σ-field of A, the σ-field induced by X, and A_X = X⁻¹(B); also, if g(X) is a measurable function of X and A_{g(X)} = [g(X)]⁻¹(B), then A_{g(X)} ⊆ A_X. On the basis of these observations, if we consider the r.v.'s Xⱼ, j = 1, . . . , k, we will have the σ-fields induced by them, which we denote by Aⱼ = Xⱼ⁻¹(B), j = 1, . . . , k.

DEFINITION 3 We say that the r.v.'s Xⱼ, j = 1, . . . , k are independent (in any one of the modes mentioned in the previous definition) if the σ-fields induced by them are independent. In fact, for every Aⱼ ∈ Xⱼ⁻¹(B) there exists Bⱼ ∈ B such that Aⱼ = Xⱼ⁻¹(Bⱼ); thus independence of the r.v.'s Xⱼ, j = 1, . . . , k means independence of the events Xⱼ⁻¹(Bⱼ) for every choice of Bⱼ ∈ B, j = 1, . . . , k. According to this statement, the previous definition is equivalent to Definition 1.

Actually, Definition 3 can be weakened considerably, as explained in Lemma 3 below: in order to establish independence of the r.v.'s Xⱼ, j = 1, . . . , k, it suffices to establish independence of the (much "smaller") classes Cⱼ = Xⱼ⁻¹({(−∞, x], x ∈ ℝ}), j = 1, . . . , k.

LEMMA 3 Let Aⱼ = Xⱼ⁻¹(B) and Cⱼ = Xⱼ⁻¹({(−∞, x], x ∈ ℝ}), j = 1, . . . , k. Then if Cⱼ, j = 1, . . . , k are independent, so are Aⱼ, j = 1, . . . , k.

That independence of those σ-fields is implied by independence of the classes Cⱼ, j = 1, . . . , k, is an involved result in probability theory and it cannot be discussed here. The converse is obviously true, since subclasses of independent classes are independent.

We may now proceed with the proof of Lemma 1.

PROOF OF LEMMA 1 Let A ∈ A_{g(X)}. Then there exists B ∈ B such that A = [g(X)]⁻¹(B). But

A = [g(X)]⁻¹(B) = X⁻¹[g⁻¹(B)] = X⁻¹(B′),

▲ Exercise 7. A. .Exercise 7. A2 be events. n. B′ ∈ B. j = 1. j = 1. · · · . . . are independent. Then A j* ⊆ A j .1 Stochastic Independence: Criteria of Independence 179 where B′ = g−1 (B) and by the measurability of g. P) and let A1. j = 1. k. ⋅ ⋅ ⋅ . It follows that X−1 (B′) ∈AX and thus. Generalize it for the case of n events Aj. k. . and since Aj. . A ∈AX. −1 j = 1. A2 are independent. . X2 = IA and show that X1. Let now Aj = Xj−1(B) and A j* = g X j [ ( )] (B ). k. 1 2 . .1 Consider the probability space (S. so are A* j . X2 are independent if and only if A1. j = 1. .4. k. ⋅ ⋅ ⋅ . Set X1 = IA .

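The equivalence in the exercise above (X₁ = I_{A₁}, X₂ = I_{A₂} are independent r.v.'s if and only if A₁, A₂ are independent events) can be illustrated on a small finite probability space. A sketch with a uniform four-point space (our choice of S, A₁, A₂, which happen to be independent):

```python
from itertools import product

# Uniform probability on S = {0, 1, 2, 3}; A1 and A2 are independent events.
S = [0, 1, 2, 3]
A1, A2 = {0, 1}, {0, 2}
P = lambda E: len(E) / len(S)
X1 = lambda s: 1 if s in A1 else 0   # X1 = indicator of A1
X2 = lambda s: 1 if s in A2 else 0   # X2 = indicator of A2
# The joint p.d.f. of (X1, X2) factors into the product of the marginals:
for x, y in product([0, 1], repeat=2):
    joint = P({s for s in S if X1(s) == x and X2(s) == y})
    marg = P({s for s in S if X1(s) == x}) * P({s for s in S if X2(s) == y})
    assert abs(joint - marg) < 1e-12
```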
v.s. . An illustrative example is also discussed. n = 1. DEFINITION 1 i) We say that {Xn} converges almost surely (a. 2. if X ( s ) X ( s ) for all s ∈ S except possibly for ⎯ ⎯ → ⎯ → n →∞ n →∞ n a subset N of S such that P(N) = 0. n →∞ n→∞ or P[Xn ⎯ X ] = 1. or with probability one. δ > 0 there exists N(ε. For such a sequence of r. and we write P Xn ⎯ ⎯ → 0. P[|Xn − X| > ε] ⎯ n →∞ n→∞ P Thus Xn ⎯ ⎯ → X means that: For every ε. then clearly P[|Xn − X| ≥ ε] ⎯ ⎯ → 0.v.s.v.180 8 Basic Limit Theorems Chapter 8 Basic Limit Theorems 8. or Xn ⎯ ⎯ → X with probability 1.’s and the r. X as n → ∞.1 Some Modes of Convergence Let {Xn}.s. s). and some comments are provided as to their nature. then Xn ⎯ ⎯ → X is n→∞ equivalent to: P[|Xn − X| ≤ ε] ⎯ ⎯ → 1. X are deﬁned on the probability space (S. δ ) > 0 n→∞ such that P[|Xn − X| > ε] < δ for all n ≥ N(ε.). Also if P[|Xn − X| > ε] ⎯ ⎯ → 0 for every n →∞ n →∞ ε > 0. and we write Xn ⎯ ⎯ → X. to a. c a. s) > 0 such that Xn s − X s < ε for all n ≥ N(ε. . if for every ε > 0. Thus Xn ⎯ ⎯ → X means that for every ε > 0 and for every s ∈ N there n→∞ exists N(ε. This type of convergence is also known as strong convergence. .’s four kinds of convergence are deﬁned. P)). ii) We say that {Xn} converges in probability to X as n → ∞. be a sequence of random variables and let X be a random variable deﬁned on the sample space S supplied with a class of events A and a probability function P (that is. ⎯ → X. δ ). n →∞ () () 180 . A. P REMARK 1 Since P[|Xn − X| > ε] + P[|Xn − X| ≤ ε] = 1. the sequence of the r.

⎪ ⎩ () if if if ( ) 1 − (1 n) ≤ x < 1 + (1 n) x ≥ 1 + (1 n). n →∞ n→∞ d Thus Xn ⎯ X means that: For every ε > 0 and every x for which ⎯ → n→∞ F is continuous there exists N(ε. ⎯→ X. fn x = ⎨ 2 ⎪ ⎩0. This type of convergence is also known as weak convergence. n →∞ Next.) as n → ∞.d. if E|Xn − X| ⎯ n →∞ n→∞ q.’s fn. 2. as the following example illustrates. Thus Xn ⎯ ⎯ → X means that: For every ε > 0.f.d. For n = 1. F x =⎨ ⎩1. Then n iii) We say that {Xn} converges in distribution to X as n → ∞. Fn corresponding to fn is given by ⎧0. Under further conditions on fn.1 Some Modes of Convergence 181 Let now Fn = FX .’s deﬁned by 1 ⎧ ⎪ . f.d. however. clearly.f. the d. if Fn(x) ⎯ ⎯ → F(x) for all x ∈ ޒfor which F is continuous. x) such that |Fn(x) − F(x)| < ε for all n ≥ N(ε. Then. fn(x) ⎯ ⎯ → f(x) = 0 for all x ∈ ޒand f(x) is not a p. () if if x<1 x ≥ 1. . Fn x = ⎨ 2 ⎪1.f.d.f. We now assume that E|Xn|2 < ∞. write Xn ⎯ ⎯ → 0. x). () if x = 1− 1 n ( ) or x = 1+ 1 n ( ) EXAMPLE 1 otherwise.m.f.m. which is a d.d. . n = 1. it may be the case. x < 1− 1 n 1 1 2 Fn Figure 8. f. Then: iv) We say that {Xn} converges to X in quadratic mean (q. . and we write d Xn ⎯ ⎯ → X. d REMARK 2 If Fn have p. 2. consider the p. and we 2 q.m. where F(x) is deﬁned by n →∞ ⎧0 . . there exists N(ε) > 0 such n→∞ that E|Xn − X|2 < ε for all n ≥ N(ε). .8. F = FX.f. . .. that fn converges to a p. . .1 0 1Ϫ 1 n 1 1ϩ 1 n One sees that Fn(x) ⎯ ⎯ → F(x) for all x ≠ 1. then Xn ⎯ ⎯ → X does not necessarily imply n→∞ the convergence of fn(x) to a p. ⎪ ⎪1 .f.

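The point of Example 1 — Fₙ(x) → F(x) at every continuity point of F, with failure only at the jump x = 1, where Fₙ(1) = 1/2 for every n — can be tabulated directly (the probe points below are arbitrary):

```python
def F_n(n, x):
    # d.f. of Example 1: jumps of 1/2 at 1 - 1/n and 1 + 1/n
    if x < 1 - 1 / n:
        return 0.0
    if x < 1 + 1 / n:
        return 0.5
    return 1.0

F = lambda x: 0.0 if x < 1 else 1.0   # the limiting d.f.

# At continuity points of F the convergence holds (already exact for large n):
for x in (0.9, 1.1, -3.0, 42.0):
    assert F_n(10 ** 6, x) == F(x)
# ... but at the discontinuity x = 1 it fails: F_n(1) = 1/2 for every n.
assert F_n(10 ** 6, 1.0) == 0.5 and F(1.0) == 1.0
```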
v.f. 8. .5 Let Xj . .v. Yn be r. d iii) Un ⎯ ⎯ → U. j = 1. Exercises 8. .v. if P(An) ⎯ n →∞ n→∞ to 0.’s such that EXj = μ. . . Convergence in distribution is also a pointwise convergence of the sequence of numbers {Fn(x)} for every x for which F is continuous. 2. j = 1. . . n. n →∞ Thus a convergent sequence of d. 8. . 2. . let Xn be an r.1 For n = 1. be i. Convergence in probability.’s need not converge to a d. however. By setting An = {s ∈S. as n → ∞. .i. . . r. 8.1.’s such that P X n = 1 = pn . . both ¯n − μ)2 ⎯ ﬁnite. So the sequence of numbers {P(An)} tends ⎯ → X. P n→∞ ( ) ⎯→ 0? Under what conditions on the pn’s does Xn ⎯ 8. Finally. Un = nYn. . .2 For n = 1. .v.2 Relationships Among the Various Modes of Convergence The following theorem states the relationships which exist among the various modes of convergence. Then show that Y ⎯→ ⎯ → n →∞ n ⎯ n→∞ X. Then show that. and set Yn = min(X1. .d. Fn given by Fn(x) = 0 if x < n and Fn(x) = 1 if x ≥ n. . Zn = max(X1.v. X . n.1. .’s such that E(Xn − Yn)2 ⎯ ⎯ → 0 and n →∞ q. |Xn(s) − X(s)| > ε} for an arbitrary but ﬁxed ε > 0. d iv) Vn ⎯ ⎯→ V. where U and V have the negative exponential distribution with parameter λ = 1. . Show that E(X ⎯ → 0. we have P that Xn ⎯ ⎯ → 0. ( ) P X n = 0 = 1 − pn. .182 8 Basic Limit Theorems REMARK 3 Almost sure convergence is the familiar pointwise convergence of the sequence of numbers {Xn(s)} for every s outside of an event N of probability zero (a null event). . but the events An themselves keep wandering around the sample space S. Xn).m. one has P i) Yn ⎯ ⎯ → 0. Then show that Fn(x) ⎯ ⎯ → 0 for every x ∈ޒ.v. . convergence in quadratic mean simply signiﬁes that the averages E|Xn − X|2 converge to 0 as n → ∞.f. let Xn be independent r.’s distributed as U(0.f. with d. .1.4 For n = 1.1. P ii) Zn ⎯ ⎯→ 1. let Xn. Xn). 2. . . σ 2(Xj) = σ 2. . n →∞ 8. . n be independent r. 1). Vn = n(1 − Zn).1.3 Let Xj. . as n → ∞. is of a different nature. suppose that E(Xn − X)2 ⎯ 0 for some r.

Ac as well as those appearing in the remaining of this discussion are all events.m. n →∞ ∞ ⎛ ⎛ 1⎞ 1⎞ X − X ≥ ⊆ ⎜ n+ m ⎟ U ⎜ X n+ r − X ≥ ⎟ . Chapter 2. 2. in q. one has that n→∞ P(Cn) ⎯ ⎯ → 0. k⎠ k=1 n=1 r =1 ⎝ so that the set Ac for which Xn ⎯ ⎯ → X is given by n →∞ ∞ ∞ ∞ ⎛ 1⎞ A c = U I U ⎜ X n+ r − X ≥ ⎟ . where ∞ ⎛ 1⎞ C n = U ⎜ X n+ r − X ≥ ⎟ . we have ( ) P Xn − X > ε ≤ [ ] E Xn − X 2 ε2 .2 Relationships Among the Various Modes of Convergence 183 THEOREM 1 a. by Theorem 2. n n→∞ n→∞ P d iii) Xn ⎯ ⎯→ X implies Xn ⎯ ⎯ → X. In terms of a diagram this is a. as k → ∞. To summarize. ii) By special case 1 (applied with r = 2) of Theorem 1.4) that ∞ ∞ ∞ ⎛ 1⎞ A = I U I ⎜ X n+ r − X < ⎟ .s. k⎠ n=1 r =1 ⎝ we have Bk↑Ac.s. k⎠ r =1 ⎝ Hence P(Cn)↓P(Bk) = 0 by Theorem 2. P i) Xn ⎯ ⎯ → X implies Xn ⎯ ⎯ → X. which is equivalent to saying that P(Ac) = 0.2.s. By setting ∞ ∞ ⎛ 1⎞ Bk = I U ⎜ X n+ r − X ≥ ⎟ . ⇑ conv. in dist. as n→∞ was to be seen. k ≥ 1. so that P(Bk) → P(Ac). ⇒ conv. But for any ﬁxed positive integer m. Thus if Xn ⎯ ⎯ → X. The converse is also true if X is degenern→∞ n→∞ ate. conv. Xn ⎯ ⎯ → X. PROOF i) Let A be the subset of S on which Xn ⎯ ⎯ → X.8. if a.m. Next. Then it is not hard to see n →∞ (see Exercise 8. . and therefore P(Bk) = 0. ⇒ conv. Cn↓Bk. n→∞ n→∞ q. and hence we can take their probabilities. However. Chapter a.s. again. this is equalivalent to saying that Xn ⎯ ⎯ → X. that is. and as n → ∞. n→∞ it is clear that for every ﬁxed k. P[X = c] = 1 for some constant c. k ⎠ r =1 ⎝ k⎠ ⎝ so that ⎡∞ ⎛ ⎛ 1⎞ 1⎞⎤ P⎜ X n+ m − X ≥ ⎟ ≤ P ⎢U ⎜ X n+ r − X ≥ ⎟ ⎥ = P C n ⎯ ⎯ →0 n→∞ k⎠ k⎠⎥ ⎝ ⎢ ⎣ r =1 ⎝ ⎦ P for every k ≥ 1. then P(Ac) = 0. P ii) Xn ⎯ ⎯ → X implies X ⎯ ⎯ → X. in prob. k⎠ k=1 n=1 r =1 ⎝ The sets A.

Thus. then E|Xn − X|2 ⎯ ⎯ → 0 implies P[|Xn − X| > ε] ⎯ ⎯ → n →∞ n →∞ n→∞ P 0 for every ε > 0. n →∞ n →∞ Letting ε → 0. then we have by taking limits P n→∞ ( () [ ] F x − ε ≤ lim inf Fn x . X ≤ x − ε since ] [ ≤ x] ∪ [ X ] [ ⊆ [X n n ] [ ≤ x + X n > x. x ≠ c. We have ⎯ → n→∞ P X n − c ≤ ε = P −ε ≤ X n − c ≤ ε [ ] [ [ ] = P[ X ≤ c + ε ] − P[ X ≥ P[ X ≤ c + ε ] − P[ X = F (c + ε ) − F (c − ε ). Assume now that P[X = c] = 1.184 8 Basic Limit Theorems q. Xn ⎯ ⎯ → X . We must show that n →∞ P Xn ⎯ c . () x<c x≥c and our assumption is that Fn(x) ⎯ ⎯ → F(x). n →∞ n →∞ Hence lim Fn(x) exists and equals F(x). X ≤ x − ε + X n > x. So ] [ ] ] [X ≤ x − ε ] ⊆ [X implies ≤ x ∪ Xn − X ≥ ε ] [ ] ] P X ≤ x − ε ≤ P Xn ≤ x + P Xn − X ≥ ε . Then we have [X ≤ x − ε ] = [X ⊆ [X ⊆ [X [X n n n n ≤ x. n →∞ In a similar manner one can show that lim sup Fn x ≤ F x + ε . F x =⎨ ⎩1. n→∞ iii) Let x ∈ ޒbe a continuity point of F and let ε > 0 be given.m. or equivalently. = P c − ε ≤ Xn ≤ c + ε n n n n ] n n ] ≤ c − ε] < c−ε . [ ] [ ) ] [ or F x − ε ≤ Fn x + P X n − X ≥ ε . X ≤ x − ε n −X ≥ε . n →∞ ( ) () (1) () ( ) (2) But (1) and (2) imply F(x − ε) ≤ lim inf Fn(x) ≤ lim sup Fn(x) ≤ F(x + ε). ] ] ] > x. if Xn ⎯ ⎯→ X. if Xn ⎯ ⎯→ X. − X ≥ − x + ε − X ≥ ε ⊆ Xn − X ≥ ε . Then n→∞ () () () () ⎧0 . X ≤ x − ε = X n > x. Thus. we get (by the fact that x is a continuity point of F ) that F x ≤ lim inf Fn x ≤ lim sup Fn x ≤ F x .

c + ε are continuity points of F. otherwise. a.’s as follows: For each k = 1. while it converges nowhere pointwise. n n→∞ n→∞ EXAMPLE 3 Let S and P be as in Example 2.s. For each k = 1. . n→∞ 2 that is. q.v.m. These intervals are then given by ⎛ j −1 j ⎤ ⎜ k −1 . Hence EX 2 . σ 2 X n ⎯ ⎯ → 0.m. and also that X ⎯ ⎯ → X need not imply Xn ⎯ ⎯ → X. . ⎝2 2 ⎦ We assert that the so constructed sequence X1. n →∞ n ⎯ P ⎯ → X need not imply that The example just discussed shows that Xn ⎯ n→∞ a. EX n ⎯ 0. . .1/n].v. X2. whose subscripts range from 2k−1 to 2k − 1. ⎯ → 0 but EX 2 ⎯→ 0.s. . .2 Relationships Among the Various Modes of Convergence 185 Since c − ε. That n n→∞ n→∞ n→∞ a. It is also clear that for m > n. and let P be the probability function which assigns to subintervals of (0. clearly. 1]. Xn ⎯ n →∞ n = n(1/n) = 1. . ⎝2 2 ⎦ j = 1. EXAMPLE 2 It is shown by the following example that the converse in (i) is Let S = (0.’s converges to 0 in probability. Xn ⎯ ⎯ → X . k−1 ⎥ and 0. 1/2 < ε for all sufﬁciently large k. nI(0. REMARK 5 In (ii). s ∈ (0. n →∞ n →∞ In fact. of r. X2. ▲ [ REMARK 4 not true. n→∞ [ ] ( ) ( ) ] n→∞ Thus P Xn − c ≤ ε ⎯ ⎯→ 1. ⋅ ⋅ ⋅ . we have that X is the indicator of an ⎯ → n →∞ n interval ⎛ j −1 j ⎤ ⎜ k−1 . then: Xn ⎯ ⎯→ X if and only if n→∞ E Xn ⎯ ⎯→ c. ( ) ( ) . the proof that EX 2 ⎯ → 0 is complete. so that Xn ⎯ n→∞ q.m.v. . and for n ≥ 1. we deﬁne a group of 2k−1 r. 1] as measures of their length. Then. by Theorem 1(ii). let Xn be deﬁned by Xn = q. . In fact. We deﬁne the jth r.s.’s. . divide (0.8. 2 k −1. Since for every ε > 0. . k−1 ⎥ ⎝2 2 ⎦ k−1 for some k and j as above. if P[X = c] = 1. in the following way: There are (2k − 1) − (2k−1 − 1) = 2k−1 r. q. we get lim P X n − c ≤ ε ≥ F c + ε − F c − ε = 1 − 0 = 1. 1] into 2k−1 subintervals of equal length. in this group to be equal to 1 for ⎛ j −1 j ⎤ s ∈ ⎜ k−1 . 1]. 2.m.v. n = 1/2 2 k−1 k−1 EX m ≤ 1/2 . not even for a single q. it sufﬁces to show that Xn ⎯ ⎯→ 0.) Deﬁne the sequence X1.v. of r. 
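The construction in Example 2 — at stage k, the 2^{k−1} indicators of the subintervals ((j − 1)/2^{k−1}, j/2^{k−1}] — can be coded directly: the interval lengths, which equal P(Xₙ = 1), shrink to 0, giving convergence to 0 in probability, yet every s ∈ (0, 1] falls in exactly one interval of every stage, so Xₙ(s) = 1 for infinitely many n and there is no pointwise (a.s.) convergence. A sketch:

```python
def interval(n):
    # n-th r.v. of Example 2: n lies in block k, i.e. 2^(k-1) <= n <= 2^k - 1
    k = n.bit_length()
    j = n - 2 ** (k - 1) + 1          # position within block k: j = 1..2^(k-1)
    w = 2.0 ** -(k - 1)               # interval length = P(X_n = 1)
    return ((j - 1) * w, j * w)       # X_n = indicator of this interval

# P(X_n = 1) -> 0, so X_n -> 0 in probability:
a, b = interval(10 ** 6)
assert b - a < 1e-5
# but a fixed s is covered exactly once in each block, so X_n(s) = 1 i.o.:
s = 0.3
for k in range(1, 16):
    hits = [n for n in range(2 ** (k - 1), 2 ** k)
            if interval(n)[0] < s <= interval(n)[1]]
    assert len(hits) == 1
```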
.’s within this group. 2. . For any n ≥ 1.m. 1]. k −1 ⎥. (This is known as the Lebesgue measure over (0. Xn ⎯ ⎯ → X need not imply that X ⎯ ⎯ → X is seen by the following example. . 2.

⎩ x<0 0 ≤ x < 1. but X does not converge in ⎯ → n n→∞ probability to X. ⎩ n x<0 0 ≤ x < 1. 4}. corresponding to F. then g is a ch. corresponding to Fn and φ be the ch.f. to a function g(t) which is continuous at t = 0. Let φn be the ch. which is much easier to deal with. PROOF Omitted. let Xn be an r. In fact. then Fn (x) ⎯ ⎯ → n →∞ F(x). for all continuity points x of F. is given by φn(t) = e−t n/2. and on the subsets of S. FX (x) ⎯ ⎯ → FX(x) for all n →∞ d continuity points of FX.. X n 3 = X n 4 = 0.f. x≥1 FX () ⎧0. and let F be a d. then φn(t) ⎯ ⎯ → φ(t). trivially. n THEOREM 2 (P. ii) If φn(t) converges. distributed as N(0. The following theorem replaces this problem with that of proving convergence of ch. x = ⎨2 ⎪1. () () FX n () ⎧0. as n → ∞.’s: X n 1 = X n 2 = 1.186 8 Basic Limit Theorems E Xn − c ( ) 2 Hence E(Xn − c)2 ⎯ ⎯ → 0 if and only if σ 2(Xn) ⎯ ⎯ → 0 and EXn ⎯ ⎯ → c. Then. ⋅ ⋅ ⋅ . () () X 3 = X 4 = 1. ⎪1 .f. so that its ch. () () Then X n s − X s = 1 for all s ∈ S . Then 2 . x = ⎨2 ⎪1. ⎯ → F(x) for all continuity points x of F. and if F is the corresponding d.f. n →∞ n →∞ n →∞ ) ( = E( X − EX ) + ( EX = σ ( X ) + ( EX − c ) . Deﬁne the following r. Lévy’s Continuity Theorem) Let {Fn} be a sequence of d. x≥1 so that FX (x) = FX(x) for all x ∈ޒ. Xn ⎯ X .’s. Thus. that is. Very often one is confronted with the problem of proving convergence in distribution. n). n = 1. 2 n n 2 2 n n = E X n − EX n + EX n − c n [( )] − c) 2 2 REMARK 6 The following example shows that the converse of (iii) is not true. let P be the discrete uniform function.f.’s. 3.f.f. and t ∈ޒ. for i) If Fn (x) ⎯ n →∞ n →∞ every t ∈ޒ. ⎪1 ..v. 2. 2. Now. Hence Xn does not converge in probability to X. REMARK 7 The assumption made in the second part of the theorem above according to which the function g is continuous at t = 0 is essential. EXAMPLE 4 Let S = {1.f.v. as n → ∞. () () () () and X 1 = X 2 = 0.
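The point of Example 4 — identical distribution functions, yet |Xₙ(s) − X(s)| = 1 at every sample point, so convergence in distribution does not imply convergence in probability — can be spelled out on the four-point space:

```python
S = [1, 2, 3, 4]                      # discrete uniform: P({s}) = 1/4
Xn = {1: 1, 2: 1, 3: 0, 4: 0}
X = {1: 0, 2: 0, 3: 1, 4: 1}
df = lambda Z, x: sum(1 for s in S if Z[s] <= x) / len(S)
# Xn and X have the same d.f. ...
for x in (-1, 0, 0.5, 1, 2):
    assert df(Xn, x) == df(X, x)
# ... yet |Xn(s) - X(s)| = 1 everywhere: no convergence in probability.
assert all(abs(Xn[s] - X[s]) == 1 for s in S)
```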

v. which assigns probability to intervals equal to their lengths.i.2.2. i) Xn ⎯ n→∞ ii) Xn(s) ⎯ ⎯ → 0 for any s ∈ [0. . N = 1. Let . (Use ch.v.’s having the negative binomial distribution with pn and rn such that pn ⎯ ⎯ → ∞. . show that the set A of conver1 ∞ ∞ ∞ gence of {Xn} to X is. .3 The Central Limit Theorem 187 φn(t) ⎯ ⎯ → g(t).) 8. .’s. n→∞ ⎯ → 0. deﬁne the r.f. is not a d. 2. where npn = d λn ⎯ ⎯ → λ(>0).5 In reference to the proof of Theorem 1. show that Xn ⎯ ⎯ → X. .v. . .v. where X is an r.3 The Central Limit Theorem We are now ready to formulate and prove the celebrated Central Limit Theorem (CLT) in its simplest form. .v. of an r.’s.f. let Xn be r. =1 Un =1 I r =1(|Xn+r − X| < – k 8. . (Use ⎯ → n→∞ ch. 2.2 For n = 1. so that rn(1 − pn) = λn ⎯ ⎯ → ⎯ → 1. Xn be i. . . j = 0. distributed as P ( λ ).2. . . show that P ¯n ⎯ there is no ﬁnite constant c for which X ⎯ → c. by using ch. if t ≠ 0. indeed. otherwise. 2. . n→∞ XN 2 () 2 8. n have a Cauchy distribution. j = 1. . 1) and let P be the probability function on subsets of S. . iii) Xn (s) ⎯ n→∞ iv) EXn ⎯ ⎯ → 0. .d.8. THEOREM 3 (Central Limit Theorem) Let X1.’s. The conclusion in (ii) does not hold here because ⎛X ⎛ x ⎞ x ⎞ 1 FX x = P X n ≤ x = P⎜ n ≤ ⎯ → ⎟ = Φ⎜ ⎟ ⎯ n→∞ 2 ⎝ n ⎝ n⎠ n⎠ 1 for every x ∈ ޒand F(x) = – . 2 n () ( ) Exercises 8. pn). . n→∞ n→∞ distributed as P(λ). r.i. For n = 1.v.’s distributed as B(n.4 If the i. 2N. . 1). 1). rn ⎯ n→∞ n→∞ n→∞ d λ(>0).3 For n = 1. .’s Xj. .1 (Rényi) Let S = [0. . and g(0) = 1.v. . so that g is not continuous n→∞ at 0. if ≤s< s = ⎨ 2N + 1 2N + 1 +j ⎪ ⎩0. s ∈ (0.2. . .d. .’s with mean μ (ﬁnite) and (ﬁnite and positive) variance σ 2. where g(t) = 0. r. let Xn be r.’s Xn as follows: ⎧ j j +1 ⎪N .f. x ∈ޒ. Show that Xn ⎯ X .2. Then. where X is an r. . 2. expressed by A = Ik ).f. .v. Then show that P ⎯ → 0.) n→∞ 8. 1. 8.

This point should be kept in mind throughout. that is. if an n→∞ = n and bn = n2. n = 1.188 8 Basic Limit Theorems ⎤ ⎡ n X −μ 1 n n ⎥. since n/n2 = 1/n ⎯ ⎯ → 0. n→∞ Xn = () ( ) () 1 2π ∫ x −∞ e − t 2 dt . This will imply that Gn(x) → Φ(x). To this end. and Φ x = ⎢ X . 2. t ∈ ޒ. Clearly.f. t ∈ޒ. or ≈ N 0. then an = o(bn). We say that {an} is o(bn) (little o of bn) and we write an = o(bn). since j =1 n n Xn − μ ( σ )=S ( ).v. n→∞ x ∈ ޒ. Namely. of Gn and φ be the ch. where ( ) ( ) ( ) ( ) n ( ) Sn = ∑ X j . if an = n→∞ o(bn). {bn}. We have 2 n Xn − μ ( σ ) = nX = 1 n n − nμ n σ n = 1 n ∑ n Xj − μ j =1 σ ∑ Zj . it sufﬁces to prove that gn(t) ⎯ ⎯ → φ(t). iii) In the proof of Theorem 3 and elsewhere. . . For example. . if an ⎯ ⎯ → a. Therefore o(bn) = bno(1). the “little o” notation will be employed as a convenient notation for the remainder in Taylor series expansions. that the same or similar symbols have been employed elsewhere to denote different quantities (see. iv) We recall the following fact which was also employed in the proof of Theorem 3. . for example. 1 . if an/bn ⎯ ⎯ → 0.’s X1. .f. let {an}. σ (S ) − E Sn n ii) In part (i). of Φ. . Xn. It should be pointed out. be two sequences of numbers. Corollaries 1 and 2 in Chapter 7. or Theorem 9 and Corollary to Theorem 8 in this chapter). G x = P ≤ x ∑ j n ⎥ ⎢ n j =1 σ ⎦ ⎣ Then Gn(x) ⎯ ⎯ → Φ(x) for every x in ޒ. A relevant comment would then be in order. . j =1 . the notation Sn was used to denote the sum of the r. and we are going to adhere to it here. 2 REMARK 8 i) We often express (loosely) the CLT by writing n Xn − μ Sn − E Sn ≈ N 0. then an = bno(1). by Theorem 2. This is a generally accepted notation. then n→∞ ⎛ an ⎞ ⎯→ e a. Let gn be the ch. φ(t) = e−t /2. Then. 1 . Chapter 3. however. ⎜1 + ⎟ ⎯ n ⎠ n→∞ ⎝ n PROOF OF THEOREM 3 We may now begin the proof. σ σ Sn for large n.

2 COROLLARY ⎯ → Φ(x) is uniform in x ∈ ޒ. . n are i. gn(t) ⎯ ⎯ → e−t /2. of n→∞ Φ. We have: . j = 1. which along with the theorem itself provides the justiﬁcation for many approximations. . If Xj. ▲ The following examples are presented for the purpose of illustrating the theorem and its corollary.d. = − + = − + ⎟ ⎜ n⎟ 2n 2n n 2n ⎝ ⎠ ⎝ n⎠ 1 () n [ ( )] Thus ⎫ ⎧ t2 ⎪ ⎪ 1−o 1 ⎬ . n are i.3. σ 2(Zj) = E(Z2j) = 1.i. . . . for simplicity. . ′ 0 + ⎜ ⎟ ⎟ = gZ 0 + 2! ⎝ n ⎠ ⎝ n⎠ ⎝ n⎠ n 1 1 () 1 () 2 1 () Since 2 g Z1 0 = 1.8. the CLT is used to give an approximation to P[a < Sn ≤ b]. such that |Gn(x) − Φ(x)| < ε for all n ≥ N(ε) and all x ∈ ޒsimultaneously.6*.) PROOF It is an immediate consequence of Lemma 1 in Section 8. ▲ The theorem just established has the following corollary. g Z ′′1 0 = i 2 E Z 1 () () ( ) () ( ) we get ⎛ t ⎞ ⎛ t2 ⎞ t2 t2 t2 t2 gZ ⎜ 1 o 1 o 1 = 1 − 1− o 1 . 8.3 The Central Limit Theorem 189 where Zj = (Xj − μ)/σ. with E(Xj) = μ. The convergence Gn(x) ⎯ n→∞ (That is. . σ 2(Xj) = σ 2.i.f. Hence. which is the ch. . j = 1. ⎝ n⎠ ⎣ ⎥ ⎢ ⎝ n ⎠⎦ 1 n Now consider the Taylor expansion of gz around zero up to the second order term. when this last expression appears as a subscript. Then ⎛ t ⎞ ⎛ t2 ⎞ t 1⎛ t ⎞ gZ ⎜ gZ gZ ′′ 0 + o⎜ ⎟ . g n t = ⎨1 − ⎪ ⎪ ⎭ ⎩ 2n () [ ( )] Taking limits as n → ∞ we have. ′ 1 0 = iE Z1 = 0. with E(Zj) = 0.d.1 Applications 1. for every x ∈ ޒand every ε > 0 there exists N(ε) > 0 independent of x. writing Σ j Zj instead of Σ n j=1 Zj. g Z = −1. −∞ < a < b < +∞. we have gn t = g t () ( n Σ jZ j ) () ⎛ t ⎞ ⎡ ⎛ 1 ⎞⎤ t = g Σ jZ j ⎜ ⎟ = ⎢ g Z1 ⎜ ⎟⎥ .
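Before turning to the numerical approximations, Theorem 3 itself can be probed by simulation: for i.i.d. U(0, 1) summands (μ = 1/2, σ² = 1/12), the standardized sample mean is already approximately N(0, 1) for moderate n. A sketch (n, the replication count, the seed and the probe point x are all arbitrary):

```python
import math
import random

random.seed(2)

def Phi(x):
    # standard Normal d.f. via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, reps, x = 30, 20_000, 1.0
mu, sigma = 0.5, math.sqrt(1 / 12)
count = 0
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    count += math.sqrt(n) * (xbar - mu) / sigma <= x
freq = count / reps
assert abs(freq - Phi(x)) < 0.02       # Phi(1) is about 0.8413
```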

a − nμ . To start with. For a given n. [ ] ( ) ( ) ( ) ( ) ( ) ( ) a* = ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) where .5. σ = pq . are independently distributed as B(1. where now Xj. (This is the De Moivre theorem. where . npq Then it can be shown that fn(r)/φn(x) ⎯ ⎯ → 1 and this convergence is uniform n→∞ for all x’s in a ﬁnite interval [a. the approximation is best for p = – and 2 1 deteriorates as p moves away from – . Thus: P a < Sn ≤ b ≈ Φ b * − Φ a * . let ⎛ n⎞ fn r = ⎜ ⎟ p r q n− r . respectively. n. . Normal approximation to the Binomial. if in the expressions of a* and b* we replace a and b by a + 0. 2.190 8 Basic Limit Theorems ⎡ a − E Sn Sn − E Sn b − E Sn ⎤ ⎥ P a < Sn ≤ b = P⎢ < ≤ σ Sn σ Sn σ Sn ⎥ ⎢ ⎣ ⎦ ⎡ a − nμ S n − E S n b − nμ ⎤ ⎥ ≤ = P⎢ < σ S ⎥ ⎢ σ σ n n n ⎦ ⎣ ⎤ ⎡ ⎡ Sn − E Sn Sn − E Sn b − nμ a − nμ ⎤ ⎥ ⎥ − P⎢ = P⎢ ≤ ≤ σ Sn ⎢ ⎢ σ n ⎥ σ n ⎥ ⎦ ⎦ ⎣ σ Sn ⎣ ≈ Φ b* − Φ a * . . σ n σ n (Here is where the corollary is utilized.) Thus for x= r − np . the Normal approximation to the Binomial distribution presented above can be improved.) That is. Some numerical examples will shed some 2 light on these points. . P(a < Sn ≤ b) ≈ Φ(b*) − Φ(a*). In the following we give an explanation of the continuity correction. This is the same problem as above. This is called the continuity correction. The above approximation would not be valid if the convergence was not uniform in x ∈ޒ. b]. Also. j = 1. . The points a* and b* do depend on n. where a* = a − np npq . b* = b − nμ ( ) ( ) ( ) b − np npq . and let φ n x = ⎝r ⎠ () () 1 2πnpq e−x 2 2 .5 and b + 0. p). b* = REMARK 9 It is seen that the approximation is fairly good provided n and 1 p are such that npq ≥ 20. We have μ = p. and therefore move along ޒas n → ∞.

while the approximation without correction is the area bounded by the normal curve. the horizontal axis.3 N (2. 1 1 To give an idea of how the correction term – comes in.3 The Central Limit Theorem 191 large n. Clearly. and for discrete r. under the conditions of the CLT. given by the area bounded by the normal curve.d.f.2 0.v. ( ) 0.6) 0. the probability r n−r (n is approximately equal to the value r )p q ⎡ r − np 2 ⎤ ⎥ exp⎢− ⎥ ⎢ 2 npq 2πnpq ⎥ ⎢ ⎦ ⎣ of the normal density with mean np and variance npq for sufﬁciently large n. In particular. for integer-valued r. the correction. and the abscissas 1 and 3.’s.v. the horizontal axis and the abscissas 1.2 2 drawn for n = 10. To summarize.2 1 2 3 4 5 Now P 1 < S n ≤ 3 = P 2 ≤ S n ≤ 3 = fn 2 + fn 3 = shaded area.2. we ﬁrst rewrite the expression as follows: .5 − nμ ⎞ P a < S n ≤ b ≈ Φ⎜ ⎟ − Φ⎜ ⎟ ⎝ ⎠ ⎝ ⎠ σ n σ n ( ) with continuity correction. we have. we refer to Fig. 8. ⎛ b − nμ ⎞ ⎛ a − nμ ⎞ P a < S n ≤ b ≈ Φ⎜ ⎟ − Φ⎜ ⎟ ⎝ σ n ⎠ ⎝ σ n ⎠ and ( ) ( ) () () ( ) without continuity correction.5. in general.8. Note that this asymptotic relationship of the p. by the convergence of the distribution functions in the CLT. that fn(r) is close to φn(x).’s is not implied. is closer to the exact area.’s and probabilities of the form P(a ≤ Sn ≤ bn). 1. That is.1 0 Figure 8.5 and 3. p = 0. ⎛ b + 0. in particular.5 − nμ ⎞ ⎛ a + 0.

1. where (5) . p2 = 16 – . b′ = b + 0. and then apply the above approximations in order to obtain: P a ≤ Sn ≤ b ≈ Φ b * − Φ a * ( ) (( ) ) (3) ( ) ( ) ( ) a − 1 − nμ without continuity correction.5 = 1.5 − nμ . ﬁnd P(45 ≤ Sn ≤ 55). b* = b − nμ σ n . σ n σ n These expressions of a*. a* = 1 1 5 100 ⋅ ⋅ 2 2 1 55 − 100 ⋅ 5 2 = = 1.5 − 100 ⋅ 2 = 5.5 − nμ Normal approximation without correction: 1 44 − 100 ⋅ 2 = 6 = −1.1 a′ = 5 1 1 100 ⋅ ⋅ 2 2 1 55 + 0. (4) P a ≤ S n ≤ b ≈ Φ b′ − Φ a ′ ( ) ( ) ( ) with continuity correction. b* = 1 1 5 100 ⋅ ⋅ 2 2 Thus Φ b * − Φ a * = Φ 1 − Φ −1.7288 2 a′ = a − 0. where a* = and σ n .5 − 100 ⋅ 2 = − 5.2 − 1 = 0.841345 + 0.2 = Φ 1 + Φ 1.884930 − 1 = 0. b′ = 5 1 1 100 ⋅ ⋅ 2 2 ( ) ( ) () ( ) () ( ) .192 8 Basic Limit Theorems P a ≤ S n ≤ bn = P a − 1 < S n ≤ bn . EXAMPLE 5 1 5 . Normal approximation with correction: 1 45 − 0. (Numerical) For n = 100 and p1 = – 2 1 i) For p1 = – : Exact value: 0.2. b* and a′.5 = −1. b′ in (4) and (5) will be used in calculating probabilities of the form (3) in the numerical examples below.7263.
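The comparison of Example 5 — exact Binomial probability versus the Normal approximation with and without the continuity correction — can be reproduced exactly for the case n = 100, p = 1/2:

```python
from math import comb, erf, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p, a, b = 100, 0.5, 45, 55          # P(45 <= S_n <= 55) of Example 5
mu, sd = n * p, sqrt(n * p * (1 - p))
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(a, b + 1))
plain = Phi((b - mu) / sd) - Phi((a - 1 - mu) / sd)          # no correction
corr = Phi((b + 0.5 - mu) / sd) - Phi((a - 0.5 - mu) / sd)   # with correction
assert abs(exact - 0.7288) < 1e-3          # the text's exact value, 0.7288
assert abs(corr - exact) < abs(plain - exact)   # the correction helps
```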

3. 5 ii) For p2 = 16 – . a′ = 2. n are independent P(λ).7288 − 0.7838.7286 = 0.5 − nλ . Then: so that Φ b * − Φ a * = 0.1 − Φ −1.23.7286.7263 = 0. . This is the same problem as in (1). Thus ⎛ b − nλ ⎞ ⎛ a − nλ ⎞ P a < S n ≤ b ≈ Φ⎜ ⎟ − Φ⎜ ⎟ ⎝ nλ ⎠ ⎝ nλ ⎠ and ( ) without continuity correction. ⎛ a + 0.0030. ( ) ( ) ( ) nλ with continuity correction. where now Xj. Error with correction: 0. where a′ = EXAMPLE 6 a − 0.5 − nλ ⎞ ⎛ b + 0. b′ = 5.7288 − 0.0002.86. Normal approximation to Poisson. We have μ = λ. (Numerical) For nλ = 16. . b′ = b + 0.5 − nλ ⎞ P a < S n ≤ b ≈ Φ⎜ ⎟ ⎟ − Φ⎜ ⎝ ⎠ ⎝ ⎠ nλ nλ ( ) with continuity correction.0025.0021.0021. σ = λ . j = 1. . .3 The Central Limit Theorem 193 Thus Φ b′ − Φ a ′ = Φ 1. working as above.8. a* = 2. ( ) ( ) so that Φ(b′) − Φ(a′) = 0. ﬁnd P(12 ≤ Sn ≤ 21). Error without correction: 0.1 = 2 Φ 1. We have: Exact value: 0.15.0000. we get: Exact value: 0. Error with correction: 0. b* = b − nλ nλ .0030.1 − 1 = 2 × 0. b* = 4. .864334 − 1 = 0.5 − nλ nλ .75. ( ) ( ) ( ) ( ) ( ) Error without correction: 0. Probabilities of the form P(a ≤ Sn ≤ b) are approximated as follows: P a ≤ Sn ≤ b ≈ Φ b * − Φ a * ( ) ( ) ( ) nλ without continuity correction. where a* = and P a ≤ S n ≤ b ≈ Φ b′ − Φ a ′ a − 1 − nλ .
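Example 6's Poisson comparison can be reproduced in the same way (the text reports the exact value as 0.7838, with errors 0.0049 without and 0.0013 with the continuity correction):

```python
from math import erf, exp, factorial, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

lam, a, b = 16.0, 12, 21               # P(12 <= S_n <= 21), n*lambda = 16
sd = sqrt(lam)
exact = sum(exp(-lam) * lam ** k / factorial(k) for k in range(a, b + 1))
plain = Phi((b - lam) / sd) - Phi((a - 1 - lam) / sd)        # no correction
corr = Phi((b + 0.5 - lam) / sd) - Phi((a - 0.5 - lam) / sd) # with correction
assert 0.78 < exact < 0.79
assert abs(corr - exact) < abs(plain - exact)   # the correction helps
```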

Normal approximation without correction:

a* = (11 − 16)/4 = −5/4 = −1.25,  b* = (21 − 16)/4 = 5/4 = 1.25,

so that Φ(b*) − Φ(a*) = Φ(1.25) − Φ(−1.25) = 2Φ(1.25) − 1 = 2 × 0.894350 − 1 = 0.7887. Error without correction: 0.0049.

Normal approximation with correction:

a′ = (11.5 − 16)/4 = −1.125,  b′ = (21.5 − 16)/4 = 1.375,

so that Φ(b′) − Φ(a′) = 0.7851. Error with correction: 0.0013.

Exercises

8.3.1 Refer to Exercise 4.1.12 of Chapter 4 and suppose that another manufacturing process produces light bulbs whose mean life is claimed to be 10% higher than the mean life of the bulbs produced by the process described in the exercise cited above. How many bulbs manufactured by the new process must be examined, so as to establish the claim of their superiority with probability 0.95? (Use the CLT.)

8.3.2 A fair die is tossed independently 1,200 times. Find the approximate probability that the number of ones X is such that 180 ≤ X ≤ 220. (Use the CLT.)

8.3.3 Fifty balanced dice are tossed once and let X be the sum of the upturned spots. Find the approximate probability that 150 ≤ X ≤ 200. (Use the CLT.)

8.3.4 Let Xj, j = 1, . . . , 100 be independent r.v.'s distributed as B(1, p). Find the exact and approximate value for the probability P(X1 + · · · + X100 = 50). (For the latter, use the CLT.)

8.3.5 One thousand cards are drawn with replacement from a standard deck of 52 playing cards, and let X be the total number of aces drawn. Find the approximate probability that 65 ≤ X ≤ 90. (Use the CLT.)

8.3.6 A Binomial experiment with probability p of a success is repeated 1,000 times and let X be the number of successes. For p = 1/2 and p = 1/4, find the exact and approximate values of the probability P(1,000p − 50 ≤ X ≤ 1,000p + 50). (For the latter, use the CLT.)

8.3.7 From a large collection of bolts which is known to contain 3% defective bolts, 1,000 are chosen at random. If X is the number of the defective bolts among those chosen, what is the probability that this number does not exceed 5% of 1,000? (Use the CLT.)
8. b* = = = 1.

If n = 100.3.11 Let Xj. n be independent r. (Use σ2 = 4.12.’s such that the X’s are identically distributed with EXj = μ1.d. σ 2 2 n n 1 2 8.Exercises 195 8. ∞).v.3.).3.5 inch and σ = 0. and the Y’s are identically distributed with EYj = μ2 ﬁnite and σ 2(Yj) = σ 2. Yj = 1. you win or lose $1 with probability – .500 hours.0005 inch.3. How many voters must be sampled. n be i. .000 times.000 hours? (Use the CLT. . . . . (Use the CLT. 0. j = 1. . .v. r. j = 1.0001) = 0.3.25σ) = 0. If you play the 2 game independently 1. . .’s with EXj = μ ﬁnite and σ 2(Xj) = σ2 ∈ (0.1.’s from the same distribution with EXj = EYj = μ and σ 2(Xj) = σ 2(Yj) = σ 2.12 Let Xj. . n. r.i.i.25. determine the constant c so that P(|X the CLT.16 An academic department in a university wishes to admit c ﬁrst-year graduate students. . on the average.14 Let Xj.05. .d. Yj. r. 2 8. otherwise. j = 1.9 In playing a game.3.3. 100p% of the students admitted will. ¯n − μ| ≤ kσ) i) Show that the smallest value of the sample size n for which P(|X 2 1 p ≥ p is given by n = [ k if this number is an integer.13 Refer to Exercise 4. Determine ¯ | ≤ 0.) 8. accept an admission offer (0 < p . n be i. Show that: i) E ( X n − Yn ) = μ 1 − μ 2 . otherwise.v. ii) By using Tchebichev’s inequality. .) 8.’s with Negative Exponential distribution with mean 1. . . both ﬁnite. (Use Exercise 8. j = 1. n be the diameters of n ball bearings.95 and k = 0.13 in Chapter 4 and let Xj. . and n is Φ −1 ( 1+ ) 2 ] the integer part of this number increased by 1. j = 1.1. show that the above value of n is given 1 by n = k (1 if this number is an integer. . . iii) For p = 0. σ 2 ( X n − Yn ) = 2 σ .’s such that EXj = μ ﬁnite and σ2(Xj) = ¯n − μ| ≤ c) = 0. .3.) 1 8.v. From past experience it follows that.95. n be i.i.10 A certain manufacturing process produces vacuum tubes whose lifetimes in hours are independently distributed r. . . compare the respective values of n in parts (i) and (ii).90. 
ii) √n[(X̄n − Ȳn) − (μ1 − μ2)]/√(σ1² + σ2²) is asymptotically distributed as N(0, 1) as n → ∞.

< 1). It may be assumed that acceptance and rejection of admission offers by the various students are independent events.

i) How many students n must be admitted, so that the probability P(|X − c| ≤ d) is maximum, where X is the number of students actually accepting an admission offer, and d is a prescribed number?
ii) What is the value of n for c = 20, d = 2, and p = 0.6?
iii) What is the maximum value of the probability P(|X − 20| ≤ 2) for p = 0.6?

Hint: For part (i), use the CLT (with continuity correction) in order to find the approximate value of P(|X − c| ≤ d). Then draw the picture of the normal curve, and conclude that the probability is maximized when n is close to c/p. For part (iii), there will be two successive values of n suggesting themselves as optimal values of n. Calculate the respective probabilities, and choose that value of n which gives the larger probability.

8.4 Laws of Large Numbers

This section concerns itself with certain limit theorems which are known as laws of large numbers (LLN). We distinguish two categories of LLN: the strong LLN (SLLN), in which the convergence involved is strong (a.s.), and the weak LLN (WLLN), where the convergence involved is convergence in probability.

THEOREM 4 (SLLN) If Xj, j = 1, . . . , n are i.i.d. r.v.'s with (finite) mean μ, then

X̄n = (X1 + · · · + Xn)/n → μ a.s. as n → ∞.

PROOF Omitted; it is presented in a higher level probability course. ▲

The converse is also true; that is, if X̄n → μ a.s. as n → ∞ for some finite constant μ, then E(Xj) is finite and equal to μ. Of course, a.s. convergence implies convergence in probability, so X̄n → μ a.s. implies X̄n → μ in probability.

THEOREM 5 (WLLN) If Xj, j = 1, . . . , n are i.i.d. r.v.'s with (finite) mean μ, then

X̄n = (X1 + · · · + Xn)/n → μ in probability as n → ∞.

PROOF
i) The proof is a straightforward application of Tchebichev's inequality under the (unnecessary) additional assumption that the r.v.'s also have a finite variance σ². Then EX̄n = μ and σ²(X̄n) = σ²/n, so that, for every ε > 0,

P(|X̄n − μ| ≥ ε) ≤ (1/ε²)(σ²/n) → 0 as n → ∞.
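The Tchebichev bound σ²/(nε²) used in the WLLN proof can be set against a simulated deviation frequency; here is a minimal sketch of ours, using B(1, 1/2) draws (μ = 1/2, σ² = 1/4):

```python
import random

def deviation_freq_and_bound(n, eps, reps=2000, seed=0):
    """Estimate P(|X_bar_n - mu| >= eps) for i.i.d. B(1, 1/2) draws and
    return it together with Tchebichev's bound sigma^2 / (n * eps^2)."""
    rng = random.Random(seed)
    mu, var = 0.5, 0.25
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() < mu for _ in range(n)) / n
        hits += abs(xbar - mu) >= eps   # adding a bool counts the event
    return hits / reps, var / (n * eps * eps)

freq, bound = deviation_freq_and_bound(n=400, eps=0.1)
# the empirical frequency lies (far) below the bound 0.0625
```

Tchebichev's inequality is crude: for n = 400 and ε = 0.1 the bound is 0.0625, while the true deviation probability is of order 10⁻⁵.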

ii) This proof is based on ch.f.'s (m.g.f.'s could also be used if they exist). By Theorems 1(iii) (the converse case) and 2(ii) of this chapter, in order to prove that X̄n → μ in probability as n → ∞, it suffices to prove that φ_X̄n(t) → φ_μ(t) = e^{itμ}, t ∈ ℝ, as n → ∞. For simplicity, writing ΣXj instead of Σⁿⱼ₌₁ Xj when this last expression appears as a subscript, we have

φ_X̄n(t) = φ_{(1/n)ΣXj}(t) = φ_{ΣXj}(t/n) = [φ_{X1}(t/n)]ⁿ
        = [1 + iμ(t/n) + o(t/n)]ⁿ = [1 + iμ(t/n) + (t/n)o(1)]ⁿ → e^{iμt} as n → ∞. ▲

REMARK 10 An alternative proof of the WLLN, without the use of ch.f.'s, is presented in Lemma 1 of Section 8.6*. The underlying idea there is that of truncation, as will be seen.

Both laws of LLN hold in all concrete cases which we have studied, except for the Cauchy case, where E(Xj) does not exist. For example, in the Binomial case we have: If Xj, j = 1, . . . , n are independent and distributed as B(1, p), then X̄n = (X1 + · · · + Xn)/n → p a.s. as n → ∞, and also in probability. For the Poisson case we have: If Xj, j = 1, . . . , n are independent and distributed as P(λ), then X̄n = (X1 + · · · + Xn)/n → λ a.s. as n → ∞, and also in probability.

8.4.1 An Application of SLLN and WLLN

Let Xj, j = 1, . . . , n be i.i.d. r.v.'s with d.f. F. The sample or empirical d.f. is denoted by Fn and is defined as follows: For x ∈ ℝ,

Fn(x) = (1/n)[the number of X1, . . . , Xn which are ≤ x].
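The empirical d.f. just defined is directly computable. A sketch of ours evaluates Fn and, for Uniform(0, 1) data (F(x) = x on [0, 1]), the distance sup_x |Fn(x) − F(x)|, which by the Glivenko–Cantelli lemma below should shrink as n grows; for a continuous F the supremum is attained at the order statistics:

```python
import bisect
import random

def empirical_df(sample):
    """Return the function F_n(x) = (number of X_j <= x) / n."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect.bisect_right(data, x) / n

def sup_dist_uniform01(sample):
    """sup_x |F_n(x) - x| for the Uniform(0, 1) d.f.; checked just before
    and at each order statistic, where the supremum is attained."""
    data = sorted(sample)
    n = len(data)
    return max(max(abs(i / n - x), abs((i - 1) / n - x))
               for i, x in enumerate(data, start=1))

rng = random.Random(1)
d_small = sup_dist_uniform01([rng.random() for _ in range(50)])
d_large = sup_dist_uniform01([rng.random() for _ in range(5000)])
# d_large is much smaller than d_small
```

The two-sided check at each order statistic accounts for Fn jumping by 1/n while F increases continuously.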

.s. .. for a ﬁxed set of values of X1.i. () Xj ≤ x X j > x. . . Xn. Namely. Then. n are independent since the X’s are.v.’s X1.1 Let Xj. . . ⎪ Yj x = Yj = ⎨ ⎪ ⎩0. n be i. .2 Let Xj. as a function of the r.’s and suppose that EX k j is ﬁnite for a given positive integer k. ⋅ ⋅ ⋅ . p).198 8 Basic Limit Theorems Fn is a step function which is a d. . It is also an r.f. and Yj is B(1. .v. Fn x ⎯ ⎯ →F x .4. ( ( ) () Hence ⎛ n ⎞ ⎛ n ⎞ E ⎜ ∑ Yj ⎟ = np = nF x . . . Fn(x) ⎯ ⎯ → F ( x ) uniformly in x ∈ ޒ ). Yj.d.d. . (Calculate the expectation by means of the ch. ⎝ j =1 ⎠ ⎝ j =1 ⎠ () ( )[ ( )] It follows that 1 nF x = F x . . n→∞ () () Actually. .’s with p. n→∞ 8. n j =1 On the other hand.4.f.s. clearly. x ∈⎯ ޒ ⎯ → 0⎤ = 1 n →∞ ⎥ ⎢ ⎦ ⎣ a. . (that is.d.v. we have P ⎡sup Fn x − F x . σ 2 ⎜ ∑ Yj ⎟ = npq = nF x 1 − F x . .14 of Chapter 3 and show that the WLLN holds. n→∞ { () () } PROOF Omitted. r. Xn. for each x.2. Set n (k ) 1 Xn = ∑Xk j n j =1 for the kth sample moment of the distribution of the X ’s and show that P (k) ¯n X ⎯ ⎯ → EX k 1.) . n. . THEOREM 6 (Glivenko–Cantelli Lemma) With the above notation. n be i. n→∞ [ ( )] () () () () P Fn x ⎯ ⎯ →F x. more is true. j = 1. given in Exercise 3. . we get by the LLN E Fn x = a. . . j = 1. r. Fn x = () ) 1 n ∑ Yj .i. j = 1. n So for each x ∈ޒ. j = 1.v. where p = P Yj = 1 = P X j ≤ x = F x . Exercises 8.f. Let ⎧1.

4 Let Xj. and Xj are r. . the generalized WLLN holds. 8. and g(X) are r. .8. n ≥ 1.v. 2.v.s. has the Negative Exponential distribution with parameter λj = 2j/2. . n be r. . we present some further limit theorems which will be used occasionally in the following chapters. ⎯ → g(X). ⎯ → g X1 .’s such that P X j = −α j = P X j = α j = ( ) ( ) 1 . 2. . and g: ii) More generally. and let g: ޒ→ޒbe continuous. a.5 Further Limit Theorems 199 8.4. . so that g(X n . . . j = 1. . then ( j ) a. . X k . and set μn = 1 n ∑ μj. .5 Decide whether the generalized WLLN holds for independent r. . .’s which need be neither independent nor identically distributed.v. then the generalized version of the WLLN holds.4. X n ⎞ ⎯ ⎝ ⎠ n →∞ ( ) .’s.v. let Xj be independent r. .s. ⋅ ⋅ ⋅ . j ≥ 1.s.’s such that the jth r. .v. Xk) are r. k. ⎯→ X j . 8.5 Further Limit Theorems In this section. n be pairwise uncorrelated r. σ 2(Xj) = σ 2 j. if for j = 1.’s such that Xj is distributed as P( λj). so that g(Xn).’s such that Xj is distributed as χ j2/ j . j = 1. n ≥ 1. Then Xn ⎯ ⎯ → X implies g(Xn) ⎯ n→∞ n→∞ ( j) . X n ) and g(X1. . . .v. ⋅ ⋅ ⋅ .v.v.v. all ﬁnite. a. . 8. k imply g ⎛ X n . show that the generalized WLLN holds.4. . . 2 Show that for all α’s such that 0 < α ≤ 1. Suppose that EXj = μj. .7 For j = 1. n j =1 Then a generalized version of the WLLN states that P X n − μn ⎯ ⎯ → 0.’s.6 For j = 1. j = 1.s. 8. let Xj be independent r. .’s. and X be r.4. Show that the generalized WLLN holds. . ⋅ ⋅ ⋅ .v.4. X n k (1) (k) ޒ→ ޒis continuous. n→∞ Show that if the X’s are pairwise uncorrelated and σ 2 j ≤ M(<∞). n ≥ 1. If {1/nΣn j=1λj} remains bounded. . .’s. .3 Let Xj. 8. Xn ⎯ n →∞ (1 ) (k ) a. THEOREM 7 i) Let Xn. .

and hence A1 ∩ A2 n ⊆ A3 n . convergence and the continuity of g. Xj and g be as in Theorem 7(ii). and suppose P P ( j) (k) that X n ⎯ ⎯→ Xj. Then g(X (1) ⎯ → g(X1. Thus there exist n0 sufﬁciently large such that n→∞ P X ∈ − ∞. we then have ([ ( 0 )] + [ X ∈(M n0 .200 8 Basic Limit Theorems PROOF Follows immediately from the deﬁnition of the a. Xk). there exists δ(ε. Mn]) ⎯ ⎯ → 1. THEOREM 7 ′ P i) Let Xn. That is. . () () which implies that c c A3 n ⊆ A1 ∪ Ac 2 n . and A3 n = g X n − g X < ε [ ] () [ ( )] () Then it is easily seen that on A1 ∩ A2(n). . and suppose that Xn ⎯ ⎯ → X. A2 n = X n − X < δ ε . 2M] with |x′ − x″| < δ(ε). let again X ( n .s. but a justiﬁcation is needed. . P g X n − g X ≥ ε < ε. . . X and g be as in Theorem 7(i). . . Xn ) ⎯ n→∞ n→∞ PROOF i) We have P(X ∈ = )ޒ1. is uniformly continuous in [−2M. ▲ A similar result holds true when a. . k. M) = δ(ε) (<1) such that |g(x′) − g(x″)| < ε for P all x′. n→∞ P Then g(Xn) ⎯ ⎯ → g ( X ). g being continuous in ޒ. . then P(X ∈ [−Mn. for n ≥ N(ε). and if Mn ↑ ∞(Mn > 0). Thus for every ε > 0. we have −2M < X < 2M. [ ( ) ( ) ] . convergence is replaced by convergence in probability. . −2M < Xn < 2M. . [ ( ) ( ) ] (for n ≥ N (ε )). 2M]. . n→∞ j) ii) More generally. n . ∞ )]) = P( X > M ) < ε 2 (M n0 n0 >1 . n ≥ N ε . n ≥ 1. ) P X > M < ε 2. j = 1. x″ ∈ [−2M. − M n 0 Deﬁne M = Mn . From Xn ⎯ ⎯ → X we have that n→∞ there exists N(ε) > 0 such that ( ) P Xn − X ≥ δ ε < ε 2 . () () Hence c c P A3 n ≤ P A1 + P Ac 2 n ≤ε 2 +ε 2 = ε [ ( )] ( ) [ ( )] (for n ≥ N (ε )). Set [ ( )] () A1 = X ≤ M .s.

y) = x + y. iv) g(x. then If Xn ⎯ n→∞ n→∞ P i) Xn + Yn ⎯ ⎯→ X + Y. n→∞ P iii) XnYn ⎯ ⎯ → XY. COROLLARY P P ⎯ → X. y) = ax + by. y) = x/y. provided P(Yn ≠ 0) = 1.1. (See also Exercise 8. (See also Exercises 8.) ▲ The following corollary to Theorem 7′ is of wide applicability.8. constant.3 and 8. n→∞ d ii) XnYn ⎯ ⎯ → cX. ▲ The following is in itself a very useful theorem. y) = xy. c < 0. i) P X n + Yn ≤ z = FX n +Yn z ⎯ ⎯ → FX + c z n→∞ ii) P X nYn ≤ z = FX Y z ⎯ ⎯ → FcX z n→∞ n n ( ) ( ) = P X + c ≤ z = P X ≤ z − c = FX z − c .6. THEOREM 8 d P If Xn ⎯ ⎯ → X and Yn ⎯ ⎯ → c. n→∞ d iii) Xn/Yn ⎯ ⎯ → X/c. iii) g(x. X⎜ ⎟ ⎟ ⎪ ⎜ c⎠ ⎝c ⎠ ⎩ ⎝ ( ) Yn ⎯→ F (z) (z) ⎯ n→∞ X c c>0 ⎛X ⎞ ⎧ ⎪P X ≤ cz = FX cz .2. iv) Xn/Yn ⎯ n→∞ PROOF In sufﬁces to take g as follows and apply the second part of the theorem: i) g(x. n→∞ P ii) aXn + bYn ⎯ ⎯ → aX + bY (a.5. then n→∞ n→∞ d i) Xn +Yn ⎯ ⎯ → X + c. = P⎜ ≤ z⎟ = ⎨ c < 0. ( () () ) ( () () ) ( ) ⎛X ⎞ iii) P⎜ n ≤ z⎟ = FX n ⎝ Yn ⎠ ⎧ ⎛ ⎛ z⎞ z⎞ c>0 ⎪P⎜ X ≤ ⎟ = FX ⎜ ⎟ . ⎝ c ⎠ ⎪P X ≥ cz = 1 − FX cz − . y ≠ 0. ⎩ provided P(Yn ≠ 0) = 1. Equivalently. provided P(Yn ≠ 0) = P(Y ≠ 0) = 1. ii) g(x. c⎠ ⎝ c⎠ ⎪ ⎝ = P cX ≤ z = ⎨ ⎪ P ⎛ X ≥ z ⎞ = 1 − F ⎛ z −⎞ .6. b constants).5 Further Limit Theorems 201 The proof is completed.) ii) It is carried out along the same lines as the proof of part (i). Yn ⎯ ⎯ → Y. ( ( ) ) ( ) ( ) . n→∞ P ⎯ → X/Y. n→∞ c ≠ 0.

if we choose ε < c . we assume that Yn > 0. )] ( (X n ≤ zYn ∩ Yn − c < ε ⊆ X n ≤ z c ± ε ) ( ) [ n )] and hence ⎛ Xn ⎞ ≤ z⎟ ⊆ Yn − c ≥ ε ⎜ ⎝ Yn ⎠ ( )U [X ≤ z c ± ε . we have then ⎡⎛ X ⎤ ⎡⎛ X ⎤ ⎛X ⎞ ⎞ ⎞ P ⎜ n ≤ z⎟ = P ⎢⎜ n ≤ z⎟ I Yn > 0 ⎥ + P ⎢⎜ n ≤ z⎟ I Yn < 0 ⎥ ⎝ Yn ⎠ ⎠ ⎠ ⎢⎝ Yn ⎥ ⎢ ⎥ ⎣ ⎦ ⎣⎝ Yn ⎦ ⎡⎛ X ⎤ ⎞ ≤ P ⎢⎜ n ≤ z⎟ I Yn > 0 ⎥ + P Yn < 0 . if z < 0. we obtain the result. FX(z/c−) = FX(z/c) and FX(cz−) = FX(cz). P We ﬁrst notice that Yn ⎯ ⎯ → 1. we may divide by Yn except perhaps on a null set. we will be interested in the limit of the above probabilities as n → ∞.202 8 Basic Limit Theorems REMARK 11 Of course. We have then ( ) ( ) ( ) ( ) ⎛ Xn ⎞ ⎛X ⎞ ⎛X ⎞ ≤ z⎟ = ⎜ n ≤ z⎟ I Yn − c ≥ ε + ⎜ n ≤ z⎟ I Yn − c < ε ⎜ ⎝ Yn ⎠ ⎝ Yn ⎠ ⎝ Yn ⎠ ( ) ( ) ⊆ Yn − c ≥ ε ( )U ( X n ≤ zYn )I ( Y n −c <ε . Outside this null set. ) ( ) ( ) [ ) [ ( ( )] and n That is. we proceed to establish (iii) under the (unnecessary) additional assumption that FX is continuous and for the case that c > 0. n→∞ ⎝ Yn ⎠ [ ( )] . Next. ( )] Thus ⎛X ⎞ P⎜ n ≤ z⎟ ≤ P Yn − c ≥ ε + P X n ≤ z c ± ε . if z ≥ 0. for every z ∈ ޒ. ⎠ ⎢ ⎥ ⎣⎝ Yn ⎦ In the following. z ∈ ޒ. if F is continuous. or ⎯ ⎯ → ⎯ → n n→∞ n→∞ P(c − ε ≤ Yn ≤ c + ε) ⎯ 1. ) But |Yn − c| < ε is equivalent to c − ε < Yn < c + ε. PROOF As an illustration of how the proof of this theorem is carried out. we obtain ⎛X ⎞ lim sup P⎜ n ≤ z⎟ ≤ FX z c ± ε . The case where c < 0 is treated similarly. In fact. z ∈ ޒ. ⎯ → c (>0) implies that P(Yn > 0) ⎯ n→∞ n→∞ P Yn ⎯ c is equivalent to P (| Y − c | ≤ ε ) 1 for every ε > 0. ≤ zYn ∩ Yn − c < ε ⊆ X n ≤ z c − ε . z ∈ ޒ. Therefore (X (X n ≤ zYn ∩ Yn − c < ε ⊆ X n ≤ z c + ε . Since P(Yn < 0) → 0. ⎯ → n→∞ since P(Yn ≠ 0) = 1. ⎝ Yn ⎠ ( ) [ ( )] Letting n → ∞ and taking into consideration the fact that P(|Yn − c| ≥ ε) → 0 and P[Xn ≤ z(c ± ε)] → FX[z(c ± ε)]. Thus.

Thus . if ⎝ Yn ⎠ ) z ≥ 0. n→∞ ⎝ Yn ⎠ ( ) (6) Next. we have ⎛X ⎞ lim sup P⎜ n ≤ z⎟ ≤ FX zc .8. n→∞ ⎝ Yn ⎠ [ ( )] ( ) Since. z ∈ ޒ. we have that |Yn − c| < ε is equivalent to 0 < c − ε < Yn < c + ε and hence [X ≤ z(c ± ε )] = [X ≤ z(c ± ε )] ∩ ( Y − c ≥ ε ) + [X ≤ z(c ± ε )] ∩ (Y − c < ε) ⊆ (Y − c ≥ ε) ∪ [ X ≤ z( c ± ε )] ∩ ( Y − c < ε ). ⎠ Thus ⎛X ⎞ P X n ≤ z c ± ε ≤ P Yn − c ≥ ε + P⎜ n ≤ z⎟ . we have ⎛X ⎞ FX zc ≤ lim inf P⎜ n ≤ z⎟ . FX[z(c ± ε)] → FX(zc). [X ≤ z(c ± ε )]I ( Y n n ⎛X ⎞ − c < ε ⊆ ⎜ n ≤ z⎟ ⎝ Yn ⎠ ) and hence [X ≤ z(c ± ε )] ⊆ ( Y n n −c ≥ε )U ⎛⎜⎝ X Y ) n n ⎞ ≤ z⎟ . as ε → 0. n n n n n n n n [X ≤ z(c − ε )]I ( Y n n ⎛X ⎞ − c < ε ⊆ ⎜ n ≤ z⎟ . z ∈ ޒ. and [X ≤ z(c + ε )]I ( Y n n ⎛X ⎞ − c < ε ⊆ ⎜ n ≤ z⎟ . for every z ∈ ޒ. Y n→∞ ⎝ n ⎠ (7) Relations (6) and (7) imply that lim P (Xn/Yn ≤ z) exists and is equal to n→∞ ⎛X ⎞ FX zc = P X ≤ zc = P⎜ ≤ z⎟ = FX ⎝ c ⎠ ( ) ( ) c (z). By choosing ε < c. ⎝ Yn ⎠ [ ( )] ( Letting n → ∞ and taking into consideration the fact that P(|Yn − c| ≥ ε) → 0 and P[Xn ≤ z(c ± ε)] → Fx[z(c ± ε)]. we obtain ⎛X ⎞ FX z c ± ε ≤ lim inf P⎜ n ≤ z⎟ . as ε → 0. FX[z(c ± ε)] → FX(zc). z ∈ ޒ. z ∈ ޒ.5 Further Limit Theorems 203 Since. if ⎝ Yn ⎠ ) z < 0. That is.

. X ⎯ → μ2 a. . j = 1. . .s. n. On the other hand. Xn are i. .v.i. THEOREM 9 Let Xj. σ2 =σ2 Xj ( ) ( ) (which are assumed to exist). are i.’s with E(Xj) = μ. and E X 2j = σ 2 X j + EX j ( ) ( ) ( ) 2 = σ 2 + μ 2 . then n − 1 Xn − μ Sn ( )⎯ ⎯→ N ( 0. Thus 1 n ∑ X 2j − X n2 → σ 2 + μ 2 − μ 2 = σ 2 n j =1 a. . n Xn − μ ( σ )⎯ ⎯→ N ( 0. r. and also in probability (by Theorems n→∞ 7(i) and 7′(i)). n − 1 σ 2 n→∞ since n/(n − 1) ⎯ ⎯ → 1. n j =1 Next. if Xj. . r. ▲ n Yn ⎯→ F (z). and also in n ⎯ n→∞ 2 2 ¯ probability. d n→∞ PROOF In fact.’s. .. j = 1.i. . . . . .d.v. j = 1.d. . P S2 ⎯ →σ2 n ⎯ n→∞ implies 2 n Sn P ⎯ ⎯→ 1.204 8 Basic Limit Theorems ⎛X ⎞ P⎜ n ≤ z⎟ = FX ⎝ Yn ⎠ as was to be seen. . if μ = E Xj .’s X j2.v. .d. σ 2(Xj) = σ 2. n are i. the r.d. and also in probability (by the same theorems just referred to). 1).s. r. .i.s.i. 1). REMARK 12 Theorem 8 is known as Slutsky’s theorem.’s with mean μ and (positive) variance σ2.v. be i. n. j = 1. Therefore the SLLN and WLLN give the result that 1 n ⎯ → σ 2 + μ2 ∑ X 2j ⎯ n →∞ n j =1 a. since the X’s are. (z) ⎯ n→∞ X c z ∈ ޒ. n. ¯2 and also in probability. . n→∞ COROLLARY TO THEOREM 8 If X1. n ⎯ n→∞ REMARK 13 Of course. and hence X n ⎯ ⎯ → μ a. Then S2 ⎯ → σ 2 a. . 1) d n→∞ and also n Xn − μ Sn ( )⎯ ⎯→ N ( 0. and also in probability. .s.s. So we have proved the following theorem. Now. d n→∞ . we have seen that the sample variance 2 Sn = 1 n ∑ X j − Xn n j =1 ( ) 2 = 1 n ∑ X 2j − X n2 .
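Theorem 9 and its corollary say that S²n is consistent for σ² and that the studentized mean √(n − 1)(X̄n − μ)/Sn is approximately N(0, 1) for large n. A small sketch of ours, with Exponential(1) draws (so μ = σ² = 1):

```python
import random
from math import sqrt

def sample_var(xs):
    """S_n^2 = (1/n) * sum (X_j - X_bar)^2, as in the text."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

rng = random.Random(2)
xs = [rng.expovariate(1.0) for _ in range(20000)]   # mean 1, variance 1
s2 = sample_var(xs)                                 # close to sigma^2 = 1
xbar = sum(xs) / len(xs)
t = sqrt(len(xs) - 1) * (xbar - 1.0) / sqrt(s2)     # approximately N(0, 1)
```

Note that no normality of the X's is assumed; only finite mean and variance enter.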

▲ COROLLARY ( ) () (10) Let the r. and let its derivative g′(x) be continuous at a point d. ( ) () ( ) ( ) ) ( P P However.d.5 Further Limit Theorems 205 by Theorem 3.v. so that the theorem . 2. Finally. let Xn and X be r. Hence the quotient of these r. 1) as n → ∞. 2 PROOF By the CLT. by Theorem 8(ii). Xn − d ⎯ cn(Xn −d) ⎯ ⎯ → X and cn ⎯ → 0.’s. σ 2). applies and gives d ¯n − μ) ⎯ n(X ⎯ → X ∼ N(0. Xn − d ⎯ ⎯→ 0. By assumption. THEOREM 10 For n = 1. Hence cn g X n − g d = cn X n − d g ′ X * n .’s which is n − 1 Xn − μ Sn n ( ) converges in distribution to N(0. −1 d d → 0. so that Xn by Theorem 7′(i) again. Xn be i.v. let g: ޒ → ޒbe differentiable. and therefore. or P equivalently. [ ( ) ( )] ( ) (9) P g X* ⎯ →g d. with mean μ ∈ ޒand variance σ 2 ∈ (0. .v. ⎛ d n g Xn − g μ ⎯ ⎯ → N ⎜ 0. . as n → ∞. . This result and relation (9) complete the proof of the theorem.i. by Theorem 9.v. |X n * − d| ≤ |Xn − d| ⎯ * ⎯ ⎯ → d. (8) Next. we have cn(Xn − d) d g′(X n *) ⎯ ⎯ → g′(d)X. σ g ′ μ ⎝ [ ( ) ( )] [ ( )] ⎞⎟⎠ . n→∞ n−1 σ by Remark 13. Then. PROOF In this proof.8. Then cn[g(Xn) − g(d)] d ⎯ ⎯→ g′(d)X as n → ∞. ∞). by Theorem 7′(i). all limits are taken as n → ∞. and let cn(Xn − d) ⎯ ⎯ → X as n → ∞. Then. n ⎯ By assumption. . . and Sn P ⎯ ⎯→ 1. lying between d and Xn. . . .’s X1. expand g(Xn) around d according to Taylor’s formula in order to obtain g X n = g d + X n − d g′ X * n . ▲ The following result is based on theorems established above and it is of signiﬁcant importance. and hence. ⎯ → 0 by (8). convergence (10) and Theorem 8(ii). let cn be constants such d that 0 ≠ cn → ∞. P Xn − d ⎯ ⎯ → 0. where X n * is an r. and let g: ޒ → ޒbe differentiable with derivative continuous at μ.

β] such that F x j +1 − F x j < ε 2. 8. By taking .6* Pólya’s Lemma and Alternative Proof of the WLLN The following lemma is an analytical result of interest in its own right. for every ε > 0 there exists N(ε) > 0 such that n ≥ N(ε) implies that |Fn(x) − F(x)| < ε for every x ∈ ޒ. there exists an interval [α.3 Carry out the proof of Theorem 7′(ii). 2 d n X n 1 − X n − pq ⎯ ⎯ → N ⎛ 0. ( ) ( ) j = 1. 2 ▲ APPLICATION If the r. β ] such that F α < ε 2.v.206 8 Basic Limit Theorems ⎛ d n g Xn − g μ ⎯ ⎯ → g ′ μ X ~ N ⎜ 0. so that g′(x) = 1 − 2x. 8. as x → ∞. F β > 1 − ε 2 .5. The result follows. PROOF Since F(x) → 0 as x → −∞. x ∈ޒ. (Use the usual Euclidean distance in ޒk. σ g ′ μ ⎝ [ ( ) ( )] () [ ( )] ⎞⎟⎠. ⋅ ⋅ ⋅ . Let F and {Fn} be d. r − 1.5. β]. ( ) ( ) j = 1. ( ) () (11) The continuity of F implies its uniform continuity in [α. Fn x j − F x j < ε 2. It was used in the corollary to Theorem 3 to conclude uniform convergence. Then there is a ﬁnite partition α = x1 < x2 < · · · < xr = β of [α. ⎝ ⎠ Here μ = p. That is. we actually have −2M < X < 2M. and let F be n→∞ continuous.’s such that Fn(x) ⎯ ⎯ → F(x). and F(x) → 1.2 Refer to the proof of Theorem 7′(i) and show that on the set A1 ∩ A2(n). and g(x) = x(1 − x).f. [ ( ) ] ( ) Exercises 8. (12) ⎯ → F(xj) implies that there exists Nj(ε) > 0 such that for all Next. Then the convergence is uniform in x ∈ޒ. as n → ∞. . pq 1 − 2 p ⎞ . then so does the WLLN. ⋅ ⋅ ⋅ . then.1 Use Theorem 8(ii) in order to show that if the CLT holds.) 8. p). Fn(xj) ⎯ n→∞ n ≥ Nj(ε).’s Xj. j = 1.5. LEMMA 1 (Pólya). . r . n in the corollary are distributed as B(1. . . σ 2 = pq.
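The application above (g(x) = x(1 − x) applied to B(1, p) sample means) can be checked by simulation. The sketch below (ours) compares the empirical standard deviation of √n(X̄n(1 − X̄n) − pq) with the delta-method value |g′(p)|σ = |1 − 2p|√(pq):

```python
import random
from math import sqrt

def delta_check(p=0.3, n=400, reps=2000, seed=3):
    """Empirical sd of sqrt(n) * (g(X_bar) - g(p)) for g(x) = x(1 - x),
    versus the delta-method sd |1 - 2p| * sqrt(p * (1 - p))."""
    rng = random.Random(seed)
    g = lambda x: x * (1.0 - x)
    vals = []
    for _ in range(reps):
        xbar = sum(rng.random() < p for _ in range(n)) / n
        vals.append(sqrt(n) * (g(xbar) - g(p)))
    m = sum(vals) / reps
    emp_sd = sqrt(sum((v - m) ** 2 for v in vals) / reps)
    theo_sd = abs(1.0 - 2.0 * p) * sqrt(p * (1.0 - p))
    return emp_sd, theo_sd

emp_sd, theo_sd = delta_check()
# emp_sd should be close to theo_sd ~ 0.183 for p = 0.3
```

For p = 1/2 the factor 1 − 2p vanishes and the limit is degenerate, which is why the result is stated for p ≠ 1/2 situations in applications.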

we deﬁne ⎧ ⎪X j . The basic idea is that of suitably truncating the r. . we then have that () ( () ( )) Fn x j − F x j < ε 2. Thus for n ≥ N(ε). if ⎪ ⎩ j () Xj ≤δ ⋅n X j > δ ⋅ n. a proof of the WLLN (Theorem 5) is presented without using ch. F x r +1 − F x r < ε 2. ( ) ( ) j = 1. let x be any real number. .f. (15) Also (13) trivially holds for j = 0 and j = r + 1. ⋅ ⋅ ⋅ . () () for every x ∈ ޒ. () ( ) ( ) Hence 0 ≤ F x + ε − Fn x ≤ F x j+1 + ε − F x j + ε 2 < 2 ε () and therefore |Fn(x) − F(x)| < ε.’s. . Xj = Yj + Zj. j = 0. of the X’s. ▲ Below. n. xr+1 = ∞. if Yj n = Yj = ⎨ 0. r. Then by the fact that F(−∞) = 0 and F(∞) = 1. 1.v. and is due to Khintchine. Fn x j − F x j < ε 2. Then. . Then. . . we have ( ) ( ) (16) Next. ⋅ ⋅ ⋅ .8. r. ⋅ ⋅ ⋅ . we have the following string of inequalities: F x j − ε 2 < Fn x j ≤ Fn x ≤ Fn x j+1 < F x j−1 + ε 2 j j+1 ( ) ( ) () ( ) ( ) < F (x ) + ε ≤ F (x) + ε ≤ F (x ) + ε . j = 1. By (15) and (16) and for n ≥ N(ε). (13) Let x0 = −∞. relation (11) implies that F x1 − F x0 < ε 2. j = 1. ( ) ( ) ( ) ( ) (14) Thus.d. ⋅ ⋅ ⋅ . r . 1. . if ⎪ ⎩ () Xj ≤δ ⋅n Xj >δ ⋅n and ⎧ if ⎪0. . (17) ALTERNATIVE PROOF OF THEOREM 5 We proceed as follows: For any δ > 0. we have that F x j +1 − F x j < ε 2. it was also used by Markov. r + 1. 1. by means of (12) and (14). N r ε . .f. clearly. ( ) ( ) j = 0. that is. Then xj ≤ x < xj+1 for some j = 0.6* Pélya’s Lemma and Alternative Proof of the WLLN 207 n ≥ N ε = max N1 ε . n.’s involved. ⋅ ⋅ ⋅ . Let us restrict ourselves to the continuous case and let f be the (common) p. Zj n = Zj = ⎨ X . we have Fn x − F x < ε Relation (17) concludes the proof of the lemma.

that is. xI [ x ≤δ ⋅n] ( x ) f ( x ) ⎯ ∫ Therefore ∞ −∞ x f x dx < ∞. ( ) (18) E Yj = E Y1 = E X 1 I = ∫ xI −∞ ∞ ( ) ( ) { [X 1 ≤δ ⋅n ] ( X1 ) [ x ≤δ ⋅n ] ( x) f ( x)dx. n→∞ by Lemma C of Chapter 6. Next. σ 2 Yj ≤ δ ⋅ n ⋅ E X 1 .208 8 Basic Limit Theorems σ 2 Y j = σ 2 Y1 ( ) = E Y 12 − EY1 2 = E X1 ⋅I = ∫ x2I −∞ ∞ { ( ) ( ) [X 1 ( ) 2 ≤ E Y 12 ≤δ ⋅n ] ( X1 ) [ x ≤δ ⋅n] ( x ) f ( x )dx } ( ) =∫ that is. n→∞ [ x ≤δ ⋅n] ( x ) f ( x ) < x f ( x ). δ ⋅n − δ ⋅n x 2 f x dx ≤ δ ⋅ n∫ () δ ⋅n − δ ⋅n x f x dx ≤ δ ⋅ n∫ () ∞ −∞ x f x dx () = δ ⋅ nE X1 . ∞ () ∫ ∞ −∞ xI ⎯ → xf ( x )dx = μ n→∞ ∫−∞ [ x ≤δ ⋅n] ( x ) f ( x )dx ⎯ E Yj ⎯ ⎯ → μ. ( ) (19) ⎡ n ⎤ ⎡1 n ⎤ ⎛ n ⎞ P ⎢ ∑ Yj − EYj ≥ ε ⎥ = P ⎢ ∑ Yj − E ⎜ ∑ Yj ⎟ ≥ nε ⎥ ⎢ ⎥ ⎝ j =1 ⎠ ⎢ ⎥ ⎣ n j =1 ⎦ ⎣ j =1 ⎦ ≤ = ≤ = ⎛ n ⎞ 1 σ 2 ⎜ ∑ Yj ⎟ 2 nε ⎝ j =1 ⎠ 2 nσ 2 Y1 2 n ε2 nδ ⋅ n ⋅ E X 1 ( ) δ E X1 ε2 n 2ε 2 . xI and ⎯ → xf ( x ). } Now.

Next. P Zj ≠ 0 = P Zj > δ ⋅ n j (21) ( ) ( =∫ =∫ =∫ <∫ = ) = P ( X > δ ⋅ n) − δ ⋅n −∞ f x dx + ∫ ( x >δ ⋅n ( x >∫ in >1 f ( x)dx () f x dx ) ( ) f x dx ) ( ) ∞ δ ⋅n x ( x >δ ⋅n ) δ ⋅ n f x dx () 1 x f x dx δ ⋅ n ∫( x >δ ⋅n ) 1 2 δ < δ ⋅n δ x f x dx < δ 2 = . ⎤ δ ⎡1 n P ⎢ ∑ Y j − μ ≥ 2 ε ⎥ ≤ 2 E X1 ⎥ ⎢ ⎦ ε ⎣ n j=1 for n large enough. ⎥ ⎢ ⎦ ε ⎣ n j =1 (20) Thus. ⎤ ⎡⎛ 1 n ⎡1 n ⎤ ⎞ P ⎢ ∑ Y j − μ ≥ 2 ε ⎥ = P ⎢ ⎜ ∑ Y1 − E Y j ⎟ + E Y1 − μ ≥ 2 ε ⎥ n ⎠ ⎥ ⎢ ⎢ ⎥ ⎣ n j=1 ⎦ ⎦ ⎣ ⎝ j=1 ( ) ( ( ) ) ⎡1 n ⎤ ≤ P ⎢ ∑ Y j − EY1 + EY1 − μ ≥ 2 ε ⎥ ⎢ ⎥ ⎣ n j=1 ⎦ ⎡1 n ⎤ ≤ P ⎢ ∑ Y j − EY1 ≥ ε ⎥ + P EY1 − μ ≥ ε ⎢ ⎥ ⎣ n j=1 ⎦ δ ≤ 2 E X1 ε [ ] for n sufﬁciently large. ⎤ δ ⎡1 n P ⎢ ∑ Yj − EY1 ≥ ε ⎥ ≤ 2 E X 1 .8.6* Pólya’s Lemma and Alternative Proof of the WLLN 209 by (18). that is. that is. by (19) and (20). So P(Zj ≠ 0) ≤ δ/n and hence ⎡n ⎤ P ⎢∑ Z j ≠ 0 ⎥ ≤ nP Z j ≠ 0 ≤ δ ⎢ ⎥ ⎣ j =1 ⎦ ( ) (22) . since ∫ ( x >δ ⋅n ) n () () for n sufﬁciently large.

(22). the following is ⎯ → n→∞ always true. then there is a subsequence {nk} of {n} (that is. such that Xnk ⎯ ⎯ → X. However.’s {X2 −1}. ( k ) ( k ) k . for example. THEOREM 11 P If Xn ⎯ ⎯ → X. Since this is true for every ε > 0. the result follows. it was stated that Xn ⎯ ⎯ → n→∞ a.s. in Remark 3. convergence. As an application of Theorem 11. n→∞ PROOF Omitted.s. ▲ This section is concluded with a result relating convergence in probability P and a. Replacing δ by ε3.v. ⎤ ⎤ ⎡1 n ⎡1 n 1 n P ⎢ ∑ X j − μ ≥ 4ε ⎥ = P ⎢ ∑ Y j + ∑ Z j − μ ≥ 4ε ⎥ n n n j =1 ⎥ ⎢ ⎥ ⎢ ⎦ ⎦ ⎣ j =1 ⎣ j =1 ⎡1 n ⎤ 1 n ≤ P ⎢ ∑ Y j − μ + ∑ Z j ≥ 4ε ⎥ n n j =1 ⎢ ⎥ ⎣ j =1 ⎦ ⎡1 n ⎤ ⎤ ⎡1 n ≤ P ⎢ ∑ Y j − μ ≥ 2ε ⎥ + P ⎢ ∑ Z j ≥ 2ε ⎥ n n ⎥ ⎢ ⎥ ⎢ ⎦ ⎣ j =1 ⎣ j =1 ⎦ n n ⎤ ⎡1 ⎤ ⎡ ≤ P ⎢ ∑ Y j − μ ≥ 2ε ⎥ + P ⎢ ∑ Z j ≠ 0⎥ n ⎥ ⎢ ⎥ ⎢ ⎦ ⎣ j =1 ⎦ ⎣ j =1 δ ≤ 2 E X1 + δ ε for n sufﬁciently large. we get ⎤ ⎡1 n P ⎢ ∑ X j − μ ≥ 4ε ⎥ ≤ εE X 1 + ε 3 ⎥ ⎢ ⎦ ⎣ n j =1 for n sufﬁciently large. X does not necessarily imply that Xn ⎯ X . nk ↑ ∞. k → ∞) n→∞ a.210 8 Basic Limit Theorems for n sufﬁciently large. so that 1/2k−1 < ε. by (21). Thus. 2 Hence the subsequence {X2 −1} of {Xn} converges to 0 in probability. refer to Example 2 and consider the subsequence of r. we have 1 P X 2 −1 > ε = P X 2 −1 = 1 = k−1 < ε. Then for ε > 0 and large enough k.s. where k X2 k −1 = I⎛ 2 k−1 −1 ⎤ ⎜ k − 1 ⋅ 1⎥ ⎝ 2 ⎥ ⎦ . More precisely.

Exercises

8.6.1 Use Theorem 11 in order to prove Theorem 7′(i).

8.6.2 Do likewise in order to establish part (ii) of Theorem 7′.

v. . Then Y is also a discrete r. . where A = h−1(B). .v.v. . we want to determine the distribution of Y. . Y is determined by the distribution PX of the r. and let Y = h(X). taking the values xj. . PY be the distributions of X and Y.v. . PY(B) = P(Y ∈ B). B (Borel) subset of ޒ. Therefore PY(B) = P(Y ∈ B) = P(X ∈ A) = PX(A). where A = h−1(B) = {x ∈ . { ( ) } X and hence fY y j = P Y = y j = PY y j ( ) ( ) ({ }) = P ( A) = ∑ f (x ). j = 1.v. Thus we have the following theorem. That is.ޒh(x) ∈ B}. .v. 9. . 2. we have A = xi.1.212 9 Transformations of Random Variables and Random Vectors Chapter 9 Transformations of Random Variables and Random Vectors 9. We wish to determine fY(yj) = P(Y = yj). PX(B) = P(X ∈ B). and let h: ޒ → ޒbe a (measurable) function. and let h be a (measurable) function on ޒinto ޒ. . j = 1. . so that Y = h(X) is an r. THEOREM 1 Let X be an r. Given the distribution of X. PY(B) = PX(A). so that Y = h(X) is an r. Let PX. . X i x i ∈A 212 . Now (Y ∈ B) = [h(X) ∈ B] = (X ∈ A). respectively. Then the distribution PY of the r.1 Application 1: Transformations of Discrete Random Variables Let X be a discrete r. h xi = y j . 2. 2. X as follows: for any (Borel) subset B of ޒ.v. By taking B = {yj}.v. j = 1. taking the values yj.1 The Univariate Case The problem we are concerned with in this section in its simplest form is the following: Let X be an r.

( ) ( ) Let X take on the values −n. . n each with probability 1/2n. . 12. . ( ) ( ) ({ }) . vectors. . −1 + y + 4 ! ) For example. 4. ⋅ ⋅ ⋅ . . Y above. ⋅ ⋅ ⋅ . proved in advanced probability courses. that the distribution PX of an r. . we have P(Y = 12) = e−λλ3/3!.1 The Univariate Case 213 where EXAMPLE 1 fX x i = P X = x i . Thus. n2 with probability found as follows: If B = {r2}. . (A ﬁrst indication that such a result is feasible is provided by Lemma 3 in Chapter 7. for y = 12. This is easily done if the transformation h is one-to-one from S onto T and monotone (increasing or decreasing). x = 0. in determining the distribution PY of the r. so that ( ) x = −1 ± y + 4 . where S is the set of values of X for which fX is positive and T is the image of S. −1. Then Y takes on the values {y = x 2 + 2 x − 3. ±n. n.v. FY. . ⋅ ⋅ ⋅ = −3. By “one-to-one” it is meant that for each y ∈T. ({ }) = 21n + 21n = 1 n P Y = r 2 = 1 n.f. . . Hence x = −1 + y + 4 . the root −1 − y + 4 being rejected. EXAMPLE 2 ( ) Let X be P(λ) and let Y = h(X) = X2 + 2X − 3. x 2 + 2 x − 3 = y. X. 5. .f. since it is negative. 0. } { } From we get x 2 + 2 x − y + 3 = 0. 1. . then A = h −1 B = x 2 = r 2 = x = − r or ( ) ( ) ( ) = ( x = r ) + ( x = − r ) = {− r} + {r}. 1. . there is only one x ∈S such that h(x) = y.. . and PY B = P Y = y = PX A = ( ) { ) } ( ) ( ( ) ( e − λ ⋅ λ−1+ y + 4 . x=r Thus PY B = PX A = PX −r + PX r That is. Then Y takes on the values 1.v. . it sufﬁces to determine its d. X is uniquely determined by its d. under h: that is. r = 1. then A = h −1 B = −1 + y + 4 . the set to which S is transformed by h. The same is true for r. r = ±1.) Thus. Then the inverse . . if B = {y}. It is a fact. and let Y = X2.9.
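Theorem 1 translates directly into a computation: push each probability fX(x) forward through h and collect the mass on the values of Y. The sketch below (ours) reproduces the example above; as printed, the support must exclude 0 for the 2n probabilities of 1/2n to sum to 1, so we take X uniform on {±1, . . . , ±n} and Y = X²:

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(pmf, h):
    """Distribution of Y = h(X): f_Y(y) = sum of f_X(x) over {x : h(x) = y}."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[h(x)] += p
    return dict(out)

n = 5
pmf_X = {x: Fraction(1, 2 * n) for x in range(-n, n + 1) if x != 0}
pmf_Y = pushforward(pmf_X, lambda x: x * x)
# P(Y = r^2) = 1/n for r = 1, ..., n, since x = r and x = -r both map to r^2
```

Exact fractions make the collapsing of the two preimages ±r visible without rounding.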

of course. it is zero for y < 0. REMARK 1 y y ϭ h (x) (y р y0) corresponds. In the case where h is decreasing. ( X where x = h (y) and h is increasing. EXAMPLE 3 Let Y = h(X) = X2. and FY(y) = 1 − FX(x−) if h is decreasing. h−1. under h. we have FY y = P Y ≤ y = P h X ≤ y −1 −1 () ) [( ) ] = P{h [ h( X )] ≤ h ( y)} = P( X ≤ x ) = F ( x ). y ↑ x. For such a transformation. . where x = h−1(y) in either case. it is possible that the d. h−1[h(x)] = x. REMARK 2 Of course.1 x0 ϭ hϪ1 ( y0 ) x Thus we have the following corollary to Theorem 1. X ( )} where FX(x−) is the limit from the left of FX at x. COROLLARY Let h: S → T be one-to-one and monotone.214 9 Transformations of Random Variables and Random Vectors transformation. X X ) ( ) FY y = FX ( y ) − F (− y −) for y ≥ 0 and.f. we have −1 FY y = P h X ≤ y = P h −1 h X ≥ h −1 y −1 ( ) [ ( ) ] { [ ( )] = P[ X ≥ h ( y)] = P( X ≥ x ) = 1 − P( X < x ) = 1 − F ( x − ).f. to (x у x0) y0 0 Figure 9. ( ) [( ) ] ( ) ( () ) ( y ) − F (− y −). Then FY(y) = FX(x) if h is increasing. of course. FX of X even though h does not satisfy the requirements of the corollary above. FX(x−) = lim FX(y).1 points out why the direction of the inequality is reversed when h−1 is applied if h in monotone decreasing. Figure 9. Then for y ≥ 0. Here is an example of such a case. FY y = P Y ≤ y = P h X ≤ y = P X 2 ≤ y = P − y ≤ X ≤ y = P X ≤ y − P X < − y = FX () ( that is. exists and. FY of Y can be expressed in terms of the d.

fY of Y is given by the following expression: ⎧ d −1 h y . EXAMPLE 4 In Example 3.d. fX is continuous on the set S of positivity of fX. so that 1 −x 2 fX x = e .9. 1). we know that () 2 FY y = FX () X ( y ) − F (− y ). 2π Then. M. Let X be an r. fY of Y. d] be any interval in T and set A = h−1(B). if Y = X2. and we will determine the p. whose p.v. Then the inverse transformation x = h−1(y) exists for y ∈ T. y]. . y ∈) ޒ. which agrees with Theorem 3. so that Y is an r.f. Apostol. It is further assumed that h−1 is differentiable and its derivative is continuous and different from zero on T. FY of the r. A ( ) [( ) ] ( ) () Under the assumptions made. provided it exists.d.d. and then determine the p.d. T. Set Y = h(X).v. X y ≥ 0.f. the p. Another approach to the same problem is the following. ⎩ For a sketch of the proof. Then A is an interval in S and ( )Z 1 2 1 1 2 y e−y 2 ( π = Γ( )). Chapter 4. y ∈T ⎪ f h −1 y fY y = ⎨ X dy ⎪0. The following example illustrates the procedure.f.f. We recognize it as being the p. otherwise.f. and so that d 1 1 FX − y = − fX − y = − e −y 2 . of Y = h(X). assume that X is N(0.1 The Univariate Case 215 We will now focus attention on the case that X has a p. Next. One way of going about this problem would be to ﬁnd the d. 1 2 () [ ( )] () P Y ∈ B = P h X ∈ B = P X ∈ A = ∫ fX x dx.d. Y by Theorem 1 (take B = (−∞. Let y = h(x) be a (measurable) transformation deﬁned on ޒinto ޒwhich is one-to-one on the set S onto the set T (the image of S under h). the theory of changing the variable in the integral on the right-hand side above applies (see for example.d. d FX dy d ( y ) = f ( y ) dy y= 1 2 y fX ( y) = 2 1 −1 2 1 2π y e −y 2 . dy 2 y 2 2π y ( ) () ( ) Γ d 1 FY y = fY y = dy y () 1 2π e−y 2 = y ≥ 0 and zero otherwise.f. of a χ 2 1 distributed r.f. under appropriate conditions. Under the above assumptions. by differentiating (for the continuous case) FY at continuity points of fY.v. let B = [c.v.
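Example 4's conclusion (Y = X² is χ²₁ when X is N(0, 1)) can be verified pointwise from the change-of-variable formula with the two branches x = ±√y, each contributing the factor |d h⁻¹ⱼ(y)/dy| = 1/(2√y). A sketch of ours:

```python
from math import exp, pi, sqrt

def std_normal_pdf(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def pdf_of_square(y):
    """f_Y(y) for Y = X^2, X ~ N(0, 1): sum over the branches x = +sqrt(y)
    and x = -sqrt(y), each weighted by 1 / (2 * sqrt(y))."""
    r = sqrt(y)
    return (std_normal_pdf(r) + std_normal_pdf(-r)) / (2.0 * r)

def chi2_1_pdf(y):
    # Gamma(1/2) = sqrt(pi), so the chi_1^2 density is
    # y^(-1/2) * e^(-y/2) / sqrt(2 * pi)
    return y ** (-0.5) * exp(-y / 2.0) / sqrt(2.0 * pi)

# the two expressions agree for every y > 0
```

By symmetry the two branch terms are equal, which is why a factor 2 appears and the density reduces to e^{−y/2}/√(2πy).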

Mathematical Analysis, Addison-Wesley, 1957, pp. 216 and 270–271) and gives

∫_A fX(x) dx = ∫_B fX[h⁻¹(y)] |(d/dy) h⁻¹(y)| dy.

Since for (measurable) subsets B of Tᶜ, clearly, P(Y ∈ B) = P[X ∈ h⁻¹(B)] ≤ P(X ∈ Sᶜ) = 0, it follows from the definition of the p.d.f. of an r.v. that fY has the expression given below. Thus we have the following theorem.

THEOREM 2 Let the r.v. X have a continuous p.d.f. fX on the set S on which it is positive, and let y = h(x) be a (measurable) transformation defined on ℝ into ℝ. Suppose that h is one-to-one on S onto T (the image of S under h), so that the inverse transformation x = h⁻¹(y) exists for y ∈ T. It is further assumed that h⁻¹ is differentiable and its derivative is continuous and ≠ 0 on T. Set Y = h(X), so that Y is an r.v. Then the p.d.f. fY of Y is given by

fY(y) = fX[h⁻¹(y)] |(d/dy) h⁻¹(y)| for y ∈ T, and fY(y) = 0 otherwise.

EXAMPLE 5 Let X be N(μ, σ²) and let y = h(x) = ax + b, where a, b ∈ ℝ, a ≠ 0, are constants, so that Y = aX + b. We wish to determine the p.d.f. of the r.v. Y. Here the transformation h: ℝ → ℝ, clearly, satisfies the conditions of Theorem 2. We have

h⁻¹(y) = (y − b)/a and (d/dy) h⁻¹(y) = 1/a.

Therefore

fY(y) = [1/(√(2π) σ)] exp{−[(y − b)/a − μ]²/(2σ²)} · (1/|a|)
      = [1/(√(2π) |a| σ)] exp{−[y − (aμ + b)]²/(2a²σ²)},

which is the p.d.f. of a normally distributed r.v. with mean aμ + b and variance a²σ². Thus, if X is N(μ, σ²), then aX + b is N(aμ + b, a²σ²).

Now it may happen that the transformation h satisfies all the requirements of Theorem 2 except that it is not one-to-one from S onto T. Instead, the following might happen: There is a (finite) partition of S, which we denote by

{Sj, j = 1, . . . , r}, and there are r subsets of T, which we denote by Tj, j = 1, . . . , r, such that ∪_{j=1}^r Tj = T and h defined on each one of Sj onto Tj, j = 1, . . . , r, is one-to-one. Then by an argument similar to the one used in proving Theorem 2, we can establish the following theorem.

THEOREM 3 Let the r.v. X have a continuous p.d.f. fX on the set S on which it is positive, and let y = h(x) be a (measurable) transformation defined on ℝ into ℝ, so that Y = h(X) is an r.v. Suppose that there is a partition {Sj, j = 1, . . . , r} of S and subsets Tj, j = 1, . . . , r of T (the image of S under h), which need not be distinct or disjoint, such that ∪_{j=1}^r Tj = T and that h: Sj → Tj, j = 1, . . . , r, is one-to-one. Let hj be the restriction of the transformation h to Sj and let hj⁻¹ be its inverse. Assume that hj⁻¹ is differentiable and its derivative is continuous and ≠ 0 on Tj, j = 1, . . . , r. Setting

fYj(y) = fX[hj⁻¹(y)] |(d/dy) hj⁻¹(y)|, y ∈ Tj, j = 1, . . . , r,

then if a y in T belongs to k of the regions Tj (0 ≤ k ≤ r), we find fY(y) by summing up the corresponding fYj(y)'s. That is, the p.d.f. fY of Y is given by

fY(y) = ∑_{j=1}^r δj(y) fYj(y) for y ∈ T, and fY(y) = 0 otherwise,

where δj(y) = 1 if y ∈ Tj and δj(y) = 0 otherwise. This result simply says that for each one of the r pairs of regions (Sj, Tj), we work as we did in Theorem 2 in order to find fYj. The following example will serve to illustrate the point.

EXAMPLE 6 Consider the r.v. X and let Y = h(X) = X². We want to determine the p.d.f. fY of Y by assuming that fX(x) > 0 for every x ∈ ℝ. Here the conditions of Theorem 3 are clearly satisfied with

S₁ = (−∞, 0], S₂ = (0, ∞), T₁ = [0, ∞), T₂ = (0, ∞),

and, for y > 0,

h₁⁻¹(y) = −√y, h₂⁻¹(y) = √y, so that (d/dy) h₁⁻¹(y) = −1/(2√y), (d/dy) h₂⁻¹(y) = 1/(2√y).

Therefore,

fY₁(y) = fX(−√y) · 1/(2√y) and fY₂(y) = fX(√y) · 1/(2√y), y > 0,

so that

fY(y) = (1/(2√y)) [fX(√y) + fX(−√y)], y > 0,

and fY(y) = 0 for y ≤ 0, provided ±√y are continuity points of fX. In particular, if X is N(0, 1), we arrive at the conclusion that fY(y) is the p.d.f. of a χ²₁ r.v., as we also saw in Example 4 in a different way.

Exercises

9.1.1 Let X be an r.v. with p.d.f. f given in Exercise 3.2.14 of Chapter 3 and determine the p.d.f.'s of the following r.v.'s: aX + b (a > 0), X² + 1, 1/(X + 1).

9.1.2 Let X be an r.v. of the continuous type and set Y = ∑_{j=1}^n cj I_{Bj}(X), where Bj, j = 1, . . . , n, are pairwise disjoint (Borel) sets and cj, j = 1, . . . , n, are constants, and notice that Y is a discrete r.v.
i) Express the p.d.f. of Y in terms of that of X;
ii) If n = 3, X is N(99, 5), and B₁ = (95, 105), B₂ = (92, 95) + (105, 107), B₃ = (−∞, 92] + [107, ∞), determine the distribution of the r.v. Y;
iii) If X is interpreted as a specified measurement taken on each item of a product made by a certain manufacturing process and cj, j = 1, 2, 3 are the profit (in dollars) realized by selling one item under the condition that X ∈ Bj, j = 1, 2, 3, respectively, find the expected profit from the sale of one item.

9.1.3 Let X, Y be r.v.'s representing the temperature of a certain object in degrees Celsius and Fahrenheit, respectively. Then it is known that Y = (9/5)X + 32. If X is distributed as N(μ, σ²), determine the distribution of the r.v. Y.

9.1.4 If the r.v. X is distributed as U(α, β):
i) Derive the p.d.f.'s of the r.v.'s Y = eX and Z = log X, first by determining their d.f.'s and secondly directly;
ii) What do the p.d.f.'s in part (i) become for α = 0 and β = 1?
iii) For α = 0 and β = 1, let the r.v.'s Yj, j = 1, . . . , n, be independent and distributed as the r.v. Y; use the ch.f. approach to determine the p.d.f. of −∑_{j=1}^n Yj.

9.1.5 If the r.v. X is distributed as Negative Exponential with parameter λ, find the p.d.f. of each one of the r.v.'s Y, Z, where Y = eX, Z = log X, first by determining their d.f.'s and secondly directly.

9.1.6 If the r.v. X is distributed as U(−π/2, π/2), show that the r.v. Y = tan X is distributed as Cauchy. Also find the distribution of the r.v. Z = sin X.

9.1.7 If the r.v. X is distributed as Cauchy with μ = 0 and σ = 1, show that the r.v. Y = tan⁻¹ X is distributed as U(−π/2, π/2).

9.1.8 If X is an r.v. distributed as χ²_r, set Y = X/(1 + X) and determine the p.d.f. of Y.

9.1.9 If the r.v. X is distributed as N(μ, σ²), show that the r.v. Y = [(X − μ)/σ]² is distributed as χ²₁.

9.1.10 Let X be an r.v. with p.d.f. f given by f(x) = (1/√(2π)) x⁻² e^(−1/(2x²)), x ∈ ℝ, and show that the r.v. Y = 1/X is distributed as N(0, 1).

9.1.11 Suppose that the velocity X of a molecule of mass m is an r.v. with p.d.f. f given in Exercise 3.3.13(ii) of Chapter 3. Derive the distribution of the r.v. Y = (1/2)mX² (which is the kinetic energy of the molecule).

9.1.12 If the r.v. X has the Gamma distribution with parameters α, β, and Y = 2X/β, show that Y ∼ χ²_{2α}, provided 2α is an integer.

9.2 The Multivariate Case

What has been discussed in the previous section carries over to the multidimensional case with the appropriate modifications.

THEOREM 1′ Let X = (X1, . . . , Xk)′ be a k-dimensional r. vector, and let h: ℝᵏ → ℝᵐ be a (measurable) function, so that Y = h(X) is an r. vector. Then the distribution PY of the r. vector Y is determined by the distribution PX of the r. vector X as follows: For any (Borel) subset B of ℝᵐ, PY(B) = PX(A), where A = h⁻¹(B).

The proof of this theorem is carried out in exactly the same way as that of Theorem 1. As in the univariate case, the distribution PY of the r. vector Y is uniquely determined by its d.f. FY.

EXAMPLE 7 Let X1, X2 be independent r.v.'s distributed as U(α, β). We wish to determine the d.f. FY of the r.v. Y = X1 + X2. We have

FY(y) = P(X1 + X2 ≤ y) = ∫∫_{x1+x2≤y} fX1,X2(x1, x2) dx1 dx2 = [1/(β − α)²] · A,

where A is the area of that part of the square lying to the left of the line x1 + x2 = y.

[Figure 9.2: the square (α, β) × (α, β) in the (x1, x2)-plane, cut by the lines x1 + x2 = y for various values of y.]

From Fig. 9.2, we see that for y ≤ 2α, FY(y) = 0. For 2α < y ≤ α + β, A = (y − 2α)²/2, so that we get

FY(y) = (y − 2α)²/[2(β − α)²] for 2α < y ≤ α + β.

For α + β < y ≤ 2β, we have

FY(y) = [1/(β − α)²] · [(β − α)² − (2β − y)²/2] = 1 − (2β − y)²/[2(β − α)²].

Thus we have:

FY(y) = 0 for y ≤ 2α;
FY(y) = (y − 2α)²/[2(β − α)²] for 2α < y ≤ α + β;
FY(y) = 1 − (2β − y)²/[2(β − α)²] for α + β < y ≤ 2β;
FY(y) = 1 for y > 2β.

REMARK 3 The d.f. of X1 + X2 for any two independent r.v.'s X1, X2 (not necessarily U(α, β) distributed) is called the convolution of the d.f.'s of X1, X2 and is denoted by FX1+X2 = FX1 * FX2. We also write fX1+X2 = fX1 * fX2 for the corresponding p.d.f.'s. These concepts generalize to any (finite) number of r.v.'s.

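The piecewise d.f. derived above can be checked against simulation. A sketch with the illustrative values α = 1, β = 3 (any α < β would do):

```python
import random

random.seed(2)
a, b = 1.0, 3.0  # alpha, beta (illustrative choices)
n = 200_000
sums = [random.uniform(a, b) + random.uniform(a, b) for _ in range(n)]

def F(y):
    """The convolution d.f. of two independent U(a, b) r.v.'s (piecewise form)."""
    if y <= 2 * a:
        return 0.0
    if y <= a + b:
        return (y - 2 * a) ** 2 / (2 * (b - a) ** 2)
    if y <= 2 * b:
        return 1.0 - (2 * b - y) ** 2 / (2 * (b - a) ** 2)
    return 1.0

# Empirical d.f. of X1 + X2 should agree with F at each checkpoint.
for y in (2.5, 4.0, 5.5):
    emp = sum(s <= y for s in sums) / n
    assert abs(emp - F(y)) < 0.01
```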
EXAMPLE 8 Let X1 be B(n1, p), X2 be B(n2, p) and independent. Let Y1 = X1 + X2 and Y2 = X2. We want to find the joint p.d.f. of Y1, Y2, the marginal p.d.f. of Y1, and the conditional p.d.f. of Y2, given Y1 = y1. In what follows, write (n choose x) for the binomial coefficient and q = 1 − p. Since X1 = Y1 − Y2 and X2 = Y2, we have

fY1,Y2(y1, y2) = P(Y1 = y1, Y2 = y2) = P(X1 = y1 − y2, X2 = y2),

and, by independence, this is equal to

P(X1 = y1 − y2) P(X2 = y2) = (n1 choose y1 − y2) p^(y1−y2) q^(n1−(y1−y2)) · (n2 choose y2) p^(y2) q^(n2−y2)
                           = (n1 choose y1 − y2)(n2 choose y2) p^(y1) q^((n1+n2)−y1);

that is,

fY1,Y2(y1, y2) = (n1 choose y1 − y2)(n2 choose y2) p^(y1) q^((n1+n2)−y1),

for 0 ≤ y1 ≤ n1 + n2 and u ≤ y2 ≤ υ, where u = max(0, y1 − n1) and υ = min(y1, n2). Then

fY1(y1) = P(Y1 = y1) = ∑_{y2=u}^{υ} fY1,Y2(y1, y2) = p^(y1) q^((n1+n2)−y1) ∑_{y2=u}^{υ} (n1 choose y1 − y2)(n2 choose y2).

Now, with y1 and y2 as above,

∑_{y2=u}^{υ} (n1 choose y1 − y2)(n2 choose y2) = (n1 + n2 choose y1);

that is, Y1 = X1 + X2 is B(n1 + n2, p). (Observe that this agrees with Theorem 2, Chapter 7.) Finally, it follows that

P(Y2 = y2 | Y1 = y1) = (n1 choose y1 − y2)(n2 choose y2) / (n1 + n2 choose y1),

the hypergeometric p.d.f. We next have two theorems analogous to Theorems 2 and 3 in Section 1.

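The combinatorial identity used in Example 8 (summing the joint p.d.f. over y2) is Vandermonde's convolution, and it can be verified exactly with integer arithmetic; the values n1 = 6, n2 = 9 below are illustrative:

```python
from math import comb

n1, n2 = 6, 9  # illustrative binomial sizes
for y in range(n1 + n2 + 1):
    # Vandermonde's convolution: sum over admissible y2 of
    # C(n1, y - y2) * C(n2, y2) equals C(n1 + n2, y).
    total = sum(comb(n1, y - k) * comb(n2, k)
                for k in range(max(0, y - n1), min(y, n2) + 1))
    assert total == comb(n1 + n2, y)
```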
THEOREM 2′ Let the k-dimensional r. vector X have continuous p.d.f. fX on the set S on which it is positive, and let

y = h(x) = (h1(x), . . . , hk(x))′

be a (measurable) transformation defined on ℝᵏ into ℝᵏ, so that Y = h(X) is a k-dimensional r. vector. Suppose that h is one-to-one on S onto T (the image of S under h), so that the inverse transformation

x = h⁻¹(y) = (g1(y), . . . , gk(y))′

exists for y ∈ T. It is further assumed that the partial derivatives

gji(y) = (∂/∂yi) gj(y1, . . . , yk), i, j = 1, . . . , k,

exist and are continuous on T. Then the p.d.f. fY of Y is given by

fY(y) = fX[h⁻¹(y)] |J| = fX[g1(y), . . . , gk(y)] |J| for y ∈ T, and fY(y) = 0 otherwise,

where the Jacobian J is a function of y defined as the determinant

J = | g11  g12  · · ·  g1k |
    | g21  g22  · · ·  g2k |
    |  ·    ·           ·  |
    | gk1  gk2  · · ·  gkk |

and is assumed to be ≠ 0 on T.

REMARK 4 In many applications, the transformation h transforms the k-dimensional r. vector X to an m-dimensional r. vector Y with m < k; that is, the dimensionality m of Y is less than k. Then in order to determine the p.d.f. of Y, we work as follows. Let y = (h1(x), . . . , hm(x))′ and choose another k − m transformations hm+j, j = 1, . . . , k − m, defined on ℝᵏ into ℝ, so that they are of the simplest possible form and such that the transformation

h = (h1, . . . , hm, hm+1, . . . , hk)′

satisfies the assumptions of Theorem 2′. Set Z = (Y1, . . . , Ym, Ym+1, . . . , Yk)′, where Y = (Y1, . . . , Ym)′ and Ym+j = hm+j(X), j = 1, . . . , k − m. Then by applying Theorem 2′, we obtain the p.d.f. fZ of Z, and we obtain the p.d.f. of Y by integrating out the last k − m arguments ym+j, j = 1, . . . , k − m. A number of examples will be presented to illustrate the application of Theorem 2′ as well as of the preceding remark.

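Theorem 2′ can be illustrated numerically with a simple one-to-one map. A sketch (illustrative seed and region): take X1, X2 independent U(0, 1) and Y = (X1 + X2, X1 − X2); the inverse is x1 = (y1 + y2)/2, x2 = (y1 − y2)/2, so J = −1/2 and fY ≡ 1/2 on T.

```python
import random

random.seed(8)
n = 200_000
hits = 0
for _ in range(n):
    x1, x2 = random.random(), random.random()
    y1, y2 = x1 + x2, x1 - x2  # the transformation h
    if 0.9 <= y1 <= 1.1 and -0.1 <= y2 <= 0.1:
        hits += 1

# The rectangle [0.9, 1.1] x [-0.1, 0.1] lies inside T, so by Theorem 2'
# its probability is its area times |J| * fX = 0.04 * 0.5 = 0.02.
assert abs(hits / n - 0.02) < 0.005
```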
EXAMPLE 9 Let X1, X2 be i.i.d. r.v.'s distributed as U(α, β). Set Y1 = X1 + X2 and find the p.d.f. of Y1. We have

fX1,X2(x1, x2) = 1/(β − α)² for α < x1, x2 < β, and 0 otherwise.

Consider the transformation

h: y1 = x1 + x2, y2 = x2; that is, Y1 = X1 + X2, Y2 = X2.

From h, we get x1 = y1 − y2, x2 = y2. Then

J = | 1  −1 |
    | 0   1 | = 1.

Since y1 − y2 = x1, we have α < y1 − y2 < β; also α < y2 < β and, clearly, 2α < y1 < 2β. Thus the limits of y1, y2 are specified by α < y2 < β and α < y1 − y2 < β. (See Figs. 9.3 and 9.4.)

[Figure 9.3: S = {(x1, x2)′; fX1,X2(x1, x2) > 0}, the square (α, β) × (α, β). Figure 9.4: T = image of S under the transformation h, the region between the lines y1 − y2 = α and y1 − y2 = β with α < y2 < β.]

Thus we get

fY1,Y2(y1, y2) = 1/(β − α)² for α < y2 < β, α < y1 − y2 < β, and 0 otherwise.

Therefore, integrating out y2,

fY1(y1) = [1/(β − α)²] ∫_α^{y1−α} dy2 = (y1 − 2α)/(β − α)² for 2α < y1 ≤ α + β,
fY1(y1) = [1/(β − α)²] ∫_{y1−β}^{β} dy2 = (2β − y1)/(β − α)² for α + β < y1 < 2β,

and fY1(y1) = 0 otherwise. The graph of fY1 is given in Fig. 9.5.

[Figure 9.5: the graph of fY1, a triangle on (2α, 2β) with peak 1/(β − α) at y1 = α + β.]

REMARK 5 This density is known as the triangular p.d.f.

EXAMPLE 10 Let X1, X2 be i.i.d. r.v.'s from U(1, β). Set Y1 = X1X2 and find the p.d.f. of Y1. Consider the transformation

h: y1 = x1x2, y2 = x2; that is, Y1 = X1X2, Y2 = X2.

From h, we get x1 = y1/y2, x2 = y2, and

J = | 1/y2  −y1/y2² |
    |  0        1   | = 1/y2.

Now

S = {(x1, x2)′; fX1,X2(x1, x2) > 0} = {(x1, x2)′; 1 < x1, x2 < β}

is transformed by h onto

T = {(y1, y2)′; 1 < y1/y2 < β, 1 < y2 < β}.

(See Fig. 9.6.)

[Figure 9.6: the region T in the (y1, y2)-plane, bounded by the curves y2 = y1 and y2 = y1/β and the lines y2 = 1, y2 = β; here 1 < y1 < β².]

Thus, for 1 < y1 < β, we have

fY1(y1) = [1/(β − 1)²] ∫_1^{y1} (1/y2) dy2 = [1/(β − 1)²] log y1,

and, for β ≤ y1 < β²,

fY1(y1) = [1/(β − 1)²] ∫_{y1/β}^{β} (1/y2) dy2 = [1/(β − 1)²] (2 log β − log y1);

that is,

fY1(y1) = [1/(β − 1)²] log y1 for 1 < y1 < β,
fY1(y1) = [1/(β − 1)²] (2 log β − log y1) for β ≤ y1 < β²,

and fY1(y1) = 0 otherwise.

EXAMPLE 11 Let X1, X2 be i.i.d. r.v.'s from N(0, 1). Show that the p.d.f. of the r.v. Y1 = X1/X2 is Cauchy with μ = 0, σ = 1; that is,

fY1(y1) = (1/π) · 1/(1 + y1²), y1 ∈ ℝ.

We have Y1 = X1/X2. Let Y2 = X2 and consider the transformation

h: y1 = x1/x2, y2 = x2 (x2 ≠ 0); then x1 = y1y2, x2 = y2, and

J = | y2  y1 |
    |  0   1 | = y2.

Since −∞ < x1, x2 < ∞ implies −∞ < y1, y2 < ∞, we have

fY1,Y2(y1, y2) = fX1,X2(y1y2, y2) |y2| = (1/2π) exp[−(y1²y2² + y2²)/2] |y2|,

and therefore

fY1(y1) = (1/2π) ∫_{−∞}^{∞} exp[−(y1² + 1)y2²/2] |y2| dy2 = (1/π) ∫_0^∞ exp[−(y1² + 1)y2²/2] y2 dy2.

Set (y1² + 1)y2²/2 = t, so that y2² = 2t/(y1² + 1) and y2 dy2 = dt/(y1² + 1), t ∈ [0, ∞). Thus we continue as follows:

fY1(y1) = (1/π) · [1/(y1² + 1)] ∫_0^∞ e^(−t) dt = (1/π) · 1/(y1² + 1),

since ∫_0^∞ e^(−t) dt = 1; that is, Y1 = X1/X2 is Cauchy with μ = 0, σ = 1.

EXAMPLE 12 Let X1, X2 be independent r.v.'s distributed as Gamma with parameters (α, 2) and (β, 2), respectively. Set Y1 = X1/(X1 + X2) and prove that Y1 is distributed as Beta with parameters α, β. We set Y2 = X1 + X2 and consider the transformation:

h: y1 = x1/(x1 + x2), y2 = x1 + x2 (x1, x2 > 0); then x1 = y1y2, x2 = y2 − y1y2.

Hence

J = |  y2      y1    |
    | −y2   1 − y1   | = y2(1 − y1) + y1y2 = y2, and |J| = y2.

We have

fX1,X2(x1, x2) = [1/(Γ(α)Γ(β) 2^(α+β))] x1^(α−1) x2^(β−1) exp[−(x1 + x2)/2] for x1, x2 > 0, and 0 otherwise (α, β > 0).

From the transformation,

y1 = x1/(x1 + x2) = 1/(1 + x2/x1),

and it follows that, for x1 = 0, y1 = 0, and, for x1 → ∞, y1 → 1. Thus 0 < y1 < 1 and, clearly, 0 < y2 < ∞. Therefore, for 0 < y1 < 1 and 0 < y2 < ∞, we get

fY1,Y2(y1, y2) = [1/(Γ(α)Γ(β) 2^(α+β))] (y1y2)^(α−1) (y2 − y1y2)^(β−1) exp(−y2/2) y2
             = [1/(Γ(α)Γ(β) 2^(α+β))] y1^(α−1) (1 − y1)^(β−1) y2^(α+β−1) e^(−y2/2).

Hence

fY1(y1) = [y1^(α−1) (1 − y1)^(β−1)/(Γ(α)Γ(β) 2^(α+β))] ∫_0^∞ y2^(α+β−1) e^(−y2/2) dy2.

But

∫_0^∞ y2^(α+β−1) e^(−y2/2) dy2 = 2^(α+β) ∫_0^∞ t^(α+β−1) e^(−t) dt = 2^(α+β) Γ(α + β).

Therefore

fY1(y1) = [Γ(α + β)/(Γ(α)Γ(β))] y1^(α−1) (1 − y1)^(β−1) for 0 < y1 < 1, and 0 otherwise;

that is, Y1 is distributed as Beta with parameters α and β.

EXAMPLE 13 Let X1, X2, X3 be i.i.d. r.v.'s with density

f(x) = e^(−x) for x > 0, and f(x) = 0 for x ≤ 0.

Set

Y1 = X1/(X1 + X2), Y2 = (X1 + X2)/(X1 + X2 + X3), Y3 = X1 + X2 + X3,

and prove that Y1 is U(0, 1), that Y3 is distributed as Gamma with α = 3, β = 1, and that Y1, Y2, Y3 are independent. Consider the transformation

h: y1 = x1/(x1 + x2), y2 = (x1 + x2)/(x1 + x2 + x3), y3 = x1 + x2 + x3;

then

x1 = y1y2y3, x2 = −y1y2y3 + y2y3, x3 = −y2y3 + y3,

and

J = |  y2y3        y1y3          y1y2       |
    | −y2y3    −y1y3 + y3    −y1y2 + y2     |
    |   0          −y3          1 − y2      | = y2y3², so |J| = y2y3².

Next, x1, x2, x3 ∈ (0, ∞) implies that y1 ∈ (0, 1), y2 ∈ (0, 1), y3 ∈ (0, ∞). Since

fX1,X2,X3(x1, x2, x3) = e^(−(x1+x2+x3)), x1, x2, x3 > 0,

we get

fY1,Y2,Y3(y1, y2, y3) = y2y3² e^(−y3) for 0 < y1 < 1, 0 < y2 < 1, 0 < y3 < ∞, and 0 otherwise.

Hence

fY1(y1) = ∫_0^∞ ∫_0^1 y2y3² e^(−y3) dy2 dy3 = 1, 0 < y1 < 1,

so that Y1 is U(0, 1);

fY2(y2) = ∫_0^∞ ∫_0^1 y2y3² e^(−y3) dy1 dy3 = 2y2, 0 < y2 < 1,

since ∫_0^∞ y3² e^(−y3) dy3 = Γ(3) = 2; and

fY3(y3) = ∫_0^1 ∫_0^1 y2y3² e^(−y3) dy1 dy2 = (1/2) y3² e^(−y3), 0 < y3 < ∞,

so that Y3 is distributed as Gamma with α = 3, β = 1. Since

fY1,Y2,Y3(y1, y2, y3) = fY1(y1) fY2(y2) fY3(y3),

the independence of Y1, Y2, Y3 is established. The functional forms of fY1, fY2, fY3 verify the rest.

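Example 13's conclusions can be checked by Monte Carlo; a sketch with an illustrative seed and sample size, checking that Y1 behaves like U(0, 1) and that Y3 has the Gamma(α = 3, β = 1) mean of 3:

```python
import random
import statistics

random.seed(5)
n = 100_000
y1, y3 = [], []
for _ in range(n):
    x1, x2, x3 = (random.expovariate(1.0) for _ in range(3))
    y1.append(x1 / (x1 + x2))   # should be U(0, 1)
    y3.append(x1 + x2 + x3)     # should be Gamma(3, 1), mean 3

assert abs(statistics.fmean(y1) - 0.5) < 0.01            # U(0,1) mean
assert abs(statistics.pvariance(y1) - 1.0 / 12.0) < 0.01  # U(0,1) variance
assert abs(statistics.fmean(y3) - 3.0) < 0.05             # Gamma(3,1) mean
```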
9.2.1 Application 2: The t and F Distributions

The density of the t distribution with r degrees of freedom (t_r). Let the independent r.v.'s X and Y be distributed as N(0, 1) and χ²_r, respectively, and set T = X/√(Y/r). The r.v. T is said to have the (Student's) t-distribution with r degrees of freedom (d.f.) and is often denoted by t_r. We want to find its p.d.f. We have:

fX(x) = (1/√(2π)) e^(−x²/2), x ∈ ℝ,

and

fY(y) = [1/(Γ(r/2) 2^(r/2))] y^((r/2)−1) e^(−y/2) for y > 0, and fY(y) = 0 for y ≤ 0.

Set U = Y and consider the transformation

h: t = x/√(y/r), u = y; then x = t√(u/r), y = u,

and

J = | √(u/r)   t/(2√(ur)) |
    |   0          1      | = √(u/r).

Hence, for t ∈ ℝ, u > 0, we get

fT,U(t, u) = (1/√(2π)) exp(−t²u/(2r)) · [1/(Γ(r/2) 2^(r/2))] u^((r/2)−1) e^(−u/2) √(u/r).

Therefore

fT(t) = [1/(√(2πr) Γ(r/2) 2^(r/2))] ∫_0^∞ u^((1/2)(r+1)−1) exp[−(u/2)(1 + t²/r)] du.

Set (u/2)(1 + t²/r) = z, so that u = 2z/(1 + t²/r), du = 2 dz/(1 + t²/r), and z ∈ [0, ∞). Therefore we continue as follows:

fT(t) = [1/(√(2πr) Γ(r/2) 2^(r/2))] ∫_0^∞ [2z/(1 + t²/r)]^((1/2)(r+1)−1) e^(−z) [2/(1 + t²/r)] dz
      = [2^((1/2)(r+1))/(√(2πr) Γ(r/2) 2^(r/2))] · [1/(1 + t²/r)^((1/2)(r+1))] ∫_0^∞ z^((1/2)(r+1)−1) e^(−z) dz
      = [1/(√(πr) Γ(r/2))] · [1/(1 + t²/r)^((1/2)(r+1))] · Γ[(1/2)(r + 1)];

that is,

fT(t) = Γ[(r + 1)/2] / {√(πr) Γ(r/2) [1 + (t²/r)]^((r+1)/2)}, t ∈ ℝ.

The probabilities P(T ≤ t) for selected values of t and r are given in tables (the t-tables). (For the graph of fT, see Fig. 9.7.)

[Figure 9.7: the graphs of fT for r = 5 (t₅) and, in the limit r → ∞, N(0, 1).]

The density of the F distribution with r1, r2 degrees of freedom (F_{r1,r2}). Let the independent r.v.'s X and Y be distributed as χ²_{r1} and χ²_{r2}, respectively, and set F = (X/r1)/(Y/r2). The r.v. F is said to have the F distribution with r1, r2 degrees of freedom (d.f.) and is often denoted by F_{r1,r2}. We want to find its p.d.f. We have:

fX(x) = [1/(Γ(r1/2) 2^(r1/2))] x^((r1/2)−1) e^(−x/2) for x > 0, and fX(x) = 0 for x ≤ 0.

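The t density fT obtained above can be sanity-checked numerically: for r = 1 it reduces to the standard Cauchy density of Example 11, and for any r it integrates to 1. A sketch (midpoint-rule integration on an illustrative grid):

```python
import math

def f_t(t, r):
    """Student's t density with r degrees of freedom, as derived above."""
    c = math.gamma((r + 1) / 2.0) / (math.sqrt(math.pi * r) * math.gamma(r / 2.0))
    return c * (1.0 + t * t / r) ** (-(r + 1) / 2.0)

# For r = 1 the formula reduces to the standard Cauchy density 1/[pi(1 + t^2)].
for t in (-2.0, 0.0, 0.7):
    assert abs(f_t(t, 1) - 1.0 / (math.pi * (1.0 + t * t))) < 1e-12

# For r = 5 the density integrates to 1 (midpoint rule on [-50, 50];
# the tails beyond that contribute a negligible amount).
n, lo, hi = 200_000, -50.0, 50.0
h = (hi - lo) / n
total = sum(f_t(lo + (i + 0.5) * h, 5) for i in range(n)) * h
assert abs(total - 1.0) < 1e-4
```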
fY(y) = [1/(Γ(r2/2) 2^(r2/2))] y^((r2/2)−1) e^(−y/2) for y > 0, and fY(y) = 0 for y ≤ 0.

We set Z = Y and consider the transformation

h: f = (x/r1)/(y/r2), z = y; then x = (r1/r2) f z, y = z,

and

J = | (r1/r2) z   (r1/r2) f |
    |     0           1     | = (r1/r2) z.

Thus, for f, z > 0, we get

fF,Z(f, z) = [1/(Γ(r1/2) Γ(r2/2) 2^((1/2)(r1+r2)))] (r1/r2)^((r1/2)−1) f^((r1/2)−1) z^((r1/2)−1) exp[−(r1/(2r2)) f z] z^((r2/2)−1) e^(−z/2) (r1/r2) z
          = [(r1/r2)^(r1/2) f^((r1/2)−1) / (Γ(r1/2) Γ(r2/2) 2^((1/2)(r1+r2)))] z^((1/2)(r1+r2)−1) exp[−(z/2)((r1/r2) f + 1)].

Therefore

fF(f) = ∫_0^∞ fF,Z(f, z) dz
      = [(r1/r2)^(r1/2) f^((r1/2)−1) / (Γ(r1/2) Γ(r2/2) 2^((1/2)(r1+r2)))] ∫_0^∞ z^((1/2)(r1+r2)−1) exp[−(z/2)((r1/r2) f + 1)] dz.

Set (z/2)[(r1/r2) f + 1] = t, so that z = 2t/[(r1/r2) f + 1], dz = 2 dt/[(r1/r2) f + 1], t ∈ [0, ∞). Thus continuing, we have

fF(f) = [(r1/r2)^(r1/2) f^((r1/2)−1) / (Γ(r1/2) Γ(r2/2) 2^((1/2)(r1+r2)))] · 2^((1/2)(r1+r2)) [(r1/r2) f + 1]^(−(1/2)(r1+r2)) ∫_0^∞ t^((1/2)(r1+r2)−1) e^(−t) dt.

Therefore

fF(f) = {Γ[(1/2)(r1 + r2)] (r1/r2)^(r1/2) / [Γ((1/2)r1) Γ((1/2)r2)]} · f^((r1/2)−1) / [1 + (r1/r2) f]^((1/2)(r1+r2)) for f > 0,

and fF(f) = 0 for f ≤ 0. The probabilities P(F ≤ f) for selected values of f and r1, r2 are given by tables (the F-tables). (For the graph of fF, see Fig. 9.8.)

[Figure 9.8: the graphs of fF for F₁₀,₁₀ and F₁₀,₄.]

REMARK 6
i) If F is distributed as F_{r1,r2}, then, clearly, 1/F is distributed as F_{r2,r1}.
ii) If X is N(0, 1), Y is χ²_r and X, Y are independent, so that T = X/√(Y/r) is distributed as t_r, then T² is distributed as F_{1,r}, since X² is χ²₁.

We consider the multidimensional version of Theorem 3.

THEOREM 3′ Let the k-dimensional r. vector X have continuous p.d.f. fX on the set S on which it is positive, and let y = h(x) = (h1(x), . . . , hk(x))′ be a (measurable) transformation defined on ℝᵏ into ℝᵏ, so that Y = h(X) is a k-dimensional r. vector. Suppose that there is a partition {Sj, j = 1, . . . , r} of S and subsets Tj, j = 1, . . . , r of T (the image of S under h), which need not be distinct or disjoint, such that ∪_{j=1}^r Tj = T and that h defined on each one of Sj onto Tj, j = 1, . . . , r, is one-to-one.

Let hj be the restriction of the transformation h to Sj and let hj⁻¹(y) = (gj1(y), . . . , gjk(y))′ be its inverse, j = 1, . . . , r. Assume that the partial derivatives gjil(y) = (∂/∂yl) gji(y1, . . . , yk), i, l = 1, . . . , k, j = 1, . . . , r, exist and are continuous on Tj, and that the Jacobians

Jj = | gj11  gj12  · · ·  gj1k |
     | gj21  gj22  · · ·  gj2k |
     |  ·     ·            ·   |
     | gjk1  gjk2  · · ·  gjkk |

which are functions of y, are ≠ 0 on Tj, j = 1, . . . , r. Setting fYj(y) = fX[hj⁻¹(y)] |Jj|, y ∈ Tj, j = 1, . . . , r, the p.d.f. fY of Y is then given by

fY(y) = ∑_{j=1}^r δj(y) fYj(y) for y ∈ T, and fY(y) = 0 otherwise,

where δj(y) = 1 if y ∈ Tj and δj(y) = 0 otherwise. In the next chapter (Chapter 10) on order statistics we will have the opportunity of applying Theorem 3′.

Exercises

9.2.1 Let X1, X2 be independent r.v.'s taking on the values 1, . . . , 6 with probability f(x) = 1/6, x = 1, . . . , 6. Derive the distribution of the r.v. X1 + X2.

9.2.2 Let X1, X2 be independent r.v.'s distributed as N(0, 1). Then:
i) Find the p.d.f.'s of the r.v.'s X1 + X2 and X1 − X2;
ii) Calculate the probability P(X1 − X2 < 0, X1 + X2 > 0).

9.2.3 Let X1, X2 be r.v.'s with joint p.d.f. f given by f(x1, x2) = (1/π) I_A(x1, x2), where A = {(x1, x2)′ ∈ ℝ²; x1² + x2² ≤ 1}. Set Z² = X1² + X2² and derive the p.d.f. of the r.v. Z. (Hint: Use polar coordinates.)

9.2.4 Let X1, X2 be independent r.v.'s distributed as Negative Exponential with parameter λ = 1. Then:

i) Derive the p.d.f.'s of the r.v.'s X1 + X2, X1 − X2, and X1/X2;
ii) Determine whether these r.v.'s are independent or not.

9.2.5 Let X1, X2 be independent r.v.'s distributed as N(0, σ²). Then show that:
i) The r.v. X1² + X2² has the Negative Exponential distribution with parameter λ = 1/(2σ²);
ii) The r.v.'s X1² + X2² and X1/X2 are independent;
iii) The r.v. X1/X2 has the Cauchy distribution with μ = 0 and σ = 1.

9.2.6 Let the independent r.v.'s X1, X2 have p.d.f. f given by f(x) = (1/x²) I_(1,∞)(x).
i) Determine the distribution of the r.v. X1/X2;
ii) Also show that its median is equal to 1.

9.2.7 Let X be an r.v. distributed as t_r, and derive the p.d.f. of the r.v. Y = 1/(1 + X²/r).

9.2.8 Let the independent r.v.'s X1 and X2 be distributed as χ²_{r1} and χ²_{r2}, respectively, and set X = (X1/r1)/(X2/r2). Then show that:
i) The r.v. X is distributed as F_{r1,r2};
ii) If r1 = r2, the median of X is equal to 1;
iii) The r.v. Y = 1/[1 + (r1/r2)X] is distributed as Beta;
iv) The p.d.f. of r1X converges to that of χ²_{r1} as r2 → ∞.
(Hint: For part (iv), use Stirling's formula (see, for example, W. Feller's book An Introduction to Probability Theory, Vol. I, 3rd ed., 1968, page 50), which states that, as n → ∞, Γ(n)/[(2π)^(1/2) n^((2n−1)/2) e^(−n)] tends to 1.)

9.2.9 Let X1, X2 be independent r.v.'s distributed as χ²_{r1} and χ²_{r2}, respectively. Then show that:
i) The r.v. X1 + X2 is distributed as χ²_{r1+r2} (as anticipated);
ii) The r.v.'s X1 + X2 and X1/X2 are independent.

9.2.10 Let X1, X2 be independent r.v.'s distributed as U(α, β), and derive the p.d.f.'s of the r.v.'s X1 + X2, X1 − X2, and X1X2.

9.2.11 Let Xr be an r.v. distributed as t_r, and let fr be its p.d.f. Then show that:

fr(x) → (1/√(2π)) exp(−x²/2) as r → ∞, x ∈ ℝ.

(Hint: Use Stirling's formula given as a hint in Exercise 9.2.8(iv).)

9.2.12 For α ∈ (0, 1), let χ²_{r;α} and F_{r1,r2;α} be defined by: P(Xr ≥ χ²_{r;α}) = α and P(X_{r1,r2} ≥ F_{r1,r2;α}) = α, where Xr and X_{r1,r2} are r.v.'s distributed as χ²_r and F_{r1,r2}, respectively. Then show that:

F_{r1,r2;α} → (1/r1) χ²_{r1;α} as r2 → ∞.

9.2.13 Let Xr be an r.v. distributed as t_r. Then show that:

EXr = 0, r ≥ 2; σ²(Xr) = r/(r − 2), r ≥ 3.

9.2.14 Let X_{r1,r2} be an r.v. distributed as F_{r1,r2}. Then show that:

EX_{r1,r2} = r2/(r2 − 2), r2 ≥ 3; σ²(X_{r1,r2}) = 2r2²(r1 + r2 − 2)/[r1(r2 − 2)²(r2 − 4)], r2 ≥ 5.

9.3 Linear Transformations of Random Vectors

In this section we will restrict ourselves to a special and important class of transformations, the linear transformations. We first introduce some needed notation and terminology.

9.3.1 Preliminaries

A transformation h: ℝᵏ → ℝᵏ which transforms the variables x1, . . . , xk to the variables y1, . . . , yk in the following manner:

yi = ∑_{j=1}^k cij xj, i = 1, . . . , k, cij real constants, (1)

is called a linear transformation. Let C = (cij) be the k × k matrix whose elements are cij, and let Δ = |C| be the determinant of C. If Δ ≠ 0, then, as is known from linear algebra (see also Appendix 1), we can uniquely solve for the x's in (1) and get

xi = ∑_{j=1}^k dij yj, i = 1, . . . , k, dij real constants. (2)

Let D = (dij) and Δ* = |D|. Then Δ* = 1/Δ. If, furthermore, the linear transformation above is such that the column vectors (c1j, c2j, . . . , ckj)′, j = 1, . . . , k, are orthogonal, that is

∑_{i=1}^k cij cij′ = 0 for j ≠ j′, and ∑_{i=1}^k c²ij = 1, j = 1, . . . , k, (3)

then the linear transformation is called orthogonal. The orthogonality relations (3) are equivalent to orthogonality of the row vectors (ci1, . . . , cik)′, i = 1, . . . , k; that is,

∑_{j=1}^k cij ci′j = 0 for i ≠ i′, and ∑_{j=1}^k c²ij = 1, i = 1, . . . , k. (4)

It is known from linear algebra that |Δ| = 1 for an orthogonal transformation. Also, in the case of an orthogonal transformation, we have dij = cji, so that

xi = ∑_{j=1}^k cji yj, i = 1, . . . , k.

This is seen as follows:

∑_{j=1}^k cji yj = ∑_{j=1}^k cji (∑_{l=1}^k cjl xl) = ∑_{l=1}^k xl (∑_{j=1}^k cji cjl) = xi

by means of (3). According to what has been seen so far, the Jacobian of the transformation (1) is J = Δ* = 1/Δ, and, for the case that the transformation is orthogonal, we have J = ±1, so that |J| = 1.

These results are now applied as follows: Consider the r. vector X = (X1, . . . , Xk)′ with p.d.f. fX, and let S be the subset of ℝᵏ over which fX > 0. Set

Yi = ∑_{j=1}^k cij Xj, i = 1, . . . , k, where we assume Δ = |(cij)| ≠ 0.

Then the p.d.f. fY of the r. vector Y = (Y1, . . . , Yk)′ is given by

fY(y1, . . . , yk) = fX(∑_j d1j yj, . . . , ∑_j dkj yj) · (1/|Δ|) for (y1, . . . , yk)′ ∈ T, and 0 otherwise,

where T is the image of S under the transformation in question. In particular, if the transformation is orthogonal, then

fY(y1, . . . , yk) = fX(∑_j cj1 yj, . . . , ∑_j cjk yj) for (y1, . . . , yk)′ ∈ T, and 0 otherwise.

Another consequence of orthogonality of the transformation is that

∑_{i=1}^k Yi² = ∑_{i=1}^k Xi².

In fact,

∑_i Yi² = ∑_i (∑_j cij Xj)(∑_l cil Xl) = ∑_j ∑_l Xj Xl (∑_i cij cil) = ∑_j Xj²,

because ∑_i cij cil = 1 for j = l and 0 for j ≠ l. We formulate these results as a theorem.

THEOREM 4 Consider the r. vector X = (X1, . . . , Xk)′ with p.d.f. fX which is > 0 on S ⊆ ℝᵏ, and set

Yi = ∑_{j=1}^k cij Xj, i = 1, . . . , k, where |(cij)| = Δ ≠ 0.

Then Xi = ∑_{j=1}^k dij Yj, i = 1, . . . , k, and the p.d.f. of the r. vector Y = (Y1, . . . , Yk)′ is

fY(y1, . . . , yk) = fX(∑_j d1j yj, . . . , ∑_j dkj yj) · (1/|Δ|) for (y1, . . . , yk)′ ∈ T, and 0 otherwise,

where T is the image of S under the given transformation. If, in particular, the transformation is orthogonal, then

fY(y1, . . . , yk) = fX(∑_j cj1 yj, . . . , ∑_j cjk yj) for (y1, . . . , yk)′ ∈ T, and 0 otherwise;

furthermore, ∑_{j=1}^k Xj² = ∑_{j=1}^k Yj².

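The identity ∑ Yi² = ∑ Xi² in Theorem 4 can be checked numerically for any orthogonal matrix C. A sketch using a 2 × 2 rotation (the angle and the test points are arbitrary illustrative choices):

```python
import math
import random

theta = 0.7  # arbitrary rotation angle
C = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]  # rows (and columns) are orthonormal

random.seed(9)
for _ in range(100):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    y = [sum(C[i][j] * x[j] for j in range(2)) for i in range(2)]
    # An orthogonal transformation preserves the sum of squares.
    assert abs(sum(v * v for v in y) - sum(v * v for v in x)) < 1e-9
```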
The following theorem is an application of Theorem 4 to the normal case.

THEOREM 5 Let the r.v.'s Xi be N(μi, σ²), i = 1, . . . , k, and independent. Consider the orthogonal transformation

Yi = ∑_{j=1}^k cij Xj, i = 1, . . . , k.

Then the r.v.'s Y1, . . . , Yk are also independent, normally distributed with common variance σ² and means given by

E Yi = ∑_{j=1}^k cij μj, i = 1, . . . , k.

PROOF With X = (X1, . . . , Xk)′ and Y = (Y1, . . . , Yk)′, we have

fX(x1, . . . , xk) = [1/(√(2π) σ)]ᵏ exp[−(1/(2σ²)) ∑_{i=1}^k (xi − μi)²],

and hence, by Theorem 4,

fY(y1, . . . , yk) = [1/(√(2π) σ)]ᵏ exp[−(1/(2σ²)) ∑_{i=1}^k (∑_{j=1}^k cji yj − μi)²].

Now

∑_i (∑_j cji yj − μi)² = ∑_i [(∑_j cji yj)² + μi² − 2μi ∑_j cji yj]
 = ∑_j ∑_l yj yl (∑_i cji cli) + ∑_i μi² − 2 ∑_j ∑_i μi cji yj
 = ∑_j yj² − 2 ∑_j ∑_i cji μi yj + ∑_i μi²,

and this is equal to

∑_{j=1}^k (yj − ∑_i cji μi)²,

since expanding this last expression we get:

Σ_{j=1}^k (yj − Σ_{i=1}^k cji μi)²
  = Σ_{j=1}^k (yj² + Σ_{i=1}^k Σ_{l=1}^k cji cjl μi μl − 2 yj Σ_{i=1}^k cji μi)
  = Σ_{j=1}^k yj² + Σ_{i=1}^k Σ_{l=1}^k μi μl Σ_{j=1}^k cji cjl − 2 Σ_{j=1}^k Σ_{i=1}^k μi cji yj
  = Σ_{j=1}^k yj² − 2 Σ_{j=1}^k Σ_{i=1}^k μi cji yj + Σ_{i=1}^k μi²,

because Σ_{j=1}^k cji cjl = 1 for i = l and 0 for i ≠ l. Hence

fY(y1, . . . , yk) = (1/(√(2π) σ))^k exp[−(1/(2σ²)) Σ_{j=1}^k (yj − Σ_{i=1}^k cji μi)²],

as was to be seen. ▲

As a further application of Theorems 4 and 5, we consider the following result. Let Z1, . . . , Zk be independent N(0, 1), and set

Y1 = (1/√k) Z1 + (1/√k) Z2 + · · · + (1/√k) Zk
Y2 = (1/√(2·1)) Z1 − (1/√(2·1)) Z2
Y3 = (1/√(3·2)) Z1 + (1/√(3·2)) Z2 − (2/√(3·2)) Z3
. . .
Yk = (1/√(k(k−1))) Z1 + · · · + (1/√(k(k−1))) Z_{k−1} − ((k−1)/√(k(k−1))) Zk.

We thus have, for the matrix (cij) of this transformation:

c1j = 1/√k, j = 1, . . . , k, and, for i = 2, . . . , k:
cij = 1/√(i(i−1)) for j = 1, . . . , i − 1,  cii = −(i−1)/√(i(i−1)),  cij = 0 for j > i.

Hence

Σ_{j=1}^k c1j² = Σ_{j=1}^k (1/k) = 1,

and, for i = 2, . . . , k,

Σ_{j=1}^k c1j cij = (1/√k) Σ_{j=1}^k cij = (1/√k) [(i−1)/√(i(i−1)) − (i−1)/√(i(i−1))] = 0,

Σ_{j=1}^k cij² = (i−1) · 1/(i(i−1)) + (i−1)²/(i(i−1)) = 1/i + (i−1)/i = 1,

while, for i, l = 2, . . . , k (i ≠ l), say i < l,

Σ_{j=1}^k cij clj = Σ_{j=1}^{i−1} (1/√(i(i−1)))(1/√(l(l−1))) − ((i−1)/√(i(i−1)))(1/√(l(l−1)))
                = [(i−1) − (i−1)]/(√(i(i−1)) √(l(l−1))) = 0,

and similarly for i > l. Thus the transformation is orthogonal. Then, by Theorem 5, the r.v.'s Y1, . . . , Yk are independent N(0, 1), and, by Theorem 4,

Σ_{i=1}^k Yi² = Σ_{i=1}^k Zi².

Furthermore, Y1 = √k Z̄, and hence

Σ_{i=2}^k Yi² = Σ_{i=1}^k Yi² − Y1² = Σ_{i=1}^k Zi² − k Z̄² = Σ_{i=1}^k (Zi − Z̄)².

Since Y1 is independent of Σ_{i=2}^k Yi², we conclude that Z̄ is independent of Σ_{i=1}^k (Zi − Z̄)². Thus we have the following theorem.

THEOREM 6
Let X1, . . . , Xn be independent r.v.'s distributed as N(μ, σ²). Then X̄ and S² are independent.

PROOF
Set Zj = (Xj − μ)/σ, j = 1, . . . , n. Then the Z's are as above, and hence

Z̄ = (1/σ)(X̄ − μ)  and  Σ_{j=1}^n (Zj − Z̄)² = (1/σ²) Σ_{j=1}^n (Xj − X̄)²

are independent. Hence X̄ and S² are independent. ▲

Exercises

9.3.1 For i = 1, 2, 3, let Xi be independent r.v.'s distributed as N(μi, σ²), and set

Y1 = −(1/√2) X1 + (1/√2) X2,
Y2 = −(1/√6) X1 − (1/√6) X2 + (2/√6) X3,
Y3 = (1/√3) X1 + (1/√3) X2 + (1/√3) X3.

Then:
i) Show that the r.v.'s Y1, Y2, Y3 are also independent, normally distributed with variance σ², and specify their respective means. (Hint: Verify that the transformation is orthogonal, and then use Theorem 5.)

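Theorem 6 can be illustrated by simulation: for normal samples the sample mean and the sample variance are uncorrelated (in fact independent), whereas for a skewed parent distribution such as the Negative Exponential the two are visibly correlated. A hedged Python sketch (not part of the text; sample size, parameters, and seed are arbitrary choices):

```python
import math, random

random.seed(0)

def corr(a, b):
    # Pearson correlation coefficient of two equally long lists.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / math.sqrt(va * vb)

def mean_var_pairs(draw, n, reps):
    # For each replication, compute (sample mean, sample variance S^2).
    pairs = []
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        m = sum(xs) / n
        pairs.append((m, sum((x - m) ** 2 for x in xs) / n))
    return pairs

normal = mean_var_pairs(lambda: random.gauss(1.0, 2.0), 5, 20000)
expon = mean_var_pairs(lambda: random.expovariate(1.0), 5, 20000)

r_norm = corr([p[0] for p in normal], [p[1] for p in normal])
r_exp = corr([p[0] for p in expon], [p[1] for p in expon])
print(abs(r_norm) < 0.05, r_exp > 0.3)
```

The near-zero correlation in the normal case reflects the exact independence proved in Theorem 6; the clearly positive correlation in the exponential case shows that this independence is special to the normal distribution.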
Y)′ ϳ N(0. σ 2. ρ). ρ). U − V. d σ 2. (Compare with the latter part of Exercise 9.) 2 9. 2(1 − ρ)). τ 2 1.’s are independent.3. dY)′ ϳ N(cμ1. 1. σ 2 1.3.v. ρ).v. ρ). 1. 1. X − Y ϳ N(0.’s X + Y and X − Y are independent if and only if σ1 = σ2. ( ) iii) The r.1 The Univariate Exercises Case 241 Then: i) Show that the r.3. 2(1 − ρ)). then: i) Determine the distribution of the r.v. σ 2 1 . show that X + Y ϳ N(0. 2 2 ii) In particular. X − Y)′ ϳ N(0. 1. then show that (cX. X − Y are independent. c2. τ 2 = σ 1 + σ 2 − 2 ρσ 1σ 2 . cμ1 − dμ2.’s X + Y. μ2. Y)′ ϳ N(0. σ 2 1. say. that is. (Hint: Verify that the transformation is orthogonal. Y)′ ϳ N(μ1. Then: 2 2 i) (cX. Y) ϳ N(μ1. ρ0).’s (X. 0. 1.3. and ρ0 = 2 c 2σ 1 − d 2σ 2 2 τ1τ 2 .4 If (X.3 If (X. cX − dY)′ ϳ N(cμ1 + dμ2. d2. τ 2. ρ). τ 2 1) and X − Y ϳ N(0. 1. ρ0). with +ρ if cd > 0. d are constants with cd ≠ 0. σ 1.5 If (X. for σ 2 1 = σ 2 = σ .V= Y − μ2 σ2 . σ 2. σ 2.3. Then: 2 iii) (X + Y. dμ2. τ 2. c d iii) The r. and X + Y.v. σ 2. and ρ0 = −ρ if cd < 0. ρ0). ρ). then show that X − μ Y − μ ′ ϳ N(0. and −ρ if cd < 0. . d be constants with cd ≠ 0. 2 9. and specify their respective means. and then use Theorem 5). 2 ii) (cX + dY. ±ρ). Y) has the Bivariate Normal distribution with 2 2 2 parameters μ1. c2σ 2 1. τ 2 = c σ 1 + d σ 2 − 2 ρcdσ 1σ 2 . 9.’s cX + dY and cX − dY are independent if and only if =±σ . dY) ϳ N(0. σ 2 1.2 If the pair of r. 0. where 2 2 2 2 2 2 2 τ1 = σ1 +σ2 τ 1τ 2 . μ2.’s are independent. and U = ( 1 2 σ1 σ2 ) X − μ1 σ1 . 0. 2 + 2 ρσ 1σ 2 . ρ).v. μ2. 0. specify the distributions of the r. τ 2 1. and let c. and show that r. σ 2 1. Y2.v. Y3 are also independent normally distributed with variance σ2. ρ. 9. σ 2. 0.5. and vice versa. Y)′ ϳ N(0. where ρ0 = ρ if cd > 0.’s U + V. Y)′ ϳ N(μ1. and ρ0 = σ 1 − σ 2 2 iii) X + Y ϳ N(0. μ2. and show that these r.3.3.6 Let (X. τ 2). where 2 2 2 2 2 2 2 τ1 = c 2σ 1 + d 2σ 2 2 + 2 ρcdσ 1σ 2 . 
use a conclusion in Theorem 4 in order to show that 2 2 2 2 Y2 1 + Y 2 + Y 3 ϳ σ χ 3.9. (X.7 Let (X. X − Y. 9. ii) If μ1 = μ2 = μ3 = 0. 2 9.υ. and c. 0.v.’s Y1.

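One of the claims in the exercises above is that, for a Bivariate Normal pair, X + Y and X − Y are independent exactly when σ1 = σ2 (since Cov(X + Y, X − Y) = σ1² − σ2², and for jointly normal r.v.'s zero correlation implies independence). A seeded Python sketch of this, not part of the text (all parameter values are illustrative):

```python
import math, random

random.seed(1)

def draw_pair(mu1, mu2, s1, s2, rho):
    # Standard construction of a Bivariate Normal pair from two N(0, 1) draws.
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return mu1 + s1 * z1, mu2 + s2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / math.sqrt(va * vb)

def sum_diff_corr(s1, s2, rho, reps=40000):
    pairs = [draw_pair(0.0, 0.0, s1, s2, rho) for _ in range(reps)]
    return corr([x + y for x, y in pairs], [x - y for x, y in pairs])

r_equal = sum_diff_corr(2.0, 2.0, 0.5)    # sigma1 = sigma2: Cov(X+Y, X-Y) = 0
r_unequal = sum_diff_corr(2.0, 1.0, 0.5)  # sigma1 != sigma2: correlated
print(abs(r_equal) < 0.05, r_unequal > 0.3)
```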
9.3.8 Refer to Exercise 9.3.7 and:
i) Provide an expression for the probability P(cX + dY > λ);
ii) Give the numerical value of the probability in part (i) for c = 2, d = 3, λ = 15, μ1 = 3, μ2 = 1, σ1 = 1, σ2 = 0.5, and ρ = −0.5.

9.3.9 For j = 1, . . . , n, let (Xj, Yj)′ be independent r. vectors with distribution N(μ1, μ2, σ1², σ2², ρ). Then:
i) Determine the distribution of the r.v. X̄ − Ȳ;
ii) What does this distribution become for μ1 = μ2 and σ1² = σ2² = σ², say?

9.4 The Probability Integral Transform

In this short section, we derive two main results. According to the first result, if X is any r.v. with continuous d.f. F, and if Y = F(X), then surprisingly enough Y ∼ U(0, 1). The name of this section is derived from the transformation Y = F(X). Next, in several instances a statement has been made to the effect that X is an r.v. with d.f. F. The question then arises as to whether such an r.v. can actually be constructed. The second result resolves this question as follows: Let F be any d.f., let Y ∼ U(0, 1), and define the r.v. X by X = F⁻¹(Y). Then X ∼ F (in the absolutely continuous case, F is represented by the integral of a p.d.f.).

THEOREM 7
Let X be an r.v. with continuous d.f. F, and define the r.v. Y by Y = F(X). Then the distribution of Y is U(0, 1). (The proof presented is an adaptation of the discussion in the Note "The Probability Integral Transformation: A Simple Proof" by E. F. Schuster, published in Mathematics Magazine, Vol. 49 (1976), No. 5, pages 242–243.)

PROOF
Let G be the d.f. of Y. We will show that G(y) = y, 0 < y < 1. Let y ∈ (0, 1). Since F(x) → 0 as x → −∞, there exists a such that (0 ≤) F(a) < y, and since F(x) → 1 as x → ∞, there exist ε > 0 with y + ε < 1 and b such that y + ε ≤ F(b) (≤ 1). Set F(a) = c and F(b) = d. Then the function F is continuous in the closed interval [a, b], and y and all numbers of the form y + ε/n (n ≥ 2 integer) lie in (c, d). Therefore, by the Intermediate Value Theorem (see, for example, Theorem 3(ii) on page 95 in Calculus and Analytic Geometry, 3rd edition (1966), by George B. Thomas, Addison-Wesley Publishing Company, Reading, Massachusetts), there exist x0 and xn (n ≥ 2) in (a, b) such that F(x0) = y and F(xn) = y + ε/n. Then

(X ≤ x0) ⊆ [F(X) ≤ F(x0)]   (since F is nondecreasing)
        = [F(X) ≤ y]        (since F(x0) = y)
        ⊆ [F(X) < y + ε/n]
        = [F(X) < F(xn)]    (since F(xn) = y + ε/n)
        ⊆ (X < xn)          (by the fact that F is nondecreasing and by contradiction).

That is, (X ≤ x0) ⊆ [F(X) ≤ y] ⊆ (X ≤ xn). Hence

P(X ≤ x0) ≤ P[F(X) ≤ y] ≤ P(X ≤ xn),  or  y = F(x0) ≤ G(y) ≤ F(xn) = y + ε/n.

Letting n → ∞, we obtain G(y) = y. Finally, G, being a d.f., is right-continuous. Thus, as y ↓ 0, G(0) = lim G(y) = lim y = 0, and, as y ↑ 1, G(1−) = lim G(y) = lim y = 1, so that G(1) = 1. The proof is completed. ▲

For the formulation and proof of the second result, we need some notation and a preliminary result. Let F be any d.f. Set y = F(x) and define F⁻¹ as follows:

(5) F⁻¹(y) = inf{x ∈ ℝ; F(x) ≥ y}.

From this definition it is then clear that when F is strictly increasing, F⁻¹ is the usual inverse of F. It is also clear that, if F is continuous, then for each y ∈ (0, 1) there is at least one x such that F(x) = y, and the above definition becomes:

(6) F⁻¹(y) = inf{x ∈ ℝ; F(x) = y}.

(See also Figures 9.9, 9.10 and 9.11, which illustrate F⁻¹ for a strictly increasing F, a continuous F which is constant on an interval, and an F with a jump, respectively.)

We now establish the result to be employed.

LEMMA 1
Let F⁻¹ be defined by (5). Then F⁻¹(y) ≤ t if and only if y ≤ F(t), and

(7) F[F⁻¹(y)] ≥ y.

PROOF
We have F⁻¹(y) = inf{x ∈ ℝ; F(x) ≥ y}. Therefore there exists xn ∈ {x ∈ ℝ; F(x) ≥ y} such that xn ↓ F⁻¹(y). Hence F(xn) → F[F⁻¹(y)], by the right continuity of F, and since F(xn) ≥ y, relation (7) follows. Now assume that F⁻¹(y) ≤ t. Then F[F⁻¹(y)] ≤ F(t), since F is nondecreasing. Combining this result with (7), we obtain y ≤ F(t). Next assume that y ≤ F(t). This means that t belongs to the set {x ∈ ℝ; F(x) ≥ y}, and hence F⁻¹(y) ≤ t. The proof of the lemma is completed. ▲

By means of the above lemma, we may now establish the following result.

THEOREM 8
Let Y be an r.v. distributed as U(0, 1), and let F be a d.f. Define the r.v. X by X = F⁻¹(Y), where F⁻¹ is defined by (5). Then the d.f. of X is F.

PROOF
We have

P(X ≤ x) = P[F⁻¹(Y) ≤ x] = P[Y ≤ F(x)] = F(x),

where the last step follows from the fact that Y is distributed as U(0, 1), and the one before it by Lemma 1. ▲

REMARK 7
As has already been stated, the theorem just proved provides a specific way in which one can construct an r.v. X whose d.f. is a given d.f. F.

Exercise

9.4.1 Let Xj, j = 1, . . . , n be independent r.v.'s such that Xj has continuous and strictly increasing d.f. Fj. Set Yj = Fj(Xj) and show that the r.v.

X = −2 Σ_{j=1}^n log(1 − Yj)

is distributed as χ²_{2n}.
since for i ≠ j. .i. . . Xn be i. ⋅ ⋅ ⋅ . .Chapter 10 Order Statistics and Related Theorems In this chapter we introduce the concept of order statistics and also derive various distributions. the Y ’s are not independent. X2. . f which is positive for a < x < b and 0 otherwise. ⋅ ⋅ ⋅ . . . . f such that f(x) > 0. . . . for each s ∈S. X2.f. In the ﬁrst place. The results obtained here will be used in the second part of this book for statistical inference purposes. Yn is given by: ⎧n! f y1 ⋅ ⋅ ⋅ f yn . ⎪ g y1 .d. but it is easily seen. j = 1. It follows that Y1 ≤ Y2 ≤ · · · ≤ Yn. j = 1. ( ) ( ) ( ) a < y1 < y2 < ⋅ ⋅ ⋅ < yn < b otherwise. (that is.f. . . .i. ⋅ ⋅ ⋅ .1 Order Statistics and Related Distributions Let X1. 245 . . . of the Y’s. Xn is denoted by X( j). . and. PROOF The proof is carried out explicitly for n = 3.’s with d. and then Yj(s) is deﬁned to be the jth smallest among the numbers X1(s). n). in general. it will be established that: THEOREM 1 If X1. . yn = ⎨ ⎪ ⎩0. .d. Xn are i. (−∞ ≤)a < x < b(≤ ∞) and zero otherwise.v. then the joint p. or Yj. X n .v. .d. .d. r. X2(s). . . for easier writing.’s with p. and is deﬁned as follows: Yj = jth smallest of the X 1 . to be valid in the general case as well. . The jth order statistic of X1. F. By means of Theorem 3′.d. One of the problems we are concerned with is that of ﬁnding the joint p. Xn(s). of the order statistics Y1. 2. X 2 . . We assume now that the X’s are of the continuous type with p.d. Chapter 9.f. with the proper change in notation. X2(s). .f. Xn(s). . n.f. r. 10. . look at X1(s).
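Both directions of the probability integral transform (Theorems 7 and 8) are easy to illustrate with the Negative Exponential distribution, whose d.f. and inverse have closed forms. A Python sketch, not part of the text (the rate λ and seed are arbitrary illustrative choices):

```python
import math, random

random.seed(2)
lam = 1.5  # illustrative rate for the Negative Exponential distribution

F = lambda x: 1.0 - math.exp(-lam * x)        # a continuous d.f.
F_inv = lambda y: -math.log(1.0 - y) / lam    # its inverse on (0, 1)

# Theorem 7: Y = F(X) is U(0, 1) when X has continuous d.f. F,
# so the sample mean of the transformed values should be near 1/2.
ys = [F(random.expovariate(lam)) for _ in range(50000)]
mean_y = sum(ys) / len(ys)

# Theorem 8: X = F^{-1}(Y) with Y ~ U(0, 1) has d.f. F,
# so the sample mean should be near the exponential mean 1/lam.
zs = [F_inv(random.random()) for _ in range(50000)]
mean_z = sum(zs) / len(zs)
print(round(mean_y, 2), round(mean_z, 2))
```

The second half is precisely the construction of Remark 7 (and the basis of "inverse transform" random-number generation).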

i = 1. x 2 = y2 . f(·.f. x 2 = y1 . x 2 = y1 . S321 : S 231 : y1 = x 2 . S312 : y1 = x3 . y2 = x1 . x3 = ⎨ ⎪ otherwise. x3 = y2 S 213 : x1 = y2 . where ′ ⎧ ⎫ S = ⎨ x1 . a < x1 ≠ x 2 ≠ x3 < b f x1 . x3 all different ⎬. y2 = x3 . Thus we have ⎧ ⎪ f x1 f x 2 f x3 . x2. The Jacobians are thus given by: . ( ) ( )( )( ) Thus f(x1. y2 = x 2 . i. ⎩0. x3 . x 2 = y3 . Now on each one of the Sijk’s there exists a one-to-one transformation from the x’s to the y’s deﬁned as follows: S123 : S132 : y1 = x1 . 3. 2. j j and therefore P(Xi = Xj = Xk) = 0 for i ≠ j ≠ k. x 2 . x3 = y3 S132 : x1 = y1 . x 2 = y3 . x1 .246 10 Order Statistics and Related Theorems P X i = X j = ∫∫ ( xi = x j ) f xi f x j dxi dx j = ∫ a ( ) ( )( ) b x ∫ x f ( xi ) f ( x j )dxi dx j = 0. ⎩ ⎭ ( ) Then we have S = S123 + S132 + S 213 + S 231 + S312 + S321 . y3 = x 2 y3 = x3 y3 = x 2 S 213 : y1 = x 2 . y3 = x3 y2 = x3 . y1 = x1 . x 2 . ⎩ ⎭ ( ) Let Sijk ⊂ S be deﬁned by ′ ⎧ ⎫ S ijk = ⎨ x1 . X3 is zero if at least two of the arguments x1. k = 1. x3 = y1 . x2. x3 ∈ ޒ3 . x3) is positive on the set S. y3 = x1 y1 = x3 . 2. x 2 = y2 . y3 = x1 . 3. x3 = y2 S312 : x1 = y2 . of X1. a < x i < x j < x k < b⎬. y2 = x 2 . X2. we have then: S123 : x1 = y1 . we may assume that the joint p. x3 are equal. i ≠ j ≠ k.d. Solving for the x’s. x 2 . ·).. x 2 . x3 = y1 S321 : x1 = y3 . x3 = y3 S 231 : x1 = y3 . ·. a < x i < b. y2 = x1 . j .

of the order statistics Y1. From the deﬁnition of a determinant and the fact that each row and column contains exactly one 1 and the rest all 0. r. Xn be i. . .10. r.d.d.1 Order Statistics and Related Distributions 247 1 0 0 S123 : J 123 = 0 1 0 = 1 0 0 1 1 0 0 S132 : J 132 = 0 0 1 = −1. g y1 . Yn is given by g y1 . y3 = ⎨ ⎪ a < y1 < y2 < y3 < b ⎪ ⎩0. otherwise. it follows that the n! Jacobians are either 1 or −1 and the remaining part of the proof is identical to the one just given except one adds up n! like terms instead of 3!. Chapter 9. . a < y1 < y2 < y3 < b ⎪ g y1 . EXAMPLE 2 Let X1. .’s distributed as N(μ. yn = ( ) n! (β − α ) n . y2 .’s distributed as U(α. . β). . Then the joint p. 0 1 0 0 1 0 S213 : J 213 = 1 0 0 = −1 0 0 1 0 0 1 S231 : J 231 = 1 0 0 = 1 0 1 0 0 1 0 S312 : J 312 = 0 0 1 = 1 1 0 0 0 0 1 S321 : J 321 = 0 1 0 = −1. . . One has n! regions forming S. of the order statistics Y1. ( ) ( )( )( ) Notice that the proof in the general case is exactly the same. .i. Then the joint p. .d. y2 .υ.f. . one for each permutation of the integers 1 through n. . ⎧3! f y1 f y2 f y3 .i. EXAMPLE 1 Let X1. y3 = ⎨ ⎪ ▲ ⎩0. ( ) ( )( )( ) ( )( )( ) ( )( )( ) ( )( )( ) ( )( )( ) ( )( )( ) This is. . 1 0 0 Hence |J123| = · · · = |J321| = 1. ⋅ ⋅ ⋅ .υ. otherwise. gives ⎧ f y1 f y2 f y3 + f y1 f y3 f y2 + f y2 f y1 f y3 ⎪ ⎪+ f y3 f y1 f y2 + f y2 f y3 f y1 + f y3 f y2 f y1 . . ⎥ ⎦ ) if −∞ < y1 < · · · < yn < ∞ and zero otherwise. .d. ⋅ ⋅ ⋅ . yn = n!⎜ ⎟ exp ⎢− 2 ⎝ 2πσ ⎠ ⎢ ⎣ 2σ ( ) n ∑ (y j =1 n j 2⎤ − μ ⎥. . σ 2 ). Yn is given by ⎛ 1 ⎞ ⎡ 1 g y1 .f. and Theorem 3′. Xn be i. .

b) and therefore F −1 exists in this interval. r. . ( ) ( )( )[ ] [ j −1 n− j j j a < yj < b ( ) [ ( )] f ( y ). gj of Yj. − du f F 1 u [ ( )] u ∈ 0. ( ) ( )[ ( ) ( )] n −2 f y1 f yn . From Theorem 1. yn) = n!f( y1) · · · f( yn) for a < y1 < · · · < yn < b and equals 0 otherwise. f which is positive and continuous for (−∞ ≤) a < x < b(≤ ∞) and zero otherwise. is given by: ⎧ n! F yj ⎪ i) g j y j = ⎨ j − 1 ! n − j ! ⎪ ⎩0. .’s with d. it follows that F is strictly increasing in (a. Yn be the order statistics. yn = ⎨ ⎪ ⎩0. n a < yn < b otherwise. . as well as the joint p. y j = ⎨ n− j ⋅ f yi f y j . 1 .248 10 Order Statistics and Related Theorems if α < y1 < · · · < yn < β and zero otherwise. a < yi < y j < b ⎪× 1 − F y j ⎪ otherwise.f. b). Another interesting problem is that of ﬁnding the marginal p. . . Xn be i. . Since f is positive in (a. n. of any number of the Yj’s. . ⎩ [ ( )] f ( y ). . ( ) 1 − F ( y )] f ( y ). we have that g( y1.f. Hence if u = F(y). THEOREM 2 Let X1. . Then the p.v. . then y = F −1 (u). . is given by: i −1 ⎧ n! F yi F y j − F yi ⎪ ⎪ i−1! j −i−1! n− j ! ii) g ij yi .f. . ( ) ( )( [ ( )] ( ) ( ) ( )] )( )[ ] [ ( )( ) j − i −1 In particular. . ⎧ ⎪n 1 − F y1 i′) g1 y1 = ⎨ ⎪ ⎩0.d. ( ) . j = 1. ⎩0. F and p.d.d. b). n.f.f. . .d. In particular. otherwise.f. ( )( ) a < y1 < yn < b otherwise. and let Y1. we have the following theorem. Yj with 1 ≤ i < j ≤ n. . j = 1.d. . u ∈(0. . gij of any Yi. The joint p. . of each Yj. .i. y ∈(a. As a partial answer to this problem. n−1 1 n−1 a < y1 < b otherwise and i″) g n yn ( ) ⎧ ⎪ n F yn =⎨ ⎪0. 2. ⎧ ⎪n n − 1 F yn − F y1 ii′) g1n y1 .d. 1) and PROOF dy 1 = .
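The n! factor in Theorem 1 can be checked by simulation: for n = 3 i.i.d. U(0, 1) observations, the joint density of the order statistics equals 3! = 6 on {y1 < y2 < y3}, so the probability that (Y1, Y2, Y3) falls in a small cube inside that region is 6 times the cube's volume. A seeded Python sketch, not part of the text (the cube's corner and edge length are arbitrary choices):

```python
import random

random.seed(3)
reps = 300000
lo, eps = (0.2, 0.5, 0.7), 0.1  # cube lies wholly inside {y1 < y2 < y3}

# Count how often the sorted triple lands in the cube
# [0.2, 0.3) x [0.5, 0.6) x [0.7, 0.8); Theorem 1 predicts 6 * eps^3.
hits = sum(
    1 for _ in range(reps)
    if all(lo[i] <= y < lo[i] + eps
           for i, y in enumerate(sorted(random.random() for _ in range(3))))
)
est = hits / reps
print(round(est, 4), 6 * eps ** 3)
```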

. . that is. n Then − g1 y1 = n 1 − F y1 ( ) [ ( )] [− f ( y )]. and in fact the same method can be used to ﬁnd the joint p. n. . uj+1 yield [1/(n − j)!] (1 − uj)n−j and the last j − 1 integrations with respect to the variables u1. h u j = n! ∫ ⋅ ⋅ ⋅∫ 0 ( ) uj 0 ∫ 1 uj ⋅ ⋅ ⋅∫ 1 u n− 1 dun ⋅ ⋅ ⋅ du j +1 du1 ⋅ ⋅ ⋅ du j−1 . uj−1 yield [1/( j − 1)!] ujj−1. . . . () . . .1. h(u1. . un = n! f F −1 u1 ⋅ ⋅ ⋅ f F −1 un ( ) [ ( )] u2 [ ( )] f [F (u )] ⋅ 1 ⋅ ⋅ f [ F (u )] −1 −1 1 n for 0 < u1 < · · · < un < 1 and equals 0 otherwise.19). . ( ) ( ) ( ) ( ) Thus gn(yn) = n[F(yn)]n−1 f(yn). . respectively. Of course. . . The ﬁrst n − j integrations with respect to the variables un. 1) and equals 0 otherwise.d. h of the U’s is given by h u1 .f. Thus h uj = ( ) ( n! u jj−1 1 − u j j −1! n− j ! )( ) ( ) n− j for uj ∈(0. b) and 0 otherwise. 1). . we obtain g yj = ( ) ( ) 1 − F ( y )] f ( y ) ( )( )[ ] [ n! F yj j −1! n− j ! j −1 n− j j j for yj ∈ (a. . Finally.f. ⎪ ⎪x −α . This completes the proof of (i). Then ⎧0 . un) = n! for 0 < u1 < · · · < un < 1 and equals 0 otherwise. (i′) and (i″) follow from (i) by setting j = 1 and j = n. j = 1. using once again the transformation Uj = F(Yj).1 Order Statistics and Related Distributions 249 Therefore by setting Uj = F(Yj). Similarly. n−1 1 or g1 y1 = n 1 − F y1 ( ) [ ( )] f ( y ).d. An alternative and easier way of establishing (i′) and (i″) is the following: Gn yn = P Yn ≤ yn = P all X j ’s ≤ yn = F n yn . ⋅ ⋅ ⋅ . x ≤α a<x<β x ≥ β. of any number of Yj’s (see also Exercise 10. . 1 − G1 y1 = P Y1 > y1 = P all X j ’s > y1 = 1 − F y1 ( ) ( ) ( ) [ ( )] .10. one has that the joint p. F x =⎨ ⎪β − α ⎪ ⎩1. n 1 The proof of (ii) is similar to that of (i). Hence for uj ∈(0. ▲ EXAMPLE 3 Refer to Example 2.

Likewise . ⎪n⎜ ⎟ n g n yn = ⎨ ⎝ β − α ⎠ β − α β −α ⎪ ⎪ ⎩0 . 0 < yj < 1 otherwise. a < y1 < yn < β otherwise. for α = 0. β = n − j + 1. a<yj <β otherwise. In particular. n g1n y1 . β −α n−j a<yj <β otherwise )!(β − α ) n (y j −α ) (β − y ) j . = ⎪n ⎟ n g1 y1 = ⎨ ⎜ ⎝ β −α ⎠ β −α β −α ⎪ ⎪ ⎩0 . 0 < yj < 1 otherwise. Since Γ(m) = (m − 1)!. yn ( ) ⎧ ⎛ y − y1 ⎞ ⎪n n − 1 ⎜ n ⎟ ⎝ β −α ⎠ ⎪ =⎨ ⎪ ⎪ ⎩0. ( ) ( )( ) ( ) n− j . Thus ⎧ ⎛ β − y ⎞ n−1 1 n−1 n 1 ⎪ β − y1 . β = 1. ( ) n− 2 1 ( β −α ) 2 = n n−1 ( ( β −α ) ) n (y − y1 ) n− 2 . ⎧ ⎛ y − α ⎞ n−1 1 n−1 n n ⎪ = yn − α . ( ) ( () ( ) ) ( ) n− j . these formulas simplify as follows: ⎧ n! y jj −1 1 − y j ⎪ j − n j − 1 ! ! g j yj = ⎨ ⎪ ⎩0. which is the density of a Beta distribution with parameters α = j. ⎛y j −α⎞ ⎜ ⎟ !⎝ β − α ⎠ j −1 gj yj ( ) ( ( )( )( ) ⎛β − yj⎞ ⎜ ⎟ ⎝ β −α ⎠ j −1 n−j 1 . ⎧ n! ⎪ ⎪ = ⎨ j −1! n− j ⎪ ⎪ ⎩0. ( ) ( ) ( ) α < yn < β otherwise. ( ) ( ) ( ) α < y1 < β otherwise. this becomes ⎧ Γ n+1 ⎪ y jj−1 1 − y j ⎪ g j yj = ⎨Γ j Γ n − j + 1 ⎪ ⎪ ⎩0 .250 10 Order Statistics and Related Theorems and therefore ⎧ n! ⎪ ⎪ = ⎨ j −1! n− j ⎪ ⎪ ⎩0.

fY . Z.v.10. 1). y + z ( ) ( ) ) ( )] n −2 = n n−1 F y+ z − F z ( )[ ( ⎧0 < y < b − a f z f y + z . Y = Yn − Y1 is called the (sample) range and is of some interest in statistical applications. ( ) 0 < yn < 1 otherwise and g1n y1 . () ( ) ( 0< y<1 otherwise. We are interested in deriving the distribution of the r. The r. then fY y = n n − 1 that is () ( )∫ 1− y 0 y n− 2 dz = n n − 1 y n− 2 1 − y .v. 0 < y1 < 1 otherwise. b− y n −2 a 0 < y< b−a otherwise. 0 < y1 < yn < 1 otherwise. () ( )∫ [F ( y + z) − F (z)] f (z) f ( y + z)dz. Consider the transformation ⎧ y = yn − y1 ⎨ ⎩z = y1 . z = g1n z. if X is an r. n −2 ⎧ 1− y . one obtains ⎧ ⎪ fY y = ⎨n n − 1 ⎪ ⎩0. ( ) ( ) n −1 . ⎪n n − 1 y fY y = ⎨ ⎪ ⎩0. we consider the transformation .⎨ ⎩a < z < b − y ()( ) and zero otherwise.1 Order Statistics and Related Distributions 251 ⎧ ⎪n 1 − y1 g1 y1 = ⎨ ⎪ ⎩0. To this end. Z y. yn ( ) ⎧ ⎪n n − 1 yn − y1 =⎨ ⎪0. ( ) ( ) ) 0 < y < 1. In particular. Let now U be χ and independent of the sample range Y. The distribution of Y is found as follows. gn yn = ⎨ n ⎪ ⎩0. Set 2 r Z= Y U r .v. ⎩ ( )( ) n −2 . distributed as U(0. Therefore ⎧y = z Then ⎨ 1 ⎩ yn = y + z and hence J = 1. Integrating with respect to z. ⎧ ⎪nyn −1 .
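Example 3's Beta marginals — and the distribution of the sample range Yn − Y1, which is derived later in this section and is Beta(n − 1, 2) for U(0, 1) observations — can be checked against their theoretical means by simulation. A seeded Python sketch, not part of the text (n, j, and the replication count are arbitrary choices):

```python
import random

random.seed(4)
n, j, reps = 9, 5, 40000  # illustrative choices

# For U(0, 1) samples: Y_j ~ Beta(j, n - j + 1), so E(Y_j) = j/(n + 1);
# the range Y_n - Y_1 ~ Beta(n - 1, 2), so its mean is (n - 1)/(n + 1).
order_j, ranges = [], []
for _ in range(reps):
    ys = sorted(random.random() for _ in range(n))
    order_j.append(ys[j - 1])
    ranges.append(ys[-1] - ys[0])

mean_j = sum(order_j) / reps
mean_range = sum(ranges) / reps
print(round(mean_j, 2), j / (n + 1))
print(round(mean_range, 2), (n - 1) / (n + 1))
```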

v. f given by: − ( x −θ ) f x =e I (θ . n be independent r.d. Integrating out w. then: i) Calculate the probability that all X’s are greater than (α + β)/2.f. derive the p. W z. Owen’s Handbook of Statistical Tables.d.1. Its density is given by fZ(z) above and the values of the points zα for which P(Z > zα) = α are given by tables for selected values of α. . Xn are i. 0 () ∞ if 0 < z < ∞ and zero otherwise. X2.v.f. of Y1.’s with d.1. for example. . .f. and p.’s Xj.v.d. n. . .’s with p.v. .1.f.v.i. . n be i. . respectively. ii) In particular. Use Theorem 2(i″) in order to calculate the probability that all Xj’s exceed m.252 10 Order Statistics and Related Theorems ⎧ y ⎪z = u r ⎨ ⎪ ⎩w = u r . Donald B. . Therefore fZ . β).’s X1. the life of certain items such as light bulbs. The r. the crushing strength of bricks. Yn as well as Yn − Y1. etc. Then: i) Use the p. in particular. .90. j = 1. Then ⎧u = rw ⎪ ⎨ ⎪ ⎩y = z w and hence J = r w.i. 10.3 If the independent r.) ( ) ( ) Exercises Throughout these exercises. published by Addison-Wesley. for α = 0. . . Z is called the Studentized range. of the r. Xn are distributed as U(α. ( ) ( ) ( ) if 0 < z. .2 Let X1. the weight of certain objects. . β = 1. Y2/Y1. are quantities of great interest. . j = 1.v.’s distributed as U(α. . . then the r.v. derived in Example 3 in order to show that . j = 1. X3 be independent r. pp. j = 1. β). and let m be the median of F.f. are i.d.d. ∞ ) x1 . r. F and f. Now if the r.4 Let Xj. also calculate the probability P(Yn ≤ m). 10. 144–149. vacuum tubes. . . 10. . w = fY z w fU rw r w .v. we get fZ z = ∫ fY z w fU rw r w dw. 1) and Y is as above.1. . . From these examples. () ( ) Use Theorem 2(i″) in order to determine the constant c = c(θ ) for which P(θ < Y3 < c) = 0. (See. and n = 2.1 Let Xj. n may represent various physical quantities such as the breaking strength of certain steel bars. 10.d.d.’s and Yj = X( j) is the jth order statistic of the X’s. .v. . r.’s X1.i. 
it is then clear that the distribution of the Y ’s and. Xj. w < ∞ and zero otherwise. N(0.

and show that. .) 10. Xn be independent r. iii) For a and b with 0 < a < b ≤ β − α. 1).1. .6 Let X1.5 i) Refer to the p. α < y1 < yn < β . . iii) What do the quantities in parts (i) and (ii) become for α = 0 and β = 1? (Hint: In part (i). . and show that. . show that: i ∫0 y ( z − y) z j − i −1 dy = i! j − i − 1 ! j z. i) Refer to the p.f. and let 1 ≤ i < j ≤ n. ii) Set Y = Yn − Y1. then: g1n y1 . j! ) j +1 ∫0 z (1 − z) 1 n− j dz = ( j + 1)!(n − j )! . yn = ( ) n n−1 (β − α ) ) n ( ) n (y n − y1 ) n− 2 . . and show that the p. and EYn.d.f. ii) Integrating by parts repeatedly. 0 < yi < y j < 1.d. σ 2(Y1).’s distributed as U(0. Xn are independent with distribution U(α. show that: P a <Y < b = ( ) (β − α ) [(β − α − b)b − β − α − a a n−1 + ( ) ] (β − α ) bn − a n n . if X1.d. of Y is given by: fY y = () n n−1 (β − α ) n n ( (β − α − y) y n−1 n− 2 . iv) What do the quantities in parts (i)–(iii) become for α = 0 and β = 1? 10. use the appropriate Beta p. (Y ) = (n + 1) (n + 2) 2 j 2 ii) Derive EY1.Exercises 253 EY j (β − α ) j + α = n+1 and σ 2 (β − α ) j(n − j + 1) . β). (n + 2)! iii) Use parts (i) and (ii) to show that E YY i j = ( ) (n + 1)(n + 2) i j +1 ( ) . 0 < y < β − α.f. . .1.d. σ 2(Yn) from part (i). gij derived in Theorem 2(ii). .v. y j = ( ) ( )( ( n! yii−1 y j − yi i−1! j −i−1! n− j ! )( ) ( ) (1 − y ) j− i−1 j n− j .’s to facilitate the integrations.f. in the present case: g ij yi . gij derived in Theorem 2(ii).

with parameter λ . 10.f.7 Let the independent r. of Yj. j = 1. . . Yj = ( ) i n− j +1 ( ( n+1 )( 2 ) n+ 2 ) and ρ Yi .f. ޒzj ≥ 0.’s X1. . yn > 0 . Then: i) Use Theorem 2(i′) in order to show that the p. that is.1.d.8 Let the independent r. . . . v) From part (iv).v. . n are uniformly distributed over the set ⎧ ′ ⎪ n ⎨ z1 . this set is a triangle in ޒ.v.9 i) Let the independent r.v. . given Y1. and suppose that their ﬁrst order statistic Y1 has the Negative Exponential distribution with parameter nλ.1. j = 2. Xn have the Negative Exponential distribution with parameter λ. Y = Yn − Y1. of Yn is given by: g n yn = nλe − λy 1 − e λy n ( ) ( n ) n−1 . Yj = ( ) i n− j +1 [( ( i n−i +1 j n− j +1 )( ) )] 1 2 . ii) The conditional p. Then show that Y1 has the same distribution with parameter nλ. . iii) The conditional p. ii) Let F be the (common) d.) 2 10. Then use the result in Theorem 1 in order to show that the r.’s X1. Xn be distributed as U(0.d. . derive Cov(Y1. of Yn. 10. and let 1 < j < n. . .f. Zj = Yj − Yj−1. Then F is the Negative Exponential d.’s Zj. Use the relevant results in Example 3. 1). j = 1. . .f. . 1).10 Let the independent r.v.1. . . . given Yn. zn ∈ . n and ⎪ ⎩ ( ) ∑z j =1 n j ⎫ ⎪ ≤ 1⎬. ii) Let Y be the (sample) range. .f. . .’s X1.6(i) in order to derive: i) The conditional p. .f.’s Zj. Xn have the Negative Exponential distribution with parameter λ. .f. . . .254 10 Order Statistics and Related Theorems iv) By means of part (iii) and Exercise 1.v. .1. j = 1. of Y is given by: fY y = n − 1 λe − λy 1 − e − λy () ( ) ( ) n− 2 . y > 0.d.d. ⋅ ⋅ ⋅ .d. Xn. of Y1. .’s X1. Yn) and ρ(Y1. of Yj. ⎪ ⎭ (For n = 2.’s X1. given Y1.4(i). Yn). ⋅ ⋅ ⋅ . Xn be distributed as U(0. . . . 10. and the conditional p.v.f. and deﬁne the r.d. . n as follows: Z1 = Y1. show that: Cov Yi . given Yn. and Exercise 1.v. of the independent r. . and then use Theorem 2 (ii′ ) in order to show that the p. n.

for c > 0: P ⎡min X i − X j ≥ c ⎤ = exp − λn n − 1 c 2 .’s. ii) From the deﬁnition of the Zj’s. n − j + 1⎠ λ ⎝n n−1 iii) Use part (i) in order to show that. .d. ∑ d j Yj ⎟ = ∑ σ i2 ⎜ ∑ c j d j + 2 ∑ ck dl ⎟ . of Yj. . given Yn.v. iv) For a = 1/λ. r. where σ 2 i = [λ(n − i + 1)] .d.Exercises 255 iii) Calculate the probability P(a < Y < b) for 0 < a < b. . of Yn. b = 2/λ. show that Zj has the Negative Exponential distribution with parameter (n − j + 1)λ.f. it follows that Yj = Z1 + · · · + Zj.d. iii) From parts (i) and (ii). . n. and set: Z1 = Y1. ⎝ i =1 ⎠ i =1 ⎝ j = i ⎠ 1≤k < l ≤n j =1 where cj ∈ޒ. .’s X1. . . . and that these r.1.f. iii) The conditional p.d. 10. 10. . Xn have the Negative Exponential distribution with parameter λ. of Y1. and let 1 < j < n.12 and show that: −2 i) σ2(Yj) = ∑ ji = 1 σ i2 .9(i) and 1.1. .’s are independent. j = 1. Then: i) For j = 1. . n be i.10(i) in order to determine: i) The conditional p. . ﬁnd a numerical value for the probability in part (iii). ci ∈ ޒconstants. .v. .1. .’s X1. ii) The conditional p. Cov(Yi. Use this expression and part (i) in order to conclude that: EY j = 1⎛1 1 1 ⎞ +⋅⋅⋅+ ⎜ + ⎟. Xn have the Negative Exponential distribution with parameter λ.i. n. j = 2.12 Let the independent r. and n = 10.1. 2 i ii) For 1 ≤ i ≤ j < n. Use Theorem 2(ii) and Exercises 1. . j = 1. Yj) = ∑k = 1σ k .11 Let the independent r. . n. Then the sample median SM is deﬁned as follows: . . Zj = Yj − Yj−1. and the conditional p. Let Xj. .d. given Y1. given Yn. . .13 Refer to Exercise 10. .f.v. ⎢ i≠ j ⎥ ⎣ ⎦ 10. . n. dj ∈ ޒconstants. . . . . given Y1. ⎝ j =1 ⎠ j =1 ⎝ i = j ⎠ 2 2 iv) Also utilize parts (i) and (ii) in order to show that: n ⎛ n ⎞ n ⎛ n ⎞ Cov⎜ ∑ ci Yi . . of Yj. [ ( ) ] i = 1.f. conclude that: ⎛ n ⎞ n ⎛ n ⎞ σ ⎜ ∑ c j Yj ⎟ = ∑ σ j2 ⎜ ∑ ci ⎟ .v.

r. 6 x = 1. Xn be i. 0 < z1 < ⋅ ⋅ ⋅ < zn < 1 h z1 . j = 1. where Yj. . ⋅ ⋅ ⋅ . . otherwise.f. Then Z1.d. . h is given by: ⎧n!.f. Chapter 9. . .1. . Y = F(X) is U(0. F and let Zj = F(Yj). . Then determine the p. that if X is an r. What do parts (i)–(iii) become for n = 3? 10.d. of SM. ( ) .1. . 1 10.256 10 Order Statistics and Related Theorems SM ⎧Y ⎪ = ⎨1 Y +Y ⎞ ⎪2 ⎛ ⎠ ⎩ ⎝ n+ 1 2 n 2 n +2 2 if n is odd if n is even.f.2)(x). with continuous d. . where SM is deﬁned by (*).d. . . . 6.f. f with median m. β).i. f(x) = 2(2 − x)I(1. . r. . .17 Refer to Exercise 10. . then the r. .2 Further Distribution Theory: Probability of Coverage of a Population Quantile It has been shown in Theorem 7.1. 10. .d.1.’s with continuous d.1)(x).v. . 10. n are i.f. . of SM is symmetric about μ. r.16 For n odd. f(x) = 2(1 − x)I(0. 6. .d. .19 Carry out the proof of part (ii) of Theorem 2.d. of SM. 10. show that the p.d. (*) 10.2 and derive the p.d. observe that P(Y1 = y) = P(Y6 = 7 − y). and n is odd. j = 1.i. n are independently distributed as N(μ. 2. ii) Negative Exponential with parameter λ. conclude that ESM = μ.1.1. 10. 1). j = 1.’s Xj. .v. zn = ⎨ ⎩0. y = 1.f. n have p. j = 1. determine the p.’s with p. . of SM when the underlying distribution is: i) U(α.v.f. . . n are the order statistics. . and hence their joint p. F.v. .15 If the r. . where SM is deﬁned by (*).1)(x). . σ 2).v. This result in conjunction with Theorem 1 of the present chapter gives the following theorem.i. f given by f(x) = – . Also.d. THEOREM 3 Let X1. 6 be i.d.v. let the independent r.f.18 Let Xj.’s of Y1 and Y6.d. j = 1.υ.f. . Find the p. . .’s. . 1). Without integration. .. and also calculate the probability P(SM > m) in each one of the following cases: i) ii) iii) iv) f(x) = 2xI(0.f.1. Zn are order statistics from U(0.14 If Xj. .’s Xj.

Set Wj = F(Xj), j = 1, . . . , n. Then the Wj's are independent, since the Xj's are, and, by Theorem 7, Chapter 9, they are also distributed as U(0, 1). Because F is nondecreasing, to each ordering of the Xj's, X(1) ≤ X(2) ≤ · · · ≤ X(n), there corresponds the ordering F(X(1)) ≤ F(X(2)) ≤ · · · ≤ F(X(n)) of the F(Xj)'s, and conversely. Therefore W(j) = F(Yj), j = 1, . . . , n.

PROOF (of Theorem 3)
That the joint p.d.f. of the Zj's is the one given above follows from Theorem 1 of this chapter. The distributions of Zj, j = 1, 2, . . . , n, and of (Z1, . . . , Zn) are given in Example 3 of this chapter. ▲

Next, let X be an r.v. with d.f. F, and consider a number p, 0 < p < 1. In Chapter 4, a pth quantile of F, xp, was defined to be a number with the following properties: i) P(X ≤ xp) ≥ p and ii) P(X ≥ xp) ≥ 1 − p. Now we would like to establish a certain theorem to be used in a subsequent chapter.

THEOREM 4
Let X1, . . . , Xn be i.i.d. r.v.'s with continuous d.f. F, and let Y1, . . . , Yn be the order statistics. For p, 0 < p < 1, let xp be the (unique by assumption) pth quantile. Then we have, for 1 ≤ i < j ≤ n,

P(Yi ≤ xp ≤ Yj) = Σ_{k=i}^{j−1} C(n, k) p^k q^(n−k),  where q = 1 − p

(and C(n, k) denotes the binomial coefficient).

PROOF
Define the r.v.'s Wj, j = 1, 2, . . . , n as follows:

Wj = 1 if Xj ≤ xp,  and Wj = 0 if Xj > xp.

Then W1, . . . , Wn are i.i.d. r.v.'s distributed as B(1, p), since

P(W1 = 1) = P(X1 ≤ xp) = F(xp) = p.

Therefore

(1) P(Yi ≤ xp) = P(at least i of X1, . . . , Xn ≤ xp) = Σ_{k=i}^n C(n, k) p^k q^(n−k).

Next,

P(Yi ≤ xp) = P(Yi ≤ xp, Yj ≥ xp) + P(Yi ≤ xp, Yj < xp) = P(Yi ≤ xp ≤ Yj) + P(Yj < xp),

since (Yj < xp) ⊆ (Yi ≤ xp). Therefore

P(Yi ≤ xp ≤ Yj) = P(Yi ≤ xp) − P(Yj < xp).

By the continuity of the distribution of Yj,

P(Yj < xp) = P(Yj ≤ xp) = Σ_{k=j}^n C(n, k) p^k q^(n−k).

By means of (1), this gives

P(Yi ≤ xp ≤ Yj) = Σ_{k=i}^n C(n, k) p^k q^(n−k) − Σ_{k=j}^n C(n, k) p^k q^(n−k) = Σ_{k=i}^{j−1} C(n, k) p^k q^(n−k). ▲

Exercise

10.2.1 Let Xj, j = 1, . . . , n be i.i.d. r.v.'s with continuous d.f. F. Use Theorem 3 and the relevant part of Example 3 in order to determine the distribution of the r.v. F(Y1), and find its expectation.

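The coverage probability of Theorem 4 is a finite binomial sum, so it is easy to compute exactly and to confirm by simulation — here with F the U(0, 1) d.f., whose pth quantile is simply xp = p. A Python sketch, not part of the text (n, i, j, p, and the seed are arbitrary illustrative choices):

```python
import math, random

random.seed(6)

def coverage(n, i, j, p):
    # Theorem 4: P(Y_i <= x_p <= Y_j) = sum_{k=i}^{j-1} C(n, k) p^k q^{n-k}.
    q = 1.0 - p
    return sum(math.comb(n, k) * p ** k * q ** (n - k) for k in range(i, j))

n, i, j, p = 10, 3, 8, 0.5
exact = coverage(n, i, j, p)

# Simulation check: for U(0, 1) samples, x_p = p.
reps = 40000
hits = 0
for _ in range(reps):
    y = sorted(random.random() for _ in range(n))
    if y[i - 1] <= p <= y[j - 1]:
        hits += 1
print(round(exact, 3), round(hits / reps, 3))
```

Sums of this kind are what make order statistics useful for distribution-free confidence intervals for quantiles, the application alluded to in the text ("a theorem to be used in a subsequent chapter").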
Chapter 11

Sufficiency and Related Theorems

Let X be an r.v. with p.d.f. f(·; θ) of known functional form but depending upon an unknown r-dimensional constant vector θ = (θ1, . . . , θr)′, r ≥ 1, which is called a parameter. We let Ω stand for the set of all possible values of θ and call it the parameter space. So Ω ⊆ ℝ^r. By F we denote the family of all p.d.f.'s we get by letting θ vary over Ω, that is, F = {f(·; θ); θ ∈ Ω}.

Let X1, . . . , Xn be a random sample of size n from f(·; θ), that is, n independent r.v.'s distributed as X above. One of the basic problems of statistics is that of making inferences about the parameter θ (such as estimating θ, testing hypotheses about θ, etc.) on the basis of the observed values x1, . . . , xn, the data. In doing so, the concept of sufficiency plays a fundamental role in allowing us to often substantially condense the data without ever losing any information carried by them about the parameter θ. In most of the textbooks, the concept of sufficiency is treated exclusively in conjunction with estimation and testing hypotheses problems. We propose, however, to treat it in a separate chapter and gather together here all relevant results which will be needed in the sequel. In the same chapter, we also introduce and discuss other concepts such as: completeness, unbiasedness and minimum variance unbiasedness.

For j = 1, . . . , m, let Tj be (measurable) functions defined on ℝ^n into ℝ and not depending on θ or any other unknown quantities, and set T = (T1, . . . , Tm)′. Then

T(X1, . . . , Xn) = (T1(X1, . . . , Xn), . . . , Tm(X1, . . . , Xn))′

is called an m-dimensional statistic. By slightly abusing the notation, we shall often write T instead of T(X1, . . . , Xn). Likewise, we shall write θ rather than θ (in boldface) when r = 1, and T rather than T when m = 1.

The basic notation and terminology introduced so far is enough to allow us to proceed with the main part of the present chapter.

11.1 Sufficiency: Definition and Some Basic Results

Let us consider first some illustrative examples of families of p.d.f.'s.

EXAMPLE 1 Let X = (X1, . . . , Xr)′ have the Multinomial distribution. Then by setting θj = pj, j = 1, . . . , r, we have θ = (θ1, . . . , θr)′,

Ω = {(θ1, . . . , θr)′ ∈ ℝ^r; θj > 0, j = 1, . . . , r, and ∑_{j=1}^r θj = 1},

and

f(x; θ) = [n!/(x1! · · · xr!)] θ1^{x1} · · · θr^{xr} I_A(x),

where

A = {x = (x1, . . . , xr)′ ∈ ℝ^r; xj ≥ 0, j = 1, . . . , r, ∑_{j=1}^r xj = n}.

For example, for r = 2, the distribution of X = (X1, X2)′ is completely determined by that of X1 = X, which is distributed as B(n, θ1) = B(n, θ); whereas for r = 3, Ω is that part of the plane through the points (1, 0, 0), (0, 1, 0) and (0, 0, 1) which lies in the first octant.

EXAMPLE 2 Let X be U(α, β). By setting θ1 = α, θ2 = β, we have θ = (θ1, θ2)′, Ω = {(θ1, θ2)′ ∈ ℝ²; θ1 < θ2} (that is, the part of the plane above the main diagonal) and

f(x; θ) = [1/(θ2 − θ1)] I_{[θ1, θ2]}(x).

If α is known and we put β = θ, then Ω = (α, ∞) and

f(x; θ) = [1/(θ − α)] I_{[α, θ]}(x).

Similarly if β is known and α = θ.

EXAMPLE 3 Let X be N(μ, σ²). Then by setting θ1 = μ, θ2 = σ², we have θ = (θ1, θ2)′, Ω = {(θ1, θ2)′; θ1 ∈ ℝ, θ2 > 0} (that is, the part of the plane above the horizontal axis) and

f(x; θ) = [1/√(2πθ2)] exp[−(x − θ1)²/(2θ2)].

If σ² is known and we set μ = θ, then Ω = ℝ and

f(x; θ) = [1/(√(2π)σ)] exp[−(x − θ)²/(2σ²)];

similarly, if μ is known and σ² = θ, then Ω = (0, ∞) and

f(x; θ) = [1/√(2πθ)] exp[−(x − μ)²/(2θ)].

EXAMPLE 4 Let X = (X1, X2)′ have the Bivariate Normal distribution. Setting θ1 = μ1, θ2 = μ2, θ3 = σ1², θ4 = σ2², θ5 = ρ, we have then θ = (θ1, . . . , θ5)′,

Ω = {(θ1, . . . , θ5)′; θ1, θ2 ∈ ℝ, θ3, θ4 ∈ (0, ∞), θ5 ∈ (−1, 1)},

and

f(x; θ) = [1/(2πσ1σ2√(1 − ρ²))] e^{−q/2},

where

q = [1/(1 − ρ²)]{[(x1 − μ1)/σ1]² − 2ρ[(x1 − μ1)/σ1][(x2 − μ2)/σ2] + [(x2 − μ2)/σ2]²}, x = (x1, x2)′.

Before the formal definition of sufficiency is given, an example will be presented to illustrate the underlying motivation.

EXAMPLE 5 Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, θ); that is,

f(xj; θ) = θ^{xj}(1 − θ)^{1−xj} I_A(xj), A = {0, 1}, j = 1, . . . , n, θ ∈ Ω = (0, 1).

As usual, we label as a success the outcome 1. Set T = ∑_{j=1}^n Xj. Then T is B(n, θ), so that

fT(t; θ) = (n choose t) θ^t (1 − θ)^{n−t} I_B(t), B = {0, 1, . . . , n}.

We suppose that the Binomial experiment in question is performed and that the observed values of Xj are xj, j = 1, . . . , n. Then the problem is to make some kind of inference about θ on the basis of the xj's, and the following question arises: Can we say more about θ if we know how many successes occurred and where, rather than merely how many successes occurred? The answer to this question will be provided by the following argument. Given that the number of successes is t (that is, given T = t), find the probability of

each one of the (n choose t) different ways in which the t successes can occur. For x1, . . . , xn such that xj = 0 or 1, j = 1, . . . , n, and ∑_{j=1}^n xj = t, we have

Pθ(X1 = x1, . . . , Xn = xn | T = t) = Pθ(X1 = x1, . . . , Xn = xn, T = t)/Pθ(T = t) = Pθ(X1 = x1, . . . , Xn = xn)/Pθ(T = t)

if x1 + · · · + xn = t and zero otherwise, and this is equal to

[θ^{x1}(1 − θ)^{1−x1} · · · θ^{xn}(1 − θ)^{1−xn}] / [(n choose t) θ^t (1 − θ)^{n−t}] = [θ^t (1 − θ)^{n−t}] / [(n choose t) θ^t (1 − θ)^{n−t}] = 1/(n choose t)

if x1 + · · · + xn = t and zero otherwise. Thus

Pθ(X1 = x1, . . . , Xn = xn | T = t) = 1/(n choose t),

independent of θ, so that, given the total number of successes t, all possible arrangements of the t successes have the same probability of occurrence. Therefore the positions where the t successes occurred are entirely irrelevant, and the total number of successes t alone provides all possible information about θ. If, on the other hand, there were values of θ for which particular arrangements of the t successes could occur with higher probability than others, then we would say that knowledge of the positions where the t successes occurred is more informative about θ than simply knowledge of the total number of successes t. The computation above shows that this is not the case here. This example motivates the following definition of a sufficient statistic.

DEFINITION 1 Let Xj, j = 1, . . . , n, be i.i.d. r.v.'s with p.d.f. f(·; θ), θ = (θ1, . . . , θr)′ ∈ Ω ⊆ ℝ^r, and let T = (T1, . . . , Tm)′, where

Tj = Tj(X1, . . . , Xn), j = 1, . . . , m,

are statistics. We say that T is an m-dimensional sufficient statistic for the family F = {f(·; θ); θ ∈ Ω}, or for the parameter θ, if the conditional distribution of (X1, . . . , Xn)′, given T = t, is independent of θ for almost all (a.a.) t, that is, except perhaps for a set N in ℝ^m of values of t such that Pθ(T ∈ N) = 0 for all θ ∈ Ω.

REMARK 1 Thus, T being a sufficient statistic for θ implies that, for every (measurable) set A in ℝ^n, Pθ[(X1, . . . , Xn)′ ∈ A | T = t] is independent of θ for a.a. t. Here Pθ denotes the probability function associated with the p.d.f. f(·; θ).
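The computation in Example 5 can be verified by direct enumeration: whatever the value of θ, all arrangements of t successes are conditionally equally likely given T = t. A small Python sketch (the function names are ours, not the text's):

```python
from itertools import product

def conditional_probs(n: int, t: int, theta: float):
    """P(X1=x1, ..., Xn=xn | T=t) for every 0/1 sequence with sum t."""
    joint = {x: theta ** t * (1 - theta) ** (n - t)
             for x in product((0, 1), repeat=n) if sum(x) == t}
    p_t = sum(joint.values())            # = P(T = t), summed over arrangements
    return {x: p / p_t for x, p in joint.items()}

# each arrangement has conditional probability 1/(n choose t) = 1/6,
# no matter which theta generated the data
for theta in (0.2, 0.5, 0.9):
    vals = conditional_probs(4, 2, theta).values()
    assert all(abs(v - 1 / 6) < 1e-12 for v in vals)
```

The θ-dependence cancels between numerator and denominator, which is precisely the sufficiency of T.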

We finally remark that X = (X1, . . . , Xn)′ is always a sufficient statistic for θ. Actually, more is true: if T is sufficient for θ and T* = (T*1, . . . , T*k)′ is any k-dimensional statistic, then the conditional distribution of T*, given T = t, is independent of θ for a.a. t. To see this, let B be any (measurable) set in ℝ^k and let A = T*⁻¹(B). Then

Pθ(T* ∈ B | T = t) = Pθ[(X1, . . . , Xn)′ ∈ A | T = t],

and this is independent of θ for a.a. t.

Definition 1 above does not seem appropriate for identifying a sufficient statistic. This can be done quite easily by means of the following theorem.

THEOREM 1 (Fisher–Neyman factorization theorem) Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ = (θ1, . . . , θr)′ ∈ Ω ⊆ ℝ^r. An m-dimensional statistic

T(X1, . . . , Xn) = (T1(X1, . . . , Xn), . . . , Tm(X1, . . . , Xn))′

is sufficient for θ if and only if the joint p.d.f. of X1, . . . , Xn factors as follows:

f(x1, . . . , xn; θ) = g[T(x1, . . . , xn); θ] h(x1, . . . , xn),

where g depends on x1, . . . , xn only through T, and h is (entirely) independent of θ.

PROOF The proof is given separately for the discrete and the continuous case.

Discrete case: In the course of this proof, we are going to use the notation T(x1, . . . , xn) = t. It should be pointed out at the outset that, in doing so, we restrict attention only to those x1, . . . , xn for which T(x1, . . . , xn) = t; clearly, it suffices to restrict attention to those t's for which Pθ(T = t) > 0.

Assume first that the factorization holds. Then

Pθ(T = t) = Pθ[T(X1, . . . , Xn) = t] = ∑ Pθ(X1 = x1′, . . . , Xn = xn′),

where the summation extends over all (x1′, . . . , xn′)′ for which T(x1′, . . . , xn′) = t. Thus

Pθ(T = t) = ∑ f(x1′, . . . , xn′; θ) = ∑ g[T(x1′, . . . , xn′); θ] h(x1′, . . . , xn′) = g(t; θ) ∑ h(x1′, . . . , xn′).

Hence

Pθ(X1 = x1, . . . , Xn = xn | T = t) = Pθ(X1 = x1, . . . , Xn = xn, T = t)/Pθ(T = t) = g[T(x1, . . . , xn); θ] h(x1, . . . , xn) / [g(t; θ) ∑ h(x1′, . . . , xn′)] = h(x1, . . . , xn) / ∑ h(x1′, . . . , xn′),

and this is independent of θ; that is, T is sufficient for θ.

Now let T be sufficient for θ. Then Pθ(X1 = x1, . . . , Xn = xn | T = t) is independent of θ; call it k[x1, . . . , xn; T(x1, . . . , xn)]. Then

Pθ(X1 = x1, . . . , Xn = xn) = Pθ(T = t) k[x1, . . . , xn; T(x1, . . . , xn)],

so that, setting

g[T(x1, . . . , xn); θ] = Pθ(T = t) and h(x1, . . . , xn) = k[x1, . . . , xn; T(x1, . . . , xn)],

we get f(x1, . . . , xn; θ) = g[T(x1, . . . , xn); θ] h(x1, . . . , xn), as was to be seen.

Continuous case: The proof in this case is carried out under some further regularity conditions (and is not as rigorous as that of the discrete case). It should be made clear, however, that the theorem is true as stated; a proof without the regularity conditions mentioned above involves deeper concepts of measure theory, the knowledge of which is not assumed here. From Remark 1, it follows that m ≤ n. Set Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, and assume that there exist other n − m statistics Tj = Tj(X1, . . . , Xn), j = m + 1, . . . , n, such that the transformation

tj = Tj(x1, . . . , xn), j = 1, . . . , n,

is invertible, so that

xj = xj(t, tm+1, . . . , tn), t = (t1, . . . , tm)′, j = 1, . . . , n.

It is also assumed that the partial derivatives of xj with respect to ti, i, j = 1, . . . , n, exist and are continuous, and that the respective Jacobian J (which is independent of θ) is different from 0.

Let first

f(x1, . . . , xn; θ) = g[T(x1, . . . , xn); θ] h(x1, . . . , xn).

Then

f_{T, Tm+1, . . . , Tn}(t, tm+1, . . . , tn; θ) = g(t; θ) h[x1(t, tm+1, . . . , tn), . . . , xn(t, tm+1, . . . , tn)] |J| = g(t; θ) h*(t, tm+1, . . . , tn),

where we set

h*(t, tm+1, . . . , tn) = h[x1(t, tm+1, . . . , tn), . . . , xn(t, tm+1, . . . , tn)] |J|.

If we also set

fT(t; θ) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g(t; θ) h*(t, tm+1, . . . , tn) dtm+1 · · · dtn,

then fT(t; θ) = g(t; θ) h**(t), where

h**(t) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} h*(t, tm+1, . . . , tn) dtm+1 · · · dtn.

Hence

f(tm+1, . . . , tn | t; θ) = f_{T, Tm+1, . . . , Tn}(t, tm+1, . . . , tn; θ)/fT(t; θ) = g(t; θ) h*(t, tm+1, . . . , tn)/[g(t; θ) h**(t)] = h*(t, tm+1, . . . , tn)/h**(t),

which is independent of θ. That is, the conditional distribution of Tm+1, . . . , Tn, given T = t, is independent of θ. Since, for given t, there is a one-to-one correspondence between Tm+1, . . . , Tn and X1, . . . , Xn, it follows that the conditional distribution of X1, . . . , Xn, given T = t, is independent of θ; that is, T is sufficient for θ.

Let now T be sufficient for θ. Then, by the remark following Definition 1, the conditional distribution of Tm+1, . . . , Tn, given T = t, is independent of θ, so that we may set

f(tm+1, . . . , tn | t; θ) = f(tm+1, . . . , tn | t).

Since

f_{T, Tm+1, . . . , Tn}(t, tm+1, . . . , tn; θ) = fT(t; θ) f(tm+1, . . . , tn | t),

transforming back to x1, . . . , xn by means of the inverse transformation (whose Jacobian is J⁻¹), and setting g = fT and h(x1, . . . , xn) = f(tm+1, . . . , tn | t) |J⁻¹| with tj = Tj(x1, . . . , xn), j = m + 1, . . . , n, we get

f(x1, . . . , xn; θ) = g[T(x1, . . . , xn); θ] h(x1, . . . , xn),

as was to be seen. ▲

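Theorem 1 can be checked concretely. For the Bernoulli sample of Example 5 the factorization is explicit: f(x1, . . . , xn; θ) = g[T(x); θ] h(x) with g(t; θ) = θ^t(1 − θ)^{n−t} and h ≡ 1. A quick numeric check (a sketch; the helper names are ours):

```python
def joint_pmf(xs, theta):
    """Joint p.m.f. of an i.i.d. B(1, theta) sample."""
    p = 1.0
    for x in xs:
        p *= theta ** x * (1 - theta) ** (1 - x)
    return p

def g(t, n, theta):          # depends on the data only through t = sum(xs)
    return theta ** t * (1 - theta) ** (n - t)

h = 1.0                      # entirely free of theta

xs = (1, 0, 1, 1, 0)
for theta in (0.1, 0.4, 0.7):
    assert abs(joint_pmf(xs, theta) - g(sum(xs), len(xs), theta) * h) < 1e-15

# consequently, any two samples with the same t give the same likelihood
ys = (0, 1, 1, 0, 1)
assert all(abs(joint_pmf(xs, th) - joint_pmf(ys, th)) < 1e-15 for th in (0.2, 0.8))
```

The second assertion is the practical content of sufficiency: once T is known, the data carry no further information about θ.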
COROLLARY Let φ: ℝ^m → ℝ^m be one-to-one (measurable and independent of θ), so that the inverse φ⁻¹ exists. Then, if T is sufficient for θ, T̃ = φ(T) is also sufficient for θ; and T is sufficient for θ̃ = ψ(θ), where ψ: ℝ^r → ℝ^r is one-to-one (and measurable).

PROOF We have T = φ⁻¹[φ(T)] = φ⁻¹(T̃). Thus

f(x1, . . . , xn; θ) = g[T(x1, . . . , xn); θ] h(x1, . . . , xn) = g{φ⁻¹[T̃(x1, . . . , xn)]; θ} h(x1, . . . , xn) = g̃[T̃(x1, . . . , xn); θ] h(x1, . . . , xn),

where we set g̃(·; θ) = g[φ⁻¹(·); θ]; by Theorem 1, this shows that T̃ is sufficient for θ. Likewise, since θ = ψ⁻¹[ψ(θ)] = ψ⁻¹(θ̃), the joint p.d.f. f(x1, . . . , xn; θ) becomes

f̃(x1, . . . , xn; θ̃) = g̃[T(x1, . . . , xn); θ̃] h(x1, . . . , xn), where g̃[T(x1, . . . , xn); θ̃] = g[T(x1, . . . , xn); ψ⁻¹(θ̃)],

which shows that T is sufficient for the new parameter θ̃. ▲

We now give a number of examples of determining sufficient statistics by way of Theorem 1 in some interesting cases.

EXAMPLE 6 Refer to Example 1, where

f(x; θ) = [n!/(x1! · · · xr!)] θ1^{x1} · · · θr^{xr} I_A(x).

By Theorem 1, it follows that the statistic (X1, . . . , Xr)′ is sufficient for θ = (θ1, . . . , θr)′. Then, by the fact that ∑_{j=1}^r θj = 1 and ∑_{j=1}^r xj = n, we also have

f(x; θ) = {n!/[x1! · · · x_{r−1}! (n − x1 − · · · − x_{r−1})!]} θ1^{x1} · · · θ_{r−1}^{x_{r−1}} (1 − θ1 − · · · − θ_{r−1})^{n − ∑_{j=1}^{r−1} xj} I_A(x),

from which it follows that the statistic (X1, . . . , Xr−1)′ is sufficient for (θ1, . . . , θr−1)′.

EXAMPLE 7 Let X1, . . . , Xn be i.i.d. r.v.'s from U(θ1, θ2). Then, by setting x = (x1, . . . , xn)′ and θ = (θ1, θ2)′, we get

f(x; θ) = [1/(θ2 − θ1)^n] I_{[θ1, ∞)}(x(1)) I_{(−∞, θ2]}(x(n)) = g1[x(1); θ] g2[x(n); θ],

where

g1[x(1); θ] = [1/(θ2 − θ1)^n] I_{[θ1, ∞)}(x(1)) and g2[x(n); θ] = I_{(−∞, θ2]}(x(n)).

It follows that (X(1), X(n))′ is sufficient for θ. In particular, if θ1 = α is known and θ2 = θ, then X(n) is sufficient for θ; similarly, if θ2 = β is known and θ1 = θ, then X(1) is sufficient for θ.

EXAMPLE 8 Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²), and set μ = θ1, σ² = θ2, θ = (θ1, θ2)′. Then, by setting x = (x1, . . . , xn)′, we have

f(x; θ) = [1/√(2πθ2)]^n exp[−(1/(2θ2)) ∑_{j=1}^n (xj − θ1)²].

But

∑_{j=1}^n (xj − θ1)² = ∑_{j=1}^n [(xj − x̄) + (x̄ − θ1)]² = ∑_{j=1}^n (xj − x̄)² + n(x̄ − θ1)²,

so that

f(x; θ) = [1/√(2πθ2)]^n exp[−(1/(2θ2)) ∑_{j=1}^n (xj − x̄)² − (n/(2θ2))(x̄ − θ1)²].

It follows that (X̄, ∑_{j=1}^n (Xj − X̄)²)′ is sufficient for θ. Since also

f(x; θ) = [1/√(2πθ2)]^n exp(−nθ1²/(2θ2)) exp[(θ1/θ2) ∑_{j=1}^n xj − (1/(2θ2)) ∑_{j=1}^n xj²],

it follows, by the corollary to Theorem 1, that (X̄, S²)′ is also sufficient for θ, where

S² = (1/n) ∑_{j=1}^n (Xj − X̄)².

In particular, if θ2 = σ² is known and θ1 = θ, then ∑_{j=1}^n Xj or X̄ is sufficient for θ; whereas if θ1 = μ is known and θ2 = θ, then ∑_{j=1}^n (Xj − μ)², or equivalently (1/n) ∑_{j=1}^n (Xj − μ)², is sufficient for θ, as follows from the form of f(x; θ) at the beginning of this example.
. notion of sufﬁciency is connected with a family of p.d. suppose that X1.i. suppose that m = r and that the conditional distribution of (X1. .d. . . . where REMARK 3 S2 = 2 1 n Xj − X . . j = 1. θ2).d. As an example.268 11 Sufﬁciency and Related Theorems parameter. In a situation like this. one may be tempted to declare that Tj is sufﬁcient forθj. if all other θi. Or to put it differently. . given ∑n j = 1Xj = yn.d. S2)′ is sufﬁcient for (θ1. The Ω}. m. . . Xn)′. However. θ ∈Ω and we may talk about Tj being sufﬁcient for θj. . given Tj = tj. . .d. . r. ∑ n j =1 ( ) Now consider the conditional p. . . it can be shown that no sufﬁcient statistic of smaller dimensionality other than the (sufﬁcient) statistic (X1. Xn). 2πnθ 2 ⎣ 2 nθ 2 ⎦ 1 ( ) This quotient is equal to ( 2πnθ 2 2πθ 2 ) n ⎧ 1 ⎡ exp ⎨ ⎢ yn − nθ1 ⎩ 2 nθ 2 ⎣ ( ) 2 − n y1 − θ1 ( ) 2 − ⋅ ⋅ ⋅ − n yn −1 − θ1 ( ) 2 2 ⎫ − n yn − y1 − ⋅ ⋅ ⋅ − yn −1 − θ1 ⎤ ⎬ ⎥ ⎦⎭ ( ) . j = 1. . Xn are i. Xn are i. In Deﬁnition 1.’s: ⎛ 1 ⎞ ⎧ 1 ⎡ exp ⎨− ⎜ ⎢ y1 − θ1 ⎜ 2πθ ⎟ ⎟ ⎩ 2θ 2 ⎣ ⎝ 2 ⎠ n ( ) 2 + ⋅ ⋅ ⋅ + yn −1 − θ1 ( ) 2 and 2 ⎫ + yn − y1 − ⋅ ⋅ ⋅ − yn −1 − θ1 ⎤ ⎬ ⎥ ⎦⎭ ( ) 2⎤ ⎡ 1 exp⎢− yn − nθ1 ⎥.v. Tm)′. If m is the smallest number for which T = (T1. σ 2)′. Xn−1)′. .f. the number of the real-valued statistics which are jointly sufﬁcient for the parameter θ coincides with the number of independent coordinates of θ.’s from the Cauchy distribution with parameter θ = (μ. . By using the transformation y j = x j . of (X1.’s from N(θ1. . otherwise Tj is to be either sufﬁcient for the above family F or not sufﬁcient at all.v. . .d. . ⋅ ⋅ ⋅ .f. . then T is called a minimal sufﬁcient statistic for θ. j =1 n one sees that the above mentioned conditional p. this need not always be the case. . however. is given by the quotient of the following p. This outlook. .f. is independent of θj. . . . . yn = ∑ x j . θ2)′. . if X1. . θr)′. θ). is a sufﬁcient statistic for θ = (θ1. are known. . 
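The algebraic identity in Example 8, ∑(xj − θ1)² = ∑(xj − x̄)² + n(x̄ − θ1)², is what lets the likelihood be written through (x̄, ∑(xj − x̄)²) alone; it is easily confirmed numerically. A sketch (function names are ours):

```python
def decompose(xs, theta1):
    """Check sum (x_j - theta1)^2 = sum (x_j - xbar)^2 + n (xbar - theta1)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)        # sum of squares about the mean
    lhs = sum((x - theta1) ** 2 for x in xs)     # sum of squares about theta1
    rhs = ss + n * (xbar - theta1) ** 2
    return lhs, rhs, (xbar, ss)

xs = [1.2, -0.7, 3.1, 0.4, 2.2]
for theta1 in (-1.0, 0.0, 2.5):
    lhs, rhs, stat = decompose(xs, theta1)
    assert abs(lhs - rhs) < 1e-9

print("sufficient statistic (xbar, ss):", decompose(xs, 0.0)[2])
```

Since θ1 enters the identity only through x̄ (and θ2 only multiplies the whole exponent), the pair (x̄, ∑(xj − x̄)²) absorbs all θ-dependence of the likelihood.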
REMARK 2 In the examples just discussed, it so happens that the dimensionality of the sufficient statistic is the same as the dimensionality of the parameter; that is, the number of the real-valued statistics which are jointly sufficient for the parameter θ coincides with the number of independent coordinates of θ. However, this need not always be the case. For example, if X1, . . . , Xn are i.i.d. r.v.'s from the Cauchy distribution with parameter θ = (μ, σ²)′, it can be shown that no sufficient statistic of smaller dimensionality other than the (sufficient) statistic (X1, . . . , Xn)′ exists.

If m is the smallest number for which T = (T1, . . . , Tm)′, Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, is a sufficient statistic for θ = (θ1, . . . , θr)′, then T is called a minimal sufficient statistic for θ.

REMARK 3 In Definition 1, suppose that m = r and that the conditional distribution of (X1, . . . , Xn)′, given Tj = tj, is independent of θj. In a situation like this, one may be tempted to declare that Tj is sufficient for θj. This outlook, however, is not in conformity with the definition of a sufficient statistic. The notion of sufficiency is connected with a family of p.d.f.'s F = {f(·; θ); θ ∈ Ω}, and we may talk about Tj being sufficient for θj only if all other θi, i ≠ j, are known; otherwise Tj is to be either sufficient for the above family F or not sufficient at all.

As an example, suppose that X1, . . . , Xn are i.i.d. r.v.'s from N(θ1, θ2). Then (X̄, S²)′ is sufficient for (θ1, θ2)′, where S² = (1/n) ∑_{j=1}^n (Xj − X̄)². Now consider the conditional p.d.f. of (X1, . . . , Xn−1)′, given ∑_{j=1}^n Xj = yn. By using the transformation

yj = xj, j = 1, . . . , n − 1, yn = ∑_{j=1}^n xj,

one sees that the above mentioned conditional p.d.f. is given by the quotient of the following p.d.f.'s:

[1/√(2πθ2)]^n exp{−(1/(2θ2))[(y1 − θ1)² + · · · + (yn−1 − θ1)² + (yn − y1 − · · · − yn−1 − θ1)²]}

and

[1/√(2πnθ2)] exp[−(1/(2nθ2))(yn − nθ1)²].

This quotient is equal to

[√(2πnθ2)/(√(2πθ2))^n] exp{(1/(2nθ2))[(yn − nθ1)² − n(y1 − θ1)² − · · · − n(yn−1 − θ1)² − n(yn − y1 − · · · − yn−1 − θ1)²]},

and

(yn − nθ1)² − n(y1 − θ1)² − · · · − n(yn−1 − θ1)² − n(yn − y1 − · · · − yn−1 − θ1)² = yn² − n[y1² + · · · + yn−1² + (yn − y1 − · · · − yn−1)²],

which is independent of θ1. Thus the conditional p.d.f. under consideration is independent of θ1, but it does depend on θ2. The concept of X̄ being sufficient for θ1 is not valid unless θ2 is known.

Exercises

11.1.1 In each one of the following cases, write out the p.d.f. of the r.v. X and specify the parameter space Ω of the parameter involved:
i) X is distributed as Poisson;
ii) X is distributed as Negative Binomial;
iii) X is distributed as Gamma;
iv) X is distributed as Beta.

11.1.2 Let X1, . . . , Xn be i.i.d. r.v.'s distributed as stated below. Then use Theorem 1 and its corollary in order to show that:
i) ∑_{j=1}^n Xj or X̄ is a sufficient statistic for θ, if the X's are distributed as Poisson;
ii) ∑_{j=1}^n Xj or X̄ is a sufficient statistic for θ, if the X's are distributed as Negative Binomial;
iii) (∏_{j=1}^n Xj, ∑_{j=1}^n Xj)′ or (∏_{j=1}^n Xj, X̄)′ is a sufficient statistic for (θ1, θ2)′ = (α, β)′, if the X's are distributed as Gamma. In particular, ∏_{j=1}^n Xj or −∑_{j=1}^n log Xj is a sufficient statistic for α = θ if β is known, and ∑_{j=1}^n Xj or X̄ is a sufficient statistic for β = θ if α is known. In the latter case, take α = 1 and conclude that ∑_{j=1}^n Xj or X̄ is a sufficient statistic for the parameter θ̃ = 1/θ of the Negative Exponential distribution;
iv) (∏_{j=1}^n Xj, ∏_{j=1}^n (1 − Xj))′ is a sufficient statistic for (θ1, θ2)′ = (α, β)′, if the X's are distributed as Beta. In particular, ∏_{j=1}^n Xj is a sufficient statistic for α = θ if β is known, and ∏_{j=1}^n (1 − Xj) is a sufficient statistic for β = θ if α is known.

11.1.3 (Truncated Poisson r.v.'s) Let X1, X2 be i.i.d. r.v.'s with p.d.f. f(·; θ) given by:

f(0; θ) = e^{−θ}, f(1; θ) = θe^{−θ}, f(2; θ) = 1 − e^{−θ} − θe^{−θ}, f(x; θ) = 0 for x ≠ 0, 1, 2,

where θ > 0. Then show that X1 + X2 is not a sufficient statistic for θ.

11.1.4 Let X1, . . . , Xn be i.i.d. r.v.'s with the Double Exponential p.d.f. f(·; θ) given in Exercise 3.13(iii) of Chapter 3, θ ∈ ℝ. Then show that ∑_{j=1}^n |Xj| is a sufficient statistic for θ.

11.1.5 If Xj = (X1j, X2j)′, j = 1, . . . , n, is a random sample of size n from the Bivariate Normal distribution with parameter θ as described in Example 4, show that

(∑_{j=1}^n X1j, ∑_{j=1}^n X2j, ∑_{j=1}^n X1j², ∑_{j=1}^n X2j², ∑_{j=1}^n X1jX2j)′

is a sufficient statistic for θ.

11.1.6 If X1, . . . , Xn is a random sample of size n from N(θ, θ²), θ ∈ ℝ, θ ≠ 0, show that

(∑_{j=1}^n Xj, ∑_{j=1}^n Xj²)′ or (X̄, ∑_{j=1}^n Xj²)′

is a sufficient statistic for θ.

11.1.7 If X1, . . . , Xn is a random sample of size n with p.d.f.

f(x; θ) = e^{−(x−θ)} I_{(θ, ∞)}(x), θ ∈ ℝ,

show that X(1) is a sufficient statistic for θ.

11.1.8 If X1, . . . , Xn is a random sample of size n with p.d.f. f(·; θ) given below, find a sufficient statistic for θ in each case:
i) f(x; θ) = θx^{θ−1} I_{(0, 1)}(x), θ ∈ (0, ∞);
ii) f(x; θ) = (2/θ²)(θ − x) I_{(0, θ)}(x), θ ∈ (0, ∞);
iii) f(x; θ) = [1/(6θ⁴)] x³ e^{−x/θ} I_{(0, ∞)}(x), θ ∈ (0, ∞);
iv) f(x; θ) = (θ/c)(c/x)^{θ+1} I_{(c, ∞)}(x), θ ∈ (0, ∞), c > 0 known.

11.1.9 Let X1, . . . , Xn be a random sample of size n from the Bernoulli distribution, and set T1 for the number of X's which are equal to 0 and T2 for the number of X's which are equal to 1. Then show that T = (T1, T2)′ is a sufficient statistic for θ.

11.1.10 If X1, . . . , Xn is a random sample of size n from U(−θ, θ), θ ∈ (0, ∞), show that (X(1), X(n))′ is a sufficient statistic for θ. Furthermore, show that this statistic is not minimal by establishing that T = max(|X1|, . . . , |Xn|) is also a sufficient statistic for θ.
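For the U(−θ, θ) model in the last exercise above, the factorization criterion makes the sufficiency of T = max(|X1|, . . . , |Xn|) transparent: the likelihood is (1/(2θ))^n I(T ≤ θ), a function of the data through T alone. A numeric sketch (helper names are ours):

```python
def likelihood(xs, theta):
    """Joint p.d.f. of an i.i.d. U(-theta, theta) sample at the data xs."""
    if max(abs(x) for x in xs) > theta:
        return 0.0
    return (1.0 / (2.0 * theta)) ** len(xs)

def g(t, n, theta):                     # depends on the data only through t
    return (1.0 / (2.0 * theta)) ** n if t <= theta else 0.0

xs = [0.3, -1.7, 0.9, 1.1]
t = max(abs(x) for x in xs)             # T = max |x_j| = 1.7
for theta in (1.0, 1.8, 5.0):
    assert abs(likelihood(xs, theta) - g(t, len(xs), theta)) < 1e-18
```

Here h ≡ 1, so the whole likelihood is g[T(x); θ]; note that θ enters only through the comparison T ≤ θ.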

11.2 Completeness

In this section, we introduce the (technical) concept of completeness, which we also illustrate by a number of examples; its usefulness will become apparent in the subsequent sections. To this end, let X be a k-dimensional random vector with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r, and let g: ℝ^k → ℝ be a (measurable) function, so that g(X) is an r.v. We assume that Eθ g(X) exists for all θ ∈ Ω and set F = {f(·; θ); θ ∈ Ω}.

DEFINITION 2 With the above notation, we say that the family F (or the random vector X) is complete if, for every g as above, Eθ g(X) = 0 for all θ ∈ Ω implies that g(x) = 0 except possibly on a set N of x's such that Pθ(X ∈ N) = 0 for all θ ∈ Ω.

The examples which follow illustrate the concept of completeness. Meanwhile, let us recall that if ∑_{n=0}^∞ cn x^n = 0 for all values of x in an interval for which the series converges, then cn = 0 for all n; and if ∑_{j=0}^n c_{n−j} x^{n−j} = 0 for more than n values of x, then cj = 0, j = 0, 1, . . . , n.

EXAMPLE 9 Let

F = {f(·; θ); f(x; θ) = (n choose x) θ^x (1 − θ)^{n−x} I_A(x), θ ∈ (0, 1)},

where A = {0, 1, . . . , n}. Then F is complete. In fact,

Eθ g(X) = ∑_{x=0}^n g(x)(n choose x) θ^x (1 − θ)^{n−x} = (1 − θ)^n ∑_{x=0}^n g(x)(n choose x) ρ^x,

where ρ = θ/(1 − θ). Thus Eθ g(X) = 0 for all θ ∈ (0, 1) is equivalent to

∑_{x=0}^n g(x)(n choose x) ρ^x = 0

for every ρ ∈ (0, ∞), hence for more than n values of ρ, and therefore

g(x)(n choose x) = 0, x = 0, 1, . . . , n,

which is equivalent to g(x) = 0, x = 0, 1, . . . , n.

EXAMPLE 10 Let

F = {f(·; θ); f(x; θ) = e^{−θ} (θ^x/x!) I_A(x), θ ∈ (0, ∞)},

where A = {0, 1, . . .}. Then F is complete. In fact,

Eθ g(X) = ∑_{x=0}^∞ g(x) e^{−θ} θ^x/x! = e^{−θ} ∑_{x=0}^∞ [g(x)/x!] θ^x.

Thus Eθ g(X) = 0 for all θ ∈ (0, ∞) implies g(x)/x! = 0 for x = 0, 1, . . . , and this is equivalent to g(x) = 0 for x = 0, 1, . . . .

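The finiteness in Example 9 makes the completeness argument checkable by linear algebra: Eθ g(X) = 0 at n + 1 (or more) distinct values of θ is a linear system in g(0), . . . , g(n) whose matrix is nonsingular, so g ≡ 0 is the only solution. A pure-Python sketch (names are ours):

```python
from math import comb

def moment_matrix(n, thetas):
    """Row i gives the weights of g(0..n) in E_{theta_i} g(X), X ~ B(n, theta_i)."""
    return [[comb(n, x) * th**x * (1 - th) ** (n - x) for x in range(n + 1)]
            for th in thetas]

def rank(mat, tol=1e-12):
    """Numerical rank by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] for row in mat]
    r = 0
    for col in range(len(m[0])):
        if r == len(m):
            break
        piv = max(range(r, len(m)), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) < tol:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

n = 5
thetas = [0.1, 0.2, 0.35, 0.5, 0.65, 0.8]       # n + 1 distinct values
assert rank(moment_matrix(n, thetas)) == n + 1  # only g = 0 kills all expectations
```

Full rank means the homogeneous system has only the trivial solution, which is the finite-sample face of "a polynomial of degree n with more than n roots vanishes identically."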
EXAMPLE 11 Let

F = {f(·; θ); f(x; θ) = [1/(θ − α)] I_{[α, θ]}(x), θ ∈ (α, ∞)}.

Then F is complete. In fact,

Eθ g(X) = [1/(θ − α)] ∫_α^θ g(x) dx.

Thus, if Eθ g(X) = 0 for all θ ∈ (α, ∞), then ∫_α^θ g(x) dx = 0 for all θ > α, which intuitively implies (and that can be rigorously justified) that g(x) = 0 except possibly on a set N of x's such that Pθ(X ∈ N) = 0 for all θ ∈ Ω. The same is seen to be true if f(·; θ) is U(θ, β) with β known.

EXAMPLE 12 Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²). If σ is known and μ = θ, it can be shown that

F = {f(·; θ); f(x; θ) = [1/(√(2π)σ)] exp[−(x − θ)²/(2σ²)], θ ∈ ℝ}

is complete. If μ is known and σ² = θ, then

F = {f(·; θ); f(x; θ) = [1/√(2πθ)] exp[−(x − μ)²/(2θ)], θ ∈ (0, ∞)}

is not complete. In fact, let g(x) = x − μ. Then Eθ g(X) = Eθ(X − μ) = 0 for all θ ∈ (0, ∞), while g(x) = 0 only for x = μ. Finally, if both μ and σ² are unknown, it can be shown that (X̄, S²)′ is complete.

In the following, we establish two theorems which are useful in certain situations.

THEOREM 2 Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ^r, and let T = (T1, . . . , Tm)′, Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, be a sufficient statistic for θ. Let g(·; θ) be the p.d.f. of T and assume that the set S of positivity of g(·; θ) is the same for all θ ∈ Ω. Let V = (V1, . . . , Vk)′, Vj = Vj(X1, . . . , Xn), j = 1, . . . , k, be any other statistic which is assumed to be (stochastically) independent of T. Then the distribution of V does not depend on θ.

PROOF For t ∈ S we have g(t; θ) > 0 for all θ ∈ Ω, so that f(v|t) is well defined and, by sufficiency, is also independent of θ. Hence

f_{V,T}(v, t; θ) = f(v|t) g(t; θ) for all v and t ∈ S,

while, by independence,

f_{V,T}(v, t; θ) = fV(v; θ) g(t; θ) for all v and t. Therefore
f. Ω} j = 1. Eθφ(T. r. v) = 0 for all t ∈ Nc by completeness (N is independent of v by the deﬁnition of completeness).d. that is. θ = f v t g t. θ). θ ∈Ω T = (T1.f. v ∈ ޒk.f.’s. v) = 0 for all θ ∈ Ω and hence φ(t. Then. . ▲ REMARK 4 The theorem need not be true if S depends on θ. . m. . . .v. θ) = fV(v) is independent of θ. 2. for an arbitrary but ﬁxed v. θ ∈ (0. θ) be the p. of T and assume that C = {g(·. . ⋅ ⋅ ⋅ . fV(v.’s. then show that F is complete. 20. t . θ + 2. . θ ∈Ω is complete. Xn). To this end. θ dt1 ⋅ ⋅ ⋅ dt m = fV = fV −∞ V −∞ 1 m 1 () ( )( (v) − ∫ ⋅ ⋅ ⋅ ∫ f (v.11. . the converse of Theorem 2 is true and also more interesting. t m g t1 . θ)dt ( v ) − f ( v ) = 0. θ). one has fV(v) = f(v|t). . .2 If F is the family of all U(−θ. Vj = Vj(X1. [ ( ) ( )] () ( ) ) ⋅ ⋅ ⋅ dt m that is. then show that F is not complete. ⋅ ⋅ ⋅ . Tm)′ be a sufﬁcient statistic of θ. v) = fV(v) − f(v|T) which is deﬁned for all t’s except perhaps for a set N of t’s such that Pθ (T ∈ N) = 0 for all θ ∈Ω. where Tj = Tj(X1. t .2. it follows that V and T are independent. consider the statistic φ(T. j = 1. . . t ∈Nc. . as was to be seen.d. .2. . . Then we have for the continuous case (the discrete case is treated similarly) PROOF Eθ φ T . THEOREM 3 Ω ⊆ ޒr and let (Basu) Let X1.d. θ) = f(v/t) for all v and t ∈S.f. . . t m . Two balls are drawn one by one with replacement. Xn be i. . f(·. Hence fV(v. It sufﬁces to show that for every t ∈ ޒm for which f(v|t) is deﬁned. Under certain regularity conditions. θ + 10. 11. . .i. k be any other statistic. Use this . . where θ ∈ Ω = {0. . 11. ∞). . if the distribution of V does not depend on θ. So fV(v) = f(v/t). . .d. θ ( )( ) ( )( ) for all v and t ∈ S. 10. ⋅ ⋅ ⋅ . .’s with p. Xn).2.d. Let V = (V1. . . . }. Vk)′. . .1 If F is the family of all Negative Binomial p. θ) p. . It relates sufﬁciency. completeness. and let Xj be the number on the jth ball. v = Eθ f V v − f v T = f V v − E θ f v T ∞ ∞ −∞ ∞ −∞ ∞ ( ) = fV v − ∫ ⋅ ⋅ ⋅ ∫ f v t1 . 
▲ Exercises 11.3 (Basu) Consider an urn containing 10 identical balls numbered θ + 1. Let g(·. j = 1. and stochastic independence. . θ g t. .1 Exercises Sufﬁciency: Deﬁnition and Some Basic Results 273 fV v.

11.3 Unbiasedness—Uniqueness

In this section, we shall restrict ourselves to the case that the parameter is real-valued. We shall then introduce the concept of unbiasedness and shall establish the existence and uniqueness of uniformly minimum variance unbiased statistics.

DEFINITION 3 Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and let U = U(X1, . . . , Xn) be a statistic. Then we say that U is an unbiased statistic for θ if EθU = θ for every θ ∈ Ω, where by EθU we mean that the expectation of U is calculated by using the p.d.f. f(·; θ).

We can now formulate the following important theorem.

THEOREM 4 (Rao–Blackwell) Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and let T = (T1, . . . , Tm)′, Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, be a sufficient statistic for θ. Let U = U(X1, . . . , Xn) be an unbiased statistic for θ which is not a function of T alone (with probability 1), and set φ(t) = Eθ(U|T = t). Then we have that:

i) The r.v. φ(T) is a function of the sufficient statistic T alone;
ii) φ(T) is an unbiased statistic for θ, that is, Eθφ(T) = θ for every θ ∈ Ω;
iii) σθ²[φ(T)] < σθ²(U), θ ∈ Ω, provided EθU² < ∞.

PROOF
i) That φ(T) is a function of the sufficient statistic T alone and does not depend on θ is a consequence of the sufficiency of T.
ii) That φ(T) is unbiased for θ follows from (CE1), Chapter 5, page 123.
iii) This follows from (CV), Chapter 5, page 123. ▲

The interpretation of the theorem is the following: If for some reason one is interested in finding a statistic with the smallest possible variance within the class of unbiased statistics of θ, then one may restrict oneself to the subclass of the unbiased statistics which depend on T alone (with probability 1). This is so because, if an unbiased statistic U is not already a function of T alone (with probability 1), then it becomes so by conditioning it with respect to T; the variance of the resulting statistic will be smaller than the variance of the statistic we started out with, by (iii) of the theorem. It is further clear that the variance does not decrease any further by conditioning again with respect to T, since the resulting statistic will be the same (with probability 1), by (CE2′), Chapter 5, page 123. The process of forming the conditional expectation of an unbiased statistic of θ, given T, is known as Rao–Blackwellization.

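A concrete Rao–Blackwellization (a standard companion example, not from the text; the same conditioning argument applies to estimands other than θ itself): for i.i.d. P(θ), U = 1{X1 = 0} is unbiased for e^{−θ}, and since X1 | T = t is B(t, 1/n) with T = ∑Xj, we get φ(t) = Eθ(U | T = t) = (1 − 1/n)^t. A sketch checking unbiasedness and the variance drop:

```python
from math import exp, factorial

def pois_pmf(t, lam):
    return exp(-lam) * lam**t / factorial(t)

n, theta = 5, 1.3
lam = n * theta                      # T = sum X_j ~ P(n * theta)

# phi(T) = (1 - 1/n)^T, the Rao-Blackwellization of U = 1{X1 = 0}
mean_phi = sum((1 - 1 / n) ** t * pois_pmf(t, lam) for t in range(120))
assert abs(mean_phi - exp(-theta)) < 1e-12          # still unbiased for e^{-theta}

var_U = exp(-theta) * (1 - exp(-theta))             # variance of the indicator U
second = sum((1 - 1 / n) ** (2 * t) * pois_pmf(t, lam) for t in range(120))
var_phi = second - mean_phi ** 2
print(var_phi < var_U)   # → True, as guaranteed by part (iii) of the theorem
```

The series are truncated at 120 terms, far past where the Poisson tail becomes negligible for the chosen λ.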
The concept of completeness, in conjunction with the Rao–Blackwell theorem, will now be used in the following theorem.

THEOREM 5 (Uniqueness theorem: Lehmann–Scheffé) Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and let F = {f(·; θ); θ ∈ Ω}. Let T = (T1, . . . , Tm)′, Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, be a sufficient statistic for θ, let g(·; θ) be its p.d.f., and set C = {g(·; θ); θ ∈ Ω}; assume that C is complete. Let U = U(T) be an unbiased statistic for θ and suppose that EθU² < ∞ for all θ ∈ Ω. Then U is the unique unbiased statistic for θ with the smallest variance in the class of all unbiased statistics for θ, in the sense that, if V = V(T) is another unbiased statistic for θ, then U(t) = V(t) (except perhaps on a set N of t's such that Pθ(T ∈ N) = 0 for all θ ∈ Ω).

PROOF By the Rao–Blackwell theorem, it suffices to restrict ourselves to the class of unbiased statistics of θ which are functions of T alone. By the unbiasedness of U and V, we then have EθU(T) = EθV(T) = θ, θ ∈ Ω; equivalently,

Eθ[U(T) − V(T)] = 0, or Eθφ(T) = 0, θ ∈ Ω,

where φ(T) = U(T) − V(T). Then, by completeness of C, we have φ(t) = 0 for all t ∈ ℝ^m except possibly on a set N of t's such that Pθ(T ∈ N) = 0 for all θ ∈ Ω. ▲

DEFINITION 4 An unbiased statistic for θ which is of minimum variance in the class of all unbiased statistics of θ is called a uniformly minimum variance (UMV) unbiased statistic of θ (the term "uniformly" referring to the fact that the variance is minimum for all θ ∈ Ω).

Some illustrative examples follow.

EXAMPLE 13 Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, θ). Then, by Example 5, T = ∑_{j=1}^n Xj is a sufficient statistic for θ; it is also complete, by Example 9. Now X̄ = (1/n)T is an unbiased statistic for θ and hence, by Theorem 5, it is UMV unbiased for θ.

EXAMPLE 14 Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²). Let μ be known and, without loss of generality, set μ = 0 and σ² = θ. Then, by Example 8, T = ∑_{j=1}^n Xj² is a sufficient statistic for θ. Since T is also complete (by Theorem 8 below) and S² = (1/n)T is unbiased for θ, it follows, by Theorem 5, that S² is UMV unbiased for θ.

EXAMPLE 15 Let X1, . . . , Xn be i.i.d. r.v.'s from the Negative Exponential p.d.f. with parameter λ. Setting θ = 1/λ, the p.d.f. of the X's becomes f(x; θ) = (1/θ)e^{−x/θ}, x > 0, θ ∈ (0, ∞). We have then that Eθ(Xj) = θ and σθ²(Xj) = θ². Then T = ∑_{j=1}^n Xj is a sufficient statistic for θ, and it can be shown that it is also complete. Since X̄ = (1/n)T is unbiased for θ, it follows, by Theorem 5, that it is UMV unbiased for θ.

Here is another example which serves as an application to both the Rao–Blackwell and the Lehmann–Scheffé theorems. Let X1, X2, X3 be i.i.d. r.v.'s from the Negative Exponential p.d.f. of Example 15. Thus X1, for example, is an unbiased statistic for θ with variance θ². Since X1 is not a function of T, one then knows that X1 is not the UMV unbiased statistic for θ. It is further easily seen (by Theorem
To this end, by symmetry one has Eθ(X1|T) = Eθ(X2|T) = Eθ(X3|T). Since also their sum is equal to Eθ(T|T) = T, one has that their common value is T/3. Thus Eθ(X1|T) = T/3, which is what we were after. (One, of course, arrives at the same result by using transformations.) Just for the sake of verifying the Rao–Blackwell theorem, one sees that

    Eθ(T/3) = θ  and  σθ²(T/3) = θ²/3 < θ².

Exercises

11.3.1  If X1, . . . , Xn is a random sample of size n from P(θ), θ ∈ Ω = (0, ∞), we have that T = ∑_{j=1}^{n} Xj is a sufficient statistic for θ. Then use Exercise 11.1.2(i) and Example 10 to show that X̄ is the (essentially) unique UMV unbiased statistic for θ.

11.3.2  Refer to Example 15 and, by utilizing the appropriate transformation, show that X̄ is the (essentially) unique UMV unbiased statistic for θ.

11.4 The Exponential Family of p.d.f.'s: One-Dimensional Parameter Case

A large class of p.d.f.'s depending on a real-valued parameter θ is of the following form:

    f(x, θ) = C(θ) exp[Q(θ)T(x)] h(x),  x ∈ ℝ, θ ∈ Ω ⊆ ℝ,    (1)

where C(θ) > 0, θ ∈ Ω, and also h(x) > 0 for x ∈ S, the set of positivity of f(x, θ), which is independent of θ. It follows that

    C^{-1}(θ) = ∑_{x ∈ S} exp[Q(θ)T(x)] h(x)

for the discrete case, and

    C^{-1}(θ) = ∫_S exp[Q(θ)T(x)] h(x) dx

for the continuous case. If X1, . . . , Xn are i.i.d. r.v.'s with p.d.f. f(·, θ) given by (1), then the joint p.d.f. of the X's is given by

    f(x1, . . . , xn; θ) = C^n(θ) exp[Q(θ) ∑_{j=1}^{n} T(xj)] h(x1) · · · h(xn),  xj ∈ ℝ, j = 1, . . . , n.    (2)
EXAMPLE 16  Let

    f(x, θ) = (n choose x) θ^x (1 − θ)^{n−x} I_A(x),  where A = {0, 1, . . . , n}, θ ∈ Ω = (0, 1).

This p.d.f. can also be written as follows:

    f(x, θ) = (1 − θ)^n exp[(log(θ/(1 − θ))) x] (n choose x) I_A(x),

and hence it is of the exponential form with

    C(θ) = (1 − θ)^n,  Q(θ) = log(θ/(1 − θ)),  T(x) = x,  h(x) = (n choose x) I_A(x).

EXAMPLE 17  Let now the p.d.f. be N(μ, σ²). Then, if σ is known and μ = θ, we have

    f(x, θ) = [1/(√(2π)σ)] exp(−θ²/(2σ²)) exp[(θ/σ²)x] exp[−x²/(2σ²)],

and hence it is of the exponential form with

    C(θ) = [1/(√(2π)σ)] exp(−θ²/(2σ²)),  Q(θ) = θ/σ²,  T(x) = x,  h(x) = exp(−x²/(2σ²)).

If now μ is known and σ² = θ, θ ∈ Ω = (0, ∞), then we have

    f(x, θ) = [1/√(2πθ)] exp[−(x − μ)²/(2θ)],

and hence it is again of the exponential form with

    C(θ) = 1/√(2πθ),  Q(θ) = −1/(2θ),  T(x) = (x − μ)²,  h(x) = 1.

If the parameter space Ω of a one-parameter exponential family of p.d.f.'s contains a non-degenerate interval, it can be shown that the family is complete. More precisely, the following result can be proved.

THEOREM 6  Let X be an r.v. with p.d.f. f(·, θ), θ ∈ Ω ⊆ ℝ, given by (1), and set C = {g(·, θ); θ ∈ Ω}, where g(·, θ) is the p.d.f. of T(X). Then C is complete, provided Ω contains a non-degenerate interval.

Then the completeness of the families established in Examples 9 and 10, as well as the completeness of the families asserted in the first part of Example 12 and the last part of Example 14, follow from the above theorem.
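Example 16 can be checked in numbers — a sketch with illustrative values of n and θ, not part of the original text — by confirming that the factored form C(θ)e^{Q(θ)x}h(x) reproduces the usual B(n, θ) p.d.f.:

```python
import math

# Example 16 numerically: the B(n, theta) p.d.f. written as
#   f(x, theta) = C(theta) * exp(Q(theta) * x) * h(x),
# with C(theta) = (1-theta)^n, Q(theta) = log(theta/(1-theta)), h(x) = C(n, x).
n, theta = 10, 0.3  # illustrative values
C = (1 - theta) ** n
Q = math.log(theta / (1 - theta))
for x in range(n + 1):
    binom_pdf = math.comb(n, x) * theta**x * (1 - theta) ** (n - x)
    expo_form = C * math.exp(Q * x) * math.comb(n, x)
    assert abs(binom_pdf - expo_form) < 1e-12
```

The check is the algebraic identity (1 − θ)^n (θ/(1 − θ))^x = θ^x (1 − θ)^{n−x}, evaluated in floating point.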
Next, we have the following theorem.

THEOREM 7  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. of the one-parameter exponential form. Then

 i)  T* = ∑_{j=1}^{n} T(Xj) is a sufficient statistic for θ;
 ii)  the p.d.f. of T* is of the form

    g(t, θ) = C^n(θ) e^{Q(θ)t} h*(t),

where the set of positivity of h*(t) is independent of θ.

PROOF  i) This is immediate from (2) and Theorem 1.

ii) First, suppose that the X's are discrete. Then we have

    g(t, θ) = Pθ(T* = t) = ∑ f(x1, θ) · · · f(xn, θ) = ∑ C^n(θ) exp[Q(θ) ∑_{j=1}^{n} T(xj)] ∏_{j=1}^{n} h(xj) = C^n(θ) e^{Q(θ)t} h*(t),

where the (outer) summation extends over all (x1, . . . , xn)′ for which ∑_{j=1}^{n} T(xj) = t, and where

    h*(t) = ∑ [∏_{j=1}^{n} h(xj)].

Next, let the X's be of the continuous type. Then the proof is carried out under certain regularity conditions to be spelled out below. We assume that y = T(x) is one-to-one, so that the inverse T^{-1} exists. We set Y1 = ∑_{j=1}^{n} T(Xj) and Yj = Xj, j = 2, . . . , n, and consider the transformation

    y1 = ∑_{j=1}^{n} T(xj),  yj = xj, j = 2, . . . , n;  hence  T(x1) = y1 − ∑_{j=2}^{n} T(yj),  xj = yj, j = 2, . . . , n,

and thus

    x1 = T^{-1}[y1 − T(y2) − · · · − T(yn)],  xj = yj, j = 2, . . . , n.
Since, for j = 2, . . . , n,

    ∂x1/∂y1 = 1/T′[T^{-1}(z)],  ∂x1/∂yj = −T′(yj)/T′[T^{-1}(z)],  ∂xj/∂yj = 1,  and  ∂xj/∂yi = 0 for 1 < i, j, i ≠ j,

where z = y1 − ∑_{j=2}^{n} T(yj), we have that

    J = 1/T′[T^{-1}(z)] = 1/T′{T^{-1}[y1 − T(y2) − · · · − T(yn)]},

provided we assume that the derivative T′ of T exists and T′[T^{-1}(z)] ≠ 0. Therefore the joint p.d.f. of Y1, . . . , Yn is given by

    g(y1, . . . , yn; θ) = C^n(θ) exp{Q(θ)[y1 − T(y2) − · · · − T(yn) + T(y2) + · · · + T(yn)]} × h{T^{-1}[y1 − T(y2) − · · · − T(yn)]} ∏_{j=2}^{n} h(yj) |J|
                        = C^n(θ) e^{Q(θ)y1} h{T^{-1}[y1 − T(y2) − · · · − T(yn)]} ∏_{j=2}^{n} h(yj) |J|.

So if we integrate with respect to y2, . . . , yn, set

    h*(y1) = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} h{T^{-1}[y1 − T(y2) − · · · − T(yn)]} ∏_{j=2}^{n} h(yj) |J| dy2 · · · dyn,

and replace y1 by t, we arrive at the desired result. ▲

REMARK 5  The above proof goes through if y = T(x) is one-to-one on each set of a finite partition of ℝ.

We next set C = {g(·, θ); θ ∈ Ω}, where g(·, θ) is the p.d.f. of the sufficient statistic T*. Then the following result concerning the completeness of C follows from Theorem 6.

THEOREM 8  The family C = {g(·, θ); θ ∈ Ω} is complete, provided Ω contains a non-degenerate interval.
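Theorem 7(ii) can be illustrated numerically in the Poisson case — a sketch with illustrative values, not part of the original text. There, T(x) = x and h(x) = 1/x!, the multinomial theorem gives h*(t) = n^t/t!, and the resulting g(t, θ) = C^n(θ)e^{Q(θ)t}h*(t) should be the P(nθ) p.d.f., the known distribution of a sum of n independent P(θ) r.v.'s:

```python
import math

# Theorem 7(ii) for the Poisson case: C(theta) = e^{-theta}, Q(theta) = log(theta),
# T(x) = x, h(x) = 1/x!, and h*(t) = n^t / t!  (multinomial theorem).
# Then g(t, theta) = C^n(theta) e^{Q(theta) t} h*(t) must equal the P(n*theta) p.d.f.
n, theta = 4, 1.7  # illustrative values
for t in range(30):
    h_star = n**t / math.factorial(t)
    g = math.exp(-theta) ** n * math.exp(math.log(theta) * t) * h_star
    target = math.exp(-n * theta) * (n * theta) ** t / math.factorial(t)
    assert abs(g - target) < 1e-12
```

The agreement is exact up to rounding, since e^{−nθ} θ^t n^t/t! = e^{−nθ}(nθ)^t/t!.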
Now, as a consequence of Theorems 2, 3, 7 and 8, we obtain the following result.

THEOREM 9  Let the r.v.'s X1, . . . , Xn be i.i.d. from a p.d.f. of the one-parameter exponential form, and let T* be defined by (i) in Theorem 7. Then, if V is any other statistic, V and T* are independent if and only if the distribution of V does not depend on θ.

PROOF  In the first place, T* is sufficient for θ, by Theorem 7(i), and the set of positivity of its p.d.f. is independent of θ, by Theorem 7(ii). Furthermore, by Theorem 8, the family C of the p.d.f.'s of T* is complete. Thus the assumptions of Theorem 2 are satisfied, and therefore, if V is any statistic which is independent of T*, it follows, by Theorem 3, that the distribution of V is independent of θ. For the converse, if the distribution of a statistic V does not depend on θ, then, by Theorem 2, V and T* are independent. The proof is completed. ▲

APPLICATION  Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²). Then

    X̄ = (1/n) ∑_{j=1}^{n} Xj  and  S² = (1/n) ∑_{j=1}^{n} (Xj − X̄)²

are independent.

PROOF  We treat μ as the unknown parameter θ and let σ² be arbitrary (>0) but fixed. Then the p.d.f. of the X's is of the one-parameter exponential form, and T* = X̄ is both sufficient for θ and complete. Let

    V = V(X1, . . . , Xn) = ∑_{j=1}^{n} (Xj − X̄)².

Then V and T* will be independent, by Theorem 9, if and only if the distribution of V does not depend on θ. Now Xj being N(θ, σ²) implies that Yj = Xj − θ is N(0, σ²). Since Ȳ = X̄ − θ, we have

    ∑_{j=1}^{n} (Xj − X̄)² = ∑_{j=1}^{n} (Yj − Ȳ)².

But the distribution of ∑_{j=1}^{n} (Yj − Ȳ)² does not depend on θ, because Pθ[∑_{j=1}^{n} (Yj − Ȳ)² ∈ B] is equal to the integral of the joint p.d.f. of the Y's over B, and this p.d.f. is independent of θ. Thus the distribution of V is independent of θ, and V and T* are independent. ▲

Exercises

11.4.1  In each one of the following cases, show that the distribution of the r.v. X is of the one-parameter exponential form and identify the various quantities appearing in a one-parameter exponential family:
 i)  X is distributed as Poisson;
 ii)  X is distributed as Negative Binomial;
 iii)  X is distributed as Gamma with β known;
 iii′)  X is distributed as Gamma with α known;
 iv)  X is distributed as Beta with β known;
 iv′)  X is distributed as Beta with α known.
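The Application above (independence of X̄ and S² in the Normal case) can be illustrated by simulation — a sketch with illustrative values and a deliberately loose cutoff, not part of the original text:

```python
import math, random

# In the N(mu, sigma^2) case X-bar and S^2 are independent, so their sample
# correlation over many replications should be near zero.  The 0.05 cutoff
# below is a loose, illustrative threshold, not a formal test.
random.seed(0)
n, reps = 10, 20000
means, variances = [], []
for _ in range(reps):
    xs = [random.gauss(1.0, 2.0) for _ in range(n)]
    m = sum(xs) / n
    means.append(m)
    variances.append(sum((x - m) ** 2 for x in xs) / n)

# Sample correlation of the two statistics across replications.
mm = sum(means) / reps
mv = sum(variances) / reps
cov = sum((a - mm) * (b - mv) for a, b in zip(means, variances)) / reps
sd_m = math.sqrt(sum((a - mm) ** 2 for a in means) / reps)
sd_v = math.sqrt(sum((b - mv) ** 2 for b in variances) / reps)
corr = cov / (sd_m * sd_v)
assert abs(corr) < 0.05
```

Zero correlation does not in general imply independence, of course; the simulation merely fails to contradict the theorem.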
11.4.2  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ) given by

    f(x, θ) = (γ/θ) x^{γ−1} exp(−x^γ/θ) I_{(0,∞)}(x),  θ ∈ Ω = (0, ∞), γ > 0 (known).

 i)  Show that f(·, θ) is indeed a p.d.f.;
 ii)  Show that ∑_{j=1}^{n} X_j^γ is a sufficient statistic for θ;
 iii)  Is f(·, θ) a member of a one-parameter exponential family of p.d.f.'s?

11.4.3  Use Theorems 6 and 7 to discuss:
 i)  The completeness established or asserted in Examples 9, 10, 12 (for μ = θ and σ known) and 15;
 ii)  Completeness in the Beta and Gamma distributions when one of the parameters is unknown and the other is known.

11.5 Some Multiparameter Generalizations

Let X1, . . . , Xk be i.i.d. r.v.'s and set X = (X1, . . . , Xk)′. We say that the joint p.d.f. of X, f(·, θ), belongs to the r-parameter exponential family if it is of the following form:

    f(x, θ) = C(θ) exp[∑_{j=1}^{r} Qj(θ)Tj(x)] h(x),

where x = (x1, . . . , xk)′, xj ∈ ℝ, j = 1, . . . , k, θ = (θ1, . . . , θr)′ ∈ Ω ⊆ ℝ^r, C(θ) > 0, θ ∈ Ω, and h(x) > 0 for x ∈ S, the set of positivity of f(·, θ), which is independent of θ. The following are examples of multiparameter exponential families.

EXAMPLE 18  Let X = (X1, . . . , Xr)′ have the multinomial p.d.f. Then, with θr = 1 − θ1 − · · · − θ_{r−1},

    f(x1, . . . , xr; θ) = (1 − θ1 − · · · − θ_{r−1})^n exp[∑_{j=1}^{r−1} xj log(θj/(1 − θ1 − · · · − θ_{r−1}))] × [n!/(x1! · · · xr!)] I_A(x1, . . . , xr),

where A = {(x1, . . . , xr)′ ∈ ℝ^r; xj ≥ 0, j = 1, . . . , r, and ∑_{j=1}^{r} xj = n}. Thus this p.d.f. is of the exponential form with
    C(θ) = (1 − θ1 − · · · − θ_{r−1})^n,  Qj(θ) = log[θj/(1 − θ1 − · · · − θ_{r−1})],  Tj(x1, . . . , xr) = xj,  j = 1, . . . , r − 1,
    h(x1, . . . , xr) = [n!/(x1! · · · xr!)] I_A(x1, . . . , xr).

EXAMPLE 19  Let X be N(θ1, θ2). Then

    f(x; θ1, θ2) = [1/√(2πθ2)] exp(−θ1²/(2θ2)) exp[(θ1/θ2)x − (1/(2θ2))x²],

and hence this p.d.f. is of exponential form with

    C(θ) = [1/√(2πθ2)] exp(−θ1²/(2θ2)),  Q1(θ) = θ1/θ2,  T1(x) = x,  Q2(θ) = 1/(2θ2),  T2(x) = −x²,  h(x) = 1.

For multiparameter exponential families, appropriate versions of Theorems 6, 7 and 8 are also true; this point will not be pursued here. Furthermore, if X1, . . . , Xn are i.i.d. r.v.'s with p.d.f. f(·, θ), θ = (θ1, . . . , θr)′ ∈ Ω ⊆ ℝ^r, not necessarily of an exponential form, the r-dimensional statistic U = (U1, . . . , Ur)′, Uj = Uj(X1, . . . , Xn), j = 1, . . . , r, is said to be unbiased if Eθ Uj = θj, j = 1, . . . , r, for all θ ∈ Ω. Again, multiparameter versions of Theorems 4–9 may be formulated, but this matter will not be dealt with here.

Exercises

11.5.1  In each one of the following cases, show that the distribution of the r.v. X (or of the random vector X) is of the multiparameter exponential form and identify the various quantities appearing in a multiparameter exponential family:
 i)  X is distributed as Gamma;
 ii)  X is distributed as Beta;
 iii)  X = (X1, X2)′ is distributed as Bivariate Normal with parameters as described in Example 4.

11.5.2  If the r.v. X is distributed as U(α, β), show that the p.d.f. of X is not of an exponential form, regardless of whether one or both of α, β are unknown.

11.5.3  Use the (not explicitly stated) multiparameter versions of Theorems 6 and 7 to discuss:
 i)  The completeness asserted in Example 15 when both parameters are unknown;
 ii)  Completeness in the Beta and Gamma distributions when both parameters are unknown.

11.5.4  (A bio-assay problem)  Suppose that the probability of death p(x) is related to the dose x of a certain drug in the following manner:

    p(x) = 1/[1 + e^{−(α+βx)}],

where α > 0, β ∈ ℝ are unknown parameters. In an experiment, k different doses of the drug are considered, each dose is applied to a number of animals, and the number of deaths among them is recorded. The resulting data can be presented in a table as follows:

    Dose:                        x1, x2, . . . , xk
    Number of animals used (n):  n1, n2, . . . , nk
    Number of deaths (Y):        Y1, Y2, . . . , Yk

Here x1, . . . , xk and n1, . . . , nk are known constants, and Y1, . . . , Yk are independent r.v.'s, Yj being distributed as B(nj, p(xj)). Then show that:
 i)  The joint distribution of Y1, . . . , Yk constitutes an exponential family;
 ii)  The statistic

    (∑_{j=1}^{k} Yj, ∑_{j=1}^{k} xj Yj)′

is sufficient for θ = (α, β)′.

(REMARK  In connection with the probability p(x) given above, see also Exercise 4.1.8 in Chapter 4.)
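The quantities in the bio-assay exercise can be computed directly — a sketch with hypothetical doses and parameter values, not part of the original text; the pair (∑Yj, ∑xjYj) is the statistic claimed sufficient for (α, β):

```python
import math, random

# p(x) = 1 / (1 + exp(-(alpha + beta * x))), the dose-response curve above.
def p(x, alpha, beta):
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

alpha, beta = 0.5, 1.2          # hypothetical parameter values
doses = [0.1, 0.5, 1.0, 2.0]    # hypothetical doses x_1, ..., x_k
animals = [20, 20, 20, 20]      # n_1, ..., n_k

# For beta > 0 the death probability increases with the dose.
probs = [p(x, alpha, beta) for x in doses]
assert all(a < b for a, b in zip(probs, probs[1:]))

# Simulate Y_j ~ B(n_j, p(x_j)) and form the sufficient statistic.
random.seed(1)
deaths = [sum(random.random() < pj for _ in range(nj))
          for pj, nj in zip(probs, animals)]
T1 = sum(deaths)                                 # sum_j Y_j
T2 = sum(x * y for x, y in zip(doses, deaths))   # sum_j x_j Y_j
assert 0 <= T1 <= sum(animals)
```

Writing the joint p.d.f. of the Yj's as a product of binomials and collecting the exponent shows why exactly these two sums appear: the exponent is α∑yj + β∑xjyj plus terms free of (α, β).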
Chapter 12

Point Estimation

12.1 Introduction

Let X be an r.v. with p.d.f. f(·, θ), where θ ∈ Ω ⊆ ℝ^r. If θ is known, we can calculate, in principle, all probabilities we might be interested in. In practice, however, θ is generally unknown. Then the problem of estimating θ arises; or, more generally, we might be interested in estimating some function of θ, say g(θ), where g is a (measurable and) usually real-valued function. We now proceed to define what we mean by an estimator and an estimate of g(θ). To this end, let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ).

DEFINITION 1  Any statistic U = U(X1, . . . , Xn) which is used for estimating the unknown quantity g(θ) is called an estimator of g(θ). The value U(x1, . . . , xn) of U for the observed values of the X's is called an estimate of g(θ). For simplicity, and by slightly abusing the notation, the terms estimator and estimate are often used interchangeably.

Exercise

12.1.1  Let X1, . . . , Xn be i.i.d. r.v.'s having the Cauchy distribution with σ = 1 and μ unknown. Suppose you were to estimate μ; which one of the estimators X1, X̄ would you choose? Justify your answer. (Hint: Use the distributions of X1 and X̄.)
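Exercise 12.1.1 turns on the fact that for Cauchy data the sample mean of n observations has the same Cauchy distribution as a single observation, while the sample median does concentrate around μ. The simulation below — a sketch with illustrative values and a loose 0.5 cutoff, not part of the original text — makes the contrast visible:

```python
import math, random, statistics

# Cauchy(mu, 1) variates via the inverse-c.d.f. method: mu + tan(pi*(U - 1/2)).
random.seed(2)
mu, n, reps = 3.0, 100, 500
mean_errs, median_errs = [], []
for _ in range(reps):
    xs = [mu + math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]
    mean_errs.append(abs(sum(xs) / n - mu))            # X-bar: still Cauchy(mu, 1)
    median_errs.append(abs(statistics.median(xs) - mu))  # median: concentrates

# Typical (median) error of the sample median is far smaller than that
# of the sample mean; 0.5 is a loose illustrative threshold.
assert statistics.median(median_errs) < statistics.median(mean_errs)
assert statistics.median(median_errs) < 0.5
```

This is the simulation counterpart of the hint: compare the distribution of X̄ (Cauchy, no improvement with n) with that of an alternative estimator that does improve with n.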
12.2 Criteria for Selecting an Estimator: Unbiasedness, Minimum Variance

From Definition 1, it is obvious that, in order to obtain a meaningful estimator of g(θ), one would have to choose that estimator from a specified class of estimators having some optimal properties. Thus the question arises as to how a class of estimators is to be selected. In this chapter, we will devote ourselves to discussing those criteria which are often used in selecting a class of estimators. For example, one could restrict oneself to the class of unbiased estimators.

DEFINITION 2  Let g be as above and suppose that it is real-valued. Then the estimator U = U(X1, . . . , Xn) is called an unbiased estimator of g(θ) if Eθ U(X1, . . . , Xn) = g(θ) for all θ ∈ Ω.

DEFINITION 3  Let g be as above and suppose that it is real-valued. Then g(θ) is said to be estimable if it has an unbiased estimator.

The interest in the members of this class stems from the interpretation of the expectation as an average value. Thus, if U = U(X1, . . . , Xn) is an unbiased estimator of g(θ), then, no matter what θ ∈ Ω is, the average value (expectation under θ) of U is equal to g(θ).

Although the criterion of unbiasedness does specify a class of estimators with a certain property, this class is, as a rule, too large. This suggests that a second desirable criterion (that of variance) would have to be superimposed on that of unbiasedness. According to this criterion, among two estimators of g(θ) which are both unbiased, one would choose the one with the smaller variance. (See Fig. 12.1.) The reason for doing so rests on the interpretation of the variance as a measure of concentration about the mean. Thus, if U = U(X1, . . . , Xn) is an unbiased estimator of g(θ), then, by Tchebichev's inequality,

    Pθ[|U − g(θ)| ≤ ε] ≥ 1 − σθ²U/ε².

Therefore the smaller σθ²U is, the larger the lower bound of the probability of concentration of U about g(θ) becomes. A similar interpretation can be given by means of the CLT, when applicable.

[Figure 12.1  (a) The p.d.f. h1(u; θ) of U1 (for a fixed θ); (b) the p.d.f. h2(u; θ) of U2 (for a fixed θ). Both are centered at g(θ).]
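The Tchebichev bound above can be checked by simulation — a sketch with illustrative values, not part of the original text — for U = X̄ from N(θ, 1), where σθ²(X̄) = 1/n:

```python
import random

# Tchebichev: P_theta(|U - g(theta)| <= eps) >= 1 - Var_theta(U) / eps^2,
# simulated for U = X-bar based on n observations from N(theta, 1).
random.seed(3)
theta, n, eps, reps = 0.7, 25, 0.4, 10000  # illustrative values
hits = 0
for _ in range(reps):
    xbar = sum(random.gauss(theta, 1) for _ in range(n)) / n
    hits += abs(xbar - theta) <= eps
coverage = hits / reps
bound = 1 - (1.0 / n) / eps**2   # = 0.75 for these values
assert coverage >= bound
```

For Normal data the empirical coverage (about 0.95 here) comfortably exceeds the Tchebichev lower bound of 0.75, as expected: the inequality is distribution-free and therefore conservative.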
Following this line of reasoning, one would restrict oneself first to the class of all unbiased estimators of g(θ), and next to the subclass of unbiased estimators which have finite variance under all θ ∈ Ω. Then, within this restricted class, one would search for an estimator with the smallest variance. Formalizing this, we have the following definition.

DEFINITION 4  Let g be estimable. An estimator U = U(X1, . . . , Xn) is said to be a uniformly minimum variance unbiased (UMVU) estimator of g(θ) if it is unbiased and has the smallest variance within the class of all unbiased estimators of g(θ) under all θ ∈ Ω. That is, if U1 = U1(X1, . . . , Xn) is any other unbiased estimator of g(θ), then σθ²U1 ≥ σθ²U for all θ ∈ Ω.

In many cases of interest a UMVU estimator does exist. Once one decides to restrict oneself to the class of all unbiased estimators with finite variance, the problem arises as to how one would go about searching for a UMVU estimator (if such an estimator exists). There are two approaches which may be used. The first is appropriate when complete sufficient statistics are available, and provides us with a UMVU estimator. In the second approach, one would first determine a lower bound for the variances of all estimators in the class under consideration, and then would try to determine an estimator whose variance is equal to this lower bound; here the Cramér–Rao inequality (the corollary to Theorem 2, to be established below) is instrumental. The second approach is appropriate when a complete sufficient statistic is not readily available, or it is not easy to identify it. It is more effective, however, in that it does provide a lower bound for the variances of all unbiased estimators regardless of the existence or not of a complete sufficient statistic.

Lest we give the impression that UMVU estimators are all-important, we refer the reader to Exercises 12.3.12 and 12.3.13, where the UMVU estimators involved behave in a rather ridiculous fashion.

Exercises

12.2.1  Let X be an r.v. distributed as B(n, θ), θ ∈ Ω = (0, 1). Show that there is no unbiased estimator of g(θ) = 1/θ based on X.

12.2.2  Let X1, . . . , Xn be i.i.d. r.v.'s distributed as U(0, θ), θ ∈ Ω = (0, ∞). Find unbiased estimators of the mean and variance of the X's depending only on a sufficient statistic for θ.

12.2.3  Let X1, . . . , Xn be i.i.d. r.v.'s from U(θ1, θ2), θ1 < θ2, and find unbiased estimators for the mean (θ1 + θ2)/2 and the range θ2 − θ1 depending only on a sufficient statistic for (θ1, θ2)′. (Regarding sufficiency, refer to Example 3 in Chapter 10 and Example 7 in Chapter 11.)
12.2.4  Let X1, . . . , Xn be i.i.d. r.v.'s from the Double Exponential distribution f(x, θ) = (1/2)e^{−|x−θ|}, θ ∈ Ω = ℝ. Then show that X̄ is an unbiased estimator of θ.

12.2.5  Let X1, . . . , Xn be i.i.d. r.v.'s from the U(θ, 2θ), θ ∈ Ω = (0, ∞), distribution and set

    U1 = [(n + 1)/(2n + 1)] X(n)  and  U2 = [(n + 1)/(5n + 4)][2X(n) + X(1)].

Then show that both U1 and U2 are unbiased estimators of θ, and that U2 is uniformly better than U1 (in the sense of variance).

12.2.6  Let X1, . . . , Xn be i.i.d. r.v.'s with mean μ and variance σ², both unknown. Then show that X̄ is the minimum variance unbiased linear estimator of μ.

12.2.7  Let X1, . . . , Xm and Y1, . . . , Yn be two independent random samples with the same mean θ and known variances σ1² and σ2², respectively. Then show that, for every c ∈ [0, 1], U = cX̄ + (1 − c)Ȳ is an unbiased estimator of θ. Also find the value of c for which the variance of U is minimum.

12.3 The Case of Availability of Complete Sufficient Statistics

The first approach described above will now be looked into in some detail. To this end, let T = (T1, . . . , Tm)′, Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, be a statistic which is sufficient for θ, and let U = U(X1, . . . , Xn) be an unbiased estimator of g(θ), where g is assumed to be real-valued. Set φ(T) = Eθ(U|T). Then, by the Rao–Blackwell theorem (Theorem 4, Chapter 11) (or more precisely, an obvious modification of it), φ(T) is also an unbiased estimator of g(θ), and furthermore σθ²(φ) ≤ σθ²U for all θ ∈ Ω, with equality holding only if U is a function of T (with Pθ-probability 1). Thus, in the presence of a sufficient statistic, in searching for a UMVU estimator of g(θ), it suffices to restrict ourselves to the class of those unbiased estimators which depend on T alone. Next, assume that T is also complete. Then, by the Lehmann–Scheffé theorem (Theorem 5, Chapter 11) (or rather, an obvious modification of it), the unbiased estimator φ(T) is the one with uniformly minimum variance in the class of all unbiased estimators. It is essentially unique, in the sense that any other UMVU estimator will differ from φ(T) only on a set of Pθ-probability zero for all θ ∈ Ω.

Thus, provided an unbiased estimator with finite variance exists, one starts out with any such unbiased estimator of g(θ), U say; then one Rao–Blackwellizes it and obtains φ(T). This is the required estimator. Notice that the method just described not only secures the existence of a UMVU estimator, but also produces it. Thus we have the following result.
THEOREM 1  Let g be as in Definition 2, and assume that there exists an unbiased estimator U = U(X1, . . . , Xn) of g(θ) with finite variance. Furthermore, let T = (T1, . . . , Tm)′, Tj = Tj(X1, . . . , Xn), j = 1, . . . , m, be a sufficient statistic for θ, and suppose that it is also complete. Set φ(T) = Eθ(U|T). Then φ(T) is a UMVU estimator of g(θ), and it is essentially unique.

This theorem will be illustrated by a number of concrete examples.

EXAMPLE 1  Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, p), p = θ, and suppose we wish to find a UMVU estimator of the variance of the X's. The variance of the X's is equal to pq. By setting T = ∑_{j=1}^{n} Xj, we have

    ∑_{j=1}^{n} (Xj − X̄)² = ∑_{j=1}^{n} Xj² − nX̄² = ∑_{j=1}^{n} Xj − (1/n)(∑_{j=1}^{n} Xj)² = T − T²/n,

because Xj takes on the values 0 and 1 only, and hence Xj² = Xj. Therefore, if

    U = [1/(n − 1)] ∑_{j=1}^{n} (Xj − X̄)²,  so that  U = [1/(n − 1)](T − T²/n),

then Eθ U = g(θ). But T is a complete, sufficient statistic for θ, by Examples 6 and 9 in Chapter 11, and U is a function of T. Therefore U is a UMVU estimator of the variance of the X's, according to Theorem 1.

EXAMPLE 2  Let X be an r.v. distributed as B(n, θ), θ ∈ Ω = (0, 1); for instance, θ may represent the probability of an item being defective when chosen at random from a lot of such items. Set

    g(θ) = Pθ(X ≤ 2) = ∑_{x=0}^{2} (n choose x) θ^x (1 − θ)^{n−x} = (1 − θ)^n + nθ(1 − θ)^{n−1} + (n choose 2) θ²(1 − θ)^{n−2}.

Then g(θ) represents the probability of accepting the entire lot, if the rule for rejection is this: Choose at random n (≥2) items from the lot and then accept the entire lot if the number of observed defective items is ≤2. Furthermore, if the experiment just described is repeated independently r times, one obtains r independent r.v.'s X1, . . . , Xr distributed as X, and, on the basis of these, the problem is that of finding a UMVU estimator for g(θ). Now the r.v.'s Xj, j = 1, . . . , r, are independent B(n, θ), so that T = ∑_{j=1}^{r} Xj is B(nr, θ); T is a complete, sufficient statistic for θ. Set
    U = 1 if X1 ≤ 2,  U = 0 if X1 > 2.

Then Eθ U = Pθ(U = 1) = Pθ(X1 ≤ 2) = g(θ), so that U is an unbiased estimator of g(θ); but it is not a function of T. Then one obtains the required estimator by Rao–Blackwellization of U. To this end, we have

    Eθ(U|T = t) = Pθ(U = 1|T = t) = Pθ(X1 ≤ 2|T = t)
                = [Pθ(X1 = 0, X2 + · · · + Xr = t) + Pθ(X1 = 1, X2 + · · · + Xr = t − 1) + Pθ(X1 = 2, X2 + · · · + Xr = t − 2)] / Pθ(T = t)
                = [Pθ(X1 = 0)Pθ(X2 + · · · + Xr = t) + Pθ(X1 = 1)Pθ(X2 + · · · + Xr = t − 1) + Pθ(X1 = 2)Pθ(X2 + · · · + Xr = t − 2)] / Pθ(T = t),

by independence. Since X2 + · · · + Xr is B(n(r − 1), θ) and T is B(nr, θ), the numerator equals

    (1 − θ)^n (n(r−1) choose t) θ^t (1 − θ)^{n(r−1)−t} + nθ(1 − θ)^{n−1} (n(r−1) choose t−1) θ^{t−1} (1 − θ)^{n(r−1)−t+1} + (n choose 2) θ²(1 − θ)^{n−2} (n(r−1) choose t−2) θ^{t−2} (1 − θ)^{n(r−1)−t+2}
    = θ^t (1 − θ)^{nr−t} [(n(r−1) choose t) + n(n(r−1) choose t−1) + (n choose 2)(n(r−1) choose t−2)],

while the denominator is (nr choose t) θ^t (1 − θ)^{nr−t}. Therefore

    φ(T) = (nr choose T)^{-1} [(n(r−1) choose T) + n(n(r−1) choose T−1) + (n choose 2)(n(r−1) choose T−2)]
is a UMVU estimator of g(θ), by Theorem 1.

EXAMPLE 3  Consider certain events which occur according to the distribution P(λ); then the probability that no event occurs is equal to e^{−λ}. Let X1, . . . , Xn be i.i.d. r.v.'s from P(λ), and set λ = θ. The problem is that of finding a UMVU estimator of e^{−λ}. Set T = ∑_{j=1}^{n} Xj and define U by

    U = 1 if X1 = 0,  U = 0 if X1 ≥ 1.

Then Eθ U = Pθ(U = 1) = Pθ(X1 = 0) = g(θ), so that U is an unbiased estimator of g(θ). However, it does not depend on T, which is a complete, sufficient statistic for θ, according to Exercise 11.1.2(i) and Example 10 in Chapter 11. It remains then for us to Rao–Blackwellize U. For this purpose we use the fact that the conditional distribution of X1, given T = t, is B(t, 1/n). (See Exercise 12.3.1.) Then

    Eθ(U|T = t) = Pθ(X1 = 0|T = t) = (1 − 1/n)^t,

so that

    φ(T) = (1 − 1/n)^T

is a UMVU estimator of e^{−λ}.

EXAMPLE 4  Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²) with σ² unknown and μ known. Set σ² = θ and g(θ) = √θ; that is, we are interested in finding a UMVU estimator of σ. If we set

    S² = (1/n) ∑_{j=1}^{n} (Xj − μ)²,

then nS²/θ = (1/θ) ∑_{j=1}^{n} (Xj − μ)² is χ²_n, so that √n S/√θ is distributed as χ_n. Then the expectation Eθ(√n S/√θ) can be calculated; it is independent of θ; call it c′_n (see Exercise 12.3.2). Setting finally c_n = c′_n/√n, we obtain

    Eθ(S/c_n) = √θ = σ.
That is, S/c_n is an unbiased estimator of σ. Since this estimator depends on the complete, sufficient statistic S² alone, it follows that S/c_n is a UMVU estimator of σ.

EXAMPLE 5  Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²) with both μ and σ² unknown. Here θ = (μ, σ²)′, and let g1(θ) = μ and g2(θ) = σ². We are interested in finding UMVU estimators for each one of μ and σ². By setting

    S² = (1/n) ∑_{j=1}^{n} (Xj − X̄)²,

we have that (X̄, S²)′ is a sufficient statistic for θ, and it is also complete. (See Example 8 and Exercise 11.3(ii), Chapter 11.) Let

    U1 = X̄  and  U2 = nS²/(n − 1).

By Remark 5 in Chapter 7,

    Eθ U1 = μ  and  Eθ[nS²/(n − 1)] = σ²,  since  Eθ(nS²/σ²) = n − 1.

So U1 and U2 are unbiased estimators of μ and σ², respectively. Since they depend only on the complete, sufficient statistic (X̄, S²)′, it follows that they are UMVU estimators.

EXAMPLE 6  Let again X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²) with both μ and σ² unknown, and set ξp for the upper pth quantile of the distribution (0 < p < 1). The problem is that of finding a UMVU estimator of ξp. From the definition of ξp, one has Pθ(X1 ≥ ξp) = p. But

    Pθ(X1 ≥ ξp) = Pθ[(X1 − μ)/σ ≥ (ξp − μ)/σ] = 1 − Φ[(ξp − μ)/σ],

so that

    Φ[(ξp − μ)/σ] = 1 − p,  hence  (ξp − μ)/σ = Φ^{-1}(1 − p)  and  ξp = μ + σΦ^{-1}(1 − p).

Of course, since p is given, Φ^{-1}(1 − p) is a uniquely determined number. Here θ = (μ, σ²)′, and by setting g(θ) = μ + σΦ^{-1}(1 − p), our problem is that of finding a UMVU estimator of g(θ).
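The constant c_n of Example 4 can be made concrete — a sketch with illustrative values, not part of the original text. Since E(χ_n) = √2 Γ((n+1)/2)/Γ(n/2), one has c′_n in closed form, and the unbiasedness of S/c_n for σ (with μ known) can be confirmed by simulation:

```python
import math, random

# Example 4: nS^2/theta ~ chi^2_n, and c'_n = E sqrt(chi^2_n)
#            = sqrt(2) * Gamma((n+1)/2) / Gamma(n/2).
def c_prime(n):
    return math.sqrt(2) * math.gamma((n + 1) / 2) / math.gamma(n / 2)

n, mu, sigma = 8, 2.0, 1.5  # illustrative values (mu known)
cn = c_prime(n) / math.sqrt(n)

random.seed(4)
reps, acc = 40000, 0.0
for _ in range(reps):
    s2 = sum((random.gauss(mu, sigma) - mu) ** 2 for _ in range(n)) / n
    acc += math.sqrt(s2) / cn
# Monte Carlo average of S/c_n should be close to sigma (loose tolerance).
assert abs(acc / reps - sigma) < 0.02
```

As a spot check of the formula, c′_1 = √2/Γ(1/2) = √(2/π), which is the well-known mean of |Z| for Z standard Normal.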
Let

    U = X̄ + (S/c_n) Φ^{-1}(1 − p),

where c_n is defined in Example 4. Then, by the fact that Eθ X̄ = μ and Eθ(S/c_n) = σ, we have that Eθ U = g(θ). Since U depends only on the complete, sufficient statistic (X̄, S²)′, it follows that U is a UMVU estimator of ξp.

Exercises

12.3.1  Let X1, . . . , Xn be i.i.d. r.v.'s from P(λ) and set T = ∑_{j=1}^{n} Xj. Then show that the conditional p.d.f. of X1, given T = t, is that of B(t, 1/n). Furthermore, observe that the same is true if X1 is replaced by any one of the remaining X's.

12.3.2  Refer to Example 4 and evaluate the quantity c′_n mentioned there.

12.3.3  If X1, . . . , Xn are i.i.d. r.v.'s from B(1, θ), θ ∈ Ω = (0, 1), by using Theorem 1, show that X̄ is the UMVU estimator of θ.

12.3.4  If X1, . . . , Xn are i.i.d. r.v.'s from P(θ), θ ∈ Ω = (0, ∞), use Theorem 1 in order to determine the UMVU estimator of θ.

12.3.5  Let X1, . . . , Xn be i.i.d. r.v.'s distributed as N(θ, 1). Show that X̄² − (1/n) is the UMVU estimator of g(θ) = θ².

12.3.6  Let X be an r.v. having the Negative Binomial distribution with parameter θ ∈ Ω = (0, 1). Find the UMVU estimator of g(θ) = 1/θ and determine its variance.

12.3.7  Let X1, . . . , Xn be i.i.d. r.v.'s from the Negative Exponential distribution with parameter θ ∈ Ω = (0, ∞). Use Theorem 1 in order to determine the UMVU estimator of θ.

12.3.8  Let X1, . . . , Xn be independent r.v.'s distributed as N(μ, σ²). Find the UMVU estimator of μ/σ.

12.3.9  Let (Xj, Yj)′, j = 1, . . . , n, be independent random vectors having the Bivariate Normal distribution with parameter θ = (μ1, μ2, σ1, σ2, ρ)′. Find the UMVU estimators of the following quantities: ρσ1σ2, μ1μ2, ρσ2/σ1.

12.3.10  Let X1, . . . , Xn be i.i.d. r.v.'s having the Geometric p.d.f. f(x, θ) = θ(1 − θ)^x, x = 0, 1, . . . , θ ∈ Ω = (0, 1). Use Theorem 1 in order to determine the UMVU estimator of θ.

12.3.11  Let X be an r.v. denoting the life span of a piece of equipment. Then the reliability of the equipment at time x, R(x), is defined as the probability that X > x. If X has the Negative Exponential distribution with parameter θ ∈ Ω = (0, ∞), find the UMVU estimator of the reliability R(x; θ) on the basis of n observations on X.
12.3.12  Let X be an r.v. distributed as P(θ), θ ∈ Ω = (0, ∞), and let U(X) be defined as follows: U(X) = 1 if X = 0, and U(X) = 0 if X ≠ 0. Then show that U(X) is the UMVU estimator of g(θ) = e^{−θ}, and conclude that it is an entirely unreasonable estimator. (Hint: Use Theorem 1.)

12.3.13  Let X be an r.v. denoting the number of telephone calls which arrive at a given telephone exchange, where θ ∈ Ω = (0, ∞) is the number of calls arriving at the telephone exchange under consideration within a 15 minute period, and suppose that X is distributed as P(θ). Then the number of calls which arrive at the given telephone exchange within 30 minutes is an r.v. Y distributed as P(2θ); thus Pθ(Y = 0) = e^{−2θ} = g(θ). Define U(X) by U(X) = (−1)^X. By using Theorem 1, show that U(X) is a UMVU estimator of g(θ), and conclude that it is an unreasonable one.

12.3.14  Use Exercise 11.1.2, Chapter 11, in order to show that the unbiased estimator constructed in Exercise 12.2.2 is actually UMVU.

12.4 The Case Where Complete Sufficient Statistics Are Not Available or May Not Exist: Cramér–Rao Inequality

When complete, sufficient statistics are available, the problem of finding a UMVU estimator is settled as in Section 12.3. When such statistics do not exist, or it is not easy to identify them, one may use the approach described here in searching for a UMVU estimator. According to this method, we first establish a lower bound for the variances of all unbiased estimators, and then we attempt to identify an unbiased estimator with variance equal to the lower bound found. If that is possible, the problem is solved again. At any rate, we do have a lower bound for the variances of a class of estimators, which may be useful for comparison purposes.

12.4.1 Regularity Conditions

Let X be an r.v. with p.d.f. f(·, θ), θ ∈ Ω ⊆ ℝ. We assume that Ω ⊆ ℝ and that g is real-valued and differentiable for all θ ∈ Ω. The following regularity conditions will be employed in proving the main result in this section. It is assumed that:

 (i)  f(x, θ) is positive on a set S independent of θ ∈ Ω;
 (ii)  Ω is an open interval in ℝ (finite or not);
 (iii)  (∂/∂θ)f(x, θ) exists for all θ ∈ Ω and all x ∈ S, except possibly on a set N ⊂ S which is independent of θ and such that Pθ(X ∈ N) = 0 for all θ ∈ Ω;

the inequality is trivially true for those θ’s. xn )f ( x1 . where g ′ θ = () dg θ dθ () . . θ ) S may be differentiated under the integral or summation sign.’s with p. f(·. (Cramér–Rao inequality. θ ⋅ ⋅ ⋅ f xn . . . . . . . . .294 12 Point Estimation or ∑ S ⋅⋅⋅ ∑ f ( x1 . . . PROOF If σ 2 θ U = ∞ or I(θ ) = ∞ for some θ ∈ Ω. θ ) ⋅ ⋅ ⋅ f ( xn . .) Let X1. θ ⎥∏ f x j . . xn ) f ( x1 . . ⎥ ⎦ i =1 ( ) ( ) (2) ) ( ) . xn f x1 . θ )⎥ 2 ⎤ ⎦ j ( ) ( )⎥ ∏ f ( x . θ + ⎢ f =⎢ ⎣ ∂θ ⎦ j ≠1 ⎣ ∂θ ⎡∂ ×∏ f x j . to be denoted by I(θ ). . Hence we need only consider the case where σ 2 θ U < ∞ and I(θ ) < ∞ for all θ ∈ Ω. . θ ⎥ ∏ f xi . respectively. θ ⎥ ∏ f xi . where U(X1. Xn be i. θ + ⋅ ⋅ ⋅ + ⎢ f xn . since the discrete case is treated entirely similarly with integrals replaced by summation signs. θ ) ⋅ ⋅ ⋅ f ( xn . r. θ dx1 ⋅ ⋅ ⋅ dxn = g θ . . . iv) Eθ [(∂/∂θ )log f (X. θ )]2. is >0 for all θ ∈ Ω. θ j ≠2 ⎣ ∂θ [( ) ( )] ( ) ( ) ( x .f. θ ∈ Ω. . θ )⎥ ⎥ i i≠j ⎤ ⎦ ( ) ( ⎤ n ∂ f x j . . θ ∂θ ⎥ ⎦ i =1 ⎤ n x j . θ . θ ⎢ ⎣ j =1 ⎡n ∂ = ⎢∑ log f ⎢ ⎣ j =1 ∂θ ( θ )∏ f ( x . Xn) is any unbiased estimator of g(θ ). respectively.i. ∂θ j =1 ⎢ ⎣ ⎡n 1 = ⎢∑ f xj . . θ ) S S THEOREM 2 may be differentiated under the integral or summation si gn. X n S S ( = ∫ ⋅ ⋅ ⋅ ∫ U x1 .d. . We have Eθ U X 1 .v. one has σ θ2 [g ′(θ )] U≥ nI θ 2 () . . Then for any unbiased estimator U = U(X1. θ ) ⎦ j ≠n ⎤ n ⎡ ∂ = ∑⎢ f xj . θ ) and assume that the regularity conditions (i)–(vi) are fulﬁlled.d. . Then we have the following theorem. Also it sufﬁces to discuss the continuous case only. θ ∂θ ⎡∂ ⎤ ⎡∂ f x1 . θ ) dx1 ⋅ ⋅ ⋅ dxn ∑ ⋅ ⋅ ⋅ ∑U ( x1 . Xn) of g(θ ). . Now restricting ourselves to S. . . θ ) ⋅ ⋅ ⋅ f ( xn . . vi) or ∫S ⋅⋅⋅ ∫S U ( x1 . θ ⋅ ⋅ ⋅ f xn . we have ( ) )( ) ( ) () (1) ∂ f x1 .

Differentiating with respect to θ both sides of (1), on account of (vi) and utilizing (2), we obtain

g′(θ) = ∫_S ··· ∫_S U(x₁, ..., xₙ) [Σ_{j=1}^n (∂/∂θ) log f(x_j; θ)] ∏_{i=1}^n f(x_i; θ) dx₁ ··· dxₙ
      = E_θ{U(X₁, ..., Xₙ) [Σ_{j=1}^n (∂/∂θ) log f(X_j; θ)]} = E_θ(UV_θ),    (3)

where we set

V_θ = V_θ(X₁, ..., Xₙ) = Σ_{j=1}^n (∂/∂θ) log f(X_j; θ).

Next,

∫_S ··· ∫_S f(x₁; θ) ··· f(xₙ; θ) dx₁ ··· dxₙ = 1.

Therefore, differentiating both sides with respect to θ by virtue of (iv), and employing (2), it follows that

0 = ∫_S ··· ∫_S [Σ_{j=1}^n (∂/∂θ) log f(x_j; θ)] ∏_{i=1}^n f(x_i; θ) dx₁ ··· dxₙ = E_θV_θ.    (4)

From (3) and (4), it follows that

Cov_θ(U, V_θ) = E_θ(UV_θ) − E_θU · E_θV_θ = E_θ(UV_θ) = g′(θ).    (5)

From (4) and the definition of V_θ (applied with n = 1), we also have E_θ[(∂/∂θ) log f(X₁; θ)] = 0, so that

σ²_θ[(∂/∂θ) log f(X₁; θ)] = E_θ[(∂/∂θ) log f(X₁; θ)]².

It then further follows that

σ²_θV_θ = σ²_θ[Σ_{j=1}^n (∂/∂θ) log f(X_j; θ)] = Σ_{j=1}^n σ²_θ[(∂/∂θ) log f(X_j; θ)]
        = nE_θ[(∂/∂θ) log f(X₁; θ)]² = nE_θ[(∂/∂θ) log f(X; θ)]².    (6)

But

ρ²_θ(U, V_θ) = Cov²_θ(U, V_θ) / [(σ²_θU)(σ²_θV_θ)]

and ρ²_θ(U, V_θ) ≤ 1, which, by Schwarz inequality (Theorem 2, Chapter 5), is equivalent to

Cov²_θ(U, V_θ) ≤ (σ²_θU)(σ²_θV_θ).    (7)

Taking now into consideration (5) and (6), relation (7) becomes

[g′(θ)]² ≤ σ²_θU · nE_θ[(∂/∂θ) log f(X; θ)]²,

or, by means of (v),

σ²_θU ≥ [g′(θ)]² / {nE_θ[(∂/∂θ) log f(X; θ)]²} = [g′(θ)]² / [nI(θ)].    (8)

The proof of the theorem is completed. ▲

DEFINITION 5  The expression E_θ[(∂/∂θ) log f(X; θ)]², denoted by I(θ), is called Fisher's information (about θ) number, and nE_θ[(∂/∂θ) log f(X; θ)]² is the information (about θ) contained in the sample X₁, ..., Xₙ. (For an alternative way of calculating I(θ), see Exercises 12.4.6 and 12.4.7.)

Returning to the proof of Theorem 2, we have that equality holds in (8) if and only if Cov²_θ(U, V_θ) = (σ²_θU)(σ²_θV_θ), because of (7). By Schwarz inequality, this is equivalent to

V_θ = E_θV_θ + k(θ)(U − E_θU)    with P_θ-probability 1,    (9)

where

k(θ) = ± σ_θV_θ / σ_θU.

Taking into consideration (4), the fact that E_θU = g(θ) and the definition of V_θ, equation (9) becomes as follows:

(∂/∂θ) log ∏_{j=1}^n f(X_j; θ) = k(θ)U(X₁, ..., Xₙ) − g(θ)k(θ)    (10)

outside a set N in ℝⁿ such that P_θ[(X₁, ..., Xₙ) ∈ N] = 0 for all θ ∈ Ω. Because of (iii), the exceptional set for which (9) does not hold is independent of θ and has P_θ-probability 0 for all θ ∈ Ω. Integrating (10) (with respect to θ), and assuming that the indefinite integrals ∫k(θ)dθ and ∫g(θ)k(θ)dθ exist, we obtain
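The score identities just established lend themselves to a numerical check. The following sketch in Python (not part of the original text; the N(θ, 1) model and all constants are assumptions chosen for the demonstration) verifies by simulation that the score has mean zero, as in equation (4), that its second moment estimates I(θ), and that the variance of X̄ matches the Cramér–Rao bound 1/n:

```python
import numpy as np

# Illustrative sketch (not from the text): for f(x; theta) = N(theta, 1) the
# score is (d/d theta) log f(x; theta) = x - theta, so E_theta[score] = 0 and
# I(theta) = E_theta[score^2] = 1; the Cramer-Rao bound for unbiased
# estimators of theta is then 1/(n * I(theta)) = 1/n.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 25, 200_000

x = rng.normal(theta, 1.0, size=(reps, n))
score = x - theta                   # score of a single observation
mean_score = score.mean()           # Monte Carlo version of equation (4): ~ 0
info = (score ** 2).mean()          # Monte Carlo estimate of I(theta): ~ 1
var_xbar = x.mean(axis=1).var()     # variance of the unbiased estimator X-bar
cr_bound = 1.0 / (n * info)         # estimated Cramer-Rao bound: ~ 1/n
```

Here X̄ attains the bound, anticipating the examples below.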

log ∏_{j=1}^n f(x_j; θ) = U(x₁, ..., xₙ) ∫k(θ)dθ − ∫g(θ)k(θ)dθ + h̃(x₁, ..., xₙ),    (11)

where h̃(x₁, ..., xₙ) is the "constant" of the integration. Exponentiating both sides of (11), we obtain

∏_{j=1}^n f(x_j; θ) = C(θ) exp[Q(θ)U(x₁, ..., xₙ)] h(x₁, ..., xₙ),    (12)

where

C(θ) = exp[−∫g(θ)k(θ)dθ],  Q(θ) = ∫k(θ)dθ  and  h(x₁, ..., xₙ) = exp[h̃(x₁, ..., xₙ)].

Thus, if equality occurs in the Cramér–Rao inequality for some unbiased estimator, the joint p.d.f. of the X's is of the one-parameter exponential form. More precisely, we have the following result.

COROLLARY  If in Theorem 2 equality occurs for some unbiased estimator U = U(X₁, ..., Xₙ) of g(θ), and if the indefinite integrals ∫k(θ)dθ, ∫g(θ)k(θ)dθ exist, where

k(θ) = ± σ_θV_θ / σ_θU,

then the joint p.d.f. of the X's is of the one-parameter exponential family (and hence U is sufficient for θ); that is,

∏_{j=1}^n f(x_j; θ) = C(θ) exp[Q(θ)U(x₁, ..., xₙ)] h(x₁, ..., xₙ)

outside a set N in ℝⁿ such that P_θ[(X₁, ..., Xₙ) ∈ N] = 0 for all θ ∈ Ω; here C(θ) = exp[−∫g(θ)k(θ)dθ] and Q(θ) = ∫k(θ)dθ.

REMARK 1  Theorem 2 has a certain generalization for the multiparameter case, provided certain conditions are met, but this will not be discussed here.

In connection with the Cramér–Rao bound, we also have the following important result.

THEOREM 3  Let X₁, ..., Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and let g be an estimable real-valued function of θ. We assume that the regularity conditions (i)–(vi) are satisfied. Then, for an unbiased estimator U = U(X₁, ..., Xₙ) of g(θ), σ²_θU is equal to the Cramér–Rao bound if and only if there exists a real-valued function of θ, d(θ), such that

U = g(θ) + d(θ)V_θ

except perhaps on a set of P_θ-probability zero for all θ ∈ Ω.

PROOF  Under the regularity conditions (i)–(vi), we have that

[g′(θ)]² = Cov²_θ(U, V_θ)

by (5), so that, by (7),

σ²_θU ≥ [g′(θ)]² / σ²_θV_θ = [g′(θ)]² / [nI(θ)],

since nI(θ) = σ²_θV_θ by (6). Thus σ²_θU is equal to the Cramér–Rao bound if and only if Cov²_θ(U, V_θ) = (σ²_θU)(σ²_θV_θ), or, equivalently, if and only if U = a(θ) + d(θ)V_θ with P_θ-probability 1 for some functions of θ, a(θ) and d(θ). Taking expectations and utilizing the unbiasedness of U and relation (4), we get a(θ) = g(θ), so that

U = g(θ) + d(θ)V_θ

except perhaps on a set of P_θ-probability 0 for all θ ∈ Ω. Furthermore, because of (iii), the exceptional set for which this relationship does not hold is independent of θ and has P_θ-probability 0 for all θ ∈ Ω. The proof of the theorem is completed. ▲

The following three examples serve to illustrate Theorem 2. The checking of the regularity conditions is left as an exercise.

EXAMPLE 7  Let X₁, ..., Xₙ be i.i.d. r.v.'s from B(1, p), p ∈ (0, 1). By setting p = θ, we have

f(x; θ) = θˣ(1 − θ)^{1−x},  x = 0, 1,

so that

log f(x; θ) = x log θ + (1 − x) log(1 − θ).

Then

(∂/∂θ) log f(x; θ) = x/θ − (1 − x)/(1 − θ)

and

[(∂/∂θ) log f(x; θ)]² = x²/θ² + (1 − x)²/(1 − θ)² − 2x(1 − x)/[θ(1 − θ)].

Since E_θX² = θ, E_θ(1 − X)² = 1 − θ and E_θ[X(1 − X)] = 0 (see Chapter 5), we have

E_θ[(∂/∂θ) log f(X; θ)]² = 1/θ + 1/(1 − θ) = 1/[θ(1 − θ)],

so that the Cramér–Rao bound is equal to θ(1 − θ)/n. Now X̄ is an unbiased estimator of θ and its variance is σ²_θ(X̄) = θ(1 − θ)/n, equal to the Cramér–Rao bound. Therefore X̄ is a UMVU estimator of θ.
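Example 7 can be checked numerically. The following sketch in Python (not part of the original text; θ, n and the replication count are assumptions made for the illustration) simulates B(1, θ) samples and confirms that the variance of X̄ agrees with the Cramér–Rao bound θ(1 − θ)/n:

```python
import numpy as np

# Illustrative sketch (not from the text): in Example 7 the Cramer-Rao bound
# is theta*(1 - theta)/n, and X-bar is unbiased with exactly that variance,
# hence UMVU.
rng = np.random.default_rng(1)
theta, n, reps = 0.3, 20, 400_000

xbar = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)
emp_mean = xbar.mean()                 # ~ theta: X-bar is unbiased
emp_var = xbar.var()                   # ~ theta*(1 - theta)/n
cr_bound = theta * (1 - theta) / n     # the bound of Theorem 2
```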

r. Assume ﬁrst that σ 2 is known and set μ = θ. . we obtain ( ) 2 ⎡∂ ⎤ 1 Eθ ⎢ log f X . we have that X ( ) 2 EXAMPLE 9 Let X1. θ = −1 + ∂θ θ ( ) and ⎡∂ ⎤ 1 2 2 ⎢ log f x. r. . .’s from N(μ. σ ⎝ σ ⎠ ⎣ ∂θ ⎦ ( ) 2 2 Then ⎤ ⎡∂ 1 Eθ ⎢ log f X .d. θ = log⎜ ⎟− 2σ 2 ⎝ 2πσ ⎠ ( ) ) 2 . λ > 0. θ = .’s from P(λ). Since again X ¯ is a UMVU estimator of θ. . Xn be i.v. θ ⎥ = 2 . we have f x. that is. Then f x.i. . . Xn be i. estimator of θ with variance θ/n. EXAMPLE 8 Let X1.i. . x! ( ) x ∂ log f x. 1. ∂θ σ σ ( ) so that ⎡∂ ⎤ 1 ⎛ x −θ⎞ ⎢ log f x. ∂θ θ θ ⎣ ⎦ 2 Since Eθ X = θ and Eθ X = θ (1 + θ ) (see Chapter 5).d. ∂θ θ ⎣ ⎦ ¯ is an unbiased so that the Cramér–Rao bound is equal to θ/n. x = 0. Next. Therefore X is a UMVU estimator of θ. σ 2). . 1 x −θ ∂ log f x. θ ⎥ = . Again by setting λ = θ. θ ⎥ = 2 ⎜ ⎟ . . ⎥ ⎢ ∂θ σ ⎦ ⎣ 2 ( ) .4 The Case Where Complete Sufﬁcient Are Available or May Statistics Not Exist 12.3 The Case ofStatistics Availability ofNot Complete Sufﬁcient 299 2 ¯ ¯ is an unbiased extimator of θ and its variance is σ θ Now X (X ) = ¯ θ(1 − θ )/n. x ∈ޒ ⎥ ⎥ ⎦ and hence ⎛ 1 ⎞ x −θ log f x.v.12. equal to the Cramér–Rao bound. θ = ( ) ⎡ x −θ exp ⎢− ⎢ 2σ 2 2πσ ⎢ ⎣ 1 ( ) ( 2 ⎤ ⎥. so that log f x. θ = e − θ Then ( ) θx . . θ = −θ + x log θ − log x!. θ ⎥ = 1 + 2 x − x. .

Suppose now that μ is known and set σ² = θ. Then

f(x; θ) = [1/√(2πθ)] exp[−(x − μ)²/(2θ)],

so that

log f(x; θ) = −(1/2) log(2π) − (1/2) log θ − (x − μ)²/(2θ)

and

(∂/∂θ) log f(x; θ) = −1/(2θ) + (x − μ)²/(2θ²).

Then

[(∂/∂θ) log f(x; θ)]² = 1/(4θ²) − (1/(2θ²))[(x − μ)/√θ]² + (1/(4θ²))[(x − μ)/√θ]⁴,

and since (X − μ)/√θ is N(0, 1), we obtain

E_θ[(X − μ)/√θ]² = 1  and  E_θ[(X − μ)/√θ]⁴ = 3.

(See Chapter 5.) Therefore

E_θ[(∂/∂θ) log f(X; θ)]² = 1/(4θ²) − 1/(2θ²) + 3/(4θ²) = 1/(2θ²),

and the Cramér–Rao bound is 2θ²/n. Next,

Σ_{j=1}^n [(X_j − μ)/√θ]²  is χ²ₙ

(see first corollary to Theorem 5, Chapter 7), so that

E_θ{Σ_{j=1}^n [(X_j − μ)/√θ]²} = n  and  σ²_θ{Σ_{j=1}^n [(X_j − μ)/√θ]²} = 2n.

Therefore (1/n)Σ_{j=1}^n (X_j − μ)² is an unbiased estimator of θ and its variance is 2θ²/n, equal to the Cramér–Rao bound. Thus (1/n)Σ_{j=1}^n (X_j − μ)² is a UMVU estimator of θ.
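The chi-square calculation in the known-μ case can also be confirmed by simulation. The sketch below (Python; not part of the original text, and μ, θ and n are assumptions chosen for the illustration) checks that (1/n)Σⱼ(Xⱼ − μ)² is unbiased for θ = σ² with variance 2θ²/n:

```python
import numpy as np

# Illustrative sketch (not from the text): with mu known and theta = sigma^2,
# the Cramer-Rao bound of Example 9 is 2*theta^2/n, attained by
# (1/n) * sum_j (X_j - mu)^2, because sum_j (X_j - mu)^2 / theta is
# chi-square with n degrees of freedom (mean n, variance 2n).
rng = np.random.default_rng(2)
mu, theta, n, reps = 1.0, 4.0, 10, 300_000

x = rng.normal(mu, np.sqrt(theta), size=(reps, n))
est = ((x - mu) ** 2).mean(axis=1)  # unbiased estimator of theta
emp_mean = est.mean()               # ~ theta
emp_var = est.var()                 # ~ 2*theta^2/n
cr_bound = 2 * theta ** 2 / n
```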

Finally, we assume that both μ and σ² are unknown and set μ = θ₁, σ² = θ₂. Suppose that we are interested in finding a UMVU estimator of θ₂. Now it has been seen in Example 5 that

(1/(n − 1)) Σ_{j=1}^n (X_j − X̄)²

is a UMVU estimator of θ₂. Since

Σ_{j=1}^n [(X_j − X̄)/√θ₂]²  is χ²_{n−1}

(see second corollary to Theorem 5, Chapter 7), it follows that

σ²_θ[(1/(n − 1)) Σ_{j=1}^n (X_j − X̄)²] = 2θ₂²/(n − 1) > 2θ₂²/n,

the Cramér–Rao bound. By using the generalization we spoke of in Remark 1, it can be seen that the Cramér–Rao bound is again equal to 2θ₂²/n (see Remark 5 in Chapter 7). As a matter of fact, we arrive at the same conclusion by treating θ₁ as a constant and θ₂ as the (unknown) parameter θ and calculating the Cramér–Rao bound provided by Theorem 2. This then is an example of a case where a UMVU estimator does exist but its variance is larger than the Cramér–Rao bound.

A UMVU estimator of g(θ) is also called an efficient estimator of g(θ) (in the sense of variance). Thus, if U is a UMVU estimator of g(θ) and U* is any other unbiased estimator of g(θ), then the quantity σ²_θU/(σ²_θU*) may serve as a measure of expressing the efficiency of U* relative to that of U. It is known as the relative efficiency (r.eff.) of U* and, clearly, takes values in (0, 1].

REMARK 2  Corollary D in Chapter 6 indicates the sort of conditions which would guarantee the fulfillment of the regularity conditions (iv) and (vi).

Exercises

12.4.1  Let X₁, ..., Xₙ be i.i.d. r.v.'s from the Gamma distribution with α known and β = θ ∈ Ω = (0, ∞) unknown. Then show that the UMVU estimator of θ is

U(X₁, ..., Xₙ) = (1/(nα)) Σ_{j=1}^n X_j,

and that its variance attains the Cramér–Rao bound.
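That S² fails to attain the bound, yet remains UMVU, is easy to see numerically. The following sketch in Python (not part of the original text; all constants are assumptions) compares the variance of S² with the Cramér–Rao bound and computes the resulting relative efficiency (n − 1)/n:

```python
import numpy as np

# Illustrative sketch (not from the text): with both mu and sigma^2 unknown,
# the UMVU estimator S^2 = (1/(n-1)) * sum_j (X_j - X-bar)^2 of
# theta2 = sigma^2 has variance 2*theta2^2/(n-1), strictly above the
# Cramer-Rao bound 2*theta2^2/n; its efficiency relative to the bound is
# therefore (n-1)/n < 1.
rng = np.random.default_rng(3)
mu, theta2, n, reps = 0.0, 1.0, 8, 400_000

x = rng.normal(mu, np.sqrt(theta2), size=(reps, n))
s2 = x.var(axis=1, ddof=1)              # the UMVU estimator S^2
emp_var = s2.var()                      # ~ 2*theta2^2/(n - 1)
exact_var = 2 * theta2 ** 2 / (n - 1)
cr_bound = 2 * theta2 ** 2 / n
rel_eff = cr_bound / exact_var          # = (n - 1)/n
```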

12.4.2  Refer to Exercise 12.3.5 and investigate whether the Cramér–Rao bound is attained.

12.4.3  Refer to Exercise 12.3.6 and investigate whether the Cramér–Rao bound is attained.

12.4.4  Refer to Exercise 12.3.7 and show that the Cramér–Rao bound is not attained for the UMVU estimator of g(θ) = θ².

12.4.5  Refer to Exercise 12.3.11 and investigate whether the Cramér–Rao bound is attained.

12.4.6  Assume conditions (i) and (ii) listed just before Theorem 2, and also suppose that (∂²/∂θ²)f(x; θ) exists for all θ ∈ Ω and all x ∈ S except, perhaps, on a set N ⊂ S with P_θ(X ∈ N) = 0 for all θ ∈ Ω. Furthermore, suppose that

∫_S (∂²/∂θ²)f(x; θ) dx = 0  or  Σ_S (∂²/∂θ²)f(x; θ) = 0,

for the continuous and the discrete case, respectively. Then show that

I(θ) = −E_θ[(∂²/∂θ²) log f(X; θ)].

12.4.7  In Exercises 12.4.1–12.4.5, recalculate I(θ) and the Cramér–Rao bound by utilizing Exercise 12.4.6 where appropriate.

12.4.8  Let X₁, ..., Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ. For an estimator V = V(X₁, ..., Xₙ) of θ for which E_θV is finite, write E_θV = θ + b(θ), where b(θ) is called the bias of V, and set b′(θ) = db(θ)/dθ. Show that, under the regularity conditions (i)–(vi) preceding Theorem 2 (where (vi) is assumed to hold true for all estimators for which the integral (sum) is finite), one has

σ²_θV ≥ [1 + b′(θ)]² / {nE_θ[(∂/∂θ) log f(X; θ)]²}.

(This inequality is established along the same lines as those used in proving Theorem 2.)

12.5 Criteria for Selecting an Estimator: The Maximum Likelihood Principle

So far we have concerned ourselves with the problem of finding an estimator on the basis of the criteria of unbiasedness and minimum variance. Another principle which is very often used is that of the maximum likelihood.

Let X₁, ..., Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝʳ, and consider the joint p.d.f. of the X's,

f(x₁; θ) ··· f(xₙ; θ).

Treating the x's as if they were constants and looking at this joint p.d.f. as a function of θ, we denote it by L(θ|x₁, ..., xₙ) and call it the likelihood function.

DEFINITION 6  The estimate θ̂ = θ̂(x₁, ..., xₙ) is called a maximum likelihood estimate (MLE) of θ if

L(θ̂|x₁, ..., xₙ) = max[L(θ|x₁, ..., xₙ); θ ∈ Ω];

the statistic θ̂(X₁, ..., Xₙ) is called an ML estimator (MLE for short) of θ.

In order to give an intuitive interpretation of an MLE, suppose first that the X's are discrete. Then

L(θ|x₁, ..., xₙ) = P_θ(X₁ = x₁, ..., Xₙ = xₙ);

that is, L(θ|x₁, ..., xₙ) is the probability of observing the x's which were actually observed. Then it is intuitively clear that one should select as an estimate of θ that θ which maximizes the probability of observing the x's which were actually observed, if such a θ exists. A similar interpretation holds true for the case that the X's are continuous, by replacing L(θ|x₁, ..., xₙ) with the probability element L(θ|x₁, ..., xₙ)dx₁ ··· dxₙ, which represents the probability (under P_θ) that X_j lies between x_j and x_j + dx_j, j = 1, ..., n.

Although the principle of maximum likelihood does not seem to be justifiable by a purely mathematical reasoning, it does provide a method for producing estimates in many cases of practical importance. In addition, an MLE is often shown to have several desirable properties, as will become apparent from examples to be discussed below. We will elaborate on this point later. In many important cases there is a unique MLE, which we then call the MLE and which is often obtained by differentiation.

REMARK 3  Since the function y = log x, x > 0 is strictly increasing, in order to maximize (with respect to θ) L(θ|x₁, ..., xₙ) in the case that Ω ⊆ ℝ, it suffices to maximize log L(θ|x₁, ..., xₙ). This is much more convenient to work with.

The method of maximum likelihood estimation will now be applied to a number of concrete examples.

EXAMPLE 10  Let X₁, ..., Xₙ be i.i.d. r.v.'s from P(θ). Then

L(θ|x₁, ..., xₙ) = e^{−nθ} (1/∏_{j=1}^n x_j!) θ^{Σ_{j=1}^n x_j}

and hence

log L(θ|x₁, ..., xₙ) = −log(∏_{j=1}^n x_j!) − nθ + (Σ_{j=1}^n x_j) log θ.

Therefore the likelihood equation

(∂/∂θ) log L(θ|x₁, ..., xₙ) = 0  becomes  −n + nx̄/θ = 0,

which gives θ̃ = x̄. Next,

(∂²/∂θ²) log L(θ|x₁, ..., xₙ) = −nx̄/θ² < 0  for all θ > 0,

in particular for θ = θ̃. Thus θ̃ = x̄ maximizes the likelihood function, and hence θ̂ = x̄ is the MLE of θ.

EXAMPLE 11  Let X₁, ..., X_r be multinomially distributed r.v.'s with parameter θ = (p₁, ..., p_r)′ ∈ Ω, where Ω is the (r − 1)-dimensional hyperplane in ℝʳ defined by

Ω = {θ = (p₁, ..., p_r)′ ∈ ℝʳ; p_j > 0, j = 1, ..., r and Σ_{j=1}^r p_j = 1}.

Then

L(θ|x₁, ..., x_r) = [n!/(∏_{j=1}^r x_j!)] p₁^{x₁} ··· p_r^{x_r}
                  = [n!/(∏_{j=1}^r x_j!)] p₁^{x₁} ··· p_{r−1}^{x_{r−1}} (1 − p₁ − ··· − p_{r−1})^{x_r},

where n = Σ_{j=1}^r x_j, so that

log L(θ|x₁, ..., x_r) = log[n!/(∏_{j=1}^r x_j!)] + x₁ log p₁ + ··· + x_{r−1} log p_{r−1} + x_r log(1 − p₁ − ··· − p_{r−1}).

Differentiating with respect to p_j, j = 1, ..., r − 1 and equating the resulting expressions to zero, we get

x_j (1/p_j) − x_r (1/p_r) = 0,  j = 1, ..., r − 1.

This is equivalent to

x_j/p_j = x_r/p_r,  j = 1, ..., r − 1,

that is,

x₁/p₁ = ··· = x_{r−1}/p_{r−1} = x_r/p_r,

and this common value is equal to

(x₁ + ··· + x_{r−1} + x_r)/(p₁ + ··· + p_{r−1} + p_r) = n/1 = n.

Hence x_j/p_j = n and p_j = x_j/n, j = 1, ..., r. It can be seen that these values of the p's actually maximize the likelihood function, and therefore p̂_j = x_j/n, j = 1, ..., r are the MLE's of the p's. (See Exercise 12.5.4.)
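The closed form of Example 10 can be confirmed by direct maximization. The sketch below (Python; not part of the original text, with the sample and grid chosen as assumptions) maximizes the Poisson log-likelihood over a fine grid and recovers x̄:

```python
import numpy as np

# Illustrative sketch (not from the text): for a Poisson sample, Example 10
# gives the MLE in closed form as x-bar. Maximizing log L(theta | x), up to
# the theta-free constant -log(prod_j x_j!), over a fine grid recovers the
# same value to within the grid spacing.
rng = np.random.default_rng(4)
x = rng.poisson(3.0, size=50)

def log_lik(theta):
    # log L(theta | x_1, ..., x_n) without the additive constant
    return -len(x) * theta + x.sum() * np.log(theta)

grid = np.linspace(0.5, 8.0, 100_001)
theta_hat = grid[np.argmax(log_lik(grid))]   # ~ x.mean()
```

The log-likelihood is strictly concave here, so the grid maximizer is within one grid step of x̄.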

EXAMPLE 12  Let X₁, ..., Xₙ be i.i.d. r.v.'s from N(μ, σ²) with parameter θ = (μ, σ²)′. Then

L(θ|x₁, ..., xₙ) = [1/(2πσ²)]^{n/2} exp[−(1/(2σ²)) Σ_{j=1}^n (x_j − μ)²],

so that

log L(θ|x₁, ..., xₙ) = −(n/2) log(2π) − (n/2) log σ² − (1/(2σ²)) Σ_{j=1}^n (x_j − μ)².

Differentiating with respect to μ and σ² and equating the resulting expressions to zero, we obtain

(∂/∂μ) log L(θ|x₁, ..., xₙ) = (n/σ²)(x̄ − μ) = 0,
(∂/∂σ²) log L(θ|x₁, ..., xₙ) = −n/(2σ²) + (1/(2σ⁴)) Σ_{j=1}^n (x_j − μ)² = 0.

Then

μ̃ = x̄  and  σ̃² = (1/n) Σ_{j=1}^n (x_j − x̄)²

are the roots of these equations. It is further shown that μ̃ and σ̃² actually maximize the likelihood function (see Exercise 12.5.5), and therefore

μ̂ = x̄  and  σ̂² = (1/n) Σ_{j=1}^n (x_j − x̄)²

are the MLE's of μ and σ², respectively.

Now, if we assume that σ² is known and set μ = θ, then we have again that μ̃ = x̄ is the root of the equation (∂/∂θ) log L(θ|x₁, ..., xₙ) = 0. In this case it is readily seen that

(∂²/∂θ²) log L(θ|x₁, ..., xₙ) = −n/σ² < 0,

and hence μ̂ = x̄ is the MLE of μ. On the other hand, if μ is known and we set σ² = θ, then the root of

(∂/∂θ) log L(θ|x₁, ..., xₙ) = 0

is equal to (1/n) Σ_{j=1}^n (x_j − μ)².
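Example 12 can likewise be verified numerically. The sketch below (Python; not part of the original text, and the sample is an assumption) computes the closed-form MLE's and checks that the log-likelihood at (μ̂, σ̂²) is not beaten by nearby parameter values:

```python
import numpy as np

# Illustrative sketch (not from the text): the closed-form MLE's of
# Example 12 are mu-hat = x-bar and sigma2-hat = (1/n) * sum_j (x_j - x-bar)^2
# (divisor n, not n - 1); the log-likelihood at (mu-hat, sigma2-hat) is no
# smaller than at nearby perturbed parameter values.
rng = np.random.default_rng(5)
x = rng.normal(2.0, 1.5, size=100)
n = len(x)

def log_lik(mu, s2):
    return -0.5 * n * np.log(2 * np.pi * s2) - ((x - mu) ** 2).sum() / (2 * s2)

mu_hat = x.mean()
s2_hat = ((x - mu_hat) ** 2).mean()

best = log_lik(mu_hat, s2_hat)
perturbed = [log_lik(mu_hat + dm, s2_hat + ds)
             for dm in (-0.1, 0.0, 0.1) for ds in (-0.1, 0.0, 0.1)]
is_max = max(perturbed) <= best + 1e-9   # (mu-hat, s2-hat) beats its neighbors
```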

Next,

(∂²/∂θ²) log L(θ|x₁, ..., xₙ) = (1/σ⁴)[n/2 − (1/σ²) Σ_{j=1}^n (x_j − μ)²],

which, for σ² equal to (1/n) Σ_{j=1}^n (x_j − μ)², becomes

(1/σ⁴)(n/2 − n) = −n/(2σ⁴) < 0.

So

σ̂² = (1/n) Σ_{j=1}^n (x_j − μ)²

is the MLE of σ² in this case.

EXAMPLE 13  Let X₁, ..., Xₙ be i.i.d. r.v.'s from U(α, β). Here θ = (α, β)′ ∈ Ω, which is the part of the plane above the main diagonal. Then

L(θ|x₁, ..., xₙ) = [1/(β − α)ⁿ] I_{[α,∞)}(x₍₁₎) I_{(−∞,β]}(x₍ₙ₎).

Here the likelihood function is not differentiable with respect to α and β, but it is, clearly, maximized when β − α is minimum, subject to the conditions that α ≤ x₍₁₎ and β ≥ x₍ₙ₎. This happens when α = x₍₁₎ and β = x₍ₙ₎. Thus α̂ = x₍₁₎ and β̂ = x₍ₙ₎ are the MLE's of α and β, respectively. If β is known and α = θ, or if α is known and β = θ, then, clearly, x₍₁₎ and x₍ₙ₎ are the MLE's of α and β, respectively.

In particular, if α = θ − c, β = θ + c, where c is a given positive constant, then

L(θ|x₁, ..., xₙ) = [1/(2c)ⁿ] I_{[θ−c,∞)}(x₍₁₎) I_{(−∞,θ+c]}(x₍ₙ₎).

The likelihood function is maximized, and its maximum is 1/(2c)ⁿ, for any θ such that θ − c ≤ x₍₁₎ and θ + c ≥ x₍ₙ₎, or, equivalently, x₍ₙ₎ − c ≤ θ ≤ x₍₁₎ + c. This shows that any statistic that lies between X₍ₙ₎ − c and X₍₁₎ + c is an MLE of θ. For example, (1/2)[X₍₁₎ + X₍ₙ₎] is such a statistic and hence an MLE of θ.

REMARK 4
(i) The MLE may be a UMVU estimator. This, clearly, happens in Example 10, for μ̂ in Example 12, and also for σ̂² in the same example when μ is known.
(ii) The MLE need not be UMVU. This is the case, for instance, in Example 12 for σ̂² when μ is unknown.
(iii) The MLE is not always obtainable by differentiation. This is the case in Example 13.
(iv) There may be more than one MLE. This case occurs in Example 13 when α = θ − c, β = θ + c, c > 0.
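The flat likelihood of Example 13 (case α = θ − c, β = θ + c) is easy to exhibit numerically. The sketch below (Python; not part of the original text, with c, θ and n chosen as assumptions) shows that the likelihood takes the same maximal value 1/(2c)ⁿ over the whole interval of MLE's:

```python
import numpy as np

# Illustrative sketch (not from the text): in Example 13 with
# alpha = theta - c, beta = theta + c, the likelihood equals 1/(2c)^n for
# every theta with x_(n) - c <= theta <= x_(1) + c, so every such theta is
# an MLE; the midpoint (x_(1) + x_(n))/2 is one of them.
rng = np.random.default_rng(6)
c, theta_true, n = 1.0, 5.0, 30
x = rng.uniform(theta_true - c, theta_true + c, size=n)
x_min, x_max = x.min(), x.max()

def likelihood(theta):
    inside = (theta - c <= x_min) and (x_max <= theta + c)
    return (2 * c) ** (-n) if inside else 0.0

lo, hi = x_max - c, x_min + c          # the whole interval of MLE's
midpoint = 0.5 * (x_min + x_max)       # one convenient MLE
flat = likelihood(lo) == likelihood(midpoint) == likelihood(hi) == (2 * c) ** (-n)
```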

In the following, we present two of the general properties that an MLE enjoys.

THEOREM 4  Let X₁, ..., Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝʳ, and let T = (T₁, ..., T_r)′, T_j = T_j(X₁, ..., Xₙ), j = 1, ..., r be a sufficient statistic for θ = (θ₁, ..., θ_r)′. Then, if θ̂ = (θ̂₁, ..., θ̂_r)′ is the unique MLE of θ, it will have to be a function of T.

PROOF  Since T is sufficient, Theorem 1 in Chapter 11 implies the following factorization:

f(x₁; θ) ··· f(xₙ; θ) = g[T(x₁, ..., xₙ); θ] h(x₁, ..., xₙ),

where h is independent of θ. Therefore

max{f(x₁; θ) ··· f(xₙ; θ); θ ∈ Ω} = h(x₁, ..., xₙ) max{g[T(x₁, ..., xₙ); θ]; θ ∈ Ω}.

Since the right-hand side depends on the x's only through T(x₁, ..., xₙ), it follows that, if θ̂ is the unique MLE of θ, it is a function of T. ▲

REMARK 5  Notice that the conclusion of the theorem holds true in all Examples 10–13. See also Exercise 12.5.10.

Another optimal property of an MLE is invariance, as is proved in the following theorem. That is, an MLE is invariant under one-to-one transformations.

THEOREM 5  Let X₁, ..., Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝʳ, and let φ be defined on Ω onto Ω* ⊆ ℝᵐ and let it be one-to-one. Suppose θ̂ is an MLE of θ. Then φ(θ̂) is an MLE of φ(θ).

PROOF  Set θ* = φ(θ), so that θ = φ⁻¹(θ*). Then

L(θ|x₁, ..., xₙ) = L[φ⁻¹(θ*)|x₁, ..., xₙ],

which we denote by L*(θ*|x₁, ..., xₙ). It follows that

max{L(θ|x₁, ..., xₙ); θ ∈ Ω} = max{L*(θ*|x₁, ..., xₙ); θ* ∈ Ω*}.

By assuming the existence of an MLE, we have that the maximum on the left-hand side above is attained at an MLE θ̂. Therefore, as it follows from the right-hand side of the equation above, the maximum of L* is attained at θ̂* = φ(θ̂). Thus φ(θ̂) is an MLE of φ(θ). ▲

For instance, since

(1/n) Σ_{j=1}^n (x_j − x̄)²

is the MLE of σ² in the normal case (see Example 12), it follows that

[(1/n) Σ_{j=1}^n (x_j − x̄)²]^{1/2}

is the MLE of σ.
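The invariance property can be illustrated numerically as follows. The sketch below (Python; not part of the original text, with the sample and grid as assumptions) maximizes the normal log-likelihood directly over σ and recovers the square root of the MLE of σ²:

```python
import numpy as np

# Illustrative sketch (not from the text): by the invariance property
# (Theorem 5), the MLE of sigma is the square root of the MLE of sigma^2.
# Maximizing the normal log-likelihood directly over sigma on a grid
# recovers sqrt(sigma2_hat).
rng = np.random.default_rng(7)
x = rng.normal(0.0, 2.0, size=200)
n = len(x)
sigma2_hat = ((x - x.mean()) ** 2).mean()   # MLE of sigma^2 (Example 12)

def log_lik_sigma(s):
    # log-likelihood as a function of sigma, with mu fixed at x-bar
    return -n * np.log(s) - ((x - x.mean()) ** 2).sum() / (2 * s ** 2)

grid = np.linspace(0.5, 5.0, 200_001)
sigma_hat = grid[np.argmax(log_lik_sigma(grid))]   # ~ sqrt(sigma2_hat)
```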

Exercises

12.5.1  If X₁, ..., Xₙ are i.i.d. r.v.'s from B(m, θ), θ ∈ Ω = (0, 1), show that X̄/m is the MLE of θ.

12.5.2  If X₁, ..., Xₙ are i.i.d. r.v.'s from the Negative Exponential distribution with parameter θ ∈ Ω = (0, ∞), show that 1/X̄ is the MLE of θ.

12.5.3  If X₁, ..., Xₙ are i.i.d. r.v.'s from the Negative Binomial distribution with parameter θ ∈ Ω = (0, 1), show that r/(r + X̄) is the MLE of θ.

12.5.4  Refer to Example 11 and show that the quantities p̂_j = x_j/n, j = 1, ..., r indeed maximize the likelihood function.

12.5.5  Refer to Example 12 and consider the case that both μ and σ² are unknown. Then show that the quantities μ̃ = x̄ and

σ̃² = (1/n) Σ_{j=1}^n (x_j − x̄)²

indeed maximize the likelihood function.

12.5.6  Suppose that certain particles are emitted by a radioactive source (whose strength remains the same over a long period of time) according to a Poisson distribution with parameter θ during a unit of time. The source in question is observed for n time units; let X be the r.v. denoting the number of times that no particles were emitted. Find the MLE of θ in terms of X.

12.5.7  Let X₁, ..., Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ₁, θ₂), θ = (θ₁, θ₂)′ ∈ Ω = ℝ × (0, ∞), given by

f(x; θ₁, θ₂) = (1/θ₂) exp[−(x − θ₁)/θ₂],  x ≥ θ₁.

Find the MLE's of θ₁, θ₂.

12.5.8  Let X₁, ..., Xₙ be i.i.d. r.v.'s from the U(θ − 1/2, θ + 1/2) distribution, θ ∈ Ω = ℝ, and find the MLE of θ.

12.5.9  Refer to Exercise 12.3.10 and find the MLE of the reliability ℛ(x; θ).

12.5.10  Let X₁, ..., Xₙ be i.i.d. r.v.'s from the U(θ − 1/2, θ + 1/2) distribution, θ ∈ Ω = ℝ, and let θ̂ = θ̂(X₁, ..., Xₙ) be defined by

θ̂ = (X₍ₙ₎ − 1/2) + cos²(X₁)(X₍₁₎ − X₍ₙ₎ + 1).

Then show that θ̂ is an MLE of θ, but it is not a function only of the sufficient statistic (X₍₁₎, X₍ₙ₎)′. (Thus Theorem 4 need not be correct if there exists more than one MLE of the parameters involved.) For this, see also the paper Maximum Likelihood and Sufficient Statistics by D. S. Moore in the American Mathematical Monthly, Vol. 78, No. 1, January 1971, pp. 42–45.

12.6 Criteria for Selecting an Estimator: The Decision-Theoretic Approach

We will first develop the general theory underlying the decision-theoretic method of estimation and then we will illustrate the theory by means of concrete examples. In this section, we will restrict ourselves to a real-valued parameter. So let X₁, ..., Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ. Our problem is that of estimating θ.

DEFINITION 7  A decision function (or rule) δ is a (measurable) function defined on ℝⁿ into ℝ. The value δ(x₁, ..., xₙ) of δ at (x₁, ..., xₙ)′ is called a decision.

DEFINITION 8  For estimating θ on the basis of X₁, ..., Xₙ and by using the decision function δ, a loss function is a nonnegative function in the arguments θ and δ(x₁, ..., xₙ) which expresses the (financial) loss incurred when θ is estimated by δ(x₁, ..., xₙ). The loss functions which are usually used are of the following form:

L[θ; δ(x₁, ..., xₙ)] = |θ − δ(x₁, ..., xₙ)|,

or, more generally,

L[θ; δ(x₁, ..., xₙ)] = v(θ)|θ − δ(x₁, ..., xₙ)|ᵏ,  k > 0,

or L[θ; δ(x₁, ..., xₙ)] is taken to be a convex function of θ. The most convenient form of a loss function is the squared loss function:

L[θ; δ(x₁, ..., xₙ)] = [θ − δ(x₁, ..., xₙ)]².

DEFINITION 9  The risk function corresponding to the loss function L(·; ·) is denoted by R(·; ·) and is defined by

R(θ; δ) = E_θL[θ; δ(X₁, ..., Xₙ)]
        = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} L[θ; δ(x₁, ..., xₙ)] f(x₁; θ) ··· f(xₙ; θ) dx₁ ··· dxₙ

for the continuous case, and

R(θ; δ) = Σ_{x₁} ··· Σ_{xₙ} L[θ; δ(x₁, ..., xₙ)] f(x₁; θ) ··· f(xₙ; θ)

for the discrete case.

That is, the risk corresponding to a given decision function is simply the average loss incurred if that decision function is used. It is assumed, of course, that a certain loss function is chosen and then kept fixed throughout. In the present context of (point) estimation, the decision δ = δ(x₁, ..., xₙ) will be called an estimate of θ, and its goodness will be judged on the basis of its risk R(·; δ).

An apparently obvious answer to the question of selecting an estimator would be to choose an estimator δ such that R(θ; δ) ≤ R(θ; δ*) for any other estimator δ* and for all θ ∈ Ω. Unfortunately, such estimators do not exist except in trivial cases. However, if we restrict ourselves only to the class of unbiased estimators with finite variance and take the loss function to be the squared loss function (see the paragraph following Definition 8), then R(θ; δ) becomes simply the variance of δ(X₁, ..., Xₙ). The criterion proposed above for selecting δ then coincides with that of finding a UMVU estimator. This problem has already been discussed in Section 3 and Section 4. Actually, some authors discuss UMVU estimators as a special case within the decision-theoretic approach just mentioned. However, we believe that the approach adopted here is more pedagogic and easier for the reader to follow.

To start with, in searching for an estimator with some optimal properties, we first rule out those estimates which are not admissible (inadmissible), where

DEFINITION 10  The estimator δ of θ is said to be admissible if there is no other estimator δ* of θ such that R(θ; δ*) ≤ R(θ; δ) for all θ ∈ Ω, with strict inequality for at least one θ.

Next, it suffices to restrict ourselves to an essentially complete class of estimators, where

DEFINITION 11  A class D of estimators of θ is said to be essentially complete if for any estimator δ* of θ not in D one can find an estimator δ in D such that R(θ; δ) ≤ R(θ; δ*) for all θ ∈ Ω.

Two decision functions δ and δ* such that R(θ; δ) = R(θ; δ*) for all θ ∈ Ω are said to be equivalent. Since for any two equivalent estimators δ and δ* we have R(θ; δ) = R(θ; δ*) for all θ ∈ Ω, while we may still confine ourselves to an essentially complete class of estimators, we may not rule out inadmissible estimators; for it might so happen that an essentially complete class contains estimators which are inadmissible. Thus we confine our attention to an essentially complete class of admissible estimators.

Once this has been done, the question arises as to which member of this class is to be chosen as an estimator of θ. Setting aside the fruitless search for an estimator which would uniformly (in θ) minimize the risk within the entire class of admissible estimators, there are two principles on which our search may be based. The first is to look for an estimator which minimizes the worst which could happen to us, that is, to minimize the maximum (over θ) risk. Such an estimator, if it exists, is called a minimax (from minimizing the maximum) estimator.

λ. δ * . Legitimate objections to minimax principles like the one just cited prompted the advancement of the concept of a Bayes estimate.3. some further notation is required.f. the estimator δ is said to be minimax if for any other estimator δ*. Let D2 be the class of all estimators for which R(δ ) is ﬁnite for a given prior p. in Fig.2. Then it makes sense to choose that δ for which R(δ) ≤ R(δ*) for any other δ*. For example. which is . 12. Such a δ is called a Bayes estimator of θ. the assumption of θ being an r. ␦ ) R(0. θ ∈ Ω ≤ sup R θ . Then [( ) ] [( ) ] () ( ) ( )() ( )() DEFINITION 13 Within the class D2.2 illustrates the fact that a minimax estimator may be inadmissible. it is much worse than δ* at almost all other points. θ may denote the (unknown but ﬁxed) distance between Chicago and New York City.f. one has sup R θ . λ on Ω) if R(δ ) ≤ R(δ*) for any other estimator δ*. ␦ ϭ minimax R R(• . For example.2 0 Figure 12. ⎩Ω Assuming that the quantity just deﬁned is ﬁnite. ␦ ). δ .v.d. (See Fig. and suppose now that θ is an r. It should be pointed out at the outset that the Bayes approach to estimation poses several issues that we have to reckon with. ␦ *) R(• .d. whereas the minimax estimate δ is slightly better at its maximum R(θ0. Then we have the following deﬁnition: DEFINITION 12 Within the class D1.d. it is clear that R(δ ) is simply the average (with respect to λ) risk over the entire parameter space Ω when the estimator δ is employed.3 The Casean of Estimator: Availability of Complete Sufﬁcient Approach Statistics 311 R R(• .3 0 a minimax estimator is inadmissible.v. First. provided it exists. θ ∈ Ω . δ = ⎨ ⎪∑ R θ . might be entirely unreasonable.6 Criteria for Selecting The Decision-Theoretic 12. the estimator δ is said to be a Bayes estimator (in the decision-theoretic sense and with respect to the prior p. itself with p. λ on Ω.12.) Instead.f. ␦ ) 0 Figure 12. to be called a prior p. 12. δ ) is ﬁnite for all θ ∈ Ω. δ λ θ dθ ⎪∫Ω R δ = Eλ R θ . 
12.7 Finding Bayes Estimators

Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ), θ ∈ Ω ⊆ ℝ. Let θ be an r.v. itself with prior p.d.f. λ on Ω, and consider the squared loss function

L[θ, δ(x1, . . . , xn)] = [θ − δ(x1, . . . , xn)]²

for an estimate δ = δ(x1, . . . , xn). We should like to mention once and for all that the results in the following two sections are derived by employing squared loss functions. It should be emphasized, however, that the same results may, in principle, be discussed by using other loss functions.

Besides the conceptual difficulty already discussed, there still is a problem in choosing the prior λ on Ω. Of course, there are infinitely many such choices; however, in concrete cases, choices do suggest themselves. For instance, prior experience might suggest that it is more likely that the true parameter lies in a given subset of Ω rather than in its complement. Then, in choosing λ, it is sensible to assign more weight to the subset in question than to its complement. Thus we have the possibility of incorporating prior information about θ, or expressing our prior opinion about θ, and also of weighing the parameters the way we feel appropriate. Another decisive factor in choosing λ is that of mathematical convenience: we are forced to select λ so that the resulting formulas can be handled.

This granted, we are interested in determining δ so that it will be a Bayes estimate (of θ in the decision-theoretic sense). We consider the continuous case, since the discrete case is handled similarly with the integrals replaced by summation signs. We have

R(θ, δ) = Eθ[θ − δ(X1, . . . , Xn)]²
        = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} [θ − δ(x1, . . . , xn)]² f(x1, θ) · · · f(xn, θ) dx1 · · · dxn.

Therefore

R(δ) = ∫_Ω R(θ, δ)λ(θ)dθ
     = ∫_Ω {∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} [θ − δ(x1, . . . , xn)]² f(x1, θ) · · · f(xn, θ) dx1 · · · dxn} λ(θ)dθ
     = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} {∫_Ω [θ − δ(x1, . . . , xn)]² λ(θ)f(x1, θ) · · · f(xn, θ) dθ} dx1 · · · dxn.   (13)

(As can be shown, the interchange of the order of integration is valid here because the integrand is nonnegative; the theorem used is known as the Fubini theorem.) From (13), it follows that if δ is chosen so that

∫_Ω [θ − δ(x1, . . . , xn)]² λ(θ)f(x1, θ) · · · f(xn, θ) dθ

is minimized for each (x1, . . . , xn)′, then R(δ) is also minimized. But

∫_Ω [θ − δ(x1, . . . , xn)]² λ(θ)f(x1, θ) · · · f(xn, θ) dθ
  = δ²(x1, . . . , xn) ∫_Ω f(x1, θ) · · · f(xn, θ)λ(θ)dθ
    − 2δ(x1, . . . , xn) ∫_Ω θf(x1, θ) · · · f(xn, θ)λ(θ)dθ
    + ∫_Ω θ²f(x1, θ) · · · f(xn, θ)λ(θ)dθ,   (14)

and the right-hand side of (14) is of the form

g(t) = at² − 2bt + c   (a > 0),

which is minimized for t = b/a. (In fact, g′(t) = 2at − 2b = 0 implies t = b/a, and g″(t) = 2a > 0.) Thus the required estimate is given by

δ(x1, . . . , xn) = ∫_Ω θf(x1, θ) · · · f(xn, θ)λ(θ)dθ / ∫_Ω f(x1, θ) · · · f(xn, θ)λ(θ)dθ.   (15)

Formalizing this result, we have the following theorem:

THEOREM 6  A Bayes estimate δ(x1, . . . , xn) (of θ) corresponding to a prior p.d.f. λ on Ω for which, for each (x1, . . . , xn)′,

0 < ∫_Ω f(x1, θ) · · · f(xn, θ)λ(θ)dθ < ∞  and  ∫_Ω θf(x1, θ) · · · f(xn, θ)λ(θ)dθ < ∞,

is given by (15), provided λ is of the continuous type. Integrals in (15) are to be replaced by summation signs if λ is of the discrete type.

Setting x = (x1, . . . , xn)′, we now determine the conditional p.d.f. of θ, given X1 = x1, . . . , Xn = xn. From the definition of a conditional p.d.f., we have

h(θ|x) = f(x1, θ) · · · f(xn, θ)λ(θ) / h(x),  where  h(x) = ∫_Ω f(x1, θ) · · · f(xn, θ)λ(θ)dθ.   (17)

This is called the posterior p.d.f. of θ and represents our revised opinion about θ after new evidence (the observed X's) has come in. By means of (15) and (17), it follows then that the Bayes estimate of θ (in the decision-theoretic sense), δ(x1, . . . , xn), is the expectation of θ with respect to its posterior p.d.f.; that is,

δ(x1, . . . , xn) = ∫_Ω θh(θ|x)dθ   (16)

for the case that λ is of the continuous type.

REMARK 6  At this point, let us make the following observation regarding the maximum likelihood and the Bayesian approach to estimation problems. Since f(x1, θ) · · · f(xn, θ) = L(θ|x), the likelihood function, h(·|x) may be written as follows:

h(θ|x) = L(θ|x)λ(θ) / h(x).

Now let us suppose that Ω is bounded and let λ be constant on Ω, λ(θ) = c, θ ∈ Ω. Then it follows from (17) that the likelihood function is maximized if and only if the posterior p.d.f. is. Consequently, the MLE of θ, if it exists, is simply that value of θ which maximizes h(·|x); that is, the mode of h(·|x). Thus, when no prior knowledge about θ is available (which is expressed by taking λ(θ) = c, θ ∈ Ω), this observation establishes a link between maximum likelihood and Bayes estimates and provides insight into each other. Another Bayesian estimate of θ could be provided by the median of h(·|x), or the mode of h(·|x), if it exists. Some examples follow.

EXAMPLE 14  Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, θ), θ ∈ Ω = (0, 1). We choose λ to be the Beta density with parameters α and β; that is,

λ(θ) = [Γ(α + β)/(Γ(α)Γ(β))] θ^{α−1}(1 − θ)^{β−1}  if θ ∈ (0, 1),  and λ(θ) = 0 otherwise.

Then, by virtue of (15), and writing Σ_j x_j rather than Σ_{j=1}^n x_j for simplicity when this last expression appears as an exponent, we have

I1 = ∫_Ω f(x1, θ) · · · f(xn, θ)λ(θ)dθ
   = ∫_0^1 θ^{Σ_j x_j}(1 − θ)^{n−Σ_j x_j} [Γ(α + β)/(Γ(α)Γ(β))] θ^{α−1}(1 − θ)^{β−1} dθ
   = [Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 θ^{α+Σ_j x_j−1}(1 − θ)^{β+n−Σ_j x_j−1} dθ.

By means of the identity

∫_0^1 x^{α−1}(1 − x)^{β−1} dx = Γ(α)Γ(β)/Γ(α + β),   (18)

this becomes

I1 = [Γ(α + β)/(Γ(α)Γ(β))] · Γ(α + Σ_{j=1}^n x_j)Γ(β + n − Σ_{j=1}^n x_j) / Γ(α + β + n).   (19)

Next,

I2 = ∫_Ω θf(x1, θ) · · · f(xn, θ)λ(θ)dθ
   = [Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 θ^{α+Σ_j x_j+1−1}(1 − θ)^{β+n−Σ_j x_j−1} dθ,

and once more relation (18) gives

I2 = [Γ(α + β)/(Γ(α)Γ(β))] · Γ(α + Σ_{j=1}^n x_j + 1)Γ(β + n − Σ_{j=1}^n x_j) / Γ(α + β + n + 1).   (20)

Relations (19) and (20) imply, by virtue of (15) and the recursion Γ(γ) = (γ − 1)Γ(γ − 1), that

δ(x1, . . . , xn) = I2/I1
                = [Γ(α + β + n)/Γ(α + β + n + 1)] · [Γ(α + Σ_{j=1}^n x_j + 1)/Γ(α + Σ_{j=1}^n x_j)];

that is,

δ(x1, . . . , xn) = (Σ_{j=1}^n x_j + α)/(n + α + β).   (21)
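As a numerical check of Example 14 (an illustrative sketch, not part of the text; the data and grid size are arbitrary choices), the closed-form Bayes estimate (21) can be compared with a direct grid approximation of the ratio in (15):

```python
# Grid approximation of (15) for X_1, ..., X_n ~ B(1, theta) with a Beta(alpha, beta)
# prior; the normalizing Gamma constants cancel in the ratio and are omitted.
def bayes_numeric(xs, alpha, beta, grid=50_000):
    n, s = len(xs), sum(xs)
    num = den = 0.0
    for i in range(1, grid):
        th = i / grid
        w = th**(alpha - 1) * (1 - th)**(beta - 1)   # unnormalized Beta prior
        like = th**s * (1 - th)**(n - s)             # likelihood of the sample
        num += th * like * w
        den += like * w
    return num / den

xs = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
alpha, beta = 2.0, 3.0
closed = (sum(xs) + alpha) / (len(xs) + alpha + beta)   # formula (21)
approx = bayes_numeric(xs, alpha, beta)
print(closed, approx)
```

The two values agree to several decimal places, as Theorem 6 predicts.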

REMARK 7  We know (see Remark 4 in Chapter 3) that if α = β = 1, then the Beta distribution becomes U(0, 1). In this case the corresponding Bayes estimate is

δ(x1, . . . , xn) = (Σ_{j=1}^n x_j + 1)/(n + 2),

as follows from (21).

EXAMPLE 15  Let X1, . . . , Xn be i.i.d. r.v.'s from N(θ, 1). Take λ to be N(μ, 1), where μ is known. Then

I1 = ∫_Ω f(x1, θ) · · · f(xn, θ)λ(θ)dθ
   = (1/√(2π))^{n+1} ∫_{−∞}^{∞} exp[−(1/2)Σ_{j=1}^n (x_j − θ)² − (1/2)(θ − μ)²] dθ.

Collecting the terms in θ in the exponent, we obtain (1/2)[(n + 1)θ² − 2(nx̄ + μ)θ], apart from terms free of θ. But

(n + 1)θ² − 2(nx̄ + μ)θ = (n + 1)[θ² − 2((nx̄ + μ)/(n + 1))θ]
                        = (n + 1)[(θ − (nx̄ + μ)/(n + 1))² − ((nx̄ + μ)/(n + 1))²].

Therefore

I1 = (1/√(2π))^n (1/√(n + 1)) exp{−(1/2)[Σ_{j=1}^n x_j² + μ² − (nx̄ + μ)²/(n + 1)]}
     × ∫_{−∞}^{∞} [√(n + 1)/√(2π)] exp{−[(n + 1)/2][θ − (nx̄ + μ)/(n + 1)]²} dθ
   = (1/√(2π))^n (1/√(n + 1)) exp{−(1/2)[Σ_{j=1}^n x_j² + μ² − (nx̄ + μ)²/(n + 1)]},   (22)

since the last integrand is the p.d.f. of an N((nx̄ + μ)/(n + 1), 1/(n + 1)) distribution, so that the integral equals 1.

Next,

I2 = ∫_Ω θf(x1, θ) · · · f(xn, θ)λ(θ)dθ
   = (1/√(2π))^n (1/√(n + 1)) exp{−(1/2)[Σ_{j=1}^n x_j² + μ² − (nx̄ + μ)²/(n + 1)]}
     × ∫_{−∞}^{∞} θ[√(n + 1)/√(2π)] exp{−[(n + 1)/2][θ − (nx̄ + μ)/(n + 1)]²} dθ
   = (1/√(2π))^n (1/√(n + 1)) exp{−(1/2)[Σ_{j=1}^n x_j² + μ² − (nx̄ + μ)²/(n + 1)]} · (nx̄ + μ)/(n + 1),   (23)

the last integral being the mean of an N((nx̄ + μ)/(n + 1), 1/(n + 1)) distribution. By means of (22) and (23), one has, on account of (15),

δ(x1, . . . , xn) = I2/I1 = (nx̄ + μ)/(n + 1).   (24)

Exercises

12.7.1  Refer to Example 14 and:
i) Determine the posterior p.d.f. h(θ|x);
ii) Derive the Bayes estimate in (21) as the mean of the posterior p.d.f. h(θ|x);
iii) Construct the equal-tail 100(1 − α)% Bayes confidence interval for θ; that is, determine a set {θ ∈ (0, 1); h(θ|x) ≥ c(x)}, where c(x) is determined by the requirement that the Pλ-probability of this set is equal to 1 − α. (Hint: For simplicity, assign equal probabilities to the two tails.)

12.7.2  Refer to Example 15 and:
i) Determine the posterior p.d.f. h(θ|x);
ii) Derive the Bayes estimate in (24) as the mean of the posterior p.d.f. h(θ|x);
iii) Construct a 100(1 − α)% Bayes confidence interval for θ.
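Formula (24) of Example 15 can likewise be checked numerically (an illustrative sketch, not part of the text; the sample values, the known prior mean μ, and the grid bounds are ad hoc choices that comfortably cover the posterior mass):

```python
from math import exp

# Grid check of (24): N(theta, 1) data with an N(mu, 1) prior; the posterior mean
# should equal (n*xbar + mu)/(n + 1). Constant factors cancel in the ratio.
def bayes_normal(xs, mu, lo=-10.0, hi=10.0, grid=20_000):
    num = den = 0.0
    for i in range(grid + 1):
        th = lo + (hi - lo) * i / grid
        logw = -0.5 * sum((x - th)**2 for x in xs) - 0.5 * (th - mu)**2
        w = exp(logw)
        num += th * w
        den += w
    return num / den

xs = [0.2, -0.5, 1.3, 0.8]
mu = 0.5
xbar = sum(xs) / len(xs)
closed = (len(xs) * xbar + mu) / (len(xs) + 1)   # formula (24)
approx = bayes_normal(xs, mu)
print(closed, approx)
```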
12.7.3  Let X be an r.v. distributed as P(θ), and let the prior p.d.f. λ of θ be the Negative Exponential with parameter τ. Then, on the basis of X:
i) Determine the posterior p.d.f. h(θ|x);
ii) Derive the Bayes estimates δ(x) for the loss functions L(θ, δ) = [θ − δ(x)]² as well as L(θ, δ) = [θ − δ(x)]²/θ;
iii) Construct the equal-tail 100(1 − α)% Bayes confidence interval for θ;
iv) Do parts (i)–(iii) for any sample size n.

12.7.4  Let X be an r.v. having the Beta p.d.f. with parameters α = θ and β = 1, and let the prior p.d.f. λ of θ be the Negative Exponential with parameter τ. Then, on the basis of X:
i) Determine the posterior p.d.f. h(θ|x);
ii) Derive the Bayes estimates δ(x) for the loss functions L(θ, δ) = [θ − δ(x)]² as well as L(θ, δ) = [θ − δ(x)]²/θ;
iii) Construct the equal-tail 100(1 − α)% Bayes confidence interval for θ;
iv) Do parts (i)–(iii) for any sample size n when λ is Gamma with parameters k (positive integer) and β. (Hint: If Y is distributed as Gamma with parameters k and β, then it is easily seen that 2βY ∼ χ²_{2k}.)

12.8 Finding Minimax Estimators

Although there is no general method for deriving minimax estimates, this can be achieved in many instances by means of the Bayes method described in the previous section. Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ), θ ∈ Ω (⊆ ℝ), and let λ be a prior p.d.f. on Ω. Then the posterior p.d.f. of θ, h(·|x), is given by (17), and, as has been already observed, the Bayes estimate of θ (in the decision-theoretic sense) is given by

δ(x1, . . . , xn) = ∫_Ω θh(θ|x)dθ,

provided λ is of the continuous type. Then we have the following result.

THEOREM 7  Suppose there is a prior p.d.f. λ on Ω such that for the Bayes estimate δ defined by (15) the risk R(θ, δ) is independent of θ. Then δ is minimax.

PROOF  By the fact that δ is the Bayes estimate corresponding to the prior λ, one has

∫_Ω R(θ, δ)λ(θ)dθ ≤ ∫_Ω R(θ, δ*)λ(θ)dθ

for any estimate δ*. But R(θ, δ) = c by assumption. Hence

sup[R(θ, δ); θ ∈ Ω] = c ≤ ∫_Ω R(θ, δ*)λ(θ)dθ ≤ sup[R(θ, δ*); θ ∈ Ω]

for any estimate δ*. Therefore δ is minimax. The case that λ is of the discrete type is treated similarly. ▲

The theorem just proved is illustrated by the following example.

EXAMPLE 16  Let X1, . . . , Xn and λ be as in Example 14. Then the corresponding Bayes estimate δ is given by (21). Now, by setting X = Σ_{j=1}^n X_j and taking into consideration that EθX = nθ and EθX² = nθ(1 − θ + nθ), we obtain

R(θ, δ) = Eθ[θ − (X + α)/(n + α + β)]²
        = [1/(n + α + β)²]{[(α + β)² − n]θ² − (2α² + 2αβ − n)θ + α²}.

This risk is independent of θ if and only if

(α + β)² − n = 0  and  2α² + 2αβ − n = 0,

and solving these two equations we obtain α = β = √n/2. By taking α = β = √n/2 and denoting by δ* the resulting estimate, we have

δ*(x1, . . . , xn) = (Σ_{j=1}^n x_j + √n/2)/(n + √n),  with  R(θ, δ*) = 1/[4(1 + √n)²].

Since R(θ, δ*) is independent of θ, Theorem 7 implies that δ* is minimax.

EXAMPLE 17  Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²), where σ² is known and μ = θ. It was shown (see Example 9) that the estimator X̄ of θ is UMVU. It can be shown that it is also minimax and admissible. The proof of these latter two facts, however, will not be presented here.

Now a UMVU estimator has uniformly (in θ) smallest risk when its competitors lie in the class of unbiased estimators with finite variance. However, outside this class there might be estimators which are better than a UMVU estimator. In other words, a UMVU estimator need not be admissible. Here is an example.
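Example 16 can be verified numerically (illustrative Python, not part of the text; n = 16 is an arbitrary choice): with α = β = √n/2 the exact risk of δ* is the same at every θ and equals 1/[4(1 + √n)²].

```python
from math import comb, sqrt

# Example 16: the Bayes estimate with alpha = beta = sqrt(n)/2 has constant risk,
# and hence by Theorem 7 is minimax.
n = 16
a = sqrt(n) / 2
delta = lambda x: (x + a) / (n + 2 * a)      # = (x + sqrt(n)/2)/(n + sqrt(n))

def risk(theta):
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x) * (theta - delta(x))**2
               for x in range(n + 1))

risks = [risk(t / 10) for t in range(1, 10)]     # risk over a grid of theta values
target = 1 / (4 * (1 + sqrt(n))**2)              # claimed constant value of the risk
print(risks[0], target)
```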

EXAMPLE 18  Let X1, . . . , Xn be i.i.d. r.v.'s from N(0, σ²), and set σ² = θ. Then the UMVU estimator of θ is given by

U = (1/n) Σ_{j=1}^n X_j².

(See Example 9.) Its variance (risk) was seen to be equal to 2θ²/n. Consider the estimator δ = αU. Then its risk is

R(θ, δ) = Eθ(αU − θ)² = Eθ[α(U − θ) + (α − 1)θ]² = (θ²/n)[(n + 2)α² − 2nα + n].

The value α = n/(n + 2) minimizes this risk, and the minimum risk is equal to 2θ²/(n + 2) < 2θ²/n for all θ. Thus U is not admissible.

Exercise

12.8.1  Let X1, . . . , Xn be independent r.v.'s from the P(θ) distribution, and consider the loss function L(θ, δ) = [θ − δ(x)]²/θ. Then for the estimate δ(x) = x̄, calculate the risk R(θ, δ) = (1/θ)Eθ[θ − δ(X)]², and conclude that δ(x) is minimax.

12.9 Other Methods of Estimation

Minimum chi-square method. This method of estimation is applicable in situations which can be described by a Multinomial distribution. Namely, consider n independent repetitions of an experiment whose possible outcomes are the k pairwise disjoint events A_j, j = 1, . . . , k. Let X_j be the number of trials which result in A_j and let p_j be the probability that any one of the trials results in A_j. The probabilities p_j may be functions of r parameters; that is,

p_j = p_j(θ),  θ = (θ1, . . . , θr)′,  j = 1, . . . , k.

Then the present method of estimating θ consists in minimizing some measure of discrepancy between the observed X's and their expected values. One such measure is the following:

χ² = Σ_{j=1}^k [X_j − np_j(θ)]² / [np_j(θ)].

Often the p's are differentiable with respect to the θ's, and then the minimization can, in principle, be achieved by differentiation. However, the actual solution of the resulting system of r equations is often tedious. The solution may be easier by minimizing the following modified χ² expression:

χ²_mod = Σ_{j=1}^k [X_j − np_j(θ)]² / X_j,

provided, of course, all X_j > 0. Under suitable regularity conditions, the resulting estimators can be shown to have some asymptotic optimal properties. (See Section 12.10.)

The method of moments. Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ), and for a positive integer r, assume that EX1^r = m_r is finite. The problem is that of estimating m_r. According to the present method, m_r will be estimated by the corresponding sample moment

(1/n) Σ_{j=1}^n X_j^r.

The resulting moment estimates are always unbiased and, under suitable regularity conditions, they enjoy some asymptotic optimal properties as well. On the other hand, the theoretical moments are also functions of θ = (θ1, . . . , θr)′. Then we consider the following system

(1/n) Σ_{j=1}^n X_j^k = m_k(θ1, . . . , θr),  k = 1, . . . , r,

the solution of which (if possible) will provide estimators for θ_j, j = 1, . . . , r.

EXAMPLE 19  Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²), where both μ and σ² are unknown. By the method of moments, we have

X̄ = μ,  (1/n) Σ_{j=1}^n X_j² = σ² + μ²,

hence

μ̂ = X̄,  σ̂² = (1/n) Σ_{j=1}^n X_j² − X̄² = (1/n) Σ_{j=1}^n (X_j − X̄)².

EXAMPLE 20  Let X1, . . . , Xn be i.i.d. r.v.'s from U(α, β), where both α and β are unknown. Since

EX1 = (α + β)/2  and  σ²(X1) = (α − β)²/12

(see Chapter 5), we have

X̄ = (α + β)/2,  (1/n) Σ_{j=1}^n X_j² = (α − β)²/12 + (α + β)²/4,

or

β + α = 2X̄,  β − α = S√12,  where

S² = (1/n) Σ_{j=1}^n (X_j − X̄)².

Hence

α̂ = X̄ − S√3,  β̂ = X̄ + S√3.

REMARK 8  In Example 20, we see that the moment estimators α̂ and β̂ of α and β, respectively, are not functions of the sufficient statistic (X_(1), X_(n))′ of (α, β)′. This is a drawback of the method of moment estimation. Another obvious disadvantage of this method is that it fails when no moments exist (as in the case of the Cauchy distribution), or when not enough moments exist.

Least square method. This method is applicable when the underlying distribution is of a certain special form and it will be discussed in detail in Chapter 16.

Exercises

12.9.1  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ) given by

f(x, θ) = (2/θ²)(θ − x)I_{(0, θ)}(x),  θ ∈ Ω = (0, ∞).

Find the moment estimator of θ and calculate its variance.

12.9.2  If X1, . . . , Xn are i.i.d. r.v.'s distributed as U(θ − a, θ + b), where a, b > 0 are known and θ ∈ Ω = ℝ, does the method of moments provide an estimator for θ?

12.9.3  If X1, X2 are independent r.v.'s distributed as U(−θ, θ), θ ∈ Ω = (0, ∞), find the moment estimator of θ.

12.9.4  Let X1, . . . , Xn be i.i.d. r.v.'s from the Gamma distribution with parameters α and β. Show that α̂ = X̄²/S² and β̂ = S²/X̄ are the moment estimators of α and β, respectively, where S² = (1/n) Σ_{j=1}^n (X_j − X̄)².

12.9.5  Let X1, . . . , Xn be i.i.d. r.v.'s from the Beta distribution with parameters α, β and find the moment estimators of α and β.

12.9.6  Refer to Exercise 12.5.7 and find the moment estimators of θ1 and θ2.

12.10 Asymptotically Optimal Properties of Estimators

So far we have occupied ourselves with the problem of constructing an estimator on the basis of a sample of fixed size n, and having one or more of the
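The moment estimators of Example 20 are easy to try out on simulated data (an illustrative sketch, not part of the text; the true parameter values, seed, and sample size are arbitrary):

```python
import random
from math import sqrt

# Example 20: method-of-moments estimates for U(alpha, beta):
#   alpha_hat = xbar - s*sqrt(3),  beta_hat = xbar + s*sqrt(3),
# where s^2 = (1/n) sum (x_j - xbar)^2.
random.seed(7)
alpha_true, beta_true = 2.0, 5.0
xs = [random.uniform(alpha_true, beta_true) for _ in range(100_000)]

n = len(xs)
xbar = sum(xs) / n
s = sqrt(sum((x - xbar)**2 for x in xs) / n)
alpha_hat = xbar - s * sqrt(3)
beta_hat = xbar + s * sqrt(3)
print(alpha_hat, beta_hat)
```

With a large sample both estimates land close to the true endpoints, although (as Remark 8 points out) they are not functions of the sufficient statistic (X_(1), X_(n))′.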

following properties: unbiasedness, (uniformly) minimum variance, minimax, minimum average risk (Bayes), or the (intuitively optimal) property associated with an MLE. If, however, the sample size n may increase indefinitely, then some additional, asymptotic properties can be associated with an estimator. To this effect, we have the following definitions.

DEFINITION 14  The sequence of estimators of θ, {Vn} = {V(X1, . . . , Xn)}, is said to be consistent in probability (or weakly consistent) if Vn → θ in Pθ-probability as n → ∞, for all θ ∈ Ω. It is said to be a.s. consistent (or strongly consistent) if Vn → θ a.s. [Pθ] as n → ∞, for all θ ∈ Ω. (See Chapter 8.) From now on, the term "consistent" will be used in the sense of "weakly consistent."

The following theorem provides a criterion for a sequence of estimates to be consistent.

THEOREM 8  If, as n → ∞, EθVn → θ and σθ²(Vn) → 0, then Vn → θ in Pθ-probability.

PROOF  For the proof of the theorem the reader is referred to Remark 5, Chapter 8. ▲

DEFINITION 15  The sequence of estimators of θ, {Vn} = {V(X1, . . . , Xn)}, is said to be asymptotically normal N(0, σ²(θ)) if, as n → ∞, √n(Vn − θ) → X in distribution for all θ ∈ Ω, where X is an r.v. distributed (under Pθ) as N(0, σ²(θ)). (See Chapter 8.) This is often expressed (loosely) by writing Vn ≈ N(θ, σ²(θ)/n). If √n(Vn − θ) → N(0, σ²(θ)) in distribution, then Vn → θ in Pθ-probability (see Exercise 12.10.1); that is, asymptotic normality of {Vn} implies its consistency in probability.

DEFINITION 16  The sequence of estimators of θ, {Vn} = {V(X1, . . . , Xn)}, is said to be best asymptotically normal (BAN) if:
i) It is asymptotically normal, and
ii) The variance σ²(θ) of its limiting normal distribution is smallest for all θ ∈ Ω in the class of all sequences of estimators which satisfy (i).

A BAN sequence of estimators is also called asymptotically efficient (with respect to the variance). The relative asymptotic efficiency of any other sequence of estimators which satisfies (i) only is expressed by the quotient of the smallest variance mentioned in (ii) to the variance of the asymptotic normal distribution of the sequence of estimators under consideration.

In connection with the concepts introduced above, we have the following result.

THEOREM 9  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ), θ ∈ Ω ⊆ ℝ. Then, if certain suitable regularity conditions are satisfied, the likelihood equation

(∂/∂θ) log L(θ|X1, . . . , Xn) = 0

has a root θ*_n = θ*(X1, . . . , Xn), for each n, such that the sequence {θ*_n} of estimators is BAN, and the variance of its limiting normal distribution is equal to the inverse of Fisher's information number

I(θ) = Eθ[(∂/∂θ) log f(X, θ)]²,

where X is an r.v. distributed as the X's above. In smooth cases, θ*_n will be an MLE or the MLE.

Examples have been constructed, however, for which {θ*_n} does not satisfy (ii) of Definition 16 for some exceptional θ's. Appropriate regularity conditions ensure that these exceptional θ's are only "a few" (in the sense of their set having Lebesgue measure zero). The fact that there can be exceptional θ's, along with other considerations, has prompted the introduction of other criteria of asymptotic efficiency; however, this topic will not be touched upon here. Also, the proof of Theorem 9 is beyond the scope of this book, and therefore it will be omitted.

EXAMPLE 21
i) Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, θ). The MLE of θ is X̄, which we denote by X̄n here. The weak and strong consistency of X̄n follows by the WLLN and SLLN, respectively (see Chapter 8). That √n(X̄n − θ) is asymptotically normal N(0, I⁻¹(θ)), where I(θ) = 1/[θ(1 − θ)] (see Example 7), follows from the fact that √n(X̄n − θ)/√(θ(1 − θ)) is asymptotically N(0, 1) by the CLT (see Chapter 8).
ii) If X1, . . . , Xn are i.i.d. r.v.'s from P(θ), then the MLE X̄ = X̄n of θ (see Example 10) is both (strongly) consistent and asymptotically normal by the same reasoning as above, with the variance of the limiting normal distribution being equal to I⁻¹(θ) = θ (see Example 8).
iii) The same is true of the MLE X̄ = X̄n of μ and of (1/n)Σ_{j=1}^n (X_j − μ)² of σ², if X1, . . . , Xn are i.i.d. r.v.'s from N(μ, σ²) with one parameter known and the other unknown (see Example 12). The variance of the limiting normal distribution of √n(X̄n − μ) is I⁻¹(μ) = σ², and the variance of the limiting normal distribution of √n[(1/n)Σ_{j=1}^n (X_j − μ)² − σ²] is I⁻¹(σ²) = 2σ⁴ (see Example 9).

It can further be shown that in all cases (i)–(iii) just considered the regularity conditions not explicitly mentioned in Theorem 9 are satisfied, and therefore the above sequences of estimators are actually BAN.
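Part (i) of Example 21 can be illustrated by simulation (not in the original text; the values of θ, n, the seed, and the number of replications are arbitrary): the empirical variance of √n(X̄n − θ) should be close to I⁻¹(θ) = θ(1 − θ).

```python
import random
from math import sqrt

# Monte Carlo illustration of Example 21(i): sqrt(n)*(Xbar_n - theta) is
# approximately N(0, theta*(1 - theta)) for Bernoulli samples.
random.seed(1)
theta, n, reps = 0.3, 400, 3000
vals = []
for _ in range(reps):
    s = sum(1 if random.random() < theta else 0 for _ in range(n))
    vals.append(sqrt(n) * (s / n - theta))

mean = sum(vals) / reps
var = sum((v - mean)**2 for v in vals) / reps
print(mean, var, theta * (1 - theta))
```

The simulated mean is near 0 and the simulated variance is near 0.21 = θ(1 − θ), the inverse of Fisher's information number at θ = 0.3.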

Exercise

12.10.1  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ), θ ∈ Ω ⊆ ℝ, and let {Vn} = {Vn(X1, . . . , Xn)} be a sequence of estimators of θ such that √n(Vn − θ) → Y in distribution as n → ∞, where Y is an r.v. distributed as N(0, σ²(θ)). Then show that Vn → θ in Pθ-probability as n → ∞. (That is, asymptotic normality of {Vn} implies its consistency in probability.)

12.11 Closing Remarks

The following definition serves the purpose of asymptotically comparing two estimators.

DEFINITION 17  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ), θ ∈ Ω ⊆ ℝ, and let {Un} = {Un(X1, . . . , Xn)} and {Vn} = {Vn(X1, . . . , Xn)} be two sequences of estimators of θ. Then we say that {Un} and {Vn} are asymptotically equivalent if for every θ ∈ Ω,

√n(Un − Vn) → 0 in Pθ-probability as n → ∞.

For an example, suppose that the X's are from B(1, θ). It has been shown earlier that the UMVU estimator of θ is Un = X̄n (= X̄), and this coincides with the MLE of θ. Next, the Bayes estimator of θ, corresponding to a Beta prior p.d.f. λ, is given by

Vn = (Σ_{j=1}^n X_j + α)/(n + α + β),   (25)

and the minimax estimator is

Wn = (Σ_{j=1}^n X_j + √n/2)/(n + √n).   (26)

That is, four different methods of estimation of the same parameter θ provided three different estimators. This is not surprising, since the criteria of optimality employed in the four approaches were different. Now, by the CLT, √n(Un − θ) → Z in distribution as n → ∞, where Z is an r.v. distributed as N(0, θ(1 − θ)), and it can be shown (see Exercise 12.11.2) that √n(Un − Vn) → 0 in Pθ-probability as n → ∞. It can also be shown (see Exercise 12.11.1) that √n(Vn − θ) → Z in distribution as n → ∞. Thus {Un} and {Vn} are asymptotically equivalent according to Definition 17. As for Wn, it can be established (see Exercise 12.11.3) that √n(Wn − θ) → W in distribution as n → ∞, where W is an r.v. distributed as N(1/2 − θ, θ(1 − θ)).

Thus {Un} and {Wn}, or {Vn} and {Wn}, are not even comparable on the basis of Definition 17.

Finally, regarding the question as to which estimator is to be selected in a given case, the answer would be that this depends on which kind of optimality is judged to be most appropriate for the case in question. Although the preceding comments were made in reference to the Binomial case, they are of a general nature, and the Binomial case was used for the sake of definiteness only.

Exercises

12.11.1  In reference to Example 14, the estimator Vn given by (25) is the Bayes estimator of θ, corresponding to a prior Beta p.d.f. Then show that √n(Vn − θ) → Z in distribution as n → ∞, where Z is an r.v. distributed as N(0, θ(1 − θ)).

12.11.2  In reference to Example 14, Un = X̄n is the UMVU (and also the ML) estimator of θ, whereas the estimator Vn is given by (25). Then show that √n(Un − Vn) → 0 in Pθ-probability as n → ∞.

12.11.3  In reference to Example 14, Wn, given by (26), is the minimax estimator of θ. Then show that √n(Wn − θ) → W in distribution as n → ∞, where W is an r.v. distributed as N(1/2 − θ, θ(1 − θ)).

H (or H0) is a statement which nulliﬁes this claim and is called a null hypothesis. then H is called a simple hypothesis. Often hypotheses come up in the form of a claim that a new product. Thus H H 0 : θ ∈ω A : θ ∈ω c . Similarly for alternatives. is called a (statistical) hypothesis (about θ) and is usually denoted by H (or H0). θ ∈ Ω ⊆ ޒr and having p. Once a hypothesis H is formulated. . 13. the problem is that of testing H on the basis of the observed values of the X’s. . . In this context. then a coin. DEFINITION 1 A statement regarding the parameter θ. class of events.13. .d. . . .3 UMP Tests for Testing Certain Composite Hypotheses 327 Chapter 13 Testing Hypotheses Throughout this chapter. taking values in [0. that is. whose probability of falling 327 . such as θ ∈ ω ⊂ Ω. . If ω contains only one point. we introduce the basic concepts of testing hypotheses theory. 1] and having the following interpretation: If (x1. Xn)′ and φ(x1. f(·. . is more efﬁcient than existing ones. Pθ). xn) = y.’s deﬁned on a probability space (S. Xn will be i. otherwise it is called a composite hypothesis..v. θ). . .1 General Concepts of the Neyman–Pearson Testing Hypotheses Theory In this section. .d. etc. xn)′ is the observed value of (X1.i. . .f. which is called the alternative to H (or H0) and is usually denoted by A. The statement that θ ∈ ω c (the complement of ω with respect to Ω) is also a (statistical) hypothesis about θ. X1. ω = {θ0}. r. . DEFINITION 2 ( ) A randomized (statistical) test (or test function) for testing H against the alternative A is a (measurable) function φ deﬁned on ޒn. a new technique. .

θ ∈ Ω. the decision maker is somewhat conservative for holding the null hypothesis as true unless there is overwhelming evidence from the data that it is false. . 1 − β(θ) represents the probability of an error. Of course.01. suppose a pharmaceutical company is considering the marketing of a newly developed drug for treatment of a disease for which the best available drug in the market has a cure rate of 60%. . xn ( ) (x (x 1 .10) and then derive tests which maximize the power. From the consideration of potential losses due to wrong decisions (which may or may not be quantiﬁable in monetary terms). . 1 − β(θ) with θ ∈ ωc is the probability of accepting H. in fact. this cannot be done. 0. θ ∈ω ω] is denoted and β(θ) is called the power of the test at θ ∈ω by α and is called the level of signiﬁcance or size of the test. . . so that 1 − β(θ) = Pθ (accepting H). The function β restricted to ωc is called the power function of the test ωc. The sup [β(θ). DEFINITION 3 Let β(θ) = Pθ (rejecting H ). Thus for θ ∈ ω. one may commit either one of the following two kinds of errors: to reject H when actually H is true. . is tossed and H is rejected or accepted when heads or tails appear. What the classical theory of testing hypotheses does is to ﬁx the size α at a desirable level (which is usually taken to be 0. .05. . He/she believes that the consequence of wrongly rejecting the null hypothesis is much more severe to him/ her than that of wrongly accepting it. 0. If. then the test φ is called a nonrandomized test. namely. or to accept H when H is actually false. In the particular case where y can be either 0 or 1 for all (x1. . 0. the (Borel) set B in ޒn is called the rejection or critical region and Bc is called the acceptance region. that is. xn)′.328 13 Testing Hypotheses heads is y. The reason for this course of action is that the roles played by H and A are not at all symmetric. α is the smallest upper bound of the type-I error probabilities. Clearly.005. 
From the consideration of potential losses due to wrong decisions (which may or may not be quantifiable in monetary terms), the decision maker is somewhat conservative, holding the null hypothesis as true unless there is overwhelming evidence from the data that it is false. He/she believes that the consequence of wrongly rejecting the null hypothesis is much more severe to him/her than that of wrongly accepting it.

For example, suppose a pharmaceutical company is considering the marketing of a newly developed drug for treatment of a disease for which the best available drug in the market has a cure rate of 60%. On the basis of limited experimentation, the research division claims that the new drug is more effective. If, in fact, it fails to be more effective or if it has harmful side effects, the loss sustained by the company due to an immediate obsolescence of the product, decline of the company's image, etc., will be quite severe. On the other hand, failure to market a truly better drug is an opportunity loss, but that may not be considered to be as serious as the other loss. In such a formulation, if a decision is to be made on the basis of a number of clinical trials, the null hypothesis H should be that the cure rate of the new drug is no more than 60%, and A should be that this cure rate exceeds 60%.
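In the drug example, the power function β(θ) of Definition 3 can be computed exactly for a simple nonrandomized test (an illustrative sketch, not part of the text; the choices n = 20 and critical region X ≥ 16 are arbitrary):

```python
from math import comb

# Drug example: X ~ Bin(n, theta) cures out of n trials; H0: theta <= 0.6 vs
# A: theta > 0.6. The nonrandomized test rejects H0 when X >= c, i.e. the
# critical region is B = {c, c+1, ..., n}.
n, c = 20, 16

def beta(theta):
    # power function beta(theta) = P_theta(X >= c) = P_theta(reject H0)
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(c, n + 1))

size = beta(0.6)          # sup over theta <= 0.6 is attained at theta = 0.6
power_at_08 = beta(0.8)   # power if the new drug truly cures 80% of cases
print(size, power_at_08)
```

The size comes out close to the conventional 0.05 level, while the power grows as θ moves deeper into the alternative, exactly the trade-off discussed above.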

effective or if it has harmful side effects, the loss sustained by the company due to an immediate obsolescence of the product, decline of the company's image, etc., will be quite severe. On the other hand, failure to market a truly better drug is an opportunity loss, but that may not be considered to be as serious as the other loss. In such a formulation, the null hypothesis H should be that the cure rate of the new drug is no more than 60%, and A should be that this cure rate exceeds 60%.

We notice that for a nonrandomized test with critical region B, we have

β(θ) = Pθ[(X1, . . . , Xn)′ ∈ B] = 1 · Pθ[(X1, . . . , Xn)′ ∈ B] + 0 · Pθ[(X1, . . . , Xn)′ ∈ Bc] = Eθφ(X1, . . . , Xn),

and the same can be shown to be true for randomized tests (by an appropriate application of property (CE1) in Section 3 of Chapter 5). Thus

βφ(θ) = β(θ) = Eθφ(X1, . . . , Xn), θ ∈ Ω. (1)

DEFINITION 4  A level-α test which maximizes the power among all tests of level α is said to be uniformly most powerful (UMP). Thus φ is a UMP, level-α test if (i) sup[βφ(θ); θ ∈ ω] = α and (ii) βφ(θ) ≥ βφ*(θ), θ ∈ ωc, for any other test φ* which satisfies (i). If ωc consists of a single point only, a UMP test is simply called most powerful (MP). In many important cases a UMP test does exist.

Exercise 13.1.1  In the following examples indicate which statements constitute a simple and which a composite hypothesis:
i) X is an r.v. whose p.d.f. f is given by f(x) = 2e^{−2x}I(0,∞)(x);
ii) when tossing a coin, let X be the r.v. taking the value 1 if head appears and 0 if tail appears; then the statement is: the coin is biased;
iii) X is an r.v. whose expectation is equal to 5.

13.2 Testing a Simple Hypothesis Against a Simple Alternative

In the present case, we take Ω to consist of two points only, which can be labeled as θ0 and θ1; that is, Ω = {θ0, θ1}. In actuality, Ω may consist of more than two points, but we focus attention only on two of its points. Let f(·, θ0) and f(·, θ1) be two given p.d.f.'s; we set f0 = f(·, θ0), f1 = f(·, θ1), and let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ), θ ∈ Ω. The problem is that of testing the hypothesis H : θ ∈ ω = {θ0} against the alternative A : θ ∈ ωc = {θ1} at level α; in other words, we want to test the hypothesis that the underlying p.d.f. of the X's is f0 against the alternative that it is f1. The p.d.f.'s f0 and f1 need not

θ1 ⋅ ⋅ ⋅ f xn . respectively. θ1 = Cf x1 . . xn ( ) ⎧1. if f x1 . . θ). ( ) Z = X1 . . . In connection with this testing problem.v. 0 0 ( ) ( ) Tc () and therefore in calculating Pθ -probabilities we may redeﬁne and modify r. the test deﬁned by (2) and (3) is MP within the class of all tests whose level is ≤α. since the discrete case is dealt with similarly by replacing integrals by summation signs.d. 0 Eθ φ Z = Pθ f1 Z > Cf0 Z + γPθ f1 Z = Cf0 Z 0 0 0 ( ) [ ( ) ( )] [ ( ) ( )] = P {[ f (Z) > Cf (Z)]I D} + γP {[ f (Z) = Cf (Z)]I D} θ0 1 0 θ0 1 0 ⎫ ⎧⎡ f Z ⎤ ⎪ 1 ⎪ = Pθ ⎨⎢ > C ⎥I D⎬ + γPθ f Z ⎢ ⎥ ⎪ ⎪ ⎦ ⎩⎣ 0 ⎭ 0 = Pθ = Pθ 0 0 ⎫ ⎧ ⎡ f ( Z) ⎤ ( ) ⎪ ⎪ ⎢ ⎥ = C D ⎨ I ⎬ Z f ⎢ ⎥ ( ) ⎪ ⎪ ⎦ ⎭ ⎩⎣ ( ) [(Y > C )I D] + γP [(Y = C )I D] (Y > C ) + γP (Y = C ). . .’s.f. . θ). ( ( ) ) ( ( ) ) ( ( ) ) ( ( ) ) where the constants γ (0 ≤ γ ≤ 1) and C(>0) are determined so that ( ) (3) Then. θ) · · · f(Xn.’s with p. θ 0 ⋅ ⋅ ⋅ f xn . 1 0 0 θ0 θ0 (4) . PROOF For convenient writing. θ 0 ⎪ (2) 0. otherwise.f. θ) for f(x1. they may be any p. .d. f(Z. θ). θ1}. Then Pθ Dc = Pθ Z ∈T c = ∫ f0 z dz = 0. . θ ∈ Ω = {θ0.330 13 Testing Hypotheses even be members of a parametric family of p. θ1 ⋅ ⋅ ⋅ f xn .d. Thus we have. ⎪ ⎩ Eθ0 φ X 1 . z = x1 .’s on the set Dc. .v.We are interested in testing the hypothesis H : θ = θ0 against the alternative A : θ = θ1 at level α (0 < α < 1). . . if f x1 . θ) · · · f(xn. θ 0 ⎪ ⎪ = ⎨γ . r. . we are going to prove the following result. we set ′ dz = dx1 ⋅ ⋅ ⋅ dxn . X n = α . . . X n ( ) ′ and f(z.f. Next. . . in particular. The proof is presented for the case that the X’s are of the continuous type. Let φ be the test deﬁned as follows: φ x1 . . θ). let T be the set of points z in ޒn such that f0(z) > 0 and let Dc = Z−1(T c). θ 0 ⋅ ⋅ ⋅ f xn . θ1 > Cf x1 . for testing H against A at level α. f(X1. f(·.’s which are of interest to us.d. THEOREM 1 (Neyman–Pearson Fundamental Lemma) Let X1.i. . . xn . Xn be i.

G is nondecreasing and continuous from the right. Next.. we assume that C0 is a discontinuity point of a. First. of the r.f.) At this point.1. (4) becomes as follows: Eθ φ Z = Pθ Y > C0 + γPθ Y = C0 0 0 0 ( ) ( = a C0 + ( ) ( ) a C − − a C = α. Again we assert that the resulting test is of level α. 0 ( ) ( ) ( ) [ 0 ( )] [ ( )] ( ) ( ) and a(C) = 1 for C < 0. G(∞) = 1. In fact. These properties of G imply that the function a is such that a(−∞) = 1. a(C0) = a(C0 −). since Pθ (Y ≥ 0) = 1 Figure 13. (See Fig. Now for any α (0 < α < 1) there exists C0 (≥0) such that a(C0) ≤ α ≤ a(C0 −). 13.3 Testing Simple Hypothesis Against a SimpleHypotheses Alternative 331 a (C ) 1 Figure 13. a(∞) = 0. Then. so that G(C) = 1 − a(C) = Pθ (Y ≤ C) is the d. 0 0 Pθ Y = C = G C − G C − = 1 − a C − 1 − a C − = a C − − a C .213.1 a (CϪ) a (C ) 0 C where Y = f1(Z)/f0(Z) on D and let Y be arbitrary (but measurable) on Dc. Y. take again C = C0 in (2) and also set γ = ( ) a(C − ) − a(C ) α − a C0 0 0 (so that 0 ≤ γ ≤ 1). In the present case. [ ( ) ( )] a(C − ) − a(C ) α − a C0 0 0 0 0 ) ( ) . Now let a(C) = Pθ (Y > C). there are two cases to consider. we have G(−∞) = 0.v. a is nonincreasing and continuous from the right. that is.1 represents the graph of a typical function a. In this case. C0 is a continuity point of the function a. Since G is a d.UMPaTests for Testing Certain Composite 13. α = a(C0) and if in (2) C is replaced by C0 and γ = 0.f. the resulting test is of level α. 0 0 ( ) ( ) ( ) as was to be seen. Futhermore. in this case (4) becomes Eθ φ Z = Pθ Y > C0 = a C0 = α .

( ) ( ) ( ) (8) Relations (6). φ z − φ * z > 0 = φ − φ * > 0 . That is. let φ* be any test of level ≤α and set B + = z ∈ ޒn . (3) is satisﬁed. B+ = φ > φ * ⊆ φ = 1 ∪ φ = γ = f1 ≥ Cf0 ⎫ ⎪ ⎬ − B = φ < φ * ⊆ φ = 0 ∪ φ = γ = f1 ≤ Cf0 . and γ = ( ) a(C − ) − a(C ) α − a C0 0 0 (which it is to be interpreted as 0 whenever is of the form 0/0).332 13 Testing Hypotheses Summarizing what we have done so far.⎪ ⎭ Therefore ∫[ ޒφ (z) − φ * (z)][ f1 (z) − Cf0 (z)]dz = ∫ [φ (z ) − φ * (z )][ f1 (z ) − Cf0 (z )]dz B + ∫ [φ (z ) − φ * (z )][ f1 (z ) − Cf0 (z )]dz B n + − and this is ≥0 on account of (5). Now it remains for us to show that the test so deﬁned is MP. ∫[ ޒφ (z) − φ * (z)] f (z)dz = E φ (Z) − E n 1 θ1 θ1 φ * Z = β φ θ1 − β φ * θ1 . To see this. as deﬁned above. or βφ(θ1) ≥ βφ*(θ1). as described in the theorem. ∫[ ޒφ (z) − φ * (z)][ f (z) − Cf (z)]dz ≥ 0. ( ( ) ( ) ( ) ( ) ( ) ( ) ( ) ) (5) Then B+ ∩ B− = ∅ and. That is. n 1 n 0 (6) But ∫[ ޒφ (z) − φ * (z)] f (z)dz = ∫ ޒφ (z) f (z)dz − ∫ ޒφ * (z) f (z)dz n 0 n 0 n 0 = Eθ φ Z − Eθ φ * Z = α − Eθ φ * Z ≥ 0. (7) and (8) yield βφ(θ1) − βφ*(θ1) ≥ 0. B − n () () } ( ) { = {z ∈ . ▲ The theorem also guarantees that the power βφ(θ1) is at least α. the test deﬁned by (2) is of level α. n 1 0 which is equivalent to ∫[ ޒφ (z) − φ * (z)]f (z)dz ≥ C ∫[ ޒφ (z) − φ * (z)] f (z)dz. 0 0 0 ( ) ( ) ( ) (7) and similarly. This completes the proof of the theorem. ޒφ (z ) − φ * (z ) < 0} = (φ − φ * < 0). . That is. we have that with C = C0. clearly.
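The construction in the proof (take C0 with a(C0) ≤ α ≤ a(C0−), then randomize on the set {Y = C0} with γ = [α − a(C0)]/[a(C0−) − a(C0)]) can be sketched for a single discrete observation. The two p.d.f.'s f0, f1 below are illustrative choices, not from the text:

```python
# Sketch of the Neyman-Pearson construction for one discrete observation;
# f0, f1 are illustrative pmfs on the support {0, 1, 2, 3}.
f0 = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
f1 = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
alpha = 0.05

# Likelihood ratio Y = f1(x)/f0(x); a(C) = P_{f0}(Y > C)
lr = {x: f1[x] / f0[x] for x in f0}

def a(C):
    return sum(f0[x] for x in f0 if lr[x] > C)

# C0 = smallest attained LR value with a(C0) <= alpha (a is a step
# function that only jumps at attained LR values)
C0 = min(c for c in sorted(set(lr.values())) if a(c) <= alpha)
a_minus = sum(f0[x] for x in f0 if lr[x] >= C0)   # a(C0-)
gamma = 0.0 if a_minus == a(C0) else (alpha - a(C0)) / (a_minus - a(C0))

# Size check: P_{f0}(Y > C0) + gamma * P_{f0}(Y = C0) = alpha
size = a(C0) + gamma * sum(f0[x] for x in f0 if lr[x] == C0)
power = (sum(f1[x] for x in f0 if lr[x] > C0)
         + gamma * sum(f1[x] for x in f0 if lr[x] == C0))
print(round(C0, 4), round(gamma, 4), round(size, 4), round(power, 4))
```

For these pmfs the test rejects (with probability γ = 0.5) only at x = 3, attains size α exactly, and has power 0.2 at f1.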

COROLLARY  Let φ be defined by (2) and (3). Then βφ(θ1) ≥ α.

PROOF  The test φ*(z) = α is of level α, and since φ is most powerful, we have βφ(θ1) ≥ βφ*(θ1) = α. ▲

REMARK 1
i) The determination of C and γ is essentially unique. In fact, if the (straight) line through the point (0, α) and parallel to the C-axis has only one point in common with the graph of a, then γ = 0 and C is the unique point for which a(C) = α. Next, if C = C0 is a discontinuity point of a, then both C and γ are uniquely defined the way it was done in the proof of the theorem. Finally, if the above (straight) line coincides with part of the graph of a corresponding to an interval (b1, b2], then γ = 0 again and any C in (b1, b2] can be chosen without affecting the level of the test. This is so because

Pθ0[Y ∈ (b1, b2]] = G(b2) − G(b1) = [1 − a(b2)] − [1 − a(b1)] = a(b1) − a(b2) = 0.

ii) The theorem shows that there is always a test of the structure (2) and (3) which is MP; in fact, the test defined by (2) and (3) is MP within the class of all tests whose level is ≤ α. The converse is also true, namely, if φ is an MP level-α test, then φ necessarily has the form (2), unless there is a test of size < α with power 1. This point will not be pursued further here.

The examples to be discussed below will illustrate how the theorem is actually used in concrete cases. In the examples to follow, Ω = {θ0, θ1} and the problem will be that of testing a simple hypothesis against a simple alternative at level of significance α. It will then prove convenient to set

R(z; θ0, θ1) = [f(x1, θ1) · · · f(xn, θ1)] / [f(x1, θ0) · · · f(xn, θ0)]

whenever the denominator is greater than 0. Also it is often more convenient to work with log R(z; θ0, θ1) rather than R(z; θ0, θ1) itself, provided, of course, R(z; θ0, θ1) > 0.

EXAMPLE 1  Let X1, . . . , Xn be i.i.d. r.v.'s from B(1, θ) and suppose θ0 < θ1. Then

log R(z; θ0, θ1) = x log(θ1/θ0) + (n − x) log[(1 − θ1)/(1 − θ0)],

where x = Σn j=1 xj, and therefore, by the fact that θ0 < θ1, R(z; θ0, θ1) > C is equivalent to x > C0, where

C0 = [log C − n log((1 − θ1)/(1 − θ0))] / log[θ1(1 − θ0)/(θ0(1 − θ1))].

Thus the MP test is given by

φ(z) = 1 if Σn j=1 xj > C0; γ if Σn j=1 xj = C0; 0 otherwise, (9)

where C0 and γ are determined by

Eθ0φ(Z) = Pθ0(X > C0) + γPθ0(X = C0) = α, (10)

and X = Σn j=1 Xj is B(n, θi) under θi, i = 0, 1. If θ0 > θ1, the inequality signs in (9) and (10) are reversed.

For the sake of definiteness, let us take θ0 = 0.50, θ1 = 0.75, α = 0.05 and n = 25. Then

0.05 = P0.5(X > C0) + γP0.5(X = C0)

is equivalent to

P0.5(X ≤ C0) − γP0.5(X = C0) = 0.95.

For C0 = 17, one has, by means of the Binomial tables, P0.5(X ≤ 17) = 0.9784 and P0.5(X = 17) = 0.0323. Thus γ is defined by 0.9784 − 0.0323γ = 0.95, whence γ = 0.8792. Therefore the MP test in this case is given by (9) with C0 = 17 and γ = 0.8792. The power of the test is

P0.75(X > 17) + 0.8792 · P0.75(X = 17) = 0.8356.

EXAMPLE 2  Let X1, . . . , Xn be i.i.d. r.v.'s from P(θ) and suppose θ0 < θ1. Then

log R(z; θ0, θ1) = x log(θ1/θ0) − n(θ1 − θ0), where x = Σn j=1 xj,

and hence, by using the assumption that θ0 < θ1, one has that R(z; θ0, θ1) > C is equivalent to x > C0, where

C0 = log(Ce^{n(θ1 − θ0)}) / log(θ1/θ0).

Thus the MP test is defined by

φ(z) = 1 if Σn j=1 xj > C0; γ if Σn j=1 xj = C0; 0 otherwise, (11)

where C0 and γ are determined by

Eθ0φ(Z) = Pθ0(X > C0) + γPθ0(X = C0) = α, (12)

and X = Σn j=1 Xj is P(nθi) under θi, i = 0, 1. If θ0 > θ1, the inequality signs in (11) and (12) are reversed. For a numerical application,
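The Binomial-table computations of Example 1 (θ0 = 0.50, θ1 = 0.75, α = 0.05, n = 25) can be reproduced exactly; a sketch:

```python
from math import comb

def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sf(c, n, p):
    # P(X > c) for X ~ B(n, p)
    return sum(pmf(k, n, p) for k in range(c + 1, n + 1))

n, theta0, theta1, alpha = 25, 0.50, 0.75, 0.05

# Smallest C0 with P_{theta0}(X > C0) <= alpha, then randomize on {X = C0}
C0 = min(c for c in range(n + 1) if sf(c, n, theta0) <= alpha)
gamma = (alpha - sf(C0, n, theta0)) / pmf(C0, n, theta0)

power = sf(C0, n, theta1) + gamma * pmf(C0, n, theta1)
print(C0, round(gamma, 4), round(power, 4))  # C0 = 17, gamma ~ 0.880, power ~ 0.836
```

The exact γ differs from the table value 0.8792 only because the tables carry four decimals.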

let us take θ0 = 0.3, θ1 = 0.4, α = 0.05 and n = 20. Then (12) becomes P0.3(X ≤ C0) − γP0.3(X = C0) = 0.95, where X is P(nθ0) = P(6). By means of the Poisson tables, one has, for C0 = 10, P0.3(X ≤ 10) = 0.9574 and P0.3(X = 10) = 0.0413. Therefore γ is defined by 0.9574 − 0.0413γ = 0.95, whence γ = 0.1791. Thus the test is given by (11) with C0 = 10 and γ = 0.1791. The power of the test is

P0.4(X > 10) + 0.1791 · P0.4(X = 10) = 0.2013.

EXAMPLE 3  Let X1, . . . , Xn be i.i.d. r.v.'s from N(θ, 1) and suppose θ0 < θ1. Then

log R(z; θ0, θ1) = (1/2) Σn j=1 [(xj − θ0)² − (xj − θ1)²],

and, by using the fact that θ0 < θ1, one has that R(z; θ0, θ1) > C is equivalent to x̄ > C0, where

C0 = (θ0 + θ1)/2 + (log C)/[n(θ1 − θ0)].

Thus the MP test is given by

φ(z) = 1 if x̄ > C0, and 0 otherwise, (13)

where C0 is determined by

Eθ0φ(Z) = Pθ0(X̄ > C0) = α, (14)

and X̄ is N(θi, 1/n) under θi, i = 0, 1. If θ0 > θ1, the inequality signs in (13) and (14) are reversed. As an application, take θ0 = −1, θ1 = 1, α = 0.001 and n = 9. Then (14) gives

P−1(X̄ > C0) = P−1[3(X̄ + 1) > 3(C0 + 1)] = P[N(0, 1) > 3(C0 + 1)] = 0.001,

whence 3(C0 + 1) = 3.09, so that C0 + 1 = 1.03 and C0 = 0.03. Therefore the MP test in this case is given by (13) with C0 = 0.03. The power of the test is

P1(X̄ > 0.03) = P1[3(X̄ − 1) > −2.91] = P[N(0, 1) > −2.91] = 0.9982.

EXAMPLE 4  Let X1, . . . , Xn be i.i.d. r.v.'s from N(0, θ) and suppose θ0 < θ1. Here

log R(z; θ0, θ1) = [(θ1 − θ0)/(2θ0θ1)] x + (n/2) log(θ0/θ1), where x = Σn j=1 x²j,

so that, by means of θ0 < θ1, one has that R(z; θ0, θ1) > C is equivalent to x > C0, where

C0 = [2θ0θ1/(θ1 − θ0)] log[C(θ1/θ0)^{n/2}].

Thus the MP test in the present case is given by

φ(z) = 1 if Σn j=1 x²j > C0, and 0 otherwise, (15)

where C0 is determined by

Eθ0φ(Z) = Pθ0(Σn j=1 X²j > C0) = α, (16)

and Σn j=1 X²j/θ is distributed as χ²n. If θ0 > θ1, the inequality signs in (15) and (16) are reversed. For an example, let θ0 = 4, θ1 = 16, α = 0.01 and n = 20. Then (16) becomes

P4(Σ X²j > C0) = P(χ²20 > C0/4) = 0.01,

whence C0 = 150.264. Thus the test is given by (15) with C0 = 150.264. The power of the test is

P16(Σ X²j > 150.264) = P(χ²20 > 150.264/16) = P(χ²20 > 9.3915) = 0.977.

Exercises

13.2.1  Let X1, . . . , Xn be independent r.v.'s. Construct the MP test of the hypothesis H that the common distribution of the X's is N(0, 9) against the alternative A that it is N(1, 9) at level of significance α = 0.05. Also find the power of the test.

13.2.2  If X1, . . . , Xn are independent r.v.'s distributed as N(μ, σ²), where μ is unknown and σ is known, show that the sample size n can be determined so that, when testing the hypothesis H : μ = 0 against the alternative A : μ = 1, one has predetermined values for α and β. What is the numerical value of n if α = 0.05, β = 0.9 and σ = 1?
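For the even number of degrees of freedom in Example 4, the chi-square tail reduces to a finite Poisson sum, P(χ²_{2m} > x) = Σ_{j=0}^{m−1} e^{−x/2}(x/2)^j/j!, so C0 and the power can be recomputed without tables (the bisection bracket below is an arbitrary choice):

```python
from math import exp, factorial

def chi2_sf_even(x, df):
    # P(chi^2_df > x) for even df, via the Erlang/Poisson tail identity
    m = df // 2
    return sum(exp(-x / 2) * (x / 2)**j / factorial(j) for j in range(m))

n, theta0, theta1, alpha = 20, 4.0, 16.0, 0.01

# Solve P(chi^2_20 > C0 / theta0) = alpha for C0 by bisection
lo, hi = 0.0, 1000.0
for _ in range(100):
    mid = (lo + hi) / 2
    if chi2_sf_even(mid / theta0, n) > alpha:
        lo = mid
    else:
        hi = mid
C0 = (lo + hi) / 2
power = chi2_sf_even(C0 / theta1, n)
print(round(C0, 2), round(power, 3))  # C0 ~ 150.26, power ~ 0.978
```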

3 UMP Tests for Testing Certain Composite Hypotheses 337 13. Construct the MP test of the hypothesis H : β = 2 against the alternative A : β = 3 at level of signiﬁcance 0. r. θ). Let X1. This will be shown in the present section.f.5. σ2). This deﬁnition is somewhat more restrictive than the one found in more advanced textbooks but it is sufﬁcient for our purposes.i. 1) p. 1] interval.05. construct the MP test of the hypothesis H : μ = 3.f. UMP tests do exist. .f. σ2 = 4 at level of signiﬁcance α = 0. f(·. . . .05. .’s having the monotone likelihood ratio property. with p = 1 2 H : f = f0 against the alternative A : f = f1: i) Show that the rejection region is deﬁned by: {x ≥ 0 integer.d.d.2.2. . . x! 2x ≥ C} for (Hint: Observe that the function x!/2x is nondecreasing for x integer ≥1. . X100 be independent r. .01. show that α can get arbitrarily small and β arbitrarily large for sufﬁciently large n. θ = f x1 . θ ∈ Ω ⊆ ޒ. It will prove convenient to set g z. . ( ) ( ) ( ) ′ z = x1 . For testing the hypothesis f0 is P(1) and f1 is the Geometric p. In the following.6 Let X be an r. Xn)′.4 Let X1. xn . 13. . In cases like this it so happens that for certain families of distributions and certain H and A.3 Let X1. . or the Triangular p.2.f. Xn be i.f. . . . .13. . with p. On the basis of one 2 2 observation on X. 13. f1(x) = 4x for 0 ≤ x < 1 . 13. whose p.d. θ .5 Let X1.2. σ2).d. For testing the hypothesis H : μ = μ1 against the alternative A : μ = μ2. at least one of the hypotheses H or A is composite. .v.d.v.’s with p. we give the deﬁnition of a family of p.d.d.2. . . where μ is unknown and σ is known. If ¯ = 3. construct the MP test of the hypothesis H : f = f0 against the alternative A : f = f1 at level of signiﬁcance α = 0. over the [0. X30 be independent r.) 13. ii) Determine the level of the test α when C = 3.v. .d. Xn be independent r. However. ( ) (17) Also Z = (X1.v.’s distributed as N(μ. 
denoted by f1 (that is.3 UMP Tests for Testing Certain Composite Hypotheses In the previous section an MP test was constructed for the problem of testing a simple hypothesis against a simple alternative. . σ2 = 4 against the x alternative A : μ = 3.36 × some positive number C. where . is either the U(0.v. f1(x) = 4 − 4x for 1 ≤ x ≤ 1 and 0 otherwise). θ ⋅ ⋅ ⋅ f x1 . denoted by f0.’s distributed as N(μ.7 Let X be an r.f. 1. . . .’s distributed as Gamma with α = 10 and β unknown.2. f which can be either f0 or else f1.v.f. in most problems of practical interest. 13.

θ) and g(·.d.1. (see also Exercise 4. one has g z. θ ′ ( ) = C (θ ′)e ( ) ( ) = C (θ ′) e[ () () g ( z. θ ′) are distinct and (ii) g(z. Therefore on the set of z’s for which h*(z) > 0 (which set has Pθ-probability 1 for all θ). θ = ( ) e − x −θ (1 + e ) − x −θ 2 .θ). θ ∈ Ω = ޒ. θ ′ ∈ Ω with θ < θ ′ then: (i) g(·. σ2) with σ2 known and N(μ. where V(z) = Σn j=1 T(xj) and g(· . PROOF We have g z : θ = C0 θ e n n j=1 ( ) () Q θ V z () () h* z . θ) is a monotone function of V(z). Suppose that Q is increasing. This completes the proof of the ﬁrst assertion. we will always work outside such a set.8(i). Below we present an example of a family which has the MLR property. Gamma with α = θ and β known. θ) with μ known. it follows that all of the following families have the MLR property: Binomial. θ) is an increasing function of V(z). θ ) C (θ ) C (θ )e Q θ′ V z 0 0 Q θ V z 0 0 Q θ ′ −Q θ V z ( ) ( )] ( ) . () where C0(θ) = C (θ). Negative Binomial. Chapter 4) with parameter θ . x ∈ ޒ. Note that the likelihood ratio (LR) in (ii) is well deﬁned except perhaps on a set N of z’s such that Pθ(Z ∈ N) = 0 for all θ ∈ Ω.d. ▲ From examples and exercises in Chapter 11. θ). θ) is given by (17).f. the family has the MLR property in V′ = −V. the assumption that Q is increasing implies that g(z. f x. θ ∈ Ω} has the MLR property in V.338 13 Testing Hypotheses DEFINITION 5 The family {g(·. θ ′)/g(z. If Q is decreasing.f. that is. (18) . Now for θ < θ′. An important family of p. EXAMPLE 5 Consider the Logistic p.’s having the MLR property is a oneparameter exponential family. The proof of the second assertion follows from the fact that [Q(θ ′) − Q(θ )]V (z) = [Q(θ ) − Q(θ ′)]V ′(z). PROPOSITION 1 Consider the exponential family f x: θ =C θ e ( ) () Q θ T x () ( ) hx. θ ∈ Ω} is said to have the monotone likelihood ratio (MLR) property in V if the set of z’s for which g(z. θ) > 0 is independent of θ and there exists a (measurable) function V deﬁned in ޒn into ޒsuch that whenever θ. 
() where C(θ) > 0 for all θ ∈ Ω ⊆ ޒand the set of positivity of h is independent of θ. V(z) = Σ T(xj) and h*(z) = h(x1) · · · h(xn). Poisson. In what follows. N(θ. Then the family {g(· . θ′)/g(z. but it is not of a one-parameter exponential type. or β = θ and α known.

the test is taken from (19) and (19′) with the inequality signs reversed. θ ′ ) f ( x. ⎪ 0. The proof of the theorem is a consequence of the following two lemmas. ⎪ ⎩ if V z > C if () () V ( z) = C (19) otherwise. ⎜ ⎜ − x −θ ′ ⎟ − x ′−θ ′ ⎟ ⎝1+e ⎠ ⎝1+e ⎠ 2 However. Let θ′ be an arbitrary but ﬁxed point in ω c and consider the problem of testing the above hypothesis H0 against the (simple) alternative A′ : θ = θ′ at level α. THEOREM 2 Let X1.v. In the case that the LR is increasing in V(z). Therefore if θ < θ′.’s with p. θ). θ ≤ θ0}.d. ⎪ ⎪ φ ′ z = ⎨γ . f(x. where C and γ are determined by Eθ 0 φ Z = Pθ 0 V Z > C + γPθ 0 V Z = C = α .f. by Theorem 1.13. θ ) 2 if and only if θ −θ ′ − x ′−θ ⎞ ⎛ 1 + e − x −θ ⎞ ⎛ θ −θ ′ 1 + e < e .f. For families of p. . This shows that the family {f(·. the test φ deﬁned by (19) and (19′) is MP (at level α) for testing the (simple) hypothesis H0 : θ = θ0 against the (composite) alternative A : θ ∈ ω c among all tests of level ≤ α. . θ ′ ( ) < f ( x ′. . the last inequality is equivalent to e−x < e−x′ or −x < −x′. θ) is deﬁned in (17). Then for testing the (composite) hypothesis H : θ ∈ ω against the (composite) alternative A : θ ∈ω c at level of signiﬁcance α. θ). θ ) e θ −θ ′ ⎛ 1 + e − x −θ ⎞ ⎜ − x −θ ′ ⎟ ⎝1+ e ⎠ 2 and f x. Xn be i. θ ) 0 otherwise. θ ∈ Ω ⊆ ޒand let the family {g(·. θ ∈ Ω} have the MLR property in V. θ ) f ( x ′. the test is given by ⎧1.’s having the MLR property.d. θ ′) = C ′g( z. .3 UMP Tests for Testing Certain Composite Hypotheses 339 Then f x. θ). θ ′ ( )=e f ( x.i. θ ∈ }ޒhas the MLR property in −x. the MP test φ′ is given by PROOF ⎧1. Then. this is equivalent to e−x(e−θ − e−θ′ ) < e−x′(e−θ − e−θ′ ). ⎪ ⎪ φ z = ⎨γ . LEMMA 1 Under the assumptions made in Theorem 2. we have the following important theorem. ⎪ ⎩ if if g z. Let θ0 ∈ Ω and set ω = {θ ∈ Ω. θ ′ > C ′g z. r. where g(·. θ0 () ( ) ( ) g( z. ( ) [() ] [() ] (19′) If the LR is decreasing in V(z). 
there exists a test φ which is UMP within the class of all tests of level ≤α. ⎪ 0. .d.

⎪ ⎪ φ ′ z = ⎨γ . we have that C = C0 and γ = γ ′ and the test given by (19) and (19′) is UMP for testing H0 : θ = θ0 against A : θ ∈ ω c (at level α). the MP test φ′ is given by PROOF ⎧1. θ0) = ψ [V(z)]. ( ) [() ] [() ] (21′) so that C0 = C and γ ′ = γ by means of (19) and (19′). ( ) Let g(z. ⎪ 0. ( ) { [ ( )] } { [ ( )] } ( ) . ⎪ ⎩ if if g z. θ0 > C ′g z. ⎪ ⎪ φ ′ z = ⎨γ . if and only if if and only if V z > ψ −1 C ′ = C0 ⎫ ⎪ ⎬ V z = C0 .340 13 Testing Hypotheses where C′ and γ′ are deﬁned by Eθ 0 φ ′ Z = α . θ ) = C ′g( z. In other words. ▲ LEMMA 2 Under the assumptions made in Theorem 2. Therefore [ ( )] ψ [V ( z)] = C ′ ψ V z > C′ In addition. by Theorem 1. ⎪ ⎭ () () ( ) (20) Eθ 0 φ ′ Z = Pθ 0 ψ V Z > C ′ + γ ′ Pθ 0 ψ V Z = C ′ = Pθ 0 V Z > C0 + γ ′Pθ 0 V Z = C0 . θ ′) otherwise. ( ) [() { [ ( )] } ] [() { [ ( )] } ] Therefore the test φ′ deﬁned above becomes as follows: ⎧1. and for the test function φ deﬁned by (19) and (19′). Then in the case under consideration ψ is deﬁned on ޒinto itself and is increasing. Once again. θ′ )/g(z. Let θ′ be an arbitrary but ﬁxed point in ω and consider the problem of testing the (simple) hypothesis H′ : θ = θ′ against the (simple) alternative A0(= H0) : θ = θ0 at level α(θ′) = Eθ′φ(Z). and Eθ0 φ ′ Z = Pθ0 V Z > C0 + γ ′ Pθ0 V Z = C 0 = α . ⎪ ⎩ if V z > C0 if 0 () () V ( z) = C (21) otherwise. ⎪ 0. we have Eθ′φ(Z) ≤ α for all θ′ ∈ ω. θ ′ 0 () ( ) ( ) g( z. where C′ and γ′ are determined by Eθ ′φ ′ Z = Pθ ′ ψ V Z > C ′ + γ ′ Pθ′ ψ V Z = C ′ = α θ ′ . It follows from (21) and (21′) that the test φ′ is independent of θ′ ∈ ω c.

θ ≥ θ0}. The desired result follows. a UMP test also exists for testing H : θ ∈ ω against A : θ ∈ ω c. COROLLARY ( ) () () where Q is strictly monotone. the test φ′ above also becomes as follows: ⎧1. ⎪ ⎩ if V z > C0′ if () () V ( z) = C ′ 0 (22) otherwise. Replacing θ0 by θ′ in the expression on the left-hand side of (19′) and comparing the resulting expression with (22′). Hence it is MP among tests in C. Eθ ′φ ′ Z = Pθ ′ V Z > C0′ + γ ′Pθ ′ V Z = C0′ = α θ ′ . θ ∈ Ω. ⎪ ⎪ φ ′ z = ⎨γ .d. . one has that α(θ′) ≤ α. ⎪ 0. one has that C0 ′ = C and γ ′ = γ. under the assumptions of Theorem 2. . . f(·. In all tests.3 UMP Tests for Testing Certain Composite Hypotheses 341 On account of (20). Then. C ⊆ C0. Xn be i. Also for testing H : θ ∈ ω = {θ ∈ Ω. clearly. Furthermore. ▲ REMARK 2 For the symmetric case where ω = {θ ∈ Ω.f. Therefore the tests φ′ and φ are identical. Let X1.i. there is a test φ which is UMP within the class of all tests of level ≤α. since α is the power of the test φ′. The test is given by (19) and (19′) if the LR is decreasing in V(z) and by those relationships with the inequality signs reversed if the LR is increasing in V (z). ▲ It can further be shown that the function β(θ) = Eθφ(Z).’s with p. and is MP among all tests in C0. This test is given by (19) and (19′) if Q is decreasing and by those relationships with reversed inequality signs if Q is increasing. This test is given by (19) and (19′) if Q is increasing and by (19) and (19′) with reversed inequality signs if Q is decreasing. PROOF It is immediate on account of Proposition 1 and Remark 2. belongs in C by Lemma 2. V(z) = Σn j = 1 T(xj). Next. ( ) [() ] [() ] ( ) (22′) where C′0 = ψ −1(C′ ). θ) given by Q (θ )T ( x ) f x. the test φ. Then for testing H : θ ∈ ω = {θ ∈ Ω. for the problem discussed in Theorem 2 and also the symmetric situation mentioned . by Lemma 1. deﬁned by (19) and (19′). θ ≤ θ0} against A : θ ∈ω c at level of signiﬁcance α. θ = C θ e hx. 
there is a test φ which is UMP within the class of all tests of level ≤α. . θ ≤ θ0} against A : θ ∈ ω c at level α. r.d. The relevant proof is entirely analogous to that of Theorem 2.13.v. C0 0 0 { } = {all level α tests for testing H : θ = θ }. by the corollary to Theorem 1. ▲ PROOF OF THEOREM 2 Deﬁne the classes of test C and C0 as follows: C = all level α tests for testing H : θ ≤ θ0 .

in Remark 2, is increasing for those θ's for which it is less than 1 (see Figs. 13.2 and 13.3, respectively).

(Figure 13.2: H : θ ≤ θ0, A : θ > θ0. Figure 13.3: H : θ ≥ θ0, A : θ < θ0.)

Another problem of practical importance is that of testing

H : θ ∈ ω = {θ ∈ Ω; θ ≤ θ1 or θ ≥ θ2} against A : θ ∈ ωc,

where θ1, θ2 ∈ Ω and θ1 < θ2. For instance, θ may represent a dose of a certain medicine and θ1, θ2 are the limits within which θ is allowed to vary. If θ ≤ θ1 the dose is rendered harmless but also useless, whereas if θ ≥ θ2 the dose becomes harmful. One may then hypothesize that the dose in question is either useless or harmful and go about testing the hypothesis. If the underlying distribution of the relevant measurements is assumed to be of a certain exponential form, then a UMP test for the testing problem above does exist. This result is stated as a theorem below, but its proof is not given, since this would rather exceed the scope of this book.

THEOREM 3  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·, θ) given by

f(x; θ) = C(θ)e^{Q(θ)T(x)}h(x), (23)

where Q is assumed to be strictly monotone and θ ∈ Ω = ℝ. Set ω = {θ ∈ Ω; θ ≤ θ1 or θ ≥ θ2}, where θ1, θ2 ∈ Ω and θ1 < θ2. Then for testing the (composite) hypothesis H : θ ∈ ω against the (composite) alternative A : θ ∈ ωc at level of significance α, there exists a UMP test φ. In the case that Q is increasing, φ is given by

φ(z) = 1 if C1 < V(z) < C2; γi if V(z) = Ci (i = 1, 2; C1 < C2); 0 otherwise, (24)

where V(z) = Σn j=1 T(xj) and the constants C1, C2 and γ1, γ2 are determined by

Eθi φ(Z) = Pθi[C1 < V(Z) < C2] + γ1Pθi[V(Z) = C1] + γ2Pθi[V(Z) = C2] = α, i = 1, 2. (25)

θ ≤ θ0} against A : θ ∈ωc and H′ : θ ∈ω = {θ ∈ Ω. It can also be shown that (in nondegenerate cases) the function β(θ) = Eθφ(Z). 13. we mention once and for all that the hypotheses to be tested are H : θ ∈ω = {θ ∈Ω. θ). Xn be i. θ2 ∈Ω and θ1 < θ2. In order to avoid trivial repetitions. α = 0. the UMP test for testing H is given by ⎧1.4).13. ␣ 2 If Q is decreasing. Theorems 2 and 3 are illustrated by a number of examples below. A : θ1 < θ < θ2.5.01 and n = 25.d. θ1. ⎩ if if () ∑j = 1 x j > C n ∑j = 1 x j = C n (26) otherwise. θ . ⎪ ⎪ φ z = ⎨γ . on account of the corollary to Theorem 2. θ0. let θ0 = 0. 0 0 0 ( ) ( ) ( ) (27) and X = ∑Xj j =1 n is B n.4 H : θ ≤ θ1 or θ ≥ θ2. increases for θ ≤ θ0 and decreases for θ ≥ θ0 for some θ1 < θ0 < θ2 (see also Fig. 1). the test is given again by (24) and (25) with C1 < V (z) < C2 replaced by V(z) < C1 or V(z) > C2. θ ≤ θ1 or θ ≥ θ2} against A′ : θ ∈ωc. The level of signiﬁcance is α(0 < α < 1). . ⎪ ⎪0. . Here V z = ∑ xj j =1 () n and Q θ = log () θ 1−θ is increasing since θ/(1 − θ) is so. Then one has .i. where C and γ are determined by Eθ φ Z = Pθ X > C + γPθ X = C = α .3 UMP Tests for Testing Certain Composite Hypotheses 343  ( ) 1 ␣ 0 1 0 Figure 13. . Therefore.v.’s from B(1. r. ( ) For a numerical application. θ ∈ Ω = (0. EXAMPLE 6 Let X1. . θ ∈ Ω for the problem discussed in Theorem 3.

P0.5(X > C) + γP0.5(X = C) = 0.01. The Binomial tables provide the values C = 18 and γ = 27/143. The power of the test at θ = 0.75 is

βφ(0.75) = P0.75(X > 18) + (27/143) P0.75(X = 18) = 0.5923.

By virtue of Theorem 3, for testing H′ the UMP test is given by

φ(z) = 1 if C1 < Σn j=1 xj < C2; γi if Σn j=1 xj = Ci (i = 1, 2); 0 otherwise,

with C1, C2 and γ1, γ2 defined by

Eθi φ(Z) = Pθi(C1 < X < C2) + γ1Pθi(X = C1) + γ2Pθi(X = C2) = α, i = 1, 2.

Again for a numerical application, take θ1 = 0.25, θ2 = 0.75, α = 0.05 and n = 25. For C1 = 10 and C2 = 15, one has after some simplifications

416γ1 + 2γ2 = 205, 2γ1 + 416γ2 = 205,

from which we obtain γ1 = γ2 = 205/418 ≈ 0.4904. The power of the test at θ = 0.5 is

βφ(0.5) = P0.5(10 < X < 15) + (205/418)[P0.5(X = 10) + P0.5(X = 15)] = 0.6711.

EXAMPLE 7  Let X1, . . . , Xn be i.i.d. r.v.'s from P(θ), θ ∈ Ω = (0, ∞). Here V(z) = Σn j=1 xj and Q(θ) = log θ is increasing. Therefore the UMP test for testing H is again given by (26) and (27), where now X is P(nθ). For a numerical example, take θ0 = 0.5, α = 0.05 and n = 10. Then, by means of the Poisson tables, we find C = 9 and γ = 182/363 ≈ 0.5014.
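The two-sided determination of γ1, γ2 can be redone from exact Binomial probabilities rather than four-decimal table entries (which is where the system 416γ1 + 2γ2 = 205, 2γ1 + 416γ2 = 205 comes from); the exact answer differs from 205/418 ≈ 0.4904 only in the third decimal:

```python
from math import comb

def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, t1, t2, alpha, C1, C2 = 25, 0.25, 0.75, 0.05, 10, 15

def mid_prob(p):
    # P_p(C1 < X < C2) for X ~ B(n, p)
    return sum(pmf(k, n, p) for k in range(C1 + 1, C2))

# Level conditions at theta = t1 and theta = t2:
#   mid_prob(t) + g1 * pmf(C1) + g2 * pmf(C2) = alpha,  t = t1, t2.
# Solve the 2x2 linear system by Cramer's rule.
a11, a12, b1 = pmf(C1, n, t1), pmf(C2, n, t1), alpha - mid_prob(t1)
a21, a22, b2 = pmf(C1, n, t2), pmf(C2, n, t2), alpha - mid_prob(t2)
det = a11 * a22 - a12 * a21
g1 = (b1 * a22 - b2 * a12) / det
g2 = (a11 * b2 - a21 * b1) / det
print(round(g1, 4), round(g2, 4))  # ~0.491 each; the rounded tables give 0.4904
```

The symmetry g1 = g2 reflects the symmetry of B(25, 0.25) and B(25, 0.75) about k and 25 − k.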

The power of the test at θ = 1 is βφ(1) = 0.6048.

EXAMPLE 8  Let X1, . . . , Xn be i.i.d. r.v.'s from N(θ, σ²) with σ² known. Here V(z) = Σn j=1 xj and Q(θ) = θ/σ² is increasing. Therefore for testing H the UMP test is given (dividing by n) by

φ(z) = 1 if x̄ > C, and 0 otherwise,

where C is determined by

Eθ0φ(Z) = Pθ0(X̄ > C) = α,

and X̄ is N(θ, σ²/n). (See also Figs. 13.5 and 13.6.) The power of the test, as is easily seen, is given by

βφ(θ) = 1 − Φ[√n(C − θ)/σ].

For instance, for σ = 2, θ0 = 20, α = 0.05 and n = 25, one has C = 20.66. For θ = 21, the power of the test is βφ(21) = 0.8023.

(Figure 13.5: H : θ ≤ θ0, A : θ > θ0. Figure 13.6: H : θ ≥ θ0, A : θ < θ0.)

On the other hand, for testing H′ the UMP test is given by

φ(z) = 1 if C1 < x̄ < C2, and 0 otherwise,

where C1, C2 are determined by

Eθi φ(Z) = Pθi(C1 < X̄ < C2) = α, i = 1, 2.

(See also Fig. 13.7.) The power of the test is given by

βφ(θ) = Φ[√n(C2 − θ)/σ] − Φ[√n(C1 − θ)/σ].
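Example 8's cut-off and power (σ = 2, θ0 = 20, α = 0.05, n = 25) can be recomputed with the standard normal d.f. expressed through the error function; a sketch:

```python
from math import erf, sqrt

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1 + erf(x / sqrt(2)))

sigma, theta0, alpha, n = 2.0, 20.0, 0.05, 25

# Solve P_{theta0}(Xbar > C) = alpha,
# i.e. 1 - Phi(sqrt(n) * (C - theta0) / sigma) = alpha, by bisection
lo, hi = theta0, theta0 + 10
for _ in range(100):
    C = (lo + hi) / 2
    if 1 - Phi(sqrt(n) * (C - theta0) / sigma) > alpha:
        lo = C
    else:
        hi = C

power = 1 - Phi(sqrt(n) * (C - 21.0) / sigma)   # beta_phi(21)
print(round(C, 2), round(power, 3))  # C ~ 20.66, power ~ 0.80 (text: 0.8023 from rounded C)
```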

A′ : θ1 < θ < θ2. and for θ = 12.344. ⎪ φ z =⎨ ⎪ ⎩0. ⎥ ⎦ where C is determined by ⎡n Eθ φ Z = Pθ ⎢∑ X j − μ ⎢ ⎣ j =1 0 ( ) 0 ( ) 2 The power of the test. Then V(z) = 2 Σn j = 1 (xj − μ) and Q(θ) = −1/(2θ) is increasing. ␣ 0 C/ 0 Figure 13. A : θ < θ0. . 13. . Xn be i.7 H ′ : θ ≤ θ1 or θ ≥ θ2.610.05 and n = 25.05 and n = 25. EXAMPLE 9 Let X1. . is given by 2 βφ θ = 1 − P χ n <C θ () ( ) (independent of μ!).9 H : θ ≥ θ0.344.i. the UMP test is given by ⎧1. Then one has C = 150. ⎤ > C⎥ = α. For instance.9.980.) For a numerical example.d. one has C1 = −0. r. distribution as χn. for testing H′ the UMP test is given by ⎧1. the power of the test is βφ(0) = 0. A : θ > θ0. take θ0 = 4.346 13 Testing Hypotheses N (0. ⎪ φ z =⎨ ⎪0. the power of the test is βφ(12) = 0. Therefore for testing H. 2 n ) ␣ 0 C1 0 C2 Figure 13.8 H : θ ≤ θ0. for σ = 2 and θ1 = −1. and for θ = 0. () if C1 < ∑j = 1 x j − μ n ( ) 2 < C2 otherwise. χ2 n stands for an r. C2 are determined by 2 n 2 n ␣ 0 C/ 0 Figure 13. θ2 = 1. On the other hand. 2 (See also Figs.608. α = 0. θ) with μ known.’s from N(μ.v. . as is easily seen. . where C1. ⎩ () if ∑j = 1( x j − μ ) n 2 >C otherwise. α = 0.v. C2 = 0.8 and 13.

More precisely, C1 and C2 are determined by

  E_{θi}φ(Z) = P_{θi}[C1 < Σ_{j=1}^n (Xj − μ)² < C2] = α,  i = 1, 2,

and the power of the test, as is easily seen, is given by

  β_φ(θ) = P(χ²_n < C2/θ) − P(χ²_n < C1/θ)  (independent of μ!).

For instance, for θ1 = 1, θ2 = 3, α = 0.01 and n = 25, C1 and C2 are determined from the Chi-square tables (by trial and error) through the relations

  P(χ²25 < C2) − P(χ²25 < C1) = 0.01  and  P(χ²25 < C2/3) − P(χ²25 < C1/3) = 0.01.

Exercises

13.3.1  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f given below. In each case, show that the joint p.d.f. of the X's has the MLR property in V = V(x1, . . . , xn) and identify V.

i) f(x; θ) = [θ^α/Γ(α)] x^{α−1} e^{−θx} I_{(0,∞)}(x), θ ∈ Ω = (0, ∞), α (> 0) known;

ii) f(x; θ) = θ^r (r + x − 1 choose x)(1 − θ)^x I_A(x), A = {0, 1, . . .}, θ ∈ Ω = (0, 1).

13.3.2  Refer to Example 8 and show that, for testing the hypotheses H and H′ mentioned there, the power of the respective tests is given by

  β_φ(θ) = 1 − Φ[√n(C − θ)/σ]  and  β_φ(θ) = Φ[√n(C2 − θ)/σ] − Φ[√n(C1 − θ)/σ],

as asserted.

13.3.3  The length of life X of a 50-watt light bulb of a certain brand may be assumed to be a normally distributed r.v. with unknown mean μ and known s.d. σ = 150 hours. Let X1, . . . , X25 be independent r.v.'s distributed as X and suppose that x̄ = 1,730 hours. Test the hypothesis H : μ = 1,800 against the alternative A : μ < 1,800 at level of significance α = 0.01.

13.3.4  The rainfall X at a certain station during a year may be assumed to be a normally distributed r.v. with s.d. σ = 3 inches and unknown mean μ. For the past 10 years, the record provides the following rainfalls: x1 = 30, x2 = 34, x3 = 27, x4 = 29, x5 = 35, x6 = 26, x7 = 30, x8 = 28, x9 = 31, x10 = 25. Test the hypothesis H : μ = 30 against the alternative A : μ < 30 at level of significance α = 0.05.

13.3.5  Refer to Example 9 and show that, for testing the hypotheses H and H′ mentioned there, the power of the respective tests is given by

  β_φ(θ) = 1 − P(χ²_n < C/θ)  and  β_φ(θ) = P(χ²_n < C2/θ) − P(χ²_n < C1/θ),

as asserted.

13.3.6  Let X1, . . . , X25 be independent r.v.'s distributed as B(n, θ), θ ∈ Ω = (0, 1).

i) Derive the UMP test for testing the hypothesis H : θ ≤ θ0 against the alternative A : θ > θ0 at level of significance α.
ii) What does the test in (i) become for n = 10, θ0 = 0.125 and α = 0.1? What does the relevant test become for Σ_{j=1}^{25} xj = 120, where xj is the observed value of Xj?
iii) Compute the power at θ1 = 0.375, 0.500, 0.625, 0.750, 0.875.
iv) Now let θ0 = 0.25 and α = 0.05, and suppose that we are interested in securing power at least 0.95. Determine the minimum sample size n required by using the Binomial tables (if possible) and also by using the CLT.

13.3.7  In a certain university, 400 students were chosen at random and it was found that 95 of them were women. On the basis of this, test the hypothesis H that the proportion of women is 25% against the alternative A that it is less than 25%, at level of significance α = 0.05.

13.3.8  Let X1, . . . , Xn be independent r.v.'s distributed as B(1, p). For testing the hypothesis H : p ≤ 1/2 against the alternative A : p > 1/2, suppose that α = 0.05 and β(7/8) = 0.9. Use the CLT in order to determine the required sample size n.

13.3.9  Let X be an r.v. distributed as B(n, p). For testing the hypothesis H : p ≤ 1/2 against the alternative A : p > 1/2, use the CLT in order to determine the cut-off point.

13.3.10  The number X of fatal traffic accidents in a certain city during a year may be assumed to be an r.v. distributed as P(λ). For the latest year x = 4, whereas for the past several years the average was 10. Test whether it has been an improvement. First, write out the expression for the exact determination of the cut-off point, and secondly, use the CLT for its numerical determination.

13.3.11  Let X be the number of times that an electric light switch can be turned on and off until failure occurs. Then X may be considered to be an r.v. distributed as Negative Binomial with r = 1 and unknown p. Let X1, . . . , X15 be independent r.v.'s distributed as X and suppose that x̄ = 15,150. Test the hypothesis H : p = 10⁻⁴ against the alternative A : p > 10⁻⁴ at level of significance α = 0.05.

13.3.12  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f given by

  f(x; θ) = (1/θ) e^{−x/θ} I_{(0,∞)}(x), θ ∈ Ω = (0, ∞).

i) Derive the UMP test for testing the hypothesis H : θ ≥ θ0 against the alternative A : θ < θ0 at level of significance α.
ii) Determine the minimum sample size n required to obtain power at least 0.95 against the alternative θ1 = 500 when θ0 = 1,000 and α = 0.05.

13.4  UMPU Tests for Testing Certain Composite Hypotheses

In Section 13.3, it was stated that under the assumptions of Theorem 3, a UMP test exists for testing H : θ ∈ ω = {θ ∈ Ω; θ ≤ θ1 or θ ≥ θ2} against A : θ ∈ ωc. It is then somewhat surprising that, if the roles of H and A are interchanged, a UMP test does not exist any longer. Also, under the assumptions of Theorem 2, for testing H0 : θ = θ0 against A″ : θ ≠ θ0 a UMP test does not exist. This is so because the test given by (19) and (19′) is UMP for θ > θ0 but is worse than the trivial test φ(z) = α for θ < θ0. Thus there is no unique test which is UMP for all θ ≠ θ0. The above observations suggest that, in order to find a test with some optimal property, one would have to restrict oneself to a smaller class of tests. This leads us to introducing the concept of an unbiased test.

DEFINITION 6  Let Ω ⊆ ℝr and let ω ⊂ Ω, ωc = Ω − ω. Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω. Then a test φ based on X1, . . . , Xn, for testing the hypothesis H : θ ∈ ω against the alternative A : θ ∈ ωc at level of significance α, is said to be unbiased if Eθφ(X1, . . . , Xn) ≤ α for all θ ∈ ω and Eθφ(X1, . . . , Xn) ≥ α for all θ ∈ ωc. That is, the defining property of an unbiased test is that the type-I error probability is at most α and the power of the test is at least α.

DEFINITION 7  In the notation of Definition 6, a test is said to be uniformly most powerful unbiased (UMPU) if it is UMP within the class of all unbiased tests.

REMARK 3  A UMPU test is always unbiased, and a UMP test is always UMPU. In the first place, a UMP test is unbiased because it is at least as powerful as the test which is identically equal to α; next, it is UMPU because it is UMP within a class including the class of unbiased tests.

For certain important classes of distributions and certain hypotheses, UMPU tests do exist. The following theorem covers cases of this sort, but it will be presented without a proof.

THEOREM 4  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ) given by

  f(x; θ) = C(θ) e^{θT(x)} h(x),  θ ∈ Ω ⊆ ℝ.  (28)

Let ω = {θ ∈ Ω; θ1 ≤ θ ≤ θ2} and ω0 = {θ0}, where θ0, θ1, θ2 ∈ Ω and θ1 < θ2. Then for testing the hypothesis H : θ ∈ ω against A : θ ∈ ωc and the hypothesis H0 : θ ∈ ω0 against A0 : θ ∈ ω0c at level of significance α, there exist UMPU tests which are given by

  φ(z) = 1 if V(z) < C1 or V(z) > C2;  γi if V(z) = Ci (C1 < C2), i = 1, 2;  0 otherwise,

where the constants Ci, γi, i = 1, 2 are given by

  E_{θi}φ(Z) = α, i = 1, 2, for H,

and

  E_{θ0}φ(Z) = α,  E_{θ0}[V(Z)φ(Z)] = α E_{θ0}V(Z), for H0.

(Recall that z = (x1, . . . , xn)′, Z = (X1, . . . , Xn)′ and V(z) = Σ_{j=1}^n T(xj).) Furthermore, it can be shown that the function β_φ(θ) = Eθφ(Z), θ ∈ Ω (except for degenerate cases) is decreasing for θ ≤ θ0 and increasing for θ ≥ θ0 for some θ1 < θ0 < θ2 (see also Fig. 13.10).

[Figure 13.10  H : θ1 ≤ θ ≤ θ2, A : θ < θ1 or θ > θ2: the power β(θ) equals α at θ1 and θ2 and dips between them.]

REMARK 4  We would expect that cases like the Binomial, Poisson and Normal would fall under Theorem 4, while they seemingly do not. However, by Examples and Exercises of Chapter 11, it can be seen that all these families are of the exponential form

  f(x; θ) = C(θ) e^{Q(θ)T(x)} h(x).

In fact, a simple reparametrization of the families brings them under the form (28).

Specifically:

i) For the Binomial case, Q(θ) = log[θ/(1 − θ)]. Then by setting log[θ/(1 − θ)] = τ, we get θ = e^τ/(1 + e^τ), and the hypotheses θ1 ≤ θ ≤ θ2, θ = θ0 become, equivalently, τ1 ≤ τ ≤ τ2, τ = τ0, where τi = log[θi/(1 − θi)], i = 0, 1, 2.

ii) For the Poisson case, Q(θ) = log θ and the transformation log θ = τ brings the family under the form (28). The transformation implies θ = e^τ, and the hypotheses θ1 ≤ θ ≤ θ2, θ = θ0 become, equivalently, τ1 ≤ τ ≤ τ2, τ = τ0, where τi = log θi, i = 0, 1, 2.

iii) For the Normal case with σ known and μ = θ, Q(θ) = (1/σ²)θ and the factor 1/σ² may be absorbed into T(x).

iv) For the Normal case with μ known and σ² = θ, Q(θ) = −1/(2θ) and the transformation −1/(2θ) = τ brings the family under the form (28) again, where τi = −1/(2θi), i = 0, 1, 2.

As an application of Theorem 4, and for later reference, we consider the following example.

EXAMPLE 10  Suppose X1, . . . , Xn are i.i.d. r.v.'s from N(μ, σ²). Let σ be known and set μ = θ. Suppose that we are interested in testing the hypothesis H : θ = θ0 against the alternative A : θ ≠ θ0. The level of significance will be α. In the present case,

  T(x) = x/σ²,  so that  V(z) = Σ_{j=1}^n T(xj) = (1/σ²) Σ_{j=1}^n xj = (n/σ²) x̄.

Therefore, by Theorem 4, the UMPU test is as follows:

  φ(z) = 1 if (n/σ²) x̄ < C1 or (n/σ²) x̄ > C2, and φ(z) = 0 otherwise,

where C1, C2 are determined by

  E_{θ0}φ(Z) = α,  E_{θ0}[V(Z)φ(Z)] = α E_{θ0}V(Z).

Now φ can be expressed equivalently as follows.

More explicitly, we have

  φ(z) = 1 if √n(x̄ − θ0)/σ < C1′ or √n(x̄ − θ0)/σ > C2′, and φ(z) = 0 otherwise,

where

  C1′ = σC1/√n − √n θ0/σ,  C2′ = σC2/√n − √n θ0/σ.

On the other hand, under H, √n(X̄ − θ0)/σ is N(0, 1). Therefore, because of symmetry, C1′ = −C2′ = −C, say (C > 0). Also,

  √n(x̄ − θ0)/σ < −C or √n(x̄ − θ0)/σ > C

is equivalent to [√n(x̄ − θ0)/σ]² > C², and, under H, [√n(X̄ − θ0)/σ]² is χ1². By summarizing then, and writing C for C² again, the UMPU test is given by

  φ(z) = 1 if [√n(x̄ − θ0)/σ]² > C, and φ(z) = 0 otherwise,

where C is determined by P(χ1² > C) = α.

In many situations of practical importance, the underlying p.d.f. involves a real-valued parameter θ in which we are exclusively interested, and in addition some other real-valued parameters ϑ1, . . . , ϑk in which we have no interest. These latter parameters are known as nuisance parameters. More explicitly, the p.d.f. is of the following exponential form:

  f(x; θ, ϑ1, . . . , ϑk) = C(θ, ϑ1, . . . , ϑk) exp[θT(x) + ϑ1T1(x) + · · · + ϑkTk(x)] h(x),  (29)

where θ ∈ Ω ⊆ ℝ, ϑ1, . . . , ϑk are real-valued, and h(x) > 0 on a set independent of all parameters involved. The hypotheses of interest concern θ, while ϑ1, . . . , ϑk are left unspecified. Let θ0, θ1, θ2 ∈ Ω with θ1 < θ2. Then the (composite) hypotheses of interest are the following ones.

Determine the test for α = 0. 2 2 13. This is a consequence of Theorem 5 (except again for the UMP test). .2 Let X1. Xn be i. . Exercises 13. X3 be independent r.4. θ ≤ θ ≤ θ } ⎪ ⎪ {θ } ⎪ ⎭ 0 0 1 2 i i 1 2 0 c .d.v. the optimal character of the tests will not become apparent by that process. the other serving as a nuisance parameter. . θ ≤ θ } ⎪ ⎪ {θ ∈Ω.4. . 4. r.05. with probability p of falling heads. given by (29). . Then. One of the parameters at a time will be the parameter of interest.v. . .25 against A : p ≠ 0. where both μ and σ2 are unknown.1 A coin.i. i = 1. whose proof is omitted. σ2). as indicated in Remark 5.13.d. r.5 Testing the Parameters of a Normal Distribution In the present section. the family is brought under the form (29). Because of the special role that normal populations play in practice. All tests to be presented below are UMPU.’s distributed as B(1.5 Tests Testing forthe Testing Parameters Certainof Composite a Normal Hypotheses Distribution 353 ⎧H1 :θ ∈ω = ⎪ ⎪H1′ :θ ∈ω = ⎪ ⎪ ⎨H 2 :θ ∈ω = ⎪ ⎪H 3 :θ ∈ω = ⎪ H :θ ∈ω = ⎪ ⎩ 4 ⎫ {θ ∈Ω.i. (30) We may now formulate the following theorem.3 UMP 13. . . . Use the UMPU test for testing the hypothesis H : p = 1 against the alternative A : p ≠ 1 at level of signiﬁcance α = 0. θ ≥ θ } ⎪ ⎬ A ( A′) :θ ∈ω {θ ∈Ω. Xn are assumed to be i. THEOREM 5 Let X1.’s from N(μ. X2. . i = 1. . . respectively.25 at level of signiﬁcance α.’s with p. . Derive the UMPU test for testing H : p = 0. X1. the following two sections are devoted to presenting simple tests for the hypotheses speciﬁed in (30). under some additional regularity conditions.v. except for the ﬁrst one which is UMP. Also the remaining (unspeciﬁed) regularity conditions in Theorem 5 can be shown to be satisﬁed here. θ ≤ θ or θ ≥ θ }⎪ ⎪ {θ ∈Ω.1. . Some of the tests will be arrived at again on the basis of the principle of likelihood ratio to be discussed in Section 7. 4.d. and therefore the conclusion of the theorem holds. . . 
Under appropriate reparametrization. there exist UMPU tests with level of signiﬁcance α for testing any one of the hypotheses Hi(H1 ′ ) against the alternatives Ai(A′ 1). 13. However.f. is tossed independently 100 times and 60 heads are observed. p).

Whenever convenient, we will also use the notation z and Z instead of (x1, . . . , xn)′ and (X1, . . . , Xn)′, respectively. Also, all tests will be of level α.

13.5.1  Tests about the Variance

PROPOSITION 2  For testing H1 : σ ≤ σ0 against A1 : σ > σ0, the test given by

  φ(z) = 1 if Σ_{j=1}^n (xj − x̄)² > C, and φ(z) = 0 otherwise,  (31)

where C is determined by

  P(χ²_{n−1} > C/σ0²) = α,  (32)

is UMP. The power of the test is easily determined by the fact that (1/σ²) Σ_{j=1}^n (Xj − X̄)² is χ²_{n−1} when σ obtains (that is, σ is the true s.d.). For example, for n = 25, σ0 = 3 and α = 0.05, we have C/9 = 36.415, so that C = 327.735; the power of the test at σ = 5 is equal to P(χ²24 > 13.1094) = 0.962.

The test given by (31) and (32) with reversed inequalities is UMPU for testing H1′ : σ ≥ σ0 against A1′ : σ < σ0. For H1′, C/9 = 13.848, so that C = 124.632, and the power at σ = 2 is P(χ²24 < 31.158) = 0.8384.

PROPOSITION 3  For testing H2 : σ ≤ σ1 or σ ≥ σ2 against A2 : σ1 < σ < σ2, the test given by

  φ(z) = 1 if C1 < Σ_{j=1}^n (xj − x̄)² < C2, and φ(z) = 0 otherwise,  (33)

where C1, C2 are determined by

  P(C1/σi² < χ²_{n−1} < C2/σi²) = α,  i = 1, 2,  (34)

is UMPU. Again, the power of the test is determined by the fact that (1/σ²) Σ_{j=1}^n (Xj − X̄)² is χ²_{n−1} when σ obtains. For example, for H2 and for n = 25, σ1 = 2, σ2 = 3 and α = 0.05, C1 and C2 are determined by

  P(χ²24 > C1/4) − P(χ²24 > C2/4) = 0.05  and  P(χ²24 > C1/9) − P(χ²24 > C2/9) = 0.05.

Finally, the test given by (33) and (34), where the inequalities C1 < Σ_{j=1}^n (xj − x̄)² < C2 are replaced by

  Σ_{j=1}^n (xj − x̄)² < C1  or  Σ_{j=1}^n (xj − x̄)² > C2

(and similarly for (34)), is UMPU for testing H3 : σ1 ≤ σ ≤ σ2 against A3 : σ < σ1 or σ > σ2.
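The numbers in Proposition 2 come straight from χ²₂₄ quantiles; a sketch assuming scipy is available:

```python
from scipy.stats import chi2

n, sigma0, alpha = 25, 3.0, 0.05

# H1: sigma <= sigma0 -- reject when sum (x_j - xbar)^2 > C, where
# P(chi^2_{n-1} > C / sigma0^2) = alpha.
C = sigma0 ** 2 * chi2.ppf(1 - alpha, n - 1)
print(round(C / 9, 3), round(C, 3))      # ≈ 36.415 and 327.735

# Power at sigma = 5: (1/sigma^2) sum (X_j - Xbar)^2 is chi^2_24 when sigma obtains.
power_5 = 1 - chi2.cdf(C / 25.0, n - 1)
print(round(power_5, 2))                 # ≈ 0.96

# H1' (reversed inequalities): P(chi^2_24 < C/9) = alpha.
C_prime = sigma0 ** 2 * chi2.ppf(alpha, n - 1)
print(round(C_prime, 1))                 # ≈ 124.6 (the text's table value: 124.632)
```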

PROPOSITION 4  For testing H4 : σ = σ0 against A4 : σ ≠ σ0, the test given by

  φ(z) = 1 if Σ_{j=1}^n (xj − x̄)² < C1 or Σ_{j=1}^n (xj − x̄)² > C2, and φ(z) = 0 otherwise,

where C1, C2 are determined by

  ∫_{C1/σ0²}^{C2/σ0²} g(t) dt = [1/(n − 1)] ∫_{C1/σ0²}^{C2/σ0²} t g(t) dt = 1 − α,

g being the p.d.f. of a χ²_{n−1} distribution, is UMPU. The power of the test is determined as in the previous cases.

REMARK 5  The popular equal-tail test is not UMPU; however, it is a close approximation to the UMPU test when n is large.

13.5.2  Tests about the Mean

In connection with the problem of testing hypotheses about the mean, UMPU tests exist in a simple form and are explicitly given for the following three cases: μ ≤ μ0, μ ≥ μ0 and μ = μ0. To facilitate the writing, we set

  t(z) = √n (x̄ − μ0) / √[(1/(n − 1)) Σ_{j=1}^n (xj − x̄)²].  (35)

PROPOSITION 5  For testing H1 : μ ≤ μ0 against A1 : μ > μ0, the test given by

  φ(z) = 1 if t(z) > C, and φ(z) = 0 otherwise,  (36)

where C is determined by

  P(t_{n−1} > C) = α,  (37)

is UMPU; here t_{n−1} stands for an r.v. distributed as t_{n−1}, and t(z) is given by (35). The test given by (36) and (37) with reversed inequalities is UMPU for testing H1′ : μ ≥ μ0 against A1′ : μ < μ0. (See also Figs. 13.11 and 13.12.) For n = 25 and α = 0.05, we have P(t24 > C) = 0.05, hence C = 1.7109 for H1 and C = −1.7109 for H1′.

PROPOSITION 6  For testing H4 : μ = μ0 against A4 : μ ≠ μ0, the test given by

  φ(z) = 1 if t(z) < −C or t(z) > C (C > 0), and φ(z) = 0 otherwise,

where C is determined by

  P(t_{n−1} > C) = α/2,

is UMPU; t(z) is given by (35). (See also Fig. 13.13.)
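The cut-offs in Propositions 5 and 6 are Student-t quantiles; assuming scipy is available, the quoted table values are reproduced by:

```python
from scipy.stats import t

n, alpha = 25, 0.05

C_one_sided = t.ppf(1 - alpha, n - 1)      # Proposition 5: P(t_24 > C) = alpha
C_two_sided = t.ppf(1 - alpha / 2, n - 1)  # Proposition 6: P(t_24 > C) = alpha/2

print(round(C_one_sided, 4))   # 1.7109
print(round(C_two_sided, 4))   # 2.0639
```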

In both these last two propositions, the determination of the power involves what is known as the non-central t-distribution, which is defined in Appendix II. For example, for n = 25 and α = 0.05, we have C = 2.0639.

[Figure 13.11  H1 : μ ≤ μ0, A1 : μ > μ0.  Figure 13.12  H1′ : μ ≥ μ0, A1′ : μ < μ0.  Figure 13.13  H4 : μ = μ0, A4 : μ ≠ μ0.  t_{n−1} densities, with rejection region of total area α (α/2 in each tail for Fig. 13.13, cut-offs −C and C).]

Exercises

13.5.1  The diameters of bolts produced by a certain machine are r.v.'s distributed as N(μ, σ²). In order for the bolts to be usable for the intended purpose, the s.d. σ must not exceed 0.05 inch. A sample of size 16 is taken and it is found that s = 0.04 inch. Formulate the appropriate testing hypothesis problem and carry out the test if α = 0.05.

13.5.2  Let X1, . . . , Xn be i.i.d. r.v.'s distributed as N(μ, σ²), where both μ and σ are unknown.

i) Derive the UMPU test for testing the hypothesis H : σ = σ0 against the alternative A : σ ≠ σ0 at level of significance α.
ii) Carry out the test if n = 25, σ0 = 2, Σ_{j=1}^{25}(xj − x̄)² = 24.8, and α = 0.05.

13.5.3  Discuss the testing hypothesis problem in Exercise 13.3.4 if both μ and σ are unknown.

13.5.4  A manufacturer claims that packages of certain goods contain 18 ounces. In order to check his claim, 100 packages are chosen at random from a large lot and it is found that

  Σ_{j=1}^{100} xj = 1,752  and  Σ_{j=1}^{100} xj² = 31,157.

Make the appropriate assumptions and test the hypothesis H that the manufacturer's claim is correct against the appropriate alternative A at level of significance α = 0.01.

13.5.5  The diameters of certain cylindrical items produced by a machine are r.v.'s distributed as N(μ, σ²). A sample of size 16 is taken and it is found that x̄ = 2.48 inches. If the desired value for μ is 2.5 inches, formulate the appropriate testing hypothesis problem and carry out the test if α = 0.05.

13.6  Comparing the Parameters of Two Normal Distributions

Let X1, . . . , Xm be i.i.d. r.v.'s from N(μ1, σ1²) and let Y1, . . . , Yn be i.i.d. r.v.'s from N(μ2, σ2²). It is assumed that the two random samples are independent and that all four parameters involved are unknown. Set μ = μ1 − μ2 and τ = σ2²/σ1². The problem to be discussed in this section is that of testing certain hypotheses about μ and τ. Each time either μ or τ will be the parameter of interest, the remaining parameters serving as nuisance parameters.

Writing down the joint p.d.f. of the X's and Y's and reparametrizing the family along the lines suggested in Remark 4 reveals that this joint p.d.f. has the form (29) in either one of the parameters μ or τ. Furthermore, it can be shown that the additional (but unspecified) regularity conditions of Theorem 5 are satisfied, and therefore there exist UMPU tests for the hypotheses specified in (30). For some of these hypotheses, the tests have a simple form to be explicitly mentioned below.

For convenient writing, we shall employ the notation

  Z = (X1, . . . , Xm)′,  W = (Y1, . . . , Yn)′

for the X's and Y's, and

  z = (x1, . . . , xm)′,  w = (y1, . . . , yn)′

for their observed values.

13.6.1  Comparing the Variances of Two Normal Densities

PROPOSITION 7  For testing H1 : τ ≤ τ0 against A1 : τ > τ0, the test given by

  φ(z, w) = 1 if Σ_{j=1}^n (yj − ȳ)² / Σ_{i=1}^m (xi − x̄)² > C, and φ(z, w) = 0 otherwise,  (38)

where C is determined by

  P(F_{n−1,m−1} > C0) = α,  C0 = (m − 1)C/[(n − 1)τ0],  (39)

is UMPU; here F_{n−1,m−1} stands for an r.v. distributed as F_{n−1,m−1}. The test given by (38) and (39) with reversed inequalities is UMPU for testing H1′ : τ ≥ τ0 against A1′ : τ < τ0. (See also Figs. 13.14 and 13.15.)

[Figure 13.14  H1 : τ ≤ τ0, A1 : τ > τ0.  Figure 13.15  H1′ : τ ≥ τ0, A1′ : τ < τ0.  F_{n−1,m−1} densities with cut-off point C0 and rejection region of area α.]

The power of the test is easily determined by the fact that

  [(1/(n−1)) Σ_{j=1}^n (Yj − Ȳ)²/σ2²] / [(1/(m−1)) Σ_{i=1}^m (Xi − X̄)²/σ1²] = (1/τ) · [(1/(n−1)) Σ_{j=1}^n (Yj − Ȳ)²] / [(1/(m−1)) Σ_{i=1}^m (Xi − X̄)²]

is F_{n−1,m−1} distributed when τ obtains. Thus the power of the test depends only on τ. For m = 25, n = 21, τ0 = 2 and α = 0.05, one has for H1

  P(F20,24 > 5C/12) = 0.05, hence 5C/12 = 2.0267 and C = 4.8640;

for H1′,

  P(F20,24 < 5C/12) = P(F24,20 > 12/(5C)) = 0.05 implies 12/(5C) = 2.0825, and hence C = 1.1525.

Now set

  V(z, w) = (1/τ0) Σ_{j=1}^n (yj − ȳ)² / [Σ_{i=1}^m (xi − x̄)² + (1/τ0) Σ_{j=1}^n (yj − ȳ)²].  (40)

Then we have the following result.

PROPOSITION 8  For testing H4 : τ = τ0 against A4 : τ ≠ τ0, the test given by

  φ(z, w) = 1 if V(z, w) < C1 or V(z, w) > C2, and φ(z, w) = 0 otherwise,

where C1, C2 are determined by

  P[C1 < B_{(n−1)/2, (m−1)/2} < C2] = P[C1 < B_{(n+1)/2, (m−1)/2} < C2] = 1 − α,

is UMPU; V(z, w) is defined by (40), and B_{r1,r2} stands for an r.v. distributed as Beta with r1, r2 degrees of freedom. For the actual determination of C1, C2, we use the incomplete Beta tables. (See, for example, Tables of the Incomplete Beta-Function by Karl Pearson, Cambridge University Press; also, New Tables of the Incomplete Gamma Function Ratio and of Percentage Points of the Chi-Square and Beta Distributions by H. Leon Harter, Aerospace Research Laboratories, Office of Aerospace Research.)

13.6.2  Comparing the Means of Two Normal Densities

We shall also assume that σ1² = σ2² = σ² (unspecified) and set μ = μ2 − μ1. In the present context, it will be convenient to set

  t(z, w) = (ȳ − x̄) / √[Σ_{i=1}^m (xi − x̄)² + Σ_{j=1}^n (yj − ȳ)²].  (41)

PROPOSITION 9  For testing H1 : μ ≤ 0 against A1 : μ > 0, the test given by

  φ(z, w) = 1 if t(z, w) > C, and φ(z, w) = 0 otherwise,  (42)

where C is determined by

  P(t_{m+n−2} > C0) = α,  C0 = C √[(m + n − 2)/(1/m + 1/n)],  (43)

is UMPU. The test given by (42) and (43) with reversed inequalities is UMPU for testing H1′ : μ ≥ 0 against A1′ : μ < 0. The determination of the power of the test involves a non-central t-distribution, as was also the case in Propositions 5 and 6. For example, for m = 15, n = 10 and α = 0.05, one has for H1: P(t23 > C√(23 × 6)) = 0.05, hence C√(23 × 6) = 1.7139 and C = 0.1459. For H1′, C = −0.1459.
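The table values quoted in Propositions 7 and 9 (and in Proposition 10 below) are F and t quantiles; a sketch assuming scipy is available:

```python
from math import sqrt
from scipy.stats import f, t

alpha = 0.05

# Proposition 7 (m = 25, n = 21, tau0 = 2): P(F_{20,24} > 5C/12) = 0.05 for H1,
# and P(F_{24,20} > 12/(5C)) = 0.05 for H1'.
C_H1 = (12.0 / 5.0) * f.ppf(1 - alpha, 20, 24)
C_H1_prime = 12.0 / (5.0 * f.ppf(1 - alpha, 24, 20))
print(round(C_H1, 3), round(C_H1_prime, 3))    # ≈ 4.864 and 1.152

# Propositions 9 and 10 (m = 15, n = 10): C0 = C * sqrt((m+n-2)/(1/m + 1/n))
# = C * sqrt(23 * 6), with P(t_23 > C0) = alpha (resp. alpha/2).
scale = sqrt(23.0 * 6.0)
C_one = t.ppf(1 - alpha, 23) / scale           # ≈ 0.146
C_two = t.ppf(1 - alpha / 2, 23) / scale       # ≈ 0.176
print(round(C_one, 4), round(C_two, 4))
```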

PROPOSITION 10  For testing H4 : μ = 0 against A4 : μ ≠ 0, the test given by

  φ(z, w) = 1 if t(z, w) < −C or t(z, w) > C, and φ(z, w) = 0 otherwise,

where C is determined by

  P(t_{m+n−2} > C0) = α/2,  C0 as above,

is UMPU. Once again, the determination of the power of the test involves the non-central t-distribution. Again with m = 15, n = 10 and α = 0.05, one has P(t23 > C√(23 × 6)) = 0.025, hence C√(23 × 6) = 2.0687 and C = 0.1762.

REMARK 6  In Propositions 9 and 10, if the variances are not equal, the tests presented above are not UMPU. The problem of comparing the means of two normal densities when the variances are unequal is known as the Behrens–Fisher problem. For this case, various tests have been proposed, but we will not discuss them here.

Exercises

13.6.1  Let Xi, i = 1, . . . , 4 and Yj, j = 1, . . . , 4 be two independent random samples from the distributions N(μ1, σ1²) and N(μ2, σ2²), respectively. Suppose that the observed values of the X's and Y's are as follows: x1 = 10, x2 = 8, x3 = 14, x4 = 11; y1 = 9, y2 = 8, y3 = 12, y4 = 10. Test the hypothesis H : σ1 = σ2 against the alternative A : σ1 ≠ σ2 at level of significance α = 0.05. (Compute the value of the test statistic, and set up the formulas for determining the cut-off points and the power of the test.)

13.6.2  Let Xj, j = 1, . . . , 9 and Yj, j = 1, . . . , 10 be independent random samples from the distributions N(μ1, σ1²) and N(μ2, σ2²), respectively. Suppose that the observed values of the sample s.d.'s are sX = 2, sY = 3. At level of significance α = 0.05, test the hypothesis H : σ1 = σ2 against the alternative A : σ1 ≠ σ2, and find the power of the test at σ1 = 2, σ2 = 3. (Compute the value of the test statistic, and set up the formulas for determining the cut-off points and the power of the test.)

13.6.3  Five resistance measurements are taken on two test pieces, and the observed values (in ohms) are as follows: x1 = 0.118, x2 = 0.125, x3 = 0.121, x4 = 0.117, x5 = 0.120; y1 = 0.114, y2 = 0.115, y3 = 0.119, y4 = 0.120, y5 = 0.110. Make the appropriate assumptions and test the hypothesis H : σ1 = σ2 against the alternative A : σ1 ≠ σ2 at level of significance α = 0.05. (Compute the value of the test statistic,
and set up the formulas for determining the cut-off points and the power of the test.)

13.6.4  Refer to Exercise 13.6.2 and suppose it is known that σ1 = 4 and σ2 = 3. Test the hypotheses H1 : μ ≤ 0 against A1 : μ > 0 and H2 : μ = 0 against A2 : μ ≠ 0 at level of significance α = 0.05.

13.6.5  The breaking powers of certain steel bars produced by processes A and B are r.v.'s distributed as normal with possibly different means but the same variance. A random sample of size 25 is taken from bars produced by each one of the processes, and it is found that x̄ = 60, sX = 6, ȳ = 65, sY = 7. Test whether there is a difference between the two processes at the level of significance α = 0.05.

13.6.6  Refer to Exercise 13.6.5 and test the hypothesis H that the two means do not differ by more than 1 at level of significance α = 0.05.

13.6.7  Let Xi, i = 1, . . . , n and Yi, i = 1, . . . , n be r.v.'s distributed as N(μ1, σ1²) and N(μ2, σ2²), respectively. By setting Zi = Xi − Yi, i = 1, . . . , n, we have that the r.v.'s Zi are independent and distributed as N(μ, σ²), with μ = μ1 − μ2 and σ² = σ1² + σ2², and then one may use Propositions 5 and 6 to test hypotheses about μ. Make the appropriate assumptions, and test the hypothesis H : μ1 = μ2 against the alternative A : μ1 ≠ μ2 at level of significance α = 0.05 for the data given in (i) Exercise 13.6.1 and (ii) Exercise 13.6.3.

13.7  Likelihood Ratio Tests

Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝr, and let ω ⊂ Ω. Set L(ω) = f(x1; θ) · · · f(xn; θ) whenever θ ∈ ω, and L(ωc) = f(x1; θ) · · · f(xn; θ) when θ is varying over ωc. Now, when both ω and ωc consist of a single point, L(ω) and L(ωc) are completely determined, and for testing H : θ ∈ ω against A : θ ∈ ωc, the MP test rejects when the likelihood ratio (LR) L(ωc)/L(ω) is too large (greater than or equal to a constant C determined by the size of the test); in fact, the test specified by the Neyman–Pearson fundamental lemma is also a likelihood ratio test. However, if ω and ωc contain more than one point each, then neither L(ω) nor L(ωc) is determined by H and A, and the above method of testing does not apply. The problem can be reduced to it, though, by the following device: L(ω) is to be replaced by L(ω̂) = max[L(θ); θ ∈ ω], and L(ωc) is to be replaced by L(ω̂c) = max[L(θ); θ ∈ ωc]. Then, for setting up a test, one would compare the quantities L(ω̂) and L(ω̂c). In practice, however, the statistic L(ω̂)/L(Ω̂) is used rather than L(ω̂c)/L(ω̂), where L(Ω̂) = max[L(θ); θ ∈ Ω]. (When we talk about a statistic, it will be understood that the observed values have been replaced by the corresponding r.v.'s, although the same notation will be employed.) In terms of this statistic, one rejects H if L(ω̂)/L(Ω̂) is too small, that is,

  λ ≤ C,  where λ = L(ω̂)/L(Ω̂)

and C is specified by the desired size of the test. (Notice that 0 < λ ≤ 1.) For obvious reasons, the test is called a likelihood ratio (LR) test, and the notation λ = L(ω̂)/L(Ω̂) has been in wide use. Also, the statistic −2 log λ rather than λ itself is employed, the reason being that, under suitable regularity conditions, the asymptotic distribution of −2 log λ, under H, is known. In terms of this statistic, one rejects H whenever −2 log λ > C, where C is determined by the desired level of the test; this test is equivalent to the LR test.

Now the likelihood ratio test, which rejects H whenever L(ω̂)/L(Ω̂) is too small, has an intuitive interpretation, as follows: The quantity L(ω̂), for the discrete case, and the probability element L(ω̂)dx1 · · · dxn, for the continuous case, is the maximum probability of observing x1, . . . , xn if θ lies in ω. Similarly, L(Ω̂) and the probability element L(Ω̂)dx1 · · · dxn represent the maximum probability, for the discrete and continuous case, respectively, of observing x1, . . . , xn without restrictions on θ. Thus, if θ ∈ ω, the quantities L(ω̂) and L(Ω̂) will tend to be close together (by an assumed continuity (in θ) of the likelihood function L(θ | x1, . . . , xn)), and therefore λ will be close to 1. Should λ be too far away from 1, the data would tend to discredit H, and therefore H is to be rejected.

In carrying out the likelihood ratio test in actuality, one is apt to encounter two sorts of difficulties. First is the problem of determining the cut-off point C, and second is the problem of actually determining L(ω̂) and L(Ω̂). The problem of finding L(Ω̂) is essentially that of finding the MLE of θ; calculating L(ω̂) is a much harder problem. The first difficulty is removed at the asymptotic level, in the sense that we may use as an approximation (for sufficiently large n) the limiting distribution of −2 log λ for specifying C.

In spite of the apparent difficulties that a likelihood ratio test may present, it does provide a unified method for producing tests. Also, in addition to its intuitive interpretation, in many cases of practical interest and for a fixed sample size, the likelihood ratio test happens to coincide with or to be close to other tests which are known to have some well-defined optimal properties, such as being UMP or being UMPU; furthermore, under certain regularity conditions, it enjoys some asymptotic optimal properties as well.

In the following, a theorem referring to the asymptotic distribution of −2 log λ is stated (but not proved), and then a number of illustrative examples are discussed.

THEOREM 6  Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω, where Ω is an r-dimensional subset of ℝr, and let ω be an m-dimensional subset of Ω. Suppose also that the set of positivity of the p.d.f., {x; f(x; θ) > 0}, does not depend on θ. Then, under some additional regularity conditions, the asymptotic distribution of −2 log λ, under H, is χ²_{r−m}; that is, provided θ ∈ ω,

  Pθ(−2 log λ ≤ x) → G(x), x ≥ 0, as n → ∞,

where G is the d.f. of a χ²_{r−m} distribution.
In spite of the apparent difﬁculties that a likelihood ratio test may present. reprobability element L(ω spectively. as n → ∞.d. if θ ∈ω ˆ) and L(Ω will tend to be close together (by an assumed continuity (in θ) of the likelihood function L(θ | x1.d. L(Ω the discrete and continuous case. the asymptotic distribution of −2 log λ is χ2 r−m. . this test is equivalent to the LR test. xn if θ lies in ω. Thus. xn without ˆ) ω. . . Xn be i. . ( ) () x ≥ 0 for all θ ∈ω. respectively. Calculating problem of ﬁnding L(Ω L(ω ˆ ) is a much harder problem. . one is apt to encounter two sorts of difﬁculties. provided θ ∈ω Pθ −2 log λ ≤ x → G x .’s with p. however. ˆ ) and L(Ω ˆ )dx1 · · · dxn represent the maximum probability for Similarly. . θ ∈Ω Ω. . and therefore H is to be rejected. of observing x1. is known.f. in the sense that we may use as an approximation (for sufﬁciently large n) the limiting distribution of −2 log λ for specifying C. In the following. . Of course. under certain regularity conditions. . θ). where C is determined by the desired level of the test. . in many cases of practical interest and for a ﬁxed sample size. .

Chapter 12). σ2). ˆΩ = ¯ x (see Example 12. It should also be pointed out that this test is the same as the test found in Example 10 and therefore the present test is also UMPU. ( ) . n x − μ0 σ2 ( ) 2 () ( )⎤ ⎥ ⎥ ⎦ 2 >C where C is determined by P χ12 > C = α . .) Notice that this is consistent with Theorem 6. EXAMPLE 11 (Testing the mean of a normal distribution) Let X1.3 UMP Tests for Testing Certain 13. xn)′.v.i. ⎥ ⎦ ) In this example. .’s from N(μ. Also the level of signiﬁcance will always be α. of a χ2 r−m distribution. r. we have Since the MLE of μ is μ ˆ = LΩ and ˆ = Lω ( ) ( 1 2πσ ) n n ⎡ 1 exp ⎢− 2 ⎢ ⎣ 2σ ∑ (x j =1 n j 2⎤ −x ⎥ ⎥ ⎦ ) ( ) ( 1 2πσ ) ⎡ 1 exp ⎢− 2 ⎢ ⎣ 2σ ∑ (x j =1 n j 2⎤ − μ 0 ⎥. or some other test equivalent to it. i) Let σ be known and suppose we are interested in testing the hypothesis H : μ ∈ω = {μ 0}. φ z =⎨ ⎪ ⎪ ⎩0. . Xn be i. the alternative A speciﬁes that θ ∈ ωc .7 Composite Likelihood Hypotheses Ratio Tests 363 where G is the d. (Recall that z = (x1. .13. . since σ is unspeciﬁed. ii) Consider the same problem as in (i) but suppose now that σ is also unknown.d. ⎡ n x−μ 0 if ⎢ ⎢ σ ⎣ otherwise. this will not have to be mentioned explicitly in the sequel. . and consider the following testing hypotheses problems. . In fact. We are still interested in testing the hypothesis H : μ = μ0 which now is composite.f. . it is much easier to determine the distribution of −2 log λ rather than that of λ. Since in using the LR test. −2 log λ = and the LR test is equivalent to ⎧ ⎪ ⎪1.

ii) Consider the same problem as in (i), but suppose now that σ is also unknown. We are still interested in testing the hypothesis H : μ = μ₀, which now is composite, since σ is unspecified. The MLE's of μ and σ², under Ω and ω, respectively, are

μ̂_Ω = x̄, σ̂²_Ω = (1/n) Σ_{j=1}^n (x_j − x̄)² and σ̂²_ω = (1/n) Σ_{j=1}^n (x_j − μ₀)²

(see Example 12, Chapter 12). Therefore

L(Ω̂) = (1/(√(2π)σ̂_Ω))ⁿ exp[−(1/(2σ̂²_Ω)) Σ_{j=1}^n (x_j − x̄)²] = (1/(√(2π)σ̂_Ω))ⁿ e^(−n/2)

and

L(ω̂) = (1/(√(2π)σ̂_ω))ⁿ exp[−(1/(2σ̂²_ω)) Σ_{j=1}^n (x_j − μ₀)²] = (1/(√(2π)σ̂_ω))ⁿ e^(−n/2).

Then

λ = (σ̂²_Ω/σ̂²_ω)^(n/2).

But

Σ_{j=1}^n (x_j − μ₀)² = Σ_{j=1}^n (x_j − x̄)² + n(x̄ − μ₀)²,

and therefore

λ^(2/n) = [1 + n(x̄ − μ₀)²/Σ_{j=1}^n (x_j − x̄)²]^(−1) = (1 + t²/(n − 1))^(−1),

where

t = t(z) = √n(x̄ − μ₀)/√[(1/(n − 1)) Σ_{j=1}^n (x_j − x̄)²].

Then λ < λ₀ is equivalent to t² > C for a certain constant C. That is, the LR test is equivalent to the test

φ(z) = 1 if t < −C or t > C, and 0 otherwise,

where C is determined by P(t_{n−1} > C) = α/2, since, under H, t is distributed as t_{n−1}.
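The algebraic identity λ^(2/n) = (1 + t²/(n − 1))⁻¹ derived above is easy to confirm numerically; the sketch below uses made-up sample values and only the formulas in the text:

```python
import math

xs = [2.3, 1.1, 3.4, 0.9, 2.8, 1.6, 2.2]   # hypothetical sample
mu0 = 1.0
n = len(xs)
xbar = sum(xs) / n

s2_Omega = sum((x - xbar) ** 2 for x in xs) / n   # MLE of sigma^2 under Omega
s2_omega = sum((x - mu0) ** 2 for x in xs) / n    # MLE of sigma^2 under omega
lam = (s2_Omega / s2_omega) ** (n / 2.0)          # the LR statistic lambda

s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # unbiased sample variance
t = math.sqrt(n) * (xbar - mu0) / math.sqrt(s2)   # the t statistic of the text

# lambda^(2/n) = (1 + t^2/(n-1))^(-1): small lambda <=> large |t|
lhs = lam ** (2.0 / n)
rhs = 1.0 / (1.0 + t * t / (n - 1))
```

Since λ is a strictly decreasing function of t², rejecting for small λ is the same as rejecting for large |t|, which is why the LR test reduces to the usual two-sided t test.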

Also, the test just derived is UMPU.

EXAMPLE 12 (Comparing the parameters of two normal distributions) Let X₁, . . . , X_m be i.i.d. r.v.'s from N(μ₁, σ₁²) and Y₁, . . . , Yₙ be i.i.d. r.v.'s from N(μ₂, σ₂²). Suppose that the X's and Y's are independent, and consider the following testing hypotheses problems.

i) Assume that σ₁ = σ₂ = σ unknown, and we are interested in testing the hypothesis H : μ₁ = μ₂ (= μ, unspecified). In the present case,

Ω = {θ = (μ₁, μ₂, σ)′; μ₁, μ₂ ∈ ℝ, σ > 0} and ω = {θ = (μ₁, μ₂, σ)′; μ₁ = μ₂ ∈ ℝ, σ > 0}.

The joint p.d.f. of the X's and Y's is given by

(1/√(2π))^(m+n) (1/(σ₁^m σ₂^n)) exp[−(1/(2σ₁²)) Σ_{i=1}^m (x_i − μ₁)² − (1/(2σ₂²)) Σ_{j=1}^n (y_j − μ₂)²].

Under Ω, the MLE's of the parameters involved are given by

μ̂₁,Ω = x̄, μ̂₂,Ω = ȳ, σ̂²_Ω = (1/(m + n))[Σ_{i=1}^m (x_i − x̄)² + Σ_{j=1}^n (y_j − ȳ)²],

while under ω, by setting υ_k = x_k, k = 1, . . . , m and υ_{m+k} = y_k, k = 1, . . . , n, one has

μ̂_ω = ῡ = (1/(m + n))(Σ_{i=1}^m x_i + Σ_{j=1}^n y_j) = (mx̄ + nȳ)/(m + n)

and

σ̂²_ω = (1/(m + n)) Σ_{k=1}^{m+n} (υ_k − ῡ)² = (1/(m + n))[Σ_{i=1}^m (x_i − μ̂_ω)² + Σ_{j=1}^n (y_j − μ̂_ω)²].

Therefore

L(Ω̂) = (1/(√(2π)σ̂_Ω))^(m+n) e^(−(m+n)/2) and L(ω̂) = (1/(√(2π)σ̂_ω))^(m+n) e^(−(m+n)/2),

as is easily seen, so that

λ = (σ̂²_Ω/σ̂²_ω)^((m+n)/2).

Next,

Σ_{i=1}^m (x_i − μ̂_ω)² = Σ_{i=1}^m [(x_i − x̄) + (x̄ − μ̂_ω)]² = Σ_{i=1}^m (x_i − x̄)² + m(x̄ − μ̂_ω)² = Σ_{i=1}^m (x_i − x̄)² + (mn²/(m + n)²)(x̄ − ȳ)²,

and in a similar manner

Σ_{j=1}^n (y_j − μ̂_ω)² = Σ_{j=1}^n (y_j − ȳ)² + (m²n/(m + n)²)(x̄ − ȳ)²,

so that

(m + n)σ̂²_ω = (m + n)σ̂²_Ω + (mn/(m + n))(x̄ − ȳ)².

It follows then that

λ^(2/(m+n)) = σ̂²_Ω/σ̂²_ω = (1 + t²/(m + n − 2))^(−1),

where

t = √(mn/(m + n)) (x̄ − ȳ)/√{(1/(m + n − 2))[Σ_{i=1}^m (x_i − x̄)² + Σ_{j=1}^n (y_j − ȳ)²]}.

Therefore the LR test, which rejects H whenever λ < λ₀, is equivalent to the following test:

φ(z, w) = 1 if t < −C or t > C (C > 0), and 0 otherwise,

where C is determined by P(t_{m+n−2} > C) = α/2, because, under H, t is distributed as t_{m+n−2}. (Here z = (x₁, . . . , x_m)′ and w = (y₁, . . . , yₙ)′.) We notice that the test φ above is the same as the UMPU test found in Proposition 10.
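The corresponding two-sample identity λ^(2/(m+n)) = (1 + t²/(m + n − 2))⁻¹, with t the pooled-variance statistic above, can also be checked numerically (a sketch; both samples are made up):

```python
import math

xs = [4.1, 5.2, 3.8, 4.9, 5.5, 4.4]   # hypothetical X-sample
ys = [3.2, 4.0, 3.6, 2.9, 3.8]        # hypothetical Y-sample
m, n = len(xs), len(ys)
xbar, ybar = sum(xs) / m, sum(ys) / n
ssx = sum((x - xbar) ** 2 for x in xs)
ssy = sum((y - ybar) ** 2 for y in ys)

s2_Omega = (ssx + ssy) / (m + n)                                  # MLE under Omega
s2_omega = s2_Omega + m * n * (xbar - ybar) ** 2 / (m + n) ** 2   # MLE under omega
lam = (s2_Omega / s2_omega) ** ((m + n) / 2.0)

sp2 = (ssx + ssy) / (m + n - 2)                                   # pooled variance
t = math.sqrt(m * n / (m + n)) * (xbar - ybar) / math.sqrt(sp2)

lhs = lam ** (2.0 / (m + n))
rhs = 1.0 / (1.0 + t * t / (m + n - 2))
```

Again, small λ corresponds exactly to large |t|, so the LR test is carried out as a two-sided t_{m+n−2} test.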

ii) Now we are interested in testing the hypothesis H : σ₁ = σ₂ (= σ, unspecified). Under Ω = {θ = (μ₁, μ₂, σ₁, σ₂)′; μ₁, μ₂ ∈ ℝ, σ₁, σ₂ > 0}, we have

μ̂₁,Ω = x̄, μ̂₂,Ω = ȳ, σ̂²₁,Ω = (1/m) Σ_{i=1}^m (x_i − x̄)², σ̂²₂,Ω = (1/n) Σ_{j=1}^n (y_j − ȳ)²,

while under ω = {θ = (μ₁, μ₂, σ₁, σ₂)′; μ₁, μ₂ ∈ ℝ, σ₁ = σ₂ > 0},

μ̂₁,ω = μ̂₁,Ω, μ̂₂,ω = μ̂₂,Ω and σ̂²_ω = (1/(m + n))[Σ_{i=1}^m (x_i − x̄)² + Σ_{j=1}^n (y_j − ȳ)²].

Therefore

L(Ω̂) = (1/(2π)^((m+n)/2)) [1/((σ̂²₁,Ω)^(m/2)(σ̂²₂,Ω)^(n/2))] e^(−(m+n)/2)

and

L(ω̂) = (1/(2π)^((m+n)/2)) [1/(σ̂²_ω)^((m+n)/2)] e^(−(m+n)/2),

so that

λ = (σ̂²₁,Ω)^(m/2)(σ̂²₂,Ω)^(n/2)/(σ̂²_ω)^((m+n)/2)
  = (m + n)^((m+n)/2) [Σ_{i=1}^m (x_i − x̄)²]^(m/2) [Σ_{j=1}^n (y_j − ȳ)²]^(n/2) / {m^(m/2) n^(n/2) [Σ_{i=1}^m (x_i − x̄)² + Σ_{j=1}^n (y_j − ȳ)²]^((m+n)/2)}
  = [(m + n)^((m+n)/2)/(m^(m/2) n^(n/2))] · ((m − 1)f/(n − 1))^(m/2) / (1 + (m − 1)f/(n − 1))^((m+n)/2),

where

f = [Σ_{i=1}^m (x_i − x̄)²/(m − 1)]/[Σ_{j=1}^n (y_j − ȳ)²/(n − 1)].

Therefore the LR test, which rejects H whenever λ < λ₀, is equivalent to the test based on f which rejects H if

((m − 1)f/(n − 1))^(m/2) / (1 + (m − 1)f/(n − 1))^((m+n)/2) < C

for a certain constant C. Setting g(f) for the left-hand side of this last inequality, we have that g(0) = 0 and g(f) → 0 as f → ∞. Furthermore, it can be seen (see Exercise 13.7.4) that g(f) has a maximum at the point

f_max = m(n − 1)/(n(m − 1));

it is increasing between 0 and f_max and decreasing in (f_max, ∞). Therefore g(f) < C if and only if

f < C₁ or f > C₂

for certain specified constants C₁ and C₂. Now, if in the expression of f the x's and y's are replaced by X's and Y's, and we denote by F the resulting statistic, it follows that, under H, F is distributed as F_{m−1,n−1}. Therefore the constants C₁ and C₂ are uniquely determined by the following requirements:

P(F_{m−1,n−1} < C₁ or F_{m−1,n−1} > C₂) = α and g(C₁) = g(C₂).

However, in practice the C₁ and C₂ are determined so as to assign probability α/2 to each one of the two tails of the F_{m−1,n−1} distribution; that is, such that

P(F_{m−1,n−1} < C₁) = P(F_{m−1,n−1} > C₂) = α/2.

(See also Figure 13.16, which depicts the density of the F_{m−1,n−1} distribution with probability α/2 in each of the two tails, to the left of C₁ and to the right of C₂.)
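The claimed shape of g(f) (the subject of Exercise 13.7.4) is easy to confirm numerically; the sketch below, with arbitrarily chosen m and n, checks that g increases up to f_max = m(n − 1)/(n(m − 1)), decreases afterwards, and tends to 0:

```python
def g(f, m, n):
    # left-hand side of the rejection inequality, as a function of f
    a = (m - 1) / (n - 1) * f
    return a ** (m / 2.0) / (1.0 + a) ** ((m + n) / 2.0)

m, n = 10, 8                                  # hypothetical sample sizes
fmax = m * (n - 1) / (n * (m - 1))            # claimed location of the maximum

grid_left = [fmax * k / 50.0 for k in range(1, 50)]          # points in (0, fmax)
grid_right = [fmax * (1.0 + k / 10.0) for k in range(1, 50)]  # points beyond fmax

incr = all(g(a, m, n) < g(b, m, n) for a, b in zip(grid_left, grid_left[1:]))
decr = all(g(a, m, n) > g(b, m, n) for a, b in zip(grid_right, grid_right[1:]))
```

Because g rises and then falls, the event g(f) < C is exactly a two-sided event {f < C₁ or f > C₂}, which is why the test is referred to the two tails of the F distribution.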

Exercises

13.7.1 Refer to Exercise 13.2.2 and use the LR test to test the hypothesis H : p = 0.25 against the alternative A : p ≠ 0.25, at level of significance α = 0.05. Specifically, set λ(t) for the likelihood ratio, where t = x₁ + x₂ + x₃, and:
i) calculate the values λ(t) for t = 0, 1, 2, 3, as well as the corresponding probabilities under H;
ii) set up the LR test, in terms of both λ(t) and t;
iii) specify the (randomized) test of level of significance α = 0.05;
iv) compare the test in part (iii) with the UMPU test constructed in Exercise 13.4.1.

13.7.2 A coin, with probability p of falling heads, is tossed 100 times and 60 heads are observed. At level of significance α = 0.05:
i) test the hypothesis H : p = 1/2 against the alternative A : p ≠ 1/2 by using the LR test, and employ the appropriate approximation to determine the cut-off point;
ii) compare the cut-off point in part (i) with that found in Exercise 13.4.4.

13.7.3 If X₁, . . . , Xₙ (n ≥ 2) are i.i.d. r.v.'s from N(μ, σ²), derive the LR test and the test based on −2 log λ for testing the hypothesis H : σ = σ₀, first in the case that μ is known and secondly in the case that μ is unknown. In the first case, compare the test based on −2 log λ with that derived in Example 11.

13.7.4 Consider the function

g(t) = ((m − 1)t/(n − 1))^(m/2) / (1 + (m − 1)t/(n − 1))^((m+n)/2), t ≥ 0,

where m, n ≥ 2 are integers. Show that g(t) → 0 as t → ∞, that

max{g(t); t ≥ 0} = (m/n)^(m/2)/[1 + (m/n)]^((m+n)/2),

attained at t = m(n − 1)/(n(m − 1)), and that g is increasing in [0, m(n − 1)/(n(m − 1))] and decreasing in [m(n − 1)/(n(m − 1)), ∞).

13.8 Applications of LR Tests: Contingency Tables, Goodness-of-Fit Tests

Now we turn to a slightly different testing hypotheses problem, where the LR is also appropriate. We consider an experiment which may result in k possibly different outcomes denoted by O_j, j = 1, . . . , k. In n independent repetitions of the experiment, let p_j be the (constant) probability that each one of the trials will result in the outcome O_j, and denote by X_j the number of trials which result in O_j, j = 1, . . . , k. Then the joint distribution of the X's is the Multinomial distribution; that is,

P(X₁ = x₁, . . . , X_k = x_k) = [n!/(x₁! · · · x_k!)] p₁^{x₁} · · · p_k^{x_k},

where x_j ≥ 0, j = 1, . . . , k, with Σ_{j=1}^k x_j = n, and

θ = (p₁, . . . , p_k)′ ∈ Ω = {θ = (p₁, . . . , p_k)′; p_j > 0, j = 1, . . . , k, Σ_{j=1}^k p_j = 1}.

We may suspect that the p's have certain specified values; in the case of a die, for example, the die may be balanced. We then formulate this as a hypothesis and proceed to test it on the basis of the data. More generally, we may want to test the hypothesis that θ lies in a subset ω of Ω.

Consider the case that H : θ ∈ ω = {θ₀} = {(p₁₀, . . . , p_k₀)′}. Then, under ω,

L(ω̂) = [n!/(x₁! · · · x_k!)] p₁₀^{x₁} · · · p_k₀^{x_k},

while, under Ω,

L(Ω̂) = [n!/(x₁! · · · x_k!)] p̂₁^{x₁} · · · p̂_k^{x_k},

where p̂_j = x_j/n are the MLE's of p_j, j = 1, . . . , k (see Example 11, Chapter 12). Therefore

λ = nⁿ Π_{j=1}^k (p_{j0}/x_j)^{x_j},

and H is rejected if −2 log λ > C. The constant C is determined by the desired level of significance α and the fact that −2 log λ is asymptotically χ²(k−1) distributed under H, as can be shown on the basis of Theorem 6.
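As a concrete check — using the die frequencies that appear in Exercise 13.8.3 — both −2 log λ and the asymptotically equivalent Pearson chi-square statistic (discussed later in this section) can be computed directly; as the theory suggests, the two are close (a sketch):

```python
import math

obs = [100, 94, 103, 89, 110, 104]   # die frequencies, as in Exercise 13.8.3
n = sum(obs)                         # 600 casts
p0 = [1.0 / 6.0] * 6                 # H: the die is balanced

# -2 log(lambda) = 2 * sum_j x_j * log(x_j / (n * p_j0))
lr = 2.0 * sum(x * math.log(x / (n * p)) for x, p in zip(obs, p0))
# Pearson's chi-square statistic, asymptotically equivalent under H
pearson = sum((x - n * p) ** 2 / (n * p) for x, p in zip(obs, p0))
# both are referred to the chi-square distribution with k - 1 = 5 d.f.
```

Here Pearson's statistic equals 2.82, well below the χ²₅ cut-off 11.07 at α = 0.05, so the fairness of the die is not rejected.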

Now consider r events A_i, i = 1, . . . , r, which form a partition of the sample space S, and let {B_j, j = 1, . . . , s} be another partition of S. Let p_ij = P(A_i ∩ B_j), and let

p_i. = P(A_i) = Σ_{j=1}^s p_ij, p_.j = P(B_j) = Σ_{i=1}^r p_ij,

so that

Σ_{i=1}^r p_i. = Σ_{j=1}^s p_.j = Σ_{i=1}^r Σ_{j=1}^s p_ij = 1.

Clearly, the events {A₁, . . . , A_r} and {B₁, . . . , B_s} are independent if and only if p_ij = p_i. p_.j for all i = 1, . . . , r and j = 1, . . . , s.

A situation where this set-up is appropriate is the following: Certain experimental units are classified according to two characteristics denoted by A and B, and let A₁, . . . , A_r be the r levels of A and B₁, . . . , B_s be the s levels of B. For instance, A may stand for gender, with A₁, A₂ for male and female, and B may denote educational status comprising the levels B₁ (elementary school graduate), B₂ (high school graduate), B₃ (college graduate) and B₄ (beyond). We may think of the rs events A_i ∩ B_j being arranged in an r × s rectangular array which is known as a contingency table; the event A_i ∩ B_j is called the (i, j)th cell.

Again consider n experimental units classified according to the characteristics A and B, and let X_ij be the number of those falling into the (i, j)th cell. We set

X_i. = Σ_{j=1}^s X_ij and X_.j = Σ_{i=1}^r X_ij.

It is then clear that

Σ_{i=1}^r X_i. = Σ_{j=1}^s X_.j = Σ_{i=1}^r Σ_{j=1}^s X_ij = n.

Let θ = (p_ij, i = 1, . . . , r, j = 1, . . . , s)′ ∈ ℝ^{rs}. Then the set Ω of all possible values of θ, namely Ω = {θ; p_ij > 0, Σ_{i=1}^r Σ_{j=1}^s p_ij = 1}, is an (rs − 1)-dimensional hyperplane in ℝ^{rs}.

Under the above set-up, the problem of interest is that of testing whether the characteristics A and B are independent; that is, we want to test the existence of probabilities p_i, q_j, i = 1, . . . , r, j = 1, . . . , s, such that

H : p_ij = p_i q_j, i = 1, . . . , r, j = 1, . . . , s.

Since for i = 1, . . . , r − 1 and j = 1, . . . , s − 1 we have the r + s − 2 independent linear relationships Σ_{j=1}^s p_ij = p_i and Σ_{i=1}^r p_ij = q_j, it follows that the set ω, specified by H, is an (r + s − 2)-dimensional subset of Ω.

Next, if x_ij is the observed value of X_ij, and if we set x_i. = Σ_{j=1}^s x_ij and x_.j = Σ_{i=1}^r x_ij, the likelihood function takes the following forms under Ω and ω, respectively (writing Π_{i,j} instead of Π_{i=1}^r Π_{j=1}^s):

L(Ω) = [n!/Π_{i,j} x_ij!] Π_{i,j} p_ij^{x_ij};

L(ω) = [n!/Π_{i,j} x_ij!] Π_{i,j} (p_i q_j)^{x_ij} = [n!/Π_{i,j} x_ij!] (Π_i p_i^{x_i.})(Π_j q_j^{x_.j}),

since

Π_{i,j} (p_i q_j)^{x_ij} = (Π_i p_i^{x_i.})(Π_j q_j^{x_.j}).

Now the MLE's of p_ij, and of p_i and q_j, under Ω and ω, respectively, are

p̂_ij,Ω = x_ij/n, p̂_i,ω = x_i./n, q̂_j,ω = x_.j/n,

as is easily seen (see also Exercise 13.8.1). Therefore

L(Ω̂) = [n!/Π_{i,j} x_ij!] Π_{i,j} (x_ij/n)^{x_ij} and L(ω̂) = [n!/Π_{i,j} x_ij!] [Π_i (x_i./n)^{x_i.}][Π_j (x_.j/n)^{x_.j}],

and hence

λ = (Π_i x_i.^{x_i.})(Π_j x_.j^{x_.j}) / (nⁿ Π_{i,j} x_ij^{x_ij}).

It can be shown that the (unspecified) assumptions of Theorem 6 are fulfilled in the present case, and therefore −2 log λ is asymptotically χ²_f, where f = (rs − 1) − (r + s − 2) = (r − 1)(s − 1), according to Theorem 6. Hence the test for H can be carried out explicitly.

Now, in a multinomial situation, as described at the beginning of this section, and in connection with the estimation problem, it was seen (see Section 12.9, Chapter 12) that certain chi-square statistics were appropriate, in a sense. Recall that

χ² = Σ_{j=1}^k (X_j − np_j)²/(np_j).
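Both −2 log λ and the chi-square statistic with estimated margins (the statistic χ²_ω̂ introduced just below, whose expected cell counts are np̂_i,ω q̂_j,ω = x_i. x_.j/n) can be computed directly from a contingency table. The sketch below uses the 2 × 4 ward data of Exercise 13.8.9; both statistics are referred to χ² with (r − 1)(s − 1) = 3 degrees of freedom:

```python
import math

# 2 x 4 table of Exercise 13.8.9: voters favoring / not favoring, by ward
table = [[37, 29, 32, 21],
         [63, 71, 68, 79]]
r, s = len(table), len(table[0])
n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(table[i][j] for i in range(r)) for j in range(s)]

pearson = 0.0   # chi-square statistic with estimated margins
lr = 0.0        # -2 log(lambda)
for i in range(r):
    for j in range(s):
        e = row_tot[i] * col_tot[j] / n   # n * p_hat_i * q_hat_j
        o = table[i][j]
        pearson += (o - e) ** 2 / e
        lr += 2.0 * o * math.log(o / e)

df = (r - 1) * (s - 1)                     # = 3
```

For these data the chi-square statistic is about 6.45, below the χ²₃ cut-off 7.815 at α = 0.05, and the two asymptotically equivalent statistics differ only slightly.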

This χ² r.v. can be used for testing the hypothesis

H : θ ∈ ω = {θ₀} = {(p₁₀, . . . , p_k₀)′},

where θ = (p₁, . . . , p_k)′. That is, we consider

χ²_ω = Σ_{j=1}^k (x_j − np_{j0})²/(np_{j0})

and reject H if χ²_ω is too large, in the sense of being greater than a certain constant C which is specified by the desired level of the test. It can further be shown that, under ω, χ²_ω is asymptotically distributed as χ²(k−1). In fact, the present test is asymptotically equivalent to the test based on −2 log λ.

For the case of contingency tables and the problem of testing independence there, we have

χ²_ω = Σ_{i,j} (x_ij − np_i q_j)²/(np_i q_j),

where ω is as in the previous case in connection with the contingency tables. However, χ²_ω is not a statistic, since it involves the parameters p_i, q_j. By replacing them by their MLE's, we obtain the statistic

χ²_ω̂ = Σ_{i,j} (x_ij − np̂_{i,ω} q̂_{j,ω})²/(np̂_{i,ω} q̂_{j,ω}), where p̂_{i,ω} = x_i./n and q̂_{j,ω} = x_.j/n.

By means of χ²_ω̂, one can test H by rejecting it whenever χ²_ω̂ > C. The constant C is to be determined by the significance level and the fact that the asymptotic distribution of χ²_ω̂, under ω, is χ²_f with f = (r − 1)(s − 1), as can be shown. Once more, this test is asymptotically equivalent to the corresponding test based on −2 log λ. Tests based on chi-square statistics are known as chi-square tests or goodness-of-fit tests, for obvious reasons.

Exercises

13.8.1 Show that p̂_ij,Ω = x_ij/n, p̂_i,ω = x_i./n and q̂_j,ω = x_.j/n, as claimed in the discussion in this section.

13.8.2 Refer to Exercise 13.7.2 and test the hypothesis formulated there at the specified level of significance by using a χ²-goodness-of-fit test. Also, compare the cut-off point with that found in Exercise 13.7.2(i).

In Exercises 13.8.3–13.8.9 below, the test to be used will be the appropriate χ² test.

13.8.3 A die is cast 600 times and the numbers 1 through 6 appear with the frequencies recorded below.

1: 100  2: 94  3: 103  4: 89  5: 110  6: 104

At the level of significance α = 0.05, test the fairness of the die.

13.8.4 In a certain genetic experiment, two different varieties of a certain species are crossed and a specific characteristic of the offspring can only occur at three levels A, B and C. According to a proposed model, the probabilities for A, B and C are 1/12, 3/12 and 8/12, respectively. Out of 60 offspring, 12, 12 and 36 fall into levels A, B and C, respectively. Test the validity of the proposed model at the level of significance α = 0.05.

13.8.5 Course work grades are often assumed to be normally distributed. In a certain class, suppose that letter grades are given in the following manner: A for grades in [90, 100], B for grades in [75, 89], C for grades in [60, 74], D for grades in [50, 59] and F for grades in [0, 49]. Use the data given below to check the assumption that the data is coming from an N(75, 92) distribution. For this purpose, employ the appropriate χ² test and take α = 0.05.

A: 3  B: 12  C: 10  D: 4  F: 1

13.8.6 It is often assumed that I.Q. scores of human beings are normally distributed. Test this claim for the data given below by choosing appropriately the Normal distribution and taking α = 0.05.

x ≤ 90: 10  90 < x ≤ 100: 18  100 < x ≤ 110: 23  110 < x ≤ 120: 22  120 < x ≤ 130: 18  x > 130: 9

(Hint: Estimate μ and σ² from the grouped data; for this purpose, take the midpoints for the finite intervals and the points 65 and 160 for the leftmost and rightmost intervals, respectively.)

13.8.7 Consider a group of 100 people living and working under very similar conditions. Half of them are given a preventive shot against a certain disease and the other half serve as control. Of those who received the treatment, 40 did not contract the disease whereas the remaining 10 did so. Of those not treated, 30 did contract the disease and the remaining 20 did not. Test the effectiveness of the vaccine at the level of significance α = 0.05.

13.8.8 On the basis of the following scores, appropriately taken, test whether there are gender-associated differences in mathematical ability (as is often claimed!). Take α = 0.05.

Boys: 80 96 98 87 75 83 70 92 97 82
Girls: 82 90 84 70 80 97 76 90 88 86

(Hint: Group the grades into the following six intervals: [70, 75), [75, 80), [80, 85), [85, 90), [90, 95), [95, 100).)

13.8.9 From each of four political wards of a city with approximately the same number of voters, 100 voters were chosen at random and their opinions were asked regarding a certain legislative proposal. On the basis of the data given below, test whether the fractions of voters favoring the legislative proposal under consideration differ in the four wards. Take α = 0.05.

WARD:                     1    2    3    4   Totals
Favor proposal:          37   29   32   21    119
Do not favor proposal:   63   71   68   79    281
Totals:                 100  100  100  100    400

13.8.10 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝʳ, and let ω be a (measurable) subset of Ω. Then the hypothesis to be tested is H : θ ∈ ω against the alternative A : θ ∈ ωᶜ. For testing a hypothesis H against an alternative A at level of significance α, a test φ is said to be consistent if its power βφ, evaluated at any fixed θ ∈ ωᶜ, converges to 1 as n → ∞. Refer to the previous exercises and find at least one test which enjoys the property of consistency. Specifically, check whether the consistency property is satisfied with regard to Exercises 13.8.2 and 13.8.3.

13.9 Decision-Theoretic Viewpoint of Testing Hypotheses

For the definition of a decision, loss and risk function, the reader is referred to Section 6, Chapter 12. Let X₁, . . . , Xₙ be independent r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝʳ, and let B be a critical region. Then by setting z = (x₁, . . . , xₙ)′, in the present context a non-randomized decision function δ = δ(z) is defined as follows:

δ(z) = 1 if z ∈ B, and 0 otherwise.

We shall confine ourselves to non-randomized decision functions only. Also, an appropriate loss function, corresponding to δ, is of the following form:

L(θ; δ) = 0 if θ ∈ ω and δ = 0, or θ ∈ ωᶜ and δ = 1;
L(θ; δ) = L₁ if θ ∈ ω and δ = 1;
L(θ; δ) = L₂ if θ ∈ ωᶜ and δ = 0,

where L₁, L₂ > 0. By setting Z = (X₁, . . . , Xₙ)′, the corresponding risk function is

R(θ, δ) = L₁Pθ(Z ∈ B) if θ ∈ ω, and R(θ, δ) = L₂Pθ(Z ∈ Bᶜ) if θ ∈ ωᶜ. (44)

In particular, if ω = {θ₀}, ωᶜ = {θ₁}, and Pθ₀(Z ∈ B) = α, Pθ₁(Z ∈ B) = β, we have

R(θ₀, δ) = L₁α and R(θ₁, δ) = L₂(1 − β). (45)

Clearly, a decision function in the present framework is simply a test function; the notation φ instead of δ could be used if one wished. As in the point estimation case, we would like to determine a decision function δ for which the corresponding risk would be uniformly (in θ) smaller than the risk corresponding to any other decision function δ*. Since this is not feasible, except for trivial cases, we are led to minimax decision and Bayes decision functions corresponding to a given prior p.d.f. on Ω. Thus, in the case that ω = {θ₀} and ωᶜ = {θ₁}, δ is minimax if

max[R(θ₀, δ), R(θ₁, δ)] ≤ max[R(θ₀, δ*), R(θ₁, δ*)]

for any other decision function δ*. Regarding the existence of minimax decision functions, we have the result below.

THEOREM 7 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω = {θ₀, θ₁}. We are interested in testing the hypothesis H : θ = θ₀ against the alternative A : θ = θ₁. Set f₀ = f(·; θ₀), f₁ = f(·; θ₁), and define the subset B of ℝⁿ as follows:

B = {z ∈ ℝⁿ; f(z; θ₁) > Cf(z; θ₀)},

and assume that there is a determination of the constant C such that

L₁Pθ₀(Z ∈ B) = L₂Pθ₁(Z ∈ Bᶜ) (46)

(equivalently, R(θ₀, δ) = R(θ₁, δ)). Then the decision function δ defined by

δ(z) = 1 if z ∈ B, and 0 otherwise, (47)

is minimax.

PROOF For simplicity, set P₀ and P₁ for Pθ₀ and Pθ₁, respectively, and write R(0, δ), R(1, δ) for R(θ₀, δ), R(θ₁, δ). Also set P₀(Z ∈ B) = α and P₁(Z ∈ Bᶜ) = 1 − β. Then, by assumption (46),

R(0, δ) = L₁α and R(1, δ) = L₂(1 − β), with R(0, δ) = R(1, δ). (48)

The test defined by (47) is, in fact, the MP test of level P₀(Z ∈ B) constructed in Theorem 1. Let A be any other (measurable) subset of ℝⁿ and let δ* be the corresponding decision function. Then

R(0, δ*) = L₁P₀(Z ∈ A) and R(1, δ*) = L₂P₁(Z ∈ Aᶜ).

Consider R(0, δ*), and suppose first that R(0, δ*) ≤ R(0, δ). This is equivalent to L₁P₀(Z ∈ A) ≤ L₁P₀(Z ∈ B), or P₀(Z ∈ A) ≤ α. Then Theorem 1 implies that P₁(Z ∈ A) ≤ P₁(Z ∈ B), because the test defined by (47) is MP in the class of all tests of level ≤ α. Hence P₁(Z ∈ Aᶜ) ≥ P₁(Z ∈ Bᶜ), or L₂P₁(Z ∈ Aᶜ) ≥ L₂P₁(Z ∈ Bᶜ); that is, R(1, δ*) ≥ R(1, δ). Then, by (48),

max[R(0, δ*), R(1, δ*)] ≥ R(1, δ*) ≥ R(1, δ) = max[R(0, δ), R(1, δ)]. (49)

If, on the other hand, R(0, δ*) > R(0, δ), then

max[R(0, δ*), R(1, δ*)] ≥ R(0, δ*) > R(0, δ) = max[R(0, δ), R(1, δ)]. (50)

Relations (49) and (50) show that δ is minimax, as was to be seen. ▲

REMARK 7 It follows that the minimax decision function defined by (46) and (47) is an LR test.

We close this section with a consideration of the Bayesian approach. In connection with this, it is shown that, corresponding to a given p.d.f. on Ω = {θ₀, θ₁}, there is always a Bayes decision function which is actually an LR test. More precisely, we have the following result.

THEOREM 8 Let X₁, . . . , Xₙ be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω = {θ₀, θ₁}, and let λ₀ = {p₀, p₁} (0 < p₀ < 1) be a probability distribution on Ω. Then, for testing the hypothesis H : θ = θ₀ against the alternative A : θ = θ₁, there exists a Bayes decision function δ_λ₀ corresponding to λ₀ = {p₀, p₁}, that is, a decision rule which minimizes the average risk R(θ₀, δ)p₀ + R(θ₁, δ)p₁, and it is given by

δ_λ₀(z) = 1 if z ∈ B, and 0 otherwise,

where

B = {z ∈ ℝⁿ; f(z; θ₁) > Cf(z; θ₀)} and C = p₀L₁/(p₁L₂).

PROOF Let R_λ₀(δ) be the average risk corresponding to λ₀. Then, by virtue of (44), and by employing the simplified notation used in the proof of Theorem 7, we have

R_λ₀(δ) = p₀L₁P₀(Z ∈ B) + p₁L₂[1 − P₁(Z ∈ B)]
        = p₁L₂ + [p₀L₁P₀(Z ∈ B) − p₁L₂P₁(Z ∈ B)],

and this is equal to

p₁L₂ + ∫_B [p₀L₁f(z; θ₀) − p₁L₂f(z; θ₁)] dz (51)

for the continuous case, and equal to

p₁L₂ + Σ_{z∈B} [p₀L₁f(z; θ₀) − p₁L₂f(z; θ₁)]

for the discrete case. In either case, it follows that the δ which minimizes R_λ₀(δ) is given by

δ_λ₀(z) = 1 if p₀L₁f(z; θ₀) − p₁L₂f(z; θ₁) < 0, and 0 otherwise;

that is, δ_λ₀(z) = 1 if z ∈ B, and 0 otherwise, where

B = {z ∈ ℝⁿ; f(z; θ₁) > (p₀L₁/(p₁L₂)) f(z; θ₀)},

as was to be seen. ▲

REMARK 8 It follows that the Bayesian decision function is an LR test and is, in fact, the MP test for testing H against A at the level P₀(Z ∈ B), as follows by Theorem 1.

The following examples are meant as illustrations of Theorems 7 and 8.

EXAMPLE 13 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from N(θ, 1). We are interested in determining the minimax decision function δ for testing the hypothesis H : θ = θ₀ against the alternative A : θ = θ₁. We have

f(z; θ₁)/f(z; θ₀) = exp[n(θ₁ − θ₀)x̄ − (n/2)(θ₁² − θ₀²)],

so that f(z; θ₁) > Cf(z; θ₀) is equivalent to

exp[n(θ₁ − θ₀)x̄] > C exp[(n/2)(θ₁² − θ₀²)], or x̄ > C₀,

where

C₀ = (log C)/(n(θ₁ − θ₀)) + (θ₁ + θ₀)/2 (for θ₁ > θ₀).

Then condition (46) becomes

L₁Pθ₀(X̄ > C₀) = L₂Pθ₁(X̄ ≤ C₀).

As a numerical example, take θ₀ = 0, θ₁ = 1, n = 25 and L₁ = 5, L₂ = 2.5. Then L₁Pθ₀(X̄ > C₀) = L₂Pθ₁(X̄ < C₀) becomes

Pθ₁(X̄ < C₀) = 2Pθ₀(X̄ > C₀), or Pθ₁[√n(X̄ − θ₁) < 5(C₀ − 1)] = 2Pθ₀[√n(X̄ − θ₀) > 5C₀],

or Φ(5C₀ − 5) = 2[1 − Φ(5C₀)]; that is,

2Φ(5C₀) − Φ(5 − 5C₀) = 1.

Hence C₀ = 0.53, as is found by the Normal tables. Therefore the minimax decision function is given by

δ(z) = 1 if x̄ > 0.53, and 0 otherwise.

The type-I error probability of this test is

Pθ₀(X̄ > 0.53) = P[N(0, 1) > 0.53 × 5] = 1 − Φ(2.65) = 1 − 0.996 = 0.004,
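The determination of C₀ above amounts to solving 2Φ(5C₀) − Φ(5 − 5C₀) = 1, which can be done by bisection instead of Normal tables (a sketch; the coarse table lookup in the text gives the rounded value 0.53):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(c):
    # with theta0 = 0, theta1 = 1, n = 25, L1 = 5, L2 = 2.5,
    # condition (46) reads 2*Phi(5*c) - Phi(5 - 5*c) - 1 = 0
    return 2.0 * norm_cdf(5.0 * c) - norm_cdf(5.0 - 5.0 * c) - 1.0

lo, hi = 0.0, 1.0          # h is increasing, with h(0) < 0 < h(1)
for _ in range(100):
    mid = (lo + hi) / 2.0
    if h(mid) < 0.0:
        lo = mid
    else:
        hi = mid
c0 = (lo + hi) / 2.0       # close to the tabulated 0.53

type_I = 1.0 - norm_cdf(5.0 * c0)            # about 0.004
power = 1.0 - norm_cdf(5.0 * (c0 - 1.0))     # about 0.99
```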

and the power of the test is

Pθ₁(X̄ > 0.53) = P[N(0, 1) > 5(0.53 − 1)] = P[N(0, 1) > −2.35] = Φ(2.35) = 0.9906.

Therefore relation (45) gives

R(θ₀, δ) = 5 × 0.004 = 0.02 and R(θ₁, δ) = 2.5 × 0.0094 = 0.0235.

Thus max[R(θ₀, δ), R(θ₁, δ)], corresponding to the minimax δ given above, is equal to 0.0235.

EXAMPLE 14 Refer to Example 13 and determine the Bayes decision function corresponding to λ₀ = {p₀, p₁}. From the discussion in the previous example it follows that the Bayes decision function is given by

δ_λ₀(z) = 1 if x̄ > C₀, and 0 otherwise,

where

C₀ = (log C)/(n(θ₁ − θ₀)) + (θ₁ + θ₀)/2 and C = p₀L₁/(p₁L₂).

Suppose p₀ = 2/3, p₁ = 1/3. Then C = 4 and C₀ = 0.555451 (≈0.55). Therefore the Bayes decision function corresponding to λ′₀ = {2/3, 1/3} is given by

δ_λ′₀(z) = 1 if x̄ > 0.55, and 0 otherwise.

The type-I error probability of this test is Pθ₀(X̄ > 0.55) = P[N(0, 1) > 2.75] = 0.003, and the power of the test is Pθ₁(X̄ > 0.55) = P[N(0, 1) > −2.25] = Φ(2.25) = 0.9878. Therefore relation (51) gives that the Bayes risk corresponding to {2/3, 1/3} is equal to

(2/3) × 5 × 0.003 + (1/3) × 2.5 × 0.0122 = 0.0202.

EXAMPLE 15 Let X₁, . . . , Xₙ be i.i.d. r.v.'s from B(1, θ). We are interested in determining the minimax decision function δ for testing H : θ = θ₀ against A : θ = θ₁. We have

f(z; θ₁)/f(z; θ₀) = (θ₁/θ₀)^x ((1 − θ₁)/(1 − θ₀))^{n−x}, where x = Σ_{j=1}^n x_j,

so that f(z; θ₁) > Cf(z; θ₀) is equivalent to

x log[θ₁(1 − θ₀)/(θ₀(1 − θ₁))] > C₀′,

where

C₀′ = log C − n log[(1 − θ₁)/(1 − θ₀)].

Let now θ₀ = 0.5 and θ₁ = 0.75. Then

θ₁(1 − θ₀)/(θ₀(1 − θ₁)) = 3 (> 1),

and therefore f(z; θ₁) > Cf(z; θ₀) is equivalent to x > C₀, where

C₀ = [log C − n log((1 − θ₁)/(1 − θ₀))]/log[θ₁(1 − θ₀)/(θ₀(1 − θ₁))].

Next, X = Σ_{j=1}^n X_j is B(n, θ), and for n = 20 and C₀ = 13, we have

P₀.₅(X > 13) = 0.0577, so that P₀.₇₅(X > 13) = 0.7858 and P₀.₇₅(X ≤ 13) = 0.2142.

With the chosen values L₁ = 1071/577 ≈ 1.856 and L₂ = 0.5, it follows then that relation (46) is satisfied. Therefore the minimax decision function is determined by

δ(z) = 1 if x > 13, and 0 otherwise.

Furthermore, the minimax risk is equal to L₁ × 0.0577 = 0.5 × 0.2142 = 0.1071.
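The numerical claims of Example 15 are easy to reproduce with exact Binomial tail probabilities (a sketch using only the quantities stated in the example):

```python
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

n, c0 = 20, 13
L1, L2 = 1071.0 / 577.0, 0.5

# the minimax test rejects H : theta = 0.5 when X > 13
p_type_I = sum(binom_pmf(n, k, 0.50) for k in range(c0 + 1, n + 1))  # P_0.5(X > 13)
p_type_II = sum(binom_pmf(n, k, 0.75) for k in range(0, c0 + 1))     # P_0.75(X <= 13)

risk0 = L1 * p_type_I    # about 0.1071
risk1 = L2 * p_type_II   # about 0.1071, so condition (46) holds (up to rounding)
```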

Chapter 14

Sequential Procedures

14.1 Some Basic Theorems of Sequential Sampling

In all of the discussions so far, the random sample Z₁, . . . , Zₙ, say, that we have dealt with was assumed to be of fixed size n. Thus, for example, in the point estimation and testing hypotheses problems the sample size n was fixed beforehand; then the relevant random experiment was supposed to have been independently repeated n times and, finally, on the basis of the outcomes, a point estimate or a test was constructed with certain optimal properties. In certain circumstances, however, it is advantageous not to fix the sample size in advance, but to keep sampling and terminate the experiment according to a (random) stopping time. Whereas in some situations the random experiment under consideration cannot be repeated at will, in many other cases this is, indeed, the case: we observe the r.v.'s Z₁, Z₂, . . . , one after another, a single one at a time (sequentially), and we stop observing them after a specified event occurs. In connection with such a sampling scheme, we have the following definitions.

DEFINITION 1 Let {Zₙ} be a sequence of r.v.'s. A stopping time (defined on this sequence) is a positive integer-valued r.v. N such that, for each n, the event (N = n) depends on the r.v.'s Z₁, . . . , Zₙ alone.

REMARK 1 In certain circumstances, a stopping time N is also allowed to take the value ∞, but with probability equal to zero. In such a case, and when forming EN, the term ∞ · 0 appears, but that is interpreted as 0 and no problem arises.

DEFINITION 2 A sampling procedure which terminates according to a stopping time is called a sequential procedure.

Thus a sequential procedure terminates with the r.v. Z_N, where Z_N is defined as follows: the value of Z_N at s ∈ S is equal to

Z_N(s) = Z_{N(s)}(s), s ∈ S. (1)

Quite often the partial sums

S_N(s) = Z₁(s) + · · · + Z_{N(s)}(s), s ∈ S, (2)

are of interest, and one of the problems associated with them is that of finding the expectation ES_N of the r.v. S_N. Under suitable regularity conditions, this expectation is provided by a formula due to Wald.

THEOREM 1 (Wald's lemma for sequential analysis) For j ≥ 1, let Z_j be independent r.v.'s (not necessarily identically distributed) with identical first moments such that E|Z_j| = M < ∞, so that EZ_j = μ is also finite. Let N be a stopping time, defined on the sequence {Z_j}, and assume that EN is finite. Then E|S_N| < ∞ and ES_N = μEN, where S_N is defined by (2).

The proof of the theorem is simplified by first formulating and proving a lemma. For this purpose, set Y_j = Z_j − μ, j ≥ 1. Then the r.v.'s Y₁, Y₂, . . . are independent, EY_j = 0, and they have (common) finite absolute moment of first order, to be denoted by m; that is, E|Y_j| = m < ∞. Also set T_N = Y₁ + · · · + Y_N, where Y_N and T_N are defined in a way similar to the way Z_N and S_N are defined by (1) and (2), respectively. Then we will show that

E|T_N| < ∞ and ET_N = 0. (3)

In all that follows, it is assumed that all conditional expectations, given N = n, are finite for all n for which P(N = n) > 0; we set E(|Y_j| | N = n) = 0 for those n's for which P(N = n) = 0.

LEMMA 1 In the notation introduced above:

i) Σ_{j=1}^∞ Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n) = mEN (< ∞);
ii) Σ_{n=1}^∞ Σ_{j=1}^n E(|Y_j| | N = n)P(N = n) = Σ_{j=1}^∞ Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n).

PROOF i) For j ≥ 2, we have

(∞ >) m = E|Y_j| = E[E(|Y_j| | N)] = Σ_{n=1}^∞ E(|Y_j| | N = n)P(N = n)
        = Σ_{n=1}^{j−1} E(|Y_j| | N = n)P(N = n) + Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n). (4)

The event (N = n) depends only on Y₁, . . . , Yₙ; hence, for j > n, the event (N = n) and the r.v. |Y_j| are independent, so that E(|Y_j| | N = n) = E|Y_j| = m. Therefore (4) becomes

m = m Σ_{n=1}^{j−1} P(N = n) + Σ_{n=j}^∞ E(|Y_j| | N = n)P(N = n),

or

  m P(N ≥ j) = ∑_{n=j}^∞ E(|Yj| | N = n)P(N = n),  j ≥ 2.  (5)

Equality (5) is also true for j = 1, as

  m P(N ≥ 1) = m = E|Y1| = ∑_{n=1}^∞ E(|Y1| | N = n)P(N = n).

Therefore

  ∑_{j=1}^∞ ∑_{n=j}^∞ E(|Yj| | N = n)P(N = n) = m ∑_{j=1}^∞ P(N ≥ j) = m ∑_{j=1}^∞ j P(N = j) = m EN,  (6)

where the equality ∑_{j=1}^∞ P(N ≥ j) = ∑_{j=1}^∞ j P(N = j) is shown in Exercise 14.1.2. Relation (6) establishes part (i).

ii) By setting p_jn = E(|Yj| | N = n)P(N = n), this part asserts that

  ∑_{n=1}^∞ ∑_{j=1}^n p_jn = p11 + (p12 + p22) + · · · + (p1n + p2n + · · · + pnn) + · · ·

and

  ∑_{j=1}^∞ ∑_{n=j}^∞ p_jn = (p11 + p12 + · · ·) + (p22 + p23 + · · ·) + · · · + (pnn + p_{n,n+1} + · · ·) + · · ·

are equal. That this is, indeed, the case follows from part (i) and calculus results (see, for example, T. M. Apostol, Mathematical Analysis, Theorem 12-42, page 373, Addison-Wesley, 1957). ▲

PROOF OF THEOREM 1  Since TN = SN − μN, it suffices to show (3). To this end, we have

  E|TN| = E[E(|TN| | N)] = ∑_{n=1}^∞ E(|TN| | N = n)P(N = n)
  = ∑_{n=1}^∞ E(|∑_{j=1}^n Yj| | N = n)P(N = n)
  ≤ ∑_{n=1}^∞ ∑_{j=1}^n E(|Yj| | N = n)P(N = n)
  = ∑_{j=1}^∞ ∑_{n=j}^∞ E(|Yj| | N = n)P(N = n)  (by Lemma 1(ii))
  = m EN (< ∞)  (by Lemma 1(i)).

ii) Here

  ETN = E[E(TN | N)] = ∑_{n=1}^∞ E(TN | N = n)P(N = n)
  = ∑_{n=1}^∞ E(∑_{j=1}^n Yj | N = n)P(N = n) = ∑_{n=1}^∞ ∑_{j=1}^n E(Yj | N = n)P(N = n)  (7)
  = ∑_{j=1}^∞ ∑_{n=j}^∞ E(Yj | N = n)P(N = n).

This last equality holds by Lemma 1(ii), since

  ∑_{j=1}^∞ ∑_{n=j}^∞ |E(Yj | N = n)|P(N = n) ≤ ∑_{j=1}^∞ ∑_{n=j}^∞ E(|Yj| | N = n)P(N = n) < ∞

by Lemma 1(i). Next, for j ≥ 2,

  0 = EYj = E[E(Yj | N)] = ∑_{n=1}^∞ E(Yj | N = n)P(N = n),  (8)

whereas, for j > n, E(Yj | N = n) = EYj = 0. This is so because the event (N = n) depends only on Y1, . . . , Yn, so that, for j > n, the r.v. Yj and the event (N = n) are independent. Thus, for j ≥ 2, relation (8) becomes as follows:

  0 = ∑_{n=1}^{j−1} E(Yj | N = n)P(N = n) + ∑_{n=j}^∞ E(Yj | N = n)P(N = n)  (9)
  = ∑_{n=j}^∞ E(Yj | N = n)P(N = n).

Therefore

  ∑_{n=j}^∞ E(Yj | N = n)P(N = n) = 0,  j ≥ 2.  (10)

By (8), this is also true for j = 1. Therefore (9) and (10) yield

  ∑_{n=j}^∞ E(Yj | N = n)P(N = n) = 0,  j ≥ 1.  (11)

Summing up over j ≥ 1 in (11), we have then

  ∑_{j=1}^∞ ∑_{n=j}^∞ E(Yj | N = n)P(N = n) = 0.  (12)

Relations (7) and (12) complete the proof of the theorem. ▲

Now consider any r.v.'s Z1, Z2, . . . . Set Sn = Z1 + · · · + Zn, and let C1, C2 be two constants such that C1 < C2. Define the random quantity N as follows: N is the smallest value of n for which Sn ≤ C1 or Sn ≥ C2. If C1 < Sn < C2 for all n, then set N = ∞. In other words, for each s ∈ S, the value N(s) of N at s is assigned as follows: Look at Sn(s) for n ≥ 1, and find the first n, say N(s), for which S_N(s)(s) ≤ C1 or S_N(s)(s) ≥ C2. If C1 < Sn(s) < C2 for all n, then set N(s) = ∞. Then we have the following result.
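Wald's lemma, just proved, is easy to check by simulation for precisely this two-boundary stopping time; the sketch below (my own illustration, with arbitrarily chosen μ and boundaries) compares the Monte Carlo averages of S_N and μN, which by Theorem 1 should nearly agree.

```python
import random

def run_once(rng, mu=0.3, c1=-10.0, c2=10.0):
    """Sample Z_j ~ N(mu, 1) until S_n <= c1 or S_n >= c2; return (N, S_N)."""
    s, n = 0.0, 0
    while c1 < s < c2:
        s += rng.gauss(mu, 1.0)   # E Z_j = mu
        n += 1
    return n, s

rng = random.Random(1)
trials = [run_once(rng) for _ in range(5_000)]
mean_SN = sum(s for _, s in trials) / len(trials)
mean_N  = sum(n for n, _ in trials) / len(trials)
print(mean_SN, 0.3 * mean_N)   # Wald: E S_N = mu * E N, so these are close
```

The next result, Theorem 2, is what guarantees that this N is finite with probability one, so the simulation terminates.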

THEOREM 2  Let Z1, Z2, . . . be i.i.d. r.v.'s such that P(Zj = 0) ≠ 1. Set Sn = Z1 + · · · + Zn and, for two constants C1, C2 with C1 < C2, define the random quantity N as the smallest n for which Sn ≤ C1 or Sn ≥ C2; set N = ∞ if C1 < Sn < C2 for all n. Then there exist c > 0 and 0 < r < 1 such that

  P(N ≥ n) ≤ c rⁿ  for all n.  (13)

PROOF  The assumption P(Zj = 0) ≠ 1 implies that P(Zj > 0) > 0, or P(Zj < 0) > 0. Let us suppose first that P(Zj > 0) > 0. Then we have that

  there exists ε > 0 such that P(Zj > ε) = δ > 0.  (14)

In fact, if P(Zj > ε) = 0 for every ε > 0, then, in particular, P(Zj > 1/n) = 0 for all n. But (Zj > 1/n) ↑ (Zj > 0), and hence 0 = P(Zj > 1/n) → P(Zj > 0) > 0, a contradiction. With C1, C2 as in the theorem and ε as in (14), there exists a positive integer m such that

  mε > C2 − C1.  (15)

For such an m, we shall show that

  P(∑_{j=k+1}^{k+m} Zj > C2 − C1) ≥ δ^m  for k ≥ 0.  (16)

We have

  ∩_{j=k+1}^{k+m}(Zj > ε) ⊆ (∑_{j=k+1}^{k+m} Zj > mε) ⊆ (∑_{j=k+1}^{k+m} Zj > C2 − C1),  (17)

the first inclusion being obvious because there are m Z's, each one of which is greater than ε, and the second inclusion being true because of (15). Thus

  P(∑_{j=k+1}^{k+m} Zj > C2 − C1) ≥ P[∩_{j=k+1}^{k+m}(Zj > ε)] = ∏_{j=k+1}^{k+m} P(Zj > ε) = δ^m,

the inequality following from (17) and the equalities being true because of the independence of the Z's and (14). Next, clearly,

  S_km = ∑_{j=0}^{k−1}[Z_{jm+1} + · · · + Z_{(j+1)m}].

Now we assert that

  C1 < Si < C2, i = 1, . . . , km, implies Z_{jm+1} + · · · + Z_{(j+1)m} ≤ C2 − C1, j = 0, 1, . . . , k − 1.  (18)

This is so because, if for some j = 0, 1, . . . , k − 1 we suppose that Z_{jm+1} + · · · + Z_{(j+1)m} > C2 − C1, then this inequality together with S_jm > C1 would imply that S_{(j+1)m} > C2, which is in contradiction to C1 < Si < C2, i = 1, . . . , km.

It follows that

  (N ≥ km + 1) ⊆ (C1 < Sj < C2, j = 1, . . . , km) ⊆ ∩_{j=0}^{k−1}[Z_{jm+1} + · · · + Z_{(j+1)m} ≤ C2 − C1],

the first inclusion being obvious from the definition of N, and the second one following from (18). Therefore

  P(N ≥ km + 1) ≤ P{∩_{j=0}^{k−1}[Z_{jm+1} + · · · + Z_{(j+1)m} ≤ C2 − C1]}
  = ∏_{j=0}^{k−1} P[Z_{jm+1} + · · · + Z_{(j+1)m} ≤ C2 − C1] ≤ ∏_{j=0}^{k−1}(1 − δ^m) = (1 − δ^m)^k;

the equality holds by the independence of the Z's, and the last inequality holds true because of (16). Thus

  P(N ≥ km + 1) ≤ (1 − δ^m)^k.  (19)

Now set c = 1/(1 − δ^m), r = (1 − δ^m)^{1/m}, and for a given n choose k so that km < n ≤ (k + 1)m. We have then

  P(N ≥ n) ≤ P(N ≥ km + 1) ≤ (1 − δ^m)^k = (1 − δ^m)^{k+1}/(1 − δ^m) = c[(1 − δ^m)^{1/m}]^{(k+1)m} = c r^{(k+1)m} ≤ c rⁿ;

these inequalities and equalities are true because of the choice of k, relation (19), and the definition of c and r. Thus, for the case that P(Zj > 0) > 0, relation (13) is established. The case P(Zj < 0) > 0 is treated entirely symmetrically and also leads to (13). (See also Exercise 14.1.1.) The proof of the theorem is then completed. ▲

The theorem just proved has the following important corollary.

COROLLARY  Under the assumptions of Theorem 2, we have: (i) P(N < ∞) = 1 and (ii) EN < ∞.

PROOF  i) Set A = (N = ∞) and An = (N ≥ n). Then, clearly, A = ∩_{n=1}^∞ An. Since also A1 ⊇ A2 ⊇ · · · , we have A = lim An, and hence

  P(A) = P(lim_{n→∞} An) = lim_{n→∞} P(An)

by Theorem 2 in Chapter 2. But P(An) ≤ c rⁿ by the theorem, so that lim P(An) = 0. Thus P(A) = 0, as was to be shown.

ii) We have

  EN = ∑_{n=1}^∞ n P(N = n) = ∑_{n=1}^∞ P(N ≥ n) ≤ ∑_{n=1}^∞ c rⁿ = c ∑_{n=1}^∞ rⁿ = c r/(1 − r) < ∞,

as was to be seen. ▲

REMARK 2  The r.v. N is a stopping time by Definition 1 and Remark 1: from the definition of N it follows that, for each n, the event (N = n) depends only on the r.v.'s Z1, . . . , Zn; also, N is positive integer-valued, and it might take on the value ∞, but only with probability 0, by the first part of the corollary.

Exercises

14.1.1  In Theorem 2, assume that P(Zj < 0) > 0 and arrive at relation (13).

14.1.2  For a positive integer-valued r.v. N, show that EN = ∑_{n=1}^∞ P(N ≥ n).

14.2 Sequential Probability Ratio Test

Although sampling according to a stopping time is, in general, profitable in the point estimation and testing hypotheses problems discussed in Chapters 12 and 13, respectively (as well as in the interval estimation problems to be dealt with in Chapter 15), the mathematical machinery involved is well beyond the level of this book. We are going to consider only the problem of sequentially testing a simple hypothesis against a simple alternative, as a way of illustrating the application of sequential procedures in a concrete problem.

To this end, let X1, X2, . . . be i.i.d. r.v.'s with p.d.f. either f0 or else f1, and suppose that we are interested in testing the (simple) hypothesis H: the true density is f0, against the (simple) alternative A: the true density is f1, at level of significance α (0 < α < 1), without fixing in advance the sample size n. In order to simplify matters, we also assume that {x ∈ ℝ; f0(x) > 0} = {x ∈ ℝ; f1(x) > 0}.

Let a, b be two numbers (to be determined later) such that 0 < a < b, and for each n consider the ratio

  λn = λn(X1, . . . , Xn) = [f1(X1) · · · f1(Xn)] / [f0(X1) · · · f0(Xn)].
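The identity asserted in Exercise 14.1.2 above is easy to confirm numerically before moving on. The sketch below (my own illustration, not part of the text) computes both sides for a geometric distribution truncated at 60, where the neglected tail mass is negligible.

```python
# N takes the value n with probability p*(1-p)**(n-1), n = 1, 2, ..., 60.
p = 0.3
probs = {n: p * (1 - p) ** (n - 1) for n in range(1, 61)}

lhs = sum(n * pn for n, pn in probs.items())               # EN = sum n P(N = n)
rhs = sum(sum(pn for m, pn in probs.items() if m >= n)     # sum_n P(N >= n)
          for n in range(1, 61))
print(lhs, rhs)   # both approximate EN = 1/p = 10/3
```

The two sums agree term by term after interchanging the order of summation, which is exactly the rearrangement used in the proof of Lemma 1(i).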

We shall use the same notation λn for λn(x1, . . . , xn), where x1, . . . , xn are the observed values of X1, . . . , Xn. Also, for j = 1, . . . , n, set Zj = Zj(Xj),

  Zj = log[f1(Xj)/f0(Xj)],  j = 1, . . . , n,

so that

  log λn = ∑_{j=1}^n Zj.

For testing H against A, consider the following sequential procedure: As long as a < λn < b, take another observation; as soon as λn ≤ a, stop sampling and accept H; and as soon as λn ≥ b, stop sampling and reject H. The sequential procedure just described is called a sequential probability ratio test (SPRT), for obvious reasons.

By letting N stand for the smallest n for which λn ≤ a or λn ≥ b, or, equivalently, if Sn = ∑_{j=1}^n Zj = log λn, the smallest n for which Sn ≤ log a or Sn ≥ log b, we have that N takes on the values 1, 2, . . . and possibly ∞. Clearly, for each n, the event (N = n) depends only on X1, . . . , Xn, and, in what follows, we shall show that the value ∞ is taken on only with probability 0, so that N will be a stopping time.

Since the X's are i.i.d., the Zj's are i.i.d. too. In what follows, we restrict ourselves to the common set of positivity of f0 and f1; if C is the set over which f0 and f1 differ, it is assumed that ∫_C f0(x)dx > 0 and ∫_C f1(x)dx > 0 (the continuous case; the discrete case is similar). This assumption is equivalent to Pi(Z1 ≠ 0) > 0, i = 0, 1, under which the corollary to Theorem 2 applies. Summarizing, we have the following result.

PROPOSITION 1  Let X1, X2, . . . be i.i.d. r.v.'s with p.d.f. either f0 or else f1, and suppose that {x ∈ ℝ; f0(x) > 0} = {x ∈ ℝ; f1(x) > 0} and that Pi[f0(X1) ≠ f1(X1)] > 0, i = 0, 1. For each n, set

  λn = [f1(X1) · · · f1(Xn)] / [f0(X1) · · · f0(Xn)]  and  Zj = log[f1(Xj)/f0(Xj)],  j = 1, . . . , n.

For two numbers a and b with 0 < a < b, define the random quantity N as the smallest n for which λn ≤ a or λn ≥ b or, equivalently, if Sn = ∑_{j=1}^n Zj, the smallest n for which Sn ≤ log a or Sn ≥ log b. Then

  Pi(N < ∞) = 1  and  Ei N < ∞,  i = 0, 1.

Thus the proposition assures us that N is actually a stopping time with finite expectation, regardless of whether the true density is f0 or f1. The implication of Pi(N < ∞) = 1, i = 0, 1, is, of course, that the SPRT described above will

terminate with probability one and acceptance or rejection of H, regardless of the true underlying density.

In the formulation of the proposition above, the determination of a and b was postponed until later. At this point, we shall see what is the exact determination of a and b. To start with, let α and 1 − β be the prescribed first and second type of errors, respectively, in testing H against A, and let α < β < 1. Then we have

  α = P(rejecting H when H is true)
  = P0[(λ1 ≥ b) + (a < λ1 < b, λ2 ≥ b) + · · · + (a < λ1 < b, . . . , a < λ_{n−1} < b, λn ≥ b) + · · ·]  (20)

and

  1 − β = P(accepting H when H is false)
  = P1[(λ1 ≤ a) + (a < λ1 < b, λ2 ≤ a) + · · · + (a < λ1 < b, . . . , a < λ_{n−1} < b, λn ≤ a) + · · ·].  (21)

Relations (20) and (21) allow us to determine theoretically the cut-off points a and b when α and β are given. However, as will be seen, the actual identification presents difficulties, and the use of approximate values is often necessary. In order to find workable values of a and b, we proceed as follows. For each n, set

  f_in = f_i(x1) · · · f_i(xn),  i = 0, 1,  (22)

and, in terms of them, define T′n and T″n as below; namely,

  T′1 = {x1 ∈ ℝ; f11(x1)/f01(x1) ≤ a},  T″1 = {x1 ∈ ℝ; f11(x1)/f01(x1) ≥ b},

and for n ≥ 2,

  T′n = {(x1, . . . , xn)′ ∈ ℝⁿ; a < f1j/f0j < b, j = 1, . . . , n − 1, and f1n/f0n ≤ a},  (23)

  T″n = {(x1, . . . , xn)′ ∈ ℝⁿ; a < f1j/f0j < b, j = 1, . . . , n − 1, and f1n/f0n ≥ b}.  (24)

In other words, T′n is the set of points in ℝⁿ for which the SPRT terminates with n observations and accepts H, while T″n is the set of points in ℝⁿ for which the SPRT terminates with n observations and rejects H. In the remainder of this section, the arguments will be carried out for the case that the Xj's are continuous, the discrete case being treated in the same way by replacing integrals by summation signs. Also, for simplicity, the differentials in the integrals will not be indicated.

From (20), (22) and (24), one has α = ∑_{n=1}^∞ ∫_{T″n} f0n. But on T″n, f1n/f0n ≥ b, so that f0n ≤ (1/b)f1n. Therefore

  α = ∑_{n=1}^∞ ∫_{T″n} f0n ≤ (1/b) ∑_{n=1}^∞ ∫_{T″n} f1n.  (25)

On the other hand, for each n, we clearly have Pi(N = n) = ∫_{T′n} f_in + ∫_{T″n} f_in, and by Proposition 1,

  1 = ∑_{n=1}^∞ Pi(N = n) = ∑_{n=1}^∞ ∫_{T′n} f_in + ∑_{n=1}^∞ ∫_{T″n} f_in,  i = 0, 1.  (26)

From (21), (23) and (26) (with i = 1), we have

  1 − β = ∑_{n=1}^∞ ∫_{T′n} f1n = 1 − ∑_{n=1}^∞ ∫_{T″n} f1n,  so that  ∑_{n=1}^∞ ∫_{T″n} f1n = β.

Relation (25) then becomes

  α ≤ β/b,  so that  b ≤ β/α.  (27)

In a very similar way (see also Exercise 14.2.1), we also obtain

  1 − α ≥ (1 − β)/a.  (28)

From (27) and (28) it follows then that

  a ≥ (1 − β)/(1 − α),  b ≤ β/α.  (29)

Relation (29) provides us with a lower bound and an upper bound for the actual cut-off points a and b. Now set

  a′ = (1 − β)/(1 − α)  and  b′ = β/α,  (30)

so that 0 < a′ < b′ by the assumption α < β < 1, and suppose that the SPRT is carried out by employing the cut-off points a′ and b′ given by (30) rather than the original ones a and b. Furthermore, let α′

and 1 − β′ be the two types of errors associated with a′ and b′, respectively. Then, replacing α, β, a and b by α′, β′, a′ and b′, respectively, in (29), and also taking into consideration (30), we obtain

  a′ = (1 − β)/(1 − α) ≥ (1 − β′)/(1 − α′)  and  b′ = β/α ≤ β′/α′;

that is,

  1 − β′ ≤ [(1 − β)/(1 − α)](1 − α′)  and  α′ ≤ (α/β)β′.  (31)

From (31) we also have

  α′ ≤ α/β  and  1 − β′ ≤ (1 − β)/(1 − α).  (32)

Furthermore, (31) gives (1 − α)(1 − β′) ≤ (1 − β)(1 − α′) and α′β ≤ αβ′; that is,

  (1 − β′) − α + αβ′ ≤ (1 − β) − α′ + α′β  and  −αβ′ ≤ −α′β,

and by adding them up,

  α′ + (1 − β′) ≤ α + (1 − β).  (33)

From (33) it follows that α′ > α and 1 − β′ > 1 − β cannot happen simultaneously. Then relation (32) provides upper bounds for α′ and 1 − β′, and inequality (33) shows that their sum α′ + (1 − β′) is always bounded above by α + (1 − β). Summarizing the main points of our derivations, we have the following result.

PROPOSITION 2  For testing H against A by means of the SPRT with prescribed error probabilities α and 1 − β such that α < β < 1, the cut-off points a and b are determined by (20) and (21). Relation (30) provides approximate cut-off points a′ and b′ with corresponding error probabilities α′ and 1 − β′; then relation (32) provides upper bounds for α′ and 1 − β′, and inequality (33) shows that their sum α′ + (1 − β′) is always bounded above by α + (1 − β).

REMARK 3  The typical values of α and 1 − β are such as 0.01, 0.05 and 0.1, and then it follows from (32) that α′ and 1 − β′ lie close to α and 1 − β, respectively. For example, for α = 0.01 and 1 − β = 0.05, we have α′ < 0.0106 and 1 − β′ < 0.0506. So there is no serious problem as far as α′ and 1 − β′ are concerned. The only problem which may arise is that, because a′ and b′ are used instead of a and b, the resulting α′ and 1 − β′ are too small compared to α and 1 − β; as a consequence, we would be led to taking a much larger number of observations than would actually be needed to obtain α and β. It can be argued that this does not happen.

Exercise

14.2.1  Derive inequality (28) by using arguments similar to the ones employed in establishing relation (27).
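The SPRT of this section takes only a few lines of code. The sketch below is my own illustration, not the book's: it tests H: N(0, 1) against A: N(1, 1), for which log[f1(x)/f0(x)] = x − 1/2, with the approximate cut-offs a′ = 0.0505 and b′ = 95 obtained from (30) for α = 0.01 and 1 − β = 0.05.

```python
import math, random

def sprt(xs, log_ratio, a, b):
    """Run the SPRT: return ('accept H' or 'reject H', N).
    log_ratio(x) = log f1(x) - log f0(x); cut-offs 0 < a < b."""
    la, lb = math.log(a), math.log(b)
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += log_ratio(x)
        if s <= la:
            return "accept H", n
        if s >= lb:
            return "reject H", n
    raise RuntimeError("sample exhausted before a boundary was hit")

# H: X ~ N(0,1) versus A: X ~ N(1,1).
rng = random.Random(7)
def stream(theta):
    while True:
        yield rng.gauss(theta, 1.0)

decision, n = sprt(stream(0.0), lambda x: x - 0.5, a=0.0505, b=95.0)
print(decision, n)
```

By (32), the long-run frequency of wrong rejections under H should not exceed α/β ≈ 0.0105, which a repeated run of this sketch confirms.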

14.3 Optimality of the SPRT-Expected Sample Size

An optimal property of the SPRT is stated in the following theorem, whose proof is omitted as being well beyond the scope of this book.

THEOREM 3  For testing H against A, the SPRT with error probabilities α and 1 − β minimizes the expected sample size under both H and A (that is, it minimizes E0N and E1N) among all tests (sequential or not) with error probabilities bounded above by α and 1 − β and for which the expected sample size is finite under both H and A.

The remaining part of this section is devoted to calculating the expected sample size of the SPRT with given error probabilities, and also to finding approximations to the expected sample size. So consider the SPRT with error probabilities α and 1 − β, and let N be the associated stopping time. Then we clearly have

  EiN = ∑_{n=1}^∞ n Pi(N = n) = 1·Pi(N = 1) + ∑_{n=2}^∞ n Pi(N = n)
  = Pi(λ1 ≤ a or λ1 ≥ b) + ∑_{n=2}^∞ n Pi(a < λj < b, j = 1, . . . , n − 1; λn ≤ a or λn ≥ b),  i = 0, 1.  (34)

Thus formula (34) provides the expected sample size of the SPRT under both H and A, but the actual calculations are tedious. This suggests that we should try to find an approximate value to EiN. By setting A = log a and B = log b, we have the relationships below:

  (λ1 ≤ a or λ1 ≥ b) = (Z1 ≤ A or Z1 ≥ B),

and

  (a < λj < b, j = 1, . . . , n − 1; λn ≤ a or λn ≥ b)
  = (A < ∑_{i=1}^j Zi < B, j = 1, . . . , n − 1; ∑_{i=1}^n Zi ≤ A or ∑_{i=1}^n Zi ≥ B),  n ≥ 2.  (35)

From the right-hand side of (35), all partial sums ∑_{i=1}^j Zi, j = 1, . . . , n − 1, lie between A and B, and it is only the sum ∑_{i=1}^n Zi which is either ≤ A or ≥ B, and this is due to the nth observation Zn. We would then expect that ∑_{i=1}^n Zi would not be too far away from either A or B. Accordingly, by letting SN = ∑_{i=1}^N Zi, we are led to assume as an approximation that SN takes on the values A and B with respective probabilities

  Pi(SN ≤ A)  and  Pi(SN ≥ B),  i = 0, 1.  (36)

But

  P0(SN ≤ A) = 1 − α,  P0(SN ≥ B) = α  and  P1(SN ≤ A) = 1 − β,  P1(SN ≥ B) = β.

Therefore we obtain

  E0SN ≈ (1 − α)A + αB  and  E1SN ≈ (1 − β)A + βB.  (37)

On the other hand, by assuming that Ei|Z1| < ∞, i = 0, 1, Theorem 1 gives EiSN = (EiN)(EiZ1); hence, if also EiZ1 ≠ 0, then EiN = (EiSN)/(EiZ1), i = 0, 1. By virtue of (37), this becomes

  E0N ≈ [(1 − α)A + αB]/E0Z1  and  E1N ≈ [(1 − β)A + βB]/E1Z1.  (38)

Thus we have the following result.

PROPOSITION 3  In the SPRT with error probabilities α and 1 − β, the expected sample size EiN, i = 0, 1, is given by (34). If furthermore Ei|Z1| < ∞ and EiZ1 ≠ 0, i = 0, 1, relation (38) provides approximations to EiN, i = 0, 1.

REMARK 4  In utilizing (38), it is necessary to replace A and B by their approximate values taken from (30); that is,

  A ≈ log a′ = log[(1 − β)/(1 − α)]  and  B ≈ log b′ = log(β/α).  (39)

In (39), we also assume that α < β < 1, since (30) was derived under this additional (but entirely reasonable) condition.

Exercises

14.3.1  Let X1, X2, . . . be independent r.v.'s distributed as P(θ), θ ∈ Ω = (0, ∞). Use the SPRT for testing the hypothesis H : θ = 0.03 against the alternative A : θ = 0.05 with α = 0.1, 1 − β = 0.05. Find the expected sample sizes under both H and A and compare them with the fixed sample size of the MP test for testing H against A with the same α and 1 − β as above.

14.3.2  Discuss the same questions as in the previous exercise if the Xj's are independently distributed as Negative Exponential with parameter θ ∈ Ω = (0, ∞).

14.4 Some Examples

This chapter is closed with two examples. In both, the r.v.'s X1, X2, . . . are i.i.d. with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝ, and for θ0, θ1 ∈ Ω with θ0 < θ1, the problem is that of testing H : θ = θ0 against A : θ = θ1 by means of the SPRT with error probabilities α and 1 − β. Thus in the present case f0 = f(·; θ0) and f1 = f(·; θ1).
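Proposition 3 and Remark 4 combine into a two-line formula. The following sketch is my own wrapper (the function name is made up); it uses natural logarithms throughout, and any base works, provided A, B and EiZ1 are all computed in the same base.

```python
import math

def expected_sample_sizes(alpha, one_minus_beta, e0_z1, e1_z1):
    """Approximate E0 N and E1 N via (38), with A, B from (39) (natural logs).
    e0_z1, e1_z1 are E_i Z_1 = E_i log[f1(X)/f0(X)], assumed nonzero."""
    beta = 1.0 - one_minus_beta
    A = math.log(one_minus_beta / (1.0 - alpha))   # A = log a'
    B = math.log(beta / alpha)                     # B = log b'
    e0 = ((1.0 - alpha) * A + alpha * B) / e0_z1
    e1 = ((1.0 - beta) * A + beta * B) / e1_z1
    return e0, e1

# Normal illustration: f0 = N(0,1), f1 = N(1,1), so E0 Z1 = -1/2, E1 Z1 = 1/2.
print(expected_sample_sizes(0.01, 0.05, -0.5, 0.5))
```

Note that the numbers produced here differ from those quoted in Example 2 below only through the choice of logarithm base.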

What we explicitly do is to set up the formal SPRT and, for selected numerical values of α and 1 − β, calculate a′, b′, upper bounds for α′ and 1 − β′, estimate EiN, i = 0, 1, and finally compare the estimated EiN, i = 0, 1, with the size of the fixed sample size test with the same error probabilities.

EXAMPLE 1  Let X1, X2, . . . be i.i.d. r.v.'s with p.d.f.

  f(x; θ) = θˣ(1 − θ)^{1−x},  x = 0, 1,  θ ∈ Ω = (0, 1).

In the present case,

  Z1 = log[f1(X1)/f0(X1)] = X1 log[θ1(1 − θ0)/θ0(1 − θ1)] + log[(1 − θ1)/(1 − θ0)],

so that

  EiZ1 = θi log[θ1(1 − θ0)/θ0(1 − θ1)] + log[(1 − θ1)/(1 − θ0)],  i = 0, 1.  (40)

Then the test statistic λn is given by

  λn = (θ1/θ0)^{∑_j Xj}[(1 − θ1)/(1 − θ0)]^{n − ∑_j Xj},

and we continue sampling as long as

  [A − n log((1 − θ1)/(1 − θ0))] / log[θ1(1 − θ0)/θ0(1 − θ1)] < ∑_{j=1}^n Xj < [B − n log((1 − θ1)/(1 − θ0))] / log[θ1(1 − θ0)/θ0(1 − θ1)].  (41)

For a numerical application, take α = 0.01 and 1 − β = 0.05. For the cut-off points a′ and b′, we have

  a′ = 0.05/0.99 ≈ 0.0505  and  b′ = 0.95/0.01 = 95,

and the cut-off points a and b are approximately equal to a′ and b′, respectively. The corresponding error probabilities α′ and 1 − β′ are bounded as follows, according to (32):

  α′ ≤ 0.01/0.95 ≈ 0.0105  and  1 − β′ ≤ 0.05/0.99 ≈ 0.0505.

Next, let us suppose that θ0 = 3/8 and θ1 = 4/8. Then

  A ≈ log a′ = log 0.0505 ≈ −1.29667  and  B ≈ log b′ = log 95 ≈ 1.97772.  (42)

Also,

  log[θ1(1 − θ0)/θ0(1 − θ1)] = log(5/3) ≈ 0.22185  and  log[(1 − θ1)/(1 − θ0)] = log(4/5) ≈ −0.09691,

so that, by means of (40),

  E0Z1 = (3/8)(0.22185) − 0.09691 ≈ −0.013716  and  E1Z1 = (4/8)(0.22185) − 0.09691 ≈ 0.014015.  (43)

Finally, by means of (42) and (43), relation (38) gives

  E0N ≈ 92.15  and  E1N ≈ 129.4.

On the other hand, the MP test for testing H against A based on a fixed sample size n is given by (9) in Chapter 13. Using the normal approximation, we find that, for the given α = 0.01 and β = 0.95, n has to be approximately 244. Thus both E0N and E1N compare very favorably with it.

EXAMPLE 2  Let X1, X2, . . . be i.i.d. r.v.'s with p.d.f. that of N(θ, 1). In the present case,

  Z1 = log[f1(X1)/f0(X1)] = (θ1 − θ0)X1 − (1/2)(θ1² − θ0²),

so that

  EiZ1 = θi(θ1 − θ0) − (1/2)(θ1² − θ0²),  i = 0, 1.  (44)

Then

  λn = exp[(θ1 − θ0)∑_{j=1}^n Xj − (n/2)(θ1² − θ0²)],

and we continue sampling as long as

  [A + (n/2)(θ1² − θ0²)]/(θ1 − θ0) < ∑_{j=1}^n Xj < [B + (n/2)(θ1² − θ0²)]/(θ1 − θ0).  (45)

By using the same values of α and 1 − β as in the previous example, we have the same A and B as before. Taking θ0 = 0 and θ1 = 1, we have

  E0Z1 = −0.5  and  E1Z1 = 0.5.

Thus relation (38) gives

  E0N ≈ 2.53  and  E1N ≈ 3.63.

Now the fixed sample size MP test is given by (13) in Chapter 13. From this we find that n ≈ 15.84. Again both E0N and E1N compare very favorably with the fixed value of n which provides the same protection.
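The arithmetic of Example 1, which is carried out in common (base-10) logarithms, can be reproduced in a few lines. This is my own check of relations (40), (38) and (39), not part of the text; the variable names are made up.

```python
from math import log10

theta0, theta1 = 3/8, 4/8
alpha, one_minus_beta = 0.01, 0.05

slope = log10(theta1 * (1 - theta0) / (theta0 * (1 - theta1)))  # log(5/3)
shift = log10((1 - theta1) / (1 - theta0))                      # log(4/5)
e0 = theta0 * slope + shift     # E0 Z1 from (40)
e1 = theta1 * slope + shift     # E1 Z1 from (40)

A = log10(one_minus_beta / (1 - alpha))    # ~ -1.29667
B = log10((1 - one_minus_beta) / alpha)    # ~  1.97772
E0N = ((1 - alpha) * A + alpha * B) / e0
E1N = (one_minus_beta * A + (1 - one_minus_beta) * B) / e1
print(E0N, E1N)   # roughly 92 and 129, as in the text
```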

Chapter 15

Confidence Regions—Tolerance Intervals

15.1 Confidence Intervals

Let X1, . . . , Xn be i.i.d. r.v.'s with p.d.f. f(·; θ), θ ∈ Ω ⊆ ℝʳ. In Chapter 12, we considered the problem of point estimation of a real-valued function of θ, g(θ); that is, we considered the problem of estimating g(θ) by a statistic (based on the X's) having certain optimality properties. In the present chapter, we return to the estimation problem, but in a different context. First, we consider the case that θ is a real-valued parameter and proceed to define what is meant by a random interval and a confidence interval.

DEFINITION 1  A random interval is a finite or infinite interval, where at least one of the end points is an r.v.

DEFINITION 2  Let L(X1, . . . , Xn) and U(X1, . . . , Xn) be two statistics such that L(X1, . . . , Xn) ≤ U(X1, . . . , Xn). We say that the r. interval [L(X1, . . . , Xn), U(X1, . . . , Xn)] is a confidence interval for θ with confidence coefficient 1 − α (0 < α < 1) if

  Pθ[L(X1, . . . , Xn) ≤ θ ≤ U(X1, . . . , Xn)] ≥ 1 − α  for all θ ∈ Ω.  (1)

Also we say that U(X1, . . . , Xn) and L(X1, . . . , Xn) are, respectively, an upper and a lower confidence limit for θ, with confidence coefficient 1 − α, if for all θ ∈ Ω,

  Pθ[−∞ < θ ≤ U(X1, . . . , Xn)] ≥ 1 − α  and  Pθ[L(X1, . . . , Xn) ≤ θ < ∞] ≥ 1 − α.  (2)

Thus the r. interval [L(X1, . . . , Xn), U(X1, . . . , Xn)] is a confidence interval for θ, with confidence coefficient 1 − α, if the probability is at least 1 − α that the

r. interval [L(X1, . . . , Xn), U(X1, . . . , Xn)] covers the parameter θ no matter what θ ∈ Ω is. The interpretation of this statement is as follows: Suppose that the r. experiment under consideration is carried out independently n times and, if xj is the observed value of Xj, j = 1, . . . , n, construct the interval [L(x1, . . . , xn), U(x1, . . . , xn)]. Suppose now that this process is repeated independently N times, so that we obtain N intervals. Then, as N gets larger and larger, at least (1 − α)N of the N intervals will cover the true parameter θ. A similar interpretation holds true for an upper and a lower confidence limit of θ.

REMARK 1  By relations (1) and (2) and the fact that

  Pθ[θ ≥ L(X1, . . . , Xn)] + Pθ[θ ≤ U(X1, . . . , Xn)] = Pθ[L(X1, . . . , Xn) ≤ θ ≤ U(X1, . . . , Xn)] + 1,

it follows that, if L(X1, . . . , Xn) and U(X1, . . . , Xn) are a lower and an upper confidence limit for θ, respectively, each with confidence coefficient 1 − α/2, then [L(X1, . . . , Xn), U(X1, . . . , Xn)] is a confidence interval for θ with confidence coefficient 1 − α.

Now it is quite possible that there exist more than one confidence interval for θ with the same confidence coefficient 1 − α. In such a case, it is obvious that we would be interested in finding the shortest confidence interval within a certain class of confidence intervals. The length l(X1, . . . , Xn) of a confidence interval is l = U(X1, . . . , Xn) − L(X1, . . . , Xn), and the expected length is Eθl, if it exists.

At this point, it should be pointed out that a general procedure for constructing a confidence interval is as follows: We start out with an r.v. Tn(θ) = T(X1, . . . , Xn; θ) which depends on θ and on the X's only through a sufficient statistic of θ, and whose distribution, under Pθ, is completely determined. Then Ln = L(X1, . . . , Xn) and Un = U(X1, . . . , Xn) are some rather simple functions of Tn(θ) which are chosen in an obvious manner. This will be done explicitly in a number of interesting examples. The examples which follow illustrate the point.

Exercise

15.1.1  Establish the relation claimed in Remark 1 above.

15.2 Some Examples

We now proceed with the discussion of certain concrete cases. In all of the examples in the present section, the problem is that of constructing a confidence interval (and also the shortest confidence interval within a certain class) for θ with confidence coefficient 1 − α.

EXAMPLE 1  Let X1, . . . , Xn be i.i.d. r.v.'s from N(μ, σ²). First, suppose that σ is known, so that μ is the parameter, and consider the r.v. Tn(μ) = √n(X̄ − μ)/σ. Then Tn(μ) depends on the X's only through the sufficient statistic X̄ of μ, and its distribution is N(0, 1) for all μ. Next, determine two numbers a and b (a < b) such that

  P[a ≤ N(0, 1) ≤ b] = 1 − α.  (3)

From (3), we have

  Pμ[a ≤ √n(X̄ − μ)/σ ≤ b] = 1 − α,

which is equivalent to

  Pμ(X̄ − bσ/√n ≤ μ ≤ X̄ − aσ/√n) = 1 − α.

Therefore

  [X̄ − bσ/√n, X̄ − aσ/√n]  (4)

is a confidence interval for μ with confidence coefficient 1 − α, where a and b satisfy (3). Its length is equal to (b − a)σ/√n. From this it follows that, among all confidence intervals with confidence coefficient 1 − α which are of the form (4), the shortest one is that for which b − a is smallest, where a and b satisfy (3). It can be seen (see also Exercise 15.2.1) that this happens if b = c (> 0) and a = −c, where c is the upper α/2 quantile of the N(0, 1) distribution, which we denote by z_{α/2}. Therefore the shortest confidence interval for μ with confidence coefficient 1 − α (and which is of the form (4)) is given by

  [X̄ − z_{α/2}σ/√n, X̄ + z_{α/2}σ/√n].  (5)

Next, assume that μ is known, so that σ² is the parameter, and consider the r.v.

  Tn(σ²) = nS²n/σ²,  where  S²n = (1/n)∑_{j=1}^n (Xj − μ)².

Then Tn(σ²) depends on the X's only through the sufficient statistic S²n of σ², and its distribution is χ²n for all σ². Now determine two numbers a and b (0 < a < b) such that

  P(a ≤ χ²n ≤ b) = 1 − α.  (6)

From (6), we obtain

  Pσ²(a ≤ nS²n/σ² ≤ b) = 1 − α,

which is equivalent to

  Pσ²(nS²n/b ≤ σ² ≤ nS²n/a) = 1 − α.

Therefore

  [nS²n/b, nS²n/a]  (7)

is a confidence interval for σ² with confidence coefficient 1 − α, and its length is equal to (1/a − 1/b)nS²n. The expected length is equal to (1/a − 1/b)nσ². Now, although there are infinite pairs of numbers a and b satisfying (6), in practice they are often chosen by assigning mass α/2 to each one of the tails of the χ²n distribution. However, this is not the best choice, because then the corresponding interval (7) is not the shortest one. For the determination of the shortest confidence interval, we work as follows.

From (6), it is obvious that a and b are not independent of each other; the one is a function of the other. So let b = b(a) and, letting Gn and gn be the d.f. and the p.d.f. of the χ²n, respectively, relation (6) becomes Gn(b) − Gn(a) = 1 − α. Differentiating it with respect to a, one obtains

  gn(b)(db/da) − gn(a) = 0,  or  db/da = gn(a)/gn(b).

Since the length of the confidence interval in (7) is l = (1/a − 1/b)nS²n, it clearly follows that that a for which l is shortest is given by dl/da = 0, which is equivalent to

  db/da = b²/a².  (8)

Thus (8) becomes a²gn(a) = b²gn(b). By means of this result and (6), it follows that a and b are determined by

  a²gn(a) = b²gn(b)  and  ∫_a^b gn(t)dt = 1 − α.  (9)

For the numerical solution of (9), tables are required. Such tables are available (see Table 678 in R. F. Tate and G. W. Klett, "Optimum confidence intervals for the variance of a normal distribution," Journal of the American Statistical Association, Vol. 54, 1959, pp. 674-682) for n = 2(1)29 and 1 − α = 0.90, 0.95, 0.99, 0.995, 0.999.

To summarize then, the shortest (both in actual and expected length) confidence interval for σ² with confidence coefficient 1 − α (and which is of the form (7)) is given by

  [nS²n/b, nS²n/a],

where a and b are determined by (9).

As a numerical application, let n = 25, σ = 1 and 1 − α = 0.95. Then, for the equal-tails choice, z_{α/2} = 1.96, so that (5) gives [X̄ − 0.392, X̄ + 0.392]. Next, for the confidence interval given by (7), the equal-tails choice gives a = 13.120 and b = 40.646, so that the equal-tails confidence interval itself is given by

  [25S²25/40.646, 25S²25/13.120].

On the other hand, by means of the tables cited above, we have a = 14.2636 and b = 45.7051, so that the shortest confidence interval is equal to

  [25S²25/45.7051, 25S²25/14.2636],

and the ratio of their lengths is approximately 1.07.

EXAMPLE 2  Let X1, . . . , Xn be i.i.d. r.v.'s from the Gamma distribution with parameter β, and α a known positive integer, call it r. Then ∑_{j=1}^n Xj is a sufficient statistic for β (see Exercise 11.2(iii), Chapter 11). Furthermore, for each j = 1, . . . , n, the r.v. 2Xj/β is χ²_{2r}, since

  φ_{2Xj/β}(t) = φ_{Xj}(2t/β) = 1/(1 − 2it)^{2r/2}

(see Chapter 6). Therefore the r.v.

  Tn(β) = (2/β)∑_{j=1}^n Xj

is χ²_{2rn} for all β > 0. Now determine a and b (0 < a < b) such that

  P(a ≤ χ²_{2rn} ≤ b) = 1 − α.  (10)

From (10), we obtain

  Pβ[a ≤ (2/β)∑_{j=1}^n Xj ≤ b] = 1 − α,

which is equivalent to

  Pβ(2∑_{j=1}^n Xj/b ≤ β ≤ 2∑_{j=1}^n Xj/a) = 1 − α.

Therefore a confidence interval with confidence coefficient 1 − α is given by

v. This shows that 2 Tn θ = −2θ ∑ log X j = ∑ Yj j =1 j =1 () n n is distributed as χ 2 2n.i. (See Exercise 11.2(iv) in Chapter 11. a = 16. one has to minimize l subject to (10). we have.f.3675 ⎢ ⎣ The equal-tails conﬁdence interval is 2 ∑ j =1 X j ⎤ ⎥.308 ⎥ ⎥ ⎦ 7 so that the ratio of their length is approximately equal to 1. is not the shortest among those of the form (11). . But this is the same problem as the one we solved in the second part of Example 1.d.v.5128 ⎥ ⎥ ⎦ 7 ⎡2 7 X ⎢ ∑ j =1 j . which is customarily employed. Yj = −2θ log Xj. of a χ 2 2. yj > 0. Xn be i. . ⎥.461 ⎢ ⎣ 2 ∑ j =1 X j ⎤ ⎥. for n = 7.’s from the Beta distribution with β = 1 and α = θ unknown. or −∑ j = 1 log Xj is a sufﬁcient statistic for θ.95. Now determine a and b (0 < a < b) such that 2 P a ≤ χ2 n ≤ b = 1 − α. . 16.3675.402 15 Conﬁdence Regions—Tolerance Intervals n ⎡2 n X 2 ∑ j =1 X j ⎤ ⎢ ∑ j =1 j . Thus the corresponding shortest conﬁdence interval is then ⎡2 7 X ⎢ ∑ j =1 j .) Consider the r. which is the p. . It is easily seen that 1 its p. (11) ⎛ 1 1⎞ n l = 2⎜ − ⎟ ∑ X j .075. ⎢ 44. r = 2 and 1 − α = 0. EXAMPLE 3 Let X1. it follows that the equal-tails conﬁdence interval.5128 and b = 49. b For instance.d.f. ⎝ a b ⎠ j =1 ⎛ 1 1⎞ Eβ l = 2 βrn⎜ − ⎟ . respectively.d. r. ⎢ ⎥ b a ⎢ ⎥ ⎦ ⎣ Its length and expected length are. by means of the tables cited in Example 1. In order to determine the shortest conﬁdence interval. It follows then that the shortest (both in actual and expected length) conﬁdence interval with conﬁdence coefﬁcient 1 − α (which is of the form (11)) is given by (11) with a and b determined by a 2 g 2 rn a = b 2 g 2 rn b () () and ∫a g 2 rn (t )dt = 1 − α . ( ) (12) . 15. n Then ∏ n j = 1Xj.1. is – exp(−yj /2). ⎝ a b⎠ As in the second part of Example 1. ⎢ 49.

. EXAMPLE 4 Let X1.d. .420. 0 ≤ yn ≤ θ θn (by Example 3.v. 0 ≤ t ≤ 1.i. Chapter 11) and its p. gn is given by gn yn = ( ) n n −1 yn . Its p.2 Some Examples 403 From (12). Consider the r. Tn(θ) = Yn/θ. no tables which would facilitate this solution are available. the equal-tails conﬁdence interval for θ is given by (13) with a = 32. For example.d. Considering dl/da = 0 in conjunction with (12) in the same way as it was done in Example 2. a [ () ] b (14) . θ).v. for n = 25 and 1 − α = 0.95. () Then deﬁne a and b with 0 ≤ a < b ≤ 1 and such that Pθ a ≤ Tn θ ≤ b = ∫ nt n −1dt = bn − an = 1 − α . b However.’s from U(0.15. . Then Yn = X(n) is a sufﬁcient statistic for θ (see Example 7. ⎢− − .f. is easily seen to be given by hn t = nt n −1 . Xn be i. Chapter 10). . ⎝ ⎠ j =1 j =1 Therefore a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α is given by ⎤ ⎡ a b ⎥. n ⎢ 2 n log X 2 ∑ j =1 log X j ⎥ j ⎥ ⎢ ⎦ ⎣ ∑ j =1 Its length is equal to l= a−b 2 ∑ j =1 log X j n (13) . we obtain n ⎛ ⎞ Pθ ⎜ a ≤ −2θ ∑ log X j ≤ b⎟ = 1 − α ⎝ ⎠ j =1 which is equivalent to n n ⎛ ⎞ Pθ ⎜ a −2 ∑ log X j ≤ θ ≤ b −2 ∑ log X j ⎟ = 1 − α . r.357 and b = 71.d.f. we have that the shortest (both in actual and expected length) conﬁdence interval (which is of the form (13)) is found by numerically solving the equations g 2n a = g 2n b () () and ∫a g 2n (t )dt = 1 − α .

and the exercises just mentioned. for n = 32 and 1 − α = 0. θ) and that the r.’s having the Negative Exponential distribution with parameter θ ∈ Ω = (0. C. a ⎥ ⎢ b ⎦ ⎣ and its length is l = (1/a − 1/b)X(n). l is decreasing as a function of b and its minimum is obtained for b = 1. (See also the discussion of the second part of Example 1. ⎥. da/db = bn−1/an−1. we have (15) ⎛ 1 da 1 ⎞ dl = X (n ) ⎜ − 2 + ⎟.5–15. we get Pθ(a ≤ Yn/θ ≤ b) = 1 − α which is equivalent to Pθ[X(n)/b ≤ θ ≤ X(n)/a] = 1 − α.v. . ∞).2.1 Let Φ be the d.2. Exercises 15. Vol. Show that b − a is minimum if b = c (> 0) and a = −c. 2U/θ is distributed as χ 2 2n. where shortest conﬁdence intervals exist. U is distributed as Gamma with parameters (n.2. .098X(32)]. 1969. Therefore the shortest (both in actual and expected length) conﬁdence interval with conﬁdence coefﬁcient 1 − α (which is the form (15)) is given by ⎡ X (n ) ⎤ ⎢X n .7 at the end of this section are treated along the same lines with the examples already discussed and provide additional interesting cases.) 15. Exercises 15. . Therefore a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α is given by ⎡ X (n ) X (n ) ⎤ ⎥ ⎢ . has been motivated by a paper by W.404 15 Conﬁdence Regions—Tolerance Intervals From (14). of the N(0. 1) distribution and let a and b with a < b be such that Φ(b) − Φ(a) = γ (0 < γ < 1). .f. so that dl an +1 − bn +1 = X (n ) . db ⎝ a db b 2 ⎠ while by way of (14). .2. by means of (14).v. 23. Xn be independent r. Number 1.2 Let X1. db b 2 an +1 Since this is less than 0 for all b.95. we have approximately [X(32). in which case a = α 1/n. From this. ii) Show that the r. and set U = ∑ n i = 1Xi.v. The inclusion of the discussions in relation to shortest conﬁdence intervals in the previous examples. ( ) α1 n ⎥ ⎢ ⎣ ⎦ For example. Guenther on “Shortest conﬁdence intervals” in The American Statistican. 1.
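Example 4's conclusion, that the length l = (1/a − 1/b)X₍ₙ₎ is minimized by b = 1 and a = α^{1/n}, is easy to check numerically. The sketch below is illustrative only (the function name is ours, not the text's); it reproduces the factor 1.098 of the text's case n = 32, 1 − α = 0.95, and applies the interval to a simulated U(0, 10) sample.

```python
import random

def uniform_theta_ci(xs, conf=0.95):
    """Shortest interval [Y_n, Y_n / alpha^(1/n)] for theta in U(0, theta),
    where Y_n = X_(n) is the sample maximum; uses Example 4's choice
    b = 1, a = alpha^(1/n)."""
    alpha, n = 1 - conf, len(xs)
    y_n = max(xs)
    return y_n, y_n / alpha ** (1 / n)

# The text's numerical case: n = 32, 1 - alpha = 0.95 gives [X(32), 1.098 X(32)].
factor = 1 / 0.05 ** (1 / 32)
print(round(factor, 3))   # 1.098

random.seed(1)
sample = [random.uniform(0, 10) for _ in range(32)]   # true theta = 10, illustration
low, up = uniform_theta_ci(sample)
print(low, up)
```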

with conﬁdence coefﬁcient 1 − α is of the form [Y1 − (b/2n).) (Hint: Take cε(0.2.f.v. Then show that: n γ iii) The r.6 Let X1.v. iii) A conﬁdence interval for θ. with conﬁdence coefﬁcient 1 − α is of the form [R. x > 0). θ = e I (θ .2 Some Exercises Examples 405 ii) Use part (i) to construct a conﬁdence interval for θ with conﬁdence coefﬁcient 1 − α.α is the upper αth quantile of the χ 2 distribution. Y1 − (a/2n)].v.v.15. . Y1 ⎥. based on Tn(θ). Then show that: iii) The p. iv) The shortest conﬁdence interval of the form given in (iii) is provided by 2 ⎡ ⎤ χ2 .2. Chapter 11. with conﬁdence coefﬁcient 1 − α is of the form [2Y/b.i. ≤ 1) = 1 − α. . θ) = 1/θ e−x/θ.4 Refer to Example 4 and set R = X(n) − X(1). .v. (Hint: Use the parametrization f (x. 15.d.2.2. then (by Exercise 15. X has the Negative Exponential distribution with parameter θ ∈ Ω = (0. .f. Xn is a random sample from the distribution in part (i) and U = ∑ in= 1Xi. θ) = 1/θ e−x/θ. Tn(θ) = 2Y/θ is distributed as χ 2 2n.d. Then: iii) Find the distribution of R.∞ ) x . given in Exercise 11. .2.’s having the Weibull p. . .2(i)) 2U/θ is distributed as χ 2 2n. where Y = ∑ j = 1X j . 15. θ) = 1/θ e−x/θ. given by − ( x −θ ) f x. ii) If X1. R/c]. θ) = Pθ(X > x) (x > 0) is equal to e−x/θ. r.3 ii) If the r. (Hint: Use the parametrization f (x. θ ∈Ω = ޒ [ ( )] ( ) () and set Y1 = X(1). 1) such that P (c ≤ R θ θ iii) Show that a conﬁdence interval for θ. (Hint: Use the parametrization f (x. . . .α .4. . 15. Xn be i.∞)( y) iii) The r.d.f. iii) A conﬁdence interval for θ. x > 0). where c is a root of the equation c n −1 n − n − 1 c = α iii) Show that the expected length of the shortest conﬁdence interval in Example 4 is shorter than that of the conﬁdence interval in (ii) above. Xn be independent r. .5 Let X1. 2Y/a].’s with p. 15. θ) with conﬁdence coefﬁcient 1 − α. ⎢Y1 − 2n ⎢ ⎥ ⎣ ⎦ 2 where χ 2 2. g of Y1 is given by g(y) = ne−n( y−θ )I(θ. show that the reliability R(x. based on Tn(θ). based on R. .d. ∞). 
Use this fact and part (i) of this exercise to construct a confidence interval for R(x, θ) with confidence coefficient 1 − α.

θ = ( ) 1 x θ e . 2 15.d. where σ . Xm from N(μ1. ⎛ σ ⎞ σ 2 S2 Tm. based on T m. ( ) Then show that: ii) The r.d.8.i. b 2 ⎥. X m − Yn − a 1 + 2 ⎥.2. with conﬁdence coefﬁcient 1 − α is given by 2 2 ⎤ ⎡ Sm Sm ⎢ a 2 . σ are known and μ . σ 2 1) and Y1.v. m n m n ⎥ ⎢ ⎦ ⎣ where a and b are such that Φ(b) − Φ(a) = 1 − α.’s with p.2.2. σ 2 ). 15. . ii) A conﬁdence interval for μ 1 − μ 2. μ 2 are known and σ1. .n(σ1/σ2). .v.6. ( ) ( ) ii) The shortest conﬁdence interval of the aforementioned form is provided by the last expression above with − a = b = zα . Consider the r. θ ii) and (iii) as in Exercise 15. Tm. 15. . given by f x.v. based on Tm.f. ∞ . 2θ θ ∈ Ω = 0. . Sn ⎥ ⎢ ⎣ Sn ⎦ . n ⎜ 1 ⎟ = 12 n 2 ⎝ σ 2 ⎠ σ 2 Sm 2 ¯ and show that a conﬁdence interval for σ 2 1/σ 2.406 15 Conﬁdence Regions—Tolerance Intervals iii) The shortest conﬁdence interval of the form given in (ii) is taken for a and b satisfying the equations ∫a g 2n (t )dt = 1 − α b and a 2 g 2n a = b 2 g 2n b . Tn(θ) = 2Y n is distributed as χ 2 2n.f.d. 2 1 2 1 2 and let the r. . Yn from N(μ2. . . with conﬁdence coefﬁcient 1 − α is given by ⎡ σ2 σ2 σ2 σ2 ⎤ ⎢ X m − Yn − b 1 + 2 .n(μ1 − μ2).2. where Y = ∑ j = 1|Xj|. . Xn be i.n(μ 1 − μ 2) be deﬁned by Tm. .v.2.9 Refer to Exercise 15. but now suppose that μ 1.8 Consider the independent random samples X1. σ2 are unknown.7 Let X1. . r. μ are unknown. n μ1 − μ 2 = Then show that: ( ) (X m − Yn − μ1 − μ 2 2 1 (σ ) ( m + σ ) ( 2 2 n ) ). . () () where g2n is the p. of the χ 2 2n distribution.

EXAMPLE 5 Refer to Example 1 and suppose that both μ and σ are unknown. However. some other (nuisance) parameters do appear in the p. Tn ′(μ) = √n(X 2 ¯ only through the sufﬁcient statistic (X n. .2. under consideration. so 2 V 2U that the r. . 15. In particular.m ′ . where F n. σ )′. respectively. Yn be independent random samples from the Negative Exponential distributions with parameters θ1 and θ2.2 Nuisance Some Parameters Examples 407 where 0 < a < b are such that P(a ≤ Fn. θ2 θ 1 is distributed as F2n.f. In such cases.15. and set U = ∑ m i = 1Xi.v. an argument similar to the one employed in Example 1 implies that the shortest (both in actual and expected length) conﬁdence interval of the form (16) is given by ⎡ Sn −1 S ⎤ . ⎢ X n − t n −1.m ′ .α /2. respectively. For this purpose.m.m.α 2 n −1 ⎥. and working as in Example 1.α /2 are the lower and the upper α /2 quantiles. consider the r.α 2 ⎢ n n⎥ ⎣ ⎦ (17) . n − 1 j =1 ( ) ¯n − μ)/Sn−1 which depends on the X’s Thus we obtain the new r.m.2m. Then (by Exercise 15. .m ≤ b) = 1 − α. . ⎢Xn − b ⎢ n n⎥ ⎣ ⎦ (16) Furthermore. Xm and Y1.d.10 Let X1.2. . of Fn. in many interesting examples.v.2(i)) the inde2 pendent r. S 2 n−1)′ of (μ. V = ∑ i = 1Yj.2. .3 Conﬁdence Intervals in the Presence of Nuisance Parameters So far we have been concerned with the problem of constructing a conﬁdence interval for a real-valued parameter when no other parameters are present. (Hint: Employ the parametrization used in Exercise 15. X n − a n −1 ⎥.α /2 and b = Fn. Use this result in order to construct a conﬁdence interval for θ1/θ2 with conﬁdence coefﬁcient 1 − α. we suppose that we are interested in constructing a conﬁdence interval for μ. X n + t n −1.) 15. we obtain a conﬁdence interval of the form ⎡ Sn −1 S ⎤ . . .α /2 and Fn.3 Conﬁdence Intervals in the Presence of15.’s 2U/θ1 and 2V/θ2 are distributed as χ 2 2m and χ 2n. which is tn−1 distributed. Basing the conﬁdence interval in question on Tn(μ). 
That is, nuisance parameters may appear in addition to the real-valued parameter of main interest; the examples below illustrate the relevant procedure.

for n = 25 and 1 − α = 0. . the corresponding conﬁdence interval for μ is taken ¯n − 0.5227 and b = 44. samples X1. n(μ 1 − μ 2) is given by ⎡ ⎢X −Y −t n m+n −2. . so that the corresponding interval 2 approximately is equal to [0. For this purpose. n μ1 − μ 2 = ( ) (X (m − 1)S m − Yn − μ1 − μ 2 2 + n − 1 Sn 1⎞ −1 ⎛ 1 ⎜ + ⎟ m+n−2 ⎝ m n⎠ 2 m−1 ( ) ( ) ) . Then Tm. Proceeding as in the corresponding case of Example 1. . For instance. a = 13. Thus. Tm.0639.v. Suppose now that we wish to construct a conﬁdence interval for σ 2. say (unspeciﬁed). EXAMPLE 6 Consider the independent r.0. . suppose that a conﬁdence interval for μ 1 − μ 2 is desired. ¯n so that T ′(σ 2) is χ 2 n−1 distributed. T T n′ σ 2 = n−1 S ( ) ( σ) 2 n −1 .α /2 is the upper α /2 quantile of the tn−1 distribution. b a ⎢ ⎥ ⎣ ⎦ and the shortest conﬁdence interval of this form is taken when a and b are numerical solutions of the equations ( ) ( ) a 2 gn −1 a = b 2 gn −1 b () () and ∫a gn −1 (t )dt = 1 − α . b Thus with n and 1 − α as above. as in the ﬁrst case of Example 1 (and also Example 5). . from (17) with t 24. σ 2 2).408 15 Conﬁdence Regions—Tolerance Intervals where tn−1. σ 2 1) and Y1. 1.775S 24]. m+ n−2 ⎝ m n⎠ 2 m−1 ( ) (X m − Yn + t m + n − 2 . Xm from N(μ 1. a ) 2 2 ⎤ + n − 1 Sn 1⎞⎥ −1 ⎛ 1 ⎜ + ⎟ ⎥.4802.41278S24.539S 2 24. σ1 and σ2 are unknown. modify the r. μ 2. by means of the tables cited in Example 1.025 = 2. we have to assume that σ1 = σ2 = σ. Yn from N(μ 2. m+ n−2 ⎝ m n⎠ ⎥ ⎦ 2 m−1 ( ) . n(μ 1 − μ 2) is distributed as tm+n−2.v. a ⎢ m ⎢ ⎣ ( ) 2 (m − 1)S (m − 1)S 2 + n − 1 Sn 1⎞ −1 ⎛ 1 ⎜ + ⎟. To this ¯n(σ 2) of Example 1 as follows: end. .41278S24]. one has. (18) . one has the following conﬁdence interval for σ 2: 2 2 ⎤ ⎡ n − 1 Sn n − 1 Sn −1 −1 ⎢ ⎥. First. where all μ 1.95. the shortest (both in actual and expected length) conﬁdence interval based on Tm. . Thus we have approximately [X ¯ X n + 0. Consider the r. .
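Interval (17) is straightforward to compute once the t quantile is available. The sketch below is illustrative only: it takes t_{n−1,α/2} as an input (as read from tables; no t distribution function is implemented here) and checks the text's half-width factor 2.0639/√25 = 0.41278 for n = 25.

```python
import math

def t_interval(xs, t_quantile):
    """Interval (17): [X-bar - t * S_{n-1}/sqrt(n), X-bar + t * S_{n-1}/sqrt(n)].
    The quantile t_{n-1, alpha/2} must be supplied from tables."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))   # S_{n-1}
    half = t_quantile * s / math.sqrt(n)
    return mean - half, mean + half

# The text's check: n = 25 and t_{24, 0.025} = 2.0639 give half-width 0.41278 * S_24.
print(round(2.0639 / math.sqrt(25), 5))   # 0.41278

# Small illustrative sample (values are ours), with t_{4, 0.025} = 2.7764:
low, up = t_interval([1.2, 0.8, 1.1, 0.9, 1.0], t_quantile=2.7764)
print(low, up)
```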

1586 12S 2 + 13S 2 . ⎜ 2 ⎟ 2 2 Sn −1 ⎠ ⎝ Sn −1 σ 2 1 σ2 2 ⎛ ⎞ σ 12 Sn −1 ⎜ a ≤ 2 2 ≤ b⎟ = 1 − α .α 2 Thus.m−1. n = 14 and 1 − α = 0.1586 12S12 + 13S13 .025 = 3.1532. Now determine two numbers a and b with 0 < a < b and such that P a ≤ Fn −1.m−1.v. 2 ⎛ σ ⎞ σ 2 Sn −1 Tm. n ⎜ 1 ⎟ = 12 2 ⎝ σ 2 ⎠ σ 2 S m−1 which is distributed as Fn−1.m −1 ≤ b = 1 − α . n and 1 − α. we consider the r.m−1. ⎥ ⎦ where Fn ′−1. The point Fn−1. we have t25.α /2 is given by Fn′−1.n −1.α 2 . so that the corresponding interval approximately is equal to 2 ⎤ 2 ⎡ S12 S12 ⎢0.3171 2 . for m = 13. 3.025 = 3. the equal-tails conﬁdence interval is provided by 2 ⎡ Sm −1 ⎢ 2 Fn′−1.2 Nuisance Some Parameters Examples 409 For instance.m−1. 14 12 13 ⎢ 13 ⎣ ( ) (X 13 2 2 ⎤ − Y14 + 0.0.0. we have F13. σ 2 Sm −1 ⎝ ⎠ 1 2 Therefore a conﬁdence interval for σ 2 1/σ 2 is given by 2 2 ⎡ Sm ⎤ Sm −1 −1 ⎢ a 2 .0595.α /2 is read off the F-tables and the point F n ′−1.m−1.12.3 Conﬁdence Intervals in the Presence of15.95.2388 and F12./0. so that the corresponding interval approximately is equal to ⎡ X − Y − 0.025 = 2.m−1.13.m −1. S ⎢ ⎣ n −1 2 Sm −1 Fn −1.m −1.15.2388 2 ⎥. S13 S13 ⎥ ⎢ ⎣ ⎦ . Fm −1.α 2 = 1 . for the previous values of m. ⎥ ⎦ ) 2 If our interest lies in constructing a conﬁdence interval for σ 2 1/σ 2.α /2 are the lower and the upper α /2-quantiles of Fn−1. b 2 ⎥.m −1.α /2 and Fn−1. ( ) Then Pσ or Pσ σ2 2 2 ⎛ Sm ⎞ Sm σ 12 −1 −1 a ≤ ≤ b = 1 − α.α 2 Sn −1 2 ⎤ ⎥. Sn −1 ⎥ ⎢ ⎣ Sn −1 ⎦ In particular.

That is. 1 ≤ c = 1 − α [ ( ) ] and 2 P a ≤ χn −1 ≤ b = 1 − α . ( ) ( ) ( ) ( ) ⎡ Pμ . Derive a conﬁdence interval for σ with conﬁdence coefﬁcient 1 − α when μ is unknown.v. There are important examples. To this end. n (n − 1)S b 2 n −1 ≤σ2 ≤ (n − 1)S a 2 n −1 ⎤ ⎥ = 1 − α. no suitable r. ⎢ = Pμ .’s employed for the construction of conﬁdence intervals had an exact and known distribution. we obtain 2 ⎤ ⎡ n Xn − μ n − 1 Sn −1 ⎥ Pμ . with known distribution is available which can be used for setting up conﬁdence intervals. This will be illustrated by means of the following example. In cases like this. b and c may be determined so that the resulting intervals are the shortest ones. . which are independently distributed as N(0. 15. . a and b (0 < a < b) by P − c ≤ N 0.σ ⎢− c ≤ ≤ c. Xn be independent r. . σ 2)′ indicated in Fig. both in actual and expected lengths. where this is not the case. Now suppose again that θ is real-valued. σ 2).v.’s distributed as N(μ. The quantities a. ⎥ ⎦ (19) For the observed values of the X’s.1 Let X1. consider the r.’s n Xn − μ ( σ ) and (n − 1)S σ2 2 n −1 . deﬁne the constants c (> 0).4 Conﬁdence Regions—Approximate Conﬁdence Intervals The concept of a conﬁdence interval can be generalized to that of a conﬁdence region in the case that θ is a multi-dimensional parameter. respectively. under . .3. we have the conﬁdence region for (μ. In all of the examples considered so far the r.σ a ≤ ≤ b 2 ⎢ ⎥ σ σ ⎥ ⎢ ⎦ ⎣ ⎣ ⎦ Equivalently.) Here the problem is that of constructing a conﬁdence region in ޒ2 for (μ.σ ⎢ μ − X n ⎢ ⎣ ( ) 2 ≤ c 2σ 2 . a ≤ ≤ b ⎥ ⎢ σ σ2 ⎦ ⎣ 2 ⎡ ⎤ ⎤ ⎡ n Xn − μ n − 1 Sn −1 ⎢ ⎥ ⎥ = 1 − α.410 15 Conﬁdence Regions—Tolerance Intervals Exercise 15. 1) and χ 2 n−1.1. 15. ( ) From these relationships. σ 2)′. however.v.σ − c ≤ ≤ c × Pμ .v. Next. EXAMPLE 7 (Refer to Example 5.

appropriate conditions.

[Figure 15.1: The confidence region for (μ, σ²)′ with confidence coefficient 1 − α; it consists of the points (μ, σ²) satisfying (1/b)Σⁿⱼ₌₁(xⱼ − x̄ₙ)² ≤ σ² ≤ (1/a)Σⁿⱼ₌₁(xⱼ − x̄ₙ)² and (μ − x̄ₙ)² ≤ c²σ²/n.]

Next, confidence intervals can be constructed by way of the CLT, provided n is sufficiently large. So let X₁, . . . , Xₙ be i.i.d. r.v.'s with finite mean and variance μ and σ², respectively. Then the CLT applies and gives that √n(X̄ₙ − μ)/σ is approximately N(0, 1) for large n. Thus, if we assume that σ is known, then a confidence interval for μ with approximate confidence coefficient 1 − α is given by (5). Suppose now that σ is also unknown. Then, since

S²ₙ = (1/n)Σⁿⱼ₌₁(Xⱼ − X̄ₙ)² → σ² in probability as n → ∞,

we have that √n(X̄ₙ − μ)/Sₙ is again approximately N(0, 1) for large n, and therefore a confidence interval for μ with approximate confidence coefficient 1 − α is given by (20) below:

[X̄ₙ − zα/2 Sₙ/√n, X̄ₙ + zα/2 Sₙ/√n].    (20)

The two-sample problem also fits into this scheme, provided both means and variances (known or not) are finite. As an application, consider the Binomial and Poisson distributions.

EXAMPLE 8  Let X₁, . . . , Xₙ be i.i.d. r.v.'s from B(1, p). The problem is that of constructing a confidence interval for p with approximate confidence coefficient 1 − α. Here S²ₙ = X̄ₙ(1 − X̄ₙ), so that (20) becomes

[X̄ₙ − zα/2 √(X̄ₙ(1 − X̄ₙ)/n), X̄ₙ + zα/2 √(X̄ₙ(1 − X̄ₙ)/n)].

EXAMPLE 9  Let X₁, . . . , Xₙ be i.i.d. r.v.'s from P(λ). Then a confidence interval for λ with approximate confidence coefficient 1 − α is provided by X̄ₙ ± zα/2 √(X̄ₙ/n).
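Example 8's approximate interval needs only the normal quantile zα/2; the sketch below is an illustration (not the text's method of obtaining z), computing the quantile by bisection on the error function, standard library only.

```python
import math

def z_quantile(p):
    """Quantile of N(0, 1) by bisection on Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if (1 + math.erf(mid / math.sqrt(2))) / 2 < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def binomial_ci(successes, n, conf=0.95):
    """Example 8's interval: X-bar +/- z_{alpha/2} * sqrt(X-bar (1 - X-bar) / n)."""
    z = z_quantile((1 + conf) / 2)
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(round(z_quantile(0.975), 4))   # 1.96
low, up = binomial_ci(40, 100)       # 40 successes in n = 100 trials, illustration
print(round(low, 3), round(up, 3))   # 0.304 0.496
```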

412 15 Conﬁdence Regions—Tolerance Intervals We close this section with a result which shows that there is an intimate relationship between constructing conﬁdence regions and testing hypotheses.’s with p.’s with p. and suppose that n is large. . For each θ * ∈Ω Ω. θ). . . .d.) as n → ∞.’s with (ﬁnite) unknown mean μ and (ﬁnite) known variance σ 2. σ = 1 and α = 0.1 Let X1. and deﬁne T(z) by (21). θ ∈ Ω ⊆ ޒr. xn)′.d. Thus we have the following theorem. and let A(θ *) stand for the acceptance region in ޒn. and deﬁne the region T(z) in Ω as follows: T z = θ ∈Ω : z ∈ A θ . so that T(Z) is a conﬁdence region for θ with conﬁdence coefﬁcient 1 − α. . Exercises 15.i.v.1. . . T(z) is that subset of Ω with the following property: On the basis of z.d. Xn)′. . 15. r. f(x. f(x.f. . From (21). Set Z = (X1.s. θ ∈ Ω = (0. () { ( )} (21) In other words. . .i. . θ ∈ Ω = (0. ∞).05? iii) Show that the length of this conﬁdence interval tends to 0 in probability (and also a. iii) Discuss part (i) for the case that the underlying distribution is B(1. For each θ * ∈ Ω let us consider the problem of testing the hypothesis H(θ *): θ = θ * at level of signiﬁcance α. .05? iii) Refer to part (i) and determine n so that the length of the conﬁdence interval is 0. . θ ∈ Ω ⊆ ޒr.2 Refer to the previous problem and suppose that both μ and σ 2 are unknown. . r. z = (x1. z = (x1. r. provided σ = 1 and α = 0. Then a conﬁdence interval for μ with approximate conﬁdence coefﬁcient 1 − α is given be relation (20). THEOREM 1 [ ( )] [ ( )] Let X1. . iii) Use the CLT to construct a conﬁdence interval for μ with approximate conﬁdence coefﬁcient 1 − α . . () Therefore Pθ θ ∈T Z = Pθ Z ∈ A θ ≥ 1 − α . Xn)′. it is obvious that z ∈A θ () if and only if θ ∈T z . Xn be i. . . . iii) What does this interval become if n = 100.4. every H(θ) is accepted for θ ∈ T(z). xn)′. iii) What does this interval become for n = 100 and α = 0. . Set Z = (X1.4.d. .v. . 
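Theorem 1 can be illustrated by inverting the two-sided z-test for the mean of N(θ, 1). The sketch below (a crude grid scan, for illustration only; the grid and the names are ours) recovers T(z) = [x̄ − zα/2/√n, x̄ + zα/2/√n] as the set of accepted θ values.

```python
import math

Z975 = 1.96  # upper 2.5% point of N(0, 1); alpha = 0.05

def accepts(theta_star, xbar, n, z=Z975):
    """Acceptance region A(theta*) of the two-sided z-test of H(theta*) at level alpha."""
    return abs(math.sqrt(n) * (xbar - theta_star)) <= z

def confidence_region(xbar, n, z=Z975, step=1e-4):
    """T(z) = {theta : z in A(theta)}, scanned over a grid around xbar;
    by Theorem 1 this is a confidence region with coefficient 1 - alpha."""
    thetas = [xbar - 3 + i * step for i in range(int(6 / step) + 1)]
    accepted = [t for t in thetas if accepts(t, xbar, n)]
    return min(accepted), max(accepted)

low, up = confidence_region(xbar=0.5, n=25)
# Agrees with the direct interval X-bar +/- z/sqrt(n) = 0.5 +/- 0.392
print(round(low, 3), round(up, 3))
```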
Then T(Z) is a confidence region for θ with confidence coefficient 1 − α.

. . .15. Xn be independent r. .5? 15. .6 Refer to Example 14. DEFINITION 3 Let X1.5 15. and show that the posterior p. T2] is a 100γ percent tolerance interval of 100p percent of F if P[F(T2) − F(T1) ≥ p] ≥ γ.f.v. Xn be i. .d. . . 1/(n + 1)).4.05. Use the CLT to construct a conﬁdence interval for θ with approximate conﬁdence coefﬁcient 1 − α. Use Theorem 4 in Chapter 10 to construct a conﬁdence interval for the pth quantile of F. ∞) and suppose that n is large. xn. Xn be i.’s with continuous d. Thus the Bayes method of estimation considered in Section 7 of Chapter 12 also leads to the construction of prediction intervals for θ.i. Chapter 12. Also identify the conﬁdence coefﬁcient 1 − α if n = 10 for various values of the pair (i. .v. For p and γ with 0 < p. F. .f. . . . Now we suppose that the p.5 Let X1.4 Construct conﬁdence intervals as in Example 1 by utilizing Theorem 1.25. rather than a parameter. of n θ. where p = 0.5 Tolerance Intervals In the sections discussed so far.2 Tolerance Some Examples Intervals 413 15.4. and show that the posterior p. sample with a p. Instead. respectively.v. is N((nx ¯ + μ)/(n + 1). γ < 1.d. xp] is a prediction interval for θ with conﬁdence coefﬁcient 1 − 2p. we have the following deﬁnition. it is replaced by what is known as a tolerance interval.’s with a (nonparametric) d.4.d. Xn ) be two statistics of the X’s such that T1 ≤ T2. j).4.d. r. xn.4. is Beta with parameters α + ∑ n j = 1xj and β + n − ∑ j = 1xj. Xn) and T2 = T2(X1.f. . Then the concept of a conﬁdence interval. μ = 1 and x ¯ = 1. .7 Refer to Example 15.50. becomes meaningless in the present context. Thus if xp ′ and xp are the lower and the upper pth quantiles.f. we say that the interval (T1. (The term prediction interval rather than conﬁdence interval is more appropriate here. of the Beta p. What does this interval become for p = 0. .4. . This problem was solved for certain cases. . . Chapter 12. . Compare this interval with that constructed in Exercise 15. 
we assumed that X₁, . . . , Xₙ was a random sample as described above; then, for the case that θ was real-valued, the problem was that of constructing a confidence interval for θ with a preassigned confidence coefficient. More precisely, we now assume a nonparametric model, and the concept of a confidence interval, as given in Definition 2, is replaced by what is known as a tolerance interval.

. if we set j − i = r. w1 + ⋅ ⋅ ⋅ + wn < 1 otherwise. the r. . of Wi+1 + · · · + Wj. the concept of a tolerance interval has an interpretation analogous to that of a conﬁdence interval. Accordingly. t 2]. .f. . . . Yj] is a 100γ percent tolerance interval of 100p percent of F. ( ) (22) Then the determinant of the transformation is easily seen to be 1 and therefore Theorem 3 in Chapter 10 gives ⎧n!.) PROOF We wish to show that P[F(Yj) − F(Yi) ≥ p] = γ. ( ) ( ) Thus it sufﬁces to ﬁnd the p.f. .d. at least 100 γ of the N intervals will cover at least 100p percent of the distribution mass of F. of a Beta distribution with parameters α = j − i and β = n − j + i + 1. From the transformation above. of Zj − Zi. t2]. it sufﬁces to determine the p.d. n. interval (Yi. Then as N gets larger and larger. we also have Z j − Zi = W1 + ⋅ ⋅ ⋅ + Wj − W1 + ⋅ ⋅ ⋅ + Wi = Wi +1 + ⋅ ⋅ ⋅ + Wj . Set W1 = Z1 and Wk = Zk − Zk −1 .d.v. experiment under consideration is carried out independently n times and let (t1. . Then for any p ∈ (0.f. of the sum of any consecutive r W’s is the same. . t 2] be the resulting interval for the observed values of the X’s. If we set Zk = F(Yk). Actually. n. use the transformation Vk = W1 + · · · Wk. k = 1. p 1 () gj−i being the p. . n be the order statistics. . . r. of W1 + · · · + Wr. For this purpose.d. Namely. suppose the r. Then we see that formally we go back to the Z’s and therefore . .’s with p. F(t 2) − F(t1) is the portion of the distribution mass of F which lies in the interval (t1. . Now regarding the actual construction of tolerance intervals. 1 − γ is read off the Incomplete Beta tables. .i. Suppose now that this is repeated independently N times. wn = ⎨ ⎩0.d. respectively. n. (For selected values of p. k = 1. . . This suggests that we shall have to ﬁnd the p. Xn be i. f of the continuous type and let Yj = X( j). we have the following result. .f. . then it is clear that the p. . .d. THEOREM 2 Let X1. k = 1. 
where γ is determined as follows: γ = ∫_p^1 g_{j−i}(v) dv,

f. In fact. It follows then from Theorem 2(i) in Chapter 10. each with probability equal to p.f. interval (X. so that (−∞. X] covers at most 100p percent of the distribution mass of F. the r.d.f. g υ1 . 1 p r This completes the proof of the theorem. provided γ is determined by ∫ g (v)dv = γ . . ( ) 0 < v1 < ⋅ ⋅ ⋅ < vk < 1 otherwise. ∞) does cover at least 1 − p of the distribution mass of F with probability p. υ k = ⎨ ⎩0. () ( () ( ) ) ( ) n −r . F.v. Then.d. P F X ≤ p = P X ≤ F −1 p = F F −1 p = p. . and the r. But this is the p.d. with d. of a Beta distribution with parameters α = r and β = n − r + 1. 1). () ( )( ) ( ) n −r . ∞) covers at least 100(1 − p) percent of the distribution mass of F.15. . it follows that for any p ∈ (0. b) with −∞ ≤ a < b ≤ ∞. Since this is also the p. and P F X > 1 − p = 1 − P F X ≤ 1 − p = 1 − F F −1 1 − p = 1 − 1 − p = p. so that F is strictly increasing. By taking into consideration that Γ(m) = (m − 1)!. gr is given by ⎧ n! v r −1 1 − v ⎪ gr v = ⎨ r − 1 ! n − r ⎪ ⎩0. of Zj − Zi. ▲ Let now f be positive in (a. this can be rewritten as follows: ⎧ Γ n+1 ⎪ v r −1 1 − v ⎪ gr v = ⎨ Γ r Γ n − r + 1 ⎪ ⎪ ⎩0. [( ) ] [ ] [( ) ( )] [ ( )] [( ) ] [ ( )] ( ) so that (X. 0<v<1 otherwise. . 0<v<1 otherwise. X] does cover at most p of the distribution mass of F with probability p. as was to be seen.2 Tolerance Some Examples Intervals 415 ⎧n!. that the marginal p.f. if X is an r. interval (−∞.5 15. it follows that (22) is true. .

the reader is assumed to be familiar with the basics of linear algebra. j = 1. Assume that a certain aspect of the experiment depends on the temperature and let yj be some measurements taken at the temperatures xj.v. . Yj. j = 1. y j = β1 + β 2 x j . . yj). n would be (exactly) related as follows for the three cases considered above. One question which naturally arises is how we draw a line in the xy-plane which would ﬁt the data best in a certain sense. or 416 . . as is seen by inspection. . it can be used for prediction purposes. . j = 1. n. The reason that this line-ﬁtting problem is important is twofold. consider a certain chemical or physical experiment which is carried out at each one of the (without appreciable error) selected temperatures xj. a polynomial of higher degree would seem to ﬁt the data better. (1) for some values of the parameters β1. In other cases. . the pairs (xj. they lie approximately on a straight line. and secondly. n which can be represented as points in the xy-plane. . that is. . . n. β2. and still in others. . First. . . errors. yj). . n which need not be all distinct but they are not all identical either. . yj is actually an observed value of a r. . a brief exposition of the results employed herein is given in Appendix I. j = 1. . it reveals the pattern according to which the y’s depend on the x’s. . yj). . . . n (n ≥ 2). However. . . . for the sake of completeness. For the introduction of the model. . . n are approximately linearly related. . due to random errors in taking measurements.1 Introduction of the Model In the present chapter. yj). j = 1. the data is periodic and it is ﬁt best by a trigonometric polynomial. If it were not for the r. j = 1. j = 1. the pairs (xj. . that is.416 16 The General Linear Hypothesis Chapter 16 The General Linear Hypothesis 16. n as closely as possible. . which would pass through the pairs (xj. The underlying idea in all these cases is that. Quite often. Then one has the n pairs (xj. . j = 1. . . .

errors ej. . . i =1 p j = 1. j = 1. (5) The model given by (5) is called the general linear model (linear because the parameters β1. vector e is unobservable. . or.16. . n ( ) (3) for some values of the parameters β1. .1 Introduction of the Model 417 y j = β1 + β 2 x j + ⋅ ⋅ ⋅ + β k +1 x k j . it should be noted that what one actually observes is the r. so that X ′ = ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎟ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜x ⎟ ⎝ x p1 x p 2 ⋅ ⋅ ⋅ x pn ⎠ ⎝ 1n x2n ⋅ ⋅ ⋅ x pn ⎠ relation (4) is written as follows in matrix notation: Y = X ′β + e. Yn . . one observes that the models appearing in relations (1′)–(3′) are special cases of the following general model: Y1 = x11 β1 + x 21 β 2 + ⋅ ⋅ ⋅ + x p1 β p + e1 Y2 = x12 β1 + x 22 β 2 + ⋅ ⋅ ⋅ + x p 2 β p + e 2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ Yn = x1n β1 + x 2 n β 2 + ⋅ ⋅ ⋅ + x pn β p + e n . n. . . . . . en ( ) ( ) ( ) and ⎛ x11 x21 ⋅ ⋅ ⋅ x p1 ⎞ ⎛ x11 x12 ⋅ ⋅ ⋅ x1n ⎞ ⎜ ⎟ ⎜x x12 x22 ⋅ ⋅ ⋅ x p 2 ⎟ x22 ⋅ ⋅ ⋅ x2n ⎟ 21 ⎜ ⎜ ⎟ X= . j = 1. . and Y j = β1 + β 2 cos t j + β3 sin t j + ⋅ ⋅ ⋅ + β 2 k ( cos( kt ) + β sin kt j . . . . ( ) ) (2′) (3′) At this point. . n (2 ≤ k ≤ n − 1). . . . j 2 k+1 ( ) (1′) Yj = β1 + β 2 x j + ⋅ ⋅ ⋅ + β k +1 x k j + ej . . . whereas the r. e = e1 . . . j = 1. . j = 1. vector Y. . . . . (4) By setting ′ ′ ′ Y = Y1 . . . β2k+1.’s. At this point. βp enter the model in their ﬁrst powers only).v. . n (n ≥ 2k + 1). . . . . . n with p ≤ n and most often p < n. . j = 1. . y j = β1 + β 2 cos t j + β3 sin t j + ⋅ ⋅ ⋅ + β 2 k cos kt j + β 2 k+1 sin kt j . βk+1. In the presence of r. . β = β1 . j = 1. . ( ) (2) for some values of the parameters β1. + e j . respectively: Yj = β 1 + β 2 x j + e j . . . β p . ﬁnally. n 2 ≤ k ≤ n − 1 . . or in a more compact form Yj = ∑ xij β i + e j . . . . . . (n ≥ 2k + 1). . . n n ≥ 2 . . . the y’s appearing in (1)–(3) are observed values of the following r. .

DEFINITION 1   Let C = (Z_ij) be an n × k matrix whose elements Z_ij are r.v.'s. Then, by assuming the EZ_ij are finite, the EC is defined as follows: EC = (EZ_ij). In particular, for Z = (Z_1, . . . , Z_n)′, we have EZ = (EZ_1, . . . , EZ_n)′, and for C = (Z − EZ)(Z − EZ)′, we have

EC = E[(Z − EZ)(Z − EZ)′].

This last expression is denoted by Σ_Z and is called the variance–covariance matrix of Z, or just the covariance matrix of Z. Clearly, the (i, j)th element of the n × n matrix Σ_Z is Cov(Z_i, Z_j), the covariance of Z_i and Z_j, so that the diagonal elements are simply the variances of the Z's.

Since the r.v.'s e_j, j = 1, . . . , n are errors, it is reasonable to assume that Ee_j = 0 and that σ²(e_j) = σ², j = 1, . . . , n. Another assumption about the e's which is often made is that they are uncorrelated, that is, Cov(e_i, e_j) = 0 for i ≠ j. These assumptions are summarized by writing

E(e) = 0  and  Σ_e = σ²I_n,

where I_n is the n × n unit matrix. By then taking into consideration Definition 1 and the assumptions just made, our model in (5) becomes as follows:

Y = X′β + e,  E(e) = 0,  Σ_e = σ²I_n,   (6)

where Y is an n × 1 r. vector, X′ is an n × p (p ≤ n) matrix of known constants, β is a p × 1 vector of parameters, and e is an n × 1 r. vector. This is the model we are going to concern ourselves with from now on. In the model represented by (6), there are p + 1 parameters β_1, . . . , β_p, σ², and the problem is that of estimating these parameters and also testing certain hypotheses about the β's. This is done in the following sections. It should also be mentioned in passing that the expectations η_j of the r.v.'s Y_j, j = 1, . . . , n are linearly related to the β's and are called linear regression functions. This motivates the title of the present chapter.

16.2 Least Square Estimators—Normal Equations

According to the model assumed in (6), we have EY = X′β = η, so that we would expect to have η = X′β, whereas what we actually observe is Y = X′β + e = η + e for some β. Then the principle of least squares (LS) calls for determining β so that the difference between what we expect and what we actually observe is minimum. More precisely, β is to be determined so that the sum of squares of errors,

‖Y − η‖² = ‖e‖² = Σ_{j=1}^n e_j²,

is minimum.

DEFINITION 2   Any value of β which minimizes the squared norm ‖Y − η‖², where η = X′β, is called a least square estimator (LSE) of β and is denoted by β̂. (The norm of an m-dimensional vector v = (v_1, . . . , v_m)′, denoted by ‖v‖, is the usual Euclidean norm, namely ‖v‖ = (Σ_{j=1}^m v_j²)^{1/2}.)

From (η_1, . . . , η_n)′ = η = X′β, we have that

η_j = Σ_{i=1}^p x_ij β_i,  j = 1, . . . , n,

and

‖Y − η‖² = Σ_{j=1}^n (Y_j − η_j)² = Σ_{j=1}^n (Y_j − Σ_{i=1}^p x_ij β_i)²,

which we denote by S(Y, β).

For the pictorial illustration of the principle of LS, let p = 2, x_1j = 1 and x_2j = x_j, j = 1, . . . , n, so that η_j = β_1 + β_2 x_j, j = 1, . . . , n. Thus (x_j, η_j), j = 1, . . . , n are n points on the straight line η = β_1 + β_2 x, and the LS principle specifies that β_1 and β_2 be chosen so that Σ_{j=1}^n (Y_j − η_j)² be minimum. (See also Fig. 16.1.)

[Figure 16.1: the points (x_j, Y_j), j = 1, . . . , 5 and the line η = β_1 + β_2 x; the values of β_1 and β_2 are chosen in order to minimize the quantity (Y_1 − η_1)² + · · · + (Y_5 − η_5)².]

Clearly, S(Y, β) is differentiable with respect to β. Then any LSE is a root of the equations

∂S(Y, β)/∂β_v = 0,  v = 1, . . . , p,

which are known as the normal equations. Now

∂S(Y, β)/∂β_v = 2 Σ_{j=1}^n (Y_j − Σ_{i=1}^p x_ij β_i)(−x_vj) = −2 Σ_{j=1}^n x_vj Y_j + 2 Σ_{j=1}^n Σ_{i=1}^p x_vj x_ij β_i,

so that the normal equations become

Σ_{j=1}^n Σ_{i=1}^p x_vj x_ij β_i = Σ_{j=1}^n x_vj Y_j,  v = 1, . . . , p.   (7)

The equations in (7) are written as follows in matrix notation:

XX′β = XY,  or  Sβ = XY,  where S = XX′.   (7′)

The normal equations provide a method for the actual calculation of LSE's. Actually, the set of LSE's of β coincides with the set of solutions of the normal equations, as the following theorem shows.

THEOREM 1   Any LSE β̂ of β is a solution of the normal equations and any solution of the normal equations is an LSE.

PROOF   We have

η = X′β = (x_11 β_1 + x_21 β_2 + · · · + x_p1 β_p, x_12 β_1 + x_22 β_2 + · · · + x_p2 β_p, . . . , x_1n β_1 + x_2n β_2 + · · · + x_pn β_p)′
  = β_1 (x_11, x_12, . . . , x_1n)′ + β_2 (x_21, x_22, . . . , x_2n)′ + · · · + β_p (x_p1, x_p2, . . . , x_pn)′
  = β_1 ξ_1 + β_2 ξ_2 + · · · + β_p ξ_p,

where ξ_j is the jth column of X′. Thus

η = Σ_{j=1}^p β_j ξ_j,  with ξ_j, j = 1, . . . , p as above.   (8)

Let V_n be the n-dimensional vector space ℝⁿ and let r (≤ p) be the rank of X (= rank X′). Then the vector space V_r generated by ξ_1, . . . , ξ_p is of dimension r, and V_r ⊆ V_n.

[Figure 16.2 (n = 3, r = 2): the projection η̂ of Y into V_r, with Y − X′β̂ = Y − η̂ ⊥ V_r.]

Clearly, Y ∈ V_n and, from (8), η ∈ V_r. Let η̂ be the projection of Y into V_r. Then, as is well known, ‖Y − X′β‖² = ‖Y − η‖² becomes minimum if η = η̂. Thus β̂ is an LSE of β if and only if X′β̂ = η̂. From the definition of ξ_j, j = 1, . . . , p, this is equivalent to saying that Y − X′β̂ ⊥ V_r, and an equivalent condition to it is that Y − X′β̂ ⊥ ξ_j, or ξ_j′(Y − X′β̂) = 0, j = 1, . . . , p. Since the ξ_j, j = 1, . . . , p are the column vectors of X′, this last condition is equivalent to X(Y − X′β̂) = 0, or XX′β̂ = XY, which is the matrix notation for the normal equations. This completes the proof of the theorem. (For a pictorial illustration of some of the arguments used in the proof of the theorem, see Fig. 16.2.) ▲

In the course of the proof of the last theorem, it was seen that there exists at least one LSE β̂, and by the theorem itself the totality of LSE's coincides with the set of the solutions of the normal equations. Let η̂ be the projection of Y into V_r. Then η̂ = Σ_{j=1}^p β̂_j ξ_j, where β̂_j, j = 1, . . . , p may not be uniquely determined (η̂ is, however) but may be chosen to be functions of Y. Thus β̂ is a function of Y.

Now a special but important case is that where X is of full rank, that is, rank X = p. Then S = XX′ is a p × p symmetric matrix of rank p, so that S⁻¹ exists. Therefore the normal equations in (7′) provide a unique solution, β̂ = S⁻¹XY. This is part of the following result.

THEOREM 2   If rank X = p, then there exists a unique LSE β̂ of β given by the expression

β̂ = S⁻¹XY,  where S = XX′.   (9)

Furthermore, this LSE is linear in Y, unbiased and has covariance matrix given by Σ_β̂ = σ²S⁻¹.

PROOF   The existence and uniqueness of the LSE β̂ and the fact that it is given by (9) have already been established. That it is linear in Y follows immediately from (9). Next, its unbiasedness is checked thus:

Eβ̂ = E(S⁻¹XY) = S⁻¹X E(Y) = S⁻¹XX′β = S⁻¹Sβ = I_p β = β.
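A small numerical sketch of Theorem 2 (not from the book; the variable names are invented and numpy is assumed): in the full-rank case, solving the normal equations Sβ = XY gives the same minimizer as a generic least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Full-rank case of Theorem 2: the normal equations S beta = X Y with
# S = X X' have the unique solution beta_hat = S^{-1} X Y.
n, p = 20, 3
Xt = rng.normal(size=(n, p))            # design matrix X' (full rank a.s.)
beta = np.array([2.0, -1.0, 0.5])
Y = Xt @ beta + rng.normal(scale=0.1, size=n)

X = Xt.T
S = X @ Xt                              # p x p, invertible when rank X = p
beta_hat = np.linalg.solve(S, X @ Y)    # solves the normal equations (7')

# the same minimizer from a generic least-squares routine
beta_lstsq = np.linalg.lstsq(Xt, Y, rcond=None)[0]
```

Solving Sβ = XY with a linear solver, rather than forming S⁻¹ explicitly, is the numerically preferred way to realize (9).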

Finally, for the calculation of the covariance of β̂, we need the following auxiliary result:

Σ_AV = A Σ_V A′,   (10)

where V is an n × 1 r. vector with finite covariances and A is an m × n matrix of constants. Relation (10) is established as follows:

Σ_AV = E{[AV − E(AV)][AV − E(AV)]′} = E[A(V − EV)(V − EV)′A′]
     = A E[(V − EV)(V − EV)′] A′ = A Σ_V A′.

In the present case, A = S⁻¹X and V = Y with Σ_Y = σ²I_n, so that

Σ_β̂ = (S⁻¹X)(σ²I_n)(S⁻¹X)′ = σ²S⁻¹XX′(S⁻¹)′ = σ²S⁻¹S(S⁻¹)′ = σ²I_p (S⁻¹)′ = σ²S⁻¹,

because S, and hence S⁻¹, is symmetric, so that (S⁻¹)′ = S⁻¹. This completes the proof of the theorem. ▲

The following definition will prove useful in the sequel.

DEFINITION 3   For a known p × 1 vector c, set ψ = c′β. Then ψ is called a parametric function. A parametric function ψ is called estimable if it has an unbiased, linear (in Y) estimator; that is, if there exists a nonrandom n × 1 vector a such that E(a′Y) = ψ identically in β.

In connection with estimable functions, the following result holds.

LEMMA 1   Let ψ = c′β be an estimable function, so that there exists a ∈ V_n such that E(a′Y) = ψ identically in β, and let d be the projection of a into V_r. Then:
i)  E(d′Y) = ψ;
ii)  σ²(a′Y) ≥ σ²(d′Y);
iii)  d′Y = c′β̂ for any LSE β̂ of β;
iv)  if α is any other nonrandom vector in V_r such that E(α′Y) = ψ, then α = d.

PROOF
i) The vector a can be written uniquely as follows: a = d + b, where b ⊥ V_r. Hence

a′Y = (d + b)′Y = d′Y + b′Y,

and therefore

ψ = E(a′Y) = E(d′Y) + b′EY = E(d′Y) + b′X′β.

But b′X′ = 0, since b ⊥ V_r and thus b ⊥ ξ_j, j = 1, . . . , p, the column vectors of X′. Hence E(d′Y) = ψ.

ii) From the decomposition a = d + b mentioned in (i), it follows that ‖a‖² = ‖d‖² + ‖b‖². Next, by means of (10),

σ²(a′Y) = a′Σ_Y a = σ²‖a‖² = σ²‖d‖² + σ²‖b‖².

Since also σ²(d′Y) = σ²‖d‖², by (10) again, we have

σ²(a′Y) = σ²(d′Y) + σ²‖b‖²,

from which we conclude that σ²(a′Y) ≥ σ²(d′Y).

iii) By (i), E(d′Y) = ψ = c′β identically in β. But

E(d′Y) = d′EY = d′X′β,

so that d′X′β = c′β identically in β. Hence d′X′ = c′. Next, since d ∈ V_r, one has d′(Y − η̂) = 0, with η̂ = X′β̂, the projection of Y into V_r. Therefore

d′Y = d′η̂ = d′X′β̂ = c′β̂.

iv) Finally, let α ∈ V_r be such that E(α′Y) = ψ. Then we have

0 = E(α′Y) − E(d′Y) = E[(α′ − d′)Y] = (α′ − d′)X′β

identically in β, and hence (α′ − d′)X′ = 0, or (α − d)′X′ = 0, which is equivalent to saying that α − d ⊥ V_r. So both α − d ∈ V_r and α − d ⊥ V_r, and hence α − d = 0. Thus α = d, and the proof is complete. ▲

Part (iii) of Lemma 1 justifies the following definition.

DEFINITION 4   Let ψ = c′β be an estimable function. Thus there exists a ∈ V_n such that E(a′Y) = ψ identically in β; let d be the projection of a into V_r. Then the unbiased, linear (in Y) estimator ψ̂ = d′Y (= c′β̂, where β̂ is any LSE of β) of ψ is called the LSE of ψ.

We are now able to formulate and prove the following basic result.

THEOREM 3   (Gauss–Markov) Assume the model described in (6) and let ψ be an estimable function. Then its LSE ψ̂ has the smallest variance in the class of all linear in Y and unbiased estimators of ψ.

PROOF   Since ψ is estimable, there exists a ∈ V_n such that E(a′Y) = ψ identically in β; let d be the projection of a into V_r, so that ψ̂ = d′Y. Then, if b′Y is any other linear in Y and unbiased estimator of ψ, it follows, by Lemma 1, that σ²(b′Y) ≥ σ²(d′Y) = σ²(ψ̂), as was to be seen. ▲

COROLLARY   Suppose that rank X = p. Then for any c ∈ V_p, the function ψ = c′β is estimable, and hence its LSE ψ̂ = c′β̂ has the smallest variance in the class of all linear in Y and unbiased estimators of ψ. In particular, the same is true for each β̂_j, j = 1, . . . , p.

PROOF   The first part follows immediately from the fact that β̂ = S⁻¹XY is linear in Y and unbiased. The particular case follows from the first part by taking c to have all its components equal to zero except for the jth one, which is equal to one. ▲

16.3 Canonical Reduction of the Linear Model—Estimation of σ²

Assuming the model described in (6), in the previous section we solved the problem of estimating β by means of the LS principle. In the present section, we obtain a certain reduction of the linear model under consideration and, as a by-product of it, we also obtain an estimator of the variance σ². For this, it will have to be assumed that r < n, as will become apparent in the sequel.

Recall that V_r is the r-dimensional vector space generated by the column vectors of X′, where r = rank X, so that V_r ⊆ V_n. Let {α_1, . . . , α_r} be an orthonormal basis for V_r (that is, α_i′α_j = 0 for i ≠ j and ‖α_j‖² = 1, j = 1, . . . , r). Then this basis can be extended to an orthonormal basis {α_1, . . . , α_r, α_{r+1}, . . . , α_n} for V_n. Now since Y ∈ V_n, one has Y = Σ_{j=1}^n Z_j α_j for certain r.v.'s Z_j, j = 1, . . . , n to be specified below. It follows that α_i′Y = Σ_{j=1}^n Z_j α_i′α_j = Z_i, so that

Z_i = α_i′Y,  i = 1, . . . , n.

By letting P be the matrix with rows the vectors α_i′, i = 1, . . . , n, the last n equations are summarized as follows:

Z = PY,  where Z = (Z_1, . . . , Z_n)′.   (11)

From the definition of P, it is clear that PP′ = I_n. Next, let EZ = ζ = (ζ_1, . . . , ζ_n)′. Then ζ = E(PY) = Pη, where η ∈ V_r, and relation (10) gives

Σ_Z = P(σ²I_n)P′ = σ²I_n,

so that σ²(Z_j) = σ², j = 1, . . . , n, and the Z's are uncorrelated. Since η ∈ V_r, whereas α_j ⊥ V_r for j = r + 1, . . . , n, it follows that

ζ_j = α_j′η = 0,  j = r + 1, . . . , n.

By recalling that η̂ is the projection of Y into V_r, we have

Y = Σ_{j=1}^n Z_j α_j  and  η̂ = Σ_{j=1}^r Z_j α_j,   (12)

so that
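The canonical reduction can be sketched numerically (not from the book; numpy is assumed, the names are invented, and the orthonormal basis is obtained via a QR factorization rather than by hand):

```python
import numpy as np

rng = np.random.default_rng(2)

# Canonical reduction sketch: an orthonormal basis of V_r (the column
# space of X'), extended to a basis of V_n, gives Z = P Y with
# eta_hat = sum_{j<=r} Z_j alpha_j and ||Y - eta_hat||^2 = sum_{j>r} Z_j^2.
n, p = 6, 2
Xt = rng.normal(size=(n, p))                     # X', of full rank r = p a.s.
Y = rng.normal(size=n)

Q, _ = np.linalg.qr(np.hstack([Xt, np.eye(n)]))  # extends a basis of V_r to V_n
P = Q.T                                          # rows of P are the alpha_i'
Z = P @ Y
r = np.linalg.matrix_rank(Xt)

eta_hat = Q[:, :r] @ Z[:r]                       # projection of Y into V_r
resid_sq = np.sum(Z[r:] ** 2)                    # sum_{j > r} Z_j^2
```

Because the first r columns of Q span the column space of X′, `eta_hat` coincides with the least-squares projection of Y, and `resid_sq` is the squared distance from Y to V_r.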

‖Y − η̂‖² = ‖Σ_{j=r+1}^n Z_j α_j‖² = (Σ_{j=r+1}^n Z_j α_j)′(Σ_{j=r+1}^n Z_j α_j) = Σ_{j=r+1}^n Z_j².   (13)

From (11) and (12), since EZ_j = ζ_j = 0 and σ²(Z_j) = σ² for j = r + 1, . . . , n, we get that EZ_j² = σ², j = r + 1, . . . , n, so that (13) gives

E‖Y − η̂‖² = (n − r)σ².

Hence ‖Y − η̂‖²/(n − r) is an unbiased estimator of σ². (Here is where we use the assumption that r < n in order to ensure that n − r > 0.) Thus we have shown the following result.

THEOREM 4   In the model described in (6), with the assumption that r < n, an unbiased estimator for σ² is provided by

σ̃² = ‖Y − η̂‖²/(n − r),

where η̂ is the projection of Y into V_r and r (≤ p) is the rank of X. We may refer to σ̃² as the LSE of σ².

Now suppose that rank X = p, so that β̂ = S⁻¹XY by Theorem 2. From the transformation Z = PY one has Y = P⁻¹Z, and P⁻¹ = P′, as follows from the fact that PP′ = I_n. Thus Y = P′Z, and therefore β̂ = S⁻¹XP′Z. Since α_i ⊥ V_p for i = p + 1, . . . , n, and ξ_j ∈ V_p, j = 1, . . . , p, it follows that ξ_j′α_i = 0 for all j = 1, . . . , p and i = p + 1, . . . , n. The rows of X are the ξ_j′, j = 1, . . . , p, and the columns of P′ are the α_i, i = 1, . . . , n; hence the last n − p elements in all p rows of the p × n matrix XP′ are equal to zero. Because of this special form of the matrix XP′, it follows that β̂ is a function of Z_j, j = 1, . . . , p only, whereas

σ̃² = (1/(n − p)) Σ_{j=p+1}^n Z_j²,  by (13).

By summarizing these results, we have then

COROLLARY   Let rank X = p (< n). Then the LSE's β̂ and σ̃² are functions only of the transformed r.v.'s Z_j, j = 1, . . . , n; namely, β̂ depends only on Z_j, j = 1, . . . , p, and σ̃² only on Z_j, j = p + 1, . . . , n (see also Exercise 16.3.1).

REMARK 1   From the last theorem above, it follows that, in order for us to be able to actually calculate the LSE σ̃² of σ², we would have to rewrite ‖Y − η̂‖² in a form appropriate for calculation. To this end, we have

‖Y − η̂‖² = ‖Y − X′β̂‖² = (Y − X′β̂)′(Y − X′β̂)
         = Y′Y − Y′X′β̂ − β̂′XY + β̂′XX′β̂
         = Y′Y − Y′X′β̂ + (β̂′XX′β̂ − β̂′XY).

But β̂′XX′β̂ − β̂′XY = β̂′(XX′β̂ − XY) = 0, since XX′β̂ = XY by the normal equations (7′); and Y′X′β̂, being of dimension (1 × n) × (n × p) × (p × 1) = 1 × 1, is a number, so that Y′X′β̂ = (Y′X′β̂)′ = β̂′XY. Therefore

‖Y − η̂‖² = Y′Y − β̂′XY = Σ_{j=1}^n Y_j² − β̂′XY.   (14)
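The computational form (14) is easy to check numerically. The following sketch (not from the book; names invented, numpy assumed) compares the direct residual sum of squares with Y′Y − β̂′XY and forms the unbiased estimator of Theorem 4:

```python
import numpy as np

rng = np.random.default_rng(3)

# sigma_tilde^2 two ways: directly from the residuals, and via the
# computational form (14), ||Y - eta_hat||^2 = Y'Y - beta_hat' X Y.
n, p = 15, 3
Xt = rng.normal(size=(n, p))                  # X' of full rank p
Y = Xt @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

X = Xt.T
beta_hat = np.linalg.solve(X @ Xt, X @ Y)

rss_direct = np.sum((Y - Xt @ beta_hat) ** 2)  # ||Y - X'beta_hat||^2
rss_formula = Y @ Y - beta_hat @ (X @ Y)       # relation (14)
sigma_tilde_sq = rss_formula / (n - p)         # unbiased estimator of sigma^2
```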

Finally, denoting by r_v the vth element of the p × 1 vector XY, one has

r_v = Σ_{j=1}^n x_vj Y_j,  v = 1, . . . , p,   (15)

and therefore (14) becomes as follows:

‖Y − η̂‖² = Σ_{j=1}^n Y_j² − Σ_{v=1}^p β̂_v r_v.   (16)

As an application of some of the results obtained so far, consider the following example.

EXAMPLE 1   Let Y_j = β_1 + β_2 x_j + e_j, j = 1, . . . , n, where Ee_j = 0 and E(e_i e_j) = σ²δ_ij, i, j = 1, . . . , n (δ_ij = 1 if i = j and δ_ij = 0 if i ≠ j). Clearly, this example fits the model described in (6) by taking β = (β_1, β_2)′ and X′ to be the n × 2 matrix with jth row (1, x_j). Next,

XX′ = [[n, Σ_{j=1}^n x_j], [Σ_{j=1}^n x_j, Σ_{j=1}^n x_j²]] = S,

so that the normal equations are given by (7′) with S as above and

XY = (Σ_{j=1}^n Y_j, Σ_{j=1}^n x_j Y_j)′.   (17)

Now

nΣ_{j=1}^n x_j² − (Σ_{j=1}^n x_j)² = nΣ_{j=1}^n (x_j − x̄)²,

so that Σ_{j=1}^n (x_j − x̄)² ≠ 0,

provided that not all x's are equal. Then S⁻¹ exists and is given by

S⁻¹ = [[Σ_{j=1}^n x_j², −Σ_{j=1}^n x_j], [−Σ_{j=1}^n x_j, n]] / [nΣ_{j=1}^n (x_j − x̄)²].   (18)

It follows that β̂ = (β̂_1, β̂_2)′ = S⁻¹XY, so that

β̂_1 = [(Σ_{j=1}^n x_j²)(Σ_{j=1}^n Y_j) − (Σ_{j=1}^n x_j)(Σ_{j=1}^n x_j Y_j)] / [nΣ_{j=1}^n (x_j − x̄)²]

and

β̂_2 = [nΣ_{j=1}^n x_j Y_j − (Σ_{j=1}^n x_j)(Σ_{j=1}^n Y_j)] / [nΣ_{j=1}^n (x_j − x̄)²].   (19)

But

nΣ_{j=1}^n x_j Y_j − (Σ_{j=1}^n x_j)(Σ_{j=1}^n Y_j) = nΣ_{j=1}^n (x_j − x̄)(Y_j − Ȳ),

as is easily seen, so that

β̂_2 = Σ_{j=1}^n (x_j − x̄)(Y_j − Ȳ) / Σ_{j=1}^n (x_j − x̄)².   (19′)
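The slope and intercept formulas can be checked against the matrix solution of the normal equations; the sketch below (not from the book; names invented, numpy assumed) uses the data of the numerical illustration given in the text:

```python
import numpy as np

# Simple linear regression of Example 1: beta_2_hat from (19') and
# beta_1_hat = Ybar - beta_2_hat * xbar, checked against beta_hat = S^{-1} X Y.
x = np.array([30, 30, 30, 50, 50, 50, 70, 70, 70, 90, 90, 90], dtype=float)
Y = np.array([37, 43, 30, 32, 27, 34, 20, 26, 22, 15, 19, 20], dtype=float)

xbar, Ybar = x.mean(), Y.mean()
b2 = np.sum((x - xbar) * (Y - Ybar)) / np.sum((x - xbar) ** 2)   # (19')
b1 = Ybar - b2 * xbar                                            # (19'')

Xt = np.column_stack([np.ones_like(x), x])                       # X'
beta_hat = np.linalg.solve(Xt.T @ Xt, Xt.T @ Y)                  # S beta = X Y
```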

It is also verified (see also Exercise 16.3.2) that

β̂_1 = Ȳ − β̂_2 x̄.   (19″)

The expressions of β̂_1 and β̂_2 given by (19″) and (19′), respectively, are more compact, but their expressions given by (19) are more appropriate for actual calculations.

In the present case, (15) gives, in conjunction with (17),

r_1 = Σ_{j=1}^n Y_j  and  r_2 = Σ_{j=1}^n x_j Y_j,

so that (16) becomes

‖Y − η̂‖² = Σ_{j=1}^n Y_j² − β̂_1 (Σ_{j=1}^n Y_j) − β̂_2 (Σ_{j=1}^n x_j Y_j).   (20)

Since also r = p = 2, the LSE of σ² is given by

σ̃² = ‖Y − η̂‖² / (n − 2).   (21)

For a numerical example, take n = 12 and the x's and Y's as follows:

x_1 = 30   x_2 = 30   x_3 = 30   x_4 = 50   x_5 = 50   x_6 = 50
x_7 = 70   x_8 = 70   x_9 = 70   x_10 = 90  x_11 = 90  x_12 = 90
Y_1 = 37   Y_2 = 43   Y_3 = 30   Y_4 = 32   Y_5 = 27   Y_6 = 34
Y_7 = 20   Y_8 = 26   Y_9 = 22   Y_10 = 15  Y_11 = 19  Y_12 = 20.

Then relation (19) provides us with the estimates β̂_1 = 46.3833 and β̂_2 = −0.3216, and (20) and (21) give the estimate σ̃² = 14.8939.

Exercises

16.3.1   Referring to the proof of the corollary to Theorem 4, elaborate on the assertion that β̂ is a function of the r.v.'s Z_j, j = 1, . . . , p alone.

16.3.2   Refer to Example 1 and show directly, by means of (19′) and (19″), that

Eβ̂_1 = β_1,  Eβ̂_2 = β_2  and  σ²(β̂_2) = σ² / Σ_{j=1}^n (x_j − x̄)².

16.3.3   Verify relation (19″).
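As a quick consistency check (not from the book; names invented, numpy assumed), the shortcut form (20) of the residual sum of squares must agree with the directly computed sum of squared residuals on the tabulated data:

```python
import numpy as np

# Check of the computational formulas (20) and (21) on the tabulated data.
x = np.array([30, 30, 30, 50, 50, 50, 70, 70, 70, 90, 90, 90], dtype=float)
Y = np.array([37, 43, 30, 32, 27, 34, 20, 26, 22, 15, 19, 20], dtype=float)
n = len(x)

b2 = np.sum((x - x.mean()) * (Y - Y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = Y.mean() - b2 * x.mean()

rss_short = np.sum(Y**2) - b1 * np.sum(Y) - b2 * np.sum(x * Y)   # relation (20)
rss_direct = np.sum((Y - b1 - b2 * x) ** 2)                      # direct RSS
sigma_tilde_sq = rss_short / (n - 2)                             # relation (21)
```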

n under the usual assumptions and bring it under the form (6). . j = 1. n represent errors in taking measurements.4 i) Use relation (18) to show that ˆ = σ β 1 2 ( ) σ 2 ∑j = 1 x j2 n n∑j = 1 x j − x n ( ) 2 ˆ = . However.4 Testing Hypotheses About η ؍E(Y) The assumptions made in (6) were adequate for the derivation of the results obtained so far. . σ2 β ˆ = σ2 β 1 2 n 2 ( ) ( ) σ2 ∑ n j =1 x j2 and.’s ej. ˆ2) becomes a minimum if half of the x’s are chosen equal conclude that σ 2(β to x and the other half equal to −x (if that is feasible). in order to be able to carry out tests about η. .v. Since the r. for example. vector e and hence the r. vector Y. Those assumptions did not specify any particular kind of distribution for the r.5 Consider the linear model Yj = β1 + β2xj + β3x 2 j + ej. (It should be pointed out. . however. j = 1. ii) The covariance of β iii) An estimate of the covariance found in (ii). . and that.4 Testing Hypotheses About ؍E(Y) 429 ˆ2 and β ˆ1 are normally distributed if the Y’s are normally distributed. and that β 16. .) 16.16. Then ˆ ii) Conclude that β1 is normally distributed if the Y’s are normally distributed. . if x ¯ = 0.3. then β Again refer to Example 1 and suppose that x ¯ = 0. n for some x > 0. such an assumption will have to be made now. j = 1. by assuming that n is even and xj∈[−x. ˆ (the LSE of β). . . iii) Show that ˆ = σ .” This is the case. σ2 β 2 ( ) ∑ (x n j =1 σ2 j −x ) 2 ˆ1 and β ˆ2 are uncorrelated. Then for n = 5 and the data given in the table below ﬁnd: i) The LSE’s of β and σ 2. it . .3. when there is doubt about the linearity of the model.0 2 1. that such a choice of the x’s—when feasible—need not be “optimal.5 5 1.5 3 1.7 16.3 4 2. . x y 1 1. x]. .

is reasonable to assume that they are normally distributed. The assumption that e: N(0, σ²I_n) simply states that the r.v.'s e_j, j = 1, . . . , n are uncorrelated normal (equivalently, independent normal) with mean zero and variance σ². Denoting by (C) the set of conditions assumed so far, we have then

(C): Y = X′β + e,  e: N(0, σ²I_n),  rank X = r ≤ p < n.   (22)

From (22), we have that η = EY = X′β, so that η ∈ V_r, the r-dimensional vector space generated by the column vectors of X′. This means that the coordinates of η are not allowed to vary freely but satisfy n − r independent linear relationships. However, there might be some evidence that the components of η satisfy 0 < q ≤ r additional independent linear relationships. This is expressed by saying that η actually belongs in V_{r−q}, an (r − q)-dimensional vector subspace of V_r. Thus we hypothesize that

H: η ∈ V_{r−q} ⊂ V_r  (q < r)   (23)

and denote by c = C ∩ H the conditions that our model satisfies if, in addition to (C), we also assume H.

For testing the hypothesis H, we are going to use the Likelihood Ratio (LR) test. For this purpose, denote by f_Y(y; β, σ²) the joint p.d.f. of the Y's. We have

f_Y(y; β, σ²) = (1/(2πσ²))^{n/2} exp[−(1/(2σ²)) Σ_{j=1}^n (y_j − EY_j)²]
             = (1/(2πσ²))^{n/2} exp[−(1/(2σ²)) S(y, β)],   (24)

where

S(y, β) = ‖y − X′β‖² = Σ_{j=1}^n (y_j − EY_j)²

under C and c. Let S_C and S_c stand for the minimum of S(y, β) with respect to β, under C and c, respectively. From (24), it is obvious that, for a fixed σ², the maximum of f_Y(y; β, σ²) with respect to β is obtained when S(y, β) is replaced by S_C, under C (and by S_c, under c). Thus, in order to maximize f_Y(y; β, σ²) with respect to both β and σ², it suffices, under C, to maximize with respect to σ² the quantity

(1/(2πσ²))^{n/2} exp[−S_C/(2σ²)],

or its logarithm

−(n/2) log 2π − (n/2) log σ² − S_C/(2σ²).

Differentiating this last expression with respect to σ² and equating the derivative to zero, we obtain

−n/(2σ²) + S_C/(2σ⁴) = 0,  so that  σ̄² = S_C/n.

The second derivative with respect to σ² is equal to n/(2σ⁴) − S_C/σ⁶, which for σ² = σ̄² becomes −n³/(2S_C²) < 0. Therefore

max_C f_Y(y; β, σ²) = (n/(2πS_C))^{n/2} exp(−n/2).   (25)

In an entirely analogous way, one also has

max_c f_Y(y; β, σ²) = (n/(2πS_c))^{n/2} exp(−n/2),   (26)

so that the LR statistic λ is given by

λ = (S_c/S_C)^{−n/2},   (27)

where

S_C = min_β S(Y, β) = ‖Y − X′β̂_C‖²  (β̂_C = LSE of β under C)   (28)

and

S_c = min_β S(Y, β) = ‖Y − X′β̂_c‖²  (β̂_c = LSE of β under c).   (29)

The actual calculation of S_C and S_c is done by means of (16), where β̂ is replaced by β̂_C and β̂_c, respectively.

The LR test rejects H whenever λ < λ_0, where λ_0 is defined so that the level of the test is α. Now the problem which arises is that of determining the distribution of λ (at least) under H. We will show in the following that the LR test is equivalent to a certain "F-test" based on a statistic whose distribution, under H, is the F-distribution with specified degrees of freedom. To this end, set

g(λ) = λ^{−2/n} − 1.   (30)

Then

dg(λ)/dλ = −(2/n) λ^{−(n+2)/n} < 0,

so that g(λ) is decreasing. Thus λ < λ_0 if and only if g(λ) > g(λ_0). Taking into consideration relations (27) and (30), the last inequality becomes

[(n − r)/q] (S_c − S_C)/S_C > F_0,

where F_0 is determined by

P_H{[(n − r)/q](S_c − S_C)/S_C > F_0} = α,

so that the level of the test is α. Therefore the LR test is equivalent to the test which rejects H whenever

F > F_0,  where  F = [(n − r)/q](S_c − S_C)/S_C  and  F_0 = F_{q,n−r;α},   (31)

and the distribution of F, under H, is F_{q,n−r}, as is shown in the next section. The statistics S_C and S_c are given by (28) and (29), respectively.

Now, although the F-test in (31) is justified on the basis that it is equivalent to the LR test, its geometric interpretation, illustrated by Fig. 16.3 below, illuminates it even further. We have that η̂_C is the "best" estimator of η under C and η̂_c is the "best" estimator of η under c. Then the F-test rejects H whenever η̂_C and η̂_c differ by too much; equivalently, whenever S_c − S_C is too large (when measured in terms of S_C).

[Figure 16.3 (n = 3, r = 2, r − q = 1): S_C = squared norm of Y − η̂_C; S_c = squared norm of Y − η̂_c; S_c − S_C = squared norm of η̂_c − η̂_C.]

Suppose now that rank X = p (< n), and let β̂ be the unique (and unbiased) LSE of β. By the fact that (Y − η̂)′(η̂ − η) = 0, because Y − η̂ ⊥ η̂ − η, one has that the joint p.d.f. of the Y's is given by the following expression, where y has been replaced by Y:

(1/(2πσ²))^{n/2} exp{−(1/(2σ²))[(n − p)σ̃² + ‖X′β̂ − X′β‖²]}.
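The F statistic in (31) can be sketched numerically for the common case where the hypothesis H sets the last q coefficients equal to zero (not from the book; names invented, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)

# F statistic of (31): F = [(n - r)/q] (S_c - S_C)/S_C, comparing the
# full model (C) with a restricted model (c) dropping the last q columns.
n, p, q = 30, 4, 2
Xt = rng.normal(size=(n, p))                # X', of full rank r = p a.s.
beta = np.array([1.0, -0.5, 0.0, 0.0])      # H happens to be true here
Y = Xt @ beta + rng.normal(size=n)

def rss(A, Y):
    # minimum of ||Y - A b||^2 over b
    b = np.linalg.lstsq(A, Y, rcond=None)[0]
    return np.sum((Y - A @ b) ** 2)

S_C = rss(Xt, Y)                            # under (C)
S_c = rss(Xt[:, : p - q], Y)                # under c = C ∩ H
F = ((n - p) / q) * (S_c - S_C) / S_C       # ~ F_{q, n-p} under H
```

Because the restricted model is nested in the full one, S_c ≥ S_C always, so F ≥ 0.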

This shows that (β̂, σ̃²) is a sufficient statistic for (β, σ²). It can be shown that it is also complete (this follows from the multiparameter version of Theorem 6 in Chapter 11). By sufficiency and completeness, we have then the following result.

THEOREM 5   Under conditions (C) described in (22) and the assumption that rank X = p (< n), the LSE's β̂_j of β_j, j = 1, . . . , p have the smallest variance in the class of all unbiased estimators of β_j, j = 1, . . . , p (that is, regardless of whether they are linear in Y or not).

PROOF   For j = 1, . . . , p, β̂_j is an unbiased (and linear in Y) estimate of β_j. It is also of minimum variance (by the Lehmann–Scheffé theorem), as it depends only on a sufficient and complete statistic. ▲

Exercises

16.4.1   Show that the MLE σ̄² and the LSE σ̃² of σ² are related as follows:

σ̄² = [(n − r)/n] σ̃²,

where σ̄² is given in Section 16.4.

16.4.2   Let Y_j, j = 1, . . . , n be independent r.v.'s, where Y_j is distributed as N(β + γ(x_j − x̄), σ²), x̄ = (1/n)Σ_{j=1}^n x_j, the x_j, j = 1, . . . , n are known constants, and β, γ, σ² are parameters. Then:
i) Derive the LR test for testing the hypothesis H: γ = γ_0 against the alternative A: γ ≠ γ_0 at level of significance α;
ii) Set up a confidence interval for γ with confidence coefficient 1 − α.

16.5 Derivation of the Distribution of the F Statistic

Consider the three vector spaces V_{r−q}, V_r and V_n (r < n), which are related as follows: V_{r−q} ⊂ V_r ⊂ V_n. Let {α_{q+1}, . . . , α_r} be an orthonormal basis for V_{r−q}; extend it first to an orthonormal basis {α_1, . . . , α_q, α_{q+1}, . . . , α_r} for V_r, and then to an orthonormal basis {α_1, . . . , α_r, α_{r+1}, . . . , α_n} for V_n. Let P be the n × n orthogonal matrix with rows the vectors α_j′, j = 1, . . . , n, so that PP′ = I_n.

As in Section 16.3, consider the transformation Z = PY and set EZ = ζ = (ζ_1, . . . , ζ_n)′. Then ζ = Pη, where η ∈ V_r, so that ζ_j = 0, j = r + 1, . . . , n, as follows from the transformation ζ = Pη. Now if H is true, we will further have that η ∈ V_{r−q}, so that ζ_j = 0, j = 1, . . . , q. The converse is also, clearly, true. Thus the hypothesis H is equivalent to the hypothesis

H′: ζ_j = 0,  j = 1, . . . , q.

By (11),

Y = Σ_{j=1}^n α_j Z_j,

and, by (13) and (28),

S_C = ‖Y − η̂_C‖² = Σ_{j=r+1}^n Z_j².

On the other hand, η̂_c is the projection of Y into V_{r−q}, that is,

η̂_c = Σ_{j=q+1}^r α_j Z_j,

since {α_{q+1}, . . . , α_r} is an orthonormal basis for V_{r−q}. Therefore (29) gives

S_c = ‖Y − η̂_c‖² = Σ_{j=1}^q Z_j² + Σ_{j=r+1}^n Z_j².

Hence

S_c − S_C = Σ_{j=1}^q Z_j²,

so that

F = [(n − r)/q] (Σ_{j=1}^q Z_j²) / (Σ_{j=r+1}^n Z_j²).

Now since the Y's are independent and the transformation P is orthogonal, it follows that the Z's are also independent. (See Theorem 5.) Since also σ²(Z_j) = σ², j = 1, . . . , n, ζ_j = 0 for j = r + 1, . . . , n, and, under H (or equivalently, H′), also for j = 1, . . . , q, it follows that, under H (H′),

Σ_{j=1}^q Z_j²  is σ²χ_q²  and  Σ_{j=r+1}^n Z_j²  is σ²χ²_{n−r},

and these two r.v.'s are independent. It follows that, under H (H′), the statistic F is distributed as F_{q,n−r}. The distribution of F, under the alternatives, is non-central F_{q,n−r}, which is defined in terms of a χ²_{n−r} and a non-central χ_q² distribution. For these definitions, the reader is referred to Appendix II. From the derivations in this section and previous results, one has the following theorem.
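A small Monte Carlo sanity check of the distributional claim (not from the book; names invented, numpy assumed): under H, F is a ratio of independent chi-squares over their degrees of freedom, so its sample mean should be close to (n − r)/(n − r − 2).

```python
import numpy as np

rng = np.random.default_rng(5)

# Under H, F ~ F_{q, n-r}: a ratio of independent chi-square r.v.'s
# divided by their degrees of freedom.  E(F) = (n - r)/((n - r) - 2).
q, nr = 2, 10                           # q and n - r
reps = 20000
num = rng.chisquare(q, size=reps) / q
den = rng.chisquare(nr, size=reps) / nr
F = num / den
```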

THEOREM 6   Assume the model described in (22) and let rank X = p (< n). Then the LSE's β̂ and σ̃² of β and σ², respectively, are independent.

PROOF   It is an immediate consequence of the corollary to Theorem 4 and Theorem 5 in Chapter 9. ▲

Finally, we should like to emphasize that the transformation of the r.v.'s Y_j, j = 1, . . . , n to the r.v.'s Z_j, j = 1, . . . , n is only a technical device for deriving the distribution of F and also for proving unbiasedness of the LSE of σ². For actually carrying out the test and also for calculating the LSE of σ², the Y's rather than the Z's are used. This section is closed with two examples.

EXAMPLE 2   Refer to Example 1 and suppose that the x's are not all equal and that the Y's are normally distributed. Without loss of generality, we may assume that x_1 ≠ x_2. Then η_i = β_1 + β_2 x_i, i = 1, 2 are linearly independent, and all η_j, j = 3, . . . , n are linear combinations of η_1 and η_2. It follows that rank X = r = 2; thus η ∈ V_2, and the regression line is y = β_1 + β_2 x in the xy-plane.

First, suppose we are interested in testing the following hypothesis about the slope of the regression line, namely H_1: β_2 = β_20, where β_20 is a given number. Hypothesis H_1 is equivalent to the hypothesis H_1′: η_i = β_1 + β_20 x_i, from which it follows that, under H_1 (or H_1′), η ∈ V_1, so that r − q = 1, or q = 1. The LSE's of β_1 and β_2 under C are β̂_1C = Ȳ − β̂_2C x̄ and β̂_2C = Σ_{j=1}^n (x_j − x̄)(Y_j − Ȳ)/Σ_{j=1}^n (x_j − x̄)². Then η̂_C = X′β̂_C and

S_C = ‖Y − η̂_C‖² = Σ_{j=1}^n (Y_j − Ȳ)² − β̂²_2C Σ_{j=1}^n (x_j − x̄)².

Under H_1 (or H_1′), the LSE's become β̂_2c = β_20 and β̂_1c = Ȳ − β_20 x̄, so that η̂_c = X′β̂_c and

S_c = ‖Y − η̂_c‖² = Σ_{j=1}^n (Y_j − Ȳ)² + β_20 (β_20 − 2β̂_2C) Σ_{j=1}^n (x_j − x̄)².

It follows that S_c − S_C = (β̂_2C − β_20)² Σ_{j=1}^n (x_j − x̄)², and the test statistic is

F = (n − 2)(S_c − S_C)/S_C ~ F_{1,n−2},  under H_1 (or H_1′),

with cut-off point F_0 = F_{1,n−2;α}.

Next, suppose we are interested in testing the hypothesis H_2: β_1 = β_10, β_2 = β_20, where β_10 and β_20 are given numbers. Hypothesis H_2 is equivalent to the hypothesis H_2′: η_i = β_10 + β_20 x_i, from which it follows that, under H_2 (or H_2′), η ∈ V_0, so that r − q = 0, or q = 2. Under H_2 (or H_2′), β̂_1c = β_10, β̂_2c = β_20, η̂_c = X′β̂_c and S_c = Σ_{j=1}^n (Y_j − β_10 − β_20 x_j)². Thus the test statistic is

F = [(n − 2)/2](S_c − S_C)/S_C ~ F_{2,n−2},  under H_2 (or H_2′),

where S_C was given above, and the cut-off point is F_0 = F_{2,n−2;α}.

As mentioned earlier, the linear model adopted in this chapter can also be used for prediction purposes. The following example provides an instance of a prediction problem and its solution.

EXAMPLE 3   Refer to Example 2. Let x_0 ≠ x_j, j = 1, . . . , n, and suppose that the independent r.v.'s Y_0i = β_1 + β_2 x_0 + e_i, i = 1, . . . , m are to be observed at the point x_0. It is also assumed that, if the Y_0i's were actually observed, they would be independent of the Y's. Let Y_0 be their sample mean. The problem is that of predicting Y_0 and also constructing a prediction interval for Y_0. Of course, Y_0 is to be predicted by Ŷ_0 = β̂_1 + β̂_2 x_0. Then it follows that EŶ_0 = β_1 + β_2 x_0 = EY_0, so that, if we set Z = Y_0 − Ŷ_0, then EZ = 0.
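The prediction computation of Example 3 can be sketched numerically on the data of Example 1 (not from the book; names invented, numpy assumed; the hard-coded t-value t_{10;0.025} = 2.2281 is taken from the text, and the exact, unrounded coefficients are used, so Ŷ_0 comes out as 27.0833 rather than the rounded 27.0873):

```python
import numpy as np

# Example 3 sketch: predict the mean Y0 of m new responses at x0 by
# Y0_hat = b1 + b2*x0, with Var(Y0 - Y0_hat) = sigma^2 * (1/m + 1/n +
# (x0 - xbar)^2 / sum_j (x_j - xbar)^2), cf. relation (32).
x = np.array([30, 30, 30, 50, 50, 50, 70, 70, 70, 90, 90, 90], dtype=float)
Y = np.array([37, 43, 30, 32, 27, 34, 20, 26, 22, 15, 19, 20], dtype=float)
n, m, x0 = len(x), 1, 60.0

Sxx = np.sum((x - x.mean()) ** 2)
b2 = np.sum((x - x.mean()) * (Y - Y.mean())) / Sxx
b1 = Y.mean() - b2 * x.mean()

Y0_hat = b1 + b2 * x0
sigma_tilde_sq = np.sum((Y - b1 - b2 * x) ** 2) / (n - 2)
s = np.sqrt(sigma_tilde_sq * (1/m + 1/n + (x0 - x.mean())**2 / Sxx))
t = 2.2281                              # t_{10; 0.025}, from the text
interval = (Y0_hat - t * s, Y0_hat + t * s)
```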

The r.v.'s Y_0i, i = 1, . . . , m and Y_j, j = 1, . . . , n are independent, and we find (see also Exercise 16.5.3) that

σ_Z² = σ² [1/m + 1/n + (x_0 − x̄)² / Σ_{j=1}^n (x_j − x̄)²].   (32)

It follows by Theorem 6 that

(Z/σ_Z) / (σ̃/σ) = (Y_0 − Ŷ_0)/s  is t_{n−2} distributed,

where

s = σ̃ [1/m + 1/n + (x_0 − x̄)² / Σ_{j=1}^n (x_j − x̄)²]^{1/2},  σ̃² = ‖Y − η̂‖²/(n − 2),

so that a prediction interval for Y_0 with confidence coefficient 1 − α is provided by

[Ŷ_0 − s t_{n−2;α/2}, Ŷ_0 + s t_{n−2;α/2}].

For a numerical example, refer to the data used in Example 1 and let x_0 = 60, m = 1 and α = 0.05. Then Ŷ_0 = 27.0873, t_{10;0.025} = 2.2281 and s = 4.0169, so that the prediction interval for Y_0 is [18.137, 36.037].

Exercises

16.5.1   From the discussion in Section 16.5, it follows that the distribution of (n − r)σ̃²/σ² is χ²_{n−r}. Thus the statistic σ̃² can be used for testing hypotheses about σ² and also for constructing confidence intervals for σ².
i) Set up a confidence interval for σ² with confidence coefficient 1 − α;
ii) What is this confidence interval in the case of Example 2 when n = 27 and α = 0.05?

16.5.2   Refer to Example 2 and:
i) Use Exercises 16.3.3 and 16.3.4 to show that

√n (β̂_1 − β_1)/σ  and  [Σ_{j=1}^n x_j²]^{1/2} (β̂_2 − β_2)/σ

are distributed as N(0, 1), provided x̄ = 0;
ii) Use Theorem 6 and Exercise 16.5.1 to show that

√n (β̂_1 − β_1)/σ̃  and  [Σ_{j=1}^n x_j²]^{1/2} (β̂_2 − β_2)/σ̃

are distributed as t_{n−2}, provided x̄ = 0. Thus the r.v.'s figuring in (ii) may be used for testing hypotheses about β_1 and β_2 and also for constructing confidence intervals for β_1 and β_2.
iii) Set up the test for testing the hypothesis H: β_1 = 0 (the regression line passes through the origin) against A: β_1 ≠ 0 at level α, and also construct a 1 − α confidence interval for β_1;
iv) Set up the test for testing the hypothesis H′: β_2 = 0 (the Y's are independent of the x's) against A′: β_2 ≠ 0 at level α, and also construct a 1 − α confidence interval for β_2;
v) What do the results in (iii) and (iv) become for n = 27 and α = 0.05?

16.5.3   Verify relation (32).

16.5.4   Refer to Example 3 and suppose that the r.v.'s Y_0i = β_1 + β_2 x_0 + e_i, i = 1, . . . , m, corresponding to an unknown point x_0, are observed. It is assumed that the r.v.'s Y_j, j = 1, . . . , n and Y_0i, i = 1, . . . , m are all independent. Then:
i) Derive the MLE x̂_0 of x_0;
ii) Set V = Y_0 − β̂_1 − β̂_2 x_0, where Y_0 is the sample mean of the Y_0i's, and show that the r.v.

(V/σ_V) / [(m + n)σ̂̂² / ((m + n − 3)σ²)]^{1/2}

is distributed as t_{m+n−3}, where

σ_V² = σ² [1/m + 1/n + (x_0 − x̄)² / Σ_{j=1}^n (x_j − x̄)²]

and

σ̂̂² = [Σ_{j=1}^n (Y_j − β̂_1 − β̂_2 x_j)² + Σ_{i=1}^m (Y_0i − Y_0)²] / (m + n).

16.5.5   Refer to the model considered in Example 1 and suppose that the x's and the observed values of the Y's are given by the following table:

    x   5     10    15    20    25    30
    y   0.10  0.21  0.30  0.35  0.44  0.54

 i) Find the LSE's of β1, β2 and σ² by utilizing the formulas (19′), (19″) and (21).
 ii) Construct confidence intervals for β1, β2 and σ² with confidence coefficient 1 − α = 0.95 (see Exercises 16.5.1 and 16.5.2(ii)).
 iii) On the basis of the assumed model, predict Y0 at x0 = 17 and construct a prediction interval for Y0 with confidence coefficient 1 − α = 0.95 (see Example 3).

16.5.6  The following table gives the reciprocal temperatures (x) and the corresponding observed solubilities (y) of a certain chemical substance:

    x   3.80  3.72  3.67  3.62  3.57  3.50  3.26  3.07
    y   1.27  1.20  1.10  0.84  0.82  0.80  0.62  0.57

Assume the model considered in Example 1 and discuss questions (i) and (ii) of the previous exercise. Also discuss question (iii) of the same exercise for x0 = 3.77.

16.5.7  Let Zj, j = 1,…,n, be independent r.v.'s, where Zj is distributed as N(ζj, σ²). Suppose that ζj = 0 for j = r + 1,…,n, whereas ζ1,…,ζr and σ² are parameters. Then derive the LR test for testing the hypothesis H: ζ1 = ζ10 against the alternative A: ζ1 ≠ ζ10 at level of significance α.

16.5.8  Consider the r.v.'s of Exercise 16.5.2 and transform the Y's to Z's by means of an orthogonal transformation P whose first two rows are

    (1/√n, …, 1/√n)  and  ((x1 − x̄)/sx, …, (xn − x̄)/sx),  where  sx² = Σ_{j=1}^{n} (xj − x̄)².

Then:
 i) Show that the Z's are as in Exercise 16.5.7 with r = 2, ζ1 = √n γ and ζ2 = sx β.
 ii) Set up the test mentioned in Exercise 16.5.7 and then transform the Z's back to Y's. Also compare the resulting test with the test mentioned in Exercise 16.5.2.

16.5.9  Let Yi = β1 + β2xi + ei, i = 1,…,n, and Y*j = β*1 + β*2x*j + e*j, j = 1,…,m, where the e's and the e*'s are independent r.v.'s distributed as N(0, σ²). Use Exercises 16.5.1 and 16.5.4 to test the hypotheses H1: β1 = β*1 and H2: β2 = β*2 against the corresponding alternatives A1: β1 ≠ β*1 and A2: β2 ≠ β*2, at level of significance α.
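The prediction interval derived at the beginning of these exercises is easy to compute numerically. The sketch below is a minimal illustration rather than the book's own procedure: the function name is mine, scipy is assumed available, and it is applied to the data of Exercise 16.5.5 as read here (x0 = 17, m = 1).

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, m=1, alpha=0.05):
    # t-based prediction interval for the mean of m future observations at x0,
    # with sigma-tilde the MLE of sigma^2 (divisor n), rescaled by n/(n - 2).
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar = x.mean()
    b2 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)  # LSE of slope
    b1 = y.mean() - b2 * xbar                                           # LSE of intercept
    y0_hat = b1 + b2 * x0
    sigma2_tilde = np.sum((y - b1 - b2 * x) ** 2) / n
    s = np.sqrt(sigma2_tilde * n / (n - 2)) * np.sqrt(
        1.0 / m + 1.0 / n + (x0 - xbar) ** 2 / np.sum((x - xbar) ** 2))
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    return y0_hat - s * t, y0_hat + s * t

# Exercise 16.5.5 data as read here; the pairing of x and y is an assumption.
x = [5, 10, 15, 20, 25, 30]
y = [0.10, 0.21, 0.30, 0.35, 0.44, 0.54]
low, high = prediction_interval(x, y, x0=17)
```

The interval is centered at Ŷ0 = β̂1 + β̂2x0 and, as the formula for s shows, widens as x0 moves away from x̄.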

17. etc. Comparison of the strengths of certain objects made of different batches of some material. Comparison of the yields of a chemical substance by using different catalytic methods. Crop yields corresponding to different soils and fertilizers. we discuss some statistical models which make the comparisons mentioned above possible. Crop yields corresponding to different soil treatment. or one-way classiﬁcation. Comparison of different brands of gasoline by using them in several cars. Comparison of the effect of different types of oil on the wear of several piston rings. 440 . Below. Comparison of test scores from different schools and different teachers. Identiﬁcation of the melting point of a metal by using different thermometers.1 One-way Layout (or One-way Classiﬁcation) with the Same Number of Observations Per Cell The models to be discussed in the present chapter are special cases of the general model which was studied in the previous chapter. Comparison of a certain brand of gasoline with and without an additive by using it in several cars. we consider what is known as a one-way layout. In this section. etc.440 17 Analysis of Variance Chapter 17 Analysis of Variance The Analysis of Variance techniques discussed in this chapter can be used to study a great variety of problems of practical interest. Below we mention a few such problems. which we introduce by means of a couple of examples. Comparison of the wearing of different materials.

EXAMPLE 1  Consider I machines, each one manufactured by a different company but all intended for the same purpose. A purchaser who is interested in acquiring a number of these machines is then faced with the question as to which brand he should choose. Of course his decision is to be based on the productivity of each one of the I different machines. To this end, let a worker run each one of the I machines for J days each, always under the same conditions, and denote by Yij his output the jth day he is running the ith machine. Let μi be the average output of the worker when running the ith machine, and let eij be his "error" (variation) the jth day when he is running the ith machine. Then it is reasonable to assume that the r.v.'s eij are normally distributed with mean 0 and variance σ²; it is further assumed that they are independent. Therefore the Yij's are r.v.'s themselves, and one has the following model:

    Yij = μi + eij,  where the eij are independent N(0, σ²),  i = 1,…,I (≥ 2),  j = 1,…,J (≥ 2).    (1)

In connection with model (1), there are three basic problems we are interested in: estimation of μi, i = 1,…,I; testing the hypothesis H: μ1 = ··· = μI (= μ, unspecified), that is, that there is no difference between the I machines (or the I kinds of fertilizers in Example 2 below); and estimation of σ².

EXAMPLE 2  For an agricultural example, consider I·J identical plots arranged in an I × J orthogonal array. Suppose that the same agricultural commodity (some sort of grain, tomatoes, etc.) is planted in all I·J plots and that the plants in the ith row are treated by the ith kind of I available fertilizers. All other conditions assumed to be the same, we denote by μi the average yield of each one of the J plots in the ith row, and let eij stand for the variation of the yield from plot to plot in the ith row, so that the yield Yij of the jth plot treated by the ith kind of fertilizer is given by (1). Then it is again reasonable to assume that the r.v.'s eij, i = 1,…,I, j = 1,…,J are independent N(0, σ²). Once again, the problem is that of comparing the I different kinds of fertilizers with a view to using the most appropriate one on a large scale.

One may envision the I objects (machines, fertilizers, etc.) as being represented by the I spaces between I + 1 horizontal (straight) lines, and the J objects (days, plots, etc.) as being represented by the J spaces between J + 1 vertical (straight) lines. There are then formed IJ rectangles in the resulting rectangular array, which are also referred to as cells (see also Fig. 17.1). The same interpretation and terminology is used in similar situations throughout this chapter.

Now set

    Y = (Y11, …, Y1J, Y21, …, Y2J, …, YI1, …, YIJ)′,  β = (μ1, …, μI)′,
    e = (e11, …, e1J, e21, …, e2J, …, eI1, …, eIJ)′.

.442 17 Analysis of Variance 1 1 2 2 j JϪ1 J i (i. . .1 I 444 ⎛ 6444 7 8 ⎜1 0 0 ⋅ ⋅ ⋅ 0 ⎜ J ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜1 0 0 ⋅ ⋅ ⋅ 0 ⎜ ⎜0 1 0 ⋅ ⋅ ⋅ 0 ⎫ ⎪ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⎬ J ⎪ X′ = ⎜ 0 1 0 ⋅ ⋅ ⋅ 0 ⎭ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎜ ⎜0 0 ⋅ ⋅ ⋅ 0 1 ⎫ ⎜⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⎪ ⎬J ⎜ ⎜0 0 ⋅ ⋅ ⋅ 0 1 ⎪ ⎝ ⎭ · ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ β + e. Next. 1. Thus we have the model described in (6) Then it is clear that Y = X′β of Chapter 16 with n = IJ and p = I. 1)′ are. 0)′. . independent and any other row vector in X′ is a linear combination of them. . X . . 0)′. 0. 0. . the I vectors (1. (0. . . . . . that is. 0. j th) cell IϪ1 I Figure 17. 0. clearly. (0. · · · . Thus rank X′ = I (= p).

Therefore. . 1 Y . . . μ where Y. . = 1 IJ ∑ ∑Y . where Yi . μ1 . .1 One-way Layout (or One-way Classiﬁcation) 443 is of full rank. μ J j =1 Next. ∑ YIJ ⎟ . . ′ J J J ˆ = S −1 XY = ⎛ 1 Y . 1 Y ⎞ . I have uniquely determined LSE’s which have all the properties mentioned in Theorem 5 of the same chapter. q I −1 SC SC i = 1. r − q = 1 and hence q = r − 1 = p − 1 = I − 1. . . μ I . . . Then by Theorem 2. . by (9). . the F statistic for testing H is given by F = n − r Sc − SC I J − 1 Sc − SC = . 2 i =1 j =1 I J ( ) One has then the (unique) solution ˆ = Y. That is. . η ∈ V1. μ 2 . . . ij i =1 j =1 I J (4) Therefore relations (28) and (29) in Chapter 16 give . μ 2 . . . . ⎝ j =1 ⎠ j =1 j =1 so that. . under the hypothesis H : μ1 = · · · = μI (= μ. . I . . . . = ∑ Yij . μ I ⎟ .. . Chapter 16. ⎜ ⎟ ⎜ ⎟ ⎝ ⎠ so that. μi = 1. (2) ( ) (3) Now. . . under H. In order to determine the explicit expression of them. . according to (31) in Chapter 16. . the model becomes Yij = μ + eij and the LSE of μ is obtained by differentiating with respect to μ the expression Y − ηc 2 = ∑ ∑ Yij − μ . . .17. . one has ′ ⎛ 647 J 48 647 J 48 J4 64 7 8⎞ η = EY = ⎜ μ1 . . . β ∑ IJ ⎠ ⎜ J ∑ 1j J ∑ 2 j J j =1 ⎟ ⎝ j =1 j =1 Therefore the LSE’s of the μ’s are given by 1 J ˆ i = Yi . . we observe that ⎛J 0 ⎜0 J S = XX ′ = ⎜ ⎜⋅ ⋅ ⎜ ⎝0 0 0 ⋅ ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0⎞ ⋅ 0⎟ ⎟ = JIp ⋅ ⋅⎟ ⎟ 0 J⎠ and J J ⎛ J ⎞′ XY = ⎜ ∑ Y1 j . . ∑ Y2 j . Chapter 16. unspeciﬁed).

. = That is. i =1 j =1 I J ( ) = ∑ ∑Y i =1 j =1 2 ij − IJY . Finally. Sc = SST . one has I I ⎛ I ⎞ 2 Sc − SC = J ∑ Y i2. 2 ˆ ij . MS e = I −1 I ( J − 1) and SSH and SSe are given by (7) and (5). . ) 2 ∑ (Y so that j =1 J ij − Yi . − Y. according to Theorem 4 of Chapter 16. .2 . − IY .c = ∑ ∑ Yij − η i =1 j =1 I J ( ) = ∑ ∑ (Y i =1 j =1 2 ij − Y. SC = SSe . = J ⎜ ∑ Yi2. I ) = ∑Y 2 j =1 J J − JY i2. . . i =1 I ⎞ 1 I 1 I ⎛1 J Y = ∑ Yi . ∑ ⎝ J ∑ ij ⎟ I i =1 ⎜ ⎠ I i =1 j =1 (7) ( ) 2 = J ∑ Y i2. ⎟ = J ∑ Yi . i =1 I Therefore the F statistic given in (3) becomes as follows: F = where I J − 1 SS H MS H = . i =1 I (5) Likewise. where SSe = ∑ ∑ Yij − Yi . . respectively. ⎝ i =1 ⎠ i =1 i =1 ( ) since Y. . i =1 j =1 ( ) = ∑ ∑Y 2 i =1 j =1 2 I J I J 2 ij − J ∑ Y i2. . (6) so that. . Sc − SC = SSH. . . − IJY . where SSH = J ∑ Yi . I − 1 SS e MS e ( ) (8) SS H SS e . the LSE of σ 2 is given by MS H = ˜2 = σ SS e .C = ∑ ∑ Yij − η i =1 j =1 I J ( ) = ∑ ∑ (Y 2 i =1 j =1 2 I J ij I J ij − Yi . These expressions are also appropriate for actual calculations.444 17 Analysis of Variance ˆC SC = Y − η and ˆc Sc = Y − η But for each ﬁxed i. − Y. . ) 2 2 ˆ ij .2 .2 . − IJY .2 . by means of (5) and (6). I ( J − 1) (9) . where SST = ∑ ∑ Yij − Y.

SSH is called the sum of squares between groups. Actually. For this reason. for each i. the analysis of variance itself derives its name because of such a split of SST. Also from (6) it follows that SST stands for the sum of squares of the deviations of the Yij’s from the grand (sample) mean Y. On the other hand. SST is called the total sum of squares for obvious reasons. . i =1 j =1 ( ) 2 IJ − 1 — From (5). Next. as follows from the discussion in Section 5 of Chapter 16. (up to the factor J). the 2 quantities SSH and SSe are independently distributed.6404. . SSe is called the sum of squares within groups. ) i =1 I mean squares between groups I−1 MS H = SS H I −1 within groups SS e = ∑ ∑ Y ij − Y i . j = 1. ∑ Jj = 1(Yij − Yi .05.12. under H. from (5) we have that. it splits into SSH and SSe. Then SST is σ χ IJ−1 distributed. Now. We may summarize all relevant information in a table (Table 1) which is known as an Analysis of Variance Table. i =1 j =1 I J I J ( ) 2 I( J − 1) MS e = SS e I( J − 1) total SS T = ∑ ∑ Y ij − Y. μ ˆ 3 = 74. and as mentioned above. respectively. σ .8853 and the hypothesis H : μ1 = μ2 = μ3 is rejected.05. .17.. . MSe = 7.1 One-way Layout (or One-way Classiﬁcation) 445 Table 1 Analysis of Variance for One-Way Layout source of variance degrees of freedom 2 sums of squares SS H = J ∑ (Y i .8.0. For this reason. from the grand mean Y. μ ˆ 2 = 63.4.4. . ˜ 2 = MSe = 7. REMARK 1 EXAMPLE 3 For a numerical example. (6) and (7) it follows that SST = SSH + SSe.)2 is the sum of squares of the deviations of Yij.5392. . Thus for α = 0. F2. − Y. from (7) we have that SSH represents the sum of squares of the deviations of the group means Yi. J = 5 and let Y11 = Y12 = Y13 = Y14 = Y15 = We have then 82 83 75 79 78 Y21 = Y22 = Y23 = Y24 = Y25 = 61 62 67 65 64 Y31 = 78 Y31 = 72 Y33 = 74 Y34 = 75 Y35 = 72 ˆ 1 = 79. as σ 2χ I −1 and 2 2 2 2 σ χ I(J−1). = 3. take I = 3. J within the ith group.2 μ and MSH = 315. Finally. Of course. under H.4. so that F = 42. . .

Yij such that Yij = μij + eij. errors eij. . . . . it is assumed.2 Two-way Layout (Classiﬁcation) with One Observation Per Cell The model to be employed in this paragraph will be introduced by an appropriate modiﬁcation of Examples 1 and 2. . plus a contribution βj due to the jth worker. i = 1. . . At this point it is assumed that each μij is equal to a certain quantity μ. I. EXAMPLE 4 Referring to Example 1.3 9. . J (≥ 2) are independent N(0. . σ 2). the grand mean. J are independent N(0.4 9. .” His actual daily output is then an r. EXAMPLE 5 Consider the identical I · J plots described in Example 2. .4 C 9. Then the yield of the jth variety of the commodity in question treated by the ith fertilizer is an r. . Each one of the J workers is assigned to each one of the I machines which he runs for one day.v. one variety in each plot.0 11. . . i =1 j =1 I J Finally. Thus the assumed model is then Yij = μ + α i + β j + eij .1. Then all J plots in the ith row are treated by the ith of I different kinds of fertilizers. consider the I machines mentioned there and also J workers from a pool of available workers. as is usually the case.7 B 9. where ∑αi = ∑ β j = 0 i =1 j =1 I J (10) and eij. and suppose that J different varieties of a certain agricultural commodity are planted in each one of the I rows.5 11.v.1 Apply the one-way layout analysis of variance to the data given in the table below. and called the jth column effect.1 10.4 17. Yij which is assumed again to have the structure described in (10). σ2). Here the ith row effect . plus a contribution αi due to the ith row (ith machine).2 8. I (≥ 2). that the r. and called the ith row effect. Let μij be the daily output of the jth worker when running the ith machine and let eij be his “error. A 10. j = 1. i = 1. . .446 17 Analysis of Variance Exercise 17. It is further assumed that the I row effects and also the J column effects cancel out each other in the sense that ∑ α i = ∑ β j = 0. j = 1.

j = 1. Y2 J . . We ﬁrst show that model (10) is a special case of the model described in (6) of Chapter 16. e . Y1 J . . β )′ 1 I 1 J . . J IJ ′ 11 1J 21 2J I1 ( e = (e . α . . there are the following three problems to be solved: Estimation of μ. . . The same interpretation and terminology is used in similar situations throughout this chapter. YI 1 . I. . . e IJ ′ ) ) and ⎛ ⎜1 ⎜ ⎜1 ⎜⋅ ⎜ ⎜1 ⎜ ⎜1 ⎜1 ⎜ ⎜⋅ X′ = ⎜ 1 ⎜ ⎜⋅ ⎜⋅ ⎜ ⎜⋅ ⎜ ⎜1 ⎜1 ⎜ ⎜⋅ ⎜1 ⎝ I 4448 64447 1 0 0 ⋅ ⋅ ⋅ 0 1 0 0 ⋅ ⋅ ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 1 0 0 ⋅ ⋅ ⋅ 0 0 1 0 ⋅ 0 1 0 ⋅ ⋅ ⋅ ⋅ ⋅ 0 1 0 ⋅ ⋅ ⋅ ⋅ 0 0 ⋅ 0 ⋅ ⋅ ⋅ 0 0 ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ J 444 6444 4 7 4 8 1 0 0 ⋅ ⋅ ⋅ 0 0 1 0 ⋅ ⋅ ⋅ 0 J ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 0 0 ⋅ ⋅ 0 1 1 0 0 ⋅ 0 1 0 ⋅ ⋅ ⋅ ⋅ ⋅ 0 0 0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ ⋅ 0⎫ ⎪ 0⎪ ⎬J ⋅ ⎪ 1⎪ ⎭ ⋅ ⋅ ⋅ · ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ 0 1 0 ⋅ ⋅ 0 1 ⋅ ⋅ ⋅ ⋅ 0 ⋅ ⋅ 0 1 ⋅ 1 0 0 ⋅ ⋅ 0 1 0 ⋅ ⋅ ⋅ 0 ⋅ 0 ⋅ 0⎫ ⎪ ⋅ 0⎪ ⎬J ⋅ ⋅ ⋅ ⋅ ⋅ ⎪ 0 ⋅ ⋅ 0 1⎪ ⎭ ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ . . . .2 Two-way Layout (Classiﬁcation) with One Observation Per Cell 447 is the contribution of the ith fertilized and the jth column effect is the contribution of the jth variety of the commodity in question. . . testing the hypothesis HA : α1 = · · · = αI = 0 (that is. . . . . . . . αi. . In connection with model (10). βj. . .17. J. . . α . . . . machines and workers in Example 4 and fertilizers and varieties of agricultural commodity in Example 5. . . . . we set Y = Y11 . there is no column effect) and estimation of σ 2. . i = 1. . . . . From the preceding two examples it follows that the outcome Yij is affected by two factors. For this purpose. HB : β1 = · · · = βJ = 0 (that is. . . . Y21 . e . e β = ( μ . there is no row effect). . β . . . . e . . . . . The I objects (machines or fertilizers) and the J objects (workers or varieties of an agricultural commodity) associated with these factors are also referred to as levels of the factors.

under HA.. . . .. . We have EY = η = X ′ μ : α1 .j . . . where Yi. . α ˆ i = Yi. . under HA again. In fact. . HA : α1 = · · · = αI = 0. I i =1 Summarizing these results. ˆ = Y. Y. . we have then that the LSE’s of μ. is given by (2) and (∂/∂βj)S (Y. the normal equations still have a unique solution. i = 1. .A. we determine the LSE’s of μ and βj. α I . .448 17 Analysis of Variance and then we have Y = X ′β + e with n = IJ and p = I + J + 1. − Y. S Y. αi and βj are. Then. . . That is. β = 0 ∂μ ( ) ˆ = Y. β) = 0 implies ˆ j = Y. . . β = ∑ ∑ Yij. Y. − Y. . where r − qA = J. Next. . . where Y. However. . β1 . − μ − α i − β j i =1 j =1 ( ) I J ( ) 2 and ∂ S Y. imposed on the parameters. . i = 1. . β μ j . β = 0 ∂a i implies α ˆ i = Yi . . η ∈ Vr−q. J . . where β ( ) 1 I ∑ Yij . j = 1. . β) becomes ∑ ∑ (Y i =1 j =1 I J ij − μ − βj ) 2 from where by differentiation. I are given by (2).j − Y. I .. . is again given by (4).2. S(Y. because of the two independent restrictions ∑α = ∑ β i i =1 j =1 I J j = 0. It can be shown (see also Exercise 17. . j = ˆ = Y − Y ..1) that X′ is not of full rank but rank X′ = r = I + J − 1. one has ˆ A and β denoted by μ . J . implies μ ∂ S Y. respectively. . j = 1. . (11) where Yi . as is found by differentiation. βJ Consider the hypothesis ( ) ′ ∈ Vr . I i =1 (12) Now we turn to the testing hypotheses problems. to be ˆj. . . j = 1 I ∑ Yij . . so that qA = I − 1. where r = I + J − 1.. respectively. is given by (4) and Y. .

j ) i =1 j =1 i =1 j =1 I J I J = J ∑ Yi . . j i =1 j =1 ) − J ∑ Yi . i =1 j =1 I J I J [( ( ) ( 2 )] 2 = ∑ ∑ Yij − Y. the F statistic.j . . i =1 I ( ) 2 (14) because ∑ ∑ (Yij − Y. . . )∑ (Yij − Y. j − Yi . A I I i =1 I i =1 ( ) 2 2 = J ∑ Yi 2 .) . β μ j. . I −1 MS e = ( SS e I −1 J −1 )( ) and SSA. j . . (However. to be denoted here by FA. − Y. ) 2 2 A ˆ ij . is given by FA = (I − 1)( J − 1) SS I −1 A SSe = MS A . − Y. − Y. j )(Yi . . ) 2 Now SC can be rewritten as follows: SC = SSe = ∑ ∑ Yij − Y. . i =1 (15) It follows that for testing HA. where SS A = J ∑ α i = J ∑ Yi . − Y.17. ˆ A = Y. j = 1.c = ∑ ∑ Yij − η i =1 j =1 J ( A ) = ∑ ∑ (Y J i =1 j =1 ij − Y.C = ∑ ∑ Yij − η i =1 j =1 I J ( I ) = ∑ ∑ (Y 2 i =1 j =1 2 I I J ij − Yi . .2 Two-way Layout (Classiﬁcation) with One Observation Per Cell 449 ˆ = Y −Y = β ˆ . MSe (16) where MS A = SS A . . respectively. j (13) Therefore relations (28) and (29) in Chapter 16 give by means of (11) and (12) ˆC SC = Y − η and ˆc Sc = Y − η A 2 ˆ ij . J .. SSe are given by (15) and (14). for an expression of SSe to be used in actual calculations. . . ) = ∑ (Yi . A . − Y. j + Y. − Y. . − Y. = μ ˆ. − IJY . see (20) below. . 2 i =1 I ( ) Therefore ˆ2 Sc − SC = SS A . .

. − Y. − Y. . j =1 I J ) − 2 ∑ ∑ Yij − Y. j − Y. respectively. j − Y.450 17 Analysis of Variance Next. i =1 j =1 ( )( ) − 2 ∑ ∑ Yij − Y. . . − Y. . . − Y. − Y. i =1 I J I ( ) 2 + I ∑ Y. . j =1 J (18) The quantities SSA and SSB are known as sums of squares of row effects and column effects. j − Y. . . )∑ (Yij − Y. (19) we show below that SST = SSe + SSA + SSB from where we get SSe = SST − SSA − SSB. . . . to be denoted here by FB. i =1 j =1 J ( ) + J ∑ Yi . − Y. . Y. i =1 j =1 I J ( ) = ∑ ∑Y 2 i =1 j =1 I J 2 ij − IJY. (18) and (19). Yi . i =1 j =1 I J I J [( ( ) ( 2 2 ) ( )] 2 = ∑ ∑ Yij − Y. 2 j − IJY . SSe = ∑ ∑ Yij − Y. is given by FB = (I − 1)( J − 1) SS J −1 J J B SSe = MSB . . Y. (20) Relation (20) provides a way of calculating SSe by way of (15). . we ﬁnd in an entirely symmetric way that the F statistic.2. . j − Y. . j − Y. Clearly. MSe (17) where MSB = SSB /(J − 1) and ˆ2 = I SSB = Sc − SC = I ∑ β ∑ Y. ) i =1 j =1 i =1 j =1 I J I J = J ∑ Yi . − Y. if we set SST = ∑ ∑ Yij − Y. . . = SSTT − SS A − SSB i =1 j =1 ( )( because ∑ ∑ (Yij − Y. . − Yi . . − Y. ) = ∑ (Yi . i =1 I ( ) 2 = SS A . i =1 j =1 I J ( )( ) ) + 2 ∑ ∑ Yi . Finally. )(Yi . for testing the hypothesis HB : β1 = · · · = βJ = 0. j B j =1 j =1 ( ) 2 2 = I ∑ Y. . .

σ This section is closed by summarizing the basic results in Table 2 above. .2 Apply the two-way layout with one observation per cell analysis of variance to the data given in the following table (take α = 0.2. − Y. . as a consequence of the discussion in Section 5 of Chapter 16. (21) Exercises 17.Exercises 451 Table 2 Analysis of Variance for Two-way Layout with One Observation Per Cell source of variance degrees of freedom 2 sums of squares ˆ2 SS A = J ∑ α i = J ∑ (Y i . j + Y. ) The pairs SSe. Y. − Y. . 17. the LSE of σ 2 is given by ˜ 2 = MSe.’s with certain degrees of freedom. j − Y. = 0. j − Y. i =1 )( ) I ( )∑ (Y j =1 J . . j − Y. j j =1 j =1 I J J J ( ) 2 J−1 MS B = residual SS e = ∑ ∑ Y ij − Y i . j =1 ( ) 2 = SSB and ∑ ∑ (Y i =1 j =1 I J i.v. j =1 J )( ) J ( )∑ (Y i =1 I ij − Y. . where X′ is the matrix employed in Section 2. . Y. = ∑ Y. j − Y. ) = I ∑ Y.2. . − Y. . = ∑ Yi . . i =1 j =1 ( ) 2 ∑ ∑ (Y i =1 j =1 I J ij − Y. − Y. Finally.j − Y.1 Show that rank X′ = I + J − 1. . . SSB are independent σ 2χ 2 distributed r. SSA and SSe. . j − Y. ) i =1 i =1 I I mean squares rows I−1 MS A = SS A I −1 SS B J −1 columns ˆ 2=I SS B = I ∑ β ∑ Y. i =1 j =1 I J ( ) 2 ( I − 1) × ( J − 1) IJ − 1 MS e = ( I − 1)( J − 1) — SS e total SS T = ∑ ∑ Y ij − Y. . .05).

j = 1. . . . . . . However. . j = 1. . μ1 J . k = 1. It is not unreasonable to assume that. i = 1. . Y1 JK . . This amounts to saying that we observe the yields Yijk. . . . I (≥ 2). . . . . . . Y11K . there are no interactions present). . . .3 Two-way Layout (Classiﬁcation) with K (≥ 2) Observations Per Cell In order to introduce the model of this section. . testing the hypotheses: HA : α1 = · · · = αI = 0. . . K of K identical plots with the (i. . By setting Y = Y111 . . . the plot where the jth agricultural commodity was planted and it was treated by the ith fertilizer (in connection with Example 5). Y1 J 1 . . . . . . . . and estimation of σ 2. . . or we allow the jth worker to run the ith machine for K days instead of one day (Example 4). k = 1. I. . . . except for the grand mean μ and the row and column effects αi and βj. . αi. . j = 1. J. or workers and machines. . such as fertilizers and varieties of agricultural commodities. . J (≥ 2). . . . e IJK ′ . . . . In other words. . . we may now allow interactions γij among the various factors involved. HB : β1 = · · · = βJ = 0 and HAB : γij = 0. i = 1. . . . . . . σ 2). .μ IJ ′ ) ) ) and . . J (that is. Once again the problems of main interest are estimation of μ. . . .452 17 Analysis of Variance 3 −1 1 7 2 2 5 0 4 4 2 0 17. these interactions cancel out each other and we shall do so. . . βj and γij. . which in the previous section added up to make μij. I. . i = 1. . . YIJ 1 . K (≥ 2) are independent N(0. . Thus our present model is as follows: Yijk = μ + αi + β j + γij + eijk. . that is. j)th plot. e IJ 1 . . . In the present case. . . . I. . . consider Examples 4 and 5 and suppose that K (≥ 2) observations are taken in each one of the IJ cells. respectively. . on the average. J need not be additive any longer. μ I 1 . . . where I J J I (22) ∑α = ∑ β = ∑γ i j i =1 j =1 j =1 ij = ∑ γ ij = 0 i =1 for all i and j and eijk. . i = 1. . e1 JK . . e11K . . . . . e1 J 1 . j = 1. . . the means μij. 
YIJK ′ 111 11 ( e = (e β = (μ . the relevant model will have the form Yijk = μij + eijk. .
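For Exercise 17.2.2 above, relations (15), (18), (19) and (20) can be applied directly. The sketch below is mine rather than the book's; the 3 × 4 arrangement of the data grid is an assumption about the source table as read here.

```python
import numpy as np
from scipy import stats

# Exercise 17.2.2 data as read here: I = 3 rows, J = 4 columns, one obs. per cell.
Y = np.array([[3., -1., 1., 7.],
              [2.,  2., 5., 0.],
              [4.,  4., 2., 0.]])
I, J = Y.shape
grand = Y.mean()
row = Y.mean(axis=1)                          # Y-bar_i.
col = Y.mean(axis=0)                          # Y-bar_.j

SSA = J * np.sum((row - grand) ** 2)          # row effects, (15)
SSB = I * np.sum((col - grand) ** 2)          # column effects, (18)
SST = np.sum((Y - grand) ** 2)                # total, (19)
SSe = SST - SSA - SSB                         # residual, via (20)
MSe = SSe / ((I - 1) * (J - 1))
FA = (SSA / (I - 1)) / MSe                    # (16)
FB = (SSB / (J - 1)) / MSe                    # (17)
critA = stats.f.ppf(0.95, I - 1, (I - 1) * (J - 1))
critB = stats.f.ppf(0.95, J - 1, (I - 1) * (J - 1))
```

The shortcut (20) agrees with the direct residual sum Σ_i Σ_j (Yij − Ȳi. − Ȳ.j + Ȳ..)², which is exactly the decomposition proved above.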

and letting X′ be the IJK × IJ matrix whose row corresponding to the observation Yijk has a 1 in the column of μij and 0's elsewhere, it is readily seen that

    Y = X′β + e    (22′)

with n = IJK and p = IJ, so that model (22′) is a special case of model (6) in Chapter 16. From the form of X′ it is also clear that rank X′ = r = p = IJ; that is, X′ is of full rank (see also Exercise 17.3.1). Therefore the unique LSE's of the parameters involved are obtained by differentiating with respect to μij the expression

    S(Y, β) = Σ_i Σ_j Σ_k (Yijk − μij)².

We have then

    μ̂ij = Ȳij.,  i = 1,…,I,  j = 1,…,J,    (23)

by employing the "dot" notation already used in the previous two sections. Next, from the fact that μij = μ + αi + βj + γij and on the basis of the assumptions made in (22), we have

    μ = μ..,  αi = μi. − μ..,  βj = μ.j − μ..,  γij = μij − μi. − μ.j + μ..,    (24)

so that μ, αi, βj and γij are linear combinations of the parameters μij and hence estimable; by the corollary to Theorem 3 in Chapter 16, their LSE's are given by the same linear combinations of the LSE's (23). That is,

    μ̂ = Ȳ...,  α̂i = Ȳi.. − Ȳ...,  β̂j = Ȳ.j. − Ȳ...,  γ̂ij = Ȳij. − Ȳi.. − Ȳ.j. + Ȳ....    (25)

Now from (23) and (25) it follows that μ̂ij = μ̂ + α̂i + β̂j + γ̂ij. Therefore

    SC = Σ_i Σ_j Σ_k (Yijk − μ̂ij)² = Σ_i Σ_j Σ_k (Yijk − μ̂ − α̂i − β̂j − γ̂ij)².

Next, writing

    Yijk − μij = (Yijk − μ̂ij) + (μ̂ − μ) + (α̂i − αi) + (β̂j − βj) + (γ̂ij − γij)

and squaring, we obtain the identity

    S(Y, β) = SC + IJK(μ̂ − μ)² + JK Σ_i (α̂i − αi)² + IK Σ_j (β̂j − βj)² + K Σ_i Σ_j (γ̂ij − γij)²,    (26)

because, as is easily seen, all cross terms are equal to zero. (See also Exercise 17.3.2.) From identity (26) it follows that, under the hypothesis HA: α1 = ··· = αI = 0, the LSE's of the remaining parameters remain the same as those given in (25), and the minimum of S(Y, β) under HA becomes

    Sc,A = SC + JK Σ_i α̂i².

Thus for testing the hypothesis HA, the sums of squares to be employed are

    SC = SSe = Σ_i Σ_j Σ_k (Yijk − Ȳij.)² = Σ_i Σ_j Σ_k Yijk² − K Σ_i Σ_j Ȳij.²    (27)

and

    Sc,A − SC = SSA = JK Σ_i α̂i² = JK Σ_i (Ȳi.. − Ȳ...)² = JK Σ_i Ȳi..² − IJK Ȳ...².    (28)

For the purpose of determining the dimension r − qA of the vector space in which η = EY lies under HA, we observe that, for i = 1,…,I − 1, the relations αi = μi. − μ.. = 0 provide I − 1 independent linear relationships which the IJ components of η satisfy, and hence r − qA = IJ − (I − 1); thus qA = I − 1, since r = IJ. Therefore the F statistic in the present case is

    FA = [IJ(K − 1)/(I − 1)] · SSA/SSe = MSA/MSe,    (29)

where MSA = SSA/(I − 1), MSe = SSe/[IJ(K − 1)], and SSA, SSe are given by (28) and (27), respectively.

For testing the hypothesis HB: β1 = ··· = βJ = 0, we find in an entirely symmetric way that the F statistic to be employed is given by

    FB = [IJ(K − 1)/(J − 1)] · SSB/SSe = MSB/MSe,    (30)

where MSB = SSB/(J − 1) and

    SSB = IK Σ_j β̂j² = IK Σ_j (Ȳ.j. − Ȳ...)² = IK Σ_j Ȳ.j.² − IJK Ȳ...².    (31)

Also, for testing the hypothesis HAB: γij = 0, i = 1,…,I, j = 1,…,J, arguments similar to the ones used before yield the F statistic

    FAB = [IJ(K − 1)/((I − 1)(J − 1))] · SSAB/SSe = MSAB/MSe,    (32)

where MSAB = SSAB/[(I − 1)(J − 1)] and

    SSAB = K Σ_i Σ_j γ̂ij² = K Σ_i Σ_j (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)².    (33)

(However, for an expression of SSAB suitable for calculations, see (35) below.) Finally, by setting

    SST = Σ_i Σ_j Σ_k (Yijk − Ȳ...)² = Σ_i Σ_j Σ_k Yijk² − IJK Ȳ...²,    (34)

we can show (see Exercise 17.3.3) that SST = SSe + SSA + SSB + SSAB, so that

    SSAB = SST − SSe − SSA − SSB.    (35)

Relation (35) is suitable for calculating SSAB in conjunction with (27), (28), (31) and (34). The quantities SSe, SSA, SSB and SSAB can be shown to be independently distributed as σ²χ² r.v.'s with certain degrees of freedom. Finally, the LSE of σ² is given by

    σ̃² = MSe.    (36)

Once again the main results of this section are summarized in a table, Table 3; the number of degrees of freedom of SST is the sum of those of SSA, SSB, SSAB and SSe.

EXAMPLE 6  For a numerical application, consider two drugs (I = 2) administered in three dosages (J = 3) to groups each of which consists of four (K = 4) subjects. Certain measurements are taken on the subjects, and suppose they are as follows:

    X111 = 18   X112 = 20   X113 = 50   X114 = 53
    X121 = 64   X122 = 49   X123 = 35   X124 = 62
    X131 = 61   X132 = 73   X133 = 62   X134 = 90
    X211 = 34   X212 = 36   X213 = 40   X214 = 17
    X221 = 40   X222 = 63   X223 = 35   X224 = 63
    X231 = 56   X232 = 61   X233 = 58   X234 = 73

For these data we have

    μ̂ = 50.5417,  α̂1 = 2.5417,  α̂2 = −2.5417,
    β̂1 = −17.0417,  β̂2 = 0.8333,  β̂3 = 16.2083,
    γ̂11 = −0.7917,  γ̂12 = −1.4167,  γ̂13 = 2.2083,
    γ̂21 = 0.7917,  γ̂22 = 1.4167,  γ̂23 = −2.2083.
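The estimates of Example 6 (and the F statistics discussed next) follow mechanically from (25), (27), (28), (31) and (33); the following is a minimal numpy sketch of mine rather than the book's own computation:

```python
import numpy as np

# Example 6: I = 2 drugs, J = 3 dosages, K = 4 subjects per cell.
Y = np.array([[[18, 20, 50, 53], [64, 49, 35, 62], [61, 73, 62, 90]],
              [[34, 36, 40, 17], [40, 63, 35, 63], [56, 61, 58, 73]]], dtype=float)
I, J, K = Y.shape
cell = Y.mean(axis=2)                       # Y-bar_ij.
rowm = cell.mean(axis=1)                    # Y-bar_i..
colm = cell.mean(axis=0)                    # Y-bar_.j.
grand = Y.mean()                            # Y-bar_...

mu_hat = grand                                                # (25)
alpha_hat = rowm - grand
beta_hat = colm - grand
gamma_hat = cell - rowm[:, None] - colm[None, :] + grand

SSA = J * K * np.sum(alpha_hat ** 2)                          # (28)
SSB = I * K * np.sum(beta_hat ** 2)                           # (31)
SSAB = K * np.sum(gamma_hat ** 2)                             # (33)
SSe = np.sum((Y - cell[:, :, None]) ** 2)                     # (27)
MSe = SSe / (I * J * (K - 1))
FA = (SSA / (I - 1)) / MSe                                    # (29)
FB = (SSB / (J - 1)) / MSe                                    # (30)
FAB = (SSAB / ((I - 1) * (J - 1))) / MSe                      # (32)
```

The decomposition SST = SSe + SSA + SSB + SSAB of (35) holds exactly for these quantities.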

We also have σ̃² = MSe = 183.0230, and FA = 0.8471, FB = 12.1038, FAB = 0.1641. Thus for α = 0.05, since F1,18;0.05 = 4.4139 and F2,18;0.05 = 3.5546, we accept HA, reject HB and accept HAB.

The models analyzed in the previous three sections describe three experimental designs often used in practice. There are many others as well. Some of them are obtained from the ones just described by allowing different numbers of observations per cell, by increasing the number of factors, by randomizing the levels of some of the factors, by allowing the row effects, column effects and interactions to be r.v.'s themselves, etc. However, even a brief study of these designs would be well beyond the scope of this book.

Table 3  Analysis of Variance for Two-way Layout with K (≥ 2) Observations Per Cell

    source of variance   sums of squares                                                     degrees of freedom   mean squares
    A main effects       SSA = JK Σ_i α̂i² = JK Σ_i (Ȳi.. − Ȳ...)²                          I − 1                MSA = SSA/(I − 1)
    B main effects       SSB = IK Σ_j β̂j² = IK Σ_j (Ȳ.j. − Ȳ...)²                          J − 1                MSB = SSB/(J − 1)
    AB interactions      SSAB = K Σ_i Σ_j γ̂ij² = K Σ_i Σ_j (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)²   (I − 1)(J − 1)       MSAB = SSAB/[(I − 1)(J − 1)]
    error                SSe = Σ_i Σ_j Σ_k (Yijk − Ȳij.)²                                   IJ(K − 1)            MSe = SSe/[IJ(K − 1)]
    total                SST = Σ_i Σ_j Σ_k (Yijk − Ȳ...)²                                   IJK − 1              —

Exercises

17.3.1  Show that rank X′ = IJ, where X′ is the matrix employed in Section 17.3.

17.3.2  Verify identity (26).

17.3.3  Show that SST = SSe + SSA + SSB + SSAB, where SSe, SSA, SSB, SSAB and SST are given by (27), (28), (31), (33) and (34), respectively.

17.3.4  Apply the two-way layout with two observations per cell analysis of variance to the data given in the table below (take α = 0.05):

    110   95  214  217  208  119
    128  117  183  187  183  195
     48   60  115  127  130  164
    123  138  114  156  225  194
     19   94  129  125  114  109

17.4 A Multicomparison Method

Consider again the one-way layout with J (≥ 2) observations per cell described in Section 17.1, and suppose that in testing the hypothesis H: μ1 = ··· = μI (= μ, unspecified) we decided to reject it on the basis of the available data. In rejecting H, we simply conclude that the μ's are not all equal; no conclusions are reached as to which specific μ's may be unequal. The multicomparison method described in this section sheds some light on this problem.

For the sake of simplicity, let us suppose that I = 6. After rejecting H, the natural quantities to look into are of the following sort:

    (1/3)(μ1 + μ2 + μ3) − (1/3)(μ4 + μ5 + μ6),  or  (1/3)(μ1 + μ3 + μ5) − (1/3)(μ2 + μ4 + μ6),
    or  μi − μj,  i ≠ j,  etc.

We observe that these quantities are all of the form

    Σ_{i=1}^{6} ci μi  with  Σ_{i=1}^{6} ci = 0.

This observation gives rise to the following definition.

DEFINITION 1  Any linear combination ψ = Σ_{i=1}^{I} ci μi of the μ's, where ci, i = 1,…,I are known constants such that Σ_{i=1}^{I} ci = 0, is said to be a contrast among the parameters μi, i = 1,…,I.

Let ψ = Σ_{i=1}^{I} ci μi be a contrast among the μ's and let

Let ψ = Σ_{i=1}^I c_i μ_i be a contrast among the μ's and let

ψ̂ = Σ_{i=1}^I c_i Ȳ_{i.},  σ̂²(ψ̂) = (1/J) Σ_{i=1}^I c_i² MS_e,  S² = (I − 1) F_{I−1,n−I;α},

where MS_e is given in Table 1 and n = IJ. We will show in the sequel that the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)] is a confidence interval with confidence coefficient 1 − α simultaneously for all contrasts ψ. Next, consider the following definition.

DEFINITION 2  Let ψ and ψ̂ be as above. We say that ψ̂ is significantly different from zero, according to the S (for Scheffé) criterion, if the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)] does not contain zero; equivalently, if |ψ̂| > Sσ̂(ψ̂).

Now it can be shown that the F test rejects the hypothesis H if and only if there is at least one contrast ψ such that ψ̂ is significantly different from zero. Thus, following the rejection of H, one would construct a confidence interval for each contrast ψ and then would proceed to find out which contrasts are responsible for the rejection of H, starting with the simplest contrasts first. The confidence intervals in question are provided by the following theorem.

THEOREM 1  Refer to the one-way layout described in Section 17.1 and let ψ = Σ_{i=1}^I c_i μ_i, Σ_{i=1}^I c_i = 0, be a contrast. Then the interval [ψ̂ − Sσ̂(ψ̂), ψ̂ + Sσ̂(ψ̂)], where σ̂²(ψ̂) = (1/J) Σ_{i=1}^I c_i² MS_e and S² = (I − 1) F_{I−1,n−I;α}, n = IJ, is a confidence interval simultaneously for all contrasts ψ with confidence coefficient 1 − α.

PROOF  Consider the problem of maximizing (minimizing), with respect to c_i, i = 1, . . . , I, the quantity

f(c_1, . . . , c_I) = Σ_{i=1}^I c_i (Ȳ_{i.} − μ_i) / √[(1/J) Σ_{i=1}^I c_i²],

subject to the contrast constraint Σ_{i=1}^I c_i = 0. Clearly, f(γc_1, . . . , γc_I) = f(c_1, . . . , c_I) for any γ > 0. Therefore the maximum (minimum) of f(c_1, . . . , c_I) subject to this constraint is the same as the maximum (minimum) of f(c′_1, . . . , c′_I), c′_i = γc_i, subject to the restraints

Σ_{i=1}^I c′_i = 0 and (1/J) Σ_{i=1}^I c′_i² = 1.

Hence the problem becomes that of maximizing (minimizing) the quantity

q(c_1, . . . , c_I) = Σ_{i=1}^I c_i (Ȳ_{i.} − μ_i)

subject to the constraints Σ_{i=1}^I c_i = 0 and Σ_{i=1}^I c_i² = J. Thus the points which maximize (minimize) q are to be found on the circumference of the circle which is the intersection of the sphere Σ_{i=1}^I c_i² = J and the plane Σ_{i=1}^I c_i = 0, which passes through the origin. Because of this, it is clear that q(c_1, . . . , c_I) has both a maximum and a minimum. The solution of the problem in question will be obtained by means of Lagrange multipliers. To this end, one considers the expression

h = h(c_1, . . . , c_I, λ_1, λ_2) = Σ_{i=1}^I c_i (Ȳ_{i.} − μ_i) + λ_1 (Σ_{i=1}^I c_i) + λ_2 (Σ_{i=1}^I c_i² − J)

and maximizes (minimizes) it with respect to c_i, i = 1, . . . , I and λ_1, λ_2. We have

∂h/∂c_k = Ȳ_{k.} − μ_k + λ_1 + 2λ_2 c_k = 0, k = 1, . . . , I,
∂h/∂λ_1 = Σ_{i=1}^I c_i = 0,
∂h/∂λ_2 = Σ_{i=1}^I c_i² − J = 0.  (37)

Solving for c_k in (37), we get

c_k = (1/(2λ_2))(μ_k − Ȳ_{k.} − λ_1), k = 1, . . . , I.  (38)

Then the last two equations in (37) provide us with

λ_1 = μ̄_. − Ȳ_{..} and λ_2 = ±(1/2) √[(1/J) Σ_{i=1}^I (μ_i − μ̄_. − Ȳ_{i.} + Ȳ_{..})²].

Replacing these values in (38), we have

c_k = ± √J (μ_k − μ̄_. − Ȳ_{k.} + Ȳ_{..}) / √[Σ_{i=1}^I (μ_i − μ̄_. − Ȳ_{i.} + Ȳ_{..})²], k = 1, . . . , I,

and the corresponding extreme values of q are ± √[J Σ_{i=1}^I (Ȳ_{i.} − Ȳ_{..} − μ_i + μ̄_.)²]. Therefore, for all c_i, i = 1, . . . , I such that Σ_{i=1}^I c_i = 0,

−√[J Σ_{i=1}^I (Ȳ_{i.} − Ȳ_{..} − μ_i + μ̄_.)²] ≤ Σ_{i=1}^I c_i (Ȳ_{i.} − μ_i) / √[(1/J) Σ_{i=1}^I c_i²] ≤ √[J Σ_{i=1}^I (Ȳ_{i.} − Ȳ_{..} − μ_i + μ̄_.)²].  (39)

Now J Σ_{i=1}^I (Ȳ_{i.} − Ȳ_{..} − μ_i + μ̄_.)² is σ²χ²_{I−1} distributed (see Section 17.1 and also Exercise 17.4.1) and is independent of SS_e, which is σ²χ²_{n−I} distributed. Therefore

J Σ_{i=1}^I (Ȳ_{i.} − Ȳ_{..} − μ_i + μ̄_.)² / [(I − 1) MS_e]

is F_{I−1,n−I} distributed, and thus

P{J Σ_{i=1}^I (Ȳ_{i.} − Ȳ_{..} − μ_i + μ̄_.)² ≤ (I − 1) F_{I−1,n−I;α} MS_e} = 1 − α.  (40)

From (40) and (39) it follows then that

P{−√[(I − 1) F_{I−1,n−I;α} (1/J) Σ_{i=1}^I c_i² MS_e] ≤ Σ_{i=1}^I c_i Ȳ_{i.} − ψ ≤ √[(I − 1) F_{I−1,n−I;α} (1/J) Σ_{i=1}^I c_i² MS_e] for all contrasts ψ} = 1 − α,

or equivalently,

P[ψ̂ − Sσ̂(ψ̂) ≤ ψ ≤ ψ̂ + Sσ̂(ψ̂) for all contrasts ψ] = 1 − α,

as was to be seen. (This proof has been adapted from the paper "A simple proof of Scheffé's multiple comparison theorem for contrasts in the one-way layout" by Jerome Klotz in The American Statistician, 1969, Vol. 23, Number 5.) ▲

In closing, we would like to point out that a theorem similar to the one just proved can be shown for the two-way layout with K (≥ 2) observations per cell; as a consequence of it, we can construct confidence intervals for all contrasts among the α's, or the β's, or the γ's.

Exercises

17.4.1  Show that the quantity J Σ_{i=1}^I (Ȳ_{i.} − Ȳ_{..} − μ_i + μ̄_.)², mentioned in Section 17.4, is distributed as σ²χ²_{I−1}.

17.4.2  Refer to Exercise 17.1.1 and construct confidence intervals for all contrasts of the μ's (take 1 − α = 0.95).
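The S-intervals of Theorem 1 are straightforward to compute once the cell means and MS_e are available. A minimal sketch follows; the helper name `scheffe_interval` is an illustrative choice, and the tabulated point F_{I−1,n−I;α} is assumed to be supplied by the user as `f_crit`.

```python
import math

# Scheffe S-interval for one contrast psi = sum_i c_i * mu_i in the one-way
# layout with J observations per cell.  `cell_means` holds Ybar_1., ..., Ybar_I.;
# `f_crit` is the tabulated upper-alpha point F_{I-1, n-I; alpha} (user-supplied,
# since no quantile routine is assumed here).  Illustrative sketch.
def scheffe_interval(cell_means, J, MS_e, c, f_crit):
    I = len(cell_means)
    assert abs(sum(c)) < 1e-12          # the c_i must define a contrast: sum = 0
    psi_hat = sum(ci * m for ci, m in zip(c, cell_means))
    se = math.sqrt(sum(ci ** 2 for ci in c) * MS_e / J)   # sigma_hat(psi_hat)
    S = math.sqrt((I - 1) * f_crit)
    return psi_hat - S * se, psi_hat + S * se
```

By Definition 2, a contrast whose interval does not contain zero is declared significantly different from zero according to the S criterion.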

Chapter 18
The Multivariate Normal Distribution

18.1 Introduction

In this chapter, we introduce the Multivariate Normal distribution and establish some of its fundamental properties. Also, certain estimation and independence testing problems closely connected with it are discussed.

Let Y_j, j = 1, . . . , m be i.i.d. r.v.'s with common distribution N(0, 1). Then we know that for any constants c_j, j = 1, . . . , m and μ, the r.v. Σ_{j=1}^m c_j Y_j + μ is distributed as N(μ, Σ_{j=1}^m c_j²). Now instead of considering one (nonhomogeneous) linear combination of the Y's, consider k such combinations; that is,

X_i = Σ_{j=1}^m c_{ij} Y_j + μ_i, i = 1, . . . , k,  (1)

or in matrix notation,

X = CY + μ,  (2)

where X = (X_1, . . . , X_k)′, Y = (Y_1, . . . , Y_m)′, C = (c_{ij}) k × m and μ = (μ_1, . . . , μ_k)′. Thus we can give the following definition.

DEFINITION 1  Let Y_j, j = 1, . . . , m be i.i.d. N(0, 1) and let the r.v.'s X_i, i = 1, . . . , k be defined by (1) or (2). Then the joint distribution of the r.v.'s X_i, i = 1, . . . , k, or the distribution of the r. vector X, is called Multivariate (or, more specifically, k-Variate) Normal.

REMARK 1  From Definition 1, it follows that if X_i, i = 1, . . . , k are jointly normally distributed, then any subset of them also is a set of jointly normally distributed r.v.'s.

From (2) and relation (10), Chapter 16, it follows that EX = μ and Σ_X = CΣ_YC′ = CI_mC′ = CC′; that is,

EX = μ, Σ = CC′.

We now proceed to finding the ch.f. φ_X of the r. vector X, and therefore the distribution of X.

THEOREM 1  The ch.f. φ_X of the r. vector X is given by

φ_X(t) = exp[it′μ − (1/2)t′Σt], t = (t_1, . . . , t_k)′ ∈ R^k.  (6)

PROOF  We have

φ_X(t) = E exp(it′X) = E exp[it′(CY + μ)] = exp(it′μ) E exp(it′CY).  (3)

But

t′CY = (Σ_{j=1}^k t_j c_{j1}) Y_1 + · · · + (Σ_{j=1}^k t_j c_{jm}) Y_m,  (4)

and hence

E exp(it′CY) = φ_{Y_1}(Σ_j t_j c_{j1}) · · · φ_{Y_m}(Σ_j t_j c_{jm})
             = exp[−(1/2)(Σ_j t_j c_{j1})² − · · · − (1/2)(Σ_j t_j c_{jm})²]
             = exp[−(1/2)t′CC′t],  (5)

because (Σ_j t_j c_{j1})² + · · · + (Σ_j t_j c_{jm})² = t′CC′t. Therefore, by means of (3)–(5), φ_X(t) = exp[it′μ − (1/2)t′Σt]. ▲

From (6) it follows that φ_X, and therefore the distribution of X, is completely determined by means of its mean μ and covariance matrix Σ, a fact analogous to that of a Univariate Normal distribution. This fact justifies the following notation:

X ~ N(μ, Σ),

where μ and Σ are the parameters of the distribution.

THEOREM 2  Let Y_j, j = 1, . . . , k be i.i.d. r.v.'s with distribution N(0, 1) and set X = CY + μ, where C is a k × k non-singular matrix. Then the p.d.f. f_X of X exists and is given by

f_X(x) = (2π)^{−k/2} |Σ|^{−1/2} exp[−(1/2)(x − μ)′Σ^{−1}(x − μ)], x ∈ R^k,  (7)

where Σ = CC′ and |Σ| denotes the determinant of Σ.

PROOF  From X = CY + μ we get Y = C^{−1}(X − μ), which, since C is non-singular, gives

f_X(x) = f_Y[C^{−1}(x − μ)] |det C|^{−1} = (2π)^{−k/2} exp(−(1/2) Σ_{j=1}^k y_j²) |det C|^{−1}.

But

Σ_{j=1}^k y_j² = (x − μ)′(C^{−1})′(C^{−1})(x − μ), and (C^{−1})′C^{−1} = (C′)^{−1}C^{−1} = (CC′)^{−1} = Σ^{−1}  (8)

(see also Exercise 18.1.2). Finally, from Σ = CC′ one has |Σ| = |C||C′| = |C|², so that |det C|^{−1} = |Σ|^{−1/2}. Thus

f_X(x) = (2π)^{−k/2} |Σ|^{−1/2} exp[−(1/2)(x − μ)′Σ^{−1}(x − μ)],

as was to be seen. ▲

REMARK 2  In the theorem, the use of the term non-singular corresponds to the fact that |Σ| ≠ 0, that is, the fact that Σ is of full rank. A k-Variate Normal distribution with p.d.f. given by (7) is called a non-singular k-Variate Normal.

COROLLARY 1  Let k = 2, so that X = (X_1, X_2)′, where both X_1 and X_2 are normally distributed: X_1 ~ N(μ_1, σ_1²) and X_2 ~ N(μ_2, σ_2²). Also let ρ be the correlation coefficient of X_1 and X_2. Then their covariance matrix Σ is given by

Σ = ( σ_1²      ρσ_1σ_2
      ρσ_1σ_2   σ_2²    ),   |Σ| = σ_1²σ_2²(1 − ρ²),

so that

Σ^{−1} = [1/(σ_1²σ_2²(1 − ρ²))] (  σ_2²      −ρσ_1σ_2
                                   −ρσ_1σ_2   σ_1²    ).

Therefore

(x − μ)′Σ^{−1}(x − μ) = [((x_1 − μ_1)/σ_1)² − 2ρ((x_1 − μ_1)/σ_1)((x_2 − μ_2)/σ_2) + ((x_2 − μ_2)/σ_2)²]/(1 − ρ²),

and hence

f_{X_1,X_2}(x_1, x_2) = [1/(2πσ_1σ_2√(1 − ρ²))] exp{−[1/(2(1 − ρ²))] [((x_1 − μ_1)/σ_1)² − 2ρ((x_1 − μ_1)/σ_1)((x_2 − μ_2)/σ_2) + ((x_2 − μ_2)/σ_2)²]};

that is, the joint p.d.f. of X_1, X_2 is the Bivariate Normal p.d.f., as was to be shown.

COROLLARY 2  The (normal) r.v.'s X_i, i = 1, . . . , k are uncorrelated if and only if Σ is a diagonal matrix whose diagonal elements are the variances of the X's; and X_i, i = 1, . . . , k are independent if and only if they are uncorrelated.

PROOF  Independence implies noncorrelation in any case. On the other hand, if the X's are uncorrelated, then Σ is diagonal, |Σ| = σ_1² · · · σ_k², and Σ^{−1} is also a diagonal matrix with jth diagonal element given by 1/σ_j². It follows that

f_{X_1, . . . , X_k}(x_1, . . . , x_k) = Π_{i=1}^k [1/(√(2π)σ_i)] exp[−(x_i − μ_i)²/(2σ_i²)],

and this establishes the independence of the X's. ▲

REMARK 3  The really important part of the corollary is that noncorrelation plus normality implies independence, since independence implies noncorrelation in any case. It is also to be noted that noncorrelation without normality need not imply independence, as it has been seen elsewhere.

Exercises

18.1.1  Use Definition 1 herein in order to conclude that the LSE β̂ of β in (9) of Chapter 16 has the Multivariate Normal distribution with mean β and covariance matrix σ²S^{−1}. In particular, β̂_1 and β̂_2, given by (19″) and (19′) of Chapter 16, have the Bivariate Normal distribution with means and variances

Eβ̂_1 = β_1, Eβ̂_2 = β_2, σ²(β̂_1) = σ² Σ_{j=1}^n x_j² / [n Σ_{j=1}^n (x_j − x̄)²], σ²(β̂_2) = σ² / Σ_{j=1}^n (x_j − x̄)²,

and correlation coefficient equal to −Σ_{j=1}^n x_j / √(n Σ_{j=1}^n x_j²).
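Definition 1 builds a multivariate normal as X = CY + μ with Σ = CC′. For k = 2, one convenient (Cholesky-style) choice of C reproduces the bivariate covariance matrix of Corollary 1 exactly; the sketch below illustrates this. The helper names are illustrative choices, and the factor shown is not the only matrix C with CC′ = Σ.

```python
import math, random

# One factor C with C C' = Sigma for the bivariate normal of Corollary 1:
#   C = [[sigma1, 0], [rho*sigma2, sigma2*sqrt(1 - rho^2)]].
def bivariate_factor(sigma1, sigma2, rho):
    return [[sigma1, 0.0],
            [rho * sigma2, sigma2 * math.sqrt(1.0 - rho ** 2)]]

def covariance_from_factor(C):
    # Sigma = C C'  (2 x 2 case)
    return [[sum(C[i][k] * C[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sample_x(C, mu, rng):
    # X = C Y + mu with Y1, Y2 i.i.d. N(0, 1), as in Definition 1.
    y = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return [C[i][0] * y[0] + C[i][1] * y[1] + mu[i] for i in range(2)]
```

Evaluating `covariance_from_factor` on `bivariate_factor(σ1, σ2, ρ)` recovers the matrix with entries σ1², σ2² and ρσ1σ2, in agreement with Corollary 1.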

18.1.2  Verify relation (8).

18.1.3  Let the random vector X = (X_1, . . . , X_k)′ be distributed as N(μ, Σ) and suppose that Σ is non-singular. Then show that the conditional joint distribution of X_{i_1}, . . . , X_{i_m}, given X_{j_1}, . . . , X_{j_n} (1 ≤ m < k, m + n = k, all i_1, . . . , i_m different from all j_1, . . . , j_n), is Multivariate Normal and specify its parameters.

18.2 Some Properties of Multivariate Normal Distributions

In this section we establish some of the basic properties of a Multivariate Normal distribution.

THEOREM 3  Let X = (X_1, . . . , X_k)′ be N(μ, Σ) (not necessarily non-singular). Then for any m × k constant matrix A = (α_{ij}), the r. vector Y defined by Y = AX has the m-Variate Normal distribution with mean Aμ and covariance matrix AΣA′. In particular, if m = 1, the r.v. Y = α′X, a linear combination of the X's, has the Univariate Normal distribution with mean α′μ and variance α′Σα.

PROOF  For t ∈ R^m, we have

φ_Y(t) = E exp(it′Y) = E exp(it′AX) = E exp[i(A′t)′X] = φ_X(A′t)
       = exp[i(A′t)′μ − (1/2)(A′t)′Σ(A′t)] = exp[it′(Aμ) − (1/2)t′(AΣA′)t],

and this last expression is the ch.f. of the m-Variate Normal with mean Aμ and covariance matrix AΣA′. The particular case follows from the general one just established. ▲

THEOREM 4  For j = 1, . . . , n, let X_j be independent N(μ_j, Σ_j) k-dimensional r. vectors and let c_j be constants. Then the r. vector

X = Σ_{j=1}^n c_j X_j is N(Σ_{j=1}^n c_j μ_j, Σ_{j=1}^n c_j² Σ_j)

(a result parallel to a known one for r.v.'s).

PROOF  For t ∈ R^k, we have φ_X(t) = Π_{j=1}^n φ_{c_jX_j}(t) = Π_{j=1}^n φ_{X_j}(c_j t). But

φ_{X_j}(c_j t) = exp[i(c_j t)′μ_j − (1/2)(c_j t)′Σ_j(c_j t)],

so that

φ_X(t) = exp[it′(Σ_{j=1}^n c_j μ_j) − (1/2)t′(Σ_{j=1}^n c_j² Σ_j)t]. ▲

COROLLARY  For j = 1, . . . , n, let X_j be independent N(μ, Σ) k-dimensional r. vectors and set X̄ = (1/n) Σ_{j=1}^n X_j. Then X̄ is N(μ, (1/n)Σ).

PROOF  In the theorem, take μ_j = μ, Σ_j = Σ and c_j = 1/n. ▲

THEOREM 5  Let X = (X_1, . . . , X_k)′ be non-singular N(μ, Σ) and set Q = (X − μ)′Σ^{−1}(X − μ). Then Q is an r.v. distributed as χ²_k.

PROOF  For t ∈ R, we have

φ_Q(t) = E exp(itQ)
       = ∫_{R^k} exp[it(x − μ)′Σ^{−1}(x − μ)] (2π)^{−k/2} |Σ|^{−1/2} exp[−(1/2)(x − μ)′Σ^{−1}(x − μ)] dx
       = ∫_{R^k} (2π)^{−k/2} |Σ|^{−1/2} exp[−(1/2)(x − μ)′Σ^{−1}(x − μ)(1 − 2it)] dx
       = (1 − 2it)^{−k/2} ∫_{R^k} (2π)^{−k/2} |Σ/(1 − 2it)|^{−1/2} exp{−(1/2)(x − μ)′[Σ/(1 − 2it)]^{−1}(x − μ)} dx,

since |Σ/(1 − 2it)| = |Σ|/(1 − 2it)^k. Now the integrand in the last integral above can be looked upon as the p.d.f. of a k-Variate Normal with mean μ and covariance matrix Σ/(1 − 2it). Hence the integral is equal to one, and we conclude that φ_Q(t) = (1 − 2it)^{−k/2}, which is the ch.f. of χ²_k. ▲

Notice that Theorem 5 generalizes a known result for the one-dimensional case.
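Theorem 5 can be illustrated numerically: since Q = (X − μ)′Σ^{−1}(X − μ) is χ²_k, its sample mean over many draws should be close to k. A Monte Carlo sketch for the bivariate case, using the closed form of Σ^{−1} from Corollary 1; the helper name is an illustrative choice, and the tolerance in the usage check is a loose statistical bound, not an exact value.

```python
import math, random

# Monte Carlo illustration of Theorem 5 in the bivariate case:
# draw X ~ N(mu, Sigma) via X - mu = C Y (CC' = Sigma, Cholesky-style factor),
# evaluate Q = (X - mu)' Sigma^{-1} (X - mu) using Corollary 1's closed form,
# and return the sample mean of Q, which should approximate k = 2.
def mean_of_Q(sigma1, sigma2, rho, n, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y1, y2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = sigma1 * y1                                            # X1 - mu1
        v = rho * sigma2 * y1 + sigma2 * math.sqrt(1.0 - rho ** 2) * y2  # X2 - mu2
        q = ((u / sigma1) ** 2
             - 2.0 * rho * (u / sigma1) * (v / sigma2)
             + (v / sigma2) ** 2) / (1.0 - rho ** 2)
        total += q
    return total / n
```

With 20000 draws, the standard deviation of the sample mean of a χ²_2 variable is about √(4/20000) ≈ 0.014, so the mean should land well within 0.15 of 2.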

REMARK 4  Consider the k-dimensional random vectors X_n = (X_{1n}, . . . , X_{kn})′, n = 1, 2, . . . , and X = (X_1, . . . , X_k)′ with d.f.'s F_n, F and ch.f.'s φ_n, φ, respectively. Then we say that {X_n} converges in distribution to X as n → ∞, and we write X_n →_d X, if F_n(x) → F(x) as n → ∞ for all x ∈ R^k for which F is continuous (see also Definition 1(iii) in Chapter 8). It can be shown that a multidimensional version of Theorem 2 in Chapter 8 holds true.

Exercise

18.2.1  Show that X_n →_d X as n → ∞, where X is distributed as N(μ, Σ), if and only if {λ′X_n} converges in distribution as n → ∞, for every λ = (λ_1, . . . , λ_k)′ ∈ R^k, to an r.v. Y which is distributed as Normal with mean λ′μ and variance λ′Σλ. Use this result (and also Theorem 3′ in Chapter 6) in order to prove that X_n →_d X as n → ∞ if and only if λ′X_n →_d λ′X for every λ ∈ R^k.

18.3 Estimation of μ and Σ and a Test of Independence

First we formulate a theorem without proof, providing estimators for μ and Σ, and then we proceed with a certain testing hypothesis problem.

THEOREM 6  For j = 1, . . . , n, let X_j = (X_{j1}, . . . , X_{jk})′ be independent, non-singular N(μ, Σ) r. vectors and set

X̄ = (X̄_1, . . . , X̄_k)′, where X̄_i = (1/n) Σ_{j=1}^n X_{ji}, i = 1, . . . , k,

and S = (S_{ij}), where S_{ij} = Σ_{m=1}^n (X_{mi} − X̄_i)(X_{mj} − X̄_j), i, j = 1, . . . , k. Then

i)  X̄ and S are sufficient for (μ, Σ);
ii)  X̄ and S/(n − 1) are unbiased estimators of μ and Σ, respectively;
iii)  X̄ and S/n are MLE's of μ and Σ, respectively.

Now suppose that the joint distribution of the r.v.'s X and Y is the Bivariate Normal distribution.

Then, by Corollary 2 to Theorem 2, the r.v.'s X and Y are independent if and only if they are uncorrelated. Thus the problem of testing independence for X and Y becomes that of testing the hypothesis H : ρ = 0.

For testing H, we are going to employ the LR test. For this purpose, consider a random sample of size n, (X_j, Y_j), j = 1, . . . , n, from the Bivariate Normal under consideration. Then their joint p.d.f. f is given by

f = [1/(2πσ_1σ_2√(1 − ρ²))]^n e^{−Q/2}, where Q = Σ_{j=1}^n q_j  (9)

and

q_j = [((x_j − μ_1)/σ_1)² − 2ρ((x_j − μ_1)/σ_1)((y_j − μ_2)/σ_2) + ((y_j − μ_2)/σ_2)²]/(1 − ρ²), j = 1, . . . , n.

The parameter space Ω is given by

Ω = {θ = (μ_1, μ_2, σ_1², σ_2², ρ)′ ∈ R⁵; μ_1, μ_2 ∈ R, σ_1², σ_2² > 0, −1 < ρ < 1},

whereas under H the parameter space ω becomes

ω = {θ = (μ_1, μ_2, σ_1², σ_2², ρ)′ ∈ R⁵; μ_1, μ_2 ∈ R, σ_1², σ_2² > 0, ρ = 0}.

Although the MLE's of the parameters involved are readily given by Theorem 6, we choose to derive them directly. For this purpose, we set g = g(θ) for log f(θ), considered as a function of the parameter θ ∈ Ω:

g = −n log 2π − (n/2) log σ_1² − (n/2) log σ_2² − (n/2) log(1 − ρ²) − (1/2) Σ_{j=1}^n q_j,  (10)

where q_j, j = 1, . . . , n are given by (9). Differentiating g with respect to μ_1 and μ_2 and equating the partial derivatives to zero, we obtain

(1/σ_1)μ_1 − (ρ/σ_2)μ_2 = (1/σ_1)x̄ − (ρ/σ_2)ȳ,  (1/σ_2)μ_2 − (ρ/σ_1)μ_1 = (1/σ_2)ȳ − (ρ/σ_1)x̄.  (11)

(See also Exercise 18.3.1.) Solving system (11) for μ_1 and μ_2, we obtain

μ̃_1 = x̄, μ̃_2 = ȳ.  (12)

Now let us set

S_x = (1/n) Σ_{j=1}^n (x_j − x̄)², S_y = (1/n) Σ_{j=1}^n (y_j − ȳ)², S_xy = (1/n) Σ_{j=1}^n (x_j − x̄)(y_j − ȳ).  (13)

Then, differentiating g with respect to σ_1² and σ_2², equating the partial derivatives to zero and replacing μ_1 and μ_2 by μ̃_1 and μ̃_2, we obtain after some simplifications

(1/σ_1²)S_x − [ρ/(σ_1σ_2)]S_xy = 1 − ρ²,  (1/σ_2²)S_y − [ρ/(σ_1σ_2)]S_xy = 1 − ρ².  (14)

(See also Exercise 18.3.2.) Next, differentiating g with respect to ρ and equating the partial derivative to zero, we obtain after some simplifications

ρ − [ρ/(1 − ρ²)][(1/σ_1²)S_x − (2ρ/(σ_1σ_2))S_xy + (1/σ_2²)S_y] + [1/(σ_1σ_2)]S_xy = 0.  (15)

(See also Exercise 18.3.3.) In (14) and (15), solving for σ_1², σ_2² and ρ, we obtain (see also Exercise 18.3.4)

σ̃_1² = S_x, σ̃_2² = S_y, ρ̃ = S_xy/√(S_x S_y).  (16)

It can further be shown (see also Exercise 18.3.5) that the values of the parameters given by (12) and (16) actually maximize f (equivalently, g), and the maximum is given by

max[f(θ); θ ∈ Ω] = L(Ω̂) = {e^{−1}/[2π √(S_x S_y) √(1 − S_xy²/(S_x S_y))]}^n.  (17)

It follows that the MLE's of μ_1, μ_2, σ_1², σ_2² and ρ, under Ω, are given by

μ̂_{1,Ω} = x̄, μ̂_{2,Ω} = ȳ, σ̂²_{1,Ω} = S_x, σ̂²_{2,Ω} = S_y, ρ̂_Ω = S_xy/√(S_x S_y).  (18)

Under ω (that is, for ρ = 0), it is seen (see also Exercise 18.3.6) that the MLE's of the parameters involved are given by

μ̂_{1,ω} = x̄, μ̂_{2,ω} = ȳ, σ̂²_{1,ω} = S_x, σ̂²_{2,ω} = S_y  (19)

and

max[f(θ); θ ∈ ω] = L(ω̂) = [e^{−1}/(2π √(S_x S_y))]^n.  (20)

Replacing the x's and y's by X's and Y's in (17) and (20), we have that the LR statistic λ is given by

λ = L(ω̂)/L(Ω̂) = [1 − S_XY²/(S_X S_Y)]^{n/2} = (1 − R²)^{n/2},  (21)

where R is the sample correlation coefficient, that is,

R = Σ_{j=1}^n (X_j − X̄)(Y_j − Ȳ) / √[Σ_{j=1}^n (X_j − X̄)² Σ_{j=1}^n (Y_j − Ȳ)²].  (22)

From (22), it follows that R² ≤ 1. (See also Exercise 18.3.7.) Therefore, by the fact that the LR test rejects H whenever λ < λ_0, where λ_0 is determined so that P_H(λ < λ_0) = α, it is easily seen that this test is equivalent to rejecting H whenever

R² > c_0, equivalently, R < −√c_0 or R > √c_0, where c_0 = 1 − λ_0^{2/n}.  (23)

In (23), in order to be able to determine the cut-off point c_0, we have to know the distribution of R under H. Now although the p.d.f. of the r.v. R can be derived, this p.d.f. is none of the usual ones. However, if we consider the function

W = W(R) = √(n − 2) R/√(1 − R²),  (24)

it is easily seen, by differentiation, that W is an increasing function of R. Therefore the test in (23) is equivalent to the following test:

Reject H whenever W < −c or W > c,  (25)

where c is determined so that P_H(W < −c or W > c) = α. It is shown in the sequel that the distribution of W under H is t_{n−2}, and hence c is readily determined.
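The statistics (22) and (24) are straightforward to compute; a minimal sketch follows (the helper names are illustrative choices). In practice |W| is compared against a tabulated upper-α/2 point of t_{n−2}.

```python
import math

# Sample correlation coefficient R of (22) and the statistic
# W = sqrt(n - 2) R / sqrt(1 - R^2) of (24), which is t_{n-2} under H: rho = 0.
# Illustrative sketch; w_statistic is undefined when |R| = 1.
def sample_correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def w_statistic(xs, ys):
    n = len(xs)
    r = sample_correlation(xs, ys)
    return math.sqrt(n - 2) * r / math.sqrt(1.0 - r ** 2)
```

For n = 4 one would reject H at level α = 0.05 when |W| exceeds the tabulated point t_{2;0.025} = 4.303.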

Suppose X_j = x_j, j = 1, . . . , n, and that Σ_{j=1}^n (x_j − x̄)² > 0, and set

R_x = Σ_{j=1}^n (x_j − x̄)(Y_j − Ȳ) / √[Σ_{j=1}^n (x_j − x̄)² Σ_{j=1}^n (Y_j − Ȳ)²].  (26)

Let also

v_j = (x_j − x̄) / √[Σ_{j=1}^n (x_j − x̄)²], so that Σ_{j=1}^n v_j = 0 and Σ_{j=1}^n v_j² = 1.  (27)

It is readily seen that

R_x = R*_v = Σ_{j=1}^n v_j Y_j / √(Σ_{j=1}^n Y_j² − nȲ²).

Let also W_x = √(n − 2) R_x/√(1 − R_x²); then (see also Exercise 18.3.8)

W_x = W*_v = Σ_{j=1}^n v_j Y_j / √{[Σ_{j=1}^n Y_j² − nȲ² − (Σ_{j=1}^n v_j Y_j)²]/(n − 2)}.  (28)

We have that Y_j, j = 1, . . . , n, given X_j = x_j, j = 1, . . . , n, are independent N(μ_2, σ_2²). Now if we consider the N(0, 1) r.v.'s Y_j′ = (Y_j − μ_2)/σ_2, j = 1, . . . , n and replace Y_j by Y_j′ in (28), it is seen (see also Exercise 18.3.9) that W_x = W*_v remains unchanged. Therefore we may assume that the Y's are themselves independent N(0, 1). Next consider the transformation

Z_1 = (1/√n)Y_1 + · · · + (1/√n)Y_n,  Z_2 = v_1Y_1 + · · · + v_nY_n.

Then because of (1/√n)² + · · · + (1/√n)² = n/n = 1 and also because of (27), this transformation can be completed to an orthogonal transformation (see Theorem 8.I(i) in Appendix I); let Z_j, j = 3, . . . , n be the remaining Z's. Then, by Theorem 5, Chapter 9, the r.v.'s Z_j, j = 1, . . . , n are independent N(0, 1); also Σ_{j=1}^n Y_j² = Σ_{j=1}^n Z_j² by Theorem 4, Chapter 9. By means of the transformation in question, the statistic in (28) becomes

Z_2 / √[Σ_{j=3}^n Z_j²/(n − 2)],

which is distributed as t_{n−2}. Therefore the distribution of W_x, equivalently the conditional distribution of W given X_j = x_j, j = 1, . . . , n, is t_{n−2}. Since this distribution is independent of the x's, it follows that the unconditional distribution of W is t_{n−2}. Thus we have the following result.

THEOREM 7  For testing H : ρ = 0 against A : ρ ≠ 0 at level of significance α, one rejects H whenever W < −c or W > c, where W is given by (24), the sample correlation coefficient R is given by (22), and the cut-off point c is determined from P(t_{n−2} > c) = α/2, by the fact that the distribution of W, under H, is t_{n−2}.

To this last theorem, one has the following corollary.

COROLLARY  The p.d.f. of the correlation coefficient R, under H, is given by

f_R(r) = {Γ[(1/2)(n − 1)] / (√π Γ[(1/2)(n − 2)])} (1 − r²)^{(n/2)−2}, −1 < r < 1.

PROOF  From W = √(n − 2)R/√(1 − R²), it follows that R and W have the same sign, that is, RW ≥ 0. Solving for R, one has then R = W/√(W² + n − 2). By setting w = √(n − 2)r/√(1 − r²), one has dw/dr = √(n − 2)(1 − r²)^{−3/2}, whereas, under H,

f_W(w) = {Γ[(1/2)(n − 1)] / (√(π(n − 2)) Γ[(1/2)(n − 2)])} [1 + w²/(n − 2)]^{−(n−1)/2}, w ∈ R.

Therefore

f_R(r) = f_W(√(n − 2)r/√(1 − r²)) dw/dr = {Γ[(1/2)(n − 1)] / (√π Γ[(1/2)(n − 2)])} (1 − r²)^{(n/2)−2},

as was to be shown. ▲

The p.d.f. of R when ρ ≠ 0 can also be obtained, but its expression is rather complicated and we choose not to go into it.

We close this chapter with the following comment. Let X be a k-dimensional random vector distributed as N(μ, Σ). Then its ch.f. is given by (6). Furthermore, if Σ is non-singular, then the N(μ, Σ) distribution has a p.d.f., which is given by (7). However, this is not the case if Σ is singular. In this latter case, the distribution is called singular, and it can be shown that it is concentrated in a hyperplane of dimensionality less than k.

Exercises

18.3.1  Verify relation (11).

18.3.2  Verify relation (14).

18.3.3  Verify relation (15).

18.3.4  Show that σ̃_1², σ̃_2² and ρ̃ given by (16) are indeed the solution of the system of the equations in (14) and (15).

18.3.5  Consider g given by (10) and set

d_{ij} = ∂²g(θ)/∂θ_i∂θ_j, evaluated at θ = θ̃,

where θ = (μ_1, μ_2, σ_1², σ_2², ρ)′ and θ̃ = (μ̃_1, μ̃_2, σ̃_1², σ̃_2², ρ̃)′ is given by (12) and (16). Let D = (d_{ij}), i, j = 1, . . . , 5, set D_0 = 1, and denote by D_{5−k} the determinant obtained from D by deleting the last k rows and columns, k = 1, . . . , 5. Then show that the six numbers D_0, D_1, . . . , D_5 are alternately positive and negative. This result, together with the fact that the d_{ij}, i, j = 1, . . . , 5 are continuous functions of θ, implies that the quantities given by (18) are, indeed, the MLE's of the parameters involved. (See, for example, Mathematical Analysis by T. M. Apostol, Addison-Wesley, 1957, pp. 151–152.)

18.3.6  Show that the MLE's of μ_1, μ_2, σ_1² and σ_2², under ω, are indeed given by (19).

18.3.7  Show that R² ≤ 1, where R is the sample correlation coefficient given by (22).

18.3.8  Verify relation (28).

18.3.9  Show that the statistic W_x (= W*_v) in (28) remains unchanged if the r.v.'s Y_j are replaced by the r.v.'s Y_j′ = (Y_j − μ_2)/σ_2, j = 1, . . . , n.

18.3.10  Refer to the quantities X̄ and S defined in Theorem 6 and, by using Basu's theorem (Theorem 3, Chapter 11), show that they are independent.
Chapter 19
Quadratic Forms

19.1 Introduction

In this chapter, we introduce the concept of a quadratic form in the variables x_j, j = 1, . . . , n, and then confine attention to quadratic forms in which the x_j's are replaced by independent normally distributed r.v.'s X_j, j = 1, . . . , n. In this latter case, we formulate and prove a number of standard theorems referring to the distribution and/or independence of quadratic forms.

DEFINITION 1  A (real) quadratic form Q in the variables x_j, j = 1, . . . , n is a homogeneous quadratic (second degree) function of x_j, j = 1, . . . , n; that is,

Q = Σ_{i=1}^n Σ_{j=1}^n c_{ij} x_i x_j,  (1)

where here and in the sequel the coefficients of the x's are always assumed to be real-valued constants.

By setting x = (x_1, . . . , x_n)′ and C = (c_{ij}), we can write Q = x′Cx. Now Q is a 1 × 1 matrix and hence Q′ = Q, or (x′Cx)′ = x′C′x = x′Cx. Therefore Q = (1/2)(x′C′x + x′Cx) = x′Ax, where A = (1/2)(C + C′); that is, if A = (a_{ij}), then a_{ij} = (1/2)(c_{ij} + c_{ji}), so that a_{ij} = a_{ji}. Thus A is symmetric, and without loss of generality we may take c_{ij} = c_{ji} in (1). In this latter case, (1) becomes as follows:

Q = x′Cx, where C′ = C.  (2)
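The symmetrization step in Definition 1 — replacing C by A = (C + C′)/2 without changing the value of the form — can be checked numerically. The helper names below are illustrative choices.

```python
# Evaluate Q = x'Cx and replace C by its symmetric part A = (C + C')/2;
# the value of the quadratic form is unchanged.  Illustrative sketch.
def quadratic_form(C, x):
    n = len(x)
    return sum(C[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetrize(C):
    n = len(C)
    return [[0.5 * (C[i][j] + C[j][i]) for j in range(n)] for i in range(n)]
```

For instance, with C = [[1, 3], [1, 2]] the symmetric part is A = [[1, 2], [2, 2]], and x′Cx = x′Ax for every x.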

The n roots of the equation |C − λIn| = 0 are called characteristic or latent roots or eigenvalues of C. . ⎝ ⎠ ⎝ ⎠ 2 2 i i i i (4) (i) . C = (cij) and C′ = C (which expresses the symmetry of C). . k. If Q = x′Cx. Consider the matrix Ci. ∑ i = 1ri = n. or X: Q = X ′CX. . . . i =1 k (3) where for i = 1. We then replace x by X in (2) and obtain the following quadratic form in Xj. To this end. . . . j = 1. negative deﬁnite or positive semideﬁnite if the quadratic form associated with it.2 Some Theorems on Quadratic Forms Throughout this section.I(ii) in Appendix I. . it is called negative deﬁnite if x′Cx < 0 for every x ≠ 0 and positive semideﬁnite if x′Cx ≥ 0 for every x. one has that Qi = X′CiX. respectively. . . . δ (ri) are either 1 or −1. Q = x′Cx. . we suppose that ∑k i = 1ri = n and show that for i = 1.v. Qi are quadratic forms in X with rank Qi = ri. where Ci is an n × n r symmetric matrix with rank Ci = ri. . . . there exist ri linear forms in the X’s such that PROOF i i (i ) (i ) (i ) (i ) (i ) (i ) Qi = δ 1 ⎛ b11 X 1 + ⋅ ⋅ ⋅ + b1n X n ⎞ + ⋅ ⋅ ⋅ + δ r ⎛ b r 1 X 1 + ⋅ ⋅ ⋅ + b r n X n ⎞ .’s Qi are independent χ 2 r if and only if ∑ i = 1ri = n. Qi are 2 k independent χ r . is positive deﬁnite. . Some theorems related to such quadratic forms will now be established. . k. Xn)′. DEFINITION 3 DEFINITION 4 19. The quadratic form Q = x′Cx is called positive deﬁnite if x′Cx > 0 for every x ≠ 0. THEOREM 1 (Cochran) Let X ′X = ∑ Qi . then because of (3). n are independently distributed as N(0. Next.19.v. k. . then the rank of C is also called the rank of Q. . A symmetric n × n matrix C is called positive deﬁnite. it is assumed that the r. .2 Some Theorems on Quadratic Forms 477 where x = (x1. Qi are independent χ 2 .’s Xj. DEFINITION 2 For an n × n matrix C. . 1) and we set X = (X1. the polynomial (in λ) |C − λIn| is of degree n and is called the characteristic polynomial of C. i 2 2 We have that X′X = ∑ n j = 1X j is χ n. Therefore if for i = 1. . . . . By Theorem 11. 
where C′ = C. . . j = 1. . . n. Then the k r. negative deﬁnite or positive semideﬁnite. Now ∑k where δ 1 i = 1ri = n and let B be the n × n matrix deﬁned by . . . xn)′.

. . Qk is the sum of the squares of the last rk Y’s. it follows that ||D|| = 1. n. Therefore (5) gives X ′X = X ′ B′DB X ( Hence B′DB = In. it is clear that ∑ Qi = (BX )′D(BX ). From the deﬁnition of D. ) identically in X. 2 ⎥ ⎦ equivalently. . Q2 is the sum of the squares of the next r2 Y’s. . It follows that B is nonsingular and therefore the relationship B′DB = In implies D = (B′)−1B−1 = (BB′)−1. . MM′ is positive deﬁnite (by Theorem 10. Yn)′. δ (ri) . Thus (BB′)−1 is positive deﬁnite and hence so is D. . k. It follows that. i = 1. then the r. . . which implies that D = In and hence B′B = In. On the other hand. of course. k) ( (k ) ⎬ ⎪b11 ⋅ ⋅ ⋅ b1n ⎪ ⎪ (k ) (k ) ⎪ br 1 ⋅ ⋅ ⋅ br n ⎪ ⎪ ⎭ ⎩ 1 1 k k Then by (4) and the deﬁnition of B. Also the fact that D = In and the transformation Y = BX imply. . i =1 i k (5) where D is an n × n diagonal matrix with diagonal elements equal to (i) δ1 . . so that the X’s are i. k. if Y = (Y1. . it follows then that all diagonal elements of D are equal to 1. . Set Y = BX. . Also n = rank In = rank (B′DB) ≤ r. . ∑ Qi = X ′X i =1 k and (BX)′D(BX) = X ′(B′DB)X. B is orthogonal. From the form of D. Qi are independent χ 2 r .478 19 Quadratic Forms ⎧b(1) ⋅ ⋅ ⋅ b(1) ⎫ 1n ⎪ ⎪ 11 ⎪b(1) ⋅ ⋅ ⋅ b(1) ⎪ rn ⎪ ⎪ r1 B=⎨ .’s Yj. that is to say. r ≤ n. Let r = rank B. so that rank D = n. by means of (4). . .i.v. distributed as N(0. . . . . σ 2) and set Xj = (Zj − μ)/σ. ▲ i APPLICATION 1 For j = 1. for i = 1. n are independent N(0. By Theorem 5. . . that Q1 is equal to the sum of the squares of the ﬁrst r1. .v. 1). j = 1. It has been seen elsewhere that 2 2 ⎡ n Z −μ n ⎛Z −Z⎞ ⎛ Zj − μ ⎞ j ⎢ = + ∑⎜ ∑⎜ ⎢ σ ⎟ σ ⎟ σ ⎠ ⎠ j =1 ⎝ j =1 ⎝ ⎣ n ( )⎤ ⎥ . On the other hand. . so that r = n. Chapter 9. . . 1). .’s distributed as N(μ.d. . let Zj be independent r. Y’s. The proof is completed.I(ii) in Appendix I) and so is (MM′)−1. . for any nonsingular square matrix M. it follows that. Then.

or equivalently,

∑_{j=1}^n Xj² = ∑_{j=1}^n (Xj − X̄)² + (√n X̄)².

Now

(√n X̄)² = (1/n)(∑_{j=1}^n Xj)² = X′C₂X,

where C₂ has its elements identically equal to 1/n, and it is easily checked that C₂² = C₂, so that rank C₂ = 1. Next it can be shown (see also Exercise 19.2.1) that

∑_{j=1}^n (Xj − X̄)² = X′C₁X,

where C₁ is given by

C₁ = ⎛ (n−1)/n   −1/n    ···    −1/n   ⎞
     ⎜  −1/n    (n−1)/n  ···    −1/n   ⎟
     ⎜    ·        ·      ···     ·    ⎟
     ⎝  −1/n     −1/n    ···   (n−1)/n ⎠

and that rank C₁ = n − 1. Then Theorem 1 applies with k = 2 and gives that ∑_{j=1}^n (Xj − X̄)² and (√n X̄)² are independent, distributed as χ²_{n−1} and χ²_1, respectively. Thus it follows that (1/σ²)∑_{j=1}^n (Zj − Z̄)² is distributed as χ²_{n−1} and is independent of Z̄.

The following theorem refers to the distribution of a quadratic form in the independent N(0, 1) r.v.'s Xj, j = 1, ..., n.

THEOREM 2 Consider the quadratic form Q = X′CX. Then Q is distributed as χ²_r if and only if C is idempotent (that is, C² = C) and rank C = r.

PROOF Suppose that C is idempotent and that rank C = r. Then by Theorem 12.I(iii) in Appendix I, we have

rank C + rank(Iₙ − C) = n. (6)

Also

X′X = X′CX + X′(Iₙ − C)X. (7)

Then Theorem 1 applies with k = 2 and gives that X′CX is χ²_r (and also X′(Iₙ − C)X is χ²_{n−r}).

Assume now that Q = X′CX is χ²_r. Then we first show that rank C = r. By Theorem 11.I(iii) in Appendix I, there exists an orthogonal matrix P such that if Y = P⁻¹X (equivalently, X = PY), then

Q = X′CX = (PY)′C(PY) = Y′(P′CP)Y = ∑_{j=1}^m λⱼYⱼ², (8)

We now show that C2 = C. Then Theorem 2 implies that ∑ n j=1(Xj − X ) and (√n X ) . of ∑ j = 1λjY j .. This completes the proof of the theorem.f. . . . 1) (Theorem 5. . From (8). −1 −1 (12) Multiplying by P′ and P on the left and right. . respectively. so that the Y 2’s are independent χ 2 1. respectively.f.2. . Hence P′CP is idempotent. or equivalently 1 σ2 ∑ (Z j =1 n j −Z ) 2 and ⎡ n Z −μ ⎢ ⎢ σ ⎣ ( )⎤ ⎥ ⎥ ⎦ 2 2 are distributed as χ 2 n−1 and χ 1. one concludes that C = C2. . Therefore the ch.480 19 Quadratic Forms where (Y1.2) and m = r. both sides of (12). (11) It follows then that rank C = r. Thus X′CX ≥ 0 for every X. . Yn)′ = Y and λj. the Y’s are independent N(0. P ′CP = P ′C 2P. then it is positive semideﬁnite. . . . (10) From (8)–(10). (9) On the other hand. one has that Q = X′CX is equal to ∑ jr= 1Y 2 j. m 2 Chapter 9). j = 1. one has that P′CP is diagonal and. COROLLARY 1 If the quadratic form Q = X′CX is distributed as χ 2 r. as was to be seen.3) that C1 and 1/2 ¯ 2 ¯ 2 C2 are idempotent. PROOF From (8) and (10). ▲ . Thus P ′CP = P ′CP ) = (P′CP)(P′CP) = P ′C(PP ′)CP = P ′CI CP = P ′C P. so that −1 X′CX is equal to ∑ jr= 1Y 2 j . To this theorem there are the following two corollaries which will be employed in the sequel. is given by [(1 − 2iλ t ) ⋅ ⋅ ⋅ (1 − 2iλ t )] 1 m −1 2 . Yn)′ = Y = P X. so that its ch. It can be shown (see also Exercise 19. . Q is χ 2 r by assumption. m are the nonzero characteristic roots of C. its diagonal elements are either 1 or 0. one then has that (see also Exercise 19. evaluated at t. . 2 2 n ( That is. . . by (11).2. evaluated at t. By the orthogonality of P. where X = (X1. ▲ APPLICATION 2 Refer to Application 1. . Xn)′ and (Y1. is given by (1 − 2it ) λ 1 = ⋅ ⋅ ⋅ = λm = 1 −r 2 .
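Theorem 2's idempotency condition is easy to confirm numerically for the two matrices C₁ and C₂ of the sample-mean decomposition above (this is essentially Exercise 19.2.3). A minimal sketch in pure Python, not part of the text's formal development; n = 5 is an arbitrary illustrative choice, and the rank check uses the fact that the rank of an idempotent matrix equals its trace:

```python
n = 5
C2 = [[1.0 / n] * n for _ in range(n)]                        # all entries 1/n
C1 = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]  # I_n - C2
      for i in range(n)]

def matmul(A, B):
    # Plain n x n matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, eps=1e-12):
    return all(abs(A[i][j] - B[i][j]) < eps for i in range(n) for j in range(n))

# Idempotency, the condition of Theorem 2: C^2 = C
assert close(matmul(C1, C1), C1)
assert close(matmul(C2, C2), C2)

# For an idempotent matrix, rank = trace; here tr C1 = n - 1 and tr C2 = 1
tr1 = sum(C1[i][i] for i in range(n))
tr2 = sum(C2[i][i] for i in range(n))
print(round(tr1), round(tr2))   # 4 1
```

The two traces add up to n, which is exactly the rank condition under which Theorem 1 applied with k = 2.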

∑ j=1(Xj − X ) is distributed as χ n−1 and is independent of 2 ¯ )2. it sufﬁces to show that P′CP is idempotent and that its rank is r. 1). The equation Q = Q1 + Q2 implies (Y . B1 and B2. so that rank (In − C1) = n − r1. 2 Furthermore. Then Q2 is χ n−r and Q1.I(iv) in Appendix I. Q* 1 and Q* 2 are positive semideﬁnite. 1 1 Let Q1 = X′C1X. it follows that there exists an orthogonal matrix P such that if Y = P−1X (equivalently. . (1/σ 2)∑n ¯2 (√nX j = 1(Zj − Z ) is distributed as χ n−1 and is ¯ independent of Z . That rank P′CP = r follows from Theorem 9. Therefore by Theorem 10. Q1 and Q2 are quadratic forms in X. (13) By Corollary 1 to Theorem 2. 2. We have then that rank Q1 + rank Q2 = n. . In − C1 is idempotent. For i = 1. C is idempotent and rank C = r. . so is the quadratic form Q* = Y′(P′CP)Y. . . Thus once again. where r2 = r − r1. ▲ 2 2 1 APPLICATION 3 ¯ is N(0.Y )′(Y . i = 1. or equivalently. By Theorem 11.2 Some Theorems on Quadratic Forms 481 COROLLARY 2 Let P be an orthogonal matrix and consider the transformation Y = P−1X. . Hence the result. Since √nX 1. Q2 are independent. it follows that (√nX ¯ )2 is χ 2 Refer to Application 1. and Q1. Also rank C1 + rank (In − C1) = n by Theorem 12. . that is. and let Q1 2 be χ 2 r . The following theorem is also of interest. Then the assumption that Q1 is χ 2 r implies (by Theorem 2) that C1 is idempotent and rank C1 = r1. Suppose that Q = Q1 + Q2. X = PY). let Q be χ 2 r . ▲ THEOREM 3 Suppose that X′X = Q1 + Q2. We have (P′CP) 2 2 = P ′C PP ′ CP = P ′CCP = P ′CP ( ) since C = C. where Q1. Then. Q2 are quadratic forms in X.I in Appendix I. Next PROOF 1 Q2 = X ′X − Q1 = X ′X − X ′C1X = X ′ I n − C1 X ( ) and (In − C1) = In − 2C1 + C = In − C1. . Then if the quadratic form Q = X′CX is χ 2 r . it . and therefore Theorem 1 applies and gives the result. 2. 1 2 THEOREM 4 PROOF Let Q = X′CX. that is.Y ) = ∑ Y 1 r 1 r j =1 r 2 j ∗ = Q∗ 1 + Q2 . 
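The algebraic identity behind these applications — X′X = X′C₁X + X′C₂X, with X′C₁X = ∑(Xj − X̄)² and X′C₂X = (√n X̄)² = n X̄² — can be verified directly on data. A small sketch with made-up numbers (illustrative only):

```python
x = [1.2, -0.7, 0.3, 2.1, -1.5, 0.6]        # illustrative sample
n = len(x)
xbar = sum(x) / n

q1 = sum((xj - xbar) ** 2 for xj in x)       # X'C1X = sum of squared deviations
q2 = n * xbar ** 2                           # X'C2X = (sqrt(n) * xbar)^2
total = sum(xj ** 2 for xj in x)             # X'X

# The decomposition X'X = X'C1X + X'C2X holds exactly (up to rounding)
assert abs(q1 + q2 - total) < 1e-9
```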
let Qi = X′CiX and let Q* i be the quadratic form in Y into which Qi is transformed under P. Then. where Bi = P ′Ci P. by Theorem 2. let Q1 be χ r and let Q2 be positive semideﬁnite. From this result and (13).19. Then 2 Q2 is χ r . where Q.I(iv) in Appendix I. Qi∗ = Y ′Bi Y. PROOF By the theorem. Q2 are independent.I(iii) in Appendix I. it follows that C1 is positive semideﬁnite and so is C2 by assumption. n 2 2 ¯ by Theorem 3. then Q is transformed into Y′(P′CP)Y = ∑jr= 1Y 2 j.

Then Qk is χ r . . . m + 1 are also independent and the proof is concluded. 2. we have the following result. 2. . . . i = 1. Qi = Y′CiY. j = 1. and for i = 1. rm ∑i i =1 m −1 and ∗ Q1 . . i = 1. . i = 1. we have that the r. ▲ 1 This last theorem generalizes as follows. . . . . n. More precisely. Once again Theorem 4 applies m m+1 r* r and gives that Qm+1 is χ 2 r . that is. . The proof is by induction.v. . . Let the theorem hold for k = m and show that it also holds for m + 1. Qm are independent. . Furthermore. . . where i k rk = r − ∑ ri . By the fact that i . THEOREM 6 Consider the independent r.’s Yj. Thus Q* m = Qm + Qm+1. where Yj is distributed as N(μj. by the induction hypothesis. k (≥ 2) are quadratic forms 2 in X. rm +1 = rm ∑i m i =1 m and that Qm and Qm+1 are independent. k − 1 and let Qk be 2 positive semideﬁnite. . . . m ∗ =r− r. let Qi be quadratic forms in Y = (Y1. r only. 1). . i = 1. Yn)′. On the other hand. j = 1. k are independent. . We write PROOF ∗. ▲ The theorem below gives a necessary and sufﬁcient condition for independence of two quadratic forms. Q* 1 is χ r by Corollary 2 to Theorem 2. . j = 1. . where Q and Qi. σ 2). Q = ∑ Qi + Qm i =1 m −1 where ∗ = Q +Q . Qm −1 . To this end. . Then Q1 and Q2 are independent if and only if C1C2 = 0. r are independent 2 N(0. 1) and Qi ϳ χ 2 r . . Qm m m +1 By our assumptions and Corollary 1 to Theorem 2. i = 1. . . Q is χ and Q is positive semideﬁnite. where Q* m 2 is χ 2 . where m m m+1 ∗−r =r− r. it follows that Q* m is 2 positive semideﬁnite. . Hence Q* m is χ r* . PROOF The proof is presented only for the special case that Yj = Xj ϳ N(0. . These facts together with (13) imply that Theorem 3 applies (with n = r) and provides the desired result.482 19 Quadratic Forms follows that Q* 1 and Q* 2 are functions of Yj. For k = 2 the conclusion is true by Theorem 4. . THEOREM 5 Suppose that Q = ∑k i = 1Qi. . let Qi be χ r . . It follows that Qi.v. suppose that C1C2 = 0. i =1 k −1 and Qi . .’s Yj. 
From the orthogonality of P. . . let Q be χ 2 r.

3 Refer to Application 2 and show that the matrices C1 and C2 are both idempotent. ▲ 1 2 1 2 1 2 REMARK 1 Consider the quadratic forms X′C1X and X′C2X ﬁguring in Applications 1–3. Then. i = 1. one has C2C1 = C ′ (C1C2)′ = 0′ = 0. C2 and C1 ± C2 are all idempotent. the ﬁrst being distribii) The r. applies and gives that C1C2 (= C2C1) = 0. 2 are idempotent. by Theorem 6. X′(C1 + C2)X is a quadratic form in X distributed as χ 2 r +r . This concludes the proof of the theorem.2 Some Theorems on Quadratic Exercises Forms 483 Qi is distributed as χ 2 r . it follows that 2 Q1 + Q2 = X′(C1 + C2)X is χ r +r . where X is of full rank 19. according to the conclusion reached in discussing those 2 applications. respectively. it follows (by Theorem 2) that Ci.I(iv). That is. 19.v. Next by the symmetry of Ci. 19. So the matrices C1.’s Y′X′S−1XY and ||Y − X′β 2 uted as noncentral χ p and the second as χ 2 n−p. clearly. 2 Let now Q1. Q2 be independent. i = 1. This should imply that C1C2 = 0. Thus C1 + C2 is idempotent by Theorem 2.2. β + e.I(iii). in Appendix I. C 2 2C ′ 1 = i = Ci. ii) ||Y||2 = Y′X′S−1XY + ||Y − X′β ˆ ||2 are independent. in Appendix I. Then Theorem 12. we have X ′X = X ′C1X + X ′C 2 X + X ′ I n − C1 − C 2 X. Write Y as follows: Y = X′β ˆ + (Y − X′β ˆ) p.2. and let β and show that: ˆ ||2.4 Consider the usual linear model Y = X′β ˆ = S−1XY be the LSE of β. Since Q1 is χ 2 r and Q2 is χ r .2.1 Refer to Application 1 and show that ∑ (X j =1 n j −X ) 2 = X ′C1X as asserted there. that is. On the other hand. Then Theorem 12. . ( ) (14) ( ) (15) Then relations (14). 2.19.2. (15) and Theorem 1 imply that X′C1X = Q1. Exercises 19. X′C2X = Q2 (and X′(In − C1 − C2)X) are independent. indeed.2 Justify the equalities asserted in (11). Therefore i C 1 I n − C 1 − C 2 = C 2 I n − C 1 − C 2 = 0. the case as is easily seen. This is. X′C1X and X′C2X are distributed as χ 2 n−1 and χ 1. implies that ( ) ( ) rank C1 + rank C 2 + rank I n − C1 − C 2 = n.
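The condition C₁C₂ = 0 of Theorem 6, discussed in Remark 1 for the matrices of Applications 1–3, can likewise be confirmed numerically. A minimal sketch (n = 4 chosen arbitrarily); since C₁ and C₂ are symmetric, C₂C₁ = (C₁C₂)′ = 0 as well:

```python
n = 4
C2 = [[1.0 / n] * n for _ in range(n)]                        # all entries 1/n
C1 = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)]  # I_n - C2
      for i in range(n)]

# Theorem 6's condition for independence of X'C1X and X'C2X:
# C1 C2 = (I - C2) C2 = C2 - C2^2 = 0, since C2 is idempotent
prod = [[sum(C1[i][k] * C2[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)]
assert all(abs(prod[i][j]) < 1e-12 for i in range(n) for j in range(n))
```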

Yn)′. [Hint: The proof is presented along the same lines as that of Theorem 1. as well as the r. . Yj being distributed as N(μj.] i i ( ) . where for i = 1.’s distributed as N(0. Qi are quadratic forms in Y. where show that the r. .6 Refer to Example 1 in Chapter 16 and (by using Theorem 6 herein) ˆ. .’s β ˜ 2. 6 Then ﬁnd the distribution of Q and show that Q is independent of the r.7 For j = 1. .2. σ 19. 2 ∑3 j = 1X j − Q.v. k. . n. X2.2. Q= 19. .’s β 1 2 2 2 ˜ is the LSE of σ . 1). 1) and let the r.v. and set Y = (Y1. .v.σ ˜ 2. μn)′. X3 be independent r. are independent. . with rank Qi = ri. Then show that the k r.5 Let X1. .δ if and only if ∑ i = 1ri = n. . . Qi = Y′CiY. k.’s. . . . Let Y′Y = ∑k i = 1Qi. let Yj be independent r. .v. i = 1. .v.v.’s Qi are independent χ 2 r . where the noncentrality parameter δi = μ′Ciμ.484 19 Quadratic Forms 19.σ ˆ. μ = (μ1. . .v. Q be deﬁned by 1 2 2 5X 1 + 2X 2 2 + 5X 3 + 4 X1 X 2 − 2 X1 X 3 + 4 X 2 X 3 . .2. .

Chapter 20

Nonparametric Inference

In this chapter, we discuss briefly some instances of nonparametric, or more properly, distribution-free, inference; that is, inferences which are made without any assumptions regarding the functional form of the underlying distributions. The first part of the chapter is devoted to nonparametric estimation and the remaining part of it to nonparametric testing of hypotheses.

20.1 Nonparametric Estimation

At the beginning, we should like to mention a few cases of nonparametric estimation which have already been discussed in previous chapters, although the term "nonparametric" was not employed there. To this end, let Xj, j = 1, ..., n be i.i.d. r.v.'s with a certain distribution about which no functional form is stipulated. The only assumption made is that the X's have a finite (unknown) mean μ. Let X̄n be the sample mean of the X's, namely

X̄n = (1/n) ∑_{j=1}^n Xj.

Then it has been shown that X̄n, viewed as an estimator of μ, is weakly consistent, that is, consistent in the probability sense; by the WLLN's,

X̄n →_P μ as n → ∞.

It has also been mentioned that X̄n is strongly consistent, that is,

X̄n →_a.s. μ as n → ∞,

which is justified on the basis of the SLLN's.

Let us suppose now that the X's also have finite (and positive) variance σ², which presently is assumed to be known. Then, according to the CLT,

√n(X̄n − μ)/σ →_d Z ~ N(0, 1) as n → ∞.

Thus, if z_{α/2} is the upper α/2 quantile of the N(0, 1) distribution, then

P[−z_{α/2} ≤ √n(X̄n − μ)/σ ≤ z_{α/2}] = P[X̄n − z_{α/2} σ/√n ≤ μ ≤ X̄n + z_{α/2} σ/√n] → 1 − α as n → ∞,

so that [Ln, Un] is a confidence interval for μ with asymptotic confidence coefficient 1 − α; here

Ln = L(X1, ..., Xn) = X̄n − z_{α/2} σ/√n and Un = U(X1, ..., Xn) = X̄n + z_{α/2} σ/√n.

Next, suppose that σ² is unknown and set S²n for the sample variance of the X's, namely

S²n = (1/n) ∑_{j=1}^n (Xj − X̄n)².

Then the WLLN's and the SLLN's, properly applied, ensure that S²n, viewed as an estimator of σ², is both a weakly and a strongly consistent estimator of σ². Also, by the corollary to Theorem 9 of Chapter 8, it follows that

√n(X̄n − μ)/Sn →_d Z ~ N(0, 1) as n → ∞.

By setting

L*n = L*(X1, ..., Xn) = X̄n − z_{α/2} Sn/√n and U*n = U*(X1, ..., Xn) = X̄n + z_{α/2} Sn/√n,

we have that [L*n, U*n] is a confidence interval for μ with asymptotic confidence coefficient 1 − α.

Clearly, the examples mentioned so far are cases of nonparametric point and interval estimation. A further instance of point nonparametric estimation is provided by the following example. Let F be the (common and unknown) d.f. of the Xj's and set Fn for their sample or empirical d.f.
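As a concrete illustration of the interval [L*n, U*n], here is a small sketch; the data are made up, α = 0.05, and 1.96 stands in for z_{α/2}:

```python
import math

x = [2.1, 1.4, 3.3, 2.8, 1.9, 2.5, 3.0, 2.2]     # illustrative sample
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / n        # S_n^2, with divisor n as in the text
s = math.sqrt(s2)

z = 1.96                                          # z_{alpha/2} for alpha = 0.05
low = xbar - z * s / math.sqrt(n)                 # L*_n
up = xbar + z * s / math.sqrt(n)                  # U*_n
assert low < xbar < up
```

The coverage statement, of course, is only asymptotic; for a sample of size 8 the actual confidence coefficient may differ appreciably from 0.95.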

The sample d.f. is defined by

Fn(x, s) = (1/n) × [the number of X1(s), ..., Xn(s) that are ≤ x], x ∈ ℝ, s ∈ S. (1)

We often omit the random element s and write Fn(x) rather than Fn(x, s). Then it was stated in Chapter 8 (see Theorem 6) that Fn(x, ·) is a strongly consistent estimator of F(x); in fact,

Fn(x, ·) →_a.s. F(x) uniformly in x ∈ ℝ as n → ∞. (2)

That is, for almost all s ∈ S and every ε > 0,

Fn(x, s) − ε ≤ F(x) ≤ Fn(x, s) + ε, x ∈ ℝ,

provided n ≥ n(ε, s), where n(ε, s) is independent of x ∈ ℝ. We close this section by observing that Section 5 of Chapter 15 is concerned with another nonparametric aspect, namely that of constructing tolerance intervals.

20.2 Nonparametric Estimation of a p.d.f.

In the present section, we shall consider the problem of estimation of an unknown p.d.f. To this end, let Xj, j = 1, ..., n be i.i.d. r.v.'s with p.d.f. f, which is assumed to be of the continuous type but is otherwise unknown. A significant amount of work has been done regarding the estimation of f(x). In this section, we report some of these results without proofs. The relevant proofs can be found in the paper "On estimation of a probability density function and mode" by E. Parzen, The Annals of Mathematical Statistics, Vol. 33 (1962), pp. 1065–1076.

First we shall try to give a motivation for the estimates to be employed in the sequel. To this end, recall that if F is the d.f. corresponding to the p.d.f. f, then

f(x) = lim_{h→0} [F(x + h) − F(x − h)]/(2h),

so that, for (0 <) h sufficiently small, the quantity [F(x + h) − F(x − h)]/(2h) should be close to f(x). This suggests estimating f(x) by the (known) quantity

f̂n(x) = [Fn(x + h) − Fn(x − h)]/(2h).
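The sample d.f. Fn of (1), which also drives the estimator just proposed, is a one-line computation. A minimal sketch on made-up data:

```python
def ecdf(sample, x):
    """F_n(x): proportion of sample values <= x, as in (1)."""
    return sum(1 for v in sample if v <= x) / len(sample)

data = [3.0, 1.0, 2.0, 2.0]          # illustrative observations
assert ecdf(data, 0.5) == 0.0        # below all observations
assert ecdf(data, 2.0) == 0.75       # three of the four values are <= 2
assert ecdf(data, 3.0) == 1.0        # at or above the largest observation
```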

Indeed,

f̂n(x) = [Fn(x + h) − Fn(x − h)]/(2h)
      = (1/2h)[(the number of X1, ..., Xn ≤ x + h)/n − (the number of X1, ..., Xn ≤ x − h)/n]
      = (1/n)(1/2h)[the number of X1, ..., Xn in (x − h, x + h]],

and it can be further easily seen (see Exercise 20.2.1) that

f̂n(x) = (1/(nh)) ∑_{j=1}^n K((x − Xj)/h), (3)

where K is the following p.d.f.:

K(x) = 1/2 if x ∈ [−1, 1), and K(x) = 0 otherwise.

Thus the proposed estimator f̂n(x) of f(x) is expressed in terms of a known p.d.f. K by means of (3). This expression also suggests an entire class of estimators, to be introduced below. For this purpose, let K be any p.d.f. defined on ℝ into itself and satisfying the following properties:

sup{K(x); x ∈ ℝ} < ∞, lim xK(x) = 0 as |x| → ∞, K(−x) = K(x), x ∈ ℝ. (4)

Also, let {hn} be a sequence of positive constants such that

hn → 0 as n → ∞. (5)

For each x ∈ ℝ and by means of K and {hn}, define the r.v. f̂n(x, s), to be shortened to f̂n(x), as follows:

f̂n(x) = (1/(nhn)) ∑_{j=1}^n K((x − Xj)/hn). (6)

Then we may formulate the following results.

THEOREM 1 Let Xj, j = 1, ..., n be i.i.d. r.v.'s with (unknown) p.d.f. f; let K be a p.d.f. satisfying conditions (4); let {hn} be a sequence of positive constants satisfying (5); and for each x ∈ ℝ let f̂n(x) be defined by (6).

Then f̂n(x), viewed as an estimator of f(x), is asymptotically unbiased in the sense that

E f̂n(x) → f(x) as n → ∞, for each x ∈ ℝ at which f is continuous.

Now let {hn} be as above and also satisfying the following requirement:

nhn → ∞ as n → ∞. (7)

Then the following results hold true.

THEOREM 2 Under the same assumptions as those in Theorem 1 and the additional condition (7), the estimator f̂n(x) of f(x) is consistent in quadratic mean in the sense that

E[f̂n(x) − f(x)]² → 0 as n → ∞, for each x ∈ ℝ at which f is continuous.

The estimator f̂n(x), when properly normalized, is also asymptotically normal, as the following theorem states.

THEOREM 3 Under the same assumptions as those in Theorem 2, for each x ∈ ℝ at which f is continuous,

[f̂n(x) − E f̂n(x)] / σ[f̂n(x)] →_d Z ~ N(0, 1) as n → ∞.

Finally, if it happens to be known that f belongs to a class of p.d.f.'s which are uniformly continuous, then by choosing the sequence {hn} of positive constants to tend to zero and also such that

nh²n → ∞ as n → ∞, (8)

we may show the following result.

THEOREM 4 Under the same assumptions as those in Theorem 1 and also condition (8), provided f is uniformly continuous,

f̂n(x) →_a.s. f(x) uniformly in x ∈ ℝ as n → ∞.

In closing this section, it should be pointed out that there are many p.d.f.'s of the type K satisfying conditions (4); for example, the p.d.f. of the N(0, 1), or the U(−1/2, 1/2) p.d.f., clearly satisfies these conditions. As for the sequence {hn}, there is plenty of flexibility in choosing it. As an illustration, consider the following example.

EXAMPLE 1 Consider the i.i.d. r.v.'s Xj, j = 1, ..., n with (unknown) p.d.f. f. Take

K(x) = (1/√(2π)) e^{−x²/2}, x ∈ ℝ.

Then, clearly, sup{K(x); x ∈ ℝ} = 1/√(2π) < ∞ and K(−x) = K(x), so that it remains to check that xK(x) → 0 as |x| → ∞. For x > 1, one has e^{x²/2} > e^{x/2}, so that e^{−x²/2} < e^{−x/2} and hence

|x|K(x) = (1/√(2π)) |x| e^{−x²/2} ≤ |x| e^{−x/2}.

Now consider the expansion e^t = 1 + t e^{θt} for some 0 < θ < 1, and replace t by x/2. We get then

x e^{−x/2} = x / e^{x/2} = x / [1 + (x/2) e^{θx/2}] = 1 / [1/x + (1/2) e^{θx/2}] → 0 as x → ∞,

so that xK(x) → 0 as x → ∞. In a similar way, xK(x) → 0 as x → −∞, and therefore lim xK(x) = 0 as |x| → ∞. Thus condition (4) is satisfied.

Let us now take hn = 1/n^{1/4}. Then 0 < hn → 0, nhn = n^{3/4} → ∞ and nh²n = n^{1/2} → ∞ as n → ∞. Thus the estimator given by (6) has all the properties stated in Theorems 1–4. This estimator here becomes as follows:

f̂n(x) = [1/(√(2π) n^{3/4})] ∑_{j=1}^n exp[−n^{1/2}(x − Xj)²/2].

Exercise

20.2.1 Let Xj, j = 1, ..., n be i.i.d. r.v.'s, and for some h > 0 and any x ∈ ℝ, define f̂n(x) as follows:

f̂n(x) = (1/(2hn)) × [the number of X1, ..., Xn in (x − h, x + h]].

Then show that

f̂n(x) = (1/(nh)) ∑_{j=1}^n K((x − Xj)/h),

where K(x) is 1/2 if x ∈ [−1, 1) and 0 otherwise.

20.3 Some Nonparametric Tests

Let Xj, j = 1, ..., n be i.i.d. r.v.'s with unknown d.f. F. As was seen in Section 20.1, the sample d.f. Fn may be used for the purpose of estimating F. However,
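The estimator of Example 1 is equally short in code. A sketch with the Gaussian kernel and hn = n^{−1/4}; the sample is made up for illustration:

```python
import math

def f_hat(x, sample):
    """Kernel estimate (6) with Gaussian kernel K and h_n = n**(-1/4)."""
    n = len(sample)
    h = n ** -0.25
    def k(u):
        return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    return sum(k((x - xj) / h) for xj in sample) / (n * h)

sample = [-0.3, 0.1, 0.0, 0.4, -0.1, 0.2]        # illustrative data near 0
assert f_hat(0.0, sample) > 0.0
assert f_hat(0.0, sample) > f_hat(3.0, sample)   # more estimated mass near the data
```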

testing hypotheses problems about F also arise and are of practical importance. Thus we may be interested in testing the hypothesis

H : F = F0, a given d.f.,

against the alternative A : F ≠ F0 (in the sense that F(x) ≠ F0(x) for at least one x ∈ ℝ). This hypothesis can be tested by utilizing the chi-square test for goodness of fit discussed in Chapter 13; the chi-square test is the oldest nonparametric test regarding d.f.'s. Alternatively, the sample d.f. Fn may also be used for testing the same hypothesis. In order to be able to employ the test proposed below, we have to make the supplementary (but mild) assumption that F is continuous. Define the r.v. Dn as follows:

Dn = sup{|Fn(x) − F0(x)|; x ∈ ℝ}, (9)

where Fn is the sample d.f. defined by (1). Under H, it follows from (2) that Dn → 0 a.s. as n → ∞. Therefore we would reject H if Dn > C and would accept it otherwise. Let α be the level of significance. The constant C is to be determined through the relationship

P(Dn > C | H) = α. (10)

In order for this determination to be possible, we would have to know the distribution of Dn, or of some known multiple of it, under H. It has been shown in the literature that

P(√n Dn ≤ x | H) → ∑_{j=−∞}^{∞} (−1)^j e^{−2j²x²}, x ≥ 0, as n → ∞. (11)

Thus, for large n, the right-hand side of (11) may be used for the purpose of determining C by way of (10). For moderate values of n (n ≤ 100) and selected α's (α = 0.10, 0.05, 0.025, 0.01, 0.005), there are tables available which facilitate the calculation of C. (See, for example, Handbook of Statistical Tables by D. B. Owen, Addison-Wesley, 1962, Section 8.) The test employed above is known as the Kolmogorov one-sample test.

The testing hypothesis problem just described is of limited practical importance. What arise naturally in practice are problems of the following type: Let Xi, i = 1, ..., m be i.i.d. r.v.'s with continuous but unknown d.f. F, and let Yj, j = 1, ..., n be i.i.d. r.v.'s with continuous but unknown d.f. G. The two random samples are assumed to be independent, and the hypothesis of interest here is

H : F = G. (12)

One possible alternative is the following:

A : F ≠ G (in the sense that F(x) ≠ G(x) for at least one x ∈ ℝ).
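Returning for a moment to the one-sample statistic (9): since Fn is a step function, the supremum is attained at the order statistics, so Dn can be computed exactly. A minimal sketch, testing H : F = U(0, 1) (so that F0(x) = x on [0, 1]) on made-up data:

```python
def kolmogorov_D(sample, F0):
    """D_n = sup_x |F_n(x) - F0(x)|, attained at the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # F_n jumps from (i-1)/n to i/n at the i-th order statistic
        d = max(d, i / n - F0(x), F0(x) - (i - 1) / n)
    return d

D = kolmogorov_D([0.1, 0.4, 0.7], lambda x: x)
print(D)   # 0.3
```

In practice one would compare √n·Dn (here n = 3, far too small for the limit (11)) with a cut-off from the tables mentioned above.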
The hypothesis is to be tested at level α. Define the r.v. Dm,n as follows:

Dm,n = sup{|Fm(x) − Gn(x)|; x ∈ ℝ}, (13)

where Fm, Gn are the sample d.f.'s of the X's and Y's, respectively. Under H (F = G), one has

|Fm(x) − Gn(x)| = |[Fm(x) − F(x)] − [Gn(x) − G(x)]| ≤ |Fm(x) − F(x)| + |Gn(x) − G(x)|, x ∈ ℝ,

whereas sup{|Fm(x) − F(x)|; x ∈ ℝ} → 0 a.s. as m → ∞ and sup{|Gn(x) − G(x)|; x ∈ ℝ} → 0 a.s. as n → ∞. In other words, Dm,n → 0 a.s. as m, n → ∞, and this suggests rejecting H if Dm,n > C and accepting it otherwise. The constant C is determined by means of the relation

P(Dm,n > C | H) = α. (14)

Once again the actual determination of C requires the knowledge of the distribution of Dm,n, or of some known multiple of it, under H. In connection with this, it has been shown in the literature that

P(√N Dm,n ≤ x | H) → ∑_{j=−∞}^{∞} (−1)^j e^{−2j²x²}, x ≥ 0, as m, n → ∞, (15)

where N = mn/(m + n). Thus, for large m and n, the right-hand side of (15) may be used for the purpose of determining C by way of (14). For moderate values of m and n (such as m = n ≤ 40), there are tables available which facilitate the calculation of C. (See the reference cited above in connection with the one-sample Kolmogorov test.)

In addition to the alternative A : F ≠ G just considered, the following two alternatives are also of interest:

A′ : F > G (in the sense that F(x) ≥ G(x) with strict inequality for at least one x ∈ ℝ), (16)

and

A″ : F < G (in the sense that F(x) ≤ G(x) with strict inequality for at least one x ∈ ℝ). (17)

For testing H against A′, we employ the statistic D+m,n defined by

D+m,n = sup{Fm(x) − Gn(x); x ∈ ℝ},
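The two-sided statistic (13) can be computed by evaluating both sample d.f.'s at the pooled observations, since that is where the difference changes. A small sketch on illustrative data:

```python
def ks_two_sample_D(xs, ys):
    """D_{m,n} = sup_x |F_m(x) - G_n(x)|, evaluated over the pooled points."""
    m, n = len(xs), len(ys)
    d = 0.0
    for t in sorted(xs + ys):
        fm = sum(1 for v in xs if v <= t) / m    # F_m(t)
        gn = sum(1 for v in ys if v <= t) / n    # G_n(t)
        d = max(d, abs(fm - gn))
    return d

D = ks_two_sample_D([1.0, 2.0, 3.0], [2.5, 3.5, 4.5])
print(D)   # 2/3 for these samples
```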

N = mn/(m + n). By the assumption of continuity of F and G.n ≤ x → 1 − e ) 2 as m.f.) This is done by employing the ranks of the X’s and Y’s in the combined sample rather than their actual values.1 Nonparaetric Tests: Rank Estimation Tests 493 + + and reject H if D+ m.n < C . respectively. the rank R(Yj) of Yj in the combined sample of the X’s and Y’s is deﬁned in a similar fashion. . as can be shown. we employ the statistic D − m. . The cut-off point C is determined through the relation + + P Dm . the reader is referred to the reference cited earlier in this − section. Now it seems reasonable that in testing H on the basis of the X’s and Y’s. The cut-off point C is determined through the relation − − P Dm . .n deﬁned by − Dm . (That is. 20.4 More About Nonparametric Tests: Rank Tests Consider again the two-sample problem discussed in the latter part of the previous section. For relevant tables.n ≤ x → 1 − e ) 2 as m. that is. n be two independent random samples with continuous d. The rank of Xi in the combined sample of X’s and Y’s.n > C . .n. Namely. n → ∞. D+ m. x ∈ ޒ. The problem is that of testing the hypothesis H : F = G against various alternatives at level of signiﬁcance α.’s F and G. j = 1.n < C H = α ( ) by utilizing the fact that P ( − −2 x N Dm . . .n and Dm. m and Yj. . . n → ∞. The last three tests based on the statistics Dm. . it follows that in . x ∈ ޒ { () () } − − and reject H if D− m.4 More About Nonparametric 20.n are known as Kolmogorov–Smirnov two-sample tests. N (= m + n) which corresponds to the position of Xi after the X’s and Y’s have been ordered according to their size. i = 1. for testing H against A″. Of course. x ∈ ޒ. Similarly. Here N is as before. to be denoted by (ޒXi). .n > C H = α ( ) by utilizing the fact that P ( + −2 x N Dm . is that integer among the numbers 1.n = sup Gn x − Fm x . let Xi. the conclusion should be the same if the X’s and Y’s are multiplied by the same positive constant. 
This is a special case of what is known as invariance under monotone transformations. .20. we should reach the same conclusion regarding the rejection or acceptance of H regardless of the scale used in measuring the X’s and Y’s. .

if z > 0 uz =⎨ ⎩0. and for reasons to become apparent soon. we would reject H in favor of A′ if where C′ is deﬁned. and let us consider alternative A′. the number of times a Y precedes an X and it can be shown (see Exercise 20.’s X.2) that U = mn + n n+1 2 ( ) −R Y = RX − m m+1 2 ( ). as they are speciﬁed by (12). .v. we have strict inequalities with probability equal to one. the normal approximation to be discussed below may be employed. As an example. . RY = ∑ R Yj i =1 j =1 m ( ) n ( ) (18) because RX + RY = N(N + 1)/2 (ﬁxed).) In the present case. R(Xm)) are equally likely each having probability 1/(m ). so that RX would tend to take on small values. ⎝ m⎠ There are three alternatives of interest to consider. . accordingly.1. for this latter case. A′ and A″.4. P(X ≤ x) ≥ P(Y ≤ x). clearly. For testing the hypothesis H speciﬁed above. A. (19) (20) ( ) N Theoretically the determination of C′ is a simple matter. all (m ) N values of (R(X1). consider the function u deﬁned as follows: ⎧1. let the r. Y be distributed as the X’s and Y’s. (23) .494 20 Nonparametric Inference ordering the X’s and Y’s. this procedure is facilitated by tables (see reference cited in previous section). Then the rejection region consists of the k smallest values of these rank sums. as is easily seen. (16) and (17). Under A′. For small values of m and n (n ≤ m ≤ 10). N The rejection region then is deﬁned as follows: Consider all these ( m ) values and for each one of them form the rank sum RX. it is customary to take the level of signiﬁcance α as follows: α= ( ) N m k ⎛ N⎞ . RY deﬁned by RX = ∑ R X i . respectively. 1 < k < ⎜ ⎟. so that ޒX < C ′. namely. Next.4. if z < 0 () (21) and set U = ∑ ∑ u X i − Yj . x ∈ ޒ. under H. P ޒX < C ′ H = α. respectively. (See Exercise 20. The remaining two alternatives are treated in a similar fashion. i =1 j =1 m n ( ) (22) Then U is. we are going to use either one of the rank sum statistics RX. whereas for large values of m and n it becomes unmanageable. .

mn m + n + 1 mn . Y2 = 71. X3 = 74. Y4 = 50. σ2 U = . X2 = 65. 2 12 Then the r. X5 = 82. consider the following numerical example.20. Y3 = 53. (Incidentally. for large m.1. n → ∞ and therefore. so that ( ) ( ) RX = 26. Z distributed as N(0. From this.) Now N = 5 + 4 = 9 and ( ) () N m 9 5 1 = 1 = 1 . respectively. F is assumed to be unknown but continuous. the limiting distribution (along with the continuity correction for better precision) may be used for determining the cut-off point C′ by means of (20). their discussion here would be beyond the purposes of the present chapter. R(Y ) = 3.v. R X 3 = 6. R X 4 = 1. A′ : F > G and A″ : F < G are equivalent to Δ ≠ 0. As before. where the rank sum tests of the present section are appropriate. A special interesting case. these results check with (23) and Exercise 20. R X 2 = 4. As an illustration. where an X or Y below a number means that the number is coming from the X or Y sample. n = 4 and suppose that X1 = 78. However. (U − EU)/(σ (U )) converges in distribution to an r. is that where the d.4.4 More About Nonparametric 20. EXAMPLE 2 Let m = 5. Δ > 0 and Δ < 0. This test is known as the two-sample Wilcoxon–Mann–Whitney test. Y1 = 110.f. 126 .4. Then the hypothesis H : F = G is equivalent to testing Δ = 0 and the alternatives A : F ≠ G. we should like to mention that there is also the onesample Wilcoxon–Mann–Whitney test. 2 3 4 ( ) ( ) ( ) ( ) R(Y ) = 5. respectively. x ∈ޒ () ( ) for some unknown Δ ∈ ޒ. R Y1 = 9. RY = 19. we obtain ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 45 X 50 Y 53 Y 65 X 71 Y 74 X 78 X 82 X 110 Y . as well as other one-sample and twosample rank tests available.1 Nonparaetric Tests: Rank Estimation Tests 495 Therefore the test in (19) can be expressed equivalently in terms of the U statistic. n. X4 = 45.v. G of the Y’s is assumed to be of the form EU = ( ) ( ) G x = F x − Δ . Now it can be shown (see Exercise 20. In closing this section. 1) as m. In this case. R X 5 = 8. we ﬁnd that R X 1 = 7. 
R(Y ) = 2.3) that under H. Combining these values and ordering them according to their size. we say that G is a shift of F (to the right if Δ > 0 and to the left if Δ < 0). We also ﬁnd that U = 4 + 4 + 3 + 0 = 11.
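The rank computations of Example 2 are easily mechanized. The sketch below reproduces RX = 26, RY = 19 and U = 11, and checks relation (23) together with RX + RY = N(N + 1)/2; ties are ignored, as the underlying d.f.'s are assumed continuous:

```python
X = [78, 65, 74, 45, 82]
Y = [110, 71, 53, 50]

pooled = sorted(X + Y)
rank = {v: i + 1 for i, v in enumerate(pooled)}   # ranks 1..N; no ties in this data

RX = sum(rank[x] for x in X)
RY = sum(rank[y] for y in Y)
U = sum(1 for x in X for y in Y if x > y)         # number of pairs with X_i > Y_j

print(RX, RY, U)   # 26 19 11, as found in Example 2

m, n = len(X), len(Y)
assert U == RX - m * (m + 1) // 2                  # relation (23)
assert RX + RY == (m + n) * (m + n + 1) // 2       # fixed total of the ranks
```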

v. i = 1. .1 Consider the two independent random samples Xi.i. 20. respectively. j = 1.f. 20. we assume. To this end. n. More precisely. j = 1. i = 1. . . or equivalently (by means of (23)). .d. For the given m. we suppose that Xj. Furthermore. . σ2 U = .v.496 20 Nonparametric Inference and let us take 5 ≈ 0. .04 . Yj). m and Yj. in the combined sample of the X’s and Y’s. Then show that RX + RY = N N +1 2 ( ). . Then establish (23).i.4. F and that Yj. r. . . (See tables on p. that the underlying distributions are continuous. mn m + n + 1 mn EU = . G. 341 of the reference cited in Section 20.d. j = 1. n be two independent random samples and let U be deﬁned by (22). The two random samples are assumed to be independent and the hypothesis H to be tested is H : F = G. 126 Then for testing H against A′ (given by (16)). .) α= ( ) Exercises 20. .’s with continuous d. . . n are also i. which is easily applicable in many situations of practical importance. n are i. . if X j > Y j . . . 2 12 ( ) ( ) 20. we brieﬂy mention another nonparametric test—the twosample Sign test. if X j < Y j Zj = ⎨ ⎪ ⎩0. . .3.2 Let RX and RY be as in the previous exercise and let U be deﬁned by (22). j = 1. .4. . as we have also done in previous sections. for small values of U. we would reject for small values of RX. let RX and RY be deﬁned by (18). n and let R(Xi) and (ޒYj) be the ranks of Xi and Yj. . .5 Sign Test In this section. . Then show that.f. j = 1. . α and for the observed value of RX (or U). H is accepted. In order to avoid distribution related difﬁculties.4. m and Yj.’s with continuous d. n and set ⎧ ⎪1. under H. . . consider the n pairs (Xj. r. .3 Let Xi. where N = m + n. . .

Also set Z = Z1 + · · · + Zn and p = P(Xj < Yj). Then, clearly, Z is distributed as B(n, p), and the hypothesis H above is equivalent to testing p = 1/2. Depending on the type of the alternatives, one would use the two-sided or the appropriate one-sided test. Some cases where the Sign test just described is appropriate are when one is interested in comparing the effectiveness of two different drugs used for the treatment of the same disease, the efficiency of two manufacturing processes producing the same item, the response of n customers regarding their preferences towards a certain consumer item, etc. Of course, there is also the one-sample Sign test available, but we will not discuss it here.

For the sake of an illustration, consider the following numerical example.

EXAMPLE 3    Let n = 10 and suppose that X1 = 73, X2 = 68, X3 = 64, X4 = 90, X5 = 83, X6 = 48, X7 = 100, X8 = 75, X9 = 90, X10 = 85 and Y1 = 50, Y2 = 100, Y3 = 70, Y4 = 96, Y5 = 74, Y6 = 64, Y7 = 76, Y8 = 83, Y9 = 98, Y10 = 40. Then Z1 = 0, Z2 = 1, Z3 = 1, Z4 = 1, Z5 = 0, Z6 = 1, Z7 = 0, Z8 = 1, Z9 = 1, Z10 = 0, so that Z1 + · · · + Z10 = 6. Thus, if α = 0.1 and if we are interested in testing H : F = G against A : F ≠ G (equivalently, p = 1/2 against p ≠ 1/2), we would accept H.

20.6 Relative Asymptotic Efficiency of Tests

Consider again the testing hypothesis problem discussed in the previous sections; namely, let Xi, i = 1, . . . , m and Yj, j = 1, . . . , n be i.i.d. r.v.'s with continuous d.f.'s F and G, respectively. The hypothesis to be tested is H : F = G and the alternative may be either A : F ≠ G, or A′ : F > G, or A″ : F < G. In employing either the Wilcoxon–Mann–Whitney test or the Sign test in the problem just described, we would like to have some measure on the basis of which we could judge the performance of the test in question, at least in an asymptotic sense. Such a measure is obtained by introducing what is known as the Pitman asymptotic relative efficiency of tests. For the precise definition of this concept, suppose that the two sample sizes are the same and let n be the sample size needed in order to obtain a given power β against a specified alternative when one of the above-mentioned tests is employed. The level of significance is α.

We may also employ the t-test (see (36), Chapter 13) for testing the same hypothesis against the same specified alternative at the same level α. Let n* be the (common) sample size required in order to achieve a power equal to β by employing the t-test. We further assume that the limit of n*/n, as n → ∞, exists and is independent of α and β. Denote this limit by e. Then this quantity e is the Pitman asymptotic relative efficiency of the Wilcoxon–Mann–Whitney test (or of the Sign test, depending on which one is used) relative to the t-test. Thus, if we use the Wilcoxon–Mann–Whitney test and if it so happens that e = 1/3, then this means that the Wilcoxon–Mann–Whitney test requires approximately three times as many observations as the t-test in order to achieve the same power; however, if e = 5, then the Wilcoxon–Mann–Whitney test requires approximately only one-fifth as many observations as the t-test in order to achieve the same power. It has been found in the literature that the asymptotic efficiency of the Wilcoxon–Mann–Whitney test relative to the t-test is 3/π ≈ 0.95 when the underlying distribution is Normal, 1 when the underlying distribution is Uniform, and ∞ when the underlying distribution is Cauchy.
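Before leaving these tests, the Sign test computation of Example 3 can be checked numerically. The sketch below is mine, not from the text; it assumes a two-sided exact binomial test at level α = 0.1, and the p-value routine is written out rather than taken from a library.

```python
from math import comb

X = [73, 68, 64, 90, 83, 48, 100, 75, 90, 85]
Y = [50, 100, 70, 96, 74, 64, 76, 83, 98, 40]
n = len(X)

# Z_j = 1 if X_j < Y_j, 0 if X_j > Y_j (ties have probability 0
# when the underlying distributions are continuous).
Z = [1 if x < y else 0 for x, y in zip(X, Y)]
z = sum(Z)
print(Z, z)   # [0, 1, 1, 1, 0, 1, 0, 1, 1, 0] 6

# Under H, Z is B(n, 1/2).  Two-sided exact binomial p-value:
# sum the probabilities of all outcomes no more likely than z.
def pvalue_two_sided(z, n):
    pz = comb(n, z) / 2**n
    return sum(comb(n, k) / 2**n for k in range(n + 1)
               if comb(n, k) / 2**n <= pz + 1e-12)

p = pvalue_two_sided(z, n)
print(round(p, 3))   # 0.754
assert p > 0.1       # H : F = G is accepted at level alpha = 0.1
```

Since the p-value far exceeds α = 0.1, the acceptance of H in Example 3 is confirmed.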

Appendix I

Topics from Vector and Matrix Algebra

I.1 Basic Definitions in Vector Spaces

For a positive integer n, x is said to be an n-dimensional vector with real components if it is an n-tuple of real numbers. Thus x = (x1, . . . , xn)′, xj ∈ ℝ, j = 1, . . . , n. Only vectors with real components will be considered. All vectors will be column vectors, but for typographical convenience, they will be written in the form of a row with a prime (′) to indicate transpose. The set of all n-dimensional vectors is denoted by Vn; thus Vn = ℝⁿ (= ℝ × · · · × ℝ, n factors) in our previous notation, and Vn is called the (real) n-dimensional vector space.

Two vectors x = (x1, . . . , xn)′, y = (y1, . . . , yn)′ are said to be equal if xj = yj, j = 1, . . . , n. The sum x + y of two vectors x = (x1, . . . , xn)′, y = (y1, . . . , yn)′ is the vector defined by x + y = (x1 + y1, . . . , xn + yn)′. This definition is extended in an obvious manner to any finite number of vectors. The zero vector, to be denoted by O, is the vector all of whose components are equal to 0. For any three vectors x, y and z in Vn, the following properties are immediate:

    x + y = y + x,    (x + y) + z = x + (y + z) = x + y + z.

The product αx of the vector x = (x1, . . . , xn)′ by the real number α (scalar) is the vector defined by αx = (αx1, . . . , αxn)′. For any two vectors x, y in Vn and any two scalars α, β, the following properties are immediate:

    α(x + y) = αx + αy,    (α + β)x = αx + βx,
    α(βx) = β(αx) = αβx,    1x = x,    (αx + βy)′ = αx′ + βy′.

The inner (or scalar) product x′y of any two vectors x = (x1, . . . , xn)′, y = (y1, . . . , yn)′ is a scalar and is defined as follows:

    x′y = x1y1 + x2y2 + · · · + xnyn.

For any three vectors x, y and z in Vn and any scalars α, β, the following properties are immediate:

    x′y = y′x,    x′(αy) = (αx)′y = α(x′y),    x′(αy + βz) = αx′y + βx′z,
    x′x ≥ 0, and x′x = 0 if and only if x = 0;

also, if x′y = 0 for every y in Vn, then x = 0.

The norm (or length) ||x|| of a vector x in Vn is a non-negative number, defined by ||x|| = (x′x)^(1/2). For any vector x in Vn and any scalar α, the following property is immediate: ||αx|| = |α| ||x||.

Two vectors x, y in Vn are said to be orthogonal (or perpendicular), and we write x ⊥ y, if x′y = 0. A vector x in Vn is said to be orthogonal (or perpendicular) to a subset U of Vn, and we write x ⊥ U, if x′y = 0 for every y ∈ U.

A (nonempty) subset V of Vn is a vector space, denoted by V ⊆ Vn, if for any vectors x, y in V and any scalars α and β, the vector αx + βy is also in V. Thus, for example, the x-axis in the plane may be identified with the set {x = (x1, x2)′ ∈ V2; x1 ∈ ℝ, x2 = 0}, which is a subspace of V2; similarly, the y-axis in the plane may be identified with the set {y = (y1, y2)′ ∈ V2; y1 = 0, y2 ∈ ℝ}, which is a subspace of V2. The straight line α1x1 + α2x2 = 0 in the plane, being the set {x = (x1, x2)′ ∈ V2; α1x1 + α2x2 = 0}, is a subspace of V2, and the xy-plane in the three-dimensional space may be identified with the set {z = (x1, x2, x3)′ ∈ V3; x1, x2 ∈ ℝ, x3 = 0}, which is a subspace of V3; etc.

For any positive integer m < n, the m-dimensional vector space Vm may be considered as a subspace of Vn by enlarging the m-tuples to n-tuples and identifying the appropriate components with zero in the resulting n-tuples. From now on, we shall assume that the above-mentioned identification has been made, and we shall write Vm ⊆ Vn to indicate that Vm is a subspace of Vn.

It is shown easily that, for any given set of vectors xj, j = 1, . . . , r in Vn, the set V defined by

    V = {y ∈ Vn; y = α1x1 + · · · + αrxr, αj ∈ ℝ, j = 1, . . . , r}

is a subspace of Vn. The vectors xj, j = 1, . . . , r in Vn are said to span (or generate) the subspace V ⊆ Vn if every vector y in V may be written as follows: y = α1x1 + · · · + αrxr for some scalars αj, j = 1, . . . , r.
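The componentwise definitions above translate directly into code. The following is a minimal sketch in plain Python (no external libraries assumed; the example vectors are mine):

```python
from math import sqrt, isclose

def inner(x, y):
    """Inner product x'y = x1*y1 + ... + xn*yn."""
    assert len(x) == len(y)
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm ||x|| = (x'x)**(1/2)."""
    return sqrt(inner(x, x))

def orthogonal(x, y, tol=1e-12):
    """x is perpendicular to y iff x'y = 0."""
    return isclose(inner(x, y), 0.0, abs_tol=tol)

x, y = [3.0, 4.0], [-4.0, 3.0]
print(inner(x, y))          # 0.0
print(norm(x))              # 5.0
print(orthogonal(x, y))     # True

# ||alpha * x|| = |alpha| * ||x||
alpha = -2.0
ax = [alpha * c for c in x]
assert isclose(norm(ax), abs(alpha) * norm(x))
```

The last assertion checks the norm property ||αx|| = |α| ||x|| stated above.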

The vectors xj, j = 1, . . . , k in the subspace V ⊆ Vn are said to be linearly independent if there are no scalars αj, j = 1, . . . , k, not all zero, for which α1x1 + · · · + αkxk = 0; otherwise they are said to be linearly dependent. A basis for the subspace V ⊆ Vn is any set of linearly independent vectors which span V. It can be shown that the number of vectors in any basis of V is the same, and this is the largest number of linearly independent vectors in V. This number is called the dimension of V. In particular, the dimension of Vn is n, and the dimension m of any subspace of Vn satisfies m ≤ n.

The vectors {xj, j = 1, . . . , k} are said to form an orthonormal basis in V if they form a basis in V and also are pairwise orthogonal and of norm one; that is, x′ixi = ||xi||² = 1, i = 1, . . . , k, and x′ixj = 0 for i ≠ j.

I.2 Some Theorems on Vector Spaces

In this section, we gather together for easy reference those results about vector spaces used in this book.

THEOREM 1.I    Let n be any positive integer and consider any subspace V ⊆ Vn. Then V has a basis, and any two bases in V have the same number of vectors.

THEOREM 2.I    Let m, n be any positive integers with m < n and let {xj, j = 1, . . . , m} be an orthonormal basis for Vm, say. Then this basis can be extended to an orthonormal basis {xj, j = 1, . . . , n} for Vn. In particular, by taking

    e1 = (1, 0, . . . , 0)′,  e2 = (0, 1, 0, . . . , 0)′,  . . . ,  en = (0, . . . , 0, 1)′,

it is clear that the vectors {ej, j = 1, . . . , n} form an orthonormal basis in Vn.

THEOREM 3.I    Let x be a vector in Vn and let V be a subspace of Vn. Then x ⊥ V if and only if x is orthogonal to the vectors of a basis for V, or to the vectors of any set of vectors in V spanning V.

THEOREM 4.I    Let m, n be any positive integers with m < n and let Vm be a subspace of Vn of dimension m. Let U be the set of vectors in Vn each of which is orthogonal to Vm. Then U is an r-dimensional subspace Ur of Vn with r = n − m and is called the orthocomplement (or orthogonal complement) of Vm in Vn. Furthermore, any vector x in Vn may be written (decomposed) uniquely as follows: x = v + u with v ∈ Vm, u ∈ Ur. The vectors v, u are called the projections of x into Vm and Ur, respectively. Finally, as z varies in Vm, ||x − z|| has a minimum value obtained for z = v; as w varies in Ur, ||x − w|| has a minimum value obtained for w = u; and ||x||² = ||v||² + ||u||².
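The decomposition x = v + u of Theorem 4.I can be illustrated for a one-dimensional subspace Vm spanned by a unit vector b. This is a sketch with an example of my own choosing; for a higher-dimensional Vm one projects onto each vector of an orthonormal basis and sums the results.

```python
from math import sqrt, isclose

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return sqrt(inner(x, x))

# V1 is spanned by the unit vector b; the projection of x into V1 is
# v = (x'b) b, and u = x - v is the projection into the orthocomplement.
b = [1 / sqrt(2), 1 / sqrt(2)]
x = [3.0, 1.0]

c = inner(x, b)
v = [c * bi for bi in b]
u = [xi - vi for xi, vi in zip(x, v)]

print(v)   # approximately [2.0, 2.0]
print(u)   # approximately [1.0, -1.0]

assert isclose(inner(u, b), 0.0, abs_tol=1e-12)            # u is orthogonal to Vm
assert isclose(norm(x) ** 2, norm(v) ** 2 + norm(u) ** 2)  # ||x||^2 = ||v||^2 + ||u||^2
```

The two assertions verify the orthogonality of the two projections and the Pythagorean identity stated at the end of Theorem 4.I.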

I.3 Basic Definitions About Matrices

Let m, n be any positive integers. Then a (real) m × n matrix A is a rectangular array of mn real numbers arranged in m rows and n columns:

            | a11  a12  · · ·  a1n |
    Am×n =  | a21  a22  · · ·  a2n |
            | · · ·  · · ·  · · ·  |
            | am1  am2  · · ·  amn |

The numbers aij, i = 1, . . . , m, j = 1, . . . , n are called the elements of the matrix, and m and n are called the dimensions of the matrix. For brevity, we shall write A = (aij), i = 1, . . . , m, j = 1, . . . , n for an m × n matrix. Only real matrices will be considered in the sequel. It follows that a vector in Vn is simply an n × 1 matrix. Two m × n matrices are said to be equal if they have identical elements.

A matrix is said to be a square matrix if m = n, and then n is called the order of the matrix. The elements aii, i = 1, . . . , n of a square matrix A of order n are called the elements of the main diagonal of A, or just the diagonal elements of A. If aij = 0 for all i ≠ j (that is, if all of the elements off the main diagonal are 0), then A is called diagonal.

A zero matrix is one in which all the elements are equal to zero; a zero matrix will be denoted by 0 regardless of its dimensions. A unit (or identity) matrix is a square matrix in which all diagonal elements are equal to 1 and all other elements are equal to 0. Thus I = (δij), where δij = 1 if i = j and equals 0 if i ≠ j. The proper notation for a unit matrix of order n is In; however, we shall often write simply I, and the order is to be understood from the context.

The transpose A′ of the m × n matrix A = (aij) is the n × m matrix defined by A′ = (aji); thus the rows and columns of A′ are equal to the columns and rows of A, respectively. If A is a square matrix and A′ = A, then A is called symmetric; that is, aij = aji for all i and j, so that for a symmetric matrix the elements symmetric with respect to the main diagonal of A are equal.

The sum A + B of two m × n matrices A = (aij), B = (bij) is the m × n matrix defined by A + B = (aij + bij). This definition is extended in an obvious manner to any finite number of m × n matrices. The product αA of the matrix A = (aij) by the scalar α is the matrix defined by αA = (αaij). For any m × n matrices A, B and C, the following properties are immediate:

    A + B = B + A,    (A + B) + C = A + (B + C) = A + B + C.

For any m × n matrices A, B and any scalars α, β, the following properties are immediate:

    (A′)′ = A,    (αA)′ = αA′,    (αA + βB)′ = αA′ + βB′.

The product AB of the m × n matrix A = (aij) by the n × r matrix B = (bij) is the m × r matrix defined as follows: AB = (cij), where cij = ai1b1j + · · · + ainbnj.
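These definitions are directly computable. A minimal sketch in plain Python (matrices as lists of rows; the example matrices are mine):

```python
def transpose(A):
    """A' has the columns of A as its rows."""
    return [list(col) for col in zip(*A)]

def is_symmetric(A):
    return A == transpose(A)

def matmul(A, B):
    """(AB)_ij = sum_k a_ik * b_kj; defined when cols(A) == rows(B)."""
    n = len(B)
    assert all(len(row) == n for row in A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2],
     [3, 4]]
I2 = [[1, 0],
      [0, 1]]          # unit matrix of order 2

print(transpose(A))            # [[1, 3], [2, 4]]
print(is_symmetric([[2, 5],
                    [5, 7]]))  # True
print(matmul(A, I2) == A)      # True: AI = A
```

The last line checks the property IA = AI = A stated below for the unit matrix.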

The product BA is not defined unless r = m, and even then it is not true, in general, that AB = BA. The products AB, BA are always defined for all square matrices of the same order. For example, take

    A = | 0  0 |,    B = | 1  1 |.
        | 0  1 |         | 0  0 |

Then

    AB = | 0  0 |,    BA = | 0  1 |,
         | 0  0 |          | 0  0 |

so that AB ≠ BA.

Let A be an m × n matrix, let B, C be two n × r matrices and let D be an r × k matrix. Then for any scalars α, β and γ, the following properties are immediate:

    A(βB + γC) = βAB + γAC,    (βB + γC)D = βBD + γCD,
    (αA)B = A(αB) = α(AB) = αAB,    (AB)D = A(BD).

By means of the last property, we may omit the parentheses and set ABD for (AB)D = A(BD). Also

    IA = AI = A,    0A = A0 = 0,    (AB)′ = B′A′.

Let A be an m × n matrix and let ri, i = 1, . . . , m and cj, j = 1, . . . , n stand for the row and column vectors of A, respectively. Then it can be shown that the largest number of independent r-vectors is the same as the largest number of independent c-vectors, and this common number is called the rank of the matrix A, to be denoted by rank A. Thus the rank of A is the common dimension of the two vector spaces spanned by the r-vectors and the c-vectors. Always rank A ≤ min(m, n), and if equality occurs, we say that A is non-singular or of full rank; otherwise A is called singular.

Let now |A| stand for the determinant of the square matrix A, defined only for square matrices, say m × m, by the expression

    |A| = Σ ± a1i a2j · · · amp,

where the aij are the elements of A and the summation extends over all permutations (i, j, . . . , p) of (1, 2, . . . , m). The plus sign is chosen if the permutation is even and the minus sign if it is odd. It can be shown that A is non-singular if and only if |A| ≠ 0. It can also be shown that if |A| ≠ 0, there exists a unique matrix, to be denoted by A⁻¹, such that AA⁻¹ = A⁻¹A = I. The matrix A⁻¹ is called the inverse of A. Clearly, (A⁻¹)⁻¹ = A.

Let A be a square matrix of order n such that A′A = AA′ = I. Then A is said to be orthogonal. For further elaboration, see any of the references cited at the end of this appendix.
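The non-commutativity example above can be checked directly, along with the defining property AA⁻¹ = I of the inverse. This sketch reuses a hand-rolled multiply; the 2 × 2 inverse is built from the standard adjugate formula, which is not stated in the text.

```python
def matmul(A, B):
    # (AB)_ij = sum_k a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[0, 0],
     [0, 1]]
B = [[1, 1],
     [0, 0]]

print(matmul(A, B))   # [[0, 0], [0, 0]]
print(matmul(B, A))   # [[0, 1], [0, 0]]
assert matmul(A, B) != matmul(B, A)

# A 2x2 matrix C with |C| != 0 has the unique inverse
# C^{-1} = (1/|C|) * [[d, -b], [-c, a]]  (standard adjugate formula).
C = [[2.0, 1.0],
     [1.0, 1.0]]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]   # |C| = 1
Cinv = [[ C[1][1] / det, -C[0][1] / det],
        [-C[1][0] / det,  C[0][0] / det]]
assert matmul(C, Cinv) == [[1.0, 0.0], [0.0, 1.0]]
```

The final assertion confirms CC⁻¹ = I for this non-singular C.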

I.4 Some Theorems About Matrices and Quadratic Forms

Those theorems about matrices used in this book are gathered together here for easy reference.

THEOREM 5.I
i) Let A, B, C be any m × n, n × r, r × s matrices, respectively. Then (ABC)′ = C′B′A′ and, in particular (by taking C = Ir), (AB)′ = B′A′.
ii) Every orthogonal matrix is non-singular. (See (iv) of Theorem 6.I.)
iii) Let A be a non-singular square matrix. Then A′ and A⁻¹ are also non-singular. (See also (vi) of Theorem 6.I.)
iv) Let A, B be non-singular m × m matrices. Then the m × m matrix AB is non-singular and (AB)⁻¹ = B⁻¹A⁻¹.

THEOREM 6.I
i) A square matrix A is non-singular if and only if |A| ≠ 0.
ii) For any diagonal matrix A of order n, |A| = a1a2 · · · an, where aj, j = 1, . . . , n are the diagonal elements of A.
iii) For any (square) matrix A, |A| = |A′|.
iv) For any orthogonal matrix A, |A| is either 1 or −1.
v) Let A be any square matrix and let B be an orthogonal matrix of the same order. Then |B′AB| = |BAB′| = |A|.
vi) For any matrix A for which |A| ≠ 0, |A⁻¹| = |A|⁻¹.

THEOREM 7.I    For a square matrix A of order n, consider the determinant |A − λI|, where λ is a scalar. Then it is immediate that |A − λI| is a polynomial in λ of degree n; it is called the characteristic polynomial of A, and its roots are called the characteristic roots, or eigenvalues, of A.

REMARK 1.I    Although all matrices considered here are matrices with real elements, it should be noted that their characteristic roots will, in general, be complex numbers; however, they are always real for symmetric matrices.

Finally, a square matrix A is said to be idempotent if A² = A; and a symmetric matrix A is said to be positive definite or negative definite according as its characteristic roots λj, j = 1, . . . , n satisfy the inequalities λj > 0, j = 1, . . . , n, or λj < 0, j = 1, . . . , n, respectively.

Then the orthog