A Course in Large Sample Theory

Thomas S. Ferguson
Professor of Statistics
University of California, Los Angeles, USA

CHAPMAN & HALL
London · Weinheim · New York · Tokyo · Melbourne · Madras

Published by Chapman & Hall, 2-6 Boundary Row, London SE1 8HN, UK

Chapman & Hall, 2-6 Boundary Row, London SE1 8HN, UK
Chapman & Hall GmbH, Pappelallee 3, 69469 Weinheim, Germany
Chapman & Hall USA, 115 Fifth Avenue, New York, NY 10003, USA
Chapman & Hall Japan, ITP-Japan, Kyowa Building, 2-2-1 Hirakawacho, Chiyoda-ku, Tokyo 102, Japan
Chapman & Hall Australia, 102 Dodds Street, South Melbourne, Victoria 3205, Australia
Chapman & Hall India, R. Seshadri, 32 Second Main Road, CIT East, Madras 600 035, India

First edition 1996
© 1996 Chapman & Hall

Typeset in the USA by Brookhaven Typesetting Systems, Brookhaven, New York
Printed in Great Britain by St Edmundsbury Press, Bury St Edmunds

ISBN 0 412 04371 8

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored, or transmitted, in any form or by any means, without the prior permission in writing of the publishers, or in the case of reprographic reproduction only in accordance with the terms of the licences issued by the Copyright Licensing Agency in the UK, or in accordance with the terms of licences issued by the appropriate Reproduction Rights Organization outside the UK. Enquiries concerning reproduction outside the terms stated here should be sent to the publishers at the London address printed on this page.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.
A catalogue record for this book is available from the British Library.

Printed on permanent acid-free text paper, manufactured in accordance with ANSI/NISO Z39.48-1992 and ANSI/NISO Z39.48-1984 (Permanence of Paper).

Contents

Preface

Part 1  Basic Probability Theory
1  Modes of Convergence
2  Partial Converses to Theorem 1
3  Convergence in Law
4  Laws of Large Numbers
5  Central Limit Theorems

Part 2  Basic Statistical Large Sample Theory
6  Slutsky Theorems
7  Functions of the Sample Moments
8  The Sample Correlation Coefficient
9  Pearson's Chi-Square
10  Asymptotic Power of the Pearson Chi-Square Test

Part 3  Special Topics
11  Stationary m-Dependent Sequences
12  Some Rank Statistics
13  Asymptotic Distribution of Sample Quantiles
14  Asymptotic Theory of Extreme Order Statistics
15  Asymptotic Joint Distributions of Extrema

Part 4  Efficient Estimation and Testing
16  A Uniform Strong Law of Large Numbers
17  Strong Consistency of Maximum-Likelihood Estimates
18  Asymptotic Normality of the Maximum-Likelihood Estimate
19  The Cramér-Rao Lower Bound
20  Asymptotic Efficiency
21  Asymptotic Normality of Posterior Distributions
22  Asymptotic Distribution of the Likelihood Ratio Test Statistic
23  Minimum Chi-Square Estimates
24  General Chi-Square Tests

Appendix: Solutions to the exercises
References
Index

Preface

The subject area of mathematical statistics is so vast that in undergraduate courses there is only time enough to present an overview of the material. In particular, proofs of theorems are often omitted, occasionally with a reference to specialized material, with the understanding that proofs will be given in later, presumably graduate, courses. Some undergraduate texts contain an outline of the proof of the central limit theorem, but other theorems useful in the large sample analysis of statistical problems are usually stated and used without proof.
Typical examples concern topics such as the asymptotic normality of the maximum likelihood estimate, the asymptotic distribution of Pearson's chi-square statistic, the asymptotic distribution of the likelihood ratio test, and the asymptotic normality of the rank-sum test statistic. But then in graduate courses, it often happens that proofs of theorems are assumed to be given in earlier, possibly undergraduate, courses, or proofs are given as they arise in specialized settings. Thus the student never learns in a general methodical way one of the most useful areas for research in statistics: large sample theory, or, as it is also called, asymptotic theory. There is a need for a separate course in large sample theory at the beginning graduate level. It is hoped that this book will help in filling this need.

A course in large sample theory has been given at UCLA as the second quarter of our basic graduate course in theoretical statistics for about twenty years. The students who have learned large sample theory by the route given in this text can be said to form a large sample. Although this course is given in the Mathematics Department, the clients have been a mix of graduate students from various disciplines. Roughly 40% of the students have come from Mathematics, possibly 30% from Biostatistics, and the rest from Biomathematics, Engineering, Economics, Business, and other fields. The students generally find the course challenging and interesting, and have often contributed to the improvement of the course through questions, suggestions and, of course, complaints.

Because of the mix of students, the mathematical background required for the course has necessarily been restricted. In particular, it could not be assumed that the students have a background in measure-theoretic analysis or probability. However, for an understanding of this book, an undergraduate course in analysis is needed as well as a good undergraduate course in mathematical statistics.
Statistics is a multivariate discipline. Nearly every useful univariate problem has important multivariate extensions and applications. For this reason, nearly all theorems are stated in a multivariate setting. Often the statement of a multivariate theorem is identical to the univariate version, but when it is not, the reader may find it useful to consider the theorem carefully in one dimension first, and then look at the examples and exercises that treat problems in higher dimensions.

The material is constructed in consideration of the student who wants to learn techniques of large sample theory on his/her own without the benefit of a classroom environment. There are many exercises, and solutions to all exercises may be found in the appendix. For use by instructors, other exercises, without solutions, can be found on the web page for the course, at http://www.stat.ucla.edu/courses/graduate/M276B/.

Each section treats a specific topic and the basic idea or central result of the section is stated as a theorem. There are 24 sections and so there are 24 theorems. The sections are grouped into four parts. In the first part, basic notions of limits in probability theory are treated, including laws of large numbers and the central limit theorem. In the second part, certain basic tools in statistical asymptotic theory, such as Slutsky's Theorem and Cramér's Theorem, are discussed and illustrated, and finally used to derive the asymptotic distribution and power of Pearson's chi-square. In the third part, certain special topics are treated by the methods of the first two parts, such as some time series statistics, some rank statistics, and distributions of quantiles and extreme order statistics. The last part contains a treatment of standard statistical techniques including maximum likelihood estimation, the likelihood ratio test, asymptotic normality of Bayes estimates, and minimum chi-square estimation.
Parts 3 and 4 may be read independently. There is easily enough material in the book for a semester course. In a quarter course, some material in Parts 3 and 4 will have to be omitted or skimmed.

I would like to acknowledge a great debt this book owes to Lucien Le Cam, not only for specific details, as one may note in references to him in the text here and there, but also for a general philosophic outlook on the subject. Since the time I learned the subject from him many years ago, he has developed a much more general and mathematical approach to the subject that may be found in his book, Le Cam (1986), mentioned in the references.

Rudimentary versions of this book have been in existence for some 20 years, and have undergone several changes in computer systems and word processors. I am indebted to my wife, Beatriz, for cheerfully typing some of these conversions. Finally, I am indebted to my students, too numerous to mention individually. Each class was distinctive and each class taught me something new so that the next year's class was taught somewhat differently than the last. If future students find this book helpful, they also can thank these students for their contribution to making it understandable.

Thomas S. Ferguson, April 1996

Part 1  Basic Probability Theory

1  Modes of Convergence

We begin by studying the relationships among four distinct modes of convergence of a sequence of random vectors to a limit. All convergences are defined for $d$-dimensional random vectors. For a random vector $X = (X_1, \dots, X_d) \in \mathbb{R}^d$, the distribution function of $X$, defined for $x = (x_1, \dots, x_d) \in \mathbb{R}^d$, is denoted by

$$F_X(x) = P(X \le x) = P(X_1 \le x_1, \dots, X_d \le x_d).$$

The Euclidean norm of $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ is denoted by $|x| = (x_1^2 + \cdots + x_d^2)^{1/2}$. Let $X, X_1, X_2, \dots$ be random vectors with values in $\mathbb{R}^d$.

Definition 1. $X_n$ converges in law to $X$, $X_n \xrightarrow{\mathcal{L}} X$, if $F_{X_n}(x) \to F_X(x)$ as $n \to \infty$, for all points $x$ at which $F_X(x)$ is continuous.

Convergence in law is the mode of convergence most used in the following chapters.
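Definition 1 can be checked by direct computation in simple cases. The following Python sketch illustrates it; the particular sequence $X_n = (1 + 1/n)U$ with $U \sim$ Uniform$(0,1)$ is chosen purely for illustration and is not an example from the text.

```python
# Illustration of Definition 1 (convergence in law): let U ~ Uniform(0,1)
# and X_n = (1 + 1/n) * U, so F_{X_n}(x) = x / (1 + 1/n) on [0, 1 + 1/n],
# while the limit X = U has F_X(x) = x on [0, 1].

def F_n(x, n):
    """Distribution function of X_n = (1 + 1/n) * U."""
    c = 1.0 + 1.0 / n
    return min(max(x / c, 0.0), 1.0)

def F(x):
    """Distribution function of X = U ~ Uniform(0,1)."""
    return min(max(x, 0.0), 1.0)

# F_{X_n}(x) -> F_X(x) at every x; here F_X is continuous everywhere,
# so no points need to be excluded.
for x in [0.1, 0.5, 0.9, 1.0]:
    gaps = [abs(F_n(x, n) - F(x)) for n in (10, 100, 1000)]
    assert gaps[0] > gaps[1] > gaps[2]  # the gap shrinks as n grows
    print(f"x = {x}: |F_n(x) - F(x)| at n = 10, 100, 1000 -> {gaps}")
```

In this example the limiting distribution function is continuous everywhere, so convergence holds at every point; Example 1 below shows why discontinuity points must be excluded in general.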
It is the mode found in the Central Limit Theorem and is sometimes called convergence in distribution, or weak convergence.

Example 1. We say that a random vector $X \in \mathbb{R}^d$ is degenerate at a point $c \in \mathbb{R}^d$ if $P(X = c) = 1$. Let $X_n \in \mathbb{R}^1$ be degenerate at the point $1/n$, for $n = 1, 2, \dots$, and let $X \in \mathbb{R}^1$ be degenerate at $0$. Since $1/n$ converges to zero as $n$ tends to infinity, it may be expected that $X_n \xrightarrow{\mathcal{L}} X$. This may be seen by checking Definition 1. The distribution function of $X_n$ is $F_{X_n}(x) = I_{[1/n, \infty)}(x)$, and that of $X$ is $F_X(x) = I_{[0, \infty)}(x)$, where $I_A(x)$ denotes the indicator function of the set $A$ (i.e., $I_A(x)$ denotes 1 if $x \in A$, and 0 otherwise). Then $F_{X_n}(x) \to F_X(x)$ for all $x$ except $x = 0$, and for $x = 0$ we have $F_{X_n}(0) = 0 \not\to F_X(0) = 1$. But because $F_X(x)$ is not continuous at $x = 0$, we nevertheless have $X_n \xrightarrow{\mathcal{L}} X$ from Definition 1. This shows the need, in the definition of convergence in law, to exclude points $x$ at which $F_X(x)$ is not continuous.

Definition 2. $X_n$ converges in probability to $X$, $X_n \xrightarrow{P} X$, if for every $\varepsilon > 0$, $P(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$.

Definition 3. For a real number $r > 0$, $X_n$ converges in the $r$th mean to $X$, $X_n \xrightarrow{r} X$, if $E|X_n - X|^r \to 0$ as $n \to \infty$.

Definition 4. $X_n$ converges almost surely to $X$, $X_n \xrightarrow{\text{a.s.}} X$, if $P(\lim_{n \to \infty} X_n = X) = 1$.

Almost sure convergence is sometimes called convergence with probability 1 (w.p. 1) or strong convergence. In statistics, convergence in the $r$th mean is most useful for $r = 2$, when it is called convergence in quadratic mean, and is written $X_n \xrightarrow{\text{qm}} X$. The basic relationships are as follows.

Theorem 1.
(a) $X_n \xrightarrow{\text{a.s.}} X \Rightarrow X_n \xrightarrow{P} X$.
(b) $X_n \xrightarrow{r} X$ for some $r > 0 \Rightarrow X_n \xrightarrow{P} X$.
(c) $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{\mathcal{L}} X$.

Theorem 1 states the only universally valid implications between the various modes of convergence, as the following examples show.

Example 2. To check convergence in law, nothing needs to be known about the joint distribution of $X_n$ and $X$, whereas this distribution must be defined to check convergence in probability. For example, if $X_1, X_2, \dots$ are independent and identically distributed (i.i.d.)
normal random variables, with mean 0 and variance 1, then $X_n \xrightarrow{\mathcal{L}} X_1$, yet $X_n \not\xrightarrow{P} X_1$.

Example 3. Let $Z$ be a random variable with a uniform distribution on the interval $[0,1)$, $Z \in \mathcal{U}[0,1)$, and let $X_1 = 1$, $X_2 = I_{[0,1/2)}(Z)$, $X_3 = I_{[1/2,1)}(Z)$, $X_4 = I_{[0,1/4)}(Z)$, $X_5 = I_{[1/4,1/2)}(Z)$, .... In general, if $n = 2^k + m$, where $0 \le m < 2^k$, then $X_n = I_{[m2^{-k},(m+1)2^{-k})}(Z)$. Then $X_n$ does not converge for any $Z \in [0,1)$, so $X_n \not\xrightarrow{\text{a.s.}} 0$; yet $E|X_n|^r = 2^{-k} \to 0$, so $X_n \xrightarrow{r} 0$ for all $r > 0$ and $X_n \xrightarrow{P} 0$.

Example 4. Let $Z$ be $\mathcal{U}[0,1)$ and let $X_n = 2^n I_{[0,1/n)}(Z)$. Then $E|X_n|^r = 2^{nr}/n \to \infty$, so $X_n \not\xrightarrow{r} 0$ for any $r > 0$. Yet $X_n \xrightarrow{\text{a.s.}} 0$ (since $\{\lim_{n\to\infty} X_n = 0\} \supset \{Z > 0\}$, and $P(Z > 0) = 1$), and $X_n \xrightarrow{P} 0$ (if $0 < \varepsilon < 1$, $P(|X_n| > \varepsilon) = P(X_n = 2^n) = 1/n \to 0$). In this example, we have $X_n \xrightarrow{\text{a.s.}} X$ and $\lim_{n\to\infty} EX_n > EX$. That we cannot have $0 \le X_n \xrightarrow{\text{a.s.}} X$ and $\lim_{n\to\infty} EX_n < EX$ follows from the Fatou-Lebesgue Lemma. This states: If $X_n \xrightarrow{\text{a.s.}} X$ and if for all $n$, $X_n \ge Y$ for some random variable $Y$ with $E|Y| < \infty$, then $\liminf_{n\to\infty} EX_n \ge EX$. In particular, this implies the Monotone Convergence Theorem: If $0 \le X_1 \le X_2 \le \cdots$ and $X_n \xrightarrow{\text{a.s.}} X$, then $EX_n \to EX$. In these theorems, $EX_n$ and $EX$ may take the value $+\infty$. The Fatou-Lebesgue Lemma also implies the basic Lebesgue Dominated Convergence Theorem: If $X_n \xrightarrow{\text{a.s.}} X$ and if $|X_n| \le Y$ for some random variable $Y$ with $E|Y| < \infty$, then $EX_n \to EX$.

The following lemma contains an equivalent definition of almost sure convergence. It clarifies the distinction between convergence in probability and convergence almost surely. For convergence in probability, one needs for every $\varepsilon > 0$ that the probability that $X_n$ is within $\varepsilon$ of $X$ tends to one. For convergence almost surely, one needs for every $\varepsilon > 0$ that the probability that $X_k$ stays within $\varepsilon$ of $X$ for all $k \ge n$ tends to one as $n$ tends to infinity.

Lemma 1. $X_n \xrightarrow{\text{a.s.}} X$ if and only if for every $\varepsilon > 0$,

$$P(|X_k - X| < \varepsilon, \text{ for all } k \ge n) \to 1 \quad \text{as } n \to \infty. \qquad (1)$$

Proof. Let $A_{n,\varepsilon} = \{|X_k - X| < \varepsilon, \text{ for all } k \ge n\}$. The statement that $X_n \to X$ means that for every $\varepsilon > 0$, there exists an $n$ such that $|X_k - X| < \varepsilon$ for all $k \ge n$. Hence, $P(X_n \to X) = 1$ is equivalent to $P(\bigcup_n A_{n,\varepsilon}) = 1$ for all $\varepsilon > 0$. Then, because $A_{n,\varepsilon}$
increases to $\bigcup_n A_{n,\varepsilon}$ as $n \to \infty$, this in turn is equivalent to

$$P(A_{n,\varepsilon}) \to 1 \text{ as } n \to \infty, \quad \text{for all } \varepsilon > 0, \qquad (2)$$

which is exactly (1). ∎

Proof of Theorem 1.
(a) $X_n \xrightarrow{\text{a.s.}} X \Rightarrow X_n \xrightarrow{P} X$: Let $\varepsilon > 0$. Then

$$P(|X_n - X| \le \varepsilon) \ge P(|X_k - X| \le \varepsilon, \text{ for all } k \ge n) \to 1,$$

from Lemma 1.

(b) $X_n \xrightarrow{r} X \Rightarrow X_n \xrightarrow{P} X$: We let $I(X \in A)$ denote the indicator random variable that is equal to 1 if $X \in A$ and to 0 otherwise. Note that

$$E|X_n - X|^r \ge E[|X_n - X|^r I(|X_n - X| \ge \varepsilon)] \ge \varepsilon^r P(|X_n - X| \ge \varepsilon).$$

(This is Chebyshev's Inequality.) The result follows by letting $n \to \infty$.

(c) $X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{\mathcal{L}} X$: Let $\varepsilon > 0$ and let $\mathbf{1} \in \mathbb{R}^d$ represent the vector with 1 in every component. If $X_n \le x_0$, then either $X \le x_0 + \varepsilon\mathbf{1}$ or $|X - X_n| > \varepsilon$. In other words, $\{X_n \le x_0\} \subset \{X \le x_0 + \varepsilon\mathbf{1}\} \cup \{|X - X_n| > \varepsilon\}$. Hence,

$$F_{X_n}(x_0) \le F_X(x_0 + \varepsilon\mathbf{1}) + P(|X - X_n| > \varepsilon).$$

Similarly,

$$F_X(x_0 - \varepsilon\mathbf{1}) \le F_{X_n}(x_0) + P(|X - X_n| > \varepsilon).$$

Hence, since $P(|X - X_n| > \varepsilon) \to 0$ as $n \to \infty$,

$$F_X(x_0 - \varepsilon\mathbf{1}) \le \liminf_n F_{X_n}(x_0) \le \limsup_n F_{X_n}(x_0) \le F_X(x_0 + \varepsilon\mathbf{1}).$$

If $F_X(x)$ is continuous at $x_0$, then the left and right ends of this inequality both converge to $F_X(x_0)$ as $\varepsilon \to 0$, implying that $F_{X_n}(x_0) \to F_X(x_0)$ as $n \to \infty$. ∎

EXERCISES

1. Suppose $X_n \in \mathcal{B}e(1/n, 1/n)$ (beta) and $X \in \mathcal{B}(1, 1/2)$ (binomial). Show that $X_n \xrightarrow{\mathcal{L}} X$. What if $X_n \in \mathcal{B}e(\alpha/n, \beta/n)$?
2. Suppose $X_n$ is uniformly distributed on the set of points $\{1/n, 2/n, \dots, 1\}$. Show that $X_n \xrightarrow{\mathcal{L}} X$, where $X$ is $\mathcal{U}(0,1)$. Does $X_n \xrightarrow{P} X$?
3. (a) Show that if $0 < \dots$
$\dots$
$\dots$, then $\sup_x |F_{X_n}(x) - F_X(x)| \to 0$ as $n \to \infty$.
7. Using the Fatou-Lebesgue Lemma, (a) prove the Monotone Convergence Theorem, and (b) prove the Lebesgue Dominated Convergence Theorem.

2  Partial Converses to Theorem 1

Although complete converses to the statements of Theorem 1 are invalid, as we have seen, under certain additional conditions some important partial converses hold. We use the same symbol $c$ to denote the point $c \in \mathbb{R}^d$, as well as the degenerate random vector identically equal to $c$.

Theorem 2.
(a) If $c \in \mathbb{R}^d$, then $X_n \xrightarrow{\mathcal{L}} c \Rightarrow X_n \xrightarrow{P} c$.
(b) If $X_n \xrightarrow{\text{a.s.}} X$ and $|X_n|^r \le Z$ for some $r > 0$ and some random variable $Z$ with $EZ < \infty$, then $X_n \xrightarrow{r} X$.
(c) [Scheffé (1947)].
If $X_n \xrightarrow{\text{a.s.}} X$, $X_n \ge 0$, and $EX_n \to EX < \infty$, then $X_n \xrightarrow{r} X$, where $r = 1$.
(d) $X_n \xrightarrow{P} X$ if and only if every subsequence $n_1, n_2, \dots \in \{1, 2, \dots\}$ has a sub-subsequence $m_1, m_2, \dots \in \{n_1, n_2, \dots\}$ such that $X_{m_j} \xrightarrow{\text{a.s.}} X$ as $j \to \infty$.

Remarks. Part (a), together with part (c) of Theorem 1, implies that convergence in law and convergence in probability are equivalent if the limit is a constant random vector. In the following sections we use this equivalence often without explicit mention.

Part (b) gives a method of deducing convergence in the $r$th mean from almost sure convergence. See Exercise 3 for a strengthening of this result, and Exercise 2 for a simple sufficient condition for almost sure convergence.

Part (c) is sometimes called Scheffé's Useful Convergence Theorem because of the title of Scheffé's 1947 article. It is usually stated in terms of densities (nonnegative functions that integrate to one) as follows: If $f_n(x)$ and $g(x)$ are densities such that $f_n(x) \to g(x)$ for all $x$, then $\int |f_n(x) - g(x)|\,dx \to 0$. [The hypotheses $f_n(x) \ge 0$ and $\int f_n(x)\,dx \to \int g(x)\,dx$ are automatic here. The proof of this is analogous to the proof of (c) given below.]

Pointwise convergence of densities is a type of convergence in distribution that is much stronger than convergence in law. Convergence in law only requires that $P(X_n \in A)$ converge to $P(X \in A)$ for certain sets $A$ of the form $\{x : x \le a\}$. If the densities converge, then $P(X_n \in A)$ converges to $P(X \in A)$ for all Borel sets $A$, and, moreover, the convergence is uniform in $A$. In other words, suppose that $X_n$ and $X$ have densities (with respect to a measure $\nu$) denoted by $f_n(x)$ and $f(x)$, respectively. Then, if $f_n(x) \to f(x)$ for all $x$, we have

$$\sup_A |P(X_n \in A) - P(X \in A)| \to 0.$$

The proof is an exercise. We will encounter this type of convergence later in the Bernstein-von Mises Theorem.
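The density form of Scheffé's result is easy to probe numerically. In the Python sketch below, the densities $f_n(x) = ((n+1)/n)\,x^{1/n}$ on $[0,1]$ (a family chosen here purely for illustration) converge pointwise to the uniform density $g(x) = 1$, and the $L_1$ distance $\int |f_n - g|$, half of which bounds $\sup_A |P(X_n \in A) - P(X \in A)|$, shrinks to zero.

```python
# Numerical sketch of Scheffé's theorem: f_n(x) = ((n+1)/n) * x**(1/n)
# on [0,1] is a density converging pointwise to the uniform density 1,
# so the integral of |f_n - 1| over [0,1] tends to 0, and hence
# sup_A |P(X_n in A) - P(X in A)| = (1/2) * integral |f_n - 1| -> 0.

def f_n(x, n):
    """Density of X_n on [0, 1]."""
    return (n + 1) / n * x ** (1.0 / n)

def l1_distance(n, grid=100_000):
    """Midpoint-rule approximation of integral_0^1 |f_n(x) - 1| dx."""
    h = 1.0 / grid
    return sum(abs(f_n((i + 0.5) * h, n) - 1.0) * h for i in range(grid))

for n in (1, 10, 100):
    d = l1_distance(n)
    print(f"n = {n:3d}: integral |f_n - 1| ~ {d:.5f}, "
          f"bound on sup_A difference ~ {d / 2:.5f}")
```

For $n = 1$ the density is $f_1(x) = 2x$ and the $L_1$ distance is exactly $1/2$; the computed values then decrease toward zero as $n$ grows.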
As an illustration of the difference between this type of convergence and convergence in law, suppose that $X_n$ is uniformly distributed on the set $\{1/n, 2/n, \dots, n/n\}$. Then $X_n \xrightarrow{\mathcal{L}} X \in \mathcal{U}(0,1]$, the uniform distribution on $(0,1]$, but $P(X_n \in A)$ does not converge to $P(X \in A)$ for all $A$. For example, if $A = \{x : x \text{ is rational}\}$, then $P(X_n \in A) = 1$ does not converge to $P(X \in A) = 0$.

Part (d) is a tool for dealing with convergence in probability using convergence almost surely. Generally convergence almost surely is easier to work with. Here is an example of the use of part (d). If $X_n \to X$ with probability one (i.e., almost surely), and if $g(x)$ is a continuous function of $x$, then it is immediate that $g(X_n) \to g(X)$ with probability one. Is the same result true if convergence almost surely is replaced by convergence in probability? Assume $X_n \xrightarrow{P} X$ and let $g(x)$ be a continuous function of $x$. To show $g(X_n) \xrightarrow{P} g(X)$, it is sufficient, according to part (d), to show that for every subsequence $n_1, n_2, \dots \in \{1, 2, \dots\}$, there is a sub-subsequence $m_1, m_2, \dots \in \{n_1, n_2, \dots\}$ such that $g(X_{m_j}) \xrightarrow{\text{a.s.}} g(X)$ as $j \to \infty$. So let $n_1, n_2, \dots$ be an arbitrary subsequence and find, using part (d), a sub-subsequence $m_1, m_2, \dots \in \{n_1, n_2, \dots\}$ so that $X_{m_j} \xrightarrow{\text{a.s.}} X$. Then $g(X_{m_j}) \xrightarrow{\text{a.s.}} g(X)$, since $g(x)$ is continuous, and the result is proved. ∎

Proof of Theorem 2. (a) (In two dimensions)

$$P(|X_n - c| \le \varepsilon\sqrt{2}\,) \ge P(c - \varepsilon\mathbf{1} < X_n \le c + \varepsilon\mathbf{1}) = F_{X_n}(c + \varepsilon\mathbf{1}) - F_{X_n}(c_1 - \varepsilon, c_2 + \varepsilon) - F_{X_n}(c_1 + \varepsilon, c_2 - \varepsilon) + F_{X_n}(c - \varepsilon\mathbf{1}).$$

Each of the four points at which $F_{X_n}$ is evaluated is a continuity point of $F_c$, so the right side converges to $1 - 0 - 0 + 0 = 1$ for every $\varepsilon > 0$, which is equivalent to $X_n \xrightarrow{P} c$.

(d) (If) Suppose $X_n$ does not converge in probability to $X$. Then there exists an $\varepsilon > 0$ and a $\delta > 0$ such that $P(|X_n - X| > \varepsilon) > \delta$ for infinitely many $n$, say $\{n_i\}$. Then no subsequence of $\{n_i\}$ converges in probability, nor, consequently, almost surely. (Only if) Let $\varepsilon_i > 0$ and $\sum_1^\infty \varepsilon_i < \infty$. Find $n_i$ such that $P(|X_n - X| > \varepsilon_i) < \varepsilon_i$ for all $n \ge n_i$, and assume without loss of generality that $n_1 < n_2 < \cdots$. Then, because $\sum_i P(|X_{n_i} - X| > \varepsilon_i) < \infty$ and $\varepsilon_i \to 0$ as $i \to \infty$,
Similarly, if n’ is any subsequence, X,» > X, so we can find fa sub-subsequence n of n’ such that X,- > X. EXERCISES 1. Let X,, Xe... be independent identically distributed with densities fla) ~'ax"*04, (2). (a) For what values of a > 0 and r > 0 is it true that (1/n)X, - 0? (b) For what values of a > 0 is it true that (1/n)X, 2 02 (Use the Bore!~Cantelli Lemma.) 2. Show that if 2 BOX, - X)? <@, then X, “25 X and X, > X. Show that if D EI, — XI"<~, then X, > X and X, > X. 3, Improve Theorem 2b) and Theorem 2(c) by using Theorem 2(d) to show (a) If X, 2X and IK,|" 0 and some random vari- able Z guch that EZ <, then X, “> X. (b) If X, 2X, X, 20, and EX, > EX <=, then X, > X, where r=. 12 A Course in Large Sample Theory 4, (a) Give an example of events A,, A2,... such that D5, P(A)) = @ and PCA, io.) = 0. (b) Show that if A,,A),... are independent events such that FFL, P(A) = @, then PCA, io.) = 1 (Hint: Show that P(A, finitely often) lim. TI... - PLA) < lim, exol- 5. Let X,, X;,... be independent random vari = PLU, Nj> 445) = P(AD)) ies such that PLX, ad = 1/n ond = 1 = 1/n for n = 1,2,...,where a is a con- stant, For what values of a, —® 0? 6. (@) Suppose f,(x) and g(x) are densities such that for all x, f,(x) -> g(x) as n — ©, Show that JUG) -8(2) |e > 0 a8 n (b) Show that if X, has density f,(x), if X has density g(x), and if S| f(x) — gGe) de > 0 as n > &, then sup |P(X, € A) - P(X A)| > 0asn +=, yi 7. Prove the following strengthening of Scheffé’s Theorem: If X, “*> X and if E|X,|—» ElX| <~, then E|X, — X|-> 0. 8, Show if X, “2X and if EX? + EX, then X, “> X. that n*} son- Convergence in Law In this section, we investigate the relationship between convergence in law of a sequence of random vectors and convergence of expectations of functions of the vectors. The basic result is that X, % X if and only if Eg(X,) ~» Eg(X) for all continuous bounded functions g. 
We conclude with the Continuity Theorem that relates convergence in law of a sequence of random vectors with convergence of the corresponding characteristic functions.

Let $g$ represent a real-valued function defined on $\mathbb{R}^d$. We say that $g$ vanishes outside a compact set if there is a compact set $C \subset \mathbb{R}^d$ such that $g(x) = 0$ for all $x \notin C$.

Theorem 3. The following conditions are equivalent.
(a) $X_n \xrightarrow{\mathcal{L}} X$.
(b) $Eg(X_n) \to Eg(X)$ for all continuous functions $g$ that vanish outside a compact set.
(c) $Eg(X_n) \to Eg(X)$ for all continuous bounded functions $g$.
(d) $Eg(X_n) \to Eg(X)$ for all bounded measurable functions $g$ such that $P(X \in C(g)) = 1$, where $C(g) = \{x : g \text{ is continuous at } x\}$ is called the continuity set of $g$.

The implication (a) $\Rightarrow$ (b) or (c) or (d) is known as the Helly-Bray Theorem. For example, it implies that $E\cos(X_n) \to E\cos(X)$ whenever $X_n \xrightarrow{\mathcal{L}} X$, because $\cos(x)$ is continuous and bounded. We now give some counterexamples to show the necessity of the boundedness and continuity conditions.
Then g is uniformly continuous: For every € > 0, there exists a number 8 > 0 such that |x — yl < 8 implies |g(x) — g(y)| 0 and find such a 8% 0. Slice C by finite sets of parallel hyperplanes at a distance of at most 8/ yd apart, one set for each dimension, each hyperplane having probability zcro under Fy (only count- ably many parallel planes can have positive mass). This cuts R# into parallelepipeds of the form (b, ¢] = (x: b 5, a,F x(x) = £800. Finally, |2(%,) - Ba(X)| <|Fo(X, Fa(X,)| +1 9(X..) — E&(X)| +|E@(X) — Ee(X)| < 2c +|B5(X,) — B&(X)| > 26. Since this is true for all © > 0, Bg(X,) > Eg(X). (b) = (c): Let g be continuous, |g(x)| 0. Find B such that P{|X| 2 B) < ¢/(2A). Find A continuous so that nn = (0 AL if xl >B+1 and 0 < h(x) < 1 for all x. Then, |Be(X,) ~ Ee(X)| s|£8(X,) ~ Fa(X,)A(X,)| +|Be(X,)A(X,) — Be(X)h(X)] +|Eg(X)h(X) ~ £8(X)| ‘The middle term — 0, hecause gh is continuous and vanishes outside a compact set. The first term is bounded by ¢/2, |Eg(X,) — Eg(Xn)A(Xn)| S$ £]g(Xq)| [1 — A(Xq)| Ss AB(L — A%Q)) — A(1 — Eh(X,)) + A( = Fh(X)) < #/?, and, similarly, the last term is bounded by 6/2. Therefore, |Bg(K,) ~ Eg(X)] is bounded by something that converges to «. Since this is true for all e > 0, lim, ,. [Eg(X,) — Eg)| =0. To prove (c) = (d), we use the following lemma. Lemma, Let g be bounded measurable with P(X € C(g)) = 1. Then, for every € > 0 there exist bounded continuous functions f and h such that f 0 be arbitrary. We show fu(x) > g(x) — e. Find 6 > 0 such that ly ~ xl <6 implies |g) ~ g(@)| (g(x) — B)/6. Then Fo(x) = f(x) A min{ int [g(y) + kix—yij, int [g(y) + Rix - vil} Iy-aled ir-xize = min(g(x) — e, B + ((g(x) ~ 8)/5)0) = a(x) ~ &] Third, note that Efj(X) = Eg(X) = Eh,(X), because P(X = C(g)} = 1 Now by the Monotone Convergence Theorem, Ef,(X) 7 Ef,(X) and Eh (X) \. Ehg(X0. So, for every & > 0, there exists k such that E(/,(X) ~ f,00) 0, and find f and A as in the lemma. 
Then,

$$Eg(X) - \varepsilon \le Ef(X) = \lim_n Ef(X_n) \le \liminf_n Eg(X_n) \le \limsup_n Eg(X_n) \le \lim_n Eh(X_n) = Eh(X) \le Eg(X) + \varepsilon.$$

Let $\varepsilon \to 0$, and conclude $Eg(X) = \lim_n Eg(X_n)$. ∎

For $X \in \mathbb{R}^d$ and $t \in \mathbb{R}^d$, the characteristic function of $X$ is defined as

$$\varphi_X(t) = E\exp\{it^T X\} = E\exp\{i(t_1 X_1 + \cdots + t_d X_d)\}, \quad \text{where } i = \sqrt{-1}.$$

Theorem 3(e) (The Continuity Theorem). $X_n \xrightarrow{\mathcal{L}} X$ if and only if $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t \in \mathbb{R}^d$.

Proof. ($\Rightarrow$) This follows immediately from the Helly-Bray Theorem, because $\exp\{it^T X\} = \cos t^T X + i \sin t^T X$ is bounded and continuous.

($\Leftarrow$) Let $g$ be continuous and vanishing outside a compact set. Then $g$ is bounded, $|g(x)| \le B$ say, and uniformly continuous. Let $\varepsilon > 0$. Find $\delta > 0$ such that $|x - y| < \delta$ implies $|g(x) - g(y)| < \varepsilon$. To show $Eg(X_n) \to Eg(X)$, let $Y_\sigma \in \mathcal{N}(0, \sigma^2 I)$ be independent of the $X_n$ and $X$. Then

$$|Eg(X_n) - Eg(X)| \le |Eg(X_n) - Eg(X_n + Y_\sigma)| + |Eg(X_n + Y_\sigma) - Eg(X + Y_\sigma)| + |Eg(X + Y_\sigma) - Eg(X)|.$$

The first term is

$$\le E\{|g(X_n) - g(X_n + Y_\sigma)|\,I(|Y_\sigma| \le \delta)\} + E\{|g(X_n) - g(X_n + Y_\sigma)|\,I(|Y_\sigma| > \delta)\} \le \varepsilon + 2B\,P(|Y_\sigma| > \delta) \le 2\varepsilon$$

for $\sigma$ sufficiently small. Similarly, the third term is $\le 2\varepsilon$. It remains to show that $Eg(X_n + Y_\sigma) \to Eg(X + Y_\sigma)$.

The characteristic function of $\mathcal{N}(0, \sigma^2 I)$ is $\exp\{-\sigma^2|t|^2/2\}$, so by Fourier inversion the density of $Y_\sigma$ is

$$f_{Y_\sigma}(y) = \left(\frac{1}{2\pi}\right)^d \int e^{-it^T y}\,e^{-\sigma^2|t|^2/2}\,dt.$$

Making the change of variable $u = x + y$ for $y$, we find

$$Eg(X_n + Y_\sigma) = \int\!\!\int g(x+y)\,f_{Y_\sigma}(y)\,dy\,dF_{X_n}(x) = \left(\frac{1}{2\pi}\right)^d \int g(u) \int\!\!\int e^{-it^T(u-x)}\,e^{-\sigma^2|t|^2/2}\,dt\,dF_{X_n}(x)\,du$$
$$= \left(\frac{1}{2\pi}\right)^d \int g(u) \int e^{-it^T u}\,e^{-\sigma^2|t|^2/2}\,\varphi_{X_n}(t)\,dt\,du \to \left(\frac{1}{2\pi}\right)^d \int g(u) \int e^{-it^T u}\,e^{-\sigma^2|t|^2/2}\,\varphi_X(t)\,dt\,du,$$

using the Lebesgue Dominated Convergence Theorem ($|\varphi_{X_n}(t)| \le 1$ and $g$ has compact support). Undoing the previous steps, we see that this last expression is equal to $Eg(X + Y_\sigma)$. ∎

EXERCISES

1. If $X_n \xrightarrow{\mathcal{L}} X \in \mathcal{N}(0,1)$, is it necessarily true that $Eg(X_n) \to Eg(X)$ for (a) $g(x) = I_{[0,1)}(x)$, (b) $g(x) = \dots$, (c) $g(x) = 1$ if $x > 0$, $0$ if $x \le 0$, and (d) $g(x) = x^2$? If not, give a counterexample.
2. Show that if $a^T X_n \xrightarrow{\mathcal{L}} a^T X$ for all vectors $a$, then $X_n \xrightarrow{\mathcal{L}} X$.
3. Show that if $X_n$ has a density $f_n(x)$, if $X$ has density $f(x)$, and if $f_n(x) \to f(x)$ for all $x$ as $n \to \infty$, then $Eg(X_n) \to Eg(X)$ for all bounded measurable functions $g$.
4. The Poisson Approximation to the Binomial Distribution.
(a) Let $S_n$ have the binomial distribution, $\mathcal{B}(n, p_n)$, and let $Z$ have the Poisson distribution, $\mathcal{P}(\lambda)$, and suppose that $np_n \to \lambda$ as $n \to \infty$. Using characteristic functions, show that $S_n \xrightarrow{\mathcal{L}} Z$.
(b) Generalize as follows. Let $X_{1n}, \dots, X_{nn}$ be independent Bernoulli trials with $P(X_{jn} = 1) = p_{jn}$. Suppose that as $n \to \infty$, $p_{1n} + \cdots + p_{nn} \to \lambda$, and $\max_{j \le n} p_{jn} \to 0$. Then, $S_n = \sum_j X_{jn} \xrightarrow{\mathcal{L}} Z \in \mathcal{P}(\lambda)$.
5. Le Cam's inequality bounds the worst error that may be made using the Poisson approximation. Let $X_1, X_2, \dots, X_n$ be independent Bernoulli trials with $P(X_i = 1) = p_i$ for $i = 1, \dots, n$, and let $S_n = \sum_1^n X_i$. Let $\lambda = \sum_1^n p_i$, and let $Z$ be a random variable with the Poisson distribution, $\mathcal{P}(\lambda)$. Show that for all sets $A$,

$$|P(S_n \in A) - P(Z \in A)| \le \sum_{i=1}^n p_i^2.$$

Note that if each $p_i = \lambda/n$, this gives Exercise 4(a). (See J. Michael Steele, "Le Cam's Inequality and Poisson Approximation," Am. Math. Monthly (1994), pp. 48-54, for a survey article.) (Hint: The following is a coupling argument; it couples $S_n$ and $Z$ by defining them on the same probability space, and making them as close as possible. For $i = 1, \dots, n$, let $U_i$ be independent $\mathcal{U}(0,1)$ random variables, let $X_i = I(U_i > 1 - p_i)$, and let $Y_i = 0$ if $U_i \le e^{-p_i}$, ….)

4  Laws of Large Numbers

For $f : \mathbb{R}^d \to \mathbb{R}$, the derivative of $f$ is the row vector,

$$\dot{f}(x) = \frac{d}{dx} f(x) = \left(\frac{\partial}{\partial x_1} f(x), \dots, \frac{\partial}{\partial x_d} f(x)\right).$$
Then 94 (0) = ex.0-onlt/0) = ELexit/n) = ex(t/)” - (rx ' lf en(ut/n) ale)’ Because ¢,(0) = 1, and gy(e) + ie? as © 0, 1 x(t) -f li f ex(ut/n) al} — exp{ipt}. Here, we use the fact that for any sequence of real numbers, a,, for which lim, v2 m@, exists, we have (1+ a,)" > expllim,... na,}. Because expliy t) is the characteristic function of the distribution giving mass 1 10 the point jx, we have from the Continuity Theorem X, = 2 which implies trom ‘Theorem 2(a), X, — w. ¥)'(®,- 0) = PVD LEG = W(X #) = (UP LEG ~ HK w) = (1/n)E(X ~ w)"(K ~ 2) > 0. 22 A Course in Large Sample Theory (Note that this proof requires only that the X, be uncorrelated and have the same mean and covariance matrix; it does not require that they be independent, or that they be identically distributed.) (c) Omitted. [See, e.g., Chung (1974), Rao (1973).) i The method of proof of part (b) is very general and quite useful for proving consistency in statistical estimation probiems. in such probiems, the underlying probability, Py, depends upon a parameter @ in @ in RY, and we are given @ sequence of random veelurs, ,, considered as estimates of 8, We say that 6, is a consistent sequence of estimates of @ if for all @< ©, 6, °» @ when P — Py is the “true” probability distribution. This is sometimes called weak consistency, of consistency in probability. We may similarly define strong consistency (6, ** 0), or consistency in quadratic mean (0, > 8), both of which imply (weak) consistency. The weak (strong) law of large numbers states the sample mean is a weakly (strongly) consistent estimate of the population mean. Exercises 1 and 2 give extensions of the law of large numbers. In the first, the X, are not identically di ibuted, and in the second, they are not independent. 
‘The weak law of large numbers says that if X,..., X, are ii.d, random variables with finite first moment, 2, then for every © >0 we have PUX, ~ u| > 6) 0 as n>, The argument of Theorem 2b) only shows that P(LX, ~ yl > ©) Oat rate 1/n, Actually, the rate of conver- gence of P(|X, — | > e) to zero is typically exponential at a certain rate that depends on ¢ and on the underlying ion of the Xs. That is, P(X, — wl > &) behaves asymptotically like exp(—na} for some & > 0, in the sense that P(X, — yl > e)/" — exp(—a} or (1/n) log P(X, ‘Ihe study of the rate of convergence of P(X, — | > €) to zero is in the domain of large deviation theory. (See Exercises 5-8.) =pl>e)>-aasn +, Consistency of the Empirical Distribution Function. Let X,)...,Xq be lependent identically distributed random variables on R with distribu- tion function F(x) = P(X 0} Se, such that Fay) < j/k s Fla) tel je ; ion for PX < 1,3] Note that if x), Fa, 4) ~ FU5) 2 Fela.) ~ FCa_,) ~ &. This implies that sup JF, “FG)| <4, + € 25.6, Since this holds for all e > 0, the corollary follows. a EXERCISES 1. (Consistency of the least-squares estimate of a regression coefficient.) Suppose that for given constants 7), 7,,... the random variables X,, Xz... are independent with linear regression, E(X)) = a + Az, and constant variance, var(X,) = «r?. The least-squares estimates of and. B based on X,,...,X, are 4.- Ex -2y/Bte-2 X, — Bains where Z, = (1/nJEZIz,. an (2) Under what congitions on 2,225... is it true that B, “> 6? (b) When does &, —> a? 24 A Course in Large Sample Theory 2. (An autoregressive model.) Suppose €y, 2,... ate independent random variables all having the same mean yz and variance o”, Define X, as the autoregressive sequence, and for n > 2, BXqu1 + Egy where —1 < B <1. Show that ¥, SS n/a — 3. (Bemstin’s Theorem.) Let X;, Xr... be a sequence of random vari ables with ECX,) = 0, var(X,) = a, and corr(X;, X;) = py, Show that if the variances are uniformly bounded (o;? 
≤ c, say), and if ρ_ij → 0 as |i − j| → ∞ (i.e., for every ε > 0, there is an integer N such that |i − j| ≥ N implies |ρ_ij| < ε), then X̄_n →_P 0.
4. (Monte Carlo.) One strategy for evaluating the integral

I = ∫_1^∞ … sin(2πx) dx = 0.153…

by Monte Carlo approximation is as follows. Write the integral, with a change of variable, y = 1/x, as an integral over (0, 1], and approximate I by I_n = (1/n) Σ_{j=1}^n g(Y_j), where Y_1, ..., Y_n is a sample from the uniform distribution on (0, 1]. How well does this approximation work? Does I_n converge to I?

The following four exercises deal with large deviations for sums of i.i.d. random variables. For an accessible introduction to the general theory, see the book, Large Deviation Techniques in Decision, Simulation and Estimation by James A. Bucklew, John Wiley & Sons, New York, 1990. Let X_1, ..., X_n be i.i.d. random variables with moment-generating function M(θ) = E exp(θX) finite for all θ, and let μ denote the first moment of X. To show that P(|X̄_n − μ| > ε) converges to zero exponentially, it is sufficient to show that both P(X̄_n ≥ μ + ε) and P(X̄_n ≤ μ − ε) converge to zero exponentially. It is to be shown that if H is continuous at μ + ε, then (1/n) log P(X̄_n ≥ μ + ε) → −H(μ + ε). This is done in two steps in Exercises 6 and 7.

5. Let X be a random variable with moment-generating function, M(θ) = E exp(θX), finite in a neighborhood of the origin, and let μ denote the mean of X, μ = EX. The quantity

H(x) = sup_θ (θx − log M(θ))

is called the large deviation rate function of X.
(a) Show that H(x) is a convex function of x.
(b) Show that H(x) has a minimum value of zero at x = μ.
(c) Evaluate H(x) for the normal, Poisson, and Bernoulli distributions.
6. Show that P(X̄_n ≥ μ + ε) ≤ exp{−n[θ(μ + ε) − log M(θ)]} for every θ ≥ 0, and hence that P(X̄_n ≥ μ + ε) ≤ exp{−nH(μ + ε)}.
7. Show that P(X̄_n ≥ μ + ε) ≥ exp{−n(…)} P(|X̄_n′ − (μ + ε)| < δ), where X̄_n′ denotes the sample mean under the tilted distribution, and conclude that liminf_{n→∞} (1/n) log P(X̄_n ≥ μ + ε) ≥ −H(μ + ε).
8. For the Bernoulli distribution with probability p of success, the rate function H(x) is not continuous at x = 1. Establish the rate of convergence of P(X̄_n ≥ 1) and P(X̄_n > 1) to zero directly in this case.
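For the Bernoulli case of Exercises 5, 6, and 8, the rate function has the closed form H(x) = x log(x/p) + (1 − x) log((1 − x)/(1 − p)) (this closed form is the solution to Exercise 5(c)), and the Chernoff bound of Exercise 6 can be checked against exact binomial tail probabilities. A numerical sketch:

```python
import math

def bernoulli_rate(x, p):
    # Large deviation rate function H(x) = sup_θ (θx − log M(θ)) for
    # Bernoulli(p), in closed form; term(0, b) is 0 by convention.
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(x, p) + term(1 - x, 1 - p)

def binom_tail(n, k, p):
    # Exact P(S_n ≥ k) for S_n ~ Binomial(n, p).
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))
```

With p = 1/2 and n = 100, the exact tail P(X̄_n ≥ 0.7) = binom_tail(100, 70, 0.5) sits below the Chernoff bound exp{−100·H(0.7)}, and H attains its minimum value zero at x = p, as Exercise 5(b) requires.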
In this section, we present the basic Central Limit Theorem for i.i.d. variables. We do this for vector variables since the proof is essentially the same as for one-dimensional variables. The extension to independent nonidentically distributed random variables, due to Lindeberg and Feller, is stated without proof. Applications are given to some important statistical problems: to least-squares estimators of regression coefficients, to randomization tests for paired comparison experiments, and to the signed-rank test.

Theorem 5. Let X_1, X_2, ... be i.i.d. random vectors with mean μ and finite covariance matrix, Σ. Then √n(X̄_n − μ) →_L N(0, Σ).

Proof. Because √n(X̄_n − μ) = (1/√n) Σ_{j=1}^n (X_j − μ), we have

φ_{√n(X̄_n−μ)}(t) = Π_{j=1}^n φ_{X_j−μ}(t/√n) = φ(t/√n)^n,

where φ(t) is the characteristic function of X_1 − μ. Then, because φ(0) = 1, φ̇(0) = 0, and φ̈(ε) → −Σ as ε → 0, we have, applying Taylor's Theorem,

φ(t/√n)^n = (1 + (1/n) t′[∫_0^1 ∫_0^1 v φ̈(uvt/√n) du dv] t)^n
  → exp{ t′[lim_{n→∞} ∫_0^1 ∫_0^1 v φ̈(uvt/√n) du dv] t } = exp{−(1/2) t′Σt}.   (2)

In the convergence statement, we have used the fact that for any sequence of real numbers, a_n, for which lim_{n→∞} na_n exists, we have (1 + a_n)^n → exp{lim_{n→∞} na_n}. Since exp{−(1/2)t′Σt} is the characteristic function of N(0, Σ), the theorem follows from the Continuity Theorem. ∎

The extension of the Central Limit Theorem to the independent non-identically distributed case is very important for statistical work. We state the basic theorem in one dimension without proof. See Feller (Vol. 2) or Chung (1974) for a proof quite similar to the proof of Theorem 5. It is useful to state this extension in terms of a triangular array of random variables,

X_11
X_21, X_22
X_31, X_32, X_33
⋮

where the random variables in each row are assumed to be independent with means zero and finite variances.

The Lindeberg–Feller Theorem. For each n = 1, 2, ..., let X_nj, j = 1, ..., n, be independent random variables with EX_nj = 0 and var(X_nj) = σ²_nj. Let Z_n = Σ_{j=1}^n X_nj, and let B_n² = var(Z_n) = Σ_{j=1}^n σ²_nj. Then Z_n/B_n →_L
N(0, 1), provided the Lindeberg Condition holds: For every ε > 0,

(1/B_n²) Σ_{j=1}^n E{X²_nj I(|X_nj| ≥ εB_n)} → 0 as n → ∞.   (3)

Conversely, if (1/B_n²) max_{j≤n} σ²_nj → 0 as n → ∞ (that is, if no one term of the sum B_n² plays a significant role in the limit), and if Z_n/B_n →_L N(0, 1), then the Lindeberg Condition holds.

The important special case where there is a single sequence, X_1, X_2, ..., of independent identically distributed random variables with mean μ and var(X_j) = σ² can be obtained from this theorem by putting X_nj = z_nj(X_j − μ) to obtain the asymptotic normality of Z_n/B_n, where Z_n = Σ_{j=1}^n z_nj(X_j − μ) and B_n² = σ² Σ_{j=1}^n z²_nj. (See Exercise 5.)

Example 1. Application to the asymptotic normality of the least-squares estimate of a regression coefficient. Suppose X_j = α + βz_j + ε_j for j = 1, 2, ..., where the z_j are known numbers not all equal and the ε_j are independent random variables with means zero and common variances σ². In Exercise 1 of Section 4 we saw that the least-squares estimate, β̂_n, of β was consistent provided Σ_1^n (z_j − z̄_n)² → ∞ as n → ∞. We now show that if the conditions are strengthened to include (a) the ε_j are identically distributed, and (b) max_{j≤n} (z_j − z̄_n)²/Σ_{j=1}^n (z_j − z̄_n)² → 0 as n → ∞, then β̂_n is asymptotically normal. Note that

β̂_n − β = Σ_{j=1}^n (z_j − z̄_n)ε_j / Σ_{j=1}^n (z_j − z̄_n)².   (4)

We show that the conditions of the Lindeberg–Feller Theorem are satisfied with X_nj = ε_j(z_j − z̄_n). Since EX_nj = 0 and var(X_nj) = σ²(z_j − z̄_n)², we have B_n² = σ² Σ_1^n (z_j − z̄_n)², and

(1/B_n²) Σ_{j=1}^n E{X²_nj I(|X_nj| ≥ εB_n)} = (1/B_n²) Σ_{j=1}^n (z_j − z̄_n)² E{ε_j² I(ε_j²(z_j − z̄_n)² ≥ ε²B_n²)}
  ≤ (1/B_n²) Σ_{j=1}^n (z_j − z̄_n)² E{ε_1² I(|ε_1| ≥ εσ/√γ_n)},   (5)

where γ_n = max_{j≤n} (z_j − z̄_n)²/Σ_{j=1}^n (z_j − z̄_n)². From assumption (a), the expectation term is independent of j and may be factored outside the summation sign. The terms B_n² cancel, and the expectation tends to zero because the variance of ε_1 is finite and γ_n → 0 from assumption (b). We may conclude that

√n s_n(β̂_n − β) →_L N(0, σ²),   (6)

where s_n² = Σ_{j=1}^n (z_j − z̄_n)²/n.

Example 2. The randomization t-test for paired comparisons.
In a paired comparison experiment for comparing a treatment with a control, 2n experimental units are grouped into n pairs such that within each pair the units are as much alike as possible. Then for each pair, it is decided at random which member of the pair receives the treatment and which serves as control. We let (X_j, Y_j) represent the resulting measurements on the jth pair, for j = 1, ..., n, with X_j being the result of the treatment and Y_j being the result of the control.

The usual paired comparison t test for comparing treatment and control is based on the assumption that the differences, Z_j = X_j − Y_j, are independent and identically distributed with finite second moment. The hypothesis H_0 of no difference between treatment and control becomes the hypothesis that the distribution of the Z_j is symmetric about zero. The usual test of H_0 is based on the one-sample t statistic, t = √(n−1) Z̄_n/s_z = √(n−1)(X̄_n − Ȳ_n)/s_z, where s_z is the standard deviation of the sample, s_z² = (1/n) Σ_1^n (Z_j − Z̄_n)². Under the hypothesis that the Z_j are i.i.d. normally distributed, t has a t distribution with n − 1 degrees of freedom. When normality is in doubt, a randomization test may be used for this problem. This test is based solely on the fact that the assignment of treatment and control to the pairs is made independently and at random. The analysis of the test is done conditionally on the observed values of the Z_j. Because of this, the random variables Z_j, conditional on the values |Z_j| = |z_j|, are independent under H_0 with P(Z_j = +|z_j|) = P(Z_j = −|z_j|) = ½. Thus under H_0, the vector (Z_1, ..., Z_n) has 2^n equally likely possible values, (±|z_1|, ..., ±|z_n|). Any statistic based on (Z_1, ..., Z_n) has at most 2^n values as well. The randomization t test uses the one-sample t statistic, t = √(n−1) Z̄_n/s_z.
The rejection criterion is not based on the t distribution but rather on the discrete distribution generated by these 2^n equally likely values of (Z_1, ..., Z_n). For example, testing against one-sided alternatives, one computes the t statistic for all 2^n values and rejects H_0 if the observed t falls in the upper 100α% of them. For small values of n, this distribution can easily be tabled by a computer by evaluating the t statistic at each of these 2^n values. For large values of n, other methods must be employed. One method is the Monte Carlo method of approximate randomization, where a random sample of a few hundred is drawn from the distribution and the observed t statistic is compared to the sample. The method we use here looks at the large sample distribution of the t statistic under the randomization hypothesis.

First, consider the randomization test applied to the statistic Z̄_n. We show that if the z_j satisfy the condition

max_{j≤n} z_j² / Σ_{j=1}^n z_j² → 0 as n → ∞,   (7)

then √n Z̄_n/σ_n →_L N(0, 1) under H_0, where σ_n² = (1/n) Σ_1^n z_j². We let X_nj = Z_j in the notation of the Lindeberg–Feller Theorem. Then EX_nj = 0 and var X_nj = z_j², so that B_n² = Σ_1^n z_j². We show Σ_1^n Z_j/B_n →_L N(0, 1) by checking the Lindeberg Condition. Because |X_nj| is degenerate at |z_j|,

(1/B_n²) Σ_{j=1}^n E{X²_nj I(|X_nj| ≥ εB_n)} = (1/B_n²) Σ_{j=1}^n z_j² I(|z_j| ≥ εB_n) ≤ I(max_{j≤n} z_j²/B_n² ≥ ε²).   (8)

From condition (7), this is equal to zero for all n sufficiently large. Thus, √n Z̄_n/σ_n →_L N(0, 1).

The randomization t test is based on the t statistic rather than Z̄_n. However, one can show that these two randomization tests are equivalent. This is because t = √(n−1) Z̄_n/s_z is an increasing function of v = √n Z̄_n/σ_n. To see this, note that t and v always have the same sign, and that v² = nZ̄_n²/σ_n² = nZ̄_n²/(s_z² + Z̄_n²), so that t² = (n−1)Z̄_n²/s_z² is an increasing function of v². The conclusion to be drawn from this is that the randomization t test is asymptotically normal and has asymptotically the same cutoff points as the usual t test provided (7) is satisfied.
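The Monte Carlo method of approximate randomization mentioned above is easy to sketch in code. This is an illustration only, and it uses Z̄_n as the test statistic rather than t, which by the monotonicity argument gives an equivalent one-sided test:

```python
import random

def randomization_pvalue(z, reps=4000, seed=0):
    # Approximate randomization: sample from the 2^n equally likely sign
    # assignments of (|z_1|, ..., |z_n|) and count how often the resampled
    # mean is at least the observed mean (one-sided p-value).
    rng = random.Random(seed)
    n = len(z)
    obs = sum(z) / n
    count = 0
    for _ in range(reps):
        zbar = sum(rng.choice((-1, 1)) * abs(v) for v in z) / n
        if zbar >= obs:
            count += 1
    return count / reps
```

With ten uniformly positive differences the p-value is near 2^{−10}, since only the all-plus assignment matches the observed mean; with differences balanced around zero it is large, as the null distribution intends.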
This result can be considered as a nonparametric justification of the usual t test for paired comparisons when the sample size is large.

Example 3. The signed-rank test for paired comparisons. One may also apply the signed-rank test to this problem. This test, like the randomization t test, is based on the assumption that under H_0 the random variables Z_j, conditional on the values |Z_j| = |z_j|, are independent with P(Z_j = +|z_j|) = P(Z_j = −|z_j|) = ½. The signed-rank statistic is defined as follows. Let R_j denote the rank of |z_j| in the ranking of |z_1|, ..., |z_n| from smallest to largest. (We assume that all the |z_j| are different and that no |z_j| is zero.) Then the signed-rank statistic, W_+, is the sum of the ranks R_j for those Z_j that are positive:

W_+ = Σ_{j=1}^n R_j I(Z_j > 0).   (9)

If we reorder the subscripts of the Z_j so that 0 < |z_1| < |z_2| < ⋯ < |z_n|, then we have W_+ = Σ_{j=1}^n j I(Z_j > 0). Because under H_0 the I(Z_j > 0) are i.i.d. Bernoulli variables equally likely to be zero as one, we find

EW_+ = n(n + 1)/4 and var W_+ = n(n + 1)(2n + 1)/24.   (10)

To show asymptotic normality of (W_+ − EW_+)/√(var W_+), note that W_+ = Σ_1^n j I(Z_j > 0) may be reduced to a form of the randomization test based on Z̄_n. If W denotes the sum of the ranks of the positive Z_j minus the sum of the ranks of the negative Z_j, then (assuming no z_j = 0), W = Σ_1^n j I(Z_j > 0) − Σ_1^n j I(Z_j < 0) = 2W_+ − Σ_1^n j. This shows that W is linearly related to W_+. But (1/n)W is exactly of the form of Z̄_n of the randomization test with |z_j| = j. We merely have to show that the sequence |z_j| = j satisfies (7). This follows because max_{j≤n} j² = n², and Σ_1^n j² = n(n + 1)(2n + 1)/6. We may conclude that W, and hence W_+, are asymptotically normal; (W_+ − EW_+)/√(var W_+) →_L N(0, 1).

Improving the Approximation

The convergence in the Central Limit Theorem is not uniform in the underlying distribution.
For any fixed sample size n, there are distributions for which the normal distribution approximation to the distribution function of √n(X̄_n − μ)/σ is arbitrarily poor. However, there is an upper bound, due to Berry (1941) and Esseen (1942), to the error of the Central Limit Theorem approximation that shows the convergence is uniform for the class of distributions for which E|X − μ|³/σ³ is bounded above by a finite bound. We state this theorem without proof in one dimension.

Berry–Esseen Theorem. If X_1, X_2, ..., X_n are i.i.d. with mean μ, variance σ² > 0, and absolute third moment ρ = E|X − μ|³ < ∞, then

|F_n(x) − Φ(x)| ≤ cρ/(√n σ³)

for all x and all n, where F_n denotes the distribution function of √n(X̄_n − μ)/σ and c is a universal constant.

A better approximation to F_n(x) than Φ(x) may often be obtained by taking more terms of an expansion into account. The Edgeworth Expansion is

F_n(x) = Φ(x) − φ(x)[ (β_1/(6√n))(x² − 1) + (β_2/(24n))(x³ − 3x) + (β_1²/(72n))(x⁵ − 10x³ + 15x) ] + o(1/n),   (12)

where φ(x) denotes the density of the standard normal distribution, β_1 is the coefficient of skewness, and β_2 is the coefficient of kurtosis. Assuming the fourth moment exists, it is valid under the condition that

lim sup_{|t|→∞} |E exp(itX)| < 1.   (13)

This condition is known as Cramér's Condition. It holds, in particular, if the underlying distribution has a nonzero absolutely continuous component. The expansion to the term involving 1/√n is valid if the third moment exists, provided only that the underlying distribution is nonlattice, and even for lattice distributions it is valid provided a correction for continuity is made. See Feller (Vol. 2, Chap. XVI) for details.

Let us inspect this approximation. If we stop at the first term, F_n(x) ≈ Φ(x), we have the approximation given by the Central Limit Theorem. The next term is of order 1/√n and represents a correction for skewness, since this term is zero if β_1 = 0. In particular, if the underlying distribution is symmetric, the Central Limit Theorem approximation is accurate up to terms of order 1/n. The remaining term is a correction for kurtosis (and skewness) of order 1/n.

The Edgeworth Expansion is an asymptotic expansion, which means that continuing with further terms in the expansion with n fixed may not converge. In particular, expanding to further terms for fixed n may make the accuracy worse. There are a number of books treating the more advanced theory of Edgeworth and allied expansions.
The review by Bhattacharya (1990), treats the more mathematical aspects of the theory and the book of Barndorff-Nielsen and Cox (1989) the more statistical. t a tt e be re of the vxima- steris- worth Feige- (12) coetti- where . This of the ag the sat (La) ar, if ————— i i Central Limit Theorems 33 ‘Table 1. Normal and Edgeworth approximations of the normalized ‘mean of a sample of size 5 from an exponential distribution x oD Ea) EQ Exact 20 002 0.001 =0.007 0,000 -18 0.036 0.010 0.000 0,003 -16 0.055 0.029 ory ls -14 0.081 0.059 0.047 0.042 -12 04S 0.102 0.091 0.086 =10 0.159 0.159 0.51 0.47 08 0212 0227 0.223 0221 -06 = 0274 0.306 0.305 0.305 -04 0345 0391 0392 0392 -02 42h 0477 0478 0478. 00 = 0500 0559 0559 0560 04 0655 0.702 0700 (0701 06 OT 0.788 0758 0758 08 0.788 0.804 0808 0.807 10 O84 0841 0849 0847 120.885 087 0883 0.881 14 0919 0.898 0.910 0.908 16 0.945 os19 0.931 0929 18 (0.964 0.938 0.947 0.946 20 0977 0953 0.959 0.959 Hall (1992) is concerned with the application of Edgeworth Expansion to the bootstrap. We conclude with a simple example to illustrate the improvement accuracy afforded hy the Edgeworth Expansion. Suppose that n = 5 and that X,,X,,...,X, is a sample from the exponential distribution with density expt —x) on (0,2) For this distribution, w= 1, 0? = 1, B, = 2, and B, = 6. In Table 1, the exact values of F,(z) may be compared with the normal approximation, (x), and the Edgeworth Expansions to terms of order 1/ Vn and 1/n, denoted by E(x) and E,(x), respectively. The exact values may be obtained from the x? distribution with 10 degrees of freedom, normalized to have mean 0 and variance 1. It may be seen that the normal approximation is only moderately good, being otf by 0.060 at x = 0. The approximation £,(x) is much better, the maximum error having been reduced to .018, occurring at x= —14. Finally, the approximation £,(x) is remarkably good, having a manivns error of 0,005 at x = ~12. 
The negative values of E_1 and E_2 may be replaced by zeros.

EXERCISES

1. (a) If X_1, X_2, ... are i.i.d. in R² with the distribution giving probability θ_1 to the point (1, 0)′, θ_2 to (0, 1)′, and 1 − θ_1 − θ_2 to (0, 0)′, where θ_1 ≥ 0, θ_2 ≥ 0, and θ_1 + θ_2 ≤ 1, what is the asymptotic distribution of X̄_n given by the Central Limit Theorem?
(b) Let X_1, X_2, ..., X_n be a sample from the Poisson distribution with density f(x|θ) = e^{−θ}θ^x/x! for x = 0, 1, ..., and let Z_n be the proportion of zeros observed, Z_n = (1/n) Σ_{j=1}^n I(X_j = 0). Find the joint asymptotic distribution of (X̄_n, Z_n).
2. Let X_1, X_2, ... be independent and suppose that X_n = +√n with probability ½ and X_n = −√n with probability ½, for n = 1, 2, .... Find the asymptotic distribution of X̄_n. (Check the Lindeberg Condition.)
3. Show that the Lindeberg–Feller Theorem implies the Central Limit Theorem in one dimension.
4. Give a counterexample to the conjecture: If X_1, X_2, ... are independent random variables, and EX_j = 0 and var X_j = 1 for all j, then √n X̄_n →_L N(0, 1). (Consider distributions of the form P(X_j = v_j) = p_j/2, P(X_j = −v_j) = p_j/2, and P(X_j = 0) = 1 − p_j, for some numbers v_j and p_j.)
5. (All the applications of this section may be based on the following special case of the Lindeberg–Feller Theorem.) Suppose X_1, X_2, ... are i.i.d. random variables with mean μ and variance σ². Let T_n = Σ_{j=1}^n z_nj X_j, where the z_nj are given numbers. Let μ_n = ET_n and σ_n² = var T_n. Using the Lindeberg–Feller Theorem, show that (T_n − μ_n)/σ_n →_L N(0, 1) provided max_{j≤n} z²_nj/Σ_{j=1}^n z²_nj → 0 as n → ∞.
6. Records. Let Z_1, Z_2, ... be i.i.d. continuous random variables. We say a record occurs at k if Z_k > max_{j<k} Z_j. Let R_k = 1 if a record occurs at k, and let R_k = 0 otherwise. Then R_1, R_2, ... are independent Bernoulli random variables with P(R_k = 1) = 1 − P(R_k = 0) = 1/k for k = 1, 2, .... Let S_n = Σ_1^n R_k denote the number of records in the first n observations. Find ES_n and var S_n, and show that (S_n − ES_n)/√(var S_n) →_L N(0, 1).
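Exercise 6 invites a check by simulation: since the R_k are independent Bernoulli(1/k), we have ES_n = Σ_{k=1}^n 1/k (about 5.19 for n = 100). A sketch:

```python
import random

def count_records(zs):
    # Number of indices k with z_k strictly larger than everything before it.
    best = float("-inf")
    count = 0
    for z in zs:
        if z > best:
            best = z
            count += 1
    return count

def mean_records(n, reps=3000, seed=0):
    # Monte Carlo estimate of E S_n for i.i.d. continuous observations.
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        total += count_records([rng.random() for _ in range(n)])
    return total / reps
```

An increasing sequence is all records and a decreasing one has a single record, and the Monte Carlo mean for n = 100 should sit near the harmonic sum Σ_{k≤100} 1/k ≈ 5.19.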
(The distribution of S_n is also the distribution of the number of cycles in a random permutation.)
7. Kendall's τ. Let Z_1, Z_2, ... be i.i.d. continuous random variables, and let X_k denote the number of the Z_i for i < k that are out of order, that is, have values greater than Z_k: X_k = Σ_{i=1}^{k−1} I(Z_i > Z_k). It is known that the X_k are independent random variables and that X_k is uniformly distributed on the set {0, 1, ..., k − 1}. The statistic T_n = Σ_1^n X_k represents the total number of discrepancies in the ordering. It is zero if the observations are in increasing order, and it takes on its maximum value of Σ_1^n (k − 1) = n(n − 1)/2 when the observations are in decreasing order. It may be used as a nonparametric test of randomness against the hypothesis that there is a trend in the observations, increasing or decreasing. The statistic, τ_n = 1 − 4T_n/(n(n − 1)), is always between −1 and +1 and is called Kendall's coefficient of rank correlation. It is a measure of agreement between two rankings of n objects. Find ET_n and var T_n, and show that (T_n − ET_n)/√(var T_n) →_L N(0, 1).
8. If X_1, X_2, ..., X_n are i.i.d. in R¹ with distribution giving probability ½ each to −1 and +1, find c_n for n = 1 and 2 such that sup_x |F_n(x) − Φ(x)| = c_n ρ/(√n σ³). What do you conjecture for lim_{n→∞} c_n? (Use Stirling's formula: n! ≈ (n^n/e^n)√(2πn).) What does this say about the constant c in the Berry–Esseen Theorem?
9. Show that if X_1, X_2, ..., X_n is a sample from a distribution with coefficient of skewness β_1 and coefficient of kurtosis β_2, then the coefficient of skewness β_{1n} and the coefficient of kurtosis β_{2n} of S_n = X_1 + X_2 + ⋯ + X_n are given by β_{1n} = β_1/√n and β_{2n} = β_2/n. Conclude that Table 1 also represents the Edgeworth Expansion approximations for the mean of a sample of size 10 from the χ² distribution with 1 degree of freedom, or the Edgeworth Expansion approximations for a sample of size 1 from the χ²
distribution with 10 degrees of freedom.
10. Suppose that X_1, X_2, X_3 is a sample of size 3 from the uniform distribution on (0, 1). Compare the exact probability, P(X_1 + X_2 + X_3 ≤ 2), to its normal and Edgeworth approximations.

Part 2
Basic Statistical Large Sample Theory

6
Slutsky Theorems

A common problem in large sample theory is the following. Given a sequence of random vectors, (X_n), and given its limit law, say X_n →_L X, find the limiting distribution of f(X_n) for a given function, f(x). The Slutsky Theorems provide a powerful technique for attacking this problem. For example, it gives a simple method for showing that the t-statistic for a sample from a distribution with finite variance is asymptotically normal, as we shall see.

Theorem 6. (a) If X_n →_L X, and if f: R^d → R^k is such that P(X ∈ C(f)) = 1, where C(f) is the continuity set of f, then f(X_n) →_L f(X).
(b) If X_n →_L X and (X_n − Y_n) →_P 0, then Y_n →_L X.
(c) If X_n ∈ R^d, Y_n ∈ R^k, X_n →_L X, and Y_n →_P c, then (X_n, Y_n)′ →_L (X, c)′.

Note: We say X_n and Y_n are asymptotically equivalent if (X_n − Y_n) →_P 0. Thus, part (b) states that asymptotically equivalent sequences have the same limit laws.

Example 1. Suppose X_n →_L X ∈ N(0, 1). Then, using f(x) = x², part (a) gives X_n² →_L X², because f is continuous. Since X² ∈ χ_1² when X ∈ N(0, 1), we have X_n² →_L χ_1².

Example 2. If X_n →_L X ∈ N(0, 1), then 1/X_n →_L Z, where Z has the distribution of 1/X, even though the function f(x) = 1/x is not continuous at 0, because P(X = 0) = 0. Z has the reciprocal normal distribution with density

f(z) = (1/(√(2π) z²)) exp{−1/(2z²)}.

Example 3. However, if X_n = 1/n and

f(x) = 1 if x > 0, 0 if x ≤ 0,

then X_n →_L 0, but f(X_n) does not converge in law to f(0).

Example 4. Part (c) cannot be improved by assuming Y_n →_L Y and concluding (X_n, Y_n)′ →_L (X, Y)′. For example, if X ∈ N(0, 1) and X_n = X for all n, and Y_n = X for n odd and Y_n = −X for n even, then X_n →_L X and Y_n →_L N(0, 1), yet (X_n, Y_n)′ does not converge in law.

Example 5. Suppose X_n →_L X and Y_n →_P c.
Does X_n + Y_n →_L X + c? First we note from (c) that (X_n, Y_n)′ →_L (X, c)′, and then from (a) with f(x, y) = x + y that X_n + Y_n →_L X + c. This combination of (a) and (c) is worth stating as a corollary.

Corollary. If X_n ∈ R^d, Y_n ∈ R^k, X_n →_L X, Y_n →_P c, and f: R^{d+k} → R^m is such that P((X, c)′ ∈ C(f)) = 1, then f(X_n, Y_n) →_L f(X, c).

This follows directly from (a) and (c).

Example 6. If X_n →_L X and Y_n →_P c, then Y_n′X_n →_L c′X. In one dimension, if c ≠ 0 and X_n →_L X and Y_n →_P c, then X_n/Y_n →_L X/c. In this last, we are using the function

f(x, y) = x/y if y ≠ 0, 0 if y = 0,

which is discontinuous at all points of the line y = 0. However, the distribution of (X, c)′ gives mass 0 to this line if c ≠ 0, so the result follows from the corollary.

Proof of Theorem 6. (a) Let g: R^k → R be bounded and continuous. From Theorem 3, it is sufficient to show that Eg(f(X_n)) → Eg(f(X)). Let h(x) = g(f(x)). Then a point of continuity of f is also a point of continuity of h; that is, C(f) ⊂ C(h), so from Theorem 3(d), Eg(f(X_n)) = Eh(X_n) → Eh(X) = Eg(f(X)).

(b) Let g be continuous vanishing outside a compact set. From Theorem 3(b), it is sufficient to show that Eg(Y_n) → Eg(X). Because g is uniformly continuous, given ε > 0, we may find δ > 0 such that |x − y| < δ implies |g(x) − g(y)| < ε. Also g is bounded, say |g(x)| ≤ B. Then

|Eg(Y_n) − Eg(X)| ≤ E|g(Y_n) − g(X_n)| + |Eg(X_n) − Eg(X)|
  ≤ ε + 2B P(|X_n − Y_n| > δ) + |Eg(X_n) − Eg(X)| → ε.

Since ε is arbitrary, Eg(Y_n) → Eg(X).

(c) |(X_n, Y_n)′ − (X_n, c)′| = |Y_n − c| →_P 0. So from (b), it is sufficient to show that (X_n, c)′ →_L (X, c)′. But if g(x, y) is continuous and bounded, Eg(X_n, c) → Eg(X, c) because X_n →_L X. ∎

Asymptotic Normality of the t Statistic. If X_1, X_2, ... is a sample from a distribution with mean μ and variance σ² > 0 (on the real line), then X̄_n →_P μ and (1/n) Σ_1^n X_j² →_P EX² from the Law of Large Numbers, so from the corollary,

s_n² = (1/n) Σ_1^n X_j² − X̄_n² →_P EX² − μ² = σ².

In addition, √n(X̄_n − μ)/σ →_L N(0, 1) from the Central Limit Theorem.
Hence, again from the corollary,

√n(X̄_n − μ)/s_n →_L N(0, 1).

The left side is defined as zero (or anything else) if s_n = 0, as in Example 7. From this it follows that the t statistic is asymptotically normal,

t_{n−1} = √(n − 1)(X̄_n − μ)/s_n →_L N(0, 1).

The Slutsky Theorems for convergence in probability are quite analogous to Theorem 6, but part (c) can be strengthened:

Theorem 6′. (a) If X_n →_P X, and if f: R^d → R^k is such that P(X ∈ C(f)) = 1, then f(X_n) →_P f(X).
(b) If X_n →_P X and X_n − Y_n →_P 0, then Y_n →_P X.
(c) If X_n →_P X and Y_n →_P Y, then (X_n, Y_n)′ →_P (X, Y)′.

The Slutsky Theorems for convergence almost surely, obtained by replacing →_P wherever it occurs in Theorem 6′ by →_a.s., are also valid and easy to prove.

EXERCISES

1. Prove Theorem 6′. Hint: For (a), use Theorem 2(d).
2. Show that if (X_n) and (Y_n) are independent, and if X_n →_L X and Y_n →_L Y, then (X_n, Y_n)′ →_L (X, Y)′, where X and Y are taken to be independent.
3. Consider the autoregressive scheme, X_n = βX_{n−1} + ε_n for n = 1, 2, 3, ..., where ε_1, ε_2, ... are i.i.d. with Eε = μ, var(ε) = σ², and −1 < β < 1. […]
4. […] (b) […] then (Y_n − EY_n)/√(var Y_n) →_L X. (c) Suppose X_n and Y_n have equal means and equal variances. Is it true that if X_n →_L X and corr(X_n, Y_n) → 1, then Y_n →_L X?
5. Show that if E(X_n − Y_n)²/var X_n → 0, then corr(X_n, Y_n) → 1. Conclude, using Exercise 4, that

(X_n − EX_n)/√(var X_n) →_L X, E(X_n − Y_n)²/var X_n → 0, and var X_n/var Y_n → 1

imply (Y_n − EY_n)/√(var Y_n) →_L X.
6. The following version of Theorem 6(b) is often useful for nonnegative random variables.
(a) Show that if X_n →_L X > 0 and X_n/Y_n →_P 1, then Y_n →_L X.
(b) Extend […]
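The Slutsky argument says nothing about how fast the t statistic approaches normality, but it is easy to watch it happen by simulation. A sketch with exponential(1) data, so μ = 1; the fraction of |t_n| ≤ 1.96 should be near Φ(1.96) − Φ(−1.96) = 0.95 for moderate n:

```python
import math
import random

def t_statistic(xs, mu):
    # t = sqrt(n − 1)(x̄ − μ)/s_n, with s_n² computed with divisor n
    # as in the text.
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    return math.sqrt(n - 1) * (xbar - mu) / math.sqrt(s2)

def t_coverage(n=100, reps=2000, seed=3):
    # Fraction of samples from exponential(1) with |t_n| ≤ 1.96.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        if abs(t_statistic(xs, 1.0)) <= 1.96:
            hits += 1
    return hits / reps
```

Even though the exponential distribution is far from normal, the coverage for n = 100 already sits close to the nominal 95%, which is exactly what the corollary promises.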
Hence, we conclude this section by studying improvements to the normal approximation that take more terms of the series expansion into account. The analysis of the asymptotic distribution of the t-statistic given in the previous section may be extended to d dimensions as follows, From the central limit theorem, we have vn (X, — 4) ~».4(0, ) where & = var(X), and from the Law of Large Numbers, some help from the Slutsky ‘Theorems, we have S, = (1/n) D}(X, — X, XX, — X,)! Ax. if E is non-singular, ‘then P(S, is nonsingular) > '1 and S;'/yn&, pw) > %°'7Y where ¥ €0,). Since ¥~ even, rey ay =40,D, we conclude that 8," Vn (X, ~ pw) 3.10,D. This is an example of a more general theorem, due to Cramér, that states that smooth functions of the sample moments are asymptotically normal, First, it is clear from the Cental Limit Theorem that the eae moments about zero, things like (1/n) Ef X), (1/n) DEX?, and G/m) Ef XPY,, are jointly asymptotically normal if the expectations of the squares of all terms exist. Then, repeated application of the following 44 Turore continu sional . atw)) = dxdc Proof. 1 is conti so for | \ Since ¥ = i Exp): 0K Exams mand distrib Solutio, Theore Note aware ity in Seco asympt distrib Wak,
