STAT603 Notes
Mathematical Statistics I
Fall Semester 1999
2 Random Variables
2.1 Measurable Functions
2.2 Probability Distribution of a Random Variable
2.3 Discrete and Continuous Random Variables
2.4 Transformations of Random Variables
4 Random Vectors
4.1 Joint, Marginal, and Conditional Distributions
4.2 Independent Random Variables
4.3 Functions of Random Vectors
4.4 Order Statistics
4.5 Multivariate Expectation
4.6 Multivariate Generating Functions
4.7 Conditional Expectation
4.8 Inequalities and Identities
5 Particular Distributions
5.1 Multivariate Normal Distributions
5.2 Exponential Family of Distributions
Acknowledgements
I would like to thank my students, Hanadi B. Eltahir, Rich Madsen, and Bill Morphet, who helped
typeset these lecture notes using LaTeX, and for their suggestions on how to improve some of the
material presented in class.
In addition, I particularly would like to thank Mike Minnotte and Dan Coster, who previously
taught this course at Utah State University, for providing me with their lecture notes and other
materials related to this course. Their lecture notes, combined with additional material from
Rohatgi (1976) and other sources listed below, form the basis of the script presented here.
The textbook required for this class is:
Rohatgi, V. K. (1976): An Introduction to Probability Theory and Mathematical Statistics,
John Wiley and Sons, New York.
A Web page dedicated to this class is accessible at:
http://www.math.usu.edu/~symanzik/teaching/1999_stat6710/stat6710.html
This course closely follows Rohatgi (1976) as described in the syllabus. Additional material originates
from the lectures of Professors Hering, Trenkler, and Gather that I attended while studying
at the Universität Dortmund, Germany, the collection of Masters and PhD Preliminary Exam
questions from Iowa State University, Ames, Iowa, and the following textbooks:
Additional definitions, integrals, sums, etc. originate from the following formula collections:
Bronstein, I. N. and Semendjajew, K. A. (1985): Taschenbuch der Mathematik (22. Auflage),
Verlag Harri Deutsch, Thun, German Democratic Republic.
Bronstein, I. N. and Semendjajew, K. A. (1986): Ergänzende Kapitel zu Taschenbuch der
Mathematik (4. Auflage), Verlag Harri Deutsch, Thun, German Democratic Republic.
Sieber, H. (1980): Mathematische Formeln - Erweiterte Ausgabe E, Ernst Klett, Stuttgart,
Germany.
Stat 6710 Mathematical Statistics I Fall Semester 1999
1 Axioms of Probability
1.1 $\sigma$-Fields
Let $\Omega$ be the sample space of all possible outcomes of a chance experiment. Let $\omega \in \Omega$ (or $x \in \Omega$)
be any outcome.
Example:
Count # of heads in $n$ coin tosses. $\Omega = \{0, 1, 2, \ldots, n\}$.
Any subset $A$ of $\Omega$ is called an event.
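Definition 1.1.1:
A class $L$ of subsets of $\Omega$ is called a field if $\Omega \in L$ and $L$ is closed under complements and finite
unions, i.e., $L$ satisfies
(i) $\Omega \in L$
(ii) $A \in L \Longrightarrow A^C \in L$
(iii) $A, B \in L \Longrightarrow A \cup B \in L$
Since $\Omega^C = \emptyset$, (i) and (ii) imply $\emptyset \in L$. Therefore, (i)': $\emptyset \in L$ [can replace (i)].
Note:
By De Morgan's Laws, (ii) and (iii) imply (iii)': $A, B \in L \Longrightarrow A \cap B \in L$ [can replace (iii)].
Proof:
$A \cap B = (A^C \cup B^C)^C$. By (ii), $A^C, B^C \in L$; by (iii), $A^C \cup B^C \in L$; and by (ii) again, $(A^C \cup B^C)^C \in L$.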
Definition 1.1.2:
A class $L$ of subsets of $\Omega$ is called a $\sigma$-field (Borel field, $\sigma$-algebra) if it is a field and closed under
countable unions, i.e.,
(iv) $\{A_n\}_{n=1}^{\infty} \in L \Longrightarrow \bigcup_{n=1}^{\infty} A_n \in L$.
Note:
(iv) implies (iii) by taking $A_n = \emptyset$ for $n \ge 3$.
Example 1.1.3:
For some $\Omega$, let $L$ contain all finite and all cofinite sets ($A$ is cofinite if $A^C$ is finite). Then $L$ is a
field. But $L$ is a $\sigma$-field iff (if and only if) $\Omega$ is finite.
Example:
$\Omega = \mathbb{Z}$. Take $A_n = \{n\}$, each finite, so $A_n \in L$. But $\bigcup_{n=1}^{\infty} A_n = \mathbb{Z}^+ \notin L$, since the set is not finite (it
is infinite) and also not cofinite ($(\bigcup_{n=1}^{\infty} A_n)^C = \mathbb{Z}_0^-$ is infinite, too).
Terminology:
A set $A \in L$ is said to be "measurable $L$".
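We often begin with a class of sets, say $a$, which may not be a field or a $\sigma$-field.
Definition 1.1.4:
The $\sigma$-field generated by $a$, $\sigma(a)$, is the smallest $\sigma$-field containing $a$, or the intersection of all
$\sigma$-fields containing $a$.
Note:
(i) Such $\sigma$-fields containing $a$ always exist (e.g., $\mathcal{P}(\Omega)$), and (ii) the intersection of an arbitrary number
of $\sigma$-fields is always a $\sigma$-field.
Proof:
(ii) Suppose $L = \bigcap_{\theta} L_{\theta}$. We have to show that conditions (i) and (ii) of Def. 1.1.1 and (iv) of Def.
1.1.2 are fulfilled:
(i) $\Omega \in L_{\theta} \ \forall \theta \Longrightarrow \Omega \in L$
(ii) Let $A \in L \Longrightarrow A \in L_{\theta} \ \forall \theta \Longrightarrow A^C \in L_{\theta} \ \forall \theta \Longrightarrow A^C \in L$
(iv) Let $A_n \in L \ \forall n \Longrightarrow A_n \in L_{\theta} \ \forall \theta \ \forall n \Longrightarrow \bigcup_n A_n \in L_{\theta} \ \forall \theta \Longrightarrow \bigcup_n A_n \in L$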
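Example 1.1.5:
$\Omega = \{0, 1, 2, 3\}$, $a = \{\{0\}\}$, $b = \{\{0\}, \{0, 1\}\}$.
What is $\sigma(a)$?
$\sigma(a)$: must include $\Omega, \emptyset, \{0\}$
also: $\{1, 2, 3\}$ by 1.1.1 (ii)
Since all unions are included, we have $\sigma(a) = \{\Omega, \emptyset, \{0\}, \{1, 2, 3\}\}$
What is $\sigma(b)$?
$\sigma(b)$: must include $\Omega, \emptyset, \{0\}, \{0, 1\}$
also: $\{1, 2, 3\}, \{2, 3\}$ by 1.1.1 (ii)
$\{0, 2, 3\}$ by 1.1.1 (iii)
$\{1\}$ by 1.1.1 (ii)
Since all unions are included, we have $\sigma(b) = \{\Omega, \emptyset, \{0\}, \{1\}, \{0, 1\}, \{2, 3\}, \{0, 2, 3\}, \{1, 2, 3\}\}$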
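The closure argument of Example 1.1.5 can be sketched in a few lines of Python. This is an illustration only and not part of the original notes: it assumes $\Omega = \{0, 1, 2, 3\}$, $a = \{\{0\}\}$ and $b = \{\{0\}, \{0, 1\}\}$ (the choice implied by the sets listed in the example), and the helper name `generate_sigma_field` is hypothetical. For a finite $\Omega$, closing under complements and pairwise unions suffices, since every countable union reduces to a finite one.

```python
from itertools import combinations


def generate_sigma_field(omega, generators):
    """Close `generators` under complements and unions within the finite set `omega`."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sets)  # snapshot, so we can add to `sets` while looping
        for s in current:
            complement = omega - s          # Def. 1.1.1 (ii)
            if complement not in sets:
                sets.add(complement)
                changed = True
        for s, t in combinations(current, 2):
            union = s | t                   # Def. 1.1.1 (iii)
            if union not in sets:
                sets.add(union)
                changed = True
    return sets


omega = {0, 1, 2, 3}                        # assumed sample space
sigma_a = generate_sigma_field(omega, [{0}])
sigma_b = generate_sigma_field(omega, [{0}, {0, 1}])
print(sorted(map(sorted, sigma_a)))
print(sorted(map(sorted, sigma_b)))
```

Under these assumptions the sketch yields four sets for $\sigma(a)$ and eight sets for $\sigma(b)$, in line with the example.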
If $\Omega$ is finite or countable, we will usually use $L = \mathcal{P}(\Omega)$. If $|\Omega| = n < \infty$, then $|L| = 2^n$.
If $\Omega$ is uncountable, $\mathcal{P}(\Omega)$ may be too large to be useful and we may have to use some smaller $\sigma$-field.
Definition 1.1.6:
If $\Omega = \mathbb{R}$, an important special case is the Borel $\sigma$-field, i.e., the $\sigma$-field generated from all
half-open intervals of the form $(a, b]$, denoted $\mathcal{B}$ or $\mathcal{B}_1$. The sets of $\mathcal{B}$ are called Borel sets.
The Borel $\sigma$-field on $\mathbb{R}^d$ ($\mathcal{B}_d$) is the $\sigma$-field generated by $d$-dimensional rectangles of the form
$\{(x_1, x_2, \ldots, x_d) \mid a_i < x_i \le b_i;\ i = 1, 2, \ldots, d\}$.
Note:
$\mathcal{B}$ contains all points: $\{x\} = \bigcap_{n=1}^{\infty} (x - \frac{1}{n}, x]$
closed intervals: $[x, y] = (x, y] + \{x\} = (x, y] \cup \{x\}$
open intervals: $(x, y) = (x, y] - \{y\} = (x, y] \cap \{y\}^C$
and semi-infinite intervals: $(x, \infty) = \bigcup_{n=1}^{\infty} (x, x + n]$
Note:
$\bigcup_{n=1}^{\infty} A_n \in L$ holds automatically if $L$ is a $\sigma$-field but it is needed as a precondition in the case that
$L$ is just a field. Property (iii) is called countable additivity.
$1 = P(\Omega) = P\left(\bigcup_{n=1}^{\infty} A_n\right)$
$= \sum_{n=1}^{\infty} P(A_n)$ (by Def. 1.1.7 (iii))
$= P(A_1) + P(A_2) + \sum_{n=3}^{\infty} P(A_n)$
$= P(A_1) + P(A_2)$ (by Th. 1.2.1 (i))
$= P(A) + P(A^C)$
$\Longrightarrow P(A^C) = 1 - P(A) \ \forall A \in L$.
(iii) By Th. 1.2.1 (ii), $P(A) = 1 - P(A^C) \Longrightarrow P(A) \le 1 \ \forall A \in L$ since $P(A^C) \ge 0$ by Def. 1.1.7 (i).
(iv) $A \cup B = (A \cap B^C) \cup (A \cap B) \cup (B \cap A^C)$. So, $A \cup B$ can be written as a union of disjoint
sets $(A \cap B^C), (A \cap B), (B \cap A^C)$:
$\Longrightarrow P(A \cup B) = P((A \cap B^C) \cup (A \cap B) \cup (B \cap A^C))$
$= P(A \cap B^C) + P(A \cap B) + P(B \cap A^C)$ (by Def. 1.1.7 (iii))
$= P(A \cap B^C) + P(A \cap B) + P(B \cap A^C) + P(A \cap B) - P(A \cap B)$
$= (P(A \cap B^C) + P(A \cap B)) + (P(B \cap A^C) + P(A \cap B)) - P(A \cap B)$
$= P(A) + P(B) - P(A \cap B)$
(v) $B = (B \cap A^C) \cup A$ where $(B \cap A^C)$ and $A$ are disjoint sets.
$P(B) = P((B \cap A^C) \cup A) = P(B \cap A^C) + P(A)$ (by Def. 1.1.7 (iii))
$\Longrightarrow P(A) = P(B) - P(B \cap A^C)$
$\Longrightarrow P(A) \le P(B)$ since $P(B \cap A^C) \ge 0$ by Def. 1.1.7 (i)
Proof:
Right side induction base:
For $n = 1$, Th. 1.2.3 right side evaluates to $P(A_1) \le P(A_1)$, which is true.
This establishes the induction base for the right side of Th. 1.2.3.
Right side induction step assumes Th. 1.2.3 right side is true for $n$ and shows that it is true for
$n + 1$:
$P\left(\bigcup_{i=1}^{n+1} A_i\right) = P\left(\left(\bigcup_{i=1}^{n} A_i\right) \cup A_{n+1}\right)$
$= P\left(\bigcup_{i=1}^{n} A_i\right) + P(A_{n+1}) - P\left(\left(\bigcup_{i=1}^{n} A_i\right) \cap A_{n+1}\right)$ (by Th. 1.2.1 (iv))
$\le P\left(\bigcup_{i=1}^{n} A_i\right) + P(A_{n+1})$ (by Def. 1.1.7 (i))
$\le \sum_{i=1}^{n} P(A_i) + P(A_{n+1})$ (by I.B.)
$= \sum_{i=1}^{n+1} P(A_i)$
For $n = 2$, Th. 1.2.3 left side evaluates to $P(A_1) + P(A_2) - P(A_1 \cap A_2) \le P(A_1 \cup A_2)$, which is
true by Th. 1.2.1 (iv).
This holds since
$P(A_1 \cup A_2 \cup A_3)$
$= P((A_1 \cup A_2) \cup A_3)$
$= P(A_1 \cup A_2) + P(A_3) - P((A_1 \cup A_2) \cap A_3)$ (by Th. 1.2.1 (iv))
$= P(A_1 \cup A_2) + P(A_3) - P((A_1 \cap A_3) \cup (A_2 \cap A_3))$
$= P(A_1) + P(A_2) - P(A_1 \cap A_2) + P(A_3) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P((A_1 \cap A_3) \cap (A_2 \cap A_3))$ (by Th. 1.2.1 (iv))
$= P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$
$\ge P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3)$ (by Def. 1.1.7 (i))
This establishes the induction base for the left side of Th. 1.2.3.
Left side induction step assumes Th. 1.2.3 left side is true for $n$ and shows that it is true for $n + 1$:
$P\left(\bigcup_{i=1}^{n+1} A_i\right) = P\left(\left(\bigcup_{i=1}^{n} A_i\right) \cup A_{n+1}\right)$
$= P\left(\bigcup_{i=1}^{n} A_i\right) + P(A_{n+1}) - P\left(\left(\bigcup_{i=1}^{n} A_i\right) \cap A_{n+1}\right)$
$\ge \sum_{i=1}^{n} P(A_i) - \sum_{i<j}^{n} P(A_i \cap A_j) + P(A_{n+1}) - P\left(\left(\bigcup_{i=1}^{n} A_i\right) \cap A_{n+1}\right)$ (by left I.B.)
$= \sum_{i=1}^{n+1} P(A_i) - \sum_{i<j}^{n} P(A_i \cap A_j) - P\left(\bigcup_{i=1}^{n} (A_i \cap A_{n+1})\right)$
$\ge \sum_{i=1}^{n+1} P(A_i) - \sum_{i<j}^{n} P(A_i \cap A_j) - \sum_{i=1}^{n} P(A_i \cap A_{n+1})$ (by Th. 1.2.3 right side)
$= \sum_{i=1}^{n+1} P(A_i) - \sum_{i<j}^{n+1} P(A_i \cap A_j)$
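Theorem 1.2.6:
If $\{A_n\}_{n=1}^{\infty}$, $A_n \in L$ and $A \in L$, then $\lim_{n \to \infty} P(A_n) = P(A)$ if 1.2.5 (i) or 1.2.5 (ii) holds.
Proof:
Part (i): Assume that 1.2.5 (i) holds.
Let $B_1 = A_1$ and $B_k = A_k - A_{k-1} = A_k \cap A_{k-1}^C \ \forall k \ge 2$.
By construction, $B_i \cap B_j = \emptyset$ for $i \ne j$.
It is $A = \bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} B_n$
and also $A_n = \bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$.
$P(A) = P\left(\bigcup_{k=1}^{\infty} B_k\right) = \sum_{k=1}^{\infty} P(B_k)$ (by Def. 1.1.7 (iii))
$= \lim_{n \to \infty} \left[\sum_{k=1}^{n} P(B_k)\right]$
$= \lim_{n \to \infty} \left[P\left(\bigcup_{k=1}^{n} B_k\right)\right]$ (again by Def. 1.1.7 (iii))
$= \lim_{n \to \infty} \left[P\left(\bigcup_{k=1}^{n} A_k\right)\right] = \lim_{n \to \infty} P(A_n)$
The last step is possible since $A_n = \bigcup_{k=1}^{n} A_k$.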
Theorem 1.2.7:
(i) Countable unions of probability 0 sets have probability 0.
(ii) Countable intersections of probability 1 sets have probability 1.
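Proof:
Part (i):
Let $\{A_n\}_{n=1}^{\infty} \in L$, $P(A_n) = 0 \ \forall n$.
$0 \le P\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} 0 = 0$,
where the first inequality holds by K.A.P. (i) and the second by Bonferroni's Inequality.
Therefore $P\left(\bigcup_{n=1}^{\infty} A_n\right) = 0$.
Part (ii):
Let $\{A_n\}_{n=1}^{\infty} \in L$, $P(A_n) = 1 \ \forall n$
$\Longrightarrow P(A_n^C) = 0 \ \forall n$ (by Th. 1.2.1 (ii))
$\Longrightarrow P\left(\bigcup_{n=1}^{\infty} A_n^C\right) = 0$ (by Th. 1.2.7 (i))
$\Longrightarrow P\left(\bigcap_{n=1}^{\infty} A_n\right) = 1$ (by De Morgan)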
Note:
If this is true, $P(A) = \frac{\text{number of } \omega_j \text{ in } A}{\text{number of } \omega_j \text{ in } \Omega}$. Therefore, to calculate such probabilities, we just need to
be able to count elements accurately.
Definition 1.3.3:
For positive integer $n$, we define $n$ factorial as $n! = n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot 2 \cdot 1 = n \cdot (n-1)!$
and $0! = 1$.
Definition 1.3.4:
For nonnegative integers $n \ge r$, we define the binomial coefficient (read as $n$ choose $r$) as
$\binom{n}{r} = \frac{n!}{r!(n-r)!} = \frac{n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot (n-r+1)}{1 \cdot 2 \cdot 3 \cdot \ldots \cdot r}$.
Note:
Most counting problems consist of drawing a fixed number of times from a set of elements (e.g.,
$\{1, 2, 3, 4, 5, 6\}$). To solve such problems, we need to know
(i) the size of the set, $n$;
(ii) the size of the sample, $r$;
(iii) whether the result will be ordered (i.e., is $\{1, 2\}$ different from $\{2, 1\}$); and
(iv) whether the draws are with replacement (i.e., can results like $\{1, 1\}$ occur?).
Theorem 1.3.5:
The number of ways to draw $r$ elements from a set of $n$, if
(i) ordered, without replacement, is $\frac{n!}{(n-r)!}$;
(ii) ordered, with replacement, is $n^r$;
(iii) unordered, without replacement, is $\frac{n!}{r!(n-r)!} = \binom{n}{r}$;
(iv) unordered, with replacement, is $\frac{(n+r-1)!}{r!(n-1)!} = \binom{n+r-1}{r}$.
Proof:
(i) $n$ choices to select 1st
$n - 1$ choices to select 2nd
...
$n - r + 1$ choices to select $r$th
By Theorem 1.3.2, there are $n \cdot (n-1) \cdot \ldots \cdot (n-r+1) = \frac{n \cdot (n-1) \cdot \ldots \cdot (n-r+1) \cdot (n-r)!}{(n-r)!} = \frac{n!}{(n-r)!}$
ways to do so.
Corollary:
The number of permutations of $n$ objects is $n!$.
(ii) $n$ choices to select 1st
$n$ choices to select 2nd
...
$n$ choices to select $r$th
By Theorem 1.3.2, there are $\underbrace{n \cdot n \cdot \ldots \cdot n}_{r \text{ times}} = n^r$ ways to do so.
(iii) We know from (i) above that there are $\frac{n!}{(n-r)!}$ ways to draw $r$ elements out of $n$ elements
without replacement in the ordered case. However, for each unordered set of size $r$, there are
$r!$ related ordered sets that consist of the same elements. Thus, there are $\frac{n!}{(n-r)!} \cdot \frac{1}{r!} = \binom{n}{r}$
ways to draw $r$ elements out of $n$ elements without replacement in the unordered case.
(iv) There is no immediate direct way to show this part. We have to come up with some extra
motivation. We assume that there are $(n - 1)$ walls that separate the $n$ bins of possible
outcomes and there are $r$ markers. If we shake everything, there are $(n - 1 + r)!$ permutations
to arrange these $(n - 1)$ walls and $r$ markers according to the Corollary. Since the $r$ markers
are indistinguishable and the $(n - 1)$ walls are also indistinguishable, we have to divide
the number of permutations by $r!$ to get rid of identical permutations where only the markers
are changed and by $(n - 1)!$ to get rid of identical permutations where only the walls are
changed. Thus, there are $\frac{(n-1+r)!}{r!(n-1)!} = \binom{n+r-1}{r}$ ways to draw $r$ elements out of $n$ elements with
replacement in the unordered case.
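The four counts of Theorem 1.3.5 can be checked against brute-force enumeration for small $n$ and $r$. The following is only a sketch added for illustration; the function names `brute_force_counts` and `formula_counts` are hypothetical, and the enumeration relies on Python's standard `itertools` generators, which happen to correspond to the four sampling schemes.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, factorial


def brute_force_counts(n, r):
    """Count the draws of r elements from {0, ..., n-1} by explicit enumeration."""
    items = range(n)
    return {
        "ordered, without replacement": sum(1 for _ in permutations(items, r)),
        "ordered, with replacement": sum(1 for _ in product(items, repeat=r)),
        "unordered, without replacement": sum(1 for _ in combinations(items, r)),
        "unordered, with replacement": sum(
            1 for _ in combinations_with_replacement(items, r)),
    }


def formula_counts(n, r):
    """Closed-form counts of Theorem 1.3.5 (i)-(iv)."""
    return {
        "ordered, without replacement": factorial(n) // factorial(n - r),
        "ordered, with replacement": n ** r,
        "unordered, without replacement": comb(n, r),
        "unordered, with replacement": comb(n + r - 1, r),
    }


# Two draws from a six-element set such as {1, ..., 6}.
print(brute_force_counts(6, 2))
print(formula_counts(6, 2))
```

For $n = 6$ and $r = 2$, both functions give 30, 36, 15, and 21 for cases (i) to (iv).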
Induction Step:
Suppose it is true for $k$. We show that it is true for $k + 1$.
$(1 + x)^{k+1} = (1 + x)^k (1 + x)$
$= \left(\sum_{r=0}^{k} \binom{k}{r} x^r\right)(1 + x)$ (by IB)
$= \sum_{r=0}^{k} \binom{k}{r} x^r + \sum_{r=0}^{k} \binom{k}{r} x^{r+1}$
$= \binom{k}{0} x^0 + \sum_{r=1}^{k} \left[\binom{k}{r} + \binom{k}{r-1}\right] x^r + \binom{k}{k} x^{k+1}$
$= \binom{k+1}{0} x^0 + \sum_{r=1}^{k} \left[\binom{k}{r} + \binom{k}{r-1}\right] x^r + \binom{k+1}{k+1} x^{k+1}$
$= \binom{k+1}{0} x^0 + \sum_{r=1}^{k} \binom{k+1}{r} x^r + \binom{k+1}{k+1} x^{k+1}$ (by (*))
$= \sum_{r=0}^{k+1} \binom{k+1}{r} x^r$
(*) Here we use Theorem 1.3.8 (i). Since the proof of Theorem 1.3.8 (i) only needs algebraic
transformations without using the Binomial Theorem, part (i) of Theorem 1.3.8 can be applied here.
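Corollary 1.3.7:
For a non-negative integer $n$, it holds:
(i) $\binom{n}{0} + \binom{n}{1} + \ldots + \binom{n}{n} = 2^n$
(ii) $\binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \binom{n}{3} + \ldots + (-1)^n \binom{n}{n} = 0$
(iii) $1 \cdot \binom{n}{1} + 2 \cdot \binom{n}{2} + 3 \cdot \binom{n}{3} + \ldots + n \cdot \binom{n}{n} = n 2^{n-1}$
(iv) $1 \cdot \binom{n}{1} - 2 \cdot \binom{n}{2} + 3 \cdot \binom{n}{3} + \ldots + (-1)^{n-1} n \cdot \binom{n}{n} = 0$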
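Proof:
Use the Binomial Theorem:
(i) Let $x = 1$. Then
$2^n = (1 + 1)^n = \sum_{r=0}^{n} \binom{n}{r} 1^r = \sum_{r=0}^{n} \binom{n}{r}$ (by Bin. Th.)
(ii) Let $x = -1$. Then
$0 = (1 + (-1))^n = \sum_{r=0}^{n} \binom{n}{r} (-1)^r$ (by Bin. Th.)
(iii) $\frac{d}{dx} (1 + x)^n = \frac{d}{dx} \sum_{r=0}^{n} \binom{n}{r} x^r$
$\Longrightarrow n(1 + x)^{n-1} = \sum_{r=1}^{n} r \binom{n}{r} x^{r-1}$
Substitute $x = 1$, then
$n 2^{n-1} = n(1 + 1)^{n-1} = \sum_{r=1}^{n} r \binom{n}{r}$
(iv) Substitute $x = -1$ in (iii) above, then
$0 = n(1 + (-1))^{n-1} = \sum_{r=1}^{n} r \binom{n}{r} (-1)^{r-1}$,
and hence also $\sum_{r=1}^{n} r \binom{n}{r} (-1)^{r} = 0$, since for $\sum a_i = 0$ also $\sum (-a_i) = 0$.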
Note:
A useful extension for the binomial coefficient for $n < r$ is
$\binom{n}{r} = \frac{n \cdot (n-1) \cdot \ldots \cdot 0 \cdot \ldots \cdot (n-r+1)}{1 \cdot 2 \cdot \ldots \cdot r} = 0$.
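Theorem 1.3.8:
For non-negative integers $n, m, r$, it holds:
(i) $\binom{n-1}{r} + \binom{n-1}{r-1} = \binom{n}{r}$
(ii) $\binom{n}{0}\binom{m}{r} + \binom{n}{1}\binom{m}{r-1} + \ldots + \binom{n}{r}\binom{m}{0} = \binom{m+n}{r}$
(iii) $\binom{0}{r} + \binom{1}{r} + \binom{2}{r} + \ldots + \binom{n}{r} = \binom{n+1}{r+1}$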
Proof:
Homework
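Before attempting the proof, the three identities of Theorem 1.3.8 can be sanity-checked numerically. The sketch below is an added illustration, not a proof; the function names are hypothetical. It uses `math.comb`, which returns 0 when the lower index exceeds the upper one, matching the extension of the binomial coefficient to $n < r$ given in the Note above. The checks are restricted to $n, r \ge 1$ so that no negative arguments occur.

```python
from math import comb


def identity_i(n, r):
    """Theorem 1.3.8 (i): C(n-1, r) + C(n-1, r-1) = C(n, r), for n, r >= 1."""
    return comb(n - 1, r) + comb(n - 1, r - 1) == comb(n, r)


def identity_ii(n, m, r):
    """Theorem 1.3.8 (ii): sum_k C(n, k) C(m, r-k) = C(m+n, r)."""
    return sum(comb(n, k) * comb(m, r - k) for k in range(r + 1)) == comb(m + n, r)


def identity_iii(n, r):
    """Theorem 1.3.8 (iii): C(0, r) + C(1, r) + ... + C(n, r) = C(n+1, r+1)."""
    return sum(comb(k, r) for k in range(n + 1)) == comb(n + 1, r + 1)


all_hold = all(
    identity_i(n, r) and identity_ii(n, m, r) and identity_iii(n, r)
    for n in range(1, 9) for m in range(1, 9) for r in range(1, 9)
)
print(all_hold)
```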
Definition 1.4.1:
Given $(\Omega, L, P)$ and $H \in L$, $P(H) > 0$, and $A \in L$, we define
$P(A|H) = \frac{P(A \cap H)}{P(H)} = P_H(A)$
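Theorem 1.4.2:
In the situation of Definition 1.4.1, $(\Omega, L, P_H)$ is a probability space.
Proof:
If $P_H$ is a probability measure, it must satisfy Def. 1.1.7.
(i) $P_H(A) = \frac{P(A \cap H)}{P(H)} \ge 0$ since $P(A \cap H) \ge 0$ by Def. 1.1.7 (i) and $P(H) > 0$.
(ii) $P_H(\Omega) = \frac{P(\Omega \cap H)}{P(H)} = \frac{P(H)}{P(H)} = 1$
(iii) Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of disjoint sets. Then,
$P_H\left(\bigcup_{n=1}^{\infty} A_n\right) = \frac{P\left(\left(\bigcup_{n=1}^{\infty} A_n\right) \cap H\right)}{P(H)}$ (by Def. 1.4.1)
$= \frac{P\left(\bigcup_{n=1}^{\infty} (A_n \cap H)\right)}{P(H)}$
$= \frac{\sum_{n=1}^{\infty} P(A_n \cap H)}{P(H)}$ (by Def. 1.1.7 (iii))
$= \sum_{n=1}^{\infty} \frac{P(A_n \cap H)}{P(H)}$
$= \sum_{n=1}^{\infty} P_H(A_n)$ (by Def. 1.4.1)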
Note:
What we have done is to move to a new sample space $H$ and a new $\sigma$-field $L_H = L \cap H$ of subsets
$A \cap H$ for $A \in L$. We thus have a new measurable space $(H, L_H)$ and a new probability space
$(H, L_H, P_H)$.
Note:
From Definition 1.4.1, if $A, B \in L$, $P(A) > 0$, and $P(B) > 0$, then
$P(A \cap B) = P(A) P(B|A) = P(B) P(A|B)$,
which generalizes to
Theorem 1.4.3: Multiplication Rule
If $A_1, \ldots, A_n \in L$ and $P\left(\bigcap_{j=1}^{n-1} A_j\right) > 0$, then
$P\left(\bigcap_{j=1}^{n} A_j\right) = P(A_1) \cdot P(A_2|A_1) \cdot P(A_3|A_1 \cap A_2) \cdot \ldots \cdot P\left(A_n \Big| \bigcap_{j=1}^{n-1} A_j\right)$.
Proof:
Homework
Definition 1.4.4:
A collection of subsets $\{A_n\}_{n=1}^{\infty}$ of $\Omega$ forms a partition of $\Omega$ if
(i) $\bigcup_{n=1}^{\infty} A_n = \Omega$, and
(ii) $A_i \cap A_j = \emptyset \ \forall i \ne j$, i.e., elements are pairwise disjoint.
Theorem 1.4.5: Law of Total Probability
If $\{H_j\}_{j=1}^{\infty}$ is a partition of $\Omega$, and $P(H_j) > 0 \ \forall j$, then, for $A \in L$,
$P(A) = \sum_{j=1}^{\infty} P(A \cap H_j) = \sum_{j=1}^{\infty} P(H_j) P(A|H_j)$.
Proof:
By the Note preceding Theorem 1.4.3, the summands on both sides are equal
$\Longrightarrow$ the right side of Th. 1.4.5 is true.
Proof:
$P(H_j \cap A) = P(A) \cdot P(H_j|A) = P(H_j) \cdot P(A|H_j)$ (by Def. 1.4.1)
$\Longrightarrow P(H_j|A) = \frac{P(H_j) P(A|H_j)}{P(A)} = \frac{P(H_j) P(A|H_j)}{\sum_{n=1}^{\infty} P(H_n) P(A|H_n)}$ (by Th. 1.4.5).
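As a small worked illustration of the Law of Total Probability and of the formula for $P(H_j|A)$ derived above, consider a partition into three events. The numbers below are made up for this sketch and do not come from the notes; the function names `total_probability` and `posterior` are likewise hypothetical. Exact fractions are used so that the results can be compared without rounding.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(A) = sum_j P(H_j) P(A|H_j) for a partition {H_j}."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def posterior(priors, likelihoods):
    """P(H_j|A) = P(H_j) P(A|H_j) / P(A) for every j."""
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]


# Hypothetical inputs: P(H_1), P(H_2), P(H_3) and P(A|H_1), P(A|H_2), P(A|H_3).
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]

print(total_probability(priors, likelihoods))
print(posterior(priors, likelihoods))
```

With these inputs, $P(A) = 21/1000$, and the conditional probabilities $P(H_j|A)$ are $5/21$, $2/7$, and $10/21$, which sum to 1 as they must.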
Definition 1.4.7:
For $A, B \in L$, $A$ and $B$ are independent iff $P(A \cap B) = P(A) P(B)$.
Note:
There are no restrictions on $P(A)$ or $P(B)$.
If $A$ and $B$ are independent, then $P(A|B) = P(A)$ (given that $P(B) > 0$) and $P(B|A) = P(B)$
(given that $P(A) > 0$).
If $A$ and $B$ are independent, then the following events are independent as well: $A$ and $B^C$;
$A^C$ and $B$; $A^C$ and $B^C$.
Definition 1.4.8:
Let $\mathcal{A}$ be a collection of $L$-sets. The events of $\mathcal{A}$ are pairwise independent iff for every distinct
$A_1, A_2 \in \mathcal{A}$ it holds $P(A_1 \cap A_2) = P(A_1) P(A_2)$.
Definition 1.4.9:
Let $\mathcal{A}$ be a collection of $L$-sets. The events of $\mathcal{A}$ are mutually independent (or completely
independent) iff for every finite subcollection $\{A_{i_1}, \ldots, A_{i_k}\}$, $A_{i_j} \in \mathcal{A}$, it holds
$P\left(\bigcap_{j=1}^{k} A_{i_j}\right) = \prod_{j=1}^{k} P(A_{i_j})$.
Note:
To check for mutual independence of $n$ events $\{A_1, \ldots, A_n\} \in L$, there are $2^n - n - 1$ relations
(i.e., all subcollections of size 2 or more) to check.
Example 1.4.10:
Flip a fair coin twice. $\Omega = \{HH, HT, TH, TT\}$.
$A_1$ = "H on 1st toss"
$A_2$ = "H on 2nd toss"
$A_3$ = "Exactly one H"
Obviously, $P(A_1) = P(A_2) = P(A_3) = \frac{1}{2}$.
Question: Are $A_1, A_2$ and $A_3$ pairwise independent and also mutually independent?
$P(A_1 \cap A_2) = .25 = .5 \cdot .5 = P(A_1) \cdot P(A_2) \Rightarrow A_1, A_2$ are independent.
$P(A_1 \cap A_3) = .25 = .5 \cdot .5 = P(A_1) \cdot P(A_3) \Rightarrow A_1, A_3$ are independent.
$P(A_2 \cap A_3) = .25 = .5 \cdot .5 = P(A_2) \cdot P(A_3) \Rightarrow A_2, A_3$ are independent.
Thus, $A_1, A_2, A_3$ are pairwise independent.
$P(A_1 \cap A_2 \cap A_3) = 0 \ne .5 \cdot .5 \cdot .5 = P(A_1) \cdot P(A_2) \cdot P(A_3) \Rightarrow A_1, A_2, A_3$ are not mutually independent.
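The checks in Example 1.4.10 can be reproduced by enumerating the four equally likely outcomes. The sketch below is an added illustration; the helper names `prob` and `factorizes` are hypothetical. It tests the product rule for every subcollection of size 2 or more, which for $n = 3$ events amounts to $2^3 - 3 - 1 = 4$ relations.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space of two fair coin tosses: HH, HT, TH, TT (equally likely).
omega = ["".join(t) for t in product("HT", repeat=2)]


def prob(event):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(len(event), len(omega))


A1 = {w for w in omega if w[0] == "H"}         # H on 1st toss
A2 = {w for w in omega if w[1] == "H"}         # H on 2nd toss
A3 = {w for w in omega if w.count("H") == 1}   # exactly one H
events = [A1, A2, A3]


def factorizes(subcollection):
    """True if P(intersection) equals the product of the single probabilities."""
    intersection = set(omega).intersection(*subcollection)
    expected = Fraction(1)
    for event in subcollection:
        expected *= prob(event)
    return prob(intersection) == expected


pairwise = all(factorizes(pair) for pair in combinations(events, 2))
mutual = all(factorizes(sub)
             for k in range(2, len(events) + 1)
             for sub in combinations(events, k))
print(pairwise, mutual)
```

The enumeration confirms pairwise independence and rejects mutual independence, because the triple intersection is empty.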
It is
$p_1 = P(\text{at least 1 other (from } r-1 \text{) students has a birthday on this particular day})$
$= 1 - P(\text{all } (r-1) \text{ students have a birthday on the remaining 364 out of 365 days})$
$= 1 - \left(\frac{364}{365}\right)^{r-1}$
$p_2 = P$(no student has a birthday matching the first student and at least one
of the other $(r-2)$ students has a b-day matching the second student)
Let $A \equiv$ No student has a b-day matching the 1st student
Let $B \equiv$ At least one of the other $(r-2)$ has b-day matching 2nd
So $p_2 = P(A \cap B)$
$= P(A) \cdot P(B|A)$
$= P$(no student has a matching b-day with the 1st student) $\cdot$
$P$(at least one of the remaining students has a matching b-day with the second,
given that no one matched the first)
$= (1 - p_1)\left[1 - P(\text{all } (r-2) \text{ students have a b-day on the remaining 363 out of 364 days})\right]$
$= \left(\frac{364}{365}\right)^{r-1} \left[1 - \left(\frac{363}{364}\right)^{r-2}\right]$
$= \left(\frac{365 - 1}{365}\right)^{r-1} \left[1 - \left(\frac{363}{364}\right)^{r-2}\right]$
$p_3 = P$(No one has same b-day as first and no one same as second, and at least one of the
remaining $(r - 3)$ has a matching b-day with the 3rd student)
Now:
$p_3 = P(A \cap B \cap C)$
$= P(A) \cdot P(B|A) \cdot P(C|A \cap B)$
$= \left(\frac{364}{365}\right)^{r-1} \left(\frac{363}{364}\right)^{r-2} \left[1 - P(\text{all } (r-3) \text{ students have a b-day on the remaining 362 out of 363 days})\right]$
$= \left(\frac{364}{365}\right)^{r-1} \left(\frac{363}{364}\right)^{r-2} \left[1 - \left(\frac{362}{363}\right)^{r-3}\right]$
$= \frac{(364)^{r-1}}{(365)^{r-1}} \cdot \frac{(363)^{r-2}}{(364)^{r-2}} \left[1 - \left(\frac{362}{363}\right)^{r-3}\right]$
$= \left(\frac{364^{r-1}}{364^{r-2}}\right) \left(\frac{363^{r-2}}{365^{r-1}}\right) \left[1 - \left(\frac{362}{363}\right)^{r-3}\right]$
$= \left(\frac{364}{365}\right) \left(\frac{363^{r-2}}{365^{r-2}}\right) \left[1 - \left(\frac{362}{363}\right)^{r-3}\right]$
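The closed forms for $p_1$, $p_2$, and $p_3$ are easy to evaluate numerically. The following sketch simply codes the formulas as derived above, for a class of $r \ge 3$ students and 365 equally likely birthdays; the function name `birthday_match_probs` and the choice $r = 30$ are assumptions made for illustration only. It also checks that the unsimplified and simplified expressions for $p_3$ agree.

```python
import math


def birthday_match_probs(r):
    """Evaluate p1, p2, p3 from the formulas above for r >= 3 students."""
    p1 = 1 - (364 / 365) ** (r - 1)
    p2 = (364 / 365) ** (r - 1) * (1 - (363 / 364) ** (r - 2))
    p3 = ((364 / 365) ** (r - 1) * (363 / 364) ** (r - 2)
          * (1 - (362 / 363) ** (r - 3)))
    return p1, p2, p3


def p3_simplified(r):
    """Final simplified form of p3: (364/365) (363/365)^(r-2) [1 - (362/363)^(r-3)]."""
    return (364 / 365) * (363 / 365) ** (r - 2) * (1 - (362 / 363) ** (r - 3))


r = 30  # hypothetical class size
p1, p2, p3 = birthday_match_probs(r)
print(p1, p2, p3)
print(math.isclose(p3, p3_simplified(r)))
```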
2 Random Variables
2.1 Measurable Functions
Definition 2.1.1:
A random variable (rv) is a set function from $\Omega$ to $\mathbb{R}$.
More formally: Let $(\Omega, L, P)$ be any probability space. Suppose $X: \Omega \to \mathbb{R}$ and that $X$ is a
measurable function, then we call $X$ a random variable.
More generally: If $X: \Omega \to \mathbb{R}^k$, we call $X$ a random vector, $X = (X_1(\omega), X_2(\omega), \ldots, X_k(\omega))$.
Definition 2.1.2:
Suppose $(\Omega, L)$ and $(S, \mathcal{B})$ are two measurable spaces and $X: \Omega \to S$ is a mapping from $\Omega$ to $S$. We
say that $X$ is measurable $L - \mathcal{B}$ if $X^{-1}(B) \in L$ for every set $B \in \mathcal{B}$, where $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$.
Example 2.1.3:
Record the opinion of 50 people: "yes" (y) or "no" (n).
Example 2.1.4:
Let $\Omega$ = "infinite fair coin tossing space", i.e., infinite sequence of H's and T's.
Let $L_n$ be a $\sigma$-field for the 1st $n$ tosses.
Define $L = \sigma\left(\bigcup_{n=1}^{\infty} L_n\right)$.
Let $X_n: \Omega \to \mathbb{R}$ be $X_n(\omega)$ = "proportion of H's in 1st $n$ tosses".
For each $n$, $X_n(\cdot)$ is simple (values $\{0, \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n}\}$) and $X_n^{-1}(\frac{k}{n}) \in L_n \ \forall k = 0, 1, \ldots, n$.
Therefore, $X_n^{-1}(\frac{k}{n}) \in L$.
So every random variable $X_n(\cdot)$ is measurable $L - \mathcal{B}$. Now we have a sequence of rv's $\{X_n\}_{n=1}^{\infty}$. We
will show later that $P(\{\omega : X_n(\omega) \to \frac{1}{2}\}) = 1$, i.e., the Strong Law of Large Numbers (SLLN).
This means we only have to check measurability on a basis collection $\mathcal{A}$. The usage is: $\mathcal{B}$ on $\mathbb{R}$ is
generated by $\{(-\infty, x] : x \in \mathbb{R}\}$.
2.1.6:
If $(\Omega, L)$, $(\Omega', L')$, and $(\Omega'', L'')$ are measurable spaces and $X: \Omega \to \Omega'$ and $Y: \Omega' \to \Omega''$ are
measurable, then the composition $(Y X): \Omega \to \Omega''$ is measurable $L - L''$.
2.1.7:
If $f: \mathbb{R}^i \to \mathbb{R}^k$ is a continuous function, then $f$ is measurable $\mathcal{B}_i - \mathcal{B}_k$.
2.1.8:
If $f_j: \Omega \to \mathbb{R}$, $j = 1, \ldots, k$ and $g: \mathbb{R}^k \to \mathbb{R}$ are measurable, then $g(f_1(\cdot), \ldots, f_k(\cdot))$ is measurable.
The usage is: $g$ could be sum, average, difference, product, (finite) maximums and minimums of
$x_1, \ldots, x_k$, etc.
2.1.9:
Limits: Extend the real line to $[-\infty, \infty] = \mathbb{R} \cup \{-\infty, \infty\}$.
We say $f: \Omega \to [-\infty, \infty]$ is measurable $L - \mathcal{B}$ if
(i) $f^{-1}(B) \in L \ \forall B \in \mathcal{B}$, and
(ii) $f^{-1}(-\infty), f^{-1}(\infty) \in L$ also.
2.1.10:
Suppose $f_1, f_2, \ldots$ is a sequence of real-valued measurable functions $(\Omega, L) \to (\mathbb{R}, \mathcal{B})$. Then it
holds:
(i) $\sup_n f_n$, $\inf_n f_n$, $\limsup_n f_n$, $\liminf_n f_n$ are measurable.
(ii) If $f = \lim_n f_n$ exists, then $f$ is measurable.
(iii) The set $\{\omega : f_n(\omega) \text{ converges}\} \in L$.
(iv) If $f$ is any measurable function, the set $\{\omega : f_n(\omega) \to f(\omega)\} \in L$.
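Theorem 2.2.1:
A random variable $X$ on $(\Omega, L, P)$ induces a probability measure on a space $(\mathbb{R}, \mathcal{B}, Q)$ with the
probability distribution $Q$ of $X$ defined by
$Q(B) = P(X^{-1}(B)) = P(\{\omega : X(\omega) \in B\}) \ \forall B \in \mathcal{B}$.
Note:
By the definition of a random variable, $X^{-1}(B) \in L \ \forall B \in \mathcal{B}$.
Proof:
If $X$ induces a probability measure $Q$ on $(\mathbb{R}, \mathcal{B})$, then $Q$ must satisfy the Kolmogorov Axioms of
probability.
$X: (\Omega, L) \to (S, \mathcal{B})$. $X$ is a rv $\Rightarrow X^{-1}(B) = \{\omega : X(\omega) \in B\} = A \in L \ \forall B \in \mathcal{B}$.
(i) $Q(B) = P(X^{-1}(B)) = P(\{\omega : X(\omega) \in B\}) = P(A) \ge 0 \ \forall B \in \mathcal{B}$ (by Def. 1.1.7 (i))
(ii) $Q(\mathbb{R}) = P(X^{-1}(\mathbb{R})) = P(\Omega) = 1$ (the second equality since $X$ is a rv, the third by Def. 1.1.7 (ii))
(iii) Let $\{B_n\}_{n=1}^{\infty} \in \mathcal{B}$, $B_i \cap B_j = \emptyset \ \forall i \ne j$. Then,
$Q\left(\bigcup_{n=1}^{\infty} B_n\right) = P\left(X^{-1}\left(\bigcup_{n=1}^{\infty} B_n\right)\right) \overset{(*)}{=} P\left(\bigcup_{n=1}^{\infty} X^{-1}(B_n)\right) = \sum_{n=1}^{\infty} P(X^{-1}(B_n)) = \sum_{n=1}^{\infty} Q(B_n)$,
where the third equality holds by Def. 1.1.7 (iii).
(*) holds since $X^{-1}(\cdot)$ commutes with unions/intersections and preserves disjointedness.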
Definition 2.2.2:
A real-valued function $F$ on $(-\infty, \infty)$ that is non-decreasing, right-continuous, and satisfies
$F(-\infty) = 0$, $F(\infty) = 1$
is called a cumulative distribution function (cdf) on $\mathbb{R}$.
Note:
No mention of probability space or measure $P$ in Definition 2.2.2 above.
Definition 2.2.3:
Let $P$ be a probability measure on $(\mathbb{R}, \mathcal{B})$. The cdf associated with $P$ is
$F(x) = F_P(x) = P((-\infty, x]) = P(\{\omega : X(\omega) \le x\}) = P(X \le x)$
for a random variable $X$ defined on $(\mathbb{R}, \mathcal{B}, P)$.
Note:
$F(\cdot)$ defined as in Definition 2.2.3 above indeed is a cdf.
Proof:
(i) Let $x_1 < x_2$
$\Longrightarrow (-\infty, x_1] \subset (-\infty, x_2]$
$\Longrightarrow F(x_1) = P(\{\omega : X(\omega) \le x_1\}) \le P(\{\omega : X(\omega) \le x_2\}) = F(x_2)$ (by Th. 1.2.1 (v))
Thus, since $x_1 < x_2$ and $F(x_1) \le F(x_2)$, $F(\cdot)$ is non-decreasing.
(ii) Since $F$ is non-decreasing, to show that $F(\cdot)$ is right-continuous it is sufficient to show that
$F(x_n) \to F(x)$ for any sequence of numbers $x_n \to x+$ (which means that $x_n$ is approaching $x$ from the right) with
$x_1 > x_2 > \ldots > x_n > \ldots > x$.
Let $A_n = \{\omega : X(\omega) \in (x, x_n]\} \in L$ and $A_n \downarrow \emptyset$. None of the intervals $(x, x_n]$ contains $x$. As
$x_n \to x+$, the number of points $\omega$ in $A_n$ diminishes until the set is empty. Formally,
$\lim_{n \to \infty} A_n = \lim_{n \to \infty} \bigcap_{i=1}^{n} A_i = \bigcap_{n=1}^{\infty} A_n = \emptyset$.
By Theorem 1.2.6 it follows that
$\lim_{n \to \infty} P(A_n) = P(\lim_{n \to \infty} A_n) = P(\emptyset) = 0$.
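It is
$P(A_n) = P(\{\omega : X(\omega) \le x_n\}) - P(\{\omega : X(\omega) \le x\}) = F(x_n) - F(x)$.
$\Longrightarrow (\lim_{n \to \infty} F(x_n)) - F(x) = \lim_{n \to \infty} (F(x_n) - F(x)) = \lim_{n \to \infty} P(A_n) = 0$
$\Longrightarrow \lim_{n \to \infty} F(x_n) = F(x)$
$\Longrightarrow F(x)$ is right-continuous.
(iii) $F(-n) = P(\{\omega : X(\omega) \le -n\})$ (by Def. 2.2.3)
$\Longrightarrow F(-\infty) = \lim_{n \to \infty} F(-n)$
$= \lim_{n \to \infty} P(\{\omega : X(\omega) \le -n\})$
$= P(\lim_{n \to \infty} \{\omega : X(\omega) \le -n\})$
$= P(\emptyset)$
$= 0$
Note that (iii) and (iv) implicitly use Theorem 1.2.6. In (iii), we use $A_n = (-\infty, -n)$ where
$A_n \supset A_{n+1}$ and $A_n \downarrow \emptyset$. In (iv), we use $A_n = (-\infty, n)$ where $A_n \subset A_{n+1}$ and $A_n \uparrow \mathbb{R}$.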
Definition 2.2.4:
If a random variable $X: \Omega \to \mathbb{R}$ has induced a probability measure $P_X$ on $(\mathbb{R}, \mathcal{B})$ with cdf $F(x)$,
we say
(i) rv $X$ is continuous if $F(x)$ is continuous in $x$.
(ii) rv $X$ is discrete if $F(x)$ is a step function in $x$.
Note:
There are rvs that are mixtures of continuous and discrete rvs. One such example is a truncated
failure time distribution. We assume a continuous distribution (e.g., exponential) up to a given
truncation point $x$ and assign the "remaining" probability to the truncation point. Thus, a single
point has a probability $> 0$ and $F(x)$ jumps at the truncation point $x$.
Definition 2.2.5:
Two random variables $X$ and $Y$ are identically distributed iff $P_X(X \in A) = P_Y(Y \in A) \ \forall A \in L$.
Note:
Def. 2.2.5 does not mean that $X(\omega) = Y(\omega) \ \forall \omega \in \Omega$. For example,
$X$ = # H in 3 coin tosses
$Y$ = # T in 3 coin tosses
$X, Y$ are both $Bin(3, 0.5)$, i.e., identically distributed, but for $\omega = (H, H, T)$, $X(\omega) = 2 \ne 1 = Y(\omega)$,
i.e., $X \ne Y$.
Theorem 2.2.6:
The following two statements are equivalent:
(i) $X, Y$ are identically distributed.
(ii) $F_X(x) = F_Y(x) \ \forall x \in \mathbb{R}$.
Proof:
(i) $\Rightarrow$ (ii):
$F_X(x) = P_X((-\infty, x])$
$= P(\{\omega : X(\omega) \in (-\infty, x]\})$
$= P(\{\omega : Y(\omega) \in (-\infty, x]\})$ (by Def. 2.2.5)
$= P_Y((-\infty, x])$
$= F_Y(x)$
(ii) $\Rightarrow$ (i):
Requires extra knowledge from measure theory.
Definition 2.3.1:
Let $X$ be a real-valued random variable with cdf $F$ on $(\Omega, L, P)$. $X$ is discrete if there exists a
countable set $E \subset \mathbb{R}$ such that $P(X \in E) = 1$, i.e., $P(\{\omega : X(\omega) \in E\}) = 1$. The points of $E$
which have positive probability are the jump points of the step function $F$, i.e., the cdf of $X$.
Define $p_i = P(\{\omega : X(\omega) = x_i, x_i \in E\}) = P_X(X = x_i) \ \forall i \ge 1$. Then, $p_i \ge 0$, $\sum_{i=1}^{\infty} p_i = 1$.
We call $\{p_i : p_i \ge 0\}$ the probability mass function (pmf) (also: probability frequency function)
of $X$.
Note:
Given any set of numbers $\{p_n\}_{n=1}^{\infty}$, $p_n \ge 0 \ \forall n \ge 1$, $\sum_{n=1}^{\infty} p_n = 1$, $\{p_n\}_{n=1}^{\infty}$ is the pmf of some rv $X$.
Note:
The issue of continuous rv's and probability density functions (pdfs) is more complicated. A rv
$X: \Omega \to \mathbb{R}$ always has a cdf $F$. Whether there exists a function $f$ such that $f$ integrates to $F$
and $F'$ exists and equals $f$ (almost everywhere) depends on something stronger than just continuity.
Definition 2.3.2:
A real-valued function $F$ is continuous in $x_0 \in \mathbb{R}$ iff
$\forall \epsilon > 0 \ \exists \delta > 0 \ \forall x: |x - x_0| < \delta \Rightarrow |F(x) - F(x_0)| < \epsilon$.
$F$ is continuous iff $F$ is continuous in all $x \in \mathbb{R}$.
Definition 2.3.3:
A real-valued function $F$ defined on $[a, b]$ is absolutely continuous on $[a, b]$ iff
$\forall \epsilon > 0 \ \exists \delta > 0 \ \forall$ finite subcollection of disjoint subintervals $[a_i, b_i]$, $i = 1, \ldots, n$:
$\sum_{i=1}^{n} (b_i - a_i) < \delta \Rightarrow \sum_{i=1}^{n} |F(b_i) - F(a_i)| < \epsilon$.
Note:
Absolute continuity implies continuity.
Theorem 2.3.4:
(i) If $F$ is absolutely continuous, then $F'$ exists almost everywhere.
(ii) A function $F$ is an indefinite integral iff it is absolutely continuous. Thus, every absolutely
continuous function $F$ is the indefinite integral of its derivative $F'$.
Definition 2.3.5:
Let $X$ be a random variable on $(\Omega, L, P)$ with cdf $F$. We say $X$ is a continuous rv iff $F$ is absolutely
continuous. In this case, there exists a non-negative integrable function $f$, the probability density
function (pdf) of $X$, such that
$F(x) = \int_{-\infty}^{x} f(t) \, dt = P(X \le x)$.
From this it follows that, if $a, b \in \mathbb{R}$, $a < b$, then
$P_X(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(t) \, dt$
exists and is well defined.
Theorem 2.3.6:
Let $X$ be a continuous random variable with pdf $f$. Then it holds:
(i) For every Borel set $B \in \mathcal{B}$, $P(B) = \int_{B} f(t) \, dt$.
(ii) If $F$ is absolutely continuous and $f$ is continuous at $x$, then $F'(x) = \frac{dF(x)}{dx} = f(x)$.
Proof:
Part (i): From Definition 2.3.5 above.
Part (ii): By Fundamental Theorem of Calculus.
Note:
As already stated in the Note following Definition 2.2.4, not every rv will fall into one of these two
(or, if you prefer, three, i.e., discrete, continuous/absolutely continuous) classes. However, most
rvs that arise in practice will. We look at one example that is unlikely to occur in practice in the
next Homework assignment.
However, note that every cdf $F$ can be written as
$F(x) = a F_d(x) + (1 - a) F_c(x)$, $0 \le a \le 1$,
where $F_d$ is the cdf of a discrete rv and $F_c$ is a continuous (but not necessarily absolutely continuous)
cdf.
Some authors, such as Marek Fisz, Wahrscheinlichkeitsrechnung und mathematische Statistik, VEB
Deutscher Verlag der Wissenschaften, Berlin, 1989, are even more specific. There it is stated that
every cdf $F$ can be written as
$F(x) = a_1 F_d(x) + a_2 F_c(x) + a_3 F_s(x)$, $a_1, a_2, a_3 \ge 0$, $a_1 + a_2 + a_3 = 1$.
Here, $F_d(x)$ and $F_c(x)$ are discrete and continuous cdfs (as above). $F_s(x)$ is called a singular cdf.
Singular means that $F_s(x)$ is continuous and its derivative $F_s'(x)$ equals 0 almost everywhere (i.e.,
everywhere but in those points that belong to a Borel-measurable set of probability 0).
Question: Does "continuous" but "not absolutely continuous" mean "singular"? We will (hopefully)
see later...
Example 2.3.7:
Consider
$F(x) = \begin{cases} 0, & x < 0 \\ 1/2, & x = 0 \\ 1/2 + x/2, & 0 < x < 1 \\ 1, & x \ge 1 \end{cases}$
We can write $F(x)$ as $a F_d(x) + (1 - a) F_c(x)$, $0 \le a \le 1$. How?
Since $F(x)$ has only one jump at $x = 0$, it is reasonable to get started with a pmf $p_0 = 1$ and
corresponding cdf
$F_d(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$
Since $F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x \ge 1$, it must clearly hold that $F_c(x) = 0$ for $x < 0$
and $F_c(x) = 1$ for $x \ge 1$. In addition $F(x)$ increases linearly in $0 < x < 1$. A good guess would be
a pdf $f_c(x) = 1 \cdot I_{(0,1)}(x)$ and corresponding cdf
$F_c(x) = \begin{cases} 0, & x \le 0 \\ x, & 0 < x < 1 \\ 1, & x \ge 1 \end{cases}$
Knowing that $F(0) = 1/2$, we have at least to multiply $F_d(x)$ by $1/2$. And, indeed, $F(x)$ can be
written as
$F(x) = \frac{1}{2} F_d(x) + \frac{1}{2} F_c(x)$.
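The decomposition of Example 2.3.7 can be verified pointwise. The sketch below is an added illustration: it codes $F$, $F_d$, and $F_c$ exactly as defined in the example and compares $F(x)$ with $\frac{1}{2} F_d(x) + \frac{1}{2} F_c(x)$ on a handful of test points, including the jump point $x = 0$. The function names and the chosen grid are hypothetical.

```python
def F(x):
    """Mixed cdf of Example 2.3.7."""
    if x < 0:
        return 0.0
    if x == 0:
        return 0.5
    if x < 1:
        return 0.5 + x / 2
    return 1.0


def F_d(x):
    """Discrete part: point mass 1 at x = 0."""
    return 1.0 if x >= 0 else 0.0


def F_c(x):
    """Continuous part: cdf of the uniform distribution on (0, 1)."""
    if x <= 0:
        return 0.0
    if x < 1:
        return x
    return 1.0


def mixture(x, a=0.5):
    """a * F_d(x) + (1 - a) * F_c(x)."""
    return a * F_d(x) + (1 - a) * F_c(x)


grid = [-1.0, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0]
max_gap = max(abs(F(x) - mixture(x)) for x in grid)
print(max_gap)
```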
Definition 2.3.8:
The two-valued function $I_A(x)$ is called indicator function and it is defined as follows:
$I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ if $x \notin A$ for any set $A$.
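An Excursion into Logic
When proving theorems we only used direct methods so far. We used induction proofs to show that
something holds for arbitrary $n$. To show that a statement $A$ implies a statement $B$, i.e., $A \Rightarrow B$,
we used proofs of the type $A \Rightarrow A_1 \Rightarrow A_2 \Rightarrow \ldots \Rightarrow A_{n-1} \Rightarrow A_n \Rightarrow B$ where one step directly
follows from the previous step. However, there are different approaches to obtain the same result.
$A \Rightarrow B$ is equivalent to $\neg B \Rightarrow \neg A$ is equivalent to $\neg A \vee B$:
$A$ | $B$ | $A \Rightarrow B$ | $\neg A$ | $\neg B$ | $\neg B \Rightarrow \neg A$ | $\neg A \vee B$
1 | 1 | 1 | 0 | 0 | 1 | 1
1 | 0 | 0 | 0 | 1 | 0 | 0
0 | 1 | 1 | 1 | 0 | 1 | 1
0 | 0 | 1 | 1 | 1 | 1 | 1
$A \Leftrightarrow B$ is equivalent to $(A \Rightarrow B) \wedge (B \Rightarrow A)$ is equivalent to $(\neg A \vee B) \wedge (A \vee \neg B)$:
$A$ | $B$ | $A \Leftrightarrow B$ | $A \Rightarrow B$ | $B \Rightarrow A$ | $(A \Rightarrow B) \wedge (B \Rightarrow A)$ | $\neg A \vee B$ | $A \vee \neg B$ | $(\neg A \vee B) \wedge (A \vee \neg B)$
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0
0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0
0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1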
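The equivalences recorded in the two truth tables can be confirmed mechanically by running through all four combinations of truth values. In the sketch below (an added illustration; the names `IMPLIES` and `neg` are hypothetical), the implication is entered directly as a lookup table taken from the first truth table, so that its agreement with $\neg A \vee B$ is actually tested rather than assumed.

```python
from itertools import product

# Truth table of A => B, keyed by (A, B), copied from the first table above.
IMPLIES = {(1, 1): 1, (1, 0): 0, (0, 1): 1, (0, 0): 1}


def neg(a):
    """Logical negation on {0, 1}."""
    return 1 - a


rows = list(product((1, 0), repeat=2))

# A => B  versus  (not B) => (not A)
contrapositive_matches = all(
    IMPLIES[(a, b)] == IMPLIES[(neg(b), neg(a))] for a, b in rows)

# A => B  versus  (not A) or B   (max acts as "or" on {0, 1})
disjunction_matches = all(
    IMPLIES[(a, b)] == max(neg(a), b) for a, b in rows)

# A <=> B  versus  (A => B) and (B => A)  versus  ((not A) or B) and (A or (not B))
equivalence_matches = all(
    int(a == b)
    == min(IMPLIES[(a, b)], IMPLIES[(b, a)])
    == min(max(neg(a), b), max(a, neg(b)))
    for a, b in rows)

print(contrapositive_matches, disjunction_matches, equivalence_matches)
```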
Theorem 2.4.1:
Given a random variable $X$ with known induced distribution and a Borel-measurable function $g$,
then the distribution of the random variable $Y = g(X)$ is determined.
Proof:
$F_Y(y) = P_Y(Y \le y)$
$= P(\{\omega : g(X(\omega)) \le y\})$
$= P(\{\omega : X(\omega) \in B_y\})$ where $B_y = g^{-1}((-\infty, y]) \in \mathcal{B}$ since $g$ is Borel-measurable.
$= P(X^{-1}(B_y))$
Note:
From now on, we restrict ourselves to real-valued (vector-valued) functions that are Borel-measurable,
i.e., measurable with respect to $(\mathbb{R}, \mathcal{B})$ or $(\mathbb{R}^k, \mathcal{B}_k)$.
More generally, $P_Y(Y \in C) = P_X(X \in g^{-1}(C)) \ \forall C \in \mathcal{B}$.
Example 2.4.2:
Suppose $X$ is a discrete random variable. Let $A$ be a countable set such that $P(X \in A) = 1$ and
$P(X = x) > 0 \ \forall x \in A$.
Let $Y = g(X)$. Obviously, the sample space of $Y$ is also countable. Then,
$P_Y(Y = y) = \sum_{x \in g^{-1}(\{y\})} P_X(X = x) = \sum_{\{x : g(x) = y\}} P_X(X = x) \ \forall y \in g(A)$
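The formula of Example 2.4.2 translates directly into code: for each $y$, add up the probabilities of all $x$ with $g(x) = y$. The following sketch is an added illustration; the function name `pmf_of_transform`, the uniform pmf on $\{-2, -1, 0, 1, 2\}$, and the choice $g(x) = x^2$ are assumptions made for the example and are not taken from the notes.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_transform(pmf_x, g):
    """Return the pmf of Y = g(X), given the pmf of X as a dict {x: P(X = x)}."""
    pmf_y = defaultdict(Fraction)   # missing keys start at Fraction(0)
    for x, p in pmf_x.items():
        pmf_y[g(x)] += p            # collect all x with g(x) = y
    return dict(pmf_y)


# Hypothetical example: X uniform on {-2, -1, 0, 1, 2} and Y = X^2.
pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}
pmf_y = pmf_of_transform(pmf_x, lambda x: x * x)
print(pmf_y)
```

Here $g^{-1}(\{4\}) = \{-2, 2\}$ and $g^{-1}(\{1\}) = \{-1, 1\}$, so $Y$ takes the values 0, 1, 4 with probabilities $1/5$, $2/5$, $2/5$.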
Example 2.4.3:
$X \sim U(-1, 1)$ so the pdf of $X$ is $f_X(x) = \frac{1}{2} I_{[-1,1]}(x)$, which, according to Definition 2.3.8, reads
as $f_X(x) = 1/2$ for $-1 \le x \le 1$ and 0 otherwise.
Let $Y = X^+ = \begin{cases} x, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$
Then,
$F_Y(y) = P_Y(Y \le y) = \begin{cases} 0, & y < 0 \\ 1/2, & y = 0 \\ 1/2 + y/2, & 0 < y < 1 \\ 1, & y \ge 1 \end{cases}$
This is the mixed discrete/continuous distribution from Example 2.3.7.
Note:
We need to put some conditions on $g$ to ensure $g(X)$ is continuous if $X$ is continuous and avoid
cases as in Example 2.4.3 above.
Definition 2.4.4:
For a random variable $X$ from $(\Omega, L, P)$ to $(\mathbb{R}, \mathcal{B})$, the support of $X$ (or $P$) is any set $A \in L$ for which $P(A) = 1$. For a continuous random variable $X$ with pdf $f_X$, we can think of the support of $X$ as $\mathcal{X} = X^{-1}(\{x : f_X(x) > 0\})$.
Definition 2.4.5:
Let $f$ be a real-valued function defined on $D \subseteq \mathbb{R}$, $D \in \mathcal{B}$. We say:
$f$ is (strictly) non-decreasing if $x < y \Rightarrow f(x) \le (<) \; f(y) \;\; \forall x, y \in D$
$f$ is (strictly) non-increasing if $x < y \Rightarrow f(x) \ge (>) \; f(y) \;\; \forall x, y \in D$
$f$ is monotonic on $D$ if $f$ is either increasing or decreasing, and we write $f \uparrow$ or $f \downarrow$.
Theorem 2.4.6:
Let $X$ be a continuous rv with pdf $f_X$ and support $\mathcal{X}$. Let $y = g(x)$ be differentiable for all $x$ and either (i) $g'(x) > 0$ or (ii) $g'(x) < 0$ for all $x$.
Then, $Y = g(X)$ is also a continuous rv with pdf
\[ f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| I_{g(\mathcal{X})}(y). \]
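Proof:
Part (i): $g'(x) > 0 \;\; \forall x \in \mathcal{X}$
So $g$ is strictly increasing and continuous.
Therefore, $x = g^{-1}(y)$ exists and it is also strictly increasing and also differentiable.
Then, from Rohatgi, page 9, Theorem 15:
\[ \frac{d}{dy} g^{-1}(y) = \left( \frac{d}{dx} g(x) \Big|_{x = g^{-1}(y)} \right)^{-1} > 0 \]
We get $F_Y(y) = P_Y(Y \le y) = P_Y(g(X) \le y) = P_X(X \le g^{-1}(y)) = F_X(g^{-1}(y))$ for $y \in g(\mathcal{X})$ and, by differentiation,
\[ f_Y(y) = F_Y'(y) = \frac{d}{dy} \left( F_X(g^{-1}(y)) \right) \overset{\text{Chain Rule}}{=} f_X(g^{-1}(y)) \, \frac{d}{dy} g^{-1}(y). \]
Part (ii): $g'(x) < 0 \;\; \forall x \in \mathcal{X}$
Now $g$ is strictly decreasing, so $g^{-1}$ exists, is strictly decreasing, and is differentiable with $\frac{d}{dy} g^{-1}(y) < 0$. For $y \in g(\mathcal{X})$, we get $F_Y(y) = P_Y(g(X) \le y) = P_X(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y))$ and, by differentiation,
\[ f_Y(y) = F_Y'(y) = -f_X(g^{-1}(y)) \, \frac{d}{dy} g^{-1}(y). \]
Since $\frac{d}{dy} g^{-1}(y) < 0$, the negative sign will cancel out, always giving us a positive value. Hence the need for the absolute value signs.
Combining parts (i) and (ii), we can therefore write
\[ f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| I_{g(\mathcal{X})}(y). \]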
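As a numerical sanity check of Theorem 2.4.6, the sketch below takes $X \sim N(0,1)$ and the strictly increasing map $g(x) = e^x$ (both chosen here purely for illustration) and compares $\int_a^b f_Y(y)\,dy$ with $F_X(\log b) - F_X(\log a)$. The helper names and the integration grid are assumptions of this example.

```python
import math

def f_x(x):
    """Standard normal pdf (the assumed density of X)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_x(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f_y(y):
    """pdf of Y = exp(X): f_X(g^{-1}(y)) * |d/dy g^{-1}(y)| on g(support) = (0, inf)."""
    if y <= 0.0:
        return 0.0
    return f_x(math.log(y)) * abs(1.0 / y)

def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

a, b = 0.5, 3.0
prob_from_pdf = integrate(f_y, a, b)                       # P(a < Y <= b) from the transformed pdf
prob_from_cdf = cdf_x(math.log(b)) - cdf_x(math.log(a))    # the same probability from F_X
print(prob_from_pdf, prob_from_cdf)
```

The two numbers should agree up to the discretization error of the trapezoidal rule.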
Note:
In Theorem 2.4.6, we can also write
\[ f_Y(y) = \frac{f_X(x)}{\left| \frac{dg(x)}{dx} \right|} \Bigg|_{x = g^{-1}(y)}, \quad y \in g(\mathcal{X}). \]
If $g$ is monotonic over disjoint intervals, we can also get an expression for the pdf/cdf of $Y = g(X)$ as stated in the following theorem:
Theorem 2.4.7:
Let $Y = g(X)$ where $X$ is a rv with pdf $f_X(x)$ on support $\mathcal{X}$. Suppose there exists a partition $A_0, A_1, \ldots, A_k$ of $\mathcal{X}$ such that $P(X \in A_0) = 0$ and $f_X(x)$ is continuous on each $A_i$. Suppose there exist functions $g_1(x), \ldots, g_k(x)$ defined on $A_1$ through $A_k$, respectively, satisfying
(i) $g(x) = g_i(x) \;\; \forall x \in A_i$,
(ii) $g_i(x)$ is monotonic on $A_i$,
(iii) the set $\mathcal{Y} = g_i(A_i) = \{y : y = g_i(x) \text{ for some } x \in A_i\}$ is the same for each $i = 1, \ldots, k$, and
(iv) $g_i^{-1}(y)$ has a continuous derivative on $\mathcal{Y}$ for each $i = 1, \ldots, k$.
Then,
\[ f_Y(y) = \sum_{i=1}^{k} f_X(g_i^{-1}(y)) \left| \frac{d}{dy} g_i^{-1}(y) \right| I_{\mathcal{Y}}(y). \]
Note:
Rohatgi, page 73, Theorem 4, removes condition (iii) by defining $n = n(y)$ and $x_1(y), \ldots, x_n(y)$.
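Example 2.4.8:
Let $X$ be a rv with pdf $f_X(x) = \frac{2x}{\pi^2} I_{(0,\pi)}(x)$.
Let $Y = \sin(X)$. What is $f_Y(y)$?
Since $\sin$ is not monotonic on $(0, \pi)$, Theorem 2.4.6 cannot be used to determine the pdf of $Y$.
Two possible approaches:
Method 1: cdfs
For $0 < y < 1$ we have
\[
\begin{aligned}
F_Y(y) &= P_Y(Y \le y) \\
&= P_X(\sin X \le y) \\
&= P_X([0 \le X \le \sin^{-1}(y)] \text{ or } [\pi - \sin^{-1}(y) \le X \le \pi]) \\
&= F_X(\sin^{-1}(y)) + (1 - F_X(\pi - \sin^{-1}(y)))
\end{aligned}
\]
since $[0 \le X \le \sin^{-1}(y)]$ and $[\pi - \sin^{-1}(y) \le X \le \pi]$ are disjoint sets. Then,
\[
\begin{aligned}
f_Y(y) &= F_Y'(y) \\
&= f_X(\sin^{-1}(y)) \frac{1}{\sqrt{1 - y^2}} + (-1) f_X(\pi - \sin^{-1}(y)) \frac{-1}{\sqrt{1 - y^2}} \\
&= \frac{1}{\sqrt{1 - y^2}} \left( f_X(\sin^{-1}(y)) + f_X(\pi - \sin^{-1}(y)) \right) \\
&= \frac{1}{\sqrt{1 - y^2}} \left( \frac{2 \sin^{-1}(y)}{\pi^2} + \frac{2(\pi - \sin^{-1}(y))}{\pi^2} \right) \\
&= \frac{1}{\pi^2} (2\pi) \frac{1}{\sqrt{1 - y^2}} \\
&= \frac{2}{\pi \sqrt{1 - y^2}} I_{(0,1)}(y)
\end{aligned}
\]
Method 2: Use of Theorem 2.4.7
Let $A_1 = (0, \frac{\pi}{2})$, $A_2 = (\frac{\pi}{2}, \pi)$, and $A_0 = \{\frac{\pi}{2}\}$.
Let $g_1^{-1}(y) = \sin^{-1}(y)$ and $g_2^{-1}(y) = \pi - \sin^{-1}(y)$.
It is $\frac{d}{dy} g_1^{-1}(y) = \frac{1}{\sqrt{1 - y^2}} = -\frac{d}{dy} g_2^{-1}(y)$ and $\mathcal{Y} = (0, 1)$.
Thus, by use of Theorem 2.4.7, we get
\[
\begin{aligned}
f_Y(y) &= \sum_{i=1}^{2} f_X(g_i^{-1}(y)) \left| \frac{d}{dy} g_i^{-1}(y) \right| I_{\mathcal{Y}}(y) \\
&= \frac{2 \sin^{-1}(y)}{\pi^2} \frac{1}{\sqrt{1 - y^2}} I_{(0,1)}(y) + \frac{2(\pi - \sin^{-1}(y))}{\pi^2} \frac{1}{\sqrt{1 - y^2}} I_{(0,1)}(y) \\
&= \frac{2\pi}{\pi^2} \frac{1}{\sqrt{1 - y^2}} I_{(0,1)}(y) \\
&= \frac{2}{\pi \sqrt{1 - y^2}} I_{(0,1)}(y)
\end{aligned}
\]
Obviously, both results are identical.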
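The result of Example 2.4.8 can be double-checked numerically by differentiating the cdf from Method 1 with a central difference and comparing it with the closed-form pdf. This is a minimal sketch; the evaluation point `y = 0.3` and the step size `h` are arbitrary choices made here.

```python
import math

def cdf_x(x):
    """cdf of X with pdf 2x/pi^2 on (0, pi): F_X(x) = x^2 / pi^2, clipped to [0, pi]."""
    return min(max(x, 0.0), math.pi) ** 2 / math.pi ** 2

def cdf_y(y):
    """cdf of Y = sin(X) from Method 1, valid for 0 < y < 1."""
    s = math.asin(y)
    return cdf_x(s) + (1.0 - cdf_x(math.pi - s))

def pdf_y(y):
    """Closed-form pdf derived in the example, valid for 0 < y < 1."""
    return 2.0 / (math.pi * math.sqrt(1.0 - y * y))

y, h = 0.3, 1e-5
numerical_derivative = (cdf_y(y + h) - cdf_y(y - h)) / (2.0 * h)  # central difference
print(numerical_derivative, pdf_y(y))
```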
Theorem 2.4.9:
Let $X$ be a rv with a continuous cdf $F_X(x)$ and let $Y = F_X(X)$. Then, $Y \sim U(0, 1)$.
Proof:
We have to consider two possible cases:
(a) $F_X$ is strictly increasing, i.e., $F_X(x_1) < F_X(x_2)$ for $x_1 < x_2$, and
(b) $F_X$ is non-decreasing but not strictly increasing, i.e., there exist $x_1 < x_2$ with $F_X(x_1) = F_X(x_2)$. Assume that $x_1$ is the infimum and $x_2$ the supremum of those values for which $F_X(x_1) = F_X(x_2)$ holds.
In (a), $F_X^{-1}(y)$ is uniquely defined. In (b), we define $F_X^{-1}(y) = \inf\{x : F_X(x) \ge y\}$.
Without loss of generality:
$F_X^{-1}(1) = +\infty$ if $F_X(x) < 1 \;\; \forall x \in \mathbb{R}$ and
$F_X^{-1}(0) = -\infty$ if $F_X(x) > 0 \;\; \forall x \in \mathbb{R}$.
For $Y = F_X(X)$ and $0 < y < 1$, we have
\[
\begin{aligned}
P(Y \le y) &= P(F_X(X) \le y) \\
&\overset{F_X^{-1} \uparrow}{=} P(F_X^{-1}(F_X(X)) \le F_X^{-1}(y)) \\
&\overset{(*)}{=} P(X \le F_X^{-1}(y)) \\
&= F_X(F_X^{-1}(y)) \\
&= y
\end{aligned}
\]
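Theorem 2.4.9 can be illustrated by simulation. The sketch below assumes $X \sim \text{Exponential}(1)$ (an arbitrary choice with a continuous cdf), transforms the draws by $F_X$, and measures the Kolmogorov distance between the empirical cdf of $Y = F_X(X)$ and the $U(0,1)$ cdf. The seed and sample size are illustrative only.

```python
import math
import random

def exp_cdf(x, rate=1.0):
    """cdf of an Exponential(rate) random variable (an assumed example for F_X)."""
    return 1.0 - math.exp(-rate * x) if x > 0.0 else 0.0

random.seed(603)  # arbitrary seed, only for reproducibility
n = 20000
u = sorted(exp_cdf(random.expovariate(1.0)) for _ in range(n))

# Kolmogorov distance between the empirical cdf of Y = F_X(X) and the U(0,1) cdf.
ks_distance = max(
    max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u)
)
print(ks_distance)
```

A small distance (of the order $1/\sqrt{n}$) is consistent with $Y \sim U(0,1)$; it is not a proof.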
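Example:
$X \sim$ Cauchy, $f_X(x) = \frac{1}{\pi (1 + x^2)}, \; -\infty < x < \infty$:
\[ E(|X|) = \frac{2}{\pi} \int_0^\infty \frac{x}{1 + x^2} \, dx = \frac{1}{\pi} \left[ \log(1 + x^2) \right]_0^\infty = \infty \]
So, $E(X)$ does not exist for the Cauchy distribution.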
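Theorem 3.1.2:
If $E(X)$ exists and $a$ and $b$ are finite constants, then $E(aX + b)$ exists and equals $aE(X) + b$.
Proof:
Continuous case only:
Existence:
\[
\begin{aligned}
E(|aX + b|) &= \int_{-\infty}^{\infty} |ax + b| f_X(x) \, dx \\
&\le \int_{-\infty}^{\infty} (|a| \, |x| + |b|) f_X(x) \, dx \\
&= |a| \int_{-\infty}^{\infty} |x| f_X(x) \, dx + |b| \int_{-\infty}^{\infty} f_X(x) \, dx \\
&= |a| E(|X|) + |b| \\
&< \infty
\end{aligned}
\]
Numerical Result:
\[
\begin{aligned}
E(aX + b) &= \int_{-\infty}^{\infty} (ax + b) f_X(x) \, dx \\
&= a \int_{-\infty}^{\infty} x f_X(x) \, dx + b \int_{-\infty}^{\infty} f_X(x) \, dx \\
&= aE(X) + b
\end{aligned}
\]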
Theorem 3.1.3:
If $X$ is bounded (i.e., there exists an $M$, $0 < M < \infty$, such that $P(|X| < M) = 1$), then $E(X)$ exists.
Definition 3.1.4:
The $k$th moment of $X$, if it exists, is $m_k = E(X^k)$.
The $k$th central moment of $X$, if it exists, is $\mu_k = E((X - E(X))^k)$.
Definition 3.1.5:
The variance of $X$, if it exists, is the second central moment of $X$, i.e.,
\[ Var(X) = E((X - E(X))^2). \]
Theorem 3.1.6:
$Var(X) = E(X^2) - (E(X))^2$.
Proof:
\[
\begin{aligned}
Var(X) &= E((X - E(X))^2) \\
&= E(X^2 - 2XE(X) + (E(X))^2) \\
&= E(X^2) - 2E(X)E(X) + (E(X))^2 \\
&= E(X^2) - (E(X))^2
\end{aligned}
\]
Theorem 3.1.7:
If $Var(X)$ exists and $a$ and $b$ are finite constants, then $Var(aX + b)$ exists and equals $a^2 Var(X)$.
Proof:
Existence & Numerical Result:
$Var(aX + b) = E\left( ((aX + b) - E(aX + b))^2 \right)$ exists if $E\left( \left| ((aX + b) - E(aX + b))^2 \right| \right)$ exists.
It holds that
\[
\begin{aligned}
E\left( \left| ((aX + b) - E(aX + b))^2 \right| \right) &= E\left( ((aX + b) - E(aX + b))^2 \right) \\
&= Var(aX + b) \\
&\overset{\text{Th. 3.1.6}}{=} E((aX + b)^2) - (E(aX + b))^2 \\
&\overset{\text{Th. 3.1.2}}{=} E(a^2 X^2 + 2abX + b^2) - (aE(X) + b)^2 \\
&\overset{\text{Th. 3.1.2}}{=} a^2 E(X^2) + 2abE(X) + b^2 - a^2 (E(X))^2 - 2abE(X) - b^2 \\
&= a^2 (E(X^2) - (E(X))^2) \\
&\overset{\text{Th. 3.1.6}}{=} a^2 Var(X) \\
&< \infty \quad \text{since } Var(X) \text{ exists}
\end{aligned}
\]
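Theorems 3.1.2, 3.1.6, and 3.1.7 are easy to verify exactly on a small discrete distribution. The sketch below assumes a fair die and the constants $a = 3$, $b = -2$; both are illustrative choices, and the discrete case is used here even though the proofs above are written for the continuous case.

```python
from fractions import Fraction

def expectation(pmf, h=lambda x: x):
    """E(h(X)) for a discrete rv given as {value: probability}."""
    return sum(h(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) from the definition E((X - E(X))^2)."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: (x - mean) ** 2)

die = {x: Fraction(1, 6) for x in range(1, 7)}    # hypothetical example: fair die
a, b = 3, -2                                      # arbitrary finite constants
shifted = {a * x + b: p for x, p in die.items()}  # distribution of aX + b

var_definition = variance(die)
var_shortcut = expectation(die, lambda x: x * x) - expectation(die) ** 2
print(var_definition, var_shortcut, variance(shifted))
```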
Theorem 3.1.8:
If the $t$th moment of a rv $X$ exists, then all moments of order $0 < s < t$ exist.
Proof:
Continuous case only:
\[
\begin{aligned}
E(|X|^s) &= \int_{|x| \le 1} |x|^s f_X(x) \, dx + \int_{|x| > 1} |x|^s f_X(x) \, dx \\
&\le \int_{|x| \le 1} 1 \cdot f_X(x) \, dx + \int_{|x| > 1} |x|^t f_X(x) \, dx \\
&\le P(|X| \le 1) + E(|X|^t) \\
&< \infty
\end{aligned}
\]
Theorem 3.1.9:
If the $t$th moment of a rv $X$ exists, then
\[ \lim_{n \to \infty} n^t P(|X| > n) = 0. \]
Proof:
Continuous case only:
\[ \infty > \int_{\mathbb{R}} |x|^t f_X(x) \, dx = \lim_{n \to \infty} \int_{|x| \le n} |x|^t f_X(x) \, dx \]
\[ \Longrightarrow \lim_{n \to \infty} \int_{|x| > n} |x|^t f_X(x) \, dx = 0 \]
But,
\[ \lim_{n \to \infty} \int_{|x| > n} |x|^t f_X(x) \, dx \ge \lim_{n \to \infty} n^t \int_{|x| > n} f_X(x) \, dx = \lim_{n \to \infty} n^t P(|X| > n) \ge 0, \]
so $\lim_{n \to \infty} n^t P(|X| > n) = 0$.
Note:
The converse is not necessarily true, i.e., if $\lim_{n \to \infty} n^t P(|X| > n) = 0$, then the $t$th moment of a rv $X$ does not necessarily exist. We can only approach $t$ up to some $\delta > 0$, as the following Theorem 3.1.10 indicates.
Theorem 3.1.10:
Let $X$ be a rv with a distribution such that $\lim_{n \to \infty} n^t P(|X| > n) = 0$ for some $t > 0$. Then,
\[ E(|X|^s) < \infty \quad \forall \; 0 < s < t. \]
Note:
To prove this Theorem, we need Lemma 3.1.11 and Corollary 3.1.12.
Lemma 3.1.11:
Let $X$ be a non-negative rv with cdf $F_X$. Then,
\[ E(X) = \int_0^\infty (1 - F_X(x)) \, dx \]
(if either side exists).
Proof:
Continuous case only:
To prove that the left side implies that the right side is finite and both sides are identical, we assume that $E(X)$ exists. It is
\[ E(X) = \int_0^\infty x f_X(x) \, dx = \lim_{n \to \infty} \int_0^n x f_X(x) \, dx \]
Replace the expression for the right side integral using integration by parts.
Let $u = x$ and $dv = f_X(x) \, dx$, then
\[
\begin{aligned}
\int_0^n x f_X(x) \, dx &= (x F_X(x)) \Big|_0^n - \int_0^n F_X(x) \, dx \\
&= nF_X(n) - 0 \cdot F_X(0) - \int_0^n F_X(x) \, dx \\
&= nF_X(n) - n + n - \int_0^n F_X(x) \, dx \\
&= nF_X(n) - n + \int_0^n [1 - F_X(x)] \, dx \\
&= n[F_X(n) - 1] + \int_0^n [1 - F_X(x)] \, dx \\
&= -n[1 - F_X(n)] + \int_0^n [1 - F_X(x)] \, dx \\
&= -nP(X > n) + \int_0^n [1 - F_X(x)] \, dx \\
&\overset{X \ge 0}{=} -n^1 P(|X| > n) + \int_0^n [1 - F_X(x)] \, dx
\end{aligned}
\]
\[
\begin{aligned}
\Longrightarrow E(X^1) &= \lim_{n \to \infty} \left[ -n^1 P(|X| > n) + \int_0^n [1 - F_X(x)] \, dx \right] \\
&\overset{\text{Th. 3.1.9}}{=} 0 + \lim_{n \to \infty} \int_0^n [1 - F_X(x)] \, dx \\
&= \int_0^\infty [1 - F_X(x)] \, dx
\end{aligned}
\]
Thus, the existence of $E(X)$ implies that $\int_0^\infty [1 - F_X(x)] \, dx$ is finite and that both sides are identical.
We still have to show the converse implication:
If $\int_0^\infty [1 - F_X(x)] \, dx$ is finite, then $E(X)$ exists, i.e., $E(|X|) = E(X) < \infty$, and both sides are identical. It is
\[ \int_0^n x f_X(x) \, dx \overset{X \ge 0}{=} \int_0^n |x| f_X(x) \, dx = -n[1 - F_X(n)] + \int_0^n [1 - F_X(x)] \, dx \]
as seen above.
Since $-n[1 - F_X(n)] \le 0$, we get
\[ \int_0^n |x| f_X(x) \, dx \le \int_0^n [1 - F_X(x)] \, dx \le \int_0^\infty [1 - F_X(x)] \, dx < \infty \quad \forall n \]
Thus,
\[ \lim_{n \to \infty} \int_0^n |x| f_X(x) \, dx = \int_0^\infty |x| f_X(x) \, dx \le \int_0^\infty [1 - F_X(x)] \, dx < \infty \]
$\Longrightarrow E(X)$ exists and is identical to $\int_0^\infty [1 - F_X(x)] \, dx$ as seen above.
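Lemma 3.1.11 can be checked numerically for a concrete non-negative rv. The sketch below assumes $X \sim \text{Exponential}(\text{rate})$ with `rate = 2.0` (so $E(X) = 1/2$) and integrates both $1 - F_X(x)$ and $x f_X(x)$ with a simple trapezoidal rule on a truncated range; the truncation point and grid size are assumptions of this example.

```python
import math

def survival(x, rate):
    """1 - F_X(x) for an Exponential(rate) rv (assumed example)."""
    return math.exp(-rate * x)

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

rate = 2.0
upper = 40.0 / rate  # truncation point; the neglected tail is of order exp(-40)
tail_integral = trapezoid(lambda x: survival(x, rate), 0.0, upper, 100000)
mean_direct = trapezoid(lambda x: x * rate * math.exp(-rate * x), 0.0, upper, 100000)
print(tail_integral, mean_direct, 1.0 / rate)
```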
Corollary 3.1.12:
\[ E(|X|^s) = s \int_0^\infty y^{s-1} P(|X| > y) \, dy \]
Proof:
\[ E(|X|^s) \overset{\text{Lemma 3.1.11}}{=} \int_0^\infty [1 - F_{|X|^s}(z)] \, dz = \int_0^\infty P(|X|^s > z) \, dz. \]
Let $z = y^s$. Then $\frac{dz}{dy} = s y^{s-1}$ and $dz = s y^{s-1} \, dy$. Therefore,
\[
\begin{aligned}
\int_0^\infty P(|X|^s > z) \, dz &= \int_0^\infty P(|X|^s > y^s) \, s y^{s-1} \, dy \\
&= s \int_0^\infty y^{s-1} P(|X|^s > y^s) \, dy \\
&\overset{\text{monotonic} \uparrow}{=} s \int_0^\infty y^{s-1} P(|X| > y) \, dy
\end{aligned}
\]
Thus, for $E(|X|^s) < \infty$, it must hold that $s - 1 - t < -1$, or equivalently, $s < t$. So $E(|X|^s) < \infty$, i.e., it exists, for every $s$ with $0 < s < t$ for a rv $X$ with a distribution such that $\lim_{n \to \infty} n^t P(|X| > n) = 0$ for some $t > 0$.
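Theorem 3.1.13:
Let $X$ be a rv such that
\[ \lim_{k \to \infty} \frac{P(|X| > \alpha k)}{P(|X| > k)} = 0 \quad \forall \alpha > 1. \]
Then, all moments of $X$ exist.
Proof:
For $\epsilon > 0$, we select some $k_0$ such that
\[ \frac{P(|X| > \alpha k)}{P(|X| > k)} < \epsilon \quad \forall k \ge k_0. \]
Select $k_1$ such that $P(|X| > k) < \epsilon \;\; \forall k \ge k_1$.
Select $N = \max(k_0, k_1)$.
If we have some fixed positive integer $r$:
\[ \frac{P(|X| > \alpha^r k)}{P(|X| > k)} = \frac{P(|X| > \alpha k)}{P(|X| > k)} \cdot \frac{P(|X| > \alpha^2 k)}{P(|X| > \alpha k)} \cdot \frac{P(|X| > \alpha^3 k)}{P(|X| > \alpha^2 k)} \cdot \ldots \cdot \frac{P(|X| > \alpha^r k)}{P(|X| > \alpha^{r-1} k)} \]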
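but is
\[ n \int_N^\infty x^{n-1} P(|X| > x) \, dx < \infty \; ? \]
To check the second part, we use:
\[ \int_N^\infty x^{n-1} P(|X| > x) \, dx = \sum_{r=1}^{\infty} \int_{\alpha^{r-1} N}^{\alpha^r N} x^{n-1} P(|X| > x) \, dx \]
We know that:
\[ \int_{\alpha^{r-1} N}^{\alpha^r N} x^{n-1} P(|X| > x) \, dx \le \epsilon^r \int_{\alpha^{r-1} N}^{\alpha^r N} x^{n-1} \, dx \]
This step is possible since
\[ \epsilon^r \ge P(|X| \ge \alpha^{r-1} N) \ge P(|X| > x) \ge P(|X| \ge \alpha^r N) \quad \forall x \in (\alpha^{r-1} N, \alpha^r N) \]
and $N = \max(k_0, k_1)$.
\[ \epsilon^r \int_{\alpha^{r-1} N}^{\alpha^r N} x^{n-1} \, dx \le \epsilon^r (\alpha^r N)^{n-1} \int_{\alpha^{r-1} N}^{\alpha^r N} 1 \, dx \le \epsilon^r (\alpha^r N)^{n-1} (\alpha^r N) = \epsilon^r (\alpha^r N)^n \]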
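Theorem 3.2.2:
If a rv $X$ has a mgf $M_X(t)$ that exists for $-h < t < h$ for some $h > 0$, then
\[ E(X^n) = M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t) \Big|_{t=0}. \]
Proof:
We assume that we can differentiate under the integral sign. If, and when, this really is true will be discussed later in this section.
\[
\begin{aligned}
\frac{d}{dt} M_X(t) &= \frac{d}{dt} \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx \\
&= \int_{-\infty}^{\infty} \left( \frac{\partial}{\partial t} e^{tx} f_X(x) \right) dx \\
&= \int_{-\infty}^{\infty} x e^{tx} f_X(x) \, dx \\
&= E(X e^{tX})
\end{aligned}
\]
Evaluating this at $t = 0$, we get: $\frac{d}{dt} M_X(t) \big|_{t=0} = E(X)$
By iteration, we get for $n \ge 2$:
\[
\begin{aligned}
\frac{d^n}{dt^n} M_X(t) &= \frac{d}{dt} \left( \frac{d^{n-1}}{dt^{n-1}} M_X(t) \right) \\
&= \frac{d}{dt} \int_{-\infty}^{\infty} x^{n-1} e^{tx} f_X(x) \, dx \\
&= \int_{-\infty}^{\infty} \left( \frac{\partial}{\partial t} x^{n-1} e^{tx} f_X(x) \right) dx \\
&= \int_{-\infty}^{\infty} x^n e^{tx} f_X(x) \, dx \\
&= E(X^n e^{tX})
\end{aligned}
\]
Evaluating this at $t = 0$, we get: $\frac{d^n}{dt^n} M_X(t) \big|_{t=0} = E(X^n)$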
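Example 3.2.3:
$X \sim U(a, b)$; $f_X(x) = \frac{1}{b - a} I_{[a,b]}(x)$.
Then,
\[ M_X(t) = \int_a^b \frac{e^{tx}}{b - a} \, dx = \frac{e^{tb} - e^{ta}}{t(b - a)} \]
\[ M_X(0) = \frac{0}{0} \overset{\text{L'Hospital}}{=} \frac{b e^{tb} - a e^{ta}}{b - a} \Big|_{t=0} = 1 \]
So $M_X(0) = 1$ and since $\frac{e^{tb} - e^{ta}}{t(b - a)}$ is continuous, it also exists in an open interval around 0 (in fact, it exists for every $t \in \mathbb{R}$).
\[
\begin{aligned}
\Longrightarrow E(X) = M_X'(0) &= \frac{t b e^{tb} - t a e^{ta} - e^{tb} + e^{ta}}{t^2 (b - a)} \Big|_{t=0} = \frac{0}{0} \\
&\overset{\text{L'Hospital}}{=} \frac{b e^{tb} - a e^{ta} + t b^2 e^{tb} - t a^2 e^{ta} - b e^{tb} + a e^{ta}}{2t(b - a)} \Big|_{t=0} \\
&= \frac{t b^2 e^{tb} - t a^2 e^{ta}}{2t(b - a)} \Big|_{t=0} = \frac{0}{0} \\
&\overset{\text{L'Hospital}}{=} \frac{b^2 e^{tb} - a^2 e^{ta} + t b^3 e^{tb} - t a^3 e^{ta}}{2(b - a)} \Big|_{t=0} \\
&= \frac{b^2 - a^2}{2(b - a)} \\
&= \frac{b + a}{2}
\end{aligned}
\]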
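The value $E(X) = (a + b)/2$ can be cross-checked by differentiating the mgf numerically at $t = 0$. In the sketch below, the function name `mgf_uniform`, the endpoints `a = 1.0`, `b = 4.0`, and the step size `h` are illustrative assumptions; the value of the mgf at $t = 0$ is set to the limit 1 found above.

```python
import math

def mgf_uniform(t, a, b):
    """mgf of U(a, b); the value at t = 0 is the limit 1 obtained via L'Hospital's rule."""
    if t == 0.0:
        return 1.0
    return (math.exp(t * b) - math.exp(t * a)) / (t * (b - a))

a, b = 1.0, 4.0  # arbitrary example values
h = 1e-4         # step size of the central difference
first_derivative = (mgf_uniform(h, a, b) - mgf_uniform(-h, a, b)) / (2.0 * h)
print(first_derivative, (a + b) / 2.0)
```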
Note:
In the previous example, we made use of L'Hospital's rule. This rule gives conditions under which we can resolve indefinite expressions of the type "$\frac{0}{0}$" and "$\frac{\infty}{\infty}$".
(i) Let $f$ and $g$ be functions that are differentiable in an open interval around $x_0$, say in $(x_0 - \delta, x_0 + \delta)$, but not necessarily differentiable in $x_0$. Let $f(x_0) = g(x_0) = 0$ and $g'(x) \ne 0 \;\; \forall x \in (x_0 - \delta, x_0 + \delta) - \{x_0\}$. Then, $\lim_{x \to x_0} \frac{f'(x)}{g'(x)} = A$ implies that also $\lim_{x \to x_0} \frac{f(x)}{g(x)} = A$. The same holds for the cases $\lim_{x \to x_0} f(x) = \lim_{x \to x_0} g(x) = \infty$ and $x \to x_0^+$ or $x \to x_0^-$.
(ii) Let $f$ and $g$ be functions that are differentiable for $x > a$ ($a > 0$). Let $\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = 0$ and $g'(x) \ne 0$ for all $x > a$. Then, $\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = A$ implies that also $\lim_{x \to \infty} \frac{f(x)}{g(x)} = A$.
(iii) We can iterate this process as long as the required conditions are met and derivatives exist, e.g., if the first derivatives still result in an indefinite expression, we can look at the second derivatives, then at the third derivatives, and so on.
(iv) It is recommended to keep expressions as simple as possible. If we have identical factors in the numerator and denominator, we can exclude them from both and continue with the simpler functions.
(v) Indefinite expressions of the form "$0 \cdot \infty$" can be handled by rearranging them to "$\frac{0}{1/\infty}$", and $\lim_{x \to -\infty} \frac{f(x)}{g(x)}$ can be handled by use of the rules for $\lim_{x \to \infty} \frac{f(-x)}{g(-x)}$.
Note:
The following Theorems provide us with rules that tell us when we can differentiate under the integral sign. Theorem 3.2.4 relates to finite integral bounds $a(\theta)$ and $b(\theta)$ and Theorems 3.2.5 and 3.2.6 to infinite bounds.
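Theorem 3.2.5: Lebesgue's Dominated Convergence Theorem
Let $g$ be an integrable function such that $\int_{-\infty}^{\infty} g(x) \, dx < \infty$. If $|f_n| \le g$ almost everywhere (i.e., except for a set of Borel-measure 0) and if $f_n \to f$ almost everywhere, then $f_n$ and $f$ are integrable and
\[ \int f_n(x) \, dx \to \int f(x) \, dx. \]
Note:
If $f$ is differentiable with respect to $\theta$, then
\[ \frac{\partial}{\partial \theta} f(x, \theta) = \lim_{\delta \to 0} \frac{f(x, \theta + \delta) - f(x, \theta)}{\delta} \]
and
\[ \int_{-\infty}^{\infty} \frac{\partial}{\partial \theta} f(x, \theta) \, dx = \int_{-\infty}^{\infty} \lim_{\delta \to 0} \frac{f(x, \theta + \delta) - f(x, \theta)}{\delta} \, dx \]
while
\[ \frac{\partial}{\partial \theta} \int_{-\infty}^{\infty} f(x, \theta) \, dx = \lim_{\delta \to 0} \int_{-\infty}^{\infty} \frac{f(x, \theta + \delta) - f(x, \theta)}{\delta} \, dx \]
Theorem 3.2.6:
Let $f_n(x, \theta_0) = \frac{f(x, \theta_0 + \delta_n) - f(x, \theta_0)}{\delta_n}$ for some $\theta_0$. Suppose there exists an integrable function $g(x)$ such that $\int_{-\infty}^{\infty} g(x) \, dx < \infty$ and $|f_n(x, \theta_0)| \le g(x) \;\; \forall x$, then
\[ \left[ \frac{d}{d\theta} \int_{-\infty}^{\infty} f(x, \theta) \, dx \right]_{\theta = \theta_0} = \int_{-\infty}^{\infty} \left[ \frac{\partial}{\partial \theta} f(x, \theta) \Big|_{\theta = \theta_0} \right] dx. \]
Usually, if $f$ is differentiable for all $\theta$, we write
\[ \frac{d}{d\theta} \int_{-\infty}^{\infty} f(x, \theta) \, dx = \int_{-\infty}^{\infty} \frac{\partial}{\partial \theta} f(x, \theta) \, dx. \]
Corollary 3.2.7:
Let $f(x, \theta)$ be differentiable for all $\theta$. Suppose there exists an integrable function $g(x, \theta)$ such that $\int_{-\infty}^{\infty} g(x, \theta) \, dx < \infty$ and $\left| \frac{\partial}{\partial \theta} f(x, \theta) \big|_{\theta = \theta_0} \right| \le g(x, \theta) \;\; \forall x \;\; \forall \theta_0$ in some $\epsilon$-neighborhood of $\theta$, then
\[ \frac{d}{d\theta} \int_{-\infty}^{\infty} f(x, \theta) \, dx = \int_{-\infty}^{\infty} \frac{\partial}{\partial \theta} f(x, \theta) \, dx. \]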
More on Moment Generating Functions
Consider
\[ \left| \frac{\partial}{\partial t} e^{tx} f_X(x) \Big|_{t = t'} \right| = |x| \, e^{t'x} f_X(x) \quad \text{for } |t' - t| \le \delta_0. \]
Choose $t$, $\delta_0$ small enough such that $t + \delta_0 \in (-h, h)$ and $t - \delta_0 \in (-h, h)$. Then,
\[ \left| \frac{\partial}{\partial t} e^{tx} f_X(x) \Big|_{t = t'} \right| \le g(x, t) \]
where
\[ g(x, t) = \begin{cases} |x| \, e^{(t + \delta_0)x} f_X(x), & x \ge 0 \\ |x| \, e^{(t - \delta_0)x} f_X(x), & x < 0 \end{cases} \]
To verify $\int g(x, t) \, dx < \infty$, we need to know $f_X(x)$.
Suppose the mgf $M_X(t)$ exists for $|t| \le h$ for some $h > 1$. Then $|t + \delta_0 + 1| < h$ and $|t - \delta_0 - 1| < h$ for $t$ and $\delta_0$ small enough.
Since $|x| \le e^{|x|} \;\; \forall x$, we get
\[ g(x, t) \le \begin{cases} e^{(t + \delta_0 + 1)x} f_X(x), & x \ge 0 \\ e^{(t - \delta_0 - 1)x} f_X(x), & x < 0 \end{cases} \]
Then, $\int_0^\infty g(x, t) \, dx \le M_X(t + \delta_0 + 1) < \infty$ and $\int_{-\infty}^0 g(x, t) \, dx \le M_X(t - \delta_0 - 1) < \infty$ and, therefore, $\int_{-\infty}^{\infty} g(x, t) \, dx < \infty$.
Together with Corollary 3.2.7, this establishes that we can differentiate under the integral in the Proof of Theorem 3.2.2.
If $h \le 1$, we may need to check more carefully to see if the condition holds.
Note:
If $M_X(t)$ exists for $t \in (-h, h)$, then we have an infinite collection of moments.
Does a collection of integer moments $\{m_k : k = 1, 2, 3, \ldots\}$ completely characterize the distribution, i.e., cdf, of $X$? Unfortunately not, as Example 3.2.8 shows.
Example 3.2.8:
Let $X_1$ and $X_2$ be rv's with pdfs
\[ f_{X_1}(x) = \frac{1}{\sqrt{2\pi}} \frac{1}{x} \exp\left( -\frac{1}{2} (\log x)^2 \right) I_{(0,\infty)}(x) \]
and
\[ f_{X_2}(x) = f_{X_1}(x) \cdot (1 + \sin(2\pi \log x)) \, I_{(0,\infty)}(x) \]
It is $E(X_1^r) = E(X_2^r) = e^{r^2/2}$ for $r = 0, 1, 2, \ldots$, as you have to show in the Homeworks.
Two different pdfs/cdfs have the same moment sequence! What went wrong? In this example, $M_{X_1}(t)$ does not exist, as shown in the Homeworks!
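The equality of the moment sequences can be observed numerically. The sketch below is an illustration only (it does not replace the homework proof): it uses the substitution $y = \log x$, so that $E(X^r) = \int e^{ry} \varphi(y) w(y) \, dy$ with $\varphi$ the standard normal pdf and $w = 1$ for $X_1$ or $w = 1 + \sin(2\pi y)$ for $X_2$. The function name, integration range, and grid size are assumptions made here.

```python
import math

def moment(r, perturbed, half_width=12.0, n=48000):
    """r-th moment via y = log x, using the trapezoidal rule on [r - half_width, r + half_width]."""
    lo, hi = r - half_width, r + half_width  # the integrand is centred at y = r
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        weight = 1.0 + math.sin(2.0 * math.pi * y) if perturbed else 1.0
        value = math.exp(r * y - 0.5 * y * y) / math.sqrt(2.0 * math.pi) * weight
        total += value if 0 < i < n else 0.5 * value
    return total * h

for r in range(4):
    # columns: r, moment of X1, moment of X2, exp(r^2 / 2)
    print(r, moment(r, False), moment(r, True), math.exp(r * r / 2.0))
```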
Theorem 3.2.9:
Let $X$ and $Y$ be 2 rv's with cdf's $F_X$ and $F_Y$ for which all moments exist.
(i) If $F_X$ and $F_Y$ have bounded support, then $F_X(u) = F_Y(u) \;\; \forall u$ iff $E(X^r) = E(Y^r)$ for $r = 0, 1, 2, \ldots$.
(ii) If both mgf's exist and $M_X(t) = M_Y(t)$ for $t$ in some neighborhood of 0, then $F_X(u) = F_Y(u) \;\; \forall u$.
Note:
The existence of moments is not equivalent to the existence of a mgf, as seen in Example 3.2.8 above and some of the Homework assignments.
Theorem 3.2.10:
Suppose rv's $\{X_i\}_{i=1}^{\infty}$ have mgf's $M_{X_i}(t)$ and that $\lim_{i \to \infty} M_{X_i}(t) = M_X(t) \;\; \forall t \in (-h, h)$ for some $h > 0$ and that $M_X(t)$ itself is a mgf. Then, there exists a cdf $F_X$ whose moments are determined by $M_X(t)$ and for all continuity points $x$ of $F_X(x)$ it holds that $\lim_{i \to \infty} F_{X_i}(x) = F_X(x)$, i.e., the convergence of mgf's implies the convergence of cdf's.
Proof:
Uniqueness of Laplace transformations, etc.
Theorem 3.2.11:
For constants $a$ and $b$, the mgf of $Y = aX + b$ is
\[ M_Y(t) = e^{bt} M_X(at), \]
given that $M_X(t)$ exists.
Proof:
\[
\begin{aligned}
M_Y(t) &= E(e^{(aX + b)t}) \\
&= E(e^{aXt} e^{bt}) \\
&= e^{bt} E(e^{Xat}) \\
&= e^{bt} M_X(at)
\end{aligned}
\]
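Conjugate Complex Numbers:
For $z = a + ib$, we define the conjugate complex number $\bar{z} = a - ib$. It holds:
$\bar{\bar{z}} = z$
$z = \bar{z}$ iff $z \in \mathbb{R}$
$\overline{z_1 \pm z_2} = \bar{z}_1 \pm \bar{z}_2$
$\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$
$\overline{\left( \frac{z_1}{z_2} \right)} = \frac{\bar{z}_1}{\bar{z}_2}$
$z \bar{z} = a^2 + b^2$
$Re(z) = a = \frac{1}{2} (z + \bar{z})$
$Im(z) = b = \frac{1}{2i} (z - \bar{z})$
$|z| = \sqrt{a^2 + b^2} = \sqrt{z \bar{z}}$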
Definition 3.3.1:
Let $(\Omega, L, P)$ be a probability space and $X$ and $Y$ real-valued rv's, i.e., $X, Y : (\Omega, L) \to (\mathbb{R}, \mathcal{B})$.
(i) $Z = X + iY : (\Omega, L) \to (\mathbb{C}, \mathcal{B}_{\mathbb{C}})$ is called a complex-valued random variable ($\mathbb{C}$-rv).
(ii) If $E(X)$ and $E(Y)$ exist, then $E(Z)$ is defined as $E(Z) = E(X) + iE(Y) \in \mathbb{C}$.
Note:
$E(Z)$ exists iff $E(|X|)$ and $E(|Y|)$ exist. It also holds that if $E(Z)$ exists, then $|E(Z)| \le E(|Z|)$ (see Homework).
Definition 3.3.2:
Let $X$ be a real-valued rv on $(\Omega, L, P)$. Then, $\Phi_X(t) : \mathbb{R} \to \mathbb{C}$ with $\Phi_X(t) = E(e^{itX})$ is called the characteristic function of $X$.
Note:
(i) $\Phi_X(t) = \int_{-\infty}^{\infty} e^{itx} f_X(x) \, dx = \int_{-\infty}^{\infty} \cos(tx) f_X(x) \, dx + i \int_{-\infty}^{\infty} \sin(tx) f_X(x) \, dx$ if $X$ is continuous.
(ii) $\Phi_X(t) = \sum_{x \in \mathcal{X}} e^{itx} P(X = x) = \sum_{x \in \mathcal{X}} \cos(tx) P(X = x) + i \sum_{x \in \mathcal{X}} \sin(tx) P(X = x)$ if $X$ is discrete and $\mathcal{X}$ is the support of $X$.
(iii) $\Phi_X(t)$ exists for all real-valued rv's $X$ since $|e^{itx}| = 1$.
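Theorem 3.3.3:
Let $\Phi_X$ be the characteristic function of a real-valued rv $X$. Then it holds:
(i) $\Phi_X(0) = 1$.
(ii) $|\Phi_X(t)| \le 1 \;\; \forall t \in \mathbb{R}$.
(iii) $\Phi_X$ is uniformly continuous, i.e., $\forall \epsilon > 0 \;\; \exists \delta > 0 \;\; \forall t_1, t_2 \in \mathbb{R} : |t_1 - t_2| < \delta \Rightarrow |\Phi(t_1) - \Phi(t_2)| < \epsilon$.
(iv) $\Phi_X$ is a positive definite function, i.e., $\forall n \in \mathbb{N} \;\; \forall \alpha_1, \ldots, \alpha_n \in \mathbb{C} \;\; \forall t_1, \ldots, t_n \in \mathbb{R}$:
\[ \sum_{l=1}^{n} \sum_{j=1}^{n} \alpha_l \bar{\alpha}_j \Phi_X(t_l - t_j) \ge 0. \]
(v) $\Phi_X(t) = \overline{\Phi_X(-t)}$.
(vi) If $X$ is symmetric around 0, i.e., if $X$ has a pdf that is symmetric around 0, then $\Phi_X(t) \in \mathbb{R} \;\; \forall t \in \mathbb{R}$.
(vii) $\Phi_{aX+b}(t) = e^{itb} \Phi_X(at)$.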
Proof:
See Homework for parts (i), (ii), (iv), (v), (vi), and (vii).
Part (iii):
Known conditions:
(i) Let $\epsilon > 0$.
(ii) $\exists \, a > 0 : P(-a < X < +a) > 1 - \frac{\epsilon}{4}$ and $P(|X| > a) < \frac{\epsilon}{4}$.
(iii) $\exists \, \delta > 0 : |e^{i(t' - t)x} - 1| < \frac{\epsilon}{2} \;\; \forall x$ s.t. $|x| < a$ and $\forall (t' - t)$ s.t. $0 < (t' - t) < \delta$.
This third condition holds since $|e^{i0} - 1| = 0$ and the exponential function is continuous. Therefore, if we select $(t' - t)$ and $x$ small enough, $|e^{i(t' - t)x} - 1|$ will be $< \frac{\epsilon}{2}$ for a given $\epsilon$.
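Let $t, t' \in \mathbb{R}$, $t < t'$, and $t' - t < \delta$. Then,
\[
\begin{aligned}
|\Phi_X(t') - \Phi_X(t)| &= \left| \int_{-\infty}^{+\infty} e^{it'x} f_X(x) \, dx - \int_{-\infty}^{+\infty} e^{itx} f_X(x) \, dx \right| \\
&= \left| \int_{-\infty}^{+\infty} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| \\
&= \left| \int_{-\infty}^{-a} (e^{it'x} - e^{itx}) f_X(x) \, dx + \int_{-a}^{+a} (e^{it'x} - e^{itx}) f_X(x) \, dx + \int_{+a}^{+\infty} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| \\
&\le \left| \int_{-\infty}^{-a} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| + \left| \int_{-a}^{+a} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| + \left| \int_{+a}^{+\infty} (e^{it'x} - e^{itx}) f_X(x) \, dx \right|
\end{aligned}
\]
We now take a closer look at the first and third of these absolute integrals. It is:
\[
\begin{aligned}
\left| \int_{-\infty}^{-a} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| &= \left| \int_{-\infty}^{-a} e^{it'x} f_X(x) \, dx - \int_{-\infty}^{-a} e^{itx} f_X(x) \, dx \right| \\
&\le \left| \int_{-\infty}^{-a} e^{it'x} f_X(x) \, dx \right| + \left| \int_{-\infty}^{-a} e^{itx} f_X(x) \, dx \right| \\
&\le \int_{-\infty}^{-a} |e^{it'x}| f_X(x) \, dx + \int_{-\infty}^{-a} |e^{itx}| f_X(x) \, dx \\
&\overset{(A)}{=} \int_{-\infty}^{-a} 1 \cdot f_X(x) \, dx + \int_{-\infty}^{-a} 1 \cdot f_X(x) \, dx \\
&= \int_{-\infty}^{-a} 2 f_X(x) \, dx.
\end{aligned}
\]
(A) holds due to Note (iii) that follows Definition 3.3.2.
Similarly,
\[ \left| \int_{+a}^{+\infty} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| \le \int_{+a}^{+\infty} 2 f_X(x) \, dx \]
Returning to the main part of the proof, we get
\[
\begin{aligned}
|\Phi_X(t') - \Phi_X(t)| &\le \int_{-\infty}^{-a} 2 f_X(x) \, dx + \left| \int_{-a}^{+a} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| + \int_{+a}^{+\infty} 2 f_X(x) \, dx \\
&= 2 \left( \int_{-\infty}^{-a} f_X(x) \, dx + \int_{+a}^{+\infty} f_X(x) \, dx \right) + \left| \int_{-a}^{+a} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| \\
&= 2 P(|X| > a) + \left| \int_{-a}^{+a} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| \\
&\overset{\text{Condition (ii)}}{\le} 2 \, \frac{\epsilon}{4} + \left| \int_{-a}^{+a} (e^{it'x} - e^{itx}) f_X(x) \, dx \right| \\
&= \frac{\epsilon}{2} + \left| \int_{-a}^{+a} e^{itx} (e^{i(t' - t)x} - 1) f_X(x) \, dx \right| \\
&\le \frac{\epsilon}{2} + \int_{-a}^{+a} \left| e^{itx} (e^{i(t' - t)x} - 1) \right| f_X(x) \, dx \\
&= \frac{\epsilon}{2} + \int_{-a}^{+a} |e^{itx}| \, |e^{i(t' - t)x} - 1| \, f_X(x) \, dx \\
&\overset{(B)}{\le} \frac{\epsilon}{2} + \int_{-a}^{+a} 1 \cdot \frac{\epsilon}{2} f_X(x) \, dx \\
&\le \frac{\epsilon}{2} + \int_{-\infty}^{+\infty} \frac{\epsilon}{2} f_X(x) \, dx \\
&= \frac{\epsilon}{2} + \frac{\epsilon}{2} \\
&= \epsilon
\end{aligned}
\]
(B) holds due to Note (iii) that follows Definition 3.3.2 and due to condition (iii).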
Theorem 3.3.5:
Let $X$ be a real-valued rv and suppose $E(X^k)$ exists for an integer $k$. Then, $\Phi_X$ is $k$ times differentiable and $\Phi_X^{(k)}(t) = i^k E(X^k e^{itX})$. In particular for $t = 0$, it is $\Phi_X^{(k)}(0) = i^k m_k$.
Theorem 3.3.6:
Let $X$ be a real-valued rv with characteristic function $\Phi_X$ and let $\Phi_X$ be $k$ times differentiable, where $k$ is an even integer. Then the $k$th moment of $X$, $m_k$, exists and it is $\Phi_X^{(k)}(0) = i^k m_k$.
Theorem 3.3.7: Levy's Theorem
Let $X$ be a real-valued rv with cdf $F_X$ and characteristic function $\Phi_X$. Let $a, b \in \mathbb{R}$, $a < b$. If $P(X = a) = P(X = b) = 0$, i.e., $F_X$ is continuous in $a$ and $b$, then
\[ F_X(b) - F_X(a) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-ita} - e^{-itb}}{it} \Phi_X(t) \, dt. \]
Theorem 3.3.8:
Let $X$ and $Y$ be real-valued rv's with characteristic functions $\Phi_X$ and $\Phi_Y$. If $\Phi_X = \Phi_Y$, then $X$ and $Y$ are identically distributed.
Theorem 3.3.9:
Let $X$ be a real-valued rv with characteristic function $\Phi_X$ such that $\int_{-\infty}^{\infty} |\Phi_X(t)| \, dt < \infty$. Then $X$ has pdf
\[ f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \Phi_X(t) \, dt. \]
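The inversion formula of Theorem 3.3.9 can be tried out numerically. The sketch below assumes the characteristic function $e^{-t^2/2}$ of $N(0,1)$ (listed in Theorem 3.3.12) and approximates the integral with a trapezoidal rule on a truncated range; the function names, the truncation `t_max`, and the evaluation point `x = 0.7` are illustrative choices.

```python
import cmath
import math

def pdf_from_cf(cf, x, t_max=40.0, n=40000):
    """Approximate (1 / (2*pi)) * int exp(-i*t*x) * cf(t) dt on [-t_max, t_max] (trapezoidal rule)."""
    h = 2.0 * t_max / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        t = -t_max + k * h
        term = cmath.exp(-1j * t * x) * cf(t)
        total += term if 0 < k < n else 0.5 * term
    return (total * h / (2.0 * math.pi)).real

def normal_cf(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-0.5 * t * t)

x = 0.7
recovered = pdf_from_cf(normal_cf, x)
exact = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
print(recovered, exact)
```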
Theorem 3.3.10:
Let $X$ be a real-valued rv with mgf $M_X(t)$, i.e., the mgf exists. Then $\Phi_X(t) = M_X(it)$.
Theorem 3.3.11:
Suppose real-valued rv's $\{X_i\}_{i=1}^{\infty}$ have cdf's $\{F_{X_i}\}_{i=1}^{\infty}$ and characteristic functions $\{\Phi_{X_i}(t)\}_{i=1}^{\infty}$. If $\lim_{i \to \infty} \Phi_{X_i}(t) = \Phi_X(t) \;\; \forall t \in (-h, h)$ for some $h > 0$ and $\Phi_X(t)$ is itself a characteristic function (of a rv $X$ with cdf $F_X$), then $\lim_{i \to \infty} F_{X_i}(x) = F_X(x)$ for all continuity points $x$ of $F_X(x)$, i.e., the convergence of characteristic functions implies the convergence of cdf's.
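Theorem 3.3.12:
Characteristic functions for some well-known distributions:
\[
\begin{array}{lll}
 & \text{Distribution} & \Phi_X(t) \\ \hline
\text{(i)} & X \sim Dirac(c) & e^{itc} \\
\text{(ii)} & X \sim Bin(1, p) & 1 + p(e^{it} - 1) \\
\text{(iii)} & X \sim Poisson(c) & \exp(c(e^{it} - 1)) \\
\text{(iv)} & X \sim U(a, b) & \frac{e^{itb} - e^{ita}}{(b - a)it} \\
\text{(v)} & X \sim N(0, 1) & \exp(-t^2/2) \\
\text{(vi)} & X \sim N(\mu, \sigma^2) & e^{it\mu} \exp(-\sigma^2 t^2/2) \\
\text{(vii)} & X \sim \Gamma(p, q) & (1 - \frac{it}{q})^{-p} \\
\text{(viii)} & X \sim Exp(c) & (1 - \frac{it}{c})^{-1} \\
\text{(ix)} & X \sim \chi^2_n & (1 - 2it)^{-n/2}
\end{array}
\]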
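Entries (ii) and (iii) of the table can be compared against the defining sum $E(e^{itX}) = \sum_x e^{itx} P(X = x)$. The sketch below is a numerical illustration under assumed parameter values (`p = 0.3`, `c = 2.5`, `t = 1.7`); the Poisson pmf is truncated after 100 terms, which is an approximation introduced here.

```python
import cmath
import math

def cf_discrete(pmf, t):
    """E(exp(i*t*X)) for a discrete rv given as {value: probability}."""
    return sum(cmath.exp(1j * t * x) * p for x, p in pmf.items())

p, c, t = 0.3, 2.5, 1.7  # arbitrary example values

bernoulli = {0: 1.0 - p, 1: p}

# Poisson(c) pmf built recursively and truncated at 100 terms;
# the neglected probability mass is negligible for c = 2.5.
poisson = {}
prob = math.exp(-c)
for n in range(100):
    poisson[n] = prob
    prob *= c / (n + 1)

err_bernoulli = abs(cf_discrete(bernoulli, t) - (1.0 + p * (cmath.exp(1j * t) - 1.0)))
err_poisson = abs(cf_discrete(poisson, t) - cmath.exp(c * (cmath.exp(1j * t) - 1.0)))
print(err_bernoulli, err_poisson)
```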
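Proof:
(i) $\Phi_X(t) = E(e^{itX}) = e^{itc} P(X = c) = e^{itc}$
(ii) $\Phi_X(t) = \sum_{k=0}^{1} e^{itk} P(X = k) = e^{it0} (1 - p) + e^{it1} p = 1 + p(e^{it} - 1)$
(iii) $\Phi_X(t) = \sum_{n \in \mathbb{N}_0} e^{itn} \frac{c^n}{n!} e^{-c} = e^{-c} \sum_{n=0}^{\infty} \frac{1}{n!} (c \, e^{it})^n = e^{-c} e^{c \, e^{it}} = e^{c(e^{it} - 1)}$ since $\sum_{n=0}^{\infty} \frac{x^n}{n!} = e^x$
(iv) $\Phi_X(t) = \frac{1}{b - a} \int_a^b e^{itx} \, dx = \frac{1}{b - a} \left[ \frac{e^{itx}}{it} \right]_a^b = \frac{e^{itb} - e^{ita}}{(b - a)it}$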
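(v) $X \sim N(0, 1)$ is symmetric around 0
$\Longrightarrow \Phi_X(t)$ is real since there is no imaginary part according to Theorem 3.3.3 (vi)
\[ \Longrightarrow \Phi_X(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos(tx) \, e^{-x^2/2} \, dx \]
Since $E(X)$ exists, $\Phi_X(t)$ is differentiable according to Theorem 3.3.5 and the following holds:
\[
\begin{aligned}
\Phi_X'(t) &= Re(\Phi_X'(t)) \\
&= Re\left( \int_{-\infty}^{\infty} ix \, \underbrace{e^{itx}}_{\cos(tx) + i \sin(tx)} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx \right) \\
&= Re\left( \int_{-\infty}^{\infty} ix \cos(tx) \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx + \int_{-\infty}^{\infty} (-x) \sin(tx) \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx \right) \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \underbrace{(-\sin(tx))}_{u} \underbrace{x e^{-x^2/2}}_{v'} \, dx \qquad \left| \; u' = -t \cos(tx) \text{ and } v = -e^{-x^2/2} \right. \\
&= \underbrace{\frac{1}{\sqrt{2\pi}} \sin(tx) \, e^{-x^2/2} \Big|_{-\infty}^{\infty}}_{= 0 \text{ since } e^{-x^2/2} \to 0 \text{ as } x \to \pm\infty} - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (-t \cos(tx)) (-e^{-x^2/2}) \, dx \\
&= -t \, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos(tx) \, e^{-x^2/2} \, dx \\
&= -t \, \Phi_X(t)
\end{aligned}
\]
Thus, $\Phi_X'(t) = -t \, \Phi_X(t)$. It follows that $\frac{\Phi_X'(t)}{\Phi_X(t)} = -t$ and by integrating both sides, we get $\ln |\Phi_X(t)| = -\frac{1}{2} t^2 + c$ with $c \in \mathbb{R}$.
For $t = 0$, we know that $\Phi_X(0) = 1$ by Theorem 3.3.3 (i) and $\ln |\Phi_X(0)| = 0$. It follows that $0 = 0 + c$. Therefore, $c = 0$ and $|\Phi_X(t)| = e^{-\frac{1}{2} t^2}$.
If we take $t = 0$, then $\Phi_X(0) = 1$ by Theorem 3.3.3 (i). Since $\Phi_X$ is continuous, $\Phi_X$ must take the value 0 before it can eventually take a negative value. However, since $e^{-\frac{1}{2} t^2} > 0 \;\; \forall t \in \mathbb{R}$, $\Phi_X$ cannot take 0 as a possible value and therefore cannot pass into the negative numbers. So, it must hold that $\Phi_X(t) = e^{-\frac{1}{2} t^2} \;\; \forall t \in \mathbb{R}$.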
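(vi) For $\sigma > 0$, $\mu \in \mathbb{R}$, we know that if $X \sim N(0, 1)$, then $\sigma X + \mu \sim N(\mu, \sigma^2)$. By Theorem 3.3.3 (vii) we have
\[ \Phi_{\sigma X + \mu}(t) = e^{it\mu} \Phi_X(\sigma t) = e^{it\mu} e^{-\frac{1}{2} \sigma^2 t^2}. \]
(vii)
\[
\begin{aligned}
\Phi_X(t) &= \int_0^\infty e^{itx} \gamma(p, q, x) \, dx \\
&= \int_0^\infty \frac{q^p}{\Gamma(p)} x^{p-1} e^{-(q - it)x} \, dx \\
&= \frac{q^p}{\Gamma(p)} (q - it)^{-p} \int_0^\infty ((q - it)x)^{p-1} e^{-(q - it)x} (q - it) \, dx \qquad \left| \; u = (q - it)x, \; du = (q - it) \, dx \right. \\
&= \frac{q^p}{\Gamma(p)} (q - it)^{-p} \underbrace{\int_0^\infty u^{p-1} e^{-u} \, du}_{= \Gamma(p)} \\
&= q^p (q - it)^{-p} \\
&= \frac{(q - it)^{-p}}{q^{-p}} \\
&= \left( \frac{q - it}{q} \right)^{-p} \\
&= \left( 1 - \frac{it}{q} \right)^{-p}
\end{aligned}
\]
(viii) Since an $Exp(c)$ distribution is a $\Gamma(1, c)$ distribution, we get for $X \sim Exp(c) = \Gamma(1, c)$:
\[ \Phi_X(t) = \left( 1 - \frac{it}{c} \right)^{-1} \]
(ix) Since a $\chi^2_n$ distribution (for $n \in \mathbb{N}$) is a $\Gamma(\frac{n}{2}, \frac{1}{2})$ distribution, we get for $X \sim \chi^2_n = \Gamma(\frac{n}{2}, \frac{1}{2})$:
\[ \Phi_X(t) = \left( 1 - \frac{it}{1/2} \right)^{-n/2} = (1 - 2it)^{-n/2} \]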
Example 3.3.13:
Since we know that $m_1 = E(X)$ and $m_2 = E(X^2)$ exist for $X \sim Bin(1, p)$, we can determine these moments according to Theorem 3.3.5 using the characteristic function.
It is
\[ \Phi_X(t) = 1 + p(e^{it} - 1) \]
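Note:
The restriction $\int_{-\infty}^{\infty} |\Phi_X(t)| \, dt < \infty$ in Theorem 3.3.9 works in such a way that we don't end up with a (non-existing) pdf if $X$ is a discrete rv. For example, $X \sim Dirac(c)$:
\[
\begin{aligned}
\int_{-\infty}^{\infty} |\Phi_X(t)| \, dt &= \int_{-\infty}^{\infty} |e^{itc}| \, dt \\
&= \int_{-\infty}^{\infty} 1 \, dt \\
&= t \, \Big|_{-\infty}^{\infty}
\end{aligned}
\]
which is undefined.
Otherwise, $X \sim N(0, 1)$:
\[
\begin{aligned}
\int_{-\infty}^{\infty} |\Phi_X(t)| \, dt &= \int_{-\infty}^{\infty} \exp(-t^2/2) \, dt \\
&= \sqrt{2\pi} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp(-t^2/2) \, dt \\
&= \sqrt{2\pi} \\
&< \infty
\end{aligned}
\]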
Theorem 3.4.2:
$G(s)$ converges for $|s| < 1$.
Proof:
\[ |G(s)| \le \sum_{k=0}^{\infty} |p_k s^k| \le \sum_{k=0}^{\infty} |p_k| = 1 \]
Theorem 3.4.3:
Let $X$ be a discrete rv which only takes non-negative integer values and has pgf $G(s)$. Then it holds:
\[ P(X = k) = \frac{1}{k!} \frac{d^k}{ds^k} G(s) \Big|_{s=0} \]
Theorem 3.4.4:
Let $X$ be a discrete rv which only takes non-negative integer values and has pgf $G(s)$. If $E(X)$ exists, then it holds:
\[ E(X) = \frac{d}{ds} G(s) \Big|_{s=1} \]
Definition 3.4.5:
The $k$th factorial moment of $X$ is defined as
\[ E[X(X - 1)(X - 2) \cdots (X - k + 1)] \]
if this expectation exists.
Theorem 3.4.6:
Let $X$ be a discrete rv which only takes non-negative integer values and has pgf $G(s)$. If $E[X(X - 1)(X - 2) \cdots (X - k + 1)]$ exists, then it holds:
\[ E[X(X - 1)(X - 2) \cdots (X - k + 1)] = \frac{d^k}{ds^k} G(s) \Big|_{s=1} \]
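For a rv with finitely many values, the pgf is a polynomial, so Theorems 3.4.3, 3.4.4, and 3.4.6 can be checked with exact arithmetic. The sketch below assumes $G(s) = \sum_k p_k s^k$ (the form used in the proof of Theorem 3.4.2) and a hypothetical pmf on $\{0, 1, 2, 3\}$; the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Coefficients of G'(s) when G(s) = sum_k coeffs[k] * s^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, s):
    """Value of the polynomial with the given coefficients at s."""
    return sum(c * s ** k for k, c in enumerate(coeffs))

def kth_derivative_at(coeffs, k, s):
    """k-th derivative of the polynomial, evaluated at s."""
    for _ in range(k):
        coeffs = derivative(coeffs)
    return evaluate(coeffs, s)

# Hypothetical pmf on {0, 1, 2, 3}; the list entries are p_0, ..., p_3.
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

recovered = [kth_derivative_at(pmf, k, 0) / factorial(k) for k in range(4)]  # Theorem 3.4.3
mean = kth_derivative_at(pmf, 1, 1)                                          # Theorem 3.4.4
second_factorial_moment = kth_derivative_at(pmf, 2, 1)                       # Theorem 3.4.6, k = 2
print(recovered, mean, second_factorial_moment)
```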
Note:
Similar to the Cauchy distribution for the continuous case, there exist discrete distributions where
the mean (or higher moments) do not exist. See Homework.
Corollary 3.5.2: Markov's Inequality
Let $h(X) = |X|^r$ and $\epsilon = k^r$ where $r > 0$ and $k > 0$. If $E(|X|^r)$ exists, then it holds:
\[ P(|X| \ge k) \le \frac{E(|X|^r)}{k^r} \]
Proof:
Since $P(|X| \ge k) = P(|X|^r \ge k^r)$ for $k > 0$, it follows using Theorem 3.5.1:
\[ P(|X| \ge k) = P(|X|^r \ge k^r) \overset{\text{Th. 3.5.1}}{\le} \frac{E(|X|^r)}{k^r} \]
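Proof:
Continuous case only:
Let $Q(u, v) = E\left( (u |X|^{\frac{k-1}{2}} + v |X|^{\frac{k+1}{2}})^2 \right)$. Obviously, $Q(u, v) \ge 0 \;\; \forall u, v \in \mathbb{R}$. Also,
\[
\begin{aligned}
Q(u, v) &= \int_{-\infty}^{\infty} (u |x|^{\frac{k-1}{2}} + v |x|^{\frac{k+1}{2}})^2 f_X(x) \, dx \\
&= u^2 \int_{-\infty}^{\infty} |x|^{k-1} f_X(x) \, dx + 2uv \int_{-\infty}^{\infty} |x|^k f_X(x) \, dx + v^2 \int_{-\infty}^{\infty} |x|^{k+1} f_X(x) \, dx \\
&= u^2 \beta_{k-1} + 2uv \beta_k + v^2 \beta_{k+1} \\
&\ge 0 \quad \forall u, v \in \mathbb{R}
\end{aligned}
\]
Using the fact that $Ax^2 + 2Bxy + Cy^2 \ge 0 \;\; \forall x, y \in \mathbb{R}$ iff $A \ge 0$, $C \ge 0$, and $AC - B^2 \ge 0$ (see Rohatgi, page 6, Section P2.4), we get with $A = \beta_{k-1}$, $B = \beta_k$, and $C = \beta_{k+1}$:
\[
\begin{aligned}
& \beta_{k-1} \beta_{k+1} - \beta_k^2 \ge 0 \\
\Longrightarrow \; & \beta_k^2 \le \beta_{k-1} \beta_{k+1} \\
\Longrightarrow \; & \beta_k^{2k} \le \beta_{k-1}^k \beta_{k+1}^k
\end{aligned}
\]
This means that $\beta_1^2 \le \beta_0^1 \beta_2^1$, $\beta_2^4 \le \beta_1^2 \beta_3^2$, $\beta_3^6 \le \beta_2^3 \beta_4^3$, and so on. Multiplying these, we get:
\[
\begin{aligned}
\prod_{j=1}^{k-1} \beta_j^{2j} &\le \prod_{j=1}^{k-1} \beta_{j-1}^j \beta_{j+1}^j \\
&= (\beta_0 \beta_2)(\beta_1^2 \beta_3^2)(\beta_2^3 \beta_4^3)(\beta_3^4 \beta_5^4) \cdots (\beta_{k-3}^{k-2} \beta_{k-1}^{k-2})(\beta_{k-2}^{k-1} \beta_k^{k-1}) \\
&= \beta_0 \, \beta_{k-1}^{k-2} \, \beta_k^{k-1} \prod_{j=1}^{k-2} \beta_j^{2j}
\end{aligned}
\]
Dividing both sides by $\prod_{j=1}^{k-2} \beta_j^{2j}$, we get:
\[
\begin{aligned}
& \beta_{k-1}^{2k-2} \le \beta_0 \, \beta_k^{k-1} \beta_{k-1}^{k-2} \\
\overset{\beta_0 = 1}{\Longrightarrow} \; & \beta_{k-1}^k \le \beta_k^{k-1} \\
\Longrightarrow \; & \beta_{k-1}^1 \le \beta_k^{\frac{k-1}{k}} \\
\Longrightarrow \; & \beta_{k-1}^{\frac{1}{k-1}} \le \beta_k^{\frac{1}{k}}
\end{aligned}
\]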
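The conclusion $\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}$ can be observed on a concrete distribution, reading $\beta_k$ as the absolute moment $E(|X|^k)$ as in the integrals above. The sketch below uses a hypothetical four-point discrete distribution (the proof above treats the continuous case; the discrete example is an assumption made for ease of computation).

```python
def abs_moment(pmf, k):
    """beta_k = E(|X|^k) for a discrete rv given as {value: probability}."""
    return sum(abs(x) ** k * p for x, p in pmf.items())

# Hypothetical discrete distribution used only for illustration.
pmf = {-3.0: 0.1, -1.0: 0.2, 0.5: 0.4, 2.0: 0.3}

# beta_k^(1/k) for k = 1, ..., 6; the sequence should be non-decreasing in k.
roots = [abs_moment(pmf, k) ** (1.0 / k) for k in range(1, 7)]
is_non_decreasing = all(r1 <= r2 + 1e-12 for r1, r2 in zip(roots, roots[1:]))

# The intermediate inequality beta_k^2 <= beta_{k-1} * beta_{k+1} for k = 1, ..., 5.
log_convex = all(
    abs_moment(pmf, k) ** 2 <= abs_moment(pmf, k - 1) * abs_moment(pmf, k + 1) + 1e-12
    for k in range(1, 6)
)
print(roots, is_non_decreasing, log_convex)
```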
4 Random Vectors
4.1 Joint, Marginal, and Conditional Distributions
Definition 4.1.1:
The vector $X = (X_1, \ldots, X_n)'$ on $(\Omega, L, P) \to \mathbb{R}^n$ defined by $X(\omega) = (X_1(\omega), \ldots, X_n(\omega))'$, $\omega \in \Omega$, is an $n$-dimensional random vector ($n$-rv) if $X^{-1}(I) = \{\omega : X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\} \in L$ for all $n$-dimensional intervals $I = \{(x_1, \ldots, x_n) : -\infty < x_i \le a_i, \; a_i \in \mathbb{R} \;\; \forall i = 1, \ldots, n\}$.
Note:
It follows that if $X_1, \ldots, X_n$ are any $n$ rv's on $(\Omega, L, P)$, then $X = (X_1, \ldots, X_n)'$ is an $n$-rv on $(\Omega, L, P)$ since for any $I$, it holds:
\[
\begin{aligned}
X^{-1}(I) &= \{\omega : (X_1(\omega), \ldots, X_n(\omega)) \in I\} \\
&= \{\omega : X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\} \\
&= \underbrace{\bigcap_{k=1}^{n} \underbrace{\{\omega : X_k(\omega) \le a_k\}}_{\in L}}_{\in L}
\end{aligned}
\]
Definition 4.1.2:
For an $n$-rv $X$, a function $F$ defined by
\[ F(x) = P(X \le x) = P(X_1 \le x_1, \ldots, X_n \le x_n) \quad \forall x \in \mathbb{R}^n \]
is the joint cumulative distribution function of $X$.
Note:
(i) $F$ is non-decreasing and right-continuous in each of its arguments $x_i$.
(ii) $\lim_{x \to \infty} F(x) = \lim_{x_1 \to \infty, \ldots, x_n \to \infty} F(x) = 1$ and $\lim_{x_k \to -\infty} F(x) = 0 \;\; \forall x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n \in \mathbb{R}$.
However, conditions (i) and (ii) together are not sufficient for $F$ to be a joint cdf. Instead we need the conditions from the next Theorem.
Theorem 4.1.3:
A function $F(x) = F(x_1, \ldots, x_n)$ is the joint cdf of some $n$-rv $X$ iff
(i) $F$ is non-decreasing and right-continuous with respect to each $x_i$,
(ii) $F(-\infty, x_2, \ldots, x_n) = F(x_1, -\infty, x_3, \ldots, x_n) = \ldots = F(x_1, \ldots, x_{n-1}, -\infty) = 0$ and $F(\infty, \ldots, \infty) = 1$, and
(iii) $\forall x \in \mathbb{R}^n \;\; \forall \epsilon_i > 0, \; i = 1, \ldots, n$, the following inequality holds:
\[
\begin{aligned}
& F(x + \epsilon) - \sum_{i=1}^{n} F(x_1 + \epsilon_1, \ldots, x_{i-1} + \epsilon_{i-1}, x_i, x_{i+1} + \epsilon_{i+1}, \ldots, x_n + \epsilon_n) \\
& + \sum_{1 \le i < j \le n} F(x_1 + \epsilon_1, \ldots, x_{i-1} + \epsilon_{i-1}, x_i, x_{i+1} + \epsilon_{i+1}, \ldots, x_{j-1} + \epsilon_{j-1}, x_j, x_{j+1} + \epsilon_{j+1}, \ldots, x_n + \epsilon_n) \\
& \mp \ldots \\
& + (-1)^n F(x) \\
& \ge 0
\end{aligned}
\]
Note:
We won't prove this Theorem but just see why we need condition (iii) for $n = 2$:
\[ P(x_1 < X \le x_2, \; y_1 < Y \le y_2) = P(X \le x_2, Y \le y_2) - P(X \le x_1, Y \le y_2) - P(X \le x_2, Y \le y_1) + P(X \le x_1, Y \le y_1) \ge 0 \]
We will restrict ourselves to $n = 2$ for most of the next Definitions and Theorems but those can be easily generalized to $n > 2$. The term bivariate rv is often used to refer to a 2-rv and multivariate rv is used to refer to an $n$-rv, $n \ge 2$.
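The role of condition (iii) for $n = 2$ can be made concrete with a small computation. The sketch below evaluates the rectangle formula for a valid joint cdf (two independent $U(0,1)$ rv's, an assumed example) and for a candidate function that satisfies (i) and (ii) but assigns a negative "probability" to a rectangle; the function names and the chosen rectangles are illustrative.

```python
def rectangle_probability(F, x1, x2, y1, y2):
    """P(x1 < X <= x2, y1 < Y <= y2) computed from a candidate joint cdf F."""
    return F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1)

def clamp01(u):
    """Clip u to the interval [0, 1]."""
    return min(max(u, 0.0), 1.0)

def F_independent_uniform(x, y):
    """Joint cdf of two independent U(0,1) rv's (a valid cdf)."""
    return clamp01(x) * clamp01(y)

def F_candidate(x, y):
    """Non-decreasing, right-continuous, with the correct limits, but violating condition (iii)."""
    return 1.0 if (x >= 0.0 and y >= 0.0 and x + y >= 1.0) else 0.0

valid = rectangle_probability(F_independent_uniform, 0.25, 0.75, 0.5, 1.0)
invalid = rectangle_probability(F_candidate, 1 / 3, 1.0, 1 / 3, 1.0)
print(valid, invalid)
```

The second value is negative, so `F_candidate` cannot be the joint cdf of any 2-rv.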
Definition 4.1.4:
A 2-rv $(X, Y)$ is discrete if there exists a countable collection $\mathcal{X}$ of pairs $(x_i, y_j)$ that has probability 1. Let $p_{ij} = P(X = x_i, Y = y_j) > 0 \;\; \forall (x_i, y_j) \in \mathcal{X}$. Then, $\sum_{i,j} p_{ij} = 1$ and $\{p_{ij}\}$ is the joint probability mass function of $(X, Y)$.
Definition 4.1.5:
Let $(X, Y)$ be a discrete 2-rv with joint pmf $\{p_{ij}\}$. Define
\[ p_{i \cdot} = \sum_{j=1}^{\infty} p_{ij} = \sum_{j=1}^{\infty} P(X = x_i, Y = y_j) = P(X = x_i) \]
and
\[ p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij} = \sum_{i=1}^{\infty} P(X = x_i, Y = y_j) = P(Y = y_j). \]
Then $\{p_{i \cdot}\}$ is called the marginal probability mass function of $X$ and $\{p_{\cdot j}\}$ is called the marginal probability mass function of $Y$.
Definition 4.1.6:
A 2-rv $(X, Y)$ is continuous if there exists a non-negative function $f$ such that
\[ F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v) \, dv \, du \quad \forall (x, y) \in \mathbb{R}^2 \]
where $F$ is the joint cdf of $(X, Y)$. We call $f$ the joint probability density function of $(X, Y)$.
Note:
If $f$ is continuous at $(x, y)$, then
\[ \frac{\partial^2 F(x, y)}{\partial x \, \partial y} = f(x, y). \]
Definition 4.1.7:
Let $(X, Y)$ be a continuous 2-rv with joint pdf $f$. Then $f_X(x) = \int_{-\infty}^{\infty} f(x, y) \, dy$ is called the marginal probability density function of $X$ and $f_Y(y) = \int_{-\infty}^{\infty} f(x, y) \, dx$ is called the marginal probability density function of $Y$.
Note:
(i)
\[ \int_{-\infty}^{\infty} f_X(x) \, dx = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x, y) \, dy \right) dx = F(\infty, \infty) = 1 = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(x, y) \, dx \right) dy = \int_{-\infty}^{\infty} f_Y(y) \, dy \]
and $f_X(x) \ge 0 \;\; \forall x \in \mathbb{R}$ and $f_Y(y) \ge 0 \;\; \forall y \in \mathbb{R}$.
(ii) Given a 2-rv $(X, Y)$ with joint cdf $F(x, y)$, how do we generate a marginal cdf $F_X(x) = P(X \le x)$? The answer is $P(X \le x) = P(X \le x, -\infty < Y < \infty) = F(x, \infty)$.
Denition 4.1.8:
If FX (x1 ; : : : ; xn ) = FX (x) is the joint cdf of an n{rv X = (X1 ; : : : ; Xn ), then the marginal
cumulative distribution function of (Xi1 ; : : : ; Xik ); 1 k n ; 1; 1 i1 < i2 < : : : < ik n,
is given by
lim F (x) = FX (1; : : : ; 1; xi1 ; 1; : : : ; 1; xi2 ; 1; : : : ; 1; xik ; 1; : : : ; 1):
xi !1;i6=i1;:::;ik X
Note:
In Denition 1.4.1, we dened conditional probability distributions in some probability space
(
; L; P ). This denition extends to conditional distributions of 2{rv's (X; Y ).
Definition 4.1.9:
Let $(X, Y)$ be a discrete 2-rv. If $P(Y = y_j) = p_{\cdot j} > 0$, then the conditional probability mass
function of $X$ given $Y = y_j$ (for fixed $j$) is defined as
\[ p_{i|j} = P(X = x_i \mid Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{p_{ij}}{p_{\cdot j}}. \]
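As a small illustration of Definition 4.1.9, the following sketch computes a conditional pmf from a joint pmf table. The table `joint_pmf` and the helper names are hypothetical and chosen only for this example; exact fractions are used so that the results can be checked by hand.

```python
from fractions import Fraction

# Hypothetical joint pmf p_ij = P(X = x, Y = y), keyed by (x, y).
joint_pmf = {
    (0, 0): Fraction(1, 10), (1, 0): Fraction(3, 10),
    (0, 1): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

def marginal_y(joint, y):
    """P(Y = y): sum the joint pmf over all x."""
    return sum(p for (_, yj), p in joint.items() if yj == y)

def conditional_pmf_x_given_y(joint, y):
    """P(X = x | Y = y) = p_ij / p_.j, defined only if P(Y = y) > 0."""
    p_y = marginal_y(joint, y)
    if p_y <= 0:
        raise ValueError("conditional pmf is undefined when P(Y = y) = 0")
    return {x: p / p_y for (x, yj), p in joint.items() if yj == y}

cond = conditional_pmf_x_given_y(joint_pmf, 0)
print(cond)  # conditional pmf of X given Y = 0
```

For fixed $y_j$ the conditional probabilities sum to 1, so the conditional pmf is itself a pmf.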
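Note:
For a continuous 2-rv $(X, Y)$ with pdf $f$, $P(X \le x \mid Y = y)$ is not defined. Let $\epsilon > 0$ and suppose
that $P(y - \epsilon < Y \le y + \epsilon) > 0$. For every $x$ and every interval $(y - \epsilon, y + \epsilon]$, consider the conditional
probability of $X \le x$ given $Y \in (y - \epsilon, y + \epsilon]$. We have
\[ P(X \le x \mid y - \epsilon < Y \le y + \epsilon) = \frac{P(X \le x, \ y - \epsilon < Y \le y + \epsilon)}{P(y - \epsilon < Y \le y + \epsilon)} \]
which is well-defined if $P(y - \epsilon < Y \le y + \epsilon) > 0$ holds.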
Definition 4.1.10:
The conditional cumulative distribution function of a rv $X$ given that $Y = y$ is defined to be
\[ F_{X|Y}(x \mid y) = \lim_{\epsilon \to 0^+} P(X \le x \mid Y \in (y - \epsilon, y + \epsilon]) \]
provided that this limit exists. If it does exist, the conditional probability density function
of $X$ given that $Y = y$ is any non-negative function $f_{X|Y}(x \mid y)$ satisfying
\[ F_{X|Y}(x \mid y) = \int_{-\infty}^{x} f_{X|Y}(t \mid y) \, dt \quad \forall x \in \mathbb{R}. \]
Note:
For fixed $y$, $f_{X|Y}(x \mid y) \ge 0$ and $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y) \, dx = 1$. So it is really a pdf.
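Theorem 4.1.11:
Let $(X, Y)$ be a continuous 2-rv with joint pdf $f_{X,Y}$. It holds that at every point $(x, y)$ where $f$ is
continuous and the marginal pdf $f_Y(y) > 0$, we have
\[
\begin{aligned}
F_{X|Y}(x \mid y) &= \lim_{\epsilon \to 0^+} \frac{P(X \le x, \ Y \in (y - \epsilon, y + \epsilon])}{P(Y \in (y - \epsilon, y + \epsilon])} \\
&= \lim_{\epsilon \to 0^+} \frac{\frac{1}{2\epsilon} \int_{-\infty}^{x} \int_{y - \epsilon}^{y + \epsilon} f_{X,Y}(u, v) \, dv \, du}{\frac{1}{2\epsilon} \int_{y - \epsilon}^{y + \epsilon} f_Y(v) \, dv} \\
&= \frac{\int_{-\infty}^{x} f_{X,Y}(u, y) \, du}{f_Y(y)} \\
&= \int_{-\infty}^{x} \frac{f_{X,Y}(u, y)}{f_Y(y)} \, du.
\end{aligned}
\]
Thus, $f_{X|Y}(x \mid y)$ exists and equals $\frac{f_{X,Y}(x, y)}{f_Y(y)}$, provided that $f_Y(y) > 0$. Furthermore, since
\[ \int_{-\infty}^{x} f_{X,Y}(u, y) \, du = f_Y(y) F_{X|Y}(x \mid y), \]
we get the following marginal cdf of $X$:
\[ F_X(x) = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{x} f_{X,Y}(u, y) \, du \right) dy = \int_{-\infty}^{\infty} f_Y(y) F_{X|Y}(x \mid y) \, dy. \]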
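Example 4.1.12:
Consider
\[ f_{X,Y}(x, y) = \begin{cases} 2, & 0 < x < y < 1 \\ 0, & \text{otherwise} \end{cases} \]
We calculate the marginal pdf's $f_X(x)$ and $f_Y(y)$ first:
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy = \int_{x}^{1} 2 \, dy = 2(1 - x) \quad \text{for } 0 < x < 1 \]
and
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx = \int_{0}^{y} 2 \, dx = 2y \quad \text{for } 0 < y < 1. \]
The conditional pdf's $f_{Y|X}(y \mid x)$ and $f_{X|Y}(x \mid y)$ are calculated as follows: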
Definition 4.2.2:
Let $F_{X,Y}(x, y)$ be the joint cdf and $F_X(x)$ and $F_Y(y)$ be the marginal cdf's of a 2-rv $(X, Y)$.
$X$ and $Y$ are independent iff
\[ F_{X,Y}(x, y) = F_X(x) F_Y(y) \quad \forall (x, y) \in \mathbb{R}^2. \]
Lemma 4.2.3:
If $X$ and $Y$ are independent, $a, b, c, d \in \mathbb{R}$, and $a < b$ and $c < d$, then
\[ P(a < X \le b, \ c < Y \le d) = P(a < X \le b) \, P(c < Y \le d). \]
Proof:
\[
\begin{aligned}
P(a < X \le b, \ c < Y \le d) &= F_{X,Y}(b, d) - F_{X,Y}(a, d) - F_{X,Y}(b, c) + F_{X,Y}(a, c) \\
&= F_X(b) F_Y(d) - F_X(a) F_Y(d) - F_X(b) F_Y(c) + F_X(a) F_Y(c) \\
&= (F_X(b) - F_X(a))(F_Y(d) - F_Y(c)) \\
&= P(a < X \le b) \, P(c < Y \le d)
\end{aligned}
\]
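The inclusion-exclusion step in the proof of Lemma 4.2.3 can be checked numerically. The sketch below assumes, purely for illustration, that $X$ and $Y$ are independent $N(0, 1)$ rv's; the function names are not part of the notes.

```python
import math

def normal_cdf(x: float) -> float:
    """cdf of N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def independent_joint_cdf(x: float, y: float) -> float:
    """Joint cdf of independent X, Y: F_{X,Y}(x, y) = F_X(x) F_Y(y)."""
    return normal_cdf(x) * normal_cdf(y)

def rectangle_prob(joint_cdf, a, b, c, d):
    """P(a < X <= b, c < Y <= d) by inclusion-exclusion on the joint cdf."""
    return joint_cdf(b, d) - joint_cdf(a, d) - joint_cdf(b, c) + joint_cdf(a, c)

a, b, c, d = -1.0, 0.5, 0.0, 2.0
lhs = rectangle_prob(independent_joint_cdf, a, b, c, d)
rhs = (normal_cdf(b) - normal_cdf(a)) * (normal_cdf(d) - normal_cdf(c))
print(lhs, rhs)  # the two values agree up to floating-point rounding
```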
Definition 4.2.4:
A collection of rv's $X_1, \ldots, X_n$ with joint cdf $F_X(x)$ and marginal cdf's $F_{X_i}(x_i)$ are mutually (or
completely) independent iff
\[ F_X(x) = \prod_{i=1}^{n} F_{X_i}(x_i) \quad \forall x \in \mathbb{R}^n. \]
Note:
We often simply say that the rv's $X_1, \ldots, X_n$ are independent when we really mean that they are
mutually independent.
\[ = \prod_{i=1}^{n} F_{X_i}(x_i) \]
(ii) Continuous case: Homework
Theorem 4.2.6:
$X_1, \ldots, X_n$ are independent iff
\[ P(X_i \in A_i, \ i = 1, \ldots, n) = \prod_{i=1}^{n} P(X_i \in A_i) \quad \forall \text{ Borel sets } A_i \in B \]
(i.e., rv's are independent iff all events involving these rv's are independent).
Proof:
Lemma 4.2.3 and definition of Borel sets.
Theorem 4.2.7:
Let $X_1, \ldots, X_n$ be independent rv's and $g_1, \ldots, g_n$ be Borel-measurable functions. Then
$g_1(X_1), g_2(X_2), \ldots, g_n(X_n)$ are independent.
Proof:
\[
\begin{aligned}
F_{g_1(X_1), g_2(X_2), \ldots, g_n(X_n)}(h_1, h_2, \ldots, h_n)
&= P(g_1(X_1) \le h_1, \ g_2(X_2) \le h_2, \ldots, g_n(X_n) \le h_n) \\
&\overset{(*)}{=} P(X_1 \in g_1^{-1}((-\infty, h_1]), \ldots, X_n \in g_n^{-1}((-\infty, h_n])) \\
&\overset{\text{Th. 4.2.6}}{=} \prod_{i=1}^{n} P(X_i \in g_i^{-1}((-\infty, h_i])) \\
&= \prod_{i=1}^{n} P(g_i(X_i) \le h_i) \\
&= \prod_{i=1}^{n} F_{g_i(X_i)}(h_i)
\end{aligned}
\]
$(*)$ holds since $g_1^{-1}((-\infty, h_1]) \in B, \ldots, g_n^{-1}((-\infty, h_n]) \in B$.
Theorem 4.2.8:
If $X_1, \ldots, X_n$ are independent, then also every subcollection $X_{i_1}, \ldots, X_{i_k}$, $k = 2, \ldots, n - 1$,
$1 \le i_1 < i_2 < \ldots < i_k \le n$, is independent.
Definition 4.2.9:
A set (or a sequence) of rv's $\{X_n\}_{n=1}^{\infty}$ is independent iff every finite subcollection is independent.
Note:
Recall that $X$ and $Y$ are identically distributed iff $F_X(x) = F_Y(x) \ \forall x \in \mathbb{R}$ according to Definition
2.2.5 and Theorem 2.2.6.
Definition 4.2.10:
We say that $\{X_n\}_{n=1}^{\infty}$ is a set (or a sequence) of independent identically distributed (iid) rv's
if $\{X_n\}_{n=1}^{\infty}$ is independent and all $X_n$ are identically distributed.
Note:
Recall that $X$ and $Y$ being identically distributed does not say that $X = Y$ with probability 1. If
this happens, we say that $X$ and $Y$ are equivalent rv's.
Note:
We can also extend the definition of independence to 2 random vectors $X_{n \times 1}$ and $Y_{n \times 1}$: $X$ and $Y$
are independent iff $F_{X,Y}(x, y) = F_X(x) F_Y(y) \ \forall x, y \in \mathbb{R}^n$.
This does not mean that the components $X_i$ of $X$ or the components $Y_i$ of $Y$ are independent.
However, it does mean that each pair of components $(X_i, Y_i)$ is independent, any subcollections
$(X_{i_1}, \ldots, X_{i_k})$ and $(Y_{j_1}, \ldots, Y_{j_l})$ are independent, and any Borel-measurable functions $f(X)$ and
$g(Y)$ are independent.
Theorem 4.3.2:
Let $X_1, \ldots, X_n$ be rv's $(\Omega, L, P) \to \mathbb{R}$. Define
\[ MAX_n = \max\{X_1, \ldots, X_n\} = X_{(n)} \]
by
\[ MAX_n(\omega) = \max\{X_1(\omega), \ldots, X_n(\omega)\} \quad \forall \omega \in \Omega \]
and
\[ MIN_n = \min\{X_1, \ldots, X_n\} = X_{(1)} = -\max\{-X_1, \ldots, -X_n\} \]
by
\[ MIN_n(\omega) = \min\{X_1(\omega), \ldots, X_n(\omega)\} \quad \forall \omega \in \Omega. \]
Then,
(i) $MIN_n$ and $MAX_n$ are rv's.
(ii) If $X_1, \ldots, X_n$ are independent, then
\[ F_{MAX_n}(z) = P(MAX_n \le z) = P(X_i \le z \ \forall i = 1, \ldots, n) = \prod_{i=1}^{n} F_{X_i}(z) \]
and
\[ F_{MIN_n}(z) = P(MIN_n \le z) = 1 - P(X_i > z \ \forall i = 1, \ldots, n) = 1 - \prod_{i=1}^{n} (1 - F_{X_i}(z)). \]
(iii) If $\{X_i\}_{i=1}^{n}$ are iid rv's with common cdf $F_X$, then
\[ F_{MAX_n}(z) = F_X^n(z) \]
and
\[ F_{MIN_n}(z) = 1 - (1 - F_X(z))^n. \]
If $F_X$ is absolutely continuous with pdf $f_X$, then the pdfs of $MAX_n$ and $MIN_n$ are
\[ f_{MAX_n}(z) = n \, F_X^{n-1}(z) \, f_X(z) \]
and
\[ f_{MIN_n}(z) = n \, (1 - F_X(z))^{n-1} \, f_X(z). \]
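Part (iii) of Theorem 4.3.2 can be compared against simulation. The sketch below assumes iid Uniform(0, 1) rv's as a convenient test case (this choice, the seed, and all function names are assumptions made only for illustration).

```python
import random

def cdf_max(F, n):
    """cdf of MAX_n for n iid rv's with common cdf F: F(z)^n."""
    return lambda z: F(z) ** n

def cdf_min(F, n):
    """cdf of MIN_n for n iid rv's with common cdf F: 1 - (1 - F(z))^n."""
    return lambda z: 1.0 - (1.0 - F(z)) ** n

def uniform_cdf(z):
    """cdf of the Uniform(0, 1) distribution."""
    return min(max(z, 0.0), 1.0)

n, z, reps = 5, 0.7, 20000
F_max = cdf_max(uniform_cdf, n)
F_min = cdf_min(uniform_cdf, n)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [[rng.random() for _ in range(n)] for _ in range(reps)]
emp_max = sum(max(s) <= z for s in samples) / reps
emp_min = sum(min(s) <= z for s in samples) / reps

print(F_max(z), emp_max)  # theoretical vs. empirical P(MAX_n <= z)
print(F_min(z), emp_min)  # theoretical vs. empirical P(MIN_n <= z)
```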
Note:
Using Theorem 4.3.2, it is easy to derive the joint cdf and pdf of $MAX_n$ and $MIN_n$ for iid rv's
$\{X_1, \ldots, X_n\}$. For example, if the $X_i$'s are iid with cdf $F_X$ and pdf $f_X$, then the joint pdf of $MAX_n$
and $MIN_n$ is
\[ f_{MAX_n, MIN_n}(x, y) = \begin{cases} 0, & x \le y \\ n(n - 1) \, (F_X(x) - F_X(y))^{n-2} \, f_X(x) f_X(y), & x > y \end{cases} \]
However, note that $MAX_n$ and $MIN_n$ are not independent. See Rohatgi, page 129, Corollary, for
more details.
Note:
The previous transformations are special cases of the following Theorem:
Theorem 4.3.3:
If $g : \mathbb{R}^n \to \mathbb{R}^m$ is a Borel-measurable function (i.e., $\forall B \in B^m : g^{-1}(B) \in B^n$) and if $X =
(X_1, \ldots, X_n)$ is an n-rv, then $g(X)$ is an m-rv.
Proof:
If $B \in B^m$, then $\{\omega : g(X(\omega)) \in B\} = \{\omega : X(\omega) \in g^{-1}(B)\} \in L$, since $g^{-1}(B) \in B^n$ and $X$ is an n-rv.
Discrete Case:
Let $X = (X_1, \ldots, X_n)$ be a discrete n-rv and $\mathcal{X} \subseteq \mathbb{R}^n$ be the countable support of $X$, i.e.,
$P(X \in \mathcal{X}) = 1$ and $P(X = x) > 0 \ \forall x \in \mathcal{X}$.
The joint marginal pmf of any subcollection of $u_i$'s is now obtained by summing over the other
remaining $u_j$'s.
Example 4.3.4:
Let $X, Y$ be iid $Bin(n, p)$, $0 < p < 1$. Let $U = \frac{X}{Y + 1}$ and $V = Y + 1$. Then $X = UV$ and
$Y = V - 1$. So the joint pmf of $U, V$ is
\[
\begin{aligned}
P(U = u, V = v) &= \binom{n}{uv} p^{uv} (1 - p)^{n - uv} \binom{n}{v - 1} p^{v - 1} (1 - p)^{n + 1 - v} \\
&= \binom{n}{uv} \binom{n}{v - 1} p^{uv + v - 1} (1 - p)^{2n + 1 - uv - v}
\end{aligned}
\]
for $v \in \{1, 2, \ldots, n + 1\}$ and $uv \in \{0, 1, \ldots, n\}$.
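The joint pmf of Example 4.3.4 can be verified by brute force for small $n$. The sketch below uses hypothetical values $n = 3$ and $p = 1/3$ and exact fractions; it checks that the closed-form pmf agrees with the product of the two binomial pmf's at every support point and sums to 1.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def joint_pmf_uv(u, v, n, p):
    """P(U = u, V = v) from Example 4.3.4; u is a Fraction, v an integer."""
    x = u * v  # x = uv must be an integer in {0, ..., n}
    if x.denominator != 1 or not (0 <= x <= n) or not (1 <= v <= n + 1):
        return Fraction(0)
    x = int(x)
    return comb(n, x) * comb(n, v - 1) * p**(x + v - 1) * (1 - p)**(2 * n + 1 - x - v)

n, p = 3, Fraction(1, 3)  # illustrative values only
total = Fraction(0)
matches = True
for x in range(n + 1):
    for y in range(n + 1):
        u, v = Fraction(x, y + 1), y + 1  # the 1-to-1 map (x, y) -> (u, v)
        value = joint_pmf_uv(u, v, n, p)
        matches = matches and value == binom_pmf(x, n, p) * binom_pmf(y, n, p)
        total += value

print(total, matches)
```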
Continuous Case:
Let $X = (X_1, \ldots, X_n)$ be a continuous n-rv with joint cdf $F_X$ and joint pdf $f_X$.
Let
\[ U = \begin{pmatrix} U_1 \\ \vdots \\ U_n \end{pmatrix} = g(X) = \begin{pmatrix} g_1(X) \\ \vdots \\ g_n(X) \end{pmatrix}, \]
i.e., $U_i = g_i(X)$, be a mapping from $\mathbb{R}^n$ into $\mathbb{R}^n$.
If $B \in B^n$, then
\[ P(U \in B) = P(X \in g^{-1}(B)) = \int \cdots \int_{g^{-1}(B)} f_X(x) \, d(x) = \int \cdots \int_{g^{-1}(B)} f_X(x) \prod_{i=1}^{n} dx_i \]
where $g^{-1}(B) = \{x = (x_1, \ldots, x_n) \in \mathbb{R}^n : g(x) \in B\}$.
If $G$ happens to be absolutely continuous, the joint pdf of $U$ will be given by
\[ f_U(u) = \frac{\partial^n G(u)}{\partial u_1 \, \partial u_2 \cdots \partial u_n} \]
at every continuity point of $f_U$.
Under certain conditions, we can write $f_U$ in terms of the original pdf $f_X$ of $X$ as stated in the
next Theorem:
Theorem 4.3.5: Multivariate Transformation
Let $X = (X_1, \ldots, X_n)$ be a continuous n-rv with joint pdf $f_X$.
(i) Let
\[ U = \begin{pmatrix} U_1 \\ \vdots \\ U_n \end{pmatrix} = g(X) = \begin{pmatrix} g_1(X) \\ \vdots \\ g_n(X) \end{pmatrix} \]
(i.e., $U_i = g_i(X)$) be a 1-to-1 mapping from $\mathbb{R}^n$ into $\mathbb{R}^n$, i.e., there exist inverses $h_i$,
$i = 1, \ldots, n$, such that $x_i = h_i(u) = h_i(u_1, \ldots, u_n)$, $i = 1, \ldots, n$, over the range of the
transformation $g$.
(ii) Assume both $g$ and $h$ are continuous.
(iii) Assume partial derivatives $\frac{\partial x_i}{\partial u_j} = \frac{\partial h_i(u)}{\partial u_j}$, $i, j = 1, \ldots, n$, exist and are continuous.
(iv) Assume that the Jacobian of the inverse transformation
\[ J = \frac{\partial(x_1, \ldots, x_n)}{\partial(u_1, \ldots, u_n)} = \begin{vmatrix} \frac{\partial x_1}{\partial u_1} & \cdots & \frac{\partial x_1}{\partial u_n} \\ \vdots & & \vdots \\ \frac{\partial x_n}{\partial u_1} & \cdots & \frac{\partial x_n}{\partial u_n} \end{vmatrix} \]
is different from 0 for all $u$ in the range of $g$.
Then the n-rv $U = g(X)$ has a joint absolutely continuous cdf with corresponding joint pdf
\[ f_U(u) = |J| \, f_X(h_1(u), \ldots, h_n(u)). \]
Proof:
Let $u \in \mathbb{R}^n$ and
\[ B_u = \{(u_1', \ldots, u_n') : -\infty < u_i' < u_i \ \forall i = 1, \ldots, n\}. \]
Then,
\[
\begin{aligned}
G_U(u) &= \int \cdots \int_{g^{-1}(B_u)} f_X(x) \, d(x) \\
&= \int \cdots \int_{B_u} f_X(h_1(u), \ldots, h_n(u)) \, |J| \, d(u)
\end{aligned}
\]
The result follows from differentiation of $G_U$.
For additional steps of the proof see Rohatgi (page 135 and Theorem 17 on page 10) or a book on
multivariate calculus.
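Theorem 4.3.6:
Let $X = (X_1, \ldots, X_n)$ be a continuous n-rv with joint pdf $f_X$.
(i) Let
\[ U = \begin{pmatrix} U_1 \\ \vdots \\ U_n \end{pmatrix} = g(X) = \begin{pmatrix} g_1(X) \\ \vdots \\ g_n(X) \end{pmatrix} \]
(i.e., $U_i = g_i(X)$) be a mapping from $\mathbb{R}^n$ into $\mathbb{R}^n$.
(ii) Let $\mathcal{X} = \{x : f_X(x) > 0\}$ be the support of $X$.
(iii) Suppose that for each $u \in B = \{u \in \mathbb{R}^n : u = g(x) \text{ for some } x \in \mathcal{X}\}$ there is a finite number
$k = k(u)$ of inverses.
(iv) Suppose we can partition $\mathcal{X}$ into $\mathcal{X}_0, \mathcal{X}_1, \ldots, \mathcal{X}_k$ s.t.
(a) $P(X \in \mathcal{X}_0) = 0$.
(b) $U = g(X)$ is a 1-to-1 mapping from $\mathcal{X}_l$ onto $B$ for all $l = 1, \ldots, k$, with inverse
transformation $h_l(u) = (h_{l1}(u), \ldots, h_{ln}(u))'$, $u \in B$, i.e., for each $u \in B$, $h_l(u)$ is the unique $x \in \mathcal{X}_l$
such that $u = g(x)$.
(v) Assume partial derivatives $\frac{\partial x_i}{\partial u_j} = \frac{\partial h_{li}(u)}{\partial u_j}$, $l = 1, \ldots, k$, $i, j = 1, \ldots, n$, exist and are continuous.
(vi) Assume the Jacobian of each of the inverse transformations
\[ J_l = \begin{vmatrix} \frac{\partial x_1}{\partial u_1} & \cdots & \frac{\partial x_1}{\partial u_n} \\ \vdots & & \vdots \\ \frac{\partial x_n}{\partial u_1} & \cdots & \frac{\partial x_n}{\partial u_n} \end{vmatrix} = \begin{vmatrix} \frac{\partial h_{l1}}{\partial u_1} & \cdots & \frac{\partial h_{l1}}{\partial u_n} \\ \vdots & & \vdots \\ \frac{\partial h_{ln}}{\partial u_1} & \cdots & \frac{\partial h_{ln}}{\partial u_n} \end{vmatrix}, \quad l = 1, \ldots, k, \]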
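Example 4.3.7:
Let $X, Y$ be iid $N(0, 1)$. Define
\[ U = g_1(X, Y) = \begin{cases} \frac{X}{Y}, & Y \ne 0 \\ 0, & Y = 0 \end{cases} \]
and
\[ V = g_2(X, Y) = |Y|. \]
$\mathcal{X} = \mathbb{R}^2$, but $U, V$ are not 1-to-1 mappings since $(U, V)(x, y) = (U, V)(-x, -y)$, i.e., conditions do
not apply for the use of Theorem 4.3.5. Let
\[ \mathcal{X}_0 = \{(x, y) : y = 0\}, \quad \mathcal{X}_1 = \{(x, y) : y > 0\}, \quad \mathcal{X}_2 = \{(x, y) : y < 0\}. \]
Then $P((X, Y) \in \mathcal{X}_0) = 0$.
Let $B = \{(u, v) : v > 0\} = g(\mathcal{X}_1) = g(\mathcal{X}_2)$.
Inverses:
\[ B \to \mathcal{X}_1 : \quad x = h_{11}(u, v) = uv, \quad y = h_{12}(u, v) = v \]
\[ B \to \mathcal{X}_2 : \quad x = h_{21}(u, v) = -uv, \quad y = h_{22}(u, v) = -v \]
\[ J_1 = \begin{vmatrix} v & u \\ 0 & 1 \end{vmatrix} \Rightarrow |J_1| = |v|, \qquad J_2 = \begin{vmatrix} -v & -u \\ 0 & -1 \end{vmatrix} \Rightarrow |J_2| = |v| \]
\[ f_{X,Y}(x, y) = \frac{1}{2\pi} e^{-x^2/2} e^{-y^2/2} \]
\[
\begin{aligned}
f_{U,V}(u, v) &= |v| \frac{1}{2\pi} e^{-(uv)^2/2} e^{-v^2/2} + |v| \frac{1}{2\pi} e^{-(-uv)^2/2} e^{-(-v)^2/2} \\
&= \frac{v}{\pi} e^{-\frac{(u^2 + 1) v^2}{2}}, \quad -\infty < u < \infty, \ 0 < v < \infty
\end{aligned}
\]
Marginal (substituting $z = \frac{(u^2 + 1) v^2}{2}$ with $\frac{dz}{dv} = (u^2 + 1) v$):
\[
\begin{aligned}
f_U(u) &= \int_{0}^{\infty} \frac{v}{\pi} e^{-\frac{(u^2 + 1) v^2}{2}} \, dv \\
&= \int_{0}^{\infty} \frac{1}{\pi (u^2 + 1)} e^{-z} \, dz \\
&= \frac{1}{\pi (u^2 + 1)} \left( -e^{-z} \right) \Big|_{0}^{\infty} \\
&= \frac{1}{\pi (1 + u^2)}, \quad -\infty < u < \infty
\end{aligned}
\]
Thus, the quotient of two iid $N(0, 1)$ rv's is a rv that has a Cauchy distribution.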
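The conclusion of Example 4.3.7 can be illustrated by simulation. The sketch below draws ratios of independent standard Normal variates and compares empirical probabilities with the standard Cauchy cdf $\frac{1}{2} + \frac{1}{\pi} \arctan(u)$, which is the cdf belonging to the pdf derived above. The seed, sample size, and function names are arbitrary assumptions.

```python
import math
import random

def cauchy_cdf(u: float) -> float:
    """cdf of the standard Cauchy distribution with pdf 1 / (pi (1 + u^2))."""
    return 0.5 + math.atan(u) / math.pi

rng = random.Random(1)  # fixed seed so the run is reproducible
reps = 50000
ratios = []
while len(ratios) < reps:
    x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    if y != 0.0:  # Y = 0 has probability 0; skipped to avoid division by zero
        ratios.append(x / y)

emp_abs_le_1 = sum(abs(u) <= 1.0 for u in ratios) / reps
emp_le_2 = sum(u <= 2.0 for u in ratios) / reps

print(emp_abs_le_1, cauchy_cdf(1.0) - cauchy_cdf(-1.0))  # both close to 0.5
print(emp_le_2, cauchy_cdf(2.0))
```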
Note:
As shown in Theorem 4.3.2, $X_{(1)}$ and $X_{(n)}$ are rv's. This result will be extended in the following
Theorem:
Theorem 4.4.2:
Let $(X_1, \ldots, X_n)$ be an n-rv. Then the $k$th order statistic $X_{(k)}$, $k = 1, \ldots, n$, is also an rv.
Theorem 4.4.3:
Let $X_1, \ldots, X_n$ be continuous iid rv's with pdf $f_X$. The joint pdf of $X_{(1)}, \ldots, X_{(n)}$ is
\[ f_{X_{(1)}, \ldots, X_{(n)}}(x_1, \ldots, x_n) = \begin{cases} n! \prod_{i=1}^{n} f_X(x_i), & x_1 \le x_2 \le \ldots \le x_n \\ 0, & \text{otherwise} \end{cases} \]
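Proof:
For the case $n = 3$, look at the following scenario how $X_1, X_2$, and $X_3$ can be possibly ordered to
yield $X_{(1)} < X_{(2)} < X_{(3)}$. Columns represent $X_{(1)}, X_{(2)}$, and $X_{(3)}$. Rows represent $X_1, X_2$, and $X_3$:
\[ X_1 < X_2 < X_3 : \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad X_1 < X_3 < X_2 : \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]
\[ X_2 < X_1 < X_3 : \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad X_2 < X_3 < X_1 : \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \]
\[ X_3 < X_1 < X_2 : \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \qquad X_3 < X_2 < X_1 : \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \]
For $n = 3$, there are $3! = 6$ possible arrangements. In general, there are $n!$ arrangements of
$X_1, \ldots, X_n$ for each $(X_{(1)}, \ldots, X_{(n)})$. This mapping is not 1-to-1. For each mapping, we have an
$n \times n$ matrix $J$ that results from an $n \times n$ identity matrix through the rearrangement of rows.
Therefore, $|J| = 1$. By Theorem 4.3.6, we get
\[
\begin{aligned}
f_{X_{(1)}, \ldots, X_{(n)}}(x_{(1)}, \ldots, x_{(n)}) &= n! \, f_{X_1, \ldots, X_n}(x_{(k_1)}, x_{(k_2)}, \ldots, x_{(k_n)}) \\
&= n! \prod_{i=1}^{n} f_{X_i}(x_{(k_i)}) \\
&= n! \prod_{i=1}^{n} f_X(x_i)
\end{aligned}
\]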
Theorem 4.4.4:
Let $X_1, \ldots, X_n$ be continuous iid rv's with pdf $f_X$ and cdf $F_X$. Then the following
holds:
(i) The marginal pdf of $X_{(k)}$, $k = 1, \ldots, n$, is
\[ f_{X_{(k)}}(x) = \frac{n!}{(k - 1)! \, (n - k)!} (F_X(x))^{k-1} (1 - F_X(x))^{n-k} f_X(x). \]
(ii) The joint pdf of $X_{(j)}$ and $X_{(k)}$, $1 \le j < k \le n$, is
\[ f_{X_{(j)}, X_{(k)}}(x_j, x_k) = \frac{n!}{(j - 1)! \, (k - j - 1)! \, (n - k)!} (F_X(x_j))^{j-1} (F_X(x_k) - F_X(x_j))^{k-j-1} (1 - F_X(x_k))^{n-k} f_X(x_j) f_X(x_k) \]
if $x_j < x_k$ and 0 otherwise.
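As a quick sanity check of Theorem 4.4.4 (i), the sketch below evaluates the marginal pdf of $X_{(k)}$ for iid Uniform(0, 1) rv's and integrates it numerically with a simple midpoint rule. The uniform case and all helper names are assumptions made for illustration; in that case $X_{(k)}$ is known to follow a Beta$(k, n - k + 1)$ distribution with mean $k/(n + 1)$.

```python
from math import factorial

def order_stat_pdf(x, k, n, F, f):
    """Marginal pdf of the k-th order statistic of n iid rv's with cdf F, pdf f."""
    const = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return const * F(x) ** (k - 1) * (1.0 - F(x)) ** (n - k) * f(x)

def unif_cdf(x):
    return min(max(x, 0.0), 1.0)

def unif_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def midpoint_integral(g, a, b, m=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))

n, k = 5, 2
pdf = lambda x: order_stat_pdf(x, k, n, unif_cdf, unif_pdf)
total = midpoint_integral(pdf, 0.0, 1.0)
mean = midpoint_integral(lambda x: x * pdf(x), 0.0, 1.0)

print(total)  # approximately 1
print(mean)   # approximately k / (n + 1) = 1/3
```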
4.5 Multivariate Expectation
In this section, we assume that $X = (X_1, \ldots, X_n)$ is an n-rv and $g : \mathbb{R}^n \to \mathbb{R}^n$ is a Borel-measurable
function.
Definition 4.5.1:
If $n = 1$, i.e., $g$ is univariate, we define the following:
Note:
The above can be extended to vector-valued functions $g$ ($n > 1$) in the obvious way. For example,
if $g$ is the identity mapping from $\mathbb{R}^n \to \mathbb{R}^n$, then
\[ E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix} \]
provided that $E(|X_i|) < \infty \ \forall i = 1, \ldots, n$.
Similarly, provided that all expectations exist, we get for the variance-covariance matrix:
\[ Var(X) = \Sigma_X = E((X - E(X)) (X - E(X))') \]
with $(i, j)$th component
\[ E((X_i - E(X_i)) (X_j - E(X_j))) = Cov(X_i, X_j) \]
and with $(i, i)$th component
\[ E((X_i - E(X_i)) (X_i - E(X_i))) = Var(X_i) = \sigma_i^2. \]
Note:
We are often interested in (weighted) sums of rv's or products of rv's and their expectations. This
will be addressed in the next two Theorems:
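Theorem 4.5.2:
Let $X_i$, $i = 1, \ldots, n$, be rv's such that $E(|X_i|) < \infty$. Let $a_1, \ldots, a_n \in \mathbb{R}$ and define $S = \sum_{i=1}^{n} a_i X_i$.
Then it holds that $E(|S|) < \infty$ and
\[ E(S) = \sum_{i=1}^{n} a_i E(X_i). \]
Proof:
Continuous case only:
\[
\begin{aligned}
E(|S|) &= \int_{\mathbb{R}^n} \Bigl| \sum_{i=1}^{n} a_i x_i \Bigr| f_X(x) \, dx \\
&\le \int_{\mathbb{R}^n} \sum_{i=1}^{n} |a_i| \, |x_i| \, f_X(x) \, dx \\
&= \sum_{i=1}^{n} |a_i| \int_{\mathbb{R}} |x_i| \left( \int_{\mathbb{R}^{n-1}} f_X(x) \, dx_1 \ldots dx_{i-1} \, dx_{i+1} \ldots dx_n \right) dx_i \\
&= \sum_{i=1}^{n} |a_i| \int_{\mathbb{R}} |x_i| \, f_{X_i}(x_i) \, dx_i \\
&= \sum_{i=1}^{n} |a_i| \, E(|X_i|) \\
&< \infty
\end{aligned}
\]
It follows that $E(S) = \sum_{i=1}^{n} a_i E(X_i)$ by the same argument without using the absolute values $|\cdot|$.
Note:
If $X_i$, $i = 1, \ldots, n$, are iid with $E(X_i) = \mu$, then
\[ E(\bar{X}) = E\Bigl( \frac{1}{n} \sum_{i=1}^{n} X_i \Bigr) = \sum_{i=1}^{n} \frac{1}{n} E(X_i) = \mu. \]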
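Theorem 4.5.3:
Let $X_i$, $i = 1, \ldots, n$, be independent rv's such that $E(|X_i|) < \infty$. Let $g_i$, $i = 1, \ldots, n$, be Borel-measurable
functions. Then
\[ E\Bigl( \prod_{i=1}^{n} g_i(X_i) \Bigr) = \prod_{i=1}^{n} E(g_i(X_i)) \]
if all expectations exist.
Proof:
By Theorem 4.2.5, $f_X(x) = \prod_{i=1}^{n} f_{X_i}(x_i)$, and by Theorem 4.2.7, $g_i(X_i)$, $i = 1, \ldots, n$, are also
independent. Therefore,
\[
\begin{aligned}
E\Bigl( \prod_{i=1}^{n} g_i(X_i) \Bigr) &= \int_{\mathbb{R}^n} \prod_{i=1}^{n} g_i(x_i) \, f_X(x) \, dx \\
&\overset{\text{Th. 4.2.5}}{=} \int_{\mathbb{R}^n} \prod_{i=1}^{n} \bigl( g_i(x_i) f_{X_i}(x_i) \, dx_i \bigr) \\
&= \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} \prod_{i=1}^{n} g_i(x_i) \prod_{i=1}^{n} f_{X_i}(x_i) \prod_{i=1}^{n} dx_i \\
&\overset{\text{Th. 4.2.7}}{=} \int_{\mathbb{R}} g_1(x_1) f_{X_1}(x_1) \, dx_1 \int_{\mathbb{R}} g_2(x_2) f_{X_2}(x_2) \, dx_2 \cdots \int_{\mathbb{R}} g_n(x_n) f_{X_n}(x_n) \, dx_n \\
&= \prod_{i=1}^{n} \int_{\mathbb{R}} g_i(x_i) f_{X_i}(x_i) \, dx_i \\
&= \prod_{i=1}^{n} E(g_i(X_i))
\end{aligned}
\]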
Corollary 4.5.4:
If $X, Y$ are independent, then $Cov(X, Y) = 0$.
Theorem 4.5.5:
Two rv's $X, Y$ are independent iff for all pairs of Borel-measurable functions $g_1$ and $g_2$ it holds
that $E(g_1(X) \, g_2(Y)) = E(g_1(X)) \, E(g_2(Y))$ if all expectations exist.
Proof:
"$\Longrightarrow$": It follows from Theorem 4.5.3 and the independence of $X$ and $Y$ that
\[ E(g_1(X) g_2(Y)) = E(g_1(X)) \, E(g_2(Y)). \]
"$\Longleftarrow$": From Theorem 4.2.6, we know that $X$ and $Y$ are independent iff $P(X \in A_1, Y \in A_2) =
P(X \in A_1) \, P(Y \in A_2) \ \forall$ Borel sets $A_1$ and $A_2$.
How do we relate Theorem 4.2.6 to $g_1$ and $g_2$? Let us define two Borel-measurable functions $g_1$
and $g_2$ as:
\[ g_1(x) = I_{A_1}(x) = \begin{cases} 1, & x \in A_1 \\ 0, & \text{otherwise} \end{cases} \qquad g_2(y) = I_{A_2}(y) = \begin{cases} 1, & y \in A_2 \\ 0, & \text{otherwise} \end{cases} \]
Then,
\[ E(g_1(X)) = 0 \cdot P(X \in A_1^c) + 1 \cdot P(X \in A_1) = P(X \in A_1), \]
\[ E(g_2(Y)) = 0 \cdot P(Y \in A_2^c) + 1 \cdot P(Y \in A_2) = P(Y \in A_2) \]
and
\[ E(g_1(X) \, g_2(Y)) = P(X \in A_1, Y \in A_2). \]
\[
\begin{aligned}
&\Longrightarrow P(X \in A_1, Y \in A_2) = E(g_1(X) \, g_2(Y)) \overset{\text{given}}{=} E(g_1(X)) \, E(g_2(Y)) = P(X \in A_1) P(Y \in A_2) \\
&\Longrightarrow P(X \in A_1, Y \in A_2) = P(X \in A_1) \, P(Y \in A_2) \\
&\Longrightarrow X, Y \text{ independent by Theorem 4.2.6.}
\end{aligned}
\]
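Definition 4.5.6:
The $(i_1^{th}, i_2^{th}, \ldots, i_n^{th})$ multi-way moment of $X = (X_1, \ldots, X_n)$ is defined as
\[ m_{i_1 i_2 \ldots i_n} = E\Bigl( \prod_{j=1}^{n} X_j^{i_j} \Bigr) \]
if it exists.
The $(i_1^{th}, i_2^{th}, \ldots, i_n^{th})$ multi-way central moment of $X = (X_1, \ldots, X_n)$ is defined as
\[ \mu_{i_1 i_2 \ldots i_n} = E\Bigl( \prod_{j=1}^{n} (X_j - E(X_j))^{i_j} \Bigr) \]
if it exists.
Note:
If we set $i_r = i_s = 1$ and $i_j = 0 \ \forall j \ne r, s$ in Definition 4.5.6, we get
\[ \mu_{0 \ldots 0 \, 1 \, 0 \ldots 0 \, 1 \, 0 \ldots 0} = \mu_{rs} = Cov(X_r, X_s), \]
where the two 1's are in positions $r$ and $s$.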
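Theorem 4.5.7: Cauchy-Schwarz-Inequality
Let $X, Y$ be 2 rv's with finite variance. Then it holds:
(i) $Cov(X, Y)$ exists.
(ii) $(E(XY))^2 \le E(X^2) E(Y^2)$.
(iii) $(E(XY))^2 = E(X^2) E(Y^2)$ iff there exists an $(\alpha, \beta) \in \mathbb{R}^2 - \{(0, 0)\}$ such that
$P(\alpha X + \beta Y = 0) = 1$.
Proof:
Assumptions: $Var(X), Var(Y) < \infty$. Then also $E(X^2), E(X), E(Y^2), E(Y) < \infty$.
Result used in proof:
\[
\begin{aligned}
0 \le (a - b)^2 = a^2 - 2ab + b^2 &\Longrightarrow ab \le \frac{a^2 + b^2}{2} \\
0 \le (a + b)^2 = a^2 + 2ab + b^2 &\Longrightarrow -ab \le \frac{a^2 + b^2}{2} \\
&\Longrightarrow |ab| \le \frac{a^2 + b^2}{2} \quad \forall a, b \in \mathbb{R} \qquad (*)
\end{aligned}
\]
(i)
\[
\begin{aligned}
E(|XY|) &= \int_{\mathbb{R}^2} |xy| \, f_{X,Y}(x, y) \, dx \, dy \\
&\overset{(*)}{\le} \int_{\mathbb{R}^2} \frac{x^2 + y^2}{2} f_{X,Y}(x, y) \, dx \, dy \\
&= \int_{\mathbb{R}^2} \frac{x^2}{2} f_{X,Y}(x, y) \, dx \, dy + \int_{\mathbb{R}^2} \frac{y^2}{2} f_{X,Y}(x, y) \, dy \, dx \\
&= \int_{\mathbb{R}} \frac{x^2}{2} f_X(x) \, dx + \int_{\mathbb{R}} \frac{y^2}{2} f_Y(y) \, dy \\
&= \frac{E(X^2) + E(Y^2)}{2} \\
&< \infty
\end{aligned}
\]
$\Longrightarrow E(XY)$ exists
$\Longrightarrow Cov(X, Y) = E(XY) - E(X) E(Y)$ exists
(ii) $0 \le E((\alpha X + \beta Y)^2) = \alpha^2 E(X^2) + 2 \alpha \beta E(XY) + \beta^2 E(Y^2) \quad \forall \alpha, \beta \in \mathbb{R} \qquad (A)$
If $E(X^2) = 0$, then $X$ has a degenerate 1-point Dirac distribution and the inequality trivially
is true. Therefore, we can assume that $E(X^2) > 0$. As $(A)$ is true for all $\alpha, \beta \in \mathbb{R}$, we can
choose $\alpha = \frac{-E(XY)}{E(X^2)}$, $\beta = 1$.
\[ \Longrightarrow \frac{(E(XY))^2}{E(X^2)} - 2 \frac{(E(XY))^2}{E(X^2)} + E(Y^2) \ge 0 \]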
Definition 4.6.2:
Let $X = (X_1, \ldots, X_n)$ be an n-rv. We define the n-dimensional characteristic function
$\Phi_X : \mathbb{R}^n \to \mathbb{C}$ of $X$ as
\[ \Phi_X(t) = E(e^{i t' X}) = E\Bigl( \exp\Bigl( i \sum_{j=1}^{n} t_j X_j \Bigr) \Bigr). \]
Note:
(i) $\Phi_X(t)$ exists for any real-valued n-rv.
(ii) If $M_X(t)$ exists, then $\Phi_X(t) = M_X(it)$.
Theorem 4.6.3:
(i) If $M_X(t)$ exists, it is unique and uniquely determines the joint distribution of $X$. $\Phi_X(t)$ is
also unique and uniquely determines the joint distribution of $X$.
(ii) $M_X(t)$ (if it exists) and $\Phi_X(t)$ uniquely determine all marginal distributions of $X$, i.e.,
$M_{X_i}(t_i) = M_X(0, t_i, 0)$ and $\Phi_{X_i}(t_i) = \Phi_X(0, t_i, 0)$.
(iii) Joint moments of all orders (if they exist) can be obtained as
\[ m_{i_1 \ldots i_n} = \frac{\partial^{\, i_1 + i_2 + \ldots + i_n}}{\partial t_1^{i_1} \, \partial t_2^{i_2} \cdots \partial t_n^{i_n}} M_X(t) \Big|_{t = 0} = E(X_1^{i_1} X_2^{i_2} \cdots X_n^{i_n}). \]
Theorem 4.6.4:
Let $X_1, \ldots, X_n$ be independent rv's.
(i) If mgf's $M_{X_1}(t), \ldots, M_{X_n}(t)$ exist, then the mgf of $Y = \sum_{i=1}^{n} a_i X_i$ is
\[ M_Y(t) = \prod_{i=1}^{n} M_{X_i}(a_i t) \]
on the common interval where all individual mgf's exist.
(ii) The characteristic function of $Y = \sum_{j=1}^{n} a_j X_j$ is
\[ \Phi_Y(t) = \prod_{j=1}^{n} \Phi_{X_j}(a_j t). \]
(iii) If mgf's $M_{X_1}(t), \ldots, M_{X_n}(t)$ exist, then the mmgf of $X$ is
\[ M_X(t) = \prod_{i=1}^{n} M_{X_i}(t_i) \]
on the common interval where all individual mgf's exist.
(iv) The n-dimensional characteristic function of $X$ is
\[ \Phi_X(t) = \prod_{j=1}^{n} \Phi_{X_j}(t_j). \]
Proof:
Homework (parts (ii) and (iv) only)
Theorem 4.6.5:
Let $X_1, \ldots, X_n$ be independent discrete rv's on the non-negative integers with pgf's $G_{X_1}(s), \ldots, G_{X_n}(s)$.
The pgf of $Y = \sum_{i=1}^{n} X_i$ is
\[ G_Y(s) = \prod_{i=1}^{n} G_{X_i}(s). \]
Proof:
Version 1:
\[ G_{X_i}(s) = E(s^{X_i}) \]
\[
\begin{aligned}
G_Y(s) &= E(s^Y) \\
&= E\bigl( s^{\sum_{i=1}^{n} X_i} \bigr) \\
&\overset{\text{indep.}}{=} \prod_{i=1}^{n} E(s^{X_i}) \\
&= \prod_{i=1}^{n} G_{X_i}(s)
\end{aligned}
\]
A generalized proof for $n \ge 3$ needs to be done by induction on $n$.
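Theorem 4.6.6:
Let $X_1, \ldots, X_N$ be iid discrete rv's on the non-negative integers with common pgf $G_X(s)$. Let $N$
be a discrete rv on the non-negative integers with pgf $G_N(s)$. Let $N$ be independent of the $X_i$'s.
Define $S_N = \sum_{i=1}^{N} X_i$. The pgf of $S_N$ is
\[ G_{S_N}(s) = G_N(G_X(s)). \]
Proof:
\[ P(S_N = k) = \sum_{n=0}^{\infty} P(S_N = k \mid N = n) \, P(N = n) \]
\[
\begin{aligned}
\Longrightarrow G_{S_N}(s) &= \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} P(S_N = k \mid N = n) \, P(N = n) \, s^k \\
&= \sum_{n=0}^{\infty} P(N = n) \sum_{k=0}^{\infty} P(S_N = k \mid N = n) \, s^k \\
&= \sum_{n=0}^{\infty} P(N = n) \sum_{k=0}^{\infty} P(S_n = k) \, s^k \\
&= \sum_{n=0}^{\infty} P(N = n) \sum_{k=0}^{\infty} P\Bigl( \sum_{i=1}^{n} X_i = k \Bigr) s^k \\
&\overset{\text{Th. 4.6.5}}{=} \sum_{n=0}^{\infty} P(N = n) \prod_{i=1}^{n} G_{X_i}(s) \\
&\overset{\text{iid}}{=} \sum_{n=0}^{\infty} P(N = n) \, (G_X(s))^n \\
&= G_N(G_X(s))
\end{aligned}
\]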
Example 4.6.7:
Starting with a single cell at time 0, after one time unit there is probability $p$ that the cell will have
split (2 cells), probability $q$ that it will survive without splitting (1 cell), and probability $r$ that it
will have died (0 cells). It holds that $p, q, r \ge 0$ and $p + q + r = 1$. Any surviving cells have the
same probabilities of splitting or dying. What is the pgf for the number of cells at time 2?
\[ G_X(s) = G_N(s) = p s^2 + q s + r \]
\[ G_{S_N}(s) = p (p s^2 + q s + r)^2 + q (p s^2 + q s + r) + r \]
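The composition $G_N(G_X(s))$ in Example 4.6.7 can be expanded mechanically by treating pgf's as coefficient lists. The sketch below uses hypothetical values $p = 1/2$, $q = 1/4$, $r = 1/4$ and exact fractions; entry $k$ of the result is the probability of having $k$ cells at time 2.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (increasing powers)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Sum of two polynomials given as coefficient lists."""
    m = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(m)]

def poly_compose(outer, inner):
    """Coefficients of outer(inner(s)), accumulated power by power."""
    result = [Fraction(0)]
    power = [Fraction(1)]  # inner(s)^0
    for coef in outer:
        result = poly_add(result, [coef * c for c in power])
        power = poly_mul(power, inner)
    return result

p, q, r = Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)  # illustrative values
G = [r, q, p]                  # G_X(s) = r + q s + p s^2
G_time2 = poly_compose(G, G)   # G_N(G_X(s)) with G_N = G_X

print(G_time2)  # P(k cells at time 2) for k = 0, ..., 4
```

With these values the probability of extinction by time 2 is the constant term, $p r^2 + q r + r = 11/32$.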
Theorem 4.6.8:
Let $X_1, \ldots, X_N$ be iid rv's with common mgf $M_X(t)$. Let $N$ be a discrete rv on the non-negative
integers with mgf $M_N(t)$. Let $N$ be independent of the $X_i$'s. Define $S_N = \sum_{i=1}^{N} X_i$. The mgf of $S_N$
is
\[ M_{S_N}(t) = M_N(\ln M_X(t)). \]
Proof:
Consider the case that the $X_i$'s are non-negative integers:
We know that
\[ G_X(S) = E(S^X) = E(e^{\ln S^X}) = E(e^{(\ln S) X}) = M_X(\ln S) \]
\[ \Longrightarrow M_X(S) = G_X(e^S) \]
\[ \Longrightarrow M_{S_N}(t) = G_{S_N}(e^t) \overset{\text{Th. 4.6.6}}{=} G_N(G_X(e^t)) = G_N(M_X(t)) = M_N(\ln M_X(t)) \]
In the general case, i.e., if the $X_i$'s are not non-negative integers, we need results from Section 4.7
(conditional expectation) to prove this Theorem.
Definition 4.7.1:
Let $X, Y$ be rv's on $(\Omega, L, P)$. Let $h$ be a Borel-measurable function. Assume that $E(h(X))$ exists.
Then the conditional expectation of $h(X)$ given $Y$, i.e., $E(h(X) \mid Y)$, is a rv that takes the
value $E(h(X) \mid y)$. It is defined as
\[ E(h(X) \mid y) = \begin{cases} \sum_{x \in \mathcal{X}} h(x) P(X = x \mid Y = y), & \text{if } (X, Y) \text{ is discrete and } P(Y = y) > 0 \\ \int_{-\infty}^{\infty} h(x) f_{X|Y}(x \mid y) \, dx, & \text{if } (X, Y) \text{ is continuous and } f_Y(y) > 0 \end{cases} \]
Note:
(i) The rv $E(h(X) \mid Y) = g(Y)$ is a function of $Y$ as a rv.
(ii) The usual properties of expectations apply to the conditional expectation:
(a) $E(c \mid Y) = c \ \forall c \in \mathbb{R}$.
(b) $E(aX + b \mid Y) = a E(X \mid Y) + b \ \forall a, b \in \mathbb{R}$.
(c) If $g_1, g_2$ are Borel-measurable functions and if $E(g_1(X)), E(g_2(X))$ exist, then
$E(a_1 g_1(X) + a_2 g_2(X) \mid Y) = a_1 E(g_1(X) \mid Y) + a_2 E(g_2(X) \mid Y) \ \forall a_1, a_2 \in \mathbb{R}$.
(d) If $X \ge 0$ then $E(X \mid Y) \ge 0$.
(e) If $X_1 \ge X_2$ then $E(X_1 \mid Y) \ge E(X_2 \mid Y)$.
(iii) Moments are defined in the usual way. If $E(|X|^r) < \infty$, then $E(X^r \mid Y)$ exists and is the
$r$th conditional moment of $X$ given $Y$.
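Example 4.7.2:
Recall Example 4.1.12:
\[ f_{X,Y}(x, y) = \begin{cases} 2, & 0 < x < y < 1 \\ 0, & \text{otherwise} \end{cases} \]
The conditional pdf's $f_{Y|X}(y \mid x)$ and $f_{X|Y}(x \mid y)$ have been calculated as:
\[ f_{Y|X}(y \mid x) = \frac{1}{1 - x} \quad \text{for } x < y < 1 \ (\text{where } 0 < x < 1) \]
and
\[ f_{X|Y}(x \mid y) = \frac{1}{y} \quad \text{for } 0 < x < y \ (\text{where } 0 < y < 1). \]
So,
\[ E(X \mid y) = \int_{0}^{y} \frac{x}{y} \, dx = \frac{y}{2} \]
and
\[ E(Y \mid x) = \int_{x}^{1} \frac{1}{1 - x} \, y \, dy = \frac{1}{1 - x} \left[ \frac{y^2}{2} \right]_{x}^{1} = \frac{1}{2} \, \frac{1 - x^2}{1 - x} = \frac{1 + x}{2}. \]
Therefore, we get the rv's $E(X \mid Y) = \frac{Y}{2}$ and $E(Y \mid X) = \frac{1 + X}{2}$.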
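Theorem 4.7.3:
If $E(h(X))$ exists, then
\[ E_Y(E_{X|Y}(h(X) \mid Y)) = E(h(X)). \]
Proof:
Continuous case only:
\[
\begin{aligned}
E_Y(E(h(X) \mid Y)) &= \int_{-\infty}^{\infty} E(h(x) \mid y) f_Y(y) \, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x) f_{X|Y}(x \mid y) f_Y(y) \, dx \, dy \\
&= \int_{-\infty}^{\infty} h(x) \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy \, dx \\
&= \int_{-\infty}^{\infty} h(x) f_X(x) \, dx \\
&= E(h(X))
\end{aligned}
\]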
Theorem 4.7.4:
If $E(X^2)$ exists, then
\[ Var_Y(E(X \mid Y)) + E_Y(Var(X \mid Y)) = Var(X). \]
Proof:
\[
\begin{aligned}
Var_Y(E(X \mid Y)) + E_Y(Var(X \mid Y)) &= E_Y((E(X \mid Y))^2) - (E_Y(E(X \mid Y)))^2 + E_Y(E(X^2 \mid Y) - (E(X \mid Y))^2) \\
&= E_Y((E(X \mid Y))^2) - (E(X))^2 + E(X^2) - E_Y((E(X \mid Y))^2) \\
&= E(X^2) - (E(X))^2 \\
&= Var(X)
\end{aligned}
\]
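The decomposition in Theorem 4.7.4 can be verified exactly on a small discrete example. The joint pmf below is hypothetical and serves only as an illustration; all helper names are likewise assumptions.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y), keyed by (x, y).
joint_pmf = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def expect(joint, h):
    """E(h(X, Y)) under the joint pmf."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

def cond_moment(joint, y, r):
    """E(X^r | Y = y) = sum_x x^r P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(x**r * p for (x, yj), p in joint.items() if yj == y) / p_y

def cond_mean(y):
    return cond_moment(joint_pmf, y, 1)

def cond_var(y):
    return cond_moment(joint_pmf, y, 2) - cond_mean(y) ** 2

var_x = expect(joint_pmf, lambda x, y: x**2) - expect(joint_pmf, lambda x, y: x) ** 2
mean_of_cond_var = expect(joint_pmf, lambda x, y: cond_var(y))
var_of_cond_mean = (expect(joint_pmf, lambda x, y: cond_mean(y) ** 2)
                    - expect(joint_pmf, lambda x, y: cond_mean(y)) ** 2)

print(var_x, var_of_cond_mean + mean_of_cond_var)  # both sides of Theorem 4.7.4
```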
Note:
If $E(X^2)$ exists, then $Var(X) \ge Var_Y(E(X \mid Y))$. $Var(X) = Var_Y(E(X \mid Y))$ iff $X = g(Y)$. The
inequality directly follows from Theorem 4.7.4.
For equality, it is necessary that $E_Y(Var(X \mid Y)) = E_Y(E((X - E(X \mid Y))^2 \mid Y)) = 0$ which holds if
$X = E(X \mid Y) = g(Y)$.
If $X, Y$ are independent, $F_{X|Y}(x \mid y) = F_X(x) \ \forall x$. Thus, if $E(h(X))$ exists, then $E(h(X) \mid Y) =
E(h(X))$.
Proof:
Fix $b$. Let
\[
\begin{aligned}
g(a) &= \frac{1}{p} a^p + \frac{1}{q} b^q - ab \\
\Longrightarrow g'(a) &= a^{p-1} - b \overset{!}{=} 0 \\
\Longrightarrow b &= a^{p-1} \\
\Longrightarrow b^q &= a^{(p-1)q} = a^p
\end{aligned}
\]
The result follows immediately when multiplying both sides with $(E(|X|^p))^{\frac{1}{p}} (E(|Y|^q))^{\frac{1}{q}}$.
Note:
Note that Theorem 4.5.7 (ii) (Cauchy-Schwarz-Inequality) is a special case of Theorem 4.8.2 with
$p = q = 2$.
Definition 4.8.4:
A function $g(x)$ is convex if
\[ g(\lambda x + (1 - \lambda) y) \le \lambda g(x) + (1 - \lambda) g(y) \quad \forall x, y \in \mathbb{R} \ \ \forall 0 < \lambda < 1. \]
Note:
(i) Geometrically, a convex function falls above all of its tangent lines. Also, a connecting line
between any pairs of points $(x, g(x))$ and $(y, g(y))$ in the 2-dimensional plane always falls
above the curve.
(ii) A function $g(x)$ is concave iff $-g(x)$ is convex.
Note:
Typical convex functions $g$ are:
(i) $g_1(x) = |x| \Rightarrow E(|X|) \ge |E(X)|$.
(ii) $g_2(x) = x^2 \Rightarrow E(X^2) \ge (E(X))^2 \Rightarrow Var(X) \ge 0$.
(iii) $g_3(x) = \frac{1}{x^p}$ for $x > 0$, $p > 0 \Rightarrow E(\frac{1}{X^p}) \ge \frac{1}{(E(X))^p}$; for $p = 1$: $E(\frac{1}{X}) \ge \frac{1}{E(X)}$.
(iv) Other convex functions are $x^p$ for $x > 0$, $p \ge 1$; $\theta^x$ for $\theta > 1$; $-\ln(x)$ for $x > 0$; etc.
(v) Recall that if $g$ is convex and twice differentiable, then $g''(x) \ge 0 \ \forall x$.
(vi) If the function $g$ is concave, the direction of the inequality in Jensen's Inequality is reversed,
i.e., $E(g(X)) \le g(E(X))$.
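Example 4.8.6:
Given the real numbers $a_1, a_2, \ldots, a_n > 0$, we define
\[ \text{arithmetic mean}: \quad a_A = \frac{1}{n} (a_1 + a_2 + \ldots + a_n) = \frac{1}{n} \sum_{i=1}^{n} a_i \]
\[ \text{geometric mean}: \quad a_G = (a_1 \cdot a_2 \cdot \ldots \cdot a_n)^{\frac{1}{n}} = \Bigl( \prod_{i=1}^{n} a_i \Bigr)^{\frac{1}{n}} \]
\[ \text{harmonic mean}: \quad a_H = \frac{1}{\frac{1}{n} \left( \frac{1}{a_1} + \frac{1}{a_2} + \ldots + \frac{1}{a_n} \right)} = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{a_i}} \]
Let $X$ be a rv that takes values $a_1, a_2, \ldots, a_n > 0$ with probability $\frac{1}{n}$ each.
(i) $a_A \ge a_G$:
\[
\begin{aligned}
\ln(a_A) &= \ln\Bigl( \frac{1}{n} \sum_{i=1}^{n} a_i \Bigr) \\
&= \ln(E(X)) \\
&\overset{\ln \text{ concave}}{\ge} E(\ln(X)) \\
&= \sum_{i=1}^{n} \frac{1}{n} \ln(a_i) \\
&= \frac{1}{n} \sum_{i=1}^{n} \ln(a_i) \\
&= \frac{1}{n} \ln\Bigl( \prod_{i=1}^{n} a_i \Bigr) \\
&= \ln\Bigl( \Bigl( \prod_{i=1}^{n} a_i \Bigr)^{\frac{1}{n}} \Bigr) \\
&= \ln(a_G)
\end{aligned}
\]
Taking the anti-log of both sides gives $a_A \ge a_G$.
(ii) $a_A \ge a_H$:
\[
\begin{aligned}
\frac{1}{a_A} &= \frac{1}{\frac{1}{n} \sum_{i=1}^{n} a_i} \\
&= \frac{1}{E(X)} \\
&\overset{1/X \text{ convex}}{\le} E\Bigl( \frac{1}{X} \Bigr) \\
&= \sum_{i=1}^{n} \frac{1}{n} \frac{1}{a_i} \\
&= \frac{1}{n} \Bigl( \frac{1}{a_1} + \frac{1}{a_2} + \ldots + \frac{1}{a_n} \Bigr) \\
&= \frac{1}{a_H}
\end{aligned}
\]
Inverting both sides gives $a_A \ge a_H$.
(iii) $a_G \ge a_H$:
\[
\begin{aligned}
-\ln(a_H) &= \ln(a_H^{-1}) \\
&= \ln\Bigl( \frac{1}{a_H} \Bigr) \\
&= \ln\Bigl( \frac{1}{n} \Bigl( \frac{1}{a_1} + \frac{1}{a_2} + \ldots + \frac{1}{a_n} \Bigr) \Bigr) \\
&= \ln\Bigl( E\Bigl( \frac{1}{X} \Bigr) \Bigr) \\
&\overset{\ln \text{ concave}}{\ge} E\Bigl( \ln\Bigl( \frac{1}{X} \Bigr) \Bigr) \\
&= \sum_{i=1}^{n} \frac{1}{n} \ln\Bigl( \frac{1}{a_i} \Bigr) \\
&= \frac{1}{n} \sum_{i=1}^{n} \ln\Bigl( \frac{1}{a_i} \Bigr) \\
&= \frac{1}{n} \sum_{i=1}^{n} (-\ln a_i) \\
&= -\frac{1}{n} \ln\Bigl( \prod_{i=1}^{n} a_i \Bigr) \\
&= -\ln\Bigl( \Bigl( \prod_{i=1}^{n} a_i \Bigr)^{\frac{1}{n}} \Bigr) \\
&= -\ln a_G
\end{aligned}
\]
Multiplying both sides with $-1$ gives $\ln a_H \le \ln a_G$. Then taking the anti-log of both sides
gives $a_H \le a_G$.
In summary, $a_H \le a_G \le a_A$. Note that it would have been sufficient to prove steps (i) and (iii)
only to establish this result. However, step (ii) has been included to provide another example how
to apply Theorem 4.8.5.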
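The ordering $a_H \le a_G \le a_A$ is easy to check numerically. The following sketch computes the three means for a hypothetical list of positive numbers; the geometric mean is computed through logarithms, mirroring the argument in step (i).

```python
import math

def arithmetic_mean(a):
    return sum(a) / len(a)

def geometric_mean(a):
    """(prod a_i)^(1/n), computed as exp of the mean of the logs."""
    return math.exp(sum(math.log(x) for x in a) / len(a))

def harmonic_mean(a):
    return len(a) / sum(1.0 / x for x in a)

a = [1.0, 2.0, 4.0]  # illustrative positive values
a_A, a_G, a_H = arithmetic_mean(a), geometric_mean(a), harmonic_mean(a)
print(a_H, a_G, a_A)  # increasing order: harmonic, geometric, arithmetic
```

Equality of all three means holds when all $a_i$ are equal, which corresponds to a degenerate rv $X$.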
5 Particular Distributions
5.1 Multivariate Normal Distributions
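Definition 5.1.1:
A rv $X$ has a (univariate) Normal distribution, i.e., $X \sim N(\mu, \sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma > 0$, iff
it has the pdf
\[ f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2 \sigma^2} (x - \mu)^2}. \]
$X$ has a standard Normal distribution iff $\mu = 0$ and $\sigma^2 = 1$, i.e., $X \sim N(0, 1)$.
Note:
If $X \sim N(\mu, \sigma^2)$, then $E(X) = \mu$ and $Var(X) = \sigma^2$. If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent and
$c_1, c_2 \in \mathbb{R}$, then $Y = c_1 X_1 + c_2 X_2 \sim N(c_1 \mu_1 + c_2 \mu_2, \ c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2)$.
Definition 5.1.2:
A 2-rv $(X, Y)$ has a bivariate Normal distribution iff there exist constants $a_{11}, a_{12}, a_{21}, a_{22}, \mu_1, \mu_2 \in
\mathbb{R}$ and iid $N(0, 1)$ rv's $Z_1$ and $Z_2$ such that
\[ X = \mu_1 + a_{11} Z_1 + a_{12} Z_2, \quad Y = \mu_2 + a_{21} Z_1 + a_{22} Z_2. \]
If we define
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} X \\ Y \end{pmatrix}, \quad Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}, \]
then we can write
\[ \mathbf{X} = AZ + \mu. \]
Note:
$E(X) = \mu_1 + a_{11} E(Z_1) + a_{12} E(Z_2) = \mu_1$ and $E(Y) = \mu_2 + a_{21} E(Z_1) + a_{22} E(Z_2) = \mu_2$. The
marginal distributions are $X \sim N(\mu_1, a_{11}^2 + a_{12}^2)$ and $Y \sim N(\mu_2, a_{21}^2 + a_{22}^2)$. Thus, $X$ and $Y$ have
(univariate) Normal marginal densities or degenerate marginal densities (which correspond to Dirac
distributions) if $a_{i1} = a_{i2} = 0$.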
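Theorem 5.1.3:
Define $g : \mathbb{R}^2 \to \mathbb{R}^2$ as $g(x) = Cx + d$. If $\mathbf{X}$ is a bivariate Normal rv, then $g(\mathbf{X})$ also is a bivariate
Normal rv.
Proof:
\[
\begin{aligned}
g(\mathbf{X}) &= C \mathbf{X} + d \\
&= C(AZ + \mu) + d \\
&= \underbrace{(CA)}_{\text{another matrix}} Z + \underbrace{(C \mu + d)}_{\text{another vector}} \\
&= \tilde{A} Z + \tilde{\mu}, \text{ which represents another bivariate Normal distribution}
\end{aligned}
\]
Note:
\[
\begin{aligned}
\rho \sigma_1 \sigma_2 = Cov(X, Y) &= Cov(a_{11} Z_1 + a_{12} Z_2, \ a_{21} Z_1 + a_{22} Z_2) \\
&= a_{11} a_{21} Cov(Z_1, Z_1) + (a_{11} a_{22} + a_{12} a_{21}) Cov(Z_1, Z_2) + a_{12} a_{22} Cov(Z_2, Z_2) \\
&= a_{11} a_{21} + a_{12} a_{22}
\end{aligned}
\]
since $Z_1, Z_2$ are iid $N(0, 1)$ rv's.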
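Definition 5.1.4:
The variance-covariance matrix of $(X, Y)$ is
\[ \Sigma = A A' = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix} = \begin{pmatrix} a_{11}^2 + a_{12}^2 & a_{11} a_{21} + a_{12} a_{22} \\ a_{11} a_{21} + a_{12} a_{22} & a_{21}^2 + a_{22}^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}. \]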
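To make Definition 5.1.4 concrete, the sketch below forms $\Sigma = AA'$ for a hypothetical $2 \times 2$ matrix $A$ and reads off $\sigma_1$, $\sigma_2$, and $\rho$ from the entries. Plain nested lists are used so that no external library is required; the helper names are illustrative only.

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def mat_mul(A, B):
    """Matrix product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2.0, 1.0],
     [1.0, 3.0]]  # hypothetical coefficients a_11, a_12, a_21, a_22

Sigma = mat_mul(A, transpose(A))
sigma1 = math.sqrt(Sigma[0][0])           # sqrt(a_11^2 + a_12^2)
sigma2 = math.sqrt(Sigma[1][1])           # sqrt(a_21^2 + a_22^2)
rho = Sigma[0][1] / (sigma1 * sigma2)     # Cov(X, Y) / (sigma_1 sigma_2)

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
det_Sigma = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
print(Sigma, rho, det_Sigma)
```

The determinant of $\Sigma$ equals $(\det A)^2 = \sigma_1^2 \sigma_2^2 (1 - \rho^2)$, which is the relation used in the proof of Theorem 5.1.5.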
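Theorem 5.1.5:
Assume that $\sigma_1 > 0$, $\sigma_2 > 0$ and $|\rho| < 1$. Then the joint pdf of $\mathbf{X} = (X, Y) = AZ + \mu$ (as defined
in Definition 5.1.2) is
\[
\begin{aligned}
f_{\mathbf{X}}(x) &= \frac{1}{2 \pi \sqrt{|\Sigma|}} \exp\Bigl( -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \Bigr) \\
&= \frac{1}{2 \pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\Bigl( -\frac{1}{2 (1 - \rho^2)} \Bigl( \Bigl( \frac{x - \mu_1}{\sigma_1} \Bigr)^2 - 2 \rho \Bigl( \frac{x - \mu_1}{\sigma_1} \Bigr) \Bigl( \frac{y - \mu_2}{\sigma_2} \Bigr) + \Bigl( \frac{y - \mu_2}{\sigma_2} \Bigr)^2 \Bigr) \Bigr)
\end{aligned}
\]
Proof:
The mapping $Z \to \mathbf{X}$ is 1-to-1:
\[ \mathbf{X} = AZ + \mu \]
\[ \Longrightarrow Z = A^{-1} (\mathbf{X} - \mu) \quad \text{(requirement is that } A \text{ is invertible.)} \]
\[ J = |A^{-1}| = \frac{1}{|A|} \]
\[ |A| = \sqrt{|A|^2} = \sqrt{|A| \cdot |A^T|} = \sqrt{|A A^T|} = \sqrt{|\Sigma|} = \sqrt{\sigma_1^2 \sigma_2^2 - \rho^2 \sigma_1^2 \sigma_2^2} = \sigma_1 \sigma_2 \sqrt{1 - \rho^2} \]
We can use this result to get to the second line of the theorem:
\[
\begin{aligned}
f_Z(z) &= \frac{1}{\sqrt{2 \pi}} e^{-\frac{z_1^2}{2}} \cdot \frac{1}{\sqrt{2 \pi}} e^{-\frac{z_2^2}{2}} \\
&= \frac{1}{2 \pi} e^{-\frac{1}{2} (z^T z)}
\end{aligned}
\]
As already stated, the mapping from $Z$ to $\mathbf{X}$ is 1-to-1, so we can apply Theorem 4.3.5:
\[
\begin{aligned}
f_{\mathbf{X}}(x) &= \frac{1}{2 \pi \sqrt{|\Sigma|}} \exp\Bigl( -\frac{1}{2} (x - \mu)^T \underbrace{(A^{-1})^T A^{-1}}_{\Sigma^{-1}} (x - \mu) \Bigr) \\
&\overset{(*)}{=} \frac{1}{2 \pi \sqrt{|\Sigma|}} \exp\Bigl( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \Bigr)
\end{aligned}
\]
This proves the 1st line of the Theorem. Step $(*)$ holds since
\[ (A^{-1})^T A^{-1} = (A^T)^{-1} A^{-1} = (A A^T)^{-1} = \Sigma^{-1}. \]
The second line of the Theorem is based on the following transformations:
\[ \sqrt{|\Sigma|} = \sigma_1 \sigma_2 \sqrt{1 - \rho^2} \]
\[ \Sigma^{-1} = \frac{1}{|\Sigma|} \begin{pmatrix} \sigma_2^2 & -\rho \sigma_1 \sigma_2 \\ -\rho \sigma_1 \sigma_2 & \sigma_1^2 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sigma_1^2 (1 - \rho^2)} & \frac{-\rho}{\sigma_1 \sigma_2 (1 - \rho^2)} \\ \frac{-\rho}{\sigma_1 \sigma_2 (1 - \rho^2)} & \frac{1}{\sigma_2^2 (1 - \rho^2)} \end{pmatrix} \]
\[ \Longrightarrow f_{\mathbf{X}}(x) = \frac{1}{2 \pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\Bigl( -\frac{1}{2 (1 - \rho^2)} \Bigl( \Bigl( \frac{x - \mu_1}{\sigma_1} \Bigr)^2 - 2 \rho \Bigl( \frac{x - \mu_1}{\sigma_1} \Bigr) \Bigl( \frac{y - \mu_2}{\sigma_2} \Bigr) + \Bigl( \frac{y - \mu_2}{\sigma_2} \Bigr)^2 \Bigr) \Bigr) \]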
Note:
In the situation of Theorem 5.1.5, we say that $(X, Y) \sim N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$.
Theorem 5.1.6:
If $(X, Y)$ has a non-degenerate $N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ distribution, then the conditional distribution
of $X$ given $Y = y$ is
\[ N\Bigl( \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (y - \mu_2), \ \sigma_1^2 (1 - \rho^2) \Bigr). \]
Proof:
Homework
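Without giving away the homework, Theorem 5.1.6 can at least be checked numerically: the ratio of the joint pdf from Theorem 5.1.5 to the marginal pdf of $Y$ should coincide with the Normal pdf whose mean and variance are stated in the theorem. The parameter values and function names below are hypothetical.

```python
import math

def bivariate_normal_pdf(x, y, mu1, mu2, s1, s2, rho):
    """Joint pdf of N(mu1, mu2, s1^2, s2^2, rho), second line of Theorem 5.1.5."""
    zx, zy = (x - mu1) / s1, (y - mu2) / s2
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho**2)
    return math.exp(-0.5 * q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho**2))

def normal_pdf(x, mean, var):
    """pdf of the univariate N(mean, var) distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def conditional_params_x_given_y(y, mu1, mu2, s1, s2, rho):
    """Mean and variance of X given Y = y as stated in Theorem 5.1.6."""
    return mu1 + rho * (s1 / s2) * (y - mu2), s1**2 * (1.0 - rho**2)

params = dict(mu1=1.0, mu2=-2.0, s1=2.0, s2=0.5, rho=0.6)  # illustrative values
x0, y0 = 0.3, -1.7

mean, var = conditional_params_x_given_y(y0, **params)
ratio = bivariate_normal_pdf(x0, y0, **params) / normal_pdf(y0, params["mu2"], params["s2"] ** 2)
direct = normal_pdf(x0, mean, var)
print(ratio, direct)  # f_{X,Y}(x0, y0) / f_Y(y0) versus the N(mean, var) pdf at x0
```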
Example 5.1.7:
Let rv's $(X_1, Y_1)$ be $N(0, 0, 1, 1, 0)$ with pdf $f_1(x, y)$ and $(X_2, Y_2)$ be $N(0, 0, 1, 1, \rho)$ with pdf $f_2(x, y)$.
Let $(X, Y)$ be the rv that corresponds to the pdf
\[ f_{X,Y}(x, y) = \frac{1}{2} f_1(x, y) + \frac{1}{2} f_2(x, y). \]
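Theorem 5.1.8:
The mgf $M_{\mathbf{X}}(t)$ of a bivariate Normal rv $\mathbf{X} = (X, Y)$ is
\[ M_{\mathbf{X}}(t) = M_{X,Y}(t_1, t_2) = \exp\Bigl( \mu' t + \frac{1}{2} t' \Sigma t \Bigr) = \exp\Bigl( \mu_1 t_1 + \mu_2 t_2 + \frac{1}{2} (\sigma_1^2 t_1^2 + \sigma_2^2 t_2^2 + 2 \rho \sigma_1 \sigma_2 t_1 t_2) \Bigr). \]
Proof:
The mgf of a univariate Normal rv $X \sim N(\mu, \sigma^2)$ will be used to develop the mgf of a bivariate
Normal rv $\mathbf{X} = (X, Y)$:
\[
\begin{aligned}
M_X(t) &= E(\exp(tX)) \\
&= \int_{-\infty}^{\infty} \exp(tx) \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\Bigl( -\frac{1}{2 \sigma^2} (x - \mu)^2 \Bigr) dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\Bigl( -\frac{1}{2 \sigma^2} [-2 \sigma^2 t x + (x - \mu)^2] \Bigr) dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\Bigl( -\frac{1}{2 \sigma^2} [-2 \sigma^2 t x + (x^2 - 2 \mu x + \mu^2)] \Bigr) dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\Bigl( -\frac{1}{2 \sigma^2} [x^2 - 2 (\mu + \sigma^2 t) x + (\mu + t \sigma^2)^2 - (\mu + t \sigma^2)^2 + \mu^2] \Bigr) dx \\
&= \exp\Bigl( -\frac{1}{2 \sigma^2} [-(\mu + t \sigma^2)^2 + \mu^2] \Bigr) \underbrace{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\Bigl( -\frac{1}{2 \sigma^2} [x - (\mu + t \sigma^2)]^2 \Bigr) dx}_{\text{pdf of } N(\mu + t \sigma^2, \sigma^2), \text{ that integrates to 1}} \\
&= \exp\Bigl( -\frac{1}{2 \sigma^2} [-\mu^2 - 2 \mu t \sigma^2 - t^2 \sigma^4 + \mu^2] \Bigr) \\
&= \exp\Bigl( \frac{-2 \mu t \sigma^2 - t^2 \sigma^4}{-2 \sigma^2} \Bigr) \\
&= \exp\Bigl( \mu t + \frac{1}{2} \sigma^2 t^2 \Bigr)
\end{aligned}
\]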
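\[
\begin{aligned}
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(t_1 x) \exp(t_2 y) \, f_X(x) \, f_{Y|X}(y \mid x) \, dy \, dx \\
&= \int_{-\infty}^{\infty} \Bigl( \int_{-\infty}^{\infty} \exp(t_2 y) \, f_{Y|X}(y \mid x) \, dy \Bigr) \exp(t_1 x) \, f_X(x) \, dx \\
&\overset{(A)}{=} \int_{-\infty}^{\infty} \Bigl( \int_{-\infty}^{\infty} \frac{\exp(t_2 y)}{\sigma_2 \sqrt{2 \pi} \sqrt{1 - \rho^2}} \exp\Bigl( \frac{-(y - \beta_X)^2}{2 \sigma_2^2 (1 - \rho^2)} \Bigr) dy \Bigr) \exp(t_1 x) \, f_X(x) \, dx \quad \Bigl| \ \beta_X = \mu_2 + \rho \frac{\sigma_2}{\sigma_1} (x - \mu_1) \\
&\overset{(B)}{=} \int_{-\infty}^{\infty} \exp\Bigl( \beta_X t_2 + \frac{1}{2} \sigma_2^2 (1 - \rho^2) t_2^2 \Bigr) \exp(t_1 x) \, f_X(x) \, dx \\
&= \int_{-\infty}^{\infty} \exp\Bigl( \Bigl[ \mu_2 + \rho \frac{\sigma_2}{\sigma_1} (x - \mu_1) \Bigr] t_2 + \frac{1}{2} \sigma_2^2 (1 - \rho^2) t_2^2 + t_1 x \Bigr) f_X(x) \, dx \\
&= \int_{-\infty}^{\infty} \exp\Bigl( \mu_2 t_2 + \rho \frac{\sigma_2}{\sigma_1} t_2 x - \rho \frac{\sigma_2}{\sigma_1} \mu_1 t_2 + \frac{1}{2} \sigma_2^2 (1 - \rho^2) t_2^2 + t_1 x \Bigr) f_X(x) \, dx \\
&= \exp\Bigl( \frac{1}{2} \sigma_2^2 (1 - \rho^2) t_2^2 + t_2 \mu_2 - \rho \frac{\sigma_2}{\sigma_1} \mu_1 t_2 \Bigr) \int_{-\infty}^{\infty} \exp\Bigl( \Bigl( t_1 + \rho \frac{\sigma_2}{\sigma_1} t_2 \Bigr) x \Bigr) f_X(x) \, dx \\
&\overset{(C)}{=} \exp\Bigl( \frac{1}{2} \sigma_2^2 (1 - \rho^2) t_2^2 + t_2 \mu_2 - \rho \frac{\sigma_2}{\sigma_1} \mu_1 t_2 \Bigr) \exp\Bigl( \mu_1 \Bigl( t_1 + \rho \frac{\sigma_2}{\sigma_1} t_2 \Bigr) + \frac{1}{2} \sigma_1^2 \Bigl( t_1 + \rho \frac{\sigma_2}{\sigma_1} t_2 \Bigr)^2 \Bigr) \\
&= \exp\Bigl( \frac{1}{2} \sigma_2^2 t_2^2 - \frac{1}{2} \rho^2 \sigma_2^2 t_2^2 + \mu_2 t_2 - \rho \frac{\sigma_2}{\sigma_1} \mu_1 t_2 + \mu_1 t_1 + \mu_1 \rho \frac{\sigma_2}{\sigma_1} t_2 + \frac{1}{2} \sigma_1^2 t_1^2 + \rho \sigma_1 \sigma_2 t_1 t_2 + \frac{1}{2} \rho^2 \sigma_2^2 t_2^2 \Bigr) \\
&= \exp\Bigl( \mu_1 t_1 + \mu_2 t_2 + \frac{\sigma_1^2 t_1^2 + \sigma_2^2 t_2^2 + 2 \rho \sigma_1 \sigma_2 t_1 t_2}{2} \Bigr)
\end{aligned}
\]
$(A)$ follows from Theorem 5.1.6 since $Y \mid X \sim N(\beta_X, \sigma_2^2 (1 - \rho^2))$. $(B)$ follows when we apply our
calculations of the mgf of a $N(\mu, \sigma^2)$ distribution to a $N(\beta_X, \sigma_2^2 (1 - \rho^2))$ distribution. $(C)$ holds
since the integral represents $M_X(t_1 + \rho \frac{\sigma_2}{\sigma_1} t_2)$.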
Corollary 5.1.9:
Let $(X, Y)$ be a bivariate Normal rv. $X$ and $Y$ are independent iff $\rho = 0$.
Definition 5.1.10:
Let $Z$ be a k-rv of $k$ iid $N(0, 1)$ rv's. Let $A \in \mathbb{R}^{k \times k}$ be a $k \times k$ matrix, and let $\mu \in \mathbb{R}^k$ be a
k-dimensional vector. Then $X = AZ + \mu$ has a multivariate Normal distribution with mean
vector $\mu$ and variance-covariance matrix $\Sigma = A A'$.
118
Note:
(i) If is non{singular, X has the joint pdf
fX (x) = (2)k=2 (1j j)1=2 exp ; 12 (x ; )0 ;1(x ; ) :
(ii) If is singular, then X ; takes values in a linear subspace of IRk with probability 1.
(iii) If is non{singular, then X has mgf
MX (t) = exp(0 t + 21 t0t):
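To make the construction $\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}$ concrete, the following sketch computes $\Sigma = AA'$ for a small matrix and compares the mgf formula of Note (iii) with the mgf obtained directly from the $N(0,1)$ mgf $\exp(s^2/2)$ of each component of $\mathbf{Z}$. The matrix, the mean vector, and the function names are assumptions made up for this illustration.

```python
import math

def sigma_from_A(A):
    """Variance-covariance matrix Sigma = A A' for a k x k matrix A (list of rows)."""
    k = len(A)
    return [[sum(A[i][m] * A[j][m] for m in range(k)) for j in range(k)]
            for i in range(k)]

def mgf_formula(t, mu, Sigma):
    """exp(mu't + t' Sigma t / 2), the mgf stated in Note (iii)."""
    k = len(t)
    linear = sum(mu[i] * t[i] for i in range(k))
    quadratic = sum(t[i] * Sigma[i][j] * t[j] for i in range(k) for j in range(k))
    return math.exp(linear + 0.5 * quadratic)

def mgf_via_Z(t, mu, A):
    """E(exp(t'(AZ + mu))), using independence of the Z_j and their mgf exp(s^2 / 2)."""
    k = len(t)
    s = [sum(A[i][j] * t[i] for i in range(k)) for j in range(k)]  # s = A't
    linear = sum(mu[i] * t[i] for i in range(k))
    return math.exp(linear) * math.prod(math.exp(0.5 * sj * sj) for sj in s)

# Hypothetical example values.
A = [[2.0, 0.0], [1.0, 3.0]]
mu = [1.0, -1.0]
Sigma = sigma_from_A(A)
t = [0.2, -0.1]
print(Sigma)
print(mgf_formula(t, mu, Sigma), mgf_via_Z(t, mu, A))
```

The two mgf values agree because $\mathbf{t}'AA'\mathbf{t} = \|A'\mathbf{t}\|^2$.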
Theorem 5.1.11:
The components $X_1, \ldots, X_k$ of a normally distributed k-rv $\mathbf{X}$ are independent iff $Cov(X_i, X_j) = 0$
$\forall i, j = 1, \ldots, k,\ i \neq j$.
Theorem 5.1.12:
Let $\mathbf{X} = (X_1, \ldots, X_k)'$. $\mathbf{X}$ has a k-dimensional Normal distribution iff every linear function of $\mathbf{X}$,
i.e., $\mathbf{X}'\mathbf{t} = t_1 X_1 + t_2 X_2 + \ldots + t_k X_k$, has a univariate Normal distribution.
Proof:
The Note following Definition 5.1.1 states that any linear function of two Normal rv's has a univariate
Normal distribution. By induction on k, we can show that every linear function of $\mathbf{X}$, i.e.,
$\mathbf{X}'\mathbf{t}$, has a univariate Normal distribution.
Conversely, if $\mathbf{X}'\mathbf{t}$ has a univariate Normal distribution, we know from Theorem 5.1.8 that
\begin{align*}
M_{\mathbf{X}'\mathbf{t}}(s) &= \exp\left(E(\mathbf{X}'\mathbf{t})\, s + \tfrac{1}{2} Var(\mathbf{X}'\mathbf{t})\, s^2\right) \\
&= \exp\left(\boldsymbol{\mu}'\mathbf{t}\, s + \tfrac{1}{2}\mathbf{t}'\Sigma\mathbf{t}\, s^2\right) \\
\Longrightarrow\ M_{\mathbf{X}'\mathbf{t}}(1) &= \exp\left(\boldsymbol{\mu}'\mathbf{t} + \tfrac{1}{2}\mathbf{t}'\Sigma\mathbf{t}\right) \\
&= M_{\mathbf{X}}(\mathbf{t})
\end{align*}
By uniqueness of the mgf and Note (iii) that follows Definition 5.1.10, $\mathbf{X}$ has a multivariate Normal
distribution.
5.2 Exponential Family of Distributions
Definition 5.2.1:
Let $\Theta$ be an interval on the real line. Let $\{f(\cdot\,; \theta) : \theta \in \Theta\}$ be a family of pdf's (or pmf's). We
assume that the set $\{x : f(x; \theta) > 0\}$ is independent of $\theta$, where $x = (x_1, \ldots, x_n)$. We say that
the family $\{f(\cdot\,; \theta) : \theta \in \Theta\}$ is a one-parameter exponential family if there exist real-valued
functions $Q(\theta)$ and $D(\theta)$ on $\Theta$ and Borel-measurable functions $T(X)$ and $S(X)$ on $\mathbb{R}^n$ such that
\[
f(x; \theta) = \exp\left(Q(\theta)T(x) + D(\theta) + S(x)\right).
\]
Note:
We can also write $f(x; \theta)$ as
\[
f(x; \eta) = h(x)\, c(\eta) \exp(\eta\, T(x))
\]
where $h(x) = \exp(S(x))$, $\eta = Q(\theta)$, and $c(\eta) = \exp(D(Q^{-1}(\eta)))$, and call this the exponential
family in canonical form for a natural parameter $\eta$.
Definition 5.2.2:
Let $\Theta \subseteq \mathbb{R}^k$ be a k-dimensional interval. Let $\{f(\cdot\,; \theta) : \theta \in \Theta\}$ be a family of pdf's (or pmf's). We
assume that the set $\{x : f(x; \theta) > 0\}$ is independent of $\theta$, where $x = (x_1, \ldots, x_n)$. We say that the
family $\{f(\cdot\,; \theta) : \theta \in \Theta\}$ is a k-parameter exponential family if there exist real-valued functions
$Q_1(\theta), \ldots, Q_k(\theta)$ and $D(\theta)$ on $\Theta$ and Borel-measurable functions $T_1(X), \ldots, T_k(X)$ and $S(X)$ on
$\mathbb{R}^n$ such that
\[
f(x; \theta) = \exp\left(\sum_{i=1}^{k} Q_i(\theta)T_i(x) + D(\theta) + S(x)\right).
\]
Note:
Similar to the Note following Definition 5.2.1, we can express the k-parameter exponential family
in canonical form for a natural $k \times 1$ parameter vector $\eta = (\eta_1, \ldots, \eta_k)'$.
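Example 5.2.3:
Let $X \sim N(\mu, \sigma^2)$ with both parameters $\mu$ and $\sigma^2$ unknown. We have:
\[
f(x; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)
= \exp\left(-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2)\right)
\]
\[
\theta = (\mu, \sigma^2), \qquad \Theta = \{(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}
\]
Therefore,
\begin{align*}
Q_1(\theta) &= -\frac{1}{2\sigma^2}, & T_1(x) &= x^2, \\
Q_2(\theta) &= \frac{\mu}{\sigma^2}, & T_2(x) &= x, \\
D(\theta) &= -\frac{\mu^2}{2\sigma^2} - \frac{1}{2}\ln(2\pi\sigma^2), & S(x) &= 0.
\end{align*}
Thus, this is a 2-parameter exponential family.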
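The decomposition in Example 5.2.3 can be checked numerically. The sketch below, with function names and test values that are assumptions for illustration only, evaluates the Normal pdf once in its usual form and once as $\exp(Q_1 T_1 + Q_2 T_2 + D + S)$.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Usual form of the N(mu, sigma2) pdf."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def exp_family_pdf(x, mu, sigma2):
    """Same pdf written as exp(Q1*T1 + Q2*T2 + D + S) as in Example 5.2.3."""
    Q1, T1 = -1.0 / (2 * sigma2), x ** 2
    Q2, T2 = mu / sigma2, x
    D = -mu ** 2 / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)
    S = 0.0
    return math.exp(Q1 * T1 + Q2 * T2 + D + S)

for x in (-1.0, 0.0, 2.5):
    print(x, normal_pdf(x, 1.0, 4.0), exp_family_pdf(x, 1.0, 4.0))
```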
Stat 6710 Mathematical Statistics I Fall Semester 1999
6 Limit Theorems
Motivation:
I found this slide from my Stat 250, Section 003, "Introductory Statistics" class (an undergraduate
class I taught at George Mason University in Spring 1999):
6.1 Modes of Convergence
Definition 6.1.1:
Let $X_1, \ldots, X_n$ be iid rv's with common cdf $F_X(x)$. Let $T = T(X)$ be any statistic, i.e., a Borel-measurable
function of $X$ that does not involve the population parameter(s) $\vartheta$, defined on the
support $\mathcal{X}$ of $X$. The induced probability distribution of $T(X)$ is called the sampling distribution
of $T(X)$.
Note:
(i) Commonly used statistics are:
Sample Mean: $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample Variance: $S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \overline{X}_n)^2$
Sample Median, Order Statistics, Min, Max, etc.
(ii) Recall that if $X_1, \ldots, X_n$ are iid and if $E(X)$ and $Var(X)$ exist, then $E(\overline{X}_n) = \mu = E(X)$,
$E(S_n^2) = \sigma^2 = Var(X)$, and $Var(\overline{X}_n) = \frac{\sigma^2}{n}$.
(iii) Recall that if $X_1, \ldots, X_n$ are iid and if $X$ has mgf $M_X(t)$ or characteristic function $\Phi_X(t)$
then $M_{\overline{X}_n}(t) = (M_X(\frac{t}{n}))^n$ or $\Phi_{\overline{X}_n}(t) = (\Phi_X(\frac{t}{n}))^n$.
Definition 6.1.2:
Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of rv's with cdf's $\{F_n\}_{n=1}^{\infty}$ and let $X$ be a rv with cdf $F$. If $F_n(x) \to F(x)$
at all continuity points of $F$, we say that $X_n$ converges in distribution to $X$ ($X_n \stackrel{d}{\longrightarrow} X$) or
$X_n$ converges in law to $X$ ($X_n \stackrel{L}{\longrightarrow} X$), or $F_n$ converges weakly to $F$ ($F_n \stackrel{w}{\longrightarrow} F$).
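Example 6.1.3:
Let $X_n \sim N(0, \frac{1}{n})$. Then
\begin{align*}
F_n(x) &= \int_{-\infty}^{x} \frac{\exp\left(-\frac{1}{2}nt^2\right)}{\sqrt{2\pi/n}}\, dt \\
&= \int_{-\infty}^{\sqrt{n}x} \frac{\exp\left(-\frac{1}{2}s^2\right)}{\sqrt{2\pi}}\, ds \\
&= \Phi(\sqrt{n}\,x)
\end{align*}
\[
\Longrightarrow\ F_n(x) \to \begin{cases} \Phi(\infty) = 1, & \text{if } x > 0 \\ \Phi(0) = \frac{1}{2}, & \text{if } x = 0 \\ \Phi(-\infty) = 0, & \text{if } x < 0 \end{cases}
\]
If $F_X(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$ the only point of discontinuity is at $x = 0$. Everywhere else,
$\Phi(\sqrt{n}\,x) = F_n(x) \to F_X(x)$.
So, $X_n \stackrel{d}{\longrightarrow} X$, where $P(X = 0) = 1$, or $X_n \stackrel{d}{\longrightarrow} 0$ since the limiting rv here is degenerate, i.e., it
has a Dirac(0) distribution.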
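A short numerical sketch of Example 6.1.3 follows. It evaluates $F_n(x) = \Phi(\sqrt{n}\,x)$ with the standard library error function; the helper names are assumptions introduced for this illustration.

```python
import math

def Phi(z):
    """Standard Normal cdf, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def F_n(x, n):
    """cdf of N(0, 1/n) at x, i.e. Phi(sqrt(n) * x)."""
    return Phi(math.sqrt(n) * x)

for n in (1, 10, 100, 10000):
    print(n, F_n(-0.1, n), F_n(0.0, n), F_n(0.1, n))
```

For growing n the first column tends to 0 and the last to 1, while the value at the discontinuity point $x = 0$ stays at $\frac{1}{2}$, which is why convergence is only required at continuity points of the limiting cdf.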
Example 6.1.4:
In this example, the sequence $\{F_n\}_{n=1}^{\infty}$ converges pointwise to something that is not a cdf:
Let $X_n \sim$ Dirac(n), i.e., $P(X_n = n) = 1$. Then,
\[
F_n(x) = \begin{cases} 0, & x < n \\ 1, & x \geq n \end{cases}
\]
It is $F_n(x) \to 0\ \forall x$ which is not a cdf. Thus, there is no rv $X$ such that $X_n \stackrel{d}{\longrightarrow} X$.
Example 6.1.5:
Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of rv's such that $P(X_n = 0) = 1 - \frac{1}{n}$ and $P(X_n = n) = \frac{1}{n}$ and let
$X \sim$ Dirac(0), i.e., $P(X = 0) = 1$.
It is
\[
F_n(x) = \begin{cases} 0, & x < 0 \\ 1 - \frac{1}{n}, & 0 \leq x < n \\ 1, & x \geq n \end{cases}
\qquad
F_X(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}
\]
It holds that $F_n \stackrel{w}{\longrightarrow} F_X$ but
\[
E(X_n^k) = n^{k-1} \not\to E(X^k) = 0.
\]
Thus, convergence in distribution does not imply convergence of moments/means.
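The contrast in Example 6.1.5 between converging cdf's and non-converging moments can be tabulated exactly with rational arithmetic. This is only a sketch; the function names are assumptions made for the illustration.

```python
from fractions import Fraction

def F_n(x, n):
    """cdf of X_n with P(X_n = 0) = 1 - 1/n and P(X_n = n) = 1/n."""
    if x < 0:
        return Fraction(0)
    if x < n:
        return 1 - Fraction(1, n)
    return Fraction(1)

def moment(n, k):
    """E(X_n^k) = 0^k * (1 - 1/n) + n^k * (1/n) = n^(k-1) for k >= 1."""
    return Fraction(n ** k, n)

for n in (10, 100, 1000):
    print(n, F_n(0.5, n), moment(n, 1), moment(n, 2))
```

At the fixed point $x = 0.5$ the cdf values approach $F_X(0.5) = 1$, while the first moment stays at 1 and the second grows like n.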
Note:
Convergence in distribution does not say that the $X_i$'s are close to each other or to $X$. It only
means that their cdf's are (eventually) close to some cdf $F$. The $X_i$'s do not even have to be defined
on the same probability space.
Example 6.1.6:
Let $X$ and $\{X_n\}_{n=1}^{\infty}$ be iid $N(0, 1)$. Obviously, $X_n \stackrel{d}{\longrightarrow} X$ but $\lim_{n \to \infty} X_n \neq X$.
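Theorem 6.1.7:
Let $X$ and $\{X_n\}_{n=1}^{\infty}$ be discrete rv's with support $\mathcal{X}$ and $\{\mathcal{X}_n\}_{n=1}^{\infty}$, respectively. Define the countable
set $A = \mathcal{X} \cup \bigcup_{n=1}^{\infty} \mathcal{X}_n = \{a_k : k = 1, 2, 3, \ldots\}$. Let $p_k = P(X = a_k)$ and $p_{nk} = P(X_n = a_k)$. Then
it holds that $p_{nk} \to p_k\ \forall k$ iff $X_n \stackrel{d}{\longrightarrow} X$.
Theorem 6.1.8:
Let $X$ and $\{X_n\}_{n=1}^{\infty}$ be continuous rv's with pdf's $f$ and $\{f_n\}_{n=1}^{\infty}$, respectively. If $f_n(x) \to f(x)$ for
almost all $x$ as $n \to \infty$ then $X_n \stackrel{d}{\longrightarrow} X$.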
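Theorem 6.1.9:
Let $X$ and $\{X_n\}_{n=1}^{\infty}$ be rv's such that $X_n \stackrel{d}{\longrightarrow} X$. Let $c \in \mathbb{R}$ be a constant. Then it holds:
(i) $X_n + c \stackrel{d}{\longrightarrow} X + c$.
(ii) $cX_n \stackrel{d}{\longrightarrow} cX$.
(iii) If $a_n \to a$ and $b_n \to b$, then $a_n X_n + b_n \stackrel{d}{\longrightarrow} aX + b$.
Proof:
Part (iii):
Suppose that $a > 0$, $a_n > 0$. Let $Y_n = a_n X_n + b_n$ and $Y = aX + b$. It is
\[
F_Y(y) = P(Y \leq y) = P(aX + b \leq y) = P\left(X \leq \frac{y - b}{a}\right) = F_X\left(\frac{y - b}{a}\right).
\]
Likewise,
\[
F_{Y_n}(y) = F_{X_n}\left(\frac{y - b_n}{a_n}\right).
\]
If $y$ is a continuity point of $F_Y$, $\frac{y - b}{a}$ is a continuity point of $F_X$. Since $a_n \to a$, $b_n \to b$ and
$F_{X_n}(x) \to F_X(x)$, it follows that $F_{Y_n}(y) \to F_Y(y)$ for every continuity point $y$ of $F_Y$. Thus,
$a_n X_n + b_n \stackrel{d}{\longrightarrow} aX + b$.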
Definition 6.1.10:
Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of rv's defined on a probability space $(\Omega, L, P)$. We say that $X_n$
converges in probability to a rv $X$ ($X_n \stackrel{p}{\longrightarrow} X$, P-$\lim_{n \to \infty} X_n = X$) if
\[
\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0 \quad \forall \epsilon > 0.
\]
Note:
The following are equivalent:
\[
\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0 \iff \lim_{n \to \infty} P(|X_n - X| \leq \epsilon) = 1 \iff \lim_{n \to \infty} P(\{\omega : |X_n(\omega) - X(\omega)| > \epsilon\}) = 0.
\]
If $X$ is degenerate, i.e., $P(X = c) = 1$, we say that $X_n$ is consistent for $c$. For example, let $X_n$
such that $P(X_n = 0) = 1 - \frac{1}{n}$ and $P(X_n = 1) = \frac{1}{n}$. Then
\[
P(|X_n| > \epsilon) = \begin{cases} \frac{1}{n}, & 0 < \epsilon < 1 \\ 0, & \epsilon \geq 1 \end{cases}
\]
Therefore, $\lim_{n \to \infty} P(|X_n| > \epsilon) = 0\ \forall \epsilon > 0$. So $X_n \stackrel{p}{\longrightarrow} 0$, i.e., $X_n$ is consistent for 0.
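Theorem 6.1.11:
(i) $X_n \stackrel{p}{\longrightarrow} X \iff X_n - X \stackrel{p}{\longrightarrow} 0$.
(ii) $X_n \stackrel{p}{\longrightarrow} X,\ X_n \stackrel{p}{\longrightarrow} Y \Longrightarrow P(X = Y) = 1$.
(iii) $X_n \stackrel{p}{\longrightarrow} X,\ X_m \stackrel{p}{\longrightarrow} X \Longrightarrow X_n - X_m \stackrel{p}{\longrightarrow} 0$ as $n, m \to \infty$.
(iv) $X_n \stackrel{p}{\longrightarrow} X,\ Y_n \stackrel{p}{\longrightarrow} Y \Longrightarrow X_n \pm Y_n \stackrel{p}{\longrightarrow} X \pm Y$.
(v) $X_n \stackrel{p}{\longrightarrow} X$, $k \in \mathbb{R}$ a constant $\Longrightarrow kX_n \stackrel{p}{\longrightarrow} kX$.
(vi) $X_n \stackrel{p}{\longrightarrow} k$, $k \in \mathbb{R}$ a constant $\Longrightarrow X_n^r \stackrel{p}{\longrightarrow} k^r\ \forall r \in \mathbb{N}$.
(vii) $X_n \stackrel{p}{\longrightarrow} a,\ Y_n \stackrel{p}{\longrightarrow} b,\ a, b \in \mathbb{R} \Longrightarrow X_n Y_n \stackrel{p}{\longrightarrow} ab$.
(viii) $X_n \stackrel{p}{\longrightarrow} 1 \Longrightarrow X_n^{-1} \stackrel{p}{\longrightarrow} 1$.
(ix) $X_n \stackrel{p}{\longrightarrow} a,\ Y_n \stackrel{p}{\longrightarrow} b,\ a \in \mathbb{R},\ b \in \mathbb{R} - \{0\} \Longrightarrow \frac{X_n}{Y_n} \stackrel{p}{\longrightarrow} \frac{a}{b}$.
(x) $X_n \stackrel{p}{\longrightarrow} X$, $Y$ an arbitrary rv $\Longrightarrow X_n Y \stackrel{p}{\longrightarrow} XY$.
(xi) $X_n \stackrel{p}{\longrightarrow} X,\ Y_n \stackrel{p}{\longrightarrow} Y \Longrightarrow X_n Y_n \stackrel{p}{\longrightarrow} XY$.
Proof:
See Rohatgi, page 244-245 for partial proofs.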
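Theorem 6.1.12:
Let $X_n \stackrel{p}{\longrightarrow} X$ and let $g$ be a continuous function on $\mathbb{R}$. Then $g(X_n) \stackrel{p}{\longrightarrow} g(X)$.
Proof:
Preconditions:
1.) $X$ rv $\Longrightarrow \forall \epsilon > 0\ \exists k = k(\epsilon) : P(|X| > k) < \frac{\epsilon}{2}$
2.) $g$ is continuous on $\mathbb{R}$
$\Longrightarrow$ $g$ is also uniformly continuous (see Definition of u.c. in Theorem 3.3.3 (iii)) on $[-k, k]$
$\Longrightarrow \exists \delta = \delta(\epsilon, k) : |X| \leq k,\ |X_n - X| < \delta \Rightarrow |g(X_n) - g(X)| < \epsilon$
Let
\begin{align*}
A &= \{|X| \leq k\} = \{\omega : |X(\omega)| \leq k\} \\
B &= \{|X_n - X| < \delta\} = \{\omega : |X_n(\omega) - X(\omega)| < \delta\} \\
C &= \{|g(X_n) - g(X)| < \epsilon\} = \{\omega : |g(X_n(\omega)) - g(X(\omega))| < \epsilon\}
\end{align*}
If $\omega \in A \cap B \stackrel{2.)}{\Longrightarrow} \omega \in C$
$\Longrightarrow A \cap B \subseteq C$
$\Longrightarrow C^C \subseteq (A \cap B)^C = A^C \cup B^C$
$\Longrightarrow P(C^C) \leq P(A^C \cup B^C) \leq P(A^C) + P(B^C)$
Now:
\[
P(|g(X_n) - g(X)| \geq \epsilon) \leq \underbrace{P(|X| > k)}_{\leq \frac{\epsilon}{2} \text{ by 1.)}} + \underbrace{P(|X_n - X| \geq \delta)}_{\leq \frac{\epsilon}{2} \text{ for } n \geq n_0(\epsilon, \delta, k) \text{ since } X_n \stackrel{p}{\longrightarrow} X} \leq \epsilon \quad \text{for } n \geq n_0(\epsilon, \delta, k)
\]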
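Corollary 6.1.13:
Let $X_n \stackrel{p}{\longrightarrow} c$, $c \in \mathbb{R}$ and let $g$ be a continuous function on $\mathbb{R}$. Then $g(X_n) \stackrel{p}{\longrightarrow} g(c)$.
Theorem 6.1.14:
$X_n \stackrel{p}{\longrightarrow} X \Longrightarrow X_n \stackrel{d}{\longrightarrow} X$.
Proof:
$X_n \stackrel{p}{\longrightarrow} X \iff P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty\ \forall \epsilon > 0$
It holds:
\begin{align*}
P(X \leq x - \epsilon) &= P(X \leq x - \epsilon,\ |X_n - X| \leq \epsilon) + P(X \leq x - \epsilon,\ |X_n - X| > \epsilon) \\
&\stackrel{(A)}{\leq} P(X_n \leq x) + P(|X_n - X| > \epsilon)
\end{align*}
(A) holds since $X \leq x - \epsilon$ and $X_n$ within $\epsilon$ of $X$, thus $X_n \leq x$.
Similarly, it holds:
\begin{align*}
P(X_n \leq x) &= P(X_n \leq x,\ |X_n - X| \leq \epsilon) + P(X_n \leq x,\ |X_n - X| > \epsilon) \\
&\leq P(X \leq x + \epsilon) + P(|X_n - X| > \epsilon)
\end{align*}
Combining the 2 inequalities from above gives:
\[
P(X \leq x - \epsilon) - \underbrace{P(|X_n - X| > \epsilon)}_{\to 0 \text{ as } n \to \infty} \leq \underbrace{P(X_n \leq x)}_{= F_n(x)} \leq P(X \leq x + \epsilon) + \underbrace{P(|X_n - X| > \epsilon)}_{\to 0 \text{ as } n \to \infty}
\]
Therefore,
\[
P(X \leq x - \epsilon) \leq F_n(x) \leq P(X \leq x + \epsilon) \quad \text{as } n \to \infty.
\]
Since the cdf's $F_n(\cdot)$ are not necessarily left continuous, we get the following result for $\epsilon \downarrow 0$:
\[
P(X < x) \leq F_n(x) \leq P(X \leq x) = F_X(x)
\]
Let $x$ be a continuity point of $F$. Then it holds:
\[
F(x) = P(X < x) \leq F_n(x) \leq F(x)
\]
$\Longrightarrow F_n(x) \to F(x)$
$\Longrightarrow X_n \stackrel{d}{\longrightarrow} X$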
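Theorem 6.1.15:
Let $c \in \mathbb{R}$ be a constant. Then it holds:
\[
X_n \stackrel{d}{\longrightarrow} c \iff X_n \stackrel{p}{\longrightarrow} c.
\]
Example 6.1.16:
In this example, we will see that
\[
X_n \stackrel{d}{\longrightarrow} X \not\Longrightarrow X_n \stackrel{p}{\longrightarrow} X
\]
for some rv $X$. Let $X_n$ be identically distributed rv's and let $(X_n, X)$ have the following joint
distribution:

            X_n = 0    X_n = 1
   X = 0       0         1/2    |  1/2
   X = 1      1/2         0     |  1/2
   -------------------------------------
              1/2        1/2    |   1

Obviously, $X_n \stackrel{d}{\longrightarrow} X$ since all have exactly the same cdf, but for any $\epsilon \in (0, 1)$, it is
\[
P(|X_n - X| > \epsilon) = P(|X_n - X| = 1) = 1 \quad \forall n,
\]
so $\lim_{n \to \infty} P(|X_n - X| > \epsilon) \neq 0$. Therefore, $X_n \stackrel{p}{\not\longrightarrow} X$.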
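The joint table of Example 6.1.16 can be encoded directly to confirm both claims: the marginals coincide, yet $|X_n - X| = 1$ with probability 1. The dictionary layout and function names below are assumptions chosen for this sketch.

```python
from fractions import Fraction

half = Fraction(1, 2)
# joint[(x, xn)] = P(X = x, X_n = xn), read off the table in Example 6.1.16
joint = {(0, 0): Fraction(0), (0, 1): half, (1, 0): half, (1, 1): Fraction(0)}

def marginal_X(x):
    """P(X = x), summing the joint probabilities over X_n."""
    return sum(p for (a, b), p in joint.items() if a == x)

def marginal_Xn(xn):
    """P(X_n = xn), summing the joint probabilities over X."""
    return sum(p for (a, b), p in joint.items() if b == xn)

def prob_far(eps):
    """P(|X_n - X| > eps)."""
    return sum(p for (a, b), p in joint.items() if abs(b - a) > eps)

print(marginal_X(0), marginal_Xn(0), prob_far(half))
```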
Theorem 6.1.17:
Let $\{X_n\}_{n=1}^{\infty}$ and $\{Y_n\}_{n=1}^{\infty}$ be sequences of rv's and $X$ be a rv defined on a probability space
$(\Omega, L, P)$. Then it holds:
\[
Y_n \stackrel{d}{\longrightarrow} X,\ |X_n - Y_n| \stackrel{p}{\longrightarrow} 0 \Longrightarrow X_n \stackrel{d}{\longrightarrow} X.
\]
Proof:
Similar to the proof of Theorem 6.1.14. See also Rohatgi, page 253, Theorem 14.
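Proof:
(i) $Y_n \stackrel{p}{\longrightarrow} c \stackrel{Th. 6.1.11 (i)}{\iff} Y_n - c \stackrel{p}{\longrightarrow} 0$
$\Longrightarrow Y_n - c = Y_n + (X_n - X_n) - c = (X_n + Y_n) - (X_n + c) \stackrel{p}{\longrightarrow} 0$ (A)
$X_n \stackrel{d}{\longrightarrow} X \stackrel{Th. 6.1.9 (i)}{\Longrightarrow} X_n + c \stackrel{d}{\longrightarrow} X + c$ (B)
Combining (A) and (B), it follows from Theorem 6.1.17:
$X_n + Y_n \stackrel{d}{\longrightarrow} X + c$
(ii) Case $c = 0$:
$\forall \epsilon > 0\ \forall k > 0$, it is
\begin{align*}
P(|X_n Y_n| > \epsilon) &= P(|X_n Y_n| > \epsilon,\ Y_n \leq \tfrac{\epsilon}{k}) + P(|X_n Y_n| > \epsilon,\ Y_n > \tfrac{\epsilon}{k}) \\
&\leq P(|X_n \tfrac{\epsilon}{k}| > \epsilon) + P(Y_n > \tfrac{\epsilon}{k}) \\
&\leq P(|X_n| > k) + P(|Y_n| > \tfrac{\epsilon}{k})
\end{align*}
Since $X_n \stackrel{d}{\longrightarrow} X$ and $Y_n \stackrel{p}{\longrightarrow} 0$, it follows
Case $c \neq 0$:
Since $X_n \stackrel{d}{\longrightarrow} X$ and $Y_n \stackrel{p}{\longrightarrow} c$, it follows $X_n Y_n - cX_n = X_n(Y_n - c) \stackrel{p}{\longrightarrow} 0$.
$\Longrightarrow X_n Y_n \stackrel{p}{\longrightarrow} cX_n$
$\stackrel{Th. 6.1.14}{\Longrightarrow} X_n Y_n \stackrel{d}{\longrightarrow} cX_n$
Since $cX_n \stackrel{d}{\longrightarrow} cX$ by Theorem 6.1.9 (ii), it follows from Theorem 6.1.17:
$X_n Y_n \stackrel{d}{\longrightarrow} cX$
(iii) Let $Z_n \stackrel{p}{\longrightarrow} 1$ and let $Y_n = cZ_n$.
$\stackrel{c \neq 0}{\Longrightarrow} \frac{1}{Y_n} = \frac{1}{Z_n} \cdot \frac{1}{c}$
$\stackrel{Th. 6.1.11 (v, viii)}{\Longrightarrow} \frac{1}{Y_n} \stackrel{p}{\longrightarrow} \frac{1}{c}$
With part (ii) above, it follows:
$X_n \stackrel{d}{\longrightarrow} X$ and $\frac{1}{Y_n} \stackrel{p}{\longrightarrow} \frac{1}{c}$
$\Longrightarrow \frac{X_n}{Y_n} \stackrel{d}{\longrightarrow} \frac{X}{c}$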
Definition 6.1.19:
Let $(X_n)_{n=1}^{\infty}$ be a sequence of rv's such that $E(|X_n|^r) < \infty$ for some $r > 0$. We say that $X_n$
converges in the $r$th mean to a rv $X$ ($X_n \stackrel{r}{\longrightarrow} X$) if $E(|X|^r) < \infty$ and
\[
\lim_{n \to \infty} E(|X_n - X|^r) = 0.
\]
Example 6.1.20:
Let $(X_n)_{n=1}^{\infty}$ be a sequence of rv's defined by $P(X_n = 0) = 1 - \frac{1}{n}$ and $P(X_n = 1) = \frac{1}{n}$.
It is $E(|X_n|^r) = \frac{1}{n}\ \forall r > 0$. Therefore, $X_n \stackrel{r}{\longrightarrow} 0\ \forall r > 0$.
Note:
The special cases $r = 1$ and $r = 2$ are called convergence in absolute mean for $r = 1$ ($X_n \stackrel{1}{\longrightarrow} X$)
and convergence in mean square for $r = 2$ ($X_n \stackrel{ms}{\longrightarrow} X$ or $X_n \stackrel{2}{\longrightarrow} X$).
n
Theorem 6.1.21:
r X for some r > 0. Then X ;! p
Assume that Xn ;! n X.
Proof:
Using Markov's Inequality (Corollary 3.5.2), it holds for any > 0:
E (j Xn ; X jr ) P (j X ; X j )
r n
131
r X =) lim E (j X ; X jr ) = 0
Xn ;! n!1 n
=) nlim P (j X ; X j ) lim E (j Xn ; X jr ) = 0
!1 n n!1 r
p
=) Xn ;! X
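Example 6.1.22:
Let $(X_n)_{n=1}^{\infty}$ be a sequence of rv's defined by $P(X_n = 0) = 1 - \frac{1}{n^r}$ and $P(X_n = n) = \frac{1}{n^r}$ for some
$r > 0$.
For any $\epsilon > 0$, $P(|X_n| > \epsilon) \to 0$ as $n \to \infty$; so $X_n \stackrel{p}{\longrightarrow} 0$.
For $0 < s < r$, $E(|X_n|^s) = \frac{1}{n^{r-s}} \to 0$ as $n \to \infty$; so $X_n \stackrel{s}{\longrightarrow} 0$. But $E(|X_n|^r) = 1 \not\to 0$ as $n \to \infty$;
so $X_n \stackrel{r}{\not\longrightarrow} 0$.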
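The three claims of Example 6.1.22 can be tabulated for a concrete exponent. The sketch below uses the closed forms from the example; the choice $r = 2$ and the function names are assumptions for illustration.

```python
def prob_exceeds(n, r, eps):
    """P(|X_n| > eps) when X_n takes the value n with probability n**(-r), else 0."""
    return float(n) ** (-r) if n > eps else 0.0

def abs_moment(n, r, s):
    """E(|X_n|^s) = n^s * n^(-r) = n^(s - r)."""
    return float(n) ** (s - r)

r = 2  # hypothetical choice of r
for n in (10, 100, 1000):
    print(n, prob_exceeds(n, r, 0.5), abs_moment(n, r, 1), abs_moment(n, r, r))
```

The second and third columns shrink to 0 (convergence in probability and in the 1st mean), while the last column is identically 1, so there is no convergence in the rth mean.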
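Theorem 6.1.23:
If $X_n \stackrel{r}{\longrightarrow} X$, then it holds:
(i) $\lim_{n \to \infty} E(|X_n|^r) = E(|X|^r)$; and
(ii) $X_n \stackrel{s}{\longrightarrow} X$ for $0 < s < r$.
Proof:
(i) For $0 < r \leq 1$, it holds:
\begin{align*}
& E(|X_n|^r) = E(|X_n - X + X|^r) \leq E(|X_n - X|^r + |X|^r) \\
\Longrightarrow\ & E(|X_n|^r) - E(|X|^r) \leq E(|X_n - X|^r) \\
\Longrightarrow\ & \lim_{n \to \infty} E(|X_n|^r) - \lim_{n \to \infty} E(|X|^r) \leq \lim_{n \to \infty} E(|X_n - X|^r) = 0 \\
\Longrightarrow\ & \lim_{n \to \infty} E(|X_n|^r) \leq E(|X|^r) \qquad (A)
\end{align*}
Similarly,
\begin{align*}
& E(|X|^r) = E(|X - X_n + X_n|^r) \leq E(|X_n - X|^r + |X_n|^r) \\
\Longrightarrow\ & E(|X|^r) - E(|X_n|^r) \leq E(|X_n - X|^r) \\
\Longrightarrow\ & \lim_{n \to \infty} E(|X|^r) - \lim_{n \to \infty} E(|X_n|^r) \leq \lim_{n \to \infty} E(|X_n - X|^r) = 0 \\
\Longrightarrow\ & E(|X|^r) \leq \lim_{n \to \infty} E(|X_n|^r) \qquad (B)
\end{align*}
Combining (A) and (B) gives
\[
\lim_{n \to \infty} E(|X_n|^r) = E(|X|^r)
\]
For $r > 1$, it follows from Minkowski's Inequality (Theorem 4.8.3):
\begin{align*}
& [E(|X - X_n + X_n|^r)]^{\frac{1}{r}} \leq [E(|X - X_n|^r)]^{\frac{1}{r}} + [E(|X_n|^r)]^{\frac{1}{r}} \\
\Longrightarrow\ & [E(|X|^r)]^{\frac{1}{r}} - [E(|X_n|^r)]^{\frac{1}{r}} \leq [E(|X - X_n|^r)]^{\frac{1}{r}} \\
\Longrightarrow\ & [E(|X|^r)]^{\frac{1}{r}} - \lim_{n \to \infty} [E(|X_n|^r)]^{\frac{1}{r}} \leq \lim_{n \to \infty} [E(|X_n - X|^r)]^{\frac{1}{r}} = 0 \quad \text{since } X_n \stackrel{r}{\longrightarrow} X \\
\Longrightarrow\ & [E(|X|^r)]^{\frac{1}{r}} \leq \lim_{n \to \infty} [E(|X_n|^r)]^{\frac{1}{r}} \qquad (C)
\end{align*}
Similarly,
\begin{align*}
& [E(|X_n - X + X|^r)]^{\frac{1}{r}} \leq [E(|X_n - X|^r)]^{\frac{1}{r}} + [E(|X|^r)]^{\frac{1}{r}} \\
\Longrightarrow\ & \lim_{n \to \infty} [E(|X_n|^r)]^{\frac{1}{r}} - \lim_{n \to \infty} [E(|X|^r)]^{\frac{1}{r}} \leq \lim_{n \to \infty} [E(|X_n - X|^r)]^{\frac{1}{r}} = 0 \quad \text{since } X_n \stackrel{r}{\longrightarrow} X \\
\Longrightarrow\ & \lim_{n \to \infty} [E(|X_n|^r)]^{\frac{1}{r}} \leq [E(|X|^r)]^{\frac{1}{r}} \qquad (D)
\end{align*}
Combining (C) and (D) gives
\begin{align*}
& \lim_{n \to \infty} [E(|X_n|^r)]^{\frac{1}{r}} = [E(|X|^r)]^{\frac{1}{r}} \\
\Longrightarrow\ & \lim_{n \to \infty} E(|X_n|^r) = E(|X|^r)
\end{align*}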
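Definition 6.1.24:
Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of rv's on $(\Omega, L, P)$. We say that $X_n$ converges almost surely to
a rv $X$ ($X_n \stackrel{a.s.}{\longrightarrow} X$) or $X_n$ converges with probability 1 to $X$ ($X_n \stackrel{w.p.1}{\longrightarrow} X$) or $X_n$ converges
strongly to $X$ iff
\[
P(\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}) = 1.
\]
Note:
An interesting characterization of convergence with probability 1 and convergence in probability
can be found in Parzen (1960) "Modern Probability Theory and Its Applications" on page 416 (see
Handout).
Example 6.1.25:
Let $\Omega = [0, 1]$ and $P$ a uniform distribution on $\Omega$. Let $X_n(\omega) = \omega + \omega^n$ and $X(\omega) = \omega$.
For $\omega \in [0, 1)$, $\omega^n \to 0$ as $n \to \infty$. So $X_n(\omega) \to X(\omega)\ \forall \omega \in [0, 1)$.
However, for $\omega = 1$, $X_n(1) = 2 \neq 1 = X(1)\ \forall n$, i.e., convergence fails at $\omega = 1$.
Nevertheless, since $P(\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}) = P(\{\omega \in [0, 1)\}) = 1$, it is $X_n \stackrel{a.s.}{\longrightarrow} X$.
Theorem 6.1.26:
$X_n \stackrel{a.s.}{\longrightarrow} X \Longrightarrow X_n \stackrel{p}{\longrightarrow} X$.
Proof:
Choose $\epsilon > 0$ and $\delta > 0$. Find $n_0 = n_0(\epsilon, \delta)$ such that
\[
P\left(\bigcap_{n=n_0}^{\infty} \{|X_n - X| \leq \epsilon\}\right) \geq 1 - \delta.
\]
Since $\bigcap_{n=n_0}^{\infty} \{|X_n - X| \leq \epsilon\} \subseteq \{|X_n - X| \leq \epsilon\}\ \forall n \geq n_0$, it is
\[
P(\{|X_n - X| \leq \epsilon\}) \geq P\left(\bigcap_{n=n_0}^{\infty} \{|X_n - X| \leq \epsilon\}\right) \geq 1 - \delta \quad \forall n \geq n_0.
\]
Therefore, $P(\{|X_n - X| \leq \epsilon\}) \to 1$ as $n \to \infty$. Thus, $X_n \stackrel{p}{\longrightarrow} X$.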
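Example 6.1.27:
\[
X_n \stackrel{p}{\longrightarrow} X \not\Longrightarrow X_n \stackrel{a.s.}{\longrightarrow} X :
\]
Let $\Omega = (0, 1]$ and $P$ a uniform distribution on $\Omega$.
Define $A_n$ by
$A_1 = (0, \frac{1}{2}],\ A_2 = (\frac{1}{2}, 1]$
$A_3 = (0, \frac{1}{4}],\ A_4 = (\frac{1}{4}, \frac{1}{2}],\ A_5 = (\frac{1}{2}, \frac{3}{4}],\ A_6 = (\frac{3}{4}, 1]$
$A_7 = (0, \frac{1}{8}],\ A_8 = (\frac{1}{8}, \frac{1}{4}],\ \ldots$
Let $X_n(\omega) = I_{A_n}(\omega)$.
It is $P(|X_n - 0| \geq \epsilon) \to 0\ \forall \epsilon > 0$ since $X_n$ is 0 except on $A_n$ and $P(A_n) \downarrow 0$. Thus $X_n \stackrel{p}{\longrightarrow} 0$.
But $P(\{\omega : X_n(\omega) \to 0\}) = 0$ (and not 1) because any $\omega$ keeps being in some $A_n$ beyond any $n_0$,
i.e., $X_n(\omega)$ looks like $0 \ldots 010 \ldots 010 \ldots 010 \ldots$, so $X_n \stackrel{a.s.}{\not\longrightarrow} 0$.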
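The intervals $A_n$ of Example 6.1.27 follow a simple indexing rule, which the sketch below makes explicit. The indexing formula (block $j$ contains $2^j$ intervals of width $2^{-j}$ and starts at $n = 2^j - 1$) is inferred from the listing $A_1, \ldots, A_8$ and is an assumption of this illustration, as are the function names.

```python
from fractions import Fraction

def interval(n):
    """Return (left, right) with A_n = (left, right], for n >= 1."""
    j = (n + 1).bit_length() - 1   # block j holds 2**j intervals of width 2**(-j)
    i = n - (2 ** j - 1)           # position of A_n inside block j
    width = Fraction(1, 2 ** j)
    return i * width, (i + 1) * width

def X(n, omega):
    """Indicator of A_n evaluated at omega."""
    left, right = interval(n)
    return 1 if left < omega <= right else 0

omega = Fraction(1, 3)
hits = [n for n in range(1, 15) if X(n, omega) == 1]
print(hits)                              # indices n <= 14 with X_n(omega) = 1
print([interval(n) for n in (1, 6, 8)])
```

Every block of intervals covers $(0, 1]$ once, so each $\omega$ produces exactly one 1 per block, while $P(A_n) = 2^{-j}$ shrinks to 0.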
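Example 6.1.28:
\[
X_n \stackrel{r}{\longrightarrow} X \not\Longrightarrow X_n \stackrel{a.s.}{\longrightarrow} X :
\]
Let $X_n$ be independent rv's such that $P(X_n = 0) = 1 - \frac{1}{n}$ and $P(X_n = 1) = \frac{1}{n}$.
It is $E(|X_n - 0|^r) = E(|X_n|^r) = E(|X_n|) = \frac{1}{n} \to 0$ as $n \to \infty$, so $X_n \stackrel{r}{\longrightarrow} 0\ \forall r > 0$.
But
\[
P(X_n = 0\ \forall m \leq n \leq n_0) = \prod_{n=m}^{n_0}\left(1 - \frac{1}{n}\right) = \left(\frac{m-1}{m}\right)\left(\frac{m}{m+1}\right)\left(\frac{m+1}{m+2}\right) \cdots \left(\frac{n_0 - 2}{n_0 - 1}\right)\left(\frac{n_0 - 1}{n_0}\right) = \frac{m-1}{n_0}
\]
As $n_0 \to \infty$, it is $P(X_n = 0\ \forall m \leq n \leq n_0) \to 0\ \forall m$, so $X_n \stackrel{a.s.}{\not\longrightarrow} 0$.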
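The telescoping product in Example 6.1.28 can be verified exactly with rational arithmetic. This is a small sketch; the function name and the values of $m$ and $n_0$ are assumptions for illustration.

```python
from fractions import Fraction

def prob_all_zero(m, n0):
    """P(X_n = 0 for all m <= n <= n0) for independent X_n with P(X_n = 0) = 1 - 1/n."""
    p = Fraction(1)
    for n in range(m, n0 + 1):
        p *= 1 - Fraction(1, n)
    return p

for n0 in (10, 100, 1000):
    print(n0, prob_all_zero(5, n0), Fraction(5 - 1, n0))
```

Both columns agree, and for fixed $m$ the probability tends to 0 as $n_0$ grows.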
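Example 6.1.29:
\[
X_n \stackrel{a.s.}{\longrightarrow} X \not\Longrightarrow X_n \stackrel{r}{\longrightarrow} X :
\]
Let $\Omega = [0, 1]$ and $P$ a uniform distribution on $\Omega$.
Let $A_n = [0, \frac{1}{\ln n}]$.
Let $X_n(\omega) = n I_{A_n}(\omega)$ and $X(\omega) = 0$.
It holds that $\forall \omega > 0\ \exists n_0 : \frac{1}{\ln n_0} < \omega \Longrightarrow X_n(\omega) = 0\ \forall n > n_0$ and $P(\omega = 0) = 0$. Thus, $X_n \stackrel{a.s.}{\longrightarrow} 0$.
But $E(|X_n - 0|^r) = \frac{n^r}{\ln n} \to \infty\ \forall r > 0$, so $X_n \stackrel{r}{\not\longrightarrow} X$.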
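Note:
For iid rv's with finite variance, $\overline{X}_n$ is consistent for $\mu$.
A more general way to derive a "WLLN" follows in the next Definition.
Definition 6.2.2:
Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of rv's. Let $T_n = \sum_{i=1}^{n} X_i$. We say that $\{X_i\}$ obeys the WLLN with
respect to a sequence of norming constants $\{B_i\}_{i=1}^{\infty}$, $B_i > 0$, $B_i \uparrow \infty$, if there exists a sequence
of centering constants $\{A_i\}_{i=1}^{\infty}$ such that
\[
B_n^{-1}(T_n - A_n) \stackrel{p}{\longrightarrow} 0.
\]
Theorem 6.2.3:
Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of pairwise uncorrelated rv's with $E(X_i) = \mu_i$ and $Var(X_i) = \sigma_i^2$, $i \in \mathbb{N}$.
If $\sum_{i=1}^{n} \sigma_i^2 \to \infty$ as $n \to \infty$, we can choose $A_n = \sum_{i=1}^{n} \mu_i$ and $B_n = \sum_{i=1}^{n} \sigma_i^2$ and get
\[
\frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sum_{i=1}^{n} \sigma_i^2} \stackrel{p}{\longrightarrow} 0.
\]
Proof:
By Markov's Inequality, it holds for all $\epsilon > 0$:
\[
P\left(\left|\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i\right| > \epsilon \sum_{i=1}^{n} \sigma_i^2\right) \leq \frac{E\left(\left(\sum_{i=1}^{n} (X_i - \mu_i)\right)^2\right)}{\epsilon^2 \left(\sum_{i=1}^{n} \sigma_i^2\right)^2} = \frac{1}{\epsilon^2 \sum_{i=1}^{n} \sigma_i^2} \longrightarrow 0 \text{ as } n \to \infty
\]
Note:
To obtain Theorem 6.2.1, we choose $A_n = n\mu$ and $B_n = n\sigma^2$.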
Theorem 6.2.4:
Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of rv's. Let $\overline{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. A necessary and sufficient condition for
$\{X_i\}$ to obey the WLLN with respect to $B_n = n$ is that
\[
E\left(\frac{\overline{X}_n^2}{1 + \overline{X}_n^2}\right) \to 0
\]
as $n \to \infty$.
Proof:
Rohatgi, page 258, Theorem 2.
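Example 6.2.5:
Let $(X_1, \ldots, X_n)$ be jointly Normal with $E(X_i) = 0$, $E(X_i^2) = 1$ for all $i$, and $Cov(X_i, X_j) = \rho$ if
$|i - j| = 1$ and $Cov(X_i, X_j) = 0$ if $|i - j| > 1$. Then, $T_n \sim N(0, n + 2(n-1)\rho) = N(0, \sigma^2)$. It is
\begin{align*}
E\left(\frac{\overline{X}_n^2}{1 + \overline{X}_n^2}\right) &= E\left(\frac{T_n^2}{n^2 + T_n^2}\right) \\
&= \frac{2}{\sqrt{2\pi}\,\sigma}\int_0^{\infty} \frac{x^2}{n^2 + x^2}\, e^{-\frac{x^2}{2\sigma^2}}\, dx \qquad \Big|\ y = \frac{x}{\sigma},\ dy = \frac{dx}{\sigma} \\
&= \frac{2}{\sqrt{2\pi}}\int_0^{\infty} \frac{\sigma^2 y^2}{n^2 + \sigma^2 y^2}\, e^{-\frac{y^2}{2}}\, dy \\
&= \frac{2}{\sqrt{2\pi}}\int_0^{\infty} \frac{(n + 2(n-1)\rho)\, y^2}{n^2 + (n + 2(n-1)\rho)\, y^2}\, e^{-\frac{y^2}{2}}\, dy \\
&\leq \frac{n + 2(n-1)\rho}{n^2}\underbrace{\int_0^{\infty} \frac{2}{\sqrt{2\pi}}\, y^2 e^{-\frac{y^2}{2}}\, dy}_{= 1, \text{ since Var of } N(0,1) \text{ distribution}} \\
&\to 0 \text{ as } n \to \infty
\end{align*}
$\Longrightarrow \overline{X}_n \stackrel{p}{\longrightarrow} 0$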
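The bound used in Example 6.2.5 can be compared with a direct numerical evaluation of the integral. The sketch below uses a plain midpoint rule on a truncated range; the truncation point, step count, the value $\rho = 0.3$, and the function names are all assumptions made for this illustration.

```python
import math

def expectation(n, rho, steps=20000, upper=12.0):
    """Midpoint-rule approximation of E(T_n^2 / (n^2 + T_n^2)) for
    T_n ~ N(0, sigma2), written as an integral over y = x / sigma on (0, upper)."""
    sigma2 = n + 2 * (n - 1) * rho
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += sigma2 * y * y / (n * n + sigma2 * y * y) * math.exp(-0.5 * y * y)
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h

def bound(n, rho):
    """Upper bound (n + 2(n-1)rho) / n^2 from the example."""
    return (n + 2 * (n - 1) * rho) / n ** 2

for n in (5, 50, 500):
    print(n, expectation(n, 0.3), bound(n, 0.3))
```

Both columns shrink roughly like $1/n$, with the numerical value staying below the bound.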
Note:
We would like to have a WLLN that just depends on means but does not depend on the existence
of finite variances. To approach this, we consider the following:
Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of rv's. Let $T_n = \sum_{i=1}^{n} X_i$. We truncate each $X_i$ at $c > 0$ and get
\[
X_i^c = \begin{cases} X_i, & |X_i| \leq c \\ 0, & \text{otherwise} \end{cases}
\]
Let $T_n^c = \sum_{i=1}^{n} X_i^c$ and $m_n = \sum_{i=1}^{n} E(X_i^c)$.
Lemma 6.2.6:
For $T_n$, $T_n^c$ and $m_n$ as defined in the Note above, it holds:
\[
P(|T_n - m_n| > \epsilon) \leq P(|T_n^c - m_n| > \epsilon) + \sum_{i=1}^{n} P(|X_i| > c) \quad \forall \epsilon > 0
\]
Proof:
It holds for all $\epsilon > 0$:
\begin{align*}
P(|T_n - m_n| > \epsilon) &= P(|T_n - m_n| > \epsilon \text{ and } |X_i| \leq c\ \forall i \in \{1, \ldots, n\}) \\
&\quad + P(|T_n - m_n| > \epsilon \text{ and } |X_i| > c \text{ for at least one } i \in \{1, \ldots, n\}) \\
&\stackrel{(*)}{\leq} P(|T_n^c - m_n| > \epsilon) + P(|X_i| > c \text{ for at least one } i \in \{1, \ldots, n\}) \\
&\leq P(|T_n^c - m_n| > \epsilon) + \sum_{i=1}^{n} P(|X_i| > c)
\end{align*}
$(*)$ holds since $T_n^c = T_n$ when $|X_i| \leq c\ \forall i \in \{1, \ldots, n\}$.
Note:
If the $X_i$'s are identically distributed, then
\[
P(|T_n - m_n| > \epsilon) \leq P(|T_n^c - m_n| > \epsilon) + nP(|X_1| > c) \quad \forall \epsilon > 0.
\]
If the $X_i$'s are iid, then
\[
P(|T_n - m_n| > \epsilon) \leq \frac{nE((X_1^c)^2)}{\epsilon^2} + nP(|X_1| > c) \quad \forall \epsilon > 0 \qquad (*).
\]
Note that $P(|X_i| > c) = P(|X_1| > c)\ \forall i \in \mathbb{N}$ if the $X_i$'s are identically distributed and that
$E((X_i^c)^2) = E((X_1^c)^2)\ \forall i \in \mathbb{N}$ if the $X_i$'s are iid.
Theorem 6.2.7: Khintchine's WLLN
Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of iid rv's with finite mean $E(X_i) = \mu$. Then it holds:
\[
\overline{X}_n = \frac{1}{n}T_n \stackrel{p}{\longrightarrow} \mu
\]
Proof:
If we take $c = n$ and replace $\epsilon$ by $n\epsilon$ in $(*)$ in the Note above, we get
\[
P(|T_n - m_n| > n\epsilon) \leq \frac{E((X_1^n)^2)}{n\epsilon^2} + nP(|X_1| > n).
\]
Note:
Theorem 6.2.7 meets the previously stated goal of not having a finite variance requirement.
Note:
Unless otherwise specified, we will only use the case that $B_n = n$ in this section.
Theorem 6.3.2:
$X_n \stackrel{a.s.}{\longrightarrow} X \iff \lim_{n \to \infty} P(\sup_{m \geq n} |X_m - X| > \epsilon) = 0\ \forall \epsilon > 0$.
Proof: (see also Rohatgi, page 249, Theorem 11)
WLOG, we can assume that $X = 0$ since $X_n \stackrel{a.s.}{\longrightarrow} X$ implies $X_n - X \stackrel{a.s.}{\longrightarrow} 0$. Thus, we have to prove:
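We also have
\begin{align*}
P(B_n(\epsilon)) &= P(A_n \cap C) \\
&= 1 - P(C^c \cup A_n^c) \\
&= 1 - \underbrace{P(C^c)}_{=0} - P(A_n^c) + \underbrace{P(C^c \cap A_n^c)}_{=0} \\
&= P(A_n)
\end{align*}
$\Longrightarrow \lim_{n \to \infty} P(A_n(\epsilon)) = 0$
"$\Longleftarrow$":
Assume that $\lim_{n \to \infty} P(A_n(\epsilon)) = 0\ \forall \epsilon > 0$ and define $D(\epsilon) = \{\limsup_{n \to \infty} |X_n| > \epsilon\}$. Since $D(\epsilon) \subseteq A_n(\epsilon)\ \forall n \in \mathbb{N}$,
it follows that $P(D(\epsilon)) = 0\ \forall \epsilon > 0$. Also,
\[
C^c = \{\lim_{n \to \infty} X_n \neq 0\} \subseteq \bigcup_{k=1}^{\infty} \{\limsup_{n \to \infty} |X_n| > \tfrac{1}{k}\}.
\]
$\Longrightarrow 1 - P(C) \leq \sum_{k=1}^{\infty} P(D(\frac{1}{k})) = 0$
$\Longrightarrow X_n \stackrel{a.s.}{\longrightarrow} 0$
Note:
(i) $X_n \stackrel{a.s.}{\longrightarrow} 0$ implies that $\forall \epsilon > 0\ \forall \delta > 0\ \exists n_0 \in \mathbb{N} : P(\sup_{n \geq n_0} |X_n| > \epsilon) < \delta$.
(ii) Recall that for a given sequence of events $\{A_n\}_{n=1}^{\infty}$,
\[
A = \limsup_{n \to \infty} A_n = \lim_{n \to \infty} \bigcup_{k=n}^{\infty} A_k = \bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k
\]
is the event that infinitely many of the $A_n$ occur. We write $P(A) = P(A_n\ i.o.)$ where i.o.
stands for "infinitely often".
(iii) Using the terminology defined in (ii) above, we can rewrite Theorem 6.3.2 as
\[
X_n \stackrel{a.s.}{\longrightarrow} 0 \iff P(|X_n| > \epsilon\ i.o.) = 0 \quad \forall \epsilon > 0.
\]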
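Theorem 6.3.3: Borel-Cantelli Lemma
(i) 1st BC-Lemma:
Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of events such that $\sum_{n=1}^{\infty} P(A_n) < \infty$. Then $P(A) = 0$.
(ii) 2nd BC-Lemma:
Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of independent events such that $\sum_{n=1}^{\infty} P(A_n) = \infty$. Then $P(A) = 1$.
Proof:
(i):
\begin{align*}
P(A) &= P\left(\lim_{n \to \infty} \bigcup_{k=n}^{\infty} A_k\right) \\
&= \lim_{n \to \infty} P\left(\bigcup_{k=n}^{\infty} A_k\right) \\
&\leq \lim_{n \to \infty} \sum_{k=n}^{\infty} P(A_k) \\
&= \lim_{n \to \infty} \left(\sum_{k=1}^{\infty} P(A_k) - \sum_{k=1}^{n-1} P(A_k)\right) \\
&= 0
\end{align*}
(ii): We have $A^c = \bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_k^c$. Therefore,
\[
P(A^c) = P\left(\lim_{n \to \infty} \bigcap_{k=n}^{\infty} A_k^c\right) = \lim_{n \to \infty} P\left(\bigcap_{k=n}^{\infty} A_k^c\right).
\]
If we choose $n_0 > n$, it holds that
\[
\bigcap_{k=n}^{\infty} A_k^c \subseteq \bigcap_{k=n}^{n_0} A_k^c.
\]
Therefore, using independence for the product and $1 - x \leq e^{-x}$ for the last inequality,
\begin{align*}
P\left(\bigcap_{k=n}^{\infty} A_k^c\right) &\leq \lim_{n_0 \to \infty} P\left(\bigcap_{k=n}^{n_0} A_k^c\right) \\
&\stackrel{indep.}{=} \lim_{n_0 \to \infty} \prod_{k=n}^{n_0} (1 - P(A_k)) \\
&\leq \lim_{n_0 \to \infty} \exp\left(-\sum_{k=n}^{n_0} P(A_k)\right) \\
&= 0
\end{align*}
$\Longrightarrow P(A) = 1$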
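The key estimate in the proof of the 2nd BC-Lemma, $\prod (1 - P(A_k)) \leq \exp(-\sum P(A_k))$, can be illustrated for independent events with $P(A_k) = \frac{1}{k}$. This is only a sketch; the function name and the choice of probabilities are assumptions.

```python
import math

def product_and_bound(probs):
    """Return (prod(1 - p), exp(-sum(p))) for a list of event probabilities."""
    prod, total = 1.0, 0.0
    for p in probs:
        prod *= 1.0 - p
        total += p
    return prod, math.exp(-total)

for n0 in (10, 100, 1000):
    probs = [1.0 / k for k in range(2, n0 + 1)]   # P(A_k) = 1/k, k = 2, ..., n0
    print(n0, *product_and_bound(probs))
```

Since $\sum_k \frac{1}{k}$ diverges, both the product and its exponential bound tend to 0 as $n_0$ grows.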
Example 6.3.4:
Independence is necessary for 2nd BC-Lemma:
Let $\Omega = (0, 1)$ and $P$ a uniform distribution on $\Omega$.
Let $A_n = I_{(0, \frac{1}{n})}(\omega)$. Therefore,
\[
\sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} \frac{1}{n} = \infty.
\]
But for any $\omega \in \Omega$, $A_n$ occurs only for $1, 2, \ldots, \lfloor \frac{1}{\omega} \rfloor$, where $\lfloor \frac{1}{\omega} \rfloor$ denotes the largest integer ("floor")
that is $\leq \frac{1}{\omega}$. Therefore, $P(A) = P(A_n\ i.o.) = 0$.
Proof:
See Rohatgi, page 269, Lemma 3.
Theorem 6.3.8:
If $\sum_{n=1}^{\infty} Var(X_n) < \infty$, then $\sum_{n=1}^{\infty} (X_n - E(X_n))$ converges almost surely.
Proof:
See Rohatgi, page 272, Theorem 6.
Corollary 6.3.9:
Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of independent rv's. Let $\{B_i\}_{i=1}^{\infty}$, $B_i > 0$, $B_i \uparrow \infty$, be a sequence of norming
constants. Let $T_n = \sum_{i=1}^{n} X_i$. If $\sum_{i=1}^{\infty} \frac{Var(X_i)}{B_i^2} < \infty$ then it holds:
\[
\frac{T_n - E(T_n)}{B_n} \stackrel{a.s.}{\longrightarrow} 0
\]
Proof:
This Corollary follows directly from Theorem 6.3.8 and Lemma 6.3.6.
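Lemma 6.3.11:
Let $X$ be a rv with $E(|X|) < \infty$. Then it holds:
\[
\sum_{n=1}^{\infty} P(|X| \geq n) \leq E(|X|) \leq 1 + \sum_{n=1}^{\infty} P(|X| \geq n)
\]
Proof:
Continuous case only:
Let $X$ have a pdf $f$. Then it holds:
\[
E(|X|) = \int_{-\infty}^{\infty} |x| f(x)\, dx = \sum_{k=0}^{\infty} \int_{k \leq |x| < k+1} |x| f(x)\, dx
\]
\[
\Longrightarrow \sum_{k=0}^{\infty} k P(k \leq |X| < k+1) \leq E(|X|) \leq \sum_{k=0}^{\infty} (k+1) P(k \leq |X| < k+1)
\]
It is
\begin{align*}
\sum_{k=0}^{\infty} k P(k \leq |X| < k+1) &= \sum_{k=0}^{\infty}\sum_{n=1}^{k} P(k \leq |X| < k+1) \\
&= \sum_{n=1}^{\infty}\sum_{k=n}^{\infty} P(k \leq |X| < k+1) \\
&= \sum_{n=1}^{\infty} P(|X| \geq n)
\end{align*}
Similarly,
\begin{align*}
\sum_{k=0}^{\infty} (k+1) P(k \leq |X| < k+1) &= \sum_{n=1}^{\infty} P(|X| \geq n) + \sum_{k=0}^{\infty} P(k \leq |X| < k+1) \\
&= \sum_{n=1}^{\infty} P(|X| \geq n) + 1
\end{align*}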
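Proof:
"$\Longrightarrow$":
Suppose that $\overline{X}_n \stackrel{a.s.}{\longrightarrow} \mu < \infty$. It is
\[
T_n = \sum_{i=1}^{n} X_i = \sum_{i=1}^{n-1} X_i + X_n = T_{n-1} + X_n.
\]
\[
\Longrightarrow \frac{X_n}{n} = \underbrace{\frac{T_n}{n}}_{\stackrel{a.s.}{\longrightarrow} \mu} - \underbrace{\frac{n-1}{n}}_{\to 1} \cdot \underbrace{\frac{T_{n-1}}{n-1}}_{\stackrel{a.s.}{\longrightarrow} \mu} \stackrel{a.s.}{\longrightarrow} 0
\]
By the Borel-Cantelli Lemma (Theorem 6.3.3 (ii), applied in contrapositive form to the independent events $\{|X_n| \geq n\}$),
we must have $\sum_{n=1}^{\infty} P(|X_n| \geq n) < \infty$.
$\stackrel{Lemma\ 6.3.11}{\Longrightarrow} E(|X|) < \infty$
$\stackrel{Th.\ 6.2.7\ (WLLN)}{\Longrightarrow} \overline{X}_n \stackrel{p}{\longrightarrow} E(X)$
Since $\overline{X}_n \stackrel{a.s.}{\longrightarrow} \mu$, it holds that $\overline{X}_n \stackrel{p}{\longrightarrow} \mu$. Therefore, it must hold that $\mu = E(X)$.
"$\Longleftarrow$":
Let $E(|X|) < \infty$. Define truncated rv's:
\[
X_k' = \begin{cases} X_k, & \text{if } |X_k| \leq k \\ 0, & \text{otherwise} \end{cases}
\qquad
T_n' = \sum_{k=1}^{n} X_k'
\qquad
\overline{X}_n' = \frac{T_n'}{n}
\]
Then it holds:
\begin{align*}
\sum_{k=1}^{\infty} P(X_k \neq X_k') &\leq \sum_{k=1}^{\infty} P(|X_k| \geq k) \\
&\stackrel{iid}{=} \sum_{k=1}^{\infty} P(|X| \geq k) \\
&\stackrel{Lemma\ 6.3.11}{\leq} E(|X|) \\
&< \infty
\end{align*}
By Lemma 6.3.10, it follows that $T_n$ and $T_n'$ are convergence-equivalent. Thus, it is sufficient to
prove that $\overline{X}_n' \stackrel{a.s.}{\longrightarrow} E(X)$.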
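$(*)$ holds since $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} \approx 1.65 < 2$ and the first two sums can be rearranged as follows:

   n | k                      k | n
   --+----------              --+--------------
   1 | 1                      1 | 1, 2, 3, ...
   2 | 1, 2          ==>      2 | 2, 3, ...
   3 | 1, 2, 3                3 | 3, ...
   : | :                      : | :

It is
\[
\sum_{n=k}^{\infty} \frac{1}{n^2} = \frac{1}{k^2} + \frac{1}{(k+1)^2} + \frac{1}{(k+2)^2} + \ldots
\]
and, since $\sum_{n=2}^{\infty} \frac{1}{n(n-1)} = 1$,
\begin{align*}
\sum_{n=k+1}^{\infty} \frac{1}{n(n-1)} &= 1 - \frac{1}{1 \cdot 2} - \frac{1}{2 \cdot 3} - \frac{1}{3 \cdot 4} - \ldots - \frac{1}{(k-1) \cdot k} \\
&= \frac{1}{2} - \frac{1}{2 \cdot 3} - \frac{1}{3 \cdot 4} - \ldots - \frac{1}{(k-1) \cdot k} \\
&= \frac{1}{3} - \frac{1}{3 \cdot 4} - \ldots - \frac{1}{(k-1) \cdot k} \\
&= \frac{1}{4} - \ldots - \frac{1}{(k-1) \cdot k} \\
&= \ldots \\
&= \frac{1}{k}
\end{align*}
\begin{align*}
\Longrightarrow \sum_{n=k}^{\infty} \frac{1}{n^2} &\leq \frac{1}{k^2} + \sum_{n=k+1}^{\infty} \frac{1}{n(n-1)} \\
&= \frac{1}{k^2} + \frac{1}{k} \\
&\leq \frac{2}{k}
\end{align*}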
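The tail bound $\sum_{n \geq k} \frac{1}{n^2} \leq \frac{1}{k^2} + \frac{1}{k} \leq \frac{2}{k}$ and the telescoping sum behind it can be checked on partial sums with exact rational arithmetic. The cutoff $N$ and the function names are assumptions introduced for this sketch.

```python
from fractions import Fraction

def partial_tail_squares(k, N):
    """Partial sum of 1/n^2 for n = k, ..., N (a lower bound on the full tail)."""
    return sum(Fraction(1, n * n) for n in range(k, N + 1))

def partial_telescoping(k, N):
    """Partial sum of 1/(n(n-1)) for n = k+1, ..., N; equals 1/k - 1/N exactly."""
    return sum(Fraction(1, n * (n - 1)) for n in range(k + 1, N + 1))

k, N = 5, 1000
print(float(partial_tail_squares(k, N)), float(Fraction(1, k * k) + Fraction(1, k)), 2 / k)
print(partial_telescoping(k, N), Fraction(1, k) - Fraction(1, N))
```

Letting $N \to \infty$ in the second line recovers $\sum_{n=k+1}^{\infty} \frac{1}{n(n-1)} = \frac{1}{k}$.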
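Using this result in (A), we get
\begin{align*}
\sum_{n=1}^{\infty} \frac{1}{n^2} Var(X_n') &\leq 2\sum_{k=1}^{\infty} \frac{(k+1)^2}{k} P(k \leq |X| < k+1) + 2P(0 \leq |X| < 1) \\
&= 2\sum_{k=0}^{\infty} k P(k \leq |X| < k+1) + 4\sum_{k=1}^{\infty} P(k \leq |X| < k+1) \\
&\quad + 2\sum_{k=1}^{\infty} \frac{1}{k} P(k \leq |X| < k+1) + 2P(0 \leq |X| < 1) \\
&\stackrel{(B)}{\leq} 2E(|X|) + 4 + 2 + 2 \\
&< \infty
\end{align*}
To establish (B), we use an inequality from the Proof of Lemma 6.3.11.
Thus, the conditions needed in Corollary 6.3.9 are met. It follows that
\[
\frac{1}{n}T_n' - \frac{1}{n}E(T_n') \stackrel{a.s.}{\longrightarrow} 0 \qquad (C)
\]
\[
\Longrightarrow \frac{1}{n}T_n' \stackrel{a.s.}{\longrightarrow} E(X) \stackrel{Lemma\ 6.3.10}{\Longrightarrow} \frac{1}{n}T_n \stackrel{a.s.}{\longrightarrow} E(X)
\]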