one obtains after a small amount of computation that, for every pair of real
numbers $(x, y)$,
$$F_{X,Y}(x, y) = \sum\bigl\{P([X = k] \cap [Y = l]) : 0 \le k \le x,\ 0 \le l \le y,\ k + l = 1 \text{ or } 2\bigr\}.$$
The values of this distribution function and the regions over which it has
those values are as follows:
Value   Region over which $F_{X,Y}(x, y)$ has the value
0       $\min\{x, y\} < 0$ or $\max\{x, y\} < 1$
2/15    $0 \le x < 1$ and $1 \le y < 2$
3/15    $0 \le x < 1$ and $y \ge 2$
3/15    $1 \le x < 2$ and $0 \le y < 1$
11/15   $1 \le x < 2$ and $1 \le y < 2$
12/15   $1 \le x < 2$ and $y \ge 2$
6/15    $x \ge 2$ and $0 \le y < 1$
14/15   $x \ge 2$ and $1 \le y < 2$
1       $x \ge 2$ and $y \ge 2$.
In Figure 1 below there is a graph of this function, with the values of the
function recorded over the regions on which the function takes those values.
FIGURE 1
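Computational aside: the table can be checked with a few lines of Python. The sketch
below is only an illustration (the names `joint_pmf` and `cdf` are ours); the joint density
values it stores are those implied by the tabulated values of $F_{X,Y}$ above.

```python
from fractions import Fraction as F

# Joint density values of (X, Y) implied by the table above.
joint_pmf = {(1, 0): F(3, 15), (2, 0): F(3, 15), (0, 1): F(2, 15),
             (1, 1): F(6, 15), (0, 2): F(1, 15)}

def cdf(x, y, pmf=joint_pmf):
    """F_{X,Y}(x, y) = sum of P([X = k] and [Y = l]) over k <= x and l <= y."""
    return sum(p for (k, l), p in pmf.items() if k <= x and l <= y)

# Reproduces the tabulated values, e.g. cdf(1, 1) == 11/15 and cdf(2, 0) == 6/15.
```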
If $x_1 \le x_2$ and $y_1 \le y_2$, then
$$[X \le x_1] \cap [Y \le y_1] \subset [X \le x_2] \cap [Y \le y_2].$$
Hint of proof: In order to prove part (i), one should use the fact that, for
every value of $y$,
$$[Y \le y] = \bigl([X \le 0] \cap [Y \le y]\bigr) \cup \bigcup_{k=1}^{\infty} \bigl([k-1 < X \le k] \cap [Y \le y]\bigr),$$
and then closely follow the steps in the proof of theorem 17 in section 2.1. In
order to prove part (ii), one should use the result from part (i) and the fact
that $F_{X,Y}(x, y) = F_{Y,X}(y, x)$.
Theorem 3. If $X$ and $Y$ are random variables, then
(i) $\lim_{x \to -\infty} F_{X,Y}(x, y) = 0$ for every real $y$, and
(ii) $\lim_{y \to -\infty} F_{X,Y}(x, y) = 0$ for every real $x$.
The proof of this theorem is very much like the proof of theorem 17 in
section 2.1.
At this time we need not present a formal development of the general
notion of multivariate distributions. After carefully going through the devel-
opment this far, the reader should be able to supply a definition of the joint
distribution function of three random variables X1 , X2 , X3 and should be
able to prove theorems similar to theorems 1, 2 and 3. We next introduce
the notion of independent random variables.
Definition. Let $X_1, X_2, \ldots$ denote a finite or infinite sequence of random
variables. These random variables are said to be independent if for all
possible selections of pairs of real numbers
$$(a_1, b_1), (a_2, b_2), \ldots,$$
where $a_i < b_i$ for all $i$, and where the values $\pm\infty$ are allowed, the events
$$[a_1 < X_1 \le b_1], [a_2 < X_2 \le b_2], \ldots$$
are independent.
Theorem 4. Let $X_1, X_2, \ldots, X_m$ denote $m$ random variables. A nec-
essary and sufficient condition that these random variables be independent is
that for every $m$-tuple of real numbers $(x_1, x_2, \ldots, x_m)$, the equality
$$F_{X_1,X_2,\ldots,X_m}(x_1, x_2, \ldots, x_m) = \prod_{j=1}^{m} F_{X_j}(x_j)$$
holds.
Proof: We first prove that the condition is necessary. Accordingly, we
assume that the random variables are independent and prove that the con-
dition, namely the above equation, holds for any point $(x_1, x_2, \ldots, x_m)$. In
this case, let $-\infty = a_1 = a_2 = \cdots = a_m$ and $x_i = b_i$ for $1 \le i \le m$. Then by
the definition of independence and joint distribution we have
$$\begin{aligned}
F_{X_1,X_2,\ldots,X_m}(x_1, x_2, \ldots, x_m) &= P\Bigl(\bigcap_{k=1}^{m} [-\infty < X_k \le x_k]\Bigr)\\
&= \prod_{k=1}^{m} P([-\infty < X_k \le x_k])\\
&= \prod_{k=1}^{m} P([X_k \le x_k]) = \prod_{k=1}^{m} F_{X_k}(x_k),
\end{aligned}$$
which establishes the condition. We now prove that the equation in the
theorem implies independence of the random variables. We first remark that
if $(i_1, i_2, \ldots, i_k)$ is any subset of the integers $1, 2, \ldots, m$, where $1 \le i_1 <
\cdots < i_k \le m$, then
$$F_{X_{i_1},\ldots,X_{i_k}}(x_{i_1}, \ldots, x_{i_k}) = \prod_{j=1}^{k} F_{X_{i_j}}(x_{i_j}).$$
Using this remark, it is sufficient to show that
$$P([a_1 < X_1 \le b_1] \cap [a_2 < X_2 \le b_2]) = \prod_{j=1}^{2} P([a_j < X_j \le b_j])$$
for all pairs of real numbers $(a_1, b_1)$ and $(a_2, b_2)$ with $a_j < b_j$. We first note
that the following equations hold:
$$\begin{aligned}
F_{X_1,X_2}(b_1, b_2) &= F_{X_1}(b_1)F_{X_2}(b_2)\\
F_{X_1,X_2}(a_1, b_2) &= F_{X_1}(a_1)F_{X_2}(b_2)\\
F_{X_1,X_2}(b_1, a_2) &= F_{X_1}(b_1)F_{X_2}(a_2)\\
F_{X_1,X_2}(a_1, a_2) &= F_{X_1}(a_1)F_{X_2}(a_2).
\end{aligned}$$
Also it is easy to verify that
$$\begin{aligned}
[X_1 \le b_1] \cap [X_2 \le b_2] ={}& ([a_1 < X_1 \le b_1] \cap [a_2 < X_2 \le b_2])\\
&\cup ([a_1 < X_1 \le b_1] \cap [X_2 \le a_2])\\
&\cup ([X_1 \le a_1] \cap [a_2 < X_2 \le b_2])\\
&\cup ([X_1 \le a_1] \cap [X_2 \le a_2]).
\end{aligned}$$
Since
$$[X_1 \le b_1] \cap [X_2 \le a_2] = ([X_1 \le a_1] \cap [X_2 \le a_2]) \cup ([a_1 < X_1 \le b_1] \cap [X_2 \le a_2]),$$
we obtain, after taking probabilities of both sides, that
$$P([a_1 < X_1 \le b_1] \cap [X_2 \le a_2]) = F_{X_1,X_2}(b_1, a_2) - F_{X_1,X_2}(a_1, a_2).$$
In a similar manner, we obtain
$$P([X_1 \le a_1] \cap [a_2 < X_2 \le b_2]) = F_{X_1,X_2}(a_1, b_2) - F_{X_1,X_2}(a_1, a_2).$$
Now, making use of all of the above, we finally obtain
$$\begin{aligned}
P([a_1 < X_1 \le b_1] \cap [a_2 < X_2 \le b_2])
&= F_{X_1}(b_1)F_{X_2}(b_2) - F_{X_1}(a_1)F_{X_2}(b_2) - F_{X_1}(b_1)F_{X_2}(a_2) + F_{X_1}(a_1)F_{X_2}(a_2)\\
&= \bigl(F_{X_1}(b_1) - F_{X_1}(a_1)\bigr)\bigl(F_{X_2}(b_2) - F_{X_2}(a_2)\bigr)\\
&= P([a_1 < X_1 \le b_1])\,P([a_2 < X_2 \le b_2]).
\end{aligned}$$
If, in the above proof, one replaces $X_1$ by $X_{i_1}$ and $X_2$ by $X_{i_2}$, where $1 \le i_1 <
i_2 \le m$, then one proves in the very same way that
$$P\Bigl(\bigcap_{j=1}^{2} [a_{i_j} < X_{i_j} \le b_{i_j}]\Bigr) = \prod_{j=1}^{2} P([a_{i_j} < X_{i_j} \le b_{i_j}]).$$
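Computational aside: the inclusion–exclusion identity used in this proof can be checked
numerically. In the sketch below the marginal distribution functions are arbitrary choices,
and the helper name `rectangle_prob` is ours, not the text's.

```python
import math

def rectangle_prob(F, a1, b1, a2, b2):
    """P(a1 < X1 <= b1, a2 < X2 <= b2) computed from the joint CDF F(x1, x2)
    by the inclusion-exclusion identity used in the proof above."""
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

# Example: if F factors as F1 * F2, the rectangle probability factors as well.
F1 = lambda x: min(max(x, 0.0), 1.0)                    # Uniform(0, 1) marginal CDF
F2 = lambda y: 0.0 if y < 0 else 1.0 - math.exp(-y)     # Exponential(1) marginal CDF
F  = lambda x, y: F1(x) * F2(y)
lhs = rectangle_prob(F, 0.2, 0.7, 0.5, 2.0)
rhs = (F1(0.7) - F1(0.2)) * (F2(2.0) - F2(0.5))
# lhs equals rhs up to floating-point rounding.
```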
EXERCISES
the joint density of $X_1, \ldots, X_k$, we are able to determine the joint density of
all $k + 1$ discrete random variables. So let us determine this joint density. It
is easy to see that the fundamental probability set, $\Omega$, is the set of all $(k+1)^n$
ordered $n$-tuples of the outcomes $A_0, A_1, \ldots, A_k$. (Here is another case where
the individual outcomes are not necessarily equally likely.) Let $p_i$ denote the
probability that $A_i$ occurs at a particular trial, for $1 \le i \le k$, and let
$p_0 = 1 - (p_1 + \cdots + p_k)$ denote the probability that $A_0$ occurs. If we let $x_i$ be
a non-negative integer, $1 \le i \le k$, such that $0 \le x_1 + \cdots + x_k \le n$, and we put
$x_0 = n - (x_1 + \cdots + x_k)$, then the probability that, for one particular elementary
event, $x_1$ of the outcomes are $A_1$'s, $x_2$ of the outcomes are $A_2$'s, $\ldots$, and $x_k$
of the outcomes are $A_k$'s is
$$p_0^{x_0} p_1^{x_1} \cdots p_k^{x_k}.$$
We must now determine how many such elementary events there are in
$\bigcap_{j=1}^{k} [X_j = x_j]$. There are $\binom{n}{x_1}$ ways in which $x_1$ out of the $n$ trials can
be selected for $A_1$ to occur. For each way that $x_1$ trials can be selected for
$A_1$ to occur, there are $\binom{n-x_1}{x_2}$ ways in which one can select $x_2$ trials out of the
remaining $n - x_1$ trials for the outcome to be $A_2$. After selecting $x_1 + \cdots + x_k$
trial numbers for $x_1$ of the $A_1$'s to occur on, up to $x_k$ of the $A_k$'s, there is
exactly one way in which to select the remaining trial numbers for $A_0$ only
to occur. Thus the joint density of $X_1, \ldots, X_k$ is
$$f_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = \binom{n}{x_1}\binom{n-x_1}{x_2}\cdots\binom{n-x_1-\cdots-x_{k-1}}{x_k}\prod_{i=0}^{k} p_i^{x_i}.$$
Since
$$\binom{n}{x_1}\binom{n-x_1}{x_2}\cdots\binom{n-x_1-\cdots-x_{k-1}}{x_k} = \frac{n!}{x_0!\,x_1!\cdots x_k!},$$
we consequently obtain
$$f_{X_1,\ldots,X_k}(x_1, \ldots, x_k) = \frac{n!}{x_0!\,x_1!\cdots x_k!}\prod_{i=0}^{k} p_i^{x_i}.$$
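Computational aside: the formula can be evaluated directly. The sketch below is an
illustration only (the helper name `multinomial_pmf` is ours); it computes $x_0$ and
$p_0$ from the arguments exactly as in the derivation above.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs, n):
    """Joint density f_{X1,...,Xk}(x1,...,xk) of the multinomial model.
    counts = (x1,...,xk), probs = (p1,...,pk); x0 = n - sum(counts) and
    p0 = 1 - sum(probs) account for the remaining outcome A0."""
    x0 = n - sum(counts)
    p0 = 1.0 - sum(probs)
    if x0 < 0 or p0 < 0 or any(x < 0 for x in counts):
        return 0.0
    coeff = factorial(n) // (factorial(x0) * prod(factorial(x) for x in counts))
    return coeff * p0 ** x0 * prod(p ** x for p, x in zip(probs, counts))

# Example: multinomial_pmf((2, 1), (0.5, 0.3), 4) for n = 4 trials, k = 2.
```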
Proof: Suppose $X_1, \ldots, X_n$ have a joint discrete distribution. We shall
prove that each random variable has a discrete distribution only in the case
$n = 2$. Suppose, then, that $(X_1, X_2)$ has a joint discrete distribution, so that
$\mathrm{range}(X_1, X_2)$ is a countable set $C$ with $P([(X_1, X_2) \in C]) = 1$. Let $F$ denote
the set of all first coordinates of points of $C$. Then $F$ is countable,
$[X_1 \in F] \supset [(X_1, X_2) \in C]$, so $P([X_1 \in F]) = 1$,
and thus $F$ is the range of $X_1$, thus proving that $X_1$ has a discrete
distribution. A proof that $X_2$ has a discrete distribution is similar to the above
proof. Conversely, suppose that each of the individual random variables has
a discrete distribution. Then,
$$X_1 = \sum_{n \ge 1} x_n I_{[X_1 = x_n]} \quad\text{and}\quad X_2 = \sum_{n \ge 1} y_n I_{[X_2 = y_n]}$$
for countable sets of distinct numbers $\{x_n\}$ and $\{y_n\}$. Now the set, $S$, of all
possible pairs $\{(x_i, y_j) : i \ge 1,\ j \ge 1\}$ is countable. For each $i \ge 1$,
$$[X_1 = x_i] = \bigcup_{j \ge 1} \bigl([X_1 = x_i] \cap [X_2 = y_j]\bigr),$$
so that
$$P([X_1 = x_i]) = \sum\bigl\{P([X_1 = x_i] \cap [X_2 = y_j]) : j \in S_i\bigr\},$$
where $S_i = \{j \ge 1 : P([X_1 = x_i] \cap [X_2 = y_j]) > 0\}$. Thus,
$$\sum_{i \ge 1} \sum\bigl\{P([X_1 = x_i] \cap [X_2 = y_j]) : j \in S_i\bigr\} = \sum_{i \ge 1} P([X_1 = x_i]) = 1,$$
and we have shown that $\mathrm{range}(X_1, X_2) \subset \bigcup_{i \ge 1} \{(x_i, y_j) : j \in S_i\}$, which is a
countable set. This shows that the joint distribution of $(X_1, X_2)$ is discrete.
Theorem 2. If $X_1, X_2$ have a joint discrete distribution, then so does
$X_1$, and
$$f_{X_1}(x_1) = \sum\bigl\{f_{X_1,X_2}(x_1, x_2) : x_2 \in S_{x_1}\bigr\}.$$
Proof: This was actually proved in the proof of theorem 1.
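Computational aside: in computational terms, the theorem says that a marginal density
is obtained by summing the joint density over the other coordinate. The sketch below
(the helper name and dictionary format are ours) does exactly that.

```python
def marginal_x1(joint_pmf):
    """Given a joint discrete density as a dict {(x1, x2): prob},
    return the marginal density of X1 obtained by summing over x2."""
    marginal = {}
    for (x1, _x2), p in joint_pmf.items():
        marginal[x1] = marginal.get(x1, 0.0) + p
    return marginal
```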
where this last sum is taken over the set
$$\{(t_1, t_2, \ldots, t_n) \in \mathrm{range}(X_1, X_2, \ldots, X_n) : t_i \le x_i,\ 1 \le i \le n\}.$$
By the uniqueness of the discrete density, this proves that the condition is
true.
Definition. Random variables $X_1, \ldots, X_n$ are said to have a joint ab-
solutely continuous distribution (function) if there exists a nonnegative func-
tion $f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$ defined over $n$-dimensional Euclidean space, $R^n$,
such that
$$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_n}\cdots\int_{-\infty}^{x_1} f_{X_1,\ldots,X_n}(t_1, \ldots, t_n)\,dt_1\cdots dt_n.$$
At every point at which $f_{X_1,\ldots,X_n}$ is continuous, it follows that
$$\frac{\partial^n}{\partial x_1\cdots\partial x_n} F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f_{X_1,\ldots,X_n}(x_1, \ldots, x_n).$$
The following theorem will not be stated and proved in all its generality
because the general statement and proof require too much complicated no-
tation. We instead state and prove a special case; the reader should be able
to extend the proof to more general cases.
Theorem 4. If $X_1, X_2, X_3$ have a joint absolutely continuous distribu-
tion function, then so do $X_1, X_3$, and a joint density of $X_1, X_3$ may be
written as
$$f_{X_1,X_3}(x_1, x_3) = \int_{-\infty}^{\infty} f_{X_1,X_2,X_3}(x_1, x_2, x_3)\,dx_2.$$
Proof: By theorem 2 in section 5.1, the definition of density and the usual
properties of integrals (i.e., that we can change the order of integration in
multiple integrals),
$$\begin{aligned}
F_{X_1,X_3}(x_1, x_3) &= P([X_1 \le x_1] \cap [X_3 \le x_3])\\
&= P([X_1 \le x_1] \cap [X_2 < \infty] \cap [X_3 \le x_3])\\
&= \int_{-\infty}^{x_1}\int_{-\infty}^{\infty}\int_{-\infty}^{x_3} f_{X_1,X_2,X_3}(t_1, t_2, t_3)\,dt_3\,dt_2\,dt_1\\
&= \int_{-\infty}^{x_1}\int_{-\infty}^{x_3}\int_{-\infty}^{\infty} f_{X_1,X_2,X_3}(t_1, t_2, t_3)\,dt_2\,dt_3\,dt_1.
\end{aligned}$$
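Computational aside: theorem 4 can be illustrated numerically. The sketch below assumes
SciPy is available and uses a hypothetical joint density (three independent exponential
components); integrating out the middle coordinate recovers the joint density of the first
and third coordinates.

```python
import math
from scipy.integrate import quad  # assumes SciPy is available

def f_joint(x1, x2, x3):
    # Hypothetical joint density: independent Exp(1), Exp(2), Exp(3) components.
    if min(x1, x2, x3) < 0:
        return 0.0
    return math.exp(-x1) * 2.0 * math.exp(-2.0 * x2) * 3.0 * math.exp(-3.0 * x3)

def f_x1_x3(x1, x3):
    """Joint density of (X1, X3): integrate the x2 coordinate out over the line."""
    value, _err = quad(lambda x2: f_joint(x1, x2, x3), 0.0, math.inf)
    return value

# f_x1_x3(1.0, 0.5) is approximately exp(-1.0) * 3 * exp(-1.5), as independence predicts.
```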
Theorem 5. If random variables $X_1, X_2, \ldots, X_n$ have a joint absolutely
continuous distribution, then a necessary and sufficient condition that they
are independent is that their joint density factors into the product of their
univariate marginals, i.e.,
$$f_{X_1,X_2,\ldots,X_n}(x_1, x_2, \ldots, x_n) = \prod_{j=1}^{n} f_{X_j}(x_j).$$
By the very de…nition of a joint density, the integrand of the last multiple
integral above is a joint density, and the theorem is proved.
Aside to sophisticates: For an undergraduate course where one is limited
to the Riemann integral, much is left unsaid. Certain statements made above
are justified only by appealing to the a.e. uniqueness in the Radon-Nikodym
theorem and also to Tonelli's theorem.
EXERCISES
1. Let $X_1$ and $X_2$ be random variables with a joint discrete distribution,
and define
$$F = \{x : (x, x') \in \mathrm{range}(X_1, X_2) \text{ for some } x'\},$$
i.e., $F$ is the set of all first coordinates of points in $\mathrm{range}(X_1, X_2)$, and for
every $x \in F$, let $S_x = \{y : (x, y) \in \mathrm{range}(X_1, X_2)\}$. Prove that
$$[X_1 = x] = [(X_1, X_2) \in \{(x, y) : y \in S_x\}] \text{ for every } x \in F,$$
and
$$P([X_1 = x]) = \sum\bigl\{P([X_1 = x] \cap [X_2 = y]) : y \in S_x\bigr\} \text{ for every } x \in F.$$
2. Three tags, numbered 1, 2 and 3, are placed in a box. One samples two
times with replacement. Let X denote the number on the first tag selected,
and let Y denote the number on the second tag selected. Find the joint
discrete density function and the joint distribution function of $(X, Y)$.
3. Use theorem 4 in this section to prove that X and Y in problem 2 are
independent.
4. An urn contains two red balls, two white balls and one blue ball. Let
X denote the number of red balls in a sample of size three taken without
replacement, and let Y denote the number of white balls in the same sample.
Find the joint density function of $(X, Y)$, and show that X and Y are not
independent.
5. Let X and Y be random variables with a joint absolutely continuous
distribution whose joint density is given by
$$f_{X,Y}(x, y) = \begin{cases} \frac{3}{4} & \text{if } |x| \le 1,\ x^2 \le y \le 1\\ 0 & \text{otherwise.}\end{cases}$$
Find the two univariate marginal densities, $f_X(x)$ and $f_Y(y)$.
6. Let X, Y and Z be independent random variables, each of which has
an absolutely continuous distribution function.
(i) Prove that the joint distribution function of X, Y, Z is absolutely con-
tinuous.
(ii) If
$$f_X(x) = \begin{cases} e^{-x} & \text{for } x > 0\\ 0 & \text{for } x \le 0,\end{cases}$$
if
$$f_Y(y) = \begin{cases} 1 & \text{if } 0 < y < 1\\ 0 & \text{if } |y - \tfrac{1}{2}| \ge \tfrac{1}{2},\end{cases}$$
and if
$$f_Z(z) = \tfrac{1}{2} e^{-|z|} \text{ for all real } z,$$
determine the joint density of X, Y, Z.
7. Let U and V be random variables with a joint absolutely continuous
distribution function whose density is given by
$$f_{U,V}(u, v) = \begin{cases} \frac{2}{\pi} & \text{if } |v| \le 1,\ 0 \le u \le \sqrt{1 - v^2}\\ 0 & \text{otherwise.}\end{cases}$$
(i) Find the marginal densities $f_U(u)$ and $f_V(v)$.
(ii) Show that U and V are not independent.
8. At the beginning of this section where we were discussing the multino-
mial distribution, there is the statement, "Thus if we can determine the joint
density of $X_1, \ldots, X_k$, we are able to determine the joint density of all $k + 1$
discrete random variables." Render a precise meaning of this statement.
5.3 Variance and Covariance. In this section we assemble some basic
results about variance and covariance and their relationships with indepen-
dence.
Theorem 1. If X and Y are independent random variables with finite
first moments, then the random variable XY has a finite first moment, and
$E(XY) = E(X)E(Y)$.
Proof: We shall first prove this theorem when both X and Y are positive
random variables, i.e., when $P([X \ge 0]) = 1$ and $P([Y \ge 0]) = 1$. Case (i):
Suppose Y is discrete and takes only a finite number of non-negative values,
$0 < z_1 < \cdots < z_n$, with positive probabilities. Then for $z > 0$,
$$P([XY > z]) = \sum_{j=1}^{n} P([XY > z] \cap [Y = z_j]).$$
Hence
$$\int_0^{\infty} P([XY > z])\,dz = \sum_{j=1}^{n} P([Y = z_j]) \int_0^{\infty} P\Bigl(\Bigl[X > \frac{z}{z_j}\Bigr]\Bigr)dz,$$
and after a change of variable in each integral on the right hand side, we
obtain
$$\begin{aligned}
\int_0^{\infty} P([XY > z])\,dz &= \sum_{j=1}^{n} P([Y = z_j])\,z_j \int_0^{\infty} P([X > x])\,dx\\
&= E(Y) \int_0^{\infty} P([X > x])\,dx\\
&= E(X)E(Y),
\end{aligned}$$
since X and Y are positive random variables. Case (ii): Now suppose X is
as before in case (i) and Y is a bounded non-negative random variable, i.e.,
there exists a positive integer K such that $0 \le Y(\omega) \le K$ for all $\omega \in \Omega$. As
in the proof of theorem 4 in section 3.2, we define for every positive integer
$n$,
$$Y_n^- = \sum_{j=1}^{K2^n} \frac{j-1}{2^n}\, I_{[(j-1)/2^n < Y \le j/2^n]} \quad\text{and}\quad
Y_n^+ = \sum_{j=1}^{K2^n} \frac{j}{2^n}\, I_{[(j-1)/2^n < Y \le j/2^n]}.$$
Thus
$$\lim_{n\to\infty} E(Y_n^+) = E(Y) \quad\text{and}\quad \lim_{n\to\infty} E(Y_n^-) = E(Y).$$
For every $y \ge 0$ there is a non-negative integer $k$, depending on $y$ and $n$,
such that
$$[Y_n^+ \le y] = \Bigl[Y_n^+ \le \frac{k}{2^n}\Bigr] = \Bigl[Y \le \frac{k}{2^n}\Bigr].$$
Using the independence of X and Y, we obtain
$$\begin{aligned}
P([X \le x] \cap [Y_n^+ \le y]) &= P\Bigl([X \le x] \cap \Bigl[Y \le \frac{k}{2^n}\Bigr]\Bigr)\\
&= P([X \le x])\,P\Bigl(\Bigl[Y \le \frac{k}{2^n}\Bigr]\Bigr)\\
&= P([X \le x])\,P([Y_n^+ \le y]),
\end{aligned}$$
which proves the independence of X and $Y_n^+$. The proof of independence of
X and $Y_n^-$ is the same where each $k$ is replaced by $k-1$. Because $X Y_n^- \le
XY \le X Y_n^+$ over $\Omega$, then by theorem 3 in section 3.2 and case (i),
$$E(X)E(Y_n^-) = E(XY_n^-) \le E(XY) \le E(XY_n^+) = E(X)E(Y_n^+).$$
Letting $n \to \infty$, we obtain the conclusion of the theorem for case (ii). Case
(iii): We now consider the case where X and Y are any independent, positive
random variables whose expectations exist. First, we show that if $K > 0$ is
any positive integer, then X and $Y I_{[Y \le K]}$ are independent. Indeed, we first
observe that, for any $y > 0$,
$$[Y I_{[Y \le K]} \le y] = [Y \le \min\{K, y\}] \cup [Y > K],$$
the union being a disjoint union. Then, because of independence and dis-
jointness, we have, for every $x > 0$ and every $y > 0$,
$$\begin{aligned}
P([X \le x] \cap [Y I_{[Y \le K]} \le y])
&= P([X \le x] \cap [Y \le \min\{K, y\}]) + P([X \le x] \cap [Y > K])\\
&= P([X \le x])P([Y \le \min\{K, y\}]) + P([X \le x])P([Y > K])\\
&= P([X \le x])P([Y \le \min\{K, y\}] \cup [Y > K])\\
&= P([X \le x])P([Y I_{[Y \le K]} \le y]),
\end{aligned}$$
which proves that X and $Y I_{[Y \le K]}$ are independent. Since $Y I_{[Y \le K]}$ is a
bounded, positive random variable, we have by case (ii) that
$$E(X Y I_{[Y \le K]}) = E(X)E(Y I_{[Y \le K]}).$$
This and lemma 1 in section 3.2 and its corollary imply
$$\begin{aligned}
\int_0^{\infty} P([XY > x])\,dx &= \lim_{K\to\infty} \int_0^{\infty} P([XY > x] \cap [Y \le K])\,dx\\
&= \lim_{K\to\infty} \int_0^{\infty} P([X Y I_{[Y \le K]} > x])\,dx\\
&= \lim_{K\to\infty} E(X Y I_{[Y \le K]})\\
&= \lim_{K\to\infty} E(X)E(Y I_{[Y \le K]})\\
&= E(X)E(Y),
\end{aligned}$$
which is finite. Thus the expectation of XY exists and equals $E(X)E(Y)$.
We now establish the theorem in the general case. Let us denote
$$X^+ = X I_{[X \ge 0]}, \qquad X^- = -X I_{[X < 0]},$$
$$Y^+ = Y I_{[Y \ge 0]}, \qquad Y^- = -Y I_{[Y < 0]}.$$
These are non-negative random variables, each of the pairs $(X^+, Y^+)$,
$(X^+, Y^-)$, $(X^-, Y^+)$, $(X^-, Y^-)$ is a pair of independent random variables, and
$$XY = X^+Y^+ + X^-Y^- - X^+Y^- - X^-Y^+.$$
Also, one can show that the existence of $E(X)$ and
$E(Y)$ implies that of $E(X^+)$, $E(X^-)$, $E(Y^+)$ and $E(Y^-)$, and that $E(X) =
E(X^+) - E(X^-)$ and $E(Y) = E(Y^+) - E(Y^-)$. Thus, by all this and the
last displayed equation, we have, by the fact that we have shown the theorem
to be true for all non-negative random variables,
$$\begin{aligned}
E(XY) &= E(X^+Y^+) + E(X^-Y^-) - E(X^+Y^-) - E(X^-Y^+)\\
&= E(X^+)E(Y^+) + E(X^-)E(Y^-) - E(X^+)E(Y^-) - E(X^-)E(Y^+)\\
&= \bigl(E(X^+) - E(X^-)\bigr)\bigl(E(Y^+) - E(Y^-)\bigr) = E(X)E(Y).
\end{aligned}$$
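Computational aside: theorem 1 can also be illustrated by simulation. In the sketch
below the particular distributions (Uniform(0, 2) and Exponential(1)) are arbitrary
choices; both returned values should be close to 1.

```python
import random

def check_product_of_means(trials=200000, seed=1):
    """Monte Carlo illustration of E(XY) = E(X)E(Y) for independent X, Y."""
    rng = random.Random(seed)
    sx = sy = sxy = 0.0
    for _ in range(trials):
        x = rng.uniform(0.0, 2.0)      # X ~ Uniform(0, 2), so E(X) = 1
        y = rng.expovariate(1.0)       # Y ~ Exponential(1), so E(Y) = 1
        sx, sy, sxy = sx + x, sy + y, sxy + x * y
    return sxy / trials, (sx / trials) * (sy / trials)   # both approximately 1.0
```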
Theorem 2. If X and Y are random variables, then $\mathrm{Cov}(X, Y) =
E(XY) - E(X)E(Y)$, provided the expectations exist.
Proof: By previous results on expectations and the definition above, we
have
$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E\bigl((X - E(X))(Y - E(Y))\bigr)\\
&= E\bigl(XY - XE(Y) - E(X)Y + E(X)E(Y)\bigr)\\
&= E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y)\\
&= E(XY) - E(X)E(Y).
\end{aligned}$$
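Computational aside: theorem 2 translates directly into a computation for a discrete
joint density. The dictionary format below is an illustrative assumption, not the text's
notation.

```python
def covariance(joint_pmf):
    """Cov(X, Y) = E(XY) - E(X)E(Y) for a discrete joint density
    given as a dict {(x, y): prob}."""
    ex  = sum(x * p for (x, _y), p in joint_pmf.items())
    ey  = sum(y * p for (_x, y), p in joint_pmf.items())
    exy = sum(x * y * p for (x, y), p in joint_pmf.items())
    return exy - ex * ey
```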
Lemma 1. If X is a random variable such that $E(X^2) = 0$, then
$P([X = 0]) = 1$.
Proof: Let us suppose that the conclusion is not true. Since $P([X^2 \ge 0]) = 1$,
there exists $x_0 > 0$ such that $P([X^2 \le x_0]) < 1$, or $1 - P([X^2 \le x_0]) > 0$.
Let $\delta = 1 - P([X^2 \le x_0]) > 0$. Then for all $x \in [0, x_0)$, $P([X^2 > x]) \ge \delta$.
Thus
$$E(X^2) = \int_0^{\infty} P([X^2 > x])\,dx \ge \int_0^{x_0} P([X^2 > x])\,dx \ge \delta x_0 > 0,$$
which contradicts the hypothesis that $E(X^2) = 0$.
Theorem 3. (Schwarz inequality) If X and Y are random variables with
finite second moments, then $(E(XY))^2 \le E(X^2)E(Y^2)$; in addition, equality
holds if and only if there exists a real number $t$ such that $P([tX + Y = 0]) = 1$.
Proof: For every real number $t$,
$$0 \le E\bigl((tX + Y)^2\bigr) = t^2 E(X^2) + 2t\,E(XY) + E(Y^2),$$
so the discriminant of this quadratic in $t$ satisfies $4(E(XY))^2 - 4E(X^2)E(Y^2) \le 0$,
which gives the first conclusion of the theorem. Equality holds if and
only if there exists a value of $t$ such that $E((tX + Y)^2) = 0$. By lemma 1,
this implies that $P([tX + Y = 0]) = 1$. Conversely, if $P([tX + Y = 0]) = 1$
for some $t$, then $E((tX + Y)^2) = 0$ for that value of $t$, and thus the proof is
complete.
Definition. If a random variable X has a finite second moment, then
the standard deviation of X, $\mathrm{s.d.}(X)$, is defined by $\mathrm{s.d.}(X) = \sqrt{\mathrm{Var}(X)}$.
Definition. Let X and Y be random variables with finite second mo-
ments. The correlation coefficient $\rho(X, Y)$ of X and Y is defined by
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\mathrm{s.d.}(X)\,\mathrm{s.d.}(Y)},$$
provided $\mathrm{s.d.}(X) > 0$ and $\mathrm{s.d.}(Y) > 0$. If the standard deviation of either of
these random variables is equal to zero, then the correlation coefficient is not
defined.
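Computational aside: the correlation coefficient can be computed from the same kind of
discrete joint density dictionary as before (again an assumed format); both standard
deviations must be positive, as the definition requires.

```python
import math

def correlation(joint_pmf):
    """rho(X, Y) = Cov(X, Y) / (s.d.(X) s.d.(Y)) for a discrete joint density
    given as a dict {(x, y): prob}; requires both standard deviations > 0."""
    ex  = sum(x * p for (x, _y), p in joint_pmf.items())
    ey  = sum(y * p for (_x, y), p in joint_pmf.items())
    exy = sum(x * y * p for (x, y), p in joint_pmf.items())
    ex2 = sum(x * x * p for (x, _y), p in joint_pmf.items())
    ey2 = sum(y * y * p for (_x, y), p in joint_pmf.items())
    sd_x = math.sqrt(ex2 - ex * ex)
    sd_y = math.sqrt(ey2 - ey * ey)
    return (exy - ex * ey) / (sd_x * sd_y)
```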
Theorem 4. If X and Y are random variables with positive standard
deviations, then $-1 \le \rho(X, Y) \le 1$. The equality $\rho(X, Y) = 1$ holds if and
only if there exists a positive number $c$ such that $Y - E(Y) = c(X - E(X))$.
The equality $\rho(X, Y) = -1$ holds if and only if there exists a negative number
$c$ such that $Y - E(Y) = c(X - E(X))$.
Proof: By Schwarz' inequality
Proof: First note that $E\bigl(\sum_{k=1}^{n} X_k\bigr) = \sum_{k=1}^{n} E(X_k)$. Then
$$\begin{aligned}
\mathrm{Var}\Bigl(\sum_{k=1}^{n} X_k\Bigr) &= E\Bigl(\Bigl(\sum_{k=1}^{n} X_k - \sum_{k=1}^{n} E(X_k)\Bigr)^2\Bigr)\\
&= E\Bigl(\Bigl(\sum_{k=1}^{n} \bigl(X_k - E(X_k)\bigr)\Bigr)^2\Bigr)\\
&= E\Bigl(\sum_{k=1}^{n} \bigl(X_k - E(X_k)\bigr)^2 + \sum_{i \ne j} \bigl(X_i - E(X_i)\bigr)\bigl(X_j - E(X_j)\bigr)\Bigr)\\
&= \sum_{k=1}^{n} \mathrm{Var}(X_k) + \sum_{i \ne j} \mathrm{Cov}(X_i, X_j).
\end{aligned}$$
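Computational aside: written as a (hypothetical) helper, the identity just proved reads
as follows; the argument format is our own assumption.

```python
def var_of_sum(variances, covariances):
    """Var(X1 + ... + Xn) = sum of Var(X_k) plus the sum over i != j of Cov(X_i, X_j).
    `variances` is a list; `covariances` maps ordered pairs (i, j), i != j, to Cov(X_i, X_j)."""
    return sum(variances) + sum(covariances.values())

# For two variables: var_of_sum([v1, v2], {(0, 1): c, (1, 0): c}) == v1 + v2 + 2 * c.
```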
EXERCISES
(i) Find the densities of X, of Y , of Z, of XZ and of Y Z.
(ii) Compute $\rho(X, Z)$ and $\rho(Y, Z)$.
5. In the proof of theorem 1, we noted that since X and Y are inde-
pendent, then for each $n$, X and $Y_n^+$ are independent, and X and $Y_n^-$ are
independent. Prove these two statements.
5.4 Limit Theorems. Limit theorems are among the most treasured
results of probability and statistics. We have already encountered our first
limit theorem, Bernoulli's theorem, which provided us with an interpretation
of probability of an event and justified the method used in permutation tests.
In this section we obtain a generalization of Bernoulli’s theorem, the law of
large numbers, which is crucial in statistical inference and at the very basis
of the insurance industry.
Theorem 1. (Law of Large Numbers) If $\{X_1, X_2, \ldots, X_n, \ldots\}$ is a
sequence of independent and identically distributed random variables with
finite common expectation $\mu$ and finite common second moment, then for
every $\varepsilon > 0$, the following is true:
$$P\Bigl(\Bigl[\Bigl|\frac{1}{n}\sum_{j=1}^{n} X_j - \mu\Bigr| \ge \varepsilon\Bigr]\Bigr) \to 0 \text{ as } n \to \infty.$$
Proof: Write $\overline{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j$ for the average of the first $n$ random
variables, and let $\sigma^2 > 0$ denote the common variance of the random variables. It is
easy to verify that $E(\overline{X}_n) = \mu$ and $\mathrm{Var}(\overline{X}_n) = \frac{\sigma^2}{n}$. Thus by Chebyshev's
inequality proved in section 4.1,
$$0 \le P\Bigl(\Bigl[\Bigl|\frac{1}{n}\sum_{j=1}^{n} X_j - \mu\Bigr| \ge \varepsilon\Bigr]\Bigr) = P\bigl(\bigl[\,\bigl|\overline{X}_n - E(\overline{X}_n)\bigr| \ge \varepsilon\,\bigr]\bigr) \le \frac{\sigma^2}{n\varepsilon^2} \to 0$$
as $n \to \infty$.
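Computational aside: a quick simulation makes the statement concrete. The sketch below
uses Uniform(0, 1) draws (an arbitrary choice, with common expectation 1/2) and shows
the sample mean settling near 1/2 for large $n$.

```python
import random

def running_mean_demo(n=100000, seed=0):
    """Law of large numbers illustration: sample mean of n i.i.d. Uniform(0,1)
    draws, which should be close to the common expectation 0.5."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += rng.random()
    return total / n   # close to 0.5 for large n
```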
Theorem 2. (Central Limit Theorem) If $\{X_1, X_2, \ldots, X_n, \ldots\}$ is a
sequence of independent and identically distributed random variables with
finite common second moment, then
$$\lim_{n\to\infty} P\Biggl(\Biggl[\frac{\sum_{j=1}^{n}\bigl(X_j - E(X_j)\bigr)}{\sqrt{n\,\mathrm{Var}(X_1)}} \le x\Biggr]\Biggr) = \Phi(x)$$
for all real $x$, where
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$
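Computational aside: the sketch below (illustrative only) estimates the probability on the
left-hand side at $x = 1$ for Uniform(0, 1) summands and compares it with
$\Phi(1) \approx 0.8413$.

```python
import math
import random

def clt_demo(n=200, trials=5000, seed=0):
    """Estimate P(standardized sum of n i.i.d. Uniform(0,1) draws <= 1);
    the result should be close to Phi(1), about 0.8413."""
    rng = random.Random(seed)
    mu, var = 0.5, 1.0 / 12.0          # mean and variance of Uniform(0, 1)
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        z = (s - n * mu) / math.sqrt(n * var)
        if z <= 1.0:
            hits += 1
    return hits / trials
```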
Since $\sqrt{p(1-p)/n} \le \sqrt{1/(4n)}$, this last event implies the event
$$\Bigl[\frac{S_n}{n} - x_0\sqrt{\frac{1}{4n}} \le p \le \frac{S_n}{n} + x_0\sqrt{\frac{1}{4n}}\Bigr].$$
Thus we know that for large values of $n$, this last inequality supplies us with
what is called a 95% confidence interval for $p$, since this last event has a
probability that is approximately equal to or greater than 0.95.
Let us nail down an interpretation of what has been done above. If
someone has the number of successes $S_n$ in $n$ trials and wishes to know what
are the possible values of $p$, the statistician will use the above formulas and
will tell the person that the value of $p$ is a number between $\frac{S_n}{n} - x_0\sqrt{\frac{1}{4n}}$
and $\frac{S_n}{n} + x_0\sqrt{\frac{1}{4n}}$. What this means, according to Bernoulli's theorem given above,
is that among the many independent projects in your statistical career that
you use these formulas for left and right endpoints for possible values of $p$, in
roughly 95% or more of the cases you will be correct and in approximately 5%
or fewer of the cases the values of the unknown $p$ will not be included in the
interval. (And you may never know in which cases you were wrong.) What it
does not mean is that you have 95% confidence in a particular situation that
the interval you supply contains the unknown value of $p$, unless you define
this expression as being the interpretation given before it.
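Computational aside: the interval above is easy to compute. In the sketch below we take
$x_0 = 1.96$, the customary standard normal quantile for 95%, as an assumption (the text
does not fix its numerical value here), and the helper name is ours.

```python
import math

def conservative_ci(successes, n, x0=1.96):
    """Conservative 95% confidence interval for p from the text:
    S_n/n +/- x0 * sqrt(1/(4n)), using sqrt(p(1-p)/n) <= sqrt(1/(4n)).
    Endpoints are clamped to [0, 1] for convenience."""
    p_hat = successes / n
    half_width = x0 * math.sqrt(1.0 / (4.0 * n))
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Example: conservative_ci(540, 1000) is roughly (0.509, 0.571).
```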
EXERCISES
3. Yet another polling agency selects their random sample in such a way
that they encounter considerable expense. So they wish to minimize the size
of the sample. If the voter sentiment is in a dead heat, they do not wish
to say that candidate A will win, so they wish to limit the probability of
making this error to be no greater than 0.05. However, if the true proportion
of voters for candidate A is at least 0.55, they wish to be able to ascertain
this with large probability, say, 0.90. So how large a sample of voters should
they take, and under what circumstance will they state that candidate A is
leading?