
TA Session 3

Oct. 19th, 2007

1. Random Vector
A random vector is a vector of random variables,
 
\[
X(s) = \begin{pmatrix} X_1(s) \\ \vdots \\ X_n(s) \end{pmatrix}
\]
that maps from S to R^n.


F_X(x), or F_{X_1,\dots,X_n}(x_1,\dots,x_n), which is called the joint cdf, is defined by
\[
F_{X_1,\dots,X_n}(x_1,\dots,x_n) = \Pr(X_1 \le x_1, \dots, X_n \le x_n)
\]

A joint pmf or pdf is obtained in the same way as in the random variable case.

pX (x) = Pr(X = x) = Pr(X1 = x1 , · · · , Xn = xn )


\[
f_X(x) = \frac{d^n F_X(x)}{dx_1 \cdots dx_n}
\]
A marginal distribution of Xi is given by

\[
F_{X_i}(x_i) = \Pr(X_i \le x_i \text{ and } X_j \in \mathbb{R} \text{ for all } j \neq i) = \lim_{x_j \to \infty\ \forall j \neq i} F_X(x)
\]

A marginal pmf or pdf can be obtained in the same way. The relationships between joint and marginal
cdf/pdf are
\[
F_X(z) = \int_{x \le z} f_X(x)\,dx = \int_{-\infty}^{z_1} \cdots \int_{-\infty}^{z_n} f_{X_1,\dots,X_n}(x_1,\dots,x_n)\,dx_n \cdots dx_1
\]
\[
F_{X_1}(z_1) = \int_{-\infty}^{z_1} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1,\dots,X_n}(x_1,\dots,x_n)\,dx_n \cdots dx_1
\]
\[
f_{X_1}(x_1) = \frac{dF_{X_1}(x_1)}{dx_1} = \int \cdots \int_{(x_2,\dots,x_n) \in \mathbb{R}^{n-1}} f_X(x)\,dx_2 \cdots dx_n
\]

Definition 1 Let (X_1, X_2)' be a random vector. The conditional pmf of X_2 given X_1 = x_1 is given by
\[
p_{X_2|X_1=x_1}(x_2) = \frac{p_{X_1,X_2}(x_1,x_2)}{p_{X_1}(x_1)}
\]

and the conditional pdf of X_2 given X_1 = x_1 is given by
\[
f_{X_2|X_1=x_1}(x_2) = \frac{f_{X_1,X_2}(x_1,x_2)}{f_{X_1}(x_1)}
\]

(e.g) f (x1 , x2 ) = 12x1 x2 (1 − x2 ) when 0 < x1 , x2 < 1, 0 elsewhere.
f_{X_1}(x_1) = \int_0^1 12 x_1 x_2 (1 - x_2)\,dx_2 = 2x_1 when 0 < x_1 < 1, and 0 elsewhere.
f_{X_2}(x_2) = \int_0^1 12 x_1 x_2 (1 - x_2)\,dx_1 = 6 x_2 (1 - x_2) when 0 < x_2 < 1, and 0 elsewhere.
\[
f_{X_2|X_1=x_1}(x_2) = \frac{f(x_1,x_2)}{f_{X_1}(x_1)} = \frac{12 x_1 x_2 (1-x_2)}{2 x_1} = 6 x_2 (1-x_2) \quad \text{when } 0 < x_2 < 1, \text{ and 0 elsewhere.}
\]
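As a quick numerical sanity check of these marginals (a sketch, not part of the handout; it assumes numpy and scipy are available, and the function name joint is just an illustrative label):

import numpy as np
from scipy import integrate

# joint pdf f(x1, x2) = 12*x1*x2*(1 - x2) on (0,1)^2, 0 elsewhere
def joint(x1, x2):
    return 12.0 * x1 * x2 * (1.0 - x2) if (0.0 < x1 < 1.0 and 0.0 < x2 < 1.0) else 0.0

x1 = 0.3
fx1, _ = integrate.quad(lambda t: joint(x1, t), 0.0, 1.0)   # integrate out x2
x2 = 0.4
fx2, _ = integrate.quad(lambda t: joint(t, x2), 0.0, 1.0)   # integrate out x1
print(np.isclose(fx1, 2.0 * x1))                # marginal of X1: 2*x1
print(np.isclose(fx2, 6.0 * x2 * (1.0 - x2)))   # marginal of X2: 6*x2*(1 - x2)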

Consider a random vector Y = g(X), where g : Rn → Rn . Let fX (x) be the pdf of X. What would
be the pdf of Y ? For n = 2, it is
\[
f_{Y_1,Y_2}(y_1,y_2) = f_{X_1,X_2}\big[w_1(y_1,y_2),\, w_2(y_1,y_2)\big]\,
\left|\det \begin{pmatrix}
\dfrac{dw_1}{dy_1}(y_1,y_2) & \dfrac{dw_1}{dy_2}(y_1,y_2) \\[1.5ex]
\dfrac{dw_2}{dy_1}(y_1,y_2) & \dfrac{dw_2}{dy_2}(y_1,y_2)
\end{pmatrix}\right|
\]
when (y1 , y2 ) ∈ SY = g(SX ), and 0 otherwise, where w1 and w2 are coordinate-wise inverse functions
of g. Generally,
\[
f_Y(y) = f_X\big[w_1(y), \cdots, w_n(y)\big]\,
\left|\det \begin{pmatrix}
\dfrac{dw_1}{dy_1}(y) & \cdots & \dfrac{dw_1}{dy_n}(y) \\
\vdots & \ddots & \vdots \\
\dfrac{dw_n}{dy_1}(y) & \cdots & \dfrac{dw_n}{dy_n}(y)
\end{pmatrix}\right|
\]
when y ∈ SY = g(SX ), and 0 otherwise, where w1 , · · · , wn are similarly defined.

2006 PS2 Q2. fX (x1 , x2 ) = 2e−x1 −x2 when 0 < x1 < x2 < ∞, and 0 elsewhere.
Let Y1 = 2X1 and Y2 = X2 − X1 . Find the joint pdf of Y1 and Y2 .
Solution Note first that the support of Y1 is R+ , and that of Y2 is R+ for every value of y1 .
The support of Y is SY = {y|y ≥ 0}. Obtain the inverse function as
\[
X_1 = \frac{1}{2} Y_1, \qquad X_2 = \frac{1}{2} Y_1 + Y_2
\]
Therefore, by the transformation theorem,
\[
f_Y(y_1,y_2) = 2 e^{-\frac{1}{2}y_1 - \left(\frac{1}{2}y_1 + y_2\right)} \left|\det \begin{pmatrix} \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 \end{pmatrix}\right| = e^{-y_1 - y_2} \quad \text{when } y_1, y_2 > 0, \text{ and 0 elsewhere.}
\]
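A numerical cross-check of this answer (a sketch, not from the handout; it assumes scipy, and the cutoffs a and b are arbitrary illustrative values). The probability P(Y_1 \le a, Y_2 \le b) obtained by integrating f_X over {0 < x_1 \le a/2, x_1 < x_2 \le x_1 + b} should match the same probability computed from e^{-y_1 - y_2}:

import numpy as np
from scipy import integrate

a, b = 1.0, 2.0   # arbitrary cutoffs for the check

# P(Y1 <= a, Y2 <= b) via the original density f_X(x1, x2) = 2*exp(-x1 - x2), 0 < x1 < x2
lhs, _ = integrate.dblquad(lambda x2, x1: 2.0 * np.exp(-x1 - x2),
                           0.0, a / 2.0,        # x1 from 0 to a/2   (y1 = 2*x1 <= a)
                           lambda x1: x1,       # x2 from x1 ...
                           lambda x1: x1 + b)   # ... to x1 + b      (y2 = x2 - x1 <= b)

# the same probability from f_Y(y1, y2) = exp(-y1 - y2) on the positive orthant
rhs = (1.0 - np.exp(-a)) * (1.0 - np.exp(-b))
print(np.isclose(lhs, rhs))   # True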

(e.g) f_X(x) = 10 x_1 x_2^2 when 0 < x_1 < x_2 < 1, and 0 elsewhere.
Let Y1 = X1 /X2 and Y2 = X2 .
Note that the support of Y is SY = {y|0 < y1 , y2 < 1}.
Since X_1 = Y_1 Y_2 and X_2 = Y_2, we have |J| = y_2.

f_Y(y_1, y_2) = 10 y_1 y_2^4 on S_Y, and 0 elsewhere

2. Expectation
Expectation of a random vector X is similarly defined as
\[
E[X] = \sum_{x \in S_X} x\, p_X(x)
\]

if X is discrete, and
\[
E[X] = \int \cdots \int_{x \in \mathbb{R}^n} x\, f_X(x)\,dx
\]

if X is continuous. Of course, this is defined only when the sum or integral exists. We can easily verify that
 
\[
E[X] = \begin{pmatrix} E[X_1] \\ \vdots \\ E[X_n] \end{pmatrix}
\]

For example, for continuous X, the first coordinate of E[X] is nothing but
\[
\int \cdots \int_{x \in \mathbb{R}^n} x_1 f_X(x)\,dx
= \int_{-\infty}^{\infty} x_1 \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x)\,dx_n \cdots dx_2\,dx_1
= \int_{-\infty}^{\infty} x_1 f_{X_1}(x_1)\,dx_1 = E[X_1]
\]

Expectation of Y = g(X), where g : Rn → Rm , is


\[
E[Y] = E[g(X)] =
\begin{cases}
\displaystyle \sum_{x \in S_X} g(x)\, p_X(x) & \text{if } X \text{ is discrete} \\[2ex]
\displaystyle \int \cdots \int_{x \in \mathbb{R}^n} g(x)\, f_X(x)\,dx & \text{if } X \text{ is continuous}
\end{cases}
\]

The covariance between X1 and X2 is defined by

Cov(X1 , X2 ) = E[(X1 − EX1 )(X2 − EX2 )]

The variance covariance matrix of a random vector X is defined by

V ar(X) = E[(X − EX)(X − EX)0 ]

Definition 2 A moment generating function (mgf ) of a random vector X is


\[
M_X(t) = E\big[e^{t'X}\big] = E\big[e^{t_1 X_1 + \cdots + t_n X_n}\big]
\]
Note again that this is defined only when E[e^{t'X}] exists on -h < t < h for some h > 0. The mgf has the following properties.
\[
(1)\quad \frac{d^{m_1+\cdots+m_n} M_X}{dt_1^{m_1} \cdots dt_n^{m_n}}(0) = E[X_1^{m_1} \cdots X_n^{m_n}]
\]
(2) The mgf uniquely determines the distribution. In other words,
\[
M_X(t) = M_Y(t)\ \forall t \in (-h,h) \iff F_X(t) = F_Y(t)\ \forall t \in \mathbb{R}^n
\]

Important properties of expectation
(1) Expectation is a linear operator. In other words, for any scalars a and b, and functions g and h,

E [ag(X) + bh(Y )] = aE[g(X)] + bE[h(Y )]

(2) For any nonstochastic n × 1 vector a,

E[a0 X] = a0 E[X]

(3) Let X be a m × n random matrix. For any nonstochastic l × m matrix A and n × k matrix B,

E[AXB] = AE[X]B

and thus
\[
\mathrm{Var}[AX] = E\big[(AX - E[AX])(AX - E[AX])'\big] = E\big[A(X - EX)(X - EX)'A'\big] = A\,\mathrm{Var}(X)\,A'
\]
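A small simulation sketch of the last identity (not in the handout; it assumes numpy, and the particular Sigma and A below are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])              # Var(X), chosen for illustration
A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])     # nonstochastic 3 x 2 matrix
X = rng.multivariate_normal([1.0, -2.0], Sigma, size=200_000)   # each row is a draw of X'
Y = X @ A.T                                             # each row is a draw of (AX)'
print(np.round(np.cov(Y, rowvar=False) - A @ Sigma @ A.T, 2))   # approximately the zero matrix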

(4) Consider Y = X2 |X1 =x1 . This is a random variable and its expectation is
\[
E[X_2|X_1 = x_1] = \int_{-\infty}^{\infty} x_2\, f_{X_2|X_1=x_1}(x_2)\,dx_2
\]

which is a constant. What, then, is E[X_2|X_1]? It is a function of X_1, since it varies with X_1; in fact, it is a random variable. What is its expectation?
\[
E\big[E[X_2|X_1]\big] = E[X_2]
\]

Its variance satisfies
\[
\mathrm{Var}\big[E[X_2|X_1]\big] \le \mathrm{Var}[X_2]
\]
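A short derivation of the first identity in the continuous case (a sketch; it only uses Definition 1 and the marginal pdf):
\[
E\big[E[X_2|X_1]\big] = \int_{-\infty}^{\infty} E[X_2|X_1 = x_1]\, f_{X_1}(x_1)\,dx_1
= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_2\, f_{X_2|X_1=x_1}(x_2)\, f_{X_1}(x_1)\,dx_2\,dx_1
= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_2\, f_{X_1,X_2}(x_1,x_2)\,dx_2\,dx_1 = E[X_2]
\]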

(5) For any functions g and h,


\[
E\big[g(X_1)h(X_2)\,\big|\,X_1 = x_1\big] = g(x_1)\,E\big[h(X_2)\,\big|\,X_1 = x_1\big]
\]
\[
E\big[g(X_1)h(X_2)\,\big|\,X_1\big] = g(X_1)\,E\big[h(X_2)\,\big|\,X_1\big]
\]

(6) The mgf satisfies


MX1 (t1 ) = MX (t1 , 0, · · · , 0)

3. Independence

Definition 3 Discrete random variables X1 and X2 are independent if

pX1 ,X2 (x1 , x2 ) = pX1 (x1 )pX2 (x2 ) for all (x1 , x2 ) ∈ SX

and continuous random variables X1 and X2 are independent if

fX1 ,X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ) for all (x1 , x2 ) ∈ R2

Note that the following are equivalent (TFAE).[1]
(1) fX1 ,X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ) for all (x1 , x2 )
(2) fX2 |X1 =x1 (x2 ) = fX2 (x2 ) for all (x1 , x2 )
(3) There exist functions g and h such that

fX1 ,X2 (x1 , x2 ) = g(x1 )h(x2 ) for all (x1 , x2 )

(4) FX1 ,X2 (x1 , x2 ) = FX1 (x1 )FX2 (x2 ) for all (x1 , x2 )
(5) Pr(a < X1 ≤ b, c < X2 ≤ d) = Pr(a < X1 ≤ b) Pr(c < X2 ≤ d) for all a, b, c, d
(6) MX (t1 , t2 ) = MX (t1 , 0)MX (0, t2 )

Also note that if X1 and X2 are independent, the following hold.


(1) g(X_1) and h(X_2) are also independent for any functions g and h.
(2) If E[g(X_1)] and E[h(X_2)] exist, then

E[g(X1 )h(X2 )] = E[g(X1 )]E[h(X2 )]

(3) Cov(X1 , X2 ) = 0
(4) The support of X1 doesn’t vary with X2 , and vice versa.

(e.g) Recall the first worked example in Section 1, f(x_1, x_2) = 12 x_1 x_2 (1 - x_2). Are X_1 and X_2 independent?


Yes, because the product of the marginal pdfs, 2x_1 \cdot 6x_2(1 - x_2), equals the joint pdf. Check the other conditions above.

4. Miscellaneous
The correlation coefficient of X and Y is
\[
\rho_{XY} = \frac{E[(X - EX)(Y - EY)]}{\sqrt{E[(X - EX)^2]}\,\sqrt{E[(Y - EY)^2]}} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}
\]
where \mu_X = E[X] and \sigma_X = \sqrt{\mathrm{Var}(X)} = \mathrm{std}(X). Note that |\rho| \le 1.

Theorem 4 (Cauchy-Schwarz inequality) Let x and y be any n × 1 vectors. Then

\[
|x'y| \le \|x\| \cdot \|y\|
\]


where \|x\| = \sqrt{\sum_{i=1}^n x_i^2}. Let x and y be m × 1 and n × 1 vectors, respectively. Let a and b be weight vectors such that \sum_{i=1}^m a_i = \sum_{j=1}^n b_j = 1 and a_i, b_j \ge 0. Then
\[
\Big| \sum_{i=1}^m a_i x_i \sum_{j=1}^n b_j y_j \Big| \le \sqrt{\sum_{i=1}^m a_i x_i^2}\; \sqrt{\sum_{j=1}^n b_j y_j^2}
\]

For random variables X and Y ,


\[
\big|E[XY]\big| \le \sqrt{E[X^2]}\,\sqrt{E[Y^2]}
\]
[1] For simplicity, both pmf and pdf are written as f(x).

2006 mid Q5. Use the Cauchy-Schwarz inequality to prove that |\rho| \le 1.
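A sketch of the argument: apply the last inequality above to the centered variables X - EX and Y - EY,
\[
\big|\mathrm{Cov}(X,Y)\big| = \big|E[(X - EX)(Y - EY)]\big| \le \sqrt{E[(X - EX)^2]}\,\sqrt{E[(Y - EY)^2]} = \sigma_X \sigma_Y,
\]
and divide both sides by \sigma_X \sigma_Y (assumed positive) to get |\rho| \le 1.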

Definition 5 An n × n matrix A is positive semi-definite (p.s.d.) if for any n × 1 vector x,
\[
x'Ax \ge 0
\]
An n × n matrix A is positive definite (p.d.) if for any n × 1 vector x \ne 0,
\[
x'Ax > 0
\]

Definition 6 An n × n matrix A is negative semi-definite (n.s.d.) if -A is p.s.d. A is negative definite (n.d.) if -A is p.d.
(e.g) Let
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
then for any x = (x_1, x_2)' \ne 0,
\[
x'Ax = x_1^2 + x_2^2 > 0,
\]
so A is positive definite. Let
\[
B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\]
then for any x,
\[
x'Bx = (x_1 + x_2)^2 \ge 0
\]

so B is positive semidefinite.
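For a symmetric matrix, definiteness can also be read off from the signs of its eigenvalues. A minimal numerical sketch (not part of the handout; it assumes numpy, and the helper name definiteness is illustrative):

import numpy as np

def definiteness(M, tol=1e-10):
    # classify a symmetric matrix by the signs of its eigenvalues
    eig = np.linalg.eigvalsh(M)    # eigenvalues of a symmetric matrix, in ascending order
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig >= -tol):
        return "positive semi-definite"
    return "neither"

print(definiteness(np.eye(2)))                             # A above: positive definite
print(definiteness(np.array([[1.0, 1.0], [1.0, 1.0]])))    # B above: positive semi-definite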

Definition 7 The determinant of an n × n matrix A is defined by


\[
\begin{aligned}
\det A &= \sum_{i=1}^n (-1)^{i+1} a_{i1} \det A_{i1} \\
&\ \ \vdots \\
&= \sum_{i=1}^n (-1)^{i+n} a_{in} \det A_{in} \\
&= \sum_{j=1}^n (-1)^{1+j} a_{1j} \det A_{1j} \\
&\ \ \vdots \\
&= \sum_{j=1}^n (-1)^{n+j} a_{nj} \det A_{nj}
\end{aligned}
\]

where A_{ij} is the matrix obtained by deleting the i-th row and j-th column of A. It is also equivalent to define
\[
\det A = \sum_{(k_1,\dots,k_n)} \varepsilon_{(k_1,\dots,k_n)}\, a_{1k_1} \cdots a_{nk_n}
\]
where (k_1,\dots,k_n) is a permutation of (1,\dots,n) and \varepsilon_{(k_1,\dots,k_n)} is 1 if the permutation is even and -1 if it is odd.
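A minimal recursive sketch of the cofactor expansion along the first column (illustrative only, not from the handout; cofactor expansion is far slower than np.linalg.det for large matrices):

import numpy as np

def det_cofactor(A):
    # determinant via cofactor expansion along the first column
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):                                          # 0-based row index
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)   # delete row i and column 1
        total += (-1) ** i * A[i, 0] * det_cofactor(minor)      # sign (-1)^{i+1} in 1-based terms
    return total

M = np.array([[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 1.0, 2.0]])
print(det_cofactor(M), np.linalg.det(M))   # both approximately -5.0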

Theorem 8 Let X be a random variable with cdf F_X. Then Y = F_X(X) is a random variable whose distribution is uniform on [0, 1], i.e.,
\[
F_Y(y) = \begin{cases} 0 & \text{if } y < 0 \\ y & \text{if } y \in [0,1] \\ 1 & \text{if } y > 1 \end{cases}
\]
Conversely, let F be a cdf and Y be a random variable whose distribution is uniform on [0, 1]. Then Z = F^{-1}(Y) has cdf F.

2006 spring Q4. Prove the above theorem.


Solution Assume F_X and F are strictly increasing for simplicity.[2]
\[
\Pr(Y \le y) = \Pr\big(F_X(X) \le y\big) = \Pr\big(X \le F_X^{-1}(y)\big) = F_X\big(F_X^{-1}(y)\big) = y \quad \text{for } 0 < y < 1
\]
\[
\Pr(Z \le z) = \Pr\big(F^{-1}(Y) \le z\big) = \Pr\big(Y \le F(z)\big) = F(z) \quad \text{since } F(z) \text{ is always between 0 and 1}
\]

Exercise. Show that E[F_X(X)] = \frac{1}{2}.
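A sketch of the solution: by Theorem 8, F_X(X) is uniform on [0, 1], so
\[
E[F_X(X)] = \int_0^1 y\,dy = \frac{1}{2}.
\]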

2005 mid Q3 & 2006 mid Q2.


\[
\mathrm{Var}(X_1, X_2) = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\]
Let X_3 \equiv X_2 - \frac{\rho\sigma_2}{\sigma_1} X_1. Show that \mathrm{Cov}(X_1, X_3) = 0.
Solution Let
\[
A = \begin{pmatrix} 1 & 0 \\ -\frac{\rho\sigma_2}{\sigma_1} & 1 \end{pmatrix}
\]
then
\[
\begin{pmatrix} X_1 \\ X_3 \end{pmatrix} = A \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
\]
The covariance between X_1 and X_3 is the (1,2) coordinate of \mathrm{Var}\begin{pmatrix} X_1 \\ X_3 \end{pmatrix}.
\[
\mathrm{Var}\begin{pmatrix} X_1 \\ X_3 \end{pmatrix} = A\,\mathrm{Var}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} A' = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2(1 - \rho^2) \end{pmatrix}
\]

2006 mid Q6. Let X be a random variable whose pdf is f_X(x). Let Y = \frac{X - \mu}{\sigma}. What is the pdf of Y?
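A sketch of the answer using the transformation result of Section 1 in one dimension (assuming \sigma > 0): the inverse map is x = w(y) = \sigma y + \mu with dw/dy = \sigma, so
\[
f_Y(y) = f_X(\sigma y + \mu)\,\sigma.
\]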

2006 mid Q7. Show that X and Y −E[Y |X] are uncorrelated, in other words, Cov(X, Y −E[Y |X]) =
0.
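A sketch using properties (4) and (5) above (not worked out in the handout): by iterated expectations, E[Y - E[Y|X]] = E[Y] - E[E[Y|X]] = 0, and
\[
E\big[X(Y - E[Y|X])\big] = E\Big[E\big[X(Y - E[Y|X])\,\big|\,X\big]\Big] = E\Big[X\big(E[Y|X] - E[Y|X]\big)\Big] = 0,
\]
so Cov(X, Y - E[Y|X]) = E[X(Y - E[Y|X])] - E[X]\,E[Y - E[Y|X]] = 0.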

[2] You should be able to prove it without this assumption.
