
DSC6132: Probability and Statistical Modelling

Lecture 4: Multivariate Random Variables

Dr. Joseph Nzabanita and Dr. Annie Uwimana

University of Rwanda
ACE-DS

Semester 1, 2020-2021

Outline

Two-dimensional random vectors
Multi-dimensional random vectors
Multivariate normal distribution

Introduction

The notion of random variables can be generalized to the case of two (or more) dimensions.

First, we consider the case of two-dimensional random vectors. Later, we extend the various definitions to the multidimensional case, which is straightforward.

Discrete two-dimensional random vectors

The joint probability function
$$p_{X,Y}(x_j, y_k) = P(\{X = x_j\} \cap \{Y = y_k\}) \equiv P(X = x_j, Y = y_k)$$
of the pair of discrete random variables $(X, Y)$, whose possible values are a (finite or countably infinite) set of points $(x_j, y_k)$ in the plane, has the following properties:
1. $p_{X,Y}(x_j, y_k) \ge 0$ for all $(x_j, y_k)$;
2. $\sum_{j=1}^{\infty} \sum_{k=1}^{\infty} p_{X,Y}(x_j, y_k) = 1$.

The joint distribution function is defined by
$$F_{X,Y}(x, y) = P(\{X \le x\} \cap \{Y \le y\}) = \sum_{x_j \le x} \sum_{y_k \le y} p_{X,Y}(x_j, y_k).$$

Example

Consider the joint probability function pX,Y given by the table

y\x −1 0 1
0 1/16 1/16 1/16
1 1/16 1/16 2/16
2 2/16 1/16 6/16

Check that the function $p_{X,Y}$ possesses the two properties of joint probability functions.

Find $F_{X,Y}(0, 1/2) = \cdots = 1/8$.
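
For readers who want to check this in code, here is a small numpy sketch (ours, not part of the slides) that encodes the table, verifies the two properties, and evaluates $F_{X,Y}(0, 1/2)$:

```python
import numpy as np

# Joint pmf from the table: rows are y in {0, 1, 2}, columns are x in {-1, 0, 1}.
p = np.array([[1, 1, 1],
              [1, 1, 2],
              [2, 1, 6]]) / 16
xs = np.array([-1, 0, 1])
ys = np.array([0, 1, 2])

# Property 1: nonnegativity. Property 2: total probability one.
assert (p >= 0).all()
assert np.isclose(p.sum(), 1.0)

# F(0, 1/2) = P(X <= 0, Y <= 1/2): sum the cells with x <= 0 and y <= 1/2.
F = p[np.ix_(ys <= 0.5, xs <= 0)].sum()
print(F)  # 0.125 = 1/8
```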

Marginal probability function

When the function $p_{X,Y}$ is summed over all possible values of $Y$ (resp., $X$), the resulting function is called the marginal probability function of $X$ (resp., $Y$). That is,
$$p_X(x_j) = \sum_{k=1}^{\infty} p_{X,Y}(x_j, y_k)$$
and
$$p_Y(y_k) = \sum_{j=1}^{\infty} p_{X,Y}(x_j, y_k).$$

Example (cont.)

We find the marginal probability functions

y\x −1 0 1 pY (y)
0 1/16 1/16 1/16 3/16
1 1/16 1/16 2/16 4/16
2 2/16 1/16 6/16 9/16
pX (x) 4/16 3/16 9/16 1

That is,

x −1 0 1
pX (x) 1/4 3/16 9/16

y 0 1 2
pY (y) 3/16 1/4 9/16

Independent random variables

Two discrete random variables, $X$ and $Y$, are said to be independent if and only if
$$p_{X,Y}(x_j, y_k) = p_X(x_j)\, p_Y(y_k)$$
for any point $(x_j, y_k)$.

Are the random variables $X$ and $Y$ in the previous example independent? Justify.
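
One way to justify the answer is to compare the joint table with the product of its marginals; a quick numpy sketch (ours, using the same table as above):

```python
import numpy as np

p = np.array([[1, 1, 1],
              [1, 1, 2],
              [2, 1, 6]]) / 16  # rows: y = 0, 1, 2; columns: x = -1, 0, 1

pX = p.sum(axis=0)  # marginal of X: (4/16, 3/16, 9/16)
pY = p.sum(axis=1)  # marginal of Y: (3/16, 4/16, 9/16)

# Independence would require p(x, y) = pX(x) pY(y) at every point.
print(np.allclose(p, np.outer(pY, pX)))  # False: X and Y are dependent
# For instance p(1, 2) = 6/16, while pX(1) pY(2) = (9/16)(9/16) = 81/256.
```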

Conditional probability function

Let $A_X$ be an event defined in terms of the random variable $X$. For instance, $A_X = \{X \ge 0\}$. We define the conditional probability function of $Y$, given the event $A_X$, by
$$p_Y(y \mid A_X) \equiv P(Y = y \mid A_X) = \frac{P(\{Y = y\} \cap A_X)}{P(A_X)} \quad \text{if } P(A_X) > 0.$$
Likewise, we define
$$p_X(x \mid A_Y) \equiv P(X = x \mid A_Y) = \frac{P(\{X = x\} \cap A_Y)}{P(A_Y)} \quad \text{if } P(A_Y) > 0.$$

If $X$ and $Y$ are independent discrete random variables, then we may write
$$p_X(x \mid A_Y) = p_X(x) \quad \text{and} \quad p_Y(y \mid A_X) = p_Y(y).$$

Example (cont.)

Let $A_X = \{X = 1\}$. Then we have
$$p_Y(y \mid X = 1) = \frac{P(\{Y = y\} \cap \{X = 1\})}{P(\{X = 1\})} = \frac{p_{X,Y}(1, y)}{p_X(1)} = \frac{16}{9}\, p_{X,Y}(1, y) = \begin{cases} 1/9 & \text{if } y = 0, \\ 2/9 & \text{if } y = 1, \\ 2/3 & \text{if } y = 2. \end{cases}$$

Continuous random vectors

Let $X$ and $Y$ be two continuous random variables. The generalization of the notion of density function to the two-dimensional case is the joint density function $f_{X,Y}$ of the pair $(X, Y)$. This function has the following properties:
1. $f_{X,Y}(x, y) \ge 0$ for any point $(x, y)$;
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$.

The joint distribution function is defined by
$$F_{X,Y}(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv,$$
and
$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y)$$
for any point $(x, y)$ at which the function $F_{X,Y}(x, y)$ is differentiable.
Example

Consider the function $f_{X,Y}$ defined by
$$f_{X,Y}(x, y) = c\,xy\,e^{-x^2 - y^2}, \quad x \ge 0,\ y \ge 0,$$
where $c$ is a constant.

We have that (i) $f_{X,Y}(x, y) \ge 0$ for any point $(x, y)$ with $x \ge 0, y \ge 0$ [$f_{X,Y}(x, y) \equiv 0$ elsewhere] and (ii)
$$\int_0^{\infty} \int_0^{\infty} c\,xy\,e^{-x^2 - y^2}\,dx\,dy = c/4.$$

So, this function is a valid joint density function if and only if the constant $c$ is equal to 4.

The joint distribution function of the pair $(X, Y)$ is given by
$$F_{X,Y}(x, y) = 4 \int_0^{y} \int_0^{x} uv\,e^{-u^2 - v^2}\,du\,dv = (1 - e^{-x^2})(1 - e^{-y^2})$$
for $x \ge 0, y \ge 0$, and $F_{X,Y}(x, y) \equiv 0$ elsewhere.
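
Both the value of $c$ and the closed-form $F_{X,Y}$ can be checked numerically; a sketch using scipy's dblquad (the test point is an arbitrary choice of ours):

```python
import numpy as np
from scipy import integrate

# Candidate density with c = 4; dblquad integrates f(y, x) over y (inner), x (outer).
f = lambda y, x: 4 * x * y * np.exp(-x**2 - y**2)

total, _ = integrate.dblquad(f, 0, np.inf, 0, np.inf)
print(total)  # ~1.0, confirming c = 4

# Compare numerical F(x, y) with the closed form (1 - e^{-x^2})(1 - e^{-y^2}).
x0, y0 = 1.2, 0.7
F_num, _ = integrate.dblquad(f, 0, x0, 0, y0)
print(F_num, (1 - np.exp(-x0**2)) * (1 - np.exp(-y0**2)))  # agree
```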


Marginal density functions

The marginal density functions of $X$ and $Y$ are defined by
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx.$$

Remark

We can easily generalize the previous definitions to the case of three or more random variables.

For instance, the joint density function of the random vector $(X, Y, Z)$ is a nonnegative function $f_{X,Y,Z}(x, y, z)$ such that
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\,dx\,dy\,dz = 1.$$
Moreover, the joint density function of the random vector $(X, Y)$ is obtained as follows:
$$f_{X,Y}(x, y) = \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\,dz.$$
Finally, the marginal density function of the random variable $X$ is given by
$$f_X(x) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\,dy\,dz.$$

Independence

The continuous random variables $X$ and $Y$ are said to be independent if and only if
$$f_{X,Y}(x, y) = f_X(x) f_Y(y)$$
for any point $(x, y)$.

Example (cont.)

We have
$$f_X(x) = \int_0^{\infty} 4xy\,e^{-x^2 - y^2}\,dy = 2x e^{-x^2} \int_0^{\infty} 2y e^{-y^2}\,dy = 2x e^{-x^2}, \quad \text{for } x \ge 0.$$
By symmetry,
$$f_Y(y) = 2y e^{-y^2}, \quad \text{for } y \ge 0.$$
Furthermore,
$$f_X(x) f_Y(y) = 4xy\,e^{-x^2 - y^2} = f_{X,Y}(x, y)$$
for any point $(x, y)$ (with $x \ge 0$ and $y \ge 0$). Thus, the random variables $X$ and $Y$ are independent.
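
The marginal $f_X$ can be checked the same way by integrating out $y$; a short sketch (the evaluation points are arbitrary choices of ours):

```python
import numpy as np
from scipy import integrate

f = lambda x, y: 4 * x * y * np.exp(-x**2 - y**2)  # joint density, x, y >= 0

# Integrate the joint density over y and compare with 2x e^{-x^2}.
for x in (0.5, 1.0, 1.5):
    fx_num, _ = integrate.quad(lambda y: f(x, y), 0, np.inf)
    print(fx_num, 2 * x * np.exp(-x**2))  # matches to numerical precision
```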

Conditional density function

The conditional density function $f_X(x \mid Y = y)$ is obtained by dividing the joint density function of $(X, Y)$, evaluated at the point $(x, y)$, by the marginal density function of $Y$ evaluated at the point $y$, that is,
$$f_X(x \mid Y = y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$
Likewise,
$$f_Y(y \mid X = x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$
If $X$ and $Y$ are two independent continuous random variables, then we have
$$f_X(x \mid Y = y) = f_X(x) \quad \text{and} \quad f_Y(y \mid X = x) = f_Y(y).$$

Covariance

The covariance of $X$ and $Y$ is defined by
$$\mathrm{COV}(X, Y) = \sigma_{X,Y} = E[(X - \mu_X)(Y - \mu_Y)],$$
or equivalently
$$\mathrm{COV}(X, Y) = E[XY] - \mu_X \mu_Y.$$

Covariance: Computation

If $X$ and $Y$ are discrete,
$$\mathrm{COV}(X, Y) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} (x_j - \mu_X)(y_k - \mu_Y)\, p_{X,Y}(x_j, y_k),$$
or
$$\mathrm{COV}(X, Y) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} x_j y_k\, p_{X,Y}(x_j, y_k) - \mu_X \mu_Y.$$
If $X$ and $Y$ are continuous,
$$\mathrm{COV}(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{X,Y}(x, y)\,dx\,dy,$$
or
$$\mathrm{COV}(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy f_{X,Y}(x, y)\,dx\,dy - \mu_X \mu_Y.$$

Remarks

1. If $X$ and $Y$ are two independent random variables, then
$$E[XY] = E[X]E[Y]$$
and
$$\mathrm{COV}(X, Y) = E[X]E[Y] - \mu_X \mu_Y = 0.$$
2. If the covariance of the random variables $X$ and $Y$ is equal to zero, they are not necessarily independent. Nevertheless, we can show that if $X$ and $Y$ are two random variables having a joint normal distribution, then $X$ and $Y$ are independent if and only if $\mathrm{COV}(X, Y) = 0$.

Remarks (cont.)

3. We have that $\mathrm{COV}(X, X) = E[X^2] - (E[X])^2 = \mathrm{VAR}[X]$. Thus, the variance is a particular case of the covariance. However, contrary to the variance, the covariance may be negative.
4. $\mathrm{COV}(X, Y) = \mathrm{COV}(Y, X)$.
5. $\mathrm{VAR}[aX + bY + c] = a^2\,\mathrm{VAR}[X] + b^2\,\mathrm{VAR}[Y] + 2ab\,\mathrm{COV}(X, Y)$.

Correlation coefficient

The correlation coefficient of $X$ and $Y$ is given by
$$\mathrm{CORR}(X, Y) = \rho_{X,Y} = \frac{\mathrm{COV}(X, Y)}{\sqrt{\mathrm{VAR}[X]\,\mathrm{VAR}[Y]}}.$$
We can show that $-1 \le \rho_{X,Y} \le 1$. Moreover, $\rho_{X,Y} = \pm 1$ if and only if we can write $Y = aX + b$, where $a \ne 0$.

More precisely, $\rho_{X,Y} = 1$ (resp., $-1$) if $a > 0$ (resp., $a < 0$). In fact, $\rho_{X,Y}$ is a measure of the linear relationship between $X$ and $Y$.

Example

Consider again the joint probability function given by the table

y\x −1 0 1 pY (y)
0 1/16 1/16 1/16 3/16
1 1/16 1/16 2/16 4/16
2 2/16 1/16 6/16 9/16
pX (x) 4/16 3/16 9/16 1

Calculate $\rho_{X,Y} = \cdots \approx 0.2012$.
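
The computation can be reproduced with a few lines of numpy (a sketch of ours, working directly from the table):

```python
import numpy as np

p = np.array([[1, 1, 1],
              [1, 1, 2],
              [2, 1, 6]]) / 16  # rows: y = 0, 1, 2; columns: x = -1, 0, 1
X, Y = np.meshgrid([-1, 0, 1], [0, 1, 2])  # value grids aligned with the table

EX, EY = (X * p).sum(), (Y * p).sum()
cov = (X * Y * p).sum() - EX * EY
varX = (X**2 * p).sum() - EX**2
varY = (Y**2 * p).sum() - EY**2
print(cov / np.sqrt(varX * varY))  # ~0.2012
```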

Random vectors

A random vector is defined as $x = (X_1, \ldots, X_p)'$, where the components $X_i$ are univariate random variables.

In other words, a random vector is a vector whose individual elements are random variables. Similarly, we define a random matrix as a matrix whose individual elements are random variables. The expected value of a random vector or matrix is taken element-wise.

Joint distribution

The distribution function of $x$ is the function $F : \mathbb{R}^p \to [0, 1]$ defined by
$$F(x) = P(X_1 \le x_1, \ldots, X_p \le x_p). \qquad (1)$$

We may also use the term joint distribution of $x$, in which case we actually refer to the components of $x$, i.e., the random variables $X_1, \ldots, X_p$. Alternative notations are $F(x_1, \ldots, x_p)$, $F_{X_1,\ldots,X_p}(x_1, \ldots, x_p)$, $F_x(x)$, etc.; the subscripted form is used when the joint distribution function of $X_1, \ldots, X_p$ must be distinguished from other distribution functions.

Properties of a distribution function

Let $x = (X_1, \ldots, X_p)'$ be a random vector.
1. $F_{X_1,\ldots,X_p}(x_1, \ldots, x_p)$ is monotonically non-decreasing in each of its variables.
2. $F_{X_1,\ldots,X_p}(x_1, \ldots, x_p)$ is right-continuous in each of its variables.
3. $0 \le F(x_1, \ldots, x_p) \le 1$.
4. $\lim_{x_1,\ldots,x_p \to +\infty} F(x_1, \ldots, x_p) = 1$ and $\lim_{x_i \to -\infty} F(x_1, \ldots, x_p) = 0$ for all $i$.

Joint probability density

For a random vector $x$, the distribution function always exists.

Using the fundamental theorem of calculus, when it applies, one may obtain from the distribution function the (joint) probability density function (p.d.f.) of the random vector $x$, which we shall denote $f_x(x)$ or $f_{X_1,\ldots,X_p}(x_1, \ldots, x_p)$.

In this course, we shall always assume that the density function exists for a random vector $x$.

Joint probability density (cont.)

The distribution of a random vector $x = (X_1, \ldots, X_p)'$ is called absolutely continuous if there exists a nonnegative integrable function $f$ on $\mathbb{R}^p$, called the joint density (or joint probability density function), such that the distribution function $F$ of $x$ can be written as the integral
$$F(x_1, \ldots, x_p) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} f(u_1, \ldots, u_p)\,du_1 \cdots du_p, \qquad (2)$$
or in short notation
$$F(x) = \int_{-\infty}^{x} f(u)\,du. \qquad (3)$$

On the set of continuity of $f$,
$$f(x_1, \ldots, x_p) = \frac{\partial^p}{\partial x_1 \cdots \partial x_p} F(x_1, \ldots, x_p). \qquad (4)$$

Partitioning and Marginal distributions

The random vector $x$ can be partitioned into two mutually exclusive subsets,
$$x = \begin{pmatrix} u \\ v \end{pmatrix},$$
where $u$ is $(q \times 1)$, $v$ is $(s \times 1)$, and $p = q + s$.

Thus $u$ and $v$ are also random vectors, but of lower dimension than $x$.

The joint distribution function $F_u(u)$ for $u$ can be obtained from $F_x(x)$ by integrating the joint density $f_x(x)$ over the entire range of the variables in $v$.

Partitioning and Marginal distributions (cont.)

The joint distribution function $F_u(x_1, \ldots, x_q)$ for $u$ is given by
$$\begin{aligned}
F_u(x_1, \ldots, x_q) &= F_{X_1,\ldots,X_q}(x_1, \ldots, x_q) \\
&= \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_q} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p)\,dx_1 \cdots dx_p \\
&= \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_q} f_{X_1,\ldots,X_q}(x_1, \ldots, x_q)\,dx_1 \cdots dx_q,
\end{aligned}$$
where $f_{X_1,\ldots,X_q}(x_1, \ldots, x_q)$ is the joint density for $u$ given by
$$f_{X_1,\ldots,X_q}(x_1, \ldots, x_q) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p)\,dx_{q+1} \cdots dx_p.$$

Partitioning and Marginal distributions (cont.)

Also, $F_u(x_1, \ldots, x_q)$ can be obtained as
$$F_u(x_1, \ldots, x_q) = \lim_{x_{q+1},\ldots,x_p \to \infty} F_{X_1,\ldots,X_p}(x_1, \ldots, x_p) \stackrel{\text{def}}{=} F_{X_1,\ldots,X_p}(x_1, \ldots, x_q, \infty, \ldots, \infty). \qquad (5)$$

Special cases are marginal distributions and marginal densities.

(a) Marginal distribution of $X_i$:
$$F_{X_i}(x_i) = F_{X_1,\ldots,X_p}(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty). \qquad (6)$$

(b) Marginal p.d.f. of $X_i$:
$$f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_p. \qquad (7)$$

Conditional distributions and independence

Let
$$x = \begin{pmatrix} u \\ v \end{pmatrix}$$
be the partitioning of the random vector $x$ as above. The conditional density of $v$ given $u$ is obtained from $f_x(x)$ by
$$f_{v|u}(v \mid u = u) = \frac{f_x(x)}{f_u(u)} = \frac{f_x(x_1, \ldots, x_p)}{f_u(x_1, \ldots, x_q)}. \qquad (8)$$
The two random vectors $u$ and $v$ are independent if and only if
$$f_{v|u}(v \mid u = u) = f_v(v) \quad \text{for all } u \text{ and all } v, \qquad (9)$$
or equivalently
$$f_x(x) = f_u(u) f_v(v). \qquad (10)$$
Sometimes the notation $f_{v|u}(v \mid u)$ is used instead of $f_{v|u}(v \mid u = u)$.
Mean vector

The mean vector of the random vector $x = (X_1, \ldots, X_p)'$ is given by
$$\mu = E[x] = E\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_p] \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}.$$

Covariance matrix

The covariance matrix is defined as
$$\Sigma = \mathrm{cov}(x) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}. \qquad (11)$$

The diagonal elements are $\sigma_{jj} = \sigma_j^2 = \mathrm{var}(X_j)$, and the off-diagonal elements are $\sigma_{jk} = \mathrm{cov}(X_j, X_k)$, i.e., the covariances of all possible pairs of variables. The population covariance matrix in (11) can also be written as
$$\Sigma = E\left[(x - \mu)(x - \mu)'\right]. \qquad (12)$$

Note that the covariance matrix is symmetric and always positive semi-definite.

Correlation matrix

The correlation matrix is given by
$$P_\rho = (\rho_{jk}) = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & \rho_{pp} \end{pmatrix}, \qquad (13)$$
where $\rho_{jk} = \dfrac{\sigma_{jk}}{\sqrt{\sigma_{jj}\sigma_{kk}}} = \dfrac{\sigma_{jk}}{\sigma_j \sigma_k}$ for $j \ne k$, and $\rho_{jj} = 1$.

The correlation matrix can be obtained from the covariance matrix, and vice versa. Define
$$D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p) = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_p \end{pmatrix}. \qquad (14)$$

Then $P_\rho = D^{-1} \Sigma D^{-1}$ and $\Sigma = D P_\rho D$.
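
The relations $P_\rho = D^{-1}\Sigma D^{-1}$ and $\Sigma = D P_\rho D$ translate directly into numpy; a sketch with an illustrative covariance matrix (the numbers are ours, not from the lecture):

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 9.0, 3.0],
                  [1.0, 3.0, 16.0]])  # illustrative covariance matrix

D = np.diag(np.sqrt(np.diag(Sigma)))  # D = diag(sigma_1, ..., sigma_p)
Dinv = np.linalg.inv(D)
P = Dinv @ Sigma @ Dinv               # correlation matrix
print(P)
print(np.allclose(D @ P @ D, Sigma))  # Sigma is recovered: True
```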


Theorem

Let $x$ be a $(p \times 1)$ random vector with covariance matrix $\Sigma_x$. Define a new random vector
$$y = Ax + b \quad (m \times 1),$$
where $A$ is a fixed $(m \times p)$ matrix and $b$ is a fixed $(m \times 1)$ vector. Then
$$E(y) = A\,E(x) + b, \qquad \Sigma_y = A \Sigma_x A'.$$

Proof
We have
$$\begin{aligned}
\Sigma_y &= E[(y - E(y))(y - E(y))'] \\
&= E[(Ax + b - A\,E(x) - b)(Ax + b - A\,E(x) - b)'] \\
&= E[A(x - E(x))(x - E(x))' A'] = A\,E[(x - E(x))(x - E(x))']\,A' = A \Sigma_x A',
\end{aligned}$$
where we have used $(AB)' = B'A'$.
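
The theorem is also easy to confirm by simulation; the following sketch (with illustrative $A$, $b$ and $\Sigma_x$ of ours) compares the sample covariance of $y = Ax + b$ with $A\Sigma_x A'$:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma_x = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
A = np.array([[1.0, -1.0],
              [0.0, 2.0],
              [3.0, 1.0]])           # m x p with m = 3, p = 2
b = np.array([1.0, 2.0, 3.0])

x = rng.multivariate_normal(np.zeros(2), Sigma_x, size=200_000)
y = x @ A.T + b                      # y = Ax + b, row by row

print(np.cov(y, rowvar=False))       # ~ A Sigma_x A'
print(A @ Sigma_x @ A.T)
```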

Special case

Let $x_1, \ldots, x_p$ be random variables. A linear combination of these is given as
$$y = \sum_{i=1}^{p} a_i x_i = (a_1, \ldots, a_p)\,x = a'x,$$
where $x = (x_1, \ldots, x_p)'$ and $a = (a_1, \ldots, a_p)'$. Then
$$\mathrm{var}(y) = \sigma_y^2 = a' \Sigma_x a \quad (1 \times 1).$$

Since $\mathrm{var}(y) \ge 0$ for every choice of $a$, we see that $\Sigma_x$ is positive semi-definite, and positive definite if $\mathrm{var}(y) > 0$ for all $a \ne 0$.

Examples

Example 1
Let $f : \mathbb{R}^3 \to \mathbb{R}$ be given by
$$f(x_1, x_2, x_3) = \begin{cases} c\,x_1(x_2 + x_3), & 0 \le x_1, x_2, x_3 \le 1 \\ 0, & \text{otherwise.} \end{cases}$$
(a) Find the constant $c$ so that $f$ is a joint density function of some random variables $X_1$, $X_2$ and $X_3$.
(b) Find the joint distribution function $F(x_1, x_2, x_3)$ of $X_1$, $X_2$ and $X_3$, and use it to determine $P(X_1 \le 0.5, X_2 \le 0.5, X_3 \le 0.5)$.
(c) Find the marginal density function of $X_1$ and $E[X_1]$.
(d) Find the conditional density for $(X_2, X_3)$ given $X_1 = x_1$. Is $X_1$ independent of $X_2$ and $X_3$?


Solution:
(a) We have to find $c$ such that
$$\int_0^1 \int_0^1 \int_0^1 f(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 = 1.$$
Integrating, we get
$$\int_0^1 \int_0^1 \int_0^1 f(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 = c \int_0^1 \int_0^1 \int_0^1 x_1(x_2 + x_3)\,dx_1\,dx_2\,dx_3 = c/2.$$
So $c/2 = 1$, i.e., $c = 2$, and thus
$$f(x_1, x_2, x_3) = \begin{cases} 2\,x_1(x_2 + x_3), & 0 \le x_1, x_2, x_3 \le 1 \\ 0, & \text{otherwise} \end{cases}$$
is a joint density function of some random variables $X_1$, $X_2$ and $X_3$.

(b) By the definition of the distribution function, for $0 \le x_1, x_2, x_3 \le 1$,
$$\begin{aligned}
F(x_1, x_2, x_3) &= \int_0^{x_1} \int_0^{x_2} \int_0^{x_3} f(u_1, u_2, u_3)\,du_1\,du_2\,du_3 \\
&= \int_0^{x_1} \int_0^{x_2} \int_0^{x_3} 2 u_1 (u_2 + u_3)\,du_1\,du_2\,du_3 \\
&= 2 \left( \int_0^{x_1} u_1\,du_1 \right) \left( \int_0^{x_2} \int_0^{x_3} (u_2 + u_3)\,du_2\,du_3 \right) \\
&= \frac{x_1^2}{2} \left( x_2^2 x_3 + x_2 x_3^2 \right).
\end{aligned}$$
So,
$$F(x_1, x_2, x_3) = \begin{cases} 0, & x_1, x_2, x_3 < 0 \\ \frac{x_1^2}{2}(x_2^2 x_3 + x_2 x_3^2), & 0 \le x_1, x_2, x_3 \le 1 \\ 1, & x_1, x_2, x_3 > 1. \end{cases}$$


Thus,
$$P(X_1 \le 0.5, X_2 \le 0.5, X_3 \le 0.5) = F(0.5, 0.5, 0.5) = 1/32 = 0.03125.$$
(c) The marginal density function of $X_1$ is given by
$$f_{X_1}(x_1) = \int_0^1 \int_0^1 f(x_1, x_2, x_3)\,dx_2\,dx_3 = 2x_1,$$
and
$$E[X_1] = \int_0^1 x_1 f_{X_1}(x_1)\,dx_1 = 2 \int_0^1 x_1^2\,dx_1 = 2/3.$$
(d) The conditional density for $(X_2, X_3)$ given $X_1 = x_1$ is
$$f_{X_2,X_3|X_1}(x_2, x_3 \mid X_1 = x_1) = \frac{f(x_1, x_2, x_3)}{f_{X_1}(x_1)} = x_2 + x_3.$$
Since this does not depend on $x_1$, $X_1$ is independent of $X_2$ and $X_3$.
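
Parts (a) and (b) are easy to double-check numerically; a sketch using scipy's tplquad:

```python
import numpy as np
from scipy import integrate

# tplquad expects f(z, y, x); here z = x3, y = x2, x = x1.
f = lambda x3, x2, x1: 2 * x1 * (x2 + x3)

total, _ = integrate.tplquad(f, 0, 1, 0, 1, 0, 1)
print(total)  # ~1.0, so c = 2 normalizes the density

F_half, _ = integrate.tplquad(f, 0, 0.5, 0, 0.5, 0, 0.5)
print(F_half)  # ~0.03125 = 1/32
```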
Multivariate normal distribution

A random vector $y = (y_1, \ldots, y_p)'$ has a p-dimensional multivariate normal distribution if its density is given by
$$g(y) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-(y - \mu)' \Sigma^{-1} (y - \mu)/2}, \qquad (15)$$
where the elements of $y$ range over $(-\infty, \infty)$, $E[y] = \mu$, $\mathrm{var}(y) = \Sigma$ and $\mathrm{rank}(\Sigma) = p$.
We say that $y$ is distributed as $N_p(\mu, \Sigma)$, or $y$ is $N_p(\mu, \Sigma)$, and sometimes write $y \sim N_p(\mu, \Sigma)$.
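
As a sanity check, the density (15) can be compared with scipy.stats.multivariate_normal; a sketch with illustrative parameters of ours:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])       # illustrative mean and covariance
y = np.array([0.5, 2.5])

print(multivariate_normal(mu, Sigma).pdf(y))  # library value

d = y - mu
q = d @ np.linalg.solve(Sigma, d)    # (y - mu)' Sigma^{-1} (y - mu)
p = len(mu)
print(np.exp(-q / 2) / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5))
```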


If $p = 1$, we get the usual univariate normal density
$$g(y_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-(y_1 - \mu_1)^2 / 2\sigma_1^2}, \qquad (16)$$
and we write $y_1 \sim N(\mu_1, \sigma_1^2)$.

The quantity $(y - \mu)' \Sigma^{-1} (y - \mu)$ in the exponent of the multivariate normal density (15) is the squared generalized distance from $y$ to $\mu$, also called the Mahalanobis distance,
$$\Delta^2 = (y - \mu)' \Sigma^{-1} (y - \mu). \qquad (17)$$
In the univariate case this squared distance becomes
$$\delta^2 = (y_1 - \mu_1)' \sigma_{11}^{-1} (y_1 - \mu_1) = (y_1 - \mu_1)^2 / \sigma_1^2. \qquad (18)$$


The determinant of the covariance matrix, $|\Sigma|$, is the generalized population variance. If $\sigma^2$ is small in the univariate normal, the $y$ values are concentrated near the mean. Similarly, a small value of $|\Sigma|$ in the multivariate case indicates that the $y$'s are concentrated close to $\mu$ in p-space or that there is multicollinearity among the variables. The term multicollinearity indicates that the variables are highly intercorrelated, in which case the effective dimensionality is less than $p$. Figure 1 shows the familiar bell-shaped curves of $N(5, 0.5)$ and $N(5, 1)$, and Figure 2 shows a bivariate normal density surface.


Figure 1: The normal density curves: N (5, 0.5) and N (5, 1)


Figure 2: A bivariate normal density surface

Properties of multivariate normal random variables

Consider $y = (y_1, y_2, \ldots, y_p)' \sim N_p(\mu, \Sigma)$.
1. Normality of linear combinations of the variables in y:
(a) If $a$ is a vector of constants, the linear function $a'y$ is univariate normal:
$$a'y \sim N_1(a'\mu, a'\Sigma a). \qquad (19)$$
(b) If $A$ is a constant $q \times p$ matrix of rank $q$, where $q \le p$, the $q$ linear combinations in $Ay$ have a multivariate normal distribution:
$$Ay \sim N_q(A\mu, A\Sigma A'). \qquad (20)$$
2. Standardized variables:
A standardized vector $z$ can be obtained in two ways:
$$z = (T')^{-1}(y - \mu), \qquad (21)$$
where $T$ is obtained from the Cholesky factorization $\Sigma = T'T$, or
$$z = (\Sigma^{1/2})^{-1}(y - \mu), \qquad (22)$$

where $\Sigma^{1/2}$ is the symmetric square root matrix of $\Sigma$. In either case, it follows from Property 1(b) that $z$ is multivariate normal:
$$z \sim N_p(0, I). \qquad (23)$$
3. Chi-square distribution:
A chi-square random variable with $p$ degrees of freedom is defined as the sum of squares of $p$ independent standard normal random variables. Thus, if $z$ is the standardized vector defined in (21) or (22), then $\sum_{i=1}^{p} z_i^2 = z'z$ has the $\chi^2$-distribution with $p$ degrees of freedom, denoted $\chi^2_p$ or $\chi^2(p)$. From either (21) or (22) we obtain $z'z = (y - \mu)' \Sigma^{-1} (y - \mu)$. Hence,
$$(y - \mu)' \Sigma^{-1} (y - \mu) \sim \chi^2(p). \qquad (24)$$
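
Properties 2 and 3 can be illustrated by simulation; a sketch (with an assumed $\Sigma$ of ours) that standardizes via the Cholesky factor as in (21) and checks the $\chi^2$ mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])  # illustrative parameters
y = rng.multivariate_normal(mu, Sigma, size=100_000)

# Cholesky: numpy returns lower-triangular L with Sigma = L L', so T = L'
# satisfies Sigma = T'T and z = (T')^{-1}(y - mu) = L^{-1}(y - mu).
L = np.linalg.cholesky(Sigma)
z = np.linalg.solve(L, (y - mu).T).T
print(np.cov(z, rowvar=False).round(2))   # ~ identity matrix

q = np.einsum('ij,ij->i', z, z)           # squared Mahalanobis distances
print(q.mean(), stats.chi2(3).mean())     # both ~ 3, consistent with (24)
```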


4. Normality of marginal distributions:
(a) Any subset of the y's in $y$ has a multivariate normal distribution, with mean vector consisting of the corresponding subvector of $\mu$ and covariance matrix composed of the corresponding submatrix of $\Sigma$. To illustrate, let $y_1 = (y_1, y_2, \ldots, y_r)'$ denote the subvector containing the first $r$ elements of $y$, and $y_2 = (y_{r+1}, \ldots, y_p)'$ the remaining $p - r$ elements. Thus $y$, $\mu$, and $\Sigma$ are partitioned as
$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
where, for instance, $y_1$ and $\mu_1$ are $r \times 1$ and $\Sigma_{11}$ is $r \times r$. Then
$$y_1 \sim N_r(\mu_1, \Sigma_{11}). \qquad (25)$$
Here, again, $E(y_1) = \mu_1$ and $\mathrm{cov}(y_1) = \Sigma_{11}$ hold for any random vector partitioned in this way. But if $y$ is p-variate normal, then $y_1$ is r-variate normal.

(b) As a special case of the preceding result, each $y_j$ in $y$ has the univariate normal distribution:
$$y_j \sim N(\mu_j, \sigma_{jj}), \quad j = 1, 2, \ldots, p.$$
The converse of this is not true: if the density of each $y_j$ in $y$ is normal, it does not necessarily follow that $y$ is multivariate normal.

In the next three properties, let the observation vector be partitioned into two subvectors denoted by $y$ and $x$, where $y$ is $p \times 1$ and $x$ is $q \times 1$. Or, alternatively, let $x$ represent some additional variables to be considered along with those in $y$. Then
$$E\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \qquad \mathrm{cov}\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix}.$$


We assume, in addition, that
$$\begin{pmatrix} y \\ x \end{pmatrix} \sim N_{p+q}\left( \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} \right).$$
5. Independence:
(a) The subvectors $y$ and $x$ are independent if $\Sigma_{yx} = 0$.
(b) Two individual variables $y_j$ and $y_k$ are independent if $\sigma_{jk} = 0$.
Note that this is generally not true for nonnormal random variables.
6. Conditional distribution:
If $y$ and $x$ are not independent, then $\Sigma_{yx} \ne 0$, and the conditional distribution of $y$ given $x$, $f(y \mid x)$, is multivariate normal with
$$E(y \mid x) = \mu_y + \Sigma_{yx} \Sigma_{xx}^{-1} (x - \mu_x), \qquad (26)$$
$$\mathrm{cov}(y \mid x) = \Sigma_{yy} - \Sigma_{yx} \Sigma_{xx}^{-1} \Sigma_{xy}. \qquad (27)$$

Note that $E(y \mid x)$ is a vector of linear functions of $x$, whereas $\mathrm{cov}(y \mid x)$ is a matrix that does not depend on $x$. The linear trend in (26) holds for any pair of variables; it is called the multivariate regression function. The matrix $\Sigma_{yx}\Sigma_{xx}^{-1}$ is called the matrix of regression coefficients.
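
Equations (26) and (27) translate directly into numpy; a sketch with illustrative partitioned parameters of ours:

```python
import numpy as np

# Partitioned parameters for (y, x) jointly normal (illustrative values).
mu_y, mu_x = np.array([1.0]), np.array([0.0, 2.0])
S_yy = np.array([[2.0]])
S_yx = np.array([[0.5, 0.3]])
S_xx = np.array([[1.0, 0.2],
                 [0.2, 1.5]])
x_obs = np.array([0.5, 1.0])

B = S_yx @ np.linalg.inv(S_xx)         # matrix of regression coefficients
cond_mean = mu_y + B @ (x_obs - mu_x)  # E(y | x), eq. (26)
cond_cov = S_yy - B @ S_yx.T           # cov(y | x), eq. (27), since S_xy = S_yx'
print(cond_mean, cond_cov)
```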
7. Distribution of the sum of two subvectors:
If $y$ and $x$ are the same size (both $p \times 1$) and independent, then
$$y + x \sim N_p(\mu_y + \mu_x, \Sigma_{yy} + \Sigma_{xx}), \qquad (28)$$
$$y - x \sim N_p(\mu_y - \mu_x, \Sigma_{yy} + \Sigma_{xx}). \qquad (29)$$
Example

 
Let $y' = (y_1, y_2)$ with $y \sim N_2(\mu, \Sigma)$, where $\mu' = (0, 0)$ and
$$\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Define a random vector $x' = (x_1, x_2)$ by
$$x_1 = y_1 - 3y_2, \qquad x_2 = y_1 + c\,y_2.$$
(a) Find the mean vector and the covariance matrix of $x$, and give its distribution.
(b) For which value of $c$ are the random variables $x_1$ and $x_2$ independent?
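
A short sketch using the theorem on linear transformations ($\Sigma_x = A\Sigma_y A'$ with $\Sigma_y = I_2$) makes the dependence on $c$ visible:

```python
import numpy as np

def cov_x(c):
    """Covariance of x = A y for x1 = y1 - 3 y2, x2 = y1 + c y2, Sigma_y = I."""
    A = np.array([[1.0, -3.0],
                  [1.0, c]])
    return A @ A.T  # A Sigma_y A' with Sigma_y the 2 x 2 identity

print(cov_x(1.0))    # off-diagonal entry is 1 - 3c; nonzero here, so dependent
print(cov_x(1 / 3))  # off-diagonal vanishes: independence under joint normality
```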

Exercises

Separate sheet!
