
DSC6132: Probability and Statistical Modelling

Lecture 4: Multivariate Random Variables

Dr. Joseph Nzabanita and Dr. Annie Uwimana

University of Rwanda
ACE-DS

Semester 1, 2020-2021

Outline

Two-dimensional random vectors
Multi-dimensional random vectors
Multivariate normal distribution

Introduction

The notion of random variables can be generalized to the case of two (or more) dimensions.

First, we consider the case of two-dimensional random vectors. Later, we extend the various definitions to the multidimensional case, which is straightforward.

Discrete two-dimensional random vectors

The joint probability function
$$p_{X,Y}(x_j, y_k) = P(\{X = x_j\} \cap \{Y = y_k\}) \equiv P(X = x_j, Y = y_k)$$
of the pair of discrete random variables $(X, Y)$, whose possible values are a (finite or countably infinite) set of points $(x_j, y_k)$ in the plane, has the following properties:
1. $p_{X,Y}(x_j, y_k) \ge 0$ for all $(x_j, y_k)$;
2. $\sum_{j=1}^{\infty} \sum_{k=1}^{\infty} p_{X,Y}(x_j, y_k) = 1$.

The joint distribution function is defined by
$$F_{X,Y}(x, y) = P(\{X \le x\} \cap \{Y \le y\}) = \sum_{x_j \le x} \sum_{y_k \le y} p_{X,Y}(x_j, y_k).$$

Example

Consider the joint probability function pX,Y given by the table

y\x −1 0 1
0 1/16 1/16 1/16
1 1/16 1/16 2/16
2 2/16 1/16 6/16

Check that the function $p_{X,Y}$ possesses the two properties of joint probability functions.

Find $F_{X,Y}(0, 1/2) = \cdots = 1/8$.
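
For readers who want to check this in code, here is a small numpy sketch (ours, not part of the slides) that encodes the table, verifies the two properties, and evaluates $F_{X,Y}(0, 1/2)$:

```python
import numpy as np

# Joint pmf from the table: rows are y in {0, 1, 2}, columns are x in {-1, 0, 1}.
p = np.array([[1, 1, 1],
              [1, 1, 2],
              [2, 1, 6]]) / 16
xs = np.array([-1, 0, 1])
ys = np.array([0, 1, 2])

# Property 1: nonnegativity. Property 2: total probability one.
assert (p >= 0).all()
assert np.isclose(p.sum(), 1.0)

# F(0, 1/2) = P(X <= 0, Y <= 1/2): sum the cells with x <= 0 and y <= 1/2.
F = p[np.ix_(ys <= 0.5, xs <= 0)].sum()
print(F)  # 0.125 = 1/8
```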

Marginal probability function

When the function $p_{X,Y}$ is summed over all possible values of $Y$ (resp., $X$), the resulting function is called the marginal probability function of $X$ (resp., $Y$). That is,
$$p_X(x_j) = \sum_{k=1}^{\infty} p_{X,Y}(x_j, y_k)$$
and
$$p_Y(y_k) = \sum_{j=1}^{\infty} p_{X,Y}(x_j, y_k).$$

Example (cont.)

We find the marginal probability functions

y\x −1 0 1 pY (y)
0 1/16 1/16 1/16 3/16
1 1/16 1/16 2/16 4/16
2 2/16 1/16 6/16 9/16
pX (x) 4/16 3/16 9/16 1

That is,

x −1 0 1
pX (x) 1/4 3/16 9/16

y 0 1 2
pY (y) 3/16 1/4 9/16

Independent random variables

Two discrete random variables, $X$ and $Y$, are said to be independent if and only if
$$p_{X,Y}(x_j, y_k) = p_X(x_j)\, p_Y(y_k)$$
for any point $(x_j, y_k)$.

Are the random variables $X$ and $Y$ in the previous example independent? Justify.
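
One way to justify the answer is to compare the joint table with the product of its marginals; a quick numpy sketch (ours, using the same table as above):

```python
import numpy as np

p = np.array([[1, 1, 1],
              [1, 1, 2],
              [2, 1, 6]]) / 16  # rows: y = 0, 1, 2; columns: x = -1, 0, 1

pX = p.sum(axis=0)  # marginal of X: (4/16, 3/16, 9/16)
pY = p.sum(axis=1)  # marginal of Y: (3/16, 4/16, 9/16)

# Independence would require p(x, y) = pX(x) pY(y) at every point.
print(np.allclose(p, np.outer(pY, pX)))  # False: X and Y are dependent
# For instance p(1, 2) = 6/16, while pX(1) pY(2) = (9/16)(9/16) = 81/256.
```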

Conditional probability function

Let $A_X$ be an event defined in terms of the random variable $X$. For instance, $A_X = \{X \ge 0\}$. We define the conditional probability function of $Y$, given the event $A_X$, by
$$p_Y(y \mid A_X) \equiv P(Y = y \mid A_X) = \frac{P(\{Y = y\} \cap A_X)}{P(A_X)} \quad \text{if } P(A_X) > 0.$$
Likewise, we define
$$p_X(x \mid A_Y) \equiv P(X = x \mid A_Y) = \frac{P(\{X = x\} \cap A_Y)}{P(A_Y)} \quad \text{if } P(A_Y) > 0.$$

If $X$ and $Y$ are independent discrete random variables, then we may write
$$p_X(x \mid A_Y) = p_X(x) \quad \text{and} \quad p_Y(y \mid A_X) = p_Y(y).$$

Example (cont.)

Let $A_X = \{X = 1\}$. Then we have
$$p_Y(y \mid X = 1) = \frac{P(\{Y = y\} \cap \{X = 1\})}{P(\{X = 1\})} = \frac{p_{X,Y}(1, y)}{p_X(1)} = \frac{16}{9}\, p_{X,Y}(1, y) = \begin{cases} 1/9 & \text{if } y = 0, \\ 2/9 & \text{if } y = 1, \\ 2/3 & \text{if } y = 2. \end{cases}$$

Continuous random vectors

Let $X$ and $Y$ be two continuous random variables. The generalization of the notion of density function to the two-dimensional case is the joint density function $f_{X,Y}$ of the pair $(X, Y)$. This function has the following properties:
1. $f_{X,Y}(x, y) \ge 0$ for any point $(x, y)$;
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$.

The joint distribution function is defined by
$$F_{X,Y}(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv,$$
and
$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y)$$
for any point $(x, y)$ at which the function $F_{X,Y}(x, y)$ is differentiable.
Example

Consider the function $f_{X,Y}$ defined by
$$f_{X,Y}(x, y) = c\,xy\,e^{-x^2 - y^2}, \quad x \ge 0,\ y \ge 0,$$
where $c$ is a constant.

We have that (i) $f_{X,Y}(x, y) \ge 0$ for any point $(x, y)$ with $x \ge 0, y \ge 0$ [$f_{X,Y}(x, y) \equiv 0$ elsewhere] and (ii)
$$\int_0^{\infty} \int_0^{\infty} c\,xy\,e^{-x^2 - y^2}\,dx\,dy = c/4.$$

So, this function is a valid joint density function if and only if the constant $c$ is equal to 4.

The joint distribution function of the pair $(X, Y)$ is given by
$$F_{X,Y}(x, y) = 4 \int_0^{y} \int_0^{x} uv\,e^{-u^2 - v^2}\,du\,dv = (1 - e^{-x^2})(1 - e^{-y^2})$$
for $x \ge 0, y \ge 0$, and $F_{X,Y}(x, y) \equiv 0$ elsewhere.
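
Both the value of $c$ and the closed-form $F_{X,Y}$ can be checked numerically; a sketch using scipy's dblquad (the test point is an arbitrary choice of ours):

```python
import numpy as np
from scipy import integrate

# Candidate density with c = 4; dblquad integrates f(y, x) over y (inner), x (outer).
f = lambda y, x: 4 * x * y * np.exp(-x**2 - y**2)

total, _ = integrate.dblquad(f, 0, np.inf, 0, np.inf)
print(total)  # ~1.0, confirming c = 4

# Compare numerical F(x, y) with the closed form (1 - e^{-x^2})(1 - e^{-y^2}).
x0, y0 = 1.2, 0.7
F_num, _ = integrate.dblquad(f, 0, x0, 0, y0)
print(F_num, (1 - np.exp(-x0**2)) * (1 - np.exp(-y0**2)))  # agree
```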


Marginal density functions

The marginal density functions of $X$ and $Y$ are defined by
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx.$$

Remark

We can easily generalize the previous definitions to the case of three or more random variables.

For instance, the joint density function of the random vector $(X, Y, Z)$ is a nonnegative function $f_{X,Y,Z}(x, y, z)$ such that
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\,dx\,dy\,dz = 1.$$
Moreover, the joint density function of the random vector $(X, Y)$ is obtained as follows:
$$f_{X,Y}(x, y) = \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\,dz.$$
Finally, the marginal density function of the random variable $X$ is given by
$$f_X(x) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\,dy\,dz.$$

Independence

The continuous random variables $X$ and $Y$ are said to be independent if and only if
$$f_{X,Y}(x, y) = f_X(x) f_Y(y)$$
for any point $(x, y)$.

Example (cont.)

We have
$$f_X(x) = \int_0^{\infty} 4xy\,e^{-x^2 - y^2}\,dy = 2x e^{-x^2} \int_0^{\infty} 2y e^{-y^2}\,dy = 2x e^{-x^2}, \quad \text{for } x \ge 0.$$
By symmetry,
$$f_Y(y) = 2y e^{-y^2}, \quad \text{for } y \ge 0.$$
Furthermore,
$$f_X(x) f_Y(y) = 4xy\,e^{-x^2 - y^2} = f_{X,Y}(x, y)$$
for any point $(x, y)$ (with $x \ge 0$ and $y \ge 0$). Thus, the random variables $X$ and $Y$ are independent.
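
The marginal $f_X$ can be checked the same way by integrating out $y$; a short sketch (the evaluation points are arbitrary choices of ours):

```python
import numpy as np
from scipy import integrate

f = lambda x, y: 4 * x * y * np.exp(-x**2 - y**2)  # joint density, x, y >= 0

# Integrate the joint density over y and compare with 2x e^{-x^2}.
for x in (0.5, 1.0, 1.5):
    fx_num, _ = integrate.quad(lambda y: f(x, y), 0, np.inf)
    print(fx_num, 2 * x * np.exp(-x**2))  # matches to numerical precision
```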

Conditional density function

The conditional density function $f_X(x \mid Y = y)$ is obtained by dividing the joint density function of $(X, Y)$, evaluated at the point $(x, y)$, by the marginal density function of $Y$ evaluated at the point $y$, that is,
$$f_X(x \mid Y = y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$
Likewise,
$$f_Y(y \mid X = x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$
If $X$ and $Y$ are two independent continuous random variables, then we have
$$f_X(x \mid Y = y) = f_X(x) \quad \text{and} \quad f_Y(y \mid X = x) = f_Y(y).$$

Covariance

The covariance of $X$ and $Y$ is defined by
$$\mathrm{COV}(X, Y) = \sigma_{X,Y} = E[(X - \mu_X)(Y - \mu_Y)],$$
or equivalently
$$\mathrm{COV}(X, Y) = E[XY] - \mu_X \mu_Y.$$

Covariance: Computation

If $X$ and $Y$ are discrete,
$$\mathrm{COV}(X, Y) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} (x_j - \mu_X)(y_k - \mu_Y)\, p_{X,Y}(x_j, y_k),$$
or
$$\mathrm{COV}(X, Y) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} x_j y_k\, p_{X,Y}(x_j, y_k) - \mu_X \mu_Y.$$
If $X$ and $Y$ are continuous,
$$\mathrm{COV}(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f_{X,Y}(x, y)\,dx\,dy,$$
or
$$\mathrm{COV}(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy f_{X,Y}(x, y)\,dx\,dy - \mu_X \mu_Y.$$

Remarks

1. If $X$ and $Y$ are two independent random variables, then
$$E[XY] = E[X]E[Y]$$
and
$$\mathrm{COV}(X, Y) = E[X]E[Y] - \mu_X \mu_Y = 0.$$
2. If the covariance of the random variables $X$ and $Y$ is equal to zero, they are not necessarily independent. Nevertheless, we can show that if $X$ and $Y$ are two random variables having a joint normal distribution, then $X$ and $Y$ are independent if and only if $\mathrm{COV}(X, Y) = 0$.

Remarks (cont.)

3. We have that $\mathrm{COV}(X, X) = E[X^2] - (E[X])^2 = \mathrm{VAR}[X]$. Thus, the variance is a particular case of the covariance. However, contrary to the variance, the covariance may be negative.
4. $\mathrm{COV}(X, Y) = \mathrm{COV}(Y, X)$.
5. $\mathrm{VAR}[aX + bY + c] = a^2\,\mathrm{VAR}[X] + b^2\,\mathrm{VAR}[Y] + 2ab\,\mathrm{COV}(X, Y)$.

Correlation coefficient

The correlation coefficient of $X$ and $Y$ is given by
$$\mathrm{CORR}(X, Y) = \rho_{X,Y} = \frac{\mathrm{COV}(X, Y)}{\sqrt{\mathrm{VAR}[X]\,\mathrm{VAR}[Y]}}.$$
We can show that $-1 \le \rho_{X,Y} \le 1$. Moreover, $\rho_{X,Y} = \pm 1$ if and only if we can write $Y = aX + b$, where $a \ne 0$.

More precisely, $\rho_{X,Y} = 1$ (resp., $-1$) if $a > 0$ (resp., $a < 0$). In fact, $\rho_{X,Y}$ is a measure of the linear relationship between $X$ and $Y$.

Example

Consider again the joint probability function given by the table

y\x −1 0 1 pY (y)
0 1/16 1/16 1/16 3/16
1 1/16 1/16 2/16 4/16
2 2/16 1/16 6/16 9/16
pX (x) 4/16 3/16 9/16 1

Calculate $\rho_{X,Y} = \cdots \approx 0.2012$.
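
The computation can be reproduced with a few lines of numpy (a sketch of ours, working directly from the table):

```python
import numpy as np

p = np.array([[1, 1, 1],
              [1, 1, 2],
              [2, 1, 6]]) / 16  # rows: y = 0, 1, 2; columns: x = -1, 0, 1
X, Y = np.meshgrid([-1, 0, 1], [0, 1, 2])  # value grids aligned with the table

EX, EY = (X * p).sum(), (Y * p).sum()
cov = (X * Y * p).sum() - EX * EY
varX = (X**2 * p).sum() - EX**2
varY = (Y**2 * p).sum() - EY**2
print(cov / np.sqrt(varX * varY))  # ~0.2012
```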

Random vectors

A random vector is defined as $x = (X_1, \ldots, X_p)'$, where the components $X_i$ are univariate random variables.

In other words, a random vector is a vector whose individual elements are random variables. Similarly, we define a random matrix as a matrix whose individual elements are random variables. The expected value of a random vector or matrix is taken element-wise.

Joint distribution

The distribution function of $x$ is the function $F : \mathbb{R}^p \to [0, 1]$ defined by
$$F(x) = P(X_1 \le x_1, \ldots, X_p \le x_p). \qquad (1)$$

We may also use the term joint distribution of $x$, in which case we actually refer to the components of $x$, i.e., the random variables $X_1, \ldots, X_p$. Alternative notations are $F(x_1, \ldots, x_p)$, $F_{X_1,\ldots,X_p}(x_1, \ldots, x_p)$, $F_x(x)$, etc.; the subscripted form is used when the joint distribution function of $X_1, \ldots, X_p$ must be distinguished from other distribution functions.

Properties of a distribution function

Let $x = (X_1, \ldots, X_p)'$ be a random vector.
1. $F_{X_1,\ldots,X_p}(x_1, \ldots, x_p)$ is monotonically non-decreasing in each of its variables.
2. $F_{X_1,\ldots,X_p}(x_1, \ldots, x_p)$ is right-continuous in each of its variables.
3. $0 \le F(x_1, \ldots, x_p) \le 1$.
4. $\lim_{x_1,\ldots,x_p \to +\infty} F(x_1, \ldots, x_p) = 1$ and $\lim_{x_i \to -\infty} F(x_1, \ldots, x_p) = 0$ for all $i$.

Joint probability density

For a random vector $x$, the distribution function always exists.

Using the fundamental theorem of calculus, when it applies, one may obtain from the distribution function the (joint) probability density function (p.d.f.) of the random vector $x$, which we shall denote $f_x(x)$ or $f_{X_1,\ldots,X_p}(x_1, \ldots, x_p)$.

In this course, we shall always assume that the density function exists for a random vector $x$.

Joint probability density (cont.)

The distribution of a random vector $x = (X_1, \ldots, X_p)'$ is called absolutely continuous if there exists a nonnegative integrable function $f$ on $\mathbb{R}^p$, called the joint density (or joint probability density function), such that the distribution function $F$ of $x$ can be written as the integral
$$F(x_1, \ldots, x_p) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_p} f(u_1, \ldots, u_p)\,du_1 \cdots du_p, \qquad (2)$$
or in short notation
$$F(x) = \int_{-\infty}^{x} f(u)\,du. \qquad (3)$$

On the set of continuity of $f$,
$$f(x_1, \ldots, x_p) = \frac{\partial^p}{\partial x_1 \cdots \partial x_p} F(x_1, \ldots, x_p). \qquad (4)$$

Partitioning and Marginal distributions

The random vector $x$ can be partitioned into two mutually exclusive subsets,
$$x = \begin{pmatrix} u \\ v \end{pmatrix},$$
where $u$ is $(q \times 1)$, $v$ is $(s \times 1)$, and $p = q + s$.

Thus $u$ and $v$ are also random vectors, but of lower dimension than $x$.

The joint distribution function $F_u(u)$ for $u$ can be obtained from $F_x(x)$ by integrating the joint density $f_x(x)$ over the entire range of the variables in $v$.

Partitioning and Marginal distributions (cont.)

The joint distribution function $F_u(x_1, \ldots, x_q)$ for $u$ is given by
$$\begin{aligned}
F_u(x_1, \ldots, x_q) &= F_{X_1,\ldots,X_q}(x_1, \ldots, x_q) \\
&= \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_q} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p)\,dx_1 \cdots dx_p \\
&= \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_q} f_{X_1,\ldots,X_q}(x_1, \ldots, x_q)\,dx_1 \cdots dx_q,
\end{aligned}$$
where $f_{X_1,\ldots,X_q}(x_1, \ldots, x_q)$ is the joint density for $u$ given by
$$f_{X_1,\ldots,X_q}(x_1, \ldots, x_q) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p)\,dx_{q+1} \cdots dx_p.$$

Partitioning and Marginal distributions (cont.)

Also, $F_u(x_1, \ldots, x_q)$ can be obtained as
$$F_u(x_1, \ldots, x_q) = \lim_{x_{q+1},\ldots,x_p \to \infty} F_{X_1,\ldots,X_p}(x_1, \ldots, x_p) \stackrel{\text{def}}{=} F_{X_1,\ldots,X_p}(x_1, \ldots, x_q, \infty, \ldots, \infty). \qquad (5)$$

Special cases are marginal distributions and marginal densities.

(a) Marginal distribution of $X_i$:
$$F_{X_i}(x_i) = F_{X_1,\ldots,X_p}(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty). \qquad (6)$$

(b) Marginal p.d.f. of $X_i$:
$$f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_x(x_1, \ldots, x_p)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_p. \qquad (7)$$

Conditional distributions and independence

Let
$$x = \begin{pmatrix} u \\ v \end{pmatrix}$$
be the partitioning of the random vector $x$ as above. The conditional density of $v$ given $u$ is obtained from $f_x(x)$ by
$$f_{v|u}(v \mid u = u) = \frac{f_x(x)}{f_u(u)} = \frac{f_x(x_1, \ldots, x_p)}{f_u(x_1, \ldots, x_q)}. \qquad (8)$$
The two random vectors $u$ and $v$ are independent if and only if
$$f_{v|u}(v \mid u = u) = f_v(v) \quad \text{for all } u \text{ and all } v, \qquad (9)$$
or equivalently
$$f_x(x) = f_u(u) f_v(v). \qquad (10)$$
Sometimes the notation $f_{v|u}(v \mid u)$ is used instead of $f_{v|u}(v \mid u = u)$.
Mean vector

The mean vector of the random vector $x = (X_1, \ldots, X_p)'$ is given by
$$\mu = E[x] = E\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_p] \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}.$$

Covariance matrix

The covariance matrix is defined as
$$\Sigma = \mathrm{cov}(x) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}. \qquad (11)$$

The diagonal elements are $\sigma_{jj} = \sigma_j^2 = \mathrm{var}(X_j)$, and the off-diagonal elements are $\sigma_{jk} = \mathrm{cov}(X_j, X_k)$, i.e., the covariances of all possible pairs of variables. The population covariance matrix in (11) can also be written as
$$\Sigma = E\left[(x - \mu)(x - \mu)'\right]. \qquad (12)$$

Note that the covariance matrix is symmetric and always positive semi-definite.

Correlation matrix

The correlation matrix is given by
$$P_\rho = (\rho_{jk}) = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & \rho_{pp} \end{pmatrix}, \qquad (13)$$
where $\rho_{jk} = \dfrac{\sigma_{jk}}{\sqrt{\sigma_{jj}\sigma_{kk}}} = \dfrac{\sigma_{jk}}{\sigma_j \sigma_k}$ for $j \ne k$, and $\rho_{jj} = 1$.

The correlation matrix can be obtained from the covariance matrix, and vice versa. Define
$$D = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p) = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_p \end{pmatrix}. \qquad (14)$$

Then $P_\rho = D^{-1} \Sigma D^{-1}$ and $\Sigma = D P_\rho D$.
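
The relations $P_\rho = D^{-1}\Sigma D^{-1}$ and $\Sigma = D P_\rho D$ translate directly into numpy; a sketch with an illustrative covariance matrix (the numbers are ours, not from the lecture):

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 9.0, 3.0],
                  [1.0, 3.0, 16.0]])  # illustrative covariance matrix

D = np.diag(np.sqrt(np.diag(Sigma)))  # D = diag(sigma_1, ..., sigma_p)
Dinv = np.linalg.inv(D)
P = Dinv @ Sigma @ Dinv               # correlation matrix
print(P)
print(np.allclose(D @ P @ D, Sigma))  # Sigma is recovered: True
```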


Theorem

Let $x$ be a $(p \times 1)$ random vector with covariance matrix $\Sigma_x$. Define a new random vector
$$y = Ax + b \quad (m \times 1),$$
where $A$ is a fixed $(m \times p)$ matrix and $b$ is a fixed $(m \times 1)$ vector. Then
$$E(y) = A\,E(x) + b, \qquad \Sigma_y = A \Sigma_x A'.$$

Proof
We have
$$\begin{aligned}
\Sigma_y &= E[(y - E(y))(y - E(y))'] \\
&= E[(Ax + b - A\,E(x) - b)(Ax + b - A\,E(x) - b)'] \\
&= E[A(x - E(x))(x - E(x))' A'] = A\,E[(x - E(x))(x - E(x))']\,A' = A \Sigma_x A',
\end{aligned}$$
where we have used $(AB)' = B'A'$.
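
The theorem is also easy to confirm by simulation; the following sketch (with illustrative $A$, $b$ and $\Sigma_x$ of ours) compares the sample covariance of $y = Ax + b$ with $A\Sigma_x A'$:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma_x = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
A = np.array([[1.0, -1.0],
              [0.0, 2.0],
              [3.0, 1.0]])           # m x p with m = 3, p = 2
b = np.array([1.0, 2.0, 3.0])

x = rng.multivariate_normal(np.zeros(2), Sigma_x, size=200_000)
y = x @ A.T + b                      # y = Ax + b, row by row

print(np.cov(y, rowvar=False))       # ~ A Sigma_x A'
print(A @ Sigma_x @ A.T)
```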

Special case

Let $x_1, \ldots, x_p$ be random variables. A linear combination of these is given as
$$y = \sum_{i=1}^{p} a_i x_i = (a_1, \ldots, a_p)\,x = a'x,$$
where $x = (x_1, \ldots, x_p)'$ and $a = (a_1, \ldots, a_p)'$. Then
$$\mathrm{var}(y) = \sigma_y^2 = a' \Sigma_x a \quad (1 \times 1).$$

Since $\mathrm{var}(y) \ge 0$ for every choice of $a$, we see that $\Sigma_x$ is positive semi-definite, and positive definite if $\mathrm{var}(y) > 0$ for all $a \ne 0$.

Examples

Example 1
Let $f : \mathbb{R}^3 \to \mathbb{R}$ be given by
$$f(x_1, x_2, x_3) = \begin{cases} c\,x_1(x_2 + x_3), & 0 \le x_1, x_2, x_3 \le 1 \\ 0, & \text{otherwise.} \end{cases}$$
(a) Find the constant $c$ so that $f$ is a joint density function of some random variables $X_1$, $X_2$ and $X_3$.
(b) Find the joint distribution function $F(x_1, x_2, x_3)$ of $X_1$, $X_2$ and $X_3$, and use it to determine $P(X_1 \le 0.5, X_2 \le 0.5, X_3 \le 0.5)$.
(c) Find the marginal density function of $X_1$ and $E[X_1]$.
(d) Find the conditional density for $(X_2, X_3)$ given $X_1 = x_1$. Is $X_1$ independent of $X_2$ and $X_3$?


Solution:
(a) We have to find $c$ such that
$$\int_0^1 \int_0^1 \int_0^1 f(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 = 1.$$
Integrating, we get
$$\int_0^1 \int_0^1 \int_0^1 f(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 = c \int_0^1 \int_0^1 \int_0^1 x_1(x_2 + x_3)\,dx_1\,dx_2\,dx_3 = c/2.$$
So $c/2 = 1$, i.e., $c = 2$, and thus
$$f(x_1, x_2, x_3) = \begin{cases} 2\,x_1(x_2 + x_3), & 0 \le x_1, x_2, x_3 \le 1 \\ 0, & \text{otherwise} \end{cases}$$
is a joint density function of some random variables $X_1$, $X_2$ and $X_3$.

(b) By the definition of the distribution function, for $0 \le x_1, x_2, x_3 \le 1$,
$$\begin{aligned}
F(x_1, x_2, x_3) &= \int_0^{x_1} \int_0^{x_2} \int_0^{x_3} f(u_1, u_2, u_3)\,du_1\,du_2\,du_3 \\
&= \int_0^{x_1} \int_0^{x_2} \int_0^{x_3} 2 u_1 (u_2 + u_3)\,du_1\,du_2\,du_3 \\
&= 2 \left( \int_0^{x_1} u_1\,du_1 \right) \left( \int_0^{x_2} \int_0^{x_3} (u_2 + u_3)\,du_2\,du_3 \right) \\
&= \frac{x_1^2}{2} \left( x_2^2 x_3 + x_2 x_3^2 \right).
\end{aligned}$$
So,
$$F(x_1, x_2, x_3) = \begin{cases} 0, & x_1, x_2, x_3 < 0 \\ \frac{x_1^2}{2}(x_2^2 x_3 + x_2 x_3^2), & 0 \le x_1, x_2, x_3 \le 1 \\ 1, & x_1, x_2, x_3 > 1. \end{cases}$$


Thus,
$$P(X_1 \le 0.5, X_2 \le 0.5, X_3 \le 0.5) = F(0.5, 0.5, 0.5) = 1/32 = 0.03125.$$
(c) The marginal density function of $X_1$ is given by
$$f_{X_1}(x_1) = \int_0^1 \int_0^1 f(x_1, x_2, x_3)\,dx_2\,dx_3 = 2x_1,$$
and
$$E[X_1] = \int_0^1 x_1 f_{X_1}(x_1)\,dx_1 = 2 \int_0^1 x_1^2\,dx_1 = 2/3.$$
(d) The conditional density for $(X_2, X_3)$ given $X_1 = x_1$ is
$$f_{X_2,X_3|X_1}(x_2, x_3 \mid X_1 = x_1) = \frac{f(x_1, x_2, x_3)}{f_{X_1}(x_1)} = x_2 + x_3.$$
Since this does not depend on $x_1$, $X_1$ is independent of $X_2$ and $X_3$.
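
Parts (a) and (b) are easy to double-check numerically; a sketch using scipy's tplquad:

```python
import numpy as np
from scipy import integrate

# tplquad expects f(z, y, x); here z = x3, y = x2, x = x1.
f = lambda x3, x2, x1: 2 * x1 * (x2 + x3)

total, _ = integrate.tplquad(f, 0, 1, 0, 1, 0, 1)
print(total)  # ~1.0, so c = 2 normalizes the density

F_half, _ = integrate.tplquad(f, 0, 0.5, 0, 0.5, 0, 0.5)
print(F_half)  # ~0.03125 = 1/32
```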
Multivariate normal distribution

A random vector $y = (y_1, \ldots, y_p)'$ has a p-dimensional multivariate normal distribution if its density is given by
$$g(y) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-(y - \mu)' \Sigma^{-1} (y - \mu)/2}, \qquad (15)$$
where the elements of $y$ range over $(-\infty, \infty)$, $E[y] = \mu$, $\mathrm{var}(y) = \Sigma$ and $\mathrm{rank}(\Sigma) = p$.
We say that $y$ is distributed as $N_p(\mu, \Sigma)$, or $y$ is $N_p(\mu, \Sigma)$, and sometimes write $y \sim N_p(\mu, \Sigma)$.
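
As a sanity check, the density (15) can be compared with scipy.stats.multivariate_normal; a sketch with illustrative parameters of ours:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])       # illustrative mean and covariance
y = np.array([0.5, 2.5])

print(multivariate_normal(mu, Sigma).pdf(y))  # library value

d = y - mu
q = d @ np.linalg.solve(Sigma, d)    # (y - mu)' Sigma^{-1} (y - mu)
p = len(mu)
print(np.exp(-q / 2) / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5))
```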


If $p = 1$, we get the usual univariate normal density
$$g(y_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-(y_1 - \mu_1)^2 / 2\sigma_1^2}, \qquad (16)$$
and we write $y_1 \sim N(\mu_1, \sigma_1^2)$.

The quantity $(y - \mu)' \Sigma^{-1} (y - \mu)$ in the exponent of the multivariate normal density (15) is the squared generalized distance from $y$ to $\mu$, also called the Mahalanobis distance,
$$\Delta^2 = (y - \mu)' \Sigma^{-1} (y - \mu). \qquad (17)$$
In the univariate case this squared distance becomes
$$\delta^2 = (y_1 - \mu_1)' \sigma_{11}^{-1} (y_1 - \mu_1) = (y_1 - \mu_1)^2 / \sigma_1^2. \qquad (18)$$


The determinant of the covariance matrix, $|\Sigma|$, is the generalized population variance. If $\sigma^2$ is small in the univariate normal, the $y$ values are concentrated near the mean. Similarly, a small value of $|\Sigma|$ in the multivariate case indicates that the $y$'s are concentrated close to $\mu$ in p-space or that there is multicollinearity among the variables. The term multicollinearity indicates that the variables are highly intercorrelated, in which case the effective dimensionality is less than $p$. Figure 1 shows the familiar bell-shaped curves of $N(5, 0.5)$ and $N(5, 1)$, and Figure 2 shows a bivariate normal density surface.


Figure 1: The normal density curves: N (5, 0.5) and N (5, 1)


Figure 2: A bivariate normal density surface

Properties of multivariate normal random variables

Consider $y = (y_1, y_2, \ldots, y_p)' \sim N_p(\mu, \Sigma)$.
1. Normality of linear combinations of the variables in y:
(a) If $a$ is a vector of constants, the linear function $a'y$ is univariate normal:
$$a'y \sim N_1(a'\mu, a'\Sigma a). \qquad (19)$$
(b) If $A$ is a constant $q \times p$ matrix of rank $q$, where $q \le p$, the $q$ linear combinations in $Ay$ have a multivariate normal distribution:
$$Ay \sim N_q(A\mu, A\Sigma A'). \qquad (20)$$
2. Standardized variables:
A standardized vector $z$ can be obtained in two ways:
$$z = (T')^{-1}(y - \mu), \qquad (21)$$
where $T$ is obtained from the Cholesky factorization $\Sigma = T'T$, or
$$z = (\Sigma^{1/2})^{-1}(y - \mu), \qquad (22)$$

where $\Sigma^{1/2}$ is the symmetric square root matrix of $\Sigma$. In either case, it follows from Property 1(b) that $z$ is multivariate normal:
$$z \sim N_p(0, I). \qquad (23)$$
3. Chi-square distribution:
A chi-square random variable with $p$ degrees of freedom is defined as the sum of squares of $p$ independent standard normal random variables. Thus, if $z$ is the standardized vector defined in (21) or (22), then $\sum_{i=1}^{p} z_i^2 = z'z$ has the $\chi^2$-distribution with $p$ degrees of freedom, denoted $\chi^2_p$ or $\chi^2(p)$. From either (21) or (22) we obtain $z'z = (y - \mu)' \Sigma^{-1} (y - \mu)$. Hence,
$$(y - \mu)' \Sigma^{-1} (y - \mu) \sim \chi^2(p). \qquad (24)$$
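
Properties 2 and 3 can be illustrated by simulation; a sketch (with an assumed $\Sigma$ of ours) that standardizes via the Cholesky factor as in (21) and checks the $\chi^2$ mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])  # illustrative parameters
y = rng.multivariate_normal(mu, Sigma, size=100_000)

# Cholesky: numpy returns lower-triangular L with Sigma = L L', so T = L'
# satisfies Sigma = T'T and z = (T')^{-1}(y - mu) = L^{-1}(y - mu).
L = np.linalg.cholesky(Sigma)
z = np.linalg.solve(L, (y - mu).T).T
print(np.cov(z, rowvar=False).round(2))   # ~ identity matrix

q = np.einsum('ij,ij->i', z, z)           # squared Mahalanobis distances
print(q.mean(), stats.chi2(3).mean())     # both ~ 3, consistent with (24)
```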


4. Normality of marginal distributions:
(a) Any subset of the y's in $y$ has a multivariate normal distribution, with mean vector consisting of the corresponding subvector of $\mu$ and covariance matrix composed of the corresponding submatrix of $\Sigma$. To illustrate, let $y_1 = (y_1, y_2, \ldots, y_r)'$ denote the subvector containing the first $r$ elements of $y$, and $y_2 = (y_{r+1}, \ldots, y_p)'$ the remaining $p - r$ elements. Thus $y$, $\mu$, and $\Sigma$ are partitioned as
$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
where, for instance, $y_1$ and $\mu_1$ are $r \times 1$ and $\Sigma_{11}$ is $r \times r$. Then
$$y_1 \sim N_r(\mu_1, \Sigma_{11}). \qquad (25)$$
Here, again, $E(y_1) = \mu_1$ and $\mathrm{cov}(y_1) = \Sigma_{11}$ hold for any random vector partitioned in this way. But if $y$ is p-variate normal, then $y_1$ is r-variate normal.

(b) As a special case of the preceding result, each $y_j$ in $y$ has the univariate normal distribution:
$$y_j \sim N(\mu_j, \sigma_{jj}), \quad j = 1, 2, \ldots, p.$$
The converse of this is not true: if the density of each $y_j$ in $y$ is normal, it does not necessarily follow that $y$ is multivariate normal.

In the next three properties, let the observation vector be partitioned into two subvectors denoted by $y$ and $x$, where $y$ is $p \times 1$ and $x$ is $q \times 1$. Or, alternatively, let $x$ represent some additional variables to be considered along with those in $y$. Then
$$E\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \qquad \mathrm{cov}\begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix}.$$


We assume, in addition, that
$$\begin{pmatrix} y \\ x \end{pmatrix} \sim N_{p+q}\left( \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} \right).$$
5. Independence:
(a) The subvectors $y$ and $x$ are independent if $\Sigma_{yx} = 0$.
(b) Two individual variables $y_j$ and $y_k$ are independent if $\sigma_{jk} = 0$.
Note that this is generally not true for nonnormal random variables.
6. Conditional distribution:
If $y$ and $x$ are not independent, then $\Sigma_{yx} \ne 0$, and the conditional distribution of $y$ given $x$, $f(y \mid x)$, is multivariate normal with
$$E(y \mid x) = \mu_y + \Sigma_{yx} \Sigma_{xx}^{-1} (x - \mu_x), \qquad (26)$$
$$\mathrm{cov}(y \mid x) = \Sigma_{yy} - \Sigma_{yx} \Sigma_{xx}^{-1} \Sigma_{xy}. \qquad (27)$$

Note that $E(y \mid x)$ is a vector of linear functions of $x$, whereas $\mathrm{cov}(y \mid x)$ is a matrix that does not depend on $x$. The linear trend in (26) holds for any pair of variables; it is called the multivariate regression function. The matrix $\Sigma_{yx}\Sigma_{xx}^{-1}$ is called the matrix of regression coefficients.
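
Equations (26) and (27) translate directly into numpy; a sketch with illustrative partitioned parameters of ours:

```python
import numpy as np

# Partitioned parameters for (y, x) jointly normal (illustrative values).
mu_y, mu_x = np.array([1.0]), np.array([0.0, 2.0])
S_yy = np.array([[2.0]])
S_yx = np.array([[0.5, 0.3]])
S_xx = np.array([[1.0, 0.2],
                 [0.2, 1.5]])
x_obs = np.array([0.5, 1.0])

B = S_yx @ np.linalg.inv(S_xx)         # matrix of regression coefficients
cond_mean = mu_y + B @ (x_obs - mu_x)  # E(y | x), eq. (26)
cond_cov = S_yy - B @ S_yx.T           # cov(y | x), eq. (27), since S_xy = S_yx'
print(cond_mean, cond_cov)
```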
7. Distribution of the sum of two subvectors:
If $y$ and $x$ are the same size (both $p \times 1$) and independent, then
$$y + x \sim N_p(\mu_y + \mu_x, \Sigma_{yy} + \Sigma_{xx}), \qquad (28)$$
$$y - x \sim N_p(\mu_y - \mu_x, \Sigma_{yy} + \Sigma_{xx}). \qquad (29)$$
Example

 
Let $y' = (y_1, y_2)$ with $y \sim N_2(\mu, \Sigma)$, where $\mu' = (0, 0)$ and
$$\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Define a random vector $x' = (x_1, x_2)$ by
$$x_1 = y_1 - 3y_2, \qquad x_2 = y_1 + c\,y_2.$$
(a) Find the mean vector and the covariance matrix of $x$, and give its distribution.
(b) For which value of $c$ are the random variables $x_1$ and $x_2$ independent?
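
A short sketch using the theorem on linear transformations ($\Sigma_x = A\Sigma_y A'$ with $\Sigma_y = I_2$) makes the dependence on $c$ visible:

```python
import numpy as np

def cov_x(c):
    """Covariance of x = A y for x1 = y1 - 3 y2, x2 = y1 + c y2, Sigma_y = I."""
    A = np.array([[1.0, -3.0],
                  [1.0, c]])
    return A @ A.T  # A Sigma_y A' with Sigma_y the 2 x 2 identity

print(cov_x(1.0))    # off-diagonal entry is 1 - 3c; nonzero here, so dependent
print(cov_x(1 / 3))  # off-diagonal vanishes: independence under joint normality
```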

Exercises

Separate sheet!
