

PUSAT PENGAJIAN SAINS MATEMATIK

UNIVERSITI SAINS MALAYSIA

If f(y) is the density of the random variable y, the mean or expected value of y is defined as

µ = E(y) = ∫_{−∞}^{∞} y f(y) dy    (3.1)

This is the population mean.

Generally, for a function u(y), we have

E[u(y)] = ∫_{−∞}^{∞} u(y) f(y) dy    (3.2)

For a constant a and functions u(y) and v(y), it follows from (3.2) that

E(ay) = a E(y),
E[u(y) + v(y)] = E[u(y)] + E[v(y)]    (3.3)

The variance of random variable y is defined as

σ² = var(y) = E(y − µ)²    (3.4)
This is the population variance.

The square root of the variance is known as the standard deviation:

σ = √var(y) = √E(y − µ)²    (3.5)
Using (3.3), the variance of y can be expressed in the form
σ² = var(y) = E(y²) − µ²    (3.6)

If a is a constant, we can use (3.3) to show that

var(ay) = a² var(y) = a²σ²    (3.7)
For any two variables yi, yj in a random vector y = (y1, y2, …, yp)′, we define the covariance as

σij = cov(yi, yj) = E[(yi − µi)(yj − µj)]    (3.8)

where µi = E(yi), µj = E(yj).

From (3.3), σ ij can be expressed in the form

σij = cov(yi, yj) = E(yi yj) − µi µj    (3.9)

Two random variables yi , y j are said to be independent if their joint density factors into
the product of their marginal densities:
f(yi, yj) = f(yi) f(yj),    (3.10)

where the marginal density fi(yi) is defined as

fi(yi) = ∫_{−∞}^{∞} f(yi, yj) dyj
From (3.10), we obtain the following properties:
i) E(yi yj) = E(yi) E(yj) if yi and yj are independent.
ii) σij = cov(yi, yj) = 0 if yi and yj are independent.

The covariance σij depends on the scale of measurement of both yi and yj. To standardize σij, divide it by the product of the standard deviations of yi and yj to obtain the correlation

ρij = corr(yi, yj) = σij / (σi σj)    (3.11)
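As a small numerical sketch of (3.11) (the sample data below are invented), the correlation obtained by standardizing the sample covariance agrees with NumPy's built-in correlation coefficient:

```python
import numpy as np

# Invented sample data for two correlated variables yi and yj
rng = np.random.default_rng(0)
yi = rng.normal(size=1000)
yj = 0.5 * yi + rng.normal(size=1000)

# Sample covariance matrix of (yi, yj); S[0, 1] estimates sigma_ij
S = np.cov(yi, yj)
sigma_ij = S[0, 1]
sigma_i = np.sqrt(S[0, 0])
sigma_j = np.sqrt(S[1, 1])

# Correlation as in (3.11): covariance divided by the two standard deviations
rho = sigma_ij / (sigma_i * sigma_j)

# Compare with NumPy's correlation coefficient
rho_np = np.corrcoef(yi, yj)[0, 1]
print(rho, rho_np)
```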

Example 3.1: Suppose the bivariate random variable ( x, y ) is distributed uniformly over
the region 0 ≤ x ≤ 2, 2 x − x 2 ≤ y ≤ 1 + 2 x − x 2
[Figure: the region between the curves y = 2x − x² and y = 1 + 2x − x², for 0 ≤ x ≤ 2.]

The area of the region is given by

Area = ∫_0^2 ∫_{2x−x²}^{1+2x−x²} dy dx = 2.

Hence, for a uniform distribution over the region, we set

f(x, y) = 1/2,    0 ≤ x ≤ 2, 2x − x² ≤ y ≤ 1 + 2x − x²,

so that ∫∫ f(x, y) dy dx = 1.

a) To find σxy using (3.9), we need to find E(xy), E(x), and E(y).


E(xy) = ∫_0^2 ∫_{2x−x²}^{1+2x−x²} xy (1/2) dy dx
      = ∫_0^2 (x/4)(1 + 4x − 2x²) dx = 7/6

f1(x) = ∫_{2x−x²}^{1+2x−x²} (1/2) dy = 1/2,    0 ≤ x ≤ 2

For f2(y), we obtain different results for 0 ≤ y ≤ 1 and 1 ≤ y ≤ 2.

f2(y) = ∫_0^{1−√(1−y)} (1/2) dx + ∫_{1+√(1−y)}^2 (1/2) dx = 1 − √(1−y),    0 ≤ y ≤ 1

f2(y) = ∫_{1−√(2−y)}^{1+√(2−y)} (1/2) dx = √(2−y),    1 ≤ y ≤ 2

Then,

E(x) = ∫_0^2 x (1/2) dx = 1,

E(y) = ∫_0^1 y(1 − √(1−y)) dy + ∫_1^2 y√(2−y) dy = 7/6
Now by (3.9),

σxy = E(xy) − E(x)E(y) = 7/6 − (1)(7/6) = 0

However, x and y are clearly dependent, since the range of y for each x depends on the value of x.

As a further indication of the dependence of y on x, examine E(y | x), the expected value of y for a given value of x:

E(y | x) = ∫ y f(y | x) dy,

where the conditional density f(y | x) is defined as

f(y | x) = f(x, y) / f1(x).
In this case,

f(y | x) = f(x, y) / f1(x) = 1,    2x − x² ≤ y ≤ 1 + 2x − x²
Thus,

E(y | x) = ∫_{2x−x²}^{1+2x−x²} y (1) dy = (1/2)(1 + 4x − 2x²)
Since E ( y | x ) depends on x , the two variables are dependent.
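Example 3.1 can be checked by simulation (a sketch; the sampler below uses the fact from the example that f1(x) = 1/2, so x is uniform on [0, 2] and, given x, y is uniform on an interval of length 1): the covariance comes out near 0 while the conditional mean of y clearly shifts with x.

```python
import numpy as np

# Monte Carlo check of Example 3.1: sample from the region
# 0 <= x <= 2, 2x - x^2 <= y <= 1 + 2x - x^2.
rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(0.0, 2.0, size=n)                 # x ~ U(0, 2) since f1(x) = 1/2
y = (2 * x - x**2) + rng.uniform(0.0, 1.0, size=n)  # y | x ~ U(2x - x^2, 1 + 2x - x^2)

# sigma_xy should be near 0 even though y depends on x
sigma_xy = np.mean(x * y) - np.mean(x) * np.mean(y)

# Binned conditional means: E(y | x) = (1 + 4x - 2x^2)/2 varies with x
m_low = y[x < 0.5].mean()                  # small x: conditional mean below 1
m_mid = y[np.abs(x - 1.0) < 0.25].mean()   # near x = 1: conditional mean near 1.5
print(sigma_xy, m_low, m_mid)
```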

3.2. Mean Vectors and Covariance Matrices For Random Vectors

Mean Vector
The expected value of a p × 1 random vector y is defined as the vector of expected values of the p random variables y1, y2, …, yp in y:

E(y) = E[(y1, y2, …, yp)′] = [E(y1), E(y2), …, E(yp)]′ = (µ1, µ2, …, µp)′ = μ,    (3.12)

where E(yi) = µi is obtained as E(yi) = ∫ yi fi(yi) dyi using fi(yi), the marginal density of yi.
If x and y are p × 1 random vectors, it follows from (3.12) that the expected value of their sum is the sum of their expected values:

E(x + y) = E(x) + E(y)    (3.13)

Covariance Matrix
The variances σ1², σ2², …, σp² of y1, y2, …, yp and the covariances σij for all i ≠ j can be conveniently displayed in the covariance matrix, denoted by Σ, the uppercase version of σ:

Σ = cov(y) = [ σ11  σ12  …  σ1p
               σ21  σ22  …  σ2p
                ⋮    ⋮         ⋮
               σp1  σp2  …  σpp ]    (3.14)
Note:
1. The ith row of Σ contains the variance of yi and the covariance of yi with each of the other y's.
2. For consistency, we use σii = σi², i = 1, 2, …, p, for the variances.
3. The variances are on the diagonal of Σ, and the covariances occupy off-diagonal positions.
4. Note the distinction in the font used for Σ as the covariance matrix and ∑ as the summation symbol, and the distinction in meaning between the notation cov(y) = Σ and cov(yi, yj) = σij.

Eq. (3.14) can be expressed as the expected value of a random matrix. The (ij)th element of the matrix (y − μ)(y − μ)′ is (yi − µi)(yj − µj). Hence, by (3.8), the (ij)th element of E[(y − μ)(y − μ)′] is E[(yi − µi)(yj − µj)] = σij. Hence,

Σ = E[(y − μ)(y − μ)′] = [ σ11  σ12  …  σ1p
                           σ21  σ22  …  σ2p
                            ⋮    ⋮         ⋮
                           σp1  σp2  …  σpp ]    (3.15)

Here, for p = 3:

Σ = E[(y − μ)(y − μ)′]

  = E[ (y1 − µ1, y2 − µ2, y3 − µ3)′ (y1 − µ1, y2 − µ2, y3 − µ3) ]

  = E[ (y1 − µ1)²          (y1 − µ1)(y2 − µ2)   (y1 − µ1)(y3 − µ3)
       (y2 − µ2)(y1 − µ1)  (y2 − µ2)²           (y2 − µ2)(y3 − µ3)     (3.16)
       (y3 − µ3)(y1 − µ1)  (y3 − µ3)(y2 − µ2)   (y3 − µ3)²          ]

  = [ σ1²  σ12  σ13
      σ21  σ2²  σ23
      σ31  σ32  σ3² ]

Eq. (3.15) can also be written in the form

Σ = E[(y − μ)(y − μ)′] = E(yy′) − μμ′    (3.17)

(analogous to (3.6) and (3.9)).
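The identity (3.17) also holds for sample moments, which gives a quick way to check it numerically (a sketch; the 3-variable sample below is invented):

```python
import numpy as np

# Verify (3.17) with sample moments: the mean of y y' minus the outer
# product of the sample means equals the covariance computed directly
# from centered data, as in (3.15).
rng = np.random.default_rng(2)
Y = rng.normal(size=(10_000, 3)) @ np.array([[1.0, 0.3, 0.0],
                                             [0.0, 1.0, 0.5],
                                             [0.0, 0.0, 1.0]])
mu_hat = Y.mean(axis=0)                       # estimate of mu
E_yy = (Y.T @ Y) / len(Y)                     # estimate of E(y y')
Sigma_hat = E_yy - np.outer(mu_hat, mu_hat)   # (3.17)

# Same quantity from centered data: mean of (y - mu)(y - mu)'
Yc = Y - mu_hat
Sigma_direct = (Yc.T @ Yc) / len(Y)
print(np.max(np.abs(Sigma_hat - Sigma_direct)))
```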

Generalized Variance
A measure of overall variability in the population of y's can be defined as the determinant of Σ:

Generalized variance = |Σ|    (3.18)

Note:
1. If |Σ| is small, the y's are concentrated closer to μ than if |Σ| is large.
2. A small value of |Σ| may also indicate that the variables y1, y2, …, yp in y are highly intercorrelated.

Standardized Distance
To get a useful measure of distance between y and μ , need to take account the
variances and covariances of the yi ’s in y . By analogy with the univariate standardized
variable ( y − µ ) σ , which has mean 0 and variance 1, the standardized distance is
defined as
Standardized distance = ( yμ− Σ)′ y( μ−
−1
) (3.19)
The use of Σ −1 standardizes the (transformed) yi ’s so that they have means equal to 0
and variances equal to 1 and are also uncorrelated. A distance such as (3.19) is often
called a Mahalanobis distance.
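A minimal sketch of (3.19) (the values of μ, Σ, and y below are invented): compute (y − μ)′Σ⁻¹(y − μ) and compare it with the unstandardized squared Euclidean distance, which is what (3.19) reduces to when Σ = I.

```python
import numpy as np

# Squared Mahalanobis distance (3.19) for invented example values
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
y = np.array([2.0, 1.0])

d = y - mu
maha_sq = d @ np.linalg.solve(Sigma, d)   # (y - mu)' Sigma^{-1} (y - mu)

# For comparison: squared Euclidean distance (the Sigma = I case)
eucl_sq = d @ d
print(maha_sq, eucl_sq)
```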

3.3 Correlation Matrices

The correlation matrix is defined as

Pρ = (ρij) = [ 1    ρ12  …  ρ1p
               ρ21  1    …  ρ2p
                ⋮    ⋮        ⋮
               ρp1  ρp2  …  1  ],    (3.20)

where ρij = σij / (σi σj) is the correlation of yi and yj defined in (3.11). The second row of Pρ, for example, contains the correlation of y2 with each of the other y's. If we define

Dσ = [diag(Σ)]^{1/2} = diag(σ1, σ2, …, σp),    (3.21)

then we can obtain Pρ from Σ and vice versa:

Pρ = Dσ⁻¹ Σ Dσ⁻¹,
Σ = Dσ Pρ Dσ    (3.22)
Note: We use the subscript ρ in Pρ to emphasize that P is the uppercase version of ρ .
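The two identities in (3.22) can be checked directly (a sketch; the 3 × 3 covariance matrix below is invented):

```python
import numpy as np

# (3.22): obtain the correlation matrix from Sigma, then recover Sigma
Sigma = np.array([[ 4.0, 1.2, -0.8],
                  [ 1.2, 1.0,  0.3],
                  [-0.8, 0.3,  2.0]])
D = np.diag(np.sqrt(np.diag(Sigma)))   # D_sigma = diag(sigma_1, ..., sigma_p)
D_inv = np.linalg.inv(D)

P = D_inv @ Sigma @ D_inv              # P_rho = D^{-1} Sigma D^{-1}
Sigma_back = D @ P @ D                 # Sigma = D P_rho D
print(np.diag(P))                      # diagonal of P_rho is all ones
```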

3.4 Mean Vectors and Covariance Matrices For Partitioned Random Vectors.
Suppose the random vector v is partitioned into two subsets of variables, which we denote by y and x:

v = (y′, x′)′ = (y1, …, yp, x1, …, xq)′.

Note:
1. There are p + q random variables in v .

2. μ = E(v) = [E(y)′, E(x)′]′ = (μy′, μx′)′    (3.23)

3. Σ = cov(v) = [ Σyy  Σyx
                  Σxy  Σxx ],  where Σxy = Σ′yx.    (3.24)

4. In (3.23), the subvector μy = [E(y1), E(y2), …, E(yp)]′ contains the means of y1, y2, …, yp. Similarly, μx contains the means of the x's.
5. In (3.24), the submatrix Σyy = cov(y) is the p × p covariance matrix for y, containing the variances of y1, y2, …, yp on the diagonal and the covariance of each yi with each yj (i ≠ j) off the diagonal:

Σyy = [ σy1²    σy1y2  …  σy1yp
        σy2y1   σy2²   …  σy2yp
          ⋮       ⋮           ⋮
        σypy1   σypy2  …  σyp²  ]
6. Similarly, Σxx = cov(x) is the q × q covariance matrix of x1, x2, …, xq. The matrix Σyx in (3.24) is p × q and contains the covariance of each yi with each xj:

Σyx = [ σy1x1   σy1x2  …  σy1xq
        σy2x1   σy2x2  …  σy2xq
          ⋮       ⋮           ⋮
        σypx1   σypx2  …  σypxq ]

7. Σ yx is rectangular unless p = q .
8. The covariance matrix Σyx is also denoted by cov(y, x) and can be obtained as

Σyx = cov(y, x) = E[(y − μy)(x − μx)′]    (3.25)

Note the difference in meaning between cov(v) in (3.24) and cov(y, x) = Σyx in (3.25). There are now three uses of the notation cov: (i) cov(yi, yj) is a scalar, (ii) cov(y) is a symmetric matrix, and (iii) cov(y, x) is a rectangular matrix.
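In code, the partitioning in (3.24) is just block slicing of the covariance matrix of v (a sketch; the 4 × 4 covariance below is invented, with p = q = 2):

```python
import numpy as np

# Slice the covariance matrix of v = (y', x')' into its four blocks
rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T                  # a random positive definite covariance of v
p = 2                            # first p variables form y; the rest form x

Sigma_yy = Sigma[:p, :p]         # p x p, cov(y)
Sigma_yx = Sigma[:p, p:]         # p x q, cov(y, x)
Sigma_xy = Sigma[p:, :p]         # q x p, equals Sigma_yx transposed
Sigma_xx = Sigma[p:, p:]         # q x q, cov(x)
print(Sigma_yx.shape, Sigma_xy.shape)
```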

3.5. Means, Variances and Covariances For Linear Functions of Random Vectors.
A linear combination of a = (a1, a2, …, ap)′, a vector of constants, and a random vector y can be written as

z = a1y1 + a2y2 + ⋯ + apyp = a′y    (3.26)

Means
Theorem 3.5A: If a is a p × 1 vector of constants and y is a p × 1 random vector with
mean vector μ , then the mean of z = a′y is given by
µz = E(a′y) = a′E(y) = a′μ    (3.27)

Proof:
E(a′y) = E(a1y1 + a2y2 + ⋯ + apyp)
       = E(a1y1) + E(a2y2) + ⋯ + E(apyp)
       = a1E(y1) + a2E(y2) + ⋯ + apE(yp)
       = (a1, a2, …, ap)[E(y1), E(y2), …, E(yp)]′
       = a′E(y) = a′μ

Theorem 3.5B: Suppose y is a random vector, X is a random matrix, a and b are vectors of constants, and A and B are matrices of constants. Then, assuming the matrices and vectors in each product are conformable, we have the following expected values:

(i) E ( Ay ) = AE ( y ) , (3.28)
(ii) E ( a′Xb ) = a′E ( X ) b, (3.29)
(iii) E ( AXB ) = AE ( X ) B. (3.30)

Corollary 1: If A is a k × p matrix of constants, b is a k × 1 vector of constants, and y is

a p × 1 random vector, then
E(Ay + b) = AE(y) + b    (3.31)

Variances & Covariance

Theorem 3.5C: If a is a p × 1 vector of constants and y is a p × 1 random vector with
covariance matrix Σ , then the variance of z = a′y is given by
σz² = var(a′y) = a′Σa    (3.32)
Proof:

var(a′y) = E(a′y − a′μ)²
         = E[a′(y − μ)]²
         = E[a′(y − μ)(y − μ)′a]    (since a′(y − μ) is a scalar, [a′(y − μ)]² = [a′(y − μ)][(y − μ)′a]; note that b′c = c′b)
         = a′E[(y − μ)(y − μ)′]a
         = a′Σa

For p = 3:

var(a′y) = var(a1y1 + a2y2 + a3y3) = a′Σa
         = a1²σ1² + a2²σ2² + a3²σ3² + 2a1a2σ12 + 2a1a3σ13 + 2a2a3σ23

Thus, var(a′y) = a′Σa involves all the variances and covariances of y1, y2, y3.
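Theorem 3.5C is easy to confirm by simulation (a sketch; the coefficient vector and covariance matrix below are invented):

```python
import numpy as np

# Simulation check of var(a'y) = a' Sigma a
rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.1],
                  [0.2, 0.1, 1.5]])
a = np.array([1.0, -2.0, 0.5])

# Draw many y ~ N(0, Sigma) and form z = a'y for each
Y = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=200_000)
z = Y @ a

var_theory = a @ Sigma @ a        # a' Sigma a, as in (3.32)
var_sample = z.var()
print(var_theory, var_sample)
```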

Corollary 1(the covariance of two linear combinations): If a and b are p × 1 vectors of

constants, then
cov(a′y, b′y) = a′Σb    (3.33)

Theorem 3.5D: Let z = Ay and w = By , where A is a k × p matrix of constants, B is an

m × p matrix of constants, and y is a p × 1 random vector with covariance matrix Σ .
Then
(i) cov(z) = cov(Ay) = AΣA′,    (3.34)
(ii) cov(z, w) = cov(Ay, By) = AΣB′    (3.35)

Note: cov(z, w) = AΣB′ is a k × m rectangular matrix containing the covariance of each zi with each wj; that is, cov(z, w) contains cov(zi, wj), i = 1, 2, …, k; j = 1, 2, …, m.

Also, for a k × 1 vector of constants b,

cov(Ay + b) = AΣA′    (3.36)

4.1 Univariate Normal Density Function

The standard normal density is given by

g(z) = (1/√(2π)) e^{−z²/2},    −∞ < z < ∞,    (4.1)

To obtain a normal random variable y with mean µ and variance σ², use the transformation z = (y − µ)/σ, or y = σz + µ, so that E(y) = µ and var(y) = σ². Thus,

f(y) = g(z) |dz/dy|    (4.2)

where |dz/dy| is the absolute value of dz/dy. Note that, to find the density of y using (4.2), both z and dz/dy on the right side must be expressed in terms of y. Thus,

y = σz + µ  ⇒  z = (y − µ)/σ  ⇒  dz/dy = 1/σ

Therefore,

f(y) = g(z) |dz/dy| = g((y − µ)/σ) (1/σ)
     = (1/(σ√(2π))) e^{−(y−µ)²/(2σ²)}    (4.3)

is the normal density with E(y) = µ and var(y) = σ²; y is distributed as N(µ, σ²).
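The change-of-variable result (4.3) can be verified numerically (a sketch; µ = 3 and σ = 2 are invented example values): the transformed density integrates to 1 and has mean µ.

```python
import numpy as np

# Numerical check of (4.3): f(y) = g((y - mu)/sigma) / sigma
mu, sigma = 3.0, 2.0
y = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)

def g(z):
    # standard normal density (4.1)
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

f = g((y - mu) / sigma) / sigma

# Riemann sums on a fine grid (the tails at +-10 sigma are negligible)
dy = y[1] - y[0]
total = f.sum() * dy          # should be ~1
mean = (y * f).sum() * dy     # should be ~mu
print(total, mean)
```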
4.2 Multivariate Normal Density Function
Start with independent standard normal random variables z1, z2, …, zp, with µi = 0 and σi² = 1 for all i and σij = 0 for i ≠ j. Then transform the zi's to multivariate normal variables y1, y2, …, yp with arbitrary means, variances, and covariances. Thus, start with a random vector z = (z1, z2, …, zp)′, where E(z) = 0, cov(z) = I, and each zi has the N(0, 1) density in (4.1). We want to transform z to a multivariate normal random vector y = (y1, y2, …, yp)′ with E(y) = μ and cov(y) = Σ, where μ is any p × 1 vector and Σ is any p × p positive definite matrix. The joint density of z is
g(z) = g(z1, z2, …, zp) = g1(z1) g2(z2) ⋯ gp(zp)
     = (1/√(2π)) e^{−z1²/2} (1/√(2π)) e^{−z2²/2} ⋯ (1/√(2π)) e^{−zp²/2}
     = (1/(√(2π))^p) e^{−∑i zi²/2}    (4.4)
     = (1/(√(2π))^p) e^{−z′z/2}

So z has a multivariate normal density with mean vector 0 and covariance matrix I; that is, z is distributed as Np(0, I), where p is the dimension of the distribution (= the number of variables in z).
Note:
1. Two random variables yi and yj are said to be independent if their joint density factors into the product of their marginal densities.
2. b′b = b1² + b2² + ⋯ + bp²

Now, define the transformation

y = Σ^{1/2} z + μ    (4.5)

where Σ^{1/2} is the (symmetric) square root matrix of Σ. By (3.31) & (3.36),

E(y) = E(Σ^{1/2} z + μ) = Σ^{1/2} E(z) + μ = 0 + μ = μ
cov(y) = cov(Σ^{1/2} z + μ) = Σ^{1/2} cov(z) (Σ^{1/2})′ = Σ^{1/2} I Σ^{1/2} = Σ
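The construction (4.5) can be carried out directly (a sketch; μ and Σ below are invented): build the symmetric square root from the eigendecomposition of Σ, transform standard normal draws, and check the sample mean and covariance.

```python
import numpy as np

# Build y = Sigma^{1/2} z + mu from z ~ N(0, I), as in (4.5)
rng = np.random.default_rng(5)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Symmetric square root via the eigendecomposition Sigma = C diag(lam) C'
lam, C = np.linalg.eigh(Sigma)
Sigma_half = C @ np.diag(np.sqrt(lam)) @ C.T
assert np.allclose(Sigma_half @ Sigma_half, Sigma)

Z = rng.normal(size=(200_000, 2))     # rows are z ~ N(0, I)
# Each row is (Sigma^{1/2} z)' + mu'; Sigma_half is symmetric, so Z @ Sigma_half works
Y = Z @ Sigma_half + mu

print(Y.mean(axis=0), np.cov(Y.T))
```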

Now, to find the density of y = Σ^{1/2} z + μ from the density of z in (4.4): by (4.2), the density of y = σz + µ is f(y) = g(z)|dz/dy| = g(z)(1/σ). The analogous expression for the multivariate linear transformation y = Σ^{1/2} z + μ is

f(y) = g(z) abs(|Σ^{−1/2}|)    (4.6)

where Σ^{−1/2} is defined as (Σ^{1/2})^{−1} and abs(|Σ^{−1/2}|) represents the absolute value of the determinant of Σ^{−1/2}, which parallels the expression |dz/dy| = 1/σ in the univariate case. [Note: The determinant |Σ^{−1/2}| is the Jacobian of the transformation.] Since Σ^{−1/2} is positive definite, we can write (4.6) as

f(y) = g(z) |Σ^{−1/2}|
     = g(z) / |Σ|^{1/2}
     = (1/((√(2π))^p |Σ|^{1/2})) e^{−z′z/2}
     = (1/((√(2π))^p |Σ|^{1/2})) e^{−[Σ^{−1/2}(y−μ)]′[Σ^{−1/2}(y−μ)]/2}    (4.7)
     = (1/((√(2π))^p |Σ|^{1/2})) e^{−(y−μ)′(Σ^{1/2}Σ^{1/2})^{−1}(y−μ)/2}
     = (1/((√(2π))^p |Σ|^{1/2})) e^{−(y−μ)′Σ^{−1}(y−μ)/2}

which is the multivariate normal density function with mean vector μ and covariance matrix Σ. We say that y is distributed as Np(μ, Σ). [Note: p is the dimension of the p-variate normal distribution and indicates the number of variables; that is, y is p × 1, μ is p × 1, and Σ is p × p.]

Note: A comparison between (4.7) and (4.3):

1. The standardized distance (y − μ)′Σ⁻¹(y − μ) appears in place of (y − µ)²/σ² in the exponent.
2. The square root of the generalized variance, |Σ|^{1/2}, appears in place of the square root of σ² in the denominator.
3. In (4.7), f(y) decreases as the distance from y to μ increases, and a small value of |Σ| indicates that the y's are concentrated closer to μ than is the case when |Σ| is large.
4. A small value of |Σ| may also indicate a high degree of multicollinearity among the variables.
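One sanity check of the density formula (4.7) (a sketch; all numeric values below are invented): when Σ is diagonal, the variables are independent, so (4.7) must factor into a product of univariate densities (4.3).

```python
import numpy as np

# Check (4.7) against the product of univariate normal densities
# for a diagonal Sigma (independent components).
mu = np.array([0.5, -1.0, 2.0])
sig = np.array([1.0, 2.0, 0.5])          # standard deviations
Sigma = np.diag(sig**2)
y = np.array([1.0, 0.0, 1.5])            # an arbitrary evaluation point

p = len(mu)
d = y - mu
quad = d @ np.linalg.solve(Sigma, d)     # (y - mu)' Sigma^{-1} (y - mu)
f_mv = np.exp(-quad / 2) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

# Product of univariate densities (4.3), one per component
f_uni = np.prod(np.exp(-((y - mu) / sig) ** 2 / 2) / (sig * np.sqrt(2 * np.pi)))
print(f_mv, f_uni)
```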

4.3 Moment-Generating Functions (mgf)

For a univariate random variable y, the moment-generating function is defined as

My(t) = E(e^{ty}),    (4.8)

provided that E(e^{ty}) exists for every real number t in the neighborhood −h < t < h for some positive number h.

The moment-generating function of y for the univariate normal N(µ, σ²) is given by

My(t) = e^{tµ + t²σ²/2}    (4.9)
Note:
1. The mgf can be used to generate moments. For a continuous r.v. y,

My(t) = E(e^{ty}) = ∫_{−∞}^{∞} e^{ty} f(y) dy,
M′y(t) = dMy(t)/dt = ∫_{−∞}^{∞} y e^{ty} f(y) dy    (4.10)

2. Setting t = 0 gives the first moment, or mean:

M′y(0) = ∫_{−∞}^{∞} y f(y) dy = E(y)    (4.11)

3. Similarly, the kth moment can be obtained using the kth derivative evaluated at 0:

My^{(k)}(0) = E(y^k)    (4.12)

4. The second moment, E(y²), can be used to find the variance.

For a random vector y, the mgf is defined as

My(t) = E(e^{t1y1 + t2y2 + ⋯ + tpyp}) = E(e^{t′y})    (4.13)

By analogy with (4.11),

∂My(0)/∂t = E(y),    (4.13b)

where the notation ∂My(0)/∂t indicates that ∂My(t)/∂t is evaluated at t = 0. Similarly, ∂²My(t)/∂tr∂ts evaluated at tr = ts = 0 gives E(yr ys):

∂²My(0)/∂tr∂ts = E(yr ys)    (4.14)

Theorem 4.3A. If y is distributed as Np(μ, Σ), its mgf is given by

My(t) = e^{t′μ + t′Σt/2}    (4.15)

Proof: By (4.13) & (4.7),

My(t) = ∫…∫ k e^{t′y − (y−μ)′Σ^{−1}(y−μ)/2} dy    (4.16)

where k = 1/((√(2π))^p |Σ|^{1/2}) and dy = dy1 dy2 ⋯ dyp. Rewriting the exponent,

My(t) = ∫…∫ k e^{t′μ + t′Σt/2 − (y−μ−Σt)′Σ^{−1}(y−μ−Σt)/2} dy
      = e^{t′μ + t′Σt/2} ∫…∫ k e^{−[y−(μ+Σt)]′Σ^{−1}[y−(μ+Σt)]/2} dy    (4.17)
      = e^{t′μ + t′Σt/2}

Note: The multiple integral in (4.17) is equal to 1 because the multivariate normal
density in (4.7) integrates to 1 for any mean vector, including μ + Σt .

Corollary 1. The mgf for y − μ is

M_{y−μ}(t) = e^{t′Σt/2}    (4.18)
(4.18)

Properties of mgfs:
1. If two r.vectors have the same mgf, they have the same density.
2. Two r.vectors are independent if and only if their joint mgf factors into the product of their two separate mgfs; that is, if y′ = (y1′, y2′) and t′ = (t1′, t2′), then y1 and y2 are independent if and only if

My(t) = My1(t1) My2(t2).    (4.19)

4.4 Properties of the Multivariate Normal Distribution

Firstly, consider the distribution of linear functions of multivariate normal r.vs.

Theorem 4.4A. Let the p × 1 r.vector y be Np(μ, Σ), let a be any p × 1 vector of constants, and let A be any k × p matrix of constants with rank k ≤ p. Then
(i) z = a′y is N(a′μ, a′Σa).
(ii) z = Ay is Nk(Aμ, AΣA′).

Corollary 1. If b is any k × 1 vector of constants, then

z = Ay + b is Nk(Aμ + b, AΣA′)

Theorem 4.4B. If y is Np(μ, Σ), then any r × 1 subvector of y has an r-variate normal distribution with the same means, variances, and covariances as in the original p-variate normal distribution. (Note: The marginal distributions of multivariate normal variables are also normal.)

Corollary 1. If y is N p ( μ, Σ ) , then any individual variable yi in y is distributed as N ( µi ,σ ii ) .

Theorem 4.4C. If v = (y′, x′)′ is Np+q(μ, Σ), then y and x are independent if Σyx = O.

Corollary 1. If y is Np(μ, Σ), then any two individual variables yi and yj are independent if σij = 0.

Corollary 2. If y is Np(μ, Σ) and cov(Ay, By) = AΣB′ = O, then Ay and By are independent.

Theorem 4.4D. If y and x are jointly multivariate normal with Σyx ≠ O, then the conditional distribution of y given x, f(y | x), is multivariate normal with mean vector and covariance matrix

E(y | x) = μy + ΣyxΣxx⁻¹(x − μx),    (4.20)
cov(y | x) = Σyy − ΣyxΣxx⁻¹Σxy    (4.21)

Corollary 1. Suppose v = (y, x1, x2, …, xq)′ = (y, x′)′ with

μ = (µy, μx′)′,    Σ = [ σy²   σ′yx
                         σyx   Σxx ]

where µy and σy² are the mean and variance of y, μx = (µ1, µ2, …, µq)′ contains the means µi = E(xi), σ′yx = (σy1, σy2, …, σyq) contains the covariances σyi = cov(y, xi), and Σxx contains the variances and covariances of the x's.

Then y | x is normal with

E(y | x) = µy + σ′yxΣxx⁻¹(x − μx),    (4.22)
var(y | x) = σy² − σ′yxΣxx⁻¹σxy    (4.23)

Note: In (4.23), σ′yxΣxx⁻¹σxy ≥ 0 because Σxx⁻¹ is positive definite. Therefore

var(y | x) ≤ var(y)    (4.24)
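The conditional formulas (4.22)–(4.23) are straightforward to compute (a sketch; the partitioned mean and covariance below are invented, with one y and q = 2 x's). Note how the conditional variance never exceeds var(y), in line with (4.24).

```python
import numpy as np

# Conditional mean and variance of y given x, as in (4.22)-(4.23)
mu_y = 1.0
mu_x = np.array([0.0, 2.0])
sigma_y2 = 4.0                           # var(y)
sigma_yx = np.array([1.0, -0.5])         # cov(y, x_i), i = 1, 2
Sigma_xx = np.array([[2.0, 0.3],
                     [0.3, 1.0]])
x = np.array([1.0, 1.0])                 # observed value of x

w = np.linalg.solve(Sigma_xx, sigma_yx)  # Sigma_xx^{-1} sigma_xy
cond_mean = mu_y + w @ (x - mu_x)        # (4.22)
cond_var = sigma_y2 - sigma_yx @ w       # (4.23)
print(cond_mean, cond_var)
```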

In Theorems 4.4C & 4.4D, the random vector v is partitioned into two subvectors denoted by y and x, where y is p × 1 and x is q × 1, with corresponding partitioning of μ and Σ:

v = (y′, x′)′,    μ = E(v) = (μy′, μx′)′,    Σ = cov(v) = [ Σyy  Σyx
                                                           Σxy  Σxx ].