# 9 Bivariate and Multivariate Normal distributions

9.1 Bivariate normal distribution
9.1.1 Density function
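If the random variables $(X, Y)$ have the joint pdf
$$
f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}
\exp\left\{ \frac{-1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}
\tag{1}
$$
where $-\infty < x < \infty$, $-\infty < y < \infty$, $-\infty < \mu_X < \infty$, $-\infty < \mu_Y < \infty$, $\sigma_X > 0$, $\sigma_Y > 0$ and $-1 < \rho < 1$, then $X, Y$ have a bivariate normal distribution.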
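We first show this is a proper density. It is clearly non-negative, so it remains to show that it integrates to one. In the double integral of the pdf we let $u = (x - \mu_X)/\sigma_X$ and $v = (y - \mu_Y)/\sigma_Y$ so that $du = dx/\sigma_X$ and $dv = dy/\sigma_Y$. Thus
$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{ \frac{-1}{2(1-\rho^2)} \left[ u^2 - 2\rho uv + v^2 \right] \right\} du\,dv.
$$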
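Completing the square in $u$ we have
$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{ \frac{-1}{2(1-\rho^2)} \left[ (u-\rho v)^2 + v^2(1-\rho^2) \right] \right\} du\,dv.
$$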
Now substitute $w = (u - \rho v)/\sqrt{1-\rho^2}$ so that $dw = du/\sqrt{1-\rho^2}$, giving
$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{2\pi} \exp\left\{ -\frac{w^2 + v^2}{2} \right\} dw\,dv
$$
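which can be written as
$$
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-w^2}{2} \right\} dw \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-v^2}{2} \right\} dv
$$
which equals one since each integral is that of a $N(0, 1)$ pdf.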
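As a numerical sanity check on the argument above, the following minimal sketch integrates the pdf of equation (1) with a midpoint rule. The function names `bvn_pdf` and `integrate_pdf`, the grid size, the truncation at 8 standard deviations and the parameter values are all assumptions made for illustration; they do not come from the notes.

```python
import math


def bvn_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal pdf as written in equation (1)."""
    u = (x - mu_x) / sigma_x
    v = (y - mu_y) / sigma_y
    quad = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / const


def integrate_pdf(mu_x, mu_y, sigma_x, sigma_y, rho, half_width=8.0, n=300):
    """Midpoint rule on the square mean +/- half_width standard deviations."""
    hx = 2.0 * half_width * sigma_x / n
    hy = 2.0 * half_width * sigma_y / n
    x0 = mu_x - half_width * sigma_x
    y0 = mu_y - half_width * sigma_y
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += bvn_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho)
    return total * hx * hy


# Illustrative parameter values (assumed for this sketch only).
for rho in (0.0, 0.5, -0.8):
    total = integrate_pdf(1.0, -2.0, 0.5, 3.0, rho)
    print(rho, round(total, 6))
```

Each printed total should be very close to one whatever the value of $\rho$, in line with the substitution argument; the truncated tails contribute a negligible amount.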
9.1.2 Moment generating function
The joint moment generating function of $(X, Y)$ is defined by
$$
M_{X,Y}(t_1, t_2) = E[e^{t_1 X + t_2 Y}].
$$
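It is used similarly to the moment generating function $M_X(t)$. Thus, for example, $\frac{\partial}{\partial t_1} M_{X,Y}(0, 0) = E[X]$, that is, the partial derivative of $M_{X,Y}(t_1, t_2)$ with respect to $t_1$, evaluated at $t_1 = 0$ and $t_2 = 0$, gives the mean of $X$. Similarly
$$
\frac{\partial}{\partial t_2} M_{X,Y}(0, 0) = E[Y], \qquad
\frac{\partial^2}{\partial t_1^2} M_{X,Y}(0, 0) = E[X^2], \qquad
\frac{\partial^2}{\partial t_1 \partial t_2} M_{X,Y}(0, 0) = E[XY]
$$
and so on.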
We state without proof the moment generating function for the bivariate normal distribution. It is
$$
M_{X,Y}(t_1, t_2) = \exp\left[ t_1\mu_X + t_2\mu_Y + \tfrac{1}{2}\left( t_1^2\sigma_X^2 + 2\rho t_1 t_2 \sigma_X\sigma_Y + t_2^2\sigma_Y^2 \right) \right].
$$
The parameters of the bivariate normal correspond to the moments
as we would expect but we can formally show they do so by using
the moment generating function. We have the following result.
Theorem 9.1. If $(X, Y)$ has a bivariate normal distribution with pdf as given in equation (1), then $E[X] = \mu_X$, $E[Y] = \mu_Y$, $\operatorname{Var}[X] = \sigma_X^2$, $\operatorname{Var}[Y] = \sigma_Y^2$, $\operatorname{Cov}[X, Y] = \rho\sigma_X\sigma_Y$ and $\operatorname{corr}[X, Y] = \rho$.
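Proof. We shall prove the first result and leave the remainder as an exercise.
$$
\frac{\partial M}{\partial t_1} = \exp\left[ t_1\mu_X + t_2\mu_Y + \tfrac{1}{2}\left( t_1^2\sigma_X^2 + 2\rho t_1 t_2 \sigma_X\sigma_Y + t_2^2\sigma_Y^2 \right) \right]
\times \left( \mu_X + \tfrac{1}{2}\, 2 t_1 \sigma_X^2 + \tfrac{1}{2}\, 2\rho t_2 \sigma_X\sigma_Y \right)
$$
Evaluating this at $t_1 = 0$, $t_2 = 0$ we find it is $1 \times (\mu_X + 0 + 0) = \mu_X$.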
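The remaining parts of Theorem 9.1 can be checked numerically by differentiating the stated mgf with central finite differences at $(0, 0)$. This is only a sketch: the step size `h` and the helper names `mgf` and `M` are assumptions of this illustration, and the parameter values are borrowed from Example 9.2.

```python
import math


def mgf(t1, t2, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Joint mgf of the bivariate normal distribution as stated in the text."""
    quad = (t1 ** 2 * sigma_x ** 2
            + 2.0 * rho * t1 * t2 * sigma_x * sigma_y
            + t2 ** 2 * sigma_y ** 2)
    return math.exp(t1 * mu_x + t2 * mu_y + 0.5 * quad)


# mu_X, mu_Y, sigma_X, sigma_Y, rho (values taken from Example 9.2)
params = (3.0, 2.0, 2.0, 1.0, 0.6)
h = 1e-3  # finite-difference step, an assumption of this sketch


def M(t1, t2):
    return mgf(t1, t2, *params)


# Central differences approximating the partial derivatives at (0, 0).
e_x = (M(h, 0.0) - M(-h, 0.0)) / (2.0 * h)
e_y = (M(0.0, h) - M(0.0, -h)) / (2.0 * h)
e_x2 = (M(h, 0.0) - 2.0 * M(0.0, 0.0) + M(-h, 0.0)) / h ** 2
e_y2 = (M(0.0, h) - 2.0 * M(0.0, 0.0) + M(0.0, -h)) / h ** 2
e_xy = (M(h, h) - M(h, -h) - M(-h, h) + M(-h, -h)) / (4.0 * h ** 2)

var_x = e_x2 - e_x ** 2
var_y = e_y2 - e_y ** 2
cov_xy = e_xy - e_x * e_y
corr_xy = cov_xy / math.sqrt(var_x * var_y)

print(round(e_x, 3), round(e_y, 3))
print(round(var_x, 3), round(var_y, 3), round(cov_xy, 3), round(corr_xy, 3))
```

Up to finite-difference error the results should match $\mu_X = 3$, $\mu_Y = 2$, $\sigma_X^2 = 4$, $\sigma_Y^2 = 1$, $\rho\sigma_X\sigma_Y = 1.2$ and $\rho = 0.6$.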
We have mentioned before that random variables which are jointly normal and uncorrelated, i.e. those which have $\rho = 0$, are independent. We can now prove this.
Theorem 9.2. If X, Y have a bivariate normal distribution then X
and Y are independent if and only if they are uncorrelated.
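Proof. If $\rho = 0$ then we may write the joint pdf of $X$ and $Y$ as
$$
\frac{1}{2\pi\sigma_X\sigma_Y} \exp\left\{ \frac{-1}{2} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}
$$
which can be written as
$$
\frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left\{ \frac{-1}{2} \left(\frac{x-\mu_X}{\sigma_X}\right)^2 \right\}
\frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\left\{ \frac{-1}{2} \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right\}
$$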
and as the joint pdf splits into the product of a function of x and a
function of y it follows that X and Y are independent.
If X, Y are independent they are certainly uncorrelated as E[XY ] =
E[X] E[Y ] so Cov[X, Y ] = 0.
9.1.3 The marginal and conditional densities
We can find the marginal densities for $X$ and $Y$ by integrating the other variable out of the joint pdf.
Theorem 9.3. If $X$ and $Y$ have a bivariate normal distribution then the marginal distribution of $X$ is normal with mean $\mu_X$ and variance $\sigma_X^2$. Similarly $Y$ is $N(\mu_Y, \sigma_Y^2)$.
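Proof. We have to find
$$
f(x) = \int_{-\infty}^{\infty} f(x, y)\,dy.
$$
In this integral let $v = (y - \mu_Y)/\sigma_Y$ so that $dv = dy/\sigma_Y$; then
$$
f(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi\sigma_X\sqrt{1-\rho^2}} \exp\left\{ \frac{-1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right) v + v^2 \right] \right\} dv.
$$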
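Completing the square on $v$, so that we add and subtract $\rho^2(x - \mu_X)^2/\sigma_X^2$, we have
$$
f(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi\sigma_X\sqrt{1-\rho^2}} \exp\left\{ \frac{-1}{2} \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \frac{-1}{2(1-\rho^2)} \left( v - \rho\,\frac{x-\mu_X}{\sigma_X} \right)^2 \right\} dv.
$$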
Now use the substitution
$$
w = \frac{v - \rho(x - \mu_X)/\sigma_X}{\sqrt{1-\rho^2}}, \qquad dw = \frac{dv}{\sqrt{1-\rho^2}}
$$
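so that
$$
f(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi\sigma_X} \exp\left\{ \frac{-1}{2} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + w^2 \right] \right\} dw
= \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left\{ \frac{-1}{2} \left(\frac{x-\mu_X}{\sigma_X}\right)^2 \right\} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{ \frac{-w^2}{2} \right\} dw.
$$
The final integral is equal to one as it is the integral of a $N(0, 1)$ pdf, and hence the result is proved.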
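Theorem 9.3 can also be illustrated numerically: integrating the joint pdf over $y$ at a fixed $x$ should reproduce the $N(\mu_X, \sigma_X^2)$ density. The sketch below assumes a midpoint rule truncated at 8 standard deviations of $Y$; the helper names and the evaluation points are illustrative choices, and the parameters are those of Example 9.1.

```python
import math
from statistics import NormalDist


def bvn_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal pdf as written in equation (1)."""
    u = (x - mu_x) / sigma_x
    v = (y - mu_y) / sigma_y
    quad = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / const


def marginal_x(x, mu_x, mu_y, sigma_x, sigma_y, rho, half_width=8.0, n=2000):
    """Integrate y out of the joint pdf with a midpoint rule."""
    h = 2.0 * half_width * sigma_y / n
    y0 = mu_y - half_width * sigma_y
    return h * sum(
        bvn_pdf(x, y0 + (j + 0.5) * h, mu_x, mu_y, sigma_x, sigma_y, rho)
        for j in range(n)
    )


# Parameters of Example 9.1: mu_X, mu_Y, sigma_X, sigma_Y, rho
params = (2.9, 2.4, 0.4, 0.5, 0.8)
exact = NormalDist(params[0], params[2])

gaps = [abs(marginal_x(x, *params) - exact.pdf(x)) for x in (2.0, 2.9, 3.2)]
print(max(gaps))
```

The largest gap should be at the level of rounding error, whichever value of $\rho$ in $(-1, 1)$ is used.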
We can now derive the conditional distribution of $Y \mid X = x$. Consider the expression for $f(x)$ in the previous proof after completing the square in $v$. We have that
$$
f(x, v) = \frac{1}{2\pi\sigma_X\sqrt{1-\rho^2}} \exp\left\{ \frac{-1}{2} \left(\frac{x-\mu_X}{\sigma_X}\right)^2 + \frac{-1}{2(1-\rho^2)} \left( v - \rho\,\frac{x-\mu_X}{\sigma_X} \right)^2 \right\}
$$
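so, dividing by the marginal pdf $f(x)$, the conditional distribution of $V \mid X = x$ has pdf
$$
f(v \mid x) = \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{ \frac{-1}{2(1-\rho^2)} \left( v - \rho\,\frac{x-\mu_X}{\sigma_X} \right)^2 \right\}.
$$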
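Reversing the transformation of $y$ to $v$ we have
$$
f(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ \frac{-1}{2(1-\rho^2)\sigma_Y^2} \left( y - \mu_Y - \frac{\rho\sigma_Y}{\sigma_X}(x - \mu_X) \right)^2 \right\}
$$
and hence the conditional distribution of $Y \mid X = x$ is normal with mean $\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$ and variance $\sigma_Y^2(1 - \rho^2)$.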
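The conditional result can be checked directly from the definition $f(y \mid x) = f(x, y)/f(x)$. The following sketch compares that ratio with the normal density having the stated conditional mean and variance; `conditional_params` is a hypothetical helper name, and the test points are arbitrary illustrative values.

```python
import math
from statistics import NormalDist


def bvn_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal pdf as written in equation (1)."""
    u = (x - mu_x) / sigma_x
    v = (y - mu_y) / sigma_y
    quad = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / const


def conditional_params(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of Y given X = x for the bivariate normal."""
    mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
    variance = sigma_y ** 2 * (1.0 - rho ** 2)
    return mean, variance


# Illustrative parameters: mu_X, mu_Y, sigma_X, sigma_Y, rho
params = (1.0, -2.0, 0.5, 3.0, -0.4)
x = 1.6
mean, variance = conditional_params(x, *params)

marginal = NormalDist(params[0], params[2])        # f(x), by Theorem 9.3
conditional = NormalDist(mean, math.sqrt(variance))

max_gap = max(
    abs(bvn_pdf(x, y, *params) / marginal.pdf(x) - conditional.pdf(y))
    for y in (-6.0, -3.4, -2.0, 1.5)
)
print(mean, variance, max_gap)
```

The gap between the ratio and the conditional normal density should be at rounding-error level for any $x$ and $y$.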
Example 9.1. In a population of college students the respective grade point averages, $X$, $Y$, in high school and first year in college have an approximate bivariate normal distribution with $\mu_X = 2.9$, $\mu_Y = 2.4$, $\sigma_X = 0.4$, $\sigma_Y = 0.5$ and $\rho = 0.8$. Find the probability that a first-year student in college has a grade point average between 2.1 and 3.3.
$$
\begin{aligned}
P(2.1 < Y < 3.3) &= P\left( \frac{2.1 - 2.4}{0.5} < \frac{Y - 2.4}{0.5} < \frac{3.3 - 2.4}{0.5} \right) \\
&= P(-0.6 < Z < 1.8) \\
&= \Phi(1.8) - \Phi(-0.6) \\
&= 0.6898
\end{aligned}
$$
where $Z \sim N(0, 1)$.
Find the corresponding probability for a first-year student whose GPA in high school was 3.2.
We have to find the conditional distribution of $Y$ given $X = 3.2$. Using the result given above it is normal with mean
$$
2.4 + 0.8 \times \frac{0.5}{0.4}\,(3.2 - 2.9) = 2.7
$$
and variance $0.5^2 (1 - 0.8^2) = 0.09$ so the standard deviation is 0.3. In this case
$$
\begin{aligned}
P(2.1 < Y < 3.3 \mid X = 3.2) &= P\left( \frac{2.1 - 2.7}{0.3} < \frac{Y - 2.7}{0.3} < \frac{3.3 - 2.7}{0.3} \right) \\
&= P(-2 < Z < 2) \\
&= \Phi(2) - \Phi(-2) \\
&= 0.9544
\end{aligned}
$$
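The two probabilities in Example 9.1 can be reproduced without tables. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mu_x, mu_y, sigma_x, sigma_y, rho = 2.9, 2.4, 0.4, 0.5, 0.8

# Marginal distribution of Y (Theorem 9.3).
y_marginal = NormalDist(mu_y, sigma_y)
p_marginal = y_marginal.cdf(3.3) - y_marginal.cdf(2.1)

# Conditional distribution of Y given X = 3.2.
x = 3.2
cond_mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
cond_sd = sigma_y * (1.0 - rho ** 2) ** 0.5
y_given_x = NormalDist(cond_mean, cond_sd)
p_conditional = y_given_x.cdf(3.3) - y_given_x.cdf(2.1)

print(f"P(2.1 < Y < 3.3)           = {p_marginal:.4f}")
print(f"P(2.1 < Y < 3.3 | X = 3.2) = {p_conditional:.4f}")
```

Computed this way the conditional probability comes out near 0.9545; the value 0.9544 in the worked example is what four-figure tables give, since $0.9772 - 0.0228 = 0.9544$.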
Example 9.2. Suppose $X, Y$ are bivariate normal with $\mu_X = 3$, $\mu_Y = 2$, $\sigma_X = 2$, $\sigma_Y = 1$ and $\rho = 0.6$. What is the distribution of $X + Y$?
We do this by finding the mgf of $X + Y$. We know the joint mgf of $X, Y$ is given by
$$
M_{X,Y}(t_1, t_2) = \exp\left[ t_1\mu_X + t_2\mu_Y + \tfrac{1}{2}\left( t_1^2\sigma_X^2 + 2\rho t_1 t_2 \sigma_X\sigma_Y + t_2^2\sigma_Y^2 \right) \right].
$$
We want $M_{X+Y}(t) = E[e^{t(X+Y)}]$. If we set $t_1 = t_2 = t$ in the joint mgf we will have the required answer.
$$
M_{X,Y}(t, t) = \exp\left[ t(\mu_X + \mu_Y) + \tfrac{1}{2} t^2 \left( \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2 \right) \right].
$$
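This is the mgf of a normal distribution with mean $\mu_X + \mu_Y$ and variance $\sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2$, and so we can say that
$$
X + Y \sim N(\mu_X + \mu_Y,\; \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2).
$$
Substituting the numbers for the parameters we see that $X + Y \sim N(5, 7.4)$.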
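The same mgf argument works for any linear combination $aX + bY$ by setting $t_1 = at$ and $t_2 = bt$, which is a small generalisation beyond what the example states. A minimal sketch, with `linear_combination_params` as a hypothetical helper name:

```python
def linear_combination_params(a, b, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of aX + bY, read off from M_{X,Y}(a t, b t)."""
    mean = a * mu_x + b * mu_y
    variance = (a ** 2 * sigma_x ** 2
                + 2.0 * a * b * rho * sigma_x * sigma_y
                + b ** 2 * sigma_y ** 2)
    return mean, variance


# Parameters of Example 9.2.
mu_x, mu_y, sigma_x, sigma_y, rho = 3.0, 2.0, 2.0, 1.0, 0.6

# X + Y, as in the example.
mean_sum, var_sum = linear_combination_params(
    1, 1, mu_x, mu_y, sigma_x, sigma_y, rho)

# X - Y, an extra case added for illustration.
mean_diff, var_diff = linear_combination_params(
    1, -1, mu_x, mu_y, sigma_x, sigma_y, rho)

print(mean_sum, round(var_sum, 10))
print(mean_diff, round(var_diff, 10))
```

For $X + Y$ this reproduces the mean 5 and variance 7.4 found above; for $X - Y$ the covariance term changes sign, giving mean 1 and variance 2.6.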
9.2 Multivariate normal distribution
Before we define the multivariate normal distribution we need some ideas about writing combinations of random variables as vectors.
9.2.1 Vectors of random variables
A vector Y is a random vector if its elements are random variables.
Below we show some properties of random vectors.
Definition 9.1. The expected value of a random vector is the vector of the respective expected values. That is, for a random vector $Z = (Z_1, \ldots, Z_n)^T$ we write
$$
E(Z) = E\begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_n \end{pmatrix}
= \begin{pmatrix} E(Z_1) \\ E(Z_2) \\ \vdots \\ E(Z_n) \end{pmatrix}
\tag{2}
$$
We have analogous properties of the expectation for random vectors as for single random variables. Namely, for a random vector $Z$, a constant scalar $a$, a constant vector $b$ and for matrices of constants $A$ and $B$ we have
$$
\begin{aligned}
E(aZ + b) &= a\,E(Z) + b \\
E(AZ) &= A\,E(Z) \\
E(Z^T B) &= E(Z)^T B
\end{aligned}
\tag{3}
$$
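Variances and covariances of the random variables $Z_i$ are put together to form the so-called variance-covariance (dispersion) matrix,
$$
\operatorname{Var}(Z) = \begin{pmatrix}
\operatorname{Var}(Z_1) & \operatorname{Cov}(Z_1, Z_2) & \cdots & \operatorname{Cov}(Z_1, Z_n) \\
\operatorname{Cov}(Z_2, Z_1) & \operatorname{Var}(Z_2) & \cdots & \operatorname{Cov}(Z_2, Z_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(Z_n, Z_1) & \operatorname{Cov}(Z_n, Z_2) & \cdots & \operatorname{Var}(Z_n)
\end{pmatrix}
\tag{4}
$$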
The dispersion matrix has the following properties.
(a) The matrix $\operatorname{Var}(Z)$ is symmetric since $\operatorname{Cov}(Z_i, Z_j) = \operatorname{Cov}(Z_j, Z_i)$.
(b) For mutually uncorrelated random variables the matrix is diagonal, since $\operatorname{Cov}(Z_i, Z_j) = 0$ for all $i \neq j$.
(c) The var-cov matrix can be expressed as
$$
\operatorname{Var}(Z) = E[(Z - E[Z])(Z - E[Z])^T]
$$
(d) The dispersion matrix of a transformed variable $U = AZ$ is
$$
\operatorname{Var}(U) = A \operatorname{Var}(Z) A^T
$$
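Proof. Denote by $\mu = (\mu_1, \ldots, \mu_n)^T = (E[Z_1], \ldots, E[Z_n])^T$.
To see (c) write
$$
\begin{aligned}
E[(Z - \mu)(Z - \mu)^T]
&= E\left[ \begin{pmatrix} Z_1 - \mu_1 \\ \vdots \\ Z_n - \mu_n \end{pmatrix}
\begin{pmatrix} Z_1 - \mu_1, & \ldots, & Z_n - \mu_n \end{pmatrix} \right] \\
&= \begin{pmatrix}
E(Z_1 - \mu_1)^2 & E[(Z_1 - \mu_1)(Z_2 - \mu_2)] & \cdots & E[(Z_1 - \mu_1)(Z_n - \mu_n)] \\
E[(Z_2 - \mu_2)(Z_1 - \mu_1)] & E(Z_2 - \mu_2)^2 & \cdots & E[(Z_2 - \mu_2)(Z_n - \mu_n)] \\
\vdots & \vdots & \ddots & \vdots \\
E[(Z_n - \mu_n)(Z_1 - \mu_1)] & E[(Z_n - \mu_n)(Z_2 - \mu_2)] & \cdots & E(Z_n - \mu_n)^2
\end{pmatrix} \\
&= \operatorname{Var}(Z).
\end{aligned}
$$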
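To show (d) we can use the notation of (c),
$$
\begin{aligned}
\operatorname{Var}(U) &= E[(U - E[U])(U - E[U])^T] \\
&= E[(AZ - A\mu)(AZ - A\mu)^T] \\
&= E[A(Z - \mu)(Z - \mu)^T A^T] \\
&= A\,E[(Z - \mu)(Z - \mu)^T]\,A^T \\
&= A \operatorname{Var}(Z) A^T.
\end{aligned}
$$
$\square$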
Note that property (c) gives an expression for the dispersion matrix of a random vector analogous to the expression for the variance of a single rv, that is
$$
\operatorname{Var}(Z) = E(ZZ^T) - \mu\mu^T. \tag{5}
$$
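Property (d) can be illustrated with a small numerical case. The sketch below takes the dispersion matrix implied by the parameters of Example 9.2 and the transformation $U = (X + Y, X - Y)^T$; the matrix `A` and the pure-Python helpers `matmul` and `transpose` are assumptions of this illustration.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


# Dispersion matrix of Z = (X, Y)^T with sigma_X = 2, sigma_Y = 1, rho = 0.6.
sigma_x, sigma_y, rho = 2.0, 1.0, 0.6
V = [[sigma_x ** 2, rho * sigma_x * sigma_y],
     [rho * sigma_x * sigma_y, sigma_y ** 2]]

# U = A Z with U = (X + Y, X - Y)^T.
A = [[1.0, 1.0],
     [1.0, -1.0]]

var_u = matmul(matmul(A, V), transpose(A))
for row in var_u:
    print([round(entry, 10) for entry in row])
```

The diagonal entries are the variances of $X + Y$ and $X - Y$ (7.4 and 2.6), and the off-diagonal entry is $\sigma_X^2 - \sigma_Y^2 = 3$, the covariance between the sum and the difference.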
Figure 1: Bivariate Normal pdf
9.2.2 Multivariate Normal Distribution
A random vector $X$ of length $n$ has a multivariate normal distribution if its p.d.f. can be written as
$$
f(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(V)}} \exp\left\{ -\frac{1}{2} (x - \mu)^T V^{-1} (x - \mu) \right\}, \tag{6}
$$
where $\mu$ is the mean and $V$ is the variance-covariance matrix of $X$.
When $n = 1$ we have $V = \sigma^2$ and
$$
f(x) = \frac{1}{(2\pi)^{1/2} (\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2} \right\}
$$
as usual.
When $n = 2$ and writing the random vector as $(X, Y)^T$ as before,
$$
\mu = (\mu_X, \mu_Y)^T, \qquad
V = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}
$$
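So $\det(V) = \sigma_X^2 \sigma_Y^2 (1 - \rho^2)$. We can invert $V$ to give
$$
V^{-1} = \begin{pmatrix}
\dfrac{1}{\sigma_X^2 (1-\rho^2)} & \dfrac{-\rho}{\sigma_X\sigma_Y (1-\rho^2)} \\[2ex]
\dfrac{-\rho}{\sigma_X\sigma_Y (1-\rho^2)} & \dfrac{1}{\sigma_Y^2 (1-\rho^2)}
\end{pmatrix}.
$$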
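Multiplying out
$$
\begin{pmatrix} x - \mu_X & y - \mu_Y \end{pmatrix} V^{-1} \begin{pmatrix} x - \mu_X \\ y - \mu_Y \end{pmatrix}
$$
and including the factor $-\frac{1}{2}$, we find that the argument of the exponential is equal to
$$
\frac{-1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right]
$$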
The constant term is
$$
(2\pi)^{-2/2} |V|^{-1/2} = (2\pi\sigma_X\sigma_Y)^{-1} (1 - \rho^2)^{-1/2}
$$
and so this pdf agrees with that of the bivariate normal distribution given at the beginning of this chapter.
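The agreement between the matrix form (6) with $n = 2$ and the bivariate formula (1) can also be confirmed numerically. In this sketch the $2 \times 2$ determinant and inverse are written out by hand; `mvn_pdf_2d` and `bvn_pdf` are hypothetical helper names, and the evaluation points are arbitrary. The parameters are those of Example 9.1.

```python
import math


def bvn_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal pdf as written in equation (1)."""
    u = (x - mu_x) / sigma_x
    v = (y - mu_y) / sigma_y
    quad = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / const


def mvn_pdf_2d(x, mu, V):
    """Equation (6) for n = 2, with V a 2x2 list of rows."""
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    inv = [[V[1][1] / det, -V[0][1] / det],
           [-V[1][0] / det, V[0][0] / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    # (2 pi)^(n/2) equals 2 pi when n = 2.
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


mu_x, mu_y, sigma_x, sigma_y, rho = 2.9, 2.4, 0.4, 0.5, 0.8
mu = [mu_x, mu_y]
V = [[sigma_x ** 2, rho * sigma_x * sigma_y],
     [rho * sigma_x * sigma_y, sigma_y ** 2]]

points = [(2.9, 2.4), (3.2, 2.7), (2.5, 3.0), (3.6, 1.9)]
max_gap = max(
    abs(mvn_pdf_2d(p, mu, V) - bvn_pdf(p[0], p[1], mu_x, mu_y, sigma_x, sigma_y, rho))
    for p in points
)
print(max_gap)
```

The two evaluations should differ only by floating-point rounding.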

