
CHAPTER 2

MULTIVARIATE NORMAL DISTRIBUTION

2.1 Introduction
Just as the univariate normal distribution tends to be the most important statistical
distribution in univariate statistics, the multivariate normal distribution is the most
important distribution in multivariate statistics.
Why is the multivariate normal distribution so important?

a) Mathematical Simplicity - It turns out that this distribution is relatively easy to work with, so it is easy to obtain multivariate methods based on this particular distribution.

b) Multivariate version of the Central Limit Theorem - In the univariate course we had a central limit theorem for the sample mean for large samples of random variables. A similar result is available in multivariate statistics: if we have a collection of random vectors that are independent and identically distributed, then the sample mean vector, $\bar{\mathbf{x}}$, is going to be approximately multivariate normally distributed for large samples.

c) Many natural phenomena may also be modeled using this distribution, just as in the
univariate case.

2.2 Univariate Normal Distributions


A random variable $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$ (written $X \sim N(\mu, \sigma^2)$) if it has the probability density function

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \qquad -\infty < x < \infty$$

Main Features of p.d.f.
• It is bell shaped and symmetric about $\mu$.
• Approximately 95% of the distribution lies within 2 standard deviations of the mean; this is sometimes called the "empirical rule". Also, approximately 99.7% of the distribution lies within 3 standard deviations of the mean (see the numerical sketch after this list).

• The maximum value of $f(x)$ occurs when $x = \mu$ and is given by

$$f(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}$$

• $\mu$ changes the position (location) of the curve and $\sigma$ changes its shape (spread).
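A minimal numerical sketch of these features, assuming purely illustrative values $\mu = 10$ and $\sigma = 2$; it evaluates the density directly from the formula, compares it with scipy.stats.norm, and checks the 2- and 3-standard-deviation probabilities:

```python
# Minimal sketch (illustrative values): evaluate the N(mu, sigma^2) density
# directly from the formula and check the empirical ("2 sigma") rule.
import numpy as np
from scipy.stats import norm

mu, sigma = 10.0, 2.0          # hypothetical mean and standard deviation

def normal_pdf(x, mu, sigma):
    """f(x) = (1 / (sqrt(2*pi)*sigma)) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 9)
print(np.allclose(normal_pdf(x, mu, sigma), norm.pdf(x, mu, sigma)))   # True

# Maximum of the density occurs at x = mu and equals 1 / (sqrt(2*pi)*sigma)
print(normal_pdf(mu, mu, sigma), 1 / (np.sqrt(2 * np.pi) * sigma))

# Probability within 2 and 3 standard deviations of the mean
print(norm.cdf(2) - norm.cdf(-2))   # about 0.954
print(norm.cdf(3) - norm.cdf(-3))   # about 0.997
```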

2.3 Multivariate Normal Distributions
If we have a random vector $\mathbf{X}$ that is distributed according to a multivariate normal distribution with population mean vector $\boldsymbol\mu$ and population variance-covariance matrix $\Sigma$, then this random vector, $\mathbf{X}$, will have the joint density function shown in the expression below:

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\exp\!\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right\}$$

where $|\Sigma|$ is the determinant of the variance-covariance matrix $\Sigma$.
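A minimal sketch, assuming an illustrative mean vector and covariance matrix, that evaluates this density directly from the formula and compares the result with scipy.stats.multivariate_normal:

```python
# Minimal sketch (illustrative mu and Sigma): evaluate the p-dimensional
# normal density directly from the formula and compare with scipy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 2.0])                      # hypothetical mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                 # hypothetical covariance matrix
p = len(mu)

def mvn_pdf(x, mu, Sigma):
    """(2*pi)^(-p/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)' Sigma^{-1} (x-mu))"""
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff   # squared Mahalanobis distance
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

x = np.array([1.0, 1.0])
print(mvn_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should agree
```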

Main Features of p.d.f.
• This distribution takes its maximum value when the vector $\mathbf{x}$ is equal to the mean vector $\boldsymbol\mu$, and decreases around that maximum.
• If $p$ is equal to 2, then we have just a bivariate normal distribution, and this will yield a bell-shaped surface, but now in three dimensions.
• That the vector $\mathbf{X}$ is distributed as multivariate normal with mean vector $\boldsymbol\mu$ and variance-covariance matrix $\Sigma$ is denoted by $\mathbf{X} \sim N_p(\boldsymbol\mu, \Sigma)$.
• The following term appearing inside the exponent of the multivariate normal distribution is a quadratic form: $(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)$. This particular quadratic form is also called the squared Mahalanobis distance between the random vector $\mathbf{x}$ and the mean vector $\boldsymbol\mu$.

• Each single variable has a univariate normal distribution. Thus we can look at univariate tests of normality for each variable when assessing multivariate normality.
• Any subset of the variables also has a multivariate normal distribution.
• Any linear combination of the variables has a univariate normal distribution.


Figure 2.1: Two bivariate normal distributions, panels (a) and (b)

Example 2.1:
Evaluate the bivariate normal density in terms of the individual parameters $\mu_1, \mu_2, \sigma_{11}, \sigma_{22}$, and $\rho_{12}$.
Example 2.2:
Consider a bivariate normal distribution with the given parameters.
(a) Write out the bivariate normal density.
(b) Write out the squared statistical distance expression as a quadratic function of $x_1$ and $x_2$.

2.4 Exponent of Multivariate Normal Distribution
Recall the probability density function shown in the expression below:

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\exp\!\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right\}$$

This probability density function depends on $\mathbf{x}$ only through the squared Mahalanobis distance

$$d^2 = (\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu).$$

Setting $d^2$ equal to a constant gives the equation of a hyper-ellipse centered at $\boldsymbol\mu$. From the expression of the probability density function of a $p$-dimensional normal variable, it should be clear that the paths of $\mathbf{x}$ values yielding a constant height for the density are ellipsoids. That is, the multivariate normal density is constant on surfaces where the squared distance $(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)$ is constant. These paths are called contours.
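A small sketch, with an illustrative $\boldsymbol\mu$ and $\Sigma$, showing that two points with the same squared Mahalanobis distance (here one point and its reflection through the mean) have the same density:

```python
# Minimal sketch (illustrative parameters): the density depends on x only
# through the squared Mahalanobis distance, so two points at the same
# distance lie on the same contour and have the same density.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sinv = np.linalg.inv(Sigma)

def mahalanobis_sq(x):
    d = x - mu
    return d @ Sinv @ d

rv = multivariate_normal(mean=mu, cov=Sigma)

x1 = np.array([1.0, 0.5])
# Reflecting through the mean leaves (x - mu)' Sigma^{-1} (x - mu) unchanged,
# so x2 lies on the same elliptical contour as x1.
x2 = 2 * mu - x1

print(mahalanobis_sq(x1), mahalanobis_sq(x2))   # equal
print(rv.pdf(x1), rv.pdf(x2))                   # equal densities
```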

Constant probability density contour $= \{\text{all } \mathbf{x} \text{ such that } (\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu) = c^2\}$ = surface of an ellipsoid centered at $\boldsymbol\mu$.
The axes of each ellipsoid of constant density are in the direction of the eigenvectors of $\Sigma^{-1}$, and their lengths are proportional to the reciprocals of the square roots of the eigenvalues of $\Sigma^{-1}$.

Result 2.1:
If $\Sigma$ is positive definite, so that $\Sigma^{-1}$ exists, then

$$\Sigma\mathbf{e} = \lambda\mathbf{e} \quad \text{implies} \quad \Sigma^{-1}\mathbf{e} = \left(\frac{1}{\lambda}\right)\mathbf{e}$$

So $(1/\lambda, \mathbf{e})$ is an eigenvalue-eigenvector pair for $\Sigma^{-1}$ corresponding to the pair $(\lambda, \mathbf{e})$ for $\Sigma$. Also, $\Sigma^{-1}$ is positive definite.
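A quick numerical check of Result 2.1, using a hypothetical positive definite $\Sigma$: the eigenvalues of $\Sigma^{-1}$ are the reciprocals of those of $\Sigma$, with the same eigenvectors:

```python
# Minimal sketch (illustrative Sigma): if (lambda, e) is an eigenpair of Sigma,
# then (1/lambda, e) is an eigenpair of Sigma^{-1}, as stated in Result 2.1.
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # hypothetical positive definite matrix

lam, E = np.linalg.eigh(Sigma)          # eigenvalues (ascending) and eigenvectors
lam_inv, _ = np.linalg.eigh(np.linalg.inv(Sigma))

print(lam)                  # eigenvalues of Sigma
print(1 / lam[::-1])        # their reciprocals, reordered ascending
print(lam_inv)              # eigenvalues of Sigma^{-1}: same values

# Each eigenvector of Sigma is also an eigenvector of Sigma^{-1}
for j in range(2):
    e = E[:, j]
    print(np.allclose(np.linalg.inv(Sigma) @ e, (1 / lam[j]) * e))   # True
```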


The following summarizes these concepts:

Contours of constant density for the $p$-dimensional normal distribution are ellipsoids defined by the $\mathbf{x}$ such that

$$(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu) = c^2$$

These ellipsoids are centered at $\boldsymbol\mu$ and have axes $\pm c\sqrt{\lambda_i}\,\mathbf{e}_i$, where $\Sigma\mathbf{e}_i = \lambda_i\mathbf{e}_i$ for $i = 1, 2, \ldots, p$.


For a bivariate normal, where we have $p = 2$ variables, we have an ellipse as shown in the plot below:


Figure 2.2: A constant-density contour for a bivariate normal distribution
Example 2.3: (Contours of the bivariate normal distribution)
Obtain the axes of the constant probability density contours for a bivariate normal distribution when $\sigma_{11} = \sigma_{22}$.

2.5 Geometry of the Multivariate Normal Distribution


The geometry of the multivariate normal distribution can be investigated by considering the orientation and shape of the prediction ellipse, as depicted in the following diagram:

[Figure: prediction ellipse for a bivariate normal distribution]
The prediction ellipse above is centered on the population means $\mu_1$ and $\mu_2$.
The ellipse has axes pointing in the directions of the eigenvectors $\mathbf{e}_1$ and $\mathbf{e}_2$. Here, in this diagram for the bivariate normal, the longest axis of the ellipse points in the direction of the first eigenvector $\mathbf{e}_1$, and the shorter axis is perpendicular to the first, pointing in the direction of the second eigenvector $\mathbf{e}_2$.

The corresponding half-lengths of the axes are obtained from the following expression:

$$l_j = \sqrt{\lambda_j\,\chi^2_p(\alpha)}, \qquad j = 1, 2$$

The plot above captures the lengths of these axes within the ellipse.
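A minimal sketch, assuming an illustrative $\Sigma$ and $\alpha = 0.05$, that computes the axis directions (eigenvectors of $\Sigma$) and the half-lengths $\sqrt{\lambda_j\,\chi^2_p(\alpha)}$ of a 95% prediction ellipse:

```python
# Minimal sketch (illustrative Sigma and alpha): axis directions are the
# eigenvectors of Sigma; half-lengths are sqrt(lambda_j * chi2_p(alpha)).
import numpy as np
from scipy.stats import chi2

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])           # hypothetical covariance matrix
p, alpha = 2, 0.05

lam, E = np.linalg.eigh(Sigma)           # eigenvalues ascending; columns of E are eigenvectors
c2 = chi2.ppf(1 - alpha, df=p)           # chi-square critical value chi2_p(alpha)

for j in range(p)[::-1]:                 # largest eigenvalue first (major axis)
    half_length = np.sqrt(lam[j] * c2)
    print("direction:", E[:, j], "half-length:", half_length)
```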

2.6 Special Cases:
To further understand the shape of the multivariate normal distribution, let's return to
the special case where we have p = 2 variables.
If $\rho_{12} = 0$, there is zero correlation, and the eigenvalues turn out to be equal to the variances of the two variables. So, for example, the first eigenvalue would be equal to $\lambda_1 = \sigma_{11}$ and the second eigenvalue would be equal to $\lambda_2 = \sigma_{22}$, as shown below:

$$\Sigma = \begin{pmatrix} \sigma_{11} & 0 \\ 0 & \sigma_{22} \end{pmatrix}, \qquad \lambda_1 = \sigma_{11}, \quad \lambda_2 = \sigma_{22}$$

The corresponding eigenvectors will have elements 1 and 0 for the first eigenvalue and 0 and 1 for the second eigenvalue, that is, $\mathbf{e}_1 = (1, 0)'$ and $\mathbf{e}_2 = (0, 1)'$.


So, the axes of the ellipse, in this case, are parallel to the coordinate axes.
If there is zero correlation and the variances are equal, so that $\sigma_{11} = \sigma_{22}$, then the eigenvalues will be equal to one another, and instead of an ellipse we will get a circle. In this special case we have a so-called circular normal distribution.


Figure 2.4: The 50% and 90% contours for the bivariate normal distribution

If the correlation is greater than zero, then the longer axis of the ellipse will have a
positive slope.
Conversely, if the correlation is less than zero, then the longer axis of the ellipse will
have a negative slope.
As the correlation approaches plus or minus 1, the larger eigenvalue will approach the sum of the two variances, and the smaller eigenvalue will approach zero:

$$\lambda_1 \to \sigma_{11} + \sigma_{22}, \qquad \lambda_2 \to 0$$
So, what is going to happen in this case is that the ellipse becomes more and more
elongated as the correlation approaches one.
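A small numerical illustration of this elongation, using hypothetical variances $\sigma_{11} = 2$ and $\sigma_{22} = 1$: as the correlation grows, the larger eigenvalue approaches $\sigma_{11} + \sigma_{22} = 3$ and the smaller one approaches zero:

```python
# Minimal sketch (illustrative variances): as the correlation rho approaches
# +/-1, the larger eigenvalue approaches sigma11 + sigma22 and the smaller
# one approaches 0, so the contour ellipse becomes more and more elongated.
import numpy as np

s11, s22 = 2.0, 1.0                      # hypothetical variances
for rho in [0.0, 0.5, 0.9, 0.99]:
    s12 = rho * np.sqrt(s11 * s22)       # covariance implied by the correlation
    Sigma = np.array([[s11, s12],
                      [s12, s22]])
    lam = np.linalg.eigvalsh(Sigma)      # eigenvalues in ascending order
    print(rho, lam[::-1], "sum of variances:", s11 + s22)
```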

Useful Facts about the Exponent Component
• All values of $\mathbf{x}$ such that $(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu) = c^2$ for any specified constant value $c^2$ have the same value of the density and thus have equal likelihood.
• As the value of $(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)$ increases, the value of the density function decreases. This quadratic form increases as the distance between $\mathbf{x}$ and $\boldsymbol\mu$ increases.
• The variable $(\mathbf{X}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{X}-\boldsymbol\mu)$ has a chi-square distribution with $p$ degrees of freedom.
• The value of this quadratic form for a specific observation $\mathbf{x}_i$ is called a squared Mahalanobis distance. It is calculated as $d_i^2 = (\mathbf{x}_i-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}_i-\boldsymbol\mu)$.


If we define a specific hyper-ellipse by taking the squared Mahalanobis distance equal to a critical value $\chi^2_p(\alpha)$ of the chi-square distribution with $p$ degrees of freedom (the value that cuts off an upper-tail probability of $\alpha$),

$$(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu) \le \chi^2_p(\alpha),$$

then the probability that the random vector $\mathbf{X}$ will fall inside the ellipse is going to be equal to $1-\alpha$.

This particular ellipse is called the $(1-\alpha)\times 100\%$ prediction ellipse for a multivariate normal random vector with mean vector $\boldsymbol\mu$ and variance-covariance matrix $\Sigma$.
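A minimal simulation sketch (illustrative $\boldsymbol\mu$, $\Sigma$, and $\alpha = 0.05$) checking that about $1-\alpha$ of the draws fall inside the prediction ellipse defined by the chi-square critical value:

```python
# Minimal sketch (illustrative parameters): the squared Mahalanobis distance
# of a multivariate normal vector has a chi-square distribution with p
# degrees of freedom, so about (1 - alpha)*100% of simulated points fall
# inside the ellipse defined by the chi-square critical value.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu = np.array([0.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
alpha, n = 0.05, 100_000

X = rng.multivariate_normal(mu, Sigma, size=n)
diff = X - mu
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)   # squared distances

inside = d2 <= chi2.ppf(1 - alpha, df=2)
print(inside.mean())        # close to 0.95
```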


Example 2.4:
Consider a bivariate normal distribution with the given parameters.
(a) Write out the bivariate normal density.
(b) Write out the squared statistical distance expression as a quadratic function of $x_1$ and $x_2$.
(c) Determine (and sketch) the constant density contour that contains $(1-\alpha)\times 100\%$ of the probability for a given $\alpha$.

Result 2.2:
If $\mathbf{X}$ is distributed as $N_p(\boldsymbol\mu, \Sigma)$, then any linear combination of variables $\mathbf{a}'\mathbf{X} = a_1X_1 + a_2X_2 + \cdots + a_pX_p$ is distributed as $N(\mathbf{a}'\boldsymbol\mu,\ \mathbf{a}'\Sigma\mathbf{a})$. Also, if $\mathbf{a}'\mathbf{X}$ is distributed as $N(\mathbf{a}'\boldsymbol\mu,\ \mathbf{a}'\Sigma\mathbf{a})$ for every $\mathbf{a}$, then $\mathbf{X}$ must be $N_p(\boldsymbol\mu, \Sigma)$.
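A small sketch of Result 2.2, with hypothetical $\boldsymbol\mu$, $\Sigma$, and coefficient vector $\mathbf{a}$: the simulated mean and variance of $\mathbf{a}'\mathbf{X}$ match $\mathbf{a}'\boldsymbol\mu$ and $\mathbf{a}'\Sigma\mathbf{a}$:

```python
# Minimal sketch (illustrative mu, Sigma, a): the linear combination a'X is
# univariate normal with mean a'mu and variance a'Sigma a.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a = np.array([1.0, -1.0, 2.0])                 # hypothetical coefficient vector

print("theoretical mean:", a @ mu)             # a'mu
print("theoretical variance:", a @ Sigma @ a)  # a'Sigma a

X = rng.multivariate_normal(mu, Sigma, size=200_000)
y = X @ a                                      # samples of a'X
print("simulated mean:", y.mean(), "simulated variance:", y.var())
```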

Example 2.5:
Consider the linear combination of a multivariate normal random vector determined by a given choice of the coefficient vector $\mathbf{a}$.
Example 2.6:
For $\mathbf{X}$ distributed as $N_p(\boldsymbol\mu, \Sigma)$, find the distribution of the given linear combination of its components.

Result 2.3:
All subsets of $\mathbf{X}$ are normally distributed. If we respectively partition $\mathbf{X}$, its mean vector $\boldsymbol\mu$, and its covariance matrix $\Sigma$ as

$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}, \qquad \boldsymbol\mu = \begin{pmatrix} \boldsymbol\mu_1 \\ \boldsymbol\mu_2 \end{pmatrix}$$

and

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

then $\mathbf{X}_1$ (of dimension $q \times 1$) is distributed as $N_q(\boldsymbol\mu_1, \Sigma_{11})$.
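A minimal sketch of Result 2.3, assuming an illustrative three-dimensional $\boldsymbol\mu$ and $\Sigma$: the first two components form $\mathbf{X}_1$, and its marginal distribution uses the corresponding blocks $\boldsymbol\mu_1$ and $\Sigma_{11}$:

```python
# Minimal sketch (illustrative partition): the subvector X1 = (X_1, X_2)' of a
# multivariate normal X has mean mu1 and covariance Sigma_11, the
# corresponding blocks of mu and Sigma.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

q = 2                       # size of the first block
mu1 = mu[:q]                # mean vector of X1
Sigma11 = Sigma[:q, :q]     # covariance matrix of X1

marginal = multivariate_normal(mean=mu1, cov=Sigma11)   # N_q(mu1, Sigma11)
print(mu1)
print(Sigma11)
print(marginal.pdf([1.0, 2.0]))   # density of X1 evaluated at its mean
```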

Example 2.7: (The distribution of a subset of a normal random vector)

If $\mathbf{X}$ is distributed as $N_p(\boldsymbol\mu, \Sigma)$, find the distribution of a given subset of its components.
2.7 Assessing the Assumption of Normality
Univariate case
The calculations required for Q-Q plots are easily programmed for electronic computers, and many commercially available statistical programs can produce such plots.
• The steps leading to a Q-Q plot are as follows:
• Order the original observations to get $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ and their corresponding probability values $(1-\tfrac{1}{2})/n,\ (2-\tfrac{1}{2})/n,\ \ldots,\ (n-\tfrac{1}{2})/n$;
• Calculate the standard normal quantiles $q_{(1)}, q_{(2)}, \ldots, q_{(n)}$; and
• Plot the pairs of observations $(q_{(j)}, x_{(j)})$ and examine the "straightness" of the outcome, as sketched in the code after this list.
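A minimal sketch of these steps, using a simulated sample in place of real data (the sample size, mean, and standard deviation are illustrative):

```python
# Minimal sketch (simulated data): order the observations, compute the
# probability levels (j - 1/2)/n, convert them to standard normal quantiles,
# and plot the quantile pairs to judge straightness.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.normal(loc=10, scale=2, size=30)      # hypothetical sample

x_ordered = np.sort(x)                        # x_(1) <= ... <= x_(n)
n = len(x_ordered)
probs = (np.arange(1, n + 1) - 0.5) / n       # (j - 1/2)/n
q = norm.ppf(probs)                           # standard normal quantiles q_(j)

plt.scatter(q, x_ordered)
plt.xlabel("standard normal quantiles")
plt.ylabel("ordered observations")
plt.title("Q-Q plot")
plt.show()

# A correlation close to 1 between q_(j) and x_(j) supports normality.
print(np.corrcoef(q, x_ordered)[0, 1])
```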


Figure 2.5: A Q-Q plot for the data


Evaluating Bivariate Normality


The scatter plot should conform to bivariate normality by exhibiting an overall pattern that is nearly elliptical.
Moreover, the set of bivariate outcomes $\mathbf{x}$ such that

$$(\mathbf{x}-\bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}-\bar{\mathbf{x}}) \le \chi^2_2(0.5)$$

should contain roughly 50% of the observations, where $\bar{\mathbf{x}}$ is the sample mean vector and $\mathbf{S}$ is the sample covariance matrix.
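A minimal sketch of this 50% contour check, using simulated bivariate data in place of real observations (the mean vector and covariance matrix below are purely illustrative):

```python
# Minimal sketch (simulated data): compute the generalized squared distances
# using the sample mean vector and sample covariance matrix; roughly half of
# them should fall inside the 50% chi-square contour.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
X = rng.multivariate_normal([10.0, 5.0], [[4.0, 1.0], [1.0, 2.0]], size=200)

xbar = X.mean(axis=0)                       # sample mean vector
S = np.cov(X, rowvar=False)                 # sample covariance matrix
diff = X - xbar
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)

inside = d2 <= chi2.ppf(0.5, df=2)          # 50% point of chi-square(2)
print(inside.mean())                        # should be roughly 0.5
```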

Example 2.8:


Data consisting of the pairs of observations for the 10 largest companies in the world are given.

Check the normality of the data.

