2 MULTIVARIATE NORMAL DISTRIBUTION
2.1 Introduction
Just as the univariate normal distribution tends to be the most important statistical
distribution in univariate statistics, the multivariate normal distribution is the most
important distribution in multivariate statistics.
Why is the multivariate normal distribution so important?
11/02/2023
a) Mathematical Simplicity - It turns out that this distribution is relatively easy to work with, so it is easy to obtain multivariate methods based on this particular distribution.
b) Multivariate version of the Central Limit Theorem - In the univariate course we had a central limit theorem for the sample mean for large samples of random variables. A similar result is available in multivariate statistics: if we have a collection of random vectors that are independent and identically distributed, then the sample mean vector, \(\bar{\mathbf{x}}\), is approximately multivariate normally distributed for large samples.
c) Many natural phenomena may also be modeled using this distribution, just as in the
univariate case.
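The multivariate CLT in point (b) can be checked empirically. The sketch below (NumPy assumed; the sample sizes and the choice of exponential coordinates are illustrative) averages i.i.d. non-normal random vectors and shows that the standardized sample mean vector behaves approximately like a standard multivariate normal:

```python
import numpy as np

# Sketch: empirical check of the multivariate CLT using i.i.d. random
# vectors that are NOT normal (independent Exponential(1) coordinates).
rng = np.random.default_rng(0)
n, p, reps = 200, 2, 5000

# Each repetition: the sample mean vector of n i.i.d. random vectors.
means = np.array([rng.exponential(1.0, size=(n, p)).mean(axis=0)
                  for _ in range(reps)])

# Standardized sample means should look approximately N(0, I):
z = (means - 1.0) * np.sqrt(n)   # Exponential(1) has mean 1, variance 1
print(z.mean(axis=0))            # close to [0, 0]
print(np.cov(z.T))               # close to the identity matrix
```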
Main Features of the p.d.f.
It is bell shaped and symmetric about the mean \(\mu\).
Approximately 95% of the distribution lies within 2 standard deviations of the mean; this is sometimes called the "\(2\sigma\) rule". Also, approximately 99.7% of the distribution lies within 3 standard deviations of the mean.
2.3 Multivariate Normal Distributions
If we have a random vector \(\mathbf{X}\) that is distributed according to a multivariate normal distribution with population mean vector \(\boldsymbol{\mu}\) and population variance-covariance matrix \(\Sigma\), then this random vector will have the joint density function shown in the expression below:
\[ f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \right\} \]
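As a minimal sketch, the joint density can be evaluated directly from its definition in NumPy (the evaluation point and parameters below are illustrative):

```python
import numpy as np

# Sketch: evaluate the multivariate normal density from its formula.
# mu is the mean vector, Sigma the variance-covariance matrix.
def mvn_density(x, mu, Sigma):
    p = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    # squared Mahalanobis distance (x - mu)' Sigma^{-1} (x - mu)
    d2 = diff @ np.linalg.solve(Sigma, diff)
    const = (2 * np.pi) ** (-p / 2) / np.sqrt(np.linalg.det(Sigma))
    return const * np.exp(-0.5 * d2)

# At x = mu for the standard bivariate normal the density is 1 / (2*pi):
print(mvn_density([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```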
Main Features of the p.d.f.
This distribution takes its maximum value when the vector \(\mathbf{x}\) is equal to the mean vector \(\boldsymbol{\mu}\), and decreases around that maximum.
If \(p\) is equal to 2, then we have just a bivariate normal distribution, and this will yield a bell-shaped surface, but now in three dimensions.
The statement that the vector \(\mathbf{X}\) is distributed as multivariate normal with mean vector \(\boldsymbol{\mu}\) and variance-covariance matrix \(\Sigma\) is denoted by \(\mathbf{X} \sim N_p(\boldsymbol{\mu}, \Sigma)\).
The following term appearing inside the exponent of the multivariate normal density is a quadratic form: \((\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\). This particular quadratic form is also called the squared Mahalanobis distance between the random vector \(\mathbf{x}\) and the mean vector \(\boldsymbol{\mu}\).
Each single variable has a univariate normal distribution. Thus we can look at
univariate tests of normality for each variable when assessing multivariate
normality.
Any subset of the variables also has a multivariate normal distribution.
Any linear combination of the variables has a univariate normal distribution.
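These properties can be checked empirically. The sketch below (NumPy and SciPy assumed; the mean vector, covariance matrix, and sample size are illustrative) draws a bivariate normal sample and applies a univariate normality test to each marginal:

```python
import numpy as np
from scipy import stats

# Sketch: each coordinate of a multivariate normal sample is itself
# univariate normal, so univariate normality tests apply marginally.
rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=1000)

for j in range(X.shape[1]):
    stat, pval = stats.shapiro(X[:, j])  # univariate normality test
    print(f"variable {j}: Shapiro-Wilk p-value = {pval:.3f}")
```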
Example 2.1:
Evaluate the bivariate normal density in terms of the individual parameters \(\mu_1, \mu_2, \sigma_{11}, \sigma_{22}, \rho_{12}\).
Example 2.2:
Consider a bivariate normal distribution with given mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\Sigma\).
(a) Write out the bivariate normal density.
(b) Write out the squared statistical distance expression as a quadratic function of \(x_1\) and \(x_2\).
2.4 Exponent of Multivariate Normal Distribution
The probability density function given in Section 2.3 depends on \(\mathbf{x}\) only through the squared Mahalanobis distance
\[ (\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^2, \]
which is the equation of a hyper-ellipse centered at \(\boldsymbol{\mu}\).
From the expression for the probability density function of a \(p\)-dimensional normal variable, it should be clear that the paths of \(\mathbf{x}\) values yielding a constant height for the density are ellipsoids. That is, the multivariate normal density is constant on surfaces where the squared distance \((\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\) is constant. These paths are called contours.
Constant probability density contour = \(\{\text{all } \mathbf{x} \text{ such that } (\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^2\}\) = surface of an ellipsoid centered at \(\boldsymbol{\mu}\).
The axes of each ellipsoid of constant density are in the direction of the eigenvectors of \(\Sigma^{-1}\), and their lengths are proportional to the reciprocals of the square roots of the eigenvalues of \(\Sigma^{-1}\).
Result 2.1:
If \(\Sigma\) is positive definite, so that \(\Sigma^{-1}\) exists, then \(\Sigma\mathbf{e} = \lambda\mathbf{e}\) implies \(\Sigma^{-1}\mathbf{e} = (1/\lambda)\mathbf{e}\), so \((\lambda, \mathbf{e})\) is an eigenvalue-eigenvector pair for \(\Sigma\) corresponding to the pair \((1/\lambda, \mathbf{e})\) for \(\Sigma^{-1}\). Moreover, \(\Sigma^{-1}\) is positive definite.
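Result 2.1 is easy to verify numerically. The sketch below (NumPy assumed; the 2x2 covariance matrix is illustrative) shows that the eigenvalues of \(\Sigma^{-1}\) are the reciprocals of the eigenvalues of \(\Sigma\):

```python
import numpy as np

# Sketch: eigenvalues of Sigma^{-1} are the reciprocals 1/lambda of the
# eigenvalues of Sigma (and the eigenvectors coincide).
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam, E = np.linalg.eigh(Sigma)                       # eigenpairs of Sigma
lam_inv, E_inv = np.linalg.eigh(np.linalg.inv(Sigma))

print(lam)                 # eigenvalues of Sigma (ascending)
print(np.sort(1.0 / lam))  # matches the eigenvalues of Sigma^{-1}
```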
The prediction ellipse above is centered on the population means \((\mu_1, \mu_2)\).
The ellipse has axes pointing in the directions of the eigenvectors of \(\Sigma\). Here, in this diagram for the bivariate normal, the longest axis of the ellipse points in the direction of the first eigenvector \(\mathbf{e}_1\), and the shorter axis is perpendicular to the first, pointing in the direction of the second eigenvector \(\mathbf{e}_2\).
The corresponding half-lengths of the axes are obtained by the expression
\[ l_i = \sqrt{\chi^2_{p}(\alpha)\,\lambda_i}, \qquad i = 1, 2, \]
where \(\lambda_i\) is the \(i\)th eigenvalue of \(\Sigma\) and \(\chi^2_p(\alpha)\) is the upper \(\alpha\) critical value of the chi-square distribution with \(p\) degrees of freedom.
The plot above captures the lengths of these axes within the ellipse.
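A minimal sketch of the half-length computation, assuming SciPy for the chi-square critical value (the covariance matrix and the 95% level are illustrative):

```python
import numpy as np
from scipy.stats import chi2

# Sketch: half-lengths sqrt(chi2_p(alpha) * lambda_i) of the axes of a
# 95% ellipse for an assumed bivariate covariance matrix.
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
alpha = 0.05
p = Sigma.shape[0]

lam, E = np.linalg.eigh(Sigma)       # eigenvalues and eigenvectors of Sigma
c2 = chi2.ppf(1 - alpha, df=p)       # chi-square critical value
half_lengths = np.sqrt(c2 * lam)
print(half_lengths)                  # half-lengths along each eigenvector
```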
2.6 Special Cases:
To further understand the shape of the multivariate normal distribution, let's return to the special case where we have \(p = 2\) variables.
If \(\rho_{12} = 0\), there is zero correlation, and the eigenvalues turn out to be equal to the variances of the two variables: the first eigenvalue is \(\lambda_1 = \sigma_1^2\) and the second eigenvalue is \(\lambda_2 = \sigma_2^2\). The corresponding eigenvectors have elements 1 and 0 for the first eigenvalue and 0 and 1 for the second eigenvalue.
So, the axes of the ellipse, in this case, are parallel to the coordinate axes.
If there is zero correlation and the variances are equal, so that \(\sigma_1^2 = \sigma_2^2\), then the eigenvalues are equal to one another, and instead of an ellipse we get a circle. In this special case we have a so-called circular normal distribution.
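The zero-correlation case can be confirmed numerically. The sketch below (NumPy assumed; the variances 4 and 1 are illustrative) shows that a diagonal covariance matrix has the variances as eigenvalues and the coordinate axes as eigenvectors:

```python
import numpy as np

# Sketch: with zero correlation the covariance matrix is diagonal, so the
# eigenvalues are the variances and the eigenvectors are coordinate axes.
Sigma = np.diag([4.0, 1.0])        # variances 4 and 1, correlation 0

lam, E = np.linalg.eigh(Sigma)
print(lam)        # the two variances (ascending order)
print(np.abs(E))  # columns are (0, 1) and (1, 0): coordinate directions
```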
Figure 2.4: The 50% and 90% contours for the bivariate normal distribution
If the correlation is greater than zero, then the longer axis of the ellipse will have a positive slope.
Conversely, if the correlation is less than zero, then the longer axis of the ellipse will have a negative slope.
As the correlation approaches plus or minus 1, the larger eigenvalue approaches the sum of the two variances and the smaller eigenvalue approaches zero:
\[ \lambda_1 \to \sigma_1^2 + \sigma_2^2, \qquad \lambda_2 \to 0. \]
So, what happens in this case is that the ellipse becomes more and more elongated as the correlation approaches plus or minus one.
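This elongation is visible in the eigenvalues themselves. A minimal sketch (NumPy assumed; the variances 2 and 1 and the grid of correlations are illustrative):

```python
import numpy as np

# Sketch: as rho -> 1, the larger eigenvalue of the 2x2 covariance matrix
# approaches sigma1^2 + sigma2^2 and the smaller approaches 0, so the
# ellipse degenerates toward a line segment.
s1sq, s2sq = 2.0, 1.0
for rho in [0.0, 0.5, 0.9, 0.999]:
    cov = rho * np.sqrt(s1sq * s2sq)
    Sigma = np.array([[s1sq, cov],
                      [cov, s2sq]])
    lam = np.linalg.eigvalsh(Sigma)
    print(f"rho={rho:5.3f}  eigenvalues={lam}")
```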
Useful Facts about the Exponent Component
All values of \(\mathbf{x}\) such that \((\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^2\) for any specified constant \(c^2\) have the same value of the density and thus have equal likelihood.
As the value of \((\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\) increases, the value of the density function decreases. This value increases as the distance between \(\mathbf{x}\) and \(\boldsymbol{\mu}\) increases.
The variable \(d^2 = (\mathbf{X}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{X}-\boldsymbol{\mu})\) has a chi-square distribution with \(p\) degrees of freedom.
The value of \(d^2\) for a specific observation is called a squared Mahalanobis distance. It is calculated as \(d^2 = (\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\).
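The chi-square fact can be checked by simulation. The sketch below (NumPy assumed; the parameters and sample size are illustrative) computes \(d^2\) for many draws and compares its sample mean and variance with the chi-square values \(p\) and \(2p\):

```python
import numpy as np

# Sketch: empirical check that d^2 = (X - mu)' Sigma^{-1} (X - mu) is
# chi-square with p degrees of freedom (mean p, variance 2p).
rng = np.random.default_rng(42)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
p = len(mu)

X = rng.multivariate_normal(mu, Sigma, size=20000)
diff = X - mu
d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)

print(d2.mean())  # close to p = 2
print(d2.var())   # close to 2p = 4
```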
This particular ellipse is called the prediction ellipse for a multivariate normal random vector with mean vector \(\boldsymbol{\mu}\) and variance-covariance matrix \(\Sigma\).
Example 2.4:
Consider a bivariate normal distribution with given mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\Sigma\).
(a) Write out the bivariate normal density.
(b) Write out the squared statistical distance expression as a quadratic function of \(x_1\) and \(x_2\).
(c) Determine (and sketch) the constant-density contour that contains a given proportion of the probability.
Result 2.2:
If \(\mathbf{X}\) is distributed as \(N_p(\boldsymbol{\mu}, \Sigma)\), then any linear combination of the variables \(\mathbf{a}'\mathbf{X} = a_1X_1 + a_2X_2 + \cdots + a_pX_p\) is distributed as \(N(\mathbf{a}'\boldsymbol{\mu}, \mathbf{a}'\Sigma\mathbf{a})\). Also, if \(\mathbf{a}'\mathbf{X}\) is distributed as \(N(\mathbf{a}'\boldsymbol{\mu}, \mathbf{a}'\Sigma\mathbf{a})\) for every \(\mathbf{a}\), then \(\mathbf{X}\) must be \(N_p(\boldsymbol{\mu}, \Sigma)\).
Example 2.5:
Consider the linear combination of a multivariate normal random vector determined by a given choice of coefficient vector \(\mathbf{a}\).
Example 2.6:
For \(\mathbf{X}\) distributed as \(N_p(\boldsymbol{\mu}, \Sigma)\), find the distribution of the given linear combination.
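Result 2.2 reduces such examples to two matrix products. A minimal sketch (NumPy assumed; the mean vector, covariance matrix, and coefficient vector \(\mathbf{a}\) are hypothetical choices for illustration):

```python
import numpy as np

# Sketch of Result 2.2: for a coefficient vector a, the linear combination
# a'X is univariate normal with mean a'mu and variance a'Sigma a.
mu = np.array([2.0, 3.0, 1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 1.0]])
a = np.array([1.0, -2.0, 1.0])   # hypothetical choice of coefficients

mean = a @ mu                    # a'mu
var = a @ Sigma @ a              # a'Sigma a
print(f"a'X ~ N({mean}, {var})")
```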
Result 2.3:
All subsets of \(\mathbf{X}\) are normally distributed. If we respectively partition \(\mathbf{X}\), its mean vector \(\boldsymbol{\mu}\), and its covariance matrix \(\Sigma\) as
\[ \mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}, \quad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \]
then \(\mathbf{X}_1\) is distributed as \(N_q(\boldsymbol{\mu}_1, \Sigma_{11})\).
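In code, the partition is just block slicing of \(\boldsymbol{\mu}\) and \(\Sigma\). A minimal sketch (NumPy assumed; the trivariate parameters are hypothetical):

```python
import numpy as np

# Sketch of Result 2.3: partition a trivariate normal into X1 = (X_1, X_2)
# and X2 = (X_3); the subset X1 is normal with the corresponding blocks.
mu = np.array([2.0, 3.0, 1.0])
Sigma = np.array([[3.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 1.0]])

q = 2                            # size of the first block
mu1 = mu[:q]                     # mean vector mu_1 of the subset
Sigma11 = Sigma[:q, :q]          # its variance-covariance matrix Sigma_11
print(mu1)                       # X1 ~ N_2(mu1, Sigma11)
print(Sigma11)
```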
Example 2.5:
Data consisting of pairs of observations for the 10 largest companies in the world are given: