
Multivariate normal distribution

MVN redirects here. For the airport with that IATA code, see Mount Vernon Airport.

In probability theory and statistics, the multivariate normal distribution or multivariate Gaussian distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One possible definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.

1 Definition

A random vector x = (X_1, …, X_k)' is said to have the multivariate normal distribution if it satisfies the following equivalent conditions.[1]

• Every linear combination of its components Y = a_1 X_1 + ⋯ + a_k X_k is normally distributed. That is, for any constant vector a ∈ R^k, the random variable Y = a^T x has a univariate normal distribution, where a univariate normal distribution with zero variance is a point mass on its mean.

• There exists a random ℓ-vector z, whose components are independent standard normal random variables, a k-vector μ, and a k × ℓ matrix A, such that x = Az + μ. Here ℓ is the rank of the covariance matrix Σ = AA^T. Especially in the case of full rank, see the section below on Geometric interpretation.

• There is a k-vector μ and a symmetric, nonnegative-definite k × k matrix Σ, such that the characteristic function of x is

    \varphi_x(u) = \exp\!\left( i u^T \mu - \tfrac{1}{2} u^T \Sigma u \right).

The covariance matrix is allowed to be singular (in which case the corresponding distribution has no density). This case arises frequently in statistics; for example, in the distribution of the vector of residuals in ordinary least squares regression. Note also that the X_i are in general not independent; they can be seen as the result of applying the matrix A to a collection of independent Gaussian variables z.

2 Notation and parametrization

The multivariate normal distribution of a k-dimensional random vector x = [X_1, X_2, …, X_k] can be written in the following notation:

    x \sim N(\mu, \Sigma),

or to make it explicitly known that x is k-dimensional,

    x \sim N_k(\mu, \Sigma),

with k-dimensional mean vector

    \mu = [\,E[X_1], E[X_2], \ldots, E[X_k]\,]

and k × k covariance matrix

    \Sigma = [\operatorname{Cov}[X_i, X_j]], \quad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, k.

3 Properties

3.1 Density function

[Figure: Bivariate normal joint density]

3.1.1 Non-degenerate case

The multivariate normal distribution is said to be non-degenerate when the symmetric covariance matrix Σ is positive definite. In this case the distribution has density[2]

    f_x(x_1, \ldots, x_k) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),

where x is a real k-dimensional column vector and |Σ| is the determinant of Σ. Note how the equation above reduces to that of the univariate normal distribution if Σ is a 1 × 1 matrix (i.e. a single real number).

Note that the circularly-symmetric version of the complex normal distribution has a slightly different form.

Each iso-density locus (the locus of points in k-dimensional space each of which gives the same particular value of the density) is an ellipse or its higher-dimensional generalization; hence the multivariate normal is a special case of the elliptical distributions.

The descriptive statistic (x − μ)^T Σ^{-1} (x − μ) in the non-degenerate multivariate normal distribution equation is known as the square of the Mahalanobis distance, which represents the distance of the test point x from the mean μ. Note that in the case when k = 1, the distribution reduces to a univariate normal distribution and the Mahalanobis distance reduces to the standard score.
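
As a quick numerical illustration of the density formula above, the following minimal Python sketch (NumPy and SciPy assumed available; parameter values are purely illustrative) evaluates the explicit expression and compares it with a library call:

import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (not from the text)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 0.5]])
x = np.array([0.5, 1.2])

# Density from the formula above
k = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff          # squared Mahalanobis distance
pdf_formula = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))

# Same density from SciPy for comparison
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

print(pdf_formula, pdf_scipy)   # the two values agree to floating-point precision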

Bivariate case

In the 2-dimensional nonsingular case (k = rank(Σ) = 2), the probability density function of a vector [X Y]′ is

    f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\!\left( -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - \frac{2\rho (x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y} \right] \right),

where ρ is the correlation between X and Y and where σ_X > 0 and σ_Y > 0. In this case,

    \mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix}.

In the bivariate case, the first equivalent condition for multivariate normality can be made less restrictive: it is sufficient to verify that countably many distinct linear combinations of X and Y are normal in order to conclude that the vector [X Y]′ is bivariate normal.[3]

The bivariate iso-density loci plotted in the x,y-plane are ellipses. As the correlation parameter ρ increases, these loci appear to be squeezed to the following line:

    y(x) = \operatorname{sgn}(\rho)\, \frac{\sigma_Y}{\sigma_X} (x - \mu_X) + \mu_Y.

This is because the above expression, but without being inside a signum function, is the best linear unbiased prediction of Y given a value of X.[4]

3.1.2 Degenerate case

If the covariance matrix Σ is not full rank, then the multivariate normal distribution is degenerate and does not have a density. More precisely, it does not have a density with respect to k-dimensional Lebesgue measure (which is the usual measure assumed in calculus-level probability courses). Only random vectors whose distributions are absolutely continuous with respect to a measure are said to have densities (with respect to that measure). To talk about densities but avoid dealing with measure-theoretic complications, it can be simpler to restrict attention to a subset of rank(Σ) of the coordinates of x such that the covariance matrix for this subset is positive definite; then the other coordinates may be thought of as an affine function of the selected coordinates.

To talk about densities meaningfully in the singular case, then, we must select a different base measure. Using the disintegration theorem we can define a restriction of Lebesgue measure to the rank(Σ)-dimensional affine subspace of R^k where the Gaussian distribution is supported, i.e. {μ + Σ^{1/2} v : v ∈ R^k}. With respect to this measure the distribution has density

    f(x) = \left( \det\nolimits^{*}(2\pi \Sigma) \right)^{-1/2} \exp\!\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{+} (x - \mu) \right),

where Σ^+ is the generalized inverse and det* is the pseudo-determinant.[5]
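
A minimal Python sketch of the degenerate-case density above, using the pseudo-determinant and a generalized inverse built from the nonzero eigenvalues (the rank-deficient covariance and evaluation point are illustrative):

import numpy as np

mu = np.array([0.0, 0.0, 1.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # rank 2, so Sigma is singular in R^3
Sigma = A @ A.T

eigvals, eigvecs = np.linalg.eigh(Sigma)
nonzero = eigvals > 1e-10
pseudo_det = np.prod(2 * np.pi * eigvals[nonzero])                            # det*(2*pi*Sigma)
Sigma_plus = (eigvecs[:, nonzero] / eigvals[nonzero]) @ eigvecs[:, nonzero].T  # generalized inverse

x = mu + A @ np.array([0.3, -0.2])   # a point on the supporting affine subspace
diff = x - mu
density = pseudo_det ** -0.5 * np.exp(-0.5 * diff @ Sigma_plus @ diff)
print(density)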

3.2 Higher moments

Main article: Isserlis' theorem

The kth-order moments of x are defined by

    \mu_{1,\ldots,N}(x) \;\overset{\mathrm{def}}{=}\; \mu_{r_1,\ldots,r_N}(x) \;\overset{\mathrm{def}}{=}\; E\!\left[ \prod_{j=1}^{N} X_j^{r_j} \right]

where r_1 + r_2 + ⋯ + r_N = k.

The kth-order central moments are given as follows:

(a) If k is odd, μ_{1,…,N}(x − μ) = 0.

(b) If k is even with k = 2λ, then

    \mu_{1,\ldots,2\lambda}(x - \mu) = \sum \left( \sigma_{ij} \sigma_{kl} \cdots \sigma_{XZ} \right)

where the sum is taken over all allocations of the set {1, …, 2λ} into λ (unordered) pairs. That is, if you have a kth (= 2λ = 6) central moment, you will be summing the products of λ = 3 covariances (the μ-notation has been dropped in the interests of parsimony):

    E[x_1 x_2 x_3 x_4 x_5 x_6]
    = E[x_1 x_2]E[x_3 x_4]E[x_5 x_6] + E[x_1 x_2]E[x_3 x_5]E[x_4 x_6] + E[x_1 x_2]E[x_3 x_6]E[x_4 x_5]
    + E[x_1 x_3]E[x_2 x_4]E[x_5 x_6] + E[x_1 x_3]E[x_2 x_5]E[x_4 x_6] + E[x_1 x_3]E[x_2 x_6]E[x_4 x_5]
    + E[x_1 x_4]E[x_2 x_3]E[x_5 x_6] + E[x_1 x_4]E[x_2 x_5]E[x_3 x_6] + E[x_1 x_4]E[x_2 x_6]E[x_3 x_5]
    + E[x_1 x_5]E[x_2 x_3]E[x_4 x_6] + E[x_1 x_5]E[x_2 x_4]E[x_3 x_6] + E[x_1 x_5]E[x_2 x_6]E[x_3 x_4]
    + E[x_1 x_6]E[x_2 x_3]E[x_4 x_5] + E[x_1 x_6]E[x_2 x_4]E[x_3 x_5] + E[x_1 x_6]E[x_2 x_5]E[x_3 x_4].

This yields (2λ − 1)!/(2^{λ−1} (λ − 1)!) terms in the sum (15 in the above case), each being the product of λ (in this case 3) covariances. For fourth-order moments (four variables) there are three terms. For sixth-order moments there are 3 × 5 = 15 terms, and for eighth-order moments there are 3 × 5 × 7 = 105 terms.

The covariances are then determined by replacing the terms of the list [1, …, 2λ] by the corresponding terms of the list consisting of r_1 ones, then r_2 twos, etc. To illustrate this, examine the following 4th-order central moment cases:

    E[x_i^4] = 3\sigma_{ii}^2
    E[x_i^3 x_j] = 3\sigma_{ii} \sigma_{ij}
    E[x_i^2 x_j^2] = \sigma_{ii}\sigma_{jj} + 2(\sigma_{ij})^2
    E[x_i^2 x_j x_k] = \sigma_{ii}\sigma_{jk} + 2\sigma_{ij}\sigma_{ik}

where σ_ij is the covariance of x_i and x_j. The idea with the above method is that you first find the general case for a kth moment with k different x variables, E[x_i x_j x_k x_n], and then simplify it accordingly. Say you have E[x_i^2 x_k x_n]; then you simply let x_i = x_j and realise that σ_ii = σ_i^2. The general fourth-order case is

    E[x_i x_j x_k x_n] = \sigma_{ij}\sigma_{kn} + \sigma_{ik}\sigma_{jn} + \sigma_{in}\sigma_{jk}.
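
As a sanity check of the general fourth-order formula given at the end of this section, the following Python sketch (NumPy assumed; the covariance matrix is an illustrative choice) compares a Monte Carlo estimate of E[x_1 x_2 x_3 x_4] for a zero-mean Gaussian with the three-term pairing sum:

import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5, 0.2, 0.1],
                  [0.5, 2.0, 0.3, 0.0],
                  [0.2, 0.3, 1.5, 0.4],
                  [0.1, 0.0, 0.4, 1.0]])
x = rng.multivariate_normal(np.zeros(4), Sigma, size=2_000_000)

empirical = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
s = Sigma
isserlis = s[0, 1] * s[2, 3] + s[0, 2] * s[1, 3] + s[0, 3] * s[1, 2]
print(empirical, isserlis)   # close, up to Monte Carlo error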

3.3 Likelihood function

If the mean and variance matrix are unknown, a suitable log-likelihood function for a single observation x would be

    \ln(L) = -\tfrac{1}{2} \ln(|\Sigma|) - \tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) - \tfrac{k}{2} \ln(2\pi),

where x is a vector of real numbers. The circularly-symmetric version of the complex case, where z is a vector of complex numbers, would be

    \ln(L) = -\ln(|\Sigma|) - (z - \mu)^{\dagger} \Sigma^{-1} (z - \mu) - k \ln(\pi),

i.e. with the conjugate transpose (indicated by †) replacing the normal transpose (indicated by T). This is slightly different than in the real case, because the circularly-symmetric version of the complex normal distribution has a slightly different form.

A similar notation is used for multiple linear regression.[6]
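
A minimal sketch of the real-case log-likelihood above in Python (NumPy/SciPy assumed; numbers are illustrative), checked against SciPy's logpdf:

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
x = np.array([0.7, -1.5])

k = len(mu)
diff = x - mu
log_lik = (-0.5 * np.log(np.linalg.det(Sigma))
           - 0.5 * diff @ np.linalg.inv(Sigma) @ diff
           - 0.5 * k * np.log(2 * np.pi))

print(log_lik, multivariate_normal(mu, Sigma).logpdf(x))   # identical up to rounding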
3.4 Entropy

The differential entropy of the multivariate normal distribution is[7]

    h(f) = -\int f(x) \ln f(x)\, dx = \tfrac{1}{2} \ln\!\left( (2\pi e)^n |\Sigma| \right),

where the bars denote the matrix determinant.

3.5 Kullback–Leibler divergence

The Kullback–Leibler divergence from N_0(μ_0, Σ_0) to N_1(μ_1, Σ_1), for non-singular matrices Σ_0 and Σ_1, is:[8]

    D_{\mathrm{KL}}(N_0 \,\|\, N_1) = \tfrac{1}{2} \left\{ \operatorname{tr}\!\left( \Sigma_1^{-1} \Sigma_0 \right) + (\mu_1 - \mu_0)^T \Sigma_1^{-1} (\mu_1 - \mu_0) - K + \ln\frac{|\Sigma_1|}{|\Sigma_0|} \right\},

where K is the dimension of the vector space.

The logarithm must be taken to base e since the two terms following the logarithm are themselves base-e logarithms of expressions that are either factors of the density function or otherwise arise naturally. The equation therefore gives a result measured in nats. Dividing the entire expression above by log_e 2 yields the divergence in bits.
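
A small Python sketch (NumPy assumed; parameters are illustrative) implementing the entropy and Kullback–Leibler expressions above directly:

import numpy as np

def mvn_entropy(Sigma):
    """Differential entropy 0.5 * ln((2*pi*e)^n * det(Sigma)), in nats."""
    n = Sigma.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(Sigma))

def mvn_kl(mu0, Sigma0, mu1, Sigma1):
    """KL divergence D_KL(N0 || N1) from the closed form above, in nats."""
    K = len(mu0)
    inv1 = np.linalg.inv(Sigma1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ Sigma0)
                  + diff @ inv1 @ diff
                  - K
                  + np.log(np.linalg.det(Sigma1) / np.linalg.det(Sigma0)))

# Illustrative values
mu0, Sigma0 = np.zeros(2), np.eye(2)
mu1, Sigma1 = np.array([1.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
print(mvn_entropy(Sigma0), mvn_kl(mu0, Sigma0, mu1, Sigma1))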

3.6 Cumulative distribution function

The notion of cumulative distribution function (cdf) in dimension 1 can be extended in two ways to the multidimensional case. The first way is to define the cumulative distribution function F(r) as the probability that a sample falls inside the ellipsoid determined by its Mahalanobis distance r from the Gaussian, a direct generalization of the standard deviation.[9] In order to compute the values of this function, closed analytic formulae exist.[9]

Another way to extend the notion of cumulative distribution function is to define the cumulative distribution function (cdf) F(x_0) of a random vector x as the probability that all components of x are less than or equal to the corresponding values in the vector x_0. Though there is no closed form for F(x), there are a number of algorithms that estimate it numerically.[10]
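
Since the rectangular cdf has no closed form, one option is a plain Monte Carlo estimate, as in the Python sketch below (NumPy assumed; parameters are illustrative; recent SciPy versions also expose a numerical cdf on multivariate_normal):

import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
x0 = np.array([0.5, 1.0])

samples = rng.multivariate_normal(mu, Sigma, size=1_000_000)
cdf_estimate = np.mean(np.all(samples <= x0, axis=1))   # P(X1 <= 0.5 and X2 <= 1.0)
print(cdf_estimate)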

3.7 Prediction interval

The prediction interval for the multivariate normal distribution yields a region consisting of those vectors x satisfying

    (x - \mu)^T \Sigma^{-1} (x - \mu) \le \chi^2_k(p).

Here x is a k-dimensional vector, μ is the known k-dimensional mean vector, Σ is the known covariance matrix and χ²_k(p) is the quantile function for probability p of the chi-squared distribution with k degrees of freedom.[11]

When k = 2, the expression defines the interior of an ellipse and the chi-squared distribution simplifies to an exponential distribution with mean equal to two.
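
A short Python sketch (NumPy/SciPy assumed; parameters are illustrative) checking empirically that the ellipsoidal region above has coverage close to p:

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
k, p = 2, 0.95
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
diff = x - mu
mahal_sq = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
coverage = np.mean(mahal_sq <= chi2.ppf(p, df=k))
print(coverage)   # close to 0.95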

4 Joint normality

4.1 Normally distributed and independent

If X and Y are normally distributed and independent, this implies they are "jointly normally distributed", i.e., the pair (X, Y) must have multivariate normal distribution. However, a pair of jointly normally distributed variables need not be independent (they would only be so if uncorrelated, ρ = 0).

4.2 Two normally distributed random variables need not be jointly bivariate normal

See also: normally distributed and uncorrelated does not imply independent

The fact that two random variables X and Y both have a normal distribution does not imply that the pair (X, Y) has a joint normal distribution. A simple example is one in which X has a normal distribution with expected value 0 and variance 1, and Y = X if |X| > c and Y = −X if |X| < c, where c > 0. There are similar counterexamples for more than two random variables. In general, they sum to a mixture model.

4.3 Correlations and independence

In general, random variables may be uncorrelated but statistically dependent. But if a random vector has a multivariate normal distribution then any two or more of its components that are uncorrelated are independent. This implies that any two or more of its components that are pairwise independent are independent.

But it is not true that two random variables that are (separately, marginally) normally distributed and uncorrelated are independent. Two random variables that are normally distributed may fail to be jointly normally distributed, i.e., the vector whose components they are may fail to have a multivariate normal distribution. In the preceding example, clearly X and Y are not independent, yet choosing c to be 1.54 makes them uncorrelated.

5 Conditional distributions

If N-dimensional x is partitioned as follows

    x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} q \times 1 \\ (N-q) \times 1 \end{bmatrix}

and accordingly μ and Σ are partitioned as follows

    \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} q \times 1 \\ (N-q) \times 1 \end{bmatrix}

    \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} q \times q & q \times (N-q) \\ (N-q) \times q & (N-q) \times (N-q) \end{bmatrix},

then the distribution of x_1 conditional on x_2 = a is multivariate normal, (x_1 | x_2 = a) ~ N(μ̄, Σ̄), where

    \bar{\mu} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (a - \mu_2)

and covariance matrix

    \overline{\Sigma} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.[12]

This matrix is the Schur complement of Σ_22 in Σ. This means that to calculate the conditional covariance matrix, one inverts the overall covariance matrix, drops the rows and columns corresponding to the variables being conditioned upon, and then inverts back to get the conditional covariance matrix. Here Σ_22^{-1} is the generalized inverse of Σ_22.

Note that knowing that x_2 = a alters the variance, though the new variance does not depend on the specific value of a; perhaps more surprisingly, the mean is shifted by Σ_12 Σ_22^{-1} (a − μ_2); compare this with the situation of not knowing the value of a, in which case x_1 would have distribution N_q(μ_1, Σ_11).

An interesting fact derived in order to prove this result is that the random vectors x_2 and y_1 = x_1 − Σ_12 Σ_22^{-1} x_2 are independent.

The matrix Σ_12 Σ_22^{-1} is known as the matrix of regression coefficients.

5.1 Bivariate case

In the bivariate case where x is partitioned into X_1 and X_2, the conditional distribution of X_1 given X_2 is[13]

    X_1 \mid X_2 = x_2 \;\sim\; N\!\left( \mu_1 + \frac{\sigma_1}{\sigma_2}\rho\,(x_2 - \mu_2),\; (1-\rho^2)\sigma_1^2 \right),

where ρ is the correlation coefficient between X_1 and X_2.
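
A minimal Python sketch (NumPy assumed; parameters and the partition are illustrative) of the conditional mean and covariance formulas above:

import numpy as np

def conditional_mvn(mu, Sigma, q, a):
    """Parameters of x1 | x2 = a for x = [x1, x2] ~ N(mu, Sigma),
    where x1 holds the first q components (sketch of the formulas above)."""
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
    S22_inv = np.linalg.pinv(S22)          # generalized inverse, as in the text
    mu_bar = mu1 + S12 @ S22_inv @ (a - mu2)
    Sigma_bar = S11 - S12 @ S22_inv @ S21  # Schur complement of S22 in Sigma
    return mu_bar, Sigma_bar

# Illustrative 3-dimensional example, conditioning on the last two coordinates
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
print(conditional_mvn(mu, Sigma, q=1, a=np.array([1.5, -0.5])))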

5.2 Bivariate conditional expectation

5.2.1 In the general case

    \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right)

The conditional expectation of X_1 given X_2 is

    E(X_1 \mid X_2 = x_2) = \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (x_2 - \mu_2).

Proof: the result is obtained simply by taking the expectation of the conditional distribution X_1 | X_2 above.

5.2.2 In the case of unit variances

    \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)

The conditional expectation of X_1 given X_2 is

    E(X_1 \mid X_2 = x_2) = \rho x_2,

and the conditional expectation of X_1 given that X_2 is smaller/bigger than z is (Maddala 1983, p. 367[14]):

    E(X_1 \mid X_2 < z) = -\rho\, \frac{\varphi(z)}{\Phi(z)},

    E(X_1 \mid X_2 > z) = \rho\, \frac{\varphi(z)}{1 - \Phi(z)},

where the final ratio here is called the inverse Mills ratio.

Proof: the last two results are obtained using the result E(X_1 | X_2 = x_2) = ρ x_2, so that

    E(X_1 \mid X_2 < z) = \rho\, E(X_2 \mid X_2 < z),

and then using the properties of the expectation of a truncated normal distribution.
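
A short Monte Carlo check of the truncated-expectation result above in Python (NumPy/SciPy assumed; ρ and z are illustrative):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
rho, z = 0.6, 0.8
cov = np.array([[1.0, rho],
                [rho, 1.0]])
x = rng.multivariate_normal(np.zeros(2), cov, size=2_000_000)

mask = x[:, 1] < z
empirical = x[mask, 0].mean()
closed_form = -rho * norm.pdf(z) / norm.cdf(z)   # -rho * phi(z) / Phi(z)
print(empirical, closed_form)                    # agree up to Monte Carlo error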

6 Marginal distributions

To obtain the marginal distribution over a subset of multivariate normal random variables, one only needs to drop the irrelevant variables (the variables that one wants to marginalize out) from the mean vector and the covariance matrix. The proof for this follows from the definitions of multivariate normal distributions and linear algebra.[15]

Example

Let x = [X_1, X_2, X_3] be multivariate normal random variables with mean vector μ = [μ_1, μ_2, μ_3] and covariance matrix Σ (standard parametrization for multivariate normal distributions). Then the joint distribution of x′ = [X_1, X_3] is multivariate normal with mean vector μ′ = [μ_1, μ_3] and covariance matrix

    \Sigma' = \begin{bmatrix} \Sigma_{11} & \Sigma_{13} \\ \Sigma_{31} & \Sigma_{33} \end{bmatrix}.
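
In code this is just index selection, as in the following Python sketch (NumPy assumed; values are illustrative):

import numpy as np

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.5],
                  [0.2, 0.5, 1.5]])

idx = [0, 2]                         # keep X1 and X3, drop X2
mu_marg = mu[idx]
Sigma_marg = Sigma[np.ix_(idx, idx)]
print(mu_marg, Sigma_marg)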

7 Affine transformation

If y = c + Bx is an affine transformation of x ~ N(μ, Σ), where c is an M × 1 vector of constants and B is a constant M × N matrix, then y has a multivariate normal distribution with expected value c + Bμ and variance BΣB^T, i.e., y ~ N(c + Bμ, BΣB^T). In particular, any subset of the x_i has a marginal distribution that is also multivariate normal. To see this, consider the following example: to extract the subset (x_1, x_2, x_4)^T, use

    B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 0 & 0 & 1 & 0 & \ldots & 0 \end{bmatrix},

which extracts the desired elements directly.

Another corollary is that the distribution of Z = b · x, where b is a constant vector of the same length as x and the dot indicates a vector product, is univariate Gaussian with Z ~ N(b^T μ, b^T Σ b). This result follows by using

    B = \begin{bmatrix} b_1 & b_2 & \ldots & b_n \end{bmatrix} = b^T.

Observe how the positive-definiteness of Σ implies that the variance of the dot product must be positive.

An affine transformation of x such as 2x is not the same as the sum of two independent realisations of x.
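
A brief Python sketch of the affine-transformation property above (NumPy assumed; c, B, μ, Σ are illustrative), comparing the analytic mean and covariance of y = c + Bx with simulated values:

import numpy as np

rng = np.random.default_rng(10)
mu = np.array([0.0, 1.0, -1.0, 2.0])
Sigma = np.diag([1.0, 2.0, 0.5, 1.5])

B = np.array([[1.0, 0.0, 0.0, 0.0],    # extracts (x1, x2, x4)
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
c = np.array([0.5, 0.0, -0.5])

print(c + B @ mu)            # analytic mean of y
print(B @ Sigma @ B.T)       # analytic covariance of y

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = c + x @ B.T
print(y.mean(axis=0), np.cov(y, rowvar=False), sep="\n")   # agree approximately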

8 Geometric interpretation

See also: Confidence region

The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations of hyperspheres) centered at the mean.[16] Hence the multivariate normal distribution is an example of the class of elliptical distributions. The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix Σ. The squared relative lengths of the principal axes are given by the corresponding eigenvalues.

If Σ = UΛU^T = UΛ^{1/2}(UΛ^{1/2})^T is an eigendecomposition where the columns of U are unit eigenvectors and Λ is a diagonal matrix of the eigenvalues, then we have

    x \sim N(\mu, \Sigma) \iff x \sim \mu + U\Lambda^{1/2}\, N(0, I) \iff x \sim \mu + U\, N(0, \Lambda).

Moreover, U can be chosen to be a rotation matrix, as inverting an axis does not have any effect on N(0, Λ), but inverting a column changes the sign of U's determinant. The distribution N(μ, Σ) is in effect N(0, I) scaled by Λ^{1/2}, rotated by U and translated by μ.

Conversely, any choice of μ, full-rank matrix U, and positive diagonal entries Λ_i yields a non-singular multivariate normal distribution. If any Λ_i is zero and U is square, the resulting covariance matrix UΛU^T is singular. Geometrically this means that every contour ellipsoid is infinitely thin and has zero volume in n-dimensional space, as at least one of the principal axes has length of zero.
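
A small Python sketch of this scaling-and-rotation view (NumPy assumed; Σ and μ are illustrative):

import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
eigvals, U = np.linalg.eigh(Sigma)

print(U)                    # columns: directions of the principal axes
print(np.sqrt(eigvals))     # relative lengths of the principal axes

# x = mu + U Lambda^(1/2) z with z ~ N(0, I) reproduces N(mu, Sigma)
rng = np.random.default_rng(5)
mu = np.array([1.0, -1.0])
z = rng.standard_normal((100_000, 2))
x = mu + z @ (U * np.sqrt(eigvals)).T
print(np.cov(x, rowvar=False))   # approximately Sigma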

9 Estimation of parameters

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle and elegant. See estimation of covariance matrices.

In short, the probability density function (pdf) of a multivariate normal is

    f(x) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

and the ML estimator of the covariance matrix from a sample of n observations is

    \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T,

which is simply the sample covariance matrix. This is a biased estimator whose expectation is

    E[\hat{\Sigma}] = \frac{n-1}{n} \Sigma.

An unbiased sample covariance is

    \hat{\Sigma} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T.

The Fisher information matrix for estimating the parameters of a multivariate normal distribution has a closed form expression. This can be used, for example, to compute the Cramér–Rao bound for parameter estimation in this setting. See Fisher information for more details.
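
A minimal Python sketch of the estimators above applied to a simulated sample (NumPy assumed; the true parameters are illustrative):

import numpy as np

rng = np.random.default_rng(6)
true_mu = np.array([0.0, 2.0])
true_Sigma = np.array([[1.0, 0.7],
                       [0.7, 2.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=10_000)

n = X.shape[0]
mu_hat = X.mean(axis=0)
centered = X - mu_hat
Sigma_ml = centered.T @ centered / n              # ML estimator (biased, 1/n)
Sigma_unbiased = centered.T @ centered / (n - 1)  # unbiased version (1/(n-1))
print(mu_hat, Sigma_ml, Sigma_unbiased, sep="\n")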

10 Bayesian inference

In Bayesian statistics, the conjugate prior of the mean vector is another multivariate normal distribution, and the conjugate prior of the covariance matrix is an inverse-Wishart distribution W^{-1}. Suppose then that n observations have been made,

    X = \{x_1, \ldots, x_n\} \sim N(\mu, \Sigma),

and that a conjugate prior has been assigned, where

    p(\mu, \Sigma) = p(\mu \mid \Sigma)\, p(\Sigma),

with

    p(\mu \mid \Sigma) \sim N(\mu_0, m^{-1}\Sigma)

and

    p(\Sigma) \sim W^{-1}(\Psi, n_0).

Then

    p(\mu \mid \Sigma, X) \sim N\!\left( \frac{n\bar{x} + m\mu_0}{n+m},\; \frac{1}{n+m}\Sigma \right),

    p(\Sigma \mid X) \sim W^{-1}\!\left( \Psi + nS + \frac{nm}{n+m}(\bar{x} - \mu_0)(\bar{x} - \mu_0)^T,\; n + n_0 \right),

where

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T.

11 Multivariate normality tests

Multivariate normality tests check a given set of data for similarity to the multivariate normal distribution. The null hypothesis is that the data set is similar to the normal distribution, therefore a sufficiently small p-value indicates non-normal data. Multivariate normality tests include the Cox–Small test[17] and Smith and Jain's adaptation[18] of the Friedman–Rafsky test.[19]

Mardia's test[20] is based on multivariate extensions of skewness and kurtosis measures. For a sample {x_1, …, x_n} of k-dimensional vectors we compute

    \hat{\Sigma} = \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})^T,

    A = \frac{1}{6n} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ (x_i - \bar{x})^T\, \hat{\Sigma}^{-1} (x_j - \bar{x}) \right]^3,

    B = \sqrt{\frac{n}{8k(k+2)}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ (x_i - \bar{x})^T\, \hat{\Sigma}^{-1} (x_i - \bar{x}) \right]^2 - k(k+2) \right\}.

Under the null hypothesis of multivariate normality, the statistic A will have approximately a chi-squared distribution with (1/6)·k(k + 1)(k + 2) degrees of freedom, and B will be approximately standard normal N(0,1).
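
A compact Python transcription of the two Mardia statistics above (NumPy assumed; the simulated sample is illustrative):

import numpy as np

def mardia_statistics(X):
    """Mardia's skewness statistic A and kurtosis statistic B, as defined above
    (the sample covariance uses the 1/n normalisation)."""
    n, k = X.shape
    centered = X - X.mean(axis=0)
    Sigma_hat = centered.T @ centered / n
    G = centered @ np.linalg.inv(Sigma_hat) @ centered.T   # G[i, j] = (x_i - xbar)' S^-1 (x_j - xbar)

    A = (G ** 3).sum() / (6 * n)
    B = np.sqrt(n / (8 * k * (k + 2))) * (np.mean(np.diag(G) ** 2) - k * (k + 2))
    return A, B

rng = np.random.default_rng(7)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=500)
print(mardia_statistics(X))   # under normality: A ~ chi^2 with k(k+1)(k+2)/6 df, B ~ N(0,1)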

Mardia's kurtosis statistic is skewed and converges very slowly to the limiting normal distribution. For medium size samples (50 ≤ n < 400), the parameters of the asymptotic distribution of the kurtosis statistic are modified.[21] For small sample tests (n < 50) empirical critical values are used. Tables of critical values for both statistics are given by Rencher[22] for k = 2, 3, 4.

Mardia's tests are affine invariant but not consistent. For example, the multivariate skewness test is not consistent against symmetric non-normal alternatives.[23]

The BHEP test[24] computes the norm of the difference between the empirical characteristic function and the theoretical characteristic function of the normal distribution. Calculation of the norm is performed in the L²(ω) space of square-integrable functions with respect to the Gaussian weighting function ω_σ(t) = (2πσ²)^{-k/2} e^{-|t|²/(2σ²)}. The test statistic is

    T_\sigma = \int_{\mathbb{R}^k} \left| \frac{1}{n} \sum_{j=1}^{n} e^{\,i t^T \hat{\Sigma}^{-1/2} (x_j - \bar{x})} - e^{-|t|^2/2} \right|^2 \omega_\sigma(t)\, dt
             = \frac{1}{n^2} \sum_{i,j=1}^{n} e^{-\frac{\sigma^2}{2} (x_i - x_j)^T \hat{\Sigma}^{-1} (x_i - x_j)} - \frac{2}{n(1+\sigma^2)^{k/2}} \sum_{i=1}^{n} e^{-\frac{\sigma^2}{2(1+\sigma^2)} (x_i - \bar{x})^T \hat{\Sigma}^{-1} (x_i - \bar{x})} + \frac{1}{(1+2\sigma^2)^{k/2}}.

The limiting distribution of this test statistic is a weighted sum of chi-squared random variables,[24] however in practice it is more convenient to compute the sample quantiles using Monte-Carlo simulations.

A detailed survey of these and other test procedures is available.[25]
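
As a rough sketch, the closed form of the BHEP statistic above can be transcribed in Python as follows (NumPy assumed; the smoothing parameter is written as beta and the simulated data are illustrative):

import numpy as np

def bhep_statistic(X, beta=1.0):
    """BHEP statistic, transcribed from the closed form above."""
    n, k = X.shape
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(centered.T @ centered / n)

    L = np.linalg.cholesky(S_inv)
    Y = centered @ L                                   # standardised observations
    diff = Y[:, None, :] - Y[None, :, :]
    pair_q = np.einsum('ijk,ijk->ij', diff, diff)      # (x_i - x_j)' S^-1 (x_i - x_j)
    single_q = np.einsum('ik,ik->i', Y, Y)             # (x_i - xbar)' S^-1 (x_i - xbar)

    b2 = beta ** 2
    term1 = np.exp(-0.5 * b2 * pair_q).sum() / n**2
    term2 = 2.0 * np.exp(-b2 / (2 * (1 + b2)) * single_q).sum() / (n * (1 + b2) ** (k / 2))
    term3 = 1.0 / (1 + 2 * b2) ** (k / 2)
    return term1 - term2 + term3

rng = np.random.default_rng(8)
print(bhep_statistic(rng.multivariate_normal(np.zeros(2), np.eye(2), size=200)))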

12 Drawing values from the distribution

A widely used method for drawing (sampling) a random vector x from the N-dimensional multivariate normal distribution with mean vector μ and covariance matrix Σ works as follows:[26]

1. Find any real matrix A such that A A^T = Σ. When Σ is positive-definite, the Cholesky decomposition is typically used, and the extended form of this decomposition can always be used (as the covariance matrix may be only positive semi-definite); in both cases a suitable matrix A is obtained. An alternative is to use the matrix A = UΛ^{1/2} obtained from a spectral decomposition Σ = UΛU^T of Σ. The former approach is more computationally straightforward but the matrices A change for different orderings of the elements of the random vector, while the latter approach gives matrices that are related by simple re-orderings. In theory both approaches give equally good ways of determining a suitable matrix A, but there are differences in computation time.

2. Let z = (z_1, …, z_N)^T be a vector whose components are N independent standard normal variates (which can be generated, for example, by using the Box–Muller transform).

3. Let x be μ + Az. This has the desired distribution due to the affine transformation property.
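
A minimal Python sketch of this recipe using the Cholesky factor (NumPy assumed; μ and Σ are illustrative):

import numpy as np

rng = np.random.default_rng(9)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])

A = np.linalg.cholesky(Sigma)          # step 1: A @ A.T == Sigma
z = rng.standard_normal((100_000, 3))  # step 2: independent standard normals
x = mu + z @ A.T                       # step 3: x = mu + A z, row by row

print(x.mean(axis=0))                  # approximately mu
print(np.cov(x, rowvar=False))         # approximately Sigma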

13 See also

• Chi distribution, the pdf of the 2-norm (or Euclidean norm) of a multivariate normally distributed vector (centered at zero).

• Complex normal distribution, for the generalization to complex valued random variables.

• Copula, for the definition of the Gaussian or normal copula model.

• Multivariate stable distribution, an extension of the multivariate normal distribution when the index (exponent in the characteristic function) is between zero and two.

• Mahalanobis distance

• Wishart distribution

14 References
[1] Gut, Allan (2009) An Intermediate Course in Probability,
Springer. ISBN 9781441901613 (Chapter 5)
[2] UIUC, Lecture 21. The Multivariate Normal Distribution, 21.5: "Finding the Density".
[3] Hamedani, G. G.; Tata, M. N. (1975). On the determination of the bivariate normal distribution from
distributions of linear combinations of the variables.
The American Mathematical Monthly 82 (9): 913915.
doi:10.2307/2318494.
[4] Wyatt, John. Linear least mean-squared error estimation (PDF). Lecture notes course on applied probability.
Retrieved 23 January 2012.
[5] Rao, C.R. (1973). Linear Statistical Inference and Its Applications. New York: Wiley. pp. 527528.
[6] Tong, T. (2010) Multiple Linear Regression : MLE and
Its Distributional Results, Lecture Notes
[7] Gokhale, DV; Ahmed, NA; Res, BC; Piscataway,
NJ (May 1989).
Entropy Expressions and Their
Estimators for Multivariate Distributions. Information Theory, IEEE Transactions on 35 (3): 688692.
doi:10.1109/18.30996.

[8] J. Duchi, Derivations for Linear Algebra and Optimization


. pp. 13
[9] Bensimhoun Michael, N-Dimensional Cumulative Function, And Other Useful Facts About Gaussians and Normal
Densities (2006)
[10] Genz, Alan (2009). Computation of Multivariate Normal
and t Probabilities. Springer. ISBN 978-3-642-01689-9.
[11] Siotani, Minoru (1964). "Tolerance regions for a multivariate normal population" (PDF). Annals of the Institute of Statistical Mathematics 16 (1): 135–153. doi:10.1007/BF02868568.
[12] Eaton, Morris L. (1983). Multivariate Statistics: a Vector
Space Approach. John Wiley and Sons. pp. 116117.
ISBN 0-471-02776-6.
[13] Jensen, J (2000). Statistics for Petroleum Engineers and
Geoscientists. Amsterdam: Elsevier. p. 207.
[14] Gangadharrao, Maddala (1983). Limited Dependent and
Qualitative Variables in Econometrics. Cambridge University Press.
[15] The formal proof for marginal distribution is shown
here
http://fourier.eng.hmc.edu/e161/lectures/
gaussianprocess/node7.html
[16] Nikolaus Hansen. The CMA Evolution Strategy: A Tutorial (PDF).
[17] Cox, D. R.; Small, N. J. H. (1978). "Testing multivariate normality". Biometrika 65 (2): 263. doi:10.1093/biomet/65.2.263.
[18] Smith, S. P.; Jain, A. K. (1988). A test to determine the
multivariate normality of a data set. IEEE Transactions
on Pattern Analysis and Machine Intelligence 10 (5): 757.
doi:10.1109/34.6789.
[19] Friedman, J. H.; Rafsky, L. C. (1979). Multivariate Generalizations of the Wald-Wolfowitz and Smirnov
Two-Sample Tests. The Annals of Statistics 7 (4): 697.
doi:10.1214/aos/1176344722.
[20] Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57 (3):
519530. doi:10.1093/biomet/57.3.519.
[21] Rencher (1995), pages 112-113.
[22] Rencher (1995), pages 493-495.
[23] Baringhaus, L.; Henze, N. (1991). Limit distributions
for measures of multivariate skewness and kurtosis based
on projections. Journal of Multivariate Analysis 38: 51.
doi:10.1016/0047-259X(91)90031-V.
[24] Baringhaus, L.; Henze, N. (1988). A consistent
test for multivariate normality based on the empirical characteristic function. Metrika 35 (1): 339348.
doi:10.1007/BF02613322.
[25] Henze, Norbert (2002). Invariant tests for multivariate
normality: a critical review. Statistical Papers 43 (4):
467506. doi:10.1007/s00362-002-0119-6.


[26] Gentle, J.E. (2009). Computational Statistics. New York: Springer. pp. 315–316. doi:10.1007/978-0-387-98144-4.

14.1 Literature
Rencher, A.C. (1995). Methods of Multivariate
Analysis. New York: Wiley.
