You are on page 1of 8

Multivariate normal distribution

and testing for means (see MKB Ch 3)

Where are we going?


One-sample t-test (univariate). . . .
Two-sample t-test (univariate) . . .
One sample T 2 -test (multivariate) .
Two sample T 2 -test (multivariate).

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

2
3
4
5
6

Multivariate normal distribution


Definition. . . . . . . . . . . . . . . . . . . . . . . . .
Why is it important in multivariate statistics?
Why is it important in multivariate statistics?
Characterization . . . . . . . . . . . . . . . . . . . .
Properties . . . . . . . . . . . . . . . . . . . . . . . .
Properties . . . . . . . . . . . . . . . . . . . . . . . .
Data matrices . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

7
8
9
10
11
12
13
14

.
.
.
.
.

15
16
17
18
19
20

.
.
.
.
.

21
22
23
24
25
26

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

Wishart distribution
Quadratic forms of normal data matrices
Definition. . . . . . . . . . . . . . . . . . . . . .
Properties . . . . . . . . . . . . . . . . . . . . .
Properties . . . . . . . . . . . . . . . . . . . . .
Properties . . . . . . . . . . . . . . . . . . . . .
Hotellings T 2 distribution
Definition. . . . . . . . . . . . . . . .
One-sample T 2 test . . . . . . . . .
Relationship with F distribution
Mahalonobis distance. . . . . . . .
Two-sample T 2 test. . . . . . . . .

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

Final remarks
Assumptions for one-sample T 2 test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Assumptions for two-sample T 2 test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Multivariate test versus univariate tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

27
. 28
. 29
. 30

Where are we going?

2 / 30

One-sample t-test (univariate)


Suppose that x1 , . . . , xn are i.i.d. N (, 2 ). Then
P
x
= n1 ni=1 xi N (, 2 /n)
Pn
ns2 = i=1 (xi x
)2 2 2n1
2
x
and s are independent
Suppose that the mean is unknown. Then we can do a one-sample t-test:
H0 : = 0 , Ha : 6= 0 .
0 , where su is sample standard deviation with n 1 in the denominator.
Test statistic: t = sx
u/ n

Under H0 , t tn1 , i.e., it has a student t-distribution with n 1 degrees of freedom.


Compute p-value. If p-value < 0.05, we reject the null hypothesis.

3 / 30

Two-sample t-test (univariate)


2 ), and let y , . . . , y be i.i.d. N ( , 2 ) (independent
Suppose that x1 , . . . , xn are i.i.d. N (X , X
1
m
Y
Y
of x1 , . . . , xn ).
Suppose we want to test whether X = Y , under the assumption that X = Y . Then we can do
a two-sample t-test:
H0 : X = Y , Ha : X 6= Y .
y
Test statistic: t = q x
, where
1
1

s2p ( n + m )

s2p =


1
ns2X + ms2Y .
n+m2

Under H0 , t tn+m2 .

4 / 30

One sample T 2 -test (multivariate)


Suppose that x1 , . . . , xn are i.i.d. Np (, ). Then
P
x
= n1 ni=1 xi Np (, /n).
nS Wp (, n 1), where S is the sample covariance matrix and Wp is a Wishart distribution.
x
and S are independent.
Suppose that the mean is unknown. Then we can do a one-sample t-test:
H0 : = 0 , Ha : 6= 0 .
Test statistic: T = n(
x 0 ) Su1 (
x 0 ), where Su is the sample covariance matrix with
n 1 in the denominator.
Under H0 , T T 2 (p, n 1), i.e., it has Hotellings T 2 distribution with parameters p and n 1.

5 / 30

Two sample T 2 -test (multivariate)


Suppose that x1 , . . . , xn are i.i.d. Np (X , X ), and let y1 , . . . , ym be i.i.d. Np (Y , Y )
(independent of x1 , . . . , xn ).
Suppose we want to test whether X = Y , under the assumption that X = Y . Then we can do
a two-sample t-test:
H0 : X = Y , Ha : X 6= Y .
Test statistic: T = (nm/(n + m))(
x y) Su1 (
x y), where

Su =

1
(nS1 + mS2 ) .
n+m2

Under H0 , T T 2 (p, n + m 2).

6 / 30

Multivariate normal distribution

7 / 30

Definition

A random variable x R has a univariate normal distribution with mean and variance 2 (we
write x N (, 2 )) iff its density can be written as


1
1 (x )2
f (x) =
exp
2 2
2


1
2 1/2
2 1
exp (x ){ } (x ) .
= {2 }
2

A random vector x Rp has a p-variate normal distribution with mean vector and covariance
matrix (we write x Np (, )) iff its density can be written as


1
1/2
1
f (x) = |2|
exp (x ) (x ) .
2
8 / 30

Why is it important in multivariate statistics?


It is an easy generalization of the univariate normal distribution. Such generalizations are not
obvious for all univariate distributions; sometimes there are several plausible ways to generalize them.
The multivariate normal distribution is entirely defined by its first two moments. Hence, it has a
sparse parametrization using only p(p + 3)/2 parameters (see board).
In the case of normal variables, zero correlation implies independence, and pairwise independence
implies mutual independence. These properties do not hold for many other distributions.

9 / 30

Why is it important in multivariate statistics?


Linear functions of a multivariate normal are univariate normal. This yields simple derivations.
Even when the original data is not multivariate normal, certain functions like the sample mean will
be approximately multivariate normal due to the central limit theorem.
The multivariate normal distribution has a simple geometry. Its equiprobability contours are
ellipsoids (see picture on slide).

10 / 30

Characterization

x is p-variate normal iff a x is univariate normal for all fixed vectors a Rp (To allow for a = 0, we
regard constants as degenerate forms of the normal distribution.)

Geometric interpretation: x is p-variate normal iff its projection on any univariate subspace is normal.
This characterization will allow us to derive many properties of the multivariate normal without
writing down densities.

11 / 30

Properties

(Th 3.1.1 of MKB) If x is p-variate normal, and if y = Ax + c where A is any q p matrix and c is
any q-vector, then y is q-variate normal (see proof on board).

(Cor 3.1.1.1 of MKB) Any subset of elements of a multivariate normal vector are multivariate
normal. In particular, all individual elements are univariate normal (see proof on board).
12 / 30

Properties

(Th 3.2.1 of MKB) If x Np (, ) and y = Ax + c, then y Nq (A + c, AA ) (see proof on


board).

(Cor 3.2.1.1 of MKB) IfPx Np (, ) with > 0, then y = 1/2 (x ) Np (0, I) and
(x ) 1 (x ) = pi=1 yi2 2p (see proof on board).

13 / 30

Data matrices
Let x1 , . . . , xn be a random sample from N (, ). Then we call X = (x1 , . . . , xn ) a data matrix
from N (, ) or a normal data matrix.
(Th. 3.3.2 of MKB, without proof) If X(n p) is a normal data matrix from Np (, ) and if
Y (m q) satisfies Y = AXB, then Y is a normal matrix iff the following two properties hold:
A1 = 1 for some scalar , or B = 0
AA = I for some scalar , or B B = 0
When both these conditions are satisfied then Y is a normal data matrix from Nq (B , B B).
Note: Pre-multiplication with A means that we take linear combinations of the rows. The conditions on
A ensure that the new rows are independent. Post-multiplication with B means that we take linear
combinations of the columns (variables).

14 / 30

Wishart distribution

15 / 30

Quadratic forms of normal data matrices


We now consider quadratic forms of normal data matrices, i.e., functions of the form X CX for
some symmetric matrix C.
A special case of such a quadratic form is the covariance matrix, which we obtain when
C = n1 (I n1 11 ) (see proof on board; H = I n1 11 is called the centering matrix).

16 / 30

Definition

If M (p p) can be written as X X where X(m p) is a data matrix from Np (0, ), then M is said
to have a p-variate Wishart distribution with scale matrix and m degrees of freedom. We write
M Wp (, m). When = Ip , then the distribution is said to be in standard form.

Note: The Wishart distribution is a multivariate generalization of the 2 distribution: when p = 1,


the W1 ( 2 , m) distribution is given by x x, where x Rm contains i.i.d. N1 (0, 2 ) variables. Hence,
W1 ( 2 , m) = 2 2m .
17 / 30

Properties
(Th 3.4.1 of MKB) If M Wp (, m) and B is a p q matrix, then B M B Wq (B B, m) (see
proof on board).
(Cor 3.4.1.1 of MKB) Diagonal submatrices of M (square submatrices of M whose diagonal
corresponds to the diagonal of M ) have a Wishart distribution.
(Cor 3.4.1.2 of MKB) 1/2 M 1/2 Wp (I, m)
(Cor 3.4.1.3 of MKB) If M Wp (I, m) and B(p q) satisfies B B = Iq , then B M B Wq (I, m)
(Cor 3.4.2.1 of MKB) The ith diagonal element of M , mii , has a i2 2m distribution (where i2 is
the ith diagonal element of ).
All these corollaries follow by choosing particular values of B in Th 3.4.1.

18 / 30

Properties

(Th 3.4.3 of MKB) If M1 Wp (, m1 ) and M2 Wp (, m2 ), and if M1 and M2 are independent,


then M1 + M2 Wp (, m1 + m2 ) (see proof on board).

19 / 30

Properties

(Th 3.4.4 of MKB) If X(n p) is a data matrix from Np (0, ) and C(n n) is a symmetric matrix,
then
X CX has the same distribution as a weighted sum of independent Wp (, 1) matrices, where
the weights are eigenvalues of C;
X CX has a Wishart distribution if C is idempotent. In this case X CX Wp (, r) where
r = tr(C) = rank(C);
If S = n1 X HX is the sample covariance matrix, then nS Wp (, n 1).
(See proof on board)
20 / 30

Hotellings T 2 distribution

21 / 30

Definition

If can be written as md M 1 d where d and M are independently distributed as Np (0, I) and


Wp (I, m), then we say that has the Hotelling T 2 distribution with parameters p and m. We write
T 2 (p, m).

Hotellings T 2 distribution is a generalization of the student t-distribution. If x tm , then


x2 T 2 (1, m).

22 / 30

One-sample T 2 test

(Th 3.5.1 of MKB) If x and M are independently distributed as Np (, ) and Wp (, m) then


m(x ) M 1 (x ) T 2 (p, m) (see proof on board).

(Cor 3.5.1.1 of MKB) if x


and S are the mean vector and covariance matrix of a sample of size n
from Np (, ), and Su = (n/(n 1))S, then
T12 = (n 1)(
x ) S 1 (
x ) = n(
x )Su1 (
x )
has a T 2 (p, n 1) distribution (see proof on board).

23 / 30

Relationship with F distribution

The T 2 distribution is not readily available in R. But the T 2 distribution is closely related to the
F -distribution:
(Th 3.5.2 of MKB, without proof)
T 2 (p, m) = {mp/(m p + 1)}Fp,mp+1 .

(Cor 3.5.2.1 of MKB) If x


and S are the mean and covariance of a sample of size n from Np (, ),
np
then (n1)p
T12 has a Fp,np distribution.
24 / 30

Mahalonobis distance

The so-called Mahalonobis distance between two populations with means 1 and 2 and common
covariance matrix is given by , where
2 = (1 2 ) 1 (1 2 )

In other words, D is the euclidian distance between the re-scaled vectors 1/2 1 and 1/2 2 .
The sample version of the Mahalonobis distance, D, is defined by
D2 = (
x1 x
2 ) Su1 (
x1 x
2 ),
where Su = (n1 S1 + n2 S2 )/(n 2), x
i is the sample mean of sample i, ni is the sample size of
sample i, and n = n1 + n2 .
25 / 30

Two-sample T 2 test

(Th 3.6.1 of MKB, without proof) If X1 and X2 are independent data matrices, and if the ni rows
of Xi are i.i.d. Np (i , i ), i = 1, 2, then when 1 = 2 and 1 = 2 ,
T22 =

n1 n2 2
D
n

has a T 2 (p, n 2) distribution.

Corollary:

n1p 2
(n2)p T2

has a Fp,n1p distribution.


26 / 30

Final remarks

27 / 30

Assumptions for one-sample T 2 test


We have a simple random sample from the population
The population has a multivariate normal distribution

28 / 30

Assumptions for two-sample T 2 test


We have a simple random sample from each population
In each population the variables have a multivariate normal distribution
The two populations have the same covariance matrix

29 / 30

Multivariate test versus univariate tests


See board
30 / 30