One-sample t-test (univariate)
Two-sample t-test (univariate)
One-sample $T^2$-test (multivariate)
Two-sample $T^2$-test (multivariate)

Multivariate normal distribution
    Definition
    Why is it important in multivariate statistics?
    Characterization
    Properties
    Data matrices

Wishart distribution
    Quadratic forms of normal data matrices
    Definition
    Properties

Hotelling's $T^2$ distribution
    Definition
    One-sample $T^2$ test
    Relationship with F distribution
    Mahalanobis distance
    Two-sample $T^2$ test

Final remarks
    Assumptions for one-sample $T^2$ test
    Assumptions for two-sample $T^2$ test
    Multivariate test versus univariate tests

One-sample t-test (univariate)

Suppose that $x_1, \dots, x_n$ are i.i.d. $N(\mu, \sigma^2)$. Then

$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \sim N(\mu, \sigma^2/n)$

$n s^2 = \sum_{i=1}^n (x_i - \bar{x})^2 \sim \sigma^2 \chi^2_{n-1}$

$\bar{x}$ and $s^2$ are independent

Suppose that the mean is unknown. Then we can do a one-sample t-test:

$H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$.

Test statistic: $t = \frac{\bar{x} - \mu_0}{s_u / \sqrt{n}}$, where $s_u$ is the sample standard deviation with $n - 1$ in the denominator.

Compute the p-value. If the p-value is below 0.05, we reject the null hypothesis.
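The one-sample t statistic can be computed in a few lines. The sketch below uses only the Python standard library; the function name `one_sample_t` and the data are made up for illustration. Turning the statistic into a p-value needs the $t_{n-1}$ distribution function, which the standard library does not provide (with SciPy installed, `scipy.stats.t.sf` would be one option).

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(xs)
    xbar = statistics.mean(xs)
    s_u = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    t = (xbar - mu0) / (s_u / math.sqrt(n))
    return t, n - 1

sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4]  # made-up data
t, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t:.3f} on {df} degrees of freedom")
```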

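Two-sample t-test (univariate)

Suppose that $x_1, \dots, x_n$ are i.i.d. $N(\mu_X, \sigma_X^2)$, and let $y_1, \dots, y_m$ be i.i.d. $N(\mu_Y, \sigma_Y^2)$ (independent of $x_1, \dots, x_n$).

Suppose we want to test whether $\mu_X = \mu_Y$, under the assumption that $\sigma_X = \sigma_Y$. Then we can do a two-sample t-test:

$H_0: \mu_X = \mu_Y$, $H_a: \mu_X \neq \mu_Y$.

Test statistic: $t = \frac{\bar{x} - \bar{y}}{\sqrt{s_p^2 \left( \frac{1}{n} + \frac{1}{m} \right)}}$, where

$$s_p^2 = \frac{1}{n + m - 2} \left( n s_X^2 + m s_Y^2 \right).$$

Under $H_0$, $t \sim t_{n+m-2}$.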
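A minimal sketch of the pooled two-sample statistic, again with the standard library only. It assumes, as on the one-sample slide, that $s_X^2$ and $s_Y^2$ use divisors $n$ and $m$, so that $n s_X^2$ and $m s_Y^2$ are the within-sample sums of squares. The function name and data are hypothetical.

```python
import math
import statistics

def pooled_two_sample_t(xs, ys):
    """Return the pooled two-sample t statistic and its degrees of freedom."""
    n, m = len(xs), len(ys)
    # pvariance uses divisor n, so n * pvariance is the sum of squares.
    ss_x = n * statistics.pvariance(xs)
    ss_y = m * statistics.pvariance(ys)
    s2_p = (ss_x + ss_y) / (n + m - 2)
    diff = statistics.mean(xs) - statistics.mean(ys)
    t = diff / math.sqrt(s2_p * (1 / n + 1 / m))
    return t, n + m - 2

xs = [1.0, 2.0, 3.0, 4.0]  # made-up data
ys = [2.0, 4.0, 6.0]
t_pooled, df_pooled = pooled_two_sample_t(xs, ys)
print(f"t = {t_pooled:.3f} on {df_pooled} degrees of freedom")
```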

One-sample $T^2$-test (multivariate)

Suppose that $x_1, \dots, x_n$ are i.i.d. $N_p(\mu, \Sigma)$. Then

$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \sim N_p(\mu, \Sigma/n)$.

$nS \sim W_p(\Sigma, n - 1)$, where $S$ is the sample covariance matrix and $W_p$ is a Wishart distribution.

$\bar{x}$ and $S$ are independent.

Suppose that the mean is unknown. Then we can do a one-sample $T^2$-test:

$H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$.

Test statistic: $T^2 = n (\bar{x} - \mu_0)' S_u^{-1} (\bar{x} - \mu_0)$, where $S_u$ is the sample covariance matrix with $n - 1$ in the denominator.

Under $H_0$, $T^2 \sim T^2(p, n - 1)$, i.e., it has Hotelling's $T^2$ distribution with parameters $p$ and $n - 1$.
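As a quick sanity check (a sketch, not taken from the slides), setting $p = 1$ in the multivariate statistic recovers the square of the univariate t statistic, because $S_u$ is then the scalar $s_u^2$:

$$n (\bar{x} - \mu_0) \, (s_u^2)^{-1} (\bar{x} - \mu_0) = \left( \frac{\bar{x} - \mu_0}{s_u / \sqrt{n}} \right)^2 = t^2.$$

So with a single variable the multivariate test and the two-sided t-test lead to the same decision.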

Two-sample $T^2$-test (multivariate)

Suppose that $x_1, \dots, x_n$ are i.i.d. $N_p(\mu_X, \Sigma_X)$, and let $y_1, \dots, y_m$ be i.i.d. $N_p(\mu_Y, \Sigma_Y)$ (independent of $x_1, \dots, x_n$).

Suppose we want to test whether $\mu_X = \mu_Y$, under the assumption that $\Sigma_X = \Sigma_Y$. Then we can do a two-sample $T^2$-test:

$H_0: \mu_X = \mu_Y$, $H_a: \mu_X \neq \mu_Y$.

Test statistic: $T^2 = \frac{nm}{n + m} (\bar{x} - \bar{y})' S_u^{-1} (\bar{x} - \bar{y})$, where

$$S_u = \frac{1}{n + m - 2} \left( n S_1 + m S_2 \right).$$

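Multivariate normal distribution

Definition

A random variable $x \in \mathbb{R}$ has a univariate normal distribution with mean $\mu$ and variance $\sigma^2$ (we write $x \sim N(\mu, \sigma^2)$) iff its density can be written as

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2} \right\} = \{2\pi\sigma^2\}^{-1/2} \exp\left\{ -\frac{1}{2} (x - \mu) \{\sigma^2\}^{-1} (x - \mu) \right\}.$$

A random vector $x \in \mathbb{R}^p$ has a p-variate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ (we write $x \sim N_p(\mu, \Sigma)$) iff its density can be written as

$$f(x) = |2\pi\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}.$$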

Why is it important in multivariate statistics?

It is an easy generalization of the univariate normal distribution. Such generalizations are not obvious for all univariate distributions; sometimes there are several plausible ways to generalize them.

The multivariate normal distribution is entirely defined by its first two moments. Hence, it has a sparse parametrization using only $p(p + 3)/2$ parameters (see board).

For jointly normal variables, zero correlation implies independence, and pairwise independence implies mutual independence. These properties do not hold for many other distributions.

Linear combinations of the components of a multivariate normal vector are univariate normal. This yields simple derivations.

Even when the original data are not multivariate normal, certain functions such as the sample mean will be approximately multivariate normal due to the central limit theorem.

The multivariate normal distribution has a simple geometry. Its equiprobability contours are ellipsoids (see picture on slide).
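The parameter count $p(p + 3)/2$ quoted for the multivariate normal can be checked directly (a short sketch; the board derivation may be organized differently). The mean vector has $p$ free entries and the symmetric covariance matrix has $p(p + 1)/2$ free entries, so

$$p + \frac{p(p + 1)}{2} = \frac{2p + p^2 + p}{2} = \frac{p(p + 3)}{2}.$$

For example, $p = 3$ gives $3 + 6 = 9$ parameters.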

Characterization

$x$ is p-variate normal iff $a'x$ is univariate normal for all fixed vectors $a \in \mathbb{R}^p$. (To allow for $a = 0$, we regard constants as degenerate forms of the normal distribution.)

Geometric interpretation: $x$ is p-variate normal iff its projection on any univariate subspace is normal.

This characterization will allow us to derive many properties of the multivariate normal without writing down densities.

Properties

(Th 3.1.1 of MKB) If $x$ is p-variate normal, and if $y = Ax + c$ where $A$ is any $q \times p$ matrix and $c$ is any q-vector, then $y$ is q-variate normal (see proof on board).

(Cor 3.1.1.1 of MKB) Any subset of elements of a multivariate normal vector is multivariate normal. In particular, all individual elements are univariate normal (see proof on board).
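One way to see Th 3.1.1, sketched here from the characterization of the multivariate normal (the board proof may differ): take any fixed $b \in \mathbb{R}^q$. Then

$$b'y = b'(Ax + c) = (A'b)'x + b'c.$$

Here $(A'b)'x$ is univariate normal because $x$ is p-variate normal and $A'b$ is a fixed vector in $\mathbb{R}^p$, and adding the constant $b'c$ keeps it normal. Since this holds for every $b$, $y$ is q-variate normal. The corollary follows by taking $c = 0$ and letting the rows of $A$ be the rows of the identity matrix that pick out the chosen elements of $x$.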

Properties

board).

(Cor 3.2.1.1 of MKB) If $x \sim N_p(\mu, \Sigma)$ with $\Sigma > 0$, then $y = \Sigma^{-1/2}(x - \mu) \sim N_p(0, I)$ and $(x - \mu)' \Sigma^{-1} (x - \mu) = \sum_{i=1}^p y_i^2 \sim \chi^2_p$ (see proof on board).

Data matrices

Let $x_1, \dots, x_n$ be a random sample from $N(\mu, \Sigma)$. Then we call $X = (x_1, \dots, x_n)'$ a data matrix from $N(\mu, \Sigma)$ or a normal data matrix.

(Th. 3.3.2 of MKB, without proof) If $X(n \times p)$ is a normal data matrix from $N_p(\mu, \Sigma)$ and if $Y(m \times q)$ satisfies $Y = AXB$, then $Y$ is a normal data matrix iff the following two properties hold:

$A1 = \alpha 1$ for some scalar $\alpha$, or $B'\mu = 0$

$AA' = \beta I$ for some scalar $\beta$, or $B'\Sigma B = 0$

When both these conditions are satisfied then $Y$ is a normal data matrix from $N_q(\alpha B'\mu, \beta B'\Sigma B)$.

Note: Pre-multiplication with $A$ means that we take linear combinations of the rows. The conditions on $A$ ensure that the new rows are independent. Post-multiplication with $B$ means that we take linear combinations of the columns (variables).
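As an illustration (not from the slides), Th 3.3.2 gives the distribution of the sample mean. Write $\alpha$ and $\beta$ for the two scalars in the conditions, and take $A = n^{-1} 1'$ (a $1 \times n$ matrix) and $B = I_p$, so that $Y = AXB = \bar{x}'$. Then

$$A1 = n^{-1} 1'1 = 1 \quad (\alpha = 1), \qquad AA' = n^{-2} 1'1 = n^{-1} \quad (\beta = 1/n).$$

Both conditions hold, so $\bar{x}'$ is a one-row normal data matrix from $N_p(\alpha B'\mu, \beta B'\Sigma B) = N_p(\mu, \Sigma/n)$, in agreement with the distribution of $\bar{x}$ stated for the one-sample test.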

Wishart distribution

Quadratic forms of normal data matrices

We now consider quadratic forms of normal data matrices, i.e., functions of the form $X'CX$ for some symmetric matrix $C$.

A special case of such a quadratic form is the covariance matrix, which we obtain when $C = n^{-1}(I - n^{-1} 11')$ (see proof on board; $H = I - n^{-1} 11'$ is called the centering matrix).
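A sketch of why this choice of $C$ yields the covariance matrix (the board proof may differ), taking $S = n^{-1} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})'$. Since $n^{-1} 1'X = \bar{x}'$,

$$HX = X - n^{-1} 11'X = X - 1\bar{x}',$$

so row $i$ of $HX$ is $(x_i - \bar{x})'$. The matrix $H$ is symmetric and idempotent ($H^2 = H$), hence

$$X'HX = (HX)'(HX) = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})' = nS,$$

and dividing by $n$ gives $X'CX = S$.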

Definition

If $M(p \times p)$ can be written as $X'X$ where $X(m \times p)$ is a data matrix from $N_p(0, \Sigma)$, then $M$ is said to have a p-variate Wishart distribution with scale matrix $\Sigma$ and $m$ degrees of freedom. We write $M \sim W_p(\Sigma, m)$. When $\Sigma = I_p$, the distribution is said to be in standard form.

The $W_1(\sigma^2, m)$ distribution is given by $x'x$, where $x \in \mathbb{R}^m$ contains i.i.d. $N_1(0, \sigma^2)$ variables. Hence, $W_1(\sigma^2, m) = \sigma^2 \chi^2_m$.

Properties

(Th 3.4.1 of MKB) If $M \sim W_p(\Sigma, m)$ and $B$ is a $p \times q$ matrix, then $B'MB \sim W_q(B'\Sigma B, m)$ (see proof on board).

(Cor 3.4.1.1 of MKB) Diagonal submatrices of $M$ (square submatrices of $M$ whose diagonal corresponds to the diagonal of $M$) have a Wishart distribution.

(Cor 3.4.1.2 of MKB) $\Sigma^{-1/2} M \Sigma^{-1/2} \sim W_p(I, m)$

(Cor 3.4.1.3 of MKB) If $M \sim W_p(I, m)$ and $B(p \times q)$ satisfies $B'B = I_q$, then $B'MB \sim W_q(I, m)$

(Cor 3.4.2.1 of MKB) The $i$th diagonal element of $M$, $m_{ii}$, has a $\sigma_i^2 \chi^2_m$ distribution (where $\sigma_i^2$ is the $i$th diagonal element of $\Sigma$).

All these corollaries follow by choosing particular values of $B$ in Th 3.4.1.
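A sketch of the argument behind Th 3.4.1 and two of the choices of $B$ (the board proof may differ). Write $M = X'X$ with $X$ an $m \times p$ data matrix from $N_p(0, \Sigma)$. Then

$$B'MB = B'X'XB = (XB)'(XB).$$

By the data-matrix theorem (Th 3.3.2 with $A = I$), $XB$ is an $m \times q$ data matrix from $N_q(0, B'\Sigma B)$, so $(XB)'(XB) \sim W_q(B'\Sigma B, m)$ by the definition of the Wishart distribution. Taking $B = \Sigma^{-1/2}$ gives $B'\Sigma B = I$, which is Cor 3.4.1.2. Taking $B = e_i$, the $i$th unit vector, gives

$$m_{ii} = e_i' M e_i \sim W_1(\sigma_i^2, m) = \sigma_i^2 \chi^2_m.$$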

Properties

If $M_1 \sim W_p(\Sigma, m_1)$ and $M_2 \sim W_p(\Sigma, m_2)$ are independent, then $M_1 + M_2 \sim W_p(\Sigma, m_1 + m_2)$ (see proof on board).

Properties

(Th 3.4.4 of MKB) If $X(n \times p)$ is a data matrix from $N_p(0, \Sigma)$ and $C(n \times n)$ is a symmetric matrix, then

$X'CX$ has the same distribution as a weighted sum of independent $W_p(\Sigma, 1)$ matrices, where the weights are the eigenvalues of $C$;

$X'CX$ has a Wishart distribution if $C$ is idempotent. In this case $X'CX \sim W_p(\Sigma, r)$ where $r = \mathrm{tr}(C) = \mathrm{rank}(C)$;

If $S = n^{-1} X'HX$ is the sample covariance matrix, then $nS \sim W_p(\Sigma, n - 1)$.

(See proof on board)
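The last statement can be read off from the idempotent case (a sketch; the board proof may differ). The centering matrix $H = I - n^{-1} 11'$ is symmetric and idempotent, with

$$\mathrm{tr}(H) = n - n^{-1} \mathrm{tr}(11') = n - 1,$$

so $nS = X'HX \sim W_p(\Sigma, n - 1)$ when the data matrix has mean zero. For a general mean $\mu$, note that $H1 = 0$, so $HX = H(X - 1\mu')$ and

$$nS = (X - 1\mu')' H (X - 1\mu'),$$

where $X - 1\mu'$ is a data matrix from $N_p(0, \Sigma)$. The result therefore holds for any $\mu$.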

Hotelling's $T^2$ distribution

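Definition

If $\alpha$ can be written as $m \, d' M^{-1} d$, where $d \sim N_p(0, I)$ and $M \sim W_p(I, m)$ are independent, then we say that $\alpha$ has the Hotelling $T^2$ distribution with parameters $p$ and $m$. We write $\alpha \sim T^2(p, m)$.

If $x \sim t_m$, then $x^2 \sim T^2(1, m)$.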

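One-sample $T^2$ test

If $x \sim N_p(\mu, \Sigma)$ and $M \sim W_p(\Sigma, m)$ are independent, then $m (x - \mu)' M^{-1} (x - \mu) \sim T^2(p, m)$ (see proof on board).

If $\bar{x}$ and $S$ are the mean vector and covariance matrix of a sample of size $n$ from $N_p(\mu, \Sigma)$, and $S_u = (n/(n - 1)) S$, then

$$T_1^2 = (n - 1)(\bar{x} - \mu)' S^{-1} (\bar{x} - \mu) = n (\bar{x} - \mu)' S_u^{-1} (\bar{x} - \mu)$$

has a $T^2(p, n - 1)$ distribution (see proof on board).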
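To make the one-sample statistic concrete, here is a minimal sketch for bivariate data ($p = 2$), written without external libraries so that the $2 \times 2$ inverse is explicit. The function name, the data and the hypothesized mean are made up; for general $p$ one would normally use a linear algebra library instead.

```python
def hotelling_one_sample_2d(rows, mu0):
    """One-sample T^2 = n (xbar - mu0)' S_u^{-1} (xbar - mu0) for p = 2."""
    n = len(rows)
    xbar = [sum(r[j] for r in rows) / n for j in range(2)]
    # Unbiased covariance S_u (divisor n - 1) with entries [[a, b], [b, d]].
    a = sum((r[0] - xbar[0]) ** 2 for r in rows) / (n - 1)
    b = sum((r[0] - xbar[0]) * (r[1] - xbar[1]) for r in rows) / (n - 1)
    d = sum((r[1] - xbar[1]) ** 2 for r in rows) / (n - 1)
    u, v = xbar[0] - mu0[0], xbar[1] - mu0[1]
    det = a * d - b * b
    # Quadratic form with the explicit inverse (1/det) [[d, -b], [-b, a]].
    return n * (d * u * u - 2 * b * u * v + a * v * v) / det

data = [(1, 2), (2, 1), (3, 4), (4, 3)]  # made-up data
t2_one = hotelling_one_sample_2d(data, mu0=(2, 2))
print(t2_one)
```

Under the null hypothesis this value would be compared with the $T^2(2, n - 1)$ distribution.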

Relationship with F distribution

The $T^2$ distribution is not readily available in R. But the $T^2$ distribution is closely related to the F-distribution:

(Th 3.5.2 of MKB, without proof)

$$T^2(p, m) = \{mp/(m - p + 1)\} F_{p, m-p+1}.$$

If $\bar{x}$ and $S$ are the mean and covariance of a sample of size $n$ from $N_p(\mu, \Sigma)$, then $\frac{n - p}{(n - 1)p} T_1^2$ has an $F_{p, n-p}$ distribution.
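In practice the relationship with the F distribution is used as a rescaling step. The sketch below turns a $T^2(p, m)$ value into the equivalent F statistic and its degrees of freedom; the function name `t2_to_f` and the numbers are hypothetical. A p-value would then come from any F distribution function (for instance `scipy.stats.f.sf`, assuming SciPy is installed).

```python
def t2_to_f(t2, p, m):
    """Map a T^2(p, m) value to (F value, df1, df2).

    Uses T^2(p, m) = {mp / (m - p + 1)} F_{p, m-p+1}.
    """
    if m - p + 1 <= 0:
        raise ValueError("need m >= p for the F relationship")
    f_value = t2 * (m - p + 1) / (m * p)
    return f_value, p, m - p + 1

# One-sample case: T_1^2 ~ T^2(p, n - 1), so m = n - 1.
n, p = 20, 3
f_value, df1, df2 = t2_to_f(t2=12.0, p=p, m=n - 1)
print(f_value, df1, df2)
```

With $m = n - 1$ the scaling factor is $(n - p)/((n - 1)p)$ and the degrees of freedom are $(p, n - p)$, matching the one-sample statement.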

Mahalanobis distance

The so-called Mahalanobis distance between two populations with means $\mu_1$ and $\mu_2$ and common covariance matrix $\Sigma$ is given by $\Delta$, where

$$\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2).$$

In other words, $\Delta$ is the Euclidean distance between the re-scaled vectors $\Sigma^{-1/2} \mu_1$ and $\Sigma^{-1/2} \mu_2$.

The sample version of the Mahalanobis distance, $D$, is defined by

$$D^2 = (\bar{x}_1 - \bar{x}_2)' S_u^{-1} (\bar{x}_1 - \bar{x}_2),$$

where $S_u = (n_1 S_1 + n_2 S_2)/(n - 2)$, $\bar{x}_i$ is the sample mean of sample $i$, $n_i$ is the sample size of sample $i$, and $n = n_1 + n_2$.
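The re-scaling interpretation can be written out in one line (a sketch, using the symmetric square root $\Sigma^{-1/2}$ and writing $\Delta$ for the population distance):

$$\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1/2} \Sigma^{-1/2} (\mu_1 - \mu_2) = \left\| \Sigma^{-1/2} \mu_1 - \Sigma^{-1/2} \mu_2 \right\|^2.$$

In the special case $\Sigma = \sigma^2 I$ this reduces to $\Delta = \| \mu_1 - \mu_2 \| / \sigma$, the ordinary distance measured in standard deviations.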

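Two-sample $T^2$ test

(Th 3.6.1 of MKB, without proof) If $X_1$ and $X_2$ are independent data matrices, and if the $n_i$ rows of $X_i$ are i.i.d. $N_p(\mu_i, \Sigma_i)$, $i = 1, 2$, then when $\mu_1 = \mu_2$ and $\Sigma_1 = \Sigma_2$,

$$T_2^2 = \frac{n_1 n_2}{n} D^2 \sim T^2(p, n - 2).$$

Corollary:

$$\frac{n - p - 1}{(n - 2)p} T_2^2 \sim F_{p, n-p-1}.$$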
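The two-sample recipe (pooled covariance, sample Mahalanobis distance, $T_2^2$, then the F rescaling) can be sketched for bivariate data as follows. The code uses no external libraries, so it is limited to $p = 2$ with an explicit $2 \times 2$ inverse; all names and data are made up for illustration.

```python
def mean2(rows):
    """Mean vector of a list of (x, y) pairs."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def scatter2(rows, centre):
    """Entries (s11, s12, s22) of sum_i (x_i - centre)(x_i - centre)'."""
    s11 = sum((r[0] - centre[0]) ** 2 for r in rows)
    s12 = sum((r[0] - centre[0]) * (r[1] - centre[1]) for r in rows)
    s22 = sum((r[1] - centre[1]) ** 2 for r in rows)
    return s11, s12, s22

def two_sample_t2(sample1, sample2):
    """Return (T2, F, (df1, df2)) for two bivariate samples."""
    p = 2
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    m1, m2 = mean2(sample1), mean2(sample2)
    w1, w2 = scatter2(sample1, m1), scatter2(sample2, m2)
    # Pooled S_u = (n1 S1 + n2 S2) / (n - 2); n_i S_i is the scatter matrix.
    a, b, d = ((w1[k] + w2[k]) / (n - 2) for k in range(3))
    u, v = m1[0] - m2[0], m1[1] - m2[1]
    # D^2 = diff' S_u^{-1} diff via the explicit 2 x 2 inverse.
    d_squared = (d * u * u - 2 * b * u * v + a * v * v) / (a * d - b * b)
    t2 = n1 * n2 / n * d_squared
    f_value = (n - p - 1) / ((n - 2) * p) * t2
    return t2, f_value, (p, n - p - 1)

group1 = [(1, 2), (2, 1), (3, 4), (4, 3)]  # made-up data
group2 = [(3, 5), (4, 4), (5, 7), (6, 6)]
t2_two, f_two, dfs = two_sample_t2(group1, group2)
print(t2_two, f_two, dfs)
```

The returned F value would be referred to an F distribution with the returned degrees of freedom.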

Final remarks

Assumptions for one-sample $T^2$ test

We have a simple random sample from the population

The population has a multivariate normal distribution

Assumptions for two-sample $T^2$ test

We have a simple random sample from each population

In each population the variables have a multivariate normal distribution

The two populations have the same covariance matrix

Multivariate test versus univariate tests

See board

