
NOTES FOR MSC STUDENTS

A. Univariate Distributions and Results


Normal distribution
If y is a Normally distributed random variable with mean \mu and variance \sigma^2 we write
y \sim N(\mu, \sigma^2). The density of y is given by

    f(y) = \{2\pi\sigma^2\}^{-1/2} \exp\left[ -\tfrac{1}{2} \{(y - \mu)/\sigma\}^2 \right]

The Normal distribution with \mu = 0 and \sigma^2 = 1, i.e. y \sim N(0, 1), is called the standard
Normal distribution.
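As a quick numerical sanity check (not part of the original notes), the density above can be evaluated directly and compared with scipy.stats.norm; the values of \mu and \sigma below are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.5, 2.0                      # arbitrary illustrative parameters
y = np.linspace(-5.0, 8.0, 7)

# the density written exactly as in the formula above
f = (2 * np.pi * sigma**2) ** (-0.5) * np.exp(-0.5 * ((y - mu) / sigma) ** 2)

# agrees with the library implementation to rounding error
assert np.allclose(f, stats.norm.pdf(y, loc=mu, scale=sigma))
```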
The central Chi-squared distribution
The central Chi-squared distribution with n degrees of freedom is defined as the sum of
squares of n independent random variables z_1, z_2, \ldots, z_n, each with the standard Normal
distribution. It is denoted by

    y = \sum_{i=1}^{n} z_i^2 \sim \chi^2_n        (A.1)

In vector notation y = z^T z \sim \chi^2_n where z = (z_1, z_2, \ldots, z_n)^T. If x_1, x_2, \ldots, x_n are independent
Normally distributed random variables with x_i \sim N(\mu_i, \sigma_i^2), then

    y = \sum_{i=1}^{n} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \sim \chi^2_n        (A.2)

since the variables z_i = (x_i - \mu_i)/\sigma_i are N(0, 1).
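A minimal simulation sketch of (A.1)-(A.2), assuming numpy and scipy are available; the means, standard deviations and seed below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000
mu = np.array([1.0, -2.0, 0.5, 3.0, 0.0])     # arbitrary means
sigma = np.array([1.0, 2.0, 0.5, 1.5, 3.0])   # arbitrary standard deviations

x = rng.normal(mu, sigma, size=(reps, n))
y = (((x - mu) / sigma) ** 2).sum(axis=1)     # the statistic in (A.2)

print(y.mean(), y.var())                      # roughly n = 5 and 2n = 10, see (A.4)
print(stats.kstest(y, stats.chi2(df=n).cdf).pvalue)   # should not be small
```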


If y \sim \chi^2_n then the density of y is given by

    f(y) = \{2^{n/2} \Gamma(n/2)\}^{-1} y^{(n/2 - 1)} e^{-y/2},    0 < y < \infty        (A.3)

and we have

    E(y) = n,    Var(y) = 2n        (A.4)

and, for n > 2,

    E(1/y) = 1/(n - 2)        (A.5)
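The moments (A.4) and (A.5) can be confirmed with scipy.stats.chi2; this is only an illustrative sketch, and n = 8 is an arbitrary choice.

```python
import numpy as np
from scipy import stats

n = 8                                        # any degrees of freedom greater than 2
chi2 = stats.chi2(df=n)

# (A.4): mean n and variance 2n
assert np.isclose(chi2.mean(), n) and np.isclose(chi2.var(), 2 * n)

# (A.5): E(1/y) = 1/(n - 2), here evaluated by numerical integration
print(chi2.expect(lambda y: 1 / y), 1 / (n - 2))    # both approximately 0.1667
```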

If y_1, \ldots, y_m are independent random variables with the chi-squared distributions y_i \sim \chi^2_{n_i},
then

    y = \sum_{i=1}^{m} y_i \sim \chi^2_k        (A.6)

where k = \sum_i n_i.

If y_1 \sim \chi^2_{n_1} and y_2 \sim \chi^2_{n_2}, where n_1 > n_2, and the difference y = y_1 - y_2 is itself
independent of y_2, then the difference also has a chi-squared distribution

    y = y_1 - y_2 \sim \chi^2_{n_1 - n_2}        (A.7)

Recall that for a random sample of observations y_1, \ldots, y_n from N(\mu, \sigma^2), and defining

    \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i    and    s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2

we have

    \bar{y} \sim N(\mu, \sigma^2/n),    (n-1)s^2/\sigma^2 \sim \chi^2_{n-1},

and \bar{y} and s^2 are independent.
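A simulation sketch (not in the notes; sample size, parameters and seed are arbitrary) of the two facts just stated: (n-1)s^2/\sigma^2 \sim \chi^2_{n-1}, and \bar{y} and s^2 are independent.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, sigma, reps = 10, 2.0, 3.0, 50_000    # arbitrary illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
ybar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

# (n-1)s^2/sigma^2 should follow a chi-squared distribution with n-1 d.f.
print(stats.kstest((n - 1) * s2 / sigma**2, stats.chi2(df=n - 1).cdf).pvalue)

# the sample correlation of ybar and s^2 should be near zero (they are independent)
print(np.corrcoef(ybar, s2)[0, 1])
```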
The central F distribution
If y_1 \sim \chi^2_{n_1}, y_2 \sim \chi^2_{n_2}, and y_1 and y_2 are distributed independently, then the ratio

    F = \frac{y_1/n_1}{y_2/n_2}

is defined to have a central F distribution with n_1 and n_2 degrees of freedom. This is denoted
by

    F = \frac{y_1/n_1}{y_2/n_2} \sim F_{n_1, n_2}        (A.8)

From (A.4) and (A.5), and the independence of y_1 and y_2, we have

    E(F) = n_2/(n_2 - 2)    for n_2 > 2        (A.9)

The t distribution
If z \sim N(0, 1) and y \sim \chi^2_n independently of each other then

    t = \frac{z}{(y/n)^{1/2}}

is defined to have a t distribution with n degrees of freedom. This is denoted by

    t = \frac{z}{(y/n)^{1/2}} \sim t_n        (A.10)

Note that squaring the expression for t and using the definition of the F distribution we
have

    t^2 = \frac{z^2/1}{y/n} \sim F_{1,n}        (A.11)
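A small scipy sketch (not part of the notes; n = 7 is arbitrary) illustrating (A.9) and the relation t^2 \sim F_{1,n} of (A.11) through critical values.

```python
import numpy as np
from scipy import stats

n = 7                                        # arbitrary degrees of freedom

# (A.9): the mean of F depends only on the denominator degrees of freedom
print(stats.f(dfn=3, dfd=n).mean(), n / (n - 2))

# (A.11): if t ~ t_n then t^2 ~ F_{1,n}; compare critical values
p = 0.95
t_crit = stats.t(df=n).ppf(0.5 + p / 2)      # two-sided t critical value
f_crit = stats.f(dfn=1, dfd=n).ppf(p)        # one-sided F critical value
assert np.isclose(t_crit**2, f_crit)
```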

The non-central Chi-squared distribution


The sum of squares of n independent random variables z_1, z_2, \ldots, z_n, where z_i \sim N(\mu_i, 1),
i = 1, \ldots, n, is said to have a non-central Chi-squared distribution with n degrees of freedom
and non-centrality parameter \lambda = \sum_{i=1}^{n} \mu_i^2. It is denoted by

    y = \sum_{i=1}^{n} z_i^2 \sim \chi^2(n, \lambda)        (A.12)

In vector notation y = z^T z \sim \chi^2(n, \lambda) where z = (z_1, z_2, \ldots, z_n)^T.

If y \sim \chi^2(n, \lambda) then E(y) = n + \lambda and Var(y) = 2n + 4\lambda.
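These two moments match scipy's non-central chi-squared implementation; a sketch (not in the notes) with arbitrary n and \lambda:

```python
import numpy as np
from scipy import stats

n, lam = 4, 2.5                              # arbitrary d.f. and non-centrality parameter
y = stats.ncx2(df=n, nc=lam)

assert np.isclose(y.mean(), n + lam)         # E(y) = n + lambda
assert np.isclose(y.var(), 2 * n + 4 * lam)  # Var(y) = 2n + 4*lambda
```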
If y_1, \ldots, y_m are independent random variables with chi-squared distributions y_i \sim \chi^2(n_i, \lambda_i)
then

    y = \sum_{i=1}^{m} y_i \sim \chi^2\left( \sum_{i=1}^{m} n_i, \; \sum_{i=1}^{m} \lambda_i \right)        (A.13)

The non-central F distribution


If y_1 and y_2 are independent random variables such that y_1 \sim \chi^2(n, \lambda) and y_2 \sim \chi^2_m then the
ratio

    F = \frac{y_1/n}{y_2/m}

is defined to have a non-central F distribution.

B. Matrices and Eigenvectors


Definition: A symmetric n \times n matrix A is positive semi-definite (psd) iff

    x^T A x \geq 0    for all n \times 1 vectors x \neq 0.

If A is positive semi-definite we write A \geq 0.
Definition: A symmetric n \times n matrix A is positive definite (pd) iff

    x^T A x > 0    for all n \times 1 vectors x \neq 0.

If A is positive definite we write A > 0. Note that if A is positive definite it is of full rank
n.
For any matrix A

    A^T A is symmetric and psd        (B.1)

    A A^T is symmetric and psd        (B.2)

    rank(A^T A) = rank(A A^T) = rank(A)        (B.3)
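A short numpy sketch (not in the notes; the matrix is random) illustrating (B.1)-(B.3): both products are symmetric with non-negative eigenvalues, and the three ranks coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))                  # arbitrary rectangular matrix

AtA, AAt = A.T @ A, A @ A.T

# (B.1)/(B.2): symmetry and non-negative eigenvalues
assert np.allclose(AtA, AtA.T) and np.allclose(AAt, AAt.T)
assert np.linalg.eigvalsh(AtA).min() >= -1e-10
assert np.linalg.eigvalsh(AAt).min() >= -1e-10

# (B.3): the three ranks coincide
print(np.linalg.matrix_rank(AtA), np.linalg.matrix_rank(AAt), np.linalg.matrix_rank(A))
```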

For any n \times n matrix A

Definition: The eigenvalues of A are the solutions of

    |A - \lambda I| = 0

where |.| denotes the operation of taking the determinant. This is a polynomial equation of
degree n. If we denote the n eigenvalues of A by \lambda_1, \ldots, \lambda_n then for each \lambda_i there exists a
corresponding eigenvector \gamma_i such that

    A \gamma_i = \lambda_i \gamma_i

By convention eigenvectors are normalized so that

    \gamma_i^T \gamma_i = 1

We have

    \sum_{i=1}^{n} \lambda_i = trace(A)        (B.4)

and

    \prod_{i=1}^{n} \lambda_i = |A|        (B.5)
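As a numerical illustration (not part of the notes), (B.4) and (B.5) can be checked for a random square matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))                  # arbitrary square matrix

lam = np.linalg.eigvals(A)                   # eigenvalues, possibly complex

# (B.4): the eigenvalues sum to the trace; (B.5): their product is the determinant
assert np.isclose(lam.sum().real, np.trace(A))
assert np.isclose(lam.prod().real, np.linalg.det(A))
```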

For any n \times n real, symmetric matrix A

    The eigenvalues and eigenvectors of A are all real        (B.6)

    \gamma_i^T \gamma_j = 0    if \lambda_i \neq \lambda_j        (B.7)

so that distinct eigenvalues have mutually orthogonal eigenvectors.

If \lambda_i is an eigenvalue of multiplicity p then

    p corresponding mutually orthogonal eigenvectors can always be chosen        (B.8)

Results (B.7) and (B.8) tell us that for any real, symmetric matrix A we can find n real,
mutually orthogonal eigenvectors \gamma_1, \ldots, \gamma_n that correspond to real eigenvalues \lambda_1, \ldots, \lambda_n.
If we define \Gamma to be the matrix (\gamma_1, \ldots, \gamma_n), then

    \Gamma^T \Gamma = \Gamma \Gamma^T = I        (B.9)

If we define \Lambda to be the diagonal matrix diag(\lambda_1, \ldots, \lambda_n) then

    A \Gamma = \Gamma \Lambda    (i.e. A \gamma_i = \lambda_i \gamma_i for all i)        (B.10)

which implies

    A = \Gamma \Lambda \Gamma^T        (B.11)

For any n \times n real, symmetric positive definite matrix A

    The real eigenvalues are all strictly positive        (B.12)

For any n \times n real, symmetric psd matrix A with rank(A) = r

    There are exactly r strictly positive eigenvalues and n - r zero eigenvalues        (B.13)

Theorem 1 (Spectral Decomposition Theorem)


(i) Any symmetric n \times n matrix A can be written as

    A = \Gamma \Lambda \Gamma^T = \sum_{i=1}^{n} \lambda_i \gamma_i \gamma_i^T        (1.1)

(ii) If A is also psd then, from (B.13), we can order the eigenvalues such that \lambda_1 \geq \lambda_2 \geq
\ldots \geq \lambda_r > 0 and \lambda_{r+1} = \lambda_{r+2} = \ldots = \lambda_n = 0. Then (1.1) becomes

    A = \sum_{i=1}^{r} \lambda_i \gamma_i \gamma_i^T = \Gamma_1 \Lambda_1 \Gamma_1^T        (1.2)

where \Lambda_1 = diag(\lambda_1, \ldots, \lambda_r) and \Gamma_1 = (\gamma_1, \ldots, \gamma_r).


Proof: Follows from (B.11) and (B.13).
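An illustrative numpy sketch of the theorem (not in the notes; the symmetric matrix is random): np.linalg.eigh returns exactly the \Gamma and \Lambda of (B.9)-(B.11), and A is recovered as a sum of rank-one terms as in (1.1).

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))
A = M + M.T                                   # arbitrary real symmetric matrix

lam, Gamma = np.linalg.eigh(A)                # eigenvalues and orthonormal eigenvectors

# (B.9): Gamma is orthogonal
assert np.allclose(Gamma.T @ Gamma, np.eye(4))

# (1.1): A = Gamma Lambda Gamma^T = sum_i lambda_i gamma_i gamma_i^T
assert np.allclose(A, Gamma @ np.diag(lam) @ Gamma.T)
assert np.allclose(A, sum(l * np.outer(g, g) for l, g in zip(lam, Gamma.T)))
```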
Theorem 2
Any symmetric matrix S which is psd can be written as

    S = B^2    where B is symmetric

B is called the symmetric square root of S and is denoted by S^{1/2}. If, in addition, S is of
full rank and is therefore positive definite, then B has an inverse which is denoted by S^{-1/2}
and is the symmetric square root of S^{-1}.
Proof: For S psd we can express S as S = \Gamma_1 \Lambda_1 \Gamma_1^T as in part (ii) of the Spectral
Decomposition Theorem above. Then set

    B = S^{1/2} = \Gamma_1 \Lambda_1^{1/2} \Gamma_1^T

where \Lambda_1^{1/2} denotes the diagonal matrix diag(\lambda_1^{1/2}, \ldots, \lambda_r^{1/2}) and r = rank(S). Clearly B is
of rank r and satisfies S = B^2. If S is positive definite then r = n and all its eigenvalues are
strictly positive by (B.12). We can therefore set

    B^{-1} = S^{-1/2} = \Gamma \Lambda^{-1/2} \Gamma^T

where \Lambda^{-1/2} denotes the diagonal matrix diag(\lambda_1^{-1/2}, \ldots, \lambda_n^{-1/2}).
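A brief numpy sketch of the construction in the proof (not part of the notes; S is an arbitrary positive definite matrix built for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 4))
S = M @ M.T + np.eye(4)                        # arbitrary symmetric positive definite matrix

lam, Gamma = np.linalg.eigh(S)

B = Gamma @ np.diag(lam**0.5) @ Gamma.T        # symmetric square root S^{1/2}
B_inv = Gamma @ np.diag(lam**-0.5) @ Gamma.T   # S^{-1/2}

assert np.allclose(B @ B, S)                            # S = B^2
assert np.allclose(B @ B_inv, np.eye(4))                # B^{-1} really is the inverse of B
assert np.allclose(B_inv @ B_inv, np.linalg.inv(S))     # S^{-1/2} squares to S^{-1}
```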

Theorem 3
For any matrices A (n \times p) and B (p \times n) the non-zero eigenvalues of AB and BA are the
same and have the same multiplicity. If x is a non-trivial eigenvector of AB corresponding
to an eigenvalue \lambda \neq 0 then Bx is a non-trivial eigenvector of BA.
Proof: Essentially because if

    A B \gamma = \lambda \gamma

for some \lambda \neq 0 and non-trivial \gamma, then pre-multiplying by B gives

    B A (B \gamma) = \lambda (B \gamma)

so that \lambda is also an eigenvalue of BA with corresponding eigenvector B\gamma.

We also have

    trace(AB) = trace(BA)        (B.14)
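A numerical illustration of Theorem 3 and (B.14) (not in the notes; A and B are random and the 1e-8 threshold is an arbitrary cut-off for "zero"):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(3, 5))

eig_AB = np.sort_complex(np.linalg.eigvals(A @ B))   # 5 eigenvalues, two of them ~ 0
eig_BA = np.sort_complex(np.linalg.eigvals(B @ A))   # 3 eigenvalues

# the non-zero eigenvalues of AB and BA agree
assert np.allclose(eig_AB[np.abs(eig_AB) > 1e-8], eig_BA[np.abs(eig_BA) > 1e-8])

# (B.14): the traces agree
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```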

    A symmetric matrix A has rank 1 iff A = x x^T for some vector x        (B.15)

and moreover, if A = x x^T, then

    x^T x is the only non-zero eigenvalue of A and x is an (unstandardized) eigenvector        (B.16)

Result (B.16) follows from (B.4) and (B.14) since

    trace(A) = trace(x x^T) = trace(x^T x) = x^T x

and

    A x = (x x^T) x = x (x^T x) = (x^T x) x
Definition: The minor of the element a_{ij} in a p \times p matrix A = (a_{ij}) is the determinant of
the matrix formed by deleting the ith row and the jth column of A.
Definition: The cofactor of a_{ij} is the minor multiplied by (-1)^{i+j} and is written A_{ij}.
Then

    |A| = \sum_{j=1}^{p} a_{ij} A_{ij}    for all i
        = \sum_{i=1}^{p} a_{ij} A_{ij}    for all j        (B.17)

and

    A^{-1} = (a^{ij}) = \frac{1}{|A|} (A_{ij})^T        (B.18)
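The cofactor expansion (B.17) and the adjugate formula (B.18) can be checked numerically; a sketch (not in the notes) with a random non-singular matrix and a hypothetical helper named cofactor:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
A = rng.normal(size=(p, p))                   # arbitrary non-singular matrix

def cofactor(A, i, j):
    """Cofactor A_ij: signed determinant of A with row i and column j removed."""
    minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

C = np.array([[cofactor(A, i, j) for j in range(p)] for i in range(p)])

# (B.17): expansion along any row gives the determinant
assert np.allclose([(A[i] * C[i]).sum() for i in range(p)], np.linalg.det(A))

# (B.18): the inverse is the transposed cofactor matrix divided by the determinant
assert np.allclose(np.linalg.inv(A), C.T / np.linalg.det(A))
```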

C. Multidimensional Random Variables


Let x = (x_1, \ldots, x_p)^T be a random vector.
The joint cumulative distribution function associated with x is the function F defined by

    F(x^0) = P(x_1 \leq x_1^0, \; x_2 \leq x_2^0, \; \ldots, \; x_p \leq x_p^0) = P(x \leq x^0)

A random vector (r.v.) x is absolutely continuous if there exists a joint probability density
function f(x) such that

    F(x) = \int_{-\infty}^{x_1} \ldots \int_{-\infty}^{x_p} f(u) \, du_1 \ldots du_p

Marginal Distributions
Consider the partitioned vector x^T = (x_1^T, x_2^T) where x_1 = (x_1, \ldots, x_k)^T and x_2 = (x_{k+1}, \ldots, x_p)^T.
The function

    P(x_1 \leq x_1^0) = F(x_1^0, \ldots, x_k^0, \infty, \ldots, \infty)

is the marginal cumulative distribution function of x_1.
If x is absolutely continuous

    f_1(x_1) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} f(x_1, x_2) \, dx_{k+1} \ldots dx_p

is the marginal p.d.f. of x_1.

The marginal density of a single element of x is given by

    f_i(x_i) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} f(x) \, dx_1 \ldots dx_{i-1} \, dx_{i+1} \ldots dx_p

From now on we only consider random vectors that are absolutely continuous.
Conditional Distributions
If x is an absolutely continuous r.v. partitioned as before as x^T = (x_1^T, x_2^T), then for any
given value x_1^0 of x_1, the conditional joint density function of x_2 is given by

    f(x_2 | x_1 = x_1^0) = \frac{f(x_1^0, x_2)}{f_1(x_1^0)}

Similarly

    f(x_1 | x_2 = x_2^0) = \frac{f(x_1, x_2^0)}{f_2(x_2^0)}

Independence
When the conditional p.d.f. f(x_2 | x_1 = x_1^0) is the same for all values of x_1^0, then x_1 and x_2
are said to be statistically independent.
x_1 and x_2 statistically independent implies:

    f(x_2 | x_1 = x_1^0) = f_2(x_2)    (the marginal density)

Theorem: x_1 and x_2 are statistically independent if and only if

    f(x) = f_1(x_1) f_2(x_2)

More generally, x_1, \ldots, x_p are statistically independent if and only if

    f(x) = f_1(x_1) f_2(x_2) \ldots f_p(x_p)
Expectation
If g is a real valued function of the random vector x then

    E(g(x)) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} g(x) f(x) \, dx_1 \ldots dx_p

If x^T = (x_1^T, x_2^T) as before, and g is a function of x_1 only, then

    E(g(x_1)) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} g(x_1) f_1(x_1) \, dx_1 \ldots dx_k

The expectation of a matrix-valued or a vector-valued function of x, e.g. G(x) = (g_{ij}(x)), is
defined to be the matrix

    E[G(x)] = (E[g_{ij}(x)])

D. Random Vectors and Linear Transformations


Let x = (x_1, \ldots, x_p)^T be a p \times 1 random vector. Then

    E(x) = (E(x_1), \ldots, E(x_p))^T

    Cov(x, x) = E\{ [x - E(x)][x - E(x)]^T \}        (D.1)
              = \begin{pmatrix}
                  \sigma_{11} & \sigma_{12} & \ldots & \sigma_{1p} \\
                  \sigma_{21} & \sigma_{22} & \ldots & \sigma_{2p} \\
                  \vdots      & \vdots      & \ddots & \vdots      \\
                  \sigma_{p1} & \sigma_{p2} & \ldots & \sigma_{pp}
                \end{pmatrix} = \Sigma        (D.2)

where

    \sigma_{ij} = \sigma_{ji} = Cov(x_i, x_j) = E\{ [x_i - E(x_i)][x_j - E(x_j)] \}

and

    \sigma_{ii} = Cov(x_i, x_i) = E\{ [x_i - E(x_i)]^2 \} = Var(x_i)

We call \Sigma the covariance matrix of the random vector x; it is symmetric and positive
semi-definite (psd).
Suppose now that x is a p \times 1 random vector from a population with mean vector \mu and
covariance matrix \Sigma.
If a and b are p \times 1 vectors of constants then

    E(a^T x) = a^T E(x) = a^T \mu        (D.3)

    Cov(a^T x, a^T x) = Var(a^T x) = a^T \Sigma a        (D.4)

and

    Cov(a^T x, b^T x) = a^T \Sigma b        (D.5)
For example, let x_1 be a random variable with mean \mu_1 and variance \sigma_1^2 and x_2 be a r.v.
with mean \mu_2 and variance \sigma_2^2. Suppose Cov(x_1, x_2) = \sigma_{12}. Then

    E(a_1 x_1 + a_2 x_2) = a_1 \mu_1 + a_2 \mu_2

    Var(a_1 x_1 + a_2 x_2) = a_1^2 \sigma_1^2 + 2 a_1 a_2 \sigma_{12} + a_2^2 \sigma_2^2
                           = (a_1, a_2) \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}

    Cov(a_1 x_1 + a_2 x_2, \; b_1 x_1 + b_2 x_2) = a_1 b_1 \sigma_1^2 + (a_1 b_2 + a_2 b_1) \sigma_{12} + a_2 b_2 \sigma_2^2
                           = (a_1, a_2) \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}

and if x_1 and x_2 are uncorrelated (i.e. \sigma_{12} = 0) then

    Var(a_1 x_1 + a_2 x_2) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2

and

    Cov(a_1 x_1 + a_2 x_2, \; b_1 x_1 + b_2 x_2) = a_1 b_1 \sigma_1^2 + a_2 b_2 \sigma_2^2
More generally, if A and B are matrices of constants of dimensions r \times p and s \times p respectively,
then the means and covariances of the r \times 1 and s \times 1 random vectors

    y = A x    and    z = B x

will be given by the mean vectors

    E(y) = A \mu        (D.6)

    E(z) = B \mu        (D.7)

and the covariance matrices

    Cov(y, y) = A \Sigma A^T        (D.8)

    Cov(z, z) = B \Sigma B^T        (D.9)

    Cov(y, z) = A \Sigma B^T        (D.10)
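A Monte Carlo sketch of (D.6), (D.8) and (D.10) (not part of the notes; \mu, \Sigma, A, B and the seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
p = 3
mu = np.array([1.0, -1.0, 2.0])                         # arbitrary mean vector
L = rng.normal(size=(p, p))
Sigma = L @ L.T + np.eye(p)                             # arbitrary covariance matrix

A = rng.normal(size=(2, p))
B = rng.normal(size=(4, p))

x = rng.multivariate_normal(mu, Sigma, size=200_000)    # simulated observations, one per row
y, z = x @ A.T, x @ B.T                                 # y = Ax and z = Bx for each row

print(np.max(np.abs(y.mean(axis=0) - A @ mu)))                      # (D.6), close to 0
print(np.max(np.abs(np.cov(y.T) - A @ Sigma @ A.T)))                # (D.8), close to 0
print(np.max(np.abs(np.cov(y.T, z.T)[:2, 2:] - A @ Sigma @ B.T)))   # (D.10), close to 0
```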

Similarly, if X is an n \times p data matrix with p \times 1 sample mean vector \bar{x} and sample covariance
matrix S, and if x represents the general row vector of X, then

    sample Var(a^T x) = a^T S a        (D.11)

    sample Cov(a^T x, b^T x) = a^T S b        (D.12)

    sample Cov(A x, A x) = A S A^T        (D.13)

    sample Cov(B x, B x) = B S B^T        (D.14)

    sample Cov(A x, B x) = A S B^T        (D.15)

E. The p-dimensional Multivariate Normal Distribution


Suppose \Sigma is a p \times p positive definite matrix (i.e. \Sigma > 0). A p \times 1 random vector x is said to
have a p-dimensional Multivariate Normal distribution with mean vector \mu and non-singular
covariance matrix \Sigma if its joint probability density function (p.d.f.) is given by

    f(x) = \{ |2\pi\Sigma| \}^{-1/2} \exp\{ -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \}
         = \{ (2\pi)^p |\Sigma| \}^{-1/2} \exp\{ -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \}        (E.1)

(Note the similarity between (E.1) and the form of the univariate Normal distribution given
at the beginning of Section A.) We denote this by

    x \sim MVN(\mu, \Sigma)

It is easily shown that

    (x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi^2_p        (E.2)

Set z = \Sigma^{-1/2}(x - \mu). Then

    z \sim MVN(0, I)

since E(z) = 0, Cov(z, z) = \Sigma^{-1/2} Cov(x, x) \Sigma^{-1/2} = \Sigma^{-1/2} \Sigma \Sigma^{-1/2} = I, and linear
transformations of multivariate Normal random vectors are also distributed according to the
multivariate Normal distribution. It follows that

    (x - \mu)^T \Sigma^{-1} (x - \mu) = z^T z = \sum_{i=1}^{p} z_i^2 \sim \chi^2_p

by definition.
Similarly, if x \sim MVN(\mu, \Sigma) then the random variable x^T \Sigma^{-1} x has a non-central
chi-squared distribution with p degrees of freedom and non-centrality parameter
\lambda = \mu^T \Sigma^{-1} \mu. That is

    y = x^T \Sigma^{-1} x \sim \chi^2(p, \lambda)        (E.3)
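A simulation sketch of (E.2) (not in the notes; dimension, parameters and seed are arbitrary): the quadratic form should pass a Kolmogorov-Smirnov comparison with \chi^2_p.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
p = 3
mu = np.array([0.5, -1.0, 2.0])                     # arbitrary mean vector
L = rng.normal(size=(p, p))
Sigma = L @ L.T + np.eye(p)                         # arbitrary positive definite covariance

x = rng.multivariate_normal(mu, Sigma, size=100_000)
d = x - mu
q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)   # (x - mu)^T Sigma^{-1} (x - mu)

print(stats.kstest(q, stats.chi2(df=p).cdf).pvalue)        # should not be small
```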

F. Matrix Differentiation
Let f(x) be a real-valued function of the p \times 1 vector x and g(X) be a real-valued function
of the n \times p matrix X.
The derivative of f(x) with respect to x is defined to be the p \times 1 vector

    \frac{\partial f}{\partial x} = \begin{pmatrix} \partial f / \partial x_1 \\ \vdots \\ \partial f / \partial x_p \end{pmatrix}        (F.1)

and the derivative of g(X) with respect to X is defined to be the n \times p matrix

    \frac{\partial g}{\partial X} = \begin{pmatrix}
        \partial g / \partial x_{11} & \ldots & \partial g / \partial x_{1p} \\
        \vdots                       & \ddots & \vdots                       \\
        \partial g / \partial x_{n1} & \ldots & \partial g / \partial x_{np}
    \end{pmatrix}        (F.2)

Recall that if x and X are functions of a scalar v then

    \frac{\partial x}{\partial v} = \begin{pmatrix} \partial x_1 / \partial v \\ \vdots \\ \partial x_p / \partial v \end{pmatrix}        (F.3)

and

    \frac{\partial X}{\partial v} = \left( \frac{\partial x_{ij}}{\partial v} \right)        (F.4)

where X = (x_{ij}). If A is symmetric and a is any constant vector then

    \frac{\partial}{\partial x} (x^T a) = \frac{\partial}{\partial x} (a^T x) = a        (F.5)

    \frac{\partial}{\partial x} (x^T x) = 2x        (F.6)

    \frac{\partial}{\partial x} (x^T A x) = 2 A x        (F.7)

    \frac{\partial}{\partial x} (x^T A y) = \frac{\partial}{\partial x} (y^T A^T x) = A y        (F.8)

(For (F.8) we require only that x, y and A conform.)
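A finite-difference sketch of (F.6) and (F.7) (not part of the notes; the matrix, vector and step size are arbitrary, and numerical_grad is a hypothetical helper written for the example):

```python
import numpy as np

rng = np.random.default_rng(10)
p = 4
M = rng.normal(size=(p, p))
A = M + M.T                                  # arbitrary symmetric matrix
x = rng.normal(size=p)

def numerical_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

assert np.allclose(numerical_grad(lambda v: v @ v, x), 2 * x)          # (F.6)
assert np.allclose(numerical_grad(lambda v: v @ A @ v, x), 2 * A @ x)  # (F.7)
```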


We have the following results for general matrices X and Y.

    \frac{\partial}{\partial v} (XY) = \frac{\partial X}{\partial v} Y + X \frac{\partial Y}{\partial v}        (F.9)

    \frac{\partial X}{\partial x_{ij}} = J_{ij}        (F.10a)

where J_{ij} denotes the matrix with one in the (i,j)th place and zeroes elsewhere. Also, if X is
square and of full rank

    \frac{\partial X^{-1}}{\partial v} = -X^{-1} \frac{\partial X}{\partial v} X^{-1}        (F.11)

and

    \frac{\partial X^{-1}}{\partial x_{ij}} = -X^{-1} J_{ij} X^{-1}        (F.12a)

If X is symmetric then

    \frac{\partial X}{\partial x_{ij}} = J_{ij} + J_{ji},    i \neq j        (F.10b)

and if in addition X is of full rank then

    \frac{\partial X^{-1}}{\partial x_{ij}} =
    \begin{cases}
        -X^{-1} J_{ii} X^{-1}              & i = j \\
        -X^{-1} (J_{ij} + J_{ji}) X^{-1}   & i \neq j
    \end{cases}        (F.12b)

Let A and X be any two matrices such that the product AX exists. Then

    \frac{\partial}{\partial X} trace(AX) = A^T        (F.13a)

Now let A and X be square and X be of full rank. Then

    \frac{\partial |X|}{\partial x_{ij}} = X_{ij}        (F.14a)

    \frac{\partial}{\partial X} (\ln|X|) = (X^{-1})^T = (x^{ij})^T        (F.15a)

    \frac{\partial}{\partial X} trace(A X^{-1}) = -(X^{-1} A X^{-1})^T        (F.16a)
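A finite-difference sketch of (F.13a) and (F.15a) (not in the notes; the matrices and step size are arbitrary, and numerical_matrix_grad is a hypothetical helper for the example):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 4
A = rng.normal(size=(n, n))
X = rng.normal(size=(n, n)) + n * np.eye(n)       # arbitrary full-rank matrix

def numerical_matrix_grad(g, X, h=1e-6):
    """Central-difference derivative of a scalar function g with respect to each x_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (g(X + E) - g(X - E)) / (2 * h)
    return G

# (F.13a): d trace(AX)/dX = A^T
assert np.allclose(numerical_matrix_grad(lambda M: np.trace(A @ M), X), A.T)

# (F.15a): d ln|X|/dX = (X^{-1})^T
assert np.allclose(numerical_matrix_grad(lambda M: np.log(abs(np.linalg.det(M))), X),
                   np.linalg.inv(X).T, atol=1e-5)
```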

If A and X are symmetric then

    \frac{\partial}{\partial x_{ij}} trace(AX) = 2 a_{ij},    i \neq j        (F.13b)

    \frac{\partial |X|}{\partial x_{ij}} = 2 X_{ij},    i \neq j        (F.14b)

    \frac{\partial}{\partial x_{ij}} \ln|X| = 2 x^{ij},    i \neq j        (F.15b)

    \frac{\partial}{\partial x_{ij}} trace(A X^{-1}) = -2 (X^{-1} A X^{-1})_{ij},    i \neq j        (F.16b)

Notes: Results (F.5)-(F.9) follow by direct substitution.

To prove (F.11) differentiate the equation X X^{-1} = I using (F.9); then results (F.10a),
(F.10b), (F.12a) and (F.12b) are easy.
To prove (F.13a) show that

    \frac{\partial}{\partial x_{ij}} trace(AX) = a_{ji}

To prove (F.14a) use (B.17). To prove (F.15a) use (F.14a) and (B.18) to show that

    \frac{\partial}{\partial x_{ij}} \ln|X| = x^{ji}

Result (F.16a) is harder to prove. Use (F.12a) and (B.14) to show that

    \frac{\partial}{\partial x_{ij}} trace(A X^{-1}) = -(X^{-1} A X^{-1})_{ji}

The symmetric results (F.13b), (F.14b), (F.15b) and (F.16b) follow immediately.
