
NOTES FOR MSC STUDENTS

A. Univariate Distributions and Results


Normal distribution
If y is a Normally distributed random variable with mean \mu and variance \sigma^2 we write
y \sim N(\mu, \sigma^2). The density of y is given by

    f(y) = \{2\pi\sigma^2\}^{-1/2} \exp\left[ -\tfrac{1}{2} \{(y - \mu)/\sigma\}^2 \right]

The Normal distribution with \mu = 0 and \sigma^2 = 1, i.e. y \sim N(0, 1), is called the standard
Normal distribution.
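As a quick numerical sanity check (not part of the original notes), the density above can be evaluated directly and compared with scipy.stats.norm; the values of \mu and \sigma below are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.5, 2.0                      # arbitrary illustrative parameters
y = np.linspace(-5.0, 8.0, 7)

# the density written exactly as in the formula above
f = (2 * np.pi * sigma**2) ** (-0.5) * np.exp(-0.5 * ((y - mu) / sigma) ** 2)

# agrees with the library implementation to rounding error
assert np.allclose(f, stats.norm.pdf(y, loc=mu, scale=sigma))
```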
The central Chi-squared distribution
The central Chi-squared distribution with n degrees of freedom is defined as the sum of
squares of n independent random variables z_1, z_2, \ldots, z_n, each with the standard Normal
distribution. It is denoted by

    y = \sum_{i=1}^{n} z_i^2 \sim \chi^2_n        (A.1)

In vector notation y = z^T z \sim \chi^2_n where z = (z_1, z_2, \ldots, z_n)^T. If x_1, x_2, \ldots, x_n are independent
Normally distributed random variables with x_i \sim N(\mu_i, \sigma_i^2), then

    y = \sum_{i=1}^{n} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \sim \chi^2_n        (A.2)

since the variables z_i = (x_i - \mu_i)/\sigma_i are N(0, 1).
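A minimal simulation sketch of (A.1)-(A.2), assuming numpy and scipy are available; the means, standard deviations and seed below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000
mu = np.array([1.0, -2.0, 0.5, 3.0, 0.0])     # arbitrary means
sigma = np.array([1.0, 2.0, 0.5, 1.5, 3.0])   # arbitrary standard deviations

x = rng.normal(mu, sigma, size=(reps, n))
y = (((x - mu) / sigma) ** 2).sum(axis=1)     # the statistic in (A.2)

print(y.mean(), y.var())                      # roughly n = 5 and 2n = 10, see (A.4)
print(stats.kstest(y, stats.chi2(df=n).cdf).pvalue)   # should not be small
```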


If y \sim \chi^2_n then the density of y is given by

    f(y) = \{2^{n/2} \Gamma(n/2)\}^{-1} y^{(n/2 - 1)} e^{-y/2},    0 < y < \infty        (A.3)

and we have

    E(y) = n,    Var(y) = 2n        (A.4)

and, for n > 2,

    E(1/y) = 1/(n - 2)        (A.5)
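The moments (A.4) and (A.5) can be confirmed with scipy.stats.chi2; this is only an illustrative sketch, and n = 8 is an arbitrary choice.

```python
import numpy as np
from scipy import stats

n = 8                                        # any degrees of freedom greater than 2
chi2 = stats.chi2(df=n)

# (A.4): mean n and variance 2n
assert np.isclose(chi2.mean(), n) and np.isclose(chi2.var(), 2 * n)

# (A.5): E(1/y) = 1/(n - 2), here evaluated by numerical integration
print(chi2.expect(lambda y: 1 / y), 1 / (n - 2))    # both approximately 0.1667
```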

If y_1, \ldots, y_m are independent random variables with the chi-squared distributions y_i \sim \chi^2_{n_i},
then

    y = \sum_{i=1}^{m} y_i \sim \chi^2_k        (A.6)

where k = \sum_i n_i.

If y_1 \sim \chi^2_{n_1} and y_2 \sim \chi^2_{n_2}, where n_1 > n_2, and the difference y = y_1 - y_2 is itself
independent of y_2, then the difference also has a chi-squared distribution

    y = y_1 - y_2 \sim \chi^2_{n_1 - n_2}        (A.7)

Recall that for a random sample of observations y_1, \ldots, y_n from N(\mu, \sigma^2), and defining

    \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i    and    s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2

we have

    \bar{y} \sim N(\mu, \sigma^2/n),    (n-1)s^2/\sigma^2 \sim \chi^2_{n-1},

and \bar{y} and s^2 are independent.
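A simulation sketch (not in the notes; sample size, parameters and seed are arbitrary) of the two facts just stated: (n-1)s^2/\sigma^2 \sim \chi^2_{n-1}, and \bar{y} and s^2 are independent.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu, sigma, reps = 10, 2.0, 3.0, 50_000    # arbitrary illustrative values

samples = rng.normal(mu, sigma, size=(reps, n))
ybar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

# (n-1)s^2/sigma^2 should follow a chi-squared distribution with n-1 d.f.
print(stats.kstest((n - 1) * s2 / sigma**2, stats.chi2(df=n - 1).cdf).pvalue)

# the sample correlation of ybar and s^2 should be near zero (they are independent)
print(np.corrcoef(ybar, s2)[0, 1])
```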
The central F distribution
If y_1 \sim \chi^2_{n_1}, y_2 \sim \chi^2_{n_2}, and y_1 and y_2 are distributed independently, then the ratio

    F = \frac{y_1/n_1}{y_2/n_2}

is defined to have a central F distribution with n_1 and n_2 degrees of freedom. This is denoted
by

    F = \frac{y_1/n_1}{y_2/n_2} \sim F_{n_1, n_2}        (A.8)

From (A.4) and (A.5), and the independence of y_1 and y_2, we have

    E(F) = n_2/(n_2 - 2)    for n_2 > 2        (A.9)

The t distribution
If z \sim N(0, 1) and y \sim \chi^2_n independently of each other then

    t = \frac{z}{(y/n)^{1/2}}

is defined to have a t distribution with n degrees of freedom. This is denoted by

    t = \frac{z}{(y/n)^{1/2}} \sim t_n        (A.10)

Note that squaring the expression for t and using the definition of the F distribution we
have

    t^2 = \frac{z^2/1}{y/n} \sim F_{1,n}        (A.11)
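A small scipy sketch (not part of the notes; n = 7 is arbitrary) illustrating (A.9) and the relation t^2 \sim F_{1,n} of (A.11) through critical values.

```python
import numpy as np
from scipy import stats

n = 7                                        # arbitrary degrees of freedom

# (A.9): the mean of F depends only on the denominator degrees of freedom
print(stats.f(dfn=3, dfd=n).mean(), n / (n - 2))

# (A.11): if t ~ t_n then t^2 ~ F_{1,n}; compare critical values
p = 0.95
t_crit = stats.t(df=n).ppf(0.5 + p / 2)      # two-sided t critical value
f_crit = stats.f(dfn=1, dfd=n).ppf(p)        # one-sided F critical value
assert np.isclose(t_crit**2, f_crit)
```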

The non-central Chi-squared distribution


The sum of squares of n independent random variables z_1, z_2, \ldots, z_n, where z_i \sim N(\mu_i, 1),
i = 1, \ldots, n, is said to have a non-central Chi-squared distribution with n degrees of freedom
and non-centrality parameter \lambda = \sum_{i=1}^{n} \mu_i^2. It is denoted by

    y = \sum_{i=1}^{n} z_i^2 \sim \chi^2(n, \lambda)        (A.12)

In vector notation y = z^T z \sim \chi^2(n, \lambda) where z = (z_1, z_2, \ldots, z_n)^T.

If y \sim \chi^2(n, \lambda) then E(y) = n + \lambda and Var(y) = 2n + 4\lambda.
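These two moments match scipy's non-central chi-squared implementation; a sketch (not in the notes) with arbitrary n and \lambda:

```python
import numpy as np
from scipy import stats

n, lam = 4, 2.5                              # arbitrary d.f. and non-centrality parameter
y = stats.ncx2(df=n, nc=lam)

assert np.isclose(y.mean(), n + lam)         # E(y) = n + lambda
assert np.isclose(y.var(), 2 * n + 4 * lam)  # Var(y) = 2n + 4*lambda
```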
If y_1, \ldots, y_m are independent random variables with chi-squared distributions y_i \sim \chi^2(n_i, \lambda_i)
then

    y = \sum_{i=1}^{m} y_i \sim \chi^2\left( \sum_{i=1}^{m} n_i, \; \sum_{i=1}^{m} \lambda_i \right)        (A.13)

The non-central F distribution


If y_1 and y_2 are independent random variables such that y_1 \sim \chi^2(n, \lambda) and y_2 \sim \chi^2_m then the
ratio

    F = \frac{y_1/n}{y_2/m}

is defined to have a non-central F distribution.

B. Matrices and Eigenvectors


Definition: A symmetric n \times n matrix A is positive semi-definite (psd) iff

    x^T A x \geq 0    for all n \times 1 vectors x \neq 0.

If A is positive semi-definite we write A \geq 0.
Definition: A symmetric n \times n matrix A is positive definite (pd) iff

    x^T A x > 0    for all n \times 1 vectors x \neq 0.

If A is positive definite we write A > 0. Note that if A is positive definite it is of full rank
n.
For any matrix A

    A^T A is symmetric and psd        (B.1)

    A A^T is symmetric and psd        (B.2)

    rank(A^T A) = rank(A A^T) = rank(A)        (B.3)
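A short numpy sketch (not in the notes; the matrix is random) illustrating (B.1)-(B.3): both products are symmetric with non-negative eigenvalues, and the three ranks coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))                  # arbitrary rectangular matrix

AtA, AAt = A.T @ A, A @ A.T

# (B.1)/(B.2): symmetry and non-negative eigenvalues
assert np.allclose(AtA, AtA.T) and np.allclose(AAt, AAt.T)
assert np.linalg.eigvalsh(AtA).min() >= -1e-10
assert np.linalg.eigvalsh(AAt).min() >= -1e-10

# (B.3): the three ranks coincide
print(np.linalg.matrix_rank(AtA), np.linalg.matrix_rank(AAt), np.linalg.matrix_rank(A))
```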

For any n \times n matrix A

Definition: The eigenvalues of A are the solutions of

    |A - \lambda I| = 0

where |.| denotes the operation of taking the determinant. This is a polynomial equation of
degree n. If we denote the n eigenvalues of A by \lambda_1, \ldots, \lambda_n then for each \lambda_i there exists a
corresponding eigenvector \gamma_i such that

    A \gamma_i = \lambda_i \gamma_i

By convention eigenvectors are normalized so that

    \gamma_i^T \gamma_i = 1

We have

    \sum_{i=1}^{n} \lambda_i = trace(A)        (B.4)

and

    \prod_{i=1}^{n} \lambda_i = |A|        (B.5)
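As a numerical illustration (not part of the notes), (B.4) and (B.5) can be checked for a random square matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))                  # arbitrary square matrix

lam = np.linalg.eigvals(A)                   # eigenvalues, possibly complex

# (B.4): the eigenvalues sum to the trace; (B.5): their product is the determinant
assert np.isclose(lam.sum().real, np.trace(A))
assert np.isclose(lam.prod().real, np.linalg.det(A))
```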

For any n \times n real, symmetric matrix A

    The eigenvalues and eigenvectors of A are all real        (B.6)

    \gamma_i^T \gamma_j = 0    if \lambda_i \neq \lambda_j        (B.7)

so that distinct eigenvalues have mutually orthogonal eigenvectors.

If \lambda_i is an eigenvalue of multiplicity p then

    p corresponding mutually orthogonal eigenvectors can always be chosen        (B.8)

Results (B.7) and (B.8) tell us that for any real, symmetric matrix A we can find n real,
mutually orthogonal eigenvectors \gamma_1, \ldots, \gamma_n that correspond to real eigenvalues \lambda_1, \ldots, \lambda_n.
If we define \Gamma to be the matrix (\gamma_1, \ldots, \gamma_n), then

    \Gamma^T \Gamma = \Gamma \Gamma^T = I        (B.9)

If we define \Lambda to be the diagonal matrix diag(\lambda_1, \ldots, \lambda_n) then

    A \Gamma = \Gamma \Lambda    (i.e. A \gamma_i = \lambda_i \gamma_i for all i)        (B.10)

which implies

    A = \Gamma \Lambda \Gamma^T        (B.11)

For any n \times n real, symmetric positive definite matrix A

    The real eigenvalues are all strictly positive        (B.12)

For any n \times n real, symmetric psd matrix A with rank(A) = r

    There are exactly r strictly positive eigenvalues and n - r zero eigenvalues        (B.13)

Theorem 1 (Spectral Decomposition Theorem)


(i) Any symmetric n \times n matrix A can be written as

    A = \Gamma \Lambda \Gamma^T = \sum_{i=1}^{n} \lambda_i \gamma_i \gamma_i^T        (1.1)

(ii) If A is also psd then, from (B.13), we can order the eigenvalues such that \lambda_1 \geq \lambda_2 \geq
\ldots \geq \lambda_r > 0 and \lambda_{r+1} = \lambda_{r+2} = \ldots = \lambda_n = 0. Then (1.1) becomes

    A = \sum_{i=1}^{r} \lambda_i \gamma_i \gamma_i^T = \Gamma_1 \Lambda_1 \Gamma_1^T        (1.2)

where \Lambda_1 = diag(\lambda_1, \ldots, \lambda_r) and \Gamma_1 = (\gamma_1, \ldots, \gamma_r).


Proof: Follows from (B.11) and (B.13).
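An illustrative numpy sketch of the theorem (not in the notes; the symmetric matrix is random): np.linalg.eigh returns exactly the \Gamma and \Lambda of (B.9)-(B.11), and A is recovered as a sum of rank-one terms as in (1.1).

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))
A = M + M.T                                   # arbitrary real symmetric matrix

lam, Gamma = np.linalg.eigh(A)                # eigenvalues and orthonormal eigenvectors

# (B.9): Gamma is orthogonal
assert np.allclose(Gamma.T @ Gamma, np.eye(4))

# (1.1): A = Gamma Lambda Gamma^T = sum_i lambda_i gamma_i gamma_i^T
assert np.allclose(A, Gamma @ np.diag(lam) @ Gamma.T)
assert np.allclose(A, sum(l * np.outer(g, g) for l, g in zip(lam, Gamma.T)))
```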
Theorem 2
Any symmetric matrix S which is psd can be written as

    S = B^2    where B is symmetric

B is called the symmetric square root of S and is denoted by S^{1/2}. If, in addition, S is of
full rank and is therefore positive definite, then B has an inverse which is denoted by S^{-1/2}
and is the symmetric square root of S^{-1}.
Proof: For S psd we can express S as S = \Gamma_1 \Lambda_1 \Gamma_1^T as in part (ii) of the Spectral
Decomposition Theorem above. Then set

    B = S^{1/2} = \Gamma_1 \Lambda_1^{1/2} \Gamma_1^T

where \Lambda_1^{1/2} denotes the diagonal matrix diag(\lambda_1^{1/2}, \ldots, \lambda_r^{1/2}) and r = rank(S). Clearly B is
of rank r and satisfies S = B^2. If S is positive definite then r = n and all its eigenvalues are
strictly positive by (B.12). We can therefore set

    B^{-1} = S^{-1/2} = \Gamma \Lambda^{-1/2} \Gamma^T

where \Lambda^{-1/2} denotes the diagonal matrix diag(\lambda_1^{-1/2}, \ldots, \lambda_n^{-1/2}).
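A brief numpy sketch of the construction in the proof (not part of the notes; S is an arbitrary positive definite matrix built for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 4))
S = M @ M.T + np.eye(4)                        # arbitrary symmetric positive definite matrix

lam, Gamma = np.linalg.eigh(S)

B = Gamma @ np.diag(lam**0.5) @ Gamma.T        # symmetric square root S^{1/2}
B_inv = Gamma @ np.diag(lam**-0.5) @ Gamma.T   # S^{-1/2}

assert np.allclose(B @ B, S)                            # S = B^2
assert np.allclose(B @ B_inv, np.eye(4))                # B^{-1} really is the inverse of B
assert np.allclose(B_inv @ B_inv, np.linalg.inv(S))     # S^{-1/2} squares to S^{-1}
```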

Theorem 3
For any matrices A (n \times p) and B (p \times n) the non-zero eigenvalues of AB and BA are the
same and have the same multiplicity. If x is a non-trivial eigenvector of AB corresponding
to an eigenvalue \lambda \neq 0 then Bx is a non-trivial eigenvector of BA.
Proof: Essentially because if

    A B \gamma = \lambda \gamma

for some \lambda \neq 0 and non-trivial \gamma, then pre-multiplying by B gives

    B A (B \gamma) = \lambda (B \gamma)

so that \lambda is also an eigenvalue of BA with corresponding eigenvector B\gamma.

We also have

    trace(AB) = trace(BA)        (B.14)
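A numerical illustration of Theorem 3 and (B.14) (not in the notes; A and B are random and the 1e-8 threshold is an arbitrary cut-off for "zero"):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(3, 5))

eig_AB = np.sort_complex(np.linalg.eigvals(A @ B))   # 5 eigenvalues, two of them ~ 0
eig_BA = np.sort_complex(np.linalg.eigvals(B @ A))   # 3 eigenvalues

# the non-zero eigenvalues of AB and BA agree
assert np.allclose(eig_AB[np.abs(eig_AB) > 1e-8], eig_BA[np.abs(eig_BA) > 1e-8])

# (B.14): the traces agree
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```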

    A symmetric matrix A has rank 1 iff A = x x^T for some vector x        (B.15)

and moreover, if A = x x^T, then

    x^T x is the only non-zero eigenvalue of A and x is an (unstandardized) eigenvector        (B.16)

Result (B.16) follows from (B.4) and (B.14) since

    trace(A) = trace(x x^T) = trace(x^T x) = x^T x

and

    A x = (x x^T) x = x (x^T x) = (x^T x) x
Definition: The minor of the element a_{ij} in a p \times p matrix A = (a_{ij}) is the determinant of
the matrix formed by deleting the ith row and the jth column of A.
Definition: The cofactor of a_{ij} is the minor multiplied by (-1)^{i+j} and is written A_{ij}.
Then

    |A| = \sum_{j=1}^{p} a_{ij} A_{ij}    for all i
        = \sum_{i=1}^{p} a_{ij} A_{ij}    for all j        (B.17)

and

    A^{-1} = (a^{ij}) = \frac{1}{|A|} (A_{ij})^T        (B.18)
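The cofactor expansion (B.17) and the adjugate formula (B.18) can be checked numerically; a sketch (not in the notes) with a random non-singular matrix and a hypothetical helper named cofactor:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 4
A = rng.normal(size=(p, p))                   # arbitrary non-singular matrix

def cofactor(A, i, j):
    """Cofactor A_ij: signed determinant of A with row i and column j removed."""
    minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

C = np.array([[cofactor(A, i, j) for j in range(p)] for i in range(p)])

# (B.17): expansion along any row gives the determinant
assert np.allclose([(A[i] * C[i]).sum() for i in range(p)], np.linalg.det(A))

# (B.18): the inverse is the transposed cofactor matrix divided by the determinant
assert np.allclose(np.linalg.inv(A), C.T / np.linalg.det(A))
```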

C. Multidimensional Random Variables


Let x = (x_1, \ldots, x_p)^T be a random vector.
The joint cumulative distribution function associated with x is the function F defined by

    F(x^0) = P(x_1 \leq x_1^0, \; x_2 \leq x_2^0, \; \ldots, \; x_p \leq x_p^0) = P(x \leq x^0)

A random vector (r.v.) x is absolutely continuous if there exists a joint probability density
function f(x) such that

    F(x) = \int_{-\infty}^{x_1} \ldots \int_{-\infty}^{x_p} f(u) \, du_1 \ldots du_p

Marginal Distributions
Consider the partitioned vector x^T = (x_1^T, x_2^T) where x_1 = (x_1, \ldots, x_k)^T and x_2 = (x_{k+1}, \ldots, x_p)^T.
The function

    P(x_1 \leq x_1^0) = F(x_1^0, \ldots, x_k^0, \infty, \ldots, \infty)

is the marginal cumulative distribution function of x_1.
If x is absolutely continuous

    f_1(x_1) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} f(x_1, x_2) \, dx_{k+1} \ldots dx_p

is the marginal p.d.f. of x_1.

The marginal density of a single element of x is given by

    f_i(x_i) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} f(x) \, dx_1 \ldots dx_{i-1} \, dx_{i+1} \ldots dx_p

From now on we only consider random vectors that are absolutely continuous.
Conditional Distributions
If x is an absolutely continuous r.v. partitioned as before as x^T = (x_1^T, x_2^T), then for any
given value x_1^0 of x_1, the conditional joint density function of x_2 is given by

    f(x_2 | x_1 = x_1^0) = \frac{f(x_1^0, x_2)}{f_1(x_1^0)}

Similarly

    f(x_1 | x_2 = x_2^0) = \frac{f(x_1, x_2^0)}{f_2(x_2^0)}

Independence
When the conditional p.d.f. f(x_2 | x_1 = x_1^0) is the same for all values of x_1^0, then x_1 and x_2
are said to be statistically independent.
x_1 and x_2 statistically independent implies:

    f(x_2 | x_1 = x_1^0) = f_2(x_2)    (the marginal density)

Theorem: x_1 and x_2 are statistically independent if and only if

    f(x) = f_1(x_1) f_2(x_2)

More generally, x_1, \ldots, x_p are statistically independent if and only if

    f(x) = f_1(x_1) f_2(x_2) \ldots f_p(x_p)
Expectation
If g is a real valued function of the random vector x then

    E(g(x)) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} g(x) f(x) \, dx_1 \ldots dx_p

If x^T = (x_1^T, x_2^T) as before, and g is a function of x_1 only, then

    E(g(x_1)) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} g(x_1) f_1(x_1) \, dx_1 \ldots dx_k

The expectation of a matrix-valued or a vector-valued function of x, e.g. G(x) = (g_{ij}(x)), is
defined to be the matrix

    E[G(x)] = (E[g_{ij}(x)])

D. Random Vectors and Linear Transformations


Let x = (x_1, \ldots, x_p)^T be a p \times 1 random vector. Then

    E(x) = (E(x_1), \ldots, E(x_p))^T

    Cov(x, x) = E\{ [x - E(x)][x - E(x)]^T \}        (D.1)
              = \begin{pmatrix}
                  \sigma_{11} & \sigma_{12} & \ldots & \sigma_{1p} \\
                  \sigma_{21} & \sigma_{22} & \ldots & \sigma_{2p} \\
                  \vdots      & \vdots      & \ddots & \vdots      \\
                  \sigma_{p1} & \sigma_{p2} & \ldots & \sigma_{pp}
                \end{pmatrix} = \Sigma        (D.2)

where

    \sigma_{ij} = \sigma_{ji} = Cov(x_i, x_j) = E\{ [x_i - E(x_i)][x_j - E(x_j)] \}

and

    \sigma_{ii} = Cov(x_i, x_i) = E\{ [x_i - E(x_i)]^2 \} = Var(x_i)

We call \Sigma the covariance matrix of the random vector x; it is symmetric and positive
semi-definite (psd).
Suppose now that x is a p \times 1 random vector from a population with mean vector \mu and
covariance matrix \Sigma.
If a and b are p \times 1 vectors of constants then

    E(a^T x) = a^T E(x) = a^T \mu        (D.3)

    Cov(a^T x, a^T x) = Var(a^T x) = a^T \Sigma a        (D.4)

and

    Cov(a^T x, b^T x) = a^T \Sigma b        (D.5)
For example, let x_1 be a random variable with mean \mu_1 and variance \sigma_1^2 and x_2 be a r.v.
with mean \mu_2 and variance \sigma_2^2. Suppose Cov(x_1, x_2) = \sigma_{12}. Then

    E(a_1 x_1 + a_2 x_2) = a_1 \mu_1 + a_2 \mu_2

    Var(a_1 x_1 + a_2 x_2) = a_1^2 \sigma_1^2 + 2 a_1 a_2 \sigma_{12} + a_2^2 \sigma_2^2
                           = (a_1, a_2) \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}

    Cov(a_1 x_1 + a_2 x_2, \; b_1 x_1 + b_2 x_2) = a_1 b_1 \sigma_1^2 + (a_1 b_2 + a_2 b_1) \sigma_{12} + a_2 b_2 \sigma_2^2
                           = (a_1, a_2) \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}

and if x_1 and x_2 are uncorrelated (i.e. \sigma_{12} = 0) then

    Var(a_1 x_1 + a_2 x_2) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2

and

    Cov(a_1 x_1 + a_2 x_2, \; b_1 x_1 + b_2 x_2) = a_1 b_1 \sigma_1^2 + a_2 b_2 \sigma_2^2
More generally, if A and B are matrices of constants of dimensions r \times p and s \times p respectively,
then the means and covariances of the r \times 1 and s \times 1 random vectors

    y = A x    and    z = B x

will be given by the mean vectors

    E(y) = A \mu        (D.6)

    E(z) = B \mu        (D.7)

and the covariance matrices

    Cov(y, y) = A \Sigma A^T        (D.8)

    Cov(z, z) = B \Sigma B^T        (D.9)

    Cov(y, z) = A \Sigma B^T        (D.10)
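A Monte Carlo sketch of (D.6), (D.8) and (D.10) (not part of the notes; \mu, \Sigma, A, B and the seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
p = 3
mu = np.array([1.0, -1.0, 2.0])                         # arbitrary mean vector
L = rng.normal(size=(p, p))
Sigma = L @ L.T + np.eye(p)                             # arbitrary covariance matrix

A = rng.normal(size=(2, p))
B = rng.normal(size=(4, p))

x = rng.multivariate_normal(mu, Sigma, size=200_000)    # simulated observations, one per row
y, z = x @ A.T, x @ B.T                                 # y = Ax and z = Bx for each row

print(np.max(np.abs(y.mean(axis=0) - A @ mu)))                      # (D.6), close to 0
print(np.max(np.abs(np.cov(y.T) - A @ Sigma @ A.T)))                # (D.8), close to 0
print(np.max(np.abs(np.cov(y.T, z.T)[:2, 2:] - A @ Sigma @ B.T)))   # (D.10), close to 0
```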

Similarly, if X is an n \times p data matrix with p \times 1 sample mean vector \bar{x} and sample covariance
matrix S, and if x represents the general row vector of X, then

    sample Var(a^T x) = a^T S a        (D.11)

    sample Cov(a^T x, b^T x) = a^T S b        (D.12)

    sample Cov(A x, A x) = A S A^T        (D.13)

    sample Cov(B x, B x) = B S B^T        (D.14)

    sample Cov(A x, B x) = A S B^T        (D.15)

E. The p-dimensional Multivariate Normal Distribution


Suppose \Sigma is a p \times p positive definite matrix (i.e. \Sigma > 0). A p \times 1 random vector x is said to
have a p-dimensional Multivariate Normal distribution with mean vector \mu and non-singular
covariance matrix \Sigma if its joint probability density function (p.d.f.) is given by

    f(x) = \{ |2\pi\Sigma| \}^{-1/2} \exp\{ -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \}
         = \{ (2\pi)^p |\Sigma| \}^{-1/2} \exp\{ -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \}        (E.1)

(Note the similarity between (E.1) and the form of the univariate Normal distribution given
at the beginning of Section A.) We denote this by

    x \sim MVN(\mu, \Sigma)

It is easily shown that

    (x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi^2_p        (E.2)

Set z = \Sigma^{-1/2}(x - \mu). Then

    z \sim MVN(0, I)

since E(z) = 0, Cov(z, z) = \Sigma^{-1/2} Cov(x, x) \Sigma^{-1/2} = \Sigma^{-1/2} \Sigma \Sigma^{-1/2} = I, and linear
transformations of multivariate Normal random vectors are also distributed according to the
multivariate Normal distribution. It follows that

    (x - \mu)^T \Sigma^{-1} (x - \mu) = z^T z = \sum_{i=1}^{p} z_i^2 \sim \chi^2_p

by definition.
Similarly, if x \sim MVN(\mu, \Sigma) then the random variable x^T \Sigma^{-1} x has a non-central
chi-squared distribution with p degrees of freedom and non-centrality parameter
\lambda = \mu^T \Sigma^{-1} \mu. That is

    y = x^T \Sigma^{-1} x \sim \chi^2(p, \lambda)        (E.3)
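A simulation sketch of (E.2) (not in the notes; dimension, parameters and seed are arbitrary): the quadratic form should pass a Kolmogorov-Smirnov comparison with \chi^2_p.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
p = 3
mu = np.array([0.5, -1.0, 2.0])                     # arbitrary mean vector
L = rng.normal(size=(p, p))
Sigma = L @ L.T + np.eye(p)                         # arbitrary positive definite covariance

x = rng.multivariate_normal(mu, Sigma, size=100_000)
d = x - mu
q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)   # (x - mu)^T Sigma^{-1} (x - mu)

print(stats.kstest(q, stats.chi2(df=p).cdf).pvalue)        # should not be small
```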

F. Matrix Differentiation
Let f(x) be a real-valued function of the p \times 1 vector x and g(X) be a real-valued function
of the n \times p matrix X.
The derivative of f(x) with respect to x is defined to be the p \times 1 vector

    \frac{\partial f}{\partial x} = \begin{pmatrix} \partial f / \partial x_1 \\ \vdots \\ \partial f / \partial x_p \end{pmatrix}        (F.1)

and the derivative of g(X) with respect to X is defined to be the n \times p matrix

    \frac{\partial g}{\partial X} = \begin{pmatrix}
        \partial g / \partial x_{11} & \ldots & \partial g / \partial x_{1p} \\
        \vdots                       & \ddots & \vdots                       \\
        \partial g / \partial x_{n1} & \ldots & \partial g / \partial x_{np}
    \end{pmatrix}        (F.2)

Recall that if x and X are functions of a scalar v then

    \frac{\partial x}{\partial v} = \begin{pmatrix} \partial x_1 / \partial v \\ \vdots \\ \partial x_p / \partial v \end{pmatrix}        (F.3)

and

    \frac{\partial X}{\partial v} = \left( \frac{\partial x_{ij}}{\partial v} \right)        (F.4)

where X = (x_{ij}). If A is symmetric and a is any constant vector then

    \frac{\partial}{\partial x} (x^T a) = \frac{\partial}{\partial x} (a^T x) = a        (F.5)

    \frac{\partial}{\partial x} (x^T x) = 2x        (F.6)

    \frac{\partial}{\partial x} (x^T A x) = 2 A x        (F.7)

    \frac{\partial}{\partial x} (x^T A y) = \frac{\partial}{\partial x} (y^T A^T x) = A y        (F.8)

(For (F.8) we require only that x, y and A conform.)
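A finite-difference sketch of (F.6) and (F.7) (not part of the notes; the matrix, vector and step size are arbitrary, and numerical_grad is a hypothetical helper written for the example):

```python
import numpy as np

rng = np.random.default_rng(10)
p = 4
M = rng.normal(size=(p, p))
A = M + M.T                                  # arbitrary symmetric matrix
x = rng.normal(size=p)

def numerical_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

assert np.allclose(numerical_grad(lambda v: v @ v, x), 2 * x)          # (F.6)
assert np.allclose(numerical_grad(lambda v: v @ A @ v, x), 2 * A @ x)  # (F.7)
```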


We have the following results for general matrices X and Y.

    \frac{\partial}{\partial v} (XY) = \frac{\partial X}{\partial v} Y + X \frac{\partial Y}{\partial v}        (F.9)

    \frac{\partial X}{\partial x_{ij}} = J_{ij}        (F.10a)

where J_{ij} denotes the matrix with one in the (i,j)th place and zeroes elsewhere. Also, if X is
square and of full rank

    \frac{\partial X^{-1}}{\partial v} = -X^{-1} \frac{\partial X}{\partial v} X^{-1}        (F.11)

and

    \frac{\partial X^{-1}}{\partial x_{ij}} = -X^{-1} J_{ij} X^{-1}        (F.12a)

If X is symmetric then

    \frac{\partial X}{\partial x_{ij}} = J_{ij} + J_{ji},    i \neq j        (F.10b)

and if in addition X is of full rank then

    \frac{\partial X^{-1}}{\partial x_{ij}} =
    \begin{cases}
        -X^{-1} J_{ii} X^{-1}              & i = j \\
        -X^{-1} (J_{ij} + J_{ji}) X^{-1}   & i \neq j
    \end{cases}        (F.12b)

Let A and X be any two matrices such that the product AX exists. Then

    \frac{\partial}{\partial X} trace(AX) = A^T        (F.13a)

Now let A and X be square and X be of full rank. Then

    \frac{\partial |X|}{\partial x_{ij}} = X_{ij}        (F.14a)

    \frac{\partial}{\partial X} (\ln|X|) = (X^{-1})^T = (x^{ij})^T        (F.15a)

    \frac{\partial}{\partial X} trace(A X^{-1}) = -(X^{-1} A X^{-1})^T        (F.16a)
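A finite-difference sketch of (F.13a) and (F.15a) (not in the notes; the matrices and step size are arbitrary, and numerical_matrix_grad is a hypothetical helper for the example):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 4
A = rng.normal(size=(n, n))
X = rng.normal(size=(n, n)) + n * np.eye(n)       # arbitrary full-rank matrix

def numerical_matrix_grad(g, X, h=1e-6):
    """Central-difference derivative of a scalar function g with respect to each x_ij."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (g(X + E) - g(X - E)) / (2 * h)
    return G

# (F.13a): d trace(AX)/dX = A^T
assert np.allclose(numerical_matrix_grad(lambda M: np.trace(A @ M), X), A.T)

# (F.15a): d ln|X|/dX = (X^{-1})^T
assert np.allclose(numerical_matrix_grad(lambda M: np.log(abs(np.linalg.det(M))), X),
                   np.linalg.inv(X).T, atol=1e-5)
```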

If A and X are symmetric then

    \frac{\partial}{\partial x_{ij}} trace(AX) = 2 a_{ij},    i \neq j        (F.13b)

    \frac{\partial |X|}{\partial x_{ij}} = 2 X_{ij},    i \neq j        (F.14b)

    \frac{\partial}{\partial x_{ij}} \ln|X| = 2 x^{ij},    i \neq j        (F.15b)

    \frac{\partial}{\partial x_{ij}} trace(A X^{-1}) = -2 (X^{-1} A X^{-1})_{ij},    i \neq j        (F.16b)

Notes: Results (F.5)-(F.9) follow by direct substitution.

To prove (F.11) differentiate the equation X X^{-1} = I using (F.9); then results (F.10a),
(F.10b), (F.12a) and (F.12b) are easy.
To prove (F.13a) show that

    \frac{\partial}{\partial x_{ij}} trace(AX) = a_{ji}

To prove (F.14a) use (B.17). To prove (F.15a) use (F.14a) and (B.18) to show that

    \frac{\partial}{\partial x_{ij}} \ln|X| = x^{ji}

Result (F.16a) is harder to prove. Use (F.12a) and (B.14) to show that

    \frac{\partial}{\partial x_{ij}} trace(A X^{-1}) = -(X^{-1} A X^{-1})_{ji}

The symmetric results (F.13b), (F.14b), (F.15b) and (F.16b) follow immediately.
