A. Univariate Distributions
The Normal distribution
If $y$ has a Normal distribution with mean $\mu$ and variance $\sigma^2$, written $y \sim N(\mu, \sigma^2)$, its density function is
\[ f(y) = \{2\pi\sigma^2\}^{-1/2} \exp\left[-\tfrac{1}{2}\{(y - \mu)/\sigma\}^2\right] \]
The Normal distribution with $\mu = 0$ and $\sigma^2 = 1$, i.e. $y \sim N(0, 1)$, is called the standard Normal distribution.
The central Chi-squared distribution
The central Chi-squared distribution with n degrees of freedom is defined as the sum of
squares of n independent random variables z1 , z2 , . . . , zn each with the standard Normal
distribution. It is denoted by
\[ y = \sum_{i=1}^{n} z_i^2 \sim \chi^2_n \tag{A.1} \]
More generally, if $x_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, n$, independently, then
\[ \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2 \sim \chi^2_n \tag{A.2} \]
The density function of $y \sim \chi^2_n$ is
\[ f(y) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, y^{n/2 - 1} e^{-y/2}, \qquad 0 < y < \infty \tag{A.3} \]
and we have
\[ E(y) = n \tag{A.4} \]
\[ \mathrm{Var}(y) = 2n \tag{A.5} \]
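As a numerical sanity check of (A.1), (A.4) and (A.5), the following minimal NumPy sketch (the seed and sample size are arbitrary choices, not from the notes) builds $\chi^2_n$ draws directly from standard Normal variables:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # degrees of freedom
z = rng.standard_normal((1_000_000, n))  # rows of n independent N(0,1) draws
y = (z**2).sum(axis=1)                   # each row sum is chi-squared(n), per (A.1)

print(y.mean())  # approximately n = 5, per (A.4)
print(y.var())   # approximately 2n = 10, per (A.5)
\end{verbatim}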
If $y_1, \ldots, y_m$ are independent with $y_i \sim \chi^2_{n_i}$, then
\[ \sum_{i=1}^{m} y_i \sim \chi^2_k \tag{A.6} \]
where
\[ k = \sum_i n_i \tag{A.7} \]
If $y_1, \ldots, y_n$ is a random sample from the $N(\mu, \sigma^2)$ distribution and we define
\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{and} \quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 \]
we have
\[ \bar{y} \sim N(\mu, \sigma^2/n) \]
\[ (n-1)s^2/\sigma^2 \sim \chi^2_{n-1} \]
and $\bar{y}$ and $s^2$ are independent.
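These three facts can be illustrated by simulation; in the sketch below (arbitrary constants, again assuming NumPy) the independence of $\bar{y}$ and $s^2$ shows up as a near-zero sample correlation, since independence implies zero correlation:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 3.0, 10
samples = rng.normal(mu, sigma, size=(200_000, n))

ybar = samples.mean(axis=1)                     # sample means
s2 = samples.var(axis=1, ddof=1)                # sample variances (divisor n-1)

print(ybar.var(), sigma**2 / n)                 # both approximately 0.9
print(((n - 1) * s2 / sigma**2).mean(), n - 1)  # chi-squared(n-1) has mean n-1
print(np.corrcoef(ybar, s2)[0, 1])              # approximately 0
\end{verbatim}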
The central F distribution
If $y_1 \sim \chi^2_{n_1}$, $y_2 \sim \chi^2_{n_2}$, and $y_1$ and $y_2$ are distributed independently, then the ratio
\[ F = \frac{y_1/n_1}{y_2/n_2} \tag{A.8} \]
is defined to have a central $F$ distribution with $n_1$ and $n_2$ degrees of freedom. This is denoted by
\[ F = \frac{y_1/n_1}{y_2/n_2} \sim F_{n_1, n_2} \tag{A.9} \]
The t distribution
If $z \sim N(0, 1)$ and $y \sim \chi^2_n$ independently of each other, then
\[ t = \frac{z}{(y/n)^{1/2}} \sim t_n \tag{A.10} \]
Note that squaring the expression for $t$ and using the definition of the $F$ distribution, we have
\[ t^2 = \frac{z^2/1}{y/n} \sim F_{1,n} \tag{A.11} \]
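One way to check (A.10) and (A.11) numerically is to simulate $t_n$ draws and compare quantiles of $t^2$ against the $F_{1,n}$ distribution; the sketch below assumes SciPy is available, with $n$ and the sample size chosen arbitrarily:
\begin{verbatim}
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 7
z = rng.standard_normal(500_000)
y = rng.chisquare(n, 500_000)
t = z / np.sqrt(y / n)          # t_n draws, per (A.10)

# Compare empirical quantiles of t^2 with F(1, n), per (A.11)
q = [0.5, 0.9, 0.99]
print(np.quantile(t**2, q))
print(stats.f.ppf(q, 1, n))     # should agree closely
\end{verbatim}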
The non-central Chi-squared distribution
If $z_1, \ldots, z_n$ are independent with $z_i \sim N(\mu_i, 1)$, then
\[ \sum_{i=1}^{n} z_i^2 \sim \chi^2(n, \delta) \tag{A.12} \]
has a non-central Chi-squared distribution with $n$ degrees of freedom and non-centrality parameter $\delta = \sum_i \mu_i^2$. If $y_1, \ldots, y_m$ are independent with $y_i \sim \chi^2(n_i, \delta_i)$, then
\[ \sum_{i=1}^{m} y_i \sim \chi^2\!\left(\sum_{i=1}^{m} n_i,\; \sum_{i=1}^{m} \delta_i\right) \tag{A.13} \]
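The parametrisation in (A.12) matches SciPy's ncx2 distribution, whose nc argument corresponds to $\delta$ here; the sketch below also uses the standard fact (not stated above) that $E(y) = n + \delta$ when $y \sim \chi^2(n, \delta)$:
\begin{verbatim}
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu = np.array([0.5, -1.0, 2.0])            # means of the z_i
delta = (mu**2).sum()                      # non-centrality parameter
z = rng.normal(mu, 1.0, size=(500_000, 3))
y = (z**2).sum(axis=1)                     # chi-squared(3, delta) draws, per (A.12)

print(y.mean(), 3 + delta)                 # E(y) = n + delta
print(stats.ncx2.mean(3, delta))           # SciPy agrees
\end{verbatim}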
The non-central F distribution
If $y_1 \sim \chi^2(n, \delta)$ and $y_2 \sim \chi^2_m$ are distributed independently, then
\[ F = \frac{y_1/n}{y_2/m} \]
has a non-central $F$ distribution with $n$ and $m$ degrees of freedom and non-centrality parameter $\delta$.
B. Some Results on Matrices
Let $A$ be a real $n \times n$ matrix. The eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ are the roots of the characteristic equation
\[ |A - \lambda I| = 0 \tag{B.1} \]
and the corresponding eigenvectors $\gamma_1, \ldots, \gamma_n$ satisfy
\[ A\gamma_i = \lambda_i \gamma_i, \qquad i = 1, \ldots, n \tag{B.2} \]
For any such $A$,
\[ \sum_{i=1}^{n} \lambda_i = \mathrm{trace}(A) \tag{B.4} \]
and
\[ \prod_{i=1}^{n} \lambda_i = |A| \tag{B.5} \]
If $A$ is real and symmetric, its eigenvalues are all real and its eigenvectors can be chosen to be real and to satisfy
\[ \gamma_i^T \gamma_j = 0 \quad \text{if } i \neq j \tag{B.7} \]
\[ \gamma_i^T \gamma_i = 1 \tag{B.8} \]
Results (B.7) and (B.8) tell us that for any real, symmetric matrix $A$ we can find $n$ real, mutually orthogonal eigenvectors $\gamma_1, \ldots, \gamma_n$ that correspond to real eigenvalues $\lambda_1, \ldots, \lambda_n$.
If we define $\Gamma$ to be the matrix $(\gamma_1, \ldots, \gamma_n)$, then
\[ \Gamma^T \Gamma = I \tag{B.9} \]
\[ \Gamma \Gamma^T = I \tag{B.10} \]
so that, writing $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, the eigenvalue equations give
\[ A\Gamma = \Gamma\Lambda \tag{B.11} \]
which implies
\[ \Gamma^T A \Gamma = \Lambda \tag{B.12} \]
\[ A = \Gamma \Lambda \Gamma^T \tag{B.13} \]
The Spectral Decomposition Theorem
(i) Any real, symmetric matrix $A$ can therefore be written as
\[ A = \Gamma \Lambda \Gamma^T = \sum_{i=1}^{n} \lambda_i \gamma_i \gamma_i^T \tag{1.1} \]
(ii) If $A$ is also psd then, from (B.13), we can order the eigenvalues such that $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_r > 0$ and $\lambda_{r+1} = \lambda_{r+2} = \ldots = \lambda_n = 0$. Then (1.1) becomes
\[ A = \sum_{i=1}^{r} \lambda_i \gamma_i \gamma_i^T = \Gamma_1 \Lambda_1 \Gamma_1^T \tag{1.2} \]
where $\Gamma_1 = (\gamma_1, \ldots, \gamma_r)$ and $\Lambda_1 = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$.
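The decomposition is easy to verify numerically. The sketch below uses NumPy's eigh (which returns orthonormal eigenvector columns) to check (B.9), (B.13) and the rank-one expansion (1.1); the test matrix is arbitrary:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = M + M.T                                    # a real, symmetric matrix

lam, G = np.linalg.eigh(A)                     # eigenvalues and Gamma
print(np.allclose(G @ G.T, np.eye(4)))         # (B.9)/(B.10): Gamma is orthogonal
print(np.allclose(G @ np.diag(lam) @ G.T, A))  # (B.13): A = Gamma Lambda Gamma^T

# (1.1): A as a sum of rank-one terms lambda_i gamma_i gamma_i^T
A_sum = sum(l * np.outer(g, g) for l, g in zip(lam, G.T))
print(np.allclose(A_sum, A))
\end{verbatim}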
Theorem 2
For any symmetric psd matrix $S$ there exists a matrix $B$ such that
\[ S = B^2 \]
where $B$ is symmetric.
$B$ is called the symmetric square root of $S$ and is denoted by $S^{1/2}$. If, in addition, $S$ is of full rank and is therefore positive definite, then $B$ has an inverse, which is denoted by $S^{-1/2}$ and is the symmetric square root of $S^{-1}$.
Proof: For $S$ psd we can express $S$ as $S = \Gamma_1 \Lambda_1 \Gamma_1^T$ as in part (ii) of the Spectral Decomposition Theorem above. Then set
\[ B = S^{1/2} = \Gamma_1 \Lambda_1^{1/2} \Gamma_1^T \]
where $\Lambda_1^{1/2} = \mathrm{diag}(\lambda_1^{1/2}, \ldots, \lambda_r^{1/2})$. Since $\Gamma_1^T \Gamma_1 = I_r$, we have $B^2 = \Gamma_1 \Lambda_1^{1/2} \Gamma_1^T \Gamma_1 \Lambda_1^{1/2} \Gamma_1^T = \Gamma_1 \Lambda_1 \Gamma_1^T = S$, and $B$ is clearly symmetric.
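The construction in the proof translates directly into code; sym_sqrt below is a hypothetical helper name, not a routine from the notes:
\begin{verbatim}
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a psd matrix via the spectral decomposition."""
    lam, G = np.linalg.eigh(S)
    lam = np.clip(lam, 0.0, None)   # guard against tiny negative round-off
    return G @ np.diag(np.sqrt(lam)) @ G.T

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
S = M @ M.T                         # psd (here in fact positive definite)

B = sym_sqrt(S)
print(np.allclose(B, B.T))          # B is symmetric
print(np.allclose(B @ B, S))        # B^2 = S, so B = S^{1/2}
\end{verbatim}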
Theorem 3
For any matrices $A$ ($n \times p$) and $B$ ($p \times n$), the non-zero eigenvalues of $AB$ and $BA$ are the same and have the same multiplicity. If $x$ is a non-trivial eigenvector of $AB$ corresponding to an eigenvalue $\lambda \neq 0$, then $Bx$ is a non-trivial eigenvector of $BA$.
Proof: Essentially because if
\[ ABx = \lambda x \]
for some $\lambda \neq 0$ and non-trivial $x$, then pre-multiplying by $B$ gives
\[ BA(Bx) = \lambda (Bx) \]
so that $\lambda$ is also an eigenvalue of $BA$ with corresponding eigenvector $Bx$ (which is non-trivial, since $ABx = \lambda x \neq 0$ forces $Bx \neq 0$).
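Theorem 3 can be illustrated numerically: with $A$ of size $3 \times 5$ and $B$ of size $5 \times 3$, the products $AB$ and $BA$ should share three eigenvalues, $BA$ carrying two extra near-zero ones. A minimal sketch:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 5))     # n x p
B = rng.standard_normal((5, 3))     # p x n

print(np.sort_complex(np.linalg.eigvals(A @ B)))  # 3 eigenvalues
print(np.sort_complex(np.linalg.eigvals(B @ A)))  # same non-zeros plus two ~0
\end{verbatim}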
We also have
\[ \mathrm{trace}(AB) = \mathrm{trace}(BA) \tag{B.14} \]
and, more generally, for any product for which the traces are defined,
\[ \mathrm{trace}(ABC) = \mathrm{trace}(CAB) = \mathrm{trace}(BCA) \tag{B.15} \]
If $A_{ij}$ denotes the cofactor of the element $a_{ij}$ of the $p \times p$ matrix $A = (a_{ij})$, then expanding the determinant by any row or column gives
\[ |A| = \sum_{j=1}^{p} a_{ij} A_{ij} \quad \text{for all } i \tag{B.16} \]
\[ |A| = \sum_{i=1}^{p} a_{ij} A_{ij} \quad \text{for all } j \tag{B.17} \]
and
\[ A^{-1} = (a^{ij}) = \frac{1}{|A|} (A_{ij})^T \tag{B.18} \]
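Result (B.18) can be checked by computing cofactors from minors directly; cofactor below is a hypothetical helper written only for this check:
\begin{verbatim}
import numpy as np

def cofactor(X, i, j):
    """Cofactor X_ij: signed determinant of X with row i, column j removed."""
    minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 3))
C = np.array([[cofactor(A, i, j) for j in range(3)] for i in range(3)])

# (B.18): A^{-1} = (1/|A|) (A_ij)^T
print(np.allclose(C.T / np.linalg.det(A), np.linalg.inv(A)))
\end{verbatim}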
Marginal Distributions
Consider the partitioned vector $x^T = (x_1^T, x_2^T)$ where $x_1 = (x_1, \ldots, x_k)^T$ and $x_2 = (x_{k+1}, \ldots, x_p)^T$.
The function
\[ P(x_1 \leq x_1^0) = F(x_1^0, \ldots, x_k^0, \infty, \ldots, \infty) \]
is the marginal cumulative distribution function of $x_1$.
If $x$ is absolutely continuous,
\[ f_1(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x)\, dx_{k+1} \cdots dx_p \]
is the marginal probability density function of $x_1$.
From now on we only consider random vectors that are absolutely continuous.
Conditional Distributions
If $x$ is an absolutely continuous r.v. partitioned as before as $x^T = (x_1^T, x_2^T)$, then for any given value $x_1^0$ of $x_1$, the conditional joint density function of $x_2$ is given by
\[ f(x_2 \mid x_1 = x_1^0) = \frac{f(x_1^0, x_2)}{f_1(x_1^0)} \]
Similarly,
\[ f(x_1 \mid x_2 = x_2^0) = \frac{f(x_1, x_2^0)}{f_2(x_2^0)} \]
Independence
When the conditional p.d.f. $f(x_2 \mid x_1 = x_1^0)$ is the same for all values of $x_1^0$, then $x_1$ and $x_2$ are said to be statistically independent.
If $x_1$ and $x_2$ are statistically independent, then
\[ f(x_2 \mid x_1 = x_1^0) = f_2(x_2) \quad \text{(the marginal density)} \]
Theorem: $x_1$ and $x_2$ are statistically independent if and only if
\[ f(x) = f_1(x_1) f_2(x_2) \]
More generally, $x_1, \ldots, x_p$ are statistically independent if and only if
\[ f(x) = f_1(x_1) f_2(x_2) \cdots f_p(x_p) \]
Expectation
If $g$ is a real-valued function of the random vector $x$, then
\[ E(g(x)) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x) f(x)\, dx_1 \cdots dx_p \]
The mean vector of $x$ is $\mu = E(x)$, with $i$th element $\mu_i = E(x_i)$, and the covariance matrix of $x$ is
\[ \Sigma = E\{(x - \mu)(x - \mu)^T\} \tag{D.1} \]
\[ = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} \tag{D.2} \]
where
\[ \sigma_{ij} = \sigma_{ji} = \mathrm{Cov}(x_i, x_j) = E\{[x_i - E(x_i)][x_j - E(x_j)]\} \]
and
\[ \sigma_{ii} = \mathrm{Cov}(x_i, x_i) = E\{[x_i - E(x_i)]^2\} = \mathrm{Var}(x_i) \]
We call $\Sigma$ the covariance matrix of the random vector $x$; it is symmetric and positive semi-definite (psd).
Suppose now that $x$ is a $p \times 1$ random vector from a population with mean vector $\mu$ and covariance matrix $\Sigma$.
If $a$ and $b$ are $p \times 1$ vectors of constants, then
\[ E(a^T x) = a^T E(x) \tag{D.3} \]
\[ \mathrm{Cov}(a^T x, a^T x) = \mathrm{Var}(a^T x) = a^T \Sigma a \tag{D.4} \]
and
\[ \mathrm{Cov}(a^T x, b^T x) = a^T \Sigma b \tag{D.5} \]
For example, let $x_1$ be a random variable with mean $\mu_1$ and variance $\sigma_1^2$, and $x_2$ be a r.v. with mean $\mu_2$ and variance $\sigma_2^2$. Suppose $\mathrm{Cov}(x_1, x_2) = \sigma_{12}$. Then
\[ E(a_1 x_1 + a_2 x_2) = a_1 \mu_1 + a_2 \mu_2 \]
\[ \mathrm{Var}(a_1 x_1 + a_2 x_2) = a_1^2 \sigma_1^2 + 2 a_1 a_2 \sigma_{12} + a_2^2 \sigma_2^2 = (a_1, a_2) \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \]
and
\[ \mathrm{Cov}(a_1 x_1 + a_2 x_2,\; b_1 x_1 + b_2 x_2) = (a_1, a_2) \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \]
If $x_1$ and $x_2$ are uncorrelated (i.e. $\sigma_{12} = 0$), then
\[ \mathrm{Var}(a_1 x_1 + a_2 x_2) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 \]
and
\[ \mathrm{Cov}(a_1 x_1 + a_2 x_2,\; b_1 x_1 + b_2 x_2) = a_1 b_1 \sigma_1^2 + a_2 b_2 \sigma_2^2 \]
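As a worked instance of (D.4), take $a = (2, -1)^T$, $\sigma_1^2 = 4$, $\sigma_2^2 = 9$ and $\sigma_{12} = 1.5$, so that $a^T \Sigma a = 16 - 6 + 9 = 19$; the sketch below confirms the algebra by simulation (Normal sampling is an arbitrary choice here, since (D.4) needs only second moments):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(8)
mu = np.array([1.0, 2.0])
Sigma = np.array([[4.0, 1.5],
                  [1.5, 9.0]])
a = np.array([2.0, -1.0])

print(a @ Sigma @ a)    # theoretical value from (D.4): 19.0

x = rng.multivariate_normal(mu, Sigma, size=500_000)
print((x @ a).var())    # approximately 19
\end{verbatim}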
More generally, if $A$ and $B$ are matrices of constants of dimensions $r \times p$ and $s \times p$ respectively, then the means and covariances of the $r \times 1$ and $s \times 1$ random vectors
\[ y = Ax \quad \text{and} \quad z = Bx \]
will be given by the mean vectors
\[ E(y) = A\mu \tag{D.6} \]
\[ E(z) = B\mu \tag{D.7} \]
and the covariance matrices
\[ \mathrm{Cov}(y, y) = A \Sigma A^T \tag{D.8} \]
\[ \mathrm{Cov}(z, z) = B \Sigma B^T \tag{D.9} \]
\[ \mathrm{Cov}(y, z) = A \Sigma B^T \tag{D.10} \]
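The matrix versions can be verified the same way; the sketch below compares the sample cross-covariance of $y = Ax$ and $z = Bx$ with $A \Sigma B^T$ from (D.10), all constants being arbitrary:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(9)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 3.0]])
A = rng.standard_normal((2, 3))     # r x p
B = rng.standard_normal((4, 3))     # s x p

x = rng.multivariate_normal(mu, Sigma, size=400_000)
y, z = x @ A.T, x @ B.T             # rows are draws of y = Ax and z = Bx

yc, zc = y - y.mean(0), z - z.mean(0)
print(yc.T @ zc / (len(x) - 1))     # sample Cov(y, z)
print(A @ Sigma @ B.T)              # (D.10): the two agree closely
\end{verbatim}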
Similarly, if $X$ is an $n \times p$ data matrix with $p \times 1$ sample mean vector $\bar{x}$ and sample covariance matrix $S$, and if $x$ represents the general row vector of $X$, then
\[ \text{sample } \mathrm{Var}(a^T x) = a^T S a \tag{D.11} \]
\[ \text{sample } \mathrm{Cov}(a^T x, b^T x) = a^T S b \tag{D.12} \]
and, for matrices of constants $A$ ($r \times p$) and $B$ ($s \times p$),
\[ \text{sample mean of } Ax = A\bar{x} \tag{D.13} \]
\[ \text{sample } \mathrm{Cov}(Ax, Ax) = A S A^T \tag{D.14} \]
\[ \text{sample } \mathrm{Cov}(Ax, Bx) = A S B^T \tag{D.15} \]
E. The Multivariate Normal Distribution
The random vector $x = (x_1, \ldots, x_p)^T$ is said to have a multivariate Normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ if its density function is
\[ f(x) = |2\pi\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right] \tag{E.1} \]
(Note the similarity between (E.1) and the form of the univariate Normal distribution given at the beginning of Section A.) We denote this by
\[ x \sim MVN(\mu, \Sigma) \]
It is easily shown that
\[ (x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi^2_p \tag{E.2} \]
Set $z = \Sigma^{-1/2}(x - \mu)$. Then
\[ z \sim MVN(0, I) \]
since $E(z) = 0$, $\mathrm{Cov}(z, z) = \Sigma^{-1/2} \Sigma \Sigma^{-1/2} = I$, and linear transformations of multivariate Normal random vectors are also distributed according to the multivariate Normal distribution. It follows that
\[ (x - \mu)^T \Sigma^{-1} (x - \mu) = z^T z = \sum_{i=1}^{p} z_i^2 \sim \chi^2_p \]
by definition.
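Result (E.2) lends itself to a simulation check: the quadratic form should have mean $p$ and match $\chi^2_p$ quantiles. A minimal sketch assuming SciPy, with arbitrary $\mu$ and $\Sigma$:
\begin{verbatim}
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
p = 3
mu = np.array([1.0, 0.0, -2.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

x = rng.multivariate_normal(mu, Sigma, size=300_000)
d = x - mu
q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)  # quadratic form

print(q.mean(), p)                                    # chi-squared(p) mean is p
print(np.quantile(q, 0.95), stats.chi2.ppf(0.95, p))  # matching 95% quantiles
\end{verbatim}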
Similarly, if $x \sim MVN(\mu, \Sigma)$, then the random variable $x^T \Sigma^{-1} x$ has a non-central chi-squared distribution with $p$ degrees of freedom and non-centrality parameter $\delta = \mu^T \Sigma^{-1} \mu$. That is,
\[ y = x^T \Sigma^{-1} x \sim \chi^2(p, \delta) \tag{E.3} \]
F. Matrix Differentiation
Let $f(x)$ be a real-valued function of the $p \times 1$ vector $x$, and $g(X)$ be a real-valued function of the $n \times p$ matrix $X$.
The derivative of $f(x)$ with respect to $x$ is defined to be the $p \times 1$ vector
\[ \frac{\partial f}{\partial x} = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_p} \end{pmatrix} \tag{F.1} \]
and the derivative of $g(X)$ with respect to $X$ is the $n \times p$ matrix
\[ \frac{\partial g}{\partial X} = \begin{pmatrix} \frac{\partial g}{\partial x_{11}} & \cdots & \frac{\partial g}{\partial x_{1p}} \\ \vdots & \ddots & \vdots \\ \frac{\partial g}{\partial x_{n1}} & \cdots & \frac{\partial g}{\partial x_{np}} \end{pmatrix} \tag{F.2} \]
Similarly, if the elements of $x$ and $X$ are functions of a scalar $v$, we define
\[ \frac{\partial x}{\partial v} = \begin{pmatrix} \frac{\partial x_1}{\partial v} \\ \vdots \\ \frac{\partial x_p}{\partial v} \end{pmatrix} \tag{F.3} \]
and
\[ \frac{\partial X}{\partial v} = \left( \frac{\partial x_{ij}}{\partial v} \right) \tag{F.4} \]
For vectors $a$, $y$ and matrices $A$ of constants, we then have the standard results
\[ \frac{\partial}{\partial x}(a^T x) = a \tag{F.5} \]
\[ \frac{\partial}{\partial x}(x^T x) = 2x \tag{F.6} \]
\[ \frac{\partial}{\partial x}(x^T A x) = 2Ax \quad (A \text{ symmetric}) \tag{F.7} \]
\[ \frac{\partial}{\partial x}(x^T A y) = \frac{\partial}{\partial x}(y^T A^T x) = Ay \tag{F.8} \]
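Results (F.6) and (F.7) can be verified with central finite differences; num_grad below is a hypothetical helper written for this check:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(11)
p = 4
A = rng.standard_normal((p, p))
A = A + A.T                        # symmetric, as (F.7) requires
x = rng.standard_normal(p)

def num_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros(x.size); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda v: v @ v, x), 2 * x, atol=1e-4))          # (F.6)
print(np.allclose(num_grad(lambda v: v @ A @ v, x), 2 * A @ x, atol=1e-4))  # (F.7)
\end{verbatim}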
\[ \frac{\partial}{\partial v}(XY) = \frac{\partial X}{\partial v} Y + X \frac{\partial Y}{\partial v} \tag{F.9} \]
\[ \frac{\partial X}{\partial x_{ij}} = J_{ij} \tag{F.10a} \]
where $J_{ij}$ denotes the matrix with one in the $ij$th place and zeroes elsewhere. Also, if $X$ is square and of full rank,
\[ \frac{\partial X^{-1}}{\partial v} = -X^{-1} \frac{\partial X}{\partial v} X^{-1} \tag{F.11} \]
and
\[ \frac{\partial X^{-1}}{\partial x_{ij}} = -X^{-1} J_{ij} X^{-1} \tag{F.12a} \]
If $X$ is symmetric, then
\[ \frac{\partial X}{\partial x_{ij}} = J_{ij} + J_{ji}, \qquad i \neq j \tag{F.10b} \]
and
\[ \frac{\partial X^{-1}}{\partial x_{ij}} = \begin{cases} -X^{-1} J_{ii} X^{-1} & i = j \\ -X^{-1} (J_{ij} + J_{ji}) X^{-1} & i \neq j \end{cases} \tag{F.12b} \]
Let $A$ and $X$ be any two matrices such that the product $AX$ exists. Then
\[ \frac{\partial}{\partial x_{ij}}\, \mathrm{trace}(AX) = a_{ji}, \quad \text{i.e.} \quad \frac{\partial}{\partial X}\, \mathrm{trace}(AX) = A^T \tag{F.13a} \]
\[ \frac{\partial}{\partial x_{ij}}\, |X| = X_{ij} \tag{F.14a} \]
\[ \frac{\partial}{\partial x_{ij}}\, \ln|X| = x^{ji} \tag{F.15a} \]
\[ \frac{\partial}{\partial x_{ij}}\, \mathrm{trace}(X^{-1}A) = -(X^{-1} A X^{-1})_{ji} \tag{F.16a} \]
where $X_{ij}$ denotes the cofactor of $x_{ij}$ and $x^{ij}$ the $(i,j)$th element of $X^{-1}$. If $X$ is symmetric, then for $i \neq j$
\[ \frac{\partial}{\partial x_{ij}}\, \mathrm{trace}(AX) = a_{ij} + a_{ji} \; (= 2a_{ij} \text{ if } A \text{ is symmetric}) \tag{F.13b} \]
\[ \frac{\partial}{\partial x_{ij}}\, |X| = 2X_{ij} \tag{F.14b} \]
\[ \frac{\partial}{\partial x_{ij}}\, \ln|X| = 2x^{ij} \tag{F.15b} \]
\[ \frac{\partial}{\partial x_{ij}}\, \mathrm{trace}(X^{-1}A) = -\{(X^{-1} A X^{-1})_{ij} + (X^{-1} A X^{-1})_{ji}\} \tag{F.16b} \]
To prove (F.14a), use (B.17). To prove (F.15a), use (F.14a) and (B.18) to show that
\[ \frac{\partial}{\partial x_{ij}}\, \ln|X| = \frac{1}{|X|} \frac{\partial |X|}{\partial x_{ij}} = \frac{X_{ij}}{|X|} = x^{ji} \]
Result (F.16a) is harder to prove. Use (F.12a) and (B.14) to show that
\[ \frac{\partial}{\partial x_{ij}}\, \mathrm{trace}(X^{-1}A) = -\mathrm{trace}(X^{-1} J_{ij} X^{-1} A) = -(X^{-1} A X^{-1})_{ji} \]
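Finally, (F.13a), (F.15a) and (F.16a) can all be checked by perturbing a single element $x_{ij}$; the sketch below uses central differences with an arbitrary step and a well-conditioned $X$:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(12)
p = 4
A = rng.standard_normal((p, p))
X = rng.standard_normal((p, p)) + p * np.eye(p)  # keep X comfortably invertible
h = 1e-6
i, j = 1, 2
E = np.zeros((p, p)); E[i, j] = h
Xi = np.linalg.inv(X)

d_tr = (np.trace(A @ (X + E)) - np.trace(A @ (X - E))) / (2 * h)
print(d_tr, A.T[i, j])                           # (F.13a): a_ji

d_ln = (np.log(abs(np.linalg.det(X + E)))
        - np.log(abs(np.linalg.det(X - E)))) / (2 * h)
print(d_ln, Xi[j, i])                            # (F.15a): x^{ji}

d_tri = (np.trace(np.linalg.inv(X + E) @ A)
         - np.trace(np.linalg.inv(X - E) @ A)) / (2 * h)
print(d_tri, -(Xi @ A @ Xi)[j, i])               # (F.16a): -(X^{-1}AX^{-1})_{ji}
\end{verbatim}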