
Chapter 5: Distribution of Quadratic Forms

Notes for MATH 668 based on Linear Models in Statistics by Alvin C. Rencher and G. Bruce Schaalje, second
edition, Wiley, 2008.
January 30, 2018

5.1 Sums of Squares


In this chapter, we consider the distribution of quadratic forms y⊤Ay = ∑i ∑j aij yi yj, where y = (yi) is a random vector and A = (aij) is a matrix of constants.


Recall, I is the identity matrix, j is a vector of 1’s, and J is a matrix of 1’s.
In this section, suppose that they are n-dimensional.
Here are some basic univariate statistics in matrix form.
ȳ = (1/n) ∑_{i=1}^n yi = (1/n) j⊤y

ȳ² = (1/n²)(j⊤y)(j⊤y) = (1/n²)(y⊤j)(j⊤y) = (1/n²) y⊤(jj⊤)y = (1/n²) y⊤Jy

∑_{i=1}^n (yi − ȳ)² = ∑_{i=1}^n yi² − nȳ² = y⊤Iy − (1/n) y⊤Jy = y⊤(I − (1/n)J)y

The following decomposition is very useful:

I = (I − (1/n)J) + (1/n)J.

Theorem 5.1.1 (p.106):


(a) I, I − (1/n)J, and (1/n)J are idempotent.
(b) (I − (1/n)J)((1/n)J) = O.

Proof: (a) We see that

I² = I,

((1/n)J)((1/n)J) = (1/n²)(JJ) = (1/n²)(nJ) = (1/n)J,

(I − (1/n)J)(I − (1/n)J) = I² − (2/n)J + ((1/n)J)² = I − (2/n)J + (1/n)J = I − (1/n)J,

and (b) (I − (1/n)J)((1/n)J) = (1/n)J − ((1/n)J)² = (1/n)J − (1/n)J = O.
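These identities are easy to verify numerically. The following R snippet (my own sanity check, not part of the original notes; n = 5 is an arbitrary choice) confirms parts (a) and (b):

```r
# Check Theorem 5.1.1 numerically for n = 5.
n <- 5
I <- diag(n)            # identity matrix
J <- matrix(1, n, n)    # matrix of 1's
C <- I - J/n            # the centering matrix I - (1/n)J

all.equal(C %*% C, C)                      # (a): I - (1/n)J is idempotent
all.equal((J/n) %*% (J/n), J/n)            # (a): (1/n)J is idempotent
all.equal(C %*% (J/n), matrix(0, n, n))    # (b): the product is the zero matrix O
```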

5.2 Mean and Variance of Quadratic Forms


Theorem 5.2.1 (p.107): If A is an n × n matrix of constants and y is an n-dimensional random vector such that E(y) = μ and cov(y) = Σ, then

E(y⊤Ay) = tr(AΣ) + μ⊤Aμ.

Proof: Since Σ = E(yy⊤) − μμ⊤, it follows that E(yy⊤) = Σ + μμ⊤, so that

E(y⊤Ay) = E(tr(y⊤Ay))
= E(tr(Ayy⊤))
= tr(E(Ayy⊤))
= tr(AE(yy⊤))
= tr(A(Σ + μμ⊤))
= tr(AΣ + Aμμ⊤)
= tr(AΣ) + tr(Aμμ⊤)
= tr(AΣ) + tr(μ⊤Aμ)
= tr(AΣ) + μ⊤Aμ.
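The trace formula can be checked by simulation. In the sketch below (my own illustration; μ, Σ, and A are arbitrary choices), the Monte Carlo average of y⊤Ay is compared with tr(AΣ) + μ⊤Aμ:

```r
# Monte Carlo check of E(y'Ay) = tr(A Sigma) + mu'A mu.
# mu, Sigma, and A below are arbitrary illustrative choices.
set.seed(1)
p <- 3
mu <- c(1, 2, 3)
Sigma <- diag(p) + 0.5                              # positive definite covariance
A <- matrix(c(2, 1, 0, 1, 3, 1, 0, 1, 2), p, p)     # symmetric matrix of constants
nsim <- 100000
Z <- matrix(rnorm(nsim * p), nsim, p) %*% chol(Sigma)  # rows ~ N(0, Sigma)
Y <- sweep(Z, 2, mu, "+")                              # rows ~ N(mu, Sigma)
q <- rowSums((Y %*% A) * Y)                            # y'Ay for each simulated y

mean(q)                                                # Monte Carlo estimate
sum(diag(A %*% Sigma)) + drop(t(mu) %*% A %*% mu)      # tr(A Sigma) + mu'A mu = 60.5
```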

Theorem 5.2.2 (p.111): If A is an m × n matrix of constants, and x and y are m- and n-dimensional random vectors such that the stacked vector (x⊤, y⊤)⊤ has mean (μx⊤, μy⊤)⊤ and covariance matrix [Σxx Σxy; Σyx Σyy], then

E(x⊤Ay) = tr(AΣyx) + μx⊤Aμy.

Example 5.2.1: Suppose that (x1, y1), …, (xn, yn) is a random sample such that each pair (xi, yi)⊤ has mean (μx, μy)⊤ and covariance matrix [σx² σxy; σxy σy²], and let

sxy = (1/(n − 1)) ∑_{i=1}^n (xi − x̄)(yi − ȳ).

Show that E(sxy) = σxy.

Answer: Let x = (x1, …, xn)⊤ and y = (y1, …, yn)⊤. Then the stacked vector (x⊤, y⊤)⊤ has mean (μx j⊤, μy j⊤)⊤ and covariance matrix [σx² I σxy I; σxy I σy² I]. Since

sxy = (1/(n − 1)) x⊤(I − (1/n)J)y,

Theorem 5.2.2 implies that

E(sxy) = (1/(n − 1)) {tr((I − (1/n)J) σxy I) + (μx j)⊤(I − (1/n)J)(μy j)}
= (1/(n − 1)) {σxy tr(I − (1/n)J) + μx μy j⊤(I − (1/n)jj⊤)j}
= (1/(n − 1)) {σxy (tr(I) − (1/n)tr(J)) + μx μy (j⊤j − (1/n) j⊤jj⊤j)}
= (1/(n − 1)) {σxy (n − (1/n)n) + μx μy (n − (1/n)n²)}
= (1/(n − 1)) {σxy (n − 1) + 0}
= σxy.
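As a sanity check on this unbiasedness result, the following simulation (my own sketch; n, the means, and σxy are arbitrary choices, with unit marginal variances) estimates E(sxy). Note that R's cov() already uses the 1/(n − 1) divisor, so it computes sxy directly:

```r
# Monte Carlo check that E(s_xy) = sigma_xy (parameters chosen arbitrarily).
set.seed(2)
n <- 10; mu_x <- 1; mu_y <- -2; sigma_xy <- 0.6
nsim <- 50000
sxy <- replicate(nsim, {
  x <- rnorm(n)                                        # var(x_i) = 1
  y <- sigma_xy * x + sqrt(1 - sigma_xy^2) * rnorm(n)  # var(y_i) = 1, cov = sigma_xy
  cov(mu_x + x, mu_y + y)                              # R's cov() uses the 1/(n-1) divisor
})
mean(sxy)   # should be close to sigma_xy = 0.6
```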

Theorem 5.2.3 (p.108): If A is a p × p matrix of constants and y ∼ Np(μ, Σ), then the moment generating function of y⊤Ay is

My⊤Ay(t) = det(I − 2tAΣ)^{−1/2} e^{−μ⊤(I − (I − 2tAΣ)^{−1})Σ^{−1}μ/2}.

Theorem 5.2.4 (p.109): If A is a p × p symmetric matrix of constants and y ∼ Np(μ, Σ), then

var(y⊤Ay) = 2tr(AΣAΣ) + 4μ⊤AΣAμ.

Theorem 5.2.5 (p.110): If A is a p × p symmetric matrix of constants and y ∼ Np(μ, Σ), then

cov(y, y⊤Ay) = 2ΣAμ.

Proof: We have

cov(y, y⊤Ay) = E[(y − μ)(y⊤Ay − tr(AΣ) − μ⊤Aμ)]
= E[(y − μ)((y − μ)⊤A(y − μ) + 2(y − μ)⊤Aμ − tr(AΣ))]
= E[(y − μ)(y − μ)⊤A(y − μ)] + 2E[(y − μ)(y − μ)⊤]Aμ − E(y − μ)tr(AΣ)
= E[(y − μ)(y − μ)⊤A(y − μ)] + 2ΣAμ − 0
= E(Σ^{1/2}zz⊤Σ^{1/2}AΣ^{1/2}z) + 2ΣAμ

where z = Σ^{−1/2}(y − μ) ∼ Np(0, I). Letting B = Σ^{1/2}AΣ^{1/2}, it follows that

E[zz⊤Bz] = E[(z1, …, zp)⊤ ∑_{i=1}^p ∑_{j=1}^p bij zi zj],

so the kth element of E[zz⊤Bz] is, for any k ∈ {1, …, p},

E(∑_{i=1}^p ∑_{j=1}^p bij zk zi zj) = ∑_{i=1}^p ∑_{j=1}^p bij E(zk zi zj)
= bkk E(zk³)   (every term with (i, j) ≠ (k, k) vanishes since the zi are independent with mean zero)
= bkk ∫_{−∞}^{∞} z³ (1/√(2π)) e^{−z²/2} dz
= 0.

Thus, E(Σ^{1/2}zz⊤Σ^{1/2}AΣ^{1/2}z) = 0, which implies that cov(y, y⊤Ay) = 2ΣAμ.
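Both the variance and covariance formulas can be verified by simulation. The sketch below (my own illustration with arbitrary μ, Σ, and symmetric A) compares Monte Carlo estimates with 2tr(AΣAΣ) + 4μ⊤AΣAμ and 2ΣAμ; the numeric targets in the comments were computed by hand for these particular choices:

```r
# Monte Carlo check of var(y'Ay) = 2 tr(A Sigma A Sigma) + 4 mu'A Sigma A mu
# and cov(y, y'Ay) = 2 Sigma A mu (mu, Sigma, and A are arbitrary choices).
set.seed(3)
p <- 2
mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), p, p)
A <- matrix(c(1, 1, 1, 2), p, p)                        # symmetric
nsim <- 200000
Z <- matrix(rnorm(nsim * p), nsim, p) %*% chol(Sigma)   # rows ~ N(0, Sigma)
Y <- sweep(Z, 2, mu, "+")                               # rows ~ N(mu, Sigma)
q <- rowSums((Y %*% A) * Y)                             # y'Ay for each simulated y

var(q)           # theory: 2*tr(A Sigma A Sigma) + 4*mu'A Sigma A mu = 2*21.5 + 4*1 = 47
drop(cov(Y, q))  # theory: 2 Sigma A mu = (-1, -2)
```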

5.3 Noncentral Chi-Square Distribution


Definition 5.3.1 (p.113): If y1, …, yn are independent N(μi, 1) random variables, then the probability distribution of v = ∑_{i=1}^n yi² = y⊤y is called a noncentral chi-square distribution with n degrees of freedom and noncentrality parameter λ = (1/2)∑_{i=1}^n μi² = μ⊤μ/2. We sometimes write v ∼ χ²(n, λ).
n
Definition 5.3.2 (p.112): If y1, …, yn are independent N(0, 1) random variables, then the probability distribution of v = ∑_{i=1}^n yi² = y⊤y is called a chi-square distribution with n degrees of freedom and we write v ∼ χ²(n).
Theorem 5.3.1 (p.114): If v ∼ χ²(n, λ), then

E(v) = n + 2λ,
var(v) = 2n + 8λ,
Mv(t) = (1 − 2t)^{−n/2} e^{−λ[1−1/(1−2t)]}.

Proof: These statements follow from Theorem 5.2.1, Theorem 5.2.4, and Theorem 5.2.3, respectively. For instance, with A = Σ = I, Theorem 5.2.4 gives

var(v) = 2tr(I) + 4μ⊤μ = 2n + 4(2λ) = 2n + 8λ.
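These moments can be checked against R's noncentral chi-square generator. As noted later in this chapter, R parameterizes the noncentrality by ncp = 2λ, so the conversion below is needed (df and λ are arbitrary choices):

```r
# Check E(v) = n + 2*lambda and var(v) = 2*n + 8*lambda by simulation.
# R parameterizes the noncentral chi-square by ncp = 2*lambda.
set.seed(4)
n_df <- 5; lambda <- 1.5
v <- rchisq(200000, df = n_df, ncp = 2 * lambda)
mean(v)   # theory: n + 2*lambda = 8
var(v)    # theory: 2*n + 8*lambda = 22
```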

Theorem 5.3.2 (p.114): If v1, …, vk are independent χ²(ni, λi) random variables, then ∑_{i=1}^k vi ∼ χ²(∑_{i=1}^k ni, ∑_{i=1}^k λi).

Proof: By Theorem 4.3.3(b), the moment generating function of ∑_{i=1}^k vi is

M(t) = ∏_{i=1}^k (1 − 2t)^{−ni/2} e^{−λi[1−1/(1−2t)]}
= (1 − 2t)^{−∑_{i=1}^k ni/2} e^{−∑_{i=1}^k λi[1−1/(1−2t)]}.

This is the moment generating function of a χ²(∑_{i=1}^k ni, ∑_{i=1}^k λi) distribution, so the result holds based on Theorem 4.3.3(a).
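The additivity result can be illustrated numerically (my own sketch; the degrees of freedom and λ's are arbitrary, and again R's ncp equals 2λ):

```r
# Illustrate Theorem 5.3.2: independent chi^2(3, 0.5) and chi^2(4, 1) draws
# should sum to a chi^2(7, 1.5) variable (recall R's ncp = 2*lambda).
set.seed(5)
m <- 200000
v1 <- rchisq(m, df = 3, ncp = 2 * 0.5)
v2 <- rchisq(m, df = 4, ncp = 2 * 1.0)
s <- v1 + v2
# Compare the empirical CDF of the sum with pchisq at a few points:
rbind(mc    = sapply(c(5, 10, 15), function(x) mean(s <= x)),
      exact = pchisq(c(5, 10, 15), df = 7, ncp = 2 * 1.5))
```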

5.4 Noncentral F and t Distributions


Definition 5.4.1 (p.116): If y ∼ N(μ, 1) and u ∼ χ²(p) are independent random variables, then the probability distribution of t = y/√(u/p) is called a noncentral t distribution with p degrees of freedom and noncentrality parameter μ. We sometimes write t ∼ t(p, μ).
Definition 5.4.2 (p.116): If z ∼ N(0, 1) and u ∼ χ²(p) are independent random variables, then the probability distribution of t = z/√(u/p) is called a t distribution with p degrees of freedom and we write t ∼ t(p).

Compare this with Definition L13.2 from MATH 667.
Definition 5.4.3 (p.115): If u ∼ χ²(p, λ) and v ∼ χ²(q) are independent random variables, then the probability distribution of z = (u/p)/(v/q) is called a noncentral F distribution with p degrees of freedom in the numerator, q degrees of freedom in the denominator, and noncentrality parameter λ. We sometimes write z ∼ F(p, q, λ).
Definition 5.4.4 (p.114): If u ∼ χ²(p) and v ∼ χ²(q) are independent random variables, then the probability distribution of w = (u/p)/(v/q) is called an F distribution with p degrees of freedom in the numerator and q degrees of freedom in the denominator, and we write w ∼ F(p, q).

R Example 5.4.1: Suppose that y1, …, y4 are iid N(1, 9) random variables, z1, …, z25 are iid N(0, 1) random variables, and (y1, …, y4) is independent of (z1, …, z25). Compute P(∑_{i=1}^4 yi² > ∑_{j=1}^{25} zj²).

Answer: Here

P(∑_{i=1}^4 yi² > ∑_{j=1}^{25} zj²) = P( [(1/4)∑_{i=1}^4 (yi/3)²] / [(1/25)∑_{j=1}^{25} zj²] > 25/36 ),

where ∑_{i=1}^4 (yi/3)² = (1/9)∑_{i=1}^4 yi² ∼ χ²(4, λ = (1/2)∑_{i=1}^4 (1/3)² = 2/9) by Theorem 5.3.1 and ∑_{j=1}^{25} zj² ∼ χ²(25) by Theorem 5.3.2 (which are independent since they are functions of independent random vectors). This probability can be computed using the R function pf as follows. The arguments specifying the degrees of freedom are df1 and df2, the noncentrality parameter is specified by ncp (except R's noncentrality parameter is μ⊤Aμ = 2λ), and the option lower.tail=FALSE tells R to compute the probability that the F-ratio is larger than 25/36.

pf(25/36,df1=4,df2=25,ncp=2*2/9,lower.tail=FALSE)

## [1] 0.6503005

We can simulate these sums many times using the rnorm function to verify that our answer looks reasonable.

set.seed(159847)
numberOfSimulations=10000000
leftSum=rep(0,numberOfSimulations)
rightSum=rep(0,numberOfSimulations)
for (i in 1:numberOfSimulations){
y=rnorm(4,mean=1,sd=3)
z=rnorm(25)
leftSum[i]=sum(y^2)
rightSum[i]=sum(z^2)
}
mean(leftSum > rightSum)

## [1] 0.6502779

5.5 Distribution of Quadratic Forms


Theorem 5.5.1 (p.117): Suppose y ∼ Np(μ, Σ), A is a symmetric matrix of constants with rank r, and λ = (1/2)μ⊤Aμ. Then y⊤Ay ∼ χ²(r, λ) if and only if AΣ is idempotent.
Proof: Let ω1, …, ωp be the eigenvalues of AΣ. Then the eigenvalues of I − 2tAΣ are 1 − 2tωi for i = 1, …, p. If we choose t small enough so that |2tωi| < 1 for all i, then

1/(1 − 2tωi) = 1 + ∑_{k=1}^∞ (2t)^k ωi^k

and

(I − 2tAΣ)^{−1} = I + ∑_{k=1}^∞ (2t)^k (AΣ)^k (see p.50).

Since AΣ is idempotent, Theorem 2.13.2 implies that r of the ω's equal 1 and the other p − r ω's equal 0. So, the moment generating function of y⊤Ay is

My⊤Ay(t) = det(I − 2tAΣ)^{−1/2} e^{−μ⊤(I − (I − 2tAΣ)^{−1})Σ^{−1}μ/2}
= (∏_{i=1}^p (1 − 2tωi))^{−1/2} e^{−μ⊤(−∑_{k=1}^∞ (2t)^k AΣ)Σ^{−1}μ/2}
= ((1 − 2t)^r)^{−1/2} e^{−(μ⊤Aμ/2)(−∑_{k=1}^∞ (2t)^k)}   (using (AΣ)^k = AΣ and AΣΣ^{−1} = A)
= (1 − 2t)^{−r/2} e^{−(μ⊤Aμ/2)(1 − 1/(1 − 2t))},

which is the moment generating function of a χ²(r, λ = μ⊤Aμ/2) random variable (see Theorem 5.3.1).

For a proof of the converse statement, see http://www.tandfonline.com/doi/pdf/10.1080/00031305.1999.10474473.
Example 5.5.1: Suppose that y1, …, yn is a random sample from a N(μ, σ²) distribution. Show that ∑_{i=1}^n (yi − ȳ)²/σ² ∼ χ²(n − 1).

Answer: Here y = (y1, …, yn)⊤ ∼ Nn(μj, σ²I) and

∑_{i=1}^n (yi − ȳ)²/σ² = y⊤ [(I − (1/n)J)/σ²] y.

By Theorem 5.1.1(a), I − (1/n)J is idempotent, so all of its eigenvalues are either 0 or 1 and its rank equals the number of eigenvalues which are 1. The sum of the eigenvalues of I − (1/n)J is

tr(I − (1/n)J) = tr(I) − (1/n)tr(J) = n − (1/n)n = n − 1,

so rank(I − (1/n)J) = n − 1. The noncentrality parameter is

λ = (1/2)(μj)⊤ [(1/σ²)(I − (1/n)J)] (μj)
= (μ²/(2σ²)) j⊤(I − (1/n)J)j
= (μ²/(2σ²)) (j⊤j − (1/n)j⊤jj⊤j)
= (μ²/(2σ²)) (n − (1/n)n²)
= 0.

So, by Theorem 5.5.1, ∑_{i=1}^n (yi − ȳ)²/σ² is a chi-square random variable with rank(I − (1/n)J) = n − 1 degrees of freedom.

Compare this with the proof of Theorem L4.1(c) from MATH 667.
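A quick simulation (not in the original notes; n, μ, and σ² are arbitrary choices) illustrates the result by checking the first moment and the median of the scaled sum of squares against χ²(n − 1):

```r
# Monte Carlo check of Example 5.5.1 (n, mu, and sigma are arbitrary choices).
set.seed(6)
n <- 8; mu <- 3; sigma <- 2
m <- 100000
w <- replicate(m, {
  y <- rnorm(n, mean = mu, sd = sigma)
  sum((y - mean(y))^2) / sigma^2
})
mean(w)                         # theory: n - 1 = 7
mean(w <= qchisq(0.5, n - 1))   # theory: 0.5, since w ~ chi^2(n - 1)
```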

5.6 Independence of Linear Forms and Quadratic Forms


Theorem 5.6.1 (p.119): If y ∼ Np (μ, Σ) , B is a k × p matrix of constants, and A is a p × p matrix of constants, then By and y⊤ Ay
are independent if and only if BΣA = O.
Theorem 5.6.2 (p.120): If y ∼ Np (μ, Σ) and A and B are p × p symmetric matrices of constants, then y⊤ Ay and y⊤ By are
independent if and only if AΣB = O.
Example 5.6.1: Suppose y ∼ Np(μ, σ²I) and H is a p × p symmetric idempotent matrix of constants with rank r < p where (I − H)μ = 0. What is the distribution of [y⊤Hy/r] / [y⊤(I − H)y/(p − r)]?

Answer: Note that (1/σ)y ∼ Np((1/σ)μ, I). Since H is idempotent with rank r and I − H is idempotent with rank p − r, Theorem 5.5.1 implies that

((1/σ)y)⊤ H ((1/σ)y) = (1/σ²) y⊤Hy ∼ χ²(r, (1/(2σ²)) μ⊤Hμ)

and

((1/σ)y)⊤ (I − H) ((1/σ)y) = (1/σ²) y⊤(I − H)y ∼ χ²(p − r) since μ⊤(I − H)μ = 0.

By Theorem 5.6.2, y⊤Hy and y⊤(I − H)y are independent since H(I − H) = O. So, by Definition 5.4.3, we see that

[y⊤Hy/r] / [y⊤(I − H)y/(p − r)] = [((1/σ²)y⊤Hy)/r] / [((1/σ²)y⊤(I − H)y)/(p − r)] ∼ F(r, p − r, (1/(2σ²)) μ⊤Hμ).
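To illustrate (my own sketch), take H = (1/p)J, the rank-1 projection onto span(j), and μ proportional to j so that (I − H)μ = 0; all the numeric choices below are arbitrary. The empirical CDF of the ratio is compared with pf, again converting to R's ncp = 2λ:

```r
# Illustrate Example 5.6.1 with H = (1/p)J, the rank-1 projection onto span(j),
# and mu proportional to j so that (I - H)mu = 0 (all choices arbitrary).
set.seed(7)
p <- 6; sigma <- 2
mu <- rep(1.5, p)
H <- matrix(1/p, p, p)          # symmetric, idempotent, rank r = 1
m <- 100000
f <- replicate(m, {
  y <- rnorm(p, mean = mu, sd = sigma)
  num <- drop(t(y) %*% H %*% y) / 1                     # r = 1
  den <- drop(t(y) %*% (diag(p) - H) %*% y) / (p - 1)
  num / den
})
ncp <- drop(t(mu) %*% H %*% mu) / sigma^2   # R's ncp = mu'H mu / sigma^2 = 2*lambda
mean(f <= 2)                                # empirical CDF of the ratio at 2
pf(2, df1 = 1, df2 = p - 1, ncp = ncp)      # noncentral F CDF at 2
```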

Theorem 5.6.3 (p.121): Suppose y ∼ Nn(μ, σ²I), Ai is an n × n symmetric matrix of rank ri for i = 1, …, k, and

y⊤y = ∑_{i=1}^k y⊤Ai y.

Then (1/σ²) y⊤Ai y ∼ χ²(ri, (1/(2σ²)) μ⊤Ai μ) for i = 1, …, k and y⊤A1y, …, y⊤Aky are mutually independent if and only if at least one of the following statements holds:

A1, …, Ak are idempotent matrices
Ai Aj = O for all i ≠ j
n = ∑_{i=1}^k ri
