
10/3/2020 Chapter 5: Distribution of Quadratic Forms

Chapter 5: Distribution of Quadratic Forms


Notes for MATH 668 based on Linear Models in Statistics by Alvin C. Rencher and G.
Bruce Schaalje, second edition, Wiley, 2008.
January 30, 2018

5.1 Sums of Squares


In this chapter, we consider the distribution of quadratic forms y⊤Ay = ∑_i ∑_j a_ij y_i y_j where y = (y_i) is a random vector and A = (a_ij) is a matrix of constants.


Recall that I is the identity matrix, j is a vector of 1's, and J is a matrix of 1's.
In this section, suppose that they are n-dimensional.
Here are some basic univariate statistics in matrix form.
ȳ = (1/n) ∑_{i=1}^n y_i = (1/n) j⊤y

ȳ² = (1/n²)(j⊤y)(j⊤y) = (1/n²)(y⊤j)(j⊤y) = (1/n²) y⊤(jj⊤)y = (1/n²) y⊤Jy

∑_{i=1}^n (y_i − ȳ)² = ∑_{i=1}^n y_i² − nȳ² = y⊤Iy − (1/n) y⊤Jy = y⊤(I − (1/n)J)y

The following decomposition is very useful:

I = (I − (1/n)J) + (1/n)J.

Theorem 5.1.1 (p.106):


(a) I, I − (1/n)J, and (1/n)J are idempotent.
(b) (I − (1/n)J)((1/n)J) = O

Proof: (a) We see that


I² = I,

((1/n)J)((1/n)J) = (1/n²)(JJ) = (1/n²)(nJ) = (1/n)J,

(I − (1/n)J)(I − (1/n)J) = I² − (2/n)J + ((1/n)J)² = I − (2/n)J + (1/n)J = I − (1/n)J,

and (b) (I − (1/n)J)((1/n)J) = (1/n)J − ((1/n)J)² = (1/n)J − (1/n)J = O.
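These identities are easy to sanity-check numerically. The sketch below (an addition to these notes, with n = 5 as an arbitrary choice) verifies both parts of Theorem 5.1.1 up to floating-point error.

```r
# Verify Theorem 5.1.1 numerically for n = 5 (an arbitrary choice)
n <- 5
I <- diag(n)
J <- matrix(1, n, n)      # n x n matrix of 1's
C <- I - J/n              # the centering matrix I - (1/n)J

stopifnot(max(abs(I %*% I - I)) < 1e-12)            # I is idempotent
stopifnot(max(abs((J/n) %*% (J/n) - J/n)) < 1e-12)  # (1/n)J is idempotent
stopifnot(max(abs(C %*% C - C)) < 1e-12)            # I - (1/n)J is idempotent
stopifnot(max(abs(C %*% (J/n))) < 1e-12)            # (b): the product is O
```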

5.2 Mean and Variance of Quadratic Forms


Theorem 5.2.1 (p.107): If A is an n × n matrix of constants and y is an n-dimensional random vector such
that E(y) = μ and cov(y) = Σ, then

E(y⊤Ay) = tr(AΣ) + μ⊤Aμ.

Proof: Since Σ = E(yy⊤) − μμ⊤, it follows that E(yy⊤) = Σ + μμ⊤ so that

www.math.louisville.edu/~rsgill01/668/Ch_5_Notes.html 1/7
E(y⊤Ay) = E(tr(y⊤Ay))
= E(tr(Ayy⊤))
= tr(E(Ayy⊤))
= tr(A E(yy⊤))
= tr(A(Σ + μμ⊤))
= tr(AΣ + Aμμ⊤)
= tr(AΣ) + tr(Aμμ⊤)
= tr(AΣ) + tr(μ⊤Aμ)
= tr(AΣ) + μ⊤Aμ.
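As a sanity check (not part of the original notes), Theorem 5.2.1 can be verified by simulation for an arbitrary choice of A, μ, and Σ; the tolerance below reflects Monte Carlo error.

```r
# Monte Carlo check of E(y'Ay) = tr(A Sigma) + mu'A mu (illustrative A, mu, Sigma)
set.seed(1)
mu    <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
A     <- matrix(c(1, 0.5, 0.5, 2), 2, 2)

L <- t(chol(Sigma))                     # Sigma = L L'
m <- 200000
z <- matrix(rnorm(2 * m), nrow = 2)
y <- mu + L %*% z                       # columns of y are draws from N2(mu, Sigma)

q_form <- colSums(y * (A %*% y))        # y'Ay for each simulated y
lhs <- mean(q_form)
rhs <- sum(diag(A %*% Sigma)) + c(t(mu) %*% A %*% mu)
stopifnot(abs(lhs - rhs) < 0.3)         # Monte Carlo agreement
```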

Theorem 5.2.2 (p.111): If A is an m × n matrix of constants, and x and y are m- and n-dimensional random vectors such that

  E ( x )   ( μx )             ( x )   ( Σxx  Σxy )
    ( y ) = ( μy )   and  cov  ( y ) = ( Σyx  Σyy ),

then

E(x⊤Ay) = tr(AΣyx) + μx⊤Aμy.

Example 5.2.1: Suppose that (x1, y1), …, (xn, yn) is a random sample such that

  E ( xi )   ( μx )             ( xi )   ( σx²  σxy )
    ( yi ) = ( μy )   and  cov  ( yi ) = ( σxy  σy² ),

and let sxy = (1/(n−1)) ∑_{i=1}^n (xi − x̄)(yi − ȳ). Show that E(sxy) = σxy.
Answer: Let x = (x1, …, xn)⊤ and y = (y1, …, yn)⊤. Then

  E ( x )   ( μx j )             ( x )   ( σx² I  σxy I )
    ( y ) = ( μy j )   and  cov  ( y ) = ( σxy I  σy² I ).

Since

sxy = (1/(n−1)) x⊤(I − (1/n)J) y,

Theorem 5.2.2 implies that

E(sxy) = (1/(n−1)) {tr((I − (1/n)J) σxy I) + (μx j)⊤(I − (1/n)J)(μy j)}
= (1/(n−1)) {σxy tr(I − (1/n)J) + μx μy j⊤(I − (1/n) jj⊤) j}
= (1/(n−1)) {σxy (tr(I) − (1/n) tr(J)) + μx μy (j⊤j − (1/n) j⊤jj⊤j)}
= (1/(n−1)) {σxy (n − (1/n)·n) + μx μy (n − (1/n)·n²)}
= (1/(n−1)) {σxy (n − 1) + 0}
= σxy.

Theorem 5.2.3 (p.108): If A is a p × p matrix of constants and y ∼ Np (μ, Σ) , then the moment
generating function of y⊤ Ay is

M_{y⊤Ay}(t) = det(I − 2tAΣ)^(−1/2) e^(−μ⊤(I − (I−2tAΣ)^(−1))Σ^(−1)μ/2).

Theorem 5.2.4 (p.109): If A is a p × p symmetric matrix of constants and y ∼ Np (μ, Σ) , then

var(y⊤Ay) = 2 tr(AΣAΣ) + 4μ⊤AΣAμ.

Theorem 5.2.5 (p.110): If A is a p × p symmetric matrix of constants and y ∼ Np (μ, Σ) , then


cov(y, y Ay) = 2ΣAμ.

Proof: We have

cov(y, y⊤Ay) = E[(y − μ)(y⊤Ay − tr(AΣ) − μ⊤Aμ)]
= E[(y − μ)((y − μ)⊤A(y − μ) + 2(y − μ)⊤Aμ − tr(AΣ))]
= E[(y − μ)(y − μ)⊤A(y − μ)] + 2E[(y − μ)(y − μ)⊤]Aμ − E(y − μ) tr(AΣ)
= E[(y − μ)(y − μ)⊤A(y − μ)] + 2ΣAμ − 0
= E(Σ^(1/2) zz⊤ Σ^(1/2) A Σ^(1/2) z) + 2ΣAμ

where z = Σ^(−1/2)(y − μ) ∼ Np(0, I). Letting B = Σ^(1/2) A Σ^(1/2), it follows that

E[zz⊤Bz] = E[(z1, …, zp)⊤ ∑_{i=1}^p ∑_{j=1}^p bij zi zj],

so the kth component of E[zz⊤Bz] is E(∑_{i=1}^p ∑_{j=1}^p bij zk zi zj) and, for any k ∈ {1, …, p},

E(∑_{i=1}^p ∑_{j=1}^p bij zk zi zj) = ∑_{i=1}^p ∑_{j=1}^p bij E(zk zi zj)
= bkk E(zk³)
= bkk ∫_{−∞}^{∞} z³ (1/√(2π)) e^(−z²/2) dz
= 0.

Thus, E(Σ^(1/2) zz⊤ Σ^(1/2) A Σ^(1/2) z) = 0, which implies that cov(y, y⊤Ay) = 2ΣAμ.
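Theorems 5.2.4 and 5.2.5 can likewise be checked by simulation; the sketch below (an addition with arbitrary choices of A, μ, and Σ, not from the text) compares Monte Carlo estimates to both formulas.

```r
# Monte Carlo check of var(y'Ay) = 2 tr((A Sigma)^2) + 4 mu'A Sigma A mu
# and cov(y, y'Ay) = 2 Sigma A mu (illustrative A, mu, Sigma)
set.seed(2)
mu    <- c(1, -1)
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
A     <- matrix(c(1, 0.5, 0.5, 2), 2, 2)

L <- t(chol(Sigma))
m <- 500000
y <- mu + L %*% matrix(rnorm(2 * m), nrow = 2)
q_form <- colSums(y * (A %*% y))        # y'Ay for each simulated y

AS <- A %*% Sigma
v_theory <- 2 * sum(diag(AS %*% AS)) + 4 * c(t(mu) %*% AS %*% A %*% mu)
stopifnot(abs(var(q_form) - v_theory) / v_theory < 0.05)   # Theorem 5.2.4

cov_theory <- c(2 * Sigma %*% A %*% mu)                    # Theorem 5.2.5
cov_hat <- c(cov(y[1, ], q_form), cov(y[2, ], q_form))
stopifnot(max(abs(cov_hat - cov_theory)) < 0.2)
```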

5.3 Noncentral Chi-Square Distribution


Definition 5.3.1 (p.113): If y1, …, yn are independent N(μi, 1) random variables, then the probability distribution of v = ∑_{i=1}^n yi² = y⊤y is called a noncentral chi-square distribution with n degrees of freedom and noncentrality parameter λ = (1/2) ∑_{i=1}^n μi² = μ⊤μ/2. We sometimes write v ∼ χ²(n, λ).

Definition 5.3.2 (p.112): If y1, …, yn are independent N(0, 1) random variables, then the probability distribution of v = ∑_{i=1}^n yi² = y⊤y is called a chi-square distribution with n degrees of freedom and we can write v ∼ χ²(n).
Theorem 5.3.1 (p.114): If v ∼ χ²(n, λ), then

E(v) = n + 2λ,
var(v) = 2n + 8λ,
Mv(t) = (1 − 2t)^(−n/2) e^(−λ[1 − 1/(1−2t)]).

Proof: These statements follow from Theorem 5.2.1, Theorem 5.2.4, and Theorem 5.2.3, respectively. For
instance, with A = Σ = I , Theorem 5.2.4 gives

var(v) = 2 tr(I) + 4μ⊤μ = 2n + 4(2λ) = 2n + 8λ.
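Since R's noncentrality parameter ncp equals 2λ in the notation of these notes, the mean and variance formulas can be checked against R's dchisq density by numerical integration; the block below is an added sketch with arbitrary n and λ.

```r
# Check E(v) = n + 2*lambda and var(v) = 2n + 8*lambda numerically,
# using R's dchisq (whose ncp parameter equals 2*lambda in these notes)
n <- 4; lambda <- 0.5
m1 <- integrate(function(x) x   * dchisq(x, df = n, ncp = 2 * lambda), 0, Inf)$value
m2 <- integrate(function(x) x^2 * dchisq(x, df = n, ncp = 2 * lambda), 0, Inf)$value
stopifnot(abs(m1 - (n + 2 * lambda)) < 1e-4)              # E(v)
stopifnot(abs((m2 - m1^2) - (2 * n + 8 * lambda)) < 1e-3) # var(v)
```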


Theorem 5.3.2 (p.114): If v1, …, vk are independent χ²(ni, λi) random variables, then

∑_{i=1}^k vi ∼ χ²(∑_{i=1}^k ni, ∑_{i=1}^k λi).

Proof: By Theorem 4.3.3(b), the moment generating function of ∑_{i=1}^k vi is

M(t) = ∏_{i=1}^k (1 − 2t)^(−ni/2) e^(−λi[1 − 1/(1−2t)])
= (1 − 2t)^(−∑_{i=1}^k ni/2) e^(−∑_{i=1}^k λi[1 − 1/(1−2t)]).

This is the moment generating function of a χ²(∑_{i=1}^k ni, ∑_{i=1}^k λi) distribution so the result holds based on Theorem 4.3.3(a).
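A quick simulation (an added sketch; the parameter choices are arbitrary, and recall R's ncp = 2λ) confirms that a sum of independent noncentral chi-squares matches the stated distribution.

```r
# Monte Carlo check of Theorem 5.3.2: v1 + v2 with v_i ~ chi^2(n_i, lambda_i)
# should be chi^2(n1 + n2, lambda1 + lambda2); R's ncp equals 2*lambda
set.seed(3)
m <- 200000
v <- rchisq(m, df = 3, ncp = 2 * 1) + rchisq(m, df = 5, ncp = 2 * 0.5)
q <- 10
p_hat    <- mean(v <= q)                     # empirical CDF at q
p_theory <- pchisq(q, df = 8, ncp = 2 * 1.5) # CDF of chi^2(8, lambda = 1.5)
stopifnot(abs(p_hat - p_theory) < 0.01)
```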

5.4 Noncentral F and t Distribution


Definition 5.4.1 (p.116): If y ∼ N(μ, 1) and u ∼ χ²(p) are independent random variables, then the probability distribution of t = y/√(u/p) is called a noncentral t distribution with p degrees of freedom and noncentrality parameter μ. We sometimes write t ∼ t(p, μ).


Definition 5.4.2 (p.116): If z ∼ N(0, 1) and u ∼ χ²(p) are independent random variables, then the probability distribution of t = z/√(u/p) is called a t distribution with p degrees of freedom and we write t ∼ t(p).

Compare this with Definition L13.2 from MATH 667.


Definition 5.4.3 (p.115): If u ∼ χ²(p, λ) and v ∼ χ²(q) are independent random variables, then the probability distribution of z = (u/p)/(v/q) is called a noncentral F distribution with p degrees of freedom in the numerator, q degrees of freedom in the denominator, and noncentrality parameter λ. We sometimes write z ∼ F(p, q, λ).

Definition 5.4.4 (p.114): If u ∼ χ²(p) and v ∼ χ²(q) are independent random variables, then the probability distribution of w = (u/p)/(v/q) is called an F distribution with p degrees of freedom in the numerator and q degrees of freedom in the denominator and we write w ∼ F(p, q).


R Example 5.4.1: Suppose that y1, …, y4 are iid N(1, 9) random variables, z1, …, z25 are iid N(0, 1) random variables, and (y1, …, y4) is independent of (z1, …, z25). Compute P(∑_{i=1}^4 yi² > ∑_{j=1}^25 zj²).

Answer: Here

P(∑_{i=1}^4 yi² > ∑_{j=1}^25 zj²) = P( [(1/4) ∑_{i=1}^4 (yi/3)²] / [(1/25) ∑_{j=1}^25 zj²] > 25/36 )

where ∑_{i=1}^4 (yi/3)² = (1/9) ∑_{i=1}^4 yi² ∼ χ²(4, λ = (1/2) ∑_{i=1}^4 (1/3)² = 2/9) by Definition 5.3.1 and ∑_{j=1}^25 zj² ∼ χ²(25) by Theorem 5.3.2 (which are independent since they are functions of independent random vectors). This probability can be computed using the R function pf as follows. The arguments specifying the degrees of freedom are df1 and df2, the noncentrality parameter is specified by ncp (except R's noncentrality parameter is μ⊤Aμ = 2λ), and the option lower.tail=FALSE tells R to compute the probability that the F-ratio is larger than 25/36.


pf(25/36,df1=4,df2=25,ncp=2*2/9,lower.tail=FALSE)

## [1] 0.6503005

We can simulate these sums many times using the rnorm function to verify that our answer looks reasonable.

set.seed(159847)
numberOfSimulations=10000000
leftSum=rep(0,numberOfSimulations)
rightSum=rep(0,numberOfSimulations)
for (i in 1:numberOfSimulations){
y=rnorm(4,mean=1,sd=3)
z=rnorm(25)
leftSum[i]=sum(y^2)
rightSum[i]=sum(z^2)
}
mean(leftSum > rightSum)

## [1] 0.6502779
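The loop above takes a while for ten million replicates. A vectorized version (an alternative sketch, not in the original notes, using fewer replicates) gives an equivalent estimate much faster by generating all draws at once and summing across rows.

```r
# Vectorized simulation of P(sum(y_i^2) > sum(z_j^2)); fewer replicates than above
set.seed(159847)
m  <- 200000
y2 <- rowSums(matrix(rnorm(4 * m, mean = 1, sd = 3), ncol = 4)^2)   # sum of y_i^2
z2 <- rowSums(matrix(rnorm(25 * m), ncol = 25)^2)                   # sum of z_j^2
mean(y2 > z2)   # close to the pf() value of about 0.6503
```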

5.5 Distribution of Quadratic Forms


Theorem 5.5.1 (p.117): Suppose y ∼ Np(μ, Σ), A is a symmetric matrix of constants with rank r, and λ = (1/2) μ⊤Aμ. Then y⊤Ay ∼ χ²(r, λ) if and only if AΣ is idempotent.
2

Proof: Let ω1, …, ωp be the eigenvalues of AΣ. Then the eigenvalues of I − 2tAΣ are 1 − 2tωi for i = 1, …, p. If we choose t small enough so that |2tωi| < 1 for all i, then

1/(1 − 2tωi) = 1 + ∑_{k=1}^∞ (2t)^k ωi^k

and

(I − 2tAΣ)^(−1) = I + ∑_{k=1}^∞ (2t)^k (AΣ)^k     (see p.50).

Since AΣ is idempotent, Theorem 2.13.2 implies that r of the ω's equal 1 and the other p − r ω's equal 0.
So, the moment generating function of y⊤Ay is

M_{y⊤Ay}(t) = det(I − 2tAΣ)^(−1/2) e^(−μ⊤(I − (I−2tAΣ)^(−1))Σ^(−1)μ/2)
= (∏_{i=1}^p (1 − 2tωi))^(−1/2) e^(−μ⊤(−∑_{k=1}^∞ (2t)^k AΣ)Σ^(−1)μ/2)
= ((1 − 2t)^r)^(−1/2) e^(−(μ⊤Aμ/2)(−∑_{k=1}^∞ (2t)^k))
= (1 − 2t)^(−r/2) e^(−(μ⊤Aμ/2)(1 − 1/(1−2t)))

which is the moment generating function of a χ²(r, λ = μ⊤Aμ/2) random variable (see Theorem 5.3.1).

For a proof of the converse statement, see http://www.tandfonline.com/doi/pdf/10.1080/00031305.1999.10474473.

Example 5.5.1: Suppose that y1, …, yn is a random sample from a N(μ, σ²) distribution. Show that

∑_{i=1}^n (yi − ȳ)² / σ² ∼ χ²(n − 1).
Answer: Here y = (y1, …, yn)⊤ ∼ Nn(μj, σ²I) and
∑_{i=1}^n (yi − ȳ)² / σ² = y⊤ [(1/σ²)(I − (1/n)J)] y.

By Theorem 5.1.1(a), I − (1/n)J is idempotent, so all of its eigenvalues are either 0 or 1 and its rank equals the number of eigenvalues which are 1. The sum of the eigenvalues of I − (1/n)J is

tr(I − (1/n)J) = tr(I) − (1/n) tr(J) = n − (1/n)·n = n − 1,

so rank(I − (1/n)J) = n − 1. The noncentrality parameter is

λ = (1/2)(μj)⊤ [(1/σ²)(I − (1/n)J)] (μj)
= (μ²/(2σ²)) j⊤(I − (1/n)J)j
= (μ²/(2σ²)) (j⊤j − (1/n) j⊤jj⊤j)
= (μ²/(2σ²)) (n − (1/n)·n²)
= 0.

So, by Theorem 5.5.1, ∑_{i=1}^n (yi − ȳ)² / σ² is a chi-square random variable with rank(I − (1/n)J) = n − 1 degrees of freedom.
Compare this with the proof of Theorem L4.1(c) from MATH 667.
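The result can be checked by simulation (an added sketch; the choices of n, μ, and σ are arbitrary): the scaled sum of squares should have mean n − 1 and match the χ²(n − 1) CDF.

```r
# Monte Carlo check that sum((y - ybar)^2)/sigma^2 ~ chi^2(n - 1)
set.seed(4)
n <- 6; mu <- 2; sigma <- 3
m <- 200000
y  <- matrix(rnorm(n * m, mean = mu, sd = sigma), ncol = n)  # m samples of size n
ss <- rowSums((y - rowMeans(y))^2) / sigma^2                 # scaled sum of squares

stopifnot(abs(mean(ss) - (n - 1)) < 0.05)                    # mean is n - 1
stopifnot(abs(mean(ss <= 5) - pchisq(5, df = n - 1)) < 0.01) # CDF matches chi^2(n-1)
```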

5.6 Independence of Linear Forms and Quadratic Forms


Theorem 5.6.1 (p.119): If y ∼ Np (μ, Σ) , B is a k × p matrix of constants, and A is a p × p matrix of
constants, then By and y⊤Ay are independent if and only if BΣA = O.

Theorem 5.6.2 (p.120): If y ∼ Np(μ, Σ) and A and B are p × p symmetric matrices of constants, then y⊤Ay and y⊤By are independent if and only if AΣB = O.

Example 5.6.1: Suppose y ∼ Np(μ, σ²I) and H is a p × p symmetric idempotent matrix of constants with rank r < p where μ⊤(I − H)μ = 0. What is the distribution of

(y⊤Hy/r) / (y⊤(I − H)y/(p − r)) ?

Answer: Note that (1/σ)y ∼ Np((1/σ)μ, I). Since H is idempotent with rank r and I − H is idempotent with rank p − r, Theorem 5.5.1 implies that

((1/σ)y)⊤ H ((1/σ)y) = (1/σ²) y⊤Hy ∼ χ²(r, (1/(2σ²)) μ⊤Hμ)

and

((1/σ)y)⊤ (I − H) ((1/σ)y) = (1/σ²) y⊤(I − H)y ∼ χ²(p − r) since μ⊤(I − H)μ = 0.

By Theorem 5.6.2, y⊤ Hy and y⊤ (I − H)y are independent since H(I − H) = O . So, by Definition
5.4.3, we see that

(y⊤Hy/r) / (y⊤(I − H)y/(p − r)) = [((1/σ²) y⊤Hy)/r] / [((1/σ²) y⊤(I − H)y)/(p − r)] ∼ F(r, p − r, (1/(2σ²)) μ⊤Hμ).
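A concrete check (an added sketch, not from the text) takes H to be the projection, or "hat", matrix of a small regression design, which is symmetric and idempotent; choosing μ in the column space of X makes μ⊤(I − H)μ = 0 hold exactly.

```r
# Check Example 5.6.1 with H the hat matrix of a small design (illustrative choice)
set.seed(5)
X <- cbind(1, 1:6)                     # 6 x 2 design matrix of rank 2
H <- X %*% solve(t(X) %*% X) %*% t(X)  # symmetric idempotent projection, rank r = 2
p <- 6; r <- 2
stopifnot(max(abs(H %*% H - H)) < 1e-10)            # H idempotent
stopifnot(max(abs(H %*% (diag(p) - H))) < 1e-10)    # H(I - H) = O

# With mu in the column space of X, mu'(I - H)mu = 0 and the ratio is noncentral F
mu <- X %*% c(1, 0.5); sigma <- 2
m <- 100000
ratio <- replicate(m, {
  y <- mu + sigma * rnorm(p)
  (sum(y * (H %*% y)) / r) / (sum(y * ((diag(p) - H) %*% y)) / (p - r))
})
lam <- c(t(mu) %*% H %*% mu) / (2 * sigma^2)        # lambda = mu'H mu / (2 sigma^2)
q <- 3
stopifnot(abs(mean(ratio <= q) - pf(q, r, p - r, ncp = 2 * lam)) < 0.02)
```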


Theorem 5.6.3 (p.121): Suppose y ∼ Nn(μ, σ²I), Ai is an n × n symmetric matrix of rank ri for i = 1, … k, and

y⊤y = ∑_{i=1}^k y⊤Ai y.

Then (1/σ²) y⊤Ai y ∼ χ²(ri, (1/(2σ²)) μ⊤Ai μ) for i = 1, …, k and y⊤A1y, …, y⊤Aky are mutually independent if and only if at least one of the following statements holds:

A1, …, Ak are idempotent matrices
Ai Aj = O for all i ≠ j
n = ∑_{i=1}^k ri
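The decomposition I = (1/n)J + (I − (1/n)J) from Section 5.1 gives a concrete k = 2 instance of this theorem; the sketch below (an addition to the notes) checks all three conditions and the defining identity y⊤y = y⊤A1y + y⊤A2y numerically.

```r
# Check Theorem 5.6.3 with A1 = (1/n)J, A2 = I - (1/n)J (a k = 2 instance)
n  <- 5
A1 <- matrix(1, n, n) / n
A2 <- diag(n) - A1

stopifnot(max(abs(A1 %*% A1 - A1)) < 1e-12)  # A1 idempotent
stopifnot(max(abs(A2 %*% A2 - A2)) < 1e-12)  # A2 idempotent
stopifnot(max(abs(A1 %*% A2)) < 1e-12)       # A1 A2 = O
r1 <- round(sum(diag(A1)))                   # rank = trace for idempotent matrices
r2 <- round(sum(diag(A2)))
stopifnot(r1 + r2 == n)                      # n = r1 + r2

y <- rnorm(n)                                # identity holds for every y
stopifnot(abs(sum(y^2) - (c(y %*% A1 %*% y) + c(y %*% A2 %*% y))) < 1e-12)
```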
