
10/3/2020 Chapter 5: Distribution of Quadratic Forms

Chapter 5: Distribution of Quadratic Forms


Notes for MATH 668 based on Linear Models in Statistics by Alvin C. Rencher and G.
Bruce Schaalje, second edition, Wiley, 2008.
January 30, 2018

5.1 Sums of Squares


In this chapter, we consider the distribution of quadratic forms y⊤Ay = ∑_i ∑_j a_ij y_i y_j where y = (y_i) is a random vector and A = (a_ij) is a matrix of constants.


Recall that I is the identity matrix, j is a vector of 1's, and J is a matrix of 1's.
In this section, suppose that they are n-dimensional.
Here are some basic univariate statistics in matrix form.
ȳ = (1/n) ∑_{i=1}^n y_i = (1/n) j⊤y

ȳ² = (1/n²)(j⊤y)(j⊤y) = (1/n²)(y⊤j)(j⊤y) = (1/n²) y⊤(jj⊤)y = (1/n²) y⊤Jy

∑_{i=1}^n (y_i − ȳ)² = ∑_{i=1}^n y_i² − nȳ² = y⊤Iy − (1/n) y⊤Jy = y⊤(I − (1/n)J)y

The following decomposition is very useful:

I = (I − (1/n)J) + (1/n)J.

Theorem 5.1.1 (p.106):


(a) I, I − (1/n)J, and (1/n)J are idempotent.
(b) (I − (1/n)J)((1/n)J) = O

Proof: (a) We see that


I² = I,

((1/n)J)((1/n)J) = (1/n²)(JJ) = (1/n²)(nJ) = (1/n)J,

(I − (1/n)J)(I − (1/n)J) = I² − (2/n)J + ((1/n)J)² = I − (2/n)J + (1/n)J = I − (1/n)J,

and (b) (I − (1/n)J)((1/n)J) = (1/n)J − ((1/n)J)² = (1/n)J − (1/n)J = O.
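These identities are easy to sanity-check numerically. The sketch below (an addition to these notes, with n = 5 as an arbitrary choice) verifies both parts of Theorem 5.1.1 up to floating-point error.

```r
# Verify Theorem 5.1.1 numerically for n = 5 (an arbitrary choice)
n <- 5
I <- diag(n)
J <- matrix(1, n, n)      # n x n matrix of 1's
C <- I - J/n              # the centering matrix I - (1/n)J

stopifnot(max(abs(I %*% I - I)) < 1e-12)            # I is idempotent
stopifnot(max(abs((J/n) %*% (J/n) - J/n)) < 1e-12)  # (1/n)J is idempotent
stopifnot(max(abs(C %*% C - C)) < 1e-12)            # I - (1/n)J is idempotent
stopifnot(max(abs(C %*% (J/n))) < 1e-12)            # (b): the product is O
```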

5.2 Mean and Variance of Quadratic Forms


Theorem 5.2.1 (p.107): If A is an n × n matrix of constants and y is an n-dimensional random vector such
that E(y) = μ and cov(y) = Σ, then

E(y⊤Ay) = tr(AΣ) + μ⊤Aμ.

Proof: Since Σ = E(yy⊤) − μμ⊤, it follows that E(yy⊤) = Σ + μμ⊤ so that

www.math.louisville.edu/~rsgill01/668/Ch_5_Notes.html 1/7
E(y⊤Ay) = E(tr(y⊤Ay))
= E(tr(Ayy⊤))
= tr(E(Ayy⊤))
= tr(A E(yy⊤))
= tr(A(Σ + μμ⊤))
= tr(AΣ + Aμμ⊤)
= tr(AΣ) + tr(Aμμ⊤)
= tr(AΣ) + tr(μ⊤Aμ)
= tr(AΣ) + μ⊤Aμ.
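As a sanity check (not part of the original notes), Theorem 5.2.1 can be verified by simulation for an arbitrary choice of A, μ, and Σ; the tolerance below reflects Monte Carlo error.

```r
# Monte Carlo check of E(y'Ay) = tr(A Sigma) + mu'A mu (illustrative A, mu, Sigma)
set.seed(1)
mu    <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
A     <- matrix(c(1, 0.5, 0.5, 2), 2, 2)

L <- t(chol(Sigma))                     # Sigma = L L'
m <- 200000
z <- matrix(rnorm(2 * m), nrow = 2)
y <- mu + L %*% z                       # columns of y are draws from N2(mu, Sigma)

q_form <- colSums(y * (A %*% y))        # y'Ay for each simulated y
lhs <- mean(q_form)
rhs <- sum(diag(A %*% Sigma)) + c(t(mu) %*% A %*% mu)
stopifnot(abs(lhs - rhs) < 0.3)         # Monte Carlo agreement
```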

Theorem 5.2.2 (p.111): If A is an m × n matrix of constants, and x and y are m- and n-dimensional random vectors such that

  E ( x )   ( μx )             ( x )   ( Σxx  Σxy )
    ( y ) = ( μy )   and  cov  ( y ) = ( Σyx  Σyy ),

then

E(x⊤Ay) = tr(AΣyx) + μx⊤Aμy.

Example 5.2.1: Suppose that (x1, y1), …, (xn, yn) is a random sample such that

  E ( xi )   ( μx )             ( xi )   ( σx²  σxy )
    ( yi ) = ( μy )   and  cov  ( yi ) = ( σxy  σy² ),

and let sxy = (1/(n−1)) ∑_{i=1}^n (xi − x̄)(yi − ȳ). Show that E(sxy) = σxy.
Answer: Let x = (x1, …, xn)⊤ and y = (y1, …, yn)⊤. Then

  E ( x )   ( μx j )             ( x )   ( σx² I  σxy I )
    ( y ) = ( μy j )   and  cov  ( y ) = ( σxy I  σy² I ).

Since

sxy = (1/(n−1)) x⊤(I − (1/n)J) y,

Theorem 5.2.2 implies that

E(sxy) = (1/(n−1)) {tr((I − (1/n)J) σxy I) + (μx j)⊤(I − (1/n)J)(μy j)}
= (1/(n−1)) {σxy tr(I − (1/n)J) + μx μy j⊤(I − (1/n) jj⊤) j}
= (1/(n−1)) {σxy (tr(I) − (1/n) tr(J)) + μx μy (j⊤j − (1/n) j⊤jj⊤j)}
= (1/(n−1)) {σxy (n − (1/n)·n) + μx μy (n − (1/n)·n²)}
= (1/(n−1)) {σxy (n − 1) + 0}
= σxy.

Theorem 5.2.3 (p.108): If A is a p × p matrix of constants and y ∼ Np (μ, Σ) , then the moment
generating function of y⊤ Ay is

M_{y⊤Ay}(t) = det(I − 2tAΣ)^(−1/2) e^(−μ⊤(I − (I−2tAΣ)^(−1))Σ^(−1)μ/2).

Theorem 5.2.4 (p.109): If A is a p × p symmetric matrix of constants and y ∼ Np (μ, Σ) , then

var(y⊤Ay) = 2 tr(AΣAΣ) + 4μ⊤AΣAμ.

Theorem 5.2.5 (p.110): If A is a p × p symmetric matrix of constants and y ∼ Np (μ, Σ) , then


cov(y, y Ay) = 2ΣAμ.

Proof: We have

cov(y, y⊤Ay) = E[(y − μ)(y⊤Ay − tr(AΣ) − μ⊤Aμ)]
= E[(y − μ)((y − μ)⊤A(y − μ) + 2(y − μ)⊤Aμ − tr(AΣ))]
= E[(y − μ)(y − μ)⊤A(y − μ)] + 2E[(y − μ)(y − μ)⊤]Aμ − E(y − μ) tr(AΣ)
= E[(y − μ)(y − μ)⊤A(y − μ)] + 2ΣAμ − 0
= E(Σ^(1/2) zz⊤ Σ^(1/2) A Σ^(1/2) z) + 2ΣAμ

where z = Σ^(−1/2)(y − μ) ∼ Np(0, I). Letting B = Σ^(1/2) A Σ^(1/2), it follows that

E[zz⊤Bz] = E[(z1, …, zp)⊤ ∑_{i=1}^p ∑_{j=1}^p bij zi zj],

so the kth component of E[zz⊤Bz] is E(∑_{i=1}^p ∑_{j=1}^p bij zk zi zj) and, for any k ∈ {1, …, p},

E(∑_{i=1}^p ∑_{j=1}^p bij zk zi zj) = ∑_{i=1}^p ∑_{j=1}^p bij E(zk zi zj)
= bkk E(zk³)
= bkk ∫_{−∞}^{∞} z³ (1/√(2π)) e^(−z²/2) dz
= 0.

Thus, E(Σ^(1/2) zz⊤ Σ^(1/2) A Σ^(1/2) z) = 0, which implies that cov(y, y⊤Ay) = 2ΣAμ.
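Theorems 5.2.4 and 5.2.5 can likewise be checked by simulation; the sketch below (an addition with arbitrary choices of A, μ, and Σ, not from the text) compares Monte Carlo estimates to both formulas.

```r
# Monte Carlo check of var(y'Ay) = 2 tr((A Sigma)^2) + 4 mu'A Sigma A mu
# and cov(y, y'Ay) = 2 Sigma A mu (illustrative A, mu, Sigma)
set.seed(2)
mu    <- c(1, -1)
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
A     <- matrix(c(1, 0.5, 0.5, 2), 2, 2)

L <- t(chol(Sigma))
m <- 500000
y <- mu + L %*% matrix(rnorm(2 * m), nrow = 2)
q_form <- colSums(y * (A %*% y))        # y'Ay for each simulated y

AS <- A %*% Sigma
v_theory <- 2 * sum(diag(AS %*% AS)) + 4 * c(t(mu) %*% AS %*% A %*% mu)
stopifnot(abs(var(q_form) - v_theory) / v_theory < 0.05)   # Theorem 5.2.4

cov_theory <- c(2 * Sigma %*% A %*% mu)                    # Theorem 5.2.5
cov_hat <- c(cov(y[1, ], q_form), cov(y[2, ], q_form))
stopifnot(max(abs(cov_hat - cov_theory)) < 0.2)
```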

5.3 Noncentral Chi-Square Distribution


Definition 5.3.1 (p.113): If y1, …, yn are independent N(μi, 1) random variables, then the probability distribution of v = ∑_{i=1}^n yi² = y⊤y is called a noncentral chi-square distribution with n degrees of freedom and noncentrality parameter λ = (1/2) ∑_{i=1}^n μi² = μ⊤μ/2. We sometimes write v ∼ χ²(n, λ).

Definition 5.3.2 (p.112): If y1, …, yn are independent N(0, 1) random variables, then the probability distribution of v = ∑_{i=1}^n yi² = y⊤y is called a chi-square distribution with n degrees of freedom and we can write v ∼ χ²(n).
Theorem 5.3.1 (p.114): If v ∼ χ²(n, λ), then

E(v) = n + 2λ,
var(v) = 2n + 8λ,
Mv(t) = (1 − 2t)^(−n/2) e^(−λ[1 − 1/(1−2t)]).

Proof: These statements follow from Theorem 5.2.1, Theorem 5.2.4, and Theorem 5.2.3, respectively. For
instance, with A = Σ = I , Theorem 5.2.4 gives

var(v) = 2 tr(I) + 4μ⊤μ = 2n + 4(2λ) = 2n + 8λ.
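Since R's noncentrality parameter ncp equals 2λ in the notation of these notes, the mean and variance formulas can be checked against R's dchisq density by numerical integration; the block below is an added sketch with arbitrary n and λ.

```r
# Check E(v) = n + 2*lambda and var(v) = 2n + 8*lambda numerically,
# using R's dchisq (whose ncp parameter equals 2*lambda in these notes)
n <- 4; lambda <- 0.5
m1 <- integrate(function(x) x   * dchisq(x, df = n, ncp = 2 * lambda), 0, Inf)$value
m2 <- integrate(function(x) x^2 * dchisq(x, df = n, ncp = 2 * lambda), 0, Inf)$value
stopifnot(abs(m1 - (n + 2 * lambda)) < 1e-4)              # E(v)
stopifnot(abs((m2 - m1^2) - (2 * n + 8 * lambda)) < 1e-3) # var(v)
```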


Theorem 5.3.2 (p.114): If v1, …, vk are independent χ²(ni, λi) random variables, then

∑_{i=1}^k vi ∼ χ²(∑_{i=1}^k ni, ∑_{i=1}^k λi).

Proof: By Theorem 4.3.3(b), the moment generating function of ∑_{i=1}^k vi is

M(t) = ∏_{i=1}^k (1 − 2t)^(−ni/2) e^(−λi[1 − 1/(1−2t)])
= (1 − 2t)^(−∑_{i=1}^k ni/2) e^(−∑_{i=1}^k λi[1 − 1/(1−2t)]).

This is the moment generating function of a χ²(∑_{i=1}^k ni, ∑_{i=1}^k λi) distribution so the result holds based on Theorem 4.3.3(a).
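A quick simulation (an added sketch; the parameter choices are arbitrary, and recall R's ncp = 2λ) confirms that a sum of independent noncentral chi-squares matches the stated distribution.

```r
# Monte Carlo check of Theorem 5.3.2: v1 + v2 with v_i ~ chi^2(n_i, lambda_i)
# should be chi^2(n1 + n2, lambda1 + lambda2); R's ncp equals 2*lambda
set.seed(3)
m <- 200000
v <- rchisq(m, df = 3, ncp = 2 * 1) + rchisq(m, df = 5, ncp = 2 * 0.5)
q <- 10
p_hat    <- mean(v <= q)                     # empirical CDF at q
p_theory <- pchisq(q, df = 8, ncp = 2 * 1.5) # CDF of chi^2(8, lambda = 1.5)
stopifnot(abs(p_hat - p_theory) < 0.01)
```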

5.4 Noncentral F and t Distribution


Definition 5.4.1 (p.116): If y ∼ N(μ, 1) and u ∼ χ²(p) are independent random variables, then the probability distribution of t = y/√(u/p) is called a noncentral t distribution with p degrees of freedom and noncentrality parameter μ. We sometimes write t ∼ t(p, μ).


Definition 5.4.2 (p.116): If z ∼ N(0, 1) and u ∼ χ²(p) are independent random variables, then the probability distribution of t = z/√(u/p) is called a t distribution with p degrees of freedom and we write t ∼ t(p).

Compare this with Definition L13.2 from MATH 667.


Definition 5.4.3 (p.115): If u ∼ χ²(p, λ) and v ∼ χ²(q) are independent random variables, then the probability distribution of z = (u/p)/(v/q) is called a noncentral F distribution with p degrees of freedom in the numerator, q degrees of freedom in the denominator, and noncentrality parameter λ. We sometimes write z ∼ F(p, q, λ).

Definition 5.4.4 (p.114): If u ∼ χ²(p) and v ∼ χ²(q) are independent random variables, then the probability distribution of w = (u/p)/(v/q) is called an F distribution with p degrees of freedom in the numerator and q degrees of freedom in the denominator and we write w ∼ F(p, q).


R Example 5.4.1: Suppose that y1, …, y4 are iid N(1, 9) random variables, z1, …, z25 are iid N(0, 1) random variables, and (y1, …, y4) is independent of (z1, …, z25). Compute P(∑_{i=1}^4 yi² > ∑_{j=1}^25 zj²).

Answer: Here

P(∑_{i=1}^4 yi² > ∑_{j=1}^25 zj²) = P( [(1/4) ∑_{i=1}^4 (yi/3)²] / [(1/25) ∑_{j=1}^25 zj²] > 25/36 )

where ∑_{i=1}^4 (yi/3)² = (1/9) ∑_{i=1}^4 yi² ∼ χ²(4, λ = (1/2) ∑_{i=1}^4 (1/3)² = 2/9) by Definition 5.3.1 and ∑_{j=1}^25 zj² ∼ χ²(25) by Theorem 5.3.2 (which are independent since they are functions of independent random vectors). This probability can be computed using the R function pf as follows. The arguments specifying the degrees of freedom are df1 and df2, the noncentrality parameter is specified by ncp (except R's noncentrality parameter is μ⊤Aμ = 2λ), and the option lower.tail=FALSE tells R to compute the probability that the F-ratio is larger than 25/36.


pf(25/36,df1=4,df2=25,ncp=2*2/9,lower.tail=FALSE)

## [1] 0.6503005

We can simulate these sums many times using the rnorm function to verify that our answer looks reasonable.

set.seed(159847)
numberOfSimulations=10000000
leftSum=rep(0,numberOfSimulations)
rightSum=rep(0,numberOfSimulations)
for (i in 1:numberOfSimulations){
y=rnorm(4,mean=1,sd=3)
z=rnorm(25)
leftSum[i]=sum(y^2)
rightSum[i]=sum(z^2)
}
mean(leftSum > rightSum)

## [1] 0.6502779
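The loop above takes a while for ten million replicates. A vectorized version (an alternative sketch, not in the original notes, using fewer replicates) gives an equivalent estimate much faster by generating all draws at once and summing across rows.

```r
# Vectorized simulation of P(sum(y_i^2) > sum(z_j^2)); fewer replicates than above
set.seed(159847)
m  <- 200000
y2 <- rowSums(matrix(rnorm(4 * m, mean = 1, sd = 3), ncol = 4)^2)   # sum of y_i^2
z2 <- rowSums(matrix(rnorm(25 * m), ncol = 25)^2)                   # sum of z_j^2
mean(y2 > z2)   # close to the pf() value of about 0.6503
```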

5.5 Distribution of Quadratic Forms


Theorem 5.5.1 (p.117): Suppose y ∼ Np(μ, Σ), A is a symmetric matrix of constants with rank r, and λ = (1/2) μ⊤Aμ. Then y⊤Ay ∼ χ²(r, λ) if and only if AΣ is idempotent.
2

Proof: Let ω1, …, ωp be the eigenvalues of AΣ. Then the eigenvalues of I − 2tAΣ are 1 − 2tωi for i = 1, …, p. If we choose t small enough so that |2tωi| < 1 for all i, then

1/(1 − 2tωi) = 1 + ∑_{k=1}^∞ (2t)^k ωi^k

and

(I − 2tAΣ)^(−1) = I + ∑_{k=1}^∞ (2t)^k (AΣ)^k     (see p.50).

Since AΣ is idempotent, Theorem 2.13.2 implies that r of the ω's equal 1 and the other p − r ω's equal 0.
So, the moment generating function of y⊤Ay is

M_{y⊤Ay}(t) = det(I − 2tAΣ)^(−1/2) e^(−μ⊤(I − (I−2tAΣ)^(−1))Σ^(−1)μ/2)
= (∏_{i=1}^p (1 − 2tωi))^(−1/2) e^(−μ⊤(−∑_{k=1}^∞ (2t)^k AΣ)Σ^(−1)μ/2)
= ((1 − 2t)^r)^(−1/2) e^(−(μ⊤Aμ/2)(−∑_{k=1}^∞ (2t)^k))
= (1 − 2t)^(−r/2) e^(−(μ⊤Aμ/2)(1 − 1/(1−2t)))

which is the moment generating function of a χ²(r, λ = μ⊤Aμ/2) random variable (see Theorem 5.3.1).

For a proof of the converse statement, see http://www.tandfonline.com/doi/pdf/10.1080/00031305.1999.10474473.

Example 5.5.1: Suppose that y1, …, yn is a random sample from a N(μ, σ²) distribution. Show that

∑_{i=1}^n (yi − ȳ)² / σ² ∼ χ²(n − 1).
Answer: Here y = (y1, …, yn)⊤ ∼ Nn(μj, σ²I) and
∑_{i=1}^n (yi − ȳ)² / σ² = y⊤ [(1/σ²)(I − (1/n)J)] y.

By Theorem 5.1.1(a), I − (1/n)J is idempotent, so all of its eigenvalues are either 0 or 1 and its rank equals the number of eigenvalues which are 1. The sum of the eigenvalues of I − (1/n)J is

tr(I − (1/n)J) = tr(I) − (1/n) tr(J) = n − (1/n)·n = n − 1,

so rank(I − (1/n)J) = n − 1. The noncentrality parameter is

λ = (1/2)(μj)⊤ [(1/σ²)(I − (1/n)J)] (μj)
= (μ²/(2σ²)) j⊤(I − (1/n)J)j
= (μ²/(2σ²)) (j⊤j − (1/n) j⊤jj⊤j)
= (μ²/(2σ²)) (n − (1/n)·n²)
= 0.

So, by Theorem 5.5.1, ∑_{i=1}^n (yi − ȳ)² / σ² is a chi-square random variable with rank(I − (1/n)J) = n − 1 degrees of freedom.
Compare this with the proof of Theorem L4.1(c) from MATH 667.
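The result can be checked by simulation (an added sketch; the choices of n, μ, and σ are arbitrary): the scaled sum of squares should have mean n − 1 and match the χ²(n − 1) CDF.

```r
# Monte Carlo check that sum((y - ybar)^2)/sigma^2 ~ chi^2(n - 1)
set.seed(4)
n <- 6; mu <- 2; sigma <- 3
m <- 200000
y  <- matrix(rnorm(n * m, mean = mu, sd = sigma), ncol = n)  # m samples of size n
ss <- rowSums((y - rowMeans(y))^2) / sigma^2                 # scaled sum of squares

stopifnot(abs(mean(ss) - (n - 1)) < 0.05)                    # mean is n - 1
stopifnot(abs(mean(ss <= 5) - pchisq(5, df = n - 1)) < 0.01) # CDF matches chi^2(n-1)
```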

5.6 Independence of Linear Forms and Quadratic Forms


Theorem 5.6.1 (p.119): If y ∼ Np (μ, Σ) , B is a k × p matrix of constants, and A is a p × p matrix of
constants, then By and y⊤Ay are independent if and only if BΣA = O.

Theorem 5.6.2 (p.120): If y ∼ Np(μ, Σ) and A and B are p × p symmetric matrices of constants, then y⊤Ay and y⊤By are independent if and only if AΣB = O.

Example 5.6.1: Suppose y ∼ Np(μ, σ²I) and H is a p × p symmetric idempotent matrix of constants with rank r < p where μ⊤(I − H)μ = 0. What is the distribution of

(y⊤Hy/r) / (y⊤(I − H)y/(p − r)) ?

Answer: Note that (1/σ)y ∼ Np((1/σ)μ, I). Since H is idempotent with rank r and I − H is idempotent with rank p − r, Theorem 5.5.1 implies that

((1/σ)y)⊤ H ((1/σ)y) = (1/σ²) y⊤Hy ∼ χ²(r, (1/(2σ²)) μ⊤Hμ)

and

((1/σ)y)⊤ (I − H) ((1/σ)y) = (1/σ²) y⊤(I − H)y ∼ χ²(p − r) since μ⊤(I − H)μ = 0.

By Theorem 5.6.2, y⊤ Hy and y⊤ (I − H)y are independent since H(I − H) = O . So, by Definition
5.4.3, we see that

(y⊤Hy/r) / (y⊤(I − H)y/(p − r)) = [((1/σ²) y⊤Hy)/r] / [((1/σ²) y⊤(I − H)y)/(p − r)] ∼ F(r, p − r, (1/(2σ²)) μ⊤Hμ).
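A concrete check (an added sketch, not from the text) takes H to be the projection, or "hat", matrix of a small regression design, which is symmetric and idempotent; choosing μ in the column space of X makes μ⊤(I − H)μ = 0 hold exactly.

```r
# Check Example 5.6.1 with H the hat matrix of a small design (illustrative choice)
set.seed(5)
X <- cbind(1, 1:6)                     # 6 x 2 design matrix of rank 2
H <- X %*% solve(t(X) %*% X) %*% t(X)  # symmetric idempotent projection, rank r = 2
p <- 6; r <- 2
stopifnot(max(abs(H %*% H - H)) < 1e-10)            # H idempotent
stopifnot(max(abs(H %*% (diag(p) - H))) < 1e-10)    # H(I - H) = O

# With mu in the column space of X, mu'(I - H)mu = 0 and the ratio is noncentral F
mu <- X %*% c(1, 0.5); sigma <- 2
m <- 100000
ratio <- replicate(m, {
  y <- mu + sigma * rnorm(p)
  (sum(y * (H %*% y)) / r) / (sum(y * ((diag(p) - H) %*% y)) / (p - r))
})
lam <- c(t(mu) %*% H %*% mu) / (2 * sigma^2)        # lambda = mu'H mu / (2 sigma^2)
q <- 3
stopifnot(abs(mean(ratio <= q) - pf(q, r, p - r, ncp = 2 * lam)) < 0.02)
```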


Theorem 5.6.3 (p.121): Suppose y ∼ Nn(μ, σ²I), Ai is an n × n symmetric matrix of rank ri for i = 1, … k, and

y⊤y = ∑_{i=1}^k y⊤Ai y.

Then (1/σ²) y⊤Ai y ∼ χ²(ri, (1/(2σ²)) μ⊤Ai μ) for i = 1, …, k and y⊤A1y, …, y⊤Aky are mutually independent if and only if at least one of the following statements holds:

A1, …, Ak are idempotent matrices
Ai Aj = O for all i ≠ j
n = ∑_{i=1}^k ri
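The decomposition I = (1/n)J + (I − (1/n)J) from Section 5.1 gives a concrete k = 2 instance of this theorem; the sketch below (an addition to the notes) checks all three conditions and the defining identity y⊤y = y⊤A1y + y⊤A2y numerically.

```r
# Check Theorem 5.6.3 with A1 = (1/n)J, A2 = I - (1/n)J (a k = 2 instance)
n  <- 5
A1 <- matrix(1, n, n) / n
A2 <- diag(n) - A1

stopifnot(max(abs(A1 %*% A1 - A1)) < 1e-12)  # A1 idempotent
stopifnot(max(abs(A2 %*% A2 - A2)) < 1e-12)  # A2 idempotent
stopifnot(max(abs(A1 %*% A2)) < 1e-12)       # A1 A2 = O
r1 <- round(sum(diag(A1)))                   # rank = trace for idempotent matrices
r2 <- round(sum(diag(A2)))
stopifnot(r1 + r2 == n)                      # n = r1 + r2

y <- rnorm(n)                                # identity holds for every y
stopifnot(abs(sum(y^2) - (c(y %*% A1 %*% y) + c(y %*% A2 %*% y))) < 1e-12)
```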
