
Chapter 5: Distribution of Quadratic Forms

Notes for MATH 668 based on Linear Models in Statistics by Alvin C. Rencher and G. Bruce Schaalje, second
edition, Wiley, 2008.
January 30, 2018

5.1 Sums of Squares


In this chapter, we consider the distribution of quadratic forms y⊤Ay = ∑i ∑j aij yi yj, where y = (yi) is a random vector and A = (aij) is a matrix of constants.


Recall, I is the identity matrix, j is a vector of 1’s, and J is a matrix of 1’s.
In this section, suppose that they are n-dimensional.
Here are some basic univariate statistics in matrix form.
ȳ = (1/n) ∑_{i=1}^n yi = (1/n) j⊤y

ȳ² = (1/n²)(j⊤y)(j⊤y) = (1/n²)(y⊤j)(j⊤y) = (1/n²) y⊤(jj⊤)y = (1/n²) y⊤Jy

∑_{i=1}^n (yi − ȳ)² = ∑_{i=1}^n yi² − nȳ² = y⊤Iy − (1/n) y⊤Jy = y⊤(I − (1/n)J)y

The following decomposition is very useful:

I = (I − (1/n)J) + (1/n)J.

Theorem 5.1.1 (p.106):


(a) I, I − (1/n)J, and (1/n)J are idempotent.
(b) (I − (1/n)J)((1/n)J) = O.

Proof: (a) We see that

I² = I,

((1/n)J)((1/n)J) = (1/n²)(JJ) = (1/n²)(nJ) = (1/n)J,

(I − (1/n)J)(I − (1/n)J) = I² − (2/n)J + ((1/n)J)² = I − (2/n)J + (1/n)J = I − (1/n)J,

and (b) (I − (1/n)J)((1/n)J) = (1/n)J − ((1/n)J)² = (1/n)J − (1/n)J = O.
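These identities are easy to verify numerically. The following R snippet (my own sanity check, not part of the original notes; n = 5 is an arbitrary choice) confirms parts (a) and (b):

```r
# Check Theorem 5.1.1 numerically for n = 5.
n <- 5
I <- diag(n)            # identity matrix
J <- matrix(1, n, n)    # matrix of 1's
C <- I - J/n            # the centering matrix I - (1/n)J

all.equal(C %*% C, C)                      # (a): I - (1/n)J is idempotent
all.equal((J/n) %*% (J/n), J/n)            # (a): (1/n)J is idempotent
all.equal(C %*% (J/n), matrix(0, n, n))    # (b): the product is the zero matrix O
```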

5.2 Mean and Variance of Quadratic Forms


Theorem 5.2.1 (p.107): If A is an n × n matrix of constants and y is an n-dimensional random vector such that E(y) = μ and cov(y) = Σ, then

E(y⊤Ay) = tr(AΣ) + μ⊤Aμ.

Proof: Since Σ = E(yy⊤) − μμ⊤, it follows that E(yy⊤) = Σ + μμ⊤, so that

E(y⊤Ay) = E(tr(y⊤Ay))
= E(tr(Ayy⊤))
= tr(E(Ayy⊤))
= tr(AE(yy⊤))
= tr(A(Σ + μμ⊤))
= tr(AΣ + Aμμ⊤)
= tr(AΣ) + tr(Aμμ⊤)
= tr(AΣ) + tr(μ⊤Aμ)
= tr(AΣ) + μ⊤Aμ.
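The trace formula can be checked by simulation. In the sketch below (my own illustration; μ, Σ, and A are arbitrary choices), the Monte Carlo average of y⊤Ay is compared with tr(AΣ) + μ⊤Aμ:

```r
# Monte Carlo check of E(y'Ay) = tr(A Sigma) + mu'A mu.
# mu, Sigma, and A below are arbitrary illustrative choices.
set.seed(1)
p <- 3
mu <- c(1, 2, 3)
Sigma <- diag(p) + 0.5                              # positive definite covariance
A <- matrix(c(2, 1, 0, 1, 3, 1, 0, 1, 2), p, p)     # symmetric matrix of constants
nsim <- 100000
Z <- matrix(rnorm(nsim * p), nsim, p) %*% chol(Sigma)  # rows ~ N(0, Sigma)
Y <- sweep(Z, 2, mu, "+")                              # rows ~ N(mu, Sigma)
q <- rowSums((Y %*% A) * Y)                            # y'Ay for each simulated y

mean(q)                                                # Monte Carlo estimate
sum(diag(A %*% Sigma)) + drop(t(mu) %*% A %*% mu)      # tr(A Sigma) + mu'A mu = 60.5
```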

Theorem 5.2.2 (p.111): If A is an m × n matrix of constants, and x and y are m- and n-dimensional random vectors such that the stacked vector (x⊤, y⊤)⊤ has mean (μx⊤, μy⊤)⊤ and covariance matrix [Σxx Σxy; Σyx Σyy], then

E(x⊤Ay) = tr(AΣyx) + μx⊤Aμy.

Example 5.2.1: Suppose that (x1, y1), …, (xn, yn) is a random sample such that each pair (xi, yi)⊤ has mean (μx, μy)⊤ and covariance matrix [σx² σxy; σxy σy²], and let

sxy = (1/(n − 1)) ∑_{i=1}^n (xi − x̄)(yi − ȳ).

Show that E(sxy) = σxy.

Answer: Let x = (x1, …, xn)⊤ and y = (y1, …, yn)⊤. Then the stacked vector (x⊤, y⊤)⊤ has mean (μx j⊤, μy j⊤)⊤ and covariance matrix [σx² I σxy I; σxy I σy² I]. Since

sxy = (1/(n − 1)) x⊤(I − (1/n)J)y,

Theorem 5.2.2 implies that

E(sxy) = (1/(n − 1)) {tr((I − (1/n)J) σxy I) + (μx j)⊤(I − (1/n)J)(μy j)}
= (1/(n − 1)) {σxy tr(I − (1/n)J) + μx μy j⊤(I − (1/n)jj⊤)j}
= (1/(n − 1)) {σxy (tr(I) − (1/n)tr(J)) + μx μy (j⊤j − (1/n) j⊤jj⊤j)}
= (1/(n − 1)) {σxy (n − (1/n)n) + μx μy (n − (1/n)n²)}
= (1/(n − 1)) {σxy (n − 1) + 0}
= σxy.
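As a sanity check on this unbiasedness result, the following simulation (my own sketch; n, the means, and σxy are arbitrary choices, with unit marginal variances) estimates E(sxy). Note that R's cov() already uses the 1/(n − 1) divisor, so it computes sxy directly:

```r
# Monte Carlo check that E(s_xy) = sigma_xy (parameters chosen arbitrarily).
set.seed(2)
n <- 10; mu_x <- 1; mu_y <- -2; sigma_xy <- 0.6
nsim <- 50000
sxy <- replicate(nsim, {
  x <- rnorm(n)                                        # var(x_i) = 1
  y <- sigma_xy * x + sqrt(1 - sigma_xy^2) * rnorm(n)  # var(y_i) = 1, cov = sigma_xy
  cov(mu_x + x, mu_y + y)                              # R's cov() uses the 1/(n-1) divisor
})
mean(sxy)   # should be close to sigma_xy = 0.6
```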

Theorem 5.2.3 (p.108): If A is a p × p matrix of constants and y ∼ Np(μ, Σ), then the moment generating function of y⊤Ay is

My⊤Ay(t) = det(I − 2tAΣ)^{−1/2} e^{−μ⊤(I − (I − 2tAΣ)^{−1})Σ^{−1}μ/2}.

Theorem 5.2.4 (p.109): If A is a p × p symmetric matrix of constants and y ∼ Np(μ, Σ), then

var(y⊤Ay) = 2tr(AΣAΣ) + 4μ⊤AΣAμ.

Theorem 5.2.5 (p.110): If A is a p × p symmetric matrix of constants and y ∼ Np(μ, Σ), then

cov(y, y⊤Ay) = 2ΣAμ.

Proof: We have

cov(y, y⊤Ay) = E[(y − μ)(y⊤Ay − tr(AΣ) − μ⊤Aμ)]
= E[(y − μ)((y − μ)⊤A(y − μ) + 2(y − μ)⊤Aμ − tr(AΣ))]
= E[(y − μ)(y − μ)⊤A(y − μ)] + 2E[(y − μ)(y − μ)⊤]Aμ − E(y − μ)tr(AΣ)
= E[(y − μ)(y − μ)⊤A(y − μ)] + 2ΣAμ − 0
= E(Σ^{1/2}zz⊤Σ^{1/2}AΣ^{1/2}z) + 2ΣAμ

where z = Σ^{−1/2}(y − μ) ∼ Np(0, I). Letting B = Σ^{1/2}AΣ^{1/2}, it follows that

E[zz⊤Bz] = E[(z1, …, zp)⊤ ∑_{i=1}^p ∑_{j=1}^p bij zi zj],

so the kth element of E[zz⊤Bz] is, for any k ∈ {1, …, p},

E(∑_{i=1}^p ∑_{j=1}^p bij zk zi zj) = ∑_{i=1}^p ∑_{j=1}^p bij E(zk zi zj)
= bkk E(zk³)   (every term with (i, j) ≠ (k, k) vanishes since the zi are independent with mean zero)
= bkk ∫_{−∞}^{∞} z³ (1/√(2π)) e^{−z²/2} dz
= 0.

Thus, E(Σ^{1/2}zz⊤Σ^{1/2}AΣ^{1/2}z) = 0, which implies that cov(y, y⊤Ay) = 2ΣAμ.
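Both the variance and covariance formulas can be verified by simulation. The sketch below (my own illustration with arbitrary μ, Σ, and symmetric A) compares Monte Carlo estimates with 2tr(AΣAΣ) + 4μ⊤AΣAμ and 2ΣAμ; the numeric targets in the comments were computed by hand for these particular choices:

```r
# Monte Carlo check of var(y'Ay) = 2 tr(A Sigma A Sigma) + 4 mu'A Sigma A mu
# and cov(y, y'Ay) = 2 Sigma A mu (mu, Sigma, and A are arbitrary choices).
set.seed(3)
p <- 2
mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), p, p)
A <- matrix(c(1, 1, 1, 2), p, p)                        # symmetric
nsim <- 200000
Z <- matrix(rnorm(nsim * p), nsim, p) %*% chol(Sigma)   # rows ~ N(0, Sigma)
Y <- sweep(Z, 2, mu, "+")                               # rows ~ N(mu, Sigma)
q <- rowSums((Y %*% A) * Y)                             # y'Ay for each simulated y

var(q)           # theory: 2*tr(A Sigma A Sigma) + 4*mu'A Sigma A mu = 2*21.5 + 4*1 = 47
drop(cov(Y, q))  # theory: 2 Sigma A mu = (-1, -2)
```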

5.3 Noncentral Chi-Square Distribution


Definition 5.3.1 (p.113): If y1, …, yn are independent N(μi, 1) random variables, then the probability distribution of v = ∑_{i=1}^n yi² = y⊤y is called a noncentral chi-square distribution with n degrees of freedom and noncentrality parameter λ = (1/2)∑_{i=1}^n μi² = μ⊤μ/2. We sometimes write v ∼ χ²(n, λ).
n
Definition 5.3.2 (p.112): If y1, …, yn are independent N(0, 1) random variables, then the probability distribution of v = ∑_{i=1}^n yi² = y⊤y is called a chi-square distribution with n degrees of freedom and we write v ∼ χ²(n).
Theorem 5.3.1 (p.114): If v ∼ χ²(n, λ), then

E(v) = n + 2λ,
var(v) = 2n + 8λ,
Mv(t) = (1 − 2t)^{−n/2} e^{−λ[1−1/(1−2t)]}.

Proof: These statements follow from Theorem 5.2.1, Theorem 5.2.4, and Theorem 5.2.3, respectively. For instance, with A = Σ = I, Theorem 5.2.4 gives

var(v) = 2tr(I) + 4μ⊤μ = 2n + 4(2λ) = 2n + 8λ.
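These moments can be checked against R's noncentral chi-square generator. As noted later in this chapter, R parameterizes the noncentrality by ncp = 2λ, so the conversion below is needed (df and λ are arbitrary choices):

```r
# Check E(v) = n + 2*lambda and var(v) = 2*n + 8*lambda by simulation.
# R parameterizes the noncentral chi-square by ncp = 2*lambda.
set.seed(4)
n_df <- 5; lambda <- 1.5
v <- rchisq(200000, df = n_df, ncp = 2 * lambda)
mean(v)   # theory: n + 2*lambda = 8
var(v)    # theory: 2*n + 8*lambda = 22
```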

Theorem 5.3.2 (p.114): If v1, …, vk are independent χ²(ni, λi) random variables, then ∑_{i=1}^k vi ∼ χ²(∑_{i=1}^k ni, ∑_{i=1}^k λi).

Proof: By Theorem 4.3.3(b), the moment generating function of ∑_{i=1}^k vi is

M(t) = ∏_{i=1}^k (1 − 2t)^{−ni/2} e^{−λi[1−1/(1−2t)]}
= (1 − 2t)^{−∑_{i=1}^k ni/2} e^{−∑_{i=1}^k λi[1−1/(1−2t)]}.

This is the moment generating function of a χ²(∑_{i=1}^k ni, ∑_{i=1}^k λi) distribution, so the result holds based on Theorem 4.3.3(a).
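The additivity result can be illustrated numerically (my own sketch; the degrees of freedom and λ's are arbitrary, and again R's ncp equals 2λ):

```r
# Illustrate Theorem 5.3.2: independent chi^2(3, 0.5) and chi^2(4, 1) draws
# should sum to a chi^2(7, 1.5) variable (recall R's ncp = 2*lambda).
set.seed(5)
m <- 200000
v1 <- rchisq(m, df = 3, ncp = 2 * 0.5)
v2 <- rchisq(m, df = 4, ncp = 2 * 1.0)
s <- v1 + v2
# Compare the empirical CDF of the sum with pchisq at a few points:
rbind(mc    = sapply(c(5, 10, 15), function(x) mean(s <= x)),
      exact = pchisq(c(5, 10, 15), df = 7, ncp = 2 * 1.5))
```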

5.4 Noncentral F and t Distributions


Definition 5.4.1 (p.116): If y ∼ N(μ, 1) and u ∼ χ²(p) are independent random variables, then the probability distribution of t = y/√(u/p) is called a noncentral t distribution with p degrees of freedom and noncentrality parameter μ. We sometimes write t ∼ t(p, μ).
Definition 5.4.2 (p.116): If z ∼ N(0, 1) and u ∼ χ²(p) are independent random variables, then the probability distribution of t = z/√(u/p) is called a t distribution with p degrees of freedom and we write t ∼ t(p).

Compare this with Definition L13.2 from MATH 667.
Definition 5.4.3 (p.115): If u ∼ χ²(p, λ) and v ∼ χ²(q) are independent random variables, then the probability distribution of z = (u/p)/(v/q) is called a noncentral F distribution with p degrees of freedom in the numerator, q degrees of freedom in the denominator, and noncentrality parameter λ. We sometimes write z ∼ F(p, q, λ).
Definition 5.4.4 (p.114): If u ∼ χ²(p) and v ∼ χ²(q) are independent random variables, then the probability distribution of w = (u/p)/(v/q) is called an F distribution with p degrees of freedom in the numerator and q degrees of freedom in the denominator, and we write w ∼ F(p, q).

R Example 5.4.1: Suppose that y1, …, y4 are iid N(1, 9) random variables, z1, …, z25 are iid N(0, 1) random variables, and (y1, …, y4) is independent of (z1, …, z25). Compute P(∑_{i=1}^4 yi² > ∑_{j=1}^{25} zj²).

Answer: Here

P(∑_{i=1}^4 yi² > ∑_{j=1}^{25} zj²) = P( [(1/4)∑_{i=1}^4 (yi/3)²] / [(1/25)∑_{j=1}^{25} zj²] > 25/36 ),

where ∑_{i=1}^4 (yi/3)² = (1/9)∑_{i=1}^4 yi² ∼ χ²(4, λ = (1/2)∑_{i=1}^4 (1/3)² = 2/9) by Theorem 5.3.1 and ∑_{j=1}^{25} zj² ∼ χ²(25) by Theorem 5.3.2 (which are independent since they are functions of independent random vectors). This probability can be computed using the R function pf as follows. The arguments specifying the degrees of freedom are df1 and df2, the noncentrality parameter is specified by ncp (except R's noncentrality parameter is μ⊤Aμ = 2λ), and the option lower.tail=FALSE tells R to compute the probability that the F-ratio is larger than 25/36.

pf(25/36,df1=4,df2=25,ncp=2*2/9,lower.tail=FALSE)

## [1] 0.6503005

We can simulate these sums many times using the rnorm function to verify that our answer looks reasonable.

set.seed(159847)
numberOfSimulations=10000000
leftSum=rep(0,numberOfSimulations)
rightSum=rep(0,numberOfSimulations)
for (i in 1:numberOfSimulations){
y=rnorm(4,mean=1,sd=3)
z=rnorm(25)
leftSum[i]=sum(y^2)
rightSum[i]=sum(z^2)
}
mean(leftSum > rightSum)

## [1] 0.6502779

5.5 Distribution of Quadratic Forms


Theorem 5.5.1 (p.117): Suppose y ∼ Np(μ, Σ), A is a symmetric matrix of constants with rank r, and λ = (1/2)μ⊤Aμ. Then y⊤Ay ∼ χ²(r, λ) if and only if AΣ is idempotent.
Proof: Let ω1, …, ωp be the eigenvalues of AΣ. Then the eigenvalues of I − 2tAΣ are 1 − 2tωi for i = 1, …, p. If we choose t small enough so that |2tωi| < 1 for all i, then

1/(1 − 2tωi) = 1 + ∑_{k=1}^∞ (2t)^k ωi^k

and

(I − 2tAΣ)^{−1} = I + ∑_{k=1}^∞ (2t)^k (AΣ)^k (see p.50).

Since AΣ is idempotent, Theorem 2.13.2 implies that r of the ω's equal 1 and the other p − r ω's equal 0. So, the moment generating function of y⊤Ay is

My⊤Ay(t) = det(I − 2tAΣ)^{−1/2} e^{−μ⊤(I − (I − 2tAΣ)^{−1})Σ^{−1}μ/2}
= (∏_{i=1}^p (1 − 2tωi))^{−1/2} e^{−μ⊤(−∑_{k=1}^∞ (2t)^k AΣ)Σ^{−1}μ/2}
= ((1 − 2t)^r)^{−1/2} e^{−(μ⊤Aμ/2)(−∑_{k=1}^∞ (2t)^k)}   (using (AΣ)^k = AΣ and AΣΣ^{−1} = A)
= (1 − 2t)^{−r/2} e^{−(μ⊤Aμ/2)(1 − 1/(1 − 2t))},

which is the moment generating function of a χ²(r, λ = μ⊤Aμ/2) random variable (see Theorem 5.3.1).

For a proof of the converse statement, see http://www.tandfonline.com/doi/pdf/10.1080/00031305.1999.10474473.
Example 5.5.1: Suppose that y1, …, yn is a random sample from a N(μ, σ²) distribution. Show that ∑_{i=1}^n (yi − ȳ)²/σ² ∼ χ²(n − 1).

Answer: Here y = (y1, …, yn)⊤ ∼ Nn(μj, σ²I) and

∑_{i=1}^n (yi − ȳ)²/σ² = y⊤ [(I − (1/n)J)/σ²] y.

By Theorem 5.1.1(a), I − (1/n)J is idempotent, so all of its eigenvalues are either 0 or 1 and its rank equals the number of eigenvalues which are 1. The sum of the eigenvalues of I − (1/n)J is

tr(I − (1/n)J) = tr(I) − (1/n)tr(J) = n − (1/n)n = n − 1,

so rank(I − (1/n)J) = n − 1. The noncentrality parameter is

λ = (1/2)(μj)⊤ [(1/σ²)(I − (1/n)J)] (μj)
= (μ²/(2σ²)) j⊤(I − (1/n)J)j
= (μ²/(2σ²)) (j⊤j − (1/n)j⊤jj⊤j)
= (μ²/(2σ²)) (n − (1/n)n²)
= 0.

So, by Theorem 5.5.1, ∑_{i=1}^n (yi − ȳ)²/σ² is a chi-square random variable with rank(I − (1/n)J) = n − 1 degrees of freedom.

Compare this with the proof of Theorem L4.1(c) from MATH 667.
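A quick simulation (not in the original notes; n, μ, and σ² are arbitrary choices) illustrates the result by checking the first moment and the median of the scaled sum of squares against χ²(n − 1):

```r
# Monte Carlo check of Example 5.5.1 (n, mu, and sigma are arbitrary choices).
set.seed(6)
n <- 8; mu <- 3; sigma <- 2
m <- 100000
w <- replicate(m, {
  y <- rnorm(n, mean = mu, sd = sigma)
  sum((y - mean(y))^2) / sigma^2
})
mean(w)                         # theory: n - 1 = 7
mean(w <= qchisq(0.5, n - 1))   # theory: 0.5, since w ~ chi^2(n - 1)
```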

5.6 Independence of Linear Forms and Quadratic Forms


Theorem 5.6.1 (p.119): If y ∼ Np (μ, Σ) , B is a k × p matrix of constants, and A is a p × p matrix of constants, then By and y⊤ Ay
are independent if and only if BΣA = O.
Theorem 5.6.2 (p.120): If y ∼ Np (μ, Σ) and A and B are p × p symmetric matrices of constants, then y⊤ Ay and y⊤ By are
independent if and only if AΣB = O.
Example 5.6.1: Suppose y ∼ Np(μ, σ²I) and H is a p × p symmetric idempotent matrix of constants with rank r < p where (I − H)μ = 0. What is the distribution of [y⊤Hy/r] / [y⊤(I − H)y/(p − r)]?

Answer: Note that (1/σ)y ∼ Np((1/σ)μ, I). Since H is idempotent with rank r and I − H is idempotent with rank p − r, Theorem 5.5.1 implies that

((1/σ)y)⊤ H ((1/σ)y) = (1/σ²) y⊤Hy ∼ χ²(r, (1/(2σ²)) μ⊤Hμ)

and

((1/σ)y)⊤ (I − H) ((1/σ)y) = (1/σ²) y⊤(I − H)y ∼ χ²(p − r) since μ⊤(I − H)μ = 0.

By Theorem 5.6.2, y⊤Hy and y⊤(I − H)y are independent since H(I − H) = O. So, by Definition 5.4.3, we see that

[y⊤Hy/r] / [y⊤(I − H)y/(p − r)] = [((1/σ²)y⊤Hy)/r] / [((1/σ²)y⊤(I − H)y)/(p − r)] ∼ F(r, p − r, (1/(2σ²)) μ⊤Hμ).
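To illustrate (my own sketch), take H = (1/p)J, the rank-1 projection onto span(j), and μ proportional to j so that (I − H)μ = 0; all the numeric choices below are arbitrary. The empirical CDF of the ratio is compared with pf, again converting to R's ncp = 2λ:

```r
# Illustrate Example 5.6.1 with H = (1/p)J, the rank-1 projection onto span(j),
# and mu proportional to j so that (I - H)mu = 0 (all choices arbitrary).
set.seed(7)
p <- 6; sigma <- 2
mu <- rep(1.5, p)
H <- matrix(1/p, p, p)          # symmetric, idempotent, rank r = 1
m <- 100000
f <- replicate(m, {
  y <- rnorm(p, mean = mu, sd = sigma)
  num <- drop(t(y) %*% H %*% y) / 1                     # r = 1
  den <- drop(t(y) %*% (diag(p) - H) %*% y) / (p - 1)
  num / den
})
ncp <- drop(t(mu) %*% H %*% mu) / sigma^2   # R's ncp = mu'H mu / sigma^2 = 2*lambda
mean(f <= 2)                                # empirical CDF of the ratio at 2
pf(2, df1 = 1, df2 = p - 1, ncp = ncp)      # noncentral F CDF at 2
```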

Theorem 5.6.3 (p.121): Suppose y ∼ Nn(μ, σ²I), Ai is an n × n symmetric matrix of rank ri for i = 1, …, k, and

y⊤y = ∑_{i=1}^k y⊤Ai y.

Then (1/σ²) y⊤Ai y ∼ χ²(ri, (1/(2σ²)) μ⊤Ai μ) for i = 1, …, k and y⊤A1y, …, y⊤Aky are mutually independent if and only if at least one of the following statements holds:

A1, …, Ak are idempotent matrices
Ai Aj = O for all i ≠ j
n = ∑_{i=1}^k ri
