
Continuous Distributions

by R.J. Reed
These notes are based on handouts for lecture modules given jointly by myself and the late Jeff Harrison.
The following continuous distributions are covered: gamma, χ², normal, lognormal, beta, arcsine, t, Cauchy, F,
power, Laplace, Rayleigh, Weibull, Pareto, bivariate normal, multivariate normal, bivariate t, and multivariate t
distributions.
There are many additional theoretical results in the exercises—full solutions are provided at the end.
The colour red indicates a hyperlink.
Version 2: January 2019. Added section on quadratic forms.

Contents

1 Univariate Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3


Revision of some basic results: 3. Order statistics: 3. Exercises: 7. The uniform distribution: 9.
Exercises: 15. The exponential distribution: 16. Exercises: 19. The Gamma and χ2 distributions: 20.
Exercises: 24. The normal distribution: 25. Exercises: 28. The lognormal distribution: 29.
Exercises: 31. The beta and arcsine distributions: 32. Exercises: 35. The t, Cauchy and F
distributions: 36. Exercises: 39. Non-central distributions: 41. Exercises: 44. The power
and Pareto distributions: 45. Exercises: 48. Size, shape and related characterization theorems: 50.
Exercises: 54. Laplace, Rayleigh and Weibull distributions: 55. Exercises: 56.

2 Multivariate Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


General results: 59. Exercises: 63. The bivariate normal: 65. Exercises: 69. The multivariate
normal: 71. Exercises: 80. Quadratic forms of normal random variables: 82. Exercises: 95. The
bivariate t distribution: 97. The multivariate t distribution: 99. Exercises: 101. The Dirichlet
distribution: 101.

Appendix: Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103


Chapter 1. Exercises on page 6: 103. Exercises on page 15: 107. Exercises on page 19: 110. Exercises
on page 24: 113. Exercises on page 28: 115. Exercises on page 31: 118. Exercises on page 35: 119.
Exercises on page 39: 121. Exercises on page 44: 124. Exercises on page 48: 124. Exercises on page 54: 129.
Exercises on page 56: 130.
Chapter 2. Exercises on page 63: 132. Exercises on page 69: 133. Exercises on page 79: 137. Exercises
on page 95: 140. Exercises on page 101: 142.

Appendix: References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

Comments are welcome—even comments such as “not useful because it omits xxx”. Please send comments and details of
errors to my Wordpress account, bcgts.wordpress.com, where the most up-to-date version of these notes will be found.

CHAPTER 1

Univariate Continuous Distributions

1 Revision of some basic results


1.1 Conditional variance and expectation. For any random vector (X, Y) such that E[Y²] is finite, the conditional variance of Y given X is defined to be
    var(Y | X) = E[Y² | X] − (E[Y | X])² = E[ (Y − E[Y | X])² | X ]    (1.1a)
This is a function of X. It follows that
    E[ var(Y | X) ] + var( E[Y | X] ) = E[Y²] − (E[Y])² = var(Y)    (1.1b)
Equation(1.1b) is often called the Law of Total Variance and is probably best remembered in the following form:
var(Y ) = E[conditional variance] + var(conditional mean)
This is similar to the decomposition in the analysis of variance. A generalisation is given in exercise 10 on page 7.
Definition(1.1a). For any random vector (X, Y, Z) such that E[XY], E[X] and E[Y] are all finite, the conditional covariance between X and Y given Z is defined to be
    cov(X, Y | Z) = E[XY | Z] − E[X | Z] E[Y | Z]
An alternative definition is
    cov(X, Y | Z) = E[ (X − E[X | Z]) (Y − E[Y | Z]) | Z ]    (1.1c)
Note that cov(X, Y | Z) is a function of Z. Using the results cov(X, Y) = E[XY] − E[X]E[Y] and
    cov( E[X|Z], E[Y|Z] ) = E[ E[X|Z] E[Y|Z] ] − E[X]E[Y]
gives the Law of Total Covariance
    cov(X, Y) = E[ cov(X, Y | Z) ] + cov( E[X|Z], E[Y|Z] )    (1.1d)
This can be remembered as
cov(X, Y ) = E[conditional covariance] + cov(conditional means)
Note that setting X = Y in the Law of Total Covariance gives the Law of Total Variance.
1.2 Conditional independence. Recall that X and Y are conditionally independent given Z iff
Pr[X ≤ x, Y ≤ y | Z] = Pr[X ≤ x | Z] Pr[Y ≤ y | Z] a.e.
for all x ∈ R and all y ∈ R.
Example(1.2a). Conditional independence does not imply independence.
Here is a simple demonstration: suppose box 1 contains two fair coins and box 2 contains two coins which have heads on
both sides. A box is chosen at random—denote the result by Z. A coin is selected from the chosen box and tossed—denote
the result by X; then the other coin from the chosen box is tossed independently of the first coin—denote the result by Y .
Clearly X and Y are conditionally independent given Z. However
    Pr[X = H, Y = H] = 5/8   but   Pr[X = H] = Pr[Y = H] = 3/4
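To make these numbers concrete, here is a minimal Monte Carlo check of the example (an added sketch, not part of the original notes; it only assumes the Python standard library).

    import random

    def trial():
        # Z: choose box 1 (two fair coins) or box 2 (two double-headed coins) at random
        z = random.randint(1, 2)
        if z == 1:
            x = random.choice("HT")   # toss of the first fair coin
            y = random.choice("HT")   # independent toss of the second fair coin
        else:
            x = y = "H"               # both coins in box 2 always land heads
        return x, y

    n = 200_000
    results = [trial() for _ in range(n)]
    print(sum(x == "H" for x, _ in results) / n)                 # ≈ 3/4
    print(sum(x == "H" and y == "H" for x, y in results) / n)    # ≈ 5/8, not (3/4)² = 9/16

The second probability differs from (3/4)², confirming that X and Y are dependent even though they are conditionally independent given Z.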

2 Order statistics
2.1 Basics. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with a continuous distribution which has
distribution function F and density f . Then
X1:n , X2:n , . . . , Xn:n
denote the order statistics of X1 , X2 , . . . , Xn . This means
X1:n = min{X1 , . . . , Xn }
Xn:n = max{X1 , . . . , Xn }
and the random variables X1:n , X2:n , . . . , Xn:n consist of X1 , X2 , . . . , Xn arranged in increasing order; hence
X1:n ≤ X2:n ≤ · · · ≤ Xn:n


2.2 Finding the density of (X1:n , . . . , Xn:n ). Let g(y1 , . . . , yn ) denote the density of (X1:n , . . . , Xn:n ).
Note that (X1:n , . . . , Xn:n ) can be regarded as a transformation T of the vector (X1 , . . . , Xn ).
• Suppose n = 2. Let A1 = {(x1 , x2 ) ∈ R2 : x1 < x2 } and let T1 denote the restriction of T to A1 . Similarly let
A2 = {(x1 , x2 ) ∈ R2 : x1 > x2 } and let T2 denote the restriction of T to A2 . Clearly T1 : A1 → A1 is 1 − 1 and
T2 : A2 → A1 is 1 − 1. Hence for all (y1 , y2 ) ∈ A1 (i.e. for all y1 < y2 ), the density g(y1 , y2 ) of (X1:2 , X2:2 ) is
    g(y1, y2) = f_{X1,X2}( T1⁻¹(y1, y2) ) / |∂(y1,y2)/∂(x1,x2)| + f_{X1,X2}( T2⁻¹(y1, y2) ) / |∂(y1,y2)/∂(x1,x2)|
              = f_{X1,X2}(y1, y2)/|1| + f_{X1,X2}(y2, y1)/|−1|
              = 2 f(y1) f(y2)
• Suppose n = 3. For this case, we need A1 , A2 , A3 , A4 , A5 and A6 where
A1 = {(x1 , x2 , x3 ) ∈ R3 : x1 < x2 < x3 }
A2 = {(x1 , x2 , x3 ) ∈ R3 : x1 < x3 < x2 }
etc. There are 3! = 6 orderings of (x1 , x2 , x3 ). So this leads to
g(y1 , y2 , y3 ) = 3!f (y1 )f (y2 )f (y3 )
• For the general case of n ≥ 2, we have
g(y1 , . . . , yn ) = n!f (y1 ) · · · f (yn ) for y1 < · · · < yn . (2.2a)

2.3 Finding the distribution of Xr:n by using distribution functions. Dealing with the maximum is easy:
    F_{n:n}(x) = P[Xn:n ≤ x] = P[X1 ≤ x, . . . , Xn ≤ x] = Π_{i=1}^{n} P[Xi ≤ x] = {F(x)}^n
    f_{n:n}(x) = n f(x) {F(x)}^{n−1}
and provided the random variables are positive, using the result of exercise 5 on page 7 gives
    E[Xn:n] = ∫_0^∞ [ 1 − {F(x)}^n ] dx
Now for the minimum, X1:n:
    P[X1:n > x] = P[X1 > x, . . . , Xn > x] = Π_{i=1}^{n} P[Xi > x] = {1 − F(x)}^n
    F_{1:n}(x) = 1 − P[X1:n > x] = 1 − {1 − F(x)}^n
    f_{1:n}(x) = n f(x) {1 − F(x)}^{n−1}
and provided the random variables are positive, using the result of exercise 5 on page 7 gives
    E[X1:n] = ∫_0^∞ {1 − F(x)}^n dx
Now for the general case, Xr:n where 2 ≤ r ≤ n − 1. The event {Xr:n ≤ x} occurs iff at least r random variables from X1, . . . , Xn are less than or equal to x. Hence
    P[Xr:n ≤ x] = Σ_{j=r}^{n} C(n,j) {F(x)}^j {1 − F(x)}^{n−j}    (2.3a)
                = Σ_{j=r}^{n−1} C(n,j) {F(x)}^j {1 − F(x)}^{n−j} + {F(x)}^n
where C(n,j) = n!/( j!(n−j)! ) denotes the binomial coefficient. Differentiating gives
    f_{r:n}(x) = Σ_{j=r}^{n−1} C(n,j) j f(x) {F(x)}^{j−1} {1 − F(x)}^{n−j}
                 − Σ_{j=r}^{n−1} C(n,j) (n − j) f(x) {F(x)}^j {1 − F(x)}^{n−j−1} + n f(x) {F(x)}^{n−1}
               = Σ_{j=r}^{n} [ n!/( (j−1)!(n−j)! ) ] f(x) {F(x)}^{j−1} {1 − F(x)}^{n−j}
                 − Σ_{j=r}^{n−1} [ n!/( j!(n−j−1)! ) ] f(x) {F(x)}^j {1 − F(x)}^{n−j−1}
               = [ n!/( (r−1)!(n−r)! ) ] f(x) {F(x)}^{r−1} {1 − F(x)}^{n−r}    (2.3b)
Note that equation(2.3b) is true for all r = 1, 2, . . . , n.
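As a quick sanity check (an added sketch, not from the original notes, assuming NumPy and SciPy are available), the closed form (2.3b) can be compared with a simulated histogram of the r-th order statistic; the standard normal is used purely as an illustrative choice of F.

    import numpy as np
    from math import factorial
    from scipy import stats

    n, r = 7, 3                      # sample size and which order statistic (illustrative choice)

    def f_rn(x):
        # density (2.3b), here with f and F taken to be the standard normal pdf and cdf
        c = factorial(n) / (factorial(r - 1) * factorial(n - r))
        F = stats.norm.cdf(x)
        return c * stats.norm.pdf(x) * F**(r - 1) * (1 - F)**(n - r)

    rng = np.random.default_rng(0)
    samples = np.sort(rng.standard_normal((200_000, n)), axis=1)[:, r - 1]   # simulated X_{r:n}
    for x in (-1.0, -0.5, 0.0, 0.5):
        emp = np.mean(np.abs(samples - x) < 0.05) / 0.10                     # crude density estimate
        print(f"x={x:+.1f}  formula={f_rn(x):.3f}  simulation≈{emp:.3f}")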
2.4 Finding the distribution of Xr:n by using the density of (X1:n , . . . , Xn:n ). Recall that the density of
(X1:n , . . . , Xn:n ) is g(y1 , . . . , yn ) = n!f (y1 ) · · · f (yn ) for y1 < · · · < yn .
Integrating out yn gives
    g(y1, . . . , yn−1) = n! f(y1) · · · f(yn−1) ∫_{yn−1}^∞ f(yn) dyn = n! f(y1) · · · f(yn−1) [1 − F(yn−1)]
and hence
    g(y1, . . . , yn−2) = n! f(y1) · · · f(yn−2) ∫_{yn−2}^∞ f(yn−1) [1 − F(yn−1)] dyn−1 = n! f(y1) · · · f(yn−2) [1 − F(yn−2)]²/2!
    g(y1, . . . , yn−3) = n! f(y1) · · · f(yn−3) [1 − F(yn−3)]³/3!
and by induction for r = 1, 2, . . . , n − 1
    g(y1, . . . , yr) = n! f(y1) · · · f(yr) [1 − F(yr)]^{n−r}/(n − r)!   for y1 < y2 < · · · < yr.
Assuming r ≥ 3 and integrating over y1 gives
    g(y2, . . . , yr) = n! ∫_{−∞}^{y2} f(y1) · · · f(yr) [1 − F(yr)]^{n−r}/(n − r)! dy1 = n! F(y2) f(y2) · · · f(yr) [1 − F(yr)]^{n−r}/(n − r)!
Integrating over y2 gives
    g(y3, . . . , yr) = n! [F(y3)]²/2! · f(y3) · · · f(yr) [1 − F(yr)]^{n−r}/(n − r)!   for y3 < · · · < yr.
And so on, leading to equation(2.3b).
2.5 Joint distribution of ( Xj:n , Xr:n ) by using the density of (X1:n , . . . , Xn:n ). Suppose X1:n , . . . , Xn:n
denote the order statistics from the n random variables X1 , . . . , Xn which have density f (x) and distribution
function F(x). Suppose 1 ≤ j < r ≤ n; then the joint density of (Xj:n, Xr:n) is
    f_{(j:n,r:n)}(u, v) = c f(u) f(v) [F(u)]^{j−1} [F(v) − F(u)]^{r−1−j} [1 − F(v)]^{n−r}    (2.5a)
where
    c = n! / ( (j−1)! (r−1−j)! (n−r)! )
The method used to derive this result is the same as that used to derive the distribution of Xr:n in the previous
paragraph.
Example(2.5a). Suppose X1 , . . . , Xn are i.i.d. random variables with density f (x) and distribution function F (x). Find
expressions for the density and distribution function of Rn = Xn:n − X1:n , the range of X1 , . . . , Xn .
Solution. The density of (X1:n, Xn:n) is
    f_{(1:n,n:n)}(u, v) = n(n−1) f(u) f(v) [F(v) − F(u)]^{n−2}   for u < v.
Now use the transformation R = Xn:n − X1:n and T = X1:n. The absolute value of the Jacobian is one. Hence
    f_{(R,T)}(r, t) = n(n−1) f(t) f(r + t) [F(r + t) − F(t)]^{n−2}   for r > 0 and t ∈ R.
Integrating out T gives
    f_R(r) = n(n−1) ∫_{t=−∞}^{∞} f(t) f(r + t) [F(r + t) − F(t)]^{n−2} dt
The distribution function is, for v > 0,
    F_R(v) = n(n−1) ∫_{r=0}^{v} ∫_{t=−∞}^{∞} f(t) f(r + t) [F(r + t) − F(t)]^{n−2} dt dr
           = n(n−1) ∫_{t=−∞}^{∞} f(t) ∫_{r=0}^{v} f(r + t) [F(r + t) − F(t)]^{n−2} dr dt
           = n ∫_{t=−∞}^{∞} f(t) [F(r + t) − F(t)]^{n−1} |_{r=0}^{r=v} dt = n ∫_{t=−∞}^{∞} f(t) [F(v + t) − F(t)]^{n−1} dt
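For a concrete check (an added sketch, not in the original notes, assuming NumPy is available): with the U(0, 1) distribution the integral simplifies to f_R(r) = n(n−1) r^{n−2}(1 − r) for 0 < r < 1, which is easy to compare with simulation.

    import numpy as np

    n = 5
    rng = np.random.default_rng(1)
    u = rng.random((200_000, n))
    r_sim = u.max(axis=1) - u.min(axis=1)            # simulated ranges

    def f_range(r):
        # n(n-1) r^(n-2) (1-r) on (0,1): the general formula with f = 1 and F(x) = x
        return n * (n - 1) * r**(n - 2) * (1 - r)

    for r in (0.2, 0.5, 0.8):
        emp = np.mean(np.abs(r_sim - r) < 0.01) / 0.02   # crude density estimate
        print(f"r={r}  formula={f_range(r):.3f}  simulation≈{emp:.3f}")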

2.6 Joint distribution of (Xj:n, Xr:n) by using distribution functions. Suppose u < v and then define the counts N1, N2 and N3 as follows:
    N1 = Σ_{i=1}^{n} I(Xi ≤ u),   N2 = Σ_{i=1}^{n} I(u < Xi ≤ v)   and   N3 = n − N1 − N2 = Σ_{i=1}^{n} I(Xi > v)
Now P[X1 ≤ u] = F(u), P[u < X1 ≤ v] = F(v) − F(u) and P[X1 > v] = 1 − F(v). It follows that the vector (N1, N2, N3) has the multinomial distribution with probabilities ( F(u), F(v) − F(u), 1 − F(v) ).
The joint distribution function of (Xj:n, Xr:n) is
    P[Xj:n ≤ u and Xr:n < v] = P[N1 ≥ j and N1 + N2 ≥ r] = Σ_{ℓ=r}^{n} Σ_{k=j}^{ℓ} P[N1 = k and N1 + N2 = ℓ]
        = Σ_{ℓ=r}^{n} Σ_{k=j}^{ℓ} [ n!/( k!(ℓ−k)!(n−ℓ)! ) ] [F(u)]^k [F(v) − F(u)]^{ℓ−k} [1 − F(v)]^{n−ℓ}
The joint density of (Xj:n, Xr:n) is
    f_{(j:n,r:n)}(u, v) = ∂²/∂u∂v P[Xj:n ≤ u and Xr:n < v]
Using the abbreviations a = F(u), b = F(v) − F(u) and c = 1 − F(v) gives
    ∂/∂u P[Xj:n ≤ u and Xr:n < v]
        = f(u) Σ_{ℓ=r}^{n} { Σ_{k=j}^{ℓ} [ n!/( (k−1)!(ℓ−k)!(n−ℓ)! ) ] a^{k−1} b^{ℓ−k} c^{n−ℓ} − Σ_{k=j}^{ℓ−1} [ n!/( k!(ℓ−k−1)!(n−ℓ)! ) ] a^k b^{ℓ−k−1} c^{n−ℓ} }
        = f(u) Σ_{ℓ=r}^{n} [ n!/( (j−1)!(ℓ−j)!(n−ℓ)! ) ] a^{j−1} b^{ℓ−j} c^{n−ℓ}
and hence
    ∂²/∂u∂v P[Xj:n ≤ u and Xr:n < v] = f(u) f(v) [ n!/( (j−1)!(r−j−1)!(n−r)! ) ] a^{j−1} b^{r−j−1} c^{n−r}
as required—see equation(2.5a) on page 5.
2.7 Asymptotic distributions. The next proposition gives the asymptotic distribution of the median. For other results, see chapter 8 in [ARNOLD et al.(2008)].
Proposition(2.7a). Suppose the random variable X has an absolutely continuous distribution with density f which is positive and continuous at the median, µ̃. Suppose i_n = ⌊n/2⌋ + 1. Then
    2√n f(µ̃) ( X_{i_n:n} − µ̃ ) ⇒ N(0, 1)   as n → ∞.
This means that X_{i_n:n} is asymptotically normal with mean µ̃ and variance 1/( 4n f(µ̃)² ).
Proof. See page 223 in [ARNOLD et al.(2008)].
Example(2.7b). Suppose U1, . . . , U2n−1 are i.i.d. random variables with the U(0, 1) distribution. Then the median is U_{n:(2n−1)} and by the proposition √(8n − 4) ( U_{n:(2n−1)} − 1/2 ) ⇒ N(0, 1) as n → ∞. Of course a_n = √(8n)/√(8n − 4) → 1 as n → ∞. Hence by Lemma 23 on page 263 of [FRISTEDT & GRAY(1997)] we have √(8n) ( U_{n:(2n−1)} − 1/2 ) ⇒ N(0, 1) as n → ∞ and
    lim_{n→∞} P[ U_{n:(2n−1)} − 1/2 < t/√(8n) ] = Φ(t)   for t ∈ R.
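A small simulation (an added sketch, not part of the original notes, assuming NumPy and SciPy are available) illustrates example (2.7b): the scaled sample median of 2n − 1 uniforms is compared with the standard normal.

    import numpy as np
    from scipy import stats

    n = 500                                  # sample size is 2n - 1 = 999
    rng = np.random.default_rng(2)
    u = rng.random((50_000, 2 * n - 1))
    medians = np.median(u, axis=1)           # U_{n:(2n-1)} for each sample
    z = np.sqrt(8 * n) * (medians - 0.5)     # scaling from example (2.7b)

    for p in (0.1, 0.5, 0.9):
        print(f"p={p}  simulated quantile={np.quantile(z, p):+.3f}  N(0,1) quantile={stats.norm.ppf(p):+.3f}")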

3 Exercises (exs-basic.tex)

Revision exercises.
1. The following assumptions are made about the interest rates for the next three years. Suppose the interest rate for year 1
is 4% p.a. effective. Let V1 and V2 denote the interest rates in years 2 and 3 respectively. Suppose V1 = 0.04 + U1 and
V2 = 0.04 + 2U2 where U1 and U2 are independent random variables with a uniform distribution on [−0.01, 0.01]. Hence
V1 has a uniform distribution on [0.03, 0.05] and V2 has a uniform distribution on [0.02, 0.06].
(a) Find the expectation of the accumulated amount at the end of 3 years of £1,000 invested now.
(b) Find the expectation of the present value of £1,000 in three years’ time.
2. Uniform to triangular. Suppose X and Y are i.i.d random variables with the uniform distribution U (−a, a), where a > 0.
Find the density of W = X + Y and sketch the shape of the density.
3. Suppose the random variable X has the density fX(x) = 1/2 for −1 < x < 1. Find the density of Y = X⁴.
4. Suppose X is a random variable with X > 0 a.e. and such that E[X] and E[1/X] both exist. Prove that E[X] + E[1/X] ≥ 2.
5. Suppose X is a random variable with X ≥ 0 and density function f. Let F denote the distribution function of X. Show that
    (a) E[X] = ∫_0^∞ [1 − F(x)] dx      (b) E[X^r] = ∫_0^∞ r x^{r−1} [1 − F(x)] dx  for r = 1, 2, . . . .

6. Suppose X1, X2, . . . , Xn are independent and identically distributed positive random variables and Sn = X1 + · · · + Xn.
    (a) Show that E[1/Sn] ≥ 1/(nµ), where µ = E[X1].
    (b) Show that E[1/Sn] = ∫_0^∞ ( E[e^{−tX}] )^n dt.

7. Suppose X1, X2, . . . , Xn are independent and identically distributed positive random variables.
    (a) Suppose E[1/Xi] is finite for all i. Show that E[1/Sj] is finite for all j = 2, 3, . . . , n where Sj = X1 + · · · + Xj.
    (b) Suppose E[Xi] and E[1/Xi] both exist and are finite for all i. Show that
        E[Sj/Sn] = j/n   for j = 1, 2, . . . , n.
8. Suppose X and Y are positive random variables with E[Y ] > 0. Suppose further that X/Y is independent of X and X/Y
is independent of Y .
    (a) Suppose E[X²], E[Y²] and E[X²/Y²] are all finite. Show that E[X] = E[X/Y] E[Y]. Hence deduce that there exists b ∈ R with X/Y = b almost everywhere.
    (b) Use characteristic functions to prove there exists b ∈ R with X/Y = b almost everywhere.
9. Suppose a > 0 and X and Y are i.i.d. random variables with the density
    f(x) = e^x/a   for −∞ < x < ln(a).
Find the density of W = |X − Y|.
Conditional expectation.
10. Suppose (X, Y) is a random vector and g : R → R such that E[Y²] < ∞ and E[g(X)²] < ∞. Show that
    E[ (Y − g(X))² ] = E[ var[Y|X] ] + E[ (E[Y|X] − g(X))² ]

11. (For this question, you need the results that if X has the uniform distribution on (0, b) which is denoted U (0, b), then
E[X] = b/2, E[X 2 ] = b2 /3 and var[X] = b2 /12.) Suppose X ∼ U (0, 1) and the distribution of Y given X = x is
U (0, x). By using the law of total expectation E[Y ] = E[ E[Y |X] ] and the law of total variance, which is equation(1.1b),
find E[Y ] and var[Y ].
12. The best predictor of the random variable Y. Given the random vector (X, Y) with E[X²] < ∞ and E[Y²] < ∞, find that random variable Ŷ = g(X) which is a function of X and provides the best predictor of Y. Precisely, show that Ŷ = E[Y|X], which is a function of X, minimizes
    E[ (Y − Ŷ)² ]

13. Suppose the random vector (X, Y) satisfies 0 < E[X²] < ∞ and 0 < E[Y²] < ∞. Suppose further that E[Y|X = x] = a + bx a.e.
    (a) Show that µY = a + bµX and E[XY] = aµX + bE[X²]. Hence show that cov[X, Y] = b var[X] and E[Y|X] = µY + ρ (σY/σX)(X − µX) a.e.
    (b) Show that var[E(Y|X)] = ρ²σY² and E[ (Y − E(Y|X))² ] = (1 − ρ²)σY².
    (Hence if ρ ≈ 1 then Y is near E(Y|X) with high probability; if ρ = 0 then the variation of Y about E(Y|X) is the same as the variation about the mean µY.)
    (c) Suppose E(X|Y) = c + dY a.e. where bd < 1 and d ≠ 0. Find expressions for E[X], E[Y], ρ² and σY²/σX² in terms of a, b, c and d.

14. Best linear predictor of the random variable Y. Suppose the random vector (X, Y) satisfies 0 < E[X²] < ∞ and 0 < E[Y²] < ∞.
Find a and b such that the random variable Ŷ = a + bX provides the best linear predictor of Y. Precisely, find a ∈ R and b ∈ R which minimize E[ (Y − a − bX)² ].
Note. Suppose E[Y|X] = a0 + b0X. By exercise 12, we know that E[Y|X] = a0 + b0X is the best predictor of Y. Hence a0 + b0X is also the best linear predictor of Y.

15. Suppose the random vector (X, Y) has the density
    f_{(X,Y)}(x, y) = (6/7)(x + y)²  for x ∈ [0, 1] and y ∈ [0, 1];  0 otherwise.
    (a) Find the best predictor of Y.
    (b) Find the best linear predictor of Y.
    (c) Compare the plots of the answers to parts (a) and (b) as functions of x ∈ [0, 1].

Order statistics.
16. Suppose X1 and X2 are i.i.d. random variables with the U (0, 1) distribution. Let Y denote the point which is closest to
an endpoint—either 0 or 1.
(a) Find the distribution of Z, the distance from Y to the nearest endpoint.
(b) Find the distribution of Z, the distance from 0 to Y .

17. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the uniform U (0, 1) distribution.
(a) Find the distribution of Xj:n . (b) Find E[Xj:n ].

18. Suppose X1 , X2 , X3 and X4 are i.i.d. random variables with the U (0, 1) distribution.
(a) Find the density of (X3:4 , X4:4 ).
(b) Find P[X3:4 + X4:4 ≤ 1].

19. Suppose k > r. It is known¹ that if the random variable X has an absolutely continuous distribution with distribution function F then the conditional distribution function P[Xk:n < y | Xr:n = x] is the same as the distribution function of the (k − r)th order statistic in a sample of size (n − r) from the distribution function
    F1(y) = [F(y) − F(x)] / [1 − F(x)]   if y > x;   0 otherwise.
Suppose X1 and X2 are i.i.d. absolutely continuous non-negative random variables with density function f(x) and distribution function F(x). By using the above result, show that X2:2 − X1:2 is independent of X1:2 if and only if X ∼ exponential(λ).

20. Suppose X1, X2, . . . , Xn are i.i.d. absolutely continuous non-negative random variables with density function f(x) and distribution function F(x). Define the vector (Y1, Y2, . . . , Yn) by
    Y1 = X1:n,   Y2 = X2:n/X1:n,   . . . ,   Yn = Xn:n/X1:n
    (a) Find an expression for the density of the vector (Y1, Y2, . . . , Yn) in terms of f and F.
    (b) Hence derive expressions for the density of the vector (Y1, Y2) = (X1:n, X2:n/X1:n) and the density of the random variable Y1 = X1:n.

¹ For example, page 38 of [GALAMBOS & KOTZ(1978)].

21. Suppose X1, X2, . . . , Xn are i.i.d. random variables with the uniform U(0, 1) distribution. Define (Y1, Y2, . . . , Yn) by
    Y1 = X1:n/X2:n,   Y2 = X2:n/X3:n,   . . . ,   Yn−1 = X(n−1):n/Xn:n,   Yn = Xn:n
Show that Y1, . . . , Yn are independent and that V1 = Y1, V2 = Y2², . . . , Vn = Yn^n are i.i.d.
22. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with order statistics X1:n , X2:n , . . . , Xn:n . Find an expression for
E[ X1 |X1:n , X2:n , . . . , Xn:n ].
23. Record values. Suppose X0, X1, X2, . . . are i.i.d. random variables with an absolutely continuous distribution. Let N denote the index of the first variable which is greater than X0. Hence
    { N = 1 } = { X1 > X0 }
    { N = 2 } = { X1 < X0, X2 > X0 }
    { N = 3 } = { X1 < X0, X2 < X0, X3 > X0 }   etc.
Find the distribution of N and E[N].
24. Suppose X1 , X2 , X3 and X4 are i.i.d. random variables with the uniform U (0, 1) distribution. Find the distributions of
Y = X3:4 − X1:4 and Y = X4:4 − X2:4 .
25. Suppose X1 , X2 and X3 are i.i.d. random variables with the uniform U (0, 1) distribution. Find the conditional density
of X2:3 given (X1:3 , X3:3 ).

4 The uniform distribution


4.1 Definition of the uniform distribution.
Definition(4.1a). Suppose a ∈ R, b ∈ R and a < b. Then the random variable X has the uniform distribution U(a, b) iff X has density
    f(x) = 1/(b − a)   for x ∈ (a, b);   0 otherwise.
The distribution function is
    F(x) = 0 if x < a;   (x − a)/(b − a) if x ∈ (a, b);   1 if x > b.
If X ∼ U (0, 1) and Y = a + (b − a)X then Y ∼ U (a, b). The uniform distribution is also called the rectangular
distribution.
Moments. The moments E[X^n] are finite for n ≠ −1:
    E[X] = ∫_a^b x f(x) dx = (a + b)/2      E[X²] = ∫_a^b x² f(x) dx = (a² + ab + b²)/3      var[X] = (b − a)²/12
    E[X^n] = ∫_a^b x^n f(x) dx = (b^{n+1} − a^{n+1}) / ( (n + 1)(b − a) )   for n ≠ −1, n ∈ R.
The moment generating function and characteristic function.
    E[e^{tX}] = ∫_a^b e^{tx}/(b − a) dx = (e^{tb} − e^{ta}) / ( t(b − a) )  for t ≠ 0,  and 1 for t = 0;
    E[e^{itX}] = 2 sin( (b − a)t/2 ) e^{i(a+b)t/2} / ( t(b − a) )  for t ≠ 0,  and 1 for t = 0.
Figure(4.1a). Left: plot of density of U(a, b). Right: plot of distribution function of U(a, b).

4.2 Sums of i.i.d. uniforms. First, the sum of two i.i.d. uniforms.
Example(4.2a). Suppose X ∼ U (0, a), Y ∼ U (0, a) and X and Y are independent. Find the distribution of Z = X + Y .
Solution. Clearly Z ∈ (0, 2a). The usual convolution integral gives
    f_Z(z) = ∫ f_X(x) f_Y(z − x) dx = (1/a) ∫_0^a f_Y(z − x) dx
where f_Y(z − x) = 1/a when 0 < z − x < a; i.e. when z − a < x < z. Hence
    f_Z(z) = (1/a²) ∫_{max{0, z−a}}^{min{a, z}} dx = [ min{a, z} − max{0, z − a} ]/a²
           = z/a²  if 0 < z < a;   (2a − z)/a²  if a < z < 2a.
and the distribution function is
    F_Z(z) = z²/(2a²)  if 0 < z < a;   2z/a − z²/(2a²) − 1 = 1 − (2a − z)²/(2a²)  if a < z < 2a.
A graph of the density and distribution function is shown in figure(4.2a). For obvious reasons, this is called the triangular distribution.
Figure(4.2a). Plot of density (left) and distribution function (right) of the triangular distribution.

Now for the general result on the sum of n independent and identically distributed uniforms.
Proposition(4.2b). Suppose X1, X2, . . . , Xn are i.i.d. random variables with the U(0, 1) distribution. Let Sn = X1 + · · · + Xn. Then the distribution function and density of Sn are given by
    Fn(t) = (1/n!) Σ_{k=0}^{n} (−1)^k C(n,k) [ (t − k)⁺ ]^n   for all t ∈ R and all n = 1, 2, . . . ;   (4.2a)
    fn(t) = (1/(n−1)!) Σ_{k=0}^{n} (−1)^k C(n,k) [ (t − k)⁺ ]^{n−1}   for all t ∈ R and all n = 2, 3, . . . .
Proof. We prove the result for Fn(t) by induction on n.
If n = 1, then the right hand side of equation(4.2a) gives t⁺ − (t − 1)⁺, which equals 0 if t < 0; t if 0 < t < 1; and 1 if t > 1, as required. Also, for t ∈ R and n = 2, 3, . . . , we have
    fn(t) = ∫_0^1 f_{n−1}(t − x) f1(x) dx = ∫_0^1 f_{n−1}(t − x) dx = ∫_{y=t−1}^{t} f_{n−1}(y) dy = F_{n−1}(t) − F_{n−1}(t − 1)
Assume that equation(4.2a) is true for n; to prove it true for n + 1:
    f_{n+1}(t) = Fn(t) − Fn(t − 1)
               = (1/n!) [ Σ_{k=0}^{n} (−1)^k C(n,k) [ (t − k)⁺ ]^n − Σ_{k=0}^{n} (−1)^k C(n,k) [ (t − k − 1)⁺ ]^n ]
               = (1/n!) [ Σ_{k=0}^{n} (−1)^k C(n,k) [ (t − k)⁺ ]^n + Σ_{ℓ=1}^{n+1} (−1)^ℓ C(n,ℓ−1) [ (t − ℓ)⁺ ]^n ]
               = (1/n!) Σ_{k=0}^{n+1} (−1)^k C(n+1,k) [ (t − k)⁺ ]^n
by using the combinatorial identity C(n,k) + C(n,k−1) = C(n+1,k). Integrating f_{n+1}(t) gives
    F_{n+1}(t) = ∫_0^t f_{n+1}(x) dx = (1/(n+1)!) Σ_{k=0}^{n+1} (−1)^k C(n+1,k) [ (t − k)⁺ ]^{n+1}
This establishes the proposition.
The proposition easily extends to other uniforms. For example, suppose X1, X2, . . . , Xn are i.i.d. random variables with the U(0, a) distribution and Sn = X1 + · · · + Xn. Then the proposition can be applied to the sum S′n = Y1 + · · · + Yn, where Yj = Xj/a. Hence
    Fn(t) = P[X1 + · · · + Xn ≤ t] = P[S′n ≤ t/a] = (1/(a^n n!)) Σ_{k=0}^{n} (−1)^k C(n,k) [ (t − ka)⁺ ]^n   for all t ∈ R and all n = 1, 2, . . . ;
    fn(t) = (1/(a^n (n−1)!)) Σ_{k=0}^{n} (−1)^k C(n,k) [ (t − ka)⁺ ]^{n−1}   for all t ∈ R and all n = 2, 3, . . . .
Similarly if the random variables come from the U(a, b) distribution.
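The following sketch (added here, not in the original notes, assuming NumPy is available) evaluates formula (4.2a) numerically and compares it with a simulated sum of U(0, 1) variables; this distribution of Sn is sometimes called the Irwin–Hall distribution.

    import numpy as np
    from math import comb, factorial

    def F_n(t, n):
        # distribution function (4.2a) of the sum of n i.i.d. U(0,1) random variables
        return sum((-1)**k * comb(n, k) * max(t - k, 0.0)**n for k in range(n + 1)) / factorial(n)

    n = 4
    rng = np.random.default_rng(3)
    s = rng.random((200_000, n)).sum(axis=1)
    for t in (1.0, 2.0, 3.0):
        print(f"t={t}  formula={F_n(t, n):.4f}  simulation={np.mean(s <= t):.4f}")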
An alternative proof of proposition(4.2b) is based on taking the weak limit of the equivalent result for discrete uniforms; the proof below is essentially that given on pages 284–285 in [FELLER(1968)].
Proof. (An alternative proof of proposition(4.2b).) For m = 1, 2, . . . and n = 2, 3, . . . , suppose Xm1, Xm2, . . . , Xmn are i.i.d. random variables with the discrete uniform distribution on the points 0, 1/m, 2/m, . . . , m/m = 1.
Let X′mj = mXmj + 1. Then X′m1, X′m2, . . . , X′mn are i.i.d. random variables with the discrete uniform distribution on the points 1, 2, . . . , m + 1.
Now the discrete uniform distribution on the points 1, 2, . . . , m + 1 has probability generating function
    s(1 − s^{m+1}) / ( (m + 1)(1 − s) )   for |s| < 1.
Hence the probability generating function of X′m1 + · · · + X′mn is
    E[ s^{X′m1 + · · · + X′mn} ] = s^n (1 − s^{m+1})^n / ( (m + 1)^n (1 − s)^n )   for |s| < 1.
Hence the generating function of the sequence P[X′m1 + · · · + X′mn ≤ j] is
    s^n (1 − s^{m+1})^n / ( (m + 1)^n (1 − s)^{n+1} )   for |s| < 1.   (4.2b)
Also, for j ∈ {0, 1/m, 2/m, . . . , (mn−1)/m, mn/m = n} we have
    P[Xm1 + · · · + Xmn ≤ j] = P[X′m1 + · · · + X′mn ≤ mj + n]
and this is the coefficient of s^{mj+n} in the expansion of equation(4.2b), which in turn is
    (1/(m + 1)^n) × coefficient of s^{mj} in the expansion of (1 − s^{m+1})^n / (1 − s)^{n+1} = Σ_{ℓ=0}^{n} (−1)^ℓ C(n,ℓ) s^{mℓ+ℓ} / (1 − s)^{n+1}
which is
    (1/(m + 1)^n) × coefficient of s^0 in the expansion of Σ_{ℓ=0}^{n} (−1)^ℓ C(n,ℓ) s^{mℓ−mj+ℓ} / (1 − s)^{n+1}
This is clearly 0 if ℓ > j. Otherwise it is
    (1/(m + 1)^n) Σ_{ℓ=0}^{n} (−1)^ℓ C(n,ℓ) C(mj − mℓ − ℓ + n, n) = (1/n!) Σ_{ℓ=0}^{n} (−1)^ℓ C(n,ℓ) (mj − mℓ − ℓ + n)! / ( (m + 1)^n (mj − mℓ − ℓ)! )   (4.2c)
by using the binomial series 1/(1 − z)^{n+1} = Σ_{ℓ=0}^{∞} C(ℓ + n, n) z^ℓ for |z| < 1. Taking the limit as m → ∞ of the expression in (4.2c) gives
    (1/n!) Σ_{ℓ=0}^{n} (−1)^ℓ C(n,ℓ) [ (j − ℓ)⁺ ]^n
This proves the result when j is an integer. If j is any rational, take a sequence of values of m tending to ∞ with mj an integer. Hence the result in equation(4.2a) holds for any t by right continuity.
A note on the combinatorial identity implied by equation(4.2a) on page 10. Now Fn(t) = P[Sn ≤ t] = 1 for t ≥ n. By equation(4.2a), this implies Σ_{k=0}^{n} (−1)^k C(n,k) (t − k)^n = n! for t ≥ n. How do we prove this identity without probability?
For all t ∈ R we have the identity
    Σ_{k=0}^{n} C(n,k) (−1)^k e^{tk} = (1 − e^t)^n
Setting t = 0 gives Σ_{k=0}^{n} C(n,k) (−1)^k = 0. Differentiating the identity once and setting t = 0 gives
    Σ_{k=0}^{n} C(n,k) (−1)^k k = 0
Similarly, differentiating r times and setting t = 0 gives
    Σ_{k=0}^{n} C(n,k) (−1)^k k^r = 0  if r = 0, 1, 2, . . . , n − 1;   (−1)^n n!  if r = n.
and hence
    Σ_{k=0}^{n} C(n,k) (−1)^{n−k} k^r = 0  if r = 1, 2, . . . , n − 1;   n!  if r = n.
For all t ∈ R
    Σ_{k=0}^{n} C(n,k) (−1)^k (t − k)^n = Σ_{k=0}^{n} C(n,k) (−1)^k Σ_{j=0}^{n} C(n,j) t^j (−1)^{n−j} k^{n−j}
        = Σ_{j=0}^{n} C(n,j) (−1)^j t^j Σ_{k=0}^{n} C(n,k) (−1)^{n−k} k^{n−j} = n!
This generalizes the combinatorial result implied by equation(4.2a). See also question 16 on page 65 in [FELLER(1968)].

4.3 Representing the uniform distribution as the sum of independent Bernoulli random variables. Now
every y ∈ [0, 1) can be represented as a ‘binary decimal’. This means we can write
y = 0.x1 x2 x3 . . . where each xj is either 0 or 1.
This representation motivates the following result:
Proposition(4.3a). Suppose X1, X2, . . . are i.i.d. random variables with the Bernoulli distribution: 1 with probability 1/2 and 0 with probability 1/2. Then the random variable
    V = Σ_{k=1}^{∞} Xk/2^k
has the uniform U(0, 1) distribution.
Proof. Let Sn = Σ_{k=1}^{n} Xk/2^k for n = 2, 3, . . . . Now the moment generating function of Xk/2^k is
    E[e^{tXk/2^k}] = 1/2 + (1/2) e^{t/2^k}
Hence
    E[e^{tSn}] = (1/2^n) Π_{k=1}^{n} [ e^{t/2^k} + 1 ]
Using the identity ( e^{t/2^{n+1}} − 1 )( e^{t/2^{n+1}} + 1 ) = e^{t/2^n} − 1, and induction, it is possible to show
    Π_{k=1}^{n} [ e^{t/2^k} + 1 ] = (e^t − 1) / ( e^{t/2^n} − 1 )
and hence
    E[e^{tSn}] = (1/2^n) (e^t − 1)/( e^{t/2^n} − 1 ) → (e^t − 1)/t   as n → ∞.
Because (e^t − 1)/t is the moment generating function of U(0, 1), we see that V ∼ U(0, 1) as required.
By calculating the moment generating function, we can also prove the following representation. Suppose Vn ∼ U(0, 1/2^n), and X1, X2, . . . , Xn, Vn are all independent; then
    Vn + Σ_{k=1}^{n} Xk/2^k ∼ U(0, 1)   for all n ∈ {1, 2, . . .}.
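A quick illustration (an added sketch, not part of the original notes, assuming NumPy is available): truncating the binary expansion at a moderate number of terms already gives something indistinguishable from U(0, 1) in simulation.

    import numpy as np

    rng = np.random.default_rng(4)
    n_terms, n_samples = 30, 100_000
    bits = rng.integers(0, 2, size=(n_samples, n_terms))    # the Bernoulli(1/2) variables X_k
    weights = 0.5 ** np.arange(1, n_terms + 1)              # 1/2^k
    v = bits @ weights                                      # truncated sum of X_k / 2^k

    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(f"p={p}  empirical quantile of V ≈ {np.quantile(v, p):.4f}")   # should be ≈ p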
4.4 Order statistics for the uniform distribution. Suppose X1, . . . , Xn are i.i.d. with the U(0, 1) distribution.
Maximum and minimum. It is straightforward to check the following results:
    P[Xn:n ≤ x] = x^n  for 0 ≤ x ≤ 1,  and  E[Xn:n] = n/(n + 1)
    P[X1:n ≤ x] = 1 − (1 − x)^n  for 0 ≤ x ≤ 1,  and  E[X1:n] = 1/(n + 1)
Note that E[X1:n] = 1 − E[Xn:n] → 0 as n → ∞.
For 0 ≤ x < 1 we have P[nX1:n > x] = P[X1:n > x/n] = (1 − x/n)^n → e^{−x} as n → ∞. This implies nX1:n ⇒ exponential(1) as n → ∞, where ⇒ denotes weak convergence (also called convergence in distribution). Now for Xk:n.
The distribution of Xk:n. By equations (2.3a) and (2.3b) we have
    P[Xk:n ≤ t] = Σ_{j=k}^{n} C(n,j) t^j (1 − t)^{n−j}
    f_{Xk:n}(t) = n C(n−1, k−1) t^{k−1} (1 − t)^{n−k}   for 0 ≤ t ≤ 1.   (4.4a)
This is the Beta density Beta(k, n − k + 1) which is considered later—see §14.1 on page 32.
The limiting distribution of Xk:n as n → ∞.
    P[nXk:n > t] = 1 − P[Xk:n ≤ t/n]
        = 1 − Σ_{j=k}^{n} C(n,j) (t/n)^j (1 − t/n)^{n−j}
        = (1 − t/n)^n + C(n,1)(t/n)(1 − t/n)^{n−1} + · · · + C(n,k−1)(t/n)^{k−1}(1 − t/n)^{n−k+1}
        → e^{−t} + t e^{−t} + · · · + t^{k−1} e^{−t}/(k − 1)!
which is equal to P[Y > t] where Y ∼ Gamma(k, 1)—this will be shown in §8.2 on page 21.
We have shown that nXk:n ⇒ Gamma(k, 1) as n → ∞ for any fixed k.
Note also that for x < 0, we have P[n(Xn:n − 1) < x] = P[Xn:n < 1 + x/n] = (1 + x/n)^n → e^x as n → ∞.
4.5 The probability integral transform. Every probability distribution function F is monotonic increasing;
if F is a strictly increasing probability distribution function on the whole of R, then we know from elementary
analysis that F has a unique inverse G = F −1 . If U ∼ U (0, 1) and Y = G(U ), then clearly Y has distribution
function F . Hence we can simulate variates from the distribution F by simulating variates x1 , x2 , . . . xn from the
U (0, 1) distribution and calculating G(x1 ), G(x2 ), . . . , G(xn ).
Now for the general case. Suppose F : R → [0, 1] is a distribution function (not necessarily continuous). We first
need to define the “inverse” of F .
Proposition(4.5a). Suppose F : R → [0, 1] is a distribution function and we let G(u) = min{x: F (x) ≥ u} for
u ∈ (0, 1). Then
{x: G(u) ≤ x} = {x: F (x) ≥ u}.
Proof. Fix u ∈ (0, 1); then
    x0 ∈ R.H.S.  ⇒  F(x0) ≥ u  ⇒  G(u) ≤ x0 by definition of G  ⇒  x0 ∈ L.H.S.
Conversely
    x0 ∈ L.H.S.  ⇒  G(u) ≤ x0  ⇒  min{x : F(x) ≥ u} ≤ x0
Let x* = min{x : F(x) ≥ u}. Hence x* ≤ x0. Choose a sequence {xn}n≥1 with xn ↓↓ x* as n → ∞ (this means that the sequence {xn}n≥1 strictly decreases with limit x*). Hence F(xn) ≥ u for all n = 1, 2, . . . . Now F is a distribution function; hence F is right continuous; hence F(x*) ≥ u. Also x0 ≥ x* and F is monotonic increasing; hence F(x0) ≥ F(x*) ≥ u. Hence x0 ∈ R.H.S.

Suppose the distribution function F is continuous at α ∈ R. Then G(β) = α implies F(α) ≥ β. Also, for every x < α we have F(x) < β. Hence F(α) = β. We have shown that G(β) = α implies F(α) = β and hence F(G(β)) = β. If the random variable X has the distribution function F and F is continuous, then P[F(X) ≥ u] = P[G(u) ≤ X] = 1 − F(G(u)) = 1 − u.
We have shown the following two important results.
• If the random variable X has the distribution function F and F is continuous, then the random variable F (X)
has the U (0, 1) distribution.
• Suppose F is a distribution function and G is defined in terms of F as explained above. If U has a uniform
distribution on (0, 1) and X = G(U ) then
P[X ≤ x] = P [G(U ) ≤ x] = P [F (x) ≥ U ] = F (x).
Hence
If U ∼ U (0, 1), then the distribution function of G(U ) is F .

As explained before the proposition, if the distribution function F is strictly increasing on the whole of R then
F −1 , the inverse of F , exists and G = F −1 . If F is the distribution function of a discrete distribution, then F is
constant except for countably many jumps and the inverse of F does not exist. However, G(u) is still defined by
the proposition and this method of simulating from the distribution F still works.
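As an illustration of the second bullet point above (an added sketch, not in the original notes, assuming NumPy is available), here is inverse-transform sampling of an exponential(λ) distribution from U(0, 1) variates, using G(u) = −ln(1 − u)/λ.

    import numpy as np

    lam = 2.0
    rng = np.random.default_rng(5)
    u = rng.random(200_000)
    x = -np.log(1.0 - u) / lam          # G(U) where F(x) = 1 - exp(-lam*x) for x > 0

    for t in (0.25, 0.5, 1.0):
        print(f"t={t}  empirical P[X<=t]={np.mean(x <= t):.4f}  F(t)={1 - np.exp(-lam * t):.4f}")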

4.6 Using the probability integral transformation to prove results about order statistics.
Suppose X1, . . . , Xn are i.i.d. with the U(0, 1) distribution. By equation(4.4a) on page 13 we have
    f_{Xk:n}(x) = [ n!/( (k−1)!(n−k)! ) ] x^{k−1} (1 − x)^{n−k}   for 0 ≤ x ≤ 1.
Suppose Y1, . . . , Yn are i.i.d. with an absolutely continuous distribution with distribution function FY and density function fY and we wish to find the distribution of Yk:n.
Then X1 = FY(Y1), . . . , Xn = FY(Yn) are i.i.d. with the U(0, 1) distribution and hence
    f_{Xk:n}(x) = [ n!/( (k−1)!(n−k)! ) ] x^{k−1} (1 − x)^{n−k}
Now FY is monotonic increasing and continuous and the transformation (X1, . . . , Xn) ↦ (Y1, . . . , Yn) is order preserving; hence
    P[Yk:n ≤ y] = P[FY(Yk:n) ≤ FY(y)] = P[Xk:n ≤ FY(y)] = ∫_{−∞}^{FY(y)} f_{Xk:n}(x) dx
and hence
    f_{Yk:n}(y) = [ n!/( (k−1)!(n−k)! ) ] {FY(y)}^{k−1} {1 − FY(y)}^{n−k} fY(y)
This approach provides a general method for proving results about the order statistics of a sample from a continuous distribution function.

4.7 Random partitions of an interval. Suppose U1 , . . . , Un are i.i.d. random variables with the uniform U (0, 1)
distribution and let U1:n , . . . , Un:n denote the order statistics. These variables partition the interval [0, 1] into n + 1
disjoint intervals with the following lengths:
D1 = U1:n , D2 = U2:n − U1:n , . . . , Dn = Un:n − U(n−1):n , Dn+1 = 1 − Un:n
Clearly D1 + · · · + Dn+1 = 1. The absolute value of the Jacobian of the transformation (U1:n, . . . , Un:n) ↦ (D1, . . . , Dn) is
    | ∂(d1, . . . , dn)/∂(u1:n, . . . , un:n) | = 1
The density of (U1:n, . . . , Un:n) is given by (2.2a) on page 4. Hence the density of (D1, . . . , Dn) is
    f_{(D1,...,Dn)}(d1, . . . , dn) = n!   for d1 ≥ 0, . . . , dn ≥ 0, Σ_{ℓ=1}^{n} dℓ ≤ 1.   (4.7a)
There are many results on random partitions of an interval—see [FELLER(1971)], [DAVID & BARTON(1962)] and [WHITWORTH(1901)].

4.8 Summary. The uniform distribution.
• Density. Suppose a < b. Then X has the uniform U(a, b) density iff
    fX(x) = 1/(b − a)   for x ∈ (a, b).
• The distribution function.
    F(x) = (x − a)/(b − a)   for x ∈ (a, b).
• Moments.
    E[X] = (a + b)/2      var[X] = (b − a)²/12      E[X^n] = (b^{n+1} − a^{n+1}) / ( (n + 1)(b − a) )   for n ≠ −1, n ∈ R.
• M.g.f. and c.f.
    MX(t) = E[e^{tX}] = (e^{tb} − e^{ta}) / ( t(b − a) )  for t ≠ 0,  and 1 for t = 0;
    φX(t) = E[e^{itX}] = 2 sin( (b − a)t/2 ) e^{i(a+b)t/2} / ( t(b − a) )  for t ≠ 0,  and 1 for t = 0.
• Properties.
    The sum of two independent uniforms on (0, a) is the triangular distribution on (0, 2a).
    If X has the distribution function F and F is continuous, then F(X) ∼ U(0, 1).

5 Exercises (exs-uniform.tex)

1. Suppose X ∼ U(a, b). Show that
    E[(X − µ)^n] = (b − a)^n [ 1 − (−1)^{n+1} ] / ( (n + 1) 2^{n+1} ) = (b − a)^n / ( (n + 1) 2^n )  if n is even;  0  if n is odd.

2. Transforming uniform to exponential. Suppose X ∼ U (0, 1). Find the distribution of Y = − ln X.

3. Product of independent uniforms.


(a) Suppose X and Y are i.i.d. with the U (0, 1) distribution. Let Z = XY . Find the density and distribution function
of Z.
(b) (Note: this part makes use of the fact that the sum of independent exponentials is a gamma distribution—see
proposition(8.6a) on page 22.) Suppose X1 , . . . , Xn are i.i.d. with the U (0, 1) distribution and let Pn = X1 · · · Xn .
Find the density of − ln Pn and hence find the density of Pn .

4. Suppose X1 ∼ U(0, 1), X2 ∼ U(0, X1), X3 ∼ U(0, X2), . . . , Xn ∼ U(0, Xn−1) for some n ≥ 2.
    (a) Prove by induction that the density of Xn is
        f(xn) = [ ln(1/xn) ]^{n−1} / (n − 1)!   for 0 < xn < 1.
    (b) By using the result of part (b) of exercise 3, find the density of Xn.

5. Suppose X is a random variable with an absolutely continuous distribution with density f. The entropy of X is defined to be
    H(X) = − ∫ f(x) ln f(x) dx
Suppose X ∼ U(a, b). Find the entropy of X.
(It can be shown that the continuous distribution on the interval (a, b) with the largest entropy is the uniform.)

6. Sum and difference of two independent uniforms. Suppose X ∼ U (0, a) and Y ∼ U (0, b) and X and Y are independent.
(a) Find the density of V = X + Y and sketch its shape.
(b) Find the density of W = Y − X and sketch its shape.

7. Suppose X ∼ U (0, a) and Y ∼ U (0, b) and X and Y are independent. Find the distribution of V = min{X, Y } and find
P[V = X].

8. A waiting time problem. Suppose you arrive at a bus stop at time t = 0. The stop is served by two bus routes. From past
observations, you assess that the time X1 to wait for a bus on route 1 has the U (0, a) distribution and the time X2 to wait
for a bus on route 2 has the U (0, b) distribution. Also X1 and X2 are independent. (Clearly this assumption will not hold
in practice!!) A bus on route 1 takes the time α to reach your destination whilst a bus on route 2 takes the time α + β.
Suppose the first bus arrives at the stop at time t0 and is on route 2. Should you catch it if you wish to minimize your expected arrival time?
9. Suppose U ∼ U (0, 1) and the random variable V has an absolutely continuous distribution with finite expectation
and density f . Also U and V are independent. Let W denote the fractional part of U + V ; this means that W =
U + V − bU + V c. Show that W ∼ U (0, 1).
(See also Poincaré's roulette problem; pages 62–63 in [FELLER(1971)], for example.)
10. Suppose X and Y are i.i.d. random variables with the U (0, 1) distribution. Let V = min{X, Y } and W = max{X, Y }.
(a) Find the distribution functions, densities and expectations of V and W .
(b) Find P[V ≤ v, W ≤ w] and hence derive f(V,W ) (v, w), the joint density of (V, W ).
(c) Find the density of (W |V ≤ v) and hence derive E[W |V ≤ v].
11. Suppose two points are chosen independently and at random on a circle with a circumference which has unit length.
(a) Find the distribution of the lengths of the intervals (X1 , X2 ) and (X2 , X1 ).
(b) Find the distribution of the length of the interval L which contains the fixed point Q.
(See page 23 in [FELLER(1971)].)
12. Suppose n points are distributed independently and uniformly on a disc with radius r. Let D denote the distance from
the centre of the disc to the nearest point. Find the density and expectation of D.
13. Suppose the random vector (X1 , X2 ) has a distribution which is uniform over the disc {(x, y) ∈ R2 : x2 + y 2 ≤ a2 }.
Find the density of X1 .
14. Suppose a < b and X1 , X2 , . . . , Xn are i.i.d. random variables with the U (a, b) distribution. Let Sn = X1 + · · · + Xn .
Find expressions for the distribution function and density function of Sn .
15. Suppose X1 , . . . , Xn are i.i.d. with the U (0, 1) distribution.
(a) Find E[Xk:n ] and var[Xk:n ] for k = 1, 2, . . . , n.
(b) Find the joint density of (Xj:n , Xk:n ).
(c) Find E[Xj:n Xk:n ], cov[Xj:n , Xk:n ] and corr[Xj:n , Xk:n ].

6 The exponential distribution


6.1 The basics
Definition(6.1a). Suppose λ > 0. Then X has the exponential distribution, exponential(λ), iff X has an absolutely continuous distribution with density
    f(x) = λe^{−λx}  if x > 0;   0  if x < 0.
The distribution function.
    F(x) = 1 − e^{−λx}  if x > 0;   0  if x < 0.
Moments. These can be obtained by integrating by parts.
    E[X] = 1/λ      E[X²] = 2/λ²      var[X] = 1/λ²    (6.1a)
The moment generating function and characteristic function.
    E[e^{tX}] = λ/(λ − t)  for t < λ;      φ(t) = E[e^{itX}] = λ/(λ − it)
Multiple of an exponential distribution. Suppose X ∼ exponential (λ) and Y = αX where α > 0. Then
P[Y > t] = P[X > t/α] = e−λt/α and hence Y ∼ exponential ( λ/α).
Sum of i.i.d. exponentials. The sum of i.i.d. random variables with an exponential distribution has a gamma
distribution—this is explained in proposition(8.6a) on page 22.

6.2 The exponential as the limit of geometric distributions.
Suppose events can only occur at times δ, 2δ, 3δ, . . . , and events at different times are independent. Let
    P[event occurs at time kδ] = p
Let T denote the time to the first event. Then
    P[T > kδ] = (1 − p)^k
Hence P[T = kδ] = (1 − p)^{k−1} p, which is the geometric distribution, and E[T] = δ/p.
Now suppose δn → 0 as n → ∞ in such a way that E[Tn] = δn/pn = 1/α is constant. Then
    lim_{n→∞} P[Tn > t] = lim_{n→∞} (1 − pn)^{t/δn} = lim_{n→∞} (1 − αδn)^{t/δn} = e^{−αt}    (6.2a)
and hence Tn ⇒ T as n → ∞ where T ∼ exponential(α).
Here is another approach to defining points distributed randomly on [0, ∞). Suppose X1, . . . , Xn are i.i.d. random variables with the U(0, ℓn) distribution. Then P[X1:n > t] = (1 − t/ℓn)^n for t ∈ (0, ℓn). Now suppose n → ∞ in such a way that n/ℓn = λ is fixed. Then lim_{n→∞} P[X1:n > t] = e^{−λt}, which means that X1:n ⇒ T as n → ∞ where T ∼ exponential(λ). Informally, this result says that if points are distributed randomly on the line such that the mean density of points is λ, then the distance to the first point has the exponential(λ) distribution.
6.3 The lack of memory or Markov property of the exponential distribution. Suppose the random variable T
models the lifetime of some component. Then the random variable T is said to have the lack of memory property
iff the remaining lifetime of an item which has already lasted for a length of time x has the same distribution as T .
This means
P[T > x + t|T > x] = P[T > t]
and hence
P[T > x + t] = P[T > t] P[T > x] for all t > 0 and x > 0.
Similarly, the distribution with distribution function F has the lack of memory property iff [1 − F (x + t)] =
[1 − F (x)][1 − F (t)] for all x > 0 and all t > 0.
If X ∼ exponential (λ) then 1 − F (x) = e−λx and hence X has the lack of memory property. Conversely
Proposition(6.3a). Suppose X is an absolutely continuous random variable on [0, ∞) with the lack of memory
property. Then there exists λ > 0 such that X ∼ exponential (λ).
Proof. Let G(x) = P[X > x]. Then
    G(x + y) = G(x)G(y)   for all x ≥ 0 and all y ≥ 0.
Suppose x = m/n is rational. Then G(m/n) = [G(1/n)]^m. Raising to the nth power gives [G(m/n)]^n = [G(1/n)]^{mn} = [G(1)]^m. Hence G(m/n) = [G(1)]^{m/n}.
Now suppose x is any real number in [0, ∞). Choose sequences qn and rn of rationals such that qn ≤ x ≤ rn and qn → x as n → ∞ and rn → x as n → ∞. Hence G(1)^{qn} = G(qn) ≥ G(x) ≥ G(rn) = G(1)^{rn}. Letting n → ∞ gives G(x) = G(1)^x.
Now let λ = − ln[G(1)]. Then G(x) = G(1)^x = e^{−λx}. See also [FELLER(1968)], page 459.
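A small numerical illustration of the lack of memory property (an added sketch, not part of the original notes, assuming NumPy is available):

    import numpy as np

    lam, x, t = 1.5, 0.7, 0.4
    rng = np.random.default_rng(6)
    T = rng.exponential(scale=1 / lam, size=500_000)

    print(f"P[T > x+t | T > x] ≈ {np.mean(T > x + t) / np.mean(T > x):.4f}")
    print(f"P[T > t]           ≈ {np.mean(T > t):.4f}")
    print(f"exp(-lam*t)         = {np.exp(-lam * t):.4f}")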

The proof of proposition(6.3a) depends on finding a solution of the functional equation f (x + y) = f (x)f (y).
Taking logs of this equation gives the Cauchy functional equation f (x + y) = f (x) + f (y). Both of these equations
have been studied very extensively—see [S AATY(1981)], [ACZ ÉL(1966)], [K UCZMA(2009)], etc.
6.4 Distribution of the minimum. Suppose Xj ∼ exponential (λj ) for j = 1, 2, . . . , n. Suppose further that
X1 , X2 , . . . , Xn are independent. Let X1:n = min{X1 , . . . , Xn }. Then
P[X1:n > t] = P[X1 > t] · · · P[Xn > t] = e−(λ1 +···+λn )t for t > 0.
Hence X1:n ∼ exponential (λ1 + · · · + λn ).
In particular, we have shown that if X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution,
then X1:n ∼ exponential(nλ). Hence X1:n has the same distribution as X1/n.
This property characterizes the exponential distribution—see page 39 of [GALAMBOS & KOTZ(1978)].

6.5 The order statistics of the exponential distribution. Suppose we think of X1 , . . . , Xn as the times when
n events occur. Then we have shown that the time to the first event has the exponential (nλ) distribution. Using
the lack of memory property suggests that the extra time to the second event, X2:n − X1:n , should have the
exponential ( (n − 1)λ ) distribution. And so on. This result is established in the following proposition.
Proposition(6.5a). Suppose X1, . . . , Xn are i.i.d. random variables with the exponential(λ) distribution. Define Z1, . . . , Zn by
    Z1 = nX1:n,   Z2 = (n − 1)(X2:n − X1:n),   . . . ,   Zn = Xn:n − X(n−1):n
Then Z1, . . . , Zn are i.i.d. random variables with the exponential(λ) distribution.
Proof. We know that the density of (X1:n, . . . , Xn:n) is g(x1, . . . , xn) = n! λ^n e^{−λ(x1 + · · · + xn)} for 0 < x1 < · · · < xn.
Also
    X1:n = Z1/n,   X2:n = Z1/n + Z2/(n − 1),   . . . ,   Xn:n = Z1/n + Z2/(n − 1) + · · · + Zn−1/2 + Zn
and hence the Jacobian of the transformation is
    ∂(x1:n, . . . , xn:n)/∂(z1, . . . , zn) = 1/n!
Hence the density of (Z1, . . . , Zn) is
    f_{(Z1,...,Zn)}(z1, . . . , zn) = (1/n!) n! λ^n e^{−λ(z1 + · · · + zn)} = λ^n e^{−λ(z1 + · · · + zn)}   for z1 > 0, . . . , zn > 0.
This establishes the proposition.
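The following sketch (added, not from the original notes, assuming NumPy is available) checks proposition (6.5a) by simulation: the normalized spacings Z_k should each look exponential(λ).

    import numpy as np

    n, lam = 5, 2.0
    rng = np.random.default_rng(7)
    x = np.sort(rng.exponential(scale=1 / lam, size=(200_000, n)), axis=1)    # order statistics

    gaps = np.diff(np.concatenate([np.zeros((x.shape[0], 1)), x], axis=1), axis=1)
    z = gaps * np.arange(n, 0, -1)      # Z_k = (n - k + 1)(X_{k:n} - X_{(k-1):n}), with X_{0:n} = 0

    print("means of Z_1,...,Z_n:", z.mean(axis=0).round(3), " each should be ≈ 1/lam =", 1 / lam)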
6.6 Link with the order statistics from a uniform distribution.
Proposition(6.6a). Suppose Y1, . . . , Yn+1 are i.i.d. random variables with the exponential(λ) distribution, and let
    Sℓ = Y1 + · · · + Yℓ   for ℓ = 1, . . . , n + 1.
Then
    ( Y1/Sn+1, . . . , Yn/Sn+1 )   is independent of Sn+1.   (6.6a)
Suppose U1, . . . , Un are i.i.d. random variables with the U(0, 1) distribution and denote the vector of order statistics by (U1:n, . . . , Un:n). Let
    D1 = U1:n,   D2 = U2:n − U1:n,   . . . ,   Dn = Un:n − U(n−1):n,   Dn+1 = 1 − Un:n
Then
    ( Y1/Sn+1, . . . , Yn+1/Sn+1 )   has the same distribution as (D1, . . . , Dn+1).   (6.6b)
Proof. By equation(2.2a), the density of the vector (U1:n, . . . , Un:n) is
    g(x1, . . . , xn) = n!   if 0 < x1 < · · · < xn < 1;   0 otherwise.
Also
    f_{(Y1,...,Yn+1)}(y1, . . . , yn+1) = λ^{n+1} e^{−λ(y1 + · · · + yn+1)}   for y1 > 0, . . . , yn+1 > 0.
Consider the transformation
    X1 = Y1/Sn+1,   X2 = Y2/Sn+1,   . . . ,   Xn = Yn/Sn+1,   Xn+1 = Y1 + · · · + Yn+1
or equivalently
    Y1 = X1Xn+1,   Y2 = X2Xn+1,   . . . ,   Yn = XnXn+1,   Yn+1 = (1 − X1 − X2 − · · · − Xn)Xn+1
The absolute value of the Jacobian of the transformation is
    | ∂(y1, . . . , yn+1)/∂(x1, . . . , xn+1) | = x_{n+1}^n
The determinant can be easily evaluated by replacing the last row by the sum of all the rows—this gives an upper triangular determinant.
Hence the density of (X1, . . . , Xn+1) is
    f_{(X1,...,Xn+1)}(x1, . . . , xn+1) = λ^{n+1} e^{−λx_{n+1}} x_{n+1}^n   for x1 ≥ 0, . . . , xn ≥ 0 with x1 + · · · + xn ≤ 1 and xn+1 ≥ 0;   0 otherwise.
Now Xn+1 = Y1 + · · · + Yn+1 is the sum of (n + 1) i.i.d. exponentials; hence by proposition(8.6a) on page 22, Xn+1 ∼ Gamma(n + 1, λ) and has density
    f_{Xn+1}(xn+1) = λ^{n+1} x_{n+1}^n e^{−λx_{n+1}} / n!
It follows that (X1, . . . , Xn) is independent of Xn+1 and (X1, . . . , Xn) has density
    f_{(X1,...,Xn)}(x1, . . . , xn) = n!   for x1 ≥ 0, . . . , xn ≥ 0, Σ_{ℓ=1}^{n} xℓ ≤ 1.
This establishes (6.6a).
Using equation(4.7a) on page 14 shows that the density of (D1, . . . , Dn) is the same as the density of (X1, . . . , Xn). Also D1 + · · · + Dn + Dn+1 = 1 and X1 + · · · + Xn + Yn+1/Sn+1 = 1. Hence (6.6b).
Corollary(6.6b). With the same notation as the proposition,
    ( S1/Sn+1, . . . , Sn/Sn+1 )   has the same distribution as (U1:n, . . . , Un:n).   (6.6c)
Also
    ( S1/Sn+1, . . . , Sn/Sn+1 )   is independent of Sn+1.   (6.6d)
Proof. The proposition shows that (X1, . . . , Xn) has the same distribution as (D1, . . . , Dn). Taking partial sums of both sides gives (6.6c). Result (6.6d) follows directly from (6.6a).
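The corollary is easy to check numerically (an added sketch, not part of the original notes, assuming NumPy is available): normalized partial sums of exponentials should have the same joint distribution as sorted uniforms; here only the marginal means are compared.

    import numpy as np

    n, lam = 4, 3.0
    rng = np.random.default_rng(8)

    y = rng.exponential(scale=1 / lam, size=(200_000, n + 1))
    s = np.cumsum(y, axis=1)
    ratios = s[:, :n] / s[:, [n]]                            # (S_1/S_{n+1}, ..., S_n/S_{n+1})
    u_sorted = np.sort(rng.random((200_000, n)), axis=1)     # (U_{1:n}, ..., U_{n:n})

    print("means of S_k/S_{n+1}:", ratios.mean(axis=0).round(3))
    print("means of U_{k:n}:    ", u_sorted.mean(axis=0).round(3))   # both ≈ k/(n+1)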
6.7 Summary. The exponential distribution.
• Density. X ∼ exponential(λ) iff fX(x) = λe^{−λx} for x > 0.
• The distribution function. FX(x) = 1 − e^{−λx} for x > 0.
• Moments.
    E[X] = 1/λ      E[X²] = 2/λ²      var[X] = 1/λ²
• M.g.f. and c.f.
    MX(t) = E[e^{tX}] = λ/(λ − t)  for t < λ.      φX(t) = E[e^{itX}] = λ/(λ − it)
• Properties.
    If X ∼ exponential(λ) then αX ∼ exponential(λ/α).
    The exponential is the continuous analogue of the geometric distribution.
    The lack of memory property: P[X > x + t | X > x] = P[X > t].
    If X1, . . . , Xn are i.i.d. exponential(λ), then X1:n ∼ exponential(nλ).

7 Exercises (exs-exponential.tex)

1. Suppose X ∼ U(0, 1) and λ > 0. Prove that
    Y = − ln(1 − X)/λ ∼ exponential(λ)
2. Suppose X ∼ exponential (λ). Let Y be the integer part of X; hence Y = [X]. Let Z be the fractional part of X; hence
Z = X − [X].
Find the distributions of Y and Z and show that Y and Z are independent.
3. Suppose X and Y are i.i.d. random variables with the exponential(λ) distribution. Find the distribution of Z = (X − Y)/(X + Y).

4. Suppose X and Y are i.i.d. random variables with the exponential (λ) distribution. Find the distribution of Z = X/Y +1.
5. Suppose the random variables X and Y are i.i.d. with the exponential(1) distribution. Let U = min{X, Y} and V = max{X, Y}. Prove that U ∼ exponential(2) and that V has the same distribution as X + ½Y.
6. Suppose X ∼ exponential(µ) and Y ∼ exponential(δ) where 0 < δ ≤ µ. Suppose further that f : (0, ∞) → (0, ∞) with f differentiable and f′(x) > 0 for all x > 0. Prove that E[f(X)] ≤ E[f(Y)].
7. Suppose X and Y are i.i.d. random variables with the exponential (λ) distribution. Find the conditional density of X
given X + Y = z. What is E[X|X + Y ]?
8. Suppose X1 and X2 are i.i.d. random variables with the exponential (λ) distribution. Let Y1 = X1 −X2 and Y2 = X1 +X2 .
(a) Find the densities of Y1 and Y2 . (b) What is the density of R = |X1 − X2 |?
9. A characterization of the exponential distribution. Suppose X1 and X2 are i.i.d. random variables which are non-negative
and absolutely continuous. Let Y = min{X1 , X2 } and R = |X1 − X2 |. Then Y and R are independent iff X1 and X2
have the exponential distribution.

10. Suppose X1 ∼ exponential (λ1 ), X2 ∼ exponential (λ2 ) and X1 and X2 are independent.
(a) Find P[min{X1 , X2 } = X1 ].
(b) Show that {min{X1 , X2 } > t} and {min{X1 , X2 } = X1 } are independent.
(c) Let R = max{X1 , X2 } − min{X1 , X2 }. Find P[R > t].
(d) Show that min{X1 , X2 } and R are independent.

11. Most elementary analysis texts contain a proof of the result
    lim_{n→∞} (1 − 1/n)^n = e^{−1}
By using this result, show that if {δn} is a real sequence in (0, ∞) such that δn → 0 as n → ∞ and α ∈ (0, ∞), then
    lim_{n→∞} (1 − αδn)^{t/δn} = e^{−αt}
This is used in (6.2a) on page 17.

12. (a) Ratio of two independent exponentials. Suppose X ∼ exponential(λ), Y ∼ exponential(µ) and X and Y are independent. Find the distribution of Z = X/Y.
    (b) Product of two independent exponentials. Suppose X ∼ exponential(λ), Y ∼ exponential(µ) and X and Y are independent.
    (i) Find the distribution function of Z = XY. Express your answer in terms of the modified Bessel function of the second kind, order 1, which is
        K1(y) = ∫_{x=0}^{∞} cosh(x) e^{−y cosh(x)} dx   for ℜ(y) > 0.
    (ii) Find the density of Z = XY. Express your answer in terms of the modified Bessel function of the second kind, order 0, which is
        K0(y) = ∫_{x=0}^{∞} e^{−y cosh(x)} dx   for ℜ(y) > 0.
    (iii) Write down what these answers become when λ = µ.

13. Suppose X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution. Define Y1 , . . . , Yn as follows:
Y1 = X1 , Y2 = X1 + X2 , . . . , Yn = X1 + · · · + Xn .
Find the density of the vector (Y1 , . . . , Yn ).

14. Order statistics. Suppose X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution. Find
(a) E[Xk:n ] (b) var[Xk:n ] (c) cov[Xj:n , Xk:n ].

15. Suppose X1 , . . . , Xn are i.i.d. random variables with the exponential (λ) distribution.
Let Z = nX1:n + (n − 1)X2:n + · · · + 2X(n−1):n + Xn:n . Find E[Z] and var[Z].

16. Suppose X1, . . . , Xn are i.i.d. random variables with the exponential(λ) distribution. Prove that
    X1:n is independent of Σ_{ℓ=1}^{n} (Xℓ − X1:n)

8 The gamma and chi-squared distributions


8.1 Definition of the Gamma distribution.
Definition(8.1a). Suppose n > 0 and α > 0. Then the random variable X has the Gamma(n, α) distribution iff X has density
    f(x) = α^n x^{n−1} e^{−αx} / Γ(n)   for x > 0.   (8.1a)
By definition, Γ(n) = ∫_0^∞ x^{n−1} e^{−x} dx for all n ∈ (0, ∞). It follows that
    ∫_0^∞ x^{n−1} e^{−αx} dx = Γ(n)/α^n   provided α > 0 and n > 0.   (8.1b)

8.2 The distribution function. There is a simple expression for the distribution function only when n is a positive integer: if X ∼ Gamma(n, α) where n is a positive integer then
    P[X ≤ x] = 1 − e^{−αx} [ 1 + αx/1! + (αx)²/2! + · · · + (αx)^{n−1}/(n − 1)! ]   (8.2a)
This is easy to check—differentiating the right hand side of equation(8.2a) gives the density in equation(8.1a).
Note that P[X ≤ x] = P[Y ≥ n] where Y has a Poisson distribution with expectation αx. In terms of the Poisson process with rate α, the relation P[X ≤ x] = P[Y ≥ n] means that the nth event occurs before time x iff there are at least n events in [0, x]. Equation(8.2a) also implies
    P[X > x] = e^{−αx} + αx e^{−αx}/1! + (αx)² e^{−αx}/2! + · · · + (αx)^{n−1} e^{−αx}/(n − 1)!
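A short check of the Gamma–Poisson relation in (8.2a) (an added sketch, not in the original notes, assuming SciPy is available):

    from math import exp, factorial
    from scipy import stats

    n, alpha, x = 4, 2.0, 1.5

    lhs = stats.gamma.cdf(x, a=n, scale=1 / alpha)             # P[X <= x] for X ~ Gamma(n, alpha)
    rhs = 1 - exp(-alpha * x) * sum((alpha * x)**j / factorial(j) for j in range(n))
    poisson = 1 - stats.poisson.cdf(n - 1, mu=alpha * x)       # P[Y >= n] for Y ~ Poisson(alpha*x)
    print(lhs, rhs, poisson)                                   # all three agree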

8.3 Multiple of a gamma distribution. Suppose n > 0 and α > 0 and X ∼ Gamma(n, α) with density fX(x). Suppose further that Y = βX where β > 0. Then the density of Y is given by
    fY(y) = fX(x) / |dy/dx| = α^n (y/β)^{n−1} exp(−αy/β) / ( βΓ(n) )
Hence Y = βX ∼ Gamma(n, α/β). Thus the parameter α is a scale parameter of the gamma distribution.
8.4 Moments and shape of the gamma distribution. Using the result that ∫ f(x) dx = 1 easily gives
    E[X^k] = Γ(n + k) / ( α^k Γ(n) )   for n + k > 0.   (8.4a)
and so
    E[X] = n/α,    var(X) = n/α²    and    E[1/X] = α/(n − 1)   for n > 1.

Figure(8.4a). Plot of the gamma density function for n = 1/2, n = 1 and n = 2 (all with α = 1).

Figure (8.4a) shows that n is a shape parameter of the gamma distribution. By §8.3, we know that α is a scale
parameter and if X ∼ Gamma(n, α) then Y = X/α ∼ Gamma(n, 1). So without loss of generality, we now
consider the shape of the density of Gamma(n, 1) distribution.
Let g(x) = xn−1 e−x . If n ≤ 1, then g(x) = e−x /x1−n is monotonic decreasing and hence the density of the
Gamma(n, 1) distribution is monotonic decreasing.
If n > 1, then g 0 (x) = e−x xn−2 [n − 1 − x]. Clearly, if x < n − 1 then g 0 (x) > 0; if x = n − 1 then g 0 (x) = 0 and
if x > n − 1 then g 0 (x) < 0. Hence the density first increases to the maximum at x = n − 1 and then decreases.
By using §8.3, it follows that the maximum of the density of a Gamma(n, α) density occurs at x = (n − 1)/α.
8.5 The moment generating function of a gamma distribution. Suppose X ∼ Gamma(n, α). Then
    M_X(t) = E[e^{tX}] = ∫_0^∞ e^{tx} α^n x^{n−1} e^{−αx}/Γ(n) dx = (α^n/Γ(n)) ∫_0^∞ x^{n−1} e^{−(α−t)x} dx
           = α^n Γ(n)/( Γ(n) (α − t)^n ) = 1/(1 − t/α)^n   for t < α.   (8.5a)

Hence the characteristic function is 1/(1 − it/α)n ; in particular, if n = 1, the characteristic function of the
exponential(α) distribution is α/(α − it).
Equation(8.5a) shows that for integral n, the Gamma distribution is the sum of n independent exponentials. The
next paragraph gives the long proof of this.
8.6 Representing the gamma distribution as a sum of independent exponentials. The following proposi-
tion shows that the distribution of the waiting time for the nth event in a Poisson process with rate α has the
Gamma(n, α) distribution.
Proposition(8.6a). Suppose X1, X2, . . . , Xn are i.i.d. random variables with the exponential density αe^{−αx} for x ≥ 0. Then Sn = X1 + · · · + Xn has the Gamma(n, α) distribution.
Proof. By induction: let g_n denote the density of S_n. Then for all t > 0 we have
    g_{n+1}(t) = ∫_0^t g_n(t − x) αe^{−αx} dx = ∫_0^t [ α^n (t − x)^{n−1} e^{−α(t−x)}/Γ(n) ] αe^{−αx} dx
               = (α^{n+1} e^{−αt}/Γ(n)) ∫_0^t (t − x)^{n−1} dx = (α^{n+1} e^{−αt}/Γ(n)) [ −(t − x)^n/n ]_{x=0}^{x=t}
               = α^{n+1} t^n e^{−αt}/Γ(n + 1)   as required.
The result that the sum of n independent exponentials has the Gamma distribution is the continuous analogue of
the result that the sum of n independent geometrics has a negative binomial distribution.
Link with the Poisson distribution. Suppose Sn = X1 + · · · + Xn as in the proposition. Let Nt denote the number
of indices k ≥ 1 with Sk ≤ t. Then
    P[Nt = n] = P[Sn ≤ t and S_{n+1} > t] = G_n(t) − G_{n+1}(t) = e^{−αt} (αt)^n/n!
by using equation(8.2a) on page 21.
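A small simulation sketch of proposition(8.6a)—it assumes numpy and scipy are available, and the seed, sample size and parameter values are arbitrary illustrative choices:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, alpha = 5, 2.0
    # Simulate S_n = X_1 + ... + X_n with X_i ~ exponential(alpha), i.e. mean 1/alpha.
    s = rng.exponential(scale=1/alpha, size=(100_000, n)).sum(axis=1)
    # Compare with the Gamma(n, alpha) distribution via a Kolmogorov-Smirnov test.
    print(stats.kstest(s, stats.gamma(a=n, scale=1/alpha).cdf))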
8.7 Normal limit and approximation. Suppose Gn ∼ Gamma(n, α). It follows from proposition(8.6a) and the Central Limit Theorem that for n large,
    (Gn − n/α)/(√n/α)   is approximately N(0, 1)
and hence for large n
    P[Gn ≤ x] ≈ Φ( (αx − n)/√n )
The local central limit theorem² shows that
    lim_{n→∞} (√n/α) f_{Gn}( (n + z√n)/α ) = n(z)   where n(x) = (1/√(2π)) e^{−x²/2}   (8.7a)
See exercise 14 on page 25 below.
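For example (a sketch assuming scipy; the values of n, α and x are arbitrary), the normal approximation to the Gamma(n, α) distribution function can be compared with the exact value:

    from math import sqrt
    from scipy import stats

    n, alpha, x = 100, 2.0, 55.0
    exact = stats.gamma.cdf(x, a=n, scale=1/alpha)
    approx = stats.norm.cdf((alpha * x - n) / sqrt(n))
    print(exact, approx)   # close, and the agreement improves as n grows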
8.8 Lukacs’ characterization of the gamma distribution.
Proposition(8.8a). Suppose X and Y are both positive, non-degenerate 3 and independent random variables.
Then X/(X+Y ) is independent of X+Y iff there exist k1 > 0, k2 > 0 and α > 0 such that X ∼ Gamma(k1 , α)
and Y ∼ Gamma(k2 , α).
Proof.
⇐ This is exercise 5 on page 24.
⇒ This is proved in [L UKACS(1955)] and [M ARSAGLIA(1989)].

2 The local central limit theorem. Suppose Y1, Y2, . . . are i.i.d. random variables with mean 0 and variance 1 and characteristic function φ_Y. Suppose further that |φ_Y|^k is integrable for some positive k and sup{|φ_Y(t)| : |t| ≥ δ} < 1 for all δ > 0. Let Sn = Y1 + · · · + Yn; then Sn/√n has a bounded continuous density f_n for all n ≥ k and sup_{x∈R} |f_n(x) − n(x)| → 0 as n → ∞.
This formulation is due to Michael Wichura: galton.uchicago.edu/~wichura/Stat304/Handouts/L16.limits.pdf.
See also page 516 in [F ELLER(1971)].
3
To exclude the trivial case that both X and Y are constant. In fact if one of X and Y is constant and X/(X +Y ) is independent
of X + Y , then the other must be constant also.

We can easily extend this result to n variables:


Proposition(8.8b). Suppose X1 , X2 , . . . , Xn are positive, non-degenerate and independent random variables.
Then Xj /(X1 + · · · + Xn ) is independent of X1 + · · · + Xn for j = 1, 2, . . . , n iff there exist α > 0,
k1 > 0, . . . , kn > 0 such that Xj ∼ Gamma(kj , α) for j = 1, 2, . . . , n.
Proof.
⇐ Let W = X2 + · · · + Xn . By equation(8.5a), W ∼ Gamma(k2 + · · · + kn , α). Also X1 ∼ Gamma(k1 , α) and
W and X1 are independent positive random variables. Hence X1 /(X1 + · · · + Xn ) is independent of X1 + · · · + Xn by
proposition(8.8a). Similarly Xj /(X1 + · · · + Xn ) is independent of X1 + · · · + Xn for j = 2, . . . , n.
⇒ Let Wj = X1 + · · · + Xn − Xj . Then Wj and Xj are independent positive random variables. Also Xj /(Wj + Xj ) is
independent of Wj + Xj . By proposition(8.8a), there exist kj > 0, kj∗ > 0 and αj > 0 such that Xj ∼ Gamma(kj , αj )
and Wj ∼ Gamma(kj∗ , αj ). Hence X1 + · · · + Xn = Wj + Xj ∼ Gamma(kj + kj∗ , αj ). The same argument works for
j = 1, 2, . . . , n; this implies α1 = · · · = αn . The result follows.

8.9 The χ² distribution. For n ∈ (0, ∞) the Gamma(n/2, 1/2) distribution has density:
    f(x) = x^{n/2−1} e^{−x/2} / ( 2^{n/2} Γ(n/2) )   for x > 0.
This is the density of the χ²_n distribution. If n is a positive integer, then n is called the degrees of freedom.
In particular, if n ∈ (0, ∞) and X ∼ Gamma(n/2, α) then 2αX ∼ Gamma(n/2, 1/2) = χ²_n.
If Y ∼ χ²_n = Gamma(n/2, 1/2), then equation(8.4a) shows that the kth moment of Y is given by
    E[Y^k] = 2^k Γ(k + n/2)/Γ(n/2)   if n > −2k;   E[Y^k] = ∞   if n ≤ −2k.   (8.9a)
In particular E[Y] = n, E[Y²] = n(n + 2), var[Y] = 2n,
    E[√Y] = √2 Γ((n+1)/2)/Γ(n/2)   and   E[1/Y] = 1/(n − 2)   provided n > 2.

By equation(8.5a), the c.f. of the χ2n distribution is 1/(1 − 2it)n/2 . It immediately follows that if X ∼ χ2m ,
Y ∼ χ2n and X and Y are independent, then X + Y ∼ χ2m+n .
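A quick numerical sanity check (a sketch assuming scipy) that the χ²_n density coincides with the Gamma(n/2, 1/2) density—in scipy's parametrisation, shape n/2 and scale 2:

    import numpy as np
    from scipy import stats

    n = 7
    x = np.linspace(0.1, 20, 5)
    chi2_pdf = stats.chi2.pdf(x, df=n)
    gamma_pdf = stats.gamma.pdf(x, a=n/2, scale=2)   # rate 1/2 corresponds to scale 2
    print(np.allclose(chi2_pdf, gamma_pdf))          # True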

8.10 The generalized gamma distribution.


Definition(8.10a). Suppose n > 0, λ > 0 and b > 0. Then the random variable X has the generalized gamma distribution GGamma(n, λ, b) iff X has density
    f(x) = ( b λ^n / Γ(n) ) x^{bn−1} e^{−λx^b}   for x > 0.   (8.10a)
Note that if n = b = 1, then the generalized gamma is the exponential distribution, if b = 1, the generalized gamma
is the gamma distribution, if n = 1, the generalized gamma is the Weibull distribution—introduced below in §24.3
on page 56, if n = 1, b = 2 and λ = 1/2σ2 , the generalized gamma is the Rayleigh distribution—introduced below
in §24.2 on page 56, and finally if n = 1/2, b = 2 and λ = 1/2σ2 then the generalized gamma is the half-normal
distribution—introduced in exercise 1.11(8) on page 28.

It is left to an exercise (see exercise 16 on page 25) to check:
• The function f in equation(8.10a) integrates to 1 and so is a density.
• If X ∼ GGamma(n, λ, b) then Y = X^b ∼ Gamma(n, λ).
• The moments are given by the expression:
    E[X^k] = Γ(n + k/b) / ( λ^{k/b} Γ(n) )

The generalized gamma distribution is used in survival analysis and reliability theory to model lifetimes.
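A short simulation sketch (assuming numpy and scipy; parameter values arbitrary) of the second bullet point: if Y ∼ Gamma(n, λ) then X = Y^{1/b} has the GGamma(n, λ, b) distribution, so X^b should again look Gamma(n, λ):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, lam, b = 2.5, 1.5, 3.0
    y = rng.gamma(shape=n, scale=1/lam, size=100_000)   # Y ~ Gamma(n, lambda)
    x = y ** (1/b)                                      # X ~ GGamma(n, lambda, b)
    print(stats.kstest(x**b, stats.gamma(a=n, scale=1/lam).cdf))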

8.11 Summary. The gamma distribution.
• Density. X has the Gamma(n, α) density for n > 0 and α > 0 iff
    f_X(x) = α^n x^{n−1} e^{−αx} / Γ(n)   for x > 0.
• Moments. E[X] = n/α; var[X] = n/α² and E[X^k] = Γ(n + k)/(α^k Γ(n)) for n + k > 0.
• M.g.f. and c.f.
    M_X(t) = E[e^{tX}] = 1/(1 − t/α)^n for t < α;   φ_X(t) = E[e^{itX}] = 1/(1 − it/α)^n.
• Properties.
Gamma(1, α) is the exponential (α) distribution.
If X ∼ Gamma(n, α) and β > 0 then βX ∼ Gamma(n, α/β ).
The Gamma(n, α) distribution is the sum of n independent exponential (α) distributions.
If X ∼ Gamma(m, α), Y ∼ Gamma(n, α) and X and Y are independent, then X + Y ∼ Gamma(m + n, α).
The χ2n distribution.
• This is the Gamma( n/2, 1/2) distribution.
• If X ∼ χ2n , then E[X] = n, var[X] = 2n and the c.f. is φ(t) = 1/(1 − 2it)n/2 .
• If X ∼ χ2m , Y ∼ χ2n and X and Y are independent, then X + Y ∼ χ2m+n .
• The χ22 distribution is the exponential ( 1/2) distribution.

9 Exercises (exs-gamma.tex)
1. The Gamma function. This is defined to be Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du for x > 0. Show that
(a) Γ(x + 1) = x Γ(x) for all x > 0;   (b) Γ(1) = 1;
(c) Γ(n) = (n − 1)! for all integral n ≥ 2;   (d) Γ(1/2) = √π;
(e) Γ(n + 1/2) = [ 1·3·5 · · · (2n − 1)/2^n ] √π = [ (2n)!/(2^{2n} n!) ] √π for integral n ≥ 1.
2. Suppose X ∼ Gamma(m, α) and Y ∼ Gamma(n, α) and X and Y are independent. Find E[ Y /X ].
3. By §8.4 on page 21, we know that if n > 1, the maximum of the Gamma(n, 1) density occurs at x = n − 1. Show that the maximum value of the density when n > 1 is approximately
    1/√(2π(n − 1))
Hint: Stirling's formula is n! ∼ n^n e^{−n} √(2πn) as n → ∞.
4. Gamma densities are closed under convolution. Suppose X ∼ Gamma(n1 , α), Y ∼ Gamma(n2 , α) and X and Y are
independent. Prove that X + Y has the Gamma(n1 + n2 , α) distribution.
5. Suppose X ∼ Gamma(m, α) and Y ∼ Gamma(n, α) and X and Y are independent.
(a) Show that U = X + Y and V = X/(X + Y ) are independent..
(b) Show that U = X + Y and V = Y /X are independent.
In both cases, find the densities of U and V .
6. Suppose X1 ∼ Gamma(k1, λ), X2 ∼ Gamma(k2, λ) and X3 ∼ Gamma(k3, λ). Suppose further that X1, X2 and X3 are independent. Let
    Y1 = X1/(X1 + X2),   Y2 = (X1 + X2)/(X1 + X2 + X3),   Y3 = X1 + X2 + X3.
Show that Y1, Y2 and Y3 are independent and find their distributions.
7. Suppose X ∼ Gamma(n, α). Show that
    P[ X ≥ 2n/α ] ≤ (2/e)^n
8. Suppose the random variable X has the following central moments:
(a) E[X k ] = 2k−1 (k + 2)! for k = 1, 2, 3, . . . . (b) E[X k ] = 2k (k + 1)! for k = 1, 2, 3, . . . .
In both cases, find the distribution of X.

9. Suppose U ∼ U (0, 1). Find the distribution of Y = −2 ln(U ).


10. Suppose X ∼ Gamma(m, α) and Y ∼ Gamma(n, α) and X and Y are independent. Show that
    E[X | X + Y = v] = mv/(m + n)   if v > 0;   and 0 otherwise.
11. Suppose the random variable X ∼ exponential (Y ) where Y ∼ Gamma(n, α).
(a) Find the density of X and E[X].
(b) Find the conditional density of Y given X = x.
12. Suppose X ∼ exponential(λ), and given X = x, the n random variables Y1 , . . . , Yn are i.i.d. exponential(x).4 Find the
distribution of (X|Y1 , . . . , Yn ) and E[X|Y1 , . . . , Yn ].
13. The Poisson-Gamma mixture. Suppose X ∼ Gamma(n, α); suppose further that given the random variable X then Y
has a Poisson distribution with expectation X. Compute P[Y = j] for j = 0, 1, 2, . . . .
14. Suppose Gn ∼ Gamma(n, α) and Sn = α(Gn − n/α) where n > 1 is an integer. Hence Sn = α(Gn − n/α) = Σ_{i=1}^n α(Xi − 1/α) = Σ_{i=1}^n Yi where each Xi ∼ exponential(α) and each Yi has mean 0 and variance 1.
Check that the conditions of the local central limit theorem (§8.7 on page 22) are satisfied and hence verify the limiting
result (8.7a) on page 22.
15. Length biased sampling in the Poisson process. Suppose {Xj }j≥1 is a sequence of i.i.d. random variables with the
exponential (α) distribution. For n ≥ 1, let Sn = X1 + · · · + Xn and suppose t ∈ (0, ∞).
Define the random variable K to be the unique integer with SK−1 < t ≤ SK ; equivalently K = min{j : Sj ≥ t}.
(a) Find the density of XK . Find E[XK ] and compare with 1/α, the expectation of an exponential (α) distribution.
Note that a longer interval has a higher chance of containing t!
(b) Let Wt denote the waiting time to the next event after time t; hence Wt = SK − t.
Find the distribution of Wt .
16. The generalized gamma distribution.
(a) Show that the function f defined in equation(8.10a) is a density.
(b) Suppose X ∼ GGamma(n, λ, b). Show that Y = X b ∼ Gamma(n, λ).
(c) Suppose X ∼ GGamma(n, λ, b). Find the central moments E[X k ] for k = 1, 2, . . . .

10 The normal distribution


10.1 The density function.
Definition(10.1a). Suppose µ ∈ (−∞, ∞) and σ ∈ (0, ∞). Then the random variable X has the normal distribution N(µ, σ²) if it has density
    f_X(x) = (1/(σ√(2π))) exp[ −(x − µ)²/(2σ²) ]   for x ∈ R.   (10.1a)
The normal density has the familiar "bell" shape. There are points of inflection at x = µ − σ and x = µ + σ—this means that f″(x) = 0 at these points and the curve changes from convex, when x < µ − σ, to concave and then to convex again when x > µ + σ.

Figure(10.1a). The graph of the normal density. Points A and B (at x = µ − σ and x = µ + σ) are points of inflection.

4
This means that f(Y1 ,...,Yn )|X (y1 , . . . , yn |x) = Πni=1 fYi |X (yi |x) = xn e−x(y1 +···+yn ) .

To check that the function f_X defined in equation(10.1a) is a density function:
Clearly f_X(x) ≥ 0 for all x ∈ R. Using the substitution t = (x − µ)/σ gives
    I = ∫_{−∞}^∞ f_X(x) dx = ∫_{−∞}^∞ (1/(σ√(2π))) exp[ −(x − µ)²/(2σ²) ] dx
      = (1/√(2π)) ∫_{−∞}^∞ exp[ −t²/2 ] dt = (2/√(2π)) ∫_0^∞ exp[ −t²/2 ] dt = √(2/π) J
where
    J² = ∫_0^∞ ∫_0^∞ exp[ −(x² + y²)/2 ] dy dx = ∫_0^{π/2} ∫_0^∞ r exp[ −r²/2 ] dr dθ = π/2
and hence
    J = √(π/2)
This shows that fX integrates to 1 and hence is a density function.
10.2 The distribution function, mean and variance. The standard normal distribution is the normal distribu-
tion N (0, 1); its distribution function is
    Φ(x) = ∫_{−∞}^x (1/√(2π)) exp[ −t²/2 ] dt
This function is widely tabulated. Note that:
• Φ(−x) = 1 − Φ(x). See exercise 1 on page 28.
• If X has the N(µ, σ²) distribution, then for −∞ < a < b < ∞ we have
    P[a < X ≤ b] = ∫_a^b (1/(σ√(2π))) exp[ −(x − µ)²/(2σ²) ] dx = (1/√(2π)) ∫_{(a−µ)/σ}^{(b−µ)/σ} exp[ −t²/2 ] dt
                = Φ( (b − µ)/σ ) − Φ( (a − µ)/σ )
The mean of the N(µ, σ²) distribution:
    E[X] = ∫_{−∞}^∞ [ (x − µ) + µ ] (1/(σ√(2π))) exp[ −(x − µ)²/(2σ²) ] dx = 0 + µ = µ
because the function x ↦ x exp[−x²/2] is odd.
The variance of the N(µ, σ²) distribution: use integration by parts as follows
    var[X] = ∫_{−∞}^∞ (x − µ)² (1/(σ√(2π))) exp[ −(x − µ)²/(2σ²) ] dx = (σ²/√(2π)) ∫_{−∞}^∞ t² exp[ −t²/2 ] dt
           = (2σ²/√(2π)) ∫_0^∞ t · t exp[ −t²/2 ] dt = (2σ²/√(2π)) ∫_0^∞ exp[ −t²/2 ] dt = σ²
10.3 The moment generating function and characteristic function. Suppose X ∼ N(µ, σ²) and X = µ + σY. Then Y ∼ N(0, 1) by using the usual change of variable method. For s ∈ R, the moment generating function of X is given by
    M_X(s) = E[e^{sX}] = e^{sµ} E[e^{sσY}] = e^{sµ} ∫_{−∞}^∞ e^{sσt} (1/√(2π)) e^{−t²/2} dt
           = (e^{sµ}/√(2π)) ∫_{−∞}^∞ exp[ −(t² − 2σst)/2 ] dt = (e^{sµ}/√(2π)) ∫_{−∞}^∞ exp[ −(t − σs)²/2 + σ²s²/2 ] dt
           = exp[ sµ + σ²s²/2 ]
Similarly the characteristic function of X is E[e^{itX}] = e^{iµt − σ²t²/2}.
Moments of a distribution can be obtained by expanding the moment generating function as a power series: E[X^r] is the coefficient of s^r/r! in the expansion of the moment generating function. It is easy to find the moments about the mean of a normal distribution in this way: if X ∼ N(µ, σ²) and Y = X − µ then E[e^{sY}] = exp[σ²s²/2] which can be expanded in a power series of powers of s. Hence
    E[(X − µ)^{2n+1}] = E[Y^{2n+1}] = 0   for n = 0, 1, . . .
and
    E[(X − µ)^{2n}] = E[Y^{2n}] = (2n)! σ^{2n}/(2^n n!)   for n = 0, 1, . . .
For example, E[(X − µ)2 ] = σ 2 and E[(X − µ)4 ] = 3σ 4 .

Similarly we can show that (see exercise 10 on page 28):
    E[|X − µ|^n] = (2^{n/2} σ^n/√π) Γ( (n + 1)/2 )   for n = 0, 1, . . . .
More complicated expressions are available for E[X^n] and E[|X|^n]; see, for example, [WINKELBAUER(2014)].
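These moment formulae are easy to check by simulation (a sketch assuming numpy and scipy.special; seed, sample size and parameters are arbitrary):

    import numpy as np
    from scipy.special import gamma as gamma_fn

    rng = np.random.default_rng(2)
    mu, sigma = 1.0, 2.0
    x = rng.normal(mu, sigma, size=1_000_000)
    print(np.mean((x - mu)**4), 3 * sigma**4)                          # E[(X-mu)^4] = 3 sigma^4
    n = 3
    print(np.mean(np.abs(x - mu)**n),
          2**(n/2) * sigma**n * gamma_fn((n + 1)/2) / np.sqrt(np.pi))  # E|X-mu|^n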

10.4 Linear combination of independent normals.


Proposition(10.4a). Suppose X1, X2, . . . , Xn are independent random variables with Xi ∼ N(µi, σi²) for i = 1, 2, . . . , n. Let T = Σ_{i=1}^n di Xi where di ∈ R for i = 1, 2, . . . , n. Then
    T ∼ N( Σ_{i=1}^n di µi , Σ_{i=1}^n di² σi² )
Proof. Using moment generating functions gives
    M_T(s) = E[e^{sT}] = Π_{i=1}^n E[e^{s di Xi}] = Π_{i=1}^n M_{Xi}(s di) = Π_{i=1}^n exp[ s di µi + s² di² σi²/2 ]
           = exp[ s Σ_{i=1}^n di µi + s² Σ_{i=1}^n di² σi²/2 ]
which is the mgf of N( Σ_{i=1}^n di µi , Σ_{i=1}^n di² σi² ).
Corollary(10.4b). If X1, . . . , Xn are i.i.d. N(µ, σ²), then X̄ = Σ_{i=1}^n Xi/n ∼ N(µ, σ²/n).

10.5 Sum of squares of independent N (0, 1) variables.


Proposition(10.5a). Suppose X1 ,. . . , Xn are i.i.d. random variables with the N (0, 1) distribution.
Let Z = X12 + · · · + Xn2 . Then Z ∼ χ2n .
Proof. Consider n = 1. Now X1 has density
    f_{X1}(x) = (1/√(2π)) e^{−x²/2}   for x ∈ R.
Then Z = X1² has density
    f_Z(z) = 2 f_{X1}(√z) |dx/dz| = 2 (1/√(2π)) e^{−z/2} (1/(2√z)) = z^{−1/2} e^{−z/2}/( 2^{1/2} Γ(1/2) )   for z > 0.
Thus Z ∼ χ²_1. We know that if X ∼ χ²_m, Y ∼ χ²_n and X and Y are independent, then X + Y ∼ χ²_{m+n}. Hence Z ∼ χ²_n in the general case.
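Proposition(10.5a) can also be illustrated by simulation (a sketch assuming numpy and scipy; seed and sample size arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 6
    z = (rng.standard_normal(size=(200_000, n))**2).sum(axis=1)
    print(stats.kstest(z, stats.chi2(df=n).cdf))   # large p-value: consistent with chi^2_n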

10.6 Characterizations of the normal distribution. There are many characterizations of the normal distribu-
tion5 —here are two of the most useful and interesting.
Proposition(10.6a). Cramér’s theorem. Suppose X and Y are independent random variables such that Z =
X + Y has a normal distribution. Then both X and Y have normal distributions—although one may have a
degenerate distribution.
Proof. See, for example, page 298 in [M ORAN(2003)].
Proposition(10.6b). The Skitovich-Darmois theorem. Suppose n ≥ 2 and X1 , . . . , Xn are independent random
variables. Suppose a1 , . . . , an , b1 , . . . , bn are all in R and
L1 = a1 X1 + · · · + an Xn   and   L2 = b1 X1 + · · · + bn Xn
If L1 and L2 are independent, then all random variables Xj with aj bj 6= 0 are normal.
Proof. See, for example, page 89 in [K AGAN et al.(1973)].

5
For example, see [M ATHAI & P EDERZOLI(1977)] and [PATEL & R EAD(1996)]

10.7 Summary. The normal distribution.
• Density. X has the N(µ, σ²) distribution iff it has the density
    f_X(x) = (1/(σ√(2π))) exp[ −(x − µ)²/(2σ²) ]   for x ∈ R.
• Moments: E[X] = µ and var[X] = σ².
• The distribution function: if X ∼ N(µ, σ²) then P[X ≤ x] = Φ( (x − µ)/σ ), where Φ (the N(0, 1) distribution function) is tabulated.
• The moment generating function: M_X(t) = E[e^{tX}] = exp[ tµ + t²σ²/2 ]
• The characteristic function: φ_X(t) = E[e^{itX}] = exp[ iµt − σ²t²/2 ]
• A linear combination of independent normals has a normal distribution.
• The sum of squares of n independent N(0, 1) variables has the χ²_n distribution.

11 Exercises (exs-normal.tex)

1. Show that Φ(−x) = 1 − Φ(x).

2. Suppose X ∼ N (µ, σ 2 ). Suppose further that P[X ≤ 140] = 0.3 and P[X ≤ 200] = 0.6. Find µ and σ 2 .

3. Suppose Y has the distribution function F_Y(y) with
    F_Y(y) = (1/2)Φ(y)   if y < 0;   (1/2) + (1/2)Φ(y)   if y ≥ 0.
Find E[Y^n] for n = 0, 1, . . . .

4. Suppose X is a random variable with density fX (x) = ce−Q(x) for all x ∈ R where Q(x) = ax2 − bx and a 6= 0.
(a) Find any relations that must exist between a, b and c and show that X must have a normal density.
(b) Find the mean and variance of X in terms of a and b.

5. (a) Suppose X and Y are i.i.d. random variables with the N (0, σ 2 ) distribution. Find the density of Z = X 2 + Y 2 .
(b) Suppose X1 ,. . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution. Let Z = X12 + · · · + Xn2 . Find the
distribution of Z.

6. Suppose X and Y are i.i.d. random variables. Let V = X 2 + Y 2 and W = X/Y . Are V and W independent?

7. Suppose X ∼ N (µ, σ 2 ). Suppose further that, given X = x, the n random variables Y1 , . . . , Yn are i.i.d. N (x, σ12 ).6
Find the distribution of (X|Y1 , . . . , Yn ).

8. The half-normal distribution. Suppose X ∼ N (0, σ 2 ).


(a) Find the density of |X|.
(b) Find E[|X|].

9. The folded normal distribution. Suppose X ∼ N (µ, σ 2 ). Then |X| has the folded normal distribution, folded (µ, σ 2 ).
Clearly the half-normal is the folded (0, σ 2 ) distribution.
Suppose Y ∼ folded (µ, σ 2 ).
(a) Find the density of Y .
b) Find E[Y ] and var[Y ].
(c) Find the c.f. of Y .

10. Suppose X ∼ N(µ, σ²). Show that
    E[|X − µ|^n] = (2^{n/2} σ^n/√π) Γ( (n + 1)/2 )   for n = 0, 1, . . . .
This also gives E[|X|^n] for the half-normal distribution.

6
This means that f(Y1 ,...,Yn )|X (y1 , . . . , yn |x) = Πni=1 fYi |X (yi |x).

11. Suppose X and Y are i.i.d. N (0, 1).


(a) Let Z1 = X + Y and Z2 = X − Y. Show that Z1 and Z2 are independent. Hence deduce that the distribution of (X − Y)/(X + Y) is the same as the distribution of X/Y.
(b) By using the relation XY = ( (X + Y)/2 )² − ( (X − Y)/2 )², find the characteristic function of Z = XY.
itXY
R∞ ityX
(c) By using the relation E[e ] = −∞ E[e ]fY (y) dy, find the characteristic function of Z = XY .
(d) Now suppose X and Y are i.i.d. N (0, σ 2 ). Find the c.f. of Z = XY .
(e) Now suppose X and Y are i.i.d. N (µ, σ 2 ). Find the c.f. of Z = XY .
12. Suppose X1 , X2 , X3 and X4 are i.i.d. N (0, 1). Find the c.f. of X1 X2 + X3 X4 and the c.f. of X1 X2 − X3 X4 . See also
exercise 1.25(9) on page 57.
13. (a) Suppose b ∈ (0, ∞). Show that
    ∫_0^∞ exp[ −(u² + b²/u²)/2 ] du = √(π/2) e^{−b}   (11.13a)
(b) Suppose a ∈ R with a ≠ 0 and b ∈ R. Show that
    ∫_0^∞ exp[ −(a²u² + b²/u²)/2 ] du = ( π/(2a²) )^{1/2} e^{−|ab|}   (11.13b)
This result is used in exercise 1.25(15) on page 57.

12 The lognormal distribution


12.1 The definition.
Definition(12.1a). Suppose µ ∈ R and σ ∈ R; then the random variable X has the lognormal distribution,
logN (µ, σ 2 ), iff ln(X) ∼ N (µ, σ 2 ).
Hence:
• if X ∼ logN (µ, σ 2 ) then ln(X) ∼ N (µ, σ 2 );
• if Z ∼ N (µ, σ 2 ) then eZ ∼ logN (µ, σ 2 ).
12.2 The density and distribution function. Suppose X ∼ logN(µ, σ²) and let Z = ln(X). Then
    F_X(x) = P[X ≤ x] = P[Z ≤ ln(x)] = Φ( (ln(x) − µ)/σ )
hence the distribution function of the logN(µ, σ²) distribution is
    F_X(x) = Φ( (ln x − µ)/σ )   for x > 0.
Differentiating the distribution function gives the density:
    f_X(x) = (1/(σx)) φ( (ln x − µ)/σ ) = (1/(√(2π) σ x)) exp[ −(ln x − µ)²/(2σ²) ]   for x > 0.
The density can also be obtained by transforming the normal density as follows. Now X = e^Z where Z ∼ N(µ, σ²). Hence |dx/dz| = e^z = x; hence f_X(x) = f_Z(z)|dz/dx| = f_Z(ln x)/x where f_Z is the density of N(µ, σ²).
Figure(12.2a). The graph of the lognormal density for (µ = 0, σ = 0.25), (µ = 0, σ = 0.5) and (µ = 0, σ = 1).
In all 3 cases, we have median = 1, mode < 1 and mean > 1—see exercise 4 on page 32.

12.3 Moments. Suppose X ∼ logN(µ, σ²). Then E[X^n] = E[e^{nZ}] = exp[ nµ + n²σ²/2 ] for any n ∈ R. In particular
    E[X] = exp[ µ + σ²/2 ]   (12.3a)
    var[X] = E[X²] − {E[X]}² = e^{2µ+σ²} ( e^{σ²} − 1 )
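These formulae are easy to verify numerically (a sketch assuming scipy; in scipy's parametrisation logN(µ, σ²) is lognorm(s=σ, scale=e^µ), and the parameter values are arbitrary):

    import numpy as np
    from scipy import stats

    mu, sigma = 0.3, 0.8
    X = stats.lognorm(s=sigma, scale=np.exp(mu))
    print(X.mean(), np.exp(mu + sigma**2/2))                          # E[X]
    print(X.var(), np.exp(2*mu + sigma**2) * (np.exp(sigma**2) - 1))  # var[X]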

12.4 Other properties.


• Suppose X1, . . . , Xn are independent random variables with Xi ∼ logN(µi, σi²) for i = 1, . . . , n. Then
    Π_{i=1}^n Xi = X1 · · · Xn ∼ logN( Σ_{i=1}^n µi , Σ_{i=1}^n σi² )
• Suppose X1, . . . , Xn are i.i.d. with the logN(µ, σ²) distribution. Then
    (X1 · · · Xn)^{1/n} ∼ logN( µ, σ²/n )
• If X ∼ logN(µ, σ²), b ∈ R and c > 0 then
    cX^b ∼ logN( ln(c) + bµ , b²σ² )   (12.4a)
See exercises 5 and 2 below for the derivations of these results.
12.5 The multiplicative central limit theorem.
Proposition(12.5a). Suppose X1, . . . , Xn are i.i.d. positive random variables such that
    E[ln(X)] = µ   and   var[ln(X)] = σ²
both exist and are finite. Then
    ( (X1 · · · Xn)/e^{nµ} )^{1/√n} →_D logN(0, σ²)   as n → ∞.
Proof. Let Yi = ln(Xi) for i = 1, 2, . . . , n. Then
    ln[ ( (X1 · · · Xn)/e^{nµ} )^{1/√n} ] = Σ_{i=1}^n (Yi − µ)/√n →_D N(0, σ²)   as n → ∞. 7
Now if Xn →_D X as n → ∞ then g(Xn) →_D g(X) as n → ∞ for any continuous function, g. Taking g(x) = e^x proves the proposition.

Using equation(12.4a) shows that if X ∼ logN(0, σ²) then X^{1/σ} ∼ logN(0, 1). It follows that
    lim_{n→∞} P[ ( (X1 · · · Xn)/e^{nµ} )^{1/(σ√n)} ≤ x ] = Φ(x)   for all x > 0.
Also, if we let
    W = ( (X1 · · · Xn)/e^{nµ} )^{1/√n}   then   (X1 · · · Xn)^{1/√n} = e^{µ√n} W   and   (X1 · · · Xn)^{1/n} = e^µ W^{1/√n}
and hence by equation(12.4a), (X1 · · · Xn)^{1/n} is asymptotically logN(µ, σ²/n).
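A simulation sketch of the multiplicative central limit theorem (assuming numpy and scipy; the choice of U(0.5, 2) factors, the seed and the sample sizes are arbitrary, and µ and σ are estimated from the simulated logs rather than computed exactly):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    n, reps = 400, 50_000
    x = rng.uniform(0.5, 2.0, size=(reps, n))           # any positive i.i.d. X_i will do
    logx = np.log(x)
    mu, sigma = logx.mean(), logx.std()                 # approx E[ln X] and sd[ln X]
    w = np.exp((logx.sum(axis=1) - n*mu) / np.sqrt(n))  # ((X_1...X_n)/e^{n mu})^{1/sqrt(n)}
    print(stats.kstest(w, stats.lognorm(s=sigma, scale=1.0).cdf))   # approx logN(0, sigma^2)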

We can generalise proposition (12.5a) as follows:


Proposition(12.5b). Suppose X1, X2, . . . is a sequence of independent positive random variables such that for all i = 1, 2, . . .
    E[ln(Xi)] = µi,   var[ln(Xi)] = σi²   and   E[ |ln(Xi) − µi|³ ] = ωi³
all exist and are finite. For n = 1, 2, . . . , let
    µ(n) = Σ_{i=1}^n µi,   s²_{(n)} = Σ_{i=1}^n σi²,   ω³_{(n)} = Σ_{i=1}^n ωi³

7 The classical central limit theorem asserts that if X1, X2, . . . is a sequence of i.i.d. random variables with finite expectation µ and finite variance σ² and Sn = (X1 + · · · + Xn)/n, then
    √n (Sn − µ) →_D N(0, σ²)   as n → ∞.
See page 357 in [BILLINGSLEY(1995)].

Suppose further that ω_{(n)}/s_{(n)} → 0 as n → ∞. Then
    ( (X1 · · · Xn)/e^{µ(n)} )^{1/s_{(n)}} →_D logN(0, 1)   as n → ∞.
Proof. Let Yi = ln(Xi) for i = 1, 2, . . . , n. Then
    ln[ ( (X1 · · · Xn)/e^{µ(n)} )^{1/s_{(n)}} ] = Σ_{i=1}^n (Yi − µi)/s_{(n)} →_D N(0, 1)   as n → ∞. 8
Using the transformation g(x) = e^x proves the proposition.

Also, if we let
    W = ( (X1 · · · Xn)/e^{µ(n)} )^{1/s_{(n)}}   then   X1 · · · Xn = e^{µ(n)} W^{s_{(n)}}
and hence by equation(12.4a), the random variable X1 · · · Xn is asymptotically logN( µ(n), s²_{(n)} ).

12.6 Usage. The multiplicative central limit theorem suggests the following applications of the lognormal which
can be verified by checking available data.
• Grinding, where a whole is divided into a multiplicity of particles and the particle size is measured by volume,
mass, surface area or length.
• Distribution of farm size (which corresponds to a division of land)—where a 3-parameter lognormal can be
used. The third parameter would be the smallest size entertained.
• The size of many natural phenomena is due to the accumulation of many small percentage changes—leading to
a lognormal distribution.

12.7 Summary. The lognormal distribution.
• X ∼ logN(µ, σ²) iff ln(X) ∼ N(µ, σ²).
• Moments: if X ∼ logN(µ, σ²) then E[X] = exp[ µ + σ²/2 ] and var[X] = e^{2µ+σ²}( e^{σ²} − 1 ).
• The product of independent lognormals is lognormal.
• If X ∼ logN(µ, σ²), b ∈ R and c > 0 then cX^b ∼ logN( ln(c) + bµ, b²σ² ).
• The multiplicative central limit theorem.

13 Exercises (exs-logN.tex)

1. An investor forecasts that the returns on an investment over the next 4 years will be as follows: for each of the first
2 years he estimates that £1 will grow to £(1 + I) where I is a random variable with E[I] = 0.08 and var[I] = 0.001;
for each of the last 2 years he estimates that £1 will grow to £(1 + I) where I is a random variable with E[I] = 0.06 and
var[I] = 0.002.
Suppose he further assumes that the return Ij in year j is independent of the returns in all other years and that 1 + Ij has
a lognormal distribution, for j = 1, 2, . . . , 4.
Calculate the amount of money which must be invested at time t = 0 in order to ensure that there is a 95% chance that
the accumulated value at time t = 4 is at least £5,000.

2. Suppose X ∼ logN (µ, σ 2 ).


(a) Find the distribution of 1/X. (b) Suppose b ∈ R and c > 0. Find the distribution of cX b .

8 Lyapunov central limit theorem with δ = 1. Suppose X1, X2, . . . is a sequence of independent random variables such that E[Xi] = µi and var[Xi] = σi² are both finite. Let s²_n = σ1² + · · · + σn² and suppose
    lim_{n→∞} (1/s³_n) Σ_{i=1}^n E[ |Xi − µi|³ ] = 0;   then   Σ_{i=1}^n (Xi − µi)/s_n →_D N(0, 1)   as n → ∞.
See page 362 in [BILLINGSLEY(1995)].

3. The geometric mean and geometric variance of a distribution. Suppose each xi in the data set {x1, . . . , xn} satisfies xi > 0. Then the geometric mean of the data set is g = (x1 · · · xn)^{1/n}, or ln(g) = (1/n) Σ_{i=1}^n ln(xi), or ln(g) = (1/n) Σ_j fj ln(xj) where fj is the frequency of the observation xj. This definition motivates the following.
Suppose X is a random variable with X > 0. Then GM_X, the geometric mean of X, is defined by ln(GM_X) = ∫_0^∞ ln(x) f_X(x) dx = E[ln(X)].
Similarly, we define the geometric variance, GV_X, by
    ln(GV_X) = E[ (ln X − ln GM_X)² ] = var[ln(X)]
and the geometric standard deviation by GSD_X = √GV_X.
Suppose X ∼ logN(µ, σ²). Find GM_X and GSD_X.

4. Suppose X ∼ logN (µ, σ 2 ).


(a) Find the median and mode and show that: mode < median < mean.
(b) Find expressions for the lower and upper quartiles of X in terms of µ and σ.
(c) Suppose αp denotes the p-quartile of X; this means that P[X ≤ αp ] = p. Prove that αp = eµ+σβp where βp is the
p-quartile of the N (0, 1) distribution.

5. (a) Suppose X1 , .Q. . , Xn are independent random variables with Xi ∼ logN (µi , σi2 ) for i = 1, . . . , n. Find the
n
distribution of i=1 Xi = X1 · · · Xn .
(b) Suppose X1 , . . . , Xn are i.i.d. with the logN (µ, σ 2 ) distribution. Find the distribution of (X1 · · · Xn )1/n .
(c) Suppose X1 , . . . , Xn be independent random variables with Xi ∼ logN (µi , σi2 ) for i = 1, . . . , n. Suppose further
that a1 , . . . , an are real constants. Show that
Yn
Xiai ∼ logN (mn , s2n )
i=1
for some mn and sn and find explicit expressions for mn and sn .

6. Suppose X1 and X2 are independent random variables with Xi ∼ logN (µi , σi2 ) for i = 1 and i = 2. Find the distribution
of X1 /X2 .

7. Suppose X ∼ logN (µ, σ 2 ). Suppose further that E[X] = α and var[X] = β. Express µ and σ 2 in terms of α and β.

8. Suppose X ∼ logN(µ, σ²) and k > 0. Show that
    E[X | X < k] = e^{µ + σ²/2} Φ( (ln(k) − µ − σ²)/σ ) / Φ( (ln(k) − µ)/σ )
and
    E[X | X ≥ k] = e^{µ + σ²/2} Φ( (µ + σ² − ln(k))/σ ) / [ 1 − Φ( (ln(k) − µ)/σ ) ]

9. Suppose X ∼ logN (µ, σ 2 ). Then the j th moment distribution function of X is defined to be the function G : [0, ∞) →
[0, 1] with
Z x
1
G(x) = uj fX (u) du
E[X j ] 0
(a) Show that G is the distribution function of the logN (µ + jσ 2 , σ 2 ) distribution.
(b) Suppose γX denotes the Gini coefficient of X (also called the coefficient of mean difference of X). By definition
Z ∞Z ∞
1
γX = |u − v|fX (u)fX (v) dudv
2E[X] 0 0
E|X−Y |
Hence γX = 2E[X] where X and Y are independent with the same distribution. Prove that
 
γX = 2Φ( σ/√2) − 1

14 The beta and arcsine distributions


14.1 The density and distribution function.
Definition(14.1a). Suppose α > 0 and β > 0. Then the random variable X has the beta distribution, Beta(α, β), iff it has density
    f(x; α, β) = [ Γ(α + β)/(Γ(α)Γ(β)) ] x^{α−1} (1 − x)^{β−1}   for 0 < x < 1.   (14.1a)

Note:
• Checking equation(14.1a) is a density function. Now Γ(α) = ∫_0^∞ u^{α−1} e^{−u} du by definition. Hence
    Γ(α)Γ(β) = ∫_0^∞ ∫_0^∞ u^{α−1} v^{β−1} e^{−u−v} du dv
Now use the transformation x = u/(u + v) and y = u + v; hence u = xy and v = y(1 − x). Clearly 0 < x < 1 and 0 < y < ∞. Finally |∂(u, v)/∂(x, y)| = y. Hence
    Γ(α)Γ(β) = ∫_{x=0}^1 ∫_{y=0}^∞ y^{α+β−1} x^{α−1} (1 − x)^{β−1} e^{−y} dy dx = Γ(α + β) ∫_{x=0}^1 x^{α−1} (1 − x)^{β−1} dx
• The beta function is defined by
    B(α, β) = ∫_0^1 t^{α−1} (1 − t)^{β−1} dt = Γ(α)Γ(β)/Γ(α + β)   for all α > 0 and β > 0.
Properties of the beta and gamma functions can be found in most advanced calculus books. Recall that Γ(n) = (n − 1)! if n is a positive integer.
• The distribution function of the beta distribution, Beta(α, β) is
    F(x; α, β) = ∫_0^x f(t; α, β) dt = (1/B(α, β)) ∫_0^x t^{α−1} (1 − t)^{β−1} dt = I_x(α, β)/B(α, β)   for x ∈ (0, 1).
The integral, I_x(α, β), is called the incomplete beta function.
14.2 Moments. Using the fact that ∫_0^1 x^{α−1} (1 − x)^{β−1} dx = B(α, β), it is easy to check that
    E[X] = α/(α + β),   E[X²] = (α + 1)α/( (α + β + 1)(α + β) )   and hence   var[X] = αβ/( (α + β)²(α + β + 1) )   (14.2a)
By differentiation, we get f′(x; α, β) = 0 implies x(2 − α − β) = α − 1. This has a root for x in [0, 1] if either (a) α + β > 2, α ≥ 1 and β ≥ 1 or (b) α + β < 2, α ≤ 1 and β ≤ 1. By checking the second derivative, we see
    mode[X] = (α − 1)/(α + β − 2)   if α + β > 2, α ≥ 1 and β ≥ 1.
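Checking equation(14.2a) and the mode formula numerically (a sketch assuming scipy; the values of α and β are arbitrary):

    from scipy import stats

    a, b = 2.0, 5.0
    X = stats.beta(a, b)
    print(X.mean(), a/(a + b))
    print(X.var(), a*b/((a + b)**2 * (a + b + 1)))
    # mode (valid here since a + b > 2, a >= 1, b >= 1):
    print((a - 1)/(a + b - 2))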
14.3 Shape of the density. The beta density can take many different shapes.
Figure(14.3a). Shape of the beta density for various values of the parameters: (α = 1/2, β = 1/2), (α = 5, β = 1), (α = 1, β = 3), (α = 2, β = 2), (α = 2, β = 5) and (α = 5, β = 2).

14.4 Some distribution properties.


• Suppose X ∼ Beta(α, β), then 1 − X ∼ Beta(β, α).
• The Beta(1, 1) distribution is the same as the uniform distribution on (0, 1).
• Suppose X ∼ Gamma(n1 , α) and Y ∼ Gamma(n2 , α). Suppose further that X and Y are independent. Then
X/(X + Y ) ∼ Beta(n1 , n2 ). See exercise 5 on page 24.
In particular, if X ∼ χ22k = Gamma(k, 1/2), Y ∼ χ22m = Gamma(m, 1/2) and X and Y are independent, then
X/(X + Y ) ∼ Beta(k, m).
• If X ∼ Beta(α, 1) then − ln(X) ∼ Exponential(α). See exercise 2 on page 35.

14.5 The beta prime distribution.


Definition(14.5a). Suppose α > 0 and β > 0. Then the random variable X is said to have the beta prime distribution, Beta′(α, β), iff it has density
    f(x) = x^{α−1} (1 + x)^{−α−β}/B(α, β)   for x > 0.   (14.5a)
Properties of the beta prime distribution are left to the exercises. Its relation to the beta distribution is given by the next two observations.
• If X ∼ Beta(α, β), then X/(1 − X) ∼ Beta′(α, β). See exercise 3 on page 35.
• If X ∼ Beta(α, β), then 1/X − 1 ∼ Beta′(β, α). This follows from the previous result: just use Y = 1 − X ∼ Beta(β, α) and 1/X − 1 = (1 − X)/X = Y/(1 − Y).
We shall see later (see §16.8 on page 39) that the beta prime distribution is just a multiple of the F-distribution.

14.6 The arcsine distribution on (0, 1). The arcsine distribution is the distribution Beta(1/2, 1/2).
Definition(14.6a). The random variable X has the arcsine distribution iff X has density
    f_X(x) = 1/( π√(x(1 − x)) )   for x ∈ (0, 1).
The distribution function. Suppose X has the arcsine distribution; then
    F_X(x) = P[X ≤ x] = (2/π) arcsin(√x) = arcsin(2x − 1)/π + 1/2   for x ∈ [0, 1].   (14.6a)
Moments of the arcsine distribution. Using the results in equation(14.2a) on page 33 above and figure(14.6a) below we get
    E[X] = 1/2,   E[X²] = 3/8,   var[X] = 1/8,   mode(X) = {0, 1},   median(X) = 1/2
Shape of the distribution.
Figure(14.6a). Plot of the arcsine density.

14.7 The arcsine distribution on (a, b).
Definition(14.7a). Suppose −∞ < a < b < ∞. Then the random variable X has the arcsine distribution on (a, b), denoted arcsin(a, b), iff X has density
    f_X(x) = 1/( π√((x − a)(b − x)) )   for x ∈ (a, b).
This means that the distribution defined in definition(14.6a) can also be described as the arcsin(0, 1) distribution.
The distribution function is
    F(x) = (2/π) arcsin( √((x − a)/(b − a)) )   for a ≤ x ≤ b.
If X ∼ arcsin(a, b) then kX + m ∼ arcsin(ka + m, kb + m). In particular,
if X ∼ arcsin(0, 1) then (b − a)X + a ∼ arcsin(a, b);
if X ∼ arcsin(a, b) then (X − a)/(b − a) ∼ arcsin(0, 1).
The proof of this and further properties of the arcsine distribution can be found in exercises 7 and 8 on page 36.

14.8 Summary.
The beta distribution. Suppose α > 0 and β > 0; then X ∼ Beta(α, β) iff X has density
    f(x; α, β) = [ Γ(α + β)/(Γ(α)Γ(β)) ] x^{α−1} (1 − x)^{β−1}   for 0 < x < 1.
• Moments:
    E[X] = α/(α + β)   and   var[X] = αβ/( (α + β)²(α + β + 1) )
• Suppose X ∼ Beta(α, β), then 1 − X ∼ Beta(β, α).
• The Beta(1, 1) distribution is the same as the uniform distribution on (0, 1).
• Suppose X ∼ Gamma(n1, α) and Y ∼ Gamma(n2, α). Suppose further that X and Y are independent. Then X/(X + Y) ∼ Beta(n1, n2).
• If X ∼ Beta(α, 1) then −ln(X) ∼ Exponential(α). If X ∼ Beta(m/2, n/2) then nX/(m(1 − X)) ∼ F_{m,n}.
The arcsine distribution. If X ∼ arcsin(0, 1) then X has density
    f_X(x) = 1/( π√(x(1 − x)) )   for x ∈ (0, 1).
• Moments: E[X] = 1/2 and var[X] = 1/8.
The beta prime distribution. Suppose α > 0 and β > 0; then X ∼ Beta′(α, β) iff the density is
    f(x) = x^{α−1} (1 + x)^{−α−β}/B(α, β)   for x > 0.
• If X ∼ Beta(α, β), then X/(1 − X) ∼ Beta′(α, β).

15 Exercises (exs-betaarcsine.tex.tex)

The beta and beta prime distributions.


1. Suppose X ∼ Beta(α, β). Find an expression for E[X m ] for m = 1, 2, . . . .

2. Suppose X ∼ Beta(α, 1). Show that − ln(X) ∼ exponential (α).

3. Suppose X ∼ Beta(α, β). Show that X/(1 − X) ∼ Beta′(α, β).

4. Link between the beta distribution function and the binomial distribution function. Suppose X ∼ Beta(k, n − k + 1)
and Y ∼ binomial (n, p). Prove that P[X > p] = P[Y ≤ k − 1], or equivalently P[X ≤ p] = P[Y ≥ k]. We assume
p ∈ [0, 1], k ∈ {1, 2, . . .} and n ∈ {k, k + 1, . . .}.
Note. This has already been proved—see equation(4.4a) on page 13 where it is shown that if X1 , . . . , Xn are i.i.d. with
the U (0, 1) distribution then the density of Xk:n is the Beta(k, n − k + 1) distribution. Clearly {Xk:n ≤ p} is there at
least k of the n random variables X1 , . . . , Xn in the interval [0, p].

5. The beta prime distribution. Suppose X has the beta prime distribution, Beta 0 (α, β).
(a) Show that E[X] = α/(β − 1) provided β > 1.
(b) Show that var[X] = α(α + β − 1)/(β − 2)(β − 1)2 provided β > 2.
(c) Show that the mode occurs at (α − 1)/(β + 1) if α ≥ 1 and at 0 otherwise.
(d) 1/X ∼ Beta 0 (β, α).
(e) Suppose X ∼ Gamma(n1 , 1) and Y ∼ Gamma(n2 , 1). Suppose further that X and Y are independent. Show that
X/Y ∼ Beta 0 (n1 , n2 ).
(f) Suppose X ∼ χ2n1 , Y ∼ χ2n2 and X and Y are independent. Show that X/Y ∼ Beta 0 (n1 /2, n2 /2).
The arcsine distribution.
6. Prove the equality in equation(14.6a) on page 34:
    (2/π) arcsin(√x) = arcsin(2x − 1)/π + 1/2   for x ∈ [0, 1]

7. (a) Suppose X ∼ arcsine(a, b). Prove that kX + m ∼ arcsine(ka + m, bk + m).


(b) Suppose X ∼ arcsine(−1, 1). Prove that X 2 ∼ arcsine(0, 1).
(c) Suppose X ∼ U (−π, π). Prove that sin(X), sin(2X) and − cos(2X) all have the arcsine(−1, 1) distribution.
8. Suppose X ∼ U (−π, π), Y ∼ U (−π, π) and X and Y are independent.
(a) Prove that sin(X + Y ) ∼ arcsine(−1, 1). (b) Prove that sin(X − Y ) ∼ arcsine(−1, 1).

16 The t, Cauchy and F distributions


16.1 Definition of the tn distribution.
Definition(16.1a). Suppose n ∈ (0, ∞). Then the random variable T has a t-distribution with n degrees of freedom iff
    T = X/√(Y/n)   (16.1a)
where X ∼ N(0, 1), Y ∼ χ²_n, and X and Y are independent.
Density: Finding the density is a routine calculation and is left to exercise 2 on page 39; that exercise shows that the density of the tn distribution is
    f_T(t) = (1/( B(1/2, n/2) √n )) (1 + t²/n)^{−(n+1)/2} = ( Γ((n+1)/2)/( Γ(n/2) √(πn) ) ) (1 + t²/n)^{−(n+1)/2}   for t ∈ R.   (16.1b)
We can check that the function f_T defined in equation(16.1b) is a density for any n ∈ (0, ∞) as follows. Clearly f_T(t) > 0; also, by using the transformation θ = 1/(1 + t²/n), it follows that
    ∫_{−∞}^∞ (1 + t²/n)^{−(n+1)/2} dt = 2 ∫_0^∞ (1 + t²/n)^{−(n+1)/2} dt = √n ∫_0^1 θ^{(n−2)/2} (1 − θ)^{−1/2} dθ = √n B(1/2, n/2)
Hence f_T is a density.
Now Y in equation(16.1a) can be replaced by Z1² + · · · + Zn² where Z1, Z2, . . . , Zn are i.i.d. with the N(0, 1) distribution. Hence Y/n has mean 1 and variance 2/n, so its distribution becomes more clustered about the constant 1 as n becomes larger. Hence T has heavier tails than the normal for small n but tends to the normal as n → ∞. Figure(16.1a) graphically demonstrates that the density of the t-distribution is similar in shape to the normal density but has heavier tails. See exercise 4 on page 40 for a mathematical proof that the distribution of T tends to the normal as n → ∞.
Figure(16.1a). Plot of the t2, t10 and standard normal densities.
16.2 Moments of the tn distribution. Suppose T ∼ tn. Now it is well-known that the integral ∫_1^∞ (1/x^j) dx converges if j > 1 and diverges if j ≤ 1. It follows that
    ∫_1^∞ t^r/(n + t²)^{(n+1)/2} dt   converges if r < n.
Hence the function t^r f_T(t) is integrable iff r < n.
Provided n > 1, E[T] exists and equals √n E[X] E[1/√Y] = 0.
Provided n > 2, var(T) = E[T²] = n E[X²] E[1/Y] = n/(n − 2) by using equation(8.9a) on page 23 which gives E[1/Y] = 1/(n − 2).
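For example (a sketch assuming scipy), the variance formula var(T) = n/(n − 2) can be checked directly:

    from scipy import stats

    n = 7
    print(stats.t(df=n).var(), n/(n - 2))   # both equal 1.4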

16.3 Linear transformation of the tn distribution. Suppose m ∈ R, s > 0 and V = m + sT. Then
    f_V(v) = f_T(t)/|dv/dt| = (1/( B(1/2, n/2) s√n )) [ 1 + ((v − m)/s)²/n ]^{−(n+1)/2}   (16.3a)
Also E[V] = m for n > 1 and
    var(V) = s² n/(n − 2)   for n > 2   (16.3b)
This is called a tn(m, s²) distribution.
16.4 The Cauchy distribution. The standard Cauchy distribution, denoted Cauchy(1), is the same as the t1 distribution. Hence its density is:
    γ1(t) = 1/( π(1 + t²) )   for t ∈ R.
More generally, suppose s > 0. Then the Cauchy distribution, Cauchy(s), is the same as the t1(0, s²) distribution. Its density is
    γs(t) = s/( π(s² + t²) )   for t ∈ R.
Clearly if X ∼ Cauchy(1) and s > 0 then sX ∼ Cauchy(s).
Figure(16.4a). Plot of the normal, standard Cauchy and the Cauchy(2) = t1(0, 4) densities.

16.5 Elementary properties of the Cauchy distribution.


• Moments. The expectation, variance and higher moments of the Cauchy distribution are not defined.
• The distribution function. This is
    F_s(t) = (1/π) tan^{−1}(t/s)
This is probably better written as
    F_s(t) = 1/2 + (1/π) tan^{−1}(t/s)   where now tan^{−1}(t/s) ∈ (−π/2, π/2).
• The characteristic function. Suppose the random variable T has the standard Cauchy distribution Cauchy(1). Then
    φ_T(t) = E[e^{itT}] = e^{−|t|}
and hence if W ∼ Cauchy(s), then W = sT and E[e^{itW}] = e^{−s|t|}.
Note. The characteristic function can be derived by using the calculus of residues, or by the following trick. Using integration by parts gives
    ∫_0^∞ e^{−y} cos(ty) dy = 1 − t ∫_0^∞ e^{−y} sin(ty) dy   and   ∫_0^∞ e^{−y} sin(ty) dy = t ∫_0^∞ e^{−y} cos(ty) dy
and hence
    ∫_0^∞ e^{−y} cos(ty) dy = 1/(1 + t²)
Now the characteristic function of the bilateral (two-sided) exponential density f(x) = (1/2)e^{−|x|} for x ∈ R is
    φ(t) = (1/2) ∫_{−∞}^∞ (cos(ty) + i sin(ty)) e^{−|y|} dy = ∫_0^∞ e^{−y} cos(ty) dy = 1/(1 + t²)

Because this function is absolutely integrable, we can use the inversion theorem to get
    (1/2) e^{−|t|} = (1/(2π)) ∫_{−∞}^∞ e^{−ity}/(1 + y²) dy = (1/(2π)) ∫_{−∞}^∞ e^{ity}/(1 + y²) dy
as required.
as required.

Further properties of the Cauchy distribution can be found in exercises 5 –13 starting on page 40.

16.6 Definition of the F distribution.
Definition(16.6a). Suppose m > 0 and n > 0. Suppose further that X ∼ χ²_m, Y ∼ χ²_n and X and Y are independent. Then
    F = (X/m)/(Y/n)   has an F_{m,n} distribution.
Finding the density of the F_{m,n} distribution is a routine calculation and is left to exercise 14 on page 40; that exercise shows that the density of the F_{m,n} distribution is
    f_F(x) = [ Γ((m + n)/2)/( Γ(m/2)Γ(n/2) ) ] m^{m/2} n^{n/2} x^{m/2−1}/[ mx + n ]^{(m+n)/2}   for x ∈ (0, ∞).   (16.6a)

Figure(16.6a). Plot of the F10,4 and F10,50 densities.

16.7 The connection between the t and F distributions. Recall the definition of a tn distribution:
    Tn = X/√(Y/n)
where X and Y are independent, X ∼ N(0, 1) and Y ∼ χ²_n.
Now X² ∼ χ²_1; hence
    Tn² = X²/(Y/n) ∼ F_{1,n}   (16.7a)
It follows that if X ∼ Cauchy(1) = t1, then X² ∼ F_{1,1}.
Example(16.7a). Using knowledge of the F density and equation(16.7a), find the density of Tn.
Solution. Let W = Tn²; hence W ∼ F_{1,n}. Then
    f_W(w) = (1/(2√w)) [ f_{Tn}(−√w) + f_{Tn}(√w) ]
But equation(16.7a) clearly implies the distribution of Tn is symmetric about 0; hence for w > 0
    f_W(w) = (1/√w) f_{Tn}(√w)   and   f_{Tn}(w) = w f_W(w²) = w Γ((n+1)/2) n^{n/2} w^{−1}/( Γ(1/2)Γ(n/2) [w² + n]^{(n+1)/2} ) = ( Γ((n+1)/2)/( √(nπ) Γ(n/2) ) ) (1 + w²/n)^{−(n+1)/2}
Finally, by symmetry, f_{Tn}(−w) = f_{Tn}(w).
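Equation(16.7a) can also be checked numerically (a sketch assuming scipy; n and x are arbitrary): the distribution function of T_n² agrees with that of F_{1,n}.

    from scipy import stats

    n, x = 5, 2.3
    # P[T_n^2 <= x] = P[-sqrt(x) <= T_n <= sqrt(x)] should equal the F_{1,n} cdf at x.
    lhs = stats.t.cdf(x**0.5, df=n) - stats.t.cdf(-x**0.5, df=n)
    rhs = stats.f.cdf(x, dfn=1, dfd=n)
    print(lhs, rhs)   # equal up to rounding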

16.8 Properties of the F distribution. The following properties of the F-distribution are considered in exercises 17–21 on page 41.
• If X ∼ F_{m,n} then 1/X ∼ F_{n,m}.
• If X ∼ F_{m,n} then E[X] = n/(n − 2), var[X] = 2n²(m + n − 2)/[ m(n − 2)²(n − 4) ]. See exercise 16 on page 41.
• If X1 ∼ Gamma(n1, α1), X2 ∼ Gamma(n2, α2) and X1 and X2 are independent then
    (n2 α1 X1)/(n1 α2 X2) ∼ F_{2n1,2n2}   (16.8a)
In particular, if X and Y are i.i.d. with the exponential(λ) distribution, then X/Y ∼ F_{2,2}.
• If X ∼ Beta(m/2, n/2) then nX/(m(1 − X)) ∼ F_{m,n}. See exercise 18 on page 41.
• If X ∼ F_{m,n} then mX/(n + mX) ∼ Beta(m/2, n/2) and n/(n + mX) ∼ Beta(n/2, m/2). See exercise 19 on page 41.
• If X ∼ F_{m,n} then mX/n ∼ Beta′(m/2, n/2). See exercise 20 on page 41.   (16.8b)
• Suppose X ∼ F_{m,n}. Then mX →_D χ²_m as n → ∞. See exercise 21 on page 41.
16.9 Fisher’s z distribution.
Definition(16.9a). If X ∼ Fm,n , then
ln(X)
∼ FisherZ (m, n)
2
It follows that if X ∼ FisherZ (n, m) then e2X ∼ Fn,m .
16.10 Summary.
The tn distribution. The random variable T ∼ tn iff
    T = X/√(Y/n)
where X ∼ N(0, 1), Y ∼ χ²_n and X and Y are independent.
• Moments: E[T] = 0 for n > 1; var[T] = n/(n − 2) for n > 2; and E[1/Y] = 1/(n − 2) for the χ²_n variable Y in the definition.
• Suppose T ∼ tn, m ∈ R and s > 0. Then V = m + sT ∼ tn(m, s²).
The Cauchy distribution. This has density
    γ1(t) = 1/( π(1 + t²) )   for t ∈ R.
It is the t1 distribution. The Cauchy(s) distribution is the same as the t1(0, s²) distribution.
The F distribution. Suppose m > 0 and n > 0. Suppose further that X ∼ χ²_m, Y ∼ χ²_n and X and Y are independent. Then
    F = (X/m)/(Y/n)   has an F_{m,n} distribution.
• If X ∼ tn then X² ∼ F_{1,n}.

17 Exercises (exs-tCauchyF.tex.tex)

The t distribution.
1. Suppose X, Y1, . . . , Yn are i.i.d. random variables with the N(0, σ²) distribution. Find the distribution of
    Z = X/√( (Y1² + · · · + Yn²)/n )

2. Using the definition of the tn distribution given in definition(16.1a) on page 36, show that the density of the tn distribu-
tion is given by equation(16.1b).

3. Suppose n > 0, s > 0 and α ∈ R. Show that
    ∫_{−∞}^∞ [ 1/(1 + (t − α)²/s²) ]^{n/2} dt = s B(1/2, (n − 1)/2)

4. Prove that the limit as n → ∞ of the tn density given in equation(16.1b) is the standard normal density.
The Cauchy distribution.
5. (a) Suppose X ∼ Cauchy(1). Find the distribution of Y = 1/X .
(b) Suppose X ∼ Cauchy(s). Find the distribution of Y = 1/X .
(c) Suppose X has a non-central Cauchy(s) distribution with mean m. Hence
s
fX (x) = for x ∈ R.
π[s + (x − m)2 ]
2

Find the distribution of Y = 1/X .

6. Suppose X and Y are i.i.d. with the N (0, σ 2 ) distribution. Find the distribution of:
(a) W = X/Y ; (b) W = X/|Y |; (c) W = |X|/|Y |.

7. (a) Suppose U has the uniform distribution U (− π/2, π/2). Show that tan(U ) ∼ Cauchy(1).
(b) Suppose U has the uniform distribution U (−π, π). Show that tan(U ) ∼ Cauchy(1).

8. Suppose X1, . . . , Xn are i.i.d. with density γs.
(a) Show that Y = (X1 + · · · + Xn)/n also has the Cauchy(s) distribution.
(b) Let Mn = median(X1, . . . , Xn). Show that Mn is asymptotically normal with mean 0 and a variance which tends to 0.

9. (a) Suppose X ∼ Cauchy(s). Find the density of 2X. (This shows that 2X has the same distribution as X1 + X2 where
X1 and X2 are i.i.d. with the same distribution as X.)
(b) Supppose U and V are i.i.d. with the Cauchy(s) distribution. Let X = aU + bV and Y = cU + dV . Find the
distribution of X + Y .

10. Suppose X and Y are i.i.d. with the N (0, 1) distribution. Define R and Θ by R2 = X 2 + Y 2 and tan(Θ) = Y /X where
R > 0 and Θ ∈ (−π, π). Show that R2 has the χ22 distribution, tan(Θ) has the Cauchy(1) distribution, and R and Θ are
2
independent. Show also that the density of R is re−r /2 for r > 0.

11. Suppose X has the Cauchy(1) distribution. Find the density of
    2X/(1 − X²)
Hint: tan(2θ) = 2 tan(θ)/(1 − tan²(θ)).

12. From a point O, radioactive particles are directed at an absorbing line which is at a distance b from O. Suppose OP
denotes the perpendicular from the point O to the absorbing line—and hence the length of OP is b. The direction of
emission is measured by the angle Θ from the straight line OP . Suppose Θ is equally likely to be any direction in
(− π/2, π/2). Formally, Θ ∼ U (− π/2, π/2).
(a) Determine the density of X, the distance from P where the particle hits the absorbing line.
(b) What is the density of 1/X ?

13. The symmetric Cauchy distribution in R². Define the function f : R² → (0, ∞) by
    f(x, y) = 1/( 2π(1 + x² + y²)^{3/2} )
(a) Show that f is a density function.
(b) Find the marginal densities.
(c) Suppose (X, Y ) has the density f and we transform to polar coordinates: X = R cos Θ and Y = R sin Θ. Show
that R and Θ are independent and find the distributions of R and Θ.
The last question can be generalized to produce this density—in this case, the direction must be uniform over the surface
of a hemisphere.
The F distribution.
14. Using definition(16.6a) on page 38, show that the density of the F_{m,n} distribution is given by equation(16.6a) on page 38.

15. Suppose X and Y are i.i.d. N(0, σ²). Find the density of Z where
    Z = Y²/X²   if X ≠ 0;   Z = 0   if X = 0.
16. Suppose F has the F_{m,n} distribution. Show
    E[F] = n/(n − 2)   for n > 2   and   var[F] = 2n²(m + n − 2)/[ m(n − 2)²(n − 4) ]   for n > 4.
17. (a) Suppose X ∼ F_{m,n}. Show that 1/X ∼ F_{n,m}.
(b) Suppose X1 ∼ Gamma(n1, α1), X2 ∼ Gamma(n2, α2) and X1 and X2 are independent. Show that
    (n2 α1 X1)/(n1 α2 X2) ∼ F_{2n1,2n2}
18. Suppose X ∼ Beta(m/2, n/2). Show that nX/(m(1 − X)) ∼ F_{m,n}.
19. (a) Suppose X ∼ F_{m,n}. Show that mX/(n + mX) ∼ Beta(m/2, n/2) and n/(n + mX) ∼ Beta(n/2, m/2).
(b) Suppose X ∼ F_{2α,2β} where α > 0 and β > 0. Show that αX/β ∼ Beta′(α, β).
20. Suppose W ∼ Fm,n . Show that mW/n ∼ Beta 0 ( m/2, n/2).
D
21. Suppose W ∼ Fm,n . Show that mW −→χ2m as n → ∞.

18 Non-central distributions
18.1 The non-central χ2 -distribution with 1 degree of freedom. We know that if Z ∼ N (0, 1), then Z 2 ∼ χ21 .
Now suppose
W = (Z + a)2 where Z ∼ N (0, 1) and a ∈ R.
Then W is said to have a non-central χ21 distribution with non-centrality parameter a2 . Hence W ∼ Y 2 where
Y ∼ N (a, 1).
The moment generating function of W.
    E[e^{t(Z+a)²}] = (1/√(2π)) ∫_{−∞}^∞ e^{t(z+a)²} e^{−z²/2} dz
But
    t(z + a)² − z²/2 = z²(t − 1/2) + 2azt + a²t
                     = −( (1 − 2t)/2 ) [ z² − 4azt/(1 − 2t) ] + a²t
                     = −( (1 − 2t)/2 ) [ z − 2at/(1 − 2t) ]² + 2a²t²/(1 − 2t) + a²t
                     = −( (1 − 2t)/2 ) [ z − 2at/(1 − 2t) ]² + a²t/(1 − 2t)
and hence, if α = 2at/(1 − 2t) and t < 1/2,
    E[e^{t(Z+a)²}] = exp[ a²t/(1 − 2t) ] (1/√(2π)) ∫_{−∞}^∞ exp[ −(1 − 2t)(z − α)²/2 ] dz
                  = (1 − 2t)^{−1/2} exp[ a²t/(1 − 2t) ]   (18.1a)
The density of W. For w > 0 we have
    f_W(w) = [ φ(√w − a) + φ(−√w − a) ]/(2√w) = [ φ(√w − a) + φ(√w + a) ]/(2√w)
           = exp[ −(w + a²)/2 ] [ exp(a√w) + exp(−a√w) ]/( 2√(2πw) )   (18.1b)
           = (1/√(2πw)) exp[ −(w + a²)/2 ] cosh(a√w)   because cosh(x) = (e^x + e^{−x})/2 for all x ∈ R.
Using the standard expansion for cosh and the standard property of the Gamma function that Γ(n + 1/2) = (2n)!√π/(4^n n!) for all n = 0, 1, 2, . . . , gives
    f_W(w) = (1/√(2w)) exp[ −(w + a²)/2 ] Σ_{j=0}^∞ (a²w)^j/( √π (2j)! ) = (1/√(2w)) exp[ −(w + a²)/2 ] Σ_{j=0}^∞ (a²w/4)^j/( j! Γ(j + 1/2) )
           = (1/√(2w)) exp[ −(w + a²)/2 ] (a√w/2)^{1/2} I_{−1/2}(a√w)
           = (1/2) (a²/w)^{1/4} exp[ −(w + a²)/2 ] I_{−1/2}(a√w)   (18.1c)
where, for all x > 0,
    I_{−1/2}(x) = Σ_{j=0}^∞ (x/2)^{2j−1/2}/( j! Γ(j + 1/2) )
is a modified Bessel function of the first kind.
Note. The general definition of a modified Bessel function of the first kind is
    I_ν(x) = (x/2)^ν Σ_{j=0}^∞ x^{2j}/( 4^j j! Γ(ν + j + 1) )   for all ν ∈ R and x ∈ C.   (18.1d)

18.2 The non-central χ²-distribution with n degrees of freedom.
Suppose X1 ∼ N(µ1, 1), X2 ∼ N(µ2, 1), . . . , Xn ∼ N(µn, 1) are independent. Then Σ_{j=1}^n (Xj − µj)² ∼ χ²_n but Σ_{j=1}^n Xj² does not have a χ² distribution. We say
    W = Σ_{j=1}^n Xj²   has a non-central χ²_n distribution with non-centrality parameter λ = Σ_{j=1}^n µj².
This can be written as: suppose X ∼ N(µ, In) then XᵀX ∼ χ²_{n,µᵀµ}. In particular, if X ∼ N(µ, σ²) then X²/σ² ∼ χ²_{1,µ²/σ²}. Note that some authors define the non-centrality parameter to be λ/2.
Moments. See exercise 2 on page 44 for the following moments:
    E[W] = n + λ   and   var[W] = 2n + 4λ
So we see that if W1 has a non-central χ²_n distribution with non-centrality parameter λ and W2 ∼ χ²_n, then E[W1] ≥ E[W2] because λ = Σ_{j=1}^n µj² ≥ 0.
The moment generating function of W. By equation(18.1a) the moment generating function of X1² is
    E[e^{tX1²}] = (1/(1 − 2t)^{1/2}) exp[ µ1² t/(1 − 2t) ]
Hence
    E[e^{tW}] = (1/(1 − 2t)^{n/2}) exp[ λt/(1 − 2t) ]   for t < 1/2.   (18.2a)
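A sketch of a numerical check (assuming scipy, whose ncx2 distribution uses the same non-centrality parameter λ = Σ µj²; n and λ are arbitrary):

    from scipy import stats

    n, lam = 8, 5.0
    W = stats.ncx2(df=n, nc=lam)
    print(W.mean(), n + lam)       # E[W] = n + lambda
    print(W.var(), 2*n + 4*lam)    # var[W] = 2n + 4 lambda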
18.3 The non-central χ2 -distribution with n degrees of freedom—the basic decomposition theorem.
Proposition(18.3a). Suppose W has a non-central χ2n distribution with non-centrality parameter λ > 0. Then
W has the same distribution as U + V where:
U has a non-central χ21 distribution with non-centrality parameter λ;
V has a χ2n−1 distribution;
U and V are independent.
Proof. Let µj = √(λ/n) for j = 1, . . . , n; hence Σ_{j=1}^n µj² = λ.
We are given that W has a non-central χ²_n distribution with non-centrality parameter λ > 0. Hence W ∼ Σ_{j=1}^n Xj² where X1, . . . , Xn are independent with Xj ∼ N(µj, 1) for j = 1, . . . , n.
Let e1 = (1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1) denote the standard basis of Rⁿ. Set b1 = (µ1/√λ, . . . , µn/√λ). Then {b1, e2, . . . , en} form a basis of Rⁿ. Use the Gram-Schmidt orthogonalization procedure to create the basis {b1, . . . , bn}. Define B to be the n × n matrix with rows {b1, . . . , bn}; then B is orthogonal.
Suppose X = (X1, . . . , Xn) and set Y = BX. Then Y ∼ N(Bµ, BIBᵀ = I) where µ = (µ1, . . . , µn) = √λ b1. Hence Y1 ∼ N(b1ᵀµ = √λ, 1) and Yj ∼ N(bjᵀµ = 0, 1) for j = 2, . . . , n and Y1, . . . , Yn are independent. Also YᵀY = XᵀX.
Finally, let U = Y1² and V = Σ_{j=2}^n Yj².

18.4 The non-central χ²-distribution with n degrees of freedom—the density function. We use proposition(18.3a). Now U has a non-central χ²_1 distribution with non-centrality parameter λ. Using equation(18.1b) gives
    f_U(u) = (1/( 2^{3/2} Γ(1/2) )) (1/√u) e^{−(u+λ)/2} [ e^{√(λu)} + e^{−√(λu)} ]   for u > 0.
Also, V ∼ χ²_{n−1} has density
    f_V(v) = e^{−v/2} v^{(n−3)/2}/( 2^{(n−1)/2} Γ((n−1)/2) )   for v > 0.
Using independence of U and V gives
    f_{(U,V)}(u, v) = [ u^{−1/2} v^{(n−3)/2} e^{−(u+v)/2} e^{−λ/2}/( 2^{(n+2)/2} Γ(1/2) Γ((n−1)/2) ) ] [ e^{√(λu)} + e^{−√(λu)} ]
Now use the transformation X = U + V and Y = V. The Jacobian equals 1. Hence for y > 0 and x > y
    f_{(X,Y)}(x, y) = [ e^{−x/2} e^{−λ/2} x^{(n−4)/2}/( 2^{n/2} Γ(1/2) Γ((n−1)/2) ) ] (y/x)^{(n−3)/2} (x/(x − y))^{1/2} [ e^{√(λ(x−y))} + e^{−√(λ(x−y))} ]/2
Now
    (x/(x − y))^{1/2} [ e^{√(λ(x−y))} + e^{−√(λ(x−y))} ]/2 = (x/(x − y))^{1/2} Σ_{j=0}^∞ λ^j (x − y)^j/(2j)! = Σ_{j=0}^∞ [ (λx)^j/(2j)! ] (1 − y/x)^{j−1/2}

and so we have
    f_{(X,Y)}(x, y) = [ e^{−x/2} e^{−λ/2} x^{(n−4)/2}/( 2^{n/2} Γ(1/2) Γ((n−1)/2) ) ] Σ_{j=0}^∞ [ (λx)^j/(2j)! ] (y/x)^{(n−3)/2} (1 − y/x)^{j−1/2}   for y > 0 and x > y.
We need to integrate out y. By setting w = y/x we get
    ∫_{y=0}^x (y/x)^{(n−3)/2} (1 − y/x)^{j−1/2} dy = x ∫_{w=0}^1 w^{(n−3)/2} (1 − w)^{j−1/2} dw = x B( (n−1)/2, j + 1/2 ) = x Γ((n−1)/2) Γ(j + 1/2)/Γ(n/2 + j)
and hence for x > 0
    f_X(x) = [ e^{−x/2} e^{−λ/2} x^{(n−2)/2}/( 2^{n/2} Γ(n/2) ) ] Σ_{j=0}^∞ [ (λx)^j/(2j)! ] [ Γ(j + 1/2) Γ(n/2)/( Γ(1/2) Γ(n/2 + j) ) ]
           = e^{−x/2} e^{−λ/2} x^{(n−2)/2} Σ_{j=0}^∞ (λx)^j/( 2^{n/2} 4^j j! Γ(n/2 + j) )   (18.4a)
The expression for the modified Bessel function of the first kind in equation(18.1d) on page 42 gives
    I_{n/2−1}(√(λx)) = [ (λx)^{(n−2)/4}/2^{n/2−1} ] Σ_{j=0}^∞ (λx)^j/( 4^j j! Γ(n/2 + j) )
Hence an alternative expression for the density is
    f_X(x) = (1/2) (x/λ)^{(n−2)/4} e^{−x/2} e^{−λ/2} I_{n/2−1}(√(λx))   (18.4b)
This is the same as equation(18.1c) if we set n = 1 and λ = a².
A plot of the density of the χ28 distribution and the density of the non-central χ28 distribution with non-centrality
parameter µ equal to 5 and 10 is given in figure(18.4a).

Figure(18.4a). Plot of the non-central χ²₈ density for non-centrality parameter µ = 0, 5 and 10.

18.5 The non-central t distribution.
Definition(18.5a). Suppose n ∈ (0, ∞) and µ ∈ R. Then the random variable T has a non-central t-distribution with n degrees of freedom and non-centrality parameter µ iff
    T = (X + µ)/√(Y/n)   (18.5a)
where X ∼ N(0, 1), Y ∼ χ²_n, and X and Y are independent.
See exercise 4 on page 45 for the following moments:
    E[T] = µ √(n/2) Γ((n−1)/2)/Γ(n/2)   for n > 1,   and   var[T] = n(1 + µ²)/(n − 2) − (µ²n/2) [ Γ((n−1)/2)/Γ(n/2) ]²   for n > 2.
If X ∼ N(µ, σ²) and Y ∼ χ²_n and X and Y are independent, then
    T = (X/σ)/√(Y/n)
has the non-central tn distribution with non-centrality parameter µ/σ.
18.6 The non-central F distribution.
Definition(18.6a). Suppose m > 0 and n > 0. Suppose further that X has a non-central χ2m distribution with
non-centrality parameter λ, Y ∼ χ2n and X and Y are independent. Then
X/m
F = has a non-central Fm,n distribution with non-centrality parameter λ.
Y /n
See exercise 6 on page 45 for the following moments:
    E[F] = n(m + λ)/( m(n − 2) )   for n > 2,   and   var[F] = 2 [ (m + λ)² + (m + 2λ)(n − 2) ]/[ (n − 2)²(n − 4) ] (n/m)²   for n > 4.
If F1 has the non-central Fm,n distribution with non-centrality parameter λ and F2 has the Fm,n distribution, then
E[F1 ] ≥ E[F2 ]. This follows from the corresponding property of the non-central χ2 distribution.
The F -statistic used to test a hypothesis will usually have a central F distribution if the hypothesis is true and a
non-central F distribution if the hypothesis is false. The power of a test is the probability of rejecting the null
hypothesis when it is false. Hence calculating the power of a test will often involve calculating probabilities from
a non-central F distribution.
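A hedged sketch of the power calculation just described: reject when the F statistic exceeds the central-F critical value, and evaluate the rejection probability under the non-central F. The degrees of freedom, non-centrality parameter and level below are illustrative only; SciPy is assumed.

    from scipy.stats import f, ncf

    m, n = 3, 20       # numerator and denominator degrees of freedom (illustrative)
    lam = 6.0          # non-centrality parameter under the alternative (illustrative)
    alpha = 0.05

    crit = f.ppf(1 - alpha, m, n)      # critical value from the central F distribution
    power = ncf.sf(crit, m, n, lam)    # P[F > crit] under the non-central F
    print(crit, power)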

19 Exercises (exs-noncentral.tex)

1. Suppose X1 , . . . , Xn are independent random variables such that Xj has a non-central χ2kj distribution with non-
centrality parameter λj for j = 1, . . . , n. Find the distribution of Z = X1 + · · · + Xn .

2. Suppose W has a non-central χ2n distribution with non-centrality parameter λ. Find E[W ] and var[W ].
3. Suppose the random variable V has the Poisson distribution with mean λ/2. Suppose further that the distribution of W
given V = j is the χ2k+2j distribution. Show that the distribution of W is the non-central χ2k with non-centrality parame-
ter λ.

4. Suppose T has the non-central t distribution with n degrees of freedom and non-centrality parameter µ. Show that
E[T] = µ √(n/2) Γ((n−1)/2) / Γ(n/2)   for n > 1,   and   var[T] = n(1 + µ²)/(n − 2) − (µ² n/2) [ Γ((n−1)/2) / Γ(n/2) ]²   for n > 2.

5. Suppose T has the non-central t distribution with n degrees of freedom and non-centrality parameter µ. Show that T 2
has the non-central F1,n distribution with non-centrality parameter µ2 .

6. Suppose F has the non-central Fm,n distribution with non-centrality parameter λ. Show that
E[F] = n(m + λ) / ( m(n − 2) )   for n > 2,   and   var[F] = 2 (n/m)² [ (m + λ)² + (m + 2λ)(n − 2) ] / ( (n − 2)²(n − 4) )   for n > 4.
m(n − 2) (n − 2)2 (n − 4) m

20 The power and Pareto distributions


There are many more results about these distributions in the exercises.
The power distribution.
20.1 The power distribution. Suppose a0 ∈ R, h > 0 and α > 0. Then the random variable X is said to have
the power distribution Power(α, a0 , h) iff X has density
f(x) = α(x − a0)^{α−1} / h^α   for a0 < x < a0 + h.
The distribution function is
F(x) = (x − a0)^α / h^α   for a0 < x < a0 + h.

The standard power distribution is Power(α, 0, 1); this has density f(x) = αx^{α−1} for 0 < x < 1 and distribution function F(x) = x^α for 0 < x < 1. If X ∼ Power(α, 0, 1), the standard power distribution, then E[X] = α/(α+1), E[X²] = α/(α+2) and var[X] = α/[(α + 1)²(α + 2)]; see exercise 1 on page 48.
Clearly, X ∼ Power(α, a0 , h) iff (X − a0 )/h ∼ Power(α, 0, 1).
Figure(20.1a). The density of the standard power distribution for α = 1/2, α = 2 and α = 4.
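Since X ∼ Power(α, 0, 1) can be generated as U^{1/α} with U ∼ U(0, 1) (see exercise 3 on page 48), the moments quoted above are easy to check by simulation; a minimal sketch assuming NumPy, with an illustrative value of α:

    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 2.0
    x = rng.uniform(size=100_000) ** (1 / alpha)   # U^(1/alpha) ~ Power(alpha, 0, 1)

    print(x.mean(), alpha / (alpha + 1))                        # E[X]
    print(x.var(), alpha / ((alpha + 1) ** 2 * (alpha + 2)))    # var[X]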

20.2 A characterization of the power distribution. Suppose X ∼ Power(α, 0, h); then


f(x) = α x^{α−1} / h^α   and   F(x) = x^α / h^α   for x ∈ (0, h).
Also, for all c ∈ (0, h) we have
E[X | X ≤ c] = ∫_0^c x ( α x^{α−1} / c^α ) dx = αc/(α + 1) = (c/h) E[X]
The next proposition shows this result characterizes the power distribution (see [DALLAS(1976)]).
Proposition(20.2a). Suppose X is a non-negative absolutely continuous random variable such that there exists
h > 0 with P[X ≤ h] = 1. Suppose further that for all c ∈ (0, h) we have
E[X | X ≤ c] = (c/h) E[X]                                                                                                (20.2a)
Then there exists α > 0 such that X ∼ Power(α, 0, h).

Proof. Let f denote the density and F denote the distribution function of X. Then equation (20.2a) becomes
∫_0^c ( x f(x) / F(c) ) dx = (c/h) ∫_0^h x f(x) dx
Let δ = (1/h) ∫_0^h x f(x) dx. Then δ ∈ (0, 1) and
∫_0^c x f(x) dx = c F(c) δ   for all c ∈ (0, h).                                                                         (20.2b)
Differentiating with respect to c gives
c f(c) = [ F(c) + c f(c) ] δ
and hence
F′(c)/F(c) = α/c   where α = δ/(1 − δ) > 0.
Integrating gives ln F(c) = A + α ln(c), i.e. F(c) = k c^α. Using F(h) = 1 gives F(c) = c^α/h^α for c ∈ (0, h), as required.
The above result leads on to another characterization of the power distribution:
Proposition(20.2b). Suppose X1, X2, . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F and such that there exists h > 0 with F(h) = 1. Then
E[ Sn/X(n) | X(n) = x ] = c   with c independent of x                                                                    (20.2c)
iff there exists α > 0 such that F(x) = x^α/h^α for x ∈ (0, h).
Proof. ⇒ Writing Sn = X(1) + · · · + X(n) in equation (20.2c) gives
(c − 1)x = E[ X(1) + · · · + X(n−1) | X(n) = x ]
It is easy to see that given X(n) = x, the vector (X(1), . . . , X(n−1)) has the same distribution as the vector of n − 1 order statistics (Y(1), . . . , Y(n−1)) from the density f(y)/F(x) for 0 < y < x. Hence Y(1) + · · · + Y(n−1) = Y1 + · · · + Yn−1 and
(c − 1)x = (n − 1)E[Y]   where Y has density f(y)/F(x) for y ∈ (0, x).                                                   (20.2d)
Hence
∫_0^x y f(y) dy = ( (c − 1)/(n − 1) ) x F(x)
Because X(j) < X(n) for all j = 1, 2, . . . , n, equation (20.2c) implies c < nx/x = n; also equation (20.2d) implies c > 1. Hence δ = (c − 1)/(n − 1) ∈ (0, 1). So we have equation (20.2b) again and we must have F(x) = x^α/h^α for x ∈ (0, h).
⇐ See part (a) of exercise 6 on page 48.
The next result is an easy consequence of the last one—it was originally announced in [SRIVASTAVA(1965)] but the proof here is due to [DALLAS(1976)].
Proposition(20.2c). Suppose X1, X2, . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F and such that there exists h > 0 with F(h) = 1. Then
Sn / max{X1, . . . , Xn}   is independent of   max{X1, . . . , Xn}                                                       (20.2e)
iff there exists α > 0 such that F(x) = x^α/h^α for x ∈ (0, h).
Proof. ⇒ Clearly equation(20.2e) implies equation(20.2c). ⇐ See part (b) of exercise 6 on page 48.
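Proposition (20.2c) is easy to probe by simulation: for power-distributed samples, Sn/X(n) and X(n) should be empirically uncorrelated (uncorrelatedness is of course only a necessary condition for the asserted independence), whereas for a non-power distribution on a bounded interval the correlation is typically visibly non-zero. A rough sketch assuming NumPy, with illustrative parameter choices:

    import numpy as np

    rng = np.random.default_rng(1)
    alpha, n, reps = 3.0, 5, 20_000

    x = rng.uniform(size=(reps, n)) ** (1 / alpha)   # rows: samples from Power(alpha, 0, 1)
    m = x.max(axis=1)
    print(np.corrcoef(m, x.sum(axis=1) / m)[0, 1])   # close to 0 for the power distribution

    y = rng.beta(2.0, 2.0, size=(reps, n))           # a non-power distribution on (0, 1)
    my = y.max(axis=1)
    print(np.corrcoef(my, y.sum(axis=1) / my)[0, 1]) # typically noticeably non-zero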

The Pareto distribution.


20.3 The Pareto distribution. Suppose α > 0 and x0 > 0. Then the random variable X is said to have a Pareto
distribution iff X has the distribution function
FX(x) = 1 − (x0/x)^α   for x ≥ x0.
It follows that X has density
fX(x) = α x0^α / x^{α+1}   for x ≥ x0.
More generally, the random variable has the Pareto(α, a, x0) distribution iff
fX(x) = α x0^α / (x − a)^{α+1}   for x > a + x0   and   FX(x) = 1 − x0^α / (x − a)^α   for x > a + x0.
The standard Pareto distribution is Pareto(α, 0, 1) which has density f(x) = α/x^{α+1} for x > 1 and distribution function F(x) = 1 − 1/x^α for x > 1. Clearly X ∼ Pareto(α, a, x0) iff (X − a)/x0 ∼ Pareto(α, 0, 1).

It is important to note that if X ∼ Pareto(α, 0, x0) then 1/X ∼ Power(α, 0, 1/x0); also, if X ∼ Power(α, 0, h) then 1/X ∼ Pareto(α, 0, 1/h). So results about one distribution can often be transformed into an equivalent result about the other.
Figure(20.3a). The Pareto distribution density for α = 1, α = 2 and α = 3 (all with x0 = 1).

The Pareto distribution has been used to model the distribution of incomes, the distribution of wealth, the sizes of
human settlements, etc.
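A Pareto(α, 0, x0) variate can be generated as x0 U^{−1/α} with U ∼ U(0, 1) (see exercise 9 on page 49); the sketch below (NumPy assumed, parameter values illustrative) checks the distribution function and the mean empirically.

    import numpy as np

    rng = np.random.default_rng(2)
    alpha, x0 = 3.0, 2.0
    x = x0 * rng.uniform(size=100_000) ** (-1 / alpha)   # Pareto(alpha, 0, x0)

    t = 5.0
    print((x <= t).mean(), 1 - (x0 / t) ** alpha)        # empirical vs F(t) = 1 - (x0/t)^alpha
    print(x.mean(), alpha * x0 / (alpha - 1))            # E[X], valid for alpha > 1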
20.4 A characterization of the Pareto distribution. Suppose X ∼ Pareto(α, 0, x0 ). Suppose further that
α > 1 so that the expectation is finite. We have
f(x) = α x0^α / x^{α+1}   and   F(x) = 1 − x0^α / x^α   for x > x0.
Also, for all c > x0 we have, because the expectation is finite,
E[X | X > c] = ∫_c^∞ x ( α x0^α / x^{α+1} ) (1/[1 − F(c)]) dx = α c^α ∫_c^∞ x^{−α} dx = αc/(α − 1) = (c/x0) E[X]
The next proposition shows this result characterizes the Pareto distribution (see [DALLAS(1976)]).
Proposition(20.4a). Suppose X is a non-negative absolutely continuous random variable with a finite expectation and such that there exists x0 > 0 with P[X > x0] = 1. Suppose further that for all c > x0 we have
E[X | X > c] = (c/x0) E[X]                                                                                               (20.4a)
Then there exists α > 1 such that X ∼ Pareto(α, 0, x0 ).
Proof. Let f denote the density and F denote the distribution function of X. Then equation (20.4a) becomes
∫_c^∞ ( x f(x) / [1 − F(c)] ) dx = (c/x0) ∫_{x0}^∞ x f(x) dx
Let δ = (1/x0) ∫_{x0}^∞ x f(x) dx. We are assuming E[X] is finite; hence δ ∈ (1, ∞) and
∫_c^∞ x f(x) dx = c [1 − F(c)] δ   for all c > x0.                                                                       (20.4b)
Differentiating with respect to c gives
−c f(c) = [ 1 − F(c) − c f(c) ] δ
and hence
c f(c) [δ − 1] = [1 − F(c)] δ
F′(c)/[1 − F(c)] = α/c   where α = δ/(δ − 1) > 1.
Integrating gives −ln[1 − F(c)] = A + α ln(c), i.e. 1 − F(c) = k/c^α. Using F(x0) = 0 gives 1 − F(c) = x0^α/c^α for c > x0, as required.
The above result leads on to another characterization of the Pareto distribution:
Proposition(20.4b). Suppose X1, X2, . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F with a finite expectation and such that there exists x0 > 0 with P[X > x0] = 1. Then
E[ Sn/X(1) | X(1) = x ] = c   with c independent of x                                                                    (20.4c)

iff there exists α > 1 such that F(x) = 1 − x0^α/x^α for x > x0.
Proof. ⇒ Writing Sn = X(1) + · · · + X(n) in equation (20.4c) gives
(c − 1)x = E[ X(2) + · · · + X(n) | X(1) = x ]
It is easy to see that given X(1) = x, the vector (X(2), . . . , X(n)) has the same distribution as the vector of n − 1 order statistics (Y(1), . . . , Y(n−1)) from the density f(y)/[1 − F(x)] for y > x. Hence Y(1) + · · · + Y(n−1) = Y1 + · · · + Yn−1 and
(c − 1)x = (n − 1)E[Y]   where Y has density f(y)/[1 − F(x)] for y > x.                                                  (20.4d)
Hence
∫_x^∞ y f(y) dy = ( (c − 1)/(n − 1) ) x [1 − F(x)]
Because X(j) > X(1) for all j = 2, 3, . . . , n, equation (20.4c) implies c > nx/x = n. Hence δ = (c − 1)/(n − 1) ∈ (1, ∞). So we have equation (20.4b) again and we must have F(x) = 1 − x0^α/x^α for x ∈ (x0, ∞).
⇐ See part (b) of exercise 20 on page 50.

The next result is an easy consequence of the last one—it was originally announced in [SRIVASTAVA(1965)] but the proof here is due to [DALLAS(1976)].
Proposition(20.4c). Suppose X1, X2, . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution function F with finite expectation and such that there exists x0 > 0 with P[X > x0] = 1. Then
Sn / min{X1, . . . , Xn}   is independent of   min{X1, . . . , Xn}                                                       (20.4e)
iff there exists α > 1 such that F(x) = 1 − x0^α/x^α for x ∈ (x0, ∞).
Proof. ⇒ Clearly equation(20.4e) implies equation(20.4c). ⇐ See part (c) of exercise 20 on page 50.
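The conditional-mean characterization (20.4a) can also be illustrated by simulation; a minimal sketch (NumPy assumed, parameter values illustrative) compares E[X | X > c] with (c/x0) E[X] for a few thresholds c.

    import numpy as np

    rng = np.random.default_rng(3)
    alpha, x0 = 3.0, 1.0
    x = x0 * rng.uniform(size=500_000) ** (-1 / alpha)   # Pareto(alpha, 0, x0)

    for c in (1.5, 2.0, 4.0):
        print(c, x[x > c].mean(), (c / x0) * x.mean())   # empirical E[X | X > c] vs (c/x0) E[X]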

21 Exercises (exs-powerPareto.tex)

The power distribution.


1. Suppose X has the Power(α, a, h) distribution. Find E[X], E[X 2 ] and var[X].

2. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Power(α, a, h) distribution. Find the distribution of
Mn = max(X1 , . . . , Xn ).

3. Suppose U1 , U2 , . . . , Un are i.i.d. random variables with the U (0, 1) distribution.


(a) Find the distribution of Mn = max(U1 , . . . , Un ).
1/n
(b) Find the distribution of Y = U1 .
(c) Suppose X ∼ Power(α, a, h). Show that X ∼ a + hU^{1/α} where U ∼ U(0, 1). Hence show that
E[X^n] = Σ_{j=0}^n ( α/(α + j) ) (n choose j) h^j a^{n−j}   for n = 1, 2, . . . .

4. Transforming the power distribution to the exponential. Suppose X ∼ Power(α, 0, h). Let Y = − ln(X); equivalently
Y = ln( 1/X ). Show that Y − ln( 1/h) ∼ exponential (α).

5. Suppose X1, X2, . . . , Xn are i.i.d. random variables with the power distribution Power(α, 0, 1). By using the density of Xk:n, find E[Xk:n] and E[X²k:n].

6. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Power(α, 0, h) distribution.


 
(a) Show that E[ Sn/X(n) | X(n) = x ] = c where c is independent of x.
(b) Show that Sn / max{X1, . . . , Xn} is independent of max{X1, . . . , Xn}.

7. Suppose r > 0 and X1 , X2 , . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution
function F such that there exists h > 0 with F (h) = 1.
(a) Show that for some i = 1, 2, . . . , n − 1
E[ X(i)^r / X(i+1)^r | X(i+1) = x ] = c   with c independent of x for x ∈ (0, h)
iff there exists α > 0 such that F(x) = x^α/h^α for x ∈ (0, h).
(b) Assuming the expectation is finite, show that for some i = 1, 2, . . . , n − 1
E[ X(i+1)^r / X(i)^r | X(i+1) = x ] = c   with c independent of x for x ∈ (0, h)
iff there exists α > 0 such that F(x) = x^α/h^α for x ∈ (0, h). [DALLAS(1976)]

8. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the power distribution Power(α, 0, 1), which has distribution
function F (x) = xα for 0 < x < 1 where α > 0.
(a) Let
W1 = X1:n/X2:n,   W2 = X2:n/X3:n,   . . . ,   Wn−1 = X(n−1):n/Xn:n,   Wn = Xn:n
Prove that W1 , W2 , . . . , Wn are independent and find the distribution of Wk for k = 1, 2, . . . , n.
(b) Hence find E[Xk:n] and E[X²k:n].

The Pareto distribution.


9. Relationship with the power distribution. Recall that if α > 0, then U ∼ U (0, 1) iff Y = U 1/α ∼ Power(α, 0, 1).
(a) Suppose α > 0. Show that U ∼ U (0, 1) iff Y = U −1/α ∼ Pareto(α, 0, 1).
(b) Suppose α > 0 and x0 > 0. Show that Y ∼ Pareto(α, a, x0 ) iff Y = a + x0 U −1/α where U ∼ U (0, 1).
(c) Show that X ∼ Power(α, 0, 1) iff 1/X ∼ Pareto(α, 0, 1).

10. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Pareto(α, a, x0 ) distribution. Find the distribution of
Mn = min(X1 , X2 , . . . , Xn ).

11. Suppose X ∼ Pareto(α, 0, x0 ).


(a) Find E[X] and var[X].
(b) Find the median and mode of X.
(c) Find E[X n ] for n = 1, 2, . . . .
(d) Find MX (t) = E[etX ], the moment generating function of X and φX (t) = E[eitX ], the characteristic function
of X.

12. Show that the Pareto( 1/2, 0, 1) distribution provides an example of a distribution with E[1/X] finite but E[X] infinite.

13. Transforming the Pareto to the exponential. Suppose X ∼ Pareto(α, 0, x0 ). Let Y = ln(X). Show that Y has a shifted
exponential distribution: Y − ln(x0 ) ∼ exponential (α).

14. Suppose X ∼ Pareto(α, 0, x0 ). Find the geometric mean of X and the Gini coefficient of X. The geometric mean of a
distribution is defined in exercise 3 on page 32 and the Gini coefficient is defined in exercise 9 on page 32.

15. Suppose X1, X2, . . . , Xn are i.i.d. random variables with the Pareto distribution Pareto(α, 0, 1). By using the density of Xk:n, find E[Xk:n] and E[X²k:n].

16. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Pareto(α, 0, 1) distribution.
(a) Let
W1 = X1:n,   W2 = X2:n/X1:n,   . . . ,   Wn−1 = X(n−1):n/X(n−2):n,   Wn = Xn:n/X(n−1):n
Prove that W1 , W2 , . . . , Wn are independent and find the distribution of Wk for k = 1, 2, . . . , n.
(b) Hence find E[Xk:n] and E[X²k:n]. See exercise 15 for an alternative derivation.

17. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Power(α, 0, 1) distribution. Suppose also Y1 , Y2 , . . . , Yn
are i.i.d. random variables with the Pareto(α, 0, 1) distribution. Show that for k = 1, 2, . . . , n
Xk:n and 1/Y(n−k+1):n have the same distribution.

18. Suppose X and Y are i.i.d. random variables with the Pareto(α, 0, x0 ) distribution. Find the distribution function and
density of Y /X .

19. Suppose X and Y are i.i.d. random variables with the Pareto(α, 0, x0 ) distribution. Let M = min(X, Y ). Prove that M
and Y /X are independent.

20. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the Pareto(α, 0, x0 ) distribution.
(a) Prove that the random variable X1:n is independent of the random vector ( X2:n/X1:n, . . . , Xn:n/X1:n ).
(b) Show that E[ Sn/X(1) | X(1) = x ] = c where c is independent of x.
(c) Prove that X1:n is independent of Sn/X1:n = (X1 + · · · + Xn)/X1:n.

21. Suppose r > 0 and X1 , X2 , . . . , Xn are i.i.d. random variables with a non-negative absolutely continuous distribution
function F with finite expectation and such that there exists x0 > 0 with P[X > x0 ] = 1.
(a) Show that for some i = 1, 2, . . . , n − 1
E[ X(i+1)^r / X(i)^r | X(i) = x ] = c   with c independent of x for x > x0
iff there exists α > r/(n − i) such that F(x) = 1 − x0^α/x^α for x > x0.
(b) Show that for some i = 1, 2, . . . , n − 1
E[ X(i)^r / X(i+1)^r | X(i) = x ] = c   with c independent of x for x > x0
iff there exists α > r/(n − i) such that F(x) = 1 − x0^α/x^α for x > x0. [DALLAS(1976)]

22. A characterization of the Pareto distribution. It is known that if X and Y are i.i.d. random variables with an absolutely
continuous distribution and min(X, Y ) is independent of X − Y , then X and Y have an exponential distribution—see
[CRAWFORD(1966)].
Now suppose X and Y are i.i.d. positive random variables with an absolutely continuous distribution and min(X, Y ) is
independent of Y /X . Prove that X and Y have a Pareto distribution.
Combining this result with exercise 19 gives the following characterization theorem: suppose X and Y are i.i.d. positive
random variables with an absolutely continuous distribution; then min(X, Y ) is independent of Y /X if and only if X and
Y have the Pareto distribution.

23. Another characterization of the Pareto distribution. Suppose X1 , X2 , . . . , Xn are i.i.d. absolutely continuous non-
negative random variables with density function f (x) and distribution function F (x). Suppose further that F (1) = 0 and
f (x) > 0 for all x > 1 and 1 ≤ i < j ≤ n. Show that Xj:n/Xi:n is independent of Xi:n if and only if there exists β > 0
such that each Xi has the Pareto(β, 0, 1) distribution.
Using the fact that X ∼ Pareto(α, 0, x0 ) iff X/x0 ∼ Pareto(α, 0, 1), it follows that if F (x0 ) = 0 and f (x) > 0 for all
x > x0 where x0 > 0 then Xj:n/Xi:n is independent of Xi:n if and only if there exists β > 0 such that each Xi has the
Pareto(β, 0, x0 ) distribution.

22 Size, shape and related characterization theorems


22.1 Size and shape: the definitions. The results in this section on size and shape are from [MOSSIMAN(1970)] and [JAMES(1979)].
Definition(22.1a). The function g : (0, ∞)n → (0, ∞) is an n-dimensional size variable iff
g(ax) = ag(x) for all a > 0 and all x ∈ (0, ∞)n .
Definition(22.1b). Suppose g : (0, ∞)n → (0, ∞) is an n-dimensional size variable. Then the function
z: (0, ∞)n → (0, ∞)n is the shape function associated with g iff
z(x) = x/g(x)   for all x ∈ (0, ∞)^n.

22.2 Size and shape: standard examples.


• The standard size function. This is g(x1, . . . , xn) = x1 + · · · + xn. The associated shape function is the function z : (0, ∞)^n → (0, ∞)^n with
z(x1, . . . , xn) = ( x1/Σ_{j=1}^n xj, . . . , xn/Σ_{j=1}^n xj )
• Dimension 1 size. This is g(x1, . . . , xn) = x1. The associated shape function is
z(x1, . . . , xn) = ( 1, x2/x1, . . . , xn/x1 )
• Dimension 2 size. This is g(x1, . . . , xn) = x2. The associated shape function is
z(x1, . . . , xn) = ( x1/x2, 1, . . . , xn/x2 )
• Volume. This is g(x1, . . . , xn) = (x1² + · · · + xn²)^{1/2}. The associated shape function is
z(x1, . . . , xn) = ( x1/(x1² + · · · + xn²)^{1/2}, . . . , xn/(x1² + · · · + xn²)^{1/2} )
• The maximum. This is g(x1, . . . , xn) = max{x1, . . . , xn}. The associated shape function is
z(x1, . . . , xn) = ( x1/max{x1, . . . , xn}, . . . , xn/max{x1, . . . , xn} )
• Root n size. This is g(x1, . . . , xn) = (x1 x2 · · · xn)^{1/n}. The associated shape function is
z(x1, . . . , xn) = ( x1/(x1 x2 · · · xn)^{1/n}, . . . , xn/(x1 x2 · · · xn)^{1/n} )
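These definitions translate directly into code; the small sketch below (NumPy assumed; the vector x and the function names are chosen purely for illustration) evaluates several of the standard size variables above and their associated shape functions for one vector.

    import numpy as np

    def shape(x, g):
        # Shape function z(x) = x / g(x) associated with the size variable g.
        return x / g(x)

    x = np.array([2.0, 3.0, 5.0])
    sizes = {
        "sum": lambda v: v.sum(),
        "first coordinate": lambda v: v[0],
        "volume": lambda v: np.sqrt((v ** 2).sum()),
        "maximum": lambda v: v.max(),
        "root n": lambda v: v.prod() ** (1 / len(v)),
    }
    for name, g in sizes.items():
        print(name, g(x), shape(x, g))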
22.3 Size and shape: the fundamental result. We shall show that:
• if any one shape function z(X) is independent of the size variable g(X), then every shape function is independent
of g(X);
• if two size variables g(X) and g ∗ (X) are both independent of the same shape function z(X), then g(X)/g ∗ (X) is
almost surely constant.
First a specific example9 of this second result:
Example(22.3a). Suppose X = (X1 , X2 , X3 ) ∼ logN (µ, Σ) distribution. By definition, this means that if Y1 = ln(X1 ),
Y2 = ln(X2 ) and Y3 = ln(X3 ), then (Y1 , Y2 , Y3 ) ∼ N (µ, Σ).
Define the three size functions:

g1(x) = x1,   g2(x) = (x2 x3)^{1/2},   g3(x) = (x1 x2 x3)^{1/3}
and let z1 , z2 and z3 denote the corresponding shape functions. Suppose g1 (X) is independent of z1 (X).
(a) Show that var[Y1 ] = cov[Y1 , Y2 ] = cov[Y1 , Y3 ].
(b) Show that g1 (X) is independent of z2 (X). (c) Show that g1 (X) is independent of g2 (X)/g1 (X).
Now suppose g3 (X) is also independent of z1 (X).
(d) Show that cov[Y1 , S] = cov[Y2 , S] = cov[Y3 , S] where S = Y1 + Y2 + Y3 .
(e) Show that var[Y2 ] + cov[Y2 , Y3 ] = var[Y3 ] + cov[Y2 , Y3 ] = 2var[Y1 ].
(f) Show that var[2Y1 − Y2 − Y3 ] = 0 and hence g1 (X)/g3 (X) is constant almost everywhere.

Solution. We are given X1 is independent of (1, X2/X1, X3/X1). Taking logs shows that Y1 is independent of (Y2 − Y1, Y3 − Y1) and these are normal. Hence cov[Y1, Y2 − Y1] = cov[Y1, Y3 − Y1] = 0 and hence (a).
(b) follows because Y1 is independent of (Y1 − Y2/2 − Y3/2, Y2/2 − Y3/2, Y3/2 − Y2/2).
(c) Now cov[Y1, (Y2 + Y3)/2 − Y1] = cov[Y1, Y2]/2 + cov[Y1, Y3]/2 − var[Y1] = 0. By normality, ln(g1(X)) = Y1 is independent of ln(g2(X)) − ln(g1(X)). Because the exponential function is one–one, we have (c).
(d) The assumption that g3(X) is independent of z1(X) implies, by taking logs, that S is independent of (Y2 − Y1, Y3 − Y1) and these are normal. Hence (d).
(e) Expanding cov[Y1, S] and using part (a) shows that cov[Y1, S] = 3var[Y1]. Similarly, expanding cov[Y2, S] shows that var[Y2] + cov[Y2, Y3] + cov[Y1, Y2] = cov[Y2, S] = cov[Y1, S] = 3var[Y1]. Hence (e).
(f) Now var[2Y1 − Y2 − Y3] = 4var[Y1] − 4cov[Y1, Y2] − 4cov[Y1, Y3] + var[Y2] + var[Y3] + 2cov[Y2, Y3] = 0. Hence var[Y1 − S/3] = 0; hence var[ ln( g1(X)/g3(X) ) ] = 0. Hence (f).


Now for the general result:


9. Understanding this example is not necessary for the rest of the section. The example makes use of the definition of the multivariate normal and the fact that normals are independent if the covariance is zero. See Chapter 2: §3.6 on page 68.

Proposition(22.3b). Suppose g : (0, ∞)n → (0, ∞) is an n-dimensional size variable and z ∗ : (0, ∞)n →
(0, ∞)n is any shape function. Suppose further that X is a random vector such that z ∗ (X) is non-degenerate
and independent of g(X). Then
(a) for any other shape function z1 : (0, ∞)n → (0, ∞)n , z1 (X) is independent of g(X);
(b) if g2 : (0, ∞)n → (0, ∞) is another size variable such that z ∗ (X) is independent of both g2 (X) and g(X),
then
g2(X)/g(X)   is constant almost everywhere.
Proof. Let g ∗ and g1 denote the size variables which lead to the shape functions z ∗ and z1 . Hence
z*(x) = x/g*(x)   and   z1(x) = x/g1(x)   for all x ∈ (0, ∞)^n.
For all x ∈ (0, ∞)^n we have
g1( z*(x) ) = g1( x/g*(x) ) = g1(x)/g*(x)   by using g1(ax) = a g1(x).
Hence for all x ∈ (0, ∞)^n
z1( z*(x) ) = z*(x) / g1( z*(x) ) = ( x/g*(x) ) × ( g*(x)/g1(x) ) = x/g1(x) = z1(x)                                       (22.3a)
Equation (22.3a) shows that z1(X) is a function of z*(X); also, we are given that z*(X) is independent of g(X). Hence z1(X) is independent of g(X). This proves (a).
(b) Because of part (a), we can assume
z2(X) = X/g2(X)   is independent of g(X)   and   z(X) = X/g(X)   is independent of g2(X)
Applying g to the first and g2 to the second gives
g( z2(X) ) = g(X)/g2(X)   is independent of g(X)   and   g2( z(X) ) = g2(X)/g(X)   is independent of g2(X)
and hence
g2(X)/g(X)   is independent of both g2(X) and g(X).
Hence result by part (b) of exercise 8 on page 7.

22.4 A characterization of the gamma distribution.


Proposition(22.4a). Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate random variables.
Suppose g* : (0, ∞)^n → (0, ∞) denotes the size variable
g*(x) = Σ_{j=1}^n xj
Then there exists a shape vector z(X) which is independent of g ∗ (X) iff there exist α > 0, k1 > 0, . . . , kn > 0
such that Xj ∼ Gamma(kj , α) for j = 1, 2, . . . , n.
Proof.
⇐ Now g*(X) = Σ_{j=1}^n Xj ∼ Gamma(k1 + · · · + kn, α). Proposition (8.8b) implies Xj/(X1 + · · · + Xn) is independent of g*(X) = X1 + · · · + Xn for j = 1, 2, . . . , n. Hence the standard shape vector
z(X) = ( X1/(X1 + · · · + Xn), X2/(X1 + · · · + Xn), . . . , Xn/(X1 + · · · + Xn) )
is independent of g*(X). Hence all shape vectors are independent of g*(X).
⇒ By proposition(22.3b), if there exists one shape vector which is independent of g ∗ (X), then all shape vectors are
independent of g ∗ (X). Hence Xj /(X1 + · · · + Xn ) is independent of g ∗ (X) = X1 + · · · + Xn for j = 1, 2, . . . , n. Hence
by proposition(8.8b), there exists α > 0 and kj > 0 such that Xj ∼ Gamma(kj , α) for j = 1, 2, . . . , n.

This result implies many others. For example, suppose X1 , X2 , . . . , Xn are independent random variables with
Xj ∼ Gamma(kj , α). Then every shape vector is independent of X1 + X2 + · · · + Xn ; in particular,
X1 + X2 + · · · + Xn   is independent of   Xj / max{X1, X2, . . . , Xn}
and
X1 + X2 + · · · + Xn   is independent of   (X1 + X2 + · · · + Xn) / max{X1, X2, . . . , Xn}
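The claim just made can be illustrated numerically: for independent Gamma(kj, α) variables with a common rate, the sum is empirically uncorrelated with a shape coordinate such as X1/max{X1, . . . , Xn} (again, uncorrelatedness is only a necessary condition for independence). A sketch assuming NumPy, with illustrative shape parameters and common rate α = 1:

    import numpy as np

    rng = np.random.default_rng(4)
    k = np.array([1.0, 2.0, 3.5])               # shape parameters k_j (common rate alpha = 1)
    x = rng.gamma(shape=k, size=(50_000, k.size))

    s = x.sum(axis=1)
    shape_coord = x[:, 0] / x.max(axis=1)       # X_1 / max{X_1, ..., X_n}
    print(np.corrcoef(s, shape_coord)[0, 1])    # close to 0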

22.5 A characterization of the Pareto distribution.


Proposition(22.5a). Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate random variables.
Suppose g* : (0, ∞)^n → (0, ∞) denotes the size variable
g ∗ (x) = min{x1 , . . . , xn }
Then there exists a shape vector z(X) which is independent of g ∗ (X) iff there exists x0 > 0 and αj > 0 for
j = 1, 2, . . . , n such that Xj ∼ Pareto(αj , 0, x0 ) for j = 1, 2, . . . , n.
Proof.
⇐ Let Y1 = ln(X1 ) and Y2 = ln(X2 ). Then Y1 −ln(x0 ) ∼ exponential (α1 ) and Y2 −ln(x0 ) ∼ exponential (α2 ) and Y1 and
Y2 are independent. By exercise 5 on page 55, we know that if Y1 − a ∼ exponential (λ1 ) and Y2 − a ∼ exponential (λ2 )
and Y1 and Y2 are independent, then min{Y1 , Y2 } is independent of Y2 − Y1 .
This establishes U = min{Y1 , Y2 } is independent of V = Y2 − Y1 . But (Y3 , . . . , Yn ) is independent of U and V . Hence
min{Y1 , . . . , Yn } is independent of Y2 −Y1 . Similarly min{Y1 , . . . , Yn } is independent of Yj −Y1 for j = 2, . . . , n. Hence
min{Y1, . . . , Yn} is independent of the vector (Y2 − Y1, Y3 − Y1, . . . , Yn − Y1). And hence g*(X) = min{X1, . . . , Xn} is
independent of the shape vector (1, X2/X1 , . . . , Xn/X1 ) as required.
⇒ Suppose n = 2. Using the shape vector (1, x2/x1 ) implies that we are given min{X1 , X2 } is independent of X2 /X1 .
Taking logs shows that min{Y1 , Y2 } is independent of Y2 − Y1 where Y1 = ln(X1 ) and Y2 = ln(X2 ).
It is known (see [CRAWFORD(1966)]) that if Y1 and Y2 are independent random variables with an absolutely continuous
distribution and min(Y1 , Y2 ) is independent of Y2 − Y1 , then there exist a ∈ R, α1 > 0 and α2 > 0 such that Y1 − a ∼
exponential (α1 ) and Y2 − a ∼ exponential (α2 ). Hence fY1 (y1 ) = α1 e−α1 (y1 −a) for y1 > a and fY2 (y2 ) = α2 e−α2 (y2 −a) for
y2 > a.
Hence X1 = eY1 ∼ Pareto(α1 , 0, x0 = ea ) and X2 = eY2 ∼ Pareto(α2 , 0, x0 = ea ) where x0 > 0.
For n > 2 we are given that
Xj
is independent of min{X1 , . . . , Xn } for j = 1, 2, . . . , n.
min{X1 , . . . , Xn }
But
min{X1, . . . , Xn} = min{Xj, Zj}   where Zj = min{Xi : i ≠ j}
Hence for some x0j > 0, λj > 0 and λ*j > 0, Xj ∼ Pareto(λj, 0, x0j) and Zj ∼ Pareto(λ*j, 0, x0j). Because Zj = min{Xi : i ≠ j} we must have x0j ≤ x0i for j ≠ i. It follows that all x0j are equal. Hence result.

22.6 A characterization of the power distribution. Because the inverse of a Pareto random variable has the
power distribution, the previous proposition can be transformed into a result about the power distribution.
Proposition(22.6a). Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate random variables.
Suppose g* : (0, ∞)^n → (0, ∞) denotes the size variable
g ∗ (x) = max{x1 , . . . , xn }
Then there exists a shape vector z(X) which is independent of g ∗ (X) iff there exists x0 > 0 and αj > 0 for
j = 1, 2, . . . , n such that Xj ∼ Power(αj , 0, x0 ) for j = 1, 2, . . . , n.
Proof.
⇐ Let Y1 = ln( 1/X1 ) and Y2 = ln( 1/X2 ). By exercise 4 on page 48, Y1 − ln( 1/x0 ) ∼ exponential (α1 ) and Y2 − ln( 1/x0 ) ∼
exponential (α2 ) and Y1 and Y2 are independent. By exercise 5 on page 55, we know that if Y1 − a ∼ exponential (λ1 ) and
Y2 − a ∼ exponential (λ2 ) and Y1 and Y2 are independent, then min{Y1 , Y2 } is independent of Y1 − Y2 .
This establishes U = min{Y1 , Y2 } is independent of V = Y1 − Y2 . But (Y3 , . . . , Yn ) is independent of U and V . Hence
min{Y1 , . . . , Yn } is independent of Y1 −Y2 . Similarly min{Y1 , . . . , Yn } is independent of Y1 −Yj for j = 2, . . . , n. Hence
min{Y1, . . . , Yn} is independent of the vector (Y1 − Y2, Y1 − Y3, . . . , Y1 − Yn). And hence g*(X) = max{X1, . . . , Xn} is
independent of the shape vector (1, X2/X1 , . . . , Xn/X1 ) as required.
⇒ Suppose n = 2. Using the shape vector (1, x2/x1 ) implies that we are given max{X1 , X2 } is independent of X2 /X1 .
Set Y1 = ln( 1/X1 ) and Y2 = ln( 1/X2 ). Hence min{Y1 , Y2 } is independent of Y2 − Y1 .
It is known (see [CRAWFORD(1966)]) that if Y1 and Y2 are independent random variables with an absolutely continuous
distribution and min(Y1 , Y2 ) is independent of Y2 − Y1 , then there exist a ∈ R, α1 > 0 and α2 > 0 such that Y1 − a ∼
exponential (α1 ) and Y2 − a ∼ exponential (α2 ). Hence fY1 (y1 ) = α1 e−α1 (y1 −a) for y1 > a and fY2 (y2 ) = α2 e−α2 (y2 −a) for
y2 > a.
Hence X1 = e−Y1 ∼ Power(α1 , 0, h = e−a ) and X2 = e−Y2 ∼ Power(α2 , 0, h = e−a ) where h > 0.
For n > 2 we are given that
Xj
is independent of max{X1 , . . . , Xn } for j = 1, 2, . . . , n.
max{X1 , . . . , Xn }
But
max{X1, . . . , Xn} = max{Xj, Zj}   where Zj = max{Xi : i ≠ j}

Hence for some hj > 0, λj > 0 and λ*j > 0, Xj ∼ Power(λj, 0, hj) and Zj ∼ Power(λ*j, 0, hj). Because Zj = max{Xi : i ≠ j} we must have hj ≥ hi for j ≠ i. It follows that all hj are equal. Hence result.
22.7 Independence of size and shape for the multivariate lognormal. This result requires a basic knowledge
of the multivariate normal—see Chapter2:§5 on page 71.
We say that the random vector X = (X1 , . . . , Xn ) ∼ logN (µ, Σ) iff ln(X) = ( ln(X1 ), . . . , ln(Xn ) ) ∼ N (µ, Σ).
Proposition(22.7a). Suppose X = (X1 , . . . , Xn ) ∼ logN (µ, Σ). Suppose further that g1 : (0, ∞)n → (0, ∞)
denotes the size variable
g1 (x) = (x1 · · · xn )1/n
Then g1 (X) is independent of every shape vector z(X) iff there exists c ∈ R such that cov[Yj , Y1 + · · · + Yn ] = c
for all j = 1, 2, . . . , n, where Y = (Y1 , . . . , Yn ) = (ln(X1 ), . . . , ln(Xn ) ).
Proof. By proposition (22.3b) on page 52, we need only prove g1(X) is independent of one shape function. Consider the shape function z*(x) = (1, x2/x1, . . . , xn/x1).
Now g1(X) is independent of z*(X) iff (X1 · · · Xn)^{1/n} is independent of (1, X2/X1, . . . , Xn/X1). This occurs iff Y1 + · · · + Yn is independent of (Y2 − Y1, . . . , Yn − Y1). But the Y's are normal; hence by proposition (5.8b) on page 74, this occurs iff cov[Yi − Y1, Σ_{j=1}^n Yj] = 0 for i = 2, 3, . . . , n; and this occurs iff cov[Yi, Σ_{j=1}^n Yj] = cov[Y1, Σ_{j=1}^n Yj] for i = 2, 3, . . . , n.

This result implies many others. For example, suppose X1 , X2 , . . . , Xn are independent random variables with
Xj ∼ logN (µj , σ 2 ) for j = 1, 2, . . . , n. Then
(X1 X2 · · · Xn)^{1/n}   is independent of   Xj / max{X1, X2, . . . , Xn}
and
(X1 X2 · · · Xn)^{1/n}   is independent of   (X1 + X2 + · · · + Xn) / max{X1, X2, . . . , Xn}   etc.
Proposition(22.7a) leads to the following characterization of the lognormal distribution.
Proposition(22.7b). Suppose X1 , X2 , . . . , Xn are independent positive non-degenerate random variables.
Suppose g1 : (0, ∞)^n → (0, ∞) denotes the size variable
g1 (x) = (x1 · · · xn )1/n
Then there exists a shape vector z(X) which is independent of g1 (X) iff there exists σ > 0 such that every
Xj ∼ logN (µj , σ 2 ).
Proof.
⇐ Let Yj = ln(Xj ). Then Yj ∼ N (µj , σ 2 ); also Y1 , . . . , Yn are independent. Hence cov[Yj , Y1 + · · · + Yn ] = σ 2 for
j = 1, 2, . . . , n. Hence result by previous proposition.
⇒ By proposition(22.3b), if there exists one shape  vector which is independent of g1 (X), then all shape vectors are inde-
pendent of g1(X). Hence (1, X2/X1, . . . , Xn/X1) is independent of g1(X) = (X1 · · · Xn)^{1/n}. Hence Yk − Y1 is independent
of Y1 + · · · + Yn for k = 2, . . . , n. Hence, by the Skitovich-Darmois theorem—see proposition (10.6b), every Yk is normal.

23 Exercises (exs-sizeshape.tex)

1. Suppose X = (X1 , X2 ) is a 2-dimensional random vector with X1 = aX2 where a ∈ R. Show that if z : (0, ∞)2 →
(0, ∞)2 is any shape function, then z(X) is constant almost everywhere.
2. Suppose X = (X1, X2) is a 2-dimensional random vector with the distribution given in the following table:
                      X2
              1      2      3      6
        1     0      0      1/4    0
        2     0      0      0      1/4
  X1    3     1/4    0      0      0
        6     0      1/4    0      0

Define the size variables g1 (x) = x1 x2 and g2 (x) = x1 + x2 .
(a) Suppose z is any shape function: (0, ∞)2 → (0, ∞)2 . Show that z(X) cannot be almost surely constant. Also, show
that z(X) is independent of both g1 (X) and g2 (X).
(b) Find the distribution of g1 (X)/g2 (X).

3. A characterization of the generalized gamma distribution. Prove the following result.


Suppose X1, X2, . . . , Xn are independent positive non-degenerate random variables. Suppose g* : (0, ∞)^n → (0, ∞) denotes the size variable
g*(x) = ( Σ_{j=1}^n xj^b )^{1/b}   where b > 0.

Then there exists a shape vector z(X) which is independent of g ∗ (X) iff there exist α > 0, k1 > 0, . . . , kn > 0 such that
Xj ∼ GGamma(kj , α, b) for j = 1, 2, . . . , n.
Hint: use the result that X ∼ GGamma(n, λ, b) iff X b ∼ Γ(n, λ) and proposition(22.4a).

4. Suppose X1 ∼ exponential(λ1), X2 ∼ exponential(λ2) and X1 and X2 are independent.
(a) Find P[X1 < X2].
(b) By using the lack of memory property of the exponential distribution, find the distribution of X1 − X2.
(c) By using the usual convolution formula for densities, find the density of X1 − X2.

5. Suppose Y1 − a ∼ exponential (λ1 ) and Y2 − a ∼ exponential (λ2 ) and Y1 and Y2 are independent. Show that U =
min{Y1 , Y2 } is independent of V = Y2 − Y1 .

6. A generalization of proposition (22.5a) on page 53. Suppose X1, X2, . . . , Xn are independent positive non-degenerate random variables and θ1, θ2, . . . , θn are positive constants. Suppose g* : (0, ∞)^n → (0, ∞) denotes the size variable
g*(x) = min{ x1/θ1, . . . , xn/θn }
Prove there exists a shape vector z(X) which is independent of g ∗ (X) iff there exists x0 > 0 and αj > 0 for j =
1, 2, . . . , n such that Xj ∼ Pareto(αj , 0, θj x0 ) for j = 1, 2, . . . , n. [JAMES(1979)]

7. A generalization of proposition (22.6a) on page 53. Suppose X1, X2, . . . , Xn are independent positive non-degenerate random variables and θ1, θ2, . . . , θn are positive constants. Suppose g* : (0, ∞)^n → (0, ∞) denotes the size variable
g*(x) = max{ x1/θ1, . . . , xn/θn }
Prove there exists a shape vector z(X) which is independent of g ∗ (X) iff there exists x0 > 0 and αj > 0 for j =
1, 2, . . . , n such that Xj ∼ Power(αj , 0, θj x0 ) for j = 1, 2, . . . , n. [JAMES(1979)]

24 Laplace, Rayleigh and Weibull distributions


There are many results about these distributions in the exercises.

24.1 The Laplace or bilateral exponential distribution. Suppose µ ∈ R and α > 0. Then the random
variable X is said to have the Laplace(µ, α) distribution iff X has the density
fX(x) = (α/2) e^{−α|x−µ|}   for x ∈ R.
Clearly if X ∼ Laplace(µ, α), then X − µ ∼ Laplace(0, α). As figure(24.1a) shows, the density consists of two
equal exponential densities spliced back to back.
Figure(24.1a). The bilateral exponential density for µ = 0 and α = 6.

Figure(24.2a). The Rayleigh distribution density for σ = 0.5, σ = 1.5 and σ = 4.

24.2 The Rayleigh distribution. Suppose σ > 0. Then the random variable X is said to have the Rayleigh (σ)
distribution if X has the density
fR(r) = (r/σ²) e^{−r²/(2σ²)}   for r > 0.
The Rayleigh distribution is used to model the lifetime of various items and the magnitude of vectors—see exer-
cise 11 on page 57. There are plots of the density in figure(24.2a)
24.3 The Weibull distribution. Suppose β > 0 and γ > 0. Then the random variable X is said to have the
Weibull (β, γ) distribution iff X has the density
fX(x) = ( β x^{β−1} / γ^β ) e^{−(x/γ)^β}   for x > 0.
The distribution function is F (x) = 1 − exp(−xβ /γ β ) for x > 0. The Weibull distribution is frequently used to
model failure times.
The density can take several shapes as figure(24.3a) illustrates.
Figure(24.3a). The Weibull density for β = 1/2, β = 1, β = 1.5 and β = 5; all with γ = 1.
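The inverse-transform construction X = γ(−ln U)^{1/β} (exercise 20 below) gives a quick way to simulate the Weibull distribution and to check the stated distribution function; a minimal sketch assuming NumPy, with illustrative values of β, γ and t:

    import numpy as np

    rng = np.random.default_rng(5)
    beta, gamma_ = 1.5, 2.0       # shape and scale (gamma_ to avoid clashing with other names)
    x = gamma_ * (-np.log(rng.uniform(size=100_000))) ** (1 / beta)   # Weibull(beta, gamma)

    t = 2.5
    print((x <= t).mean(), 1 - np.exp(-(t / gamma_) ** beta))         # empirical vs F(t)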

25 Exercises (exs-other.tex)

The Laplace or bilateral exponential distribution.


1. (a) Suppose α > 0. Suppose further that X has the exponential density αe−αx for x > 0 and Y has the exponential
density αeαx for x < 0 and X and Y are independent. Find the density of X + Y .
(b) Suppose α > 0 and the random variables X and Y have the exponential density αe−αx for x > 0. Suppose further
that X and Y are independent. Find the density of X − Y .
2. Suppose X has the Laplace(µ, α) distribution.
(a) Find the expectation, median, mode and variance of X.
(b) Find the distribution function of X.
(c) Find the moment generating function of X.

3. Suppose X has the Laplace(0, α) distribution. Find the density of |X|.


4. Suppose X ∼ exponential (λ), Y ∼ exponential (µ) and X and Y are independent. Find the density of Z = X − Y .
5. (a) Suppose X ∼ Laplace(µ, α); suppose further that k > 0 and b ∈ R. Show that kX + b ∼ Laplace(kµ + b, α/k).
(b) Suppose X ∼ Laplace(µ, α). Show that α(X − µ) ∼ Laplace(0, 1).
(c) Suppose X1, . . . , Xn are i.i.d. with the Laplace(µ, α) distribution. Show that 2α Σ_{i=1}^n |Xi − µ| ∼ χ²_{2n}.
6. Suppose X and Y are i.i.d. Laplace(µ, α). Show that
|X − µ| / |Y − µ| ∼ F_{2,2}
7. Suppose X and Y are i.i.d. uniform U (0, 1). Show that ln( X/Y ) ∼ Laplace(0, 1).
8. Suppose X and Y are independent random variables with X ∼ exponential (α) and Y ∼ Bernoulli ( 1/2).
Show that X(2Y − 1) ∼ Laplace(0, α).
9. Suppose X1 , X2 , X3 and X4 are i.i.d. N (0, 1).
(a) Show that X1 X2 − X3 X4 ∼ Laplace(0, 1).
(b) Show that X1 X2 + X3 X4 ∼ Laplace(0, 1). (See also exercise 1.11(12) on page 29.)
10. Exponential scale mixture of normals. Suppose X and Y are independent random variables with X ∼ exponential(1) and Y ∼ N(0, 1). Show that Y√(2X) ∼ Laplace(0, 1) and µ + Y√(2X)/α ∼ Laplace(µ, α).
Note. Provide two solutions: one using characteristic functions and one using densities.
The Rayleigh distribution.

11. (a) Suppose X and Y are i.i.d. with the N(0, σ²) distribution. Define R and Θ by R = √(X² + Y²), X = R cos Θ and
Y = R sin Θ with Θ ∈ [0, 2π). Prove that R and Θ are independent and find the density of R and Θ.
(b) Suppose R has the Rayleigh (σ) distribution and Θ has the uniform U (−π, π) distribution. Show that X = R cos Θ
and Y = R sin Θ are i.i.d. N (0, σ 2 ).
12. Suppose R has the Rayleigh(σ) distribution:
fR(r) = (r/σ²) e^{−r²/(2σ²)}   for r > 0.
(a) Find the distribution function of R.
(b) Find an expression for E[Rn ] for n = 1, 2, . . . .
(c) Find E[R] and var[R].
(d) Find the mode and median of R.

13. Suppose U has the uniform distribution U(0, 1) and X = σ√(−2 ln U). Show that X has the Rayleigh(σ) distribution.
14. (a) Suppose R has the Rayleigh (σ) distribution. Find the distribution of R2 .
(b) Suppose R has the Rayleigh (1) distribution. Show that the distribution of R2 is χ22 .
(c) Suppose R1, . . . , Rn are i.i.d. with the Rayleigh(σ) distribution. Show that Y = Σ_{i=1}^n Ri² has the Gamma(n, 1/(2σ²)) distribution.
(d) Suppose X has the exponential(λ) = Gamma(1, λ) distribution. Find the distribution of Y = √X.
15. Suppose X ∼ Rayleigh (s) where s > 0, and Y |X ∼ N (µ, σ = X). Show that Y ∼ Laplace(µ, 1/s).
The Weibull distribution.
16. Suppose X has the Weibull(β, γ) distribution; hence fX(x) = β x^{β−1} e^{−(x/γ)^β} / γ^β for x > 0, where β is the shape and γ is the scale.
(a) Show that the Weibull(1, γ) distribution is the same as the exponential(1/γ) distribution.
(b) Show that the Weibull(2, γ) distribution is the same as the Rayleigh(γ/√2) distribution.
17. Suppose X has the exponential (1) distribution; hence fX (x) = e−x for x > 0. Suppose further that β > 0 and γ > 0
and W = γX 1/β . Find the density of W .
This is the Weibull (β, γ) distribution; β is called the shape and γ is called the scale.
18. Suppose X has the Weibull(β, γ) distribution; hence fX(x) = β x^{β−1} e^{−(x/γ)^β} / γ^β for x > 0.
(a) Suppose α > 0; find the distribution of Y = αX. (b) Find an expression for E[X n ] for n = 1, 2, . . . .
(c) Find the mean, variance, median and mode of X. (d) Find E[et ln(X) ], the moment generating function of ln(X).

19. Suppose X has the Weibull (β, γ) distribution.


(a) Find hX (x) = f (x)/[1 − F (x)], the hazard function of X.
(b) Check that if β < 1 then hX decreases as x increases; if β = 1 then hX is constant; and if β > 1 then hX increases
as x increases.
20. Suppose U has the uniform U (0, 1) distribution. Show that X = γ(− ln U )1/β has the Weibull (β, γ) distribution.
CHAPTER 2

Multivariate Continuous Distributions

1 General results
1.1 The mean and variance matrices. If X = (X1, . . . , Xn)^T is an n × 1 random vector then, provided the univariate expectations E[X1], . . . , E[Xn] exist, we define the n × 1 vector
E[X] = ( E[X1], . . . , E[Xn] )^T
Provided the second moments E[X1²], . . . , E[Xn²] are finite, the variance matrix or covariance matrix of X is the n × n matrix
var[X] = E[ (X − µ)(X − µ)^T ]   where µ = E[X].

The (i, j) entry in the variance matrix is cov[Xi , Xj ]. In particular, the ith diagonal entry is var[Xi ].
Clearly:
• the variance matrix is symmetric;
• if X1 , . . . , Xn are i.i.d. with variance σ 2 , then var[X] = σ 2 I;
• var[X] = E[XXT ] − µµT ; (1.1a)
• we shall denote the variance matrix by Σ or ΣX .
We shall often omit stating “when the second moments are finite” when it is obviously needed. Random vectors
will be nearly always column vectors, but may be written in text as row vectors in order to save space.

1.2 Linear transformations. If Y = X + a then var[Y] = var[X].


More generally, if Y = A + BX where A is m × 1 and B is m × n, then µY = A + BµX and
var[Y] = E[ (Y − µY)(Y − µY)^T ] = E[ B(X − µX)(X − µX)^T B^T ] = B var[X] B^T
In particular, if a = (a1, . . . , an) is a 1 × n vector, then aX = Σ_{i=1}^n ai Xi is a random variable and
var[aX] = a var[X] a^T = Σ_{i=1}^n Σ_{j=1}^n ai aj cov[Xi, Xj]
Example(1.2a). Suppose the random vector X = (X1, X2, X3) has variance matrix
var[X] =
    [ 6  2  3 ]
    [ 2  4  0 ]
    [ 3  0  2 ]
Let Y1 = X1 + X2 and Y2 = X1 + X2 − X3. Find the variance matrix of Y = (Y1, Y2).
Solution. Now Y = AX where
A =
    [ 1  1   0 ]
    [ 1  1  −1 ]
Hence
var[Y] = var[AX] = A var[X] A^T =
    [ 14  11 ]
    [ 11  10 ]
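The calculation in example (1.2a) is easy to reproduce numerically; a small sketch assuming NumPy:

    import numpy as np

    Sigma = np.array([[6.0, 2.0, 3.0],
                      [2.0, 4.0, 0.0],
                      [3.0, 0.0, 2.0]])
    A = np.array([[1.0, 1.0, 0.0],
                  [1.0, 1.0, -1.0]])
    print(A @ Sigma @ A.T)   # var[Y] = A var[X] A^T = [[14, 11], [11, 10]]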




1.3 Positive definiteness of the variance matrix. Suppose X is an n × 1 random vector. Then for any n × 1
vector c we have
cT var[X]c = var[cT X] ≥ 0 (1.3a)
Hence var[X] is positive semi-definite (also called non-negative definite).
Proposition(1.3a). Suppose X is a random vector with finite second moments and such that no element of X is
a linear combination of the other elements. Then var[X] is a symmetric positive definite matrix.
Proof. No element of X is a linear combination of the other elements; this means that if a is an n × 1 vector with aT X
constant then we must have a = 0.
Now suppose var[cT X] = 0; this implies cT X is constant and hence c = 0. Hence cT var[X]c = 0 implies var[cT X] = 0
which implies c = 0. This result, together with equation(1.3a) shows that var[X] must be positive definite.
Example(1.3b). Consider the random vector Z = (X, Y, X + Y )T where µX = E[X], µY = E[Y ] and ρ = corr[X, Y ].
Show that var[Z] is not positive definite.
Solution. Let a = (1, 1, −1). Then a var[Z] aT = var[aZ] = var[0] = 0.

1.4 The square root of a variance matrix; the transformation to independent random variables Suppose
C is a real symmetric positive definite n × n matrix. Because C is real and symmetric, we can write C = LDLT
where L is orthogonal1 and D = diag(d1 , . . . , dn ) is diagonal and d1 , . . . , dn are the eigenvalues of C. Because
C is also positive definite, we have d1 > 0, . . . , dn > 0. Hence we can write C = (LD1/2 LT )(LD1/2 LT ) = QQ
where Q is symmetric and nonsingular.
If we only know C is real, symmetric and non-negative definite, then we only have d1 ≥ 0, . . . , dn ≥ 0. We can
still write C = (LD1/2 LT )(LD1/2 LT ) = QQ where Q is symmetric but Q is now possibly singular.

Now suppose X is a random vector with finite second moments and such that no element of X is a linear combi-
nation of the other elements; then var[X] is a real symmetric positive definite matrix. Hence var[X] = QQ and
if Y = Q−1 X then var(Y) = Q−1 var[X] (Q−1 )T = I. This means that if X is a random vector with finite second
moments and such that no element of X is a linear combination of the other elements, then there exists a linear
transformation of X to independent variables.
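A sketch of this construction (NumPy assumed, the variance matrix below is illustrative): form Q = L D^{1/2} L^T from the eigendecomposition of the variance matrix and check that Q^{−1}X has (approximately) identity variance, here applied to the sample covariance of simulated data.

    import numpy as np

    rng = np.random.default_rng(6)
    Sigma = np.array([[4.0, 1.5], [1.5, 2.0]])

    # Symmetric square root Q with Sigma = Q Q, via Sigma = L D L^T.
    d, L = np.linalg.eigh(Sigma)
    Q = L @ np.diag(np.sqrt(d)) @ L.T

    x = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=100_000)
    y = x @ np.linalg.inv(Q).T           # each row of y is Q^{-1} applied to a row of x
    print(np.cov(y, rowvar=False))       # approximately the identity matrix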

1.5 The covariance between two random vectors.


Definition(1.5a). If X is an m × 1 random vector with finite second moments and Y is an n × 1 random vector
with finite second moments, then cov[X, Y] is the m × n matrix with (i, j) entry equal to cov[Xi , Yj ].
Because the (i, j) entry of cov[X, Y] equals cov[Xi , Yj ], it follows that
cov[X, Y] = E[(X − µX )(Y − µy )T ] = E[XYT ] − µX µTY
Also:
• because cov[Xi , Yj ] = cov[Yj , Xi ], it follows that cov[X, Y] = cov[Y, X]T ;
• if n = m, the covariance matrix cov[X, Y] is symmetric;
• cov[X, X] = var[X];
• we shall often denote the covariance matrix by Σ or ΣX,Y .

1.6 The correlation matrix. Suppose the n × 1 random vector X has the variance matrix Σ. Let D be the n × n diagonal matrix with diagonal equal to √diag(Σ). Then the correlation matrix of X is given by
corr[X] = D^{−1} Σ D^{−1}
Clearly, the (i, j) element of corr[X] is corr(Xi, Xj). Also, corr[X] is the variance matrix of the random vector Z = (Z1, . . . , Zn) where Zj = (Xj − E[Xj])/√var(Xj).
Conversely, given corr[X] we need the vector of standard deviations in order to determine the variance matrix. In fact, var[X] = D corr[X] D where D is the diagonal matrix with entries ( stdev(X1), . . . , stdev(Xn) ).

1. Orthogonal means that L^T L = I and hence L^{−1} = L^T. Because C = LDL^T, we have L^T CL = D and hence L^T(C − λI)L = D − λI; hence |C − λI| = |D − λI| and the eigenvalues of C equal the eigenvalues of D—see page 39 of [RAO(1973)].

1.7 Quadratic forms. Results about quadratic forms are important in regression and the analysis of variance.
A quadratic form in (x, y) is an expression of the type ax2 +by 2 +cxy; a quadratic form in (x, y, z) is an expression
of the form ax2 + by 2 + cz 2 + dxy + exz + f yz. Thus, for example, 2x2 + 4x + 3y 2 is not a quadratic form in (x, y).
Definition(1.7a). Suppose A is a real n × n symmetric matrix. Then the quadratic form of A is the function qA : R^n → R with
qA(x) = Σ_{j=1}^n Σ_{k=1}^n a_{jk} xj xk = x^T A x
Suppose we have x^T Ax where the matrix A is not symmetric. Because x^T Ax is a scalar, we have x^T Ax = x^T A^T x and hence x^T Ax = x^T Bx where B = (A + A^T)/2. In this way, we can work round the requirement that A is symmetric.
Example(1.7b). Suppose A = I, the identity matrix. Then qA(X) = X^T AX = Σ_{k=1}^n Xk².
Example(1.7c). Suppose A = 1, the n × n matrix with every entry equal to 1. Then qA(X) = X^T AX = ( Σ_{k=1}^n Xk )².
If A and B are both real n × n symmetric matrices and a, b ∈ R, then aA + bB can be used to create a new quadratic form:
q_{aA+bB}(X) = X^T (aA + bB) X = a qA(X) + b qB(X)
Example(1.7d). Suppose A = I and B = 1. Then q_{aI+b1}(X) = a Σ_{k=1}^n Xk² + b ( Σ_{k=1}^n Xk )².
In particular q_{I−1/n}(X) = Σ_{k=1}^n Xk² − (1/n)( Σ_{k=1}^n Xk )² = Σ_{k=1}^n (Xk − X̄)², and hence the sample variance S² = Σ_{k=1}^n (Xk − X̄)²/(n − 1) is a quadratic form in X = (X1, . . . , Xn).
Example(1.7e). Suppose A1 is the n × n matrix with ones on the superdiagonal and zeros elsewhere, and A2 is the n × n matrix with ones on the superdiagonal and subdiagonal and zeros elsewhere:
A1 =
    [ 0 1 0 · · · 0 0 ]
    [ 0 0 1 · · · 0 0 ]
    [ . . . · · · . . ]
    [ 0 0 0 · · · 0 1 ]
    [ 0 0 0 · · · 0 0 ]
A2 =
    [ 0 1 0 · · · 0 0 ]
    [ 1 0 1 · · · 0 0 ]
    [ . . . · · · . . ]
    [ 0 0 0 · · · 0 1 ]
    [ 0 0 0 · · · 1 0 ]
Then X^T A1 X = X1X2 + · · · + X_{n−1}Xn. Note that the matrix A1 is not symmetric. Now A2 = A1 + A1^T is symmetric and q_{A2}(X) = X^T A2 X = 2 X^T A1 X = 2 Σ_{k=1}^{n−1} Xk X_{k+1}.

1.8 Mean of a quadratic form.


Proposition(1.8a). Suppose X is an n × 1 random vector with E[X] = µ and var[X] = Σ. Suppose A is a real
n × n symmetric matrix. Then qA (X), the quadratic form of A has expectation
E[XT AX] = trace(AΣ) + µT Aµ (1.8a)
Proof. Now X^T AX = (X − µ)^T A(X − µ) + µ^T AX + X^T Aµ − µ^T Aµ and hence
E[X^T AX] = E[ (X − µ)^T A(X − µ) ] + µ^T Aµ
Because (X − µ)^T A(X − µ) is a scalar, we have
E[ (X − µ)^T A(X − µ) ] = E[ trace( (X − µ)^T A(X − µ) ) ]
                        = E[ trace( A(X − µ)(X − µ)^T ) ]      because trace(AB) = trace(BA)
                        = trace( E[ A(X − µ)(X − µ)^T ] )      because E[trace(V)] = trace(E[V])
                        = trace(AΣ)
Hence result.
The second term in equation(1.8a) is xT Ax evaluated at x = µ; this simplifies some derivations. We now apply
this result to some of the examples above.
Example(1.8b). Suppose A = I, the identity matrix. Then qA(X) = X^T AX = Σ_{j=1}^n Xj² and equation (1.8a) gives
E[ Σ_{j=1}^n Xj² ] = Σ_{j=1}^n σj² + Σ_{j=1}^n µj²
Example(1.8c). Suppose A = 1, the n × n matrix with every entry equal to 1. Then qA(X) = X^T AX = ( Σ_{j=1}^n Xj )² and equation (1.8a) gives
E[ ( Σ_{j=1}^n Xj )² ] = Σ_{j=1}^n Σ_{k=1}^n σ_{jk} + ( Σ_{j=1}^n µj )²

Example(1.8d). Continuation of example(1.7d). Suppose X1, . . . , Xn are i.i.d. random variables with mean µ and variance σ². Consider the quadratic form qA(X) = Σ_{k=1}^n (Xk − X̄)². Then qA(X) = X^T AX where A = I − (1/n)1. Now var[X] = σ²I; hence equation (1.8a) gives
E[qA(X)] = σ² trace( I − (1/n)1 ) + x^T Ax |_{x=µ} = (n − 1)σ² + Σ_{k=1}^n (xk − x̄)² |_{x=µ} = (n − 1)σ²
Hence if S² = Σ_{k=1}^n (Xk − X̄)²/(n − 1), then E[S²] = σ².
Example(1.8e). Suppose X1, . . . , Xn are i.i.d. random variables with mean µ and variance σ². First we shall find the matrix A with qA(X) = Σ_{k=1}^{n−1} (Xk − X_{k+1})² = (X1 − X2)² + (X2 − X3)² + · · · + (X_{n−1} − Xn)². Now qA(X) = X1² + 2X2² + · · · + 2X_{n−1}² + Xn² − 2 Σ_{k=1}^{n−1} Xk X_{k+1}. Using the matrix A2 in example(1.7e) gives qA(X) = X1² + 2X2² + · · · + 2X_{n−1}² + Xn² − X^T A2 X. Hence qA(X) = X^T A1 X − X^T A2 X where A1 = diag[ 1 2 2 · · · 2 1 ].
Hence qA(X) = X^T AX where
A = A1 − A2 =
    [  1 −1  0 · · ·  0  0 ]
    [ −1  2 −1 · · ·  0  0 ]
    [  0 −1  2 · · ·  0  0 ]
    [  .  .  . · · ·  .  . ]
    [  0  0  0 · · ·  2 −1 ]
    [  0  0  0 · · · −1  1 ]
Equation (1.8a) gives E[qA(X)] = σ² trace(A) + qA(x) |_{x=(µ,...,µ)} = σ²(2n − 2) + 0 = σ²(2n − 2).
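Equation (1.8a) and example (1.8e) can be verified by simulation; a sketch assuming NumPy, where the normal distribution and the values of n, µ, σ are only illustrative.

    import numpy as np

    rng = np.random.default_rng(7)
    n, mu, sigma = 6, 2.0, 1.5

    # A with x^T A x = sum_k (x_k - x_{k+1})^2, as in example (1.8e).
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A[0, 0] = A[-1, -1] = 1

    x = rng.normal(mu, sigma, size=(200_000, n))
    q = np.einsum('ij,jk,ik->i', x, A, x)          # X^T A X for each sample
    print(q.mean(), sigma**2 * np.trace(A))        # trace(A Sigma) + mu^T A mu; second term is 0 here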

1.9 Variance of a quadratic form. This result is complicated!


Proposition(1.9a). Suppose X1 , . . . , Xn are independent random variables with E[Xj ] = 0 for j = 1, . . . , n.
Suppose all the random variables have the same finite second and fourth moments; we shall use the following
notation:
E[Xj2 ] = σ 2 and E[Xj4 ] = µ4
Suppose A is an n × n symmetric matrix with entries aij and d is the n × 1 column vector with entries
(a11 , . . . , ann ) = diag(A). Then
var[XT AX] = (µ4 − 3σ 4 )dT d + 2σ 4 trace(A2 )
Proof. Now var[X^T AX] = E[ (X^T AX)² ] − ( E[X^T AX] )².
Because E[X] = 0, using equation (1.8a) gives E[X^T AX] = trace(AΣ) = σ² trace(A) = σ² Σ_{j=1}^n a_{jj}. Let c = X^T A and Z = XX^T; then c is a 1 × n row vector and Z is an n × n matrix and (X^T AX)² = X^T AX X^T AX = c Z c^T = Σ_j Σ_k cj ck Z_{jk} = Σ_j Σ_k cj ck Xj Xk. The jth entry in the row vector c = X^T A is Σ_{i=1}^n Xi a_{ij} and the kth entry in the row vector c = X^T A is Σ_{ℓ=1}^n Xℓ a_{ℓk}. Hence
(X^T AX)² = Σ_i Σ_j Σ_k Σ_ℓ a_{ij} a_{ℓk} Xi Xj Xk Xℓ
Using independence of the X's gives
E[Xi Xj Xk Xℓ] = µ4   if i = j = k = ℓ;
E[Xi Xj Xk Xℓ] = σ⁴   if i = j and k = ℓ, or i = k and j = ℓ, or i = ℓ and j = k;
E[Xi Xj Xk Xℓ] = 0    otherwise.
Hence
E[ (X^T AX)² ] = µ4 Σ_i a_{ii}² + σ⁴ ( Σ_{i≠k} a_{ii} a_{kk} + Σ_{i≠j} a_{ij} a_{ji} + Σ_{i≠j} a_{ij}² )
Now
Σ_{i≠k} a_{ii} a_{kk} = Σ_{i=1}^n a_{ii} ( trace(A) − a_{ii} ) = [trace(A)]² − d^T d
Σ_{i≠j} a_{ij}² = Σ_{i=1}^n Σ_{j=1}^n a_{ij}² − d^T d = trace(A²) − d^T d
Σ_{i≠j} a_{ij} a_{ji} = Σ_{i≠j} a_{ij}² = trace(A²) − d^T d
and hence
E[ (X^T AX)² ] = (µ4 − 3σ⁴) d^T d + σ⁴ ( [trace(A)]² + 2 trace(A²) )
and hence the result.



Example(1.9b). Suppose X1 , . . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution and A is a symmetric
n × n matrix. By §10.3 on page 26, we know that E[Xj4 ] = 3σ 4 . Hence
var[XT AX] = 2σ 4 trace(A2 )
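Example (1.9b) can also be checked by simulation; a sketch assuming NumPy, with an arbitrary symmetric A and illustrative values of n and σ.

    import numpy as np

    rng = np.random.default_rng(8)
    n, sigma = 4, 1.3
    B = rng.normal(size=(n, n))
    A = (B + B.T) / 2                                # an arbitrary symmetric matrix

    x = rng.normal(0.0, sigma, size=(400_000, n))
    q = np.einsum('ij,jk,ik->i', x, A, x)
    print(q.var(), 2 * sigma**4 * np.trace(A @ A))   # var[X^T A X] = 2 sigma^4 trace(A^2), approximately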

We can generalize proposition(1.9a) to non-zero means as follows:


Proposition(1.9c). Suppose X1 , . . . , Xn are independent random variables with E[Xj ] = µj for j = 1, . . . , n.
Suppose all the random variables have the same finite second, third and fourth moments about the mean; we
shall use the following notation: E[X] = µ and
E[(Xj − µj )2 ] = σ 2
E[(Xj − µj )3 ] = µ3
E[(Xj − µj )4 ] = µ4
Suppose A is an n × n symmetric matrix with entries ai,j and d is the n × 1 column vector with entries
(a11 , . . . , ann ) = diag(A). Then
var[XT AX] = (µ4 − 3σ 4 )dT d + 2σ 4 trace(A2 ) + 4σ 2 µT A2 µ + 4µ3 µT Ad (1.9a)
Proof. See exercise 11 on page 64.

1.10 Summary.
The variance matrix.
• var[X] = E[ (X − µ)(X − µ)T ] = E[XXT ] − µµT .
• var[A + BX] = B var[X] BT
• var[X] is symmetric positive semi-definite
• if no element of X is a linear combination of the others, then var[X] is symmetric positive definite
• if var[X] is positive definite, there exists symmetric non-singular Q with var[X] = QQ
• if X has finite second moments and no element is a linear combination of the other elements, then there exists a
linear transformation of X to independent variables
The covariance matrix.
• cov[X, Y] = E[ (X − µX )(Y − µY )T ] = E[XYT ] − µX µTY
• cov[X, Y] = cov[Y, X]T
• cov[X, X] = var[X]
• if the dimensions of X and Y are equal, then cov[X, Y] is symmetric
Quadratic forms.
• qA (x) = xT Ax where A is a real symmetric matrix
• E[qA (X)] = trace(AΣ) + µT Aµ

2 Exercises (exs-multiv.tex)

1. Suppose X is an m × 1 random vector and Y is an n × 1 random vector. Suppose further that all second moments of X and Y are finite and suppose a is an m × 1 vector and b is an n × 1 vector. Show that
cov[a^T X, b^T Y] = a^T cov[X, Y] b = Σ_{i=1}^m Σ_{j=1}^n ai bj cov[Xi, Yj]

2. Further properties of the covariance matrix. Suppose X is an m × 1 random vector and Y is an n × 1 random vector.
Suppose further that all second moments are finite.
(a) Show that for any m × 1 vector b and any n × 1 vector d we have
cov[X + b, Y + d] = cov[X, Y]
(b) Show that for any ` × m matrix A and any p × n matrix B we have
cov[AX, BY] = Acov[X, Y]BT
(c) Suppose a, b, c and d ∈ R; suppose further that V is an m × 1 random vector and W is an n × 1 random vector
with finite second moments.
cov[aX + bV, cY + dW] = ac cov[X, Y] + ad cov[X, W] + bc cov[V, Y] + bd cov[V, W]
Both sides are m × n matrices.

(d) Suppose a and b ∈ R and both X and V are m × 1 random vectors. Show that
var[aX + bV] = a2 var[X] + ab cov[X, V] + ab cov[V, X] + b2 var[V]
3. Suppose Y1 , Y2 , . . . , Yn are independent random variables each with variance 1. Let X1 = Y1 , X2 = Y1 + Y2 , . . . , Xn =
Y1 + · · · + Yn . Find the n × n matrix var[X].
4. Suppose X is an n × 1 random vector with finite second moments. Show that for any n × 1 vector α ∈ R^n we have
E[(X − α)(X − α)^T] = var[X] + (µX − α)(µX − α)^T
5. Suppose X is an m-dimensional random vector with finite second order moments and such that such that no element of
X is a linear combination of the other elements.
Show that for any n-dimensional random vector Y, there exists an n × m matrix A such that
cov[Y − AX, X] = 0
6. Suppose X is an n × 1 random vector with E[X] = µ and var[X] = Σ. Prove the following results:
(a) E[(AX + a)(BX + b)T ] = AΣBT + (Aµ + a)(Bµ + b)T where A is m × n, a is m × 1, B is r × n and b is r × 1.
(b) E[(X + a)(X + a)T ] = Σ + (µ + a)(µ + a)T where a is n × 1.
(c) E[XaT X] = (Σ + µµT )a where a is n × 1.
7. Suppose X is an n × 1 random vector with E[X] = µ and var[X] = Σ. Prove the following results:
(a) E[(AX + a)T (BX + b)] = trace(AΣBT ) + (Aµ + a)T (Bµ + b) where A and B are m × n, and a and b are m × 1.
(b) E[XT X] = trace(Σ) + µT µ
(c) E[(AX)T (AX)] = trace(AΣAT ) + (Aµ)T (Aµ) where A is n × n.
(d) E[(X + a)T (X + a)] = trace(Σ) + (µ + a)T (µ + a) where a is n × 1.
8. Suppose X is an m-dimensional random vector, Y is an n-dimensional random vector and A is an m × n real matrix.
Prove that
E[ XT AY ] = trace(AΣY,X ) + µTX AµY
where ΣY,X is the n × m matrix cov[Y, X].
9. Quadratic forms. Suppose X is an n × 1 random vector with E[X] = µ and var[X] = Σ. Suppose A is a real n × n
symmetric matrix and b is an n × 1 real vector.
Show that
E[(X − b)T A(X − b)] = trace(AΣ) + (µ − b)T A(µ − b)
In particular
• E[ (X − µ)T A(X − µ) ] = trace(AΣ). √
• If kak denotes the length of the vector a, then kak = aT a and EkX − bk2 = trace(Σ) + kµ − bk2 .
10. Suppose X1 , . . . , Xn are random variables with E[Xj ] = µj and var[Xj ] = σj2 for j = 1, . . . , n; also cov[Xj , Xk ] = 0
for k > j + 1. If
Xn
Q= (Xk − X)2
k=1
show that
        E[Q] = [ (n − 1)α − 2β ]/n + γ
where α = σ1^2 + · · · + σn^2, β = cov[X1, X2] + cov[X2, X3] + · · · + cov[Xn−1, Xn] and
γ = Σ_{k=1}^n µk^2 − (1/n)( Σ_{k=1}^n µk )^2.
Note that if all variables have the same mean, then γ = 0.
11. Variance of a quadratic form—proof of proposition (1.9c) on page 63.
(a) Show that XT AX = W1 + W2 + c where W1 = (X − µ)T A(X − µ), W2 = 2µT A(X − µ) and c = µT Aµ.
(b) Show that var[W2 ] = 4σ 2 µT A2 µ.
(c) Show that cov[W1 , W2 ] = E[W1 W2 ] = 2µT A E[YYT AY] = 2µ3 µT Ad where Y = X − µ.
(d) Hence show var[XT AX] = (µ4 − 3σ 4 )dT d + 2σ 4 trace(A2 ) + 4σ 2 µT A2 µ + 4µ3 µT Ad
12. Suppose X1, . . . , Xn are random variables with common expectation µ and common variance σ^2. Suppose
further that cov[Xj, Xk] = ρσ^2 for j ≠ k. Show that Σ_{k=1}^n (Xk − X̄)^2 has expectation σ^2(1 − ρ)(n − 1) and hence
        [ Σ_{k=1}^n (Xk − X̄)^2 ] / [ (1 − ρ)(n − 1) ]
is an unbiased estimator of σ^2.
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §3 Page 65

13. Suppose X1, . . . , Xn are i.i.d. random variables with the N(µ, σ^2) distribution. Let
        S^2 = Σ_{k=1}^n (Xk − X̄)^2 / (n − 1)        and        Q = Σ_{k=1}^{n−1} (Xk+1 − Xk)^2 / [ 2(n − 1) ]
(a) Show that var[S^2] = 2σ^4/(n − 1). (See also exercise 7 on page 80.)
(b) Show that E[Q] = σ^2 and var[Q] = 2σ^4(6n − 8)/[ 4(n − 1)^2 ].

14. Suppose (X1, X2, X3) has the density
        f_{(X1,X2,X3)}(x1, x2, x3) = [1/(2π)^{3/2}] e^{−(x1^2 + x2^2 + x3^2)/2} + [x1 x2 x3/(2π)^{3/2}] e^{−(x1^2 + x2^2 + x3^2)}    for x1, x2, x3 ∈ R.
(Note that the maximum of |x3 e^{−x3^2/2}| occurs at x3 = ±1 and has absolute value which is less than 1. Hence f ≥ 0
everywhere.)
Show that X1, X2 and X3 are pairwise independent but not independent.

15. (a) Expectation of X^T AY. Suppose X is an n-dimensional random vector with expectation µX, Y is an m-dimensional
random vector with expectation µY and A is an n × m real matrix. Let Z = X^T AY. Prove that
E[Z] = trace( A cov[Y, X] ) + µX^T A µY.
(b) Suppose (X1, Y1), . . . , (Xn, Yn) are i.i.d. random vectors with a distribution with expectation (µX, µY) and variance
matrix
        [ σX^2, σXY;  σXY, σY^2 ]        where σXY = cov[X, Y].
Suppose
        SXY = Σ_{j=1}^n (Xj − X̄)(Yj − Ȳ) / (n − 1)
Show that E[SXY] = σXY.

3 The bivariate normal


3.1 The density. Here is the first of several equivalent formulations of the density.
Definition(3.1a). The random vector (X1, X2) has a bivariate normal distribution iff it has density
        f_{X1X2}(x1, x2) = [ |P|^{1/2}/(2π) ] exp[ −(x − µ)^T P (x − µ)/2 ]                                  (3.1a)
where x = (x1, x2)^T, µ = (µ1, µ2)^T ∈ R^2 and P is a 2 × 2 real symmetric positive definite matrix.
Suppose the entries in the 2 × 2 real symmetric matrix P are denoted as follows²:
        P = [ a1, a2;  a2, a3 ]
It follows that equation(3.1a) is equivalent to
        f(x1, x2) = [ √(a1 a3 − a2^2)/(2π) ] exp[ −( a1(x1 − µ1)^2 + 2a2(x1 − µ1)(x2 − µ2) + a3(x2 − µ2)^2 )/2 ]    (3.1b)
A more common form of the density is given in equation(3.3a) on page 66.

To show that equation(3.1b) defines a density. Clearly f ≥ 0. It remains to check that f integrates to 1. Let
y1 = x1 − µ1 and y2 = x2 − µ2. Then
        ∫∫ f_{X1X2}(x1, x2) dx1 dx2
              = [ |P|^{1/2}/(2π) ] ∫∫ exp[ −( a1 y1^2 + 2a2 y1 y2 + a3 y2^2 )/2 ] dy1 dy2                     (3.1c)
              = [ |P|^{1/2}/(2π) ] ∫∫ exp[ −(a1/2)( y1 + (a2/a1) y2 )^2 ] exp[ −(y2^2/2)( a3 − a2^2/a1 ) ] dy1 dy2    (3.1d)

² It is easy to check that the real symmetric matrix P is positive definite iff a1 > 0 and a1 a3 − a2^2 > 0.
Page 66 §3 Jan 8, 2019(21:02) Bayesian Time Series Analysis

Now use the transformation z1 = √a1 [ y1 + (a2/a1) y2 ] and z2 = y2 √[ (a1 a3 − a2^2)/a1 ]. This transformation
is a 1–1 map R^2 → R^2 with Jacobian √(a1 a3 − a2^2) = |P|^{1/2}; it gives
        ∫∫ f_{X1X2}(x1, x2) dx1 dx2 = (1/2π) ∫∫ exp[ −(z1^2 + z2^2)/2 ] dz1 dz2 = 1
by using the fact that the integral of the standard normal density equals one.
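As a numerical illustration of equation (3.1b), the following Python sketch (with an arbitrary choice of P and µ)
checks that the density integrates to 1 and agrees with the usual N(µ, Σ) density where Σ = P^{−1}.

    import numpy as np
    from scipy import integrate
    from scipy.stats import multivariate_normal

    P = np.array([[1.0, -0.5],
                  [-0.5, 2.0]])          # a1 = 1, a2 = -1/2, a3 = 2: positive definite
    mu = np.array([2.0, 1.0])
    Sigma = np.linalg.inv(P)

    def f(x1, x2):
        y = np.array([x1, x2]) - mu
        return np.sqrt(np.linalg.det(P)) / (2 * np.pi) * np.exp(-0.5 * y @ P @ y)

    total, _ = integrate.dblquad(lambda x2, x1: f(x1, x2), -10, 14, -10, 12)
    print(total)                                                          # ≈ 1
    print(f(3.0, 0.5), multivariate_normal(mu, Sigma).pdf([3.0, 0.5]))    # the two values agree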
3.2 The marginal distributions of X1 and X2 . For the marginal density of X2 we need to find the following
integral:
        f_{X2}(x2) = ∫ f_{X1X2}(x1, x2) dx1
First let Y1 = X1 − µ1 and Y2 = X2 − µ2 and find the density of Y2:
        f_{Y2}(y2) = ∫ f_{Y1,Y2}(y1, y2) dy1 = [ |P|^{1/2}/(2π) ] ∫ exp[ −( a1 y1^2 + 2a2 y1 y2 + a3 y2^2 )/2 ] dy1
Using the decomposition in equation(3.1d) gives
        f_{Y2}(y2) = [ |P|^{1/2}/(2π) ] exp[ −(y2^2/2)( a3 − a2^2/a1 ) ] ∫ exp[ −(a1/2)( y1 + (a2/a1) y2 )^2 ] dy1
                   = [ |P|^{1/2}/(2π) ] exp[ −(y2^2/2)( a3 − a2^2/a1 ) ] √(2π/a1)
                   = [ 1/√(2πσ2^2) ] exp[ −y2^2/(2σ2^2) ]
where σ2^2 = a1/(a1 a3 − a2^2) = a1/|P|. It follows that the density of X2 = Y2 + µ2 is
        f_{X2}(x2) = [ 1/√(2πσ2^2) ] exp[ −(x2 − µ2)^2/(2σ2^2) ]        where σ2^2 = a1/(a1 a3 − a2^2) = a1/|P|
We have shown that the marginal distributions are normal:
        X2 has the N(µ2, σ2^2) distribution.
Similarly,
        X1 has the N(µ1, σ1^2) distribution.
where
        σ1^2 = a3/(a1 a3 − a2^2) = a3/|P|        and        σ2^2 = a1/(a1 a3 − a2^2) = a1/|P|
3.3 The covariance and correlation between X1 and X2 . Of course, cov[X1 , X2 ] = cov[Y1 , Y2 ] where Y1 =
X1 − µ1 and Y2 = X2 − µ2 . So it suffices to find cov[Y1 , Y2 ] = E[Y1 Y2 ]. The density of (Y1 , Y2 ) is:
        f_{Y1Y2}(y1, y2) = [ |P|^{1/2}/(2π) ] exp[ −( a1 y1^2 + 2a2 y1 y2 + a3 y2^2 )/2 ]
It follows that
        ∫∫ exp[ −( a1 y1^2 + 2a2 y1 y2 + a3 y2^2 )/2 ] dy1 dy2 = 2π/√(a1 a3 − a2^2)
Differentiating with respect to a2 gives
        ∫∫ (−y1 y2) exp[ −( a1 y1^2 + 2a2 y1 y2 + a3 y2^2 )/2 ] dy1 dy2 = 2π a2/(a1 a3 − a2^2)^{3/2}
and hence
        cov[X1, X2] = E[Y1 Y2] = −a2/(a1 a3 − a2^2)
The correlation between X1 and X2 is
        ρ = cov[X1, X2]/(σ1 σ2) = −a2/√(a1 a3)
These results lead to an alternative expression for the density of a bivariate normal:
        f_{X1X2}(x1, x2) = [ 1/(2πσ1σ2√(1 − ρ^2)) ]
                           × exp{ −[ (x1 − µ1)^2/σ1^2 − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)^2/σ2^2 ]/[ 2(1 − ρ^2) ] }
                                                                                                              (3.3a)
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §3 Page 67

We have also shown that
        var[X] = [ σ1^2, cov[X1, X2];  cov[X1, X2], σ2^2 ] = P^{−1}
P is sometimes called the precision matrix—it is the inverse of the variance matrix var[X].
Summarizing some of these results (rows of matrices are separated by semicolons):
        P = [ a1, a2;  a2, a3 ]        |P| = a1 a3 − a2^2        P^{−1} = (1/(a1 a3 − a2^2)) [ a3, −a2;  −a2, a1 ]
        Σ = P^{−1} = [ σ1^2, ρσ1σ2;  ρσ1σ2, σ2^2 ]        |Σ| = (1 − ρ^2)σ1^2σ2^2
        Σ^{−1} = P = (1/((1 − ρ^2)σ1^2σ2^2)) [ σ2^2, −ρσ1σ2;  −ρσ1σ2, σ1^2 ]                                  (3.3b)
Example(3.3a). Suppose (X, Y) has a bivariate normal distribution with density
        f(x, y) = (1/k) exp[ −(x^2 + 2y^2 − xy − 3x − 2y + 4)/2 ]
Find the mean vector and the variance matrix of (X, Y). What is the value of k?
Solution. Let Q(x, y) = a1(x − µ1)^2 + 2a2(x − µ1)(y − µ2) + a3(y − µ2)^2. So we want Q(x, y) = x^2 + 2y^2 − xy − 3x − 2y + 4.
Equating coefficients of x^2, xy and y^2 gives a1 = 1, a2 = −1/2 and a3 = 2. Hence
        P = [ 1, −1/2;  −1/2, 2 ]        and        Σ = P^{−1} = (4/7) [ 2, 1/2;  1/2, 1 ]
Also |P| = 7/4 and hence k = 2π/|P|^{1/2} = 4π/√7.
Now ∂Q/∂x = 2a1(x − µ1) + 2a2(y − µ2) and ∂Q/∂y = 2a2(x − µ1) + 2a3(y − µ2). If ∂Q/∂x = 0 and ∂Q/∂y = 0 then we
must have x = µ1 and y = µ2 because |P| = a1 a3 − a2^2 ≠ 0.
Applying this to Q(x, y) = x^2 + 2y^2 − xy − 3x − 2y + 4 gives the equations 2µ1 − µ2 − 3 = 0 and 4µ2 − µ1 − 2 = 0. Hence
(µ1, µ2) = (2, 1).
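The arithmetic in example (3.3a) is easy to reproduce numerically; the Python sketch below solves the gradient
equations for µ and evaluates the normalising constant by direct integration (the integration limits are an
arbitrary wide box around the mean).

    import numpy as np
    from scipy import integrate

    # stationary point of Q(x, y) = x^2 + 2y^2 - xy - 3x - 2y + 4: solve grad Q = 0
    mu = np.linalg.solve(np.array([[2.0, -1.0], [-1.0, 4.0]]), np.array([3.0, 2.0]))
    print(mu)                                  # ≈ [2, 1]

    Q = lambda x, y: x**2 + 2*y**2 - x*y - 3*x - 2*y + 4
    k, _ = integrate.dblquad(lambda y, x: np.exp(-0.5 * Q(x, y)), -15, 19, -15, 17)
    print(k, 4 * np.pi / np.sqrt(7))           # both ≈ 4.75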

3.4 The characteristic function. Suppose X^T = (X1, X2) has the bivariate density defined in equation(3.1a).
Then for all 2 × 1 vectors t ∈ R^2, the characteristic function of X is
        φ(t) = E[ e^{it^T X} ] = ∫∫ e^{it^T x} [ |P|^{1/2}/(2π) ] exp[ −(x − µ)^T P(x − µ)/2 ] dx1 dx2
             = [ |P|^{1/2}/(2π) ] e^{it^T µ} ∫∫ exp[ ( 2it^T y − y^T Py )/2 ] dy1 dy2        by setting y = x − µ.
But y^T Py − 2it^T y = (y − iΣt)^T P(y − iΣt) + t^T Σt where Σ = P^{−1} = var[X]. Hence
        φ(t) = [ |P|^{1/2}/(2π) ] e^{it^T µ − t^T Σt/2} ∫∫ exp[ −(y − iΣt)^T P(y − iΣt)/2 ] dy1 dy2
             = e^{it^T µ − t^T Σt/2}
by using the fact that the integral of the density in equation(3.1a) is 1.
Example(3.4a). Suppose X = (X1, X2) ∼ N(µ, Σ). Find the distribution of X1 + X2.
Solution. The c.f. of X is φX(t1, t2) = exp[ iµ1 t1 + iµ2 t2 − ( t1^2 σ1^2 + 2t1 t2 σ_{12} + t2^2 σ2^2 )/2 ]. Setting t1 = t2 = t gives
the c.f. of X1 + X2 to be φ_{X1+X2}(t) = exp[ i(µ1 + µ2)t − t^2( σ1^2 + 2σ_{12} + σ2^2 )/2 ]. Hence X1 + X2 ∼ N(µ1 + µ2, σ^2)
where σ^2 = σ1^2 + 2σ_{12} + σ2^2 = σ1^2 + 2ρσ1σ2 + σ2^2.

3.5 The conditional distributions. We first find the conditional density of Y1 given Y2 where Y1 = X1 − µ1 and
Y2 = X2 − µ2 . Now
        f_{Y1|Y2}(y1|y2) = f_{Y1Y2}(y1, y2) / f_{Y2}(y2)
We use the following forms:
        f_{Y1Y2}(y1, y2) = [ 1/(2πσ1σ2√(1 − ρ^2)) ] exp{ −[ y1^2 − (2ρσ1/σ2) y1 y2 + (σ1^2/σ2^2) y2^2 ]/[ 2σ1^2(1 − ρ^2) ] }
        f_{Y2}(y2) = [ 1/√(2πσ2^2) ] exp[ −y2^2/(2σ2^2) ]
Hence
        f(y1|y2) = [ 1/( √(2π) √(σ1^2(1 − ρ^2)) ) ] exp{ −[ y1^2 − (2ρσ1/σ2) y1 y2 + (ρ^2σ1^2/σ2^2) y2^2 ]/[ 2σ1^2(1 − ρ^2) ] }
                 = [ 1/( √(2π) √(σ1^2(1 − ρ^2)) ) ] exp{ −[ y1 − (ρσ1/σ2) y2 ]^2/[ 2σ1^2(1 − ρ^2) ] }
which is the density of the N( (ρσ1/σ2) y2, σ1^2(1 − ρ^2) ) distribution.
It follows that the distribution of X1 given X2 = x2 is the N( µ1 + (ρσ1/σ2)(x2 − µ2), σ1^2(1 − ρ^2) ) distribution, and hence
E[X1|X2] = µ1 + (ρσ1/σ2)(X2 − µ2) and var[X1|X2] = σ1^2(1 − ρ^2).
In terms of the original notation, σ1^2(1 − ρ^2) = 1/a1 and ρσ1/σ2 = −a2/a1 and hence the distribution of X1 given
X2 = x2 is N( µ1 − (a2/a1)(x2 − µ2), 1/a1 ).
Example(3.5a). Suppose the 2-dimensional random vector X has the bivariate normal N(µX, ΣX) distribution where
        µX = [ 2; 1 ]        and        ΣX = [ 4, 2;  2, 3 ]
Find the distribution of X1 + X2 given X1 = X2.
Solution. Let Y1 = X1 + X2 and Y2 = X1 − X2. Then
        Y = [ Y1; Y2 ] = BX        where B = [ 1, 1;  1, −1 ]
Hence Y ∼ N(µY, ΣY) where
        µY = [ 3; 1 ]        and        ΣY = B ΣX B^T = [ 11, 1;  1, 3 ]
Note that for the random vector Y we have σ1^2 = 11, σ2^2 = 3 and ρ = 1/√33. We now want the distribution of Y1 given Y2 = 0.
This is N( µ1 + (ρσ1/σ2)(y2 − µ2), σ1^2(1 − ρ^2) ) = N( 3 − 1/3, 11 × 32/33 ) = N( 8/3, 32/3 ).
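A simulation sketch of example (3.5a): sample X, keep the draws with X1 ≈ X2, and compare the conditional mean and
variance of X1 + X2 with 8/3 and 32/3. The tolerance 0.01 and the sample size are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(1)
    mu = np.array([2.0, 1.0])
    Sigma = np.array([[4.0, 2.0],
                      [2.0, 3.0]])

    X = rng.multivariate_normal(mu, Sigma, size=2_000_000)
    Y1, Y2 = X[:, 0] + X[:, 1], X[:, 0] - X[:, 1]

    keep = np.abs(Y2) < 0.01             # condition on X1 - X2 ≈ 0
    print(Y1[keep].mean(), 8 / 3)        # conditional mean ≈ 2.667
    print(Y1[keep].var(), 32 / 3)        # conditional variance ≈ 10.667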

We have also shown that if the random vector (X1 , X2 ) is bivariate normal, then E[X1 |X2 ] is a linear function
of X2 and hence the best predictor and best linear predictor are the same—see exercises 12 and 15 on page 8.
3.6 Independence of X1 and X2 .
Proposition(3.6a). Suppose (X1 , X2 ) ∼ N (µ, Σ). Then X1 and X2 are independent iff ρ = 0.
Proof. If ρ = 0 then fX1 X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ). Conversely, if X1 and X2 are independent then cov[X1 , X2 ] = 0
and hence ρ = 0.
In terms of entries in the precision matrix: X1 and X2 are independent iff a2 = 0.
3.7 Linear transformation of a bivariate normal.
Proposition(3.7a). Suppose X has the bivariate normal distribution N (µ, Σ) and C is a 2 × 2 non-singular
matrix. Then the random vector Y = a + CX has the bivariate normal distribution N (a + Cµ, CΣCT ).
Proof. The easiest way is to find the characteristic function of Y. For t ∈ R^2 we have
        φY(t) = E[ e^{it^T Y} ] = e^{it^T a} E[ e^{it^T CX} ] = e^{it^T a} e^{it^T Cµ − t^T CΣC^T t/2}
which is the characteristic function of the bivariate normal N (a + Cµ, CΣCT ).
We need C to be non-singular in order to ensure the variance matrix of the result is non-singular.

We can transform a bivariate normal to independent normals as follows.


Proposition(3.7b). Suppose X has the bivariate normal distribution N (µ, Σ) where
   
        µ = [ µ1; µ2 ]        and        Σ = [ σ1^2, ρσ1σ2;  ρσ1σ2, σ2^2 ]
Define Y to be the random vector with components Y1 and Y2 where:
        X1 = µ1 + σ1 Y1
        X2 = µ2 + ρσ2 Y1 + σ2 √(1 − ρ^2) Y2
Then Y ∼ N (0, I).
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §4 Page 69

Proof. Note that X = µ + BY and Y = B−1 (X − µ) where


   
        B = [ σ1, 0;  ρσ2, σ2√(1 − ρ^2) ]        and        B^{−1} = [ 1/σ1, 0;  −ρ/(σ1√(1 − ρ^2)), 1/(σ2√(1 − ρ^2)) ]
It is straightforward to check that B−1 Σ(B−1 )T = I. Hence Y ∼ N (0, I).
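Used in the generating direction, proposition (3.7b) gives a simple way of simulating a bivariate normal from
independent N(0, 1) variables. A minimal Python sketch follows; the parameter values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 3.0, 0.6

    Y = rng.standard_normal(size=(500_000, 2))          # independent N(0,1) pairs (Y1, Y2)
    X1 = mu1 + s1 * Y[:, 0]
    X2 = mu2 + rho * s2 * Y[:, 0] + s2 * np.sqrt(1 - rho**2) * Y[:, 1]

    print(X1.mean(), X2.mean())      # ≈ 1 and -2
    print(np.cov(X1, X2))            # ≈ [[s1^2, rho s1 s2], [rho s1 s2, s2^2]] = [[4, 3.6], [3.6, 9]]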
3.8
Summary. The bivariate normal distribution.
Suppose X = (X1 , X2 ) ∼ N (µ, Σ) where µ = E[X] and Σ = var[X].
• Density.
        fX(x) = [ |P|^{1/2}/(2π) ] exp[ −(x − µ)^T P(x − µ)/2 ]        where P = Σ^{−1} is the precision matrix.
              = [ 1/(2πσ1σ2√(1 − ρ^2)) ] exp{ −[ (x1 − µ1)^2/σ1^2 − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)^2/σ2^2 ]/[ 2(1 − ρ^2) ] }
• If P = [ a1, a2;  a2, a3 ] then Σ = P^{−1} = (1/(a1 a3 − a2^2)) [ a3, −a2;  −a2, a1 ].
• P = Σ^{−1} = (1/(σ1^2σ2^2(1 − ρ^2))) [ σ2^2, −ρσ1σ2;  −ρσ1σ2, σ1^2 ] and |Σ| = (1 − ρ^2)σ1^2σ2^2.
• The marginal distributions. X1 ∼ N(µ1, σ1^2) and X2 ∼ N(µ2, σ2^2).
• The characteristic function: φ(t) = e^{it^T µ − t^T Σt/2}; the m.g.f. is E[e^{t^T X}] = e^{t^T µ + t^T Σt/2}.
• The conditional distributions.
        The distribution of X1 given X2 = x2 is N( µ1 + (ρσ1/σ2)(x2 − µ2), σ1^2(1 − ρ^2) ).
        The distribution of X2 given X1 = x1 is N( µ2 + (ρσ2/σ1)(x1 − µ1), σ2^2(1 − ρ^2) ).
• X1 and X2 are independent iff ρ = 0.
• Linear transformation of a bivariate normal. If C is non-singular, then Y = a + CX has a bivariate
normal distribution with mean a + Cµ and variance matrix CΣCT .

4 Exercises (exs-bivnormal.tex)

1. Suppose (X, Y ) has the density


        fXY(x, y) = c e^{−(x^2 − xy + y^2)/3}

(a) Find c. (b) Are X and Y independent?


2. Suppose (X, Y ) has a bivariate normal distribution with density
f (x, y) = k exp −(x2 + 2xy + 4y 2 )
 

Find the mean vector and the variance matrix of (X, Y ). What is the value of k?
3. Suppose (X, Y ) has a bivariate normal distribution with density
 
1 1 2 2
f (x, y) = exp − (2x + y + 2xy − 22x − 14y + 65)
k 2
Find the mean vector and the variance matrix of (X, Y ). What is the value of k?
4. Suppose the random vector Y = (Y1 , Y2 ) has the density
 
1 1 2 2
f (y1 , y2 ) = exp − (y1 + 2y2 − y1 y2 − 3y1 − 2y2 + 4) for y = (y1 , y2 ) ∈ R2 .
k 2
Find E[Y] and var[Y].
5. Evaluate the integral
        ∫_{−∞}^{∞} ∫_{−∞}^{∞} exp[ −(y1^2 + 2y1y2 + 4y2^2) ] dy1 dy2

6. Suppose the random vector Y = (Y1 , Y2 ) has the density


 
        f(y1, y2) = k exp[ −( y1^2 + 2y1(y2 − 1) + 4(y2 − 1)^2 )/12 ]        for y = (y1, y2) ∈ R^2.
Show that Y ∼ N (µ, Σ) and find the values of µ and Σ.
Page 70 §4 Jan 8, 2019(21:02) Bayesian Time Series Analysis

2
7. Suppose (X, Y ) has the bivariate normal distribution N (µ, Σ). Let σX = var[X], σY2 = var[Y ] and ρ = corr(X, Y ).
(a) Show that X and Y − ρσY X/σX are independent.
(b) Suppose θ satisfies tan(θ) = σX/σY , show that X cos θ + Y sin θ and X cos θ − Y sin θ are independent.
8. Suppose X = (X1 , X2 ) has a bivariate normal distribution with E[X1 ] = E[X2 ] = 0 and variance matrix Σ. Prove that
        X^T PX − X1^2/σ1^2 ∼ χ^2_1
where P is the precision matrix of X.
9. (a) Suppose E[X1 ] = µ1 , E[X2 ] = µ2 and there exists α such that Y = X1 + αX2 is independent of X2 . Prove that
E[X1 |X2 ] = µ1 + αµ2 − αX2 .
(b) Use part (a) to derive E[X1 |X2 ] for the bivariate normal.
10. An alternative method
p for constructing the bivariate normal. Suppose X and Y are i.i.d. N (0, 1). Suppose ρ ∈ (−1, 1)
and Z = ρX + 1 − ρ2 Y .
(a) Find the density of Z.
(b) Find the density of (X, Z).
(c) Suppose µ1 ∈ R, µ2 ∈ R, σ1 > 0 and σ2 > 0. Find the density of (U, V ) where U = µ1 + σ1 X and V = µ2 + σ2 Z.
11. Suppose (X1 , X2 ) has the bivariate normal distribution with density given by equation(3.3a). Define Q by:
        f_{X1X2}(x1, x2) = e^{−Q(x1,x2)} / [ 2πσ1σ2√(1 − ρ^2) ]
Hence
        Q(x1, x2) = [ (x1 − µ1)^2/σ1^2 − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)^2/σ2^2 ] / [ 2(1 − ρ^2) ]
Define the random variable Y by Y = Q(X1 , X2 ). Show that Y has the exponential density.
12. (a) Suppose (X1 , X2 ) has a bivariate normal
 distribution
  E[X1 ] = E[X2 ] = 0. Hence it has characteristic function
with
1 T 1
φX (t) = exp − t Σt = exp − σ12 t21 + 2σ12 t1 t2 + σ22 t22

2 2
Explore the situations when Σ is singular.
(b) Now suppose (X1 , X2 ) has a bivariate normal distribution without the restriction of zero means. Explore the
situations when the variance matrix Σ is singular.
13. Suppose T1 and T2 are i.i.d. N (0, 1). Set X = a1 T1 + a2 T2 and Y = b1 T1 + b2 T2 where a21 + a22 > 0 and a1 b2 6= a2 b1 .
(a) Show that E[Y |X] = X(a1 b1 + a2 b2 )/(a21 + a22 ).
 2
(b) Show that E Y − E(Y |X) = (a1 b2 − a2 b1 )2 /(a21 + a22 ).
14. a) Suppose (X, Y ) has a bivariate normal distribution with var[X] = var[Y ]. Show that X + Y and X − Y are
independent random variables.
(b) Suppose (X, Y ) has a bivariate normal distribution with E[X] = E[Y ] = 0, var[X] = var[Y ] = 1 and cov[X, Y ] =
ρ. Show that X 2 and Y 2 are independent iff ρ = 0.
2
(Note. If var[X] = σX and var[Y ] = σY2 then just set X1 = X/σX and Y1 = Y /σY .)
15. (a) Suppose (X, Y ) has a bivariate normaldistribution
 N (µ, Σ) with
 2 
0 σ ρσ 2
µ= and Σ=
0 ρσ 2 σ 2
Let (R, Θ) denote the polar coordinates of (X, Y ). Find the distribution of (R, Θ) and the marginal distribution of
Θ.
If ρ = 0, equivalently if X and Y are independent, show that R and Θ are independent.
(b) Suppose X and Y are i.i.d. random variables with the N (0, σ 2 ) distribution. Let
X2 − Y 2 2XY
T1 = √ and T2 = √
2
X +Y 2 X2 + Y 2
Show that T1 and T2 are i.i.d. random variables with the N (0, 1) distribution.
16. Suppose (X1 , X2 ) has a bivariate normal distribution with E[X1 ] = E[X2 ] = 0. Let Z = X1 /X2 .
(a) Show that p
σ1 σ2 1 − ρ2
fZ (z) =
π(σ22 z 2 − 2ρσ1 σ2 z + σ12 )
(b) Suppose X1 and X2 are i.i.d. random variables with the N (0, σ 2 ) distribution.
(i) What is the distribution of Z?
(ii) What is the distribution of W = (X − Y )/(X + Y )?
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §5 Page 71

17. Suppose (X1 , X2 ) has a bivariate normal distribution with E[X1 ] = E[X2 ] = 0 and var[X1 ] = var[X2 ] = 1. Let
ρ = corr[X1 , X2 ] = cov[X1 , X2 ] = E[X1 X2 ]. Show that
        ( X1^2 − 2ρX1X2 + X2^2 ) / ( 1 − ρ^2 ) ∼ χ^2_2

18. Suppose (X, Y ) has a bivariate normal distribution with E[X] = E[Y ] = 0. Show that
        P[X ≥ 0, Y ≥ 0] = P[X ≤ 0, Y ≤ 0] = 1/4 + (1/2π) sin^{−1} ρ
        P[X ≤ 0, Y ≥ 0] = P[X ≥ 0, Y ≤ 0] = 1/4 − (1/2π) sin^{−1} ρ
19. Normality of conditional distributions does not imply normality. Suppose the random vector (X, Y ) has the density
f(X,Y ) (x, y) = C exp −(1 + x2 )(1 + y 2 )
 
for x ∈ R and y ∈ R.
Find the marginal distributions of X and Y and show that the conditional distributions (X|Y = y) and (Y |X = x) are
both normal.

5 The multivariate normal


5.1 The multivariate normal distribution. The n × 1 random vector X has a non-singular multivariate normal
distribution iff X has density
 
1
fX (x) = C exp − (x − µ)T P(x − µ) for x ∈ Rn (5.1a)
2
where
• C is a constant so that the density integrates to 1;
• µ is a vector in Rn ;
• P is a real symmetric positive definite n × n matrix called the precision matrix.
5.2 Integrating the density. Because P is a real symmetric positive definite matrix, there exists an orthogonal
matrix L with
P = LT DL
where L is orthogonal and D is diagonal with entries d1 > 0, . . . , dn > 0. This result is explained in §1.4 on
page 60; the values d1 , . . . , dn are the eigenvalues of P.
Consider the transformation Y = L(X − µ); this is a 1 − 1 transformation: Rn → Rn which has a Jacobian with
absolute value:
        | ∂(y1, . . . , yn)/∂(x1, . . . , xn) | = |det(L)| = 1
Note that X − µ = L^T Y. The density of Y is
        fY(y) = C exp( −y^T LPL^T y/2 ) = C exp( −y^T Dy/2 ) = C Π_{j=1}^n exp( −dj yj^2/2 )
It follows that Y1, . . . , Yn are independent with distributions N(0, 1/d1), . . . , N(0, 1/dn) respectively, and
        C = √(d1 · · · dn)/(2π)^{n/2} = √(det(D))/(2π)^{n/2} = √(det(P))/(2π)^{n/2}
So equation(5.1a) becomes
        fX(x) = [ √(det(P))/(2π)^{n/2} ] exp[ −(x − µ)^T P(x − µ)/2 ]        for x ∈ R^n                      (5.2a)
Note that the random vector Y satisfies E[Y] = 0 and var[Y] = D^{−1}. Using X = µ + L^T Y gives
        E[X] = µ
        var[X] = var[L^T Y] = L^T var[Y] L = L^T D^{−1} L = P^{−1}
and hence P is the precision matrix—the inverse of the variance matrix. So equation(5.1a) can be written as
        fX(x) = [ 1/( (2π)^{n/2} √(det(Σ)) ) ] exp[ −(x − µ)^T Σ^{−1}(x − µ)/2 ]        for x ∈ R^n            (5.2b)
where µ = E[X] and Σ = P^{−1} = var[X]. This is defined to be the density of the N(µ, Σ) distribution.
Page 72 §5 Jan 8, 2019(21:02) Bayesian Time Series Analysis

Notes.
• A real matrix is the variance matrix of a non-singular normal distribution iff it is symmetric and positive
definite.
• The random vector X is said to have a spherical normal distribution iff X ∼ N (µ, σ 2 I). Hence X1 , . . . , Xn are
independent and have the same variance.
5.3 The characteristic function. Suppose the n-dimensional random vector X has the N (µ, Σ) distribution.
We know that using the transformation Y = L(X − µ) leads to Y ∼ N (0, D−1 ). Because L is orthogonal we have
X = µ + LT Y. Hence the characteristic function of X is:
        φX(t) = E[ e^{it^T X} ] = E[ e^{it^T µ + it^T L^T Y} ] = e^{it^T µ} E[ e^{it^T L^T Y} ]        for all n × 1 vectors t ∈ R^n.
But Y1, . . . , Yn are independent with distributions N(0, 1/d1), . . . , N(0, 1/dn), respectively. Hence
        E[ e^{it^T Y} ] = E[ e^{i(t1 Y1 + ··· + tn Yn)} ] = e^{−t1^2/2d1} · · · e^{−tn^2/2dn} = e^{−t^T D^{−1} t/2}        for all n × 1 vectors t ∈ R^n.
Applying this result to the n × 1 vector Lt gives
        E[ e^{it^T L^T Y} ] = e^{−t^T L^T D^{−1} Lt/2} = e^{−t^T Σt/2}
We have shown that if X ∼ N(µ, Σ) then
        φX(t) = E[ e^{it^T X} ] = e^{it^T µ − t^T Σt/2}        for all t ∈ R^n.
The moment generating function of X is E[e^{t^T X}] = e^{t^T µ + t^T Σt/2}.
5.4 The singular multivariate normal distribution. In the last section we saw that if µ ∈ Rn and Σ is an
n × n real symmetric positive definite matrix, then the function φ : R^n → C with
        φ(t) = e^{it^T µ − t^T Σt/2}        for t ∈ R^n
is a characteristic function. The condition on Σ can be relaxed to non-negative definite as follows:
Proposition(5.4a). Suppose µ ∈ Rn and V is an n × n real symmetric non-negative definite matrix. Then the
function φ : R^n → C with
        φ(t) = e^{it^T µ − t^T Vt/2}        for t ∈ R^n                                                       (5.4a)
is a characteristic function.
Proof. For n = 1, 2, . . . , set Vn = V + n1 I where I is the n × n identity matrix. Then Vn is symmetric and positive definite
and so
T 1 T
φn (t) = eit µ− 2 t Vn t
is a characteristic function.
Also φn (t) → φ(t) as n → ∞ for all t ∈ Rn . Finally, φ is continuous at t = 0. It follows that φ is a characteristic function
by the multidimensional form of Lévy’s convergence theorem3 .

If V is symmetric and positive definite, then we know that φ in equation(5.4a) is the characteristic function of the
N (µ, V) distribution.
If V is only symmetric and non-negative definite and not positive definite, then by §1.3 on page 60, we know that
some linear combination of the components is zero and the density does not exist. In this case, we say that the
distribution with characteristic function φ is a singular multivariate normal distribution.
5.5 Linear combinations of the components of a multivariate normal.
Proposition(5.5a). Suppose the n-dimensional random vector X has the possibly singular N (µ, Σ) distribu-
tion. Then for any n × 1 vector ` ∈ Rn the random variable Z = `T X has a normal distribution.
Proof. Use characteristic functions. For t ∈ R we have
        φZ(t) = E[e^{itZ}] = E[e^{it ℓ^T X}] = φX(tℓ) = e^{it ℓ^T µ − t^2 ℓ^T Σℓ/2}
and hence Z ∼ N( ℓ^T µ, ℓ^T Σℓ ).
Conversely:
Proposition(5.5b). Suppose X is an n-dimensional random vector such that for every n × 1 vector ` ∈ Rn the
random variable `T X is univariate normal. Then X has the multivariate normal distribution.
3
Also called the “Continuity Theorem.” See, for example, page 361 in [F RISTEDT & G RAY(1997)].
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §5 Page 73
T
Proof. The characteristic function of X is φX (t) = E[eit X ]. Now Z = tT X is univariate normal. Also E[tT X] = tT µ and
var[tT X] = tT Σt where µ = E[X] and Σ = var[X]. Hence Z ∼ N (tT µ, tT Σt). Hence the characteristic function of Z is,
for all u ∈ R:
T 1 2 T
φZ (u) = eiut µ− 2 u t Σt
Take u = 1; hence
T 1 T
φZ (1) = eit µ− 2 t Σt
T
But φZ (1) = E[eiZ ] = E[eit X ]. So we have shown that
T T
µ− 21 tT Σt
E[eit X
] = eit
and so X ∼ N (µ, Σ).
Combining these two previous propositions gives a characterization of the multivariate normal distribution:
the n-dimensional random vector X has a multivariate normal distribution iff every linear combination of
the components of X has a univariate normal distribution.
5.6 Linear transformation of a multivariate normal.
Proposition(5.6a). Suppose the n-dimensional random vector X has the non-singular N (µ, Σ) distribution.
Suppose further that B is an m × n matrix with m ≤ n and rank(B) = m; hence B has full rank.
Let Z = BX. Then Z has the non-singular N (Bµ, BΣBT ) distribution.
Proof. We first establish that BΣBT is positive definite. Suppose x ∈ Rm with xT BΣBT x = 0. Then yT Σy = 0 where
y = BT x. Because Σ is positive definite, we must have y = 0. Hence BT x = 0; hence x1 αT1 + · · · + xm αTm = 0 where
α1 , . . . , αm are the m rows of B. But rank(B) = m; hence x = 0 and hence BΣBT is positive definite.
The characteristic function of Z is, for all t ∈ Rm :
        φZ(t) = E[ e^{it^T Z} ] = E[ e^{it^T BX} ] = φX(B^T t) = e^{it^T Bµ − t^T BΣB^T t/2}                   (5.6a)
Hence Z ∼ N (Bµ, BΣBT ).
What if B is not full rank? Suppose now that X has the possibly singular N (µ, Σ) distribution and B is any m × n
matrix where now m > n is allowed. Equation(5.6a) for φZ (t) still holds. Also if v is any vector in Rm , then
vT BΣBT v = zT Σz where z is the n × 1 vector BT v. Because Σ is non-negative definite, it follows that BΣBT is
non-negative definite. Hence Y = BX has the possibly singular N (Bµ, BΣBT ) distribution. We have shown the
following result:
Corollary(5.6b). Suppose that X has the possibly singular N (µ, Σ) distribution and B is any m × n matrix
where m > n is allowed. Then Y = BX has the possibly singular N (Bµ, BΣBT ) distribution.
Here is an example where AX and BX have the same distribution but A 6= B.

Example(5.6c). Suppose the 2-dimensional random vector X ∼ N(0, I). Let Y1 = X1 + X2, Y2 = 2X1 + X2, Z1 = X1√2
and Z2 = (3X1 + X2)/√2. Then
        Y = AX where A = [ 1, 1;  2, 1 ]        and        Z = BX where B = [ √2, 0;  3/√2, 1/√2 ]
Let Σ = AA^T = BB^T. Then Y ∼ N(0, Σ) and Z ∼ N(0, Σ).

5.7 Transforming a multivariate normal into independent normals. The following proposition shows that
we can always transform the components of a non-singular multivariate normal into i.i.d. random variables with
the N (0, 1) distribution; see also §1.4 on page 60. We shall show below in §5.14 on page 79 how to convert a
singular multivariate normal into a non-singular multivariate normal.
Proposition(5.7a). Suppose the random vector X has the non-singular N(µ, Σ) distribution.
Then there exists a non-singular matrix Q such that Q^T Q = P, the precision matrix, and
        Q(X − µ) ∼ N(0, I)
Proof. From §5.2 on page 71 we know that the precision matrix P = Σ^{−1} = L^T DL where L is orthogonal and D =
diag[d1, . . . , dn] with d1 > 0, . . . , dn > 0. Hence P = L^T D^{1/2} D^{1/2} L where D^{1/2} = diag[√d1, . . . , √dn]. Hence
P = QT Q where Q = D1/2 L. Because L is non-singular, it follows that Q is also non-singular.
If Z = Q(X − µ), then E[Z] = 0 and var[Z] = QΣQT = QP−1 QT = I. Hence Z ∼ N (0, I) and Z1 , . . . , Zn are
i.i.d. random variables with the N (0, 1) distribution.

It also follows that if Y = LX where L is the orthogonal matrix which satisfies LT DL = Σ−1 , then Y ∼
N (Lµ, D−1 ). Hence Y1 , Y2 , . . . , Yn are independent with var[Yk ] = 1/dk where 1/d1 , 1/d2 , . . . , 1/dn are the
eigenvalues of Σ.
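The construction in proposition (5.7a) is easy to carry out numerically; the sketch below (with an arbitrary
positive definite Σ) builds Q = D^{1/2} L from the eigendecomposition of the precision matrix and checks that
Q(X − µ) has (approximately) identity variance.

    import numpy as np

    rng = np.random.default_rng(3)
    mu = np.array([1.0, 0.0, -1.0])
    Sigma = np.array([[4.0, 1.0, 0.5],
                      [1.0, 3.0, 1.0],
                      [0.5, 1.0, 2.0]])

    P = np.linalg.inv(Sigma)                  # precision matrix
    d, V = np.linalg.eigh(P)                  # P = V diag(d) V^T, columns of V orthonormal
    L = V.T                                   # so P = L^T diag(d) L with L orthogonal
    Q = np.diag(np.sqrt(d)) @ L               # Q^T Q = P

    print(np.allclose(Q.T @ Q, P))            # True
    X = rng.multivariate_normal(mu, Sigma, size=300_000)
    Z = (X - mu) @ Q.T                        # each row is Q(x - mu)
    print(np.cov(Z, rowvar=False).round(2))   # ≈ identity matrix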
Page 74 §5 Jan 8, 2019(21:02) Bayesian Time Series Analysis

We now have another characterization of this result: the random vector X has a non-singular normal distri-
bution iff there exists an orthogonal transformation L such that the random vector LX has independent
normal components.
An orthogonal transformation of a spherical normal is a spherical normal: if the random vector has the spherical
normal distribution N (µ, σ 2 I) and L is orthogonal, then Y ∼ N (Lµ, σ 2 I) and hence Y also has a spherical normal
distribution.

5.8 The marginal distributions. Suppose the n-dimensional random vector X has the N (µ, Σ) distribution.
Then the characteristic function of X is
        φX(t) = E[ e^{i(t1 X1 + ··· + tn Xn)} ] = e^{it^T µ − t^T Σt/2}        for t ∈ R^n.
and hence the characteristic function of X1 is
        φX1(t1) = φX(t1, 0, . . . , 0) = e^{iµ1 t1 − t1^2 Σ11/2}
and so X1 ∼ N(µ1, Σ11). Similarly, Xj ∼ N(µj, Σjj) where Σjj is the (j, j) entry in the matrix Σ.
Similarly, the random vector (Xi, Xj) has the bivariate normal distribution with mean vector (µi, µj) and variance
matrix [ Σii, Σij;  Σij, Σjj ].
In general, we see that every marginal distribution of a multivariate normal is normal.
An alternative method for deriving marginal distributions is to use proposition (5.6a) on page 73: for example,
X1 = aX where a = (1, 0, . . . , 0).
The converse is false!!
Example(5.8a). Suppose X and Y are independent two dimensional random vectors with distributions N (µ, ΣX ) and
N (µ, ΣY ) respectively where
     
        µ = [ µ1; µ2 ]        ΣX = [ σ1^2, ρ1σ1σ2;  ρ1σ1σ2, σ2^2 ]        and        ΣY = [ σ1^2, ρ2σ1σ2;  ρ2σ1σ2, σ2^2 ]
and ρ1 ≠ ρ2. Let
        Z = { X with probability 1/2;  Y with probability 1/2 }.
Show that Z has normal marginals but is not bivariate normal. See also exercise 14 on page 81.
Solution. Let Z1 and Z2 denote the components of Z; hence Z = (Z1 , Z2 ). Then Z1 ∼ N (µ1 , σ12 ) and Z2 ∼ N (µ2 , σ22 ). Hence
every marginal distribution of Z is normal.
Now E[Z] = µ and var[Z] = E[(Z − µ)(Z − µ)^T] = (ΣX + ΣY)/2. The density of Z is fZ(z) = fX(z)/2 + fY(z)/2 and this is
not the density of N( µ, (ΣX + ΣY)/2 )—we can see that by comparing the values of these two densities at z = µ.
A special case is when ρ1 = −ρ2 ; then cov[Z1 , Z2 ] = 0, Z1 and Z2 are normal but Z = (Z1 , Z2 ) is not normal.
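The comparison of the two densities at z = µ can be done numerically; a small sketch follows, using the arbitrary
illustrative values σ1 = σ2 = 1, ρ1 = 0.8 and ρ2 = −0.8.

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([0.0, 0.0])
    SX = np.array([[1.0, 0.8], [0.8, 1.0]])
    SY = np.array([[1.0, -0.8], [-0.8, 1.0]])

    f_mix = 0.5 * multivariate_normal(mu, SX).pdf(mu) + 0.5 * multivariate_normal(mu, SY).pdf(mu)
    f_norm = multivariate_normal(mu, 0.5 * (SX + SY)).pdf(mu)
    print(f_mix, f_norm)      # ≈ 0.265 versus ≈ 0.159, so Z is not N(mu, (SX + SY)/2)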
Now for the general case of a subvector of X.
Proposition(5.8b). Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distri-
bution, and X is partitioned into two sub-vectors:
        X = [ X1; X2 ]        where X1 is k × 1, X2 is ℓ × 1 and n = k + ℓ.
Now partition µ and Σ conformably as follows:
        µ = [ µ1; µ2 ]        and        Σ = [ Σ11, Σ12;  Σ21, Σ22 ]                                          (5.8a)
where µ1 is k × 1, µ2 is ℓ × 1, Σ11 is k × k, Σ12 is k × ℓ, Σ21 is ℓ × k and Σ22 is ℓ × ℓ.
Note that Σ21 = ΣT12 . Then
(a) X1 ∼ N (µ1 , Σ11 ) and X2 ∼ N (µ2 , Σ22 );
(b) the random vectors X1 and X2 are independent iff Σ12 = 0, equivalently iff cov[X1 , X2 ] = 0.
h T i T 1 T
Proof. Now the characteristic function of X is E eit X = eit µ− 2 t Σt for all t ∈ Rn . Partitioning t conformably into
t = (t1 , t2 ) gives
h T i h T T
i T 1 T
E eit X = E eit1 X1 +it2 X2 = eit µ− 2 t Σt
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §5 Page 75

Setting t2 = 0 shows that X1 ∼ N (µ1 , Σ11 ). Hence part (a).


⇒ We are given cov[X1i , X2j ] = 0 for all i = 1, . . . , k and j = 1, . . . , `. Hence Σ12 = 0.
⇐ Because Σ12 = 0 we have
 
Σ11 0
Σ=
0 Σ22
The characteristic function of X gives
h T i T 1 T T 1 T T 1 T
E eit X = eit µ− 2 t Σt = eit1 µ1 − 2 t1 Σ11 t1 eit2 µ2 − 2 t2 Σ22 t2 for all t ∈ Rn .
h T T
i h T i h T i
and hence E eit1 X1 +it2 X2 = E eit1 X1 E eit2 X2 and hence X1 and X2 are independent.
Similarly, if X ∼ N (µ, Σ) and we partition X into 3 sub-vectors X1 , X2 and X3 , then these 3 sub-vectors are
independent iff Σ12 = 0, Σ13 = 0 and Σ23 = 0. More generally:
• Pairwise independence implies independence for sub-vectors of a multivariate normal.
• If X ∼ N (µ, Σ) then X1 , . . . , Xn are independent iff all covariances equal 0. (5.8b)

Example(5.8c). Suppose the 5-dimensional random vector X = (X1 , X2 , X3 , X4 , X5 ) has the N (µ, Σ) distribution where
2 4 0 0 0
 
4 3 0 0 0 
Σ = 0 0 1 0 0 
 
0 0 0 4 −1
 
0 0 0 −1 3
Then (X1 , X2 ), X3 , and (X4 , X5 ) are independent.

5.9 Conditional distributions. To prove the following proposition, we use the following result: suppose
W1 is a k-dimensional random vector;
W2 is an `-dimensional random vector;
W1 and W2 are independent;
h is a function : R` → Rk .
Let V = W1 + h(W2 ). Then the conditional distribution of V given W2 has density
f(V,W2 ) (v, w2 ) f(W1 ,W2 ) (v − h(w2 ), w2 )
fV|W2 (v|w2 ) = = = fW1 ( v − h(w2 ) )
fW2 (w2 ) fW2 (w2 )
In particular, if W1 ∼ N (µ1 , Σ1 ) then the conditional distribution of V = W1 + h(W2 ) given W2 = w2 has the
density of the N ( µ1 + h(w2 ), Σ1 ) distribution. We have shown the following.
Suppose W1 has the non-singular N (µ1 , Σ1 ) distribution and W2 is independent of W1 .
Then the conditional density of W1 + h(W2 ) given W2 = w2 is the density of N ( µ1 +
h(w2 ), Σ1 ). (5.9a)

Proposition(5.9a). Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribution,
and X is partitioned into two sub-vectors:
        X = [ X1; X2 ]        where X1 is k × 1, X2 is ℓ × 1 and n = k + ℓ.
Partition µ and Σ conformably as in equations(5.8a). Then the conditional distribution of X1 given X2 = x2 is
the normal distribution:
        N( µ1 + Σ12 Σ22^{−1}(x2 − µ2),  Σ11 − Σ12 Σ22^{−1} Σ21 )                                              (5.9b)
Proof. We shall give two proofs of this important result; the first proof is shorter but requires knowledge of the answer!
Proof 1. Let
I −Σ12 Σ−1
 
22
k×k k×`
B = 
n×n 0 I
`×k `×`
Note that B is invertible with inverse
I Σ12 Σ−1
 
B−1 = 22
0 I
But by proposition(5.6a), we know that BX ∼ N (Bµ, BΣBT ). where
X1 − Σ12 Σ−1 µ1 − Σ12 Σ−1 Σ11 − Σ12 Σ−1
     
BX = 22 X2 Bµ = 22 µ2 and BΣBT = 22 Σ21 0
X2 µ2 0 Σ22
Page 76 §5 Jan 8, 2019(21:02) Bayesian Time Series Analysis

It follows that X1 − Σ12 Σ−122 X2 is independent of X2 . Also


X1 − Σ12 Σ−1 −1 −1

22 X2 ∼ N µ1 − Σ12 Σ22 µ2 , Σ11 − Σ12 Σ22 Σ21
It follows by the boxed result(5.9a) that the conditional distribution of X1 given X2 = x2 has the density of
N µ1 + Σ12 Σ−1 −1

22 (x2 − µ2 ), Σ11 − Σ12 Σ22 Σ21
Proof 2. We want to construct a k × 1 random vector W1 with
W1 = C1 X1 + C2 X2
where C1 is k × k and C2 is k × ` and such that W1 is independent of X2 .
Now if W1 is independent of X2 , then C0 W1 is also independent of X2 for any C0 ; hence the answer is arbitrary up to
multiplicative C0 . So take C1 = I. This implies we are now trying to find a k × ` matrix C2 such that W1 = X1 + C2 X2 is
independent of X2 .
Now cov[W1 , X2 ] = 0; hence cov(X1 , X2 ) + cov(C2 X2 , X2 ) = 0; hence Σ12 + C2 Σ22 = 0 and hence C2 = −Σ12 Σ−122 .
So W1 = X1 − Σ12 Σ−1 22 X2 is independent of X2 . Also X1 = W1 − C2 X2 ; hence var[X1 ] = var[W1 ] + C2 var[X2 ]C2 ;
T
−1
hence var[W1 ] = Σ11 − Σ12 Σ22 Σ21 . The rest of the proof is as in the first proof.
The proof shows that the unique linear function of X2 which makes X1 − C2 X2 independent of X2 is Σ12 Σ−1
22 .
Corollary(5.9b). The distribution in (5.9b) is a non-singular multivariate normal. Hence the matrix Σ11 −
Σ12 Σ−1
22 Σ21 is positive definite and non-singular.
Proof. Note that the inverse of a positive definite matrix is positive definite and any principal sub-matrix of a positive
definite matrix is positive definite. Both these results can be found on page 214 of [H ARVILLE(1997)].
Now Σ is positive definite; hence the inverse P = Σ−1 is also positive definite. Partition P conformably as
 
P11 P12
P=
P21 P22
Because PΣ = I we have
P11 Σ11 + P12 Σ21 = I P21 Σ11 + P22 Σ21 = 0
P11 Σ12 + P12 Σ22 = 0 P21 Σ12 + P22 Σ22 = I
Hence
P12 = −P11 Σ12 Σ−122 and P21 = −P22 Σ21 Σ11
−1

Hence
P11 Σ11 − Σ12 Σ−1
 
22 Σ21 = I
Hence the matrix Σ11 − Σ12 Σ−1 22 Σ21 is non-singular with inverse P11 . Because P is positive definite, P11 is also positive
definite and hence its inverse is positive definite.
5.10 The matrix of regression coefficients, partial covariance and partial correlation coefficients. The
k × ` matrix Σ12 Σ−1 22 is called the matrix of regression coefficients of the k-dimensional vector X1 on the `-
dimensional vector X2 ; it is obtained by multiplying the k × ` matrix cov[X1 , X2 ] = Σ12 by the ` × ` precision
matrix of X2 .
Similarly, the matrix of regression coefficients of X2 on X1 is the ` × k matrix Σ21 Σ−1
11 .
Let D1 denote the variance matrix of the conditional distribution of X1 given X2 . Hence D1 is the k × k invertible
matrix Σ11 − Σ12 Σ−1 −1
22 Σ21 . Similarly, let D2 denote the ` × ` invertible matrix Σ22 − Σ21 Σ11 Σ12 .
By postmultiplying the following partitioned matrix by the partitioned form of Σ, it is easy to check that
D−1 −D−1 Σ12 Σ−1
 
−1 1 1 22
P=Σ = (5.10a)
−D−1
2 Σ21 Σ11
−1
D−1
2
The matrix D1 , which is the variance of the conditional distribution of X1 given X2 , is also called the partial
covariance of X1 given X2 . Thus the partial covariance between X1j and X1k given X2 is [D1 ]jk and the partial
correlation between X1j and X1k given X2 is defined to be
[D1 ]jk
p √
[D1 ]jj [D1 ]kk
This is sometimes denoted ρjk·r1 r2 ...r` . For example, ρ13·567 is the partial correlation between X1 and X3 in the
conditional distribution of X1 , X2 , X3 , X4 given X5 , X6 and X7 .
If X1 and X2 are independent then Σ12 = 0 and hence D1 = Σ11 ; this means that the partial covariance between
X1j and X1k given X2 equals the ordinary covariance cov[X1j , X1k ]; similarly the partial correlation equals the
ordinary correlation. In general these quantities are different and indeed may have different signs.
Example(5.10a). Suppose the 4-dimensional random vector X has the N (µ, Σ) distribution where
        µ = [ 2; 3; 1; 4 ]        and        Σ = [ 4, 2, −1, 2;  2, 8, 3, −2;  −1, 3, 5, −4;  2, −2, −4, 4 ]
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §5 Page 77

Find ρ13 and ρ13·24 and show that they have opposite signs.
Solution. Now ρ13 = σ13/√(σ11 σ33) = −1/√(4 × 5) = −1/√20.
We have X1 = (X1, X3)^T and X2 = (X2, X4)^T; hence
        D1 = Σ11 − Σ12 Σ22^{−1} Σ21 = [ 4, −1;  −1, 5 ] − [ 2, 2;  3, −4 ] [ 8, −2;  −2, 4 ]^{−1} [ 2, 3;  2, −4 ] = (1/7) [ 12, 4;  4, 6 ]
Hence ρ13·24 = 4/√(12 × 6) = 2/√18 = √2/3
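The same computation in Python, using the matrix Σ of example (5.10a):

    import numpy as np

    Sigma = np.array([[ 4,  2, -1,  2],
                      [ 2,  8,  3, -2],
                      [-1,  3,  5, -4],
                      [ 2, -2, -4,  4]], dtype=float)

    i1, i2 = [0, 2], [1, 3]                        # (X1, X3) and (X2, X4), zero-based indices
    S11 = Sigma[np.ix_(i1, i1)]
    S12 = Sigma[np.ix_(i1, i2)]
    S22 = Sigma[np.ix_(i2, i2)]

    rho13 = Sigma[0, 2] / np.sqrt(Sigma[0, 0] * Sigma[2, 2])
    D1 = S11 - S12 @ np.linalg.inv(S22) @ S12.T    # partial covariance of (X1, X3) given (X2, X4)
    rho13_24 = D1[0, 1] / np.sqrt(D1[0, 0] * D1[1, 1])
    print(rho13, rho13_24)                         # ≈ -0.224 and ≈ +0.471: opposite signs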

5.11 The special case of the conditional distribution of X1 given (X2 , . . . , Xn ). For this case, k = 1 and
` = n − 1. Hence D1 is 1 × 1. Denote the first row of the precision matrix P by [q11 , q12 , . . . , q1n ]. Then by
equation(5.10a)
[q11 , q12 , . . . , q1n ] = D−1 −1 −1
 
1 , −D1 Σ12 Σ22
Hence
1 1
D1 = and Σ12 Σ−1 22 = − [q12 , . . . , q1n ]
q11 q11
By equation(5.9b) on page 75, the conditional distribution of X1 given (X2 , . . . , Xn ) = (x2 , . . . , xn ) is
q12 (x2 − µ2 ) + · · · + q1n (xn − µn ) 1
 
−1

N µ1 + Σ12 Σ22 (x2 − µ2 ), D1 = N µ1 − ,
q11 q11
The proof of proposition(5.9a) on page 75 also shows that
the unique linear function a2 X2 + · · · + an Xn which makes X1 − (a2 X2 + · · · + an Xn ) independent of
(X2 , . . . , Xn ) is given by a2 = − qq12
11
, . . . , an = − qq1n
11
.

5.12 The joint distribution of X and S 2 . This is a very important result in statistical inference!!
Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution. To prove independence of X and
S 2 we can proceed as follows. Let Y = BX where B is the n × n matrix
/n 1/n ··· 1/n 
 1
 − 1/n 1 − 1/n · · · − 1/n 
− /n − 1/n · · · − 1/n 
 1 
B=  .
 .. .. .. .. 
. . . 
− 1/n − 1/n · · · 1 − 1/n
Hence (Y1 , Y2 , . . . , Yn ) = (X, X2 − X, . . . , Xn − X). Then
 
1/n 01×(n−1)
T
BB = where A is an (n − 1) × (n − 1) matrix.
0(n−1)×1 A
Hence X is independent of (Y2 , . . . , Yn ) = (X2 − X, . . . , Xn − X). Now (X1 − X) + nk=2 Yk = nk=1 (Xk − X) = 0
P P
Pn 2 2
Yk . Finally (n−1)S 2 = nk=1 (Xk −X)2 = (X1 −X)2 + nk=2 Yk2 =
Pn
and hence (X1 −X)2 =
P P
k=2 k=2 Yk +
Pn 2 2
k=2 Yk . Hence X is independent of S .
The following proposition also derives the distribution of S 2 .
Proposition(5.12a). Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution. Let
        X̄ = Σ_{k=1}^n Xk / n        and        S^2 = Σ_{k=1}^n (Xk − X̄)^2 / (n − 1)
Then X̄ and S^2 are independent; also
        X̄ ∼ N( µ, σ^2/n )        and        (n − 1)S^2/σ^2 ∼ χ^2_{n−1}                                         (5.12a)
Proof. We shall give two proofs. Pn
Method 1. Let Yk = (Xk − µ)/σ for k = 1, . . . , n. Then Y = k=1 Yk /n = (X − µ)/σ and
Pn 2 n
(n − 1)S 2 k=1 (Xk − X)
X
= = (Yk − Y)2
σ2 σ2
k=1
Because Y1 , . . . , Yn are i.i.d. N (0, 1), we have Y = (Y1 , . . . , Yn ) ∼ N (0, I).
Consider the transformation from Y = (Y1 , . . . , Yn ) to Z = (Z1 , . . . , Zn ) with Z = AY where A is defined as follows.
Y1 + · · · + Yn √
Z1 = √ = nY
n
Page 78 §5 Jan 8, 2019(21:02) Bayesian Time Series Analysis
 
Hence the first row of the matrix A is √1 , . . . , √1 . Construct the other (n − 1) rows of A so that A is orthogonal. For
n n
the explicit value of A, see exercise 11 on page 81. Because A is orthogonal, we have AAT = I and hence
Xn Xn
Zk2 = ZT Z = YT AT AY = YT Y = Yk2
k=1 k=1
Pn Pn
Now Y ∼ N (0, I); hence Z is also N (0, I). Now k=1 Zk2 = k=1 Yk2 . Hence
n n n n
X X X 2 X (n − 1)S 2
Zk2 = Yk2 − Z12 = Yk2 − nY = (Yk − Y)2 =
σ2
k=2 k=1 k=1 k=1

This proves Z1 = n Y is independent of S 2 and hence X is independent of S 2 . It also shows that
n
(n − 1)S 2 X 2
= Zk ∼ χ2n−1
σ2
k=2
Hence result.
Method 2. This is based on an algebraic trick applied to moment generating functions.
For all t1 ∈ R, . . . , tn ∈ R we have
Xn Xn
tk (Xk − X) = tk Xk − nX t
k=1 k=1
and hence for all t0 ∈ R we have
n n   n
X X t0 X
t0 X + tk (Xk − X) = + tk − t Xk = ck Xk
n
k=1 k=1 k=1
Pn Pn Pn
where ck = + (tk − t). Note that k=1 ck = t0 and
t0/n + k=1 (tk − t)2 .
2 t20/n
= k=1 ck
Now let t = (t0 , t1 , . . . , tn ); then the moment generating function of the vector Z = (X, X1 − X, . . . , Xn − X) is
n n n
!
2 2 2 2 2 X
   
Y Y σ c σ t σ
E[et·Z ] = E[ec1 X1 +···cn Xn ] = E[eck Xk ] = exp µck + k
= exp µt0 + 0
exp (tk − t)2
2 2n 2
k=1 k=1 k=1
t0 X
The first factor is E[e ]. Hence X is independent of the vector (X1 − X, . . . , Xn − X) and hence X and S 2 are
independent.
Using the identity Xk − µ = (Xk − X) + (X − µ) gives
n n 2 2
X − µ) (n − 1)S 2 X − µ)
 
1 X 2 1 X 2
(Xk − µ) = (X k − X) + √ = + √ (5.12b)
σ2 σ2 σ n σ2 σ n
k=1 k=1
The left hand side has the distribution and the second term on the right hand side has the χ21 distribution. Using
χ2n
moment generating functions and the independence of the two terms on the right hand side of equation(5.12b) gives
(n − 1)S 2 /σ 2 ∼ χ2n−1 .
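A quick simulation check of proposition (5.12a) (with arbitrary values of n, µ and σ): for normal samples, X̄ and
S^2 are uncorrelated, and (n − 1)S^2/σ^2 matches the mean and variance of the χ^2_{n−1} distribution.

    import numpy as np

    rng = np.random.default_rng(4)
    n, mu, sigma = 8, 5.0, 2.0
    X = rng.normal(mu, sigma, size=(400_000, n))

    xbar = X.mean(axis=1)
    S2 = X.var(axis=1, ddof=1)

    print(np.corrcoef(xbar, S2)[0, 1])    # ≈ 0 (independence implies zero correlation)
    W = (n - 1) * S2 / sigma**2           # should behave like chi^2 with n - 1 = 7 degrees of freedom
    print(W.mean(), W.var())              # ≈ 7 and ≈ 14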

5.13 Daly’s theorem. This can be regarded as a generalization of the result that X and S 2 are independent.
Proposition(5.13a). Suppose X ∼ N (µ, σ 2 I) and the function g : Rn → R is translation invariant: this means
g(x + a1) = g(x) for all x ∈ Rn and all a ∈ R. Then X and g(X) are independent.
Proof. This proof is based on [DALY(1946)]. The density of X = (X1 , . . . , Xn ) is
1 − 12
P
(xk −µk )2
fX (x1 , . . . , xn ) = e 2σ k for (x1 , . . . , xn ) ∈ Rn .
(2π)n/2 σ n

Hence the moment generating function of X, g(X) is
Z Z t P
1 0 xk +t1 g(x1 ,...,xn ) − 1 2
P
(xk −µk )2
E[et0 X+t1 g(X) ] = n/2 n
· · · e n k e 2σ k dx1 · · · dxn
(2π) σ
The exponent is
2
σ 2 t20 σ 2 t0

t0 X 1 X 2 1 X
xk − 2 (xk − µk ) + t1 g(x1 , . . . , xn ) = + t0 µ − 2 xk − µ k − + t1 g(x1 , . . . , xn )
n 2σ 2n 2σ n
k k k
Hence
P  2
Z Z 1 σ 2 t0
1 − xk −µk − +t1 g(x1 ,...,xn )
σ 2 t20 /2n+t0 µ 2σ 2
E[et0 X+t1 g(X) ] = e k n
··· e dx1 · · · dxn
(2π)n/2 x1 ∈R xn ∈R
σ 2 t0
Using the transformation (y1 , . . . , yn ) = (x1 , . . . , xn ) − n gives
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §5 Page 79
Z Z
t0 X+t1 g(X) 1
σ 2 t20 /2n+t0 µ − 12
P t t
(yk −µk )2 +t1 g(y1 + n0 ,...,yn + n0 )
E[e ]=e n/2
· · · e 2σ k dy1 · · · dyn
(2π) y1 ∈R yn ∈R
Z Z
1 − 12
P
(yk −µk )2 +t1 g(y1 ,...,yn )
= E[et0 X ] n/2
· · · e 2σ k dy1 · · · dyn
(2π) y1 ∈R yn ∈R

= E[et0 X ] E[et1 g(X) ]


Hence X and g(X) are independent.

Daly’s theorem implies X and S 2 are independent because S 2 is translation invariant. It also implies that the range
Rn = Xn:n − X1:n and X are independent because the range is translation invariant.

5.14 Converting a singular multivariate normal into a non-singular multivariate normal distribution.
Example(5.14a). Suppose the three dimensional vector X has the N(µ, Σ) distribution where
        µ = [ 2; 1; 5 ]        and        Σ = [ 2, 1, 3;  1, 5, 6;  3, 6, 9 ]
Find B and µ so that if X = BY + µ then Y ∼ N(0, Ir).
Solution. Note that if x = [ −1, −1, 1 ]^T then x^T Σx = 0; hence X has a singular multivariate normal distribution. Also, if
        Q = (1/√13) [ 5, 1;  1, 8 ]        then        QQ^T = [ 2, 1;  1, 5 ]
Let
        µ = [ 2; 1; 5 ]        and        B = (1/√13) [ 5, 1;  1, 8;  6, 9 ]        then        BB^T = Σ
Hence if X = BY + µ then 3Y1 = (8X1 − X2 − 15)/√13 and 3Y2 = (5X2 − X1 − 3)/√13, and Y ∼ N(0, I2).

In general, we can proceed as follows:


Proposition(5.14b). Suppose X is an n-dimensional random vector with the N (µ, Σ) distribution where
rank(Σ) = r ≤ n. Then X = BY + µ where Y is an r-dimensional random vector with Y ∼ N (0, Ir )
and B is a real n × r matrix with Σ = BBT and rank(B) = r.
Proof. If r = n, then the result follows from §1.4 on page 60. So suppose r < n.
Now Σ is a real symmetric non-negative definite matrix with rank(Σ) = r. Hence there exists an orthogonal matrix Q
such that  
D 0
Σ = QT Q
0 0
where D is an r × r diagonal matrix with the non-zero eigenvalues {λ1 , . . . , λr } of Σ on the main diagonal—the values
are all strictly positive. Define the n × n matrix T by
 −1/2 
T D 0
T=Q Q
0 In−r
Then T is non-singular and
 
I 0
TΣTT = r
0 0
Let W = TX; then W ∼ N (Tµ, TΣTT ). Partition W into the r × 1 vector W1 and the (n − r) × 1 vector W2 ; partition
α = Tµ conformably into α1 and α2 . Then W1 ∼ N (α1 , Ir ) and W2 = α2 with probability 1.
Because T is non-singular, we can write X = T−1 W = BW1 + CW2 where B is n × r and C is n × (n − r) and
T−1 = [ B C ]. Hence
X = BW1 + Cα2 = B(W1 − α1 ) + Bα1 + Cα2
= B(W1 − α1 ) + T−1 α
= BY + µ where Y = W1 − α1 .
Then Y ∼ N (0, Ir ) and
    
Ir 0 I 0 BT
Σ=T −1
(T−1 )T = [ B C] r = BBT
0 0 0 0 CT
Finally r = rank(Σ) = rank(BBT ) ≤ min{rank(B), rank(BT } = rank(B) ≤ r; hence rank(B) = r.
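The construction in proposition (5.14b) can be carried out numerically via the eigendecomposition of Σ. The sketch
below applies it to the singular Σ of example (5.14a); the resulting B differs from the one given in the example
(the factorisation Σ = BB^T is not unique), but it serves the same purpose.

    import numpy as np

    Sigma = np.array([[2., 1., 3.],
                      [1., 5., 6.],
                      [3., 6., 9.]])
    mu = np.array([2., 1., 5.])

    w, V = np.linalg.eigh(Sigma)                  # Sigma = V diag(w) V^T
    keep = w > 1e-10                              # r = rank(Sigma) = 2 non-zero eigenvalues
    B = V[:, keep] * np.sqrt(w[keep])             # n x r matrix with B B^T = Sigma

    print(np.linalg.matrix_rank(Sigma))           # 2
    print(np.allclose(B @ B.T, Sigma))            # True

    rng = np.random.default_rng(5)
    Y = rng.standard_normal(size=(200_000, 2))
    X = Y @ B.T + mu                              # X = BY + mu has the singular N(mu, Sigma) distribution
    print(np.cov(X, rowvar=False).round(2))       # ≈ Sigma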
Page 80 §6 Jan 8, 2019(21:02) Bayesian Time Series Analysis

6 Exercises (exs-multivnormal.tex)

1. (a) Suppose the random vector X has the N (µ, Σ) distribution. Show that X − µ ∼ N (0, Σ).
(b) Suppose X1 , . . . , Xn are independent with distributions N (µ1 , σ 2 ), . . . , N (µn , σ 2 ) respectively. Show that the
random vector X = (X1 , . . . , Xn )T has the N (µ, σ 2 I) distribution where µ = (µ1 , . . . , µn )T .
(c) Suppose X ∼ N (µ, Σ) where X = (X1 , . . . , Xn )T . Suppose further that X1 , . . . , Xn are uncorrelated. Show that
X1 , . . . , Xn are independent.
(d) Suppose X and Y are independent n-dimensional random vectors with X ∼ N (µX , ΣX ) and Y ∼ N (µY , ΣY ).
Show that X + Y ∼ N (µX + µY , ΣX + ΣY ).

2. Suppose X ∼ N (µ, Σ) where


−3 4 0 −1
! !
µ= 1 and Σ= 0 5 0
4 −1 0 2
(a) Are (X1 , X3 ) and X2 independent?
(b) Are X1 − X3 and X1 − 3X2 + X3 independent?
(c) Are X1 + X3 and X1 − 2X2 − 3X3 independent?

3. Suppose the random vector X = (X1 , X2 , X3 ) has the multivariate normal distribution N (0, Σ) where
2 1 −1
" #
Σ= 1 3 0
−1 0 5
(a) Find the distribution of (X3 |X1 = 1).
(b) Find the distribution of (X2 |X1 + X3 = 1).

4. From linear regression. Suppose a is an n × m matrix with n ≥ m and rank(a) = m. Hence a has full rank.
(a) Show that the m × m matrix aT a is invertible.
(b) Suppose the n-dimensional random vector X has the N (µ, σ 2 I) distribution. Let
B = (aT a)−1 aT and Y = BX
m×n n×1
Show that
Y ∼ N Bµ, σ 2 (aT a)−1


5. Suppose the 5-dimensional random vector Z = (Y, X1 , X2 , X3 , X4 ) is multivariate normal with finite expectation E[Z] =
(1, 0, 0, 0, 0) and finite variance var[Z] = Σ where
1 1/2 1/2 1/2 1/2
 
 1/2 1 1/2 1/2 1/2 
Σ =  1/2 1/2 1 1/2 1/2 
 
/2 1/2 1/2 1 1/2
1 
1/2 1/2 1/2 1/2 1
1
Show that E[Y |X1 , X2 , X3 , X4 ] = 1 + 5 (X1 + X2 + X3 + X4 ).

6. Suppose X = (X1 , X2 , X3 ) has a non-singular multivariate normal distribution with E[Xj ] = µj and var[Xj ] = σj2 for
j = 1, 2 and 3. Also
1 ρ12 ρ13
!
corr[X] = ρ12 1 ρ23
ρ13 ρ23 1
(a) Find E[X1 |(X2 , X3 )] and var[X1 |(X2 , X3 )].
(b) Find E[(X1 , X2 )|X3 ] and var[(X1 , X2 )|X3 ].

7. Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution. As usual
Pn 2
2 j=1 (Xj − X)
S =
n−1
By using the distribution of S 2 , find var[S 2 ]. (See also exercise 13 on page 65.)

8. Suppose the n-dimensional random vector X has the non-singular N (µ, Σ) distribution and j ∈ {1, 2, . . . , n}. Show
that var[Xj |X(j) ] ≤ var[Xj ] where, as usual, X(j) denotes the vector X with Xj removed.
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §6 Page 81

9. Continuation of proposition(22.7a).
(a) Show that c ≥ 0.
(b) Show that the size variable g1 (X) is independent of every shape vector z(X) iff the n-dimensional vector (1, . . . , 1 )
is an eigenvector of Σ.
(c) Suppose c = 0. Show that (X1 · · · Xn )1/n is almost surely constant.
10. Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution. Consider the transformation from
(x1 , . . . , xn ) to (y1 , . . . , yn ) with
y1 = x − µ, y2 = x2 − x, . . . , yn = xn − x
This transformation is 1 − 1 and
∂(y1 , . . . , yn ) 1
∂(x1 , . . . , xn ) = n

Find the density of (Y1 , . . . , Yn ).

11. The Helmert matrix of order n.


(a) Consider the n × n matrix A with the following rows:
1 1 1
v1 = √ (1, 1, . . . , 1) v2 = √ (1, −1, 0, . . . , 0) v3 = √ (1, 1, −2, 0, . . . , 0)
n 2 6
and in general for k = 2, 3, . . . , n:
1
vk = √ (1, 1, . . . , 1, −(k − 1), 0, . . . , 0)
k(k − 1)

where the vector vk starts with (k − 1) terms equal to 1/ k(k − 1) and ends with (n − k) terms equal to 0.
Check that A is orthogonal. This matrix is used in §5.12 on page 77.
(b) Consider the 3 vectors
2 2 2
 
α1 = (a1 , a2 , a3 ) α2 = a1 , − a1/a2 , 0 α3 = a1 , a2 , − (a1 +a2 )/a3
Check the vectors are orthogonal: this means α1 · α2 = α1 · α3 = α2 · α3 = 0.
Hence construct a 3 × 3 matrix which is orthogonal and which has a first row proportional to the vector α1 .
(c) Construct an n × n orthogonal matrix which has a first row proportional to the vector α = (a1 , a2 , . . . , an ).

12. Suppose X1 , X2 , . . . , Xn are i.i.d. random variables with the N (0, 1) distribution. Suppose a1 , a2 , . . . , an are real
constants with a21 + a22 + · · · + a2n 6= 0.
Find the conditional distribution of X12 + X22 + · · · + Xn2 given a1 X1 + a2 X2 + · · · an Xn = 0.
(Hint. Use part(c) of exercise 11).
13. Suppose X = (X1 , . . . , Xn ) has the multivariate normal distribution with E[Xj ] = µj , var[Xj ] = σ 2 and corr[Xj , Xk ] =
ρ|j−k| for all j and k in {1, 2, . . . , n}. Hence X ∼ N (µ, Σ) where
1 ρ ρ2 · · · ρn−1
 
µ
 
n−2
.  ρ 1 ρ ··· ρ 
µ =  ..  and Σ = σ 2   .. .. .. .. .. 
. . . . .

µ
ρn−1 ρn−2 ρn−3 · · · 1
Show that the sequence {X1 , X2 , . . . , Xn } forms a Markov chain.
14. Suppose X = (X1 , . . . , Xn ) is an n-dimensional random vector with density
 
n n
!
1 Y 1 2 1 X
fX (x) = 1+ xk e− 2 xk exp − x2j  for x = (x1 , . . . , xn ) ∈ Rn .
(2π)n/2 k=1
2
j=1
(`)
Using the usual notation, let X denote the vector X with X` removed for ` = 1, 2 . . . , n. Thus, for example,
X(1) = (X2 , X3 , . . . , Xn ) and X(2) = (X1 , X3 , X4 , . . . , Xn ).
(a) Let g` denote the density of X(`) . Find g` and hence show that the distribution of X(`) is N 0, I).
(b) Show that X gives an example of a random vector whose distribution is not multivariate normal and whose com-
ponents are not independent yet X(`) has the distribution of (n − 1) independent N (0, 1) variables for all `.
See also [P IERCE & DYKSTRA(1969)].
Page 82 §7 Jan 8, 2019(21:02) Bayesian Time Series Analysis

7 Quadratic forms of normal random variables


7.1 Introduction. Quadratic forms were introduced in §1.7 on page 61. In that section, it was shown that
common quantities such as Σ xj^2 and Σ (xj − x̄)^2 can be regarded as quadratic forms. In addition, general
expressions were obtained for the mean and variance of a quadratic form in random variables with an arbitrary
distribution.
We now consider the special case of quadratic forms in normal random variables; extensive bibliographies of this
subject can be found in [S CAROWSKY(1973)] and [D UMAIS(2000)].
We investigate four topics:
• moment generating functions of quadratic forms;
• the independence of quadratic forms (including Craig’s theorem);
• the distribution of a quadratic form;
• partitioning a quadratic form into independent quadratic forms (including Cochran’s theorem).
Moment generating functions of quadratic forms
7.2 The moment generating function of a quadratic form.
Theorem(7.2a). Suppose A is a real symmetric n × n matrix and X ∼ N (µ, Σ), possibly singular. Then the
m.g.f. of X^T AX is
        E[ e^{tX^T AX} ] = [ 1/|I − 2tAΣ|^{1/2} ] e^{tµ^T (I − 2tAΣ)^{−1} Aµ}                                  (7.2a)
for t sufficiently small such that the matrix I − 2tAΣ is positive definite or equivalently 4 such that all the
eigenvalues of the matrix I − 2tAΣ are positive.
Proof. Let r = rank(Σ). Then by proposition(5.14b) on page 79 we can write X = BY + µ where Y ∼ N (0, Ir ) and B is
a real n × r matrix with Σ = BBT .
Now BT AB is a real symmetric r × r matrix and hence there exists an orthogonal r × r matrix P with BT AB = PT DP
where D is an r × r diagonal matrix with diagonal elements {λ1 , . . . , λr } which are the eigenvalues of BT AB.
From pages 545–546 in [H ARVILLE(1997)] we know that if F is an m × n matrix and G is an n × m matrix and qFG (λ)
denotes the characteristic polynomial of the m × m matrix FG and qGF (λ) denotes the characteristic polynomial of
the n × n matrix GF, then qGF (λ) = (−λ)n−m qFG (λ) provided n > m. It follows that the eigenvalues of the n × n
matrix ABBT are the same as the eigenvalues of the r × r matrix BT AB together with n − r zeros; this implies they are
{λ1 , . . . , λr , 0, . . . , 0}.
Let Z = PY. Now P is orthogonal and Y ∼ N (0, Ir ); hence Z ∼ N (0, Ir ). Also
XT AX = (YT BT + µT )A(BY + µ) = YT BT ABY + 2µT ABY + µT Aµ
= YT PT DPY + 2µT ABPT Z + µT Aµ
= ZT DZ + αT Z + µT Aµ where α is the r × 1 vector 2PBT Aµ
Xr
= (λj Zj2 + αj Zj ) + µT Aµ
j=1
By part(a) of exercise 16 on page 96, we know that
!
tλj Zj2 +tαj Zj 1 t2 αj2
E[e ]= p exp for t ∈ R and tλj < 1/2.
1 − 2tλj 2(1 − 2tλj )
Now λj may be negative; hence the condition |t| < 1/|2λj | is sufficient.
Suppose t ∈ R with |t| < min{1/|2λ1 |, . . . , 1/|2λr |} and Z1 , . . . , Zr are independent. Then
    
r r 2
X 1 t2 X αj
E exp  (tλj Zj2 + tαj Zj ) = √ exp  
(1 − 2tλ1 ) · · · (1 − 2tλr ) 2 1 − 2tλj
j=1 j=1
Now λ is an eigenvalue of AΣ iff 1 − 2tλ is an eigenvalue of I − 2tAΣ. Also, the determinant of a matrix equals the
product of the eigenvalues. Hence |I − 2tAΣ| = (1 − 2tλ1 ) · · · (1 − 2tλr ). Hence
t2 T
 
T 1
E[etX AX ] = exp tµ T
Aµ + α (I − 2tD)−1
α (7.2b)
|I − 2tAΣ|1/2 2
By straightforward multiplication (I − 2tPT DP)PT (I − 2tD)−1 P = I and hence
PT (I − 2tD)−1 P = (I − 2tPT DP)−1 = (I − 2tBT AB)−1 (7.2c)
4
A symmetric matrix is positive definite iff all its eigenvalues are strictly positive—see page 543 of [H ARVILLE(1997)].
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §7 Page 83

Replacing α by 2PBT Aµ in the exponent of equation(7.2b) and using equation(7.2c) shows that
t2
tµT Aµ + αT (I − 2tD)−1 α = tµT I + 2tAB(I − 2tBT AB)−1 BT Aµ
 
2
By part (b) of exercise 16 on page 96, we know that if F and G are n × r matrices such that In − FGT and Ir − GT F are
non-singular, then In + F(Ir − GT F)−1 GT = (In − FGT )−1 . Equation(7.2c) shows that Ir − 2tBT AB is non-singular; also
In − 2tAΣ = In − 2tABBT is non-singular for our values of t.
So applying the result with F = 2tAB and G = B shows that In + 2tAB(Ir − 2tBT AB)−1 BT = (In − 2tABBT )−1 =
(In − 2tAΣ)−1 where t ∈ R is such that |t| < min{1/|2λ1 |, . . . , 1/|2λr |}. Hence result.
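A Monte Carlo check of formula (7.2a), for an arbitrary small example and a value of t for which all the
eigenvalues of I − 2tAΣ are positive:

    import numpy as np

    rng = np.random.default_rng(6)
    A = np.array([[1.0, 0.5], [0.5, 2.0]])        # symmetric
    mu = np.array([0.5, -1.0])
    Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
    t = 0.05                                      # small enough for this A and Sigma

    I = np.eye(2)
    M = I - 2 * t * A @ Sigma
    mgf_formula = np.exp(t * mu @ np.linalg.inv(M) @ A @ mu) / np.sqrt(np.linalg.det(M))

    X = rng.multivariate_normal(mu, Sigma, size=2_000_000)
    Q = np.einsum('ij,jk,ik->i', X, A, X)         # X^T A X for each sample
    print(mgf_formula, np.exp(t * Q).mean())      # the two values should agree up to Monte Carlo error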

Alternative expressions for the m.g.f. when the distribution is non-singular. Now suppose the distribution of
X is non-singular; hence the matrix Σ is invertible.
First note that
Σ−1 − Σ−1 (Σ−1 − 2tA)−1 Σ−1 = Σ−1 − (I − 2tAΣ)−1 Σ−1 = I − (I − 2tAΣ)−1 Σ−1
 

Similarly
Σ−1 − Σ−1 (Σ−1 − 2tA)−1 Σ−1 = Σ−1 − Σ−1 (I − 2tΣA)−1 = Σ−1 I − (I − 2tΣA)−1
 

Using part of the exponent in equation(7.2a) gives


1
t(I − 2tAΣ)−1 A = Σ−1 tΣ(I − 2tAΣ)−1 A = Σ−1 (I − 2tΣA)−1 − I
 
2
by using part(b) of exercise 16 on page 96 with F = 2tΣ and G = A = AT .
Hence alternative expressions for the m.g.f. when the distribution is non-singular include
1 1 T −1 −1
e− 2 µ Σ [I−(I−2tΣA) ]µ
T
E[etX AX ] = 1/2
|I − 2tAΣ|
1 1 T −1 −1
= 1/2
e− 2 µ [I−(I−2tAΣ) ]Σ µ (7.2d)
|I − 2tAΣ|
1 − 12 µT [Σ−1 −(Σ−2tΣAΣ)−1 ]µ
= e (7.2e)
|I − 2tAΣ|1/2
A general result for the variance of a quadratic form was obtained in §1.9 on page 62. See exercise 19 on page 97
for an expression for the variance of a quadratic form in normal random variables.
7.3 Other moment generating functions. Suppose A and B are real symmetric n × n matrices and X ∼ N(µ, Σ).
Then we can write down the m.g.f. of the 2-dimensional random vector (X^T AX, X^T BX) as follows: replace A by
s1 A + s2 B in equation(7.2a) and then let t1 = ts1 and t2 = ts2. Hence, provided t1 and t2 are sufficiently small
        E[ e^{t1 X^T AX + t2 X^T BX} ] = exp[ µ^T (I − 2t1 AΣ − 2t2 BΣ)^{−1}(t1 A + t2 B)µ ] / |I − 2t1 AΣ − 2t2 BΣ|^{1/2}    (7.3a)
If the distribution is non-singular, then we can use equation(7.2d) or (7.2e) and then
 1 T −1 1 T −1 Σ−1 µ

T T exp − µ Σ µ + µ (I − 2t1 AΣ − 2t 2 BΣ)
E[et1 X AX+t2 X BX ] = 2 2
|I − 2t1 AΣ − 2t2 BΣ|1/2
In particular, if X ∼ N (0, Σ) then µ = 0 and
T T 1
E[et1 X AX+t2 X BX ] =
|I − 2t1 AΣ − 2t2 BΣ|1/2
Hence if X ∼ N (0, Σ), then XT AX and XT BX are independent iff
|I − 2t1 AΣ − 2t2 BΣ|1/2 = |I − 2t1 AΣ|1/2 |I − 2t2 BΣ|1/2 (7.3b)

Moment generating function of a quadratic expression. Suppose X ∼ N (0, I), A is a real symmetric n × n matrix,
b is an n × 1 real vector and d ∈ R. Suppose further that we want the m.g.f. of the quadratic expression
Q = XT AX + bT X + d. First note that if C is an n × n non-singular matrix, x ∈ Rn , b ∈ Rn and t ∈ R then
(x − tC−1 b)T C(x − tC−1 b) − t2 bT C−1 b = (xT − tbT C−1 )(Cx − tb) − t2 bT C−1 b
= xT Cx − 2tbT X
and hence if C = I − 2tA, we have
xT x − (x − tC−1 b)T C(x − tC−1 b) + t2 bT C−1 b = xT x − xT Cx + 2tbT X = 2txT Ax + 2tbT x
Page 84 §7 Jan 8, 2019(21:02) Bayesian Time Series Analysis

and hence
1 1 1
txT Ax + tbT x + td = xT x − (x − tC−1 b)T C(x − tC−1 b) + td + t2 bT C−1 b
2 2 2
Hence the m.g.f. of Q is, provided t is sufficiently small that the matrix I − 2tA is positive definite (equivalently,
t is sufficiently small that all the eigenvalues are positive),
exp[− 21 (x − tC−1 b)T C(x − tC−1 b)]
 Z
tXT AX+tbT X+td 1 2 T −1
E[e ] = exp td + t b C b
2 n (2π)n/2
  x∈R
1 1
= exp td + t2 bT C−1 b √
2 det(C)
 
1 1 2 T −1
= exp td + t b (I − 2tA) b (7.3c)
|I − 2tA|1/2 2

Independence of quadratic forms


7.4 Independence of two normal linear forms. This first result below is the key to the subsequent two results.
Proposition(7.4a). Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distri-
bution. Suppose A is an m1 × n matrix and B is an m2 × n matrix. Then AX and BX are independent iff
AΣBT = 0.
Proof. Let F denote the (m1 + m2 ) × n matrix
 
A
F=
B
Then Y = FX is multivariate normal by Corollary(5.6b) on page 73. Using proposition(5.8b) on page 74 shows AX and
BX are independent iff cov[AX, BX] = 0. But cov[AX, BX] = AΣBT by exercise 2 on page 63. Hence result.
Similarly suppose Ai is mi × n for i = 1, 2, . . . , k. Then Ai ΣATj = 0 for all i 6= j implies A1 X, . . . , Ak X are
pairwise independent by proposition(5.8b) on page 74; box display(5.8b) on page 75 then implies A1 X, . . . , Ak X
are independent.
7.5 Independence of normal quadratic and normal linear form—a simple special case.
Proposition(7.5a). Suppose X is an n-dimensional random vector with the N (µ, Σ) distribution. Suppose
further that A is k × n matrix and B is a symmetric idempotent n × n matrix with AΣB = 0. Then
AX and XT BX are independent.
Proof. We are given AΣB = 0 and B is symmetric; hence AΣBT = 0; hence AX and BX are independent by proposi-
tion(7.4a). Hence AX and (BX)T (BX) are independent. Hence result.
See exercise 4 for a variation of this proposition.
The requirement that B is idempotent is very restrictive. The previous proposition is still true without this condition
but the proof is much harder. We shall return to this problem after considering Craig’s theorem.
7.6 Independence of two normal quadratic forms—Craig’s theorem. The special case when both matrices
are idempotent is easy to prove.
Proposition(7.6a). Suppose X is an n-dimensional random vector with the N (µ, Σ) distribution. Suppose A
and B are n × n symmetric idempotent matrices such that AΣB = 0. Then XT AX and XT BX are independent.
Proof. By proposition(7.4a), we know that AX and BX are independent iff AΣBT = 0.
We are given AΣB = 0. Using the fact that B is symmetric gives AΣBT = 0 and hence AX and BX are independent.
Hence (AX)T (AX) and (BX)T (BX) are independent. But A and B are both idempotent; hence result.
See exercise 3 for a variation of this proposition. We shall also prove in proposition(7.10a) on page 87 that,
provided µ = 0, then XT AX/σ 2 ∼ χ2r where r = rank(A) and XT BX/σ 2 ∼ χ2s where s = rank(B).
The previous proposition is still true without the assumption of idempotent—but then the proof is much harder.
Theorem(7.6b). Craig’s theorem on the independence of two normal quadratic forms. Suppose X has the
non-singular normal distribution N (µ, Σ) and A and B are real and symmetric. Then
XT AX and XT BX are independent iff AΣB = 0
Proof.
⇒ Omitted because the proof is long and not used in applications. See [R EID & D RISCOLL(1988)] and pages 208–211
in [M ATHAI & P ROVOST(1992)].
⇐ Because Σ is real, symmetric and positive definite, then by §1.4 on page 60, we can write Σ = QQ where Q is
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §7 Page 85

symmetric and nonsingular. Let C = QT AQ and D = QT BQ. Hence CD = QT AQQT BQ = QT AΣBQ = 0, because we
are assuming AΣB = 0. Taking the transpose and using the fact that both C and D are symmetric gives DC = 0.
See pages 559–560 in [H ARVILLE(1997)] for the result that if C and D are both n × n real symmetric matrices, then there
exists an orthogonal matrix P such that both PT CP and PT DP are diagonal iff CD = DC.
Let Y = PT Q−1 X; hence QPY = X.
Now var[Y] = PT Q−1 var[X]Q−1 P = I because var[X] = Σ = QQ and P is orthogonal. So Y1 , . . . , Yn are independent.
T T T T T T
Pn {λ12, . . 2. , λn } denote the eigenvalues of C; then X AX = Y P Q AQPY T= Y P CPY
Let = YT diag[λ1 , . . . , λn ]Y =
T T T T T
T
k=1 λk Yk . Similarly, Let Pn {µ12, . . 2. , µn } denote the eigenvalues of D; then X BX = Y P Q BQPY = Y P DPY =
Y diag[µ1 , . . . , µn ]Y = k=1 µk Yk .
Finally, note that CPPT D = CD = 0; hence PT CPPT DP = 0, and hence diag[λ1 , . . . , λn ]diag[µ1 , . . . , µn ] = 0. This
implies λk µk = 0 for every k = 1, 2, . . . , n. So partition the set {1, 2, . . . , n} into N1 and N2 such that N1 = {j : λj 6= 0}
and N2 = {1, 2, . . . , n} − N1 . Then j ∈ N1 implies λj 6= 0 and hence µj = 0; also µj 6= 0 implies λj = 0 and hence
j ∈ N2 . Hence XT AX depends only on the random variables {Yj : j ∈ N1 } and XT BX depends only on the random
variables {Yj : j ∈ N2 }; hence XT AX and XT BX are independent and the result is proved.
Zhang has given a slicker proof of this result—see after the proof of proposition (7.7a) below.

Note that Craig’s theorem includes proposition(7.6a) in the special case when the distribution is non-singular.

T
Example(7.6c). Applying Craig’s theorem Pn of X X. Suppose
Pn to a decomposition X1 , . . . , Xn are i.i.d. random variables
with the N (µ, σ ) distribution. Let X = j=1 Xj /n and S = j=1 (Xj − X)2 /(n − 1). Show that X and S 2 are independent.
2 2
Pn Pn 2 2
Solution. Now XT X = j=1 Xj2 = j=1 (Xj − X)2 + nX . Also XT 1X/n = nX . Hence XT X = XT [I − 1/n]X + XT 1X/n.
By Craig’s theorem, X and S 2 are independent iff AΣB = 0 where A = I − 1/n, Σ = σ 2 I and B = 1/n. This occurs iff
[I − 1/n]1/n = 0. But 11/n2 = 1/n. Hence result.

Note that the following three propositions are equivalent:


[A] If X has the non-singular n-dimensional normal distribution N (0, I) and A and B are real symmetric n ×
n matrices, then XT AX and XT BX are independent iff AB = 0.
[B] If X has a non-singular n-dimensional normal distribution with var[X] = I and A and B are real symmetric
n × n matrices, then XT AX and XT BX are independent iff AB = 0.
[C] If X has a non-singular n-dimensional normal distribution and A and B are real symmetric n × n matrices,
then XT AX and XT BX are independent iff AΣB = 0 where Σ = var[X].
Clearly [C] ⇒ [B] ⇒ [A].
To prove [B] ⇒ [C]. Suppose Z has the non-singular normal distribution N (µ, Σ) and C and D are real and
symmetric and ZT CZ and ZT DZ are independent. Let Y = Σ−1/2 Z; then Y ∼ N (Σ−1/2 µ, I). Also ZT CZ =
YT Σ1/2 CΣ1/2 Y and ZT DZ = YT Σ1/2 DΣ1/2 Y. So by [B] , we have Σ1/2 CΣ1/2 Σ1/2 DΣ1/2 = 0 and hence
CΣD = 0. Similarly for the other implication.
To prove [A] ⇒ [B]. Suppose Z has the non-singular normal distribution N (µ, I) and C and D are real and
symmetric. Let Y = Z − µ; then Y ∼ N (0, I). Also ZT CZ = (YT + µT )C(Y + µ) and ZT DZ = (YT + µT )D(Y + µ).
We need to prove that ZT CZ and ZT DZ are independent iff CD = 0.
Consider the quadratic expression Q = αZT CZ + ZT DZ = YT (αC + D)Y + 2µT (αC + D)Y + µT (αC + D)µ. Using
equation(7.3c) where A is replaced by αC + D and b by 2(αC + D)µ and d by µT (αC + D)µ gives
2 µT (αC + D)(I − 2tαC − 2tD)−1 (αC + D)µ
 T 
exp tµ (αC + D)µ + 2t
E[etQ ] =
|1 − 2tαC − 2tD|1/2
Using
"∞ # ∞
X X
−1 k
2
4t (αC + D)(I − 2tαC − 2tD) (αC + D) = 2t(αC + D) [2t(αC + D)] 2t(αC + D) = [2t(αC + D)]k
k=0 k=2
and hence

" #
1 1 X
E[tQ ] = exp µT [2t(αC + D)]k µ
|1 − 2tαC − 2tD|1/2 2
k=1
Page 86 §7 Jan 8, 2019(21:02) Bayesian Time Series Analysis

Now if CD = 0 then ∞
P k
P∞ k
P∞ k T T
k=1 [2t(αC + D)] = k=1 (2tαC) + k=1 (2tD) . Also by [A] we have Y CY and Y DY
are independent; hence |I − 2tαC − 2tD| = |I − 2tαC| |I1 − 2tB|. Hence the m.g.f. factorizes
∞ ∞
" # " #
1 1 X 1 1 X
E[etQ ] = exp µT (2tαC)k µ exp µT (2tD)k µ
|I − 2tαC|1/2 2 |I − 2tD|1/2
k=1
2
k=1
αtZT CZ+tZT DZ αtZT CZ tZT DZ
E[e ] = E[e ] E[e ]
and this is equivalent to
T T T T
E[et1 Z CZ+t2 Z DZ ] = E[et1 Z CZ ] E[et2 Z DZ
]
and hence ZT CZ and ZT DZ are independent.
7.7 Independence of normal quadratic and normal linear form: the general non-singular case.
Proposition(7.7a). Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribution.
Suppose further that A is k × n matrix and B is a symmetric n × n matrix. Then
AX and XT BX are independent iff AΣB = 0
Proof.
⇒ We are given AX and XT BX are independent; hence XT AT AX = (AX)T (AX) and XT BX are independent. By Craig’s
theorem, AT AΣB = 0. Premultiplying by A gives (AAT )T AΣB = 0.
Suppose A has full row rank; then by exercise 13 on page 96, we know that AAT is non-singular and hence AΣB = 0.
Suppose rank(A) = r < k. Let A1 denote the r × n matrix of r independent rows of A and let A2 denote the (k − r) × n
matrix of the other k − r rows of A. Hence A2 = CA1 for some (k − r) × r matrix C. Consider the matrix A∗ where
 
∗ A1
A =
A2

So A can be obtained from A by an appropriate permutation of the rows. Now A1 X = f (AX) for some function f ; hence
A1 X and XT BX are independent. By the full row rank case we have A1 ΣB = 0. Hence A2 ΣB = CA1 ΣB = 0. Hence
A∗ ΣB = 0. Hence AΣB = 0.
⇐ The proof is similar to that of Craig’s theorem—see exercise 15 on page 96.

Here is another5 slicker proof of this result due to [Z HANG(2017)]:


⇐ We are given AΣB = 0. Hence cov[AX, BX] = AΣBT = AΣB = 0; thus AX is independent
of BX. Let B− be a generalized inverse of B; hence B = BB− B and hence XT BX = XT BB− BX =
(BX)T B− (BX) = f (BX). Because AX is independent of BX, it follows that AX is independent of
f (BX) = XT BX. Hence result.
⇒ We are given AX and XT BX are independent; hence XT AT AX and XT BX are independent; hence
by Craig’s theorem we have AT AΣB = 0. Now use the equality A = A(AT A)− AT A to get AΣB =
A(AT A)− AT AΣB = 0. Hence result.
Clearly, the Zhang proof of ⇐ also shows that AΣB = 0 implies XT AX and XT BX are independent, and this is
the ⇐ part of Craig’s theorem—proposition (7.6a) above.
The following example shows we can use this proposition to prove the standard result about the independence of
S 2 and X.
Example(7.7b).
Pn . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution.
Suppose X1 ,P
n
Let X = j=1 Xj /n and S 2 = j=1 (Xj − X)2 /(n − 1). Show that X and S 2 are independent.
Solution. Now X = aT X where aT = (1, 1, . . . , 1)/n. Also by example(1.7d) on page 61, we have (n − 1)S 2 = XT (I − 1/n)X
where 1 is an n × n matrix with every entry equal to 1. Finally Σ = σ 2 I. Using the notation of proposition(7.7a) we have
AΣB = σ 2 aT (I − 1/n) = 0. Hence result.

7.8 Craig’s theorem for the singular case.


Proposition(7.8a). Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distribu-
tion. Suppose further that A and B are n × n real symmetric matrices. Then XT AX and XT BX are independent
iff ΣAΣBΣ = 0, ΣAΣBµ = 0, ΣBΣAµ = 0, and µT AΣBµ = 0.
Proof. See [D RISCOLL & K RASNICKA(1995)].

5
Yet another proof can be found in [S EARLE(1971)]. His proof of ⇐ is based on expressing B as LLT where L has full
column rank. This is possible but L will be complex unless B is non-negative definite. Hence the proof involves using the
complex multivariate normal which we have not considered—and nor does Searle. The proof of ⇒ in Searle is clearly false.
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §7 Page 87

The distribution of a quadratic form


7.9 Distribution of XT Σ−1 X. We now move to the second problem: the distribution of a quadratic form.
Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribution. By proposition (5.7a)
on page 73, wePknow there exists a non-singular matrix Q such that QT Q = Σ−1 and Z = Q(X − µ) ∼ N (0, I).
Hence ZT Z = ni=1 Zi2 ∼ χ2n and so
(X − µ)T QT Q(X − µ) ∼ χ2n
Summarizing:
if X ∼ N (µ, Σ) then (X − µ)T Σ−1 (X − µ) ∼ χ2n
This is a generalization of the result for the bivariate case given in exercise 17 on page 71.
We can generalize this result:
Proposition(7.9a). Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribution.
Then the random variable XT Σ−1 X has the non-central χ2n distribution with non-centrality parameter λ =
µT Σ−1 µ.
Proof. Take Z = QX where Σ−1 = QT Q. Then Z ∼ N (µZ , I) where µZ = Qµ. Also
XT Σ−1 X = XT QT QX = ZT Z = Z12 + · · · + Zn2
Hence, by Chapter1:§18.2 on page 42, XT Σ−1 X has the non-central χ2n distribution with non-centrality parameter
µTZ µZ = µT Σ−1 µ.

7.10 Distribution of XT CX when C is idempotent. Properties of idempotent matrices can be found in many
places; for example [H ARVILLE(1997)].
Proposition(7.10a). Suppose X1 , . . . , Xn are i.i.d. random variables with the N (0, σ 2 ) distribution. Suppose
C is an n × n symmetric idempotent matrix with rank r (and so trace(C) = r). Then
XT CX
∼ χ2r
σ2
Proof. Because C is real and symmetric, it is orthogonally diagonalizable: this means we can write C = LDLT where
D = diag(d1 , . . . , dp ) is diagonal and L is orthogonal—see §1.4 on page 60. Now C2 = LDLT LDLT = LD2 LT and
C = LDLT . Also C is idempotent—this means that C2 = C. Hence D2 = D. This implies every dj is either 0 or 1 and
hence all the eigenvalues of C are either 0 or 1. It follows, after possibly rearrranging the rows of L, that
 
Ir 0
C=L LT (7.10a)
0 0
where the submatrix Ir is the r × r identity matrix and r = rank(C).
Let Z = LT X; then Z ∼ N (0, σ 2 In ). Also
Pr 2
XT CX ZT LT CLZ
 
1 T Ir 0 k=1 Zk
= = Z Z = ∼ χ2r as required.
σ2 σ2 σ2 0 0 σ2
See exercise 2 for a generalization of this proposition.
7.11 Conditions for a quadratic form in non-singular normal variables to have a χ2 distribution. We start
with the non-singular case.
Theorem(7.11a). Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribution
and A is a real n × n symmetric matrix. Suppose AΣ is idempotent with rank(AΣ) = r; then XT AX has the
non-central χ2r distribution with non-centrality parameter µT Aµ.
Proof. Here are two proofs—both are instructive.
Proof 1. Now Σ is a real symmetric positive definite matrix. Hence there exists a non-singular Q such that Σ = QQT .
Let A1 = QT AQ. Now AΣAΣ = AΣ and Σ is non-singular; hence AΣA = A. Hence A21 = QT AΣAQ = A1 and
rank(A1 ) = rank(A) = rank(AΣ) = r because Q, QT and Σ are all non-singular. Because A1 is a real symmetric
idempotent matrix with rank r, there exists an orthogonal matrix P such that
 
I 0 T
A1 = P r P (7.11a)
0 0
Define P1 to be the n × r matrix and P2 to be the n × (n − r) matrix so that P = [ P1 P2 ]; hence A1 = P1 PT1 . From
equation(7.11a) we have
   T    T  T 
I 0 T P1 Ir 0 P1 P1 P1 PT1 P2
PT A1 = r P = and hence = P T A1 P = [ P1 P2 ] =
0 0 0 0 0 0 0 0
Hence PT1 P1 = Ir .
Page 88 §7 Jan 8, 2019(21:02) Bayesian Time Series Analysis

Let Z = PT1 Q−1 X. Then var[Z] = PT1 Q−1 Σ(Q−1 )T P1 = PT1 Q−1 QQT (Q−1 )T P1 = Ir and hence Z ∼ N (PT1 Q−1 µ, Ir ).
Also ZT Z = XT (Q−1 )T P1 PT1 Q−1 X = XT (Q−1 )T A1 Q−1 X = XT AX. Hence XT AX has a non-central χ2r distribution with
non centrality parameter µT (Q−1 )T P1 PT1 Q−1 µ = µT Aµ.
Proof 2. This proof is based on expanding the moment generating function in equation(7.2d).
Now for |t| < min{1/|2λ1 |, . . . , 1/|2λr |} we have
 
∞  
X 2t
I − (I − 2tAΣ)−1 Σ−1 =  (2t)j (AΣ)j  Σ−1 = A
 
1 − 2t
j=1

because AΣ is idempotent. Also, as in the proof of proposition (7.2a) on page 82, we have |I−2tAΣ| = (1−2tλ1 ) · · · (1−
2tλr ) = (1 − 2t)r where r = rank(AΣ) = rank(A), because we are assuming Σ is non-singular. Substituting into
equation(7.2d) shows that for − 12 < t < 12 we have
 
tXT AX 1 λt
E[e ]= exp where λ = µT Aµ.
(1 − 2t)r/2 1 − 2t
By equation(18.2a), the m.g.f. of the non-central χ2 distribution with n degrees of freedom and non-centrality parameter λ
is  
1 λt
exp for t < 1/2.
(1 − 2t)n/2 1 − 2t
Hence XT AX has a non-central χ2 distribution with r degrees of freedom and non-centrality parameter µT Aµ.

The converse is also true:


Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) distribution and A is a real
n×n symmetric matrix. Suppose further that XT AX has the non-central χ2m distribution with non-centrality
parameter λ. Then AΣ is idempotent, λ = µT Aµ and m = rank(A).
A proof of this result can be found in [D RISCOLL(1999)].
Here are two special cases:
• Suppose X ∼ N (0, In ) and A is a real symmetric n × n matrix. Then XT AX ∼ χ2r iff A is idempotent with
rank(A) = r.
• Suppose X ∼ N (µ, σ 2 In ) and A is a real symmetric n × n matrix. Then XT AX/σ 2 has a non-central χ2r
distribution with non-centrality parameter µT Aµ/σ 2 iff A is idempotent with rank(A) = r.
Hence this new proposition includes proposition(7.10a) as a special case.
7.12 Conditions for a quadratic form in possibly singular normal variables to have a χ2 distribution. The
requirement that X has a non-singular distribution is essential in proposition (7.11a)—here is an example which
demonstrates this.
Example(7.12a). Suppose Z ∼ N (0, 1), A = I2 and XT = ( Z 1 ). Show that X has a singular multivariate normal
distribution, AΣ is idempotent but XT AX does not have a non-central χ2 distribution.
Solution. We have X ∼ N (µ, Σ) where
   
0 1 0
µ= and Σ =
1 0 0
T
Then (AΣ)2 = AΣ, rank(AΣ) = 1 and XT AX = Z 2 + 1. Now Z 2 ∼ χ21 . Hence E[etX AX
] = et /(1 − 2t)1/2 and so does not
have a non-central χ2 distribution.
Now for the conditions which are necessary and sufficient for a quadratic form in possibly singular normal vari-
ables to have a χ2 distribution.
Theorem(7.12b). Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distribu-
tion and A is a real n × n symmetric matrix. Suppose further that
ΣAΣAΣ = ΣAΣ and µT (AΣ)2 = µT AΣ and µT AΣAµ = µT Aµ
then XT AX has the non-central χ2r distribution with non-centrality parameter α where r = trace(AΣ) and
α = µT Aµ.
Proof. We use moment generating functions. From equation(18.2a) on page 42 we know that if W has the non-central
χ2r distribution with non-centrality parameter α then
 
1 αt
E[etW ] = exp for t < 1/2.
(1 − 2t)r/2 1 − 2t
Also, by equation(7.2a) we have the moment generating function of XT AX:
T 1 T −1
E[etX AX ] = etµ (I−2tAΣ) Aµ
|I − 2tAΣ| 1/2
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §7 Page 89

The first assumption is that ΣAΣAΣ = ΣAΣ; by part(a) of exercise 17 this implies (ΣA)3 = (ΣA)2 . This implies6 every
eigenvalue of the matrix ΣA is either 0 or 1. Hence every eigenvalue of the transpose AΣ is either 0 or 1.
Now r = trace(AΣ); hence the number of non-zero eigenvalues of AΣ equals r. Using the fact that the determinant
equals the product of the eigenvalues and λ is an eigenvalue of AΣ iff 1 − 2tλ is an eigenvalue of I − 2tAΣ, we get
|I − 2tAΣ|1/2 = (1 − 2t)r/2 .
We have by induction µT (AΣ)k = µT (AΣ) for k ∈ {1, 2, . . .}. Hence µT (AΣ)k Aµ = µT Aµ = α forP k ∈ {1, 2, . . .}.
Now it is well known
P∞ k that if the spectral radius of the matrix F is less than 1 then the geometric series Fk converges
−1
and (I − F) = k=0 F ; but the eigenvalues of the matrix 2tAΣ are either 2t or 0. Hence for − 2 < t < 12 we can
1

expand (I − 2tAΣ)−1 and get



X α
µT (1 − 2tAΣ)−1 Aµ = (2t)k µT (AΣ)k Aµ =
1 − 2t
k=0
Hence the proposition.
It can be shown that the conditions specified in the previous proposition are necessary and sufficient: see pages
197–201 in [M ATHAI & P ROVOST(1992)]
Partitioning a quadratic form into independent pieces
7.13 Cochran’s theorem. Cochran’s theorem is frequently used in the analysis of variance and is concerned with
partitioning a sum of squares into independent pieces. The simplest case is as follows. Suppose X1 , X2 , . . . , Xn
are i.i.d. random variables with the N (0, σ 2 ) distribution. Then
Pn 2
Pn 2 2
j=1 Xj j=1 (Xj − X) nX
= + 2
σ2 σ2 σ
Pn 2 2
By §5.12 on page 77 we know that j=1 (Xj − X) and nX are independent. Hence
Pn 2
Pn 2
j=1 Xj j=1 (Xj − X)2 (n − 1)S 2 nX
∼ χ2n is partitioned into = ∼ χ2n−1 and ∼ χ21
σ2 σ2 σ 2 σ2
7.14 Matrix results when the sum is the identity matrix. The proof of Cochran’s theorem has led to several
results about matrices. We start with a version where the sum of the matrices is the identity matrix.
Proposition(7.14a). Suppose A1 , . . . , Ak are real symmetric n × n matrices with A1 + · · · + Ak = In . Let
rj = rank(Aj ) for j = 1, 2, . . . , k.
Then the following 3 statements are equivalent:
(1) r1 + · · · + rk = n (2) Aj is idempotent for every j = 1, 2, . . . , k (3) Ai Aj = 0 for all i 6= j
Proof.
(3) ⇒ (2) Multiply the equation A1 + · · · + Ak = In by Aj . This gives A2j = Aj .
(2) ⇒ (1) We need the following two general results: trace(A + B) = trace(A) + trace(B), and rank(B) = trace(B) for
Pn Pn
every idempotent matrix B. Hence n = trace(In ) = j=1 trace(Aj ) = j=1 rank(Aj ).
(1) ⇒ (3) We use the rank factorization theorem for matrices: suppose A is an m × n matrix with rank(A) = r where
r 6= 0. Then there exists an m × r matrix B and an r × n matrix C such that A = BC and rank(B) = rank(C) = r. See
page 38 in [H ARVILLE(1997)].
The rank factorization theorem implies that for every j = 1, 2, . . . , n, there exists an n × rj matrix Bj and an rj × n
matrix Cj such that Aj = Bj Cj . Also r1 + r2 + · · · + rk = n. Let B and C denote the n × n matrices defined by
C1
 
 C2 
B = [ B1 B2 · · · Bk ] and C =   ... 

Ck
Then BC = B1 C1 + · · · Bk Ck = A1 + · · · + Ak = In . Thus C is the inverse of the matrix B and hence CB = In . Hence
Cj B` = 0 for j 6= `. Hence for j 6= ` we have Aj A` = Bj Cj B` C` = 0, as required.
See also pages 434–439 of [H ARVILLE(1997)].
Now for a more sophisticated result which is, effectively, an algebraic version of Cochran’s theorem.
Proposition(7.14b). Suppose A1 , . . . , Ak are symmetric non-negative definite n × n matrices with A1 + · · · +
Ak = In . Let rj = rank(Aj ) for j = 1, 2, . . . , k. Suppose further that r1 + · · · + rk = n.
6
Now ΣAx = λx implies (ΣA)2 x = λΣAx = λ2 x and (ΣA)3 x = λ3 x. Hence λ2 = λ3 and hence λ = 0 or 1.
Page 90 §7 Jan 8, 2019(21:02) Bayesian Time Series Analysis

Then there exists an n × n orthogonal matrix L such that for any x ∈ Rn , if y = Lx then
xT A1 x = y12 + · · · + yr21
xT A2 x = yr21 +1 + · · · + yr21 +r2
(7.14a)
············
xT Ak x = yr21 +···+rk−1 +1 + · · · + yn2
Also A1 , . . . ,Ak are all idempotent and Ai Aj = 0 for i 6= j.
Proof. Because A1 is a real symmetric matrix, there exists an orthogonal matrix L1 such that A1 = L1 D1 LT1 where
D1 = diag(d1 , . . . , dn ) is diagonal with entries which are the eigenvalues of A1 —see §1.4 on page 60. Because A1 is
non-negative definite, we must have d1 ≥ 0, . . . , dn ≥ 0. Now rank(A1 ) = r1 ; hence r1 of the eigenvalues are non-zero
and n − r1 are zero. Without loss of generality, we can assume d1 > 0, . . . , dr1 > 0 and dr1 +1 = · · · = dn = 0.
Let B = A2 + · · · + Ak Now rank(C + D) ≤ rank(C) + rank(D) for all matrices C and D with the same dimensions; hence
n = rank(In ) = rank(A1 + · · · + Ak ) ≤ rank(A1 ) + rank(B) ≤ r1 + · · · + rk = n
Hence all these terms are equal and hence rank(B) = r2 + · · · + rk .
Pn Pr1
Now let y = LT1 x. Then j=1 yj2 = yT y = xT x = xT A1 x + xT Bx = yT D1 y + xT Bx = j=1 dj yj2 + xT Bx.
Pr1 n
Hence j=1 (1 − dj )yj2 + j=r1 +1 yj2 = xT Bx = yT LT1 BL1 y. Because L1 and LT1 are non-singular we have rank(LT1 BL1 ) =
P
rank(B) = r2 + · · · + rk = n − r1 . Hence d1 = · · · = dr1 = 1.
Similarly, for j = 1, 2, . . . , k, there exist orthogonal Lj with Aj = Lj Dj LTj where
 
Irj 0
Dj = and Irj is the rj × rj identity matrix.
0 0
Let Mj consists of the first rj columns of Lj ; hence
h i
L = Mj
j
Nj
n×rj n×(n−rj )
rj ×rj
Clearly
Aj = Lj Dj LTj = Mj Irj MTj = Mj MTj (7.14b)
Now define the n × n matrix M as follows
M = [ M1 M2 ··· Mk ]
Hence
MMT = M1 MT1 + · · · + Mk MTk = A1 + · · · + Ak = In
Thus M is an orthogonal matrix and MT M = In . Hence MTj Mj = Irj and MTj M` = 0 for j 6= `.
Hence MTj Aj M` = MTj Mj MTj M` = 0, and MTj A` M` = MTj M` MT` M` = 0; of course MTj Aj Mj = Irj . Hence
0 0 0
  " #
T I r1 0 T
M A1 M = and M A2 M = 0 Ir2 0 and so on.
0 0
0 0 0
T
Let L = M and then if y = Lx equations(7.14a) follow immediately.
Using the representation Aj = Mj MTj in equation(7.14b) shows that Aj is idempotent and Aj A` = 0 for j 6= `.

Here is an alternative method for proving the first part of the theorem which is quite nice.
Now λ is an eigenvalue of A1 iff |A1 − λI| = 0. Because B = I − A1 , it follows that λ is an eigenvalue of A1 iff
|B − (1 − λ)I| = 0. Hence λ is an eigenvalue of A1 iff 1 − λ is an eigenvalue of B. Now A1 is a real symmetric matrix;
hence r1 , the rank of A1 , equals7 the number of non-zero eigenvalues of A1 . So A1 has n − r1 eigenvalues equal to 0
and hence B has n − r1 eigenvalues equal to 1. Because B has rank n − r1 and is symmetric, it follows that B has
n − r1 eigenvalues equal to 1 and r1 eigenvalues equal to 0. Similarly A1 has r1 eigenvalues equal to 1 and n − r1
eigenvalues equal to 0.

7
For a symmetric matrix, the rank equals the number of non-zero eigenvalues. This follows from the fact that we can diago-
nalize a symmetric matrix. We do need the matrix to be symmetric. Consider the 3 × 3 matrix
0 2 4
" #
0 0 3
0 0 0
This has all eigenvalues equal to 0 but rank 2.
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §7 Page 91

7.15 Partitioning a quadratic form into independent pieces—Cochran’s theorem. Converting proposi-
tion(7.14a) into a result about random variables gives the following.
Theorem(7.15a). Suppose the n-dimensional random vector X has the N (µ, σ 2 I) distribution and
XT X = XT A1 X + · · · + XT Ak X
where A1 , . . . , Ak are symmetric matrices with ranks r1 , . . . , rk . Consider the following three statements:
(1) r1 + · · · + rk = n (2) Aj is idempotent for every j = 1, 2, . . . , k (3) Ai Aj = 0 for all i 6= j
T T
If any one of these statements is true then all three are true and X A1 X, . . . , X Ak X are independent and
XT Aj X 2 XT X
∼ χ T
rj ,µ Aj µ/σ 2 for j = 1, 2, . . . , k, and ∼ χ2n,µT µ/σ2
σ2 σ2
Proof. Proposition(7.14a) shows that if any one of these statements is true then all three are true. By proposition(7.4a)
on page 84 we know that A1 X, . . . , Ak X are pairwise independent. The result in box display(5.8b) on page 75 shows that
A1 X, . . . , Ak X are independent. Now (A1 X)T (A1 X) = XT A1 X because A1 is idempotent. Hence XT A1 X, . . . , XT Ak X
are independent. Proposition (7.11a) on page 87 implies XT Aj X/σ 2 ∼ χ2rj ,µT Aj µ/σ2 for j = 1, 2, . . . , k. Hence result.
Example(7.15b). Distribution of the sample variance from a normal sample. See also example(7.6c) on page 85.
Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution.
(a) Show that
Pn n Pn 2
j=1 Xj (n − 1)S 2 j=1 (Xj − X)
X
2
X= and (Xj − X) are independent. and = ∼ χ2n−1
n σ2 σ2
j=1
(b) Show that
X−µ
T =√ ∼ tn−1
S/ n
Pn Pn 2 2
Solution. (a) By expressing Xj = (Xj − X) + X we get j=1 Xj2 = j=1 Xj − X + n X . Now XT X = XT [In −
1/n]X + XT 1X/n where 1 is the n × n matrix with every entry equal to 1 (see §1.7 on page 61). So for this case we have
A1 = In − 1/n and A2 = 1/n. Clearly A1 A2 = 0 and the theorem can be applied, giving
Pn 2 2
XT [In − 1/n]X 2 XT 1X/n ( j=1 Xj ) nX
2
∼ χn−1 and = = 2 ∼ χ21,nµ2 /σ2
√ σ σ2 nσ 2 σ
(b) Now (X − µ)/(σ/ n) ∼ N (0, 1) and hence

(X − µ)/(σ/ n)
T = p ∼ tn−1 as required.
S 2 /σ 2
Converting proposition(7.14b) into a result about random variables gives the following.
Theorem(7.15c). Cochran’s theorem. Suppose the n-dimensional random vector X has the N (0, σ 2 I) distri-
bution and
XT X = XT A1 X + · · · + XT Ak X
where A1 , . . . , Ak are symmetric non-negative definite matrices with ranks r1 , . . . , rk with r1 + · · · + rk = n.
Then XT A1 X, . . . , XT Ak X are independent and
XT Aj X
∼ χ2rj for j = 1, 2, . . . , k.
σ2
Proof. Use proposition(7.14b). From §5.6, we know that an orthogonal transformation of a spherical normal is spherical
normal—hence Y1 , Y2 , . . . , Yn are i.i.d. random variables with the N (0, σ 2 ) distribution. Hence result.
Cochran’s theorem can be extended to a non-zero mean as follows:
Proposition(7.15d). Non-central version of Cochran’s theorem. Suppose the n-dimensional random vector
X has the N (µ, σ 2 I) distribution and
XT X = XT A1 X + · · · + XT Ak X
where A1 , . . . , Ak are symmetric non-negative definite matrices with ranks r1 , . . . , rk with r1 + · · · + rk = n.
Then XT A1 X, . . . , XT Ak X are independent and
XT Aj X
∼ χ2rj ,µT Aj µ/σ2 for j = 1, 2, . . . , k.
σ2
Proof. Use the transformation Y = LX constructed in proposition(7.14b); hence Y ∼ N (Lµ, σ 2 I) and Y1 , . . . , Yn are
independent.
From §18.2 on page 42 we know that if W ∼ N (µ, In ), then WT W ∼ χ2n,λ where λ = µT µ. Apply this to Z = MTj X/σ ∼
N (MTj µ/σ, Irj ); hence ZT Z ∼ χ2rj ,λj where λj = µT Mj MTj µ/σ 2 = µT Aj µ/σ 2 . Also ZT Z = XT Mj MTj X/σ 2 =
XT Aj X/σ 2 .
Page 92 §7 Jan 8, 2019(21:02) Bayesian Time Series Analysis

7.16 Matrix results when the sum is any symmetric matrix.


Proposition(7.16a). Suppose A, A1 and A2 are symmetric n × n matrices with
A = A1 + A2
Suppose further that A and A1 are idempotent and A2 is non-negative definite.
Then A1 A2 = A2 A1 = 0 and A2 is idempotent. Also rank(A) = r1 + r2 where r1 = rank(A1 ) and r2 = rank(A2 ).
Proof. First note that if C is a real symmetric matrix, then there exists an orthogonal matrix Q with C = QT DQ where D
is a diagonal matrix with the eigenvalues of C on the diagonal. Hence if C is also idempotent, then the eigenvalues are
either 0 or 1 and C is non-negative definite. Thus every real symmetric idempotent matrix is non-negative definite.
This implies A, A1 and A2 are all non-negative definite. Using the fact that A and A1 are idempotent gives
A = A2 = (A1 + A2 )2 = A1 + A1 A2 + A2 A1 + A22
Hence
A2 = A1 A2 + A2 A1 + A22 (7.16a)
Pre-multiplying by A1 and using the fact that A1 is idempotent gives A1 A2 A1 + A1 A22 = 0.
It is well known that trace(A + B) = trace(A) + trace(B) and trace(AB) = trace(BA). Hence
trace(A1 A2 A1 ) + trace(A2 A1 A2 ) = 0 (7.16b)
Now the (i, i)-element of A1 A2 A1 is βi A2 βiT where βi is row i of A1 . This quantity is non-negative because A2 is non-
negative definite. Similarly, because both A1 and A2 are non-negative definite, equation(7.16b) implies trace(A1 A2 A1 ) =
trace(A2 A1 A2 ) = 0. Clearly F = A1 A2 A1 is also symmetric and non-negative definite; hence all the eigenvalues of F are
non-negative; also because trace(F) = 0, the sum of the eigenvalues equals 0. Hence A1 A2 A1 = 0; hence A1 A2 A1 A2 = 0.
Now if B is a real symmetric matrix with B2 = 0, then because B can be diagonalized, it follows that we must have B = 0.
Hence A1 A2 = 0. Similarly A2 A1 = 0. Substituting in equation(7.16a) gives A22 = A2 .
The three matrices A, A1 and A2 commute in pairs. Hence by pages 559-560 of [H ARVILLE(1997)], the matrices are
simultaneously diagonalizable—this means there exists a non-singular matrix P with P−1 AP = D, P−1 A1 P = D1 and
P−1 A2 P = D2 where D, D1 and D2 are all diagonal. Let r = rank(A). Because A is idempotent, the diagonal of D consists
of r entries equal to 1 and the rest equal to 0. Similarly for D1 and D2 . Because D = D1 + D2 we have r = r1 + r2 .
We can extend the previous result to any finite number of terms as follows.
Corollary(7.16b). Suppose A, A1 , A2 , . . . , Ak are symmetric n × n matrices with
A = A1 + A2 + · · · + Ak
Suppose further that A and A1 , . . . , Ak−1 are idempotent and Ak is non-negative definite.
Then Ai Aj = 0 for i 6= j and Ak is idempotent. Also rank(A) = r1 + · · · + rk where rj = rank(Aj ) for
j = 1, . . . , k.
Proof. The proof is by induction. The previous proposition implies P(2). We shall now prove P(k) ⇒ P(k + 1).
So we are given A = A1 + B where B = A2 + · · · + Ak+1 , the matrices A, A1 , . . . , Ak are symmetric idempotent and Ak+1
is non-negative definite. Hence A1 , . . . , Ak+1 are all non-negative definite and hence B is non-negative definite. Hence
P(2) implies A1 B = BA1 = 0 and B is idempotent.
Now apply P(k) to B = A2 + · · · + Ak+1 which we can do because the k matrices B, A2 , . . . , Ak are all idempotent and
Ak+1 is non-negative definite. So P(k) implies Ak+1 is idempotent and Ai Aj = 0 for i 6= j and i,j ∈ {2, . . . , k + 1}.
Now apply P(2) to the decomposition A = Ak+1 + C where A and Ak+1 are both idempotent and C = A1 + · · · + Ak is
non-negative definite. Hence C is idempotent and we can apply P(k) to C = A1 + · · · + Ak which implies Ai Aj = 0 for
i 6= j and i,j ∈ {1, . . . , k}. Finally, A1 B = 0 then implies A1 Ak+1 = 0.
7.17 Other matrix results. Further results where the sum of the matrices is not necessarily the identity matrix.
Proposition(7.17a). Suppose A, A1 , . . . , Ak are real symmetric n × n matrices with A1 + · · · + Ak = A.
Let r = rank(A) and rj = rank(Aj ) for j = 1, 2, . . . , k. Consider the following 3 statements:
(1) A is idempotent (2) Aj is idempotent for every j = 1, 2, . . . , k (3) Ai Aj = 0 for all i 6= j.
If any two of these statements are true then all 3 are true and also r = r1 + · · · + rk .
Proof. We first show
(2) and (3) ⇒ (1). This result is easy because A2 = (A1 + · · · + Ak )2 = A1 + · · · + Ak by using (2) and (3).
(1) and (2) ⇒ (3). Now A is symmetric and idempotent with rank r; hence there exists an orthogonal matrix Q such
that  
I 0
QT AQ = r
0 0
Let Bj = QT Aj Q for j = 1, 2, . . . , k; hence
 
I 0
B1 + · · · + Bk = QT AQ = r (7.17a)
0 0
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §7 Page 93

Now suppose C is a non-negative definite matrix with entries cij ; then xT Cx ≥ 0 for all x ∈ Rn ; by choosing the
appropriate values for x we seePthat cii ≥ 0 for all i. Now suppose C is also symmetric and idempotent. Hence
C = C2 = CCT and hence cii = j c2ij . This shows that cii = 0 implies cij = cji = 0 for all j.
For our problem, every Bj is real, symmetric and idempotent and hence non-negative definite. So equation(7.17a) implies
that for every j = 1, 2, . . . , k there exists an r × r matrix Cj such that
 
Cj 0
Bj =
0 0
Because Bj is symmetric and idempotent, it follows that Cj is symmetric and idempotent; also equation(7.17a) implies
C1 + · · · + Ck = Ir . By proposition (7.14a), Ci Cj = 0 for all i 6= j. Hence Bi Bj = 0 for all i 6= j. Hence Ai Aj = 0 for
all i 6= j.
(1) and (3) ⇒ implies (2). Now Ai Aj = 0; hence ATj ATi = 0. But Ai and Aj are symmetric; hence Ai Aj = 0 = Aj Ai .
Hence Ai Aj = Aj Ai for all i 6= j. Hence there exists an orthogonal matrix Q and diagonal matrices D1 , . . . , Dk such that
QT Ai Q = Di for all i. Denote the diagonal elements of Di by {di1 , . . . , din }.
Let D = D1 +· · ·+Dk ; hence D is diagonal and QT AQ = D. Because A is symmetric and idempotent, the diagonal elements
of D are either 0 or 1. Also D = D2 = (D1 + · · · + Dk )2 = D21 + · · · + D2k because Di Dj = QT Ai QQT Aj Q = QT Ai Aj Q = 0
for all i 6= j. Picking out the (1, 1) element of the equation D21 + · · · + D2k = D gives d211 + · · · + d2k1 = 0 or 1. Arguing
this way shows that every diagonal element of every one of the matrices D1 , . . . , Dk is either 0 or 1. Hence every Di is
idempotent; and hence every Ai = QDi QT is idempotent, as required.
Suppose both (1) and (2) are both true. Because the matrices are idempotent, we have trace(A) = rank(A) and trace(Ai ) =
rank(Ai ). Hence
X X X
rank(A) = trace(A) = trace( Ai ) = trace(Ai ) = rank(Ai )
i i i
This completes the proof.
We now consider a variation of the previous result which is less useful—because we are no longer assuming the
matrices are symmetric. First we need a lemma
Lemma(7.17b). The rank cancellation rule. Suppose L1 and L2 are ` × m matrices, A is m × n1 and B is
n1 × n2 . Suppose further that L1 AB = L2 AB and rank(AB) = rank(A). Then L1 A = L2 A.
Proof. Let r = rank(A). Using the full rank factorization of A shows there exists an m × r matrix F and an n1 × r matrix
G with A = FGT and rank(F) = rank(G) = r.
Hence rank(AB) = rank(FGT B) = rank(GT B) because F has full column rank—see exercise 13. Hence rank(GT B) = r.
Now L1 AB = L2 AB; hence L1 FGT B = L2 FGT B; hence L1 FGT B(GT B)T = L2 FGT B(GT B)T . But the matrix GT B has
full row rank; hence GT B(GT B)T is non-singular by exercise 13. Hence L1 F = L2 F and hence L1 A = L2 A.
This lemma is used in the following proof
Proposition(7.17c). Suppose A, A1 , . . . , Ak are real n × n matrices with A1 + · · · + Ak = A. Suppose A is
idempotent. Let r = rank(A) and rj = rank(Aj ) for j = 1, 2, . . . , k. Consider the following 3 statements:
(1) Ai Aj = 0 for all i 6= j and rank(A2i ) = rank(Ai ) for all i.
(2) Aj is idempotent for every j = 1, 2, . . . , k
(3) r = r1 + · · · + rk .
Then each of the statements (1), (2) and (3) imply the other two.
Proof.
(1) ⇒ (2) Using Ai Aj = 0 gives Ai A = A2i and A2i A = A3i . We also need to use A is idempotent. Hence A2i = Ai A =
Ai A2 = (Ai A)A = A2i A = A3i .
Now rank(A2i ) = rank(Ai ). Now use rank cancellation rule with L1 = A = B = Ai and L2 = I. Hence A2i = Ai . Similarly,
every Ai is idempotent; hence (2).
(2) ⇒ (3) For an idempotent matrix, the rank equals the trace; hence
k
X k
X Xk
rank(Ai ) = trace(Ai ) = trace( Ai ) = trace(A) = rank(A)
i=1 i=1 i=1
and hence (3).
(3) ⇒ (1)
Let A0 = In − A. Hence A0 + A1 + · · · + Ak = In and A0 is idempotent. By exercise 7 on page 95, we know that for any
n × n matrix A, A is idempotent iff rank(A) + rank(I − A) = n. Applying this to A0 gives rank(A0 ) = n − rank(A). Hence
rank(A0 ) + rank(A1 ) + · · · + rank(Ak ) = n.
Because the rank of a sum of matrices is less than or equal the sum of the ranks, we have
rank(In − A1 ) ≤ rank(A0 ) + rank(A2 ) + · · · + rank(Ak ) = n − rank(A1 )
Page 94 §7 Jan 8, 2019(21:02) Bayesian Time Series Analysis

But In = A1 + (In − A1 ); hence n = rank(In ) ≤ rank(A1 ) + rank(In − A1 ). Hence rank(A1 ) + rank(In − A1 ) = n and
hence A1 is idempotent. Similarly every Ai is idempotent and hence rank(A2i ) = rank(Ai ).
Now In = (A1 + A2 ) + (In − A1 − A2 ). Hence n ≤ rank(A1 + A2 ) + rank(In − A1 − A2 ). But
rank(In − A1 − A2 ) = rank(A0 + A3 + · · · + Ak ) ≤ rank(A0 ) + rank(A3 ) + · · · + rank(Ak )
= n − rank(A1 ) − rank(A2 ) ≤ n − rank(A1 + A2 )
Hence rank(A1 + A2 ) + rank(In − A1 − A2 ) = n. Hence A1 + A2 is idempotent. Hence exercise 8 on page 95, we know
that A1 A2 = A2 A1 = 0.
Hence the proposition is proved.
7.18 Converting the matrix results into random variable results. Converting the matrix result in corol-
lary(7.16b) on page 92 into a result about random variables gives:
Proposition(7.18a). Suppose X ∼ N (0, σ 2 In ). Suppose further that A, A1 , . . . , Ak are k + 1 symmetric n × n
matrices with A = A1 + · · · + Ak . Suppose Ak is non-negative definite,
XT AX 2 XT Aj X
∼ χ r and ∼ χ2rj for j = 1, . . . , k − 1.
σ2 σ2
where r = rank(A) and rj = rank(Aj ) for j = 1, 2, . . . , k. Then XT A1 X, . . . , XT Ak X are independent and
XT Ak X
∼ χ2rk where rk = r − (r1 + · · · + rk−1 ).
σ2
Proof. By exercise 2, we know that A, A1 , . . . , Ak−1 are all idempotent. Hence corollary(7.16b) implies Ai Aj = 0 for
i 6= j and Ak is idempotent. Also r = r1 + · · · + rk .
By proposition(7.4a) on page 84 we know that A1 X, . . . , Ak X are pairwise independent. The result in box display(5.8b)
on page 75 shows that A1 X, . . . , Ak X are independent. Now (A1 X)T (A1 X) = XT A1 X because A1 is idempotent. Hence
XT A1 X, . . . , XT Ak X are independent.
We now generalize the previous result to non-zero mean and general non-singular variance matrix Σ.
Proposition(7.18b). Suppose the n-dimensional random vector X has the non-singular N (µ, Σ) distribution.
Suppose further that A1 , A2 , . . . , Ak are symmetric n × n matrices and A = A1 + · · · + Ak . Hence A is also
symmetric. Suppose AΣ, A1 Σ, . . . , Ak−1 Σ are all idempotent and Ak Σ is non-negative definite.
Let rj = rank(Aj ) for j = 1, 2, . . . , k and r = rank(A).
Then
• XT Aj X ∼ χ2rj ,µT Aj µ for j = 1, 2. . . . , k
• XT AX ∼ χ2r,µT Aµ
• XT A1 X, . . . , XT Ak X are independent
Also Ak Σ is idempotent, r = r1 + · · · + rk , and Ai ΣAj = 0 for all i 6= j.
Proof. Applying Corollary(7.16b) to AΣ, A1 Σ, . . . , Ak Σ shows that Ak Σ is idempotent and Ai ΣAj Σ = 0. Because Σ
is non-singular, this implies Ai ΣAj = 0. Using rank(AΣ) = rank(A) etc. shows that r = r1 + · · · + rk .
By proposition(7.11a) on page 87, XT Aj X ∼ χ2rj ,µT Aj µ and XT AX ∼ χ2r,µT Aµ .
By proposition(7.4a) on page 84 and the result in box display(5.8b), we know that A1 X, . . . , Ak X are independent. Now
(A1 X)T (A1 X) = XT A1 X because A1 is idempotent. Hence XT A1 X, . . . , XT Ak X are independent.
The next proposition is the random variable form of proposition(7.17a) on page 92.
Theorem(7.18c). Suppose the n-dimensional random vector X has the non-singular N (µ, Σ) distribution.
Suppose further that A, A1 , . . . , Ak are real symmetric n × n matrices with A1 + · · · + Ak = A. Let r = rank(A)
and rj = rank(Aj ) for j = 1, 2, . . . , k. Consider the following 3 statements:
(1) AΣ is idempotent (2) Aj Σ is idempotent for every j = 1, 2, . . . , k (3) Ai ΣAj = 0 for all i 6= j.
If any two of these statements are true then all three are true and also r = r1 + · · · + rk and
• XT Aj X ∼ χ2rj ,µT Aj µ for j = 1, 2, . . . , k
• XT AX ∼ χ2r,µT Aµ
• XT A1 X, . . . , XT Ak X are independent
Proof. Applying proposition(7.17a) on page 92 to the matrices AΣ, A1 Σ, . . . , Ak Σ shows that any two of the statements
are true then all three are true and also r = r1 + · · · + rk .
Because Σ is non-singular, rank(AΣ) = rank(A) = r and hence by proposition(7.11a) on page 87 we have XT AX ∼
χ2r,µT Aµ . Similarly XT Aj X ∼ χ2rj ,µT Aj µ for j = 1, 2, . . . , k.
Proposition(7.4a) on page 84 implies A1 X, . . . , Ak X are pairwise independent. The result in box display(5.8b) on page 75
shows that A1 X, . . . , Ak X are independent. By Zhang’s approach (see after proposition(7.7a) on page 86) we know that
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §8 Page 95

for any n × n symmetric matrix A we have XT AX = (AX)T A− (AX) = fA (AX) for some function fA where A− is a
generalized inverse of A. Hence XT A1 X, . . . , XT Ak X are independent.
The previous proposition implies the following special case:
Corollary(7.18d). Suppose the n-dimensional random vector X has the non-singular N (µ, σ 2 In ) distribution.
Suppose further that A, A1 , . . . , Ak are real symmetric n × n matrices with A1 + · · · + Ak = A. Let r = rank(A)
and rj = rank(Aj ) for j = 1, 2, . . . , k. Consider the following 3 statements:
(1) A is idempotent (2) Aj is idempotent for every j = 1, 2, . . . , k (3) Ai Aj = 0 for all i 6= j.
If any two of these statements are true then all three are true and also r = r1 + · · · + rk and
• XT Aj X/σ 2 ∼ χ2rj ,µT Aj µ/σ2 for j = 1, 2, . . . , k
• XT AX/σ 2 ∼ χ2r,µT Aµ/σ2
• XT A1 X, . . . , XT Ak X are independent
Note that it is not generally true that X ∼ χ2n,λ implies X/a ∼ χ2n,λ/a . However, in this case we see the theorem
implies XT AX ∼ χ2r,µT Aµ and, by applying the theorem to X/σ, that XT AX/σ 2 ∼ χ2r,µT Aµ/σ2

8 Exercises (exs-quadraticForms.tex)

1. Continuation of examples(1.7d) and (1.8d). Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distri-
bution. By using proposition(7.11a) on page 87 show that
Pn 2
(n − 1)S 2 j=1 (Xj − X)
= ∼ χ2n−1
σ2 σ2
2. Distribution of XT CX when C is idempotent—see §7.10. Suppose X1 , . . . , Xn are i.i.d. random variables with the
N (0, σ 2 ) distribution and C is an n × n symmetric matrix. Show that
XT CX
∼ χ2r iff C is idempotent with rank(C) = r.
σ2
Hint: the implication ⇐ has been proved in proposition(7.10a); use characteristic functions for ⇒ .
3. Independence of two normal quadratic forms—see §7.6. Suppose X is an n-dimensional random vector with the
N (0, σ 2 I) distribution and A and B are n × n symmetric idempotent matrices. Then XT AX and XT BX are independent
iff AB = 0.
Hint: the implication ⇐ has been proved in proposition(7.6a).
4. Independence of normal quadratic and normal linear form—see §7.5. Suppose X is an n-dimensional random vector
with the N (0, σ 2 I) distribution, a is n × 1 vector and B is a symmetric idempotent n × n matrix. Then aT X and XT BX
are independent random variables iff aT B = 0.
Hint: the implication ⇐ has been proved in proposition(7.5a).
5. Suppose the n-dimensional random vector X has the N (µ, Σ) distribution, A is an m × n real matrix and B is an
n × n symmetric matrix. Show that
cov[ AX, XT BX ] = 2AΣBµ
6. Suppose X1 and X2 are i.i.d. random variables with the N (0, σ 2 ) distribution. Find those linear functions of X1 and X2
which are independent of (X1 − X2 )2 .
7. Suppose A is a real n × n matrix. Show that A is idempotent iff rank(A) + rank(I − A) = n.
8. Suppose A and B are real idempotent n × n matrices. Prove that A + B is idempotent iff BA = AB = 0.
9. Suppose P1 and P2 are symmetric idempotent matrices and P1 −P2 is non-negative definite. Prove that P1 P2 = P2 P1 = P2
and P1 − P2 is idempotent.
10. Continuation of example(7.15b) on page 91. Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) dis-
tribution and µ0 6= µ. Show that
X − µ0
T = √
S/ n
has a non-central tn−1 distribution and find the value of the non-centrality parameter.
11. Suppose X1 , . . . , Xn are i.i.d. random variables with the N (µ, σ 2 ) distribution. Find the distribution of
2
nX
Y =
S2
Page 96 §8 Jan 8, 2019(21:02) Bayesian Time Series Analysis

12. Suppose the n-dimensional random vector X has the N (µ, Σ) distribution where µ = µ(1, 1, . . . , 1)T and
1 ρ ··· ρ
 
ρ 1 ··· ρ
Σ = σ2  2
 .. .. . . . ..  = σ [ (1 − ρ)In + ρ1 ]

. . .
ρ ρ ··· 1
Find the distribution of Pn 2
j=1 (Xj − X)
Y =
σ 2 (1 − ρ)

13. From page 75 of [H ARVILLE(1997)] we know that for any m × n matrix A we have rank(A) = rank(AAT ). Also
rank(A) = rank(AT ); hence rank(A) = rank(AT ) = rank(AAT ) = rank(AT A). Also, by page 37 of [H ARVILLE(1997)],
rank(AB) ≤ min{rank(A), rank(B)}.
(a) Suppose the m × n matrix A has full column rank, and hence n ≥ m. Show that AT A is non-singular.
(b) Suppose the m × n matrix A has full row rank, and hence m ≥ n. Show that AAT is non-singular.
(c) Suppose A is m × n and B is r × m. Show that rank(BA) = rank(BT BA). Hence deduce that if B has full column
rank then rank(BA) = rank(A).
(d) Suppose A is m × n and B is n × r with full row rank. Show that rank(AB) = rank(A).

14. The hat matrix in linear regression.


(a) Suppose x is an n × p real matrix with rank(x) = p < n. Let H = x(xT x)−1 xT . Show that H and In − H are both
idempotent and find rank(H) and rank(In − H).
(b) Suppose b is a p × 1 real vector and the n-dimensional random vector Y has the N (xb, σ 2 In ) distribution. Find the
values of E[YT HY] and E[YT (In − H)Y] and the distributions of YT HY/σ 2 and YT (In − H)Y/σ 2 .
(c) Show that YT HY and YT (In − H)Y are independent and hence write down the distribution of
YT HY/p
Y (In − H)Y/(n − p)
T

15. See proposition(7.7a) on page 86. Suppose X is an n-dimensional random vector with the non-singular N (µ, Σ) dis-
tribution. Suppose further that A is k × n matrix and B is a symmetric n × n matrix.
Prove that AΣB = 0 implies AX and XT BX are independent.

16. The following two results are used in the proof of proposition(7.2a).
(a) Suppose X ∼ N (0, 1). Prove that
t2
 
2 1
E[esX +tX
]= √ exp for t ∈ R and s < 12 .
1 − 2s 2(1 − 2s)
(b) Suppose F and G are n × r matrices such that In − FGT and Ir − GT F are non-singular. Show that
(In − FGT )−1 = In + F(Ir − GT F)−1 GT

17. Suppose Σ is an n × n non-negative definite matrix and A is an n × n symmetric matrix.


(a) Show that ΣAΣAΣ = ΣAΣ iff (ΣA)3 = (ΣA)2 .
(b) Show that rank(AΣ) = rank(ΣA) = rank(AΣA).
(c) Show that rank(AΣAΣ) = rank(ΣAΣ) = rank(ΣAΣA).
(d) Show that AΣ is idempotent iff Σ1/2 AΣ1/2 is idempotent.
(d) ⇒ Now AΣAΣ = AΣ. Premultiply by Σ1/2 and postmultiply by Σ−1/2 ; this gives Σ1/2 AΣAΣ1/2 as required.
⇐ Now Σ1/2 AΣAΣ1/2 = Σ1/2 AΣ1/2 . Premultiply by Σ−1/2 and postmultiply by Σ1/2 ; this gives AΣAΣ = AΣ as
required.

18. The following two results are used in part (c) of exercise 19 below.
(a) Suppose A is a real n × n non-singular matrix and let B = A−1 . Assuming the necessary differentiability, show that
2
d2 B d2 A

dB dA dA
= −B B and = −B B + 2 B B
dx dx dx2 dx2 dx
(b) Suppose A is an n × n real matrix with eigenvalues {λ1 , . . . , λn }. Show that
X
[trace(A)]2 = trace(A2 ) + 2 λi λj
i,j
i6 =j
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §9 Page 97

19. Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distribution and A is a real n × n
symmetric matrix.
(a) Suppose g(t) = |I − 2tAΣ|. Show that
d2 g

dg X
g(t)|t=0 = 1 and = −2 trace(AΣ) and = 8 λi λj
dt t=0 dt2 t=0

i,j
i6 =j

(b) The cumulant generating function of the random variable Y is KY (t) where

X κr tr
KY (t) = ln MY (t) =
r!
r=1
2

and then κ1 = µ and κ2 = σ 2 . Hence µ = dKdtY (t) and σ 2 = d K Y (t)
.

dt2
t=0 t=0
Hence prove that
var[XT AX] = 2 trace[ (AΣ)2 ] + 4µT AΣAµ

20. Suppose X is an n-dimensional random vector with the possibly singular N (µ, Σ) distribution and A is a real n × n
symmetric matrix.
(a) Suppose |t| < min{1/|2λ1 |, . . . , 1/|2λn |}. Show that
∞ r
1 X t r−1
− ln[|I − 2tAΣ|] = 2 trace[ (AΣ)r ]
2 r
r=1

(b) Suppose |t| < min{1/|2λ1 |, . . . , 1/|2λn |}. Show that



X
I − (I − 2tAΣ)−1 Σ−1 = − 2r tr (AΣ)r−1 A
 

r=1

(c) By using parts (a) and (b), prove that the r cumulant of the quadratic form XT AX is
th

κr = 2r−1 (r − 1)! trace(AΣ) ]r + 2r−1 r!µT (AΣ)r−1 Aµ


This result generalizes part (b) of exercise 19.

9 The bivariate t distribution


9.1 The bivariate t-distribution with equal variances. One possible version of the bivariate t-density is
−(ν+2)/2
x2 − 2ρxy + y 2

1
f(X,Y ) (x, y) = 1+ (9.1a)
ν(1 − ρ2 )
p
2π 1 − ρ2
for ν > 0, ρ ∈ (−1, 1), x ∈ R and y ∈ R.

If x denotes the 2 × 1 vector (x, y), then an alternative expression which is equivalent to equation(9.1a) is
ν (ν+2)/2  −(ν+2)/2
fX (x) = p ν + xT C−1 x (9.1b)
2π 1 − ρ2
   
−1 1 1 −ρ 1 ρ
where C = and C = .
1 − ρ2 −ρ 1 ρ 1
This distribution is called the tν (0, C) distribution. We shall see below in §9.3 on page 98 that C = corr[X] and
X and Y have equal variances.
9.2.Characterization of the bivariate tν (0, C) distribution. The univariate tν -distribution is the distribution of
p
Z W/ν where Z ∼ N (0, 1), W ∼ χ2 and Z and W are independent. The generalisation to 2 dimensions is:
ν
Proposition(9.2a). Suppose Z = (Z1 , Z2 ) ∼ N (0, C) where
 
1 ρ
C= and ρ ∈ (−1, 1)
ρ 1
Suppose further that W ∼ χ2ν and Z and W are independent. Define X = (X, Y ) by
Z
X=
(W/ν)1/2
Then X = (X, Y ) has the tν (0, C) density given in (9.1a).
Page 98 §9 Jan 8, 2019(21:02) Bayesian Time Series Analysis

Proof. The density of (Z1 , Z2 , W ) is


wν/2−1
 2
z − 2ρz1 z2 + z22
 h wi
1
f(Z1 ,Z2 ,W ) (z1 , z2 , w) = exp − 1 exp −
2(1 − ρ2 ) 2ν/2 Γ( ν2 )
p
2π 1 − ρ2 2
.p .p
Consider the transformation to (X, Y, W ) where X = Z1 W/ν , Y = Z W/ν and W = W . This is a 1 − 1
2
transformation and the absolute value of the Jacobian
is
∂(x, y, w) ν
∂(z1 , z2 , w) = w

Hence
wν/2 w x2 − 2ρxy + y 2
  
w
f(X,Y,W ) (x, y, w) = f (z1 , z2 , w) = ν exp − + 1
ν(1 − ρ2 )
p
ν 2 2 +1 πνΓ( ν2 ) 1 − ρ2 2
wν/2 h wα i x2 − 2ρxy + y 2
= ν exp − where α = +1 (9.2a)
ν(1 − ρ2 )
ν
p
2
+1

2 2 πνΓ 21 − ρ2
Now using the integral of the χ2n density is 1 gives
Z ∞ h xi
n
x 2 −1 exp − dx = 2n/2 Γ n/2

0 2
which implies
Z ∞     ν2 +1    ν   2  ν2 +1  ν 
ν tα 2 ν
t 2 exp − dt = Γ +1 = Γ
0 2 α 2 2 α 2
Integrating the variable w out of equation(9.2a) gives
−(ν+2)/2
x2 − 2ρxy + y 2

1
f(X,Y ) (x.y) = 1+ for (x, y) ∈ R2
ν(1 − ρ2 )
p
2π 1 − ρ2
which is equation(9.1a) above.
9.3 Properties of the bivariate tν (0, C) distribution.
• The marginal distributions. Both X and Y have t-distributions with ν degrees of freedom. The proof of this is
left to exercise 1 on page 101.
• Moments. E[X] = E[Y ] = 0 and var[X] = var[Y ] = ν/(ν − 2) for ν > 2. The correlation is corr[X, Y ] = ρ
and the covariance is cov[X, Y ] = ρν/(ν − 2). The proof of these results is left to exercise 2 on page 101. It
follows that  
ν 1 ρ ν
var[X] = = C and corr[X] = C
ν−2 ρ 1 ν−2
• If ρ = 0, then equation(9.1a) becomes
−(ν+2)/2
x2 + y 2

1
f(X,Y ) (x, y) = 1+
2π ν
Note that f(X,Y ) (x, y) 6= fX (x)fY (y) and hence X and Y are not independent even when ρ = 0.
9.4 Generalisation to non-equal variances. Suppose T1 = aX and T2 = bY where a 6= 0 and b 6= 0 and
X = (X, Y ) ∼ tν (0, C).
 Thus    2 
T1 a 0 ν a abρ ν
T= = X and Σ = var[T] = = R
T2 0 b ν − 2 abρ b2 ν−2
where 8  2   2 
a abρ −1 1 b −abρ 1
R= and R = 2 2 and |R−1 | = 2 2
abρ b2 a b (1 − ρ ) 2 −abρ a 2
a b (1 − ρ2 )
The absolute value of the Jacobian is |ab|. Substituting in equation(9.1a) on page 97 gives
−(ν+2)/2
b2 t21 − 2ρabt1 t2 + a2 t22

1
fT (t) = 1+
νa2 b2 (1 − ρ2 )
p
2π|ab| 1 − ρ2
−(ν+2)/2
tT R−1 t ν (ν+2)/2 

1 T −1 −(ν+2)/2

= 1 + = ν + t R t
2π|R|1/2 ν 2π|R|1/2
8
In general, the inverse of the 2 × 2 symmetric matrix
   
a c 1 b −c
is
c b ab − c2 −c a
2
provided ab 6= c .
2 Multivariate Continuous Distributions Jan 8, 2019(21:02) §10 Page 99
ν
This is the tν (0, R) distribution. Note that var[T] = ν−2 R.

10 The multivariate t distribution


10.1 The density of the multivariate t-distribution, tν (0, I). If we put ρ = 0 in equation(9.1b) we see that if
T ∼ tν (0, I) then
ν (ν+2)/2  −(ν+2)/2
fT (t) = ν + tT t

Generalizing to p-dimensions leads to the following definition.
Definition(10.1a). The p-dimensional random vector T has the t-distribution tν ( 0 , I ) iff T has density
p×1 p×1 p×p
1
f (t) ∝  (ν+p)/2
ν + tT t
where ν ∈ R and ν > 2.
An alternative expression is:
κ
f (t1 , . . . , tp ) =  (ν+p)/2
ν+ t21 + · · · + t2p
The constant of proportionality, κ, can be determined by integration. Integrating out tp gives
Z ∞ Z ∞
dtp dtp
f (t1 , . . . , tp−1 ) = κ  (ν+p)/2 = 2κ  (ν+p)/2
2
−∞ ν + t + · · · + tp 2 0 2
ν + t1 + · · · + t2p
1
Z ∞
dtp 2 2
= 2κ  (ν+p)/2 where α = ν + t1 + · · · + tp−1
0 α + tp2
Z ∞
2κ dtp
= (ν+p)/2 (ν+p)/2
α

0 1 + t2p /α
√ Z ∞
2κ α dx √ √
= (ν+p)/2 √ (ν+p)/2 where x = tp ν + p − 1/ α
α ν+p−1 0

2
1 + x /(ν + p − 1)
Using the standard result that
Z ∞ −(n+1)/2 √  
t2 √ nΓ 1/2 Γ n/2
2 1+ dt = nB(1/2, n/2) = 
0 n Γ (n+1)/2
implies

κ πΓ( (ν+p−1)/2)
f (t1 , . . . , tp−1 ) =
α(ν+p−1)/2 Γ( (ν+p)/2)

κ π Γ( (ν+p−1)/2) 1
=
Γ( (ν+p)/2) [ν + t1 + · · · + t2p−1 ](ν+p−1)/2
2

By induction
κπ (p−1)/2 Γ( (ν+1)/2)
f (t1 ) =  (ν+1)/2
Γ( (ν+p)/2) ν + t21
and so
ν ν/2 Γ( (ν+p)/2)
κ=
π p/2 Γ( ν/2)

It follows that the density of the p-dimensional tν (0, I) is


ν ν/2 Γ( (ν+p)/2) 1
f (t) = p/2
(10.1a)
π Γ( /2) ν + tT t (ν+p)/2
ν  
Page 100 §10 Jan 8, 2019(21:02) Bayesian Time Series Analysis

10.2 Characterization of the tν (0, I) distribution.


Proposition(10.2a). Suppose Z1 , Z2 , . . . , Zp are i.i.d. with the N (0, 1) distribution and W has the χ2ν dis-
tribution. Suppose further that Z = (Z1 , Z2 , . . . , Zp ) and W are independent. Define T = (T1 , T2 , . . . , Tp )
by
Z
T=
(W/ν)1/2
Then T has the density in equation(10.1a).
Proof. See exercise 3 on page 101.
10.3 Properties of the tν (0, I) distribution.
• TT T/p has the F (p, ν) distribution—see exercise 6 on page 101.
• The contours of the distribution are ellipsoidal (the product of independent t distributions does not have this
property).
• The marginal distribution of an r-dimensional subset of T has the tν (0, I) distribution. In particular, each Ti
has the tν distribution. These results follow immediately from the characterization in §10.2.
ν
• E[T] = 0 and var[T] = E[TTT ] = ν−2 I for ν > 2. (Because W ∼ χ2ν implies E[1/W ] = 1/(ν − 2).)
Finally, corr[T] = I.
10.4 The p-dimensional t-distribution: tν (m, C).
Here C is a real, symmetric, positive definite p × p matrix.
The Cholesky decomposition implies there exists a real and nonsingular L with C = LL^T. Let
    V = m + L T   where T ∼ t_ν(0, I), V, m and T are p × 1 and L is p × p
Then E[V] = m and var(V) = L var(T) L^T = \frac{\nu}{\nu-2} LL^T = \frac{\nu}{\nu-2} C. See exercise 4 on page 101 for the proof of the result
    T^T T = (V − m)^T C^{-1} (V − m)                                                        (10.4a)
It follows that V has density:
    f(v) = \frac{\kappa}{|L|\left[\nu + (v-m)^T C^{-1} (v-m)\right]^{(\nu+p)/2}}   where \kappa = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)} and |L| = |C|^{1/2}          (10.4b)

A random variable which has the density given in equation(10.4b) is said to have the tν (m, C) distribution.
Definition(10.4a). Suppose C is a real, symmetric, positive definite p × p matrix and m is a p × 1 vector in R^p.
Then the p-dimensional random vector V has the t_ν(m, C) distribution iff V has the density
    f(v) \propto \frac{1}{\left[\nu + (v-m)^T C^{-1} (v-m)\right]^{(\nu+p)/2}}
It follows that
    E[V] = m   and   var[V] = \frac{\nu}{\nu-2} C
and the constant of proportionality is given in equation(10.4b).
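A minimal sampling sketch based on the construction above (Python with NumPy; the particular m, C, ν and sample size are arbitrary illustrative values, not taken from the notes): draw T ∼ t_ν(0, I) via the characterization in §10.2 and set V = m + LT with L the Cholesky factor of C.
    import numpy as np

    rng = np.random.default_rng(2)
    nu = 6
    m = np.array([1.0, -2.0])
    C = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
    L = np.linalg.cholesky(C)                  # C = L L^T

    n = 200_000
    Z = rng.standard_normal((n, 2))
    W = rng.chisquare(nu, size=n)
    T = Z / np.sqrt(W / nu)[:, None]           # T ~ t_nu(0, I)
    V = m + T @ L.T                            # rows ~ t_nu(m, C)

    print(V.mean(axis=0))                      # close to m
    print(np.cov(V, rowvar=False))             # close to nu/(nu-2) * C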
10.5 Linear transformation of the t_ν(m, C) distribution. Suppose T ∼ t_ν(m, C); thus m is the mean vector and \frac{\nu}{\nu-2} C is the covariance matrix of the random vector T. Suppose V = a + AT where A is non-singular.
It follows that T = A^{-1}(V − a), E[V] = a + Am and var[V] = \frac{\nu}{\nu-2} ACA^T.
Let m_1 = a + Am and C_1 = ACA^T. Then V has the t_ν(m_1, C_1) distribution—see exercise 5 on page 101.
10.6 Characterization of the tν (m, C) distribution.
Proposition(10.6a). Suppose Z has the non-singular multivariate normal distribution N(0, Σ) and W has the χ²_ν distribution. Suppose further that Z and W are independent. Then T = m + Z/(W/ν)^{1/2} has the t_ν(m, Σ) distribution.
Proof. Because Z has a non-singular distribution, Σ is positive definite and there exists a symmetric non-singular Q with Σ = QQ. Let Y = Q^{-1}Z. Then var[Y] = Q^{-1} var[Z] (Q^{-1})^T = I. So Y ∼ N(0, I). Hence
    T_1 = \frac{Y}{\sqrt{W/\nu}} ∼ t_ν(0, I)
Using §10.5 gives T = m + QT_1 ∼ t_ν(m, Σ) as required.

10.7 Summary.
• The bivariate t-distribution t_ν(0, R). This has E[T] = 0 and var[T] = \frac{\nu}{\nu-2} R. The density is
    f_T(t) = \frac{\nu^{(\nu+2)/2}}{2\pi|R|^{1/2}} \left[\nu + t^T R^{-1} t\right]^{-(\nu+2)/2}
  Particular case:
    f_T(t) = \frac{1}{2\pi\sqrt{1-\rho^2}} \left[1 + \frac{t_1^2 - 2\rho t_1 t_2 + t_2^2}{\nu(1-\rho^2)}\right]^{-(\nu+2)/2}   where var[T] = \frac{\nu}{\nu-2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
• The p-dimensional t-distribution t_ν(m, R). This has E[T] = m and var[T] = \frac{\nu}{\nu-2} R. The density is
    f(t) = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)\,|R|^{1/2}} \cdot \frac{1}{\left[\nu + (t-m)^T R^{-1} (t-m)\right]^{(\nu+p)/2}}
• Characterization of the t-distribution. Suppose Z ∼ N(0, Σ) and W has the χ²_ν distribution. Suppose further that Z and W are independent. Then T = m + Z/(W/ν)^{1/2} has the t_ν(m, Σ) distribution.
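As a cross-check of the p-dimensional density in this summary, the following sketch compares it with the multivariate t density in SciPy (scipy.stats.multivariate_t, available in SciPy 1.6 or later); the particular ν, m, R and evaluation point are arbitrary.
    import numpy as np
    from scipy import stats
    from scipy.special import gammaln

    nu = 5
    m = np.array([0.5, -1.0, 2.0])
    R = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])
    p = len(m)

    def t_density(t, nu, m, R):
        # nu^{nu/2} Gamma((nu+p)/2) / (pi^{p/2} Gamma(nu/2) |R|^{1/2})
        #   * [nu + (t-m)^T R^{-1} (t-m)]^{-(nu+p)/2}
        q = (t - m) @ np.linalg.solve(R, t - m)
        logc = (nu / 2) * np.log(nu) + gammaln((nu + p) / 2) \
               - (p / 2) * np.log(np.pi) - gammaln(nu / 2) \
               - 0.5 * np.linalg.slogdet(R)[1]
        return np.exp(logc - ((nu + p) / 2) * np.log(nu + q))

    t = np.array([1.0, 0.0, 1.0])
    print(t_density(t, nu, m, R))
    print(stats.multivariate_t(loc=m, shape=R, df=nu).pdf(t))   # should agree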

11 Exercises (exs-t.tex.tex)

1. Suppose T has the bivariate t-density given in equation(9.1a) on page 97. Show that both marginal distributions are the
tν -distribution and hence have density given in equation(16.1b) on page 36:
    f(t) = \frac{1}{B(1/2, \nu/2)\sqrt{\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}   for t ∈ R.
2. Suppose T has the bivariate t-density given in equation(9.1a) on page 97 and ν > 2.
(a) Find E[X] and var[X].
(b) Find cov[X, Y ] and corr[X, Y ].
3. Prove proposition(10.2a) on page 100: Suppose Z1 , Z2 , . . . , Zp are i.i.d. with the N (0, 1) distribution and W has the
χ2ν distribution.
 Suppose further that Z = (Z1 , Z2 , . . . , Zp ) and W are independent. Define T = (T1 , T2 , . . . , Tp ) by
T = Z/(W/ν)^{1/2}. Then T has the following density
    f(t) = \frac{\nu^{\nu/2}\,\Gamma((\nu+p)/2)}{\pi^{p/2}\,\Gamma(\nu/2)} \cdot \frac{1}{\left[\nu + t^T t\right]^{(\nu+p)/2}}

4. Prove equation(10.4a) on page 100: TT T = (V − m)T C−1 (V − m).


5. See §10.5 on page 100. Suppose T ∼ tν (m, C) and V = a + AT where A is non-singular. Prove that V ∼ tν (m1 , C1 )
where m1 = a + Am and C1 = ACAT .
6. Suppose the p-variate random vector T has the tν (0, I) distribution. Show that TT T/p has the F (p, ν) distribution.

12 The Dirichlet distribution


12.1 Two and three dimensions. Suppose X1 ∼ Gamma(k1 , α) and X2 ∼ Gamma(k2 , α) and X1 and X2
are independent. Let Y1 = X1 /(X1 + X2 ), Y2 = X2 /(X1 + X2 ) and Z = X1 + X2 . Then Y1 ∼ Beta(k1 , k2 ),
Y2 ∼ Beta(k2 , k1 ) and Z ∼ Gamma(k1 + k2 , α). Also Y1 and Z are independent and Y2 and Z are independent.
This is shown in exercise 5 on page 24. Note that the random vector (Y1 , Y2 ) does not have a density function
because Y1 + Y2 = 1. The density of Y1 is
    f_{Y_1}(y_1) = \frac{\Gamma(k_1 + k_2)}{\Gamma(k_1)\Gamma(k_2)}\, y_1^{k_1-1}(1 - y_1)^{k_2-1}   for 0 < y_1 < 1.
Definition(12.1a). Suppose k1 > 0 and k2 > 0; then the two dimensional random vector (Y1 , Y2 ) has the
Dirichlet distribution Dir(k1 , k2 ) iff Y1 ∼ Beta(k1 , k2 ) and Y2 = 1 − Y1 .
This definition implies Y2 ∼ Beta(k2 , k1 ) and Y1 = 1 − Y2 ; hence this is an equivalent definition of the same
Dirichlet distribution.

Note that if k_1 and k_2 are integers and we consider k_1 + k_2 − 2 independent Bernoulli trials in which the probability of success on any trial is y_1, then the probability of exactly k_1 − 1 successes is f_{Y_1}(y_1)/(k_1 + k_2 − 1).
Consider a move to three dimensions: suppose X1 ∼ Gamma(k1 , α), X2 ∼ Gamma(k2 , α), X3 ∼ Gamma(k3 , α)
and X1 , X2 and X3 are independent. The joint density of X = (X1 , X2 , X3 ) is
    f_X(x) = \alpha^{k_1+k_2+k_3} \frac{x_1^{k_1-1} x_2^{k_2-1} x_3^{k_3-1} e^{-\alpha(x_1+x_2+x_3)}}{\Gamma(k_1)\Gamma(k_2)\Gamma(k_3)}   for x_1 > 0, x_2 > 0 and x_3 > 0.
Let W = (Y1 , Y2 , Z) where
    Y_1 = \frac{X_1}{X_1 + X_2 + X_3},   Y_2 = \frac{X_2}{X_1 + X_2 + X_3}   and   Z = X_1 + X_2 + X_3
Hence the transformation X = (X_1, X_2, X_3) ↦ W = (Y_1, Y_2, Z) maps
    {(x_1, x_2, x_3) ∈ R^3 : x_1 > 0, x_2 > 0, x_3 > 0} ↦ {(y_1, y_2, z) ∈ R^3 : y_1 > 0, y_2 > 0, y_1 + y_2 < 1, z > 0}
Note that X1 = Y1 Z, X2 = Y2 Z and X3 = Z(1−Y1 −Y2 ). The absolute value of the Jacobian of the transformation
is
    \left|\frac{\partial(x_1, x_2, x_3)}{\partial(y_1, y_2, z)}\right| = \begin{vmatrix} z & 0 & y_1 \\ 0 & z & y_2 \\ -z & -z & 1-y_1-y_2 \end{vmatrix} = \begin{vmatrix} z & 0 & y_1 \\ 0 & z & y_2 \\ 0 & 0 & 1 \end{vmatrix} = z^2

where we have added the first and second rows of the first determinant to the third row in order to get the second
determinant. Hence the density of (Y1 , Y2 , Z) is
    f_{(Y_1,Y_2,Z)}(y_1, y_2, z) = \frac{\alpha^{k_1+k_2+k_3}\, y_1^{k_1-1} y_2^{k_2-1} (1-y_1-y_2)^{k_3-1}}{\Gamma(k_1)\Gamma(k_2)\Gamma(k_3)}\, e^{-\alpha z} z^{k_1+k_2+k_3-1}
for y1 > 0, y2 > 0, y1 + y2 < 1, z > 0.
This density factorizes into the densities of (Y1 , Y2 ) and Z; hence (Y1 , Y2 ) is independent of Z. Clearly Z ∼
Gamma(k1 + k2 + k3 , α). Integrating out z gives the density of (Y1 , Y2 ):
    f_{(Y_1,Y_2)}(y_1, y_2) = \frac{\Gamma(k_1+k_2+k_3)}{\Gamma(k_1)\Gamma(k_2)\Gamma(k_3)}\, y_1^{k_1-1} y_2^{k_2-1} (1-y_1-y_2)^{k_3-1}   for y_1 > 0, y_2 > 0, y_1 + y_2 < 1.   (12.1a)
Definition(12.1b). Suppose k1 > 0, k2 > 0 and k3 > 0. Then the random vector (Y1 , Y2 , Y3 ) has the Dirichlet
distribution Dir(k1 , k2 , k3 ) iff (Y1 , Y2 ) has the density in equation(12.1a) and Y3 = 1 − Y1 − Y2 .
Given definition(12.1b), consider the transformation (Y_1, Y_2) ↦ (Y_1, Y_3 = 1 − Y_1 − Y_2). The absolute value of
the Jacobian is 1 and hence the density of (Y1 , Y3 ) is
    f_{(Y_1,Y_3)}(y_1, y_3) = \frac{\Gamma(k_1+k_2+k_3)}{\Gamma(k_1)\Gamma(k_2)\Gamma(k_3)}\, y_1^{k_1-1} y_3^{k_3-1} (1-y_1-y_3)^{k_2-1}   for y_1 > 0, y_3 > 0, y_1 + y_3 < 1.
Integrating out Y3 shows that Y1 ∼ Beta(k1 , k2 + k3 ), and so on.
Similarly for the density of (Y2 , Y3 ).
12.2 The general case. Now suppose X1 , . . . , Xn are independent random variables and Xj ∼ Gamma(kj , α)
for j = 1, . . . , n. Let
    Y_j = \frac{X_j}{X_1 + \cdots + X_n}   for j = 1, ..., n.
and let Z = X_1 + ··· + X_n. Then the absolute value of the Jacobian of the transformation (X_1, ..., X_n) ↦ (Y_1, ..., Y_{n-1}, Z) is now z^{n-1}. The density of (Y_1, ..., Y_{n-1}, Z) is
    f_{(Y_1,\dots,Y_{n-1},Z)}(y_1, \dots, y_{n-1}, z) = \frac{\alpha^{k_1+\cdots+k_n}\, y_1^{k_1-1} \cdots y_{n-1}^{k_{n-1}-1} (1-y_1-\cdots-y_{n-1})^{k_n-1}}{\Gamma(k_1)\cdots\Gamma(k_n)}\, e^{-\alpha z} z^{k_1+\cdots+k_n-1}
and hence Z is independent of (Y1 , . . . , Yn−1 ) and the density of (Y1 , . . . , Yn−1 ) can be written down.
Definition(12.2a). Suppose n ∈ {2, 3, ...} and (k_1, ..., k_n) ∈ (0, ∞)^n. Then the random vector (Y_1, ..., Y_n) has the Dirichlet distribution Dir(k_1, ..., k_n) iff Y_n = 1 − \sum_{j=1}^{n-1} Y_j and (Y_1, ..., Y_{n-1}) has the density:
    f_{(Y_1,\dots,Y_{n-1})}(y_1, \dots, y_{n-1}) = \frac{\Gamma(k_1+\cdots+k_n)}{\Gamma(k_1)\cdots\Gamma(k_n)}\, y_1^{k_1-1} \cdots y_{n-1}^{k_{n-1}-1} (1-y_1-\cdots-y_{n-1})^{k_n-1}
    for y_1 > 0, y_2 > 0, ..., y_{n-1} > 0 and y_1 + y_2 + \cdots + y_{n-1} < 1.
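The construction behind this definition (independent gammas normalized by their sum) can be checked by simulation; a minimal sketch, with arbitrary k_j, α and sample size:
    import numpy as np

    rng = np.random.default_rng(3)
    k = np.array([2.0, 3.0, 5.0])
    alpha, n = 1.5, 200_000

    # X_j ~ Gamma(k_j, alpha) independent (rate alpha, i.e. scale 1/alpha)
    X = rng.gamma(shape=k, scale=1 / alpha, size=(n, len(k)))
    Z = X.sum(axis=1)
    Y = X / Z[:, None]                              # rows ~ Dir(k_1, ..., k_n)

    print(Y.mean(axis=0), k / k.sum())              # E[Y_j] = k_j / (k_1 + ... + k_n)
    print(np.corrcoef(Y[:, 0], Z)[0, 1])            # ~ 0: (Y_1, ..., Y_n) independent of Z
    print(rng.dirichlet(k, size=n).mean(axis=0))    # numpy's direct Dirichlet sampler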
12.3 Properties.

TO DO
APPENDIX

Answers
Chapter 1 Section 3 on page 7 (exs-basic.tex)

1. (a) A = £1,000 × 1.04 × (1 + V_1) × (1 + V_2) = £1,000 × 1.04 × (1.04 + U_1)(1.04 + 2U_2). Hence E[A] = £1,000 × 1.04^3 = £1,124.864 or £1,124.86.
(b) For this case
    C = \frac{1{,}000}{1.04(1.04 + U_1)(1.04 + 2U_2)}   and   E[C] = \frac{1{,}000}{1.04}\, E\!\left[\frac{1}{1.04 + U_1}\right] E\!\left[\frac{1}{1.04 + 2U_2}\right]
Now
    E\!\left[\frac{1}{1.04 + U_1}\right] = 50\int_{-0.01}^{0.01} \frac{du}{1.04 + u} = 50\,\ln(1.04 + u)\Big|_{-0.01}^{0.01} = 50(\ln 1.05 - \ln 1.03)
    E\!\left[\frac{1}{1.04 + 2U_2}\right] = 50\int_{-0.01}^{0.01} \frac{du}{1.04 + 2u} = \frac{50}{2}\,\ln(1.04 + 2u)\Big|_{-0.01}^{0.01} = 25(\ln 1.06 - \ln 1.02)
Hence E[C] = 889.133375744 or £889.13.
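A quick numerical check of part (b) (assuming, as the integration limits above indicate, that U_1 and U_2 are independent U(−0.01, 0.01) variables):
    import numpy as np

    rng = np.random.default_rng(4)
    U1 = rng.uniform(-0.01, 0.01, size=2_000_000)
    U2 = rng.uniform(-0.01, 0.01, size=2_000_000)

    C = 1000 / (1.04 * (1.04 + U1) * (1.04 + 2 * U2))
    closed_form = (1000 / 1.04) * 50 * np.log(1.05 / 1.03) * 25 * np.log(1.06 / 1.02)
    print(C.mean(), closed_form)                    # both close to 889.13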
2. Clearly −2a < W < 2a. For w ∈ (−2a, 2a) we have
    f_W(w) = \int f_X(x) f_Y(w - x)\, dx
Now −a < x < a and −a < w − x < a; hence w − a < x < w + a. Hence
    f_W(w) = \int_{\max(-a,\, w-a)}^{\min(a,\, w+a)} f_X(x) f_Y(w - x)\, dx = \frac{1}{4a^2}\left[\min(a, w+a) - \max(-a, w-a)\right]
           = \begin{cases} (2a - w)/4a^2 & \text{if } w > 0 \\ (w + 2a)/4a^2 & \text{if } w < 0 \end{cases} = \frac{1}{2a}\left(1 - \frac{|w|}{2a}\right)
Figure(2a). The shape of the triangular density: a triangle on (−2a, 2a) with peak 1/(2a) at w = 0.
(wmf/triangulardensity,60mm,21mm)
dy
3. Clearly 0 ≤ Y < 1; also dx = 4x3 = 4y 3/4 .
X fX (x) X fX (x) X 1 1
fY (y) = = = = 3/4
x
| dy/dx|
x
4y 3/4
x
8y 3/4 4y
4. Now (X − 1)2 ≥ 0; hence X 2 + 1 ≥ 2X. Because X > 0 a.e., we have X + 1/X ≥ 2 a.e. Hence result.
5. For parts (a) and (b):
Z ∞ Z ∞Z ∞ Z ∞Z t Z ∞
rxr−1 [1 − F (x)] dx = rxr−1 f (t) dt dx = rxr−1 f (t)dx dt = tr f (t) dt = E[X]
0 x=0 t=x t=0 x=0 t=0
6. (a) Jensen’s Inequality is as follows: suppose X is a random variable with a finite expectation and φ : R → R is a
convex function. Then φ (E[X]) ≤ E [φ(X)].
In particular, suppose φ(x) = 1/x, then φ is a convex function on (0, ∞). Hence if X is positive random  variable
 with
finite expectation, then 1/E[X] ≤ E[1/X]. Trivially, the result is still true if E[X] = ∞. Hence E 1/Sn ≥ 1/(nµ).
(b) Z ∞ Z ∞ Z ∞   
n 1
E[e−tX ] dt = E[e−t(X1 +···+Xn ) ] dt = E e−t(X1 +···+Xn ) dt = E
0 0 0 S n
by using the Fubini-Tonelli theorem that the order of integration can be changed for a non-negative integrand.
7. (a) The arithmetic mean-geometric mean inequality gives
x1 + · · · + xn √
≥ n x1 · · · xn for all x1 > 0, . . . , xn > 0.
n
Hence
1 1

x1 + · · · + xn 1/n
nx1 · · · xn
1/n

Using independence gives


  " #!n
1 1 1
E ≤ E
Sn n X1
1/n


Now for x > 0 we have 


1 1/x if 0 < x ≤ 1;

x1/n 1 if x ≥ 1.
Hence E[1/Sn ] is finite. (b) Because they have identical distributions, E[X1 /Sn ] = · · · = E[Xn /Sn ]. Hence
X1 + · · · + Xn X1 + · · · + Xj
           
Sn X1 Sj X1 j
1=E =E = nE Hence E =E = jE =
Sn Sn Sn Sn Sn Sn n

8. (a) Recall |cov[X1 , X2 ]| ≤ var[X1 ] var[X2 ]; hence cov[ /Y , Y ] is finite. Hence E[X] = E[ /Y ] E[Y ]. Also
X X
2 2 2
E[( X/Y )X] = E[( X /Y 2 )Y ] = E[ X /Y 2 ] E[Y ] because X /Y 2 is independent of Y . Hence
2 2
0 = cov[ X/Y , X] = E[( X/Y )X] − E[ X/Y ] E[X] = E[ X /Y 2 ] E[Y ] − {E[ X/Y ]} E[Y ] = var[ X/Y ]E[Y ]
As E[Y ] > 0, it follows that var[ X/Y ] = 0 as required.
(b) Clearly ln( Y /X ) = ln(Y ) − ln(X) is independent of ln(X). Using characteristic functions, φln(Y /X) (t) φln(X) (t) =
φln(Y ) (t). Also ln( X/Y ) = ln(X) − ln(Y ) is independent of ln(Y ). Hence φln(X/Y ) (t) φln(Y ) (t) = φln(X) (t). Hence
φln(Y /X) (t)φln(X/Y ) (t) = 1. But for any characteristic function |φ(t)| ≤ 1. Hence |φln(Y /X) (t)| = 1 everywhere. This
implies1 ln(Y /X) is constant almost everywhere and this establishes the result.
9. Suppose W = X − Y . Then for w < 0 we have
Z ln(a) Z ln(a) w 2y
e e ew
fW (w) = fX (w + y)fY (y) dy = dy =
y=−∞ −∞ a2 2
For w > 0 we have
Z ln(a)−w Z ln(a)−w w 2y
e e e−w
fW (w) = fX (w + y)fY (y) dy = dy =
y=−∞ −∞ a2 2
Hence density of |W | is e−w for w > 0. This is the exponential (1) distribution.
10. Denote E[Y |X] by h(X). Then
h 2 i h 2 i
E Y − g(X) = E Y − h(X) + h(X) − g(X)
h 2 i    h 2 i
= E Y − h(X) + 2E Y − h(X) h(X) − g(X) + E h(X) − g(X)
But
       
E Y − h(X) h(X) − g(X) |X = h(X) − g(X) E Y − h(X) |X = h(X) − g(X) .0
Hence result.
11. E[Y |X] = X/2; hence E[Y ] = E[X]/2 = 1/4. Now var( E[Y |X] ) = var[X]/4 = 1/48. Finally var[Y |X = x] =
x2 /12 and hence E[ var(Y |X) ] = E[X 2 ]/12 = 1/36. Hence var[Y ] = 1/36 + 1/48 = 7/144.
12. Now 
 2   2 
E Y − Yb = E Y − E(Y |X) + E(Y |X) − Yb
h  2 
2 i h  i
=E Y − E(Y |X) + 2E Y − E(Y |X) E(Y |X) − Yb + E E(Y |X) − Yb

By equation(1.1a) on page 3 and the law of total expectation, the first term is E[var(Y |X)]. Applying the law of total
expectation to the second term gives
h  i n h   io
2E Y − E(Y |X) E(Y |X) − Yb = 2E E Y − E(Y |X) E(Y |X) − Yb |X
n  o
= 2E E(Y |X) − Yb × 0 = 0
Hence  2   2 
E Y − Yb = E[var(Y |X)] + E E(Y |X) − Yb

which is minimized when Yb = E(Y |X).


13. (a) For the second result, just use E(XY |X) = XE(Y |X) = aX + bX 2 and take expectations. Clearly cov[X, Y ] =
b var[X]. Then E(Y |X) = a + bX = a + bµX + b(X − µX ) = µY + b(X − µX ). Then use b = cov(X, Y )/var(X) =
2 h i2
and b = ρσY /σX . Finally E Y − E(Y |X) = E Y − µY − ρ σσX
  
ρσY /σX . (b) var E(Y |X) = b2 σX 2 Y
(X − µX ) =
σY2 + ρ2 σY2 − 2ρ σσX
Y
cov[X, Y ] = σY2 + ρ2 σY2 − 2ρ2 σY2 as required.
(c) We have µX = c + dµY and µY = a + bµX . Hence µX = (c + ad)/(1 − bd) and µY = (a + bc)/(1 − bd).
Now E[XY ] = cµY + dE[Y 2 ] and E[XY ] = aµX + bE[X 2 ]. Hence cov[X, Y ] = dvar[Y ] and cov[X, Y ] = bvar[X].
Hence σY2 /σX 2
= b/d. Finally ρ = cov[X, Y ]/(σX σY ) = d σσX Y
and hence ρ2 = d2 b/d = bd.

1
See for example, pages 18–19 in [L UKACS(1970)] and exercise 4 on page 298 in [A SH(2000)].

14. Let g(a, b) = E ( Y − a − bX )2 = E[Y 2 ] − 2aµY + a2 − 2bE[XY ] + b2 E[X 2 ] + 2abµX . Hence we need to solve
 

∂g(a, b) ∂g(a, b)
= −2µY + 2a + 2bµX = 0 and = −2E[XY ] + 2bE[X 2 ] + 2aµX = 0
∂a ∂b
This gives
E[XY ] − µX µY σY σY
b= =ρ and a = µY − bµX = µY − ρ µX
E[X 2 ] − µ2X σX σX
15.
Z 1
6 2  2
fX (x) = (x + y)2 dy = (x + 1)3 − x3 = (3x2 + 3x + 1) for x ∈ [0, 1].
0 7 7 7
Similarly
2
fY (y) = (3y 2 + 3y + 1) for y ∈ [0, 1].
7
Hence
3(x + y)2 3(x + y)2
fX|Y (x|y) = 2 and fY |X (y|x) = 2 for x ∈ [0, 1] and y ∈ [0, 1].
3y + 3y + 1 3x + 3x + 1
and so the best predictor of Y is
Z 1 2 2 1
3 3 x y y 3 y 4
E[Y |X = x] = 2 (x2 + 2xy + y 2 )y dy = 2 + 2x +
3x + 3x + 1 0 3x + 3x + 1 2 3 4 0
 2 
3 x 2x 1 1  2 
= 2 + + = 6x + 8x + 3
3x + 3x + 1 2 3 4 4(3x2 + 3x + 1)
9
R1 x=1
, E[X 2 ] = 72 0 (3x4 + 3x3 + x2 )dx = 27 35 x5 + 43 x4 + 13 x3 x=0 = 72 53 + 34 + 13 = 210
101
  
(b) Now µX = µY = 14 and
2
2 9 101 81 199
σX = σY2 = E[X 2 ] − 14 2 = 210 − 196 = 2940 . Also
Z 1Z 1 Z 1 (Z 1 ) Z 1 
1 2y y 2

7 2 2
E[XY ] = xy(x + y) dxdy = y x(x + y) dx dy = y + + dy
6 0 0 y=0 x=0 y=0 4 3 2
Z 1 
y 2y 2 y 3

1 2 1 17
= + + dy = + + =
y=0 4 3 2 8 9 8 36
2
17
Hence E[XY ] = 42 and cov[X, Y ] = 17 9 5 5 2940 25
42 − 142 = − 588 and ρ = − 588 × 199 = − 199 . Hence the best linear predictor is
σY 9 144 25
µY + ρ (X − µX ) = (1 − ρ) + ρX = − X
σX 14 199 199
(c) See figure(15a) below.

Figure(15a). Plot of best predictor (solid line) and best linear predictor (dashed line) for exercise 15.
(wmf/exs-bestlin,72mm,54mm)

16. (a) Clearly 0 < Z < 1/2. Also Z = min{X1 , X2 , 1 − X1 , 1 − X2 }. Hence P[Z ≥ z] = P[X1 ≥ z, X2 ≥ z, 1 − X1 ≥
z, 1 − X2 ≥ z] = P[X1 ≥ z, X2 ≥ z, X1 ≤ 1 − z, X2 ≤ 1 − z] = P[z ≤ X1 ≤ 1 − z] P[z ≤ X2 ≤ 1 − z] = (1 − 2z)2 .
Hence the density is fZ (z) = 4(1 − 2z) for 0 < z < 1/2.
(b) Now
 
X(1) if X(1) < 1 − X(2) X(1) if X(1) + X(2) < 1
Z= =
X(2) if X(1) > 1 − X(2) X(2) if X(1) + X(2) > 1
Hence P[Z ≤ z] = P[X(1) ≤ z, X(1) + X(2) < 1] + P[X(2) ≤ z, X(1) + X(2) > 1]. If z < 1/2 then P[Z ≤ z] =
2(z − z 2 ) + 0. If z > 1/2 then P[Z ≤ z] = 1/2 + 2(z 2 − z) + 1/2 = 1 + 2z 2 − 2z. Or: If z ≤ 1/2 we have
{Z ≤ z} = {(X1 ≤ z) ∩ (X2 further from end than X1 )} ∪ {(X2 ≤ z) ∩ (X1 further from end than X2 )}. Hence

2
P[Z ≤ z] = 2P[ (X1 ≤ z) ∩ (X2 further from end than X1 ) ] = P[ (X1 ≤  z) ∩ (X1 < X2 ≤ 1 − X1 ) ] = 2(z − z ). If z>
1/2, P[Z ≤ z] = 2P [ (X ≤ z) ∩ (X further from end than X ) ] = 2P (X ≤ 1/2) ∩ (X further from end than X ) +
 1 2  1  1 2  1
2P ( 1/2 < X1 ≤ z) ∩ (X2 further from end than X1 ) = 2 × 1/4 + 2P ( 1/2 < X1 < z) ∩ (1 − X1 < X2 < X1 ) = 1/2 +
2z 2 − 2z + 1/2 as before.
n!
17. (a) Using equation(2.3b) on page 5 gives the density f (x) = (j−1)!(n−j)! xj−1 (1 − x)n−j . Recall B(j, n − j + 1) =
Γ(j)Γ(n−j+1)
Γ(n+1) = (j−1)!(n−j)!
n!
1
. Hence the density of Xj:n is f (x) = B(j,n−j+1) xj−1 (1 − x)n−j for x ∈ (0, 1) which is
the Beta(j, n − j + 1) distribution. (b) E[Xj:n ] = j/(n + 1) by using the standard result of the expectation of a Beta
distribution.
18. Now the density of (X1:4 , X2:4 , X3:4 , X4:4 ) is g(x1 , x2 , x3 , x4 ) = 4! = 24 for 0 < x1 < x2 < x3 < x4 < 1. Hence the
marginal density of (X2:4 , X3:4 , X4:4 ) is g2,3,4 (x2 , x3 , x4 ) = 24x2 for 0 < x2 < x3 < x4 . Hence the marginal density of
(X3:4 , X4:4 ) is g3,4 (x3 , x4 ) = 12x23 for 0 < x3 < x4 < 1. Hence
Z Z
P[X3:4 + X4:4 < 1] = 12x2 I[0 < x < 1, 0 < y < 1, x < y, x + y < 1] dx dy
Z 1/2 Z 1−x Z 1/2
2 1
= 12x dy dx = 12x2 (1 − 2x) dx =
x=0 y=x x=0 8
19. ⇐ The joint density of (X1:2 , X2:2 ) is g(y1 , y2 ) = 2f (y1 )f (y2 ) = 2λ2 e−λ(y1 +y2 ) for 0 <
y1 < y2 . Now consider the
∂(w,y)
transformation to (W, Y ) = (X2:2 − X1:2 , X1:2 ). The absolute value of the Jacobian is ∂(y 1 ,y2 )
= | − 1| = 1. Hence
f(W,Y ) (w, y) = 2λ2 e−λ(w+y+y) = 2λe−2λy λe−λw = fY (y)fW (w) where the density of X1:2 is fY (y) = 2λe−2λy . The
fact that the joint density is the product of the marginal densities implies W and Y are independent.
⇒ P[X2:2 − X1:2 > y|X1:2 = x] = P[X2:2 > x + y|X1:2 = x] = 1−F (x+y)
1−F (x) and this is independent of x. Taking x = 0
gives 1 − F (x + y) = (1 − F (x))(1 − F (y)) and F is continuous. Hence there exists λ > 0 with F (x) = 1 − e−λx .
20. By equation(2.2a) on page 4, the density of the vector is (X1:n , X2:n , . . . , Xn:n ) is g(x1 , . . . , xn ) = n!f (x1 ) · · · f (xn )
for 0 ≤ x1 ≤ x2 · · · ≤ xn . The transformation to (Y1 , Y2 , . . . , Yn ) has Jacobian with absolute value

∂(y1 , . . . , yn ) 1
∂(x1 , . . . , xn ) = y n−1

1
Hence for y1 ≥ 0 and 1 ≤ y2 ≤ · · · ≤ yn , the density of the vector (Y1 , Y2 , . . . , Yn ) is
h(y1 , . . . , yn ) = n!y1n−1 f (y1 )f (y1 y2 )f (y1 y3 ) · · · f (y1 yn )
(b) Integrating yn from yn−1 to ∞ gives
h(y1 , . . . , yn−1 ) = n!y1n−2 f (y1 )f (y1 y2 )f (y1 y3 ) · · · f (y1 yn−1 ) 1 − F (y1 yn−1 )
 

Then integrating yn−1 over yn−2 to ∞ gives


 2
1 − F (y1 yn−2 )
h(y1 , . . . , yn−2 ) = n!y1n−3 f (y1 )f (y1 y2 )f (y1 y3 ) · · · f (y1 yn−2 )
2
and by induction
[1 − F (y1 y2 )]n−2
h(y1 , y2 ) = n!y1 f (y1 )f (y1 y2 )
(n − 2)!
n−1
h(y1 ) = nf (y1 ) [1 − F (y1 )]
as required.
21. Now Xn:n = Yn , X(n−1):n = Yn−1 Yn , X(n−2):n = Yn−2 Yn−1 Yn , . . . , X1:n = Y1 · · · Yn . The absolute value of the
Jacobian of the transformation
is
∂(x1 , . . . , xn ) 2 3 n−1
∂(y1 , . . . , yn ) = (y2 · · · yn )(y3 · · · yn ) · · · (yn−1 yn )(yn ) = y2 y3 y4 · · · yn

Hence the density of the vector (Y1 , . . . , Yn ) is


f (y1 , . . . , yn ) = n!y2 y32 y43 · · · ynn−1 for 0 < y1 < 1, . . . , 0 < yn < 1.
Because the density factorizes, it follows that Y1 , . . . , Yn are independent. Also
fY1 (y1 ) = 1 fY2 (y2 ) = 2y2 fY3 (y3 ) = 3y32 . . . fYn (yn ) = nynn−1
It is easy to check that V1 = Y1 , V2 = Y22 , . . . , Vn = Ynn are i.i.d. random variables with the U (0, 1) distribution.
22. Now E[ X1 |X1:n , . . . , Xn:n ] = E[ Xj |X1:n , . . . , Xn:n ] for j = 2, . . . , n. Hence E[ X1 |X1:n , . . . , Xn:n ] = E[ X1 + · · · +
Xn |X1:n , . . . , Xn:n ]/n = E[ X1:n + · · · + Xn:n | X1:n , . . . , Xn:n ]/n = ( X1:n + · · · + Xn:n ) /n.
23. P[N = 1] = 1/2, P[N = 2] = P[X1 > X0 < X2 ] = 1/6, P[N = 3] = P[X1 < X0 , X2 < X0 , X3 > X0 ] = 2P[X1P < X2 <
∞ 1
X0 < X3 ] = 2/4! = 1/12. In general, P[N = k] = (k − 1)!/(k + 1)! = 1/k(k + 1) for n = 1, 2, . . . . Hence E[N ] = k=1 k+1
which diverges.

24. Now the density of (X1:4 , X2:4 , X3:4 , X4:4 ) is g(x1 , x2 , x3 , x4 ) = 4! = 24 for 0 < x1 < x2 < x3 < x4 < 1. Hence
the marginal density of (X2:4 , X3:4 , X4:4 ) is g2,3,4 (x2 , x3 , x4 ) = 24x2 for 0 < x2 < x3 < x4 < 1. Hence the marginal
R 1−y
density of (X2:4 , X4:4 ) is g2,4 (x2 , x4 ) = 24x2 (x4 − x2 ) for 0 < x2 < x4 < 1. Hence fY (y) = x2 =0 g2,4 (x2 , x2 + y) dx2 =
R 1−y
x2 =0
24x2 y dx2 = 12y(1 − y)2 for 0 < y < 1.
The marginal density of (X1:4 , X2:4 , X3:4 ) is g1,2,3 (x1 , x2 , x3 ) = 24(1 − x3 ) for 0 < x1 < x2 < x3 . Hence the marginal
R 1−z
density of (X1:4 , X3:4 ) is g1,3 (x1 , x3 ) = 24(x3 − x1 )(1 − x3 ) for 0 < x1 < x3 < 1. Hence fZ (z) = x1 =0 g1,3 (x1 , x1 +
R 1−z
z) dx1 = x1 =0 24z(1 − x1 − z) dx1 = 12z(1 − z)2 for 0 < z < 1. So both have the same distribution.
25. Now the density of (X1:3 , X2:3 , X3:3 ) is g(x1 , x2 , x3 ) = 3! = 6 for 0 < x1 < x2 < x3 < 1. Hence the marginal
density of (X1:3 , X3:3 ) is g1,3 (x1 , x3 ) = 6(x3 − x1 ) for 0 < x1 < x3 < 1. Hence the conditional density of X2:3 given
(X1:3 , X3:3 ) = (x1 , x3 ) is
6 1
fX2:3 |(X1:3 ,X3:3 ) (x2 |(x1 , x3 ) = = for x2 ∈ (x1 , x3 ).
6(x3 − x1 ) x3 − x1
This is the uniform distribution on (x1 , x3 ).

Chapter 1 Section 5 on page 15 (exs-uniform.tex)

1.
b
 n Z b−a
a+b
Z
1 1 2
E[X − µ)n ] = x− dx = v n dv
b−a a 2 b − a a−b

n
 2
1 − (−1)n+1

n+1 n+1 − a)
(b − a) − (a − b) (b
= =
2n+1 (b − a)(n + 1) (n + 1)2n+1
2. P[− ln X ≤ x] = P[ln X ≥ −x] = P[X ≥ e−x ] = 1 − e−x . Hence Y ∼ exponential (1).
3. (a) Distribution function: F_Z(z) = P[XY ≤ z] = \int_0^1 P[X ≤ z/y]\, dy. But P[X ≤ z/y] = 1 if y ≤ z and = z/y if z < y.
Hence F_Z(z) = \int_0^z dy + \int_z^1 (z/y)\, dy = z(1 − \ln z) for 0 < z < 1.
Density: By differentiating F_Z(z) we get f_Z(z) = −\ln z for 0 < z < 1. Alternatively, consider the transformation Z = XY and V = Y. Then 0 < Z < 1 and 0 < V < 1. The absolute value of the Jacobian is |∂(z,v)/∂(x,y)| = y = v. Hence f_{(Z,V)}(z, v) = f_{(X,Y)}(z/v, v)/v. Hence
    f_Z(z) = \int_{v=0}^{1} \frac{f_X(z/v)}{v}\, dv = \int_{v=z}^{1} \frac{dv}{v} = −\ln z   for 0 < z < 1.
(b) Now −\ln P_n = \sum_{j=1}^{n} Y_j where Y_1, ..., Y_n are i.i.d. with the exponential(1) distribution, after using the result in exercise 2. Hence Z = −\ln P_n ∼ Gamma(n, 1) and Z has density z^{n-1}e^{-z}/\Gamma(n) for z > 0. Transforming back to P_n shows the density of P_n is f(x) = (−\ln x)^{n-1}/\Gamma(n) = (\ln(1/x))^{n-1}/\Gamma(n) for x ∈ (0, 1).
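A short Monte Carlo check of part (b) (n and the sample size are arbitrary):
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, nsim = 5, 100_000
    P = rng.uniform(size=(nsim, n)).prod(axis=1)             # product of n i.i.d. U(0,1)
    print(stats.kstest(-np.log(P), stats.gamma(a=n).cdf))    # -ln(P_n) should be Gamma(n, 1)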
4. (a) First n = 2. Now the density of (X1 , X2 ) is
1
f(X1 ,X2 ) (x1 , x2 ) = fX2 |X1 (x2 |x1 )fX1 (x1 ) = for 0 < x2 < x1 < 1.
x1
Z 1 1  
1
fX2 (x2 ) = f(X1 ,X2 ) (x1 , x2 ) dx1 = − ln x1 = ln for 0 < x2 < 1.

x1 =x2 x 1 =x2 x 2
Assume true for n − 1; to prove for n. Now Xn ∼ U (0, Xn−1 ); hence
1
f(Xn−1 ,Xn ) (xn−1 , xn ) = fXn |Xn−1 (xn |xn−1 )fXn−1 (xn−1 ) = fX (xn−1 )
xn−1 n−1
n−2
1 ln 1/xn−1
= for 0 < xn < xn−1 < 1.
xn−1 (n − 2)!
and hence
Z 1 n−2 n−1
1 ln 1/xn−1 ln 1/xn
fXn (xn ) = dxn−1 = for 0 < xn < 1.
xn−1 =xn xn−1 (n − 2)! (n − 1)!
As required.
(b) Now Xn = Xn−1 Z where Z ∼ U (0, 1). Similarly, by induction, Xn = X1 Z1 · · · Zn−1 which is the product of n
random variables with the U (0, 1) distribution. Hence result by part (b) of exercise 3
5.
Z b  
1 1
H(X) = − ln dx = ln(b − a)
a b−a b−a

6. (a)
b
1 b min{b, v} − max{0, v − a}
Z Z
fV (v) = fX (v − y)dy = fX (v − y)fY (y) dy =
0 b 0 ab
by using fX (v − y) = 1/a when 0 < v − y < a; i.e. when v − a < y < v. Suppose a < b. Then

 v/ab if 0 < v < a;
fV (v) = 1/b if a < v < b;
(a + b − v)/ab if b < v < a + b.

(b) Now −a < W < b.
Z b
1 b min{a, b − w} − max{0, −w}
Z
fW (w) = fY (y + w)fX (y) dy = fY (y + w) dy =
0 a 0 ab
by using fY (y + w) = 1/b when 0 < y + w < b; i.e. when −w < y < b − w. Suppose a < b. Then

 (a + w)/ab if −a < w < 0;
fW (w) = 1/b if 0 < w < b − a;
(b − w)/ab if b − a < w < b.

Figure(6a). Plot of density of V = X + Y (left) and density of W = Y − X (right).


(PICTEX)
7. Now ( a−t b−t
a b for 0 ≤ t ≤ min{a, b};
P[V ≥ v] = P[X ≥ v]P[Y ≥ v] =
0 for t ≥ min{a, b}.
(a−t)(b−t)
(
1− ab if 0 ≤ t ≤ min{a, b};
FV (t) =
1 t ≥ min{a, b}.
a + b − 2t
fV (t) = for 0 ≤ t ≤ min{a, b}.
ab
Finally
Z min{a,b} Z a a
b−x

P[Y > x]
Z
1 1 − a/2b if a ≤ b;
P[V = X] = P[Y > X] = dx = dx + dx =
0 a 0 ab min{a,b} a 1 − b/2a if a > b.
8. Let V denote the arrival time. If you take the bus on route 2 then E[V ] = V = t0 + α + β. If you wait for a bus on
route 1, then E[V ] = t0 + α + E[X2 − t0 |X2 > t0 ]. But the distribution of (X2 − t0 |X2 > t0 ) is U (0, a − t0 ), and hence
E[X2 − t0 |X2 > t0 ] = (a − t0 )/2. Hence route 1 is faster if (a − t0 )/2 < β and route 2 is faster if (a − t0 )/2 > β.
9. For w ∈ (0, 1) we have
X∞ X∞
P[W ≤ w] = P[W ≤ w, bU + V c = k] = P[U + V ≤ w + k, bU + V c = k]
k=−∞ k=−∞
X∞ ∞ Z
X 1
= P[V ≤ w + k − U, bU + V c = k] = P[u + V ≤ w + k, k ≤ u + V < k + 1] du
k=−∞ k=−∞ u=0
X∞ Z 1 ∞ Z 1
X
= P[k ≤ u + V ≤ w + k] du = P[k − u ≤ V ≤ w + k − u] du
k=−∞ u=0 k=−∞ u=0

X∞ Z 1 Z w+k−u ∞
X
= fV (v)dv du = In
k=−∞ u=0 v=k−u k=−∞
where
Z k+w  
In = min{w + k − v, 1} − max{0, k − v} fV (v) dv
v=k−1
Z k−1+w Z k Z k+w
= [1 − k + v]fV (v)dv + wfV (v)dv + (w + k − v)fV (v)dv
v=k−1 v=k−1+w v=k
"Z #
Z k+w k−1+w Z k+w
=w fV (v) dv + vfV (v)dv − vfV (v)dv +
v=k−1+w v=k−1 v=k
" Z #
k+w Z k−1+w
k fV (v)dv − (k − 1) fV (v) dv
v=k v=k−1

and hence
n
"Z #
X Z n+w −n−1+w Z n+w
In = w fV (v)dv + vfV (v)dv − vfV (v)dv +
k=−n v=−n−1+w v=−n−1 v=n
" Z #
n+w Z −n−1+w
n fV (v)dv − (−n − 1) fV (v)dv
n v=−n−1
R n+w R n+w
Now E|V | < ∞; hence E[V + ] < ∞. Hence n vfV (v) dv → 0 as n → ∞. In turn, this implies n n fV (v) dv → 0
as n → ∞. Similarly for V − and Pn the other two R∞ integrals.
Hence P[W ≤ w] = limn→∞ k=−n In = w v=−∞ fV (v) dv = w as required.
10. (a) P[V ≥ t] = (1 − t)2 . Hence FV (t) = P[V ≤ t] = 2t − t2 and fV (t) = 2(1 − t) for 0 ≤ t ≤ 1. Also E[V ] = 1/3.
FW (t) = P[W ≤ t] = t2 and fW (t) = 2t for 0 ≤ t ≤ 1. Also E[W ] = 2/3.
(b) For v < w we have P[V ≤ v, W ≤ w] = P[W ≤ w] − P[W ≤ w, V > v] = w2 − (w − v)2 = v(2w − v); whilst for
v > w we have P[V ≤ v, W ≤ w] = P[W ≤ w] = w2 . Hence
∂2
f(V,W ) (v, w) = P[V ≤ v, W ≤ w] = 2 for 0 ≤ v < w ≤ 1.
∂v∂w
(c) For v < w we have P[W ≤ w|V ≤ v] = (2w − v)/(2 − v) and for v > w we have P[W ≤ w|V ≤ v] = w2 /(2v − v 2 ).
Hence 
2/(2 − v) if v < w;
fW (w|V ≤ v) =
2w/(2v − v 2 ) if v > w.
Z v Z 1
2 2 2 2v 2 1 − v2 3 − v2
E[W |V ≤ v] = w dw + w dw = + =
2v − v 2 w=0 2 − v w=v 3(2 − v) 2 − v 3(2 − v)
Note that E[W |V ≤ 1] = /3 = E[W ] and E[W |V ≤ 0] = /2 = E[X].
2 1
11. (a) Without loss of generality, suppose we measure distances clockwise from some fixed origin O on the circle. Let D
denote the length of the interval (X1 , X2 ). Then 0 < D < 1 and
 
X2 − X1 if X2 > X1 ; 0 if 1 > X2 − X1 > 0;
D= = X2 − X1 +
1 − X1 + X2 if X1 > X2 . 1 if −1 < X2 − X1 < 0.
The first line corresponds to points in the clockwise order O → X1 → X2 and the second line to points in the clockwise
order O → X2 → X1 .
So for y ∈ (0, 1) we have
P[D ≤ y] = P[0 ≤ X2 − X1 ≤ y] + P[X2 − X1 ≤ y − 1]
= P[X1 ≤ X2 ≤ min{X1 + y, 1}] + P[X2 ≤ X1 + y − 1]
"Z # "Z #
1−y Z 1 1
= y fX1 (x1 ) dx1 + (1 − x1 )fX1 (x1 ) dx1 + (x1 + y − 1)fX1 (x1 )dx1
x1 =0 x1 =1−y x1 =1−y

1 (1 − y)2 1 (1 − y)2
   
= y(1 − y) + y − + + y(y − 1) + − =y
2 2 2 2
as required.
(b) Without loss of generality, take Q to be the origin and measure clockwise. So we have either Q → X1 → X2 or
Q → X2 → X1 and both of these lead to the same probability. Consider the first case: Q → X1 → X2 . Then for
t ∈ (0, 1) we have
Z t Z t
t2
P[L ≤ t, Q → X1 → X2 ] = P[X2 ≥ 1 − (t − x1 )]fX1 (x1 ) dx1 = (t − x1 )dx1 =
x1 =0 x1 =0 2
Similarly for Q → X2 → X1 . Hence P[L ≤ t] = t2 and fl (t) = 2t for t ∈ (0, 1). Finally, E[L] = 2/3.
2 n 2
n
12. For x ∈ (0, r) we have P[D > x] = 1 − x /r2 and hence P[D ≤ x] = 1 − 1 − x /r2 . Hence
n−1
x2

2nx
fD (x) = 2 1 − 2 for x ∈ (0, r).
r r
Z r n−1 Z 1
2nx2 x2

E[D] = 1 − dx = 2nrv 2 (1 − v 2 )n−1 dv
0 r2 r2 0
Z 1  
1/2 n−1 Γ 3/2 Γ(n) Γ 3/2 Γ(n + 1)
= nr u (1 − u) du = nr  =r 
0 Γ n + 3/2 Γ n + 3/2

13. Now f(X1 ,X2 ) (x1 , x2 ) is constant on the disc. Hence f(X1 ,X2 ) (x1 , x2 ) = 1/πa2 for x21 + x22 ≤ a2 . Hence
Z √a2 −x2
q
1 1 2 a2 − x21
fX1 (x1 ) = √ 2
dx1 = for −a < x1 < a.
x2 =− a2 −x21 πa πa2
14. Let Y_j = X_j − a. Then Y_1, Y_2, ..., Y_n are i.i.d. random variables with the U(0, b−a) distribution. Also P[X_1 + ··· + X_n ≤ t] = P[Y_1 + ··· + Y_n ≤ t − na]. Hence
    F_n(x) = \frac{1}{(b-a)^n\, n!} \sum_{k=0}^{n} (-1)^k \binom{n}{k} \left[(x - na - k(b-a))^+\right]^{n}   for all x ∈ R and all n = 1, 2, ....
    f_n(x) = \frac{1}{(b-a)^n\, (n-1)!} \sum_{k=0}^{n} (-1)^k \binom{n}{k} \left[(x - na - k(b-a))^+\right]^{n-1}   for all x ∈ R and all n = 2, 3, ....
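A Monte Carlo check of the formula for F_n (the values of a, b, n and the evaluation point are arbitrary):
    import numpy as np
    from math import comb, factorial

    def F(x, n, a, b):
        s = sum((-1) ** k * comb(n, k) * max(x - n * a - k * (b - a), 0.0) ** n
                for k in range(n + 1))
        return s / ((b - a) ** n * factorial(n))

    rng = np.random.default_rng(6)
    a, b, n, x = 1.0, 3.0, 4, 9.0
    S = rng.uniform(a, b, size=(500_000, n)).sum(axis=1)
    print((S <= x).mean(), F(x, n, a, b))           # should be close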
15. (a) By equation(4.4a) and the standard result \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\, dx = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta), which is derived in §14.1 on page 32, we have
Z 1  Z 1
n−1 k

n−k n!
E[Xk:n ] = n t (1 − t) dt = tk (1 − t)n−k dt
0 k−1 (n − k)!(k − 1)! 0
n! Γ(k + 1)Γ(n − k + 1) k
= =
(n − k)!(k − 1)! Γ(n + 2) n+1
Similarly
Z 1  Z 1
n − 1 k+1

n−k n!
2
E[Xk:n ] = n t (1 − t) dt = tk+1 (1 − t)n−k dt
0 k−1 (n − k)!(k − 1)! 0
n! Γ(k + 2)Γ(n − k + 1) (k + 1)k
= =
(n − k)!(k − 1)! Γ(n + 3) (n + 2)(n + 1)
and so
k(n − k + 1)
var[Xk:n ] =
(n + 1)2 (n + 2)
(b) Just substitute into equation(2.5a) on page 5. This gives f(Xj:n ,Xk:n ) (x, y) = cxj−1 (y − x)k−j−1 (1 − y)n−k for
0 ≤ x < y ≤ 1 where c = n!/[ (j − 1)!(k − j − 1)!(n − k)! ].
(c)
j(k + 1)
E[Xj:n Xk:n ] =
(n + 1)(n + 2)
j(k + 1) j k j(n − k + 1)
cov[Xj:n , Xk:n ] = E[Xj:n Xk:n ] − E[Xj:n ]E[Xk:n ] = − =
(n + 1)(n + 2) n + 1 n + 1 (n + 1)2 (n + 2)
s
cov[Xj:n , Xk:n ] j(n − k + 1)
corr[Xj:n , Xk:n ] = p =
var[Xj:n ]var[Xk:n ] k(n − j + 1)

Chapter 1 Section 7 on page 19 (exs-exponential.tex)

1. This is just the probability integral transformation—see §4.5 on page 13. The distribution function of an exponential (λ)
is F (x) = 1 − e−λx . Inverting this gives G(y) = − ln(1 − y)/λ. Hence G(U ) ∼ exponential (λ).
2. For k = 0, 1, . . . , we have P[Y = k] = P[k ≤ X < k + 1] = 1 − e−λ(k+1) − 1 − e−λk = e−λk (1 − e−λ ) = q k p where
   

1 − e−λ . This is the geometric distribution.


P∞ P∞
For z ∈ (0, 1) we have P[Z ≤ z] = k=0 P[k < X ≤ k + z] = k=0 e−λk 1 − e−λz = 1 − e−λz /[1 − e−λ ]. Hence
   

the density is fZ (z) = λe−λz /[1 − e−λ ] for z ∈ (0, 1).


Clearly Y and Z are independent by the  lack of memory property of the exponential; or, from first principles: P[Y =
k, Z ≤ z] = P[k < X < k + z] = e−λk 1 − e−λz and this equals P[Y = k]P[Z ≤ z].


3. The distribution of (X, Y ) is f(X,Y ) (x, y) = λ2 e−λ(x+y) for x > 0 and y > 0. Let v = (x − y)/(x + y) and w = x + y;
hence
x = w(1 + v)/2 and y = w(1 − v)/2. Note that w > 0 and v ∈ (−1, 1). The absolute value of the Jacobian is
∂(x,y) w
∂(v,w) = 2 and hence f(V,W ) (v, w) = λ2 we−λw /2 for v ∈ (−1, 1) and w > 0. Hence fV (v) = 1/2 for v ∈ (−1, 1).
4. The
density
of (X, Y ) is f(X,Y ) (x, y) = λ2 e−λ(x+y) for x > 0 and y > 0. Let W = Y + 1 and Z = X/Y +1. Then
∂(w,z) 1
∂(x,y) = w . Hence f(W,Z) (w, z) = λ2 eλ we−λw(z+1) for w > 1 and z > 0. Integrating out w gives
1 + λ(z + 1)
fZ (z) = e−λz for z > 0.
(z + 1)2

5. Now P[U > t] = ( P[X > t] )2 = e−2t and hence U ∼ exponential (2).
Let W = 21 Y . Then fW (w) = 2fY (2w) = 2e−2w for w > 0. Hence, if Z = X + W , then for z ≥ 0 we have
Z z Z z Z z
−2z+2x −x −2z
fZ (z) = fW (z − x)fX (x) dx = 2e e dx = 2e ex dx = 2e−2z (ez − 1) = 2e−z − 2e−2z
0 0 0
2 −t 2 −t −2t
for t ≥ 0, and hence fV (t) = 2e−t − 2e−2t for t ≥ 0.

Finally, P[V ≤ t] = (P[X ≤ t]) = 1 − e = 1 − 2e + e
R∞ −µt −δt
R ∞ df −δt −µt
6. E[f (X)]−E[f (Y )] = 0 f (t)[µe −δe ] dt = − 0 dt [e −e ] dt by integration by parts. Hence E[f (X)]−
E[f (Y )] ≤ 0. Rz Rz
7. The density of Z = X + Y is fZ (z) = 0 fY (z − x)fX (x) dx = 0 λ2 e−λz dx = λ2 ze−λz for z ≥ 0. The joint density
of (X, Z) is f(X,Z) (x, z) = fX (x)fY (z − x) = λ2 e−λz . Hence
f(X,Z) (x, z) 1
fX|Z (x|z) = = for x ∈ (0, z).
fZ (z) z
This is the U (0, z) distribution with expectation z/2. Hence E[X|X + Y ] = (X+Y )/2.
8.
∂(y1 , y2 )
Now = 2 and f(Y1 ,Y2 ) (y1 , y2 ) = 1 λ2 e−λy2 for y2 ≥ 0, y1 ∈ R and −y2 ≤ y1 ≤ y2 .
∂(x1 , x2 ) 2
Hence Z y2
1 2 −λy2
fY2 (y2 ) = λ e dy1 = λ2 y2 e−λy2 for y2 ≥ 0.
−y2 2
Z ∞
1 2 −λy2 1
fY1 (y1 ) = λ e dy2 = λe−λy1 if y1 ≥ 0
y 2 2
Z 1∞
1 2 −λy2 1
fY1 (y1 ) = λ e dy2 = λeλy1 if y1 < 0
−y1 2 2
(b) fR (r) = λe−λr .
9. Now P[Y ≤ y, R ≤ r] = 2P[R ≤ r, X1 < X2 , X1 ≤ y] = 2P[X2 − X1 ≤ r, X1 < X2 , X1 ≤ y]. Hence
Z y Z y
P[Y ≤ y, R ≤ r] = 2 P[z < X2 < r + z]f (z) dz = 2 [F (r + z) − F (z)] f (z) dz
Z0 y Z r 0

=2 f (z + t)f (z) dt dz
z=0 t=0
Differentiating gives f(Y,R) (y, r) = 2f (y)f (r + y) for all y ≥ 0 and r ≥ 0.
Suppose X1 and X2 have the exponential distribution. Then f(Y,R) (y, r) = 2λ2 e−2λy e−λr = 2λe−2λy
  −λr
λe =
fY (y)fR (r). Hence Y and R are independent.
Suppose Y and R are independent. Then we have fY (y)fR (r) = 2f (y)f (r + y) for all y ≥ 0 and r ≥ 0. Let r = 0.
2
Then fY (y)fR (0) = 2 {f (y)} . But fY (y) = 2f (y) [1 − F (y)]. Hence fR (0) [1 − F (y)] = f (y) or cF (y) + f (y) = c. This
differential equation can be solved by using the integrating factor ecy where c = fR (0). We have cecy F (y) + ecy f (y) =
d
cecy or dy {ecy F (y)} = cecy . Hence ecy F (y) = A + ecy . But F (0) = 0; hence A = −1 and F (y) = 1 − e−cy as required.
R∞ R∞
10. (a) P[min{X1 , X2 } = X1 ] = P[X2 ≥ X1 ] = x=0 P[X2 ≥ x]λ1 e−λx1 dx = x=0 λ1 e−(λ1 +λ2 )x1 dx = λ1 /(λ1 + λ2 ).
R∞ R∞
(b) P[min{X1 , X2 } > t and min{X1 , X2 } = X1 ] = P[X2 > X1 > t] = x=t y=x λ1 e−λ1 x λ2 e−λ2 y dy dx =
R∞
λ e−(λ1 +λ2 )x dx = λ1λ+λ
x=t 1
1
2
e−(λ1 +λ2 )t = P[min{X1 , X2 } > t] P[min{X1 , X2 } = X1 ] as required.
R∞
(c) P[R > t and min{X1 , X2 } = X1 ] = P[X2 − X1 > t] = P[X2 > t + X1 ] = u=0 e−λ2 (t+u) λ1 e−λ1 u du =
λ1 e−λ2 t /(λ1 + λ2 ). Similarly for P[R > t and min{X1 , X2 } = X2 ]. Hence P[R > t] = λ1 e−λ2 t + λ2 e−λ1 t /(λ1 + λ2 )
 
for t > 0.
(d) Now P[R > t, min{X1 , X2 } > u] = P[R > t, min{X1 , X2 } > u, min{X1 , X2 }R= X1 ] + P[R > t, min{X1 , X2 } >

u, min{X1 , X2 } = X2 ] = P[X2 − X1 > t, X1 > u] + P[X1 − X2 > t, X2 > u] = x1 =u P[X2 > t + x1 ]fX1 (x1 ) dx1 +
R∞
x2 =u
P[X1 > t + x2 ]fX2 (x2 ) dx2 = λ1 e−λ2 t e−(λ1 +λ2 )u /(λ1 + λ2 ) + λ2 e−λ1 t e−(λ1 +λ2 )u /(λ1 + λ2 ). Hence
λ1 e−λ2 t λ2 e−λ1 t −(λ1 +λ2 )u
 
P[R > t, min{X1 , X2 } > u] = + e = P[R > t] P[min{X1 , X2 } > u]
λ1 + λ2 λ1 + λ2
11. We take n large enough so that αδn < 1. Suppose t ∈ [1 − αδn , 1]. Then
1 1
1< <
t 1 − αδn
Integrating over t from 1 − αδn to 1 gives
αδn
αδn < − ln(1 − αδn ) <
1 − αδn
Hence
1 1 1 1
eαδn < < e 1/αδn −1 and hence <e<
1 − αδn (1 − αδn )1/αδn −1 (1 − αδn )1/αδn

Hence
1 − αδn 1
< (1 − αδn )1/αδn < and hence lim (1 − αδn )1/αδn = e−1
e e n→∞
as required. R∞ R∞
12. (a) P[Z ≤ t] = P[Y ≥ X/t] = x=0 P[Y ≥ x/t]λe−λx dx = λ x=0 e−µx/t e−λx dx = λ/(λ + µ/t) = λt/(µ + λt).
(b) First note the following integration
" result: √ √ !#
Z ∞
−(α/x+λx)
Z ∞ √ α λx
e dx = exp − αλ √ + √ dx
0 x=0 λx α
p
Using the substitution x λ/α = e−y gives
Z ∞ r Z ∞
α h √ i −y
r Z ∞
α √
−(α/x+λx) y −y
e dx = exp − αλ e + e e dy = e−y e−2 αλ cosh(y) dy
x=0 λ y=−∞ λ y=−∞
r "Z ∞ √ Z 0 √
#
α −y −2 αλ cosh(y) −y −2 αλ cosh(y)
= e e dy + e e dy
λ y=0 y=−∞
r Z ∞ √ Z ∞ √

α −y −2 αλ cosh(y) y −2 αλ cosh(y)
= e e dy + e e dy
λ y=0 y=0

r Z ∞ √
r
α α
=2 cosh(y)e−2 αλ cosh(y) dy = 2 K1 (2 αλ)
λ y=0 λ
Using this result we get
Z ∞ Z ∞h i
−λx
F (z) = P[Z ≤ z] = P[XY ≤ z] = P[Y ≤ /x]λe
z dx = 1 − e−µz/x λe−λx dx
x=0 x=0
Z ∞ p p
=1−λ e−(µz/x+λx) dx = 1 − 2 λµzK1 (2 λµz)
x=0
The density function can be found by differentiating the distribution function:
Z ∞ Z ∞ " √ √ !#
1 −(λx+µz/x) 1 p x λ µz
f (z) = λµ e dx = λµ exp − λµz √ + √ dx
0 x 0 x µz x λ
√ √
Using the substitution x = µzey / λ gives
Z ∞ h p Z ∞
i h p i
f (z) = λµ exp − λµz ey + e−y dy = λµ exp −2 λµz cosh(y) dy
y=−∞ y=−∞
Z ∞ h p i p
= 2λµ exp −2 λµz cosh(y) dy = 2λµK0 (2 λµz)
y=0
√ √ 2

(c) When λ = µ we have F (z) = 1 − 2λ zK1 (2λ z). and f (z) = 2λ K0 (2λ z).
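The Bessel-function form of the distribution function can be checked numerically (scipy.special.k1 is the modified Bessel function K_1; the values of λ, µ and z are arbitrary):
    import numpy as np
    from scipy.special import k1

    rng = np.random.default_rng(7)
    lam, mu, z = 2.0, 0.5, 1.2
    X = rng.exponential(1 / lam, size=1_000_000)
    Y = rng.exponential(1 / mu, size=1_000_000)

    arg = 2 * np.sqrt(lam * mu * z)
    print((X * Y <= z).mean(), 1 - arg * k1(arg))   # P[XY <= z] vs the formula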
∂(y1 ,...,yn )
13. Now f(Y1 ,...,Yn ) (y1 , . . . , yn ) = f(X1 ,...,Xn ) (x1 , . . . , xn ) ∂(x 1 ,...,xn )
= f(X1 ,...,Xn ) (x1 , . . . , xn ) = λn e−λyn for 0 < y1 <
y2 < · · · < yn .
Or by induction using fY1 (y1 ) = λe−λy1 for y1 > 0 and
f(Y1 ,...,Yn ) (y1 , . . . , yn ) = fYn |(Y1 ,...,Yn−1 ) (yn |y1 , . . . , yn−1 )f(Y1 ,...,Yn−1 ) (y1 , . . . , yn−1 ) = λe−λ(yn −yn−1 ) λn−1 e−λyn−1 =
λn e−λyn
14. By proposition(6.5a)
    X_{k:n} ∼ \frac{Z_1}{n} + \frac{Z_2}{n-1} + \cdots + \frac{Z_k}{n-k+1}   where Z_1, ..., Z_k are i.i.d. with the exponential(λ) distribution.
Hence
    E[X_{k:n}] = \frac{1}{\lambda}\sum_{\ell=1}^{k} \frac{1}{n+1-\ell}   and   var[X_{k:n}] = \frac{1}{\lambda^2}\sum_{\ell=1}^{k} \frac{1}{(n+1-\ell)^2}
Finally, cov[Xj:n , Xk:n ] = var[Xj:n ] for 1 ≤ j < k ≤ n.
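A quick simulation check of the expression for E[X_{k:n}] (arbitrary λ, n and k):
    import numpy as np

    rng = np.random.default_rng(8)
    lam, n, k = 2.0, 6, 4
    X = np.sort(rng.exponential(1 / lam, size=(200_000, n)), axis=1)   # rows of order statistics
    print(X[:, k - 1].mean())                                          # Monte Carlo E[X_{k:n}]
    print(sum(1 / (n + 1 - l) for l in range(1, k + 1)) / lam)         # formula above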
15. By proposition(6.5a) we know that
k
X Z`
Xk:n = for k = 1, 2, . . . , n.
n−`+1
`=1
where Z1 , . . . , Zn are i.i.d. with the exponential (λ) distribution. Hence
n n k n n n
X X X Z` X Z` X X n−`+2
Z= (n − k + 1)Xk:n = (n − k + 1) = (n − k + 1) = Z`
n−`+1 n−`+1 2
k=1 k=1 `=1 `=1 k=` `=1
Hence we can write
n
1X
Z= (j + 1)Tj where T1 , . . . , Tn are i.i.d. with the exponential (λ) distribution.
2
j=1
Pn Pn
Hence E[Z] = j=1 (j + 1)/2λ = n(n + 3)/4λ and var[Z] = j=1 (j + 1)2 /4λ2 = n(2n2 + 9n + 13)/24λ2 .
Pn Pn Pn
16. Now `=1 (X` − X1:n ) = `=1 X` − nX1:n = `=1 X`:n − nX1:n = Z2 + · · · + Zn which is independent of Z1 . Hence
result.

Chapter 1 Section 9 on page 24 (exs-gamma.tex)


R∞ x −u ∞ R∞
1. (a) Integrating by parts gives Γ(x + 1) = 0 u e du = −ux e−u |0
+ x 0 u e du = xΓ(x). x−1 −u
(b) Γ(1) =
R ∞ −u 1 2
e du = 1. (c) Use parts (a) and (b) and induction. (d) Use the transformation u = t ; hence Γ( 1/2) =
R0∞ −1/2 −u √ R ∞ − 1 t2 √ 2

0
u e du = 2 0 e 2 dt = π. The final equality follows because the integral over (−∞, ∞) of the

standard normal density is 1. (e) Use induction. For n = 1 we have Γ( 3/2) = π/2 which is the right hand side. Now
2n + 1 1.3.5 . . . (2n − 1).(2n + 1) √
Γ(n + 1 + 1/2) = (n + 1/2)Γ(n + 1/2) = Γ(n + 1/2) = π as required.
2 2n+1
2. E[Y ] = n/α and E[1/X] = α/(m − 1) provided m > 1. Hence E[ Y /X ] = n/(m − 1) provided m > 1.
3. fX (x) = xn−1 e−x /Γ(n). At x = n − 1, fX (x) = (n − 1)n−1 e−(n−1) /Γ(n). Hence result by Stirling’s formula.
4. The simplest way is to use the moment generating function: MX+Y (t) = E[et(X+Y ) ] = E[etX ]E[etY ] = 1/(1 − t/α)n1 +n2
which is the mgf of a Gamma(n1 + n2 , α) distribution. Alternatively,
Z t
αn1 +n2 e−αt t
Z
fX+Y (t) = fX (t − u)fY (u) du = (t − u)n1 −1 un2 −1 du
0 Γ(n1 )Γ(n2 ) u=0
αn1 +n2 tn1 +n2 −1 e−αt 1 αn1 +n2 tn1 +n2 −1 e−αt Γ(n1 )Γ(n2 )
Z
= (1 − w)n1 −1 wn2 −1 dw =
Γ(n1 )Γ(n2 ) w=0 Γ(n1 )Γ(n2 ) Γ(n1 + n2 )
n1 +n2 n1 +n2 −1 −αt
α t e
=
Γ(n1 + n2 )
where we have used the transformation w = u/t.
5. (a) Clearly u ∈ R and 0 < v < 1; also x = uv and y = u(1 − v). Hence
n+m n+m−1 −αu m−1
(1 − v)n−1

∂(x, y)
= u and f(U,V ) (u, v) = α u e v
= fU (u)fV (v)
∂(u, v) Γ(n)Γ(m)
where U ∼ Gamma(n + m, α) and V ∼ Beta(m, n).
(1+v)2
(b) Clearly u ∈ R and v ∈ R; also x = u/(1 + v) and y = uv/(1 + v). Hence ∂(u,v) ∂(x,y) = and

u

u αn+m un+m−2 e−αu v n−1 αn+m un+m−1 e−αu v n−1


f(U,V ) (u, v) = =
(1 + v)2 (1 + v)m+n−2 Γ(n)Γ(m) (1 + v)m+n Γ(n)Γ(m)
n+m n+m−1 −αu n−1
α u e Γ(n + m) v
= = fU (u)fV (v)
Γ(n + m) Γ(n)Γ(m) (1 + v)m+n
where U ∼ Gamma(n + m, α) and 1/(1 + V ) ∼ Beta(m, n).
6. Note that y1 ∈ (0, 1), y2 ∈ (0, 1) and y3 > 0. Also x1 = y1 y2 y3 , x2 = y2 y3 (1 − y1 ) and x3 = y3 (1 − y2 ) and the absolute
value of the Jacobian of the transformation is given by ∂(x
1 ,x2 ,x3 ) 2
∂(y1 ,y2 ,y3 ) = y2 y3 . Now

λk1 +k2 +k3 xk1 1 −1 xk2 2 −1 x3k3 −1 e−λ(x1 +x2 +x3 )


f(X1 ,X2 ,X3 ) (x1 , x2 , x3 ) = for x1 > 0, x2 > 0, and x3 > 0.
Γ(k1 )Γ(k2 )Γ(k3 )
Hence
λk1 +k2 +k3 y1k1 −1 y2k1 +k2 −1 y3k1 +k2 +k3 −1 (1 − y1 )k2 −1 (1 − y2 )k3 −1 e−λy3
f(Y1 ,Y2 ,Y3 ) (y1 , y2 , y3 ) =
Γ(k1 )Γ(k2 )Γ(k3 )
y1k1 −1 (1 − y1 )k2 −1 y2k1 +k2 −1 (1 − y2 )k3 −1 λk1 +k2 +k3 y3k1 +k2 +k3 −1 e−y3
=
B(k1 , k2 ) B(k1 + k2 , k3 ) Γ(k1 + k2 + k3 )
HenceR Y1 ∼ Beta(k ,
R1 2 2k ), Y ∼ Beta(k 1 + k ,
2 3 k ) and Y 3 ∼ Gamma(k 1 + k 2 + k 3 , λ).
7. Now etX dP ≥ {X>x} etX dP ≥ etx P[X ≥ x]. Hence, for all x ≥ 0 and all t < α we have P[X ≥ x] ≤ e−tx E[etX ].
Hence
e−tx
P[X ≥ x] ≤ inf e−tx E[etX ] = αn inf
t<α t<α (α − t)n
By differentiation, the infimum occurs at t = α − n/x. Hence
n−αx
 
n e 2n
P[X ≥ x] ≤ α and P X ≥ ≤ 2n e−n as required.
(n/x)n α
8. (a) The moment generating function of X is
∞ k ∞ ∞ 
2k−1 (k + 2)!tk X k + 2

X t X 1
E[etX ] = 1 + E[X k ] = 1 + = (2t)k = for |t| < 1/2.
k! k! 2 (1 − 2t)3
k=1 k=1 k=0
P∞
Hence X ∼ Gamma(3, 1/2) = χ26 . (b) E[etX ] = k=0 k+1 k
 2 2
k (2t) = 1/(1 − 2t) for |t| < /2. This is Gamma(2, /2) = χ4 .
1 1

9. Use fY (y) = fU (u) du
1 −y/2
dy = 2 e . This is the χ22 distribution.
10. Let V = X + Y . Then
f(X,V ) (x, v) f(X,Y ) (x, v − x)
fX|V (x|v) = =
fV (v) fV (v)
α(αx)m−1 e−αx α (α(v − x))n−1 e−α(v−x) Γ(m + n)
=
Γ(m) Γ(n) α(αv)m+n−1 e−αv
Γ(m + n) xm−1 (v − x)n−1
= for 0 < x < v.
Γ(m)Γ(n) v m+n−1
Hence for v > 0 we have
v v
Γ(m + n) xm (v − x)n−1 Γ(m + n)  x m  x n−1
Z Z
E[X|X + Y = v] = dx = 1− dx
Γ(m)Γ(n) x=0 v m+n−1 Γ(m)Γ(n) x=0 v v
1
Γ(m + n) Γ(m + n) Γ(m + 1)Γ(n) mv
Z
= v tm (1 − t)n−1 dt = v =
Γ(m)Γ(n) t=0 Γ(m)Γ(n) Γ(m + n + 1) m +n
R∞ n ∞ n −(α+x)y
R n n+1
11. (a) Now fX (x) = 0 fX|Y (x|y)fY (y) dy = α y=0 y e dy/Γ(n) = nα /(x + α) for x > 0. E[X] can be
 
found by integration; or E[X|Y ] = /Y and hence E[X] = E E[X|Y ] = E[ /Y ] = α/(n − 1).
1 1

(b) Now f(X,Y ) (x, y) = αn y n e−(x+α)y /Γ(n). Hence fY |X (y|x) = f(X,Y ) (x, y)/fX (x) = y n (x + α)n+1 e−(x+α)y /Γ(n + 1)
for x > 0 and y > 0.
12.
f(X|Y1 ,...,Yn ) (x|y1 , . . . , yn ) ∝ f (y1 , . . . , yn , x) = f (y1 , . . . , yn |x)fX (x)
n
Y
= xe−xyi λe−λx = λxn e−x(λ+y1 +···+yn )
i=1
Using the integral of the Gamma density gives
Z ∞
λn!
λxn e−x(λ+y1 +···+yn ) dx =
0 (λ + y1 + · · · + yn )n+1
and hence
xn e−x(λ+y1 +···+yn ) (λ + y1 + · · · + yn )n+1
f(X|Y1 ,...,Yn ) (x|y1 , . . . , yn ) =
n!
which is the Gamma(n + 1, λ + y1 + · · · + yn ) distribution. Hence E[X|Y1 = y1 , . . . , Yn = yn ] = (n + 1)/(λ + y1 + · · · + yn ).
13. The joint density of (X, Y ) is
e−x xj αn xn−1 e−αx
P[Y = j|X = x]fX (x) = for x > 0 and j = 0, 1, . . . .
j! Γ(n)
Hence
Z ∞ −x j n n−1 −αx Z ∞
e x α x e αn
P[Y = j] = dx = xj+n−1 e−(1+α)x dx
x=0 j! Γ(n) j!Γ(n) x=0
Z ∞
αn Γ(j + n) (1 + α)j+n j+n−1 −(1+α)x
= x e dx
j!Γ(n) (1 + α)j+n x=0 Γ(j + n)
j
αn Γ(j + n) j + n − 1  α n
  
1
= =
j!Γ(n) (1 + α)j+n j 1+α 1+α
j+n−1 j n

which is the negative binomial distribution j p (1 − p) , the distribution of the number of successes before the
nth failure.
14. Now φX (t) = E[eitX ] = α/(α − it). Using Y = αX − 1 gives φY (t) = E[eitY ] = e−it E[eitαX ] = e−it /(1 − it). Hence
|φY (t)| = 1/(1 + t2 )1/2 . Choose k > 2 and then |φY (t)|k ≤ 1/(1 + t2 ) which is integrable. Also, for |t| ≥ δ we have
|φY (t)| ≤ 1/(1 + δ 2√)1/2 < 1 for all δ > 0.
It follows
√ that Sn / n√has a bounded continuous density with density fn which satisfies limn→∞ fn (z) = n(z). Now
Sn / n = (αGn − n)/ n. Hence
√  √ 
n n+z n
fn (z) = fGn and hence the required result.
α α
15. Let Lt = SK − SK−1 . Note that {K = 1} = {S1 ≥ t}. Then for x < t we have
X∞ X∞
P[Lt ≤ x] = P[Lt ≤ x, K = n] = P[Sn − Sn−1 ≤ x, K = n]
n=1 n=2

X ∞
X
= P[Sn − Sn−1 ≤ x, Sn−1 < t ≤ Sn ] = P[Xn ≤ x, Sn−1 < t ≤ Sn−1 + Xn ]
n=2 n=2
∞ Z
X x ∞ Z
X x Z t
= P[Sn−1 < t < Sn−1 + y]αe−αy dy = fSn−1 (v)αe−αy dv dy
n=2 y=0 n=2 y=0 v=t−y
∞ Z
X t Z x ∞ Z
X t
fSn−1 (v)αe−αy dy dv =
 −α(t−v)
− e−αx fSn−1 (v) dv

= e
n=2 v=t−x y=t−v n=2 v=t−x
P∞
But n=2 fSn−1 (v) = α. Hence
P[Lt ≤ x] = 1 − e−αx − αxe−αx and fLt (x) = α2 xe−αx
If x > t then

X ∞
X
P[Lt ≤ x] = P[Lt ≤ x, K = n] = P[t < X1 ≤ x] + P[Sn − Sn−1 ≤ x, K = n]
n=1 n=2

X
= e−αt − e−αx + P[Sn − Sn−1 ≤ x, Sn−1 < t ≤ Sn ]
n=2

X
= e−αt − e−αx + P[Xn ≤ x, Sn−1 < t ≤ Sn−1 + Xn ]
n=2
∞ Z x
X
= e−αt − e−αx + P[Sn−1 < t < Sn−1 + y]αe−αy dy
n=2 y=0
∞ Z
X t Z x
= e−αt − e−αx + fSn−1 (v)αe−αy dy dv
n=2 v=0 y=t−v
∞ Z
X t
= e−αt − e−αx +
 −α(t−v)
− e−αx fSn−1 (v) dv

e
n=2 v=0
−αx −αx
=1−e − αte
and fLt (x) = α(1 + αt)e−αx .
Z t Z ∞
E[Lt ] = α2 x2 e−αx dx + α(1 + αt)xe−αx dx
x=0 x=t
Z t Z ∞
= −αt2 e−αt + 2α xe−αx dx + α(1 + αt) xe−αx dx
0 x=t
2 − e−αt
=
α
and so E[Lt ] > 1/α.
(b)

X
P[Wt ≤ x] = P[SK ≤ t + x] = P[Sn ≤ t + x, K = n]
n=1

X
= P[t < X1 ≤ t + x] + P[Sn ≤ t + x, Sn ≥ t, Sn−1 < t]
n=2
∞ Z t
X
= e−αt − e−α(x+t) + fSn−1 (y) P[t − y ≤ Xn ≤ t + x − y] dy
n=2 0
∞ Z t
X
= e−αt − e−α(x+t) + fSn−1 (y) e−α(t−y) − e−α(x+t−y) dy
 

n=2 0

= 1 − e−αx
So Wt has the same exponential distribution as the original Xj —this is the “lack of memory” property.
16. (a) Use the transformation y = λxb . Then
Z ∞ Z ∞ n−1 −y
y e
f (x) dx = dy = 1
0 0 Γ(n)
(b) Straightforward. (c) Use the transformation y = xb .

Chapter 1 Section 11 on page 28 (exs-normal.tex)

1. Using the transformation v = −t gives


Z −x Z x Z ∞
1  2  1  2  1
√ exp − v2/2 dv = 1 − Φ(x)
 
Φ(−x) = √ exp − /2 dt = −
t √ exp − /2 dv =
v
−∞ 2π ∞ 2π x 2π

2. (140 − µ)/σ = Φ−1 (0.3) = −0.5244005 and (200 − µ)/σ = 0.2533471. Hence σ = 77.14585 and µ = 180.4553.
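The same calculation in a few lines (assuming the two probabilities are 0.3 and 0.6, as the quoted values of Φ⁻¹ indicate):
    from scipy.stats import norm

    z1, z2 = norm.ppf(0.3), norm.ppf(0.6)
    sigma = (200 - 140) / (z2 - z1)
    mu = 140 - sigma * z1
    print(sigma, mu)        # approximately 77.146 and 180.455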
3. We can take Y = 0 with probability 1/2 and Y = Z with probability 1/2, where Z ∼ N(0, 1). Hence E[Y^n] = \frac{1}{2}E[Z^n] = 0 if n is odd and \frac{1}{2}E[Z^n] = n!/(2^{(n+2)/2}(n/2)!) if n is even.
4. (a) Clearly we hmust have c > 0; i also a > 0 is necessaryin order  to ensure fX (x) can integrate
 to  Q(x)2 
1. Now =
2 b b 2 b2 b2 b 2 b2 (x−µ)
a(x − a x) = a (x − 2a ) − 4a2 and hence fX (x) = c exp − 4a2 exp −a(x − 2a ) = c exp − 4a2 exp − 2σ2
 2
b 1 b
where µ = 2a and σ 2 = 2a . Because fX (x) integrates to 1, we must also have c exp − 4a 2 = σ√12π . This answers (a).
b 1

(b) X ∼ N 2a , σ 2 = 2a .
5. Clearly X/σ and Y /σ are i.i.d. with the N (0, 1) distribution. Hence (X 2 + Y 2 )/σ 2 ∼ χ22 = Γ(1, 1/2) which is the
exponential ( 1/2) distribution with density 21 e−x/2 for x > 0. Hence X 2 + Y 2 ∼ exponential ( 1/2σ2 ) with expectation 2σ 2 .
(b) Clearly X1/σ, . . . , Xn/σ are i.i.d. with the N (0, 1) distribution. Hence Z 2 /σ 2 ∼ χ2n = Gamma( n/2, 1/2). Hence
Z 2 ∼ Gamma( n/2, 1/2σ2 ).
1 −(x2 +y 2 )/2
6. The
distribution
of (X, Y ) is f(X,Y ) (x, y) = 2π e for x ∈ R and y ∈ R. The absolute value of the Jacobian is
∂(v,w) 2(x2 +y2 ) 2
∂(x,y) = y2 = 2(w + 1). Note that two values of (x, y) lead to the same value of (v, w). Hence f(V,W ) (v, w) =
1
2π(1+w2 )
e−v/2 for v > 0 and w ∈ R. Hence V and W are independent with fV (v) = 21 e−v/2 for v > 0 and fW (w) =
1
π(1+w2 )
for w ∈ R. So V ∼ exponential ( 21 ) and W ∼ Cauchy(1).
7.
f(X|Y1 ,...,Yn ) (x|y1 , . . . , yn ) ∝ f (y1 , . . . , yn , x) = f (y1 , . . . , yn |x)fX (x)
n
(y − x)2 (x − µ)2
   
1 1
√ exp − i 2
Y
= √ exp −
σ 2π 2σ1 σ 2π 2σ 2
i=1 1
 Pn 2 Pn
nx2 x2 µ2

y 2x i=1 y1 2µx
∝ exp − i=12 i + − − + −
2σ1 2σ12 2σ12 2σ 2 2σ 2 2σ 2
 Pn
nx2 x2

x i=1 y1 µx
∝ exp 2
− 2
− 2
+ 2
σ1 2σ1 2σ σ
 2
 Pn
αx n 1 µ yi
= exp − + βx where α = 2 + 2 and β = 2 + i=12
2 σ1 σ σ σ1
 α 
2
∝ exp − (x − β/α)
2
Hence the distribution of (X|Y1 = y1 , . . . , Yn = yn ) is N ( β/α, σ 2 = 1/α). Note that α, the precision of the result is the
sum of the (n + 1)-precisions. Also, the mean is a weighted average of the input means:
 P
β µ σ12 /n + ( yi )σ 2
= 
α σ12 /n + σ 2
q
2 2 2 2
8. (a) fY (y) = σ√22π e−y /2σ = σ1 π2 e−y /2σ for y ∈ (0, ∞).
2 ∞
q R q h q q
∞ 2 2 2
(b) E[|X|] = σ1 π2 0 xe−x /2σ dx = σ1 π2 −σ 2 e−x /2σ = σ1 π2 σ 2 = σ π2
0
9. (a) FY (x) = P[Y ≤ x] = P[−x ≤ X ≤ x] = Φ(x) − Φ(−x). Hence fY (x) = fX (x) + fX (−x); hence
 r
(x − µ)2 (x + µ)2 (x2 + µ2 )
      µx 
1 1 2
fY (x) = √ exp − 2
+ √ exp − 2
= 2
exp − 2
cosh
2πσ 2 2σ 2πσ 2 2σ πσ 2σ σ2
by using cosh x = 21 (ex + e−x ).
Z ∞ Z ∞
(x − µ)2 (x + µ)2
    
1
E[Y ] = √ x exp − dx + x exp − dx
2πσ 2 0 2σ 2 0 2σ 2
Z ∞ Z ∞
(x − µ)2 (x + µ)2
    
1
= √ (x − µ) exp − dx + (x + µ) exp − dx + A
2πσ 2 0 2σ 2 0 2σ 2
µ2 µ2
    
1
=√ σ 2 exp − 2 + σ 2 exp − 2
2πσ 2 2σ 2σ
where
Z ∞ Z ∞
(x − µ)2 (x + µ)2
    
µ
A= √ exp − dx − exp − dx
2πσ 2 0 2σ 2 0 2σ 2
"Z #
∞ Z ∞
µσ −y 2 /2 2
h µ  µ i h  µ i
= √ e dy − e−y /2 dy = µ Φ −Φ − = µ 1 − 2Φ −
2πσ 2 −µ/σ µ/σ σ σ σ
Hence
r
µ2
   µ i
2 h
E[Y ] = σ exp − 2 + µ 1 − 2Φ −
π 2σ σ
2 2
Clearly var[Y ] = var[|X|] = E[X ] − {E[|X|} = var[X] + {E[X]} − µ2Y = σ 2 + µ2 − µ2Y .
2
tY
(c) The mgf, E[e ], is
Z ∞ Z ∞
(x − µ)2 (x + µ)2
    
1
√ exp tx − dx + exp tx − dx =
2πσ 2 0 2σ 2 0 2σ 2
 2 2 Z ∞ Z ∞
(x − µ − σ 2 t)2 σ 2 t2 (x + µ − σ 2 t)2
      
1 σ t
√ exp + µt exp − dx + exp − µt exp − dx
2πσ 2 2 0 2σ 2 2 0 2σ 2
 2 2 h  2 2 h
σ t  µ i σ t µ i
= exp + µt 1 − Φ − − σt + exp − µt 1 − Φ − σt
2 σ 2 σ
Hence the cf is
σ 2 t2 σ 2 t2
  h  µ i  h µ i
itY
φY (t) = E[e ] = exp − + iµt 1 − Φ − − iσt + exp − − iµt 1 − Φ − iσt
2 σ 2 σ
10. Z ∞ µ
(x − µ)2 (x − µ)2
  Z   
 n
 1 n n
E |X − µ| = √ (x − µ) exp − dx + (µ − x) exp − dx
2πσ 2 µ 2σ 2 −∞ 2σ 2
Z ∞  2
2 t
=√ tn σ n exp − dt
2π 0 2
Z ∞
σn σn 2n/2 σ n
   
n+1 n+1
=√ v (n−1)/2 exp(− v/2) dv = √ 2(n+1)/2 Γ = √ Γ
2π 0 2π 2 π 2
itZ1 +isZ2 i(t+s)X i(t−s)Y − 21 (t+s)2 − 21 (t−s)2 −s2 −t2 itZ1 isZ2
11. (a) Now E[e ] = E[e ]E[e ]=e e = e e √ = E[e ]E[e ].√
We have shown that X −Y and X +Y are i.i.d. N (0, 2). Hence V1 = (X −Y )/ 2 and V2 = (X +Y )/ 2 are i.i.d. N (0, 1).
Finally X−Y /X+Y = V1/V2 . (b) Let X1 = X+Y 2
2 , Then X1 ∼ N (0, σ = /2). Let X2 =
1 X−Y
2 , Then X2 ∼ N (0, σ = /2).
2 1
2 2
Let Z1 = 2X1 and Z2 = 2X2 . Now X1 and X2 are independent √ by part (a); hence Z1 and√Z2 are independent. Hence
Z1 and Z2 are i.i.d. χ21 = Gamma( 1/2, 1/2) with c.f.√1/ 1 − 2it. Hence −Z2 has the c.f.√ 1/ 1 + 2it. Because Z1 and Z2
are independent, the c.f. of 2XY = Z1 − Z2 is 1/ 1 + 4t2 . Hence the c.f. of XY is 1/ 1 + t2 .
(c) Z ∞ Z ∞
itXY ityX 1 1 2
E[e ]= E[e ]fY (y) dy = E[eityX ] √ e− 2 y dy
−∞ −∞ 2π
Z ∞ Z ∞
1 1 2 2 1 2 1 1 2 2 1
=√ e− 2 t y e− 2 y dy = √ e− 2 y (1+t ) dy = √
2π −∞ √ 2π −∞ 1 + t2 √
(d) Now X = σX1 and Y = σY1 where the c.f. of X1 Y1 is 1/ 1 + t . Hence the c.f. of XY is 1/ 1 + σ 4 t2 .
2
(e) Take σ = 1. Then the m.g.f. Z ∞ is Z ∞
tXY 1 1 2
E[e ]= E[etyX ]fY (y) dy = E[etyX ] √ e− 2 (y−µ) dy
−∞ −∞ 2π
Z ∞ µ2 Z

1 µty+ 12 t2 y 2 − 21 (y−µ)2 e− 2 1 2 2
=√ e e dy = √ e− 2 y (1−t )+µy(1+t) dy
2π −∞ 2π −∞
2
− µ2 Z ∞ 2
(1 − t )
  
e 2 2µy
= √ exp − y − dy
2π −∞ 2 1−t
Z ∞ " 2 #
µ2 µ2 (1 + t) (1 − t2 )
  
1 µ
= exp − + √ exp − y− dy
2 2(1 − t) 2π −∞ 2 1−t
 2 
µ t 1
= exp √
1−t 1 − t2
tXY tσ 2 X1 Y1
For the general case E[e ] = E[e ] where X1 and Y1 are i.i.d. N ( µ/σ, 1) and hence
2
iµ2 t
   
tXY µ t 1 itXY 1
E[e ] = exp √ and the c.f. is E[e ] = exp √
1 − tσ 2
1−σ t 4 2 1 − itσ 2
1 + σ 4 t2
2
12. Use the previous question. In both cases, the c.f. is 1/(1 + t ).
13. (a) Now Z ∞
1 ∞
Z  
 
b2
 b b   2

1
exp − 2 u + u2 2
du = 1 − 2 + 2 + 1 exp − 12 u2 + ub 2 du
0 2 0 u u
Consider the integral
Z ∞ 
b 
1

2 b2

I1 = + 1 exp − 2 u + u2
du
0 u2

b dz b
The transformation u → z with z = u − is a 1 − 1 transformation: (0, ∞) → (−∞, ∞). Also
u du =1+ u2
. Hence
Z ∞
1 2 √
I1 = e−b e− 2 z dz = e−b 2π
−∞
Now consider the integral
∞  
b
Z   
b2
I2 = 1 − 2 exp − 12 u2 + u2
du
0 u
√ √
Consider the transformation z = u + ub . This is a 1 − 1 transformation (0, b) → (∞, 2 b) and a 1 − 1 transformation
√ √
( b, ∞) → (2 b, ∞). Hence
Z √b Z ∞ !   Z 2√b Z ∞
b   2
 1 2 1 2
I2 = + √ 1
1 − 2 exp − 2 u2 + u2 b
du = eb e− 2 z dz + √ eb e− 2 z dz = 0
0 b u ∞ 2 b
as required.
(b) Just use the transformation u = |a|v in part (a) and then set b1 = b/|a|.

Chapter 1 Section 13 on page 31 (exs-logN.tex)

1. Let S_4 denote the accumulated value at time t = 4 and let s_0 denote the initial amount invested. Then S_4 = s_0(1 + I_1)(1 + I_2)(1 + I_3)(1 + I_4) and \ln(S_4/s_0) = \sum_{j=1}^{4} \ln(1 + I_j).
Recall that if Y ∼ lognormal(µ, σ²) then Z = \ln Y ∼ N(µ, σ²). Also E[Y] = E[e^Z] = e^{µ + σ²/2} and var[Y] = e^{2µ+σ²}(e^{σ²} − 1). Hence e^{σ²} = 1 + var[Y]/E[Y]² and e^µ = E[Y]/\sqrt{1 + var[Y]/E[Y]²}, or µ = \ln E[Y] − σ²/2.
Using mean = 1.08 and variance = 0.001 gives µ_1 = 0.0765325553785 and σ_1² = 0.000856971515297.
Using mean = 1.06 and variance = 0.002 gives µ_2 = 0.0573797028389 and σ_2² = 0.00177841057009.
Hence \ln(S_4/s_0) ∼ N(2µ_1 + 2µ_2, 2σ_1² + 2σ_2²) = N(0.267824516435, 0.00527076417077). We want
    0.95 = P[S_4 > 5000] = P[\ln(S_4/s_0) > \ln(5000/s_0)] = P\!\left[Z > \frac{\ln(5000/s_0) − (2µ_1 + 2µ_2)}{\sqrt{2σ_1² + 2σ_2²}}\right]
Hence
    0.05 = \Phi\!\left(\frac{\ln(5000/s_0) − (2µ_1 + 2µ_2)}{\sqrt{2σ_1² + 2σ_2²}}\right)   and so   \frac{\ln(5000/s_0) − (2µ_1 + 2µ_2)}{\sqrt{2σ_1² + 2σ_2²}} = \Phi^{-1}(0.05)
Hence \ln(5000/s_0) = (2µ_1 + 2µ_2) + \Phi^{-1}(0.05)\sqrt{2σ_1² + 2σ_2²} = 0.148408095871.
Hence s_0 = 5000\exp\!\left(−(2µ_1 + 2µ_2) − \Phi^{-1}(0.05)\sqrt{2σ_1² + 2σ_2²}\right) = 4310.39616086 or £4,310.40.
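A numerical re-derivation of these figures (the 95% probability and the £5,000 target are as quoted above):
    import numpy as np
    from scipy.stats import norm

    def lognormal_params(mean, var):
        # invert E[Y] and var[Y] for Y ~ logN(mu, s2)
        s2 = np.log(1 + var / mean**2)
        return np.log(mean) - s2 / 2, s2

    mu1, s1 = lognormal_params(1.08, 0.001)
    mu2, s2 = lognormal_params(1.06, 0.002)

    m, v = 2 * mu1 + 2 * mu2, 2 * s1 + 2 * s2
    s0 = 5000 * np.exp(-m - norm.ppf(0.05) * np.sqrt(v))
    print(mu1, s1, mu2, s2)
    print(s0)               # approximately 4310.40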
2. (a) Let Z = 1/X. Then ln(Z) = − ln(X) ∼ 2
 N (−µ, σ ). Hence Z ∼ logN2 (−µ,
2
 σ ). (b) Let Z = cX b . Then
2 2 2
ln(Z) = ln(c) + b ln(X) ∼ N ln(c) + bµ, b σ . Hence Z ∼ logN ln(c) + bµ, b σ .
3. (a) Now X ∼ logN (µ, σ 2 ); hence ln(X) ∼ N (µ, σ 2 ). Hence ln(GMX ) = E[ln(X)] = µ; hence GMX = eµ .
2
(b) Now ln(GVX ) = var[ln(X)] = σ 2 . Hence GVX = eσ and GSDX = eσ .
4. The median is eµ because P[X < eµ ] = P[ln(X) < µ] = 1/2. Hence the median equals the geometric mean. The mean
1 2
is eµ+ 2 σ by equation(12.3a) on page 30. For the mode, we need to differentiate the density function which is
(ln(x) − µ)2
 
c
fX (x) = exp − for x > 0.
x 2σ 2
Hence
c 2(ln(x) − µ) (ln(x) − µ)2
   
dfX (x) c
= − 2− exp −
dx x x 2xσ 2 2σ 2
2 2 1 2
which equals 0 when x = eµ−σ . Clearly mode = eµ−σ < median µ
 = e < mean = e 2 .
µ+ σ
ln(q1 )−µ ln(q1 )−µ
(b) Lower quartile: 0.25 = P[X < q1 ] = P[ln(X) < ln(q1 )] = Φ σ and hence σ = −0.6744898 and hence
q1 = eµ−0.6744898σ . Similarly for the upper quartile,
 q3 =eµ+0.6744898σ .
ln(αp )−µ
(c) p = P[X ≤ αp ] = P[ln(X) ≤ ln(αp )] = Φ σ . Hence ln(αp ) = µ + σβp as required.
Pn Pn Pn  Pn Pn 
5. (a) Let Z = X1 · · · Xn . Then ln(Z) = i=1 ln(Xi ) ∼ N i=1 µi , i=1 σi2 . Hence Z ∼ logN i=1 µi , i=1 σi2 .
1/n
Pn 2 2
(b) Let Z = (XQ1 n· · · Xani) . Then ln(Z) =Pn i=1 ln(Xi )/n ∼ N (µ, Pσn /n). Hence Pn Z2 ∼2 logN (µ, σ /n). Pn
(c) Let Z = i=1 Xi . Then ln(Z) =
qP i=1 ai ln(Xi ) ∼ N i=1 ai µi , i=1 ai σi . Hence mn = i=1 ai µi and
n 2 2
sn = i=1 ai σi .
6. Let Z = X1 /X2 . Then Z ∼ logN (µ1 − µ2 , σ12 + σ22 ) by using the previous 2 questions.
1 2 2 2
7. We know that α = eµ+ 2 σ and β = e2µ+σ (eσ −1 ). Hence
α2 α2
 
β α
σ 2 = ln 1 + 2 and eµ = p =p or µ = ln p
α 1 + β/α2 β + α2 β + α2

8. Now for x ∈ (0, k) we have


(ln(x) − µ)2
 
1
f (x|X < k) = √ exp −
P[X < k]σx 2π 2σ 2
(ln(x) − µ)2 √ ln(k) − µ
   
1
= exp − where α = σ 2πΦ
xα 2σ 2 σ
and hence
k
(ln(x) − µ)2
Z  
1
E[X|X < k] = exp − dx
α 0 2σ 2
Using the transformation w = ln(x) − µ − σ 2 /σ gives dw 1 2
 2 2
dx = xσ and (ln(x) − µ) /σ = (w + σ) . Hence
2 2
1 (ln(k)−µ−σ )/σ (w + σ)2 σ (ln(k)−µ−σ )/σ w2 σ 2
Z   Z  
E[X|X < k] = exp − xσ dw = exp − + + µ dw
α −∞ 2 α −∞ 2 2
 
ln(k)−µ−σ 2
σ2
Φ σ
= eµ+ 2  
Φ ln(k)−µ
σ
1 2
The other result is similar or use E[X|X < k]P[X < k] + E[X|X > k]P[X > k] = E[X] = eµ+ 2 σ .
9. (a)
Z x
(ln(u) − µ)2
  
1 1
G(x) = exp −jµ − j 2 σ 2 uj √ exp − du
2 0 uσ 2π 2σ 2
Z x
(ln(u) − µ − jσ 2 )2
 
1
= √ exp − du as required.
0 uσ 2π 2σ 2
Setting j = 1 in part (a) shows that xfX (x) = E[X]fX1 (x) where X1 ∼ logN (µ + σ 2 , σ 2 ). (b)
Z ∞Z u Z ∞Z ∞
2E[X]γX = (u − v)fX (u)fX (v) dvdu + (v − u)fX (u)fX (v) dvdu
u=0 v=0 u=0 v=u
Z ∞Z u Z ∞Z v
= (u − v)fX (u)fX (v) dvdu + (v − u)fX (u)fX (v) dvdu
u=0 v=0 v=0 u=0
Z ∞Z u
=2 (u − v)fX (u)fX (v) dvdu
Zu=0

v=0
Z ∞ Z u 
=2 uFX (u)fX (u)du − 2 vfX (v)dv fX (u)du
u=0 u=0 v=0
Z ∞ Z ∞ 
= 2E[X] FX (u)fX1 (u)du − FX1 (u) fX (u)du where X1 ∼ logN (µ + σ 2 , σ 2 ).
u=0 u=0
   
= 2E[X] P[X ≤ X1 ] − P[X1 ≤ X] = 2E[X] P[ X/X1 ≤ 1] − P[ X1/X ≤ 1]

But X/X1 ∼ logN (−σ 2 , 2σ 2 ) and P[ X1/X ≤ 1] = P[X ≥ X1 ] = 1 − P[X < X1 ]. Hence
 
γX = 2P[Y ≤ 1] − 1 where Y ∼ logN (−σ 2 , 2σ 2 ).
   
σ
= 2Φ √ −1 as required.
2

Chapter 1 Section 15 on page 35 (exs-betaarcsine.tex.tex)

1.
1
Γ(α + β) Γ(α + β)Γ(m + α)
Z
m
E[X ] = xm+α−1 (1 − x)β−1 dx =
Γ(α)Γ(β) 0 Γ(α)Γ(m + α + β)
α−1 dx
2. X has density fX (x) = αx . Let Y = − ln(X). Then Y ∈ (0, ∞) and X = e−Y and dy = −x. Hence fY (y) =
.
dy
fX (x) dx = αe−y(α−1) × e−y = αe−αy for y ∈ (0, ∞), as required.
dy
3. Let Y = X/(1 − X); then Y ∈ (0, ∞). Also X = Y /(1 + Y ), 1 − X = 1/(1 + Y ) and dx = (1 + y)2 . Hence
dx xα−1 (1 − x)β−1 y α−1

1 1
fY (y) = fX (x) = = for y ∈ (0, ∞), as required.
dy B(α, β) (1 + y)2 B(α, β) (1 + y)α+β
Page 120 Answers 1§15 Jan 8, 2019(21:02) Bayesian Time Series Analysis

4.
Z 1
Γ(n + 1)
P[X > p] = z k−1 (1 − z)n−k dz
p Γ(k)Γ(n − k + 1)
Z 1
Γ(n + 1)
= [p + (1 − p)y]k−1 (1 − p)n−k+1 (1 − y)n−k dy
Γ(k)Γ(n − k + 1) 0
k−1  Z 1
k−1 r

Γ(n + 1) X
= p (1 − p)n−r y k−1−r (1 − y)n−k dy
Γ(k)Γ(n − k + 1) r 0
r=0
k−1 k−1  
X n! (k − 1 − r)!(n − k)! X n r
= pr (1 − p)n−r = p (1 − p)n−r
(n − k)!r!(k − 1 − r)! (n − r)! r
r=0 r=0
= P[Y ≤ k − 1]
5. (a) Suppose α > 0 and β > 0. Then for all x > 0 we have
Z ∞
xα−1 xα B(α + 1, β − 1) α
fX (x) = α+β
Hence E[X] = α+β
dx = =
B(α, β) (1 + x) 0 B(α, β) (1 + x) B(α, β) β − 1
R∞ α −α−β
using 0 x (1 + x) dx = B(α + 1, β − 1) for all α > −1 and β > 1.
(b) Similarly, for all β > 2 we have
Z ∞
xα+1 B(α + 2, β − 2) α(α + 1)
E[X 2 ] = α+β
dx = =
0 B(α, β) (1 + x) B(α, β) (β − 1)(β − 2)
Hence var[X].
(c) Suppose α ≤ 1. Then fX (x) ↑ as x ↓ and the mode is at 0. Now suppose α > 1 and let g(x) = xα−1 /(1 + x)α+β .
Then g 0 (x) = ( α − 1 − x(1 + β) ) /x(1 + x). Hence g 0 > 0 for x < (α − 1)/(1 + β) and g 0 > 0 when x > (α − 1)/(1 + β).
So the mode is at (α − 1)/(1 + β).
dy dy α+1
B(α, β)(1 + x)α+β =
2 2

(d) Let Y = 1/X; hence | dx | = 1/x . Hence fY (y) = f X (x)/| dx | = x fX (x) = x
y β−1 B(β, α)(1 + y)α+β as required.

(Note that B(α, β) = B(β, α).)
(e) Let V = X/Y and W = Y ; then ∂(v,w)
1
∂(x,y) = y . Now

xn1 −1 y n2 −1 e−(x+y) f(X,Y ) (x, y) xn1 −1 y n2 e−(x+y) v n1 −1 wn1 +n2 −1 e−w(1+v)


f(X,Y ) (x, y) = and f(V,W ) (v, w) = = =
Γ(n1 )Γ(n2 ) ∂(v,w)
Γ(n1 )Γ(n2 ) Γ(n1 )Γ(n2 )
∂(x,y)
and hence
Z ∞
v n1 −1 v n1 −1 Γ(n1 + n2 )
fV (v) = wn1 +n2 −1 e−w(1+v) dw = as required.
Γ(n1 )Γ(n2 ) w=0 Γ(n1 )Γ(n2 ) (1 + v)n1 +n2
(f) Just use X/Y = (2X)/(2Y ) and part (e).
6. Throughout x ∈ [0, 1] and we take arcsin(x) ∈ [0, π2 ].
Let y = arcsin(x); then sin(y) = x and sin(−y) = −x. Hence arcsin(−x) = − arcsin(x).
Let y = π2 − arccos(x). Then sin(y) = x and hence arccos(x) + arcsin(x) = π2 .
Now sin(y) = x and 1 − 2x2 = cos2 (y) − sin2 (y) = cos(2y); hence 21 arccos(1 − 2x2 ) = y = arcsin(x).

Combining gives 2 arcsin( x) = arccos(1 − 2x) = π2 − arcsin(1 − 2x) = π2 + arcsin(2x − 1).
7. (a) Let Y = kX + m. Hence
fX (x) fX (x) 1
fY (y) = dy = = √ the density of an arcsin(m + ak, m + bk) distribution.
| dx | k π (y − m − ak)(bk + m − y)
(b) Let Y = X 2 ; then Y ∈ (0, 1). Also
fX (x) fX (x) fX (x) 1 1
fY (y) = 2 dy = 2 = = √ = √ as required.
| dx | 2|x| |x| |x|π 1 − x2 π y(1 − y)
(c) Let Y = sin(X). Then Y ∈ (−1, 1). Also
fX (x) fX (x) 2 1 1 1
fY (y) = 2 dy = 2 = = p = √ as required.
| dx | | cos(x)| 2π | cos(x)| π 1−y 2 π (1 − y)(1 + y)
Let Y = sin(2X). Then Y ∈ (−1, 1). Also
fX (x) fX (x) 1 1 1 1
fY (y) = 4 dy = 4 = = p = √ as required.
| dx | 2| cos(2x)| π | cos(2x)| π 1 − y 2 π (1 − y)(1 + y)
Let Y = − cos(2X). Then Y ∈ (−1, 1). Also
fX (x) fX (x) 1 1 1 1
fY (y) = 4 dy = 4 = = p = √ as required.
| dx | 2| sin(2x)| π | sin(2x)| π 1 − y 2 π (1 − y)(1 + y)
Appendix Jan 8, 2019(21:02) Answers 1§17 Page 121

8. (a) Now V = X + Y has a triangular distribution on (−2π, 2π) with density:


|v|
 
1
fV (v) = 1− for −2π < v < 2π.
2π 2π
Let Z = sin(V ) = sin(X + Y ). Then Z ∈ (−1, 1). Also
|v|
X fV (v) X fV (v)  
1 1 X
fZ (z) = dz
= =√ 1−
v
| dv | v
| cos(v)| 1 − z 2 2π v 2π
The 4 values of v leading to z are sin−1 −1
P(z), π − sin (z), which are both√positive, and −2π + sin−1 (z) and −π − sin−1 (z)
which are both negative. This gives v |v| = 4π. Hence fZ (z) = 1/π 1 − z 2 . (b) Now −Y ∼ Uniform(−π, π).
Hence result follows by part (a).

Chapter 1 Section 17 on page 39 (exs-tCauchyF.tex.tex)

1. Let X 0 = X/σ. Then X 0 ∼ N (0, 1). Let Y 0 = (Y12 + · · · + Yn2 )/σ 2 . Then Y 0 ∼ χ2n . Also
X0
Z=p ∼ tn
Y 0 /n
2. Use the transformation from (X, Y ) to (W, Z) where
X
W =Y and Z=p
Y /n
Hence w ∈ (0, ∞) and z ∈ R and
√ √
∂(w, z)
= √n = √ n
∂(x, y) y w
Now

∂(x, y)
f(W,Z) (w, z) = f(X,Y ) (x, y) = f(X,Y ) (x, y) √w
∂(w, z) n
n/2−1 −y/2

1 2 y e w
= √ e−x /2 n/2 √
2π 2 Γ n/2 n
1 2
= 1/2 (n+1)/2 1/2  e−z w/2n w(n−1)/2 e−w/2
π 2 n Γ n/2
But

Γ n+1
Z 
(n−1)/2 −αw 2
w e dw = (n+1)/2
0 α
Hence
Γ n+1

z2
 
1 2 1
fZ (z) = 1/2 (n+1)/2 1/2  where α = 1+
π 2 n Γ n/2 α(n+1)/2 2 n

2 −(n+1)/2
Γ n+1/2
 
1 z
= 1/2 √ 1+ as required.
π Γ /2 n n n
3. First, let x = (t − α)/s:
Z ∞  n/2 Z ∞  n/2
1 1
dt = s dx
−∞ 1 + (t − α)2 /s2 −∞ 1 + x2
But from equation(16.1b) on page 36, we know that
Z ∞ −(n+1)/2
t2 √
1+ dt = B( 1/2, n/2) n
−∞ n

Letting x = t/ n gives
Z ∞
(1 + x2 )−(n+1)/2 dx = B 1/2, n/2

−∞
and hence the result.
Page 122 Answers 1§17 Jan 8, 2019(21:02) Bayesian Time Series Analysis
Pn P
4. Proof 1. Now Y = Z12 + · · · + Zn2 ; hence i=1 Zi2 /n−→1 asq n → ∞ by the Weak Law of Large Numbers. Using the
√ √ Pn 2 P
simple inequality | a − 1| = |a − 1|/| a + 1| < |a − 1| gives i=1 Zi /n−→1 as n → ∞. One of the standard results
by Slutsky(see for example p285 in [G RIMMETT & S TIRZAKER(1992)]) is:
D P D
if Zn −→Z as n → ∞ and Yn −→c as n →√∞ where c 6= 0 then Zn /Yn −→Z/c as n → ∞. Hence result.
n −n
Proof 2. Stirling’s formula is n! ∼ n e 2πn as n → ∞. Using this we can show that
1 1
lim √ =√ as n → ∞.
n→∞ B( 1/2, n/2) n 2π
Also
−(n+1)/2
t2

2
lim 1 + = e−t /2 as n → ∞.
n→∞ n

dy
5. (a) Now fX (x) = π(1+x2 ) for x ∈ R and dx = x12 . Hence fY (y) = π(1+y
1 1
2 ) for y ∈ R. Hence Y ∼ Cauchy(1).

(b) As for part (a), fY (y) = π(1+ss 2 y2 ) for y ∈ R which is Cauchy( 1/s). Or: X/s ∼ Cauchy(1); by part (a) we have
s/X ∼ Cauchy(1); hence 1/X ∼ Cauchy( 1/s). s
(c) fY (y) = π[s2 y2 +(my−1)2 ] for y ∈ R.

X X Y
6. Now W = /Y = ( /σ)/( /σ). Hence we can take σ = 1 without loss of generality. (a) Now W = X/Y = X/ Z
where X ∼ N (0, 1) and Z ∼ χ21 and X and Z are independent. Hence W ∼ t1 = γ1 , the Cauchy density.
2
(b) As for part (a). (c) The folded Cauchy ∂u density which 2is fW (w) = 2/π(1 + w ) for w > 0. P ∂u
7. (a) Let W = tan(U ). Then fW (w) = fU (u) ∂w = 1/π(1 + w ). (b) Let W = tan(U ). Then fW (w) = u fU (u) ∂w
=
2
(2/2π) × 1/(1 + w ) as required.
t t
h |t| n
i
8. (a) φ(t) = E[eitY ] = E[ei n (X1 +···Xn ) ] = E[ei n X1 ]n = e−s n = e−s|t| as required. (b) Use proposition(2.7a)
√ D
on page 6. Hence 2 n Mn /(πs) =⇒ N (0, 1) as n → ∞. Hence Mn is asymptotically normal with mean 0 and
2 2
variace π s /(2n).
9. (a) φ2X (t) = E[eit2X ] = E[ei(2t)X ] = e−2s|t| . This is the Cauchy γ2s distribution.
(b) φX+Y (t) = E[eit(X+Y ) ] = E[eit(aU +cU +bV +dV ) ] = E[eit(a+c)U ]E[eit(b+d)V ] = e−s(a+c)|t| e−s(b+d)|t| = e−s(a+b+c+d)|t|
which is the Cauchy distribution γs(a+b+c+d) .
10. We have Y = R sin(Θ) and X = R cos(Θ). Also ∂(x,y) ∂(r,θ) = r. Hence


∂(x, y)
f(R,Θ) (r, θ) = f(X,Y ) (x, y)
= 1 e−(x2 +y2 )/2 r = 1 re−r2 /2
∂(r, θ) 2π 2π
2
Hence Θ is uniform on (−π, π), R has density re−r /2 for r > 0 and R and Θ are independent.
If W = R2 then the density of W is fW (w) = 12 e−w/2 for w > 0; this is the χ22 distribution.
11. Let X = tan(Θ); hence Θ ∈ (− π/2, π/2). Then P[Θ ≤ θ] = P[X ≤ tan(θ)] = 1/2 + θ/π and fΘ (θ) = 1/π. Hence Θ has the
uniform distribution on (− π/2, π/2). Now 2X/(1 − X 2 ) = tan(2Θ). So we want the distribution of W = tan(Y ) where Y
has the uniform distribution on (−π, π). Hence

dw
=2 1 1
X
fW (w) = fY (y) as required.
y
dy 2π 1 + w2
12. Now X = b tan Θ and so 0 < X < ∞. For x > 0 we have P[X ≤ x] = P[|b tan Θ| ≤ x] = P[| tan Θ| ≤ x/b] =
2/π tan−1 x/b. Differentiating gives

2 b
fX (x) =
π b2 + x 2
(b) P[ 1/X ≤ x] = P[X ≥ 1/x] = 1 − π2 tan−1 1/bx and this has density π2 1+bb2 x2 for x > 0.
13. (a) Clearly f ≥ 0. Using the transformation y = (1 + x2 )1/2 tan t gives
Z ∞ Z ∞ Z π/2
1 1 1 cos t 1
f (x, y) dy = dy = 2
dt =
y=−∞ 2π y=−∞ (1 + x + y )2 2 3/2 2π t=−π/2 1 + x π(1 + x2 )
which is the standard Cauchy distribution. This answers parts (a) and (b).
(c) The absolute value of the Jacobian of the transformation is ∂(x,y)∂(r,θ) = r. Hence

r
f(R,Θ) (r, θ) = for r > 0 and θ ∈ (0, 2π).
2π(1 + r2 )3/2
Hence R and Θ are independent. Θ ∼ U (0, 2π) and R has density fR (r) = r/(1 + r2 )3/2 for r > 0.
14. Use the transformation from (X, Y ) to (V, W ) where
X/m nX
V = = and W =Y
Y /n mY
Hence v ∈ (0, ∞) and w ∈ (0, ∞) and
∂(v, w) n n
∂(x, y) = my = mw

Appendix Jan 8, 2019(21:02) Answers 1§17 Page 123

Now
∂(x, y)
f(V,W ) (v, w) = f(X,Y ) (x, y) = f(X,Y ) (x, y) mw
∂(v, w) n
xm/2−1 e−x/2 y n/2−1 e−y/2 mw
=  
2m/2 Γ m/2 2n/2 Γ n/2 n
(mwv/n)m/2−1 e−mvw/2n wn/2−1 e−w/2 mw
=  
2m/2 Γ m/2 2n/2 Γ n/2 n
v m/2−1 (m/n)m/2 w mv
= w(m+n)/2−1 e− 2 (1+ n )
2(m+n)/2 Γ(m/2)Γ(n/2)
R∞
Using 0 wk−1 e−αw dw = Γ(k)/αn with α = 21 (1 + mv n ) and integrating out w gives
m+n
Γ m+n
 
m/2−1 m/2
Γ 2 v (m/n) 2 v m/2−1 mm/2 nn/2
fV (v) = m = as required.
Γ( 2 )Γ( n2 ) 2(m+n)/2 α(m+n)/2 Γ( m n
2 )Γ( 2 ) (n + mv)
(m+n)/2

15. Now Z = X12 /Y12 where X1 = X/σ and Y1 √ = Y /σ are i.i.d. N (0, 1). Now X12 and Y12 are i.i.d. χ21 . Hence Z has the F1,1
distribution which has density fZ (z) = 1/π z(1 + z) for z > 0.
16. Now F = (nX)/(mY ); hence E[F ] = nE[X]E[1/Y ]/m = nmE[1/Y ]/m = nE[1/Y ] = n/(n − 2). Also E[F 2 ] =
n2 E[X 2 ]E[1/Y 2 ]/m2 = n2 m(m + 2)/[(n − 2)(n − 4)m2 ] = n2 (m + 2)/[m(n − 2)(n − 4)]. Hence var[F ] = 2n2 (m +
n − 2)/[ m(n − 2)2 (n − 4) ] for n > 4.
17. (a) By definition(18.6a) on page 44. (b) Using §8.3 on page 21 gives 2α1 X1 ∼ Gamma(n1 , 1/2) = χ22n1 and 2α2 X2 ∼
Gamma(n2 , 1/2) = χ22n2 . Hence result by definition(18.6a).
18. Let Y = nX/m(1 − X); then Y ∈ (0, ∞). Also X = mY /(n + mY ), 1 − X = n/(n + mY ) and
m dy 1 dy n
= and hence =
n dx (1 − x)2 dx m(1 − x)2
Hence
dx xm/2−1 (1 − x)n/2−1 m(1 − x)2 y m/2−1 mm/2 nn/2

fY (y) = fX (x) = = as required.
B( m n
B m n

dy 2 , 2) n 2 , 2
(my + n)(m+n)/2
19. (a) This is the reverse of exercise 18.2
n/X
Now X ∼ Fm,n implies 1/X ∼ Fn,m . Hence m+n/X ∼ Beta( n/2, m/2). Hence result.
dy
(b) Let Y = αX/β; then | dx | = α/β. Now for x ∈ (0, ∞) we have
1 (2α)α (2β)β xα−1 1 αα β β xα−1
fX (x) = =
B(α, β) 2α+β [αx + β]α+β B(α, β) [αx + β]α+β
fX (x) 1 αα−1 β β+1 (βy/α)α−1 1 y α−1
fY (y) = dy = = for y ∈ (0, ∞).
| dx | B(α, β) β α+β [1 + y]α+β B(α, β) [1 + y]α+β
Or use X = (Y /2α)/(Z/2β) = (βY )/(αZ) where Y ∼ χ22α , Z ∼ χ22β and Y and Z are independent. Hence X ∼
Beta 0 (α, β) by part (f) of 5 on page 35.
20. Now W = nX/(mY ); hence mW/n = X/Y where X ∼ χ2m , Y ∼ χ2n and X and Y are independent. Hence by part (f)
of exercise 5 on page 35, X/Y ∼ Beta 0 ( m/2, n/2).
P
21. Proof 1. Now W = (X/m)/(Y /n). Now Y = Z12 + · · · + Zn2 where Z1 , . . . , Zn are i.i.d. N (0, 1). Hence Y /n−→1 as
D
n → ∞. Hence, by Slutsky’s theorem (see answer to exercise 4 on page 122), mW = X/(Y /n)−→χ2m as n → ∞.
Proof 2. Now
Γ( m+n ) mm/2 nn/2 wm/2−1
fW (w) = m 2 n for w ∈ (0, ∞).
Γ( 2 )Γ( 2 ) [mw + n](m+n)/2
mm/2 wm/2−1 Γ( m+n
2 )n
n/2
lim fW (w) = m lim n
n→∞ Γ( 2 ) n→∞ Γ( )[mw + n](m+n)/2
2
mm/2 wm/2−1 1 Γ( m+n
2 )
= m lim n/2 n
Γ( 2 ) n→∞ (1 + mw /n) Γ( 2 )[mw + n]m/2
mm/2 wm/2−1 −mw/2 Γ( m+n
2 )
= m e lim n
Γ( 2 ) n→∞ Γ( )[mw + n]m/2
2
mm/2 wm/2−1 −mw/2 1
= e
Γ( m2 ) 2m/2
2
Suppose X has density fX and Y = α(X) where the function α has an inverse. Let fY denote the density of Y . Then if Z has
density fY then α−1 (Z) has density fX . This follows because P[α−1 (Z) ≤ z] = P[Z ≤ α(z)] = P[Y ≤ α(z)] = P[α(X) ≤
α(z)] = P[X ≤ z].
Page 124 Answers 1§21 Jan 8, 2019(21:02) Bayesian Time Series Analysis
√ 1
by using Stirling’s formula: n! ∼ 2π nn+ 2 e−n as n → ∞. Finally, convergence in densities implies convergence in
distribution (see page 252 in [F ELLER(1971)]).

Chapter 1 Section 19 on page 44 (exs-noncentral.tex.tex)

1. Use moment generating functions or characteristic functions.


n    
Y 1 λj t 1 λt
E[etZ ] = exp = exp
j=1
(1 − 2t)kj /2 1 − 2t (1 − 2t)k/2 1 − 2t
2
Hence Z P has a non-centralPnχk distribution with non-centrality parameter λ where k = k1 + · · · + kn and λ = λ1 + · · · + λn .
n
2. E[W ] = j=1 E[Xj ] = j=1 (1 + µ2j ) = n + λ
2

Suppose Z ∼ N (0, 1). Then E[Z] = E[Z 3 ] = 0, E[Z 2 ] = 1 and E[Z 4 ] = 3. Suppose X = Z + µ. Then E[X] = µ,
E[X 2 ] = 1 + µ2 , E[X 3 ] = 3µ + µ3 , and E[X 4 ] = 3 + 6µ2 + µ4 . Hence var[X 2 ] = 2 + 4µ2 .
Hence var[W ] = var[X12 ] + · · · + var[Xn2 ] = 2n + 4λ.
3. Rearranging equation(18.4a) on page 43 gives
∞ ∞
X e−λ/2 ( λ/2)j e−x/2 x(n+2j−2)/2 X e−λ/2 ( λ/2)j
fX (x) = = fn+2j (x)
j! 2(n+2j)/2 Γ( n/2 + j) j!
j=0 j=0
where fn+2j (x) is the density of the χ2n+2j distribution. Hence result.
4. " # "
n−1
#
X +µ √ h
−1/2
i √
−1/2
√ −1/2 Γ 2
E[T ] = E p = nE[X + µ]E Y = nµE[Y ] = nµ 2 as required.
Y /n Γ n2
Similarly
(X + µ)2
 
2 1
E[T ] = E = nE[(X + µ)2 ]E[1/Y ] = n(1 + µ2 ) and hence result.
Y /n n−2
p
5. Now T = (X + µ)/ Y /n where X ∼ N (0, 1), Y ∼ χ2n and X and Y are independent. Hence
(X + µ)2 W
T2 = = where W ∼ χ21,µ2 , Y ∼ χ2n and W and Y are independent.
Y /n Y /n
Hence result.
6.

n n 1
E[F ] = E[X]E[1/Y ] = (m + λ) as required.
m m n−2
Now for the variance:
n2 n2 1
E[F 2 ] = E[X 2
]E[1/Y 2
] = (2m + 4λ + m2 + 2mλ + λ2 )
m 2 m2 (n − 2)(n − 4)
Hence
n2 1
(n − 2) 2(m + 2λ) + (m + λ)2 − (n − 4)(m + λ)2
  
var[F ] =
m (n − 2) (n − 4)
2 2

n2 1
2(m + λ)2 + 2(n − 2)(m + 2λ) as required.
 
= 2
m (n − 2) (n − 4)
2

Chapter 1 Section 21 on page 48 (exs-powerPareto.tex)

α−1
1. Now Y = (X − a)/h has the standard power distribution Power(α, 0, 1) which has density f (y) = αy for 0 < y < 1.
R1
For j = 1, 2, . . . , we have E[Y j ] = α 0 y α+j−1 dy = α/(α + j). Hence
α α α
E[Y ] = E[Y 2 ] = and hence var[Y ] =
α+1 α+2 (α + 1)2 (α + 2)
Now X = a + hY . Hence
αh αh2 2aαh αh2
E[X] = a + E[X 2 ] = + + a2 and var[X] =
α+1 α+2 α+1 (α + 1)2 (α + 2)
nα nα
2. P[Mn ≤ x] = (x − a) /h and so Mn ∼ Power(nα, a, h).
3. (a) FMn (x) = P[Mn ≤ x] = xn for 0 < x < 1. This is the Power(n, 0, 1) distribution with density fMn (x) = nxn−1 for
0 < x < 1. (b) P[U 1/n ≤ x] = P[U ≤ xn ] = xn for 0 < x < 1. The same distribution as for part (a).
(c) Now (X − a)/h ∼ Power(α, 0, 1); hence by part (b) we have (X − a)/h ∼ U 1/α and X ∼ a + hU 1/α . Then use the
binomial theorem and E[U j/α ] = α/(α + j).
4. Now X ∈ (0, h). Hence Y ∈ (ln( 1/h), ∞) and Y − ln( 1/h) ∈ (0, ∞).
Now P[Y ≤ y] = P[ln(X) ≥ −y] = P[X ≥ e−y ] = 1 − e−αy /hα = 1 − e−α(y−ln(1/h)) for y > ln( 1/h). Hence the
density is αe−α(y−ln(1/h)) for y > ln( 1/h), a shifted exponential.
Appendix Jan 8, 2019(21:02) Answers 1§21 Page 125

5. Equation(2.3b) on page 5 gives


n! k−1 n−k n!
fk:n (x) = f (x) {F (x)} {1 − F (x)} = αxkα−1 (1 − xα )n−k
(k − 1)!(n − k)! (k − 1)!(n − k)!
and using the transformation v = xα gives
Z 1 Z 1
n! n! 1
E[Xk:n ] = αxkα (1 − xα )n−k dx = v k−1+ α (1 − v)n−k dv
(k − 1)!(n − k)! 0 (k − 1)!(n − k)! 0
n! Γ(k + α1 )Γ(n − k + 1) n! Γ(k + α1 )
= 1
=
(k − 1)!(n − k)! Γ(n + α + 1) (k − 1)! Γ(n + α1 + 1)
2 n! Γ(k + α2 )
E[Xk:n ]=
(k − 1)! Γ(n + α2 + 1)
6. (a)
Rx
n − 1 0 yf (y) dy

E[Y1 + · · · + Yn−1 ] (n − 1)E[Y ] (n − 1)α
 
Sn
E X(n) = x = 1 + =1+ =1+ =1+
X(n) x x x F (x) α+1
as required.
(b) The density of (X1:n , X2:n , . . . Xn:n ) is f (x1 , . . . , xn ) = n!αn (x1 x2 · · · xn )α−1 /hnα for 0 ≤ x1 ≤ x2 · · · ≤ xn . Con-
sider the transformation to (W1 , W2 , . . . , Wn ) where W1 = X1:n /Xn:n , W2 = X2:n /Xn:n , . . . , Wn−1 = X(n−1):n /Xn:n
and Wn = Xn:n . This has Jacobian with absolute value
∂(w1 , . . . , wn ) 1
∂(x1 , . . . , xn ) = xn−1

n
Hence for 0 < w1 < 1, . . . , 0 < wn < 1, the density of the vector (W1 , . . . , Wn ) is
αn−1
h(w1 , . . . , wn ) = wnn−1 f (w1 wn )f (w2 wn ) · · · f (wn−1 wn ) = α(n−1) w1α−1 w2α−1 · · · wn−1 α−1 (n−1)α
wn
h
Hence W1 , W2 , . . . , Wn are independent. Hence W1 + · · · + Wn−1 is independent of Wn as required.
7. (a) The distribution of X(i) give X(i+1) = x is the same as the distribution of the maximum of i independent random
i
variables from the density f (y)/F (x) for y ∈ (0, x); this maximum has distribution function {F (y)/F (x)} and density
if (y){F (y)}i−1 /{F (x)}i for y ∈ (0, x).
(a) Hence
" #
r Z x
X(i) i
E X(i+1) = x = r y r f (y){F (y)}i−1 dy (21.7a)

r
X(i+1) x {F (x)}i 0
⇐ Substituting f (y) = αy α−1 /hα and FR(y) = y α /hα in the right hand side of equation(21.7a) gives iα/(iα + r)
x
as required. ⇒ Equation(21.7a) gives i 0 y r f (y){F (y)}i−1 dy = cxr {F (x)}i for x ∈ (0, h). Differentiating with
r i−1
respect to x gives ix f (x){F (x)} = cx {F (x)}i−1 [rF (x) + xif (x)]. Hence f (x)/F (x) = cr/ix(1 − c) > 0
r−1

because c < 1. Hence result. (b)


" #
r Z x
X(i+1) ixr f (y){F (y)}i−1
E X = x = dy (21.7b)

r (i+1)
X(i) {F (x)}i 0 yr

and then as for part (a).


8. By equation(2.2a), the density of the vector is (X1:n , X2:n , . . . , Xn:n ) is
g(x1 , . . . , xn ) = n!f (x1 ) · · · f (xn ) = n!αn (x1 x2 · · · xn )α−1 for 0 ≤ x1 ≤ x2 · · · ≤ xn .
The transformation to (W1 , W2 , . . . , Wn ) has Jacobian with absolute value

∂(w1 , . . . , wn ) 1 2 n−2 n−1
∂(x1 , . . . , xn ) = x2 · · · xn where x2 · · · xn = w2 w3 · · · wn−1 wn and x1 = w1 w2 · · · wn

Hence for 0 < w1 < 1, . . . , 0 < wn < 1, the density of the vector (W1 , . . . , Wn ) is
h(w1 , . . . , wn ) = n!αn xα−1 1 (x2 · · · xn )α = (αw1α−1 )(2αw22α−1 ) · · · (nαwnnα−1 ) = fW1 (w1 )fW2 (w2 ) · · · fWn (wn )
Hence W1 , W2 , . . . , Wn are independent. Also fWk (wk ) = kαwkkα−1 which is Power(kα, 0, 1).
(b) Now Xk:n = Wk Wk+1 · · · Wn ; hence
kα (k + 1)α nα
E[Xk:n ] = EWk ]E[Wk+1 ] · · · E[Wn ] = ···
kα + 1 (k + 1)α + 1 nα + 1
n! 1 1 1 n! Γ(k + α1 )
= · · · =
(k − 1)! k + α1 k + 1 + α1 n + α1 (k − 1)! Γ(n + 1 + α1 )
2
Similarly for E[Xk:n ] = EWk2 ]E[Wk+1
2
] · · · E[Wn2 ].
−1/α
9. (a) Just use P[Y ≤ y] = P[U ≤ y] = P[U 1/α ≥ 1/y] = P[U ≥ 1/yα ]. (b) Just use Y ∼ Pareto(α, a, x0 )
iff (Y − a)/x0 ∼ Pareto(α, 0, 1) and part (a). (c) By part (a), 1/X ∼ Pareto(α, 0, 1) iff 1/X ∼ U −1/α where
U ∼ U (0, 1) and hence iff X ∼ U 1/α where U ∼ U (0, 1) and hence iff X ∼ Power(α, 0, 1).
Page 126 Answers 1§21 Jan 8, 2019(21:02) Bayesian Time Series Analysis

10. n
xα xαn

P[Mn > x] = P[X1 > x]n = 0
α
= 0
which is Pareto(αn, a, x0 ).
(x − a) (x − a)αn
11. Now f (x) = αxα 0 /x
α+1
for x > x0 . (a) E[X] = αx0 /(α − 1) if α > 1 and ∞ if α ≤ 1. Also E[X 2 ] = αx20 /(α − 2) if
α > 2 and ∞ otherwise. √ Hence var[X] = αx20 /(α − 1)2 (α − 2) provided α > 2.
α
(b) The median is x0 2. The mode is x0 .
(c) E[X n ] = αxn0 /(α − n) for α > Rn and ∞ otherwise.

(d) Suppose t < 0. Then E[etX ] = x0 αxα tx
0 e /x
α+1
dx. Set v = −tx; hence v > 0. Then
Z ∞ α −v α+1 Z ∞
αx e (−t) dx α(−x0 t)α e−v
E[etX ] = 0
α+1
= dx = α(−x0 t)α Γ(−α, −x0 t)
−tx0 v (−t) −tx0 v α+1
R∞
where Γ(s, x) = x ts−1 e−t dt is the incomplete gamma function. Hence the c.f. is E[eitX ] = α(−ix0 t)α Γ(−α, −ix0 t).
12. Part (a) of exercise 11 shows that E[X] is infinite. Also
  Z ∞ ∞
1 ∞ −5/2 1 −2x−3/2
Z
1 1 1 1
E = dx = x dx = =3
X 1 x 2x 3/2 2 1 2 3 1
13. P[Y ≤ y] = P[ln(X) ≤ y] = P[X ≤ ey ] = 1 − xα 0e
−αy
= 1 − e−α(y−ln(x0 )) for y > ln(x0 ). Hence the density
is αe−α(y−ln(x0 )) for y > ln(x0 ), a shifted exponential. In particular, if X ∼ Pareto(α, 0, 1), then the distribution of
Y = ln(X) is the exponential (α) distribution. R ∞ ln(x)
14. Now GMX is defined by ln(GMX ) = E[ln X]. Either use exercise 13 or directly: E[ln X] = αxα 0 x0 xα+1
dx =
α ∞ −αy 1 1
R 
αx0 ln(x0 ) ye dy = ln(x0 ) + α and hence GMX = x0 exp α .
From the answer to exercise 1§13(9) on page 119 we have, where E[X] = αx0 /(α − 1),
Z ∞ Z ∞ Z u 
2E[X]γX = 2 uF (u)f (u)du − 2 vf (v)dv f (u)du
u=0 u=0 v=0
Z ∞ h  x α i αxα Z ∞ Z u
αxα αxα

0 0 0 0
=2 u 1− α+1
du − 2 v α+1
dv du
u=x0 u u u=x0 v=x0 v uα+1
Z ∞  Z ∞ Z u

 
1 1 1
= 2αxα 0 α
− 0

du − 2α 2 2α
x 0 α
dv α+1
du
u=x0 u u u=x0 v=x0 v u
2α2 x0 2αx0
= −
(2α − 1)(α − 1) (2α − 1)
and hence
α α−1 1
γX = − =
2α − 1 2α − 1 2α − 1
15. Using equation(2.3b) on page 5 gives
 k−1
n! k−1 n−k n! α 1 1
fk:n (x) = f (x) {F (x)} {1 − F (x)} = 1−
(k − 1)!(n − k)! (k − 1)!(n − k)! xα+1 xα xα(n−k)
 k−1
n! α 1
= 1 −
(k − 1)!(n − k)! x(n−k+1)α+1 xα
and hence
∞  k−1
n! α
Z
1
E[Xk:n ] = 1− α dx
(k − 1)!(n − k)! 1 x(n−k+1)α x
1 1
n! v n−k+1 n!
Z Z
1
k−1
= 1 (1 − v) dv = v n−k− α (1 − v)k−1 dv
(k − 1)!(n − k)! 0 v 1+ α (k − 1)!(n − k)! 0
1
n! Γ(n − k − α + 1)Γ(k) n! Γ(n − k + 1 − α1 )
= 1
=
(k − 1)!(n − k)! Γ(n − α + 1) (n − k)! Γ(n + 1 − α1 )
2 n! Γ(n − k + 1 − α2 )
E[Xk:n ]=
(n − k)! Γ(n + 1 − α2 )
16. By equation(2.2a) on page 4, the density of the vector is (X1:n , X2:n , . . . , Xn:n ) is
g(x1 , . . . , xn ) = n!f (x1 ) · · · f (xn ) = n!αn / (x1 x2 · · · xn )α+1 for 1 ≤ x1 ≤ x2 · · · ≤ xn .
The transformation
to (W1 , W2 , . . . , Wn ) has Jacobian with absolute value
∂(w1 , . . . , wn ) 1 n−1 n−2
∂(x1 , . . . , xn ) = x1 · · · xn−1 where x1 · · · xn−1 = w1 w2 · · · wn−1 and xn = w1 w2 · · · wn

Hence for w1 > 1, . . . , wn > 1, the density of the vector (W1 , . . . , Wn ) is


n!αn n!αn
h(w1 , . . . , wn ) = α α+1
= nα+1 (n−1)α+1
(x1 · · · xn−1 ) xn w1 w2 2α+1 w α+1
· · · wn−1 n
Appendix Jan 8, 2019(21:02) Answers 1§21 Page 127

nα (n − 1)α 2α α
= nα+1 (n−1)α+1
· · · 2α+1 α+1 = fW1 (w1 )fW2 (w2 ) · · · fWn (wn )
w1 w2 w
wn−1 n
(n−k+1)α
Hence W1 , W2 , . . . , Wn are independent. Also fWk (wk ) = (n−k+1)α+1
wk
which is Pareto((n − k + 1)α, 0, 1).
(b) Now Xk:n = W1 W2 · · · Wk ; hence
nα (n − 1)α (n − k + 1)α
E[Xk:n ] = EW1 ]E[W2 ] · · · E[Wk ] = ···
nα − 1 (n − 1)α − 1 (n − k + 1)α − 1
n! 1 1 1 n! Γ(n − k + 1 − α1 )
= 1
· · · =
(n − k)! n − α n − 1 − α1 n − k + 1 − α1 (n − k)! Γ(n + 1 − α1 )
2
Similarly for E[Xk:n ] = EW12 ]E[W22 ] · · · E[Wk2 ].
17. Just use (X1 , . . . , Xn ) ∼ ( Y11 , . . . , Y1n ).
18. Let Z = Y /X . Note that F (x) = 1 − xα α
0 /x for x ≥ x0 and 0 otherwise. Then for z > 1 we have
Z ∞ Z ∞ Z ∞
xα αxα αx2α

1 1
P[Z ≤ z] = F (zx)f (x) dx = 1 − α0 α 0
α+1
dx = 1 − 0
α 2α+1
dx = 1 − α
x0 x0 z x x z x0 x 2z
For z < 1 we have
Z ∞ Z ∞ 
xα αxα zα zα

P[Z ≤ z] = F (zx)f (x) dx = 1 − α0 α 0
α+1
dx = z α − =
x0 /z x0 /z z x x 2 2
and hence
α+1
1
fZ (z) = 2 α/z if z > 1;
1 α−1
2 αz if z < 1;
19.
P[M ≤ x, Y /X ≤ y] = P[M ≤ x, Y /X ≤ y, X ≤ Y ] + P[M ≤ x, Y /X ≤ y, Y < X]
= P[X ≤ x, Y ≤ yX, X ≤ Y ] + P[Y ≤ x, Y ≤ yX, Y < X]

P[X ≤ x, Y ≤ yX, X ≤ Y ] + P[Y ≤ x, Y < X] if y > 1;
=
P[Y ≤ x, Y ≤ yX] if y < 1.
Rx
x0
P[v ≤ Y ≤ yv]f (v) dv + P[Y ≤ x, Y < X] if y > 1;
=
P[Y ≤ x, Y ≤ yX] if y < 1.
(Rx Rx
[F (yv) − F (v)] f (v) dv + x0 [1 − F (v)] f (v) dv if y > 1;
= Rxx0  
x0
1 − F ( v/y) f (v) dv if y < 1.
(  Z x
x2α
 
α 1 − 1
+ 1 if y > 1; f (v) 1 0
= x0 I y α
where I = α
dv = α 1 − 2α
yα if y < 1. x0 v 2x0 x
(
x2α 1 − 2y1α if y > 1;
 
0
= 1 − 2α yα = P[M ≤ x] P[ Y /X ≤ y] by using exercises 10 and 18.
x 2 if y < 1.
20. Define the vector (Y1 , Y2 , . . . , Yn ) by
X2:n Xn:n
Y1 = X1:n , Y2 = , . . . , Yn =
X1:n X1:n
Exercise 20 on page 8(with answer 1§3(20) on page 106) shows that for y1 > 0 and 1 ≤ y2 ≤ · · · ≤ yn the density of
the vector (Y1 , . . . , Yn ) is
h(y1 , . . . , yn ) = n!y1n−1 f (y1 )f (y1 y2 )f (y1 y3 ) · · · f (y1 yn )
αn xαn
0 1 1 1
= n! αn+1 · · · α+1
y1 y2α+1 y3α+1 yn
αnxαn 1 1 1 1 1 1
= αn+1 0
(n − 1)!αn−1 α+1 α+1 · · · α+1 = g(y1 ) (n − 1)!αn−1 α+1 α+1 · · · α+1
y1 y2 y3 yn y2 y3 yn
where g is the density of Y1 = X1:n (see answer 10 above). Hence part (a).
(b)
R∞
n − 1 x yf (y) dy

E[Y1 + · · · + Yn−1 ] (n − 1)E[Y ] (n − 1)α
 
Sn
E X(1) = x = 1 + =1+ =1+ =1+
X(1) x x x 1 − F (x) α−1
as required. (c) The result of part (a) implies Y1 is independent of Y2 + · · · + Yn = (Sn −X1:n )/X1:n = Sn/X1:n − 1. Hence
Y1 is independent of Sn/X1:n as required.
Page 128 Answers 1§21 Jan 8, 2019(21:02) Bayesian Time Series Analysis

21. The distribution of X(i+1) give X(i) = x is the same as the distribution of the minimum of n − i independent random
n on−i
1−F (y)
variables from the density f (y)/[1 − F (x)] for y ∈ (x, ∞); this minimum has distribution function 1 − 1−F (x)
and density (n − i)f (y){1 − F (y)}n−i−1 /{1 − F (x)}n−i for y ∈ (x, ∞). Hence
" #
r Z ∞
X(i+1) (n − i)
E r X(i) = x = xr {1 − F (x)}n−i y r f (y){1 − F (y)}n−i−1 dy (21.21a)

X(i) x
⇐ Substituting f (x) = αxα 0 /x
α+1
and F (x) = 1 − xα α
0 /x in the right hand side of equation(21.21a) gives (n −
i)α/((nR− i)α + r) as required. The condition α > r/(n − i) ensures the integral is finite. ⇒ Equation(21.21a) gives

(n − i) x y r f (y){1 − F (y)}n−i−1 dy = cxr {1 − F (x)}n−i for x ∈ (x0 , ∞). Differentiating with respect to x gives
f (x)/{1 − F (x)} = cr/[x(n − i)(c − 1)] = α/x where α = rc/[(n − i)(c − 1)] > r/(n − i). Hence result. Part (b) is
similar.
22. Let X1 = ln(X) and Y1 = ln(Y ). Then X1 and Y1 are i.i.d. random variables with an absolutely continuous distribution.
Also min(X1 , Y1 ) = ln [min(X, Y )] is independent of Y1 − X1 = ln(Y /X). Hence there exists λ > 0 such that X1 and
Y1 have the exponential (λ) distribution. Hence X = eX1 and Y = eY1 have the Pareto(λ, 0, 1) distribution.
23. ⇐ By equation(2.3b) on page 5 the density of Xi:n is
 i−1  n−i
n! i−1 n−i n! β 1 1
fi:n (t) = f (t) {F (t)} {1 − F (t)} = 1 −
(i − 1)!(n − i)! (i − 1)!(n − i)! tβ+1 tβ tβ
n! β β i−1
= βn+1
t −1 for t > 1.
(i − 1)!(n − i)! t
By equation(2.5a) on page 5, the joint density of (Xi:n , Xj:n ) is, where c = n!/[(i − 1)!(j − i − 1)!(n − j)!],
 i−1  j−1−i  n−j
f(i:n,j:n) (u, v) = cf (u)f (v) F (u) F (v) − F (u) 1 − F (v)
i−1  j−i−1  n−j
β2

1 1 1 1
= c β+1 β+1 1 − β − β
u v u uβ v vβ
 β i−1  β j−i−1
β2 u −1 v − uβ
= c β+1 β(n−j+1)+1
u v uβ uβ v β
β2 i−1  β j−i−1
= c β(j−1)+1 β(n−i)+1 uβ − 1 v − uβ

for 1 ≤ u < v.
u v
Use the transformation (T, W ) = (Xi:n , Xj:n/Xi:n ). The absolute value of the Jacobian is ∂(t,w) ∂(u,v) = | /u| = 1/t. Hence
1

β2 β i−1  β j−i−1
f(T,W ) (t, w) = c t −1 w −1 = fi:n (t)fW (w)
tβn+1 wβ(n−i)+1
The fact that the joint density is the product of the marginal densities implies W and Y = Xi:n are independent.
⇒ The joint density of (Xi:n , Xj:n ) is given by equation(2.5a) on page 5. The transformation to T = Xi:n , W =
Xj:n /Xi:n has Jacobian with absolute value ∂(t,w) ∂(u,v) = 1/u = 1/t. Hence (T, W ) has density

 i−1  j−i−1  n−j
f(T,W ) (t, w) = ctf (t)f (wt) F (t) F (wt) − F (t) 1 − F (wt)
Now T = Xi:n has density given by equation(2.3b) on page 5:
n! i−1 n−i
fT (t) = f (t) {F (t)} {1 − F (t)}
(i − 1)!(n − i)!
Hence the conditional density is, for all t > 1 and w > 1,
j−i−1  n−j
(n − i)! F (tw) − F (t) 1 − F (tw)

f(T,W ) (t, w) tf (tw)
f(W |T ) (w|t) = =
fT (t) (j − i − 1)!(n − j)! 1 − F (t) 1 − F (t) 1 − F (t)
(n − i)! 1 − F (tw)
 
∂q(t, w) j−i−1 n−j
= − {1 − q(t, w)} {q(t, w)} where q(t, w) =
(j − i − 1)!(n − j)! ∂w 1 − F (t)
and by independence, this must be independent of t. Hence there exists a function g(w) with
∂q(t, w) j−i−1 n−j
g(w) = {1 − q(t, w)} {q(t, w)}
∂w
j−i−1 
∂q(t, w) X j − i − 1

r n−j
= (−1)r {q(t, w)} {q(t, w)}
∂w r
r=0
j−i−1
X j − i − 1 ∂q(t, w) r+n−j
= (−1)r {q(t, w)}
r ∂w
r=0
j−i−1  r+n−j+1
∂ X j−i−1 {q(t, w)}

= (−1)r
∂w r r+n−j+1
r=0
and hence
Appendix Jan 8, 2019(21:02) Answers 1§23 Page 129

j−i−1 r+n−j+1
X  j−i−1 {q(t, w)}
Z 
g1 (w) = g(w) dw = (−1)r
r r+n−j+1
r=0
j−i−1
X  j−i−1

∂g1 (w) r+n−j ∂q(t, w)
0= = (−1)r {q(t, w)}
∂t r ∂t
r=0

∂q(t, w) ∂q(t, w)
= g(w)
∂t ∂w
and hence ∂q(t,w)
∂t = 0 and so q(t, w) is a function of w only. Setting t = 1 shows that q(t, w) = (1 − F (tw))/(1 − F (t)) =
q(1, w) = (1 − F (w))/(1 − F (1)). Hence 1 − F (tw) = (1 − F (w))(1 − F (t))/(1 − F (1)). But F (1) = 0; hence we have
the following equation for the continuous function F :
(1 − F (tw)) = (1 − F (t))(1 − F (w))
for all t ≥ 1 and w ≥ 1 with boundary conditions F (1) = 0 and F (∞) = 1. This is effectively Cauchy’s logarithmic
functional equation. It leads to
1
F (x) = 1 − β for x ≥ 1.
x

Chapter 1 Section 23 on page 54 (exs-sizeshape.tex.tex)



1. Consider the size variable g1 : (0, ∞)2 → (0, ∞) with g(x) = x1 . The associated shape function is z1 (x) = 1, x2/x1 =
(1, a). Hence z1 (x) is constant. For any other shape function z2 , we know that z2 (x) = z2 ( z(x) ) (see the proof of
equation(22.3a) on page 52. Hence result.
2. (a) Consider the size variable g ∗ (x) = x1 ; the associated shape function is z ∗ (x) = 1, x2 /x1 . Hence


3 with probability 1/2;
z ∗ (X) = 1
/3 with probability 1/2.
If z is any other shape function, then by equation(22.3a) on page 52 we have z ∗ (X) = z ∗ ( z(X) ). Hence z(X) cannot be
almost surely constant.
The possible values of the 3 quantities are as follows:
probability z(X) g1 (X) g2 (X)

1/4 ( /4, /4)
1 3
√3 4
1/4 ( 1/4, 3/4) 12 4

1/4 ( 3/4, 1/4) √3 4
1/4 ( 3/4, 1/4) 12 4
Clearly z(X) is independent of both g1 (X) and g2 (X).
(b) By proposition(22.3b)

on page 52 we know that g1 (X)/g2 (X) is almost surely constant. It is easy to check that
g1 (X)/g2 (X) = 3/4.
3. ⇐ Let Yj = Xjb for j = 1, 2, . . . , n. Then Y1 , . . . , Yn are independent random variables. By proposition(22.4a), the
b b
 
shape vector 1, Y2/Y1 , . . . , Yn/Y1 is independent of Y1 + · · · + Yn . This means 1, X2 /X1b , . . . , Xn/X1b is independent of
X1b + · · · + Xnb . Hence 1, X2/X1 , . . . , Xn/X1 is independent of (X1b + · · · + Xnb )1/b as required.


⇒ We are given that 1, X2/X1 , . . . , Xn/X1 is independent of (X1b + · · · + Xnb )1/b . Hence 1, X2b/X1b , . . . , Xnb/X1b is
 

independent of X1b + · · · + Xnb . By proposition(22.4a), there exist α > 0, k1 > 0, . . . , kn > 0 such that Xjb ∼
Gamma(kj , α) and hence Xj ∼ GGamma(kj , α, b).
R∞ R∞
4. (a) P[X1 < X2 ] = 0 P[X2 > x]λ1 e−λ1 x dx = 0 e−λ2 x λ1 e−λ1 x dx = λ1 /(λ1 + λ2 ).
(b) The lack of memory property implies the distribution of V = X2 − X1 given X2 > X1 is the same as the distribution
of X2 and the distribution of X2 − X1 given X2 < X1 is the same as the distribution of −X1 . Hence
(
λ2 λ1 λ1 eλ1 v λ1λ+λ
2
if v ≤ 0;
fV (v) = fV (v|X2 < X1 ) + fV (v|X1 < X2 ) = −λ2 v λ1
2
λ1 + λ2 λ1 + λ2 λ2 e λ1 +λ2 if v ≥ 0.
(c) Now V = X2 − X1 . Hence
Z Z ∞
fV (v) = fX2 (x + v)fX1 (x) dx = λ1 λ2 e−λ2 v e−(λ1 +λ2 )x dx
{x:x+v>0} x=max{−v,0}
Hence if v ≥ 0 we have

λ1 λ2 e−λ2 v
Z
fV (v) = λ1 λ2 e−λ2 v e−(λ1 +λ2 )x dx =
0 λ1 + λ2
and if v ≤ 0 we have

λ1 λ2 eλ1 v
Z
fV (v) = λ1 λ2 e−λ2 v e−(λ1 +λ2 )x dx =
−v λ1 + λ2
Page 130 Answers 1§25 Jan 8, 2019(21:02) Bayesian Time Series Analysis

5. Now V = Y2 − Y1 . By exercise 4, we know that


(
λ2 λ1 v
λ1 +λ2 e if v ≤ 0;

λ1 λ2 eλ1 v if v ≤ 0;
fV (v) = and P[V ≤ v] = λ1
λ1 + λ2 e−λ2 v if v ≥ 0. 1 − λ1 +λ2 e−λ2 v if v ≥ 0.
P[U ≥ u] = e−λ1 (u−a) e−λ2 (u−a) = ea(λ1 +λ2 ) e−(λ1 +λ2 )u for u ≥ a.
Now for u ≥ a and v ∈ R we have
Z ∞
P[U ≥ u, V ≤ v] = P[Y1 ≥ u, Y2 ≥ u, Y2 − Y1 ≤ v] = P[u ≤ Y2 ≤ v + y1 ]fY1 (y1 ) dy1
y1 =u
Z ∞
= λ1 eλ1 a P[u ≤ Y2 ≤ v + y1 ]e−λ1 y1 dy1
y1 =u
where  R v+y1
P[u ≤ Y2 ≤ v + y1 ] = λ2 eλ2 a y2 =u
e−λ2 y2 dy2 = eλ2 a [e−λ2 u − e−λ2 (v+y1 ) ] if v + y1 > u;
0 if v + y1 < u.
Hence for v ≥ 0 we have
Z ∞
P[U ≥ u, V ≤ v] = λ1 eλ1 a P[u ≤ Y2 ≤ v + y1 ]e−λ1 y1 dy1
y1 =u
Z ∞
(λ1 +λ2 )a
e−λ1 y1 e−λ2 u − e−λ2 (v+y1 ) dy1
 
= λ1 e
y1 =u
−λ1 u −(λ1 +λ2 )u
λ1 e−λ2 v
   
(λ1 +λ2 )a −λ2 u e −λ2 v e (λ1 +λ2 )a −(λ1 +λ2 )u
= λ1 e e −e =e e 1−
λ1 λ1 + λ2 λ1 + λ2
= P[U ≥ u]P[V ≤ v]
Similarly, for v ≤ 0 we have
Z ∞
λ1 a
P[U ≥ u, V ≤ v] = λ1 e P[u ≤ Y2 ≤ v + y1 ]e−λ1 y1 dy1
y1 =u−v
Z ∞
(λ1 +λ2 )a
e−λ1 y1 e−λ2 u − e−λ2 (v+y1 ) dy1 = P[U ≥ u]P[V ≤ v]
 
= λ1 e
y1 =u−v
Hence the result.
6. Suppose θj > 0 and x0 > 0. Then X ∼ Pareto(α, 0, θj x0 ) iff X/θj ∼ Pareto(α, 0, x0 ) and proceed as in the proof of
proposition(22.5a) on page 53.
7. Suppose θj > 0 and h > 0. Then X ∼ Power(α, 0, θj h) iff X/θj ∼ Power(α, 0, h) and proceed as in the proof of
proposition(22.6a) on page 53.

Chapter 1 Section 25 on page 56 (exs-other.tex)

1. (a) Let W = X + Y . For w > 0,


Z ∞ ∞ ∞
α −αw
Z Z
fW (w) = fX (x)fY (w − x) dx = αe−αx αeα(w−x) dx = α2 eαw e−2αx dx = e
x=w x=w x=w 2
For w < 0
w w
α αw
Z Z
fW (w) = fY (y)fX (w − y) dy = α2 e−αw e2αy dy = e
−∞ −∞ 2
and hence fW (w) = α2 e−α|w| for w ∈ R; this is the Laplace(0, α) distribution.
(b) The Laplace(0, α) distribution by part (a).
2. (a) The expectation, median and mode are all µ. Also var[X] = var[X − µ] = 2/α2 by using the representation of the
Laplace as the difference of two independent exponentials given in exercise 1. (b) The distribution function is
 1 −α(µ−x)
e if x < µ;
FX (x) = 2 1 −α(x−µ)
1 − 2e if x ≥ µ.
(c) Using the representation in exercise 1 again implies the moment generating function is
α α α2 eµt
E[etX ] = eµt E[et(X−µ) ] = eµt = 2 for |t| < α.
α − t α + t α − t2
R x α −α|y| R x α −αy Rx
3. For x > 0 we have P[|X| < x] = P[−x < X < x] = −x 2 e dy = 2 0 2 e dy = 0 αe−αy dy. Hence |X| has
the exponential density αe−αx
R ∞for x > 0. R∞
4. For z > 0 we have fZ (z) = y=0 fX (z + y)fY (y) dy = λµ y=0 e−λz e−λy e−µy dy = λµe−λz /(λ + µ).
R∞ R∞
For z < 0 we have fZ (z) = y=−z fX (z + y)fY (y) dy = λµe−λz y=−z e−(λ+µ)y dy = λµeµz /(λ + µ)
Appendix Jan 8, 2019(21:02) Answers 1§25 Page 131

5. (a) Now E[etX ] = α2 eµt /(α2 − t2 ). Hence if Y = kX + b then E[etY ] = ebt E[etkX ] = ebt α2 eµkt /(α2 − k 2 t2 ) =
et(kµ+b) α12 /(α12 − t2 ) where α1 = α/k. This is the mgf of the Laplace(kµ + b, α/k) distribution.
(b) Now X − µ ∼ Laplace(0, α); hence α(X − µ) ∼ Laplace(0, 1). Pn
(c) By partP(b), α(X − µ) ∼ Laplace(0, 1); hence α|X − µ| ∼ exponential (1); hence α i=1 |Xi − µ| ∼ Gamma(n, 1);
n
hence 2α i=1 |Xi − µ| ∼ Gamma(n, 1/2) = χ22n
6. Now |X − µ| ∼ exponential (α) = Gamma(1, α). Hence result by equation(16.8a) on page 39.
7. Let W = X and Z = ln( X/Y ). Now (X, Y ) ∈ (0, 1) × (0, 1). Clearly W ∈ (0, 1). Also 0 < X < X/Y ; hence
eZ = X/Y > X = W . This implies: if Z > 0 then 0 < W < 1 and if Z < 0 then 0 < W < eZ .
Then | ∂(w,z) z
∂(x,y) | = /y = e /x. Hence f(W,Z) (w, z) = y = we
1 −z
.
R 1 −z R ez
If z > 0 then fZ (z) = 0 we dw = 2 e . If z < 0 then fZ (z) = 0 we−z dw = 21 ez . The Laplace(0, 1) distribution.
1 −z
2
α α
8. Let Z = X(2Y − 1). Then E[etX(2Y −1) ] = 12 E[e−tX ] + 12 E[etX ] = 21 α−t + 12 α+t = α2 α22α
−t2
= α2α−t2 for |t| < α as
required.
9. Let Y1 = (X1 + X2 )/2; Y2 = (X3 + X4 )/2, Y3 = (X1 − X2 )/2 and Y4 = (X3 − X4 )/2. Then X1 X2 = Y12 − Y32 and
X3 X4 = Y22 − Y42 . Hence X1 X2 − X3 X4 = (Y12 + Y42 ) − (Y22 + Y32 ).
Now
t1 − t3 t2 − t4
    
  t1 + t3 t2 + t4
E exp ( i(t1 Y1 + t2 Y2 + t3 Y3 + t4 Y4 ) ) = E exp i X1 + X2 + X3 + X4
2 2 2 2
(t1 + t3 )2 (t1 − t3 )2 (t2 + t4 )2 (t2 − t4 )2
 
= exp − − − −
8 8 8 8
 2 2 2 2
t +t +t +t
= exp − 1 3 2 4 = E[eit1 Y1 ] E[eit2 Y2 ] E[eit3 Y3 ] E[eit4 Y4 ]
4
Hence Y1 , Y2 , Y3 and Y4 are i.i.d. N (0, σ = /2). Hence 2(X1 X2 − X3 X4 ) = 2(Y12 + Y42 ) − 2(Y22 + Y32 ) is equal to the
2 1

difference of two independent χ22 = exponential ( 1/2) distributions which is the Laplace(0, 1/2) distribution.
(b) X1 X2 + X3 X4 = (Y12 + Y22 ) − (Y32 + Y42 ) and then as for part (a).
10. Using characteristic functions.

Z ∞ √
Z ∞ Z ∞
itY 2X itY 2x −x − 21 2xt2 −x 2 1
E[e ]= E[e ]e dx = e e dx = e−x(1+t ) dx =
0 0 0 1 + t2
and this is the c.f. of the Laplace(0, 1) distribution. Hence result.
√ 1 ,y2 ) √

Using densities. Let Y1 = X and Y2 = Y 2X. Hence Y1 > 0 and Y2 ∈ R and ∂(y ∂(x,y) = 2x = 2y1 . Hence
f(X,Y ) (x, y) 1 −x 1 −y2 /2 1 2
f(Y1 ,Y2 ) (y1 , y2 ) = √ =√ e √ e = √ e−y1 e−y2 /4y1
2y1 2y1 2π 2 πy1
Using the substitution y1 = z 2 /2 and equation(11.13b) on page 29 gives
Z ∞ Z ∞ y2
1 −(y1 +y22 /4y1 ) 1 z2 2 1
fY2 (y2 ) = √ e dy1 = √ e−( 2 + 2z2 ) dz = e−|y2 |
0 2 πy 1 0 2π 2
as required.
11. (a) The absolute value of the Jacobian of the transformation is

∂(x, y) cos θ −r sin θ
∂(r, θ) sin θ r cos θ = r
=

Hence for r ∈ (0, ∞) and θ ∈ (0, 2π) we have


r −r2 /2σ2
f(R,Θ) (r, θ) = e
2πσ 2
2 2
Hence Θ is uniform on (0, 2π) with density f (θ) = 1/2π and R has density fR (r) = (r/σ 2 )e−r /2σ for r > 0.
(b)
r 2 2 1 1 1 −(x2 +y2 )/2σ2
f(X,Y ) (x, y) = 2 e−r /2σ = e for (x, y) ∈ R2 .
σ 2π r 2πσ 2
Hence X and Y are i.i.d. N (0, σ 2 ).
2 2
12. (a) P[R ≤ r] = 1 − e−r /2σ for r ≥ 0. (b) Using the substitution r2 = y shows
Z ∞ Z ∞  
1 2 2 1 n/2 −y/2σ 2 n+2
E[Rn ] = 2 rn+1 e−r /2σ dr = y e dy = 2 n/2 n
σ Γ
σ 0 2σ 2 0 2
R ∞ n−1 −λx
where the last equality comes from the integral of the gamma density: 0 x e dx = Γ(n)/λn .
p 2 2 2
(c) E[R] = σ π/2; E[R ] = 2σ and√hence var[R] = (4 − π)σ /2. (d) By differentiating the density, the mode is
at σ. From part (a), the median is at σ 2 ln(2).
√ 2 2 2 2
13. P[X ≤ x] = P[ −2 ln U ≤ x/σ] = P[−2 ln U ≤ x2 /σ 2 ] = P[ln U ≥ −x2 /2σ 2 ] = P[U ≥ e−x /2σ ] = 1 − e−x /2σ as
required.
Page 132 Answers 2§2 Jan 8, 2019(21:02) Bayesian Time Series Analysis
2 2 2
14. (a) Now fR (r) = re−r /2σ /σ 2 for r > 0. Let V = R2 . Then fV (v) = e−v/2σ /2σ 2 which is the exponential (1/2σ 2 ),
or