
Institute of Actuarial and Financial Mathematics WiSe 2023/24

Leibniz University Hannover


Prof. Dr. Stefan Weber, M.Sc. Sören Bettels

Quantitative Risk Management - Presence Tasks 2

Presence Tasks 2.1 - Rules for Mean Vectors and Covariance Matrices
Let X, Y and Z be n-dimensional random vectors, µ := E[X], Cov(X) = Σ and A, B ∈ R^{m×n},
a ∈ R^n, b ∈ R^m. Show that the following results hold:

a) E[X + Y ] = E[X] + E[Y ],

b) E[AX + b] = A · E[X] + b,

c) Cov(X) = E[XX T ] − E[X]E[X]T ,

d) Var(aT X) = aT Cov(X)a,

e) Cov(AX + b) = ACov(X)AT ,

f) Cov(X + Y ) = Cov(X) + Cov(Y ) provided that Cov(X, Y ) = 0,

g) Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z),

h) E[X T AX] = tr(AΣ) + µT Aµ,

i) E[X] = E[E[X|Y ]],

j) Cov(X) = E[Cov(X|Y )] + Cov(E[X|Y ]),

k) Var(X) = E[Var(X|Y )] + Var(E[X|Y ]).

Solution
Let Xi , Yi , Zi , i ∈ {1, . . . , n} denote the components of X, Y, Z.

a) As E[X] = (E[X1 ], . . . , E[Xn ])T we have

E[X + Y ] = (E[X1 + Y1 ], . . . , E[Xn + Yn ])T


= (E[X1 ] + E[Y1 ], . . . , E[Xn ] + E[Yn ])T
= E[X] + E[Y ].

b) Let A ∈ R^{m×n}, X ∈ R^n, b ∈ R^m. Further we write A = (a_{ij})_{i∈{1,...,m}, j∈{1,...,n}}
for the components of A and A_i, i ∈ {1, . . . , m}, for the rows of A. Then we have that

E[AX + b] = E[(A_1 · X + b_1, . . . , A_m · X + b_m)^T]
          = (E[A_1 · X + b_1], . . . , E[A_m · X + b_m])^T.

For i ∈ {1, . . . , m} we further have that

E[A_i · X + b_i] = E[ Σ_{j=1}^{n} a_{ij} X_j + b_i ]
                 = Σ_{j=1}^{n} a_{ij} E[X_j] + b_i
                 = A_i · E[X] + b_i.

Plugging this in above yields

E[AX + b] = (A_1 · E[X] + b_1, . . . , A_m · E[X] + b_m)^T = A · E[X] + b.

c) Let X := (X1 , . . . , Xn )T ∈ Rn . Then we have that

Cov(X) = (Cov(Xi , Xj ))i,j∈{1,...,n}


= (E[Xi Xj ] − E[Xi ]E[Xj ])i,j∈{1,...,n}
= (E[Xi Xj ])i,j∈{1,...,n} − (E[Xi ]E[Xj ])i,j∈{1,...,n}
= E[XX T ] − E[X]E[X]T .
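
These identities are easy to sanity-check numerically. The following is a minimal NumPy sketch (the dimension n = 3, the sample size and the mixing matrix are illustrative assumptions, not part of the task) that verifies c) for the empirical distribution of a simulated sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 200_000                                # dimension and sample size (assumed for the check)
X = rng.normal(size=(N, n)) @ rng.normal(size=(n, n)) + 1.0   # correlated samples, rows = observations

mean_X = X.mean(axis=0)                          # empirical E[X]
second_moment = X.T @ X / N                      # empirical E[X X^T]
lhs = np.cov(X, rowvar=False, bias=True)         # empirical Cov(X), normalised by N
rhs = second_moment - np.outer(mean_X, mean_X)   # E[X X^T] - E[X] E[X]^T

print(np.max(np.abs(lhs - rhs)))                 # ~1e-15: the identity holds exactly for the sample
```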

d) If a, X ∈ Rn we have that aT · X ∈ R. Therefore

Var(aT X) = E[(aT · X − E[aT · X])2 ]


= E[(aT · X − aT µ)(aT · X − aT µ)T ]
= E[aT (X − µ)(X − µ)T a]
= aT E[(X − µ)(X − µ)T ]a
= aT Cov(X)a.

e) With A ∈ R^{m×n}, X ∈ R^n and b ∈ R^m we have that

Cov(AX + b) = E[(AX + b − E[AX + b])(AX + b − E[AX + b])^T]
            = E[(AX − E[AX])(AX − E[AX])^T]
            = E[A(X − E[X])(X − E[X])^T A^T]
            = A E[(X − E[X])(X − E[X])^T] A^T
            = A Cov(X) A^T.
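
Results d) and e) can be verified in the same spirit; both identities even hold exactly for the sample covariance, since it is itself the covariance of an empirical distribution. A minimal sketch (the dimensions, distribution and matrices below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 4, 2, 100_000                 # dimensions and sample size (illustrative assumptions)
X = rng.standard_t(df=6, size=(N, n)) @ rng.normal(size=(n, n))   # correlated, non-Gaussian samples
A = rng.normal(size=(m, n))
a = rng.normal(size=n)
b = rng.normal(size=m)

Sigma = np.cov(X, rowvar=False)         # empirical Cov(X)

# d) Var(a^T X) = a^T Cov(X) a
print(np.var(X @ a, ddof=1), a @ Sigma @ a)

# e) Cov(AX + b) = A Cov(X) A^T  (the shift b drops out)
print(np.max(np.abs(np.cov(X @ A.T + b, rowvar=False) - A @ Sigma @ A.T)))   # ~0
```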

f) With X, Y ∈ R^n we have that

Cov(X + Y) = E[(X + Y − E[X + Y])(X + Y − E[X + Y])^T]
           = E[((X − E[X]) + (Y − E[Y]))((X − E[X])^T + (Y − E[Y])^T)]
           = E[(X − E[X])(X − E[X])^T] + E[(Y − E[Y])(Y − E[Y])^T]
             + E[(Y − E[Y])(X − E[X])^T] + E[(X − E[X])(Y − E[Y])^T]
           = Cov(X) + Cov(Y) + Cov(Y, X) + Cov(X, Y).

If now Cov(X, Y) = 0, then also Cov(Y, X) = Cov(X, Y)^T = 0 and therefore

Cov(X + Y) = Cov(X) + Cov(Y).

g) With X, Y, Z ∈ Rn we have that


Cov(X + Y, Z) = E[(X + Y − E[X + Y ])(Z − E[Z])T ]
= E[((X − E[X]) + (Y − E[Y ]))(Z − E[Z])T ]
= E[(X − E[X])(Z − E[Z])T ] + E[(Y − E[Y ])(Z − E[Z])T ]
= Cov(X, Z) + Cov(Y, Z).
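
A quick empirical check of the bilinearity in g): the sample cross-covariance is linear in its first argument, so both sides agree exactly for simulated data. The helper function and the chosen dimensions below are assumptions made only for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 100_000                       # dimension and sample size (assumed)
X = rng.normal(size=(N, n))
Y = X @ rng.normal(size=(n, n)) + rng.normal(size=(N, n))   # dependent on X
Z = X + rng.normal(size=(N, n))

def cross_cov(U, V):
    """Empirical Cov(U, V) = E[(U - E[U])(V - E[V])^T] for samples stored in rows."""
    Uc, Vc = U - U.mean(axis=0), V - V.mean(axis=0)
    return Uc.T @ Vc / (len(U) - 1)

lhs = cross_cov(X + Y, Z)
rhs = cross_cov(X, Z) + cross_cov(Y, Z)
print(np.max(np.abs(lhs - rhs)))        # ~0: the identity holds exactly for the empirical measure
```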

h) First of all we have that

E[X^T A X] = E[(µ + (X − µ))^T A (µ + (X − µ))]
           = µ^T A µ + µ^T A E[X − µ] + E[(X − µ)^T] A µ + E[(X − µ)^T A (X − µ)]
           = µ^T A µ + E[(X − µ)^T A (X − µ)],

since E[X − µ] = 0. So it is left to show that E[(X − µ)^T A (X − µ)] = tr(AΣ). Note that
(X − µ)^T A (X − µ) ∈ R. Then with the cyclic property of the trace operator and its
linearity we have that

E[(X − µ)^T A (X − µ)] = E[tr((X − µ)^T A (X − µ))]
                       = E[tr(A (X − µ)(X − µ)^T)]
                       = tr(E[A (X − µ)(X − µ)^T])
                       = tr(A E[(X − µ)(X − µ)^T])
                       = tr(AΣ).
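
The trace identity in h) can be checked by Monte Carlo simulation. Unlike the previous checks this one is only approximate, since E[X^T A X] has to be estimated; the dimension, sample size, µ, Σ and A below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 3, 1_000_000                     # dimension and Monte Carlo sample size (assumed)
mu = np.array([1.0, -2.0, 0.5])
L = rng.normal(size=(n, n))
Sigma = L @ L.T                         # a valid covariance matrix
A = rng.normal(size=(n, n))             # arbitrary n x n matrix (not necessarily symmetric)

X = rng.multivariate_normal(mu, Sigma, size=N)
quad_form = np.einsum("ij,jk,ik->i", X, A, X)   # X^T A X for every sample (row)

print(quad_form.mean())                          # Monte Carlo estimate of E[X^T A X]
print(np.trace(A @ Sigma) + mu @ A @ mu)         # tr(A Sigma) + mu^T A mu, should be close
```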

i) We have that

E[E[X|Y]] = (E[E[X_1|Y]], . . . , E[E[X_n|Y]])^T
          = (E[X_1], . . . , E[X_n])^T
          = E[X],

by the tower property of the one-dimensional conditional expectation.
j) From the one-dimensional case we know that, for all i, j ∈ {1, . . . , n},

Cov(X_i, X_j) = E[X_i X_j] − E[X_i]E[X_j], i.e. E[X_i X_j] = Cov(X_i, X_j) + E[X_i]E[X_j].

We note that

Cov(X) = (Cov(X_i, X_j))_{i,j∈{1,...,n}}.

Further, using the tower property and the identity above (applied conditionally on Y),

Cov(X_i, X_j) = E[X_i X_j] − E[X_i]E[X_j]
             = E[E[X_i X_j | Y]] − E[E[X_i | Y]] E[E[X_j | Y]]
             = E[Cov(X_i, X_j | Y) + E[X_i | Y] E[X_j | Y]] − E[E[X_i | Y]] E[E[X_j | Y]]
             = E[Cov(X_i, X_j | Y)] + E[E[X_i | Y] E[X_j | Y]] − E[E[X_i | Y]] E[E[X_j | Y]]
             = E[Cov(X_i, X_j | Y)] + Cov(E[X_i | Y], E[X_j | Y]).

So we have proven the property in every entry of the matrix Cov(X).
k) Setting i = j in the solution of task j) we obtain

Var(X_i) = Cov(X_i, X_i) = E[Var(X_i | Y)] + Var(E[X_i | Y]).

So the claimed identity holds in every coordinate of the vector Var(X).
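
As a concrete one-dimensional illustration of k), consider the following mixture (the distributional choices are assumptions made only for this example): let Y be uniform on {0, 1} and X | Y = y ∼ N(m_y, s_y^2). Then E[Var(X | Y)] = (s_0^2 + s_1^2)/2 and Var(E[X | Y]) = (m_0 − m_1)^2/4, and a short simulation confirms that their sum matches Var(X):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000                           # Monte Carlo sample size (assumed)
m = np.array([0.0, 3.0])                # conditional means m_0, m_1
s = np.array([1.0, 2.0])                # conditional standard deviations s_0, s_1

Y = rng.integers(0, 2, size=N)          # Y uniform on {0, 1}
X = rng.normal(loc=m[Y], scale=s[Y])    # X | Y = y  ~  N(m_y, s_y^2)

total_var = X.var()                                          # Var(X), estimated by Monte Carlo
decomposition = (s**2).mean() + ((m - m.mean())**2).mean()   # E[Var(X|Y)] + Var(E[X|Y]) = 2.5 + 2.25
print(total_var, decomposition)                              # both close to 4.75
```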

Presence Task 2.2 - The Classical Linear Model - Prediction Matrix


Let β̂ be the least squares estimator of the classical linear model
y = Xβ + ε,
where X ∈ R^{n×(k+1)} is the design matrix, β ∈ R^{k+1} is the unknown parameter and ε ∈ R^n
are the centered errors.
Further let H, defined as
Hy = X β̂,
be the prediction matrix.
a) Show that the prediction matrix H is symmetric, idempotent and rk(H) = k + 1.
b) Show that the matrix I − H is symmetric, idempotent with rk(I − H) = n − k − 1.

Solution
We will use the following properties of matrix calculations: Let A, B ∈ Rn×k , C ∈ Rk×k ,
then

• (AB)^T = B^T A^T,

• (A + B)^T = A^T + B^T,

• (A^{-1})^T = (A^T)^{-1},

• (A^T)^T = A,

• rk(AC) = rk(CA^T) = rk(C), if rk(A) = k,

• C is invertible if and only if rk(C) = k.

• If A ∈ R^{k×k} is an orthogonal projection matrix, i.e. A^2 = A and A^T = A, then

im(A) = ker(I − A),   im(A) ⊕ im(I − A) = R^k.

• Example proof for the last property:


If A^2 = A and A^T = A, we have (I − A)^2 = I − A and (I − A)^T = I − A, such that
I − A is itself an orthogonal projection, as it is an idempotent, symmetric matrix.
Further, if x ∈ im(A), say x = Ay, then

x = Ay = A^2 y = A(Ay) = Ax.

Therefore A is the identity on im(A), and

x ∈ im(A) ⇒ 0 = x − Ax = (I − A)x ⇒ x ∈ ker(I − A).

Conversely, if x ∈ ker(I − A), then

0 = (I − A)x ⇒ x = Ax ⇒ x ∈ im(A).

Overall im(A) = ker(I − A), and applying the same argument to the projection I − A
gives ker(A) = im(I − A). Since A is idempotent, im(A) ∩ ker(A) = {0}, and by the
rank-nullity theorem dim im(A) + dim ker(A) = k. From this we conclude

R^k = im(A) ⊕ ker(A) = im(A) ⊕ im(I − A).

a) From the lecture we know that H = X(X^T X)^{-1} X^T. Then the transpose of H is
given as

H^T = (X(X^T X)^{-1} X^T)^T = (X^T)^T (X(X^T X)^{-1})^T
    = X ((X^T X)^{-1})^T X^T = X ((X^T X)^T)^{-1} X^T
    = X (X^T X)^{-1} X^T = H.

So H is a symmetric matrix. Further

H · H = X (X^T X)^{-1} X^T X (X^T X)^{-1} X^T = X (X^T X)^{-1} X^T = H,

since (X^T X)^{-1} X^T X = I, such that H is also idempotent.
From the assumptions of the classical linear model we know that rk(X) = k + 1.
Therefore, with X ∈ R^{n×(k+1)}, (X^T X)^{-1} ∈ R^{(k+1)×(k+1)} and X^T ∈ R^{(k+1)×n},
we have that

rk(H) = rk(X (X^T X)^{-1} X^T) = rk((X^T X)^{-1}).

As X^T X is assumed to be invertible, so is (X^T X)^{-1}. Therefore (X^T X)^{-1} has
full rank, such that rk(H) = k + 1.
b) From task a) we know that H is symmetric. So we have that
(I − H)T = I T − H T = I − H.
Further I − H is idempotent as
(I − H)(I − H) = I(I − H) − H(I − H) = II − IH − HI + HH
= I − H − H + H = I − H.
As H is idempotent and symmetric H is an orthogonal projection matrix. From
the last property above we have Im(H) ⊕ Im(I − H) = Rn . This yields
rk(H) + rk(I − H) = n ⇔ rk(I − H) = n − k − 1,
where we used rk(H) = k + 1, as shown in a).
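
The properties shown in a) and b) are easy to verify numerically for a concrete design. Below is a minimal NumPy sketch; the values n = 20, k = 3 and the randomly generated design matrix with intercept column are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 20, 3                                                 # sample size and number of covariates (assumed)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # design matrix with intercept, rk(X) = k + 1

H = X @ np.linalg.inv(X.T @ X) @ X.T                         # prediction ("hat") matrix
I = np.eye(n)

print(np.allclose(H, H.T), np.allclose(H @ H, H))            # H is symmetric and idempotent
print(np.allclose(I - H, (I - H).T), np.allclose((I - H) @ (I - H), I - H))
print(np.linalg.matrix_rank(H), np.linalg.matrix_rank(I - H))  # k + 1 = 4 and n - k - 1 = 16
```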

Presence Task 2.3 - Statistical Properties of the Residuals in the CLM


We consider the classical linear model

y = Xβ + ε,

where X ∈ R^{n×(k+1)} is the design matrix, β ∈ R^{k+1} is the unknown parameter and ε ∈ R^n
are the independent, centered errors with Cov(ε) = σ^2 I.
Further let H denote the prediction matrix, σ̂^2 = ε̂^T ε̂ / (n − k − 1) and ε̂ = y − Xβ̂ the residuals,
where β̂ is the least squares estimator of β.
Show that the following statements hold:

a) E[ε̂_i] = 0,

b) Cov(ε̂) = σ^2 (I − H),

c) Var(ε̂_i) = σ^2 (1 − h_ii), where h_ii is the i-th diagonal entry of H,

d) If ε ∼ N(0, σ^2 I) then ε̂ ∼ N(0, σ^2 (I − H)),

e) If ε ∼ N(0, σ^2 I) then ε̂^T ε̂ / σ^2 = (n − k − 1) σ̂^2 / σ^2 ∼ χ^2_{n−k−1},

f) If ε ∼ N(0, σ^2 I) then ε̂^T ε̂ = (n − k − 1) σ̂^2 and β̂ are independent.

Solution
a) From the lecture we know that β̂ = (X T X)−1 X T y. Therefore we can express the
residuals ε̂ as

ε̂ = y − X β̂ = y − X(X T X)−1 X T y = y − Hy = (I − H)y

As H is by definition not random, but known, we have that

E[ε̂] = E[y − Hy]


= E[y] − HE[y]
= E[y] − X(X T X)−1 X T E[y]
= Xβ − X(X T X)−1 X T Xβ
= Xβ − Xβ = 0.

b) We have shown above that I − H is symmetric and idempotent. With this we have
that

Cov(ε̂) = Cov((I − H)y) = (I − H)σ 2 I(I − H)T


= (I − H)(I − H)σ 2 = (I − H)σ 2 .

c) The variances of the residuals, Var(ε̂_i) for i ∈ {1, . . . , n}, are the diagonal
entries of Cov(ε̂). Therefore with task b) we have that

Var(ε̂i ) = (1 − hii )σ 2 .
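
Results a) to c) can also be illustrated by simulating the model many times and comparing the empirical mean and covariance of the residuals with 0 and σ^2(I − H). The design, β, σ and the number of replications below are assumptions chosen only for this sketch; agreement is up to Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma, N = 10, 2, 0.5, 200_000           # assumed dimensions, error sd and replications
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = rng.normal(size=k + 1)
H = X @ np.linalg.inv(X.T @ X) @ X.T

eps = sigma * rng.normal(size=(N, n))          # N independent replications of the error vector
y = X @ beta + eps                             # each row is one realisation of y
resid = y @ (np.eye(n) - H)                    # residuals (I - H) y; I - H is symmetric

print(np.max(np.abs(resid.mean(axis=0))))                                        # ~0, matches a)
print(np.max(np.abs(np.cov(resid, rowvar=False) - sigma**2 * (np.eye(n) - H))))  # ~0, matches b)
```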

d) As before we have that


ε̂ = (I − H)y.
From the lecture we know that y is multivariate normally distributed, and therefore
ε̂ is a linear transformation of a normally distributed random vector. So ε̂ is itself
multivariate normally distributed, with expectation and covariance matrix calculated
in a) and b). This yields
ε̂ ∼ N (0, σ 2 (I − H)).

e) We define the matrix Q as

Q = I − H = I − X(X T X)−1 X T .

Then, as we know, Q is symmetric and idempotent (Q^2 = Q) with rk(Q) =
n − k − 1. Multiplication with the design matrix X gives

QX = X − X(X T X)−1 X T X = X − X = 0.

From this we have that

ε̂^T ε̂ = (Qy)^T Qy = y^T Q^T Q y
      = y^T Q Q y = (Xβ + ε)^T Q Q (Xβ + ε)
      = (β^T (QX)^T + ε^T Q)(QXβ + Qε)
      = ε^T Q Q ε = ε^T Q ε,

where we used that QX = 0.

By the model assumptions we have that ε ∼ N(0, σ^2 I), so ε/σ ∼ N(0, I). The
result follows from the theorem:
Let Z ∼ N(0, I_p) and R a symmetric, idempotent p × p matrix with rk(R) = r. Then

Z^T R Z ∼ χ^2_r.

Applying this with Z = ε/σ, R = Q and r = n − k − 1 yields
ε̂^T ε̂ / σ^2 = (ε/σ)^T Q (ε/σ) ∼ χ^2_{n−k−1}.
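
The χ^2 statement in e) can be illustrated by simulation: the first two moments of ε̂^T ε̂ / σ^2 should match those of a χ^2 distribution with n − k − 1 degrees of freedom (mean n − k − 1, variance 2(n − k − 1)). The dimensions, σ and the number of replications below are assumed for this sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, sigma, N = 15, 4, 2.0, 200_000               # assumed dimensions, error sd and replications
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = rng.normal(size=k + 1)
Q = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # Q = I - H

y = X @ beta + sigma * rng.normal(size=(N, n))
resid = y @ Q                                      # residuals for every replication (Q symmetric)
stat = (resid**2).sum(axis=1) / sigma**2           # eps_hat^T eps_hat / sigma^2

df = n - k - 1
print(stat.mean(), df)                             # chi^2_df has mean df ...
print(stat.var(), 2 * df)                          # ... and variance 2 * df
```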

f) We show that (1/σ)(β̂ − β) and (1/σ^2) ε̂^T ε̂ are independent. For this we apply the
following result:
Let Z ∼ N(0, I_p), B an (m × p)-matrix (m ≤ p) and R a symmetric, idempotent (p × p)
matrix with rk(R) = r. Then

BR = 0 ⇒ Z^T R Z is independent of BZ.

We set R = Q and B = (X^T X)^{-1} X^T. Then we have that

BR = (X^T X)^{-1} X^T Q = (X^T X)^{-1} (QX)^T = 0,

where QX = 0 was shown in e).

Since ε/σ ∼ N(0, I), the result implies the independence of

(ε/σ)^T Q (ε/σ) = ε̂^T ε̂ / σ^2   and   (X^T X)^{-1} X^T (ε/σ).
The claim follows by

(1/σ)(β̂ − β) = (1/σ) ((X^T X)^{-1} X^T y − β)
             = (1/σ) ((X^T X)^{-1} X^T (Xβ + ε) − β)
             = (1/σ) (β + (X^T X)^{-1} X^T ε − β)
             = (1/σ) (X^T X)^{-1} X^T ε
             = (X^T X)^{-1} X^T (ε/σ),

so β̂ is a deterministic shift and scaling of (X^T X)^{-1} X^T (ε/σ), while
(n − k − 1) σ̂^2 = ε̂^T ε̂ is a function of ε̂^T ε̂ / σ^2. Hence the independence above
carries over to σ̂^2 and β̂.
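
Finally, the independence in f) can be illustrated empirically: over many replications the sample correlation between σ̂^2 and each component of β̂ should be close to zero (a necessary consequence of independence, not a proof). The concrete design, β, σ and replication count below are assumptions for this sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, sigma, N = 12, 2, 1.0, 100_000               # assumed dimensions, error sd and replications
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, -0.5, 2.0])                  # assumed true parameter (length k + 1)
B = np.linalg.inv(X.T @ X) @ X.T                   # least squares map, beta_hat = B y

y = X @ beta + sigma * rng.normal(size=(N, n))
beta_hat = y @ B.T                                 # one estimate per replication (row)
resid = y - beta_hat @ X.T
sigma2_hat = (resid**2).sum(axis=1) / (n - k - 1)  # unbiased variance estimator per replication

# Under normal errors beta_hat and sigma2_hat are independent, so all correlations are ~0.
for j in range(k + 1):
    print(np.corrcoef(beta_hat[:, j], sigma2_hat)[0, 1])
```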
