Claudia Czado
TU München
• Introduction
• Clustered Data
– response is measured for each subject
– each subject belongs to a group of subjects (cluster)
Ex.:
– math scores of students grouped by classrooms (each classroom forms a cluster)
– birth weights of rats grouped by litter (each litter forms a cluster)
• Longitudinal Data
– response is measured at several time points
– number of time points is not too large (in contrast to time series)
Ex.: sales of a product in each month of a year (12 measurements)
random factor = qualitative variable whose levels are randomly sampled from a population of levels being studied

Ex.: 20 supermarkets were selected and their number of cashiers was reported:
– 10 supermarkets with 2 cashiers
– 5 supermarkets with 1 cashier
– 5 supermarkets with 5 cashiers
These are the observed levels of the random factor "number of cashiers".
random effect = quantitative variable whose levels are randomly sampled from a population of levels being studied

Ex.: 20 supermarkets were selected and their sizes were reported. These size values are a random sample from the population of the sizes of all supermarkets.
Model:
\[ Y_{ij} = x_{ij}^t\beta + u_{ij}^t\gamma_i + \epsilon_{ij}, \qquad i = 1, \ldots, m;\ j = 1, \ldots, n_i, \]
with fixed-effect covariate vectors \( x_{ij} \in \mathbb{R}^p \) and random-effect covariate vectors \( u_{ij} \in \mathbb{R}^q \).

Assumptions:
\[ \gamma_i \sim N_q(0, D), \qquad \epsilon_i := (\epsilon_{i1}, \ldots, \epsilon_{in_i})^t \sim N_{n_i}(0, \Sigma_i), \qquad \gamma_1, \ldots, \gamma_m, \epsilon_1, \ldots, \epsilon_m \text{ independent.} \]

Matrix Notation:
\[ X_i := \begin{pmatrix} x_{i1}^t \\ \vdots \\ x_{in_i}^t \end{pmatrix} \in \mathbb{R}^{n_i \times p}, \qquad U_i := \begin{pmatrix} u_{i1}^t \\ \vdots \\ u_{in_i}^t \end{pmatrix} \in \mathbb{R}^{n_i \times q}, \qquad Y_i := \begin{pmatrix} Y_{i1} \\ \vdots \\ Y_{in_i} \end{pmatrix} \in \mathbb{R}^{n_i} \]
\[ \Rightarrow\quad Y_i = X_i\beta + U_i\gamma_i + \epsilon_i, \qquad i = 1, \ldots, m \quad \text{(matrix notation)}, \]
with \( \gamma_i \sim N_q(0, D) \), \( \epsilon_i \sim N_{n_i}(0, \Sigma_i) \) and \( \gamma_1, \ldots, \gamma_m, \epsilon_1, \ldots, \epsilon_m \) independent.
Remark: The general form of the mixed linear model is the same for clustered
and longitudinal observations.
Stacking the clusters, let \( n := n_1 + \cdots + n_m \) and define
\[ X := \begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} \in \mathbb{R}^{n \times p}, \qquad \beta \in \mathbb{R}^p, \qquad G := \begin{pmatrix} D & & \\ & \ddots & \\ & & D \end{pmatrix} \in \mathbb{R}^{mq \times mq}, \]
\[ U := \begin{pmatrix} U_1 & 0_{n_1 \times q} & \cdots & 0_{n_1 \times q} \\ 0_{n_2 \times q} & U_2 & & \vdots \\ \vdots & & \ddots & \\ 0_{n_m \times q} & \cdots & & U_m \end{pmatrix} \in \mathbb{R}^{n \times (m \cdot q)}, \qquad 0_{n_i \times q} := \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix} \in \mathbb{R}^{n_i \times q}, \]
\[ \gamma := \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_m \end{pmatrix} \in \mathbb{R}^{m \cdot q}, \qquad \epsilon := \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_m \end{pmatrix}, \qquad R := \begin{pmatrix} \Sigma_1 & & 0 \\ & \ddots & \\ 0 & & \Sigma_m \end{pmatrix} \in \mathbb{R}^{n \times n}. \]
Then
\[ Y = X\beta + U\gamma + \epsilon \qquad (2) \]
where
\[ \begin{pmatrix} \gamma \\ \epsilon \end{pmatrix} \sim N_{mq+n}\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} G & 0_{mq \times n} \\ 0_{n \times mq} & R \end{pmatrix} \right). \]
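As a concrete illustration of this stacked notation (not part of the original slides), here is a minimal numpy/scipy sketch that builds X, U, G, R for a random-intercept-and-slope model with q = 2 and simulates Y according to (2). The cluster sizes, covariates, and variance parameters are arbitrary choices for illustration only.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)

m, p, q = 5, 2, 2                 # clusters, fixed effects, random effects
n_i = np.array([4, 6, 3, 5, 7])   # cluster sizes, n = sum(n_i)
beta = np.array([1.0, -0.5])      # fixed effects
D = np.array([[1.0, 0.3],         # covariance of gamma_i
              [0.3, 0.5]])
sigma2 = 0.25                     # Sigma_i = sigma2 * I_{n_i}

X_blocks, U_blocks, Sigma_blocks = [], [], []
for ni in n_i:
    t = rng.uniform(size=ni)                    # one covariate per observation
    Xi = np.column_stack([np.ones(ni), t])      # fixed intercept + slope
    Ui = Xi.copy()                              # random intercept + random slope
    X_blocks.append(Xi)
    U_blocks.append(Ui)
    Sigma_blocks.append(sigma2 * np.eye(ni))

X = np.vstack(X_blocks)                         # R^{n x p}
U = block_diag(*U_blocks)                       # R^{n x (m q)}
G = block_diag(*[D] * m)                        # R^{mq x mq}
R = block_diag(*Sigma_blocks)                   # R^{n x n}

gamma = rng.multivariate_normal(np.zeros(m * q), G)
eps = rng.multivariate_normal(np.zeros(X.shape[0]), R)
Y = X @ beta + U @ gamma + eps                  # model (2)
print(Y.shape)
```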
Remarks:
• \( Y \mid \gamma \sim N_n(X\beta + U\gamma,\ R) \qquad (3) \)
• \( \gamma \sim N_{mq}(0,\ G) \qquad (4) \)
• Therefore (2) implies the marginal model
\[ Y = X\beta + \epsilon^*, \qquad \epsilon^* \sim N_n(0, V), \qquad V := UGU^t + R. \qquad (5) \]
• (2) or (3)+(4) implies (5); however, (5) does not imply (3)+(4).
⇒ If one is only interested in estimating β, one can use the ordinary linear model (5). If one is interested in estimating β and γ, one has to use model (3)+(4).
For known V, the weighted least squares estimate of β in (5) is
\[ \tilde\beta := (X^tV^{-1}X)^{-1}X^tV^{-1}Y. \qquad (6) \]
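A minimal sketch (simulated data with arbitrary parameters, not from the slides) of (6): with V = UGU^t + R treated as known, β̃ is computed by solving the weighted normal equations rather than forming V⁻¹ explicitly.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(2)
m, ni, q = 4, 5, 1                       # random-intercept model (q = 1)
X = np.column_stack([np.ones(m * ni), rng.normal(size=m * ni)])
U = block_diag(*[np.ones((ni, 1))] * m)  # cluster indicator columns
G = 0.8 * np.eye(m * q)                  # D = 0.8 for every cluster
R = 0.3 * np.eye(m * ni)                 # Sigma_i = 0.3 I
beta = np.array([2.0, 1.0])

Y = (X @ beta + U @ rng.multivariate_normal(np.zeros(m * q), G)
     + rng.multivariate_normal(np.zeros(m * ni), R))

V = U @ G @ U.T + R                      # marginal covariance from (5)
Vinv_X = np.linalg.solve(V, X)           # V^{-1} X
Vinv_Y = np.linalg.solve(V, Y)           # V^{-1} Y
beta_tilde = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)   # (6)
print(beta_tilde)
```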
Recall: \( Y = X\beta + \epsilon \), \( \epsilon \sim N_n(0, \Sigma) \), Σ known, \( \Sigma = \Sigma^{1/2}\Sigma^{1/2\,t} \).
\[ \Rightarrow\quad \Sigma^{-1/2}Y = \Sigma^{-1/2}X\beta + \underbrace{\Sigma^{-1/2}\epsilon}_{\sim\, N_n(0,\ \underbrace{\Sigma^{-1/2}\Sigma\Sigma^{-1/2\,t}}_{=I_n})} \qquad (7) \]
\[ \Rightarrow\quad \text{LSE of } \beta \text{ in } (7):\quad \hat\beta = \big((\Sigma^{-1/2}X)^t\Sigma^{-1/2}X\big)^{-1}(\Sigma^{-1/2}X)^t\Sigma^{-1/2}Y = (X^t\Sigma^{-1}X)^{-1}X^t\Sigma^{-1}Y. \qquad (8) \]
This estimate is called the weighted LSE.

Exercise: Show that (8) is the MLE in \( Y = X\beta + \epsilon \), \( \epsilon \sim N_n(0, \Sigma) \).
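A small numerical check (illustrative only, with a made-up Σ) that the whitening step (7) reproduces the weighted LSE (8): a Cholesky factor L with Σ = LLᵗ is one valid choice of Σ^{1/2}, and ordinary least squares on the pre-multiplied data equals the closed form (XᵗΣ⁻¹X)⁻¹XᵗΣ⁻¹y.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])

A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)          # a known, positive definite Sigma
L = np.linalg.cholesky(Sigma)            # Sigma = L L^t, so L^{-1} plays the role of Sigma^{-1/2}
y = X @ beta + L @ rng.normal(size=n)    # eps ~ N_n(0, Sigma)

# whitened model (7): L^{-1} y = L^{-1} X beta + L^{-1} eps, with L^{-1} eps ~ N_n(0, I_n)
Xw = np.linalg.solve(L, X)
yw = np.linalg.solve(L, y)
beta_ols_whitened, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# closed form (8): weighted LSE
Sinv_X = np.linalg.solve(Sigma, X)
beta_wls = np.linalg.solve(X.T @ Sinv_X, X.T @ np.linalg.solve(Sigma, y))

print(np.allclose(beta_ols_whitened, beta_wls))   # True
```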
For the prediction of γ, consider the joint distribution of Y and γ:
\[ \operatorname{Cov}(Y, \gamma) = \operatorname{Cov}(X\beta + U\gamma + \epsilon,\ \gamma) = \underbrace{\operatorname{Cov}(X\beta, \gamma)}_{=0} + U\underbrace{\operatorname{Var}(\gamma)}_{=G} + \underbrace{\operatorname{Cov}(\epsilon, \gamma)}_{=0} = UG \]
\[ \Rightarrow\quad \begin{pmatrix} Y \\ \gamma \end{pmatrix} \sim N_{n+mq}\!\left( \begin{pmatrix} X\beta \\ 0 \end{pmatrix}, \begin{pmatrix} V & UG \\ GU^t & G \end{pmatrix} \right). \]
Recall: if
\[ \begin{pmatrix} Y \\ Z \end{pmatrix} \sim N_p\!\left( \begin{pmatrix} \mu_Y \\ \mu_Z \end{pmatrix}, \begin{pmatrix} \Sigma_Y & \Sigma_{YZ} \\ \Sigma_{ZY} & \Sigma_Z \end{pmatrix} \right), \]
then \( Z \mid Y \sim N(\mu_{Z|Y}, \Sigma_{Z|Y}) \) with
\[ \mu_{Z|Y} = \mu_Z + \Sigma_{ZY}\Sigma_Y^{-1}(Y - \mu_Y), \qquad \Sigma_{Z|Y} = \Sigma_Z - \Sigma_{ZY}\Sigma_Y^{-1}\Sigma_{YZ}. \]
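A brief numpy illustration (hypothetical numbers, not from the slides) of these conditional-normal formulas: given the partitioned mean and covariance blocks, the conditional mean and covariance of Z given Y = y are computed exactly as stated. Applied to (Y, γ) with the joint distribution above, the same formula gives E[γ | Y] = GUᵗV⁻¹(Y − Xβ), which reappears later as the predictor of γ.

```python
import numpy as np

# joint normal (Y, Z) with a made-up partitioned mean and covariance
mu_Y, mu_Z = np.array([1.0, 2.0]), np.array([0.0])
Sigma_Y = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
Sigma_YZ = np.array([[0.7],
                     [0.2]])
Sigma_ZY = Sigma_YZ.T
Sigma_Z = np.array([[1.5]])

y_obs = np.array([1.5, 1.0])                        # observed value of Y

# mu_{Z|Y} = mu_Z + Sigma_ZY Sigma_Y^{-1} (y - mu_Y)
mu_cond = mu_Z + Sigma_ZY @ np.linalg.solve(Sigma_Y, y_obs - mu_Y)
# Sigma_{Z|Y} = Sigma_Z - Sigma_ZY Sigma_Y^{-1} Sigma_YZ
Sigma_cond = Sigma_Z - Sigma_ZY @ np.linalg.solve(Sigma_Y, Sigma_YZ)
print(mu_cond, Sigma_cond)
```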
Since \( f(y, \gamma) = f(y \mid \gamma)\, f(\gamma) \) with \( f(y \mid \gamma) \) from (3) and \( f(\gamma) \propto \exp\{-\tfrac12\gamma^tG^{-1}\gamma\} \) from (4),
\[ \Rightarrow\quad \ln f(y, \gamma) = -\tfrac12(y - X\beta - U\gamma)^tR^{-1}(y - X\beta - U\gamma) \underbrace{-\tfrac12\gamma^tG^{-1}\gamma}_{\text{penalty term for } \gamma} + \text{constants ind. of } (\beta, \gamma). \]
So it is enough to minimize
\[ Q(\beta, \gamma) := (y - X\beta - U\gamma)^tR^{-1}(y - X\beta - U\gamma) + \gamma^tG^{-1}\gamma \]
jointly with respect to β and γ.
Recall (vector differentiation): for \( f(\alpha) := b^t\alpha = \sum_j \alpha_j b_j \),
\[ \frac{\partial}{\partial \alpha_i} f(\alpha) = b_i, \qquad \frac{\partial}{\partial \alpha} f(\alpha) = b, \]
and for \( g(\alpha) := \alpha^tA\alpha = \sum_i \sum_j \alpha_i\alpha_j a_{ij} \) with A symmetric,
\[ \frac{\partial}{\partial \alpha_i} g(\alpha) = 2\alpha_i a_{ii} + \sum_{j=1, j\neq i}^{n} \alpha_j a_{ij} + \sum_{j=1, j\neq i}^{n} \alpha_j a_{ji} = 2\sum_{j=1}^{n} \alpha_j a_{ij} = 2A_i^t\alpha, \]
\[ \frac{\partial}{\partial \alpha} g(\alpha) = 2\begin{pmatrix} A_1^t \\ \vdots \\ A_n^t \end{pmatrix}\alpha = 2A\alpha, \qquad A_i^t \text{ is the } i\text{th row of } A. \]
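A quick finite-difference check (illustrative only, random test values) of the two differentiation rules just recalled, ∂/∂α (bᵗα) = b and ∂/∂α (αᵗAα) = 2Aα for symmetric A.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
b = rng.normal(size=n)
A = rng.normal(size=(n, n))
A = (A + A.T) / 2                       # symmetrize: the rule 2*A*alpha needs A = A^t
alpha = rng.normal(size=n)

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda a: b @ a, alpha), b, atol=1e-5))                   # d/da b^t a   = b
print(np.allclose(num_grad(lambda a: a @ A @ a, alpha), 2 * A @ alpha, atol=1e-5))   # d/da a^t A a = 2 A a
```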
Setting the partial derivatives of Q to zero:
\[ \frac{\partial}{\partial \beta} Q(\beta, \gamma) = -2X^tR^{-1}y + 2X^tR^{-1}U\gamma + 2X^tR^{-1}X\beta \overset{\text{Set}}{=} 0 \]
\[ \frac{\partial}{\partial \gamma} Q(\beta, \gamma) = 2U^tR^{-1}X\beta - 2U^tR^{-1}y + 2U^tR^{-1}U\gamma + 2G^{-1}\gamma \overset{\text{Set}}{=} 0 \]
\[ \Leftrightarrow\quad X^tR^{-1}X\tilde\beta + X^tR^{-1}U\tilde\gamma = X^tR^{-1}y, \qquad U^tR^{-1}X\tilde\beta + (U^tR^{-1}U + G^{-1})\tilde\gamma = U^tR^{-1}y \]
\[ \Leftrightarrow\quad \begin{pmatrix} X^tR^{-1}X & X^tR^{-1}U \\ U^tR^{-1}X & U^tR^{-1}U + G^{-1} \end{pmatrix} \begin{pmatrix} \tilde\beta \\ \tilde\gamma \end{pmatrix} = \begin{pmatrix} X^tR^{-1}y \\ U^tR^{-1}y \end{pmatrix} \qquad (10) \]
Define \( C := (X\ \ U) \) and \( B := \begin{pmatrix} 0 & 0 \\ 0 & G^{-1} \end{pmatrix} \). Then
\[ C^tR^{-1}C = \begin{pmatrix} X^t \\ U^t \end{pmatrix} R^{-1} (X\ \ U) = \begin{pmatrix} X^tR^{-1}X & X^tR^{-1}U \\ U^tR^{-1}X & U^tR^{-1}U \end{pmatrix} \]
\[ \Rightarrow\quad (10) \ \Leftrightarrow\ (C^tR^{-1}C + B)\begin{pmatrix} \tilde\beta \\ \tilde\gamma \end{pmatrix} = C^tR^{-1}y \ \Leftrightarrow\ \begin{pmatrix} \tilde\beta \\ \tilde\gamma \end{pmatrix} = (C^tR^{-1}C + B)^{-1}C^tR^{-1}y. \]
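A self-contained sketch (simulated random-intercept data with arbitrary parameters, not from the slides) that sets up C = [X U] and B as above, solves the system (10), and checks two standard identities: the β̃ component agrees with the weighted LSE (6), and γ̃ equals GUᵗV⁻¹(y − Xβ̃).

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(6)
m, ni, q = 6, 8, 1                         # random-intercept model
n = m * ni
X = np.column_stack([np.ones(n), rng.normal(size=n)])
U = block_diag(*[np.ones((ni, 1))] * m)
G = 1.2 * np.eye(m * q)                    # Var(gamma)
R = 0.4 * np.eye(n)                        # Var(eps)
beta = np.array([0.5, 2.0])
y = (X @ beta + U @ rng.multivariate_normal(np.zeros(m * q), G)
     + rng.multivariate_normal(np.zeros(n), R))

# mixed model equations (10): (C^t R^{-1} C + B) (beta, gamma) = C^t R^{-1} y
C = np.hstack([X, U])
B = block_diag(np.zeros((X.shape[1], X.shape[1])), np.linalg.inv(G))
Rinv = np.linalg.inv(R)
sol = np.linalg.solve(C.T @ Rinv @ C + B, C.T @ Rinv @ y)
beta_tilde, gamma_tilde = sol[:X.shape[1]], sol[X.shape[1]:]

# compare with the weighted LSE (6) based on V = U G U^t + R
V = U @ G @ U.T + R
beta_gls = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))
print(np.allclose(beta_tilde, beta_gls))                                            # True
# gamma_tilde equals G U^t V^{-1} (y - X beta_tilde)
print(np.allclose(gamma_tilde, G @ U.T @ np.linalg.solve(V, y - X @ beta_tilde)))   # True
```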
Estimation of the variance parameters ϑ in \( V = V(\vartheta) \) is based on the marginal model (5):
\[ Y = X\beta + \epsilon^*, \qquad \epsilon^* \sim N_n(0, V(\vartheta)). \]
For fixed ϑ define
\[ \tilde\beta(\vartheta) := (X^tV(\vartheta)^{-1}X)^{-1}X^tV(\vartheta)^{-1}y \]
and the profile log-likelihood
\[ l_p(\vartheta) := l(\tilde\beta(\vartheta), \vartheta) = -\tfrac12\{\ln|V(\vartheta)| + (y - X\tilde\beta(\vartheta))^tV(\vartheta)^{-1}(y - X\tilde\beta(\vartheta))\}. \]
Maximizing \( l_p(\vartheta) \) with respect to ϑ gives the MLE \( \hat\vartheta_{ML} \). However, \( \hat\vartheta_{ML} \) is biased; this is why one often uses restricted ML estimation (REML) instead.
Integrate the likelihood over β:
\[ \int L(\beta, \vartheta)\,d\beta = \frac{|V(\vartheta)|^{-1/2}}{(2\pi)^{n/2}} \int \exp\{-\tfrac12(y - X\beta)^tV(\vartheta)^{-1}(y - X\beta)\}\,d\beta. \]
Consider:
\[ (y - X\beta)^tV(\vartheta)^{-1}(y - X\beta) = \beta^t\underbrace{X^tV(\vartheta)^{-1}X}_{A(\vartheta)}\beta - 2y^tV(\vartheta)^{-1}X\beta + y^tV(\vartheta)^{-1}y \]
\[ = (\beta - B(\vartheta)y)^tA(\vartheta)(\beta - B(\vartheta)y) + y^tV(\vartheta)^{-1}y - y^tB(\vartheta)^tA(\vartheta)B(\vartheta)y, \]
where \( B(\vartheta) := A(\vartheta)^{-1}X^tV(\vartheta)^{-1} \).
Now
\[ (y - X\tilde\beta(\vartheta))^tV(\vartheta)^{-1}(y - X\tilde\beta(\vartheta)) = y^tV(\vartheta)^{-1}y - 2y^tV(\vartheta)^{-1}X\tilde\beta(\vartheta) + \tilde\beta(\vartheta)^t\underbrace{X^tV(\vartheta)^{-1}X}_{A(\vartheta)}\tilde\beta(\vartheta) \]
\[ = y^tV(\vartheta)^{-1}y - 2y^tV(\vartheta)^{-1}XB(\vartheta)y + y^tB(\vartheta)^tA(\vartheta)B(\vartheta)y = y^tV(\vartheta)^{-1}y - y^tB(\vartheta)^tA(\vartheta)B(\vartheta)y. \]
Here we used
\[ \tilde\beta(\vartheta) = (X^tV(\vartheta)^{-1}X)^{-1}X^tV(\vartheta)^{-1}y = A(\vartheta)^{-1}X^tV(\vartheta)^{-1}y = B(\vartheta)y \]
and
\[ B(\vartheta)^tA(\vartheta)B(\vartheta) = V(\vartheta)^{-1}XA(\vartheta)^{-1}A(\vartheta)B(\vartheta) = V(\vartheta)^{-1}XB(\vartheta). \]
The remaining integral over \( \beta \in \mathbb{R}^p \) is a Gaussian integral, so
\[ \int L(\beta, \vartheta)\,d\beta = \frac{|V(\vartheta)|^{-1/2}}{(2\pi)^{n/2}} \exp\{-\tfrac12(y - X\tilde\beta(\vartheta))^tV(\vartheta)^{-1}(y - X\tilde\beta(\vartheta))\} \cdot (2\pi)^{p/2}|A(\vartheta)|^{-1/2}, \]
using \( \int \exp\{-\tfrac12(\beta - B(\vartheta)y)^tA(\vartheta)(\beta - B(\vartheta)y)\}\,d\beta = (2\pi)^{p/2}|A(\vartheta)^{-1}|^{1/2} \) and \( |A(\vartheta)^{-1}|^{-1} = |A(\vartheta)| \).
\[ \Rightarrow\quad l_R(\vartheta) := \ln\Big(\int L(\beta, \vartheta)\,d\beta\Big) = -\tfrac12\{\ln|V(\vartheta)| + (y - X\tilde\beta(\vartheta))^tV(\vartheta)^{-1}(y - X\tilde\beta(\vartheta))\} - \tfrac12\ln|A(\vartheta)| + C \]
\[ = l_p(\vartheta) - \tfrac12\ln|A(\vartheta)| + C. \]
Therefore the restricted ML (REML) estimate of ϑ is \( \hat\vartheta_{REML} \), which maximizes
\[ l_R(\vartheta) = l_p(\vartheta) - \tfrac12\ln|X^tV(\vartheta)^{-1}X|. \]
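A sketch (simulated random-intercept data; ϑ = (σ²_γ, σ²_ε) parametrized on the log scale for the optimizer; none of the numbers come from the slides) of the profile log-likelihood l_p(ϑ) and the REML criterion l_R(ϑ) = l_p(ϑ) − ½ ln|XᵗV(ϑ)⁻¹X|, each maximized numerically with scipy.

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.optimize import minimize

rng = np.random.default_rng(7)
m, ni = 10, 6
n = m * ni
X = np.column_stack([np.ones(n), rng.normal(size=n)])
U = block_diag(*[np.ones((ni, 1))] * m)
beta_true, sg2_true, se2_true = np.array([1.0, -1.0]), 1.0, 0.5
y = (X @ beta_true
     + U @ rng.normal(scale=np.sqrt(sg2_true), size=m)
     + rng.normal(scale=np.sqrt(se2_true), size=n))

def profile_loglik(theta, reml=False):
    """l_p(theta) (and l_R(theta) if reml=True), theta = (log sg2, log se2)."""
    sg2, se2 = np.exp(theta)
    V = sg2 * (U @ U.T) + se2 * np.eye(n)          # V(theta) = U G U^t + R
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    A = X.T @ Vinv_X                               # A(theta) = X^t V(theta)^{-1} X
    beta_t = np.linalg.solve(A, X.T @ Vinv_y)      # beta~(theta)
    resid = y - X @ beta_t
    lp = -0.5 * (np.linalg.slogdet(V)[1] + resid @ np.linalg.solve(V, resid))
    if reml:
        lp -= 0.5 * np.linalg.slogdet(A)[1]        # l_R = l_p - 1/2 ln|A(theta)|
    return lp

theta0 = np.log([1.0, 1.0])
ml = minimize(lambda th: -profile_loglik(th), theta0, method="Nelder-Mead")
reml = minimize(lambda th: -profile_loglik(th, reml=True), theta0, method="Nelder-Mead")
print("ML  (sg2, se2):", np.exp(ml.x))
print("REML(sg2, se2):", np.exp(reml.x))
```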
Plugging the estimated variance parameters \( \hat\vartheta \) (and thus \( \hat V := V(\hat\vartheta) \), \( \hat G \), \( \hat D \)) into (6) gives
\[ \hat\beta = (X^t\hat V^{-1}X)^{-1}X^t\hat V^{-1}Y = \Big(\sum_{i=1}^m X_i^t\hat V_i^{-1}X_i\Big)^{-1}\Big(\sum_{i=1}^m X_i^t\hat V_i^{-1}Y_i\Big) \]
and
\[ \hat\gamma = \hat G U^t\hat V^{-1}(Y - X\hat\beta) \quad\text{has components}\quad \hat\gamma_i = \hat D U_i^t\hat V_i^{-1}(y_i - X_i\hat\beta), \]
where \( \hat V_i := U_i\hat D U_i^t + \hat\Sigma_i \) is the ith diagonal block of \( \hat V \).
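A minimal sketch (with assumed plug-in values D̂ and Σ̂_i and placeholder responses, none estimated or taken from the slides) showing that the clusterwise formula γ̂_i = D̂ U_iᵗ V̂_i⁻¹(y_i − X_i β̂) reproduces the stacked expression γ̂ = Ĝ Uᵗ V̂⁻¹(Y − X β̂).

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(8)
m, ni, q = 5, 4, 1
Xi_list = [np.column_stack([np.ones(ni), rng.normal(size=ni)]) for _ in range(m)]
Ui_list = [np.ones((ni, 1)) for _ in range(m)]
yi_list = [rng.normal(size=ni) for _ in range(m)]            # placeholder responses

D_hat = np.array([[0.9]])                                    # assumed plug-in D^
Sigma_hat = 0.3 * np.eye(ni)                                 # assumed plug-in Sigma_i^
Vi_list = [Ui @ D_hat @ Ui.T + Sigma_hat for Ui in Ui_list]  # V_i^ = U_i D^ U_i^t + Sigma_i^

# beta^ from the clusterwise form of (6)
A = sum(Xi.T @ np.linalg.solve(Vi, Xi) for Xi, Vi in zip(Xi_list, Vi_list))
b = sum(Xi.T @ np.linalg.solve(Vi, yi) for Xi, Vi, yi in zip(Xi_list, Vi_list, yi_list))
beta_hat = np.linalg.solve(A, b)

# clusterwise random-effect predictions gamma_i^ = D^ U_i^t V_i^{-1} (y_i - X_i beta^)
gamma_hat_i = [D_hat @ Ui.T @ np.linalg.solve(Vi, yi - Xi @ beta_hat)
               for Xi, Ui, Vi, yi in zip(Xi_list, Ui_list, Vi_list, yi_list)]

# stacked check: gamma^ = G^ U^t V^{-1} (Y - X beta^)
X, U, Y = np.vstack(Xi_list), block_diag(*Ui_list), np.concatenate(yi_list)
G_hat, V = block_diag(*[D_hat] * m), block_diag(*Vi_list)
gamma_hat = G_hat @ U.T @ np.linalg.solve(V, Y - X @ beta_hat)
print(np.allclose(np.concatenate(gamma_hat_i), gamma_hat))   # True
```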
Define \( A(\hat\vartheta) := (X^tV^{-1}(\hat\vartheta)X)^{-1} \) as the estimated covariance matrix of \( \hat\beta \). It is expected that \( [(X^tV^{-1}(\hat\vartheta)X)^{-1}]_{jj} \) underestimates \( \operatorname{Var}(\hat\beta_j) \), since the variation in \( \hat\vartheta \) is not taken into account.
A full Bayesian analysis using MCMC methods is preferable to these
approximations.
• \( H_0: \beta_j = 0 \) versus \( H_1: \beta_j \neq 0 \):
Reject \( H_0 \Leftrightarrow |t_j| = \Big|\dfrac{\hat\beta_j}{\hat\sigma_{jj}}\Big| > z_{1-\alpha/2} \), where \( \hat\sigma_{jj} \) is the estimated standard error of \( \hat\beta_j \) (the square root of the jth diagonal element of \( A(\hat\vartheta) \)).

• \( H_0: C\beta = d \) versus \( H_1: C\beta \neq d \), \( \operatorname{rank}(C) = r \):
Reject \( H_0 \Leftrightarrow W := (C\hat\beta - d)^t\big(CA(\hat\vartheta)C^t\big)^{-1}(C\hat\beta - d) > \chi^2_{1-\alpha, r} \)
(Wald test)
or
Reject \( H_0 \Leftrightarrow -2[l(\hat\beta_R, \hat\gamma_R) - l(\hat\beta, \hat\gamma)] > \chi^2_{1-\alpha, r} \)
(likelihood ratio test),
where \( \hat\beta, \hat\gamma \) are the estimates in the unrestricted model and \( \hat\beta_R, \hat\gamma_R \) the estimates in the restricted model (\( C\beta = d \)).
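A short sketch (with assumed values for β̂ and A(ϑ̂), for illustration only) of the Wald test for H0: Cβ = d; the χ² quantile comes from scipy. The likelihood ratio test would analogously compare −2[l(β̂_R, γ̂_R) − l(β̂, γ̂)] to the same quantile.

```python
import numpy as np
from scipy.stats import chi2

# assumed fitted quantities (illustrative numbers only)
beta_hat = np.array([1.8, -0.4, 0.9])
A_hat = np.array([[0.05, 0.01, 0.00],     # A(theta^) = (X^t V(theta^)^{-1} X)^{-1},
                  [0.01, 0.04, 0.01],     # the estimated covariance of beta^
                  [0.00, 0.01, 0.06]])

# H0: C beta = d, here beta_2 = beta_3 = 0 (r = 2 restrictions)
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
d = np.zeros(2)
r = np.linalg.matrix_rank(C)

diff = C @ beta_hat - d
W = diff @ np.linalg.solve(C @ A_hat @ C.T, diff)   # Wald statistic
alpha = 0.05
crit = chi2.ppf(1 - alpha, df=r)                    # chi^2_{1-alpha, r}
print("W =", W, " critical value =", crit, " reject H0:", W > crit)
```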