
Lecture 10: Linear Mixed Models

(Linear Models with Random Effects)

Claudia Czado

TU München

Overview
West, Welch, and Galecki (2007)
Fahrmeir, Kneib, and Lang (2007) (Chapter 6)

• Introduction

• Likelihood Inference for Linear Mixed Models


– Parameter Estimation for known Covariance Structure
– Parameter Estimation for unknown Covariance Structure
– Confidence Intervals and Hypothesis Tests

Introduction
So far: independent response variables, but often

• Clustered Data
– response is measured for each subject
– each subject belongs to a group of subjects (cluster)
Ex.:
– math scores of students grouped by classroom (the classroom forms the cluster)
– birth weights of rats grouped by litter (the litter forms the cluster)

• Longitudinal Data
– response is measured at several time points
– number of time points is not too large (in contrast to time series)
Ex.: sales of a product at each month in a year (12 measurements)

Fixed and Random Factors/Effects
How can we extend the linear model to allow for such dependent data structures?

fixed factor = qualitative covariate (e.g. gender, age group)

fixed effect = quantitative covariate (e.g. age)

random factor = qualitative variable whose levels are randomly sampled from a population of levels being studied
Ex.: 20 supermarkets were selected and their number of cashiers was reported:
– 10 supermarkets with 2 cashiers
– 5 supermarkets with 1 cashier
– 5 supermarkets with 5 cashiers
These are the observed levels of the random factor "number of cashiers".

random effect = quantitative variable whose levels are randomly sampled from a population of levels being studied
Ex.: 20 supermarkets were selected and their size was reported. These size values are random samples from the population of size values of all supermarkets.

Modeling Clustered Data

Y_ij = response of j-th member of cluster i, i = 1, . . . , m, j = 1, . . . , n_i

m = number of clusters
n_i = size of cluster i
x_ij = covariate vector of j-th member of cluster i for fixed effects, ∈ R^p
β = fixed effects parameter, ∈ R^p
u_ij = covariate vector of j-th member of cluster i for random effects, ∈ R^q
γ_i = random effects parameter, ∈ R^q

Model:

Y_ij = x_ij^t β + u_ij^t γ_i + ε_ij,   i = 1, . . . , m; j = 1, . . . , n_i

where x_ij^t β is the fixed part and u_ij^t γ_i and ε_ij are the random parts.

Mixed Linear Model (LMM) I

Assumptions:

γ_i ∼ N_q(0, D),  D ∈ R^{q×q}

ε_i := (ε_i1, . . . , ε_in_i)^t ∼ N_{n_i}(0, Σ_i),  Σ_i ∈ R^{n_i×n_i}

γ_1, . . . , γ_m, ε_1, . . . , ε_m independent

D = covariance matrix of the random effects γ_i

Σ_i = covariance matrix of the error vector ε_i in cluster i

Mixed Linear Model (LMM) II

Matrix Notation:

X_i := (x_i1, . . . , x_in_i)^t ∈ R^{n_i×p},  U_i := (u_i1, . . . , u_in_i)^t ∈ R^{n_i×q},  Y_i := (Y_i1, . . . , Y_in_i)^t ∈ R^{n_i}

Y_i = X_i β + U_i γ_i + ε_i,  i = 1, . . . , m
with γ_i ∼ N_q(0, D),  ε_i ∼ N_{n_i}(0, Σ_i),  and γ_1, . . . , γ_m, ε_1, . . . , ε_m independent   (1)
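To make model (1) concrete, here is a small simulation sketch in Python (NumPy). All concrete choices (m, n_i, β, D, σ², and using the same design matrix for the fixed and the random effects, i.e. a random intercept and slope) are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_i, q = 30, 5, 2                   # clusters, cluster size, random-effect dimension
beta = np.array([1.0, -0.5])           # fixed effects (illustrative values)
D = np.array([[1.0, 0.3],
              [0.3, 0.5]])             # Cov(gamma_i), q x q
sigma2 = 0.25                          # Sigma_i = sigma2 * I (conditional independence)

Ys, Xs, Us = [], [], []
for i in range(m):
    t = np.linspace(0, 1, n_i)
    X_i = np.column_stack([np.ones(n_i), t])            # intercept + slope design
    U_i = X_i.copy()                                    # random intercept + slope
    gamma_i = rng.multivariate_normal(np.zeros(q), D)   # gamma_i ~ N_q(0, D)
    eps_i = rng.normal(0.0, np.sqrt(sigma2), n_i)       # eps_i ~ N_{n_i}(0, sigma2*I)
    Ys.append(X_i @ beta + U_i @ gamma_i + eps_i)       # model (1)
    Xs.append(X_i)
    Us.append(U_i)
```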

Modeling Longitudinal Data

Y_ij = response of subject i at j-th measurement, i = 1, . . . , m, j = 1, . . . , n_i

n_i = number of measurements for subject i
m = number of subjects
x_ij = covariate vector of i-th subject at j-th measurement for fixed effects β ∈ R^p
u_ij = covariate vector of i-th subject at j-th measurement for random effects γ_i ∈ R^q

In matrix notation:
Y_i = X_i β + U_i γ_i + ε_i,
with γ_i ∼ N_q(0, D),  ε_i ∼ N_{n_i}(0, Σ_i),  and γ_1, . . . , γ_m, ε_1, . . . , ε_m independent

Remark: The general form of the mixed linear model is the same for clustered
and longitudinal observations.

Matrix Formulation of the Linear Mixed Model
 
Y := (Y_1^t, . . . , Y_m^t)^t ∈ R^n,  where n := Σ_{i=1}^m n_i

X := (X_1^t, . . . , X_m^t)^t ∈ R^{n×p},  β ∈ R^p

G := diag(D, . . . , D) ∈ R^{mq×mq}  (m identical blocks D)

U := diag(U_1, . . . , U_m) ∈ R^{n×(m·q)}  (block diagonal; the off-diagonal blocks are zero matrices 0_{n_i×q})

γ := (γ_1^t, . . . , γ_m^t)^t ∈ R^{m·q},  ε := (ε_1^t, . . . , ε_m^t)^t ∈ R^n

R := diag(Σ_1, . . . , Σ_m) ∈ R^{n×n}

Linear Mixed Model (LMM) in matrix formulation

With this, the linear mixed model (1) can be rewritten as

Y = Xβ + Uγ + ε   (2)

where

   ( γ )              (  ( 0 )   ( G         0_{mq×n} )  )
   ( ε )  ∼  N_{mq+n} (  ( 0 ) , ( 0_{n×mq}  R        )  )

Remarks:

• LMM (2) can be rewritten as a two-level hierarchical model:

Y | γ ∼ N_n(Xβ + Uγ, R)   (3)
γ ∼ N_{mq}(0, G)   (4)

 
• Let Y = Xβ + ε*, where ε* := Uγ + ε = A (γ^t, ε^t)^t with A := (U  I_{n×n}).

By (2), ε* ∼ N_n(0, V), where

V = A diag(G, R) A^t = (U G   R) (U  I_{n×n})^t = U G U^t + R

Therefore (2) implies the marginal model

Y = Xβ + ε*,  ε* ∼ N_n(0, V)   (5)

• (2), or equivalently (3)+(4), implies (5); however (5) does not imply (3)+(4).
⇒ If one is only interested in estimating β, one can use the ordinary linear model (5).
If one is interested in estimating β and γ, one has to use model (3)+(4).

Likelihood Inference for LMM:
1) Estimation of β and γ for known G and R
Estimation of β: Using (5), we have as MLE or weighted LSE of β

β̃ := (X^t V^{-1} X)^{-1} X^t V^{-1} Y   (6)

Recall: Y = Xβ + ε,  ε ∼ N_n(0, Σ),  Σ known,  Σ = Σ^{1/2} Σ^{1/2 t}

⇒ Σ^{-1/2} Y = Σ^{-1/2} X β + Σ^{-1/2} ε,  where Σ^{-1/2} ε ∼ N_n(0, Σ^{-1/2} Σ Σ^{-1/2 t}) = N_n(0, I_n)   (7)

⇒ LSE of β in (7):

β̂ = ((Σ^{-1/2} X)^t Σ^{-1/2} X)^{-1} (Σ^{-1/2} X)^t Σ^{-1/2} Y
   = (X^t Σ^{-1} X)^{-1} X^t Σ^{-1} Y   (8)

This estimate is called the weighted LSE.

Exercise: Show that (8) is the MLE in Y = Xβ + ε, ε ∼ N_n(0, Σ).
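As a numerical sketch, the weighted LSE (8) translates into a few lines of NumPy; the function below is hypothetical and assumes X, y and a known Σ are given as arrays. Solving linear systems instead of explicitly forming Σ^{-1} is the standard numerically stable choice.

```python
import numpy as np

def weighted_lse(X, y, Sigma):
    """Weighted LSE (8): (X^t Sigma^{-1} X)^{-1} X^t Sigma^{-1} y."""
    Si_X = np.linalg.solve(Sigma, X)   # Sigma^{-1} X without explicit inversion
    Si_y = np.linalg.solve(Sigma, y)   # Sigma^{-1} y
    return np.linalg.solve(X.T @ Si_X, X.T @ Si_y)
```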

Estimation of γ:
From (3) and (4) it follows that Y ∼ N_n(Xβ, V) and γ ∼ N_{mq}(0, G).

Cov(Y, γ) = Cov(Xβ + Uγ + ε, γ) = Cov(Xβ, γ) + U Var(γ) + Cov(ε, γ) = 0 + U G + 0 = U G

   ( Y )              (  ( Xβ )   ( V       U G )  )
⇒  ( γ )  ∼  N_{n+mq} (  (  0 ) , ( G U^t   G   )  )

Recall: if

   ( Y )         (  ( μ_Y )   ( Σ_Y    Σ_YZ )  )
   ( Z )  ∼  N_p (  ( μ_Z ) , ( Σ_ZY   Σ_Z  )  )

then Z | Y ∼ N(μ_{Z|Y}, Σ_{Z|Y}) with

μ_{Z|Y} = μ_Z + Σ_ZY Σ_Y^{-1} (Y − μ_Y),  Σ_{Z|Y} = Σ_Z − Σ_ZY Σ_Y^{-1} Σ_YZ

Applying this with Z = γ gives

E(γ | Y) = 0 + G U^t V^{-1} (Y − Xβ) = G U^t V^{-1} (Y − Xβ),   (9)

which is the best linear unbiased predictor (BLUP) of γ.

Therefore γ̃ := G U^t V^{-1} (Y − X β̃) is the empirical BLUP (EBLUP).
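In code, (6) and (9) combine into a short EBLUP computation; a sketch assuming NumPy arrays y, X, U, G, V are available (all names hypothetical):

```python
import numpy as np

def eblup(y, X, U, G, V):
    """beta_tilde from (6) and gamma_tilde = G U^t V^{-1} (y - X beta_tilde) from (9)."""
    Vi_X = np.linalg.solve(V, X)                          # V^{-1} X
    beta_tilde = np.linalg.solve(X.T @ Vi_X, X.T @ np.linalg.solve(V, y))
    gamma_tilde = G @ U.T @ np.linalg.solve(V, y - X @ beta_tilde)
    return beta_tilde, gamma_tilde
```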

Joint maximization of the log likelihood of (Y^t, γ^t)^t with respect to (β, γ):

f(y, γ) = f(y|γ) · f(γ)
        ∝ exp{−(1/2) (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ)} · exp{−(1/2) γ^t G^{-1} γ}   (by (3)+(4))

⇒ ln f(y, γ) = −(1/2) (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ) − (1/2) γ^t G^{-1} γ + constants ind. of (β, γ),

where −(1/2) γ^t G^{-1} γ acts as a penalty term for γ.

So it is enough to minimize

Q(β, γ) := (y − Xβ − Uγ)^t R^{-1} (y − Xβ − Uγ) + γ^t G^{-1} γ
         = y^t R^{-1} y − 2 β^t X^t R^{-1} y + 2 β^t X^t R^{-1} U γ − 2 γ^t U^t R^{-1} y
           + β^t X^t R^{-1} X β + γ^t U^t R^{-1} U γ + γ^t G^{-1} γ

Recall the following derivatives (for α, b ∈ R^n and A ∈ R^{n×n} symmetric):

f(α) := α^t b = Σ_{j=1}^n α_j b_j

⇒ ∂f/∂α_i = b_i,  hence  ∂f/∂α = b

g(α) := α^t A α = Σ_i Σ_j α_i α_j a_ij

⇒ ∂g/∂α_i = 2 α_i a_ii + Σ_{j≠i} α_j a_ij + Σ_{j≠i} α_j a_ji = 2 Σ_{j=1}^n α_j a_ij = 2 A_i^t α   (using a_ij = a_ji)

⇒ ∂g/∂α = 2 (A_1^t α, . . . , A_n^t α)^t = 2 A α,  where A_i^t is the i-th row of A

Mixed Model Equation

Setting the partial derivatives of Q to zero:

∂Q/∂β = −2 X^t R^{-1} y + 2 X^t R^{-1} U γ + 2 X^t R^{-1} X β = 0

∂Q/∂γ = −2 U^t R^{-1} y + 2 U^t R^{-1} X β + 2 U^t R^{-1} U γ + 2 G^{-1} γ = 0

⇔ X^t R^{-1} X β̃ + X^t R^{-1} U γ̃ = X^t R^{-1} y
   U^t R^{-1} X β̃ + (U^t R^{-1} U + G^{-1}) γ̃ = U^t R^{-1} y

⇔  ( X^t R^{-1} X    X^t R^{-1} U          ) ( β̃ )   ( X^t R^{-1} y )
   ( U^t R^{-1} X    U^t R^{-1} U + G^{-1} ) ( γ̃ ) = ( U^t R^{-1} y )   (10)
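The system (10) can be assembled and solved directly. A sketch for dense NumPy arrays (function and variable names hypothetical); for large problems one would exploit sparsity instead of calling a dense solver:

```python
import numpy as np

def solve_mme(y, X, U, R, G):
    """Assemble and solve the mixed model equations (10) for (beta_tilde, gamma_tilde)."""
    Ri_X = np.linalg.solve(R, X)           # R^{-1} X
    Ri_U = np.linalg.solve(R, U)           # R^{-1} U
    Ri_y = np.linalg.solve(R, y)           # R^{-1} y
    lhs = np.block([[X.T @ Ri_X, X.T @ Ri_U],
                    [U.T @ Ri_X, U.T @ Ri_U + np.linalg.inv(G)]])
    rhs = np.concatenate([X.T @ Ri_y, U.T @ Ri_y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]                # beta_tilde, gamma_tilde
```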

Exercise: Show that β̃ and γ̃, defined by (8) and (9) respectively, solve (10).

Define C := (X  U) and B := ( 0   0      )
                            ( 0   G^{-1} )

⇒ C^t R^{-1} C = ( X^t ) R^{-1} (X  U) = ( X^t R^{-1} X   X^t R^{-1} U )
                 ( U^t )                 ( U^t R^{-1} X   U^t R^{-1} U )

⇒ (10) ⇔ (C^t R^{-1} C + B) (β̃^t, γ̃^t)^t = C^t R^{-1} y

       ⇔ (β̃^t, γ̃^t)^t = (C^t R^{-1} C + B)^{-1} C^t R^{-1} y

2) Estimation for unknown covariance structure

We assume now in the marginal model (5)

Y = Xβ + ε*,  ε* ∼ N_n(0, V)

with V = U G U^t + R, that G and R are known only up to a variance parameter vector ϑ, i.e. we write

V(ϑ) = U G(ϑ) U^t + R(ϑ)

ML Estimation in the extended marginal model
Y = Xβ + ε*,  ε* ∼ N_n(0, V(ϑ))  with  V(ϑ) = U G(ϑ) U^t + R(ϑ)

Log likelihood for (β, ϑ):

l(β, ϑ) = −(1/2) {ln |V(ϑ)| + (y − Xβ)^t V(ϑ)^{-1} (y − Xβ)} + const. ind. of β, ϑ   (11)

If we maximize (11) for fixed ϑ with respect to β, we get

β̃(ϑ) := (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} y

Then the profile log likelihood is

l_p(ϑ) := l(β̃(ϑ), ϑ) = −(1/2) {ln |V(ϑ)| + (y − X β̃(ϑ))^t V(ϑ)^{-1} (y − X β̃(ϑ))}

Maximizing l_p(ϑ) with respect to ϑ gives the MLE ϑ̂_ML. ϑ̂_ML is however biased; this is why one often uses restricted ML estimation (REML) instead.
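For a concrete ϑ-parametrization, l_p can be maximized numerically. The sketch below assumes the simplest possible case, with ϑ = (τ², σ²), G(ϑ) = τ² I and R(ϑ) = σ² I; this parametrization is an assumption for illustration only, and the optimization runs over log-variances to keep both parameters positive.

```python
import numpy as np
from scipy.optimize import minimize

def profile_loglik(log_theta, y, X, U):
    """l_p(theta) for the illustrative case G = tau2*I, R = sigma2*I."""
    tau2, sigma2 = np.exp(log_theta)                    # enforce positivity
    V = tau2 * (U @ U.T) + sigma2 * np.eye(len(y))      # V(theta) = U G(theta) U^t + R(theta)
    Vi_X = np.linalg.solve(V, X)
    beta = np.linalg.solve(X.T @ Vi_X, X.T @ np.linalg.solve(V, y))   # beta_tilde(theta)
    r = y - X @ beta
    _, logdetV = np.linalg.slogdet(V)
    return -0.5 * (logdetV + r @ np.linalg.solve(V, r))

# ML estimate: maximize l_p, i.e. minimize -l_p (y, X, U stacked over clusters)
# res = minimize(lambda lt: -profile_loglik(lt, y, X, U), x0=np.zeros(2))
```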

Restricted ML Estimation in the extended marginal model

Here we use for the estimation of ϑ the marginal log likelihood

l_R(ϑ) := ln( ∫ L(β, ϑ) dβ )

∫ L(β, ϑ) dβ = (2π)^{-n/2} |V(ϑ)|^{-1/2} ∫ exp{−(1/2) (y − Xβ)^t V(ϑ)^{-1} (y − Xβ)} dβ

Consider:

(y − Xβ)^t V(ϑ)^{-1} (y − Xβ) = β^t A(ϑ) β − 2 y^t V(ϑ)^{-1} X β + y^t V(ϑ)^{-1} y,  where A(ϑ) := X^t V(ϑ)^{-1} X
= (β − B(ϑ) y)^t A(ϑ) (β − B(ϑ) y) + y^t V(ϑ)^{-1} y − y^t B(ϑ)^t A(ϑ) B(ϑ) y,

where B(ϑ) := A(ϑ)^{-1} X^t V(ϑ)^{-1}

(Note that B(ϑ)^t A(ϑ) = V(ϑ)^{-1} X A(ϑ)^{-1} A(ϑ) = V(ϑ)^{-1} X.)

Therefore we have

∫ L(β, ϑ) dβ = (2π)^{-n/2} |V(ϑ)|^{-1/2} exp{−(1/2) y^t [V(ϑ)^{-1} − B(ϑ)^t A(ϑ) B(ϑ)] y}
               · ∫ exp{−(1/2) (β − B(ϑ) y)^t A(ϑ) (β − B(ϑ) y)} dβ   (12)

The remaining integral is a Gaussian integral with covariance matrix A(ϑ)^{-1}, so it equals (2π)^{p/2} |A(ϑ)^{-1}|^{1/2}.

Now

(y − X β̃(ϑ))^t V(ϑ)^{-1} (y − X β̃(ϑ))
= y^t V(ϑ)^{-1} y − 2 y^t V(ϑ)^{-1} X β̃(ϑ) + β̃(ϑ)^t A(ϑ) β̃(ϑ)
= y^t V(ϑ)^{-1} y − 2 y^t V(ϑ)^{-1} X B(ϑ) y + y^t B(ϑ)^t A(ϑ) B(ϑ) y
= y^t V(ϑ)^{-1} y − y^t B(ϑ)^t A(ϑ) B(ϑ) y

Here we used:

β̃(ϑ) = (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} y = A(ϑ)^{-1} X^t V(ϑ)^{-1} y = B(ϑ) y

and

B(ϑ)^t A(ϑ) B(ϑ) = V(ϑ)^{-1} X A(ϑ)^{-1} A(ϑ) B(ϑ) = V(ϑ)^{-1} X B(ϑ)

Therefore we can rewrite (12) as

∫ L(β, ϑ) dβ = (2π)^{-(n−p)/2} |V(ϑ)|^{-1/2} |A(ϑ)|^{-1/2} exp{−(1/2) (y − X β̃(ϑ))^t V(ϑ)^{-1} (y − X β̃(ϑ))}

(using |A(ϑ)^{-1}|^{1/2} = |A(ϑ)|^{-1/2})

⇒ l_R(ϑ) = ln( ∫ L(β, ϑ) dβ )
         = −(1/2) {ln |V(ϑ)| + (y − X β̃(ϑ))^t V(ϑ)^{-1} (y − X β̃(ϑ))} − (1/2) ln |A(ϑ)| + C
         = l_p(ϑ) − (1/2) ln |A(ϑ)| + C

Therefore the restricted ML estimate (REML) of ϑ is given by ϑ̂_REML, which maximizes

l_R(ϑ) = l_p(ϑ) − (1/2) ln |X^t V(ϑ)^{-1} X|
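The REML criterion just adds the log-determinant penalty to l_p; a sketch continuing the same hypothetical random-intercept parametrization as in the ML sketch above:

```python
import numpy as np

def reml_loglik(log_theta, y, X, U):
    """l_R(theta) = l_p(theta) - 0.5 * ln|X^t V(theta)^{-1} X| (illustrative setup)."""
    tau2, sigma2 = np.exp(log_theta)
    V = tau2 * (U @ U.T) + sigma2 * np.eye(len(y))
    Vi_X = np.linalg.solve(V, X)
    A = X.T @ Vi_X                                      # A(theta) = X^t V(theta)^{-1} X
    beta = np.linalg.solve(A, X.T @ np.linalg.solve(V, y))
    r = y - X @ beta
    _, logdetV = np.linalg.slogdet(V)
    _, logdetA = np.linalg.slogdet(A)
    return -0.5 * (logdetV + r @ np.linalg.solve(V, r)) - 0.5 * logdetA
```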

Summary: Estimation in the LMM with unknown covariance

For the linear mixed model

Y = Xβ + Uγ + ε,  (γ^t, ε^t)^t ∼ N_{mq+n}(0, diag(G(ϑ), R(ϑ))),

with V(ϑ) = U G(ϑ) U^t + R(ϑ), the covariance parameter vector ϑ is estimated either by

ϑ̂_ML, which maximizes

l_p(ϑ) = −(1/2) {ln |V(ϑ)| + (y − X β̃(ϑ))^t V(ϑ)^{-1} (y − X β̃(ϑ))},
where β̃(ϑ) = (X^t V(ϑ)^{-1} X)^{-1} X^t V(ϑ)^{-1} Y,

or by

ϑ̂_REML, which maximizes l_R(ϑ) = l_p(ϑ) − (1/2) ln |X^t V(ϑ)^{-1} X|.

The fixed effects β and random effects γ are then estimated by

β̂ = (X^t V̂^{-1} X)^{-1} X^t V̂^{-1} Y
γ̂ = Ĝ U^t V̂^{-1} (Y − X β̂),  where V̂ = V(ϑ̂_ML) or V(ϑ̂_REML)

Special Case
(Dependence on ϑ is ignored to ease notation)
     
G = diag(D, . . . , D),  U = diag(U_1, . . . , U_m),  R = diag(Σ_1, . . . , Σ_m),
X = (X_1^t, . . . , X_m^t)^t,  Y = (Y_1^t, . . . , Y_m^t)^t

⇒ V = U G U^t + R = diag(U_1 D U_1^t + Σ_1, . . . , U_m D U_m^t + Σ_m)  (block diagonal)
     = diag(V_1, . . . , V_m),  where V_i := U_i D U_i^t + Σ_i

Define

V̂_i := U_i D(ϑ̂) U_i^t + Σ_i(ϑ̂),  where ϑ̂ = ϑ̂_ML or ϑ̂_REML

⇒ β̂ = (X^t V̂^{-1} X)^{-1} X^t V̂^{-1} Y
     = ( Σ_{i=1}^m X_i^t V̂_i^{-1} X_i )^{-1} ( Σ_{i=1}^m X_i^t V̂_i^{-1} Y_i )

and

γ̂ = Ĝ U^t V̂^{-1} (Y − X β̂) has components

γ̂_i = D(ϑ̂) U_i^t V̂_i^{-1} (Y_i − X_i β̂)
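Since V is block diagonal, β̂ and the γ̂_i can be accumulated cluster by cluster without ever forming the n×n matrix V. A sketch assuming per-cluster arrays and already-estimated D̂ and Σ̂_i (all names hypothetical):

```python
import numpy as np

def cluster_estimates(Ys, Xs, Us, D_hat, Sigma_hats):
    """beta_hat via per-cluster sums; gamma_hat_i = D U_i^t V_i^{-1} (y_i - X_i beta_hat)."""
    p = Xs[0].shape[1]
    lhs, rhs = np.zeros((p, p)), np.zeros(p)
    Vs = [U @ D_hat @ U.T + S for U, S in zip(Us, Sigma_hats)]   # V_i = U_i D U_i^t + Sigma_i
    for y_i, X_i, V_i in zip(Ys, Xs, Vs):
        Vi_X = np.linalg.solve(V_i, X_i)
        lhs += X_i.T @ Vi_X
        rhs += X_i.T @ np.linalg.solve(V_i, y_i)
    beta_hat = np.linalg.solve(lhs, rhs)
    gammas = [D_hat @ U_i.T @ np.linalg.solve(V_i, y_i - X_i @ beta_hat)
              for y_i, X_i, U_i, V_i in zip(Ys, Xs, Us, Vs)]
    return beta_hat, gammas
```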

3) Confidence intervals and hypothesis tests
Since Y ∼ N(Xβ, V(ϑ)) holds, an approximation to the covariance matrix of

β̂ = (X^t V(ϑ̂)^{-1} X)^{-1} X^t V(ϑ̂)^{-1} Y

is given by

A(ϑ̂) := (X^t V(ϑ̂)^{-1} X)^{-1}

Note: here one assumes that V(ϑ̂) is fixed and does not depend on Y.

Therefore σ̂_j² := [(X^t V(ϑ̂)^{-1} X)^{-1}]_jj is considered as an estimate of Var(β̂_j), and

β̂_j ± z_{1−α/2} σ̂_j

gives an approximate 100(1 − α)% CI for β_j.

It is expected that [(X^t V(ϑ̂)^{-1} X)^{-1}]_jj underestimates Var(β̂_j), since the variation in ϑ̂ is not taken into account. A full Bayesian analysis using MCMC methods is preferable to these approximations.
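A sketch of these approximate Wald-type intervals, assuming X and a fitted V̂ = V(ϑ̂) are available as arrays:

```python
import numpy as np
from scipy.stats import norm

def wald_ci(beta_hat, X, V_hat, alpha=0.05):
    """Approximate CIs: beta_hat_j +/- z_{1-alpha/2} * sqrt([(X^t V^{-1} X)^{-1}]_jj)."""
    A_hat = np.linalg.inv(X.T @ np.linalg.solve(V_hat, X))   # A(theta_hat)
    se = np.sqrt(np.diag(A_hat))                             # sigma_hat_j
    z = norm.ppf(1 - alpha / 2)
    return np.column_stack([beta_hat - z * se, beta_hat + z * se])
```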

Under the assumption that β̂ is asymptotically normal with mean β and covariance matrix A(ϑ), the usual hypothesis tests can be carried out; i.e. for

• H_0: β_j = 0 versus H_1: β_j ≠ 0:

Reject H_0 ⇔ |t_j| = |β̂_j / σ̂_j| > z_{1−α/2}

• H_0: Cβ = d versus H_1: Cβ ≠ d, where rank(C) = r:

Reject H_0 ⇔ W := (C β̂ − d)^t (C A(ϑ̂) C^t)^{-1} (C β̂ − d) > χ²_{1−α,r}
(Wald test)

or

Reject H_0 ⇔ −2 [l(β̂_R, γ̂_R) − l(β̂, γ̂)] > χ²_{1−α,r}
where β̂, γ̂ are the estimates in the unrestricted model
and β̂_R, γ̂_R the estimates in the restricted model (Cβ = d)
(Likelihood Ratio Test)
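The Wald test translates directly into code; a sketch with the same hypothetical inputs as the CI example (A_hat = A(ϑ̂)):

```python
import numpy as np
from scipy.stats import chi2

def wald_test(beta_hat, A_hat, C, d, alpha=0.05):
    """W = (C b - d)^t (C A C^t)^{-1} (C b - d); reject H0 if W > chi2_{1-alpha,r}."""
    diff = C @ beta_hat - d
    W = diff @ np.linalg.solve(C @ A_hat @ C.T, diff)
    r = np.linalg.matrix_rank(C)                 # r = rank(C)
    return W, W > chi2.ppf(1 - alpha, df=r)
```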

References

Fahrmeir, L., T. Kneib, and S. Lang (2007). Regression: Modelle, Methoden und Anwendungen. Berlin: Springer-Verlag.

West, B., K. E. Welch, and A. T. Galecki (2007). Linear Mixed Models: A Practical Guide Using Statistical Software. Boca Raton: Chapman & Hall/CRC.
