E(yi) = µi and 'variation', i.e. var(yi) = σi². The average and the variation can be
factored further when additional information is supplied.
• For example, suppose the observations have a structure such that y11, · · · , y1m are
observations from females and y21, · · · , y2m are observations from males, with
2m = n. Then we can factor the mean as E(yij) = µ + αi for i = 1, 2
and j = 1, · · · , m, where αi can be interpreted as the average contribution of 'gender'.
When the model is not necessarily linear, we call it a generalized linear model
(GLM).
Categorical variates are also called factors. Factors are groups that divide
the data accordingly (e.g., gender).
• Nested and crossed factors.
Nested factor example: Consider students' GPA by gender from each section of the
English and Math courses. There are three sections (A, B, C). Then we have three
factors (gender, section, subject). Within each subject there are three sections.
The section factor is nested within the subject factor, since section A in the English class
is not the same section as section A in the Math class.
Crossed factor example: the same 16 age groups appear in both the white and the non-white
group. The age group and the race factors are then said to be crossed.
• The effect of a factor is its contribution to the mean and/or the variation. There are two
kinds of effects: fixed effects and random effects.
The effect of a factor is called a fixed effect when the parameters associated with
the factor are fixed constants. Fixed effects contribute to the mean. In this case,
the corresponding factor has a finite number of levels.
CHAPTER 12. LINEAR MIXED EFFECT MODEL 114
The effect of a factor is called a random effect when the parameters associated with
the factor are random. Random effects contribute to the variation. In this case,
we regard the available levels (a finite number of levels) of the factor as a sample
from a larger population of levels.
• Example: Let yijk be the volume of the kth loaf of bread in the jth batch at the
ith temperature level, i = 1, 2, 3, j = 1, · · · , 6 and k = 1, · · · , 4. A researcher
wants to investigate the changes in the volume of bread at different temperatures.
There are three different temperatures. At each temperature level, four loaves
of bread in each of 6 batches are baked.
There are two factors: the temperature factor (3 levels) and the batch factor (6 levels).
We use fixed effects for the temperature factor and random effects for the batch factor: one can
argue that the effect of the temperature level contributes to the mean of
the volume while the effect of the batch contributes to the variation
of the volume.
When both fixed effects and random effects are considered in the model, it
is called a mixed effects model. The above example is a mixed effects model.
Placebo and drug: let yij be the number of seizures of patient j receiving
treatment i, where i = 1 is placebo and i = 2 is a drug. This is a one-factor model.
Consider the model E(yij) = µi.
µi is the mean number of seizures expected from someone receiving treatment i.
If we write E(yij) = µ + αi, then µ is the general (global) mean and αi is the
effect on the mean number of seizures due to treatment i. These are fixed
parameters of interest.
Clinical trial: let yij be the number of seizures of patient j at the ith clinic in the
city of Seoul who is treated by a drug, say i = 1, · · · , 20.
Consider the model E(yij | ai) = µ + ai,
where the ai are i.i.d. with mean 0 and variance σa², i.e. E(ai) = 0, var(ai) = σa²
and cov(ai, ai′) = 0 for i ≠ i′. For example,
yij | ai ~ i.i.d. N(µ + ai, σ²), j = 1, · · · , J,
ai ~ i.i.d. N(0, σa²), i = 1, · · · , I.
Alternatively,
yij = µ + ai + eij,
ai ~ i.i.d. N(0, σa²),
eij ~ i.i.d. N(0, σ²) and ai ⊥ eij.
Then
var(yij) = σa² + σ²,
cov(yij, yij′) = cov(µ + ai, µ + ai) = σa² for j ≠ j′.
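As a sanity check, the two moment identities above can be verified by simulation; a minimal numpy sketch with assumed values µ = 10, σa = 2, σ = 1 and I = 2000 groups of J = 5 observations each:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: I levels of the random effect, J observations per level.
I, J = 2000, 5
mu, sigma_a, sigma_e = 10.0, 2.0, 1.0

a = rng.normal(0.0, sigma_a, size=I)        # a_i ~ N(0, sigma_a^2)
e = rng.normal(0.0, sigma_e, size=(I, J))   # e_ij ~ N(0, sigma^2)
y = mu + a[:, None] + e                     # y_ij = mu + a_i + e_ij

# var(y_ij) should be close to sigma_a^2 + sigma^2
print(y.var())

# cov(y_ij, y_ij'), j != j', should be close to sigma_a^2
cov_within = np.mean((y[:, 0] - y[:, 0].mean()) * (y[:, 1] - y[:, 1].mean()))
print(cov_within)
```

With many levels I, both empirical moments land close to their theoretical values.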
[Figure: boxplots of X1–X5 for DATA1 and DATA2; vertical axis from −2 to 6.]
A large ratio σa²/σ² (σa² relatively larger than σ²) indicates that observations are
more strongly clustered within each level of the random effect.
12.2 Inference
Testing concerns comparing levels of fixed effects (i.e. linear functions of the fixed
effects) or whether the variation due to random effects is zero (i.e. σa² = 0).
• Best linear unbiased estimator (BLUE). Let Y be a random vector with E(Y) =
Xβ and cov(Y) = V, where X is a known n × p design matrix, β ∈ R^p and V is a
known non-singular covariance matrix.
A real-valued linear estimator t^T Y is said to be the best linear unbiased estimator of
its expectation if and only if var(t^T Y) ≤ var(a^T Y) for all linear estimators a^T Y with
E(a^T Y) = E(t^T Y).
• Best predictor (BP). The best predictor of y0 is the one that minimizes the mean
squared error (MSE), which yields the conditional mean:
ŷ0^BP = E(y0 | Y),
since for any predictor g(Y) of y0, one can show E(y0 − g(Y))² ≥ E(y0 −
E(y0 | Y))².
• Best linear predictor (BLP). When the mean and covariance are available, let cov(Y) =
V, where V is a non-singular covariance matrix, cov(y0, Y) = v0, E(Y) = µ and E(y0) = µ0. Then
ŷ0^BLP = µ0 + v0 V^{-1} (Y − µ).
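The BLP formula is easy to evaluate directly; a small numerical sketch with assumed (purely illustrative) mean and covariance values:

```python
import numpy as np

# Illustrative values (assumed): Y = (y1, y2) with E(Y) = mu_vec,
# cov(Y) = V, E(y0) = mu0 and cov(y0, Y) = v0.
mu0 = 1.0
mu_vec = np.array([2.0, 3.0])
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])
v0 = np.array([0.8, 0.3])

Y = np.array([2.5, 2.0])   # one observed realization

# BLP: yhat0 = mu0 + v0 V^{-1} (Y - mu_vec)
yhat0 = mu0 + v0 @ np.linalg.solve(V, Y - mu_vec)
print(yhat0)   # 15/14 for these numbers
```

Note the use of `np.linalg.solve` rather than forming V^{-1} explicitly, which is the standard numerically stable choice.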
• Best linear unbiased predictor (BLUP). When the covariance structure is known,
a0 + a^T Y is the BLUP of y0 if a0 + a^T Y is unbiased (i.e. E(a0 + a^T Y) = E(y0)) and,
for any other linear unbiased predictor b0 + b^T Y, E(y0 − a0 − a^T Y)² ≤ E(y0 − b0 − b^T Y)².
Suppose E(y0) = x0 β and x0 = c^T X for some c, where X and the covariances are known,
cov(Y) is non-singular and β is unknown. Then
ŷ0^BLUP = x0 β̂^BLUE + cov(y0, Y) cov(Y)^{-1} (Y − X β̂^BLUE).
• Linear mixed effects model: a general linear mixed effects model is of the
following form:
Y = Xβ + Zγ + e,
where γ and e are independent with cov(γ) = D and cov(e) = R, so that (Y, γ)
jointly has mean (Xβ, 0) and covariance matrix
[ ZDZ^T + R    ZD ]
[ DZ^T          D ].
V = Σ_{l=1}^r σl² Zl Zl^T + R = Σ_{l=1}^r σl² Zl Zl^T + σ0² I_N, where N is the total number of obser-
vations. Such a simplified linear mixed effects model is called a variance compo-
nent model. V can also be written as V = Σ_{l=0}^r σl² Zl Zl^T with Z0 = I_N.
ML method
The log-likelihood for a variance component model is
ℓ = −(1/2)(Y − Xβ)^T V^{-1} (Y − Xβ) − (1/2) log|V| − (N/2) log(2π).
Then we have
∂ℓ/∂β = −X^T V^{-1} X β + X^T V^{-1} Y
and
∂ℓ/∂σl² = −(1/2)(Y − Xβ)^T [∂V^{-1}/∂σl²] (Y − Xβ) − (1/2) ∂ log|V| / ∂σl²
        = (1/2)(Y − Xβ)^T V^{-1} Zl Zl^T V^{-1} (Y − Xβ) − (1/2) tr(V^{-1} Zl Zl^T).
Setting the derivatives to zero gives the ML equations
X^T V^{-1} X β = X^T V^{-1} Y,
tr(V^{-1} Zl Zl^T) = (Y − X β̂)^T V^{-1} Zl Zl^T V^{-1} (Y − X β̂) = Y^T P Zl Zl^T P Y,
for l = 0, · · · , r, where P = V^{-1} − V^{-1} X (X^T V^{-1} X)^− X^T V^{-1}. The first equation gives
X β̂_ML = X (X^T V^{-1} X)^− X^T V^{-1} Y.
When Zl = 0 for l = 1, · · · , r, V = σ0² I and σ̂0² = (1/N) SSE.
REML method
Consider error contrasts B^T Y with
B^T X = 0,
where B^T is an (N − r) × N full row rank matrix, r = rank(X), and
B^T V B is nonsingular. The restricted log-likelihood satisfies
ℓ_R ∝ −(1/2) Y^T B (B^T V B)^{-1} B^T Y − (1/2) log|B^T V B|
and
∂ℓ_R/∂σl² = (1/2) Y^T B (B^T V B)^{-1} B^T Zl Zl^T B (B^T V B)^{-1} B^T Y − (1/2) tr{(B^T V B)^{-1} B^T Zl Zl^T B},
for l = 0, · · · , r.
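In practice a matrix B with B^T X = 0 can be obtained from a complete QR decomposition of X; a brief numpy sketch (dimensions assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
X = rng.normal(size=(n, p))              # full column rank with probability 1

# Columns of B span the null space of X^T, so B^T X = 0.
# Here B is n x (n - r); the text's B^T is the (n - r) x n matrix B.T.
Q, _ = np.linalg.qr(X, mode='complete')  # complete QR: Q is n x n orthogonal
B = Q[:, p:]                             # last n - r columns are orthogonal to col(X)

print(B.shape)                  # (10, 7)
print(np.abs(B.T @ X).max())    # numerically zero
```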
• BLUE and BLUP in a linear mixed effects model. We are interested in finding
Xβ̂^BLUE and Zγ̂^BLUP. They solve the mixed model equations (MME)
[ X^T R^{-1} X        X^T R^{-1} Z          ] [ β ]   [ X^T R^{-1} Y ]
[ Z^T R^{-1} X    D^{-1} + Z^T R^{-1} Z    ] [ γ ] = [ Z^T R^{-1} Y ].
A solution to the MME is
β̂ = (X^T V^{-1} X)^− X^T V^{-1} Y,
γ̂ = D Z^T V^{-1} (Y − X β̂),
using the identities
V^{-1} = (I − R^{-1} Z (D^{-1} + Z^T R^{-1} Z)^{-1} Z^T) R^{-1},
D Z^T V^{-1} = (D^{-1} + Z^T R^{-1} Z)^{-1} Z^T R^{-1}.
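One can verify numerically that the MME solution coincides with the direct GLS/BLUP formulas above; a sketch with assumed dimensions, and D, R chosen as multiples of the identity purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 12, 2, 3
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
Y = rng.normal(size=n)
D = 0.8 * np.eye(q)            # covariance of random effects (assumed values)
R = 1.3 * np.eye(n)            # residual covariance (assumed values)
Rinv = np.linalg.inv(R)
V = Z @ D @ Z.T + R
Vinv = np.linalg.inv(V)

# Mixed model equations, assembled blockwise
A = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, np.linalg.inv(D) + Z.T @ Rinv @ Z]])
b = np.concatenate([X.T @ Rinv @ Y, Z.T @ Rinv @ Y])
sol = np.linalg.solve(A, b)
beta_mme, gamma_mme = sol[:p], sol[p:]

# Direct formulas from the text
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
gamma_hat = D @ Z.T @ Vinv @ (Y - X @ beta_hat)
```

The two routes agree to machine precision, which is exactly the point of Henderson's equations: they avoid inverting the n × n matrix V.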
For the one-way random effects model, the BLUP of ai is
âi = E(ai | Y) = σa²/(σa² + σ²/ni) (ȳi· − µ),
where ni is the number of observations in the ith level of the factor and ȳi· =
(1/ni) Σ_{j=1}^{ni} yij, the sample mean of observations in the ith level of the factor.
Note that we do not know σ², σa² and µ. A naive remedy is to replace them with
estimators (e.g., MLEs):
ãi = σ̂a²/(σ̂a² + σ̂²/ni) (ȳi· − ȳ··).
If ai is a fixed effect parameter,
âi = ȳi· − ȳ··,
with the constraint Σ ni âi = 0.
Note the additional term σ̂a²/(σ̂a² + σ̂²/ni) < 1, so |ãi| < |âi| (more shrinkage).
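The shrinkage of the BLUP relative to the fixed-effect estimate is easy to see numerically; a sketch that, for illustration only, plugs in the true variances in place of their MLEs:

```python
import numpy as np

rng = np.random.default_rng(4)
I, n_i = 6, 8                  # assumed: 6 levels, 8 observations each
mu, s_a, s_e = 5.0, 1.0, 2.0   # assumed true values of mu, sigma_a, sigma
y = mu + rng.normal(0, s_a, I)[:, None] + rng.normal(0, s_e, (I, n_i))

ybar_i = y.mean(axis=1)
ybar = y.mean()

# Fixed-effect estimate vs. BLUP with shrinkage factor < 1.
a_fixed = ybar_i - ybar
shrink = s_a**2 / (s_a**2 + s_e**2 / n_i)   # = 1/(1 + 0.5) = 2/3 here
a_blup = shrink * (ybar_i - ybar)

print(shrink)
```

Every |ãi| is a fixed fraction (here 2/3) of |âi|, illustrating the shrinkage toward zero.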
Chapter 13
Generalized Linear Model
• Random component
• Systematic component: the linear predictor
ηi = Σ_{j=1}^p Xij βj,
which must be linear in β. A counterexample: exp(β0 + Xi^{β1}).
• Link function: The link function relates the linear predictor and the conditional
mean, i.e., η = g(µ). g(·) is called the link function and should be twice differentiable.
Examples of link functions for binary regression are the logit, probit, and com-
plementary log-log functions.
The log-likelihood is
log L(β(θ), φ) = Σ_{i=1}^n [{yi θi − b(θi)}/φ + c(yi, φ)].
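For a concrete instance, the Bernoulli distribution fits this form with θ = logit(µ), b(θ) = log(1 + e^θ) and φ = 1; a quick numerical check that the exponential family expression matches the direct Bernoulli log-likelihood:

```python
import numpy as np

# Bernoulli as an exponential family: theta = logit(mu), b(theta) = log(1+e^theta),
# phi = 1 and c(y, phi) = 0.
def loglik_expfam(y, mu):
    theta = np.log(mu / (1 - mu))
    b = np.log1p(np.exp(theta))
    return np.sum(y * theta - b)

def loglik_direct(y, mu):
    return np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))

# Illustrative data (assumed values)
y = np.array([1, 0, 1, 1, 0])
mu = np.array([0.7, 0.2, 0.9, 0.5, 0.4])
print(loglik_expfam(y, mu), loglik_direct(y, mu))
```

The algebra behind the agreement: yθ − log(1 + e^θ) = y log µ + (1 − y) log(1 − µ).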
• Score function
∂/∂β log L(β, φ) = Σ_{i=1}^n (∂ηi/∂β)(∂θi/∂ηi) ∂/∂θi log L(β, φ)
                 = Σ_{i=1}^n (∂ηi/∂β)(∂µi/∂ηi)(∂θi/∂µi) ∂/∂θi log L(β, φ),
where
∂ηi/∂β = Xi^T, ∂µi/∂ηi = 1/g′(µi), ∂θi/∂µi = 1/b″(θi),
∂θi/∂ηi = 1/(∂ηi/∂θi) = 1/[g′{b′(θi)} b″(θi)] = 1/{g′(µi) b″(θi)}.
CHAPTER 13. GENERALIZED LINEAR MODEL 126
Since
∂/∂θi log L(β, φ) = {Yi − b′(θi)}/φ = (Yi − µi)/φ,
the score function is
∂/∂β log L(β, φ) = Σ_{i=1}^n Xi^T [1/{g′(µi) b″(θi)}] {Yi − b′(θi)}/φ
                 = Σ_{i=1}^n Xi^T {g′(µi) Var(Yi)}^{-1} {Yi − µi},
using Var(Yi) = φ b″(θi).
Note: If θi = ηi, i.e., the canonical link, then ∂θi/∂ηi = 1 and the score function is
∂/∂β log L(β, φ) = Σ_{i=1}^n Xi^T (Yi − µi)/φ.
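The canonical-link score Σ Xi^T (Yi − µi)/φ can be verified against a numerical gradient of the Bernoulli log-likelihood (simulated data, logit link, φ = 1; all sizes assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.2, -0.5])
Y = rng.binomial(1, 0.5, size=n).astype(float)

def loglik(beta):
    eta = X @ beta
    # Bernoulli log-likelihood with canonical (logit) link
    return np.sum(Y * eta - np.log1p(np.exp(eta)))

mu = 1 / (1 + np.exp(-(X @ beta)))
score = X.T @ (Y - mu)          # sum_i X_i^T (Y_i - mu_i), phi = 1

# Central finite-difference gradient
h = 1e-5
num = np.array([(loglik(beta + h * e) - loglik(beta - h * e)) / (2 * h)
                for e in np.eye(2)])
```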
• Information
I(β) = −(∂/∂β^T) U(β)
     = Σ_{i=1}^n [ Xi^T {g′(µi) Var(Yi)}^{-1} ∂µi/∂β^T − Xi^T [(∂/∂β^T){g′(µi) Var(Yi)}^{-1}] {Yi − µi} ].
Since
∂µi/∂β^T = (∂µi/∂ηi)(∂ηi/∂β^T) = {g′(µi)}^{-1} Xi,
the expected information is
i(β) = E{I(β)} = Σ_{i=1}^n Xi^T g′(µi)^{-1} Var(Yi)^{-1} g′(µi)^{-1} Xi.
• Computation of the MLE
In general, the equation U(θ) = 0 does not have a closed-form solution, so
one must solve it iteratively. Starting from an initial value θ^(0), one
can iteratively update the estimate by
θ^(p+1) = θ^(p) + I(θ^(p))^{-1} U(θ^(p)).
• Iteratively reweighted least squares (IRLS)
Motivation: the score function of a GLM has the form of a weighted linear regression. Define the working response
zi = ηi + (∂ηi/∂µi)(Yi − µi).
Then
E(zi) = ηi = Xi β,
Var(zi) = (∂ηi/∂µi)² Var(Yi) = g′(µi)² Var(Yi),
and weighted least squares gives
β̂ = (Σ_{i=1}^n Xi^T Var(zi)^{-1} Xi)^{-1} (Σ_{i=1}^n Xi^T Var(zi)^{-1} zi).
Note that both z and Var(z) are also functions of β. Start with β^(0) and update:
β^(p+1) = [Σ_{i=1}^n Xi^T {g′(µi)² Var(Yi)}^{-1} Xi]^{-1} [Σ_{i=1}^n Xi^T Var(zi)^{-1} {ηi + g′(µi)(Yi − µi)}]
        = [Σ_{i=1}^n Xi^T {g′(µi)² Var(Yi)}^{-1} Xi]^{-1} [ {Σ_{i=1}^n Xi^T Var(zi)^{-1} Xi β^(p)}
          + Σ_{i=1}^n Xi^T {g′(µi)² Var(Yi)}^{-1} g′(µi)(Yi − µi) ]
        = β^(p) + [Σ_{i=1}^n Xi^T {g′(µi)² Var(Yi)}^{-1} Xi]^{-1} Σ_{i=1}^n Xi^T {g′(µi) Var(Yi)}^{-1} (Yi − µi)
        = β^(p) + i(β^(p))^{-1} U(β^(p)),
where all quantities on the right are evaluated at β^(p).
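The update above is exactly iteratively reweighted least squares; a minimal sketch for logistic regression on simulated data (sample size and true coefficients assumed), iterating the weighted least squares step until convergence:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])       # assumed true coefficients
p_true = 1 / (1 + np.exp(-X @ beta_true))
Y = rng.binomial(1, p_true).astype(float)

beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    W = mu * (1 - mu)                   # weight = 1/{g'(mu)^2 Var(Y)} for the logit link
    z = X @ beta + (Y - mu) / W         # working response z_i
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

# At the MLE, the score should be numerically zero
score = X.T @ (Y - 1 / (1 + np.exp(-X @ beta)))
print(beta, np.abs(score).max())
```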
• We can tell that iterative reweighting leads to the MLE since it is numerically
identical to Fisher scoring.
• Summary of results
3. β̂ is consistent.
4. √n(β̂ − β0) ∼ N(0, n i(β)^{-1}) asymptotically.
Example (logistic regression): let Y = (Y1, Y2, · · · , Yn)^T with
ηi = log{µi/(1 − µi)}, E(Yi | xi) = µi = exp(ηi)/{1 + exp(ηi)},
so that
β1 = log [ P(Yi = 1 | xi = 1) P(Yi = 0 | xi = 0) / {P(Yi = 1 | xi = 0) P(Yi = 0 | xi = 1)} ],
the log odds ratio. The score function is
U(β) = Σ_{i=1}^n (1, xi)^T (Yi − µi),
and the observed and expected information matrices are the same:
[ Σ_{i=1}^n µi(1 − µi)        Σ_{i=1}^n xi µi(1 − µi)  ]
[ Σ_{i=1}^n xi µi(1 − µi)     Σ_{i=1}^n xi² µi(1 − µi) ].
The working response is
zi = ηi + (∂ηi/∂µi)(Yi − µi) = β0 + β1 xi + {µi(1 − µi)}^{-1}(Yi − µi),
since
∂ηi/∂µi = (∂ηi/∂θi)(∂θi/∂µi) = {µi(1 − µi)}^{-1}.
Generalized linear models are nonlinear models, but the usual definition of nonlinear models has the fol-
lowing form:
Yi = f(Xi, β) + ei.
For estimation, minimizing the sum of error squares Σ_{i=1}^n ei², or a weighted sum of error squares Σ_{i=1}^n wi ei² (with known or
unknown weights), or the maximum likelihood estimate can be used.
• Hypothesis testing can be conducted using the Wald test, the score test, and the likelihood
ratio test.
• Score statistic
The score statistic is based on the null distribution of the score function. In
testing the entire vector,
H0: β = β0,
the score statistic is U(β0)^T i(β0)^{-1} U(β0). In testing a subvector, partition β = (β1, β2) and consider
H0: β1 = β10.
The score statistic is U1(β10, β̃2)^T [Var{U1(β10, β̃2)}]^{-1} U1(β10, β̃2),
where β̃2 is the solution of U2(β10, β2) = 0, i.e., the MLE of β2 under the
restriction β1 = β10. Partition the information matrix accordingly as i = [i11, i12; i21, i22];
it can be easily verified that i21 = i12^T.
To obtain the variance of U1(β10, β̃2) we can consider the following expansion:
U1(β10, β̃2) ≈ U1(β10, β20) + [∂/∂β2^T U1(β1, β2)] (β̃2 − β20) = U1(β10, β20) − i12 i22^{-1} U2(β10, β20).
Then
Var{U1(β10, β̃2)} = i11 − i12 i22^{-1} i21.
One weakness is that computing the score statistic requires specialized
software.
Example: in the logistic model with ηi = β0 + β1 Xi, test H0: β1 = 0. With β̃0 the MLE under H0 and µ̃i = exp(β̃0)/{1 + exp(β̃0)},
U1(β̃0, 0) = Σ_{i=1}^n Xi [Yi − exp(β̃0)/{1 + exp(β̃0)}],
Var{U1(β̃0, 0)} = Σ_{i=1}^n Xi² µ̃i(1 − µ̃i) − {Σ_{i=1}^n Xi µ̃i(1 − µ̃i)} {Σ_{i=1}^n µ̃i(1 − µ̃i)}^{-1} {Σ_{i=1}^n Xi µ̃i(1 − µ̃i)}.