Lemma Dersh
Overview of Multilevel Data
• Clustered Data
– An outcome is measured once for each subject, and
subjects belong to (or are “nested” in) clusters, such as
families, schools, or neighborhoods
• Longitudinal data
– An outcome is measured for the same person repeatedly
over a period of time
Overview of Mixed Models
The general form of linear mixed model (in matrix notation)
assuming two levels (the jth observation from ith individual or
cluster) is:
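\[
Y_{ij} = \underset{N \times p}{X_{ij}}\;\underset{p \times 1}{\beta} + \underset{N \times q}{Z_{ij}}\;\underset{q \times 1}{b_i} + \underset{N \times 1}{\varepsilon_{ij}}
\]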
N is the total number of observations (all repeated measurements) at the
lowest level. The outcome vector, the covariate matrix, the random-effects
design matrix, and the error vector all have N rows.
bi ~ N(0, G)
where G is the variance-covariance matrix of the random effects.
The random effects are deviations around the mean value given by β,
so what is left to estimate is their variance.
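\[
G = \begin{pmatrix} \sigma^2_{int} & \sigma_{int,slop} \\ \sigma_{slop,int} & \sigma^2_{slop} \end{pmatrix}
\]
or, with no covariance between the random intercept and the random slope,
\[
G = \begin{pmatrix} \sigma^2_{int} & 0 \\ 0 & \sigma^2_{slop} \end{pmatrix}
\]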
The final element in our model is the variance-covariance matrix of
the residuals, ε, or the conditional covariance matrix of
Yij | Xijβ + Zijbi
The most common residual covariance structure is
R = Iσ²ε
where I is the identity matrix and σ²ε is the residual variance
\( \hat{\beta},\ \hat{b},\ \hat{G},\ \text{and}\ \hat{R} \)
• The final model depends on the distribution assumed,
but is generally of the form:
[Figure: Individual profiles for the first 50 children; horizontal axis: AGE]
STATA CODE:
gen wt=weight/1000 if ind<=50
[Figure: Individual profiles for the first 50 subjects, two panels; vertical axis: wt, horizontal axis: AGE]
STATA CODE:
xtline wt if sex==0, overlay t(age) tlabel(#6) i(ind) legend(off) ///
    title("Individual Profiles for Females")
– Analysis of endpoints
– Analysis of increments
Incorrect inference
Limitations of ignoring the dependence
• Random effects are usually not the primary focus of the analysis, but they
allow us to account for correlation among repeated measurements on the same
subject or among observations within the same level-2 or higher unit (e.g.
correlations among observations within the same school)
General linear mixed-effects model
Notations:
Let i = 1, …, N denote level-2 units (clusters or subjects)
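Examples of covariance structures for three repeated measurements:

Unstructured
\[
\begin{pmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho_{12} & \sigma_1\sigma_3\rho_{13} \\
\sigma_2\sigma_1\rho_{21} & \sigma_2^2 & \sigma_2\sigma_3\rho_{23} \\
\sigma_3\sigma_1\rho_{31} & \sigma_3\sigma_2\rho_{32} & \sigma_3^2
\end{pmatrix}
\]

AR(1)
\[
\sigma^2 \begin{pmatrix}
1 & \rho & \rho^2 \\
\rho & 1 & \rho \\
\rho^2 & \rho & 1
\end{pmatrix}
\]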
Assumptions
A linear Mixed Model makes assumptions about:
b1, …, bN, ε1, …, εN are independent
Marginally:
Yij ~ N(Xijβ, ZijGZ'ij + Σi)
Estimation
• In REML, we transform Y so that the mean vanishes
from the likelihood
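As a minimal sketch of how the estimation method is switched in practice, the growth model fitted later in this section could be refitted by REML by adding the reml option. The variable names (wt, sex, age, ind) are the ones used in this document's own Stata code; the rest is an illustration, not the author's analysis.

```stata
* Sketch only: refit the growth model by REML instead of the default ML.
* wt, sex, age and ind are the variables used elsewhere in this document.
xtmixed wt i.sex age c.age#c.age i.sex#c.age ///
    c.age#c.age#i.sex || ind: age, cov(un) reml
```

Because the REML likelihood depends on the fixed-effects design, REML log likelihoods should only be compared between models that share the same fixed effects.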
Where
Wij : weight (Kg) of the ith infant at jth visit
Aij : Age of the ith infant at the jth visit
Si: Sex of the ith infant (Female =0, Male =1)
b0i: is random intercept; b1i: is random slope
β0 : the fixed intercept
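For orientation, the model implied by the xtmixed command shown later in this section can be written out with the symbols defined above. The coefficient numbering β1, …, β5 is an assumption made here for illustration; only β0, b0i and b1i are named in the source.

```latex
\[
W_{ij} = \beta_0 + \beta_1 S_i + \beta_2 A_{ij} + \beta_3 A_{ij}^2
       + \beta_4 S_i A_{ij} + \beta_5 S_i A_{ij}^2
       + b_{0i} + b_{1i} A_{ij} + \varepsilon_{ij}
\]
```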
Result (ML)
Iteration 0: log likelihood = -5623.9513
Iteration 1: log likelihood = -5623.9512
                        Coef.      Std. Err.      z      P>|z|    [95% Conf. Interval]
sex#c.age
  1                   .1185881    .0120292      9.86    0.000    .0950113    .142165
sex#c.age#c.age
  1                  -.0084046    .0008312    -10.11    0.000   -.0100337   -.0067755

Random-effects parameters    Estimate    Std. Err.    [95% Conf. Interval]
ind: Unstructured
  sd(age)                    .0973595    .0030093     .0916365    .1034399
  sd(_cons)                  .5215715    .0160432     .4910565    .5539828
  corr(age,_cons)            .3098716    .0433613     .2225995    .3922169

LR test vs. linear model: chi2(3) = 6048.29   Prob > chi2 = 0.0000
sd(εij) = 0.4390
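To see what these variance components imply, the marginal variance of weight at a given age can be worked out from the random intercept, random slope and residual terms. The calculation below is a sketch using the estimates reported above (sd(_cons) = 0.5216, sd(εij) = 0.4390), evaluated at age 0, where the slope terms drop out.

```latex
\[
\operatorname{Var}(W_{ij}) = \sigma^2_{int} + 2A_{ij}\,\sigma_{int,slop}
                           + A_{ij}^2\,\sigma^2_{slop} + \sigma^2_{\varepsilon}
\]
\[
\text{At } A_{ij} = 0:\quad
\operatorname{Var}(W_{ij}) \approx 0.5216^2 + 0.4390^2 \approx 0.2720 + 0.1927 \approx 0.4648
\]
\[
\text{Share of variance between infants at age } 0:\quad
\frac{0.2720}{0.4648} \approx 0.585
\]
```

Under these estimates, a little over half of the variation in weight at age 0 is attributable to differences between infants rather than to within-infant residual variation.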
Stata code
Maximum likelihood (default)
xtmixed wt i.sex age c.age#c.age i.sex#c.age ///
    c.age#c.age#i.sex || ind: age, cov(un)
• For this family, in general, the mean and variance are related
Generalized Linear Model (GLM)
\[
f(y \mid \pi) = \pi^{y}(1-\pi)^{1-y}
             = \exp\left\{ y \ln\frac{\pi}{1-\pi} + \ln(1-\pi) \right\}
\]
• The mean is given by μ = π and the variance by Var(y) = π(1 − π)
With λi = exp(xiβ)
Generalized Estimating Equations (GEE)
• Marginal model approach for non-Gaussian longitudinal data
is good, particularly for discrete data
It measures linear correlation between height (ht) and weight (wt) across
all 7 time periods. Vectors!
It measures linear correlation between age (time) and weight
CORR represents the correction for correlation between observations i.e.
correlation among the 7 measurements.
A significant beta 1 (height effect) here would mean either that infants
who are taller also weigh more (between-subjects effect), or that infants
whose height changes show corresponding changes in weight (within-subjects
effect), or both.
Effects on standard errors
In general, ignoring the correlation (dependency) of the
observations will overestimate the standard errors of the
time-dependent predictors (such as age and length/height of child),
since we haven't accounted for between-subject variability.
Standard errors of the time-independent predictors (such as sex),
by contrast, will be underestimated, because the long form of the
data makes it seem like there is 7 times as much data as there
really is (the cheating way to reduce a standard error)!
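One way to see the effect on standard errors is to fit the same regression twice on the long-format data, once treating all rows as independent and once allowing for correlation within child. This is a minimal sketch using the variables wt, age and ind from this document; it is an illustration, not part of the original analysis.

```stata
* Naive OLS: treats every row of the long-format data as independent.
regress wt age

* Same point estimates, but the standard errors allow for
* correlation among the repeated measurements on each child (ind).
regress wt age, vce(cluster ind)
```

The coefficients are identical in the two fits; only the standard errors, and therefore the tests and confidence intervals, change.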
How does GEE work?
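OLS regression variance-covariance matrix
\[
\begin{array}{c|ccc}
    & t_1 & t_2 & t_3 \\ \hline
t_1 & \sigma^2_{y/t} & 0 & 0 \\
t_2 & 0 & \sigma^2_{y/t} & 0 \\
t_3 & 0 & 0 & \sigma^2_{y/t}
\end{array}
\]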
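Correlation structure must be specified.
\[
\begin{array}{c|ccc}
    & t_1 & t_2 & t_3 \\ \hline
t_1 & \sigma^2_{y/t} & a & b \\
t_2 & a & \sigma^2_{y/t} & c \\
t_3 & b & c & \sigma^2_{y/t}
\end{array}
\]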
We are looking for the simplest structure (uses up the fewest degrees of freedom)
that fits data well!
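Autoregressive
\[
\begin{array}{c|cccc}
    & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho & \rho^2 & \rho^3 \\
t_2 & \rho & 1 & \rho & \rho^2 \\
t_3 & \rho^2 & \rho & 1 & \rho \\
t_4 & \rho^3 & \rho^2 & \rho & 1
\end{array}
\]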
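M-dependent (here with two nonzero lags)
\[
\begin{array}{c|cccc}
    & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho_1 & \rho_2 & 0 \\
t_2 & \rho_1 & 1 & \rho_1 & \rho_2 \\
t_3 & \rho_2 & \rho_1 & 1 & \rho_1 \\
t_4 & 0 & \rho_2 & \rho_1 & 1
\end{array}
\]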
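Unstructured
\[
\begin{array}{c|cccc}
    & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho_1 & \rho_2 & \rho_3 \\
t_2 & \rho_1 & 1 & \rho_5 & \rho_4 \\
t_3 & \rho_2 & \rho_5 & 1 & \rho_6 \\
t_4 & \rho_3 & \rho_4 & \rho_6 & 1
\end{array}
\]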
How GEE handles missing data
Because the long form of the data is used, you lose only the
observations that the subject is missing, not all of that
subject's measurements.
Generalized Linear Mixed Models (GLMM)
With
\[
\eta[\psi'(\theta_{ij})] = \eta(\mu_{ij}) = \eta[E(Y_{ij} \mid b_i)] = x'_{ij}\beta + z'_{ij} b_i
\]
• For a known link function η(·), with Xij and Zij p-dimensional
and q-dimensional vectors of known covariate values,
respectively
Where
Gi is a gender indicator.
Aij is age of the ith infant at time j (also the time variable)
Result
GEE Model
                        Coef.      Std. Err.      z      P>|z|    [95% Conf. Interval]
sex#c.age
  1                  -.0180339    .0173981     -1.04    0.300   -.0521336    .0160658
| c1 c2 c3 c4 c5 c6 c7
------+-----------------------------------------------------------------------------------------------
r1 | 1
r2 | .1458121 1
r3 | .1458121 .1458121 1
r4 | .1458121 .1458121 .1458121 1
r5 | .1458121 .1458121 .1458121 .1458121 1
r6 | .1458121 .1458121 .1458121 .1458121 .1458121 1
r7 | .1458121 .1458121 .1458121 .1458121 .1458121 .1458121 1
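A working correlation matrix like the one above, with a single common correlation for every pair of visits, is what an exchangeable GEE produces. The sketch below shows how such a fit and its estimated working correlation could be requested in Stata. The outcome name y is hypothetical (the source does not name the binary outcome); sex, age and ind are taken from this document.

```stata
* y is a hypothetical 0/1 outcome; the source does not name it.
* Declare the panel (infant) identifier.
xtset ind

* Marginal logistic model with an exchangeable working correlation
* and robust (sandwich) standard errors.
xtgee y i.sex age i.sex#c.age, family(binomial) link(logit) ///
    corr(exchangeable) vce(robust)

* Display the estimated within-infant working correlation matrix.
estat wcorrelation
```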
GLMM: The Jimma Infant Data
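                        Coef.      Std. Err.      z      P>|z|    [95% Conf. Interval]
sex#c.age
  1                   -.021603    .0204887     -1.05    0.292   -.0617601    .0185542

Random-effects parameters    Estimate    Std. Err.    [95% Conf. Interval]
ind: Identity
  sd(_cons)                   1.20833    .0752947      1.06941    1.365295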
LR test vs. logistic model: chibar2(01) = 222.16 Prob >= chibar2 = 0.0000
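The size of the random-intercept standard deviation can be made more tangible as an intraclass correlation. The calculation below is a sketch that assumes the usual latent-response formulation of the logistic model, in which the level-1 residual variance is fixed at π²/3; that formulation is an assumption added here, not something stated in the source.

```latex
\[
\hat{\sigma}^2_{b} = 1.20833^2 \approx 1.4601, \qquad
\frac{\pi^2}{3} \approx 3.2899
\]
\[
\text{ICC} = \frac{\hat{\sigma}^2_{b}}{\hat{\sigma}^2_{b} + \pi^2/3}
           \approx \frac{1.4601}{1.4601 + 3.2899} \approx 0.307
\]
```

On this scale, roughly 31% of the latent variation would be attributable to differences between infants.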
GLMM with random intercept
• yij is the value of the count outcome, the number of events (yij
can equal 0, 1, …)
• tij is the length of time during which the events are recorded
– can be equal (tij = t): all observations are based on the same period
of time, and the number of events within that period is of interest
– can vary (tij): observations are based on varying periods of time, and
this should be accounted for when modeling the number of events
within the varying time periods
Mixed-effects Poisson Regression Model
The mixed-effects Poisson regression model
gives the expected number of events in tij as:
E(yij | bi) = μij = tij exp[x'ijβ + z'ijbi]
log(μij) = log(tij) + [x'ijβ + z'ijbi]
log(μij/tij) = [x'ijβ + z'ijbi]
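The log(tij) term above enters the linear predictor with its coefficient fixed at 1, which Stata handles through the exposure() option. The sketch below assumes hypothetical variable names y (event count) and t (length of observation period); ID and visit are the names that appear in the output shown in this section. It uses xtmepoisson, in keeping with the xtmixed command used elsewhere in this document; newer Stata releases provide mepoisson for the same class of models.

```stata
* y (event count) and t (observation time) are hypothetical names.
* exposure(t) adds log(t) to the linear predictor with coefficient 1.
* Random intercept for each subject (ID).
xtmepoisson y visit, exposure(t) || ID:
```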
ID: Identity
sd(_cons) .5008095 .0139198 .474257 .5288485
LR test vs. Poisson model: chibar2(01) = 2.6e+05 Prob >= chibar2 = 0.0000
Stata code
ID: Independent
sd(visit) .1266529 .0042952 .1185082 .1353574
sd(_cons) .621142 .0173675 .5880183 .6561316
LR test vs. Poisson model: chi2(2) = 2.9e+05 Prob > chi2 = 0.0000
\[
-2\,[\log(L_1) - \log(L_2)] \sim \chi^2_{DF}
\]
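A likelihood ratio comparison of two nested mixed models can be sketched in Stata as follows. The two models here (random intercept only versus random intercept and slope for age) are chosen purely for illustration, using the variables wt, age and ind from this document; the stored-estimate names ri and rs are arbitrary.

```stata
* Smaller model: random intercept only.
xtmixed wt age || ind:
estimates store ri

* Larger model: random intercept and random slope for age.
xtmixed wt age || ind: age, cov(un)
estimates store rs

* Likelihood ratio test of the smaller model against the larger one.
lrtest ri rs
```

Because the null hypothesis places a variance on the boundary of its parameter space, the reported p-value is conservative.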
If the two models are not nested and contain different number of parameters
the likelihood can not be used directly. It is still possible to compare these
models with some of the methods described below.
The larger the likelihood, the better the model fits the data; this is
the basis for comparing different models
• The idea behind both AIC and BIC is to penalize models with many
parameters.
– The AIC value is computed as below, where p is the number of parameters
in the covariance structure. Formulated this way, a smaller value of AIC
indicates a better model.
\[ \text{AIC} = -2LL + 2p, \quad \text{where } -2LL \text{ is the deviance} \]
– The BIC value is computed using the following formula, where p is the
number of parameters in the covariance structure and N is the number of
effective observations, which means the number of individuals. As with
AIC, a smaller value of BIC is better.
\[ \text{BIC} = -2LL + \ln(N) \cdot p \]
Stata command to calculate AIC and BIC
For maximum likelihood models, use the command estimates stats after
storing results. i.e.
AIC does not require N in its calculation, but BIC does, as can be seen from the formula
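A minimal sketch of the store-then-compare workflow is given below. It fits the growth model from this document under two covariance structures for the random effects and tabulates AIC and BIC for both. The stored-estimate names m_un and m_ind are arbitrary, and the choice of models to compare is an illustration only.

```stata
* Unstructured covariance between random intercept and slope.
xtmixed wt i.sex age c.age#c.age i.sex#c.age ///
    c.age#c.age#i.sex || ind: age, cov(unstructured)
estimates store m_un

* Independent random intercept and slope (covariance fixed at 0).
xtmixed wt i.sex age c.age#c.age i.sex#c.age ///
    c.age#c.age#i.sex || ind: age, cov(independent)
estimates store m_ind

* Table of log likelihood, df, AIC and BIC for both models.
estimates stats m_un m_ind
```

The N that Stata uses for BIC comes from the estimation sample, so it is worth checking the N column of the table against the definition of N intended above.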