
Analysis of Longitudinal Data:

A STATA Oriented Approach

Lemma Dersh
Overview of Multilevel Data
• Clustered Data
– An outcome is measured once for each subject, and
subjects belong to (or are “nested” in) clusters, such as
families, schools, or neighborhoods

• Repeated measures data
– Multiple observations are made on the same person over time, area, space, or some other dimension

• Longitudinal data
– An outcome is measured for the same person repeatedly
over a period of time
Overview of Multilevel Data

• Longitudinal, clustered, and repeated measures data are more generally known as "multilevel" data.

• Level 1 is the lowest (most granular) level of the data, where the outcome variable of interest is measured.

• Levels 2, 3, … capture higher-level information.

Overview of Multilevel Data

 Outcome data from the same higher-level unit (cluster or subject) are correlated, and

 there are GLM-based methods that could be used to analyze such correlated data,

 however, with serious limitations.

Limitations of GLMs (rANOVA/rMANOVA)
 They assume categorical predictors
 They do not handle time-dependent covariates (predictors
measured over time)
 They assume everyone is measured at the same time (time is
categorical) and at equally spaced time intervals
 You don’t get parameter estimates (just p-values)
 Missing data must be imputed
 They require restrictive assumptions about the correlation
structure

Overview of Mixed Models
The general form of the linear mixed model (in matrix notation), assuming two levels (the jth observation from the ith individual or cluster), is:

Yij = Xijβ + Zijbi + εi

where Yij is an N x 1 column vector, the outcome variable; Xij is an N x p matrix of the p predictor variables; β is a p x 1 column vector of the fixed-effects regression coefficients (the "betas"); Zij is the N x q design matrix for the q random effects (the random complement to the fixed Xij); bi is a q x 1 vector of the random effects (the random complement to the fixed β); and εi is an N x 1 column vector of the residuals, the part of Yij that is not explained by the model, Xijβ + Zijbi.
Overview of Mixed Models

The dimensions line up as:

      Yij    =     Xij  β     +     Zij  bi    +    εi
    (N x 1)   (N x p)(p x 1)   (N x q)(q x 1)   (N x 1)

 'N' is the number of all observations (all repeated data) at the lowest level; the outcome vector, the rows of the covariate matrix, the rows of the random-effect design matrix, and the error vector all share this dimension

 'p' is the number of fixed-effect covariates, including the fixed intercept, which is also the number of fixed effects, the betas

 'q' is the number of clusters (or individuals from whom repeated data were taken), which, for a random-intercept model, is also the number of random intercepts, one per cluster or individual
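These dimensions can be verified with a small NumPy sketch that assembles Y = Xβ + Zb + ε for a toy random-intercept model (all values here are illustrative, not the course data):

```python
import numpy as np

rng = np.random.default_rng(0)

q = 3          # clusters
n_per = 4      # observations per cluster
N = q * n_per  # total observations at the lowest level
p = 2          # fixed effects: intercept + one covariate

X = np.column_stack([np.ones(N), rng.normal(size=N)])  # N x p fixed-effect design
beta = np.array([3.0, 0.5])                            # p x 1 fixed effects

# Z codes cluster membership: one column per cluster (random intercepts only)
cluster = np.repeat(np.arange(q), n_per)               # cluster id for each row
Z = np.eye(q)[cluster]                                 # N x q indicator matrix
b = rng.normal(scale=0.8, size=q)                      # q x 1 random intercepts
eps = rng.normal(scale=0.4, size=N)                    # N x 1 residuals

Y = X @ beta + Z @ b + eps                             # N x 1 outcome
print(X.shape, Z.shape, Y.shape)                       # (12, 2) (12, 3) (12,)
```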
Overview of Mixed Models
• If we are modeling only random intercepts, Zij is a special matrix that only codes which individual/cluster each observation (datum) belongs to.

• Each column is for one individual (cluster) and each row represents one observation (one row in the dataset).

• If an observation belongs to the individual/cluster in that column, the cell will have a 1, and 0 otherwise; Zij is therefore sparse (a matrix of mostly zeros).

• If we add a random slope, the number of rows in Zij remains the same, but the number of its columns, and the number of rows of the random-effect vector bi, will be doubled (i.e., 2q).
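The structure of Zij can be made concrete with a toy example (3 clusters, 2 observations each; illustrative only). Adding a random slope on a time covariate doubles the columns from q to 2q:

```python
import numpy as np

cluster = np.array([0, 0, 1, 1, 2, 2])   # 6 observations nested in q = 3 clusters
t = np.array([0., 1., 0., 1., 0., 1.])   # a time covariate

Z_int = np.eye(3)[cluster]               # N x q: random intercepts only (sparse 0/1)

# Adding a random slope on t: one intercept column and one slope column per cluster
Z_full = np.zeros((6, 6))                # N x 2q
Z_full[:, 0::2] = Z_int                  # intercept columns (still 0/1 indicators)
Z_full[:, 1::2] = Z_int * t[:, None]     # slope columns (indicator times t)

print(Z_int.shape, Z_full.shape)         # (6, 3) (6, 6)
```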
Overview of Mixed Models
 If we estimated bi, it would be a column vector, similar to β. However, in classical statistics, we do not actually estimate bi; instead, we nearly always assume that

    bi ~ N(0, G)

 where G is the variance-covariance matrix of the random effects.

 Because we directly estimate the fixed effects (fixed intercept and slope), the random-effect complements are modeled as deviations from the fixed effects, so they have mean zero.

 The random effects are just deviations around the values in β, which are the means. So what is left to estimate is the variance.

 If the model has only a random intercept, G is just a 1 x 1 matrix, the variance of the random intercept.
Overview of Mixed Models
• If we had both a random intercept and a random slope, then

    G = | σ²int         σint,slope |
        | σslope,int    σ²slope    |

 G is square, symmetric, and positive semi-definite, with redundant (mirrored) elements.

 For a q x q matrix, there are q(q+1)/2 unique elements.

Overview of Mixed Models
• To simplify computation (rather than modeling G directly), we estimate θ (e.g., a triangular Cholesky factorization G = LDL').

• θ is not always parameterized the same way, but you can generally think of it as representing the random effects.

• It is usually designed to contain non-redundant elements (unlike the variance-covariance matrix), and

• to be parameterized in a way that yields more stable estimates than variances (such as taking the natural logarithm to ensure that the variances are positive).
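One common choice is a log-Cholesky parameterization; the exact parameterization varies by software, so this NumPy sketch with an illustrative θ for a 2 x 2 G is only meant to show why the trick works:

```python
import numpy as np

# Unconstrained parameters theta for a 2 x 2 G: q(q+1)/2 = 3 of them.
# Log-scale diagonals guarantee positive variances after back-transformation.
theta = np.array([-0.5, 0.3, -1.0])      # [log L11, L21, log L22], illustrative values

L = np.array([[np.exp(theta[0]), 0.0],
              [theta[1],         np.exp(theta[2])]])
G = L @ L.T                              # symmetric and positive (semi-)definite by construction

print(G)
print(np.linalg.eigvalsh(G))             # all eigenvalues > 0 for any real theta
```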
Overview of Mixed Models
• G is some function of θ, so we get its estimate from the estimate of θ.

• Various parameterizations and constraints allow us to simplify the model.

• For example, by assuming that the random effects are independent, we can get the following:

    G = | σ²int    0       |
        | 0        σ²slope |
Overview of Mixed Models
 The final element in our model is the variance-covariance matrix of the residuals, ε, i.e., the conditional covariance matrix of Yij | Xijβ + Zijbi.

 The most common residual covariance structure is

    R = I σ²ε

 where I is the identity matrix and σ²ε is the residual variance.

 This structure assumes a homogeneous residual variance for all (conditional) observations and that they are (conditionally) independent.

 Other structures, such as compound symmetry or autoregressive, can also be assumed.
Overview of Mixed Models

• The final fixed elements are Y, X, Z, and ε.

• The final estimated elements are: β̂, θ̂, Ĝ, and R̂.

• The final model depends on the distribution assumed, but is generally of the form:

    Yij | Xijβ + Zijbi ~ F(0, R)


Part I
Gaussian Longitudinal Data
Structure of Longitudinal Data
What do the repeated data look like?

Wide form:
    id   score1   score2   score3   score4
    1    35       38       37       33
    2    44       42       40       43
    3    48       50       48       46
    .
    .

Long form (after reshape):
    id   sem   score
    1    1     35
    1    2     38
    1    3     37
    1    4     33
    2    1     44
    2    2     42
    2    3     40
    2    4     43
    3    1     48
    3    2     50
    3    3     48
    3    4     46
    .
    .

Data: students' math final exam scores (out of 50) in 4 consecutive semesters
Changing from one form to the other

From wide to long form (using Stata):
    reshape long var, i(varlist) j(new_varname)
var — the stub name of the variables holding the repeated data
varlist — a unique identifier of participants (it is required)
new_varname — a new variable whose unique values denote a sub-observation
(new_varname is optional unless we want a particular name; otherwise Stata names it _j)

Example: reshape long score, i(id) j(sem)

From long to wide form:
    reshape wide var, i(varlist) j(existing_varname), where j is an existing variable name

To go back to wide after using reshape long: reshape wide
To go back to long after using reshape wide: reshape long
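For readers working outside Stata, the same reshape can be mirrored in Python/pandas (a sketch using the toy score data above; `pd.wide_to_long` plays the role of `reshape long`):

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2, 3],
                     "score1": [35, 44, 48], "score2": [38, 42, 50],
                     "score3": [37, 40, 48], "score4": [33, 43, 46]})

# Equivalent of: reshape long score, i(id) j(sem)
long = (pd.wide_to_long(wide, stubnames="score", i="id", j="sem")
          .reset_index()
          .sort_values(["id", "sem"]))
print(long.head(4))

# Equivalent of: reshape wide score, i(id) j(sem)
back = long.pivot(index="id", columns="sem", values="score")
back.columns = [f"score{j}" for j in back.columns]
```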
Let us work on Jimma Infant Data
• A wide range of data was collected on the following characteristics:
    basic demographic information
    feeding practice
    anthropometric measurements, . . .
• Infants were followed for 12 months
• Measurements were taken from each child at seven time points, every two months
• Weight was one of the variables recorded at each visit

• Research question: How does weight change over time?


Jimma Infant data
Individual Profiles

[Figure: individual weight profiles (wt, kg) against AGE (0–12 months) for the first 50 children]

STATA CODE:
gen wt=weight/1000 if ind<=50
xtline wt, overlay t(age) tlabel(#6) i(ind) legend(off) title("Individual Profiles")
Remarks

• Subjects high (low) at baseline seem to remain high (low) over time

• Much variability within subjects

• Much variability between subjects

• The variability between subjects at higher ages is relatively larger than at baseline
Individual profile by sex

[Figure: individual weight profiles (wt, kg) against AGE (0–12 months), plotted separately for females and for males, for the first 50 subjects]

STATA CODE:
xtline wt if sex==0, overlay t(age) tlabel(#6) i(ind) legend(off) title("Individual Profiles for Females")
xtline wt if sex==1, overlay t(age) tlabel(#6) i(ind) legend(off) title("Individual Profiles for Males")
Longitudinal versus Cross-sectional Data
 Recall: longitudinal data refers to measurements made repeatedly over time to study how the subjects evolve over time.

 And the repeated measures taken from a subject tend to correlate with each other.

 Cross-sectional data refers to data collected at a specific point in time; a snapshot of the population.

 Observations from cross-sectional data are uncorrelated.


Cross-sectional data
• Suppose it is of interest to study the relation between some
response Y and age from a cross-sectional study that yields the
following data:

• The graph suggests a negative relation between Y and age.


• Exactly the same observations could also have been obtained in a
longitudinal study, with 2 measurements per subject as shown in
the following slide
Cross-sectional and longitudinal Trend

First case: the graph suggests a negative cross-sectional association but a positive longitudinal trend.

Second case: the graph now suggests that both the cross-sectional and the longitudinal trends are negative.
Dependency/Correlation
• Measurements from the same person are likely to be highly correlated.

• A correct analysis should account for this correlation.

• This is why the classical methods such as ANOVA, linear regression, ... fail for such data.

• Usually, correlation decreases as the time span between measurements increases.

• The simplest case of longitudinal data is paired data. The paired t-test accounts for the correlation by considering subject-specific differences.
Simple Methods

– Analysis at each time point separately

– Analysis of endpoints

– Analysis of increments

– Ignoring the dependence


Limitations of the methods mentioned before

 They do not consider overall differences

 They do not allow studying differences in evolution

 Problem of multiple testing

 Incorrect inference
Limitations of ignoring the dependence

If we treat observations as independent (i.e., ignore the correlation), then:
– in general, the estimation of the associations (regression coefficients) between the outcome and covariates is valid;
– however, the variability measures (e.g., the SEs from a classical regression analysis) are not right;
– they are sometimes smaller, sometimes bigger than the true variability; and
– therefore, the inference is not valid (e.g., results look more significant than they should if the SE is too small).
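A small simulation illustrates the point for the simplest summary, the grand mean (toy data with exchangeable within-cluster correlation, not the course data): the naive independent-data formula σ²/N understates the true sampling variance by the "design effect" 1 + (m − 1)ρ:

```python
import numpy as np

rng = np.random.default_rng(1)
q, m, rho, sigma2 = 500, 4, 0.5, 1.0   # clusters, cluster size, within-cluster corr, total variance

# Monte Carlo: actual variance of the grand mean when observations share a cluster effect
means = []
for _ in range(2000):
    u = rng.normal(scale=np.sqrt(rho * sigma2), size=(q, 1))        # shared cluster effect
    e = rng.normal(scale=np.sqrt((1 - rho) * sigma2), size=(q, m))  # independent noise
    means.append((u + e).mean())
emp_var = np.var(means)

naive_var = sigma2 / (q * m)                        # pretends all q*m observations are independent
true_var = sigma2 * (1 + (m - 1) * rho) / (q * m)   # inflated by the design effect 1 + (m-1)*rho
print(naive_var, true_var, emp_var)                 # the simulation tracks true_var, not naive_var
```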
Linear Mixed Models (LMMs)

• LMMs are also known as multilevel models, hierarchical models, random effects models, or mixed models.

• For a continuous outcome variable, Y, the relationship is linear in the parameters (the β's).

• For multilevel data, outcomes measured repeatedly on the same cluster/subject are assumed to be correlated, and/or the error variance is not constant.

• As a result, the GLM assumption below will be violated:

    εi ~ iid N(0, σ²)
Linear Mixed Models (LMMs)

• Used to analyze repeated or clustered continuous data.

• Not the only modeling option for multilevel data with a continuous outcome; another option is a marginal model.

• Composed of both fixed and random effects, hence "mixed".
Fixed Effects in a LMM
• Are usually the focus of the analysis and can be thought of as similar to the parameters in an ordinary regression model (the betas).

• Can be taken from any level of the data, and help us explain the variance in Y at each level of the data.

• For fixed effects, the only levels under consideration are those contained in the coding of the factor.

• Examples of fixed effects:
– age, sex, treatment, marital status, anxiety level, etc.
Random Effects in a LMM
• If the levels contained in the coding of a factor are a random sample from the total number of levels, there is a random effect.

• Are usually not the primary focus of the analysis, but allow us to account for correlation among repeated measurements, or among observations within the same level-2 or higher units (e.g., correlations among observations within the same school), or

• Allow us to partition the total variance of Y into levels that correspond to the multilevel structure of the data.
– Example: how much of the variation in students' math achievement scores can be attributed to student-level variability (level 2), class-level variability (level 3), and school-level variability (level 4)?

• Are summarized by their variances, and covariances if there is more than one random effect in the LMM.
Random Effects in a LMM
 Come in two flavors:
 Random intercepts
 Random slopes

 Are explicitly specified in the model. This is in contrast to the random errors, which are never explicitly specified when a model is fit, but always exist and whose variance is always estimated.

 LMMs account for the correlation in the data by including subject-specific random effects.

 These random effects are usually of a Gaussian type.

When to use Fixed versus Random Effects

Their relevance depends on the research question:

 Use a fixed effect
   if interested in the mean of an outcome for factors containing all levels
   Example: race, gender, age, etc.

 Use a random effect
   if interested in the variance of an outcome by factors sampled from a population of levels
   Example: facilities, nursing homes, time, etc.
General linear mixed-effects model
Notations:
 Let i = 1, …, N denote level-2 units (clusters or subjects)

 j = 1, …, ni denote level-1 units (subjects or multiple observations)

 yij is the value of the continuous outcome (yij can equal any legitimate value of the outcome variable)
General linear mixed-effects model
Considering the model Yij = Xijβ + Zijbi + εi, we have:
 Xij are covariates
– at level 1, level 2, or cross-level interactions
– can include polynomials, dummy variables, interactions, ...
 β are the fixed-effect regression coefficients for the covariates
 Zij are the random-effect variable(s)
– usually just an intercept for clustered data
– often an intercept and time for longitudinal data
 bi are the random effects, b1, …, bN ~ N(0, G)
– how cluster i influences the observations within the cluster
– how a subject starts and progresses across time
 ε1, …, εN are independent, with εi ~ N(0, Σi)
 Variance components: the elements in G and Σi
Structures of Repeated Effects

Variance Components:
    | σ1²   0     0     0   |
    | 0     σ2²   0     0   |
    | 0     0     σ3²   0   |
    | 0     0     0     σ4² |

Compound Symmetry:
    | σ² + σ1²   σ1²         σ1²         σ1²       |
    | σ1²        σ² + σ1²    σ1²         σ1²       |
    | σ1²        σ1²         σ² + σ1²    σ1²       |
    | σ1²        σ1²         σ1²         σ² + σ1²  |

AR(1):
         | 1    ρ    ρ²  |
    σ² · | ρ    1    ρ   |
         | ρ²   ρ    1   |

Unstructured:
    | σ1²        σ1σ2ρ12    σ1σ3ρ13 |
    | σ2σ1ρ21   σ2²         σ2σ3ρ23 |
    | σ3σ1ρ31   σ3σ2ρ32    σ3²      |
Assumptions
 A linear mixed model makes assumptions about:

    mean structure: (non-)linear, covariates, . . .

    variance function: constant, quadratic, . . .

    correlation structure: constant, serial, . . .

    subject-specific profiles: linear, quadratic, . . .


Exploratory Analysis
 It comprises techniques, usually graphical, to visualize patterns in the data.

 The following aspects of the data will be looked at:

    individual profiles
    average evolution
    correlation structure
Estimation

• Restricted Maximum Likelihood (REML)

• Maximum Likelihood (ML)

• ML is the default in STATA


Estimation
Recall,
    Yij = Xijβ + Zijbi + εij

    bi ~ N(0, G), εi ~ N(0, Σi)

    b1, … , bN, ε1, … , εN independent

 Marginally:
    Yij ~ N(Xijβ, Zij G Z'ij + Σi)
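For a random-intercept model, this marginal covariance Z G Z' + Σ can be computed directly; a NumPy sketch for one subject with ni = 4 (illustrative variance values) shows that it reproduces the compound-symmetry pattern, with intraclass correlation σ²b/(σ²b + σ²e):

```python
import numpy as np

n_i = 4                        # repeated measures on one subject
sigma_b2, sigma_e2 = 0.8, 0.4  # illustrative random-intercept and residual variances

Z = np.ones((n_i, 1))          # random intercept only: a column of ones
G = np.array([[sigma_b2]])
V = Z @ G @ Z.T + sigma_e2 * np.eye(n_i)   # marginal Var(Y_i)

# Every off-diagonal entry equals sigma_b2: the induced correlation is exchangeable
icc = sigma_b2 / (sigma_b2 + sigma_e2)
print(V)
print(icc)
```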
Estimation
• In REML, we transform Y so that the mean vanishes from the likelihood.

• Note that the likelihood at convergence of REML is NOT the likelihood for the original data Y,

• and hence cannot be considered for comparison of models.
Jimma Infant Data
• From the exploratory analysis:
– the mean structure seems quadratic over time
– variability between subjects at baseline
– variability between subjects in the way they evolve

• Hence a quadratic mean, with a random intercept and slope, is a good idea...
Model
Wij = β0 + β1Si + β2Aij + β3Aij² + β4SiAij + β5Aij²Si + b0i + b1iAij + εij
    = (β0 + b0i) + β1Si + (β2 + b1i)Aij + β3Aij² + β4SiAij + β5Aij²Si + εij

Where
Wij: weight (kg) of the ith infant at the jth visit
Aij: age of the ith infant at the jth visit
Si: sex of the ith infant (Female = 0, Male = 1)
b0i: the random intercept; b1i: the random slope
β0: the fixed intercept
Result (ML)

Iteration 0: log likelihood = -5623.9513
Iteration 1: log likelihood = -5623.9512

Computing standard errors:

Mixed-effects ML regression Number of obs = 6,113


Group variable: ind Number of groups = 1,000

Obs per group:


min = 1
avg = 6.1
max = 7

Wald chi2(5) = 21610.03


Log likelihood = -5623.9512 Prob > chi2 = 0.0000

wt Coef. Std. Err. z P>|z| [95% Conf. Interval]

1.sex .1032682 .0413105 2.50 0.012 .0223011 .1842354


age .79505 .0086158 92.28 0.000 .7781633 .8119367

c.age#c.age -.0346971 .0005926 -58.55 0.000 -.0358585 -.0335356

sex#c.age
1 .1185881 .0120292 9.86 0.000 .0950113 .142165

sex#c.age#c.age
1 -.0084046 .0008312 -10.11 0.000 -.0100337 -.0067755

_cons 3.352453 .0297906 112.53 0.000 3.294064 3.410841


Results (ML):
Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

ind: Unstructured
sd(age) .0973595 .0030093 .0916365 .1034399
sd(_cons) .5215715 .0160432 .4910565 .5539828
corr(age,_cons) .3098716 .0433613 .2225995 .3922169

sd(Residual) .4390212 .0048254 .4296646 .4485815

LR test vs. linear model: chi2(3) = 6048.29 Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

sd(b0i)=0.5216, sd(b1i)=0.0974, Corr(b0i, b1i)=0.3099;

sd(εij )=0.4390.
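Plugging the fixed-effect estimates from the ML output above into the quadratic mean model gives the fitted population-average growth curves; a small Python sketch (coefficients copied from the table, random effects set to their mean of zero):

```python
import numpy as np

# Fixed-effect estimates copied from the ML output above
b0, b_sex = 3.352453, 0.1032682
b_age, b_age2 = 0.79505, -0.0346971
b_sexage, b_sexage2 = 0.1185881, -0.0084046

def mean_weight(age, male):
    """Population-average weight (kg) at a given age (months); male = 0 or 1."""
    return (b0 + b_sex * male
            + (b_age + b_sexage * male) * age
            + (b_age2 + b_sexage2 * male) * age ** 2)

ages = np.array([0, 6, 12])
print(mean_weight(ages, male=0))   # fitted mean curve for girls
print(mean_weight(ages, male=1))   # fitted mean curve for boys (higher at all ages)
```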
Stata code
Maximum likelihood (default):
xtmixed wt i.sex age c.age#c.age i.sex#c.age c.age#c.age#i.sex || ind: age, cov(un)

Restricted maximum likelihood:
xtmixed wt i.sex age c.age#c.age i.sex#c.age c.age#c.age#i.sex || ind: age, cov(un) reml
Goodness of fit
• Ignore the correlation in the data and fit a linear regression.

• Observe the major impact on model fit.

• Formally, one can consider a likelihood ratio test for model comparison.
Interpretation
• All fixed effects are statistically significant.

• The random effects capture the correlation in the data.

• Males tend to be higher at baseline (β1 = 0.103), as well as in evolution over time (β4 = 0.119).
Part II
Non-Gaussian Longitudinal Data
The Jimma Infant Data
 It is of particular interest to identify the risk of overweight in early life through weight and height measurements.

 This helps in preventing overweight and obesity, to reduce the incidence of several adulthood diseases.

 One possible indicator of overweight is age- and sex-specific BMI, with a BMI over the 85th percentile referring to overweight.

Variable of interest
 The outcome of interest is BMI coded as 0 (normal or underweight) or 1 (overweight).

 The question of interest is whether the percentage of overweight changes over time (age) and differs by gender.
Generalized Linear Model (GLM)
• A random variable Y follows an exponential family distribution if the density is of the form

    f(y) = f(y | η, φ) = exp{ φ⁻¹ [yη − ψ(η)] + c(y, φ) }

• for a specific set of unknown parameters η and φ, and for known functions ψ(·) and c(·, ·).

• Often, η and φ are termed the 'natural parameter' (or 'canonical parameter') and the 'dispersion parameter', respectively.

• For this family, in general, the mean and variance are related.
Generalized Linear Model (GLM)
• For binary responses, the model of interest is:

    Y ~ Bernoulli(π)

• We want to explain variability between outcome values based on covariate values, with density function

    f(y | η, φ) = π^y (1 − π)^(1−y) = exp{ y ln[π/(1 − π)] + ln(1 − π) }

• The mean is given by μ = π and the variance by var(μ) = π(1 − π).
Generalized Linear Model (GLM)
• When collecting a set of data, let Y1, . . . , YN be a set of independent binary outcomes.

• Let x1, . . . , xN represent the corresponding p-dimensional vectors of covariate values.

• With a logit link function, ln[πi/(1 − πi)] = x'iβ is the logistic regression model, with β a vector of p fixed, unknown regression coefficients.
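The logit link and its inverse can be written down in a few lines (a generic sketch, not tied to the course data): the inverse link maps any linear predictor x'β back into a valid probability.

```python
import numpy as np

def logit(p):
    """Link function: probability -> linear-predictor scale."""
    return np.log(p / (1 - p))

def inv_logit(x):
    """Inverse link: any real x'beta -> a probability in (0, 1)."""
    return 1 / (1 + np.exp(-x))

xb = np.array([-2.0, 0.0, 1.5])        # illustrative linear predictors x_i' beta
pi = inv_logit(xb)
print(pi)                              # each value lies strictly in (0, 1)
print(np.allclose(logit(pi), xb))      # logit and inv_logit are inverses
```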
Generalized Linear Model (GLM)
• For count data, we assume that

    Yi ~ Poi(λi), with λi = exp(x'iβ)
Generalized Estimating Equations (GEE)
• A marginal-model approach for non-Gaussian longitudinal data, particularly good for discrete data.

• Unlike the GLMM, it is not a full model but rather an approach that models the dependence of the marginal mean on covariates.

• The repeated nature of the data is modeled through a 'working correlation'.

• Same form as a full likelihood procedure, but we restrict the specification to the first moment only.

• There are a model-based version and an empirically corrected version of the standard errors.

Generalized Estimating Equations (GEE)
• Correlation is modeled using, say, an exchangeable correlation structure.

• However, separate estimation and reporting of the correlation is not a main interest.
The model in GEE

    | wt1 |          | ht1 |
    | wt2 |          | ht2 |
    | ... |  = β0 + β1| ... | + β2(age) + CORR + Error
    | wt7 |          | ht7 |

β1 measures the linear association between height (ht) and weight (wt) across all 7 time periods. Vectors!

β2 measures the linear association between age (time) and weight.

CORR represents the correction for correlation between observations, i.e., the correlation among the 7 measurements.

A significant β1 (height effect) here would mean either that infants who have high height also have high weight (between-subjects effect), or that infants whose height changes have corresponding changes in weight (within-subjects effect), or both.
Effects on standard errors
In general, ignoring the correlation (dependency) of the observations will overestimate the standard errors of the time-dependent predictors (such as age and length/height of the child), since we haven't accounted for between-subject variability.

However, the standard errors of the time-independent predictors (such as sex) will be underestimated.

This is because the long form of the data makes it seem as if there is 7 times as much data as there really is (the cheating way to reduce a standard error)!
How does GEE work?

• First, a classical linear regression analysis is carried out, assuming the observations within subjects are independent.

• Then, residuals are calculated from the classical model (observed minus predicted) and a working correlation matrix is estimated from these residuals.

• Then the regression coefficients are refit, correcting for the correlation (an iterative process).

• The within-subject correlation structure is treated as a nuisance variable (i.e., as a covariate).
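The three steps above can be sketched in a toy NumPy implementation (identity link, exchangeable working correlation, simulated balanced data; a minimal illustration of the iteration, not a general GEE implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
q, m = 200, 4                                   # subjects, measurements per subject
x = np.tile(np.arange(m, dtype=float), q)       # a time covariate (same schedule for all)
X = np.column_stack([np.ones(q * m), x])
u = np.repeat(rng.normal(scale=1.0, size=q), m)                 # shared subject effect
y = 1.0 + 0.5 * x + u + rng.normal(scale=1.0, size=q * m)       # true beta = (1.0, 0.5)

beta = np.linalg.lstsq(X, y, rcond=None)[0]     # step 1: independence (OLS) fit
for _ in range(10):
    r = (y - X @ beta).reshape(q, m)            # step 2: residuals, subject by subject
    sigma2 = r.var()
    # moment estimate of the exchangeable correlation from within-subject residual pairs
    rho = (r.sum(1) ** 2 - (r ** 2).sum(1)).sum() / (q * m * (m - 1) * sigma2)
    R = sigma2 * (np.full((m, m), rho) + (1 - rho) * np.eye(m))  # working covariance
    Rinv = np.linalg.inv(R)
    # step 3: refit beta, solving sum_i Xi' R^-1 (yi - Xi beta) = 0
    Xi = X.reshape(q, m, -1)
    yi = y.reshape(q, m)
    A = np.einsum('ijk,jl,ilm->km', Xi, Rinv, Xi)
    c = np.einsum('ijk,jl,il->k', Xi, Rinv, yi)
    beta = np.linalg.solve(A, c)

print(beta, rho)   # beta near (1.0, 0.5); rho near 1/(1+1) = 0.5 for this simulation
```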
OLS regression variance-covariance matrix

Example: assume we have only three repeated measurements from each participant, at three time points (not 7 measurements):

          t1        t2        t3
    t1  | σ²y/t    0         0     |
    t2  | 0        σ²y/t     0     |
    t3  | 0        0         σ²y/t |

The correlation structure (pair-wise correlations between time points) is independence.

The variance of scores is homogeneous across time (the MSE in ordinary least squares regression).
GEE variance-covariance matrix

          t1        t2        t3
    t1  | σ²y/t    a         b     |
    t2  | a        σ²y/t     c     |
    t3  | b        c         σ²y/t |

The correlation structure must be specified.

The variance of scores is homogeneous across time (residual variance).
Choice of the correlation structure
within GEE
In GEE, the correction for within subject correlations is carried out
by assuming a priori a correlation structure for the repeated
measurements (although GEE is fairly robust against a wrong
choice of correlation matrix—particularly with large sample size)
Choices:
• Independent (classical regression analysis)
• Exchangeable (compound symmetry, as in rANOVA)
• Autoregressive
• M-dependent
• Unstructured (no specification, as in rMANOVA)

We are looking for the simplest structure (uses up the fewest degrees of freedom) that fits the data well!
Autoregressive

          t1      t2      t3      t4
    t1  | σ²     ρ       ρ²      ρ³  |
    t2  | ρ      σ²      ρ       ρ²  |
    t3  | ρ²     ρ       σ²      ρ   |
    t4  | ρ³     ρ²      ρ       σ²  |

Only 1 correlation parameter is estimated; correlation decreases for time periods farther apart.
M-dependent

          t1      t2      t3      t4
    t1  | σ²     ρ1      ρ2      0   |
    t2  | ρ1     σ²      ρ1      ρ2  |
    t3  | ρ2     ρ1      σ²      ρ1  |
    t4  | 0      ρ2      ρ1      σ²  |

Here, 2-dependent. Estimate 2 parameters (adjacent time periods have one correlation coefficient; time periods 2 units of time apart have a different correlation coefficient; others are uncorrelated).
Unstructured

          t1      t2      t3      t4
    t1  | σ²     ρ1      ρ2      ρ3  |
    t2  | ρ1     σ²      ρ4      ρ5  |
    t3  | ρ2     ρ4      σ²      ρ6  |
    t4  | ρ3     ρ5      ρ6      σ²  |

Estimate all correlations separately (here 6).
How GEE handles missing data

GEE uses the "all available pairs" method, in which all non-missing pairs of data are used in estimating the working correlation parameters.

Because the long form of the data is being used, you only lose the observations that a subject is missing, not all of that subject's measurements.
Generalized Linear Mixed Models (GLMM)

• For non-Gaussian data, the well-known generalized linear mixed model is commonly used.

• The linear predictor contains random effects in addition to the usual fixed effects.

• These random effects are usually assumed to come from a normal distribution.
Generalized Linear Mixed Models (GLMM)
• Let Yij be the jth outcome measured for subject i = 1, . . . , n, j = 1, . . . , ni, and group the ni measurements into a vector Yi.

• Conditionally upon q-dimensional random effects bi ~ N(0, G), the outcomes Yij are independent, with densities of the form

    fi(yij | bi, β, φ) = exp{ φ⁻¹ [yij ηij − ψ(ηij)] + c(yij, φ) }

with

    η[ψ'(ηij)] = η(μij) = η[E(Yij | bi, β)] = x'ij β + z'ij bi
Generalized Linear Mixed Models (GLMM)

• Here η(·) is a known link function, with xij and zij p-dimensional and q-dimensional vectors of known covariate values, respectively,

• with β a p-dimensional vector of unknown fixed regression coefficients, and φ a scale (over-dispersion) parameter.

• Finally, let f(bi | G) be the density of the N(0, G) distribution for the random effects bi.
GEE versus GLMM

Focus
• GEE: called a "marginal" mean regression model; the mean model is the primary focus; longitudinal or cluster correlation is a nuisance feature of the data.
• GLMM: called a "conditional" mean regression model; estimation of both the means and the cluster-specific random effects is of interest; has applications for person- or cluster-level prediction.

Technical details
• GEE: estimates are the solution to estimating equations; semi-parametric: only the mean and 'working' correlation models are specified.
• GLMM: estimates are obtained from a likelihood function; fully parametric specification of mean, random effects, and error terms.

Correlation
• GEE: requires choice of a "working" correlation model (independence, exchangeable, etc.); robust to misspecification of the correlation, assuming the sandwich estimate of the variance is used; traditionally accommodates only one level of clustering.
• GLMM: correlation is induced by person- or cluster-specific random effects (random intercepts, random slopes); permits multiple levels of clustering, i.e., hierarchical models.

Issues to consider
• GEE: the sandwich estimate of variance requires a sufficiently large number of clusters (≥ 40); with missing data, the assumption is MCAR; modeling TVCs (time-varying covariates) is tricky.
• GLMM: valid inference only if the model assumptions are met; with missing data, the assumption is MAR; modeling TVCs (time-varying covariates) is tricky.
GEE: The Jimma Infant Data
• The following model is assumed for the mean structure: Yij ~ Bernoulli(πij), for subject i and measurement j (marginally; GEE involves no random effects).

• Exchangeable correlation (or CS):

    logit(πij) = β0 + β1Aij + β2Gi + β3GiAij

Where
Gi is a gender indicator.
Aij is the age of the ith infant at time j (also the time variable).
Result
GEE Model

GEE population-averaged model Number of obs = 6,113


Group variable: ind Number of groups = 1,000
Link: logit Obs per group:
Family: binomial min = 1
Correlation: exchangeable avg = 6.1
max = 7
Wald chi2(3) = 2.13
Scale parameter: 1 Prob > chi2 = 0.5462

BMIBIN Coef. Std. Err. z P>|z| [95% Conf. Interval]

1.sex .148298 .1376528 1.08 0.281 -.1214966 .4180926


age .0016801 .0124828 0.13 0.893 -.0227856 .0261459

sex#c.age
1 -.0180339 .0173981 -1.04 0.300 -.0521336 .0160658

_cons -1.872127 .1011225 -18.51 0.000 -2.070323 -1.67393


Stata code

xtgee BMIBIN i.sex c.age i.sex#c.age, i(ind) t(age) corr(exc) link(logit) family(bin)
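As a quick check on what the 'eform' option would report, the logit-scale coefficients from the GEE output above can be exponentiated by hand (a sketch using the printed estimates):

```python
import math

# Coefficients copied from the GEE output above (logit scale)
coef = {"sex": 0.148298, "age": 0.0016801, "sex_x_age": -0.0180339}

# Exponentiated coefficients are odds ratios (what 'eform' reports)
odds_ratios = {name: math.exp(b) for name, b in coef.items()}
print(odds_ratios)   # e.g. OR for sex is about 1.16 (boys vs girls at age 0)
```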
Options and interpretations
• The option 'robust' can be used to obtain the empirically corrected standard error estimates.

• The odds ratios can be requested with the option 'eform'.

• The correlation matrix can be requested with the command 'estat wcorrelation'.

• Run and compare the standard errors of the 'model-based' and the 'robust' versions, and interpret them.
Correlation matrix
estat wcorrelation

Estimated within-ind correlation matrix R:

| c1 c2 c3 c4 c5 c6 c7
------+-----------------------------------------------------------------------------------------------
r1 | 1
r2 | .1458121 1
r3 | .1458121 .1458121 1
r4 | .1458121 .1458121 .1458121 1
r5 | .1458121 .1458121 .1458121 .1458121 1
r6 | .1458121 .1458121 .1458121 .1458121 .1458121 1
r7 | .1458121 .1458121 .1458121 .1458121 .1458121 .1458121 1
GLMM: The Jimma Infant Data

• A random-effects model for non-Gaussian longitudinal data.

• The following model is assumed for the mean structure: Yij | bi ~ Bernoulli(πij), for subject i and measurement j.

• Gaussian-distributed random intercepts bi, i.e., bi ~ N(0, G), can be included to capture the correlation:

    logit(πij) = β0 + β1Aij + β2Gi + β3GiAij + bi
GLMM with random intercept
Integration points = 7 Wald chi2(3) = 2.14
Log likelihood = -2325.1686 Prob > chi2 = 0.5444

BMIBIN Coef. Std. Err. z P>|z| [95% Conf. Interval]

1.sex .1774603 .163003 1.09 0.276 -.1420197 .4969403


age .0022733 .0147053 0.15 0.877 -.0265485 .0310952

sex#c.age
1 -.021603 .0204887 -1.05 0.292 -.0617601 .0185542

_cons -2.336672 .1266153 -18.45 0.000 -2.584834 -2.088511

Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

ind: Identity
sd(_cons) 1.20833 .0752947 1.06941 1.365295

LR test vs. logistic model: chibar2(01) = 222.16 Prob >= chibar2 = 0.0000
GLMM with random intercept

xtmelogit BMIBIN i.sex c.age i.sex#c.age || ind:

Note that the odds ratio estimates can be obtained by including the option 'or'.
Mixed effect model for count data
 Let i = 1, …, n denote level-2 units (clusters or subjects)

 j = 1, …, ni denote level-1 units (subjects or multiple observations)

 yij is the value of the count outcome, the number of events (yij
can equal 0, 1, …)

 tij is the length of time during which the events are recorded
 can be equal (tij = t): all observations are based on the same period
of time, and the number of events within that same time period is
of interest
 can vary (tij): observations are based on varying periods of time, this
should be accounted for when modeling the number of events
within the varying time periods
Mixed-effects Poisson Regression Model
 The mixed-effects Poisson regression model indicates the expected number of counts in tij as:

    E(yij) = μij = tij exp(x'ijβ + z'ijbi)
    log(μij) = log(tij) + x'ijβ + z'ijbi
    log(μij/tij) = x'ijβ + z'ijbi

 The link function for Poisson regression is the log link.

 tij is sometimes called an offset variable.

 exp(β) = incidence or event rate ratio.

Mixed effect model for count data
Consider the FelegeHiwot referral hospital CD4 count data:
• Let Yij represent the CD4 count for patient i at the jth visit.

• Let tij be the time point (visit) at which Yij was measured, tij = 1, 2, . . . up to at most 8. And assume that the measurement time intervals were equal, 3 months (consider this as 1 term or period; otherwise consider ln(3) as an offset).

• For sex, females were coded as zero and males as 1.

• bi are subject-specific random intercepts assumed to have a Gaussian distribution with mean 0 and variance d.
Mixed effect Poisson model
First, a fixed effects and random intercept model
• Assuming that CD4 counts are generated from a Poisson-normal process with mean λij:

   ln(λij) = log(t) + β0 + β1sexi + β2visitij + bi0
   ln(λij) = (β0 + bi0) + β1sexi + β2visitij, taking t = 1

Mixed-effects Poisson regression            Number of obs    =   4,208
Group variable: ID                          Number of groups =     681

                                            Obs per group:
                                                         min =       1
                                                         avg =     6.2
                                                         max =       8

Integration points = 7                      Wald chi2(2)     = 8105.18
Log likelihood = -82951.749                 Prob > chi2      =  0.0000
Result for mixed effect with random intercept

CD4_        Coef.      Std. Err.      z     P>|z|    [95% Conf. Interval]
sex       -.2325727    .0389667    -5.97    0.000    -.308946   -.1561994
visit      .0350176    .0003898    89.82    0.000     .0342535   .0357817
_cons      5.700367    .0253205   225.13    0.000     5.650739   5.749994

Random-effects Parameters    Estimate    Std. Err.   [95% Conf. Interval]
ID: Identity
  sd(_cons)                  .5008095    .0139198    .474257    .5288485

LR test vs. Poisson model: chibar2(01) = 2.6e+05   Prob >= chibar2 = 0.0000
Stata code

xtmepoisson CD4_ age i.sex i.residence || ID:

• The option ‘irr’ can be used to obtain incidence rate ratios as follows:

xtmepoisson CD4_ age i.sex i.residence || ID:, irr

Mixed effect Poisson model
Fixed effects with random slope and random intercept
• Include a random slope, assuming subjects have different evolutions over time.

• Both bi0 and bi1 are jointly normally distributed and possibly correlated.

• The variance-covariance matrix can then be ‘unstructured’.

   ln(λij) = β0 + bi0 + β1sexi + β2visitij + bi1visitij
           = (β0 + bi0) + β1sexi + (β2 + bi1)visitij
Result for random slope and intercept

Integration points = 7                      Wald chi2(2) =  101.78
Log likelihood = -67131.312                 Prob > chi2  =  0.0000

CD4_        Coef.      Std. Err.      z     P>|z|    [95% Conf. Interval]
sex       -.2497908    .0485448    -5.15    0.000    -.3449368  -.1546448
visit      .0446168    .0051307     8.70    0.000     .0345608   .0546728
_cons      5.665371    .0314839   179.94    0.000     5.603663   5.727078

Random-effects Parameters    Estimate    Std. Err.   [95% Conf. Interval]
ID: Independent
  sd(visit)                  .1266529    .0042952    .1185082   .1353574
  sd(_cons)                  .621142     .0173675    .5880183   .6561316

LR test vs. Poisson model: chi2(2) = 2.9e+05   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.


Stata code

xtmepoisson CD4_ sex visit || ID: visit
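Note that the output shows ‘ID: Independent’, the default in which the random intercept and random slope are uncorrelated. To allow them to be correlated, as described on the previous slide, the unstructured covariance can be requested (a sketch):

   xtmepoisson CD4_ sex visit || ID: visit, covariance(unstructured)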


Model checking in linear mixed models
Model selection: likelihood
• When choosing between different models we want to be able to decide which model fits our data best. If the models compared are nested within each other, it is possible to do a likelihood ratio test, where the test statistic has an approximate chi-square distribution:

   2[log(L2) − log(L1)] ~ χ²(DF)

• where DF is the degrees of freedom, i.e. the difference in the number of parameters between the models, and L1 and L2 are the likelihoods of the first (smaller) and second (larger) model, respectively.
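In Stata, this test can be carried out with lrtest after storing the two nested fits; a sketch comparing the random-intercept and random-slope Poisson models above:

   xtmepoisson CD4_ sex visit || ID:
   estimates store m1
   xtmepoisson CD4_ sex visit || ID: visit
   estimates store m2
   lrtest m1 m2

When the extra parameters are variance components, the resulting p-value is conservative, as Stata’s own note in the output indicates.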
Model selection: likelihood
 If the two models compared are not nested within each other but contain the same number of parameters, they can be compared directly by looking at the log likelihood: the model with the larger likelihood value wins

 If the two models are not nested and contain different numbers of parameters, the likelihood cannot be used directly. It is still possible to compare such models with the information criteria described below

 The larger the likelihood, the better the model fits the data, and we use this when we compare different models

 Since we are interested in models that are as simple as possible, we also have to consider the number of parameters: a model with many parameters usually fits the data better than a model with fewer parameters
Model selection: Information criteria
• Information criteria can be computed in different ways; two of them are shown here: Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC). AIC and BIC are appropriate for maximum likelihood models.

• The idea behind both is to penalize models with many parameters in some way.
  – The AIC value is computed as below, where p is the number of parameters in the covariance structure. Formulated this way, a smaller value of AIC indicates a better model.

    AIC = −2LL + 2p,   where −2LL is the deviance

  – The BIC value is computed using the following formula, where p is the number of parameters in the covariance structure and N is the number of effective observations, which means the number of individuals. Like AIC, a smaller value of BIC is better than a larger one.

    BIC = −2LL + ln(N) · p
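For example, with hypothetical values −2LL = 1000, p = 5, and N = 200 individuals:

   AIC = 1000 + 2 × 5 = 1010
   BIC = 1000 + ln(200) × 5 ≈ 1000 + 5.298 × 5 ≈ 1026.5

so here BIC penalizes the five parameters more heavily than AIC does.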
Stata command to calculate AIC and BIC
 For maximum likelihood models, use the command estimates stats after storing the results, i.e.

   estimates stats model1_name model2_name, n(N)

where N is the number of observations used in the analysis; in longitudinal (correlated) data, select N carefully.

 AIC does not require N in its calculation, but BIC does, as can be seen from the formula

 If there is very strong within-subject correlation, use the number of individuals as N; otherwise, the number of observations

 However, Stata uses the number of observations by default, and this is preferable in general

 For non-maximum-likelihood estimates (e.g. regress), use the command:

   estat ic
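For example, assuming two fitted models have been stored as m1 and m2 (hypothetical names), AIC and BIC based on the 681 individuals in the CD4 data would be obtained with:

   estimates stats m1 m2, n(681)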
Many thanks
