
Analysis of Longitudinal Data:

A STATA Oriented Approach

Lemma Dersh
Overview of Multilevel Data
• Clustered Data
– An outcome is measured once for each subject, and
subjects belong to (or are “nested” in) clusters, such as
families, schools, or neighborhoods

• Repeated measures data
– Multiple observations are made on the same person over time, area, space, or some other dimension

• Longitudinal data
– An outcome is measured for the same person repeatedly
over a period of time
Overview of Multilevel Data

• Longitudinal, clustered, and repeated measures data are more generally known as "multilevel" data.

• Level 1 is the lowest (most granular) level of the data, where the outcome variable of interest is measured.

• Levels 2, 3, … capture higher-level information.

Overview of Multilevel Data

 Outcome data from the same higher-level unit (cluster or subject) are correlated, and

 there are GLM-based methods that could be used to analyze such correlated data,

 however, with serious limitations.

Limitations of GLMs (rANOVA/rMANOVA)
 They assume categorical predictors
 They do not handle time-dependent covariates (predictors
measured over time)
 They assume everyone is measured at the same time (time is
categorical) and at equally spaced time intervals
 You don’t get parameter estimates (just p-values)
 Missing data must be imputed
 They require restrictive assumptions about the correlation
structure

Overview of Mixed Models
The general form of the linear mixed model (in matrix notation), assuming two levels (the jth observation from the ith individual or cluster), is:

Yij = Xijβ + Zijbi + εi

where Yij is an N x 1 column vector, the outcome variable; Xij is an N x p matrix of the p predictor variables; β is a p x 1 column vector of the fixed-effects regression coefficients (the "betas"); Zij is the N x q design matrix for the q random effects (the random complement to the fixed Xij); bi is a q x 1 vector of the random effects (the random complement to the fixed β); and εi is an N x 1 column vector of the residuals, the part of Yij that is not explained by the model, Xijβ + Zijbi.
Overview of Mixed Models

The dimensions line up as:

      Yij    =     Xij  β     +     Zij  bi    +    εi
    (N x 1)   (N x p)(p x 1)   (N x q)(q x 1)   (N x 1)

 'N' is the number of all observations (all repeated data) at the lowest level; the outcome vector, the rows of the covariate matrix, the rows of the random-effect design matrix, and the error vector all share this dimension

 'p' is the number of fixed-effect covariates, including the fixed intercept, which is also the number of fixed effects, the betas

 'q' is the number of clusters (or individuals from whom repeated data were taken), which, for a random-intercept model, is also the number of random intercepts, one per cluster or individual
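These dimensions can be verified with a small NumPy sketch that assembles Y = Xβ + Zb + ε for a toy random-intercept model (all values here are illustrative, not the course data):

```python
import numpy as np

rng = np.random.default_rng(0)

q = 3          # clusters
n_per = 4      # observations per cluster
N = q * n_per  # total observations at the lowest level
p = 2          # fixed effects: intercept + one covariate

X = np.column_stack([np.ones(N), rng.normal(size=N)])  # N x p fixed-effect design
beta = np.array([3.0, 0.5])                            # p x 1 fixed effects

# Z codes cluster membership: one column per cluster (random intercepts only)
cluster = np.repeat(np.arange(q), n_per)               # cluster id for each row
Z = np.eye(q)[cluster]                                 # N x q indicator matrix
b = rng.normal(scale=0.8, size=q)                      # q x 1 random intercepts
eps = rng.normal(scale=0.4, size=N)                    # N x 1 residuals

Y = X @ beta + Z @ b + eps                             # N x 1 outcome
print(X.shape, Z.shape, Y.shape)                       # (12, 2) (12, 3) (12,)
```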
Overview of Mixed Models
• If we are modeling only random intercepts, Zij is a special matrix that only codes which individual/cluster each observation (datum) belongs to.

• Each column is for one individual (cluster) and each row represents one observation (one row in the dataset).

• If an observation belongs to the individual/cluster in that column, the cell will have a 1, and 0 otherwise; Zij is therefore sparse (a matrix of mostly zeros).

• If we add a random slope, the number of rows in Zij remains the same, but the number of its columns, and the number of rows of the random-effect vector bi, will be doubled (i.e., 2q).
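The structure of Zij can be made concrete with a toy example (3 clusters, 2 observations each; illustrative only). Adding a random slope on a time covariate doubles the columns from q to 2q:

```python
import numpy as np

cluster = np.array([0, 0, 1, 1, 2, 2])   # 6 observations nested in q = 3 clusters
t = np.array([0., 1., 0., 1., 0., 1.])   # a time covariate

Z_int = np.eye(3)[cluster]               # N x q: random intercepts only (sparse 0/1)

# Adding a random slope on t: one intercept column and one slope column per cluster
Z_full = np.zeros((6, 6))                # N x 2q
Z_full[:, 0::2] = Z_int                  # intercept columns (still 0/1 indicators)
Z_full[:, 1::2] = Z_int * t[:, None]     # slope columns (indicator times t)

print(Z_int.shape, Z_full.shape)         # (6, 3) (6, 6)
```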
Overview of Mixed Models
 If we estimated bi, it would be a column vector, similar to β. However, in classical statistics, we do not actually estimate bi; instead, we nearly always assume that

    bi ~ N(0, G)

 where G is the variance-covariance matrix of the random effects.

 Because we directly estimate the fixed effects (fixed intercept and slope), the random-effect complements are modeled as deviations from the fixed effects, so they have mean zero.

 The random effects are just deviations around the values in β, which are the means. So what is left to estimate is the variance.

 If the model has only a random intercept, G is just a 1 x 1 matrix, the variance of the random intercept.
Overview of Mixed Models
• If we had both a random intercept and a random slope, then

    G = | σ²int         σint,slope |
        | σslope,int    σ²slope    |

 G is square, symmetric, and positive semi-definite, with redundant (mirrored) elements.

 For a q x q matrix, there are q(q+1)/2 unique elements.

Overview of Mixed Models
• To simplify computation (rather than modeling G directly), we estimate θ (e.g., a triangular Cholesky factorization G = LDL').

• θ is not always parameterized the same way, but you can generally think of it as representing the random effects.

• It is usually designed to contain non-redundant elements (unlike the variance-covariance matrix), and

• to be parameterized in a way that yields more stable estimates than variances (such as taking the natural logarithm to ensure that the variances are positive).
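One common choice is a log-Cholesky parameterization; the exact parameterization varies by software, so this NumPy sketch with an illustrative θ for a 2 x 2 G is only meant to show why the trick works:

```python
import numpy as np

# Unconstrained parameters theta for a 2 x 2 G: q(q+1)/2 = 3 of them.
# Log-scale diagonals guarantee positive variances after back-transformation.
theta = np.array([-0.5, 0.3, -1.0])      # [log L11, L21, log L22], illustrative values

L = np.array([[np.exp(theta[0]), 0.0],
              [theta[1],         np.exp(theta[2])]])
G = L @ L.T                              # symmetric and positive (semi-)definite by construction

print(G)
print(np.linalg.eigvalsh(G))             # all eigenvalues > 0 for any real theta
```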
Overview of Mixed Models
• G is some function of θ, so we get its estimate from the estimate of θ.

• Various parameterizations and constraints allow us to simplify the model.

• For example, by assuming that the random effects are independent, we can get the following:

    G = | σ²int    0       |
        | 0        σ²slope |
Overview of Mixed Models
 The final element in our model is the variance-covariance matrix of the residuals, ε, i.e., the conditional covariance matrix of Yij | Xijβ + Zijbi.

 The most common residual covariance structure is

    R = I σ²ε

 where I is the identity matrix and σ²ε is the residual variance.

 This structure assumes a homogeneous residual variance for all (conditional) observations and that they are (conditionally) independent.

 Other structures, such as compound symmetry or autoregressive, can also be assumed.
Overview of Mixed Models

• The final fixed elements are Y, X, Z, and ε.

• The final estimated elements are: β̂, θ̂, Ĝ, and R̂.

• The final model depends on the distribution assumed, but is generally of the form:

    Yij | Xijβ + Zijbi ~ F(0, R)


Part I
Gaussian Longitudinal Data
Structure of Longitudinal Data
What do the repeated data look like?

Wide form:
    id   score1   score2   score3   score4
    1    35       38       37       33
    2    44       42       40       43
    3    48       50       48       46
    .
    .

Long form (after reshape):
    id   sem   score
    1    1     35
    1    2     38
    1    3     37
    1    4     33
    2    1     44
    2    2     42
    2    3     40
    2    4     43
    3    1     48
    3    2     50
    3    3     48
    3    4     46
    .
    .

Data: students' math final exam scores (out of 50) in 4 consecutive semesters
Changing from one form to the other

From wide to long form (using Stata):
    reshape long var, i(varlist) j(new_varname)
var — the stub name of the variables holding the repeated data
varlist — a unique identifier of participants (it is required)
new_varname — a new variable whose unique values denote a sub-observation
(new_varname is optional unless we want a particular name; otherwise Stata names it _j)

Example: reshape long score, i(id) j(sem)

From long to wide form:
    reshape wide var, i(varlist) j(existing_varname), where j is an existing variable name

To go back to wide after using reshape long: reshape wide
To go back to long after using reshape wide: reshape long
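For readers working outside Stata, the same reshape can be mirrored in Python/pandas (a sketch using the toy score data above; `pd.wide_to_long` plays the role of `reshape long`):

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2, 3],
                     "score1": [35, 44, 48], "score2": [38, 42, 50],
                     "score3": [37, 40, 48], "score4": [33, 43, 46]})

# Equivalent of: reshape long score, i(id) j(sem)
long = (pd.wide_to_long(wide, stubnames="score", i="id", j="sem")
          .reset_index()
          .sort_values(["id", "sem"]))
print(long.head(4))

# Equivalent of: reshape wide score, i(id) j(sem)
back = long.pivot(index="id", columns="sem", values="score")
back.columns = [f"score{j}" for j in back.columns]
```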
Let us work on Jimma Infant Data
• A wide range of data was collected on the following characteristics:
    basic demographic information
    feeding practice
    anthropometric measurements, . . .
• Infants were followed for 12 months
• Measurements were taken from each child at seven time points, every two months
• Weight was one of the variables recorded at each visit

• Research question: How does weight change over time?


Jimma Infant data
Individual Profiles

[Figure: individual weight profiles (wt, kg) against AGE (0–12 months) for the first 50 children]

STATA CODE:
gen wt=weight/1000 if ind<=50
xtline wt, overlay t(age) tlabel(#6) i(ind) legend(off) title("Individual Profiles")
Remarks

• Subjects high (low) at baseline seem to remain high (low) over time

• Much variability within subjects

• Much variability between subjects

• The variability between subjects at higher ages is relatively larger than at baseline
Individual profile by sex

[Figure: individual weight profiles (wt, kg) against AGE (0–12 months), plotted separately for females and for males, for the first 50 subjects]

STATA CODE:
xtline wt if sex==0, overlay t(age) tlabel(#6) i(ind) legend(off) title("Individual Profiles for Females")
xtline wt if sex==1, overlay t(age) tlabel(#6) i(ind) legend(off) title("Individual Profiles for Males")
Longitudinal versus Cross-sectional Data
 Recall: longitudinal data refers to measurements made repeatedly over time to study how the subjects evolve over time.

 And the repeated measures taken from a subject tend to correlate with each other.

 Cross-sectional data refers to data collected at a specific point in time; a snapshot of the population.

 Observations from cross-sectional data are uncorrelated.


Cross-sectional data
• Suppose it is of interest to study the relation between some
response Y and age from a cross-sectional study that yields the
following data:

• The graph suggests a negative relation between Y and age.


• Exactly the same observations could also have been obtained in a
longitudinal study, with 2 measurements per subject as shown in
the following slide
Cross-sectional and longitudinal Trend

First case: the graph suggests a negative cross-sectional association but a positive longitudinal trend.

Second case: the graph now suggests that both the cross-sectional and the longitudinal trends are negative.
Dependency/Correlation
• Measurements from the same person are likely to be highly correlated.

• A correct analysis should account for this correlation.

• This is why the classical methods such as ANOVA, linear regression, ... fail for such data.

• Usually, correlation decreases as the time span between measurements increases.

• The simplest case of longitudinal data is paired data. The paired t-test accounts for the correlation by considering subject-specific differences.
Simple Methods

– Analysis at each time point separately

– Analysis of endpoints

– Analysis of increments

– Ignoring the dependence


Limitations of the methods mentioned before

 They do not consider overall differences

 They do not allow studying differences in evolution

 Problem of multiple testing

 Incorrect inference
Limitations of ignoring the dependence

If we treat observations as independent (i.e., ignore the correlation), then:
– in general, the estimation of the associations (regression coefficients) between the outcome and covariates is valid;
– however, the variability measures (e.g., the SEs from a classical regression analysis) are not right;
– they are sometimes smaller, sometimes bigger than the true variability; and
– therefore, the inference is not valid (e.g., results look more significant than they should if the SE is too small).
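A small simulation illustrates the point for the simplest summary, the grand mean (toy data with exchangeable within-cluster correlation, not the course data): the naive independent-data formula σ²/N understates the true sampling variance by the "design effect" 1 + (m − 1)ρ:

```python
import numpy as np

rng = np.random.default_rng(1)
q, m, rho, sigma2 = 500, 4, 0.5, 1.0   # clusters, cluster size, within-cluster corr, total variance

# Monte Carlo: actual variance of the grand mean when observations share a cluster effect
means = []
for _ in range(2000):
    u = rng.normal(scale=np.sqrt(rho * sigma2), size=(q, 1))        # shared cluster effect
    e = rng.normal(scale=np.sqrt((1 - rho) * sigma2), size=(q, m))  # independent noise
    means.append((u + e).mean())
emp_var = np.var(means)

naive_var = sigma2 / (q * m)                        # pretends all q*m observations are independent
true_var = sigma2 * (1 + (m - 1) * rho) / (q * m)   # inflated by the design effect 1 + (m-1)*rho
print(naive_var, true_var, emp_var)                 # the simulation tracks true_var, not naive_var
```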
Linear Mixed Models (LMMs)

• LMMs are also known as multilevel models, hierarchical models, random effects models, or mixed models.

• For a continuous outcome variable, Y, the relationship is linear in the parameters (the β's).

• For multilevel data, outcomes measured repeatedly on the same cluster/subject are assumed to be correlated, and/or the error variance is not constant.

• As a result, the GLM assumption below will be violated:

    εi ~ iid N(0, σ²)
Linear Mixed Models (LMMs)

• Used to analyze repeated or clustered continuous data.

• Not the only modeling option for multilevel data with a continuous outcome; another option is a marginal model.

• Composed of both fixed and random effects, hence "mixed".
Fixed Effects in a LMM
• Are usually the focus of the analysis and can be thought of as similar to the parameters in an ordinary regression model (the betas).

• Can be taken from any level of the data, and help us explain the variance in Y at each level of the data.

• For fixed effects, the only levels under consideration are those contained in the coding of the factor.

• Examples of fixed effects:
– age, sex, treatment, marital status, anxiety level, etc.
Random Effects in a LMM
• If the levels contained in the coding of a factor are a random sample from the total number of levels, there is a random effect.

• Are usually not the primary focus of the analysis, but allow us to account for correlation among repeated measurements, or among observations within the same level-2 or higher units (e.g., correlations among observations within the same school), or

• Allow us to partition the total variance of Y into levels that correspond to the multilevel structure of the data.
– Example: how much of the variation in students' math achievement scores can be attributed to student-level variability (level 2), class-level variability (level 3), and school-level variability (level 4)?

• Are summarized by their variances, and covariances if there is more than one random effect in the LMM.
Random Effects in a LMM
 Come in two flavors:
 Random intercepts
 Random slopes

 Are explicitly specified in the model. This is in contrast to the random errors, which are never explicitly specified when a model is fit, but always exist and whose variance is always estimated.

 LMMs account for the correlation in the data by including subject-specific random effects.

 These random effects are usually of a Gaussian type.

When to use Fixed versus Random Effects

Their relevance depends on the research question:

 Use a fixed effect
   if interested in the mean of an outcome for factors containing all levels
   Example: race, gender, age, etc.

 Use a random effect
   if interested in the variance of an outcome by factors sampled from a population of levels
   Example: facilities, nursing homes, time, etc.
General linear mixed-effects model
Notations:
 Let i = 1, …, N denote level-2 units (clusters or subjects)

 j = 1, …, ni denote level-1 units (subjects or multiple observations)

 yij is the value of the continuous outcome (yij can equal any legitimate value of the outcome variable)
General linear mixed-effects model
Considering the model Yij = Xijβ + Zijbi + εi, we have:
 Xij are covariates
– at level 1, level 2, or cross-level interactions
– can include polynomials, dummy variables, interactions, ...
 β are the fixed-effect regression coefficients for the covariates
 Zij are the random-effect variable(s)
– usually just an intercept for clustered data
– often an intercept and time for longitudinal data
 bi are the random effects, b1, …, bN ~ N(0, G)
– how cluster i influences the observations within the cluster
– how a subject starts and progresses across time
 ε1, …, εN are independent, with εi ~ N(0, Σi)
 Variance components: the elements in G and Σi
Structures of Repeated Effects

Variance Components:
    | σ1²   0     0     0   |
    | 0     σ2²   0     0   |
    | 0     0     σ3²   0   |
    | 0     0     0     σ4² |

Compound Symmetry:
    | σ² + σ1²   σ1²         σ1²         σ1²       |
    | σ1²        σ² + σ1²    σ1²         σ1²       |
    | σ1²        σ1²         σ² + σ1²    σ1²       |
    | σ1²        σ1²         σ1²         σ² + σ1²  |

AR(1):
         | 1    ρ    ρ²  |
    σ² · | ρ    1    ρ   |
         | ρ²   ρ    1   |

Unstructured:
    | σ1²        σ1σ2ρ12    σ1σ3ρ13 |
    | σ2σ1ρ21   σ2²         σ2σ3ρ23 |
    | σ3σ1ρ31   σ3σ2ρ32    σ3²      |
Assumptions
 A linear mixed model makes assumptions about:

    mean structure: (non-)linear, covariates, . . .

    variance function: constant, quadratic, . . .

    correlation structure: constant, serial, . . .

    subject-specific profiles: linear, quadratic, . . .


Exploratory Analysis
 It comprises techniques, usually graphical, to visualize patterns in the data.

 The following aspects of the data will be looked at:

    individual profiles
    average evolution
    correlation structure
Estimation

• Restricted Maximum Likelihood (REML)

• Maximum Likelihood (ML)

• ML is the default in STATA


Estimation
Recall,
    Yij = Xijβ + Zijbi + εij

    bi ~ N(0, G), εi ~ N(0, Σi)

    b1, … , bN, ε1, … , εN independent

 Marginally:
    Yij ~ N(Xijβ, Zij G Z'ij + Σi)
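For a random-intercept model, this marginal covariance Z G Z' + Σ can be computed directly; a NumPy sketch for one subject with ni = 4 (illustrative variance values) shows that it reproduces the compound-symmetry pattern, with intraclass correlation σ²b/(σ²b + σ²e):

```python
import numpy as np

n_i = 4                        # repeated measures on one subject
sigma_b2, sigma_e2 = 0.8, 0.4  # illustrative random-intercept and residual variances

Z = np.ones((n_i, 1))          # random intercept only: a column of ones
G = np.array([[sigma_b2]])
V = Z @ G @ Z.T + sigma_e2 * np.eye(n_i)   # marginal Var(Y_i)

# Every off-diagonal entry equals sigma_b2: the induced correlation is exchangeable
icc = sigma_b2 / (sigma_b2 + sigma_e2)
print(V)
print(icc)
```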
Estimation
• In REML, we transform Y so that the mean vanishes from the likelihood.

• Note that the likelihood at convergence of REML is NOT the likelihood for the original data Y,

• and hence cannot be considered for comparison of models.
Jimma Infant Data
• From the exploratory analysis:
– the mean structure seems quadratic over time
– variability between subjects at baseline
– variability between subjects in the way they evolve

• Hence a quadratic mean, with a random intercept and slope, is a good idea...
Model
Wij = β0 + β1Si + β2Aij + β3Aij² + β4SiAij + β5Aij²Si + b0i + b1iAij + εij
    = (β0 + b0i) + β1Si + (β2 + b1i)Aij + β3Aij² + β4SiAij + β5Aij²Si + εij

Where
Wij: weight (kg) of the ith infant at the jth visit
Aij: age of the ith infant at the jth visit
Si: sex of the ith infant (Female = 0, Male = 1)
b0i: the random intercept; b1i: the random slope
β0: the fixed intercept
Result (ML)

Iteration 0: log likelihood = -5623.9513
Iteration 1: log likelihood = -5623.9512

Computing standard errors:

Mixed-effects ML regression Number of obs = 6,113


Group variable: ind Number of groups = 1,000

Obs per group:


min = 1
avg = 6.1
max = 7

Wald chi2(5) = 21610.03


Log likelihood = -5623.9512 Prob > chi2 = 0.0000

wt Coef. Std. Err. z P>|z| [95% Conf. Interval]

1.sex .1032682 .0413105 2.50 0.012 .0223011 .1842354


age .79505 .0086158 92.28 0.000 .7781633 .8119367

c.age#c.age -.0346971 .0005926 -58.55 0.000 -.0358585 -.0335356

sex#c.age
1 .1185881 .0120292 9.86 0.000 .0950113 .142165

sex#c.age#c.age
1 -.0084046 .0008312 -10.11 0.000 -.0100337 -.0067755

_cons 3.352453 .0297906 112.53 0.000 3.294064 3.410841


Results (ML):
Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

ind: Unstructured
sd(age) .0973595 .0030093 .0916365 .1034399
sd(_cons) .5215715 .0160432 .4910565 .5539828
corr(age,_cons) .3098716 .0433613 .2225995 .3922169

sd(Residual) .4390212 .0048254 .4296646 .4485815

LR test vs. linear model: chi2(3) = 6048.29 Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

sd(b0i)=0.5216, sd(b1i)=0.0974, Corr(b0i, b1i)=0.3099;

sd(εij )=0.4390.
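Plugging the fixed-effect estimates from the ML output above into the quadratic mean model gives the fitted population-average growth curves; a small Python sketch (coefficients copied from the table, random effects set to their mean of zero):

```python
import numpy as np

# Fixed-effect estimates copied from the ML output above
b0, b_sex = 3.352453, 0.1032682
b_age, b_age2 = 0.79505, -0.0346971
b_sexage, b_sexage2 = 0.1185881, -0.0084046

def mean_weight(age, male):
    """Population-average weight (kg) at a given age (months); male = 0 or 1."""
    return (b0 + b_sex * male
            + (b_age + b_sexage * male) * age
            + (b_age2 + b_sexage2 * male) * age ** 2)

ages = np.array([0, 6, 12])
print(mean_weight(ages, male=0))   # fitted mean curve for girls
print(mean_weight(ages, male=1))   # fitted mean curve for boys (higher at all ages)
```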
Stata code
Maximum likelihood (default):
xtmixed wt i.sex age c.age#c.age i.sex#c.age c.age#c.age#i.sex || ind: age, cov(un)

Restricted maximum likelihood:
xtmixed wt i.sex age c.age#c.age i.sex#c.age c.age#c.age#i.sex || ind: age, cov(un) reml
Goodness of fit
• Ignore the correlation in the data and fit a linear regression.

• Observe the major impact on model fit.

• Formally, one can consider a likelihood ratio test for model comparison.
Interpretation
• All fixed effects are statistically significant.

• The random effects capture the correlation in the data.

• Males tend to be higher at baseline (β1 = 0.103), as well as in evolution over time (β4 = 0.119).
Part II
Non-Gaussian Longitudinal Data
The Jimma Infant Data
 It is of particular interest to identify the risk of overweight in early life through weight and height measurements.

 This helps in preventing overweight and obesity, to reduce the incidence of several adulthood diseases.

 One possible indicator of overweight is age- and sex-specific BMI, with a BMI over the 85th percentile referring to overweight.

Variable of interest
 The outcome of interest is BMI coded as 0 (normal or underweight) or 1 (overweight).

 The question of interest is whether the percentage of overweight changes over time (age) and differs by gender.
Generalized Linear Model (GLM)
• A random variable Y follows an exponential family distribution if the density is of the form

    f(y) = f(y | η, φ) = exp{ φ⁻¹ [yη − ψ(η)] + c(y, φ) }

• for a specific set of unknown parameters η and φ, and for known functions ψ(·) and c(·, ·).

• Often, η and φ are termed the 'natural parameter' (or 'canonical parameter') and the 'dispersion parameter', respectively.

• For this family, in general, the mean and variance are related.
Generalized Linear Model (GLM)
• For binary responses, the model of interest is:

    Y ~ Bernoulli(π)

• We want to explain variability between outcome values based on covariate values, with density function

    f(y | η, φ) = π^y (1 − π)^(1−y) = exp{ y ln[π/(1 − π)] + ln(1 − π) }

• The mean is given by μ = π and the variance by var(μ) = π(1 − π).
Generalized Linear Model (GLM)
• When collecting a set of data, let Y1, . . . , YN be a set of independent binary outcomes.

• Let x1, . . . , xN represent the corresponding p-dimensional vectors of covariate values.

• With a logit link function, ln[πi/(1 − πi)] = x'iβ is the logistic regression model, with β a vector of p fixed, unknown regression coefficients.
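The logit link and its inverse can be written down in a few lines (a generic sketch, not tied to the course data): the inverse link maps any linear predictor x'β back into a valid probability.

```python
import numpy as np

def logit(p):
    """Link function: probability -> linear-predictor scale."""
    return np.log(p / (1 - p))

def inv_logit(x):
    """Inverse link: any real x'beta -> a probability in (0, 1)."""
    return 1 / (1 + np.exp(-x))

xb = np.array([-2.0, 0.0, 1.5])        # illustrative linear predictors x_i' beta
pi = inv_logit(xb)
print(pi)                              # each value lies strictly in (0, 1)
print(np.allclose(logit(pi), xb))      # logit and inv_logit are inverses
```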
Generalized Linear Model (GLM)
• For count data, we assume that

    Yi ~ Poi(λi), with λi = exp(x'iβ)
Generalized Estimating Equations (GEE)
• A marginal-model approach for non-Gaussian longitudinal data, particularly good for discrete data.

• Unlike the GLMM, it is not a full model but rather an approach that models the dependence of the marginal mean on covariates.

• The repeated nature of the data is modeled through a 'working correlation'.

• Same form as a full likelihood procedure, but we restrict the specification to the first moment only.

• There are a model-based version and an empirically corrected version of the standard errors.

Generalized Estimating Equations (GEE)
• Correlation is modeled using, say, an exchangeable correlation structure.

• However, separate estimation and reporting of the correlation is not a main interest.
The model in GEE

    | wt1 |          | ht1 |
    | wt2 |          | ht2 |
    | ... |  = β0 + β1| ... | + β2(age) + CORR + Error
    | wt7 |          | ht7 |

β1 measures the linear association between height (ht) and weight (wt) across all 7 time periods. Vectors!

β2 measures the linear association between age (time) and weight.

CORR represents the correction for correlation between observations, i.e., the correlation among the 7 measurements.

A significant β1 (height effect) here would mean either that infants who have high height also have high weight (between-subjects effect), or that infants whose height changes have corresponding changes in weight (within-subjects effect), or both.
Effects on standard errors
In general, ignoring the correlation (dependency) of the observations will overestimate the standard errors of the time-dependent predictors (such as age and length/height of the child), since we haven't accounted for between-subject variability.

However, the standard errors of the time-independent predictors (such as sex) will be underestimated.

This is because the long form of the data makes it seem as if there is 7 times as much data as there really is (the cheating way to reduce a standard error)!
How does GEE work?

• First, a classical linear regression analysis is carried out, assuming the observations within subjects are independent.

• Then, residuals are calculated from the classical model (observed minus predicted) and a working correlation matrix is estimated from these residuals.

• Then the regression coefficients are refit, correcting for the correlation (an iterative process).

• The within-subject correlation structure is treated as a nuisance variable (i.e., as a covariate).
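The three steps above can be sketched in a toy NumPy implementation (identity link, exchangeable working correlation, simulated balanced data; a minimal illustration of the iteration, not a general GEE implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
q, m = 200, 4                                   # subjects, measurements per subject
x = np.tile(np.arange(m, dtype=float), q)       # a time covariate (same schedule for all)
X = np.column_stack([np.ones(q * m), x])
u = np.repeat(rng.normal(scale=1.0, size=q), m)                 # shared subject effect
y = 1.0 + 0.5 * x + u + rng.normal(scale=1.0, size=q * m)       # true beta = (1.0, 0.5)

beta = np.linalg.lstsq(X, y, rcond=None)[0]     # step 1: independence (OLS) fit
for _ in range(10):
    r = (y - X @ beta).reshape(q, m)            # step 2: residuals, subject by subject
    sigma2 = r.var()
    # moment estimate of the exchangeable correlation from within-subject residual pairs
    rho = (r.sum(1) ** 2 - (r ** 2).sum(1)).sum() / (q * m * (m - 1) * sigma2)
    R = sigma2 * (np.full((m, m), rho) + (1 - rho) * np.eye(m))  # working covariance
    Rinv = np.linalg.inv(R)
    # step 3: refit beta, solving sum_i Xi' R^-1 (yi - Xi beta) = 0
    Xi = X.reshape(q, m, -1)
    yi = y.reshape(q, m)
    A = np.einsum('ijk,jl,ilm->km', Xi, Rinv, Xi)
    c = np.einsum('ijk,jl,il->k', Xi, Rinv, yi)
    beta = np.linalg.solve(A, c)

print(beta, rho)   # beta near (1.0, 0.5); rho near 1/(1+1) = 0.5 for this simulation
```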
OLS regression variance-covariance matrix

Example: assume we have only three repeated measurements from each participant, at three time points (not 7 measurements):

          t1        t2        t3
    t1  | σ²y/t    0         0     |
    t2  | 0        σ²y/t     0     |
    t3  | 0        0         σ²y/t |

The correlation structure (pair-wise correlations between time points) is independence.

The variance of scores is homogeneous across time (the MSE in ordinary least squares regression).
GEE variance-covariance matrix

          t1        t2        t3
    t1  | σ²y/t    a         b     |
    t2  | a        σ²y/t     c     |
    t3  | b        c         σ²y/t |

The correlation structure must be specified.

The variance of scores is homogeneous across time (residual variance).
Choice of the correlation structure
within GEE
In GEE, the correction for within subject correlations is carried out
by assuming a priori a correlation structure for the repeated
measurements (although GEE is fairly robust against a wrong
choice of correlation matrix—particularly with large sample size)
Choices:
• Independent (classical regression analysis)
• Exchangeable (compound symmetry, as in rANOVA)
• Autoregressive
• M-dependent
• Unstructured (no specification, as in rMANOVA)

We are looking for the simplest structure (uses up the fewest degrees of freedom) that fits the data well!
Autoregressive

          t1      t2      t3      t4
    t1  | σ²     ρ       ρ²      ρ³  |
    t2  | ρ      σ²      ρ       ρ²  |
    t3  | ρ²     ρ       σ²      ρ   |
    t4  | ρ³     ρ²      ρ       σ²  |

Only 1 correlation parameter is estimated; correlation decreases for time periods farther apart.
M-dependent

          t1      t2      t3      t4
    t1  | σ²     ρ1      ρ2      0   |
    t2  | ρ1     σ²      ρ1      ρ2  |
    t3  | ρ2     ρ1      σ²      ρ1  |
    t4  | 0      ρ2      ρ1      σ²  |

Here, 2-dependent. Estimate 2 parameters (adjacent time periods have one correlation coefficient; time periods 2 units of time apart have a different correlation coefficient; others are uncorrelated).
Unstructured

          t1      t2      t3      t4
    t1  | σ²     ρ1      ρ2      ρ3  |
    t2  | ρ1     σ²      ρ4      ρ5  |
    t3  | ρ2     ρ4      σ²      ρ6  |
    t4  | ρ3     ρ5      ρ6      σ²  |

Estimate all correlations separately (here 6).
How GEE handles missing data

GEE uses the "all available pairs" method, in which all non-missing pairs of data are used in estimating the working correlation parameters.

Because the long form of the data is being used, you only lose the observations that a subject is missing, not all of that subject's measurements.
Generalized Linear Mixed Models (GLMM)

• For non-Gaussian data, the well-known generalized linear mixed model is commonly used.

• The linear predictor contains random effects in addition to the usual fixed effects.

• These random effects are usually assumed to come from a normal distribution.
Generalized Linear Mixed Models (GLMM)
• Let Yij be the jth outcome measured for subject i = 1, . . . , n, j = 1, . . . , ni, and group the ni measurements into a vector Yi.

• Conditionally upon q-dimensional random effects bi ~ N(0, G), the outcomes Yij are independent, with densities of the form

    fi(yij | bi, β, φ) = exp{ φ⁻¹ [yij ηij − ψ(ηij)] + c(yij, φ) }

with

    η[ψ'(ηij)] = η(μij) = η[E(Yij | bi, β)] = x'ij β + z'ij bi
Generalized Linear Mixed Models (GLMM)

• Here η(·) is a known link function, with xij and zij p-dimensional and q-dimensional vectors of known covariate values, respectively,

• with β a p-dimensional vector of unknown fixed regression coefficients, and φ a scale (over-dispersion) parameter.

• Finally, let f(bi | G) be the density of the N(0, G) distribution for the random effects bi.
GEE versus GLMM

Focus
• GEE: called a "marginal" mean regression model; the mean model is the primary focus; longitudinal or cluster correlation is a nuisance feature of the data.
• GLMM: called a "conditional" mean regression model; estimation of both the means and the cluster-specific random effects is of interest; has applications for person- or cluster-level prediction.

Technical details
• GEE: estimates are the solution to estimating equations; semi-parametric: only the mean and 'working' correlation models are specified.
• GLMM: estimates are obtained from a likelihood function; fully parametric specification of mean, random effects, and error terms.

Correlation
• GEE: requires choice of a "working" correlation model (independence, exchangeable, etc.); robust to misspecification of the correlation, assuming the sandwich estimate of the variance is used; traditionally accommodates only one level of clustering.
• GLMM: correlation is induced by person- or cluster-specific random effects (random intercepts, random slopes); permits multiple levels of clustering, i.e., hierarchical models.

Issues to consider
• GEE: the sandwich estimate of variance requires a sufficiently large number of clusters (≥ 40); with missing data, the assumption is MCAR; modeling TVCs (time-varying covariates) is tricky.
• GLMM: valid inference only if the model assumptions are met; with missing data, the assumption is MAR; modeling TVCs (time-varying covariates) is tricky.
GEE: The Jimma Infant Data
• The following model is assumed for the mean structure: Yij ~ Bernoulli(πij), for subject i and measurement j (marginally; GEE involves no random effects).

• Exchangeable correlation (or CS):

    logit(πij) = β0 + β1Aij + β2Gi + β3GiAij

Where
Gi is a gender indicator.
Aij is the age of the ith infant at time j (also the time variable).
Result
GEE Model

GEE population-averaged model Number of obs = 6,113


Group variable: ind Number of groups = 1,000
Link: logit Obs per group:
Family: binomial min = 1
Correlation: exchangeable avg = 6.1
max = 7
Wald chi2(3) = 2.13
Scale parameter: 1 Prob > chi2 = 0.5462

BMIBIN Coef. Std. Err. z P>|z| [95% Conf. Interval]

1.sex .148298 .1376528 1.08 0.281 -.1214966 .4180926


age .0016801 .0124828 0.13 0.893 -.0227856 .0261459

sex#c.age
1 -.0180339 .0173981 -1.04 0.300 -.0521336 .0160658

_cons -1.872127 .1011225 -18.51 0.000 -2.070323 -1.67393


Stata code

xtgee BMIBIN i.sex c.age i.sex#c.age, i(ind) t(age) corr(exc) link(logit) family(bin)
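As a quick check on what the 'eform' option would report, the logit-scale coefficients from the GEE output above can be exponentiated by hand (a sketch using the printed estimates):

```python
import math

# Coefficients copied from the GEE output above (logit scale)
coef = {"sex": 0.148298, "age": 0.0016801, "sex_x_age": -0.0180339}

# Exponentiated coefficients are odds ratios (what 'eform' reports)
odds_ratios = {name: math.exp(b) for name, b in coef.items()}
print(odds_ratios)   # e.g. OR for sex is about 1.16 (boys vs girls at age 0)
```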
Options and interpretations
• The option 'robust' can be used to obtain the empirically corrected standard error estimates.

• The odds ratios can be requested with the option 'eform'.

• The correlation matrix can be requested with the command 'estat wcorrelation'.

• Run and compare the standard errors of the 'model-based' and the 'robust' versions, and interpret them.
Correlation matrix
estat wcorrelation

Estimated within-ind correlation matrix R:

| c1 c2 c3 c4 c5 c6 c7
------+-----------------------------------------------------------------------------------------------
r1 | 1
r2 | .1458121 1
r3 | .1458121 .1458121 1
r4 | .1458121 .1458121 .1458121 1
r5 | .1458121 .1458121 .1458121 .1458121 1
r6 | .1458121 .1458121 .1458121 .1458121 .1458121 1
r7 | .1458121 .1458121 .1458121 .1458121 .1458121 .1458121 1
GLMM: The Jimma Infant Data

• A random-effects model for non-Gaussian longitudinal data.

• The following model is assumed for the mean structure: Yij | bi ~ Bernoulli(πij), for subject i and measurement j.

• Gaussian-distributed random intercepts bi, i.e., bi ~ N(0, G), can be included to capture the correlation:

    logit(πij) = β0 + β1Aij + β2Gi + β3GiAij + bi
GLMM with random intercept
Integration points = 7 Wald chi2(3) = 2.14
Log likelihood = -2325.1686 Prob > chi2 = 0.5444

BMIBIN Coef. Std. Err. z P>|z| [95% Conf. Interval]

1.sex .1774603 .163003 1.09 0.276 -.1420197 .4969403


age .0022733 .0147053 0.15 0.877 -.0265485 .0310952

sex#c.age
1 -.021603 .0204887 -1.05 0.292 -.0617601 .0185542

_cons -2.336672 .1266153 -18.45 0.000 -2.584834 -2.088511

Random-effects Parameters Estimate Std. Err. [95% Conf. Interval]

ind: Identity
sd(_cons) 1.20833 .0752947 1.06941 1.365295

LR test vs. logistic model: chibar2(01) = 222.16 Prob >= chibar2 = 0.0000
GLMM with random intercept

xtmelogit BMIBIN i.sex c.age i.sex#c.age || ind:

Note that the odds ratio estimates can be obtained by including the option 'or'.
Mixed effect model for count data
 Let i = 1, …, n denote level-2 units (clusters or subjects)

 j = 1, …, ni denote level-1 units (subjects or multiple observations)

 yij is the value of the count outcome, the number of events (yij
can equal 0, 1, …)

 tij is the length of time during which the events are recorded
 can be equal (tij = t): all observations are based on the same period
of time, and the number of events within that same time period is
of interest
 can vary (tij): observations are based on varying periods of time, this
should be accounted for when modeling the number of events
within the varying time periods
Mixed-effects Poisson Regression Model
 The mixed-effects Poisson regression model indicates the expected number of counts in tij as:

    E(yij) = μij = tij exp(x'ijβ + z'ijbi)
    log(μij) = log(tij) + x'ijβ + z'ijbi
    log(μij/tij) = x'ijβ + z'ijbi

 The link function for Poisson regression is the log link.

 tij is sometimes called an offset variable.

 exp(β) = incidence or event rate ratio.

Mixed effect model for count data
Consider the FelegeHiwot referral hospital CD4 count data:
• Let Yij represent the CD4 count for patient i at the jth visit.

• Let tij be the time point (visit) at which Yij was measured, tij = 1, 2, . . . up to at most 8. And assume that the measurement time intervals were equal, 3 months (consider this as 1 term or period; otherwise consider ln(3) as an offset).

• For sex, females were coded as zero and males as 1.

• bi are subject-specific random intercepts assumed to have a Gaussian distribution with mean 0 and variance d.
Mixed effect Poisson model
First, a fixed effects and random intercept model
• Assuming that CD4 counts are generated from a Poisson-normal process with mean λij:

   ln(λij) = log(t) + β0 + β1sexi + β2visitij + bi0
   ln(λij) = (β0 + bi0) + β1sexi + β2visitij, taking t = 1

Mixed-effects Poisson regression            Number of obs    =   4,208
Group variable: ID                          Number of groups =     681

                                            Obs per group:
                                                         min =       1
                                                         avg =     6.2
                                                         max =       8

Integration points = 7                      Wald chi2(2)     = 8105.18
Log likelihood = -82951.749                 Prob > chi2      =  0.0000
Result for mixed effect with random intercept

CD4_        Coef.      Std. Err.      z     P>|z|    [95% Conf. Interval]
sex       -.2325727    .0389667    -5.97    0.000    -.308946   -.1561994
visit      .0350176    .0003898    89.82    0.000     .0342535   .0357817
_cons      5.700367    .0253205   225.13    0.000     5.650739   5.749994

Random-effects Parameters    Estimate    Std. Err.   [95% Conf. Interval]
ID: Identity
  sd(_cons)                  .5008095    .0139198    .474257    .5288485

LR test vs. Poisson model: chibar2(01) = 2.6e+05   Prob >= chibar2 = 0.0000
Stata code

xtmepoisson CD4_ age i.sex i.residence || ID:

• The option ‘irr’ can be used to obtain incidence rate ratios as follows:

xtmepoisson CD4_ age i.sex i.residence || ID:, irr

Mixed effect Poisson model
Fixed effects with random slope and random intercept
• Include a random slope, assuming subjects have different evolutions over time.

• Both bi0 and bi1 are jointly normally distributed and possibly correlated.

• The variance-covariance matrix can then be ‘unstructured’.

   ln(λij) = β0 + bi0 + β1sexi + β2visitij + bi1visitij
           = (β0 + bi0) + β1sexi + (β2 + bi1)visitij
Result for random slope and intercept

Integration points = 7                      Wald chi2(2) =  101.78
Log likelihood = -67131.312                 Prob > chi2  =  0.0000

CD4_        Coef.      Std. Err.      z     P>|z|    [95% Conf. Interval]
sex       -.2497908    .0485448    -5.15    0.000    -.3449368  -.1546448
visit      .0446168    .0051307     8.70    0.000     .0345608   .0546728
_cons      5.665371    .0314839   179.94    0.000     5.603663   5.727078

Random-effects Parameters    Estimate    Std. Err.   [95% Conf. Interval]
ID: Independent
  sd(visit)                  .1266529    .0042952    .1185082   .1353574
  sd(_cons)                  .621142     .0173675    .5880183   .6561316

LR test vs. Poisson model: chi2(2) = 2.9e+05   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.


Stata code

xtmepoisson CD4_ sex visit || ID: visit
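Note that the output shows ‘ID: Independent’, the default in which the random intercept and random slope are uncorrelated. To allow them to be correlated, as described on the previous slide, the unstructured covariance can be requested (a sketch):

   xtmepoisson CD4_ sex visit || ID: visit, covariance(unstructured)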


Model checking in linear mixed models
Model selection: likelihood
• When choosing between different models we want to be able to decide which model fits our data best. If the models compared are nested within each other, it is possible to do a likelihood ratio test, where the test statistic has an approximate chi-square distribution:

   2[log(L2) − log(L1)] ~ χ²(DF)

• where DF is the degrees of freedom, i.e. the difference in the number of parameters between the models, and L1 and L2 are the likelihoods of the first (smaller) and second (larger) model, respectively.
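In Stata, this test can be carried out with lrtest after storing the two nested fits; a sketch comparing the random-intercept and random-slope Poisson models above:

   xtmepoisson CD4_ sex visit || ID:
   estimates store m1
   xtmepoisson CD4_ sex visit || ID: visit
   estimates store m2
   lrtest m1 m2

When the extra parameters are variance components, the resulting p-value is conservative, as Stata’s own note in the output indicates.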
Model selection: likelihood
 If the two models compared are not nested within each other but contain the same number of parameters, they can be compared directly by looking at the log likelihood: the model with the larger likelihood value wins

 If the two models are not nested and contain different numbers of parameters, the likelihood cannot be used directly. It is still possible to compare such models with the information criteria described below

 The larger the likelihood, the better the model fits the data, and we use this when we compare different models

 Since we are interested in models that are as simple as possible, we also have to consider the number of parameters: a model with many parameters usually fits the data better than a model with fewer parameters
Model selection: Information criteria
• Information criteria can be computed in different ways; two of them are shown here: Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC). AIC and BIC are appropriate for maximum likelihood models.

• The idea behind both is to penalize models with many parameters in some way.
  – The AIC value is computed as below, where p is the number of parameters in the covariance structure. Formulated this way, a smaller value of AIC indicates a better model.

    AIC = −2LL + 2p,   where −2LL is the deviance

  – The BIC value is computed using the following formula, where p is the number of parameters in the covariance structure and N is the number of effective observations, which means the number of individuals. Like AIC, a smaller value of BIC is better than a larger one.

    BIC = −2LL + ln(N) · p
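For example, with hypothetical values −2LL = 1000, p = 5, and N = 200 individuals:

   AIC = 1000 + 2 × 5 = 1010
   BIC = 1000 + ln(200) × 5 ≈ 1000 + 5.298 × 5 ≈ 1026.5

so here BIC penalizes the five parameters more heavily than AIC does.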
Stata command to calculate AIC and BIC
 For maximum likelihood models, use the command estimates stats after storing the results, i.e.

   estimates stats model1_name model2_name, n(N)

where N is the number of observations used in the analysis; in longitudinal (correlated) data, select N carefully.

 AIC does not require N in its calculation, but BIC does, as can be seen from the formula

 If there is very strong within-subject correlation, use the number of individuals as N; otherwise, the number of observations

 However, Stata uses the number of observations by default, and this is preferable in general

 For non-maximum-likelihood estimates (e.g. regress), use the command:

   estat ic
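For example, assuming two fitted models have been stored as m1 and m2 (hypothetical names), AIC and BIC based on the 681 individuals in the CD4 data would be obtained with:

   estimates stats m1 m2, n(681)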
Many thanks
