
Longitudinal Data Analysis

Random Effects Models for continuous outcomes

Giorgio Di Gessa
g.di-gessa@ucl.ac.uk

Learning objectives

• Linear regression for continuous outcomes
• Longitudinal data: ‘between-individual’ or ‘between-cluster’ variability
• Statistical models for longitudinal data
• Interpreting results from longitudinal models

Regression models: the basic idea

When two variables are analysed, we might be interested in summarising their relationship and in explaining/predicting one of the variables on the basis of information on the other.

• Dependent (outcome, y) variable: the variable whose variation we wish to explain or predict
• Independent (explanatory, x) variable: the variable used to explain/predict changes in the outcome

Simple linear regression


The simplest linear regression procedure is bivariate linear regression (or simple linear regression). This model explores the linear relationship between two variables, where the outcome is continuous.

Regression is a statistical model that captures the randomness of real-life processes (it is not a deterministic model). Variables are stochastically related: based on the value of the other variable(s) and their distributions, you can only say what the chances are that the outcome will take a particular value.

Graphical Representation

[Figure: scatter plot of blood pressure (y-axis) against weight in kg (x-axis)]
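Simple linear regression equation

Mathematically, a straight line is written as:

$$y = a + bx$$

where a = intercept, b = slope, y = dependent variable and x = independent variable. For observation i, the regression model adds a random error term:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $\beta_0$ plays the role of the intercept a, $\beta_1$ the slope b, and $\varepsilon_i$ is the error for observation i.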
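A minimal R sketch of this equation, assuming simulated data: the weight and blood-pressure values, and the true intercept (90) and slope (0.5), are made up for illustration. `lm()` estimates the intercept and slope, and the residuals are the observed minus the predicted values.

```r
# Simulate hypothetical data from Y_i = beta0 + beta1 * X_i + e_i
set.seed(1)
n <- 500
weight <- rnorm(n, mean = 75, sd = 12)        # X: weight in kg (made-up values)
bp <- 90 + 0.5 * weight + rnorm(n, sd = 8)    # Y: true intercept 90, true slope 0.5

# Ordinary least squares fit of the straight line
fit <- lm(bp ~ weight)
coef(fit)                                      # estimated intercept (a) and slope (b)

# Residuals e_i are observed minus predicted values: y_i - y_hat_i
e <- bp - fitted(fit)
```

With 500 simulated observations the estimates should land close to, but not exactly on, the true values.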

Regression equation - Graphical Representation

[Figure: fitted line ŷ = a + bx with intercept a and slope b; at a given x_i, the observed value y_i, the predicted value ŷ_i on the line, and the residual e_i = y_i − ŷ_i between them]

Assessing the model -- R²

The coefficient of determination (R²) is the proportion of the variation in y that is explained by the variation in x (i.e. by the regression model):

$$R^2 = \frac{SS_{reg}}{SS_{tot}} = 1 - \frac{SS_{err}}{SS_{tot}}$$

SSreg = “regression sum of squares”
SSerr = “error sum of squares”
SStot = “total sum of squares”
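$$SS_{err} = \sum_i (y_i - \hat{y}_i)^2, \qquad SS_{tot} = \sum_i (y_i - \bar{y})^2, \qquad SS_{reg} = \sum_i (\hat{y}_i - \bar{y})^2$$

[Figure: for one observation at x_i, the total deviation y_i − ȳ split into the residual y_i − ŷ_i and the explained part ŷ_i − ȳ]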
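A short R check of these definitions on made-up data (an illustrative assumption, not the course data): the three sums of squares are computed by hand, and both forms of the R² formula should agree with the value reported by `summary(lm)`. For least squares with an intercept, SSreg + SSerr = SStot.

```r
set.seed(2)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)            # hypothetical data
fit <- lm(y ~ x)
y_hat <- fitted(fit)

ss_err <- sum((y - y_hat)^2)           # error sum of squares
ss_tot <- sum((y - mean(y))^2)         # total sum of squares
ss_reg <- sum((y_hat - mean(y))^2)     # regression sum of squares

r2_a <- ss_reg / ss_tot
r2_b <- 1 - ss_err / ss_tot
c(r2_a, r2_b, summary(fit)$r.squared)  # three equal values
```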


Hierarchical / Clustered / Longitudinal data

• Longitudinal data:
– Data grouped in time: e.g. the same measures of cognitive function gathered from the same persons every year
• Observations from hierarchical data structures are correlated. Standard regression techniques do not take this intra-subject correlation of response measurements into account, which leads to invalid inferences (e.g. incorrect standard errors).
• Random effects/mixed models can estimate the associations between variables whilst taking into account the correlated nature of observations within the same group.
Hierarchical structure (time)

[Figure: Individual 1 and Individual 2, each measured on Occasion 1 and Occasion 2]

Serial measurement occasions (level 1: i) are clustered within individuals (level 2: j).

Correlated nature of repeated measurements

[Figure: lung function (y-axis) by subject id 1–17 (x-axis), measured on occasion 1 and occasion 2]
Source: Rabe-Hesketh and Skrondal (2010): pp.75-76.

[Figure: continuous outcome (y-axis) against exposure x (x-axis); each point is the ith measurement in the jth cluster]
Linear regression: no apparent relationship

[Figure: continuous outcome against exposure, with a single regression line fitted to all points]

What if multiple observations come from the same person?

[Figure: continuous outcome against exposure, with the observations grouped by person]

Exercise: Draw a line!

Mixed model regression: clear relationship

[Figure: continuous outcome against exposure, annotated with: the fixed-effects relationship (= population-average relationship); cluster-dependent intercepts (from a population with variance σu²); positive and negative residuals (variance σe²); correlated residuals; shrinkage]
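One way data like those in these figures can arise, sketched in R with lme4 under loudly stated assumptions: the simulation is hypothetical, and it deliberately makes persons with higher typical exposure have lower baselines. A single pooled OLS line then shows almost no relationship, while the random-intercept model recovers something close to the within-person slope of 1. Note that the random-intercept model assumes u_j is unrelated to x; the estimate still lands near 1 here mainly because the between-person variance is large relative to σe² and each person contributes several observations, so the estimator leans heavily on within-person comparisons.

```r
library(lme4)

set.seed(4)
J <- 200; n_per <- 5
id <- rep(seq_len(J), each = n_per)
m <- rnorm(J)                                          # each person's typical exposure
x <- m[id] + rnorm(J * n_per)                          # exposure varies between and within persons
u <- -2 * m + rnorm(J, sd = 0.5)                       # assumption: high-exposure persons have lower baselines
y <- 1 + 1 * x + u[id] + rnorm(J * n_per, sd = 0.5)    # true within-person slope = 1
d <- data.frame(id = factor(id), x = x, y = y)

ols <- lm(y ~ x, data = d)                             # one line through all points
ri  <- lmer(y ~ x + (1 | id), data = d, REML = FALSE)  # person-specific intercepts
c(pooled_ols = coef(ols)[["x"]], random_intercept = fixef(ri)[["x"]])
```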
Between- and within-group variation
There are two sources of variance in longitudinal/hierarchical data:
– Level 2 (j): between groups: differences between persons in a longitudinal study (inter-individual)
– Level 1 (i): within the same group: change in outcomes within the same person over time in a longitudinal study (intra-individual)

“Inter-individual differences are differences that are observed between people, whereas intra-individual differences are differences that are observed within the same person when assessed at different times.”

Null model for hierarchical data

Fixed and random parts:

$$y_{ij} = \beta_0 + u_j + e_{ij}$$

Outcome: $y_{ij}$ is the response at occasion i for person j.
Fixed/systematic part: $\beta_0$ is the overall mean.
Random part: $u_j$ is the group-level residual, $u_j \sim N(0, \sigma_u^2)$; $e_{ij}$ is the lowest-level residual, $e_{ij} \sim N(0, \sigma_e^2)$.
$\sigma_u^2$ and $\sigma_e^2$ are the variance components (group level j and lowest level i, respectively), which we estimate.
Number of level-2 units: more than 10, preferably more than 20, to ensure good estimation of $\sigma_u^2$.
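A sketch of the null model in R, assuming simulated data (the values β0 = 40, σu² = 49 and σe² = 25 are invented for illustration). The model is fitted by maximum likelihood with lme4's `lmer()`, and the variance components are read from `VarCorr()` rather than typed in by hand.

```r
library(lme4)

set.seed(5)
J <- 1000; n_per <- 4
id <- rep(seq_len(J), each = n_per)
u <- rnorm(J, sd = 7)                 # u_j ~ N(0, sigma_u^2 = 49)
e <- rnorm(J * n_per, sd = 5)         # e_ij ~ N(0, sigma_e^2 = 25)
y <- 40 + u[id] + e                   # beta0 = 40
d <- data.frame(id = factor(id), y = y)

null_fit <- lmer(y ~ 1 + (1 | id), data = d, REML = FALSE)
vc <- as.data.frame(VarCorr(null_fit))          # one row per variance component
sigma2_u <- vc$vcov[vc$grp == "id"]             # estimated level-2 variance
sigma2_e <- vc$vcov[vc$grp == "Residual"]       # estimated level-1 variance
c(beta0 = fixef(null_fit)[[1]], sigma2_u = sigma2_u, sigma2_e = sigma2_e,
  icc = sigma2_u / (sigma2_u + sigma2_e))
```

The true intraclass correlation in this simulation is 49 / 74 ≈ 0.66.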

Intra-cluster/class Correlation Coefficient (ICC)
Variance Partition Coefficient (VPC)

$$\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$$

$\sigma_u^2 + \sigma_e^2$ is the total variation: $\sigma_u^2$ is the variation at level 2 and $\sigma_e^2$ the variation at level 1.

Possible values for $\rho$: $0 \le \rho \le 1$.

Two different ways of looking at the same thing:
• Proportion of variance: “How much of the total variance is at level 2 (the group/cluster level)?”
• Correlation coefficient: with longitudinal data, $\rho$ is the correlation among observations within the same cluster.
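The correlation reading of ρ can be checked by simulation. A minimal sketch, assuming hypothetical standard deviations σu = 2 and σe = 1: each person's two occasions share the same u_j, so their correlation across persons should be close to 4 / (4 + 1) = 0.8.

```r
set.seed(3)
J <- 5000
sigma_u <- 2; sigma_e <- 1                # hypothetical SDs
u <- rnorm(J, sd = sigma_u)               # one u_j per person, shared by both occasions
occ1 <- 10 + u + rnorm(J, sd = sigma_e)   # occasion 1: beta0 + u_j + e_1j
occ2 <- 10 + u + rnorm(J, sd = sigma_e)   # occasion 2: beta0 + u_j + e_2j
rho <- sigma_u^2 / (sigma_u^2 + sigma_e^2)
c(theoretical = rho, empirical = cor(occ1, occ2))
```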
Variance Partition Coefficient /2

The VPC indicates how important group-level differences are (what proportion of the variance is at the group level?):
• VPC = 0 if there is no group effect ($\sigma_u^2 = 0$)
• VPC = 1 if there are no within-group differences ($\sigma_e^2 = 0$)

```r
summary(randint)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: QoLscore ~ 1 + (1 | idauniq)
##    Data: ELSA_long
##
##       AIC       BIC    logLik  deviance  df.resid
##  264345.3  264371.1 -132169.7  264339.3     40434
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -6.5733 -0.4958  0.0544  0.5340  4.3855
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  idauniq  (Intercept) 55.70    7.463
##  Residual             23.36    4.833
## Number of obs: 40437, groups:  idauniq, 10173
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 41.11007    0.07971   515.8

ICC <- (55.70)/(55.70 + 23.36)
ICC
## [1] 0.7045282
```

Add covariates: Random-Intercept model

$$y_{ij} = \underbrace{\beta_0 + \beta_1 x_{ij}}_{\text{fixed}} + \underbrace{u_j + e_{ij}}_{\text{random}}$$

Fixed part: $\beta_0$, $\beta_1$
Random part: $u_j$, $e_{ij}$

Covariates can be:
• Time-invariant (e.g. sex)
• Time-varying, including time itself to estimate the rate of change (Session 2)

• The mean response is a combination of characteristics shared by all persons (fixed effects) and subject-specific effects that are unique to a particular person (random effects).

[Figure (University of Bristol: Centre for Multilevel Modelling): person-specific parallel lines around the mean line, with $\beta_0$ = intercept and $\beta_1$ = slope of the mean line]

For a given sample, there are N lines, one per individual. The variance $\sigma_u^2$ represents the spread of these lines (Gibbons et al., 2010).

Fixed part: the slope ($\beta_1$) does not vary across groups (parallel lines).
Random part: the intercept varies across clusters (overall mean $\beta_0$ + cluster-specific deviation $u_j$). This recognises that clusters are heterogeneous: some will have outcome values above ($u_j > 0$) or below ($u_j < 0$) the overall mean (intercept $\beta_0$).
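The "N lines, one per individual" picture can be reproduced with lme4, sketched here on simulated data (the true values β0 = 30, β1 = 0.5, σu = 4 and σe = 3 are assumptions for illustration). `coef()` returns one intercept/slope pair per person: the intercepts are β0 + u_j and the slope column is identical for everyone, i.e. the lines are parallel.

```r
library(lme4)

# Hypothetical data: 300 persons, 5 occasions each
set.seed(7)
J <- 300; n_per <- 5
id <- rep(seq_len(J), each = n_per)
x <- rnorm(J * n_per, mean = 20, sd = 5)
u <- rnorm(J, sd = 4)                                   # person-level deviations u_j
y <- 30 + 0.5 * x + u[id] + rnorm(J * n_per, sd = 3)   # plus level-1 residuals e_ij
d <- data.frame(id = factor(id), x = x, y = y)

fit <- lmer(y ~ x + (1 | id), data = d, REML = FALSE)
fixef(fit)                      # the mean line: beta0 and beta1
lines_j <- coef(fit)$id         # one line per person: (beta0 + u_j, beta1)
head(lines_j)
```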

Random-Intercept only model: Interpretation

• Fixed part (population-average relationship): mean intercept ($\beta_0$); average slope of the X-Y association ($\beta_1$)
– Mean population X-Y regression line: set the level-1 and level-2 random effects to zero to obtain the population-average trajectory.
• Random part: group-specific variation around the mean intercept (summarised by $\sigma_u^2$; we can also obtain predicted values of $u_j$ for each group). This resolves non-independence by allowing each level-2 unit to have a different intercept/initial level.
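Setting the random effects to zero has a direct counterpart in lme4's `predict()`: `re.form = NA` gives the population-average line $\beta_0 + \beta_1 x$, while the default includes each person's $u_j$. A sketch on simulated data (all values hypothetical): the difference between the two predictions is exactly the predicted $u_j$ of the person each row belongs to.

```r
library(lme4)

set.seed(8)
J <- 200; n_per <- 4
id <- rep(seq_len(J), each = n_per)
x <- runif(J * n_per, 0, 10)
y <- 5 + 1.5 * x + rnorm(J, sd = 2)[id] + rnorm(J * n_per, sd = 1)   # hypothetical data
d <- data.frame(id = factor(id), x = x, y = y)

fit <- lmer(y ~ x + (1 | id), data = d, REML = FALSE)

pop   <- predict(fit, re.form = NA)      # random effects set to zero: beta0 + beta1 * x
pers  <- predict(fit)                    # person-specific: beta0 + u_j + beta1 * x
u_hat <- ranef(fit)$id[["(Intercept)"]]  # predicted u_j, one per person (level order)
```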
Random-Intercept and Random slope model

We can extend the model by allowing the gradient/slope, as well as the intercept, to vary across clusters.
Random intercept: heterogeneity ~ intercepts above/below the average ($\beta_0$)
Random slope: heterogeneity ~ slopes above/below the average ($\beta_1$)

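Random-Intercept and Random slope model /2

$$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}$$

where $\beta_{0j} = \beta_0 + u_{0j}$ and $\beta_{1j} = \beta_1 + u_{1j}$, hence:

$$y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})x_{ij} + e_{ij} = \underbrace{(\beta_0 + \beta_1 x_{ij})}_{\text{fixed}} + \underbrace{(u_{0j} + u_{1j} x_{ij} + e_{ij})}_{\text{random}}$$

Both intercept and slope are now group-dependent: intercept and slope = average ($\beta_0$, $\beta_1$) + group-specific deviation from the average ($u_{0j}$, $u_{1j}$).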
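A sketch of fitting this model with lme4, assuming simulated longitudinal data with time as the covariate (β0 = 50, β1 = −1, SDs of 5 and 1 for $u_{0j}$ and $u_{1j}$, their correlation −0.3 and σe = 2 are all invented). The term `(time | id)` requests a random intercept, a random slope and their covariance; the correlated deviations are generated by hand so that only base R and lme4 are needed.

```r
library(lme4)

set.seed(9)
J <- 300; T_occ <- 6
id <- rep(seq_len(J), each = T_occ)
time <- rep(0:(T_occ - 1), times = J)

# Correlated person-level deviations (u_0j, u_1j): SDs 5 and 1, correlation -0.3
z1 <- rnorm(J); z2 <- rnorm(J)
u0 <- 5 * z1
u1 <- 1 * (-0.3 * z1 + sqrt(1 - 0.3^2) * z2)

y <- (50 + u0[id]) + (-1 + u1[id]) * time + rnorm(J * T_occ, sd = 2)
d <- data.frame(id = factor(id), time = time, y = y)

fit <- lmer(y ~ time + (time | id), data = d, REML = FALSE)
fixef(fit)             # beta0, beta1
VarCorr(fit)           # SDs of u_0j and u_1j, their correlation, and sigma_e
head(coef(fit)$id)     # person-specific (beta0 + u_0j, beta1 + u_1j)
```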

$$y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})x_{ij} + e_{ij}$$

$\beta_0$ is the mean response when X = 0.
$\beta_1$ is the average slope of the X-Y association.
$\beta_0$ and $\beta_1$ are the fixed effects: they describe the population-average response of Y and how it changes over the values of the covariates.
• Each random effect $u_{0j}$ is the difference between the intercept for group j and the population-averaged intercept ($\beta_0$).
• Each random effect $u_{1j}$ is the difference between the slope for group j and the population-averaged slope ($\beta_1$).
Person-specific trajectories

[Figure: population-averaged response with the person-specific trajectories for j = 1 and j = 2]

• j = 1: Intercept: Y is higher (when X = 0) than the population average ($\beta_0$), and therefore $u_{01}$ is positive. Slope: steeper ($\beta_1 + u_{11}$) than the population average ($\beta_1$), and therefore $u_{11}$ is positive.
• j = 2: negative $u_{02}$ and $u_{12}$.
• Level-1 residuals $e_{ij}$ allow responses Y on any occasion to vary randomly above/below the group-specific trajectories.
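The arithmetic behind these trajectories, as a base-R sketch with hypothetical values chosen to match the description (β0 = 20, β1 = 2; person 1 has positive $u_{01}$ and $u_{11}$, person 2 negative ones). Each row is $(\beta_0 + u_{0j}) + (\beta_1 + u_{1j})x$; the population row sets both deviations to zero.

```r
beta0 <- 20; beta1 <- 2                      # fixed part (hypothetical values)
u0 <- c(person1 = 3,   person2 = -4)         # intercept deviations u_0j
u1 <- c(person1 = 0.5, person2 = -0.8)       # slope deviations u_1j
x <- 0:4

pop <- beta0 + beta1 * x                     # random effects set to zero
person <- t(sapply(names(u0), function(j) (beta0 + u0[[j]]) + (beta1 + u1[[j]]) * x))
traj <- rbind(population = pop, person)
traj
```

Person 1 starts higher (23 vs 20) and rises faster (2.5 vs 2 per unit of x); person 2 starts lower (16) and rises more slowly (1.2). Observed responses would scatter around each row by the level-1 residuals $e_{ij}$.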

Random-Intercept and Random-Slopes: Interpretation

Fixed part: mean intercept ($\beta_0$); average slope of the X-Y association ($\beta_1$)

Random part:
(1) group-specific variation around the mean intercept ($u_{0j}$); and
(2) group-specific variation around the mean slope of the X-Y association ($u_{1j}$)

Two different random effects / mixed models

Random-Intercept only:
• Random intercept unique to group ($\beta_0 + u_j$)
• Fixed slope: population average over time for the full sample; the effect of X is the same for all individuals ($\beta_1$)

Random-Intercept + Random-slope:
• Random intercept unique to group ($\beta_0 + u_{0j}$)
• Random slope: the effect of X differs across individuals; the slope is partitioned into (1) the population average ($\beta_1$), and (2) an individual-specific slope ($\beta_1 + u_{1j}$)
Model specification
• “Shall I add variable x to my random effects?”
• “Do you want x to be in the fixed part of the model or the random part?”
– What is your research question?
• Questions about means (variables)
• Questions about variability (levels: multilevel structure)
– Account for the statistical correlation in longitudinal data to ensure correct standard errors (partition the error into its level-1 and level-2 components).

http://www.bristol.ac.uk/cmm/learning/videos/random-intercepts.html

Model specification /2

About variables (means):
• What is the relationship between age and cognitive function (CF)?
• This is a question about means: what happens to the mean value of CF for a 1-unit change in age?
• This is answered using the fixed part of the model:
– Fixed effect of age
• Using mixed models, we can allow for the clustering in the data (e.g. obtain correct SEs) by specifying random intercepts, with $\sigma_u^2$ as a “nuisance parameter”*.

About variability:
• How much variation in the slope of the age-CF association is at the person level (level 2)?
• This is a question about variances.
• This can be answered using the random part of the model: we allow a random slope for age, in addition to the random intercepts (and the fixed effect of age).

*http://www.bristol.ac.uk/cmm/learning/videos/random-intercepts.html

Random Intercept model

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}$$

[Figure: model output with $\beta_0$, $\beta_1$, $\sigma_u^2$ and $\sigma_e^2$ highlighted]

Interpretation of fixed effects as in ordinary linear regression: on average, the X-Y association is represented by a straight line with intercept = 43.1 and slope = -0.75.
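Reading the fixed part as an ordinary regression line: the population-average prediction is 43.1 − 0.75 X. A short R illustration; the X values are arbitrary and may fall outside the range observed in the data.

```r
b0 <- 43.1                  # intercept reported on the slide
b1 <- -0.75                 # slope reported on the slide
x  <- c(0, 10, 20)          # hypothetical X values, for illustration only
b0 + b1 * x                 # population-average predicted Y
```

Each 10-unit increase in X lowers the predicted mean of Y by 7.5.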

Random Slope model

[Figure: model output with the following highlighted: group-level intercept variance ($\sigma_{u0}^2$), group-level slope variance ($\sigma_{u1}^2$), group-level intercept-slope covariance ($\sigma_{u01}$), and level-1 residual variance ($\sigma_e^2$)]

Model in which the association between X (CF) and Y (QoL) is allowed to vary across groups (level j).

Covariance between intercept and slope

[Figure: two panels, “Pattern of fanning out” and “Pattern of fanning in”]
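Why the covariance controls fanning: from the random-slope model, the spread of the person-specific lines at a given x is $\mathrm{Var}(u_{0j} + u_{1j}x) = \sigma_{u0}^2 + 2\sigma_{u01}x + \sigma_{u1}^2x^2$. A base-R sketch with hypothetical variance components: with a positive covariance the spread grows with x (fanning out); with a negative covariance it shrinks over the range shown (fanning in), reaching its minimum at $x = -\sigma_{u01}/\sigma_{u1}^2$ (3.6 here) before widening again.

```r
# Between-person variance of the person-specific lines at covariate value x
between_var <- function(x, s2_u0, s2_u1, s_u01) s2_u0 + 2 * s_u01 * x + s2_u1 * x^2

x <- 0:4
fan_out <- between_var(x, s2_u0 = 4, s2_u1 = 1,    s_u01 = 1)     # positive covariance
fan_in  <- between_var(x, s2_u0 = 4, s2_u1 = 0.25, s_u01 = -0.9)  # negative covariance
rbind(fan_out, fan_in)
```

Adding $\sigma_e^2$ to these values gives the total variance of $y_{ij}$ at each x.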
What is a better model? Do we need an extra random intercept or random slope?
• The context of the research and your research question(s) should guide the selection of the model.
• You can also use likelihood ratio tests (LRT) to inform model choice and assess whether the extra parameters in a larger, more complex model are needed:
– Fit the model with the random intercept/random slope
– Fit the model without the random intercept/random slope
– Compare the 2 models using an LR test
– The null hypothesis is equivalent to the hypothesis that (a) the variance of the random intercept ($\sigma_u^2$) = 0; or (b) the variance of the random slope = 0
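A base-R sketch of the LR test arithmetic, using hypothetical log-likelihoods rather than values from the slides. The statistic is twice the difference in maximised log-likelihoods (e.g. from `logLik()` on maximum-likelihood fits such as `lmer(..., REML = FALSE)`); lme4's `anova()` on two `lmer` fits reports the usual χ² version. Because the null hypothesis puts a variance on the boundary of its range (zero), that usual p-value tends to be conservative. A commonly used alternative, shown here as an extra, is a 50:50 mixture of χ²(df − 1) and χ²(df).

```r
lrt <- function(loglik_small, loglik_large, df) {
  stat <- 2 * (loglik_large - loglik_small)              # LR statistic
  p_naive <- pchisq(stat, df = df, lower.tail = FALSE)   # usual chi-square(df) reference
  # 50:50 mixture of chi-square(df - 1) and chi-square(df); chi-square(0) is a point mass at 0
  p_lower <- if (df == 1) as.numeric(stat <= 0) else pchisq(stat, df = df - 1, lower.tail = FALSE)
  p_mixture <- 0.5 * p_lower + 0.5 * p_naive
  c(stat = stat, p_naive = p_naive, p_mixture = p_mixture)
}

# Hypothetical log-likelihoods, not from the slides
lrt(-100, -97, df = 1)   # e.g. adding a random intercept (one variance)
lrt(-100, -97, df = 2)   # e.g. adding a random slope (variance + covariance)
```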

Summary
Special techniques are needed for longitudinal data to account for the correlation between observations within the same person.
Random effects/mixed models combine fixed and random parts:
• Fixed part: describes the population-average response and how it changes over the values of the covariates. Interpretation as in ordinary regression.
• Random part: involves decomposing the error variance:
– Between groups (level-2 residuals)
– Within groups (level-1 residuals)
– Two different types of random effects (level-2 residuals): random intercepts and random slopes

References
• Rabe-Hesketh & Skrondal (2012). Multilevel and Longitudinal Modeling Using Stata. Stata Press.
• Fitzmaurice & Ravichandran (2008). A primer in longitudinal data analysis. Circulation, 118(19).
• Gibbons et al. (2010). Advances in analysis of longitudinal data. Annual Review of Clinical Psychology, 6.
• Bates, D. (2010). Mixed modelling with R. http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf
• LEMMA: https://www.cmm.bris.ac.uk/lemma/ A comprehensive and free online course in multilevel modelling, including computing exercises in R and Stata.
