4 2022 Advanced Bio2 Introduction To GEE

Introduction to GEE
Generalised Estimating Equations (GEE)

- Extends the generalized linear model to accommodate correlated Ys
- Handles longitudinal, repeated measures or otherwise clustered or correlated
data, e.g. from complex surveys
- Often people would fit a linear model to such data and only then adjust the
standard errors to account for the clustering; the problem is that this post-hoc
approach does not affect the parameter estimates in the model.
GEEs are
- A different way to estimate regression coefficients (not based on likelihoods)
- Can account for clustered or correlated data
- Called population average or marginal models
- They are not really statistical models (“estimating equations”)
- New! First introduced in 1986 (Liang & Zeger)

- GEEs have consistent and asymptotically normal solutions, even with
misspecification of the correlation structure (provided the mean model is
correctly specified)
- Avoids need for multivariate distributions by only assuming a functional form
for the marginal distribution at each timepoint (i.e., y_ij)
- The covariance structure is treated as a nuisance
- Relies on the independence across subjects to estimate consistently the
variance of the regression coefficients (even when the assumed correlation
structure is incorrect)
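For reference, the properties listed above follow from the form of the estimating equation in Liang & Zeger (1986). The sketch below uses conventional notation that does not appear elsewhere in these slides and is introduced only for illustration: K independent subjects, Y_i the response vector for subject i, mu_i its mean, V_i the working covariance matrix and D_i the derivative of the mean with respect to the coefficients.

```latex
% Estimating equation solved for \beta, summing over K independent subjects
U(\beta) \;=\; \sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) \;=\; 0,
\qquad D_i = \frac{\partial \mu_i}{\partial \beta^{\top}}

% Robust ("sandwich") variance of the solution \hat\beta
\widehat{\mathrm{Var}}(\hat\beta) \;=\; B^{-1} M B^{-1}, \qquad
B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i, \qquad
M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} (Y_i - \hat\mu_i)(Y_i - \hat\mu_i)^{\top} V_i^{-1} D_i
```

The middle term M is built from each subject's observed residuals rather than from the assumed V_i, which is why the variance estimate can stay consistent when the working correlation is wrong.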
Subject-specific vs population-average
GEE estimates the population average effects. Consider:
- A: You are a doctor. You want to know how much a certain drug will reduce
your patient’s odds of getting a heart attack.
- B: You are a department of health official. You want to know how the number
of people who die of heart attacks would change if everyone at risk took a
certain drug.
A is asking for subject-specific information; B is asking for population-level
information. GEE can give estimates for B, but not A.
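The gap between A and B can be made concrete with a small numerical sketch. It assumes a random-intercept logistic model (each subject has their own baseline risk, normally distributed on the log-odds scale); the coefficient values and the function names are hypothetical and chosen only for illustration. Averaging the subject-level probabilities over the population gives the population-average log odds ratio, which is what a GEE targets.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n_grid=2001):
    """Average expit(eta + b) over b ~ N(0, sigma^2) using a simple grid."""
    lo, hi = -8.0 * sigma, 8.0 * sigma
    step = (hi - lo) / (n_grid - 1)
    num = 0.0
    den = 0.0
    for k in range(n_grid):
        b = lo + k * step
        w = math.exp(-0.5 * (b / sigma) ** 2)  # unnormalised normal density
        num += w * expit(eta + b)
        den += w
    return num / den


def marginal_log_or(beta0, beta1, sigma):
    """Population-average log odds ratio implied by a subject-specific beta1."""
    p0 = marginal_prob(beta0, sigma)          # untreated, averaged over subjects
    p1 = marginal_prob(beta0 + beta1, sigma)  # treated, averaged over subjects
    return logit(p1) - logit(p0)


# Hypothetical values: subject-specific log odds ratio of 1.0,
# between-subject standard deviation of 2.0 on the log-odds scale.
subject_specific = 1.0
population_average = marginal_log_or(beta0=-1.0, beta1=subject_specific, sigma=2.0)
print(subject_specific, round(population_average, 3))
```

Under these assumptions the population-average effect is closer to zero than the subject-specific one; the two coincide only when there is no between-subject variation.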
Parts of a GEE
g(Y) = 𝛃X + CORR
g(Y) - outcome, related to the systematic part through a link function g()
(strictly, the link is applied to the mean of Y, E(Y))
𝛃X - systematic part (linear predictor, includes coefficients, covariates etc)
CORR - placeholder term for the correlation / covariance matrix
Y - the responses, assumed to be clustered or correlated

Assumptions
We assume that the mean of Y (E(Y)) depends on the covariates via a link
function g()
AND
that the variance of Y is related (through a function) to the mean
To describe how the responses within a cluster are related to each other, we
also need to specify and estimate something called a ‘working correlation’
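Written out in conventional notation (a sketch; the symbols i for subject, j for occasion, phi for the scale parameter and alpha for the correlation parameters are introduced here and are not part of the original slides), the assumptions are:

```latex
% Mean model: link function applied to the mean
g(\mu_{ij}) = x_{ij}^{\top} \beta, \qquad \mu_{ij} = E(Y_{ij})

% Variance as a known function v(.) of the mean, up to a scale parameter \phi
\mathrm{Var}(Y_{ij}) = \phi \, v(\mu_{ij})

% Working covariance for subject i, built from the variances and the
% working correlation matrix R(\alpha)
V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2}, \qquad
A_i = \mathrm{diag}\bigl( v(\mu_{i1}), \dots, v(\mu_{i n_i}) \bigr)
```

For example, a binary outcome with a logit link has v(mu) = mu(1 - mu), and a continuous outcome with an identity link has v(mu) = 1.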

Assume a structure for the working correlation
Independent - Exchangeable - Autoregressive - Unstructured - Free specification
Describing correlation mathematically
Exchangeable: Responses within subjects are equally correlated
First-order autoregressive (AR1): Correlation among responses within subjects
decays exponentially with their separation in time
Unstructured: Correlation among responses within subjects completely unspecified
Independence: No correlation among responses within subjects
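These verbal definitions can be written as the assumed correlation between two different occasions j and k on the same subject i. This is a sketch in conventional notation; alpha denotes the correlation parameter(s) and is not a symbol used in the original slides.

```latex
\begin{aligned}
\text{Independence:}  \quad & \mathrm{Corr}(Y_{ij}, Y_{ik}) = 0 \\
\text{Exchangeable:}  \quad & \mathrm{Corr}(Y_{ij}, Y_{ik}) = \alpha \\
\text{AR1:}           \quad & \mathrm{Corr}(Y_{ij}, Y_{ik}) = \alpha^{|j-k|} \\
\text{Unstructured:}  \quad & \mathrm{Corr}(Y_{ij}, Y_{ik}) = \alpha_{jk}
\end{aligned}
\qquad (j \neq k)
```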
Independent: the covariance matrix is diagonal (only variances, no covariances;
the same as an ordinary GLM)
Exchangeable: All measurements on the same unit are equally correlated
(plausible for clustered data, maybe not so much for longitudinal)
Also called: spherical, compound symmetry
Autoregressive: Correlation depends on time or distance between
measurements (plausible for many spatial and temporal data)
Unstructured: No assumptions about correlation
Lots of parameters (one for every pair of occasions), often fails to converge
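To see what these structures look like as matrices, here is a minimal sketch that builds each working correlation matrix for n occasions per subject. The function names and the value 0.5 are illustrative assumptions and do not come from any GEE package.

```python
def independence(n):
    """Identity matrix: no correlation between occasions."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def exchangeable(n, alpha):
    """Every pair of distinct occasions shares the same correlation alpha."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]


def ar1(n, alpha):
    """Correlation alpha ** |j - k| decays with the gap between occasions."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]


def n_unstructured_params(n):
    """An unstructured matrix needs one parameter per pair of occasions."""
    return n * (n - 1) // 2


for row in ar1(4, 0.5):
    print(row)
print(n_unstructured_params(5))
```

With 5 occasions the unstructured form already needs 10 correlation parameters, compared with a single parameter for the exchangeable or AR1 forms, which is one reason it is harder to estimate.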

Examples
An example
● Longitudinal follow-up from 1995-2003 of lung function decline
● N = 322 males used for this example
● Variables of interest
○ FEV1
○ Age (at each observation)
○ Smoking status: current, former, nonsmoker
○ Age at baseline
○ Height at baseline
Data from 2
Code for fitting this
Stata:
. xtgee fev age smoking2 smoking3 agebase hbase, i(id) t(wave) corr(exchangeable)
R (geepack package; dat is a placeholder name for your data frame):
library(geepack)
geeglm(fev ~ age + smoking2 + smoking3 + agebase + hbase, id = id, waves = wave,
       corstr = "exchangeable", data = dat)
** Please note: the order of the observations within clusters and the types of
your variables usually matter more than you might wish. For example, geeglm
expects the rows for each cluster to be contiguous and in time order. Please
read the documentation carefully.
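As an illustration of the ordering point, the following sketch sorts long-format records by cluster and then by time, and checks that each cluster's rows are contiguous. The helper names and the record layout (dictionaries with "id" and "wave" keys, matching the variable names in the example above) are assumptions made for this sketch, not part of any GEE package.

```python
def sort_by_cluster_and_time(rows, id_key="id", time_key="wave"):
    """Return rows ordered by cluster id, then by time within cluster."""
    return sorted(rows, key=lambda r: (r[id_key], r[time_key]))


def clusters_are_contiguous(rows, id_key="id"):
    """True if all rows belonging to a cluster sit next to each other."""
    seen = set()
    previous = object()  # sentinel that equals no real id
    for r in rows:
        current = r[id_key]
        if current != previous:
            if current in seen:
                return False  # this id appeared earlier, then was interrupted
            seen.add(current)
            previous = current
    return True


# Hypothetical records with interleaved subjects
records = [
    {"id": 2, "wave": 1, "fev": 3.1},
    {"id": 1, "wave": 2, "fev": 2.8},
    {"id": 2, "wave": 2, "fev": 3.0},
    {"id": 1, "wave": 1, "fev": 2.9},
]
print(clusters_are_contiguous(records))
ordered = sort_by_cluster_and_time(records)
print(clusters_are_contiguous(ordered))
```

Running a check like this before fitting is a cheap way to catch data that would otherwise be split into the wrong clusters.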
“An expected difference in blood pressure comparing smokers to non-smokers of
the same age and weight”
Not
“An expected difference in blood pressure comparing a smoker to a non-smoker of
the same age and weight”
A real study! GEEs!
Design
● N = 375
● Visits at weeks 4, 12, 24, 36, 48
● Outcome 1: Retention at week 48 (Y/N)
● Outcome 2: Protective tenofovir concentrations at week 48 (Y/N)
Methods
“as tenofovir diphosphate concentrations for individuals aged 18-24 years were
assessed at all visits, to control for correlation among observations, generalised
estimating equation logistic regression models with autoregressive correlation
matrix and robust variance were used..”
Reading / links
Simple worked example in R:
https://rlbarter.github.io/Practical-Statistics/2017/05/10/generalized-estimating-equations-gee/
Nice paper - probably read the introduction only - the rest gets fairly technical:
James A. Hanley, Abdissa Negassa, Michael D. deB. Edwardes, Janet E. Forrester.
Statistical Analysis of Correlated Data Using Generalized Estimating Equations:
An Orientation. American Journal of Epidemiology, Volume 157, Issue 4,
15 February 2003, Pages 364–375.
https://academic.oup.com/aje/article/157/4/364/78911
