Introduction to Survival Analysis
October 19, 2004
Brian F. Gage, MD, MSc with thanks to Bing Ho, MD, MPH Division of General Medical Sciences
Survival analysis compared w/ other regression techniques
What is survival analysis
When to use survival analysis
Univariate method: Kaplan-Meier curves
Multivariate methods:
• Cox-proportional hazards model
• Parametric models
Assessment of adequacy of analysis
Examples
Regression vs. Survival Analysis
Technique | Predictor Variables | Outcome Variable | Censoring permitted?
Linear Regression | Categorical or continuous | Continuous | No
Logistic Regression | Categorical or continuous | Binary (except in polytomous logistic regression) | No
Survival Analyses | Time and categorical or continuous | Binary | Yes
Regression vs. Survival Analysis
Technique | Mathematical model | Yields
Linear Regression | Y = B1X + B0 (linear) | Linear changes
Logistic Regression | Ln(P/(1-P)) = B1X + B0 (sigmoidal prob.) | Odds ratios
Survival Analyses | h(t) = h0(t)exp(B1X + B0) | Hazard rates
What is survival analysis?
Model time to failure or time to event
• Unlike linear regression, survival analysis has a dichotomous (binary) outcome
• Unlike logistic regression, survival analysis analyzes the time to an event
– Why is that important?
Able to account for censoring
Can compare survival between 2+ groups
Assess relationship between covariates and survival time
Importance of censored data
Why is censored data important?
What is the key assumption of censoring?
Types of censoring
Subject does not experience event of interest
Incomplete follow-up
• Lost to follow-up
• Withdraws from study
• Dies (if not being studied)
Left or right censored
When to use survival analysis
• Time to death or clinical endpoint
• Time in remission after treatment of disease
• Recidivism rate after addiction treatment
When one believes that 1+ explanatory variable(s) explain the differences in time to an event
Especially when follow-up is incomplete or variable
Relationship between survivor function and hazard function
Survivor function, S(t), defines the probability of surviving longer than time t
• this is what the Kaplan-Meier curves show.
• Hazard function is the negative derivative of the log of the survivor function over time: h(t) = -d ln S(t)/dt = -(dS(t)/dt)/S(t)
– instantaneous risk of event at time t (conditional failure rate)
Survivor and hazard functions can be converted into each other
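The conversion between survivor and hazard functions can be sketched numerically. This is a minimal illustration, not part of the original slides: it assumes the standard identities S(t) = exp(-∫ h(u) du) and h(t) = -d ln S(t)/dt, and the function names and the constant hazard of 0.1 per year are hypothetical.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-integral from 0 to t of h(u) du) by the trapezoidal rule."""
    if t <= 0:
        return 1.0
    dt = t / steps
    cumulative_hazard = 0.0
    for i in range(steps):
        a, b = i * dt, (i + 1) * dt
        cumulative_hazard += 0.5 * (hazard(a) + hazard(b)) * dt
    return math.exp(-cumulative_hazard)

def hazard_from_survival(survival, t, eps=1e-6):
    """Approximate h(t) = -d/dt ln S(t) with a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)

# Hypothetical constant hazard of 0.1 events per year (an assumption for illustration).
rate = 0.1
s5 = survival_from_hazard(lambda u: rate, 5.0)                 # close to exp(-0.5)
h5 = hazard_from_survival(lambda u: math.exp(-rate * u), 5.0)  # close to 0.1
print(round(s5, 4), round(h5, 4))
```

With a constant hazard the survivor function is exponential, so going from hazard to survival and back recovers the original rate.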
Approach to survival analysis
Like other statistics we have studied we can do any of the following w/ survival analysis:
• Descriptive statistics
• Univariate statistics
• Multivariate statistics
Average survival
• When can this be calculated?
• What test would you use to compare average survival between 2 cohorts?
Average hazard rate
• Total # of failures divided by observed survival time (units are therefore 1/t or 1/pt-yrs)
• An incidence rate, with higher values indicating more events per time
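The average hazard rate is simple enough to compute by hand. The sketch below is one possible way to do it; the cohort values and the function name are hypothetical and chosen only for illustration.

```python
def average_hazard_rate(follow_up_years, had_event):
    """Total failures divided by total observed time (events per patient-year)."""
    if len(follow_up_years) != len(had_event):
        raise ValueError("inputs must have the same length")
    total_time = sum(follow_up_years)
    if total_time <= 0:
        raise ValueError("total observed time must be positive")
    failures = sum(1 for event in had_event if event)
    return failures / total_time

# Hypothetical cohort of five patients; False marks a censored patient.
years = [2.0, 3.5, 1.0, 4.0, 1.5]
events = [True, False, True, False, True]
rate = average_hazard_rate(years, events)
print(rate)  # 3 events / 12 patient-years = 0.25
```

Censored patients contribute their observed time to the denominator but no failure to the numerator.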
Univariate method: Kaplan-Meier survival curves
Also known as product-limit formula
Accounts for censoring
Generates the characteristic "stair step" survival curves
Does not account for confounding or effect modification by other covariates
• When is that a problem?
• When is that OK?
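The product-limit formula can be sketched in a few lines. This is an illustrative implementation, not the one used for the slides; the data are hypothetical, and it assumes the usual convention that subjects censored at time t are still counted as at risk at t.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time using the product-limit formula."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every subject who leaves the risk set at time t (event or censored).
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths > 0:
            # Each event time multiplies S by the fraction of the risk set that survives it.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve

# Hypothetical data: time in days; True = event observed, False = censored.
times = [5, 8, 8, 12, 15, 20]
events = [True, True, False, True, False, True]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

The curve only steps down at event times; censored subjects shrink the risk set without producing a step, which is what gives the "stair step" shape.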
[Figure: Warf vs. ASA vs. No Rx, Age 76 Years and Older (N = 394). x-axis: Days Since Index Hospitalization (0 to 900)]
[Figure: Time to Cardiovascular Adverse Event in VIGOR Trial]
Comparing Kaplan-Meier curves
Log-rank test can be used to compare survival curves
• Less-commonly used test: Wilcoxon, which places greater weights on events near time 0.
Hypothesis test (test of significance)
• H0: the curves are statistically the same
• H1: the curves are statistically different
Compares observed to expected cell counts
Test statistic is compared to a χ2 distribution
Comparing multiple Kaplan-Meier curves
Multiple pair-wise comparisons produce cumulative Type I error – multiple comparison problem
Instead, compare all curves at once
• analogous to using ANOVA to compare > 2 cohorts
• Then use judicious pair-wise testing
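A two-group log-rank test can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard observed-minus-expected form with the hypergeometric variance and a 1-degree-of-freedom χ2 reference distribution, and the function name and data are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 degree of freedom."""
    subjects = [(t, e, 0) for t, e in zip(times_a, events_a)]
    subjects += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in subjects if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [s for s in subjects if s[0] >= t]
        n = len(at_risk)
        n_a = sum(1 for s in at_risk if s[2] == 0)
        d = sum(1 for s in subjects if s[0] == t and s[1])
        d_a = sum(1 for s in subjects if s[0] == t and s[1] and s[2] == 0)
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in group A if the curves are the same
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the test is undefined for these data")
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))  # upper tail of chi-square with 1 df
    return chi_square, p_value

# Hypothetical cohorts in which every subject has an observed event.
chi_sq, p = logrank_test([10, 12, 15, 20], [True] * 4, [2, 3, 5, 7], [True] * 4)
print(round(chi_sq, 2), round(p, 4))
```

For more than two groups the same observed-versus-expected idea extends to a single overall test, followed by selected pair-wise comparisons.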
Limit of Kaplan-Meier curves
What happens when you have several covariates that you believe contribute to survival?
Example
• Smoking, hypertension, hyperlipidemia, diabetes contribute to time to myocardial infarction
Can use stratified K-M curves – for 2 or maybe 3 covariates
Need another approach for many covariates: the multivariate Cox proportional hazards model is most common
• (think multivariate regression or logistic regression rather than a Student's t-test or the odds ratio from a 2 x 2 table)
Multivariate method: Cox proportional hazards
Needed to assess effect of multiple covariates on survival
Cox-proportional hazards is the most commonly used multivariate survival method
• Easy to implement in SPSS, Stata, or SAS
• Parametric approaches are an alternative, but they require stronger assumptions about h(t).
Cox proportional hazard model
Works with hazard model
Conveniently separates baseline hazard function from covariates
• Baseline hazard function over time – h(t) = h0(t)exp(B1X + B0)
• Covariates are time independent
• B1 is used to calculate the hazard ratio, which is similar to the relative risk
Nonparametric baseline hazard (the model as a whole is semiparametric)
Partial (quasi-) likelihood function
Cox proportional hazards model, continued
Can handle both continuous and categorical predictor variables (think: logistic, linear regression)
Without knowing baseline hazard h0(t), can still calculate coefficients for each covariate, and therefore hazard ratio
Assumes multiplicative risk: this is the proportional hazard assumption
• Can be compensated in part with interaction terms
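A short sketch can show why the baseline hazard does not need to be known to obtain the hazard ratio. The coefficient value and the two baseline hazard shapes below are hypothetical, and the intercept is folded into the baseline hazard for simplicity (an assumption of this sketch).

```python
import math

def cox_hazard(t, x, beta, baseline_hazard):
    """h(t | x) = h0(t) * exp(beta * x) for a single covariate x."""
    return baseline_hazard(t) * math.exp(beta * x)

beta = 0.5  # hypothetical coefficient for a binary covariate
baselines = {
    "constant": lambda t: 0.02,           # hypothetical flat baseline hazard
    "rising": lambda t: 0.01 * (1 + t),   # hypothetical increasing baseline hazard
}

ratios = []
for name, h0 in baselines.items():
    for t in (1, 5, 10):
        # Ratio of hazards for exposed (x = 1) versus unexposed (x = 0) at time t.
        ratio = cox_hazard(t, 1, beta, h0) / cox_hazard(t, 0, beta, h0)
        ratios.append(ratio)
        print(name, t, round(ratio, 4))
```

Every ratio equals exp(beta) regardless of the baseline shape or the time point, because h0(t) cancels; that cancellation is the proportional hazards assumption in action.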
Limitations of Cox PH model
Does not accommodate variables that change over time
• Luckily most variables (e.g. gender, ethnicity, or congenital condition) are constant
– If necessary, one can program time-dependent variables
– When might you want this?
Baseline hazard function, h0(t), is never specified
• You can estimate h0(t) accurately if you need to estimate S(t).
What is the hazard ratio and how do you calculate it from your parameters, β?
How do we estimate the relative risk from the hazard ratio (HR)?
How do you determine significance of the hazard ratios (HRs)?
• Confidence intervals
• Chi square test
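One common way to answer these questions is sketched below: HR = exp(β), a 95% confidence interval of exp(β ± 1.96·SE), and a Wald chi-square of (β/SE)² on 1 degree of freedom. The coefficient and standard error are hypothetical values, not taken from the tumor extent example, and the function name is an assumption for illustration.

```python
import math

def hazard_ratio_summary(beta, standard_error, z=1.96):
    """Return (HR, lower 95% limit, upper 95% limit, Wald chi-square, p-value)."""
    hazard_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    wald_chi_square = (beta / standard_error) ** 2
    p_value = math.erfc(math.sqrt(wald_chi_square / 2))  # chi-square upper tail, 1 df
    return hazard_ratio, lower, upper, wald_chi_square, p_value

# Hypothetical coefficient and standard error.
hr, lo, hi, chi_sq, p = hazard_ratio_summary(beta=0.405, standard_error=0.1)
print(round(hr, 3), round(lo, 3), round(hi, 3), round(chi_sq, 2), p < 0.05)
```

A confidence interval that excludes 1 and a small chi-square p-value point to the same conclusion, since both are built from β and its standard error.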
Assessing model adequacy
Multiplicative assumption
Proportional assumption: covariates are independent with respect to time and their hazard ratios are constant over time
Three general ways to examine model adequacy
• Graphically
• Mathematically
• Computationally: Time-dependent variables (extended model)
Model adequacy: graphical approaches
Several graphical approaches
• Do the survival curves intersect?
• Log-minus-log plots
• Observed vs.
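The idea behind a log-minus-log plot can be sketched numerically: if hazards are proportional, ln(-ln S(t)) for two groups differs by a constant, ln(HR), at every time point, so the plotted curves are parallel. The survivor functions below are hypothetical and built to satisfy proportional hazards with a hazard ratio of 2.

```python
import math

def log_minus_log(survival_probability):
    """ln(-ln S(t)); defined for 0 < S(t) < 1."""
    return math.log(-math.log(survival_probability))

hazard_ratio = 2.0  # hypothetical
gaps = []
for t in (1, 5, 10, 20):
    s0 = math.exp(-0.05 * t)   # hypothetical reference-group survivor function
    s1 = s0 ** hazard_ratio    # comparison group under proportional hazards
    gap = log_minus_log(s1) - log_minus_log(s0)
    gaps.append(gap)
    print(t, round(gap, 4))    # stays at ln(2), about 0.6931, for every t
```

With real data the same transformation is applied to the Kaplan-Meier estimates for each group; curves that converge, diverge, or cross suggest the proportional hazards assumption is violated.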
Testing model adequacy mathematically with a goodness-of-fit test
Uses a test of significance (hypothesis test)
One-degree of freedom chi-square distribution
p value for each coefficient
Does not discriminate how a coefficient might deviate from the PH assumption
Example: Tumor Extent
3000 patients derived from SEER cancer registry and Medicare billing information
Exploring the relationship between tumor extent and survival
Hypothesis is that more extensive tumor involvement is related to poorer survival
χ2 = 269.0973, p < .0001
Example: Tumor Extent
Tumor extent may not be the only covariate that affects survival
• Multiple medical comorbidities may be associated with poorer outcome
• Ethnic and gender differences may contribute
Cox proportional hazards model can quantify these relationships
Example: Tumor Extent
Test proportional hazards assumption with log-minus-log plot
Perform Cox PH regression
• Examine significant coefficients and corresponding hazard ratios
Example: Tumor Extent
The PHREG Procedure
Analysis of Maximum Likelihood Estimates
Columns: Variable, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits, Variable Label
Variables (1 DF each): age2 (70<age<=80), age3 (age>80), race2 (black), race3 (other), comorb1, comorb2, comorb3, DISTANT, REGIONAL, LIPORAL, PHARYNX, treat3 (both), treat2 (rad), treat0 (none)
[Table: numeric estimates not recoverable]
Summary
Survival analysis quantifies time to a single, dichotomous event
Handles censored data well
Survival and hazard can be mathematically converted to each other
Kaplan-Meier survival curves can be compared statistically and graphically
Cox proportional hazards models help distinguish individual contributions of covariates on survival, provided certain assumptions are met.