
Evaluation considerations for EHR-based phenotyping algorithms: A case study for Drug Induced Liver Injury

Casey Overby, Chunhua Weng, Krystl Haerian, Adler Perotte, Carol Friedman, George Hripcsak
Department of Biomedical Informatics, Columbia University

AMIA TBI paper presentation March 20th, 2013

Success in part due to GWAS consortia to obtain needed sample sizes


Published Genome-Wide Associations through 09/2011: 1,596 published GWA at p ≤ 5×10⁻⁸ for 249 traits

Source: NHGRI GWA Catalog, www.genome.gov/GWAStudies

Background and Motivation

There are added challenges to studying pharmacological traits


Drug response is complex
    [Figure: Risk factors in pathogenesis of drug induced liver injury (DILI). Source: Tujios & Fontana et al. Nat. Rev. Gastroenterol. Hepatol. 2011]

Sample sizes are small compared to typical association studies of common disease
    Adverse drug events
    Responder types


Consortium recruitment approaches


Recruit and phenotype participants prospectively
    Protocol-driven recruitment

Electronic health records (EHR) linked with DNA biorepositories
    EHR phenotyping


Successes developing EHR algorithms within eMERGE


Type II diabetes
Peripheral arterial disease
Atrial fibrillation
Crohn disease
Multiple sclerosis
Rheumatoid arthritis

Source: www.phekb.org

High PPV

(Kho et al. JAMIA 2012; Kullo et al. JAMIA 2010; Ritchie et al. AJHG 2010; Denny et al. Circulation 2010; Peissig et al. JAMIA 2012)


Unique characteristics of DILI


Rare condition of low prevalence
Complex condition
    Drug is causal agent of liver injury
    Different drugs can cause DILI
    Pattern of liver injury varies between drugs
    Pattern of liver injury based on liver enzyme elevations
    No tests to confirm drug causality (some assessment tools exist)
High PPV may be challenging


Why study?
DILI accounts for up to 15% of acute liver failure cases in the U.S., of which 75% require liver transplant or lead to death
Most frequent reason for withdrawal of approved drugs from the market
Lack of understanding of underlying mechanisms of DILI
Computerized approaches can reduce the burden of traditional approaches to screening for rare conditions (Jha AK et al. JAMIA 1998; Thadani SR et al. JAMIA 2009)


Overview of EHR phenotyping process

1. Case definition
    e.g., liver injury
2. (Re-)Design EHR Phenotyping algorithm
    e.g., ICD-9 codes for acute liver injury, Decreased liver function lab
3. Implement EHR Phenotyping algorithm
4. Evaluate EHR Phenotyping algorithm
    If algorithm needs improvement: return to (Re-)Design EHR Phenotyping algorithm
    If algorithm is sufficient to be useful: Disseminate EHR Phenotyping algorithm


Methods and Results

Overview of methods to develop & evaluate initial algorithm

1. DILI Case definition (iSAEC)
2. Design EHR Phenotyping algorithm
3. Implement EHR Phenotyping algorithm
4. Evaluate EHR Phenotyping algorithm
5. Disseminate EHR Phenotyping algorithm

Report lessons learned
Develop an evaluation framework

Lessons inform evaluator approach and algorithm design changes



Lessons learned

Initial DILI EHR phenotyping algorithm

[Flowchart: from the Clinical data warehouse to Patients meeting drug induced liver injury criteria. Each decision node has yes/no branches, and patients who do not meet the criteria are excluded.]

Decision nodes
    A1. Diagnosed with liver injury?
    A2. Exposure to drug?
    B. New liver injury?
    C1. ALP >= 2x ULN
    C2. ALT >= 5x ULN
    C3. ALT >= 3x ULN
    C4. Bilirubin >= 2x ULN
    D. Other diagnoses?
Other box: Consider chronicity

Patient counts shown along the flowchart, in order: 18,423; 13,972; 2,375; 1,264; 560 (patients meeting drug induced liver injury criteria)

DILI case definition
1. Liver injury diagnosis (A1)
    a. Acute liver injury (C1-C4)
    b. New liver injury (B)
2. Caused by a drug
    a. New drug (A2)
    b. Not by another disease (D)

Ref: Aithal, G.P., et al. Case Definition and Phenotype Standardization in Drug-induced Liver Injury. Clin Pharmacol Ther. 2011 Jun; 89(6): 806-15
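The lab criteria C1-C4 can be sketched in code. This is a minimal illustration, not the implemented algorithm: the slide does not show how the four checks are combined, so the rule below (C1, or C2, or C3 together with C4) is an assumption modeled on the structure of the cited case definition. The class `LiverLabs` and the function `meets_acute_liver_injury` are hypothetical names, and lab values are assumed to be pre-computed as multiples of the upper limit of normal (ULN).

```python
from dataclasses import dataclass


@dataclass
class LiverLabs:
    """Lab values expressed as multiples of the upper limit of normal (ULN).

    Hypothetical container; the field names are illustrative only.
    """
    alp_x_uln: float
    alt_x_uln: float
    bilirubin_x_uln: float


def meets_acute_liver_injury(labs: LiverLabs) -> bool:
    """Apply the thresholds labeled C1-C4 on the flowchart."""
    c1 = labs.alp_x_uln >= 2.0        # C1. ALP >= 2x ULN
    c2 = labs.alt_x_uln >= 5.0        # C2. ALT >= 5x ULN
    c3 = labs.alt_x_uln >= 3.0        # C3. ALT >= 3x ULN
    c4 = labs.bilirubin_x_uln >= 2.0  # C4. Bilirubin >= 2x ULN
    # ASSUMPTION: the combination logic is not recoverable from the slide.
    # Here a patient passes on C1 alone, on C2 alone, or on C3 and C4 together.
    return c1 or c2 or (c3 and c4)


# Made-up values for illustration only.
isolated_alp = LiverLabs(alp_x_uln=2.4, alt_x_uln=1.0, bilirubin_x_uln=1.0)
alt_with_bilirubin = LiverLabs(alp_x_uln=1.0, alt_x_uln=3.5, bilirubin_x_uln=2.5)
alt_without_bilirubin = LiverLabs(alp_x_uln=1.0, alt_x_uln=3.5, bilirubin_x_uln=1.0)
```

Keeping each threshold in its own named variable makes it straightforward to change the combination rule once the actual branching of the flowchart is known.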


Estimated positive predictive value


Initial algorithm results: 100 patients randomly selected for manual review from 560 patients

[Figure: the selected patients shown as five groups of 20 assigned among Reviewers 1 to 4]

TP: 27
FP: 42
NA: 30
PPV: TP/(TP+FP) = 27/(27+42) = 39%

Preliminary kappa coefficient: 0.50 (moderate agreement)

Interpretation of PPV is unclear given moderate agreement among reviewers
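The two quantities on this slide can be reproduced with a short sketch. The PPV uses the counts reported above. The slide does not say which kappa statistic was computed for the four reviewers, so the two-rater Cohen's kappa below is an assumption chosen for illustration, and the reviewer labels are made-up data, not the study's ratings.

```python
from collections import Counter
from typing import Sequence


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP). NA cases are left out."""
    return tp / (tp + fp)


def cohen_kappa(ratings_a: Sequence[str], ratings_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who labeled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in labels) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Counts reported on the slide.
print(f"PPV = {ppv(27, 42):.0%}")

# Made-up labels for two hypothetical reviewers over eight charts.
reviewer_a = ["TP", "TP", "FP", "FP", "TP", "FP", "TP", "FP"]
reviewer_b = ["TP", "FP", "FP", "FP", "TP", "TP", "TP", "FP"]
print(f"kappa = {cohen_kappa(reviewer_a, reviewer_b):.2f}")
```

With more than two reviewers rating the same charts, a multi-rater statistic would be needed instead; the function here only covers the pairwise case.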


Included measurement and demonstration studies


Measurement study (evaluator effectiveness)
    Goal: to determine the extent and nature of the errors with which a measurement is made using a specific instrument.
Demonstration study (algorithm performance)
    Goal: to establish a relation, which may be associational or causal, between a set of measured variables.

Definitions from: Evaluation methods in medical informatics, Friedman & Wyatt 2006


Included quantitative and qualitative assessment


Quantitative data
    Inter-rater reliability assessment
    PPV
Qualitative data
    Perceptions of evaluation approach effectiveness (e.g., review tool, artifacts reviewed)
    Perceptions of benefit of results (e.g., correct for the case definition?)


An evaluation framework and results


Measurement study (evaluator effectiveness)
    Quantitative results
        Kappa coefficient: 0.50
    Qualitative results
        Perceptions of evaluation approach effectiveness:
            Differences between evaluation platforms
            Visualizing lab values
            Availability of notes
            Discharge summary vs. other notes

Demonstration study (algorithm performance)
    Quantitative results
        TP: 27, FP: 42, NA: 30
        PPV: TP/(TP+FP) = 39%
    Qualitative results
        Perceptions of benefit of results (themes in FPs):
            Babies
            Patients who died
            Overdose patients
            Patients who had a liver transplant


Lesson learned: What's correct for the algorithm may not be correct for the case definition

Are we measuring what we mean to measure?
    Case definition: liver injury due to medication, not by another disease
    Many FPs were transplant patients
    Patients correct for the algorithm, but liver enzymes elevated due to procedure
    Revision: exclude transplant patients


Improved algorithm design given themes in FPs


Added exclusions
    Babies
    Overdose patients
    Patients who died
    Transplant patients

A collaborative approach to develop an EHR phenotyping algorithm for DILI in preparation
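The added exclusions could be applied as a filter over candidate patients. The sketch below is illustrative only: the `Patient` fields, the function names, and the one-year age cutoff for "babies" are all hypothetical, since the slides do not state how each exclusion was identified in the EHR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    # Hypothetical fields; real EHR data would need its own mapping to these flags.
    patient_id: str
    age_years: float
    overdose: bool
    deceased: bool
    liver_transplant: bool


def exclusion_reason(p: Patient, baby_max_age_years: float = 1.0) -> Optional[str]:
    """Return the first matching exclusion theme, or None if the patient is kept.

    ASSUMPTION: the age cutoff for "babies" is a placeholder, not a value from the slides.
    """
    if p.age_years < baby_max_age_years:
        return "baby"
    if p.overdose:
        return "overdose"
    if p.deceased:
        return "died"
    if p.liver_transplant:
        return "transplant"
    return None


def apply_exclusions(patients: List[Patient]) -> List[Patient]:
    """Keep only patients who match none of the exclusion themes."""
    return [p for p in patients if exclusion_reason(p) is None]


# Made-up candidates for illustration only.
candidates = [
    Patient("p1", 0.5, False, False, False),
    Patient("p2", 54.0, True, False, False),
    Patient("p3", 61.0, False, False, True),
    Patient("p4", 47.0, False, False, False),
]
kept = apply_exclusions(candidates)
```

Returning a reason string rather than a bare boolean makes it possible to tally how many candidates each exclusion removes, which helps when checking a revised algorithm against the FP themes.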


Lesson learned: Evaluator effectiveness influences ability to draw appropriate inferences about algorithm performance

How does our evaluation approach influence performance estimations?
    Moderate agreement among algorithm reviewers, so interpretation of PPV unclear
    Revision: Improve evaluator approach


Improved evaluator approach

1. First-pass review of temporal relationship
    Assign preliminary TP, FP, unknown status
2. Perform chart review
    Confirm suspected TPs
    Assign TP/FP if unknown status in first pass review

Assign TP/FP status by consensus among 4 reviewers

[Figure: three time-series panels of lab values (axis labels "aph value", "alt value", "bilirubinIV value"), dates 20120315 to 20120719]

A collaborative approach to develop an EHR phenotyping algorithm for DILI in preparation
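The two-step review can be sketched as a small status-resolution function. This is an illustration under stated assumptions: the names `Status` and `final_status` are hypothetical, and the rule that a first-pass FP is final without chart review is inferred from the step descriptions rather than stated on the slide. How the four reviewers reach consensus is not described, so it is not modeled here.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    TP = "TP"
    FP = "FP"
    UNKNOWN = "unknown"


def final_status(first_pass: Status, chart_review: Optional[Status]) -> Status:
    """Combine the first-pass status with the chart-review result.

    ASSUMPTION: first-pass FPs are final; suspected TPs and unknowns go to
    chart review, which must return TP or FP.
    """
    if first_pass is Status.FP:
        return Status.FP
    if chart_review is None or chart_review is Status.UNKNOWN:
        raise ValueError("chart review must assign TP or FP")
    # Chart review either confirms or overturns a suspected TP,
    # and resolves cases left unknown after the first pass.
    return chart_review


# Made-up cases for illustration only.
confirmed = final_status(Status.TP, Status.TP)
overturned = final_status(Status.TP, Status.FP)
resolved = final_status(Status.UNKNOWN, Status.TP)
screened_out = final_status(Status.FP, None)
```

Raising an error when chart review leaves a case unresolved keeps every reviewed patient in exactly one of the two categories that enter the PPV calculation.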

Summary of findings
Lessons learned from applying our evaluation framework
    What's correct for the algorithm may not be correct for the case definition (Are we measuring what we mean to measure?)
    Evaluator effectiveness influences ability to draw appropriate inferences about algorithm performance
Demonstrated that our evaluation framework is useful
    Informed improvements in algorithm design
    Informed improvements in evaluator approach
    Likely more useful for rare conditions
Demonstrated EHR phenotyping algorithm development is an iterative process
    Complexity of the algorithm may influence

Acknowledgments
Dr. Yufeng Shen - Serious Adverse Event Consortium collaborator
eMERGE collaborators
    Mount Sinai (Drs. Omri Gottesman, Erwin Bottinger, and Steve Ellis)
    Mayo Clinic (Drs. Jyotishman Pathak, Sean Murphy, Kevin Bruce, Stephanie Johnson, Jay Talwalker, Christopher G. Chute, Iftikhar J. Kullo)
    Northwestern (Dr. Abel Kho)
    Vanderbilt (Dr. Josh Denny)
DILIN collaborator
    UNC-CH (Dr. Ashraf Farrag)

Columbia Training in Biomedical Informatics (NIH NLM #T15 LM007079)
The eMERGE network U01 HG006380-01 (Mount Sinai)

Questions?

Casey L. Overby casey@dbmi.columbia.edu
