nature publishing group Articles

Open
see COMMENTARY page 474 and ARTICLE page 539

Pharmacovigilance Using Clinical Notes


P LePendu1, SV Iyer1, A Bauer-Mehren1, R Harpaz1, JM Mortensen1, T Podchiyska2, TA Ferris2 and
NH Shah1

With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs
for pharmacovigilance. We present novel methods that annotate the unstructured clinical notes and transform them into
a deidentified patient–feature matrix encoded using medical terminologies. We demonstrate the use of the resulting
high-throughput data for detecting drug–adverse event associations and adverse events associated with drug–drug
interactions. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering
of spurious signals by adjusting for potential confounding, and compile prevalence information. We argue that analyzing
large volumes of free-text clinical notes enables drug safety surveillance using a yet untapped data source. Such data
mining can be used for hypothesis generation and for rapid analysis of suspected adverse event risk.

Phase IV surveillance is a critical component of drug safety because not all safety issues associated with drugs are detected before market approval. Each year, drug-related events account for up to 50% of adverse events occurring in hospital stays,1 significantly increasing costs and length of stay in hospitals.2 As many as 30% of all drug reactions result from concomitant use, and an estimated 29.4% of elderly patients are on six or more drugs.3

Efforts such as the Sentinel Initiative and the Observational Medical Outcomes Partnership4 envision the use of electronic health records (EHRs) for active pharmacovigilance.5–7 Complementing the current state of the art, which is based on reports of suspected adverse drug reactions, active surveillance aims to monitor drugs in near real time and potentially shorten the time that patients are at risk.

Coded discharge diagnoses and insurance claims data from EHRs have already been used for detecting safety signals.8–10 However, some experts argue that methods that rely on coded data could be missing >90% of the adverse events that actually occur, in part because of the nature of billing and claims data.1 Researchers have used discharge summaries (which summarize information from a care episode, including the final diagnosis and follow-up plan) for detecting a range of adverse events11 and for demonstrating the feasibility of using the EHR for pharmacovigilance by identifying known adverse events associated with seven drugs using 25,074 notes from 2004.12 Therefore, the clinical text can potentially play an important role in future pharmacovigilance,13,14 particularly if we can transform notes taken daily by doctors, nurses, and other practitioners into more accessible data-mining inputs.15–17

Two key barriers to using clinical notes are privacy and accessibility.16 Clinical notes contain identifying information, such as names, dates, and locations, that are difficult to redact automatically, so care organizations are reluctant to share clinical notes. We describe an approach that computationally processes clinical text rapidly and accurately enough to serve use cases such as drug safety surveillance. Like other terminology-based systems, it deidentifies the data as part of the process.18 We sacrifice some individual note-level accuracy in the text processing in exchange for the “unreasonable effectiveness”24 of large data sets. Given the large volumes of clinical notes, our method produces a patient–feature matrix encoded using standardized medical terminologies. We demonstrate the use of the resulting patient–feature matrix as a substrate for signal detection algorithms for drug–adverse event associations and drug–drug interactions.

RESULTS
Our results show that it is possible to detect drug safety signals using clinical notes transformed into a feature matrix encoded using medical terminologies. We evaluate the performance of the resulting data set for pharmacovigilance using curated reference sets of single-drug adverse events as well as adverse events related to drug–drug interactions. In addition, we show that we can simultaneously estimate the prevalence of adverse events resulting from drug–drug interactions. The reference set, described in the Methods section, contains 28 positive associations and 165 negative associations spanning 78 drugs and 12 different events for single drug–adverse event associations. For the drug–drug
1Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA; 2Stanford Center for Clinical Informatics, Stanford University, Stanford, California, USA. Correspondence: P LePendu (plependu@stanford.edu)


Received 26 October 2012; accepted 22 February 2013; advance online publication 10 April 2013. doi:10.1038/clpt.2013.47

Clinical pharmacology & Therapeutics | VOLUME 93 NUMBER 6 | June 2013 547


15326535, 2013, 6, Downloaded from https://ascpt.onlinelibrary.wiley.com/doi/10.1038/clpt.2013.47, Wiley Online Library on [01/09/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

interactions, the reference set contains 466 positive and 466 negative associations spanning 333 drugs across 10 events.

Feasibility of detecting drug–adverse event associations
To demonstrate the feasibility of using free text–derived features for detecting drug–adverse event associations, we reproduce the well-known association between rofecoxib and myocardial infarction. Rofecoxib was taken off the market because of the increased risk of heart attack and stroke.19,20 We compute an association between rofecoxib and myocardial infarction, keeping track of the temporal order of the diagnosis of rheumatoid arthritis, exposure to the drug, and occurrence of an adverse event as described in the Methods section. Using data up to 2005, we obtain an odds ratio (OR) of 1.31 (95% confidence interval (CI): 1.16–1.45) for the association, which agrees with previously reported results.19,20 In a previous study, we compared using clinical notes with using the codes from the International Classification of Diseases, Ninth Revision (ICD-9), and found no significant association (OR: 1.71; 95% CI: 0.74–3.53) using the coded data.21 This is probably due to undercoding: a patient is counted as exposed only with a prior arthritis indication, and approximately one-third of the patients meet that criterion.

Performance of detecting adverse drug events
Figure 1 shows the adjusted ORs and 95% CIs for the 28 true-positive associations from our single drug–adverse event reference set. As expected, the results vary by adverse event.10 Figure 2a shows the overall performance for detecting associations between a single drug and its adverse event, with an area under the receiver operating characteristic curve (AUC) of 75.3% (unadjusted) and 80.4% (adjusted). A threshold of 1.0 (a commonly used cutoff) on the lower bound of the 95% CI of the adjusted ORs translates to 39% sensitivity and 97.5% specificity. Choosing a signaling threshold, defined using minimum specificity of 90%, based on the receiver operating characteristic curve, yields a cutoff of 1.18 (unadjusted) and 0.84 (adjusted) on the lower bound of the 95% CI. Supplementary Data S1 online lists all adjusted results, and Supplementary Data S2 online lists the AUC threshold data.

Profiling drug–adverse event associations over time
Figure 3 shows the cumulative ORs and exposures over time based on the unadjusted associations for the 10 drugs in our reference set that have had an alert in the past decade. Using a threshold of 1.0 on the lower bound of the CI for the association, we would flag six of nine alerts earlier than the official date (we do not have enough data for one drug, troglitazone). By comparison, the propensity-adjusted method would catch three of the alerts early. The unadjusted associations can flag signals worth investigating, and the adjusted associations may reduce false alarms.

Performance of detecting adverse drug–drug interactions
Figure 2b shows the performance (AUC of 81.5%) for detecting known adverse events arising from drug–drug interactions. Adjusting the associations for potential confounding

[Figure 1 graphic: forest plots of adjusted ORs with 95% CIs, one panel per event group. The x-axis is the OR (0 to 4; 0 to 11 for progressive multifocal leukoencephalopathy). Drug–event pairs by panel, with the number of exposed patients in parentheses:
Myocardial infarction (mi): rosiglitazone (1,401), rofecoxib (5,294), medroxyprogesterone (4,828), celecoxib (9,132), valdecoxib (1,722), levonorgestrel (4,208), sibutramine (304)
Cardiac valve fibrosis (cvf): cabergoline (52), pergolide (114), phentermine (341)
Venous thrombosis (vt): raloxifene (1,137), clozapine (222), anastrozole (2,523), drospirenone (64), levonorgestrel (4,214)
Aplastic anemia (aa): ticlopidine (19), allopurinol (1,173), valproic acid (1,882), phenobarbital (1,285), clopidogrel (1,899)
Progressive multifocal leukoencephalopathy (pml): fludarabine (1,243), rituximab (3,148), alemtuzumab (318)
Other: cerivastatin – rhabd (109), troglitazone – arf (85), cisapride – qt (447), pioglitazone – ubc (1,578)
Off-scale OR labels printed in the lower panels: (5.2) and (4.4).]

Figure 1 Adjusted odds ratios (ORs) for positive cases in the single drug–adverse event set. Results show some variability by event. The 28 positive cases include the following events: myocardial infarction (mi), rhabdomyolysis (rhabd), cardiovascular fibrosis (cvf), acute renal failure (arf), QT prolongation (qt), urinary bladder cancer (ubc), progressive multifocal leukoencephalopathy (pml), aplastic anemia (aa), and venous thrombosis (vt). Some associations are off the scale, and we indicate the OR in parentheses above the line (one exception, Natalizumab-pml (232), is not shown at all due to extreme scale: OR: 79.5; 95% CI: 30.8–270.4). We also include the number of exposed patients in parentheses for each drug–adverse event pair. Typically, a signal occurs when the lower bound of the confidence interval exceeds 1.0; however, this threshold may have different optimal settings on the basis of the event.
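
The signaling rule stated in the caption (flag when the lower bound of the 95% CI exceeds 1.0) can be illustrated with a minimal Python sketch. It assumes the common log-based (Woolf) interval for an unadjusted OR from a 2 × 2 table; the text does not state which interval formula was used, so the formula, the function names, and the counts in the usage note are illustrative assumptions only.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    Assumption: Woolf (log) interval; not confirmed as the authors' formula.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the log method")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

def is_signal(a, b, c, d, threshold=1.0):
    """Flag a drug-event pair when the lower CI bound exceeds the threshold."""
    _, lower, _ = odds_ratio_ci(a, b, c, d)
    return lower > threshold
```

With hypothetical counts, `is_signal(40, 160, 20, 180)` flags the pair, whereas the same OR on half as many patients, `is_signal(20, 80, 10, 90)`, does not, because the wider interval includes 1.0.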


[Figure 2 graphic: receiver operating characteristic curves. The x-axis is 1 − specificity (false positive rate); the y-axis is sensitivity (true positive rate). Panel a, drug–event: AUC 75.3% and 80.4%. Panel b, drug–drug–event: AUC 74.8% and 81.5%.]

Figure 2 Performance of adverse drug reactions and drug–drug interaction detection. Overall performance is measured using areas under the receiver
operating characteristic curve (AUCs). (a) The unadjusted (blue) vs. adjusted (red) methods yield AUCs of 75.3 and 80.4% overall. (b) For drug interactions, the
adjusted methods (red) reach 81.5% AUC.
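
As an illustration of how such performance figures are obtained from a reference set, the following minimal Python sketch computes sensitivity and specificity at a cutoff, and the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The scores stand for the lower bounds of the 95% CIs; all names and the toy values in the usage note are hypothetical, and the evaluation code actually used in the study is not described in the text.

```python
def sensitivity_specificity(scores, labels, threshold=1.0):
    """Sensitivity and specificity when pairs with score > threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For the toy scores `[1.4, 1.1, 0.9, 1.2, 0.7, 0.6]` with labels `[1, 1, 1, 0, 0, 0]`, seven of the nine positive-negative pairs are ordered correctly. The pairwise form is quadratic in the number of cases, which is acceptable for reference sets of a few hundred pairs.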

[Figure 3 graphic: nine panels, one per drug–event pair: rofecoxib – mi, celecoxib – mi, valdecoxib – mi, rosiglitazone – mi, pioglitazone – ubc, cerivastatin – rhabd, pergolide – cvf, sibutramine – mi, and cisapride – qt. In each panel the x-axis is the year, the left axis ("Score") is the cumulative OR, and the right axis ("Exposure") is the number of exposed patients.]

Figure 3 Cumulative (unadjusted) odds and exposure plots for 10 positive cases involving US Food and Drug Administration (FDA) intervention. Signals are
flagged earlier than official alerts in six of nine cases (troglitazone excluded for lack of sufficient exposure). The solid red line is the odds ratio (OR), and the
dotted red lines are the confidence intervals (CIs). The solid blue line is the exposure rate. The shaded area marks the period for which FDA intervention applies
(e.g., withdrawal). The point estimate marks the earliest year and OR when the lower bound of the 95% CI is above a threshold of 1.0, i.e., when the unadjusted
method would flag the drug for monitoring. As more data accumulate and exposure increases, patterns often converge toward more confident signals. cvf,
cardiovascular fibrosis; mi, myocardial infarction; qt, QT prolongation; rhabd, rhabdomyolysis; ubc, urinary bladder cancer.
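
The cumulative profiling described in the caption can be sketched as follows in Python: yearly 2 × 2 counts are accumulated, and the first year in which the lower bound of the 95% CI exceeds the threshold is reported. This is a simplified illustration under stated assumptions: the input layout (new patients per cell per year), the Woolf log interval, and the function names are hypothetical and are not taken from the text.

```python
import math

def lower_bound(a, b, c, d, z=1.96):
    """Lower 95% CI bound of the OR (Woolf log method); None if a cell is empty."""
    if min(a, b, c, d) <= 0:
        return None
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se)

def first_flag_year(yearly_counts, threshold=1.0):
    """Return the earliest year at which the cumulative data would flag the pair.

    yearly_counts: dict mapping year -> (a, b, c, d) counts of patients newly
    assigned to each 2x2 cell in that year (hypothetical input layout).
    """
    total = [0, 0, 0, 0]
    for year in sorted(yearly_counts):
        total = [t + n for t, n in zip(total, yearly_counts[year])]
        lb = lower_bound(*total)
        if lb is not None and lb > threshold:
            return year
    return None  # never flagged with the available data
```

With the hypothetical input `{2000: (5, 45, 20, 380), 2001: (10, 60, 25, 400)}`, the pair is not flagged on the 2000 data alone but is flagged once the 2001 counts are added, mirroring the convergence toward more confident signals as exposure accumulates.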

improves the signal detection capability (red curves in Figure 2b).22 In the drug–drug interaction scenario, we do not constrain by drug indications because of combinatorial complexity. We obtain 52% sensitivity at 91% specificity, using 1.0 as a threshold on the lower bound of the CI for the adjusted associations.

Estimating the prevalence of adverse events
Population-level prevalence data for adverse events are hard to come by. For single drugs, sources such as Side Effect Resource provide information on the frequency of specific adverse events from the drug product label. No such comparable resource exists for adverse events arising from drug–drug interactions.
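
A prevalence estimate of this kind is a simple proportion among co-exposed patients. The minimal Python sketch below assumes a hypothetical per-patient set of normalized concepts and, for brevity, ignores the temporal ordering of mentions; the function name and data layout are illustrative and not part of the described method.

```python
def coexposure_prevalence(patients, drug_a, drug_b, event):
    """Prevalence of an event among patients with mentions of both drugs.

    patients: dict mapping patient id -> set of concepts mentioned for that
    patient (hypothetical layout; temporal order is ignored in this sketch).
    Returns (patients with event, co-exposed patients, proportion), or None
    when no patient is co-exposed.
    """
    exposed = [features for features in patients.values()
               if drug_a in features and drug_b in features]
    if not exposed:
        return None
    with_event = sum(1 for features in exposed if event in features)
    return with_event, len(exposed), with_event / len(exposed)
```

Returning the numerator and denominator alongside the proportion keeps the sample size visible, which matters when the co-exposed group is small.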


While performing the drug–adverse event association calculations using data from a clinical data warehouse, we can in parallel estimate the prevalence of adverse events associated with drug–drug interactions. For example, we found that 42.8% (176 of 411) of patients on both levodopa and lorazepam experience parkinsonian symptoms, 19.8% (140 of 707) of patients on paclitaxel and trastuzumab experience neutropenia, and 17.8% (796 of 4,467) of patients on amiodarone and metoprolol experience bradycardia.

DISCUSSION
We have demonstrated that adverse drug events as well as adverse events associated with drug–drug interactions can be detected using a deidentified patient–feature matrix extracted from free-text clinical documents. Blumenthal and others5 envision a scenario in which a new drug comes to market and a nationwide learning system monitors for safety signals. Our results show that deidentified clinical notes can be used to generate drug safety signals, taking a step toward such a scenario. In addition, the patient–feature matrix also provides prevalence data not available from other data sources (e.g., spontaneous reports). Having such prevalence information can assist in prioritizing actionable events and reducing alert fatigue.23

Our approach to processing clinical notes is simple in comparison with advanced natural language processing (NLP) systems that may have better accuracy in identifying nuanced attributions of disease conditions. We sacrifice some individual note-level accuracy in exchange for the ability to detect population-level trends against massive data sets. Our results, based on a reference set of known drug–event pairs, show that when exposure data are numerous enough, the use of relatively simple text mining with standard association strength tests for signal detection can work, reflecting the adage in the machine-learning community that “a dumb algorithm with lots of data beats a clever one with modest amounts of it.”24,25 When used in combination with other data sets, clinical notes may address cases that otherwise pass undetected. We sacrifice sensitivity for specificity because for a new approach, and a new data source (clinical notes), keeping false-discovery rates low is important, particularly in the initial stages of establishing feasibility.

We find that ontologies are an excellent source of features and allow systematic normalization and aggregation when the feature set needs reduction.15,26 For example, we can count all patients who experience cardiac arrhythmias as patients with arrhythmias because of the hierarchical relationships. Therefore, ontology hierarchies can organize a very large number of terms into a smaller feature set. Moreover, because names, dates, and locations are not present in the clinical terminologies, those are not extracted as features by dictionary-based methods.18,27

We believe that the information embedded in text is crucial for leveraging EHR data,10,13,14 particularly for rare events for which large amounts of data are needed. Our annotation-based approach produces a feature matrix that complements other structured data such as codes from the ICD-9. Of note, our methods are not dependent on any particular NLP tool (we contrast MGREP and UNITEX in the Methods section), and we expect results to improve given the availability of better and faster clinical NLP tools.28,29 We are currently collaborating with researchers at the Mayo Clinic to improve the speed of the clinical Text Analysis and Knowledge Extraction System,29 one of the state-of-the-art NLP tools available for clinical text. Broader availability of curated clinical NLP data sets and health outcome definitions would accelerate research and validation.

Our work has several limitations and opportunities for improvement. Not all conditions are equally identifiable from text using lexical approaches (Supplementary Data S3 online reports validation results by condition). Advanced NLP tools would improve accuracy in these cases. Biases in our reference set, although it is among the largest used for such a study, affect our performance estimation. A new reference standard covering four events has just recently been released by the Observational Medical Outcomes Partnership,4 and we are currently evaluating its utility. Some adverse drug events are dose dependent, and our methods currently ignore this information. The UNITEX tool, described in the Methods section, includes libraries for dosage extraction and thus is a logical next step. We do not distinguish between new users of drugs and existing or chronic ones. Our methods have a limited ability to define eras (durations of medication and illness). We are currently examining the annotation data for the utility of the last mention of a concept, sentence-level co-occurrences, and temporal density of mentions to address this question. The majority of our findings are based on the Stanford Hospital and Clinics, which is a tertiary-care center representing a skewed population. At the same time, this population has added utility for investigating rare events. Variations in signaling thresholds can also occur as a result of the prevalence or rarity of an event,10 and more research is needed to adapt detection algorithms accordingly. The prevalence data estimated in studies such as ours are an important step in this direction.10 Finally, we note that the Observational Medical Outcomes Partnership group suggests that no single method works best uniformly, that different methods be considered for each event and data source, and that profiling performance via receiver operating characteristic curves assists in understanding the utility of a method or data source.4

To conclude, our method extracts from textual clinical notes a deidentified patient–feature matrix encoded using standardized medical terminologies. We have demonstrated the use of the resulting patient–feature matrix as a substrate for detecting single drug–adverse event associations (AUC of 80.4%) and for detecting adverse events associated with drug–drug interactions (AUC of 81.5%), illustrating that clinical notes can be a source for detecting drug safety signals at scale.15 The patient–feature matrix can also be used to learn off-label usage30 and to discern drug adverse events from indications.31 Using the textual contents of the EHR complements efforts using billing and claims data or spontaneous reports4,8,14,32,33 and opens up new opportunities for leveraging observational data.


[Figure 4 graphic, reconstructed as tables. Patient timelines list first mentions in temporal order.]

(A) Drug–event
2×2 cell   Patient timeline   Reason
d          I                  No drug taken and event not encountered
c          I E                Event encountered without taking drug
c          I E D [>0]         Event encountered without taking drug
b          I D                Drug taken but no event encountered
a          I D E              Event encountered after taking drug

(B) Drug–drug–event
2×2 cell   Patient timeline   Reason
d          (none)             No drug taken and event not encountered
d          D                  Single drug taken and no event encountered
c          E                  Event encountered without taking either drug
c          D E                Single drug precedes adverse event
b          D1 D2              Drugs possibly interact, but adverse event not encountered
a          D1 D2 E            Drugs possibly interact, followed by adverse event

(C) Key
               Outcome +   Outcome −
Exposure +     a           b
Exposure −     c           d

Figure 4 Assignment of patients to 2 × 2 contingency tables. Patients are assigned to cells a, b, c, and d of a 2 × 2 contingency table (C) on the basis of the
patterns shown in parts (A) and (B). In the patterns, indications are abbreviated with “I”, drugs with “D”, and outcomes or events with “E.” A patient exposed to
the drug is counted in cells “a” or “b” depending on whether the outcome occurs after the drug exposure, based on temporal ordering of first mentions of the
I, D, and E. Other patients (i.e., unexposed) are placed in the bottom row of the 2 × 2 contingency table in cells “c” or “d” depending on whether the outcome
occurred in the observation duration after the indication. Therefore, for example, an indication followed by a drug and then an event would go into the “a”
cell. An indication followed by no drug mention but having an occurrence of the event would go into cell “c.” For drug–drug interactions, we do not restrict the
assignment on the basis of the indications. Therefore, patients with mentions of both drugs (in either order) before an event would go into the “a” cell.
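
The single-drug assignment rules in panel (A) can be written as a small function. The following Python sketch is an illustration only: it assumes that drug and event mentions preceding the indication are disregarded and that an event mentioned at the same time as the drug does not count as following it. Neither tie-breaking choice is specified in the text, and the function name is hypothetical.

```python
def assign_cell(indication, drug, event):
    """Assign a patient to cell 'a', 'b', 'c', or 'd' of the 2x2 table.

    Arguments are first-mention times (any comparable values) or None when
    the concept is never mentioned. Returns None for excluded patients.
    """
    if indication is None:
        return None  # no indication at any time: excluded from the analysis
    # Assumption: only mentions at or after the indication are considered.
    if drug is not None and drug < indication:
        drug = None
    if event is not None and event < indication:
        event = None
    if drug is not None:
        if event is None:
            return "b"  # I D: drug taken but no event encountered
        if event > drug:
            return "a"  # I D E: event encountered after taking drug
        return "c"      # I E D: event came first, without the drug
    return "c" if event is not None else "d"  # I E, or I alone
```

Tallying the returned cells over all patients, for example with `collections.Counter`, yields the four counts needed for the unadjusted OR.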

METHODS

Data sources. Our primary data source was the Stanford Translational Research Integrated Database Environment,34 which spans 18 years of patient data from 1.8 million patients; it contains 19 million encounters, 35 million coded ICD-9 diagnoses, and >11 million unstructured clinical notes, which are a combination of pathology, radiology, and transcription reports. The gender split is ~60% female; the average age is 44 with an SD of 25.

The reference standard. We created reference standards of known drug–adverse event associations for testing the performance of our methods in detecting drug safety signals from text. Supplementary Data S4 online lists the single drug–event reference set.

For the single-drug adverse events, our reference set included 12 distinct events worth monitoring35 and 78 distinct drugs, 28 positive cases, and 165 negative cases. We started with a validation set from the European Union adverse drug event project (EU-ADR)36 and to that set, we added 10 drug safety signals that involved US Food and Drug Administration intervention in the past decade, manually curating these from the literature and cross-referencing with the agency’s website. We established our false-discovery rate by generating a set of negative associations by creating all combinations of drugs and events and subtracting any known associations that were identified by any one of the EU-ADR filtering workflows,37 the Medi-Span (Wolters Kluwer Health, Indianapolis, IN) Adverse Drug Effects Database, or the Side Effect Resource database.38

For the two-drug case, known drug–drug interactions were extracted (and manually validated) from textual monographs in DrugBank and the Medi-Span Drug Therapy Monitoring System. In this case, we simulated the negative set by associating drug pairs with a randomly chosen event, removing any cases that were already known to be associated on the basis of external knowledge (DrugBank, Medi-Span, Drugs.com, Unified Medical Language System (UMLS), or Side Effect Resource). This reference set included 10 distinct events, 333 distinct drugs, 466 positive cases, and 466 negative cases.

Testing for drug safety signals. We followed a two-step process for detecting drug safety signals: first, we computed a raw association in the form of an unadjusted OR, followed by adjustment for potential confounders. The first step is useful for flagging putative signals, and the second step is useful in reducing false alarms.

In the first step, we computed unadjusted ORs and 95% CIs by constructing a 2 × 2 contingency table26,33 from the patient–feature matrix. On the basis of first mentions of drug, event, and indication and their temporal order, we assigned patients to specific cells of a 2 × 2 contingency table as shown in Figure 4 (see also Supplementary Data S5 online). The temporal information in the patient–feature matrix is critical for determining whether the event follows exposure.39 Patients having no mention of the indication at any time are excluded from the analysis (see Supplementary Data S6 online for those patients being excluded). Using data following the indication, and not counting repeat mentions, the ordering of the drug and event determined into which cell of a 2 × 2 matrix the patient fell. Because all unexposed patients have the indication, they could be on an alternative drug or other treatment, or none at all.

In the second step, we adjusted for confounding by specific patient factors. We included age, gender, race, and comorbidity and coprescription frequency (as a surrogate for overall health status) in calculating the propensity score.9 The propensity score quantified the likelihood of a patient to be exposed to a drug. Patients with known indications were matched (exposed vs. unexposed) via the propensity score. Finally, we included the propensity score as a covariate in logistic regression to compute adjusted ORs and 95% CIs using the coefficients of the regression model. We used the Matching and Survival packages in R.40

For single drug–event associations, we identified the indications of the drug using the Medi-Span Drug Indications Database and the National Drug File–Reference Terminology. In the drug–drug interaction scenario, the key idea is to determine whether the association of the event with the combination of the two drugs outweighs any association of the event with either one of the drugs alone (or none at all). Including the indications adds a degree of combinatorial complexity, so we focused primarily on the temporal order of the two drugs and event (Figure 4b) without restricting by the indications of the drugs.

Generating the patient–feature matrix. Our annotator workflow, described previously,21,30 uses ~5.6 million strings from existing terminologies; filters unambiguous terms that are predominantly noun phrases representing drugs, diseases, devices, or procedures; uses the cleaned up lexicon for term recognition in the clinical notes to tag or annotate41 the text; excludes negated terms or terms that apply to family and medical history;42 normalizes all terms using the ontology hierarchies; and finally uses the time stamps of the note to produce a deidentified, temporally ordered patient–feature matrix. The process is summarized in Figure 5 and the individual steps are detailed below.

Using biomedical ontologies for text annotation. We use existing ontologies as a source of (i) a lexicon of strings that are grouped together and


[Figure 5 graphic: workflow diagram with three stages, "Creating clean lexicons", "Annotation", and "Normalizing concepts", and numbered steps 1 to 6. Labeled elements: BioPortal ontologies (a root with Terms A to D); frequency and syntactic types; Term-1 to Term-n; BioPortal megathesaurus and UMLS semantic types; Concept-1 to Concept-m, grouped as drugs, diseases, devices, and procedures; a binary patients-by-concepts matrix ordered by time; and further analysis (cohorts of interest, clinical data subsets, information retrieval).]

Figure 5 Generation of the patient–feature matrix. The workflow (1) starts by downloading ~5.6 million strings for every term in ontologies from both the
Unified Medical Language System (UMLS) and BioPortal, as well as all trigger terms from NegEx and ConText; (2) uses term frequency and syntactic type
information (e.g., predominant noun phrases) from MedLine to prune the set of strings into a clean lexicon; (3) applies the lexicon directly against the textual
notes using exact string matching; (4) applies NegEx and ConText rules to filter negation and family history contexts; (5) applies UMLS Metathesaurus and
BioPortal mappings and semantic type information to normalize terms into concepts that are grouped by drug, disease, device, or procedure; and (6) results
finally in the patient–feature matrix. Each row of the matrix represents a single note that is linked to a single patient, and the time stamps of the notes induce a
temporal ordering over the entire patient–feature matrix.
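
Steps (3), (4), and (6) of the workflow can be illustrated with a deliberately simplified Python sketch. The four-entry lexicon is hypothetical, the three trigger phrases are the examples of contextual cues given in the text, and the negation scope used here (a trigger anywhere earlier in the same sentence) is a crude stand-in for the NegEx and ConText rules, not a reimplementation of them or of MGREP.

```python
import re

# Hypothetical miniature lexicon: surface string -> normalized concept.
LEXICON = {
    "myocardial infarction": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "vioxx": "rofecoxib",
    "rofecoxib": "rofecoxib",
}
# Contextual cues ("triggers") for negation and family history.
TRIGGERS = ["denies", "no sign of", "father has a history of"]

def annotate(note):
    """Return the set of concepts with positive, present mentions in a note."""
    found = set()
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for term, concept in LEXICON.items():
            for match in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
                prefix = sentence[:match.start()]
                if any(trigger in prefix for trigger in TRIGGERS):
                    continue  # negated or family-history mention: ignore
                found.add(concept)
    return found

def feature_rows(notes):
    """Build one row per note, ordered by patient and time stamp.

    notes: iterable of (patient_id, timestamp, text) tuples.
    """
    rows = [(pid, ts, sorted(annotate(text))) for pid, ts, text in notes]
    return sorted(rows, key=lambda row: (row[0], row[1]))
```

For the note "Patient started Vioxx. Denies heart attack." the sketch keeps only the concept rofecoxib. Because only lexicon concepts are emitted, names, dates, and locations in the note never reach the output rows.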

linked to over a million concepts via synonymy (referred to as mappings) and (ii) a hierarchy of >14 million parent–child relationships among those concepts. We use the lexicon to recognize terms in the input text using a tool called MGREP,41 which also tracks the relative position at which each term occurs (Figures 5 and 6). In addition to clinical terms, based on the ConText system,42 we include terms corresponding to contextual cues called “triggers” in our lexicon. Cues such as “denies,” “no sign of,” and “father has a history of” are used in a postprocessing step to identify terms that are negated or that apply to family or medical history. Terms that correspond to mentions in these contexts are ignored; thus, the subsequent analysis relies on positive, present mentions of concepts. The resulting annotations for the Stanford Translational Research Integrated Database Environment data set comprise ~3.75 billion records. It takes 1 hour to generate annotations from 3 million documents using a single computer workstation and ~2 hours to postprocess the data. MGREP can be substituted with other NLP tools: one such tool we have tested is UNITEX,43 which offers advanced functionality such as regular expressions for drug doses and morpheme-based matching at the cost of an additional 10–20% processing time.

Cleaning the lexicon. Motivated by previous work on identifying and removing noninformative terms,44,45 we apply a series of suppression rules that fall into two categories: syntactic and semantic. We keep terms that are predominantly noun phrases46 based on an analysis of over 20 million MEDLINE abstracts; we remove uninformative phrases based on term frequency analysis of >50 million clinical documents from the Mayo Clinic;47 and we suppress terms having fewer than four characters by default because the majority of these tend to be ambiguous abbreviations. Finally, using frequency-based sorting, we manually identify ambiguous terms that belong to more than one semantic group (drug, disease, device, and procedure),47,48 and we suppress their least likely interpretation. For example, “clip” is more likely to be a device than a drug in clinical text, so we suppress the interpretation as “corticotropin-like intermediate lobe peptide” even though clip is listed as its synonym.

Normalizing terms in the patient–feature matrix. Drug prescriptions are identified via the text processing and normalized into active ingredients using relationships from RxNorm (e.g., “tradename_of”). Therefore, “rofecoxib 12.5 mg oral tablet” and “Vioxx” are normalized to the active ingredient rofecoxib. In addition, we map ingredients to the Anatomical Therapeutic Chemical Classification System, which enables four levels of aggregation, i.e., rofecoxib, celecoxib, and valdecoxib are all cyclooxygenase-2 inhibitors, which are nonsteroidal anti-inflammatory drugs, and so on.

Although drug normalization is fairly straightforward, diseases, devices, and procedures present a challenge. In what we call the two-hop method (Figure 7), we use a query-driven approach to normalize disease, device, and procedure concepts. We start with definitions from the EU-ADR project’s specifications and MedDRA standardized query definitions: for example, for myocardial infarction, we would start with the ICD-9 code 410 (acute myocardial infarction) and 18 different UMLS concept unique identifiers including C0027051 (myocardial infarction), C0340324 (silent myocardial infarction), and C0155626 (acute myocardial infarction). Starting with these “seed” concepts, we utilize mappings across ontologies and the hierarchical parent–child relationships to expand subsumed entities. Supplementary Data S7 online lists all seed queries and their full expansions.

We first precompute the transitive closure over all parent–child hierarchies, and we index it such that we can retrieve all ancestors or all descendants of a given concept. Second, the mappings among synonymous terms form an equivalence class to which we assign a unique identifier (similar to the UMLS Metathesaurus concept unique identifiers). Using these two resources, given concepts of interest as a seed query, for example, the 18 concepts for myocardial infarction, we use the mappings to find all canonical identifiers (first hop) and then use the transitive closure to include all subsumed concepts in the query. Next, we repeat the process once more with this expanded set of concepts (second hop). For myocardial infarction, the expansion process yields 470 unique strings. In principle, recursion with a least fixed-point semantics would apply; however, recursion does not work well in practice because of differing abstraction levels among ontologies, which induce cycles. We have found that two hops achieve an adequate balance between soundness and completeness for the current use case.
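
One reading of the two-hop expansion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data structures (`canonical`, mapping each concept to its equivalence-class identifier, and `children`, mapping each concept to its direct children) and the function names are hypothetical, and descendants are found by traversal at query time rather than from a precomputed, indexed transitive closure.

```python
def class_members(canonical):
    """Invert concept -> equivalence-class id into class id -> set of concepts."""
    members = {}
    for concept, class_id in canonical.items():
        members.setdefault(class_id, set()).add(concept)
    return members


def all_descendants(children, concept):
    """Collect every concept subsumed by `concept` (iterative, safe against cycles)."""
    seen, stack = set(), [concept]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def hop(concepts, canonical, children):
    """One iteration: follow synonymy mappings, then add all subsumed concepts."""
    members = class_members(canonical)
    mapped = set()
    for c in concepts:
        mapped |= members.get(canonical.get(c, c), {c})
    expanded = set(mapped)
    for c in mapped:
        expanded |= all_descendants(children, c)
    return expanded


def two_hop(seed, canonical, children):
    """Apply the expansion exactly twice instead of iterating to a fixed point."""
    return hop(hop(seed, canonical, children), canonical, children)


# Toy data: ontology O1 holds A -> B; ontology O2 holds X -> Y and X -> Z.
# B (in O1) and X (in O2) are synonymous, so they share one class identifier.
CANONICAL = {"A": "e1", "B": "e2", "X": "e2", "Y": "e3", "Z": "e4"}
CHILDREN = {"A": ["B"], "X": ["Y", "Z"]}
```

With this toy data, a single hop from the seed `{"A"}` stays inside O1, whereas the second hop crosses the B/X mapping and picks up the O2 subtree, which mirrors the situation the text describes.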


[Figure 6 panels: Input text; Annotations (true internal representation, with some keys shown for illustration); Reconstructed representation]

Figure 6 Sample annotations. (a) A discharge summary is encoded internally using (b) a highly compressed, numerical representation. The strings in parentheses are
keyed to the first column of numbers and are included merely for illustration purposes. (c) The annotations keep track of relative positional information and are so rich
owing to the vast lexicon that if we reconstruct the note, very little of the useful information is lost (notice the section headers). The blank areas in the reconstruction
represent terms that are not recognized, and terms highlighted in red denote ones that will not be attributed to the present patient because of contextual cues (e.g.,
family history and negated findings). CABG, coronary artery bypass graft; COPD, chronic obstructive pulmonary disease; CT, computed tomography.

[Figure 7 panels: First iteration; Second iteration. Each panel shows ontologies O1 and O2, with concepts marked as members of C, C′, or C″.]

Figure 7 Two-hop query expansion. The algorithm takes a set of concepts C (solid red) and derives all subconcepts C′ (all red) in each ontology O and then
repeats the process only once more for all derived concepts C′ (solid blue) to obtain C′′ (all red and blue). Because concepts are mapped across ontologies, the
process traverses simultaneously all ontologies that contain C (and C′), thereby “hopping” across ontologies twice. In this illustration, C′′ captures two more
concepts from the adjacent ontology O2 that would have otherwise been missed with a single iteration.

Recognizing events and exposures. By combining the above procedures (seeding queries using established definitions, normalizing and aggregating terms, and using only positive, present mentions; see Supplementary Data S8 online), we are able to recognize events and exposures with enough accuracy for the drug safety use case. We determine the accuracy of the event identification using a gold-standard corpus from the 2008 i2b2 Obesity Challenge.49 This corpus has been manually annotated by two annotators for 16 conditions and was designed to evaluate the ability of NLP systems to identify a condition present for a patient given a textual note. We extended this corpus by manually annotating each of the events listed in Figures 1 and 3 (see Supplementary Data S3 online).

Using the set of terms corresponding to the definition of the event of interest (see Supplementary Data S7 online) and the set of terms recognized by our annotation workflow in the i2b2 notes, we evaluate the sensitivity and specificity of identifying each of the events (see Supplementary Data S3 online). Overall, our event identification has 74% sensitivity and 96% specificity. Accuracy varies by condition: for example, myocardial infarction has 63% sensitivity and 94% specificity, whereas gallstones have 15% sensitivity and 99% specificity.

Drug recognition is done in a similar manner using strings from RxNorm. An independent study at the University of Pittsburgh, which manually examined the annotations on 1,960 clinical notes,50 estimated over 84% recall and 84% precision for recognizing drugs (R. Boyce, personal communication).

Ordering the features. We use the time stamps for each note to induce a temporal ordering over the recognized concepts on a per-patient basis. We focus on first mentions of concepts and do not use exposure windows or eras. We keep positive, present mentions and ignore negated mentions and family and medical history mentions identified via trigger terms. Therefore, for every patient, the feature matrix contains a temporally ordered list of drugs, diseases, devices, and procedures mentioned in their medical record.

SUPPLEMENTARY MATERIAL is linked to the online version of the paper at http://www.nature.com/cpt
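
The first-mention ordering described under "Ordering the features" can be sketched as below. This is an illustrative assumption about the data layout, not the authors' implementation: the input is taken to be a flat list of `(patient_id, timestamp, concept)` tuples that already contains only positive, present mentions, and the function name `first_mentions` is hypothetical.

```python
def first_mentions(annotations):
    """Build a per-patient timeline holding only the earliest mention of each concept.

    annotations: iterable of (patient_id, timestamp, concept) tuples, assumed to
    contain positive, present mentions only (negated and history mentions removed).
    """
    earliest = {}
    for patient, timestamp, concept in annotations:
        key = (patient, concept)
        # Keep the smallest time stamp seen for this patient/concept pair.
        if key not in earliest or timestamp < earliest[key]:
            earliest[key] = timestamp

    timeline = {}
    for (patient, concept), timestamp in earliest.items():
        timeline.setdefault(patient, []).append((timestamp, concept))
    for events in timeline.values():
        events.sort()  # temporal order; ties broken by concept name
    return timeline
```

Because only first mentions are kept, a later repeat of a drug name does not create a second exposure, which matches the statement that no exposure windows or eras are used.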


ACKNOWLEDGMENTS
The authors acknowledge support from the National Institutes of Health grant U54-HG004028 for the National Center for Biomedical Ontology. N.H.S. also acknowledges support from NIH grant U54-LM008748. The authors thank Cédrick Fairon for assistance in evaluating UNITEX and Richard Boyce for evaluating drug accuracy.

AUTHOR CONTRIBUTIONS
P.L., A.B.-M., S.V.I., and N.H.S. wrote the manuscript. P.L., S.V.I., A.B.-M., and N.H.S. designed the research. P.L., S.V.I., A.B.-M., J.M.M., and N.H.S. performed the research. P.L., S.V.I., A.B.-M., R.H., T.P., and T.A.F. analyzed the data. P.L., S.V.I., A.B.-M., and N.H.S. contributed new reagents/analytical tools.

CONFLICT OF INTEREST
The authors declared no conflict of interest.

Study Highlights

WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?
The current state of the art in drug safety surveillance relies either on databases of reported adverse events (such as the Adverse Event Reporting System) or on longitudinal observational data, primarily claims and billing, derived from coded EHR sources.

WHAT QUESTION DID THIS STUDY ADDRESS?
In this study, we demonstrate the feasibility of using large amounts of free-text notes as a substrate for performing pharmacovigilance after transforming clinical notes into a deidentified patient–feature matrix coded with standard medical terminologies.

WHAT THIS STUDY ADDS TO OUR KNOWLEDGE
We show that by using a large corpus, we can detect single drug–adverse event associations and adverse events associated with drug–drug interactions with high accuracy.

HOW THIS MIGHT CHANGE CLINICAL PHARMACOLOGY AND THERAPEUTICS
We argue that drug safety surveillance can be advanced by using this yet untapped data source of clinical notes, which comprise the majority of EHR data available.

© 2013 American Society for Clinical Pharmacology and Therapeutics

1. Classen, D.C. et al. ‘Global trigger tool’ shows that adverse events in hospitals may be ten times greater than previously measured. Health Aff. (Millwood). 30, 581–589 (2011).
2. Hug, B.L., Keohane, C., Seger, D.L., Yoon, C. & Bates, D.W. The costs of adverse drug events in community hospitals. Jt. Comm. J. Qual. Patient Saf. 38, 120–126 (2012).
3. Bushardt, R.L., Massey, E.B., Simpson, T.W., Ariail, J.C. & Simpson, K.N. Polypharmacy: misleading, but manageable. Clin. Interv. Aging 3, 383–389 (2008).
4. Stang, P.E. et al. Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Ann. Intern. Med. 153, 600–606 (2010).
5. Friedman, C.P., Wong, A.K. & Blumenthal, D. Achieving a nationwide learning health system. Sci. Transl. Med. 2 (57), 57cm29 (2010).
6. McClellan, M. Drug safety reform at the FDA–pendulum swing or systematic improvement? N. Engl. J. Med. 356, 1700–1702 (2007).
7. Avorn, J. & Schneeweiss, S. Managing drug-risk information–what to do with all those new numbers. N. Engl. J. Med. 361, 647–649 (2009).
8. Gagne, J.J. et al. Active safety monitoring of newly marketed medications in a distributed data network: application of a semi-automated monitoring system. Clin. Pharmacol. Ther. 92, 80–86 (2012).
9. Schneeweiss, S., Rassen, J.A., Glynn, R.J., Avorn, J., Mogun, H. & Brookhart, M.A. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 20, 512–522 (2009).
10. Coloma, P.M. et al. Electronic healthcare databases for active drug safety surveillance: is there enough leverage? Pharmacoepidemiol. Drug Saf. 21, 611–621 (2012).
11. Melton, G.B. & Hripcsak, G. Automated detection of adverse events using natural language processing of discharge summaries. J. Am. Med. Inform. Assoc. 12, 448–457 (2005).
12. Wang, X., Hripcsak, G., Markatou, M. & Friedman, C. Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J. Am. Med. Inform. Assoc. 16, 328–337 (2009).
13. Hennessy, S. & Flockhart, D.A. The need for translational research on drug-drug interactions. Clin. Pharmacol. Ther. 91, 771–773 (2012).
14. Harpaz, R., DuMouchel, W., Shah, N.H., Madigan, D., Ryan, P. & Friedman, C. Novel data-mining methodologies for adverse drug event discovery and analysis. Clin. Pharmacol. Ther. 91, 1010–1021 (2012).
15. Nadkarni, P.M. Drug safety surveillance using de-identified EMR and claims data: issues and challenges. J. Am. Med. Inform. Assoc. 17, 671–674 (2010).
16. Chapman, W.W., Nadkarni, P.M., Hirschman, L., D’Avolio, L.W., Savova, G.K. & Uzuner, O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J. Am. Med. Inform. Assoc. 18, 540–543 (2011).
17. Radecki, R.P. & Sittig, D.F. Application of electronic health records to the Joint Commission’s 2011 National Patient Safety Goals. JAMA 306, 92–93 (2011).
18. Morrison, F.P., Li, L., Lai, A.M. & Hripcsak, G. Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes? J. Am. Med. Inform. Assoc. 16, 37–39 (2009).
19. Graham, D.J. et al. Risk of acute myocardial infarction and sudden cardiac death in patients treated with cyclo-oxygenase 2 selective and non-selective non-steroidal anti-inflammatory drugs: nested case-control study. Lancet 365, 475–481 (2005).
20. Brownstein, J.S., Sordo, M., Kohane, I.S. & Mandl, K.D. The tell-tale heart: population-based surveillance reveals an association of rofecoxib and celecoxib with myocardial infarction. PLoS ONE 2, e840 (2007).
21. LePendu, P., Iyer, S.V., Fairon, C. & Shah, N.H. Annotation analysis for testing drug safety signals using unstructured clinical notes. J. Biomed. Semantics 3 (suppl. 1), S5 (2012).
22. Ryan, P.B., Madigan, D., Stang, P.E., Overhage, J.M., Racoosin, J.A. & Hartzema, A.G. Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership. Stat. Med. 31, 4401–4415 (2012).
23. Phansalkar, S. et al. Drug-drug interactions that should be non-interruptive in order to reduce alert fatigue in electronic health records. J. Am. Med. Inform. Assoc. (2012); e-pub ahead of print 25 September 2012.
24. Halevy, A., Norvig, P. & Pereira, F. The unreasonable effectiveness of data. Intelligent Systems, IEEE 24 (2), 8–12 (2009).
25. Domingos, P. A few useful things to know about machine learning. Commun. ACM 55 (10), 78–87 (2012).
26. Bate, A. & Evans, S.J. Quantitative signal detection using spontaneous ADR reporting. Pharmacoepidemiol. Drug Saf. 18, 427–436 (2009).
27. Aronson, A.R. & Lang, F.M. An overview of MetaMap: historical perspective and recent advances. J. Am. Med. Inform. Assoc. 17, 229–236 (2010).
28. D’Avolio, L.W., Nguyen, T.M., Goryachev, S. & Fiore, L.D. Automated concept-level information extraction to reduce the need for custom software and rules development. J. Am. Med. Inform. Assoc. 18, 607–613 (2011).
29. Savova, G.K. et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 17, 507–513 (2010).
30. LePendu, P., Liu, Y., Iyer, S., Udell, M. & Shah, N.H. Analyzing patterns of drug use in clinical notes for patient safety. AMIA Summit on Clinical Research Informatics, San Francisco, CA, 21–23 March 2012.
31. Liu, Y., LePendu, P., Iyer, S., Udell, M. & Shah, N.H. Using temporal patterns in medical records to discern adverse drug events from indications. AMIA Summit on Clinical Research Informatics, San Francisco, CA, 21–23 March 2012.
32. Schuemie, M.J. et al. Using electronic health care records for drug safety signal detection: a comparative evaluation of statistical methods. Med. Care 50, 890–897 (2012).
33. Hauben, M. & Bate, A. Decision support methods for the detection of adverse events in post-marketing data. Drug Discov. Today 14, 343–357 (2009).
34. Lowe, H.J., Ferris, T.A., Hernandez, P.M. & Weber, S.C. STRIDE–an integrated standards-based translational research informatics platform. AMIA Annu. Symp. Proc. 2009, 391–395 (2009).


35. Trifirò, G. et al.; EU-ADR group. Data mining on electronic health record databases for signal detection in pharmacovigilance: which events to monitor? Pharmacoepidemiol. Drug Saf. 18, 1176–1184 (2009).
36. OMOP. Detection of Long Term Adverse Drug Reactions in Electronic Health Data, 2012; <http://omop.fnih.org/sites/default/files/Schuemie_Detection%20of%20ADRs_Protocol%202011.pdf> (2012).
37. Bauer-Mehren, A. et al. Automatic filtering and substantiation of drug safety signals. PLoS Comput. Biol. 8, e1002457 (2012).
38. Kuhn, M., Campillos, M., Letunic, I., Jensen, L.J. & Bork, P. A side effect resource to capture phenotypic effects of drugs. Mol. Syst. Biol. 6, 343 (2010).
39. Hanauer, D.A. & Ramakrishnan, N. Modeling temporal relationships in large scale clinical associations. J. Am. Med. Inform. Assoc. 20, 332–341 (2012).
40. Sekhon, J.S. Multivariate and propensity score matching software with automated balance optimization: the matching package for R. J. Stat. Softw. 42 (7), 1–52 (2011).
41. Shah, N.H., Bhatia, N., Jonquet, C., Rubin, D., Chiang, A.P. & Musen, M.A. Comparison of concept recognizers for building the Open Biomedical Annotator. BMC Bioinformatics 10 (suppl. 9), S14 (2009).
42. Harkema, H., Dowling, J.N., Thornblade, T. & Chapman, W.W. ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports. J. Biomed. Inform. 42, 839–851 (2009).
43. Paumier, S. De la reconnaissance de formes linguistiques à l’analyse syntaxique, Université de Marne-la-Vallée (2003).
44. Demner-Fushman, D., Mork, J.G., Shooshan, S.E. & Aronson, A.R. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. J. Biomed. Inform. 43, 587–594 (2010).
45. McCray, A.T., Bodenreider, O., Malley, J.D. & Browne, A.C. Evaluating UMLS strings for natural language processing. Proc. AMIA Symp., 448–452 (2001).
46. Xu, R., Musen, M.A. & Shah, N.H. A comprehensive analysis of five million UMLS metathesaurus terms using eighteen million MEDLINE citations. AMIA Annu. Symp. Proc. 2010, 907–911 (2010).
47. Wu, S.T. et al. UMLS term occurrences in clinical notes: a large scale corpus analysis. J. Am. Med. Inform. Assoc. 19, e149–e156 (2012).
48. Bodenreider, O. & McCray, A.T. Exploring semantic groups through visual approaches. J. Biomed. Inform. 36, 414–432 (2003).
49. Uzuner, O. Recognizing obesity and comorbidities in sparse data. J. Am. Med. Inform. Assoc. 16 (4), 561–570 (2009).
50. Marshall, M.S. et al. Emerging practices for mapping and linking life sciences data using RDF—a case series. Web Semantics: Science, Services and Agents on the World Wide Web 14, 2–13 (2012).

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
