Abstract

Automatic detection of cases of febrile illness may have potential for early detection of outbreaks of infectious disease, either by identification of anomalous numbers of febrile illness or in concert with other information in diagnosing specific syndromes, such as febrile respiratory syndrome. At most institutions, febrile information is contained only in free-text clinical records. We compared the sensitivity and specificity of three fever detection algorithms for detecting fever from free-text. Keyword CC and CoCo classified patients based on triage chief complaints; Keyword HP classified patients based on dictated emergency department reports. Keyword HP was the most sensitive (sensitivity 0.98, specificity 0.89), and Keyword CC was the most specific (sensitivity 0.61, specificity 1.0). Because chief complaints are available sooner than emergency department reports, we suggest a combined application that classifies patients based on their chief complaint, followed by classification based on their emergency department report once the report becomes available.
© 2004 Elsevier Inc. All rights reserved.
Keywords: Disease outbreaks; Fever; Natural language processing; Computerized patient medical records; Public health surveillance; Surveillance;
Infection control
doi:10.1016/j.jbi.2004.03.002
W.W. Chapman et al. / Journal of Biomedical Informatics 37 (2004) 120–127 121
processing free-text chief complaints recorded in the emergency department [26,35–39]. In this study, we evaluated the performance of three free-text processing algorithms.

$$P(R \mid G) = \frac{P(R)\,P(w_1 \mid R)\,P(w_2 \mid w_1, R) \cdots P(w_n \mid w_1 \ldots w_{n-1}, R)}{\sum_{R} P(R)\,P(w_1 \mid R)\,P(w_2 \mid w_1, R) \cdots P(w_n \mid w_1 \ldots w_{n-1}, R)}$$
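The posterior formula above can be sketched in a few lines. The probability tables below are invented purely for illustration, and the chained word conditionals are collapsed into independent per-word likelihoods (a naive-Bayes simplification of the formula), so this is a sketch of the idea rather than CoCo itself.

```python
# Illustrative Bayesian chief-complaint classifier. Priors and likelihoods
# are made-up numbers; word likelihoods are treated as independent, which
# simplifies the chained conditionals P(w_i | w_1..w_{i-1}, R) above.

def posterior(words, priors, likelihoods):
    """Return P(R | G) for each syndrome R given chief-complaint words G."""
    scores = {}
    for r, p_r in priors.items():
        score = p_r
        for w in words:
            # P(w | R); unseen words get a small smoothing value
            score *= likelihoods[r].get(w, 1e-4)
        scores[r] = score
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}

priors = {"constitutional": 0.3, "respiratory": 0.7}
likelihoods = {
    "constitutional": {"fever": 0.2, "chills": 0.1},
    "respiratory": {"fever": 0.05, "cough": 0.3},
}
post = posterior(["fever", "chills"], priors, likelihoods)
```

With these invented tables, the chief complaint "fever chills" yields a far higher posterior for the constitutional syndrome than for the respiratory one, since both words are likelier under it.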
includes 16 phrases such as "no increase," "not cause," and "gram negative."
(2) Pre-finding negation phrases: phrases that occur before the term they are negating. Pre-finding phrases are used in Regular Expression 1. NegEx currently applies 125 pre-finding negation phrases; however, seven of the pre-finding negation phrases ("no," "denies," "without," "not," "no evidence," "with no," and "negative for") account for 90% of negations in most types of dictated reports [43]. In addition, NegEx uses 21 pre-finding negation phrases to indicate a conditional possibility (e.g., "rule out" and "r/o").
(3) Post-finding negation phrases: phrases that occur after the term they are negating. Post-finding phrases are used in Regular Expression 2. NegEx implements seven post-finding negation phrases (e.g., "free" and "are ruled out") and 14 post-finding phrases to indicate conditional possibility (e.g., "did not rule out" and "is to be ruled out").

NegEx's algorithm works as follows:
• For each sentence, find all negation phrases.
• Go to the first negation phrase in the sentence (Neg1).
• If Neg1 is a pseudo-negation phrase, skip to the next negation phrase in the sentence.
• If Neg1 is a pre-finding negation phrase, define a window of six terms after Neg1; if Neg1 is a post-finding negation phrase, define a window of six terms before Neg1.
• Decide whether to decrease the window size (relevant if another negation phrase or a conjunction like "but" is found within the window).
• Mark all indexed findings within the window as either negated (if the negation phrase is a negating phrase) or possible (if the negation phrase is a conditional possibility phrase).
• Repeat for all negation phrases in the sentence.
• Repeat for all sentences.

As an example, consider the following sentence: "He says he has not vomited but is short of breath with fever." Indexed findings are italicized, negation phrases are in bold, and conjunctions that stop the scope of the negation phrase are underlined. All of the indexed findings in the sentence are eligible for negation based on Regular Expression 1 triggered by the negation phrase "not." However, the presence of the word "but" within the window prevents short of breath and fever from being negated, so NegEx only negates vomited.

We incorporated NegEx within the Keyword HP algorithm to determine whether indexed instances of fever (described below) were negated.

3. Methods

We measured the classification accuracy of three fever detection algorithms on a test set consisting of 213 patients seen at the University of Pittsburgh Medical Center (UPMC) Presbyterian Hospital. We compared the detection performance of the three algorithms against physician classification of fever based on the patient's ED report. In particular, we measured sensitivity, specificity, and likelihood ratio positive (LR+).

3.1. Test collection

The test and control cases were randomly selected from hospitalized patients seen during the period 02/01/02 to 12/31/02. One-half of the patients were drawn from patients with an ICD-9 hospital discharge diagnosis of 780.6 (fever); the other half from patients without an ICD-9 diagnosis of 780.6. All medical records were de-identified in accordance with the procedures approved by the UPMC Institutional Review Board. A physician board-certified in internal medicine and infectious diseases reviewed reports dictated from the emergency department to determine if the patients met our definition of febrile illness. We defined febrile illness as being present if there was either (1) a measured temperature ≥ 38.0 °C or (2) a description of recent fever or chills. The measured temperature could have been determined in the ED, by the patient, or at another institution such as a nursing home. The physician's judgments about whether the patients were febrile based on the dictated ED reports comprised the gold standard answers for the test set. To better understand how fever was described in ED reports, for patients that met the definition of febrile illness the physician also noted whether the patient had a measured temperature, at what location the temperature was measured, and who reported the fever.

3.2. Fever detection algorithms

We evaluated three free-text processing algorithms by comparing their classifications of febrile illness against classifications made by the gold standard physician based on manual review of ED reports.

3.2.1. Keyword HP algorithm

The Keyword HP algorithm was designed for this study to detect fever from history and physical exams dictated in the emergency department (ED). The algorithm accounts for contextual information about negation and hypothetical descriptions to eliminate false positive classifications. The logic of this algorithm is satisfied if either of two clauses is true:
1. (report contains a fever keyword AND the fever keyword is not negated AND the fever keyword is not in a hypothetical statement) OR
2. (report describes a measured temperature ≥ 38.0 °C).
If either of the clauses is satisfied, the algorithm classifies the patient as febrile. We describe the logic for the two clauses below.
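The NegEx windowing steps listed in Section 2 can be sketched as follows. The phrase sets here are tiny illustrative samples rather than NegEx's full lists, and phrase matching is limited to one- and two-word phrases, so this is a simplified sketch of the scope rules, not the released implementation.

```python
# Simplified NegEx-style scope rules: pseudo-negation phrases are skipped,
# pre-finding phrases negate up to six terms to the right, post-finding
# phrases up to six terms to the left, and a conjunction such as "but"
# truncates the window. Phrase sets below are small samples only.

PSEUDO = {"no increase", "gram negative"}
PRE = {"no", "denies", "without", "not", "no evidence"}
POST = {"free"}
CONJUNCTIONS = {"but"}

def _phrase_at(tokens, i):
    """Match a sample negation phrase of up to two tokens starting at i."""
    for n in (2, 1):
        if i + n > len(tokens):
            continue
        cand = " ".join(tokens[i:i + n])
        if cand in PSEUDO:
            return "pseudo", n
        if cand in PRE:
            return "pre", n
        if cand in POST:
            return "post", n
    return None, 1

def negated_terms(sentence, findings):
    """Return the indexed findings this simplified NegEx would negate."""
    tokens = sentence.lower().split()
    negated = set()
    i = 0
    while i < len(tokens):
        kind, n = _phrase_at(tokens, i)
        if kind == "pre":        # window: six terms after the phrase
            window = tokens[i + n:i + n + 6]
        elif kind == "post":     # window: six terms before the phrase
            window = tokens[max(0, i - 6):i]
        else:                    # no phrase here, or pseudo-negation: skip
            i += n
            continue
        for w in window:
            if w in CONJUNCTIONS:   # a conjunction truncates the scope
                break
            if w in findings:
                negated.add(w)
        i += n
    return negated

# "not" negates "vomited", but "but" stops the scope before "fever"
example = "he says he has not vomited but is short of breath with fever"
negated_terms(example, {"vomited", "fever"})  # -> {"vomited"}
```

Running the sketch on the worked example from Section 2 reproduces the behavior described there: only vomited is negated, while the conjunction "but" shields short of breath and fever.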
For clause 1 to be true, the report must contain a fever keyword. The set of fever keywords includes: fever(s), febrile, chill*, and low(-)grade temp*, where characters within parentheses are optional and asterisks indicate any character, including a white space. In this way, fevers, chills, and low-grade temperature would be considered fever keywords.

Clause 1 also requires the fever keyword not to be negated. To determine if a fever keyword was negated, the Keyword HP algorithm uses a regular-expression negation algorithm called NegEx [43,44], which is described in Section 2 of this paper. NegEx looks for dozens of negation phrases, such as denies or no, up to six terms before the fever keyword and for a few negation phrases, such as free or unlikely, up to six terms after the fever keyword. In the sentence "The patient denies any occurrence of fever" the keyword fever would be considered negated; therefore, Keyword HP would not classify the patient as having febrile illness. If multiple fever keywords were found in a single sentence (e.g., fever and chills), and one of the fever keywords was negated, the other fever keywords were also considered negated, regardless of whether NegEx considered them negated or not.

The last requirement of clause 1 is that the fever keyword not occur in a hypothetical statement. To determine if a fever keyword was used in a hypothetical statement, the algorithm looks for descriptions of fever occurring in the future, as in "The patient should return for increased shortness of breath or fever." If a fever keyword was preceded in the sentence by the word if, return, should, or as needed, the keyword was considered to be used in a hypothetical statement, and the algorithm did not classify the patient as febrile. In addition, if the word spotted preceded a fever keyword, the algorithm did not consider it an instance of fever, in order to eliminate sentences hypothesizing that the patient might have (Rocky Mountain) spotted fever.

Clause 2 is true if the report describes a measured temperature. To determine the presence of a measured temperature in a report, we first located any occurrences of the word temp*. If a number ranging inclusively from 38 to 44 or from 100.4 to 111.2 occurred within nine words after temp*, the patient was considered to have a measured temperature meeting the definition of febrile illness. For example, in the sentence "Her temperature was measured at the nursing home as being 38.5 °C" the algorithm would classify the patient as febrile.

3.2.2. Keyword CC algorithm

The Keyword CC algorithm was designed to detect cases of febrile illness from chief complaints electronically entered on admission to the ED. If any of the fever keywords used by Keyword HP or the term temp* appeared in the chief complaint, the patient was considered febrile. For example, a patient with the chief complaint "increased temperature" or "fever" would be classified as febrile by Keyword CC. Because chief complaints are syntactically simple phrases describing a recent problem, Keyword CC did not use negation or hypothetical statement detection.

3.2.3. CoCo algorithm

We evaluated a second algorithm for detecting fever from chief complaints by applying CoCo (described in Section 2) to the patients' chief complaints and determining whether the patients had a constitutional syndrome. Patients with chief complaints indicating a fever are currently classified by CoCo as having a constitutional syndrome, as are patients who present with non-localized complaints typical of many illnesses in their early stages, such as malaise, lethargy, or generalized aches. We applied CoCo to the problem of fever detection to capture two possible scenarios. First, it is possible that some febrile patients presenting to the ED do not yet complain of fever but are experiencing other constitutional symptoms that typically occur with a fever. Second, because chief complaints are short phrases that are designed to represent the most pertinent symptoms rather than to give a complete description of a patient's clinical condition, even when a patient reports fever to the triage nurse, a word indicating fever may not be included in the chief complaint. CoCo thus represents a potentially more sensitive algorithm for detecting febrile patients whose chief complaints do not explicitly indicate fever. For this study, any patient classified by CoCo as having a constitutional syndrome was considered febrile.

3.3. Outcome measures

We calculated the sensitivity, specificity, and LR+ of the fever detection algorithms compared against the gold standard classifications made by the physician as shown below, where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.

$$\text{Sensitivity} = \frac{TP}{TP + FN}; \qquad \text{Specificity} = \frac{TN}{TN + FP}; \qquad \mathrm{LR{+}} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}.$$

4. Results

Table 1 shows the accuracy of classification with 95% confidence intervals for the three algorithms when compared against gold standard classifications. The Keyword HP algorithm was the most sensitive algorithm.
Table 1
Performance of three fever detection algorithms
Keyword HP Keyword CC CoCo
Sensitivity 0.98 (107/109) 0.61 (66/109) 0.57 (62/109)
[95% CI] [0.94–0.995] [0.52–0.69] [0.48–0.66]
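The point estimates in Table 1 follow directly from the Section 3.3 formulas. In the sketch below, the true-negative/false-positive split for Keyword HP (93/11 among 213 − 109 = 104 afebrile patients) is inferred from the 11 false positives reported in the error analysis; it is an assumption for illustration, not a value taken from the table.

```python
# Outcome measures from Section 3.3 applied to the study counts.

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def lr_positive(sens, spec):
    # LR+ = sensitivity / (1 - specificity); undefined at specificity 1.0
    return sens / (1 - spec) if spec < 1 else float("inf")

sens = sensitivity(107, 2)   # Keyword HP: 107 of 109 febrile patients
round(sens, 2)               # -> 0.98, matching Table 1

# Assumed split: 11 false positives among 104 afebrile patients
spec = specificity(93, 11)
round(spec, 2)               # -> 0.89, matching the abstract
```

Note that Keyword CC's specificity of 1.0 makes its LR+ infinite, which is why the paper reports it as the most specific algorithm.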
4.1. Description of fever in data set

Prevalence of fever in the data set was 51% (109/213). Only nine (4%) of the 213 reports contained no information about temperature or fever. Criteria by which the gold standard physician determined the patient was febrile were as follows. Of the 109 patients with fever, 96 (88%) had a measured temperature. In 80 (73%) of the patients with fever, the temperature was measured and found to be elevated in the ED, whereas in 16 (15%) instances the temperature had been taken by the patient or at an institution where the patient had been previously. In 13 (12%) of the 109 patients with fever, the fever or chills was self-reported or a report of fever came from another institution.

4.2. Error analysis of fever detection algorithms

4.2.1. Keyword CC algorithm

All patients with fever indicated in the chief complaint were detected by Keyword CC, and all patients with a fever keyword in the chief complaint had a fever according to the gold standard classification, indicating that the Keyword CC algorithm is precise and specific. However, Keyword CC had a sensitivity of only 0.61. False negatives were due to chief complaints that did not explicitly indicate a fever. Thirteen of the false negatives included patients with constitutional chief complaints that were detected correctly by CoCo, as described below. However, the majority of the false negatives were due to chief complaints not generally associated with febrile illness, including headache, tachypnea, sob, altered mental status, dehydration, leg swelling, and chest pain. Five patients had chief complaints describing a disease or syndrome often associated with fever, such as conjunctivitis, bacteremia, and flu like symptoms, and four patients had chief complaints that, instead of describing a clinical complaint, described an evaluation or procedure for which the patient came to the ED.

4.2.2. CoCo algorithm

CoCo had slightly lower sensitivity and specificity than Keyword CC. Chief complaints for 16 febrile patients contained a fever or temperature keyword but were not detected by CoCo. The reason CoCo did not accurately classify these patients as having a constitutional syndrome (in spite of CoCo's being trained to classify chief complaints with fever as constitutional) involves the current method CoCo uses for determining the best syndromic classification when multiple classifications exist. Currently, CoCo selects the single syndromic classification with the highest probability. Thus, chief complaints such as rash/fever, nausea/vomiting/fever, or fever and headaches were classified as rash, gastrointestinal, and neurological, respectively, because the posterior probabilities for those syndromes were higher than the probabilities for constitutional syndrome.

Thirteen of the febrile patients were detected by CoCo but not by Keyword CC. All of these patients had chief complaints indicating a constitutional illness that did not specifically mention fever, such as sepsis, viral infection, and weakness. However, CoCo also generated five false positive classifications for patients with chief complaints of viral infection, dizziness, muscle aches, and weakness.

4.2.3. Keyword HP algorithm

Keyword HP generated two false negatives and 11 false positives. One of the false negatives was due to a vague description of fever: "he felt warm." The other false negative was an error on the part of the expert physician, who classified an afebrile patient as febrile. Four false positives were due to contradictions in the record between a report of fever and the measured temperature. For example, one patient was described as febrile for the last week, but his measured temperature in the ED was 37.6 °C. Three false positives were due to NegEx errors in which fever keywords were not properly negated, and one was due to not identifying a fever keyword as part of a hypothetical statement ("We have given her instructions on what to watch out for, including . . . fever, chills, distention . . ."). Two false positives were due to a conflict between the resident's and the attending's notes, and in one instance "fever of unknown origin" was interpreted by the gold standard physician as describing an undocumented sign rather than a possible diagnosis.
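The CoCo misses described in Section 4.2.2 stem from returning only the top-probability syndrome. A minimal sketch, with posterior values invented purely for illustration, shows how a complaint like "rash/fever" can lose its constitutional label to a competing syndrome:

```python
# CoCo-style single-label selection: only the syndrome with the highest
# posterior probability is returned. The numbers below are invented to
# illustrate why multi-symptom complaints can miss the constitutional label.

def coco_label(posteriors):
    """Return the single syndromic classification with highest probability."""
    return max(posteriors, key=posteriors.get)

# Hypothetical posteriors for the chief complaint "rash/fever"
posteriors_rash_fever = {"rash": 0.55, "constitutional": 0.40, "other": 0.05}
coco_label(posteriors_rash_fever)  # -> "rash", so the fever goes undetected
```

Returning every syndrome above a probability threshold, rather than the single argmax, would avoid this failure mode at the cost of lower specificity.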
linguistic variation in reporting may exist across the United States.

6. Conclusion

We measured the ability of three algorithms to detect patients with a fever from free-text medical records. Two of the algorithms used triage chief complaints to classify the patients, and a third used the information described in the dictated ED report. The algorithm using information from the ED report was the most sensitive, whereas the algorithms using information from the chief complaint were the most specific. A surveillance application incorporating fever detection from chief complaints (which are the earliest electronic clinical data available in an emergency care facility), followed by detection from ED reports as they become available, may provide an effective method for surveillance of febrile illness.

Acknowledgments

We acknowledge Zhongwei Lu for his programming assistance. This work was partially funded by Defense Advanced Research Projects Agency (DARPA) Cooperative Agreement No. F30602-01-2-0550; Pennsylvania Department of Health Grant No. ME-01-737; and AHRQ Grant No. 290-00-0009.

References

[1] Available from: http://www.health.pitt.edu/rods/. Accessed April 16, 2003.
[2] Available from: http://www.geis.ha.osd.mil/GEIS/SurveillanceActivities/ESSENCE/ESSENCE.asp. Accessed April 16, 2003.
[3] Lober WB, Karras BT, Wagner MM, et al. Roundtable on bioterrorism detection: information system-based surveillance. J Am Med Inform Assoc 2002;9(2):105–15.
[4] Available from: http://secure.cirg.washington.edu/bt2001amia/index.htm. Accessed April 16, 2003.
[5] Zelicoff A, Brillman J, Forslund DW, et al. The rapid syndrome validation project (RSVP), a technical paper. Sandia National Laboratories; 2001.
[6] Available from: http://www.nytimes.com/2003/04/04/nyregion/04WARN.html?ex=1050552000&en=2c63a52eb80102bc&ei=5070. Accessed April 16, 2003.
[7] Brinsfield K, Gunn J, Barry M, McKenna V, Dyer K, Sulis C. Using volume-based surveillance for an outbreak early warning system. Acad Emerg Med 2001;8:492.
[8] Reis BY, Mandl KD. Time series modeling for syndromic surveillance. BMC Med Inform Decis Making 2003;3(1):2.
[9] Lazarus R, Kleinman K, Dashevsky I, et al. Use of automated ambulatory-care encounter records for detection of acute illness clusters, including potential bioterrorism events. Emerg Infect Dis 2002;8(8):753–60.
[10] Lazarus R, Kleinman KP, Dashevsky I, DeMaria A, Platt R. Using automated medical records for rapid identification of illness syndromes (syndromic surveillance): the example of lower respiratory infection. BMC Public Health 2001;1(1):9.
[11] Matsui T, Takahashi H, Ohyama T, et al. An evaluation of syndromic surveillance for the G8 Summit in Miyazaki and Fukuoka, 2000. Kansenshogaku Zasshi 2002;76(3):161–6.
[12] Irvin CB, Nouhan PP, Rice K. Syndromic analysis of computerized emergency department patients' chief complaints: an opportunity for bioterrorism and influenza surveillance. Ann Emerg Med 2003;41(4):447–52.
[13] Gesteland PH, Wagner MM, Chapman WW, et al. Rapid deployment of an electronic disease surveillance system in the state of Utah for the 2002 olympic winter games. Proc AMIA Symp 2002;3:285–9.
[14] Tsui FC, Espino JU, Wagner MM, et al. Data, network, and application: technical description of the Utah RODS winter olympic biosurveillance system. Proc AMIA Symp 2002:815–9.
[15] Lewis MD, Pavlin JA, Mansfield JL, et al. Disease outbreak detection system using syndromic data in the greater Washington, DC area. Am J Prev Med 2002;23(3):180–6.
[16] Recognition of illness associated with the intentional release of a biologic agent. MMWR Morb Mortal Wkly Rep 2001;50(41):893–7.
[17] Bravata DM, McDonald K, Owens DK, et al. Bioterrorism: information technologies and decision support systems. Evidence Report/Technology Assessment No. 5 (Prepared by UCSF-Stanford Evidence-based Practice Center under Contract No. 290-97-0013). Rockville, MD: Agency for Healthcare Research and Quality; 2002.
[18] Health CDoP. Bioterrorism: a public health threat. CD Info 2000;2(5).
[19] Monterey County Health Department Disaster Plan. Monterey, CA: Monterey County Health Department; October 9, 2001. Available from: http://www.co.monterey.ca.us/health/Publications/pdf/MCHDDisasterPlan.pdf. Accessed December 9, 2003.
[20] Available from: http://www.who.int/csr/sars/casedefinition/en/. Accessed April 12, 2003.
[21] Available from: http://www.cdc.gov/ncidod/sars/casedefinition.htm. Accessed April 12, 2003.
[22] Chen SY, Su CP, Ma MH, et al. Predictive model of diagnosing probable cases of severe acute respiratory syndrome in febrile patients with exposure risk. Ann Emerg Med 2004;43(1):1–5.
[23] Friedman C, Hripcsak G. Natural language processing and its future in medicine. Acad Med 1999;74(8):890–5.
[24] Mendonca EA, Cimino JJ, Johnson SB. Using narrative reports to support a digital library. Proc AMIA Symp 2001:458–62.
[25] Chapman WW, Cooper GF, Hanbury P, Chapman BE, Harrison LH, Wagner MM. Creating a text classifier to detect radiology reports describing mediastinal findings associated with inhalational anthrax and other disorders. J Am Med Inform Assoc 2003;10(5):494–503.
[26] Olszewski RT. Bayesian classification of triage diagnoses for the early detection of epidemics. In: Recent Advances in Artificial Intelligence: Proceedings of the 16th International FLAIRS Conference, 2003. AAAI Press; 2003. p. 412–6.
[27] Cho PS, Taira RK, Kangarloo H. Automatic section segmentation of medical reports. Proc AMIA Annu Fall Symp 2003.
[28] Haug PJ, Christensen L, Gundersen M, Clemons B, Koehler S, Bauer K. A natural language parsing system for encoding admitting diagnoses. Proc AMIA Annu Fall Symp 1997:814–8.
[29] Taira RK, Bui AA, Kangarloo H. Identification of patient name references within medical documents using semantic selectional restrictions. Proc AMIA Symp 2002:757–61.
[30] Friedman C. A broad-coverage natural language processing system. Proc AMIA Symp 2000:270–4.
[31] Baud RH, Lovis C, Ruch P, Rassinoux AM. A light knowledge model for linguistic applications. Proc AMIA Symp 2001:37–41.
[32] Fiszman M, Chapman WW, Aronsky D, Evans RS, Haug PJ. Automatic detection of acute bacterial pneumonia from chest X-ray reports. J Am Med Inform Assoc 2000;7(6):593–604.
[33] Hahn U, Romacker M, Schulz S. MEDSYNDIKATE—a natural language system for the extraction of medical information from findings reports. Int J Med Inf 2002;67(1–3):63–74.
[34] Taira RK, Soderland SG. A statistical natural language processor for medical reports. Proc AMIA Symp 1999:970–4.
[35] Chapman WW, Christensen L, Wagner MM, et al. Classifying free-text triage chief complaints into syndromic categories with natural language processing. AI Med 2004 [in press].
[36] Ivanov O, Wagner MM, Chapman WW, Olszewski RT. Accuracy of three classifiers of acute gastrointestinal syndrome for syndromic surveillance. Proc AMIA Symp 2002:345–9.
[37] Ivanov O, Gesteland P, Hogan W, Mundorff MB, Wagner MM. Detection of pediatric respiratory and gastrointestinal outbreaks from free-text chief complaints. Proc AMIA Annu Fall Symp 2003:318–22.
[38] Travers DA, Waller A, Haas SW, Lober WB, Beard C. Emergency department data for bioterrorism surveillance: electronic data availability, timeliness, sources and standards. Proc AMIA Symp 2003:664–8.
[39] Travers DA, Haas SW. Using nurses' natural language entries to build a concept-oriented terminology for patients' chief complaints in the emergency department. J Biomed Inform 2003;36(4–5):260–70.
[40] Tsui FC, Espino JU, Dato VM, Gesteland PH, Hutman J, Wagner MM. Technical description of RODS: a real-time public health surveillance system. J Am Med Inform Assoc 2003;10(5):399–408.
[41] Wong W, Moore AW, Cooper G, Wagner M. Rule-based anomaly pattern detection for detecting disease outbreaks. In: Proceedings of the 18th National Conference on Artificial Intelligence (AAAI-02); 2002.
[42] Manning CD, Schutze H. Foundations of statistical natural language processing. Cambridge, MA: The MIT Press; 1999.
[43] Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. Evaluation of negation phrases in narrative clinical reports. Proc AMIA Symp 2001:105–9.
[44] Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 2001;34(5):301–10.