International Application of a New Probability Algorithm for the Diagnosis of Coronary Artery Disease

Robert Detrano, MD, PhD, Andras Janosi, MD, Walter Steinbrunn, MD, Matthias Pfisterer, MD,
Johann-Jakob Schmid, DE, Sarbjit Sandhu, MD, Kern H. Guppy, PhD, Stella Lee, MS,
and Victor Froelicher, MD

A new discriminant function model for estimating probabilities of angiographic coronary disease was tested for reliability and clinical utility in 3 patient test groups. This model, derived from the clinical and noninvasive test results of 303 patients undergoing angiography at the Cleveland Clinic in Cleveland, Ohio, was applied to a group of 425 patients undergoing angiography at the Hungarian Institute of Cardiology in Budapest, Hungary (disease prevalence 38%); 200 patients undergoing angiography at the Veterans Administration Medical Center in Long Beach, California (disease prevalence 75%); and 143 such patients from the University Hospitals in Zurich and Basel, Switzerland (disease prevalence 84%). The probabilities that resulted from the application of the Cleveland algorithm were compared with those derived by applying CADENZA, a Bayesian algorithm derived from published medical studies, to the same 3 patient test groups. Both algorithms overpredicted the probability of disease at the Hungarian and American centers. Overprediction was more pronounced with the use of CADENZA (average overestimation 16 vs 10% and 11 vs 5%, p <0.001). In the Swiss group, the discriminant function underestimated (by 7%) and CADENZA slightly overestimated (by 2%) disease probability. Clinical utility, assessed as the percentage of patients correctly classified, was modestly superior for the new discriminant function as compared with CADENZA in the Hungarian group and similar in the American and Swiss groups. It was concluded that coronary disease probabilities derived from discriminant functions are reliable and clinically useful when applied to patients with chest pain syndromes and intermediate disease prevalence.
(Am J Cardiol 1989;64:304-310)

From the Division of Cardiology, Department of Medicine, Veterans Administration Medical Center, Long Beach, California; the Hungarian Institute of Cardiology, Budapest, Hungary; the Division of Cardiology, Department of Medicine, University Hospital, Zurich, Switzerland; the Division of Cardiology, Department of Medicine, University Hospital, Basel, Switzerland; the Engineering Department, Studer Corporation, Regensdorf, Switzerland; and the Department of Statistics, Stanford University, Palo Alto, California. This work was supported in part by grant IDH-SCOR HL-17651 from the National Institutes of Health, Bethesda, Maryland, to Cedars-Sinai Medical Center, Division of Cardiology, Los Angeles, California. Dr. Janosi is the recipient of a fellowship from the National Institutes of Health. Manuscript received February 27, 1989; revised manuscript received May 12, 1989, and accepted May 14.
Address for reprints: Robert Detrano, MD, Cardiology 111-C, Veterans Administration Medical Center, 5901 East Seventh Street, Long Beach, California 90822.

Despite Osler’s axiom that “medicine is a science of uncertainty and an art of probability,”1 the applicability of probability analysis to the diagnosis of common diseases is still uncertain. Part of this uncertainty is due to the difficulty in obtaining clinical data from very large numbers of patients. Such data are needed to derive accurate probability models that could be applied universally. Diamond et al2 were the first to circumvent this paucity of large data collections. By calculating weighted averages of sensitivities and specificities obtained from a review of published studies on diagnostic testing, and by using tables of pretest probabilities also from those studies, they pooled knowledge obtained from several thousand patients to define a probability algorithm for the diagnosis of coronary artery disease (CAD). Because published reports rarely give complete distributions of clinical and test variables, the investigators had to assume that these variables were independent of one another. This assumption of independence created errors in their predictions. Furthermore, the reported sensitivities and specificities are affected by bias and other methodologic problems3-5 that may actually lead to further errors when probabilities are based on them. We undertook this project to determine if a probability algorithm derived from the clinical and test characteristics of a relatively small group of 303 patients could accurately predict CAD probabilities in study samples drawn from various ethnic populations with different clinical characteristics. This algorithm was compared with the algorithm of Diamond et al.2
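
The pooled approach attributed above to Diamond et al can be illustrated with a minimal sketch: a pretest probability is updated by Bayes' theorem one test at a time, which is valid only if the test results are independent given disease status. The function names and all numeric inputs below are hypothetical illustrations; they are not the values used in CADENZA or in the published studies.

```python
def posttest_probability(pretest, sensitivity, specificity, positive):
    """Apply Bayes' theorem to one binary test result."""
    if positive:
        true_pos = sensitivity * pretest
        false_pos = (1.0 - specificity) * (1.0 - pretest)
        return true_pos / (true_pos + false_pos)
    false_neg = (1.0 - sensitivity) * pretest
    true_neg = specificity * (1.0 - pretest)
    return false_neg / (false_neg + true_neg)


def sequential_bayes(pretest, tests):
    """Feed each posttest probability back in as the next pretest probability.

    tests is a list of (sensitivity, specificity, positive) tuples.
    Treating the tests as independent is the assumption discussed in the text.
    """
    probability = pretest
    for sensitivity, specificity, positive in tests:
        probability = posttest_probability(
            probability, sensitivity, specificity, positive
        )
    return probability


if __name__ == "__main__":
    # Hypothetical patient: pretest probability 0.5, two positive tests.
    illustrative_tests = [(0.8, 0.8, True), (0.8, 0.8, True)]
    print(sequential_bayes(0.5, illustrative_tests))
```

If two tests are in fact correlated, the second update counts partly redundant evidence, which is one way the independence assumption can push estimates toward the extremes.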


METHODS
Reference group for derivation of the probability model: The reference group used to derive the model consisted of 303 consecutive patients referred for coronary angiography at the Cleveland Clinic between May 1981 and September 1984. No patient had a history or electrocardiographic evidence of prior myocardial infarction or known valvular or cardiomyopathic disease. All 303 patients provided a history and underwent physical examination, electrocardiogram at rest, serum cholesterol determination and fasting blood sugar determination as part of their routine evaluation. Historical data were recorded and coded without knowledge of noninvasive or angiographic test data. In addition, after giving informed consent, the patients underwent 3 noninvasive tests as part of a research protocol. The results of these tests (exercise electrocardiogram, thallium scintigraphy and cardiac fluoroscopy) were not interpreted until after the coronary angiograms had been read. These tests were analyzed and the results recorded without knowledge of the historical or angiographic results. Work-up bias was therefore not present.3 The mean age of these patients was 54 years; 206 were men. Angiograms were interpreted by a cardiologist without knowledge of other test data. Further details of this data collection are described elsewhere.6
Clinical and test variables: The 4 clinical variables were age, sex, chest pain type (typical anginal, atypical anginal, nonanginal, asymptomatic) and systolic blood pressure. Routine test data collected included serum cholesterol, fasting blood sugar >120 mg/dl and electrocardiographic results at rest (classified as (1) normal; (2) ST-T-wave abnormality [T-wave inversions or ST depression >0.05 mV or both]; or (3) probable or definite left ventricular hypertrophy by Estes’ criteria).
Noninvasive tests were exercise electrocardiogram, exercise thallium scintigraphy and fluoroscopy for coronary calcium. Exercise data collected included maximal heart rate, exercise-induced angina, slope of the peak exercise ST segment (upsloping, flat or downsloping), exercise-induced ST-segment depression (where 1 mm = 0.1 mV) and exercise thallium scintigraphic defects (fixed, reversible or none). The fluoroscopic data consisted of the number of major vessels that appeared to contain calcium.7 Data for all of these 13 variables were entered into a computerized database.
Derivation of the algorithm: The algorithm was derived by applying logistic regression to the 13 clinical and test variables against the angiographic variable of the presence or absence of a >50% diameter narrowing (dependent variable). Our object was to make the algorithm relevant to clinical situations in which data might be present for only certain combinations of the 13 variables. Derivation of the most complete algorithm would require applying logistic regression to all 8,191 possible combinations of these 13 variables. To simplify this calculation, only “clinically relevant” combinations were allowed. Combinations were considered clinically relevant if they included age, sex and chest pain type, and, in the case when exercise electrocardiography was performed, maximal heart rate and exercise-induced angina. Because ST-segment depressions induced by exercise are sometimes difficult to interpret, clinically relevant combinations do not necessarily include them. However, when the slope of the ST segment could be calculated, the actual depression also was required. In all, there were 352 such relevant combinations and, therefore, 352 logistic regression calculations were performed.
Testing the algorithm: The 352 subsets of regression coefficients were stored and indexed in a computer data file. A computer program was written to read clinical and test data of test patients and match the data available with the appropriate subset of coefficients from this data file. The program then computes the ith patient’s disease probability, $P_i$, using the formula: $P_i = e^{f_i}/(1 + e^{f_i})$, where $f_i$ is the linear combination of this patient’s data, using the appropriate subset of coefficients.
Bayesian method: The Bayesian method based on published medical studies can be applied using a computer program called CADENZA.8 We obtained this program from its authors and transformed all test patient data to fit the input documentation of this computer program. The program uses weighted averages of sensitivities and specificities from published medical works. The pretest probabilities in the program also were derived from published studies.9 The sensitivities, specificities and pretest probabilities are sequentially substituted in Bayes’ theorem. The resultant equation is applied first to pretest probabilities based only on clinical and routine test data (age, sex, chest pain type, history of diabetes and hypertension, serum cholesterol, electrocardiogram at rest and so on) and then consecutively to resulting posttest probabilities from the previous calculation.10
Test group data: Three test groups were analyzed using the 13-variable discriminant function and the Bayesian algorithm. These groups included subjects without a prior catheterization or evidence of prior infarction or valve disease, whose CAD status was therefore unknown but whose angiograms were performed to determine the presence and severity of disease. These test groups were (1) 200 patients at the Veterans Administration Medical Center in Long Beach, California; (2) 425 patients at the Hungarian Institute of Cardiology; and (3) 143 patients at 2 Swiss university hospitals.
In these test groups, noninvasive test results were not withheld from the treating physician, and might have influenced the decision to perform coronary angiography; therefore, workup bias or angiographic referral bias was present.3 Clinical data and noninvasive test results were recorded before (and therefore without knowledge of) the coronary angiography results.
All available data that could be used by either algorithm were collected from the patient records for the test groups. The data included age, sex, chest pain characteristics (as seen earlier), systolic blood pressure at rest, history of hypertension, smoking history, history of



TABLE I Clinical Characteristics of the Study Groups

Study Group            Mean Age (yrs)  Men (%)  Angina (%)  Disease* (%)  MVD† (%)  SBP‡
Long Beach (n = 200)   59              97       66          74            62        135
Hungary (n = 425)      48              71       43          38            63        132
Basel (n = 85)         55              86       73          85            69        139
Zurich (n = 58)        54              93       74          74            40        119
Cleveland (n = 303)    54              68       53          46            60        132

* Disease is defined as >50% diameter narrowing; † MVD = multivessel disease defined by >50% diameter narrowing in >1 vessel; ‡ SBP = mean systolic blood pressure.

TABLE II Sensitivities and Specificities of Exercise Testing in the Study Groups

                  1-mm ST Depression        Exercise Angina           Thallium Defect
                  Sensitivity  Specificity  Sensitivity  Specificity  Sensitivity  Specificity
Long Beach (%)    64           73           72           63           90           45
                  (?/149)      (37/51)      (83/115)     (20/32)      (27/30)      (5/11)
Hungary (%)       67           86           59           89           94           59
                  (109/162)    (225/263)    (95/162)     (233/263)    (15/16)      (10/17)
Basel (%)         29           85           70           ?            ?            ?
                  (21/72)      (11/13)      (?/71)       (?/13)       (?/71)       (?/13)
Zurich (%)        ?            67           30           80           1 patient    1 patient
                  (?/43)       (10/15)      (13/43)      (12/15)
Cleveland (%)     66           73           ?            86           73           79
                  (92/139)     (119/164)    (?/139)      (144/164)    (101/138)    (129/163)

? = illegible.
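
As a worked check on how the entries of Table II relate to the counts printed beneath them, the sketch below recomputes the 1-mm ST depression sensitivity and specificity for the Hungarian group from the counts 109/162 and 225/263 given in the table. The variable and function names are hypothetical, and the prevalence line assumes that the two denominators together make up the whole Hungarian test group.

```python
def percent(numerator, denominator):
    """Return a proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts for 1-mm ST depression in the Hungarian group (Table II).
true_positives, diseased = 109, 162
true_negatives, not_diseased = 225, 263

sensitivity = percent(true_positives, diseased)      # positives among diseased
specificity = percent(true_negatives, not_diseased)  # negatives among nondiseased

# Assumption: diseased + not_diseased is the full test group (162 + 263 = 425).
prevalence = percent(diseased, diseased + not_diseased)

if __name__ == "__main__":
    print(round(sensitivity), round(specificity), round(prevalence))
```

The rounded results agree with the 67% and 86% printed in Table II and with the 38% disease prevalence listed for this group in Table I.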

diabetes, family history, electrocardiogram at rest, serum cholesterol and fasting blood sugar. The exercise variables included medications at the time of the exercise test, duration of the exercise test, peak achieved heart rate, heart rate at rest, peak exercise systolic blood pressure, exercise-induced angina or hypotension or both, exercise-induced ST depression relative to rest, exercise-induced ST slope, exercise-induced R-wave change, radionuclide ejection fraction and wall motion abnormalities at rest and during exercise, and exercise thallium results (as seen earlier). Coronary angiograms were considered abnormal if there was >50% luminal narrowing of any major epicardial vessel. Histories, physical examinations and all noninvasive tests were performed within 6 weeks before the date of the coronary angiogram. A description of the individual test groups follows.
Long Beach Veterans Administration Medical Center: This group was drawn from all consecutive subjects undergoing cardiac catheterization at the Veterans Administration Medical Center in Long Beach between 1984 and 1987. After excluding those with prior infarction, valvular disease and prior catheterization, there were 200 test group subjects.
Hungarian Institute of Cardiology: This group was drawn from all patients undergoing catheterization at the Hungarian Institute of Cardiology in Budapest between 1983 and 1987. Patients with prior infarction or valvular disease were excluded. The remaining 425 subjects made up the Hungarian test group.
Swiss universities: This group was drawn from all subjects undergoing cardiac catheterization at the university hospitals in Zurich and Basel, Switzerland, in 1985. The aforementioned exclusion criteria were applied. Of the 143 Swiss patients, 58 underwent catheterization in Zurich and 85 in Basel.
The clinical characteristics of the subjects in the 3 test groups and the rationale for combining the groups from the Swiss universities are given in the Results section. The observed sensitivities and specificities of the exercise electrocardiogram and exercise thallium scintigraphy also are given in the Results section.
Evaluation of probabilities: The reliability of a probability estimate reflects its numerical proximity to the actual disease prevalence in subjects with similar clinical and test data. If the estimates are reliable, the mean disease probability in a test group will be the same as the disease prevalence in the group. Thus, by subtracting the prevalence of disease from the mean or expected probabilities, and dividing this difference by the standard deviation, we get an overestimation index that is a measure of how much a model over- or underpredicts disease probability. Because this is a mean of an assumedly normally distributed difference divided by its standard deviation,11 comparison of models is simplified. This can be done using the Student t test with a standard deviation of 1.0.
Finer detail can be obtained by sorting the probability estimates in ascending order and then dividing them into quintiles of probability.12 The expected probabilities in each quintile are compared with the prevalences in that quintile and the differences computed. These differences are a measure of the overestimation per quintile, and reflect the reliability of low, intermediate and high estimates.
A probability estimate will be clinically useful if it accurately classifies patients as diseased or not diseased. A probability algorithm will be useful if its probability estimates are clinically useful over an appropriate range of probability thresholds.
We agreed that the most relevant probability thresholds for making decisions concerning angiography or therapy lie between 0.20 and 0.80 for subjects with chest pain syndromes. Therefore, the percentage of correct classifications was calculated over this range (0.2 < p < 0.8) for both algorithms in the 3 test groups.
Subjects whose clinical and test data are concordant will generally have very high or very low probability estimates from any algorithm. We agreed that a clinician would probably not need a probability estimate for clinical decision making in these cases. Such estimates would instead be most useful for cases where patient data are discordant. These patients would have intermediate probability estimates by most algorithms; therefore, the percent correct classification rate was recalculated ignoring all subjects for whom probability estimates from both the discriminant function and CADENZA were out of the range (0.2 < p < 0.8).
To compare the correct classification rates for the 2 algorithms, we used McNemar’s test.

RESULTS
Table I lists the demographic and clinical characteristics of the various patient study groups. Table II lists the sensitivities and specificities of 1 mm = 0.1 mV exercise-induced ST depression, exercise-induced angina pectoris and an abnormal thallium scintigram (fixed or reversible defect or both).
The 2 Swiss groups are very similar with respect to age, sex and symptoms. Because of their small size and their similarity, they were combined into a single group.
Reliability: Overestimation indexes for the probability estimates were significantly higher for CADENZA than for the discriminant function in the American test group (6.1 vs 2.0, p <0.001) and in the Hungarian group (10.4 vs 5.6, p <0.001). In the Swiss group, CADENZA slightly overestimated disease probability, whereas the discriminant function underestimated it. Figure 1A, B and C shows that both models tend to overestimate intermediate disease probabilities (second, third and fourth quintiles), with CADENZA causing the most overestimation. At the Swiss universities, where disease prevalence was highest, both models underestimated low probabilities. In all but the second quintile of this group, the probability estimates derived by CADENZA had a larger absolute error than those obtained using the discriminant function.
Clinical utility: The percentage of patients who are correctly classified will depend on the accuracy of the model and on the overall disease prevalence in the test group. Models that overestimate disease will cause more erroneous diagnoses at low prevalence, but will classify patients correctly at high prevalence. Figure 2A, B and C bears this out. The Cleveland discriminant function more accurately classified patients at the Hungarian Institute, where there was a low disease prevalence. These differences were statistically significant between probability thresholds of 0.4 and 0.7 (p <0.05). In the American group, where disease prevalence was higher, the discriminant function also classified patients more correctly than did CADENZA, but the difference was not significant except at thresholds around 0.6. In the Swiss group, which had the highest disease prevalence, CADENZA resulted in a significantly higher rate of correct classification at thresholds of at least 0.70 (p <0.05).
It is interesting to compare the percentage of correct classifications for the 2 algorithms in the 3 groups at specific thresholds. Table III is such a comparison. Thresholds of 0.4, 0.5 and 0.6 were used because we thought them to be appropriate for many clinical decisions. The table gives the percentage of correct classifications by the 2 algorithms (1) when all patients are included; and (2) when those for whom both algorithms produced very low (≤0.2) or very high (≥0.8) probabilities are excluded. Excluding these latter subjects may be appropriate because this exclusion leaves primarily those patients with discordant results for whom clinical decisions are more difficult and the use of a probability algorithm is more relevant. Both algorithms performed less well when these exclusions were made. The discriminant function performed moderately better than CADENZA, both with and without the exclusion of subjects with extreme probabilities.

TABLE III Percent* Correctly Classified by Using Three Probability Thresholds

                0.4                 0.5                 0.6
                CDF      CADENZA    CDF      CADENZA    CDF      CADENZA
Hungarian       74 ± 2   71 ± 2     77 ± 2†  74 ± 2     82 ± 2†  76 ± 2
  Only those    58 ± 3   56 ± 3     64 ± 3   61 ± 3     73 ± 3†  64 ± 3
Long Beach      78 ± 3   77 ± 3     79 ± 3   77 ± 3     79 ± 3†  75 ± 3
  Only those    64 ± 5   61 ± 5     66 ± 5   60 ± 5     65 ± 5†  57 ± 5
Swiss           82 ± 3   82 ± 3     81 ± 3   81 ± 3     78 ± 3   79 ± 3
  Only those    73 ± 6   77 ± 5     70 ± 6   73 ± 6     66 ± 6   70 ± 6

* Percent ± standard error of percent; † p <0.05 vs CADENZA (McNemar’s test).
CDF = Cleveland discriminant function.
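
The comparison reported in Table III can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the patient data are invented, the standard error is taken to be the usual binomial standard error of a percentage, and McNemar's test is computed in its continuity-corrected chi-square form, which may differ from the variant the authors used. All function names are hypothetical.

```python
import math


def is_correct(probability, diseased, threshold):
    """True when thresholding the probability reproduces the angiographic result."""
    return (probability >= threshold) == diseased


def percent_correct(flags):
    """Percent correctly classified and its binomial standard error (in percent)."""
    n = len(flags)
    rate = sum(flags) / n
    return 100.0 * rate, 100.0 * math.sqrt(rate * (1.0 - rate) / n)


def mcnemar_p_value(flags_a, flags_b):
    """Continuity-corrected McNemar test on paired correct/incorrect flags."""
    only_a = sum(1 for a, b in zip(flags_a, flags_b) if a and not b)
    only_b = sum(1 for a, b in zip(flags_a, flags_b) if b and not a)
    if only_a + only_b == 0:
        return 1.0
    chi_square = (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2.0))


def in_decision_range(probability, low=0.2, high=0.8):
    return low < probability < high


# Invented patients: (discriminant function probability, CADENZA probability, diseased)
patients = [
    (0.10, 0.15, False), (0.30, 0.55, False), (0.45, 0.70, False),
    (0.35, 0.45, True), (0.65, 0.75, True), (0.90, 0.95, True),
    (0.25, 0.62, False), (0.72, 0.85, True),
]
threshold = 0.5

cdf_flags = [is_correct(p_cdf, d, threshold) for p_cdf, _, d in patients]
cadenza_flags = [is_correct(p_cad, d, threshold) for _, p_cad, d in patients]

# Second analysis: drop subjects for whom BOTH estimates fall outside 0.2 < p < 0.8.
kept = [row for row in patients
        if in_decision_range(row[0]) or in_decision_range(row[1])]
kept_cdf_flags = [is_correct(p_cdf, d, threshold) for p_cdf, _, d in kept]

if __name__ == "__main__":
    print(percent_correct(cdf_flags), percent_correct(cadenza_flags))
    print(percent_correct(kept_cdf_flags))
    print(mcnemar_p_value(cdf_flags, cadenza_flags))
```

McNemar's test uses only the discordant pairs, that is, patients classified correctly by one algorithm and incorrectly by the other, which is why it suits a comparison of 2 algorithms applied to the same subjects.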

[Figure 1, panels A to C: bar charts of Overestimation of Probability (vertical axis) by Quintile of Probability (horizontal axis, 1 to 5).]
[Figure 2, panels A to C: curves, including one labeled Cleveland D. F., plotted against Probability Threshold (horizontal axis, 0.2 to 0.8).]

FIGURE 2. Percentage of patients correctly classified at probability thresholds in the American (A), Hungarian (B) and Swiss (C) test groups. Vertical distances between the curves are significant only for thresholds near 0.6 (American), 0.5 to 0.7 (Hungarian) and near 0.7 (Swiss).

FIGURE 1. Overestimation by quintiles for the American (A), Hungarian (B) and Swiss (C) test groups. Bar heights are calculated by subtracting disease prevalence from the average estimated probabilities in each quintile.
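
The bar heights described for Figure 1 can be reproduced with a short sketch: sort the probability estimates, split them into 5 equal-sized groups, and subtract the observed disease prevalence from the mean estimate in each group. The data below are invented for illustration, the function names are hypothetical, and the way ties and uneven group sizes are handled is an assumption rather than the authors' stated procedure.

```python
def overestimation_by_quintile(probabilities, diseased):
    """Mean estimated probability minus observed prevalence, per quintile."""
    pairs = sorted(zip(probabilities, diseased), key=lambda pair: pair[0])
    n = len(pairs)
    if n < 5:
        raise ValueError("need at least 5 subjects to form quintiles")
    differences = []
    for q in range(5):
        chunk = pairs[q * n // 5:(q + 1) * n // 5]
        mean_probability = sum(p for p, _ in chunk) / len(chunk)
        prevalence = sum(1 for _, d in chunk if d) / len(chunk)
        differences.append(mean_probability - prevalence)
    return differences


def mean_overestimation(probabilities, diseased):
    """Mean estimated probability minus overall prevalence for a whole group."""
    n = len(probabilities)
    return sum(probabilities) / n - sum(1 for d in diseased if d) / n


# Invented example: 10 subjects, already listed in ascending order of probability.
example_probabilities = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
example_diseased = [False, False, False, True, False, True, True, True, True, True]

if __name__ == "__main__":
    print(overestimation_by_quintile(example_probabilities, example_diseased))
    print(mean_overestimation(example_probabilities, example_diseased))
```

A positive value in a quintile means the model overestimates disease probability there; a negative value means it underestimates.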


DISCUSSION
The results reported herein illustrate the comparative accuracies of probabilities derived by applying a discriminant function based on clinical and test data from a relatively small group of patients and those derived from a Bayesian algorithm based on published medical studies. The superiority of the discriminant function is most clear at an institution (Hungarian Institute of Cardiology) in which the disease prevalence in an angiographic cohort was relatively low (38%). Its superiority is less clear in the American group, in which the more accurate probabilities of the discriminant function did not produce a significantly higher number of correct diagnoses. In the Swiss group, in which the disease prevalence was highest, the CADENZA algorithm produced a higher percentage of correctly classified patients than did the discriminant function, though its overall reliability was similar.
Cornfield et al13 noted that the interdependency of symptoms generates overconfidence when algorithms based on published medical studies are used, such as the Bayesian algorithm we used here for comparison purposes. This overconfidence is reflected in the over- and underestimation of disease probability that is evident in Figure 1A, B and C.
There is another potential source of overconfidence in the algorithm based on published studies, namely the bias implicit in the conduction, reporting, review and publication of the results of research studies on diagnostic testing.14 Sensitivities and specificities that are high are more likely to be published than those that are low. When these are applied in Bayes’ theorem to subjects in laboratories in which falsely positive and negative results are more common, the resulting after-test probabilities will be too high or too low, as reflected in our

ing the disease probability of subjects in whom the pretest probability is between 0.2 and 0.7. The algorithms based on published studies can be accurately applied when the pretest probability is between 0.7 and 1.0. Such tactics would reduce the number of unnecessary normal coronary angiograms with fewer “missed” patients with severe disease. However, we caution against the application of such algorithms in subjects with very low probability of disease, such as those seen in screening clinics.
As in all previous reports on this topic, angiograms were read by visual assessment of the percent diameter narrowing. Although this was done without knowledge of test results, it is likely that the clinical angiographers making these assessments were aware of clinical symptoms, and it is even more probable that they knew the ages and sexes of their patients. Although visual assessments have well-known drawbacks,19 they are standard practice in most institutions and were used in the groups from which both models were derived. It was therefore decided that they presented a more appropriate referent standard than more refined endpoints, such as quantitative coronary angiography.
The algorithm derived from medical studies had the apparent advantage of having more data available to it. Despite this, it did not outperform the discriminant function in any of the 3 test groups. We20 and others9 have noted that Bayesian algorithms tend toward greater overestimation when more data become available to them. This paradoxic increase in error may account for some of the differences between the 2 algorithms.

Acknowledgment: The editorial assistance of Maggie Meyer is deeply appreciated.
