This round will deal mostly with
Med Assoc J 1981; 124: 555-558) presented 10 reasons to read clinical journals and introduced a flowchart of guides for reading them (Fig. 1) that suggests four universal guides for any article (consider the title, the authors, the summary and the site) and points out that further guides for reading (and discarding) articles depend on why they are being read.
This round will present guides for reading articles that describe diagnostic tests, both old and new. First, however, we must give some nominal definitions.

The serum level of thyroxine (T4) can be measured in at least four circumstances, and it is important for us to tell them apart. First, a group of passers-by in a shopping plaza or the members of a senior citizens' club may be invited to have a free T4 test; this testing of apparently healthy volunteers from the general population for the purpose of separating them into groups with high and low probabilities for thyroid disease is called screening. Second, patients who come to a clinician's office for any illness may have a T4 test routinely added to whatever laboratory studies are undertaken to diagnose their chief complaints; this testing of patients for disorders that are unrelated to the reason they came to the clinician is called case

Reprint requests to: Dr. R.B. Haynes, McMaster University Health Sciences Centre, Rm. 3V43D, 1200 Main St. W, Hamilton, Ont. L8N 3Z5

[Fig. 1 flowchart]
Look at the TITLE: interesting or useful?
YES: Review the AUTHORS: good track record?
YES or DON'T KNOW: Read the SUMMARY: if valid, would these results be useful?
YES: Consider the SITE: if valid, would these results apply in your practice?
FIG. 1-The first steps in how to read articles in a clinical journal.
consider the diagnostic test: Does it have something to offer that the gold standard does not? For example, is it less risky, less uncomfortable or less embarrassing for the patient, less costly or applicable earlier in the course of the illness? Again, if the proposed diagnostic test offers no theoretical advantage over the gold standard, why read
Having satisfied yourself that it's

*Of course, the gold standard mustn't include the diagnostic test result as one of its components, for the resulting "incorporation bias" would invalidate the whole comparison.3

Table I-Elements of the proper clinical evaluation of a diagnostic test


Table II-Fourfold table demonstrating "blind" comparison with "gold standard"

Rows: diagnostic test result (conclusion from the results of the test). Columns: gold standard.

                                     Patient has          Patient does not
                                     the disease          have the disease
Positive: patient appears            a (true positive)    b (false positive)     a + b
to have the disease
Negative: patient appears            c (false negative)   d (true negative)      c + d
not to have the disease

Stable properties:
a/(a + c) = sensitivity
d/(b + d) = specificity
Frequency-dependent properties:
a/(a + b) = positive predictive value*
d/(c + d) = negative predictive value
(a + d)/(a + b + c + d) = accuracy
(a + c)/(a + b + c + d) = prevalence
*Positive predictive value can be calculated other ways too. One of them uses Bayes' theorem:
positive predictive value = (prevalence)(sensitivity) / [(prevalence)(sensitivity) + (1 - prevalence)(1 - specificity)]
tive diagnostic test result, in what proportion, a/(a + b), have we correctly predicted, or "ruled in", the correct diagnosis? This proportion, a/(a + b), again usually expressed as a percentage, goes by the name positive predictive value.

Similarly, we want to know how well a negative test result correctly predicts the absence of, or "rules out", the disease in question. This proportion, d/(c + d), is named the negative predictive value.

Another property of interest is the overall rate of agreement between the diagnostic test and the gold standard. Table II reveals that this could be expressed by the fraction (a + d)/(a + b + c + d); this rate is usually called accuracy.*
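The definitions above can be computed directly from the four cells of a fourfold table laid out as in Table II. What follows is a minimal sketch in Python; the names `FourfoldTable` and `fourfold_properties` are hypothetical, and the example counts (a = 55, b = 7, c = 49, d = 84) are not printed in the article but follow from the fractions reported in Table III (55/62, 84/133, 55/104 and 84/91).

```python
from dataclasses import dataclass


@dataclass
class FourfoldTable:
    """Cell counts laid out as in Table II (rows: test result; columns: gold standard)."""
    a: int  # true positives: test positive, disease present
    b: int  # false positives: test positive, disease absent
    c: int  # false negatives: test negative, disease present
    d: int  # true negatives: test negative, disease absent


def fourfold_properties(t: FourfoldTable) -> dict:
    """Return the stable and frequency-dependent properties listed under Table II."""
    n = t.a + t.b + t.c + t.d
    return {
        # Stable properties
        "sensitivity": t.a / (t.a + t.c),
        "specificity": t.d / (t.b + t.d),
        # Frequency-dependent properties
        "positive_predictive_value": t.a / (t.a + t.b),
        "negative_predictive_value": t.d / (t.c + t.d),
        "accuracy": (t.a + t.d) / n,
        "prevalence": (t.a + t.c) / n,
    }


# Counts implied by the fractions in Table III (exercise ECG, about half the men diseased).
table_iii = FourfoldTable(a=55, b=7, c=49, d=84)
for name, value in fourfold_properties(table_iii).items():
    print(f"{name}: {value:.0%}")
```

The sensitivity, specificity, predictive values and prevalence printed by this sketch agree with the percentages given in Table III.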
If a diagnostic test's predictive value constitutes the focus of our clinical interest, why waste time considering its sensitivity and specificity? The reason is a fundamental one that has major implications, not just for the rational use of diagnostic tests, but also for the basic education of clinicians. Put simply, a diagnostic test's positive and negative predictive values fluctuate widely, depending on the proportion of truly diseased individuals among patients to whom the test is applied; in Table II this is the proportion (a + c)/(a + b + c + d), a property called prevalence.
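This fluctuation can be sketched with the Bayes' theorem form given in the footnote to Table II, holding sensitivity and specificity fixed and varying only prevalence. In this minimal Python sketch the function name `predictive_values` is hypothetical; the inputs (sensitivity 53%, specificity 92% for the exercise ECG, and pretest likelihoods of 5%, one sixth, 46% and 94%) are the figures the article itself uses in its coronary artery disease examples.

```python
from __future__ import annotations


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (positive, negative) predictive value via Bayes' theorem."""
    true_pos = prevalence * sensitivity               # share of all patients in cell a
    false_pos = (1 - prevalence) * (1 - specificity)  # share in cell b
    false_neg = prevalence * (1 - sensitivity)        # share in cell c
    true_neg = (1 - prevalence) * specificity         # share in cell d
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Exercise ECG: sensitivity 53%, specificity 92% (Table III), held constant.
SENSITIVITY, SPECIFICITY = 0.53, 0.92

for prevalence in (0.05, 1 / 6, 0.46, 0.94):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    # 1 - npv is the likelihood of disease that remains after a negative test.
    print(f"prevalence {prevalence:.0%}: positive test -> {ppv:.0%}, "
          f"negative test -> {1 - npv:.0%}")
```

With these inputs the sketch gives about 26% and 3% at a 5% pretest likelihood, 57% and 9% at one sixth (a negative predictive value of about 91%), 85% and 30% at 46%, and 99% and 89% at 94%. These are the values the article reports for Table IV and for its 30-, 45- and 62-year-old patients.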
Although a diagnostic test's sen-

*Galen and Gambino,4 who have written a very thorough and easily understood book on this topic, call this property "efficiency". We won't.

Table III-Postexercise electrocardiogram as a predictor of coronary artery stenosis when the disease is present in half the men tested5

Positive predictive value = a/(a + b) = 55/62 = 89%
Negative predictive value = d/(c + d) = 84/133 = 63%
Sensitivity = a/(a + c) = 55/104 = 53%
Specificity = d/(b + d) = 84/91 = 92%
Prevalence = (a + c)/(a + b + c + d) = 104/195 = 53%
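The article describes how to carry Table III over to a lower-prevalence setting (Table IV): keep the 104 diseased men, add five times as many nondiseased men, apply the unchanged sensitivity and specificity, and fill the remaining cells by subtraction. A minimal Python sketch of that procedure follows. The function name `rebuild_table` is hypothetical. Rounding to whole patients and using the rounded 53% and 92% quoted in the text are assumptions that reproduce the 55 and 478 used in Table IV. With the exact Table III fraction 84/91 instead, the true negatives would come out at 480.

```python
def rebuild_table(diseased: int, nondiseased: int,
                  sensitivity: float, specificity: float) -> dict:
    """Fill a fourfold table for a new case mix, holding sensitivity and specificity fixed."""
    a = round(sensitivity * diseased)      # true positives, rounded to whole patients
    d = round(specificity * nondiseased)   # true negatives, rounded to whole patients
    c = diseased - a                       # false negatives, by subtraction
    b = nondiseased - d                    # false positives, by subtraction
    return {"a": a, "b": b, "c": c, "d": d}


# Same 104 diseased men as Table III, plus five times as many nondiseased men.
cells = rebuild_table(diseased=104, nondiseased=5 * 104,
                      sensitivity=0.53, specificity=0.92)
a, b, c, d = cells["a"], cells["b"], cells["c"], cells["d"]
print(cells)
print(f"PPV = {a}/{a + b} = {a / (a + b):.0%}")
print(f"NPV = {d}/{c + d} = {d / (c + d):.0%}")
print(f"prevalence = {a + c}/{a + b + c + d} = {(a + c) / (a + b + c + d):.0%}")
```

The resulting cells (a = 55, b = 42, c = 49, d = 478) give the 55/97, 478/527 and 104/624 fractions shown in Table IV.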
standard arteriographic results (a + c)/(a + b + c + d) or 104/195 or 53% of the patients had marked coronary artery stenosis, a highly selected group of patients indeed.

What would happen if enthusiasts adopted the multistage stress test for wider use in an effort to detect significant coronary disease in men who want to take up jogging or other sports, regardless of whether they had any chest pain?* Would a positive stress test still be useful?

The results of applying this test to a less carefully selected group of men are entirely predictable (Table IV). If the true prevalence of marked coronary artery stenosis, as assessed by the gold standard of arteriography, was only 1/6 (104/624 or 17%) rather than better than 1/2 (104/195 or 53%), the test's positive predictive value would fall from 89% to 57% and its negative predictive value would rise from 63% to 91%, the reverse of the original situation.†

Now, we said that this result could be forecast from Table III, and it is this forecasting feature that permits a reader to translate the results of a diagnostic test evaluation to his or her own setting. All that is needed is a rough estimate of the prevalence of the disease in one's own practice (from personal experience) or practices like it (from other articles) and some simple arithmetic. For example, as we've charitably estimated for Table IV, approximately one sixth of all men (both symptomatic and asymptomatic) sent for coronary arteriography from a primary care setting might ultimately be found to have coronary artery stenosis. Thus, if we started with the original number of patients with coronary artery disease (104), five times this number (520) would be free of the disease. Because sensitivity remains constant, 55 (53%) of the 104 diseased men would have positive exercise ECGs. Similarly, because specificity remains at 92%, 478 of the 520 nondiseased men would have negative tests. The rest of the table can then be completed by adding or subtracting to fill in the appropriate boxes, and the predictive values and accuracy can then be calculated. In this or any other example, then, the positive predictive value falls and the negative predictive value rises when a diagnostic test developed for patients with a high prevalence of the target disorder is subsequently applied to patients with a lower prevalence of the disorder.

Table IV-Postexercise electrocardiogram as a predictor of coronary artery stenosis when the disease is present in one sixth of the men tested1

Positive predictive value = a/(a + b) = 55/97 = 57%
Negative predictive value = d/(c + d) = 478/527 = 91%
Sensitivity = a/(a + c) = 55/104 = 53% (as in Table III)
Specificity = d/(b + d) = 478/520 = 92% (as in Table III)
Prevalence = (a + c)/(a + b + c + d) = 104/624 = 17%

*The authors of the work cited in this example made no such recommendation.5

†This hypothetical case closely approximates what actually happened among women in the study cited here.5 Roughly one sixth had 75% stenosis or more, and the stress test had a sensitivity of 50%, a specificity of 78% (values close to those observed among men), and positive and negative predictive values of 33% and 88% respectively. The authors concluded: "In women, a positive exercise test is of little value in predicting the presence of significant coronary artery disease, whereas a negative test is quite useful in ruling out the presence of significant disease."

Our analysis derives its relevance from the very real differences in prevalence of various disorders in primary and tertiary care settings. But individual clinicians seldom work at more than one level of specialization, and so it might be assumed that a given clinician need not be concerned about the effect of shifts in disease prevalence on his or her interpretation of diagnostic tests. This assumption is quite incorrect, however. We have already mentioned the difference in prevalence among men and women in the same clinical setting. Patients usually have a variety of easily discernible features that permit a fairly precise estimate of the diagnosis before any diagnostic tests are performed. For example, a 30-year-old man with a history of nonanginal chest pain has a low likelihood of coronary artery stenosis (Diamond and Forrester6 put this likelihood at 5%), whereas a 62-year-old man with typical angina has a very high likelihood of coronary stenosis (94%).6 When these "pretest likelihoods" or "prevalences" are fed into our diagnostic test model for exercise electrocardiography, the information provided by this test varies greatly. For the younger man it can be calculated that the likelihood of coronary artery stenosis is 26% if the exercise test is positive (the positive predictive value) and 3% if the test is negative (this is the complement of the negative predictive value d/[c + d], that is, c/[c + d]). The exercise test is of little value here: a negative test merely informs us of the obvious (ischemic heart disease is unlikely in this man), and a positive test does not imply a sufficiently high probability of the disease to justify invasive testing under most circumstances.

The exercise test is also not very helpful for the 62-year-old man with typical angina. If the exercise test is positive, the likelihood of disease rises only from 94% to 99%. If the test is negative, the likelihood falls only to 89%, hardly reassuring enough to forgo further testing.

The important use of the exercise test (or any other test) lies in its application in cases of uncertainty. Let us consider another real example, that of a 45-year-old man with atypical angina. Clinical studies demonstrate that such a patient has a 46% likelihood of coronary artery stenosis.6 Should he go on to angiography or not? If an exercise test is done and is positive, the likelihood of ischemic heart disease can be calculated to be 85%, and he should therefore have an angiogram if clinically warranted. If an exercise test is negative, however, the likelihood of significant coronary stenosis drops to 30% and the need for further investigation diminishes.

Thus, the exercise test is of value, but only for selected patients for whom the likelihood of coronary artery disease is neither high nor low. To act on the results of the exercise test in the other two circumstances (the 30-year-old and the 62-year-old men) makes little sense because it provides little information beyond that already apparent from the clinical presentation.

Having discussed the fourfold comparison with a gold standard, what about the element of "blindness"? This simply means that those who are carrying out or interpreting the results of the diagnostic test should not know whether the patient being tested really does or does not have the disease of interest; that is, they should be "blind" to each patient's true disease status. Similarly, those who are applying the gold standard should not know the diagnostic test result for any patient. It is only when the diagnostic test and gold standard are applied in a blind fashion that we can be assured that conscious or unconscious bias (in this case the "diagnostic suspicion" bias) has been avoided.7 As you may recall, this bias was discussed in an earlier round on clinical disagreement.8

2. Did the patient sample include an appropriate spectrum of mild and severe, treated and untreated disease, plus individuals with different but commonly confused disorders?

Florid disease (such as long-standing rheumatoid arthritis) usually presents a much smaller diagnostic challenge than the same disease in an early or mild form; the clinical value of a new diagnostic test often lies in its predictive value among equivocal cases. Moreover, the apparent diagnostic value of some tests actually resides in their ability to detect the manifestations of therapy (such as radiopaque deposits in the buttocks of ancient syphilitics) rather than disease, and the reader must be satisfied that the two are not being confused.

Finally, just as a duck is not often confused with a yak even in the absence of chromosomal analyses, the ability of a diagnostic test to distinguish between disorders not commonly confused in the first place is scant endorsement for its widespread application. Again, the key value of a diagnostic test often lies in its ability to distinguish between otherwise commonly confused disorders, especially when their prognoses or therapies differ sharply. It is this discriminating property that makes the T4 determination so helpful in sorting out tense, anxious, tremulous and perspiring patients into those with abnormal thyroid function and those with other disorders.

3. Was the setting for the study, as well as the filter through which study patients passed, adequately described?

In the previous round we saw how the proportion of hypertensive patients with surgically curable lesions varied almost 10-fold depending on whether the same diagnostic tests were applied in a general practice or in a tertiary care centre. Because a test's predictive value changes with the prevalence of the target disease, the article ought to tell you enough about the study site and patient selection filter to permit you to calculate the diagnostic test's likely predictive value in your own practice.

The selection of control subjects who do not have the disease of interest should be described as well. Although lab technicians and janitors may be appropriate control subjects early in the development of a new diagnostic test (especially with the declining use of medical students as laboratory animals), the definitive comparison with a gold standard demands equal care in the selection of patients with and without the target disease. The reader deserves some assurance that differences in diagnostic test results are due to a mechanism of disease and not simply to differences in such features as age, sex, diet and mobility of case and control subjects.

4. Was the reproducibility of the test result (precision) and its interpretation (observer variation) determined?

Validity of a diagnostic test demands both the absence of systematic deviation from the truth (that is, the absence of bias) and the presence of precision (the same test applied to the same unchanged patient must produce the same result). The description of a diagnostic test ought to tell readers how reproducible they can expect the test results to be. This is especially true when expertise is required in performing the test (for example, ultrasonography currently has enormous variation in the quality of its results when performed by different operators) or in interpreting it (as you may recall from an earlier round, observer variation is a major problem for tests involving x-rays, electrocardiography and the like).9

5. Was the term "normal" defined sensibly?

If the article uses the word "normal", its authors should tell you what they mean by it. Moreover, you should satisfy yourself that their definition is clinically sensible. Several different definitions of normal are used in clinical medicine; we contend that some of them probably lead to more harm than good. We have listed six definitions of normal in Table V and acknowledge our debt to Tony Murphy for pointing out most of them.2,10

Perhaps the most common definition of normal assumes that the diagnostic test results (or some arithmetic manipulation of them) for everyone, for a group of presumably normal people or for a carefully characterized "reference" population will fit a specific theoretical distribution known as the normal or gaussian distribution. One of the nice properties of the gaussian
raised to the power of the number of independent diagnostic tests performed. Thus, a patient who undergoes 20 tests has only 0.95^20, or about 1 chance in 3, of being called normal; a patient undergoing 100 such tests has only about 6 chances in 1000 of being called normal at the end of the workup.*
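The arithmetic behind these figures is a product of independent probabilities. If each test's normal range includes 95% of presumably normal people (the 0.95 in the text), and the tests are independent as the text assumes, then the chance that a normal person is called normal on every one of n tests is 0.95^n. A minimal Python sketch, with a hypothetical function name:

```python
def probability_all_normal(n_tests: int, normal_fraction: float = 0.95) -> float:
    """Chance that a healthy person falls inside every normal range.

    Assumes each range covers `normal_fraction` of healthy people and that
    the n_tests results are statistically independent of one another.
    """
    return normal_fraction ** n_tests


for n in (1, 5, 20, 100):
    print(f"{n:>3} tests: P(called normal) = {probability_all_normal(n):.4f}")
```

For 20 tests the sketch gives about 0.36 (roughly 1 chance in 3), and for 100 tests about 0.006 (about 6 in 1000). In practice, test results are often correlated, so the independence assumption is a simplification.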
Other definitions of normal, in avoiding the foregoing pitfalls, present other problems. The risk factor approach is based upon studies of precursors or statistical predictors of subsequent clinical events; by this definition, the normal range for serum cholesterol concentration or blood pressure consists of levels that carry no additional risk of morbidity or mortality. Unfortunately, however, many of these risk factors exhibit steady increases in risk throughout their range of values; indeed, it has been pointed out that the normal serum cholesterol concentration, by this definition, might lie below 150 mg/dl (3.9 mmol/l)."

Another shortcoming of this risk factor definition becomes apparent when we examine the consequences of acting upon a test result that lies beyond the normal range: Will altering a risk factor really change the risk? Recent experience with the
*This consequence of such definitions helps explain the results of a randomized trial of multitest screening at the time of admission to hospital that found no patient benefits but increased health care

Table V-Properties and consequences of different definitions of "normal"

Property | Term | Consequences of its clinical application
The distribution of diagnostic test results has a certain shape | Gaussian | Ought to occasionally obtain minus values for hemoglobin level, etc. All diseases have the same prevalence. Patients are normal only until they are worked up.
Lies within a preset percentile of previous diagnostic test results | Percentile | All diseases have the same prevalence. Patients are normal only until they are worked up.
Carries no additional risk of morbidity or mortality | Risk factor | Assumes that altering a risk factor alters risk.
Socially or politically aspired to | Culturally desirable | Confusion over the role of medicine in society.
Range of test results beyond which a specific disease is, with known probability, present or absent | Diagnostic | Need to know predictive values for your practice.
Range of test results beyond which therapy does more good than harm | Therapeutic | Need to keep up with new knowledge about therapy.
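The diagnostic definition in Table V depends on where the limit of normal is set, which the article discusses as points X and Y in Fig. 2. The sketch below illustrates that trade-off in Python. The test values are invented for illustration only and do not come from the article, and the function name `classify` is hypothetical. A result above the limit is called abnormal. A low limit (like X, below every diseased value) empties cell c, so sensitivity and negative predictive value reach 100%. A high limit (like Y, above every nondiseased value) empties cell b, so specificity and positive predictive value reach 100%.

```python
# Hypothetical test values, invented for illustration only.
diseased = [48, 55, 60, 63, 67, 70, 74, 80]
nondiseased = [30, 35, 38, 42, 45, 50, 52, 58]


def classify(limit: float) -> dict:
    """Call a result abnormal (test positive) when it exceeds `limit`."""
    a = sum(v > limit for v in diseased)      # true positives
    c = len(diseased) - a                     # false negatives
    b = sum(v > limit for v in nondiseased)   # false positives
    d = len(nondiseased) - b                  # true negatives
    return {"a": a, "b": b, "c": c, "d": d,
            "sensitivity": a / (a + c), "specificity": d / (b + d),
            "ppv": a / (a + b), "npv": d / (c + d)}


for label, limit in (("low limit, as at X", 45),
                     ("intermediate limit", 54),
                     ("high limit, as at Y", 60)):
    r = classify(limit)
    print(f"{label}: c={r['c']}, b={r['b']}, "
          f"sensitivity={r['sensitivity']:.0%}, specificity={r['specificity']:.0%}")
```

Moving the limit only shifts patients between cells. It cannot make both b and c vanish while the two distributions overlap.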
used in the first guide to reading about a diagnostic test: comparison with a gold standard. The "known probability" with which a disease is present is our old friend the positive predictive value.

This definition is illustrated in Fig. 2, where we see the usual overlap in diagnostic test results between patients shown, by application of a gold standard, to be disease-free or diseased (the a, b, c and d in Fig. 2 correspond to cells a, b, c and d of Tables II to IV). The known probability (or predictive value) with which a disease is present or absent depends on where we set the limits for the normal range of diagnostic test results. If we simply wanted to maximize the number of times the diagnostic test result was correct, we'd set the limits for normal at the dotted line where the curves cross, but that might not be very helpful clinically. If we lower these normal limits to point X, cell c approaches zero, sensitivity and negative predictive value approach 100%, and we can use the normal diagnostic test result to rule out the disease (because nobody with the disease has test results below X). Similarly, if we raise the limits of normal for the diagnostic test result to point Y, cell b approaches zero, specificity and positive predictive value approach 100%, and we can use the abnormal diagnostic test result to rule in the disease (because no nondiseased patients have test results above Y). Thus, this definition has clinical utility and is a distinct improvement over the definitions described earlier. However, it does require that clinicians keep track of both the predictive values of individual diagnostic tests and the test levels at points X and Y that apply in their own practices.

FIG. 2-Diagnostic and therapeutic definitions of "normal".

The final definition of normal sets its limits at the point beyond which specific treatments have been shown to do more good than harm, and is indicated in Fig. 2 as point Z. This therapeutic definition is attractive because of its link to action. The therapeutic definition of the normal range of blood pressure, for example, avoids the hazards of labelling patients as diseased17 unless they are going to be treated. The use of this definition requires that clinicians keep abreast of advances in therapeutics and become adept at sorting out therapeutic claims; a later article in this series of Clinical Epidemiology Rounds is devoted to this topic.

When reading a report of a new diagnostic test, then, you should satisfy yourself that the authors have defined what they mean by normal and that they have done so in a sensible and clinically useful fashion.

6. If the test is advocated as part of a cluster or sequence of tests, was its contribution to the overall validity of the cluster or sequence determined?

In many conditions an individual diagnostic test examines but one of several manifestations of the underlying disorder. For example, in diagnosing deep vein thrombosis, impedance plethysmography examines venous emptying, whereas leg scanning with iodine-125-labelled fibrinogen examines the turnover of coagulation factors at the site of thrombosis.18 Furthermore, plethysmography is much more sensitive for proximal than distal venous thrombosis, whereas the reverse is true for leg scanning. As a result, these tests are best applied in sequence: if the plethysmogram is positive, the diagnosis is made and treatment begins at once; if it is negative, leg scanning begins and the diagnostic and treatment decisions await its results. This being so, it is clinically nonsensical to base a judgement of the value of leg scanning on a simple comparison of its results alone against the gold standard of venography. Rather, its agreement with venography among suitably symptomatic patients with a negative impedance plethysmogram is one appropriate assessment of its validity and clinical usefulness. Another valid assessment would be the agreement of results of the combination of leg scanning and impedance plethysmography with venography.

In summary, any single component of a cluster of diagnostic tests should be evaluated in the context of its clinical use.

7. Were the tactics for carrying out the test described in sufficient detail to permit their exact replication?

If the authors have concluded that you should use their diagnostic test, they have to tell you how to use it; this description should cover patient issues as well as the mechanics of performing the test and interpreting its results. Are there special requirements for fluids, diet or physical activity? What drugs should be avoided? How painful is the procedure and what is done to relieve any pain? What precautions should be taken during and after the test? How should the specimen be transported and stored for later analysis? These tactics and precautions must be described if you and your patients are to benefit from this diagnostic test.

8. Was the "utility" of the test determined?

The ultimate criterion for a diagnostic test or any other clinical maneuver is whether the patient is better off for it. If you agree with this point of view, you should scrutinize the article to see whether the authors went beyond the foregoing issues of accuracy, precision and the like to explore the long-term consequences of their use of the diagnostic test.

In addition to telling you what happened to patients correctly classified by the diagnostic test, the authors should describe the fate of the patients who had false-positive results (those with positive test results who really did not have the disease) and those with false-negative results (those with negative test results who really did have the disease). Moreover, when the execution of a test requires a delay in the initiation of definitive therapy (while the procedure is being rescheduled, the test tubes are incubating or the slides are waiting to be read), the consequences of this delay should be described.

For example, we are part of a team that has studied the value of noninvasive tests in the diagnosis of patients with clinically suspected deep leg vein thrombosis, and have tested the policy of withholding anticoagulants from patients with a negative impedance plethysmogram (a quick test) until or unless the 125I-fibrinogen leg scan becomes positive.18 The scan takes several hours to several days to become positive when venous thrombi are small or confined to the calf; it is therefore important to determine and report whether any patients suffer clinical embolic events during this interval (fortunately, they do not). Moreover, comparisons of

the one that will best meet your clinical requirements.

The next round will consider articles that describe the clinical course and prognosis of disease.

References

1. SACKETT DL: Clinical diagnosis and the clinical laboratory. Clin Invest Med 1978; 1: 37-43
2. MURPHY EA: The Logic of Medicine, Johns Hopkins, Baltimore, 1976: 117-160
3. RANSOHOFF DF, FEINSTEIN AR: Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978; 299: 926-930
4. GALEN RS, GAMBINO SR: Beyond Normality: The Predictive Value and Efficiency of Medical Diagnoses, Wiley, New York, 1975: 30-40
15. MENCKEN HL: A Mencken Chrestomathy, Knopf, Westminster, 1949
16. LALONDE M: A New Perspective on the Health of Canadians. A Working Document, Department of National Health and Welfare, Ottawa, Apr 1974: 58
17. HAYNES RB, SACKETT DL, TAYLOR DW, GIBSON ES, JOHNSON AL: Increased absenteeism from work following the detection and labelling of hypertensives. N Engl J Med 1978; 299: 741-744
18. HULL R, HIRSH J, SACKETT DL, POWERS P, TURPIE AGG, WALKER I: Combined use of leg scanning and impedance plethysmography in suspected venous thrombosis. An alternative to venography. N Engl J Med 1977; 296: 1497-1500
