
Current Review

A framework for clinical evaluation
of diagnostic technologies

Gordon H. Guyatt,* MD
Peter X. Tugwell, MD
David H. Feeny, PhD
R. Brian Haynes, MD, PhD
Michael Drummond, DPhil

Most new diagnostic technologies have not been adequately assessed to determine whether their application improves health. Comprehensive evaluation of diagnostic technologies includes establishing technologic capability and determining the range of possible uses, diagnostic accuracy, impact on the health care provider, therapeutic impact and impact on patient outcome. Guidelines to determine whether each of these criteria has been met adequately are presented. Diagnostic technologies should be disseminated only if they are less expensive, produce fewer untoward effects and are at least as accurate as existing methods, if they eliminate the need for other investigations without loss of accuracy, or if they lead to institution of effective therapy. Establishing patient benefit often requires a randomized controlled trial in which patients receive the new test or an alternative diagnostic strategy. Other study designs are logistically less difficult but may not provide accurate assessment of benefit. Rigorous assessment of diagnostic technologies is needed for efficient use of health care resources.
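The dissemination criteria stated in the abstract amount to three alternative conditions, any one of which suffices. The Python sketch below encodes that reading as a boolean rule. The `Comparison` record and its field names are hypothetical. Reducing each criterion to a yes/no judgement is a simplifying assumption made for illustration, not part of the authors' proposal.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Hypothetical yes/no summary of a new test versus existing methods."""
    less_expensive: bool
    fewer_untoward_effects: bool
    at_least_as_accurate: bool
    replaces_other_tests_without_accuracy_loss: bool
    leads_to_effective_therapy: bool


def warrants_dissemination(c: Comparison) -> bool:
    """Return True if any one of the three dissemination criteria is met."""
    # Criterion 1: cheaper, with fewer untoward effects, and no less accurate.
    better_substitute = (
        c.less_expensive and c.fewer_untoward_effects and c.at_least_as_accurate
    )
    # Criterion 2: eliminates other investigations without loss of accuracy.
    streamlines_workup = c.replaces_other_tests_without_accuracy_loss
    # Criterion 3: leads to institution of effective therapy.
    return better_substitute or streamlines_workup or c.leads_to_effective_therapy


# A cheaper, safer test that is less accurate and changes nothing else.
cheap_but_worse = Comparison(
    less_expensive=True,
    fewer_untoward_effects=True,
    at_least_as_accurate=False,
    replaces_other_tests_without_accuracy_loss=False,
    leads_to_effective_therapy=False,
)
print(warrants_dissemination(cheap_but_worse))  # False
```

Under this reading, a test whose only effect is to reassure clinicians satisfies none of the three conditions.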
From the departments of Medicine, Economics, and Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ont.

*Career scientist, Ontario Ministry of Health

Reprint requests to: Dr. Gordon H. Guyatt, Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Centre, Rm. 3H7, 1200 Main St. W, Hamilton, Ont. L8N 3Z5

CAN MED ASSOC J, VOL. 134, MARCH 15, 1986 587

The recent introduction of sophisticated and expensive diagnostic technology into medical practice has given rise to important questions. First, do these technologies actually improve patient care? Second, if they do improve care, is their price, in terms of services forgone by other patients, in keeping with their benefit? Unfortunately, many technologies have been disseminated without adequate evaluation, and we now face a monumental backlog of technologies that need to be assessed, plus a burgeoning stream of even newer machines. To make matters worse, while methodologic standards for determining the accuracy of diagnostic tests are well established,1-3 the criteria that should be met before a new test is introduced into routine clinical practice remain controversial. We believe the standards for evaluating new techniques have not been sufficiently
rigorous and that inadequate evaluation has contributed to overutilization of diagnostic techniques.

In this paper we present a framework and a set of practical and scientific standards for the clinical assessment of diagnostic technologies. Our framework is perhaps an idealized outline of how diagnostic technologies should be assessed; however, adherence to its principles would substantially decrease the problems we have described.

Health-care-related technology can be broadly defined as "the set of techniques, drugs, equipment and procedures used by health care professionals in delivering medical care to individuals, and the system within which such care is delivered".4 According to this definition, simple tests, as well as sophisticated machines, are technologies. While our discussion will focus on new and expensive techniques, it is equally applicable to diagnostic tests in general.

The framework

In addition to their clinical uses, diagnostic technologies may contribute to a better understanding of human physiology and mechanisms of disease. Positron emission tomography scanning, for example, is likely to provide important information about tissue metabolism in health and disease, irrespective of its diagnostic use. While elucidation of disease mechanisms constitutes an important reason for developing and using diagnostic technologies, this goal is largely independent of clinical considerations and by itself would lead to much more limited dissemination. For these reasons we shall limit our discussion to the accuracy and benefit of technologies in the clinical context.

Depending on the point of view, there are a number of criteria for concluding that a diagnostic technology is ready for dissemination.5,6 These criteria can be considered to form a hierarchy of progressively more rigorous evaluation, as follows:

* Technologic capability: The ability of the technology to perform to specifications in a laboratory setting has been demonstrated.
* Range of possible uses: The technology promises to provide important diagnostic information in a number of clinical situations.
* Diagnostic accuracy: The technology provides information that allows health care workers to make a more accurate assessment regarding the presence and severity of disease.
* Impact on health care providers: The technology allows health care workers to be more confident of their diagnoses and thereby decreases their anxiety and increases their comfort.
* Therapeutic impact: The therapeutic decisions made by health care providers are altered as a result of application of the technology.
* Patient outcome: Application of the technology results in benefit to the patient.

This schema has been modified from the one suggested by Fineberg and colleagues,5 primarily through the addition of the second and third criteria. Fineberg and colleagues did not attempt to provide guidelines whereby one could determine whether these criteria have been adequately met, nor did they discuss how far up this hierarchy a technology needs to go before it should be considered appropriate for dissemination. The latter will depend on a number of factors, including present practice, the cost and untoward effects of the technology, and one's values. These issues will be considered in detail.

Technologic capability

Establishing the technologic capability of a new test is generally undertaken by physicists, biochemists, physiologists and manufacturers. Our discussion will concern the clinical evaluation of the new technology, which begins when the basic scientists and manufacturers have produced a product that meets laboratory specifications.

Range of possible uses

When the technology comes out of the laboratory into the clinical setting, it must first be applied to many patients with a large number of diverse conditions. The goal of this exercise is not to establish accuracy but rather to delineate the possible uses of the technology. Striking impressions often result from this exploratory phase: for example, the ease with which computed tomography (CT) scanning identifies hemorrhage, the difficulty in delineating lesions in the first weeks following thrombotic stroke, and the usefulness of magnetic resonance (MR) imaging in demonstrating demyelinating lesions.

The major criteria for patient selection during this phase should be a good idea of what the underlying condition is and a reasonable expectation that the new technology will provide important information. Attempts to limit the spectrum of conditions may prevent a full appreciation of the technology's uses. Those interpreting the test results should not be blind to who the patients are or what underlying condition they have. In fact, the more clinical information the better, for this phase of instrument development is a learning process in which unexpected correlations are discovered, interpretations refined and important hypotheses for subsequent testing generated. Problems can arise if certain elements of more advanced studies (such as establishing reliability, carefully defining the patient population or blinding the interpreters of the test results) are incorporated in what is still essentially an exploratory study. This may have two deleterious consequences: the full potential of the early phase of evaluation may not be realized, and a spurious impression may be created that such studies are adequate to establish the usefulness of the test.
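The six criteria of the framework form an ordered scale of increasingly rigorous evaluation, so they map naturally onto an ordered enumeration. The Python sketch below is illustrative only. `EvaluationLevel` and `highest_level_established` are hypothetical names. The rule that a level counts only when every level beneath it has also been demonstrated is an assumption made for this sketch, not a requirement stated in the article.

```python
from enum import IntEnum
from typing import Optional, Set


class EvaluationLevel(IntEnum):
    """Framework criteria, ordered from least to most rigorous."""
    TECHNOLOGIC_CAPABILITY = 1
    RANGE_OF_POSSIBLE_USES = 2
    DIAGNOSTIC_ACCURACY = 3
    IMPACT_ON_HEALTH_CARE_PROVIDERS = 4
    THERAPEUTIC_IMPACT = 5
    PATIENT_OUTCOME = 6


def highest_level_established(
    established: Set[EvaluationLevel],
) -> Optional[EvaluationLevel]:
    """Return the top of the unbroken run of established levels, starting at 1.

    Assumption for this sketch: a level counts only if every lower level has
    also been demonstrated. Returns None if technologic capability is missing.
    """
    highest: Optional[EvaluationLevel] = None
    for level in EvaluationLevel:  # iterates in definition order, 1 to 6
        if level not in established:
            break
        highest = level
    return highest


# Accuracy is unproven, so under this rule the therapeutic-impact claim does not count.
evidence = {
    EvaluationLevel.TECHNOLOGIC_CAPABILITY,
    EvaluationLevel.RANGE_OF_POSSIBLE_USES,
    EvaluationLevel.THERAPEUTIC_IMPACT,
}
print(highest_level_established(evidence).name)  # RANGE_OF_POSSIBLE_USES
```

Because `IntEnum` members compare as integers, a check such as `level >= EvaluationLevel.THERAPEUTIC_IMPACT` expresses "at least this rigorous" directly.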
Diagnostic accuracy

For a diagnostic technology to be clinically useful it must be able to distinguish accurately between diseased and nondiseased individuals and to quantitate the severity of an illness or condition. The pitfalls in trying to determine the accuracy of diagnostic tests, and the best ways around them, have been thoroughly studied and described.1,2 We will briefly summarize them here.

Establishing accuracy implies independent comparison with a "gold standard". By independent, we mean that those interpreting the test results must be unaware of the results of the gold-standard procedure. The gold standard is often a more invasive or dangerous procedure, such as coronary artery angiography, which is a gold standard for electrocardiographic stress testing. If a definitive procedure is not available a substitute gold standard, such as long-term follow-up, may be adequate. In determining accuracy the precision should also be ascertained; both intra- and interobserver reliability should be established.

There are many situations in which no gold standard exists and adequate substitutes are not available (e.g., bronchial provocation tests for asthma, walking tests for functional exercise capacity in patients with chronic heart and lung disease, and strain-gauge plethysmography for diagnosis of the postphlebitic syndrome). In these situations one must rely on construct validity. To demonstrate construct validity one examines the relation between a new test and existing measures and looks at whether the new technology relates to other variables in the way one would expect if it is really measuring what it is supposed to measure.

When a gold standard is available and a new test is designed to detect the presence or absence of disease, sensitivity (the proportion of patients with disease correctly identified as such) and specificity (the proportion of patients without disease correctly identified as such) should be calculated. A receiver operating characteristic (ROC) curve, which relates false-positive results to true-positive results at multiple cut-off points, can be constructed to help determine the cut-off point that gives the optimum combination of sensitivity and specificity. A more powerful method of establishing a test's usefulness is to examine the associated likelihood ratios, which allow estimates of the probability that disease is present at any level of a diagnostic test result (the pretest odds of disease multiplied by the likelihood ratio for a given result yield the posttest odds). If two tests are being compared their results should be interpreted independently against the same gold standard. In this situation ROC curves are used to determine which test is "better",8 but comparison of likelihood ratios at various levels of test results is a more powerful method.

For diagnostic technologies in which level of function or severity of disease is the variable of interest, simple correlation coefficients (if the unit of measurement differs between the test and the gold standard) or, if the units are the same, an intraclass correlation coefficient that takes account of both random and systematic error9 can be used to quantitate the relation between the test and the gold standard. An alternative approach is to divide the outcomes into clinically relevant levels and calculate chance-corrected agreement with a statistic called kappa,10 or a modified version, called weighted kappa, which allows one to consider not only whether the technology disagreed with the gold standard but also the extent of the discrepancy.11

In assessing accuracy we need to know whether the diagnostic technology is capable of identifying patients with mild as well as severe disease and of distinguishing them not from healthy individuals but from those with conditions easily confused with the disease of interest. Accuracy should be determined by examining representative patients with a suspected condition, applying the diagnostic technology under investigation and proceeding with independent application of the gold standard. Sensitivity will be greater in those with severe disease, and specificity will be greater in normal controls, so studies restricted to such patients will overestimate accuracy. Exercise radionuclide ventriculography as a diagnostic test for coronary artery disease provides a dramatic example.12 When initially studied, radionuclide ventriculography had a specificity of 94%. In subsequent studies the specificity fell to 49%, most probably because the disease-free population was far healthier during the earlier studies. Patients who have a substantial pretest likelihood of coronary artery disease (i.e., the ones for whom we need the diagnosis confirmed) show a high incidence of nonspecific abnormalities in exercise radionuclide angiography.12

Impact on health care providers

Accurate diagnostic tests may influence neither therapy nor patient outcome, and yet many still receive wholehearted support from the medical community. CT scanning has been widely disseminated, and its use for many conditions has received endorsement from an impressive array of official agencies and consensus conferences without rigorous scientific evidence that patients benefit from its application. This may be because physicians and policy-makers are convinced of its benefit even in the absence of adequate data, because they see the demonstration of accuracy as sufficient reason for dissemination irrespective of benefit, or because the CT scan's ability to reduce the physician's anxiety and increase his or her confidence may favour rapid diffusion and official endorsement. Although one may think of more sinister possibilities (such as the increased power and status, and the financial advantages, acquired by the medical profession as it adopts more and more mysterious and apparently powerful gadgetry, or the unregulated promotional efforts of the companies responsible for developing the new technology) we suspect that the third explanation
may be the most important. There is no doubt that it is immensely reassuring to know that a patient who has had a sudden change in neurologic status is not suffering from a condition (such as acute or chronic subdural hematoma) that requires immediate surgical intervention. Similar reassurance accompanies the knowledge that severe headaches of recent onset do not represent a brain tumour, or that the comatose patient with a documented malignant disease in whom one has decided to do nothing more does, in fact, have tumour deposits throughout the brain. Such reassurance is even more powerful in these days of rampant medical litigation, when a mistake may have disastrous consequences not only for the patient but also for the physician.

Assuming that diagnostic technologies do have an impact on health care workers, what weight should we give this effect in our decisions about resource allocation? For example, if an expensive technology reassures health care providers but does not influence patient outcome, should it be adopted? Certainly a test's effects on health care workers should influence our decisions about its use, but careful thought needs to be given to the appropriate measurement of the effect as well as to its importance.

Reasonable discussion of this so far neglected area cannot begin until data are collected on the extent to which diagnostic technologies do provide reassurance to health care workers. Diamond and Forrester13 have highlighted the distinction between estimates of the probability that a disease is present and confidence in that estimate. That is, a given estimate of the probability that a disease is present (e.g., a belief that there is a 25% chance a patient has coronary artery disease) may be associated with a great deal of confidence (e.g., it is quite certain that the probability is close to 25%) or very little confidence (e.g., the estimate of 25% is just a guess; the real probability could be much higher or close to zero). Confidence in probability estimates is likely to be related to physician expertise and experience and is likely to increase as data about patients accumulate. This issue should be explored further in studies of diagnostic technologies.

Therapeutic impact

A test result may have diagnostic impact and still not affect therapy: a health care worker may be unaware of the significance of a test result or unfamiliar with available treatment, the change in probability of disease may be insufficient to alter therapy, the patient may refuse treatment, there may be no therapy available, or the patient may already be receiving the best possible therapy. To change morbidity or mortality or improve the quality of life a diagnostic test must provide information that changes therapy. If the test results lead to institution of an intervention whose effectiveness is known, patient benefit follows. If unproven therapy is instituted, a change in health status as a result of the diagnostic technology remains only a possibility.

How would one go about showing that therapy has changed as a result of a new diagnostic technology? The best way would be a randomized controlled trial in which patients would be randomly assigned to one of two diagnostic plans, only one of which would include the technology under investigation. The new technology might be added (e.g., exercise stress testing might be instituted before a patient who has been treated for myocardial infarction leaves the hospital), or the two arms of the experiment might contain alternative technologies (e.g., CT scanning and MR imaging).

It has been argued that clinical trials are likely to be too cumbersome or impractical for regular evaluation of diagnostic technologies.14 Problems include the need for a large number of patients, the need for preliminary use of the technology in practice so that clinicians can develop expertise in interpreting the results, and rapid developments in technology, which may make the results of a trial obsolete by the time they appear.14,15 Given the difficulties of randomized controlled trials, are there alternative ways of assessing a test's therapeutic impact?

One strategy is simply to review patient records and evaluate whether the diagnostic test altered patient management.16 Retrospective review has numerous problems, including the difficulty of determining what would have been done if the test had not been available. A more effective method would be to ask physicians about their plans for further diagnosis and therapy before the test is performed, then give them the results and see if their plans change.

Using before-after studies based on clinicians' reports of their plans for therapy is enticing. First, the expense and logistic difficulties of a randomized controlled trial are avoided. Second, no patient is denied a potentially beneficial technology. There are, however, major problems with this study design:17,18 changes in therapy that are believed to be beneficial may, in fact, be harmful; inaccurate diagnostic tests can have deleterious therapeutic impact; clinicians differ systematically in their assessment of whether a given test result contributed to management;19,20 it may be difficult to be consistently aware of clinicians' plans before the test results are available; clinicians' reports of what they would do before the test result is available may differ from what they actually would have done were the technology not available; while all patients receive the potential benefits of the test, they are also all exposed to its known and unknown hazards; and the design is in most cases limited to "add-on" technologies as opposed to those that replace existing tests.

It may be argued that these problems do not significantly mar the validity of before-after study designs of therapeutic impact that rely on physician judgement. We believe it is more likely that such studies will overestimate patient benefit. The only way to know for sure is to do what has been done for uncontrolled trials or for those using historical controls: compare their results with those of randomized controlled trials that ask the same question. Preliminary evidence comes from two before-after studies that found that CT scanning can decrease the frequency of abdominal surgery.16,21 In the only randomized controlled trial of CT scanning, patients presenting with undiagnosed abdominal masses were randomly assigned to receive CT or conventional imaging.22 The proportion of patients who underwent laparotomy was actually higher in the group that underwent CT scanning (39%) than in the control group (32%). Sample size limitations make the results of this study far from definitive, but the results suggest that further comparisons between before-after studies of therapeutic impact and randomized controlled trials that examine patient benefit should be conducted before the former are accepted as valid in the assessment of diagnostic technologies.

However, there are circumstances in which one can be more confident of the validity of before-after therapeutic impact studies. If change in therapy immediately follows receipt of new diagnostic information, or if the test is clearly responsible for an important change in treatment plan, the therapeutic impact of the technology is established. If the changes in management have been shown to be effective in well designed randomized controlled trials, or if they obviate the need for an invasive procedure, no further studies will be required.

More often, though, therapeutic impact studies that rely on clinical judgement will have a role as exploratory studies. If no therapeutic impact is found it is extremely unlikely that the technology is of benefit. On the other hand, if the initial study results suggest therapeutic impact, more rigorous investigations must be undertaken.

Patient outcome

Does one really need to go beyond determining therapeutic impact or even diagnostic impact before concluding that a technology warrants dissemination? There have been many instances in which a diagnostic technology provided information but failed to change clinically relevant outcomes. In one case, application of a diagnostic test (measurement of serum cholesterol levels), when followed by specific therapy (clofibrate administration), actually increased the rate of death.23 Emergency endoscopy in patients bleeding from the upper gastrointestinal tract provides increased diagnostic information without altering rates of surgery, length of hospital stay or rate of death.24 Chest radiography is an accurate tool in ascertaining the presence of carcinoma of the lung, and radiographic screening for lung cancer has therapeutic impact, in that more patients undergo surgery and at an earlier stage of disease, but evidence to date suggests that the outcome does not change.25,26 Nonstress tests that monitor fetal heart rate for abnormalities clearly add information and have been widely disseminated, but they do not change perinatal morbidity or mortality.27,28 These examples illustrate the wisdom of demonstrating improvement in patient outcome before a diagnostic technology becomes widely disseminated.

It has been suggested that randomized controlled trials are extremely difficult and may not be feasible for many diagnostic technologies. Blinding, a key feature of trials of therapeutic technologies, is difficult in trials of diagnostic technologies, in which the physician may have to be aware of the diagnostic procedure. This limitation introduces the possibility of bias in the application of other tests or the institution of a treatment regimen. Feinstein29 has recently argued that cohort studies of patients who have been given a test according to clinical judgement or availability may provide valid results if important confounders are considered. One problem with this approach is that it is unlikely that all important confounders can be identified and adequately measured. For example, a group of Australian neurologists examined the effect of CT scanning on mortality in stroke patients by comparing the outcome of patients seen in 1978 who underwent CT scanning with that of patients seen in 1974, before CT scanning was introduced.30 To ensure that it was the scanning that was making the difference, they chose patients matched for all the prognostic variables they thought relevant. The 1978 group had a lower mortality, apparently providing dramatic evidence of the impact of CT scanning on outcome. However, the investigators then assessed mortality in another matched group: stroke patients seen in 1978 when the CT scanner wasn't working. This group had the lowest mortality, comparable to that of the other patients studied in 1978 and lower than that of the group studied in 1974. The conclusions are that patients in the more recent group were not as sick (in ways that the investigators could not measure except by looking at rates of death) as the historical control group, and that concurrent randomized controls are necessary to establish the benefit of diagnostic technologies.

While there is no doubt that randomized controlled trials of diagnostic technologies are difficult, are certainly not applicable to all situations and are limited by the difficulty associated with blinding, they are nevertheless possible. Trials of diagnostic technologies conducted to date include studies of nonstress tests in pregnant women,27,28 CT scanning in the assessment of abdominal masses,22 endoscopy in patients with acute gastrointestinal bleeding,24 chest radiography in men at risk of carcinoma of the lung,25,26 endoscopic cholangiography versus transhepatic cholangiography in patients with jaundice,31 and multiphasic screening at the time of admission to hospital.32 Clearly, there are many methodologic
challenges still to be met, but the same was true of randomized controlled trials of therapeutic technologies two decades ago.

Methodologic standards for studies of patient outcome

The methodologic standards for trials of therapeutic technologies are equally applicable to diagnostic tests. They include the necessity for true randomization, pre- or post hoc stratification for potential confounders, consideration of possible cointervention and contamination, adequate sample size, and measures (such as blinding) to minimize potential bias.33 However, several points need to be made about the special challenges posed by randomized controlled trials of diagnostic technologies.

Care must be taken to identify the appropriate role of the new technology. If it is added on to existing technologies, all patients must receive the full conventional examination and then be randomly assigned to receive or not receive the new procedure. If the new test is designed to replace existing methods, patients must receive an identical examination before being randomly assigned to receive either the conventional or the experimental technology.

Diagnostic possibilities (and diagnostic confidence) and therapeutic plans should be elicited from health care providers before and after application of the technology, for this will help clarify the mechanism of any effects that are found. This is analogous to measuring the biologic effects, in addition to the clinically relevant outcomes, of a new treatment in an attempt to clarify mechanisms of action.

Patient selection for studies of diagnostic technology must be appropriate. For example, there will be patients in whom the pretest likelihood of disease is so low that even a positive result on the new test will not lead to institution of therapy (or so high that therapy would be administered despite a negative result). Including patients in whom the test can affect neither therapy nor outcome in a randomized controlled trial of a new technology will decrease the power of the study. The analogy here is restricting entry to a therapeutic trial to "high-risk, high-response" patients.

Mechanisms for estimating the accuracy of the technology should be built into the trial. After all, one wouldn't expect benefit from an inaccurate test. This is also an argument for proceeding directly from the preliminary stage of establishing the range of possible uses to randomized controlled trials in which patient benefit is the primary measure of outcome. If this approach is taken, the hierarchy of diagnostic accuracy, impact on health care providers, therapeutic impact and patient outcome can all be examined in a single study.

All clinically relevant aspects of patient outcome should be measured. These may include reductions in rate of death, length of hospital stay and number of complications from more invasive tests, as well as improvement in quality of life. By quality of life we mean both a person's ability to undertake activities that he or she finds rewarding and enjoyable, and the way he or she feels. A diagnostic technology may change quality of life even when other, more easily measurable, variables show no change. Although the assessment of quality of life is an intimidating task, guidelines for its measurement are becoming more available.34,35

An example of the importance of measuring the impact of an intervention on quality of life is provided by Sox and associates,36 who found that among patients presenting with noncardiac chest pain, those randomly assigned to receive routine measurements of creatine phosphokinase and electrocardiography showed less short-term disability than did patients who did not undergo these investigations. Just as the physician may find that negative results of CT scanning decrease his or her anxiety about the possibility of a brain tumour in patients with severe headaches of recent onset, so may the patients. This reassurance can be extremely important for the worried patient.

The therapeutic value of diagnostic tests, as this reassurance value might be labelled, is worthy of considerably more investigation than it has received to date, but we would like to add one caveat for those who might try. The reassurance value of the test for the patient may be confounded with its reassurance value for the physician, and unless the latter is measured the extent of this confounding may be impossible to assess. For example, patients with a severe headache who undergo CT scanning may have a stronger conviction that they don't have a brain tumour than patients who are spared the test only because the physician expressed some hesitation about the matter. The appropriate conclusion from such a study would be not to recommend CT scanning for patients with headache but to educate physicians so they realize that if the results of a careful neurologic examination are negative a brain tumour can be virtually ruled out.37,38 Physicians might then be able to provide the reassurance that would quell the patient's anxieties.

Blinding of patients in studies of diagnostic technologies (such as by mock CT scanning) may be ethically questionable. Although in some situations blinding of physicians may be possible (such as when a physician receives a verbal report of a test without being aware of which test or combination of tests led to the result), it will often be difficult or impossible. However, it will usually be possible and highly desirable for those who are assessing the outcome (such as interviewers administering questionnaires on quality of life) to be blinded.

We have discussed the difficulties of assessing
the accuracy of tests for which a gold standard does not exist and the need to resort to construct validity: the determination of whether a technology relates to other measures in the manner that one would expect if it is really measuring what it is supposed to measure. Another, possibly more satisfactory, approach is to consider construct validity only in passing and to proceed straight to determining whether the application of the technology results in patient benefit. For example, one could randomly assign patients suspected of having asthma to receive or not receive a bronchial provocation test. If it is found that asthma is better controlled in patients in the experimental group, then there is very strong evidence that the bronchial provocation test is a good measure of the severity of asthma.

Given that new technologies are often expensive, rigorous economic evaluation should be built into randomized controlled trials of diagnostic tests. Methodologic standards of economic evaluation in clinical trials are available.39-41 The inclusion of economic evaluation underscores the importance of assessing the impact of a diagnostic technology on patient outcome. If the clinical evaluation yields rigorous evidence concerning benefit, the incremental cost of using the test can be related to the incremental improvement in patient outcome. The relation between incremental cost and benefit can in turn be compared with the costs and benefits associated with other health care interventions.7,39-41 If the cost per life-year or quality-adjusted life-year gained is lower than that associated with many alternative uses of health care resources, it is likely that the new technology represents an efficient use of health care resources.

When are randomized controlled trials of patient outcome unnecessary?

There are a number of situations in which the most stringent tests of benefit are not appropriate, as follows:
* Patient benefit from the test is so dramatic that even the results of observational studies leave no room for doubt. The use of electrocardiography for dysrhythmias associated with treatment of known efficacy is one example. The use of CT head scanning in the context of head trauma is sufficiently dramatic in decreasing the need for exploratory surgery that it can probably also be included in this category.42,43
* The new technology produces the same or fewer untoward effects, is equally or less expensive than existing alternatives and has been shown to be more accurate.
* If controlled trials demonstrate that application of a diagnostic technology leads to the institution of a therapy that previous randomized controlled trials have proven effective or to the termination of harmful therapy (as might happen when a patient without a disease is mistakenly given a toxic treatment), benefit can be considered established. For example, impedance plethysmography and leg scanning have been shown to be comparable to venography in diagnosing deep vein thrombosis.44 Because heparin is known to do more good than harm in treating deep vein thrombosis, the demonstration that impedance plethysmography leads to appropriate administration of heparin is sufficient evidence of its usefulness. Given the difficulty in performing studies that examine differences in outcome, it may be worthwhile to concentrate diagnostic technology assessment in areas where treatment is known to do more good than harm on the basis of existing results of randomized controlled trials or other definitive evidence.

Summary

The clinical assessment of diagnostic technologies should begin with an exploratory stage in which potential application of the new test is studied. Ideally, the accuracy, impact on health care providers, therapeutic impact, patient outcome, and pecuniary costs and benefits of the technology should then be systematically assessed. These steps need not be sequential; under the right circumstances they may all be addressed in a single trial. There are many situations in which shortcuts are appropriate; for example, if a new test is shown to be both more accurate and less expensive than existing alternatives, its usefulness is established. Nevertheless, demonstration of accuracy is ordinarily not sufficient for dissemination of a new technology. While before-after studies with physicians' assessments of therapeutic impact are less costly than randomized controlled trials, the results may overestimate the benefit of the new diagnostic technology. In many situations, methodologically rigorous randomized controlled trials will be required to test whether a diagnostic technology not only improves accuracy and changes therapy but also improves outcome. Attention to this framework for assessing diagnostic tests will avoid premature dissemination of expensive new technologies; ignoring the framework will result in inefficient use of increasingly limited health care resources.

References

1. Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Centre: How to read clinical journals: II. To learn about a diagnostic test. Can Med Assoc J 1981; 124: 703-710
2. Ransohoff DF, Feinstein AR: Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978; 299: 926-929
3. Griner PF, Mayewski RJ, Mushlin AI et al: Selection and interpretation of diagnostic tests and procedures. Principles and applications. Ann Intern Med 1981; 94 (4 pt 2): 557-592

4. Banta HD, Behney CJ: Policy formulation and technology assessment. Milbank Mem Fund Q 1981; 59: 445-479
5. Fineberg HV, Bauman R, Sosman M: Computerized cranial tomography. Effect on diagnostic and therapeutic plans. JAMA 1977; 238: 224-227
6. Banta HD, Behney CJ, Willems JS: Toward Rational Technology in Medicine: Considerations for Health Policy, Springer-Verlag, New York, 1981
7. Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ont.: Interpretation of diagnostic data: 5. How to do it with simple maths. Can Med Assoc J 1983; 129: 947-954
8. Swets JA, Pickett RM, Whitehead SF et al: Assessment of diagnostic technologies: advanced measurement methods are illustrated in a study of computed tomography of the brain. Science 1979; 205: 753-759
9. Kramer MS, Feinstein AR: Clinical biostatistics. LIV. The biostatistics of concordance. Clin Pharmacol Ther 1981; 29: 111-123
10. Fleiss J: Statistical Methods for Rates and Proportions, Wiley, New York, 1973: 146-153
11. Cohen J: Weighted kappa. Psychol Bull 1968; 70: 213-230
12. Rozanski A, Diamond GA, Berman D et al: The declining specificity of exercise radionuclide ventriculography. N Engl J Med 1983; 309: 518-522
13. Diamond GA, Forrester JS: Metadiagnosis: an epistemologic model of clinical judgment. Am J Med 1983; 75: 129-137
14. Feinstein AR: An additional basic science for clinical medicine: II. The limitations of randomized trials. Ann Intern Med 1983; 99: 544-550
15. Alperovitch A: Controlled assessment of diagnostic techniques: methodological problems. Effect Health Care 1983; 1: 187-190
16. Robbins AH, Pugatch RD, Gerzof SG et al: Observations on the medical efficacy of computed tomography of the chest and abdomen. Am J Roentgenol 1978; 131: 15-19
17. Feeny D, Guyatt G, Tugwell P (eds): Health Care Technology: Effectiveness, Efficiency and Public Policy (in press)
18. Guyatt GH, Tugwell PX, Feeny DH et al: The role of before-after studies of therapeutic impact in the evaluation of diagnostic technologies. J Chronic Dis (in press)
19. Goldman L, Feinstein AR, Batsford WP et al: Ordering patterns and clinical impact of cardiovascular nuclear medicine procedures. Circulation 1980; 62: 680-687
20. Goldman L, Cohn PF, Mudge GH Jr et al: Clinical utility and management impact of M-mode echocardiography. Am J Med 1983; 75: 49-56
21. Wittenberg J, Fineberg HV, Ferrucci JT et al: Clinical efficacy of computed body tomography. AJR 1980; 134: 111-120
22. Dixon AK, Fry IK, Kingham JG et al: Computed tomography in patients with an abdominal mass: effective and efficient? A controlled trial. Lancet 1981; 1: 1199-1201
23. WHO cooperative trial on primary prevention of ischaemic heart disease using clofibrate to lower serum cholesterol: mortality follow-up. Report of the Committee of Principal Investigators. Lancet 1980; 2: 379-385
24. Dronfield MW, Langman MJS, Atkinson M et al: Outcome of endoscopy and barium radiography for acute upper gastrointestinal bleeding: controlled trial in 1037 patients. Br Med J [Clin Res] 1982; 284: 545-548
25. Brett GZ: The value of lung cancer detection by six-monthly chest radiographs. Thorax 1968; 23: 414-420
26. Ibid: Earlier diagnosis and survival in lung cancer. Br Med J 1969; 4: 260-262
27. Brown VA, Sawers RS, Parsons RJ et al: The value of antenatal cardiotocography in the management of high-risk pregnancy: a randomized controlled trial. Br J Obstet Gynaecol 1982; 89: 716-722
28. Flynn AM, Kelly J, Mansfield H et al: A randomized controlled trial of non-stress antepartum cardiotocography. Idem: 427-433
29. Feinstein AR: An additional basic science for clinical medicine: III. The challenges of comparison and measurement. Ann Intern Med 1983; 99: 705-712
30. Christie D: Before-and-after comparisons: a cautionary role. Br Med J 1979; 2: 1629-1630
31. Elias E, Hamlyn AN, Jain S et al: A randomized trial of percutaneous transhepatic cholangiography with the Chiba needle versus endoscopic retrograde cholangiography for bile duct visualization in jaundice. Gastroenterology 1976; 71: 439-443
32. Durbridge TC, Edwards F, Edwards RG et al: An evaluation of multiphasic screening on admission to hospital. Precis of a report to the National Health and Medical Research Council. Med J Aust 1976; 1: 703-705
33. Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Centre: How to read clinical journals: V. To distinguish useful from useless or even harmful therapy. Can Med Assoc J 1981; 124: 1156-1162
34. Kirshner B, Guyatt GH: A methodological framework for assessing health indices. J Chronic Dis 1985; 38: 27-36
35. Guyatt GH, Bombardier C, Tugwell PX: Measuring disease-specific quality of life in clinical trials. Can Med Assoc J (in press)
36. Sox HC Jr, Margulies I, Sox CH: Psychologically mediated effects of diagnostic tests. Ann Intern Med 1981; 95: 680-685
37. Carrera GF, Gerson DE, Schnur J et al: Computed tomography of the brain in patients with headache or temporal lobe epilepsy: findings and cost-effectiveness. J Comput Assist Tomogr 1977; 1: 200-203
38. Larson EB, Omenn GS, Lewis H: Diagnostic evaluation of headache. Impact of computerized tomography and cost-effectiveness. JAMA 1980; 243: 359-362
39. Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Centre: How to read clinical journals: VII. To understand an economic evaluation (part A). Can Med Assoc J 1984; 130: 1428-1432, 1434
40. Idem: How to read clinical journals: VII. To understand an economic evaluation (part B). Ibid: 1542-1549
41. Drummond MF, Stoddart GL: Economic analysis and clinical trials. Controlled Clin Trials 1984; 5: 115-128
42. Zimmerman RA, Bilaniuk LT, Gennarelli T et al: Cranial computed tomography in diagnosis and management of acute head trauma. Am J Roentgenol 1978; 131: 27-34
43. Ambrose J, Gooding MR, Uttley D: E.M.I. scan in the management of head injuries. Lancet 1976; 1: 847-848
44. Hull R, Hirsh J, Sackett DL et al: Combined use of leg scanning and impedance plethysmography in suspected venous thrombosis. An alternative to venography. N Engl J Med 1977; 296: 1497-1500
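The patient-selection point made in the text can be checked numerically. Some patients have a pretest likelihood of disease so low, or so high, that no result on the new test would change therapy. The sketch below shows this with the odds form of Bayes' theorem. It is an illustration under stated assumptions, not the authors' method. It assumes that the test's likelihood ratios for a positive and a negative result are known, and that therapy is started whenever the probability of disease reaches a single threshold. The function names and all numbers are hypothetical and are not taken from the studies cited.

```python
from collections.abc import Sequence


def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' theorem: post-test odds = pretest odds x likelihood ratio."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def result_can_change_therapy(pretest: float, lr_positive: float,
                              lr_negative: float, treat_threshold: float) -> bool:
    """True if at least one test result moves the patient across the treatment threshold.

    Hypothetical decision rule: treat whenever the probability of disease is at or
    above treat_threshold.
    """
    treated_without_test = pretest >= treat_threshold
    treated_if_positive = post_test_probability(pretest, lr_positive) >= treat_threshold
    treated_if_negative = post_test_probability(pretest, lr_negative) >= treat_threshold
    return (treated_if_positive != treated_without_test
            or treated_if_negative != treated_without_test)


def fraction_informative(pretests: Sequence[float], lr_positive: float,
                         lr_negative: float, treat_threshold: float) -> float:
    """Share of a (hypothetical) trial sample in whom the test could alter therapy."""
    flags = [result_can_change_therapy(p, lr_positive, lr_negative, treat_threshold)
             for p in pretests]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical test: LR+ = 10, LR- = 0.1; treat when probability >= 0.5.
    for p in (0.02, 0.3, 0.6, 0.98):
        print(p, result_can_change_therapy(p, 10.0, 0.1, 0.5))
    print(fraction_informative([0.02, 0.3, 0.6, 0.98], 10.0, 0.1, 0.5))
```

With these hypothetical values, only the patients with pretest probabilities of 0.3 and 0.6 could have their therapy altered. For the patient at 0.02, a positive result raises the probability only to about 0.17. For the patient at 0.98, a negative result lowers it only to about 0.83. Enrolling patients outside the informative band dilutes any difference in outcome between the randomized groups, which is the loss of power the text describes.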
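The economic step described in the text relates the incremental cost of using the test to the incremental improvement in patient outcome, then compares that ratio with other uses of health care resources. This amounts to an incremental cost-effectiveness ratio. The sketch below is a minimal illustration, not a full economic evaluation. It assumes that mean costs and quality-adjusted life-years (QALYs) per patient have already been estimated from a trial. The `Strategy` class, the helper names and every number are hypothetical.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A diagnostic strategy summarized by mean cost and mean QALYs per patient."""
    name: str
    cost: float   # mean cost per patient (hypothetical currency units)
    qalys: float  # mean quality-adjusted life-years per patient


def incremental_cost_per_qaly(new: Strategy, old: Strategy) -> float:
    """Incremental cost divided by incremental QALYs of `new` versus `old`.

    A negative result means the new strategy is both cheaper and more effective.
    """
    delta_cost = new.cost - old.cost
    delta_qalys = new.qalys - old.qalys
    if delta_qalys <= 0:
        # Without an outcome gain the ratio is not meaningful.
        raise ValueError("new strategy shows no QALY gain over the old one")
    return delta_cost / delta_qalys


def share_of_benchmarks_less_efficient(ratio: float,
                                       benchmark_ratios: Sequence[float]) -> float:
    """Fraction of other interventions whose cost per QALY exceeds `ratio`."""
    return sum(r > ratio for r in benchmark_ratios) / len(benchmark_ratios)


if __name__ == "__main__":
    usual = Strategy("usual work-up", cost=1000.0, qalys=8.0)
    with_test = Strategy("work-up including new test", cost=4000.0, qalys=8.125)
    ratio = incremental_cost_per_qaly(with_test, usual)
    benchmarks = [5000.0, 20000.0, 40000.0, 60000.0, 100000.0]  # hypothetical
    print(ratio, share_of_benchmarks_less_efficient(ratio, benchmarks))
```

Here the ratio is 3000 / 0.125 = 24,000 per QALY gained, which is lower than three of the five hypothetical benchmark interventions. In the article's terms, a ratio below that of many alternative uses of resources suggests the technology is an efficient use of them. The sketch omits refinements such as discounting and sensitivity analysis.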