
Fundamentals of Clinical Research for Radiologists

The Challenge of Clinical Radiology Research

C. Craig Blackmore¹

Received July 14, 2000; accepted without revision July 14, 2000.

C. C. Blackmore received salary support as a General Electric–Association of University Radiologists Academic Fellow.

Series editors: Craig A. Beam, C. Craig Blackmore, Steven Karlik, and Caroline Reinhold.

This is the first in a series designed by the American College of Radiology (ACR), the Canadian Association of Radiologists, and the American Journal of Roentgenology. The series, which will ultimately comprise 22 articles, is designed to progressively educate radiologists in the methodologies of rigorous clinical research, from the most basic principles to a level of considerable sophistication. The articles are intended to complement interactive software that permits the user to work with what he or she has learned, which is available on the ACR Web site.

Project coordinator: Bruce J. Hillman, Chair, ACR Commission on Research and Technology Assessment.

¹Department of Radiology, University of Washington, Harborview Medical Center, 325 Ninth Ave., Box 359728, Seattle, WA 98104. Address correspondence to C. A. Beam, Department of Radiology, Medical College of Wisconsin, 8701 Watertown Plank Rd., Milwaukee, WI 53226.

AJR 2001;176:327–331
AJR:176, February 2001
0361–803X/01/1762–327
© American Roentgen Ray Society

The development of new technology traditionally has been the lifeblood of radiology. Many of the spectacular advances in medicine over the past few decades have centered around radiology. One does not have to go far into the past to predate the development of CT, MR imaging, and sonography, technologies that now are omnipresent, critical components of medical care. Yet for all the advances in the development of imaging technology, radiology research has come under deserved criticism in its efforts to assess the effectiveness and appropriate use of such imaging technology [1–5]. Production of a technologically adequate image is a starting point, but it is only the first step in determining whether such a technology should be used in clinical care. To be useful, an imaging study must also be accurate and provide information that has the potential to change the medical care, and ultimately the health, of the patient [6, 7].

This article is the first of an ongoing series that, taken together, will form a comprehensive teaching primer on basic and advanced concepts in technology assessment and outcomes research as described in the introductory article in this month’s issue of the American Journal of Roentgenology (AJR) [8]. This series of articles published in the AJR will form one component of the research course cosponsored by the American College of Radiology and the Canadian Association of Radiologists on the fundamentals of clinical research for radiologists. Tightly linked with these articles will be Web-based interactive teaching modules. The intent of this integrated series is to be progressive, starting with basic introductory concepts and gradually adding complexity through intermediate and more advanced modules. The objective is to provide a pathway for the novice researcher to learn to critically appraise the literature and to conduct evidence-based radiology, to communicate effectively with methodology experts, and finally, to perform or direct independent, scientifically valid, and clinically useful research.

The concepts introduced in this first article will be by design simplistic. The intent of this first module is to introduce the scope of the material that is to be covered in much greater detail in the sessions to come. Many of the major concepts of rigorous technology assessment will be introduced, with detailed discussions to follow in future modules. This introduction describes the problems of research in radiology and attempts to provide the radiology investigator with an understanding of some of the potential pitfalls to be avoided.

Evidence-Based Radiology

Every day in the clinical practice of radiology, we make observations and adjust our practice accordingly. Many of the great advances in science have arisen from just such observations. The fortuitous observation that bacterial colonies did not grow around bread mold led Alexander Fleming to discover penicillin. In radiology, we constantly observe the imaging appearances of diseases and healthy states and subtly adjust our thresholds for interpretation. However, at the same time, this simple anecdote and experience is, by definition, limited to what we personally have seen and is most strongly influenced by what we have seen recently. We have all observed the phenomenon that after a patient presents with a rare and difficult-to-diagnose disease, the next group of patients who appear at all similar will be examined for that same malady. Our belief that a disease is rare is shaken by the fact that we have seen it, and have seen it recently. The same is true for the use of diagnostic technology. For example, the fact that we have diagnosed a case of testicular seminoma from CT findings does not mean that CT is the imaging modality of choice for this condition, or that all patients at risk for testicular seminoma should undergo CT.

To supersede this practice based on anecdote, the field of evidence-based medicine has evolved and has become the standard for medical practice [9, 10]. Although less established in radiology than in other areas of medicine, this evidence-based paradigm is no less relevant for radiology [11]. The construct underlying evidence-based medicine is that one individual’s experience is limited. Decisions should be based on the best evidence from the medical literature rather than one’s own limited experience [9, 11]. As a corollary, as physicians we tend to cling to what we were taught in residency or fellowship, often by acknowledged experts in the field. However, the evidence-based paradigm suggests that the experts are also individuals, and we should trust their anecdotal experience only somewhat more than we trust our own. Instead, practice should be guided by rigorous scientific investigation [9, 11, 12].

The major source for the evidence on which to base practice is the medical literature. With the rapid proliferation in radiology technology has come a parallel increase in the volume of the radiology literature. There are now more than 40 radiology journals and more than 4000 articles published each year [13]. However, the published literature has its own perils and should be interpreted with a critical eye. First, case reports, even if published, are essentially anecdotes that are codified in print. Although they are often interesting, may be provocative, and can raise questions for scientific study, they should not form the basis for practice. Second, and more insidious, are published reports that, although well intended, contain biases or flaws in the methodology that attenuate the applicability of the results to practice. A central tenet of evidence-based medicine is that the literature must be analyzed critically, and only those studies that are robust should be used as the basis for practice [11, 14]. A useful framework for evaluating the value of a literature article is promoted by Kent et al. [2], who propose a four-grade scale (Appendix). At the top level (grade A) are methodologically rigorous studies with broad generalizability, including large randomized clinical trials and prospective comparisons of diagnostic test results to an appropriate gold standard. At the bottom level of this hierarchy are grade D studies, which include multiple methodologic flaws, biases in study design, or unsubstantiated opinion [2, 15]. Most of the radiology literature relates to development of new techniques and descriptive work. Actual assessment of these new technologies and determination of any impact on patient outcome is relatively uncommon [4]. Few grade A or B studies exist. New radiology technologies have been rapidly developed and disseminated, often without adequate proof of efficacy [1, 16, 17]. Although radiologists may not have paid great attention to the shortcomings in their research efforts, these limitations may have been more apparent to the remainder of the medical community.

Early studies of MR imaging represent an illustrative example of how radiology research has come under external criticism, particularly for methodologic deficiencies. Developed in the 1970s and early 1980s, MR imaging was initially greeted with a variety of investigations and reports in the radiology literature in particular, describing the exciting potential of this new modality. However, most of this early research was merely descriptive. Those studies that attempted to assess even accuracy were limited in size and generally suffered from important design flaws [2, 16, 18, 19]. A 1988 article by Cooper et al. [16] noted that none of the initial 54 research reports on the efficacy of MR imaging met accepted contemporary standards for research design. The article concluded that “health care professionals paying for expensive innovative technology should demand better research on diagnostic efficacy.” In 1994, Kent et al. [2] found that of 142 studies of MR neuroimaging published through 1993, only one provided grade A information, 28 provided grade B or C, and most (113) provided only grade D information. Kent et al. concluded that despite the fact that more than 2000 MR imaging scanners had been installed, the evidence supporting the use of MR imaging in clinical practice was weak.

The credibility of the radiology research community was shaken by these criticisms, with some nonradiologists questioning whether conflicts of interest would influence radiologists and organized radiology [17]. Similar methodologic deficiencies have also been reported for radiology economic analyses [3, 20]. Today, more sophisticated and dependable research methods have been applied to MR imaging and assessment of efficacy with this modality for a number of indications. However, most of the research literature on the use of radiology techniques remains descriptive, with little published work on the influence of radiology on patient treatment or outcome [4]. One of the reasons for these deficiencies is the lack of research training of the individual radiology investigators. Unfortunately, training in research methodology has been underemphasized in radiology residency training in the United States [21]. Many radiologists, although highly skilled clinicians, have only a rudimentary background in research methodology and lack many of the basic tools required to perform a critical review of the medical literature. The objective of this discussion is to introduce some major concepts in research design and in critical literature review. More detailed discussion will be included in subsequent modules.

Anatomy of a Research Project

It is useful to review the anatomy of a research project. This standard framework is the foundation of the scientific literature. In brief, a research question is formulated, methods are derived to answer the question, data are collected and analyzed, and conclusions are drawn. Within this framework are several key concepts that are discussed in the following text, including formulation of the research question, use of efficient study design, avoidance of error and bias, and appropriate data analysis.

The Research Question

The first step in any research endeavor is to frame an appropriate research question. This question must be important (or it is not worth our efforts), but it also must be precise [22, 23]. As an example, we can start with a common and vexing clinical problem that has been the cause of considerable interest in the radiology literature, “Which test is better in patients with possible appendicitis, CT or sonography?” This question is certainly important and clinically relevant, but as framed above it cannot be answered. The question must be defined more precisely with respect to the type of patients in whom the question is being raised, the target population, and what is actually being asked. The imaging accuracy and usefulness of sonography and CT will likely vary on the basis of a number of patient-specific variables. Are the patients we are interested in adults or children? Are they thin or fat? Are they cooperative or uncooperative? Are they men or women? Disease-specific factors may also affect the imaging. Has the patient been symptomatic for a few hours and we suspect simple unperforated appendicitis, or has the patient been symptomatic for 4 days and appears septic, leading us to suspect an abscess? These factors also might affect the performance of sonography and CT.

Finally, how we are using the findings of an imaging study might affect the determination of optimal imaging modality. Are we using imaging to confirm appendicitis en route to the operating room, or are we using imaging to look for other abnormalities that might mimic appendicitis, such as ureteral calculi, diverticulitis, or even abdominal aortic aneurysm? A better defined research question might be, “In nonpregnant women younger than 40 years with symptoms suggestive of appendicitis but no peritoneal signs, what is the preferred imaging modality to exclude the presence of an abdominal condition that might require surgical intervention?” This reformulated research question is perhaps less “sexy” than “Which test is better?” but it is also much more useful. The reformulated question is no longer an issue of comparing radiology tests. Instead, we are asking a clinical question about a specific group of patients that can potentially affect the health of those patients [22–25]. Some experienced researchers believe that formulating and framing the research question is the most challenging aspect of doing research [22].

Study Design

Having determined the question to be answered, the next issue is the research methodology itself. To produce evidence that will appropriately drive decision making, experimental design is of critical importance and will be the focus of much of this article series. The goal of study design is to achieve the most with the least (i.e., to achieve efficiency). Fortunately, we can draw on decades of experience from clinical epidemiologists and biostatisticians to determine the most efficient way of designing studies and the most appropriate way to critique research. Prospective comparisons of diagnostic test results with a well-defined reference test and randomized double-blinded clinical trials are the study designs that provide the best information to guide clinical practice [2, 26]. However, other study designs, including cohort and case-control investigations and modeling studies can also provide useful information [4, 26]. These study designs will be discussed in detail in future modules.

Error

The research design is intended to arrive at the truth for the question under study. One of the major driving factors of research design is the effort to avoid or control error. Error can be divided into two general categories: random error, and systematic error, also known as bias. Random error, as the name implies, is due to chance events that have the potential to lead to false conclusions. The field of statistics has evolved in large part to deal with the random and therefore unpredictable error that can occur in any study design. Statistics is a methodology for drawing inference about populations from data collected on samples [27]. In medicine, we generally accept events as being true (not related to random chance) if the probability of their random occurrence is less than 5%, expressed as a p value of less than 0.05. Of course, unlikely events do occur. Type I (also known as alpha) error occurs when we conclude that a difference exists when in fact two groups are the same. At a significance threshold of p less than 0.05, we will make such type I errors in 5% of comparisons in which no true difference exists. However, if a study involves multiple comparisons (e.g., comparing six different MR imaging pulse sequences), then the probability of a type I error also increases [28].

The opposite of type I error, known as type II error, is when we conclude that two populations are the same when in fact they are not. Unfortunately, the commonly reported p value gives no information about the potential for this type II, or beta, error. There is a common misconception that a p value greater than 0.05 indicates that two groups are the same. However, this is only true if the study sample has sufficient size to have the power to detect a difference if it is present [27]. Sufficient sample size is determined by the size of difference we are interested in detecting, usually the amount of difference that would be clinically significant, and by the desired power of the study [27, 29]. Power is the chance the study will reveal the clinically significant difference when it exists and equals one minus the type II error probability. As an example, a study might report 90% power to detect a difference of 5%.

Bias

The opposite of random error is systematic error that is introduced through inadequacy in the study design, subject selection, or analysis. Statistics are for the most part unable to compensate for systematic error. Avoidance of such systematic error, or bias, is one of the major challenges of research design. Unfortunately, many of the apparently simple research designs that are common in the radiology literature succumb to bias. As an example, one could imagine a study designed to compare CT and MR imaging for detection of liver metastases in patients with known adenocarcinoma of another organ. To identify patients for such a study, one might review all the patients who underwent both tests, and using some external gold standard, make a comparison. However, would this study design be free of bias? Likely, there would be significant bias in the selection of the subjects. For example, if at a given center CT is generally used as the initial imaging modality for the evaluation of possible liver metastases, then the patients who undergo both imaging studies would be the ones in whom the initial CT was equivocal. The comparison would not be CT versus MR imaging, but rather, CT versus MR imaging in patients in whom the CT was equivocal. Of course, the results of such a study would underestimate the accuracy of CT, because only those cases that are difficult to diagnose with CT were included. This is a simple but unfortunately common example of selection bias in recruiting patients for a study. Selection bias occurs when the subjects studied are not representative of the target population. In the previous example, the target population is all patients with known adenocarcinoma of another organ. However, the study group is only those patients with known adenocarcinoma who underwent both CT and MR imaging. To avoid this bias, subject selection should be based on clinical criteria (e.g., all subjects with a new diagnosis of adenocarcinoma) rather than availability of imaging studies [14, 22].

When using a test to screen a population, selection bias can be more subtle but equally problematic. Intuitively, one would expect that if a cohort of subjects is randomly selected to undergo a radiologic screening test, we could compare the subjects who actually undergo screening with those who elect not to undergo screening and make reasonable conclusions. However, convincing evidence from previous screening studies indicates that differences exist between subjects who elect screening and those who refuse. Subjects who elect to undergo screening may be more health conscious, or more optimistic, or there may be some other factor that is not understood [4, 30]. Thus, in a research study designed to investigate patient outcome for a new screening study, comparison of those who undergo the screening with those who elect against screening could show improved outcomes in the screened group even if the test has no benefit, or is even harmful. Therefore, to investigate the effectiveness of a screening study, it is essential to compare patients who are randomized to be invited for screening to those who are randomized not to be invited. In the analysis, all subjects are included, regardless of whether they actually undergo the screening study. This is known as an intention-to-treat analysis and avoids the subtle bias I have described [4].

Other bias can develop from the way in which data are collected. All humans have preconceived notions, both conscious and unconscious. These preconceptions alter the way in which we observe our surroundings and can unintentionally affect data that we collect, which is referred to as review bias. To remove any review bias, it is necessary to ensure that the individual who collects the data is unaware of the outcome under study. For example, the individual who determines if a test is positive should not know whether the subject truly has the disease in question. Also, when comparing two tests, the results of the first test should not be known before interpretation of the second. A recent analysis of research on diagnostic tests by Reid et al. [1], which included some radiology studies, reported that 62% of research studies did not document that appropriate steps had been taken to avoid such review bias.

Similarly, if different gold standards are used for patients with disease than for those without, then results of accuracy studies may be overestimated. Lijmer et al. [31] found that the reported accuracy of diagnostic studies was significantly greater if different verification standards were applied to patients with and without disease than if the same gold standard was applied to all. The term “verification bias” has been applied to this problem [31, 32].

Additional potential biases in diagnostic test evaluation include spectrum bias, in which only patients with overt disease are used in assessment of a diagnostic test. Not including subtle or indeterminate cases can also lead to overestimation of test accuracy [31, 32]. Prospective data collection is generally less subject to bias than retrospective collection and is therefore preferred when designing a study. However, retrospective data collection may be preferred in a few circumstances, such as when prospective data collection would remove the ability to blind the observers and would therefore potentially introduce greater bias.

The effect of these various biases has been documented. In general, studies with bias tend to report more encouraging results than those without bias [31]. In addition, preliminary studies of a diagnostic technology, performed with small sample size and vulnerable to bias, often will be highly optimistic about the capabilities of that technology. Subsequent reports may present a more realistic appraisal [32].

Data Analysis

Research is conducted on samples. We measure outcome or accuracy on a relatively small number of subjects. Yet the intent of research is (eventually) to influence clinical care. To achieve this, the research results must be valid on subjects other than those included in the study. Statistics is the science that allows us to make inferences about populations from measurements made on samples. A vast array of tools is available to the biostatistician to enable such inference. These tools must be familiar to the research radiologist and will be discussed in future modules. In this discussion I will limit myself to introduction of the concepts of validity and reliability.

Validity can be divided into internal validity and external validity, which is also known as generalizability. Internal validity refers to the extent to which the results and conclusions of a study actually relate to true events in the sample under study. Some of the biases and study design considerations described previously relate to validity. For example, an observer who is aware of the results of the reference test might unintentionally overestimate the accuracy of the diagnostic test under study. Thus, the recorded results might not be an internally valid representation of the actual sample. The method of data analysis and the statistical tests used are also critical to the internal validity of the study, because use of inappropriate analysis can lead to false conclusions.

Similarly, the external validity of a study is dependent on both the research design and the analytic methods. The extent to which the sample selected truly reflects the target population is a strong determinant of the generalizability of a study [22]. Also, the use of appropriate statistics allows determination of what inferences can be drawn about the target population on the basis of the sample data.

A final consideration is study reliability. Reliability refers to the extent to which the study is reproducible [1, 24]. The opposite of reliability is variability. Interpretation of some diagnostic tests can be quite subjective. If different observers cannot agree on the test result on the same subject, then interobserver variability is high. Similarly, if the same observer determines the results of the same test to be different at different times, then intraobserver variability is high. If a test has low reliability, then the test cannot achieve high accuracy in general practice [1].

Conclusion

Performing methodologically rigorous scientific research is not a trivial task. The optimal research study will be directed at an important, precisely defined clinical question, with a specified target population matched by the subject selection. The most efficient study design will be used and the sample size will be sufficient to limit type II error to an acceptable level. Further, bias will be avoided, and the results will be reliable, internally valid, and generalizable to the target population and possibly beyond. Success at such demanding research endeavors is certainly within the reach of radiologists and radiology researchers. However, training, the goal of this series of articles, is necessary.

In this article, I have attempted to introduce the problem: the need for improved research methodology in radiology research. I have also begun to outline the solution through briefly introducing the concept of evidence-based radiology and discussing the basics of research methodology: posing the research question, study design, error, bias, and data analysis. I am certain that this discussion has been too basic for some and too sophisticated for others. However, in the modules that follow, increasing depth, clarity, and detail will be added to the rough outline that has been described in this article. By the conclusion of this project, the radiology investigator will have a comprehensive resource to aid the transition from relative novice to skilled researcher.

References

1. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research: getting better but still not good. JAMA 1995;274:645–651
2. Kent DL, Haynor DR, Longstreth WT Jr, Larson EB. The clinical efficacy of magnetic resonance imaging in neuroimaging. Ann Intern Med 1994;120:856–871
3. Blackmore CC, Magid DJ. Methodologic evaluation of the radiology cost-effectiveness literature. Radiology 1997;203:87–91


4. Blackmore CC, Black WB, Jarvik JG, Langlotz CP. A critical synopsis of the diagnostic and screening radiology outcomes literature. Acad Radiol 1999;6[suppl 1]:S8–S18
5. Hillman BJ. Outcomes research and cost-effectiveness analysis for diagnostic imaging. Radiology 1994;193:307–310
6. Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991;11:88–94
7. Thornbury JR. Clinical efficacy of diagnostic imaging: love it or leave it. (Eugene W. Caldwell lecture) AJR 1994;162:1–8
8. Beam CA, Blackmore CC, Karlik S, Reinhold C. Fundamentals of clinical research for radiologists: editors’ introduction to the series. AJR 2001;176:323–325
9. Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine. New York: Churchill Livingstone, 1997:2–3
10. Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA 1992;268:2420–2425
11. Wood BP. What's the evidence? Radiology 1999;213:635–637
12. Eisenberg JM. Ten lessons for evidence-based technology assessment. JAMA 1999;282:1865–1869
13. Index to imaging literature. Radiology 1999;210[suppl]:iv–v
14. Jaeschke R, Guyatt G, Sackett DL. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? JAMA 1994;271:389–391
15. Kent DL, Larson EB. Disease, level of impact, and quality of research methods: three dimensions of clinical efficacy assessment applied to magnetic resonance imaging. Invest Radiol 1992;27:245–254
16. Cooper LS, Chalmers TC, McCally M, Berrier J, Sacks HS. The poor quality of early evaluations of magnetic resonance imaging. JAMA 1988;259:3277–3280
17. Kent DL, Larson EB. Diagnostic technology assessments: problems and prospects. Ann Intern Med 1988;108:759–761
18. Beam CA, Sostman HD, Zheng J. Status of clinical MR evaluations 1985–1988: baseline and design for future assessments. Radiology 1991;180:265–270
19. Kent DL, Larson EB. Magnetic resonance imaging of the brain and spine: is clinical efficacy established after the first decade? Ann Intern Med 1988;108:402–424 [Erratum in Ann Intern Med 1988;109:438]
20. Blackmore CC, Smith WJ. Economic analyses of radiological procedures: a methodological evaluation of the medical literature. Eur J Radiol 1998;27:123–130
21. Hillman BJ, Putman CE. Fostering research by radiologists: recommendations of the 1991 summit meeting. Radiology 1992;182:315–318
22. Eng J, Siegelman SS. Improving radiology research methods: what is being asked and who is being studied? Radiology 1997;205:651–655
23. Hulley SB, Cummings SR. Designing clinical research. Baltimore: Williams & Wilkins, 1988:12–18
24. Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? JAMA 1994;271:703–707
25. Black WC. How to evaluate the radiology literature. AJR 1990;154:17–22
26. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. Boston: Little, Brown, 1991:51–68
27. Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991
28. Fleiss JL. Statistical methods for rates and proportions, 2nd ed. New York: Wiley, 1981:121
29. Obuchowski NA. Testing for equivalence of diagnostic tests. AJR 1997;168:13–17
30. Black WC, Welch HG. Screening for disease. AJR 1997;168:3–11
31. Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–1066
32. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926–930

APPENDIX: Quality of Research Methods

Grade A: Studies with broad generalizability
• No significant flaws
• Prospective comparison of a diagnostic test with a well-defined diagnosis
• Large randomized, blinded clinical trial assessing therapeutic efficacy or patient outcome

Grade B: Studies with narrower spectrum of generalizability
• Few well-described flaws with definable impact on the results
• Prospective study of diagnostic tests
• Randomized trial of therapeutic effects and patient outcomes

Grade C: Studies with limited generalizability
• Multiple flaws in research methods, small sample size, incomplete reporting
• Retrospective studies of diagnostic accuracy

Grade D: Studies with multiple flaws in research methods
• Obvious selection bias
• Opinions without substantiating data

(Modified from Kent et al. [2, 15])
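
As a numerical aside to the discussion of error in this article, the two quantitative statements made there (that the chance of a type I error grows with multiple comparisons, and that sample size follows from the difference to be detected and the desired power) can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not a method taken from the article or its references. The function names are hypothetical, the comparisons are assumed to be statistically independent, the sample size uses a common normal-approximation formula for comparing two proportions, and the accuracies of 90% and 85% are invented solely to give a 5% difference.

```python
import math
from statistics import NormalDist


def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one type I error across k comparisons.

    Assumption: the k comparisons are independent and each is tested
    at significance threshold alpha.
    """
    return 1.0 - (1.0 - alpha) ** k


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate subjects per group needed to detect p1 versus p2.

    Assumption: two-sided comparison of two independent proportions
    using the usual normal approximation (not an exact calculation).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # controls type I error
    z_beta = NormalDist().inv_cdf(power)               # power = 1 - type II error
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical numbers of comparisons, each tested at p < 0.05.
    for k in (1, 6, 15):
        print(f"{k:2d} comparisons: P(any type I error) = {familywise_error(0.05, k):.3f}")
    # Hypothetical accuracies differing by 5 percentage points, 90% power.
    print("subjects per group:", n_per_group(0.90, 0.85))
```

Under these assumptions, six independent comparisons at a threshold of 0.05 carry roughly a 26% chance of at least one type I error, and the required sample size rises as the desired power rises or as the difference to be detected shrinks.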
