Statistical Concepts Series

Radiology

An Introduction to Biostatistics1

Kimberly E. Applegate, MD, MS
Philip E. Crewson, PhD

Index terms: Radiology and radiologists, research; Statistical analysis

Published online before print 10.1148/radiol.2252010933
Radiology 2002; 225:318–322

1 From the Department of Radiology, Riley Hospital for Children, 702 Barnhill Rd, Indianapolis, IN 46202 (K.E.A.); and the Department of Research, American College of Radiology, Reston, Va (P.E.C.). Received May 16, 2001; revision requested July 6; revision received September 5; accepted September 24. Address correspondence to K.E.A. (e-mail: kiappleg@iupui.edu).

See also the From the Editor article in this issue.

© RSNA, 2002

This introduction to biostatistics and measurement is the first in a series of articles designed to provide Radiology readers with a basic understanding of statistical concepts. Although most readers of the radiology literature know that application of study results to their practice requires an understanding of statistical issues, many may not be fully conversant with how to interpret statistics. The goal of this series is to enhance the ability of radiologists to evaluate the literature competently and critically, not make them into statisticians.

There are three kinds of lies: lies, damned lies and statistics.
Benjamin Disraeli (1)

The use of statistics in both radiology journals and the broader medical literature has
become a common feature of published clinical research (2). Although not often recognized
by the casual consumer of research, errors in statistical analysis are common, and
many believe that as many as 50% of the articles in the medical literature have statistical
flaws (2). Most radiologists, however, are poorly equipped to properly interpret many of
the statistics reported in the radiology literature. There are a number of reasons for this
problem, but the reality for the radiology profession is that research methods have long
been a low priority in the training of radiologists (3–5). Contributing to this deficit are a
general indifference toward statistical teaching in medical school and physician training,
insufficient numbers of statisticians, and limited collaboration and understanding
between radiologists and statisticians (2,6,7).
If it has traditionally been such a low priority in the profession, why then do we need
to improve our understanding of statistics? We are consumers of information. Statistics
allow us to organize and summarize information and to make decisions by using only a
sample of all available data. Nearly all readers of the radiology literature know that
understanding a study’s results and determining the applicability of the results to their
practice requires an understanding of statistical issues (5). Even when learned, however,
research skills can be quickly forgotten if not applied on a regular basis—something most
radiologists are unlikely to do, given their increasing clinical demands.
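The idea of summarizing information and deciding from only a sample can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from this article: the simulated population of lesion diameters, its mean and spread, the sample size of 50, and the normal-approximation 95% interval are all invented here for demonstration.

```python
import math
import random
import statistics

# Hypothetical population: 10,000 lesion diameters in millimeters.
# The mean (12 mm) and standard deviation (3 mm) are invented for illustration.
rng = random.Random(0)
population = [rng.gauss(12.0, 3.0) for _ in range(10_000)]

# In practice the whole population is rarely observed; we work from a sample.
sample = rng.sample(population, 50)

# Summarize the sample.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(len(sample))

# Approximate 95% confidence interval for the population mean
# (normal approximation; 1.96 is the 97.5th percentile of the standard normal).
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"sample mean = {sample_mean:.2f} mm")
print(f"approximate 95% CI = ({ci_low:.2f}, {ci_high:.2f}) mm")
print(f"population mean = {statistics.mean(population):.2f} mm")
```

Changing the seed changes the sample, so the interval shifts from run to run; roughly 1 in 20 such intervals would be expected to miss the population mean.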
This introduction to biostatistics and measurement is the first in a series of articles
designed to provide Radiology readers with a basic understanding of statistical concepts.
These articles are meant to increase understanding of how statistics can and should be
applied in radiology research so that radiologists can appropriately interpret the results of
a study. Each article will provide a short summary of a statistical topic. The series begins
with basic measurement issues and progresses from descriptive statistics to hypothesis
testing, multivariate models, and selected technology-assessment topics. Key concepts
presented in this series will be directly related to the practice of radiology and radiologic
research. In some cases, formulas will be provided for those who wish to develop a deeper
understanding; however, the goal of this series is to enhance the ability of radiologists to
evaluate the literature competently and critically, not to make them statisticians.
The concepts presented in this introductory article are important for putting into perspective the substantive value of published research. Appendices A and B include two useful resources. One is a list of the common terms and definitions related to measurement. The other is a list of potentially useful Web resources. This list contains Web sites that are either primarily educational or have links to other resources such as statistical software. In addition, some suggested additional readings are listed in Appendix C.

ORIGINS OF STATISTICS

The term statistic simply means “numeric data.” In contrast, the field of statistics is a human enterprise encompassing a wide range of methods that allow us to learn from experience (8,9). Tied to the emergence of the scientific method in the 16th and 17th centuries, statistical thinking involves deduction of explanations of reality and framing of these explanations into testable hypotheses. The results of these tests are used to reach conclusions (inferences) about future outcomes on the basis of past experiences.

Statistical thinking is probabilistic. For example, most radiologists are familiar with the notion that a P value of less than .05 represents a statistically significant result. Few understand that .05 is an arbitrary threshold. While performing agronomy research, Sir Ronald Fisher helped to establish the P value cutoff level of .05 (commonly referred to as the α level) in the early part of the 20th century. Fisher (10) was testing hypotheses about appropriate levels of fertilizer for potato plants and needed a basis for decision making. Today we often use this same basis for testing hypotheses about appropriate patient care.

The health profession gradually recognized that statistics were as applicable to people as they were to potato plants. By the 1960s, clinicians and health policy leaders were asking for statistical evidence that an intervention was effective (11). Over the past several decades, the use of statistics in medical journals has increased both in quantity and in sophistication (12,13). Advances in computers and statistical software have paralleled this increase. However, the benefits of easy access to the tools of statistical analysis can be overshadowed by the costs associated with misapplication of statistical methods. Statistical software makes it far too easy to conduct multiple tests of data without prior hypotheses (the so-called data-mining phenomenon) or to report overly precise results that portend a false sense of accuracy. There is also the potential for errors in statistical software and the ever-present risk that researchers will fail to take the time to carefully look at the raw data (14). These issues can result in poor science, erroneous or misleading results, and inappropriate patient care. The benefits of statistical software generally far outweigh the costs, but proper measurement, study design, and good judgment should prevail over the ease with which many analyses can be conducted. What follows is an introduction to the basics of measurement.

MEASUREMENTS: BUILDING BLOCKS OF STATISTICS

The interpretation and use of statistics require a basic understanding of the fundamentals of measurement. Although most readers of the radiology literature will recognize common terms such as variables, association, and causation, few are likely to understand how these terms interrelate with one another to frame the structure of a statistical analysis. What follows is a brief introduction to the principles and vocabulary of measurement.

Operationalization

Emmet (15) wrote, “We must beware always of thinking that because a word exists the ‘thing’ for which that word is supposed to stand necessarily exists too.” Measurement begins with the assignment of numbers to events or things to help us describe reality. Measurements range from the obvious (eg, diameter, length, time) to the more difficult (eg, patient satisfaction, quality of life, pain), but all are something we can quantify or count. This process is called operationalization. Operationalized concepts range from well-established measures such as lesion diameter in millimeters to less well-defined measures such as image quality, contrast agent toxicity, patient comfort, and imaging cost.

If this appears somewhat abstract, consider the following three points: First, researchers can operationalize anything that exists (16), but some measures will be more imprecise (quality of life) than others (diameter). Second, since there is likely to be more than one way to operationalize a concept, the choice of the best way may not be obvious. Third, the radiology profession and the research it generates are saturated with conceptualizations that have been operationalized, some more successfully than others.

Variables

Variables represent measurable indicators of a characteristic that can take on more than one value from one observation to the next. A characteristic may have a different value in different people, in different places, or at different times. Such variables are often referred to as random variables when the value of a particular outcome is determined by chance (ie, by means of random sampling) (17). Since many characteristics are measured imperfectly, we should not expect complete congruence between a measure and truth. Put simply, any measurement has an error component.

If a measure does not take on more than one value, it is referred to as a constant. As an example, patient sex is a variable: It can vary between male and female from one patient to the next. However, a study of breast imaging is likely to be limited to female patients. In this context, sex is no longer a variable in a statistical sense (we cannot analyze it because it does not vary). In contrast, holding the value of one variable constant in order to clarify variations in other variables is sometimes referred to as a statistical control. With mammography as an example, it may be useful to estimate the accuracy of an imaging technique separately for women with and for women without dense breast tissue. As noted previously, however, operationalizing what is and is not dense breast tissue may not be as simple as it first appears.

Measurement Scales

There are four levels of data, commonly referred to as nominal, ordinal, interval, and ratio data. Nominal data classifies objects according to type or characteristic but has no logical order. With imaging technology as an example, ultrasonography, magnetic resonance (MR) imaging, computed tomography, and conventional radiography are each exclusive technologic categories, generally without logical order. Other common examples would be sex, race, and a radiologist’s primary subspecialty. Ordinal data also classify objects according to characteristic, but the categories can take on some meaningful order. The American College of Radiology Breast Imaging Reporting and Data System, or BI-RADS, classification system for final assessment is a good example of an ordinal scale. The categories are mutually exclusive (eg, a finding cannot be both “benign” and a “suspicious abnormality”), have some logical order (ranked from “negative” to “highly suggestive of malignancy”), and are scaled according to the amount of a particular characteristic they possess (suspicion of malignancy). Nominal
and ordinal data are also referred to as qualitative variables, since their underlying meaning is nonnumeric.

Interval data classify objects according to type and logical order, but the differences between levels of a measure are equal (eg, temperature in degrees Celsius, T scores reported for bone mineral density). Ratio data are the same as interval data but have a true zero starting point. As noted in the examples above, the values of degrees Celsius and T score can take on both positive and negative numbers. Examples of ratio data would be heart rate, percentage vessel stenosis, and respirations per minute. Interval and ratio data are also referred to as quantitative variables, since they have a direct numeric interpretation. In most analyses, it does not matter whether the data are interval or ratio data.

Continuous and Discrete Variables

Variables such as weight and diameter are measured on a continuous scale, meaning they can take on any value within a given interval or set of intervals. As a general rule of thumb, if a subdivision between intervals makes sense, the data are continuous. As an example, a time interval of minutes can be further divided into seconds, milliseconds, and an infinite number of additional fractions. In contrast, discrete variables such as sex, the five-point BI-RADS final assessment scale, race, and number of children in a household have basic units of measurement that cannot be divided (one cannot have 1.5 children).

Reliability and Validity

Measurement accuracy is directly related to reliability and validity. Reliability is the extent to which the repeated use of a measure yields the same values when no change has occurred. Therefore, reliability can be evaluated empirically. Poor reliability negatively affects all studies. As an example, reliability can depend on who performs the measurement and when, where, how, and from whom the data are collected.

Validity is the extent to which a measure is an accurate representation of the concept it is intended to operationalize. Validity cannot be confirmed empirically; it will always be in question. Although there are several different conceptualizations of validity, the following provides a brief overview. Predictive validity refers to the ability of an indicator to correctly predict (or correlate with) an outcome (eg, imaged abnormal lesion and subsequent malignancy). Content validity is the extent to which the indicator reflects the full domain of interest (eg, tumor shrinkage may be indicated by tumor width, height, or both). Construct validity is the degree to which one measure correlates with other measures of the same concept (eg, does a positive MR study for multiple sclerosis correlate with physical examination findings, patient symptoms, or laboratory results?). Face validity evaluates whether the indicator appears to measure the concept. As an example, it is unlikely that an MR study of the lumbar spine will facilitate a diagnosis for lost memory and disorientation.

Association

The connection between variables is often referred to as association. Association, also known as covariation, is exhibited by measurable changes in one variable that occur concurrently with changes in another variable. A positive association is represented by changes in the same direction (eg, heart rate increases as physical activity increases). Negative association is represented by concurrent changes in opposite directions (hours per week spent exercising and percentage body fat). Spurious associations are associations between two variables that can be better explained by a third variable. As an example, if after taking medication for a common cold for 10 days the symptoms disappear, one could assume that the medication cured the illness. Most of us, however, would probably agree that the change is better explained in terms of the normal time course of a common cold rather than a pharmacologic effect.

Causation

There is a difference between the determination of association and that of causation. Causation cannot be proved with statistics. With this caveat in mind, statistical techniques are best used to explore (not prove) connections between independent and dependent variables. A dependent variable (sometimes called the response variable) is a variable that contains variations for which we seek an explanation. An independent variable is a variable that is thought to affect (cause) changes in the dependent variable. Causation is implied when statistically significant associations are found between an independent and a dependent variable, but causation can never be truly proved. Proof is always an exercise in logical deduction tempered with a degree of uncertainty (18,19), even in experimental designs (such as randomized controlled trials).

Statistical techniques provide evidence that a relationship exists between independent and dependent variables through the use of significance testing and measures of the strength of association. This evidence must be supported by the theoretical basis and logic of the research. The Table presents a condensed list of elements necessary for a claim of causation. The first attempt to provide an epidemiologic method for evaluating causation was performed by A. G. Hill and adapted for the well-known U.S. Surgeon General’s report, Smoking and Health (1964) (18,19). The elements described in the Table serve to remind us that causation is neither a simple exercise nor a direct product of statistical significance. This is why many believe the optimal research technique to establish causation is to use a randomized controlled experiment.

Required Elements for Causation

Element            Explanation
Association        Do the variables covary empirically? Strong associations are more likely to be causal than are weak associations.
Precedence         Does the independent variable vary before the effect exhibited in the dependent variable?
Nonspuriousness    Can the empirical correlation between two variables be explained away by the influence of a third variable?
Plausibility       Is the expected outcome biologically plausible and consistent with theory, prior knowledge, and results of other studies?

MAINTAINING PERSPECTIVE

Rothman and Greenland (19) wrote, “The tentativeness of our knowledge does not prevent practical applications, but it should keep us skeptical and critical, not only of everyone else’s work but of our own as well.”

A basic understanding of measurement will enable radiologists to better understand and put into perspective the substantive importance of published research. Maintaining perspective not only requires an understanding that all analytic studies operate under a cloud of imperfect knowledge, but it also requires sufficient insight to recognize that statistical sophistication and significance testing are tools, not ends in themselves. Statistical techniques, however, are useful in providing summary measures of concepts and helping researchers decide, given certain assumptions, what is meaningful in a statistical sense (more about this in future articles). As new techniques are presented in this series, readers should remind themselves that statistical significance is meaningless without clinical significance.

WHAT COMES NEXT

This introduction to measurement will be followed by a series of articles on basic biostatistics. The series will cover topics on descriptive statistics, probability, statistical estimation and hypothesis testing, sample size, and power. There will also be more advanced topics introduced, such as correlation, regression modeling, statistical agreement, measures of risk and accuracy, technology assessment, receiver operating characteristic curves, and bias. Each article will be written by experienced researchers using radiologic examples to present a nontechnical explanation of a statistical topic.

APPENDIX A: KEY TERMS

Below is a list of the common terms and definitions related to measurement.

Abstract concept: The starting point for measurement, an abstract concept is best understood as a general idea in linguistic form that helps us describe reality.
Association: An association is a measurable change in one variable that occurs concurrently with changes in another variable. Positive association is represented by change in the same direction. Negative association is represented by concurrent changes in opposite directions.
Constant: A constant is an attribute of a concept that does not vary.
Construct validity: Construct validity is the degree to which one measure correlates with other measures of the same abstract concept.
Content validity: Content validity is the extent to which the indicator reflects the full domain of interest.
Continuous variable: This type of variable is a measure that can take on any value within a given interval or set of intervals: an infinite number of possible values.
Dependent variable: The value of the dependent variable depends on variations in another variable.
Discrete variable: This type of variable is a measure that is represented by a limited number of values.
Face validity: Face validity evaluates whether the indicator appears to measure the abstract concept.
Independent variable: The independent variable can be manipulated to affect variations or responses in another variable.
Interval data: These variables classify objects according to type and logical order but also require that differences between levels of a category are equal.
Nominal data: These are variables that classify objects according to type or characteristic.
Operationalize: This is the process of creating a measure of an abstract concept.
Ordinal data: These are variables that classify objects according to type or kind but also have some logical order.
Predictive validity: This is the ability of an indicator to correctly predict (or correlate with) an outcome.
Random variable: This type of variable is a measure where any particular value is based on chance by means of random sampling.
Ratio data: These variables have a zero starting point and classify objects according to type and logical order but also require that differences between levels of a category be equal.
Reliability: Reliability is the extent to which the repeated use of a measure yields the same value when no change has occurred.
Spurious association: This is an association between two variables that can be better explained by or depends greatly on a third variable.
Statistical control: This refers to holding the value of one variable constant in order to clarify associations among other variables.
Statistical inference: This is the process whereby one reaches a conclusion about a population on the basis of information obtained from a sample drawn from that population. There are two such methods, statistical estimation and hypothesis testing.
Validity: Validity is the extent to which a measure accurately represents the abstract concept it is intended to operationalize.
Variable: A variable is a measure of a concept that can take on more than one value from one observation to the next.

APPENDIX B: WEB RESOURCES

The following is a list of links to general statistics resources available on the Web (accessed May 14, 2001).
www.StatPages.net
www.stats.gla.ac.uk

The following Web links are sources of statistics help (accessed May 14, 2001).
BMJ Statistics at Square One: www.bmj.com/statsbk/
The Little Handbook of Statistical Practice: www.tufts.edu/~gdallal/LHSP.HTM
Rice Virtual Lab in Statistics: www.ruf.rice.edu/~lane/rvls.html
Concepts and Applications of Inferential Statistics: faculty.vassar.edu/~lowry/webtext.html
StatSoft Electronic Textbook: www.statsoft.com/textbook/stathome.html
Hypertext Intro Stat Textbook: www.stat.ucla.edu/textbook/
Introductory Statistics: Concepts, Models, and Applications: www.psychstat.smsu.edu/sbk00.htm
Statnotes: An Online Textbook: www2.chass.ncsu.edu/garson/pa765/statnote.htm
Research Methods Knowledge Base: trochim.human.cornell.edu/kb/

The following is a list of Web links to statistical software (accessed May 14, 2001).
Stata Software (links to software providers): www.stata.com/links/stat_software.html
EpiInfo, free software downloads available from the Centers for Disease Control and Prevention: www.cdc.gov/epiinfo/

APPENDIX C: SUGGESTED GENERAL READINGS

The following is a list of suggested readings.
Pagano M, Gauvreau K. Principles of biostatistics. Belmont, Calif: Duxbury, 1993.
Motulsky H. Intuitive biostatistics. New York, NY: Oxford University Press, 1995.
Rothman KJ, Greenland S, eds. Modern epidemiology. Philadelphia, Pa: Lippincott-Raven, 1998.
Gordis L, ed. Epidemiology. Philadelphia, Pa: Saunders, 1996.
Oxman AD, Sackett DL, Guyatt GH. Users’ guides to the medical literature. I. How to get started. The Evidence-Based Medicine Working Group. JAMA 1993; 270:2093–2095. [This is from an ongoing series through year 2000.]

References
1. Disraeli B. Quoted by: Twain M. An autobiography of Mark Twain. Neider C, ed. New York, NY: Columbia University Press, 1995; chapter 29.
2. Altman DG, Bland JM. Improving doctor’s understanding of statistics. J R Stat Soc 1991; 154:223–267.
3. Hillman BJ, Putnam CE. Fostering research by radiologists: recommendations of the 1991 summit meeting. Radiology 1992; 182:315–318.
4. Doubilet PM. Statistical techniques for medical decision making: application to diagnostic radiology. AJR Am J Roentgenol 1988; 150:745–750.
5. Black WC. How to evaluate the radiology literature. AJR Am J Roentgenol 1990; 154:17–22.
6. Moses LE, Lois TA. Statistical consultation in clinical research: a two-way street. In: Bailar JC III, Mosteller F, eds. Medical uses of statistics. 2nd ed. Boston, Mass: NEJM Books, 1992; 349–356.
7. Altman DG. Statistics and ethics in medical research. VIII. Improving the quality of statistics in medical journals. Br Med J 1981; 282:44–47.
8. Moses L. Statistical concepts fundamental to investigations. In: Bailar JC III, Mosteller F, eds. Medical uses of statistics. 2nd ed. Boston, Mass: NEJM Books, 1992; 5–44.
9. Bailar JC III, Mosteller F, eds. Medical uses of statistics. 2nd ed. Boston, Mass: NEJM Books, 1992.
10. Fisher RA. The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain 1926; 33:503–513.
11. Oxman AD, Guyatt GH. The science of reviewing research. Ann N Y Acad Sci 1993; 703:125–133; discussion 133–134.
12. Altman DG. Statistics: necessary and important. Br J Obstet Gynaecol 1986; 93:1–5.
13. Altman DG. Statistics in medical journals: some recent trends. Stat Med 2000; 19:3275–3289.
14. Altman DG. Practical statistics for medical research. London, England: Chapman & Hall, 1991; 108–111.
15. Emmet ER. Handbook of logic. Lanham, Md: Littlefield Adams Quality Paperbacks, 1993.
16. Babbie E. The practice of social research. 6th ed. New York, NY: Wadsworth, 1995.
17. Fisher LD, Belle GV. Biostatistics: a methodology for the health sciences. New York, NY: Wiley, 1993.
18. Rothman KJ, Greenland S, eds. The emergence of modern epidemiology. In: Modern epidemiology. 2nd ed. Philadelphia, Pa: Lippincott-Raven, 1998; 3–6.
19. Rothman KJ, Greenland S, eds. Causation and causal inference. In: Modern epidemiology. 2nd ed. Philadelphia, Pa: Lippincott-Raven, 1998; 7–28.

322 · Radiology · November 2002 Applegate and Crewson
