Women's Imaging • Original Research

Chhatwal et al.
Model for Breast Cancer Diagnosis

Received October 24, 2007; accepted after revision September 17, 2008.

1 Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Science Center, 600 Highland Ave., Madison, WI 53792-3252. Address correspondence to E. S. Burnside (eburnside@uwhealth.org).
2 Department of Industrial and Systems Engineering, University of Wisconsin–Madison, Madison, WI.
3 Present address: Health Economic Statistics, Merck Research Laboratories, North Wales, PA.
4 Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI.
5 Department of Radiology, Medical College of Wisconsin, Milwaukee, WI.

AJR 2009; 192:1117–1127
0361–803X/09/1924–1117
© American Roentgen Ray Society

Mammography, accepted as the most effective screening method in the detection of early breast cancer, still has limited accuracy and significant interpretation variability that decreases its effectiveness [1–6]. The use of computer models can help by detecting abnormalities on mammograms [7–10]; estimating the risk of breast cancer for improved sensitivity and specificity of diagnosis [11–16]; and identifying high-risk populations for screening, genetic testing, or participation in clinical trials [17–22]. This study focuses on the second goal: the use of a computer-aided diagnosis (CADx) model for risk estimation to aid radiologists in breast cancer diagnosis.

CADx models can quantify the risk of cancer using demographic factors and mammography features already identified by a radiologist or a computer-aided detection model. CADx models estimate the probability (or risk) of disease, which can be used for improved decision making by physicians and patients [23–25]. Previous studies of CADx tools used either small subsets of data, suspicious mammograms, or mammograms recommended for biopsy [11–15]. Although most of these studies show that CADx tools are efficient in predicting the outcome as benign or malignant disease, none shows the effectiveness of CADx models when applied to mammography data collected during daily clinical practice. In addition, previous studies used biopsy results as the reference standard, whereas we use a match with our state cancer registry. To our knowledge, our study is the first to develop and test a logistic regression–based CADx model built on consecutive mammograms from a breast imaging practice incorporating BI-RADS descriptors.

As the variables that help predict breast cancer increase in number, physicians must rely on subjective impressions based on their experience to make decisions. Using a quantitative modeling technique such as logistic regression to predict the risk of breast cancer may help radiologists manage the large amount of information available, make better decisions, detect more cancers at early stages, and reduce unnecessary biopsies. The purpose of this study was to create a breast cancer risk estimation model, based on demographic risk factors and BI-RADS descriptors available in the National Mammography Database, using logistic regression that can aid in decision making for the improved early detection of breast cancer.

Materials and Methods
The institutional review board determined that this retrospective HIPAA-compliant study was exempt from requiring informed consent. We used variables collected in the National Mammography Database [26] to develop a CADx model. The National Mammography Database is a recommended format for collecting practice-level mammography audit data to monitor and standardize performance nationally. The National Mammography Database includes Breast Imaging Reporting and Data System (BI-RADS) descriptors [27, 28].

TABLE 1: Distribution of Study Population

                                        No. (%) of Mammograms
Factor                      Benign (n = 48,267)  Malignant (n = 477)  Total (n = 48,744)
Age (y)
  < 45                      9,529 (20)           66 (14)              9,595
  45–49                     7,524 (16)           49 (10)              7,573
  50–54                     7,335 (15)           56 (12)              7,391
  55–59                     6,016 (12)           71 (15)              6,087
  60–64                     4,779 (10)           59 (12)              4,838
  ≥ 65                      13,084 (27)          176 (37)             13,260
Breast density
  Predominantly fatty       7,226 (15)           61 (13)              7,287
  Scattered fibroglandular  19,624 (41)          201 (42)             19,825
  Heterogeneously dense     17,032 (35)          174 (36)             17,206
  Extremely dense tissue    4,385 (9)            41 (9)               4,426
BI-RADS category
  1                         21,094 (44)          0 (0)                21,094
  2                         10,048 (21)          13 (3)               10,061
  3                         8,520 (18)           32 (7)               8,552
  0                         8,148 (17)           130 (27)             8,278
  4                         364 (1)              137 (29)             501
  5                         93 (0)               165 (35)             258
Subjects
We collected data from all screening and diagnostic mammography examinations that were performed at the Medical College of Wisconsin, Milwaukee, an academic, tertiary care medical center, between April 5, 1999, and February 9, 2004. Our database included 48,744 mammography examinations (477 malignant and 48,267 benign) performed on 18,270 patients (Table 1) with a mean age of 56.8 years (range, 18–99 years). Our data set consisted of 65,892 records; each record represents a mammography lesion (benign or malignant) observed on the mammogram, or a single record of demographic factors only if nothing is observed on the mammogram. The data were entered using the PenRad mammography reporting and tracking data system (structured reporting software, PenRad) by technologists and radiologists. There were a total of eight radiologists: four were general radiologists with some mammography background, two were fellowship-trained, and two had lengthy experience in breast imaging. The experience of the eight radiologists ranged from 1 to 35 years, and the number of mammograms they interpreted ranged from 49 to 22,219. All mammography observations were made by radiologists; all demographic factors were recorded by technologists. This facility used a combination of digital and film mammography (~75% film mammography). No computer-aided detection tool was used for lesion detection. Mean glandular dose was not available at the time of our study.

The clinical practice we studied routinely converts screening examinations to diagnostic mammography examinations when an abnormality is identified; therefore, practice performance parameters were calculated in aggregate because these examinations could not be accurately separated. Specifically, we measured recommended performance parameters (cancer detection rate, early-stage cancer detection rate, and abnormal interpretation rate) for all mammograms in our data set.

In contrast to our practice performance audit, which was based on mammograms, the analysis of the classification accuracy of the logistic regression model and radiologists was conducted at the record level. Because breast cancer classification actually occurs at the record level (i.e., each finding on mammography will require a decision to recall or to biopsy), we target this level of detail to help improve radiologists' performance. We clearly indicate when analyses in this article are based on mammograms rather than on records.

We used cancer registry matching as the reference standard in this study. All newly diagnosed cancer cases are reported to the Wisconsin Cancer Reporting System. This registry collaborates with several other state agencies to collect a range of data, including demographic information, tumor characteristics, treatment, and mortality. Data exchange agreements with 17 other state cancer registries yield data for Wisconsin residents receiving care in other states. We sent 65,892 records in the database to the cancer registry and received back 65,904 records after their matching protocol. An additional 12 records were returned to us because of duplication of records for patients diagnosed with more than one cancer. We developed an automated process that confirmed whether the cancer matched the assigned abnormality. This process ensured that the record indicated the same side and the same quadrant and that the diagnosis was made no longer than 12 months after the mammography date. If more than one record indicated the same side and quadrant, the matching was done manually. We used a 12-month follow-up period as the reference standard because it has been recommended as an interval sufficient to identify false-negatives in mammography practice audits [27, 28]. We removed 299 records belonging to 188 mammograms from 124 women because they could not be matched due to missing laterality or quadrant information from either the cancer registry (117 records) or the mammography structured report (182 records) (Table 2). Of the unmatched 299 records, 183 records represented a second record identifying a finding in women
Fig. 1—Descriptors of the National Mammography Database [26] entered to build the logistic regression model for breast cancer prediction, grouped as patient demographics, findings, and BI-RADS assessment. Descriptor values visible in the figure include, for example, suturea and linear rodlikea calcifications; calcification distributions (scattered, clustered, regional, segmental); and mass margins (circumscribed, obscured, microlobulated, indistinct, spiculated).
aBinary variable with categories "Present" or "Not Present."
bClass 1, predominantly fatty; class 2, scattered fibroglandular; class 3, heterogeneously dense; and class 4, extremely dense tissue.
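Descriptors such as those in Figure 1 enter a logistic regression model as categorical variables with a reference level whose coefficient is fixed at zero (the "1 (referent)" rows in Tables 3 and 4). The following is a minimal Python sketch of that encoding; the column names and values are illustrative, not the actual National Mammography Database schema.

```python
import pandas as pd

# Toy records with NMD-style descriptors (hypothetical coding).
records = pd.DataFrame({
    "mass_margins": ["none", "spiculated", "circumscribed", "none"],
    "calc_distribution": ["none", "clustered", "none", "regional"],
    "age_group": ["<45", ">=65", "50-54", "55-59"],
})

# One-hot encode each descriptor, then drop the reference level
# ("none" / youngest age group) so its coefficient is fixed at
# beta = 0, i.e., odds ratio 1 (referent), as in Tables 3 and 4.
X = pd.get_dummies(records)
X = X.drop(columns=["mass_margins_none", "calc_distribution_none",
                    "age_group_<45"])
print(sorted(X.columns))
```

The design matrix X can then be passed to any logistic regression fitter; the fitted coefficient of each remaining column is the log odds ratio relative to the dropped reference level.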
important. The p values listed in Tables 3 and 4 are from chi-square tests of the significance of each term entered last. The importance of each term in predicting breast cancer can be assessed using the odds ratios provided in the tables. The details of logistic regression (including the interpretation of

to estimate the coefficients of the independent variables (training) and predicted the probability of cancer on the 10th fold (testing). Then we omitted the 9th fold (used as the testing set) and trained the model using the other nine folds. Similarly, we tested on each fold. Finally, we combined all test sets to obtain a full-test set and evaluated the overall performance of the model using the full-test set. Note that for inclusion of variables in the final model, we used the whole data set (62,219 records), which gave us the best possible estimates of the variables from the available data.

Performance measures—We measured the performance of the two models using the outcome (i.e., the probability of cancer) of the full-test set obtained by 10-fold cross-validation. We plotted and measured the area under the receiver operating characteristic (ROC) curve of Model 1 and Model 2 using the probability of cancer. We measured the performance of radiologists using the BI-RADS assessment categories assigned to each mammography record. We first ordered BI-RADS assessment categories by likelihood of breast cancer (1, 2, 3, 0, 4, and 5), generated an ROC curve, and measured its area (Az) using a nonparametric method [33]. We compared the performance of the two models with that of radiologists using the nonparametric method of DeLong et al. [34] for comparing two or more areas under ROC curves obtained from the same data set.

For the purpose of assessing the sensitivity and specificity of radiologists, we classified BI-RADS categories 1, 2, and 3 as negative, and BI-RADS categories 0, 4, and 5 as positive [28]. We compared the sensitivity of the two models with the radiologists' sensitivity at 90% specificity, and the specificity of the two models with the radiologists' specificity at 85% sensitivity, with the corresponding CIs estimated using the efficient score method corrected for continuity [35]. Note that the points "sensitivity at 90% specificity" and "specificity at 85% sensitivity" on the radiologists' ROC curve were not observed in practice; they were obtained by linear interpolation of the two neighboring discrete points. We used these levels of sensitivity and specificity because they represent the minimal performance thresholds for screening mammography [36]. We also estimated the number of true-positive and false-negative records at 90% specificity by multiplying the sensitivity (of radiologists, Model 1, and Model 2) by the total number of malignant records. Similarly, we estimated the number of false-positive and true-negative records at 85% sensitivity by multiplying the specificity (of radiologists, Model 1, and Model 2) by the total number of benign records. Finally, we identified the most important predictors of breast cancer using the odds ratio given in the

Radiologists achieved an Az of 0.939 ± 0.011 as measured by the BI-RADS assessment category assigned to each record. Model 1 achieved an Az of 0.927 ± 0.015, which was not significantly different (p = 0.104) from the radiologists' Az. Model 2, with an Az of 0.963 ±

TABLE 3: Model 1, Multivariable Model with BI-RADS Categories Excluded

Risk Factor                               β      Odds Ratio (95% CI)   p
Mass stability                                                         < 0.0001
  None                                    0.00   1 (referent)
  Increasing                              0.63   1.88 (1.37–2.60)
Fine linear calcifications                                             < 0.0001
  Not present                             0.00   1 (referent)
  Present                                 0.89   2.44 (1.61–3.69)
Patient age (y)                                                        0.2216
  < 45                                    0.00   1 (referent)
  45–50                                   −0.02  0.98 (0.65–1.48)
  51–54                                   −0.20  0.82 (0.51–1.32)
  55–60                                   0.26   1.30 (0.88–1.92)
  61–64                                   0.18   1.20 (0.77–1.88)
  ≥ 65                                    0.22   1.25 (0.87–1.78)
History of breast cancer                                               < 0.0001
  None                                    0.00   1 (referent)
  History of ductal or lobular carcinoma  2.90   18.16 (14.38–22.93)
Note—Beta (β) indicates regression coefficients.

TABLE 4: Model 2, Multivariable Model with BI-RADS Categories Included

Risk Factor                   β      Odds Ratio (95% CI)   p
Mass stability                                             0.0002
  None                        0.00   1 (referent)
  Increasing                  0.54   1.71 (1.21–2.42)
  Stable                      −0.04  0.96 (0.55–1.68)
  Decreasing                  −0.96  0.38 (0.19–0.78)
Mass margins                                               < 0.0001
  None                        0.00   1 (referent)
  Circumscribed               −0.41  0.66 (0.38–1.14)
  Cannot discern              0.41   1.51 (0.89–2.55)
  Ill-defined                 0.76   2.13 (1.38–3.29)
  Spiculated                  0.77   2.16 (1.27–3.69)
  Microlobulated              0.10   1.11 (0.41–2.95)
Mass size                                                  < 0.0001
  None                        0.00   1 (referent)
  Small                       1.13   3.10 (2.15–4.48)
  Large                       0.42   1.51 (0.78–2.95)
Intramammary lymph node                                    < 0.0001
  Not present                 0.00   1 (referent)
  Present                     −1.73  0.18 (0.07–0.45)
Focal asymmetric density                                   0.0002
  Not present                 0.00   1 (referent)
  Present                     0.78   2.18 (1.54–3.08)
Calcification distribution                                 < 0.0001
  None                        0.00   1 (referent)
  Clustered                   1.09   2.98 (2.00–4.43)
  Regional                    0.92   2.51 (0.95–6.62)
  Scattered                   0.60   1.82 (0.14–23.48)
Note—Beta (β) indicates regression coefficients.
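The cross-validation procedure described under Performance measures (train on nine folds, predict the probability of cancer on the held-out fold, and pool the held-out predictions into a full-test set whose Az is then measured) can be sketched as follows. This is an illustrative Python/scikit-learn sketch on synthetic, heavily imbalanced data, not the authors' R implementation [31].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the mammography records (the study data are
# not public); heavily imbalanced to mimic the ~1% malignancy rate.
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.99], random_state=0)

# Train on 9 folds, predict the probability of cancer on the held-out
# fold, and pool all held-out predictions into one "full-test" set.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
p_cancer = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                             cv=cv, method="predict_proba")[:, 1]

# Az: area under the ROC curve of the pooled out-of-fold probabilities.
print(round(roc_auc_score(y, p_cancer), 3))
```

Pooling the out-of-fold probabilities before measuring Az, rather than averaging per-fold areas, matches the full-test-set evaluation the text describes.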
and Model 2 used jointly will show a high level of disagreement in the prediction of breast cancer (as in example case 2) and will potentially indicate this error. When the radiologist's BI-RADS category is correct (i.e., when there is agreement between the predictions of Model 1 and Model 2), Model 2 will be a better model for breast cancer prediction. In future work, we plan to estimate the level of disagreement between the two models and investigate the possible use of these models as complementary tools.

Our secondary model (Model 3) showed that the exclusion of the BI-RADS descriptors significantly impairs the performance of the logistic regression model, underscoring the need for the collection of these variables in clinical practice.

It is common for clinical data sets to contain a substantial amount of missing data. Although complete data are ideal, that situation is rarely encountered in the real world. There is no perfect way to handle missing data, but there are two possibilities: to impute the missing descriptor depending on the fraction of various possible values of the descriptor, or to assume that the missing descriptor was not observed by radiologists and mark it as "not present." When building the model, we made the decision to label all of the missing data as not present; therefore, when testing and applying the model on a new case, the missing descriptors should be treated as not present. Our approach to handling missing data is appropriate for mammography data, where radiologists often leave the descriptors blank if nothing is observed on the mammogram.

To our knowledge, no prior studies discuss a logistic regression–based CADx model incorporating mammography descriptors from consecutive mammograms from a breast imaging practice. The use of a logistic regression model has some attractive features when compared with artificial intelligence prediction tools (e.g., artificial neural networks, Bayesian networks, and support vector machines). Logistic regression can identify important predictors of breast cancer using odds ratios and can generate confidence intervals that provide additional information for decision making.

Our models' performance depends on the ability of radiologists to accurately identify findings on mammograms. Therefore, based on the literature, performance may be higher in facilities where most mammograms are read by mammography subspecialists as compared with general radiologists [39]. However, with appropriate training [40], general radiologists in combination with the model may approach the accuracy of subspecialty-trained mammographers. Decreasing variability in mammography interpretation, one of the underlying motivations of this research, can only be realized with further development of tools such as our model and with research to validate accuracy, effectiveness, and generalizability. We consider this work to be only a first step toward this goal.

We could not compare practice parameters directly with the literature because screening and diagnostic examinations could not be separated for this database. Our prediction Model 2 shows a significant improvement over radiologists' assessment in classifying abnormalities when built on a mix of screening and diagnostic data. The model's performance may differ when built separately on screening and diagnostic mammograms. For screening mammograms, the incidence is low and descriptors are less exact because of general imaging protocols, which may result in less accurate model parameters. In contrast, for diagnostic mammograms, the model parameters may be more accurate because more descriptors can be observed as a result of additional specialized views. In addition, the performance of our existing model may differ when tested on screening and diagnostic mammograms separately: the model may perform better when tested on the diagnostic examinations but worse when tested on the screening examinations.

Our risk estimation models are designed to aid radiologists, not to act as a substitute. The improvement in the model's performance by adding BI-RADS assessments indeed suggests that the radiologist's integration of the imaging findings summarized by the BI-RADS assessment categories does augment predictions based on the observed mammographic features. However, the logistic regression model contributes an additional measure of accuracy over and above that provided by the BI-RADS assessment categories, as evidenced by the improved performance compared with that of the radiologists alone.

The objective of our model is to aid decision making by generating a risk prediction for a single point in time (at mammography). As we were designing the study, we did not want to influence the probability of breast cancer based on future events but only on variables identified at the time of mammography. For this reason, we excluded unmatched BI-RADS 1 cases from our analyses, which represented either undetected cancer (present on the mammogram but not seen) or an interval cancer (not detectable on the mammogram). The inclusion of these cases may have erroneously increased the probability of malignancy by considering future risks rather than making a prediction at a single time based on mammography features alone. However, the exclusion of these cases may have erroneously decreased the estimated probability of malignancy, given that at least some of the false-negative cancers were likely present at the time of the mammogram, especially those in women with dense breasts, which is a limitation of our model.

Our models provide the probability of cancer as an outcome that can be used by radiologists for making appropriate patient management decisions. The use of such models has the potential to reduce the interpretive variability of mammography across practices and radiologists. Our models also facilitate shared decision making by providing the probability of cancer, which can be better understood by patients than BI-RADS categories. In the future, we will test our models' performance on other mammography practices to evaluate their generalizability. We will also include potentially important interaction effects that deserve particular attention; including interaction effects may further improve the performance of our models.

In conclusion, we found that our logistic regression models (Model 1 and Model 2) can effectively discriminate between benign and malignant lesions. Furthermore, we found that the radiologist alone and the logistic regression model incorporating only mammographic and demographic features (Model 1) are both inferior to Model 2, which incorporates the model, the features, and the radiologist's impression as captured by the BI-RADS assessment categories. Our study indicates that further research is needed to define how radiologists and computational models can collaborate, each adding valuable predictive features, experience, and training to improve overall performance.

References
1. Kopans DB. The positive predictive value of mammography. AJR 1992; 158:521–526
2. Barlow WE, Chi C, Carney PA, et al. Accuracy of screening mammography interpretation by characteristics of radiologists. J Natl Cancer Inst 2004; 96:1840–1850
3. Kerlikowske K, Grady D, Barclay J, et al. Variability and accuracy in mammographic interpretation using the American College of Radiology Breast Imaging Reporting and Data System. J Natl Cancer Inst 1998; 90:1801–1809
4. Elmore JG, Miglioretti DL, Reisch LM, et al. Screening mammograms by community radiologists: variability in false-positive rates. J Natl Cancer Inst 2002; 94:1373–1380
5. Miglioretti DL, Smith-Bindman R, Abraham L, et al. Radiologist characteristics associated with interpretive performance of diagnostic mammography. J Natl Cancer Inst 2007; 99:1854–1863
6. Taplin S, Abraham L, Barlow WE, et al. Mammography facility characteristics associated with interpretive accuracy of screening mammography. J Natl Cancer Inst 2008; 100:876–887
7. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001; 220:781–786
8. Dean JC, Ilvento CC. Improved cancer detection using computer-aided detection with diagnostic and screening mammography: prospective study of 104 cancers. AJR 2006; 187:20–28
9. Cupples TE, Cunningham JE, Reynolds JC. Impact of computer-aided detection in a regional screening mammography program. AJR 2005; 185:944–950
10. Birdwell RL, Bandodkar P, Ikeda DM. Computer-aided detection with screening mammography in a university hospital setting. Radiology 2005; 236:451–457
11. Baker JA, Kornguth PJ, Lo JY, Williford ME, Floyd CE Jr. Breast cancer: prediction with artificial neural network based on BI-RADS standardized lexicon. Radiology 1995; 196:817–822
12. Bilska-Wolak AO, Floyd CE Jr. Development and evaluation of a case-based reasoning classifier for prediction of breast biopsy outcome with BI-RADS lexicon. Med Phys 2002; 29:2090–2100
13. Burnside ES, Rubin DL, Shachter RD. Using a Bayesian network to predict the probability and type of breast cancer represented by microcalcifications on mammography. Stud Health Technol Inform 2004; 107(Pt 1):13–17
14. Fischer EA, Lo JY, Markey MK. Bayesian networks of BI-RADS descriptors for breast lesion classification. Conf Proc IEEE Eng Med Biol Soc 2004; 4:3031–3034
15. Markey MK, Lo JY, Floyd CE. Differences between computer-aided diagnosis of breast masses and that of calcifications. Radiology 2002; 223:489–493
16. Jesneck JL, Lo JY, Baker JA. Breast mass lesions: computer-aided diagnosis models with mammographic and sonographic descriptors. Radiology 2007; 244:390–398
17. Claus EB, Risch N, Thompson WD. Autosomal dominant inheritance of early-onset breast cancer: implications for risk prediction. Cancer 1994; 73:643–651
18. Colditz GA, Rosner B. Cumulative risk of breast cancer to age 70 years according to risk factor status: data from the Nurses' Health Study. Am J Epidemiol 2000; 152:950–964
19. Gail MH, Brinton LA, Byar DP, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 1989; 81:1879–1886
20. Taplin SH, Thompson RS, Schnitzer F, Anderman C, Immanuel V. Revisions in the risk-based Breast Cancer Screening Program at Group Health Cooperative. Cancer 1990; 66:812–818
21. Barlow WE, White E, Ballard-Barbash R, et al. Prospective breast cancer risk prediction model for women undergoing screening mammography. J Natl Cancer Inst 2006; 98:1204–1214
22. Tice JA, Cummings SR, Smith-Bindman R, Ichikawa L, Barlow WE, Kerlikowske K. Using clinical factors and mammographic breast density to estimate breast cancer risk: development and validation of a new predictive model. Ann Intern Med 2008; 148:337–347
23. Vyborny CJ, Giger ML, Nishikawa RM. Computer-aided detection and diagnosis of breast cancer. Radiol Clin North Am 2000; 38:725–740
24. Doi K, MacMahon H, Katsuragawa S, Nishikawa RM, Jiang Y. Computer-aided diagnosis in radiology: potential and pitfalls. Eur J Radiol 1999; 31:97–109
25. Freedman AN, Seminara D, Gail MH, et al. Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst 2005; 97:715–723
26. Osuch JR, Anthony M, Bassett LW, et al. A proposal for a national mammography database: content, purpose, and value. AJR 1995; 164:1329–1334
27. American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS), 3rd ed. Reston, VA: American College of Radiology, 1998
28. American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS), 4th ed. Reston, VA: American College of Radiology, 2004
29. Bagley SC, White H, Golomb BA. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J Clin Epidemiol 2001; 54:979–985
30. Gareen IF, Gatsonis C. Primer on multiple regression models for diagnostic imaging research. Radiology 2003; 229:305–310
31. R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2005
32. Moineddin R, Matheson FI, Glazier RH. A simulation study of sample size for multilevel logistic regression models. BMC Med Res Methodol 2007; 7:34
33. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29–36
34. DeLong ER, DeLong D, Clarke-Pearson D. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44:837–845
35. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 1998; 17:857–872
36. Bassett LW, Hendrick RE, Bassford TL. Quality determinants of mammography. Clinical practice guideline no. 13. Rockville, MD: Agency for Health Care Policy and Research, Public Health Service, U.S. Department of Health and Human Services, 1994
37. Burnside ES, Rubin DL, Fine JP, Shachter RD, Sisney GA, Leung WK. Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results: initial experience. Radiology 2006; 240:666–673
38. Liberman L, Abramson AF, Squires FB, Glassman JR, Morris EA, Dershaw DD. The Breast Imaging Reporting and Data System: positive predictive value of mammographic features and final assessment categories. AJR 1998; 171:35–40
39. Sickles EA, Wolverton DE, Dee KE. Performance parameters for screening and diagnostic mammography: specialist and general radiologists. Radiology 2002; 224:861–869
40. Berg WA, D'Orsi CJ, Jackson VP, et al. Does training in the Breast Imaging Reporting and Data System (BI-RADS) improve biopsy recommendations or feature analysis agreement with experienced breast imagers at mammography? Radiology 2002; 224:871–880
41. Kleinbaum DG. Logistic regression: a self-learning text. New York, NY: Springer-Verlag, 1994
42. Hosmer D, Lemeshow S. Applied logistic regression. New York, NY: Wiley, 1989
43. Davis J, Goadrich M. The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, PA: ICML, 2006:233–240
44. Chhatwal J, Burnside ES, Alagoz O. Receiver operating characteristic (ROC) curves versus precision-recall (PR) curves in models evaluated with unbalanced data. Proceedings of the 29th annual meeting of the Society for Medical Decision Making. Pittsburgh, PA: SMDM, 2007
APPENDIX 1: Logistic Regression

Binomial (or binary) logistic regression is a form of regression that is used when the dependent variable is dichotomous (e.g., present or absent) and the independent variables are of any type (discrete or continuous). The independent (observed) variables, X1, X2, …, Xn, are related to the dependent (outcome) variable, Y, by the following equation:

Logit(p) = β0 + β1X1 + … + βnXn   (1),

where β1 is the regression coefficient of X1, p = probability {Y = 1}, and Logit(p) = ln(p / (1 − p)). The value of p can be calculated by taking the inverse of the Logit(p), as shown in the following equation:

p = e^(β0 + β1X1 + … + βnXn) / (1 + e^(β0 + β1X1 + … + βnXn))   (2).
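Equations 1 and 2 can be evaluated directly once coefficients are available. The following minimal Python sketch inverts Equation 1; the coefficients in the example are hypothetical, not the fitted values of Model 1 or Model 2.

```python
import math

def predict_probability(beta0, betas, xs):
    """Invert Equation 1: Logit(p) = beta0 + sum(beta_i * x_i),
    so p = 1 / (1 + exp(-Logit(p))), equivalent to Equation 2."""
    logit = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical intercept and coefficients for two binary findings
# (illustrative only, not taken from Tables 3 and 4).
print(round(predict_probability(-4.0, [0.89, 2.90], [1, 1]), 4))
```

With no findings present (all x_i = 0), p reduces to the inverse logit of the intercept alone, which represents the baseline risk.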
APPENDIX 2: Model 3

In order to assess the contribution of mammography descriptors in estimating the risk of breast cancer, we constructed Model 3, which included patient demographic factors (age, history of breast cancer, family history of breast cancer, history of surgery, breast density, and hormone therapy) and BI-RADS assessment categories, and excluded mammography descriptors. Only three variables were found significant in predicting the risk of cancer in Model 3 (Table 6); BI-RADS assessment categories were the most important predictor.

We measured the performance of our model using receiver operating characteristic (ROC) curves and precision–recall curves (Figs. 3 and 4). We used precision–recall curves in addition to ROC curves to gain more insight into the performance of our model because precision–recall curves have higher discriminative power than ROC curves in cases of skewed data [43, 44]. "Precision" measures the positive predictive value and "recall" measures the sensitivity of a test. We plotted and measured the area under the precision–recall curve (APR) of the three models (Model 1, Model 2, and Model 3) and radiologists using the probability of cancer and BI-RADS assessment categories, respectively [43].

Fig. 3—Graph shows receiver operating characteristic curves constructed from output probabilities of Model 1, Model 2, and Model 3, and radiologists' BI-RADS assessment categories. AUC = area under curve.

Fig. 4—Graph shows precision–recall curves constructed from output probabilities of Model 1, Model 2, and Model 3, and radiologists' BI-RADS assessment categories (Model 1, AUC = 0.363; Model 2, AUC = 0.559; Model 3, AUC = 0.487; radiologists, AUC = 0.396). AUC = area under curve, PPV = positive predictive value.

Model 3 achieved an Az (area under the ROC curve) and APR that were significantly higher than those of Model 1 and radiologists (all p < 0.001). More important, Model 3, which excludes descriptors, performed significantly worse (p < 0.001) than Model 2, which includes descriptors, in terms of Az and APR (Table 7). Thus, the inclusion of mammographic descriptors significantly contributes to the superior performance of Model 2.
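The sensitivity of precision–recall curves to class skew can be illustrated with a small Python sketch on synthetic, imbalanced data (illustrative only, not the study data): with roughly 1% positives, a model can reach a high Az while its area under the precision–recall curve stays much lower, because precision tracks the positive predictive value directly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data (~1% positives), mimicking the malignancy
# skew in the mammography records.
X, y = make_classification(n_samples=20000, n_features=10,
                           weights=[0.99], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          random_state=1)
p = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = p.predict_proba(X_te)[:, 1]

# Az (ROC) versus A_PR (average precision approximates the area
# under the precision-recall curve).
print("Az: ", round(roc_auc_score(y_te, p), 3))
print("A_PR:", round(average_precision_score(y_te, p), 3))
```

Comparing the two printed areas on the same predictions shows why the appendix reports APR alongside Az for the skewed benign/malignant distribution.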