
Women's Imaging • Original Research

A Logistic Regression Model Based on the National Mammography Database Format to Aid Breast Cancer Diagnosis

Jagpreet Chhatwal1,2,3, Oguzhan Alagoz2, Mary J. Lindstrom4, Charles E. Kahn, Jr.5, Katherine A. Shaffer5, Elizabeth S. Burnside1,2,4

Citation: Chhatwal J, Alagoz O, Lindstrom MJ, Kahn CE Jr, Shaffer KA, Burnside ES

OBJECTIVE. The purpose of our study was to create a breast cancer risk estimation model based on the descriptors of the National Mammography Database using logistic regression that can aid in decision making for the early detection of breast cancer.

MATERIALS AND METHODS. We created two logistic regression models based on the mammography features and demographic data for 62,219 consecutive mammography records from 48,744 studies in 18,270 patients reported using the Breast Imaging Reporting and Data System (BI-RADS) lexicon and the National Mammography Database format between April 5, 1999 and February 9, 2004. State cancer registry outcomes matched with our data served as the reference standard. The probability of cancer was the outcome in both models. Model 2 was built using all variables in Model 1 plus radiologists' BI-RADS assessment categories. We used 10-fold cross-validation to train and test the models and to calculate the area under the receiver operating characteristic curve (Az) to measure performance. Both models were compared with the radiologists' BI-RADS assessments.

RESULTS. Radiologists achieved an Az value of 0.939 ± 0.011. The Az was 0.927 ± 0.015 for Model 1 and 0.963 ± 0.009 for Model 2. At 90% specificity, the sensitivity of Model 2 (90%) was significantly better (p < 0.001) than that of radiologists (82%) and Model 1 (83%). At 85% sensitivity, the specificity of Model 2 (96%) was significantly better (p < 0.001) than that of radiologists (88%) and Model 1 (87%).

CONCLUSION. Our logistic regression model can effectively discriminate between benign and malignant breast disease and can identify the most important features associated with breast cancer.

Keywords: logistic regression, mammography, National Mammography Database, risk prediction

DOI:10.2214/AJR.07.3345

Received October 24, 2007; accepted after revision September 17, 2008.

1 Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Science Center, 600 Highland Ave., Madison, WI 53792-3252. Address correspondence to E. S. Burnside (eburnside@uwhealth.org).
2 Industrial & Systems Engineering, University of Wisconsin, Madison, Madison, WI.
3 Present address: Health Economic Statistics, Merck Research Laboratories, North Wales, PA.
4 Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI.
5 Department of Radiology, Medical College of Wisconsin, Milwaukee, WI.

AJR 2009; 192:1117–1127
0361–803X/09/1924–1117
© American Roentgen Ray Society

Mammography, accepted as the most effective screening method in the detection of early breast cancer, still has limited accuracy and significant interpretation variability that decreases its effectiveness [1–6]. The use of computer models can help by detecting abnormalities on mammograms [7–10]; estimating the risk of breast cancer for improved sensitivity and specificity of diagnosis [11–16]; and identifying high-risk populations for screening, genetic testing, or participation in clinical trials [17–22]. This study focuses on the second goal: the use of a computer-aided diagnosis (CADx) model for risk estimation to aid radiologists in breast cancer diagnosis.

CADx models can quantify the risk of cancer using demographic factors and mammography features already identified by a radiologist or a computer-aided detection model. CADx models estimate the probability (or risk) of disease that can be used for improved decision making by physicians and patients [23–25]. Previous studies on CADx tools use either small subsets of data, suspicious mammograms, or mammograms recommended for biopsy [11–15]. Although most of these studies show that CADx tools are efficient in predicting the outcome as benign or malignant disease, none shows the effectiveness of CADx models when applied to mammography data collected during daily clinical practice. In addition, previous studies used biopsy results as the reference standard, whereas we use a match with our state cancer registry. To our knowledge, our study is the first to develop and test a logistic regression–based CADx model built on consecutive mammograms from a breast imaging practice incorporating BI-RADS descriptors.

As the variables that help predict breast cancer increase in number, physicians must rely on subjective impressions based on their experience to make decisions. Using a quantitative modeling technique such as logistic regression to predict the risk of breast cancer may help radiologists manage the large amount of information available, make better decisions, detect more cancers at early stages, and reduce unnecessary biopsies.


The purpose of this study was to create a breast cancer risk estimation model based on demographic risk factors and BI-RADS descriptors available in the National Mammography Database using logistic regression that can aid in decision making for the improved early detection of breast cancer.

Materials and Methods
The institutional review board determined that this retrospective HIPAA-compliant study was exempt from requiring informed consent. We used variables collected in the National Mammography Database [26] to develop a CADx model. The National Mammography Database is a recommended format for collecting practice-level mammography audit data to monitor and standardize performance nationally. The National Mammography Database includes Breast Imaging Reporting and Data System (BI-RADS) descriptors [27, 28].

Subjects
We collected data from all screening and diagnostic mammography examinations that were performed at the Medical College of Wisconsin, Milwaukee, an academic, tertiary care medical center, between April 5, 1999 and February 9, 2004. Our database included 48,744 mammography examinations (477 malignant and 48,267 benign) performed on 18,270 patients (Table 1) with a mean age of 56.8 years (range, 18–99 years). Our data set consisted of 65,892 records; each record represents a mammography lesion (benign or malignant) observed on the mammogram or a single record of demographic factors only, if nothing was observed on the mammogram. The data were entered using the PenRad mammography reporting and tracking data system (structured reporting software, PenRad) by technologists and radiologists. There were a total of eight radiologists: four were general radiologists with some mammography background, two were fellowship-trained, and two had lengthy experience in breast imaging. The experience of the eight radiologists ranged between 1 and 35 years, and the number of mammograms interpreted by them ranged from 49 to 22,219. All mammography observations were made by radiologists; all demographic factors were recorded by technologists. This facility used a combination of digital and film mammography (~75% film mammography). No computer-aided detection tool was used for lesion detection. Mean glandular dose was not available at the time of our study.

TABLE 1: Distribution of Study Population — No. (%) of Mammograms

Factor                        Benign (n = 48,267)   Malignant (n = 477)   Total (n = 48,744)
Age (y)
  < 45                        9,529 (20)            66 (14)               9,595
  45–49                       7,524 (16)            49 (10)               7,573
  50–54                       7,335 (15)            56 (12)               7,391
  55–59                       6,016 (12)            71 (15)               6,087
  60–64                       4,779 (10)            59 (12)               4,838
  ≥ 65                        13,084 (27)           176 (37)              13,260
Breast density
  Predominantly fatty         7,226 (15)            61 (13)               7,287
  Scattered fibroglandular    19,624 (41)           201 (42)              19,825
  Heterogeneously dense       17,032 (35)           174 (36)              17,206
  Extremely dense tissue      4,385 (9)             41 (9)                4,426
BI-RADS category
  1                           21,094 (44)           0 (0)                 21,094
  2                           10,048 (21)           13 (3)                10,061
  3                           8,520 (18)            32 (7)                8,552
  0                           8,148 (17)            130 (27)              8,278
  4                           364 (1)               137 (29)              501
  5                           93 (0)                165 (35)              258

The clinical practice we studied routinely converts screening examinations to diagnostic mammography examinations when an abnormality is identified; therefore, practice performance parameters were calculated in aggregate because these examinations could not be accurately separated. Specifically, we measured recommended performance parameters (cancer detection rate, early-stage cancer detection rate, and abnormal interpretation rate) for all mammograms in our data set.

In contrast to our practice performance audit, which is based on mammograms, the analysis of the classification accuracy of the logistic regression model and radiologists was conducted at the record level. Because breast cancer classification actually occurs at the record level (i.e., each finding on mammography will require a decision to recall or to biopsy), we target this level of detail to help improve radiologists' performance. We clearly indicate when analyses in this article are based on mammograms rather than on records.

We used cancer registry matching as the reference standard in this study. All newly diagnosed cancer cases are reported to the Wisconsin Cancer Reporting System. This registry collaborates with several other state agencies to collect a range of data, including demographic information, tumor characteristics, treatment, and mortality. Data exchange agreements with 17 other state cancer registries yield data for Wisconsin residents receiving care in other states. We sent 65,892 records in the database to the cancer registry and received back 65,904 records after their matching protocol. An additional 12 records were returned to us because of duplication of records for patients diagnosed with more than one cancer. We developed an automated process that confirmed whether the cancer matched the assigned abnormality. This process ensured that the record indicated the same side and the same quadrant and that the diagnosis was made no longer than 12 months after the mammography record date. If more than one record indicated the same side and quadrant, the matching was done manually. We used a 12-month follow-up period as the reference standard because it has been recommended as an interval sufficient to identify false-negatives in mammography practice audits [27, 28].
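For illustration, the matching rule just described can be sketched in a few lines of Python. The field names here are hypothetical (the paper does not describe the registry or PenRad schemas); only the logic—same side, same quadrant, and a diagnosis no more than 12 months after the mammography record date—comes from the text:

```python
from datetime import date, timedelta

def matches_registry(mammo: dict, registry: dict, follow_up_days: int = 365) -> bool:
    """Return True when a registry cancer record confirms a mammography record.

    Implements the paper's automated check: same side, same quadrant, and a
    cancer diagnosis no more than 12 months after the mammography record
    date. Field names are hypothetical placeholders.
    """
    same_side = mammo["laterality"] == registry["laterality"]
    same_quadrant = mammo["quadrant"] == registry["quadrant"]
    delay = registry["diagnosis_date"] - mammo["record_date"]  # a timedelta
    within_window = timedelta(0) <= delay <= timedelta(days=follow_up_days)
    return same_side and same_quadrant and within_window

# Example: a diagnosis 10 months after the mammogram, same side and quadrant.
print(matches_registry(
    {"laterality": "L", "quadrant": "UOQ", "record_date": date(2001, 3, 1)},
    {"laterality": "L", "quadrant": "UOQ", "diagnosis_date": date(2002, 1, 5)},
))  # True
```

In the study, records matching more than one registry entry on side and quadrant were resolved manually; a sketch like this would only flag such cases for review.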


We removed 299 records belonging to 188 mammograms from 124 women because they could not be matched due to missing laterality or quadrant information from either the cancer registry (117 records) or the mammography structured report (182 records) (Table 2). Of the unmatched 299 records, 183 represented a second record identifying a finding in women who already had a cancer matched to the registry. The remaining 116 records consisted of 38 BI-RADS category 1, 24 category 2, 22 category 3, 21 category 0, four category 4, and seven category 5. We then removed 101 duplicates. Finally, we removed 3,285 records that had BI-RADS assessment categories 0, 3, 4, and 5 (indicating a finding) but did not have descriptors recorded in the record. The final sample consisted of 62,219 records (510 malignant, 61,709 benign).

TABLE 2: Data Processing

Record Group                                                                Removed   Total     Malignant
Mammography records reported to Wisconsin Cancer Registry System                     65,892
Records from Wisconsin Cancer Registry System                               (12)a    65,904
Records unmatched with registryb                                            299      65,605    546
No. of duplicate records                                                    101      65,504    532
Records with missing features (but expected) in the structured reports
  (i.e., BI-RADS 3, 0, 4, 5 with no masses and calcifications)              3,285    62,219c   510c

a Additional records were returned because of duplication of records for patients diagnosed with more than one cancer.
b Laterality or quadrant position was not available in National Mammography Database or registry data.
c Data used to build logistic regression model.

Statistical Analysis
Model construction—Logistic regression, a statistical approach to predicting the presence of a disease based on available variables (symptoms, imaging data, patient history, and so forth), has been successfully used for prediction and diagnosis in medicine [29, 30]. To build a breast cancer risk estimation model, we mapped the variables collected by physicians in their daily clinical practice (based on BI-RADS descriptors in the National Mammography Database) to 36 discrete variables. Figure 1 shows the schema of these variables used to build the model. We constructed two risk estimation models. Both models used the presence or absence of breast cancer as the dependent variable, and these 36 discrete variables were used as independent variables. Model 2 included these same variables plus the BI-RADS assessment categories assigned by the radiologists. More than 600 two-way interaction effects are possible in each model. We did not include any interaction terms in our models.

Fig. 1—Descriptors of National Mammography Database [26] entered to build logistic regression model for breast cancer prediction. The schema branches into patient demographics (family history of breast cancer, personal history of breast cancer, prior breast surgery, age, hormone use); findings (calcifications, mass, special cases, and associated findings, each with subdescriptors such as shape, margins, density, size, stability, and distribution); breast density; and BI-RADS assessment. Binary variables take the values "Present" or "Not Present." Breast density classes: class 1, predominantly fatty; class 2, scattered fibroglandular; class 3, heterogeneously dense; class 4, extremely dense tissue.

Before model construction, we grouped BI-RADS categories 1 and 2 as "BI-RADS 1 or 2" because these cases had a low frequency of malignancy. The logistic regression model was built using R statistical software (The R Foundation for Statistical Computing) [31]. We used forward selection based on the chi-square test of the change in residual deviance, with a cutoff of p < 0.001 for adding new terms. This stringent criterion was used to avoid including terms that, although statistically significant because of the large sample size, are not clinically important.
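The forward-selection procedure can be expressed compactly in code. The study's analysis was carried out in R; the following Python sketch (using statsmodels, with hypothetical inputs X, a dummy-coded feature matrix, and y, the 0/1 cancer outcome) shows the same idea: at each step, add the term whose likelihood-ratio chi-square test of the change in residual deviance gives p < 0.001, and stop when no candidate qualifies. For simplicity each candidate here contributes one column (1 df); a categorical factor with k levels would add k − 1 columns, and the test would use that many degrees of freedom.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

def forward_select(X: pd.DataFrame, y: pd.Series, alpha: float = 0.001):
    """Greedy forward selection by the change-in-deviance chi-square test."""
    selected = []
    remaining = list(X.columns)
    current = sm.Logit(y, np.ones((len(y), 1))).fit(disp=0)  # intercept-only model
    while remaining:
        best = None
        for term in remaining:
            cand = sm.Logit(y, sm.add_constant(X[selected + [term]])).fit(disp=0)
            lr = 2 * (cand.llf - current.llf)  # drop in residual deviance
            p = chi2.sf(lr, df=1)              # 1 df for a single added column
            if p < alpha and (best is None or lr > best[1]):
                best = (term, lr, cand)
        if best is None:
            break  # no remaining term meets the stringent p < 0.001 cutoff
        selected.append(best[0])
        remaining.remove(best[0])
        current = best[2]
    return selected, current
```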


TABLE 3: Model 1, Multivariable Model with BI-RADS Categories Excluded

Risk Factor                                β       Odds Ratio (95% CI)     p
Mass stability                                                             < 0.0001
  None                                     0.00    1 (referent)
  Increasing                               0.63    1.88 (1.37–2.60)
  Stable                                   −1.19   0.30 (0.18–0.50)
  Decreasing                               −0.74   0.48 (0.26–0.87)
Mass shape                                                                 0.0003
  None                                     0.00    1 (referent)
  Irregular                                0.84    2.31 (1.32–4.04)
  Oval                                     −0.12   0.89 (0.49–1.60)
  Round                                    −0.02   0.98 (0.51–1.89)
  Lobular                                  0.62    1.87 (0.89–3.89)
  Cannot discern                           −0.70   0.50 (0.17–1.48)
Mass margins                                                               < 0.0001
  None                                     0.00    1 (referent)
  Circumscribed                            −0.93   0.39 (0.21–0.74)
  Cannot discern                           0.27    1.32 (0.72–2.42)
  Ill-defined                              1.41    4.10 (2.49–6.76)
  Spiculated                               2.90    18.24 (10.67–31.20)
  Microlobulated                           0.63    1.88 (0.74–4.82)
Mass density                                                               < 0.0001
  None                                     0.00    1 (referent)
  Cannot discern                           0.80    2.23 (1.25–3.97)
  Equal                                    0.74    2.10 (1.13–3.88)
  Low                                      0.63    1.88 (0.73–4.88)
  High                                     2.27    9.67 (5.59–16.71)
Mass size                                                                  < 0.0001
  None                                     0.00    1 (referent)
  Small                                    1.20    3.33 (2.32–4.77)
  Large                                    0.90    2.46 (1.36–4.45)
Skin retraction                                                            < 0.0001
  Not present                              0.00    1 (referent)
  Present                                  −1.45   0.23 (0.11–0.49)
Calcification distribution                                                 < 0.0001
  None                                     0.00    1 (referent)
  Clustered                                0.64    1.89 (1.20–2.96)
  Regional                                 1.10    3.01 (1.21–7.46)
  Scattered                                0.89    2.44 (0.31–19.22)
  Linear                                   1.13    3.11 (1.15–8.44)
  Segmental                                3.58    35.71 (10.79–118.15)
Pleomorphic calcifications                                                 < 0.0001
  Not present                              0.00    1 (referent)
  Present                                  2.37    10.68 (7.17–15.93)
Fine linear calcifications                                                 < 0.0001
  Not present                              0.00    1 (referent)
  Present                                  0.89    2.44 (1.61–3.69)
Patient age (y)                                                            0.2216
  < 45                                     0.00    1 (referent)
  45–50                                    −0.02   0.98 (0.65–1.48)
  51–54                                    −0.20   0.82 (0.51–1.32)
  55–60                                    0.26    1.30 (0.88–1.92)
  61–64                                    0.18    1.20 (0.77–1.88)
  ≥ 65                                     0.22    1.25 (0.87–1.78)
History of breast cancer                                                   < 0.0001
  None                                     0.00    1 (referent)
  History of ductal or lobular carcinoma   2.90    18.16 (14.38–22.93)
Note—Beta (β) indicates regression coefficients.

The p values listed in Tables 3 and 4 are from chi-square tests of the significance of each term entered last. The importance of each term in predicting breast cancer can be assessed using the odds ratios provided in the tables. The details of logistic regression (including the interpretation of odds ratios) are discussed in Appendix 1.

A number of sources of correlation are possible in these data: findings from a particular radiologist may be more similar than findings from different radiologists, findings within a patient may be more similar than those from different patients, and findings during the same mammography visit may be more similar than those during other visits of the same patient. We investigated models in which the radiologist is included as a random effect and compared them with our models in which the radiologist is excluded. We found no substantial differences in the coefficients for the other terms in the model due to including the radiologist as a random effect; thus, we chose the simpler model without the radiologist. We were unable to test random effects for patient or for mammogram within patient because the expected number of cancers for each patient is very small, and random effects models tend to be biased in these circumstances [32]. Instead, we relied on our stringent criterion of p < 0.001 for inclusion in the model to avoid the overly optimistic p values that occur when the variance of the parameters is reduced by positive correlation induced by clustered data. The parameter estimates themselves are unbiased regardless of the form of the variance.

To show that BI-RADS descriptors substantively contribute to prediction accuracy in Model 2, we also constructed a secondary model (Model 3). Model 3 omits these descriptors and includes only patient demographic factors (age, history of breast cancer, family history of breast cancer, history of surgery, breast density, and hormone therapy) and BI-RADS assessment categories as independent variables to test whether performance declines. The details of Model 3 are provided in Appendix 2.

Model evaluation—We used a 10-fold cross-validation technique to evaluate the predictive performance of the two models. This methodology avoids the problem of validating the model on the same data used to estimate the parameters by using separate estimation and evaluation subsets of the data. Specifically, we divided the data set into 10 subsets (with approximately one tenth of benign abnormalities and one tenth of malignant abnormalities in each subset, or "fold") so that all abnormalities associated with a single patient were assigned to the same fold. This ensured that all folds were independent of each other. We started with the first nine folds (omitting the 10th fold) to estimate the coefficients of the independent variables (training) and predicted the probability of cancer on the 10th fold (testing). Then we omitted the 9th fold (used as the testing set) and trained the model using the other nine folds. Similarly, we tested on each fold. Finally, we combined all test sets to obtain a full-test set and evaluated the overall performance of the model using the full-test set. Note that for inclusion of variables in the final model, we used the whole data set (62,219 records), which gave us the best possible estimates of the variables from the available data.
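A minimal sketch of this cross-validation scheme, under stated assumptions: X, y, and groups are hypothetical NumPy arrays of features, 0/1 outcomes, and patient IDs. scikit-learn's StratifiedGroupKFold keeps all of a patient's abnormalities in one fold while keeping the malignant fraction roughly equal across folds, mirroring the design described above (the original analysis predates this class and was carried out in R):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedGroupKFold

def cross_validated_probabilities(X, y, groups, n_splits=10, seed=0):
    """Return out-of-fold cancer probabilities (the 'full-test set')."""
    cv = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    p_cancer = np.empty(len(y), dtype=float)
    for train_idx, test_idx in cv.split(X, y, groups):
        # Train on nine folds, predict on the held-out fold; every record thus
        # receives a probability from a model that never saw it. Note that
        # scikit-learn applies L2 regularization by default, unlike the plain
        # maximum-likelihood fit of the paper's R model.
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        p_cancer[test_idx] = model.predict_proba(X[test_idx])[:, 1]
    return p_cancer
```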


TABLE 4: Model 2, Multivariable Model with BI-RADS Categories Included

Risk Factor                                β       Odds Ratio (95% CI)     p
Mass stability                                                             0.0002
  None                                     0.00    1 (referent)
  Increasing                               0.54    1.71 (1.21–2.42)
  Stable                                   −0.04   0.96 (0.55–1.68)
  Decreasing                               −0.96   0.38 (0.19–0.78)
Mass margins                                                               < 0.0001
  None                                     0.00    1 (referent)
  Circumscribed                            −0.41   0.66 (0.38–1.14)
  Cannot discern                           0.41    1.51 (0.89–2.55)
  Ill-defined                              0.76    2.13 (1.38–3.29)
  Spiculated                               0.77    2.16 (1.27–3.69)
  Microlobulated                           0.10    1.11 (0.41–2.95)
Mass size                                                                  < 0.0001
  None                                     0.00    1 (referent)
  Small                                    1.13    3.10 (2.15–4.48)
  Large                                    0.42    1.51 (0.78–2.95)
Intramammary lymph node                                                    < 0.0001
  Not present                              0.00    1 (referent)
  Present                                  −1.73   0.18 (0.07–0.45)
Focal asymmetric density                                                   0.0002
  Not present                              0.00    1 (referent)
  Present                                  0.78    2.18 (1.54–3.08)
Calcification distribution                                                 < 0.0001
  None                                     0.00    1 (referent)
  Clustered                                1.09    2.98 (2.00–4.43)
  Regional                                 0.92    2.51 (0.95–6.62)
  Scattered                                0.60    1.82 (0.14–23.48)
  Linear                                   0.40    1.49 (0.49–4.54)
  Segmental                                2.82    16.73 (3.76–74.48)
Patient age (y)                                                            < 0.0001
  < 45                                     0.00    1 (referent)
  45–50                                    0.03    1.04 (0.66–1.64)
  51–54                                    0.02    1.02 (0.61–1.72)
  55–60                                    0.77    2.16 (1.39–3.36)
  61–64                                    0.66    1.93 (1.29–2.87)
  ≥ 65                                     0.70    2.01 (1.21–3.35)
History of breast cancer                                                   < 0.0001
  None                                     0.00    1 (referent)
  History of ductal or lobular carcinoma   2.40    11.05 (8.56–14.27)
BI-RADS category                                                           < 0.0001
  1 or 2                                   0.00    1 (referent)
  3                                        1.08    2.94 (1.80–4.82)
  0                                        3.14    23.00 (14.40–36.73)
  4                                        5.21    183.62 (113.36–297.45)
  5                                        6.26    522.10 (296.73–918.63)
Note—Beta (β) indicates regression coefficients.

Performance measures—We measured the performance of the two models using the outcome (i.e., the probability of cancer) of the full-test set obtained by 10-fold cross-validation. We plotted and measured the area under the receiver operating characteristic (ROC) curve of Model 1 and Model 2 using the probability of cancer. We measured the performance of radiologists using the BI-RADS assessment categories assigned to each mammography record: we first ordered BI-RADS assessment categories by likelihood of breast cancer (1, 2, 3, 0, 4, and 5), generated an ROC curve, and measured its area (Az) using a nonparametric method [33]. We compared the performance of the two models with that of radiologists using the nonparametric method of DeLong et al. [34] for comparing two or more areas under ROC curves obtained from the same data set.

For the purpose of assessing the sensitivity and specificity of radiologists, we classified BI-RADS categories 1, 2, and 3 as negative and BI-RADS categories 0, 4, and 5 as positive [28]. We compared the sensitivity of the two models with the radiologists' sensitivity at 90% specificity, and the specificity of the two models with the radiologists' specificity at 85% sensitivity, with the corresponding CIs estimated using the efficient score method corrected for continuity [35]. Note that the points "sensitivity at 90% specificity" and "specificity at 85% sensitivity" on the radiologists' ROC curve were not observed in practice; they were obtained from linear interpolation of the two neighboring discrete points. We used these levels of sensitivity and specificity because they represent the minimal performance thresholds for screening mammography [36]. We also estimated the number of true-positive and false-negative records at 90% specificity by multiplying the sensitivity (of radiologists, Model 1, and Model 2) by the total number of malignant records. Similarly, we estimated the number of false-positive and true-negative records at 85% sensitivity by multiplying the specificity (of radiologists, Model 1, and Model 2) by the total number of benign records. Finally, we identified the most important predictors of breast cancer using the odds ratios given in the output of the two models when built on the whole data set (62,219 records).
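Under the same assumptions (y and the cross-validated probabilities p_cancer from the earlier sketch), these performance measures can be computed as follows. roc_auc_score is a nonparametric area estimate in the spirit of [33], and sensitivity at a fixed specificity is read off the empirical ROC curve by linear interpolation between the two neighboring discrete points, as described in the text. The DeLong test [34] is not included in scikit-learn and would need a separate implementation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def sensitivity_at_specificity(y_true, score, specificity=0.90):
    """Interpolate the ROC curve at a fixed specificity (here 90%)."""
    fpr, tpr, _ = roc_curve(y_true, score)   # fpr is sorted ascending
    return float(np.interp(1.0 - specificity, fpr, tpr))

# Example usage with the cross-validated probabilities:
# az = roc_auc_score(y, p_cancer)
# sens_90 = sensitivity_at_specificity(y, p_cancer, specificity=0.90)
```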


Results
Practice Performance
We found the following distribution of breast tissue density: predominantly fatty tissue, 15%; scattered fibroglandular tissue, 41%; heterogeneously dense tissue, 35%; and extremely dense tissue, 9% (Table 1). At the mammogram level, the cancer detection rate was 9.8 cancers per 1,000 mammograms (477 cancers for 48,744 mammograms). The abnormal interpretation rate was 18.5% (9,037 of 48,744 mammograms). Of all cancers detected, 71.9% were early-stage (0 or 1) and only 25.9% had lymph node metastasis. Radiologists showed a sensitivity of 90.5% and a specificity of 82.2% as estimated from BI-RADS assessment categories at the mammogram level.

Logistic Regression Model
In Model 1, 10 independent variables (mammographic features and demographic factors) were found to be significant in predicting breast cancer (Table 3). The most important predictors associated with breast cancer identified by this model were spiculated mass margins, high mass density, segmental calcification distribution, pleomorphic calcification morphology, and history of invasive carcinoma. Age was not found to be a significant predictor, but it was included in the model because of its clinical relevance. In Model 2, which included BI-RADS assessment categories, nine independent variables were significant in predicting the risk of breast cancer (Table 4). The most important predictors associated with breast cancer identified by this model were BI-RADS assessment categories 0, 4, and 5; segmental calcification distribution; and history of invasive carcinoma. Note that the inclusion of BI-RADS assessment categories in Model 2 removed some of the significant predictors found in Model 1 and added others. We tested for the significance of variables in both models (as shown in Tables 3 and 4) using the whole data set. Among demographic factors, neither model found family history of breast cancer or use of hormones to be a significant predictor of breast cancer. Among imaging descriptors, neither model found breast density, architectural distortion, or amorphous calcification morphology to be a significant predictor of breast cancer.

Radiologists achieved an Az of 0.939 ± 0.011 as measured by the BI-RADS assessment category assigned to each record. Model 1 achieved an Az of 0.927 ± 0.015, which was not significantly different (p = 0.104) from the radiologists' Az. Model 2, with an Az of 0.963 ± 0.009, performed significantly better (p < 0.001) than radiologists and Model 1 (Fig. 2).

At the abnormality level, we found that at 90% specificity, the sensitivity of Model 2 was 90.2% (95% CI, 87.2–92.6%), significantly better (p < 0.001) than that of the radiologists at 82.2% (78.5–85.3%) and Model 1 at 80.7% (77.0–84.1%). Table 5 illustrates that Model 2 identified 41 more cancers than the radiologists at this level of specificity. At a fixed sensitivity of 85%, the specificity of Model 2 at 95.6% (95.4–95.8%) was also significantly better (p < 0.001) than that of the radiologists at 88.2% (87.9–88.5%) and Model 1 at 87.0% (86.7–87.3%). Table 5 illustrates that Model 2 decreased the number of false-positives by 4,567 when compared with radiologists' performance.

We now illustrate the use of the logistic regression models to estimate the probability of cancer using three cases.

Case 1—A 45-year-old woman presented with a circumscribed oval mass of equal density on her baseline mammogram. She was assigned BI-RADS category 4 by the radiologist for this abnormality. Model 1 and Model 2 estimated her probability of cancer to be 0.05% (95% CI, 0.01–0.23%) and 1.79% (0.27–11.11%), respectively. Biopsy of this case was benign. This is a classic example of a probably benign finding with an estimated probability of breast cancer of less than 2%.

Case 2—A 52-year-old woman with a history of breast cancer had a mammogram that showed an ill-defined oval mass (< 3 cm) that was increasing in size and had density equal to the surrounding glandular tissue. The radiologist assigned BI-RADS category 3. The probability of malignancy for this finding was 30.6% (8.2–68.6%) using Model 1 and 3.6% (0.7–17.4%) using Model 2. Biopsy revealed malignancy. This case illustrates the superior predictive ability of Model 1 when the BI-RADS category is not correct and misleads Model 2.

Case 3—A 60-year-old woman with a family history of breast cancer had a mammogram that showed a mass with a spiculated margin and irregular shape. Model 1 estimated her probability of cancer to be 51.2% (24.4–78.3%). This abnormality was assigned BI-RADS category 5. Model 2 estimated her probability of cancer to be 69.7% (33.5–91.2%). The biopsy outcome of this case was malignant. This is a straightforward case of malignancy in which a correct BI-RADS category increases the probability of malignancy using Model 2.


Fig. 2—Graph shows receiver operating characteristic curves constructed from output probabilities of Model 1 (AUC = 0.927) and Model 2 (AUC = 0.963), and radiologists' BI-RADS assessment categories (AUC = 0.939). AUC = area under curve.

TABLE 5: Performance Measures

Performance at 90% Specificity    True-Positive            False-Negative
  Radiologist                     419 (400–435)            91 (75–110)
  Model 1                         412 (393–429)            98 (81–117)
  Model 2                         460 (445–472)            50 (38–65)

Performance at 85% Sensitivity    False-Positive           True-Negative
  Radiologist                     7,282 (7,126–7,441)      54,427 (54,268–54,583)
  Model 1                         8,002 (7,837–8,207)      53,687 (53,502–53,872)
  Model 2                         2,715 (2,592–2,839)      58,994 (58,870–59,117)

Note—Data are numbers (95% CIs) of cases.
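As a consistency check, the counts in Table 5 follow directly from the reported rates applied to the 510 malignant and 61,709 benign records:

\[
\begin{aligned}
\text{TP}_{\text{Model 2}} &\approx 0.902 \times 510 \approx 460, \qquad
\text{TP}_{\text{radiologists}} \approx 0.822 \times 510 \approx 419,\\
\text{TN}_{\text{Model 2}} &\approx 0.956 \times 61{,}709 \approx 58{,}994, \qquad
\text{FP}_{\text{Model 2}} = 61{,}709 - 58{,}994 = 2{,}715,
\end{aligned}
\]

so Model 2 finds 460 − 419 = 41 more cancers at 90% specificity and produces 7,282 − 2,715 = 4,567 fewer false-positives at 85% sensitivity, matching the differences quoted in the Results.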
Discussion
We constructed two breast cancer risk estimation models based on the National Mammography Database descriptors to aid radiologists in breast cancer diagnosis. Our results show that the combination of a logistic regression model and radiologists' assessment performs better than either alone in discriminating between benign and malignant lesions. The ROC curve of Model 1, which includes only demographic factors and mammography observations, overlaps and intersects with that of the radiologists at certain points, showing that one is not always better than the other. On the other hand, Model 2, which also includes radiologists' impressions, clearly dominates the other two ROC curves, indicating better sensitivity and specificity at all threshold levels. By adding radiologists' overall impressions (BI-RADS category) in Model 2, we could identify more malignant lesions and avoid false-positive cases as compared with the performance of Model 1 and radiologists alone.

Our computer model differs in various ways from the existing mammography computer models in the literature. The existing models can be categorized in the following ways: for detecting abnormalities present on the mammograms, for estimating the risk of breast cancer based on the mammographic observations and patient demographic information, and for predicting the risk of breast cancer to identify high-risk individuals. The first category of models is used to identify abnormalities on the mammograms, whereas our model provides the interpretation of mammography observations after they are identified. The models in the second category, in which we classify our model, have used suspicious findings recommended for biopsy for training and evaluation or biopsy results as the reference standard. For example, one study constructed a Bayesian network using 38 BI-RADS descriptors; by training the model on 111 biopsies performed on suspicious calcifications, the investigators found an Az of 0.919 [37]. Another study developed linear discriminant analysis and artificial neural network models using a combination of mammographic and sonographic features and found an Az of 0.92 [16]. In contrast, our computer model was trained and evaluated on consecutive mammography examinations and used registry match as the reference standard. The third category of models (risk prediction models) has been built using consecutive cases, but these models included only demographic factors and breast density [19, 21, 22] and cannot be directly compared with our model.

In addition, our model differs from these risk prediction models by estimating the risk of cancer at a single time point (i.e., at the time of mammography) instead of over an interval in the future (e.g., over the next 5 years). In contrast to their findings, our model did not find breast density to be a significant predictor of breast cancer. This could be because the risk of breast cancer is explained by more informative mammographic descriptors in our logistic regression model. Our model reinforces previously known mammography predictors of breast cancer—irregular mass shape; ill-defined and spiculated mass margins; fine linear calcifications; and clustered, linear, and segmental calcification distributions [38]. In addition, we found increasing mass size and high mass density to be significant predictors, which to our knowledge has not been shown in the literature. Note that our results reflect a single practice and must be viewed with some caution with respect to their generalizability because significant variability has been observed in the interpretive performance of screening and diagnostic mammography [5, 6].

We developed two risk estimation models by excluding (in Model 1) and including (in Model 2) BI-RADS assessment categories. Although Model 2 performed significantly better than Model 1 in discriminating between benign and malignant lesions, Model 2 may have weaknesses as a stand-alone risk estimation tool if the assessed BI-RADS category is incorrect. If the BI-RADS assessment category does not agree with the findings, Model 1 and Model 2 used jointly will show a high level of disagreement in the prediction of breast cancer (as in example case 2) and will potentially indicate this error.


When the radiologist's BI-RADS category is correct (i.e., when there is agreement between the predictions of Model 1 and Model 2), Model 2 will be the better model for breast cancer prediction. In future work, we plan to estimate the level of disagreement between the two models and investigate their possible use as complementary tools.

Our secondary model (Model 3) showed that the exclusion of the BI-RADS descriptors significantly impairs the performance of the logistic regression model, underscoring the need for the collection of these variables in clinical practice.

It is common for clinical data sets to contain a substantial amount of missing data. Although complete data are ideally better, that situation is rarely encountered in the real world. There is no perfect way to handle missing data, but there are two possibilities: to impute the missing descriptor depending on the fraction of various possible values of the descriptor, or to assume that the missing descriptor was not observed by the radiologist and mark it as "not present." When building the model, we made the decision to label all of the missing data as not present; therefore, when testing and applying the model on a new case, the missing descriptors should be treated as not present. Our approach to handling missing data is appropriate for mammography data, where radiologists often leave the descriptors blank if nothing is observed on the mammogram.
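A minimal sketch of this rule, assuming the records live in a pandas DataFrame with hypothetical column names: blank descriptors are filled with "not present" before dummy coding, and the same function must be applied to new cases at prediction time so that training and test records are encoded identically.

```python
import pandas as pd

DESCRIPTOR_COLS = ["mass_margins", "mass_shape", "calc_distribution"]  # hypothetical

def encode_records(records: pd.DataFrame) -> pd.DataFrame:
    """Treat missing descriptors as 'not present', then dummy-code them."""
    filled = records.copy()
    filled[DESCRIPTOR_COLS] = filled[DESCRIPTOR_COLS].fillna("not present")
    return pd.get_dummies(filled, columns=DESCRIPTOR_COLS)
```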
To our knowledge, no prior studies discuss a logistic regression–based CADx model incorporating mammography descriptors from consecutive mammograms from a breast imaging practice. The use of a logistic regression model has some attractive features when compared with artificial intelligence prediction tools (e.g., artificial neural networks, Bayesian networks, support vector machines). Logistic regression can identify important predictors of breast cancer using odds ratios and can generate confidence intervals that provide additional information for decision making.

Our models' performance depends on the ability of radiologists to accurately identify findings on mammograms. Therefore, based on the literature, performance may be higher in facilities where most mammograms are read by mammography subspecialists as compared with general radiologists [39]. However, with appropriate training [40], general radiologists in combination with the model may approach the accuracy of subspecialty-trained mammographers. Decreasing variability in mammography interpretation, one of the underlying motivations of this research, can only be realized with further development of tools such as our model and with research to validate accuracy, effectiveness, and generalizability. We consider this work to be only a first step toward this goal.

We could not compare practice parameters directly with the literature because screening and diagnostic examinations could not be separated for this database. Our prediction Model 2 shows a significant improvement over radiologists' assessment in classifying abnormalities when built on a mix of screening and diagnostic data. The model's performance may differ when built separately on screening and diagnostic mammograms. For screening mammograms, the incidence of cancer is low and descriptors are less exact because of general imaging protocols, which may result in less accurate model parameters. In contrast, for diagnostic mammograms, the model parameters may be more accurate because more descriptors can be observed as a result of additional specialized views. In addition, the performance of our existing model may differ when tested on screening and diagnostic mammograms separately: the model may perform better when tested on the diagnostic examinations but worse when tested on the screening examinations.

Our risk estimation models are designed to aid radiologists, not to act as a substitute. The improvement in the model's performance by adding BI-RADS assessments indeed suggests that the radiologist's integration of the imaging findings summarized by the BI-RADS assessment categories does augment predictions based on the observed mammographic features. However, the logistic regression model contributes an additional measure of accuracy over and above that provided by the BI-RADS assessment categories, as evidenced by the improved performance compared with that of the radiologists alone.

The objective of our model is to aid decision making by generating a risk prediction for a single point in time (at mammography). As we were designing the study, we did not want to influence the probability of breast cancer based on future events but only on variables identified at the time of mammography. For this reason, we excluded unmatched BI-RADS 1 cases from our analyses, which represented either undetected cancer (present on the mammogram but not seen) or an interval cancer (not detectable on the mammogram). The inclusion of these cases may have erroneously increased the estimated probability of malignancy by considering future risks rather than making a prediction at a single time based on mammography features alone. However, the exclusion of these cases may have erroneously decreased the estimated probability of malignancy, given that at least some of the false-negative cancers were likely present at the time of the mammogram, especially those in women with dense breasts; this is a limitation of our model.

Our models provide the probability of cancer as an outcome that radiologists can use to make appropriate patient management decisions. The use of such models has the potential to reduce the interpretive variability of mammography across practices and radiologists. Our models also facilitate shared decision making by providing the probability of cancer, which can be better understood by patients than BI-RADS categories. In the future, we will test our models' performance on other mammography practices to evaluate their generalizability. We will also include potentially important interaction effects that deserve particular attention; including interaction effects may further improve the performance of our models.

In conclusion, we found that our logistic regression models (Model 1 and Model 2) can effectively discriminate between benign and malignant lesions. Furthermore, we found that the radiologist alone and the logistic regression model incorporating only mammographic and demographic features (Model 1) are inferior to Model 2, which combines the mammographic and demographic features with the radiologist's impression as captured by the BI-RADS assessment categories. Our study supports that further research is needed to define how radiologists and computational models can collaborate, each adding valuable predictive features, experience, and training to improve overall performance.

References


1. Kopans DB. The positive predictive value of mammography. AJR 1992; 158:521–526
2. Barlow WE, Chi C, Carney PA, et al. Accuracy of screening mammography interpretation by characteristics of radiologists. J Natl Cancer Inst 2004; 96:1840–1850
3. Kerlikowske K, Grady D, Barclay J, et al. Variability and accuracy in mammographic interpretation using the American College of Radiology Breast Imaging Reporting and Data Systems. J Natl Cancer Inst 1998; 90:1801–1809
4. Elmore JG, Miglioretti DL, Reisch LM, et al. Screening mammograms by community radiologists: variability in false-positive rates. J Natl Cancer Inst 2002; 94:1373–1380
5. Miglioretti DL, Smith-Bindman R, Abraham L, et al. Radiologist characteristics associated with interpretive performance of diagnostic mammography. J Natl Cancer Inst 2007; 99:1854–1863
6. Taplin S, Abraham L, Barlow WE, et al. Mammography facility characteristics associated with interpretive accuracy of screening mammography. J Natl Cancer Inst 2008; 100:876–887
7. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001; 220:781–786
8. Dean JC, Ilvento CC. Improved cancer detection using computer-aided detection with diagnostic and screening mammography: prospective study of 104 cancers. AJR 2006; 187:20–28
9. Cupples TE, Cunningham JE, Reynolds JC. Impact of computer-aided detection in a regional screening mammography program. AJR 2005; 185:944–950
10. Birdwell RL, Bandodkar P, Ikeda DM. Computer-aided detection with screening mammography in a university hospital setting. Radiology 2005; 236:451–457
11. Baker JA, Kornguth PJ, Lo JY, Williford ME, Floyd CE Jr. Breast cancer: prediction with artificial neural network based on BI-RADS standardized lexicon. Radiology 1995; 196:817–822
12. Bilska-Wolak AO, Floyd CE Jr. Development and evaluation of a case-based reasoning classifier for prediction of breast biopsy outcome with BI-RADS lexicon. Med Phys 2002; 29:2090–2100
13. Burnside ES, Rubin DL, Shachter RD. Using a Bayesian network to predict the probability and type of breast cancer represented by microcalcifications on mammography. Stud Health Technol Inform 2004; 107(Pt 1):13–17
14. Fischer EA, Lo JY, Markey MK. Bayesian networks of BI-RADS descriptors for breast lesion classification. Conf Proc IEEE Eng Med Biol Soc 2004; 4:3031–3034
15. Markey MK, Lo JY, Floyd CE. Differences between computer-aided diagnosis of breast masses and that of calcifications. Radiology 2002; 223:489–493
16. Jesneck JL, Lo JY, Baker JA. Breast mass lesions: computer-aided diagnosis models with mammographic and sonographic descriptors. Radiology 2007; 244:390–398
17. Claus EB, Risch N, Thompson WD. Autosomal dominant inheritance of early-onset breast cancer: implications for risk prediction. Cancer 1994; 73:643–651
18. Colditz GA, Rosner B. Cumulative risk of breast cancer to age 70 years according to risk factor status: data from the Nurses' Health Study. Am J Epidemiol 2000; 152:950–964
19. Gail MH, Brinton LA, Byar DP, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 1989; 81:1879–1886
20. Taplin SH, Thompson RS, Schnitzer F, Anderman C, Immanuel V. Revisions in the risk-based Breast Cancer Screening Program at Group Health Cooperative. Cancer 1990; 66:812–818
21. Barlow WE, White E, Ballard-Barbash R, et al. Prospective breast cancer risk prediction model for women undergoing screening mammography. J Natl Cancer Inst 2006; 98:1204–1214
22. Tice JA, Cummings SR, Smith-Bindman R, Ichikawa L, Barlow WE, Kerlikowske K. Using clinical factors and mammographic breast density to estimate breast cancer risk: development and validation of a new predictive model. Ann Intern Med 2008; 148:337–347
23. Vyborny CJ, Giger ML, Nishikawa RM. Computer-aided detection and diagnosis of breast cancer. Radiol Clin North Am 2000; 38:725–740
24. Doi K, MacMahon H, Katsuragawa S, Nishikawa RM, Jiang Y. Computer-aided diagnosis in radiology: potential and pitfalls. Eur J Radiol 1999; 31:97–109
25. Freedman AN, Seminara D, Gail MH, et al. Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst 2005; 97:715–723
26. Osuch JR, Anthony M, Bassett LW, et al. A proposal for a national mammography database: content, purpose, and value. AJR 1995; 164:1329–1334
27. American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS), 3rd ed. Reston, VA: American College of Radiology, 1998
28. American College of Radiology. Breast Imaging Reporting and Data System (BI-RADS), 4th ed. Reston, VA: American College of Radiology, 2004
29. Bagley SC, White H, Golomb BA. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. J Clin Epidemiol 2001; 54:979–985
30. Gareen IF, Gatsonis C. Primer on multiple regression models for diagnostic imaging research. Radiology 2003; 229:305–310
31. R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2005
32. Moineddin R, Matheson FI, Glazier RH. A simulation study of sample size for multilevel logistic regression models. BMC Med Res Methodol 2007; 7:34
33. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29–36
34. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44:837–845
35. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 1998; 17:857–872
36. Bassett LW, Hendrick RE, Bassford TL. Quality determinants of mammography: clinical practice guideline no. 13. Rockville, MD: Agency for Health Care Policy and Research, Public Health Service, U.S. Department of Health and Human Services, 1994
37. Burnside ES, Rubin DL, Fine JP, Shachter RD, Sisney GA, Leung WK. Bayesian network to predict breast cancer risk of mammographic microcalcifications and reduce number of benign biopsy results: initial experience. Radiology 2006; 240:666–673
38. Liberman L, Abramson AF, Squires FB, Glassman JR, Morris EA, Dershaw DD. The Breast Imaging Reporting and Data System: positive predictive value of mammographic features and final assessment categories. AJR 1998; 171:35–40
39. Sickles EA, Wolverton DE, Dee KE. Performance parameters for screening and diagnostic mammography: specialist and general radiologists. Radiology 2002; 224:861–869
40. Berg WA, D'Orsi CJ, Jackson VP, et al. Does training in the Breast Imaging Reporting and Data System (BI-RADS) improve biopsy recommendations or feature analysis agreement with experienced breast imagers at mammography? Radiology 2002; 224:871–880
41. Kleinbaum DG. Logistic regression: a self-learning text. New York, NY: Springer-Verlag, 1994
42. Hosmer DW, Lemeshow S. Applied logistic regression. New York, NY: Wiley, 1989
43. Davis J, Goadrich M. The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, PA: ICML, 2006:233–240
44. Chhatwal J, Burnside ES, Alagoz O. Receiver operating characteristic (ROC) curves versus precision-recall (PR) curves in models evaluated with unbalanced data. In: Proceedings of the 29th annual meeting of the Society for Medical Decision Making. Pittsburgh, PA: SMDM, 2007


APPENDIX 1: Logistic Regression

Binomial (or binary) logistic regression is a form of regression that is used when the dependent variable is dichotomous (e.g., present or absent) and the independent variables are of any type (discrete or continuous). The independent (observed) variables, X1, X2, ..., Xn, are related to the dependent (outcome) variable, Y, by the following equation:

\[ \operatorname{logit}(p) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n \qquad (1) \]

where \(\beta_i\) is the regression coefficient of \(X_i\), \(p = \Pr\{Y = 1\}\), and \(\operatorname{logit}(p) = \ln\!\left(\dfrac{p}{1-p}\right)\). The value of p can be calculated by taking the inverse of logit(p) as shown in the following equation:

\[ p = \frac{\exp(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n)}{1 + \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n)} \]

where p is the probability of the presence of disease (e.g., probability of cancer) when the findings X1, X2, ..., Xn (e.g., calcification types, breast density, and age) are identified. \(\beta_i\) is the coefficient of the independent variable \(X_i\) that is estimated from the available data (training set). Only significant variables (p values < α) are included in the model. Variables can be added by stepwise, forward, or backward selection methods. The odds ratio, estimated by \(\exp(\beta_i)\), is commonly used to interpret the effect of an independent variable on the dependent variable. For example, if \(\beta_1\) is the coefficient of the variable "prior history of breast surgery," then \(\exp(\beta_1)\) is the odds ratio corresponding to the history of surgery—that is, the odds that the patient has a malignant lesion increase by the factor \(\exp(\beta_1)\) if the patient has ever had breast surgery and all other independent variables remain fixed. More details of logistic regression and its application to the medical field can be found in other sources [29, 41, 42].
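The two equations translate directly into code. In this sketch the coefficient values passed in are placeholders—the paper reports per-variable betas (Tables 3 and 4) but not the intercept β0—so any probability computed this way is illustrative only:

```python
import math

def probability_of_disease(intercept, betas, x):
    """Invert logit(p) = b0 + b1*x1 + ... + bn*xn (equation 1).

    betas and x are parallel sequences; x holds the observed values
    (0/1 indicators for the dummy-coded descriptors).
    """
    logit = intercept + sum(b * xi for b, xi in zip(betas, x))
    return math.exp(logit) / (1.0 + math.exp(logit))

def odds_ratio(beta):
    """exp(beta): factor by which the odds of malignancy change when the
    corresponding finding is present, all other variables held fixed."""
    return math.exp(beta)

# exp(2.90) = 18.17, consistent with the spiculated-margins odds ratio of
# 18.24 in Table 3 (the table's beta is rounded to two decimals).
print(round(odds_ratio(2.90), 2))
```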

APPENDIX 2: Model 3

In order to assess the contribution of mammography descriptors in estimating the risk of breast cancer, we constructed Model 3, which included patient demographic factors (age, history of breast cancer, family history of breast cancer, history of surgery, breast density, and hormone therapy) and BI-RADS assessment categories, and excluded mammography descriptors. Only three variables were found significant in predicting the risk of cancer in Model 3 (Table 6); BI-RADS assessment categories were the most important predictor.

TABLE 6: Model 3, Multivariate Model with Patient Demographic Factors and BI-RADS Categories Only

Risk Factor                                β       Odds Ratio (95% CI)          p
BI-RADS category                                                                < 0.0001
  1 or 2                                   0.00    1 (referent)
  3                                        1.62    5.07 (3.15–8.18)
  0                                        3.02    20.43 (13.20–31.63)
  4                                        5.97    389.18 (250.95–603.55)
  5                                        7.01    1,112.24 (691.79–1,788.22)
Patient age (y)                                                                 < 0.0001
  < 45                                     0.00    1 (referent)
  45–50                                    −0.03   0.97 (0.62–1.52)
  51–54                                    −0.03   0.97 (0.59–1.60)
  55–60                                    0.65    1.92 (1.25–2.95)
  61–64                                    0.49    1.63 (0.99–2.68)
  ≥ 65                                     0.54    1.71 (1.16–2.51)
History of breast cancer                                                        < 0.0001
  None                                     0.00    1 (referent)
  History of ductal or lobular carcinoma   2.27    9.64 (7.60–12.23)
Note—Beta (β) indicates regression coefficients.


We measured the performance of our models using receiver operating characteristic (ROC) curves and precision–recall curves (Figs. 3 and 4). We used precision–recall curves in addition to ROC curves to gain more insight into the performance of our models because precision–recall curves have higher discriminative power than ROC curves in cases of skewed data [43, 44]. "Precision" measures the positive predictive value and "recall" measures the sensitivity of a test. We plotted and measured the area under the precision–recall curve (APR) of the three models (Model 1, Model 2, and Model 3) and of the radiologists using the probability of cancer and the BI-RADS assessment categories, respectively [43].

Fig. 3—Graph shows receiver operating characteristic curves constructed from output probabilities of Model 1 (AUC = 0.927), Model 2 (AUC = 0.963), and Model 3 (AUC = 0.955), and radiologists' BI-RADS assessment categories (AUC = 0.939). AUC = area under curve.

Fig. 4—Graph shows precision–recall curves constructed from output probabilities of Model 1 (AUC = 0.363), Model 2 (AUC = 0.559), and Model 3 (AUC = 0.487), and radiologists' BI-RADS assessment categories (AUC = 0.396). AUC = area under curve, PPV = positive predictive value.

Model 3 achieved an Az (area under the ROC curve) and an APR that were significantly higher than those of Model 1 and the radiologists (all p < 0.001). More important, Model 3, which excludes descriptors, performed significantly worse (p < 0.001) than Model 2, which includes descriptors, in terms of Az and APR (Table 7). Thus, the inclusion of mammographic descriptors significantly contributes to the superior performance of Model 2.

TABLE 7: Comparison of Areas Under Receiver Operating Characteristic (Az) and Precision–Recall (APR) Curves

Curve for                                              Az                APR
Radiologists                                           0.939 ± 0.011     0.396 ± 0.027
Model 1 (demographics + descriptors)                   0.927 ± 0.015     0.363 ± 0.030
Model 2 (demographics + descriptors + assessments)     0.963 ± 0.009     0.559 ± 0.026
Model 3 (demographics + assessments)                   0.955 ± 0.011     0.487 ± 0.028
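For readers reproducing this comparison, both areas can be estimated with scikit-learn. average_precision_score is one common estimator of the area under the precision–recall curve (the paper follows the interpolation of Davis and Goadrich [43], which can differ slightly), and y_true and score are the hypothetical outcome and probability arrays used in the earlier sketches. With roughly 0.8% prevalence (510 of 62,219 records), APR penalizes false-positives far more heavily than Az, which is why the two summaries can rank models differently.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

def summarize(y_true, score):
    """Report both ROC and precision-recall areas for a skewed data set."""
    return {
        "Az (area under ROC curve)": roc_auc_score(y_true, score),
        "APR (area under precision-recall curve)": average_precision_score(y_true, score),
    }
```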
