Clinical Radiology
journal homepage: www.clinicalradiologyonline.net
Article information
Article history: Received 1 March 2019; Accepted 14 August 2019

AIM: To test the diagnostic performance of a deep learning-based system for the detection of clinically significant pulmonary nodules/masses on chest radiographs.
MATERIALS AND METHODS: In a retrospective study of 100 patients (47 with clinically significant pulmonary nodules/masses and 53 control subjects without pulmonary nodules), two radiologists verified clinically significant pulmonary nodules/masses according to chest computed tomography (CT) findings. A computer-aided diagnosis (CAD) software package using a deep-learning approach was used to detect pulmonary nodules/masses and to determine the diagnostic performance of four algorithms (heat map, abnormal probability, nodule probability, and mass probability).
RESULTS: A total of 100 cases were included in the analysis. Among the four algorithms, the mass probability algorithm achieved 76.6% sensitivity (36/47, 11 false negatives) and 88.68% specificity (47/53, six false positives) in the detection of pulmonary nodules/masses at the optimal probability score cut-off of 0.2884. Compared to the other three algorithms, the mass probability algorithm had the best predictive ability for pulmonary nodule/mass detection at the optimal probability score cut-off of 0.2884 (AUC_Mass: 0.916 versus AUC_Heat map: 0.682, p<0.001; AUC_Mass: 0.916 versus AUC_Abnormal: 0.810, p=0.002; AUC_Mass: 0.916 versus AUC_Nodule: 0.813, p=0.014).
CONCLUSION: The deep-learning based computer-aided diagnosis system will likely play a vital role in the early detection and diagnosis of pulmonary nodules/masses on chest radiographs. In future applications, these algorithms could support triage workflow via double reading to improve sensitivity and specificity during the diagnostic process.
© 2019 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
* Guarantor and correspondence: F.-Z. Wu, Department of Radiology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan. Tel.: +886 985 330160.
E-mail address: cmvwu1029@gmail.com (F.-Z. Wu).
https://doi.org/10.1016/j.crad.2019.08.005
0009-9260/© 2019 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
C.-H. Liang et al. / Clinical Radiology 75 (2020) 38–45
external validation test datasets based on the reference standard from CT images, the predictive ability and cut-off values for each algorithm in the prediction of pulmonary nodules/masses were assessed using area under the receiver operating characteristic (AUROC) curves. An AUROC between 0.7 and 0.9 was regarded as moderate accuracy according to Greiner et al.23 The Youden index and the discriminant ability at each cut-off value for the four algorithms were used to determine the optimal cut-off value to diagnose pulmonary nodules/masses. Cross-tables, sensitivity, specificity, positive likelihood ratio (positive LR), negative likelihood ratio (negative LR), positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy were determined from the optimal cut-off value given by the Youden index for the different algorithm models in pulmonary nodule or mass detection.

To determine and compare the diagnostic performance of the four AI algorithms in pulmonary nodule or mass detection, the optimal diagnostic cut-off value of each algorithm was determined using the receiver operating characteristic (ROC) curve, with the Youden index maximising the overall diagnostic accuracy. Comparison of the ROC curves was performed using the method described by DeLong and colleagues.24 A p-value of <0.05 was considered significant.

Results

Demographics and clinical characteristics

A total of 100 patients with 100 chest radiographs were enrolled and summarised in Table 1. There were 47 patients with clinically significant pulmonary nodules/masses and 53 patients with negative findings. The mean age was 55.07 ± 13.80 years and 54 (54%) patients were men. Among the 100 chest radiographs, 72% were produced using DR and the rest using CR. The average processing time per case was 94.07 ± 16.54 seconds, with a maximum of 133 seconds. For imaging processing time using AI, the mean processing time of CR was significantly longer than that of DR (116.85 ± 12.27 versus 85.2 ± 6.33 seconds). Of the 47 pulmonary nodules or masses, 39 (82.97%) were solid nodules and eight (17.02%) were part-solid nodules. The mean nodule size was 4.37 ± 0.41 cm (range 0.7–13.5 cm).

Table 1
Baseline characteristics of 100 study subjects.

Characteristic                           Value                       p-Value
Age (years)                              55.07 ± 13.80
Gender (male)                            54 (54%)
Chest radiograph DICOM modality
  CR                                     28
  DR                                     72
Processing time (seconds)                                            <0.001a
  CR                                     116.85 ± 12.27
  DR                                     85.2 ± 6.33
Positive pulmonary nodule/mass           47 (47%)
Nodular size (cm)                        4.37 ± 0.41 (0.7–13.5)
Nodular type
  Solid nodule                           39
  Part-solid nodule                      8

DICOM, digital imaging and communications in medicine; CR, computed radiography; DR, digital radiography.
a For the imaging processing time per case, the mean processing time of the CR modality was significantly longer than that of the DR modality (116.85 ± 12.27 versus 85.2 ± 6.33 seconds).

The cross-tables for the best-performing models for pulmonary nodule/mass detection, including the heat map algorithm, abnormal probability algorithm, nodule probability algorithm, and mass probability algorithm, are provided in Fig 1.

Table 2 shows the sensitivity, specificity, diagnostic accuracy, negative predictive value (NPV), positive predictive value (PPV), likelihood ratio (LR)(+), and LR(−) values of the four algorithms of the QUIBIM Chest X-ray Classifier at the optimal threshold of probability score for pulmonary nodule/mass detection. The sensitivity of the heat map algorithm was 38.3% and the specificity was 98.11% for identifying the most abnormal region. The sensitivity of the abnormal probability algorithm was 74.47% and the specificity was 81.13% for pulmonary nodule/mass detection at the optimal probability score cut-off of 0.4116. The sensitivity of the nodule probability algorithm was 85.11% and the specificity was 64.15% at the optimal probability score cut-off of 0.2879. The sensitivity of the mass probability algorithm was 76.6% and the specificity was 88.68% at the optimal probability score cut-off of 0.2884. Among the four algorithms, the nodule probability algorithm was the most sensitive, whereas the heat map algorithm was the most specific.

The areas under the ROC curves for pulmonary nodule detection were 0.682 (95% confidence interval [CI] 0.581–0.772) for the heat map algorithm, 0.810 (95% CI 0.719–0.882) for the abnormal probability algorithm, 0.813 (95% CI 0.723–0.884) for the nodule probability algorithm, and 0.916 (95% CI 0.844–0.962) for the mass probability algorithm (Fig 2). Compared to the other three algorithms, the mass probability algorithm had the best predictive ability for pulmonary nodule detection at the optimal probability score cut-off of 0.2884 (AUC_Mass: 0.916 versus AUC_Heat map: 0.682, p<0.001; AUC_Mass: 0.916 versus AUC_Abnormal: 0.810, p=0.002; AUC_Mass: 0.916 versus AUC_Nodule: 0.813, p=0.014).

Subgroup analysis of detected findings using the four algorithms

The detailed distribution of detected nodular diameter and type according to the four algorithms is displayed in Table 3. The mass probability algorithm detected pulmonary nodules/masses with an average diameter of 4.872 ± 2.894 cm, and the heat map algorithm detected and localised pulmonary nodules correctly with an average diameter of 6.067 ± 3.029 cm; however, the ability to detect part-solid nodules was relatively weak compared to solid nodules for these algorithms. Nodules detected by these algorithms were usually larger than undetected nodules.
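The CR-versus-DR processing-time comparison in the Table 1 footnote can be reproduced from the summary statistics alone. A minimal sketch using Welch's unequal-variance t-test (the group sizes n_CR = 28 and n_DR = 72 come from Table 1; the paper does not state which test it used, so Welch's test is an assumption here):

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from per-group summary statistics (mean, SD, n)."""
    se1_sq = sd1 ** 2 / n1
    se2_sq = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, df

# CR: 116.85 +/- 12.27 s (n=28); DR: 85.2 +/- 6.33 s (n=72), from Table 1
t, df = welch_t_from_summary(116.85, 12.27, 28, 85.2, 6.33, 72)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 12.99, df = 32.7
```

With t of roughly 13 on roughly 33 degrees of freedom, the two-sided p-value is far below 0.001, consistent with the footnote.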
C.-H. Liang et al. / Clinical Radiology 75 (2020) 38e45 41
Figure 1 Flowchart of the 100 consecutive patients and retrospective assessment using the deep-learning Chest X-ray Classifier. Cross-tables for
the best-performing models for pulmonary nodule/mass detection, including the heat map algorithm, abnormal probability algorithm, nodule
probability algorithm, and mass probability algorithm.
Table 2
ROC analysis results at the threshold to maximise sensitivity and specificity in pulmonary nodule detection across different algorithm models.
Algorithm model Cut-off ROC Sensitivity Specificity Positive LR Negative LR PPV % NPV % Accuracy %
Heat map Identify lesion 0.682 38.30 98.11 20.30 0.63 94.73% 64.19% 70%
Abnormal probability 0.4116 0.810 74.47 81.13 3.95 0.31 77.78% 78.20% 78%
Nodule probability 0.2879 0.813 85.11 64.15 2.37 0.23 67.80% 82.90% 74%
Mass probability 0.2884 0.916 76.60 88.68 6.77 0.26 85.70% 81.00% 83%
ROC, receiver operating characteristic; LR, likelihood ratio; PPV, positive predictive value; NPV, negative predictive value.
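The operating points in Table 2 are chosen by maximising the Youden index (J = sensitivity + specificity − 1), and the remaining columns follow from the resulting 2×2 cross-table. A minimal sketch; the cross-table counts for the mass probability algorithm (TP = 36, FN = 11, FP = 6, TN = 47) are taken from the Results, while the score arrays in `youden_cutoff` would be the per-patient classifier outputs, which are not published:

```python
def youden_cutoff(y_true, scores):
    """Return the score threshold that maximises Youden's J = sens + spec - 1,
    classifying a case as positive when its score >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t

def diagnostics(tp, fn, fp, tn):
    """Standard diagnostic metrics from a 2x2 cross-table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Mass probability algorithm at cut-off 0.2884: 36/47 true positives and
# 47/53 true negatives, from the Results section
d = diagnostics(tp=36, fn=11, fp=6, tn=47)
print(f"sens {d['sensitivity']:.1%}, spec {d['specificity']:.2%}, "
      f"LR+ {d['lr_pos']:.2f}, LR- {d['lr_neg']:.2f}")
```

These counts reproduce the mass-probability row of Table 2: sensitivity 76.6%, specificity 88.68%, PPV 85.7%, NPV 81.0%, LR+ 6.77, LR− 0.26, accuracy 83%.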
Figure 2 Comparison of ROC curves for the four algorithms. Compared to the other three algorithms, the mass probability algorithm had the best predictive ability for pulmonary nodule/mass detection at the optimal probability score cut-off of 0.2884 (AUC_Mass: 0.916 versus AUC_Heat map: 0.682, p<0.001; AUC_Mass: 0.916 versus AUC_Abnormal: 0.810, p=0.002; AUC_Mass: 0.916 versus AUC_Nodule: 0.813, p=0.014).
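The pairwise AUC comparisons quoted above rely on DeLong's nonparametric test.24 A compact sketch of that test in its structural-components form; the patient-level scores needed to reproduce the paper's exact p-values are not published, so the demonstration data below are synthetic:

```python
import math
import numpy as np

def delong_test(y, scores_a, scores_b):
    """Paired DeLong test for the difference between two correlated AUCs
    measured on the same cases. Returns (auc_a, auc_b, z, two_sided_p)."""
    y = np.asarray(y)
    pos, neg = (y == 1), (y == 0)

    def structural_components(s):
        s = np.asarray(s, dtype=float)
        # psi = 1 if the positive case outranks the negative, 0.5 on ties
        diff = s[pos][:, None] - s[neg][None, :]
        psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
        # V10: mean over negatives (one value per positive);
        # V01: mean over positives (one value per negative)
        return psi.mean(axis=1), psi.mean(axis=0)

    v10a, v01a = structural_components(scores_a)
    v10b, v01b = structural_components(scores_b)
    auc_a, auc_b = v10a.mean(), v10b.mean()
    m, n = pos.sum(), neg.sum()
    s10 = np.cov(np.vstack([v10a, v10b]))  # 2x2 covariance, ddof=1
    s01 = np.cov(np.vstack([v01a, v01b]))
    var = (s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m \
        + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p

if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # perfect ranking
    b = [0.2, 0.8, 0.3, 0.7, 0.4, 0.9, 0.1, 0.6]  # chance-level ranking
    print(delong_test(y, a, b))  # AUCs 1.0 and 0.5; z = 2.0, p ≈ 0.046
```

Because the test compares the same patients under two scoring algorithms, the covariance terms matter: ignoring them (as an unpaired comparison would) generally overstates the variance of the AUC difference.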
Diagnostic performance according to nodule size across the different algorithm models is summarised in Electronic Supplementary Material Table S1. For the four algorithm models, the Chest X-ray Classifier software appears to have superior diagnostic accuracy for pulmonary nodules ≥3 cm than for pulmonary nodules <3 cm. Comparison of the
Figure 3 A solid nodule of 2.3 cm, diagnosed as lung cancer, in the left middle lung field, properly detected by the heat map algorithm with an abnormality score of 0.67.
Figure 4 A faint nodule missed by the heat map algorithm (false negative), diagnosed as pulmonary adenocarcinoma manifesting as a 2.9 cm part-solid right upper lobe nodule as demonstrated at CT; however, it was properly detected with an abnormality score of 0.48.
Figure 5 A faint nodule missed by both classifiers (heat map and abnormal probability score), diagnosed as pulmonary adenocarcinoma manifesting as a 1.6 cm part-solid right lower lobe nodule as demonstrated at CT.
Figure 6 A false-positive nodule called by both AI algorithm classifiers (nodule and abnormality score). The chest radiograph of this 73-year-old woman showed increased lung markings in both lower lung fields, which were diagnosed as a normal chest finding at CT.

performance in a retrospective setting. Future work will aim to drive implementation of deep learning to aid radiologists in detecting lung nodules in real time. Third, previous studies have demonstrated blind spots in chest radiographs, which have been shown to contribute to detection and interpretation errors.2 Further work aiming to investigate the diagnostic performance of deep learning for blind spots in chest radiography is warranted. Fourth, the heat map algorithm, which focused on the automatic identification and localisation of pulmonary nodules/masses, has a good PPV (94.73%) and is highly specific (98.11%), but has poor sensitivity (38.3%). This algorithm can detect pulmonary nodules with an average diameter of 6.067 ± 3.029 cm. Therefore, reliable identification and localisation of smaller pulmonary nodules with deep learning is critical to clinical implementation in real-world practice. Finally, the present study included chest radiographs generated by both DR and CR technologies, and the processing time of CR was found to be much longer than that of DR. This may be attributed to differences in the principles of image processing between CR and DR. In the future, the diagnostic accuracy of convolutional neural networks between CR and DR should be investigated.

In conclusion, deep-learning based CAD systems will likely play a vital role in the early detection and diagnosis of pulmonary nodules/masses on chest radiographs. In future applications, these algorithms could support triage workflow with double reading to improve sensitivity and specificity during the diagnostic process.

Conflicts of interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Fabio Garcia-Castro and Angel Alberich-Bayarri are founders of the spin-off company QUIBIM SL. The other authors declare that they have no competing interests.

Acknowledgements

This study was supported by grants from Kaohsiung Veterans General Hospital, Taiwan, R.O.C. (nos. VGHKS103-015, VGHKS104-048, VGHKS105-064, VGHKS108-159, MOST108-2314-B-075B-008-).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.crad.2019.08.005.

References

1. Brogdon BG, Kelsey CA, Moseley Jr RD. Factors affecting perception of pulmonary lesions. Radiol Clin N Am 1983;21(4):633–54.
2. de Groot PM, Carter BW, Abbott GF, et al. Pitfalls in chest radiographic interpretation: blind spots. Semin Roentgenol 2015;50(3):197–209.
3. Aberle DR, Adams AM, Berg CD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365(5):395–409.
4. Wu FZ, Chen PA, Wu CC, et al. Semiquantative visual assessment of sub-solid pulmonary nodules ≤3 cm in differentiation of lung adenocarcinoma spectrum. Sci Rep 2017;7(1):15790.
5. Hsu HT, Tang EK, Wu MT, et al. Modified Lung-RADS improves performance of screening LDCT in a population with high prevalence of non-smoking-related lung cancer. Acad Radiol 2018;25(10):1240–51.
6. Wu FZ, Huang YL, Wu CC, et al. Assessment of selection criteria for low-dose lung screening CT among Asian ethnic groups in Taiwan: from mass screening to specific risk-based screening for non-smoker lung cancer. Clin Lung Cancer 2016;17(5):e45–56.
7. Bhargavan M, Kaye AH, Forman HP, et al. Workload of radiologists in United States in 2006–2007 and trends since 1991–1992. Radiology 2009;252(2):458–67.
8. Levin DC, Rao VM, Parker L, et al. Analysis of radiologists' imaging workload trends by place of service. J Am Coll Radiol 2013;10(10):760–3.
9. Brady AP. Error and discrepancy in radiology: inevitable or avoidable? Insights Imaging 2017;8(1):171–82.
10. Forrest JV, Friedman PJ. Radiologic errors in patients with lung cancer. West J Med 1981;134(6):485–90.
11. Shin HC, Roth HR, Gao M, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 2016;35(5):1285–98.
12. Novikov AA, Lenis D, Major D, et al. Fully convolutional architectures for multiclass segmentation in chest radiographs. IEEE Trans Med Imaging 2018;37(8):1865–76.
13. Cicero M, Bilbily A, Colak E, et al. Training and validating a deep convolutional neural network for computer-aided detection and classification of abnormalities on frontal chest radiographs. Invest Radiol 2017;52(5):281–7.
14. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 2017;284(2):574–82.
15. Liu V, Clark MP, Mendoza M, et al. Automated identification of pneumonia in chest radiograph reports in critically ill patients. BMC Med Inform Decis Mak 2013;13(1):90.
16. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. OncoTargets Ther 2015;8:2015–22.
17. Zhang W, Li R, Deng H, et al. Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage 2015;108:214–24.
18. Cheng J-Z, Chen C-M, Shen D. Chapter 9: deep learning techniques on texture analysis of chest and breast images. In: Depeursinge A, Al-Kadi OS, Mitchell JR, editors. Biomedical texture analysis. London: Academic Press; 2017. p. 247–79.
19. Nam JG, Park S, Hwang EJ, et al. Development and validation of deep learning-based automatic detection algorithm for malignant pulmonary nodules on chest radiographs. Radiology 2019;290(1):218–28.
20. Baldwin DR, Callister MEJ. The British Thoracic Society guidelines on the investigation and management of pulmonary nodules. Thorax 2015;70(8):794.
21. Rajpurkar P, Irvin J, Ball RL, et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med 2018;15(11):e1002686.
22. Wang X, Peng Y, Lu L, et al. ChestX-Ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI; 2017. p. 3462–71. https://doi.org/10.1109/CVPR.2017.369.
23. Greiner M, Pfeiffer D, Smith RD. Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Prev Vet Med 2000;45(1–2):23–41.
24. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44(3):837–45.
25. Lee J-G, Jun S, Cho Y-W, et al. Deep learning in medical imaging: general overview. Korean J Radiol 2017;18(4):570–84.
26. Choy G, Khalilzadeh O, Michalski M, et al. Current applications and future impact of machine learning in radiology. Radiology 2018;288(2):318–28.
27. Ribli D, Horváth A, Unger Z, et al. Detecting and classifying lesions in mammograms with deep learning. Sci Rep 2018;8(1):4165.
28. Geijer H, Geijer M. Added value of double reading in diagnostic radiology, a systematic review. Insights Imaging 2018;9(3):287–301.
29. Ciatto S, Del Turco MR, Burke P, et al. Comparison of standard and double reading and computer-aided detection (CAD) of interval cancers at prior negative screening mammograms: blind review. Br J Cancer 2003;89(9):1645–9.
30. del Ciello A, Franchi P, Contegiacomo A, et al. Missed lung cancer: when, where, and why? Diagn Interv Radiol 2017;23(2):118–26.