Clinical Imaging
journal homepage: www.elsevier.com/locate/clinimag
ARTICLE INFO

Keywords: Overfitting; Artificial intelligence; Machine learning

ABSTRACT

Artificial intelligence (AI) is a broad umbrella term used to encompass a wide variety of subfields dedicated to creating algorithms to perform tasks that mimic human intelligence. As AI development grows closer to clinical integration, radiologists will need to become familiar with the principles of artificial intelligence to properly evaluate and use this powerful tool. This series aims to explain certain basic concepts of artificial intelligence, and their applications in medical imaging, starting with the concept of overfitting.
⁎ Corresponding author at: Columbia University Medical Center, New York Presbyterian Hospital, 622 West 168th Street, PB-1-301, New York, NY 10032, United States of America.
E-mail addresses: stm9116@nyp.org (S. Mutasa), shs2179@cumc.columbia.edu (S. Sun), rh2616@columbia.edu (R. Ha).
https://doi.org/10.1016/j.clinimag.2020.04.025
Received 9 March 2020; Received in revised form 10 April 2020; Accepted 17 April 2020
0899-7071/ © 2020 Elsevier Inc. All rights reserved.
S. Mutasa, et al. Clinical Imaging 65 (2020) 96–99
Fig. 1. A graphical representation of overfitting. (Left) The trend line has not learned enough patterns from the data and has failed to capture the dominant trend. (Middle) The trend line is a good fit for this set of data. (Right) The trend line has learned too many patterns and has lost the dominant trend. This algorithm would not be generalizable to new data.
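The three regimes in Fig. 1 can be reproduced numerically. The following sketch is purely illustrative (the noisy quadratic data, sample sizes, and polynomial degrees are arbitrary choices, not taken from this paper): fitting polynomials of increasing degree to the same training points drives training error down, while a held-out set stands in for "new data".

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples drawn around an underlying quadratic trend.
x = rng.uniform(-1, 1, 40)
y = x ** 2 + rng.normal(0, 0.1, 40)

# Points the fit never sees, standing in for new data.
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

for degree in (1, 2, 15):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Degree 1 underfits (left panel), degree 2 matches the generating trend (middle), and degree 15 memorizes the training noise (right): its training error is the lowest of the three, which is exactly why training performance alone cannot reveal overfitting.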
2.1. Digging deeper

A typical deep learning algorithm consists of a neural network architecture that begins with an input, connects to layers of nodes, and ultimately ends in an output. During the learning process, the parameters or "weights" within the nodes are continuously adjusted to minimize the difference between the output and the correct answer (ground truth). This extracts the features in the input that were most important to solving the question. However, if a model is run through the same training data too many times without enough regularization, the model will inevitably capture residual variation, or noise, as features and interpret these as parameters useful for prediction, thus decreasing the overall generalizability [18,19].

3. Overcoming overfitting

The most effective way to mitigate overfitting is to collect more training data. Ideally your training data would be truly representative of the overall population. In the case of distinguishing cats and dogs, examples of many breeds of dogs and cats would be necessary in the training set (Fig. 3). In the original ImageNet competition, where deep learning neural networks first publicly demonstrated their power, researchers had the luxury of having 1.4 million images to work with [21]. These datasets are orders of magnitude larger than the medical image datasets most AI radiology studies are using, and represent one major reason why current medical studies may be prone to overfitting [22,23]. Though there is no set amount of "sufficient" training data, as this number will vary depending on the study and the question asked, a general starting point of around 1000 images per class has a good chance of training a classifier accurately. This figure is based largely on the ImageNet database, which contained around 1000 images per class. Many factors can reduce this requirement, including the architecture type, how representative of the population your training data is, how distinguishable the different classes are, and the methods you employ for regularization.

Medical data has historically been difficult to amass due to concerns about patient confidentiality and the cost of obtaining high quality ground-truth annotated data [22,23]. However, the efforts of large data
3.1. Technical solutions

Researchers have also developed several creative workarounds to efficiently use limited training data and prevent overfitting. One strategy, data augmentation, can artificially increase the size of a training dataset by creating image variants from the original dataset. This can involve random rigid affine transformations, such as flipping, rotation, cropping, skewing, or even the introduction of artifacts to diversify the images without straying too far from the original label, as shown in Fig. 4 [22,26]. Kim et al., in their effort to detect fractures, used this technique to create 11,112 training images from an original dataset of only 1389 images [27]. Warped images, while providing variation, do not provide the same degree of data enrichment that additional, separate examples do.

In a study using CNN to predict breast cancer molecular subtype published in 2019 [15], Ha et al. used this augmentation technique to increase sample size (Fig. 5). This technique alters the mass slightly, utilizing a rigid transformation, effectively making additional unique samples of the

Dropout is based on the insight that an ensemble of neural networks, each trained on the same data but with slightly different considerations, generalizes better. Dropout effectively trains a large number of different network architectures by randomly removing nodes during training. The theory behind this is that the architecture of a neural network can compensate for individual nodal deficiencies. However, this compensation does not carry over to unseen data, and disrupting the architecture slightly by dropping random nodes removes this compensation effect [35].

Another technique for regularization is called L1/L2 regularization. As a neural network learns each feature, each feature is given a weight to determine how significant that feature is. We can limit the magnitude of our weights through L1/L2 regularization so that no single feature overwhelms others [36]. This has the effect of encouraging the agreement of many input features when coming to a certain conclusion, as opposed to preferring the input of a few overwhelming features.

Batch normalization (BN), initially developed to mitigate a game of "broken telephone" which can occur when layers in a neural network are unable to learn simultaneously, has a small regularizing effect. The theoretical motivation for batch normalization is outside the scope of this paper; however, it functions by forcing normalization of the activation maps to a learned mean and standard deviation. This has the effect of introducing noise to the input of each layer in the neural network, which introduces a regularization effect [37].
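The mechanics of these regularizers, together with the affine augmentation described in Section 3.1, can each be expressed in a few lines. The sketch below is an illustrative NumPy reimplementation written for this summary, not code from this paper or the studies it cites; all function names and constants are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Random rigid variants of one image: horizontal flip and 90-degree rotations."""
    if rng.random() < 0.5:
        image = np.fliplr(image)
    return np.rot90(image, k=rng.integers(0, 4))

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: randomly zero nodes, rescale so the expected activation is unchanged."""
    if not training:
        return activations  # at inference the full network is used
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

def l2_gradient(weights, grad, lam=1e-3):
    """L2 regularization adds lam * w to each weight's gradient, shrinking large weights."""
    return grad + lam * weights

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch, then scale/shift by gamma/beta
    (fixed here; learned during training in a real network)."""
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return gamma * (batch - mean) / np.sqrt(var + eps) + beta

# A toy activation batch: 4 examples, 3 features.
acts = rng.normal(size=(4, 3))
normed = batch_norm(acts)
print(normed.mean(axis=0))  # ~0 per feature after normalization
```

In practice these operations are provided by deep learning frameworks (dropout and batch normalization layers, and a weight-decay option on the optimizer) rather than written by hand; note that the L2 penalty appears in the gradient as a term proportional to the weight itself, which is why it is often called weight decay.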
Fig. 5. Images of a single input example of an enhancing mass with multiple random affine warps applied for data augmentation.

5. Conclusion

Overfitting is a common pitfall in which AI models capture noise or superficial information rather than truly distinguishing disease. Models that are overfitted will have high training performance but severely decreased accuracy upon encountering new data. This can be overcome by increasing the amount of training data, data augmentation, or several other techniques such as regularization and dropout. Before AI algorithms can be incorporated into clinical use, external validation will be necessary to ensure generalizability.

Declaration of competing interest

No disclosures. No conflict of interest.

References

[1] Langlotz CP, Allen B, Erickson BJ, et al. A roadmap for foundational research on artificial intelligence in medical imaging: from the 2018 NIH/RSNA/ACR/The Academy Workshop. Radiology 2019;291(3):781–91. https://doi.org/10.1148/radiol.2019190613.
[2] Thrall JH, Li X, Li Q, et al. Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. J Am Coll Radiol 2018;15(3 Pt B):504–8. https://doi.org/10.1016/j.jacr.2017.12.026.
[3] Hosny A, Parmar C, Quackenbush J, et al. Artificial intelligence in radiology. Nat Rev Cancer 2018;18(8):500–10.
[4] Huang X, Shan J, Vaidya V. Lung nodule detection in CT using 3D convolutional neural networks. In: IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017); 2017. p. 379–83.
[5] Tsehay YK, Lay NS, Roth HR, et al. Convolutional neural network based deep-learning architecture for prostate cancer detection on multiparametric magnetic resonance images. Proceedings of SPIE 2017. https://doi.org/10.1117/12.2254423.
[6] Kooi T, Litjens G, van Ginneken B, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal 2017;35:303–12.
[7] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA; 2015. p. 3431–40.
[8] Moeskops P, Wolterink JM, van der Velden BHM, et al. Deep learning for multi-task medical image segmentation in multiple modalities. In: Medical Image Computing and Computer-Assisted Intervention — MICCAI 2016. Athens, Greece; 2016. p. 478–86.
[9] Cheng JZ, Ni D, Chou YH, et al. Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci Rep 2016;6:24454. https://doi.org/10.1038/srep24454.
[10] Ding Y, Sohn JH, Kawczynski MG, et al. A deep learning model to predict a diagnosis of Alzheimer disease by using 18F-FDG PET of the brain. Radiology 2019;290(2):456–64. https://doi.org/10.1148/radiol.2018180958.
[11] Mazurowski MA, Buda M, Saha A, et al. Deep learning in radiology: an overview of the concepts and a survey of the state of the art with focus on MRI. J Magn Reson Imaging 2019;49:939–54. https://doi.org/10.1002/jmri.26534.
[12] Patriarche JW, Erickson BJ. Part 1. Automated change detection and characterization in serial MR studies of brain-tumor patients. J Digit Imaging 2007;20:203–22.
[13] Ha R, Chang P, Karcich J, et al. Axillary lymph node evaluation utilizing convolutional neural networks using MRI dataset. J Digit Imaging 2018;31(6):851–6. https://doi.org/10.1007/s10278-018-0086-7.
[14] Ha R, Mutasa S, Sant EP, et al. Accuracy of distinguishing atypical ductal hyperplasia from ductal carcinoma in situ with convolutional neural network–based machine learning approach using mammographic image data. Am J Roentgenol 2019;212(5):1166–71. https://doi.org/10.2214/AJR.18.20250.
[15] Ha R, Mutasa S, Karcich J, et al. Predicting breast cancer molecular subtype with MRI dataset utilizing convolutional neural network algorithm. J Digit Imaging 2019;32(2):276–82. https://doi.org/10.1007/s10278-019-00179-2.
[16] Ardila D, Kiraly AP, Bharadwaj S, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med 2019;25:954–61. https://doi.org/10.1038/s41591-019-0447-x.
[17] Langlotz C. RSNA annual meeting. November 27, 2017.
[18] Burnham KP, Anderson DR. Model selection and multimodel inference. 2nd ed. Springer-Verlag; 2002.
[19] England JR, Cheng PM. Artificial intelligence for medical image analysis: a guide for authors and reviewers. Am J Roentgenol 2018;212(3):513–9. https://doi.org/10.2214/AJR.18.20490.
[21] Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575v3 [cs.CV]; 2015.
[22] Chartrand G, Cheng PM, Vorontsov E, et al. Deep learning: a primer for radiologists. RadioGraphics 2017;37(7):2113–31. https://doi.org/10.1148/rg.2017170077.
[23] Parmar C, Barry JD, Hosny A, et al. Data analysis strategies in medical imaging. Clin Cancer Res 2018;24(15):3492–9. https://doi.org/10.1158/1078-0432.CCR-18-0385.
[24] Clark K, Vendt B, Smith K, et al. The cancer imaging archive (TCIA): maintaining and operating a public information repository. J Digit Imaging 2013;26(6):1045–57. https://doi.org/10.1007/s10278-013-9622-7.
[25] AI challenge. RSNA. https://www.rsna.org/en/education/ai-resources-and-training/ai-image-challenge. Accessed 16 January 2020.
[26] Yamashita R, Nishio M, Do RKG, et al. Convolutional neural networks: an overview and application in radiology. Insights Imaging 2018;9(4):611–29.
[27] Kim DH, Mackinnon T. Artificial intelligence in fracture detection: transfer learning from deep convolutional neural networks. Clin Radiol 2018;73:439–45.
[29] Paul R, Schabath M, Balagurunathan Y, et al. Explaining deep features using radiologist-defined semantic features and traditional quantitative features. Tomography 2019;5(1):192–200. https://doi.org/10.18383/j.tom.2018.00034.
[30] Nishio M, Sugiyama O, Yakami M, et al. Computer-aided diagnosis of lung nodule classification between benign nodule, primary lung cancer, and metastatic lung cancer at different image size using deep convolutional neural network with transfer learning. PLoS One 2018;13(7):e0200721. https://doi.org/10.1371/journal.pone.0200721.
[31] Samala RK, Chan HP, Hadjiiski LM, et al. Multi-task transfer learning deep convolutional neural network: application to computer-aided diagnosis of breast cancer on mammograms. Phys Med Biol 2017;62(23):8894–908. https://doi.org/10.1088/1361-6560/aa93d4.
[32] Maqsood M, Nazir F, Khan U, et al. Transfer learning assisted classification and detection of Alzheimer's disease stages using 3D MRI scans. Sensors (Basel) 2019;19(11):2645. https://doi.org/10.3390/s19112645.
[33] Sra S, Nowozin S, Wright SJ. Optimization for machine learning. MIT Press; 2012.
[34] Bell RM, Koren Y. Lessons from the Netflix prize challenge. SIGKDD Explor Newsl 2007;9:75–9.
[35] Srivastava N, Hinton GE, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15:1929–58.
[36] Nowlan SJ, Hinton GE. Simplifying neural networks by soft weight-sharing. Neural Comput 1992;4(4).
[37] Luo P, Wang X, Shao W, et al. Towards understanding regularization in batch normalization. In: 7th International Conference on Learning Representations (ICLR); 2019.
[38] Jha S, Topol EJ. Adapting to artificial intelligence: radiologists and pathologists as information specialists. JAMA 2016;316(22):2353–4. https://doi.org/10.1001/jama.2016.17438.
[39] Zech JR, Badgeley MA, Liu M, et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med 2018;15(11):e1002683. https://doi.org/10.1371/journal.pmed.1002683.
[40] Yasaka K, Abe O. Deep learning and artificial intelligence in radiology: current applications and future directions. PLoS Med 2018;15(11):e1002707. https://doi.org/10.1371/journal.pmed.1002707.
[41] Park SH, Do KH, Choi JI, et al. Principles for evaluating the clinical implementation of novel digital healthcare devices. J Korean Med Assoc 2018;61:765–75.
[42] Kim DW, Jang HY, Kim KW, et al. Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers. Korean J Radiol 2019;20(3):405–10. https://doi.org/10.3348/kjr.2019.0025.