
Unveiling the Precision of CheXNeXt Algorithm Against Radiologist Expertise in Chest Radiograph Pathology Detection

Selva Vignesh M1*

1 IIPC Coordinator, Dr.N.G.P Arts and Science College, Coimbatore, Tamilnadu, India

* Author to whom correspondence should be addressed. E-mail: selvavigneshmds@gmail.com

ABSTRACT

Chest radiograph interpretation plays a pivotal role in diagnosing thoracic diseases, yet the reliance on expert
radiologists poses challenges such as fatigue-based errors and limited diagnostic accessibility in some regions
of the world. Leveraging deep learning, we introduce CheXNeXt, an algorithm designed to detect 14
different pathologies in frontal-view chest radiographs. In a study comparing CheXNeXt with nine
radiologists (six board-certified radiologists and three senior residents), the algorithm matched or exceeded
radiologist performance on 11 pathologies and outperformed the radiologists in detecting atelectasis. The
radiologists performed better on cardiomegaly, emphysema, and hiatal hernia, whereas CheXNeXt completed
image interpretation substantially faster and with consistent behaviour across cases. The study highlights the
potential of deep learning algorithms such as CheXNeXt to augment diagnostic capabilities, addressing challenges
associated with radiologist shortages and enhancing patient access to chest radiograph diagnostics.

Key Points – Chest radiograph, CheXNeXt, Radiologists, Thoracic diseases, Digital Diagnostics

I. INTRODUCTION

Chest radiograph interpretation is a critical component of diagnosing thoracic diseases, providing
insights into conditions such as tuberculosis and lung cancer that affect millions worldwide annually.
The traditional reliance on expert radiologists for image analysis, however, presents formidable
challenges that extend beyond the nuances of medical diagnosis. This dependence introduces a
susceptibility to fatigue-based errors, where the demanding nature of the task can compromise the
accuracy and precision of diagnostic assessments. Moreover, the shortage of radiologists in specific
geographic regions exacerbates the limited diagnostic accessibility for individuals seeking timely
medical evaluations.

In recent years, the emergence of deep learning approaches has marked a transformative shift in the
landscape of medical image interpretation. Fueled by large-scale neural network architectures and
propelled by extensive labeled datasets, these algorithms have showcased expert-level performance in
various diagnostic tasks. Our study aims to contribute to this paradigm shift by introducing CheXNeXt,
a convolutional neural network meticulously designed to concurrently detect 14 distinct pathologies in
frontal-view chest radiographs.

The motivation behind the development of CheXNeXt lies in addressing the dual challenges of
diagnostic accuracy and accessibility. By harnessing the power of deep learning, we aspire to alleviate
the burden on expert radiologists, reduce fatigue-induced errors, and extend diagnostic expertise to
regions where the scarcity of radiologists limits accessibility. The comprehensive training and internal
validation of CheXNeXt on the ChestX-ray8 dataset, coupled with the reference standard provided by a
panel of three board-certified cardiothoracic specialist radiologists, form the foundation of our
investigation into the algorithm's discriminative performance.

Our comparative analysis involves not only algorithmic performance but also an assessment of
CheXNeXt against the proficiency of nine radiologists, comprising six board-certified radiologists and
three senior radiology residents. The 14 pathologies under consideration include pneumonia, pleural
effusion, pulmonary masses, nodules, and others of clinical significance. In this pursuit, we seek to
evaluate whether CheXNeXt achieves radiologist-level performance across these pathologies,
identifying areas where the algorithm excels and those where it may require refinement.

Furthermore, we delve into the efficiency aspect, measuring the time taken for image interpretation by
both CheXNeXt and the radiologists. The stark contrast in time efficiency underscores the potential of
deep learning algorithms to expedite diagnostic processes, addressing concerns related to prolonged
waiting times for medical evaluations. Through this research, we envision a future where advanced
deep learning algorithms, exemplified by CheXNeXt, play a pivotal role in complementing radiological
expertise, reducing diagnostic errors, and expanding patient access to chest radiograph diagnostics on a
global scale.

II. SYSTEM DESIGN

2.1 DATASET

The dataset used in this study is ChestX-ray14, one of the largest public repositories of chest
radiographs, containing 112,120 frontal-view chest radiographs of 30,805 unique patients. Each image
in the dataset is annotated with up to 14 different thoracic pathology labels. The labels were chosen
based on the frequency of observation and diagnosis in clinical practice. The dataset was partitioned
into training, tuning, and validation sets for the purpose of the study. The evaluation of CheXNeXt's
performance relies on the reference standard provided by a panel of three board-certified
cardiothoracic specialist radiologists. These radiologists, as part of the validation process, likely
contributed to the annotation and labeling of the dataset, ensuring a robust benchmark for the
algorithm's discriminative capabilities. The training set was used to optimize network parameters, the
tuning set was used to compare and choose networks, and the validation set was used to evaluate the
performance of the algorithm and radiologists. The dataset is publicly hosted by the National Institutes
of Health Clinical Centre. The test set annotations are not made publicly available to preserve the
integrity of the test results.

Figure 1: Original Slices of X-rays from ChestX-ray14

The chest radiograph images utilized in this study were sourced from the College of Chest X-rays and
were available in JPEG format as shown in Figure 1. The use of this standardized image format allowed
seamless integration into the study's workflow, ensuring accessibility and ease of manipulation for
subsequent stages, such as training and validation of the deep learning algorithm, CheXNeXt.

2.2 EXPLORATORY DATA ANALYSIS AND PREPROCESSING

The dataset employed in this research, derived from the publicly available ChestX-ray8 dataset, serves
as a fundamental resource for the diagnosis of thoracic diseases through chest radiographs. This
section delineates the critical steps involved in the Exploratory Data Analysis (EDA) and
pre-processing procedures, vital for enhancing both interpretability and the subsequent efficacy of the
machine learning models.

To obviate potential data leakage, patient-level splitting is adopted, so that radiographic images of
the same patient never appear in both the training and test datasets. This partitioning strategy
ensures the integrity of the model evaluation process.

The dataset encompasses 14 pathology classes: Atelectasis, Cardiomegaly, Consolidation, Edema,
Effusion, Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, and
Pneumothorax. Each pathology class exhibits varying prevalence within the dataset, necessitating
nuanced considerations during both EDA and subsequent model training.

The subsequent detailed EDA encompasses the scrutiny of individual X-ray images. Pixel value
distributions are analysed to gain insights into the intensity variations across images, as shown in
Figure 2. Additionally, statistical characteristics, including dimensions, pixel intensity, and overall
image quality, are examined to inform subsequent pre-processing steps.

Figure 2: Pixel Value Distribution

The images have dimensions of 1024 pixels width and 1024 pixels height, with a single colour channel.
The maximum pixel value is 0.9804, the minimum is 0.0000, the mean value of the pixels is 0.4796, and
the standard deviation is 0.2757 as shown in Figure 2.

Figure 3: Single Image Investigation

The pre-processing phase is paramount for effective model training. Standardization, facilitated
through the Keras ImageDataGenerator, ensures a mean pixel value of zero and a standard deviation of
one. This critical step enhances model convergence during training. Addressing class imbalance, a
ubiquitous challenge in medical datasets, involves the introduction of weighted loss functions. These
functions assign distinct weights to positive and negative cases based on their frequency, mitigating
the impact of class imbalances and fostering equal contribution from each pathology during model
training.

Figure 4: Frequency of Each Class

Visualizing the frequency distribution of each class underscores the significant imbalances inherent in
certain pathologies. Computation of class-wise positive and negative frequencies, coupled with the
determination of class-specific
weights, contributes to the establishment of a balanced and nuanced model training paradigm. The
integration of a weighted loss function into the model training pipeline augments model performance,
ensuring equitable consideration of each pathology.

In conclusion, the rigorous EDA and pre-processing methodologies outlined herein form the bedrock for
the subsequent development of a robust and effective diagnostic model for thoracic diseases. The
nuanced understanding of dataset intricacies, coupled with the mitigation of class imbalances and
standardization of image data, collectively contribute to the creation of a reliable and accurate
diagnostic framework.

2.3 NORMALIZATION

Normalization of input data stands as a pivotal step in optimizing the performance of the DenseNet-121
model in our research endeavour. The adoption of standardization, where pixel values undergo
transformation to attain a mean of zero and a standard deviation of one, is instrumental in ensuring
uniformity across input features. The image is resized to a standardized 512x512 pixel dimension, and
pixel values are adjusted based on the mean and standard deviation derived from the ImageNet training
set. This is particularly crucial for DenseNet-121, characterized by densely connected blocks. The
process of normalization serves to stabilize the training procedure, mitigating the impact of varied
scales in images and preventing the dominance of individual features. Aligned with the architecture of
DenseNet-121, this normalization strategy enhances convergence, regulates activation function
behaviour, and contributes to the overall efficiency of the model. The meticulous normalization process
facilitates the successful fine-tuning of DenseNet-121, thereby augmenting its predictive accuracy and
robustness.

2.4 CHEXNEXT

DenseNet-121, a robust convolutional neural network architecture, is at the core of our chest X-ray
image analysis. DenseNet-121 is chosen for its unique dense connectivity pattern, promoting feature
reuse and enhancing model capacity. This architecture is particularly beneficial in medical image
analysis, where the extraction of intricate features is crucial for accurate diagnosis. We leverage the
pre-trained DenseNet-121 on a vast dataset, such as ImageNet, to harness its learned features from a
diverse range of visual patterns.

The fine-tuning process is a pivotal step in adapting DenseNet-121 to the nuances of chest X-ray
images. Fine-tuning involves adjusting model parameters to specialize in discerning disease-related
features. In our case, the fine-tuning occurs on a comprehensive chest X-ray dataset that includes
various pathologies. This ensures that the model becomes adept at capturing the subtle visual cues
indicative of thoracic diseases. The augmentation of the dataset with more data is a strategic move,
enhancing the model's ability to generalize across a broader spectrum of cases, including rare
pathologies and diverse patient populations.

The hierarchical feature extraction capabilities of DenseNet-121 are integral to our model's success
because of its dense networks as shown in Figure 5. Its densely connected blocks enable efficient
information flow, capturing intricate details within chest X-ray images. We exploit these features to
empower CheXNeXt in accurately identifying and classifying a diverse array of thoracic pathologies.
This versatility is particularly valuable in the medical domain, where conditions may manifest in
various ways.

Figure 5: DenseNet121 Architecture

In terms of data, our chest X-ray dataset, enriched with a plethora of annotated images, contributes to
the model's robustness. The fine-tuning process involves training the model on this expansive dataset,
ensuring that it adapts to the intricacies of medical imaging. The augmentation of data involves
feeding the model with additional instances, exposing it to a more extensive variety of cases. This not
only improves the model's ability to
generalize but also enhances its sensitivity to rare conditions and subtle abnormalities.

CheXNeXt, therefore, stands as a testament to the synergy between DenseNet-121 and an enriched
dataset. Through meticulous fine-tuning and data augmentation, we optimize the model's performance,
aiming for heightened accuracy in chest X-ray pathology detection. This approach represents a
significant stride in leveraging deep learning for medical diagnostics, showcasing the potential of
DenseNet-121 in enhancing the capabilities of chest X-ray analysis systems.

2.5 GRADCAM

GradCAM, or Gradient-weighted Class Activation Mapping, constitutes a pivotal interpretative tool in
our research, particularly when applied to the predictions generated by the CheXNeXt algorithm. It
functions by generating heat maps that spotlight regions within chest radiographs that significantly
influence the algorithm's classification of specific pathologies. Through these heat maps, the
algorithm's decision-making process becomes transparent, elucidating the critical features in the image
that contribute to its predictions.

Within our research framework, the integration of GradCAM is seamlessly realized alongside the
DenseNet121 architecture, which serves as the neural network model for the CheXNeXt algorithm.
DenseNet121, a convolutional neural network tailored for image data, exhibits a distinctive
characteristic of direct connections between every layer within a block. This architectural choice
facilitates an efficient flow of information, enabling the model to glean insights from various layers
simultaneously. Trained on an extensive dataset, the DenseNet121 model autonomously learns parameters,
empowering the CheXNeXt algorithm to make accurate predictions on previously unseen chest
radiographs. This integration exemplifies a robust synergy between GradCAM's interpretative
capabilities and DenseNet121's proficiency in handling intricate image-based diagnostic tasks as shown
in Figure 6.

Figure 6: GradCAM on Single Label

Visualizing multiple labels using GradCAM offers a comprehensive understanding of how the CheXNeXt
algorithm processes and interprets chest radiographs for various pathologies. The GradCAM technique,
which highlights regions contributing to the model's predictions, becomes particularly insightful when
applied to multiple labels simultaneously as shown in Figure 7.

Figure 7: GradCAM on Multiple Labels
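
The heat-map computation described in Section 2.5 can be sketched in a few lines. This is an
illustration of the standard Grad-CAM arithmetic, not the code used in the study. It assumes that the
final convolutional feature maps and the gradients of one class score with respect to them have already
been extracted from the network, and the function name `grad_cam` is hypothetical.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heat map for one class.

    activations, gradients: nested lists shaped [K][H][W] holding the final
    convolutional feature maps and the gradient of the class score with
    respect to each of them (assumed to be extracted beforehand).
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = global average of that channel's gradient.
    alphas = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]

    # Weighted sum of the feature maps.
    heat = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                heat[i][j] += alpha * fmap[i][j]

    # ReLU keeps only the regions that increase the class score.
    heat = [[max(value, 0.0) for value in row] for row in heat]

    # Scale to [0, 1] so the map can be rendered as a colour overlay.
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[value / peak for value in row] for row in heat]
    return heat


# Toy example: two 2x2 feature maps, one supporting and one opposing the class.
activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heat_map = grad_cam(activations, gradients)
```

In practice the low-resolution map would be upsampled to the radiograph size and overlaid on the image.
For multiple labels, the same computation is repeated with the gradients of each class score.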

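The class weighting described in Section 2.2 can be sketched as follows. This is a minimal
illustration rather than the training code used in the study. It assumes one common weighting scheme,
in which positive cases are weighted by the negative frequency and negative cases by the positive
frequency, and the function names `class_weights` and `weighted_bce` are hypothetical.

```python
import math


def class_weights(labels):
    """Return one (w_pos, w_neg) pair per class from multi-hot label rows."""
    n_images = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        freq_pos = sum(row[c] for row in labels) / n_images
        freq_neg = 1.0 - freq_pos
        # Assumed scheme: swap the frequencies so the rarer outcome weighs more.
        weights.append((freq_neg, freq_pos))
    return weights


def weighted_bce(y_true, y_prob, weights, eps=1e-7):
    """Mean weighted binary cross-entropy over all images and classes."""
    total, count = 0.0, 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p, (w_pos, w_neg) in zip(true_row, prob_row, weights):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= w_pos * t * math.log(p) + w_neg * (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


# Toy labels: 4 images, 2 classes, each class positive in 1 of 4 images.
labels = [[1, 0], [0, 0], [0, 1], [0, 0]]
weights = class_weights(labels)
```

With this scheme the positive and negative cases of each class contribute equally to the loss in
aggregate, because freq_pos * w_pos equals freq_neg * w_neg.
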
III. RESULTS AND DISCUSSION

The study employed the CheXNeXt algorithm to assess the effectiveness of deep learning in chest
radiograph diagnosis, revealing promising results. The algorithm exhibited a specificity of 0.927 and a
sensitivity of 0.594 across 14 pathologies, surpassing radiologists in certain areas like effusion but
lagging behind in others, including cardiomegaly, emphysema, and hernia. The mean proportion correct
for all pathologies stood at 0.828, showcasing the algorithm's overall competence. However, notable
variations in performance were observed, particularly for pathologies with low prevalence during
training. The relabelling process, while improving performance in some instances, posed challenges in
others, highlighting the need for ongoing research and refinement to enhance the accuracy and
reliability of deep learning algorithms in clinical practice. Figure 8 depicts how the model has
performed well in detecting multiple pathologies.

Figure 8: Multi-Pathological Detection by CheXNeXt

In evaluating accuracy through the area under the receiver operating characteristic curve (AUC-ROC),
CheXNeXt demonstrated significant prowess. Specifically, it achieved a statistically significantly
higher AUC of 0.862 (95% CI 0.825–0.895) for atelectasis compared to radiologists, whose AUC was 0.808
(95%) as shown in Figure 9.

Figure 9: AUC Curve

Across various pathologies, the algorithm matched or outperformed radiologists in 11 instances, but the
radiologists excelled in AUC performance for cardiomegaly, emphysema, and hiatal hernia. These findings
underscore the potential of deep learning algorithms like CheXNeXt in enhancing diagnostic
capabilities, while also emphasizing the nuanced challenges and areas for improvement, particularly in
handling pathologies with lower prevalence and refining algorithmic performance across diverse
conditions.

Pathology            Radiologists   Algorithm   Difference (Algorithm - Radiologists)   Advantage
Atelectasis          0.808          0.862        0.053                                  Algorithm
Cardiomegaly         0.888          0.831       -0.057                                  Radiologists
Consolidation        0.841          0.893        0.052                                  No Difference
Edema                0.910          0.924        0.015                                  No Difference
Effusion             0.900          0.901        0.000                                  No Difference
Emphysema            0.911          0.704       -0.208                                  Radiologists
Fibrosis             0.897          0.806       -0.091                                  No Difference
Hernia               0.985          0.851       -0.133                                  Radiologists
Infiltration         0.734          0.721       -0.013                                  No Difference
Mass                 0.886          0.909        0.024                                  No Difference
Nodule               0.899          0.894       -0.005                                  No Difference
Pleural thickening   0.779          0.798        0.019                                  No Difference
Pneumonia            0.823          0.851        0.028                                  No Difference
Pneumothorax         0.940          0.944        0.004                                  No Difference

Table 1: Evaluation of Model and Radiologists
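
AUC values such as those reported above, together with a confidence interval, can be computed along the
following lines. This is a hedged sketch, not the evaluation code of the study: the rank-based AUC is
the standard definition, but the percentile bootstrap for the interval is an assumption (the text does
not state how the 95% CI was obtained), and the names `roc_auc` and `bootstrap_ci` are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        if 0 < sum(resampled) < n:  # keep resamples containing both classes
            stats.append(roc_auc(resampled, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Toy example: binary labels for one pathology and the predicted scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 0.75
low, high = bootstrap_ci(labels, scores, n_boot=200)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a validation set of a
few hundred images; a sort-based implementation would be preferable for larger sets.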

The evaluation of the CheXNeXt algorithm for chest radiograph diagnosis demonstrated its considerable
potential in augmenting diagnostic capabilities compared to expert radiologists. With a specificity of
0.927 and a sensitivity of 0.594 across 14 pathologies, CheXNeXt outperformed radiologists in specific
instances, notably excelling in the detection of atelectasis as shown in Table 1. However, variations in
performance were observed, and radiologists demonstrated superiority in pathologies like cardiomegaly,
emphysema, and hernia. The algorithm showcased overall competence, with a mean proportion correct of
0.828. In AUC-ROC analysis, CheXNeXt exhibited significant prowess, achieving a higher AUC of 0.862
for atelectasis compared to radiologists (AUC 0.808). While the algorithm matched or outperformed
radiologists in 11 instances, challenges associated with pathologies of low prevalence and the impact
of the relabelling process were acknowledged. The study emphasizes the efficiency of CheXNeXt,
completing image interpretation substantially faster than radiologists, addressing concerns related to
prolonged waiting times for medical evaluations. Despite challenges, the research underscores the
potential of deep learning algorithms, such as CheXNeXt, to revolutionize chest radiograph diagnostics,
complement radiological expertise, and enhance patient access to timely and accurate interpretations
globally.
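
The sensitivity, specificity, and proportion correct quoted in this section follow directly from the
confusion counts of binary decisions. The sketch below illustrates the definitions for a single
pathology. The 0.5 threshold and the names `binarize` and `binary_metrics` are assumptions made for
illustration, and the toy data are not results from the study.

```python
def binarize(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 decisions (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and proportion correct for one pathology.

    Assumes y_true contains at least one positive and one negative case.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),        # share of diseased cases detected
        "specificity": tn / (tn + fp),        # share of healthy cases cleared
        "proportion_correct": (tp + tn) / len(pairs),
    }


# Toy example: 5 radiographs, 2 with the pathology.
y_true = [1, 1, 0, 0, 0]
y_pred = binarize([0.9, 0.2, 0.1, 0.4, 0.7])
metrics = binary_metrics(y_true, y_pred)
```

Averaging each metric over the 14 pathologies would give summary figures of the kind reported above.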

IV. CONCLUSIONS

In conclusion, this project underscores the transformative potential of the CheXNeXt algorithm in
reshaping chest radiograph diagnostics. The algorithm, designed to detect 14 different pathologies,
demonstrated expert-level performance on numerous fronts, surpassing radiologists in specific
pathologies and showcasing efficiency in image interpretation. The study illuminates the profound
impact of deep learning algorithms, like CheXNeXt, in mitigating challenges associated with radiologist
shortages, reducing diagnostic errors, and expediting patient access to timely medical evaluations.
Notably, the integration of GradCAM further elucidates the interpretative capabilities of CheXNeXt,
providing transparency into the algorithm's decision-making process. While challenges exist,
particularly in handling pathologies of low prevalence, the research emphasizes the need for ongoing
refinement and research to enhance the accuracy and reliability of deep learning algorithms in clinical
practice. Ultimately, the findings position CheXNeXt as a promising advancement in the realm of chest
radiograph interpretation, paving the way for a future where advanced algorithms contribute
significantly to global healthcare accessibility and diagnostic precision.

