2168-2194 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY JALANDAR. Downloaded on February 23,2024 at 13:16:49 UTC from IEEE Xplore. Restrictions apply.
4410 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 9, SEPTEMBER 2023
LIU et al.: MULTI-LABEL LOCAL TO GLOBAL LEARNING 4411
(PL) and brain-like computing. We closely review them in the medical diagnosis area and explain how they differ from ours.

1) CL: Using a curriculum in the context of machine learning was first proposed by [25], before [26] coined the term curriculum learning and trained deep neural network models gradually from easier to harder data within a dataset. CL has been extensively applied across various domains, demonstrating improved performance [27]. In medical image diagnosis, CL methods fall into two categories based on how the curriculum is generated. 1) Predefined curriculum-based methods manually design the curriculum using human prior knowledge. For example, in histopathology image classification, [28] and [29] leverage annotator agreement and disease severity, respectively, as proxies for image difficulty. 2) By contrast, transfer teacher-based methods introduce a teacher model and measure training samples' difficulty based on its output. For example, [30] develop a teacher model with an evidence identification algorithm to explore prior knowledge of training bias about diagnosis difficulty and local features for curriculum generation.

Our ML-LGL conducts training by progressive aggregation of disease categories within the entire dataset, thereby classifying it as a form of CL. A key difference between our approach and other CL methods is the curriculum's implementation space: while others derive the sample's curriculum in sample space, our method adopts a distinct type of curriculum, specified by establishing the disease's curriculum in category space.

2) SPL: SPL is an extension of CL that automatically measures the difficulty of training examples according to their losses and embeds the curriculum design into the loss function. For example, [31] exploit SPL to handle class imbalance in skin disease by combining the sample number of each class and the difficulty of the sample. In multi-modal Alzheimer's disease classification [32], SPL is applied to dynamically estimate the contribution of each sample to the fusion model so as to avoid the influence of noisy samples and outliers; the sample significance also helps capture the relevance across different modalities in the multi-modal fusion process. Clearly, our proposed ML-LGL is out of the scope of SPL, and, more importantly, the curriculum of these methods also takes place in the sample space.

3) Progressive Learning: "Progressive Learning" is an overloaded term used with different meanings. In some literature, it denotes a kind of CL method where the curriculum is not related to the difficulty of each single sample but is configured instead as a progressive mutation of model capacity or architecture. For example, [33] progressively decreases the dropout probability during training, and [34] progressively grows the capacity of GANs to obtain high-quality results. Another meaning of PL is related to continual or lifelong learning [35], which learns on increasing tasks using an infinite stream of data.

As a novel kind of CL method, ML-LGL does not belong to the scope of PL and differs from the aforementioned PL in two aspects: 1) ML-LGL performs the curriculum in category space and the model remains unchanged, whereas PL uses a changeable architecture during the learning process. 2) ML-LGL uses a fixed dataset, while PL uses an increasing, large-scale data stream.

4) Brain-Like Computing: Brain-like computing mimics the information processing mode and structure of the biological nervous system, proposing new computing theories, computer architectures and learning algorithms [36]. In medical diagnosis, the methods can be roughly divided into two families. 1) The network architecture-based methods primarily use bio-inspired spiking neural networks (SNNs) [37] in various tasks. For example, [38] proposes an SNN-based framework to mitigate class imbalance in medical image classification, while [39] uses SNNs for COVID-19 detection. 2) The learning paradigm-based methods aim to mimic how humans learn in the real world. Representative works include transfer learning, few-shot learning [40] and the above-mentioned CL. Our proposed ML-LGL method is also inspired by how people learn, but it specifically focuses on the mechanism by which individuals learn from a few categories to more categories.

III. PRELIMINARIES: LGL

Consider the problem of multi-class classification on categories Y using a DNN model, where w denotes the model weights and L denotes the loss function. The definition of LGL introduced in [5] can be given as follows:

Definition 1: The LGL methodology is to iteratively train the DNN model by adding a new category's samples to the training set at each time.

LGL is implemented by iteratively minimizing the loss function on gradually increasing training sets. At the k-th iteration, let Y_l^k denote the local category set, whose categories have already been trained, and w_k^* denote the convergent weights. At each iteration, the implementation consists of three steps:

1) Update the local category: Use the category selection function f to select a new category y_k^s from the remaining untrained categories, and push it into the local category to form a new set:

    y_k^s = f(Y \ Y_l^{k-1}),    Y_l^k = Y_l^{k-1} ∪ {y_k^s}    (1)

2) Build a new training set: Select samples whose labels are in the new local category to build the new training set t_k:

    t_k = {(x_n, y_n) : y_n ∈ Y_l^k}    (2)

3) Training: Train the model on the new training set t_k using the loss function L until convergence:

    w_k^* = arg min_w L(w; t_k, w_{k-1}^*)    (3)

IV. METHOD

Building on LGL, we propose ML-LGL for multi-label CXR classification. Given the dataset D_cxr = {(x_n, Y_n)}_{n=1}^N with K abnormalities Y = {y_1, y_2, ..., y_K}, x_n is an input image and Y_n is its corresponding label, which is a subset of Y. The flowchart of ML-LGL is shown in Fig. 2, and Algorithm 1 outlines the process. ML-LGL follows the general steps of LGL: selecting new disease patterns to update the local category, building a new training set, and training on the new set using a dynamic loss. We will now delve into the details.

A. Selection Function for Local Category Updating

The core of updating the local category at each iteration is to use the selection function f to choose diseases of high training priority. To enable the model to learn like a radiologist, f must
Fig. 2. ML-LGL for multi-label CXR classification, which iteratively trains the DNN model on gradually increasing abnormalities. It shows the three steps involved in each iteration: select a new disease to update the local category, build a new training set, and train on the set using the dynamic loss.

Fig. 3. Illustration of the three clinical knowledge-leveraged selection functions: (a) co-occurrence frequency matrix of 12 disease patterns from the PLCO dataset; (b) multi-label conditional entropy computation; (c) frequency of disease patterns from PLCO. The 12 diseases are nodule, mass, pleural-based mass, granuloma, fluid in pleural space, hilar, infiltration, scarring, pleural fibrosis, bone lesion, cardiac abnormality, and COPD.
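As a concrete illustration, the three-step iteration defined in Section III (Eqs. (1)-(3)) can be sketched as a plain training loop. This is a hedged sketch, not the authors' implementation: `select_category` and `train_to_convergence` are hypothetical placeholders for the selection function f and the convergence training of Eq. (3), and the single-label sample format is assumed.

```python
# Sketch of the three LGL steps (Eqs. (1)-(3)); callables are placeholders.

def lgl_train(dataset, categories, select_category, train_to_convergence,
              init_weights):
    """dataset: list of (x, y) pairs; categories: the full label set Y."""
    local = set()              # Y_l^0: no category trained yet
    weights = init_weights     # e.g. ImageNet-pretrained weights
    remaining = set(categories)
    while remaining:
        # Step 1 (Eq. (1)): pick a new category and grow the local set.
        new_cat = select_category(remaining, local, weights)
        local.add(new_cat)
        remaining.remove(new_cat)
        # Step 2 (Eq. (2)): keep samples whose label is now local.
        t_k = [(x, y) for (x, y) in dataset if y in local]
        # Step 3 (Eq. (3)): train on t_k until convergence, warm-started
        # from the previous convergent weights.
        weights = train_to_convergence(weights, t_k)
    return weights
```

Because the model state is abstracted as opaque `weights`, any framework-specific training step can be plugged in without changing the loop.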
characterize radiologists' learning patterns. Accordingly, we propose three selection functions that leverage clinical knowledge to establish a radiologist-like learning order.

1) Correlation Function: Radiologists typically determine the learning order based on prior knowledge of comorbidity (Fig. 3(a)). They tend to prioritize learning diseases that are strongly related to other ones, as this enables them to leverage as much relevant experience gained from previously studied diseases as possible when diagnosing subsequent ailments. Therefore, we aim to assign higher training priority to strongly correlated diseases.

To achieve this, we use the co-occurrence frequency of two diseases to measure their correlation and use it to compute the total correlation for each disease. The disease with the higher total correlation is selected first to be pushed into the local category for training. The correlation function is given by:

    f_cor = arg max_{y_i ∈ Y \ Y_l^{k-1}} Σ_{j=1, j≠i}^{K} [2 × #(y_i, y_j)] / [#(y_i) + #(y_j)]    (4)

where #(y_i) represents the number of samples in which disease y_i exists, and #(y_i, y_j) represents the number of samples in which both diseases y_i and y_j co-exist.

2) Similarity Function: Radiologists tend to prioritize learning about similar diseases based on their historical learning experiences, which motivates us to assign a higher training priority to diseases more similar to those already learned. To achieve this, we propose multi-label conditional entropy (MLCE) as a similarity metric. As shown in Fig. 3(b), given
    H_w(O | X) = − (1/N) Σ_{n=1}^{N} Σ_{k=1}^{K} [o_n^k log₂ o_n^k + (1 − o_n^k) log₂(1 − o_n^k)]    (5)

where O is the output of the classification layer and o_n^k is the sigmoid output of the k-th node for image x_n.

We quantify the similarity between each untrained disease and the local category; the one with the highest similarity value is selected first. Let X_{y_i} represent the images labeled solely with y_i; the similarity function is given as:

    f_sim = arg max_{y_i ∈ Y \ Y_l^{k-1}} H_{w_{k-1}^*}(O | X_{y_i})    (6)

Note that the initial disease is chosen based on weights transferred from ImageNet.

3) Frequency Function: As mentioned in the introduction, radiologists typically start by learning commonly occurring disease patterns and gradually move on to rare ones over time. To emulate this approach, we introduce the frequency function:

    f_fre = arg max_{y_i ∈ Y \ Y_l^{k-1}} #(y_i)    (7)

This function prioritizes disease patterns with high frequency.

B. Build New Training Set

number of buckets is denoted as n_b, and the number of disease patterns added at the k-th iteration can be calculated using:

    add(k) = { K/n_b,                        k < n_b
             { K − (n_b − 1) × K/n_b,        k = n_b    (10)

This is more efficient than adding disease patterns one by one, which is time-consuming due to the linear relationship between the training time and the number of disease patterns (K), as proven in [5], as well as the time required for abnormality selection, especially for the dynamic similarity function, which involves feed-forward evaluation of images.
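The selection functions and the bucket schedule above can be sketched as follows. This is an illustrative reading, not the authors' code: the per-sample label sets, the helper names (`count`, `add_k`), and the interpretation of K/n_b in Eq. (10) as floor division are our assumptions; Eq. (6) is shown over precomputed sigmoid outputs since the full similarity function requires a model feed-forward pass.

```python
import math

def count(label_sets, *diseases):
    """#(y_i) or #(y_i, y_j): number of samples containing all given diseases."""
    return sum(1 for labels in label_sets if all(d in labels for d in diseases))

def f_cor(label_sets, diseases, local):
    """Eq. (4): untrained disease with the highest total co-occurrence score."""
    def total(yi):
        s = 0.0
        for yj in diseases:
            if yj == yi:
                continue
            denom = count(label_sets, yi) + count(label_sets, yj)
            if denom:  # skip diseases absent from the data
                s += 2 * count(label_sets, yi, yj) / denom
        return s
    return max((d for d in diseases if d not in local), key=total)

def mlce(outputs):
    """Eq. (5): multi-label conditional entropy over sigmoid outputs in (0, 1)."""
    n = len(outputs)
    return -sum(o * math.log2(o) + (1 - o) * math.log2(1 - o)
                for row in outputs for o in row) / n

def f_sim(outputs_by_disease, local):
    """Eq. (6): untrained disease whose single-label images yield the highest
    MLCE under the previous convergent weights (outputs precomputed here)."""
    return max((d for d in outputs_by_disease if d not in local),
               key=lambda d: mlce(outputs_by_disease[d]))

def f_fre(label_sets, diseases, local):
    """Eq. (7): most frequent untrained disease."""
    return max((d for d in diseases if d not in local),
               key=lambda d: count(label_sets, d))

def add_k(K, nb, k):
    """Eq. (10): number of disease patterns added at the k-th bucket (1-based);
    the last bucket absorbs the remainder so that all K patterns are covered."""
    per = K // nb
    return per if k < nb else K - (nb - 1) * per
```

Note how the last bucket in `add_k` guarantees the counts sum exactly to K, which is the point of the piecewise definition in Eq. (10).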
TABLE I
OVERVIEW OF THE THREE DATASETS
Fig. 4. (a) Initial stability across varying disease numbers; (b) AUC
performance at the convergent state for different disease numbers.
Overall, our ML-LGL approach is a novel and effective solution for multi-label CXR abnormality classification, with unique features that distinguish it from LGL.

V. EXPLANATION FROM THE MODEL STABILITY PERSPECTIVE

To understand why ML-LGL works well, we give an explanation from the perspective of model stability during training. First, we define model stability as follows:

Definition 2: Given the image dataset X and model weights w, we use the multi-label conditional entropy (MLCE) defined in (5) to quantify the model stability. A lower MLCE signifies a more stable model.

We then devise a series of comparative experiments using the PLCO dataset. A non-ML-LGL model is established, which shares the same DenseNet121 backbone as the ML-LGL model but employs random initialization for training. The ML-LGL model is trained using n_b = 12 and the correlation function, while the non-ML-LGL model is trained on the series of updated datasets established by the ML-LGL model (see (8)). We report the stability at the initial moment and the AUC performance at the convergent moment for each training session of the two models. Both metrics are evaluated on the corresponding subset of the validation set, i.e., the image-label pairs whose labels intersect with the currently trained diseases. Consequently, we obtain the metric values for various disease numbers.

As shown in Fig. 4, the stability of the ML-LGL model tends to be unchanged once the disease number exceeds 8, whereas the non-ML-LGL model demonstrates sustained growth with a slope of 1. More importantly, the stability gap and the AUC gap between the two models expand synchronously as the disease number increases, indicating a positive correlation between the reduced initial stability achieved by ML-LGL and the AUC performance. Therefore, we infer that ML-LGL benefits the training process by lowering the initial stability, ultimately attaining improved performance. Note that our explanation is provided from the perspective of initial stability, grounded in experimental validation. More theoretical explanations could be derived from the continuation method [41] or the data distribution [42].

VI. EXPERIMENT AND RESULTS

A. Dataset

We conduct experiments on three publicly available CXR datasets: PLCO [43], ChestX-ray14 [3] and CheXpert [44]. Table I gives an overview.

1) PLCO: PLCO is the CXR arm of the National Cancer Institute's screening trial for prostate, lung, colorectal, and ovarian cancer [43]. There are 198,000 images in total, but only 84,182 images from 25,000 participants are publicly distributed, with 13 disease patterns labeled by radiologists' visual observation. We exclude major atelectasis due to its low prevalence and use the remaining 12 for our experiments. Since no official split is provided, we divide the data into three patient-level subgroups: 70% for training, 10% for validation, and 20% for testing, maintaining the disease pattern prevalence in each subgroup.

2) ChestX-Ray14: ChestX-ray14 (CXR14), published by the US National Institutes of Health, is the most widely used benchmark in the field of CXR analysis. The 14 labels are mined from radiological reports using natural language processing (NLP). To establish a consistent benchmark, we follow the official patient-wise split [3]. During training, we use 10% of the images from the training set for validation.

3) CheXpert: CheXpert is a large-scale dataset released by Stanford Hospital; here we use the version 1.0 dataset with downsampled resolution. The labels of the training set are automatically extracted from radiology reports using a rule-based labeler, while the validation and test sets are manually annotated by the consensus of board-certified radiologists. 12 common disease patterns are labeled as positive, negative or uncertain. The test set is not publicly available, so we evaluate our method on the validation set. We adopt two commonly utilized uncertainty policies [44] to handle the uncertain labels: 1) U-Zeros, which replaces all uncertain labels with "zero", and 2) U-Ones, which replaces all uncertain labels with "one".

4) Remarks on Label Noise: The CXR14 and CheXpert datasets use NLP to extract labels from reports, which can result in label noise for two main reasons. First, NLP has high levels of error and uncertainty [44]. Second, text reports cannot replace visual examination of the image due to insufficient or overly descriptive information (e.g., lab tests or prior radiological studies). Label noise is severe in CXR14, with positive predictions mostly 10% to 30% lower than the original values [45]. As a result, PLCO is the only large-scale CXR dataset with labels produced by radiologists' visual observation of CXR images.

B. Experimental Setup

1) Evaluation: The per-class area under the ROC curve (AUC) is adopted to measure the performance on each disease pattern, and the average AUC is employed to evaluate overall performance. In our experiments, differences between two compared methods are assessed using the one-sided DeLong statistical test [46], and a p-value < 0.05 is considered statistically significant.
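The two evaluation ingredients above, the CheXpert uncertain-label policies and the per-class/mean AUC, can be sketched in a few lines. This is an illustrative sketch, not the evaluation code of the paper: encoding "uncertain" as -1 follows a common CheXpert convention but is our assumption, the AUC uses the standard rank-based (Mann-Whitney) formula with midranks, and the DeLong significance test is not reproduced here.

```python
def apply_uncertainty_policy(labels, policy):
    """U-Zeros / U-Ones: map uncertain labels (encoded as -1) to 0 or 1."""
    fill = {"U-Zeros": 0, "U-Ones": 1}[policy]
    return [fill if y == -1 else y for y in labels]

def auc(scores, labels):
    """Per-class AUC via the rank-based (Mann-Whitney) formula."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        for k in range(i, j):            # tied scores share the average rank
            ranks[k] = (i + j + 1) / 2   # mean of 1-based ranks i+1 .. j
        i = j
    n_pos = sum(l for _, l in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum = sum(r for r, (_, l) in zip(ranks, pairs) if l == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def mean_auc(scores_by_class, labels_by_class):
    """Per-class AUCs and their average, the overall metric in the paper."""
    per_class = [auc(s, l) for s, l in zip(scores_by_class, labels_by_class)]
    return per_class, sum(per_class) / len(per_class)
```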
TABLE II
THE AUC SCORES OF VARIOUS MODELS ON PLCO
TABLE III
THE AUC SCORES OF VARIOUS MODELS ON CXR14
TABLE IV
THE AUC SCORES OF VARIOUS MODELS ON CHEXPERT
not only to inherent variations among datasets but also to label noise, which we will explore in Section VII-D.

F. Comparison With State-of-the-Art Methods

We compare our method against state-of-the-art methods on the three datasets; Fig. 6 shows the ROC curves of our best results.

1) Results on PLCO: We compared our method only with HMLC [17], due to the limited number of studies on this dataset. [20] also experimented on PLCO but used a previously published version of the dataset with up to 198,000 images, which is no longer available due to changes in the data release policy by the organizers. Moreover, this method reported results using training data that included samples from CXR14. It is also not possible to make a direct comparison with HMLC regarding the per-class AUC score, as they use different data splits. Thus, to ensure fairness, we only compared the overall performance. As shown in Table II, our correlation function outperforms HMLC, and the similarity function achieves comparable results, demonstrating the effectiveness of ML-LGL.

2) Results on CXR14: We select 15 typical, well-performing methods that are evaluated on the official patient-wise split. To provide more comprehensive comparisons, we attempted to compare our proposed method with SI-GCN by [24], which
has shown superior performance on multi-label natural images. However, the results are obtained on different datasets and thus cannot be used to reliably compare the performance of the two methods. Fortunately, we note that CheXGCN by [23] and SSGE by [16] are two special cases of SI-GCN tailored for CXR classification. Specifically, CheXGCN and SSGE mirror the corresponding components of SI-GCN to build concept correlations of the same scene and the semantic similarities of different scenes, respectively. Therefore, we use the results reported in [16], [23] for our comparison.

As shown in Table III, our similarity function achieves the second-highest mean AUC, narrowly beaten by [19], who used CheXpert as an external dataset for joint training. Notably, our similarity function performed comparably with the previous best results, [48] and SSGE [16], both of which used only CXR14. [48] proposed squeeze-and-excitation blocks, multi-map transfer and max-min pooling to learn disease-specific features, thus achieving good performance. SSGE [16] explored the semantic similarities of in-batch images to optimize the feature embedding, also leading to good results. As the previous best work exploiting label correlations, CheXGCN [23] compares fairly against our frequency function but is inferior to our correlation and similarity functions. These observations demonstrate the superiority of our proposed ML-LGL.

Compared to the multi-modal learning method TieNet [13], which uses additional reports, and the multi-view method ImageGCN [15], which uses multi-relational images² (e.g., from the same age and person), the three functions we propose consistently achieve the highest mean AUC, with higher per-class AUC on almost all diseases. Furthermore, ML-LGL outperforms AG-CL [4], which also uses CL for training, by a significant margin. These comparisons again persuasively demonstrate the advantage of our proposed ML-LGL.

3) Results on CheXpert: For a fair comparison, we selected state-of-the-art methods that were evaluated on the official validation set. It should be noted that although the training dataset

A. Is a Radiologist-Like Learning Order Beneficial?

To examine how much the radiologist-like learning order contributes to ML-LGL, we introduce a new selection function for a comparative study, called the random function, which selects abnormalities randomly at each iteration. Table II shows that the random function also achieves a higher mean AUC than the baseline, demonstrating that ML-LGL can naturally benefit CXR classification regardless of the learning order. This finding supports the motivation of LGL [5] and our explanation in Section V.

We compared the random function with the three clinical knowledge-leveraged functions in terms of mean AUC and per-class AUC. We found that the random function achieved results similar to the frequency function, but was outperformed by the correlation and similarity functions. This suggests that the correlation and similarity orders can benefit the training process, while the frequency order cannot. The similarity order makes the model start training from the most stable state at each iteration, as demonstrated in Section V, which may explain its superior performance. The improvement brought by the correlation function may be attributed to the exploitation of label relations, which is consistent with previous methods [11], [23] that have demonstrated performance gains through label relation modeling.

B. Deep Analysis of the Radiologist-Like Learning Order

To understand the radiologist-like learning order in greater depth, we tested the reverse versions of the correlation and similarity functions. The anti-correlation function learns weakly correlated disease patterns first, while the anti-similarity function learns dissimilar disease patterns first. The reverse version of the frequency function is not tested, as it does not improve performance.

² https://github.com/fzfs/Multi-view-Chest-X-ray-Classification; https://github.com/mocherson/ImageGCN
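The anti- variants used in this comparative study amount to inverting the selection criterion: where the original functions take the argmax of a per-disease priority score (total correlation, MLCE similarity, or frequency), the reverse versions take the argmin. A minimal sketch, with `score` standing in for any such priority:

```python
def select_disease(score, candidates, reverse=False):
    """Pick the highest-priority disease, or the lowest for an anti-function."""
    return (min if reverse else max)(candidates, key=score)
```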
Fig. 7. Classification results visualization from CXR14. Images with one label (a), two labels (b) and three labels (c) are shown. From top to bottom: heatmaps generated by the baseline and the three selection functions. The five highest sigmoid values are shown, with the ground truth highlighted in red.
Comparing the results in Table II, the reverse versions had significantly different outcomes compared to their original counterparts. Surprisingly, the anti-similarity function shows a higher average AUC and improves on 5 disease patterns compared to its original version. This may be because the anti-similarity function prioritizes diseases with greater dissimilarity, which are more informative and further away from the currently trained images; training on these images first can rapidly reduce the error and lead to better outcomes. Conversely, the anti-correlation function decreased the mean AUC drastically, approaching the baseline, indicating that it is crucial to learn strongly correlated disease patterns first and that an incorrect learning order may adversely affect the training process.

C. Generalizability on Different DNN Models

To validate ML-LGL's generalizability across models, another three DNN models are involved for evaluation: VGG16, ResNet50 and AG-CNN [8]. The experiments were conducted on PLCO using the correlation and similarity functions, and AG-CNN was implemented using the code³ provided by the authors. As shown in Table V, the performance increased with the model's

D. Limitation

Our approach is susceptible to label noise because the learning order is built on the labels; severe noise can cause the clinical knowledge-leveraged functions to degrade into the random function and hinder the learning ability. Our experimental results in Tables III and IV, which show no significant difference in the mean AUC of the three functions, seem to support this hypothesis.

To verify this hypothesis, we perform experiments that simulate different levels of label noise severity and monitor the performance change. Since PLCO has greater label reliability, we regard it as the base dataset for simulating the label noise scenario that CXR14 and CheXpert face. Label noise specifically refers to three cases: missed findings, over-labeling and mislabeling. A missed finding means an abnormality is observed on a CXR but is not annotated. Over-labeling means an abnormality does not exist on a CXR but is labeled as positive. Mislabeling means an abnormality is misidentified and labeled under another name. Accordingly, we corrupt PLCO to produce this scenario using the following controlled scheme:
• Choose a base probability α ∈ [0, 0.25].
• For each sample with an abnormality, we randomly select one of its labels and delete it with probability α (missed finding).
• For each sample with an abnormality, we replace one of its labels with another label (not originally included in the sample) with probability α (mislabeling).
• For each sample, we add an extra label (not originally included in the sample) with probability α (over-labeling).

³ https://github.com/Ien001/AG-CNN/blob/master/
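The controlled corruption scheme above can be sketched as follows. This is a hedged sketch under stated assumptions, not the authors' code: samples are represented as label sets, and the seeded `random.Random` is our choice for reproducibility.

```python
import random

def corrupt_labels(samples, all_patterns, alpha, seed=0):
    """Apply the three corruptions, each with probability alpha per sample:
    missed finding (delete a label), mislabeling (swap a label for an absent
    one), and over-labeling (add an absent label)."""
    rng = random.Random(seed)
    corrupted = []
    for labels in samples:
        labels = set(labels)
        if labels and rng.random() < alpha:              # missed finding
            labels.discard(rng.choice(sorted(labels)))
        absent = sorted(set(all_patterns) - labels)
        if labels and absent and rng.random() < alpha:   # mislabeling
            labels.discard(rng.choice(sorted(labels)))
            labels.add(rng.choice(absent))
        absent = sorted(set(all_patterns) - labels)
        if absent and rng.random() < alpha:              # over-labeling
            labels.add(rng.choice(absent))
        corrupted.append(labels)
    return corrupted
```

Setting alpha = 0 returns the clean labels unchanged, while sweeping alpha toward 0.25 reproduces increasingly severe versions of the simulated noise scenario.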
[8] Q. Guan, Y. Huang, Z. Zhong, Z. Zheng, L. Zheng, and Y. Yang, "Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification," 2018, arXiv:1801.09927.
[9] Q. Guan and Y. Huang, "Multi-label chest X-ray image classification via category-wise residual attention learning," Pattern Recognit. Lett., vol. 130, pp. 259–266, 2020.
[10] B. Chen, J. Li, G. Lu, and D. Zhang, "Lesion location attention guided network for multi-label thoracic disease classification in chest X-rays," IEEE J. Biomed. Health Inform., vol. 24, no. 7, pp. 2016–2027, Jul. 2020.
[11] Q. Guan, Y. Huang, Y. Luo, P. Liu, M. Xu, and Y. Yang, "Discriminative feature learning for thorax disease classification in chest X-ray images," IEEE Trans. Image Process., vol. 30, pp. 2476–2487, 2021.
[12] H. Wang, S. Wang, Z. Qin, Y. Zhang, R. Li, and Y. Xia, "Triple attention learning for classification of 14 thoracic diseases using chest radiography," Med. Image Anal., vol. 67, 2021, Art. no. 101846.
[13] X. Wang, Y. Peng, L. Lu, Z. Lu, and R. M. Summers, "TieNet: Text-image embedding network for common thorax disease classification and reporting in chest X-rays," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 9049–9058.
[14] G. Jacenków, A. Q. O'Neil, and S. A. Tsaftaris, "Indication as prior knowledge for multimodal disease classification in chest radiographs with transformers," in Proc. IEEE 19th Int. Symp. Biomed. Imag., 2022, pp. 1–5.
[15] C. Mao, L. Yao, and Y. Luo, "ImageGCN: Multi-relational image graph convolutional networks for disease identification with chest X-rays," IEEE Trans. Med. Imag., vol. 41, no. 8, pp. 1990–2003, Aug. 2022.
[16] B. Chen, Z. Zhang, Y. Li, G. Lu, and D. Zhang, "Multi-label chest X-ray image classification via semantic similarity graph embedding," IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, pp. 2455–2468, Apr. 2022.
[17] H. Chen, S. Miao, D. Xu, G. D. Hager, and A. P. Harrison, "Deep hierarchical multi-label classification applied to chest X-ray abnormality taxonomies," Med. Image Anal., vol. 66, 2020, Art. no. 101811.
[18] H. H. Pham, T. T. Le, D. Q. Tran, D. T. Ngo, and H. Q. Nguyen, "Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels," Neurocomputing, vol. 437, pp. 186–194, 2021.
[19] L. Luo et al., "Deep mining external imperfect data for chest X-ray disease screening," IEEE Trans. Med. Imag., vol. 39, no. 11, pp. 3583–3594, Nov. 2020.
[20] S. Gündel et al., "Robust classification from noisy labels: Integrating additional knowledge for chest radiography abnormality assessment," Med. Image Anal., vol. 72, 2021, Art. no. 102087.
[21] L. Yao, E. Poblenz, D. Dagunts, B. Covington, D. Bernard, and K. Lyman, "Learning to diagnose from scratch by exploiting dependencies among labels," 2017, arXiv:1710.10501.
[22] P. Kumar, M. Grewal, and M. M. Srivastava, "Boosted cascaded convnets for multilabel classification of thoracic diseases in chest radiographs," in Proc. Int. Conf. Image Anal. Recognit., Springer, 2018, pp. 546–552.
[23] B. Chen, J. Li, G. Lu, H. Yu, and D. Zhang, "Label co-occurrence learning with graph convolutional networks for multi-label chest X-ray image classification," IEEE J. Biomed. Health Inform., vol. 24, no. 8, pp. 2292–2302, Aug. 2020.
[24] B. Chen, Z. Zhang, Y. Lu, F. Chen, G. Lu, and D. Zhang, "Semantic-interactive graph convolutional network for multilabel image recognition," IEEE Trans. Syst., Man, Cybern. Syst., vol. 52, no. 8, pp. 4887–4899, Aug. 2022.
[25] J. L. Elman, "Learning and development in neural networks: The importance of starting small," Cognition, vol. 48, no. 1, pp. 71–99, 1993.
[26] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, "Curriculum learning," in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 41–48.
[27] X. Wang, Y. Chen, and W. Zhu, "A survey on curriculum learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 4555–4576, Sep. 2022.
[28] J. Wei et al., "Learn like a pathologist: Curriculum learning by annotator agreement for histopathology image classification," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2021, pp. 2473–2483.
[29] M. Yang, Z. Xie, Z. Wang, Y. Yuan, and J. Zhang, "Su-MICL: Severity-guided multiple instance curriculum learning for histopathology image interpretable classification," IEEE Trans. Med. Imag., vol. 41, no. 12, pp. 3533–3543, Dec. 2022.
[30] R. Zhao, X. Chen, Z. Chen, and S. Li, "EGDCL: An adaptive curriculum learning framework for unbiased glaucoma diagnosis," in Proc. Eur. Conf. Comput. Vis., Springer, 2020, pp. 190–205.
[31] J. Yang et al., "Self-paced balance learning for clinical skin disease recognition," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 8, pp. 2832–2846, Aug. 2020.
[32] Q. Zhu, N. Yuan, J. Huang, X. Hao, and D. Zhang, "Multi-modal AD classification via self-paced latent correlation analysis," Neurocomputing, vol. 355, pp. 143–154, 2019.
[33] P. Morerio, J. Cavazza, R. Volpi, R. Vidal, and V. Murino, "Curriculum dropout," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 3544–3552.
[34] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," 2017, arXiv:1710.10196.
[35] M. De Lange et al., "A continual learning survey: Defying forgetting in classification tasks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 7, pp. 3366–3385, Jul. 2022.
[36] W. Ou, S. Xiao, C. Zhu, W. Han, and Q. Zhang, "An overview of brain-like computing: Architecture, applications, and future trends," Front. Neurorobot., vol. 16, 2022, Art. no. 1041108.
[37] S. Ghosh-Dastidar and H. Adeli, "Spiking neural networks," Int. J. Neural Syst., vol. 19, no. 4, pp. 295–308, 2009.
[38] Q. Zhou, C. Ren, and S. Qi, "An imbalanced R-STDP learning rule in spiking neural networks for medical image classification," IEEE Access, vol. 8, pp. 224162–224177, 2020.
[39] A. Garain, A. Basu, F. Giampaolo, J. D. Velasquez, and R. Sarkar, "Detection of COVID-19 from CT scan images: A spiking neural network-based approach," Neural Comput. Appl., vol. 33, no. 19, pp. 12591–12604, 2021.
[40] A. Paul et al., "Generalized zero-shot chest X-ray diagnosis through trait-guided multi-view semantic embedding with self-training," IEEE Trans. Med. Imag., vol. 40, no. 10, pp. 2642–2655, Oct. 2021.
[41] Y. Bengio, "Evolving culture versus local minima," in Growing Adaptive Machines: Combining Development and Learning in Artificial Neural Networks. Berlin, Germany: Springer, 2014, pp. 109–138.
[42] T. Gong, Q. Zhao, D. Meng, and Z. Xu, "Why curriculum learning & self-paced learning work in big/noisy data: A theoretical perspective," Big Data Inf. Analytics, vol. 1, no. 1, pp. 111–127, 2016.
[43] J. K. Gohagan, P. C. Prorok, R. B. Hayes, and B.-S. Kramer (PLCO Project Team), "The prostate, lung, colorectal and ovarian (PLCO) cancer screening trial of the National Cancer Institute: History, organization, and status," Controlled Clin. Trials, vol. 21, no. 6, pp. 251S–272S, 2000.
[44] J. Irvin et al., "CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison," in Proc. AAAI Conf. Artif. Intell., 2019, pp. 590–597.
[45] L. Oakden-Rayner, "Exploring large-scale public medical image datasets," Academic Radiol., vol. 27, no. 1, pp. 106–112, 2020.
[46] X. Sun and W. Xu, "Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves," IEEE Signal Process. Lett., vol. 21, no. 11, pp. 1389–1393, Nov. 2014.
[47] L. N. Smith, "Cyclical learning rates for training neural networks," in Proc. IEEE Winter Conf. Appl. Comput. Vis., 2017, pp. 464–472.
[48] C. Yan, J. Yao, R. Li, Z. Xu, and J. Huang, "Weakly supervised deep learning for thoracic disease classification and localization on chest X-rays," in Proc. ACM Int. Conf. Bioinf., Comput. Biol., Health Inform., 2018, pp. 103–110.
[49] X. Zhu and Q. Feng, "MVC-Net: Multi-view chest radiograph classification network with deep fusion," in Proc. IEEE 18th Int. Symp. Biomed. Imag., 2021, pp. 554–558.
[50] W. Xu, W. Liu, X. Huang, J. Yang, and S. Qiu, "Multi-modal self-paced learning for image classification," Neurocomputing, vol. 309, pp. 134–144, 2018.
[51] N. Sarafianos, T. Giannakopoulos, C. Nikou, and I. A. Kakadiaris, "Curriculum learning of visual attribute clusters for multi-task classification," Pattern Recognit., vol. 80, pp. 94–108, 2018.
[52] S. Ghamizi, M. Cordy, M. Papadakis, and Y. L. Traon, "On evaluating adversarial robustness of chest X-ray classification: Pitfalls and best practices," 2022, arXiv:2212.08130.
[53] M. Xu, T. Zhang, Z. Li, M. Liu, and D. Zhang, "Towards evaluating the robustness of deep diagnostic models by adversarial attack," Med. Image Anal., vol. 69, 2021, Art. no. 101977.