Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on January 29,2023 at 13:32:49 UTC from IEEE Xplore. Restrictions apply.
YAO et al.: SINGLE MODEL DL ON IMBALANCED SMALL DATASETS FOR SKIN LESION CLASSIFICATION 1243
1244 IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 41, NO. 5, MAY 2022
difficulty between different classes. Lin et al. introduced the Focal Loss method to improve the training outcome on samples with imbalanced classification difficulties [45]. Cui et al. further proposed a Class-Balanced Focal Loss function that combines the Focal Loss with the Class-Balanced Loss [46]. Considering that accurate detection of melanoma has the highest clinical impact, we propose a Multi-weighted New Loss (MWNL) method that not only overcomes class imbalance in both sample number and classification difficulty, but also improves the accuracy of melanoma classification by adjusting the weight of the corresponding loss.

On the other hand, random cropping, which expands the effective scale of the dataset, is widely used in training DCNNs and can also greatly enhance the spatial robustness of the model [47]. Noticeably, the skin lesions in some images are very small, and randomly cropped images may contain only partial lesions or none at all. Such samples, as well as those with incorrect labels, are called "very hard examples" or "outliers". They may still produce large losses in the convergence stage of training. If the converged model is then forced to classify these outliers better, the classification of a large number of other examples tends to become less accurate [48]. To deal with this problem, we further introduce a correction operation in our MWNL method that forcibly limits these very large losses in order to reduce the interference of outliers in the network training.

Moreover, Zhou et al. [49] observed that conventional class re-balancing methods may unexpectedly impair the representative ability of the learned deep features, despite significantly promoting classifier learning. They therefore proposed a unified Bilateral-Branch Network (BBN) model to balance representation learning and classifier learning [49]. Similarly, Cao et al. [50] separated the training process into two stages: they first trained networks as usual on the originally imbalanced data, and applied re-balancing only in the second stage to fine-tune the network at a smaller learning rate. Inspired by these previous works, we propose a novel end-to-end cumulative learning strategy capable of effectively balancing representation learning and classifier learning at no additional computational cost. This strategy comprises two phases. In the initial phase, conventional training is carried out on the originally imbalanced data to initialize appropriate weights for the deep layers' features [50]. As the number of iterations increases, the training gradually shifts to a re-balancing mode, and the learning rate synchronously decreases to promote the optimization of the upper classifier of the DCNN.

Although previous studies show that ensemble DCNN models usually achieve better performance [44], [51]–[55], implementing these models on a mobile device is not practical, especially in remote and rural areas with limited computational resources. Also, cloud-based intelligent diagnosis is only implementable in developed countries or areas where advanced infrastructure has been established. Considering the urgent need for portable, low-cost, and automatic diagnosis at a low computational cost, we focus our research on the single-model-based method.

The main contributions of this paper are summarized as follows:

1) We propose a novel Multi-weighted New Loss method to deal with the class imbalance issue and to improve the detection accuracy for key classes such as melanoma. A correction operation that forcibly limits very large losses is also introduced to reduce the interference of outliers in the network training. As far as we know, the Multi-weighted New Loss method is the first reported method that can simultaneously deal with the data imbalance problem and the outlier problem.

2) We propose a novel end-to-end cumulative learning strategy that can balance representation learning and classifier learning more effectively at no additional computational cost. This strategy is designed to first learn the universal patterns that lead to a good initialization, and then gradually focus on the imbalanced data.

3) By comparing the classification performance of DCNNs with different structures and capacities on dermoscopic image datasets, we experimentally verify that advanced DCNN models that perform better on large natural image datasets (e.g., ImageNet) generally also perform better in medical image classification tasks, and that for small image datasets a moderate model, rather than a larger one, should yield better performance. To the best of our knowledge, this is the first systematic verification on skin lesion datasets and the first adoption of RegNets in dermoscopic image classification tasks.

4) In addition to the fundamental innovations listed above, we also improve existing methods for better performance. To increase the data diversity of dermoscopic image datasets, we propose a novel Modified RandAugment strategy. To reduce over-fitting, we redesign the DCNN models by integrating the regularization methods DropOut and DropBlock.

We have applied the proposed methods to the following four datasets: ISIC 2018 [56], ISIC 2019 [2], [15], [16], ISIC 2017 [16], and the Seven-Point Criteria Evaluation (7-PT) dataset [57]. Our experiments achieve outstanding performance on all of these datasets. Our next plan is to implement the proposed strategies in a mobile dermoscopic device for low-cost automated screening of skin diseases in low-resource settings.

The impact of our study is not limited to skin disease classification tasks. The methods described in this paper can be further extended to handle common problems, such as insufficient sample size, class imbalance, and labeling noise, in other medical image datasets and in computer vision (CV) classification tasks in general. Although previous publications have addressed each of these problems in depth, little work has been published on handling them simultaneously. All the source code of our methods is available at https://github.com/yaopengUSTC/mbit-skin-cancer.git.

II. RELATED WORK

A. DCNN Models

Recent research and development efforts on advanced DCNN models have facilitated automated feature extraction and classification with high performance [19]–[22], [30], [58], [59]. He et al. introduced shortcut connections in ResNet [22] to address the degradation problem [60], making it possible to
train networks of up to hundreds or even thousands of layers. ResNet won the 2015 ImageNet image classification competition [14]. The SENet model enhanced the sensitivity to channel characteristics by introducing Squeeze-and-Excitation (SE) modules [58]; SENet won the 2017 ImageNet image classification competition. In 2019, Tan et al. proposed a series of lightweight DCNN models named EfficientNets [21]. Owing to their outstanding performance in many classification tasks, they have been adopted as backbones in the latest skin lesion classification work [55], [61], [62]. Recently, Radosavovic et al. proposed RegNets, which perform better and run up to 5x faster on GPUs than EfficientNets of similar FLOPs and parameter counts [20]. RegNets consist of two series, RegNetX and RegNetY. The RegNetX models are built mainly on the standard residual bottleneck block with grouped convolution, while the RegNetY models add SE modules to RegNetX [58]. In each series, the authors trained DCNNs ranging from 200M to 32G FLOPs on ImageNet, and the RegNetY models outperformed the RegNetX models, proving the effectiveness of the SE module for natural image classification.

B. Data Augmentation

Data augmentation is one of the essential means of overcoming the challenge of limited training data by randomly "augmenting" data diversity and quantity. Since the design of augmentation policies requires expertise to capture prior knowledge in each domain, it is difficult to extend existing data augmentation methods to other applications and domains. The recently emerging automatic augmentation methods based on learned policies address the shortcomings of traditional data augmentation [37]–[39]. Among them, the RandAugment method proposed by Cubuk et al. is the most advanced data augmentation technique so far [39]. Compared with AutoAugment [37] and Fast AutoAugment [38], RandAugment greatly reduces the search space and thereby shortens the training time and the computational cost. The search space of RandAugment consists of 14 available transformations. For the transformation of each image, a parameter-free procedure is applied in order to reduce the parameter space while maintaining image diversity. RandAugment has two integer hyperparameters N and M, where N is the number of transformations applied to a training image and M is the magnitude of each augmentation distortion. A randomly selected transformation is applied to each image at the preset magnitude, and this process is repeated N − 1 times. All the transformations use the same global parameter M, so the resultant search space is significantly reduced from the 10^32 of AutoAugment [37] and Fast AutoAugment [38] to 10^2.

C. Class-Balanced Loss and Focal Loss

To address the class imbalance issue in a given dataset, cost-sensitive methods are commonly used. These methods usually introduce a loss weighting factor inversely proportional to the number of samples [44]:

    w_i = 1 / N_i    (1)

where w_i is the loss weighting factor for class i and N_i is the number of samples of class i.

Cui et al. designed a new re-weighting scheme, the Class-Balanced Loss, which uses the effective number of samples of each class to re-balance the loss [46]. The effective number of samples is defined as (1 − β^n) / (1 − β), where n is the number of samples and β ∈ [0, 1) is a hyperparameter. The loss weighting factor w_i for class i is thus defined as the class-balanced term in (2):

    w_i = (1 − β) / (1 − β^{N_i})    (2)

Notably, w_i varies in [1, 1/N_i) as β changes; as β → 1, w_i → 1/N_i. Therefore, for any dataset we can find an optimal value of β that minimizes the performance loss caused by the sample imbalance among classes.

Alternatively, Lin et al. applied a modulating term to the cross-entropy loss in order to improve the training outcome on samples with imbalanced classification difficulties [45], [46]. First, a parameter z_i^t is defined as:

    z_i^t = { z_i,    if i = y
            { −z_i,   otherwise    (3)

where z_i is the predicted value of the sample for class i and y is the ground truth. Denoting p_i^t = sigmoid(z_i^t) = 1 / (1 + exp(−z_i^t)), the Focal Loss is then written as:

    FL(z, y) = − Σ_{i=1}^{C} (1 − p_i^t)^r log(p_i^t)    (4)

where the parameter r smoothly adjusts the rate at which easy examples are down-weighted. When r = 0, the Focal Loss is equivalent to the cross-entropy (CE) loss; as r increases, the effect of the modulating factor increases accordingly. Based on the Class-Balanced Loss and the Focal Loss, Cui et al. further proposed the Class-Balanced Focal Loss (CB_focal), which can deal with imbalances in both the sample numbers and the classification difficulties [46]. CB_focal is expressed as:

    CB_focal(z, y) = − (1 − β) / (1 − β^{N_y}) Σ_{i=1}^{C} (1 − p_i^t)^r log(p_i^t)    (5)

where N_y is the number of samples in the ground-truth class y.

III. METHODOLOGY

The proposed classification framework is illustrated by the flowchart in Fig. 2. This section first introduces the datasets and the evaluation metrics used in our study, followed by an in-depth explanation of the proposed approaches, including the DCNN models, Modified RandAugment, the Multi-weighted New Loss method, and the Cumulative Learning Strategy. Finally, the training and evaluation strategies are introduced.
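To make the weighting in this section concrete, here is a minimal pure-Python sketch of the effective-number class weights and the class-balanced focal term of Eqs. (1)–(5); the function names and list-based interface are ours, not the paper's, and `probs_t` stands for the flipped probabilities p_i^t.

```python
import math

def class_balanced_weights(samples_per_class, beta):
    """Per-class weights w_i = (1 - beta) / (1 - beta**N_i) from Eq. (2)."""
    return [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]

def cb_focal_loss(probs_t, y, samples_per_class, beta=0.999, r=2.0):
    """Class-Balanced Focal Loss of Eq. (5): focal terms summed over classes,
    re-weighted by the effective-number weight of the ground-truth class y.
    probs_t[i] is p_i^t = sigmoid(z_i^t), already flipped for i != y."""
    w_y = class_balanced_weights(samples_per_class, beta)[y]
    focal = -sum((1.0 - p) ** r * math.log(p) for p in probs_t)
    return w_y * focal
```

With r = 0 the focal term reduces to the plain cross-entropy sum, and as beta approaches 1 the weight of a class with N samples approaches 1/N, matching the limits stated above.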
Fig. 2. The flowchart of the proposed framework. The upper part is the training pipeline, and the lower part is the test pipeline.
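The test pipeline shown in Fig. 2 averages predictions over equally spaced crops of the test image (Section III-G uses 16 crops of 224 × 224). A minimal sketch, assuming the crop windows sweep from the upper-left to the lower-right corner along the diagonal and that `predict_fn` is any per-crop classifier (both names are illustrative, not from the paper):

```python
import numpy as np

def multi_crop_predict(image, predict_fn, crop=224, n_crops=16):
    """Average predictions over n_crops equally spaced crops taken from the
    upper-left to the lower-right corner of the test image (Fig. 2, test
    pipeline). predict_fn maps a crop to a probability vector."""
    h, w = image.shape[:2]
    ys = np.linspace(0, h - crop, n_crops).astype(int)  # equally spaced rows
    xs = np.linspace(0, w - crop, n_crops).astype(int)  # equally spaced cols
    preds = [predict_fn(image[y:y + crop, x:x + crop]) for y, x in zip(ys, xs)]
    return np.mean(preds, axis=0)
```

The exact crop layout (diagonal sweep vs. a grid) is our assumption; the paper only specifies equal spacing from corner to corner and averaging of the predicted values.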
TABLE III: List of All Transformations That Can Be Chosen During the Search

TABLE II: Detailed Information of DCNNs With Different Parameters and Different Architectures [20]–[22], [30], [58], [59]
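Table III lists the transformations available during the search. The two-subset sampling protocol of Modified RandAugment described below (one {color} transform, then one {shape} transform, each applied with probability P at a bounded random magnitude) can be sketched as follows for N = 2; the callable-based interface is an illustrative assumption, not the paper's implementation:

```python
import random

def modified_randaugment(image, color_ops, shape_ops, p=0.7, m_max=10):
    """Sketch of Modified RandAugment for N = 2: one transform drawn from the
    {color} subset, then one from {shape}, each applied with probability p at
    a magnitude drawn uniformly from [0, m_max]. Each op is a callable
    op(image, magnitude) -> image; otherwise the image is left unchanged."""
    for subset in (color_ops, shape_ops):
        if random.random() < p:
            op = random.choice(subset)
            magnitude = random.uniform(0, m_max)  # random bounded amplitude
            image = op(image, magnitude)
    return image
```

A random crop to the 224 × 224 network input (Step 3 of the protocol) would follow this call in the training pipeline.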
each image will improve the training outcome. Therefore, we divide the transformations in the search space into two subsets, {color} and {shape}, comprising 13 and 8 transformations respectively. When N > 1, the transformations are randomly selected from the {color} and {shape} subsets in sequence. The working protocol of Modified RandAugment is illustrated below using N = 2 as an example:

Step 1. For each image in the training set, randomly select a transformation from {color} and apply it with the preset probability P. If the transformation is to be performed, its amplitude is randomly selected within an allowable range; otherwise, the original image is retained.

Step 2. Randomly select a transformation from {shape}, and follow the same procedure as in Step 1 to apply it.

Step 3. Randomly crop a 224 × 224 image from the augmented image obtained in Step 2 and send it to the training network.

E. Multi-Weighted New Loss Method

In the conventional Class-Balanced Loss method [46], the loss weighting factor w_i is calculated by (2), and the weighting strength floats within [1, 1/N_i) regardless of the value of β. In our Multi-weighted New Loss method, we modify w_i by adding the hyperparameter α, as shown in (7):

    w_i = (1 / N_i)^α    (7)

Clearly, the scope of w_i extends beyond [1, 1/N_i] when α > 1, which further enhances the diversity of the weighting strength. In order to reduce both the sample imbalance between classes and the classification difficulty imbalance, we propose a Multi-weighted Focal Loss (MWL_focal) function by combining (7) with the Focal Loss function [45]. On this basis, a weighting coefficient C_i is also introduced to strengthen the training effect for a specified category, and the MWL_focal function is expressed as follows:

    MWL_focal(z, y) = −C_y (1 / N_y)^α Σ_{i=1}^{C} (1 − p_i^t)^r log(p_i^t)    (8)

where C_y is the weighting coefficient for the ground-truth class y. For example, we can simply set the category weighting coefficient C_MEL to a value greater than 1 while keeping the other C_i values at 1 to strengthen the training effect for melanoma.

As shown in (8), for outliers with p_i^t → 0 the loss tends to infinity, which would seriously mislead the optimization of network training. We therefore further modify (8) by introducing a correction term to reduce the interference of outliers, and the Multi-weighted New Loss function (MWNL) is updated as:

    MWNL(z, y) = −C_y (1 / N_y)^α Σ_{i=1}^{C} Loss_i    (9)

where

    Loss_i = { (1 − p_i^t)^r log(p_i^t),   p_i^t > T
             { G*,                          p_i^t ≤ T    (10)

and T is a hyperparameter threshold that limits the loss of outliers. Experimental results indicate that a value of 0.1 performs best, so T is set to 0.1 in all subsequent experiments. G* = (1 − T)^r log(T) is a constant determined by T.

F. Cumulative Learning Strategy

Inspired by the literature [49], [50], we propose a novel end-to-end cumulative learning strategy (CLS) and update (9) as:

    MWNL(z, y) = −(C_y^*)^β (1 / N_y)^β Σ_{i=1}^{C} Loss_i    (11)

where (C_y^*)^α = C_y, and β ∈ [0, α] changes with the current training epoch E:

    β = { 0,                                 E ≤ E_1
        { ((E − E_1) / (E_2 − E_1))^2 α,    E_1 < E < E_2
        { α,                                 E ≥ E_2    (12)

where E_1 and E_2 are hyperparameter thresholds on the training epoch, set to 20 and 60 respectively in our experiments. As the epoch increases, β changes from 0 to α, and the weighting factors for the different classes gradually vary from 1 to C_i (1/N_i)^α. The CLS thus first trains the network on the originally imbalanced data as usual, so that appropriate weights are established for the deep layers' features. The training then gradually changes to a re-balancing mode. In the re-balancing phase, the learning rate gradually decreases to a small value; coupled with the non-convexity of the loss, only minor changes occur to the weights of the deep features, and the network consequently shifts to optimizing the upper classifier. Another point worth mentioning is that the CLS method requires no additional training time; its total training time is almost half that of the BBN method [49].

G. Training and Evaluation Strategies

The models are initialized with the weights pre-trained on the ImageNet dataset [14] and fine-tuned on the training sets. As shown in Fig. 2, the proposed data augmentation strategy is applied to each image in the training set, and the resultant image is randomly cropped to match the 224 × 224 input size required by the models.

A multi-crop strategy is adopted for the evaluation of the DCNN models. As indicated by the flowchart in Fig. 2, areas from the upper left corner to the lower right corner of the test image are cropped with equal spacing, and the average of their predicted values is used as the final prediction. The number of crops is set to 16 in our study, since we noticed little improvement in model performance beyond this number.

All the training and testing tasks are performed on two NVIDIA GeForce GTX 2080Ti graphics cards using PyTorch. Adam [66] is used as the optimizer, and "MultiStepLR" is used as the learning rate schedule for all the models. A starting learning rate of 0.001 is chosen and is gradually reduced by a factor
TABLE IV: BACC of Different DCNN Models on the ISIC 2018 Skin Lesion Classification Challenge Test Set

TABLE V: DCNNs Performing Best on the 7-PT Dataset, ISIC 2017, and ISIC 2019

TABLE VI: BACC of RegNetY-3.2G With Added DropOut and DropBlock on the ISIC 2018 Skin Lesion Classification Challenge Test Set
TABLE VII: BACC Comparison of the General Augmentation Method, RandAugment, and Our Modified RandAugment on the Different Test Sets

Fig. 5. BACC on the ISIC 2018 test set using Modified RandAugment. (a) BACC with different N and P. * means a transformation is randomly selected from the {color} subset, ^ means a transformation is randomly selected from the {shape} subset, and transformations are executed following the order of the symbols in the training. In addition, the transformation is randomly selected from the entire search space for unmarked groups. (b) BACC with changing M (N = 2*^, P = 0.7). CM is "Constant Magnitude"; RM is "Random Magnitude with Increasing Upper Bound".
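The Multi-weighted New Loss of Section III-E, whose hyperparameters α, C_MEL, and T are evaluated in this section, combines the α-strengthened class weight of Eq. (7) with the outlier correction of Eq. (10). A minimal pure-Python sketch (the function name and list-based interface are ours; `probs_t` denotes the flipped probabilities p_i^t):

```python
import math

def mwnl_loss(probs_t, y, samples_per_class, class_coef,
              alpha=1.1, r=2.0, t=0.1):
    """Multi-weighted New Loss sketch (Eqs. (7)-(10)): each focal term is
    replaced by the constant G* = (1 - t)**r * log(t) once p_i^t <= t, and
    the class sum is scaled by C_y * (1 / N_y)**alpha."""
    g_star = (1.0 - t) ** r * math.log(t)   # correction constant for outliers
    per_class = [((1.0 - p) ** r) * math.log(p) if p > t else g_star
                 for p in probs_t]
    weight = class_coef[y] * (1.0 / samples_per_class[y]) ** alpha
    return -weight * sum(per_class)
```

For a well-classified sample (p_i^t near 1) each term is small, while an outlier with p_i^t near 0 contributes only the bounded constant G* instead of an unbounded loss, which is the behavior Eq. (10) is designed to enforce.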
TABLE X: The Performance of ISIC 2019 Challenge Winners From the Legacy Leaderboard (Rows 1–3), the Current ISIC 2019 Challenge Live Leaderboard (Rows 4–8), and Our Proposed Approach

α = 1.1. After that, a further increase of α gradually reduces the BACC level. Noticeably, as an extension of CB_focal [46] in weighting strength, MWL_focal (C_MEL = 1.0) at α = 1.0 yields a BACC value of 0.862, corresponding to the best performance achievable by CB_focal. This result verifies that MWL_focal outperforms CB_focal.

Fig. 6(b) shows the BACC and the melanoma detection sensitivity of models adopting MWL_focal with different C_MEL. At C_MEL = 1.0, the maximal BACC reaches 0.864, while the melanoma detection sensitivity is only 0.772. As C_MEL increases, the BACC decreases slightly, while the melanoma detection sensitivity increases significantly. For example, at C_MEL = 2.0, the BACC drops to 0.848, and the melanoma sensitivity rises to 0.871. This proves that training a DCNN with both a high BACC and a high sensitivity for a specified class is possible by adjusting C_i.

The effectiveness of the proposed MWNL method is further verified on different datasets using RegNetY-##-Drop as the baseline and Modified RandAugment as the augmentation method. Table VIII shows the results of our MWNL with MWL_focal, the standard training, and several state-of-the-art techniques widely adopted for mitigating data imbalance. These methods include: the standard cross-entropy loss (CE), CE with over-sampling (CE-RS), CE re-weighted by (1) (CE-RW), CB_focal [46], the label-distribution-aware margin loss (LDAM) [50], the learning-optimal-sample-weights method (LOW) [71], and the complement cross entropy (CCE) [72]. It is clear that our MWL_focal achieves better performance than all of these methods except LDAM, and our MWNL further improves the performance beyond all of them. In view of this, the addition of a correction term to suppress the adverse effects of outliers can indeed improve the performance of DCNNs on small and imbalanced dermoscopic image datasets.

D. Results on Our Cumulative Learning Strategy

Table VIII also shows the performance of DCNNs trained following our proposed end-to-end Cumulative Learning Strategy (CLS), the BBN method [49], and the two-stage DRW method [50]. On the ISIC 2018 and ISIC 2019 datasets, the CLS method performs better than the other methods, and the MWNL-CLS method achieves the best performance, which proves the effectiveness of the proposed training strategy. However, on the 7-PT and ISIC 2017 datasets, the CLS method does not work well; in particular, the BACC dropped significantly on the 7-PT dataset, which has the fewest samples. It is likely that too few samples prevent the effective learning of deep features under training strategies like DRW and CLS, thereby failing to initialize proper weights for the model in the initial training phase and hindering the subsequent optimization.

It is also noticed that the BBN method, which works on general imbalanced datasets, is not efficient on the four dermoscopic image datasets. Although the BBN method performs slightly better than the DRW method and our CLS method on the 7-PT and ISIC 2017 datasets, both of which have smaller sample sizes, its training time is almost twice that of the other two methods.

E. Comparison With Other Methods in the Challenge

Tables IX–XII show the performance of our method and other state-of-the-art methods on four different skin lesion classification tasks at the time of submitting the manuscript. As shown in Table IX, on the ISIC 2018 classification challenge, our proposed MWNL-CLS approach ranks third overall among the top 200 submissions on the live leaderboard at the time this manuscript is submitted (leaderboard data as
TABLE XI: The Performance of Three Top-Ranking Challenge Solutions (Rows 1–3), Four Recent Methods (Rows 4–7), and Our Proposed Approach on the ISIC 2017 Test Set

TABLE XII: The Performance of Three Recent Methods (Rows 1–3) and Our Proposed Approach on the 7-PT Dataset

of 10/21/2021), and ranks first among those using a single model without additional data. On the ISIC 2019 classification challenge (Table X), our proposed MWNL-CLS approach also ranks third on the live leaderboard at the time this manuscript is submitted and surpasses the first-ranked team on the legacy leaderboard; it likewise ranks first among those using a single model without additional data. As Table IX and Table X indicate, the MWNL-CLS method significantly improves the BACC, Avg. Spec., and Sen. MEL over the MWNL method, which further proves the effectiveness of our CLS training strategy on imbalanced datasets that are not very small.

As shown in Table XI, on the ISIC 2017 classification challenge, our proposed MWNL surpasses all the methods on the legacy leaderboard. Compared with other state-of-the-art methods, the MWNL is surpassed only by a method using both additional data and ensemble models [10]. The data in Table XII indicate that our MWNL method is better than all other published methods on the 7-PT dataset. It is worth noting that our approach uses only the dermoscopic image data from the 7-PT dataset, whereas the comparison methods additionally use clinical image data, patient metadata, or external data from other datasets.

In general, DCNNs with ensemble models perform better than those with a single model [10], [44], [55], [74], [79], and a similar phenomenon can also be observed in Tables IX–XII. However, implementing ensemble models is less practical due to the limitations in computing resources and computing time. For example, one of the best methods on the ISIC 2018 challenge live leaderboard integrates 90 DCNNs and takes 13.9 seconds for a single inference [55]. In comparison, our method takes only 0.05 seconds in a similar computing environment and achieves better performance. In addition, although our method uses a multi-crop strategy in the inference process, the single-model method in an asynchronous pipeline mode results in an overall inference time only slightly longer than that of a single-crop method, because the different crops can be processed in different layers of the model at the same time. Therefore, our method has great potential for deployment on portable diagnostic systems.

V. CONCLUSION

This paper proposes a novel skin lesion classification method consisting of modified DCNNs integrating the regularization methods DropOut and DropBlock, Modified RandAugment, the Multi-weighted New Loss, and an end-to-end Cumulative Learning Strategy. DCNNs of different structures and capacities trained on different dermoscopic image datasets show that models of moderate complexity outperform the larger ones. The Modified RandAugment helps to achieve significant performance with fewer computing resources and in a shorter time. The Multi-weighted New Loss can not only deal with the class imbalance issue and improve the accuracy of key classes, but also reduce the interference of outliers in the network training. The end-to-end cumulative learning strategy can more effectively balance representation learning and classifier learning without additional computational cost. By combining Modified RandAugment and the Multi-weighted New Loss, we train single-model DCNNs on the ISIC 2018, ISIC 2017, ISIC 2019, and 7-PT datasets following the end-to-end cumulative learning strategy. They all achieve outstanding classification accuracy on these datasets, matching or even surpassing the ensembling methods. Our study shows that this method is able to achieve high classification performance at a low cost of computational resources on a small and imbalanced dataset. It has great potential for use in mobile devices for automated screening of skin lesions and can also be applied in developing automatic diagnosis tools in other clinical disciplines.

REFERENCES

[1] R. L. Siegel, K. D. Miller, and A. Jemal, "Cancer statistics, 2020," CA Cancer J. Clinicians, vol. 70, no. 4, pp. 7–30, 2020.
[2] P. Tschandl, C. Rosendahl, and H. Kittler, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions," Sci. Data, vol. 5, no. 1, pp. 1–9, Aug. 2018.
[3] M. E. Vestergaard, P. Macaskill, P. E. Holt, and S. W. Menzies, "Dermoscopy compared with naked eye examination for the diagnosis of primary melanoma: A meta-analysis of studies performed in a clinical setting," Brit. J. Dermatol., vol. 159, no. 3, pp. 669–676, 2008.
[4] H. Feng, J. Berk-Krauss, P. W. Feng, and J. A. Stein, "Comparison of dermatologist density between urban and rural counties in the United States," JAMA Dermatol., vol. 154, no. 11, pp. 1265–1271, Nov. 2018.
[5] G. Litjens et al., "A survey on deep learning in medical image analysis," Med. Image Anal., vol. 42, pp. 60–88, Dec. 2017.
[6] A. Adegun and S. Viriri, "Deep learning techniques for skin lesion analysis and melanoma cancer detection: A survey of state-of-the-art," Artif. Intell. Rev., vol. 54, no. 2, pp. 811–841, Feb. 2021.
[7] J. Yang, X. Sun, J. Liang, and P. L. Rosin, "Clinical skin lesion diagnosis using representations inspired by dermatologist criteria," in Proc. CVPR, Jun. 2018, pp. 1258–1266.
[8] T. Y. Satheesha, D. Satyanarayana, M. N. G. Prasad, and K. D. Dhruve, "Melanoma is skin deep: A 3D reconstruction technique for computerized dermoscopic skin lesion classification," IEEE J. Transl. Eng. Health Med., vol. 5, pp. 1–17, 2017.
[9] N. Gessert et al., "Skin lesion classification using CNNs with patch-based attention and diagnosis-guided loss weighting," IEEE Trans. Biomed. Eng., vol. 67, no. 2, pp. 495–503, Feb. 2020.
[10] Y. Xie, J. Zhang, Y. Xia, and C. Shen, "A mutual bootstrapping model for automated skin lesion segmentation and classification," IEEE Trans. Med. Imag., vol. 39, no. 7, pp. 2482–2493, Jul. 2020.
[11] Y. Yuan, M. Chao, and Y.-C. Lo, "Automatic skin lesion segmentation using deep fully convolutional networks with Jaccard distance," IEEE Trans. Med. Imag., vol. 36, no. 9, pp. 1876–1886, Sep. 2017.
[12] A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[13] Y. Liu et al., "A deep learning system for differential diagnosis of skin diseases," Nature Med., vol. 26, no. 6, pp. 900–908, 2020.
[14] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[15] M. Combalia et al., "BCN20000: Dermoscopic lesions in the wild," 2019, arXiv:1908.02288.
[16] N. C. F. Codella et al., "Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC)," 2017, arXiv:1710.05006.
[17] L. Yu, H. Chen, Q. Dou, J. Qin, and P.-A. Heng, "Automated melanoma recognition in dermoscopy images via very deep residual networks," IEEE Trans. Med. Imag., vol. 36, no. 4, pp. 994–1004, Apr. 2017.
[18] J. Yang, X. Wu, J. Liang, X. Sun, M.-M. Cheng, P. L. Rosin, and L. Wang, "Self-paced balance learning for clinical skin disease recognition," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 8, pp. 2832–2846, Aug. 2020.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.
[20] I. Radosavovic, R. P. Kosaraju, R. Girshick, K. He, and P. Dollár, "Designing network design spaces," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10425–10433.
[21] M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proc. ICML, Jun. 2019, pp. 10691–10700.
[22] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[23] M. Belkin, D. Hsu, S. Ma, and S. Mandal, "Reconciling modern machine learning practice and the bias-variance trade-off," 2018, arXiv:1812.11118.
[24] Z. Yu et al., "Melanoma recognition in dermoscopy images via aggregated deep convolutional features," IEEE Trans. Biomed. Eng., vol. 66, no. 4, pp. 1006–1016, Apr. 2019.
[25] S. S. Han, M. S. Kim, W. Lim, G. H. Park, I. Park, and S. E. Chang, "Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm," J. Investigative Dermatol., vol. 138, no. 7, pp. 1529–1538, 2018.
[26] T. J. Brinker et al., "Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task," Eur. J. Cancer, vol. 113, pp. 47–54, May 2019.
[27] K. M. Hosny, M. A. Kassem, and M. M. Foaud, "Classification of skin lesions using transfer learning and augmentation with Alex-net," PLoS ONE, vol. 14, no. 5, pp. 1–17, May 2019.
[28] F. P. dos Santos and M. A. Ponti, "Robust feature spaces from pre-trained deep network layers for skin lesion classification," in Proc. 31st SIBGRAPI Conf. Graph., Patterns Images (SIBGRAPI), Oct. 2018, pp. 189–196.
[29] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," 2017, arXiv:1704.04861.
[30] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. ICLR, May 2015, pp. 1–14.
[31] S. Shen et al., "Low-cost and high-performance data augmentation for deep-learning-based skin lesion classification," 2021, arXiv:2101.02353.
[32] H.-C. Shin et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285–1298, May 2016.
[33] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[34] L. Wan, M. D. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, "Regularization of neural networks using DropConnect," in Proc. ICML, Jul. 2013, pp. 2095–2103.
[35] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proc. ICML, Jun. 2013, pp. 2356–2364.
[36] G. Ghiasi, T.-Y. Lin, and Q. V. Le, "DropBlock: A regularization method for convolutional networks," in Proc. NeurIPS, Dec. 2018, pp. 10727–10737.
[37] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, "AutoAugment: Learning augmentation strategies from data," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 113–123.
[38] S. Lim, I. Kim, T. Kim, C. Kim, and S. Kim, "Fast AutoAugment," in Proc. NeurIPS, Dec. 2019, pp. 6665–6675.
[39] E. D. Cubuk, B. Zoph, J. Shlens, and Q. V. Le, "RandAugment: Practical automated data augmentation with a reduced search space," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 3008–3017.
[40] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, no. 1, pp. 321–357, Jan. 2002.
[41] A. Estabrooks, T. H. Jo, and N. Japkowicz, "A multiple resampling method for learning from imbalanced data sets," Comput. Intell., vol. 20, no. 1, pp. 18–36, 2004.
[42] C. Huang, Y. Li, C. C. Loy, and X. Tang, "Learning deep representation for imbalanced classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 5375–5384.
[43] Z.-H. Zhou and X.-Y. Liu, "On multi-class cost-sensitive learning," Comput. Intell., vol. 26, no. 3, pp. 232–257, 2010.
[44] N. Gessert et al., "Skin lesion diagnosis using ensembles, unscaled multi-crop evaluation and loss weighting," 2018, arXiv:1808.01694.
[45] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2999–3007.
[46] Y. Cui, M. Jia, T.-Y. Lin, Y. Song, and S. Belongie, "Class-balanced loss based on effective number of samples," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 9260–9269.
[47] R. Takahashi, T. Matsubara, and K. Uehara, "Data augmentation using random image cropping and patching for deep CNNs," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 9, pp. 2917–2931, Sep. 2020.
[48] B. Y. Li, Y. Liu, and X. G. Wang, "Gradient harmonized single-stage detector," in Proc. AAAI Conf. Artif. Intell., 2019, pp. 8577–8584.
[49] B. Zhou, Q. Cui, X.-S. Wei, and Z.-M. Chen, "BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9716–9725.
[50] K. Cao, C. Wei, A. Gaidon, N. Arechiga, and T. Ma, "Learning imbalanced datasets with label-distribution-aware margin loss," in Proc. NeurIPS, Dec. 2019, pp. 1–12.
[51] A. Menegola, J. Tavares, M. Fornaciali, L. T. Li, S. Avila, and E. Valle, "RECOD titans at ISIC challenge 2017," 2017, arXiv:1703.04819.
[52] B. Harangi, A. Baran, and A. Hajdu, "Classification of skin lesions using an ensemble of deep neural networks," in Proc. 40th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Jul. 2018, pp. 2575–2578.
[53] B. Harangi, "Skin lesion classification with ensembles of deep convolutional neural networks," J. Biomed. Inform., vol. 86, pp. 25–32, Oct. 2018.
[54] S. S. Chaturvedi, J. V. Tembhurne, and T. Diwan, "A multi-class skin cancer classification using deep convolutional neural networks," Multimedia Tools Appl., vol. 79, nos. 39–40, pp. 28477–28498, Oct. 2020.
[55] A. Mahbod, G. Schaefer, C. Wang, G. Dorffner, R. Ecker, and I. Ellinger, "Transfer learning using a multi-scale and multi-network ensemble for skin lesion classification," Comput. Methods Programs Biomed., vol. 193, pp. 1–9, Sep. 2020.
[56] N. Codella et al., "Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (ISIC)," 2019, arXiv:1902.03368.
[57] J. Kawahara, S. Daneshvar, G. Argenziano, and G. Hamarneh, "Seven-point checklist and skin lesion classification using multitask multimodal neural nets," IEEE J. Biomed. Health Informat., vol. 23, no. 2, pp. 538–546, Mar. 2019.
[58] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, "Squeeze-and-excitation networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 8, pp. 2011–2023, Aug. 2020.
[59] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 2261–2269.
[60] R. K. Srivastava, K. Greff, and J. Schmidhuber, "Training very deep networks," in Proc. NeurIPS, Dec. 2015, pp. 2377–2385.
[61] T. A. Putra, S. I. Rufaida, and J.-S. Leu, "Enhanced skin condition prediction through machine learning using dynamic training and testing augmentation," IEEE Access, vol. 8, pp. 40536–40546, 2020.
[62] N. Gessert, M. Nielsen, M. Shaikh, R. Werner, and A. Schlaefer, "Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data," 2019, arXiv:1910.03910.
[63] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, "MixUp: Beyond empirical risk minimization," in Proc. ICLR, 2018, pp. 1–3.
[64] R. Wu, S. Yan, Y. Shan, Q. Dang, and G. Sun, "Deep image: Scaling up image recognition," 2015, arXiv:1501.02876.
[65] T. DeVries and G. W. Taylor, "Improved regularization of convolutional neural networks with cutout," 2017, arXiv:1708.04552.
[66] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. ICLR, 2015, pp. 1–13.
[67] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.
[68] I. Goodfellow, Y. Bengio, and A. Courville. (2016). Deep Learning. [Online]. Available: http://www.deeplearningbook.org
[69] A. Graves, "Generating sequences with recurrent neural networks," 2013, arXiv:1308.0850.
[70] T. Dozat, "Incorporating Nesterov momentum into Adam," in Proc. ICLR Workshop, 2016, pp. 1–14.
[71] C. Santiago, C. Barata, M. Sasdelli, G. Carneiro, and J. C. Nascimento, "LOW: Training deep neural networks by learning optimal sample weights," Pattern Recognit., vol. 110, p. 12, Feb. 2021.
[72] Y. Kim, Y. Lee, and M. Jeon, "Imbalanced image classification with complement cross entropy," 2020, arXiv:2009.02189.
[73] J. Steppan and S. Hanke, "Analysis of skin lesion images with deep learning," 2021, arXiv:2101.03814.
[74] K. Matsunaga, A. Hamada, A. Minagawa, and H. Koga, "Image classification of melanoma, nevus and seborrheic keratosis by deep neural network ensemble," 2017, arXiv:1703.03108.
[75] I. González Díaz, "Incorporating the knowledge of dermatologists to convolutional neural networks for the diagnosis of skin lesions," 2017, arXiv:1703.01976.
[76] J. Zhang, Y. Xie, Y. Xia, and C. Shen, "Attention residual learning for skin lesion classification," IEEE Trans. Med. Imag., vol. 38, no. 9, pp. 2092–2103, Sep. 2019.
[77] Y. Xie, J. Zhang, and Y. Xia, "Semi-supervised adversarial model for benign–malignant lung nodule classification on chest CT," Med. Image Anal., vol. 57, pp. 237–248, Oct. 2019.
[78] J. Zhang, Y. Xie, Q. Wu, and Y. Xia, "Medical image classification using synergic deep learning," Med. Image Anal., vol. 54, pp. 10–19, May 2019.
[79] T. Nedelcu, M. Vasconcelos, and A. Carreiro, "Multi-dataset training for skin lesion classification on multimodal and multitask deep learning," in Proc. World Congr. Electr. Eng. Comput. Syst. Sci., Aug. 2020, pp. 1–8.
[80] J. F. Rodrigues, B. Brandoli, and S. Amer-Yahia, "DermaDL: Advanced convolutional neural networks for automated melanoma detection," in Proc. IEEE 33rd Int. Symp. Comput.-Based Med. Syst. (CBMS), Jul. 2020, pp. 504–509.