You are on page 1of 16

Received February 3, 2020, accepted February 7, 2020, date of publication February 11, 2020, date of current version February

20, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2973188

Deep Learning From Limited Training Data: Novel


Segmentation and Ensemble Algorithms Applied
to Automatic Melanoma Diagnosis
BENJAMIN ALEXANDER ALBERT
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
e-mail: balbert2@jhu.edu

ABSTRACT Deep learning algorithms often require thousands of training instances to generalize well.
The presented research demonstrates a novel algorithm, Predict-Evaluate-Correct K-fold (PECK), that
trains ensembles to learn well from limited data. The PECK algorithm is used to train a deep ensemble
on 153 non-dermoscopic lesion images, significantly outperforming prior publications and state-of-the-art
methods trained and evaluated on the same dataset. The PECK algorithm merges deep convolutional neural
networks with support vector machine and random forest classifiers to achieve an introspective learning
method. Where the ensemble is organized hierarchically, deeper layers are provided not only more training
folds, but also the predictions of previous layers. Subsequent classifiers then learn and correct the previous
layer errors by training on the original data with injected predictions for new data folds. In addition to the
PECK algorithm, a novel segmentation algorithm, Synthesis and Convergence of Intermediate Decaying
Omnigradients (SCIDOG), is developed to accurately detect lesion contours in non-dermoscopic images,
even in the presence of significant noise, hair, and fuzzy lesion boundaries. As SCIDOG is a non-learning
algorithm, it is unhindered by data quantity limitations. The automatic and precise segmentations that
SCIDOG produces allows for the extraction of 1,812 lesion features that quantify shape, color and texture.
These morphological features are used in conjunction with convolutional neural network predictions for
training the PECK ensemble. The combination of SCIDOG and PECK algorithms are used to diagnose
melanomas and benign nevi through automatic digital image analysis on the MED-NODE dataset. Evaluated
using 10-fold cross validation, the proposed methods achieve significantly increased diagnostic capability
over the best prior methods.

INDEX TERMS Convolutional neural network, feature extraction, medical diagnostic imaging, random
forest, support vector machine.

I. INTRODUCTION Physicians and dermatologists often diagnose melanoma


A. MELANOMA IMAGING using direct observation or by using a dermatoscope, a spe-
In 2019, an estimated 192,310 new cases of melanoma will cialized microscope that can control illumination, distance,
have been diagnosed in the United States alone, claiming resolution, angle, and other elements of the image capture
7,230 lives. Although it is relatively rare amongst the various process. Dermoscopic images allow the observer to perceive
types of skin cancers, it is by far the deadliest. When detected microscopic structures of the epidermis and outer dermis [2].
early, melanoma has a 98% 5-year survival rate. However, To make skin lesion analysis easier and more consistent,
when the disease metastasizes to lymph nodes and organs, dermoscopy can be used to observe high levels of detail,
5-year survival rates plummet to 64% and 23% respectively, uniformity, and repeatability. Many of the largest skin lesion
making early detection critical [1]. datasets consist of dermoscopic images [3]; consequently,
research on automatic skin lesion assessment has primarily
focused on dermoscopic imagery. However, dermatoscopes
The associate editor coordinating the review of this manuscript and are expensive and require technical training and expertise for
approving it for publication was Lefei Zhang . effective application [2], which are limiting factors to many

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
31254 VOLUME 8, 2020
B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

people globally. By contrast, non-dermoscopic images may


be obtained using inexpensive mobile phones that are widely
available in developed and developing nations. The presented
research therefore focuses on non-dermoscopic imagery.
Computer-aided diagnosis systems (CADx) that can pro-
cess non-dermoscopic images offer greater applicability and
availability; however, these systems must be robust in the
face of the substantial image variations that occur without
dermoscopy. The proposed system begins with a novel seg-
mentation algorithm to identify lesion borders despite noise,
hair, low contrast, and other obstructions.
After segmenting the lesion from surrounding skin, the pre-
sented methods extract features used by clinicians to diag-
nose skin lesions. The primary distinguishing features of
melanomas include asymmetry, border irregularity, color
variation, large diameter, and a presence of evolution, rep-
resented by the acronym ABCDE [1]. However, only asym-
metry, border irregularity, and color (ABC), can be extracted
from static images taken from unknown distances and angles.

B. CADX FOR MELANOMA


Melanoma classification is challenging because many known
characteristics of melanoma manifest in benign nevi, which FIGURE 1. Workflow of experiment methods for melanoma diagnosis and
makes melanoma detection difficult even for dermatologists depiction of the five categories: dataset, segmentation, feature
extraction, machine learning, and evaluation.
and other experts [1]. These difficulties have also hindered
many prior methods from developing accurate automatic
melanoma detectors, especially for non-dermoscopic images.
One study [4] attained high accuracy but required an inordi- dataset of 129,450 images containing 2032 fine-grained dis-
nately large dataset: over 761 times larger than the one used in ease classifications [4].
the presented research. To develop such a dataset requires sig- Despite the successes of deep learning, such methods
nificant labor, time from clinical experts, and funding. Rather are remarkably data inefficient, often requiring thousands
than manually labeling hundreds of thousands of images, of training instances to attain robust generalizations. When
the presented methods achieve high 10-fold cross validation trained on a few hundred images, state-of-the-art methods
accuracy using the standard MED-NODE dataset consisting suffer drastic losses in accuracy. The presented methods also
of 170 images. Moreover, when provided the same train- use transfer learning with Inception V3 CNN but achieves
ing data, the presented methods achieve significantly greater high accuracy with limited training data by combining deep
accuracy than state-of-the-art deep learning techniques [4]. learning with traditional image classification through a novel
The proposed methods combine traditional image ensemble training algorithm. Therefore, the novelties and
classification techniques with deep learning models. These advancements this research contributes:
methods typically involve extraction of manually engineered • PECK: a novel training algorithm for deep ensembles
features for classification by algorithms such as Random • SCIDOG: a novel lesion segmentation algorithm for
Forest (RF) [5] or Support Vector Machine (SVM) [6]. Such non-dermoscopic images
methods applied to melanoma detection [7]–[10] yielded This research also demonstrates these algorithms for
marginal classification accuracy. Deep learning, notably con- melanoma and naevus classification on the MED-NODE
volutional neural networks (CNN), have since eclipsed the dataset to achieve higher accuracy than prior research. A stan-
traditional methods in many areas [11]–[14]. dard 10-fold cross validation split of the MED-NODE dataset
Where statistical methods require manually crafted fea- is thus proposed for comparison with future works.
tures, deep learning automates feature extraction by using
the backpropagation algorithm, making them effective for II. MATERIALS AND METHODS
general image classification. Convolutional neural net- The presented research is divided into five overarching cat-
works have been applied to melanoma detection and egories: dataset, segmentation, feature extraction, machine
produced higher accuracy than statistical image classifi- learning, and evaluation. The procedure is illustrated in Fig. 1.
cation methods [4], [15]. A prior study achieved high The dataset is first randomly split for 10-fold cross valida-
accuracy of automatic skin lesion diagnosis [4] using tion (k = 10). For each iteration of cross validation, 9 folds
transfer learning [16], [17] with the Google Inception are used for training and 1 is reserved for testing. For each
V3 CNN [18]. The skin lesion classifier was trained on a image in the training set, the lesion is segmented from the

VOLUME 8, 2020 31255


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

surrounding skin to extract statistical features that quantify


shape, color, and texture. The training set is also used to train
a library of deep networks that provide multiple predictions
for each training instance. The segmentation-based features
and the network-based features (predictions) are merged and
used in the novel ensemble training algorithm called PECK.
To evaluate the methods, the testing data is processed similar
to the training data, but the machine learning algorithms are FIGURE 2. Contrast enhancement of Melanoma image using (1). The
Hadamard product is used to represent layer-wise scaling by a value
not retrained; test images are automatically segmented for indexed by the dimensions of λ. The channel multiplier λ is calculated to
feature extraction and classification by the PECK ensemble. maximize surrounding skin pixel values while not saturating the lesion
area; saturated pixels appear white and result in a lossy transformation
This process repeats for the 10 testing folds and the average whereas unsaturated pixels remain distinguishable.
results are reported with the standard error of the mean.

A. DATASET
This research was conducted using the open MED-NODE
dataset [19] created by The Department of Dermatology on the MED-NODE dataset, this would introduce data pre-
of the University Medical Center Groningen (UMCG); the viously unavailable to prior methods, which would make
dataset was initially used to train the MED-NODE com- comparisons unfair. Moreover, methods that do not require
puter assisted melanoma detection system [7]. There are manual labelling are more widely applicable and can be used
170 non-dermoscopic images, of which 70 are melanoma and with fewer training data.
100 are nevi. The image dimensions vary greatly, ranging Unsupervised deep learning for semantic segmentation
from 201×257 to 3177×1333 pixels. is feasible [26], [27], but is more often applied to images
The dataset is partitioned into 10 equally sized folds for that contain high contrast borders, particularly from large
10-fold cross validation. Each fold consists of 7 melanomas scene datasets [28], [29], [25]. When applied to objects of
and 10 nevi. For each cross-fold iteration, 9 folds are used for lower contrast, particularly skin lesions, unsupervised learn-
training and 1 fold is reserved for evaluation. The k-fold split ing yields poor results [30]. Therefore, to more accurately
is randomly generated and the image identifications per fold segment lesions, a novel segmentation algorithm is devel-
are provided in the results of this research. oped. It is a non-learning algorithm that is inherently more
robust against noise, artifacts, and low contrast. The algo-
B. SEGMENTATION rithm, Synthesis and Convergence of Intermediate Decay-
Accurate lesion analysis and classification is highly depen- ing Omnigradients (SCIDOG), adaptively removes noise and
dent on accurate segmentation, the processing of detecting artifacts while improving contrast between the lesion and sur-
the lesion contour. However, due to the wide variety of skin rounding skin. The SCIDOG algorithm recursively eliminates
lesions and the unpredictable obstructions on the skin, tra- obstructions until it determines that no further artifacts are
ditional segmentation algorithms have often failed to seg- removable from the process. The lesion boundary is preserved
ment skin lesions accurately, especially for non-dermoscopic because it is relatively low frequency compared to noise and
images. Due to the poor performance of prior algo- hairs.
rithms, including statistical region merging for skin lesions SCIDOG starts by enhancing contrast in a preprocessing
(L-SRM) [20], [21] and variants of Otsu’s threshold [22], stage. Preprocessing transforms the images so that they are
a recent study [23] developed a popular skin lesion segmen- more easily segmented; color contrast is enhanced so that the
tation algorithm to identify lesion area using texture analysis. lesion is made more prominent against the surrounding skin.
Texture Distinctiveness Lesion Segmentation (TDLS) is able Without contrast enhancement, the presented segmentation
to accurately identify lesion contours and is more robust than algorithm often includes surrounding skin in the contour due
previous algorithms. However, TDLS still requires artifact to the gradual transition in coloration from the lesion to the
removal prior to segmentation for reliable results; without surrounding skin.
artifact removal, segmentation algorithms conflate noise with Contrast is enhanced by multiplying each channel of the
lesion boundaries. Moreover, segmentation algorithms also image by a scalar determined by the channel median. As the
will often yield poor results when presented a lesion with low surrounding skin is lighter than the lesion, the surrounding
contrast against the surrounding skin. skin pixels are multiplied to reach the maximum channel
Deep learning methods have demonstrated good perfor- value, making the skin appear white. Darker pixels are not
mance on semantic segmentation challenges [14], [24], [25]. maximized, therefore appearing more prominent against the
However, such methods require manual annotations of classes surrounding skin background as seen in Fig. 2. The chan-
at the pixel level. The MED-NODE dataset does not have nel multiplier exponentially decreases as the median of the
the semantic segmentation labelling necessary for supervised channel increases. Therefore, if the image is primarily dark,
deep learning, thereby precluding such methods. Although the channel multiplier is greater so that the lighter pixels are
external manual labelling by a clinician could be performed still maximized. On the contrary, if the image is primarily

31256 VOLUME 8, 2020


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

FIGURE 3. Omnigradients from all iterations of (2). As the omnigradient sequence progresses, the ratio of the sums of pixel values between adjacent
omnigradients converges to 1.

light, the channel multiplier is reduced so that the lesion is


not maximized.
Where b denotes the number of bits per pixel channel,
the channel multiplier λ for channel z is defined:
−q
λz = q1+q
FIGURE 4. Where the input to the SCIDOG algorithm is Fig. 2b, the final
2b − 1 merged omnigradient is (a) and applying the median filter produces (b).
where q = ; M = MEDIAN (z) ∈ N0 mod 2 b
(1) Thresholding (b) generates segmentation mask depicted in (c) and the
M largest curve is depicted in (d) by a green border.
SCIDOG recursively generates a sequence of merged gradi-
ent matrices, called omnigradients, that contain fewer arti-
facts as the sequence continues, depicted in Fig. 3. on the presence of obstructions that are removed with the
Median Filter. The predicate of (2) determines when the
φ : hNm×n×3
0 , c ∈ [1, ∞)i → {0, 1}m×n omnigradients converge as in Fig. 3; this is accomplished
by comparing the grand sum of the current omnigradient of

 φ(TP, c + 1)
(2) with that of the previous omnigradient in the sequence;

m Pn
j=1 α(T , c + 1)ij


 i=1
<0 if the ratio between the grand sums is greater than the

 | P
m Pn
φ (T , c) = i=1 j=1 α(T , c)ij 0 parameter, the predicate proves false and the recursion
τ β µ S1,2,...,c , 2c + 1 |
  
terminates.




As the c index of (2) increases, the Median Filter ker-

 otherwise
where Sg = β (α (T , g) , 2g + 3) (2) nel also increases, therefore eliminating more noise and
artifacts. Because higher frequency noise and artifacts are
The matrices are generated until the next matrix is considered removed with the Median Filter more easily than lower
redundant with the previous matrix. Where T represents a frequency objects, the difference between omnigradients
lesion image of r channels {B,G,R} and T ∈Nr×m×n 0 with decreases as the sequence progresses. Moreover, the elimina-
pixel matrices of m rows and n columns already multiplied tion of higher frequency objects with lesser c values and the
by the channel multiplier λ, the root recursive function φ of retention of lower frequency objects makes the grand sums
the SCIDOG algorithm is defined in terms of T and iteration decrease as the sequence increases. As a result of contin-
index c. ually decreasing grand sums, the grand sum ratio of (2) is
As (2) recurs, a smoothing function β applies a Median a value between [0,1] where 1 indicates no change in the
Filter of increasing kernel size to the T parameter in (3). The grand sum.
multiplied lesion image T is smoothed using the standard Larger 0 values result in increased artifact removal but
Median Filter with an odd kernel size proportional to c. The decreased contour detail, whereas smaller 0 values produce
Median Filter translates a square window, with side length more detailed contours at the cost of including unwanted
equal to the kernel size, over the image. For each translation pixels and artifacts. Therefore, it can be generalized that
of the window, the center pixel value is replaced with the increasing the 0 parameter makes the final contour more
median value in the region. This operation suppresses noise restrictive whereas decreasing the 0 parameter makes the seg-
and, with a large enough kernel, can remove extraneous mentation more lenient. Through qualitative assessment of
artifacts such as hairs [31]. Entities of higher frequency are segmentation results, the 0 parameter is manually optimized
eliminated before those of lower frequency; therefore, noise to 0.995. Using this hyperparameter, the product of SCIDOG
and artifacts are removed before destroying the lesion con- is given: where Fig. 2a is the original image and the result of
tour. The smoothing function β is defined: preprocessing the original image is Fig. 2b, the result of (2)
is depicted in Fig. 4d.
β : Nm×n×3
0 → Nm×n×3
0 Equation (2) executes a series of nested functions that
β (T , κ∈ 2N1 + 3) = MEDIANFILTER (T , kernelSize =κ) (3) produce the individual omnigradients:

SCIDOG is adaptive so that the number of iterations of (2), α : Nm×n×3


0 → Nm×n
0
thus the length of the omnigradient sequence, is dependent α (T , x) = µ (σ (γ (β (T , 2x + 3)) , 12)) (4)

VOLUME 8, 2020 31257


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

After the median filter, the rank 3 tensor T is converted to a Algorithm 1 SCIDOG
rank 2 tensor Y using BGR to grayscale conversion [32], [33]: Input : T ← BGR input image (b bits per pixel channel
value)
γ : Nm×n×3
0 → Nm×n
0 Output : Y ← binary mask of lesion area
0 0
γ (T ) = T where Tij = Tij · h0.114, 0.587, 0.299i 1. q ← (2b − 1)/MEDIAN(T )
−q
(5) 2. λ ← q1+q
3. T ← T λ
Equation (6) is used to create the gradient sequence. Deriva- 4. 2 ← empty list of omnigradients
tive approximations are calculated according to the Sobel and 5. 6previous ← ∞
Scharr operators whereby the grayscale image is convolved 6. 6current ← 0
with a kernel matrix of odd size and with equal weighting of 7. i ← 1
the x and y axes. A sequence of gradients is produced in this 8. while 6current /6previous < 0
manner with increasing kernel sizes: 9. M ← β(T , 2i + 1)
ω 10. M ← β(T , 2i + 1)
σ : Nm×n , ω ∈ [3, min ({m, n})] → S ∈ Nm×n


0 0 g=1 11. M ← γ (M )
σ (Y , ω) = S where Sg 12. S ← array of size 15
(
SCHARR(Y , κ = 3) | g = 1 13. S0 ← SCHARR(M )
= (6) 14. for j ∈ {1, 2, . . . 14}
SOBEL(Y , κ = 2g + 1) | otherwise
15. Sj ← Sobel(M , 2j + 3)
Equation (7) merges the gradient sequence to form a single 16. end for
omnigradient. 17. 9 ← µ(S)
ω 18. 6previous ←P 6current
µ: S∈Nm×n0 g=1
→ Nm×n
0 19. 6current ← 9
1 Xω 20. 9 ← β(9, 2i + 1)
µ (S) = 9 where 9ij =

Sg ij (7)
ω g=1 21. 2←2+9
When the recursion of (2) terminates, the final omnigradients 22. end while
are merged using (7), depicted in Fig. 4a, the product of 23. M ← µ (2)
which is then smoothed using (3) as shown in Fig. 4b using 24. M ← β (M , 2i + 1)
a κ parameter proportional to the final c value from (2). 25. Y ← τ (M )
Fig. 4b is then thresholded to produce Fig. 4c according to 26. return Y
the following function:
τ : Nm×n
0 → {0, 1}m×n after which the Sobel operator becomes large and produces
τ (Y ) = Y 0 where: Yij0 redundant gradients. These selected hyperparameter values

Pm Pn Yxy are used for the segmentation of all lesions.
1| ≤ Yij ≤ 2b − 1

= x=1 y=1 (8)
mn
0| otherwise C. FEATURE EXTRACTION
After segmenting the lesion, features are extracted according
The lesion in Fig. 2 has a distinct boundary that only requires to the ABC acronym that delineates common attributes of
only 7 iterations of the SCIDOG algorithm when 0 = 0.99. melanoma: asymmetry, border irregularity, and color vari-
The lesion in Fig. 5 has minimal boundary contrast and ation. In addition to the ABC features, texture features
requires 17 iterations of SCIDOG. The minimal contrast and are calculated using co-occurrence matrix analysis. In total,
relatively greater noise of Fig. 5 are apparent when observing 1,812 features are extracted.
the preliminary omnigradients in Fig. 5; the lesion contour
in the omnigradient sequence begins with poor definition. 1) ASYMMETRY
As the omnigradient sequence progresses, the delta between To quantify asymmetry, the central moments of the lesion
omnigradients decreases until the final pair of omnigradients are calculated using Green’s theorem [32]. The centroid is
have less than 1% delta. Combining this sequence using (7) then determined from the calculated moments. By converting
produces a merged omnigradient that is used to successfully the lesion contours to a set of Cartesian points centered at
segment the lesion despite minimal contrast distinguishing it the lesion centroid, the following formula [10] is used to
from the surrounding skin. determine the major axis of symmetry:
Fig. 5 also illustrates the general process by which the 0
X N2
parameter and the size of the gradient sequence are chosen. argminθ (Ri (x) + RN −i (x) + Ri (y) − RN −i (y)) (9)
By observing the omnigradient deltas, one can approximate i=1
the desired delta for all lesions. It is also observed that the R has N points in the xy plane that define the lesion contour.
sequences of gradients converge at approximately size 15, Angle θ granularity is a thousandth of a radian. To calculate

31258 VOLUME 8, 2020


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

axis of symmetry and calculating the ratio of reflected sym-


metry area to the total lesion area.

2) BORDER IRREGULARITY
Three metrics representing border irregularity are also calcu-
lated according to the following equations:
 
P2 CA − A CP − P
4πA A P
A = Lesion Area, P = Lesion Perimeter
CA = Convex Hull Area, CP = Convex Hull Perimeter
(10)
The first border irregularity metric is designed so that a circle
is considered perfectly regular and has a border irregularity
index of 1 [8]. The second border irregularity metric [10]
calculates the error of the area formed by the convex hull
relative to the actual lesion area. Similarly, the third border
irregularity metric calculates the error of a perimeter formed
by the convex hull where the lesion perimeter is ground truth.

3) COLOR
Color metrics are calculated across three color spaces: BGR,
HSV, and 1976 CIE L∗ a∗ b∗ (CIELAB). Due to the varying
geometries of the color spaces and unique representations of
a pixel, a color metric, such as covariance, can be extracted
per color space. Therefore, for every defined color metric is
extracted across the three color spaces [32].
With the original image in three color spaces, the fol-
lowing metrics are calculated per channel: arithmetic mean,
geometric mean, harmonic mean, and every tenth percentile
(including min 0 and max 100). Additionally, covariance
FIGURE 5. Complete portrayal of all SCIDOG stages for the segmentation is calculated for each channel combination per color space
of naevus where 0 = 0.99. Contrast is first enhanced by multiplication (6 combinations for 3 channels). These metrics are calculated
with λ. Then, for each row enumerated by c, the median filter is applied
(β) prior to grayscaling (Y ). The next columns (S) depict the gradients for 2 sets of pixels: the pixel set contained by the closed lesion
from the Sobel and Scharr filters. The omnigradient (µ) is the contour and the complementary set. Therefore, color metrics
combination of the gradients in the same row (c). For each row, median
filtering increases as a function of c, thereby gradually decreasing the
are calculated for the lesion area and the surrounding skin
delta between the sequence of omnigradients. When the delta between area with pixel-wise locality. There are 252 color features
the current omnigradient and the previous one is less than extracted per image.
hyperparameter 0, the omnigradients are merged, subjected to the
median filter (β), and thresholded (T ). The lesion contour can then be
extracted from the binary image. 4) TEXTURE
Texture features are calculated from co-occurrence matrix
analysis [34]. For each color-space including grayscale, three
regions of interest (ROI) are sampled: the inscribed rectangle,
the major axis of symmetry, the lesion contour points are bounding rectangle, and a rectangle formed by a corner of
rotated about the centroid in iterations of one thousandth the image and the lesion contour. Each ROI offers differing
of a radian. Every iteration, after rotation, the points below texture perspectives. The inscribed rectangle is a small subset
the centroid are reflected over the horizontal axis that passes of lesion pixels compared to the bounding rectangle, but it
through the centroid. Where the rotation and reflection trans- consists exclusively of lesion pixels, so it is a small but precise
formations on the contour produces a new region, the major texture ROI. The bounding rectangle is, on the contrary,
axis of symmetry minimizes the area of this region. The area a large but imprecise texture ROI. Lastly, the surrounding
of the region created by reflection over the major axis of skin sample is used to generate reference texture metrics to
symmetry is calculated and divided by the total lesion area. observe a delta between the lesion and the surrounding skin.
This process is repeated for every angle in the set {θ, θ+10◦ , For each ROI, symmetrical co-occurrence matrix metrics
θ+20◦ , . . . , θ+170◦ }. Therefore, 18 asymmetry features are are calculated for each channel combination and for each
calculated by stepping in 10 degree intervals from the major angle in the set {0◦ ,45◦ ,90◦ ,135◦ }. The metrics for a channel

VOLUME 8, 2020 31259


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

FIGURE 6. Example automatic segmentations using SCIDOG (green) with automatically calculated: centroid (gray circle), major axis of symmetry (black),
convex hull (blue), inscribed rectangle (light blue), bounding rectangle (red), and surrounding skin sample (white). Note minor adjustments were made to
aspect ratios of individual images to fit in grid layout.

FIGURE 7. PECK introspection module illustrated with the ensemble architecture used to detect melanoma in the MED-NODE dataset. All types of the
ensemble classifiers include: Inception V3 Convolutional Neural Network (CNN), Support vector machine (SVM), and random forest with X trees and X
random features (RF-X) with unlimited depth.

combination are averaged across all 4 angles to reduce the D. DEEP NETWORK TRAINING AND COMBINATIONS
effect of rotation. Gray-level co-occurrence matrix (GLCM) An Inception V3 Convolutional Neural Network [18] instance
metrics [34] are calculated from 8-bit images whereas co- is trained for each combination of sequential folds, yielding
occurrence matrix metrics from color images [35] are calcu- 90 unique networks. The Inception V3 combinations are
lated from 7-bit images to increase speed at minimal metric conceptually divided into sets and layers, denoted nS×L ; for
accuracy loss. For each co-occurrence matrix, standard devi- instance, the set 0 layer 0 network (n0×0 ) is trained on fold
ation and 26 unique texture features are extracted as defined 0, n0×1 is trained on folds 0 and 1, and n0×8 is trained on
by prior research on GLCM features [36]. This process ulti- folds 0 through 8. Similarly, n1×0 is trained on fold 1, n1×1
mately yields 1,539 texture features. Example segmentations is trained on folds 1 and 2, and n1×8 is trained on folds
and feature extraction regions are provided in Fig. 6. 1 through 9. The sequence wraps such that n9×1 is trained

31260 VOLUME 8, 2020


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

on folds 9 and 0. This pattern repeats for sets 0 through 9 and by the softmax function, mapping the input to a classification.
for layers 0 through 8. To mitigate redundancy, the presented methods convert CNN
To improve CNN generalization and reduce overfitting due output into a single number where the magnitude represents
to limited training data, synthetic images are generated from classification probability and the sign denotes class (nega-
the original training images using combinations of reflection, tive = naevus, non-negative = melanoma).
rotation, noise, brightness, and contrast. This process yields a
training set 288 times larger than the original. Geometric aug- E. PECK
mentations include vertical and horizontal reflections as well To improve accuracy and data efficiency, the Predict-
as all instances of right-angle rotations. Noise, brightness, and Evaluate-Correct K-fold (PECK) algorithm is proposed: a
contrast augmentations are detailed respectively where the novel ensemble training algorithm that combines statistical
original image T contains pixel value x, a three-dimensional image classification techniques with deep neural networks.
vector within the 255-bit BGR color space commonly used in It has been demonstrated that supplementing deep neural
image processing library OpenCV [32], at indices i and j, and networks with manually engineered features can boost accu-
x 0 is the augmented pixel value. racy [37]. However, where prior research injected manually
0 crafted features into the final layer of a CNN, the presented
x =x+ v  methods inject previous classifier inferences into subsequent
vi ∼ N 0,σ 2 ; σ ∈ 0+ , 8

(11) classifier feature inputs. Moreover, the novel training algo-
0
x = αx α ∈ {0.8, 1.0, 1.2} (12) rithm gradually feeds more training data to enable intro-
0 (1 − α) spection concurrent with the primary classification task. The
x = αx + proposed PECK algorithm iteratively predicts and corrects
Xm X mn
n
· Tij · h0.114, 0.587, 0.299i
i=1 j=1
α ∈ {0.8, 1.0, 1.2} (13) Algorithm 2 PECK
Input :{D← k folds, N← Deep Network Combinations}
CNN models are trained over 300 epochs to prevent overfit- Output : Pfinal ← classifications for fold k − 1
ting with a batch size of 2880 and a learning rate of 0.01. The 1. S ← {D0 , . . . Dk−2 }
output of each CNN is a probability distribution, generated 2. φ ← CHOOSECOMBINATIONS(D, N , k − 2, k)
3. ρφ ← CLASSIFY(S, φ)
4. S ←INJECT(ρφ , S)
5. C1 ← initialize combinative classifier
6. C1 ←TRAIN(C1 , S)
7. for m ∈ {0, 1, . . . k − 2}
8. φ ←CHOOSECOMBINATIONS(D, N , m, k)
9. ρφ ←CLASSIFY(D, φ)
10. D ←INJECT(ρφ , D)
11. if (m < k − 2)
12. ψ ← initialize layer mclassifiers
13. ψ ←TRAIN(ψ, {D0 , . . . Dm })
14. ρψ ←CLASSIFY(D, ψ)
15. D ←INJECT(ρψ , D)
16. end if
17. end for
18. C2 ← initialize introspective classifier
19. C2 ←TRAIN(C2 , {D0 , . . . Dk−2 })
20. Pfinal ←(CLASSIFY(Dk−1 , C1 ) +CLASSIFY(Dk−1 C2 ))/2
21. return Pfinal
22. function CHOOSECOMBINATIONS(D, N , m, k)
23. C ← array of size (m+(k −2) (k −1) /2)
24. for n ∈ N
ISTRAINEDONALL (n, {D0 , . . . Dm })

25. if not
or ISTRAINEDONANY (n, Dk−1)
FIGURE 8. Complete procedural workflow and visualization of the PECK 26. C ←C +n
algorithm in context with respect to k-fold cross validation. The
C-RF-64 model trained on the Training Folds node is referred to as the
27. end if
combinative classifier whereas the I-RF-64 model trained on the 28. end for
Introspection Module is referred to as the introspective classifier. The 29. return C
mean of the combinative classifier and the introspective classifier is the
final PECK ensemble output. 30. end function

VOLUME 8, 2020 31261


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

FIGURE 9. Low-dimensional manifold embeddings of SCIDOG and network combination features using PCA (43 principal components) in conjunction
with t-SNE. Red points are melanoma and blue points are nevi. The t-SNE algorithm is trained with the Barnes-Hut clustering algorithm to further reduce
the 43-dimensional vectors to generate visualizations for assessing the separability of lesion classes prior to training machine learning classifiers.

a classification while implicitly learning how the previous training set generated by this method contains the predictions
layers learn by correcting their errors. This process is illus- on the new data folds from all prior layer classifiers. The
trated in Fig. 7 and referred to as the introspection module. classifier hyperparameters are kept constant throughout the
This module generates a training set that implicitly encodes layers so that introspective learning is feasible. Where Fig. 7
how the PECK ensemble layer classifiers and the Inception illustrates the introspection module of the PECK algorithm,
V3 neural network learn. Each iteration of the introspection the overall PECK algorithm is abstracted as follows:
module evaluates a set of folds with size proportional to the After the introspection module loop, a single RF-64 clas-
ensemble layer index. For each layer, the current training sifier is trained on the generated data set as the introspective
folds are evaluated and the next fold is predicted. These classifier. This introspective classifier output is averaged with
predictions are injected into the feature vectors of the next the combinative classifier output to produce the final PECK
fold as a class distribution probability. Therefore, the final classifications.

31262 VOLUME 8, 2020


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

For each layer of the PECK Ensemble, a subset of the space [39]. Thus, to visually analyze the data extracted
network combinations is selected to inject prediction features from SCIDOG and the Network Combinations node,
into the dataset for that k-fold iteration. For layer m of the the feature vectors are projected onto dimensions 2 and
PECK Ensemble, the network combinations consist of all net- 3 using a combination of Principal Component Analysis and
works that are not trained on the testing set, nor the networks t-SNE [40]–[42]. The 1,812 features extracted using SCI-
trained on all the folds used to train the layer m classifiers. DOG are projected separately from the 43 network com-
Networks that are trained on a subset, but not the entire set, bination predictions. For all t-SNE mappings, inputs are
of the training folds are included for feature injection. Oth- standardized by centering and dividing by standard deviation.
erwise, networks trained on all the same folds would appear Principal Component Analysis was then used to reduce
perfect to the layer m classifiers, so the classifiers would learn dimensionality to 43 principal components. The t-SNE algo-
to mimic the output of Inception V3 without further inference. rithm is then used with learning rate and perplexity hyper-
Therefore, networks trained on a total of k-1 folds are not parameters set to 1000 and 17 respectively. The maximum
used in the PECK Ensemble, but they are tested and used as number of iterations is 1000 while the termination tolerance
a comparative metric. For the zero-indexed layer m of the for the Kullback-Leibler divergence function is 10−10 . The
k-fold PECK Ensemble, there are n network combinations Barnes-Hut algorithm was used for clustering with theta =
where: n = m + (k − 2)(k − 1)/2. Due to overlapping combi- 0.001 [43]. The exaggeration hyperparameter was set to 1 and
nations across layers, in total, the network injection process 4 in combination with varying distance metrics: Euclidean
for 10-fold cross validation adds an additional 43 network and cosine.
prediction features for the combinative classifier of the PECK The preliminary visualizations depict separable vectors
ensemble. when mapped to the second and third dimensions. These visu-
Each layer performs statistical image classification on the alizations indicate the combined significance of the extracted
given training images; an SVM and 5 random forests are features, which is demonstrated when training binary classi-
trained on the manually engineered features extracted from fication models on the hyperdimensional data.
each image in addition to the network predictions. The SVM,
trained using sequential minimal optimization (SMO) [38], III. RESULTS
has a linear kernel and the random forests contain X trees Prior research [8], [7], [9], [10], [15], [44], [45] was tested on
that each train on X random features where X is in the set: the same dataset as the presented system. Spotmole [9] is a
{1,2,4,8,64}. The trees that comprise the random forests are commercial web and mobile application for skin lesion diag-
given unlimited depth. nosis. The web-based Spotmole is evaluated for diagnostic
capabilities. MED-NODE [7] is a semi-supervised system
F. FEATURE VISUALIZATION that utilizes physician input in addition to automatically
It has been demonstrated that data instances often exist extracted texture and color data to make a diagnosis. Physi-
as low-dimensional embeddings in higher dimensional cian feature extractions are regarded as annotations. The other
MED-NODE descriptors, texture and color, are separately
evaluated. The full MED-NODE system combines the anno-
tated, texture, and color descriptors to assign a final diagnosis
TABLE 1. Evaluation metrics where TP, TN, FP, FN correspond to the total
true positive, true negative, false positive, and false negative instances by majority vote. Barhoumi and Zagrouba [10] developed an
respectively. automatic system that localizes regions of interest, extracts
features that quantify characteristics similar to ABCD, and
classifies with a perceptron network. Previous research [10]

TABLE 2. Comparison of prior publications and the presented methods.

VOLUME 8, 2020 31263


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

TABLE 3. Comparison of classifiers trained and evaluated on the same 10-fold cross validation split. The Linear SVM, Bagging ensemble, Adaboost
M1 ensembles, Combinative RF-64, and RBF SVM are trained on the training folds injected with the Inception V3 predictions, which compose the data set
precisely used to train the Combinative classifier in the PECK Ensemble. The Leaked RBF SVM is optimized on the testing set solely for demonstrative
purposes; the complexity and gamma parameters are selected for greatest accuracy on the testing sets. Despite this advantage, the PECK Ensemble
outperforms the RBF SVM in all metrics.

TABLE 4. Classification results per set for Inception V3 (IV3), Combinative Random Forest (C-RF), Introspective Random Forest (I-RF) and PECK Ensemble
(PECK). Class probability distributions are normalized and centered to shift the output range from [0,1] to [−1,1] where negative values predict naevus
(N) and non-negative values predict melanoma (M). Highlighted cells contain incorrect classifications.

that surpassed the state-of-the-art systems used broad fea- convolutional neural network to produce improved results.
ture extraction and classification via SVM. The research The statistical evaluations of each system, tested on the same
team of [10] also developed the system in [15] that uses a dataset [19], are presented in Table 2.

31264 VOLUME 8, 2020


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

Some prior research cannot be used for comparison due


to inconsistent results such as stating 10-fold cross vali-
dation but only evaluating on 68 of the 170 images [45].
Another study cannot be used due to a similar inconsistency
whereby the published accuracy for 5-fold cross validation
of the 170 image dataset requires a fractional correctness
(151.419 correct images) [44].
After omitting the 2 inconsistent publications, the pre-
sented research achieves greater results than prior research
conducted on the same dataset. A few prior systems achieved
relatively high scores in individual evaluation metrics,
but these systems sacrificed other metrics. For instance,
Jafari et al. [10] attained high TPR (sensitivity) and NPV
scores but scored significantly lower on TNR (specificity)
and PPV (precision). Barhoumi and Zagrouba [8] achieved
the second highest TNR but yielded the lowest TPR. The
presented methods outperform each prior method in metric
category except TPR where it is 0.01 lower than the best [10].
The presented methods result in a total accuracy increase
of 10% from the best prior publications [7], [15] and 9%
accuracy improvement over Inception V3 [18]. FIGURE 10. Receiver operating characteristic (ROC) curves plot the
In addition to outperforming prior publications, the PECK sensitivity (TPR) against the specificity (TNR) to demonstrate classifier
Ensemble yields superior results to other classifiers tested capability with respect to a comprehensive threshold of the classifier
output distributions. A baseline random classifier is represented by the
on the same extracted and injected feature data. The tested dashed blue line with AUC = 0.5. A perfect classifier has AUC = 1. The
classifiers include: an SVM with linear kernel trained with AUC per classifier is parenthesized in the plot legend.
SMO, Bagging [46] ensemble with 10% bag size and 100 iter-
ations of linear SVM classifier, Adaboost M1 ensemble
with 1000 iterations of decision stump classifier, Adaboost prior methods, but also suggests greater reliability from more
M1 ensemble with 100 iterations of decision stump classifier, rigorous evaluation techniques.
and Random Forest with 64 trees and 64 random features Analysis of the output distribution per class is presented by
and unlimited depth. Additionally an SVM with radial-basis plotting the Receiver Operating Characteristic (ROC) curve
function (RBF) kernel was trained using SMO and grid- in Fig. 10 and Precision-Recall (PR) Curves in Fig. 11.
search optimized on the testing set where the complexity The respective area under the curve (AUC) is presented
parameter was searched in the set {2−10 ,2−9 ,. . . ,210 } and the per classifier parenthesized in the legend. The evaluated
gamma parameter of the kernel was concurrently searched classifiers include Inception V3, the combinative classifier
in the set {10−10 ,10−9 ,. . . ,1010 }. This SVM is optimized on (C-RF), the introspective classifier (I-RF), and the PECK
the testing set instead of a validation set to demonstrate the ensemble. Perfect classifiers yield an AUC = 1 for ROC
optimal complexity and gamma parameters in the grid search, and PR curves. A baseline random classifier for ROC curves
so the leaked RBF SVM is for demonstrative purposes only. would yield a straight line such that TPR = 1 − TNR, produc-
It is observed that the PECK Ensemble outperforms all other ing an AUC = 0.50. A baseline classifier for the PR curves
tested classifiers. Table 3 metrics are formatted as the mean represent the ratio of class instances; the lines where PPV
across 10 fold cross validation with the respective standard equals 7/17 and 10/17 indicate the baseline classifiers and
error of the mean. Given metric x and N folds, Table 3 values corresponding AUC for melanoma and naevus respectively.
are calculated as follows: The baseline random classifiers are illustrated with dashed
 s  lines for each threshold curve plot. AUC per classifier is
N N
 N
2 calculated by integration using trapezoidal summation with
µx ± SEµx = 1 1
P P P
 xi ± xi − xi  (14) threshold steps of 0.001. The threshold curves indicate that
N N
i=1 i=1 i=1 the PECK ensemble outperforms the other classifiers with
regard to ROC and both PR plots.
The standard error of the mean therefore functions as a confi- The PECK ensemble is also tested against the Inception
dence interval for the reported metric means across 10-fold V3 instances trained on 9 folds, which were not used for
cross validation. These confidence intervals are calculable prediction feature injections. It is observed that in all but
with multiple trials, achievable through cross fold validation, one instance (set 3 melanoma MED-NODE image iden-
so they are not presented in prior research that primarily con- tifier 2143742), when the PECK ensemble is incorrect,
ducted single training and testing splits and no cross-fold vali- Inception V3 is also incorrect. The results indicate that the
dation. Therefore, the proposed solution not only outperforms PECK ensemble corrects 16 of 30 misclassified instances

VOLUME 8, 2020 31265


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

FIGURE 11. Precision-Recall (PR) curves plot the precision (PPV) against the recall (TPR) across multiple thresholds of the classifier output distributions.
Baseline classifiers are represented by horizontal dashed lines at precision equal to the respective class ratio: melanoma = 7/17 and naevus = 10/17. A PR
curve is generated for each class reconsidering class positivity; precision for melanoma regards the melanoma class index as positive whereas precision
for nevi considers naevus as the positive class. Similar designation of positivity is applied for determining recall. Perfect classifiers have AUC = 1.

TABLE 5. MED-NODE image identifiers per set with class labels. Ordering is preserved so that values in Table 4 can be cross-referenced with the image
identifier.

while adding 1 misclassification. Where highlighted cells in distributions per classifier are provided in Table 4. The results
Table 4 present misclassifications, the corrective capabilities in Table 4 can be cross-referenced to the precise MED-NODE
of the PECK ensemble become apparent; PECK not only image by the identifiers presented in Table 5 where ordering
corrects over half of Inception V3 misclassifications, but it is preserved for each set.
also correctly classifies 74% of the instances that prompt dis-
agreement between the combinative and introspective Ran- IV. CONCLUSION
dom Forest classifiers. Out of 38 discrepancies between the The similarities between melanomas and nevi make lesion
two Random Forest classifiers, only 10 result in misclas- classification challenging. The difficulty of this chal-
sifications by the PECK ensemble. The signed probability lenge is compounded by using a limited training set of

31266 VOLUME 8, 2020


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

non-dermoscopic images; non-dermoscopic images often study suggest that the introspective capabilities of the PECK
include variable noise, glare, artifacts, illumination, distance, ensemble significantly boost classification accuracy.
and angle. Although these factors are controlled by dermato- For robust comparison, the evaluation metrics are averaged
scopes, such equipment necessitates greater cost and exper- over 10-fold cross validation and provided with the standard
tise for application. Thus, the proposed methods focus on errors of the means as confidence intervals. Prior methods
non-dermoscopic imagery to develop a more widely appli- precluded this evaluation technique by primarily conducting
cable CADx system. single training and testing splits. The presented methods also
To analyze lesions despite the variability of non- include comprehensive descriptors such as F1 and MCC for
dermoscopic imagery and without pixel-wise class label- more direct comparison between classifiers and methods. The
ing, the segmentation algorithm SCIDOG is proposed. As a results and evaluation techniques suggest that the proposed
non-learning algorithm, SCIDOG automatically eliminates methods are not only superior with regard to diagnostic capa-
noise and artifacts while converging on the lesion boundary. bilities, but also with respect to reliability.
SCIDOG is demonstrated to be a robust algorithm for not
only variable image quality, but also the wide variety of V. FUTURE WORK
lesion appearances. The proposed algorithm eliminates hairs, Future work aims to quantify feature importance from a
detects fuzzy lesion boundaries and those with low contrast, dermatological perspective. Such work explores the signifi-
identifies lesions with distinct masses, and operates under any cance of asymmetry, border irregularity, color variation, and
resolution. The reliability of SCIDOG allows for the accu- texture metrics on the class distribution in high dimensional
rate extraction of 1,812 morphological features that describe space. The presented methods demonstrate such exploration
shape, color, and texture. After feature extraction, machine by visually comparing the separability of classes in the
learning is used for classification. low-dimensional manifold embeddings of the manually engi-
However, machine learning classifiers and state-of-the-art neered features and automatically extracted network features
methods suffer dramatic losses in accuracy when trained on in Fig. 9. Future work also aims to further dermatological
relatively limited data. Of the prior machine learning algo- emphasis by applying the proposed methods to multiclass
rithms evaluated, convolutional neural networks performed skin lesion classification [47].
the best. The previous best publication [15] used a CNN to In addition to dermatology, the problem of limited training
achieve 0.81 ACC with 0.61 MCC. Another skin lesion clas- data extends to other medical imaging challenges, some of
sification study trained the Inception V3 CNN architecture which include: lung cancer classification [48], brain tumor
on a massive dataset [4]; however, the presented methods classification, [49], prostate cancer segmentation [50], bone
demonstrate that the same Inception V3 CNN architecture fracture detection [51], bladder wall segmentation [52], and
only achieves 0.82 ACC with 0.65 MCC on the MED-NODE retinal vessel segmentation [53].
dataset. To improve classification accuracy from limited The presented methods can also be adapted for a vari-
training data, the PECK algorithm is developed. It merges ety of applications beyond medical imaging. For example,
the Inception V3 CNN architecture with Random Forest gene selection and classification of microarray data [54]
and Support Vector Machine classifiers in a deep ensemble could benefit from the PECK ensemble introspective learning
structure. The ensemble achieves 0.91 ACC with 0.83 MCC, to improve accuracy from highly dimensional but limited
significantly outperforming the best prior publications. training data. Other disciplines can benefit as well, such
The PECK algorithm enables introspection by layer- as automatic fault detection systems [55]–[57], modeling in
ing classifiers and propagating outputs such that classifiers chemoinformatics [58], natural language processing [59], and
implicitly learn the classification task concurrent to learning adaptive wireless networks [60]. The presented methods can
from previous layer errors. To demonstrate that increased be adapted generally for supervised learning and applied to
performance is due to the introspective capabilities rather such fields that have challenges with limited data.
than the addition of Inception V3 predictions, multiple clas-
sifier and ensemble algorithms are trained on the same data. REFERENCES
Additionally, an ablation study is performed to compare the [1] Skin Cancer Foundation. (2016). Melanoma Overview. [Online]. Avail-
able: http://www.skincancer.org
results of the individual components of the PECK ensemble. [2] A. Herschorn, ‘‘Dermoscopy for melanoma detection in family practice,’’
The study compares the results of the Inception V3 CNN, Can. Family Phys., vol. 58, no. 7, pp. 740–745, 2012.
combinative Random Forest, introspective Random Forest, [3] (2019). ISIC Archive. [Online]. Available: https://www.isic-archive.com
[4] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau,
and PECK ensemble. The PECK ensemble and the introspec- and S. Thrun, ‘‘Dermatologist-level classification of skin cancer with deep
tive Random Forest yield significantly greater performance neural networks,’’ Nature, vol. 542, no. 7639, pp. 115–118, Feb. 2017.
than all other classifiers. The classification probability distri- [5] L. Breiman, ‘‘Random forests,’’ Mach. Learn., vol. 45, no. 1, pp. 5–32,
Jan. 2001.
butions per instance are also analyzed to reveal the corrective [6] C. Cortes and V. Vapnik, ‘‘Support-vector networks,’’ Mach. Learn.,
capabilities of the PECK ensemble; the PECK ensemble vol. 20, no. 3, pp. 273–297, Sep. 1995.
corrects 16 of the 30 Inception V3 misclassifications while [7] I. Giotis, N. Molders, S. Land, M. Biehl, M. F. Jonkman, and
N. Petkov, ‘‘MED-NODE: A computer-assisted melanoma diagnosis sys-
only adding 1 new misclassification. Ultimately, the com- tem using non-dermoscopic images,’’ Expert Syst. Appl., vol. 42, no. 19,
parison between classifiers and the results of the ablation pp. 6578–6585, Nov. 2015.

VOLUME 8, 2020 31267


B. A. Albert: Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms

[8] W. Barhoumi and E. Zagrouba, "A prelimary approach for the automated recognition of malignant melanoma," Image Anal. Stereology J., vol. 23, no. 2, pp. 121–135, 2004.
[9] C. Munteanu and S. Cooclea. (2009). Spotmole—Melanoma Control System. [Online]. Available: https://www.spotmole.com
[10] M. H. Jafari, S. Samavi, N. Karimi, S. M. R. Soroushmehr, K. Ward, and K. Najarian, "Automatic detection of melanoma using broad extraction of features from digital images," in Proc. 38th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Aug. 2016, pp. 1357–1360.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Neural Inf. Process. Syst. Conf., Lake Tahoe, NV, USA, 2012, pp. 1097–1105.
[12] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[13] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent., San Diego, CA, USA, 2015, pp. 1–14.
[14] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
[15] E. Nasr-Esfahani, S. Samavi, N. Karimi, S. M. R. Soroushmehr, M. H. Jafari, K. Ward, and K. Najarian, "Melanoma detection by analysis of clinical images using convolutional neural network," in Proc. 38th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Aug. 2016, pp. 1373–1376.
[16] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proc. Neural Inf. Process. Syst. Conf., Montreal, QC, Canada, 2014, pp. 3320–3328.
[17] M. Huh, P. Agrawal, and A. A. Efros, "What makes ImageNet good for transfer learning?" 2016, arXiv:1608.08614. [Online]. Available: https://arxiv.org/abs/1608.08614
[18] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 2818–2826.
[19] I. Giotis, N. Molders, S. Land, M. Biehl, M. Jonkman, and N. Petkov. (2015). MED-NODE Database. Department of Dermatology, University Medical Center Groningen, Groningen, The Netherlands. [Online]. Available: http://www.cs.rug.nl/~imaging/databases/melanoma_naevi/
[20] M. Emre Celebi, H. A. Kingravi, H. Iyatomi, Y. A. Aslandogan, W. V. Stoecker, R. H. Moss, J. M. Malters, J. M. Grichnik, A. A. Marghoob, H. S. Rabinovitz, and S. W. Menzies, "Border detection in dermoscopy images using statistical region merging," Skin Res. Technol., vol. 14, no. 3, pp. 347–353, Aug. 2008.
[21] M. H. Jafari, E. Nasr-Esfahani, N. Karimi, S. M. R. Soroushmehr, S. Samavi, and K. Najarian, "Extraction of skin lesions from non-dermoscopic images for surgical excision of melanoma," Int. J. Comput. Assist. Radiol. Surg., vol. 12, no. 6, pp. 1021–1030, Jun. 2017.
[22] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979.
[23] J. Glaister, A. Wong, and D. A. Clausi, "Segmentation of skin lesions from digital images using joint statistical texture distinctiveness," IEEE Trans. Biomed. Eng., vol. 61, no. 4, pp. 1220–1230, Apr. 2014.
[24] Q. Zhu, B. Du, and P. Yan, "Boundary-weighted domain adaptive neural network for prostate MR image segmentation," IEEE Trans. Med. Imag., to be published.
[25] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba, "Scene parsing through ADE20K dataset," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 5122–5130.
[26] X. Xia and B. Kulis, "W-Net: A deep model for fully unsupervised image segmentation," 2017, arXiv:1711.08506. [Online]. Available: http://arxiv.org/abs/1711.08506
[27] P. Bhatt, S. Sarangi, and S. Pappula, "Unsupervised image segmentation using convolutional neural networks for automated crop monitoring," in Proc. 8th Int. Conf. Pattern Recognit. Appl. Methods, 2019, pp. 1–7.
[28] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and L. C. Zitnick, "Microsoft COCO: Common objects in context," in Proc. Eur. Conf. Comput. Vis., Zürich, Switzerland, 2014, pp. 740–755.
[29] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The cityscapes dataset for semantic urban scene understanding," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 3213–3223.
[30] B. Bozorgtabar, S. Sedai, P. K. Roy, and R. Garnavi, "Skin lesion segmentation using deep convolution networks guided by local unsupervised learning," IBM J. Res. Dev., vol. 61, nos. 4–5, pp. 6:1–6:8, Jul. 2017.
[31] M. E. Celebi, H. Iyatomi, G. Schaefer, and W. V. Stoecker, "Lesion border detection in dermoscopy images," Comput. Med. Imag. Graph., vol. 33, no. 2, pp. 148–153, Mar. 2009.
[32] G. Bradski, "The OpenCV Library," Dr. Dobb's J. Softw. Tools Prof. Program., vol. 25, no. 11, pp. 120–123, 2000.
[33] Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide-Screen 16:9 Aspect Ratios, document ITU-R Rec. BT.601-7, International Telecommunications Union, Geneva, Switzerland, 2011.
[34] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, Nov. 1973.
[35] K. Hossain and R. Parekh, "Extending GLCM to include color information for texture recognition," in Proc. Int. Conf. Modeling, Optim., Comput. (ICMOS), West Bengal, India, 2010, pp. 1–6.
[36] J. J. M. Van Griethuysen, A. Fedorov, C. Parmar, A. Hosny, N. Aucoin, V. Narayan, R. G. H. Beets-Tan, J.-C. Fillion-Robin, S. Pieper, and H. J. W. L. Aerts, "Computational radiomics system to decode the radiographic phenotype," Cancer Res., vol. 77, no. 21, pp. e104–e107, Nov. 2017.
[37] M. N. Kashif, S. E. A. Raza, K. Sirinukunwattana, M. Arif, and N. Rajpoot, "Handcrafted features with convolutional neural networks for detection of tumor cells in histology images," in Proc. IEEE 13th Int. Symp. Biomed. Imag. (ISBI), Apr. 2016, pp. 1029–1032.
[38] J. C. Platt, "Sequential minimal optimization: A fast algorithm for training support vector machines," Microsoft Res., Redmond, WA, USA, Tech. Rep. MSR-TR-98-14, 1998.
[39] Z. Liu, J. Wang, G. Liu, and L. Zhang, "Discriminative low-rank preserving projection for dimensionality reduction," Appl. Soft Comput., vol. 85, Dec. 2019, Art. no. 105768.
[40] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
[41] MATLAB 9.6 R2019a Toolbox Statistics and Machine Learning, MathWorks, Natick, MA, USA, 2019.
[42] Z. Liu, Z. Lai, W. Ou, K. Zhang, and R. Zheng, "Structured optimal graph based sparse feature extraction for semi-supervised learning," Signal Process., vol. 170, May 2020, Art. no. 107456.
[43] J. Barnes and P. Hut, "A hierarchical O(N log N) force-calculation algorithm," Nature, vol. 324, no. 6096, pp. 446–449, Dec. 1986.
[44] T. T. K. Munia, M. N. Alam, J. Neubert, and R. Fazel-Rezai, "Automatic diagnosis of melanoma using linear and nonlinear features from digital image," in Proc. 39th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Seogwipo, South Korea, Jul. 2017, pp. 4281–4284.
[45] K. M. Hosny, M. A. Kassem, and M. M. Foaud, "Classification of skin lesions using transfer learning and augmentation with Alex-net," PLoS ONE, vol. 14, no. 5, May 2019, Art. no. e0217293.
[46] L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996.
[47] A. Hekler, J. S. Utikal, A. H. Enk, A. Hauschild, M. Weichenthal, R. C. Maron, C. Berking, S. Haferkamp, J. Klode, D. Schadendorf, B. Schilling, T. Holland-Letz, B. Izar, C. Von Kalle, S. Fröhling, and T. J. Brinker, "Superior skin cancer classification by the combination of human and artificial intelligence," Eur. J. Cancer, vol. 120, pp. 114–121, Oct. 2019.
[48] J. Kuruvilla and K. Gunavathi, "Lung cancer classification using neural networks for CT images," Comput. Methods Programs Biomed., vol. 113, no. 1, pp. 202–209, Jan. 2014.
[49] M. Sajjad, S. Khan, K. Muhammad, W. Wu, A. Ullah, and S. W. Baik, "Multi-grade brain tumor classification using deep CNN with extensive data augmentation," J. Comput. Sci., vol. 30, pp. 174–182, Jan. 2019.
[50] Q. Zhu, B. Du, B. Turkbey, P. Choyke, and P. Yan, "Exploiting interslice correlation for MRI prostate image segmentation, from recursive neural networks aspect," Complexity, vol. 2018, pp. 1–10, 2018.
[51] Y. D. Pranata, K.-C. Wang, J.-C. Wang, I. Idram, J.-Y. Lai, J.-W. Liu, and I.-H. Hsieh, "Deep learning and SURF for automated classification and detection of calcaneus fractures in CT images," Comput. Methods Programs Biomed., vol. 171, pp. 27–37, Apr. 2019.


[52] Q. Zhu, B. Du, P. Yan, H. Lu, and L. Zhang, "Shape prior constrained PSO model for bladder wall MRI segmentation," Neurocomputing, vol. 294, pp. 19–28, Jun. 2018.
[53] A. Dasgupta and S. Singh, "A fully convolutional neural network based structured prediction approach towards the retinal vessel segmentation," in Proc. IEEE 14th Int. Symp. Biomed. Imag. (ISBI), Melbourne, VIC, Australia, Apr. 2017, pp. 248–251.
[54] R. Díaz-Uriarte and S. Alvarez de Andrés, "Gene selection and classification of microarray data using random forest," BMC Bioinf., vol. 7, p. 3, Dec. 2006.
[55] Y. Wu, B. Jiang, and Y. Wang, "Incipient winding fault detection and diagnosis for squirrel-cage induction motors equipped on CRH trains," ISA Trans., to be published.
[56] F. Lv, C. Wen, Z. Bao, and M. Liu, "Fault diagnosis based on deep learning," in Proc. Amer. Control Conf. (ACC), Boston, MA, USA, Jul. 2016, pp. 6851–6856.
[57] Y. Wu, B. Jiang, and N. Lu, "A descriptor system approach for estimation of incipient faults with application to high-speed railway traction devices," IEEE Trans. Syst., Man, Cybern., Syst., vol. 49, no. 10, pp. 2108–2118, Oct. 2019.
[58] J. B. O. Mitchell, "Machine learning methods in chemoinformatics," WIREs Comput. Mol. Sci., vol. 4, no. 5, pp. 468–481, Sep. 2014.
[59] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, "Supervised learning of universal sentence representations from natural language inference data," 2017, arXiv:1705.02364. [Online]. Available: http://arxiv.org/abs/1705.02364
[60] C. Jiang, H. Zhang, Y. Ren, Z. Han, K.-C. Chen, and L. Hanzo, "Machine learning paradigms for next-generation wireless networks," IEEE Wireless Commun., vol. 24, no. 2, pp. 98–105, Apr. 2017.

BENJAMIN ALEXANDER ALBERT has a primary appointment with the Department of Biomedical Engineering and a secondary appointment with the Department of Computer Science, Johns Hopkins University. He has previously conducted deep learning and imaging research with the JHU Applied Physics Laboratory. He is a member of the JHU Whiting School of Engineering, Institute for Computational Medicine. He also serves as a reviewer for the IEEE ACCESS and IEEE TRANSACTIONS ON MEDICAL IMAGING journals.
