
Biomedical Signal Processing and Control 51 (2019) 59–72


Automated detection of melanocytes related pigmented skin lesions: A clinical framework

Sameena Pathan a, K. Gopalakrishna Prabhu b, P.C. Siddalingaswamy a,∗

a Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
b Department of Electronics and Communication Engineering, Faculty of Engineering, Manipal University Jaipur, India

Article history:
Received 17 September 2018
Received in revised form 16 January 2019
Accepted 9 February 2019

Keywords: Benign lesions, Dermoscopy, Malignant lesions, Melanocytic nevi, Pigment network, Shape, Color, Texture

Abstract: A clinically oriented Computer-Aided Diagnostic (CAD) system is of prime importance for the diagnosis of melanoma, since the deadly disease is associated with high morbidity and mortality. Unfortunately, the development of CAD tools is hampered by several issues, such as (i) smooth boundaries between the lesion and the surrounding skin, (ii) subtlety of features between melanoma and non-melanoma skin lesions, and (iii) lack of reproducibility of CAD systems due to complexity. The proposed system aims to address the aforementioned issues. First, the lesion regions are localized by incorporating chroma based deformable models. Second, the lesion patterns are analyzed to detect various dermoscopic criteria. Further, a robust ensemble architecture is developed using dynamic classifier selection techniques to detect malignancy. Quantitative analysis is performed on two diverse datasets (ISBI and PH2), achieving an accuracy of 88% and 97%, sensitivity of 95% and 97%, and specificity of 82% and 100% for the ISBI and PH2 datasets respectively.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

The malignant tumor due to abnormal growth of melanocytes is known as melanoma. It can originate in any part of the body and is the most lethal form of skin cancer [1]. Although melanoma accounts for less than 20% of all skin cancer cases, it has a high mortality rate. The prevalence of cutaneous melanoma has increased rapidly over the past 30 years in the Caucasian population [2]. Dermoscopy is an in vivo assessment technique performed using the dermoscope for observation of the structures inside the epidermis and dermis. It is an essential tool for dermatologists and plastic surgeons attempting early diagnosis of skin related disorders [3]. However, dermoscopic analysis suffers from significant interobserver disagreement, as small observations go unnoticed.

The analysis of digital dermoscopy images can be performed using various methodologies, either driven by the medical expert's perspective or by computational intelligence. The computer aided diagnostic process involves the extraction of a large number of low-level elements from the dermoscopic images which allow the discrimination between malignant melanoma and benign Pigmented Skin Lesions (PSL) [4]. The construction of the classifier mainly depends on the features selected based on the analysis strategy.

A complete CAD tool for melanoma diagnosis mainly requires four major steps: lesion segmentation, feature extraction, feature selection and lesion classification. Although these steps are performed sequentially, the major concern is that the segmentation accuracy has a major decisive influence on the lesion diagnosis. Irrespective of the surrounding skin, the Region of Interest (ROI) characteristics predominantly affect the lesion classification [5]. However, due to the smooth boundaries between the lesion and normal skin, hair artifacts, and variabilities in skin texture, the accuracy of the classifier is adversely affected. Artifacts that might change the morphology of the dermoscopic image, thereby hindering the development of a CAD system for lesion diagnosis, need to be eliminated. Fig. 1 provides an illustration of the dermoscopic image analysis.

∗ Corresponding author.
E-mail addresses: sameena.pathan.k@gmail.com (S. Pathan), gk.prabhu@manipal.edu (K. Gopalakrishna Prabhu), pcs.swamy@manipal.edu (P.C. Siddalingaswamy).

https://doi.org/10.1016/j.bspc.2019.02.013
1746-8094/© 2019 Elsevier Ltd. All rights reserved.

Fig. 1. Illustration of Melanocytic lesion analysis.

1.1. Related work

Several approaches have been proposed to design CAD systems for melanoma diagnosis [6–17]. These approaches can be divided into two categories based on the features used for predicting the lesion type: (i) dermoscopic based approaches and (ii) pattern recognition based approaches. CAD tools that mimic the visual perception of the dermatologist belong to the former category; this involves the extraction of dermoscopic structures such as pigment network, blue-white veil, dots and globules. The extraction of subtle information from the region of interest belongs to the latter category. Among the significant works reported in literature, an interesting work was reported by Celebi et al. [6], wherein color and texture features were extracted from the segmented lesion area for classification. Similarly, Schaefer et al. [7] used an ensemble of color, shape and texture features. A classification accuracy of 83.4% was achieved by Barata et al. [8] by employing color constancy models. The method proposed in [8] concentrates mainly on the clinical aspects of color in dermoscopic images, while missing out the important role of shape features.

Pennisi et al. [9] proposed an automatic skin lesion segmentation algorithm employing Delaunay triangulation. Although the segmentation approach produced better results for benign lesions, the segmentation accuracy was relatively lower for malignant lesions. Further, a classification sensitivity of 93.5% and specificity of 87.1% was obtained by extracting geometrical and color features. A classification accuracy of 90.3% was achieved by Maglogiannis et al. [10] by extracting features from dots/globules, which were segmented using a multi-resolution approach. However, the classification accuracy could be enhanced by combining the dot-specific features with region based descriptors. A recent study by Barata et al. [11] employed color and texture features while missing out the most important shape features. Tajeddin et al. [12] classified malignant lesions from benign using a set of shape, color and texture features extracted from the lesion's peripheral region; a pseudo-polar space was used for mapping the peripheral region pixels. Jukic et al. [13] employed tensor decomposition for analyzing color data from autofluorescent RGB images, since the color of the lesion is an important feature for discriminating malignant melanocytic lesions. Oliveira et al. [14] built an ensemble classification model using feature subset selection from shape, color and texture features. However, the method was computationally intensive and lacked pre-processing and ROI extraction algorithms. Kasmi et al. [17] implemented the ABCD dermoscopic rule for computing the total dermoscopic score for classification of skin lesions in conjunction with Geodesic Active Contours (GAC); however, stopping criteria constraints resulted in misclassification.

In contrast to the handcrafted feature extraction approaches, deep learning based approaches aim to automatically extract features in a hierarchical fashion. In recent years, several researchers have proposed deep learning architectures for skin lesion classification [15,16]. Yu et al. [15] employed deep networks in conjunction with residual networks of more than 50 layers for segmentation and classification of melanoma skin lesions. Although residual networks were used to overcome the network degradation problem, the rate of correct classification of melanoma lesions was reported to be 50.7%. Codella et al. [16] employed an ensemble of machine learning and deep learning frameworks: a fully convolutional U-net is used for obtaining segmented masks, and hand-crafted features, sparse coding techniques and deep residual networks are used for melanoma recognition. Although deep learning architectures have reportedly improved classification accuracies by learning from large amounts of data, optimizing the network parameters to reduce the computational complexity is a challenging issue. Despite the significant advances, there remains scope for improvement in lesion segmentation and classification.

1.2. Contributions

The proposed system focusses on the development of a dermoscopically inspired framework for distinguishing benign and malignant skin lesions. The proposed method differs from the existing related works reported in literature [6–17] in the following ways.

• The system takes into account the global and dermoscopic features for differentiating the pigmented skin lesions.
• The chroma based deformable model takes into account the chrominance properties of the lesion, and is thus robust in the presence of background variabilities.
• The ensemble of dynamic classifier architectures employed has proven to achieve better accuracy in identifying benign and malignant lesions in contrast to the computationally intensive deep learning approaches.

The organization of the paper is as follows. In Section 2 the overview of the proposed system is presented; its subsections briefly describe the methodology used for building the CAD system. The results obtained are discussed in Section 3, followed by the conclusion in Section 4.

2. Materials and methods

The developed CAD tool for melanoma diagnosis mainly consists of four major steps: pre-processing and lesion localization, feature extraction, feature selection and classification. The following sections provide an explanation of the methods proposed for developing the CAD tool.

2.1. Dataset

Two datasets, namely PH2 [18] and ISBI 2016 [19], were used for the extraction of features and classification. The PH2 dataset consists of 200 pigmented skin lesions (160 benign and 40 malignant), and the ISBI dataset consists of 1279 pigmented skin lesions (1033 benign and 246 malignant).

2.2. Pre-processing and lesion localization

The presence of hair greatly affects the accuracy of the segmentation algorithm. Thus, the primary step in the development of a CAD tool is the detection and exclusion of dermoscopic hair. Scalar or vector dermoscopic images may be used for pre-processing. Several hair detection and exclusion methods have been discussed in the past [20–22]. However, these techniques were designed based on the assumption that the color of the hair is much darker than the skin and lesion, whereas most lesions are brown or black in color due to the localization of melanin in the upper and lower epidermis [23]. This signifies that attributes specific to the properties of dermoscopic hair need to be included in the hair detection algorithm. To address this issue, the hair detection procedure takes into account the width, magnitude and direction of the hair shafts. As these properties are unique to dermoscopic hair, overlap between the lesion features and hair features is eliminated. Directional Gaussian filters are used to identify the hair artifacts: a group of 16 directional filters were applied to the luminance component of the perceptually uniform CIE L*a*b color space. A detailed explanation of the hair detection method used is given in [24].

The lesions are localized using chroma based deformable models. The performance of geometric deformable models mainly relies on the initial conditions used and the evolution of the speed function. Color plays a very important role in dermoscopy, since the color of melanin mainly depends on the extent of its localization in the skin. Thus, the proposed segmentation approach exploits this domain knowledge by considering the chroma component rather than the conventional RGB channels.

The speed function is defined to segment lesions in dermoscopic images with variabilities in intensities. The basic idea of Chan-Vese is to partition the given image I(x) into foreground and background [25]. The deformation of the contour towards the lesion boundaries is through minimization of the energy function E. The Chan-Vese energy function is given in (1).

E = λ1 ∫ (I(x) − c1)² dx + λ2 ∫ (I(x) − c2)² dx + μ Length(C) + ν Area(C)    (1)

where λ1, λ2, μ and ν are fixed parameters as given in [18]. Using the level set to represent the curve C, i.e., taking C as the zero level set of a Lipschitz function φ(x), the unknown variable C is replaced by φ(x) and the energy function is re-written as given in (2).

E = λ1 ∫ (I(x) − c1)² H(φ(x)) dx + λ2 ∫ (I(x) − c2)² (1 − H(φ(x))) dx + μ ∫ δ(φ(x)) |∇φ(x)| dx + ν ∫ H(φ(x)) dx    (2)

where H(φ) and δ(φ) are the regularized Heaviside and Dirac functions written as in (3):

H(φ) = (1/2) (1 + (2/π) arctan(z/ε))    (3)

and

δ(φ) = (1/π) · ε / (ε² + z²)

Here, c1 and c2 are obtained by differentiating equation (2) with respect to c1 and c2, keeping φ(x) fixed. Correspondingly,

c1 = ∫ H(φ) I(x) dx / ∫ H(φ) dx  and  c2 = ∫ (1 − H(φ)) I(x) dx / ∫ (1 − H(φ)) dx

The proposed energy function is defined as given in (4):

E_Total = E_Global + E_Chroma    (4)

E_Global is calculated from the grayscale dermoscopic image. E_Chroma is calculated from the CIE L*a*b color space using the chrominance component of the dermoscopic image I(x, y). CIE L*a*b is more convenient than the tristimulus values with respect to its conceptual relationship to visual perception; additionally, it provides the means to measure the difference between any two colors. The color difference between two colors is calculated using co-ordinate geometry as given in (5).

ΔE = [ΔL² + Δa² + Δb²]^(1/2)    (5)

where L is the lightness component, and a and b indicate the chroma. The lightness attribute L gives the measure of grey-scale from black to white. As the perceived color of an object mainly depends on the nature of the illuminating light, in some circumstances the difference between two colors may be considered in terms of the differences in chroma (a, b). Fig. 2 provides an illustration of the color space selection. Skin lesions should be recognizable by their color dissimilarity from normal skin irrespective of the capturing device. Thus, in the proposed method the chroma component given in (6) is considered for lesion segmentation.

C(x, y) = √(a² + b²)    (6)

Assuming that the intensity values in the lesion region follow a Gaussian distribution, C(x, y) is represented as given in (7):

C_G(x, y) = exp( −(C(x, y) − μ_L)² / (2σ_L²) )    (7)

The statistical values μ_L and σ_L are the mean and standard deviation of the lesion region respectively. They are computed approximately by binarizing the dermoscopic image through clustering in the chrominance color space, with the Euclidean distance as the metric for classifying pixels. Let (a0, b0) and (a1, b1) be the pixel values of the two randomly selected centroids, and (ai, bi) be the corresponding pixel
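The chroma transform in (6) and the Gaussian lesion membership in (7) can be sketched as follows. This is an illustrative minimal version, not the authors' implementation; `mu_l` and `sigma_l` are assumed to be lesion statistics computed beforehand from the clustered chrominance image.

```python
import math

def chroma(a: float, b: float) -> float:
    """Chroma component C(x, y) = sqrt(a^2 + b^2) of a CIE L*a*b pixel, eq. (6)."""
    return math.hypot(a, b)

def gaussian_membership(c: float, mu_l: float, sigma_l: float) -> float:
    """Gaussian lesion membership C_G(x, y) of eq. (7), given the lesion
    region's mean mu_l and standard deviation sigma_l."""
    return math.exp(-((c - mu_l) ** 2) / (2.0 * sigma_l ** 2))
```

A pixel whose chroma equals the lesion mean gets membership 1, and the membership decays towards 0 as the chroma moves away from the lesion statistics.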

values of C(x, y); then, using the following rule, the dermoscopic image I(x, y) is binarized:

I'(x, y) = 0 if d0(a, b) ≤ d1(a, b), and 1 otherwise    (8)

where d0(a, b) = √((ai − a0)² + (bi − b0)²) and d1(a, b) = √((ai − a1)² + (bi − b1)²). Fig. 3 illustrates the binary image obtained after clustering using the above method. Further, the binarized image is fused with the dermoscopic image to calculate the statistical values μ_L and σ_L, from which C_G(x, y) is computed as explained below. The evolution of the initial contour is attracted towards the lesion boundaries by taking the gradient of the Gaussian distribution C_G(x, y). Thus, the total energy function is the summation of the energy from the grayscale image and the energy from the gradient of the Gaussian distribution C_G(x, y).

Fig. 2. Illustration of color space selection.

Fig. 3. Illustration of clustering.

The Chan-Vese energy function is minimized by differentiating E (equation (2)) with respect to φ(x) as follows. As we know,

dH(φ)/dφ = δ(φ)

d(δ(φ(x)) |∇φ(x)|)/dφ = div(∇φ / |∇φ|)

Thus, E_Total is calculated using equations (9)–(10):

∂E_Global/∂φ = δ(φ) [ −λ1 (I(x, y) − c1)² + λ2 (I(x, y) − c2)² + μ div(∇φ / |∇φ|) ]    (9)

∂E_Chroma/∂φ = δ(φ) [ −λ1 (Gradient(C_G(x, y)) − c1)² + λ2 (Gradient(C_G(x, y)) − c2)² + μ div(∇φ / |∇φ|) ]    (10)

where Gradient(C_G(x, y)) = √( (∂C_G(x, y)/∂x)² + (∂C_G(x, y)/∂y)² ), and ∂C_G(x, y)/∂x and ∂C_G(x, y)/∂y are the gradients in the x and y directions.

The term div(∇φ / |∇φ|), known as the curvature, is given by (11):

div(∇φ / |∇φ|) = (f_yy f_x² + f_xx f_y² − 2 f_x f_y f_xy) / (f_x² + f_y²)^(3/2)    (11)

The terms in equation (11) are obtained using finite differences as given below:

f_x(x, y) = (f(x + 1, y) − f(x − 1, y)) · 0.5
f_y(x, y) = (f(x, y + 1) − f(x, y − 1)) · 0.5
f_xx(x, y) = f(x + 1, y) + f(x − 1, y) − 2f(x, y)
f_yy(x, y) = f(x, y + 1) + f(x, y − 1) − 2f(x, y)
f_xy(x, y) = (f(x + 1, y + 1) − f(x + 1, y − 1) − f(x − 1, y + 1) + f(x − 1, y − 1)) · 0.25

The curve is evolved as given in (12):

φⁿ⁺¹ = φⁿ + ΔT · E_Total    (12)

The value of ΔT, known as the step size, should be between 0.1 and 0.9 as given by Chan-Vese. The curve evolution is stopped when the difference between the last two iterations is less than the step size, as given in (13), or when the maximum number of iterations has been reached.

|φⁿ⁺¹ − φⁿ| ≤ ΔT    (13)

The segmented lesions are further subjected to feature extraction. Fig. 4 provides an illustration of the lesion segmentation results. A mean segmentation overlap error of 11.5% and 7.2% was obtained for the ISBI 2016 and PH2 datasets respectively.

Fig. 4. Illustration of segmentation results (a) Original Image (b) Segmented masks (c) Segmented masks overlapped on ground truth.

2.3. Feature extraction

2.3.1. Shape features
The shape of the lesion is an important indicator of malignancy of melanocytic lesions. An irregular shape mainly signifies a malignant lesion, whereas benign lesions are symmetrical in nature. Thus, shape features are quantified by computing geometric features such as (i) the Asymmetry Index and (ii) the Compactness Index.

The shape asymmetry index is computed using the lesion mask. As the lesions are not positioned at the image center, the position of the lesion centroid is aligned with the image centroid using the difference in centroid positions, as given in Step (2) of Algorithm 1. The image is divided into two halves with respect to the x-axis and y-axis, as illustrated in Fig. 5, to determine the asymmetry along the x and y axes. Algorithm 1 provides a detailed summary of the steps used for the calculation of shape asymmetry.

Fig. 5. Illustration of shape asymmetry.

Compactness Index – Compactness gives a measure of the circularity of the lesion, given by (14) [26]:

CI = P_L² / (4π A_L)    (14)

where P_L and A_L indicate the perimeter and area of the lesion. Thus, a total of 3 shape features were computed.

2.3.2. Color features
Color plays a major role in dermoscopic image analysis. The color of a melanocytic lesion relies on the localization of melanin. Malignant lesions tend to exhibit three or more colors due to the presence of melanin in the deeper layers of the skin, whereas benign lesions exhibit one or two colors. To quantify the color of the lesion, features such as color asymmetry, color entropy, suspicious color score and statistical features are computed. The CIE L*a*b color space is used to determine the color asymmetry of the lesion: the perceived color difference between the opposite halves of the lesion is computed for the CIE L*a*b channels. Algorithm 2 provides a brief summary of the steps used for computing the color asymmetry.
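The Compactness Index in (14) can be sketched as below; the helper name is illustrative, not from the paper. For a perfect circle (P = 2πr, A = πr²) the index evaluates to 1, with larger values indicating an increasingly irregular boundary.

```python
import math

def compactness_index(perimeter: float, area: float) -> float:
    """Compactness Index CI = P^2 / (4 * pi * A), eq. (14)."""
    return perimeter ** 2 / (4.0 * math.pi * area)
```

For example, a circle of radius 5 (perimeter 2π·5, area π·25) yields a CI of exactly 1.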

Fig. 6. Illustration of Pigment Network detection (a) Original Image (b) Pigment network detected.

The suspicious color score is a cumulative score computed by considering the Euclidean distance between the RGB values of the lesion pixels and the RGB values of the suspicious colors (Light Brown, Dark Brown, Red, White, Black and Blue-Grey) [17]. The suspicious color score is inversely related to the Euclidean distance. Further, statistical color features such as color variance, color entropy, correlation coefficient and eigenvalues are computed for the red, green, blue and gray planes for the lesion region and the entire image.

Usually, the color of an image is represented through a particular color model. There are various color models to describe the color information within an image, but not all color spaces match the pre-requisite criteria for choosing an appropriate model. The HSV, CIE L*a*b and CIE L*u*v color spaces represent colors based on human perception. Furthermore, CIE L*a*b and CIE L*u*v are approximately perceptually uniform color spaces and can simplify the identification of color properties, as it is easy to maintain color-difference ratios. Six statistical measures, i.e., average, variance, standard deviation, skewness, entropy and energy, are computed for each color channel in the region of the lesion using the aforementioned four color spaces, corresponding to 12 channels. This leads to the computation of 113 color features.

2.3.3. Texture features
Tamura's features characterize coarse texture, inhomogeneous contrast and irregularity in the texture patterns [27]. Since malignant lesions exhibit the aforementioned texture patterns, Tamura's features are computed for the region of interest.

Discrete Wavelet Transform (DWT) features – The energy and entropy measures of the coefficients obtained by DWT are computed for each of the Haar wavelet sub-bands obtained by a three-level decomposition for the RGB, HSV, CIE L*a*b and CIE L*u*v color spaces. Since it is a three-level decomposition, 10 wavelet energy coefficients and one entropy coefficient are obtained for each channel, resulting in 132 DWT features.

GLCM features – The gray-level co-occurrence matrix (GLCM) is a statistical approach which considers the spatial relationship between pixels. From the normalized co-occurrence matrix, 12 statistical measures were extracted from the image: contrast, dissimilarity, energy, entropy, homogeneity, sum average, sum entropy, difference entropy, difference variance, sum variance, auto-correlation, and correlation [28]. These features were calculated for the CIE L*a*b, RGB, CIE L*u*v and HSV color spaces, resulting in 12 × 3 × 4 = 144 features. Thus, a total of 279 texture features were computed.

2.3.4. Pigment network
One of the most significant dermoscopic features of melanocytic skin lesions is the presence of honeycomb-like grid patterns over a diffused background. Malignant melanocytic skin lesions are characterized by atypical pigment network patterns, whereas benign melanocytic skin lesions are characterized by a typical pigment network pattern. The pigment network is detected using a bank of 15 directional Gabor filters tuned to various orientations; a detailed explanation is given in [23]. Fig. 6 provides an illustration of the detected pigment network. From the detected pigment network patterns, six features that specify the characteristic nature of holes and lines are extracted:

Area of the Pigment Network (PN) (f1): The area of the pigment network specifies the presence or absence of the pigment network.
Number of holes (H) (f2): Holes formed by the grid network are characteristic features of pigment networks.
Holes to lesion ratio (H/L) (f3) and Pigment Network to lesion ratio (PN/L) (f4): these ratios signify the spread of the pigment network regions.
In addition, the percentages of lines and holes in the pigment network region are computed as given in (15)–(16):

Percentage of lines (f5) = Pigment Network Mask / Region Mask    (15)

Percentage of holes (f6) = Hole Mask / Region Mask    (16)

The feature extraction process resulted in a set of 3 shape, 113 color, 279 texture and 6 pigment network features. Thus, a total of 401 features were extracted.

2.4. Feature selection

A non-parametric test based on the Wilcoxon Rank Sum statistic [29] is conducted to test the difference in median values of the features extracted for the benign and malignant classes. A statistical significance level of p ≤ 0.05 is used to test the null hypothesis.

H0: The extracted features for benign and malignant lesions have equal medians.

The null hypothesis is tested against the alternative that they do not, at the 5% significance level. Thus, a set of 401 p-values was computed for the PH2 dataset, and the features having p ≤ 0.05 were selected. Among the 401 features extracted, 296 features had p-values below 0.05 and were hence found to be statistically significant. The p-values were computed for the features extracted from the PH2 dataset, and the same selected features (296 features) were used from the ISBI dataset for classification. It can be observed that in the ISBI dataset 81% of the samples are benign and 19% are malignant. Since the dataset is imbalanced, with a small number of diseased cases, an oversampling technique termed the Adaptive Synthetic approach (ADASYN) is used to synthetically generate samples for the minority class. ADASYN techniques are typically applied to medical problems wherein the number of diseased cases is low relative to the non-diseased cases [30].
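The rank-sum screening step can be sketched as follows. This uses the standard normal approximation to the Wilcoxon rank-sum statistic and omits the tie-variance correction, so it is a simplified stand-in for the implementation referenced in [29], not the authors' code.

```python
import math
from itertools import chain

def rank_sum_p_value(sample1, sample2):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation
    (simplified sketch: average ranks for ties, no tie-variance correction)."""
    pooled = sorted(chain(sample1, sample2))
    ranks = {}  # value -> average rank over its tied block
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(sample1), len(sample2)
    w = sum(ranks[v] for v in sample1)            # rank sum of first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))       # two-sided p-value

def select_features(p_values, alpha=0.05):
    """Indices of features meeting the p <= alpha significance threshold."""
    return [i for i, p in enumerate(p_values) if p <= alpha]
```

Features whose benign and malignant value distributions are well separated yield small p-values and survive the threshold; heavily overlapping distributions yield large p-values and are discarded.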

Table 1
Performance parameters for PH2 dataset.

No. of features   SP     SE     ACC
401               0.95   0.93   0.94
388               0.91   0.97   0.96
296               0.91   0.98   0.97

Bold values indicate the best results for each of the methods used.

Table 2
Performance parameters for ISBI dataset.

No. of features   SP     SE     ACC
401               0.81   0.79   0.80
388               0.85   0.85   0.85
296               0.82   0.80   0.81

Bold values indicate the best results for each of the methods used.

The procedure for synthetically producing the minority samples is as follows. For each data sample xi belonging to the minority class, the K nearest neighbors (K = 5) are selected in the n-dimensional feature space. This is followed by the selection of one minority data sample xzi from the nearest neighbors of xi. The synthetic data sample si is calculated as given in (17):

si = xi + (xzi − xi) · λ    (17)

where λ ∈ [0, 1] is a random number. ADASYN is thus applied to the malignant class, so that the balanced ISBI dataset has 49.21% malignant samples and 50.78% benign samples.

2.5. Lesion classification

Classification forms the last block of a computer aided diagnostic system. The literature reports various classifiers used for the diagnosis of skin lesions [31]. Three classifiers, namely SVM, an ensemble of decision stump trees, and an ensemble of AdaBoost classifiers with dynamic classifier selection, are used. In the ensemble approach, a number of classifiers are tested and the most appropriate one is chosen for a given test sample. Different classifiers usually make different errors on different samples, which means that by combining classifiers we can put together an ensemble that makes more accurate decisions [32]. In order to have classifiers with different errors, it is advisable to create diverse classifiers and group them into what is known as an Ensemble of Classifiers (EoC). If one classifier from an EoC can correctly classify a given pattern, then this EoC is considered able to classify that pattern. Intuitively, the more diverse the EoC, the better the classification. The major objective of the classification is to select those classifiers which might be capable of correctly classifying a given pattern. This is done in a dynamic fashion, since different patterns might require different ensembles of classifiers; hence the method is known as dynamic ensemble selection. The advantage of dynamic ensemble selection is that the risk of over-generalization is distributed by choosing a group of classifiers instead of one individual classifier for a test pattern.

In the proposed work, two ensemble models are created: (i) the first model is built by creating an ensemble of decision stump trees by bagging, and (ii) the second model is built by creating an ensemble of AdaBoost classifiers built from weak learners by bagging. Dynamic ensemble selection methods such as Overall Local Accuracy (OLA), Local Class Accuracy (LCA), A-Priori, A-Posteriori, KNORA-E and KNORA-U [33] are employed for the selection of the classifiers for both models.

First Model – Decision Stump Trees: The number of decision stump trees is decided based on the number of input features. Thus, an ensemble of 401 decision stump trees is built using bagging; the maximum depth of each tree is two. Once the ensemble is built, the aforementioned dynamic selection methods are used to select an ensemble of classifiers for a given test pattern, and the chosen classifiers are applied to predict the label for that pattern.

Second Model – Similar to the first model, with the decision stump trees replaced by AdaBoost ensembles of weak learners; the dynamic selection methods are then used to predict a test label.

The two models are applied to the features extracted from the PH2 and ISBI 2016 databases. The following section provides the results obtained using the aforementioned classification techniques. The classification techniques are applied to 3 feature sets:

i) the 401 features extracted,
ii) the 296 features that were found to be statistically significant,
iii) the 388 features.

Among the 401 features, features such as contrast, directionality, the color variance for the RGB channels, and the color scores for light brown, white and blue-grey were omitted. Since contrast was measured as part of the GLCM texture features, it was omitted from the Tamura texture features. Similarly, color variance was already part of the statistical values computed from the RGB color space and hence was omitted. Further, the computation of texture directionality was computationally intensive, hence it was omitted. A close look at the color scores indicated that light brown was the most common color present in skin lesions irrespective of the lesion type (benign/malignant), while white and blue-grey colors were found to be very rare; thus these three scores were ignored. This led to the reduction of 13 features from the set of 401, leaving 388 features.

3. Results and discussions

3.1. Results of classification using SVM

The following two set-ups were used for evaluating the classification ability of the extracted features.

Set-up 1: A hold-out set of 20% was used for testing and 80% for training the SVM classifier. The testing process was carried out in three iterations, with a different hold-out set used during each iteration. The results reported are the average performance values of the classifiers over the three iterations. The performance metrics are sensitivity, specificity and accuracy, as given in (18)–(20). Sensitivity (SE) indicates the rate of correct classification of malignant lesions; Specificity (SP) indicates the rate of correct classification of benign lesions; Accuracy (ACC) indicates the overall correct classification rate of benign and malignant lesions. Tables 1 and 2 depict the classification performance on the PH2 and ISBI datasets.

SE = TP / (FN + TP)    (18)

SP = TN / (FP + TN)    (19)

ACC = (TN + TP) / (FN + FP + TN + TP)    (20)

True Positive (TP): the classifier correctly predicts that the lesion is malignant. True Negative (TN): the classifier correctly predicts that the lesion is benign. False Positive (FP): the classifier predicts that the lesion is malignant when it is benign. False Negative (FN): the classifier predicts that the lesion is benign when it is malignant.

Set-up 2: To estimate the general performance of the classification model, a k-fold cross validation scheme with k = 10 is applied to each set of the data. Each dataset is divided into ten equal sub-parts,

Table 3
Performance parameters for PH2 dataset using k-fold cross validation.

No. of features   SP     SE     ACC
401               0.94   0.77   0.91
388               0.94   0.80   0.91
296               0.95   0.72   0.91

Bold values indicate the best results for each of the methods used.

Table 4
Performance parameters for ISBI dataset using k-fold cross validation.

No. of features   SP     SE     ACC
401               0.72   0.85   0.78
388               0.84   0.72   0.78
296               0.71   0.82   0.76

Bold values indicate the best results for each of the methods used.

Table 5
Performance parameters for evaluating the generalization ability.

Approach   SP     SE     ACC
i)         0.80   0.83   0.81
ii)        0.85   0.90   0.88

Table 6
Performance parameters for evaluating the generalization ability using k-fold cross validation.

Approach   SP     SE     ACC
i)         0.76   0.81   0.79
ii)        0.85   1.00   0.93
with stratification performed such that each sub-part contains samples from both classes (benign and malignant). Among the 10 sub-parts, 9 parts are used for training the SVM classifier and the 10th sub-part is used for testing. The average classification performance is reported in Tables 3 and 4 for the PH2 and ISBI datasets for each of the feature sets.

Two observations can be drawn from Tables 1 to 4 for the two classes of lesions:

For malignant lesions: For the PH2 dataset, the accuracy of classification of malignant lesions was found to be good using 296 features for the first set-up, while for the second set-up 388 features provided good results. For the ISBI dataset, 388 features were found to be better for the first set-up and 401 features performed better for the second set-up.

For benign lesions: For the PH2 dataset, the accuracy of classification of benign lesions was found to be good using 401 features for the first set-up, and the second set-up yielded better results using 296 features. For the ISBI dataset, 388 features were found to be best for both set-ups.

In order to draw an inference regarding the best feature set, the frequency of best performance (i.e. frequency of occurrence) versus the number of features is plotted in Fig. 7. It can be observed from Fig. 7 that the overall highest frequency of better performance was obtained for the set of 388 features.

Fig. 7. Frequency of best performance using SVM classifier for the three feature sets.

In order to measure the generalization ability of the classification irrespective of the datasets, two approaches have been adopted using 388 features:

(i) The two datasets are concatenated, resulting in a multi-dataset consisting of samples from both.
(ii) The classifier is trained on the ISBI dataset and tested on the PH2 dataset.

The results obtained for the two aforementioned approaches using the two classification set-ups are given in Tables 5 and 6. It can be observed from Tables 5 and 6 that the results obtained for approach (ii) using 10-fold cross validation are high in terms of sensitivity; this is due to the larger training dataset adopted.

3.2. Results for Ensemble of Classifiers with dynamic selection methods

The results for the ensemble of classifiers with dynamic selection methods on the ISBI and PH2 datasets for the three different feature sets are depicted in the following section (Tables 7-19). Correspondingly, the Receiver Operating Characteristic (ROC) curves for the best results are illustrated in Figs. 8, 9, 11 and 12.

3.2.1. Ensemble of decision stump trees for ISBI dataset
See Tables 7 and 8.

3.2.2. Ensemble of AdaBoost for ISBI dataset
See Tables 9 and 10.

3.2.3. Ensemble of decision stump trees for PH2 dataset
See Tables 11 and 12.

3.2.4. Ensemble of AdaBoost for PH2 dataset
See Tables 13 and 14.

To analyze the results reported in Tables 7 to 14, two sets of graphs have been plotted that indicate the frequency of best classification accuracy (benign and malignant) using the decision stump trees and the AdaBoost classifier. Fig. 10(a) illustrates this frequency for decision stump trees: the 388 and 296 feature sets proved to be best for benign and malignant lesion classification. Similarly, Fig. 10(b) illustrates the frequency of best classification using AdaBoost: the 388 feature set proved to be better for benign and malignant lesion classification in comparison to the other feature sets.

A summary of the results reported in Tables 7-14 is depicted in Table 15, which tallies the frequency of best performance over two aspects, i.e. the feature sets and the dynamic ensemble selection techniques. Each number in the table quantifies the number of times the dynamic ensemble selection technique provided better results for the respective feature set. It can be inferred from the results summarized in Table 15 that the 388 and 296 feature sets were found to give better results using Decision Stump Trees and AdaBoost respectively. Additionally, the KNORA-E dynamic

Table 7
Performance parameters obtained using Decision Stump trees on ISBI dataset.

Selection Method      401 Features (SP  SE  ACC)      388 Features (SP  SE  ACC)      296 Features (SP  SE  ACC)

Normal 0.75 0.79 0.77 0.58 0.79 0.68 0.65 0.74 0.70
Ensemble 0.78 0.88 0.83 0.59 0.79 0.69 0.66 0.74 0.70
OLA 0.78 0.87 0.83 0.70 0.88 0.79 0.68 0.84 0.76
LCA 0.69 0.90 0.80 0.70 0.89 0.79 0.68 0.86 0.77
A-Priori 0.71 0.85 0.78 0.61 0.87 0.74 0.66 0.83 0.75
A-Posterior 0.51 0.97 0.74 0.61 0.93 0.76 0.63 0.89 0.76
KNORA-E 0.79 0.93 0.86 0.70 0.91 0.80 0.73 0.87 0.80
KNORA-U 0.79 0.90 0.85 0.58 0.82 0.70 0.63 0.80 0.72

Bold values indicate the best results for each of the methods used.
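The SP, SE and ACC values reported throughout Tables 3-19 follow Eqs. (18)-(20). As a minimal sketch, the metrics can be computed from confusion-matrix counts as follows (the counts used here are hypothetical, not taken from the paper):

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int):
    """Sensitivity, specificity and accuracy per Eqs. (18)-(20),
    with malignant taken as the positive class."""
    se = tp / (fn + tp)                    # Eq. (18): malignant recall
    sp = tn / (fp + tn)                    # Eq. (19): benign recall
    acc = (tn + tp) / (fn + fp + tn + tp)  # Eq. (20): overall rate
    return se, sp, acc

# Hypothetical confusion counts for illustration only.
se, sp, acc = classification_metrics(tp=95, tn=82, fp=18, fn=5)
print(round(se, 2), round(sp, 2), round(acc, 3))  # 0.95 0.82 0.885
```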

Table 8
Performance parameters obtained using Decision Stump trees on ISBI dataset using k-fold cross-validation.

Selection Method      401 Features (SP  SE  ACC)      388 Features (SP  SE  ACC)      296 Features (SP  SE  ACC)

Normal 0.52 0.82 0.67 0.67 0.63 0.65 0.66 0.67 0.66
Ensemble 0.53 0.82 0.67 0.61 0.73 0.67 0.65 0.70 0.68
OLA 0.59 0.95 0.77 0.67 0.95 0.81 0.66 0.93 0.79
LCA 0.61 0.95 0.78 0.69 0.95 0.90 0.70 0.94 0.81
A-Priori 0.56 0.94 0.75 0.62 0.94 0.78 0.66 0.91 0.78
A-Posterior 0.54 0.98 0.76 0.58 0.99 0.78 0.60 0.97 0.78
KNORA-E 0.61 0.95 0.78 0.70 0.96 0.83 0.78 0.95 0.82
KNORA-U 0.54 0.88 0.71 0.61 0.91 0.76 0.65 0.85 0.74

Bold values indicate the best results for each of the methods used.
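The k-fold results above rely on the stratified ten-fold scheme described in Set-up 2, in which every fold contains samples from both classes. A pure-Python sketch of such stratified fold construction (the label vector below is hypothetical):

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Deal sample indices round-robin per class so that every fold
    contains both benign (0) and malignant (1) samples."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Hypothetical label vector: 40 benign (0) and 10 malignant (1) samples.
labels = [0] * 40 + [1] * 10
folds = stratified_folds(labels, k=10)
# Fold i serves as the test set while the remaining 9 folds train the
# classifier; every fold here holds 4 benign and 1 malignant sample.
```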

Table 9
Performance parameters obtained using AdaBoost on ISBI dataset.

Selection Method      401 Features (SP  SE  ACC)      388 Features (SP  SE  ACC)      296 Features (SP  SE  ACC)

Normal 0.75 0.79 0.77 0.81 0.83 0.82 0.74 0.78 0.76
Ensemble 0.78 0.88 0.83 0.80 0.84 0.82 0.80 0.83 0.82
OLA 0.69 0.87 0.83 0.74 0.88 0.81 0.79 0.86 0.83
LCA 0.71 0.90 0.80 0.71 0.91 0.81 0.74 0.92 0.83
A-Priori 0.51 0.85 0.78 0.73 0.83 0.78 0.73 0.77 0.75
A-Posterior 0.79 0.97 0.74 0.57 0.98 0.77 0.57 0.96 0.77
KNORA-E 0.79 0.93 0.86 0.82 0.95 0.88 0.82 0.91 0.87
KNORA-U 0.79 0.90 0.85 0.81 0.87 0.84 0.81 0.88 0.84

Bold values indicate the best results for each of the methods used.

Table 10
Performance parameters obtained using AdaBoost on ISBI dataset using k-fold cross validation.

Selection Method      401 Features (SP  SE  ACC)      388 Features (SP  SE  ACC)      296 Features (SP  SE  ACC)

Normal 0.76 0.79 0.77 0.74 0.80 0.77 0.73 0.77 0.65
Ensemble 0.79 0.85 0.85 0.79 0.84 0.81 0.79 0.83 0.81
OLA 0.75 0.91 0.83 0.76 0.90 0.82 0.75 0.97 0.82
LCA 0.73 0.90 0.84 0.73 0.96 0.83 0.74 0.94 0.84
A-Priori 0.74 0.85 0.80 0.75 0.81 0.78 0.74 0.85 0.79
A-Posterior 0.57 0.98 0.78 0.58 0.99 0.78 0.59 0.98 0.78
KNORA-E 0.81 0.93 0.87 0.82 0.93 0.96 0.82 0.93 0.87
KNORA-U 0.79 0.87 0.83 0.79 0.87 0.83 0.79 0.78 0.83

Bold values indicate the best results for each of the methods used.

Table 11
Performance parameters obtained using decision stump trees on PH2 dataset.

Selection Method      401 Features (SP  SE  ACC)      388 Features (SP  SE  ACC)      296 Features (SP  SE  ACC)

Normal 0.88 0.88 0.88 0.97 0.80 0.93 0.90 1.00 0.93
Ensemble 0.91 0.75 0.88 1.00 0.80 0.95 1.00 0.80 0.95
OLA 0.88 0.62 0.82 0.90 1.00 0.93 0.97 0.60 0.88
LCA 0.88 0.62 0.82 0.93 0.90 0.93 0.97 0.70 0.90
A-Priori 0.91 0.88 0.90 0.90 0.80 0.88 0.90 0.70 0.85
A-Posterior 0.88 0.75 0.85 0.97 0.80 0.93 0.97 0.80 0.93
KNORA-E 0.88 0.88 0.88 0.97 1.00 0.97 1.00 0.80 0.95
KNORA-U 0.91 0.75 0.88 1.00 0.80 0.95 1.00 0.80 0.95

Bold values indicate the best results for each of the methods used.
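The base pool behind Tables 7-12 is an ensemble of depth-limited stumps built by bagging. The paper gives no implementation details, so the following is only a toy illustration of the idea: each stump is fit on a bootstrap resample, and the bagged ensemble predicts by majority vote (labels are encoded here as -1 for benign and +1 for malignant):

```python
import random

def train_stump(X, y, feature):
    """One-split stump on a single feature: pick the threshold
    (midpoint between sorted values) with the fewest training errors."""
    values = sorted(set(row[feature] for row in X))
    best = (values[0], 1, float("inf"))  # (threshold, polarity, errors)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        for polarity in (1, -1):
            preds = [polarity * (1 if row[feature] > t else -1) for row in X]
            errors = sum(p != yi for p, yi in zip(preds, y))
            if errors < best[2]:
                best = (t, polarity, errors)
    return best[:2]

def bagged_stumps(X, y, n_estimators, seed=0):
    """Bagging: each stump is fit on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for i in range(n_estimators):
        sample = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[j] for j in sample], [y[j] for j in sample]
        feature = i % len(X[0])  # cycle over the available features
        ensemble.append((feature, *train_stump(Xb, yb, feature)))
    return ensemble

def predict(ensemble, row):
    """Majority vote over the bagged stumps."""
    votes = sum(pol * (1 if row[f] > t else -1) for f, t, pol in ensemble)
    return 1 if votes > 0 else -1

# Hypothetical one-feature data: low values benign, high values malignant.
X = [[0.1], [0.2], [0.3], [0.4], [0.45], [0.55], [0.6], [0.7], [0.8], [0.9]]
y = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
ensemble = bagged_stumps(X, y, n_estimators=7)
```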

Table 12
Performance parameters obtained using decision stump trees on PH2 dataset using k-fold cross-validation.

Selection Method      401 Features (SP  SE  ACC)      388 Features (SP  SE  ACC)      296 Features (SP  SE  ACC)

Normal 0.92 0.59 0.85 0.95 0.67 0.90 0.95 0.72 0.90
Ensemble 0.96 0.70 0.91 0.97 0.67 0.91 0.96 0.65 0.90
OLA 0.87 0.80 0.85 0.95 0.77 0.91 0.94 0.75 0.81
LCA 0.92 0.75 0.88 0.94 0.77 0.91 0.93 0.76 0.90
A-Priori 0.90 0.65 0.86 0.88 0.62 0.84 0.92 0.80 0.90
A-Posterior 0.91 0.72 0.88 0.93 0.72 0.89 0.93 0.80 0.91
KNORA-E 0.93 0.77 0.90 0.85 0.77 0.91 0.93 0.77 0.90
KNORA-U 0.96 0.70 0.91 0.97 0.67 0.91 0.96 0.75 0.92

Bold values indicate the best results for each of the methods used.

Table 13
Performance parameters obtained using AdaBoost on PH2 dataset.

Selection Method      401 Features (SP  SE  ACC)      388 Features (SP  SE  ACC)      296 Features (SP  SE  ACC)

Normal 1.00 1.00 1.00 0.97 1.00 0.97 0.97 0.80 0.93
Ensemble 0.94 0.75 0.90 0.97 0.80 0.93 1.00 0.80 0.95
OLA 1.00 0.75 0.95 0.93 1.00 0.95 0.97 0.90 0.95
LCA 0.97 0.75 0.93 0.93 1.00 0.95 0.97 0.90 0.95
A-Priori 0.94 0.75 0.90 0.97 1.00 0.97 1.00 1.00 1.00
A-Posterior 0.91 0.62 0.85 1.00 0.70 0.93 0.97 0.70 0.90
KNORA-E 0.94 0.75 0.90 1.00 0.90 0.97 1.00 0.90 0.97
KNORA-U 0.94 0.75 0.90 1.00 0.80 0.95 1.00 0.80 0.95

Bold values indicate the best results for each of the methods used.

Table 14
Performance parameters obtained using AdaBoost on PH2 dataset using k-fold cross validation.

Selection Method      401 Features (SP  SE  ACC)      388 Features (SP  SE  ACC)      296 Features (SP  SE  ACC)

Normal 0.98 0.70 0.92 0.94 0.75 0.90 0.95 0.85 0.92
Ensemble 0.98 0.75 0.93 0.97 0.80 0.84 0.96 0.77 0.92
OLA 0.96 0.70 0.91 0.95 0.85 0.83 0.95 0.80 0.92
LCA 0.96 0.70 0.91 0.95 0.82 0.92 0.95 0.77 0.92
A-Priori 0.94 0.72 0.90 0.96 0.77 0.92 0.93 0.97 0.88
A-Posterior 0.96 0.70 0.91 0.95 0.77 0.89 0.97 0.70 0.92
KNORA-E 0.98 0.80 0.93 0.97 0.82 0.84 0.95 0.72 0.92
KNORA-U 0.98 0.75 0.93 0.97 0.80 0.93 0.93 0.77 0.92

Bold values indicate the best results for each of the methods used.

Table 15
Summary of the results reported in Tables 7 to 14.

Selection Method      Decision Stump Trees (401  388  296)      AdaBoost (401  388  296)

Normal 1 1 1
Ensemble 1 1 1 1 1
OLA 1 1
LCA 1 2 1
A-Priori 1 1 1
A-Posterior 1
KNORA-E 2 3 3 3 3 2
KNORA-U 2 1 2 1 1 1
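Table 15 is effectively a tally: for each of Tables 7-14, the (selection method, feature set) combination attaining the best accuracy receives one count. A small sketch of such a tally over hypothetical accuracy values:

```python
def tally_best(results):
    """results: {table_name: {(method, feature_count): accuracy}}.
    Returns {(method, feature_count): times_it_was_best}, counting ties."""
    counts = {}
    for grid in results.values():
        best = max(grid.values())
        for key, acc in grid.items():
            if acc == best:
                counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical accuracy grids for two result tables.
results = {
    "table_a": {("KNORA-E", 401): 0.86, ("KNORA-U", 401): 0.85,
                ("KNORA-E", 388): 0.80},
    "table_b": {("KNORA-E", 401): 0.78, ("KNORA-U", 401): 0.71,
                ("KNORA-E", 388): 0.83},
}
counts = tally_best(results)
print(counts)  # {('KNORA-E', 401): 1, ('KNORA-E', 388): 1}
```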

Table 16
Performance parameters for evaluating the generalization ability using approach (i).

Selection Method      Decision Stump Trees (SP  SE  ACC)      AdaBoost (SP  SE  ACC)

Normal 0.77 0.53 0.65 0.78 0.73 0.75


Ensemble 0.74 0.58 0.66 0.81 0.81 0.81
OLA 0.74 0.80 0.77 0.79 0.90 0.84
LCA 0.76 0.84 0.80 0.76 0.94 0.85
A-Priori 0.69 0.76 0.72 0.76 0.82 0.79
A-Posterior 0.72 0.89 0.79 0.59 0.98 0.78
KNORA-E 0.77 0.81 0.79 0.85 0.92 0.89
KNORA-U 0.71 0.73 0.72 0.81 0.85 0.83

Bold values indicate the best results for each of the methods used.

Table 17
Performance parameters for evaluating the generalization ability for k-fold cross-validation using approach (i).

Selection Method      Decision Stump Trees (SP  SE  ACC)      AdaBoost (SP  SE  ACC)

Normal 0.70 0.58 0.64 0.78 0.77 0.78


Ensemble 0.71 0.58 0.64 0.82 0.82 0.82
OLA 0.70 0.80 0.80 0.78 0.89 0.83
LCA 0.70 0.94 0.81 0.76 0.92 0.83
A-Priori 0.68 0.90 0.79 0.77 0.85 0.81
A-Posterior 0.63 0.97 0.79 0.64 0.97 0.79
KNORA-E 0.75 0.94 0.84 0.84 0.92 0.87
KNORA-U 0.71 0.82 0.76 0.82 0.85 0.83

Bold values indicate the best results for each of the methods used.

Table 18
Performance parameters for evaluating the generalization ability using approach (ii).

Selection Method      Decision Stump Trees (SP  SE  ACC)      AdaBoost (SP  SE  ACC)

Normal 0.80 1.00 0.84 0.98 0.30 0.84


Ensemble 0.85 1.00 0.88 0.95 0.70 0.90
OLA 0.85 0.80 0.84 0.76 0.60 0.73
LCA 0.83 0.80 0.82 0.78 0.80 0.78
A-Priori 0.85 0.90 0.86 0.78 1.00 0.82
A-Posterior 0.71 0.90 0.75 0.68 0.90 0.71
KNORA-E 0.85 0.70 0.82 0.93 0.60 0.86
KNORA-U 0.83 1.00 0.86 0.95 0.70 0.90

Bold values indicate the best results for each of the methods used.

Table 19
Performance parameters for evaluating the generalization ability for k-fold cross validation using approach (ii).

Selection Method      Decision Stump Trees (SP  SE  ACC)      AdaBoost (SP  SE  ACC)

Normal 0.75 0.50 0.69 0.99 0.06 0.69


Ensemble 0.67 0.63 0.67 0.92 0.66 0.88
OLA 0.62 0.85 0.67 0.73 0.60 0.71
LCA 0.69 0.80 0.71 0.71 0.61 0.70
A-Priori 0.52 0.78 0.57 0.79 0.55 0.75
A-Posterior 0.63 0.85 0.67 0.62 0.83 0.68
KNORA-E 0.68 0.83 0.70 0.90 0.59 0.85
KNORA-U 0.63 0.92 0.69 0.91 0.66 0.87

Bold values indicate the best results for each of the methods used.
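KNORA-E, which dominates the tables above, keeps only the ensemble members that classify all k nearest validation neighbours of the test sample correctly, relaxing k when no member qualifies. The following is a toy illustration of that rule, not the authors' implementation (the two "expert" classifiers below are hypothetical):

```python
def knora_e(classifiers, X_val, y_val, x, k=3):
    """Return the classifiers that are 'oracles' on the k nearest
    validation neighbours of x; shrink the neighbourhood on failure.
    Each classifier is a callable mapping a sample to a label."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    order = sorted(range(len(X_val)), key=lambda i: dist(X_val[i], x))
    for kk in range(k, 0, -1):  # relax k until some classifier qualifies
        region = order[:kk]
        selected = [c for c in classifiers
                    if all(c(X_val[i]) == y_val[i] for i in region)]
        if selected:
            return selected
    return list(classifiers)  # fall back to the whole ensemble

def majority_vote(classifiers, x):
    votes = [c(x) for c in classifiers]
    return max(set(votes), key=votes.count)

# Hypothetical pool: one classifier always predicts benign (0), the
# other always predicts malignant (1); only the locally accurate one
# should survive selection near the benign region.
always_benign = lambda s: 0
always_malignant = lambda s: 1
X_val = [[0.1], [0.2], [0.8], [0.9]]
y_val = [0, 0, 1, 1]
chosen = knora_e([always_benign, always_malignant], X_val, y_val,
                 x=[0.15], k=2)
print(majority_vote(chosen, [0.15]))  # 0: only 'always_benign' is kept
```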

Fig. 8. ROC curve for (a) Decision Stump trees (b) Ensemble of AdaBoost with KNORA-E dynamic selection for ISBI dataset.

Fig. 9. ROC curve for (a) Decision Stump trees (b) Ensemble of AdaBoost with KNORA-E dynamic selection on PH2 dataset.

Fig. 10. Illustration of frequency of best performance for benign and malignant lesion classification (a) Using Decision Stump Trees (b) Using AdaBoost of weak learners.

ensemble selection technique proved to be better than the other dynamic ensemble selection techniques. Since the 388 feature set was found to be the optimal feature set using both the SVM and the dynamic ensemble selection classifiers, the generalization ability is computed using the 388 feature set, as shown in Tables 16-19.

3.2.5. Performance parameters for evaluating the generalization ability using Ensemble of Decision trees and AdaBoost for the two approaches

See Tables 16 and 17.

It can be observed from Tables 16 and 17 that the generalization ability evaluated using approach (i) yields good results using LCA and KNORA-E for Decision Stump Trees and AdaBoost respectively. Additionally, good results were obtained using KNORA-E for both classification models (Decision Stump Trees and AdaBoost).

It can be observed from Tables 18 and 19 that the generalization ability evaluated using approach (ii) yields good results using Ensemble and KNORA-U for Decision Stump Trees and AdaBoost respectively. Additionally, good results were obtained using LCA and the Ensemble models for both classifiers.

Fig. 13(a) illustrates the frequency of best performance obtained for the various dynamic selection methods, and Fig. 13(b) illustrates the frequency of best performance obtained for the various feature sets on the PH2 and ISBI datasets. It can be observed from Fig. 13 that the highest frequency of best performance was obtained for 388 features and the KNORA-E dynamic selection method in comparison to their counterparts.

A comparative analysis of the skin lesion classification with the state-of-the-art methods for the respective datasets (PH2 and ISBI) is given in Table 20. The best results mentioned in Table 20 for the PH2 dataset are reported for the 388 feature set using the AdaBoost dynamic ensemble selection technique; these are compared with the lesion classification results reported in [9,11,12,34], all evaluated on the PH2 dataset. The best results reported for the ISBI dataset belong to the 388 feature set computed using AdaBoost with KNORA-E dynamic ensemble selection. It can be inferred from Table 20 that the proposed set of dermoscopy-inspired features is better at classifying benign and malignant lesions in comparison to the state-of-the-art methods reported in the literature.

Table 20
Comparative analysis for lesion classification with state-of-the-art methods.

Dataset   Ref.                    SE (%)   SP (%)   ACC (%)
PH2       Pennisi et al. [9]      93.5     87.1     -
          Barata et al. [11]      92.5     76.3     84.3
          Tajeddin et al. [12]    95       95       95
          Satheesha et al. [34]   96       97       -
          Proposed                97       100      97
ISBI      Yu et al. [15]          54.7     93.1     85
          Codella et al. [16]     62       79       75
          Proposed                95       82       88

Bold values indicate the best results for each of the methods used.

Fig. 11. ROC curve for (a) Decision Stump trees (b) Ensemble of AdaBoost with KNORA-E dynamic selection for approach (i).

Fig. 12. ROC curve for (a) Decision Stump trees (b) Ensemble of AdaBoost with KNORA-E dynamic selection for approach (ii).

Fig. 13. Frequency of Best Performance (a) Based on dynamic selection of ensemble classifiers (b) Based on Features.

3.3. Transfer learning paradigm

A transfer learning approach is adopted to compare the efficacy of the proposed model with that of a convolutional neural network model. The proposed model takes into account dermoscopy-inspired features for designing the classification model, whereas the transfer learning strategy is built using a pre-trained Xception network that was trained on the ImageNet dataset [35]. The Keras library on top of TensorFlow was used. Output from the Xception net was passed to an average pooling layer followed by a 2-neuron softmax output. This output layer was trained on 400 cat and dog images obtained from Kaggle [36], with the Xception model made non-trainable. The trained model was tested on 256 × 256 cropped ISBI skin lesion images. The Xception network resulted in a test accuracy of 80.67%. It has been observed that the proposed model performs better than the Xception model built using the transfer learning strategy.
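The transfer-learning head described above can be sketched with the Keras API. Only the pieces stated in the paper (frozen Xception base, average pooling, 2-neuron softmax, 256 × 256 inputs) come from the source; everything else (optimizer, loss, function name) is an assumption:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

def build_transfer_model(weights=None):
    """Frozen Xception base + average pooling + 2-way softmax head.
    Pass weights="imagenet" to reproduce the pre-trained setup [35]."""
    base = Xception(weights=weights, include_top=False,
                    input_shape=(256, 256, 3))
    base.trainable = False  # only the classification head is trained
    inputs = Input(shape=(256, 256, 3))
    x = base(inputs, training=False)
    x = GlobalAveragePooling2D()(x)
    outputs = Dense(2, activation="softmax")(x)  # benign vs malignant
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

`model.fit` would then be run on the training images (labels one-hot encoded) and `model.evaluate` on the 256 × 256 cropped ISBI images, as described above.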

4. Conclusion

A clinically oriented framework is developed for the diagnosis of melanocytic skin lesions. The proposed algorithm takes into account the domain knowledge of the skin lesions to extract features specific to the lesion area. In addition to the statistical features, novel algorithms for determining the region of interest, shape asymmetry, color asymmetry, color similarity index and pigment network have been proposed. The implementation issues are discussed with the aid of quantitative and qualitative analysis. Further, ensembles of homogeneous classifiers are designed, and dynamic selection methods are incorporated for the selection of ensemble models to distinguish benign and malignant lesions. The proposed approach is evaluated on two dermoscopic datasets, and ten-fold cross validation is also performed for both. In comparison with the other algorithms existing in the state-of-the-art literature, better performance is obtained by the proposed method on the respective datasets. The proposed system is designed using domain-specific features rather than abstract image features; this implies the quantification of an expert's domain knowledge. Moreover, it is also shown that the features extracted are not specific to one dataset and can be used for diverse datasets. We hope that the proposed framework paves the way for a new direction in dermoscopic image analysis.

Acknowledgements

The authors thank Dr. Sathish Pai Ballambat, Professor and Head, Department of Dermatology, Venereology and Leprosy, Kasturba Medical College, Manipal for the expert guidance. The authors want to thank Mr. Vatsal Aggarwal for the support provided in Python simulations. The authors also express their gratitude to Prof. Tanweer, Manipal Institute of Technology, Manipal for his extensive support and contribution in carrying out this research.

References

[1] S. Pathan, K.G. Prabhu, P.C. Siddalingaswamy, Techniques and algorithms for computer aided diagnosis of pigmented skin lesions: a review, Biomed. Signal Process. Control 39 (2018) 237-262.
[2] A. Jemal, R. Siegel, J. Xu, E. Ward, Cancer statistics, 2010, CA Cancer J. Clin. 60 (5) (2010) 288-296.
[3] A. Blum, R. Hofmann-Wellenhof, H. Luedtke, U. Ellwanger, A. Steins, S. Roehm, C. Garbe, H.P. Soyer, Value of the clinical history for different users of dermoscopy compared with results of digital image analysis, J. Eur. Acad. Dermatol. Venereol. 18 (6) (2004) 665-669.
[4] A. Masood, A.A. Al-Jumaily, Computer aided diagnostic support system for skin cancer: a review of techniques and algorithms, Int. J. Biomed. Imaging (2013).
[5] H. Lee, Y.P.P. Chen, Skin cancer extraction with optimum fuzzy thresholding technique, Appl. Intell. 40 (3) (2014) 415-426.
[6] M.E. Celebi, H.A. Kingravi, B. Uddin, H. Iyatomi, Y.A. Aslandogan, W.V. Stoecker, R.H. Moss, A methodological approach to the classification of dermoscopy images, Comput. Med. Imaging Graph. 31 (6) (2007) 362-373.
[7] G. Schaefer, B. Krawczyk, M.E. Celebi, H. Iyatomi, An ensemble classification approach for melanoma diagnosis, Memetic Comput. 6 (4) (2014) 233-240.
[8] C. Barata, M.E. Celebi, J.S. Marques, Improving dermoscopy image classification using color constancy, IEEE J. Biomed. Health Inform. 19 (3) (2015) 1146-1152.
[9] A. Pennisi, D.D. Bloisi, D. Nardi, et al., Skin lesion image segmentation using Delaunay triangulation for melanoma detection, Comput. Med. Imaging Graph. 52 (2016) 89-103, http://dx.doi.org/10.1016/j.compmedimag.2016.05.002.
[10] I. Maglogiannis, K.K. Delibasis, Enhancing classification accuracy utilizing globules and dots features in digital dermoscopy, Comput. Methods Programs Biomed. 118 (2) (2015) 124-133.
[11] C. Barata, M.E. Celebi, J.S. Marques, Development of a clinically oriented system for melanoma diagnosis, Pattern Recognit. 69 (2017) 270-285.
[12] N.Z. Tajeddin, B.M. Asl, Melanoma recognition in dermoscopy images using lesion's peripheral region information, Comput. Methods Programs Biomed. 163 (2018) 143-153.
[13] A. Jukić, I. Kopriva, A. Cichocki, Noninvasive diagnosis of melanoma with tensor decomposition-based feature extraction from clinical color image, Biomed. Signal Process. Control 8 (6) (2013) 755-763.
[14] R.B. Oliveira, A.S. Pereira, J.M.R. Tavares, Skin lesion computational diagnosis of dermoscopic images: ensemble models based on input feature manipulation, Comput. Methods Programs Biomed. 149 (2017) 43-53.
[15] L. Yu, H. Chen, Q. Dou, J. Qin, P.-A. Heng, Automated melanoma recognition in dermoscopy images via very deep residual networks, IEEE Trans. Med. Imaging 36 (4) (2017) 994-1004.
[16] N. Codella, Q.B. Nguyen, S. Pankanti, D. Gutman, B. Helba, A. Halpern, J.R. Smith, Deep learning ensembles for melanoma recognition in dermoscopy images, IBM J. Res. Dev. 61 (4) (2016).
[17] R. Kasmi, K. Mokrani, Classification of malignant melanoma and benign skin lesions: implementation of automatic ABCD rule, IET Image Process. 10 (6) (2016) 448-455.
[18] T. Mendonça, P.M. Ferreira, J.S. Marques, A.R.S. Marcal, J. Rozeira, PH2: a dermoscopic image database for research and benchmarking, in: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2013, pp. 5437-5440.
[19] T. Mendonça, P.M. Ferreira, J.S. Marques, A.R.S. Marcal, J. Rozeira, PH2: a dermoscopic image database for research and benchmarking, in: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2013.
[20] C. Barata, J.S. Marques, J. Rozeira, A system for the detection of pigment network in dermoscopy images using directional filters, IEEE Trans. Biomed. Eng. 59 (10) (2012) 2744-2754.
[21] Q. Abbas, M. Celebi, I.F. García, Hair removal methods: a comparative study for dermoscopy images, Biomed. Signal Process. Control 6 (4) (2011) 395-404.
[22] M.T.B. Toossi, H.R. Pourreza, H. Zare, M.H. Sigari, P. Layegh, A. Azimi, An effective hair removal algorithm for dermoscopy images, Skin Res. Technol. 19 (3) (2013) 230-235.
[23] S. Pathan, K.G. Prabhu, P.C. Siddalingaswamy, A methodological approach to classify typical and atypical pigment network patterns for melanoma diagnosis, Biomed. Signal Process. Control 44 (2018) 25-37.
[24] S. Pathan, K.G. Prabhu, P.C. Siddalingaswamy, Hair detection and lesion segmentation in dermoscopic images using domain knowledge, Med. Biol. Eng. Comput. (2018) 1-15.
[25] T.F. Chan, L.A. Vese, Active contours without edges, IEEE Trans. Image Process. 10 (2) (2001) 266-277.
[26] T.K. Lee, D.I. McLean, M.S. Atkins, Irregularity index: a new border irregularity measure for cutaneous melanocytic lesions, Med. Image Anal. 7 (1) (2003) 47-64.
[27] H. Tamura, S. Mori, T. Yamawaki, Textural features corresponding to visual perception, IEEE Trans. Syst. Man Cybern. 8 (6) (1978) 460-473.
[28] S. Pathan, P.C. Siddalingaswamy, K.G. Prabhu, Classification of benign and malignant melanocytic lesions: a CAD tool, in: Proc. IEEE International Conference on Advances in Computing, Communications and Informatics, 2017.
[29] C. Wild, G. Seber, The Wilcoxon rank-sum test, 2016.
[30] H. He, Y. Bai, E.A. Garcia, S. Li, ADASYN: adaptive synthetic sampling approach for imbalanced learning, in: IEEE International Joint Conference on Neural Networks (IJCNN 2008, IEEE World Congress on Computational Intelligence), 2008, pp. 1322-1328.
[31] S.M. Odeh, A.K.M. Baareh, A comparison of classification methods as diagnostic system: a case study on skin lesions, Comput. Methods Programs Biomed. 137 (2016) 311-319.
[32] A.H. Ko, R. Sabourin, A.S. Britto Jr., From dynamic classifier selection to dynamic ensemble selection, Pattern Recognit. 41 (5) (2008) 1718-1731.
[33] L. Didaci, G. Giacinto, F. Roli, G.L. Marcialis, A study on the performances of dynamic classifier selection based on local accuracy estimation, Pattern Recognit. 38 (11) (2005) 2188-2191.
[34] T.Y. Satheesha, D. Satyanarayana, M.G. Prasad, K.D. Dhruve, Melanoma is skin deep: a 3D reconstruction technique for computerized dermoscopic skin lesion classification, IEEE J. Transl. Eng. Health Med. 5 (2017) 1-17.
[35] F. Chollet, Xception: deep learning with depthwise separable convolutions, arXiv preprint arXiv:1610.02357, 2017.
[36] Kaggle, Dogs vs. Cats, https://www.kaggle.com/c/dogs-vs-cats (accessed 1 December 2018).
