
Physical and Engineering Sciences in Medicine

https://doi.org/10.1007/s13246-020-00890-3

SCIENTIFIC PAPER

Automated classification of diabetic retinopathy through reliable feature selection
S. Gayathri1 · Varun P. Gopi1   · P. Palanisamy1

Received: 2 November 2019 / Accepted: 23 June 2020


© Australasian College of Physical Scientists and Engineers in Medicine 2020

Abstract
Diabetic retinopathy (DR) is a complication of diabetes mellitus that damages the blood vessels in the retina. DR is considered
a serious vision-threatening impediment that most diabetic subjects are at risk of developing. Effective automatic detection
of DR is challenging. Feature extraction plays an important role in the effective classification of disease. Here we focus on
a feature extraction technique that combines two feature extractors, speeded up robust features and binary robust invariant
scalable keypoints, to extract the relevant features from retinal fundus images. The selection of top-ranked features using
the MR-MR (maximum relevance-minimum redundancy) feature selection and ranking method enhances the efficiency of
classification. The system is evaluated across various classifiers, such as support vector machine, Adaboost, Naive Bayes,
Random Forest, and multi-layer perceptron (MLP), using image features extracted from standard datasets (IDRiD,
MESSIDOR, and DIARETDB0). The performances of the classifiers were analyzed by comparing their specificity, precision,
recall, false positive rate, and accuracy values. We found that MLP, when used together with the proposed feature
extraction and selection technique, outperforms all the other classifiers for all datasets in binary and multiclass
classification.

Keywords  DR detection · Retinal fundus images · SURF · BRISK · Feature selection and ranking · MR-MR method · 10-fold cross validation

* Varun P. Gopi
  varun@nitt.edu
  S. Gayathri
  gsgayathriunnithan@gmail.com
1 National Institute of Technology, Trichy, Tamilnadu, India

Introduction

Diabetic retinopathy (DR) mainly occurs in people who have a history of diabetes mellitus. DR is a result of gradual
damage to the blood vessels in the retina, where these tiny blood vessels eventually leak blood and other fluids. DR is
considered a serious vision-threatening complication of diabetes. DR is a clearly defined marker of coronary
diseases[1]. The significance of early detection of DR is that it might help to reduce the risk of coronary diseases and
can be used as a biomarker for many chronic diseases. However, early detection of DR requires efficient algorithms that
can extract the most valuable features from the image for classifying images as either DR or normal fundus. The main
stages involved in DR can be categorized as non-proliferative DR (NPDR) and proliferative DR (PDR). At the primary stage
(NPDR), many symptoms might develop in the retina due to the DR. PDR refers to the advanced stage of DR. The stages in DR
are broadly categorized as mild NPDR, moderate NPDR, severe NPDR, and PDR[2]. This categorization is required when DR
severity grading is considered. The structure of a normal retina is shown in Fig. 1a. The major lesions[3] that are
considered for grading are: (1) microaneurysms (MA), (2) blood vessels, (3) hemorrhage, and (4) exudates. The structure
of a retina with such lesions is shown in Fig. 1b–e. From these diagrams, the changes in the retina are clearly visible.
If any of the above conditions persist, then it is considered that the subject has DR. An MA is a small swelling formed
in the wall of a tiny blood vessel. In patients with DR, these minute swellings are considered the earliest visible
symptom of DR. They can be visualized as small red dots in the retina[1]. As the disease progresses, their size
increases. Retinal hemorrhage is another disorder in the retina that is caused by DR. Other reasons for hemorrhage (HM)
include hypertension and retinal vein occlusion. If HMs are small, then they can resemble MAs. When there are lipid and
protein residues in the leaked blood from the damaged capillaries,


Fig. 1  Different stages of DR in fundus images

these form yellow flecks in the retina, called exudates. MAs and HMs are categorized as dark lesions and exudates as
bright lesions. It is necessary to implement a technique that can extract both of these lesion types. Intra-retinal
microvascular abnormalities (IRMAs)[4] are among the most important factors for detecting DR. IRMAs involve the abnormal
branching or expansion of retinal blood vessels.

DR is difficult to cure once it has progressed to an advanced stage. Ultimately, DR leads to complete vision loss. It is
important to reduce the global prevalence of DR. To achieve this, many techniques have been developed to detect DR in
its early stages. It is important to make the technique as accurate as possible, as well as to reduce the implementation
cost. There are already many effective algorithms for the detection of macular edema from retinal optical coherence
tomography (OCT) images[5]. If a subject has reached PDR, they will have progressed gradually through all other stages
of DR, ultimately leading to vision loss. In this regard, early detection of DR in its primary stage (mild NPDR), or at
least in the moderate NPDR stage, is important. In the primary stage, the lesions appear, but they might not be clearly
visible and can be difficult to differentiate from normal retinal conditions. Extracting each of the lesions in the
retina of the subjects is a difficult task, and it makes DR detection more complicated. For accurate automated
detection, highly efficient feature extraction techniques are needed to get the most reliable features from the fundus
image that can differentiate normal and DR images. This work concentrates on such an efficient automated feature
learning method that can be used to extract features from retinal fundus images for binary and multiclass
classification. If all features that are related to DR can be automatically extracted, then the detection process will
become easier. For that, artificial intelligence will be useful. Here we propose a simple and efficient algorithm for
binary and multiclass classification of DR from fundus images.

Related works

There have been many studies addressing DR detection and grading the severity of the disease. The probabilistic latent
semantic analysis (PLSA) technique has been combined with the Bag of Visual Words (BoVW) method to separate diseased and
normal retinal images using the SVM (support vector machine) classifier[6]. A comparative study of the different feature
extraction techniques is conducted in[7]. According to these authors' analysis, SIFT (scale invariant feature
transform), SURF (speeded up robust features), and BRISK (binary robust invariant scalable keypoints) are the most
scale-invariant feature detectors and can handle wide scale variations. In[8], SURF and BoVW are used to extract
features, which are given to the SVM classifier for further classification, yielding an accuracy of around 94%.
Segmentation of retinal vessels using the Adaboost classifier is evaluated in[9]. This feature-based classifier was
trained and tested with the DRIVE database and achieved an area under the ROC (receiver operating characteristic) curve
of 0.9561. The detection and classification of glaucoma from retinal images by combining clinical and multiresolution
features is described in[10]. A fast feature extraction algorithm using SURF and BRISK


is proposed in[11]; the extracted features are then matched using k-nearest neighbours (k-NN) for a retina
identification task. This feature extraction concept can be adopted here, as it combines two strong local feature
extraction techniques. A decision support system for early detection of DR is introduced in[12]. The system was
developed with Gabor and Discrete Fourier Transform (DFT) attributes. Then, spectral regression discriminant analysis is
used to perform the dimensionality reduction. Random forest and logistic regression classifiers are used for the
classification. For bright lesion detection, feature extraction using SIFT is applied in[13]. Then, dimensionality
reduction is carried out using Laplacian Eigen (LE) maps. In[14], local features of retinal images are extracted using
Local Binary Patterns (LBP). These are then evaluated across Artificial Neural Network (ANN), Random Forest, and SVM
classifiers for the detection task. In[15], feature selection from retinal OCT images is made by the Laplacian score
(L-score) method for angle-closure glaucoma detection. Then, the maximum relevance-minimum redundancy (MR-MR) method is
used for dimensionality reduction. The classification was performed using the AdaBoost classifier. A sparse coding
technique with linear SVM for retinal image classification is proposed in[16]. These authors make use of the BoVW
technique for feature extraction. According to their evaluation, a dictionary size of 100 achieves better sensitivity
and specificity. A data fusion method with a meta-SVM classifier for DR detection is implemented in[17]. In[18], a
scanning window analysis (SWA) and the hybrid method of morphology are applied for retinal feature extraction. Principal
component analysis (PCA) is adopted to locate the optic disc for retinal feature extraction in[19]. Also, to detect the
disc boundary, a modified active shape model (ASM) is proposed. In[20], PCA is used for localization of the optic disc
and segmentation is based on a Markov random field. In[21], the naive Bayes classifier is used to classify DR and normal
images, which yielded higher accuracy than SVM. In[22], the DR classification was done using a fuzzy image processing
technique. The evaluation is performed for k-NN, polynomial and RBF (Radial Basis Function) kernel SVM, and a naive
Bayes classifier, of which the k-NN classifier showed the best performance.

Methodology

The proposed method for detecting DR at an early stage consists of five steps: database selection, feature extraction,
dimensionality reduction, classification, and performance analysis. The proposed method is illustrated in Fig. 2. SURF
and BRISK techniques are applied to extract the features from the input image. The feature descriptors in this work
mainly focus on the local features, which means local regions in the image. All of the features obtained from the
extractors are passed as input to the MR-MR technique for feature selection and ranking. Here, the features are ranked
according to their relevance, and then the 30 most relevant features are selected while maintaining minimum redundancy.
This, in turn, enhances the speed of the classifier. In this work, the strongest 30 features of each image are selected
as input to the classifier for training. The feature selection is performed according to the relevance of a feature in
describing the characteristics of the image. Then, 10-fold cross-validation is applied for validating the classifier.
The scripts for this work are written in Python 3.7.

Feature extraction techniques

This is an important task that increases the efficiency of the whole system. In the proposed work, a combination of two
local feature extractors (SURF and BRISK) is used. Local features illustrate the local properties of an image (i.e.,
they extract a set of salient points that can be detected repeatedly from the same image irrespective of scale
variance, illumination variance, and orientation variance and describe the gradient properties around the salient
points). Thus, an automatic feature learning technique is implemented using SURF and BRISK.

Speeded up robust features (SURF)

As discussed, SURF[23] is an efficient feature detector and descriptor that can be applied for object recognition and
classification.
The steps involved in this feature extraction are:

1. Selection of interest points
2. Extracting the descriptors
3. Matching descriptor vectors of different images

For interest point selection, it is necessary to filter the images using box filters. The filtering time can be reduced
if we use integral images[23] instead of the original images. Each pixel of an integral image is calculated from the
original image by summing the pixels above and to the left of it. The convolution with the box filters then yields the
entries of a matrix called the Hessian matrix. This matrix is used to find points of interest. The points are selected
such that the determinant of the Hessian matrix is maximum. Let us assume a point in the image as s = (p, q). Then, the
Hessian matrix H(s, σ) in s at scale σ is given in Eq. (2). The scaling is applied by using box filters of different
sizes (mainly up-scaling the size) without changing the image size. The approximate scale can then be calculated using
Eq. (1).
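The integral-image step described for SURF can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `integral_image` and `box_sum` and the 4 × 4 toy image are assumptions made for illustration. It only shows why a box-filter response costs at most four lookups regardless of the filter size.

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img over rows 0..y and columns 0..x."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the original pixels in rows top..bottom and columns left..right (inclusive)."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


# Hypothetical 4 x 4 grey-level patch
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
ii = integral_image(image)
center_2x2 = box_sum(ii, 1, 1, 2, 2)  # pixels 6 + 7 + 10 + 11
```

A box filter of any size is a weighted combination of a few such rectangular sums, which is why up-scaling the filter instead of down-scaling the image does not increase the cost per pixel.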


Fig. 2  Proposed method (flowchart: retinal fundus image → feature extraction with SURF and BRISK → feature selection and ranking (MR-MR) → classifier → performance evaluation)

$$\sigma_{approx} = \text{current filter size} \times \left( \frac{\text{initial filter scale}}{\text{initial filter size}} \right) \qquad (1)$$

$$H(s, \sigma) = \begin{bmatrix} L_{pp}(s, \sigma) & L_{pq}(s, \sigma) \\ L_{pq}(s, \sigma) & L_{qq}(s, \sigma) \end{bmatrix} \qquad (2)$$

where $L_{pp}(s, \sigma)$ is the convolution of the image at point s with the Gaussian second-order derivative
(approximated by a box filter). The other components are obtained in the same way. The determinant of the Hessian matrix
is then obtained as in Eq. (3):

$$\det(H) = L_{pp}(s, \sigma) \times L_{qq}(s, \sigma) - L_{pq}(s, \sigma)^{2} \qquad (3)$$

After getting the determinant value, non-maximum suppression can be used to select the point with the maximum
determinant value in each 3 × 3 neighborhood in the image. Thus, interest points can be selected. Then, the descriptors
for these key points must be extracted. This requires two steps.

– Step 1: Assigning orientation
– Step 2: Constructing the square region and extracting the descriptors

After getting the interest points, our aim is to select a reproducible orientation for those interest points. This is
achieved based on the information from a circular region around the required points. Then, a square region is
constructed that is aligned to the selected orientation, and the descriptors corresponding to that interest point are
extracted. The feature matching process is done using Laplacian sign indexing of interest points. The Laplacian sign
distinguishes bright blobs on dark backgrounds from the reverse case. In the matching stage, features are compared only
if they have the same type of contrast. This information provides faster matching without affecting the performance of
the descriptor. The SURF algorithm exhibits the properties of scale invariance, lighting invariance, rotation
invariance, and translation invariance[23].

Binary robust invariant scalable keypoints (BRISK)

BRISK[24] computes brightness comparisons to form a binary descriptor string from configurable circular sampling
patterns. This is a scale and rotation invariant algorithm. It offers high-quality features, mainly in time-critical
applications. The steps involved in this feature extraction are:


1. Detection of scale-space keypoints
2. Descriptors extraction
3. Descriptors matching

In this method, the scale-space keypoint detection is performed by taking octaves and intra octaves. For the
scale-space representation, BRISK generally uses two pyramids. The first corresponds to m octaves $v_i$ of scale space
and the second deals with the m intra octaves $b_i$, where i = 0, 1, …, m − 1. The methods for construction of the
octaves and intra octaves are illustrated in[24]. In the first pyramid, the first layer is the original image, and all
the other layers are derived by successive half sampling. In the case of the second pyramid, only the initial intra
octave is derived by downsampling the original image by a factor of 1.5, and the other intra octave layers are obtained
by successive half sampling. If s denotes the scale, then $s(v_i) = 2^{i}$ and $s(b_i) = 2^{i} \cdot 1.5$. It is
mentioned in[24] that by using downsampling of 1.5, the computational effectiveness can be maintained. In the proposed
work, using the same method, we were also able to construct a computationally effective scale space of the original
image. The predominant points are selected among all the neighbouring octave and intra octave layers alternately. Thus,
after an iterative process, we get a set of scale-space key points. The BRISK descriptors are composed as binary
strings. The keypoint description is related to positioning the sampling pattern for each keypoint. It is important
that the sampling pattern is scaled and rotated exactly according to each keypoint. Then, Hamming distance is used for
matching purposes[25]. An example of SURF and BRISK feature extraction is illustrated in Fig. 3. The feature extraction
is carried out in the gray scale image and the detected keypoints are marked as red dots in the blue channel image. The
difference in the feature point selection for each category is clearly visible in each image.

Feature selection and ranking

MR-MR[26] is a feature selection algorithm; its goal is to find the features most relevant to the target classes while
reducing the redundancy among the extracted features. In this work, the target classes are the different stages of DR,
which are defined in the database itself. In order to find those relevant features, it is necessary to maximize the
mutual information between a feature and the target class. Let r and s be two random variables; then the mutual
information (μ) between these two variables can be defined using Eq. (4). The mutual information quantifies the
dependency between two variables.

$$\mu(r, s) = \iint p(r, s) \log \frac{p(r, s)}{p(r)\, p(s)} \, dr \, ds \qquad (4)$$

Let D(M, s) denote the mutual information between the features in the set M and the class s. Consider that the set M
contains m features $r_1, r_2, \ldots, r_m$. Then the maximum relevance condition can be formalized as:

$$\max[D(M, s)], \quad D = \frac{1}{|M|} \sum_{r_i \in M} \mu(r_i, s) \qquad (5)$$

Features selected using the maximum relevance approach alone are likely to be highly correlated with one another, which
means they show high redundancy. Therefore, it is required to consider the minimum redundancy condition, which is
formulated in Eq. (6). Let R(M) represent the mutual information between two features $r_i$ and $r_j$ in set M. Then,
the minimum redundancy condition can be written as:

$$\min[R(M)], \quad R = \frac{1}{|M|^{2}} \sum_{r_i, r_j \in M} \mu(r_i, r_j) \qquad (6)$$

The method that combines Eqs. (5) and (6) is called the "MR-MR" method. This method finds the compact set of features
with the highest relevance and the least redundancy by maximizing D(M, s) and minimizing R(M). To combine both criteria
(to optimize D and R), consider an objective function Ψ(D, R). It can be defined as:

$$\max[\Psi(D, R)], \quad \Psi = D - R \qquad (7)$$

The near-optimal features defined by Ψ(·) can be found using incremental search methods[26]. Generally, the feature
that maximizes the criterion is selected for further processing.

Classifier

Support vector machine (SVM)

Support vector machines[27] are supervised learning methods with associated learning algorithms. If the vectors are not
linearly separable in a space, then SVM helps to make them linearly separable in a higher-dimensional space. The
algorithm[28] describing sequential minimal optimization for training SVM is given in Algorithm 1. The different
decision planes that can be obtained in the hyperspace[27] are demonstrated in Fig. 4 and the optimal hyperplane for
the feature sets in Fig. 5.
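The MR-MR criterion of Eqs. (5)–(7) in the feature selection and ranking section can be sketched as a greedy incremental search. This is only an illustration under stated assumptions, not the authors' script: features are assumed to be already discretized, so Eq. (4) becomes a sum over observed value pairs; redundancy is averaged over the features selected so far (the usual incremental form); and the names `mutual_information` and `mrmr_select` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Discrete form of Eq. (4): sum of p(a, b) * log(p(a, b) / (p(a) * p(b)))."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def mrmr_select(features, target, k):
    """Greedy search: repeatedly add the feature with the largest
    relevance (Eq. 5 term) minus mean redundancy against selected features (Eq. 6 term)."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, col in features.items():
            if name in selected:
                continue
            if selected:
                redundancy = sum(
                    mutual_information(col, features[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy  # Psi = D - R, Eq. (7)
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data (hypothetical): f2 duplicates f1, f3 is less relevant but only weakly related to f1
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f2": [0, 0, 0, 1, 1, 1, 1, 1],
    "f3": [0, 1, 0, 0, 1, 0, 1, 1],
}
ranking = mrmr_select(features, target, k=2)
```

In this toy case the duplicate feature `f2` is skipped in favour of `f3`, even though `f2` has the higher relevance, because its redundancy with the already selected `f1` outweighs it.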


Fig. 3  Example for SURF and BRISK key point detection (images are from the MESSIDOR database)


Fig. 4  Different hyper planes

Naive Bayes classifier

The naive Bayes classifier is a probabilistic classifier based on Bayes' theorem[21]. One of its advantages is that it
requires only a small amount of training data. If r is the target class variable and S = (s1, s2, …, sn) is the
predictor feature vector, with known prior probabilities P(r) and P(S) respectively, then the posterior probability
P(r|S), with P(S|r) as the likelihood, can be written in the form

$$P(r \mid S) = \frac{P(S \mid r) \cdot P(r)}{P(S)} \qquad (8)$$

In the naive Bayes method, Bayes' theorem is applied with the strong (naive) assumption that the features are
conditionally independent given the class. In the classifier model, we need to find the probability of the given input
for all possible values of the class variable r and pick the output with maximum probability using the following
equation.

$$r = \arg\max_{r} \; P(r) \prod_{i=1}^{n} P(s_i \mid r) \qquad (9)$$
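The decision rule of Eq. (9) can be sketched in a few lines of Python. The sketch assumes categorical feature values and add-one (Laplace) smoothing, neither of which is specified in the paper, and it evaluates the product in log form to avoid numerical underflow. The function names and the toy samples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Collect the counts needed to estimate P(r) and P(s_i | r) from categorical data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value -> count
    vocab = defaultdict(set)             # feature index -> set of observed values
    for row, r in zip(samples, labels):
        for i, value in enumerate(row):
            value_counts[(r, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Eq. (9) in log form: argmax over r of log P(r) + sum_i log P(s_i | r)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for r, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption) keeps unseen values from giving log(0)
            num = value_counts[(r, i)][value] + 1
            den = count + len(vocab[i])
            score += math.log(num / den)
        if score > best_score:
            best_class, best_score = r, score
    return best_class


# Hypothetical two-feature toy set: (lesion level, exudates present)
samples = [("high", "yes"), ("high", "no"), ("high", "yes"),
           ("low", "no"), ("low", "no"), ("low", "yes")]
labels = ["DR", "DR", "DR", "Normal", "Normal", "Normal"]
model = train_naive_bayes(samples, labels)
prediction = predict(model, ("high", "yes"))
```

The denominator P(S) of Eq. (8) is omitted because it is the same for every class and does not change the arg max.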

AdaBoost classifier

AdaBoost[29] stands for adaptive boosting. The AdaBoost classifier combines several weak learners into a boosted,
stronger classifier. A single classifier may perform poorly. This can be rectified by combining multiple classifiers
that adapt to the training set at every iteration and receive appropriate weights in the final vote, which yields a
good accuracy score for the overall classifier.

Fig. 5  The optimal hyper plane

Algorithm 1 Pseudocode for training the SVM

Require: S and t loaded with labeled data for training
         initially consider η = 0
START:
1. for the soft margin parameter (γ), assume a random value initially
2. repeat
3.   Do for: {si, ti}, {sj, tj}
4.     find the Lagrange multipliers (ηi, ηj) and optimize
5.   end for
6. until η and γ become unchanged
Ensure: Retain support vectors where ηi > 0


Algorithm 2 AdaBoost Classifier Algorithm

Require: Sequence of N sampled data: $(x_1, y_1), \ldots, (x_N, y_N)$; $x_i \in X$ and $y_i = \pm 1$
         D ← Distribution over N samples
         W ← Weak learning classifier
START:
1. Initialize weight vector $\omega_i^{1} = \frac{1}{N}$ for i = 1, 2, ..., N
2. Do for t = 1 to K
3.   set distribution $d^{t} = \frac{\omega^{t}}{\sum_{i=1}^{N} \omega_i^{t}}$
4.   run algorithm W on $d^{t}$;
5.   set weak hypothesis $p_t : X \to \pm 1$;
6.   Aim: select $p_t$ with lowest weighted error.
     Find weighted error of $p_t$: $\epsilon_t = \sum_{i=1}^{N} d_i^{t} \, \frac{|p_t(x_i) - y_i|}{2}$
7.   set weight for the t-th weak classifier $\Theta_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$
8.   set updated weight vector $\omega_i^{t+1} = \frac{\omega_i^{t} \exp\{-\Theta_t \, y_i \, p_t(x_i)\}}{Z_t}$;
     $Z_t$ is a normalization factor
9. end for
Output: Hypothesis $F(x) = +1$ if $\sum_{t=1}^{K} \Theta_t \, p_t(x) \geq 0$, and $-1$ otherwise
End
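The steps of Algorithm 2 can be made concrete with a small runnable sketch. The weak learner is assumed here to be a one-feature threshold stump, which the paper does not specify; the function names and the one-dimensional toy data are hypothetical. The weights are renormalized at the start of each round, which plays the role of the normalization factor in step 8.

```python
import math


def best_stump(X, y, d):
    """Weak learner W: threshold stump with the lowest weighted error under distribution d."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[f] >= thr else -sign for row in X]
                err = sum(di for di, p, yi in zip(d, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] >= thr else -sign


def adaboost_train(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n                                   # step 1: uniform weights
    ensemble = []
    for _ in range(rounds):                             # step 2
        total = sum(w)
        d = [wi / total for wi in w]                    # step 3: normalize to a distribution
        stump = best_stump(X, y, d)                     # steps 4-6: lowest weighted error
        err = min(max(stump[0], 1e-10), 1 - 1e-10)      # clamp so the logarithm stays finite
        theta = 0.5 * math.log((1 - err) / err)         # step 7
        w = [di * math.exp(-theta * yi * stump_predict(stump, row))
             for di, yi, row in zip(d, y, X)]           # step 8: raise weights of mistakes
        ensemble.append((theta, stump))
    return ensemble


def adaboost_predict(ensemble, row):
    """Output hypothesis: sign of the weighted vote of the weak classifiers."""
    score = sum(theta * stump_predict(stump, row) for theta, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single stump can separate
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_train(X, y, rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
```

Each individual stump misclassifies at least two of the six points, but the weighted vote of three stumps classifies all of them correctly, which is the boosting effect the algorithm relies on.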

The AdaBoost algorithm has the power to select only those features known to improve the predictive power of the
model[29], thereby improving the execution time of the classifier by eliminating the irrelevant features.
Mathematically, this classifier[9] can be defined as:

$$D_T(x) = \sum_{t=1}^{T} f_t(x) \qquad (10)$$

where $f_t$ is a weak learner that takes an object x as input and returns a value indicating the object class. The
detailed algorithm[30] of the classifier is given in Algorithm 2.

Random forest

Random Forest[31] is an ensemble classifier in which a group of trees is grown together, each with an independent
random vector; i.e., the Kth tree generates a random vector $\Phi_K$ that is independent of the previously generated
random vectors ($\Phi_1, \Phi_2, \ldots, \Phi_{K-1}$) but has the same distribution[32]. In this work, the number of
trees used in the random forest classifier is 100[33]. The steps in the pseudocode are as follows:

1. Select the features randomly from the total features.
2. From the selected features, find the root node of the tree using the best split method.
3. Again use the best split method to split the remaining data into branches.
4. Repeat steps 1–3 until the tree is complete, with the target classes as the leaf nodes.
5. Construct the forest by repeating steps 1–4 n times to create n trees.
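The random forest steps listed above can be sketched as follows. Several details are assumptions, since the paper does not state them: Gini impurity is used as the "best split" criterion, each tree is trained on a bootstrap sample, trees are limited to a small depth, and the forest predicts by majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, n_sub, rng, depth=0, max_depth=5):
    """Steps 1-4: pick a random feature subset, take the best split, recurse."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]            # leaf: majority class
    best = None
    for f in rng.sample(range(len(rows[0])), n_sub):            # step 1: random features
        for thr in sorted({r[f] for r in rows})[1:]:            # skip lowest value: no empty side
            left = [i for i, r in enumerate(rows) if r[f] < thr]
            right = [i for i, r in enumerate(rows) if r[f] >= thr]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)              # step 2: best split so far
    if best is None:                                             # chosen features are constant
        return Counter(labels).most_common(1)[0][0]
    _, f, thr, left, right = best
    children = [build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                           n_sub, rng, depth + 1, max_depth) for idx in (left, right)]
    return (f, thr, children[0], children[1])                    # step 3: branches


def tree_predict(node, row):
    while isinstance(node, tuple):                               # internal nodes are tuples
        f, thr, left, right = node
        node = left if row[f] < thr else right
    return node


def random_forest(rows, labels, n_trees, n_sub, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):                                     # step 5: grow n trees
        idx = [rng.randrange(len(rows)) for _ in rows]           # bootstrap sample (assumption)
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_sub, rng))
    return forest


def forest_predict(forest, row):
    return Counter(tree_predict(t, row) for t in forest).most_common(1)[0][0]


# Hypothetical, well-separated two-feature data
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = ["Normal"] * 4 + ["DR"] * 4
forest = random_forest(rows, labels, n_trees=25, n_sub=1)
```

With `n_sub=1` every split sees only one randomly chosen feature, which is the source of the independence between trees described in the text; the paper itself uses 100 trees.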

Fig. 6  Schematic of multi layer perceptron


Multi layer perceptron (MLP)

MLP is a multi-layer feed-forward network that maps inputs to outputs in a nonlinear manner. The MLP base structure
contains an input layer, hidden layers, and an output layer, with each node fully connected to the nodes in the next
layer through appropriate weights, as shown schematically in Fig. 6. In the proposed work, only one hidden layer is
used, considering the advantages of a single-hidden-layer MLP mentioned in[34]. The number of nodes in the hidden layer
is set to the average of the number of attributes and the number of classes. MLP is trained using backpropagation, and
its nodes use a non-linear activation function, which distinguishes it from a linear perceptron. In MLP, the sigmoid
function is generally used; it is given in Eq. (11).

$$y_i(s_i) = (1 + e^{-s_i})^{-1} \qquad (11)$$

where $y_i$ is the output of the ith node and $s_i$ is the weighted sum of its input synapses. In the backpropagation
algorithm[35], the aim is to reduce the error propagated through the network by adjusting the weights at each node. The
error $e_j(n)$ at the jth output node for the nth data point is calculated from the actual output value $a_j(n)$ and the
predicted output value $y_j(n)$ as in Eq. (12).

$$e_j(n) = a_j(n) - y_j(n) \qquad (12)$$

To minimize the error over the entire output, which is given in Eq. (13), the weights at each node are corrected
according to Eq. (14).

$$\sigma(n) = \frac{1}{2} \sum_{j} e_j^{2}(n) \qquad (13)$$

$$\Delta W_{ji}(n) = -\alpha \frac{\partial \sigma(n)}{\partial s_j(n)} \, y_i(n) \qquad (14)$$

where α is the learning rate, $s_j(n)$ is the weighted input sum of node j as in Eq. (11), and $y_i(n)$ is the output of
the previous node. The iterative process continues until the error no longer changes.

Performance analysis

The performance of the classifier is analyzed using K-fold cross-validation[36]. In this evaluation technique, the
entire available dataset is split into K subsections (K = 1, 2, 3, …). Then, each subsection is treated as a validation
set for one iteration.
The general steps in K-fold validation are as follows:

1. Randomly shuffle the dataset;
2. Split the data into K subgroups (if K = 10, then split the data into ten groups);
3. Perform the evaluation process for each group;

– Use one group as a test set,
– Use the remaining groups as the training dataset,
– Train the classifier with this dataset and evaluate the model with the test data,
– Retain the evaluation score and repeat the steps by selecting another group.

4. Summarize the model efficiency using the evaluation scores.

The results are stored in the form of a confusion matrix[37]. The structure of the confusion matrix that depicts the
characteristics of a binary classifier is shown in Table 1. In that matrix, P, Q, R, and S represent the number of true
positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) respectively. TP and TN give the
correctly classified data while FP and FN give the incorrectly classified data. Using these values, we can calculate
the accuracy, F-score, specificity, precision and recall of the classifier to examine system efficiency.

Table 1  Confusion matrix

Actual diagnosis    Predicted diagnosis
                    DR       NO DR
DR                  P        Q
NO DR               R        S

Accuracy defines the overall power of the system. It can be obtained from the confusion matrix using the formula,

$$\text{Accuracy} = \frac{P + S}{P + Q + R + S} \qquad (15)$$

False Positive Rate (FPR) gives the rate of incorrect positive predictions. The best FPR for a good classifier is 0.0.

$$\text{FPR} = \frac{R}{R + S} \qquad (16)$$

Precision gives the positive predictive value. This value indicates how efficiently our system avoids FPs. It can be
measured as,

$$\text{Precision} = \frac{P}{P + R} \qquad (17)$$

Recall, also called sensitivity, indicates how efficiently the model reduces FNs. This can be calculated as,


$$\text{Recall} = \frac{P}{P + Q} \qquad (18)$$

The F1-score determines the model's accuracy. This score gives the harmonic mean of precision and recall.

$$F1\,\text{score} = \frac{2P}{2P + R + Q} \qquad (19)$$

High values of these measures, except FPR, indicate good performance of the classifier.
Specificity quantifies how efficiently the FPs are reduced in a model.

$$\text{Specificity} = \frac{S}{R + S} \qquad (20)$$

In order to summarize the performance of the system, the weighted average of the performance measures of each class is
required. If $P_1$ and $P_2$ denote the performance measures obtained for class 1 ($C_1$) and class 2 ($C_2$)
respectively, then the weighted average of the performance measure, WPM, can be calculated using the following equation:

$$WPM = \frac{(P_1 \ast |C_1|) + (P_2 \ast |C_2|)}{|C_1| + |C_2|} \qquad (21)$$

Results and discussions

The proposed system is evaluated for both binary and multi-class classification. For binary classification, all
categories in the MESSIDOR and IDRiD databases are combined into a single category of DR, whereas in the DIARETDB0
database, only normal and DR images are available. For multiclass classification, we used the categories available in
the MESSIDOR and IDRiD databases. In MESSIDOR, the images are categorized into four classes: normal, mild DR, moderate
DR, and severe DR. In IDRiD, there are five classes (normal, mild NPDR, moderate NPDR, severe NPDR, and PDR).

Table 2  Confusion matrix for the evaluation of each classifier using IDRiD Database

                     DR      No DR
(a) SVM
   DR                278     1
   No DR             10      124
(b) Adaboost
   DR                279     0
   No DR             10      124
(c) Naive Bayes
   DR                276     3
   No DR             4       130
(d) Random Forest
   DR                276     3
   No DR             3       131
(e) MLP
   DR                276     3
   No DR             3       131

Table 3  Confusion matrix for the evaluation of each classifier using MESSIDOR Database

                     DR      No DR
(a) SVM
   DR                638     16
   No DR             11      535
(b) AdaBoost
   DR                654     0
   No DR             66      480
(c) Naive Bayes
   DR                614     40
   No DR             23      523
(d) Random Forest
   DR                637     17
   No DR             18      528
(e) MLP
   DR                642     12
   No DR             13      533

Analysis of binary classification (DR or normal)

The performance of the proposed system for binary classification is evaluated over the IDRiD[38], MESSIDOR[39] and
DIARETDB0[40] databases. There are a total of 413 images in the IDRiD database, of which 279 are included in the DR
category and 134 in the normal category. In the MESSIDOR database, there are a total of 1200 images (654 DR images and
546 normal images). DIARETDB0 contains a total of 130 images, which are categorized into 20 normal and 110 DR images.
The confusion matrices for evaluation are created by applying the prescribed cross-validation method to determine the
efficiency of each classifier and to find the classifier that gives the best performance with the proposed feature
extraction and ranking method. The feature set used for classification is the combined features extracted using the
SURF-BRISK procedure. The features extracted using SURF and BRISK show the properties of light invariance, scale
invariance, translation invariance, and rotation invariance. Thus, all features, irrespective of contrast (dark or
bright), size (large or small), and shape, can be extracted using this


combined feature extraction method. Here, a proportion-based feature selection is not used because sometimes either
SURF or BRISK alone provides all of the relevant features that can describe the characteristics of the input image. In
that case, selecting a fixed ratio of features from each method may miss predominant features. In the proposed work,
all of the extracted features are combined and ranked according to their relevance using the MR-MR method and the 30
top-ranked features are fed into the classifier. For the performance analysis, a 10-fold cross-validation method is
also utilized. The classifiers evaluated are SVM, Adaboost, Naive Bayes, Random Forest, and MLP. The confusion matrices
in Tables 2, 3 and 4 are used to evaluate the performance of each classifier with MR-MR selected features. From these
confusion matrices, the correctly classified and wrongly classified instances can be recognized. While evaluating the
matrices, we noted that when the IDRiD and MESSIDOR datasets are used, the number of FNs is zero for the Adaboost
classifier, and the number of FNs is zero for the MLP classifier when the DIARETDB0 database is used. The number of FPs
is higher in SVM and Adaboost

Table 4  Confusion matrix for the evaluation of each classifier using DIARETDB0 Database

                     DR      No DR
(a) SVM
   DR                107     3
   No DR             1       19
(b) AdaBoost
   DR                107     3
   No DR             1       19
(c) Naive Bayes
   DR                94      16
   No DR             6       14
(d) Random Forest
   DR                109     1
   No DR             4       16
(e) MLP
   DR                107     3
   No DR             0       20

Table 5  Detailed efficiency measures of each class for different classifiers with strongest 30 features from IDRiD Database images

Classifier      FP rate  Specificity  Precision  Recall  F1 score  Class
SVM             0.004    0.996        0.992      0.925   0.958     Normal
                0.075    0.925        0.965      0.996   0.981     DR
Adaboost        0.00     1.00         1.00       0.925   0.961     Normal
                0.075    0.925        0.965      1.00    0.982     DR
Naive Bayes     0.011    0.989        0.977      0.970   0.974     Normal
                0.030    0.97         0.986      0.989   0.987     DR
Random Forest   0.022    0.978        0.989      0.989   0.989     Normal
                0.011    0.989        0.978      0.978   0.978     DR
MLP             0.007    0.993        0.985      0.978   0.981     Normal
                0.022    0.978        0.989      0.993   0.991     DR

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work
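
The MR-MR ranking step described in the text (rank the combined SURF-BRISK features, then keep the top-ranked ones) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the features have already been discretized and uses the mutual-information-difference form of the max-relevance, min-redundancy criterion, which is one common variant; the paper does not say which variant it uses. The helper names `mutual_information` and `select_mrmr` and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log((c * n) / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def select_mrmr(columns, labels, k):
    """Greedy max-relevance, min-redundancy selection.

    columns: list of feature columns, each a list of discrete values.
    Returns the indices of the k selected features in ranking order.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:

        def score(j):
            # Relevance to the class minus mean redundancy with chosen features.
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data, made up for illustration: feature 1 is a duplicate of feature 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]
print(select_mrmr(columns, labels, 2))  # the duplicate feature is ranked last
```

In the paper's setting, `k` would be 30 and `columns` would hold the combined SURF and BRISK features. The redundancy term is what stops two near-identical descriptors from both occupying slots among the selected features.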

Table 6  Detailed efficiency measures of each class for different classifiers with strongest 30 features from MESSIDOR Database images

Classifier      FP rate  Specificity  Precision  Recall  F1 score  Class
SVM             0.024    0.996        0.971      0.980   0.975     Normal
                0.02     0.98         0.983      0.976   0.979     DR
AdaBoost        0.00     1.00         1.00       0.879   0.936     Normal
                0.121    0.879        0.908      1.00    0.952     DR
Naive Bayes     0.011    0.989        0.977      0.970   0.974     Normal
                0.030    0.97         0.986      0.989   0.987     DR
Random Forest   0.026    0.974        0.969      0.967   0.968     Normal
                0.033    0.967        0.973      0.974   0.973     DR
MLP             0.018    0.982        0.978      0.976   0.977     Normal
                0.024    0.976        0.982      0.982   0.981     DR

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work


Table 7  Detailed efficiency measures of each class for different classifiers with strongest 30 features from DIARETDB0 Database images

Classifier      FP rate  Specificity  Precision  Recall  F1 score  Class
SVM             0.027    0.973        0.864      0.95    0.905     Normal
                0.05     0.95         0.991      0.973   0.982     DR
Adaboost        0.027    0.973        0.864      0.950   0.905     Normal
                0.05     0.95         0.991      0.973   0.982     DR
Naive Bayes     0.145    0.855        0.467      0.70    0.560     Normal
                0.30     0.70         0.940      0.855   0.895     DR
Random Forest   0.009    0.991        0.941      0.8     0.865     Normal
                0.2      0.8          0.965      0.991   0.978     DR
MLP             0.027    0.973        0.87       1.00    0.930     Normal
                0.00     1.00         1.00       0.973   0.986     DR

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work
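
The per-class values in Table 7 can be reproduced from the confusion matrices in Table 4. The sketch below assumes that the rows of Table 4 are the actual classes and the columns are the predicted classes; the paper does not state this, but the assumption reproduces the published numbers. The helper `class_metrics` is a hypothetical name.

```python
def class_metrics(tp, fn, fp, tn):
    """Per-class measures from one-vs-rest counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "fp_rate": fp / (fp + tn),
        "specificity": tn / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Table 4(a), SVM on DIARETDB0. Rows are assumed to be actual classes:
#                 predicted DR   predicted No DR
# actual DR           107              3
# actual No DR          1             19
dr = class_metrics(tp=107, fn=3, fp=1, tn=19)
normal = class_metrics(tp=19, fn=1, fp=3, tn=107)

for name, m in (("Normal", normal), ("DR", dr)):
    print(name, {key: round(value, 3) for key, value in m.items()})
```

Each class is treated in turn as the positive class, which is why the FP rate of the Normal row is computed against the 110 DR images and that of the DR row against the 20 normal images.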

Table 8  Weighted average values for each performance measure in Tables 5, 6 and 7 for binary classification

Database         Classifier      FP rate  Specificity  Precision  Recall  F1 score
IDRiD [38]       SVM             0.052    0.948        0.974      0.973   0.973
                 AdaBoost        0.050    0.95         0.977      0.976   0.976
                 Naive Bayes     0.024    0.976        0.983      0.983   0.983
                 Random Forest   0.019    0.981        0.985      0.985   0.985
                 MLP             0.017    0.983        0.988      0.988   0.988
MESSIDOR [39]    SVM             0.02     0.98         0.978      0.978   0.978
                 AdaBoost        0.066    0.934        0.950      0.945   0.945
                 Naive Bayes     0.051    0.949        0.948      0.948   0.948
                 Random Forest   0.030    0.97         0.971      0.971   0.971
                 MLP             0.021    0.979        0.979      0.979   0.979
DIARETDB0 [40]   SVM             0.047    0.953        0.971      0.969   0.97
                 AdaBoost        0.047    0.953        0.971      0.969   0.970
                 Naive Bayes     0.276    0.724        0.867      0.831   0.844
                 Random Forest   0.171    0.829        0.961      0.962   0.960
                 MLP             0.004    0.996        0.980      0.977   0.978

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work
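
The weighted averages in Table 8 appear to weight each class by its number of images (its support). That weighting is inferred from the published numbers rather than stated in the text. Under that assumption, the DIARETDB0 SVM row of Table 8 follows from the Table 4(a) counts as sketched below; the helper `weighted_average` is a hypothetical name.

```python
def weighted_average(values, supports):
    """Support-weighted mean of per-class values."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total


# DIARETDB0 class sizes from the text: 20 normal and 110 DR images.
supports = [20, 110]

# Per-class SVM measures in the order (Normal, DR), written as exact ratios
# of the Table 4(a) counts so that rounding happens only once, at the end.
svm = {
    "fp_rate": [3 / 110, 1 / 20],
    "specificity": [107 / 110, 19 / 20],
    "precision": [19 / 22, 107 / 108],
    "recall": [19 / 20, 107 / 110],
    "f1": [38 / 42, 214 / 218],
}

for measure, per_class in svm.items():
    print(measure, round(weighted_average(per_class, supports), 3))
```

Because DR images outnumber normal images by more than five to one in DIARETDB0, the weighted averages are dominated by the DR class, which is worth keeping in mind when comparing rows of Table 8 across databases.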

Table 9  Validation accuracy of each classifier using SURF+BRISK-MR-MR selected features from IDRiD Database images for binary classification

Classifier      Correctly classified instances  Accuracy (%)
SVM             402                             97.33
Adaboost        403                             97.57
Naive Bayes     406                             98.31
Random Forest   407                             98.55
MLP             408                             98.78

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work

Table 10  Validation accuracy of each classifier using SURF+BRISK-MR-MR selected features from MESSIDOR Database images

Classifier      Correctly classified instances  Accuracy (%)
SVM             1173                            97.75
Adaboost        1134                            94.52
Naive Bayes     1137                            94.75
Random Forest   1165                            97.08
MLP             1175                            97.92

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work
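
The validation accuracies reported for the binary case come from 10-fold cross-validation: every image is used for testing exactly once, and the correctly classified instances are pooled over the folds. A minimal sketch of the fold assignment is given below. It assumes a stratified split, although the paper does not say whether stratification was used, and the helper `stratified_folds` is a hypothetical name. The class counts are those reported for DIARETDB0.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds, keeping class ratios roughly equal.

    Returns a list of k index lists. Fold i is the test set of round i and
    the remaining folds together form the training set.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# DIARETDB0 class counts from the text: 110 DR and 20 normal images.
labels = ["DR"] * 110 + ["Normal"] * 20
folds = stratified_folds(labels, k=10)

for test_idx in folds:
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    # A classifier would be fitted on train_idx and scored on test_idx here.
    assert len(train_idx) + len(test_idx) == len(labels)

print([len(fold) for fold in folds])
```

In practice the indices would be shuffled before being dealt into folds; the shuffle is left out here so that the result is deterministic.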


Table 11  Validation accuracy of each classifier using SURF+BRISK-MR-MR selected features from DIARETDB0 Database images

Classifier      Correctly classified instances  Accuracy (%)
SVM             126                             96.92
Adaboost        126                             96.92
Naive Bayes     108                             83.07
Random Forest   125                             96.15
MLP             127                             97.69

Table 12  Performance of each classifier (binary case) with proposed feature extraction with training on MESSIDOR and IDRiD and testing on DIARETDB0

Trained Database  Classifier      Correctly classified instances  Accuracy (%)
IDRiD             SVM             114                             87.6
                  Adaboost        115                             88.4
                  Naive Bayes     106                             81.5
                  Random Forest   119                             91.5
                  MLP             122                             93.8
MESSIDOR          SVM             111                             85.3
                  Adaboost        115                             88.4
                  Naive Bayes     104                             80.0
                  Random Forest   118                             90.7
                  MLP             120                             92.3

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work

Table 13  Confusion matrix for the evaluation of each classifier for multiclass classification using MESSIDOR Database

                    Normal  Mild  Moderate  Severe
(a) SVM
   Normal           532     14    0         0
   Mild             6       141   6         0
   Moderate         0       6     234       7
   Severe           0       0     11        243
(b) Adaboost
   Normal           546     0     0         0
   Mild             1       0     0         152
   Moderate         0       0     0         247
   Severe           0       0     0         254
(c) Naive Bayes
   Normal           501     39    0         6
   Mild             15      129   7         2
   Moderate         1       19    208       19
   Severe           0       0     60        194
(d) Random Forest
   Normal           539     7     0         0
   Mild             2       142   9         0
   Moderate         0       7     228       12
   Severe           0       0     4         250
(e) MLP
   Normal           546     0     0         0
   Mild             4       146   3         0
   Moderate         1       3     239       4
   Severe           0       1     5         248

when compared to Naive Bayes, Random Forest, and MLP. The misclassification of features is comparatively low in MLP. In the MLP confusion matrix, the number of misclassified instances is much lower than for the other classifiers.

The detailed class-wise efficiency measures (FP rate, specificity, precision, recall, and F1 score) are given in Tables 5, 6 and 7 for each classifier using the IDRiD, MESSIDOR and DIARETDB0 datasets respectively. The analysis shows a low FPR for all classifiers. This is one of the signatures of a good classifier, but it is difficult to judge classifier efficiency from this single measure. When the details in Table 5 were analyzed, for the Adaboost classifier, the precision for the normal class is 1.00, but for the DR class the precision is 0.965. At the same time, for the DR class, recall is 1.00, and for the Normal class it is 0.925. These measures also affect the efficiency of the system. The same case is repeated in the case of the Adaboost classifier with IDRiD database and MLP with DIARETDB0 dataset. The efficiency of the system varies according to its weighted average values. The weighted averages of the efficiency measures are given in Table 8. From the analysis, it is clear that the MLP classifier produces the lowest FP rate (i.e., the amount of misclassification is comparatively low when the MLP classifier is used with SURF-BRISK features). The FP rate is 0.017 with the IDRiD dataset, 0.021 with the MESSIDOR dataset, and 0.004 with DIARETDB0. The weighted averages of precision, recall, and F1 score are high for MLP in all cases.

The accuracies obtained for each classifier analyzed using the IDRiD, MESSIDOR, and DIARETDB0 databases are given in Tables 9, 10 and 11. It is clear from the accuracy evaluation that the MLP shows the highest accuracy in all cases. The weighted averages of the detailed measures for the IDRiD, MESSIDOR, and DIARETDB0 databases are: 0.988, 0.979, and 0.980 precision; 0.988, 0.979, and 0.977 recall; and 0.988, 0.979, and 0.978 F1 score respectively. There was little variation in the accuracy of the classifier while


Table 14  Detailed efficiency measures for multiclass classification using MESSIDOR Database

Classifier      FPR      Specificity  Precision  Recall  F1-score  Class
SVM             0.009    0.991        0.989      0.974   0.982     Normal
                0.019    0.981        0.876      0.922   0.898     Mild
                0.018    0.982        0.932      0.947   0.940     Moderate
                0.007    0.993        0.972      0.957   0.964     Severe
Adaboost        0.002    0.998        0.998      1.00    0.999     Normal
                0.00     0.00         0.00       0.00    0.00      Mild
                0.00     0.00         0.00       0.00    0.00      Moderate
                0.422    0.578        0.389      1.00    0.560     Severe
Naive Bayes     0.024    0.976        0.969      0.943   0.996     Normal
                0.055    0.945        0.690      0.843   0.759     Mild
                0.070    0.93         0.756      0.842   0.797     Moderate
                0.029    0.971        0.878      0.764   0.817     Severe
Random Forest   0.003    0.997        0.996      0.987   0.992     Normal
                0.013    0.987        0.910      0.928   0.919     Mild
                0.014    0.986        0.946      0.923   0.934     Moderate
                0.013    0.987        0.954      0.984   0.969     Severe
MLP             0.008    0.992        0.991      1.00    0.995     Normal
                0.004    0.996        0.973      0.954   0.964     Mild
                0.008    0.992        0.968      0.968   0.968     Moderate
                0.004    0.996        0.984      0.976   0.980     Severe

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work
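
For more than two classes, each row of Table 14 is a one-vs-rest measure taken from the corresponding confusion matrix in Table 13, and the overall accuracy is the share of samples on the main diagonal. The sketch below checks this for the MLP matrix of Table 13(e), again assuming rows are actual classes and columns are predicted classes. The helpers `one_vs_rest` and `accuracy` are hypothetical names.

```python
def one_vs_rest(matrix, k):
    """Precision, recall and FP rate for class k of a square confusion matrix.

    Rows are assumed to be actual classes and columns predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = total - tp - fn - fp
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fp_rate": fp / (fp + tn),
    }


def accuracy(matrix):
    """Fraction of samples on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Table 13(e): MLP on MESSIDOR, classes Normal, Mild, Moderate, Severe.
mlp = [
    [546, 0, 0, 0],
    [4, 146, 3, 0],
    [1, 3, 239, 4],
    [0, 1, 5, 248],
]

classes = ["Normal", "Mild", "Moderate", "Severe"]
for k, name in enumerate(classes):
    print(name, {key: round(v, 3) for key, v in one_vs_rest(mlp, k).items()})
print("accuracy (%)", round(100 * accuracy(mlp), 2))
```

The same two functions apply unchanged to the five-class IDRiD matrices, since nothing in them depends on the number of classes.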

Table 15  Weighted average values calculated for each measure in Table 14 for multiclass classification using MESSIDOR Database

Classifier      FP rate  Specificity  Precision  Recall  F1 score
SVM             0.012    0.988        0.959      0.958   0.959
Adaboost        0.090    0.91         0.537      0.667   0.573
Naive Bayes     0.039    0.961        0.870      0.860   0.863
Random Forest   0.009    0.991        0.966      0.966   0.966
MLP             0.007    0.993        0.982      0.983   0.982

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work

Table 16  Validation accuracy of each classifier for multiclass classification using MESSIDOR Database

Classifier      Correctly classified instances  Accuracy (%)
SVM             1150                            95.83
Adaboost        800                             66.67
Naive Bayes     1032                            86
Random Forest   1159                            96.58
MLP             1179                            98.25

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work

using different datasets. This might be because of the variation in the clarity of the images.

The results in Table 12 show the efficiency of the proposed system when it is trained on the MESSIDOR and IDRiD databases and tested on the DIARETDB0 database. The accuracy of the system is lower than the validation accuracies in Tables 9, 10 and 11. This happens because the test dataset is not drawn from the same population as the training data and is therefore not appropriately represented by it. Even so, the system provides its best accuracy with the MLP classifier.

Analysis of multiclass classification (different stages of DR)

Different stages of DR are evaluated using the proposed algorithm with the MESSIDOR and IDRiD databases. In MESSIDOR, a total of 1200 images are categorized into 546 normal, 153 mild DR, 247 moderate DR, and 254 severe DR images. In IDRiD, a total of 413 images are split into 134 normal, 20 mild NPDR, 136 moderate NPDR, 74 severe NPDR, and 49 PDR images. This can be used for five-class grading. The confusion matrices obtained after the 10-fold cross-validation using the MESSIDOR and IDRiD databases are given in Tables 13 and 17 respectively. By using the


MESSIDOR database (detailed efficiency measures shown in Table 14), the MLP classifier performs better than all other classifiers, with a lower FPR and higher specificity, precision, recall, and F1-score values. The weighted average values of FPR, specificity, precision, recall, and F1 score of the MLP classifier are 0.007, 0.993, 0.982, 0.983, and 0.982 respectively (Table 15). The MLP classifier gives the best performance measures in multiclass classification with an accuracy of 98.25%, while the lowest accuracy was achieved by the Adaboost classifier (66.67%). The accuracies of the classifiers using the MESSIDOR database are shown in Table 16.

By using the IDRiD database, the proposed algorithm shows an accuracy of 92.01% (Table 20). In the detailed efficiency measures, the MLP works better than the other classifiers. In Table 18, the performance measures are very poor for the Adaboost classifier. The weighted average values in Table 19 show that MLP performs better than the other classifiers, with an FPR of 0.031, specificity of 0.969, precision of 0.925, recall of 0.920, and F1 score of 0.908. The accuracies for each classifier using the IDRiD database are given in Table 20. The categories used for multiclass classification differ between the databases, so training on one database and testing on another is not possible in the multiclass case.

Comparison of the proposed system with pre-trained models

In the proposed feature extraction, two local feature extractors are combined and the higher-ranked features are used for classification to reduce the system complexity and to increase system performance. It is necessary to compare the performance of the hybrid feature extraction with existing pre-trained models. For this comparative study we used the features from the RESNET-50 [41] and VGG-16 [42] pre-trained models. The accuracies of the classifiers obtained using the features extracted from the pre-trained models RESNET-50 and VGG-16 are given in Table 21 for binary classification and in Table 22 for multiclass classification. Compared with the proposed work, the classification accuracy using pre-trained network features is comparatively lower. These ImageNet pre-trained models are mainly trained on natural or general images. Thus we can infer that the selected pre-trained networks are not able to provide sufficiently fine features for the DR classification task.

Table 17  Confusion matrix for the evaluation of each classifier for multiclass classification using IDRiD Database

                    Normal  Mild NPDR  Moderate NPDR  Severe NPDR  PDR
(a) SVM
   Normal           132     0          2              0            0
   Mild NPDR        11      0          9              0            0
   Moderate NPDR    0       0          131            5            0
   Severe NPDR      0       0          11             59           4
   PDR              0       0          0              14           35
(b) Adaboost
   Normal           134     0          0              0            0
   Mild NPDR        20      0          0              0            0
   Moderate NPDR    1       0          135            0            0
   Severe NPDR      0       0          74             0            0
   PDR              0       0          49             0            0
(c) Naive Bayes
   Normal           120     10         4              0            0
   Mild NPDR        3       16         1              0            0
   Moderate NPDR    78      8          37             13           0
   Severe NPDR      2       0          24             21           27
   PDR              0       0          0              6            43
(d) Random Forest
   Normal           130     3          1              0            0
   Mild NPDR        5       4          11             0            0
   Moderate NPDR    1       1          125            9            0
   Severe NPDR      0       0          3              67           4
   PDR              0       0          0              3            46
(e) MLP
   Normal           134     0          0              0            0
   Mild NPDR        12      5          3              0            0
   Moderate NPDR    0       0          136            0            0
   Severe NPDR      0       0          2              71           1
   PDR              0       0          4              11           34


Table 18  Detailed efficiency measures for multiclass classification using IDRiD Database

Classifier      FPR      Specificity  Precision  Recall  F1-score  Class
SVM             0.039    0.961        0.923      0.985   0.953     Normal
                0.00     0.00         0.00       0.00    0.00      Mild NPDR
                0.079    0.921        0.856      0.963   0.907     Moderate NPDR
                0.056    0.944        0.756      0.797   0.776     Severe NPDR
                0.011    0.989        0.897      0.714   0.795     PDR
Adaboost        0.075    0.925        0.865      1.00    0.927     Normal
                0.00     0.00         0.00       0.00    0.00      Mild NPDR
                0.44     0.56         0.523      0.993   0.685     Moderate NPDR
                0.00     0.00         0.00       0.00    0.00      Severe NPDR
                0.00     0.00         0.00       0.00    0.00      PDR
Naive Bayes     0.297    0.703        0.591      0.896   0.712     Normal
                0.046    0.954        0.471      0.800   0.593     Mild NPDR
                0.105    0.895        0.561      0.272   0.366     Moderate NPDR
                0.056    0.944        0.525      0.284   0.368     Severe NPDR
                0.074    0.926        0.614      0.878   0.723     PDR
Random Forest   0.022    0.978        0.956      0.970   0.963     Normal
                0.010    0.99         0.500      0.200   0.286     Mild NPDR
                0.054    0.946        0.893      0.919   0.906     Moderate NPDR
                0.035    0.965        0.848      0.905   0.876     Severe NPDR
                0.011    0.989        0.920      0.939   0.929     PDR
MLP             0.043    0.957        0.918      1.00    0.957     Normal
                0.00     1.00         1.00       0.250   0.40      Mild NPDR
                0.032    0.968        0.938      1.00    0.968     Moderate NPDR
                0.032    0.968        0.866      0.959   0.910     Severe NPDR
                0.003    0.997        0.971      0.694   0.810     PDR

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work

Table 19  Weighted average values calculated for each measure in Table 18 for multiclass classification using IDRiD Database

Classifier      FP rate  Specificity  Precision  Recall  F1 score
SVM             0.050    0.95         0.823      0.864   0.841
Adaboost        0.171    0.829        0.453      0.651   0.527
Naive Bayes     0.152    0.848        0.566      0.574   0.532
Random Forest   0.033    0.967        0.889      0.901   0.892
MLP             0.031    0.969        0.925      0.920   0.908

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work

Table 20  Validation accuracy of each classifier for multiclass classification using IDRiD Database

Classifier      Correctly classified instances  Accuracy (%)
SVM             357                             86.44
Adaboost        269                             65.13
Naive Bayes     237                             57.39
Random Forest   372                             90.07
MLP             380                             92.01

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work


Table 21  Accuracy measures of classifier using pre-trained network feature extraction for binary classification

Pre-trained network  Database    Classifier      Accuracy using pre-trained network (%)
RESNET-50 [41]       IDRiD       SVM             85.9
                                 Adaboost        95.3
                                 Naive Bayes     81.1
                                 Random Forest   81.1
                                 MLP             70.46
                     MESSIDOR    SVM             68.8
                                 Adaboost        83.4
                                 Naive Bayes     68.8
                                 Random Forest   68.4
                                 MLP             74.83
                     DIARETDB0   SVM             84.6
                                 Adaboost        92.3
                                 Naive Bayes     84.6
                                 Random Forest   84.6
                                 MLP             82.3
VGG-16 [42]          IDRiD       SVM             80.6
                                 Adaboost        93.2
                                 Naive Bayes     81.3
                                 Random Forest   84.2
                                 MLP             70.2
                     MESSIDOR    SVM             93.3
                                 Adaboost        94.5
                                 Naive Bayes     63.75
                                 Random Forest   76.4
                                 MLP             73.2
                     DIARETDB0   SVM             79.2
                                 Adaboost        89.2
                                 Naive Bayes     75.3
                                 Random Forest   84.6
                                 MLP             82.3

Table 22  Accuracy measures of classifier using pre-trained network feature extraction for multiclass classification

Pre-trained network  Database    Classifier      Accuracy using pre-trained network (%)
RESNET-50            IDRiD       SVM             70.46
                                 Adaboost        65.13
                                 Naive Bayes     54.4
                                 Random Forest   60.29
                                 MLP             86.76
                     MESSIDOR    SVM             83.75
                                 Adaboost        66.00
                                 Naive Bayes     49.91
                                 Random Forest   56.25
                                 MLP             84.21
VGG-16               IDRiD       SVM             70.46
                                 Adaboost        65.13
                                 Naive Bayes     54.4
                                 Random Forest   60.29
                                 MLP             71.56
                     MESSIDOR    SVM             83.75
                                 Adaboost        66.00
                                 Naive Bayes     49.91
                                 Random Forest   56.25
                                 MLP             84.21

By analyzing all of the performance measures, it can be concluded that the system works well with the MLP classifier. A comparative assessment of the proposed method using the MLP classifier with other methods of feature extraction and classification is given in Table 23. Among the existing methods, the works of Naga Sai Prasad et al. [13] and Rahim et al. [22] report higher specificity than the proposed method. However, the difference between their specificity and recall values is greater than in the proposed method. Generally, if specificity is less and recall is larger, then it is necessary to re-examine the negative candidates to eliminate FPs. However, if specificity is larger and recall is less, then the excellent candidates must be re-examined to eliminate FNs. Thus, classifiers having little variation between specificity and recall values give good classification of the input data. When the features are extracted using the pre-trained networks RESNET-50 and VGG-16, the Adaboost classifier provides higher accuracy than the other classifiers for binary classification and MLP shows higher performance for multiclass classification, but the accuracy of the system is lower than with the proposed method. Based on our findings, we conclude that, considering all datasets, the proposed feature extraction method with the MLP classifier shows higher performance, with average accuracies of 98.13% (binary classification) and 95.13% (multiclass classification).

Conclusion

Here we propose an automated method for DR detection using a combination of the SURF and BRISK features. The MR-MR method was implemented to select the top 30 ranked features to improve the efficiency of the classifier. The selected features were used as input to five classifiers (SVM, Adaboost, Naive Bayes, Random Forest, and MLP) and their performance was evaluated. For binary classification, we found that MLP outperforms all the other classifiers for the IDRiD, MESSIDOR, and DIARETDB0 databases with accuracies of 98.78%, 97.92%, and 97.69% respectively. Thus, the average accuracy of the system with the MLP classifier is 98.13%. For multiclass classification with MLP


Table 23  Comparison of proposed work with other methods for DR classification

Dataset     Methods                          Specificity  Recall  Precision  Accuracy (%)
IDRiD       RESNET-50+Adaboost (Binary)      --           --      --         95.3
            RESNET-50+MLP (Multiclass)       --           --      --         86.76
            VGG-16+SVM (Binary)              --           --      --         93.2
            VGG-16+MLP (Multiclass)          --           --      --         71.56
            Proposed method (with MLP)       0.983        0.988   0.988      98.78
MESSIDOR    RESNET-50+Adaboost (Binary)      --           --      --         83.4
            RESNET-50+MLP (Multiclass)       --           --      --         84.21
            VGG-16+Adaboost (Binary)         --           --      --         94.5
            VGG-16+MLP (Multiclass)          --           --      --         84.21
            R. Kamil et al. [8]              0.93         0.91    --         94.00
            Naga Sai Prasad et al. [13]      1.00         0.928   --         96.60
            Proposed method (with MLP)       0.979        0.979   0.979      97.92
DIARETDB0   RESNET-50+Adaboost (Binary)      --           --      --         92.3
            VGG-16+Adaboost (Binary)         --           --      --         89.2
            Rahim et al. [22]                1.00         0.8679  --         93.00
            Proposed method (with MLP)       0.996        0.977   0.980      97.69

Bold characters indicate the evaluation metrics values that give best classification performance in the proposed work
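
The average accuracies quoted for the MLP classifier (98.13% for binary and 95.13% for multiclass classification) are plain unweighted means of the per-database accuracies. The short check below makes that arithmetic explicit, using the MLP values from Tables 9, 10, 11, 16 and 20; the variable names are illustrative only.

```python
from statistics import mean

# MLP validation accuracies (%) reported in the tables.
binary_mlp = {"IDRiD": 98.78, "MESSIDOR": 97.92, "DIARETDB0": 97.69}
multiclass_mlp = {"MESSIDOR": 98.25, "IDRiD": 92.01}

print("binary average:", round(mean(binary_mlp.values()), 2))
print("multiclass average:", round(mean(multiclass_mlp.values()), 2))
```

Because the mean is unweighted, the 130-image DIARETDB0 set counts as much as the 1200-image MESSIDOR set in the binary average.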

classifier, the accuracies were 98.25% and 92.01% using the MESSIDOR and IDRiD databases, respectively. The average accuracy of the system with the MLP classifier was 95.13%. The weighted average values of precision, recall, and F1 score for MLP in all cases mark the quality of the classifier. The weighted average FP rate is lower for MLP than for the other classifiers, which increases the efficiency of the classifier. As per the evaluations, the MLP serves as a good classifier when using the SURF-BRISK extracted, MR-MR selected features. Future work should aim to develop simpler, more efficient, and novel feature extraction techniques for five-class grading of DR. Also, efforts should be made to derive novel methods using deep learning techniques with efficient architectures for efficient DR classification.

Compliance with ethical standards

Conflict of interest  The authors declare that they have no conflict of interest.

Ethical approval  For this type of study, formal consent is not required.

Informed consent  This article does not contain any studies with human participants or animals performed by any of the authors.

References

1. Cheung N, Wang JJ, Klein R, Couper DJ, Sharrett AR, Wong TY (2007) Diabetic retinopathy and the risk of coronary heart disease. Diabetes Care 30(7):1742–1746. https://doi.org/10.2337/dc07-0264
2. Zachariah S, Wykes W, Yorston D (2015) Grading diabetic retinopathy (dr) using the scottish grading protocol. Commun Eye Health 28:72–73
3. Abramoff MD, Garvin MK, Sonka M (2010) Retinal imaging and image analysis. IEEE Rev Biomed Eng 3:169–208. https://doi.org/10.1109/RBME.2010.2084567
4. Ali R, Usman Akram M (2018) Analysing vascular structure to determine intra retinal microvascular abnormalities (IRMA), pp 49–52. https://doi.org/10.1109/CIBEC.2018.8641825
5. Jemshi KM, Gopi VP, Issac Niwas S (2018) Development of an efficient algorithm for the detection of macular edema from optical coherence tomography images. Int J Comput Assist Radiol Surg 13(9):1369–1377. https://doi.org/10.1007/s11548-018-1795-6
6. Sreejini K, Govindan V (2019) Retrieval of pathological retina images using bag of visual words and plsa model. Int J Eng Sci Technol 22:777–785. https://doi.org/10.1016/j.jestch.2019.02.002
7. Tareen SAK, Saleem Z (2018) A comparative analysis of sift, surf, kaze, akaze, orb, and brisk. In: International conference on computing, mathematics and engineering technologies (iCoMET), pp 1–10. https://doi.org/10.1109/ICOMET.2018.8346440
8. Kamil R, Al-Saedi K, Al-Azawi R (2018) An accurate system to measure the diabetic retinopathy using svm classifier. Ciência e Técnica Vitivinícola 33:135–139
9. Lupascu CA, Tegolo D, Trucco E (2010) FABC: retinal vessel segmentation using adaboost. IEEE Trans Inf Technol Biomed 14(5):1267–1274. https://doi.org/10.1109/TITB.2010.2052282
10. Kausu T, Gopi VP, Wahid KA, Doma W, Niwas SI (2018) Combination of clinical and multiresolution features for glaucoma detection and its classification using fundus images. Biocybern Biomed Eng 38(2):329–341. https://doi.org/10.1016/j.bbe.2018.02.003
11. Abdulmunem M, Fatoohi Z (2018) Propose retina identification system based on the combination of surf detector and brisk descriptor. Iraqi J Sci 59(2B):946–955
12. Akyol K, Bayir S, Sen B (2017) A decision support system for early-stage diabetic retinopathy lesions. Int J Adv Comput Sci Appl 8:369–379. https://doi.org/10.14569/IJACSA.2017.081249
13. Naga Sai Prasad VG, Ratna B, Rajesh V (2018) Feature extraction based retinal image analysis for bright lesion classification


in fundus image. Biomed Res 29:3648–3653. https://doi.org/10.4066/biomedicalresearch.29-16-2170
14. de la Calleja J, Tecuapetla L, Auxilio Medina M, Bárcenas E, Urbina Nájera AB (2014) LBP and machine learning for diabetic retinopathy detection. Int Conf Intell Data Eng Autom Learn 8669:110–117. https://doi.org/10.1007/978-3-319-10840-7_14
15. Issac Niwas S, Lin W, Kwoh CK, Kuo CJ, Sng CC, Aquino MC, Chew PTK (2016) Cross-examination for angle-closure glaucoma feature detection. IEEE J Biomed Health Inform 20(1):343–354. https://doi.org/10.1109/JBHI.2014.2387207
16. Sidibé D, Sadek I, Mériaudeau F (2015) Discrimination of retinal images containing bright lesions using sparse coded features and svm. Comput Biol Med 62:175–184. https://doi.org/10.1016/j.compbiomed.2015.04.026
17. Jelinek HF, Pires R, Padilha R, Goldenstein S, Wainer J, Bossomaier T, Rocha A (2012) Data fusion for multi-lesion diabetic retinopathy detection. In: 25th IEEE international symposium on computer-based medical systems (CBMS), pp 1–4. https://doi.org/10.1109/CBMS.2012.6266342
18. Panchal P, Bhojani R, Panchal T (2016) An algorithm for retinal feature extraction using hybrid approach. Procedia Comput Sci 79:61–68. https://doi.org/10.1016/j.procs.2016.03.009. Proceedings of international conference on communication, computing and virtualization (ICCCV) 2016
19. Li H, Chutatape O (2004) Automated feature extraction in color retinal images by a model based approach. IEEE Trans Biomed Eng 51(2):246–254. https://doi.org/10.1109/TBME.2003.820400
20. Gopi VP, Anjali MS, Niwas SI (2017) Pca-based localization approach for segmentation of optic disc. Int J Comput Assist Radiol Surg 12(12):2195–2204. https://doi.org/10.1007/s11548-017-1670-x
21. Sudha V, Karthikeyan C (2018) Analysis of diabetic retinopathy using naive bayes classifier technique. Int J Eng Technol 7:440–442. https://doi.org/10.14419/ijet.v7i2.21.12462
22. Rahim SS, Palade V, Shuttleworth J, Jayne C (2016) Automatic screening and classification of diabetic retinopathy and maculopathy using fuzzy image processing. Brain Inform 3(4):249–267. https://doi.org/10.1007/s40708-016-0045-3
23. Bay H, Tuytelaars T, Van Gool L (2006) Surf: speeded up robust features. In: Leonardis A, Bischof H, Pinz A (eds) Computer vision – ECCV 2006, vol 3951. Springer, Berlin, pp 404–417
24. Leutenegger S, Chli M, Siegwart RY (2011) Brisk: binary robust invariant scalable keypoints. In: 2011 international conference on computer vision, pp 2548–2555. https://doi.org/10.1109/ICCV.2011.6126542
25. Gularte A, Thomasi C, De Bem R, Adamatti D (2013) Performance evaluation of brisk algorithm on mobile devices. VISAPP 2013 Proc Int Conf Comput Vis Theory Appl 2:5–11
26. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238. https://doi.org/10.1109/TPAMI.2005.159
27. Kandhasamy JP, Kadry Balamurali S, Ramasamy LK (2019) Diagnosis of diabetic retinopathy using multi level set segmentation algorithm with feature extraction using svm with selective features. Multimed Tools Appl. https://doi.org/10.1007/s11042-019-7485-8
28. Daqi G, Tao Z (2007) Support vector machine classifiers using RBF kernels with clustering-based centers and widths. In: 2007 international joint conference on neural networks, pp 2971–2976. https://doi.org/10.1109/IJCNN.2007.4371433
29. Wang R (2012) Adaboost for feature selection, classification and its relation with svm, a review. Phys Procedia 25:800–807. https://doi.org/10.1016/j.phpro.2012.03.160. International conference on solid state devices and materials science, macao
30. Schapire RE (2013) Explaining AdaBoost. Springer, Berlin, pp 37–52. https://doi.org/10.1007/978-3-642-41136-6_5
31. Roychowdhury A, Banerjee S (2018) Random forests in the classification of diabetic retinopathy retinal images. In: Bhattacharyya S, Gandhi T, Sharma K, Dutta P (eds) Advanced computational and communication paradigms, vol 475. Springer, Singapore, pp 168–176. https://doi.org/10.1007/978-981-10-8240-5_19
32. Breiman L (2001a) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
33. Breiman L (2001b) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324
34. Huang G-B, Chen Y-Q, Babri HA (2000) Classification ability of single hidden layer feed forward neural networks. IEEE Trans Neural Netw 11(3):799–801. https://doi.org/10.1109/72.846750
35. Saifuddin H, Vijayalakshmi H (2016) Prediction of diabetic retinopathy using multi layer perceptron. Int J Adv Res 4:658–664. https://doi.org/10.21474/IJAR01/714
36. Yadav S, Shukla S (2016) Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. In: 2016 IEEE 6th international conference on advanced computing (IACC), pp 78–83. https://doi.org/10.1109/IACC.2016.25
37. Visa S, Ramsay B, Ralescu A, Knaap E (2011) Confusion matrix-based feature selection. CEUR Workshop Proc 710:120–127
38. Porwal P, Pachade S, Kamble R, Kokare M, Deshmukh G, Sahasrabuddhe V, Meriaudeau F (2018) Indian diabetic retinopathy image dataset (IDRiD): a database for diabetic retinopathy screening research. Data 3:1–8
39. Decencière E, Zhang X, Cazuguel G, Lay B, Cochener B, Trone C, Gain P, Ordonez R, Massin P, Erginay A, Charton B, Klein JC (2014) Feedback on a publicly distributed database: the messidor database. Image Anal Stereol 33(3):231–234. https://doi.org/10.5566/ias.1155
40. Kalesnykiene V, kristian Kamarainen J, Lensu L, Sorri I, Uusitalo H, Kälviäinen H, Pietilä J (2007) DIARETDB0: Evaluation database and methodology for diabetic retinopathy algorithms
41. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
42. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
