
Diabetic Retinopathy Classification Using MobileNetV2-SVM Model
Shidqie Taufiqurrahman
Biomedical Engineering Program, School of Electrical Engineering and Informatics
Institut Teknologi Bandung, Bandung, Indonesia
shidqie.taufiqurrahman@gmail.com

Astri Handayani
Biomedical Engineering Program, School of Electrical Engineering and Informatics
Institut Teknologi Bandung, Bandung, Indonesia
a.handayani@ieee.org

Beni Rio Hermanto
Biomedical Engineering Program, School of Electrical Engineering and Informatics
Institut Teknologi Bandung, Bandung, Indonesia
benirio@stei.itb.ac.id

Tati Latifah Mengko
Biomedical Engineering Program, School of Electrical Engineering and Informatics
Institut Teknologi Bandung, Bandung, Indonesia
tati@stei.itb.ac.id

Abstract—Diabetic retinopathy (DR) is a common complication of diabetes mellitus which may lead to blindness without early diagnosis and proper treatment. Manual grading of DR severity is known to be labour-intensive and prone to inter-observer variability. Hence, deep learning has been proposed as one of the automated solutions for DR severity classification. However, most of the successful deep learning models are based on large convolutional neural network (CNN) architectures, which require a vast volume of training data. Furthermore, training is often complicated by the intra-class variations and class imbalances that naturally exist in DR datasets. In this study, we propose a hybrid and computationally efficient deep learning model, MobileNetV2-SVM, trained on the training set of the APTOS 2019 dataset (3662 images in total). This model obtained 85% accuracy, a quadratic weighted kappa score of 0.925, and areas under the receiver operating characteristic curve (AUROC) of 100%, 82%, 94%, 94%, and 93% for the normal, mild, moderate, severe, and proliferative DR classes, respectively. Our result shows that with a proper optimization strategy, a relatively small and generic CNN architecture can achieve promising DR classification performance, even when trained on a highly imbalanced and limited amount of data.

Keywords—diabetic retinopathy, convolutional neural network, deep learning, APTOS, classification

I. INTRODUCTION
Diabetic retinopathy (DR) is one of the microvascular complications that affect 1 out of 3 patients with diabetes mellitus (DM) [1]. The high blood sugar level in DM patients causes gradual damage and blockage of the fine vessels in the retina, including the retinal precapillary arterioles, capillaries, and veins, causing degenerative vision disorders which may lead to blindness if left untreated [2,3]. DR risk in DM patients increases with the duration of the disease; therefore, early diagnosis and treatment are critical to prevent the unwanted final outcome of blindness [4].

In current clinical practice, fundus examination is the most effective way to diagnose DR. However, it relies heavily on the manual identification of tiny and subtle retinal features such as microaneurysms, hemorrhages, and exudates by trained ophthalmologists, making the overall process labour-intensive and highly susceptible to inter-observer variability. An automatic DR severity classification system can improve the time efficiency as well as the reproducibility of DR diagnostics, and could thus contribute significantly to the management of DR-related vision disorders.

Over the last decade, machine learning, and in particular deep learning, has proven its capability to reliably classify the severity of DR. The rapid increase in data and computational resources, in conjunction with the public release of a large-scale DR classification dataset by EyePACS (approximately 80,000 images) [5], has rendered convolutional neural network (CNN) based solutions increasingly popular. Most of the successful CNN models are based on transfer learning from pre-existing large-scale CNN architectures [6], and hence require a considerable amount of training data as well as computational resources.

The EyePACS dataset was compiled from different institutions, and as such contains the widest variation that can be expected from the differing characteristics of fundus cameras as well as patients. Several smaller single-center public DR datasets, such as the Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset [7] and the Indian Diabetic Retinopathy Image Dataset (IDRiD) [8], have recently become available. Although smaller in size, these datasets can be considered more clinically realistic, because within such a dataset one can expect more physiological variation arising from patient conditions rather than from different acquisition settings.

The APTOS 2019 dataset provided by the Aravind Eye Hospital in India consists of 3662 training images [7]. To build solutions for this particular dataset, several publications have used approaches similar to those applied to the much larger EyePACS dataset. Khalifa et al. proposed deep transfer learning using AlexNet, ResNet18, SqueezeNet, GoogLeNet, VGG16, and VGG19 models [9]. Dekhil et al. proposed a transfer learning method using the VGG16 model [10]. Kassani et al. proposed a modified Xception model [11]. Tymchenko et al. proposed an ensemble of various pretrained models [12]. Although the above-mentioned methods achieve good performance on the DR classification task, their dependency on large CNN architectures requires either pre-training on a much larger DR dataset or dedicated computational resources for fine-tuning.
With the above review in place, the motivation of this research is to build an automatic DR classification system using a computationally efficient small-scale convolutional neural network. By choosing a small-scale network, we hope to avoid the need for dedicated computational resources during the training and implementation phases, thus increasing the versatility of the model's adoption as well as its use-case scenarios. We adopted the model proposed by Gao et al. [13], in which DR classification was performed with the MobileNetV2 architecture using just 4.2 million trainable parameters. We considered MobileNetV2 a suitably small-scale CNN model for the number of training images available in the APTOS 2019 dataset. For further optimization, we combined the MobileNetV2 model with an SVM classifier, resulting in the hybrid MobileNetV2-SVM model presented in this paper.

The main contribution of this paper is to show that with an appropriate optimization strategy, a relatively small and generic CNN architecture can achieve promising DR classification performance, even when trained on a highly imbalanced and limited amount of data. Our optimization strategy can be summarized as follows:
• We treated the DR classification task as an ordinal regression task, using the MobileNetV2 architecture, which can be considered small-scale (4.2 million parameters). Using a small-scale architecture enabled us to retrain the entire network at relatively low computational cost. We used the generic MobileNetV2 weights pre-trained on ImageNet as initialization, instead of specific weights obtained from pre-training on larger DR datasets. To address the class imbalance, we implemented image augmentation and resampling during training. With this strategy we obtained an accuracy of 83% and a quadratic weighted kappa score of 0.923 for the MobileNetV2 model.
• We further optimized our model by combining it with an SVM classifier. The addition of the SVM allows flexible tuning to the inherent characteristics of the dataset and effectively boosted the classification performance. Our hybrid MobileNetV2-SVM model obtained a final accuracy of 85%, a quadratic weighted kappa score of 0.925, and areas under the ROC curve of 100%, 82%, 94%, 94%, and 93% for the normal, mild, moderate, severe, and proliferative classes, respectively.
The remainder of this paper is organized as follows. Section II introduces the workflow of our proposed DR classification system. Section III describes the experimental results. Section IV summarizes our main findings.

II. METHODOLOGY

The workflow of our proposed DR classification system is shown in Fig. 1. The following sections detail the procedure in each step of the workflow.
Fig. 1. Proposed diabetic retinopathy classification method.
A. Dataset

The Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset is provided by the Aravind Eye Hospital in India and was used in the Kaggle APTOS 2019 Blindness Detection Competition. The APTOS 2019 official training set release contains 3662 color retinal images obtained using fundus photography [7]. A clinician rated each image for the severity of DR on a scale of 0 to 4, representing normal, mild, moderate, severe, and proliferative DR, respectively. The dataset was captured under a variety of imaging conditions and resolutions, ranging from 474 x 358 pixels to 4288 x 2848 pixels. Since neither the ground truth labels nor the actual images were accessible for the APTOS 2019 official validation and test releases, we used only the 3662 images in the training release and re-assigned 90% of them to the training set and 10% to the test set for all analyses in this paper.

The label distribution for the APTOS 2019 official training release is shown in Fig. 2. Some sample images from this dataset are shown in Fig. 3; each row contains three images of one class.

Fig. 2. Label distribution for the APTOS 2019 dataset official training release (N = 3662 images, DR severity labels 0 to 4).

Fig. 3. Sample images from the APTOS 2019 dataset; the top-to-bottom rows correspond to increasing DR severity from 0 (normal) to 4 (proliferative DR).

B. Image Preprocessing

The color retinal images are resized to 224 x 224 pixels using bilinear interpolation. The 224 x 224 resolution is used in accordance with the default input resolution of MobileNetV2, which was pre-trained on ImageNet [14]. Several other generic CNN architectures, such as VGG and AlexNet, also use a similar default input resolution [15,16]. We investigated several contrast enhancement and cropping strategies to be embedded prior to image resizing, but found that they either insignificantly affected or even lowered the final classification result. Therefore, only image resizing was implemented in this paper.
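As an illustration of this step, the following is a minimal TensorFlow sketch. The function name and the final [-1, 1] pixel scaling are our assumptions (the scaling follows the Keras MobileNetV2 convention and is not stated in the paper); the bilinear resize matches the procedure described above.

import tensorflow as tf

def load_and_resize(image_path):
    """Decode a fundus photograph and resize it to MobileNetV2's
    default 224 x 224 input resolution with bilinear interpolation."""
    raw = tf.io.read_file(image_path)
    image = tf.io.decode_image(raw, channels=3, expand_animations=False)
    image = tf.image.resize(image, (224, 224), method="bilinear")
    # Assumption: scale pixels to [-1, 1], the convention used by the
    # Keras MobileNetV2 ImageNet weights; the paper does not specify this.
    return tf.keras.applications.mobilenet_v2.preprocess_input(image)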
C. Image Augmentation and Sampling

The label distribution in the APTOS 2019 dataset is highly imbalanced, as shown in Fig. 2. Therefore, we applied image augmentation and resampling to maintain a uniform label distribution in our training set. For the image augmentation we applied image flipping (horizontal and vertical), random image rotation (0-360 degrees), and random zoom (90%).
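A minimal sketch of these two steps is given below, assuming Keras preprocessing layers and NumPy. The zoom factor is illustrative, since the paper's "random zoom (90%)" does not pin down an exact range, and the oversampling helper is a hypothetical name rather than the authors' implementation.

import numpy as np
import tensorflow as tf

# Augmentation pipeline mirroring the transforms listed above; the zoom
# range is an assumption, as "random zoom (90%)" is ambiguous.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(factor=1.0),            # full 0-360 degrees
    tf.keras.layers.RandomZoom(height_factor=(-0.1, 0.1)),
])

def balanced_indices(labels, seed=0):
    """Oversample each DR grade so training batches see a uniform
    label distribution (hypothetical helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    target = np.bincount(labels, minlength=5).max()
    picks = [rng.choice(np.where(labels == c)[0], target, replace=True)
             for c in range(5)]
    return rng.permutation(np.concatenate(picks))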
D. Training

Our approach is based on the MobileNetV2 architecture, previously pretrained on ImageNet. The training details are given in the following subsections.
1) MobileNetV2

The MobileNetV2 model is constructed using a stack of residual bottleneck layers, each of which consists of a stack of bottleneck residual blocks [14]. The bottleneck residual block is summarized in Table I, where h, w, d, t, and s are the height, width, depth, expansion factor, and stride, respectively. Compared to the classical residual block, the inverted residual block has been shown to be more memory efficient and resistant to information loss [14]. Depthwise (dwise) separable convolution is applied throughout the network, making MobileNetV2 computationally efficient: it uses 8 to 9 times fewer computations than standard convolution layers, at the cost of a small reduction in performance [17].
TABLE I. BOTLENECK RESIDUAL BLOCK using the 256 features in our training set with radial basis
function (RBF) kernel, and evaluated its performance on the
Input Operator Output test set. To get the better view of how well our model can
h×w×d 1x1 conv2d, ReLU6 h × w × (t*d) classify each class of DR, we used area under receiving
h × w × t*d 3x3 dwise stride = s, ReLU6 h/s × w/s × (t*d)
h/s × w/s × t*d Linear 1x1 conv2d h/s × w/s × d’
operating characteristic (ROC) curve or AUROC [19] for
each class to evaluate the model. In order to do that, we had
to treat the task as a binary classification. One-vs-one [20]
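To make Table I concrete, below is a hedged Keras sketch of one inverted bottleneck residual block. The batch normalization layers follow the reference MobileNetV2 implementation [14] even though Table I omits them, and the shortcut is only added when the spatial size and depth are preserved.

import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, d_out, t=6, s=1):
    """Inverted residual block as in Table I: 1x1 expansion with ReLU6,
    3x3 depthwise convolution with stride s, then a linear 1x1 projection."""
    d_in = x.shape[-1]
    h = layers.Conv2D(t * d_in, 1, use_bias=False)(x)       # expand to t*d
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)
    h = layers.DepthwiseConv2D(3, strides=s, padding="same",
                               use_bias=False)(h)           # 3x3 dwise
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)
    h = layers.Conv2D(d_out, 1, use_bias=False)(h)          # linear 1x1
    h = layers.BatchNormalization()(h)
    if s == 1 and d_in == d_out:
        h = layers.Add()([x, h])                            # residual shortcut
    return h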
Our MobileNetV2 model architecture is summarized in Table II. Each bottleneck layer is constructed from n bottleneck residual blocks. The first bottleneck residual block of each residual bottleneck layer uses stride s, and the following blocks use stride 1. The number of output channels of each layer is denoted as c. With the exception of the first bottleneck layer, a constant expansion rate of t = 6 was used throughout the network.
TABLE II. THE STRUCTURE OF MOBILENETV2

Input | Operator | t | c | n | s
224² × 3 | Conv2d | - | 40 | 1 | 2
112² × 40 | Bottleneck | 1 | 24 | 1 | 1
112² × 24 | Bottleneck | 6 | 32 | 2 | 2
56² × 32 | Bottleneck | 6 | 40 | 3 | 2
28² × 40 | Bottleneck | 6 | 80 | 4 | 2
14² × 80 | Bottleneck | 6 | 128 | 3 | 1
14² × 128 | Bottleneck | 6 | 208 | 3 | 2
7² × 208 | Bottleneck | 6 | 416 | 1 | 1
7² × 416 | Conv2d 1×1 | - | 1664 | 1 | 1
7² × 1664 | GlobalAvgPool | - | 1664 | 1 | -
1664 | Fully Connected | - | 256 | 1 | -
256 | Fully Connected | - | 256 | 1 | -
256 | Output | - | 1 | - | -
The 4.2 million trainable parameters were initialized using the pretrained weights from ImageNet. The network was then re-trained on the APTOS training data for 100 epochs with a batch size of 32. The optimizer was Adam with a learning rate of 10⁻⁴. The output of MobileNetV2 is a continuous number, which is then rounded to the nearest class label between 0 and 4. Mean squared error (MSE) loss was used because we treated the DR classification problem as an ordinal regression problem: with label ordinality, the loss is larger for misclassifications between further-apart labels. After each epoch, we evaluated the quadratic weighted kappa (QWK) score [18] of the latest model and saved its weights as the model checkpoint if it achieved a higher QWK score than the previous checkpoint. In the end, the model checkpoint held the particular model that achieved the highest QWK score.
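The following sketch illustrates this training loop under stated assumptions: the channel widths in Table II match the Keras MobileNetV2 with a width multiplier (alpha) of 1.3, the two 256-unit dense layers use ReLU activations (the paper does not specify), and train_ds, val_x, val_y are hypothetical placeholders for the data split described in Section II-A.

import numpy as np
import tensorflow as tf
from sklearn.metrics import cohen_kappa_score

# Assumption: alpha=1.3 reproduces Table II's channel widths (40 ... 1664).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), alpha=1.3,
    include_top=False, pooling="avg", weights="imagenet")
head = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1),               # continuous severity output
])
model = tf.keras.Sequential([base, head])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="mse")                   # ordinal regression objective

best_qwk = -1.0
for epoch in range(100):
    model.fit(train_ds, epochs=1, verbose=0)   # batches of 32 in train_ds
    cont = model.predict(val_x, verbose=0).ravel()
    preds = np.clip(np.rint(cont), 0, 4).astype(int)  # round to nearest grade
    qwk = cohen_kappa_score(val_y, preds, weights="quadratic")
    if qwk > best_qwk:                         # keep the best-QWK checkpoint
        best_qwk = qwk
        model.save_weights("mobilenetv2_best_qwk.weights.h5")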
2) MobileNetV2-SVM

In most transfer learning approaches, the pre-trained weights of the convolutional layers are left intact while the fully-connected layers are re-trained on the specific dataset. The motivation for this approach is the assumption that convolutional layers pre-trained on a large-scale dataset already constitute an optimal generic feature extractor in their current state. Following this procedure, we investigated using the MobileNetV2 convolutional layers as a fixed feature extractor, while replacing the fully-connected layer (i.e., the feature classifier) with different types of classifiers that may offer more flexibility, for instance the Support Vector Machine (SVM).

In this paper, each retinal image was represented by 256 feature values extracted from the last fully-connected layer of the optimum MobileNetV2 checkpoint (second row from the bottom in Table II). We then trained an SVM classifier with a radial basis function (RBF) kernel on the 256 features of our training set, and evaluated its performance on the test set. To get a better view of how well our model can classify each class of DR, we used the area under the receiver operating characteristic (ROC) curve, or AUROC [19], for each class, which required treating the task as a binary classification problem. The one-vs-one approach [20] was selected because it is less prone to dataset imbalance: since it confronts a smaller subset of instances, it is less likely to produce imbalanced training sets [21].
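A sketch of this hybrid step, reusing base and head from the training sketch above; train_x, train_y, and test_x are hypothetical arrays. probability=True enables the class-probability estimates that the AUROC analysis below requires, and scikit-learn's SVC handles the multi-class case with a one-vs-one scheme internally.

import tensorflow as tf
from sklearn.svm import SVC

# Truncate the trained network after the second 256-unit dense layer,
# i.e. the second row from the bottom of Table II.
feature_model = tf.keras.Sequential([base, head.layers[0], head.layers[1]])
train_feats = feature_model.predict(train_x, verbose=0)   # (n, 256)
test_feats = feature_model.predict(test_x, verbose=0)

svm = SVC(kernel="rbf", probability=True)  # RBF kernel, one-vs-one multiclass
svm.fit(train_feats, train_y)
test_pred = svm.predict(test_feats)
test_prob = svm.predict_proba(test_feats)  # needed for the ROC analysis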
E. Performance Evaluation

Using plain accuracy metrics to evaluate classification performance on imbalanced data may produce spurious results, since label distributions are not inherently taken into account by those metrics. Hence, we used the quadratic weighted kappa (QWK) score as the main evaluation metric. The QWK score was initially designed to measure the agreement of two raters on labels with ordinal scales, and it has been used for reporting the DR classification performance of existing models [10,12,13]. The QWK score, denoted as k_w, is defined as follows:

k_w = 1 − (Σ_{i,j} w_{i,j} o_{i,j}) / (Σ_{i,j} w_{i,j} e_{i,j})    (1)

where w_{i,j}, o_{i,j}, and e_{i,j} represent the (i,j)-th entries of the weight matrix, the observed ratings matrix, and the expected ratings matrix, respectively.

The second metric we adopted is the area under the receiver operating characteristic (ROC) curve, or AUROC. ROC graphs are two-dimensional graphs in which the true positive rate is plotted on the Y axis and the false positive rate on the X axis; a ROC graph depicts the relative tradeoff between benefits (true positives) and costs (false positives). The AUROC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance [19]. To accommodate the multi-class problem, we implemented the micro- and macro-averaged AUROC to evaluate our results. The micro- and macro-averaged AUROC scores can only be computed from probability-based label predictions; therefore, we can only measure these metrics on the SVM output and not on the native MobileNetV2 fully-connected layer output.

To enable comparison with previous publications, we also measured the accuracy, recall, precision, and F1 score metrics adjusted for the multiclass problem.
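A sketch of how these metrics can be computed with scikit-learn, continuing from the SVM sketch above (test_y is the hypothetical ground-truth vector). Note that the per-class AUROC here is computed one-vs-rest for brevity, whereas the paper reports one-vs-one scores; the micro and macro averages follow the multilabel formulation.

import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score
from sklearn.preprocessing import label_binarize

# QWK between ground-truth and predicted grades, i.e. Eq. (1).
qwk = cohen_kappa_score(test_y, test_pred, weights="quadratic")

classes = [0, 1, 2, 3, 4]
y_bin = label_binarize(test_y, classes=classes)   # (n, 5) indicator matrix

# Per-class AUROC from the SVM probabilities (one-vs-rest for brevity;
# the paper reports one-vs-one curves).
per_class = [roc_auc_score(y_bin[:, c], test_prob[:, c]) for c in classes]

micro = roc_auc_score(y_bin, test_prob, average="micro")
macro = roc_auc_score(y_bin, test_prob, average="macro")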
III. RESULTS

A. Performance of MobileNetV2

Table III provides the QWK scores on our test set over 10-fold cross-validation. It can be seen that implementing the data augmentation and resampling strategy to maintain a uniform label distribution during training enabled our model to achieve relatively high QWK scores, ranging from 0.883 to 0.923.

TABLE III. THE QWK SCORE ON THE TEST SET PARTITION, 10-FOLD CROSS-VALIDATION

Cross-Validation Fold | Test QWK
1 | 0.897
2 | 0.918
3 | 0.908
4 | 0.907
5 | 0.894
6 | 0.918
7 | 0.899
8 | 0.923
9 | 0.883
10 | 0.887

The confusion matrix corresponding to the fold with the highest QWK score (the 8th fold in the Test QWK column of Table III) is given in Table IV. This result corresponds to a classification accuracy of 83%; the precision, recall, and F1 scores are shown in Table V.
TABLE IV. CONFUSION MATRIX OF THE OPTIMUM MOBILENETV2 (QWK = 0.923)

| Normal | Mild | Moderate | Severe | Proliferative
Normal | 178 | 2 | 0 | 0 | 0
Mild | 3 | 26 | 6 | 2 | 0
Moderate | 0 | 12 | 68 | 17 | 3
Severe | 0 | 0 | 2 | 15 | 2
Proliferative | 0 | 1 | 3 | 9 | 16
TABLE V. MICRO-/MACRO-AVERAGE PRECISION, RECALL AND F1 SCORE FOR THE MOBILENETV2 MODEL

| Precision | Recall | F1 Score
Micro-Average | 83% | 83% | 83%
Macro-Average | 72% | 74% | 71%

Table IV shows that the optimum MobileNetV2 model performs relatively better in identifying the normal and severe classes than the mild, moderate, and proliferative classes. Even with the relatively high QWK produced by the model, there were still considerable misclassifications of mild as moderate, of moderate as both mild and severe, and of proliferative as severe. These results are consistent with a previous publication which confirmed that the separability between the mild, moderate, and severe classes, as well as between the severe and proliferative classes, in the APTOS 2019 training set release is potentially sub-optimal [10].

B. Performance of MobileNetV2-SVM

Given the results shown in Table IV, we expected that a more complex classifier would provide better classification results than the native fully-connected layers of MobileNetV2. Based on the optimum weights of the MobileNetV2 model depicted in Tables III and IV, we used its convolutional layers to produce 256 features for each retinal image and further trained an SVM classifier to predict the DR severity based on these features.

The confusion matrix for the hybrid MobileNetV2-SVM model is shown in Table VI. Compared to the native confusion matrix of MobileNetV2 in Table IV, the hybrid MobileNetV2-SVM model provides better classification performance, especially on the moderate and proliferative classes. The hybrid model achieved a QWK score of 0.925 on the same test set used in Table IV. This result corresponds to a classification accuracy of 85%; the precision, recall, and F1 scores are shown in Table VII. It can be seen that MobileNetV2-SVM obtained higher scores than the native MobileNetV2 on all of the metrics used, so our initial assumption that a more complex classifier would provide better classification results was proven true.

TABLE VI. CONFUSION MATRIX OF MOBILENETV2-SVM (QWK = 0.925)

| Normal | Mild | Moderate | Severe | Proliferative
Normal | 172 | 8 | 0 | 0 | 0
Mild | 0 | 27 | 9 | 0 | 1
Moderate | 0 | 6 | 83 | 7 | 4
Severe | 0 | 0 | 5 | 12 | 2
Proliferative | 0 | 1 | 3 | 7 | 18

TABLE VII. MICRO-/MACRO-AVERAGE PRECISION, RECALL AND F1 SCORE FOR THE HYBRID MOBILENETV2-SVM MODEL

| Precision | Recall | F1 Score
Micro-Average | 85% | 85% | 85%
Macro-Average | 73% | 75% | 74%

The ROC curves of the hybrid MobileNetV2-SVM model are shown in Fig. 4. With the one-vs-one approach, the hybrid model achieved independent AUROC scores of 100%, 82%, 94%, 94%, and 93% for the normal, mild, moderate, severe, and proliferative classes, respectively. The micro- and macro-average AUROC scores, which take the imbalanced class distribution into account, were 96% and 93%, respectively.

Fig. 4. The one-vs-one and micro-/macro-average ROC curves for the hybrid MobileNetV2-SVM model.
C. Comparison with Other Results

In total, we identified four previous publications built on the same training release of the APTOS dataset. Three of them are based on single CNN models [9,10,11], while the other is based on an ensemble of CNN models [12]. The ensemble CNN models in [12] provided better performance on the APTOS dataset, however at the cost of a more complex classifier based on multiple large-scale CNN architectures. The single CNN model approach in [9] was evaluated on a different data partition than ours, so we consider the single CNN models in [10,11] to be the most comparable to our model in terms of data, complexity, and scale. Table VIII shows the performance comparison between our model and the models published by Dekhil et al. [10] and Kassani et al. [11].

Dekhil et al. [10] used VGG16 while Kassani et al. [11] used Xception as their base CNN model architectures. Both publications utilized a transfer learning approach, unfortunately without exact information on the number of trainable parameters in their proposed models. However, the basic Xception and VGG16 model architectures without the fully-connected layers consist of 20.8 million and 14.7 million parameters, respectively, which is far more than our 4.2 million parameters. The Dekhil et al. result was obtained on an 85%/15% training/test partition [10], while the Kassani et al. result was obtained on a 90%/10% training/test partition [11], similar to our approach.
TABLE VIII. PERFORMANCE COMPARISON WITH PUBLISHED LITERATURE

Method | Test Accuracy | QWK | Test AUROC: Normal | Mild | Moderate | Severe | Proliferative
Dekhil et al. [10] | 77% | 0.78 | - | - | - | - | -
Kassani et al. [11] | 83% | - | 1.00 | 0.92 | 0.94 | 0.88 | 0.85
MobileNetV2-SVM (this paper) | 85% | 0.925 | 1.00 | 0.82 | 0.94 | 0.94 | 0.93
It can be seen that our hybrid model outperforms both the Dekhil et al. [10] and Kassani et al. [11] models in terms of accuracy, QWK score, and independent AUROC scores. Compared to the independent AUROC scores of Kassani et al., our scores were higher for the severe and proliferative classes, similar for the normal and moderate classes, and lower for the mild class. This comparison proves that our optimization strategy can bring a small CNN architecture to promising DR classification performance, and can even outperform large CNN architectures in general.
IV. CONCLUSION

We proposed a hybrid and computationally efficient deep learning model, MobileNetV2-SVM, to classify diabetic retinopathy. This model obtained 85% accuracy, a quadratic weighted kappa score of 0.925, and AUROC of 100%, 82%, 94%, 94%, and 93% for the normal, mild, moderate, severe, and proliferative classes, respectively. This result shows that with a proper optimization strategy, a relatively small and generic CNN architecture can achieve promising DR classification performance, even when trained on a highly imbalanced and limited amount of data.
REFERENCES

[1] "Diabetic Retinopathy – Asia", American Academy of Ophthalmology, 2016. [Online]. Available: https://www.aao.org/topic-detail/diabetic-retinopathy-asia. [Accessed: 27-Jun-2020].
[2] H. Nema, Textbook of Ophthalmology. Jaypee Brothers Medical Publishers (P) Ltd., 2012.
[3] D. Vaughan, T. Asbury and L. Schaubert, General Ophthalmology. Los Altos, California: Lange Medical Publ., 1986.
[4] A. N. Kollias and M. W. Ulbig, "Diabetic retinopathy: early diagnosis and effective treatment," Deutsches Arzteblatt Int, vol. 107, no. 5, pp. 75–84, 2010.
[5] EyePACS, "Diabetic Retinopathy Detection | Kaggle", Kaggle, 2015. [Online]. Available: https://www.kaggle.com/c/diabetic-retinopathy-detection. [Accessed: 26-Jun-2020].
[6] N. Asiri, M. Hussain, F. Al Adel and N. Alzaidi, "Deep learning based computer-aided diagnosis systems for diabetic retinopathy: A survey," Artificial Intelligence in Medicine, vol. 99, p. 101701, 2019. doi: 10.1016/j.artmed.2019.07.009.
[7] Asia Pacific Tele-Ophthalmology Society, "APTOS 2019 blindness detection," Kaggle, 2019. [Online]. Available: https://www.kaggle.com/c/aptos2019-blindnessdetection/data. [Accessed: 27-Jun-2020].
[8] P. Porwal et al., "Indian Diabetic Retinopathy Image Dataset (IDRiD): A Database for Diabetic Retinopathy Screening Research," Data, vol. 3, no. 3, p. 25, 2018. doi: 10.3390/data3030025.
[9] N. Khalifa, M. Loey, M. Taha and H. Mohamed, "Deep Transfer Learning Models for Medical Diabetic Retinopathy Detection," Acta Informatica Medica, vol. 27, no. 5, p. 327, 2019. doi: 10.5455/aim.2019.27.327-332.
[10] O. Dekhil, A. Naglah, M. Shaban, M. Ghazal, F. Taher and A. Elbaz, "Deep Learning Based Method for Computer Aided Diagnosis of Diabetic Retinopathy," 2019 IEEE International Conference on Imaging Systems and Techniques (IST), Abu Dhabi, United Arab Emirates, 2019, pp. 1-4. doi: 10.1109/IST48021.2019.9010333.
[11] S. H. Kassani, P. H. Kassani, R. Khazaeinezhad, M. J. Wesolowski, K. A. Schneider and R. Deters, "Diabetic Retinopathy Classification Using a Modified Xception Architecture," 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Ajman, United Arab Emirates, 2019, pp. 1-6. doi: 10.1109/ISSPIT47144.2019.9001846.
[12] B. Tymchenko, P. Marchenko and D. Spodarets, "Deep Learning Approach to Diabetic Retinopathy Detection," arXiv preprint arXiv:2003.02261, 2020.
[13] J. Gao, C. Leung and C. Miao, "Diabetic Retinopathy Classification Using an Efficient Convolutional Neural Network," 2019 IEEE International Conference on Agents (ICA), Jinan, China, 2019, pp. 80-85. doi: 10.1109/AGENTS.2019.8929191.
[14] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov and L. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 4510-4520. doi: 10.1109/CVPR.2018.00474.
[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[16] A. Krizhevsky, I. Sutskever and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the International Conference on Neural Information Processing Systems (NIPS). Curran Associates, Inc., 2012, pp. 1097–1105.
[17] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[18] J. Cohen, "Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit," Psychological Bulletin, vol. 70, no. 4, pp. 213–220, 1968.
[19] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861-874, 2006. doi: 10.1016/j.patrec.2005.10.010.
[20] T. Hastie and R. Tibshirani, "Classification by pairwise coupling," The Annals of Statistics, vol. 26, no. 2, pp. 451-471, 1998. doi: 10.1214/aos/1028144844.
[21] A. Fernández, V. López, M. Galar, M. del Jesus and F. Herrera, "Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches," Knowledge-Based Systems, vol. 42, pp. 97-110, 2013. doi: 10.1016/j.knosys.2013.01.018.
