
Detection of Diabetic Retinopathy via Pixel Color Amplification Using EfficientNetV2

Yi-Hsuan Kao1a, Chun-Ling Lin1b*


Department of Electrical Engineering, Ming Chi University of Technology,
New Taipei City, Taiwan.
Email: 1aM10128012@o365.mcut.edu.tw, 1bginnylin@mail.mcut.edu.tw

Abstract

This study adopts pixel color amplification to enhance the characteristics of fundus images, addressing the problem of inconsistent image quality, and adopts the EfficientNetV2 deep convolutional neural network (CNN) architecture to detect Diabetic Retinopathy (DR). The results show that the proposed method achieves a quadratic weighted kappa of 0.9120 and an accuracy of 87.16%, demonstrating the efficacy of the proposed approach for DR classification. Our work can therefore help DR patients reduce the probability of lifelong blindness.

Keywords: Diabetic Retinopathy, deep learning, fundus image, EfficientNetV2.

Introduction

Diabetic Retinopathy (DR) is an eye disease caused by diabetes that damages the retina. DR is one of the leading causes of significant vision loss and blindness in developed countries and is estimated to affect over 93 million people worldwide [1]. Symptoms of DR include microaneurysms, hard exudates, cotton wool spots, neovascularization, vitreous proliferation, macular edema, and retinal detachment. Early detection and treatment of DR are crucial to successfully managing the disease and preventing vision loss. Cases of DR may go undiagnosed because doctors may not recognize the condition during routine examinations of diabetes patients, so some individuals with DR do not receive timely ophthalmological treatment. However, detecting DR from fundus images requires the expertise of professional ophthalmologists and can be a time-consuming and costly process. The application of artificial intelligence (AI) for automatic DR detection is therefore an effective way to address this issue.

Image recognition and deep learning techniques can help detect DR symptoms that are difficult to discern with the naked eye at an early stage, and they can also alleviate the burden on ophthalmologists. Many studies have employed Convolutional Neural Networks (CNNs) for the detection of DR because of their convenience, promptness, and low cost; CNNs have demonstrated their effectiveness in computer vision and image classification tasks [2][3].

The aim of this study is to propose a pixel color amplification method to enhance image pixels and to adopt EfficientNetV2 to train the DR classification model. This study then verifies that the proposed model achieves higher accuracy in the detection of DR.

Methodology

A. Dataset Description

The dataset used is provided by the Kaggle APTOS 2019 Blindness Detection competition [4]. In this DR dataset, the distribution of classes is highly imbalanced (Fig. 1). Due to variations in photographer, environment, and equipment, the fundus images differ in brightness, shape, and size (Fig. 2). This problem affects the training of machine learning models and can degrade model performance.

Fig. 1 Number of images per class

Fig. 2 Images differing in brightness, shape, and size

B. Pre-processing Methods

First, in order to resolve the issue of different image sizes, this study employed the gradient Hough Transform [5] to detect the circular boundary of the retinal image and crop out the excess regions. This preprocessing step ensured that all images had the same size and alignment, so the proposed model could be trained on a consistent set of inputs.
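As a minimal, illustrative sketch (not the authors' exact code), this detection-and-crop step could be implemented with OpenCV's gradient-based Hough circle transform; all parameter values below are assumptions:

```python
import cv2
import numpy as np

def crop_retina(path, out_size=512):
    """Detect the circular retina boundary and crop away the excess background."""
    img = cv2.imread(path)
    gray = cv2.medianBlur(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 5)
    h, w = gray.shape
    # Gradient Hough circle transform; we expect one dominant circle per image,
    # so minDist is set larger than the image itself.
    circles = cv2.HoughCircles(
        gray, cv2.HOUGH_GRADIENT, 2, max(h, w),
        param1=100, param2=50,
        minRadius=min(h, w) // 4, maxRadius=min(h, w) // 2)
    if circles is None:
        # Fallback: centered square crop when no circle is detected.
        cx, cy, r = w // 2, h // 2, min(h, w) // 2
    else:
        cx, cy, r = np.round(circles[0, 0]).astype(int)
    crop = img[max(cy - r, 0):cy + r, max(cx - r, 0):cx + r]
    # Resize so every image enters the network at a consistent resolution.
    return cv2.resize(crop, (out_size, out_size))
```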

Second, this study adopted Pixel Color Amplification (PCA) [6] to transform the color of individual pixels and enhance fine detail. This method builds on three approaches: the Dark Channel Prior (DCP) [7] method for image dehazing, the inverted DCP for illumination correction, and the Bright Channel Prior for exposure correction. This study then inverted the illumination and exposure corrections and combined them to enhance image brightening and darkening. Finally, the method merged the sharpened brightened and darkened images (Fig. 3).
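The full amplification method of [6] derives its brightening and darkening transforms from these channel priors; the following is only a hedged sketch of the two priors it builds on, with the patch size chosen as an assumption:

```python
import cv2
import numpy as np

def dark_channel(img, patch=15):
    """Dark Channel Prior [7]: per-pixel minimum over the RGB channels,
    followed by a minimum filter (erosion) over a local patch."""
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)

def bright_channel(img, patch=15):
    """Bright Channel Prior: the dual of the dark channel, i.e. the
    dark channel of the inverted (uint8) image."""
    return 255 - dark_channel(255 - img, patch)
```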
compared the results of EfficientNetV2 with and without pre-
Third, in order to improve training effectiveness, this study applied data augmentation using the ImageDataGenerator from Keras [8]. The data augmentation process involves randomly applying a combination of the following steps (see the configuration sketch after this list):
 Horizontal flip
 Rotation of 30 degrees
 Horizontal flip followed by rotation of 30 degrees.
These augmentation techniques increase the effective size of the training dataset and improve the ability of the model to generalize to new data.
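A minimal configuration sketch of this augmentation follows; Keras samples each transform independently, so an image may receive a flip, a rotation, or both, covering the three cases above. The rescaling and fill settings are assumptions, not reported in the paper:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    horizontal_flip=True,          # random horizontal flips
    rotation_range=30,             # random rotations in [-30, +30] degrees
    fill_mode="constant", cval=0,  # keep the background outside the retina black
    rescale=1.0 / 255)             # assumed scaling of pixel values to [0, 1]

# x_train: (N, H, W, 3) array of preprocessed fundus images; y_train: DR grades.
train_flow = datagen.flow(x_train, y_train, batch_size=32)
```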
Fig. 3 Results of preprocessing on input sample images. A) Original, B) Excess regions cropped out, C) PCA-brightened image, D) PCA-darkened image, E) Merged brightening and darkening

C. EfficientNetV2

In order to improve upon the original EfficientNet [9] model, a network scaling method proposed by Google in 2019, the EfficientNetV2 [10] architecture was proposed to address three main issues: (1) slow training with very large image sizes, (2) slow depthwise convolutions in early layers, and (3) sub-optimal scaling of every stage. EfficientNetV2 proposed a search space enriched with additional Fused-MBConv blocks and utilized training-aware NAS and scaling to jointly optimize model accuracy, training speed, and parameter size. This approach enabled both fast training and high accuracy.

This study utilized EfficientNetV2S, which achieves accuracy similar to EfficientNetB5 with only 2/3 of its parameters, and reduced the training time per epoch from 77 s to 43 s, a reduction of about 40% in execution time.
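A hedged sketch of how such a classifier could be assembled in Keras follows; the input size, classification head, and optimizer settings are assumptions rather than the paper's reported configuration:

```python
import tensorflow as tf

# EfficientNetV2S backbone with ImageNet weights; 5 outputs for the
# five APTOS DR grades (0 = No DR ... 4 = Proliferative DR).
base = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet",
    input_shape=(384, 384, 3), pooling="avg")

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),  # assumed regularization
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```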

Experimental Results

This study evaluated the performance of the latest EfficientNetV2 architecture with and without the Pixel Color Amplification pre-processing technique, and also compared the results of EfficientNetV2, with and without pre-processing, to those of EfficientNet.

The performance of the trained model was evaluated using three metrics: train accuracy, validation accuracy, and test accuracy. Accuracy is computed through (1).

\text{Accuracy} = \frac{\text{True predictions}}{\text{True predictions} + \text{False predictions}} \quad (1)

The results are summarized in Table 1 and Table 2. The training accuracy reached 96.55%, indicating that the model performed well on the training set. The validation accuracy reached 83.11%, suggesting that the model generalized well to unseen data. Finally, the test accuracy reached 87.16%, indicating the model's overall performance.

Table 1 The accuracy of EfficientNetB5
EfficientNetB5    Without pre-processing    With pre-processing
Train accuracy    0.9850                    0.9867
Val accuracy      0.8665                    0.8147
Test accuracy     0.8530                    0.8625

Table 2 The accuracy of EfficientNetV2S
EfficientNetV2S   Without pre-processing    With pre-processing
Train accuracy    0.9857                    0.9655
Val accuracy      0.8389                    0.8311
Test accuracy     0.8530                    0.8716

Fig. 4 shows the confusion matrix of the EfficientNetV2S model with pre-processing. The confusion matrix shows that the highest number of misclassifications occurred when the moderate class was predicted as other labels. This could be attributed to the highly imbalanced nature of the dataset, which affected the features the model learned for the moderate class during training, thereby leading to mispredictions.

When the dataset is imbalanced, evaluating the model with accuracy alone can be misleading. Therefore, this study utilized the quadratic weighted kappa (QWK) [11] as an additional evaluation metric (2)(3). Unlike traditional accuracy metrics, QWK considers the agreement between predicted labels and true labels, as well as the possibility of that agreement occurring by chance. QWK assigns weights to the different levels of agreement between labels and computes a score that ranges from -1 to 1, with higher scores indicating better agreement. It is particularly useful in cases where the distribution of labels is not balanced, as it penalizes the model more heavily for misclassifying rare labels.

In the quadratic weighted kappa equation (2), O_{i,j} is the number of samples of class i identified as class j, E_{i,j} represents the expected contingency counts under chance agreement, and N in (3) represents the total number of classes in the classification task.

\kappa = 1 - \frac{\sum_{i,j} w_{i,j} O_{i,j}}{\sum_{i,j} w_{i,j} E_{i,j}} \quad (2)

w_{i,j} = \frac{(i-j)^2}{(N-1)^2} \quad (3)
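As a check on equations (2) and (3), QWK can be computed directly from the confusion matrix; the sketch below assumes integer DR grades 0-4, and scikit-learn's cohen_kappa_score(y_true, y_pred, weights="quadratic") gives the same value:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Direct implementation of equations (2) and (3)."""
    O = confusion_matrix(y_true, y_pred, labels=range(n_classes))
    # Expected contingency table under chance agreement: outer product of
    # the row/column marginals, normalized to the same total count as O.
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    i, j = np.indices((n_classes, n_classes))
    w = (i - j) ** 2 / (n_classes - 1) ** 2   # equation (3)
    return 1 - (w * O).sum() / (w * E).sum()  # equation (2)
```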

Table 3 shows the summary of model performance; both models achieved very good QWK, weighted F1, and accuracy scores.

Table 3 Summary of model performance
Model             QWK      Weighted F1   Accuracy
EfficientNetV2S   0.9120   0.8723        0.8716
EfficientNetB5    0.9082   0.8514        0.8625

Fig. 4 Confusion matrix of EfficientNetV2S with pre-processing
Conclusion

Based on the experimental results, EfficientNetV2 showed faster training and higher prediction accuracy than EfficientNet. Combining it with the effective Pixel Color Amplification (PCA) pre-processing technique can effectively enhance DR features. This approach enabled better prediction accuracy using limited data, and future retraining with new data can save considerable time. The proposed technology can be applied to preliminary medical diagnoses, reducing the burden on doctors, accelerating treatment, and preventing tragedies of blindness. In the future, this study will further explore the model architecture and the issue of imbalanced data.

Acknowledgments

This work was supported, in part, by the Aiming for the Talent Cultivation Project of the Ministry of Education of Taiwan, and by a program funded by the National Science and Technology Council of Taiwan under grant number MOST 111-2221-E-131-030.

References

[1] "Diabetic Retinopathy Detection," Kaggle.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, Jun. 2017. [Online]. Available: https://dl.acm.org/doi/10.1145/3065386. [Accessed: 03-Mar-2023].
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv.org, 10-Apr-2015. [Online]. Available: https://arxiv.org/abs/1409.1556. [Accessed: 03-Mar-2023].
[4] "APTOS 2019 Blindness Detection," Kaggle, 2019. [Online]. Available: https://www.kaggle.com/c/aptos2019-blindness-detection. [Accessed: 23-Feb-2023].
[5] H. K. Yuen, J. Princen, and J. Illingworth, "A comparative study of Hough transform methods for circle finding," Proceedings of the Alvey Vision Conference 1989, 1989.
[6] A. Gaudio, A. Smailagic, and A. Campilho, "Enhancement of retinal fundus images via pixel color amplification," SpringerLink, 17-Jun-2020. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-030-50516-5_26. [Accessed: 03-Mar-2023].
[7] K. He, J. Sun, and X. Tang, "Single image haze removal using dark channel prior," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2341-2353, Dec. 2011, doi: 10.1109/TPAMI.2010.168.
[8] "tf.keras.preprocessing.image.ImageDataGenerator," TensorFlow v2.11.0. [Online]. Available: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator.
[9] M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," arXiv.org, 11-Sep-2020. [Online]. Available: https://arxiv.org/abs/1905.11946. [Accessed: 03-Mar-2023].
[10] M. Tan and Q. V. Le, "EfficientNetV2: Smaller models and faster training," arXiv.org, 23-Jun-2021. [Online]. Available: https://arxiv.org/abs/2104.00298. [Accessed: 03-Mar-2023].
[11] J. Cohen, "A coefficient of agreement for nominal scales," Educational and Psychological Measurement, vol. 20, no. 1, pp. 37-46, 1960.
