
Diabetic Retinopathy Detection

Pritish M. Kokare
Dept. of Computer Engineering
Modern Education Society's College of Engineering, Pune
Pune, India
pritishmkokare@gmail.com

Sarvesh S. Choudhary
Dept. of Computer Engineering
Modern Education Society's College of Engineering, Pune
Pune, India
sarveshchoudhary8402@gmail.com

Govind Pole
Dept. of Computer Engineering
Modern Education Society's College of Engineering, Pune
Pune, India
govind.pole@mescoepune.org

Prathamesh S. Vagare
Dept. of Computer Engineering
Modern Education Society's College of Engineering, Pune
Pune, India
psvagare@gmail.com

Revati Kawade
Dept. of Computer Engineering
Modern Education Society's College of Engineering, Pune
Pune, India
revatikawade30@gmail.com

Abstract—Diabetic Retinopathy (DR) is a prevalent eye condition affecting individuals with diabetes, potentially leading to vision loss and blindness if left untreated. Manual screening by ophthalmologists for DR is a time-consuming process. This project aims to address this challenge by leveraging Deep Learning (DL), a subset of Artificial Intelligence (AI), to analyze and classify different stages of DR. A model was trained on a large dataset of more than 35,000 high-resolution fundus images to automatically detect and categorize DR stages. The dataset used in this study, titled "Diabetic Retinopathy (resized)," is available on Kaggle. The DR stages are classified as 0, 1, 2, 3, and 4. Patient fundus eye images serve as input for this study. The trained model utilizes the MobileNet v2 architecture, a state-of-the-art convolutional neural network (CNN) design optimized for efficient deployment on mobile and embedded devices, to extract features from the fundus images. The final output is generated using an activation function. This paper discusses the utilization of DL techniques, the CNN architecture, and the analysis of fundus images to diagnose and classify DR stages, aiming to enhance the accuracy and efficiency of DR screening.

Keywords: Deep Learning, Diabetic Retinopathy (DR), CNN Architecture, Fundus Images, MobileNet.
I. INTRODUCTION
DR is a serious complication arising from diabetes, causing substantial harm to the retina and potentially leading to vision impairment. This condition primarily impacts the blood vessels situated in the retinal tissue, resulting in fluid leakage and the distortion of visual perception. Within the spectrum of visual impairments, including conditions like cataracts and glaucoma, Diabetic Retinopathy (DR) stands out as a highly prevalent disorder. The classification of this disease comprises five distinct stages, denoted as 0, 1, 2, 3, and 4, each with its own symptoms and characteristics. However, it is challenging for doctors to determine the stage of DR from normal retinal images alone. Existing diagnostic methods are inefficient and time-consuming, which can delay treatment and potentially lead to incorrect management.

To aid in the early detection of DR, fundus cameras are commonly used by doctors to capture images of the veins and nerves located behind the retina. Unfortunately, the initial stage of DR often exhibits no noticeable signs, making it extremely challenging to identify the disease at an early phase. In this project, we leverage Convolutional Neural Network (CNN) architectures, specifically designed for deep learning, to improve the accuracy and efficiency of DR stage detection. By utilizing these AI models, patients can receive timely treatment and intervention.

The dataset used for this project is sourced from EyePACS, a free platform for retinopathy screening, and is available on Kaggle under the name "Diabetic Retinopathy (resized)". The dataset comprises retinal images that have been resized to a standard dimension of 1024x1024 pixels, with larger images proportionally scaled down. It is important to note that the dataset includes images captured by various models and types of cameras, which may affect the visual appearance, especially in distinguishing between left and right eyes. Some images are displayed anatomically, with the macula on the left and the optic nerve on the right for the right eye.

Recent advancements and research in the field of artificial intelligence, particularly in deep learning, have demonstrated its effectiveness in uncovering hidden patterns and features within medical image analysis tasks. Deep learning models have shown promising results in disease classification and decision support, thereby enhancing patient care. [1]-[3] cite the efficacy of deep learning models in medical image analysis tasks, and [4] emphasizes the potential of AI models, specifically deep learning, to improve patient care through disease classification and decision support.

By utilizing deep learning techniques and analyzing fundus images, this paper aims to overcome the limitations of current DR diagnostic methods, improve the accuracy of DR stage classification, and ultimately enhance patient care in the management of this debilitating disease.
II. LITERATURE REVIEW

Automated screening systems have revolutionized the diagnosis process by significantly reducing the time required for determining diagnoses, resulting in cost and effort savings for ophthalmologists and timely treatment for patients. In the context of diabetic retinopathy (DR), automated systems play a crucial role in detecting the disease at an early stage, with DR stages classified according to the specific type of lesions observed on the retina. The review in [5] presents a comprehensive overview of the latest automated systems for detecting and classifying diabetic retinopathy, focusing on the utilization of deep learning techniques. It describes the commonly used publicly available fundus DR datasets, briefly explains the fundamental principles of deep learning techniques, and notes that Convolutional Neural Networks (CNNs) have emerged as the preferred choice for classifying and detecting DR images due to their efficiency. The review also explores various useful techniques that can further enhance the performance of deep learning-based DR detection and classification systems. Overall, the continued development and refinement of these automated systems holds great promise for improving the diagnosis and management of diabetic retinopathy. [5]

The research in [6] focuses on developing an automated system for grading diabetic retinopathy (DR) using deep learning techniques applied to color fundus images. Skip-connected networks, particularly the ResNet50-based network, demonstrated superior performance in classifying DR stages, especially when dealing with small features. The proposed system has the potential to significantly reduce diagnosis time, benefiting diabetic patients' quality of life. Further improvements can be made by expanding the classification to a more detailed 5-class problem, implementing ensemble approaches, utilizing Convolutional Neural Networks (CNNs) for tasks like optic disc removal and blood vessel extraction, and exploring CNN-based image semantic segmentation for the detection of challenging cases such as cotton-wool-like exudates. These enhancements will contribute to more accurate and precise DR grading, improving patient care and treatment outcomes. The research showcases the effectiveness of deep learning in automating DR classification and highlights the potential for continued advancements in this field to further enhance the performance and applicability of the automated system. [6]

Diabetic Retinopathy (DR) poses a significant threat to individuals with diabetes, with surveys suggesting a 30% chance of developing the condition. DR encompasses various stages, ranging from mild to severe and ultimately progressing to Proliferative Diabetic Retinopathy (PDR), which can result in vision impairment and even blindness if not detected early. Manual diagnosis of DR through fundus image analysis requires extensive training and is a time-consuming and challenging task. To tackle this problem, researchers have introduced computer vision techniques for the automated detection and classification of DR. The study in [7] specifically focuses on classifying all stages of DR, with a particular emphasis on the early stages that pose significant challenges for existing models. The authors present a deep learning ensemble framework based on Convolutional Neural Networks (CNNs) to identify and categorize the various stages of DR using color fundus images. The model is trained and evaluated using the largest publicly available fundus image dataset from Kaggle. The results indicate that the proposed ensemble model surpasses existing methods and demonstrates accurate detection of all DR stages. To further improve the accuracy of early-stage detection, future work involves training stage-specific models and leveraging ensemble techniques to enhance performance in identifying early stages of DR. By addressing the limitations of existing models and achieving better performance, this research contributes to the advancement of automated DR detection systems, potentially leading to earlier interventions and improved patient outcomes. The proposed ensemble-based deep learning approach shows promise in overcoming the challenges associated with DR detection, bringing us closer to more efficient and accurate diagnosis in the early stages of the disease. [7]

The study in [8] proposes a novel architecture called ResNet-34/DR, based on a deep Convolutional Neural Network (CNN) and transfer learning, for the classification of Diabetic Retinopathy (DR) into four classes using publicly available Kaggle datasets. Previous attempts using the VGG-19 and Xception architectures to classify DR stages encountered challenges related to high loss values caused by overfitting. However, by leveraging transfer learning and fine-tuning on the pre-trained ResNet-34 network, the proposed architecture demonstrates exceptional performance on the color fundus image dataset. Additionally, data augmentation techniques were employed to increase the training samples, particularly for the minority classes. The training methodology employed in this research has led to significant advancements in the classification of DR, yielding improved results compared to previous approaches. This study contributes to the field by presenting a robust architecture that addresses the challenges of DR classification, paving the way for enhanced diagnosis and treatment of this condition. [8]
The authors of [9] present a straightforward network architecture that enables the development of highly efficient models suitable for mobile applications. The architecture's basic unit possesses key properties that make it well-suited for memory-efficient inference and leverages standard operations found in all neural frameworks. In experiments conducted on the ImageNet dataset, the proposed architecture achieves superior performance across different performance levels, and on the COCO object detection dataset it outperforms real-time detectors in terms of both accuracy and model complexity. It is worth mentioning that when integrated with the SSDLite detection module, the network exhibits remarkable results, requiring significantly less computation (20 times less) and fewer parameters (10 times less) compared to the YOLOv2 model. On a theoretical level, the convolutional block introduced in this work exhibits a unique property that separates the network's expressiveness, represented by the expansion layers, from its capacity, encoded by the bottleneck inputs. This novel characteristic opens up promising avenues for future research, enabling the exploration of the architecture's expressiveness and capacity in a decoupled manner. Overall, this work contributes to the field of efficient network design and highlights potential directions for further investigation. [9]
Fig. 1: Sample Test Images

III. DATASET

The image data utilized in this study was sourced from an openly available dataset called "Diabetic Retinopathy arranged." This dataset, available on Kaggle, consists of resized 224x224 pixel images of Gaussian-filtered retina scans. The original dataset, known as APTOS 2019 Blindness Detection, contains high-resolution retina images captured using a fundus camera, which provides color fundus images for DR detection [11]. The dataset is organized into five directories corresponding to different stages of diabetic retinopathy: "No_DR" (class 0), "Mild" (class 1), "Moderate" (class 2), "Severe" (class 3), and "Proliferate_DR" (class 4) [11].

The device employed for capturing these images is a specialized fundus camera, which integrates a low-power microscope and an attached camera to capture precise images of the inner surface of the eye [11]. The recorded fundus images serve as valuable documentation for diagnosing and detecting the presence of diabetic retinopathy.

To enable the training and testing of the CNN architecture, the dataset is accompanied by two CSV files: "train.csv" and "test.csv". The "train.csv" file includes details such as the names of the fundus eye images and their corresponding severity levels (classes), while the "test.csv" file includes only the eye image names, as the trained CNN architecture uses the images themselves for testing [11]. This dataset provides a substantial amount of data for training the models and evaluating their performance on detecting and classifying diabetic retinopathy stages.

In summary, the research utilizes an open dataset of resized fundus images obtained from a fundus camera. The images are categorized into different severity levels of diabetic retinopathy, and CSV files provide the necessary information for training and testing the CNN architecture. This dataset serves as a valuable resource for studying and developing automated approaches for diabetic retinopathy detection and classification. The image below, captured by a fundus camera, represents a sample from the dataset.

Fig. 2: Single Train Image

The depicted illustration displays the collection of nerves located at the posterior region of the eye.
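A minimal sketch of loading these class directories with Keras follows; the local folder name is a hypothetical assumption:

```python
# Load the five class directories described above as a labeled image dataset;
# the directory names become the class labels.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "gaussian_filtered_images",   # hypothetical local path to the dataset
    image_size=(224, 224),        # the scans are resized 224x224 images
    batch_size=32,
    label_mode="int",             # integer labels derived from directory names
)
print(train_ds.class_names)
```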

Figure 3 presents the tally of each class, illustrating the distribution of images across different categories within the dataset.

Fig. 3: Image Class Distribution

Consequently, the figure below displays the cumulative count of images assigned exclusively to two distinct classes.

Fig. 4: Binary Class Image Distribution

IV. METHODOLOGY

A. CNN Architecture:

The CNN model comprises two primary stages: feature extraction and classification [1]. During the feature extraction stage, relevant information and distinguishing characteristics are extracted from the images, while the classification phase utilizes these extracted features to categorize the images based on the target variable of the problem at hand.

A common CNN model resembles this:

Input layer + Convolution layer + Activation function + Pooling layer + Fully Connected layer
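To make this stack concrete, the following is a minimal Keras sketch of the same layer sequence; the filter count and input size are illustrative assumptions, with five output classes matching the DR stages:

```python
# Minimal CNN: input -> convolution -> activation -> pooling -> fully connected.
# A sketch only; the layer sizes are illustrative, not the paper's exact setup.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),       # input layer (224x224 RGB image)
    layers.Conv2D(32, (3, 3)),                 # convolution layer (3x3 filters)
    layers.Activation("relu"),                 # activation function
    layers.MaxPooling2D((2, 2)),               # pooling layer
    layers.Flatten(),
    layers.Dense(5, activation="softmax"),     # fully connected layer (5 DR classes)
])
model.summary()
```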
Fig. 5: CNN Architecture

1) Input Layer

The input image, regardless of being in RGB or grayscale format, contains pixels with values that span from 0 to 255. Before feeding these images into the model, it is important to normalize them by scaling the pixel values to a range of 0 to 1. This normalization step ensures that the data is in a consistent and standardized format, which facilitates the learning process of the model. As an illustration, consider an input image with three channels (RGB), each containing pixel values, and a size of 4x4:

Fig. 6: Image input
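A minimal sketch of this normalization step on a hypothetical 4x4 RGB image:

```python
# Scale pixel values from [0, 255] to [0, 1] before feeding them to the model.
import numpy as np

# Hypothetical 4x4 RGB input image with integer pixel values in [0, 255].
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

normalized = image.astype(np.float32) / 255.0  # values now lie in [0.0, 1.0]
print(normalized.min(), normalized.max())
```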
2) Convolution Layer

The convolution layer applies a filter to the input image to extract or identify its characteristics. Multiple filtering passes are performed to create a feature map that aids in classifying the input image. To better understand this, let's consider an example using a 2D input image with standardized pixels.

In the given illustration, we have an input image of size 6x6 and we place a 3x3 filter on it to identify specific features. It is important to note that multiple similar filters are actually used in practice to extract information from the image.

The filter is then shifted by a single column in the next stage, as depicted in the figure. In this example, we choose a stride of 1, which means we move the filter by one column at a time. The stride determines the shift in columns or rows when applying the filter.

By repeating this process, the filter covers the entire image, resulting in the final feature map. Once we obtain the feature map, we can apply an activation function to introduce nonlinearity.

It is worth mentioning that the feature map is smaller than the original image, and it becomes smaller still as the stride increases; with a stride of 1, the filter covers the entire image and the feature map shrinks only slightly.

The input to the first convolutional layer in our network has a dimension of 224 x 224, and it is an RGB image. The image goes through several convolutional layers with filters applied. In these convolutional layers with 3 x 3 filters, the spacing between input pixels is one pixel, maintaining the depth of the image after convolution.

Moreover, several of the convolutional layers in the architecture are followed by five max-pooling layers, which conduct spatial pooling. Max-pooling is performed over a 2 x 2 pixel window with a stride of 2.

Subsequent to the convolutional and max-pooling layers, fully connected (FC) layers are employed. The number of channels in these FC layers is 4096, and they come after a series of convolutional layers with varying depths depending on the network architecture. The FC layers serve as the final layers in the network.
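The relationship between input size, filter size, and stride can be made concrete with the usual output-size formula; this small sketch assumes "valid" (no) padding:

```python
# Feature-map size for valid padding: floor((input - filter) / stride) + 1.
def feature_map_size(input_size: int, filter_size: int, stride: int) -> int:
    return (input_size - filter_size) // stride + 1

print(feature_map_size(6, 3, 1))  # 6x6 image, 3x3 filter, stride 1 -> 4x4 map
print(feature_map_size(6, 3, 2))  # the same filter with stride 2 -> 2x2 map
```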
3) Pooling Layer

Following the convolutional layer, the pooling layer is employed to reduce the size of the feature map. This aids in preserving the essential elements or attributes of the input image while enhancing computational efficiency. The principal or essential elements of the input image are still present in the reduced-resolution rendition of the data created by pooling.

Two types of pooling are frequently employed: Max Pooling and Average Pooling. The operational process of Max Pooling is visually illustrated in the figure below. In this example, we perform pooling on the obtained feature map using a 2x2 window and a stride of 2. The maximum value within each highlighted region is extracted, resulting in a new version of the input with a reduced size of 2x2. Consequently, the application of pooling reduces the overall size of the feature map.
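A minimal NumPy sketch of this 2x2, stride-2 max-pooling operation on a hypothetical 4x4 feature map:

```python
# 2x2 max pooling with stride 2: a 4x4 feature map becomes a 2x2 map that
# keeps the maximum value of each non-overlapping window.
import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [4, 8, 3, 5]], dtype=np.float32)

pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 4.]
               #  [8. 9.]]
```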
4) Fully Connected Layer

We have completed the feature extraction operations up to this stage; the subsequent step is classification. The input image is assigned a label using the fully connected layer (the same kind of layer used in an ANN). This layer establishes a connection between the output layer and the data obtained from the preceding steps, namely the convolutional and pooling layers. Its primary function is to assign the appropriate label to the input data.

The image below demonstrates the whole approach for constructing a CNN model.
Fig. 7: CNN Architecture with Probability Distribution

B. MobileNet

A highly developed convolutional neural network (CNN) architecture called MobileNetV2 was created primarily for effective deployment on mobile and embedded devices. Its main goal is to address the difficulties of using computationally demanding deep learning models on devices with limited resources. This section gives a thorough description of MobileNetV2, encompassing its architectural innovations, training approaches, and several computer vision applications. It highlights the benefits and drawbacks of MobileNetV2 and provides possible directions for further study in the area of effective deep learning models.

The need for real-time and on-device applications for computer vision has increased recently, but using sophisticated deep learning models on hardware with constrained resources has proven to be extremely difficult. By offering a lightweight, effective design that balances model size, accuracy, and computing needs, MobileNetV2 overcomes this difficulty. MobileNetV2 is extremely effective in terms of processing demands and achieves outstanding performance through rigorous design optimisation and the use of cutting-edge techniques.

MobileNetV2 is significant because it enables a variety of computer vision applications, such as semantic segmentation, object identification, and image classification, on mobile and embedded devices. Its design ideas, including linear bottlenecks and depthwise separable convolutions, increase its efficiency and efficacy in a variety of tasks. MobileNetV2 also allows for flexible model customisation, enabling users to alter the trade-off between accuracy and resource utilisation in accordance with their unique needs.

Even though MobileNetV2 has excelled in the field of efficient deep learning, there is still room for advancement and more research: the architecture can be improved, new training methods can be devised, and its applicability can be expanded to various domains and activities in future research projects. We can open up new opportunities for deploying potent computer vision models on mobile and embedded devices by continuing to innovate and improve MobileNetV2's capabilities, ultimately advancing the development of deep learning for resource-constrained situations.

1) Architecture of MobileNet v1

MobileNet V1 introduces the fundamental concept of replacing computationally intensive convolutional layers, crucial for computer vision tasks, with a more efficient approach called depthwise separable convolutions. In this approach, the convolution layer's function is divided into two separate subtasks: a depthwise convolution layer that filters the input and a 1x1 (or pointwise) convolution layer that combines the filtered values to create new features. By using this strategy, MobileNet V1 reduces computing complexity considerably while still successfully extracting key features from the input data. This advancement has proven to be a significant contribution to the field of deep learning and has made it possible to create effective and potent models for a variety of computer vision applications.

By utilizing depthwise separable convolutions, MobileNet V1 achieves a notable reduction in the computational complexity typically associated with conventional convolutional layers, all while preserving the ability to extract meaningful features from the input data. The depthwise convolution layer applies distinct filters to each input channel, effectively capturing spatial information within individual channels. Subsequently, the 1x1 convolution layer aggregates information across channels to produce the final feature representation. This approach offers a significant advantage by substantially decreasing the number of parameters and computations required compared to standard convolutional layers. As a result, MobileNet V1 is highly suitable for resource-constrained devices like mobile and embedded systems, striking a balance between model size and accuracy, and it enables efficient deployment of deep learning models for computer vision applications on devices with limited computational resources.

The concept of depthwise separable convolutions introduced in MobileNet V1 has been embraced and expanded upon in subsequent versions of the MobileNet architecture, such as MobileNet V2 and V3. These iterations have leveraged and extended the idea to enhance performance and efficiency even further. This technique has made a significant impact in the realm of deep learning, enabling the creation of lightweight and efficient models that are well-suited for real-world applications and opening up new possibilities for developing powerful yet resource-efficient models that can be effectively deployed in various domains.
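A minimal Keras sketch of one depthwise separable unit follows; the tensor shape and channel counts are illustrative assumptions, with batch normalisation and ReLU6 after each stage as in MobileNet V1:

```python
# Depthwise separable convolution: a depthwise convolution that filters each
# input channel separately, then a 1x1 (pointwise) convolution that combines
# the filtered values into new features.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(112, 112, 32))            # illustrative shape
x = layers.DepthwiseConv2D(3, padding="same")(inputs)    # depthwise: one filter per channel
x = layers.BatchNormalization()(x)
x = layers.ReLU(max_value=6.0)(x)                        # ReLU6
x = layers.Conv2D(64, 1)(x)                              # pointwise 1x1: combine channels
x = layers.BatchNormalization()(x)
outputs = layers.ReLU(max_value=6.0)(x)

tf.keras.Model(inputs, outputs).summary()
```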

Fig. 8: Building Block for MobileNet v1


The MobileNet V1 architecture starts with a standard 3x3 convolutional layer, followed by 13 repetitions of the depthwise separable convolutional block. There are no aggregating or pooling layers between these blocks; instead, the data's spatial dimensions are decreased by using a stride of 2 in some of the depthwise convolutional layers, and in those cases the related pointwise convolutional layer increases the number of output channels. The network produces a feature map with dimensions of 7x7x1024 for an input image with dimensions of 224x224x3. With this architecture, computer vision tasks may be performed with effective and efficient feature extraction while still keeping a small model size.

Following the standard procedure in contemporary systems, batch normalisation is conducted after the convolutional layers. The activation function utilised in MobileNet V1 is ReLU6, which is comparable to the popular ReLU but prohibits activations from growing too large:

y = min(max(0, x), 6)

One important hyperparameter in MobileNet V1 is the depth multiplier, which is sometimes referred to as the "width multiplier" because of its effect on the number of channels. This parameter determines the number of channels in each layer. For example, using a depth multiplier of 0.5 reduces the number of channels by half in each layer, resulting in a fourfold reduction in computations and a threefold reduction in learnable parameters. This configuration significantly improves the speed of the model but may lead to a decrease in accuracy compared to the full model. The depth multiplier thus allows for a trade-off between speed and accuracy in MobileNet V1.
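As a concrete illustration, the Keras implementation of MobileNet exposes this multiplier as the alpha argument; this sketch of the resulting size trade-off is an assumption about tooling, not a detail from the paper:

```python
# Compare the full MobileNet V1 with a 0.5x depth-multiplier variant; `alpha`
# is Keras' name for the width/depth multiplier.
import tensorflow as tf

full = tf.keras.applications.MobileNet(alpha=1.0, weights=None)
slim = tf.keras.applications.MobileNet(alpha=0.5, weights=None)

# The 0.5x model has far fewer learnable parameters, trading accuracy for speed.
print(full.count_params(), slim.count_params())
```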
2) Architecture of MobileNet v2

MobileNet v2 continues the use of depthwise separable convolutions, but its main building block is made up of three convolutional layers:

Fig. 9: Building Block for MobileNet v2

As in v1, the last two layers are a depthwise convolution and a 1x1 pointwise convolution. The function of the 1x1 pointwise convolution layer has changed, however: in V1 it either increased or maintained the number of channels, whereas in V2 it serves the opposite purpose by lowering the number of channels. Because it transforms an input with many dimensions (channels) into a tensor with far fewer dimensions, this layer is now known as the projection layer. This change to the pointwise convolution layer increases the efficiency of the MobileNet V2 architecture and further reduces its computational complexity.

The depthwise convolution layer in MobileNet V2 operates on a tensor with a large number of channels, such as 144. This tensor is then compressed to substantially fewer channels, such as 24, in the following projection layer. The amount of data that passes through the network is constrained by this projection layer, which acts as a bottleneck; as a result, it is known as a bottleneck layer. The output of each block is a compressed representation of the input tensor, which is why the block is called a "bottleneck residual block". This design decision further optimises the network, lowering computational complexity while retaining important properties.

The expansion layer, the initial layer in the block, is new in MobileNet V2. Before the data enters the depthwise convolution, the expansion layer conducts a 1x1 convolution to increase the number of channels in the data. In contrast to the projection layer, which decreases the number of channels, the expansion layer never has fewer output channels than input channels. In essence, the expansion layer functions as the projection layer's opposite, increasing the dimensionality of the data and giving the following depthwise convolution layer more freedom to extract useful features.

The expansion factor determines the extent to which the data is expanded in the expansion layer. It is a hyperparameter that allows for exploring different architectural trade-offs. In MobileNet V2, the default expansion factor is set to 6, meaning that the number of output channels in the expansion layer is six times larger than the number of input channels. This value provides a balance between increasing model capacity and controlling computational complexity. By adjusting the expansion factor, researchers and practitioners can customize the architecture to meet specific requirements and optimize performance for their target applications.

The residual connections and the expansion/projection layers are the key modifications in the v2 architecture.
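To make the block concrete, here is a minimal Keras sketch of the bottleneck residual block just described (1x1 expansion by a factor of 6, 3x3 depthwise filtering, a linear 1x1 projection, and a residual connection when shapes match); the channel counts mirror the 144-to-24 example above:

```python
# MobileNet v2 bottleneck residual block: expansion -> depthwise -> projection.
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, out_channels, expansion=6, stride=1):
    in_channels = x.shape[-1]
    # Expansion layer: 1x1 convolution that increases the number of channels.
    y = layers.Conv2D(in_channels * expansion, 1, use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)
    # Depthwise convolution: filters the expanded tensor.
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)
    # Projection layer: linear 1x1 convolution back down to few channels.
    y = layers.Conv2D(out_channels, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Residual connection only when the block preserves the tensor shape.
    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])
    return y

inputs = tf.keras.Input(shape=(56, 56, 24))   # 24 channels expand to 144 inside
outputs = bottleneck_block(inputs, out_channels=24)
tf.keras.Model(inputs, outputs).summary()
```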
Fig. 10: Block Operation

The expansion layer in MobileNet V2 acts as a decompressor, similar to unzipping a file: it expands the data back to its original size by increasing the number of channels. The depthwise convolution layer then processes the restored data, performing the appropriate filtering operations at this point in the network. Finally, the data is once again compressed by the projection layer, which shrinks its dimensions. This sequence of expansion, depthwise convolution, and projection forms a crucial part of the MobileNet V2 architecture, enabling efficient feature extraction and representation while minimizing computational complexity and model size.

C. Proposed Architecture

Fig. 11: Proposed Model Architecture

The proposed architecture, based on MobileNet V2, consists of two main components: a feature extraction part and a classifier part. The architecture utilizes depthwise separable convolutions, which are a key element of efficient neural network designs.

Two layers are involved in a depthwise separable convolution: a depthwise convolution and a pointwise convolution. The depthwise convolution applies a small filter to each input channel, while the pointwise convolution combines the filtered values to create new features. In comparison to conventional convolutions, this method considerably decreases computation, yielding an 8-9 times lower computational cost with only a slight reduction in accuracy. These depthwise separable convolutions enable efficient and effective feature extraction in MobileNet V2.

The input to each block is the output of the preceding block. Each block first expands its dimensions and applies the necessary batch normalisation; the depthwise layer applies batch normalisation once more, and the expected reduction in dimensions (the projection) follows. The output of these layers is then flattened: the generated 2-dimensional arrays from the pooled feature maps are flattened into a single, lengthy continuous linear vector. Next comes the fully connected layer, which provides the probability distribution using the softmax activation function. The output shape of the dense layer with softmax activation is the number of classes under consideration during classification.
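A hedged Keras sketch of this pipeline follows; it mirrors the described flow (a MobileNet V2 feature extractor, flattening of the pooled feature maps, and a dense softmax layer over the five DR classes), while the pretrained weights, optimizer, and loss are assumptions rather than details taken from the paper:

```python
# Proposed pipeline sketch: MobileNet V2 backbone + flatten + softmax head.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         weights="imagenet")  # feature extraction part

model = tf.keras.Sequential([
    base,
    layers.Flatten(),                       # pooled feature maps -> one long vector
    layers.Dense(5, activation="softmax"),  # probability distribution over 5 DR stages
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```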
D. Streamlit

Streamlit is a valuable tool that simplifies the creation of frontend websites for machine learning models. Its intuitive interface enables developers to build interactive and dynamic web applications with ease. Streamlit seamlessly integrates with popular machine learning libraries, allowing developers to showcase their models and visualize results effortlessly. The platform's automatic updates and real-time rendering capabilities facilitate quick iteration and experimentation, making it perfect for rapid prototyping. By eliminating the need for complex web development frameworks, Streamlit empowers machine learning practitioners to focus on the functionality of their models and deliver engaging frontend experiences for their users.
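As an illustration of this workflow, here is a minimal hypothetical Streamlit frontend; the model file name dr_model.h5 and the preprocessing steps are assumptions, not the paper's exact deployment:

```python
# Minimal Streamlit app: upload a fundus image, run the model, show the stage.
import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

CLASS_NAMES = ["No_DR", "Mild", "Moderate", "Severe", "Proliferate_DR"]

st.title("Diabetic Retinopathy Detection")

model = tf.keras.models.load_model("dr_model.h5")  # hypothetical saved model

uploaded = st.file_uploader("Upload a fundus image", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB").resize((224, 224))
    st.image(image, caption="Input fundus image")
    batch = np.expand_dims(np.asarray(image) / 255.0, axis=0)  # normalize, add batch dim
    probs = model.predict(batch)[0]
    st.write(f"Predicted stage: {CLASS_NAMES[int(np.argmax(probs))]}")
```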
V. RESULTS AND ANALYSIS

We conducted the experiments and acquired the study's results, which enabled us to demonstrate the project's accuracy. Other image classification CNN models seemed to work well on the training dataset but did not perform as well during prediction. For the identical dataset, we utilised the MobileNet v2 architecture and analysed the accuracy and loss.

Fig. 12: Accuracy Plot

Fig. 13: Loss Plot

Fig. 14: Confusion Matrix
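The plots and confusion matrix above could be produced along the following lines; this is a hedged sketch assuming the compiled model from the previous sketch and hypothetical arrays train_images, train_labels, val_images, and val_labels:

```python
# Sketch of producing accuracy plots and a confusion matrix with Keras,
# matplotlib, and scikit-learn; `model`, `train_images`, `train_labels`,
# `val_images`, and `val_labels` are assumed to come from the pipeline above.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

history = model.fit(train_images, train_labels, validation_split=0.2, epochs=10)

plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.legend()
plt.show()

preds = np.argmax(model.predict(val_images), axis=1)
ConfusionMatrixDisplay(confusion_matrix(val_labels, preds)).plot()
plt.show()
```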

VI. CONCLUSION

Given the significant impact of Diabetic Retinopathy on individuals affected by diabetes, the manual identification of this disease requires a substantial amount of time. To address this issue, we opted to employ MobileNet v2, a cutting-edge convolutional neural network renowned for its exceptional performance. By utilizing this network architecture, we developed a model for the automated detection of Diabetic Retinopathy. Our proposed approach achieved a comparative accuracy of 0.9311, demonstrating its effectiveness in accurately identifying this condition.

REFERENCES

[1] B. Tymchenko, P. Marchenko, and D. Spodarets, "Deep Learning Approach to Diabetic Retinopathy Detection," arXiv:2003.02261.
[2] A. Jain, A. Jalui, J. Jasani, Y. Lahoti, and R. Karani, "Deep Learning for Detection and Severity Classification of Diabetic Retinopathy," 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), 2019.
[3] L. Cai, J. Gao, and D. Zhao, "A review of the application of deep learning in medical image classification and segmentation," Ann Transl Med, vol. 8, no. 11, p. 713, 2020, doi: 10.21037/atm.2020.02.44.
[4] S. Suganyadevi, V. Seethalakshmi, and K. Balasamy, "A review on deep learning in medical image analysis," Int J Multimed Info Retr, vol. 11, pp. 19-38, 2022, doi: 10.1007/s13735-021-00218-1.
[5] W. L. Alyoubi, W. M. Shalash, and M. F. Abulkhair, "Diabetic retinopathy detection through deep learning techniques: A review," Informatics in Medicine Unlocked, vol. 20, 100377, 2020, ISSN 2352-9148, doi: 10.1016/j.imu.2020.100377.
[6] T. R. Athira and J. J. Nair, "Diabetic Retinopathy Grading From Color Fundus Images: An Autotuned Deep Learning Approach," Procedia Computer Science, vol. 218, pp. 1055-1066, 2023, ISSN 1877-0509, doi: 10.1016/j.procs.2023.01.085.
[7] S. Qummar et al., "A Deep Learning Ensemble Approach for Diabetic Retinopathy Detection," IEEE Access, vol. 7, pp. 150530-150539, 2019, doi: 10.1109/ACCESS.2019.2947484.
[8] N. M. Al-Moosawi and R. S. Khudeyer, "ResNet-34/DR: A Residual Convolutional Neural Network for the Diagnosis of Diabetic Retinopathy."
[9] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks."
[10] K. Xu, D. Feng, and H. Mi, "Deep Convolutional Neural Network-Based Early Automated Detection of Diabetic Retinopathy Using Fundus Image," received 10 November 2017, accepted 22 November 2017, published 23 November 2017.
[11] S. K. Vengalil, N. Sinha, S. S. S. Kruthiventi, and R. V. Babu, "Customizing CNNs for blood vessel segmentation from fundus images," 2016 International Conference on Signal Processing and Communications (SPCOM), Bangalore, India, 2016, pp. 1-4, doi: 10.1109/SPCOM.2016.7746702.
[12] R. Sarki, K. Ahmed, H. Wang, and Y. Zhang, "Automatic Detection of Diabetic Eye Disease through Deep Learning Using Fundus Images: A Survey," IEEE Access, vol. 8, pp. 151133-151149, 2020, doi: 10.1109/ACCESS.2020.3015258.
