
ABSTRACT:

Insufficient labeled training data is one of the main challenges in deep learning classification
problems, especially in the medical domain. Transfer learning helps to train a deep learning
model using a small training dataset, and thus offers a way around the data-insufficiency
problem in deep convolutional networks. Deep convolutional neural networks have been
achieving high performance on the ImageNet dataset; one such example is EfficientNet-B5,
which achieves state-of-the-art 96.2% top-5 and 83.4% top-1 accuracy. It achieves higher
accuracy and efficiency with fewer parameters than other deep convolutional neural networks,
and it can also be used for transfer learning in order to solve problems with small training
datasets. We trained this model on the training dataset of the Kaggle APTOS blindness
detection challenge and applied it to the previously unseen test dataset of the same challenge.
The same approach can be used in other deep learning based image classification problems
that face the challenge of insufficient labeled training data.

Keywords: Deep Convolutional Networks, Transfer Learning, Diabetic Retinopathy, Insufficient Labeled Training Data

TABLE OF CONTENTS

Abstract
Keywords
List of Figures

1. INTRODUCTION
   1.1 Introduction
       1.1.1 Diabetic Retinopathy
       1.1.2 Transfer Learning
   1.2 Motivation for the work
   1.3 Problem statement

2. LITERATURE SURVEY
   2.1 Application of higher order spectra for the identification of diabetic retinopathy stages
   2.2 Rethinking the inception architecture for computer vision
   2.3 Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs
   2.4 Convolutional neural networks for diabetic retinopathy
   2.5 Automated identification of diabetic retinopathy using deep learning
   2.6 Comparative study of fine-tuning of pre-trained convolutional neural networks for diabetic retinopathy screening
   2.7 Deep convolutional neural networks for diabetic retinopathy detection by image classification
   2.8 Deep-learning based automatic computer-aided diagnosis of diabetic retinopathy
   2.9 Diagnosis of diabetic retinopathy using deep neural networks
   2.10 Multi-cell multi-task convolutional neural networks for DR
   2.11 Fundus image classification using VGG-19 architecture with PCA and SVD
   2.12 Existing system

3. METHODOLOGY
   3.1 Proposed system
       3.1.1 Data collection and preparation
       3.1.2 Image preprocessing
       3.1.3 Modeling
       3.1.4 Evaluation

4. EXPERIMENTAL ANALYSIS
   4.1 System configuration
       4.1.1 Software configuration
       4.1.2 Hardware configuration
   4.2 Sample code
   4.3 Sample outputs

5. CONCLUSION AND FUTURE WORK
   5.1 Conclusion
   5.2 Future work

REFERENCES

APPENDICES

LIST OF FIGURES

FIGURE NUMBER  FIGURE NAME

1.1.2.1  Differences between traditional machine learning and transfer learning
1.1.2.2  Graph representation for learning with and without transfer learning
2.12.1   Structure of the neural network for image recognition
2.12.2   Confusion matrix for the classification of the network
3.1.1    System architecture
3.1.2    Overall flow of our automated neural architecture
3.1.3    Our MnasNet network, sampled from the novel factorized hierarchical search space, illustrating the layer diversity throughout the network architecture
3.1.4    ImageNet accuracy and inference latency comparison
3.1.5    Various scaling methods
3.1.6    EfficientNet-B5 architecture
3.1.7    Graph representation of the EfficientNet performance
3.1.1.1  Label distribution
3.1.2.1  Image of diabetic retinopathy
3.1.2.2  Original input images
3.1.2.3  After cropping the images
3.1.2.4  After preprocessing the images
4.3.1    Label distribution of the training set
4.3.2    Images without resizing
4.3.3    Images after resizing
4.3.4    Images before preprocessing
4.3.5    Images after preprocessing
4.3.6    Model architecture
4.3.7    Label distribution for the predictions
1. INTRODUCTION
1.1. Introduction

1.1.1 Diabetic Retinopathy

Diabetic retinopathy is a condition that may occur in people who have diabetes.
It causes progressive damage to the retina, the light-sensitive lining at the back of the
eye. Diabetic retinopathy is a serious sight-threatening complication of diabetes.
Diabetes interferes with the body's ability to use and store sugar (glucose).

The disease is characterized by too much sugar in the blood, which can cause
damage throughout the body, including the eyes. Over time, diabetes damages small
blood vessels throughout the body, including the retina. Diabetic retinopathy occurs
when these tiny blood vessels leak blood and other fluids. This causes the retinal tissue
to swell, resulting in cloudy or blurred vision. The condition usually affects both eyes.
The longer a person has diabetes, the more likely they are to develop diabetic
retinopathy. If left untreated, diabetic retinopathy can cause blindness.

Symptoms of diabetic retinopathy include:

 Seeing spots or floaters

 Blurred vision

 Having a dark or empty spot in the center of your vision

 Difficulty seeing well at night

When people with diabetes experience long periods of high blood sugar, fluid can
accumulate in the lens inside the eye that controls focusing. This changes the curvature
of the lens, leading to changes in vision. However, once blood sugar levels are
controlled, usually the lens will return to its original shape and vision improves.
Patients with diabetes who can better control their blood sugar levels will slow the
onset and progression of diabetic retinopathy. According to a 2018 American Eye-
Q Survey conducted by the AOA, nearly half of Americans didn't know whether
diabetic eye diseases have visible symptoms (the early stages of diabetic
retinopathy often do not). The same survey found that more than one-third of Americans

didn't know a comprehensive eye exam is the only way to determine if a person's
diabetes will cause blindness, which is why the AOA recommends that everyone with
diabetes have a comprehensive dilated eye examination at least once a year. Early
detection and treatment can limit the potential for significant vision loss from diabetic
retinopathy. Treatment of diabetic retinopathy varies depending on the extent of the
disease. People with diabetic retinopathy may need laser surgery to seal leaking blood
vessels or to discourage other blood vessels from leaking. Your Doctor of Optometry
might need to inject medications into the eye to decrease inflammation or stop the
formation of new blood vessels. People with advanced cases of diabetic retinopathy
might need a surgical procedure to remove and replace the gel-like fluid in the back of
the eye, called the vitreous. Surgery may also be needed to repair a retinal detachment.
This is a separation of the light-receiving lining in the back of the eye.
If you are diabetic, you can help prevent or slow the development of diabetic
retinopathy by:

 Taking your prescribed medication

 Sticking to your diet

 Exercising regularly

 Controlling high blood pressure

 Avoiding alcohol and smoking

What causes diabetic retinopathy?

Diabetic retinopathy results from the damage diabetes causes to the small blood
vessels located in the retina. These damaged blood vessels can cause vision loss:

 Fluid can leak into the macula, the area of the retina responsible for clear central
vision. Although small, the macula is the part of the retina that allows us to see
colors and fine detail. The fluid causes the macula to swell, resulting in blurred
vision.

 In an attempt to improve blood circulation in the retina, new blood vessels may
form on its surface. These fragile, abnormal blood vessels can leak blood into the
back of the eye and block vision.

Diabetic retinopathy is classified into two types:

A. Non-proliferative diabetic retinopathy (NPDR) is the early stage of the


disease in which symptoms will be mild or nonexistent. In NPDR, the blood
vessels in the retina are weakened. Tiny bulges in the blood vessels, called
microaneurysms, may leak fluid into the retina. This leakage may lead to swelling
of the macula.

B. Proliferative diabetic retinopathy (PDR) is the more advanced form of the


disease. At this stage, circulation problems deprive the retina of oxygen. As a
result, new, fragile blood vessels can begin to grow in the retina and into the
vitreous, the gel-like fluid that fills the back of the eye. The new blood vessels may
leak blood into the vitreous, clouding vision.

Other complications of PDR include detachment of the retina due to scar tissue
formation and the development of glaucoma. Glaucoma is an eye disease in which
there is progressive damage to the optic nerve. In PDR, new blood vessels grow into
the area of the eye that drains fluid from the eye. This greatly raises the eye pressure,
which damages the optic nerve. If left untreated, PDR can cause severe vision loss and
even blindness.

Risk factors for diabetic retinopathy include:

 Diabetes. People with type 1 or type 2 diabetes are at risk for developing diabetic
retinopathy. The longer a person has diabetes, the more likely he or she is to
develop diabetic retinopathy, particularly if the diabetes is poorly controlled.

 Race. Hispanics and African Americans are at greater risk for developing
diabetic retinopathy.

 Medical conditions. People with other medical conditions, such as high blood
pressure and high cholesterol, are at greater risk.

 Pregnancy. Pregnant women face a higher risk for developing diabetes and
diabetic retinopathy. If a woman develops gestational diabetes, she has a higher
risk of developing diabetes as she ages.

How is diabetic retinopathy diagnosed?

Diabetic retinopathy can be diagnosed through a comprehensive eye examination.


Testing with emphasis on evaluating the retina and macula may include:

 Patient history to determine vision difficulties, presence of diabetes, and other


general health concerns that may be affecting vision

 Visual acuity measurements to determine how much central vision has been
affected

 Refraction to determine if a new eyeglass prescription is needed

 Evaluation of the ocular structures, including the evaluation of the retina through
a dilated pupil

 Measurement of the pressure within the eye

Supplemental testing may include:

 Retinal photography or tomography to document the current status of the retina

 Fluorescein angiography to evaluate abnormal blood vessel growth

How is diabetic retinopathy treated?

Laser treatment (photocoagulation) is used to stop the leakage of blood and


fluid into the retina. A laser beam of light can be used to create small burns in areas of
the retina with abnormal blood vessels to try to seal the leaks.

Treatment for diabetic retinopathy depends on the stage of the disease. The
goal of any treatment is to slow or stop the progression of the disease.

In the early stages of non-proliferative diabetic retinopathy, regular monitoring


may be the only treatment. Following your doctor's advice for diet and exercise and
controlling blood sugar levels can help control the progression of the disease.

Injections of medication in the eye are aimed at discouraging the formation of


abnormal blood vessels and may help slow down the damaging effects of diabetic
retinopathy. If the disease advances, the abnormal blood vessels can leak blood and
fluid into the retina, leading to macular edema. Laser treatment (photocoagulation) can

stop this leakage. A laser beam of light creates small burns in areas of the retina with
abnormal blood vessels to try to seal the leaks. Widespread blood vessel growth in the
retina, which occurs in proliferative diabetic retinopathy, can be treated by creating a
pattern of scattered laser burns across the retina. This causes abnormal blood vessels to
shrink and disappear. With this procedure, some side vision may be lost in order to
safeguard central vision.

1.1.2 Transfer learning

Humans have an inherent ability to transfer knowledge across tasks. What we acquire
as knowledge while learning about one task, we utilize in the same way to solve
related tasks. The more related the tasks, the easier it is for us to transfer, or cross-utilize, our knowledge. Some simple examples would be:

 Know how to ride a motorbike -> Learn how to ride a car

 Know how to play classic piano -> Learn how to play jazz piano

 Know math and statistics -> Learn machine learning

The first thing to remember here is that transfer learning is not a new concept
specific to deep learning. There is a stark difference between the traditional
approach of building and training machine learning models and a methodology
that follows transfer learning principles.

Fig 1.1.2.1 Differences between traditional Machine Learning and Transfer Learning

Traditional learning is isolated: separate models are trained purely on specific tasks
and datasets, and no knowledge is retained that can be transferred from one model to
another. In transfer learning, you can leverage knowledge (features, weights, etc.)
from previously trained models for training newer models, and even tackle problems
like having less data for the newer task.

Let’s understand the preceding explanation with the help of an example. Let’s
assume our task is to identify objects in images within a restricted domain of a
restaurant. Let’s mark this task in its defined scope as T1. Given the dataset for this
task, we train a model and tune it to perform well (generalize) on unseen data points
from the same domain (restaurant). Traditional supervised ML algorithms break down
when we do not have sufficient training examples for the required tasks in given
domains. Suppose, we now must detect objects from images in a park or a cafe (say,
task T2). Ideally, we should be able to apply the model trained for T1, but in reality,
we face performance degradation and models that do not generalize well. This happens
for a variety of reasons, which we can liberally and collectively term as the model’s
bias towards training data and domain.

Transfer learning should enable us to utilize knowledge from previously
learned tasks and apply it to newer, related ones. If we have significantly more data
for task T1, we may utilize its learning, and generalize this knowledge (features,
weights) for task T2 (which has significantly less data). In the case of problems in the
computer vision domain, certain low-level features, such as edges, shapes, corners and
intensity, can be shared across tasks, and thus enable knowledge transfer among tasks!
Also, as we have depicted in the earlier figure, knowledge from an existing task acts as
an additional input when learning a new target task.

How to Use Transfer Learning?

You can use transfer learning on your own predictive modeling problems.

Two common approaches are as follows:

a) Develop Model Approach

b) Pre-trained Model Approach

a) Develop Model Approach

Select Source Task. You must select a related predictive modeling problem with an
abundance of data where there is some relationship in the input data, output data,
and/or concepts learned during the mapping from input to output data.

Develop Source Model. Next, you must develop a skillful model for this first task.
The model must be better than a naive model to ensure that some feature learning has
been performed.

Reuse Model. The model fit on the source task can then be used as the starting point
for a model on the second task of interest. This may involve using all or parts of the
model, depending on the modeling technique used.

Tune Model. Optionally, the model may need to be adapted or refined on the input-
output pair data available for the task of interest.

b) Pre-trained Model Approach

Select Source Model. A pre-trained source model is chosen from available models.
Many research institutions release models trained on large and challenging datasets,
which may be included in the pool of candidate models from which to choose.

Reuse Model. The pre-trained model can then be used as the starting point for a
model on the second task of interest. This may involve using all or parts of the model,
depending on the modeling technique used.

Tune Model. Optionally, the model may need to be adapted or refined on the input-
output pair data available for the task of interest.
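To make the pre-trained model approach concrete, the following is a minimal Keras sketch. VGG16 is used purely for illustration, and the head layer sizes are our own assumptions, not prescriptions from this report:

# Minimal sketch of the pre-trained model approach (VGG16 chosen purely
# for illustration; any ImageNet model could serve as the source model).
from keras.applications import VGG16
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

# Select/Reuse: load a source model trained on ImageNet, without its classifier.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the already-learned features

# Tune: add a small task-specific head and train only these new layers.
model = Sequential([
    base,
    GlobalAveragePooling2D(),
    Dense(128, activation='relu'),
    Dense(5, activation='softmax'),  # e.g., five DR severity classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])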

When to Use Transfer Learning?

Transfer learning is an optimization: a shortcut to save time or get better


performance. In general, it is not obvious that there will be a benefit to using transfer
learning in the domain until after the model has been developed and evaluated.

Lisa Torrey and Jude Shavlik describe three possible benefits to look for when using
transfer learning:

Higher start. The initial skill (before refining the model) on the source model is
higher than it otherwise would be.

Fig 1.1.2.2 Graph Representation for transfer learning and without transfer learning

Higher slope. The rate of improvement of skill during training of the source model is
steeper than it otherwise would be.

Higher asymptote. The converged skill of the trained model is better than it otherwise
would be.

On some problems where you may not have very much data, transfer learning
can enable you to develop skillful models that you simply could not develop in the
absence of transfer learning. The choice of source data or source model is an open
problem and may require domain expertise and/or intuition developed via experience.

1.2 Motivation for the work

According to the WHO (World Health Organization), 412 million people were living
with diabetes mellitus in 2014. In 2010, 33 percent of people suffering from diabetes
were found to have diabetic retinopathy, and among them one third were affected by
loss of vision. The number of people diagnosed with DR is expected to triple by 2050,
particularly in America.

Diagnosis of DR requires expert knowledge. It can be automated with deep
learning techniques, but these require huge datasets, which healthcare often lacks,
and long training times; transfer learning can mitigate both of these problems.

1.3 Problem statement

Detecting the stage of diabetic retinopathy from fundus photographs with the help of a
transfer-learning approach based on the EfficientNet-B5 model. The main objective of the
project is to detect diabetic retinopathy early enough to stop blindness before it is too late.
We detect it by classifying images of a patient's retina into five labels, numbered 0 to 4
and named Normal, Mild DR, Moderate DR, Severe DR and Proliferative DR respectively,
each representing the severity of the disease, using deep transfer learning and
classification techniques. One of these five stages is produced as the output label for a
given input fundus image.

2. LITERATURE SURVEY
2.1 Application of higher order spectra for the identification of diabetes
retinopathy stages.

Feature extraction based classification and DL have been used to classify DR. In
Acharya et al. [18] higher order spectra technique was used to extract features from
300 fundus images and fed to a support vector machine classifier; it classified the
images into 5 classes with sensitivity of 82% and specificity of 88%. Different
algorithms were developed to extract DR lesions such as blood vessels, exudates, and
microaneurysms [19]. Exudates have been extracted for DR grading [20 - 24]. Support
vector machine was used to classify the DIARETDB1 dataset into positive and
negative classes using area and number of microaneurysms as features [25].

2.2 Rethinking the inception architecture for computer vision

Feature extraction based classification methods need expert knowledge in order


to detect the required features, and they also involve a time consuming process of
feature selection, identification and extraction. Furthermore, DL based systems such as
CNNs have been seen to outperform feature extraction based methods [26]. DL
training for DR classification has been performed in two major categories: learning
from scratch and transfer learning.

2.3 Development and validation of a deep learning algorithm for detection of


diabetic retinopathy in retinal fundus photographs.

A convolutional neural network (CNN) was trained to classify a dataset of


128,175 fundus images into 2 classes, where the first class contains images with
severity levels 0 and 1, and the second class contains levels 2, 3 and 4 [27]. At an
operating cut point picked for high sensitivity, [27] had a sensitivity of 97.5% and a
specificity of 93.4% on the EyePACS-1 dataset, which consists of 9,963 images, and
scored a sensitivity of 96.1% and a specificity of 93.9% on the Messidor-2 dataset;
at an evaluation cut point selected for high specificity, the sensitivity and specificity
were 90.3% and 98.1% on EyePACS-1, and 87% and 98.5% on Messidor-2, respectively.

2.4 Convolutional neural networks for diabetic retinopathy

Using a training dataset of over 70,000 fundus images, Pratt et al. [28] trained a
CNN using stochastic gradient descent algorithm to classify DR into 5 classes, and it
achieved 95% specificity, 75% accuracy and 30% sensitivity. A DL model was trained
from scratch on the MESSIDOR-2 dataset for the automatic detection of DR in [29],
and a 96.8% sensitivity and 87% specificity were scored.

2.5 Automated identification of diabetic retinopathy using deep learning

A CNN was trained from scratch to classify fundus images from the Kaggle
dataset into referable and non-referable classes, and it scored a sensitivity of 96.2%
and a specificity of 66.6% [30]. A dataset of 71896 fundus images was used to train a
CNN DR classifier and resulted in a sensitivity of 90.5% and specificity of 91.6%
[31]. A DL model was designed and trained on a dataset of 75137 fundus images and
resulted in a sensitivity and specificity scores of 94% and 98%, respectively [32].

2.6 Comparative Study of Fine-Tuning of Pre-Trained Convolutional Neural


Networks for Diabetic Retinopathy Screening

In order to avoid the time and resources consumed during DL training, Mohammadian et
al. [33] fine-tuned the Inception-V3 and Xception pre-trained models to classify the
Kaggle dataset into two classes. After using data augmentation to balance the dataset,
[33] reached an accuracy score of 87.12% on the Inception-V3 model and 74.49% on the
Xception model.

2.7 Deep convolutional neural networks for diabetic retinopathy detection by


image classification

Wan et al. [34] implemented transfer learning and hyper parameter tuning on
the pre-trained models AlexNet, VggNet-s, VggNet-16, VggNet-19, GoogleNet and
ResNet using the Kaggle dataset and compared their performances. The highest
accuracy score was that of VggNet-s model, which reached 95.68% when training
with hyper-parameter tuning [34]. Transfer learning was used to work around the

problem of an insufficient training dataset in [35] for retinal vessel segmentation. An
Inception-V4 [36] model-based DR classification scored higher sensitivity when
compared with human expert graders on 25,326 retinal images of patients with
diabetes from Thailand [37].

2.8 Deep-learning-based automatic computer-aided diagnosis system for diabetic


retinopathy

Mansour [38] used the Kaggle dataset to train a deep convolutional neural
network with transfer learning for feature extraction when building a computer-aided
diagnosis system for DR. In Dutta et al. [39], 2000 fundus images were selected from the
Kaggle dataset to train a shallow feed forward neural network, deep neural network
and VggNet16 model. On a test dataset of 300 images, the shallow neural network
scored an accuracy of 41%, the deep neural network scored 86.3%, while the
VggNet-16 scored 78.3% accuracy [39].

2.9 Diagnosis of Diabetic Retinopathy Using Deep Neural Networks

A training dataset of size 4476 was collected and labeled into 4 classes
depending on abnormalities and required treatment [40]; they resized input images into
600x600 and cut every image into four 300x300 images, and fed these images into
separate pre-trained Inception-V3 models, which they called Inception@4. After the
accuracy of Inception@4 was seen to surpass the VggNet and ResNet models, it was
deployed on a web-based DR classification system.

2.10 Multi-Cell Multi-Task Convolutional Neural Networks for Diabetic


Retinopathy Grading

A multi cell, multi task convolutional neural network that uses a combination
of cross entropy and mean square error was developed to classify images from the
Kaggle dataset into 5 DR degrees [41]. A binary tree based multi-class VggNet
classifier was trained on the Kaggle dataset in Adly et al. [41], and it scored an
accuracy of 83.2%, sensitivity of 81.8% and specificity of 89.3% on a validation

dataset of 6000 fundus images.

2.11 Fundus Image Classification Using VGG-19 Architecture with PCA and
SVD

By making use of SVMs with fully connected layers based on the VggNet-19
model, Mateen et al. [43] reached an accuracy of 98.34% when classifying DR on
the Kaggle dataset. The Kaggle dataset [17], which contains 35,126 labeled fundus
images, has been used exhaustively for DL-based DR classification research.

2.12 Existing system

In this paper, the authors developed a network with a CNN architecture and data
augmentation which can identify the intricate features involved in the classification
task, such as micro-aneurysms, exudates and haemorrhages on the retina, and
consequently provide a diagnosis automatically and without user input. The network
was trained using a high-end graphics processing unit (GPU) on the publicly available
Kaggle dataset and demonstrated impressive results, particularly for a high-level
classification task. On the dataset of 80,000 images used, the proposed CNN achieves
a sensitivity of 95% and an accuracy of 75% on 5,000 validation images.

The structure of the neural network, shown below, was decided after studying the
literature on other image recognition tasks.

Fig 2.12.1 Structure of neural network for image recognition

Increased numbers of convolution layers are perceived to allow the network to learn
deeper features. The first layer learns edges; the deepest layer of the network, the last
convolutional layer, should learn features relevant to classifying DR, such as hard
exudates. The network starts with convolution blocks with activation and then batch
normalization after each convolution layer. As the number of feature maps increases,
it moves to one batch normalization per block. All max pooling is performed with kernel
size 3x3 and 2x2 strides. After the final convolutional block, the network is flattened
to one dimension. To avoid overfitting, it uses class weights weighted relative to the
number of images in each class. Likewise, dropout is performed on the dense layers,
to reduce overfitting, until the dense five-node classification layer is reached, which
uses a softmax activation function to predict the classification. The leaky rectified
linear unit activation function was used, applied with a value of 0.01, to stop
over-reliance on certain nodes in the network. Similarly, in the convolution layers, L2 regularization

was used for weights and biases. The loss function used for optimization was the widely
used categorical cross-entropy function.
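A condensed Keras sketch of the kind of network described above is given below; the filter counts, input size, dense width and number of blocks are illustrative assumptions rather than the exact published configuration:

# Condensed sketch of the described CNN; filter counts, input size and
# depths are illustrative assumptions, not the published configuration.
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                          LeakyReLU, Flatten, Dropout, Dense)
from keras.regularizers import l2

model = Sequential()
# First convolution block: activation then batch normalization per layer.
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(256, 256, 3),
                 kernel_regularizer=l2(1e-4), bias_regularizer=l2(1e-4)))
model.add(LeakyReLU(alpha=0.01))   # leaky ReLU to avoid over-reliance on nodes
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))

# Later block: one batch normalization per block as feature maps increase.
model.add(Conv2D(64, (3, 3), padding='same',
                 kernel_regularizer=l2(1e-4), bias_regularizer=l2(1e-4)))
model.add(LeakyReLU(alpha=0.01))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))

model.add(Flatten())               # flatten to one dimension after the blocks
model.add(Dense(1024))
model.add(LeakyReLU(alpha=0.01))
model.add(Dropout(0.5))            # dropout on dense layers against overfitting
model.add(Dense(5, activation='softmax'))  # five-node DR classification layer

model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])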

The dataset used for testing, provided by the Kaggle coding website,
contains over 80,000 images of approximately 6M pixels per image, across all scales of
retinopathy. Resizing these images and running the CNN on a high-end GPU, the
NVIDIA K40c, meant we were able to train on the whole dataset. The NVIDIA K40c
contains 2880 CUDA cores and comes with the NVIDIA CUDA Deep Neural
Network library (cuDNN) for GPU learning. 5,000 images from the dataset were
saved for validation purposes. Running the validation images on the network took 188
seconds. For this five class problem we define specificity as the number of patients
correctly identified as not having DR out of the true total amount not having DR and
sensitivity as the number of patients correctly identified as having DR out of the true
total amount with DR. We define accuracy as the number of patients with a correct
classification. The final trained network achieved 95% specificity, 75% accuracy and
30% sensitivity.

The classifications in the network were defined numerically as: 0 - No DR,
1 - Mild DR, 2 - Moderate DR, 3 - Severe DR, 4 - Proliferative DR.


Fig 2.12.2 Confusion matrix for the classification of the network

The above table shows the confusion matrix of the result of the classification of five
stages in Diabetic Retinopathy.

3. METHODOLOGY

3.1 PROPOSED SYSTEM

SYSTEM ARCHITECTURE

Transfer learning is a machine learning technique where a model trained on


one task is re-purposed on a second related task. Transfer learning is an optimization
that allows rapid progress or improved performance when modeling the second task.

Fig 3.1.1 System architecture

How to use transfer learning?

Basically, the training of a CNN involves finding the right values for each
of the filters so that an input image, when passed through the multiple layers, activates
certain neurons of the last layer so as to predict the correct class.

Though training a CNN from scratch is possible for small projects, most
applications require training very large CNNs, and this, as you might guess, takes
huge amounts of processed data and computational power, neither of which comes
easily. In transfer learning, we take the pre-trained weights of an already trained
model (one that has been trained on millions of images belonging to thousands of
classes, on several high-power GPUs for several days) and use these already learned
features to predict new classes.

Working of Transfer Learning

When we train a deep convolutional neural network on a dataset of images,


during the training process, the images are passed through the network by applying
several filters on the images at each layer. The values of the filter matrices are
multiplied with the activations of the image at each layer. The activations coming out
of the final layer are used to find out which class the image belongs to.

When we train a deep network, our goal is to find the optimum values on each
of these filter matrices so that when an image is propagated through the network, the
output activations can be used to accurately find the class to which the image belongs.
The process used to find these filter matrix values is gradient descent.

When we train a conv net on the ImageNet dataset and then take a look at what
the filters on each layer of the conv net has learnt to recognize, or what each filter gets
activated by, we are able to see something really interesting.

The filters on the first few layers of the conv net learn to recognize colors and
certain horizontal and vertical lines. The next few layers slowly learn to recognize
trivial shapes using the lines and colors learnt in the previous layers. Then the next
layers learn to recognize textures, then parts of objects like legs, eyes, nose etc.
Finally, the filters in the last layers get activated by whole objects and give the output.
By using a pretrained network to do transfer learning, we are simply adding a few
dense layers at the end of the pretrained network and learning what combination of
these already learnt features help in recognizing the objects in our new datasets.

EfficientNet pre-trained model

Convolutional neural networks are commonly developed at a fixed resource


cost, and then scaled up in order to achieve better accuracy when more resources are
made available. For example, ResNet can be scaled up from ResNet-18 to ResNet-200
by increasing the number of layers, and recently, GPipe achieved 84.3% ImageNet
top-1 accuracy by scaling up a baseline CNN by a factor of four. The conventional
practice for model scaling is to arbitrarily increase the CNN depth or width, or to use
larger input image resolution for training and evaluation. While these methods do
improve accuracy, they usually require tedious manual tuning, and still often yield
suboptimal performance. We found a more principled method to scale up a CNN to
obtain better accuracy and efficiency.

In the ICML 2019 paper "EfficientNet: Rethinking Model Scaling for
Convolutional Neural Networks", a novel scaling method was proposed that uses a
simple yet highly effective compound coefficient to scale up CNNs in a more
structured manner. Unlike conventional approaches that arbitrarily scale network
dimensions such as width, depth and resolution, this method uniformly scales each
dimension with a fixed set of scaling coefficients. Powered by this novel scaling
method and recent progress in AutoML, a family of models called EfficientNets
was developed, which surpass state-of-the-art accuracy with up to 10x better
efficiency (smaller and faster).

AutoML

CNNs have been widely used in image classification, face recognition, object
detection and many other domains. Unfortunately, designing CNNs for mobile devices
is challenging because mobile models need to be small and fast, yet still accurate.
Although significant effort has been made to design and improve mobile models, such
as MobileNet and MobileNetV2, manually creating efficient models remains
challenging when there are so many possibilities to consider. Inspired by recent
progress in AutoML neural architecture search, we wondered if the design of mobile
CNN models could also benefit from an AutoML approach.

In “MnasNet: Platform-Aware Neural Architecture Search for Mobile”, we
explore an automated neural architecture search approach for designing mobile models
using reinforcement learning. To deal with mobile speed constraints, it explicitly
incorporates the speed information into the main reward function of the search
algorithm, so that the search can identify a model that achieves a good trade-off
between accuracy and speed. In doing so, MnasNet is able to find models that run 1.5x
faster than state-of-the-art hand-crafted MobileNetV2 and 2.4x faster than NASNet,
while reaching the same ImageNet top 1 accuracy.

Unlike previous architecture search approaches, where model speed is
considered via another proxy, our approach directly measures model speed by
executing the model on a particular platform, e.g., the Pixel phones used in
this research study. In this way, we can directly measure what is achievable in
real-world practice, given that each type of mobile device has its own software and
hardware idiosyncrasies and may require different architectures for the best trade-offs
between accuracy and speed.

The overall flow of our approach consists mainly of three components: a RNN-based
controller for learning and sampling model architectures, a trainer that builds and
trains models to obtain the accuracy, and an inference engine for measuring the model
speed on real mobile phones using TensorFlow Lite. We formulate a multi-objective
optimization problem that aims to achieve both high accuracy and high speed and
utilize a reinforcement learning algorithm with a customized reward function to
find Pareto optimal solutions (e.g., models that have the highest accuracy without
worsening speed).
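For reference, the customized reward described in the MnasNet paper trades measured accuracy against latency as

\underset{m}{\text{maximize}} \quad ACC(m) \times \left[ \frac{LAT(m)}{T} \right]^{w}

where T is the target latency and w is an exponent controlling the accuracy-latency trade-off (the paper's soft-constraint setting uses w = -0.07). This formula is quoted from the MnasNet paper, not derived in this report.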

Fig 3.1.2 Overall flow of our automated neural architecture

In order to strike the right balance between search flexibility and search space size, we
propose a novel factorized hierarchical search space, which factorizes a convolutional
neural network into a sequence of blocks, and then uses a hierarchical search space to
determine the layer architecture for each block. In this way, our approach allows
different layers to use different operations and connections; meanwhile, we force all
layers in each block to share the same structure, thus significantly reducing the search
space size by orders of magnitude compared to a flat per-layer search space. Our
MnasNet network, sampled from this novel factorized hierarchical search space,
illustrates the layer diversity throughout the network architecture, as shown below.

Fig 3.1.3 Our MnasNet network, sampled from the novel factorized hierarchical search space,
illustrating the layer diversity throughout the network architecture.

We tested the effectiveness of our approach on ImageNet classification
and COCO object detection. Our experiments achieve a new state-of-the-art accuracy
under typical mobile speed constraints. In particular, the figure below shows the results
on ImageNet.

Fig 3.1.4 ImageNet Accuracy and Inference Latency comparison

With the same accuracy, our MnasNet model runs 1.5x faster than the hand-crafted
state-of-the-art MobileNetV2, and 2.4x faster than NASNet, which also used
architecture search. After applying the squeeze-and-excitation optimization, our
MnasNet+SE models achieve ResNet-50 level top-1 accuracy at 76.1%, with 19x
fewer parameters and 10x fewer multiply-add operations. On COCO object detection,
our model family achieves both higher accuracy and higher speed than MobileNets and
achieves comparable accuracy to the SSD300 model with 35x less computation cost.

Compound Model Scaling: A Better Way to Scale Up CNNs

In order to understand the effect of scaling the network, we systematically


studied the impact of scaling different dimensions of the model. While scaling
individual dimensions improves model performance, we observed that balancing all
dimensions of the network—width, depth, and image resolution—against the available
resources would best improve overall performance.

The first step in the compound scaling method is to perform a grid search to
find the relationship between different scaling dimensions of the baseline network
under a fixed resource constraint (e.g., 2x more flops). This determines the appropriate
scaling coefficient for each of the dimensions mentioned above. We then apply those
coefficients to scale up the baseline network to the desired target model size or
computational budget.
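As a concrete illustration of the arithmetic, a small sketch using the coefficients reported in the EfficientNet paper for the B0 baseline (alpha = 1.2, beta = 1.1, gamma = 1.15) is shown below; the mapping of integer phi values to the B1-B7 variants is only approximate:

# Sketch of compound scaling: depth, width and resolution all grow from one
# compound coefficient phi. alpha/beta/gamma are the grid-search results
# reported in the EfficientNet paper for the B0 baseline.
alpha, beta, gamma = 1.2, 1.1, 1.15   # satisfies alpha * beta**2 * gamma**2 ~ 2

def compound_scale(phi):
    depth = alpha ** phi        # multiplier on the number of layers
    width = beta ** phi         # multiplier on the number of channels
    resolution = gamma ** phi   # multiplier on the input resolution
    return depth, width, resolution

for phi in range(8):            # phi = 0 is B0; larger phi approximates B1..B7
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")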

Fig 3.1.5 Various Scaling Methods

From the above diagram we can see that compound scaling, which we use in
EfficientNet-B5, uniformly scales network width, network depth and resolution at the
same time. In compound scaling, a compound coefficient is used to uniformly scale
all dimensions; the coefficient is obtained by applying a grid search to find the
relationship between the different scaling dimensions. The baseline network also
plays an important role in the EfficientNet architecture.

To find the best baseline network, a neural architecture search using the
AutoML MNAS framework was used. The resulting architecture is similar to
MobileNetV2 [18], and the EfficientNets are obtained by scaling up this resulting
architecture.

This compound scaling method consistently improves model accuracy and


efficiency for scaling up existing models such as MobileNet (+1.4% ImageNet
accuracy), and ResNet (+0.7%), compared to conventional scaling methods.

EfficientNet Architecture

The effectiveness of model scaling also relies heavily on the baseline


network. So, to further improve performance, we have also developed a new baseline
network by performing a neural architecture search using the AutoML MNAS
framework, which optimizes both accuracy and efficiency (FLOPS). The resulting
architecture uses mobile inverted bottleneck convolution (MBConv), similar
to MobileNetV2 and MnasNet, but is slightly larger due to an increased FLOP budget.
We then scale up the baseline network to obtain a family of models,
called EfficientNets.

The architecture of our baseline network, EfficientNet-B0, is simple and clean,
making it easier to scale and generalize. The scaled-up EfficientNet-B5 architecture
is shown below.

Fig 3.1.6 EfficientNet-B5 Architecture

EfficientNet Performance

We have compared our EfficientNets with other existing CNNs on ImageNet.


In general, the EfficientNet models achieve both higher accuracy and better efficiency
over existing CNNs, reducing parameter size and FLOPS by an order of magnitude.
For example, in the high-accuracy regime, our EfficientNet-B7 reaches state-of-the-art
84.4% top-1 / 97.1% top-5 accuracy on ImageNet, while being 8.4x smaller and 6.1x
faster on CPU inference than the previous GPipe. Compared with the widely
used ResNet-50, our EfficientNet-B4 uses similar FLOPS, while improving the top-1
accuracy from 76.3% of ResNet-50 to 82.6% (+6.3%).

Fig 3.1.7 Graph Representation of the EfficientNet performance

EfficientNet-B0 is the baseline network developed by AutoML MNAS, while
EfficientNet-B1 to B7 are obtained by scaling up the baseline network. In particular, our
EfficientNet-B7 achieves a new state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy,
while being 8.4x smaller than the best existing CNN.

Though EfficientNets perform well on ImageNet, to be most useful, they should also
transfer to other datasets. To evaluate this, we tested EfficientNets on eight widely
used transfer learning datasets. EfficientNets achieved state-of-the-art accuracy in 5
out of the 8 datasets, such as CIFAR-100 (91.7%) and Flowers (98.8%), with an order
of magnitude fewer parameters (up to 21x parameter reduction), suggesting that our
EfficientNets also transfer well. EfficientNets potentially serve as a new foundation
for future computer vision tasks. We have open-sourced all EfficientNet models,
which can benefit the larger machine learning community.

3.1.1 DATA COLLECTION AND PREPARATION


In pre-processing there are four sub-modules: data preparation, exploratory data
analysis, the metric, and pre-processing itself.

Data preparation
We collect all the fundus images from the APTOS (Asia Pacific Tele-Ophthalmology
Society) dataset. In this dataset the fundus images are labelled 0, 1, 2, 3 and 4 for
Normal, Mild DR, Moderate DR, Severe DR and Proliferative DR respectively. The
dataset provides 4,657 fundus images in total. Among these, 3,662 (stored in train.csv
with image ID and diagnosis label) were used for model training and the remaining 995
(stored in test.csv with image ID and diagnosis label) were used for model testing.

Exploratory Data Analysis


We think one should at least examine the label distribution, the images before Pre-
processing and the images after Pre-processing.

Fig 3.1.1.1 Label distribution

In the above graph, the labels on the x-axis denote the stages of DR and the
height of each bar denotes the number of images present in that stage.

3.1.2. IMAGE PREPROCESSING

One intuitive way to improve the performance of our model is to simply improve
the quality of input images. In this kernel, we will share two ideas which we hope may
be useful to some of you:

 Reducing lighting-condition effects: images come with many different
lighting conditions; some images are very dark and difficult to visualize. We
can try to convert the image to gray scale and visualize it better.

 Cropping uninformative area


Fig 3.1.2.1 Image of Diabetic Retinopathy

We found that Hemorrhages, Hard Exudates and Cotton Wool spots are quite easily
observed. However, we still could not find examples of aneurysms or abnormal
growth of blood vessels in our data. Perhaps the latter two cases are important
if we want to catch up with the human benchmark using our model.

In pre-processing, we use Ben Graham's preprocessing method. Images come
with many different lighting conditions; some images are very dark and difficult to
visualize.

First, let us have a glance at the original inputs. Each row depicts one severity level.
We can see two problems which make the severity difficult to spot. First, some
images are very dark [pic (0,2) and pic (4,4)] and sometimes the different color
illumination is confusing [pic (3,3)]. Second, some pictures have uninformative dark
areas [pic (0,1), pic (0,3)]. This matters when we reduce the picture size, as
informative areas become too small; so, it is intuitive to crop the uninformative
areas out in the second case.

Fig 3.1.2.2. Original input images

To avoid this color distraction, we convert the original image from BGR format to
RGB format, and we crop away the uninformative areas. After cropping, the images
are converted into gray scale, resized, and used to detect which stage the eye is in.

Fig 3.1.2.3. After cropping the images

After preprocessing we have managed to enhance the distinctive features in the
images. This will increase performance when we train our EfficientNet model.
A Jupyter notebook is used for preprocessing.

Fig 3.1.2.4. After Preprocessing the images

3.1.3. MODELING

Metric (Quadratic Weighted Kappa)

The Quadratic Weighted Kappa is used to calculate the similarity between the
actuals and the predictions. A perfect score of 1.0 is granted when the predictions and
actuals are the same, whereas the least possible score is -1, given when the predictions
are furthest away from the actuals; in our case that would mean all actuals were 0's
and all predictions were 4's. The aim is to get as close to 1 as possible. Generally, a
score of 0.6+ is considered to be a really good score. This metric is used to know
when to stop the training of the model; the trained model is selected based on the
best metric value.

Weighted kappa:

\kappa = 1 - \frac{\sum_{i,j} W_{i,j}\, O_{i,j}}{\sum_{i,j} W_{i,j}\, E_{i,j}}, \qquad W_{i,j} = \frac{(i-j)^2}{(k-1)^2}

where i = actual values; j = predicted values; k = number of labels, i.e., 5;
W = the weight matrix of actuals and predicted values;
O = the confusion matrix of actuals and predicted values;
E = the expected matrix, which is calculated as the outer product of the actual
and predicted vectors.

This Weighted kappa is calculated as follows:

Step 1: First, an NxN matrix X is constructed, such that X(i, j) corresponds to the
number of instances with actual rating i and predicted rating j. This NxN matrix is
the confusion matrix.

Step 2: Construct a weight matrix W, which is calculated based on the difference
between the actual and predicted rating scores.

Step 3: Create two vectors, one for predictions and another for actuals, which tell us
how many values of each rating exist.

Step 4: Now construct the expected matrix E, which is the outer product of the two
vectors (prediction vector and actual vector) created in step 3.

Step 5: Normalize both matrices to have the same sum. Since it is easiest to make the
sum '1', we simply divide each matrix by its sum to normalize the data.

Step 6: Now calculate the weighted kappa as per the formula above by substituting the values.
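In practice the same score can be computed directly with scikit-learn, which the sample code in section 4.2 also uses; a toy example with the five DR labels:

# Toy check of the quadratic weighted kappa on the five DR labels (0..4).
import numpy as np
from sklearn.metrics import cohen_kappa_score

actuals = np.array([0, 2, 4, 1, 3, 0, 2])
predictions = np.array([0, 2, 3, 1, 4, 0, 2])  # two near-miss errors

qwk = cohen_kappa_score(actuals, predictions, weights='quadratic')
print(f"quadratic weighted kappa: {qwk:.4f}")  # 1.0 only for a perfect match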

Since we want to optimize the Quadratic Weighted Kappa score, we can


formulate this challenge as a regression problem. In this way we are more flexible in
our optimization and we can yield higher scores than solely optimizing for accuracy.
We will optimize a pre-trained EfficientNet-B5 with a few added layers. The metric
that we try to optimize is the Mean Squared Error: the mean of squared
differences between our predictions and the labels, as shown in the formula below. By
optimizing this metric, we are also optimizing for Quadratic Weighted Kappa if we
round the predictions afterwards.

Mean Squared Error:

MSE = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2

where n = the number of data points, Y_i = the observed values, and \hat{Y}_i = the
predicted values.

Since we are not provided with much data (3,662 images), we augment the data
to make the model more robust. We rotate the images by any angle, flip them both
horizontally and vertically, and lastly divide the pixel values by 128 for
normalization.

We split the training dataset into 85% for training and 15% for validating the
model. We resize the images to a (width, height) of (456, 456) so that they are
suitable for processing in the EfficientNet-B5 model. The EfficientNet-B5 model
applies batch normalization, which is unstable for small batch sizes since the
non-uniformity of the dataset does not change. To solve this problem, we applied
group normalization to each layer of EfficientNet-B5; it normalizes the features by
dividing the channels into groups. We import the feature-extractor part of the
EfficientNet-B5 model, create a Sequential object from Keras, and pass in the
feature extractor. Then we create the architecture of the classifier part. In the
classifier part we use ReLU and a linear activation function, since we obtain a
single output because we treat this as a regression problem.
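A hedged sketch of this model assembly is shown below; the dropout rate and dense-layer size are assumptions based on the description above, not a verbatim excerpt of our training code, and the import name for EfficientNetB5 is assumed from the package used in section 4.2:

# Hedged sketch of the classifier head on the EfficientNet-B5 feature
# extractor; dropout rate and dense size are illustrative assumptions.
from keras.models import Sequential
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from efficientnet import EfficientNetB5  # import name assumed

# Pre-trained weights are loaded into this extractor separately,
# as in the sample code in section 4.2.
effnet = EfficientNetB5(weights=None, include_top=False,
                        input_shape=(456, 456, 3))

model = Sequential()
model.add(effnet)                         # EfficientNet-B5 feature extractor
model.add(GlobalAveragePooling2D())
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))   # small dense head (size assumed)
model.add(Dense(1, activation='linear'))  # single output: severity as regression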

To make the model converge to an optimal point we should use an optimizer. We
use the RAdam optimizer because a plain adaptive learning rate comes with high
variance in the first few epochs. With the RAdam optimizer, training effectively
runs with a low learning rate in the first few epochs with momentum turned off,
and the learning rate increases later with momentum turned on, which leads to
better convergence of the model in fewer epochs.

In the process of training the model, we used 35 epochs, and we used quadratic
weighted kappa (QWK) and accuracy as the metrics. If the metrics do not improve
for 4 epochs, we stop the training and save the best model according to the quadratic
weighted kappa score. We then load the pre-trained weights provided with our
dataset. We use the RAdam optimizer since it often yields better convergence and an
optimized output, and we use batch normalization to take the input images and
process the result. A larger batch size increases memory consumption and slows
processing down, so it is better to take a small batch size.
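The following sketch shows how this training setup can be expressed in Keras. It reuses 'model', the generators, BATCH_SIZE and the Metrics callback from the sample code in section 4.2; the RAdam import assumes the keras-rectified-adam package, and the EarlyStopping settings follow the description above:

# Hedged sketch of compiling and training as described; RAdam is assumed to
# come from the keras-rectified-adam package, and model/train_generator/
# val_generator/BATCH_SIZE/Metrics are the objects from section 4.2.
from keras_radam import RAdam
from keras.callbacks import EarlyStopping

model.compile(optimizer=RAdam(), loss='mse', metrics=['accuracy'])

kappa_metrics = Metrics()  # saves the best model by validation QWK
early_stop = EarlyStopping(monitor='val_loss', patience=4)  # 4 flat epochs

model.fit_generator(train_generator,
                    steps_per_epoch=train_generator.samples // BATCH_SIZE,
                    epochs=35,
                    validation_data=val_generator,
                    validation_steps=val_generator.samples // BATCH_SIZE,
                    callbacks=[kappa_metrics, early_stop])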

3.1.4. EVALUATION

To evaluate our performance, we test the model on validation data: we predict
values from the validation generator and round them to the corresponding integer
to get valid predictions. To detect the severity level of a patient's diabetic
retinopathy from the generator's values, we set a threshold value for each stage of
the disease during training of the model. Based on the threshold values we get the
accuracy of the model. We can improve the model accuracy by optimizing the model's
performance using optimizers such as the RAdam optimizer used in this model. We
initially set the coefficient values to (0.5, 1.5, 2.5, 3.5); later we maximize the
quadratic weighted kappa score with respect to the coefficient values (by minimizing
its negative) using the Nelder-Mead method.
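A hedged sketch of this threshold search is shown below, assuming SciPy's Nelder-Mead implementation and toy stand-in predictions:

# Hedged sketch of tuning the rounding thresholds with the Nelder-Mead method.
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import cohen_kappa_score

def apply_thresholds(preds, coef):
    # Bucket continuous regression outputs into the labels 0..4.
    return np.digitize(preds, bins=coef)

def neg_qwk(coef, preds, labels):
    # Minimizing the negative QWK maximizes the QWK itself.
    return -cohen_kappa_score(labels, apply_thresholds(preds, coef),
                              weights='quadratic')

# In the real pipeline, preds/labels come from the validation generator,
# e.g. preds, labels = get_preds_and_labels(model, val_generator).
preds = np.array([0.1, 1.4, 2.7, 3.9, 0.6])  # toy stand-ins
labels = np.array([0, 1, 3, 4, 1])

initial_coef = [0.5, 1.5, 2.5, 3.5]
result = minimize(neg_qwk, initial_coef, args=(preds, labels),
                  method='nelder-mead')
print("tuned thresholds:", result.x)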

We evaluate the performance of the model based on validation accuracy
and the quadratic weighted kappa score, since the quadratic weighted kappa score
plays a vital role in assessing the performance of the model. It is further enhanced
by running the optimizer on the quadratic weighted kappa score with respect to the
coefficients of the threshold values. We get a 0.8712 quadratic weighted kappa score
and 83% accuracy on validation data. We then train the model until it does not
improve further; applying it to test data, we get the severity of diabetic retinopathy
from fundus images.

4. EXPERIMENTAL ANALYSIS

4.1 System configuration

4.1.1 Software configuration

Operating system: Windows 7 or newer, 64-bit macOS 10.13+, or Linux,


including Ubuntu, RedHat, CentOS 6+, and others.

System architecture: Windows- 64-bit x86, 32-bit x86; MacOS- 64-bit x86;
Linux- 64-bit x86, 64-bit Power8/Power9.

Environment: Python 2.7 or above

4.1.2 Hardware configuration

Processor: Intel Core i5 or above

RAM: 8 GB or above

Storage: Minimum 5 GB disk space to download and install.

4.2 SAMPLE CODE


import os
import sys
sys.path.append(os.path.abspath('input/efficientnet/efficientnetmaster/efficientnet-master/'))
from efficientnet import EfficientNetB5  # import name assumed for the efficientnet-master package added above
import cv2
import time
import scipy as sp
import numpy as np
import random as rn
import pandas as pd
from tqdm import tqdm
from PIL import Image
from functools import partial
import matplotlib.pyplot as plt
import tensorflow as tf

import keras
from keras import initializers
from keras import regularizers
from keras import constraints
from keras import backend as K
from keras.activations import elu
from keras.optimizers import Adam
from keras.models import Sequential
from keras.engine import Layer, InputSpec
from keras.utils.generic_utils import get_custom_objects
from keras.callbacks import Callback, EarlyStopping, ReduceLROnPlateau
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers import Dense, Conv2D, Flatten, GlobalAveragePooling2D, Dropout
from keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import cohen_kappa_score
KAGGLE_DIR = 'APTOS/'
TRAIN_DF_PATH = KAGGLE_DIR + "train.csv"
TEST_DF_PATH = KAGGLE_DIR + 'test.csv'
TRAIN_IMG_PATH = KAGGLE_DIR + "train_images/"
TEST_IMG_PATH = KAGGLE_DIR + 'test_images/'
SAVED_MODEL_NAME = 'effnet_modelB5.h5'
seed = 1234
rn.seed(seed)
np.random.seed(seed)
tf.set_random_seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)
t_start = time.time()
print("Image IDs and Labels (TRAIN)")
train_df = pd.read_csv(TRAIN_DF_PATH)
train_df['id_code'] = train_df['id_code'] + ".png"

print(f"Training images: {train_df.shape[0]}")
display(train_df.head())
print("Image IDs (TEST)")
test_df = pd.read_csv(TEST_DF_PATH)
test_df['id_code'] = test_df['id_code'] + ".png"
print(f"Testing Images: {test_df.shape[0]}")
display(test_df.head())
IMG_WIDTH = 456
IMG_HEIGHT = 456
CHANNELS = 3

def get_preds_and_labels(model, generator):
    preds = []
    labels = []
    for _ in range(int(np.ceil(generator.samples / BATCH_SIZE))):
        x, y = next(generator)
        preds.append(model.predict(x))
        labels.append(y)
    return np.concatenate(preds).ravel(), np.concatenate(labels).ravel()
class Metrics(Callback):
    def on_train_begin(self, logs={}):
        self.val_kappas = []

    def on_epoch_end(self, epoch, logs={}):
        y_pred, labels = get_preds_and_labels(model, val_generator)
        y_pred = np.rint(y_pred).astype(np.uint8).clip(0, 4)
        _val_kappa = cohen_kappa_score(labels, y_pred, weights='quadratic')
        self.val_kappas.append(_val_kappa)
        print(f"val_kappa: {round(_val_kappa, 4)}")
        if _val_kappa == max(self.val_kappas):
            print("Validation Kappa has improved. Saving model.")
            self.model.save(SAVED_MODEL_NAME)
        return
train_df['diagnosis'].value_counts().sort_index().plot(kind="bar",
                                                       figsize=(12, 5),
                                                       rot=0)
plt.title("Label Distribution (Training Set)",
          weight='bold',
          fontsize=18)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel("Label", fontsize=17)
plt.ylabel("Frequency", fontsize=17);

# Function for cropping the image
def crop_image_from_gray(img, tol=7):
    if img.ndim == 2:
        mask = img > tol
        return img[np.ix_(mask.any(1), mask.any(0))]
    elif img.ndim == 3:
        gray_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        mask = gray_img > tol

        check_shape = img[:, :, 0][np.ix_(mask.any(1), mask.any(0))].shape[0]
        if check_shape == 0:
            return img
        else:
            img1 = img[:, :, 0][np.ix_(mask.any(1), mask.any(0))]
            img2 = img[:, :, 1][np.ix_(mask.any(1), mask.any(0))]
            img3 = img[:, :, 2][np.ix_(mask.any(1), mask.any(0))]
            img = np.stack([img1, img2, img3], axis=-1)
        return img

# Preprocessing the image
def preprocess_image(image, sigmaX=10):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = crop_image_from_gray(image)
    image = cv2.resize(image, (IMG_WIDTH, IMG_HEIGHT))
    image = cv2.addWeighted(image, 4,
                            cv2.GaussianBlur(image, (0, 0), sigmaX), -4, 128)
    return image

fig, ax = plt.subplots(1, 5, figsize=(15, 6))
for i in range(5):
    sample = train_df[train_df['diagnosis'] == i].sample(1)
    image_name = sample['id_code'].item()
    X = preprocess_image(cv2.imread(f"{TRAIN_IMG_PATH}{image_name}"))
    ax[i].set_title(f"Image: {image_name}\n Label = {sample['diagnosis'].item()}",
                    weight='bold', fontsize=10)
    ax[i].axis('off')
    ax[i].imshow(X);
BATCH_SIZE = 4

train_datagen = ImageDataGenerator(rotation_range=360,
                                   horizontal_flip=True,
                                   vertical_flip=True,
                                   validation_split=0.15,
                                   preprocessing_function=preprocess_image,
                                   rescale=1 / 128.)

train_generator = train_datagen.flow_from_dataframe(train_df,
                                                    x_col='id_code',
                                                    y_col='diagnosis',
                                                    directory=TRAIN_IMG_PATH,
                                                    target_size=(IMG_WIDTH, IMG_HEIGHT),
                                                    batch_size=BATCH_SIZE,
                                                    class_mode='other',
                                                    subset='training')

val_generator = train_datagen.flow_from_dataframe(train_df,
                                                  x_col='id_code',
                                                  y_col='diagnosis',
                                                  directory=TRAIN_IMG_PATH,
                                                  target_size=(IMG_WIDTH, IMG_HEIGHT),
                                                  batch_size=BATCH_SIZE,
                                                  class_mode='other',
                                                  subset='validation')
class GroupNormalization(Layer):
    def __init__(self,
                 groups=32,
                 axis=-1,
                 epsilon=1e-5,
                 center=True,
                 scale=True,
                 beta_initializer='zeros',
                 gamma_initializer='ones',
                 beta_regularizer=None,
                 gamma_regularizer=None,
                 beta_constraint=None,
                 gamma_constraint=None,
                 **kwargs):
        super(GroupNormalization, self).__init__(**kwargs)
        self.supports_masking = True
        self.groups = groups
        self.axis = axis
        self.epsilon = epsilon
        self.center = center
        self.scale = scale
        self.beta_initializer = initializers.get(beta_initializer)
        self.gamma_initializer = initializers.get(gamma_initializer)
        self.beta_regularizer = regularizers.get(beta_regularizer)
        self.gamma_regularizer = regularizers.get(gamma_regularizer)
        self.beta_constraint = constraints.get(beta_constraint)
        self.gamma_constraint = constraints.get(gamma_constraint)

def build(self, input_shape):


dim = input_shape[self.axis]

if dim is None:
raise ValueError('Axis ' + str(self.axis) + ' of '
'input tensor should have a defined dimension '
'but the layer received an input with shape ' +
str(input_shape) + '.')

if dim < self.groups:


raise ValueError('Number of groups (' + str(self.groups) + ') cannot be '
'more than the number of channels (' +
str(dim) + ').')

if dim % self.groups != 0:
raise ValueError('Number of groups (' + str(self.groups) + ') must be a '
'multiple of the number of channels (' +
str(dim) + ').')

self.input_spec = InputSpec(ndim=len(input_shape),
axes={self.axis: dim})
shape = (dim,)

if self.scale:
self.gamma = self.add_weight(shape=shape,
name='gamma',
initializer=self.gamma_initializer,
regularizer=self.gamma_regularizer,

42
constraint=self.gamma_constraint)
else:
self.gamma = None
if self.center:
self.beta = self.add_weight(shape=shape,
name='beta',
initializer=self.beta_initializer,
regularizer=self.beta_regularizer,
constraint=self.beta_constraint)
else:
self.beta = None
self.built = True

def call(self, inputs, **kwargs):


input_shape = K.int_shape(inputs)
tensor_input_shape = K.shape(inputs)

# Prepare broadcasting shape.


reduction_axes = list(range(len(input_shape)))
del reduction_axes[self.axis]
broadcast_shape = [1] * len(input_shape)
broadcast_shape[self.axis] = input_shape[self.axis] // self.groups
broadcast_shape.insert(1, self.groups)

reshape_group_shape = K.shape(inputs)
group_axes = [reshape_group_shape[i] for i in range(len(input_shape))]
group_axes[self.axis] = input_shape[self.axis] // self.groups
group_axes.insert(1, self.groups)
group_shape = [group_axes[0], self.groups] + group_axes[2:]
group_shape = K.stack(group_shape)
inputs = K.reshape(inputs, group_shape)

group_reduction_axes = list(range(len(group_axes)))

43
group_reduction_axes = group_reduction_axes[2:]

mean = K.mean(inputs, axis=group_reduction_axes, keepdims=True)


variance = K.var(inputs, axis=group_reduction_axes, keepdims=True)

inputs = (inputs - mean) / (K.sqrt(variance + self.epsilon)


inputs = K.reshape(inputs, group_shape)
outputs = inputs

if self.scale:
broadcast_gamma = K.reshape(self.gamma, broadcast_shape)
outputs = outputs * broadcast_gamma

if self.center:
broadcast_beta = K.reshape(self.beta, broadcast_shape)
outputs = outputs + broadcast_beta

outputs = K.reshape(outputs, tensor_input_shape)

return outputs

def get_config(self):
config = {
'groups': self.groups,
'axis': self.axis,
'epsilon': self.epsilon,
'center': self.center,
'scale': self.scale,
'beta_initializer': initializers.serialize(self.beta_initializer),
'gamma_initializer': initializers.serialize(self.gamma_initializer),
'beta_regularizer': regularizers.serialize(self.beta_regularizer),
'gamma_regularizer': regularizers.serialize(self.gamma_regularizer),
'beta_constraint': constraints.serialize(self.beta_constraint),

44
'gamma_constraint': constraints.serialize(self.gamma_constraint)
}
base_config = super(GroupNormalization, self).get_config()
return dict(list(base_config.items()) + list(config.items()))

def compute_output_shape(self, input_shape):


return input_shape

effnet = EfficientNetB5(weights=None,
                        include_top=False,
                        input_shape=(IMG_WIDTH, IMG_HEIGHT, CHANNELS))
effnet.load_weights('../input/efficientnet-keras-weights-b0b5/'
                    'efficientnet-b5_imagenet_1000_notop.h5')

# Swap every BatchNormalization layer for GroupNormalization, which behaves
# better with our very small batch size of 4
for i, layer in enumerate(effnet.layers):
    if "batch_normalization" in layer.name:
        effnet.layers[i] = GroupNormalization(groups=32, axis=-1, epsilon=0.00001)

def build_model():
    """EfficientNet-B5 backbone plus global pooling and a small regression
    head: a single linear output trained with MSE treats the 0-4 grade as a
    continuous value."""
    model = Sequential()
    model.add(effnet)
    model.add(GlobalAveragePooling2D())
    model.add(Dropout(0.5))
    model.add(Dense(5, activation=elu))
    model.add(Dense(1, activation="linear"))
    model.compile(loss='mse',
                  optimizer=RAdam(lr=0.00005),
                  metrics=['mse', 'acc'])
    print(model.summary())
    return model

model = build_model()
kappa_metrics = Metrics()

# Monitor validation loss (MSE) to avoid overfitting and save the best model
es = EarlyStopping(monitor='val_loss', mode='auto', verbose=1, patience=12)
rlr = ReduceLROnPlateau(monitor='val_loss',
                        factor=0.5,
                        patience=4,
                        verbose=1,
                        mode='auto',
                        epsilon=0.0001)

# Begin training
model.fit_generator(train_generator,
                    steps_per_epoch=train_generator.samples // BATCH_SIZE,
                    epochs=35,
                    validation_data=val_generator,
                    validation_steps=val_generator.samples // BATCH_SIZE,
                    callbacks=[kappa_metrics, es, rlr])
model.load_weights('../input/trainmodel2/effnet_original.h5')
history_df = pd.DataFrame(model.history.history)
history_df[['loss', 'val_loss']].plot(figsize=(12,5))
plt.title("Loss (MSE)", fontsize=16, weight='bold')
plt.xlabel("Epoch")
plt.ylabel("Loss (MSE)")
history_df[['acc', 'val_acc']].plot(figsize=(12,5))
plt.title("Accuracy", fontsize=16, weight='bold')
plt.xlabel("Epoch")
plt.ylabel("% Accuracy");
y_train_preds, train_labels = get_preds_and_labels(model, train_generator)
y_train_preds = np.rint(y_train_preds).astype(np.uint8).clip(0, 4)

# Calculate score
train_score = cohen_kappa_score(train_labels, y_train_preds, weights="quadratic")

# Calculate QWK on validation set

y_val_preds, val_labels = get_preds_and_labels(model, val_generator)
y_val_preds = np.rint(y_val_preds).astype(np.uint8).clip(0, 4)

# Calculate score
val_score = cohen_kappa_score(val_labels, y_val_preds, weights="quadratic")
print(f"The Training Cohen Kappa Score is: {round(train_score, 5)}")
print(f"The Validation Cohen Kappa Score is: {round(val_score, 5)}")
class OptimizedRounder(object):
    """Optimizes the thresholds used to round the continuous regression
    output into the five discrete DR grades, by directly maximizing the
    Quadratic Weighted Kappa on validation data."""

    def __init__(self):
        self.coef_ = 0

    def _kappa_loss(self, coef, X, y):
        # Apply the candidate thresholds, then return the negative QWK
        # (scipy minimizes, so we negate the score we want to maximize)
        X_p = np.copy(X)
        for i, pred in enumerate(X_p):
            if pred < coef[0]:
                X_p[i] = 0
            elif pred >= coef[0] and pred < coef[1]:
                X_p[i] = 1
            elif pred >= coef[1] and pred < coef[2]:
                X_p[i] = 2
            elif pred >= coef[2] and pred < coef[3]:
                X_p[i] = 3
            else:
                X_p[i] = 4

        ll = cohen_kappa_score(y, X_p, weights='quadratic')
        return -ll

    def fit(self, X, y):
        loss_partial = partial(self._kappa_loss, X=X, y=y)
        initial_coef = [0.5, 1.5, 2.5, 3.5]
        self.coef_ = sp.optimize.minimize(loss_partial, initial_coef,
                                          method='nelder-mead')

    def predict(self, X, coef):
        X_p = np.copy(X)
        for i, pred in enumerate(X_p):
            if pred < coef[0]:
                X_p[i] = 0
            elif pred >= coef[0] and pred < coef[1]:
                X_p[i] = 1
            elif pred >= coef[1] and pred < coef[2]:
                X_p[i] = 2
            elif pred >= coef[2] and pred < coef[3]:
                X_p[i] = 3
            else:
                X_p[i] = 4
        return X_p

    def coefficients(self):
        """Return the optimized coefficients"""
        return self.coef_['x']

# Optimize on validation data and evaluate again
y_val_preds, val_labels = get_preds_and_labels(model, val_generator)
optR = OptimizedRounder()
optR.fit(y_val_preds, val_labels)
coefficients = optR.coefficients()
opt_val_predictions = optR.predict(y_val_preds, coefficients)
new_val_score = cohen_kappa_score(val_labels, opt_val_predictions,
                                  weights="quadratic")

# Placeholder labels so flow_from_dataframe accepts the test dataframe
test_df['diagnosis'] = np.zeros(test_df.shape[0])

# For preprocessing test images
test_generator = ImageDataGenerator(preprocessing_function=preprocess_image,
                                    rescale=1 / 128.).flow_from_dataframe(
    test_df,
    x_col='id_code', y_col='diagnosis', directory=TEST_IMG_PATH,
    target_size=(IMG_WIDTH, IMG_HEIGHT), batch_size=BATCH_SIZE,
    class_mode='other', shuffle=False)

t_finish = time.time()
total_time = round((t_finish - t_start) / 3600, 4)
print('Kernel runtime = {} hours ({} minutes)'.format(total_time,
                                                      int(total_time * 60)))

4.3. SAMPLE OUTPUTS

Fig 4.3.1 Label Distribution of the training set

Figure 4.3.1 shows the distribution of the images in the training dataset. The
bar chart represents the number of images in each category: the vertical (y)
axis gives the frequency of images in the training dataset, and the horizontal
(x) axis gives the label (No DR-0, Mild DR-1, Moderate DR-2, Severe DR-3,
Proliferative DR-4) of the image.

IMAGES WITHOUT RESIZING

Fig 4.3.2 depicts the resulting images after we eliminated the colour
distraction using the Ben Graham preprocessing method with a tolerance of 7 on
each image. Cropping at this tolerance removes the darker, uninformative
portion of each image and keeps the brighter, more informative area.
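A minimal sketch of this tolerance-based crop on a synthetic array (not an actual fundus image), mirroring the logic of the crop_image_from_gray function from the sample code:

import numpy as np

# Synthetic 100x100 grayscale "fundus": a dark border (intensity 2 < tol)
# surrounding a bright 40x40 informative region
img = np.full((100, 100), 2, dtype=np.uint8)
img[30:70, 30:70] = 150

tol = 7
mask = img > tol
# Keep only the rows/columns that contain at least one informative pixel
cropped = img[np.ix_(mask.any(1), mask.any(0))]
print(img.shape, '->', cropped.shape)   # (100, 100) -> (40, 40)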

Fig 4.3.2 Images without Resizing

IMAGES AFTER RESIZING

The main problem still present in Fig 4.3.2 is that the informative area of the
image is very small and difficult to notice. We should resize the images so
that the model can process them well, derive meaningful information from them,
and classify them correctly. We resize each image to (256, 256) so that the
pre-trained model is compatible with the pre-processed image; the informative
areas also become larger, which helps the model work well.

Fig 4.3.3 Images After Resizing

IMAGES BEFORE PRE-PROCESSING

Fig 4.3.4 Images before pre-Processing

The above diagrams are original images from the dataset, one for each stage of
diabetic retinopathy. We can see that one image is brighter and distracted by
different colour visuals, while another is darker with uninformative areas; we
eliminate both problems using the pre-processing methods.

IMAGES AFTER PRE-PROCESSING

Fig 4.3.5 Images After Pre-Processing

We used a Gaussian blur on the images because smoothing suppresses the
high-frequency components; subtracting the blurred image from the original
therefore isolates exactly those components. Adding this weighted difference,
known as the mask, back to the original enhances the high-frequency detail of
the image, such as vessel boundaries and lesions. The resulting images are
shown in Fig 4.3.5.
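As a small self-check of this unsharp-mask step (a sketch only; sigmaX=10 matches preprocess_image, and the random image is just a stand-in for a fundus photograph):

import cv2
import numpy as np

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
blur = cv2.GaussianBlur(img, (0, 0), 10)

# The call used in preprocess_image ...
enhanced = cv2.addWeighted(img, 4, blur, -4, 128)

# ... equals (up to uint8 saturation) the unsharp mask 4*(img - blur) + 128
manual = np.clip(4.0 * (img.astype(np.int32) - blur.astype(np.int32)) + 128,
                 0, 255).astype(np.uint8)
print(np.array_equal(enhanced, manual))  # True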

MODEL ARCHITECTURE

Fig 4.3.6 Calculations of Model Architecture

The above diagram describes the architecture of the model: 'Layer' names the
operation applied to the image, 'Output Shape' is the shape of the tensor
produced by that layer, and 'Param #' is the number of trainable parameters in
that layer. The final layer shows 6 parameters, corresponding to the five
weights and one bias of the single-output regression layer.

PREDICTION DISTRIBUTIONS

Fig 4.3.7 Label Distribution for the Predictions

The histogram represents the number of images predicted in each stage of
diabetic retinopathy, and we can see that the greatest number of images comes
under stage 2 (Moderate DR).
The different stages of Diabetic Retinopathy are:
0 - No DR
1 - Mild DR
2 - Moderate DR
3 - Severe DR
4 - Proliferative DR

PERFORMANCE MEASURES:
Metric (Quadratic Weighted Kappa):
The Quadratic Weighted Kappa (QWK) measures the agreement between the
actual labels and the predictions. A perfect score of 1.0 is achieved when the
predictions and actual values are identical, while the lowest possible score of
-1 occurs when the predictions are furthest from the actuals; in our case, that
would mean all actuals were 0's and all predictions were 4's. The aim is to get
as close to 1 as possible; generally, a score above 0.6 is considered very
good. This metric is used to decide when to stop training, and the final model
is selected based on the best metric value.

Weighted Kappa (κ) = 1 − [ Σ(i,j) W(i,j)·O(i,j) ] / [ Σ(i,j) W(i,j)·E(i,j) ]

where i = actual value, j = predicted value, and k = number of labels, i.e., 5;
W(i,j) = (i − j)² / (k − 1)² is the weight matrix of actuals and predicted values;
O(i,j) is the confusion matrix of actuals and predicted values;
E(i,j) is the expected matrix, calculated from the outer product of the actual
and predicted label histograms, normalized so that it has the same sum as O.
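A minimal sketch of this computation on hypothetical values (the project itself relies on sklearn's cohen_kappa_score, and the two agree up to floating point):

import numpy as np
from sklearn.metrics import cohen_kappa_score

def quadratic_weighted_kappa(actuals, preds, k=5):
    # O: confusion matrix of actuals vs. predictions
    O = np.zeros((k, k))
    for a, p in zip(actuals, preds):
        O[a, p] += 1
    # W: quadratic penalty, larger the further the prediction is from truth
    i, j = np.indices((k, k))
    W = (i - j) ** 2 / (k - 1) ** 2
    # E: expected matrix from the outer product of the label histograms,
    # normalized to the same total count as O
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1 - (W * O).sum() / (W * E).sum()

actuals = np.array([0, 2, 2, 3, 4, 1, 0, 2])
preds   = np.array([0, 2, 1, 3, 4, 1, 0, 3])
print(quadratic_weighted_kappa(actuals, preds))
print(cohen_kappa_score(actuals, preds, weights='quadratic'))  # same value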
Validation Accuracy:
The percentage of validation samples that the model, trained on the
training dataset, predicts correctly. It is used to evaluate the performance of
the model before we deploy it.
Mean Squared Error:
The mean squared error (MSE) is the average of the squared differences
between the original values and the predicted values, and its gradient is easy
to compute. Since our problem is formulated as a regression problem, we take
MSE as the loss; the lower the MSE, the better the model. Because it squares
the error, larger errors (outliers) are magnified and penalized heavily, which
pushes training to reduce their effect on the model.
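A tiny illustration of how squaring magnifies outliers (illustrative values only):

import numpy as np

y_true = np.array([0, 1, 2, 3, 4], dtype=float)
y_good = np.array([0, 1, 2, 3, 3], dtype=float)   # off by 1 on one sample
y_bad  = np.array([0, 1, 2, 3, 0], dtype=float)   # off by 4 on one sample

mse = lambda a, b: np.mean((a - b) ** 2)
print(mse(y_true, y_good))  # 0.2
print(mse(y_true, y_bad))   # 3.2 -- a 4x larger error costs 16x more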

5. CONCLUSION AND FUTURE WORK

5.1. CONCLUSION
In this project, transfer learning is implemented to classify DR into 5 classes
with much less training data than previous DR classification techniques have
employed. The goal was to train a DL model that performs well on unseen data by
learning efficiently from a small dataset, because training data is limited in
healthcare. Our model reached an accuracy higher than other techniques that
have used transfer learning on the whole Kaggle DR challenge dataset for
multi-class classification. This superior performance is on account of the
selected training algorithm, batch gradient descent with an ascending learning
rate, and the quadratic weighted kappa metric used for model selection. Deep
learning techniques that can learn from small datasets to categorize medical
images should be utilized to classify DR, as this approach can be transferred
to other medical image classification problems facing the challenge of
insufficient training data. Experiments should also be done to compare the
performance of other pre-trained deep convolutional networks.

5.2. FUTURE WORK


We will apply the same model to a larger dataset than the present one. We will
use the feature-extraction part of the pre-trained model as input to algorithms
such as support vector machines, and will add performance measures such as
specificity and sensitivity, as these give healthcare practitioners more trust
in using the model in real time. We will also apply different image
pre-processing techniques to the dataset and compare their performance, compare
different transfer learning techniques, and apply the pre-trained models to
complex image classification challenges in real-world problems.

REFERENCES

[1] World Health Organization, 2016. Global report on diabetes.


[2] Saaddine, J.B., Honeycutt, A.A., Narayan, K.V., Zhang, X., Klein, R. and Boyle,
J.P., 2008. Projection of diabetic retinopathy and other major eye diseases among
people with diabetes mellitus: United States, 2005-2050. Archives of
Ophthalmology, 126(12), pp.1740-1747.
[3] Yau, J.W., Rogers, S.L., Kawasaki, R., Lamoureux, E.L., Kowalski, J.W., Bek,
T., Chen, S.J., Dekker, J.M., Fletcher, A., Grauslund, J. and Haffner, S., 2012.
Global prevalence and major risk factors of diabetic retinopathy. Diabetes care,
35(3), pp.556-564.
[4] Duh, E.J., Sun, J.K. and Stitt, A.W., 2017. Diabetic retinopathy: current
understanding, mechanisms, and treatment strategies. JCI insight, 2(14).
[5] Wilkinson, C.P., Ferris III, F.L., Klein, R.E., Lee, P.P., Agardh, C.D., Davis, M.,
Dills, D., Kampik, A., Pararajasegaram, R., Verdaguer, J.T. and Group,
G.D.R.P., 2003. Proposed international clinical diabetic retinopathy and diabetic
macular edema disease severity scales. Ophthalmology, 110(9), pp.1677-1682.
[6] Salz, D.A. and Witkin, A.J., 2015. Imaging in diabetic retinopathy. Middle East
African journal of ophthalmology, 22(2), p.145.
[7] Fong, D.S., Aiello, L., Gardner, T.W., King, G.L., Blankenship, G., Cavallerano,
J.D., Ferris, F.L. and Klein, R., 2004. Retinopathy in diabetes. Diabetes care,
27(suppl 1), pp.s84-s87.
[8] Vitale, S., Maguire, M.G., Murphy, R.P., Hiner, C., Rourke, L., Sackett, C. and
Patz, A., 1997. Interval between onset of mild nonproliferative and proliferative
retinopathy in type I diabetes. Archives of Ophthalmology, 115(2), pp.194-198.

[9] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. and Fei-Fei, L., 2009, June.
ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference
on Computer Vision and Pattern Recognition (pp. 248-255). IEEE.
[10] LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard,
W.E. and Jackel, L.D., 1990. Handwritten digit recognition with a back-
propagation network. In Advances in neural information processing systems (pp.
396-404).

[11] LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. nature, 521(7553),
p.436.
[12] Miotto, R., Wang, F., Wang, S., Jiang, X. and Dudley, J.T., 2017. Deep learning
for healthcare: review, opportunities and challenges. Briefings in bioinformatics,
19(6), pp.1236-1246.
[13] Razzak, M.I., Naz, S. and Zaib, A., 2018. Deep learning for medical image
processing: Overview, challenges and the future. In Classification in BioApps
(pp. 323-350). Springer, Cham.
[14] Erhan, D., Bengio, Y., Courville, A., Manzagol, P.A., Vincent, P. and Bengio, S.,
2010. Why does unsupervised pre-training help deep learning? Journal of
Machine Learning Research, 11(Feb), pp.625-660.
[15] Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M.,
Van Der Laak, J.A., Van Ginneken, B. and Sánchez, C.I., 2017. A survey on
deep learning in medical image analysis. Medical image analysis, 42, pp.60-88.
[16] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z., 2016.
Rethinking the inception architecture for computer vision. In Proceedings of the
IEEE conference on computer vision and pattern recognition (pp. 2818-2826).

[17] Kaggle. (2015). Diabetic Retinopathy Detection. [online] Available at:


https://www.kaggle.com/c/diabetic-retinopathy-detection [Accessed 11 Mar.
2019].
[18] Acharya, R., Chua, C.K., Ng, E.Y.K., Yu, W. and Chee, C., 2008. Application of
higher order spectra for the identification of diabetes retinopathy stages. Journal
of Medical Systems, 32(6), pp.481-488.
[19] Faust, O., Acharya, R., Ng, E.Y.K., Ng, K.H. and Suri, J.S., 2012. Algorithms for
the automated detection of diabetic retinopathy using digital fundus images: a
review. Journal of medical systems, 36(1), pp.145-157.
[20] Prentašić, P. and Lončarić, S., 2016. Detection of exudates in fundus
photographs using deep neural networks and anatomical landmark detection
fusion. Computer methods and programs in biomedicine, 137, pp.281-292.
[21] Perdomo, O., Arevalo, J. and González, F.A., 2017, January. Convolutional
network to detect exudates in eye fundus images of diabetic subjects. In 12th

International Symposium on Medical Information Processing and Analysis (Vol.
10160, p. 101600T). International Society for Optics and Photonics.
[22] Quellec, G., Charrière, K., Boudi, Y., Cochener, B. and Lamard, M., 2017. Deep
image mining for diabetic retinopathy screening. Medical image analysis, 39,
pp.178-193.
[23] Gondal, W.M., Köhler, J.M., Grzeszick, R., Fink, G.A. and Hirsch, M., 2017,
September. Weakly-supervised localization of diabetic retinopathy lesions in
retinal fundus images. In 2017 IEEE International Conference on Image
Processing (ICIP) (pp. 2069-2073). IEEE.
[24] Joshi, S. and Karule, P.T., 2018. A review on exudates detection methods for
diabetic retinopathy. Biomedicine & Pharmacotherapy, 97, pp.1454-1460.

[25] Kumar, S. and Kumar, B., 2018, February. Diabetic Retinopathy Detection by
Extracting Area and Number of Microaneurysm from Colour Fundus Image. In
2018 5th International Conference on Signal Processing and Integrated
Networks (SPIN) (pp. 359-364). IEEE.
[26] Asiri, N., Hussain, M. and Abualsamh, H.A., 2018. Deep Learning based
Computer-Aided Diagnosis Systems for Diabetic Retinopathy: A Survey. arXiv
preprint arXiv:1811.01238.
[27] Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A.,
Venugopalan, S., Widner, K., Madams, T., Cuadros, J. and Kim, R., 2016.
Development and validation of a deep learning algorithm for detection of
diabetic retinopathy in retinal fundus photographs. Jama, 316(22), pp.2402-2410.
[28] Pratt, H., Coenen, F., Broadbent, D.M., Harding, S.P. and Zheng, Y., 2016.
Convolutional neural networks for diabetic retinopathy. Procedia Computer
Science, 90, pp.200-205.
[29] Abràmoff, M.D., Lou, Y., Erginay, A., Clarida, W., Amelon, R., Folk, J.C. and
Niemeijer, M., 2016. Improved automated detection of diabetic retinopathy on a
publicly available dataset through integration of deep learning. Investigative
ophthalmology & visual science, 57(13), pp.5200-5206.
[30] Colas, E., Besse, A., Orgogozo, A., Schmauch, B., Meric, N. and Besse, E.,
2016. Deep learning approach for diabetic retinopathy screening. Acta
Ophthalmologica, 94.

[31] Ting, D.S.W., Cheung, C.Y.L., Lim, G., Tan, G.S.W., Quang, N.D., Gan, A.,
Hamzah, H., Garcia-Franco, R., San Yeo, I.Y., Lee, S.Y. and Wong, E.Y.M.,
2017. Development and validation of a deep learning system for diabetic
retinopathy and related eye diseases using retinal images from multiethnic
populations with diabetes. Jama, 318(22), pp.2211-2223.
[32] Gargeya, R. and Leng, T., 2017. Automated identification of diabetic retinopathy
using deep learning. Ophthalmology, 124(7), pp.962-969.

[33] Mohammadian, S., Karsaz, A. and Roshan, Y.M., 2017, November. Comparative
Study of Fine-Tuning of Pre-Trained Convolutional Neural Networks for
Diabetic Retinopathy Screening. In 2017 24th National and 2nd International
Iranian Conference on Biomedical Engineering (ICBME) (pp. 1-6). IEEE.
[34] Wan, S., Liang, Y. and Zhang, Y., 2018. Deep convolutional neural networks for
diabetic retinopathy detection by image classification. Computers & Electrical
Engineering, 72, pp.274-282.
[35] Mo, J. and Zhang, L., 2017. Multi-level deep supervised networks for retinal
vessel segmentation. International journal of computer assisted radiology and
surgery, 12(12), pp.2181-2193.
[36] Szegedy, C., Ioffe, S., Vanhoucke, V. and Alemi, A.A., 2017, February.
Inception-v4, inception-resnet and the impact of residual connections on
learning. In Thirty-First AAAI Conference on Artificial Intelligence.
[37] Raumviboonsuk, P., Krause, J., Chotcomwongse, P., Sayres, R., Raman, R.,
Widner, K., Campana, B.J., Phene, S., Hemarat, K., Tadarati, M. and Silpa-Acha,
S., 2018. Deep Learning vs. Human Graders for Classifying Severity Levels of
Diabetic Retinopathy in a Real-World Nationwide Screening Program. arXiv
preprint arXiv:1810.08290.
[38] Mansour, R.F., 2018. Deep-learning-based automatic computer-aided diagnosis
system for diabetic retinopathy. Biomedical engineering letters, 8(1), pp.41-57.
[39] Dutta, S., Manideep, B.C., Basha, S.M., Caytiles, R.D. and Iyengar, N.C.S.,
2018. Classification of Diabetic Retinopathy Images by Using Deep Learning
Models. International Journal of Grid and Distributed Computing, 11(1), pp.89-
106.

[40] Gao, Z., Li, J., Guo, J., Chen, Y., Yi, Z. and Zhong, J., 2018. Diagnosis of
Diabetic Retinopathy Using Deep Neural Networks. IEEE Access, 7, pp.3360-
3370.
[41] Zhou, K., Gu, Z., Liu, W., Luo, W., Cheng, J., Gao, S. and Liu, J., 2018, July.
Multi-Cell Multi-Task Convolutional Neural Networks for Diabetic Retinopathy
Grading. In 2018 40th Annual International Conference of the IEEE Engineering
in Medicine and Biology Society (EMBC) (pp. 2724-2727). IEEE.
[42] Adly, M.M., Ghoneim, A.S. and Youssif, A.A., 2019. On the Grading of
Diabetic Retinopathies using a Binary-Tree-based Multiclass Classifier of CNNs.
International Journal of Computer Science and Information Security (IJCSIS),
17(1).
[43] Mateen, M., Wen, J., Song, S. and Huang, Z., 2019. Fundus Image Classification
Using VGG-19 Architecture with PCA and SVD. Symmetry, 11(1), p.1.

[44] Graham, B., 2015. Kaggle diabetic retinopathy detection competition report.
University of Warwick.
[45] Barz, B. and Denzler, J., 2019. Deep Learning on Small Datasets without Pre-
Training using Cosine Loss. arXiv preprint arXiv:1901.09054.
[46] Colaboratory. (2018). Welcome to Colaboratory. [online] Available at:
http://colab.research.google.com/ [Accessed 2 May 2019].
[47] Carneiro, T., Da Nóbrega, R.V.M., Nepomuceno, T., Bian, G.B., De
Albuquerque, V.H.C. and Reboucas Filho, P.P., 2018. Performance Analysis of
Google Colaboratory as a Tool for Accelerating Deep Learning Applications.
IEEE Access, 6, pp.61677-61685.

APPENDICES

BASE PAPER
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 90 (2016) 200 – 205

International Conference On Medical Imaging Understanding and Analysis 2016, MIUA 2016,
6-8 July 2016, Loughborough, UK

Convolutional Neural Networks for Diabetic Retinopathy


Harry Pratt (a,*), Frans Coenen (b), Deborah M Broadbent (c), Simon P Harding (a,c), Yalin Zheng (a,c)
(a) Department of Eye and Vision Science, Institute of Ageing and Chronic Disease, University of Liverpool, Apex Building, 6 West Derby Street,
Liverpool L7 9TX, United Kingdom
(b) Department of Computer Science, University of Liverpool, Ashton Street, Liverpool L69 3BX, United Kingdom
(c) Royal Liverpool University Hospital, St. Paul's Eye Unit, Prescot Street, Liverpool L7 8XP, United Kingdom

Abstract
The diagnosis of diabetic retinopathy (DR) through colour fundus images requires experienced clinicians to identify the presence
and significance of many small features which, along with a complex grading system, makes this a difficult and time consuming
task. In this paper, we propose a CNN approach to diagnosing DR from digital fundus images and accurately classifying its
severity. We develop a network with CNN architecture and data augmentation which can identify the intricate features involved
in the classification task such as micro-aneurysms, exudate and haemorrhages on the retina and consequently provide a diagnosis
automatically and without user input. We train this network using a high-end graphics processor unit (GPU) on the publicly
available Kaggle dataset and demonstrate impressive results, particularly for a high-level classification task. On the data set of
80,000 images used our proposed CNN achieves a sensitivity of 95% and an accuracy of 75% on 5,000 validation images.
© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the Organizing Committee of MIUA 2016.
Keywords: Deep Learning, Convolutional Neural Networks, Diabetic Retinopathy, Image Classification, Diabetes

1. Introduction

Diabetic Retinopathy (DR) is one of the major causes of blindness in the western world [1,2]. Increasing life
expectancy, indulgent lifestyles and other contributing factors mean the number of people with diabetes is projected to
continue rising [3]. Regular screening of diabetic patients for DR has been shown to be a cost-effective and important
aspect of their care [4]. The accuracy and timing of this care is of significant importance to both the cost and
effectiveness of treatment. If detected early enough, effective treatment of DR is available, making this a vital process [5].

Classification of DR involves the weighting of numerous features and the location of such features [6]. This is
highly time consuming for clinicians. Computers are able to obtain much quicker classifications once trained,
giving the ability to aid clinicians in real-time classification. The efficacy of automated grading for DR has been an active area


* Harry Pratt. Tel.: +447428611330. E-mail address: sghpratt@liverpool.ac.uk

doi:10.1016/j.procs.2016.07.014

of research in computer imaging with encouraging conclusions [7,8]. Significant work has been done on detecting the
features of DR using automated methods such as support vector machines and k-NN classifiers [9]. The majority of
these classification techniques are on two class classification for DR or no DR.

Convolutional Neural Networks (CNNs), a branch of deep learning, have an impressive record for applications in
image analysis and interpretation, including medical imaging. Network architectures designed to work with image
data were routinely built already in the 1970s [10] with useful applications and surpassed other approaches to challenging
tasks like handwritten character recognition [11]. However, it wasn't until several breakthroughs in neural networks
such as the implementation of dropout [12], rectified linear units [13] and the accompanying increase in computing power
through graphical processor units (GPUs) that they became viable for more complex image recognition problems.
Presently, large CNNs are used to successfully tackle highly complex image recognition tasks with many object
classes to an impressive standard. CNNs are used in many current state-of-the-art image classification tasks such as
the annual ImageNet and COCO challenges [14,15].

Two main issues exist within automated grading and particularly CNNs. One is achieving a desirable offset in
sensitivity (patients correctly identified as having DR) and specificity (patients correctly identified as not having DR).
This is significantly harder for national criteria, which is a five class problem into normal, mild DR, moderate DR,
severe DR, and proliferative DR classes. Furthermore, overfitting is a major issue in neural networks. Skewed
datasets cause the network to over-fit to the class most prominent in the dataset. Large datasets are often massively
skewed. In the dataset we used, less than three percent of images came from the 4th and 5th class, meaning changes
had to be made in our network to ensure it could still learn the features of these images.

In this paper, we introduce a deep learning-based CNN method for the problem of classifying DR in fundus
imagery. This is a medical imaging task with increasing diagnostic relevance, discussed earlier, and one that has been
subject to many studies in the past. As far as we are aware, this is the first paper discussing the five class
classification of DR using a CNN approach. Several new methods are introduced to adapt the CNN to our large
dataset. We then analyse the performance and dissect the capabilities of our network.

The remainder of this paper is organised as follows. Section 2 presents an overview of related work, section 3
describes the architecture of the CNN and the training methods used in this work, section 4 presents the results from
our experiments, section 5 concludes the paper with discussion on the results and future work.

2. Related Work

Extensive research has been carried out on methods for a binary classification of DR with encouraging results.
Gardner et al. used neural networks and pixel intensity values to achieve sensitivity and specificity results of 88.4%
and 83.5% respectively for yes or no classification of DR [16]. They used a small dataset of around 200 images and
split each image into patches and then required a clinician to classify the patches for features before SVM
implementation.

Neural networks have also been used in three-class classification of DR. Nayak et al. [17] used features such as the
area of exudates and the area of blood vessels together with texture parameters, which were entered into the neural
network to classify images into normal, non-proliferative retinopathy and proliferative retinopathy. The detection
results were validated by comparing with grading from expert ophthalmologists. They demonstrated a classification
accuracy of 93%, sensitivity of 90% and specificity of 100%. This was carried out on a dataset of 140 images, and
feature extraction was required on all images in both training and testing, which can be time consuming.

The vast majority of research on the five-class classification that has been carried out has used support vector
machines (SVMs). Acharya et al. [18] have created an automated method for identifying the five classes. Features,
which are extracted from the raw data using a higher-order spectra method, are fed in to the SVM classifier and
capture the variation in the shapes and contours in the images. This SVM method reported an average accuracy of
82%, sensitivity of 82% and specificity of 88%. Acharya et al. [19] also created a five-class classification method by
calculating the areas of several features such as haemorrhages, micro-aneurysms, exudate and blood vessels. The
features determined to be the most crucial (blood vessels, micro-aneurysms, exudates, and haemorrhages) were
extracted from the raw images using image processing techniques. These were then fed to the SVM for
classification. A sensitivity of 82%, specificity of 86% and accuracy of 85.9% was achieved using this system. These
methods were performed on relatively small datasets and the drop in sensitivity and specificity was likely due to the
complex nature of the five class problem.

Fig 1: Stages of diabetic retinopathy (DR) with increasing severity. Panels: (a) No DR, (b) Mild DR, (c) Moderate DR, (d) Severe DR, (e) Proliferative DR

Adarsh et al. [20] also used image processing techniques to produce an automated diagnosis for DR through the
detection of retinal blood vessels, exudate, micro-aneurysms and texture features. The area of lesions and texture
features were used to construct the feature vector for the multi-class SVM. This achieved accuracies of 96% and
94.6% on the public 89 and 130 image databases DIARETDB0 and DIARETDB1 respectively.

Each of the previous five class methods required feature extraction from the images before being input to an SVM
classifier and have only been validated on small test sets of approximately 100 images. These methods are less
real-time applicable than a CNN.

3. Method and Structure

The structure of our neural network, shown in Fig 2, was decided after studying the literature for other image
recognition tasks. Increased convolution layers are perceived to allow the network to learn deeper features. For
example, whereas the first layer learns edges, the deepest layer of the network, the last convolutional layer, should
learn the features of classification of DR such as hard exudate. The network starts with convolution blocks with
activation and then batch normalisation after each convolution layer. As the number of feature maps increases we
move to one batch normalisation per block.

Fig 2: Network architecture

Fig 3: Illustration of the preprocessing and augmentation processes. Panels: (a) Original image, (b) Preprocessed image, (c) Augmented image

All maxpooling is performed with kernel size 3x3 and 2x2 strides. After the final convolutional block the network
is flattened to one dimension. To avoid overfitting we use weighted class weights relative to the amount of images in
each class. Likewise, we perform dropout on dense layers, to reduce overfitting, until we reach the dense five node
classification layer which uses a softmax activation function to predict our classification. The leaky rectified linear
unit 13 activation function was used, applied with a value of 0.01, to stop over reliance on certain nodes in the
network. Similarly, in the convolution layers, L2 regularisation was used for weight and biases. The network was
also initialised with Gaussian initialisation to reduce initial training time. The loss function used to optimise was the
widely used categorical cross-entropy function.

3.1. Dataset, Hardware and Software

The dataset used for testing was provided by the Kaggle coding website (https://www.kaggle.com) and contains
over 80,000 images, of approximately 6M pixels per image and scales of retinopathy. Resizing these images and
running our CNN on a high-end GPU, the NVIDIA K40c, meant we were able to train on the whole dataset. The
NVIDIA K40c contains 2880 CUDA cores and comes with the NVIDIA CUDA Deep Neural Network library
(cuDNN) for GPU learning. Through using this package around 15,000 images were uploaded on the GPU memory
at any one time. The deep learning package Keras (http://keras.io/) was used with the Theano
(http://deeplearning.net/software/theano/) machine learning back end. This was chosen due to good documentation
and short calculation time. An image can be classified in 0.04 seconds meaning real-time feedback for the patient is
possible.

3.2. Preprocessing

The dataset contained images from patients of varying ethnicity, age groups and extremely varied levels of
lighting in the fundus photography. This affects the pixel intensity values within the images and creates unnecessary
variation unrelated to classification levels. To counteract this, colour normalisation was implemented on the images
using the OpenCV (http://opencv.org/) package. The result of this can be seen in Fig 3 (b). The images were also
high resolution and therefore of significant memory size. The dataset was resized to 512x512 pixels which retained
the intricate features we wished to identify but reduced the dataset to a memory size the NVIDIA K40c could handle.

3.3. Training

The CNN was initially pre-trained on 10,290 images until it reached a significant level. This was needed to
achieve a relatively quick classification result without wasting substantial training time. After 120 epochs of training
on the initial images the network was then trained on the full 78,000 training images for a further 20 epochs. Neural
networks suffer from severe over-fitting, especially in a dataset such as ours in which the majority of the images in
the dataset are classified in one class, that showing no signs of retinopathy. To solve this issue, we implemented real-time class

weights in the network. For every batch loaded for back-propagation, the class-weights were updated with a ratio
respective to how many images in the training batch were classified as having no signs of DR. This greatly reduced
the risk of over-fitting to a certain class.

The network was trained using stochastic gradient descent with Nesterov momentum. A low learning rate of 0.0001
was used for 5 epochs to stabilise the weights. This was then increased to 0.0003 for the substantial 120 epochs of
training on the initial 10,290 images, taking the accuracy of the model to over 60%; this took circa 350 hours of
training. The network was then trained on the full training set of images with a low learning rate. Within a couple of
large epochs of the full dataset the accuracy of the network had increased to over 70%. The learning rate was then
lowered by a factor of 10 every time training loss and accuracy saturated.

3.4. Augmentation

The original pre-processed images were only used for training the network once. Afterwards, real-time data
augmentation was used throughout training to improve the localisation ability of the network. During every epoch
each image was randomly augmented with: random rotation 0-90 degrees, random yes or no horizontal and vertical
flips and random horizontal and vertical shifts. The result of an image augmentation can be seen in Fig 3 (c).

4. Results

5,000 images from the dataset were saved for validation purposes. Running the validation images on the network
took 188 seconds. For this five class problem we define specificity as the number of patients correctly identified as
not having DR out of the true total amount not having DR, and sensitivity as the number of patients correctly
identified as having DR out of the true total amount with DR. We define accuracy as the amount of patients with a
correct classification. The final trained network achieved 95% specificity, 75% accuracy and 30% sensitivity. The
classifications in the network were defined numerically as: 0 - No DR, 1 - Mild DR, 2 - Moderate DR, 3 - Severe DR,
4 - Proliferative DR.

5. Discussion and Conclusion

Our study has shown that the five-class problem for national screening of DR can be approached using a CNN
method. Our network has shown promising signs of being able to learn the features required to classify the fundus
images, accurately classifying the majority of proliferative cases and cases with no DR. As in other studies using
large datasets, high specificity has come with a trade off of lower sensitivity [8]. Our method produces comparable
results to these previous methods without any feature-specific detection and using a much more general dataset.

Fig 4: Confusion matrix of final classification results

The potential benefit of using our trained CNN is that it can classify thousands of images every minute allowing it
to be used in real-time whenever a new image is acquired. In practice images are sent to clinicians for grading and
not accurately graded when the patient is in for screening. The trained CNN makes a quick diagnosis and instant
response to a patient possible. The network also achieved these results with only one image per eye.

The network has no issue learning to detect an image of a healthy eye. This is likely due to the large number of
healthy eyes within the dataset. In training, the learning required to classify the images at the extreme ends of the
scale was significantly less. The issues came in making the network distinguish between the mild, moderate and severe

cases of DR. The low sensitivity, mainly from the mild and moderate classes, suggests the network struggled to learn
deep enough features to detect some of the more intricate aspects of DR. An associated issue identified, which was
certified by a clinician, was that by national UK standards around over 10% of the images in our dataset are deemed
ungradable. These images were defined a class on the basis of having at least a certain level of DR. This could have
severely hindered our results as the images are misclassified for both training and validation.

In future, we have plans to collect a much cleaner dataset from real UK screening settings. The ongoing
developments in CNNs allow much deeper networks which could learn better the intricate features that this network
struggled to learn. The results from our network are very promising from an orthodox network topology. Unlike in
previous methods, nothing specifically related to the features of our fundus images have been used such as vessels,
exudate etc. This makes the CNN results impressive but in future we have ideas to cater our network towards this
specific task, in order to learn the more subtle classification features. We will also look to compare these networks to
five class SVM methods trained on the same datasets.

To conclude, we have shown that CNNs have the potential to be trained to identify the features of Diabetic
Retinopathy in fundus images. CNNs have the potential to be incredibly useful to DR clinicians in the future as
the networks and the datasets continue improving and they will offer real-time classifications.

Acknowledgment: This study was funded by Fight for Sight in the form of a PhD studentship for HP
(http://www.fightforsight.org.uk). We also thank NVIDIA corporation for their donation of GPU card.

References

1. Kocur, I., Resnikoff, S.. Visual impairment and blindness in europe and their prevention. Brit J Ophthalmol 2002;86(7):716–722.
2. Evans, J., Rooney, C., Ashwood, F., Dattani, N., Wormald, R.. Blindness and partial sight in England and Wales: April 1990-march 1991.
Health Trends 1996;28(1):5–12.
3. Sector, S.P., et al. State of the nation 2012. Diabetes UK 2013;.
4. Sculpher, M., Buxton, M., Ferguson, B., Spiegelhalter, D., Kirby, A.. Screening for diabetic retinopathy: A relative cost-effectiveness
analysis of alternative modalities and strategies. Health Econ 1992;1(1):39–51.
5. Benbassat, J., Polak, B.C.. Reliability of screening methods for diabetic retinopathy. Diabetic Med 2009;26(8):783–790.
6. Grading diabetic retinopathy from stereoscopic color fundus photographs: an extension of the modified Airlie House classification. ETDRS report
number 10. Ophthalmology 1991;98(5):786–806.
7. Philip, S., Fleming, A.D., Goatman, K.A., Fonseca, S., Mcnamee, P., Scotland, G.S., et al. The efficacy of automated disease/no disease
grading for diabetic retinopathy in a systematic screening programme. Brit J Ophthalmol 2007;91(11):1512–1517.
8. Fleming, A.D., Philip, S., Goatman, K.A., Prescott, G.J., Sharp, P.F., Olson, J.A.. The evidence for automated grading in diabetic
retinopathy screening. Current Diabetes Reviews 2011;7:246 – 252.
9. Mookiah, M.R.K., Acharya, U.R., Chua, C.K., Lim, C.M., Ng, E., Laude, A.. Computer-aided diagnosis of diabetic retinopathy: A
review. Comput Biol Med 2013;43(12):2136–2155.
10. Fukushima, K.. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in
position. Biol Cybern 1980;36(4):193–202.
11. Cun, Y.L., Boser, B., Denker, J.S., Howard, R.E., Habbard, W., Jackel, L.D., et al. Advances in neural information processing systems 2.
Citeseer. ISBN 1-55860-100-7; 1990, p. 396–404.
12. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.. Dropout: A simple way to prevent neural networks from
overfitting. J Mach Learn Res 2014;15(1):1929–1958.
13. Nair, V., Hinton, G.E.. Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th International Conference
on Machine Learning (ICML-10). 2010, p. 807–814.
14. Ioffe, S., Szegedy, C.. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015;URL:
arXiv:1502.03167.
15. He, K., Zhang, X., Ren, S., Sun, J.. Deep residual learning for image recognition. arXiv 2015;URL: arXiv:1512.03385.
16. Gardner, G., Keating, D., Williamson, T., Elliott, A.. Automatic detection of diabetic retinopathy using an artificial neural network: a
screening tool. Brit J Ophthalmol 1996;80(11):940–944.
17. Nayak, J., Bhat, P.S., Acharya, R., Lim, C., Kagathi, M.. Automated identification of diabetic retinopathy stages using digital fundus
images. J Med Syst 2008;32(2):107–115.
18. Acharya, R., Chua, C.K., Ng, E., Yu, W., Chee, C.. Application of higher order spectra for the identification of diabetes retinopathy stages.
J Med Syst 2008;32(6):481–488.
19. Acharya, U., Lim, C., Ng, E., Chee, C., Tamura, T.. Computer-based detection of diabetes retinopathy stages using digital fundus images.
P I Mech Eng H 2009;223(5):545–553.
20. Adarsh, P., Jeyakumari, D.. Multiclass svm-based automated diagnosis of diabetic retinopathy. In: Communications and Signal Processing
(ICCSP), 2013 International Conference on. IEEE; 2013, p. 206–210.
PROJECT PAPER
© 2020 IJRAR March 2020, Volume 7, Issue 1 www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138)

Diagnosis of Diabetic Retinopathy Using Pre-trained EfficientNet Model

K.Suresh, R.Bhargavi, R.Geeta Nalini, K.P.Subhash, K.Sainadh
Anil Neerukonda Institute of Technology & Sciences (A)
Abstract: Insufficient labeled training data is one of the challenges in deep learning classification problems, mostly in medicine.
Transfer learning helps to train a deep learning model using a small training dataset; the training data insufficiency problem in deep
convolutional networks can be solved by using transfer learning. Deep convolutional neural networks have been achieving high
performance results on the ImageNet dataset, and one such example is EfficientNet-b5, which achieves state-of-the-art top-5 accuracy
of 96.2% and top-1 accuracy of 83.4%. It achieves more precision and efficiency with fewer parameters compared to other deep CNNs.
We can use it in transfer learning in order to solve problems with limited training data. We obtained the training and testing datasets
from Kaggle, trained this model on the training dataset, and tested it on the testing dataset. It can also be used in other deep
learning based image classification problems which face the challenge of labeled training data insufficiency.

Keywords: Deep Convolution Networks, Transfer Learning, Diabetic Retinopathy, Insufficient labeled training data.
1. INTRODUCTION

1.1 Diabetic Retinopathy


It was reported that 422 million people were living with diabetes mellitus in 2014 [1]. In 2010, 33 percent of the people
suffering from diabetes were detected with diabetic retinopathy, and among them one third were affected with loss of
vision. It is expected that the number of people detected with DR may triple by 2050, particularly in America. DR is one
of the leading causes of vision impairment and blindness.

Diabetic retinopathy (DR) is caused by diabetes mellitus, which damages the retina; it is a leading cause of blindness.
80 percent of the people above 20 years old with diabetes are affected with diabetic retinopathy. DR often has no initial
warning signs. In hospitals, instead of dilated eye examinations, retinal or fundus photography with some manual
interpretation is used to diagnose the individual, which is more workable than the former.

a. Image for Patient with DR and Healthy Patient.

Wilkinson et al. [5] classified DR into 5 stages based on severity of disease. The first stage is 'no DR', where there is no
damage caused by diabetes mellitus to the retina. Mild DR, the label of the second stage, appears with some microaneurysms,
and moderate DR, labeled as the third stage, is characterized by multiple microaneurysms, blot hemorrhages and cotton wool
spots. The fourth stage, called 'severe' DR, is characterized by intraretinal microvascular abnormalities, venous beading and
cotton wool spots. 'Proliferative DR' is the final stage of DR, in which new retinal blood vessels start getting produced, blood
leaks into the vitreous humor of the eye, and retinal detachment may start to appear. Techniques like fundus photography and
optical coherence tomography are used by ophthalmologists to diagnose DR. In this work, we have used fundus images.


Figure 1. Fundus images: (A) normal (B) mild (C) moderate (D) severe (E) prolific DR

1.2 Transfer Learning


Transfer learning is the reuse of deep learning models that are pre-trained on huge datasets, such as subsets of the ImageNet
database, to fit a previously unseen dataset. Deep Learning (DL) is the part of machine learning that works well when there is
huge training data and large computational power for solving classification and regression problems. The convolutional neural
network (CNN) [10] is one instance of a DL architecture that works well with multi-dimensional data such as images and videos.
DL requires a huge amount of training data to work correctly, and the lack of enough labeled training data is one of the key
challenges of applying DL in the health industry. Transfer learning on deep convolutional neural networks has gained attention
due to this lack of training data, and it also helps to reduce the deep learning model training time.

We used a pre-trained model which was trained on the ImageNet dataset to solve the problem. The pre-trained EfficientNet-b5
model was downloaded and imported using the Keras library. Later we used linear and ELU activation functions to predict the
class of the image to which it belongs.
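A minimal sketch of this transfer-learning setup, assuming the 'efficientnet' Keras package exposes EfficientNetB5 with ImageNet weights (the import path may differ by package version; the project's full training script appears in Section 4.2):

from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dropout, Dense
from efficientnet import EfficientNetB5  # assumed import path

# Backbone pre-trained on ImageNet, without its 1000-class top
base = EfficientNetB5(weights='imagenet', include_top=False,
                      input_shape=(456, 456, 3))

# New head for the 5-grade DR problem, trained as a single-output regression
model = Sequential([
    base,
    GlobalAveragePooling2D(),
    Dropout(0.5),
    Dense(5, activation='elu'),
    Dense(1, activation='linear'),
])
model.compile(loss='mse', optimizer='adam')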

2. RELATED WORK

Diabetic retinopathy classification can be achieved by feature extraction methods. In Acharya et al. [4], a higher order spectra
technique was used to derive features from 300 fundus images, which were provided to a support vector machine (SVM) classifier.
It classifies the images into 5 classes with a sensitivity of 82% and specificity of 88%. DR lesions such as blood vessels, exudates,
and microaneurysms are extracted using various algorithms. An SVM was used to classify images into positive and negative
classes on the DIARETDB1 dataset, using the area and count of microaneurysms as features [6].

Expertise is required to derive the features, and it takes much time as it involves feature selection, identification and extraction.
Feature extraction based methods [6] often outperform CNN models. We can train a model for DR classification in two major
ways: learning from scratch and transfer learning.

A dataset of 128,175 fundus images was classified into two classes using trained convolutional neural networks, where the
first class contains images with severity levels 0 and 1 and the second class contains images with severity levels 2, 3 and 4 [7].
The model achieves a sensitivity of 97.5% and specificity of 93.4% on the EyePACS-1 dataset, and a sensitivity of 96.1% and
specificity of 93.9% on the Messidor-2 dataset; at the operating point chosen for evaluation it achieves a sensitivity of 90.3%
and specificity of 98.1% on EyePACS-1, and a sensitivity of 87% and specificity of 98.5% on Messidor-2.

Using a training dataset of over 70,000 fundus images, Pratt et al. trained a CNN using the stochastic gradient descent
algorithm to classify DR into 5 classes, and it achieved 95% specificity, 75% accuracy and 30% sensitivity. A DL model was
trained from scratch on the MESSIDOR-2 dataset for the automatic detection of DR in [5], and a 96.8% sensitivity and 87%
specificity were scored.

Mohammadian et al. [8] fine-tuned the Inception-V3 and Xception pre-trained models to divide the Kaggle dataset into two
classes, avoiding the time-consuming problems in deep learning. Data augmentation was used to balance the dataset, reaching
an accuracy of 87.12% on Inception-V3 and 74.49% on the Xception model. Wan et al. [10] implemented transfer learning and
hyperparameter tuning on the AlexNet, VggNet-s, VggNet-19, VggNet-16 and GoogleNet pre-trained models on the Kaggle
dataset; VggNet-s with hyper
parameter tuning achieved the highest accuracy of 95.2%. Mansour [11] used transfer learning to train a deep CNN for feature
extraction when building a computer-aided diagnosis model for DR.

Dutta et al. [12] trained a shallow feed-forward neural network, a deep neural network and a VggNet-16 model using 2000
images from the Kaggle dataset. On a test dataset of 300 images, the deep neural network scored an accuracy of 86.3%, the
shallow neural network 42%, and VggNet-16 78.3%.

To solve the problem of medical training data insufficiency for DL, our model was trained on a subsample of 3500 fundus
images and tested on 2500 previously unseen fundus images.

3. METHODOLOGY

Figure 2. DR Detection System Diagram: Data Collection and Preparation → Image Preprocessing → Modeling → Model Testing → Conclusion

3.1 Data Collection and Preparation

The Kaggle APTOS blindness detection dataset contains color fundus images of the person's eye that are labeled by clinicians
as 0, 1, 2, 3, 4 for normal, mild, moderate, severe and proliferative diabetic retinopathy. Image sizes range from 2500×2000 to
4000×3000. The dataset contains training and testing zip files of images along with corresponding CSV files consisting of the
image id and the labeled stage of diabetic retinopathy. We trained the model on the training images and applied the model on
the testing images.

We collected all the fundus images from the APTOS dataset. In this dataset the fundus images are labeled as 0, 1, 2, 3 and 4
for Normal, Mild DR, Moderate DR, Severe DR, and Proliferative DR respectively. This dataset provides 4657 fundus images
in total. Among these, 3662 (stored in Train.csv with image ID and diagnosis label) were used for model training and the
remaining 995 (stored in Test.csv with image ID and diagnosis label) were used for model testing.

3.2 Image Preprocessing

In preprocessing, we used the Ben Graham preprocessing method to deal with images captured under different lighting
conditions; some images are very dark and difficult to visualize.

First, let us have a glance at the original inputs, where each row depicts one severity level. We can see two problems which
make the severity difficult to spot. First, some images are very dark [pic (0,2) and pic (4,4)] and sometimes different color
illumination is confusing [pic (3,3)]. Second, we get uninformative dark areas in some pictures [pic (0,1), pic (0,3)]. This
matters when we reduce the picture size, as informative areas become too small. So, it is intuitive to crop the uninformative
areas out in the second case.


Figure 3.2.1. Original input images

To avoid this color distraction, we convert the original image (OpenCV's BGR format) to RGB format so that we can crop out the uninformative areas. After cropping, the images are converted into gray scale so they can be resized and the disease stage detected.
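A sketch of this cropping step, in the spirit of the crop-from-gray routines common in Kaggle APTOS kernels; the tolerance value tol is an assumption and may need tuning per dataset.

```python
import cv2
import numpy as np

def load_and_crop(path, tol=7):
    """Read a fundus image, convert OpenCV's BGR output to RGB, and crop
    away near-black borders that carry no retinal information.
    tol is an assumed darkness threshold for background pixels."""
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    ys, xs = np.where(gray > tol)          # coordinates of informative pixels
    if len(ys) == 0:                       # entirely dark image; leave as-is
        return img
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```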

Figure 3.2.2. The images after cropping

After preprocessing we have managed to enhance the distinctive features in the images, which improves performance when we train our EfficientNet model. Jupyter Notebook was used for preprocessing.


The last step is to apply Gaussian filters [15] to reduce noise by smoothing the boundaries of the images, especially sharp edges, so that the images are ready for further processing.
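A sketch of this enhancement step. The blend weights (4, -4, 128) and the blur radius follow Ben Graham's widely circulated recipe from the earlier Kaggle DR competition; they are assumptions here, not values reported in this work, as is the 456×456 input size for EfficientNet-B5.

```python
import cv2

IMG_SIZE = 456  # assumed input resolution for EfficientNet-B5

def ben_graham_enhance(img, sigma=10):
    """Subtract a Gaussian-blurred copy of the image to normalize local
    lighting and boost contrast, following Ben Graham's recipe."""
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)   # ksize derived from sigma
    # addWeighted computes 4*img - 4*blurred + 128, enhancing local detail
    return cv2.addWeighted(img, 4, blurred, -4, 128)
```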

Figure 3.2.3. Original fundus image (left); cropped and preprocessed fundus image (right)

3.3 Modeling

The architecture of a deep CNN contains two basic parts: a convolution part and a classifier part. The convolution part convolves over the input images with different kernel filters to find characteristics such as horizontal and vertical lines, called features; this process is called feature extraction. The classifier part takes the features produced by the convolution part and uses them to classify the images.

In this work, we used the pre-trained convolution part of EfficientNet-b5 to extract features of fundus images, building on the EfficientNet base architecture. Generally, in convolutional neural networks, to achieve more accuracy and efficiency we scale network depth, network width (the number of channels in each layer) and image resolution arbitrarily, but this takes more time and more computational resources. The EfficientNet architecture instead uses a compound coefficient which uniformly scales network depth, width and image resolution at the same time.
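A minimal sketch of this transfer learning setup using tf.keras (available as tf.keras.applications.EfficientNetB5 in TensorFlow 2.3+; the original work may have used a standalone EfficientNet package instead). The single linear output unit reflects the regression formulation described below; the exact head configuration is an assumption.

```python
import tensorflow as tf

# Pre-trained convolution part of EfficientNet-B5 (ImageNet weights),
# used as a feature extractor; the original classifier top is dropped.
base = tf.keras.applications.EfficientNetB5(
    include_top=False, weights="imagenet",
    input_shape=(456, 456, 3), pooling="avg")

# Single linear output: the DR grade is predicted as a continuous value
# and later rounded to 0-4 using the optimized thresholds.
output = tf.keras.layers.Dense(1, activation="linear")(base.output)
model = tf.keras.Model(inputs=base.input, outputs=output)

# Optimizing quadratic weighted kappa is treated as minimizing MSE.
model.compile(optimizer="adam", loss="mse")
```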

From the above diagram we can see that the compound scaling used in EfficientNet-b5 uniformly scales network width, network depth and resolution at the same time. In compound scaling, a compound coefficient is used to uniformly scale all dimensions; it is obtained by applying a grid search to find the relationship between the different scaling dimensions. The baseline network also plays an important role in the EfficientNet architecture. To find the best baseline network, neural architecture search [17] was used with the AutoML MNAS framework [16]; the resulting architecture is similar to MobileNetV2 [18], and the EfficientNets are then obtained by scaling up this baseline.
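For reference, the compound scaling rule from the EfficientNet paper fixes a compound coefficient φ and scales

depth: d = α^φ,  width: w = β^φ,  resolution: r = γ^φ,

subject to α · β² · γ² ≈ 2 with α ≥ 1, β ≥ 1, γ ≥ 1; for the B0 baseline the grid search gave α = 1.2, β = 1.1 and γ = 1.15.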

The above diagram shows the architecture of the EfficientNet-b0 model; scaling up its dimensions yields EfficientNet-b1 through EfficientNet-b7. EfficientNet models often give better performance and accuracy than other convolutional neural networks while using fewer parameters than other transfer learning architectures. Since the family achieves state-of-the-art accuracy on most benchmark datasets, such as 92% on CIFAR-100 and 98% on the Flowers dataset, it is also well suited to transfer learning. Due to its few trainable parameters and high performance, we use it for transfer learning in this problem.

We use the quadratic weighted kappa as the metric for this problem and treat it as a regression problem: optimizing the quadratic weighted kappa is effectively the same as optimizing the mean squared error. The surrogate we minimize is

MSE = (1/N) Σ_{i=1}^{N} (Y_i − P_i)²

where N is the number of data points, Y_i is the observed value and P_i is the predicted value.
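As a concrete illustration, both quantities can be computed directly; cohen_kappa_score with quadratic weights is scikit-learn's implementation of this metric, and the toy arrays below are made up for demonstration.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

y_true = np.array([0, 2, 4, 1, 3])   # observed DR grades (toy example)
y_pred = np.array([0, 2, 3, 1, 3])   # predicted grades after rounding

# Quadratic weighted kappa: the evaluation metric for this problem.
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")

# The MSE surrogate actually optimized during training.
mse = np.mean((y_true - y_pred) ** 2)
```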

It imposes a higher penalty on predictions that deviate strongly from the observed values, which helps the model learn more accurately. Since the training images are few, we use data augmentation to add more images, using the OpenCV library in Python. We then split off 15% of the training data for validation and use the remainder for training the model. During training, to find the optimum parameters of the model we used the RAdam optimizer, which offers better convergence than other optimization algorithms. Also, to normalize the features of images during training we used Group Normalization [14], since it is independent of batch size; we kept a batch size of 4, at which batch normalization becomes unstable. Group normalization can be implemented through the Keras API in Python, as the sketch below illustrates.
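Both pieces are available in TensorFlow Addons; a minimal sketch, assuming tensorflow_addons is installed (the text only states that RAdam and Group Normalization were used, not which library, and groups=8 is an assumed setting):

```python
import tensorflow_addons as tfa

# RAdam (rectified Adam), reported to converge more stably than plain Adam;
# the learning rate matches the 0.0005 given in the results table.
optimizer = tfa.optimizers.RectifiedAdam(learning_rate=5e-4)

# Group Normalization is independent of batch size, so it stays stable
# at the small batch size of 4 used here.
gn = tfa.layers.GroupNormalization(groups=8, axis=-1)
```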

To detect the severity level of a patient's diabetic retinopathy, we set a threshold value for each stage of the disease during training of the model; the model's accuracy is then computed from these threshold values, and it can be improved by optimizing the thresholds. We initially set the coefficient values to (0.5, 1.5, 2.5, 3.5) and then optimize the quadratic weighted kappa score with respect to these coefficients using the Nelder-Mead method, as sketched below.
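A sketch of this threshold optimization, in the style of the "OptimizedRounder" common in Kaggle kernels. SciPy's Nelder-Mead minimizer is applied to the negative kappa since scipy minimizes objectives; val_preds and val_labels are assumed to hold the model's continuous validation predictions and the true grades.

```python
import numpy as np
import scipy.optimize as opt
from sklearn.metrics import cohen_kappa_score

def apply_thresholds(preds, coef):
    """Round continuous predictions to grades 0-4 using the thresholds."""
    return np.digitize(preds, coef)

def neg_qwk(coef, preds, y_true):
    # Negative kappa, because scipy minimizes the objective.
    return -cohen_kappa_score(y_true, apply_thresholds(preds, coef),
                              weights="quadratic")

initial = [0.5, 1.5, 2.5, 3.5]   # initial threshold values from the text
result = opt.minimize(neg_qwk, initial, args=(val_preds, val_labels),
                      method="Nelder-Mead")
best_coef = result.x             # optimized stage thresholds
```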

In the classifier part of the convolutional neural network we use a linear activation function, because we output a single value which is later rounded to an integer based on the threshold values; that integer is used to calculate the quadratic weighted kappa score serving as the metric for this problem. We trained for 32 epochs (an epoch being one pass of the entire training data through the model). After each epoch we save the model if it achieves a higher quadratic weighted kappa score, and if the score does not improve for 4 epochs we stop training and keep the best model found so far, as the callback sketch below shows.
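Keras callbacks can implement this save-best/early-stop behaviour directly; a sketch, assuming a custom callback logs the validation kappa under the name "val_kappa" each epoch (the metric name and the train_data/val_data objects are assumptions):

```python
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# Save weights whenever the monitored score improves, and stop if it has
# not improved for 4 consecutive epochs, restoring the best model.
callbacks = [
    ModelCheckpoint("best_model.h5", monitor="val_kappa",
                    mode="max", save_best_only=True),
    EarlyStopping(monitor="val_kappa", mode="max", patience=4,
                  restore_best_weights=True),
]

model.fit(train_data, epochs=32, validation_data=val_data,
          callbacks=callbacks)
```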

3.4 Model Testing

We tested our model on a previously unseen dataset of 547 fundus images, and our classifier reached an accuracy of 83%. Kaggle's cloud, which provides 30 hours of GPU time per week for education and research purposes, was used for training and testing the model. The details and results of our model are given below.

Size of training data: 3167

Training parameters:
Optimizer: RAdam
Data augmentation used: Yes
Learning rate: 0.0005

Results:
Accuracy: 83%
Loss: 3.94%

4. CONCLUSION

In this project, we implemented transfer learning to train a CNN model on a small training dataset. Training data is usually limited in healthcare, and transfer learning comes to the aid in such situations while still performing well on unseen data. Our model reached an accuracy higher than other transfer learning techniques that classify DR into 5 stages. It achieved this superior performance on account of the selected training algorithm, batch gradient descent with an ascending learning rate, and the quadratic weighted kappa loss function. Deep learning techniques that can learn from small datasets to categorize medical images should be utilized to classify DR, as they transfer to other medical image classification problems facing the challenge of insufficient training data. Experiments should be done to compare the performance of other pre-trained deep convolutional networks in DR classification using small training data.

5. REFERENCES

[1] World Health Organization, 2016. Global report on diabetes. Available at: http://www.who.int


[2] Wilkinson, C.P., Ferris III, F.L., Klein, R.E., Lee, P.P., Agardh, C.D., Davis, M., Dills, D., Kampik, A., Pararajasegaram,
R., Verdaguer, J.T. and Group, G.D.R.P., 2003. Proposed international clinical diabetic retinopathy and diabetic macular
edema disease severity scales. Ophthalmology, 110(9),pp.1677-1682.
[3] LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.E. and Jackel, L.D., 1990. Handwritten
digit recognition with a back-propagation network. In Advances in neural information processing systems (pp.396-404).
[4] Acharya, R., Chua, C.K., Ng, E.Y.K., Yu, W. and Chee, C., 2008. Application of higher order spectra for the identification
of diabetes retinopathy stages. Journal of Medical Systems, 32(6),pp.481-488.
[5] Faust, O., Acharya, R., Ng, E.Y.K., Ng, K.H. and Suri, J.S., 2012. Algorithms for the automated detection of diabetic
retinopathy using digital fundus images: a review. Journal of medical systems, 36(1),pp.145-157.
[6] Kumar, S. and Kumar, B., 2018, February. Diabetic Retinopathy Detection by Extracting Area and Number of
Microaneurysm from Colour Fundus Image. In 2018 5th International Conference on Signal Processing and Integrated
Networks (SPIN) (pp. 359-364). IEEE.
[7] Asiri, N., Hussain, M. and Abualsamh, H.A., 2018. Deep Learning based Computer-Aided Diagnosis Systems for Diabetic
Retinopathy: A Survey. arXiv preprintarXiv:1811.01238.
[8] Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T.,
Cuadros, J. and Kim, R., 2016. Development and validation of a deep learning algorithm for detection of diabetic retinopathy
in retinal fundus photographs. Jama, 316(22),pp.2402-2410.
[9] Mohammadian, S., Karsaz, A. and Roshan, Y.M., 2017, November. Comparative Study of Fine-Tuning of Pre-Trained
Convolutional Neural Networks for Diabetic Retinopathy Screening. In 2017 24th National and 2nd International Iranian
Conference on Biomedical Engineering (ICBME) (pp. 1-6).IEEE.
[10] Wan, S., Liang, Y. and Zhang, Y., 2018. Deep convolutional neural networks for diabetic retinopathy detection by
image classification. Computers & Electrical Engineering, 72,pp.274-282.
[11] Mansour, R.F., 2018. Deep-learning-based automatic computer-aided diagnosis system for diabetic retinopathy. Biomedical
engineering letters, 8(1), pp.41-57.
[12] Dutta, S., Manideep, B.C., Basha, S.M., Caytiles, R.D. and Iyengar, N.C.S., 2018. Classification of Diabetic Retinopathy
Images by Using Deep Learning Models. International Journal of Grid and Distributed Computing, 11(1), pp.89-106.
[13] Gao, Z., Li, J., Guo, J., Chen, Y., Yi, Z. and Zhong, J., 2018. Diagnosis of Diabetic Retinopathy Using Deep Neural Networks.
IEEE Access, 7,pp.3360-3370.
[14] Wu, Y. and He, K., 2018. Group Normalization. arXiv preprint arXiv:1803.08494.
[15] Deng, G. and Cahill, L.W., 1993. An adaptive Gaussian filter for noise reduction and edge detection. In 1993 IEEE Conference Record Nuclear Science Symposium and Medical Imaging Conference.
[16] He, X., Zhao, K. and Chu, X., 2019. AutoML: A Survey of the State-of-the-Art. arXiv preprint arXiv:1908.00709.
[17] Elsken, T., Metzen, J.H. and Hutter, F., 2018. Neural Architecture Search: A Survey. arXiv preprint arXiv:1808.05377.
[18] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. and Chen, L.C., 2018. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv preprint arXiv:1801.04381.

