
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 18, NO. 2, APRIL 2021

Diabetic Retinopathy Diagnosis Using Multichannel Generative Adversarial Network With Semisupervision

Shuqiang Wang, Xiangyu Wang, Yong Hu, Yanyan Shen, Zhile Yang, Min Gan, and Baiying Lei

Abstract— Diabetic retinopathy (DR) is one of the major causes of blindness. It is of great significance to apply deep-learning techniques for DR recognition. However, deep-learning algorithms often depend on large amounts of labeled data, which are expensive and time-consuming to obtain in the medical imaging area. In addition, the DR features are inconspicuous and spread out over high-resolution fundus images, so it is a big challenge to learn the distribution of such DR features. This article proposes a multichannel-based generative adversarial network (MGAN) with semisupervision to grade DR. The multichannel generative model is developed to generate a series of subfundus images corresponding to the scattered DR features. By minimizing the dependence on labeled data, the proposed semisupervised MGAN can identify the inconspicuous lesion features from high-resolution fundus images without compression. Experimental results on the public Messidor data set show that the proposed model can grade DR effectively.

Note to Practitioners—This article is motivated by the challenging problems of the inadequacy of labeled data in medical image analysis and the dispersion of effective features over high-resolution medical images. As for the inadequacy of labeled data in medical image analysis, the reasons mainly include the following: 1) the high-quality annotation of medical imaging samples depends heavily on scarce medical expertise, which is very expensive, and 2) compared with natural images, it is more difficult to collect medical images because of privacy issues. It is of great significance to apply deep-learning techniques for diabetic retinopathy (DR) recognition. In this article, a multichannel generative adversarial network (GAN) with semisupervision is developed for DR-aided diagnosis. The proposed model can deal with the DR classification problem under inadequate labeled data in the following ways: 1) the multichannel generative scheme is proposed to generate a series of subfundus images corresponding to the scattered DR features and 2) the proposed multichannel-based GAN (MGAN) model with semisupervision can make full use of both labeled data and unlabeled data. The experimental results demonstrate that the proposed model outperforms the other representative models in terms of accuracy, area under the ROC curve (AUC), sensitivity, and specificity.

Index Terms— Computer-aided diagnosis (CAD), diabetic retinopathy (DR), generative adversarial network (GAN), multichannel, semisupervised learning.
Manuscript received August 23, 2019; revised December 3, 2019 and January 31, 2020; accepted March 4, 2020. Date of publication April 9, 2020; date of current version April 7, 2021. This article was recommended for publication by Lead Guest Editor A. Si and Editor M. Zhang upon evaluation of the reviewers' comments. This work was supported by the National Natural Science Foundation of China under Grant 61872351 and Grant 61771465, by the International Science and Technology Cooperation Projects of Guangdong under Grant 2019A050510030, by the Strategic Priority CAS Project under Grant XDB38000000, by the Major Projects from General Logistics Department of People's Liberation Army under Grant AWS13C008, and by the Shenzhen Key Basic Research Projects under Grant JCYJ20180507182506416. (Shuqiang Wang and Xiangyu Wang contributed equally to this work.) (Corresponding authors: Min Gan; Baiying Lei.)

Shuqiang Wang is with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China, and also with the Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen 518060, China.
Xiangyu Wang is with the College of Data Science, University of Science and Technology of China, Hefei 230026, China.
Yong Hu is with the Department of Orthopedics and Traumatology, The University of Hong Kong, Hong Kong.
Yanyan Shen and Zhile Yang are with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
Min Gan is with the College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China (e-mail: aganmin@aliyun.com).
Baiying Lei is with the School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China (e-mail: leiby@szu.edu.cn).
This article has supplementary downloadable material available at https://ieeexplore.ieee.org, provided by the authors.
Digital Object Identifier 10.1109/TASE.2020.2981637

I. INTRODUCTION

DIABETIC retinopathy (DR) is one of the complications caused by diabetes [1]. The high blood sugar level damages the blood vessels of the light-sensitive tissue at the retina. It is one of the main causes of blindness. The World Health Organization predicts that global diabetics will reach 4.4% of the population in 2030 and that about half of them will have DR complications [2]. Early diagnosis through regular screening is important for preventing DR. However, it is time-consuming for ophthalmologists to diagnose efficiently. In order to reduce the cost of regular screening, the technology for capturing color fundus images is often adopted. This approach offers the possibility of using computer-aided diagnosis (CAD) [3], [4] technology, which has been widely studied in artificial intelligence for healthcare applications [5]–[9].

For traditional machine learning, most of the efficient features need to be identified by an expert manually. Besides, the performance of traditional machine-learning methods often depends on how accurately the features are extracted [10]–[13]. In recent years, deep-learning technology has been widely applied in the field of medical image analysis [6], [14]. It can learn high-level features from images automatically.

However, deep-learning models usually depend on a large amount of labeled data, which is a big challenge to obtain in the medical imaging area. For the labeling process of DR images, the grading of DR requires the clinician to extract the lesions and measure the area of the lesions manually, which is highly time-consuming. Due to the lack of high-quality labeled data in real applications, it is difficult to apply general deep-learning methods (such as GoogLeNet and ResNet) for DR diagnosis. On the other hand, hospitals produce a large amount of unlabeled data containing important potential information from which a machine-learning model can benefit. Therefore, it is feasible to employ semisupervised learning for the DR recognition task. As for semisupervised learning, the generative adversarial network (GAN) [15] not only works well in a wide variety of applications for image generation but also achieves excellent performance in semisupervised classification [16].

High-resolution fundus images are usually employed to diagnose DR in clinical treatment. However, there are several challenges in employing high-resolution fundus images and the general GAN model to recognize DR. First, the proportion of effective DR features is quite low in high-resolution fundus images: feeding such images into a classifier directly will introduce much redundant information, while reducing the size of the fundus image beforehand will lose the information of tiny lesions. Second, due to the tiny scale of DR features, it is difficult for the general GAN model to generate high-resolution images with detailed semantic information. Third, the locations of DR lesions are often spread out, and exudates may appear in various locations depending on the patient. Moreover, the lesion area might be surrounded by noise, such as blood vessels and imaging shadows. Thus, it is unappealing to extract the lesions by detecting regions of interest (ROIs). Based on the above analysis, this article proposes a multichannel-based semisupervised GAN (SSGAN) for grading DR. The proposed model can make full use of labeled data and unlabeled data to recognize DR automatically without losing the original DR features.

The main contributions of this article are summarized as follows.
1) A multichannel-based GAN (MGAN) model is proposed, which can generate a series of subfundus images including effective local features. All the subfundus images are then combined to obtain the most representative features of the entire fundus image. In this way, the proposed model can deal with the challenge that the effective DR features (e.g., exudates, microaneurysms, and bleeding points) are diffuse in high-resolution fundus images.
2) The feature extraction scheme is incorporated into the proposed MGAN framework. This scheme can reduce the noise from the original fundus images and extract the scattered lesion features, which improves the performance of the discriminator.
3) The proposed MGAN can employ both labeled data and unlabeled data. To the best of our knowledge, it is the first time that an SSGAN has been employed for grading DR.

The rest of this article is organized as follows. In Section II, related works on DR assessment are reviewed. In Section III, the proposed model is described in detail. In Section IV, the model is evaluated, and the factors influencing model performance are analyzed. In Section V, the experimental results are compared with those of existing methods and discussed. Conclusions are summarized in Section VI.

II. RELATED WORKS

In recent years, CAD technology using machine learning has been applied to diagnose various diseases, and extensive research has been carried out on assisted DR diagnosis [18]. For example, Seoud et al. [19] proposed an automatic DR grading system. In their work, a red lesion detection is adopted to generate a DR lesion probability map, which is represented by 35 features, including location, size, and probability information. Pratt et al. [20] used a convolutional neural network (CNN) model and color fundus images for DR classification. They employed data augmentation to expand the training set and used 80 000 samples to train the CNN model. Gulshan et al. [21] employed an Inception-v3 architecture-based model to automate the detection of DR and diabetic macular edema. Haloi [22] proposed a microaneurysm detection system for DR detection that uses a deep neural network (DNN) to identify microaneurysms without preprocessing steps; the model was evaluated on the Retinopathy Online Challenge (ROC) and Diaretdb1v2 databases.

Costa et al. [23] developed a weakly supervised DR detection system. They used multiple instance learning (MIL) algorithms for the joint optimization of the instance encoding and the classification. Shan and Li [24] used the stacked sparse autoencoder (SSAE) and fundus images for microaneurysm classification. Gargeya and Leng [25] developed a data-driven method using the deep residual learning mechanism to learn discriminative features for DR detection. Antal and Hajdu [26] developed an ensemble-based algorithm for DR screening. Costa and Campilho [27] presented a bag-of-visual-words (BoVW)-based model for DR detection, and the model was tested on the Messidor and DR2 data sets. Vo and Verma [28] proposed a DR detection model by combining kernels with multiple losses network (CKML Net) and VGGNet with Extra Kernel (VNXK); the experimental results on the EyePACS and Messidor data sets showed the efficiency of the model. Two pretrained [29] CNN models were employed to identify the grade of DR on fluorescein angiography photographs. To detect microaneurysms from fundus images, Dai et al. [30] proposed a multisieving CNN framework integrated with an image-to-text mapping scheme guided by clinical reports. Cao et al. [31] developed a model integrating random forest (RF), neural network (NN), and support vector machine (SVM) to detect microaneurysms; moreover, principal component analysis (PCA) was employed to reduce the dimensionality of DR image patches. Based on a top-performing supervised CNN, Gondal et al. [32] presented a weakly supervised object localization method to detect DR lesions at the image level.

By introducing a heatmap optimization procedure, Quellec et al. [33] proposed a ConvNets framework to detect lesions in the context of DR screening. Zhou et al. [34] proposed an MIL-based model for DR detection. Zeng et al. [35] trained a binocular Siamese-like CNN with a transfer learning technique to classify color retinal fundus photographs into two grades. By employing transfer learning, Li et al. [36] presented a CNN-based model for DR fundus image classification. Similarly, Khandelwal and Mishra [37], Raju et al. [38], Xu et al. [39], and Ghosh et al. [40] also adopted CNN-based approaches for automatic recognition of DR. Furthermore, Qummar et al. [41] presented an ensemble of five CNN models to encode the DR features and improve the classification performance. Brown et al. [42] employed a CNN to diagnose plus disease in retinopathy of prematurity (ROP) from retinal photographs. Li et al. [43] developed a novel deep network, OCTD_Net, for early-stage DR detection; it consists of two independent networks, one extracting features from the original optical coherence tomography (OCT) images and the other extracting retinal layer information. Poplin et al. [44] developed a DNN-based model to predict cardiovascular risk factors from retinal fundus photographs. Wang et al. [17] proposed a novel zoom-in-net for DR recognition by employing both the whole retinal image and its suspicious lesion patches generated by attention maps. De Fauw et al. [45] first applied a U-Net-based segmentation network on 3-D OCT scans to generate the tissue segmentation map and then predicted the retinal disease using a classification network. Based on the pixel-wise score propagation model, de la Torre et al. [46] presented a DNN-based interpretable model for DR assessment, whose generated visual maps can be interpreted.

Most of the abovementioned works are based on supervised DNNs whose performance often depends on a large amount of labeled samples. However, plenty of labeled samples are often not available for image processing in the medical field. Some of the abovementioned works resorted to transfer learning to deal with this issue, but transfer learning does not always work in medical image processing. This article proposes a multichannel-based SSGAN for grading DR. The proposed model can make full use of labeled data and unlabeled data to recognize DR automatically without losing the original DR features.

III. MATERIALS AND METHODS

A. Data Set

The proposed model is evaluated on the publicly available Messidor data set (http://www.adcis.net/en/third-party/messidor/) [47], which contains approximately 1200 digital fundus images. Each image was obtained using a Topcon TRC NW6 nonmydriatic camera with a 45° field of view centered on the fovea. The ratio of images with pupil dilation to images without dilation is 2:1. The size of the fundus images is 1440 × 960, 2240 × 1488, or 2304 × 1536. The label for each image is provided by ophthalmologists. According to the number of microaneurysms, hemorrhages, and the existence of neovascularization, each image is classified as one of four lesion grades (R0, R1, R2, and R3), as described in Table I. R0 represents a normal fundus image without lesions, R1 and R2 represent mild and severe nonproliferative retinal images, respectively, and R3 represents a proliferative retinopathy image. In the Messidor data set, R0, R1, R2, and R3 account for 45.5%, 12.75%, 20.58%, and 21.67% of the total data set, respectively. In addition, the severity of the lesion is classified according to class labels, as shown in Table II.

TABLE I
RETINOPATHY GRADE IN MESSIDOR DATA SET

TABLE II
CLASSIFICATION TASK DESCRIPTION

B. DR Feature Extraction

The framework of the proposed DR grading model is shown in Fig. 1. The proposed model consists of four parts.
1) The DR feature extractor is designed to extract dispersive lesion features.
2) The multichannel generative model is designed to generate a series of subfundus images including effective local features.
3) The discriminator is designed to predict C + 1 classes by using convolutional layers accounting for multiscale DR feature matching.
4) In order to improve the robustness and accuracy of the proposed model, an ensemble-based framework is designed by integrating various trained discriminators.
Each part will be detailed as follows.

Fig. 1. Framework of the DR grading model.
Since the fundus image has a high resolution with millions of pixels, feeding such high-resolution fundus images into a GAN directly will cause the number of parameters to increase drastically. Moreover, compressing the fundus image will destroy tiny DR lesion features, such as bleeding points with only a few pixels. To solve this problem, a feature extractor is designed to extract the most representative DR features from the original fundus image by transferring a portion of a pretrained network structure.

The feature extractor is shown in Fig. 2. First, the image is normalized to eliminate the effects of different scales and illuminations. Second, the data augmentation strategy is applied to increase the amount of training data. Third, a large-scale natural data set is used to pretrain the model, and the fundus images are used to fine-tune the network parameters. In order to extract the most representative DR features, the densely connected convolutional network (DenseNet) [48] is pretrained to extract DR features from high-resolution fundus images. The main idea of DenseNet is to connect each layer's output to the subsequent layers by creating a short path from preceding layers to subsequent layers. Therefore, compared with other pretrained models, DenseNet can extract and transfer features efficiently with fewer parameters. For the feature extractor, we transfer a portion of the pretrained model to represent fundus image features. The framework includes a convolution layer that learns the basic features of the retinopathy image and several dense blocks containing several composite layers. Each composite layer includes three consecutive operations: batch normalization (BN), rectified linear unit (ReLU) [49], and a convolution operation with a 3 × 3 kernel. To remove redundant information, a 1 × 1 convolution layer is placed before each composite layer; it enhances the expression ability of the feature extractor and decreases the computation cost. Likewise, a 1 × 1 convolution layer and a pooling layer are placed between dense blocks. They preserve the effective features, discard redundant information, and effectively reduce the size of the samples.

Fig. 2. Framework of the SSGAN with multichannel input.
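To make the transfer scheme concrete, the following is a minimal PyTorch sketch of a truncated pretrained DenseNet used as the feature extractor. The use of torchvision's DenseNet-121 and the truncation depth are illustrative assumptions; the authors' exact layer configuration is given in Table VII.

```python
import torch
import torch.nn as nn
from torchvision import models

class DRFeatureExtractor(nn.Module):
    """Truncated pretrained DenseNet used as a DR feature extractor.

    Hypothetical sketch: transfer the stem convolution plus the first
    n_blocks dense blocks (with their 1x1-conv + pooling transitions)
    of an ImageNet-pretrained DenseNet-121.
    """
    def __init__(self, n_blocks: int = 2):
        super().__init__()
        backbone = models.densenet121(pretrained=True).features
        # children() = [conv0, norm0, relu0, pool0,
        #               denseblock1, transition1, denseblock2, transition2, ...]
        keep = 4 + 2 * n_blocks  # stem + n_blocks dense blocks and transitions
        self.features = nn.Sequential(*list(backbone.children())[:keep])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input: normalized fundus image batch (N, 3, H, W); output: feature maps.
        return self.features(x)

extractor = DRFeatureExtractor()
feats = extractor(torch.randn(1, 3, 1440, 1440))  # one 1440x1440 fundus image
```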
C. Multichannel-Based SSGAN

The proposed framework includes a multichannel generative model and a discriminator. Moreover, to improve the robustness and accuracy of the model, an ensemble-based scheme is employed to integrate a series of well-trained discriminators. The proposed framework is shown in Fig. 2. Each part is detailed as follows.

1) Discriminator: For the proposed multichannel-based SSGAN, the input to the discriminator includes the real DR features extracted by the feature extractor and the DR features generated by the multichannel generative model. The discriminator is mainly composed of several convolution layers and a fully connected layer with a softmax activation function, which predicts the class labels of the real samples and the generated samples.

Supposing that the input DR sample is x and the last layer outputs a vector (o_1, o_2, ..., o_C), the probability p_d(y = j | x) that x belongs to class j is given by

$$p_d(y = j \mid x) = \frac{\exp(o_j)}{\sum_{c=1}^{C} \exp(o_c)}. \tag{1}$$

The semisupervised learning strategy is introduced into the discriminator to make full use of the unlabeled samples. The discriminator outputs C + 1 classes [16], where C represents the number of classes for real samples, and the extra class indicates whether the input is a real sample or a fake sample from the generator.

In this article, there are two types of input data for the discriminator: labeled DR data and unlabeled DR data. For the labeled data, there are four categories (R0, R1, R2, and R3). For the unlabeled data, each sample can only be labeled as "Real" or "Fake." In order to make full use of the label information, different loss functions are defined for the different label types. The labeled samples only need to be predicted as a certain class since they are all real. The corresponding loss function is defined as

$$L_{D_{r\text{-}labeled}} = -\mathbb{E}_{x,y \sim p_{data}(x,y)} \log p_d(y \mid x,\, y < C+1) = -o_y + \log\Bigg[\sum_{j=1}^{C} \exp(o_j)\Bigg] \tag{2}$$

where o_y is the output corresponding to the ground-truth category y. The unlabeled samples can only be classified as real or generated, without a certain class label. The (C + 1)-th class is employed to indicate whether a sample is generated by the generator. For the generated DR samples, the loss function is defined as

$$L_{D_{generated}} = -\mathbb{E}_{x \sim Gen(z)} \log p_d(y = C+1 \mid x) = \log\Bigg[1 + \sum_{j=1}^{C} \exp(o_j)\Bigg]. \tag{3}$$

In the same manner, the corresponding loss function for the real unlabeled DR samples is defined as

$$L_{D_{r\text{-}unlabeled}} = -\mathbb{E}_{x \sim p_{data}(x)} \log[1 - p_d(y = C+1 \mid x)] = -\log\Bigg[\sum_{j=1}^{C} \exp(o_j)\Bigg] + \log\Bigg[1 + \sum_{j=1}^{C} \exp(o_j)\Bigg]. \tag{4}$$

The total loss function for the discriminator is the sum of the abovementioned three loss functions:

$$Loss(Dis) = L_{D_{r\text{-}labeled}} + L_{D_{generated}} + L_{D_{r\text{-}unlabeled}}. \tag{5}$$
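The three terms in (2)–(5) reduce to log-sum-exp expressions over the C real-class logits. Below is a minimal PyTorch sketch, assuming (as in the (C + 1)-class formulation of [16] and [54]) that the fake-class logit is fixed at zero, so that p_d(y = C+1 | x) = 1/(1 + Z(x)) with Z(x) = Σ_j exp(o_j):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(o_labeled, y, o_unlabeled, o_generated):
    """Semisupervised (C+1)-class discriminator loss, Eqs. (2)-(5).

    o_*: (N, C) real-class logits; the implicit fake logit is 0.
    """
    # (2): cross-entropy over the C real classes = -o_y + logsumexp(o).
    l_labeled = F.cross_entropy(o_labeled, y)

    # log Z(x) for unlabeled and generated batches.
    lse_unl = torch.logsumexp(o_unlabeled, dim=1)
    lse_gen = torch.logsumexp(o_generated, dim=1)

    # (4): real unlabeled samples should be judged real:
    #      -log[1 - p(fake|x)] = -log Z + log(1 + Z); softplus(t) = log(1+e^t).
    l_unlabeled = (-lse_unl + F.softplus(lse_unl)).mean()

    # (3): generated samples should be judged fake: -log p(fake|x) = log(1 + Z).
    l_generated = F.softplus(lse_gen).mean()

    # (5): total discriminator loss.
    return l_labeled + l_unlabeled + l_generated
```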
defined as
2) Multichannel-Based Generative Model: For the general GAN, it is difficult to generate a high-resolution fundus image directly since the number of parameters for the generator would increase dramatically. In this article, a multichannel-based generative model is proposed to learn the distribution of DR features. The multichannel-based generative model contains multiple parallel generators that directly generate a series of subfundus images including effective DR features. It can be assumed that one entire DR sample includes M features and the generative model includes n generators, so each generator accounts for N = M/n features. Let Gen_i denote the i-th generator in the set Gen = {Gen_1, Gen_2, ..., Gen_n}. The objective function for the proposed multichannel-based SSGAN is defined as

$$\min_{\{Gen_1,\ldots,Gen_n\}}\ \max_{Dis}\ V(Dis, \{Gen_1, \ldots, Gen_n\}) = \mathbb{E}_{x \sim P_{data}(x)}[\log(Dis(x))] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - Dis(Gen(z)))] + \big\|\mathbb{E}_{\phi \sim P_r}\{f(\phi_1), f(\phi_2), \ldots, f(\phi_n)\} - \mathbb{E}_{z \sim P_z(z)}\{f(Gen_1), f(Gen_2), \ldots, f(Gen_n)\}\big\|_2^2 \tag{6}$$

where f(φ_i) represents the real DR features corresponding to the i-th generator. For the multichannel generative model, each generator contains several deconvolution layers. By choosing a noise variable p_z(z) following a Gaussian distribution as the input for Gen_i, the output of each layer of Gen_i can be expressed as

$$b^i_{sx+p,\,sy+q,\,l} = g^i_l\Bigg(\sum_{\substack{p \in \{0,1,\ldots,k-1\} \\ q \in \{0,1,\ldots,k-1\} \\ d \in \{1,\ldots,D\}}} a^i_{x,y,d} \cdot w^i_{p,q,d,l} + c^i_l\Bigg) \tag{7}$$

where a represents the input of layer l, w and c represent the weight and bias of the deconvolution layer, respectively, i indicates the i-th generator, s represents the stride of the deconvolution operation, x and y denote the size of the input features, D and k indicate the number of input features and the size of the deconvolution kernels, respectively, and g^i_l(·) represents the activation function in layer l. The ReLU is chosen as the activation function.

Each generator maps the input p_z(z) to the DR feature space Gen_i(z; W_i), where W_i represents the parameters of Gen_i. Each generator generates a subfundus image including a part of the effective DR features. The output of the multichannel generative model is given by

$$Gen(z) = Gen_1(z; W_1) \oplus Gen_2(z; W_2) \oplus \cdots \oplus Gen_n(z; W_n) \tag{8}$$

where ⊕ denotes the connection operation of feature maps. To simulate the basic DR features, the multichannel generative model generates a matrix with the same size as the output of the feature extractor.
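The parallel generators of (7) and (8) can be sketched as follows in PyTorch, with ⊕ realized as channel-wise concatenation of feature maps. The layer widths, kernel sizes, and noise dimension here are illustrative; the actual configurations are listed in Table VII.

```python
import torch
import torch.nn as nn

class SubGenerator(nn.Module):
    """One channel Gen_i: stacked deconvolution (transposed-conv) layers, Eq. (7)."""
    def __init__(self, z_dim: int = 100, out_ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            # project noise z to a 4x4 spatial seed, then upsample by deconvolution
            nn.ConvTranspose2d(z_dim, 256, kernel_size=4, stride=1), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(True),
            nn.ConvTranspose2d(128, out_ch, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, z):
        return self.net(z)

class MultichannelGenerator(nn.Module):
    """Gen(z) = Gen_1(z) (+) ... (+) Gen_n(z), Eq. (8): feature-map concatenation."""
    def __init__(self, n_generators: int = 4, z_dim: int = 100, out_ch: int = 32):
        super().__init__()
        self.gens = nn.ModuleList(SubGenerator(z_dim, out_ch) for _ in range(n_generators))

    def forward(self, z):
        # each Gen_i simulates one subfundus feature map; (+) is channel concat
        return torch.cat([g(z) for g in self.gens], dim=1)

gen = MultichannelGenerator()
fake_feats = gen(torch.randn(8, 100, 1, 1))  # shape (8, 4*32, 16, 16)
```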
As elaborated earlier, the multichannel generative model aims to learn the distribution of real DR features. This is achieved by optimizing the loss function of the multichannel generative model. Each generator simulates a subfundus image including the local DR features and also measures the distribution deviation between the final generated features and the real DR features. Thus, the loss function of each generator consists of two parts

$$Loss(Gen_i) = LG_W + LG_M^i \tag{9}$$

where LG_W represents the distance between the distribution of the generated DR features and the distribution of the real DR features, and LG_M^i represents the multichannel feature matching loss between the generated features of each subfundus image and the real local features.

The distribution of the generated DR features is expected to approximate the real distribution of DR features. Therefore, the multichannel feature matching mechanism is introduced into the proposed model, and the discriminator is employed to specify the target distribution that the multichannel generators should approximate. More specifically, each generator is trained to approximate the expected statistics of the real DR features taken from the intermediate layers of the discriminator. Therefore, LG_M^i is given by

$$LG_M^i = \big\|\mathbb{E}_{x \sim P_r} f(\phi_i) - \mathbb{E}_{z \sim P_z(z)} f(Gen_i(z))\big\|_2^2 \tag{10}$$

where f(·) denotes the activation on an intermediate layer of the discriminator, E_{x∼P_r} f(·) represents the expectation of the real DR features extracted by the intermediate layers of the discriminator, and Gen_i(z) represents the generated features of the subfundus image generated by generator i. The discriminant loss is introduced into each generator. The corresponding loss function is defined as

$$LG_W = \mathbb{E}_{z \sim p_z(z)}[\log(1 - Dis(Gen(z)))] \tag{11}$$

where Gen(z) indicates the combined features generated by the multichannel generative model. The loss function of each generator is therefore

$$Loss(Gen_i) = \mathbb{E}_{z \sim p_z(z)}[\log(1 - Dis(Gen(z)))] + \big\|\mathbb{E}_{x \sim P_r} f(\phi_i) - \mathbb{E}_{z \sim P_z(z)} f(Gen_i(z))\big\|_2^2. \tag{12}$$

For optimization purposes, the loss of each generator is minimized as

$$G_{opt} = \arg\min_{G}\ Loss(Gen_i). \tag{13}$$
is trained to approximate the expected state accounting for  i i 
the real DR features from the intermediate layers of the α1 , α2 , . . . , αCi
discriminator. Therefore, LG iM can be given by Pki
=   . (15)
exp(o1i ) exp(o2i ) exp(oCi )
LG iM = ||E x∼Pr f (φi ) − E z∼Pz (z) f (Gen i
(z))||22 (10) max C , C exp(o , . . . , C exp(o
c=1 exp(oci ) c=1 ci ) c=1 ci )

where f (·) denotes the activation on an intermediate layer Supposing that the proposed model employs L discrimina-
of the discriminator, E x∼Pr f (·) represents the expectation of tors, the final class label is determined by
real DR features extracted by the intermediate layers of the  L 
 
L 
L
discriminator, and Geni (z) represents the generated features of ypredict = arg max α1i , α2i , . . . , αCi . (16)
a subfundus image generated by generator i. The discriminant i=1 i=1 i=1

Authorized licensed use limited to: VIT University. Downloaded on December 02,2021 at 10:52:28 UTC from IEEE Xplore. Restrictions apply.
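Numerically, the fusion of (14)–(16) amounts to a max-normalized softmax average across discriminators. A minimal NumPy sketch:

```python
import numpy as np

def ensemble_predict(logit_list):
    """Fuse L discriminators' outputs, Eqs. (14)-(16).

    logit_list: list of length L with arrays of shape (N, C),
                the real-class outputs o_i of each discriminator.
    Returns the fused class labels of shape (N,).
    """
    scores = np.zeros_like(logit_list[0], dtype=float)
    for o in logit_list:
        # (14): softmax probability P_k^i of each discriminator
        e = np.exp(o - o.max(axis=1, keepdims=True))   # numerically stable
        p = e / e.sum(axis=1, keepdims=True)
        # (15): normalize by the maximum component -> alpha in (0, 1]
        alpha = p / p.max(axis=1, keepdims=True)
        scores += alpha                                 # accumulate over i
    # (16): arg max over the summed per-class scores
    return scores.argmax(axis=1)

# toy usage: three discriminators, four DR grades (R0-R3)
preds = ensemble_predict([np.random.randn(5, 4) for _ in range(3)])
```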
IV. EXPERIMENTS AND RESULTS

A. Experiment Configuration

All the experiments are implemented on an NVIDIA Tesla P100 GPU. The weights of the generators and discriminators are initialized with the Xavier uniform distribution, and the initial value of the bias is set to 0. The model is trained using minibatches of size 16.

B. Data Preprocessing and Augmentation

The useless black boundary region is removed by a predefined threshold, and all the fundus images are normalized to the size of 1440 × 1440. The training set is augmented with several geometric transformations, including random rotation (ranging from −90° to 90°) and horizontal and vertical flips. The developed model is pretrained using the ImageNet [52] data set.
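A minimal sketch of this preprocessing pipeline with torchvision transforms is shown below. The black-border intensity threshold is an illustrative assumption, while the 1440 × 1440 target size, the ±90° rotations, and the flips follow the text.

```python
import numpy as np
from PIL import Image
from torchvision import transforms

def crop_black_border(img: Image.Image, thresh: int = 10) -> Image.Image:
    """Remove the useless black boundary region by a predefined threshold."""
    arr = np.asarray(img.convert("L"))
    ys, xs = np.where(arr > thresh)          # foreground (retina) pixels
    return img.crop((xs.min(), ys.min(), xs.max() + 1, ys.max() + 1))

train_transform = transforms.Compose([
    transforms.Lambda(crop_black_border),
    transforms.Resize((1440, 1440)),          # normalize all images to 1440x1440
    transforms.RandomRotation(degrees=90),    # random rotation in [-90, 90]
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
])
```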
C. Evaluation Criteria

The proposed model is evaluated individually and comparatively using the following metrics: sensitivity (SN), specificity (SP), area under the ROC curve (AUC), and accuracy. These metrics are reviewed in the following.

In medical statistics, diagnostic indicators usually have two basic characteristics: sensitivity and specificity. Sensitivity refers to the probability that a diagnosis will not be missed, and specificity refers to the probability of not being misdiagnosed. Sensitivity and specificity are calculated as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP} \tag{17}$$

where the true positives (TPs) value is the number of samples correctly classified as the positive class, the false negatives (FNs) value is the number of samples belonging to the positive class but erroneously predicted as negative, the true negatives (TNs) value is the number of samples correctly classified as the negative class, and the false positives (FPs) value is the number of samples belonging to the negative class but erroneously predicted as positive. In (17), sensitivity indicates how many positive samples can be detected among all truly positive samples, and specificity indicates how many negative samples can be detected among all truly negative samples. For DR grading, it is hoped to reduce the rate of missed diagnosis by increasing sensitivity.

The AUC is a measure frequently used to evaluate the performance of classifiers. It is an indicator of the probability that a classifier can correctly rank samples. Note that an AUC value of 0.5 indicates a random classifier (guessing). The main advantage of the AUC is its ability to evaluate the grading performance on unbalanced data sets.
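These quantities follow directly from the confusion counts. A minimal sketch for the binary case, with scikit-learn used only for the AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """Sensitivity and specificity per Eq. (17), plus accuracy and AUC."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return {
        "sensitivity": tp / (tp + fn),          # TP / (TP + FN)
        "specificity": tn / (tn + fp),          # TN / (TN + FP)
        "accuracy": (tp + tn) / len(y_true),
        "auc": roc_auc_score(y_true, y_score),  # threshold-free ranking quality
    }

print(evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.1]))
```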
D. Parameter Analysis

The experiments were evaluated using tenfold cross validation to weaken the likelihood of overfitting. For each fold, the proportions of training, validation, and testing data are 70%, 10%, and 20%, respectively. Meanwhile, the training set has been augmented tenfold.

In the following, we present the perspectives on evaluating the proposed multichannel-based SSGAN. The main objectives of the evaluation are as follows.
1) To investigate the effect of using different numbers of labeled training samples.
2) To investigate how the feature extractor influences the performance of the proposed model.
3) To investigate the effect of using different numbers of generators.
4) To compare the effectiveness of four optimization algorithms (gradient descent, momentum, Adagrad, and Adadelta) with Adam.
5) To compare the effectiveness of five different cost functions employed in the proposed model.
Each of these objectives is considered in a separate part in the following.

1) Number of Labeled Training Data: For DR diagnosis, there are very few high-quality samples with labels, and most of the DR samples are unlabeled. We first test how the number of labeled samples influences the performance of the proposed model. The metrics, including AUC, SN, SP, and accuracy, are employed. The experimental results are shown in Fig. 4. From Fig. 4(a), it can be observed that the classification accuracy rises greatly (from 70.17% to 78%) as the number of labeled data increases from 5 to 20, and increases mildly (from 79.37% to 80.98%) as the number of labeled data increases further from 50 to 200. These experimental results show that the proposed semisupervised model can work well using a relatively small number of labeled samples. However, a certain amount of labeled samples is essential if one wants to obtain a higher classification accuracy with the proposed semisupervised model. A similar trend and conclusion can also be observed from Fig. 4(b) and (c). This demonstrates that the proposed multichannel-based SSGAN can generate samples following a distribution similar to that of the real DR samples, so the performance of the discriminators can be significantly improved. In reality, and especially in the field of DR diagnosis, labeling data is time-consuming for clinicians. Therefore, the proposed multichannel-based SSGAN has great potential for DR diagnosis.

Fig. 4. Effect of using different numbers of labeled training data. (a) DR. (b) Normal/abnormal. (c) Referable/nonreferable.

2) Feature Extractor: In order to verify the influence of the feature extractor, a comparison experiment was carried out to test the model performance with/without the feature extractor. The experimental results are shown in Fig. 5, from which it can be seen that the proposed multichannel-based SSGAN with the feature extractor achieves a higher classification accuracy. The reasons mainly include the following: 1) some redundant features are removed by the feature extractor so that the discriminator can directly process the effective features and 2) the DR features extracted by the pretrained feature extractor are highly representative, so the discriminator can learn the distribution of DR features more accurately and the classification accuracy of the proposed model is improved. Fig. 6 shows the feature maps and the corresponding scattered features of DR. For sample 1, Fig. 6(a) and (b) shows the feature maps obtained from the feature extractor, and Fig. 6(c) shows the scattered DR features. For sample 2, Fig. 6(d) and (e) shows the feature maps obtained from the feature extractor, and Fig. 6(f) shows the scattered DR features.

Fig. 5. Effect of using the feature extractor.

Fig. 6. Displays of DR features. (a) and (b) Feature maps from the feature extractor and (c) scattered DR features for sample 1. (d) and (e) Feature maps from the feature extractor and (f) scattered DR features for sample 2.

3) Number of Generators: The proposed model is based on the multichannel GAN, so the number of channels/generators plays an important role in the model performance. Several groups of experiments are carried out to test its influence. The experimental results are shown in Fig. 7. Fig. 7(a) shows that the grading accuracy rises greatly as the number of generators increases from 1 to 4 but decreases gradually when the number of generators keeps increasing. Therefore, the multichannel-based generative model can improve the model performance, but more generators are not always better. A similar trend and conclusion can be observed from Fig. 7(b) and (c). Based on Fig. 7, the following conclusions can be drawn: 1) the proposed multichannel-based generative model can benefit from a certain number of generators since they help generate a series of subfundus images including effective local features and 2) excessive generators can weaken the performance of the multichannel-based model since they introduce excessive parameters, which results in overfitting. Moreover, excessive generators may lead to missing the correlation between local features and global features.

Fig. 7. Effect of using different numbers of generators. (a) DR. (b) Normal/abnormal. (c) Referable/nonreferable.

4) Optimization Algorithm: To test the influence of the optimization algorithm, several groups of experiments are carried out using gradient descent, Adam, momentum, Adagrad, and Adadelta, respectively. The experimental results are shown in Fig. 8(a), which shows that, compared with the other four optimization algorithms (gradient descent, momentum, Adagrad, and Adadelta), the Adam algorithm achieves the best accuracy with respect to the DR, normal/abnormal, and referable/nonreferable classifications, respectively. The reason is that Adam updates adaptive learning rates based on the training data. The learning rate is stable within a certain range, which keeps the parameters relatively stable. A discriminator with relatively stable parameters is important for training a GAN [53].

Fig. 8. (a) Effect of optimization methods. (b) Effect of five different loss functions.
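The alternating Adam updates can be sketched as follows, reusing the `discriminator_loss` sketch given earlier. The learning rates, the batch pairing of labeled and unlabeled loaders, and the `intermediate` feature hook are illustrative assumptions rather than the authors' settings.

```python
import torch

def train(dis, gen, loader_lab, loader_unl, epochs=100, z_dim=100):
    """Alternating Adam updates for the discriminator and the generators
    (a schematic sketch; dis(x) returns (N, C) real-class logits)."""
    opt_d = torch.optim.Adam(dis.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
    for _ in range(epochs):
        for (x_lab, y_lab), x_unl in zip(loader_lab, loader_unl):
            z = torch.randn(x_lab.size(0), z_dim, 1, 1)

            # discriminator step, Eqs. (2)-(5); fake features are detached
            opt_d.zero_grad()
            d_loss = discriminator_loss(dis(x_lab), y_lab,
                                        dis(x_unl), dis(gen(z).detach()))
            d_loss.backward()
            opt_d.step()

            # generator step: feature-matching term of Eq. (12), with the
            # real-feature expectation estimated on the unlabeled batch
            opt_g.zero_grad()
            fake = gen(z)
            g_loss = (dis.intermediate(x_unl).mean(0)
                      - dis.intermediate(fake).mean(0)).pow(2).sum()
            g_loss.backward()
            opt_g.step()
```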
5) Cost Function: In this article, the multichannel feature matching mechanism is introduced into the proposed model. Several comparative experiments are conducted to demonstrate its advantage. The compared mechanisms include minibatch discrimination (MBD), historical averaging (HisA), one-sided label smoothing (OSLS), and virtual BN (VBN) [54]. The experimental results are shown in Fig. 8(b), which shows that the proposed model with the multichannel feature matching mechanism achieves the highest accuracy among all the methods. The multichannel feature matching mechanism can minimize the difference between the generated DR features and the real DR features. Fig. 9 shows the discriminator loss for DR classification during the training process.

Fig. 9. Trend of the loss function during the training process.

V. DISCUSSION

This section provides a comparative evaluation of the proposed multichannel-based SSGAN against other representative methods in terms of accuracy, AUC, sensitivity, and specificity. Several groups of experiments were conducted using different classification methods. The compared methods include human experts, the SSGAN, and other representative methods. The experimental results are presented in Tables III–V.

TABLE III
COMPARISON WITH OTHER METHODS FOR DR GRADING

TABLE IV
COMPARISON WITH OTHER METHODS ON NORMAL/ABNORMAL

TABLE V
COMPARISON WITH OTHER METHODS ON REFERABLE/NONREFERABLE

From Table III, it can be observed that, for the four-category grading of DR, the proposed framework using the multichannel-based SSGAN achieves the highest accuracy of 84.23% among all the methods. For the binary classification of DR, Table IV shows that, compared with the other methods, the proposed method achieves the highest accuracy of 96.6%, the highest sensitivity of 97.0%, the highest specificity of 96.8%, and the highest AUC of 98.3%. Table V also shows that the proposed method outperforms the other representative methods [16], [17], [28] in terms of accuracy, AUC, sensitivity, and specificity, respectively.

To demonstrate the advantage of the proposed method, we employ only 100 labeled DR samples and a large number of unlabeled DR samples to train the model. To show the classification performance in the four categories (R0, R1, R2, and R3), we employ the confusion matrix to analyze the experimental results, which are shown in Fig. 10. From Fig. 10, it can be observed that, although only 100 labeled DR samples are employed to train the model, the proposed model performs well. This demonstrates that the proposed method can deal with a classification problem when the labeled samples are limited.

Fig. 10. Confusion matrix of DR classification results.

The proposed model is tested on different data sets to demonstrate its generalization capability. The experimental results are shown in Table VI. The comparison between the generated images and real images is given in Fig. 11, where Fig. 11(a)–(d) shows the generated images corresponding to the real image in Fig. 11(e) at different stages. Similarly, Fig. 11(f)–(i) shows the generated images corresponding to the real image in Fig. 11(j) at different stages. Note that, in this article, we aim to obtain a good discriminator.

TABLE VI
RESULTS OF DR CLASSIFICATION ON DIFFERENT DATA SETS

Fig. 11. (a)–(d) and (f)–(i) Generated samples versus (e) and (j) real samples.
It is well known that good semisupervised classification performance and a good generator cannot be obtained at the same time [56]: given the discriminator objective, good semisupervised learning often requires a bad generator. Therefore, in this article, it is not important whether the generated images are consistent with the real images.

VI. CONCLUSION

The inadequacy of labeled data is a challenge for using deep-learning technology in medical image analysis. The reasons mainly include the following: 1) the high-quality annotation of medical imaging samples depends heavily on scarce medical expertise, which is very expensive; 2) compared with natural images, it is more difficult to collect medical images because of privacy issues; and 3) there are many kinds of diseases, so many different medical data sets need to be collected. In practice, collecting sufficient labeled DR samples is difficult. In this article, the multichannel GAN with semisupervision is developed to assess DR. The proposed model can deal with a DR classification problem with inadequate labeled data in the following ways. First, the multichannel generative scheme is proposed to generate a series of subfundus images corresponding to the scattered DR features. Second, the proposed multichannel-based GAN model with semisupervision can make full use of both labeled data and unlabeled data. Third, the DR feature extractor is introduced into the proposed model to weaken noise and extract representative DR features, some of which are tiny and spread out over high-resolution fundus images. Experiments are conducted using the Messidor data set. The experimental results demonstrate that the developed model outperforms the other representative models [16], [17], [28] in terms of accuracy, AUC, sensitivity, and specificity. In particular, Fig. 10 shows the promising performance of the proposed model even when only 100 labeled samples are employed. However, this article still has some limitations. For example, biological methods and pathological analysis are not considered in this article. These issues will be addressed in future work.

APPENDIX:
DETAILS OF NETWORK STRUCTURE

The configurations of the proposed model are given in Table VII. In this article, each discriminator is cotrained with four generators, and five discriminators are trained for the ensemble model. In Table VII, "256, 3 × 3 conv, stride 2" means 256 convolution kernels of size 3 × 3 with a stride of 2.

TABLE VII
CONFIGURATIONS OF THE PROPOSED MODEL

REFERENCES

[1] O. Faust, R. Acharya U., E. Y. K. Ng, K.-H. Ng, and J. S. Suri, "Algorithms for the automated detection of diabetic retinopathy using digital fundus images: A review," J. Med. Syst., vol. 36, no. 1, pp. 145–157, Feb. 2012.
[2] S. Haneda and H. Yamashita, "International clinical diabetic retinopathy disease severity scale," Nihon Rinsho. Jpn. J. Clin. Med., vol. 68, p. 228, Nov. 2010.
[3] S. Wang, Y. Hu, Y. Shen, and H. Li, "Classification of diffusion tensor metrics for the diagnosis of a myelopathic cord using machine learning," Int. J. Neural Syst., vol. 28, no. 2, Mar. 2018, Art. no. 1750036.
[4] S. Wang et al., "Skeletal maturity recognition using a fully automated system with convolutional neural networks," IEEE Access, vol. 6, pp. 29979–29993, 2018.
[5] M. Zhang et al., "Adaptive patient-cooperative control of a compliant ankle rehabilitation robot (CARR) with enhanced training safety," IEEE Trans. Ind. Electron., vol. 65, no. 2, pp. 1398–1407, Feb. 2018.
[6] H. Greenspan, B. van Ginneken, and R. M. Summers, "Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique," IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1153–1159, May 2016.
[7] B. Zhong, W. Niu, E. Broadbent, A. McDaid, T. M. C. Lee, and M. Zhang, "Bringing psychological strategies to robot-assisted physiotherapy for enhanced treatment efficacy," Frontiers Neurosci., vol. 13, p. 984, Sep. 2019.
[8] A. K. AlZubaidi, F. B. Sideseq, A. Faeq, and M. Basil, "Computer aided diagnosis in digital pathology application: Review and perspective approach in lung cancer classification," in Proc. Annu. Conf. New Trends Inf. Commun. Technol. Appl. (NTICT), Mar. 2017, pp. 219–224.
[9] M. Zhang, A. McDaid, A. J. Veale, Y. Peng, and S. Q. Xie, "Adaptive trajectory tracking control of a parallel ankle rehabilitation robot with joint-space force distribution," IEEE Access, vol. 7, pp. 85812–85820, 2019.
[10] A. Sopharak et al., "Machine learning approach to automatic exudate detection in retinal images from diabetic patients," J. Modern Opt., vol. 57, no. 2, pp. 124–135, Jan. 2010.
[11] R. Priya and P. Aruna, "Diagnosis of diabetic retinopathy using machine learning techniques," ICTACT J. Soft Comput., vol. 3, no. 4, pp. 563–575, Jul. 2013.
[12] J. Krause et al., "Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy," Ophthalmology, vol. 125, no. 8, pp. 1264–1272, Aug. 2018.
[13] M. Zhang, W. Meng, C. Davies, Y. Zhang, and S. Xie, "A robot-driven computational model for estimating passive ankle torque with subject-specific adaptation," IEEE Trans. Biomed. Eng., vol. 63, no. 4, pp. 814–821, Aug. 2016.
[14] G. Litjens et al., "A survey on deep learning in medical image analysis," Med. Image Anal., vol. 42, pp. 60–88, Dec. 2017.
[15] H. Huang et al., "IntroVAE: Introspective variational autoencoders for photographic image synthesis," in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 52–63.
[16] A. Odena, "Semi-supervised learning with generative adversarial networks," 2016, arXiv:1606.01583. [Online]. Available: http://arxiv.org/abs/1606.01583
[17] Z. Wang, Y. Yin, J. Shi, W. Fang, H. Li, and X. Wang, "Zoom-in-net: Deep mining lesions for diabetic retinopathy detection," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Quebec City, QC, Canada: Springer, 2017, pp. 267–275.
[18] C. I. Sánchez, M. Niemeijer, A. V. Dumitrescu, M. S. A. Suttorp-Schulten, M. D. Abràmoff, and B. van Ginneken, "Evaluation of a computer-aided diagnosis system for diabetic retinopathy screening on public data," Investigative Opthalmol. Vis. Sci., vol. 52, no. 7, pp. 4866–4871, Jun. 2011.
[19] L. Seoud, J. Chelbi, and F. Cheriet, "Automatic grading of diabetic retinopathy on a public database," in Proc. Ophthalmic Med. Image Anal. 2nd Int. Workshop (OMIA), Munich, Germany, 2015, pp. 97–104.
[20] H. Pratt, F. Coenen, D. M. Broadbent, S. P. Harding, and Y. Zheng, "Convolutional neural networks for diabetic retinopathy," Procedia Comput. Sci., vol. 90, pp. 200–205, Jan. 2016.
[21] V. Gulshan et al., "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," JAMA, vol. 316, no. 22, pp. 2402–2410, Dec. 2016.
[22] M. Haloi, "Improved microaneurysm detection using deep neural networks," 2015, arXiv:1505.04424. [Online]. Available: http://arxiv.org/abs/1505.04424
[23] P. Costa, A. Galdran, A. Smailagic, and A. Campilho, "A weakly-supervised framework for interpretable diabetic retinopathy detection on retinal images," IEEE Access, vol. 6, pp. 18747–18758, 2018.
[24] J. Shan and L. Li, "A deep learning method for microaneurysm detection in fundus images," in Proc. IEEE 1st Int. Conf. Connected Health, Appl., Syst. Eng. Technol. (CHASE), Jun. 2016, pp. 357–358.
[25] R. Gargeya and T. Leng, "Automated identification of diabetic retinopathy using deep learning," Ophthalmology, vol. 124, no. 7, pp. 962–969, Jul. 2017.
[26] B. Antal and A. Hajdu, "An ensemble-based system for automatic screening of diabetic retinopathy," Knowl.-Based Syst., vol. 60, pp. 20–27, Apr. 2014.
[27] P. Costa and A. Campilho, "Convolutional bag of words for diabetic retinopathy detection from eye fundus images," IPSJ Trans. Comput. Vis. Appl., vol. 9, no. 1, p. 10, 2017.
[28] H. H. Vo and A. Verma, "New deep neural nets for fine-grained diabetic retinopathy recognition on hybrid color space," in Proc. IEEE Int. Symp. Multimedia (ISM), Dec. 2016, pp. 209–215.
[29] S. Jialin Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[30] L. Dai et al., "Clinical report guided retinal microaneurysm detection with multi-sieving deep learning," IEEE Trans. Med. Imag., vol. 37, no. 5, pp. 1149–1161, May 2018.
[31] W. Cao, N. Czarnek, J. Shan, and L. Li, "Microaneurysm detection using principal component analysis and machine learning methods," IEEE Trans. Nanobiosci., vol. 17, no. 3, pp. 191–198, Jul. 2018.
[32] W. M. Gondal, J. M. Köhler, R. Grzeszick, G. A. Fink, and M. Hirsch, "Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 2069–2073.
[33] G. Quellec, K. Charrière, Y. Boudi, B. Cochener, and M. Lamard, "Deep image mining for diabetic retinopathy screening," Med. Image Anal., vol. 39, pp. 178–193, Jul. 2017.
[34] L. Zhou, Y. Zhao, J. Yang, Q. Yu, and X. Xu, "Deep multiple instance learning for automatic detection of diabetic retinopathy in retinal images," IET Image Process., vol. 12, no. 4, pp. 563–571, Apr. 2018.
[35] X. Zeng, H. Chen, Y. Luo, and W. Ye, "Automated diabetic retinopathy detection based on binocular Siamese-like convolutional neural network," IEEE Access, vol. 7, pp. 30744–30753, 2019.
[36] X. Li, T. Pang, B. Xiong, W. Liu, P. Liang, and T. Wang, "Convolutional neural networks based transfer learning for diabetic retinopathy fundus image classification," in Proc. 10th Int. Congr. Image Signal Process., Biomed. Eng. Informat. (CISP-BMEI), Oct. 2017, pp. 1–11.
[37] A. Khandelwal and A. K. Mishra, "Design simulation and analysis of enhanced diabetic retinopathy using convolutional neural network," Int. J. Res. Anal. Rev., vol. 6, no. 3, pp. 660–663, 2019.
[38] M. Raju, V. Pagidimarri, R. Barreto, A. Kadam, V. Kasivajjala, and A. Aswath, "Development of a deep learning algorithm for automatic diagnosis of diabetic retinopathy," in Proc. MedInfo, 2017, pp. 559–563.
[39] K. Xu, D. Feng, and H. Mi, "Deep convolutional neural network-based early automated detection of diabetic retinopathy using fundus image," Molecules, vol. 22, no. 12, p. 2054, 2017.
[40] R. Ghosh, K. Ghosh, and S. Maitra, "Automatic detection and classification of diabetic retinopathy stages using CNN," in Proc. 4th Int. Conf. Signal Process. Integr. Netw. (SPIN), Feb. 2017, pp. 550–554.
[41] S. Qummar et al., "A deep learning ensemble approach for diabetic retinopathy detection," IEEE Access, vol. 7, pp. 150530–150539, 2019.
[42] J. M. Brown et al., "Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks," JAMA Ophthalmol., vol. 136, no. 7, pp. 803–810, Jul. 2018.
[43] X. Li, L. Shen, M. Shen, F. Tan, and C. S. Qiu, "Deep learning based early stage diabetic retinopathy detection using optical coherence tomography," Neurocomputing, vol. 369, pp. 134–144, Dec. 2019.
[44] R. Poplin et al., "Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning," Nature Biomed. Eng., vol. 2, no. 3, pp. 158–164, Mar. 2018.
[45] J. De Fauw et al., "Clinically applicable deep learning for diagnosis and referral in retinal disease," Nature Med., vol. 24, no. 9, pp. 1342–1350, Sep. 2018.
[46] J. de la Torre, A. Valls, and D. Puig, "A deep learning interpretable classifier for diabetic retinopathy disease grading," Neurocomputing, Apr. 2019, doi: 10.1016/j.neucom.2018.07.102.
[47] E. Decencière et al., "Feedback on a publicly distributed image database: The Messidor database," Image Anal. Stereology, vol. 33, no. 3, pp. 231–234, 2014.
[48] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. CVPR, Jul. 2017, vol. 1, no. 2, p. 3.
[49] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. 14th Int. Conf. Artif. Intell. Statist., 2011, pp. 315–323.
[50] X. Frazao and L. A. Alexandre, "Weighted convolutional neural network ensemble," in Iberoamerican Congress on Pattern Recognition. Puerto Vallarta, Mexico: Springer, 2014, pp. 674–681.
[51] G. Wen, Z. Hou, H. Li, D. Li, L. Jiang, and E. Xun, "Ensemble of deep neural networks with probability-based fusion for facial expression recognition," Cognit. Comput., vol. 9, no. 5, pp. 597–610, Oct. 2017.
[52] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[53] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," 2017, arXiv:1701.07875. [Online]. Available: http://arxiv.org/abs/1701.07875
[54] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training GANs," in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 2234–2242.
[55] C. Lam, D. Yi, M. Guo, and T. Lindsey, "Automated detection of diabetic retinopathy using deep learning," in Proc. AMIA Summits Transl. Sci., 2018, p. 147.
[56] Z. Dai, Z. Yang, F. Yang, W. W. Cohen, and R. Salakhutdinov, "Good semi-supervised learning that requires a bad GAN," 2017, arXiv:1705.09783. [Online]. Available: http://arxiv.org/abs/1705.09783
[57] M. D. Abràmoff et al., "Automated analysis of retinal images for detection of referable diabetic retinopathy," JAMA Ophthalmol., vol. 131, no. 3, pp. 351–357, Mar. 2013.