2, APRIL 2021
Abstract: Diabetic retinopathy (DR) is one of the major causes of blindness. It is of great significance to apply deep-learning techniques for DR recognition. However, deep-learning algorithms often depend on large amounts of labeled data, which is expensive and time-consuming to obtain in the medical imaging area. In addition, the DR features are inconspicuous and spread out over high-resolution fundus images. Therefore, it is a big challenge to learn the distribution of such DR features. This article proposes a multichannel-based generative adversarial network (MGAN) with semisupervision to grade DR. The multichannel generative model is developed to generate a series of subfundus images corresponding to the scattered DR features. By minimizing the dependence on labeled data, the proposed semisupervised MGAN can identify the inconspicuous lesion features by using high-resolution fundus images without compression. Experimental results on the public Messidor data set show that the proposed model can grade DR effectively.

Note to Practitioners: This article is motivated by the challenging problem caused by the inadequacy of labeled data in medical image analysis and the dispersion of effective features in high-resolution medical images. The inadequacy of labeled data in medical image analysis has two main causes: 1) the high-quality annotation of medical imaging samples depends heavily on scarce medical expertise, which is very expensive and 2) compared with natural images, it is more difficult to collect medical images because of privacy issues. It is of great significance to apply deep-learning techniques for diabetic retinopathy (DR) recognition. In this article, the multichannel generative adversarial network (GAN) with semisupervision is developed for DR-aided diagnosis. The proposed model can deal with the DR classification problem with inadequate labeled data in the following ways: 1) the multichannel generative scheme is proposed to generate a series of subfundus images corresponding to the scattered DR features and 2) the proposed multichannel-based GAN (MGAN) model with semisupervision can make full use of both labeled data and unlabeled data. The experimental results demonstrate that the proposed model outperforms the other representative models in terms of accuracy, area under ROC curve (AUC), sensitivity, and specificity.

Index Terms: Computer-aided diagnosis (CAD), diabetic retinopathy (DR), generative adversarial network (GAN), multichannel, semisupervised learning.
Manuscript received August 23, 2019; revised December 3, 2019 and January 31, 2020; accepted March 4, 2020. Date of publication April 9, 2020; date of current version April 7, 2021. This article was recommended for publication by Lead Guest Editor A. Si and Editor M. Zhang upon evaluation of the reviewers’ comments. This work was supported by the National Natural Science Foundations of China under Grant 61872351 and Grant 61771465, by the International Science and Technology Cooperation Projects of Guangdong under Grant 2019A050510030, by the Strategic Priority CAS Project under Grant XDB38000000, by the Major Projects from General Logistics Department of People’s Liberation Army under Grant AWS13C008, and by the Shenzhen Key Basic Research Projects under Grant JCYJ20180507182506416. (Shuqiang Wang and Xiangyu Wang contributed equally to this work.) (Corresponding authors: Min Gan; Baiying Lei.)
Shuqiang Wang is with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China, and also with the Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen 518060, China.
Xiangyu Wang is with the College of Data Science, University of Science and Technology of China, Hefei 230026, China.
Yong Hu is with the Department of Orthopedics and Traumatology, The University of Hong Kong, Hong Kong.
Yanyan Shen and Zhile Yang are with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
Min Gan is with the College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China (e-mail: aganmin@aliyun.com).
Baiying Lei is with the School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China (e-mail: leiby@szu.edu.cn).
This article has supplementary downloadable material available at https://ieeexplore.ieee.org, provided by the authors.
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASE.2020.2981637

I. INTRODUCTION
Diabetic retinopathy (DR) is one of the complications caused by diabetes [1]. The high blood sugar level damages the blood vessels of the light-sensitive tissue at the retina. It is one of the main causes of blindness. The World Health Organization predicts that global diabetics will reach 4.4% of the population in 2030 and about half of them will have DR complications [2]. Early diagnosis through regular screening is important for preventing DR. However, it is time-consuming for ophthalmologists to diagnose efficiently. In order to reduce the cost of regular screening, the technology for capturing color fundus images is often adopted. This approach offers the possibility to make use of computer-aided diagnosis (CAD) [3], [4] technology, which has been widely studied in artificial intelligence for healthcare applications [5]–[9].

For traditional machine learning, most of the efficient features need to be identified by an expert manually. Besides, the performance of the traditional machine-learning methods often depends on how accurately the features are extracted [10]–[13]. In recent years, deep-learning technology has been widely applied in the field of medical image analysis [6], [14]. It can learn the high-level features from images automatically. However, deep-learning models usually depend on a large amount of labeled data, which is a big
1545-5955 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: VIT University. Downloaded on December 02,2021 at 10:52:28 UTC from IEEE Xplore. Restrictions apply.
WANG et al.: DR DIAGNOSIS USING MGAN WITH SEMISUPERVISION 575
challenge to obtain in the medical imaging area. For the labeling process of DR images, the grading of DR requires the clinician to extract the lesions and measure the area of the lesions manually, which is highly time-consuming. Due to the lack of high-quality labeled data in real applications, it is difficult to apply general deep-learning methods (such as GoogLeNet and ResNet) to DR diagnosis. On the other hand, hospitals can produce a large amount of unlabeled data containing important potential information from which a machine-learning model can benefit. Therefore, it is feasible to employ semisupervised learning for the DR recognition task. As for semisupervised learning, the generative adversarial network (GAN) [15] not only works well in a wide variety of image generation applications but can also achieve excellent performance in semisupervised classification [16].

High-resolution fundus images are usually employed to diagnose DR in clinical practice. However, there are several challenges in employing high-resolution fundus images and the general GAN model to recognize DR. First, the proportion of effective DR features is quite low in high-resolution fundus images. Feeding such images into a classifier directly introduces a large amount of redundant information. Because DR lesions are tiny, reducing the size of the fundus image beforehand loses the information of these lesions. Second, for the same reason, it is difficult for the general GAN model to generate high-resolution images that include detailed semantic information. Third, the locations of DR lesions are often spread out, and exudates may appear in various locations depending on the patient. Moreover, the lesion area may be surrounded by noise, such as blood vessels and imaging shadows. Thus, it is impractical to extract the lesions by detecting a region of interest (ROI). Based on the above analysis, this article proposes a multichannel-based semisupervised GAN (SSGAN) for grading DR. The proposed model can make full use of labeled data and unlabeled data to recognize DR automatically without losing the original DR features.

The main contributions of this article are summarized as follows.
1) A multichannel-based GAN (MGAN) model is proposed, which can generate a series of subfundus images, including effective local features. All the subfundus images are then combined to obtain the most representative features of the entire fundus image. In this way, the proposed model can deal with the challenge that the effective DR features (e.g., exudates, microaneurysms, and bleeding points) are diffuse in the high-resolution fundus images.
2) The feature extraction scheme is incorporated into the proposed MGAN framework. This scheme can reduce the noise from the original fundus images and extract the scattered lesion features, which can improve the performance of the discriminator.
3) The proposed MGAN can employ both labeled data and unlabeled data. As far as we know, it is the first time that the SSGAN is employed for grading DR.

The rest of this article is organized as follows. In Section II, the related works on DR assessment are reviewed. In Section III, the proposed model is described in detail. In Section IV, the model is evaluated, and the factors influencing model performance are also analyzed. In Section V, the experimental results compared with the existing methods are presented and discussed. Conclusions are summarized in Section VI.

II. RELATED WORKS
In recent years, CAD technology using machine learning has been applied to diagnose various diseases. Extensive research has been carried out on assisted DR diagnosis [18]. For example, Seoud et al. [19] proposed an automatic DR grading system. In their work, red lesion detection is adopted to generate a DR lesion probability map, which is represented by 35 features, including location, size, and probability information. Pratt et al. [20] used a convolutional neural network (CNN) model and color fundus images for DR classification. They employed data augmentation to expand the training set and used 80 000 samples to train the CNN model. Gulshan et al. [21] employed the Inception-v3 architecture-based model to automate detection of DR and diabetic macular edema. Haloi et al. [22] proposed a microaneurysm detection system for DR detection. They used a deep neural network (DNN) to identify microaneurysms without the preprocessing steps. The proposed model was evaluated on the Retinopathy Online Challenge (ROC) and Diaretdb1v2 database.

Costa et al. [23] developed a weakly supervised DR detection system. They used multiple instance learning (MIL) algorithms for the joint optimization of the instance encoding and the classification. Shan and Li [24] used the stacked sparse autoencoder (SSAE) and fundus images for microaneurysm classification. Gargeya and Leng [25] developed a data-driven method using the deep residual learning mechanism to learn discriminative features for DR detection. Antal and Hajdu [26] developed an ensemble-based algorithm for DR screening. Costa and Campilho [27] presented the bag-of-visual-words (BoVW)-based model for DR detection, and the model was tested on the Messidor and DR2 data sets. Vo and Verma [28] proposed a DR detection model by combining kernels with multiple losses network (CKML Net) and VGGNet with Extra Kernel (VNXK). The experimental results on the EyePACS and Messidor data sets showed the efficiency of the proposed model. Two pretrained CNN models [29] were employed to identify the grade of DR on fluorescein angiography photographs. To detect microaneurysms from fundus images, Dai et al. [30] proposed a multisieving CNN framework integrated with the image-to-text mapping scheme for guiding clinical reports. Cao et al. [31] developed a model by integrating random forest (RF), neural network (NN), and support vector machine (SVM) to detect microaneurysms. Moreover, principal component analysis (PCA) was employed to reduce the dimensionality of DR image patches. Based on top-performing supervised CNN, Gondal et al. [32] presented a weakly
576 IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 18, NO. 2, APRIL 2021
supervised object localization method to detect DR lesions on image level. By introducing a heatmap optimization procedure, Quellec et al. [33] proposed a ConvNets framework to detect lesions in the context of DR screening. Zhou et al. [34] proposed an MIL-based model for DR detection. Zeng et al. [35] trained a binocular Siamese-like CNN with a transfer learning technique to classify color retinal fundus photographs into two grades. By employing transfer learning, Li et al. [36] presented a CNN-based model for DR fundus image classification. Similarly, Khandelwal and Mishra [37], Raju et al. [38], Xu et al. [39], and Ghosh et al. [40] also adopted CNN-based approaches for automatic recognition of DR. Furthermore, Qummar et al. [41] presented an ensemble of five CNN models to encode the DR features and improve the classification performance. Brown et al. [42] employed CNN to diagnose plus disease in retinopathy of prematurity (ROP) from retinal photographs. Li et al. [43] developed a novel deep network OCTD_Net for early-stage DR detection. The OCTD_Net consisted of two independent networks. One network extracted features from the original optical coherence tomography (OCT) images, and the other extracted retinal layer information. Poplin et al. [44] developed a DNN-based model to predict cardiovascular risk factors from retinal fundus photographs. Wang et al. [17] proposed a novel zoom-in-net for DR recognition by employing both the whole retinal image and its suspicious lesion patches generated by attention maps. De Fauw et al. [45] first applied a U-Net-based segmentation network on 3-D OCT scans to generate the tissue segmentation map, and then predicted the retinal disease using a classification network. Based on the pixel-wise score propagation model, de La Torre et al. [46] presented a DNN-based interpretable model for DR assessment. With this model, the generated visual maps can be interpreted.

Most of the abovementioned works are based on supervised DNNs, whose performance often depends on a large amount of labeled samples. However, plenty of labeled samples are not always available for image processing in the medical field. Some of the abovementioned works resorted to a transfer learning scheme to deal with this issue, while transfer learning does not always work in medical image processing. This article proposes a multichannel-based SSGAN for grading DR. The proposed model can make full use of labeled data and unlabeled data to recognize DR automatically without losing the original DR features.

III. MATERIALS AND METHODS
A. Data Set
The proposed model is evaluated on the publicly available Messidor data set (http://www.adcis.net/en/third-party/messidor/) [47], which contains approximately 1200 digital fundus images. Each image is obtained by using a Topcon TRC NW6 nonmydriatic camera with a 45° field of view centered on the fovea. The ratio of images with pupil dilation to images without dilation is 2:1. The size of the fundus image is 1440 × 960, 2240 × 1488, or 2304 × 1536. The label for each image is provided by ophthalmologists. According to the number of microaneurysms, hemorrhages, and the existence of neovascularization, each image is classified as one of four lesion grades (R0, R1, R2, and R3), as described in Table I. R0 represents a normal, no lesion fundus image, R1 and R2 represent the mild and severe nonproliferative retinal image, respectively, and R3 represents the proliferative retinopathy image. In the Messidor data set, R0, R1, R2, and R3 account for 45.5%, 12.75%, 20.58%, and 21.67% of the total data set, respectively. In addition, the severity of the lesion is classified according to class labels, as shown in Table II.

TABLE I
RETINOPATHY GRADE IN MESSIDOR DATA SET

TABLE II
CLASSIFICATION TASK DESCRIPTION

B. DR Feature Extraction
The framework of the proposed DR grading model is shown in Fig. 1. The proposed model consists of four parts.
1) The DR feature extractor is designed to extract dispersive lesion features.
2) The multichannel generative model is designed to generate a series of subfundus images, including effective local features.
3) The discriminator is designed to predict C + 1 classes by using convolutional layers accounting for multiscale DR features matching.
4) In order to improve the robustness and accuracy of the proposed model, an ensemble-based framework is designed by integrating various trained discriminators.
Each part will be detailed as follows.

Since the fundus image has a high resolution with millions of pixels, feeding such high-resolution fundus images into a GAN directly will cause the number of parameters to increase drastically. Moreover, compressing the fundus image will destroy tiny DR lesion features, such as bleeding points with only a few pixels. To solve this problem, a feature extractor is designed to extract the most representative DR features from the original fundus image by transferring a portion of the pretrained network structure.

The feature extractor is shown in Fig. 2. First, the image is normalized to eliminate the effects of different scales and illuminations. Second, the data augmentation strategy is applied to increase the amount of training data. Third, the large-scale
natural data set is used to pretrain the model and the fundus image is used to fine-tune the network parameters. In order to extract the most representative DR features, the densely connected convolutional network (DenseNet) [48] is pretrained to extract DR features from high-resolution fundus images. The main idea of DenseNet is to connect each layer’s output to the subsequent layers by creating a short path from preceding layers to subsequent layers. Therefore, compared with other pretrained models, DenseNet can extract and transfer features efficiently with fewer parameters. For the feature extractor, we transfer a portion of the pretrained model to represent fundus image features. The framework includes a convolution layer that learns the basic features of the retinopathy image and several dense blocks, each containing several composite layers. Each composite layer includes three consecutive operations: batch normalization (BN), a rectified linear unit (ReLU) [49], and a convolution operation with a 3 × 3 convolution kernel. To remove redundant information, a 1 × 1 convolution layer is placed before each composite layer. It can enhance the expression ability of the feature extractor and decrease the computation cost. Likewise, a 1 × 1 convolution layer and a pooling layer are placed between dense blocks. They can preserve the effective features, discard redundant information, and effectively reduce the size of samples.

C. Multichannel-Based SSGAN
The proposed framework includes a multichannel generative model and a discriminator. Moreover, to improve the robustness and accuracy of the model, the ensemble-based scheme is employed to integrate a series of well-trained discriminators. The proposed framework is shown in Fig. 2. Each part will be detailed as follows.

1) Discriminator: For the proposed multichannel-based SSGAN, the input to the discriminator includes the real DR features extracted by the feature extractor and the DR features generated by the multichannel generative model. For the proposed model, the discriminator is mainly composed of several convolution layers and a fully connected layer with a softmax activation function, which predicts the class labels of the real samples and generated samples.

Supposing that the input DR sample is $x$ and the last layer outputs a vector $(o_1, o_2, \ldots, o_C)$, the probability $p_d(y = j \mid x)$ that $x$ belongs to class $j$ is given by

$$p_d(y = j \mid x) = \frac{\exp(o_j)}{\sum_{c=1}^{C} \exp(o_c)}. \tag{1}$$

The semisupervised learning strategy is introduced into the discriminator to make full use of the unlabeled samples. The discriminator outputs C + 1 classes [16], where C represents the number of classes for real samples, and the extra class indicates whether the input is a real sample or a fake sample from the generator.

In this article, there are two types of input data for the discriminator. One is the labeled DR data, and the other is the unlabeled DR data. For the labeled data, there are four categories (R0, R1, R2, and R3). For the unlabeled data, each sample can be labeled as “Real” or “Fake.” In order to make full use of the label information, different loss functions are defined for different label types. The labeled samples only need to be predicted as a certain class since they are all real. The corresponding loss function is defined as

$$L_{D_{r\text{-labeled}}} = -E_{x,y \sim p_{\mathrm{data}}(x,y)} \log p_d(y \mid x, y < C + 1) = -o_y + \log\Bigl[\sum_{j=1}^{C} \exp o_j\Bigr] \tag{2}$$

where $o_j$ represents the last-layer output for category $j$ and $y$ represents the ground truth. The unlabeled samples can only be classified as real samples or generated samples without one certain class label. The (C + 1)th class is employed to indicate whether a sample is generated by the generator. For the generated DR samples, the loss function is defined as

$$L_{D_{\mathrm{generated}}} = -E_{x \sim \mathrm{Gen}(z)} \log\bigl[p_d(y = C + 1 \mid x)\bigr] = \log\Bigl[1 + \sum_{j=1}^{C} \exp o_j\Bigr]. \tag{3}$$

In the same manner, the corresponding loss function for the real unlabeled DR sample is defined as

$$L_{D_{r\text{-unlabeled}}} = -E_{x,y \sim p_{\mathrm{data}}(x,y)} \log\bigl[1 - p_d(y = C + 1 \mid x)\bigr] = -\log\Bigl[\sum_{j=1}^{C} \exp o_j\Bigr] + \log\Bigl[1 + \sum_{j=1}^{C} \exp o_j\Bigr]. \tag{4}$$

The total loss function for the discriminator is the sum of the abovementioned three loss functions

$$\mathrm{Loss}(\mathrm{Dis}) = L_{D_{r\text{-labeled}}} + L_{D_{\mathrm{generated}}} + L_{D_{r\text{-unlabeled}}}. \tag{5}$$

2) Multichannel-Based Generative Model: For the general GAN, it is difficult to generate a high-resolution fundus image directly since the number of parameters for the generator will increase dramatically. In this article, a multichannel-based generative model is proposed to learn the distribution of DR features. The multichannel-based generative model contains multiple parallel generators to generate a series of subfundus images, including effective DR features directly. It can be assumed that one entire DR sample includes $M$ features and the generative model includes $n$ generators. Each generator accounts for $N = M/n$ features. Let $\mathrm{Gen}_i$ denote the $i$th generator in the set $\mathrm{Gen} = \{\mathrm{Gen}_1, \mathrm{Gen}_2, \ldots, \mathrm{Gen}_n\}$. The objective function for the proposed multichannel-based SSGAN is defined as

$$\min_{\{\mathrm{Gen}_1, \mathrm{Gen}_2, \ldots, \mathrm{Gen}_n\}} \max_{\mathrm{Dis}} V(\mathrm{Dis}, \{\mathrm{Gen}_1, \mathrm{Gen}_2, \ldots, \mathrm{Gen}_n\}) = E_{x \sim P_{\mathrm{data}}(x)}[\log(\mathrm{Dis}(x))] + E_{z \sim p_z(z)}[\log(1 - \mathrm{Dis}(\mathrm{Gen}(z)))] + \bigl\| E_{\phi \sim P_r}\{f(\phi_1), f(\phi_2), \ldots, f(\phi_n)\} - E_{z \sim p_z(z)}\{f(\mathrm{Gen}_1), f(\mathrm{Gen}_2), \ldots, f(\mathrm{Gen}_n)\} \bigr\|_2^2 \tag{6}$$

where $f(\phi_i)$ represents the real DR features corresponding to the $i$th generator. For the multichannel generative model, each generator contains several deconvolution layers. By choosing a noise variable $p_z(z)$ following a Gaussian distribution as
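the input for $\mathrm{Gen}_i$, the output of each layer of $\mathrm{Gen}_i$ can be expressed as

$$b^{i}_{sx+p,\,sy+q,\,l} = g^{i}_{l}\Biggl(\sum_{\substack{p \in \{0,1,\ldots,k-1\} \\ q \in \{0,1,\ldots,k-1\} \\ d \in \{1,\ldots,D\}}} a^{i}_{x,y,d} \cdot w^{i}_{p,q,d,l} + c^{i}_{l}\Biggr) \tag{7}$$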
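Equation (7) has the shape of a strided transposed convolution (deconvolution): each input value at position (x, y) is multiplied by a k × k kernel and written into the output at offsets (sx + p, sy + q). The following is a minimal pure-Python sketch of one such layer, not the authors' implementation. It assumes that s is the stride, that overlapping contributions are accumulated before the bias and activation are applied, and that the activation g is ReLU; the function name `deconv_layer` and its argument layout are illustrative choices.

```python
def deconv_layer(a, w, c, stride, g=lambda t: max(t, 0.0)):
    """Strided transposed convolution in the spirit of (7).

    a: input feature map as nested lists, shape (H, W, D)
    w: kernel as nested lists, shape (k, k, D, L)
    c: bias, list of length L
    g: elementwise activation (ReLU here, an assumption)
    Returns nested lists of shape ((H-1)*stride + k, (W-1)*stride + k, L).
    """
    H, W, D = len(a), len(a[0]), len(a[0][0])
    k, L = len(w), len(w[0][0][0])
    out_h = (H - 1) * stride + k
    out_w = (W - 1) * stride + k
    # Start every output cell at the bias c_l, then accumulate contributions.
    pre = [[[c[l] for l in range(L)] for _ in range(out_w)] for _ in range(out_h)]
    for x in range(H):
        for y in range(W):
            for p in range(k):
                for q in range(k):
                    for l in range(L):
                        # sum over input channels d of a[x, y, d] * w[p, q, d, l]
                        pre[x * stride + p][y * stride + q][l] += sum(
                            a[x][y][d] * w[p][q][d][l] for d in range(D)
                        )
    return [[[g(v) for v in cell] for cell in row] for row in pre]


# Toy usage: a 2 x 2 single-channel map of ones and a 2 x 2 kernel of ones.
a = [[[1.0], [1.0]], [[1.0], [1.0]]]
w = [[[[1.0]], [[1.0]]], [[[1.0]], [[1.0]]]]
upsampled = deconv_layer(a, w, c=[0.0], stride=2)   # 4 x 4 output, no overlap
overlapped = deconv_layer(a, w, c=[0.0], stride=1)  # 3 x 3 output, overlaps add up
```

With stride 2 the kernel footprints do not overlap, so the layer simply upsamples the input. With stride 1 the footprints overlap and the center cell receives a contribution from all four inputs.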
where ⊕ denotes the connection operation of feature maps. To simulate the basic DR features, the multichannel generative model generates a matrix with the same size as the output of feature extractor. As elaborated earlier, the multichannel generative model aims to learn the distribution of real DR features. This is achieved by optimizing the loss function of the multichannel generative model. Each generator simulates a subfundus image, including the local DR features and also measures the distribution deviation between the final generated features and the real DR features. Thus, the loss function of each generator consists of two parts

$$\mathrm{Loss}(\mathrm{Gen}_i) = LG_W + LG^{i}_{M} \tag{9}$$

where $LG_W$ represents the distance between the distribution of generated DR features and the distribution of real DR features. $LG^{i}_{M}$ represents the multichannel features matching loss between the generated features of each subfundus image and the real local features.

The distribution of the generated DR features is expected to approximate the real distribution of DR features. Therefore, the multichannel feature matching mechanism is introduced into the proposed model, while one discriminator is employed to specify the distribution approximating which is the target of the multichannel generators. More specifically, each generator is trained to approximate the expected state accounting for the real DR features from the intermediate layers of the discriminator. Therefore, $LG^{i}_{M}$ can be given by

$$LG^{i}_{M} = \bigl\| E_{x \sim P_r} f(\phi_i) - E_{z \sim p_z(z)} f(\mathrm{Gen}_i(z)) \bigr\|_2^2 \tag{10}$$

where $f(\cdot)$ denotes the activation on an intermediate layer of the discriminator, $E_{x \sim P_r} f(\cdot)$ represents the expectation of real DR features extracted by the intermediate layers of the discriminator, and $\mathrm{Gen}_i(z)$ represents the generated features of a subfundus image generated by generator $i$. The discriminant

$$\mathrm{Loss}(\mathrm{Gen}_i) = E_{z \sim p_z(z)}[\log(1 - \mathrm{Dis}(\mathrm{Gen}(z)))] + \bigl\| E_{x \sim P_r} f(\phi_i) - E_{z \sim p_z(z)} f(\mathrm{Gen}_i(z)) \bigr\|_2^2. \tag{12}$$

For optimization purpose, the loss of each generator is minimized as

$$G_{\mathrm{opt}} = \arg\min_{G} \mathrm{Loss}(\mathrm{Gen}_i). \tag{13}$$

3) Ensemble Discriminators: To improve the robustness and accuracy of the proposed model, we devise an ensemble scheme into the model. The ensemble discriminative model includes multiple discriminators with different structures, as shown in Fig. 3. Each discriminator is optimized by training independent multichannel generative networks. For the input DR features, each discriminator outputs an independent classification result. The weight parameter is assigned to each discriminator by using weighted-based fusion scheme [50], [51]. The classification probability of the $k$th input DR features given by discriminator $i$ can be expressed as

$$P_{ki} = \Biggl[\frac{\exp(o_{1i})}{\sum_{c=1}^{C} \exp(o_{ci})}, \frac{\exp(o_{2i})}{\sum_{c=1}^{C} \exp(o_{ci})}, \ldots, \frac{\exp(o_{Ci})}{\sum_{c=1}^{C} \exp(o_{ci})}\Biggr]. \tag{14}$$

$P_{ki}$ can be normalized by

$$\bigl[\alpha_1^{i}, \alpha_2^{i}, \ldots, \alpha_C^{i}\bigr] = \frac{P_{ki}}{\max\Bigl\{\frac{\exp(o_{1i})}{\sum_{c=1}^{C} \exp(o_{ci})}, \frac{\exp(o_{2i})}{\sum_{c=1}^{C} \exp(o_{ci})}, \ldots, \frac{\exp(o_{Ci})}{\sum_{c=1}^{C} \exp(o_{ci})}\Bigr\}}. \tag{15}$$

Supposing that the proposed model employs $L$ discriminators, the final class label is determined by

$$y_{\mathrm{predict}} = \arg\max\Bigl\{\sum_{i=1}^{L} \alpha_1^{i}, \sum_{i=1}^{L} \alpha_2^{i}, \ldots, \sum_{i=1}^{L} \alpha_C^{i}\Bigr\}. \tag{16}$$
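The fusion rule in (14)-(16) can be traced with a short pure-Python sketch. It is an illustration under stated assumptions, not the authors' code: the function names `softmax` and `ensemble_predict` are hypothetical, each discriminator is represented only by its C last-layer outputs for one sample, and classes are indexed from 0.

```python
import math


def softmax(logits):
    """Per-discriminator class probabilities, as in (14)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(o - shift) for o in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logits_per_discriminator):
    """Fuse L discriminators: normalize as in (15), sum and arg max as in (16).

    logits_per_discriminator: list of L lists, each holding C class outputs.
    Returns (predicted class index, list of summed alpha values per class).
    """
    num_classes = len(logits_per_discriminator[0])
    votes = [0.0] * num_classes
    for logits in logits_per_discriminator:
        probs = softmax(logits)
        peak = max(probs)
        alphas = [p / peak for p in probs]  # the top class of each discriminator gets 1
        for j, alpha in enumerate(alphas):
            votes[j] += alpha
    label = max(range(num_classes), key=votes.__getitem__)
    return label, votes


# Toy usage with three discriminators and four classes (made-up outputs).
toy_logits = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
]
label, votes = ensemble_predict(toy_logits)
```

Because each probability vector is divided by its own maximum, every discriminator contributes exactly 1 to its preferred class and a value below 1 to the others, so a confident discriminator suppresses its non-preferred classes more strongly than an uncertain one.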
Fig. 4. Effect of using different numbers of labeled training data. (a) DR. (b) Normal/abnormal. (c) Referable/nonreferable.

IV. EXPERIMENTS AND RESULTS
A. Experiment Configuration
All the experiments were implemented on an NVIDIA Tesla P100 GPU. The weights of the generators and discriminators were initialized with the Xavier uniform distribution, and the initial value of the bias was set to 0. The model was trained using minibatches of size 16.

samples. For the DR grading, it is hoped to reduce the rate of missed diagnosis by increasing sensitivity.

The AUC is a measure frequently used to evaluate the performance of classifiers. It is an indicator of the probability that a classifier can correctly classify samples. Note that an AUC value of 0.5 indicates a random classifier (guessing). The main advantage of AUC is its ability to evaluate the grading performance for unbalanced data sets.
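For a binary task such as normal/abnormal, the metrics named in this article can be computed as in the following pure-Python sketch. The definitions used are the standard ones (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half). Whether the article uses exactly these definitions is an assumption, and the function names and toy data are illustrative.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = diseased)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of diseased cases detected
        "specificity": tn / (tn + fp),  # fraction of healthy cases cleared
    }


def auc(y_true, scores):
    """Rank-based AUC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage on an unbalanced sample: 3 diseased and 5 healthy cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

In this toy sample the missed diseased case lowers sensitivity but leaves specificity untouched, which is why the two are reported separately for screening tasks.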
Fig. 5. Effect of using feature extractor.

Fig. 6. Displays of DR features. (a) and (b) Feature maps from the feature extractor and (c) scattered DR features for sample 1. (d) and (e) Feature maps from the feature extractor and (f) scattered DR features for sample 2.
Fig. 7. Effect of using different numbers of generators. (a) DR. (b) Normal/abnormal. (c) Referable/nonreferable.

Fig. 8. (a) Effect of optimization methods. (b) Effect of five different loss functions.

V. DISCUSSION
This section provides a comparative evaluation of the proposed multichannel-based SSGAN with other representative methods in terms of accuracy, AUC, sensitivity, and specificity. Several groups of experiments were conducted using different classification methods. The comparable methods include human experts, SSGAN, and other representative methods. The experimental results are presented in Tables III–V.
TABLE IV
COMPARISON WITH OTHER METHODS ON NORMAL/ABNORMAL

TABLE V
COMPARISON WITH OTHER METHODS ON REFERABLE/NONREFERABLE

Fig. 11. (a)–(d) and (f)–(i) Generated samples versus (e) and (j) real samples.
TABLE VII
C ONFIGURATIONS OF THE P ROPOSED M ODEL
we aim to obtain a good discriminator. It is well known that good semisupervised classification performance and a good generator cannot be obtained at the same time [56]. Given the discriminator objective, good semisupervised learning often requires a bad generator. Therefore, in this article, it is not important whether the generated images closely resemble the real images.

VI. CONCLUSION

The inadequacy of labeled data is a challenge for using deep-learning technology in medical image analysis. The reasons mainly include the following: 1) the high-quality annotation of medical imaging samples depends heavily on scarce medical expertise, which is very expensive; 2) compared with natural images, it is more difficult to collect medical images because of privacy issues; and 3) there are many kinds of diseases, so many different medical data sets need to be collected. In practice, collecting sufficient labeled DR samples is difficult. In this article, the multichannel GAN with semisupervision is developed to assess DR. The proposed model can deal with a DR classification problem with inadequacy of labeled data in the following ways. First, the multichannel generative scheme is proposed to generate a series of subfundus images corresponding to the scattered DR features. Second, the proposed multichannel-based GAN model with semisupervision can make full use of both labeled data and unlabeled data. Third, the DR feature extractor is introduced into the proposed model to weaken noise and extract representative DR features, some of which are tiny and spread out over high-resolution fundus images. Experiments are conducted by using the Messidor data set. The experimental results demonstrate that the developed model outperforms the other representative models [16], [17], [28] in terms of accuracy, AUC, sensitivity, and specificity. In particular, Fig. 10 shows the promising performance of the proposed model even if only 100 labeled samples are employed. However, this article still has some limitations. For example, biological methods and pathological analysis are not considered in this article. These issues will be addressed in future work.

APPENDIX: DETAILS OF NETWORK STRUCTURE

The configurations of the proposed model are given in Table VII. In this article, each discriminator is cotrained with four generators. Five discriminators are trained for the ensemble model. In Table VII, “256, 3 × 3 conv, stride 2” means 256 convolution kernels of size 3 × 3 with a stride of 2.

REFERENCES

[1] O. Faust, R. Acharya U., E. Y. K. Ng, K.-H. Ng, and J. S. Suri, “Algorithms for the automated detection of diabetic retinopathy using digital fundus images: A review,” J. Med. Syst., vol. 36, no. 1, pp. 145–157, Feb. 2012.
[2] S. Haneda and H. Yamashita, “International clinical diabetic retinopathy disease severity scale,” Nihon Rinsho. Jpn. J. Clin. Med., vol. 68, p. 228, Nov. 2010.
[3] S. Wang, Y. Hu, Y. Shen, and H. Li, “Classification of diffusion tensor metrics for the diagnosis of a myelopathic cord using machine learning,” Int. J. Neural Syst., vol. 28, no. 2, Mar. 2018, Art. no. 1750036.
[4] S. Wang et al., “Skeletal maturity recognition using a fully automated system with convolutional neural networks,” IEEE Access, vol. 6, pp. 29979–29993, 2018.
[5] M. Zhang et al., “Adaptive patient-cooperative control of a compliant ankle rehabilitation robot (CARR) with enhanced training safety,” IEEE Trans. Ind. Electron., vol. 65, no. 2, pp. 1398–1407, Feb. 2018.
[6] H. Greenspan, B. van Ginneken, and R. M. Summers, “Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1153–1159, May 2016.
[7] B. Zhong, W. Niu, E. Broadbent, A. McDaid, T. M. C. Lee, and M. Zhang, “Bringing psychological strategies to robot-assisted physiotherapy for enhanced treatment efficacy,” Frontiers Neurosci., vol. 13, p. 984, Sep. 2019.
[8] A. K. AlZubaidi, F. B. Sideseq, A. Faeq, and M. Basil, “Computer aided diagnosis in digital pathology application: Review and perspective approach in lung cancer classification,” in Proc. Annu. Conf. New Trends Inf. Commun. Technol. Appl. (NTICT), Mar. 2017, pp. 219–224.
[9] M. Zhang, A. McDaid, A. J. Veale, Y. Peng, and S. Q. Xie, “Adaptive trajectory tracking control of a parallel ankle rehabilitation robot with joint-space force distribution,” IEEE Access, vol. 7, pp. 85812–85820, 2019.
[10] A. Sopharak et al., “Machine learning approach to automatic exudate detection in retinal images from diabetic patients,” J. Modern Opt., vol. 57, no. 2, pp. 124–135, Jan. 2010.
[11] R. Priya and P. Aruna, “Diagnosis of diabetic retinopathy using machine learning techniques,” ICTACT J. Soft Comput., vol. 3, no. 4, pp. 563–575, Jul. 2013.
[12] J. Krause et al., “Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy,” Ophthalmology, vol. 125, no. 8, pp. 1264–1272, Aug. 2018.
[13] M. Zhang, W. Meng, C. Davies, Y. Zhang, and S. Xie, “A robot-driven computational model for estimating passive ankle torque with subject-specific adaptation,” IEEE Trans. Biomed. Eng., vol. 63, no. 4, pp. 814–821, Aug. 2016.
[14] G. Litjens et al., “A survey on deep learning in medical image analysis,” Med. Image Anal., vol. 42, pp. 60–88, Dec. 2017.
[15] H. Huang et al., “Introvae: Introspective variational autoencoders for photographic image synthesis,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 52–63.
[16] A. Odena, “Semi-supervised learning with generative adversarial networks,” 2016, arXiv:1606.01583. [Online]. Available: http://arxiv.org/abs/1606.01583
[17] Z. Wang, Y. Yin, J. Shi, W. Fang, H. Li, and X. Wang, “Zoom-in-net: Deep mining lesions for diabetic retinopathy detection,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Quebec City, QC, Canada: Springer, 2017, pp. 267–275.
[18] C. I. Sánchez, M. Niemeijer, A. V. Dumitrescu, M. S. A. Suttorp-Schulten, M. D. Abràmoff, and B. van Ginneken, “Evaluation of a computer-aided diagnosis system for diabetic retinopathy screening on public data,” Investigative Opthalmol. Vis. Sci., vol. 52, no. 7, pp. 4866–4871, Jun. 2011.
[19] L. Seoud, J. Chelbi, and F. Cheriet, “Automatic grading of diabetic retinopathy on a public database,” in Proc. Ophthalmic Med. Image Anal. 2nd Int. Workshop (OMIA), Munich, Germany, 2015, pp. 97–104.
[20] H. Pratt, F. Coenen, D. M. Broadbent, S. P. Harding, and Y. Zheng, “Convolutional neural networks for diabetic retinopathy,” Procedia Comput. Sci., vol. 90, pp. 200–205, Jan. 2016.
[21] V. Gulshan et al., “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs,” JAMA, vol. 316, no. 22, pp. 2402–2410, Dec. 2016.
[22] M. Haloi, “Improved microaneurysm detection using deep neural networks,” 2015, arXiv:1505.04424. [Online]. Available: http://arxiv.org/abs/1505.04424
[23] P. Costa, A. Galdran, A. Smailagic, and A. Campilho, “A weakly-supervised framework for interpretable diabetic retinopathy detection on retinal images,” IEEE Access, vol. 6, pp. 18747–18758, 2018.
[24] J. Shan and L. Li, “A deep learning method for microaneurysm detection in fundus images,” in Proc. IEEE 1st Int. Conf. Connected Health, Appl., Syst. Eng. Technol. (CHASE), Jun. 2016, pp. 357–358.
[25] R. Gargeya and T. Leng, “Automated identification of diabetic retinopathy using deep learning,” Ophthalmology, vol. 124, no. 7, pp. 962–969, Jul. 2017.
[26] B. Antal and A. Hajdu, “An ensemble-based system for automatic screening of diabetic retinopathy,” Knowl.-Based Syst., vol. 60, pp. 20–27, Apr. 2014.
[27] P. Costa and A. Campilho, “Convolutional bag of words for diabetic retinopathy detection from eye fundus images,” IPSJ Trans. Comput. Vis. Appl., vol. 9, no. 1, p. 10, 2017.
[28] H. H. Vo and A. Verma, “New deep neural nets for fine-grained diabetic retinopathy recognition on hybrid color space,” in Proc. IEEE Int. Symp. Multimedia (ISM), Dec. 2016, pp. 209–215.
[29] S. Jialin Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[30] L. Dai et al., “Clinical report guided retinal microaneurysm detection with multi-sieving deep learning,” IEEE Trans. Med. Imag., vol. 37, no. 5, pp. 1149–1161, May 2018.
[31] W. Cao, N. Czarnek, J. Shan, and L. Li, “Microaneurysm detection using principal component analysis and machine learning methods,” IEEE Trans. Nanobiosci., vol. 17, no. 3, pp. 191–198, Jul. 2018.
[32] W. M. Gondal, J. M. Köhler, R. Grzeszick, G. A. Fink, and M. Hirsch, “Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 2069–2073.
[33] G. Quellec, K. Charrière, Y. Boudi, B. Cochener, and M. Lamard, “Deep image mining for diabetic retinopathy screening,” Med. Image Anal., vol. 39, pp. 178–193, Jul. 2017.
[34] L. Zhou, Y. Zhao, J. Yang, Q. Yu, and X. Xu, “Deep multiple instance learning for automatic detection of diabetic retinopathy in retinal images,” IET Image Process., vol. 12, no. 4, pp. 563–571, Apr. 2018.
[35] X. Zeng, H. Chen, Y. Luo, and W. Ye, “Automated diabetic retinopathy detection based on binocular Siamese-like convolutional neural network,” IEEE Access, vol. 7, pp. 30744–30753, 2019.
[36] X. Li, T. Pang, B. Xiong, W. Liu, P. Liang, and T. Wang, “Convolutional neural networks based transfer learning for diabetic retinopathy fundus image classification,” in Proc. 10th Int. Congr. Image Signal Process., Biomed. Eng. Informat. (CISP-BMEI), Oct. 2017, pp. 1–11.
[37] A. Khandelwal and A. K. Mishra, “Design simulation and analysis of enhanced diabetic retinopathy using convolutional neural network,” Int. J. Res. Anal. Rev., vol. 6, no. 3, pp. 660–663, 2019.
[38] M. Raju, V. Pagidimarri, R. Barreto, A. Kadam, V. Kasivajjala, and A. Aswath, “Development of a deep learning algorithm for automatic diagnosis of diabetic retinopathy,” in Proc. MedInfo, 2017, pp. 559–563.
[39] K. Xu, D. Feng, and H. Mi, “Deep convolutional neural network-based early automated detection of diabetic retinopathy using fundus image,” Molecules, vol. 22, no. 12, p. 2054, 2017.
[40] R. Ghosh, K. Ghosh, and S. Maitra, “Automatic detection and classification of diabetic retinopathy stages using CNN,” in Proc. 4th Int. Conf. Signal Process. Integr. Netw. (SPIN), Feb. 2017, pp. 550–554.
[41] S. Qummar et al., “A deep learning ensemble approach for diabetic retinopathy detection,” IEEE Access, vol. 7, pp. 150530–150539, 2019.
[42] J. M. Brown et al., “Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks,” JAMA Ophthalmol., vol. 136, no. 7, pp. 803–810, Jul. 2018.
[43] X. Li, L. Shen, M. Shen, F. Tan, and C. S. Qiu, “Deep learning based early stage diabetic retinopathy detection using optical coherence tomography,” Neurocomputing, vol. 369, pp. 134–144, Dec. 2019.
[44] R. Poplin et al., “Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning,” Nature Biomed. Eng., vol. 2, no. 3, pp. 158–164, Mar. 2018.
[45] J. De Fauw et al., “Clinically applicable deep learning for diagnosis and referral in retinal disease,” Nature Med., vol. 24, no. 9, pp. 1342–1350, Sep. 2018.
[46] J. de la Torre, A. Valls, and D. Puig, “A deep learning interpretable classifier for diabetic retinopathy disease grading,” Neurocomputing, Apr. 2019, doi: 10.1016/j.neucom.2018.07.102.
[47] E. Decencière et al., “Feedback on a publicly distributed image database: The Messidor database,” Image Anal. Stereology, vol. 33, no. 3, pp. 231–234, 2014.
[48] G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. CVPR, Jul. 2017, vol. 1, no. 2, p. 3.
[49] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proc. 14th Int. Conf. Artif. Intell. Statist., 2011, pp. 315–323.
[50] X. Frazao and L. A. Alexandre, “Weighted convolutional neural network ensemble,” in Iberoamerican Congress on Pattern Recognition. Puerto Vallarta, Mexico: Springer, 2014, pp. 674–681.
[51] G. Wen, Z. Hou, H. Li, D. Li, L. Jiang, and E. Xun, “Ensemble of deep neural networks with probability-based fusion for facial expression recognition,” Cognit. Comput., vol. 9, no. 5, pp. 597–610, Oct. 2017.
[52] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[53] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” 2017, arXiv:1701.07875. [Online]. Available: http://arxiv.org/abs/1701.07875
[54] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training GANs,” in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 2234–2242.
[55] D. Y. Carson Lam, M. Guo, and T. Lindsey, “Automated detection of diabetic retinopathy using deep learning,” in Proc. AMIA Summits Transl. Sci., 2018, p. 147.
[56] Z. Dai, Z. Yang, F. Yang, W. W. Cohen, and R. Salakhutdinov, “Good semi-supervised learning that requires a bad GAN,” 2017, arXiv:1705.09783. [Online]. Available: http://arxiv.org/abs/1705.09783
[57] M. D. Abràmoff et al., “Automated analysis of retinal images for detection of referable diabetic retinopathy,” JAMA Ophthalmol., vol. 131, no. 3, pp. 351–357, Mar. 2013.