Validation of a Deep Learning Model to Screen for Glaucoma Using Images from Different Fundus Cameras and Data Augmentation
Ryo Asaoka, MD, PhD,1 Masaki Tanito, MD, PhD,2 Naoto Shibata,3 Keita Mitsuhashi,3 Kenichi Nakahara,3
Yuri Fujino, CO,1,4 Masato Matsuura, CO,1,4 Hiroshi Murata, MD,1 Kana Tokumo, MD,5
Yoshiaki Kiuchi, MD, PhD5

Purpose: To validate a deep residual learning algorithm to diagnose glaucoma from fundus photography
using different fundus cameras at different institutes.
Design: Cross-sectional study.
Participants: A training dataset consisted of 1364 color fundus photographs with glaucomatous indications
and 1768 color fundus photographs without glaucomatous features. Two testing datasets consisted of (1) 95
images of 95 glaucomatous eyes and 110 images of 110 normative eyes, and (2) 93 images of 93 glaucomatous
eyes and 78 images of 78 normative eyes.
Methods: A deep learning algorithm known as Residual Network (ResNet) was used to diagnose glaucoma using a training dataset. The 2 testing datasets were obtained using different fundus cameras (different manufacturers) across multiple institutes. The size of the training data was artificially increased by adding minor alterations to the original data, known as “image augmentation.” Diagnostic accuracy was assessed using the area under the receiver operating characteristic curve (AROC).
Main Outcome Measures: Area under the receiver operating characteristic curve.
Results: When image augmentation was used, the AROC was 94.8% (90.3–96.8) in the first testing dataset and 99.7% (99.4–100.0) in the second testing dataset. These AROC values were significantly (P < 0.05) larger than those obtained without augmentation (87.7% [82.8–92.6] in the first testing dataset and 94.5% [91.3–97.6] in the second testing dataset).
Conclusions: The previously developed deep residual learning algorithm achieved high diagnostic performance with different fundus cameras across multiple institutes, in particular when image augmentation was used. Ophthalmology Glaucoma 2019;2:224-231 © 2019 by the American Academy of Ophthalmology

There is no doubt that early diagnosis of glaucoma is important for preventing blindness, because glaucoma causes irreversible visual impairment. In glaucoma, morphologic changes at the optic disc occur in typical patterns.1 The 2-dimensional fundus photograph is a basic ophthalmological screening tool for glaucoma; however, the diagnosis is currently based on subjective judgment. With the development of deep learning methods in imaging recognition research,2 there has been renewed interest in using deep learning to diagnose eye diseases. Recent studies have suggested the usefulness of applying a deep learning method known as the “convolutional neural network” (CNN) to diagnose glaucoma.3-5 A possibly more powerful deep learning method, deep residual learning for Image Recognition (Residual Network [ResNet]),6 is available, and we recently reported its usefulness for diagnosing glaucoma.7

The potential impact of an accurate algorithm for the early detection of glaucoma and prevention of blindness, using fundus photography, is enormous. This imaging technology is affordable and can be carried out at nonophthalmological facilities, including optician practices, screening centers, and internal medicine clinics, where high-tech imaging devices such as OCT are less readily available.8

In a recent study, we developed a ResNet model using fundus images obtained with a single fundus camera (nonmyd WX, Kowa Company, Ltd, Aichi, Japan) and validated its usefulness using a testing dataset of images obtained from the same camera. This is a concern because fundus images are not homogenous across cameras; for instance, pixel resolution and sensor properties are unique to each camera. Furthermore, all images in our previous study were obtained at a single institute. The purpose of the current study was to validate the usefulness of the newly developed ResNet algorithm for fundus images obtained using different fundus cameras (multiple manufacturers) across multiple institutes. In addition, it has been reported that increasing the size of a training dataset by adding minor alterations to the original data, known as “augmentation,” is useful to improve diagnostic accuracy.9 Thus, we also investigated the usefulness of this technique to improve the performance of the ResNet model.


Methods

The study was approved by the Research Ethics Committee of the Matsue Red Cross Hospital, IInan Hospital, Hiroshima University Hospital, and the Faculty of Medicine at the University of Tokyo. The ethics committees of Matsue Red Cross Hospital and IInan Hospital waived the requirement for the patients' informed consent regarding the use of their medical record data, in accordance with the regulations of the Japanese Guidelines for Epidemiologic Study issued by the Japanese Government. Instead, the protocol was posted at the outpatient clinic to notify participants about the research. This study was performed according to the tenets of the Declaration of Helsinki.

Participants

Training Dataset. The training dataset was inherited from our previous study.7 In short, 1364 glaucomatous and 1768 normative photographs, labeled according to the recommendations of the Japan Glaucoma Society Guidelines for Glaucoma,10 were obtained using a fundus camera (nonmyd WX, Kowa Company, Ltd) between February 2016 and October 2016 at Matsue Red Cross Hospital. All photographs were taken with an angle of view of 45° and a resolution of 2144×1424 pixels. Photographs that were defocused, unclear, too dark, too bright, or decentered from the posterior pole, or that had other conditions that could interfere with a diagnosis of glaucoma, were excluded. Photographs from 2 eyes of a patient were included if both photographs satisfied the criteria, but duplicate photographs of a single eye were excluded. Images with other optic nerve head and retinal pathologies were also excluded.

Testing Dataset 1. Testing dataset 1 consisted of 95 images of 95 glaucomatous eyes and 110 images of 110 normative eyes obtained from outpatients who visited Iinan Hospital (Iinan Town, Shimane Prefecture, Japan) between December 2017 and February 2018. Posterior fundus photographs were captured using the nonmyd 7 camera (Kowa Company, Ltd). The optic nerve head and macula were scanned using the RS-3000 OCT (Nidek, Gamagori, Japan) in the glaucoma mode. Refractive error was recorded using the KR-1 refract-keratometer (Topcon Co Ltd, Tokyo, Japan), and intraocular pressure (IOP) was determined using the Goldmann applanation tonometer. The glaucoma group was defined as having glaucomatous changes in fundus photographs and corresponding thinning in circumpapillary retinal nerve fiber layer thickness measurements or in macular inner retinal thickness measurements (outside of the OCT's normative range), and no other optic nerve/optic nerve head and retinal pathologies on fundus photographs and OCT images. Circumpapillary retinal nerve fiber layer thickness at the 3.45-mm diameter and vertical cup-to-disc ratio in the raster scanning over a 6×6-mm² area centered on the optic disc, and macular inner retinal thickness within the 9-mm circle in the raster scanning over a 9×9-mm² area centered on the foveal center, were obtained. In the glaucomatous group, the visual field (VF) was tested using the Humphrey Field Analyzer (HFA) Swedish Interactive Thresholding Algorithm central 30-2 program (Carl Zeiss Meditec, Dublin, CA); however, in this study, VF results were not considered in the diagnosis of glaucoma. The normal group was defined as being free of glaucomatous changes and retinal pathologies in both fundus photographs and OCT images. A diagnosis was independently judged by 3 ophthalmologists specializing in glaucoma (M.T., H.M., and R.A.). Photographs were excluded if the diagnoses of the 3 examiners did not agree. Thus, this testing dataset was prepared without considering VF defects, IOP level, and gonioscopic appearance.

Testing Dataset 2. Testing dataset 2 consisted of 93 images of 93 glaucomatous eyes derived from the glaucoma clinic, Department of Ophthalmology, Hiroshima University, from December 2017 to February 2018, and 78 images of 78 normative eyes of patients who visited Hiroshima University Hospital from September 2009 to July 2018. Posterior fundus photographs were captured using the TRC-50DX camera (Topcon Co Ltd). The optic nerve head and macula were scanned using the RTVue Fourier-domain OCT system (Optovue, Inc, Fremont, CA). The protocol consisted of 1 horizontal scan of 7 mm in length, followed by 15 vertical scans of 7 mm in length at 0.5-mm intervals centered 1 mm temporal to the fovea. The retinal nerve fiber layer (RNFL) 3.45 mode of the RTVue FD system measures the peripapillary RNFL thickness along a 3.45-mm diameter circle around the optic disc. Refractive error was recorded using the KR-1 refract-keratometer (Topcon Co Ltd), and IOP was determined using the Goldmann applanation tonometer. The glaucomatous and normal groups were defined in the same manner as in testing dataset 1. The VF was tested using the HFA Swedish Interactive Thresholding Algorithm central 30-2 program (Carl Zeiss Meditec, Dublin, CA).

Structure and Training Strategy of Deep Neural Networks

In our recent report,7 we used a type of CNN known as ResNet,6 which is well known to be useful for image classification and feature extraction. We reported the usefulness of the ResNet algorithm (a scratch model) to accurately discriminate between glaucomatous and healthy eyes, trained with approximately 3000 fundus images labeled as glaucomatous or not. In ResNet, “identity skip connections” that skip 1 or more layers are used, and features are propagated to succeeding layers. This enables ResNet to facilitate a deeper and larger network, which is helpful to acquire more effective and conceptual features without overfitting.

Figure 1 summarizes the methods used in this study. We exploit 34 layers in ResNet and initialize training with pretrained weights, which were optimized for ImageNet classification.11 The network also inherits the input shape from ImageNet,11 that is, 224×224 pixels with red, green, and blue channels. Only the last fully connected layer is changed to output 2 values (glaucoma or normal). This methodology is inspired by recent successes in fine-tuning deep neural networks,12 whereby parameters of a network are first derived in a different but large pretraining dataset and then used to initialize training in a new and smaller training dataset. Fine-tuning is now a widely used approach to acquire a generalized feature representation with small training samples.
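As an illustration of this fine-tuning strategy, a minimal PyTorch sketch is given below. It assumes torchvision's ImageNet-pretrained ResNet-34 as a stand-in for the authors' implementation and interprets the "damping coefficient" of Table 1 as the momentum term of momentum SGD; it is a sketch under these assumptions, not the authors' code.

```python
# Minimal sketch of the fine-tuning setup (not the authors' code).
import torch
import torch.nn as nn
from torchvision import models

# 34-layer ResNet initialized with ImageNet-pretrained weights.
model = models.resnet34(pretrained=True)

# Replace only the last fully connected layer to output 2 values
# (glaucoma or normal); dropout of 0.5 is applied before it (Table 1).
model.fc = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(model.fc.in_features, 2),
)

# Momentum SGD with the Table 1 settings: learning rate 0.001,
# momentum ("damping coefficient", an interpretation) 0.9, weight decay 0.0001.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0001
)
criterion = nn.CrossEntropyLoss()  # 2-class output; the paper uses batch size 64
```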

Figure 1. Outline of the Residual Network (ResNet) model used in the current study. A total of 34 layers in ResNet were initialized with pretrained weights optimized for ImageNet classification.11 The network also inherits the input shape from ImageNet,11 that is, 224×224 pixels with red, green, and blue (RGB) channels. Only the last fully connected layer is changed to output 2 values (glaucoma or normal).

In addition to fine-tuning, we attempted further improvements of the model by applying image augmentation. It has been demonstrated that increasing the diversity of the training data by adding minor alterations to the original data, known as “augmentation,” is useful to improve diagnostic accuracy.9 Various changes were made to the input images, as described next. For comparison, other deep neural networks (VGG11, VGG16,13 and Inception-v3) were also used. Inception-v314 is a CNN model consisting of 11 inception modules; this structure enabled a reduction of parameters in the model and, as a result, decreased the size of the input data required to obtain an accurate diagnosis.14 These models were pretrained on the ImageNet classification11 and then fine-tuned using the training dataset with augmentations. For structural augmentation, images were vertically and horizontally inverted, each with a probability of up to 50%. Random affine transformations were also applied. The term “affine transformation” here covers 3 transformations: (1) rotation by random angles from −10 degrees to +10 degrees, (2) vertical and horizontal
translations to a maximum of 10% of the length, and (3) scaling edges to a random value from 224 to 256 pixels. Finally, 224×224 regions were randomly cropped from the transformed image. These transformations enhance spatial diversity in the training dataset and thus help the algorithm to be scale and position invariant. For color augmentation, the contrast, saturation, and hue of the images were modified randomly (range, 50%–150%, 90%–110%, and 90%–110%, respectively). This operation helps the algorithm be more robust in extracting glaucomatous features.
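The structural and color augmentations described above can be approximated with standard image transforms. The following sketch assumes torchvision (the library actually used is not stated); the hue parameter is only a rough stand-in, because torchvision shifts hue additively rather than scaling it to 90% to 110%, and the normalization statistics shown are placeholders for the global training-set values.

```python
# Approximate torchvision version of the augmentations described above
# (a sketch under stated assumptions, not the authors' pipeline).
import random
from torchvision import transforms
import torchvision.transforms.functional as F

class RandomEdgeResize:
    """Resize the shorter edge to a random value in [224, 256] (scaling step 3)."""
    def __call__(self, img):
        return F.resize(img, random.randint(224, 256))

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),        # structural: flips up to 50%
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomAffine(degrees=10,            # (1) rotation in [-10, +10] degrees
                            translate=(0.1, 0.1)), # (2) shifts up to 10% of the length
    RandomEdgeResize(),                            # (3) random edge scaling
    transforms.RandomCrop(224),                    # final random 224x224 region
    transforms.ColorJitter(contrast=0.5,           # color: contrast 50%-150%
                           saturation=0.1,         # saturation 90%-110%
                           hue=0.05),              # rough stand-in for the hue range
    transforms.ToTensor(),
    # Global mean/SD of the training images; ImageNet statistics are shown
    # here only as placeholders.
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```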
Alongside the augmentation of the training data, common preprocessing and postprocessing of the input tensors were performed. Before training, raw fundus images were trimmed around the optic disc, with halo exclusion using the vessel region and a Hough transformation to find the circular area (that is, the optic disc), and were resized to fit the network input (224×224 pixels). Also, just before images were fed into the deep learning network, they were normalized with the global average and standard deviation of pixel values among the training images, because the pretrained ResNet expects normalized tensors. Further training settings are detailed in Table 1.
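The optic disc cropping step might be implemented along the following lines with OpenCV's Hough circle transform; this is a hypothetical sketch in which the vessel-based halo exclusion is not reproduced and every threshold is illustrative rather than taken from the paper.

```python
# Hedged sketch of locating and cropping the optic disc with a Hough
# circle transform (illustrative parameters; not the authors' code).
import cv2

def crop_optic_disc(image_path, out_size=224):
    img = cv2.imread(image_path)
    gray = cv2.medianBlur(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 5)
    circles = cv2.HoughCircles(
        gray, cv2.HOUGH_GRADIENT, dp=1.5,
        minDist=gray.shape[0],          # expect a single dominant circle
        param1=100, param2=30,          # Canny / accumulator thresholds
        minRadius=40, maxRadius=200,
    )
    if circles is None:                 # fall back to the whole image
        return cv2.resize(img, (out_size, out_size))
    x, y, r = circles[0][0].astype(int)
    pad = int(1.5 * r)                  # margin around the detected disc
    h, w = img.shape[:2]
    crop = img[max(0, y - pad):min(h, y + pad),
               max(0, x - pad):min(w, x + pad)]
    return cv2.resize(crop, (out_size, out_size))
```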
Statistical Analysis

Validation was carried out using the testing datasets. The ResNet algorithm was built using all data in the training dataset, and the area under the receiver operating characteristic curve (AROC) was calculated. The AROCs were compared using DeLong's method.15 Holm's method16,17 was used to correct P values for the problem of multiple testing.
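For the evaluation, the sketch below computes AROCs and applies Holm's correction, assuming scikit-learn and statsmodels; because neither library ships DeLong's test, a bootstrap comparison of correlated AROCs stands in for it here.

```python
# Sketch of the AROC evaluation and Holm correction (assumed libraries;
# a bootstrap stands in for DeLong's test here).
import numpy as np
from sklearn.metrics import roc_auc_score
from statsmodels.stats.multitest import multipletests

def auc_diff_pvalue(y, score_a, score_b, n_boot=10000, seed=0):
    """Two-sided bootstrap P value for a difference in AROC."""
    rng = np.random.default_rng(seed)
    diffs = []
    n = len(y)
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, n)        # resample eyes with replacement
        if len(np.unique(y[idx])) < 2:     # an AUC needs both classes
            continue
        diffs.append(roc_auc_score(y[idx], score_a[idx]) -
                     roc_auc_score(y[idx], score_b[idx]))
    diffs = np.asarray(diffs)
    return min(1.0, 2 * min((diffs <= 0).mean(), (diffs >= 0).mean()))

# Holm's method over the three comparisons against the fully augmented model:
raw_p = [0.004, 0.030, 0.060]              # placeholder P values
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
```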
Results

Demographic data of the subjects in the testing datasets are summarized in Tables 2 and 3. The mean deviation (MD) values of glaucomatous eyes were −4.3±5.5 (mean ± standard deviation) (95% confidence interval [CI], −5.4 to −3.2) decibels (dB) in testing dataset 1 and −14.1±7.1 (−16.0 to −12.2) dB in testing dataset 2, respectively. Details on the fundus cameras used to capture images in all the datasets are summarized in Table 4.

Figure 2 shows the receiver operating characteristic curve obtained with both testing datasets combined. The AROC with ResNet was 96.5% (95% CI, 94.9–98.1) with both structural and color augmentation. Significantly smaller AROC values were obtained when no type of image augmentation was applied (90.5% [87.5–93.4]) and when only structural augmentation (95.0% [93.0–97.1]) or only color augmentation was applied (94.5% [92.3–96.7]); P values were < 0.001, < 0.001, and 0.0053, respectively (DeLong's method with adjustment for multiple comparisons).

Figure 3 shows the receiver operating characteristic curve obtained with testing dataset 1 only. The AROC with ResNet was 94.8% (95% CI, 92.1–97.5) with both structural and color augmentation. Smaller AROC values were obtained when no type of image augmentation was applied (87.7% [82.8–92.6]) and when only structural augmentation (91.8% [88.4–95.8]) or only color augmentation was applied (92.1% [88.2–95.5]); P values were 0.0050, 0.062, and 0.069, respectively (DeLong's method with adjustment for multiple comparisons).

Figure 4 shows the receiver operating characteristic curve obtained with testing dataset 2. The AROC with ResNet was 99.4% (95% CI, 98.6–100.0) with both structural and color augmentation. Smaller AROC values were obtained when no type of augmentation was applied (94.5% [91.3–97.6]) and when only structural augmentation (98.4% [96.4–99.8]) or only color augmentation was applied (98.1% [97.1–99.7]); P values were 0.0047, 0.025, and 0.025, respectively (DeLong's method with adjustment for multiple comparisons).

Table 5 shows the AROC values with ResNet, Inception-v3, VGG11, and VGG16. In testing dataset 1, the area under the curve (AUC) value with ResNet was significantly larger than that of VGG11 (91.6% [87.8–95.5], P = 0.033), but not significantly different from those with Inception-v3 (92.3% [88.6–96.0], P = 0.074) and VGG16 (93.7% [90.4–96.9], P = 0.28). There was no significant difference between ResNet and the other 3 models (Inception-v3, VGG11, and VGG16) in testing dataset 2 and in all eyes (P > 0.05).


Table 1. Parameters Used in the Residual Network

Momentum SGD
  Learning rate:        0.001
  Dropout:              0.5 (only last fully connected layer)
  Batch size:           64
  Damping coefficient:  0.9
  Weight decay:         0.0001

SGD = stochastic gradient descent.

Table 2. Demographics of Subjects in Testing Dataset 1

                                Glaucoma         Normal           P Value
N                               95               110
Age (yrs)
  Mean ± SD                     78.7 ± 8.1       76.4 ± 8.8       0.0268*
  95% CI                        77.1–80.4        74.8–78.1
Sex
  Men, n (%)                    30 (32)          48 (44)          0.0848†
  Women, n (%)                  65 (68)          62 (56)
Eye
  Right, n (%)                  49 (52)          64 (58)          0.3986†
  Left, n (%)                   46 (48)          46 (42)
IOP (mmHg)
  Mean ± SD                     12.8 ± 3.4       13.8 ± 2.6       0.0244*
  95% CI                        12.1–13.5        12.3–14.3
No. of glaucoma medications
  Mean ± SD                     1.3 ± 0.9        –
  95% CI                        1.1–1.5          –
Spherical equivalent refractive error (D)
  Mean ± SD                     −0.7 ± 1.9       0.3 ± 1.6        0.0001*
  95% CI                        −1.1 to −0.3     0.0 to +0.6
vC/D ratio
  Mean ± SD                     0.74 ± 0.12      0.54 ± 0.15      <0.0001*
  95% CI                        0.71–0.76        0.51–0.56
cpRNFLT (μm)
  Mean ± SD                     75.3 ± 16.7      91.9 ± 11.6      <0.0001*
  95% CI                        71.9–78.7        89.7–94.1
mIRT (μm)
  Mean ± SD                     75.8 ± 9.3       91.0 ± 9.2       <0.0001*
  95% CI                        73.9–77.7        89.3–92.8
HFA MD (dB)
  Mean ± SD                     −4.3 ± 5.5       –                –
  95% CI                        −5.4 to −3.2     –

CI = confidence interval; cpRNFLT = circumpapillary retinal nerve fiber layer thickness; D = diopter; HFA = Humphrey Field Analyzer; IOP = intraocular pressure; MD = mean deviation; mIRT = macular inner retinal thickness; SD = standard deviation; vC/D = vertical cup-to-disc; – = no data.
P values are calculated between the normative and glaucomatous image groups by unpaired t test (*) for the continuous variables or by the Fisher exact probability test (†) for the categoric variables.

Table 3. Demographics of Subjects in Testing Dataset 2

                                Glaucoma         Normal           P Value
n                               93               78
Age (yrs)
  Mean ± SD                     66.1 ± 13.1      51.6 ± 23.3      <0.0001*
  95% CI                        63.4–68.8        46.3–56.9
Sex
  Men, n (%)                    51 (54.8)        39 (50.0)        0.5425†
  Women, n (%)                  42 (45.2)        39 (50.0)
Eye
  Right, n (%)                  47 (50.5)        44 (56.4)        0.5384†
  Left, n (%)                   46 (49.5)        34 (43.6)
IOP (mmHg)
  Mean ± SD                     14.8 ± 4.1       14.6 ± 4.4       0.7647*
  95% CI                        13.9–15.5        13.6–15.6
No. of glaucoma medications
  Mean ± SD                     2.5 ± 1.3        0.051 ± 0.45     <0.0001*
  95% CI                        2.2–2.8          −0.050 to 0.15
Spherical equivalent refractive error (D)
  Mean ± SD                     −3.7 ± 3.2       −1.0 ± 2.6       <0.0001*
  95% CI                        −4.3 to −3.0     −1.5 to −0.36
vC/D ratio
  Mean ± SD                     0.91 ± 0.08      0.61 ± 0.11      <0.0001*
  95% CI                        0.89–0.92        0.55–0.66
cpRNFLT (μm)
  Mean ± SD                     65.1 ± 10.0      100.7 ± 9.2      <0.0001*
  95% CI                        63.1–67.2        98.7–102.8
mIRT (μm)
  Mean ± SD                     68.4 ± 11.1      97.5 ± 8.4       <0.0001*
  95% CI                        66.1–70.7        95.6–99.4
HFA MD (dB)
  Mean ± SD                     −14.1 ± 7.1      −1.2 ± 2.2       <0.0001*
  95% CI                        −16.0 to −12.2   −2.0 to −0.46
Axial length (mm)
  Mean ± SD                     25.8 ± 2.2       23.7 ± 1.2       0.0004*
  95% CI                        24.8–26.8        23.2–24.3

BCVA = best-corrected visual acuity; CI = confidence interval; cpRNFLT = circumpapillary retinal nerve fiber layer thickness; D = diopter; HFA = Humphrey Field Analyzer; IOP = intraocular pressure; logMAR = logarithm of the minimum angle of resolution; MD = mean deviation; mIRT = macular inner retinal thickness; SD = standard deviation; vC/D ratio = vertical cup-to-disc ratio.
P values are calculated between the 2 groups by 1-way analysis of variance (*) for continuous variables and by the chi-square test (†) for categoric variables.

Discussion

We previously reported that a ResNet deep learning model is useful to diagnose glaucoma in fundus photographs;7 this ResNet model was trained using images from 1364 eyes with glaucoma and 1768 eyes of normative subjects using the Kowa nonmyd WX camera. In the current study, the usefulness of the ResNet deep learning model with image augmentation was investigated using fundus images obtained using different fundus cameras (multiple manufacturers) across multiple institutes. Testing dataset 1 was obtained using the Kowa nonmyd 7 camera, and testing dataset 2 was composed of images from the Topcon TRC-50DX camera. These testing datasets were drawn from hospitals different from the hospital in the training dataset. In this study, we observed high AROC values in these testing datasets (94.8% and 99.4%, respectively).


Table 4. Comparisons of Fundus Cameras Used in the Training Dataset, Testing Dataset 1, and Testing Dataset 2

                   Institute                      Camera                                        Resolution        Sensor
Training dataset   Matsue Red Cross Hospital      nonmyd WX, Kowa Company, Ltd (Aichi, Japan)   2144×1424 pixels  CCD
Testing dataset 1  Iinan City Hospital            nonmyd 7, Kowa Company, Ltd                   1936×1296 pixels  CMOS
Testing dataset 2  Hiroshima University Hospital  TRC-50DX, Topcon Co Ltd (Tokyo, Japan)        1460×1424 pixels  CCD

CCD = charge-coupled device; CMOS = complementary metal-oxide-semiconductor.

After successes in the application of deep learning methods to screen for diabetic retinopathy,18,19 studies have now suggested the usefulness of this method to screen for glaucoma.3-5,7 In particular, we have recently validated the usefulness of a deep learning approach in nonhighly myopic glaucoma and highly myopic glaucoma.7 This is important for patients of Asian origin, including Japanese, because myopia is more common in these populations20,21 and myopia is a risk factor for the development of glaucoma.22-25 Furthermore, the detection of glaucoma is a more challenging task in highly myopic patients because their optic discs are morphologically different from those of nonhighly myopic eyes.26,27

A wide variety of fundus cameras are used in the clinical setting. Thus, the diagnostic performance of any algorithm designed to screen for glaucoma from fundus photographs should be validated across fundus cameras and different conditions. For example, images may be systematically different according to the institute where they were captured and, moreover, according to the photographer who captured them. Previous studies3,4 from different groups analyzed larger numbers of fundus images; however, the smaller training dataset size in this study did not limit the success of the ResNet algorithm. The ResNet model achieved a high AROC in the testing datasets despite different pixel resolutions (1936×1296 pixels in testing dataset 1 and 1460×1424 pixels in testing dataset 2) from that used in the training dataset (2144×1424 pixels). Furthermore, each fundus camera differs not only in resolution but also in the sensor: the nonmyd WX and TRC-50DX use a charge-coupled device (CCD), whereas a complementary metal-oxide-semiconductor (CMOS) sensor is used in the nonmyd 7 (Table 4). Both CCD and CMOS sensors are used in conventional digital cameras, but there are large differences between these 2 types of sensors. Both CCD and CMOS devices transform the light from one small portion of the image into electrons; however, in a CCD the charge is transported across the chip and read at 1 corner of the array, whereas in a CMOS sensor there are several transistors at each pixel that amplify and move the charge using traditional wires.28 As a result, CCD sensors usually create high-quality, low-noise images, whereas CMOS sensors traditionally are more susceptible to noise. Also, the light sensitivity of a CMOS chip tends to be lower than that of a CCD, because each pixel on a CMOS sensor has several transistors located next to it, and many of the photons hitting the chip hit the transistors instead of the photodiode. Despite these differences across the training dataset and 2 testing datasets, high AROC values (94.8% and 99.4%) were obtained in both testing datasets (Figs 2–4). The relatively smaller AROC value in testing dataset 1 (HFA MD: −4.3±5.5 dB) than in testing dataset 2 (HFA MD: −14.1±7.1 dB) is likely attributable to the difference in the stages of glaucoma.

The size of the training dataset used is smaller than that used in previous studies that have reported on the usefulness of deep learning methods to diagnose glaucoma from fundus photographs.3-5 Augmentation is a technique to increase the size of the training data by adding minor alterations to the original data,9 and so this method is particularly useful when the size of the training dataset is small.9 Current results suggested that the diagnostic performance of the ResNet algorithm was greatly improved using structural and color image augmentations (Figs 2–4). It should be noted that in all the ResNet models built in the current study, a fine-tuning (“transfer learning”) technique was used whereby the parameters of ResNet are pretrained using a different but massive dataset before the training process. This method is now widely used to acquire a generalized feature representation with a small training dataset, and indeed we have recently reported the usefulness of this approach in diagnosing glaucoma using OCT images.29 The current results suggest that image augmentation is a useful approach even after the fine-tuning process.

Figure 2. Receiver operating characteristic curve obtained with a combined testing dataset (testing dataset 1 and testing dataset 2). The area under the receiver operating characteristic curve (AROC) with Residual Network (ResNet) was 96.5% (95% confidence interval [CI], 94.9–98.1) with (iii) both structural and color augmentation. Significantly smaller AROC values were obtained with no augmentation (90.5% [87.5–93.4]), (i) structural augmentation only (95.0% [93.0–97.1]), and (ii) color augmentation only (94.5% [92.3–96.7]), with P values of < 0.001, < 0.001, and 0.0053, respectively (DeLong's method with adjustment for multiple comparisons).


In the current study, images were cropped around the optic disc before training the ResNet model, and features from outside the optic disc were not used. Nerve fiber layer defects, optic disc hemorrhages, and other features may be omitted by this cropping procedure; however, the robustness of the ResNet model may rely on this process, because other information from the retina could amplify differences across fundus cameras. In addition, a previous study suggested that the accuracy of a glaucoma diagnosis is hampered in eyes with other retinal diseases,4 whereas our ResNet model is less affected by this issue.

In our previous study,7 it was reported that the usefulness of other machine learning methods (a CNN with 16 layers, similar to VGG16,13 a support vector machine,30 and a Random Forest31) was poorer compared with the ResNet model. Further, the diagnostic performance of residents in ophthalmology was also lower than that of the proposed ResNet model. Testing datasets 1 and 2 represented early and moderate stages of glaucoma on average, as shown by the MD values of −4.3±5.5 dB in testing dataset 1 and −14.1±7.1 dB in testing dataset 2 (Tables 2 and 3). The obtained AROC values were high in both datasets, in particular with the image augmentation method, but there was a marginal difference between the 2 groups (94.8% and 99.7%). This difference likely reflects the different stages of glaucoma in these groups. A future study may be needed to shed further light on the diagnostic performance of deep learning for the diagnosis of glaucoma using fundus photographs; however, our results validated the usefulness of this approach in early (testing dataset 1) and moderate (testing dataset 2) glaucoma. In the current study, the diagnostic performances of other CNN models (VGG11 and VGG16) and a newer model (Inception-v3) were also investigated. As a result, there was no significant difference among the AROC values of ResNet, Inception-v3, and VGG16, but the AROC value with VGG11 was significantly smaller than that with ResNet in testing dataset 1. This finding was not observed in testing dataset 2; there was no significant difference across the 4 models. This difference may be due to the difference of the glaucoma stages in testing datasets 1 and 2.

Table 5. Comparisons of the Area Under the Receiver Operating Characteristic Curve Obtained with Residual Network, Inception-v3, VGG11, and VGG16 Models

               All Eyes             Testing Dataset 1     Testing Dataset 2
ResNet         96.5% [94.9–98.1]    94.8% [92.1–97.5]     99.4% [98.6–100.0]
Inception-v3   95.7% [93.8–97.6]    92.3% [88.6–96.0]     99.1% [98.2–100.0]
VGG11          95.5% [93.5–97.5]    91.6% [87.8–95.5]*    99.4% [98.8–100.0]
VGG16          96.4% [94.7–98.1]    93.7% [90.4–96.9]     99.5% [98.8–100.0]

ResNet = Residual Network.
In testing dataset 1, the area under the curve (AUC) value with ResNet was significantly larger than that of VGG11 (91.6% [87.8–95.5], P = 0.033), but not significantly different from those with Inception-v3 and VGG16. There was no significant difference between ResNet and the other 3 models (Inception-v3, VGG11, and VGG16) in testing dataset 2 and in all eyes.
*P < 0.05 vs. ResNet.

Figure 3. Receiver operating characteristic curve using testing dataset 1 (N = 205). The area under the receiver operating characteristic curve (AROC) with Residual Network (ResNet) was 94.8% (95% confidence interval, 92.1–97.5) with (iii) both structural and color augmentation. A significantly smaller AROC value was obtained with no augmentation (87.7% [82.8–92.6]), but not with (i) structural augmentation only (91.8% [88.4–95.8]) or (ii) color augmentation only (92.1% [88.2–95.5]); the P values were 0.0050, 0.062, and 0.069, respectively (DeLong's method with adjustment for multiple comparisons).

Figure 4. Receiver operating characteristic curve using testing dataset 2 (N = 172). The area under the receiver operating characteristic curve (AROC) with Residual Network (ResNet) was 99.4% (95% confidence interval, 98.6–100.0) with (iii) both structural and color augmentation. Significantly (P < 0.05, DeLong's method with adjustment for multiple comparisons) smaller AROC values were obtained with no augmentation (94.5% [91.3–97.6]), (i) structural augmentation only (98.4% [96.4–99.8]), and (ii) color augmentation only (98.1% [97.1–99.7]), with P values of 0.0047, 0.025, and 0.025, respectively.


Study Limitations

One significant limitation of the current study is that the model has not been validated in different ethnicities. This is an important limitation because fundus images from different ethnicities may have different features, not only in retinal color but also in optic disc structure. Our ResNet model may be robust to these differences; however, a future study is needed to confirm this.

In conclusion, the usefulness of a deep residual learning algorithm to automatically screen for glaucoma from fundus photographs was validated using images obtained from different fundus cameras. The algorithm had a high diagnostic ability irrespective of the type of fundus camera. The potential impact of the current algorithm as a screening tool cannot be overstated, because 2-dimensional fundus photography is commonly used at screening centers and at optician and internal medicine clinics.

References

1. Hitchings RA, Spaeth GL. The optic disc in glaucoma. I: Classification. Br J Ophthalmol. 1976;60:778–785.
2. Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18:1527–1554.
3. Ting DSW, Cheung CY, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318:2211–2223.
4. Li Z, He Y, Keel S, et al. Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs. Ophthalmology. 2018;125:1199–1206.
5. Liu S, Graham SL, Schulz A, et al. A deep learning-based algorithm identifies glaucomatous discs using monoscopic fundus photographs. Ophthalmol Glaucoma. 2018;1:15–22.
6. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. arXiv:1512.03385; 2015.
7. Shibata N, Tanito M, Mitsuhashi K, et al. Development of a deep residual learning algorithm to screen for glaucoma from fundus photography. Sci Rep. 2018;8:14665.
8. Huang D, Swanson EA, Lin CP, et al. Optical coherence tomography. Science. 1991;254:1178–1181.
9. Russakovsky O, Deng J, Su H, et al. ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis. 2015;115:211–252.
10. Japan Glaucoma Society. http://www.ryokunaisho.jp/english/guidelines.html. Accessed April 14, 2019.
11. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, June 22–24, 2009, Miami Beach, Florida.
12. Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? Adv Neural Inf Process Syst. 2014;27:3320–3328.
13. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556; 2014.
14. Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. http://arxiv.org/pdf/1512.00567v3.pdf. Accessed April 14, 2019.
15. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
16. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6:65–70.
17. Aickin M, Gensler H. Adjusting for multiple testing when reporting research results: the Bonferroni vs Holm methods. Am J Public Health. 1996;86:726–728.
18. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316:2402–2410.
19. Takahashi H, Tampo H, Arai Y, et al. Applying artificial intelligence to disease staging: deep learning for improved staging of diabetic retinopathy. PLoS One. 2017;12:e0179790.
20. Rudnicka AR, Owen CG, Nightingale CM, et al. Ethnic differences in the prevalence of myopia and ocular biometry in 10- and 11-year-old children: the Child Heart and Health Study in England (CHASE). Invest Ophthalmol Vis Sci. 2010;51:6270–6276.
21. Sawada A, Tomidokoro A, Araie M, et al. Refractive errors in an elderly Japanese population: the Tajimi study. Ophthalmology. 2008;115:363–370.e3.
22. Mitchell P, Hourihan F, Sandbach J, Wang JJ. The relationship between glaucoma and myopia: the Blue Mountains Eye Study. Ophthalmology. 1999;106:2010–2015.
23. Suzuki Y, Iwase A, Araie M, et al. Risk factors for open-angle glaucoma in a Japanese population: the Tajimi Study. Ophthalmology. 2006;113:1613–1617.
24. Xu L, Wang Y, Wang S, Jonas JB. High myopia and glaucoma susceptibility: the Beijing Eye Study. Ophthalmology. 2007;114:216–220.
25. Perera SA, Wong TY, Tay WT, et al. Refractive error, axial dimensions, and primary open-angle glaucoma: the Singapore Malay Eye Study. Arch Ophthalmol. 2010;128:900–905.
26. How AC, Tan GS, Chan YH, et al. Population prevalence of tilted and torted optic discs among an adult Chinese population in Singapore: the Tanjong Pagar Study. Arch Ophthalmol. 2009;127:894–899.
27. Samarawickrama C, Mitchell P, Tong L, et al. Myopia-related optic disc and retinal changes in adolescent children from Singapore. Ophthalmology. 2011;118:2050–2057.
28. Carlson BS. Comparison of modern CCD and CMOS image sensor technologies and systems for low resolution imaging. Proc IEEE Sensors. 2002;1:171–176.
29. Asaoka R, Murata H, Hirasawa K, et al. Using deep learning and transfer learning to accurately diagnose early-onset glaucoma from macular optical coherence tomography images. Am J Ophthalmol. 2019;198:136–145.
30. Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, UK: Cambridge University Press; 2000.
31. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

Footnotes and Financial Disclosures


Originally received: February 7, 2019.
Final revision: March 22, 2019.
Accepted: March 22, 2019.
Available online: April 1, 2019. Manuscript no. 2019-30.

1 Department of Ophthalmology, The University of Tokyo, Tokyo, Japan.
2 Department of Ophthalmology, Shimane University Faculty of Medicine, Shimane, Japan.
3 Queue Inc., Tokyo, Japan.
4 Department of Ophthalmology, Graduate School of Medical Science, Kitasato University, Sagamihara, Kanagawa, Japan.
5 Department of Ophthalmology and Visual Science, Hiroshima University, Hiroshima, Japan.

Financial Disclosure(s):
The author(s) have made the following disclosure(s): N.S., M.T., K.M., H.M., and R.A.: Co-inventors on a patent for the deep learning system used in this study (Tokugan 2017-196870); potential conflicts of interest are managed according to institutional policies of the University of Tokyo.

Supported in part by the Translational Research program, Strategic PRomotion for practical application of INnovative medical Technology (TR-SPRINT), from the Japan Agency for Medical Research and Development (AMED); Grants 17K11418 and 18KK0253 from the Ministry of Education, Culture, Sports, Science, and Technology of Japan; and Japan Science and Technology Agency (JST) CREST JPMJCR1304.

HUMAN SUBJECTS: Human subjects were included in this study. The human ethics committees at the Matsue Red Cross Hospital, IInan Hospital, Hiroshima University Hospital, and the Faculty of Medicine at the University of Tokyo approved the study. All research adhered to the tenets of the Declaration of Helsinki. The ethics committees of Matsue Red Cross Hospital and IInan Hospital waived the requirement for the patients' informed consent regarding the use of their medical record data in accordance with the regulations of the Japanese Guidelines for Epidemiologic Study issued by the Japanese Government, and the protocol was posted at the outpatient clinic to notify participants about the research.

No animal subjects were used in this study.

Author Contributions:
Conception and design: Asaoka, Tanito, Murata, Kiuchi
Analysis and interpretation: Asaoka, Tanito, Shibata, Mitsuhashi, Nakahara, Murata, Tokumo, Kiuchi
Data collection: Asaoka, Tanito, Shibata, Mitsuhashi, Nakahara, Fujino, Matsuura, Tokumo, Kiuchi
Obtained funding: Asaoka, Tanito, Shibata, Mitsuhashi, Murata
Overall responsibility: Asaoka, Tanito, Shibata, Mitsuhashi, Nakahara, Fujino, Matsuura, Murata, Tokumo, Kiuchi

Abbreviations and Acronyms:
AROC = area under the receiver operating characteristic curve; CCD = charge-coupled device; CI = confidence interval; CMOS = complementary metal-oxide-semiconductor; CNN = convolutional neural network; dB = decibels; HFA = Humphrey Field Analyzer; MD = mean deviation; ResNet = Residual Network.

Correspondence:
Ryo Asaoka, MD, PhD, Department of Ophthalmology, The University of Tokyo Graduate School of Medicine, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan. E-mail: rasaoka-tky@umin.ac.jp.

Pictures & Perspectives

Pseudoexfoliation after Cataract Surgery


A 60-year-old woman presented to our clinic with suspected cataract in the right eye (RE). Slit-lamp examination revealed a nuclear cataract in the RE and an in-the-bag acrylic intraocular lens (IOL) in the left eye (LE). There were white granular deposits, arranged radially and in clumps, suggesting pseudoexfoliation (PXF) material over the IOL (Fig). PXF material had not been present preoperatively in either eye. This occurrence of a classic PXF distribution on an IOL after cataract surgery is rare. Careful examination is mandatory in the presence of PXF because it may be associated with glaucoma, weak zonules, poor mydriasis, and excess postoperative inflammation. The Figure (diffuse and retroillumination) of the LE shows the radial spoke-like deposits (yellow arrows) and a clump of granular deposits of pseudoexfoliation material (white arrow) over the intraocular lens. (A magnified version of the Figure is available online at www.ophthalmologyglaucoma.org.)
MAHESH BHARATHI, MS
RASHMI KRISHNAMURTHY, DNB
L V Prasad Eye Institute, Hyderabad, India

