Purpose: To validate a deep residual learning algorithm to diagnose glaucoma from fundus photography
using different fundus cameras at different institutes.
Design: Cross-sectional study.
Participants: A training dataset consisted of 1364 color fundus photographs with glaucomatous indications
and 1768 color fundus photographs without glaucomatous features. Two testing datasets consisted of (1) 95
images of 95 glaucomatous eyes and 110 images of 110 normative eyes, and (2) 93 images of 93 glaucomatous
eyes and 78 images of 78 normative eyes.
Methods: A deep learning algorithm known as Residual Network (ResNet) was used to diagnose glaucoma
using a training dataset. The 2 testing datasets were obtained using different fundus cameras (different manu-
facturers) across multiple institutes. The size of the training data was artificially increased by adding minor al-
terations to the original data, known as “image augmentation.” Diagnostic accuracy was assessed using the area
under the receiver operating characteristic curve (AROC).
Main Outcome Measures: Area under the receiver operating characteristic curve.
Results: When image augmentation was not used, the AROC was 94.8% (90.3e96.8) in the first testing dataset
and 99.7% (99.4e100.0) in the second dataset. These AROC values were significantly (P < 0.05) smaller without
augmentation (87.7% [82.8e92.6] in the first testing dataset and 94.5% [91.3e97.6] in the second testing dataset).
Conclusions: The previously developed deep residual learning algorithm achieved high diagnostic performance with different fundus cameras across multiple institutes, in particular when image augmentation was used. Ophthalmology Glaucoma 2019;2:224-231 © 2019 by the American Academy of Ophthalmology
There is no doubt that early diagnosis of glaucoma is important for preventing blindness, because glaucoma causes irreversible visual impairment. In glaucoma, morphologic changes at the optic disc occur in typical patterns.1 The 2-dimensional fundus photograph is a basic ophthalmological screening tool for glaucoma; however, the diagnosis is currently based on subjective judgment. With the development of deep learning methods in imaging recognition research,2 there has been renewed interest in using deep learning to diagnose eye diseases. Recent studies have suggested the usefulness of applying a deep learning method known as the "convolutional neural network" (CNN) to diagnose glaucoma.3-5 A possibly more powerful deep learning method, deep residual learning for Image Recognition (Residual Network [ResNet]),6 is available, and we recently reported its usefulness for diagnosing glaucoma.7

The potential impact of an accurate algorithm for the early detection of glaucoma and prevention of blindness, using fundus photography, is enormous. This imaging technology is affordable and can be carried out at nonophthalmological facilities, including optician practices, screening centers, and internal medicine clinics, where high-tech imaging devices such as OCT are less readily available.8

In a recent study, we developed a ResNet model using fundus images obtained with a single fundus camera (nonmyd WX, Kowa Company, Ltd, Aichi, Japan) and validated its usefulness using a testing dataset of images obtained from the same camera. This is a concern because fundus images are not homogenous across cameras; for instance, pixel resolution and sensor properties are unique to each camera. Furthermore, all images in our previous study were obtained at a single institute. The purpose of the current study was to validate the usefulness of the newly developed ResNet algorithm for fundus images obtained using different fundus cameras (multiple manufacturers) across multiple institutes. In addition, it has been reported that increasing the size of a training dataset by adding minor alterations to the original data, known as "augmentation," is useful to improve diagnostic accuracy.9 Thus, we also investigated the usefulness of this technique to improve the performance of the ResNet model.

Methods

The study was approved by the Research Ethics Committee of Matsue Red Cross Hospital, IInan Hospital, Hiroshima University Hospital, and the Faculty of Medicine at the University of Tokyo. The ethics committee of Matsue Red Cross Hospital and IInan Hospital waived the requirement for the patient's informed consent regarding the use of their medical record data in accordance with the regulations of Japanese Guidelines for Epidemiologic Study issued by the Japanese Government. Instead, the protocol was posted at the outpatient clinic to notify participants about the research. This study was performed according to the tenets of the Declaration of Helsinki.

Participants

Training Dataset. The training dataset was inherited from our previous study.7 In short, 1364 glaucomatous and 1768 normative photographs, labeled according to the recommendations of the Japan Glaucoma Society Guidelines for Glaucoma,10 were obtained using a fundus camera (nonmyd WX, Kowa Company, Ltd) between February 2016 and October 2016 at Matsue Red Cross Hospital. All photographs were taken with an angle of view of 45° and a resolution of 2144×1424 pixels. Photographs that were defocused, unclear, too dark, too bright, or decentered from the posterior pole, or had other conditions that could interfere with a diagnosis of glaucoma were excluded. Photographs from 2 eyes of a patient were included if both photographs satisfied the criteria, but duplicate photographs of a single eye were excluded. Images with other optic nerve head and retinal pathologies were also excluded.

Testing Dataset 1. Testing dataset 1 consisted of 95 images of 95 glaucomatous eyes and 110 images of 110 normative eyes obtained from outpatients who visited Iinan Hospital (Iinan Town, Shimane Prefecture, Japan) between December 2017 and February 2018. Posterior fundus photographs were captured using the nonmyd 7 camera (Kowa Company, Ltd). The optic nerve head and macula were scanned using the RS-3000 OCT (Nidek, Gamagori, Japan) in the glaucoma mode. Refractive error was recorded using the KR-1 refract-keratometer (Topcon Co Ltd, Tokyo, Japan), and intraocular pressure (IOP) was determined using the Goldmann applanation tonometer. The glaucoma group was defined as having glaucomatous changes in fundus photographs and corresponding thinning in circumpapillary retinal nerve fiber layer thickness measurements or in macular inner retinal thickness measurements (outside of the OCT's normative range), and no other optic nerve/optic nerve head and retinal pathologies by fundus photographs and OCT images. Circumpapillary retinal nerve fiber layer thickness at the 3.45-mm diameter and vertical cup-to-disc ratio in the raster scanning over a 6 × 6-mm area centered on the optic disc, and macular inner retinal thickness within the 9-mm circle in the raster scanning over a 9 × 9-mm area centered on the foveal center, were obtained. In glaucomatous groups, the visual field (VF) was tested using the Humphrey Field Analyzer (HFA) Swedish Interactive Thresholding Algorithm central 30-2 program (Carl Zeiss Meditec, Dublin, CA); however, in this study, VF results were not considered in the diagnosis of glaucoma. The normal group was defined as being free of glaucomatous changes and retinal pathologies in both fundus photographs and OCT images. A diagnosis was independently judged by 3 ophthalmologists specializing in glaucoma (M.T., H.M., and R.A.). Photographs were excluded if the diagnoses of the 3 examiners did not agree. Thus, this testing dataset was prepared without considering VF defects, IOP level, and gonioscopic appearance.

Testing Dataset 2. Testing dataset 2 consisted of 93 images of 93 glaucomatous eyes derived from the glaucoma clinic, Department of Ophthalmology, Hiroshima University, from December 2017 to February 2018 and 78 images of 78 normative eyes of patients who visited Hiroshima University Hospital from September 2009 to July 2018. Posterior fundus photographs were captured using the TRC-50DX (Topcon Co Ltd) camera. The optic nerve head and macula were scanned using the RTVue Fourier-domain OCT system (Optovue, Inc, Fremont, CA). The protocol consisted of 1 horizontal scan of 7 mm in length, followed by 15 vertical scans of 7 mm in length at 0.5-mm intervals centered 1 mm temporal to the fovea. The retinal nerve fiber layer (RNFL) 3.45 mode of the RTVue FD system measures the peripapillary RNFL thickness along a 3.45-mm diameter circle around the optic disc. Refractive error was recorded using the KR-1 refract-keratometer (Topcon Co Ltd), and IOP was determined using the Goldmann applanation tonometer. The glaucomatous and normal groups were defined in the same manner as in testing dataset 1. The VF was tested using the HFA Swedish Interactive Thresholding Algorithm central 30-2 program (Carl Zeiss Meditec, Dublin, CA).

Structure and Training Strategy of Deep Neural Networks

In our recent report,7 we used a type of CNN known as ResNet,6 which is well known to be useful for image classification and feature extraction. We reported the usefulness of the ResNet algorithm (a scratch model) to accurately discriminate between glaucomatous and healthy eyes, trained with approximately 3000 fundus images labeled as glaucomatous or not. In ResNet, "identity skip connections" that skip 1 or more layers are used, and features are propagated to succeeding layers. This enables ResNet to facilitate a deeper and larger network, which is helpful to acquire more effective and conceptual features without overfitting.

Figure 1 summarizes the methods used in this study. We exploit 34 layers in ResNet and initialize training with pretrained weights, which were optimized for ImageNet classification.11 The network also inherits the input shape from ImageNet,11 that is, 224×224 pixels with red, green, and blue channels. Only the last fully connected layer is changed to output 2 values (glaucoma or normal). This methodology is inspired by recent successes in fine-tuning deep neural networks,12 whereby parameters of a network are first derived in a different but large pretraining dataset and then used to initialize training in a new and smaller training dataset. Fine-tuning is now a widely used approach to acquire a generalized feature representation with small training samples.

Ophthalmology Glaucoma Volume 2, Number 4, July/August 2019

Figure 1. Outline of the Residual Network (ResNet) model used in the current study. A total of 34 layers in ResNet were initialized with pretrained weights optimized for ImageNet classification.11 The network also inherits the input shape from ImageNet,11 that is, 224×224 pixels with red, green, and blue (RGB) channels. Only the last fully connected layer is changed to output 2 values (glaucoma or normal).

In addition to fine-tuning, we attempted further improvements of the model by applying image augmentation. It has been demonstrated that increasing the diversity of the training data by adding minor alterations to the original data, known as "augmentation," is useful to improve diagnostic accuracy.9 Various changes were made to the input images, as described next. For comparison, the other deep neural networks VGG11, VGG16,13 and Inception-v3 were also used. Inception-v314 is a CNN model consisting of 11 inception modules; this structure enabled a reduction of parameters in the model and, as a result, the size of the input data required to obtain an accurate diagnosis was decreased.14 These models were pretrained using the ImageNet classification11 and then fine-tuned using the training dataset with augmentations. For structural augmentation, images were vertically and horizontally inverted, each with a probability of 50%. Random affine transformations were also applied. The term "affine transformation" represents 3 transformations: (1) rotation with random angles from −10 degrees to +10 degrees, (2) vertical and horizontal translations to a maximum of 10% of the length, and (3) scaling edges to a random value from 224 to 256 pixels. Finally, 224×224 regions were randomly cropped from the transformed image. These transformations enhance spatial diversity among the training dataset and thus help the algorithm to be scale and position invariant. For color augmentation, the contrast, saturation, and hue of the images were modified randomly (range, 50%-150%, 90%-110%, and 90%-110%, respectively). This operation helps the algorithm be more robust in glaucomatous feature extraction.

Alongside the augmentation of training data, common preprocessing and postprocessing for input tensors were performed. Before training, raw fundus images were trimmed around the optic disc with halo exclusion, using vessel regions and the Hough transformation to find the circular area, that is, the optic disc, and resized to fit the network input (224×224 pixels). Also, just before images were fed into the deep learning network, images were normalized with the global average value and standard deviation of pixel values among the training images, because the pretrained ResNet expects normalized tensors. Further training settings are detailed in Table 1.

Statistical Analysis

Validation was carried out using the testing datasets. The ResNet algorithm was built using all data in the training dataset, and the area under the receiver operating characteristic curve (AROC) was calculated. The AROCs were compared using DeLong's method.15 Holm's method16,17 was used to correct P values for the problem of multiple testing.

Results

Demographic data of the subjects in the testing datasets are summarized in Tables 2 and 3. The mean deviation (MD) values of glaucomatous eyes were −4.3 ± 5.5 (mean ± standard deviation) (95% confidence interval [CI], −5.4 to −3.2) decibels (dB) in testing dataset 1 and −14.1 ± 7.1 (−16.0 to −12.2) dB in testing dataset 2, respectively. Details on the fundus cameras used to capture images in all the datasets are summarized in Table 4.

Figure 2 shows the receiver operating characteristic curve obtained with both testing datasets combined. The AROC with ResNet was 96.5% (95% CI, 94.9-98.1) with both structural and color augmentation. Significantly smaller AROC values were obtained when no type of image augmentation was applied (90.5% [87.5-93.4]) and when only structural augmentation (95.0% [93.0-97.1]) or only color augmentation was applied (94.5% [92.3-96.7]); P values were < 0.001, < 0.001, and 0.0053, respectively (DeLong's method with adjustment for multiple comparisons).

Figure 3 shows the receiver operating characteristic curve obtained with testing dataset 1 only. The AROC with ResNet was 94.8% (95% CI, 92.1-97.5) with both structural and color augmentation. Smaller AROC values were obtained when no type of image augmentation was applied (87.7% [82.8-92.6]) and when only structural augmentation (91.8% [88.4-95.8]) or only color augmentation was applied (92.1% [88.2-95.5]); P values were 0.0050, 0.062, and 0.069, respectively (DeLong's method with adjustment for multiple comparisons).

Figure 4 shows the receiver operating characteristic curve obtained with testing dataset 2. The AROC with ResNet was 99.4% (95% CI, 98.6-100.0) with both structural and color augmentation. Smaller AROC values were obtained when no type of augmentation was applied (94.5% [91.3-97.6]) and when only structural augmentation (98.4% [96.4-99.8]) or only color augmentation was applied (98.1% [97.1-99.7]); P values were 0.0047, 0.025, and 0.025, respectively (DeLong's method with adjustment for multiple comparisons).

Table 5 shows the AROC values with ResNet, Inception-v3, VGG11, and VGG16. In testing dataset 1, the area under the curve (AUC) value with ResNet was significantly larger than that of VGG11 (91.6% [87.8-95.5], P = 0.033), but not significantly different from those with Inception-v3 (92.3% [88.6-96.0], P = 0.074) and VGG16 (93.7% [90.4-96.9], P = 0.28). There was no significant difference between ResNet and the other 3 models of Inception-v3, VGG11, and VGG16 in testing dataset 2 and all eyes (P > 0.05).
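As an illustration of the structural augmentation described in Methods, the random flips and random 224×224 crop can be sketched in plain Python. This is a minimal sketch for exposition, not the authors' implementation; the rotation, scaling, and color-jitter steps are omitted for brevity, and the nested-list "image" stands in for an RGB pixel array.

```python
import random

def augment(image, out_size=224):
    """Structural augmentation sketch: horizontal and vertical inversion,
    each applied with 50% probability, followed by a random
    out_size x out_size crop (the paper crops 224 x 224 regions).
    `image` is a list of rows, each row a list of pixel values."""
    if random.random() < 0.5:
        image = [row[::-1] for row in image]   # horizontal inversion
    if random.random() < 0.5:
        image = image[::-1]                    # vertical inversion
    h, w = len(image), len(image[0])
    top = random.randint(0, h - out_size)      # random crop origin
    left = random.randint(0, w - out_size)
    return [row[left:left + out_size] for row in image[top:top + out_size]]
```

Each call produces a different spatial variant of the same fundus photograph, which is what lets the training set "grow" without new patients.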
Asaoka et al · Algorithm to Diagnose Glaucoma from Fundus Camera
Table 1. Parameters Used in the Residual Network
Table 3. Demographics of Subjects in Testing Dataset 2
Table 4. Comparisons of Fundus Cameras Used in the Training Dataset, Testing Dataset 1, and Testing Dataset 2
glaucoma and highly myopic glaucoma.7 This is important for patients of Asian origin, including Japanese, because myopia is more common in these populations20,21 and myopia is a risk factor for the development of glaucoma.22-25 Furthermore, the detection of glaucoma is a more challenging task in highly myopic patients because their optic discs are morphologically different from those of nonhighly myopic eyes.26,27

A wide variety of fundus cameras are used in the clinical setting. Thus, the diagnostic performance of any algorithm designed to screen for glaucoma from fundus photographs should be validated across fundus cameras and different conditions. For example, images may be systematically different according to the institute where they were captured and, moreover, according to the photographer who captured them. Previous studies3,4 from different groups analyzed larger numbers of fundus images; however, the smaller training dataset size in this study did not limit the success of the ResNet algorithm. The ResNet model achieved a high AROC in the testing datasets despite different pixel resolutions (1936×1296 pixels in testing dataset 1 and 1460×1424 pixels in testing dataset 2) from that used in the training dataset (2144×1424 pixels). Furthermore, each fundus camera differs not only in resolution but also in the sensor; the nonmyd WX and TRC-50DX use a charge-coupled device (CCD), whereas a complementary metal-oxide-semiconductor (CMOS) is used in the nonmyd 7 (Table 4). Both CCD and CMOS sensors are used in conventional digital cameras, but there are large differences between these 2 types of sensors. Both CCD and CMOS devices transform the light from one small portion of the image into electrons; however, in a CCD the charge is transported across the chip and read at 1 corner of the array, whereas in a CMOS sensor there are several transistors at each pixel that amplify and move the charge using traditional wires.28 As a result, CCD sensors usually create high-quality, low-noise images, whereas CMOS sensors traditionally are more susceptible to noise. Also, the light sensitivity of a CMOS chip tends to be lower than that of a CCD, because each pixel on a CMOS sensor has several transistors located next to it, and many of the photons hitting the chip hit the transistors instead of the photodiode. Despite these differences across the training dataset and 2 testing datasets, high AROC values (94.8% and 99.4%) were obtained in both testing datasets (Figs 2-4). The relatively smaller AROC value in testing dataset 1 (HFA MD: −4.3 ± 5.5 dB) than in testing dataset 2 (HFA MD: −14.1 ± 7.1 dB) would be due to the difference in the stages of glaucoma.

Figure 2. Receiver operating characteristic curve obtained with a combined testing dataset (testing dataset 1 and testing dataset 2). The area under the receiver operating characteristic curve (AROC) with Residual Network (ResNet) was 96.5% (95% confidence interval [CI], 94.9-98.1) with both structural and color augmentation. Significantly smaller AROC values were obtained with no augmentation (90.5% [87.5-93.4]), color augmentation only (94.5% [92.3-96.7]), and structural augmentation only (95.0% [93.0-97.1]), with P values of < 0.001, < 0.001, and 0.0053 (DeLong's method with adjustment for multiple comparisons).

The size of the training dataset used is smaller than that used in previous studies that have reported on the usefulness of deep learning methods to diagnose glaucoma from fundus photographs.3-5 Augmentation is a technique to increase the size of the training data by adding minor alterations to the original data,9 and so this method is particularly useful when the size of the training dataset is small.9 Current results suggested that the diagnostic performance of the ResNet algorithm was greatly improved using structural and color image augmentations (Figs 2-4). It should be noted that in all the ResNet models built in the current study, a fine-tuning ("transfer learning") technique was used whereby the parameters of ResNet are pretrained using a different but massive dataset before the training process. This method is now widely used to acquire a generalized feature representation with a small training dataset, and indeed we have recently reported the usefulness of this approach in diagnosing glaucoma using OCT images.29 The current results suggest that image augmentation is a useful approach even after the fine-tuning process.

In the current study, images were cropped around the optic disc before training the ResNet model and features
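The AROC reported throughout is equivalent to the Mann-Whitney two-sample statistic: the probability that a randomly chosen glaucomatous eye receives a higher network score than a randomly chosen normal eye. A minimal pure-Python sketch of that computation follows; it illustrates the metric only and does not include the DeLong variance machinery used for the significance tests in the paper.

```python
def aroc(glaucoma_scores, normal_scores):
    """Empirical area under the ROC curve via the Mann-Whitney
    formulation: count the fraction of (glaucoma, normal) pairs in which
    the glaucomatous eye scores higher, with ties counted as one half."""
    pairs = len(glaucoma_scores) * len(normal_scores)
    wins = 0.0
    for g in glaucoma_scores:
        for n in normal_scores:
            if g > n:
                wins += 1.0
            elif g == n:
                wins += 0.5
    return wins / pairs
```

A perfectly separating classifier yields 1.0; a classifier no better than chance yields about 0.5, which is why the reported values in the high 90s indicate strong discrimination.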
Table 5. Area Under the Receiver Operating Characteristic Curve (95% Confidence Interval) with Each Model

Model         | All Eyes           | Testing Dataset 1   | Testing Dataset 2
ResNet        | 96.5% [94.9-98.1]  | 94.8% [92.1-97.5]   | 99.4% [98.6-100.0]
Inception-v3  | 95.7% [93.8-97.6]  | 92.3% [88.6-96.0]   | 99.1% [98.2-100.0]
VGG11         | 95.5% [93.5-97.5]  | 91.6% [87.8-95.5]*  | 99.4% [98.8-100.0]
VGG16         | 96.4% [94.7-98.1]  | 93.7% [90.4-96.9]   | 99.5% [98.8-100.0]

*Significantly smaller than ResNet (P < 0.05, DeLong's method).
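Holm's method, cited in Statistical Analysis for correcting the P values behind these comparisons, is a step-down procedure: sort the raw P values, multiply the k-th smallest by (m − k), clip at 1, and enforce monotonicity. A minimal sketch (an illustration, not the statistical software used in the study):

```python
def holm_adjust(p_values):
    """Holm's step-down adjustment for multiple comparisons.
    Returns adjusted P values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending ranks
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        adj = min(1.0, (m - rank) * p_values[idx])       # (m - k) * p_(k)
        running_max = max(running_max, adj)              # keep monotone
        adjusted[idx] = running_max
    return adjusted
```

For example, raw P values [0.01, 0.04, 0.03] adjust to [0.03, 0.06, 0.06]: the largest raw value can never end up with a smaller adjusted value than one below it.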
3 Queue Inc., Tokyo, Japan.
4 Department of Ophthalmology, Graduate School of Medical Science, Kitasato University, Sagamihara, Kanagawa, Japan.
5 Department of Ophthalmology and Visual Science, Hiroshima University, Hiroshima, Japan.

Financial Disclosure(s):
The author(s) have made the following disclosure(s): N.S., M.T., K.M., H.M., and R.A.: Co-inventors on a patent for the deep learning system used in this study (Tokugan 2017-196870); potential conflicts of interests are managed according to institutional policies of the University of Tokyo.

Supported in part by The Translational Research program, Strategic PRomotion for practical application of INnovative medical Technology (TR-SPRINT), from the Japan Agency for Medical Research and Development (AMED); Grant 17K11418 and 18KK0253 from the Ministry of Education, Culture, Sports, Science, and Technology of Japan; and Japan Science and Technology Agency (JST) CREST JPMJCR1304.

HUMAN SUBJECTS: Human subjects were included in this study. The human ethics committees at the Matsue Red Cross Hospital, IInan Hospital, Hiroshima University Hospital, and the Faculty of Medicine at the University of Tokyo approved the study. All research adhered to the tenets of the Declaration of Helsinki. The ethics committee of Matsue Red Cross Hospital and IInan Hospital waived the requirement for the patient's informed consent regarding the use of their medical record data in accordance with the regulations of Japanese Guidelines for Epidemiologic Study issued by the Japanese Government, and the protocol was posted at the outpatient clinic to notify participants about the research.

No animal subjects were used in this study.

Author Contributions:
Conception and design: Asaoka, Tanito, Murata, Kiuchi
Analysis and interpretation: Asaoka, Tanito, Shibata, Mitsuhashi, Nakahara, Murata, Tokumo, Kiuchi
Data collection: Asaoka, Tanito, Shibata, Mitsuhashi, Nakahara, Fujino, Matsuura, Tokumo, Kiuchi
Obtained funding: Asaoka, Tanito, Shibata, Mitsuhashi, Murata
Overall responsibility: Asaoka, Tanito, Shibata, Mitsuhashi, Nakahara, Fujino, Matsuura, Murata, Tokumo, Kiuchi

Abbreviations and Acronyms:
AROC = area under the receiver operating characteristic curve; CCD = charge-coupled device; CI = confidence interval; CMOS = complementary metal-oxide-semiconductor; CNN = convolutional neural network; dB = decibels; HFA = Humphrey Field Analyzer; MD = mean deviation; ResNet = Residual Network.

Correspondence:
Ryo Asaoka, MD, PhD, Department of Ophthalmology, The University of Tokyo Graduate School of Medicine, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan. E-mail: rasaoka-tky@umin.ac.jp.