
Clinical science

Br J Ophthalmol: first published as 10.1136/bjophthalmol-2020-317659 on 26 November 2020. Downloaded from http://bjo.bmj.com/ on February 9, 2024 by guest. Protected by copyright.
Convolutional neural network to identify
symptomatic Alzheimer’s disease using multimodal
retinal imaging
C. Ellis Wisely ‍ ‍,1 Dong Wang,2 Ricardo Henao ‍ ‍,3 Dilraj S. Grewal ‍ ‍,1
Atalie C. Thompson,1 Cason B. Robbins ‍ ‍,1 Stephen P. Yoon,1
Srinath Soundararajan,1 Bryce W. Polascik,1 James R. Burke,4 Andy Liu,4
Lawrence Carin,2 Sharon Fekrat ‍ ‍1

► Additional material is published online only. To view please visit the journal online (http://dx.doi.org/10.1136/bjophthalmol-2020-317659).

1Department of Ophthalmology, Duke University Health System, Durham, NC, USA
2Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina, USA
3Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
4Department of Neurology, Duke University Health System, Durham, NC, USA

Correspondence to Dr Sharon Fekrat, Department of Ophthalmology, Duke University, Durham, NC 27708-0187, USA; sharon.fekrat@duke.edu

A portion of this data was presented at the 2019 Association for Research in Vision and Ophthalmology (ARVO) annual meeting in Vancouver, Canada.

Received 10 August 2020
Revised 2 November 2020
Accepted 4 November 2020
Published Online First 26 November 2020

© Author(s) (or their employer(s)) 2022. No commercial re-use. See rights and permissions. Published by BMJ.

To cite: Wisely CE, Wang D, Henao R, et al. Br J Ophthalmol 2022;106:388–395.

ABSTRACT
Background/Aims To develop a convolutional neural network (CNN) to detect symptomatic Alzheimer's disease (AD) using a combination of multimodal retinal images and patient data.
Methods Colour maps of ganglion cell-inner plexiform layer (GC-IPL) thickness, superficial capillary plexus (SCP) optical coherence tomography angiography (OCTA) images, and ultra-widefield (UWF) colour and fundus autofluorescence (FAF) scanning laser ophthalmoscopy images were captured in individuals with AD or healthy cognition. A CNN to predict AD diagnosis was developed using multimodal retinal images, OCT and OCTA quantitative data, and patient data.
Results 284 eyes of 159 subjects (222 eyes from 123 cognitively healthy subjects and 62 eyes from 36 subjects with AD) were used to develop the model. Area under the receiver operating characteristic curve (AUC) values for predicted probability of AD for the independent test set varied by input used: UWF colour AUC 0.450 (95% CI 0.282, 0.592), OCTA SCP 0.582 (95% CI 0.440, 0.724), UWF FAF 0.618 (95% CI 0.462, 0.773), GC-IPL maps 0.809 (95% CI 0.700, 0.919). A model incorporating all images, quantitative data and patient data (AUC 0.836 (95% CI 0.729, 0.943)) performed similarly to a model incorporating all images only (AUC 0.829 (95% CI 0.719, 0.939)). A model incorporating GC-IPL maps, quantitative data and patient data achieved an AUC of 0.841 (95% CI 0.739, 0.943).
Conclusion Our CNN used multimodal retinal images to successfully predict diagnosis of symptomatic AD in an independent test set. GC-IPL maps were the most useful single inputs for prediction. Models including only images performed similarly to models also including quantitative data and patient data.

INTRODUCTION
Alzheimer's disease (AD) is a common neurodegenerative disease affecting 1.6% of the population in the USA. Expert estimates suggest 3.3% of Americans will be impacted by 2060.1 While brain imaging including MRI and positron emission tomography (PET), cerebrospinal fluid (CSF) biomarkers such as amyloid beta (Aβ) and tau, and serologic testing including genetic markers may each have diagnostic value, their limitations, including high cost, invasiveness, and limited sensitivity and specificity, have forestalled their routine use in clinical practice.1 2 Thus, AD remains primarily a clinical diagnosis.3

Abnormalities in the retina have been associated with the presence of neurodegenerative disease, and AD in particular.4 Amyloid beta plaques and tau neurofibrillary tangles, the hallmark neuropathological features of AD, have been observed in retinal tissues of persons with AD.5 6 A recent study, however, found that retinal Aβ levels in AD and in cognitively healthy older adults were similar, while retinal tau levels were elevated in AD.6 Thus, the retina may demonstrate characteristic structural changes in individuals with AD.

In fact, retinal imaging has identified structural changes in the neurosensory retina and the retinal microvasculature in persons with AD.6 Optical coherence tomography (OCT) imaging has highlighted changes in macular volume,7 8 ganglion cell-inner plexiform layer (GC-IPL) thickness,8–11 retinal nerve fibre layer (RNFL) thickness7–9 12 and subfoveal choroidal thickness.8 13 14 OCT changes in the ellipsoid zone and retinal pigment epithelium,15 macular volume,7 RNFL thickness16 and GC-IPL thickness16 have also demonstrated correlation with cognitive test scores in AD. Additionally, peripheral retinal changes including increased drusen and decreased retinal vascular complexity have been demonstrated using ultra-widefield (UWF) scanning laser ophthalmoscopy (SLO).17 18 OCT angiography (OCTA) has been used to detect retinal microvascular changes in AD, specifically decreased vessel density (VD) and perfusion density (PD) in the superficial capillary plexus (SCP)13 14 16 and increased foveal avascular zone (FAZ) area.13 14 Recent studies have also demonstrated clinical correlation between these OCTA parameters and Mini Mental State Examination (MMSE) scores.14 16 17

More recently, advances in machine learning have spawned significant efforts to facilitate the clinical diagnosis of AD. Published works have largely focused on machine learning analysis of neuroimaging including MRI18 19 or a combination of MRI and PET images.20 21 Other machine learning models have incorporated neuroimaging data along with laboratory data (eg, CSF biomarkers)22 or clinical data (eg, MMSE) to augment predictive capabilities.23 Machine learning models using MRI
388 Wisely CE, et al. Br J Ophthalmol 2022;106:388–395. doi:10.1136/bjophthalmol-2020-317659
and PET image inputs have reported 93.3%20 and 97.1%21 diagnostic accuracy when comparing patients with AD to cognitively normal controls. Another model using only MRI inputs reported 99% diagnostic accuracy for differentiating patients with AD from cognitively normal controls.19

While these studies demonstrate promising results, their clinical application is limited by the expense and time required to obtain neuroimaging data. Links between AD and retinal changes, coupled with the non-invasive, cost-effective and widely available nature of retinal imaging platforms, position multimodal retinal image analysis as an attractive alternative for AD diagnosis.24

In the present study, we acquired multimodal retinal images, including OCT, OCTA and UWF SLO colour and autofluorescence (FAF), in patients with an expert-confirmed clinical diagnosis of AD and in cognitively healthy controls in order to train, validate and test a convolutional neural network (CNN) that predicts the probability of AD, P(AD), for each eye. At the time of this publication, there were no published machine learning models using multimodal retinal image inputs for prediction of AD diagnosis. This exploratory study was designed as a proof of concept that symptomatic AD could be detected by a CNN when applied to an independent test dataset.

METHODS
Participants
Study participants were enrolled in a cross-sectional study (Clinicaltrials.gov, NCT03233646) approved by Duke Health System's Institutional Review Board. This study followed the tenets of the Declaration of Helsinki. Written informed consent was obtained for each subject.

Detailed methods describing patient eligibility and selection criteria have been previously published.16 In brief, exclusion criteria included a known history of non-AD dementia, diabetes mellitus, uncontrolled hypertension, demyelinating disorders, high myopia or hyperopia, glaucoma, age-related macular degeneration, other vitreoretinal pathology, any history of ocular surgery apart from cataract surgery and corrected ETDRS visual acuity <20/40 at the time of image acquisition.

Subjects with AD aged ≥50 years were enrolled from the Duke Memory Disorders Clinic. Subjects with AD were evaluated and clinically diagnosed by experienced neurologists, and all medical records were reviewed by one of two expert neurologists (AL and JRB) to confirm a clinical diagnosis of AD using the guidelines of the National Institute on Ageing and Alzheimer's Association.3

Cognitively healthy control subjects were either community volunteers aged ≥50 years without memory complaints or were participants from Duke's Alzheimer's Disease Research Center (ADRC) registry of cognitively normal community subjects. All subjects underwent corrected ETDRS visual acuity measurement and MMSE to evaluate cognitive function on the day of image acquisition.

Test set patients were imaged 1 year after the training and validation set patients. Test set patients were also imaged by different photographers than the training and validation set patients.

Image acquisition
Non-dilated subjects were imaged with the Zeiss Cirrus HD-5000 Spectral-Domain OCT with AngioPlex OCT Angiography (Carl Zeiss Meditec, Dublin, California, USA, V.11.0.0.29946).16 Colour map reports of GC-IPL thickness were generated using the Zeiss OCT software following automated segmentation. GC-IPL thickness maps with motion artefact or poor centration were excluded. SCP OCTA en face 3×3 mm and 6×6 mm images centred on the fovea were acquired. OCTA images with low signal strength (less than 7/10), poor centration, segmentation artefacts or motion artefacts were excluded. UWF fundus colour and FAF images were captured using a non-mydriatic UWF SLO (Optos California, Marlborough, Massachusetts, USA). Images with motion artefact or eyelid artefacts within the major vascular arcades were excluded.
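As described under 'Image cropping' below, the UWF frames acquired here were later reduced to their central 1800×1800 pixels to trim eyelid artefacts. A minimal sketch of how such a centred crop window could be computed follows; the helper name and interface are ours for illustration, not part of the study's software:

```python
def central_crop_box(width=4000, height=4000, crop=1800):
    """Return a (left, top, right, bottom) box for a centred crop.

    Mirrors the automated UWF cropping described in the paper:
    keep the central 1800x1800 pixels of a 4000x4000 frame.
    """
    left = (width - crop) // 2
    top = (height - crop) // 2
    return (left, top, left + crop, top + crop)

print(central_crop_box())  # (1100, 1100, 2900, 2900)
```

A box in this form could be handed to an image library's crop routine; the offsets simply centre the 1800-pixel window within the 4000-pixel frame.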

Figure 1 Illustration of the architecture of the end-to-end Alzheimer's disease (AD) prediction model. For each imaging modality, the model used a shared-weight image feature extractor (depicted in the dotted box) to create feature maps for each imaging modality (denoted as fOCTA, fColor and FAF and fGC-IPL) and passed this result to modality-specific fully connected layers (denoted as FCOCTA, FCColor and FAF and FCGC-IPL) to obtain preclassification scores for AD. The details of the image feature extractor can be found in online supplemental figure 1. For the clinical data, the model used a fully connected layer (denoted by FCother) to obtain a preclassification score. The model combines the preclassification scores from multimodal retinal images, OCT and OCT angiography quantitative data, and clinical data, and then scales it to the [0,1] probability range using a sigmoid function26 to obtain the final prediction result, referred to as probability of AD, P(AD). GC-IPL, ganglion cell-inner plexiform layer; FAF, fundus autofluorescence; OCT, optical coherence tomography; OCTA, optical coherence tomography angiography; UWF, ultra-widefield.
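The fusion step sketched in this caption (per-branch preclassification scores combined and passed through a sigmoid) can be illustrated with a small stdlib-only sketch. This is our own toy rendering, not the authors' code, and the score values are invented:

```python
import math

def p_ad(preclassification_scores):
    """Combine per-modality preclassification scores into P(AD).

    Averages one raw score per input branch (images, quantitative
    data, patient data) and maps the mean to the [0, 1] range with
    a sigmoid, as in the fusion step of figure 1.
    """
    mean_score = sum(preclassification_scores) / len(preclassification_scores)
    return 1.0 / (1.0 + math.exp(-mean_score))

# A mean score of zero maps to 0.5; strongly negative means map near 0.
print(round(p_ad([0.0, 0.0, 0.0]), 3))  # 0.5
```

The sigmoid guarantees a valid probability regardless of the magnitude of the raw scores, which is why the branches can emit unbounded preclassification values.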
Figure 2 Image montage from the right eye of an 85-year-old patient with Alzheimer's disease (AD). The probability of AD value, P(AD), for this eye was 0.344 (strongly predicting the subject carried a diagnosis of symptomatic AD based on figure 4A) when using the best model (GC-IPL, quantitative data and patient data). (A) 6×6 mm optical coherence tomography angiography (OCTA) superficial capillary plexus (SCP) image. Note the loss of density of the SCP surrounding the fovea. (B) Attention map for the 6×6 mm OCTA SCP image. The model demonstrates attention on the foveal avascular zone and on several areas in the temporal macula. Some foci in the inferior macula appear to correlate with areas of decreased vessel density. (C) Ganglion cell-inner plexiform layer (GC-IPL) thickness map showing diffuse GC-IPL thinning with an average thickness of 61 µm. (D) Ultra-widefield (UWF) scanning laser ophthalmoscopy (SLO) colour image. (E) UWF SLO autofluorescence image. (F) Retinal nerve fibre layer quadrant thickness map demonstrating the thickness of each region in microns with superior thinning.

Model development
A schematic of the structure of the CNN is shown in figure 1. The model was trained in end-to-end fashion to receive inputs of multimodal retinal images, OCT and OCTA quantitative data, and patient data to produce a score suggesting the probability that a patient carried a clinical diagnosis of symptomatic AD. As individual imaging modalities were expected to have different salient features, the model used a shared-weight image feature extractor to extract useful modality-agnostic features that were then fed to a modality-specific function (fully connected (FC) layer)25 to obtain modality-aware preclassification AD scores. UWF colour and FAF images were stacked in a common modality-specific function given that they share similar visual structures (figure 1). The image feature extractor (described in online supplemental figure 1) has a structure consistent with the first five layers of the ResNet1825 neural network. Figures 2 and 3 show examples of the multimodal retinal images (figures 2a, 2c-f; figures 3a, 3c-f) and attention maps (figures 2b, 3b) that were generated to visualise the discriminative image regions in OCTA images. A description of the attention mapping process is provided in the online supplemental appendix.

For patient data, the model used an FC layer (FCother in figure 1) to obtain the preclassification score for AD. The combined result was obtained by averaging the preclassification AD scores from the retinal images (UWF colour, UWF FAF, GC-IPL maps and OCTA) with those from the OCT and OCTA quantitative data and patient data, and was scaled to the [0, 1] probability range using a sigmoid function26 (figure 1). The model was trained to minimise binary cross-entropy loss (ie, to maximise the likelihood that multimodal inputs are correctly assigned to either the AD or control group).

The model was trained in PyTorch with the Adam optimizer27 for 100 epochs with an initial learning rate of 0.01 for FC layers and 0.0001 for other layers, whose parameters were initialised from the pretrained ResNet18. A decay of 0.5 was applied to the learning rates every 10 epochs. We tested different weights from 0.001 to 10 for the regularisation loss and found that 0.01 worked best. The training loss dropped slowly after epoch 40, and the validation loss stayed consistent during epochs 50–100. Due to our sample size, we stopped the training at epoch 100 to avoid overfitting. The final prediction results reported are averages over epochs 50–100, which represent conservative performance estimates to avoid bias in selection of the best performing epoch.

Table 1 describes the performance results for 11 individual models that were trained using various input combinations. The architecture of each model is the same as the model outlined in figure 1, with the omission of the portions corresponding to unused input modalities. For example, if the inputs included only GC-IPL and OCTA, then we kept the feature extractor and the FC layers for those two modalities (ie, the FC layers for GC-IPL and OCTA), dropped the other modules (ie, the FC layers for UWF and quantitative and patient data), and then trained the model.

Figure 3 Image montage from the left eye of a 72-year-old cognitively healthy control subject. The probability that the patient carried a diagnosis of symptomatic Alzheimer's disease (AD), P(AD), for this eye was 0.115 (strongly predicting that this was a control subject based on figure 4A) when using the best model (ganglion cell-inner plexiform layer (GC-IPL), quantitative data and clinical data). (A) 6×6 mm optical coherence tomography angiography (OCTA) superficial capillary plexus (SCP) image. (B) Attention map for the 6×6 mm OCTA SCP image. The model demonstrates attention on the foveal avascular zone as well as scattered foci elsewhere in the macula. Some foci of attention, particularly the superior patches, appear to correlate with areas of decreased vessel density on the OCTA SCP image. (C) GC-IPL thickness map showing normal thickness. Average thickness was 80 µm. (D) Ultra-widefield (UWF) scanning laser ophthalmoscopy (SLO) colour image. (E) UWF SLO autofluorescence image. (F) Retinal nerve fibre layer quadrant thickness map demonstrating the thickness of each region in microns without areas of thinning.

The data used to develop the model represented the intersection of each of the imaging datasets (ie, only included patients with full sets of images). Two hundred and eighty-four eyes from 159 patients, including 222 eyes from 123 cognitively healthy control subjects and 62 eyes from 36 AD subjects, were used to develop and evaluate the model (online supplemental table 1a-c). We initially imaged 216 eyes, which were used for training and validating the model. Subsequently, 68 independent eyes of 34 subjects were imaged and used for the test set. Online
Table 1 Predictive capabilities of each model in the convolutional neural network

Model inputs | UWF colour | UWF FAF | GC-IPL | OCTA | OCT/OCTA quant. data | Patient data | AUC on test set (95% CI)
UWF colour | x | | | | | | 0.450 (0.282 to 0.592)
UWF FAF | | x | | | | | 0.618 (0.462 to 0.773)
OCTA | | | | x | | | 0.582 (0.440 to 0.724)
GC-IPL | | | x | | | | 0.809 (0.700 to 0.919)
Quantitative data and patient data | | | | | x | x | 0.754 (0.625 to 0.882)
All images | x | x | x | x | | | 0.829 (0.719 to 0.939)
All images and quantitative data | x | x | x | x | x | | 0.830 (0.726 to 0.940)
OCTA and GC-IPL | | | x | x | | | 0.828 (0.718 to 0.938)
All images and all data | x | x | x | x | x | x | 0.836 (0.729 to 0.943)
GC-IPL, OCTA, quantitative data and patient data | | | x | x | x | x | 0.832 (0.724 to 0.940)
GC-IPL, quantitative data and patient data* | | | x | | x | x | 0.841 (0.739 to 0.943)

AUC on test set figures describe the performance of each model on predicting the probability of symptomatic AD diagnosis for each individual eye in the independent test set. AUC values represent average performance over epochs 50–100. An 'x' indicates that the input(s) were included in the model described in that row.
*Indicates best-performing model.
AD, Alzheimer's disease; AUC, area under the receiver operating characteristic curve; FAF, fundus autofluorescence; GC-IPL, ganglion cell-inner plexiform layer thickness maps; OCTA, optical coherence tomography angiography; UWF, ultra-widefield colour images.
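The per-eye AUC values above, and the Youden threshold used later to choose sensitivity and specificity operating points, can both be computed directly from labels and predicted P(AD) scores. The sketch below is our own stdlib-only illustration on invented data, not the study's XLSTAT analysis:

```python
def auc(labels, scores):
    """AUC as the probability a random AD eye outscores a random control eye.

    labels: 1 for an AD eye, 0 for a control eye; scores: predicted P(AD).
    Ties count half, matching the rank-based (Mann-Whitney) view of AUC.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(labels, scores):
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1."""
    best = None
    for t in sorted(set(scores)):
        sens = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t) / labels.count(1)
        spec = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t) / labels.count(0)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best

labels = [1, 1, 1, 0, 0, 0]              # 1 = AD eye, 0 = control eye (made up)
scores = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]  # illustrative P(AD) values
print(round(auc(labels, scores), 3))     # 0.889
t, j = youden_threshold(labels, scores)
print(t)                                  # 0.4
```

Scanning candidate thresholds over the observed scores is exactly how a Youden cut-off is read off a ROC curve; on real data one would scan the validation-set scores, as the paper does with figure 4B.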

supplemental table 2a-c describes the subjects included in the training, validation and test sets. For each eye, UWF colour images, UWF FAF images, 6×6 mm OCTA macular images and macular GC-IPL thickness colour map images were included. There was no overlap between subjects in the training, validation and test sets. Neither the model design nor its development was informed by the test set. The model did not learn from the test set and made no changes based on test set information. Performance of each model is reported in terms of area under the receiver operating characteristic curve (AUC), which represents the ability of the model to correctly predict a symptomatic AD diagnosis for each individual eye analysed. Sensitivity and specificity values are also reported for the best-performing model based on a threshold value determined using the Youden's index for our receiver operating characteristic (ROC) curve.

Image cropping
Analysis was conducted using the unedited image files for the UWF colour and UWF FAF images with automated cropping which preserved the central 1800×1800 pixels (originally 4000×4000 pixels). Cropping was performed to minimise eyelid artefacts. SCP OCTA and GC-IPL colour map images did not undergo any cropping or preprocessing.

Descriptive patient data and quantitative imaging data
Patient data included age, sex, ETDRS visual acuity converted to the logarithm of the minimal angle of resolution (logMAR) and years of education. The model also used quantitative OCT and OCTA imaging data derived from the Zeiss software (Carl Zeiss Meditec, Version 11.0.0.29946). References to 'OCT/OCTA quantitative data' or 'quantitative data' throughout the manuscript encompass OCT parameters including subfoveal choroidal thickness (manually measured on a foveal enhanced depth imaging scan by two masked graders, BP and CBR, and adjudicated by another masked grader, DSG), central subfield thickness, average GC-IPL thickness and average RNFL thickness; 3×3 mm OCTA parameters including area of the FAZ, FAZ circularity index (scale 0–1), SCP PD for the 3 mm ETDRS circle and 3 mm ring,16 and SCP VD in the 3 mm circle and 3 mm ring; and 6×6 mm OCTA parameters including PD and VD in the 6 mm circle, 6 mm outer ring and 6 mm inner ring.

Statistical analysis
Statistical analysis was performed to compare the OCT and OCTA quantitative data and patient data using XLSTAT (Addinsoft, Paris, France). χ2 tests were used to compare categorical variables, and Wilcoxon rank-sum tests were used to compare continuous variables.

RESULTS
Online supplemental table 1a-c summarises patient and quantitative data used by the model for AD and control subjects. Comparisons of OCT and OCTA parameters in AD and control subjects have been described in our previous work.16 Briefly, subjects with AD in the present study had significant differences in OCT parameters including GC-IPL thickness and subfoveal choroidal thickness (online supplemental table 1b). Significant differences in OCTA parameters included PD for the 3 mm ring, VD for the 3 mm circle and 3 mm ring, and PD for the 6 mm outer ring (online supplemental table 1c). Mean MMSE scores do not appear in online supplemental table 1, as they were not used as inputs in the CNN. MMSE scores were 29.2 (SD=1.2) for control subjects and 21.9 (SD=4.5) for subjects with AD (p<0.0001). Online supplemental table 2a-c includes a summary of the patient, OCT and OCTA quantitative data included in the model. Values for each input variable were similar for the training, validation and test sets. The model was trained, validated and tested with 11 different sets of inputs as described in table 1.

The best-performing model, including GC-IPL maps, quantitative data and patient data, achieved an AUC of 0.861 (95% CI 0.727, 0.995) on the validation set and 0.841 (95% CI 0.739, 0.943) on the test set.

P(AD) is also represented as box and whisker plots for AD and control subjects in figures 4A and 5A. Figure 4 describes the performance of the GC-IPL, quantitative data and patient data model. The ROC curve for this model's performance on the validation set (figure 4B) was used to calculate the Youden's index, 0.750, which was used to select sensitivity (100%) and specificity (75%) values. Figure 4A displays a Youden threshold value of 0.208, the P(AD) value from this input combination representing our decision point for determining the sensitivity and specificity values reported above. Figure 4C represents the
Figure 4 (A) Box and whisker plots of probability of Alzheimer's disease (AD), P(AD), predictions for the best-performing model: ganglion cell-inner plexiform layer (GC-IPL) thickness maps, quantitative data and patient data, for the cognitively healthy control and AD groups on the validation and test data sets. Validation set: AD median 0.281, IQR (0.208, 0.353); control median 0.117, IQR (0.073, 0.305). Test set: AD median 0.266, IQR (0.108, 0.526); control median 0.1207, IQR (0.052, 0.269). (B) Receiver operating characteristic (ROC) curve for P(AD) for each eye in the validation set (red line) and test set (blue line) using the GC-IPL thickness maps, quantitative data and patient data model. Area under the curve (AUC) values for the validation and test sets along with sensitivity and specificity values for the test set are reported. (C) Changes in AUC of the GC-IPL map, quantitative data and patient data model with each epoch during training; minimal variability is seen after epoch 40.

changes in performance of the GC-IPL map, quantitative data and patient data input model with each epoch. The model was trained for 100 epochs without using the test set, and we observed (figure 4C) that performance on the validation set was stable (flat) after epoch 50. After setting an epoch interval of 50–100, for which performance was known to be stable, we ran predictions on the test set for the entire epoch set (1–100). Given that the test set performance was also stable from epochs 50–100, we presented test set results by averaging predictions from epochs 50–100 (table 1).

Figure 5 compares the performance of several different models, including all images and all data; GC-IPL maps, quantitative data and patient data (labelled as Best Model); all images only; and OCTA images and GC-IPL maps. Each of these box

Figure 5 (A) Box and whisker plots of probability of Alzheimer's disease (AD), P(AD), for the following models: all images and all data; GC-IPL, quantitative data and clinical data (labelled as Best Model); all images; and OCTA and GC-IPL on the validation and test sets. Each of these box and whisker plots is labelled with a Youden threshold value for that model calculated using the corresponding receiver operating characteristic (ROC) curve in figure 5B. (B) Comparison of ROC curves for selected models.
and whisker plots is labelled with a Youden threshold value for that input combination calculated using the corresponding ROC curve in figure 5B.

DISCUSSION
To our knowledge, this is the first attempt to develop a CNN to detect AD diagnosis using multimodal retinal imaging. This study represents proof of concept that machine learning analysis of multimodal retinal images has potential for future clinical application. Our best-performing model, using retinal images, OCT and OCTA quantitative data, and patient data, achieved an AUC of 0.841 on an independent test set. Additional training using images from a larger, more diverse population with known confounders may further strengthen performance.

The decision to incorporate OCT and OCTA inputs in our CNN was based on our prior statistical comparison of parameters from these modalities in AD and control subjects.16 Images used in the training and validation sets in the present study overlapped, in part, with those images assessed in cohorts reported in our prior work.16 However, the images in the independent test set included herein had not yet been obtained at the time of our prior publication.

When considering performance on the independent validation and testing sets demonstrated in table 1, evaluation of OCTA images and GC-IPL map images with our CNN demonstrated better AUC values than UWF colour or UWF FAF image inputs alone. GC-IPL maps demonstrated the best performance as a single input, with AUC 0.809 (0.700, 0.919). Particularly when considering the 95% CIs demonstrated in table 1 for UWF images, neither colour nor FAF images used as single inputs demonstrated a consistent ability to differentiate AD and control subjects. Various models outlined in table 1 also suggest low utility of UWF fundus images for predicting AD diagnosis, as the two best-performing models excluded UWF images. Furthermore, the OCTA and GC-IPL map model without UWF images outperformed the combination of all images. Apart from the UWF images, comparison of the input combinations in table 1 suggests that adding more data modalities provided incremental improvement in predictive value.

A concern inherent in smaller datasets is that when models use a smaller number of images for training, they may not learn to consistently recognise all of the salient features in the imaging modality input. To address this concern, we added quantitative information in the OCTA quantitative data that is based on the OCTA SCP images already used as model inputs. This same type of redundancy was included in the quantitative data for GC-IPL thickness via addition of GC-IPL thickness colour maps. We suspected this redundancy might improve the model's performance by maximising the useful information extracted from each imaging modality. Interestingly, we found that our all images model and our OCTA and GC-IPL model performed similarly to our all images and all data model and our GC-IPL, OCTA, quantitative data and patient data models, which contain this intentional redundancy (table 1). The strong performance of our models that used only images, without addition of supportive quantitative data, suggests that an effective diagnostic model using only multimodal images without quantitative summary data or patient data could predict that a subject carries a diagnosis of symptomatic AD.

Model structure
To use multimodal retinal images, our model employed an image feature extractor to detect useful features from each imaging modality. The final probability of whether an eye carries a diagnosis of symptomatic AD is a combination of signals from each of the multimodal retinal images, OCT and OCTA quantitative data, and patient data, scaled by the sigmoid activation function.26 The image feature extractor is based on a CNN model. While the promising performance of CNNs typically relies on a huge amount of annotated data, our study population had 284 eyes, and each UWF image contained 3 240 000 pixels (1800×1800).

A concern with training a CNN-based model with a dataset of this size is overfitting, meaning the model performs very well on the training data (overfitting the observed data) but performs poorly on testing (unobserved or new data). To reduce overfitting, we carefully designed the model architecture, using L2 and L1 regularisation, data augmentation and transfer learning.28 29 We have included figure 4C to allay overfitting concerns, as figure 4C demonstrates: (1) the model has learnt, as performance on the validation set improves over the initial epochs; (2) the performance on the test set stabilises after epoch 50; (3) validation and test performance metrics do not decrease after the model reaches peak performance. If overfitting were occurring, we would expect validation and test set performance metrics to decrease after reaching peak performance rather than stabilising.

Another concern inherent in smaller datasets is that when models have a smaller number of images to use for training purposes, they may not learn to consistently recognise all of the salient features present in the imaging modality inputs. To address this concern, we added quantitative information from OCTA SCP images and GC-IPL thickness colour maps. We suspected this redundancy would improve the model's performance by maximising the useful information extracted from each imaging modality. Details of the model structure and the techniques used to maximise the smaller dataset can be reviewed in the online supplemental appendix.

Limitations
In reviewing our imaging data, many UWF images included eyelid artefacts. Initially, we attempted to address this by generating image masks to hide artefacts (online supplemental appendix). The use of masked images provided no advantage over cropped images in the predictive value of the model, so we elected for standardised automated cropping of UWF images rather than masking. Despite these attempts to improve UWF data quality, UWF colour and FAF images consistently contributed little predictive value. Prior research using UWF colour fundus imaging in patients with AD has demonstrated peripheral retinal alterations including increased peripheral drusen deposition30 31 and decreased branching complexity throughout the retinal vasculature.30 Thus, using cropped images may have deprived the model of potentially pertinent information in the peripheral retina.

To determine what information the model identified and used in the images to make decisions, attention maps were created. Consistent with the relatively lower AUC values associated with use of the UWF images, UWF attention maps demonstrated inconsistent, multifocal areas of attention (examples in online supplemental figure 2). However, review of the OCTA image attention maps demonstrated consistent foci of attention on the FAZ (figures 2B and 3B). Prior OCTA studies have demonstrated retinal microvascular changes in AD, including increased FAZ area,13 14 which may explain our model's consistent attention to this area. Other scattered foci of attention were found throughout the
Wisely CE, et al. Br J Ophthalmol 2022;106:388–395. doi:10.1136/bjophthalmol-2020-317659 393
Clinical science

Br J Ophthalmol: first published as 10.1136/bjophthalmol-2020-317659 on 26 November 2020. Downloaded from http://bjo.bmj.com/ on February 9, 2024 by guest. Protected by copyright.
OCTA attention maps, which appear to correlate with areas of the translations (including but not limited to local regulations, clinical guidelines,
of decreased capillary density13 14 16 visible on the SCP OCTA terminology, drug names and drug dosages), and is not responsible for any error
and/or omissions arising from translation and adaptation or otherwise.
images, perhaps best illustrated by the two patches of attention
superiorly in figure 3B. ORCID iDs
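The attention maps discussed above can be approximated with a generic occlusion-sensitivity probe: mask one image patch at a time and record how much the classifier's score drops. The paper does not specify how its attention maps were computed, so the `occlusion_map` function, the toy classifier and the mock central "FAZ-like" region below are illustrative assumptions rather than the study's implementation; a minimal sketch in pure Python:

```python
# Illustrative sketch only: occlusion sensitivity is one generic way to ask
# which image regions a classifier relies on. The toy "model" and the mock
# central region stand in for the study's CNN and the FAZ; neither is taken
# from the paper.

def occlusion_map(model, image, patch=4, baseline=0.0):
    """Score drop per masked patch; a larger drop marks a more salient region."""
    h, w = len(image), len(image[0])
    base = model(image)
    heat = [[0.0] * (w // patch) for _ in range(h // patch)]
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = [row[:] for row in image]   # copy, then mask one patch
            for r in range(i, i + patch):
                for c in range(j, j + patch):
                    occluded[r][c] = baseline
            heat[i // patch][j // patch] = base - model(occluded)
    return heat

def toy_model(img):
    # Stand-in classifier that only "looks at" the central 8x8 region.
    return sum(img[r][c] for r in range(12, 20) for c in range(12, 20))

image = [[1.0] * 32 for _ in range(32)]
heat = occlusion_map(toy_model, image)
peak = max((v, i, j) for i, row in enumerate(heat) for j, v in enumerate(row))
# The peak of the map falls on one of the four central patches.
```

In this sketch, only the four patches overlapping the mock central region reduce the score when masked, so the heat map peaks there, analogous to how the study's OCTA attention maps concentrated on the FAZ.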
In our patient cohort, neither brain MRI and PET images nor APOE ε4 status and CSF biomarkers were routinely used for diagnosis of symptomatic AD. This approach may leave some uncertainty in the differentiation of AD from other less common types of dementia; however, all participants with AD were evaluated by an experienced memory disorders clinician prior to enrolment, and a clinical diagnosis of AD was confirmed using National Institute on Ageing/Alzheimer's Association clinical guidelines.3 Medical records of individuals with AD were reviewed by one of two experienced neurologists (AL or JB) to determine a final diagnosis. Another potential limitation to the generalisability of our model was that our dataset only included patients without known ocular disease in order to exclude potential confounders. Given the higher incidence of some ocular diseases in AD populations,32 it will be important in future studies to demonstrate diagnostic accuracy of the model in patients with common ocular diseases, such as glaucoma and diabetic retinopathy. Additionally, in the present dataset, we prescreened all images for quality and excluded poor quality images. To create an algorithm that can be used clinically for prediction of AD, our next-generation model will need to identify and exclude poor quality images. Finally, our model performance results demonstrated in table 1 exhibit relatively large variance. We suspect this can be attributed to our sample size limitations.

Our CNN used multimodal retinal images, with and without OCT and OCTA quantitative data and selected clinical data, for automated detection of symptomatic AD and achieved a best AUC of 0.841 using an independent test dataset. GC-IPL maps were the most useful single inputs for prediction. Based on our promising findings, we plan to collect more multimodal retinal images from a larger and more diverse cohort, which may improve the diagnostic performance of our model and potentially allow future translation of this model into clinical settings. Strong performance of models containing only GC-IPL map images and OCTA images without quantitative data suggests that a CNN using only OCT and OCTA imaging inputs may have utility for prediction of a diagnosis of symptomatic AD without quantitative or clinical data supplementation. However, it remains uncertain how such retinal imaging models will compare to the diagnostic accuracy of established methods for AD diagnosis, such as neuroimaging and CSF or serologic biomarkers.

Contributors CEW was the first author and drafted the manuscript. DW, RH and LC coauthored the manuscript, provided critical review, and designed the CNN. ACT, SS, CBR, SY, JB, BWP and AL were involved in patient recruitment, data management, imaging and review of the manuscript. ACT performed statistical analysis. DSG and SF designed the study, coauthored the manuscript and critically reviewed the manuscript.

Funding This research was supported in part by the Alzheimer's Drug Discovery Foundation.

Competing interests None declared.

Patient consent for publication Not required.

Provenance and peer review Not commissioned; externally peer reviewed.

Data availability statement Data from this study are available upon reasonable request from our corresponding author, Dr. Sharon Fekrat, sharon.fekrat@duke.edu.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

ORCID iDs
C. Ellis Wisely http://orcid.org/0000-0001-7675-689X
Ricardo Henao http://orcid.org/0000-0003-4980-845X
Dilraj S. Grewal http://orcid.org/0000-0002-2229-5343
Cason B. Robbins http://orcid.org/0000-0001-7909-510X
Sharon Fekrat http://orcid.org/0000-0003-4403-5996

REFERENCES
1 Blennow K, Zetterberg H. Biomarkers for Alzheimer's disease: current status and prospects for the future. J Intern Med 2018.
2 Liao H, Zhu Z, Peng Y. Potential utility of retinal imaging for Alzheimer's disease: a review. Front Aging Neurosci 2018;10:188.
3 McKhann GM, Knopman DS, Chertkow H, et al. The diagnosis of dementia due to Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement 2011;7:263–9.
4 London A, Benhar I, Schwartz M. The retina as a window to the brain - from eye research to CNS disorders. Nat Rev Neurol 2013;9:44–53.
5 Hart NJ, Koronyo Y, Black KL, et al. Ocular indicators of Alzheimer's: exploring disease in the retina. Acta Neuropathol 2016;132:767–87.
6 Heringa SM, Bouvy WH, van den Berg E, et al. Associations between retinal microvascular changes and dementia, cognitive functioning, and brain imaging abnormalities: a systematic review. J Cereb Blood Flow Metab 2013;33:983–95.
7 Cipollini V, Abdolrahimzadeh S, Troili F, et al. Neurocognitive assessment and retinal thickness alterations in Alzheimer disease: is there a correlation? J Neuroophthalmol 2019.
8 Chan VTT, Sun Z, Tang S, et al. Spectral-domain OCT measurements in Alzheimer's disease: a systematic review and meta-analysis. Ophthalmology 2019;126:497–510.
9 Lad EM, Mukherjee D, Stinnett SS, et al. Evaluation of inner retinal layers as biomarkers in mild cognitive impairment to moderate Alzheimer's disease. PLoS One 2018;13:e0192646.
10 Cheung CY-L, Ong YT, Hilal S, et al. Retinal ganglion cell analysis using high-definition optical coherence tomography in patients with mild cognitive impairment and Alzheimer's disease. J Alzheimers Dis 2015;45:45–56.
11 Choi SH, Park SJ, Kim NR. Macular ganglion cell-inner plexiform layer thickness is associated with clinical progression in mild cognitive impairment and Alzheimer's disease. PLoS One 2016;11:e0162202.
12 Lu Y, Li Z, Zhang X, et al. Retinal nerve fiber layer structure abnormalities in early Alzheimer's disease: evidence in optical coherence tomography. Neurosci Lett 2010;480:69–72.
13 Grewal DS, Polascik BW, Hoffmeyer GC, et al. Assessment of differences in retinal microvasculature using OCT angiography in Alzheimer's disease: a twin discordance report. Ophthalmic Surg Lasers Imaging Retina 2018;49:440–4.
14 Bulut M, Kurtuluş F, Gözkaya O, et al. Evaluation of optical coherence tomography angiographic findings in Alzheimer's type dementia. Br J Ophthalmol 2018;102:233–7.
15 Uchida A, Pillai JA, Bermel R, et al. Outer retinal assessment using spectral-domain optical coherence tomography in patients with Alzheimer's and Parkinson's disease. Invest Ophthalmol Vis Sci 2018;59:2768–77.
16 Yoon SP, Grewal DS, Thompson AC, et al. Retinal microvascular and neurodegenerative changes in Alzheimer's disease and mild cognitive impairment compared with control participants. Ophthalmol Retina 2019;3:489–99.
17 Almeida ALM, Pires LA, Figueiredo EA, et al. Correlation between cognitive impairment and retinal neural loss assessed by swept-source optical coherence tomography in patients with mild cognitive impairment. Alzheimers Dement 2019;11:659–69.
18 Amoroso N, Diacono D, Fanizzi A, et al, for the Alzheimer's Disease Neuroimaging Initiative. Deep learning reveals Alzheimer's disease onset in MCI subjects: results from an international challenge. J Neurosci Methods 2018;302:3–9.
19 Basaia S, Agosta F, Wagner L, et al. Automated classification of Alzheimer's disease and mild cognitive impairment using a single MRI and deep neural networks. Neuroimage Clin 2019;21:101645.
20 Liu M, Cheng D, Wang K, et al, for the Alzheimer's Disease Neuroimaging Initiative. Multi-modality cascaded convolutional neural networks for Alzheimer's disease diagnosis. Neuroinformatics 2018;16:295–308.
21 Shi J, Zheng X, Li Y, et al. Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer's disease. IEEE J Biomed Health Inform 2018;22:173–83.
22 Leuzy A, Heurling K, Ashton NJ, et al. In vivo detection of Alzheimer's disease. Yale J Biol Med 2018;91:291–300.
23 Spasov S, Passamonti L, Duggento A, et al, for the Alzheimer's Disease Neuroimaging Initiative. A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer's disease. Neuroimage 2019;189:276–87.

24 Cheung CY-L, Ikram MK, Chen C, et al. Imaging retina to study dementia and stroke. Prog Retin Eye Res 2017;57:89–107.
25 He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016:770–8.
26 Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics, 2008.
27 Kingma DP, Ba JL. Adam: a method for stochastic optimization. International Conference on Learning Representations 2015:1–15.
28 Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge, MA: The MIT Press, 2016.
29 Zhang H, Goodfellow I, Metaxas D, et al. Self-attention generative adversarial networks. Stat 2018.
30 Csincsik L, MacGillivray TJ, Flynn E, et al. Peripheral retinal imaging biomarkers for Alzheimer's disease: a pilot study. Ophthalmic Res 2018;59:182–92.
31 Ukalovic K, Cao S, Lee S, et al. Drusen in the peripheral retina of the Alzheimer's eye. Curr Alzheimer Res 2018;15:743–50.
32 Lee CS, Apte RS. Retinal biomarkers of Alzheimer's disease. Am J Ophthalmol 2020.
