
Translational Stroke Research

https://doi.org/10.1007/s12975-020-00811-2

ORIGINAL ARTICLE

Stability Assessment of Intracranial Aneurysms Using Machine Learning Based on Clinical and Morphological Features
Wei Zhu¹ · Wenqiang Li¹ · Zhongbin Tian¹ · Yisen Zhang¹ · Kun Wang¹ · Ying Zhang¹ · Jian Liu¹ · Xinjian Yang¹

Received: 19 February 2020 / Revised: 19 March 2020 / Accepted: 19 March 2020


© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Machine learning (ML) is a novel approach that could help clinicians meet the challenge of accurately assessing the stability of unruptured intracranial aneurysms (IAs). We developed multiple ML models for IA stability assessment and compared their performances. We enrolled 1897 consecutive patients harboring 2067 IAs, of which 528 were unstable and 1539 were stable. Thirteen patient-specific clinical features and eighteen aneurysm morphological features were extracted to build support vector machine (SVM), random forest (RF), and feed-forward artificial neural network (ANN) models. The discriminatory performances of the models in IA stability assessment were compared with those of a statistical logistic regression (LR) model and the PHASES score. Based on the receiver operating characteristic (ROC) curve and the area under the curve (AUC) of each model in the test set, the AUC values for RF, SVM, and ANN were 0.850 (95% CI 0.806–0.893), 0.858 (95% CI 0.816–0.900), and 0.867 (95% CI 0.828–0.906), respectively, demonstrating good discriminatory ability. All ML models outperformed the statistical LR model and the PHASES score (AUC values 0.830 and 0.589, respectively; RF versus PHASES, P < 0.001; RF versus LR, P = 0.038). Important features contributing to stability discrimination included three clinical features (location, sidewall/bifurcation type, and presence of symptoms) and three morphological features (undulation index, height-width ratio, and irregularity). These findings demonstrate the potential of ML to augment clinical decision-making in IA stability assessment, which may enable better management of patients with IAs in the future.

Keywords Intracranial aneurysms · Risk evaluation · Artificial intelligence · Machine learning · Unstable aneurysm

This work originated from Beijing Neurosurgical Institute and Beijing Tian Tan Hospital, No. 119 South Fourth Ring West Road, Fengtai District, Beijing.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s12975-020-00811-2) contains supplementary material, which is available to authorized users.

* Jian Liu
jianliu_ns@163.com
* Xinjian Yang
yangxinjian@voiceoftiantan.org

¹ Department of Interventional Neuroradiology, Beijing Neurosurgical Institute and Beijing Tiantan Hospital, Capital Medical University, No. 119 South Fourth Ring West Road, Fengtai District, Beijing 100050, China

Introduction

Intracranial aneurysms (IAs) affect about 3–5% of the adult population [1]. As non-invasive imaging modalities have been increasingly used as diagnostic tools over the past decades, a growing number of incidental unruptured aneurysms are being detected [2]. As a consequence, clinicians face uncertainty in choosing the optimal management for such IAs. Preventive treatment carries a risk of procedure-related complications, whereas conservative management carries a risk of IA rupture. The ability to distinguish unstable IAs from stable ones accurately and promptly could facilitate treatment optimization and thus lead to better outcomes [3].

In recent years, many factors have been reported to be associated with unstable IAs with high rupture risk. These include patient-specific clinical features, such as gender, age, smoking, and hypertension [4, 5], and morphological features of IAs, such as maximum size, aspect ratio (AR), and size ratio (SR) [6, 7]. However, because the relationship between these risk factors and aneurysm status is complex and multifactorial, no consensus has been reached, and stability assessment remains difficult. The difficulty is further exacerbated by discrepancies between two-dimensional (2D) and three-dimensional (3D) projections [8, 9]. The
PHASES score is one major attempt to assess the 5-year rupture risk of unruptured IAs from a set of easily accessible patient and aneurysm characteristics [10]. The score is based on six rupture-related features: population, age, hypertension, size, earlier subarachnoid hemorrhage (SAH) history, and IA location. However, the score does not account for any morphological features, so it needs further improvement before it can be widely used in clinical settings [11].

Machine learning (ML), a branch of artificial intelligence (AI), has been widely used in medicine for image detection, diagnosis, and outcome prediction. ML can outperform traditional statistical methods on problems involving non-linear relationships and complex patterns [12, 13]. Previous studies have demonstrated the efficacy and potential of ML approaches in several medical fields [14–16]. For IA stability assessment, Liu et al. developed a two-layer feed-forward artificial neural network model to predict the rupture risk of anterior communicating artery aneurysms, reaching an overall prediction accuracy of 94.8% across all samples [17]. Kim et al. developed a convolutional neural network based on 3D digital subtraction angiography (DSA) to predict the rupture risk in patients harboring small aneurysms, with an overall accuracy of 76.84% [18]. Liu et al. used ML to predict aneurysm stability and found that flatness was the most important morphological determinant of aneurysm stability [19]. However, to the best of our knowledge, no previous study has compared ML models with a statistical method or the PHASES score in IA stability assessment. Furthermore, the correlations between IA stability and various rupture-related risk features have not been studied in detail or in a large sample.

In this study, we use a large dataset to build multiple ML models that discriminate IA stability (i.e., stable versus unstable) based on patient clinical features and aneurysm morphological features measured from 3D DSA. We compare the discriminatory ability of each ML model with those of the other models, a classic statistical logistic regression (LR) model, and the PHASES score. We further identify and compare the features that contribute significantly to IA stability in these models.

Method

Patient Population

This study included clinical and radiological data of patients diagnosed with IAs at Beijing Tiantan Hospital between January 2014 and December 2018. This study was approved by the Institutional Review Board of Beijing Tiantan Hospital, and written informed consent was obtained from each patient before operation. The patients' information was de-identified prior to analysis. We retrospectively reviewed the medical records and image data in our database. The inclusion criteria were (a) age over 18 years, (b) diagnosis of saccular aneurysm, and (c) availability of clinical and radiological data. The exclusion criteria were (a) diagnosis of traumatic, infectious, fusiform, or dissecting aneurysms; (b) moyamoya disease or vascular malformation; and (c) absence of clinical data or high-quality radiological data. An aneurysm was classified as unstable if it ruptured within 1 month or grew on sequential follow-up imaging. The process of patient selection is summarized in Fig. 1. A total of 2067 IAs in 1897 patients were included in this study, of which 528 were unstable and 1539 were stable.

Fig. 1 The flowchart shows patient selection, the process of discriminatory model generation, and final evaluation. SAH, subarachnoid hemorrhage; CV, cross-validation; ROC, receiver operating characteristic curve.

Clinical and Morphological Features

The following clinical features were collected from medical records: age, sex, smoking, drinking, hypertension, hyperlipidemia, diabetes mellitus, coronary heart disease, family history, presence of symptoms, and multiplicity, as well as IA location. IA locations were categorized as (a) internal carotid artery (ICA), (b) middle cerebral artery (MCA), (c) anterior cerebral artery (ACA), (d) anterior communicating artery (AComA), (e) posterior communicating artery (PComA), and (f) posterior circulation (PC).

To acquire precise and objective measurements, quantitative morphological features in our study were extracted and
measured from the reconstructed 3D DSA. Three-dimensional models of IAs were reconstructed from DSA in standard tessellation language (STL) format; the method of constructing and refining the 3D STL models has been described in detail previously [20, 21]. Morphological features were defined as in previous studies and calculated using GEOMAGIC 12.0 software (Geomagic, Morrisville, North Carolina) and MATLAB (The MathWorks, Inc., Natick, MA, USA). Blinded to patient information and stability status, two interventionalists (Zhongbin Tian and Kun Wang) independently evaluated the morphological features before group allocation; discordance was resolved by a third evaluator (Jian Liu, with more than 10 years of experience in neuroradiology). A total of eighteen morphological features that comprehensively describe the geometries of the IAs and parent vessels were measured. These included the maximum height (the maximum distance of the dome from the neck center) and the perpendicular height (the maximum perpendicular distance of the dome from the neck plane). They also included the neck diameter and the width (the maximum distance across the dome perpendicular to the maximum height). The remaining size measures were the transverse diameter (the maximum distance across the dome perpendicular to the perpendicular height), the maximal diameter (the largest distance within the aneurysm sac, used as the size), and the volume. In addition, the AR, SR, undulation index (UI), nonsphericity index (NSI), volume-to-neck ratio (VNR), height-width ratio, and bottleneck factor were defined and calculated as described in previous studies [22–25]. Features related to the parent vessel were also calculated [24]. These were the flow angle (the angle between the vector of the IA size and the vector of the centerline of the feeding parent vessel) and the aneurysm angle (the angle between the neck and the maximum height). An irregular shape was defined as the presence of small bleb(s), a bi- or multi-lobular sac, or protruding bulge(s) from the IA fundus. The location of the aneurysm relative to the parent vessel was dichotomized as sidewall type or bifurcation type. Detailed descriptions of all features are provided in the Supplementary Materials.

Machine Learning Models

Three binary classification ML models, RF, SVM, and feed-forward ANN, were built and applied to assess IA stability. All of these models are widely used and have shown good performance in clinical classification and outcome prediction studies. The models are described in detail in the Supplementary Materials.

The whole dataset was randomly divided into two subsets: a training set (1656 IAs, 80%) and a test set (411 IAs, 20%). The ratio of stable to unstable IAs was maintained in both subsets. The training set was used to process the input features and to generate the prediction models. The optimal model and hyperparameters were selected with 10-fold cross-validation on the training set. The test set was held out and not used during model training. It was later used to estimate each model's performance on new data in a generalizable way. The overall accuracy, sensitivity, and specificity of each model were calculated accordingly. This process is outlined in Fig. 1.

Statistical Logistic Regression Model

LR is a classic statistical regression model used to select significant variables and predict outcome probability. The variables for the binary LR model were initially selected and evaluated by univariate logistic regression. A classic LR model was then constructed with these variables and used to assess IA stability.

PHASES Score System

We aimed to compare the performances of the ML models with that of the previously reported PHASES score, which predicts the risk of rupture of unruptured IAs from six readily accessible risk factors. Here, IAs with a PHASES score less than or equal to 5 were considered stable with low rupture risk, which corresponds to a 5-year risk of IA rupture of 1.3% [10].

Statistical Analysis

Statistical analysis was performed on the clinical and morphological variables to identify those that differed significantly between the stable and unstable IA groups. For continuous variables, differences between the groups were tested using the Student t test or Mann-Whitney U test. For categorical variables, a chi-square test was used. Receiver operating characteristic (ROC) curves were generated, and area under the curve (AUC) values were calculated to compare the efficacies of the different models. Because the aim of our study was to validate the superiority of ML models relative to previous discriminatory models, the worst-performing ML model was selected for comparison with the statistical LR model and the PHASES score. ROC curves of different methods were compared using the asymptotically exact method described by DeLong et al. [26]. Statistical significance was defined as P < 0.05. All statistical analyses were performed with R software (version 3.6.1, R Foundation for Statistical Computing, Vienna, Austria).

Results

Demographic and Clinical Features

As shown in Fig. 1, 282 patients were excluded from this analysis because of age < 18 (n = 13); traumatic, infectious, fusiform, or dissecting IAs (n = 139); uncertain SAH history (n = 24); missing data (n = 17); and poor radiological quality
(n = 99). A total of 1897 patients with 2067 IAs were included. The mean age was 55.30 ± 10.32 years, and 1262 patients (66.5%) were female. Among the IAs, 1539 were stable (mean patient age 55.46 ± 9.81 years) and 528 were unstable (mean patient age 54.83 ± 11.67 years). Of the unstable IAs, 449 ruptured within 1 month and 79 grew during imaging follow-up (median 15.6 months; range 5–39 months).

The demographic and clinical features are summarized in Table 1. Male sex, smoking, drinking, hypertension, coronary heart disease, presence of symptoms, and positive family history were more common among patients with unstable IAs (P < 0.05). IA stability was significantly associated with location. The proportions of unstable IAs in the AComA and PComA were significantly higher than those of stable IAs (29.7% vs. 10.0% in the AComA; 13.8% vs. 9.9% in the PComA; P < 0.001). In contrast, stable IAs were more prevalent in the ICA than unstable IAs (54.8% vs. 33.9%). Multiple aneurysms were more common in the stable group than in the unstable group (P < 0.001). Regarding the location relative to the parent vessel, 71.2% (374 of 528) of unstable IAs were of the bifurcation type, implying that bifurcation-type IAs are less stable than sidewall-type IAs (P < 0.001).

Table 1 Demographic and clinical features of stable and unstable IAs
                          Stable (n = 1539)   Unstable (n = 528)   P value
Age (mean), years         55.46 ± 9.81        54.83 ± 11.67        0.229
Gender (female)           1055 (68.6%)        321 (60.8%)          0.001*
Hypertension              758 (49.3%)         324 (61.4%)          < 0.001*
Hyperlipidemia            144 (9.4%)          61 (11.6%)           0.152
Diabetes mellitus         159 (10.3%)         55 (10.4%)           0.934
Coronary heart disease    93 (6%)             53 (10.4%)           0.003*
Smoker                    307 (19.9%)         175 (33.1%)          < 0.001*
Drinker                   292 (19.0%)         163 (30.9%)          < 0.001*
Presence of symptoms      784 (50.9%)         434 (82.2%)          < 0.001*
Family history            110 (7.1%)          63 (11.9%)           0.001*
Multiplicity              592 (38.5%)         137 (25.9%)          < 0.001*
Location                                                           < 0.001*
  ICA                     843 (54.8%)         173 (33.9%)
  MCA                     189 (12.3%)         62 (11.7%)
  ACA                     64 (4.2%)           21 (4.0%)
  AComA                   154 (10.0%)         157 (29.7%)
  Posterior circulation   136 (8.8%)          36 (6.8%)
  PComA                   153 (9.9%)          73 (13.8%)
Sidewall/bifurcation                                               < 0.001*
  Sidewall                970 (63%)           154 (28.8%)
  Bifurcation             569 (37%)           374 (71.2%)
Irregularity              183 (11.9%)         159 (30.1%)          < 0.001*
*P < 0.05
ICA, internal carotid artery; MCA, middle cerebral artery; ACA, anterior cerebral artery; AComA, anterior communicating artery; PComA, posterior communicating artery

Table 2 Aneurysm morphological features of stable and unstable IAs
                      Stable (n = 1539)   Unstable (n = 528)   P value
Max diameter          6.71 ± 3.26         6.80 ± 2.76          0.548
Maximum height        4.78 ± 2.56         5.02 ± 2.22          0.055
Perpendicular height  4.09 ± 2.33         4.14 ± 1.85          0.658
Neck diameter         5.52 ± 2.29         5.02 ± 1.99          < 0.001*
Aneurysm width        5.41 ± 2.67         5.05 ± 2.16          0.001*
Transverse diameter   5.65 ± 2.77         5.34 ± 2.34          0.005*
Aneurysm volume       0.11 ± 0.28         0.08 ± 0.17          0.017*
Aneurysm angle        52.72 ± 20.87       49.70 ± 20.37        0.004*
Flow angle            99.39 ± 32.76       113.10 ± 29.36       < 0.001*
AR                    0.78 ± 0.41         0.89 ± 0.41          < 0.001*
SR                    1.68 ± 1.07         2.02 ± 1.04          < 0.001*
UI                    0.15 ± 0.06         0.18 ± 0.74          < 0.001*
NSI                   0.28 ± 0.11         0.31 ± 0.10          < 0.001*
VNR                   3.11 ± 4.12         3.11 ± 3.71          0.995
Height-width ratio    0.89 ± 0.21         1.01 ± 0.26          < 0.001*
Bottleneck factor     0.88 ± 0.38         0.95 ± 0.33          0.964
Vessel diameter       3.21 ± 0.85         2.72 ± 0.78          < 0.001*
*P < 0.05
AR, aspect ratio; SR, size ratio; UI, undulation index; NSI, nonsphericity index; VNR, volume-to-neck ratio

Aneurysm Morphological Features

The morphological features of the stable and unstable IAs are summarized in Table 2. Based on maximum size, 56 aneurysms were categorized as tiny (< 3 mm), 1299 as small (< 7 mm), 462 as medium (7–10 mm), and 250 as large (> 10 mm). The mean size of unstable IAs was 6.80 ± 2.76 mm, which was not significantly different from that of stable IAs (6.71 ± 3.26 mm, P = 0.548), indicating that unstable and stable IAs were of similar size. Among other size indices, the aneurysm width, neck diameter, transverse diameter, and volume differed significantly between the stable and unstable groups (P < 0.05). This inconsistent pattern across size indices suggests that the two groups differed mainly in aneurysm shape rather than in aneurysm size.

The shape indices AR, SR, UI, NSI, and height-width ratio were significantly higher in unstable IAs than in stable IAs (P < 0.001), indicating that unstable IAs more often had an abnormal shape than their stable counterparts. Regarding regularity, 30.1% (159 of 528) of unstable IAs exhibited irregular shapes, which is higher than the proportion of 11.9% (183 of
1539) in stable IAs, and this difference was statistically significant (P < 0.001).

Features related to the parent vessel, including the aneurysm angle and flow angle, also differed between stable and unstable IAs (P = 0.004 and P < 0.001, respectively). Furthermore, compared with stable IAs, unstable IAs tended to have a smaller parent vessel diameter (P < 0.05). This is consistent with the observed differences in IA location between the stable and unstable groups.

Comparison of Discriminatory Models in the Training Set

The ROC curve and AUC value of each model in the training set are shown in Fig. 2. The ANN model showed the best performance, with an AUC value of 0.851 (95% CI 0.828–0.873). The AUC values of the RF and SVM models were 0.832 (95% CI 0.809–0.855) and 0.840 (95% CI 0.817–0.863), respectively. The AUC value of the LR model was 0.811 (95% CI 0.788–0.833). The PHASES score had the worst performance, with an AUC value of 0.615 (95% CI 0.586–0.644). Among the three ML models, the RF model performed worst. However, its AUC was still significantly higher than those of the LR model and the PHASES score (P = 0.045 and P < 0.001, respectively). This implies that the ML models performed better than the LR model and the PHASES score in the training set.

Fig. 2 The receiver operating characteristic curves of the machine learning models, the classic logistic regression model, and the PHASES score system in the training set. The area under the curve (AUC) values are displayed as a bar graph at the bottom. There were significant differences between machine learning and the PHASES score, between the statistical model and the PHASES score, and between machine learning and the statistical model.

The accuracy metrics (sensitivity, specificity, and accuracy) of each model are listed in Table 3. The specificities of the ML models and the classic statistical model showed no evident difference (range 88.9–95.4%). However, the sensitivity (50.8–56.7% for the ML models vs. 37.2% for LR) and the accuracy (82.6–85.5% for the ML models vs. 75.7% for LR) were considerably higher for the ML models, by up to about 20 percentage points. These accuracy metrics further indicate that the ML models performed better than the statistical method in IA stability assessment.
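The model comparisons above rely on DeLong's test for two correlated ROC curves [26], which the study ran in R. To make the computation concrete, the following is a minimal pure-Python sketch (our illustration, not the authors' code). It assumes that both models score the same IAs, that label 1 marks an unstable IA and 0 a stable one, and that each model's AUC is computed as the Mann-Whitney statistic. The function names (`structural_components`, `delong_test`, `confusion_metrics`), the decision threshold, and the toy scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def structural_components(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, List[float], List[float]]:
    """Return (AUC, V10, V01) for one model; label 1 = unstable, 0 = stable."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # V10: per-positive fraction of negatives it outranks; V01: per-negative analogue.
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    auc = sum(v10) / len(pos)
    return auc, v10, v01


def sample_cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Unbiased sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def delong_test(
    scores_a: Sequence[float], scores_b: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Returns (auc_a, auc_b, z, p).
    """
    auc_a, v10_a, v01_a = structural_components(scores_a, labels)
    auc_b, v10_b, v01_b = structural_components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)  # numbers of positive and negative cases
    # Var(AUC_a - AUC_b) = (S10_aa + S10_bb - 2 S10_ab)/m + (S01_aa + S01_bb - 2 S01_ab)/n
    var = (
        sample_cov(v10_a, v10_a) + sample_cov(v10_b, v10_b) - 2 * sample_cov(v10_a, v10_b)
    ) / m + (
        sample_cov(v01_a, v01_a) + sample_cov(v01_b, v01_b) - 2 * sample_cov(v01_a, v01_b)
    ) / n
    if var <= 0:
        raise ValueError("AUC difference has zero estimated variance")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


def confusion_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and accuracy when 'unstable' means score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy scores for six IAs (three unstable, three stable).
    labels = [1, 1, 1, 0, 0, 0]
    model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    model_b = [0.9, 0.4, 0.2, 0.5, 0.3, 0.1]
    print(delong_test(model_a, model_b, labels))
    print(confusion_metrics(model_b, labels, threshold=0.5))
```

The pairwise loops cost O(mn) for m unstable and n stable IAs, which is adequate for a test set of a few hundred cases. The covariance terms make the test paired, since both AUCs are estimated on the same aneurysms. Sensitivity, specificity, and accuracy depend on the chosen cut-off, which is not specified here, so the threshold in this sketch is only an assumption.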
Table 3 Performance of each model in the training set
                      Sensitivity   Specificity   Accuracy   AUC (95% CI)
Machine learning
  RF                  56.7%         95.4%         85.5%      0.832 (0.809–0.855)
  SVM                 50.8%         93.8%         82.6%      0.840 (0.817–0.863)
  ANN                 55.0%         95.3%         85.0%      0.851 (0.829–0.873)
Classic statistical
  Classic LR          37.2%         88.9%         75.7%      0.811 (0.788–0.833)
Clinical score
  PHASES              13.8%         88.7%         69.6%      0.615 (0.586–0.644)
RF, random forest; SVM, linear support vector machine; ANN, artificial neural network; LR, logistic regression

Comparison of Discriminatory Models in the Test Set

The models were further evaluated in the separate test set to assess their generalizability. The ROC curves and AUC values of each model are shown in Fig. 3. The AUC values of the ML models were as follows: RF 0.850 (95% CI 0.806–0.893), SVM 0.858 (95% CI 0.816–0.900), and ANN 0.867 (95% CI 0.828–0.906). As in the training set, the RF model had the worst performance among the three ML models. However, its AUC was still higher than those of the LR model and the PHASES score, which were 0.818 (95% CI 0.776–0.860) and 0.589 (95% CI 0.527–0.651), respectively. These differences were statistically significant (RF versus LR, P = 0.038; RF versus PHASES, P < 0.001). These findings further confirm the superior performance of the ML models and indicate that it generalizes beyond the training set.

Fig. 3 The receiver operating characteristic curves of the machine learning models, the classic logistic regression model, and the PHASES score system in the test set. The area under the curve values are provided as a bar graph. PHASES, PHASES score system; LR, statistical logistic regression; NN, feed-forward neural network; SVM, support vector machine; RF, random forest. There were significant differences between machine learning and the PHASES score, between the statistical model and the PHASES score, and between machine learning and the statistical model.

The sensitivity, specificity, and accuracy of each model in the test set are listed in Table 4. As in the training set, the specificity was similar between the ML models and the classic statistical model (88.3–92.9%). However, the sensitivity (51.5–61.2% for the ML models vs. 33.9% for the LR model) and accuracy (81.5–82.4% for the ML models vs. 74.5% for the LR model) were considerably higher with the ML models.

Important Features to Aneurysm Stability

The importance of each variable was calculated for the RF and SVM models. In addition, the classic LR model is commonly used as a statistical method for variable selection. Mirroring the six-factor composition of the PHASES score, the six most important IA stability-related features of each model were selected and are shown in Table 5. The UI, height-width ratio, and presence of symptoms were among the most important features in both the classic statistical model and the ML models. Location, irregularity, and sidewall/bifurcation type were also among the most important features in two of the three discriminatory models. Notably, of the risk factors included in the PHASES score, only location was identified as one of the most important features by the other discriminatory models.

Discussion

In our study, we developed multiple ML models for IA stability assessment based on clinical features and morphological
features. Their performances were compared with those of a statistical LR model and the PHASES score. All three ML models displayed good discriminatory ability. They outperformed both the statistical LR model and the PHASES score in discriminating between unstable and stable IAs. This result suggests that ML models have great potential in clinical IA stability assessment.

ML represents a set of novel and efficient approaches that enable classification and outcome prediction to support clinical decision-making and individualized management. The main advantage of ML over traditional statistical methods is the ability to process large numbers of variables simultaneously and to model non-linear relationships [27]. Conventional statistical methods such as LR, however, model a linear relationship between the predictors and the log-odds of the outcome unless interaction or non-linear terms are specified explicitly. Similarly, most clinical scoring systems, including the PHASES score, were derived using the same statistical methods to select important factors, so they share this restriction. In principle, therefore, ML models could be expected to outperform conventional statistical methods and the PHASES score. Indeed, the ML models had good discriminatory ability and performed better than the classic LR model and the PHASES score in both the training set and the final test set.

Table 4 Performance of each model in the test set
                      Sensitivity   Specificity   Accuracy   AUC (95% CI)
Machine learning
  RF                  54.4%         90.9%         81.8%      0.850 (0.806–0.893)
  SVM                 61.2%         88.3%         81.5%      0.858 (0.816–0.900)
  ANN                 51.5%         92.9%         82.4%      0.867 (0.828–0.906)
Classic statistical
  Classic LR          33.9%         88.3%         74.5%      0.818 (0.776–0.860)
Clinical score
  PHASES              9.7%          95.4%         73.0%      0.589 (0.527–0.651)
RF, random forest; SVM, linear support vector machine; ANN, artificial neural network; LR, logistic regression

Table 5 Important features in models
       Classic statistical model   Machine learning models
Rank   LR                          SVM                  RF
1      UI                          Symptom              Symptom
2      Height-width ratio          Height-width ratio   Height-width ratio
3      Symptom                     Location             UI
4      Sidewall/bifurcation        Vessel size          Sidewall/bifurcation
5      Irregularity                Flow angle           Width
6      Location                    UI                   Irregularity
RF, random forest; SVM, linear support vector machine; LR, logistic regression; UI, undulation index

The specificity and accuracy of the PHASES score appear moderate (approximately 70–90%), but its sensitivities were only 13.8% and 9.7% in the training set and the test set, respectively. This gap indicates that the PHASES score tends to misclassify IAs as having low rupture risk and cannot reliably distinguish stable from unstable IAs. Misclassifying an unstable, high-risk IA as stable and low-risk could have catastrophic consequences. For this reason, the PHASES score is difficult to regard as a useful tool in daily clinical practice. Previously, Foreman et al. and Pagiola et al. both reported that the majority of patients with SAH present with low PHASES scores (≤ 5), which represent stable IAs with relatively low rupture risk [28, 29]. Thus, the poor performance of the PHASES score in our IA cohort is consistent with numerous reports of inconsistencies between the PHASES score and actual rupture outcomes. To the best of our knowledge, this is the first study to evaluate the efficacy of the PHASES score in a large sample of IAs (2067). It is also the first to compare, at the same time, its discriminatory ability with those of ML models.

The classic statistical model had an AUC value of 0.818, with a sensitivity of 33.9%, a specificity of 88.3%, and an accuracy of 74.5% in the final test set. As one of the most classic and practical statistical models, LR exhibited good performance in IA stability assessment. This result is consistent with previous reports. Prestigiacomo et al. developed a binary LR model based on morphological features that yielded a sensitivity, specificity, and accuracy of 83%, 78%, and 80%, respectively [30]. Xiang et al. used a multivariate LR model that included three important variables (size ratio, average wall shear stress, and oscillatory shear index) and obtained an AUC value of 0.89 for discriminating IA status [31]. In this study, although the classic LR model achieved fairly good performance, it was still inferior to the ML models, especially in terms of the accuracy metrics.

We compared three different types of ML models and validated their performance in the test set. They had comparable discriminatory ability, with AUC values between 0.850 and 0.867, the highest being that of the ANN model. The accuracy values were also similar, again with the ANN model highest. For all ML models, the specificity values in the test set were higher than the sensitivity values (88.3–92.9% and 51.5–61.2%, respectively). This may be explained by the imbalance between unstable and stable IAs (unstable:stable ≈ 1:3), as the smaller number of unstable cases may have limited the models' ability to recognize them. Furthermore, the sensitivities of the three ML models were notably higher than that of the LR model in both the training set and the test set, demonstrating another advantage of ML models over the classic LR model. High sensitivity is particularly important for a discriminatory model given the devastating
Transl. Stroke Res.

and potentially fatal consequences of IA rupture if unstable IAs with high rupture risk are falsely classified as low risk.

Although the ANN model performed best among the three ML models, the RF and SVM models offered an advantage: they can calculate the relative importance of each feature with respect to the model outcome, which makes the algorithms easier to understand and explain in clinical settings. The most important features for IA stability identified by the ML models were largely consistent with those identified by the classic LR model. Both the ML and LR models recognized the UI, height-width ratio, presence of symptoms, sidewall/bifurcation type, irregularity, and location as the features most strongly related to stability, consistent with previous studies [20–22, 24]. Interestingly, the six most important risk features comprised equal numbers of clinical and morphological features, suggesting that both types of features are critical in IA stability assessment. Disregarding either type would therefore hinder a comprehensive and informed assessment. Of these six features, only location is included in the PHASES score, which contains no morphological features; this may explain the mediocre discriminatory ability of the PHASES score in our study (AUC 0.589). Notably, three shape indices (the UI, height-width ratio, and irregularity), rather than size indices, were identified as important contributors to IA stability in both the ML and classic LR models, indicating that abnormal shape is a key distinction between stable and unstable IAs.

AI technology offers several strengths. First, AI algorithms can learn patterns from large, complex datasets and generate useful predictive outputs. Second, AI algorithms can achieve high accuracy and speed; they are well suited to rapid, high-volume data processing, enabling management tailored to each patient's characteristics. Third, algorithms such as RF and SVM support interpretation of their outputs by reporting the relative importance of each feature, allowing clinicians to understand the algorithm's internal mechanisms more directly. The application of AI may substantially improve clinical safety and effectiveness. In particular, AI has begun to be applied to IA diagnosis, risk stratification, and outcome prediction [19, 32, 33]. As the technology matures, AI algorithms with different functions could be integrated into decision-making systems to aid clinicians. In the future, such systems may be introduced into clinical workflows as supportive and complementary tools for day-to-day clinical judgements. In this capacity, they are expected to improve clinicians' effectiveness and help devise accurate, individualized treatment plans for patients.

Based on our results, we suggest that ML has excellent potential in IA stability assessment and could facilitate better clinical management of IAs in the future.

Limitations

The present study has several limitations. First, and most important, the study was retrospective: IAs that were stable at diagnosis are not guaranteed to remain free of rupture risk in the future. A prospective cohort would be ideal; however, because evidence about rupture risk factors already exists, it would have been unethical to leave UIAs with seemingly high rupture risk untreated. Second, although a previous study reported that IA morphology does not change dramatically upon rupture [34], the rupture event itself may have altered IA morphology, which could bias our results [35]. Third, all patients came from a single center; further validation at multiple other centers is needed. In addition, we did not include any aneurysm wall information in this study; further work should strengthen and verify our model by incorporating additional risk-related parameters. Finally, the stable and unstable IA cohorts were not balanced in size (1539 versus 528 IAs), which may affect the generalizability of these ML models. In the future, we will validate the performance of these models in a prospective, multicenter, longitudinal dataset. We also aim to develop decision-support software for use in real clinical settings.

Conclusions

ML models outperformed the statistical LR model and the PHASES score in IA stability assessment. The ML models demonstrated great potential in aiding the clinical decision-making process, and their future application in clinical practice may provide individualized and optimal management for patients with IAs.

Funding Information This work was supported by the National Key Research and Development Plan of China (grant number: 2016YFC1300800), the National Natural Science Foundation of China (grant numbers: 81801156, 81801158, and 81671139), the Special Research Project for Capital Health Development (grant number: 2018-4-1077), and the Beijing Hospitals Authority Youth Programme (code: QML20190503).

Compliance with Ethical Standards

Conflict of Interest The authors declare that they have no conflict of interest.

Ethical Approval All procedures performed in this study were in accordance with the ethical standards of the Institutional Review Board of Beijing Tiantan Hospital and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed Consent Informed consent was obtained from all individual participants included in the study.
References

1. Vlak MH, Algra A, Brandenburg R, Rinkel GJ. Prevalence of unruptured intracranial aneurysms, with emphasis on sex, age, comorbidity, country, and time period: a systematic review and meta-analysis. Lancet Neurol. 2011;10(7):626–36.
2. Brown RD Jr, Broderick JP. Unruptured intracranial aneurysms: epidemiology, natural history, management options, and familial screening. Lancet Neurol. 2014;13(4):393–404.
3. Thompson BG, Brown RD Jr, Amin-Hanjani S, et al. Guidelines for the management of patients with unruptured intracranial aneurysms: a guideline for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. 2015;46(8):2368–400.
4. Korja M, Lehto H, Juvela S, Kaprio J. Incidence of subarachnoid hemorrhage is decreasing together with decreasing smoking rates. Neurology. 2016;87(11):1118–23.
5. Vlak MH, Rinkel GJ, Greebe P, Algra A. Risk of rupture of an intracranial aneurysm based on patient characteristics: a case-control study. Stroke. 2013;44(5):1256–9.
6. Kashiwazaki D, Kuroda S. Size ratio can highly predict rupture risk in intracranial small (<5 mm) aneurysms. Stroke. 2013;44(8):2169–73.
7. Ujiie H, Tamano Y, Sasaki K, et al. Is the aspect ratio a reliable index for predicting the rupture of a saccular aneurysm? Neurosurgery. 2001;48(3):495–502; discussion 502–3.
8. Wong SC, Nawawi O, Ramli N, Abd Kadir KA. Benefits of 3D rotational DSA compared with 2D DSA in the evaluation of intracranial aneurysm. Acad Radiol. 2012;19(6):701–7.
9. Brinjikji W, Cloft H, Lanzino G, Kallmes DF. Comparison of 2D digital subtraction angiography and 3D rotational angiography in the evaluation of dome-to-neck ratio. AJNR Am J Neuroradiol. 2009;30(4):831–4.
10. Greving JP, Wermer MJ, Brown RD Jr, et al. Development of the PHASES score for prediction of risk of rupture of intracranial aneurysms: a pooled analysis of six prospective cohort studies. Lancet Neurol. 2014;13(1):59–66.
11. Neyazi B, Sandalcioglu IE, Maslehaty H. Evaluation of the risk of rupture of intracranial aneurysms in patients with aneurysmal subarachnoid hemorrhage according to the PHASES score. Neurosurg Rev. 2019;42(2):489–92.
12. Brusko GD, Kolcun JPG, Wang MY. Machine-learning models: the future of predictive analytics in neurosurgery. Neurosurgery. 2018;83(1):E3–E4.
13. Senders JT, Arnaout O, Karhade AV, et al. Natural and artificial intelligence in neurosurgery: a systematic review. Neurosurgery. 2018;83(2):181–92.
14. Hostettler IC, Muroi C, Richter JK, Schmid J, Neidert MC, Seule M, et al. Decision tree analysis in subarachnoid hemorrhage: prediction of outcome parameters during the course of aneurysmal subarachnoid hemorrhage using decision tree analysis. J Neurosurg. 2018;129(6):1499–510.
15. Rubbert C, Patil KR, Beseoglu K, Mathys C, May R, Kaschner MG, et al. Prediction of outcome after aneurysmal subarachnoid haemorrhage using data from patient admission. Eur Radiol. 2018;28(12):4949–58.
16. Heo J, Yoon JG, Park H, et al. Machine learning-based model for prediction of outcomes in acute stroke. Stroke. 2019;50(5):1263–5.
17. Liu J, Chen Y, Lan L, Lin B, Chen W, Wang M, et al. Prediction of rupture risk in anterior communicating artery aneurysms with a feed-forward artificial neural network. Eur Radiol. 2018;28(8):3268–75.
18. Kim HC, Rhim JK, Ahn JH, et al. Machine learning application for rupture risk assessment in small-sized intracranial aneurysm. J Clin Med. 2019;8(5).
19. Liu Q, Jiang P, Jiang Y, Ge H, Li S, Jin H, et al. Prediction of aneurysm stability using a machine learning model based on PyRadiomics-derived morphological features. Stroke. 2019;50(9):2314–21.
20. Zhang Y, Jing L, Liu J, Li C, Fan J, Wang S, et al. Clinical, morphological, and hemodynamic independent characteristic factors for rupture of posterior communicating artery aneurysms. J Neurointerv Surg. 2016;8(8):808–12.
21. Liu J, Xiang J, Zhang Y, Wang Y, Li H, Meng H, et al. Morphologic and hemodynamic analysis of paraclinoid aneurysms: ruptured versus unruptured. J Neurointerv Surg. 2014;6(9):658–63.
22. Hoh BL, Sistrom CL, Firment CS, et al. Bottleneck factor and height-width ratio: association with ruptured aneurysms in patients with multiple cerebral aneurysms. Neurosurgery. 2007;61(4):716–22; discussion 722–3.
23. Ryu C-W, Kwon OK, Koh JS, Kim EJ. Analysis of aneurysm rupture in relation to the geometric indices: aspect ratio, volume, and volume-to-neck ratio. Neuroradiology. 2010;53(11):883–9.
24. Dhar S, Tremmel M, Mocco J, et al. Morphology parameters for intracranial aneurysm rupture risk assessment. Neurosurgery. 2008;63(2):185–96; discussion 196–7.
25. Raghavan ML, Ma B, Harbaugh RE. Quantified aneurysm shape and rupture risk. J Neurosurg. 2005;102(2):355–62.
26. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45.
27. Obermeyer Z, Emanuel EJ. Predicting the future - big data, machine learning, and clinical medicine. N Engl J Med. 2016;375(13):1216–9.
28. Foreman PM, Hendrix P, Harrigan MR, Fisher WS 3rd, Vyas NA, Lipsky RH, et al. PHASES score applied to a prospective cohort of aneurysmal subarachnoid hemorrhage patients. J Clin Neurosci. 2018;53:69–73.
29. Pagiola I, Mihalea C, Caroff J, et al. The PHASES score: to treat or not to treat? Retrospective evaluation of the risk of rupture of intracranial aneurysms in patients with aneurysmal subarachnoid hemorrhage. J Neuroradiol. 2019.
30. Prestigiacomo CJ, He W, Catrambone J, Chung S, Kasper L, Pasupuleti L, et al. Predicting aneurysm rupture probabilities through the application of a computed tomography angiography-derived binary logistic regression model. J Neurosurg. 2009;110(1):1–6.
31. Xiang J, Natarajan SK, Tremmel M, Ma D, Mocco J, Hopkins LN, et al. Hemodynamic-morphologic discriminants for intracranial aneurysm rupture. Stroke. 2011;42(1):144–52.
32. Park A, Chute C, Rajpurkar P, Lou J, Ball RL, Shpanskaya K, et al. Deep learning–assisted diagnosis of cerebral aneurysms using the HeadXNet model. JAMA Netw Open. 2019;2(6):e195600.
33. Paliwal N, Jaiswal P, Tutino VM, Shallwani H, Davies JM, Siddiqui AH, et al. Outcome prediction of intracranial aneurysm treatment by flow diverters using machine learning. Neurosurg Focus. 2018;45(5):E7.
34. Rahman M, Ogilvy CS, Zipfel GJ, et al. Unruptured cerebral aneurysms do not shrink when they rupture: multicenter collaborative aneurysm study group. Neurosurgery. 2011;68(1):155–60; discussion 160–1.
35. Skodvin TO, Johnsen LH, Gjertsen O, Isaksen JG, Sorteberg A. Cerebral aneurysm morphology before and after rupture: nationwide case series of 29 aneurysms. Stroke. 2017;48(4):880–6.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
