Abe 2022 Oi 220481 1654263485.26457

Original Investigation | Emergency Medicine
A Prehospital Triage System to Detect Traumatic Intracranial Hemorrhage

Using Machine Learning Algorithms
Daisu Abe, MD; Motoki Inaji, MD, PhD; Takeshi Hase, PhD; Shota Takahashi, MD; Ryosuke Sakai, MD; Fuga Ayabe, MD; Yoji Tanaka, MD, PhD;
Yasuhiro Otomo, MD, PhD; Taketoshi Maehara, MD, PhD
Abstract Key Points

Question Can machine learning
IMPORTANCE An adequate system for triaging patients with head trauma in prehospital settings
algorithms be used to triage patients
and choosing optimal medical institutions is essential for improving the prognosis of these patients.
with head trauma according to their
To our knowledge, there has been no established way to stratify these patients based on their head
severity before transportation?
trauma severity that can be used by ambulance crews at an injury site.
Findings In this cohort study of 2123
OBJECTIVES To develop a prehospital triage system to stratify patients with head trauma according patients with head trauma, a machine
to trauma severity by using several machine learning techniques and to evaluate the predictive learning–based prediction model
accuracy of these techniques. detected traumatic intracranial
hemorrhage with a sensitivity of 74%
DESIGN, SETTING, AND PARTICIPANTS This single-center retrospective cohort study was and a specificity of 75% by using only
conducted by reviewing the electronic medical records of consecutive patients who were the patient’s prehospital information.
transported to Tokyo Medical and Dental University Hospital in Japan from April 1, 2018, to March 31,
Meaning This study suggests that a
2021. Patients younger than 16 years with cardiopulmonary arrest on arrival or with a significant
machine learning algorithm can
amount of missing data were excluded.
accurately stratify patients with head
trauma according to severity in
MAIN OUTCOMES AND MEASURES Machine learning–based prediction models to detect the
prehospital settings and may improve
presence of traumatic intracranial hemorrhage were constructed. The predictive accuracy of the
the prognosis of patients with severe
models was evaluated with the area under the receiver operating curve (ROC-AUC), area under the
traumatic head injury.
precision recall curve (PR-AUC), sensitivity, specificity, and other representative statistics.
RESULTS A total of 2123 patients (1527 male patients [71.9%]; mean [SD] age, 57.6 [19.8] years) with + Supplemental content
head trauma were enrolled in this study. Traumatic intracranial hemorrhage was detected in 258 Author affiliations and article information are
patients (12.2%). Among several machine learning algorithms, extreme gradient boosting (XGBoost) listed at the end of this article.
achieved the mean (SD) highest ROC-AUC (0.78 [0.02]) and PR-AUC (0.46 [0.01]) in cross-
validation studies. In the testing set, the ROC-AUC was 0.80, the sensitivity was 74.0% (95% CI,
59.7%-85.4%), and the specificity was 74.9% (95% CI, 70.2%-79.3%). The prediction model using
the National Institute for Health and Care Excellence (NICE) guidelines, which was calculated after
consultation with physicians, had a sensitivity of 72.0% (95% CI, 57.5%-83.8%) and a specificity of
73.3% (95% CI, 68.7%-77.7%). The McNemar test revealed no statistically significant differences
between the XGBoost algorithm and the NICE guidelines for sensitivity or specificity (P = .80 and
P = .55, respectively).
CONCLUSIONS AND RELEVANCE In this cohort study, the prediction model achieved a
comparatively accurate performance in detecting traumatic intracranial hemorrhage using only the
simple pretransportation information from the patient. Further validation with a prospective
multicenter data set is needed.
JAMA Network Open. 2022;5(6):e2216393. doi:10.1001/jamanetworkopen.2022.16393
Open Access. This is an open access article distributed under the terms of the CC-BY License.
JAMA Network Open. 2022;5(6):e2216393. doi:10.1001/jamanetworkopen.2022.16393 (Reprinted) June 10, 2022 1/11
Downloaded from jamanetwork.com by guest on 01/24/2024

JAMA Network Open | Emergency Medicine Machine Learning Algorithms for Detecting Traumatic Intracranial Hemorrhage
Introduction
Traumatic brain injury (TBI) is common in neurosurgery practice. Approximately 10 million to 69
million people worldwide experience TBI annually, and the number of patients with TBI is
increasing.1-3 Although approximately 70% to 90% of all TBI cases are mild and necessitate neither
admission nor neurosurgical intervention,4 moderate or severe TBI can become a major cause of
death or severe disability. Because early efficient treatments during the acute period of TBI improve
patient outcomes,5 it is important to adequately stratify patients with head trauma according to their
trauma severity.
To this end, several screening methods to identify patients with clinically important brain
injuries have been proposed,6-8 and numerous reports have supported the reliability of these
screening tools.9-11 However, because these screening tools were developed for physicians’ use, they
cannot be used for field triage. The prehospital triage of patients with TBI is important because it
eliminates any delays in care, decreases the need for patient transportation between hospitals, and
allows for streamlining of the medical system. However, to our knowledge, there have been no
established protocols for stratifying patients with TBI at the scene before transportation.
In recent years, research on clinical event prediction using machine learning algorithms has
been actively conducted, and more complicated and reliable classification methods have become
possible.12-20 Liu et al21 developed a machine learning triage system that can be used to detect
severity of patients’ injuries using basic clinical information. Therefore, we hypothesized that
machine learning approaches might make it possible to produce reliable prediction models even
when using insufficient information that was collected on the scene.
The purpose of this study was to create a prehospital triage system that can discriminate
patients with traumatic intracranial hemorrhage (tICH) from those without tICH. By using several
representative machine learning algorithms, we built prediction models of tICH for patients with
head trauma. We verified the models’ prediction performance by applying these models on test data,
and we compared these models with the conventional screening tools.
Methods
Study Population
We retrospectively reviewed the electronic medical records of patients who were transported to
Tokyo Medical and Dental University Hospital in Japan for head trauma from April 1, 2018, to March
31, 2021. Based on this retrospective review, we enrolled 2258 patients with head trauma who
underwent head computed tomography (CT). Patients younger than 16 years (n = 83) with
cardiopulmonary arrest at the time of emergency service contact (n = 26) or with missing data in
more than 2 of the features (n = 26) were excluded, leaving 2123 patients included in this analysis
(Figure 1). This study was approved by the institutional review board of Tokyo Medical and Dental
Figure 1. Study Flowchart
2258 Patients with traumatic head injury received emergency

transportation from May 1, 2018, to March 31, 2021
135 Excluded
83 Aged <16 y
26 With cardiopulmonary arrest
26 With missing data
5-Fold stratified
cross-validation
1362 Training cohort 340 Validation cohort 421 Testing cohort

University, which approved a waiver of patient consent because this is a retrospective study and all
data were deidentified. This study followed the Transparent Reporting of a Multivariable Prediction
Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline.
Outcome
The predicted outcome was the presence of tICH on results of initial head CT scanning, where tICH
was defined as traumatic subarachnoid hemorrhage, acute subdural hematoma, acute epidural
hematoma, or cerebral contusion at the time of consultation. The radiologic findings were double-
checked by either a radiologist and a neurosurgeon or a radiologist and an emergency medicine
physician.
Features for Prediction Model

We obtained data from the electronic medical records of patients at the Tokyo Medical and Dental
University Hospital and selected the following 18 features as explanatory variables, which are
routinely collected by ambulance crews in Japan: the patient’s age, sex, systolic blood pressure, heart
rate, body temperature, respiratory rate, state of consciousness (assessed with the eye component
of the Glasgow Coma Scale22 [E-GCS; eAppendix in the Supplement] and disorientation), pupil
abnormalities (defined as anisocoria or bilateral pupil dilation and loss of light reflex), high-energy
head trauma (head trauma by dangerous mechanisms as defined in the National Institute for Health
and Care Excellence [NICE] guidelines; eAppendix in the Supplement),23 head trauma scars, multiple
trauma (combination of head trauma and other traumas to the body), posttraumatic seizures, loss
of consciousness, vomiting, hemiplegia, and clinical deterioration, known as “talk and
deteriorate,”24-26 which is defined in this study as a reduction in the E-GCS score of 1 point or more
before transportation from the site.
Data Preprocessing
The missing data for the body temperature and respiratory rate were filled with the overall mean
value of each category. Patients with missing data other than body temperature and respiratory rate
were excluded. The values of each feature were normalized for the support vector machine and
logistic regression.
Machine Learning Algorithms

To build the classification models, we used 5 representative machine learning algorithms (eg,
extreme gradient boosting [XGBoost], random forest, support vector machine [linear kernel and
radial basis function kernel] and logistic regression) as prediction model–based algorithms.
The XGBoost algorithm is an efficient implementation of gradient tree boosting, which is a
representative ensemble learning algorithm that is based on decision trees.27,28 In the algorithm,
weak learners are trained successively to split the residual of the last prediction. In addition, adding a
regularization term to the loss function results in creating a strong learner without overfitting.
Random forest29,30 is also an ensemble classifier that is based on decision trees. In the
algorithm, a large number of weak learners are trained using randomly bootstrapped data. Each weak
learner votes for its predicted outcome, and the most voted class is adopted as the prediction result
of all learners.
Support vector machine is a supervised machine learning algorithm in which a binary
classification boundary is set to maximize the margin from each data sample. Logistic regression31 is
one of the most widely used classification algorithms. It is used to compute the probability of an
occurrence of binary outcomes using the sigmoid function.
To build the classification models based on these 5 machine learning algorithms, we used
Python, version 3.6 (Python Software Foundation) and several Python modules (numpy, sklearn,
matplotlib, pandas, and xgboost). In brief, we used the values shown in the eTable in the Supplement
for each hyperparameter of the machine learning algorithms. For the other parameters, we used the

default values. To validate the classification models and to optimize the parameters in the models, we
performed a 5-fold cross-validation.
Training, Cross-Validation, and Test

We used 80% of the data to train the classification models through a 5-fold stratified cross-
validation. The remaining 20% of the data were reserved to test the predictive ability of the resultant
classification models. In the process of cross-validation, scikit-learn, version 0.23.2 (scikit-learn
Developers) was used to implement a grid search to obtain the optimal hyperparameters. The
hyperparameters were tuned to maximize the area under the receiver operator characteristic curve
(ROC-AUC). The trained classifier models calculated 2 class probabilities (the probability for the class
with tICH and the probability for the class without tICH) for each patient. We used the Platt
calibration32,33 to calibrate the class probabilities from each classifier model.
As shown in Table 1, our data set was imbalanced owing to the relative rarity of tICH among
patients with head injury. Therefore, to solve the issues associated with the imbalanced data set, we
introduced class weights to modify the loss function; that is, in a similar manner to Dong et al,34 we
assigned a 10-fold higher weight to the positive cases (cases with tICH) than the negative cases
(cases without tICH).
Performance Evaluation
To evaluate the predictive performance of the classifier models, we calculated 6 representative
performance evaluation measures, including sensitivity, specificity, positive predictive value (PPV),
negative predictive value, positive likelihood ratio, and negative likelihood ratio. The ROC-AUC and
the area under the precision recall curve (PR-AUC) were also computed. The recall corresponds to the
sensitivity, and the precision corresponds to the PPV. The PR curve is often used along with the ROC
curve to assess the model performance, especially in imbalanced data sets.
Statistical Analysis
The statistical analysis was conducted using R, version 4.0.3 (R Group for Statistical Computing).
Mann-Whitney tests were used to analyze nonnormally distributed variables, and the χ2 test was
chosen for the analysis of categorical data. To compare the sensitivity and specificity between each
model, the McNemar test was used. All P values were from 2-sided tests and results were deemed
statistically significant at P < .05. Bonferroni adjustment was applied for multiple comparison testing.
Results
Baseline Characteristics
This study cohort included 2123 patients (1527 male patients [71.9%]; mean [SD] age, 57.6 [19.8]
years; age range, 16-102 years), of whom 258 (12.2%) were found to have tICH. All of the patients
were divided into a training set (1702 cases, 80.2% of all data) and a testing set (421 cases, 19.8% of
all data) by a random allocation that was stratified by tICH. Table 1 shows the breakdown of each
explanatory feature in each group. No statistically significant difference was found between the 2
groups in any of the features.
Comparison Among the Machine Learning Algorithms in the Cross-Validation

Figure 2 shows the ROC and PR curves for the 5 machine learning algorithms. Among the algorithms,
XGBoost achieved the highest mean (SD) AUC in both the ROC curve (0.78 [0.02]) and PR curve
(0.46 [0.01]). Therefore, we selected XGBoost as an appropriate algorithm to develop the prediction
models for the patient stratification and conducted a further analysis with it for its predictive validity.

Table 1. Baseline Features of the Study Cohort
Patients, No. (%)

Training and validation Testing cohort
Feature cohort (n = 1702) (n = 421) P value
Age, mean (SD), y 57.5 (19.8) 58 (19.9) >.99
Sex
Male 1220 (71.7) 307 (72.9)
>.99
Female 482 (28.3) 114 (27.1)
Score on the eye component of the GCS
4 1558 (91.5) 387 (91.9)
3 82 (4.8) 13 (3.1)
>.99
2 12 (0.7) 3 (0.7)
1 50 (2.9) 18 (4.3)
Disorientation
Yes 594 (34.9) 139 (33.0)
>.99
No 1108 (65.1) 282 (67.0)
Vital signs, mean (SD)
Systolic blood pressure, mm Hg 130.4 (27.6) 129.2 (26) >.99
Heart rate, beats/min 88.2 (17.5) 88.3 (17.7) >.99
Body temperature, °C 36.6 (8.0) 36.4 (1.6) >.99
Respiratory rate, breaths/min 18.7 (4.2) 18.5 (2.4) >.99
Pupil abnormality
Yes 13 (0.8) 5 (1.2)
>.99
No 1689 (99.2) 416 (98.8)
Hemiparesis
Yes 12 (0.7) 6 (1.4)
>.99
No 1690 (99.3) 415 (98.6)
Posttraumatic seizure
Yes 14 (0.8) 7 (1.7)
>.99
No 1688 (99.2) 414 (98.3)
Vomiting
Yes 65 (3.8) 9 (2.1)
>.99
No 1637 (94.2) 412 (97.9)
Loss of consciousness
Yes 63 (3.7) 13 (3.1)
>.99
No 1639 (96.3) 408 (96.9)
Talk and deteriorate
Yes 8 (0.5) 3 (0.7)
>.99
No 1694 (99.5) 418 (99.3)
High-energy head trauma
Yes 181 (10.6) 36 (8.6)
>.99
No 1521 (89.4) 385 (91.4)
Head trauma scar
Yes 1366 (80.3) 334 (79.3)
>.99
No 336 (19.7) 87 (20.7)
Multiple trauma
Yes 707 (41.5) 169 (40.1)
>.99
No 995 (58.5) 252 (59.9)
Alcohol intake
Yes 841 (49.4) 200 (47.6)
>.99
No 861 (50.6) 221 (52.4)
Outcome: tICH
Yes 208 (12.2) 50 (11.9)
>.99 Abbreviations: GCS, Glasgow Coma Scale; tICH,
No 1494 (87.8) 371 (88.1)
traumatic intracranial hemorrhage.

Prediction Model for the Test Data Using XGBoost

To classify the patients with tICH, we built a classifier model based on the XGBoost algorithm with all
18 features and named this model the “all-elements model.” This model achieved a mean (SD)
ROC-AUC of 0.78 (0.02) and a mean (SD) PR-AUC of 0.46 (0.01) for cross-validation (eFigure 1 in the
Supplement) and an ROC-AUC of 0.80 and a PR-AUC of 0.51 for the testing set (Figure 3; eFigure 2
in the Supplement). A calibration plot for the testing set is shown in Figure 3.
To quantify the association of the 18 features with the outcomes, we performed an analysis of
the importance of the features for the all-elements model. As shown in eFigure 3 in the Supplement,
among the 18 features, 5 variables (disorientation, high-energy head trauma, head trauma scar,
E-GCS, and pupil abnormality) were likely to be associated with obtaining accurate predictions.
Based on the results of this feature importance analysis and to simplify the classification
models, we selected the 5 most important features (disorientation, high-energy head trauma, head
trauma scar, E-GCS, and pupil abnormality) and built another classification model (named the
Figure 2. Receiver Operating Characteristic (ROC) Curves and Precision Recall (PR) Curves for Each Machine
Learning Algorithm in the 5-Fold Stratified Cross-Validation
A ROC curve B PR curve
XGBoost (mean [SD] AUC = 0.78 [0.02]) XGBoost (AUC = 0.46 [0.01])
Random forest (mean [SD] AUC = 0.76 [0.03]) Random forest (AUC = 0.44 [0.00])
Linear SVM (mean [SD] AUC = 0.76 [0.03]) Linear SVM (AUC = 0.39 [0.03])
rbf SVM (mean [SD] AUC = 0.74 [0.04]) rbf SVM (AUC = 0.35 [0.03])
Logistic regression (mean [SD] AUC = 0.76 [0.03]) Logistic regression (AUC = 0.43 [0.05])
1.0 1.0
0.8 0.8
True-positive rate
0.6 0.6
Precision
0.4 0.4
0.2 0.2
The diagonal dotted line represents the random

0 0 classifier. AUC indicates area under the curve; rbf,
0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 radial basis function; SVM, support vector machine;
False-positive rate Recall and XGBoost, extreme gradient boosting.
Figure 3. Receiver Operating Characteristic (ROC) Curve for the Extreme Gradient Boosting (XGBoost) Models
on the Testing Set and Calibration Plot for the Testing Set
A ROC curve for XGBoost models on testing set B Calibration plot for testing set
1.0 1.0
0.8 0.8
True-positive rate
Observation
0.6 0.6
0.4 0.4
All-elements model
(AUC = 0.80)
0.2 0.2
5-Elements model
(AUC = 0.74)
0 0 Diagonal dotted lines represent the random classifier
0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 (A) and perfectly calibrated plot (B). AUC indicates
False-positive rate Prediction area under the curve.

“5-elements model”) with the selected features. The 5-elements model showed a mean (SD)
ROC-AUC of 0.74 (0.03) for the cross-validation study (eFigure 1 in the Supplement) and 0.74 for the
testing set (Figure 3).
Comparison of the Performances of the Conventional Screening Method

and the Current Models
Table 2 shows the comparison of representative statistics obtained from each model: the
all-elements model, the 5-elements model, and the NICE guideline’s recommendation for CT scans.
The sensitivity and specificity of the NICE guidelines were 72.0% (95% CI, 57.5%-83.8%) and 73.3%
(95% CI, 68.7%-77.7%), respectively, and those of the all-elements model were 74.0% (95% CI,
59.7%-85.4%) and 74.9% (95% CI, 70.2%-79.3%), respectively. There were no statistically
significant differences between this model and the NICE guidelines using the McNemar test (P = .80
and P = .55, respectively). The McNemar test was also conducted for the NICE guidelines and the
5-elements model. The sensitivity was 74.0% (95% CI, 59.7%-85.4%), and the specificity was 70.1%
(95% CI, 65.1%-74.7%), and there were no statistically significant differences between them (P = .81
and P = .22, respectively).
Discussion
The purpose of this study was to establish a machine learning–based triage system that can detect
tICH in patients with head injury using information that can be collected by the ambulance crews at
the scene of trauma to select an optimal medical institution for each patient. Among several machine
learning algorithms, XGBoost showed a comparatively better prediction performance. The XGBoost
model with all available features was able to predict tICH with a sensitivity of 74.0%, a specificity of
74.9%, and an ROC-AUC of 0.80. In addition, a feature importance analysis revealed that 5 features
(eg, disorientation, E-GCS, high-energy head trauma, head trauma scar, and pupil abnormalities)
were likely to be associated with an accurate prediction. Another model using only these 5 features
showed relatively accurate predictions, with a sensitivity of 74.0%, a specificity of 70.1%, and an
ROC-AUC of 0.74.
The current process for a patient with head trauma is as follows: (1) ambulance crews judge the
severity of the patient’s trauma and decide where to hospitalize them; (2) after transportation,
physicians evaluate the necessity of CT scanning through a physical examination; and (3) the patient
undergoes CT scanning and may be transported to another hospital. Although well-established tICH
screening tools (such as the NICE guidelines) are useful in step 2, to our knowledge there is no
established way to properly triage patients in step 1. As the functional outcomes of patients with
head injury worsen when their transportation is delayed,5 the transport time in step 3 should be
reduced by constructing a reliable field triage tool. In general, the severity of patients’ TBI is classified
according to their GCS scores.35 However, the GCS score is not reliable because it is easily affected
by environmental issues, such as alcohol or drug intake, and it tends to underestimate the severity of
patients’ TBI because intracranial mass lesions do not immediately affect the patient’s intracranial
status, especially among older patients.36 Ter Avest et al37 tried to detect elevated intracranial
pressures with a combination of the GCS score, pupil reaction, and the Cushing triad (bradycardia,
Table 2. Comparison of Representative Statistics That Were Predicted With the Present Machine Learning Models and the NICE Guidelines on the Testing Set
Positive likelihood Negative likelihood
Model Sensitivity (95% CI) Specificity (95% CI) PPV (95% CI) NPV (95% CI) ratio (95% CI) ratio (95% CI)
NICE guidelines 72.0 (57.5-83.8) 73.3 (68.7-77.7) 26.7 (19.4-35.0) 95.1 (91.9-97.3) 2.70 (2.12-3.44) 0.38 (0.24-0.60)
XGBoost
All-elements model 74.0 (59.7-85.4) 74.9 (70.2-79.3) 28.5 (20.9-37.0) 95.5 (92.5-97.6) 2.95 (2.32-3.76) 0.35 (0.22-0.56)
5-Elements model 74.0 (59.7-85.4) 70.1 (65.1-74.7) 25.0 (18.3-32.8) 95.2 (92.0-97.4) 2.47 (1.97-3.10) 0.37 (0.23-0.60)
Abbreviations: NICE, National Institute for Health and Care Excellence; NPV, negative predictive value; PPV, positive predictive value; XGBoost, extreme gradient boosting.

irregular breathing pattern, and hypertension), but the sensitivity and ROC-AUC were relatively low
(36.8% and 0.65, respectively). We added the crush severity, head trauma scars, and the neurologic
findings, such as paresis and seizure (which were easily collectable by ambulance crews at the site),
to these features to obtain a new model. These additional features, combined with machine learning
techniques, allowed us to achieve a more accurate prediction model compared with
previous models.
One of the major advantages of applying artificial intelligence to clinical practice is that it can
substitute partially, if not completely, for the assessment by physicians.38 During our investigation,
we used the NICE guidelines’ CT scan recommendation criteria as a representation of physicians’
assessment and compared its screening performance with those of our present models. As a result,
the XGBoost models achieved a predictive accuracy as good as that of the NICE guidelines (Table 2).
Seemingly, the sensitivities calculated in our testing set were worse than those of conventional tools,
such as the Canadian CT Head Rule (CCHR)6 and New Orleans Criteria (NOC),7 which were reported
to be nearly 100%. Several possible reasons could exist for this finding. First, the CCHR and NOC
excluded patients with mild head trauma without loss of consciousness, amnesia, or confusion,
which would have resulted in increasing their sensitivity. Second, the outcome was set to be clinically
significant brain injuries, which did not include minor tICHs. In a report by Svensson et al,39 in which
the screening outcome was set as all tICHs, the sensitivity and specificity were, respectively, 87.1%
and 35.7% with the CCHR, 97.1% and 3.4% with the NOC, and 75.7% and 58.0% with the NICE
guidelines. Considering the trade-off of the sensitivity and specificity on the ROC curve (Figure 3), the
performance of our machine learning model is assumed to be as good as these screening tools. The
evaluation results of the testing cohort shown in Table 2 also support this assumption. In our model,
the PPV, which is another important statistic for imbalanced data, was also comparable to that of the
NICE guidelines. Although the improvement in field triage accuracy has been desired in emergency
medicine,40-42 our present prediction model will possibly allow emergency crews, who are not as
specialized as physicians, to triage as well as physicians, even with incomplete data. Furthermore,
whereas conventional tools output only binary outcomes and yield a single sensitivity, specificity, and
PPV value, our model can yield different sensitivity, specificity, and PPV values by changing the cutoff
value to adapt to the situation needed, which is another strength of our model.
Regarding the analysis method, while previous screening tools have been established using
logistic regression, which is based on linear separation, decision trees can set more than 1 boundary
at each variable and can make it possible to classify objects with more complexity. Although complex
classification algorithms tend to overfit only training data and will not perform well on test data,
several ensemble learning methods, such as random forest and gradient boosting, have been shown
to improve their predictive accuracy by preventing overfitting.43,44 In our study, XGBoost, as a
state-of-the-art tree-based gradient boosting algorithm, allowed us to achieve an acceptable
prediction performance. Our current models are available on the GitHub website.45 In the future,
after proper validation study with a prospective cohort, our models can be converted to web or
smartphone applications for ambulance crews to calculate patients’ risk.
Limitations
Our study has some limitations. First, because this was a single-center study and included only
patients who were hospitalized and underwent head CT, our data set may not represent the general
population of patients with head trauma. In addition, we suggest that our model may be
underestimating patients at high risk, based on the calibration plot. To apply our model to clinical
practice, we should verify the predictive accuracy using a prospective external validation set and
investigate the optimal cutoff value.
Second, the GCS was not used to evaluate the patient’s state of consciousness because it is not
routinely recorded in the Japanese emergency medical system. Instead, the Japan Coma Scale46 is
widely used to obtain the patient’s state of consciousness. Because the Japan Coma Scale can be
translated to the combination of E-GCS and disorientation, our model adopted these 2 items as

features. These variables might be less accurate than the GCS itself, and the prediction result might
have been degraded.
Conclusions
Machine learning algorithms have achieved good performance as conventional screening tools in
detecting tICH. Although conventional screening tools require examination by a physician, our
proposed models require only pretransportation patient information, which can be easily obtained.
The results suggest that our proposed prediction models may be useful for constructing a triage
system that can be used to assess the optimal institution to which a patient with a head injury should
be transported. Further validation with prospective and multicenter data sets is needed.
ARTICLE INFORMATION
Accepted for Publication: April 24, 2022.
Published: June 10, 2022. doi:10.1001/jamanetworkopen.2022.16393
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2022 Abe D et al.
JAMA Network Open.
Corresponding Author: Motoki Inaji, MD, PhD, Department of Neurosurgery, Tokyo Medical and Dental
University, 1-5-45 Yushima, Bunkyo-ku, Tokyo-to, Japan (inamnsrg@tmd.ac.jp).
Author Affiliations: Department of Neurosurgery, Tokyo Medical and Dental University, Tokyo, Japan (Abe, Inaji,
Takahashi, Sakai, Ayabe, Tanaka, Maehara); Institute of Education, Tokyo Medical and Dental University, Tokyo,
Japan (Hase); Department of Acute Critical Care and Disaster Medicine, Graduate School of Medical and Dental
Sciences, Tokyo Medical and Dental University, Tokyo, Japan (Otomo).
Author Contributions: Drs Abe and Inaji had full access to all of the data in the study and take responsibility for the
integrity of the data and the accuracy of the data analysis.
Concept and design: Abe, Inaji, Otomo, Maehara.
Acquisition, analysis, or interpretation of data: Abe, Inaji, Hase, Takahashi, Sakai, Ayabe, Tanaka.
Drafting of the manuscript: Abe, Inaji, Sakai, Maehara.
Critical revision of the manuscript for important intellectual content: Abe, Inaji, Hase, Takahashi, Ayabe, Tanaka,
Otomo, Maehara.
Statistical analysis: Abe, Hase, Tanaka.
Administrative, technical, or material support: Hase, Sakai.
Supervision: Tanaka, Otomo, Maehara.
Conflict of Interest Disclosures: None reported.
REFERENCES
1. Dewan MC, Rattani A, Gupta S, et al. Estimating the global incidence of traumatic brain injury. J Neurosurg.
2018;130:1080-1097. doi:10.3171/2017.10.JNS17352
2. Maas AIR, Menon DK, Adelson PD, et al; InTBIR Participants and Investigators. Traumatic brain injury: integrated
approaches to improve prevention, clinical care, and research. Lancet Neurol. 2017;16(12):987-1048. doi:10.1016/
S1474-4422(17)30371-X
3. James SL, Theadom A, Ellenbogen RG, et al; GBD 2016 Traumatic Brain Injury and Spinal Cord Injury
Collaborators. Global, regional, and national burden of traumatic brain injury and spinal cord injury, 1990-2016:
a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol. 2019;18(1):56-87. doi:10.1016/
S1474-4422(18)30415-0
4. Jennett B. Epidemiology of head injury. J Neurol Neurosurg Psychiatry. 1996;60(4):362-369. doi:10.1136/jnnp.
60.4.362
5. Chen CH, Shin SD, Sun JT, et al. Association between prehospital time and outcome of trauma patients in 4
Asian countries: a cross-national, multicenter cohort study. PLoS Med. 2020;17(10):e1003360. doi:10.1371/journal.
pmed.1003360

6. Stiell IG, Wells GA, Vandemheen K, et al. The Canadian CT Head Rule for patients with minor head injury.
Lancet. 2001;357(9266):1391-1396. doi:10.1016/S0140-6736(00)04561-X
7. Haydel MJ, Preston CA, Mills TJ, Luber S, Blaudeau E, DeBlieux PM. Indications for computed tomography in
patients with minor head injury. N Engl J Med. 2000;343(2):100-105. doi:10.1056/NEJM200007133430204
8. Shravat BP, Huseyin TS, Hynes KA. NICE guideline for the management of head injury: an audit demonstrating
its impact on a district general hospital, with a cost analysis for England and Wales. Emerg Med J. 2006;23(2):
109-113. doi:10.1136/emj.2004.022327
9. Mata-Mbemba D, Mugikura S, Nakagawa A, et al. Canadian CT Head Rule and New Orleans Criteria in mild
traumatic brain injury: comparison at a tertiary referral hospital in Japan. Springerplus. 2016;5:176. doi:10.1186/
s40064-016-1781-9
10. Kavalci C, Aksel G, Salt O, et al. Comparison of the Canadian CT Head Rule and the New Orleans Criteria in
patients with minor head injury. World J Emerg Surg. 2014;9:31. doi:10.1186/1749-7922-9-31
11. Papa L, Stiell IG, Clement CM, et al. Performance of the Canadian CT Head Rule and the New Orleans Criteria for
predicting any traumatic intracranial injury on computed tomography in a United States level I trauma center. Acad
Emerg Med. 2012;19(1):2-10. doi:10.1111/j.1553-2712.2011.01247.x
12. Chang K, Bai HX, Zhou H, et al. Residual convolutional neural network for the determination of IDH status in
low- and high-grade gliomas from MR imaging. Clin Cancer Res. 2018;24(5):1073-1081. doi:10.1158/1078-0432.
CCR-17-2236
13. Abdelaziz Ismael SA, Mohammed A, Hefny H. An enhanced deep learning approach for brain cancer MRI
images classification using residual networks. Artif Intell Med. 2020;102:101779. doi:10.1016/j.artmed.2019.101779
14. Shahrestani S, Cardinal T, Micko A, et al. Neural network modeling for prediction of recurrence, progression,
and hormonal non-remission in patients following resection of functional pituitary adenomas. Pituitary. 2021;24
(4):523-529. doi:10.1007/s11102-021-01128-5
15. Kim D, You S, So S, et al. A data-driven artificial intelligence model for remote triage in the prehospital
environment. PLoS One. 2018;13(10):e0206006. doi:10.1371/journal.pone.0206006
16. Xie F, Ong MEH, Liew JNMH, et al. Development and assessment of an interpretable machine learning triage
tool for estimating mortality after emergency admissions. JAMA Netw Open. 2021;4(8):e2118467. doi:10.1001/
jamanetworkopen.2021.18467
17. Baniasadi A, Rezaeirad S, Zare H, Ghassemi MM. Two-step imputation and AdaBoost-based classification for
early prediction of sepsis on imbalanced clinical data. Crit Care Med. 2021;49(1):e91-e97. doi:10.1097/CCM.
0000000000004705
18. Tseng PY, Chen YT, Wang CH, et al. Prediction of the development of acute kidney injury following cardiac
surgery by machine learning. Crit Care. 2020;24(1):478. doi:10.1186/s13054-020-03179-9
19. Wu E, Marthi S, Asaad WF. Predictors of mortality in traumatic intracranial hemorrhage: a National Trauma
Data Bank study. Front Neurol. 2020;11:587587. doi:10.3389/fneur.2020.587587
20. Li Y, Wang L, Liu Y, et al. Development and validation of a simplified prehospital triage model using neural
network to predict mortality in trauma patients: the ability to follow commands, age, pulse rate, systolic blood
pressure and peripheral oxygen saturation (CAPSO) model. Front Med (Lausanne). 2021;8:810195. doi:10.3389/
fmed.2021.810195
21. Liu Y, Gao J, Liu J, et al. Development and validation of a practical machine-learning triage algorithm for the
detection of patients in need of critical care in the emergency department. Sci Rep. 2021;11(1):24044. doi:10.1038/
s41598-021-03104-2
22. Teasdale G, Jennett B. Assessment and prognosis of coma after head injury. Acta Neurochir (Wien). 1976;34(1-
4):45-55. doi:10.1007/BF01405862
23. National Institute for Health and Care Excellence. Head injury: assessment and early management: clinical
guideline [CG176]. Updated September 13, 2019. Accessed April 10, 2022. https://www.nice.org.uk/
guidance/cg176
24. Marshall LF, Toole BM, Bowers SA. The National Traumatic Coma Data Bank, part 2: patients who talk and
deteriorate: implications for treatment. J Neurosurg. 1983;59(2):285-288. doi:10.3171/jns.1983.59.2.0285
25. Lobato RD, Rivas JJ, Gomez PA, et al. Head-injured patients who talk and deteriorate into coma: analysis of 211
cases studied with computerized tomography. J Neurosurg. 1991;75(2):256-261. doi:10.3171/jns.1991.75.2.0256
26. Rockswold GL, Leonard PR, Nagib MG. Analysis of management in thirty-three closed head injury patients
who “talked and deteriorated”. Neurosurgery. 1987;21(1):51-55. doi:10.1227/00006123-198707000-00010

27. Tianqi C, Guestrin C. XGBoost: a scalable tree boosting system. In: Krishnapuram B, ed. Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing
Machinery; 2016:785-794.
28. Liu J, Wu J, Liu S, Li M, Hu K, Li K. Predicting mortality of patients with acute kidney injury in the ICU using
XGBoost model. PLoS One. 2021;16(2):e0246306. doi:10.1371/journal.pone.0246306
29. Breiman L. Random forests. Mach Learn. 2001;45:5-32. doi:10.1023/A:1010933404324
30. Biau G. Analysis of a random forests model. J Machine Learning Res. 2012;13:1063-1095.
31. Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied Logistic Regression. 3rd ed. John Wiley & Sons; 2013. doi:10.
1002/9781118548387
32. Platt J. Probabilities for SV machines. In: Smola AJ, Bartlett PL, Schölkopf B, Schuurmans D, eds. Advances in
Large Margin Classifiers. The MIT Press; 2000:61-74.
33. Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. In: Džeroski S, Raedt LD,
Wrobel S, eds. Proceedings of the 22nd International Conference on Machine Learning. Association for Computing
Machinery; 2005:625-632.
34. Dong X, Rashidian S, Wang Y, et al. Machine learning based opioid overdose prediction using electronic health
records. AIMA Annu Symp Proc. 2020;2019:389-398.
35. Jennett B, Teasdale G. Aspects of coma after severe head injury. Lancet. 1977;1(8017):878-881. doi:10.1016/
S0140-6736(77)91201-6
36. Kehoe A, Rennie S, Smith JE. Glasgow Coma Scale is unreliable for the prediction of severe head injury in
elderly trauma patients. Emerg Med J. 2015;32(8):613-615. doi:10.1136/emermed-2013-203488
37. Ter Avest E, Taylor S, Wilson M, Lyon RL. Prehospital clinical signs are a poor predictor of raised intracranial
pressure following traumatic brain injury. Emerg Med J. 2021;38(1):21-26. doi:10.1136/emermed-2020-209635
38. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380(14):1347-1358. doi:10.
1056/NEJMra1814259
39. Svensson S, Vedin T, Clausen L, Larsson PA, Edelhamre M. Application of NICE or SNC guidelines may reduce
the need for computerized tomographies in patients with mild traumatic brain injury: a retrospective chart review
and theoretical application of five guidelines. Scand J Trauma Resusc Emerg Med. 2019;27(1):99. doi:10.1186/
s13049-019-0673-8
40. Lee SK, Kang JH, Song SW. Validation of guidelines for field triage of injured patients for major trauma in
patients of brain and spinal injury. J Med Life Sci. 2020;17(1):7-15. doi:10.22730/jmls.2020.17.1.7
41. Newgard CD, Fu R, Zive D, et al. Prospective validation of the national field triage guidelines for identifying
seriously injured persons. J Am Coll Surg. 2016;222(2):146-158. doi:10.1016/j.jamcollsurg.2015.10.016
42. van der Sluijs R, van Rein EAJ, Wijnand JGJ, Leenen LPH, van Heijl M. Accuracy of pediatric trauma field triage:
a systematic review. JAMA Surg. 2018;153(7):671-676. doi:10.1001/jamasurg.2018.1050
43. Sahin EK. Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping
using XGBoost, gradient boosting machine, and random forest. SN Appl Sci. 2020;2(7):1308. doi:10.1007/s42452-
020-3060-1
44. Li Y, Chen W. A comparative performance assessment of ensemble learning for credit scoring. Mathematics.
2020;8(10):1756. doi:10.3390/math8101756
45. GitHub. DaisuAbe/HeadTraumaTriage. Accessed May 2, 2022. https://github.com/DaisuAbe/
HeadTraumaTriage
46. Ohta T, Waga S, Handa W, Saito I, Takeuchi K. New grading of level of disordered consiousness (author’s
transl). Article in Japanese. No Shinkei Geka. 1974;2(9):623-627.
SUPPLEMENT.
eTable. The Values of Hyperparameters for Each Algorithm
eFigure 1. Receiver Operating Characteristic Curves and Precision Recall Curves for the All-Elements XGBoost
Model and the 5-Elements XGBoost Model in the Cross Validation
eFigure 2. Precision Recall Curves for the All-Elements XGBoost Model and the 5-Elements XGBoost Model on the
Testing Set
eFigure 3. The Result of the Feature Importance Analysis
eAppendix.

Abe 2022 Oi 220481 1654263485.26457

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Abe 2022 Oi 220481 1654263485.26457

Uploaded by

Copyright:

Available Formats

Original Investigation | Emergency Medicine

A Prehospital Triage System to Detect Traumatic Intracranial Hemorrhage

Abstract Key Points

JAMA Network Open. 2022;5(6):e2216393. doi:10.1001/jamanetworkopen.2022.16393

Downloaded from jamanetwork.com by guest on 01/24/2024

Figure 1. Study Flowchart

2258 Patients with traumatic head injury received emergency

1362 Training cohort 340 Validation cohort 421 Testing cohort

Downloaded from jamanetwork.com by guest on 01/24/2024

Features for Prediction Model

Machine Learning Algorithms

Downloaded from jamanetwork.com by guest on 01/24/2024

Training, Cross-Validation, and Test

Comparison Among the Machine Learning Algorithms in the Cross-Validation

Downloaded from jamanetwork.com by guest on 01/24/2024

Table 1. Baseline Features of the Study Cohort

Patients, No. (%)

Downloaded from jamanetwork.com by guest on 01/24/2024

Prediction Model for the Test Data Using XGBoost

A ROC curve B PR curve

The diagonal dotted line represents the random

Downloaded from jamanetwork.com by guest on 01/24/2024

Comparison of the Performances of the Conventional Screening Method

Downloaded from jamanetwork.com by guest on 01/24/2024

Downloaded from jamanetwork.com by guest on 01/24/2024

Downloaded from jamanetwork.com by guest on 01/24/2024

Downloaded from jamanetwork.com by guest on 01/24/2024

Downloaded from jamanetwork.com by guest on 01/24/2024

You might also like