You are on page 1of 4

Predicting Length of Stay for Cardiovascular Hospitalizations in the

Intensive Care Unit: Machine Learning Approach


Belal Alsinglawi 1, Fady Alnajjar 2, Omar Mubin 1, Mauricio Novoa 1, Mohammed Alorjani 3, Ola
Karajeh 4, Omar Darwish 5

Consequently, it has a direct impact on hospital resources


Abstract— Predicting Cardiovascular Length of stay based


hospitalization at the time of patients' admitting to the coronary utilization. In the traditional hospital management system, the
care unit (CCU) or (cardiac intensive care units CICU) is comprehensive transitional care interventions for
deemed as a challenging task to hospital management systems hospitalizations reduce the risk of re-admissions and the risk
globally. Recently, few studies examined the length of stay (LOS) mortality.
predictive analytics for cardiovascular inpatients in ICU.
However, there are almost scarcely real attempts utilized The machine learning prediction models play a significant
machine learning models to predict the likelihood of heart role to facilitate hospital operation success during various
failure patients length of stay in ICU hospitalization. This paper hospitalizations such as predicting inpatients length of stay,
introduces a predictive research architecture to predict Length risk of mortality, risk of re-admission or unplanned admission
of Stay (LOS) for heart failure diagnoses from electronic medical [7-8]. Several recent studies have explored the effectiveness of
records using the state-of-art- machine learning models, in prediction models to address the problem of inpatient length
particular, the ensembles regressors and deep learning of stay for cardiovascular hospitalizations. However, only a
regression models. Our results showed that the gradient boosting few of them examined the effectiveness of prediction models
regressor (GBR) outweighed the other proposed models in this for heart failure. For instance, In statistical analysis
study. The GBR reported higher R-squared value followed by approaches, Omar and Guglin [9] implemented univariate
the proposed method in this study called Staking Regressor. analysis to determine the short LOS (7 < days) and longer LOS
Additionally, The Random forest Regressor (RFR) was the (7 days) for HF patients. Almashrafi et al. [6] utilized
fastest model to train. Our outcomes suggested that deep multivariate regression model to determine the prolonged
learning-based regressor did not achieve better results than the
length of stay for congestive heart failure (HF) patients. The
traditional regression model in this study. This work contributes
to the field of predictive modelling for electronic medical records
observational study by Durstenfeld et al. [10] used statistical
for hospital management systems. generalized estimating equation in their predication for the
analysis of variance (ANOVA) to compare the predicted and
I. INTRODUCTION actual LOS for the observations. However, sample sizes of HF
patients in these studies are small, and some of these
Heart Failure (HF) or sometimes known as heart failure
approaches did not assess whether accuracy improved closer
(HF) is a clinical syndrome identified by distinctive symptoms
to discharge, such as in [7].
such as breathlessness, ankle swelling, and fatigue caused by
a structural and functional cardiac abnormality. Heart failure In machine learning-based methods, Tsai et al. [11]
results in reduced cardiac output and or elevated intracardiac compared the performance of artificial neural networks and
pressures of the heart at rest or during stress [1]. The HF linear regression LR for Length of stay HF inpatients. Other
population is ranging between 1% to 2% of the total world machine learning predictive approach such as [12-14]
population [2]. The heart failure hospitalizations are costly for examined the LOS for heart failure patient using decision tree-
hospitals, and they increase the health expenditures in many based approaches. The literature review on studies relating to
health systems globally as of 1-2 % of the healthcare budget is HF LOS prediction focused on predicting the risk of re-
spent on heart failure worldwide [3]. For instance, the future admission, and the risk of mortality. Whereas very few studies
prevalence of HF is predicted to cost the healthcare system attended to the problem of HF inpatients length of stay, and
$3.5 billion for Australia by 2030 [4] and $69.7 billion for the the majority of these studies used the statistical approaches to
United States by 2030 [5]. These figures have an economic determine the LOS prediction. There is a great need for more
impact and put pressure on hospital systems internationally studies that apply machine learning techniques to offer
and lead to an increase in the cost of HF inpatients HF hospital management systems solutions and improve patients'
hospitalizations [6] in public and private hospitals. health outcomes, discharge planning for HF, and improve
hospital resources utilization. The main key contribution of the
study is to introduce a practical predictive architecture LOS
1 4
Belal Alsinglawi, Omar Mubin, and Mauricio Novoa are with School of Ola Karajeh is with Department of Computer Science, Virginia
Computer, Data and Mathematical Sciences, Western Sydney University, Polytechnic Institute and State University, Blacksburg, VA 24061, United
Penrith NSW 2751, Australia. E-mail: b.alsinglawi@westernsydney States.
5
university.edu.au. Omar Darwish is with Department of Computer Information Systems,
2
Fady Alnajjar, is with College of Information Technology, United Arab Ferrum College, Ferrum, Virginia 24088, United States.
Emirates University, Al Ain, 15551, UAE. Email: fady.alnajjar@uaeu.ac.ae
3
Mohammed Alorjani is with Department of Pathology and
Microbiology, King Abdullah University Hospital, Faculty of Medicine
Jordan University of Science and Technology, Irbid 22110, Jordan.

978-1-7281-1990-8/20/$31.00 ©2020 IEEE 5442

Authorized licensed use limited to: Cornell University Library. Downloaded on September 03,2020 at 10:29:40 UTC from IEEE Xplore. Restrictions apply.
benchmarking the regression models versus the deep learning Regressor [19] and Stacking Regressor [20]) and evaluated the
model for heart failure hospitalizations from the hospital best regressor model with Deep Neural Network model [21].
electronic medical records in ICU settings.
In this research, we aim to analyze and compare the
prediction abilities of regression ensembles-based machine
learning method against the deep neural networks and find the
optimal prediction method in the context of heart failure LOS
regression prediction problem in ICU-based hospitalizations.
II. METHOD
This section describes the proposed predictive framework
for heart failure LOS prediction using ICU electronic medical
records data. Also, it explains the steps towards building the
predictive framework to baseline the outperforming
predictive model for HF length of stay admissions. Then, the
performing model will be selected, and the top features are Figure 1. Proposed Predictive Framework for HF Length of Stay in
passed during the model tuning stage. Relevant models ICU Hospital Management Systems.
evaluation metrics were chosen to examine the performance D. Candidates Models Evaluation
of each model.
Models evaluation is important in predictions tasks to
A. Data Description and Data Preparation examine the performance of each model and report the
The dataset was constructed from MIMIC-III dataset [15] outperforming model(s). In our study, we have used R-squared
which is an openly available dataset developed by the MIT Lab R2 or the coefficient of determination. R2 indicates how much
for Computational Physiology. It comprises de-identified variation of a dependent variable is explained by the
health data measuring 61,532 intensive care unit stays in independent variable(s) in the regression models. Also, we
intensive care unit admissions, demographics, vital signs, have used Mean average error MAE (equation 1) to measure
laboratory tests, medications, and more clinical variables. the difference between continuous variables regression
evaluation metrics. Noting that, the closer R is to the value 1.0,
During the data preparation process, we have used the better the prediction model performance. We have
(process_mimic.py), which was developed [16] to mine the evaluated regression models against each other (stage 1), then
cardiovascular inpatients' variables. We have merged the we have assessed the performance of the winning regressor
mined (csv file) which contains (Vitals, laboratory test, with the DNN model (stage 2).
Demographic) with variables from MIMIC-III tables
∑𝑛 |𝐿𝑂𝑆𝑝𝑟𝑒𝑑 − 𝐿𝑂𝑆𝑜𝑏𝑠𝑟𝑣 |
("ADMISSIONS", "PATIENTS", "ICUSTAYS", and 𝑀𝐴𝐸 = 𝑖=1 (1)
𝑛
"DIAGNOSES_ICD"). Data blending was performed, and all Where LOSpred is the predicted length of stay, LOSobsrv is
selected variables were combined using merge and join the observed length of stay, and n is the sample size (n = HF
(DataFrames) with Pandas in Python in one final table patients)
("HF_MIMIC1_4v"). For missing values, a technique called
"Impute Missing Values" [17] was applied to replace missing E. Model Fitting and Hyperparameters Optimization
values with specific values that have meaning to heart failure Hyperparameters or tuning is vital for the success of the
admitted cases from the dataset. fitting model. Hyperparameters determine the skill for the
B. Data Preprocessing selected model, learned from data, and they are manually
configuration by the machine learning expert. Grid search
Pearson correlation test was implemented to determine the parameter tuning and random search parameter tuning [22-23]
correlation between the independent variable and the output are common algorithm tuning which is considered as the last
variable. The features (25 independent variables) inputs for the step before obtaining predictions results. In our study, the
candidate models were selected according to the correlation choice of model tuning is based on the winning model.
between the independent variables (LOS) and the dependent
variables. To convert categorical variables, (One-hot-encoding III. RESULTS AND DISCUSSION
"0,1") was used for variables (such as Gender, Admit type,
Admit location, etc.). This technique is essential to improve We have identified all distinct heart failure hospitalizations
results for prediction models. Data scaling "normalization" (1592) based on ICD-9. Table 2 shows selected variables
technique was applied to get scaled features from the dataset, (Demographic, and Vital) from the HF patients' characteristics
which will be passed to the prediction designs. in the study. Male inpatients HF hospitalizations (52%) were
slightly more than female inpatients HF hospitalizations
C. Candidates Models (48%). The LOS patients mean, and median are (67.74, 62),
The chosen models choice (see Figure 1) demonstrated the respectively. The minimum LOS is 0.13 days, and the
nature of the length of stay data type (scale / continuous), maximum LOS is 93.94 days. All prediction models were
which was extracted from the MIMIC-III repository. Further, implemented using Python programming language. Scikit-
we have considered using regression-based models. learn [24] library was used during the regression models
Accordingly, we have evaluated regression-based model, such building and Keras [25] for the deep learning model. We have
as (Random Forest Regressor [18], Gradient Boosting divided HF dataset into training (64%) and testing (34%). We

5443

Authorized licensed use limited to: Cornell University Library. Downloaded on September 03,2020 at 10:29:40 UTC from IEEE Xplore. Restrictions apply.
have compared regression models with deep learning model
using the evaluation of the appropriate metrics.
We have considered three regression models (Random
Forest Regressor "RFR", Gradient Boosting Regressor,
"GBR", and the Stacking Regressor). The regressors were built
and evaluated, including exaction time. The results (Figure 2)
showed that the three models (GBR, Stacking Regression,
RFR) showed relatively close R2 and MAE (R2:0.81, 0.81, and
0.80) and (MAE: 2.00, 1.92, and 1.98) respectively. Stacking
Regressor was the slowest in prediction execution speed
(11.85 seconds), and GBR was the fastest in execution time.
GBR and Stacking had the best R2. Since the model training
and evaluating time is vital in real-time settings, such as
predicting clinical and medical cases. Therefore, the GBR was
our winning Regressor model.
The Deep Neural Networks (DNN) approach was
implemented using Keras with TensorFlow backend using
Figure 2. Single Regression predictors vs Stacked predictor.
three layers (25 dense layers). We have used "linear" for
activation and Stochastic gradient descent (SGD) for the
optimizer. Figure 3 illustrates the DNN prediction results.
Whereas, Table 1 compares the prediction results for the
regression models vs the DNN model. We noticed regression
models showed overall better results compared to DNN.
Generally, DNN performs well in larger data size and high
dimensional data due to its automatic learning feature benefit.
We have decided to fit GBR in the model tuning.
Since Gradient Boosting Regressor performed well
compared to other regression models as well as the DNN, we
refined the model using GridSearchCV from scikit-learn to get
the best estimators (hyperparameters) for the GBR model. One
of the advantages of the GBR is the fact that it is robust to the
over-fitting problem; a hence larger number of (n_estimatrors
= 200) can result in better performance. The second tuned
parameter is the (max_depth = 3), where tuning this parameter
to the performance, and it depends on the iteration of the input
Figure 3. Loss vs MAE for the DNN model on training and testing
value. The default value is (3). We have set the depth of the HF sets.
iteration maximum of three individual regression estimators.
The third tuned parameter was the loss function, and we have
set it to ('ls', 'lad'), where ls is: least squares regression, the lad IV. LIMITATIONS
is least absolute deviation. We have fitted GBR on the In this section, we report some of the limitations associated
predicators (Figure 4). After we have fitted the GBR with the with our study. First of all, since the aim was to develop a
best hyperparameters and the top features, the model achieved machine learning predictive length of stay approach for HF
improved and achieved better results (R2: 0.84±0.0.7). patients. We did not consider the other existing medical
comorbidities associated with HF inpatient, which may need
Table 1. Regression vs DNN Prediction results
further research exploration in a future study to measure its
Model R2 MAE time impact on the LOS or the extended LOS. Also, we did not
0.95 sec address post-hospitalization intervention and its effect on heart
RFR 0.8±0.08 1.98±0.16
failure LOS. Therefore, a further research study is needed to
GBR 0.81±0.07 2.0±0.14 0.85 sec
address the post-hospitalization intervention on LOS. We aim
Stacking
0.81±0.08 1.92±0.15
11.84 sec to exploit other deep learning approaches such as Long-Term
Regression Short Memory "LSTM" algorithm in a future research work.
DNN 4.11 sec
0.77±0.06 2.30±0.18
Regression
V. CONCLUSION
This study aimed to develop machine learning length of
stay predictive framework to predict heart failure for patients'
hospitalization in intensive care units using electronic medical
records historical data. The GBR regressor outperformed all
other models in this study with (R2 0.8±0.08). Noting that the
deep learning model (Deep Learning Regression) did not
report better results than regression models. Hence, it will be

5444

Authorized licensed use limited to: Cornell University Library. Downloaded on September 03,2020 at 10:29:40 UTC from IEEE Xplore. Restrictions apply.
commendable to exam DNN on a big EMR dataset with a large [3] W. Lesyuk, C. Kriza, and P. Kolominsky-Rabas, "Cost-of-illness
volume of heart failure hospitalizations and compare it with studies in heart failure: A systematic review 2004–2016," BMC
regression models in a future study. Moreover, in future work, cardiovascular disorders, vol. 18, no. 1, p. 74, 2018.
[4] Y.-K. Chan et al., "Current and projected burden of heart failure in the
we will validate our proposed predictive LOS framework on a Australian adult population: a substantive but still ill-defined major
real-world external dataset to evaluate the performance of the health issue," BMC health services research, vol. 16, no. 1, p. 501,
proposed model on other data sources. Our contribution 2016.
provides new insight to the healthcare decision-makers in [5] S. L. Jackson, X. Tong, R. J. King, F. Loustalot, Y. Hong, and M. D.
hospital management systems, clinical AI researches, and the Ritchey, "National burden of heart failure events in the United States,
clinical practitioner to predict patients' future outcomes, as 2006 to 2014," Circulation: Heart Failure, vol. 11, no. 12, p. e004873,
2018.
well as determining the costly HF hospitalizations using the
[6] A. Almashrafi, H. Alsabti, M. Mukaddirov, B. Balan, and P. Aylin,
artificial intelligence (AI) approaches. Further, it facilitates the "Factors associated with prolonged length of stay following cardiac
mission to clinical AI researches towards a more surgery in a major referral hospital in Oman: a retrospective
comprehensive framework to predict the common LOS heart observational study," BMJ open, vol. 6, no. 6, p. e010764, 2016.
failure hospitalizations from the clinical diagnosis and [7] B. Alsinglawi and O. Mubin, "Predictive Analytics and Deep Learning
improving resources utilization using the machine learning Techniques in Electronic Medical Records: Recent Advancements and
Future Direction," in Workshops of the International Conference on
advancements. Advanced Information Networking and Applications, 2019: Springer,
pp. 907-914.
[8] B. Alsinglawi, O. Mubin, M.Novoa, and O.Darwish, "Benchmarking
Predictive Models in Electronic Health Records: Sepsis Length of
Stay Prediction" to appear in International Conference on Advanced
Information Networking and Applications, 2020: Springer.
[9] H. Omar and M. Guglin, "Longer-than-average length of stay in acute
heart failure," Herz, vol. 43, no. 2, pp. 131-139, 2018.
[10] M. S. Durstenfeld, M. D. Saybolt, A. Praestgaard, and S. E. Kimmel,
"Physician predictions of length of stay of patients admitted with heart
failure," Journal of hospital medicine, vol. 11, no. 9, pp. 642-645,
2016.
[11] P.-F. J. Tsai et al., "Length of hospital stay prediction at the admission
stage for cardiology patients using artificial neural network," Journal of
healthcare engineering, vol. 2016, 2016.
[12] S. Barnes, E. Hamrock, M. Toerper, S. Siddiqui, and S. Levin, "Real-
Figure 4. Best 10 features (predictors) for GBR model. * beats per time prediction of inpatient length of stay for discharge prioritization,"
minutes, ** breaths per minutes, Journal of the American Medical Informatics Association, vol. 23, no.
e1, pp. e2-e10, 2015.
[13] L. Turgeman, J. H. May, and R. Sciulli, "Insights from a machine
Table 2. Patient Characteristics learning model for predicting the hospital Length of Stay (LOS) at the
time of admission," Expert Systems with Applications, vol. 78, pp.
HF 376-385, 2017
(Mean/ Median) Std Min Max [14] N. binti Omar, E. Supriyanto, and R. H. Al-Ashwal, "Personalized
Demographics Clinical Pathway for Heart Failure Management," in 2018 International
Conference on Applied Engineering (ICAE), 2018: IEEE, pp. 1-5
0.13 93.94 [15] A. E. Johnson et al., "MIMIC-III, a freely accessible critical care
LOS (10.77 / 8.13) days 9.16
days day
database," Scientific data, vol. 3, p. 160035, 2016.
Age 67.74/62 13.17 0 89 [16] D. Gligorijevic et al., "Deep Attention Model for Triage of Emergency
Mean: Department Patients," in Proceedings of the 2018 SIAM International
Weight 9.79 34.59 127.59 Conference on Data Mining, 2018: SIAM, pp. 297-305.
80.84
[17] J. A. Sterne et al., "Multiple imputation for missing data in
Gender Female: 765 (48%) Male: 827 (52%)
epidemiological and clinical research: potential and pitfalls," Bmj, vol.
Vital 338, p. b2393, 2009.
[18] A. Liaw and M. Wiener, "Classification and regression by
Temperature
Mean: 97.83 3.41 33.9 100.48 randomForest," R news, vol. 2, no. 3, pp. 18-22, 2002.
(F)
Mean: [19] scikit-learn. "Gradient Boosting Regressor." https://scikit-learn.org
Heart Rate 22.19 35.5 161.41 (accessed 15/01/2020, 2020).
92.27
Respiratory Mean: [20] L. Breiman, "Stacked regressions," Machine Learning, journal article
3.40 8.4 27.8 vol. 24, no. 1, pp. 49-64, July 01 1996, doi: 10.1007/bf00117832.
Rate 18.02
Pulse Mean: [21] J. Schmidhuber, "Deep learning in neural networks: An overview,"
1.60 57.85 99.83 Neural networks, vol. 61, pp. 85-117, 2015.
Oximetry 97.09
[22] J. Bergstra and Y. Bengio, "Random search for hyper-parameter
optimization," Journal of Machine Learning Research, vol. 13, no.
REFERENCES Feb, pp. 281-305, 2012.
[1] P. Ponikowski et al., "2016 ESC Guidelines for the diagnosis and [23] J. Bergstra and Y. Bengio, "Random search for hyper-parameter
treatment of acute and chronic heart failure: The Task Force for the optimization," Journal of machine learning research, vol. 13, no. Feb,
diagnosis and treatment of acute and chronic heart failure of the pp. 281-305, 2012
European Society of Cardiology (ESC). Developed with the special [24] F. Pedregosa et al., "Scikit-learn: Machine learning in Python,"
contribution of the Heart Failure Association (HFA) of the ESC," Journal of machine learning research, vol. 12, no. Oct, pp. 2825-2830,
European journal of heart failure, vol. 18, no. 8, pp. 891-975, 2016. 2011.
[2] G. Savarese and L. H. Lund, "Global public health burden of heart [25] F. Chollet, "Keras," ed, 2015
failure," Cardiac failure review, vol. 3, no. 1, p. 7, 2017.

5445

Authorized licensed use limited to: Cornell University Library. Downloaded on September 03,2020 at 10:29:40 UTC from IEEE Xplore. Restrictions apply.

You might also like