Original Research
A R T I C L E I N F O

Keywords: Health informatics; Machine learning

A B S T R A C T

Objective: Machine learning (ML) models for allocating readmission-mitigating interventions are typically selected according to their discriminative ability, which may not necessarily translate into utility in the allocation of resources. Our objective was to determine whether ML models for allocating readmission-mitigating interventions have different usefulness based on their overall utility and discriminative ability.

Materials and methods: We conducted a retrospective utility analysis of ML models using claims data acquired from the Optum Clinformatics Data Mart, including 513,495 commercially-insured inpatients (mean [SD] age 69 [19] years; 294,895 [57%] female) from all 50 states over the period January 2016 through January 2017, with a mean 90-day cost of $11,552. Utility analysis estimates the cost, in dollars, of allocating interventions for lowering readmission risk based on the reduction in the 90-day cost.

Results: Allocating readmission-mitigating interventions based on a GBDT model trained to predict readmissions achieved an estimated utility gain of $104 per patient and an AUC of 0.76 (95% CI 0.76, 0.77); allocating interventions based on a model trained to predict cost as a proxy achieved a higher utility of $175.94 per patient and an AUC of 0.62 (95% CI 0.61, 0.62). A hybrid model combining both intervention strategies is comparable with the best models on either metric. Estimated utility varies by intervention cost and efficacy, with each model performing best under different intervention settings.

Conclusion: We demonstrate that machine learning models may be ranked differently based on overall utility and discriminative ability. Machine learning models for allocation of limited health resources should consider directly optimizing for utility.
* Corresponding author at: 1265 Welch Road X-235 MC: 5479, Stanford, CA, USA.
1 These authors contributed equally to this work.
https://doi.org/10.1016/j.jbi.2021.103826
Received 20 January 2021; Received in revised form 23 May 2021; Accepted 28 May 2021; Available online 1 June 2021
1532-0464/© 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
M. Ko et al. Journal of Biomedical Informatics 119 (2021) 103826
cost as a proxy for the task of readmission prediction, and a hybrid model that incorporated the predictions of the two models. We compared all three models on discriminative ability, measured by AUC, and on utility, measured by estimated cost savings.

2. Methods

2.1. Ethics

The Stanford University administrative panel for the Protection of Human Subjects approved this study.

2.2. Data

Claims data were acquired from the Optum Clinformatics Data Mart, which collects administrative health claims for commercially insured members nationally. Claims data were verified, adjudicated, adjusted with a standard pricing methodology to account for differences in pricing across health plans and provider contracts, and de-identified prior to inclusion in the Data Mart dataset. For this analysis, the data are health plan claims from all 50 states for commercially-insured individuals who were admitted as inpatients over the period January 2016 through January 2017. The data include demographics (age, sex from enrollment applications) and all medical claims data, including inpatient visits, International Classification of Disease series 10 diagnostic codes, and payments. The data were split into a training set (80%) and a test set (20%). The training set was further split into two non-overlapping subsets, with 12.5% used to train the propensity score model as part of our method to estimate utility, and the other 87.5% used for training the readmission and cost models.

2.3. Outcome

The primary outcome in this study was all-cause unplanned readmission within 30 days of discharge. A readmission was defined as a subsequent hospitalization following an eligible index admission. An index admission was eligible if the following criteria were met: (1) the patient was enrolled for 12 consecutive months prior to discharge and 90 continuous days post-discharge, (2) the patient was discharged alive, (3) the patient did not leave against medical advice, (4) the patient was not transferred to another acute care hospital setting, (5) the patient's primary diagnosis was not among psychiatric disorders, cancer treatments, or rehabilitation care, and (6) the patient's previous index admission was not within 30 days prior to discharge. Eligible index admissions were considered to have a subsequent readmission if the patient had an unplanned admission within 30 days of discharge. An admission was defined as "planned," and not part of the readmission measure, if it contained procedures from a set of pre-specified planned procedure codes (e.g., organ transplants or chemotherapy) and did not include acute diagnosis codes for potentially planned procedures, as described by the Planned Readmission Algorithm (Version 4.0, 2019) [11,12]. All other readmissions were considered unplanned, regardless of cause. The outcome used in this study was consistent with the methodology used in the U.S. Centers for Medicare & Medicaid Services (CMS) definitions, with the exception that we required 90 continuous days of enrollment post-discharge, rather than 30 days, for the purposes of our cost analysis. The details of the labelling and cohort exclusion procedure can be viewed in Appendices D & E of the CMS 2019 Measures Update: HWR [12].

The secondary outcome in this study was the cost accrued in the 90 days post-discharge of an index admission, herein referred to as post-discharge cost. The accrued cost was defined as total standardized gross payments (not charges) to all providers and facilities. The cost was computed for each patient index admission by summing standardized costs (in US dollars) over 90 days, including post-year claims corrections and zero spending among enrolled individuals without medical claims.

2.4. Model development and validation

We developed machine learning models using gradient boosted decision trees (GBDTs) to predict readmission risk (readmission model) and post-discharge cost (cost model). We chose GBDTs because they employ decision trees to capture nonlinear relationships in data that traditional linear models are unable to capture, can handle mixes of categorical and continuous covariates, and scale well with large amounts of data [13]. Moreover, it is straightforward to obtain a variable importance ranking from the model, which makes the approach more interpretable than many other machine learning models [14]. GBDTs have also been shown to achieve state-of-the-art performance in readmission risk prediction [15]. The training procedure for GBDTs involves the construction of a sequence of decision trees such that each tree learns from the errors of the prior tree to iteratively improve predictions [16]. We used the LightGBM framework to develop the models, which implements several algorithmic optimizations on standard gradient boosting to allow for additional training efficiency [17].

We trained the readmission and cost models on an identical set of features extracted for every index admission. We included demographics (age, sex) associated with the patient, diagnostic and procedure codes, and location of index admission. Sex and index admission location were one-hot encoded, while diagnostic and procedure codes were encoded as aggregated counts of Clinical Classifications Software categories [18] from the 12 months prior to discharge. In addition, the following features included in the HOSPITAL Score [19] were computed: (1) number of procedures during the hospital stay, (2) number of admissions in the previous year, (3) number of hospital stays of 5 or more days, and (4) type of index admission (urgent or elective). All features included in the LACE Index [20] were also computed: (1) length of stay, (2) whether the admission was acute, (3) Charlson Comorbidity Index [21], and (4) number of ED visits prior to the index admission. A total of 539 features were used. No missing data were present in our final dataset. Feature selection was not used prior to model training, since previous research has shown that using the full set of features rather than subsets of features was better for GBDTs in predicting readmission risk [15].

We used 3-fold cross-validation on the training data subset to select the hyper-parameters for the models, including the number of trees, the maximum depth of each tree, and the minimum loss reduction required to partition leaf nodes, based on the cross-validation area under the receiver operating characteristic curve (AUC) and R2 for the readmission model and cost model, respectively. Each model was refitted to the full training set using the best parameters determined from 3-fold cross-validation.

The readmission model, cost model, and an additional hybrid model were all built for the task of unplanned readmission prediction. The readmission model employed standard GBDT binary classification for predicting readmissions, whereas the cost model employed standard GBDT regression for predicting 90-day cost. The hybrid model multiplied the output of the readmission model with the output of the cost model to obtain an expected value of cost. We chose to develop a hybrid model that combined predictions of the readmission and cost models in the hope of netting predictive and utility benefits from both models. The outputs of all three models (readmission risk, 90-day cost, and expected cost) were used to rank patients in our test set (see Fig. 1). We evaluated the discriminative performance of these models using AUC on the test set, along with 95% confidence intervals computed using the nonparametric bootstrap method with 1,000 bootstrap replicates. Additionally, for the cost model only, we evaluated performance on the task of predicting cost using R2 and mean absolute error, using the same bootstrapping method for 95% confidence intervals.

For supplemental analysis, we constructed an additional cost-sensitive readmission model for comparison. The cost-sensitive readmission model was trained with the same protocol as the readmission model, but employed a loss weighted by the 90-day cost of each patient.
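The ranking rule for the hybrid model described above reduces to a sort on the product of the two model outputs. A minimal sketch of that step, with purely illustrative inputs (in the study the probabilities and costs come from the two GBDTs; here they are given directly):

```python
def rank_by_expected_cost(readmission_probs, predicted_costs):
    """Rank patient indices by expected cost = P(readmission) * predicted
    90-day cost, highest first, mirroring the hybrid model's ranking rule."""
    expected = [p * c for p, c in zip(readmission_probs, predicted_costs)]
    # Sort indices so the patient with the largest expected cost comes first.
    return sorted(range(len(expected)), key=lambda i: expected[i], reverse=True)

# Illustrative predictions for four patients (not study data).
probs = [0.10, 0.80, 0.30, 0.05]
costs = [20000.0, 1000.0, 15000.0, 50000.0]
rank_by_expected_cost(probs, costs)  # → [2, 3, 0, 1]
```

Note that the ordering can differ from ranking on readmission probability alone (patient 1 has the highest risk but the lowest expected cost), which is exactly why the two models can disagree on whom to treat.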
Fig. 1. The readmission model, cost model, and an additional hybrid model were all built for the task of unplanned readmission prediction. The readmission model employed standard GBDT binary classification for predicting readmissions, whereas the cost model employed standard GBDT regression for predicting 90-day cost. The hybrid model multiplied the output of the readmission model with the output of the cost model to obtain an expected value of cost. The outputs of all three models (readmission risk, 90-day cost, and expected cost) were used to rank patients.
2.5. Individualized utility analysis

We used an approach, which we refer to as Individualized Utility Analysis (IUA), for comparing predictive models based on estimated utility. IUA estimates the utility, in dollars, of allocating interventions for reducing readmission risk based on the reduction in the subsequent 90-day cost of care, incorporating heterogeneity in outcomes and costs across patients [23]. Although other utility analyses are similar in estimating model utility, other approaches treat all patients with the same global estimates of outcomes and costs [22], which may not be generalizable to real-world settings. We used IUA, which estimates the utility at an individual level. Although we present the example using dollar cost, other units, such as estimates of disability-adjusted life years, can also be used for such utility estimates.

IUA constructs an outcome-cost matrix for each patient to outline the utility for every possible outcome scenario. Each matrix uses the observed patient cost, estimated counterfactual patient cost, known intervention cost, and intervention efficacy. This method makes the following assumptions: (1) intervention cost and efficacy are fixed across patients, and (2) the estimated counterfactual cost is a good estimate of the cost incurred for a patient, had they received an effective intervention.

Upon assigning interventions based on a predictive model, the 90-day cost of the patient's care may follow these 4 scenarios (see Fig. 2):

• Scenario 1: Intervention is not assigned to a patient who was not readmitted
• Scenario 2: Intervention is not assigned to a patient who was readmitted
• Scenario 3: Intervention is assigned to a patient who was not readmitted
• Scenario 4: Intervention is assigned to a patient who was readmitted

In both Scenarios 1 and 2, the 90-day cost is the observed 90-day cost in the dataset, since no intervention has occurred. In Scenario 3, the 90-day cost is the observed 90-day cost plus the cost of the intervention. It is important to note that the intervention in Scenario 3 is assumed to be both harmless and effective. In Scenario 4, the intervention is assumed to be effective in reducing readmission risk, and the 90-day cost is defined to be the weighted average of the patient's counterfactual cost (with weight p, the efficacy of the intervention) and their observed cost (with weight 1-p).
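The four scenarios above determine a per-patient 90-day cost. A sketch of that rule as a small function, under our reading that the intervention cost is also incurred in Scenario 4 (the function and variable names are ours, not the paper's):

```python
def ninety_day_cost(observed, counterfactual, treated, readmitted,
                    intervention_cost, efficacy):
    """90-day cost of care under the four intervention/outcome scenarios."""
    if not treated:
        # Scenarios 1 and 2: no intervention, so the cost is the observed cost.
        return observed
    if not readmitted:
        # Scenario 3: intervention given but the patient was not readmitted.
        return observed + intervention_cost
    # Scenario 4: intervention given to a readmitted patient. The cost is the
    # efficacy-weighted average of counterfactual and observed cost, plus the
    # intervention cost (our assumption that the intervention is still paid for).
    return efficacy * counterfactual + (1 - efficacy) * observed + intervention_cost

# Illustrative values: a $500 intervention with 10% efficacy, as in the IUA.
ninety_day_cost(observed=20000, counterfactual=8000, treated=True,
                readmitted=True, intervention_cost=500, efficacy=0.10)
# 0.1 * 8000 + 0.9 * 20000 + 500 = 19300.0
```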
2.7. Counterfactual patient cost
associated with Scenario 4. However, since we do not have access to the counterfactual cost in observational data, we must estimate it using causal inference. For this study we used propensity score matching (PSM) [24], a commonly used statistical method, to estimate counterfactual patient costs. We note that our method, IUA, is not limited to PSM, but can be used with any reasonable causal inference method (nearest neighbor matching, optimal matching) suitable for the dataset at hand. We find that for our high-dimensional dataset, PSM can match patients with similar probability of readmission across the readmitted and non-readmitted groups, with the advantage of being a well-established approach [24].

Propensity scores were obtained from an L1-regularized logistic regression model trained to predict 90-day cost using an identical feature set and hyperparameter selection procedure as the readmission and cost models. The matching procedure assigned patients to one of 20 equal-width propensity score bins from 0.0 to 0.6. Within each bin, the median cost of the non-readmitted patient group was used to impute the counterfactual costs for the readmitted patients (i.e., their cost had they not had a readmission) (see Supplemental Table 2). We evaluated the goodness of fit of the logistic regression model on the test set using AUC, with its 95% confidence intervals computed using the nonparametric bootstrap with 1,000 bootstrap replicates. The propensity score model had an AUC of 0.691 (95% CI 0.688, 0.695), a calibration slope of 0.945 (95% CI 0.931, 0.957), and a calibration intercept of 0.0124 (95% CI 0.0085, 0.016).

2.8. Feature importances

We quantified the impact of each input feature on the readmission and cost models using the SHAP (SHapley Additive exPlanations) method. The method explains a prediction by allocating credit among the input features; feature credit is calculated using Shapley values [25], as the change in the expected value of the model's prediction when a feature is observed versus unknown. To uncover clinically important features that were globally predictive of the readmission outcome and the cost outcome, the Shapley values for features on individual predictions were aggregated and reported along with their averaged absolute Shapley contributions as a percent of the contributions of all the features.

3. Results

The final dataset consisted of 513,495 patients, divided into a training set of 410,796 patients and a test set of 102,699 patients. There was no patient overlap between the two sets. Overall, the average number of index admissions per patient was 1.18. The prevalence of readmission was 25.41%, and the mean 90-day cost was $11,552. Table 1 details the patient, admission, and outcome characteristics in the dataset.

3.1. Model validation

On the task of predicting unplanned readmissions, the readmission model achieved an AUC of 0.76 (95% CI 0.76, 0.77) on the test set; the cost model achieved an AUC of 0.62 (95% CI 0.61, 0.62); and the hybrid model achieved an AUC of 0.72 (95% CI 0.71, 0.72). On the task of predicting cost, the cost model achieved an R2 of 0.22 (95% CI 0.20, 0.23) and a mean absolute error of $13,574 (95% CI $13,439, $13,714). The receiver operating characteristic (ROC) curves for each of the models are shown in Fig. 3.

3.2. Individualized utility analysis

Utility at 0% intervention was -$12,719, which we defined as baseline utility. We express all computed utility relative to baseline utility. A fixed cost of $500 and a fixed efficacy of 10% were used for the IUA based on prior literature [26–28], which estimates the efficacy of readmission reduction interventions to be between 10 and 50% and the cost to be between $100 and $2,000. As computed by the IUA, the readmission model had a maximum utility gain of $104.23 per patient when 21.4% of the patient population was treated, the cost model had a maximum utility gain of $175.94 per patient when 34.0% of the patient population was treated, and the hybrid model had a maximum utility gain of $177.21 when 34.3% of the patient population was treated. Both the cost and hybrid models' maximum utility gains per patient were significantly greater than that of the readmission model using bootstrapping and Bonferroni correction (p < .001). No significant difference in utility was found between the cost and the hybrid model. Fig. 3 shows the utility gain of each model at each percentage of the patient population treated. Rerunning the analysis with the cost-sensitive readmission model in place of the readmission model yielded similar results (see Supplemental Fig. 3).

3.3. Sensitivity analysis

A sensitivity analysis of intervention cost and efficacy was performed to measure the utility of each model under various settings of the cost and efficacy of the intervention. Under low intervention cost and high intervention efficacy settings, the readmission model attains the best utility. But when the intervention is expensive and less effective, the cost and hybrid models outperform the readmission model based on overall utility. Fig. 4 shows the best model based on highest utility gain under various intervention cost and efficacy values. Supplemental Figs. 1 and 2 show individual model utility and model utility differences.

3.4. Feature importances
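The bin-and-impute step of the matching procedure in Section 2.7 can be sketched as follows. This is our reading of the published procedure, with illustrative numbers; the study used 20 equal-width propensity-score bins over 0.0 to 0.6 and the within-bin median cost of non-readmitted patients:

```python
from statistics import median

def impute_counterfactual_costs(scores, costs, readmitted,
                                n_bins=20, lo=0.0, hi=0.6):
    """Assign patients to equal-width propensity-score bins; within each bin,
    impute each readmitted patient's counterfactual cost as the median cost
    of the non-readmitted patients in that bin."""
    width = (hi - lo) / n_bins
    bins = [min(int((s - lo) / width), n_bins - 1) for s in scores]
    imputed = list(costs)
    for b in range(n_bins):
        controls = [costs[i] for i in range(len(costs))
                    if bins[i] == b and not readmitted[i]]
        if not controls:
            continue  # no non-readmitted patients in this bin to match against
        m = median(controls)
        for i in range(len(costs)):
            if bins[i] == b and readmitted[i]:
                imputed[i] = m
    return imputed

scores = [0.05, 0.055, 0.058, 0.32, 0.31]   # hypothetical propensity scores
costs = [1000, 3000, 40000, 2000, 25000]    # observed 90-day costs
readmit = [False, False, True, False, True]
impute_counterfactual_costs(scores, costs, readmit)
# readmitted patients 2 and 4 are imputed with their bin's median control cost (2000)
```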
Fig. 3. The left panel shows the ROC curves for three models for assigning interventions to prevent a readmission: a readmission model (AUC = 0.76), which assigns intervention according to readmission probability; a cost model (AUC = 0.62), which assigns intervention according to the patient's cost; and a hybrid cost + readmission model (AUC = 0.72), which assigns intervention according to the product of cost and readmission probability. The right panel shows the utility realized in terms of the average savings per patient given a certain number of interventions provided. With a fixed intervention cost of $500 and efficacy of 10%, the readmission model had a maximum utility gain of $104.23 per patient when 21.4% of the patient population was treated, the cost model had a maximum utility gain of $175.94 per patient when 34.0% of the patient population was treated, and the hybrid model had a maximum utility gain of $177.21 when 34.3% of the patient population was treated.
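The utility curve in the right panel can be reproduced in outline: rank patients by a model's score, sweep the treated fraction, and apply the scenario cost rules from the Methods. A sketch with made-up inputs; the efficacy of 0.5 below is within the 10-50% range cited in the literature, while the figure fixes cost at $500 and efficacy at 10%:

```python
def max_utility_gain(scores, observed, counterfactual, readmitted,
                     intervention_cost, efficacy):
    """Treat the top-k patients by score for k = 0..n; return the best
    per-patient savings versus treating no one, and the treated fraction."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    baseline = sum(observed)                  # total cost when no one is treated
    best_gain, best_frac = 0.0, 0.0
    total = baseline
    for k, i in enumerate(order, start=1):
        # Treating patient i moves them from their observed cost to the
        # Scenario 3 (not readmitted) or Scenario 4 (readmitted) cost.
        if readmitted[i]:
            new_cost = (efficacy * counterfactual[i]
                        + (1 - efficacy) * observed[i] + intervention_cost)
        else:
            new_cost = observed[i] + intervention_cost
        total += new_cost - observed[i]
        gain = (baseline - total) / n         # savings per patient
        if gain > best_gain:
            best_gain, best_frac = gain, k / n
    return best_gain, best_frac

scores = [0.9, 0.2, 0.8, 0.1]                # model scores used for ranking
observed = [30000, 1000, 20000, 500]         # observed 90-day costs
counterfactual = [5000, 1000, 4000, 500]     # imputed costs absent readmission
readmitted = [True, False, True, False]
max_utility_gain(scores, observed, counterfactual, readmitted,
                 intervention_cost=500, efficacy=0.5)
# → (4875.0, 0.5): best savings per patient when half the cohort is treated
```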
Fig. 5. Important features for the readmission model (left) and cost model (right). The impact of each feature on the discriminative ability of the readmission and cost models was quantified using the SHAP (SHapley Additive exPlanations) method, which explains a prediction by allocating credit among the input features. Feature importances on individual predictions were aggregated and reported along with their averaged absolute Shapley contributions as a percent of the contributions of all the features.
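In code, the aggregation described in the caption amounts to averaging absolute per-prediction Shapley values per feature and normalizing to percentages. A sketch with a made-up Shapley matrix (real values would come from a SHAP explainer applied to the GBDTs):

```python
def shap_percent_contributions(shap_values):
    """shap_values: rows are predictions, columns are features. Returns each
    feature's mean absolute Shapley value as a percent of the total across
    all features."""
    n_features = len(shap_values[0])
    mean_abs = [sum(abs(row[j]) for row in shap_values) / len(shap_values)
                for j in range(n_features)]
    total = sum(mean_abs)
    return [100.0 * m / total for m in mean_abs]

# Two predictions over three hypothetical features.
shap_matrix = [[2.0, -1.0, 1.0],
               [4.0, 1.0, -1.0]]
shap_percent_contributions(shap_matrix)  # → [60.0, 20.0, 20.0]
```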
[4] E.J. Emanuel, R.M. Wachter, Artificial intelligence in health care: will the value match the hype? JAMA 321 (23) (2019) 2281, https://doi.org/10.1001/jama.2019.4914.
[5] S.W. Grant, G.S. Collins, S.A.M. Nashef, Statistical primer: developing and validating a risk prediction model, Eur. J. Cardiothorac. Surg. 54 (2) (2018) 203–208, https://doi.org/10.1093/ejcts/ezy180.
[6] A.K. Triantafyllidis, A. Tsanas, Applications of machine learning in real-life digital health interventions: review of the literature, J. Med. Internet Res. 21 (4) (2019), e12286, https://doi.org/10.2196/12286.
[7] S.F. Jencks, M.V. Williams, E.A. Coleman, Rehospitalizations among patients in the medicare fee-for-service program, N. Engl. J. Med. 360 (14) (2009) 1418–1428, https://doi.org/10.1056/NEJMsa0803563.
[8] D.R. Levinson, Adverse events in hospitals: national incidence among medicare beneficiaries, Department of Health and Human Services Office of the Inspector General, 2010.
[9] H.M. Krumholz, K. Wang, Z. Lin, et al., Hospital-readmission risk – isolating hospital effects from patient effects, N. Engl. J. Med. 377 (11) (2017) 1055–1064, https://doi.org/10.1056/NEJMsa1702321.
[10] C.K. McIlvennan, Z.J. Eapen, L.A. Allen, Hospital readmissions reduction program, Circulation 131 (20) (2015) 1796–1803, https://doi.org/10.1161/CIRCULATIONAHA.114.010270.
[11] L.I. Horwitz, J.N. Grady, D.B. Cohen, et al., Development and validation of an algorithm to identify planned readmissions from claims data, J. Hosp. Med. 10 (10) (2015) 670–677, https://doi.org/10.1002/jhm.2416.
[12] U.S. Centers for Medicare & Medicaid Services (CMS), Measure Methodology, 2019, https://www.qualitynet.org/inpatient/measures/readmission/methodology.
[13] H. Zhang, S. Si, C.-J. Hsieh, GPU-acceleration for large-scale tree boosting, arXiv:1706.08359 [cs, stat], 2017, http://arxiv.org/abs/1706.08359.
[14] F. Cabitza, R. Rasoini, G.F. Gensini, Unintended consequences of machine learning in medicine, JAMA 318 (6) (2017) 517, https://doi.org/10.1001/jama.2017.7797.
[15] L. Einav, A. Finkelstein, S. Mullainathan, Z. Obermeyer, Predictive modeling of U.S. health care spending in late life, Science 360 (6396) (2018) 1462–1465, https://doi.org/10.1126/science.aar5045.
[16] J. Friedman, T. Hastie, R. Tibshirani, Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors), Ann. Stat. 28 (2) (2000) 337–407, https://doi.org/10.1214/aos/1016218223.
[17] G. Ke, Q. Meng, T. Finley, et al., LightGBM: a highly efficient gradient boosting decision tree, in: NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 3149–3157.
[18] CCS_10 (Clinical Classifications Software 10) – Synopsis, https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CCS_10/index.html.
[19] J.D. Donzé, M.V. Williams, E.J. Robinson, et al., International validity of the HOSPITAL score to predict 30-day potentially avoidable hospital readmissions, JAMA Intern. Med. 176 (4) (2016) 496, https://doi.org/10.1001/jamainternmed.2015.8462.
[20] C. van Walraven, I.A. Dhalla, C. Bell, et al., Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community, Can. Med. Assoc. J. 182 (6) (2010) 551–557, https://doi.org/10.1503/cmaj.091117.
[21] H. Quan, V. Sundararajan, P. Halfon, et al., Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data, Med. Care 43 (11) (2005) 1130–1139, https://doi.org/10.1097/01.mlr.0000182534.19832.83.
[22] M. Bayati, M. Braverman, M. Gillam, et al., Data-driven decisions for reducing readmissions for heart failure: general methodology and case study, PLoS ONE 9 (10) (2014), e109264, https://doi.org/10.1371/journal.pone.0109264.
[23] J.P.A. Ioannidis, A.M. Garber, Individualized cost-effectiveness analysis, PLoS Med. 8 (7) (2011), e1001058, https://doi.org/10.1371/journal.pmed.1001058.
[24] A. Abadie, G.W. Imbens, Matching on the estimated propensity score, Econometrica 84 (2) (2016) 781–807, https://doi.org/10.3982/ECTA11293.
[25] L.S. Shapley, A value for n-person games, Contrib. Theory Games 2 (28) (1953) 307–317.
[26] R. Nair, H. Lak, S. Hasan, D. Gunasekaran, A. Babar, K.V. Gopalakrishna, Reducing all-cause 30-day hospital readmissions for patients presenting with acute heart failure exacerbations: a quality improvement initiative, Cureus 12 (3) (2020), e7420, https://doi.org/10.7759/cureus.7420.
[27] S. Kripalani, C.N. Theobald, B. Anctil, E.E. Vasilevskis, Reducing hospital readmission rates: current strategies and future directions, Annu. Rev. Med. 65 (2014) 471–485, https://doi.org/10.1146/annurev-med-022613-090415.
[28] V. Baky, D. Moran, T. Warwick, et al., Obtaining a follow-up appointment before discharge protects against readmission for patients with acute coronary syndrome and heart failure: a quality improvement project, Int. J. Cardiol. 257 (2018) 12–15, https://doi.org/10.1016/j.ijcard.2017.10.036.
[30] W. Crown, N. Buyukkaramikli, P. Thokala, et al., Constrained optimization methods in health services research – an introduction: report 1 of the ISPOR optimization methods emerging good practices task force, Value Health J. Int. Soc. Pharmacoeconomics Outcomes Res. 20 (3) (2017) 310–319, https://doi.org/10.1016/j.jval.2017.01.013.
[31] W. Crown, N. Buyukkaramikli, M.Y. Sir, et al., Application of constrained optimization methods in health services research: report 2 of the ISPOR optimization methods emerging good practices task force, Value Health J. Int. Soc. Pharmacoeconomics Outcomes Res. 21 (9) (2018) 1019–1028, https://doi.org/10.1016/j.jval.2018.05.003.
[32] D.W. Challener, L.J. Prokop, O. Abu-Saleh, The proliferation of reports on clinical scoring systems: issues about uptake and clinical utility, JAMA 321 (24) (2019) 2405, https://doi.org/10.1001/jama.2019.5284.
[33] A. Artetxe, A. Beristain, M. Graña, Predictive models for hospital readmission risk: a systematic review of methods, Comput. Methods Programs Biomed. 164 (2018) 49–64, https://doi.org/10.1016/j.cmpb.2018.06.006.
[34] Z. Obermeyer, B. Powers, C. Vogeli, S. Mullainathan, Dissecting racial bias in an algorithm used to manage the health of populations, Science 366 (6464) (2019) 447–453, https://doi.org/10.1126/science.aax2342.
[35] T.K. Nuckols, E. Keeler, S. Morton, et al., Economic evaluation of quality improvement interventions designed to prevent hospital readmission: a systematic review and meta-analysis, JAMA Intern. Med. 177 (7) (2017) 975, https://doi.org/10.1001/jamainternmed.2017.1136.
[36] K. Jung, S. Kashyap, A. Avati, et al., A framework for making predictive models useful in practice, J. Am. Med. Inform. Assoc. (2020), ocaa318, https://doi.org/10.1093/jamia/ocaa318.