
Journal of Biomedical Informatics 119 (2021) 103826


Original Research

Improving hospital readmission prediction using individualized utility analysis

Michael Ko a, Emma Chen a,1, Ashwin Agrawal a,1, Pranav Rajpurkar a, Anand Avati a, Andrew Ng a, Sanjay Basu b, Nigam H. Shah c,*

a Department of Computer Science, Stanford University, CA, USA
b Center for Primary Care, Harvard Medical School, MA, USA
c Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA

ARTICLE INFO

Keywords: Health informatics, Machine learning

ABSTRACT

Objective: Machine learning (ML) models for allocating readmission-mitigating interventions are typically selected according to their discriminative ability, which may not necessarily translate into utility in allocation of resources. Our objective was to determine whether ML models for allocating readmission-mitigating interventions have different usefulness based on their overall utility and discriminative ability.

Materials and methods: We conducted a retrospective utility analysis of ML models using claims data acquired from the Optum Clinformatics Data Mart, including 513,495 commercially-insured inpatients (mean [SD] age 69 [19] years; 294,895 [57%] female) over the period January 2016 through January 2017 from all 50 states, with a mean 90-day cost of $11,552. Utility analysis estimates the cost, in dollars, of allocating interventions for lowering readmission risk based on the reduction in the 90-day cost.

Results: Allocating readmission-mitigating interventions based on a GBDT model trained to predict readmissions achieved an estimated utility gain of $104 per patient and an AUC of 0.76 (95% CI 0.76, 0.77); allocating interventions based on a model trained to predict cost as a proxy achieved a higher utility of $175.94 per patient and an AUC of 0.62 (95% CI 0.61, 0.62). A hybrid model combining both intervention strategies is comparable with the best models on either metric. Estimated utility varies by intervention cost and efficacy, with each model performing best under different intervention settings.

Conclusion: We demonstrate that machine learning models may be ranked differently based on overall utility and discriminative ability. Machine learning models for allocation of limited health resources should consider directly optimizing for utility.

1. Introduction

Machine learning-based models for predicting a future health state are now common; however, evidence that the use of these models to guide interventions has improved patient outcomes is lacking [1]. A key reason for this gap is that typical measures of predictive performance, such as discrimination (the ability to differentiate people at higher risk of having an event from those at lower risk) and calibration (the extent to which the predicted values agree with the observed risk in a population) [2], do not necessarily reflect a useful model [1,3]. Useful models [4] are those that lead to a favorable change in clinical decision-making [5], as measured by improved patient outcomes [6], increased utility, or lower costs [6]. Hospital readmissions are a well-known example of this disconnect. Unplanned readmissions have a high financial burden for hospitals [7], are associated with adverse patient outcomes [8], and are reflections of low quality of care [9]. The Medicare Payment Advisory Commission (MedPAC) has estimated that 12% of readmissions are potentially avoidable, and has estimated the potential cost-savings at $1 billion [10].

In this study, we investigated how models for allocating readmission-mitigating interventions are ranked based on utility. Utility is measured by taking into account the dollar value costs of allocating an intervention or treatment, along with estimates of future individual patient expenses. We examined whether higher discriminative ability of a model translated into better utility by developing three models: one to directly predict 30-day readmission using claims data, another to predict 90-day

* Corresponding author at: 1265 Welch Road X-235 MC: 5479, Stanford, CA, USA.
1 These authors contributed equally to this work.

https://doi.org/10.1016/j.jbi.2021.103826
Received 20 January 2021; Received in revised form 23 May 2021; Accepted 28 May 2021
Available online 1 June 2021
1532-0464/© 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

cost as a proxy for the task of readmission prediction, and a hybrid model that incorporated the predictions of the 2 models. We compared all three models on discriminative ability measured by AUC and on utility measured by estimated cost savings.

2. Methods

2.1. Ethics

The Stanford University administrative panel for the Protection of Human Subjects approved this study.

2.2. Data

Claims data were acquired from the Optum Clinformatics Data Mart, which collects administrative health claims for commercially insured members nationally. Claims data were verified, adjudicated, adjusted with a standard pricing methodology to account for differences in pricing across health plans and provider contracts, and de-identified prior to inclusion in the Data Mart dataset. For this analysis, data are health plan claims from all 50 states for commercially-insured individuals who were admitted as inpatients over the period January 2016 through January 2017. The data include demographics (age, sex from enrollment applications) and all medical claims data, including inpatient visits, International Classification of Disease series 10 diagnostic codes, and payments. The data were split into a training set (80%) and a test set (20%). The training set was further split into two non-overlapping subsets, with 12.5% used to train the propensity score model as part of our method to estimate utility, and the other 87.5% used for training the readmission and cost models.
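A minimal sketch of this split, assuming an admissions DataFrame with a patient_id column (hypothetical names, not from the paper); splitting on patients rather than rows is one way to obtain the no-patient-overlap property reported in the Results.

```python
import numpy as np
import pandas as pd

def split_patients(admissions: pd.DataFrame, seed: int = 0):
    """80% train / 20% test at the patient level, then split the training
    patients 12.5% (propensity score model) / 87.5% (readmission and cost
    models), mirroring the proportions described above."""
    rng = np.random.default_rng(seed)
    patients = rng.permutation(admissions["patient_id"].unique())

    n_test = int(0.20 * len(patients))
    test_ids = set(patients[:n_test])
    train_ids = patients[n_test:]
    n_ps = int(0.125 * len(train_ids))
    ps_ids, model_ids = set(train_ids[:n_ps]), set(train_ids[n_ps:])

    by = admissions["patient_id"]
    return (admissions[by.isin(model_ids)],   # 87.5% of train: model fitting
            admissions[by.isin(ps_ids)],      # 12.5% of train: propensity model
            admissions[by.isin(test_ids)])    # held-out test set
```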
2.3. Outcome

The primary outcome in this study was all-cause unplanned readmission within 30 days of discharge. A readmission was defined as a subsequent hospitalization following an eligible index admission. An index admission was eligible if the following criteria were met: (1) the patient was enrolled for 12 consecutive months prior to discharge and 90 continuous days post-discharge, (2) the patient was discharged alive, (3) the patient did not leave against medical advice, (4) the patient was not transferred to another acute care hospital setting, (5) the patient's primary diagnosis was not among psychiatric disorders, cancer treatments, or rehabilitation care, and (6) the patient's previous index admission was not within 30 days prior to discharge. Eligible index admissions were considered to have a subsequent readmission if the patient had an unplanned admission within 30 days of discharge. An admission was defined as "planned", and not part of the readmission measure, if it contained procedures from a set of pre-specified planned procedure codes (e.g. organ transplants or chemotherapy) and did not include acute diagnosis codes for potentially planned procedures, as described by the Planned Readmission Algorithm (Version 4.0, 2019) [11,12]. All other readmissions were considered unplanned, regardless of cause. The outcome used in this study was consistent with the methodology used in the U.S. Centers for Medicare & Medicaid Services (CMS) definitions, with the exception that we required 90 continuous days of enrollment post-discharge, rather than 30 days, for the purposes of our cost analysis. The details of the labelling and cohort exclusion procedure we adapted can be viewed in Appendix D & E of the CMS 2019 Measures Update: HWR [12].

The secondary outcome in this study was the cost accrued in the 90 days post-discharge of an index admission, herein referred to as post-discharge cost. The accrued cost was defined as total standardized gross payments (not charges) to all providers and facilities. The cost was computed for each patient index admission by summing standardized costs (in US dollars) over 90 days, including post-year claims corrections and zero spending among enrolled individuals without medical claims.
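The six eligibility criteria reduce to a single boolean filter. The sketch below is illustrative only, assuming an admissions DataFrame in which each criterion has already been materialized as a flag column (all column names hypothetical):

```python
import pandas as pd

def eligible_index_admissions(adm: pd.DataFrame) -> pd.DataFrame:
    """Apply the six eligibility criteria from Section 2.3."""
    mask = (
        adm["enrolled_12m_prior"]          # (1) 12 months enrollment pre-discharge
        & adm["enrolled_90d_post"]         #     and 90 continuous days post-discharge
        & adm["discharged_alive"]          # (2) discharged alive
        & ~adm["left_against_advice"]      # (3) did not leave against medical advice
        & ~adm["transferred_acute"]        # (4) not transferred to another acute care hospital
        & ~adm["primary_dx_excluded"]      # (5) primary diagnosis not psychiatric,
                                           #     cancer treatment, or rehabilitation
        & ~adm["prior_index_within_30d"]   # (6) no index admission in the prior 30 days
    )
    return adm[mask]
```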
2.4. Model development and validation

We developed machine learning models using gradient boosted decision trees (GBDTs) to predict readmission risk (readmission model) and post-discharge cost (cost model). We chose GBDTs because they employ decision trees to capture nonlinear relationships in data that traditional linear models are unable to capture, can handle mixes of categorical and continuous covariates, and scale well with large amounts of data [13]. Moreover, it is straightforward to obtain a variable importance ranking from the model, which makes the approach more interpretable than many other machine learning models [14]. GBDTs have also been shown to achieve state-of-the-art performance in readmission risk prediction [15]. The training procedure for GBDTs involves the construction of a sequence of decision trees such that each tree learns from the errors of the prior tree to iteratively improve predictions [16]. We used the LightGBM framework to develop the models, which implements several algorithmic optimizations on standard gradient boosting to allow for additional training efficiency [17].

We trained the readmission and cost models on an identical set of features extracted for every index admission. We included demographics (age, sex) associated with the patient, diagnostic and procedure codes, and location of index admission. Sex and index admission location were one-hot encoded, while diagnostic and procedure codes were encoded as aggregated counts of Clinical Classification Software categories [18] from 12 months prior to discharge. In addition, the following features included in the HOSPITAL Score [19] were computed: (1) number of procedures during hospital stay, (2) number of admissions in the previous year, (3) number of hospital stays of 5 or more days, and (4) type of index admission (urgent or elective). All features included in the LACE Index [20] were also computed: (1) length of stay, (2) whether the admission was acute, (3) Charlson Comorbidity Index [21], and (4) number of ED visits prior to the index admission. A total of 539 features were used. No missing data were present in our final dataset. Feature selection was not used prior to model training, since previous research has shown that using the full set of features rather than subsets of features was better for GBDTs in predicting readmission risk [15].

We used 3-fold cross-validation on the training data subset to select the hyper-parameters for the models, including the number of trees, the maximum depth of each tree, and the required minimum loss reduction to partition leaf nodes, based on the cross-validation area under the receiver operating characteristic curve (AUC) for the readmission model and R² for the cost model. Each model was refitted to the full training set using the best parameters determined from 3-fold cross-validation.

The readmission model, cost model, and an additional hybrid model were all built for the task of unplanned readmission prediction. The readmission model employed standard GBDT binary classification for predicting readmissions, whereas the cost model employed standard GBDT regression for predicting 90-day cost. The hybrid model independently multiplied the output of the readmission model with the output of the cost model to obtain an expected value of cost. We chose to develop a hybrid model that combined predictions of the readmission and cost models with the hope of netting predictive and utility benefits from both models. The outputs of all 3 models (readmission risk, 90-day cost, and expected cost) were used to rank patients in our test set (see Fig. 1). We evaluated the discriminative performance of these models using AUC on the test set, along with 95% confidence intervals computed using the nonparametric bootstrap method with 1,000 bootstrap replicates. Additionally, for the cost model only, we evaluated performance on the task of predicting cost using R² and mean absolute error, using the same bootstrapping method for 95% confidence intervals.

For supplemental analysis, we constructed an additional cost-sensitive readmission model for comparison. The cost-sensitive readmission model was trained with the same protocol as the readmission model, but employed a loss weighted by the 90-day cost of patients.
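As one concrete reading of this setup, here is a minimal sketch using LightGBM's scikit-learn API; X_train, y_readmit, y_cost, and X_test are placeholder names, and the cross-validated hyper-parameter search described above is elided.

```python
import lightgbm as lgb

# X_train holds the 539 features per index admission; y_readmit is the 0/1
# unplanned readmission label and y_cost the 90-day post-discharge cost.
readmission_model = lgb.LGBMClassifier(objective="binary")
readmission_model.fit(X_train, y_readmit)

cost_model = lgb.LGBMRegressor(objective="regression")
cost_model.fit(X_train, y_cost)

# Hybrid model: expected cost = P(readmission) * predicted 90-day cost.
risk = readmission_model.predict_proba(X_test)[:, 1]
predicted_cost = cost_model.predict(X_test)
expected_cost = risk * predicted_cost

# Each of the three scores (risk, predicted_cost, expected_cost) ranks
# patients in the test set for intervention allocation.
```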


Fig. 1. The readmission model, cost model, and an additional hybrid model were all built for the task of unplanned readmission prediction. The readmission model employed standard GBDT binary classification for predicting readmissions, whereas the cost model employed standard GBDT regression for predicting 90-day cost. The hybrid model independently multiplied the output of the readmission model with the output of the cost model to obtain an expected value of cost. The outputs of all 3 models (readmission risk, 90-day cost, and expected cost) were used to rank patients.

2.5. Individualized utility analysis

We used an approach, which we refer to as Individualized Utility Analysis (IUA), for comparing predictive models based on estimated utility. IUA estimates the utility, in dollars, of allocating interventions for reducing readmission risk based on the reduction in the subsequent 90-day cost of care, incorporating heterogeneity in outcomes and costs across patients [23]. Although other utility analyses are similar in estimating model utility, other approaches treat all patients with the same global estimates of outcomes and costs [22], which may not be generalizable to real-world settings. We used IUA, which estimates the utility at an individual level. Although we present the example using dollar cost, other units such as estimates of disability-adjusted life years can also be used for such utility estimates.

IUA constructs an outcome-cost matrix for each patient to outline the utility for every possible outcome scenario. Each matrix uses the observed patient cost, estimated counterfactual patient cost, known intervention cost, and intervention efficacy. This method makes the following assumptions: (1) intervention cost and efficacy are fixed across patients, and (2) the estimated counterfactual cost is a good estimate of the cost incurred for a patient, had they received an effective intervention.

2.6. Outcome cost matrix

Upon assigning interventions based on a predictive model, the 90-day cost of the patient's care may follow these 4 scenarios (see Fig. 2):

• Scenario 1: Intervention is not assigned to a patient who was not readmitted
• Scenario 2: Intervention is not assigned to a patient who was readmitted
• Scenario 3: Intervention is assigned to a patient who was not readmitted
• Scenario 4: Intervention is assigned to a patient who was readmitted

In both Scenarios 1 and 2, the 90-day cost is the observed 90-day cost in the dataset, since no intervention has occurred. In Scenario 3, the 90-day cost is the observed 90-day cost plus the cost of intervention. It is important to note that the intervention in Scenario 3 is assumed to be both harmless and effective. In Scenario 4, the intervention is assumed to be effective in reducing readmission risk, and the 90-day cost is defined to be the weighted average of the patient's counterfactual cost (with weight p, which is the efficacy of the intervention) and their observed cost (with weight 1-p).
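Read as a lookup rule, the matrix above reduces to a single function. The sketch below is a minimal illustration (not code from the paper) using the notation of Fig. 2; following Scenario 3, we also assume the intervention cost t is incurred in Scenario 4, which the text does not state explicitly.

```python
def ninety_day_cost(intervened: bool, readmitted: bool,
                    observed_cost: float, counterfactual_cost: float,
                    t: float = 500.0, p: float = 0.10) -> float:
    """90-day cost under the four scenarios; the defaults mirror the $500
    intervention cost and 10% efficacy base case used in the results."""
    if not intervened:
        # Scenarios 1 and 2: no intervention, so the observed cost stands.
        return observed_cost
    if not readmitted:
        # Scenario 3: observed cost plus the intervention cost t.
        return observed_cost + t
    # Scenario 4: efficacy-weighted average of the counterfactual cost
    # (readmission averted, weight p) and the observed cost (weight 1 - p),
    # plus t (an assumption; the text states t explicitly only for Scenario 3).
    return p * counterfactual_cost + (1 - p) * observed_cost + t
```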
2.7. Counterfactual patient cost

Counterfactual patient cost is needed in estimating the cost

Fig. 2. Upon assigning an intervention based on one of the machine learning derived models, the 90-day cost of the patient's care is calculated as follows. If intervention is not assigned, the future cost will be the observed cost in the dataset for both readmitted (C1) and not readmitted patients (C0). If intervention is assigned, there are two scenarios. If intervention was assigned to the not readmitted subgroup, the future cost is the actual cost (C0) plus the cost of the intervention (t). The effect of the intervention on them is assumed to be harmless, meaning a readmission will not occur due to the intervention. If intervention was assigned to the readmitted subgroup, the future cost is defined to be the weighted average of their counterfactual cost (μ0) and their observed cost (C1), with the efficacy of the intervention (p) serving as the weight. μ0 is estimated via propensity score matching as described in the methods section.


associated with Scenario 4. However, since we do not have access to the counterfactual cost in observational data, we must estimate it using causal inference. For this study we used propensity score matching (PSM) [24], a commonly used statistical method, to estimate counterfactual patient costs. We note that IUA is not limited to PSM, but can be used with any reasonable causal inference method (nearest neighbor matching, optimal matching) suitable for the dataset at hand. We find that for our high-dimensional dataset, PSM can match readmitted to non-readmitted patients with similar probability of readmission, with the advantage of being a well-established approach [24].

Propensity scores were obtained from an L1-regularized logistic regression model trained to predict readmission, using an identical feature set and hyperparameter selection procedure as the readmission and cost models. The matching procedure assigned patients to one of 20 equal-width propensity score bins from 0.0 to 0.6. Within each bin, the median cost of the non-readmitted patient group was used to impute the counterfactual costs for the readmitted patients (i.e. their cost had they not had a readmission) (see Supplemental Table 2). We evaluated the goodness of fit of the logistic regression model on the test set using AUC, with its 95% confidence intervals computed using the nonparametric bootstrap with 1000 bootstrap replicates. The propensity score model had an AUC of 0.691 (95% CI 0.688, 0.695), a calibration slope of 0.945 (95% CI 0.931, 0.957) and a calibration intercept of 0.0124 (95% CI 0.0085, 0.016).
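A sketch of this imputation under stated assumptions: scikit-learn's L1-penalized logistic regression stands in for the propensity model, and the variable names (X_ps, y_ps, X, readmitted, cost_90d) are placeholders for the 12.5% propensity subset and the patients being matched.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Fit the propensity model on the 12.5% training subset.
psm = LogisticRegression(penalty="l1", solver="liblinear")
psm.fit(X_ps, y_ps)
score = psm.predict_proba(X)[:, 1]          # estimated probability of readmission

# 20 equal-width bins spanning scores 0.0-0.6; within each bin, the median
# cost of non-readmitted patients stands in for the counterfactual cost
# of the readmitted patients.
df = pd.DataFrame({
    "bin": np.digitize(score, np.linspace(0.0, 0.6, 21)),
    "readmitted": readmitted,
    "cost": cost_90d,
})
bin_median = df.loc[df["readmitted"] == 0].groupby("bin")["cost"].median()
df["counterfactual_cost"] = df["bin"].map(bin_median)
```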
2.8. Feature importances

We quantified the impact of each input feature on the readmission and cost models using the SHAP (SHapley Additive exPlanations) method. The method explains a prediction by allocating credit among the input features; feature credit is calculated using Shapley values [25], as the change in the expected value of the model's prediction when a feature is observed versus unknown. To uncover clinically important features that were globally predictive of the readmissions outcome and the cost outcome, the Shapley values for features on individual predictions were aggregated and reported along with their averaged absolute Shapley contributions as a percent of the contributions of all the features.
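A minimal sketch of this aggregation with the shap package (the package matches the method named above, but the code itself is illustrative); readmission_model, X_test, and feature_names are assumed from the earlier steps.

```python
import numpy as np
import shap

# TreeExplainer supports LightGBM models.
explainer = shap.TreeExplainer(readmission_model)
shap_values = explainer.shap_values(X_test)
if isinstance(shap_values, list):        # some SHAP versions return one array
    shap_values = shap_values[1]         # per class; keep the readmitted class

# Mean absolute Shapley value per feature, expressed as a percent of the
# total contribution across all features.
mean_abs = np.abs(shap_values).mean(axis=0)
percent = 100 * mean_abs / mean_abs.sum()
top_features = sorted(zip(feature_names, percent), key=lambda kv: -kv[1])
```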
3. Results

The final dataset consisted of 513,495 patients, divided into a training set of 410,796 patients and a test set of 102,699 patients. There was no patient overlap between the two sets. Overall, the average number of index admissions per patient was 1.18. The prevalence of readmission was 25.41%, and the mean 90-day cost was $11,552. Table 1 details the patient, admission, and outcome characteristics in the dataset.

Table 1
Summary statistics of the training set, test set, and overall dataset.

Characteristics              Training set       Test set           Overall
Total members                410,796            102,699            513,495
Total female                 235,836 (57.41%)   59,059 (57.51%)    294,895 (57.43%)
Age, mean (std)              69.11 (18.79)      69.15 (18.76)      69.12 (18.79)
Total index admissions       484,609            121,265            605,874
Readmission rate             123,405 (25.46%)   30,574 (25.21%)    153,979 (25.41%)
Mean 90-day cost             $11,556            $11,538            $11,552
25th quantile 90-day cost    $415               $418               $415
50th quantile 90-day cost    $2,521             $2,521             $2,521
75th quantile 90-day cost    $11,556            $11,538            $11,552

3.1. Model validation

On the task of predicting unplanned readmissions, the readmission model achieved an AUC of 0.76 (95% CI 0.76, 0.77) on the test set; the cost model achieved an AUC of 0.62 (95% CI 0.61, 0.62); the hybrid model achieved an AUC of 0.72 (95% CI 0.71, 0.72). On the task of predicting cost, the cost model achieved an R² of 0.22 (95% CI 0.20, 0.23) and a mean absolute error of $13,574 (95% CI $13,439, $13,714). The receiver operating characteristic (ROC) curves for each of the models are shown in Fig. 3.

3.2. Individualized utility analysis

Utility at 0% intervention was $-12,719, which we defined as baseline utility. We express all computed utility relative to baseline utility. A fixed cost of $500 and fixed efficacy of 10% were used for the IUA analysis based on prior literature [26–28], which estimates the efficacy of readmission reduction interventions to be between 10 and 50% and the cost to be between $100 and $2000. As computed by the IUA, the readmission model had a maximum utility gain of $104.23 per patient when 21.4% of the patient population was treated, the cost model had a maximum utility gain of $175.94 per patient when 34.0% of the patient population was treated, and the hybrid model had a maximum utility gain of $177.21 when 34.3% of the patient population was treated. Both the cost and hybrid models' maximum utility gains per patient were significantly greater than that of the readmission model using bootstrapping and Bonferroni correction (p < .001). No significant difference in utility was found between the cost and the hybrid model. Fig. 3 shows the utility gain at each % of the patient population treated for each model. Rerunning the analysis with a cost-sensitive readmission model instead of the readmission model yielded similar results (see Supplemental Fig. 3).
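To connect these numbers to the method, the following is a hedged sketch (not the authors' code) of how a utility curve like the right panel of Fig. 3 can be computed: rank patients by a model's score, treat the top k%, price each patient with the Section 2.6 scenarios, and report the average savings relative to treating no one. Inputs (score, readmitted, observed_cost, counterfactual_cost) are assumed NumPy arrays over the test set.

```python
import numpy as np

def utility_gain_curve(score, readmitted, observed_cost, counterfactual_cost,
                       t=500.0, p=0.10, step=5):
    """Average savings per patient, relative to treating no one, when the
    top-k% of patients ranked by `score` receive the intervention."""
    n = len(score)
    order = np.argsort(-score)               # highest score treated first
    baseline = observed_cost.mean()          # cost at 0% intervention
    gains = []
    for k in range(0, 101, step):
        treated = np.zeros(n, dtype=bool)
        treated[order[: n * k // 100]] = True
        cost = np.where(
            ~treated,
            observed_cost,                                   # Scenarios 1-2
            np.where(readmitted,
                     p * counterfactual_cost
                     + (1 - p) * observed_cost + t,          # Scenario 4
                     observed_cost + t))                     # Scenario 3
        gains.append(baseline - cost.mean())
    return np.array(gains)
```

The maximum of this curve, and the treated fraction k at which it occurs, correspond to the per-patient maximum utility gains and treatment percentages reported above.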
3.3. Sensitivity analysis

A sensitivity analysis of intervention cost and efficacy was performed to measure the utility of each model under various settings of cost and efficacy of the intervention. Under low intervention cost and high intervention efficacy settings, the readmission model is able to attain the best utility. But when the intervention is expensive and less effective, the cost and hybrid models outperform the readmission model based on overall utility. Fig. 4 shows the best model based on highest utility gain under various intervention cost and efficacy values. Supplemental Figs. 1 and 2 show individual model utility and model utility differences.
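The sensitivity analysis can be pictured as a grid sweep over intervention settings. This sketch reuses the utility_gain_curve helper above and a hypothetical model_scores dict mapping model names to their test-set scores; the ranges are taken from the $100-$2000 and 10-50% figures cited in Section 3.2.

```python
import numpy as np

# model_scores = {"readmission": risk, "cost": predicted_cost,
#                 "hybrid": expected_cost}   # hypothetical, from Section 2.4
best_model = {}
for t in np.linspace(100, 2000, 20):         # intervention cost, dollars
    for p in np.linspace(0.10, 0.50, 9):     # intervention efficacy
        gains = {name: utility_gain_curve(s, readmitted, observed_cost,
                                          counterfactual_cost, t=t, p=p).max()
                 for name, s in model_scores.items()}
        best_model[(t, p)] = max(gains, key=gains.get)  # winner at (t, p)
```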
3.4. Feature importances

The most important features for the readmission prediction model were the number of hospital admissions within the past year (greater counts increased risk), the number of readmissions within the past year (greater counts increased risk), and osteoarthritis (fewer visits with the diagnosis in the past year increased risk), with mean contributions of 10.1%, 6.6% and 5.5%.

For the cost model, the most important features were hemodialysis (more visits with the diagnosis in the past year increased cost), age (higher value increased cost), and the number of ED visits in the past year (greater counts increased cost), with mean contributions of 10.2%, 7.2% and 6.1%. Fig. 5 shows additional important features for the readmission and cost models.

4. Discussion

Useful models are those that lead to a favorable change in clinical decision-making in the form of improved patient outcomes, increased utility, or lower costs. In this study, we assessed the utility of machine


Fig. 3. The left panel shows the ROC curve for three models for assigning interventions to prevent a readmission. A readmission model (AUC = 0.76), which assigns
intervention according to readmission probability, a cost model (AUC = 0.62), which assigns intervention according to the cost of patient, and a hybrid cost +
readmission model (AUC = 0.72), which assigns intervention according to the product of cost and readmission probability. The right panel shows the utility realized
in terms of the average savings per patient given a certain number of interventions provided. With a fixed intervention cost of $500 and efficacy of 10%, the
readmission model had a maximum utility gain of $104.23 per patient when 21.4% of the patient population was treated, the cost model had a maximum utility of
$175.94 per patient when 34.0% of the patient population was treated, and the hybrid model had a maximum utility of $177.21 when 34.3% of the patient
population was treated.

learning models for the task of allocating interventions to avoid unplanned readmissions. We took into account the dollar value costs of an intervention allocation, along with estimates of future individual patient expenses. Our study led to three main findings. First, although a model trained to predict readmissions achieved a higher AUC than a model trained to predict cost, at 10% intervention efficacy and $500 intervention cost the cost model provided a higher utility in allocation of the interventions according to its rankings. Second, under the same intervention settings, a hybrid model that combined the predictions of the readmission model with the cost model achieved comparable utility and higher AUC than the cost model. Third, individualized utility analysis identifies different models that perform the best in terms of utility gain under different intervention cost and efficacy settings.

Our results suggest that classification or prediction models with higher discriminative performance may not necessarily give higher utility. In healthcare, there is always a capacity constraint (either monetary or staffing) which induces a hard limit on the number of actions that can be taken in response to a model's ranking of patients at risk of an outcome such as readmission [30,31]. The intent underlying most risk scoring [32] (from readmissions [33], to cost [34], to mortality [15]) is to allocate interventions to those most in need. Utility per unit cost (or per budget) is one objective measure for evaluating clinical risk scoring. We acknowledge that "cost" is an imperfect proxy for need across patient groups, as outlined by Obermeyer et al. [34], and that caution needs to be exercised in defining the outcome for which a prediction model is built [34]. However, within a homogeneous patient group where label bias can be avoided, incorporating cost information directly into model building provides a much tighter alignment with the end goal, which is the allocation of limited resources to derive the highest value from interventions.

Fig. 4. Different models have the highest utility gain under variable intervention cost and efficacy. In low intervention cost and high intervention efficacy settings, the readmission model is able to attain the highest utility. However, with higher intervention cost and lower intervention efficacy, the cost and hybrid models outperform the readmission model in terms of overall utility.

Our proposed approach for analyzing individualized utility poses several advantages over prior literature studying 30-day readmission [20]. Our approach uses an evaluation that focuses on the utility obtained by taking actions based on predictions, whilst most 30-day readmission work has focused on metrics such as AUC that do not account for the cost implications of alternative actions [33]. Although a case study of reducing readmissions for heart failure by Bayati et al. has demonstrated that improvements in prediction quality can improve utility [22], a key difference is that the prior work used the average cost of readmission in calculating utility, which may not translate into an individual-level net benefit [23,35]. We found that 90-day costs have high variability per patient, and our individualized utility analysis


Fig. 5. Important features for the readmission model (left) and cost model (right). The impact of each feature on the discrimination ability of the readmission and
cost models was quantified using the SHAP (SHapley Additive exPlanations) method, which explains prediction by allocating credit among the input features. Feature
importances on individual predictions were aggregated and reported along with their averaged absolute Shapley contributions as a percent of the contributions of all
the features.

addresses this heterogeneity in costs across patients [23].

Furthermore, analyzing individualized utility can reveal different areas in the intervention cost and efficacy space where different models perform the best. Previous work by Bayati et al. highlights intervention settings that lead to failure modes in which a readmission model would not be useful. Our work extends these findings by proposing alternatives, such as the cost and hybrid models, which attain a higher utility and may be used in place of a readmission prediction model in such situations. When allocating interventions to improve patient outcomes, examining individualized utility may help identify the suitable model given the cost and efficacy of the intervention.

While we have demonstrated the usefulness of our proposed approach using the well understood readmissions prediction setting, the idea of IUA is applicable to many settings where the output of a predictive model is used to guide allocation of interventions. Since healthcare interventions have a wide range of costs and efficacies, IUA may be a generalized framework that identifies the best modeling approaches for the intervention setting at hand. We believe that the proposed method is another concrete step towards making machine learning models more useful in practice [1,36].

5. Limitations

This study has several important limitations. First, utility was defined as the dollar value with a linear U-curve, and may be redefined to incorporate measures such as quality of life, the number of actions a care team takes, or essential resources consumed, with a nonlinear U-curve. Second, our analysis is based on retrospective data. Rigorous validation of our findings using IUA for estimating the utility of 30-day readmission models would require evaluation in a prospective setting. Third, we assume a fixed efficacy of readmission-mitigating interventions. While there is no doubt that treatment effect heterogeneity exists, estimating heterogeneous treatment effects remains unsolved. Fourth, we do not investigate the scenario of explicitly performing model selection based on our IUA method; we expect that training procedures that directly optimize and select models based on utility may be a fruitful avenue for further exploration.

6. Conclusion

Machine learning models are used to predict the probability of occurrence of an outcome, in an effort to allocate limited health resources to get higher value for the care delivered. We show that models for intervention allocation can be made more useful by jointly ranking on individual utility and discriminative ability. Machine learning models for allocation of limited health resources should consider directly optimizing for utility.

CRediT authorship contribution statement

Michael Ko: Conceptualization, Methodology, Software, Writing - original draft. Emma Chen: Methodology, Software, Writing - original draft. Ashwin Agrawal: Methodology, Software, Writing - original draft. Pranav Rajpurkar: Conceptualization, Writing - review & editing. Anand Avati: Conceptualization. Andrew Ng: Supervision. Sanjay Basu: Supervision. Nigam H. Shah: Conceptualization, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jbi.2021.103826.

References

[1] N.H. Shah, A. Milstein, S.C. Bagley, Making machine learning models clinically useful, JAMA 322 (14) (2019) 1351, https://doi.org/10.1001/jama.2019.10306.
[2] A.C. Alba, T. Agoritsas, M. Walsh, et al., Discrimination and calibration of clinical prediction models: users' guides to the medical literature, JAMA 318 (14) (2017) 1377–1384, https://doi.org/10.1001/jama.2017.12126.
[3] V.X. Liu, D.W. Bates, J. Wiens, N.H. Shah, The number needed to benefit: estimating the value of predictive analytics in healthcare, J. Am. Med. Inform. Assoc. 26 (12) (2019) 1655–1659, https://doi.org/10.1093/jamia/ocz088.


[4] E.J. Emanuel, R.M. Wachter, Artificial intelligence in health care: will the value match the hype? JAMA 321 (23) (2019) 2281, https://doi.org/10.1001/jama.2019.4914.
[5] S.W. Grant, G.S. Collins, S.A.M. Nashef, Statistical primer: developing and validating a risk prediction model, Eur. J. Cardiothorac. Surg. 54 (2) (2018) 203–208, https://doi.org/10.1093/ejcts/ezy180.
[6] A.K. Triantafyllidis, A. Tsanas, Applications of machine learning in real-life digital health interventions: review of the literature, J. Med. Internet Res. 21 (4) (2019) e12286, https://doi.org/10.2196/12286.
[7] S.F. Jencks, M.V. Williams, E.A. Coleman, Rehospitalizations among patients in the Medicare fee-for-service program, N. Engl. J. Med. 360 (14) (2009) 1418–1428, https://doi.org/10.1056/NEJMsa0803563.
[8] D.R. Levinson, Adverse events in hospitals: national incidence among Medicare beneficiaries, Department of Health and Human Services, Office of the Inspector General, 2010.
[9] H.M. Krumholz, K. Wang, Z. Lin, et al., Hospital-readmission risk — isolating hospital effects from patient effects, N. Engl. J. Med. 377 (11) (2017) 1055–1064, https://doi.org/10.1056/NEJMsa1702321.
[10] C.K. McIlvennan, Z.J. Eapen, L.A. Allen, Hospital readmissions reduction program, Circulation 131 (20) (2015) 1796–1803, https://doi.org/10.1161/CIRCULATIONAHA.114.010270.
[11] L.I. Horwitz, J.N. Grady, D.B. Cohen, et al., Development and validation of an algorithm to identify planned readmissions from claims data, J. Hosp. Med. 10 (10) (2015) 670–677, https://doi.org/10.1002/jhm.2416.
[12] U.S. Centers for Medicare & Medicaid Services (CMS), Measure Methodology, published 2019. https://www.qualitynet.org/inpatient/measures/readmission/methodology.
[13] H. Zhang, S. Si, C.-J. Hsieh, GPU-acceleration for large-scale tree boosting, arXiv:1706.08359, 2017. http://arxiv.org/abs/1706.08359.
[14] F. Cabitza, R. Rasoini, G.F. Gensini, Unintended consequences of machine learning in medicine, JAMA 318 (6) (2017) 517, https://doi.org/10.1001/jama.2017.7797.
[15] L. Einav, A. Finkelstein, S. Mullainathan, Z. Obermeyer, Predictive modeling of U.S. health care spending in late life, Science 360 (6396) (2018) 1462–1465, https://doi.org/10.1126/science.aar5045.
[16] J. Friedman, T. Hastie, R. Tibshirani, Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors), Ann. Stat. 28 (2) (2000) 337–407, https://doi.org/10.1214/aos/1016218223.
[17] G. Ke, Q. Meng, T. Finley, et al., LightGBM: a highly efficient gradient boosting decision tree, in: Proc. 31st Int. Conf. Neural Inf. Process. Syst. (NIPS'17), 2017, pp. 3149–3157.
[18] CCS_10 (Clinical Classifications Software 10) - Synopsis. https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CCS_10/index.html.
[19] J.D. Donzé, M.V. Williams, E.J. Robinson, et al., International validity of the HOSPITAL score to predict 30-day potentially avoidable hospital readmissions, JAMA Intern. Med. 176 (4) (2016) 496, https://doi.org/10.1001/jamainternmed.2015.8462.
[20] C. van Walraven, I.A. Dhalla, C. Bell, et al., Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community, Can. Med. Assoc. J. 182 (6) (2010) 551–557, https://doi.org/10.1503/cmaj.091117.
[21] H. Quan, V. Sundararajan, P. Halfon, et al., Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data, Med. Care 43 (11) (2005) 1130–1139, https://doi.org/10.1097/01.mlr.0000182534.19832.83.
[22] M. Bayati, M. Braverman, M. Gillam, et al., Data-driven decisions for reducing readmissions for heart failure: general methodology and case study, PLoS ONE 9 (10) (2014) e109264, https://doi.org/10.1371/journal.pone.0109264.
[23] J.P.A. Ioannidis, A.M. Garber, Individualized cost-effectiveness analysis, PLoS Med. 8 (7) (2011) e1001058, https://doi.org/10.1371/journal.pmed.1001058.
[24] A. Abadie, G.W. Imbens, Matching on the estimated propensity score, Econometrica 84 (2) (2016) 781–807, https://doi.org/10.3982/ECTA11293.
[25] L.S. Shapley, A value for n-person games, Contrib. Theory Games 2 (28) (1953) 307–317.
[26] R. Nair, H. Lak, S. Hasan, D. Gunasekaran, A. Babar, K.V. Gopalakrishna, Reducing all-cause 30-day hospital readmissions for patients presenting with acute heart failure exacerbations: a quality improvement initiative, Cureus 12 (3) (2020) e7420, https://doi.org/10.7759/cureus.7420.
[27] S. Kripalani, C.N. Theobald, B. Anctil, E.E. Vasilevskis, Reducing hospital readmission rates: current strategies and future directions, Annu. Rev. Med. 65 (2014) 471–485, https://doi.org/10.1146/annurev-med-022613-090415.
[28] V. Baky, D. Moran, T. Warwick, et al., Obtaining a follow-up appointment before discharge protects against readmission for patients with acute coronary syndrome and heart failure: a quality improvement project, Int. J. Cardiol. 257 (2018) 12–15, https://doi.org/10.1016/j.ijcard.2017.10.036.
[30] W. Crown, N. Buyukkaramikli, P. Thokala, et al., Constrained optimization methods in health services research - an introduction: report 1 of the ISPOR optimization methods emerging good practices task force, Value Health J. Int. Soc. Pharmacoeconomics Outcomes Res. 20 (3) (2017) 310–319, https://doi.org/10.1016/j.jval.2017.01.013.
[31] W. Crown, N. Buyukkaramikli, M.Y. Sir, et al., Application of constrained optimization methods in health services research: report 2 of the ISPOR optimization methods emerging good practices task force, Value Health J. Int. Soc. Pharmacoeconomics Outcomes Res. 21 (9) (2018) 1019–1028, https://doi.org/10.1016/j.jval.2018.05.003.
[32] D.W. Challener, L.J. Prokop, O. Abu-Saleh, The proliferation of reports on clinical scoring systems: issues about uptake and clinical utility, JAMA 321 (24) (2019) 2405, https://doi.org/10.1001/jama.2019.5284.
[33] A. Artetxe, A. Beristain, M. Graña, Predictive models for hospital readmission risk: a systematic review of methods, Comput. Methods Programs Biomed. 164 (2018) 49–64, https://doi.org/10.1016/j.cmpb.2018.06.006.
[34] Z. Obermeyer, B. Powers, C. Vogeli, S. Mullainathan, Dissecting racial bias in an algorithm used to manage the health of populations, Science 366 (6464) (2019) 447–453, https://doi.org/10.1126/science.aax2342.
[35] T.K. Nuckols, E. Keeler, S. Morton, et al., Economic evaluation of quality improvement interventions designed to prevent hospital readmission: a systematic review and meta-analysis, JAMA Intern. Med. 177 (7) (2017) 975, https://doi.org/10.1001/jamainternmed.2017.1136.
[36] K. Jung, S. Kashyap, A. Avati, et al., A framework for making predictive models useful in practice, J. Am. Med. Inform. Assoc., published online December 22, 2020, ocaa318, https://doi.org/10.1093/jamia/ocaa318.
