
MediCast: Model & App to Predict and Lower Hospital Readmissions

​Navya Nori
​Fulton Science Academy, Alpharetta, GA
Hospital readmissions are expensive and preventable
Every year, over 26 billion dollars are spent in the United States on hospital readmissions
(i.e., a hospital admission following a discharge). Of this, over 17 billion dollars is spent on
readmissions that may be preventable. In 2010, the US government enacted a law
penalizing hospitals for high readmission rates, and in 2019, 82% of hospitals were
penalized.
Avoidable readmissions take time, resources, and care away from patients who actually
need them.
Patients are not the only ones affected: their families also endure significant psychological
and financial strain, which in some situations may eventually lead to bankruptcy.
The limited literature in this area focuses on condition-specific models rather than on all
the chronic conditions a patient may have. This project builds several models that consider
these conditions; the models are used to make care decisions immediately following
discharge.
Problem Statement & Hypothesis

​Problem Statement
​Which factors (demographic and clinical) will have the highest impact on the risk of
readmission?
​Hypothesis
• Age will be an important predictor of hospital readmission (especially for very old individuals)
• People with chronic conditions such as heart failure and kidney disease will have a higher
risk of re-admission
• A model for re-admission risk can score a patient and suggest next steps based on the
severity of the patient's condition

Methods
1. Download data and import into Python Jupyter notebook and run data validation methods
(prevalence of diseases, odds ratios) to make sure data was downloaded and imported correctly
2. Set up matrix with response variables where the rows are people and columns are diseases.
Add columns for age (converting to ranges 50-54, 55-59, … 80-84, 85+) and gender
3. Split the rows randomly (to prevent bias) 80% to train the model and 20% to test the model
4. Train models using Python libraries: Generalized Linear Model (h2o), Gradient Boosted Tree
(LightGBM), Feed Forward Network (PyTorch)
5. Compute scores and discharge decisions, ROC charts and other statistics, and analyze the data
6. Design the user interface of the app screens using MIT App Inventor. Using the coefficients from
a smaller model, compute the risk score and provide the user with an appropriate
recommendation.
7. Build a web application that uses the same logic as the mobile app, with a much larger and more
comprehensive list of conditions obtained from a larger model.
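Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's notebook code; the function names `age_bucket` and `train_test_split`, the fixed seed, and the rejection of ages below 50 are assumptions made for the example.

```python
import random

def age_bucket(age):
    """Map an age in years to the 5-year ranges from step 2 (50-54 ... 80-84, 85+)."""
    if age < 50:
        # Assumption: the cohort only contains patients aged 50 and above.
        raise ValueError("age must be 50 or above")
    if age >= 85:
        return "85+"
    low = (age // 5) * 5
    return f"{low}-{low + 4}"

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows at random, then split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

For example, `train_test_split(patients)` on a list of patient rows returns roughly 80% of them for training and the remaining 20% for testing.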
Data Characteristics
• Age 50 and above.
• Number of patients: 85,796 (train 80%, test 20%).
• Cohort definition: Cases – evidence of readmission within 30 days of discharge from
hospital, Controls – no evidence of readmission during that period
• Length of stay in the hospital ranges from 1 to 14 days
• Only diagnosis codes (ICD-9) mapped to CCS [link] are used as clinical inputs to the
models. This allows the models and apps to be used in a variety of settings and
countries. Mapping the codes also makes the columns more comprehensible. For
example, there are over 10 codes for hypertension, but the map converts them to one
code
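The code mapping described above can be sketched as a lookup that collapses several ICD-9 codes into one category and then builds the 0/1 disease columns for a patient. The dictionary below is a tiny illustrative subset with paraphrased category labels, not the official CCS mapping file, and `to_indicator_row` is a hypothetical helper name.

```python
# Illustrative subset only. The real CCS mapping covers thousands of ICD-9 codes,
# and the category labels here should be checked against the official file.
ICD9_TO_CCS = {
    "4010": "Essential hypertension",
    "4011": "Essential hypertension",
    "4019": "Essential hypertension",
    "4280": "Congestive heart failure",
}

CATEGORIES = sorted(set(ICD9_TO_CCS.values()))

def to_indicator_row(icd9_codes, categories=CATEGORIES):
    """Return one matrix row: 1 if the patient has any code in the category, else 0."""
    present = {ICD9_TO_CCS[code] for code in icd9_codes if code in ICD9_TO_CCS}
    return {category: int(category in present) for category in categories}
```

Two different hypertension codes on the same patient therefore set a single "Essential hypertension" column to 1, which is the simplification the slide describes.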

Comparison of AI/ML Algorithms Used to Train Models

Generalized Linear Models
Assume a linear functional form for the predictors. The coefficients are estimated as
part of the training. Regularization (LASSO) allows features which are not predictive
to be dropped.
• Runs very fast
• Easy to interpret the coefficients and build trust with clinicians

Gradient Boosted Trees
Allow interactions between predictors (i.e. non-linear), improving the predictive
power. Boosting is a process where in each iteration the model evaluates observations
where it is performing poorly and adjusts weights to improve those predictions.
• Slower
• Returns feature importances but cannot separate positive and negative effects easily

Feed Forward Network
A deep neural network algorithm which can capture any functional form for the input
data. Requires a lot of experimentation to figure out details of the model, i.e.
number of layers, size of a layer, dropout, and other parameters.
• Much slower
• Black box. Interpretation requires a lot of work.
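The boosting idea described for Gradient Boosted Trees can be illustrated with a toy example. This is a deliberately simplified sketch, not how LightGBM is implemented: it assumes a single numeric feature, squared-error loss, and one-split trees (stumps), and each round fits a stump to the current residuals, i.e. to the observations the model still predicts poorly. The names `fit_stump` and `boost` are made up for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree to the residuals (needs at least two distinct x values)."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue  # a split must leave observations on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x <= split else right_mean

def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly add a stump fitted to what is still wrong."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(stump(x) for stump in stumps)
```

Each added stump shrinks the remaining error a little, which is why boosted models tend to be more accurate than a single tree but slower to train.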
GLM Model Results
Feature/Variable                            Coefficient
Intercept                                   -0.88
Heart Failure                               0.2
Acute Renal Failure                         0.2
Implant, device or graft-related            0.19
Pulmonary collapse, Pleurisy                0.19
Aplastic anemia                             0.18
Chronic Obstructive Pulmonary Disease       0.18
Fluid and electrolyte disorder              0.17
Diabetes mellitus with complications        0.09
Cardiac dysrhythmias                        0.08
Peripheral and visceral vascular disease    0.08

• Ten predictive conditions from GLM model are shown. The model quantifies and ranks these
conditions so that they can be used to make post-discharge decisions and avoid re-admission
• While the presence of many of these factors is self-explanatory, factors like anemia and
electrolytic disorders need more explanation. In the absence of adequate post-discharge
support, a chronically anemic patient can collapse at home leading to other complications.
Conditions like electrolytic disorders can lead to temporary dementia thus necessitating a
move to a nursing facility for better monitoring
• The absence of age from the GLM model key feature list is explained by the fact that
typically age is highly correlated with each of these chronic conditions and hence they act
as surrogates for age.
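As a rough illustration of how coefficients like these turn into a risk score, the sketch below assumes the GLM is a logistic regression (binomial family with a logit link), which the slides do not state explicitly. It also uses only the ten listed coefficients, whereas the full model may contain more terms, so the resulting numbers are illustrative rather than the model's actual scores. The function name `glm_risk_score` is hypothetical.

```python
import math

# Values copied from the GLM results table.
INTERCEPT = -0.88
COEFFICIENTS = {
    "Heart Failure": 0.2,
    "Acute Renal Failure": 0.2,
    "Implant, device or graft-related": 0.19,
    "Pulmonary collapse, Pleurisy": 0.19,
    "Aplastic anemia": 0.18,
    "Chronic Obstructive Pulmonary Disease": 0.18,
    "Fluid and electrolyte disorder": 0.17,
    "Diabetes mellitus with complications": 0.09,
    "Cardiac dysrhythmias": 0.08,
    "Peripheral and visceral vascular disease": 0.08,
}

def glm_risk_score(conditions):
    """Assumed logit link: score = 1 / (1 + exp(-(intercept + sum of coefficients)))."""
    linear = INTERCEPT + sum(COEFFICIENTS[name] for name in set(conditions))
    return 1.0 / (1.0 + math.exp(-linear))
```

Under these assumptions, every additional condition from the table raises the linear term and therefore the score, for example `glm_risk_score(["Heart Failure", "Acute Renal Failure"])` is higher than `glm_risk_score([])`.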

Comparison of All Models

[Figures: score distributions; Receiver Operating Characteristics with AUC; table of model statistics]
Table legend: thresh = Threshold, sens = Sensitivity, spec = Specificity,
ppv = Positive Predictive Value, accu = Accuracy

Score distributions
• Scores have a bi-modal distribution with each peak representing a class. Average score for
cases is slightly higher than that for controls.
• There is a large overlap between the scores for classes which shows the amount of
similarity between the cases and controls.
• The distribution of scores is similar for GLM and GBM models

Receiver Operating Characteristics and AUC
• FFN Model has the best AUC measure on test data
• GLM Model has the least overfitting (gap between train and test AUC) among the three models
• GBM model is close to FFN model but is much more interpretable for clinicians

Model statistics
• Thresholds are selected so that the same number of patients are predicted as cases, so that
the model comparison is fair
• All models identify about 50% of the cases (sensitivity), and 70% of the controls
(specificity) correctly
• GBM Model has the best accuracy (70.3%) out of the three
• Lift shows that 1.6x patients (60% more) will be identified correctly over a random model
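The statistics named in the legend can be computed from scores and true labels along the following lines. This is a sketch under assumptions: the threshold is taken as the score of the n-th highest-scoring patient so that every model flags the same number of cases (ties aside), and lift is taken to mean positive predictive value divided by the overall case rate, which the slides do not define. Function names are hypothetical.

```python
def threshold_for_n_cases(scores, n_predicted_cases):
    """Score of the n-th highest patient, so that n patients are flagged (ties aside)."""
    ranked = sorted(scores, reverse=True)
    return ranked[n_predicted_cases - 1]

def confusion_metrics(scores, labels, threshold):
    """labels: 1 for cases (readmitted), 0 for controls. Needs both classes present."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label:
            tp += 1
        elif flagged:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    ppv = tp / (tp + fp)
    prevalence = (tp + fn) / total
    return {
        "thresh": threshold,
        "sens": tp / (tp + fn),   # share of cases identified
        "spec": tn / (tn + fp),   # share of controls identified
        "ppv": ppv,               # share of flagged patients who are cases
        "accu": (tp + tn) / total,
        "lift": ppv / prevalence, # assumed definition: gain over random flagging
    }
```

Calling `threshold_for_n_cases` with the same `n_predicted_cases` for each model's test scores gives thresholds that flag equally many patients, so the resulting metrics are directly comparable.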
Decisions Based on Scores
When a patient is scored using any of the models, the model returns a number (risk score)
between 0 and 1. This score reflects how closely the patient resembles patients who have
been readmitted
• If the score is high (>0.6), the patient has a high risk of readmission
• If the score is low (<0.3), the patient has a low risk of readmission
These scores serve as drivers to make care decisions during discharge
Suggestions for post-discharge care
• [score > 0.6] move to a skilled nursing facility (SNF)
• [0.3 ≤ score ≤ 0.6] ensure support at home from immediate family or nurse visits
• [score < 0.3] enable electronic remote monitoring

As with all models, these serve to guide a clinician and offer advice based on the
circumstances that are specific to a patient
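The score bands above translate directly into a small decision function. The sketch below uses the thresholds and wording from this slide; the function name `recommend` and the range check are assumptions added for the example.

```python
def recommend(score):
    """Map a risk score in [0, 1] to the suggested post-discharge care."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk score must be between 0 and 1")
    if score > 0.6:
        return "move to a skilled nursing facility (SNF)"
    if score >= 0.3:
        return "ensure support at home from immediate family or nurse visits"
    return "enable electronic remote monitoring"
```

The boundary values 0.3 and 0.6 both fall in the middle band, matching the "0.3 ≤ score ≤ 0.6" rule.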
Mobile App Computes Re-admission Risk Very Quickly
[Screens shown: Patient Information Input Screen, Clinical Input Screen]
Input includes age, gender, and a set of chronic conditions (shown on the app screens).
Computation happens in the app. It does not need access to the internet or a data plan,
which is important for many areas in third world countries.
The prototype was designed and programmed using MIT App Inventor.

Web App Computes Re-admission Risk Using Many Risk Factors
[Screens shown: Splash Screen, Clinical Input Screen]
Input includes age, gender, and a set of both chronic (e.g., diabetes mellitus) and acute
(e.g., fever) conditions. The conditions are sorted by their related organ system, along
with a section for miscellaneous signs and symptoms.
Can be run on a desktop in any hospital and connected to an Electronic Medical
Records (EMR) system for ease of use.

Conclusions & Future Work
Chronic conditions such as heart failure, renal failure and osteoarthritis are good predictors
of re-admission risk
The Gradient Boosted Tree model has the highest accuracy, the GLM model is the simplest and
easiest to interpret, while the Deep Learning-based FFN model has the highest AUC
The apps provide a quick way of making an important post-discharge decision, thus
preventing a possible re-admission
Future Work
• including demographic factors (race, ethnicity) and other social determinants of health (SDOH)
as part of the model to understand their importance along with the current features
• including past surgeries and other procedures
• identifying factors which may not be in the model because they are not easily obtained,
such as family support, and weighing them into the decision

References
Frizzell JD et al. Prediction of 30-Day All-Cause Readmissions in Patients Hospitalized for Heart Failure:
Comparison of Machine Learning and Other Statistical Approaches. [link]
Weinreich M et al. Predicting the Risk of Readmission in Pneumonia. A Systematic Review of Model
Performance. [link]
Cui S et al. An improved support vector machine-based diabetic readmission prediction. [link]
Zhu K et al. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A
Conditional Logistic Regression Modeling Approach. [link]
UCI Machine Learning Repository. Diabetes 130-US hospitals for years 1999-2008 Data Set.
Rau J. Look Up Your Hospital: Is It Being Penalized By Medicare? [link]
Centers for Medicare & Medicaid Services. Hospital Readmissions Reduction Program (HRRP). [link]
Agency for Healthcare Research and Quality. [link]
