
210 Diabetes Care Volume 40, February 2017

CLIN CARE/EDUCATION/NUTRITION/PSYCHOSOCIAL

Personalized Diabetes Management Using Electronic Medical Records

Dimitris Bertsimas, Nathan Kallus, Alexander M. Weinstein, and Ying Daisy Zhuo

Diabetes Care 2017;40:210–217 | DOI: 10.2337/dc16-0826

OBJECTIVE
Current clinical guidelines for managing type 2 diabetes do not differentiate based
on patient-specific factors. We present a data-driven algorithm for personalized
diabetes management that improves health outcomes relative to the standard
of care.

RESEARCH DESIGN AND METHODS


We modeled outcomes under 13 pharmacological therapies based on electronic
medical records from 1999 to 2014 for 10,806 patients with type 2 diabetes from
Boston Medical Center. For each patient visit, we analyzed the range of outcomes
under alternative care using a k-nearest neighbor approach. The neighbors were
chosen to maximize similarity on individual patient characteristics and medical
history that were most predictive of health outcomes. The recommendation
algorithm prescribes the regimen with best predicted outcome if the expected
improvement from switching regimens exceeds a threshold. We evaluated the effect of
recommendations on matched patient outcomes from unseen data.
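The recommendation rule summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (their analyses were performed in R): per-regimen feature weights stand in for the OLS coefficient magnitudes described under Prescriptive Algorithm, the weighted Euclidean distance is assumed to scale each squared feature difference by its weight, $k_t = 0.34\sqrt{n_t}$ is rounded to the nearest integer (the rounding is our assumption), and the switching threshold δ defaults to the reported 0.8 HbA1c percentage points. The function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Training data per regimen: (normalized feature vectors, observed posttreatment HbA1c in %)
TreatmentData = Tuple[List[Vector], List[float]]


def weighted_distance(a: Vector, b: Vector, w: Vector) -> float:
    """Weighted Euclidean distance; each squared difference is scaled by its weight (assumed form)."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def knn_predict(x: Vector, X_t: List[Vector], y_t: List[float], w_t: Vector, k: int) -> float:
    """Average outcome of the k training visits closest to x under one regimen."""
    order = sorted(range(len(X_t)), key=lambda i: weighted_distance(x, X_t[i], w_t))
    nearest = order[:k]
    return sum(y_t[i] for i in nearest) / len(nearest)


def choose_k(n_t: int) -> int:
    """k_t = 0.34 * sqrt(n_t), rounded and floored at 1 (the rounding is an assumption)."""
    return max(1, round(0.34 * math.sqrt(n_t)))


def prescribe(
    x: Vector,
    current: str,
    menu: List[str],
    data: Dict[str, TreatmentData],
    weights: Dict[str, Vector],
    delta: float = 0.8,
) -> Tuple[str, Dict[str, float]]:
    """Return (recommended regimen, predicted HbA1c for each regimen on the menu).

    The menu must include the current regimen. A switch is recommended only if the
    best predicted HbA1c beats the prediction for the current regimen by >= delta.
    """
    preds: Dict[str, float] = {}
    for t in menu:
        X_t, y_t = data[t]
        preds[t] = knn_predict(x, X_t, y_t, weights[t], choose_k(len(y_t)))
    best = min(menu, key=lambda t: preds[t])
    if preds[current] - preds[best] >= delta:
        return best, preds
    return current, preds


# Hypothetical toy data: 100 training visits per regimen with constant outcomes.
X = [[i / 100.0, 0.0] for i in range(100)]
data = {"INS0": (X, [8.5] * 100), "MET0": (X, [7.0] * 100)}
weights = {"INS0": [1.0, 0.5], "MET0": [1.0, 0.5]}  # stand-ins for |OLS coefficients|
choice, preds = prescribe([0.5, 0.0], "INS0", ["INS0", "MET0"], data, weights)
print(choice, preds)  # MET0 {'INS0': 8.5, 'MET0': 7.0}
```

Rerunning the same call with `delta=2.0` keeps the toy patient on INS0, consistent with the observation that a larger threshold recommends switching for fewer patients.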

RESULTS
Among the 48,140 patient visits in the test set, the algorithm's recommendation mirrored the observed standard of care in 68.2% of visits. For patient visits in which the algorithmic recommendation differed from the standard of care, the mean posttreatment glycated hemoglobin A1c (HbA1c) under the algorithm was lower than standard of care by 0.44 ± 0.03% (4.8 ± 0.3 mmol/mol) (P < 0.001), from 8.37% under the standard of care to 7.93% under our algorithm (68.0 to 63.2 mmol/mol).

CONCLUSIONS
A personalized approach to diabetes management yielded substantial improvements in HbA1c outcomes relative to the standard of care. Our prototyped dashboard visualizing the recommendation algorithm can be used by providers to inform diabetes care and improve outcomes.

Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA
Corresponding author: Dimitris Bertsimas, dbertsim@mit.edu.
Received 15 April 2016 and accepted 6 November 2016.
This article contains Supplementary Data online at http://care.diabetesjournals.org/lookup/suppl/doi:10.2337/dc16-0826/-/DC1.
N.K. is currently affiliated with the School of Operations Research and Information Engineering, Cornell University, Ithaca, NY and Cornell Tech, New York, NY.
© 2017 by the American Diabetes Association. Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.

Type 2 diabetes is typically managed through healthy eating, physical activity, oral medication, and/or insulin injections. Although there are evidence-based clinical guidelines for glycemic control (1), how to choose among pharmacological therapies to maximize effectiveness for a given patient is not well understood. There has been growing interest in using clinical evidence to understand the effects of treatments in different populations with type 2 diabetes. In a joint statement from 2012, the American Diabetes Association and the European Association for the Study of
Diabetes highlighted the need for a patient-centered approach to diabetes management (2). The need for an individualized approach is especially pressing given the variety of disease symptoms, comorbid conditions, pharmacological treatments, individual treatment histories, and other individual characteristics that may inform treatment (3).

Evidence suggests that the response to blood glucose regulation agents can differ among population subgroups. A post hoc secondary analysis found that African American adults with prediabetes responded better to metformin than Caucasian adults with prediabetes (4). Another study recommended less aggressive treatments for older patients, as they were more likely to experience severe consequences from hypoglycemia (5). These studies each provide valuable insights with respect to a single subgroup or treatment, but do not offer a decision rule for the general population that providers can easily apply in practice.

Tailoring glycemic management for specific subpopulations can be critical. Among patients with chronic kidney disease, contraindication to metformin needs to be taken into consideration when prescribing medication (6). Separate glycated hemoglobin A1c (HbA1c) goals may be needed for subgroups or individuals differentiated by age, comorbidities, and other clinical characteristics (3). A personalized treatment recommendation using a quantitative approach could readily incorporate different glycemic targets and contraindications and thus allow for more systematic management of subgroups.

We provide an algorithm that generates a personalized type 2 diabetes treatment recommendation for any given patient based on evidence from historical outcomes of similar patients drawn from an electronic medical records (EMR) database. EMR analysis allows for pinpoint comparisons of effectiveness because of the abundance of clinical evidence from multiple treatment options administered to a diverse population over long-term patient clinical histories. EMR data combine the large sample sizes found in some insurance claims databases with the depth of longitudinal clinical evidence typically found in clinical trials. One caveat is that EMR data are not controlled via randomization.

Our methodological approach applies machine learning techniques and causal inference to make personalized recommendations based on comparative effectiveness among subpopulations in the EMR database. Machine learning techniques have been increasingly adopted in health care, along with many other fields (7–9). Our novel approach leverages the power of analytics and abundant data in the EMR system to improve quality of care.

The recommendations are personalized by patient characteristics, including age, sex, race, BMI, treatment history, and diabetes progression. We evaluate the effectiveness of the personalized treatment recommendations against the current standard of care by estimating patients' counterfactual outcomes from historical outcomes of similar patients in the EMR database. We develop a prototype clinical support dashboard that provides evidence for the algorithm's recommendations and could guide providers in caring for patients with type 2 diabetes in a personalized manner.

RESEARCH DESIGN AND METHODS

Analytic Overview
We modeled outcomes for patients with type 2 diabetes based on EMR data. We divided each patient's medical history into distinct lines of therapy, each characterized by a particular drug monotherapy or combination therapy. Within each line of therapy, we considered patient visits occurring every 100 days. At each visit, the provider decides whether to proceed with the patient's current line of therapy or to recommend an alternative regimen. We developed a nonparametric prescriptive algorithm that provides personalized treatment recommendations. For each patient visit, we used k-nearest neighbor (kNN) regression (10) to predict the potential HbA1c outcome under each treatment alternative. The nearest neighbors were chosen to control for confounding that may be present in nonrandomized data (11) and to maximize similarity on the patient characteristics that were most predictive of outcomes. The algorithm then prescribed the regimen with best predicted outcome, provided the predicted improvement relative to the patient's current regimen exceeded a confidence threshold. The outcome metric was the average HbA1c measurement 75 to 200 days after the visit date. The effect of the prescriptive algorithm was evaluated by comparing the expected HbA1c outcome under our recommended therapy to the observed outcome under the standard of care (ground-truth) therapy, according to a commonly used matching approach (12). We conducted additional simulations to ensure that the results were robust to training models on different datasets and using alternative predictive modeling techniques.

Data
Through a partnership with Boston Medical Center (BMC), an academic medical center in Boston, MA, we obtained EMR for >1.1 million patients from 1999 to 2014. In this dataset, 10,806 patients met all of the following inclusion criteria:
• Were present in the system for an observation period of at least 1 year;
• Received a prescription for at least one blood glucose regulation agent, including insulin, metformin, sulfonylureas, or one of the other blood glucose regulation agents listed below, and had at least one medical record 100 days prior to the date of this prescription;
• Had at least three recorded laboratory measurements of HbA1c; and,
• Did not have a recorded diagnosis of type 1 diabetes, as defined by the presence of ICD-9 diagnosis code 250.x1 or 250.x3 combined with the absence of any subsequent prescriptions for oral blood glucose regulation agents. (If the patient received oral blood glucose regulation agents subsequent to one of these diagnosis codes, we assumed the diagnosis record was an error.)

For each patient, we had access to demographic data, including date of birth, sex, and race/ethnicity, and to all BMC EMR data, including a history of drug prescriptions and measurements of height, weight, BMI, and HbA1c, as well as creatinine levels (Table 1). Neither the size of the population nor the proportion with good glycemic control changed substantially over the course of the study.

Interpreting Individual Medical Histories
We divided each patient's medical history into distinct lines of therapy, each characterized by a particular drug regimen (Supplementary Fig. 1). Within each line of therapy, we considered patient visits occurring every 100 days, corresponding to the life cycle of a red blood cell (13). These patient visits
provided the basis for our definition of patient outcomes.

Lines of Therapy
We developed an algorithm to define precisely when each line of therapy ends and the next line begins according to when the combination of drugs prescribed to the patient changes in the EMR data. Each line of therapy was characterized by a unique drug regimen, defined to include all blood glucose regulation agents prescribed to the patient within the first 6 months after starting that line of therapy.

Regimens were defined as combinations of drugs from one or more drug classes. The drug classes we considered were metformin, insulin, and other blood glucose regulation agents; the other agents included sulfonylureas, thiazolidinediones, dipeptidyl peptidase 4 inhibitors, meglitinides, α-glucosidase inhibitors, glucagon-like peptide 1 agonists, and other antihyperglycemic agents. If a sufficient number of HbA1c observations existed during a period in which no drugs were prescribed, we defined the patient's line of therapy as NoRx. We considered 13 possible regimen types (Table 1). A combination of drug classes was included as a regimen type if it was observed in a sufficient number of patient visits.

We determined that a patient's current line of therapy had ended whenever the patient's drug regimen changed in some way, such as when one or more drugs were added to or removed from the drug regimen, or when the phase was interrupted by a period of at least 500 days with no new prescriptions, in which case the subsequent line of therapy was determined to be NoRx. This definition of end date for each line of therapy intends to capture the period when the patient was experiencing the effect of the drug regimen.

Patient Visits
Within each line of therapy, we considered patient visits occurring every 100 days, beginning with the visit at which that regimen was initiated and continuing until no later than 80 days prior to the start of the subsequent regimen. There were 48,140 unique patient visits in our dataset (Table 1). At each visit, we defined a set of visit-specific patient characteristics, including the current line of therapy (i.e., therapy given during the 100 days immediately preceding the current visit) and recent HbA1c and BMI history. The outcome was measured as average HbA1c 75 to 200 days after the visit. This effect period was chosen to allow for a complete red blood cell life cycle to elapse before measuring the effect of a drug therapy.

We defined the standard of care for each visit as the drug regimen that was administered. For 16.3% of visits, the provider prescribed an adjustment to the current line of therapy; in the other 83.7%, the provider's prescription was to continue the current regimen.

Table 1. Demographics, medical history, and treatment history of patients (N = 10,806)

Feature
Age (years)*                                                    59.7 (13.6)
Percent male                                                    42.4
Percent black                                                   58.5
Percent Hispanic                                                15.1
Percent white                                                   16.6
BMI (kg/m2)*                                                    33.1 (8.1)
HbA1c (%)*                                                      7.9 (1.8)
HbA1c (mmol/mol)*                                               62.8 (19.7)
Percent with good glycemic control
  (i.e., HbA1c ≤7.0% [53 mmol/mol])*                            37.7
Years since first treatment in EMR*                             3.52 (3.66)
Percent with current prescription for metformin*                45.6
Percent with current prescription for insulin*                  30.2
Percent with contraindication to metformin*                     17.4
Number of patients with first visit prior to 2007 (%)           6,175 (57.1)

Observed standard of care regimen (abbreviation)                Number of patient visits (N = 48,140)
No regimen prescribed, new patient (NEWPT)                      5,449
No regimen prescribed, existing patient (NORX)                  2,137
Metformin monotherapy (MET0)                                    9,649
Insulin monotherapy (INS0)                                      7,539
Other blood glucose regulation agent monotherapy (OTHER0)       4,671
Metformin combined with one other noninsulin agent (MET1)       6,959
Metformin combined with insulin (METINS0)                       3,977
Insulin combined with one nonmetformin agent (INS1)             2,139
Combination of two nonmetformin, noninsulin agents (OTHER1)     1,047
Metformin combined with two other noninsulin agents (MET2)      1,749
Metformin combined with insulin and one other agent (METINS1)   2,005
Insulin combined with two nonmetformin agents (INS2)            249
All other multidrug (≥3) combinations (MULTI)                   570

Data are mean (SD) unless otherwise indicated. *Sample statistics are calculated across all patient visits. Individual patients with longer medical histories may be overrepresented in the sample. Individuals may have a current prescription for both metformin and insulin. A patient was considered to be contraindicated to metformin when current serum level of creatinine was >1.5 mg/dL.

Prescriptive Algorithm
Our novel prescriptive algorithm considers a menu of available treatment options, including the patient's current treatment; uses kNN regression models to predict potential outcomes under each option; rejects any noncurrent treatment option with predicted outcome above a prespecified HbA1c threshold; and chooses the remaining option with best predicted outcome. The menu of options for a given patient could be determined by the provider, accounting for contraindications and other preferences, such as not using intensive control for elderly patients or patients with a history of severe hypoglycemia.

For the purposes of this analysis, the menu of options for each patient was chosen relative to the intensity and composition of the patient's current treatment regimen. Specifically, the algorithm considered only regimens that represented an incremental addition or subtraction of a drug, or substitution of a drug of comparable intensity; metformin and insulin were considered to be of the lowest and highest intensity,
respectively. Patients with serum creatinine levels >1.5 mg/dL (6), a sign of kidney disease, were not offered metformin-based regimens. The menu options used in our analysis, differentiated by current treatment, are depicted in Fig. 1; by definition, the algorithm never recommended metformin-based therapies for patients with the contraindication described above.

Figure 1. HbA1c benefit of prescriptive algorithm for patients switching regimens. Each cell in the figure represents patients for whom the prescriptive algorithm recommended switching from the regimen on the vertical axis to the regimen on the horizontal axis. The color in each cell indicates the mean HbA1c benefit (%) of the prescriptive algorithm for patients in that cell, with red indicating benefits of the algorithm and blue indicating worsening relative to standard of care. Each cell is labeled with the number of patients who made that switch; cells labeled with a dash were not on the menu of options provided to patients currently on a given regimen. Patients with serum creatinine levels >1.5 mg/dL were not considered for metformin-based regimens and therefore are never assigned by the algorithm to columns with metformin-based regimens.

For each patient visit, the outcomes predicted by kNN under each treatment were compared. Our algorithm selected the treatment with the best predicted HbA1c outcome subject to the condition that this best predicted outcome improve upon the predicted outcome under the patient's current treatment by at least some threshold δ. We chose the optimal threshold value of 0.8% by testing the algorithm on a single test set, using values of δ ranging from 0 to 1.5%. Increasing the threshold δ causes the algorithm to recommend switching for fewer patients, but the mean benefit among those who switch increases. Above a certain threshold, the recommendation fits to noise in the training data and does not provide better mean benefits in the testing set. The optimal threshold balances these concerns.

kNN regression is a nonparametric, instance-based algorithm that makes predictions by averaging the outcomes for the subset of observations most similar to the target as defined by some distance metric (10). To predict potential outcomes under each regimen, we used a kNN regression based on a treatment-specific weighted Euclidean distance across normalized patient and visit-specific factors. The weights were derived by training a separate ordinary least squares linear regression model for each treatment regimen and using the magnitudes of the regression coefficients (Supplementary Table 1). This weighted distance improves upon classical kNN by selecting neighbors based on the factors most predictive of HbA1c outcome, rather than weighting all factors equally.

We considered factors from the following categories: demographic information, medical history, and treatment history. Specifically, the demographic factors used in the model were age, sex, and race. The medical history factors were days since first diabetes diagnosis, the
patient's average serum creatinine level in the previous year, the patient's past two HbA1c and most recent BMI observations up to and including the current visit, the patient's average, median, 25th percentile, and 75th percentile HbA1c and BMI in the 1,000-day period up to and including the current visit, and the patient's frequency of HbA1c measurements. The treatment history factors were the number of regimens the patient had tried, the number of visits since starting the current regimen, whether or not the patient had been previously prescribed metformin, and the patient's current regimen.

The prediction step of our algorithm is best illustrated through an example. Suppose we would like to estimate a patient's potential outcome under metformin monotherapy. To identify the importance of each factor in predicting outcomes, we used patient visits in which metformin monotherapy was prescribed to train an ordinary least squares regression on normalized values of each patient factor listed above. The most predictive factors were the patient's most recent HbA1c measurement (regression coefficient magnitude 0.22), whether the patient was currently prescribed insulin (0.11), the patient's mean BMI over the past 1,000 days (0.11), and several other HbA1c and BMI measurements (coefficient magnitudes ranging from 0.03 to 0.10) (see Supplementary Table 1 for full details). To estimate the patient's potential outcome, we used the coefficient magnitudes to weight the Euclidean distance between this patient visit and each patient visit in which metformin monotherapy was prescribed. Thus, for any choice of k, we could rank the k closest neighbors from this treatment group. This procedure was repeated for each therapy in the patient's menu of treatment options.

Intuitively, the number of neighbors k used to estimate posttreatment HbA1c levels should increase with the size of the dataset. For each treatment t, we found the value $k_t$ that minimized the root mean square error of the kNN predictions on a subset of the data not used to evaluate the algorithm. We regressed $k_t$ on $\sqrt{n_t}$, where $n_t$ is the size of the dataset for treatment t, and thus derived the dependence function $k_t = 0.34\sqrt{n_t}$, which was used to select k in the prescriptive algorithm.

To verify the accuracy of the kNN HbA1c predictions, we evaluated the R² metric. Positive values of R² suggest patient characteristics are predictive of future HbA1c. For comparison, we evaluated the predictive accuracy of least absolute shrinkage and selection operator (LASSO) regression (14) and random forest (15), two state-of-the-art machine-learning methods used widely because of their high prediction accuracy. We used the predictions from these models in two alternative prescriptive algorithms.

Model Evaluation
To evaluate the performance of the kNN-based prescriptive model, we tested the algorithm's recommendations on a set of patient data that had not been used when training the models.

Because counterfactual treatment effects are not observable, we used the weighted matching approach embedded in the kNN regression to impute potential outcomes, an approach commonly used for causal inference in observational studies when randomization is unavailable (12). For each visit, we applied our prescriptive algorithm to recommend a therapy. If that recommendation matched the prescribed standard of care therapy, we observed the true effect from the therapy. Otherwise, the outcome was imputed by averaging the outcomes of the most similar patient visits at which the recommended therapy was administered; these similar visits were chosen from a test set not used for training, and the number of neighbors $k_t$ was selected to fit the size of the test set. This estimated outcome was compared with the true outcome under standard of care at the given patient visit.

Our hypothesis was that the average predicted HbA1c outcome after applying our prescriptive algorithm would be less than that observed from administering standard of care, resulting in a net average improvement in outcomes.

Sensitivity Analysis
To ensure the evaluation of our algorithm was not sensitive to the particular random split of the database into training and test data, we evaluated the effectiveness of our algorithm (with fixed threshold δ = 0.8) under additional random splittings of the data.

Software
All analyses were performed in R 3.3.0 (16).

RESULTS
The R² of the kNN predictions on unseen data ranged from 0.20 to 0.54 depending on the regimen (Supplementary Table 2). The strongest models were for insulin monotherapy, metformin monotherapy, metformin plus insulin, and multidrug (≥3) therapies. The R² values from the LASSO and random forest models ranged from 0.24 to 0.53. The predictive power was similar across the three methods.

Table 2. Performance of prescriptive algorithms

                        kNN             LASSO           Random forest
All patient visits (N = 48,140)
Mean HbA1c benefit relative to standard of care (SE)
  Percent               −0.14 (0.01)*   −0.13 (0.01)*   −0.07 (0.01)*
  mmol/mol              −1.5 (0.1)*     −1.4 (0.1)*     −0.8 (0.1)*
Visits for which algorithm's recommendation differed from observed standard of care
  Number of visits (%)  15,323 (31.8)   12,684 (26.3)   14,302 (29.7)
Mean HbA1c benefit relative to standard of care (SE)
  Percent               −0.44 (0.03)*   −0.45 (0.03)*   −0.26 (0.03)*
  mmol/mol              −4.8 (0.3)*     −4.9 (0.3)*     −2.8 (0.3)*
*P < 0.001.

The performance of the prescriptive algorithm is summarized in Table 2. The mean HbA1c outcome after treatment was 0.14% lower under the prescriptive algorithm than under the standard of care
treatment, with SE 0.01% and significance level P < 0.001; equivalently, mean HbA1c was 1.5 ± 0.1 mmol/mol lower under the algorithm. Of the 48,140 patient visits in our dataset, the algorithm differed from the standard of care for 15,323 visits, 31.8% of all visits. For this subset of visits, the mean HbA1c outcome under the algorithm was lower by 0.44 ± 0.03% (4.8 ± 0.3 mmol/mol) compared with standard of care, with P < 0.001, a reduction from 8.37% under the standard of care to 7.93% under our algorithm or, equivalently, from 68.0 to 63.2 mmol/mol. The median outcome for these visits was 0.21% (2.3 mmol/mol) lower under the prescriptive algorithm compared with standard of care. For comparison, the median difference for all visits was zero because, for 68.2% of visits, there was no difference between the algorithm's recommendation and the standard of care.

In our analysis, the mean difference in HbA1c was more negative than the median because of a left-skewed distribution. Some patients received particularly large benefits from using the prescriptive algorithm, which had an outsize effect on the mean but did not affect the median.

Fig. 1 depicts the number of patients for whom the prescriptive algorithm recommended switching from a given current line of therapy to a given new line of therapy, along with the mean reduction in HbA1c for patient visits in each category. Among trajectories with at least 300 patients, the largest benefit of the algorithm was achieved through personalized recommendations for 7,564 patients currently on insulin monotherapy to switch to monotherapy with metformin or another blood glucose regulation agent. However, for the vast majority of patients currently on insulin-based regimens, the algorithm recommends that those patients continue with that therapy. Among the 7,564 patient visits, those who were recommended to switch from insulin were on average younger (mean 52.9 vs. 61.4 years of age) and had substantially higher average HbA1c (11.0 vs. 8.0% or 97 vs. 64 mmol/mol).

The performance of the prescriptive algorithm in specific patient subgroups is summarized in Supplementary Table 3. The overall mean HbA1c outcome using the prescriptive algorithm was 0.14% (1.5 mmol/mol) lower than standard of care for both male and female patients. The benefit of using the algorithm was 0.14% (1.5 mmol/mol) for black patients (29,120 visits), 0.09% (1.0 mmol/mol) for white patients (7,444 visits), 0.22% (2.4 mmol/mol) for Hispanic patients (6,732 visits), and 0.11% (1.2 mmol/mol) for all other patients (4,844 visits). The benefit of the algorithm was 0.20% (2.2 mmol/mol) for patients <60 years of age and 0.08% (0.9 mmol/mol) for patients aged ≥60 years. The benefit was 0.20% (2.2 mmol/mol) for patients with poor glycemic control (i.e., current HbA1c >7.0% [53 mmol/mol]) as compared with 0.05% (0.4 mmol/mol) for those with good glycemic control.

Figure 2. Visualization of prescriptive algorithm: provider dashboard prototype. This figure visualizes how the prescriptive algorithm can be used by providers for a single patient. A: Displays the algorithm's treatment recommendation along with the predicted posttreatment HbA1c under that treatment. B: Depicts the mean, SD, and full distribution of posttreatment HbA1c outcomes for the $k_t$ most similar patient visits in the data set for each of the six regimens on this patient's menu of options. In each subpanel in panel B, the posttreatment HbA1c level is on the horizontal axis, and the number of visits is on the vertical axis. The table in panel C presents basic information about the patient's demographic and medical history. D: Depicts the history of diabetes progression and treatment for the patient, with date along the horizontal axis. The vertical axis of the top subpanel indicates various drug classes; the vertical axis of the bottom subpanel depicts HbA1c percentage.

Our methodology motivates a provider dashboard that would report information on the demographics, medical history, and response to treatment for patients similar to an index patient. A prototype dashboard visualization for one sample patient visit is shown in Fig. 2. The dashboard would include the patient's demographic and
health information along with visualizations of the patient's treatment history and HbA1c progression. In addition, the dashboard would display the mean, SD, and full distribution of HbA1c outcomes among the $k_t$ nearest neighbors who received each treatment in the menu of options. Based on this evidence, the dashboard would display a treatment recommendation. The provider would have the ability to override this recommendation given any special management needs of the patient. For instance, if the patient is elderly and the distribution of HbA1c outcomes indicates that the recommended therapy has an elevated risk of hypoglycemia, the provider may opt for an alternative treatment.

The overall mean HbA1c outcome using the LASSO-based prescriptive algorithm was lower by 0.13 ± 0.01% (1.4 ± 0.1 mmol/mol; P < 0.001) compared with the mean standard of care outcome. The benefit from using the random forest–based prescriptive algorithm relative to standard of care was 0.07 ± 0.01% (0.8 ± 0.1 mmol/mol; P < 0.001).

In the sensitivity analyses, under three alternate random splittings of the dataset, the overall mean benefit of using the prescriptive algorithm compared with standard of care ranged from 0.11 to 0.15% (1.2–1.6 mmol/mol; P < 0.001 in all instances).

CONCLUSIONS
To our knowledge, we present the first prescriptive method for personalized type 2 diabetes care. Using historical data from a large EMR database, this novel prescriptive method resulted in an average HbA1c benefit of 0.44% (4.8 mmol/mol)

personalization is the primary driver of benefit relative to standard of care.

In practice, the algorithm can be integrated into existing EMR systems to dynamically suggest personalized treatment paths for each patient based on historical records. The algorithm ingests and analyzes EMR data and generates recommendations. An intuitive, interactive dashboard summarizes the evidence for the recommendation, including the expected distribution of outcomes under alternative treatments (Fig. 2).

Because of the nature of retrospective data from existing EMR, this study has several limitations. Patients were not randomized into treatment groups. Although our matching methodology controls for several confounding factors that could explain differences in treatment effects, we can only estimate counterfactual outcomes. EMR data do not include socioeconomic factors or patient preferences that may be important in treatment decisions. Because of a lack of sufficient data, glucagon-like peptide 1 agonists were not considered as a separate drug class. If more data were available, we could further differentiate regimen types beyond the 13 we include in this analysis. In addition, the study population from BMC may not be representative of the U.S. population as a whole.

With EMR medication-order data alone, we cannot be certain whether a prescribed medication was filled or taken and cannot know precisely when the medication was stopped. Although this data quality issue could hamper attempts to make drug efficacy comparisons, our analysis aims to address the question of which drugs to prescribe under real-world scenarios. We

Despite these limitations, the study establishes strong evidence of the benefit of individualizing diabetes care. The success of this data-driven approach invites further testing using datasets from other hospital and care settings. Testing the prescriptive algorithm in a clinical trial setting would provide even stronger evidence of clinical effectiveness. As large-scale genomic data become more widely available, the algorithm could readily incorporate such data to reach the full potential of personalized medicine in type 2 diabetes.

In this study, we developed a novel data-driven prescriptive algorithm for type 2 diabetes that improves significantly on the standard of care when tested on patient-level EMR data from a large medical center. Our work is a key step toward a fully patient-centered approach to diabetes management.

Acknowledgments. The authors thank Dr. Michael Kane, Massachusetts Institute of Technology, for sharing clinical expertise in the progression and treatment of diabetes and Dr. William Adams, Boston University Clinical and Translational Science Institute, for sharing clinical expertise and assisting with the interpretation of EMR. The authors also thank BMC for use of its i2b2 database and the Associate Editor and the three reviewers for thoughtful comments that improved the paper significantly.
Funding. This research is partially supported by National Science Foundation grant 6926678 [SHB: Type II (INT): Collaborative Research: Algorithmic Approaches to Personalized Health Care].
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. D.B. contributed to the definition of the problem, methods development, and reviewing and editing the paper. N.K. contributed to the definition of the problem, methods development, data analysis, and reviewing and editing the paper. A.M.W. contributed to the
at each doctors visit for which the algo- optimize for an outcome that takes into denition of the problem, methods development,
rithms recommendation differed from account unobserved factors such as non- data analysis, and writing, reviewing, and editing
standard of care. adherence. For instance, if nonadherence the paper. Y.D.Z. contributed to the denition of
the problem, methods development, data analy-
Our method incorporates patient- is more prevalent among patients pre- sis, and writing, reviewing, and editing the paper.
specic demographic and medical his- scribed insulin than other regimens, this D.B. is the guarantor of this work and, as such, had
tory data to determine the best course perspective may explain why, in our study full access to all the data in the study and takes
of treatment. Compared with other population, the algorithm recommends in- responsibility for the integrity of the data and the
sulin less often than it is prescribed in clin- accuracy of the data analysis.
machine-learning methods considered,
Prior Presentation. Parts of this study were
the kNN prescriptive approach is highly ical practice. presented in abstract form at the 76th Scientic
interpretable and exible in clinical app- Our method can be extended to be Sessions of the American Diabetes Association,
lications. The novelty of our approach is more exible and comprehensive. Cur- New Orleans, LA, 1014 June 2016.
in personalizing the decision-making pro- rently, the prescriptive algorithm does
cess by incorporating patient-specic fac- not support individualized glycemic tar- References
tors. This method can easily accommodate gets; we assume that a lower glycemic level 1. Rodbard HW, Jellinger PS, Davidson JA, et al.
alternative disease-management ap- is always preferred. The study currently op- Statement by an American Association of Clini-
cal Endocrinologists/American College of Endo-
proaches within specic subpopulations, timizes only for a single health outcome; a crinology consensus panel on type 2 diabetes
such as patients with chronic kidney dis- more comprehensive algorithm would con- mellitus: an algorithm for glycemic control. En-
ease and elderly patients. We believe this sider adverse event outcomes as well. docr Pract 2009;15:540559

2. Inzucchi SE, Bergenstal RM, Buse JB, et al.; American Diabetes Association (ADA); European Association for the Study of Diabetes (EASD). Management of hyperglycemia in type 2 diabetes: a patient-centered approach: position statement of the American Diabetes Association (ADA) and the European Association for the Study of Diabetes (EASD). Diabetes Care 2012;35:1364–1379
3. Subramanian S, Hirsch IB. Personalized diabetes management: Moving from algorithmic to individualized therapy. Diabetes Spectr 2014;27:87–91
4. Zhang C, Zhang R. More effective glycaemic control by metformin in African Americans than in Whites in the prediabetic population. Diabetes Metab 2015;41:173–175
5. Ismail-Beigi F, Moghissi E, Tiktin M, Hirsch IB, Inzucchi SE, Genuth S. Individualizing glycemic targets in type 2 diabetes mellitus: implications of recent clinical trials. Ann Intern Med 2011;154:554–559
6. Lipska KJ, Bailey CJ, Inzucchi SE. Use of metformin in the setting of mild-to-moderate renal insufficiency. Diabetes Care 2011;34:1431–1437
7. Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science 2015;349:255–260
8. Bertsimas D, Kallus N. From predictive to prescriptive analytics [article online], 2015. Available from https://arxiv.org/abs/1402.5481. Accessed 10 October 2016
9. Bertsimas D, O'Hair AK, Pulleyblank WR. The Analytics Edge. Charlestown, MA, Dynamic Ideas LLC, 2016
10. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory 1967;13:21–27
11. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55
12. Imbens GW, Rubin DB. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. New York, Cambridge University Press, 2015
13. Franco RS. Measurement of red cell lifespan and aging. Transfus Med Hemother 2012;39:302–307
14. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol 1996;58:267–288
15. Breiman L. Random forests. Mach Learn 2001;45:5–32
16. R: a language and environment for statistical computing [Internet], 2016. Available from https://www.r-project.org/. Accessed 10 October 2016
