
https://doi.org/10.1038/s42256-020-0176-3

Improving healthcare operations management with machine learning

Oleg S. Pianykh1 ✉, Steven Guitron1, Darren Parke1, Chengzhao Zhang2, Pari Pandharipande1, James Brink1 and Daniel Rosenthal1

Healthcare institutions need modern and powerful technology to provide high-quality, cost-effective care to patients. However, despite the considerable progress in the computerization and digitization of medicine, efficient and robust management tools have yet to materialize. One important reason for this is the extreme complexity and variability of healthcare operations, the needs of which have outgrown conventional management. Machine learning algorithms, scalable and adaptive to complex patterns, may be particularly well suited to solving these problems. Two major advantages of machine learning—the power of building strong models from a large number of weakly predictive features, and the ability to identify key factors in complex feature sets—have a particularly direct connection to the principal operational challenges. The main goal of this work was to study this relationship using two major types of operational problems: predicting operational events, and identifying key workflow drivers. Using practical examples, we demonstrate how machine learning can improve human ability to understand and manage healthcare operations, leading to more efficient healthcare.

The potential contribution of machine learning (ML) to healthcare operations and management has been largely unexplored. To date, within healthcare, ML efforts have focused primarily on clinical applications rather than operations, such as diagnostic feature detection in medical images or natural language processing of patient records1.

Yet, the operational sphere of healthcare has much to gain from the promises of ML. Modern healthcare workflows—from front-desk registration to hospitalization to post-discharge care—have grown immensely complex and interdependent. These inherent challenges have been compounded by increasing labour and capital costs, a shortage of qualified medical staff and limited facilities, leading to a variety of severe operational problems, such as limited access and overcrowding2,3. Moreover, operational failures in healthcare have both clinical and financial costs, directly impacting patient health and well-being.

These challenges have been complemented by the escalating complexity and breadth of data, growing with progressive digitization of healthcare. Electronic medical records have been equipping medical centres with years of operational data, such as processing timestamps, scheduling/planning records, examination types and various resource characteristics, recorded routinely by hospital information systems (HISs), in consistent formats defined by major healthcare standards (Health Level 7 (HL7), Fast Healthcare Interoperability Resources (FHIR) or Digital Imaging and Communications in Medicine (DICOM); see Fig. 1). All of these data points need to be taken into account to adequately describe complex healthcare operations; yet their use in operational analysis has been extremely sparse.

To address these problems, several advanced approaches to operations analysis have been investigated, including simulations and modelling4–6. Duguay and Chetouane5 and Ghanes et al.6 studied emergency department (ED) operations with process-oriented models, mostly based on discrete event simulation7. While powerful for 'what if' analyses, simulation models have several principal limitations8. They depend on a large number of rules that must be uncovered by earlier observations and manual data analysis, thus requiring a substantial amount of time to develop, maintain and run. Further, these rules are often built on idealized or simplified assumptions, with predefined distributions or manually chosen variables, alienating these models from the real processes. Consequently, rule-based models are not designed to adapt and self-correct as the environment around them changes, which makes them very brittle and not well suited for individualized and real-time predictions.

Approaches that are less model-dependent and more data-driven have also been suggested. For example, Tayne et al.9 and Attarian et al.10 used analyses of variance in patient data to identify operational bottlenecks at academic medical centres. Schwarz et al.11 and Wolf et al.12 leveraged patient waiting time data to optimize operating room efficiency. However, these groups' methods relied on inferential statistics, rather limited in scope (due to assumptions about the underlying data structure/distributions) and also limited in ability to support individualized insight. As a result, deeper, more detailed models are needed to tackle the full magnitude and scope of healthcare operational challenges.

To be practically useful, the best clinical operations tools should: run in real time; require minimal initial assumptions; automatically identify key features responsible for failures and successes; and naturally adapt to frequent changes in the environment. Combined with rich operational data records, this need for all-inclusive analysis and adaptivity naturally led us to the exploration of ML techniques. We also recognize that, albeit unexplored in healthcare operations, ML has already been used to meet similar requirements in other industries: to streamline factory manufacturing processes by detecting bottlenecks and optimizing schedules13–15; to boost workflow efficiency in traffic control16; and to run online network systems17 and data centres18.

1Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA. 2Massachusetts Institute of Technology, Cambridge, MA, USA. ✉e-mail: opianykh@mgh.harvard.edu


[Figure 1 shows how operational ML features are derived from routine records. Example features: imaging study identifier, patient identifier, patient arrival time, age and number of images taken (for example, AAA01 | AAA | 1 Nov 2019 10:30 | 56 | 287). Example data sources: HIS (timestamps such as arrival and exam begin/end; patient data such as age and gender; resource names and locations; local and remote sites; physician names and schedules); medical imaging (image acquisition and report timestamps; image types and counts; imaging protocols; device operators and schedules); and other sources (laboratories, reports, transport, medications, medical insurance, weather and traffic trends)—all recorded under digital healthcare standards (HL7, DICOM, FHIR and more).]

Fig. 1 | From standards to analyses for operational data in healthcare. Modern healthcare data are driven by a few leading digital standards: HL7
(www.hl7.org), DICOM (www.dicomstandard.org) or FHIR (www.hl7.org/fhir/overview.html). By their contents and application areas, the data can be
subdivided into several large domains (hospital information, medical imaging and other sources), and flows into several different areas in a hospital
(imaging, laboratories, insurance and more). Most hospitals already have all of these data flows in place, generating and collecting new data while
retaining it for years as often required by law. As a result, modern hospitals accumulate large volumes of data, which capture their operations and output,
and can be successfully used to build operational models.

Yet our goal was not to merely replicate these approaches but to address important practical problems. In healthcare, where artificial intelligence and ML have been largely overpromised and underdelivered, the applications of ML should be driven by the major practical needs rather than by the availability of off-the-shelf ML solutions.

Therefore, the overall purpose of this article is to demonstrate the practical advantages of ML for healthcare operations management. To do so, we introduce ML into two major classes of healthcare operational problems: predicting workflow events; and identifying key operational features. We demonstrate how two specific properties of ML—weak learning and optimal subset selection—make ML particularly suitable for solving operational problems. Finally, we discuss the advantages and limitations of these solutions and illustrate them with several applications our group has already developed and implemented in a large tertiary care centre.

ML and healthcare operations problems
Learning operational patterns from healthcare data can be enormously helpful for two major classes of operational problems: predicting critical workflow events; and identifying key features that define process behaviour. Intractable with traditional methods, both problems lend themselves naturally to ML techniques. Let us take a closer look at the main approaches and methodology required for building these ML solutions.

Forecasting workflow events with ML weak learning. Why weak learning? The greatest strength of ML lies in its ability to synthesize weak learners—features that are only slightly correlated with the variable of interest—into a strongly predictive model19,20. In contrast with strong learners (hard-to-find features, highly correlated with the variable of interest), operational weak learners are abundant. This is mostly due to the fact that the outcome of a clinical workflow is affected by a plethora of different factors, none of them being decisive enough to exclude the rest. For instance, the current delay in a medical facility may depend on its staffing, current patient arrival pattern, time of day, complexity of examinations performed, current bottlenecks in the operational environment, holidays, weather and seasonal trends, traffic jams, cafeteria lunch hours and many other factors. Each of these features, if considered separately, would usually have very little impact on the target delay value, thus acting as a weak predictor. Only when brought together can these poor predictors form a far more complete representation of the real process.

However, this is precisely what ML was designed for: harnessing hundreds and thousands of weak learners to create a strong predictive model. In this respect, ML offers a very logical choice for describing complex operational processes. This approach becomes particularly useful when one needs to predict certain operational events to inform and to take early actions. For example, hospital facilities often need to know their near-future workloads (the number of unread examinations or walk-in patients) to better allocate their resources. In such cases, applying ML to predictive operational problems becomes one of the most promising directions for advancing healthcare operations.


[Figure 2 plots wait time (min) against the sequence of patient encounters, ordered by arrival time. Left (simple model): predicted versus actual wait times, with delayed and early outliers; right (ML model): the same actual wait times with the ML-predicted wait.]

Fig. 2 | Predicting wait time. Grey: a typical waiting time pattern in a hospital facility. Note the nonlinear complexity, high variability and temporal surges,
leading to processing bottlenecks. Left (in red): an attempt to predict patient waits from the previous wait, resulting in an extremely inaccurate model.
Right (in green): patient wait time predicted using a more complex ML model with 55 features, engineered from the HIS data. While still not perfect, this
model shows much better prediction quality and even captures some of the large outliers.

HISs already provide hospitals with a number of standardized workflow-related parameters (from processing timestamps to resource specifications), which can all act as weak learners in ML models. To add even more weak learners, we have developed temporal feature engineering—the production of features at frequent time points. For example, to refine our ability to predict wait times, we compute the number of patients waiting for an appointment (the wait line length) not only at the time of each new patient arrival, but also at 5, 10, 15 and 30 min prior. This 'environment snapshotting' method provides our ML models with much richer information on how the entire process evolves over time, with each time sample acting as a weak predictor of the final outcome.
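As an illustration, this snapshotting step can be sketched in a few lines of Python with pandas; the column names (arrival_time, exam_begin) and the exact lag set are illustrative assumptions, not the actual HIS schema.

```python
# Sketch of 'environment snapshotting': for every patient arrival, compute
# the wait-line length now and at several earlier time points, so that each
# time sample can act as a weak predictor. Column names are hypothetical.
import pandas as pd

def queue_length_at(df: pd.DataFrame, t: pd.Timestamp) -> int:
    # Patients who have arrived by time t but whose examination has not
    # yet begun are counted as waiting.
    return int(((df["arrival_time"] <= t) & (df["exam_begin"] > t)).sum())

def snapshot_features(df: pd.DataFrame, lags_min=(0, 5, 10, 15, 30)) -> pd.DataFrame:
    # One row of lagged queue-length features per patient arrival.
    rows = [
        {f"queue_len_{lag}m_ago": queue_length_at(df, t - pd.Timedelta(minutes=lag))
         for lag in lags_min}
        for t in df["arrival_time"]
    ]
    return pd.DataFrame(rows, index=df.index)
```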
Predicting patient wait times. Waiting is a major cause of patient dissatisfaction and missed care opportunities21–23. The ability to accurately predict current wait times serves two important goals: if displayed to the patients, it permits them to plan their time; and when displayed to the managers, it allows them to intervene if necessary and possible24,25.

Wait time is calculated as the difference between two timestamps readily available in HISs: for walk-in appointments, it is the difference between patient arrival and the beginning of the examination; for scheduled appointments, it is the difference between the scheduled start and the actual start (better expressed as 'delay' rather than 'wait'). While simple to compute, wait times exhibit very complex and highly variable temporal trends, making them impossible to predict with a simple average or by looking at previous patient wait times (see the actual patient wait time pattern in Fig. 2). In addition, incessant environmental changes—such as weekdays versus weekends, different staffing, and changes in facility resources and patient volumes—call for adaptive models.

To solve the wait time prediction problem for a large tertiary care centre, we used our domain knowledge with temporal feature engineering to create a set of 55 predictors, computed from HIS data in real time. An ML boosted random forest was used26 (Matlab LSBoost and Python XGBoost), performance-optimized to train in less than 5 min and compute each new prediction in a few seconds. The model was configured to run every 3 min, computing the current wait time prediction (Fig. 2, right); the model was automatically retrained every weekend to reflect any changes in the environment.
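A minimal sketch of such a boosted model, using the Python XGBoost package named above, is shown below; the hyperparameters and variable names are illustrative placeholders rather than the production configuration.

```python
# Sketch of the boosted wait-time regressor: X_train holds engineered HIS
# features (for example, the 55 predictors described above) and y_train the
# observed wait times in minutes. All hyperparameters are illustrative.
import xgboost as xgb

def train_wait_time_model(X_train, y_train):
    model = xgb.XGBRegressor(
        n_estimators=300,   # many shallow trees combine many weak predictors
        max_depth=4,        # shallow trees act as weak learners
        learning_rate=0.05,
        subsample=0.8,      # row subsampling guards against overfitting
    )
    model.fit(X_train, y_train)
    return model

# In production, such a model would be rebuilt on a schedule (here, weekly
# retraining) and scored every few minutes on a fresh HIS snapshot.
```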
To deliver the results, we designed a patient-facing display showing predicted wait or delay time for individual imaging workflows (such as X-ray, computed tomography (CT) and so on, as shown in Fig. 3, top); these displays were implemented in our waiting rooms. First, this provided our facility management with accurate and actionable predictions of their incoming delays, impossible to achieve with simple models. Second, 400 patient surveys conducted soon after proved that 82% of the patients approved the displays and wanted them to become available in all hospital areas27. This extension was easily achieved by running the same ML model with different subsets of HIS data, corresponding to each hospital area. Thus, the adaptability of ML models made our wait time predictors naturally scalable, which enabled us to deploy different displays for every modality and procedure area.

We also had to develop a second 'administrator' view, showing the most current prediction history (Fig. 3, bottom). This was requested by the facility managers, to show how well the model performs, and to investigate the unpredictable outliers, which is often indicative of workflow problems. This experience was also leveraged to advance and refine the ML model: by allowing staff members to examine the gaps between what was predicted and the actual outcomes, new features and improvements were uncovered.

Figure 4 provides an illustration from another operational predictive ML project developed in our hospital. The same methodology enabled us to predict stressful workflow overloads in our ED several hours in advance, with 80% accuracy (precision and recall)—a rather remarkable result for highly random ED patient flow, unattainable with non-ML predictors. As a result, the ML model predictions, computed 2 h before the real overload surge, become sufficiently accurate for taking actions, and provide ED management with enough time to call in support staff.
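The framing of this overload predictor can be sketched as follows; the hourly feature table and its column names are hypothetical, while the threshold of seven unreported examinations follows Fig. 4.

```python
# Sketch of the ED overload predictor: label each hourly snapshot by whether
# the CT queue exceeds seven unreported examinations two hours later, then
# fit a probabilistic classifier. Column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def train_overload_model(hourly: pd.DataFrame, feature_cols, horizon_h: int = 2):
    future_queue = hourly["ct_queue_len"].shift(-horizon_h)  # queue 2 h ahead
    valid = future_queue.notna()          # drop the last horizon_h rows
    X = hourly.loc[valid, feature_cols]
    y = (future_queue[valid] > 7).astype(int)
    clf = RandomForestClassifier(n_estimators=200)
    clf.fit(X, y)
    return clf  # clf.predict_proba(X_now)[:, 1] gives the overload probability
```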
Discovering key process drivers with ML optimal subset selection. Why optimal feature selection? The investigative capability of ML derives from optimal feature subset selection (OSS), a common aspect of ML optimization28,29. Depending on the ML model, OSS can employ various feature selection algorithms—pruning, stepwise removal, penalized elimination, even 'brute force'—to exclude the least important variables until the model discovers the minimal and most predictive set of key features (Fig. 5).
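One simple importance-based variant of this idea, consistent with the error-versus-N curves of Fig. 5, can be sketched as follows; it is an illustration of the principle, not the authors' exact selection algorithm.

```python
# Sketch of importance-based subset selection: rank features with a full
# model, retrain on the top-N subsets and record the cross-validated error.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

def error_versus_subset_size(X: np.ndarray, y: np.ndarray) -> dict:
    full = GradientBoostingRegressor().fit(X, y)
    order = np.argsort(full.feature_importances_)[::-1]  # most important first
    errors = {}
    for n in range(1, X.shape[1] + 1):
        scores = cross_val_score(GradientBoostingRegressor(),
                                 X[:, order[:n]], y,
                                 scoring="neg_mean_absolute_error", cv=5)
        errors[n] = -scores.mean()   # MAE of the top-n feature subset
    return errors  # the 'explain' elbow sits where the error stops dropping
```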

[Figure 3 contents. Patient view—walk-in patients: X-ray, Yawkey 6, 0 patients waiting, anticipated wait 3 min; scheduled appointments: CT 11 min, MRI 15 min, ultrasound 1 min and procedures 4 min of anticipated delay at Yawkey 6, with the note 'Your exact wait time may be slightly different than predicted.' Administrator view—predicted versus actual wait times (minutes) for the previous 200 appointments, with per-case details such as accession 1,234,567 on 19 Feb 2020 9:37 (predicted 16 min, actual 25 min).]

Fig. 3 | Implementing the wait time model in a hospital. The model (55 operational predictors, boosted random forest ML), implemented at our hospital
‘Yawkey 6’ facility. The main chart shows the administrator view with the last 200 patients (horizontal axis) and their wait times (vertical axis): predicted
wait times are plotted along the blue line, whereas actual (observed) waits are plotted in red. The plot interactively shows an information box on each
case, which facilitates delay investigations. Inset: patient view—the monitor that displays currently predicted wait time to patients.

[Figure 4 plots the probability of large ED queues (vertical axis, 0–1.0) over time (horizontal axis, 0–700 h), comparing the actual large ED queue outcomes (>7) with the prediction made 2 h in advance.]

Fig. 4 | Predicting workflow overloads in ED imaging (logistic random forest model, 75 predictors). An overload was defined as having more than seven
unreported CT examinations. The shaded regions indicate out-of-sample true outcomes (0 if the queue was less than 7; 1 if the queue was 7 or greater).
The solid line represents the predicted probability of a queue of size 7 or greater, 2 h in advance.

This OSS functionality of ML models has a clear relationship to another large class of operational problems: identifying the features that are most pivotal for process outcomes. Questions such as 'Why is this facility so overloaded?', 'Why do patients have to wait for so long?' and 'What key factors are contributing to processing delays and stress?' are commonly asked in healthcare settings. Conventionally, they would be answered on the basis of manual observations and data exploration through reports, charts and dashboards—a time-consuming and laborious process. Moreover, these conventional methods work best when there is a small number of factors—to visualize with a chart or a table—which excludes full factor set analysis and can produce biased results. For example, a single graph of patient wait time versus time of day would suggest conclusions that may be inherently biased due to factors that are correlated with time of day, such as appointment schedules. With OSS, we can use ML to discover the most important features in the complex, noisy, evolving, multidimensional data.

To run an investigative ML project, one starts with the same step as in predictive learning: assembling as many (weak) features as possible. Including any information known about the process makes the model more accurate; collecting the information from diverse independent sources makes the model less biased. As a result, we inevitably end up with a large set of candidate features, impossible to explore manually. Therefore, we add the following two new principal tasks, different from predictive modelling.
1. Running OSS to discover a small number of the most important features.
2. Visualizing OSS feature dependencies so that they can be interpreted by humans.

Below, we describe an example of this approach.


[Figure 5 shows the decrease in model error versus the number of predictors for X-ray, MRI and CT workflow models, with an 'explain' area at small N (a few optimal OSS features achieve the most significant initial error reduction) and a 'predict' area at large N (a large number of weak features achieves the most accurate prediction).]

Fig. 5 | The relationship between the number of predictor variables N and the percentage of model error for three ML models corresponding to three different workflows in a facility. Each workflow is represented by a different model error line. The ML model used was a boosted random forest. Note that the errors often decrease rapidly with a relatively small N (although the X-ray model demonstrates a visibly slower decrease rate). The vertical axis measures the model error relative to a single-predictor model (hence, N = 1 corresponds to 100%). MRI, magnetic resonance imaging.

Delays in creation of radiology reports. We used the described investigative ML process to explain delays in the creation of radiology reports for the ED in a large tertiary care centre. The process outcome that we wanted to model was the turn-around time, defined as the time from medical image acquisition to preliminary diagnosis (preliminary radiology report). In total, 38 predictive features (6 original and 32 derived) were collected and engineered from the routine HIS records. One year of ED data (42,064 patient records) was considered to account for seasonal operational trends. We divided the ED workflow into four major sub-workflows: neurological examinations with CT images; neurological examinations with magnetic resonance images; non-neurological examinations with CT images; and non-neurological examinations with X-ray images. For each sub-workflow, the highly interpretable generalized additive model (GAM) was chosen as our explanatory ML tool.
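A minimal sketch of fitting such an explanatory GAM in Python is given below, using the pygam package as one possible implementation (the paper does not name its GAM library); the feature layout is a hypothetical placeholder.

```python
# Sketch of an explanatory GAM for turn-around time (TAT). Column 0 is
# assumed to hold the hour of the day (smooth term) and column 1 an
# examination-type code (factor term); y holds TAT in minutes.
from pygam import LinearGAM, s, f

def fit_tat_gam(X, y):
    gam = LinearGAM(s(0) + f(1)).fit(X, y)
    return gam

# Plotting gam.partial_dependence(term=0, X=gam.generate_X_grid(term=0))
# then shows the isolated contribution of the hour of the day to the
# predicted TAT, the kind of curve that exposed the 3:00-5:00 surge.
```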
Using these data, with the interpretable GAM and OSS identifying the most critical bottleneck factors, we were able to substantially improve our understanding of the underlying workflow problems. For instance, OSS identified the time of day as one of the most critical features: the model discovered a sharp increase in turn-around time for neurological exams between 3:00 and 5:00 (Fig. 6, right). When discussed with the ED management, this model discovery was explained by insufficient staff coverage during this time. Moreover, the model estimated the real 2-h scale of this effect, decoupling it from the mix of the other factors—that is, it provided us with a very specific metric to target for improvement.

Exploring highly dimensional feature spaces—such as a 38-dimensional space of original features, with thousands of possible feature interactions—would be impossible for a human analyst. However, it takes minutes for an ML model, which also performs much more comprehensive data analysis. As a result, the ML-based approach to key feature detection provides a superior alternative to traditional operational data analysis, and to process exploration in general.

Practical considerations for operational modelling in healthcare. Healthcare operations management is a very demanding field with a very low margin for error. Simple guesses or incorrect models can produce cascading failures, negatively impacting all process participants. Therefore, although ML provides a natural fit for many operational problems, it needs to be designed and implemented with sufficient attention to quality and details to become practically useful. Our experiences taught us a few lessons, particularly important for healthcare operations ML:
1. Cross-validation and overfitting penalties. The models should be trained on different data subsets and with constraints forcing ML to capture only the most essential process trends30–32 (see the sketch after this list). This is particularly critical in the case of healthcare management data, often entered partially, manually and with limited accuracy (such as most of patient data, typically recorded manually).
2. Sufficient longitudinal data. Operational models should be trained on at least a full year of historical data. Not only does this provide one with larger training sets, but it also helps the models learn long-term workflow patterns, such as seasonal variations in productivity and workloads.
3. Continuous updates. A robust, real-time processing pipeline should be developed that enables facile reading of hospital data, application of one or more ML models and retraining the models 'on the fly' to remain accurate for real-time decision-making.
4. Feature engineering. ML should be used to build features from a wide range of sources—patient-level data (age and history), facility-level data (staffing and queues) and local area (weather and traffic)—to capture all variabilities in operational processes. This also includes making model results visible to the facility managers, to gather their feedback (as shown in Fig. 3) and to control model performance.
5. Correct ML model selection. There is no 'one model fits all' solution, and several models should be attempted to select the best, or to create a multi-model ensemble33,34. GAMs can combine interpretability with nonlinear analyses35,36. Additive tree models, such as random37,38 or boosted39,40 forests, can provide extreme modelling flexibility, and handle sparse, categorical and missing data. Neural networks excel in predicting sophisticated trends41,42, but their hidden logic may be hard to interpret (note that interpretation clarity is important in operations)43. We also recommend using several ML models of different nature, comparing their results to make sure that they are consistent and free of any model-specific bias.
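To make lessons 1 and 3 concrete, here is a minimal, hedged sketch of cross-validated training with explicit complexity penalties, wrapped so the whole pipeline can be re-run on a schedule; the estimator and parameter values are illustrative only.

```python
# Sketch for lessons 1 and 3: cross-validated model selection with
# overfitting penalties, packaged as a retraining step that can be run
# weekly against fresh HIS extracts. Parameter values are illustrative.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

def retrain(X, y):
    search = GridSearchCV(
        GradientBoostingRegressor(subsample=0.8),  # subsampling as a penalty
        param_grid={"max_depth": [2, 3, 4],        # keep individual trees weak
                    "n_estimators": [100, 300]},
        scoring="neg_mean_absolute_error",
        cv=5,                                      # cross-validation (lesson 1)
    )
    search.fit(X, y)
    return search.best_estimator_                  # redeploy on a schedule
```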
Advantages of operational ML in healthcare
As our results demonstrate, ML can provide efficient and practical solutions for major operational problems, facilitating operational decision-making and identifying the most essential process-driving parameters. When applied to healthcare operations, this leads to several important gains summarized below.

Higher accuracy. The advantages of ML can be illustrated with Fig. 7, comparing the accuracy of ML-based wait time prediction to the most widespread pre-ML approaches (predicting from previous waits and their moving averages44). To quantify model quality, we used both R2 and mean absolute error (MAE), where reduction in MAE was computed relative to the MAE of the original wait time. A single decision tree, penalized linear regression (elastic net), a boosted random forest and shallow neural networks (up to two layers) were used to demonstrate the benefits of ML. All model errors were computed on the test (out-of-sample) datasets, to exclude overfitting.

As Fig. 7 demonstrates, the application of ML resulted in substantially higher model accuracy. Increasing both the model complexity and the model feature set improves ML outcomes. Practically speaking, predicting wait times for those four facilities without ML would be impossible.
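The comparison behind Fig. 7 can be sketched as follows; the moving-average baseline and the metric bookkeeping are an illustrative reconstruction, not the authors' evaluation code.

```python
# Sketch of a Fig. 7 style comparison: a pre-ML moving-average baseline
# versus an ML regressor, both scored out of sample with R^2 and MAE.
# Assumes time-ordered data and len(y_train) >= window.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, r2_score

def compare_models(X_train, y_train, X_test, y_test, window: int = 5):
    y_all = np.concatenate([y_train, y_test])
    start = len(y_train)
    # Baseline: predict each wait as the mean of the previous `window` waits.
    baseline = np.array([y_all[i - window:i].mean()
                         for i in range(start, len(y_all))])
    ml = GradientBoostingRegressor().fit(X_train, y_train).predict(X_test)
    return {name: {"R2": r2_score(y_test, pred),
                   "MAE": mean_absolute_error(y_test, pred)}
            for name, pred in [("moving average", baseline),
                               ("boosted trees", ml)]}
```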

[Figure 6 contents. Top: 'TAT by time of day'—left, the original data points (minutes versus hour of the day, 0–24); right, the same data as seen by the ML algorithm, discovering hidden trends (minutes added to the TAT versus hour of the day) for the CT neurological, MRI neurological, CT non-neurological and X-ray non-neurological workflows, with a significant delay in ED patient processing identified by the ML model. Bottom: partial dependency plots—change in bed requested to bed assigned time (minutes) versus bed utilization in medicine (0.86–1.00), and change in length of stay (minutes) versus number of pending CT exams (6–30).]

Fig. 6 | Using ML to discover main operational features. Top: neurological exams experience high delays between 3:00 and 5:00 due to our staffing model for interpretation of these studies. The scatter plot (left) uses the original HIS data for all four ED workflows, making bottleneck patterns impossible to clearly discern or measure. TAT, turn-around time. Once processed by a workflow-specific GAM model (right), the early morning delay pattern becomes obvious. Bottom: using ML to identify the impact of certain features on ED patient length of stay (the total time a patient stays in the ED). These partial dependency plots were built from ML models to study the independent contribution of each feature. The shaded areas show the variability in the results when using different types of ML models.
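Partial dependency plots such as those in the bottom row of Fig. 6 can be produced with standard tooling; the sketch below uses scikit-learn's inspection module as one possible route, with a hypothetical feature name, and assumes X is a pandas DataFrame with named columns.

```python
# Sketch of a partial dependency plot (compare Fig. 6, bottom): isolate the
# contribution of one feature (for example, the number of pending CT exams)
# to a predicted outcome such as ED length of stay. Names are hypothetical.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

def plot_partial_dependence(X, y, feature="pending_ct_exams"):
    model = GradientBoostingRegressor().fit(X, y)
    # Sweeps the chosen feature while averaging the model's predictions over
    # all patients, exposing that feature's independent contribution.
    return PartialDependenceDisplay.from_estimator(model, X, features=[feature])
```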

Are there any limits to how accurate operational models could be? Yes, and on the basis of our experience, we would attribute them to two principal sources of operational variabilities, listed below.
1. Temporal variability. This type of variability is associated with attempts to predict for longer periods of time, which inevitably makes predictions less accurate (random events accumulating during a longer process). It can be seen in facility F2 (Fig. 7), where patient examinations take 2–3 times longer than in F1, F3 or F4, thus making the next patient wait time less predictable.
2. Workflow variability. This type of variability is associated with highly unpredictable and disruptive events. It can also be seen in Fig. 7, where the least predictable F1 is the only facility taking random walk-in patients.

Note that both limitations are rooted in their specific workflows, and not in the deficiencies of the ML models. Highly random or subjective workflows would be impossible to model even with the most advanced ML. Therefore, any operational processes with poor predictability should be first checked for operational consistency; this should lead to either refining the processing logic, or discovering some new ML features to make it more predictable.

Scalability and adaptivity. As we mentioned earlier, operational environments are highly variable, and depend on many local, facility-specific features. Can operational ML scale up and adapt to this variability? Yes, owing to the two principal reasons below.


Wait time models, R 2


0.7
ML advantage
0.6

0.5
Model R 2
0.4

0.3

0.2

0.1

0
F1 F2 F3 F4
Moving average Decision tree Linear regression Boosted forest Neural network

Wait time models, reduction in MAE


50%
ML advantage

40%
Model reduction in MAE

30%

20%

10%

0%
F1 F2 F3 F4
Moving average Decision tree Linear regression Boosted forest Neural network

Fig. 7 | ML versus pre-ML estimates. Wait time prediction model quality, computed for four different medical facilities (F1–F4), with different models. Models using moving averages of the previous waits were reported in earlier publications, but they fail to achieve satisfactory results for our data. In contrast, ML models substantially improve prediction accuracy.

1. Global standardization of healthcare data. Since the 1990s, all healthcare data elements (operational included) are recorded using a few major digital standards (such as DICOM, HL7 or FHIR; Fig. 1). As a result, all operational data are collected in a uniform, site- and vendor-independent way. Therefore, porting an operational ML model to another clinical facility does not require reinventing the model or its feature set. Instead, the model needs only to be retrained into the local workflow patterns.
2. Adaptivity of ML. Unlike many pre-ML models based on static rules or coefficients, ML algorithms are naturally adaptive—they do not need human assistance to learn (and to re-learn) from their data. Keeping operational ML on a regular, automated retraining schedule ensures that the models remain accurate as their environments evolve.

These advantages are augmented by the ability of operational ML to take into account dozens—and often hundreds—of operational features, as we have demonstrated above. This also explains why human decision-making—limited to only about 4–5 variables at best45,46—is surpassed by the power of operational ML.

Higher efficiency. How do accurate, highly predictive models translate into real impact? The ML models we described were developed because of real staff and patient needs. Previously, data-driven predictive models that could be used in real time with high accuracy were not available. Now, staff can use these tools to drive decision-making and be more reactive to hospital and especially patient needs.

Using wait time displays, staff can examine the unique patient identifiers associated with high predicted wait times to investigate the underlying causes of these delays. This can drive future operational changes (for example, if predictions are high despite seemingly low resource utilization). With workflow overflow models, one becomes able to predict surges in patient queues 2 h in advance. This was a staff-driven threshold: 2 h was the amount of time needed to call in support staff to aid in quelling those queues before they could bottleneck the system. ML provides healthcare managers with a unique, new opportunity to make informed, accurate, real-time decisions. The impact of this approach can be directly observed in the data, which confirms not only the accuracy of the model, but also the overall ML impact of the model-driven operational improvements.

Future directions. Although process prediction and key feature analysis encompass two principal areas of operational problems, other management challenges—such as scheduling optimization, performance and productivity analysis, process mining, quality improvements and more—can benefit from ML techniques. Similarly, many rigid and 'non-learning' operational algorithms from the past can be enhanced with ML design, making them much more applicable in complex operational environments. As a result, blending domain-optimal algorithms with ML data learning opens an extremely promising direction for solving real-life operational problems, where both ML and conventional operational approaches might enhance and complement each other, overcoming their present limitations.

In many ways, this 'learning' approach to operations management should change the entire management paradigm. Operational processes often behave as natural phenomena, where the rules must be discovered rather than forced. As our examples demonstrate, ML weak feature learning and optimal subset selection make these discoveries possible, directly contributing to more efficient healthcare management.

Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The clinical facility wait/delay time dataset described in this work is available from our web site: https://medicalanalytics.group/operational-data-challenge.

Code availability
The Matlab code used to analyse the clinical facility wait/delay time dataset is available from our website: https://medicalanalytics.group/operational-data-challenge.

Received: 3 May 2019; Accepted: 13 April 2020; Published online: 18 May 2020

References
1. Choy, G. et al. Current applications and future impact of machine learning in radiology. Radiology 288, 318–328 (2018).
2. Winasti, W., Elkhuizen, S., Berrevoets, L., van Merode, G. & Berden, H. Inpatient flow management: a systematic review. Int. J. Health Care Qual. Assur. 31, 718–734 (2018).
3. Zhao, Y. et al. Bottleneck detection for improvement of emergency department efficiency. Bus. Process Manag. J. 21, 564–585 (2014).
4. Benneyan, J. C. An introduction to using computer simulation in healthcare: patient wait case study. J. Soc. Health Diabetes 5, 1–15 (1997).
5. Duguay, C. & Chetouane, F. Modeling and improving emergency department systems using discrete event simulation. SAGE J. 83, 311–320 (2007).
6. Ghanes, K. et al. A comprehensive simulation modeling of an emergency department: a case study for simulation optimization of staffing levels. In Proc. 2014 Winter Simulation Conference (IEEE, 2014).
7. Rossetti, M. D. Simulation Modeling and Arena (John Wiley and Sons, 2009).
8. Subramaniyan, M., Skoogh, A., Gopalakrishnan, M. & Salomonsson, H. An algorithm for data-driven shifting bottleneck detection. Cogent Eng. 3, 1–19 (2016).
9. Tayne, S., Merrill, C. & Saxena, R. Maximizing operational efficiency using an in-house ambulatory surgery model at an academic medical center. Found. Am. Coll. Healthc. Exec. 63, 118–129 (2018).
10. Attarian, D. E., Wahl, J. E., Wellman, S. S. & Bolognesi, M. P. Developing a high-efficiency operating room for total joint arthroplasty in an academic setting. Clin. Orthop. Relat. Res. 471, 1832–1836 (2013).
11. Schwarz, P. et al. Lean processes for optimizing OR capacity utilization: prospective analysis before and after implementation of value stream mapping (VSM). Langenbeck's Arch. Surg. 396, 1047–1053 (2011).
12. Wolf, F. A., Way, L. W. & Stewart, L. The efficacy of medical team training: improved team performance and decreased operating room delays. Ann. Surg. 252, 477–483 (2010).
13. Subramaniyan, M., Skoogh, A., Salomonsson, H., Bangalore, P. & Bokrantz, J. A data-driven algorithm to predict throughput bottlenecks in a production system based on active periods of the machines. Comput. Ind. Eng. 125, 533–544 (2018).
14. Priore, P., Gómez, A., Pino, R. & Rosillo, R. Dynamic scheduling of manufacturing systems using machine learning: an updated review. Artif. Intell. Eng. Des. Anal. Manuf. 28, 83–97 (2014).
15. Thomas, T. E., Koo, J., Chaterji, S. & Bagchi, S. MINERVA: a reinforcement learning-based technique for optimal scheduling and bottleneck detection in distributed factory operations. In Proc. 10th Int. Conf. Communication Systems and Networks (IEEE, 2018).
16. Elhenawy, M. M. Z. Applying Machine and Statistical Learning Techniques to Intelligent Transport Systems: Bottleneck Identification and Prediction, Dynamic Travel Time Prediction, Driver Stop-Run Behavior Modeling, and Autonomous Vehicle Control at Intersections. PhD thesis, Virginia Polytechnic Institute and State Univ. (2015).
17. Fadlullah, Z. M. et al. State-of-the-art deep learning: evolving machine intelligence toward tomorrow's intelligent network traffic control systems. IEEE Commun. Surv. Tutorials 19, 2432–2455 (2017).
18. Matsunaga, A. & Fortes, J. A. B. On the use of machine learning to predict the time and resources consumed by applications. In Proc. 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (IEEE, 2010).
19. Joshi, M. V., Agarwal, R. C. & Kumar, V. Predicting rare classes: can boosting make any weak learner strong? In KDD '02 Proc. 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2002).
20. Freund, Y. & Schapire, R. E. A short introduction to boosting. J. Jpn. Soc. Artif. Intell. 14, 771–780 (1999).
21. Holbrook, A. et al. Shorter perceived outpatient MRI wait times associated with higher patient satisfaction. J. Am. Coll. Radiol. 13, 505–509 (2016).
22. Anderson, R. T., Camacho, F. T. & Balkrishnan, R. Willing to wait?: The influence of patient wait time on satisfaction with primary care. BMC Health Serv. Res. 7, 31 (2007).
23. Brandenburg, L., Gabow, P., Steele, G., Toussaint, J. & Tyson, B. Innovation and best practices in health care scheduling. NAM Perspect. 5, 1–24 (2015).
24. Dibble, E. H., Baird, G. L., Swenson, D. W. & Healey, T. T. Psychometric analysis and qualitative review of an outpatient radiology-specific patient satisfaction survey: a call for collaboration in validating a survey instrument. J. Am. Coll. Radiol. 14, 1291–1297 (2017).
25. Singh, S. C., Sheth, R. D., Burrows, J. F. & Rosen, P. Factors influencing patient experience in pediatric neurology. Pediatr. Neurol. 60, 37–41 (2016).
26. Kuhn, M. Applied Predictive Modeling (Springer, 2013).
27. Jaworsky, C., Pianykh, O. & Oglevee, C. Patient feedback on waiting time displays. Am. J. Med. Qual. 32, 108–108 (2016).
28. Bertsimas, D., King, A. & Mazumder, R. Best subset selection via a modern optimization lens. Ann. Stat. 44, 813–852 (2016).
29. Khalid, S., Khalil, T. & Nasreen, S. A survey of feature selection and feature extraction techniques in machine learning. In Proc. Science and Information Conference (Science and Information Conference, 2014).
30. Bottou, L., Curtis, F. E. & Nocedal, J. Optimization methods for large-scale machine learning. Soc. Ind. Appl. Math. Rev. 60, 223–311 (2018).
31. Dietterich, T. Overfitting and undercomputing in machine learning. ACM Comput. Surv. 27, 326–327 (1995).
32. Benjamin, A. S. et al. Modern machine learning far outperforms GLMs at predicting spikes. Preprint at https://doi.org/10.1101/111450 (2017).
33. Austin, P. C. A comparison of regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality. Stat. Med. 26, 2937–2957 (2007).
34. Mohri, M., Rostamizadeh, A. & Talwalkar, A. Foundations of Machine Learning (MIT Press, 2012).
35. Hastie, T. & Tibshirani, R. Generalized Additive Models (Chapman and Hall, 1990).
36. Dominici, F., McDermott, A., Zeger, S. L. & Samet, J. M. On the use of generalized additive models in time-series studies of air pollution and health. Am. J. Epidemiol. 156, 193–203 (2002).
37. Prasad, A. M., Iverson, L. R. & Liaw, A. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199 (2006).
38. Genuer, R., Poggi, J.-M. & Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 31, 2225–2236 (2010).
39. Elith, J., Leathwick, J. R. & Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 77, 802–813 (2008).
40. De'ath, G. Boosted trees for ecological modeling and prediction. Ecology 88, 243–251 (2007).
41. Haykin, S. Neural Networks: A Comprehensive Foundation (Prentice Hall, 1994).
42. Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).
43. Olden, J. D. & Jackson, D. A. Illuminating the "black box": a randomization approach for understanding variable contributions in artificial neural networks. Ecol. Model. 154, 135–150 (2002).
44. Hemaya, S. & Locker, T. How accurate are predicted waiting times, determined upon a patient's arrival in the emergency department? Emergency Med. J. 29, 316–318 (2012).
45. Halford, G. S., Baker, R., McCredden, J. E. & Bain, J. D. How many variables can humans process? Psychol. Sci. 16, 70–76 (2005).
46. Iyengar, S. S. & Lepper, M. R. When choice is demotivating: can one desire too much of a good thing? J. Pers. Soc. Psychol. 79, 995–1006 (2000).

Acknowledgements
We acknowledge C. Crowley and other former and present members of the Medical Analytics Group who contributed to these projects.

Author contributions
O.P. and D.R. conceived the study. O.P., S.G., D.P. and C.Z. analysed the data. All authors contributed to interpreting the results. All authors contributed to writing the manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s42256-020-0176-3.
Correspondence and requests for materials should be addressed to O.S.P.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© The Author(s), under exclusive licence to Springer Nature Limited 2020
