Professional Documents
Culture Documents
Business Drivers AP-HP began discussions of how to use direction of France’s open data
its data to support development of an initiatives. The AP-HP project was
Overcrowding is a growing problem in analytics and machine learning solution developed at Teratec, a Paris-area
hospitals around the world. It results that could help predict emergency campus, founded by a consortium
in backups in the emergency room, department visits and hospital of government agencies, academic
with patients occasionally being turned admissions. They wished to incorporate institutions, and private companies,
away from emergency departments due additional data sources into a predictive including Intel.
to lack of space, beds, or staff to treat model with their own data, to evaluate
them. whether including external factors Data Wrangling and Preparation
would contribute to a greater accuracy
Predicting the utilization of their in visit- and admission-rate projections. France has strict data privacy laws,
emergency departments is a major AP-HP’s initial goal was to develop a which prevented AP-HP from providing
priority for administrators at AP-HP, web interface that would allow hospital Intel with its source data directly.
both as a means for reducing patient administrators to view predictions of
wait times and for improving patient future emergency department visits Because ETALAB had permission
care. Inability to accurately plan for and hospital admissions. A future goal to work with the protected health
future patient loads is a cause of was to offer a public-facing version of information contained in the raw data,
frustration and concern, as some the interface that would allow incoming it was responsible for managing the full
emergency rooms in AP-HP may be patients to view anticipated wait anonymization of the dataset, removing
overcrowded while others may be times at all of the network emergency any information that could lead to the
underutilized. In addition, patients may departments and to evaluate the identification of a patient. The dataset
not have sufficient information about medical specialties at various hospitals provided to Intel, which contained
specialty clinics within the network to before determining which AP-HP facility data on more than 470,000 patient
ensure they go to the most appropriate to visit. encounters, included the hospital
emergency department for specific site of the encounter, whether the
ailments or injuries. These challenges “The primary consideration in the patient was admitted to the emergency
make it difficult for administrators development of our data science department or later to the hospital,
to effectively allocate staff and other project with Intel was to improve and the date and times of patient
resources, because they are unable to patient care and outcomes,” said movements in or out of the emergency
accurately anticipate the need for beds Raphael Beaufret, the head of web department.
both in emergency departments and in innovation in the AP-HP CIO’s data
other parts of the hospital. department. “If patients had access to Given the limitations brought on by
accurate real-time data on wait times dealing with anonymized records, data
Emergency Departments with Rich at all AP-HP emergency departments, scientists from AP-HP and Intel were
and information about the availability restricted in the queries that could be
Datasets
of specialized care opportunities at performed on the data. However, by
In 2003, an intense heat wave hit various locations, some of them could including environmental data, such as
France, resulting in the death of make more informed decisions about weather and flu rates from publically
dozens of citizens, in part due to where they seek care.” available data sources, they could
inadequate emergency health facilities explore the usefulness of adding these
to deal with the crisis. The French Project Overview external data sources to provide more
government responded by making context to the patient information.
major investments in new emergency “We worked closely with AP-HP to
departments, including new electronic build a prototype cloud-based solution Phase One: Modeling
information systems. With a decade’s designed to predict the expected
number of patient visits and admissions After the full dataset was prepared
worth of both clinical and operational
over a moving 15-day window. and assembled by ETALAB, the project
data within its emergency department
This proof-of-concept (POC) used moved forward in two phases. The first
networks, AP-HP has a rich dataset
retrospective data to build a predictive involved the bulk of the data science
with the potential to use as a basis for
model of patient visits for four of the research, along with the initial model
meaningful big data analytics solutions
French hospitals in AP-HP,” said Kyle evaluations. During this phase, the Intel
for addressing a range of challenges in
Ambert, senior data scientist and and AP-HP teams cleaned and sampled
healthcare. In addition, clinicians and
technical lead for Health & Life Science, the data as well as surveyed a variety
data scientists at AP-HP were aware of
Intel Corp. of possible methods for predicting
recent studies suggesting correlations
hour-by-hour visits and admissions at
among such external factors as
Also contributing to the collaboration each site.
weather, holidays, and flu incidence and
increased emergency room visits. was ETALAB, a bureau of the French
government responsible for the Though the team’s primary goal at
this point was determining the time
French Hospital Uses Trusted Analytics Platform to Predict Emergency Department Visits and Hospital Admissions 3
series algorithm that most accurately Solution Details flu rates, seasonality, and the daily
tracked the peaks and valleys of the maximum and minimum temperatures
observed data from AP-HP, there The data science team evaluated the at Charles De Gaulle airport. Notably,
were also important secondary final dataset and the ARIMAX model models using holiday information
goals. The selected analytical model with respect to the AIC, a metric for did not perform as well, indicating
would need to scale for the analysis quantifying the tradeoff between that these data were not predictive
of much larger data sets than those model accuracy and model complexity. for emergency department visits and
used in the POC, and easily adapt to a Generally speaking, data analysts hospitalization for this population.
distributed computing framework. The aim to find a model that leads to the
model would also need to be readily greatest accuracy for a given set of To illustrate the contribution that a
interpretable to a non-statistical group data. However, to focus simply on single feature such as seasonality
of users, namely clinicians. increasing the model’s accuracy by could make in the model, the team
adding an abundance of predictive compared the predicted and actual
The team evaluated three time-series variables or over-training the model admission rates for the model with and
analysis approaches for deployment for a data subset can lead to overly without seasonality. Additionally, this
with the AP-HP data. For each method, complex models that aren’t able to comparison enabled visualization of
the team optimized model parameters, generalize to new data. AIC quantifies the predicted and observed admission
as well as identified the best predictive the accuracy of a model for a given rates (see Figure 2).The models
features to include in each. set of training data, along with the generally followed the patterns of the
likelihood that the model will generalize observed data well, with an overall
Phase Two: Selecting and Training to unseen data. variance of around 5 percent. However,
the Model the accuracy of the model speaks to
The team performed a traditional the inherent difficulty of predicting
After evaluating the three time series model comparison of the different actualities that are triggered by
analysis models, the data science team parameterizations for ARIMAX to accidents and unforeseen events.
from Intel chose the ARIMAX approach, identify and fix the correct parameter
based on its ability to address the settings to minimize the objective The data science team deployed a
requirements for accuracy, scalability, function (the AIC), then evaluated distributed implementation of the
and versatility. The team then tuned the the external feature combinations to ARIMAX time series algorithm on TAP,
model for production on TAP. similarly minimize AIC. By running to take advantage of TAP’s capacity for
successive tests and adding and massive scalability and fast processing.
ARIMAX is an algorithm used frequently removing variables, the team arrived The ARIMAX algorithm utilizes the
in the financial sector, where it’s at a model that yielded the greatest Apache Spark Resilient Distributed
employed for predicting changes in accuracy with the least complexity. Datasets (RDD) paradigm, which keeps
stock, commodity, and other financial RDDs in persistent memory for iterative
market value. It provides a standard The results of the model exploration processing to minimize the amount of
approach for expressing a linear phase, using the AIC metric, indicated time spent moving data into memory
regression model in which predictive that the best-performing models (see for repeated computations. The data
variables are used to predict some Figure 1 – a lower AIC number is better) is distributed to different compute
future outcome. ARIMAX takes into did not utilize all the explanatory nodes in the cluster for processing,
account previously-observed values variables that could be logically then reduced back into memory on
and incorporates them into the expected to be useful as predictors. the control node, with no need to
predictive model. The optimal combination of features for reengineer the code for scalability as
this dataset were the estimated number data size increases.
With an optimized model and a of visits, a weekend indicator variable,
final dataset (which included the
anonymized data from AP-HP, as well
as a number of external, open datasets,
including weekly reported flu cases
throughout France, maximum and
minimum daily temperatures recorded
at Charles De Gaulle airport, and holiday
and post-holiday indicator variables),
the team was ready to deploy the model
on TAP where big data analysis could
quickly be performed and results easily
shared with application developers. Figure 1. Impact of the combination of datasets within AIC (lower AIC number is better).
French Hospital Uses Trusted Analytics Platform to Predict Emergency Department Visits and Hospital Admissions 4
Contributions to the Open Source Figure 2. Admission predictions for a model with (red) and without (blue) the seasonality
Community feature, compared with the actual observed rates (black).
Browser-Based Application
Summary
to collaborate in advancing the role or even predicting these flows—is To learn more, visit:
of data analytics in healthcare,” said absolutely key if we want to improve
Ambert. our quality of care.” Intel Big Data & Analytics:
https://software.intel.com/bigdata
“In addition to helping specific The Intel and AP-HP collaboration
hospitals organize and allocate successfully created a merged dataset Trusted Analytics Platform:
resources to avoid overcrowding and that incorporated de-identified
http://trustedanalytics.org
other emergency room challenges, data from four AP-HP hospitals
I see a number of other uses for this with exogenous public datasets that AP-HP:
solution,” said Dr. Saïk URIENS, Unité described external conditions in France
de Recherche Clinique et CIC Paris that might relate to hospital visit and http://www.aphp.fr
Descartes Necker Cochin, APHP, admission rates.
Research Director at Inserm. “Moving
forward, our hospital administrators The data science team evaluated
and medical staff should be able a number of time series analysis
to use this predictive analysis, on algorithms to determine which
retrospective big data, not only to approach optimized requirements
estimate the number of emergency for accuracy, scalability, and
department visits (including admissions interpretability by non-data science
rate), but also to categorize these users. The team chose the time
visits. For instance, it would be very series analysis method ARIMAX and
helpful to have predictive data that deployed it on TAP, a collaborative,
showed the proportion of children, flexible environment for developing
adults and elderly persons or men advanced analytics solutions. The
and women, as well as type of medical TAP analytics and development
causes (infection, cardiology, surgery, environment was also instrumental in
traumatology, etc.).” creating a prototype browser interface
that displays predicted patient visits
“Seeing the prediction application take and admission rates. The result was
advantage of all the data and provide an application that makes it simple for
useful and actionable insights has hospital administrators and clinicians
allowed our medical staff to imagine to better anticipate staffing levels and
the tremendous benefit it will provide improve allocation of resources, all with
to both the staff and our patients,” the end goal of improving care delivery
said Dr. Sébastien Beaune, emergency for patients within the AP-HP hospital
department director at AP-HP. “Having network.
a better understanding of patient flows
at our emergency departments—
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configution
No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com. Copies of documents which have an order number and are
referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725 or by visiting Intel’s website at http://www.intel.com/design/literature.htm.
Intel is a trademark of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2016 Intel Corporation. All rights reserved. Printed in USA 0516/MD/ MCB/PDF Please Recycle 334409-001US