• Discovery
• Data preparation
• Model planning
• Model building
• Communication of results
• Operationalization
Discovery
• Set clear goals and develop a plan for achieving them.
• Learn the business domain. Focus on the business requirements
rather than the data.
• Review history, such as whether the organization or business unit has
attempted similar projects in the past from which it can learn.
• Assess the resources available to support the project in terms of people,
technology, time, and data.
• Frame the business problem as an analytics challenge that can be
addressed in subsequent phases.
• Formulate initial hypotheses to test.
Data preparation
• Focus shifts from business requirements to data requirements.
• Activities include collecting, processing, and cleansing data.
• Ensure that the data you need is actually available.
• Data is captured in three main ways:
  • Data acquisition: obtaining existing data from outside sources.
  • Data entry: creating new data values from input generated within the
  organization.
  • Signal reception: capturing data created by devices.
• Obtain a distribution and range for the data; this initial profile forms
a natural bridge to the next phase.
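The data preparation steps above (checking that the needed data is available, cleansing it, and obtaining a distribution and range) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the rows are made up, not taken from OUData.csv; only the column names Flipped and OU_LOS_hrs come from the exercise below; and the cleansing rule (drop rows whose LOS is blank or non-numeric) is a hypothetical choice.

```python
import csv
import io
import statistics

# Hypothetical extract: made-up rows, NOT taken from OUData.csv. Only the
# column names Flipped and OU_LOS_hrs come from the exercise; the messy
# values are invented to show cleansing.
RAW = """Flipped,OU_LOS_hrs
0,22.5
1,41.0
0,
0,18.25
1,not recorded
0,30.0
"""

REQUIRED = {"Flipped", "OU_LOS_hrs"}


def load_clean(text):
    """Check that the needed columns exist, then drop rows with unusable LOS."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"required columns not available: {sorted(missing)}")
    rows, dropped = [], 0
    for rec in reader:
        try:
            los = float(rec["OU_LOS_hrs"])
        except ValueError:
            dropped += 1  # cleansing: blank or non-numeric LOS value
            continue
        rows.append({"Flipped": int(rec["Flipped"]), "OU_LOS_hrs": los})
    return rows, dropped


def profile(values):
    """Range plus a simple summary of the distribution."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


rows, dropped = load_clean(RAW)
summary = profile([r["OU_LOS_hrs"] for r in rows])
print(f"dropped {dropped} rows; LOS profile: {summary}")
```

With these invented rows, two records are dropped and the remaining four span 18.25 to 41.0 hours.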
Model planning
• Determine the methods, techniques, and workflow to follow for the
subsequent model building phase.
• Load and explore the data at hand. Many techniques are available for loading
data.
A few examples:
  • ETL (Extract, Transform, and Load) transforms data using a set of business
  rules before loading it into a sandbox.
  • ELT (Extract, Load, and Transform) loads raw data into the sandbox and then
  transforms it.
  • ETLT (Extract, Transform, Load, and Transform) has two levels of
  transformation. The first transformation is often used to eliminate noise.
• Learn about the relationships between variables and subsequently select key
variables and the most suitable models.
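The three loading patterns differ only in where the transformation happens relative to the load. The toy sketch below makes that ordering concrete. It is an illustration under assumptions: the "sandbox" is just a Python list, the records are made up, and remove_noise, apply_business_rules, and the 24-hour cut-off are hypothetical stand-ins for real rules.

```python
# Toy sketch: the "sandbox" is a Python list, and remove_noise /
# apply_business_rules are hypothetical stand-ins for real rules.
# Only the ORDER of extract, transform, and load differs between patterns.

def extract():
    # Made-up source records; None marks a noisy/unusable reading.
    return [{"los": "22.5"}, {"los": None}, {"los": "41.0"}]


def remove_noise(records):
    # Light transformation: drop records with a missing value.
    return [r for r in records if r["los"] is not None]


def apply_business_rules(records):
    # Business-rule transformation (hypothetical 24 h "long stay" cut-off).
    return [{"los": float(r["los"]), "long_stay": float(r["los"]) > 24}
            for r in records]


def load(sandbox, records):
    sandbox.extend(records)
    return sandbox


def etl():
    # Transform first; only transformed data ever reaches the sandbox.
    return load([], apply_business_rules(remove_noise(extract())))


def elt():
    # Raw data lands in the sandbox and is transformed there afterwards.
    sandbox = load([], extract())
    return sandbox, apply_business_rules(remove_noise(sandbox))


def etlt():
    # First transform (noise removal) before loading, second one after.
    sandbox = load([], remove_noise(extract()))
    return sandbox, apply_business_rules(sandbox)


print("ETL sandbox: ", etl())
print("ELT sandbox: ", elt()[0])
print("ETLT sandbox:", etlt()[0])
```

All three end with the same transformed records here; what changes is what the sandbox holds. ELT keeps the raw data, so it can be re-transformed later, while ETL stores only the transformed result.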
Model building
Building a model involves two phases:
• Design the model: identify a suitable model (e.g., a normal
distribution).
• Execute the model: run the model against the data to assess how well
it fits the data.
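Using the normal-distribution example above, "design" can mean choosing the normal family and estimating its parameters, and "execute" can mean running it against the data to measure fit. A minimal sketch, assuming made-up length-of-stay values; the Kolmogorov-Smirnov statistic (largest gap between the empirical CDF and the model CDF) is one possible fit check chosen here for illustration, not one the notes prescribe.

```python
from statistics import NormalDist

# Hypothetical LOS values in hours (made up for illustration).
LOS_HRS = [18.0, 21.5, 22.0, 24.0, 25.5, 26.0, 27.5, 29.0, 31.0, 35.5]


def design_model(data):
    """Design: choose a normal distribution and estimate mu and sigma."""
    return NormalDist.from_samples(data)


def ks_statistic(data, dist):
    """Execute: largest gap between the empirical CDF and the model's CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Compare the model CDF with the ECDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


model = design_model(LOS_HRS)
print(f"mu={model.mean:.2f} h, sigma={model.stdev:.2f} h, "
      f"KS D={ks_statistic(LOS_HRS, model):.3f}")
```

A small D suggests the normal model fits reasonably well. Because mu and sigma are estimated from the same data, the textbook KS critical values do not apply directly, so treat D here as a rough diagnostic.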
Communication of results
• In collaboration with major stakeholders, determine whether the results
of the project are a success or a failure based on the criteria developed
in Phase 1 (Discovery).
• Identify key findings, quantify the business value, and develop a
narrative to summarize and convey findings to stakeholders.
Operationalization
• Deliver final reports, briefings, code, and technical documents.
• Run a pilot project to implement the models in a production
environment.
Map facts in the case to the stages of the data analytics lifecycle.
Format:
Lifecycle stage | Action taken in the case
Discovery | Dr. Kelly is deeply invested in solving the problem; already familiar with the challenges because of her position as Medical Director of the OU; ….
Data preparation |
Model planning |
Model building |
Communication of results |
Operationalization |
• The file OUData.csv contains realistic data for medicine service
patients admitted to the observation unit (OU) over a period of time,
including their age, gender, preliminary diagnosis-related group
(DRG) at the time of admission to the OU, and whether they eventually
had to be sent to the inpatient wards.
• Your task in this exercise is to use various data summaries and
visualizations to determine simple, intuitive rules for placing certain
types of patients in the OU versus sending them directly to the
wards. The main criteria we will use to determine whether a
patient should be placed in the OU are:
(1) the probability that the patient will “flip,” i.e., that the patient’s
status will change from OBSERVATION to INPATIENT and the patient
will need to be sent to the wards anyway (column Flipped); and
(2) the expected Length of Stay (LOS) of the patient in the OU (column
OU_LOS_hrs).
Patient types with a relatively high probability of flipping and a relatively
long LOS should be sent directly to the wards, so as not to create
congestion in the OU.
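One way to turn the two criteria into a rule is to summarize the flip probability and mean OU LOS for each patient type and flag types that are high on both. The sketch below rests on assumptions that are not part of the exercise: "patient type" means the preliminary DRG (stored under a hypothetical column name "DRG", with placeholder labels A/B/C); "relatively high" and "relatively long" mean above the overall average; and the records are made up rather than read from OUData.csv.

```python
from collections import defaultdict

# Made-up records for illustration only (NOT from OUData.csv). Flipped and
# OU_LOS_hrs are the exercise's column names; "DRG" and the labels A/B/C are
# hypothetical placeholders for the preliminary DRG.
RECORDS = [
    {"DRG": "A", "Flipped": 0, "OU_LOS_hrs": 20.0},
    {"DRG": "A", "Flipped": 0, "OU_LOS_hrs": 24.0},
    {"DRG": "A", "Flipped": 1, "OU_LOS_hrs": 28.0},
    {"DRG": "B", "Flipped": 1, "OU_LOS_hrs": 40.0},
    {"DRG": "B", "Flipped": 1, "OU_LOS_hrs": 44.0},
    {"DRG": "B", "Flipped": 0, "OU_LOS_hrs": 36.0},
    {"DRG": "C", "Flipped": 0, "OU_LOS_hrs": 30.0},
    {"DRG": "C", "Flipped": 0, "OU_LOS_hrs": 34.0},
]


def summarize_by_type(records, key="DRG"):
    """Flip probability (share of Flipped == 1) and mean OU LOS per type."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {
        k: {
            "n": len(rows),
            "flip_rate": sum(r["Flipped"] for r in rows) / len(rows),
            "mean_los": sum(r["OU_LOS_hrs"] for r in rows) / len(rows),
        }
        for k, rows in groups.items()
    }


def send_to_wards(records, key="DRG"):
    """Types whose flip rate AND mean LOS both exceed the overall averages."""
    n = len(records)
    overall_flip = sum(r["Flipped"] for r in records) / n
    overall_los = sum(r["OU_LOS_hrs"] for r in records) / n
    return sorted(
        k for k, s in summarize_by_type(records, key).items()
        if s["flip_rate"] > overall_flip and s["mean_los"] > overall_los
    )


for drg, s in sorted(summarize_by_type(RECORDS).items()):
    print(f"{drg}: n={s['n']}, flip rate={s['flip_rate']:.2f}, "
          f"mean LOS={s['mean_los']:.1f} h")
print("Send directly to wards:", send_to_wards(RECORDS))
```

With these invented records only type B is flagged. On real data, group sizes matter: a flip rate computed from a handful of patients is noisy, so it is worth checking n for each type before adopting a rule.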