
Data Analytics Lifecycle

• Discovery
• Data preparation
• Model planning
• Model building
• Communication of results
• Operationalization
Discovery
• Develop clear goals and a plan for achieving them.
• Learn the business domain. Focus on the business requirements,
rather than the data.
• Review the history: has the organization or business unit attempted
similar projects in the past from which the team can learn?
• Assess the resources available to support the project in terms of people,
technology, time, and data.
• Frame the business problem as an analytics challenge that can be
addressed in subsequent phases.
• Formulate initial hypotheses to test.
Data preparation
• Focus shifts from business requirements to data requirements.
• Activities include collecting, processing, and cleansing data.
• Ensure that the data you need is actually available.
• Data is captured in three main ways:
• Data acquisition: obtaining existing data from outside sources.
• Data entry: creating new data values from data entered within the
organization.
• Signal reception: capturing data created by devices.
• A distribution and range may be obtained for the data, which forms a
natural bridge to the next step.
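As a minimal sketch of obtaining a distribution and range for a prepared column, the snippet below summarizes a small list of values using only the Python standard library. The numbers and the `summarize` helper are made up for illustration and do not come from any real data set.

```python
import statistics

# Hypothetical length-of-stay readings in hours (made-up values for illustration).
los_hours = [4.5, 12.0, 7.25, 30.0, 9.5, 18.0, 6.0, 22.5]


def summarize(values):
    """Return the range and a few basic distribution statistics for a numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(los_hours)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A summary like this is a quick check that the values fall in a plausible range before any model is planned.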
Model planning
• Determine the methods, techniques, and workflow to follow for the
subsequent model building phase.
• Load and explore the data at hand. Many techniques are available for loading
data. A few examples:
• ETL (Extract, Transform, Load) transforms data using a set of business rules before
loading it into a sandbox.
• ELT (Extract, Load, Transform) loads raw data into the sandbox, then transforms the
data there.
• ETLT (Extract, Transform, Load, Transform) has two levels of transformation. The first
transformation is often used to eliminate noise.
• Learn about the relationships between variables and subsequently select key
variables and the most suitable models.
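The difference between ETL and ELT is only the order of the load and transform steps. The sketch below makes that concrete with a plain Python list standing in for the sandbox. The records, the field names, and the business rule in `transform` are all hypothetical.

```python
# Hypothetical raw records as they might arrive from a source system (made-up values).
raw_records = [
    {"patient_id": "1", "age": "67", "gender": " f "},
    {"patient_id": "2", "age": "", "gender": "M"},
    {"patient_id": "3", "age": "45", "gender": "m"},
]


def transform(record):
    """Assumed business rule: parse numbers, map a blank age to None, normalize gender."""
    return {
        "patient_id": int(record["patient_id"]),
        "age": int(record["age"]) if record["age"] else None,
        "gender": record["gender"].strip().upper(),
    }


def etl(source):
    """ETL: transform each record first, then load the cleaned row into the sandbox."""
    sandbox = []
    for record in source:
        sandbox.append(transform(record))
    return sandbox


def elt(source):
    """ELT: load raw records into the sandbox unchanged, then transform them there."""
    sandbox = [dict(record) for record in source]  # load step: raw copies
    sandbox[:] = [transform(record) for record in sandbox]  # transform step, in the sandbox
    return sandbox


print(etl(raw_records))
print(elt(raw_records))
```

Both routes end with the same cleaned rows here. In practice the choice depends on whether the raw data should remain available inside the sandbox for later exploration.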
Model building
Building a model involves two phases:
• Design the model: identify a suitable model (e.g. a normal
distribution).
• Execute the model: run the model against the data to check how well
the model fits the data.
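The two phases can be sketched with the normal-distribution example mentioned above. The sample values are made up, and the fit check used here (comparing the share of observations within one standard deviation of the mean to the share a normal distribution predicts) is one simple illustrative choice, not a prescribed test.

```python
from statistics import NormalDist

# Made-up measurements for illustration only.
samples = [9.8, 10.4, 11.1, 9.2, 10.0, 10.9, 9.5, 10.6, 10.2, 9.9]

# Design: pick a normal distribution and estimate its parameters from the data.
model = NormalDist.from_samples(samples)

# Execute: run the model against the data and compare what it predicts with what we see.
lower = model.mean - model.stdev
upper = model.mean + model.stdev
observed = sum(lower <= x <= upper for x in samples) / len(samples)
expected = model.cdf(upper) - model.cdf(lower)  # about 0.683 for any normal distribution

print(f"mean={model.mean:.2f}, stdev={model.stdev:.2f}")
print(f"share within one stdev: observed={observed:.2f}, expected={expected:.2f}")
```

A large gap between the observed and expected shares would suggest returning to the design phase and trying a different model.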
Communication of results
• In collaboration with major stakeholders, determine whether the results of
the project are a success or a failure based on the criteria developed
in Phase 1 (Discovery).
• Identify key findings, quantify the business value, and develop a
narrative to summarize and convey findings to stakeholders.
Operationalization
• Deliver final reports, briefings, code, and technical documents.
• Run a pilot project to implement the models in a production
environment.
Map facts in the case to the stages of the data analytics lifecycle.

Format
Lifecycle stage            Action taken in the case

Discovery                  Dr. Kelly is deeply invested in solving the problem; already familiar
                           with the challenges because of her position as Medical Director of
                           the OU; ….
Data preparation
Model planning
Model building
Communication of results
Operationalization
• The file OUData.csv contains realistic data for medicine service
patients admitted to the observation unit (OU) over a period of time,
including their age, gender, preliminary diagnosis related group
(DRG) at the time of admission to the OU, and whether they had to
be sent to the inpatient wards eventually.
• Your task in this exercise will be to use various data summaries and
visualizations to determine simple, intuitive rules for placing certain
types of patients in the OU versus sending them directly to the
wards. The main criteria we are going to use to determine whether a
patient should be placed in the OU are:
(1) the probability that the patient will “flip,” i.e., that the patient’s
status will change from OBSERVATION to INPATIENT and the patient
will need to be sent to the wards anyway (column Flipped); and
(2) the expected Length of Stay (LOS) of the patient in the OU (column
OU_LOS_hrs).
Patient types with a relatively high probability of flipping and a relatively
long LOS should be sent directly to the wards, so as not to create
congestion in the OU.
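The two criteria can be computed per patient type as a flip rate and a mean LOS. The sketch below does this with the standard library on a few made-up rows standing in for OUData.csv. The column names follow the exercise text, Flipped is assumed to be coded 0/1, and `flip_rate_and_mean_los` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for OUData.csv; Flipped is assumed to be coded 0/1.
SAMPLE_CSV = """DRG01,Flipped,OU_LOS_hrs
780,0,10.0
780,0,14.0
780,1,20.0
782,1,30.0
782,1,26.0
782,0,16.0
786,0,8.0
786,0,12.0
"""


def flip_rate_and_mean_los(rows, group_col):
    """Return {group: (flip_rate, mean_los_hours)} for the given grouping column."""
    flips = defaultdict(list)
    stays = defaultdict(list)
    for row in rows:
        key = row[group_col]
        flips[key].append(int(row["Flipped"]))
        stays[key].append(float(row["OU_LOS_hrs"]))
    return {
        key: (sum(flips[key]) / len(flips[key]), sum(stays[key]) / len(stays[key]))
        for key in flips
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_drg = flip_rate_and_mean_los(rows, "DRG01")
for drg, (rate, los) in sorted(by_drg.items()):
    print(f"DRG {drg}: flip rate {rate:.2f}, mean LOS {los:.1f} h")
```

To run this against the real file, the rows could instead be read from `open("OUData.csv", newline="")`, assuming the file uses these column names and coding. Groups that rank high on both numbers are the candidates for going directly to the wards.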

1. Use appropriate visualizations to explore how a patient’s Age,
Gender, and primary complaint (DRG01) affect the likelihood that
the patient will flip and his/her length of stay.
2. Consider combinations of variables: Age-Gender, DRG-Gender and
Gender-PrimaryInsuranceCategory. What observations can you
make about the profiles of patients that are more likely to flip and
more likely to stay long in the OU?
3. Let’s focus only on two DRGs (780 and 782) and only two insurance
types (“MEDICARE” and “MEDICARE OTHER”). Cross-tabulate the
number of patients from each DRG and insurance type that flipped.
Based on the table, patients with which of the two DRGs (780 or
782) are a better fit for the OU? Why?
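The cross-tabulation in question 3 can be sketched as a count of flipped patients per (DRG, insurance type) cell. The records below are made up, the "PRIVATE" category is hypothetical and only shows that other insurance types are filtered out, and Flipped is assumed to be coded 0/1.

```python
from collections import Counter

# Made-up records: (DRG01, PrimaryInsuranceCategory, Flipped).
records = [
    (780, "MEDICARE", 0), (780, "MEDICARE", 1), (780, "MEDICARE OTHER", 0),
    (782, "MEDICARE", 1), (782, "MEDICARE OTHER", 1), (782, "MEDICARE OTHER", 0),
    (786, "MEDICARE", 1), (780, "PRIVATE", 1),
]

DRGS = (780, 782)
INSURERS = ("MEDICARE", "MEDICARE OTHER")


def crosstab_flipped(rows):
    """Count flipped patients per (DRG, insurance) cell, restricted to DRGS x INSURERS."""
    counts = Counter()
    for drg, insurer, flipped in rows:
        if drg in DRGS and insurer in INSURERS and flipped == 1:
            counts[(drg, insurer)] += 1
    return counts


table = crosstab_flipped(records)
print("DRG", *INSURERS, sep="\t")
for drg in DRGS:
    print(drg, *(table[(drg, insurer)] for insurer in INSURERS), sep="\t")
```

Raw counts of flipped patients are easier to interpret next to the total number of patients in each cell, since a DRG with more admissions will tend to show more flips even at the same flip rate.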
DRG Code   Description
276        Dehydration
428        Congestive Heart Failure
486        Pneumonia
558        Colitis
577        Pancreatitis
578        GI Bleeding
599        Urinary Tract Infection
780        Syncope
782        Edema
786        Chest Pain
787        Nausea
789        Abdominal Pain
