ANALYTICS IN HEALTHCARE

It was a Monday morning, and the newly appointed business analyst had just arrived.
He had been recruited by Medcart Solutions Pvt. Ltd. and, after completing the
joining formalities, he sat down, eagerly looking forward to his first real-world
assignment and the demands such a project would bring. Just then, a much-awaited
notification appeared in his mailbox: an email from his new project leader asking
him to come and meet him.

It was his first stint in analytics, and he was looking forward to a challenging
project that would let him apply all the data-crunching skills he had acquired to a
real-life scenario.

He entered the cabin, and his mentor cleared his throat and said: “Welcome to
Medcart Solutions Pvt. Ltd. You must be excited to know what you are here for.
Ours is a first-of-its-kind company that specializes in healthcare analytics,
drawing insights that are useful to the medical sciences. We have been assigned a
very prestigious project: to examine the factors that contribute to the occurrence
of diabetes in people, and more particularly in women. Secondary research suggests
that BMI, blood pressure, cholesterol and glucose levels are important factors that
cause diabetes. In women, pregnancy seems to be an additional factor. We need to
validate these hypotheses, identify additional risk factors and build tools that
can predict the occurrence of diabetes. I have mailed you the relevant dataset.
Please go through it; you have about 2 weeks to complete the project. I think you
get the idea! So, fire away!”

The budding data analyst was excited but also a bit apprehensive: healthcare is a
critical sector, and deriving constructive insights from its data calls for careful
analysis.

Diabetes mellitus (DM) is a group of metabolic disorders in which blood sugar levels
remain higher than normal for a prolonged period. Diabetes is caused either by
insufficient production of insulin in the body or by an improper response of the
body’s cells to insulin. There are two main types: Type 1 DM, or insulin-dependent
diabetes mellitus, and Type 2 DM, or non-insulin-dependent diabetes mellitus.
Gestational diabetes is a third type, in which women not previously suffering from
DM develop high blood sugar levels during pregnancy. In the United States,
30.3 million Americans were recorded as suffering from diabetes, with 1.5 million
being diagnosed every year. The total cost of diagnosed diabetes in the US in 2017
was $327 billion.

Without wasting much time, the analyst opened his laptop and started with the
dataset his project mentor had given him: the PIMA diabetes dataset. It contains
information about PIMA Indian females living near Phoenix, Arizona, a population
that has been under continuous study since 1965 because of its high incidence of
diabetes. The dataset was originally published by the National Institute of
Diabetes and Digestive and Kidney Diseases and consists of diagnostic measurements
for females older than 20. It covers 768 females, of whom 268 were diagnosed with
diabetes.
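
The two counts above already say something about how any later accuracy figure should be read. The short Python sketch below works out the share of diagnosed cases and the accuracy of a trivial rule that always predicts the more common class. It uses only the 768 and 268 stated in the case; the variable names are illustrative.

```python
# Counts stated in the case: 768 records, 268 of them diagnosed (Outcome = 1).
total = 768
diabetic = 268
non_diabetic = total - diabetic

# Share of records with a diabetes diagnosis.
prevalence = diabetic / total

# Accuracy of a trivial rule that always predicts the majority class.
baseline_accuracy = max(diabetic, non_diabetic) / total

print(f"non-diabetic records: {non_diabetic}")
print(f"share diagnosed: {prevalence:.1%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

Run as written, this prints 500 non-diabetic records, a diagnosed share of 34.9% and a baseline accuracy of 65.1%. A predictive tool is only informative to the extent that it beats that baseline.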

The response variable in the dataset is a binary variable, Outcome, that indicates
whether or not the person was diagnosed with diabetes. The new recruit was known as
an academic high achiever, was pursuing a programme in big data analytics at a
prestigious institute, and was well versed in data analysis.
As expected, he wasted no time and went on to examine the PIMA diabetes dataset
closely; a sample is shown in Exhibit 1:

Exhibit 1: PIMA Diabetes dataset


Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI   Diabetes_Pedigree_Function  Age  Outcome
6            148      72             35             0        33.6  0.627                       50   1
1            85       66             29             0        26.6  0.351                       31   0
8            183      64             0              0        23.3  0.672                       32   1
1            89       66             23             94       28.1  0.167                       21   0
0            137      40             35             168      43.1  2.288                       33   1
5            116      74             0              0        25.6  0.201                       30   0
3            78       50             32             88       31    0.248                       26   1
10           115      0              0              0        35.3  0.134                       29   0
2            197      70             45             543      30.5  0.158                       53   1
8            125      96             0              0        0     0.232                       54   1
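
A first look at these ten rows can be sketched in plain Python, as below. The rows are copied from Exhibit 1. The list of "suspect" columns is an assumption that the case itself does not state: a zero for glucose, blood pressure, skin thickness, insulin or BMI is physiologically implausible and is often read as a missing measurement. Ten rows are far too few to support conclusions, so this only illustrates the kind of check the analyst might start with.

```python
from statistics import mean

COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "Diabetes_Pedigree_Function", "Age", "Outcome"]

# The ten sample rows shown in Exhibit 1.
ROWS = [
    (6, 148, 72, 35, 0, 33.6, 0.627, 50, 1),
    (1, 85, 66, 29, 0, 26.6, 0.351, 31, 0),
    (8, 183, 64, 0, 0, 23.3, 0.672, 32, 1),
    (1, 89, 66, 23, 94, 28.1, 0.167, 21, 0),
    (0, 137, 40, 35, 168, 43.1, 2.288, 33, 1),
    (5, 116, 74, 0, 0, 25.6, 0.201, 30, 0),
    (3, 78, 50, 32, 88, 31, 0.248, 26, 1),
    (10, 115, 0, 0, 0, 35.3, 0.134, 29, 0),
    (2, 197, 70, 45, 543, 30.5, 0.158, 53, 1),
    (8, 125, 96, 0, 0, 0, 0.232, 54, 1),
]

records = [dict(zip(COLUMNS, row)) for row in ROWS]

# Assumption (not stated in the case): a zero in these columns is
# treated as a possibly missing measurement rather than a real reading.
SUSPECT_COLUMNS = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
zero_counts = {
    col: sum(1 for r in records if r[col] == 0) for col in SUSPECT_COLUMNS
}

# Average glucose level within each Outcome group (0 = no diabetes, 1 = diabetes).
mean_glucose = {
    outcome: mean(r["Glucose"] for r in records if r["Outcome"] == outcome)
    for outcome in (0, 1)
}

print("zeros per column:", zero_counts)
print("mean glucose by outcome:", mean_glucose)
```

In this sample, Insulin is zero in six of the ten rows and SkinThickness in four, and the mean glucose level is higher in the Outcome = 1 group (about 144.7) than in the Outcome = 0 group (101.25).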

The data showed, for each patient, the number of pregnancies, glucose level, blood
pressure, skin thickness, insulin level, Body Mass Index (BMI), diabetes pedigree
function (a measure of family history of diabetes), age, and the outcome (1/0)
indicating whether or not diabetes occurred. The data prompted the analyst to dig
in and uncover interesting patterns for the healthcare client, and he came up with
the initial questions listed below.

Hopefully, data analytics comes to the rescue of the newly recruited employee, who
wants to make it big in this field. This could be the first real-world analytics
project he takes up!

Case Questions:
Q1. Which technique and which algorithm are applicable in this scenario?
Q2. What are the different algorithms for examining the factors that cause diabetes
and for predicting its occurrence?
Q3. Which algorithm provides the best accuracy?
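
Since Q3 turns on accuracy, it may help to see how that figure is computed for any candidate rule. The sketch below is not a suggested answer to the case: it scores a deliberately simple rule (predict diabetes when glucose is at or above a cut-off) on the Glucose and Outcome values from Exhibit 1. The cut-off of 120 and the function names are hypothetical, chosen only for illustration, and a fair comparison of algorithms would use the full dataset with held-out test data.

```python
# (Glucose, Outcome) pairs taken from the ten sample rows in Exhibit 1.
SAMPLE = [
    (148, 1), (85, 0), (183, 1), (89, 0), (137, 1),
    (116, 0), (78, 1), (115, 0), (197, 1), (125, 1),
]

# Hypothetical cut-off, chosen only to illustrate the calculation.
THRESHOLD = 120


def predict(glucose, threshold=THRESHOLD):
    """Predict 1 (diabetes) when glucose is at or above the threshold."""
    return 1 if glucose >= threshold else 0


def confusion_counts(sample, threshold=THRESHOLD):
    """Count true/false positives and negatives for the threshold rule."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for glucose, actual in sample:
        predicted = predict(glucose, threshold)
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Accuracy = correctly classified records / all records."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


counts = confusion_counts(SAMPLE)
print(counts)
print(f"accuracy: {accuracy(counts):.0%}")
```

On these ten rows the rule gets nine right and misses one diabetic patient with a glucose reading of 78. The same `accuracy` calculation applies unchanged to the predictions of any other classifier, which is what makes the comparison asked for in Q3 possible.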