
BFM4633 Data Analytics

SEMESTER II 2020/2021

Assignment 1
Section 01P


Data Collection

According to the World Health Organization (WHO), stroke is the second leading cause of
death globally, responsible for approximately 11% of total deaths.

This dataset is used to predict whether a patient is likely to have a stroke based on input
parameters such as gender, age, various diseases, and smoking status. Each row in the data
provides relevant information about one patient.

1. id: unique identifier
2. gender: "Male", "Female" or "Other"
3. age: age of the patient
4. hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension
5. heart_disease: 0 if the patient doesn't have any heart disease, 1 if the patient has a heart disease
6. ever_married: "No" or "Yes"
7. work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
8. Residence_type: "Rural" or "Urban"
9. avg_glucose_level: average glucose level in blood
10. bmi: body mass index
11. smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*
12. stroke: 1 if the patient had a stroke or 0 if not
*Note: "Unknown" in smoking_status means that the information is unavailable for this patient
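
The attribute list above can be turned into a simple record check before any analysis. The following is a minimal sketch, not part of the original assignment: the function name validate_row and the sample record are hypothetical, and only a subset of the columns is checked.

```python
# Allowed values for some categorical columns, copied from the attribute list.
ALLOWED = {
    "gender": {"Male", "Female", "Other"},
    "ever_married": {"No", "Yes"},
    "Residence_type": {"Rural", "Urban"},
    "smoking_status": {"formerly smoked", "never smoked", "smokes", "Unknown"},
}

# Columns that the attribute list describes as 0/1 flags.
BINARY = ("hypertension", "heart_disease", "stroke")


def validate_row(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for col, allowed in ALLOWED.items():
        if row.get(col) not in allowed:
            problems.append(f"{col}: unexpected value {row.get(col)!r}")
    for col in BINARY:
        if str(row.get(col)) not in {"0", "1"}:
            problems.append(f"{col}: expected 0 or 1, got {row.get(col)!r}")
    return problems


# Hypothetical record for illustration only; it is not taken from the dataset.
SAMPLE_ROW = {
    "id": "1", "gender": "Female", "age": "50",
    "hypertension": "0", "heart_disease": "0", "ever_married": "Yes",
    "Residence_type": "Urban", "smoking_status": "Unknown", "stroke": "0",
}

print(validate_row(SAMPLE_ROW))
```

Running such a check on every row first would show whether a category label or flag differs from the description before the counts are computed.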

Data Interpretation

We evaluated the data collected from the data source to produce summary values
automatically. For example, the output from the data source shows that the data is
categorized by gender into male and female. Patient health conditions such as average
blood glucose level, body mass index, and smoking status are then used to predict the
presence of stroke in a patient.

Based on the data output, 5110 patients are involved. The age used in the prediction
ranges from 0.08 to 82, with an average of 43.226. There are 498 patients with
hypertension and 4612 without. Of those without hypertension, 2718 are female, 1893 are
male and 1 is other. Female patients outnumber male patients, 2994 to 2115. There are
2634 female and 1765 male patients without heart disease, and 360 female and 350 male
patients with heart disease.

In the work data, there are 685 female children, 624 females who work in government,
22 females who are unemployed, 2776 females who work in private companies and 754
females who are self-employed. There are also 2 male children, 33 males who work in
government, 149 males who work in private companies and 65 males who are self-employed.
In the data, 2514 people live in rural residences and 2596 in urban residences. The
average glucose level ranges from 55.12 to 271.74, with a mean of 106.115. For smoking
status, there are 885 former smokers, 1892 who never smoked, 789 who smoke and 1544
whose status is unknown. Finally, 4861 people did not have a stroke and 249 did.
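
Summary figures of this kind (range, average, group counts and counts by gender) can be reproduced with a short script. The sketch below uses only the Python standard library and a tiny made-up sample, so its numbers are not the assignment's results; the name summarize and the rows in SAMPLE_CSV are assumptions for illustration.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Made-up rows for illustration only; these are NOT rows from the real dataset.
SAMPLE_CSV = """gender,age,hypertension,stroke
Female,61,0,1
Male,80,1,1
Female,3,0,0
Male,45,0,0
Female,30,1,0
"""


def summarize(rows):
    """Compute the kind of descriptive figures reported in the text."""
    ages = [float(r["age"]) for r in rows]
    return {
        "n": len(rows),
        "age_min": min(ages),
        "age_max": max(ages),
        "age_mean": mean(ages),
        # Number of patients per gender.
        "gender_counts": Counter(r["gender"] for r in rows),
        # Cross-tabulation: (gender, stroke flag) -> number of patients.
        "stroke_by_gender": Counter((r["gender"], r["stroke"]) for r in rows),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
for key, value in summary.items():
    print(key, value)
```

Keeping the plain gender counts separate from the (gender, stroke) cross-tabulation makes it easier to see which figure describes the whole sample and which describes only patients with a stroke.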
In conclusion, for stroke prediction, age and gender have no effect at all. Stroke is
not caused only by smoking; it can also be caused by body fat.
