Cocu Mentation
Dataset
Title:
Haberman's Survival Data Set
Information:
The dataset contains cases from a study conducted between 1958 and 1970 at the
University of Chicago's Billings Hospital on the survival of patients who had undergone surgery
for breast cancer. Class 1 = the patient survived 5 years or longer; class 2 = the patient died
within 5 years.
Attribute Information:
1. Age of patient at time of operation (numerical)
Dataset link:
https://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival
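As a minimal sketch of how the data could be read, the snippet below parses comma-separated rows into records and counts the two classes. The column order (age, operation year, positive axillary nodes, survival status) and the column names are assumptions based on the UCI description, and the sample rows are illustrative only; in practice the text would come from the downloaded data file.

```python
import csv
import io
from collections import Counter

# Illustrative rows only, in the assumed column order:
# age, operation year (two digits), positive axillary nodes, survival status.
SAMPLE = """30,64,1,1
34,59,0,2
45,66,0,2
52,61,0,1
"""

# Hypothetical column names chosen for this sketch.
COLUMNS = ("age", "operation_year", "axillary_nodes", "status")


def load_rows(text):
    """Parse comma-separated text into a list of dicts with integer values."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, map(int, row))) for row in reader if row]


rows = load_rows(SAMPLE)

# Status 1 = survived 5 years or longer, status 2 = died within 5 years.
class_counts = Counter(r["status"] for r in rows)
print(class_counts)
```

Checking the class counts first is useful because an imbalance between the two classes affects how the later plots should be read.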
PDF Analytics:
According to the previous histogram, most of the patients who died were between 40 and 60
years old. Patients younger than 40 were more likely to survive
(this region of the plot is entirely blue and does not overlap with the orange
region).
From the above PDFs (univariate analysis), neither Age nor Operation_Year is a good feature for
useful insights, because the distributions are very similar for patients who survived and patients who died.
From the year distribution, we can observe that the number of patients who did not survive rises and falls
sharply between 1958 and 1960. More patients did not survive among those operated on in 1965 than in other years.
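The comparison described above can be sketched without a plotting library by binning ages per class and normalizing the counts, which is roughly what a per-class histogram or PDF shows. The (age, status) pairs, the bin width of 10 years, and the function names are all assumptions made for illustration, not values from the dataset.

```python
from collections import Counter, defaultdict

# Made-up (age, status) pairs for illustration; status 1 = survived, 2 = died.
SAMPLE = [
    (31, 1), (35, 1), (38, 1), (44, 1), (55, 1), (63, 1),
    (47, 2), (52, 2), (58, 2), (66, 2),
]


def binned_counts(pairs, bin_width=10):
    """Count patients per age bin, separately for each class."""
    counts = defaultdict(Counter)
    for age, status in pairs:
        bin_start = (age // bin_width) * bin_width
        counts[status][bin_start] += 1
    return counts


def normalized(counter):
    """Turn raw bin counts into fractions that sum to 1 for one class."""
    total = sum(counter.values())
    return {b: n / total for b, n in sorted(counter.items())}


counts = binned_counts(SAMPLE)
for status in sorted(counts):
    print(status, normalized(counts[status]))
```

Bins where only one class has a non-zero fraction correspond to the non-overlapping regions of the plot; bins where both classes have similar fractions are where the feature does not help to separate them.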
Box plot
From the box plots and violin plots, we can say that most of the patients who died were aged
46-62 and had their operation between 1959 and 1965, while most of the patients who survived were aged 42-60
and had their operation between 1960 and 1966.
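If the ranges above are read off the boxes, they correspond to the interquartile range (25th to 75th percentile) of each class. The sketch below computes those quartiles with the standard library; the ages are made-up values chosen only to illustrate the calculation, and the dictionary layout and function name are assumptions.

```python
import statistics

# Made-up ages per class for illustration; status 1 = survived, 2 = died.
AGES_BY_STATUS = {
    1: [34, 42, 52, 60, 70],
    2: [40, 46, 53, 62, 75],
}


def quartiles(values):
    """Return (Q1, median, Q3) using the inclusive quantile method."""
    q1, q2, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return q1, q2, q3


# The box of a box plot spans Q1 to Q3; the line inside it is the median.
iqr = {}
for status, ages in AGES_BY_STATUS.items():
    q1, median, q3 = quartiles(ages)
    iqr[status] = (q1, q3)
    print(f"status {status}: Q1={q1}, median={median}, Q3={q3}")
```

When the two boxes overlap as heavily as the ranges quoted above, the feature alone gives little separation between the classes, which is consistent with the univariate observations.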
Resources
https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/
https://github.com/bethusaisampath/Haberman-Cancer-Survival-Dataset/blob/master/Haberman.ipynb
https://www.youtube.com/watch?v=U4vHP7KXt2Y&list=PLs7xKYqehofX6EIYD7WG6Lq1ZMM2cs6Es
https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60
https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_Data_Visualization_Iris_Dataset_Blog.ipynb