Group Project:
Breast Cancer Classification
Group4 – CityU7D
Tạ Thị Phương Anh
Nguyễn Đức Anh
Đoàn Lê Thiện Hảo
Hà Văn Nguyên
Nguyễn Việt Tùng
Table of contents
We use the dataset to evaluate how well each model performs and to select the best one.
This is a classification task: the model predicts a breast cancer diagnosis. The two classes are
benign and malignant.
2. Dataset Description section
The mean, standard error and "worst" or largest (mean of the three largest values) of these
features were computed for each image, resulting in 30 features. For instance, field 3 is Mean
Radius, field 13 is Radius SE, field 23 is Worst Radius.
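The field numbering above can be reproduced with a short sketch. It assumes that fields 1 and 2 hold the ID and the diagnosis, and that the ten base measurements are stored in the order listed in the code; the helper name field_names is hypothetical and only for illustration.

```python
# Ten base measurements of each cell nucleus (assumed order).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
# Three summary statistics are stored per measurement, in this order.
STATISTICS = ["mean", "SE", "worst"]


def field_names():
    """Return a dict mapping 1-based field number to column name.

    Assumes fields 1 and 2 are the ID and the diagnosis, so the
    30 features occupy fields 3 to 32.
    """
    names = {1: "ID", 2: "diagnosis"}
    field = 3
    for stat in STATISTICS:
        for feature in BASE_FEATURES:
            names[field] = f"{stat} {feature}"
            field += 1
    return names


fields = field_names()
print(fields[3], "|", fields[13], "|", fields[23])
```

Ten measurements times three statistics gives the 30 features, which is why fields that are ten apart describe the same measurement.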
The first attribute, ID, is a numeric identifier, so we remove it with the drop command. The
rest of the dataset is unaffected by dropping this column, and the models are no longer
disturbed by the ID number. For columns whose value ranges differ widely from the others, we
separate them out and rescale them using the standard deviation method.
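A minimal sketch of these two preprocessing steps, using only the Python standard library. It assumes that the "standard deviation method" means standardization (subtract the column mean, divide by the column standard deviation); the rows, column names, and helper functions are hypothetical and made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical rows; the values are made up for illustration only.
rows = [
    {"id": 101, "mean radius": 12.0, "mean area": 450.0},
    {"id": 102, "mean radius": 14.0, "mean area": 600.0},
    {"id": 103, "mean radius": 19.0, "mean area": 1100.0},
]


def drop_column(rows, name):
    """Return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def standardize(rows, columns):
    """Rescale each listed column to zero mean and unit standard deviation."""
    scaled = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        for row in scaled:
            # Guard against constant columns, where sigma is 0.
            row[col] = (row[col] - mu) / sigma if sigma else 0.0
    return scaled


features = drop_column(rows, "id")
scaled = standardize(features, ["mean radius", "mean area"])
print(scaled[0])
```

With pandas and scikit-learn, the same steps would typically be written with DataFrame.drop(columns=[...]) and StandardScaler; the plain-Python version is used here only to keep the sketch self-contained.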
Every sample in this dataset has a label, so this is a supervised learning problem. The
dataset has no missing values and no noisy data.
In the United States, breast cancer is the second leading cause of cancer death in women,
after lung cancer, although the death rate is showing signs of decreasing.
3. Methodology & Algorithm section
Logistic Regression, Decision Tree, Random Forest, and XGBoost were applied.
Logistic Regression
2 samples are misclassified: actual class is 1 (Malignant), predicted class is 0 (Benign).
1 sample is misclassified: actual class is 0 (Benign), predicted class is 1 (Malignant).
Decision Tree
4 samples are misclassified: actual class is 1 (Malignant), predicted class is 0 (Benign).
4 samples are misclassified: actual class is 0 (Benign), predicted class is 1 (Malignant).
Random Forest
4 samples are misclassified: actual class is 1 (Malignant), predicted class is 0 (Benign).
XGBoost
3 samples are misclassified: actual class is 1 (Malignant), predicted class is 0 (Benign).
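The two kinds of error reported for each model can be counted as in the sketch below. It is an illustration under stated assumptions: the label lists are hypothetical (not the project's test set), and count_errors is a made-up helper name.

```python
def count_errors(y_true, y_pred):
    """Count the two kinds of misclassification.

    Labels follow the slides: 1 = Malignant, 0 = Benign.
    """
    # Malignant samples that the model predicted as Benign.
    false_negatives = sum(
        1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0
    )
    # Benign samples that the model predicted as Malignant.
    false_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1
    )
    return false_negatives, false_positives


# Hypothetical labels, made up for illustration only.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]

fn, fp = count_errors(y_true, y_pred)
print(f"Malignant predicted as Benign: {fn}")
print(f"Benign predicted as Malignant: {fp}")
```

In a diagnosis setting, a malignant sample predicted as benign is usually the more costly mistake, so it is worth reporting the two counts separately rather than as one error total.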
Thanks!
Any questions?