
Amaan Ahmad (2020-310-021)

Breast Cancer prediction

Problem Statement

To develop an accurate machine learning model for early breast cancer
prediction using clinical, demographic, and imaging data. The primary objective
is to classify breast cancer cases as benign or malignant, facilitating early
detection and improving patient outcomes.

TUMOR

Tumors are of two basic types:

1) Benign tumor:

• Non-cancerous
• Slow growing
• Cells look normal

2) Malignant tumor:

• Cancerous
• Fast growing
• Cells have large, dark nuclei and may have an abnormal shape

Dataset

(taken from Kaggle)

The data are obtained from a biopsy procedure called fine needle aspiration
(FNA), in which a thin needle is inserted into an area of abnormal-appearing
tissue or body fluid. The sample collected from the procedure can help make a
diagnosis or rule out conditions such as cancer.

Sample of data

id        diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean
842302    M          17.99        10.38         122.8           1001       0.1184
842517    B          20.57        17.77         132.9           1326       0.08474
84300903  M          19.69        21.25         130             1203       0.1096
84348301  M          11.42        20.38         77.58           386.1      0.1425
84358402  M          20.29        14.34         135.1           1297       0.1003
843786    B          12.45        15.7          82.57           477.1      0.1278
844359    M          18.25        19.98         119.6           1040       0.09463
84458202  M          13.71        20.83         90.2            577.9      0.1189
844981    M          13           21.82         87.5            519.8      0.1273
84501001  B          12.46        24.04         83.97           475.9      0.1186
845636    M          16.02        23.24         102.7           797.8      0.08206
84610002  B          15.78        17.89         103.6           781        0.0971
846226    M          19.17        24.8          132.4           1123       0.0974
846381    B          15.85        23.95         103.7           782.7      0.08401
84667401  M          13.73        22.61         93.6            578.3      0.1131
84799002  M          14.54        27.54         96.73           658.8      0.1139
848406    M          14.68        20.13         94.74           684.5      0.09867
84862001  B          16.13        20.68         108.1           798.8      0.117
849014    M          19.81        22.15         130             1260       0.09831

Steps

1) Data collection
The dataset is a collection of features extracted from images of breast tissue samples
obtained by fine needle aspiration (FNA). It has 569 instances and 32 attributes,
including the ID number and the diagnosis of each sample. The diagnosis is either
malignant (M) or benign (B), and it is the target variable for classification. The
other 30 attributes are numerical features that describe the size, shape, and texture
of the cell nuclei in the images. For each of 10 measurements (radius, texture,
perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and
fractal dimension), the mean, standard error, and worst value are recorded, giving
10 x 3 = 30 features. The dataset is intended to help in the diagnosis of breast
cancer based on the FNA images. Its class distribution is 357 benign and 212
malignant samples.
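
The attribute counts above can be checked with a few lines of Python. This is only a sketch: the `<measurement>_<statistic>` naming scheme and the variable names are assumptions made for illustration, not necessarily the exact column names in the downloaded file.

```python
# The 10 per-nucleus measurements described for this dataset.
MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Each measurement is summarised in three ways per image.
STATISTICS = ["mean", "se", "worst"]

# Hypothetical naming scheme: <measurement>_<statistic>.
feature_names = [f"{m}_{s}" for s in STATISTICS for m in MEASUREMENTS]
all_columns = ["id", "diagnosis"] + feature_names

n_benign, n_malignant = 357, 212
n_samples = n_benign + n_malignant

print(len(feature_names))  # 30
print(len(all_columns))    # 32
print(n_samples)           # 569
```

The classes are moderately imbalanced (212 of 569 samples are malignant), which is worth keeping in mind when the data is split and when metrics are chosen.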

2) Data Preparation
Data preparation is a critical step in building a predictive model for breast cancer. The
raw dataset may contain null values or special characters that cannot be used as they
are. Preparing the data therefore means following a systematic process of cleaning and
formatting it before any model is trained.
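
A minimal sketch of such a cleaning step, using only the Python standard library, is shown here. The four rows follow the layout of the sample table, but two of them are deliberately corrupted for illustration (an empty field and a "?"). The function name `clean_rows`, the 1/0 label encoding, and the choice to drop bad rows rather than fill them in are all assumptions, not the author's stated method.

```python
import csv
import io

# Rows in the layout of the sample table. The second and third rows are
# corrupted on purpose (hypothetical) to demonstrate the cleaning step.
RAW = """id,diagnosis,radius_mean,texture_mean
842302,M,17.99,10.38
842517,B,,17.77
84300903,M,19.69,?
843786,B,12.45,15.7
"""

LABELS = {"M": 1, "B": 0}  # assumed encoding: malignant -> 1, benign -> 0
FEATURES = ("radius_mean", "texture_mean")


def clean_rows(text):
    """Return (features, labels), skipping rows with missing or non-numeric values."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        diagnosis = (row["diagnosis"] or "").strip().upper()
        if diagnosis not in LABELS:
            continue  # unknown or missing label
        try:
            values = [float(row[name]) for name in FEATURES]
        except (TypeError, ValueError):
            continue  # empty field, "?" or other special characters
        features.append(values)   # the id column is dropped: it is not a feature
        labels.append(LABELS[diagnosis])
    return features, labels


X, y = clean_rows(RAW)
print(X)  # [[17.99, 10.38], [12.45, 15.7]]
print(y)  # [1, 0]
```

Dropping rows is the simplest policy; with many missing values, filling them in (for example with a column median) may be preferable.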

3) Train model on data

• Splitting the data:

The dataset is divided into two parts.

o Training dataset, which is used to fit the model with the chosen
algorithm.

o Test dataset, which is held out and used to evaluate the model that was
fitted on the training dataset.
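
The split can be sketched in plain Python as follows. The 80/20 ratio, the fixed seed, and the stratification by class are assumptions chosen for illustration, and `stratified_split` is a hypothetical helper. The data is a placeholder that only reproduces the class counts given in the dataset description.

```python
import random


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle and split so that each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, value in enumerate(y) if value == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)

    def pick(rows, indices):
        return [rows[i] for i in indices]

    return (pick(X, train_idx), pick(X, test_idx),
            pick(y, train_idx), pick(y, test_idx))


# Placeholder data: 357 benign (0) and 212 malignant (1) samples.
y = [0] * 357 + [1] * 212
X = [[float(i)] for i in range(len(y))]

X_train, X_test, y_train, y_test = stratified_split(X, y)
print(len(X_train), len(X_test))  # 456 113
```

Stratifying keeps the benign/malignant ratio similar in both parts, so the test score is not distorted by an unlucky shuffle.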

• Selecting an algorithm:

Choose an appropriate machine learning algorithm for binary classification.
Common algorithms include logistic regression, decision trees, random forests,
support vector machines (SVM), k-nearest neighbors (KNN), and deep learning
models such as neural networks.

The choice of algorithm may depend on the size and complexity of the dataset.
It is worth experimenting with several algorithms to determine which one
performs best on the data at hand.

• Model training:

Fit the selected machine learning model to the training data. This involves finding
the model parameters that best fit the training examples to make accurate
predictions.

The exact code to train the model will depend on the machine learning library or
framework you are using.
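
To make "finding the model parameters" concrete, here is a sketch of logistic regression trained by batch gradient descent, written without any library. It is one possible choice among the algorithms listed, not necessarily the one used in this project. The learning rate, epoch count, and function names are assumptions, and the six training points are toy, already-standardised values invented for illustration, not rows from the dataset.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(malignant) - true label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label an example 1 when its predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


# Toy, standardised feature values (hypothetical, for illustration only).
X_train = [[-1.2, -0.8], [-0.9, -1.1], [-0.5, -0.3],
           [0.6, 0.4], [1.0, 1.2], [1.3, 0.7]]
y_train = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X_train, y_train)
print(predict(X_train, w, b))
```

In practice a library implementation would normally replace this loop; the sketch only shows what "fitting" means in terms of parameter updates.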

4) Analysis/Evaluation
When performing analysis, it is essential to thoroughly assess the performance of the
model to ensure that it correctly classifies cases as benign or malignant.

Use the held-out test dataset to evaluate the model's performance. Common evaluation
metrics for binary classification tasks include accuracy, precision, recall, F1-score,
and ROC-AUC.

Make predictions on the test data and calculate these metrics to assess how well
the model is performing.
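
As an illustration, the first four of these metrics can be computed directly from the counts of true and false positives and negatives. The labels below are made-up example values, with malignant assumed to be encoded as 1, and `binary_metrics` is a hypothetical helper. ROC-AUC is left out because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up example: 4 malignant (1) and 6 benign (0) test cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
# accuracy 0.8, precision 0.75, recall 0.75, f1 0.75
```

For a screening task, recall on the malignant class deserves particular attention, since a missed malignant case is usually more costly than a false alarm.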
