Classification of Tumor Cells For The Prediction of Breast Cancer Using Machine Learning

Guided by: Mrs. M. Mahiba, M.Tech (AP/CSE)
Submitted by: J. K. Benzersen, R. Rangaswamy, V. Sundaram
Objectives
● Predict breast cancer from biopsy data
● Provide an extra layer of verification for breast cancer predictions
● Create an API to predict breast cancer in real time
● Automatically retrain the model every time new data is fed to the API
Abstract

Breast cancer is a cancer that forms in the cells of the breast.

It is a very common disease and can occur in both men and women.

The project predicts whether a tumor is malignant or benign by classifying data obtained
from a Fine Needle Aspirate (FNA) biopsy using logistic regression.

A REST API is developed and deployed in the cloud; it uses the preserved model to
predict breast cancer from the biopsy data sent by the client.
Existing system

Supervised machine learning models such as decision trees, random forests, SVMs, or
regression algorithms are used to create a model that predicts whether a tumor is malignant or
benign.

Most studies use the support vector machine (SVM) learning algorithm to create the
model because of its high prediction accuracy.

Disadvantages of Existing Method

In production, the patient is required to undergo some form of physical activity to
generate the required input for the model.

The system relies heavily on the patient's medical parameters, which are not stable.

Unstable inputs to the model can lead to false positives or false negatives.
Proposed System
● Use a machine learning classifier to predict breast cancer from the FNA
biopsy report
● Deploy a REST API in the cloud that retrains the model in real time every time
new data is provided
● Deploy a client-facing web app to display the result of classifying the
report
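The request handling behind such an API could be sketched as follows. This is a minimal illustration, assuming a JSON body with a `features` list of 30 numbers and, for new data, a `label` field. The function names, the JSON field names, and the stand-in scoring function are hypothetical and are not taken from the project; a real deployment would wrap these handlers in a web framework and load the preserved model instead.

```python
import json
import math

N_FEATURES = 30  # three summary values for each measured characteristic

# Hypothetical in-memory store of labelled reports received by the API.
training_rows = []


def stand_in_model(features):
    """Placeholder for the preserved classifier (an assumption, not the real model)."""
    score = sum(features) / len(features)
    return 1.0 / (1.0 + math.exp(-score))  # treated as probability of "malignant"


def parse_features(body):
    """Validate the JSON body and return (feature list, parsed document)."""
    data = json.loads(body)
    features = data.get("features")
    if not isinstance(features, list) or len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} numeric features")
    return [float(v) for v in features], data


def handle_predict(body, model=stand_in_model):
    """Return (status_code, json_text) for a prediction request."""
    try:
        features, _ = parse_features(body)
    except (ValueError, TypeError, AttributeError) as err:
        return 400, json.dumps({"error": str(err)})
    p = model(features)
    label = "malignant" if p >= 0.5 else "benign"
    return 200, json.dumps({"diagnosis": label, "probability": round(p, 4)})


def handle_new_data(body, retrain):
    """Store a labelled report, then call `retrain` on the full data set."""
    try:
        features, data = parse_features(body)
        label = data["label"]
    except (ValueError, TypeError, AttributeError, KeyError) as err:
        return 400, json.dumps({"error": str(err)})
    training_rows.append((features, label))
    retrain(training_rows)
    return 200, json.dumps({"stored": len(training_rows)})
```

Keeping validation in one function means the prediction and retraining endpoints reject malformed reports in the same way. Retraining on every request is the simplest reading of the proposal; batching new samples before retraining would be a reasonable alternative.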
Advantages of Proposed System
● Reports of false negatives and false positives can be prevented on a large
scale as the system can act as an external reference.
● Due to the system's self-learning nature, its results are expected to become
more reliable as it is provided with more data over time.
● The system can also be used as a quick way to analyze the result of an FNA
report without much clinical knowledge.
Block Diagram
Training: Training Data → Learning Algorithm → Computing Model
Prediction: New Data → Model → Prediction

Preprocessing
• Data preprocessing is a data mining technique that transforms raw data into
an understandable format.
• Data preprocessing is the most important phase of a machine learning project,
especially in computational biology. If the data contains much irrelevant or redundant
information, or is noisy and unreliable, knowledge discovery during the
training phase becomes more difficult.
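The slides do not state which preprocessing steps were applied. As one illustration, assuming standardization is among them, the sketch below scales each feature column to zero mean and unit variance so that features measured on very different scales (for example radius and area) contribute comparably. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant column: avoid dividing by zero
    scaled = [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]
    return scaled, means, stds


# Hypothetical values for two features (radius, area) of three samples.
raw = [[10.0, 300.0], [14.0, 600.0], [18.0, 900.0]]
scaled, means, stds = standardize(raw)
```

The returned means and standard deviations would need to be kept and reused to scale any new report before prediction.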
Feature Selection
The features considered for the training data are:
a. radius (mean of distances from center to points on the perimeter)
b. texture (standard deviation of gray-scale values)
c. perimeter
d. area
e. smoothness (local variation in radius lengths)
f. compactness (perimeter^2 / area – 1.0)
g. concavity (severity of concave portions of the contour)
h. concave points (number of concave portions of the contour)
i. symmetry
j. fractal dimension ("coastline approximation" – 1)
The mean, standard error, and "worst" or largest (mean of the three largest
values) of these features were computed for each image, resulting in 30
features.
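The three summary values described above can be sketched for a single measurement as follows. This assumes the usual definition of standard error (sample standard deviation divided by the square root of the number of values); the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return (mean, standard error, worst) for one measurement across cell nuclei."""
    avg = mean(values)
    std_error = stdev(values) / sqrt(len(values))
    worst = mean(sorted(values, reverse=True)[:3])  # mean of the three largest values
    return avg, std_error, worst


# Hypothetical radius values for the nuclei in one image.
radii = [10.0, 12.0, 14.0, 16.0, 18.0]
avg, std_error, worst = summarize(radii)
```

Applying the same function to every measured characteristic gives three values per characteristic, which together form the feature vector for one image.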
Algorithm
● Since the data set contains more observations than features, a
regression model is selected.
● The regression algorithm used to develop the model is logistic regression.
● This algorithm is selected because it is well suited to fitting a regression curve
to binary outcomes.
Logistic Regression
● The logistic model is used to model the probability of a certain class or event,
such as pass/fail, win/lose, alive/dead, or healthy/sick.
● Since the result of an FNA report can only be malignant or benign,
logistic regression is well suited to the task.
● The logistic regression model is given by:
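A standard form of the logistic model is shown here. The symbols are chosen for illustration and are not taken from the slides: $x$ is the feature vector, $w$ the weight vector, $b$ the intercept, and $y = 1$ denotes the malignant class.

```latex
P(y = 1 \mid x) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}}
```

A tumor would then be classified as malignant when this probability reaches a chosen threshold, commonly 0.5.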
Fig. Logistic regression curve
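To make the model concrete, here is a minimal from-scratch sketch of logistic regression trained by batch gradient descent. It is an illustration only: the project most likely used a library implementation, and the learning rate, epoch count, function names, and the tiny two-feature data set are all assumptions.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Return 1 (malignant) if the modelled probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical, already-scaled values of two features; 1 = malignant, 0 = benign.
X = [[-1.2, -0.9], [-0.8, -1.1], [-1.0, -0.7], [0.9, 1.1], [1.2, 0.8], [0.7, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train(X, y)
predictions = [predict(x, w, b) for x in X]
```

With real biopsy data the same functions would take 30 scaled features per sample instead of two.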
Performance of the model
Test vs Training

The normalized graph is centered on 0 and shows that the training and test data
overlap. This is a good indication that the model has done a good job of classifying the
data. The spike in the center is the training data, and the rectangular spike below
it is the testing data. It is also safe to say that the model has performed with an
accuracy above 99 percent.
Results

A classification report is used to measure the quality of predictions from
a classification algorithm.

The confusion matrix shows an accuracy of 100%; however, the model is not
100% accurate in practice, and there were a few misclassifications
during the prediction of real-time data.

Fig. Confusion matrix
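The quantities behind a confusion matrix and a classification report can be sketched as follows, treating malignant (1) as the positive class. The function names are hypothetical, and the labels are made-up values for illustration, not results from the project.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives, with 1 = malignant."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn


def report(actual, predicted):
    """Return accuracy, precision, recall and F1 for the malignant class."""
    tp, tn, fp, fn = confusion_matrix(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, not results from the project.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = report(actual, predicted)
```

In this setting recall deserves particular attention, because a false negative corresponds to a malignant tumor reported as benign.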
CONCLUSION

This work focuses on creating a better model for the classification of tumor cells
for predicting breast cancer.

The model promises an accuracy above 98%.

The reports of false negatives and false positives can be prevented on a large scale.
FUTURE ENHANCEMENT

The project is limited in scope, so a more in-depth study has to be conducted in
order to source more data and detect patterns in biopsy data over time.

In the future, an image processing approach will be taken to process the raw digital image of
the FNA and detect cancer without decoding the image.

Thank you
