● To automatically retrain the model every time new data is fed to the API
Abstract
Breast cancer is a cancer that forms in the cells of the breasts.
It is a very common disease nowadays and can occur in both men and women.
The project predicts the nature of tumor cells by classifying data obtained
from Fine Needle Aspirate (FNA) biopsies with logistic regression, to determine whether
a tumor is malignant or benign.
A REST API is developed and deployed in the cloud; it uses the persisted (saved) model to
predict breast cancer from the biopsy data sent by the client.
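The slides do not give the API's framework, route or payload format. The sketch below shows, in framework-agnostic form, the core of a hypothetical `POST /predict` endpoint. It assumes the request body is JSON carrying a `features` list and that the model was persisted as JSON weights and a bias. The three toy weights stand in for the real 30-feature model, and the HTTP server wiring is left out.

```python
import json
import math

# Hypothetical persisted model. In a deployment this string would be read from
# storage after training; the three toy weights stand in for the 30 real ones.
PERSISTED_MODEL = json.dumps({"weights": [0.8, -0.5, 1.2], "bias": -0.3})


def sigmoid(z: float) -> float:
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def load_model(serialized: str) -> tuple[list[float], float]:
    """Deserialize the saved weights and bias."""
    data = json.loads(serialized)
    return data["weights"], data["bias"]


def handle_predict(request_body: str, serialized_model: str = PERSISTED_MODEL) -> str:
    """Body of a hypothetical POST /predict handler: JSON features in, JSON diagnosis out."""
    weights, bias = load_model(serialized_model)
    features = json.loads(request_body)["features"]
    if len(features) != len(weights):
        return json.dumps({"error": f"expected {len(weights)} features, got {len(features)}"})
    probability = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
    # Assumption: class 1 (probability >= 0.5) is reported as malignant.
    diagnosis = "Malignant" if probability >= 0.5 else "Benign"
    return json.dumps({"diagnosis": diagnosis, "probability": round(probability, 4)})


print(handle_predict('{"features": [1.0, 0.0, 1.0]}'))
```

Keeping the handler a pure function of the request body makes it easy to test, and it can then be attached to whichever web framework the deployment uses.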
Existing system
Supervised machine learning models such as decision trees, random forests, SVMs or
regression algorithms are used to create a model that predicts whether a tumor is malignant or
benign.
Most studies use the SVM (Support Vector Machine) learning algorithm to create the
In production, the patient is required to undergo some form of physical activity to
The system relies heavily on the patient's medical parameters, which are not stable.
Unstable inputs to the model can lead to false positives or false negatives.
Proposed System
● Use a machine learning classifier to predict breast cancer from the FNA
biopsy report
● Deploy a REST API on the cloud that retrains the model in real time every time
new data is provided
● Deploy a client-facing web app to display the classification result for the
report.
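The slides say the model is retrained whenever new data arrives, but not how. For logistic regression, one common approach is online (stochastic) gradient descent. Instead of refitting from scratch, it adjusts the existing weights slightly with each newly labelled report. The class below is a hypothetical sketch of that idea. The learning rate and zero initialization are illustrative choices, not values from the project.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Hypothetical model updated with one gradient step per new labelled sample."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features  # illustrative zero initialization
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x: list[float]) -> float:
        """Probability of class 1 for one feature vector."""
        return sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))

    def update(self, x: list[float], y: int) -> None:
        """Apply one log-loss gradient step for a single sample with label y (0 or 1)."""
        error = self.predict_proba(x) - y  # d(log-loss)/dz
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=1)
for _ in range(200):  # simulate a stream of newly submitted, labelled reports
    model.update([1.0], 1)
    model.update([-1.0], 0)
print(model.predict_proba([1.0]), model.predict_proba([-1.0]))
```

Each update costs time proportional to the number of features, so retraining on every API call stays cheap. The trade-off is that a single noisy label shifts the model slightly.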
Advantages of Proposed System
● Reports of false negatives and false positives can be prevented on a large
scale, as the system can act as an external reference.
● Due to the system's self-learning nature, it will produce increasingly
reliable results as it is provided with more data over time.
● The system can also be used as a quick way to analyze the result of an FNA
report without much clinical knowledge.
Block Diagram
● Data preprocessing is a data mining technique that transforms raw data into
an understandable format.
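For each image, the mean, standard error and “worst” or largest value (mean of the three
largest values) were computed for each of the ten cell-nucleus features (radius, texture,
perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal
dimension), resulting in 30 features.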
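The per-image summaries can be illustrated with a short sketch. It assumes each image provides a list of measurements per nucleus feature. `FEATURE_NAMES` and `build_feature_vector` are hypothetical names. The standard error is taken as the sample standard deviation divided by √n, which is an assumption because the text does not define it.

```python
import math
import statistics

# The ten per-nucleus measurements summarized for each image (10 x 3 = 30 features).
FEATURE_NAMES = (
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry", "fractal_dimension",
)


def summarize_feature(values: list[float]) -> tuple[float, float, float]:
    """Return (mean, standard error, "worst") for one feature measured on many nuclei."""
    if len(values) < 3:
        raise ValueError("need at least three nuclei to compute the 'worst' value")
    mean = statistics.fmean(values)
    # Assumed definition: sample standard deviation divided by sqrt(n).
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    # "Worst": mean of the three largest values.
    worst = statistics.fmean(sorted(values, reverse=True)[:3])
    return mean, std_error, worst


def build_feature_vector(nuclei: dict[str, list[float]]) -> list[float]:
    """Hypothetical helper: 10 means, then 10 standard errors, then 10 "worst" values."""
    summaries = [summarize_feature(nuclei[name]) for name in FEATURE_NAMES]
    means = [s[0] for s in summaries]
    errors = [s[1] for s in summaries]
    worsts = [s[2] for s in summaries]
    return means + errors + worsts
```

Iterating over a fixed `FEATURE_NAMES` order keeps every image's 30-element vector aligned column by column, which the classifier relies on.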
Algorithm
● Since the data set contains more observations than features, a
regression model is selected.
● The regression algorithm used for developing the model is logistic regression.
● This algorithm was selected for its efficiency in fitting a regression curve
to binary outcome data.
Logistic Regression
● The logistic model is used to model the probability of a certain class or event
existing such as pass/fail, win/lose, alive/dead or healthy/sick.
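● Since the result of an FNA report can be either malignant or benign,
logistic regression is better suited.
● The logistic regression model is given by
$$p(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}}$$
where $x_1, \dots, x_n$ are the input features and $\beta_0, \dots, \beta_n$ are the learned coefficients.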
Logistic regression curve
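The slides do not state which solver was used to fit the model. As one hedged illustration, the sketch below fits the logistic model by batch gradient descent on the mean log-loss. It is written in plain Python so that it runs without dependencies. The learning rate, epoch count and 0.5 threshold are assumptions. In practice a library implementation would normally be used instead.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, learning_rate=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # For log-loss, d(loss)/dz = predicted probability - true label.
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return label 1 when the predicted probability reaches the threshold, else 0."""
    return [int(sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) >= threshold) for xi in X]


X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic_regression(X, y)
print(predict(X, weights, bias))
```

The S-shaped curve comes from `sigmoid`. Training only moves the point where the curve crosses 0.5 and changes how steeply it rises.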
Performance of the model
Test vs Training
The graph, normalized along 0, shows that the training and test data overlap. This is a good
indication that the model classifies the data well. The spike in the center is the training data,
and the rectangular spike in the center below it is the testing data. The model appears to
perform with an accuracy above 99 percent.
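Comparing the two distributions visually is an informal check. A more direct check is to hold out a test set and compare accuracy on the training and test data. A small gap between them suggests that the model is not overfitting. The helpers below are a minimal sketch of that comparison. The 25% test fraction and the fixed seed are assumptions, not the project's settings.

```python
import random


def train_test_split(X, y, test_fraction=0.25, seed=42):
    """Shuffle indices with a fixed seed and hold out a fraction of samples for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With any fitted model, compare `accuracy(y_train, train_predictions)` against `accuracy(y_test, test_predictions)`. The test figure is the one that supports a claim about performance on unseen reports.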
Results
A classification report is used to measure the quality of predictions from
a classification algorithm.
Although the confusion matrix indicates an accuracy of nearly 100%, the model is
not 100% accurate: there were a few misclassifications.
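The figures in a classification report follow directly from the confusion matrix. The sketch below computes them for one class, assuming labels 0/1 with 1 meaning malignant. It also shows how per-class precision and recall expose individual misclassifications that a single overall accuracy figure can obscure. The function names are illustrative and do not belong to any particular library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` (assumed: malignant) as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # benign tumor flagged as malignant (false positive)
        elif actual == positive:
            fn += 1  # malignant tumor missed (false negative)
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_report(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class, plus overall accuracy."""
    if not y_true:
        raise ValueError("need at least one label")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


report = classification_report([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
print(report)
```

For a screening aid, recall on the malignant class (how many malignant tumors were caught) is usually the figure to watch, since a false negative is the costlier error.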
This work focuses on creating a better model for classifying tumor cells
to predict breast cancer.
The model promises an accuracy above 98%.
Reports of false negatives and false positives can be prevented on a large scale.
FUTURE ENHANCEMENT
The project is limited in scope, so a more in-depth study has to be conducted in
order to source more data and detect patterns in biopsy data over time.
In the future, an image-processing approach will be taken to process the raw digital image of
Thank you