MAJOR PROJECT

ON
BREAST CANCER DETECTION

Guided By:
Bivasa Ranjan Parida

Submitted By:
Mahima Milan Mohapatra - 1801219074
Suroshree Ghosh - 1801219161
Gyana Prakash Sahoo - 1801219056
Biswajit Sahoo - 1801219035

Dept. of Computer Science & Engineering


College of Engineering Bhubaneswar
Contents:
• Introduction
• System Specification
• Methodologies
• System Architecture
• Project Interface
• Task Performed
• Confusion Matrix
• Project Interface for breast cancer detected
• Project Interface for breast cancer not detected
• Advantages & Disadvantages
• Applications
• Future Scope
• Conclusion
• Reference
Introduction
• Cancer is a disease in which abnormal cells divide uncontrollably and destroy body tissue.
• Tumors are mainly of two types:
   ◦ Malignant (cancerous)
   ◦ Benign (non-cancerous)
• Breast cancer is the second largest cause of cancer deaths among women.
• At the same time, it is among the most curable cancer types if it is diagnosed early.
System Specification
Hardware Requirements:
• System: Pentium IV 2.4 GHz
• Hard Disk: 500 GB
• RAM: 4 GB
• Any desktop or laptop system with the above configuration or higher

Software Requirements:
• Operating System: Windows 7 and above
• Coding Language: Python 2.7 and above
• Scripting Tool: Jupyter Notebook
• Libraries: pandas, NumPy, scikit-learn (sklearn), stats, Matplotlib, statistics
Methodologies
What is a Support Vector Machine (SVM)?
• A supervised pattern-classification method
• A powerful and versatile machine learning model
• Well suited to small or medium-sized datasets
• A training algorithm for learning classification and regression rules from data
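As an illustration of the points above, here is a minimal sketch of training and evaluating an SVM classifier with scikit-learn. It is not the project's actual code: the data is synthetic (generated with make_classification), and the feature count, train/test split ratio, feature scaling step, and kernel settings are all assumptions chosen only to keep the example runnable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_redundant=2, random_state=0)

# Hold out 25% of the samples for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale the features, then fit an SVM classifier with an RBF kernel.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

# Predict the held-out samples and report the fraction classified correctly.
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print("held-out accuracy: {:.2f}".format(accuracy))
```

Scaling is included because SVMs are sensitive to feature ranges; whether the project scaled its features is not stated in the slides.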
System Architecture
Flowchart: Breast cancer detection

Start
→ Training data
→ Preprocessed data
→ Cleaned dataset
→ Data visualization
→ Prediction using SVM algorithm
→ Analysis of the output and performance
→ Stop
Project Interface
Task Performed
Preparing the Data:
The following packages are loaded:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Using pandas, we load the dataset and print some basic information.
df = pd.read_csv("cell_samples.csv")
df.head()
df.tail()
• Output:
This displays the first and last rows of the dataset used in our model.
• Next, we count how many diagnoses are malignant and how many are benign; the result is shown below.
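The counting step could look like the following sketch. The slides do not show the code for it, so the column name "Class" and the label codes 2 (benign) and 4 (malignant) are assumptions, and a tiny inline DataFrame stands in for the loaded dataset.

```python
import pandas as pd

# Tiny stand-in for the loaded dataset. The column name "Class" and the
# codes 2 = benign, 4 = malignant are assumptions, not taken from the slides.
df = pd.DataFrame({"Class": [2, 2, 4, 2, 4, 2]})

# Translate the numeric codes to readable labels, then count each label.
labels = df["Class"].map({2: "benign", 4: "malignant"})
counts = labels.value_counts()
print(counts)
```

With the real data, the same two lines would be applied to the DataFrame returned by pd.read_csv.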

Output:

• Next, we use seaborn to create a heat map of the correlations between the features.

plt.figure(figsize=(14, 11))
sns.heatmap(df.corr(), annot=True, cmap='viridis')
plt.show()
Output:

Fig: Heat map


Why Choose SVC?

                    Predicted Negative   Predicted Positive
Actual Negative     TN = 114             FP = 4
Actual Positive     FN = 3               TP = 54

(Fig: Confusion Matrix)


From the confusion matrix we can calculate accuracy, error, precision, and recall.
1. Accuracy = (TP + TN) / Total
            = (114 + 54) / 175
            = 168 / 175
            = 0.96
2. Error = 1 - Accuracy
         = 1 - 0.96
         = 0.04
3. Precision = TP / Predicted positive
             = 54 / 58
             = 0.93
4. Recall = TP / Actual positive
          = 54 / 57
          = 0.95
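The same arithmetic can be checked with a few lines of Python. This is only a sketch that re-derives the figures above: the counts (TN = 114, TP = 54, 58 predicted positives, 57 actual positives, 175 samples in total) come from the calculations on this slide, while the variable names are illustrative.

```python
# Counts taken from the calculations above.
tn, tp = 114, 54
predicted_positive = 58   # TP + FP
actual_positive = 57      # TP + FN
total = 175

# The remaining two cells follow from the row and column totals.
fp = predicted_positive - tp
fn = actual_positive - tp

accuracy = (tp + tn) / total
error = 1 - accuracy
precision = tp / predicted_positive
recall = tp / actual_positive

print("accuracy={:.2f} error={:.2f} precision={:.2f} recall={:.2f}".format(
    accuracy, error, precision, recall))
```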
Project Interface for Breast Cancer Detected
Project Interface for Breast Cancer Not Detected
Advantages
• Effective in high-dimensional spaces.
• Effective in cases where the number of dimensions is greater than the number of samples.
• It is also memory efficient.

Disadvantages
• If the number of features is much greater than the number of samples, over-fitting must be avoided when choosing kernel functions.
• SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
Applications
• Early detection leads to more treatment options and a better chance of survival.
• Breast cancer detected at an early stage has a 93 percent or higher survival rate in the first five years.
• It is much easier to treat at an early stage than at a late stage.
Future Scope
Breast cancer, if found at an early stage, can be treated in time to save the lives of thousands of women, and of men as well. Hence, this project can play an important role in the future:
• Projects like this help real-world patients and doctors gather as much information as they can.
• Using machine learning algorithms, we can classify and predict a tumor as benign or malignant.
• Machine learning algorithms can be used in medical research; they improve the system and reduce human error.
Conclusion
• After applying the different classification models, we obtained a different accuracy for each one. The Decision Tree, K-NN, Support Vector Machine, and Logistic Regression algorithms achieved 94.64 percent, 89.22 percent, 96.87 percent, and 94.67 percent accuracy respectively.
• This research established the model’s performance and the significant factors affecting breast cancer patients’ survival rates, which may be used in clinical practice, especially in the Asian context.
Reference
1. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-8-56
2. https://airccse.org/journal/ijdps/papers/4313ijdps09.pdf
3. https://link.springer.com/article/10.1007/s10489-007-0073-z
4. https://www.sciencedirect.com/science/article/pii/S1877050916302575
5. https://www.academia.edu/71848246/Prediction_of_Breast_Cancer_Disease_using_Machine_Learning_Algorithms
Thank You
