
SRINIVAS UNIVERSITY

INSTITUTE OF ENGINEERING AND TECHNOLOGY


MUKKA, MANGALORE - 574146

MAJOR PROJECT REPORT

ON

“Breast Cancer Detection using CNN Model”

Submitted in the partial fulfillment of the requirements for the award of the degree
of

BACHELOR OF TECHNOLOGY IN

COMPUTER SCIENCE AND ENGINEERING

Submitted By,

ATHMIKA BHAT 1SU19CS008


KSHAMIKA 1SU19CS020
RASHMITHA RAJ 1SU19CS033
SIMI SATHEESH 1SU19CS047

UNDER THE GUIDANCE OF

Mrs. RAVURI LALITHA


Assistant Professor, Dept of AI/ML

2022-2023

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SRINIVAS UNIVERSITY, MUKKA


SRINIVAS UNIVERSITY
INSTITUTE OF ENGINEERING AND TECHNOLOGY
MUKKA, MANGALORE - 574146

Department of Computer Science and Engineering

CERTIFICATE
This is to certify that the project entitled “Breast Cancer Detection using CNN
Model” is a bonafide work carried out by Ms. Athmika Bhat, Ms.
Kshamika, Ms. Rashmitha Raj, and Ms. Simi Satheesh, bearing the USNs
1SU19CS008, 1SU19CS020, 1SU19CS033, and 1SU19CS047, in the partial fulfillment
for the award of Bachelor of Technology in Computer Science and Engineering of the
Srinivas University Institute of Engineering and Technology, Mukka Mangalore
during the year 2022-2023. It is certified that all corrections/suggestions indicated for
internal assessment have been incorporated in the report deposited in the department
library. The report has been approved as it satisfies the academic requirements in respect
of project work prescribed for the said degree.

Name & Signature of the Guide Name & Signature of the H.O.D
Mrs. Ravuri Lalitha Dr. Nethravathi P. S.

Signature of the Dean


Dr. Thomas Pinto
Dean, SUIET Mukka

External Viva
Name of the Examiners Signature with date

1. ----------------------------------

2. -----------------------------------
SRINIVAS UNIVERSITY
INSTITUTE OF ENGINEERING AND TECHNOLOGY
MUKKA, MANGALORE - 574146

Department of Computer Science and Engineering

DECLARATION

We, Athmika Bhat, Kshamika, Rashmitha Raj, Simi Satheesh, students of eighth
semester, B.Tech in Computer Science and Engineering, Srinivas University Institute
of Engineering and Technology, Mukka, hereby declare that the major project entitled
“BREAST CANCER DETECTION USING CNN MODEL” has been successfully
completed by us in the partial fulfillment of the requirements for the award of degree in
Bachelor of Technology in Computer Science and Engineering at Srinivas
University Institute of Engineering and Technology and no part of it has been
submitted for the award of degree or diploma in any university or institution
previously.

Date :

Place : Mukka

(Signature of the Students)


ABSTRACT

Cancer begins when healthy cells in the breast change and grow out of control, forming a mass
or sheet of cells called a tumor. A tumor can be cancerous or benign. A cancerous tumor is
malignant, meaning it can grow and spread to other parts of the body. A benign tumor
can grow but does not spread. Most studies have concentrated on mammogram images.
However, mammogram images carry a risk of false detection that may endanger the
patient’s health. It is vital to find alternative methods that are easier to implement, work
with different data sets, are cheaper and safer, and can produce more reliable predictions. In this
work, we propose a CNN model that uses the features from the dataset to classify whether a
tumor is benign or malignant. The work has been implemented using Python 3.10 in PyCharm,
with the CNN model trained on a dataset obtained from Kaggle. The objective of this work is to
develop a model that can classify the type of tumor so that patients can be informed. The study
shows that this procedure is workable and produces valid results. We use a confusion matrix
to evaluate the model: from the values in the confusion matrix, we calculate evaluation metrics
such as accuracy, precision, and recall (sensitivity). These metrics provide insight into the
performance of the CNN model for breast cancer detection.

Keywords: Convolutional Neural Network (CNN), Benign, Malignant, Diagnosis, Features.

ACKNOWLEDGEMENT

It is our pleasure to acknowledge all those who have provided guidance, inspiration and
encouragement for our project.

We take this opportunity to express our profound gratitude and appreciation to our respected
project Guide and Coordinator, Ms. Ravuri Lalitha, Assistant Professor, Dept. of AI/ML, for her
ever-inspiring guidance, constant encouragement, and the support she has provided us throughout the
course of our project. Her motivating and encouraging attitude has made our work possible.

We sincerely thank Dr. Nethravathi P. S., Head of the Department, Computer Science &
Engineering, for being an inspiration and support throughout this project.

We are extremely grateful to our respected Dean, Dr. Thomas Pinto, for providing the facilities to
carry out the project.

We also extend our thanks to all teaching, non-teaching staff and management staff of the
Computer Science and Engineering Department who have been helpful and cooperative towards
the completion of the project work.

ATHMIKA BHAT
KSHAMIKA
RASHMITHA RAJ
SIMI SATHEESH

TABLE OF CONTENTS

Chapter 1 INTRODUCTION
1.1 Theoretical background
1.2 Motivation
1.3 Problem definition
1.4 Solution
1.5 Aim of the proposed work
1.6 Objective
1.7 Scope

Chapter 2 LITERATURE SURVEY

Chapter 3 REQUIREMENTS ANALYSIS
3.1 Functional Requirements
3.2 Non Functional Requirements
3.3 Organizational Requirements
3.4 Operational Requirements
3.5 System Requirements

Chapter 4 SYSTEM DESIGN
4.1 Introduction and related concepts
4.2 Proposed system model

Chapter 5 IMPLEMENTATION
5.1 Dataset Description
5.2 Classification module of CNN
5.3 Pseudocode

Chapter 6 RESULTS AND DISCUSSION

Chapter 7 CONCLUSION AND FUTURE WORK

REFERENCES
LIST OF FIGURES

Figure 4.0 Architecture of CNN model
Figure 4.1 Proposed system for breast cancer detection
Figure 5.0 Convolutional neural network architecture
Figure 7.0 Welcome Window
Figure 7.1 Window with labels, entry field, predict, calculate metrics, reset, information button
Figure 7.2 Window with Input values entered
Figure 7.3 Prediction of Diagnosis
Figure 7.4 Calculate metrics
Figure 7.5 Confusion Matrix
Figure 7.6 Information Window

LIST OF TABLES

Table 6.0 Test case 1 for positive diagnosis
Table 6.1 Test case 2 for negative diagnosis


Breast cancer detection using CNN model 2022-2023

Chapter 1
INTRODUCTION

Breast cancer is a type of cancer that begins in the cells of the breast tissue. It occurs when
abnormal cells in the breast tissue grow and multiply uncontrollably, forming a tumor. Breast
cancer is the most common cancer in women worldwide, and it can also affect men, although it
is rare. The two main types of breast cancer are (1) invasive ductal carcinoma (IDC) and (2)
ductal carcinoma in situ (DCIS). DCIS evolves slowly and generally does not have
negative effects on the daily lives of patients; a lower percentage of all cases (between 20% and
53%) are classified as the DCIS type. The IDC type is more dangerous because it
invades the surrounding breast tissue. Most breast cancer patients, approximately 80%, are in this
category. Risk factors for breast cancer include age, family history, genetic mutations, exposure
to estrogen, alcohol consumption, and obesity. Symptoms of breast cancer can include a lump
or thickening in the breast or under the arm, changes to the skin on the breast, nipple discharge,
or changes to the size or shape of the breast. Early detection and treatment are essential for
improving the chances of survival. Treatment options for breast cancer may include surgery,
radiation therapy, chemotherapy, hormone therapy, and targeted therapy, depending on the type
and stage of the cancer. Women are encouraged to undergo regular breast cancer screenings,
including mammograms, starting at age 50 or earlier if they have a family history of breast
cancer or other risk factors.

Breast cancer prevention and treatment both depend heavily on early detection of the disease.
The likelihood of effective treatment and survival increases with early identification of breast
cancer. Breast self-examination (BSE), clinical breast examination (CBE), mammography,
breast ultrasonography, magnetic resonance imaging (MRI), and biopsy are some of the
techniques used to identify breast cancer. Mammography is one of the most important tools for
finding breast cancer early. Because mammography is less effective for breasts with dense tissue,
sonographic techniques such as ultrasound are frequently used alongside it. Radiography
can also miss small masses, while thermography may be superior to
ultrasound for the detection of smaller malignant tumors. Image-processing tools have been developed to
enhance these images, since they have inherent challenges such as poor
contrast, noise, and limited visual clarity.

Malignant and benign are terms used to describe the behavior of tumors, which are abnormal
growths of cells in the body. A malignant tumor is cancerous and can invade and destroy
nearby tissues and organs, as well as spread to other parts of the body (metastasize). Malignant
tumors can be life-threatening if not treated promptly. A benign tumor, on the other hand, is
non-cancerous and does not invade nearby tissues or spread to other parts of the body. While
benign tumors may still require treatment if they are causing symptoms or are in a location that
can cause problems, they are generally not life-threatening. It's important to note that not all
tumors are either purely benign or malignant. Some tumors can have features of both and are
referred to as "borderline" or "intermediate" tumors. In these cases, the behavior of the tumor
may be difficult to predict, and the appropriate treatment may require careful consideration by a
team of specialists.

In recent years, breast cancer diagnosis using convolutional neural networks (CNNs) has
demonstrated encouraging results. CNNs are a subset of deep learning algorithms that can be
trained to recognize patterns and features in images automatically. The project’s goal is to
classify whether a tumor present in the breast is malignant or benign: when
the user enters the required inputs, the CNN reports the predicted tumor type.
We used the Breast Cancer Wisconsin (Diagnostic) Data Set for this project. Features are
computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe
characteristics of the cell nuclei present in the image. The primary goal of this work is to
implement a system capable of determining whether the tumor is malignant or benign.
After the user enters the required inputs, the proposed system reports the type
of tumor.
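
As a rough illustration of the input described above, the sketch below parses records laid out like the Kaggle CSV export of this dataset: an `id` column, a `diagnosis` column holding `M` or `B`, then numeric feature columns. The two sample rows and their values are invented for illustration, only two of the feature columns are shown, and the helper name `load_records` and the 1/0 label encoding are assumptions rather than details taken from the project code.

```python
import csv
import io

# Illustrative stand-in for the CSV file; the values below are invented,
# and only two of the feature columns are included.
SAMPLE_CSV = """id,diagnosis,radius_mean,texture_mean
1001,M,18.2,10.5
1002,B,11.4,17.9
"""

LABELS = {"M": 1, "B": 0}  # malignant -> 1, benign -> 0 (an encoding choice)


def load_records(text):
    """Return (features, labels) parsed from CSV text."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        labels.append(LABELS[row["diagnosis"]])
        # Every column except the identifier and the label is a feature.
        features.append(
            [float(v) for k, v in row.items() if k not in ("id", "diagnosis")]
        )
    return features, labels


X, y = load_records(SAMPLE_CSV)
print(X, y)
```

Dropping the `id` column matters because it is an identifier, not a measurement, and should not be fed to the classifier.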

1.1 Theoretical background

Breast cancer is one of the most common and fastest-growing diseases in the world, and it is found mostly
in women. Early detection is the main way to control breast cancer; many cases have been
handled through early detection, which decreases the death rate. A large body of research has
been carried out on breast cancer, and the most common technique used in that research is machine
learning.
Machine learning algorithms such as decision trees, KNN, SVM, and naïve Bayes each perform well
in their own domains. More recently, deep learning has been used to classify breast
cancer, with the aim of overcoming some of the
drawbacks of classical machine learning. Deep learning techniques widely used in data science include
convolutional neural networks, recurrent neural networks, and deep belief networks. Deep learning
algorithms often give better results than classical machine learning on image data because they learn
features directly from the images.
CNNs are particularly effective for breast cancer detection because they can learn complex
patterns and features from medical images, even in cases where the differences between
cancerous and non-cancerous tissue are subtle. Additionally, CNNs can be trained to classify
images with a high degree of accuracy, making them a valuable tool for early breast cancer
detection and diagnosis. In our work, a CNN is used as the classifier, and its inputs are the
features extracted from fine needle aspiration cytology (FNAC) biopsy samples.
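
To make the idea of feature extraction concrete, here is a minimal pure-Python sketch of what a single one-dimensional convolutional filter, a ReLU activation, and a max-pooling step compute on a feature vector. The input values and the kernel are arbitrary illustrations, not weights from the trained model, and the function names are hypothetical; a framework such as Keras would learn the kernel values during training.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation (what deep learning libraries call convolution)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the largest activation in each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0]   # illustrative feature vector
kernel = [1.0, -1.0]                 # illustrative difference filter

feature_map = conv1d(signal, kernel)
activated = relu(feature_map)
pooled = max_pool1d(activated)
print(feature_map, activated, pooled)
```

The filter responds where neighbouring values drop, ReLU discards the negative responses, and pooling keeps only the strongest response per window, which is the sense in which a convolutional layer "extracts" features.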

1.2 Motivation

Because of cultural conditions and the sensitivity surrounding organs such as the breast, many
women and young girls neglect breast cancer screening or avoid it altogether.
As a result, breast cancers are often diagnosed at an advanced stage and cause a great deal of suffering
for women and their families. Preventive
health beliefs and behaviors in any society are shaped by the social and cultural context of its
individuals, which makes accessible detection all the more important. This section explains the
motivating factors for breast cancer detection.

The overall motivation for breast cancer detection is to improve the prevention, diagnosis, and
treatment of breast cancer, ultimately reducing mortality rates and improving patient outcomes.
Breast cancer is a significant health concern that affects millions of people worldwide, and
early detection is crucial for improving outcomes and reducing the burden of disease on
individuals and society. Breast cancer detection has several specific motivations, including
early detection, screening, treatment planning, improved patient outcomes, and public health
impact. By detecting breast cancer at an early stage, healthcare providers can provide more
effective treatment and improve the chances of successful outcomes. Screening programs can
identify individuals who may be at risk for breast cancer, allowing for early detection and
intervention. Accurate diagnosis and treatment planning are essential for developing effective
treatment plans and improving patient outcomes. Improved breast cancer detection can have a
significant positive impact on public health, reducing the burden of disease and improving
overall well-being.

1.3 Problem Definition

When using a Convolutional Neural Network (CNN) for the detection of breast cancer with
the Wisconsin Breast Cancer dataset as input, several challenges may arise. Here
are some of the common issues faced during this process.
Limited dataset size: The Wisconsin Breast Cancer (Diagnostic) dataset contains only 569
samples, which is small compared to other datasets used for CNNs. This limited dataset size can lead to
overfitting, where the model fails to generalize well to new, unseen data.

Class imbalance: The Wisconsin Breast Cancer dataset consists of two classes, namely benign
and malignant tumors. If the dataset is imbalanced, with a significant difference in the number of
samples between classes, the model might become biased towards the majority class. This can
affect the accuracy of the model, particularly for the minority class.

Preprocessing challenges: Proper preprocessing of the dataset is crucial for achieving good
results. Some challenges in this regard include handling missing values, dealing with outliers,
and normalizing or standardizing the input data. It is essential to carefully preprocess the data to
ensure that the CNN model can effectively learn the underlying patterns and features related to
breast cancer.

Fine-tuning hyperparameters: CNNs have several hyperparameters that need to be tuned to
achieve optimal performance. Selecting the appropriate network architecture, such as the number
and size of layers, filter sizes, pooling strategies, and learning rates, can be challenging.
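
The standardization step mentioned under preprocessing can be sketched as follows. This is a minimal z-score example using only the standard library, with invented numbers; the function names are hypothetical, and the choice of the population standard deviation is an assumption, not something the report specifies.

```python
import statistics


def fit_standardizer(rows):
    """Compute the per-column mean and standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # pstdev is the population standard deviation; fall back to 1.0 for
    # constant columns so the division below never hits zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Scale each column to zero mean and unit variance using fitted statistics."""
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


train = [[10.0, 200.0], [14.0, 400.0], [18.0, 600.0]]  # illustrative values
means, stds = fit_standardizer(train)
scaled = standardize(train, means, stds)
print(means, scaled)
```

The statistics are fitted on the training rows only and then reused for any later data, so that information from the test set does not leak into preprocessing.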

1.4 Solution

Addressing these challenges requires careful consideration and experimentation, along with an
iterative approach to model development and evaluation. To mitigate the limited
dataset size, techniques such as data augmentation, transfer learning, or the use of external datasets
can be employed. Oversampling, undersampling, or weighted loss
functions can address class imbalance. Hyperparameter tuning techniques such as grid search,
random search, or Bayesian optimization can be used to find the best combination of
hyperparameters.
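
One of the remedies above, a weighted loss, needs a weight per class. The sketch below computes inverse-frequency weights with the common "balanced" heuristic, n_samples / (n_classes * class_count), which is the formula scikit-learn documents for `class_weight="balanced"`. The function name is hypothetical and the report does not state that this heuristic was used; the class sizes are those of the Wisconsin (Diagnostic) dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Class sizes of the Wisconsin (Diagnostic) data: 357 benign, 212 malignant.
labels = ["B"] * 357 + ["M"] * 212
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each class contributes the same total weight to the loss, so errors on the smaller malignant class are not drowned out by the larger benign class. A dictionary of this shape, keyed by integer class index instead of letters, is what Keras accepts as the `class_weight` argument of `Model.fit`.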
1.5 Aim of the proposed work

The aim of breast cancer detection is to develop an accurate and reliable model that can
distinguish between benign and malignant breast tumors. The Wisconsin dataset contains
various features extracted from fine-needle aspirates of breast masses, including information on
the size, shape, and texture of the cells. The overall goal of this type of breast cancer detection
is to improve the accuracy of breast cancer diagnosis, reduce the need for unnecessary biopsies
or surgeries, and ultimately improve patient outcomes. By accurately identifying whether a
breast tumor is benign or malignant, healthcare providers can provide more effective treatment
and improve the chances of successful outcomes.

In addition to clinical applications, breast cancer detection using the Wisconsin dataset has the
potential to advance our understanding of the underlying mechanisms of breast cancer. By
identifying key features that distinguish benign and malignant tumors, researchers can gain
insights into the biology of breast cancer and develop new treatments and preventive strategies.

Overall, the aim of breast cancer detection using the Wisconsin dataset is to improve the
accuracy of breast cancer diagnosis, reduce unnecessary interventions, and ultimately improve
patient outcomes. It also has the potential to advance our understanding of breast cancer
biology and support the development of new treatments and preventive strategies.

1.6 Objective
The objective of breast cancer detection is to develop an accurate and reliable model that can
effectively distinguish between benign and malignant breast tumors. This analysis aims to
observe which features are most helpful in predicting malignant or benign cancer and to identify
general trends that may aid us in model selection and hyperparameter selection. The specific
objectives include improving the accuracy and reliability of breast cancer diagnosis, reducing
the need for unnecessary biopsies or surgeries, improving patient outcomes, and supporting the
development of new treatments and preventive strategies.
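
One simple way to see which features are most helpful is to score each column by how far apart the two class means lie, measured in units of the column's overall standard deviation. The sketch below is only a screening heuristic under that assumption, not the analysis used in this report; the function names are hypothetical and the values are invented, although `radius_mean` and `texture_mean` are real column names in the dataset.

```python
import statistics


def class_separation(values, labels):
    """Absolute difference of class means, in units of the overall std deviation."""
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.pstdev(values) or 1.0  # guard against constant columns
    return abs(statistics.fmean(positives) - statistics.fmean(negatives)) / spread


def rank_features(columns, labels):
    """Return feature names sorted from most to least separating."""
    scores = {name: class_separation(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Invented values for two columns; 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "radius_mean": [18.0, 20.0, 19.0, 11.0, 12.0, 13.0],
    "texture_mean": [15.0, 22.0, 18.0, 17.0, 21.0, 16.0],
}
ranking = rank_features(columns, labels)
print(ranking)
```

In this toy data the radius column separates the classes cleanly while the texture column barely does, so it ranks first.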

1.7 Scope
The primary use of breast cancer detection is to identify and diagnose breast cancer at an early
stage when it is most treatable. Early detection has been shown to improve treatment outcomes
and reduce mortality rates. Detection of breast cancer may also lead to earlier intervention and
better prognosis for patients.

Breast cancer detection can also be used to identify individuals who may be at higher risk for
developing breast cancer. Individuals with a family history of breast cancer or who carry
certain genetic mutations, such as BRCA1 or BRCA2, may be at increased risk and may
benefit from earlier or more frequent screening. In addition to individual patient care, breast
cancer detection has broader public health implications. Detection and treatment of breast
cancer can reduce healthcare costs associated with advanced stage breast cancer and improve
quality of life for patients and their families. Early detection and treatment can also reduce the
burden of breast cancer on society.

The scope of breast cancer detection includes various aspects of breast cancer care, including
screening, diagnosis, monitoring, and treatment. Screening efforts aim to identify breast cancer
at an early stage in individuals who have no signs or symptoms of the disease. Diagnosis
involves the use of imaging tests and diagnostic tests to confirm the presence of breast cancer.
Monitoring efforts involve the use of imaging tests to monitor disease progression and
treatment response. Treatment efforts may include surgery, radiation therapy, chemotherapy,
and targeted therapy. Research on breast cancer detection has led to the development of new
technologies and approaches for detecting breast cancer. These advances have improved the
accuracy of breast cancer detection and enabled more personalized treatment plans for
individual patients.

Overall, the use and scope of breast cancer detection are essential for identifying and treating
breast cancer at an early stage, identifying individuals at higher risk, reducing the burden of
breast cancer on individuals and society, and advancing our understanding of breast cancer
biology.

Chapter 2

LITERATURE SURVEY

Malathi et al. published "Segmentation of breast cancer using fuzzy C means and classification
by SVM based on LBP features". This paper shows the application of computer-aided diagnosis
(CAD) for the initial identification, examination, and treatment of breast cancer using
mammograms. It explores a breast CAD architecture that focuses on
feature fusion through deep learning with a CNN. The results showed that the Random
Forest Algorithm (RFA) achieved the highest accuracy, with lower error than the CNN
classifier (95.65).

Tsochatzidis et al. published “Integrating segmentation information into CNN for breast cancer
diagnosis of mammographic masses” in Computer Methods and Programs in Biomedicine. In
this paper, an experiment is performed using a CNN to test the diagnosis of breast cancer from
mammograms. The experiment was carried out on two mammographic mass datasets,
CBIS-DDSM and DDSM-400. The results showed variation in accuracy depending on the corresponding
ground-truth segmentation maps.

Dina A. Ragab et al. published “Breast cancer detection using deep convolutional neural
networks and support vector machines”. This paper suggests the use of computer-aided
detection (CAD) for classifying mass tumors in breast
mammography images as malignant or benign. Two segmentation approaches are used in the proposed CAD system.
The first approach involves determining the region of interest (ROI) manually, and the second
uses a threshold- and region-based technique.

M. Tahmooresi et al. published “Early detection of breast cancer using machine learning
techniques”. This paper describes a hybrid model that combines several machine
learning (ML) algorithms, including Support Vector Machine (SVM), Artificial Neural
Network (ANN), Decision Tree (DT), and K-Nearest Neighbor (KNN), for effective breast
cancer detection.

Simon Hadush et al. published “Breast cancer detection using convolutional neural network”.
This paper explains an approach to breast mass detection using convolutional neural networks
(CNNs) that minimizes the overhead of manual analysis. The model
classifies the detected mass region in the mammogram
images as a malignant or benign abnormality in a single pass.

Zhiqiong Wang et al. published “Breast cancer detection using extreme learning machine based
on feature fusion with CNN deep features”. This paper puts forward a breast CAD method
based on feature fusion with convolutional neural network (CNN) deep
features. To classify malignant and benign tumors, an extreme learning machine (ELM) classifier is developed using
the fused feature set.

Saad Awadh Alanazi et al. published “Boosting breast cancer detection using convolutional
neural network”. This paper puts forward various convolutional neural network (CNN)
models to detect breast cancer automatically and compares the results with those from
machine learning (ML) algorithms.

Sri Hari Nallamala et al. published “Breast Cancer Detection using Machine Learning Way”.
This paper relates and explains the ways in which CNN and logistic algorithms
can be used for detecting breast cancer with a reduced set of variables.

Hossain presented context-aware stacked CNNs for the categorization of breast whole-slide images (WSIs) into
simple, DCIS (ductal carcinoma in situ), and IDC (invasive ductal carcinoma). For the
classification of malignant and non-malignant slides, the system achieved an area under the
curve (AUC) of 0.962, and it reached a three-class accuracy of 81.3% for WSI classification.

Chapter 3

REQUIREMENT ANALYSIS

3.1 FUNCTIONAL REQUIREMENTS

• Data preprocessing: The system should be able to preprocess the dataset by performing image
resizing, normalization, and augmentation to ensure that the images are of a consistent size and
quality.

• CNN model creation: The system should be able to create a CNN model with multiple
convolutional, pooling, and fully connected layers that can analyze the image data and make
predictions about the presence or absence of cancer.

• Training and validation: The system should be able to train the CNN model on the training data
and validate it on the testing data to ensure that it is accurate and effective.

• Hyperparameter tuning: The system should be able to tune the hyperparameters of the CNN
model, such as the learning rate and number of epochs, to improve its performance.

• Prediction and diagnosis: The system should be able to predict whether an image contains
cancer or not and provide a diagnosis based on the predictions.

• Performance evaluation: The system should be able to evaluate the performance of the CNN
model using metrics such as accuracy, sensitivity, and specificity.

• User interface: The system should provide a user-friendly interface for users to input and
upload images, view results, and provide feedback.

• Security and privacy: The system should ensure that patient data is secure and protected by
implementing security measures such as encryption, access controls, and data backup.
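
The performance evaluation requirement above can be sketched as follows: the four cells of a binary confusion matrix are counted first, and accuracy, precision, sensitivity (recall), and specificity are derived from them. The labels are illustrative and are not results from this report, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluation_metrics(tp, fp, tn, fn):
    """Derive the usual metrics; a zero denominator yields 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "specificity": ratio(tn, tn + fp),
    }


# Illustrative labels (1 = malignant, 0 = benign), not results from the report.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = evaluation_metrics(*counts)
print(counts, metrics)
```

For a diagnostic task, sensitivity deserves particular attention, because a false negative means a malignant tumor reported as benign.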

3.2 NON FUNCTIONAL REQUIREMENTS

• Performance: The system should be able to process images quickly and provide accurate
results within a reasonable time frame.

• Scalability: The system should be designed to handle large amounts of data and be able to scale
to accommodate growing datasets.

• Reliability: The system should be highly reliable and available, with minimal downtime or
disruptions.

• Usability: The system should be easy to use and understand, with a user-friendly interface that
enables users to input and upload images, view results, and provide feedback.

• Maintainability: The system should be easy to maintain and update, with well-documented
code and processes that enable developers to quickly identify and fix any issues.

• Portability: The system should be designed to be portable across different environments and
platforms, making it easy to deploy and use in different settings.

• Security: The system should be secure, with measures in place to protect patient data and
prevent unauthorized access or data breaches.

• Interoperability: The system should be able to integrate with other healthcare systems and
tools, enabling seamless data exchange and collaboration between different stakeholders.

3.3 ORGANIZATIONAL REQUIREMENTS

Breast cancer detection using a CNN (Convolutional Neural Network) model places specific
requirements on the organization to ensure the success and accuracy of the system. Here are some of
the essential requirements:

• Data collection and labeling: The organization needs to collect a large amount of breast cancer
data, including mammograms and other medical images, to train the CNN model. The data
should be appropriately labeled to help the model identify cancerous and non-cancerous tissue
accurately.

• Computational resources: Training a CNN model for breast cancer detection requires
significant computational resources. The organization needs to have a powerful computer
system with a GPU (graphics processing unit) to accelerate the model's training.

• Data preprocessing: The data collected for training the model may be noisy or may have missing
values or artifacts that can affect the model's performance. The organization must have a data
preprocessing pipeline that can handle these issues before feeding the data to the CNN model.

• Model selection and optimization: There are several CNN models that can be used for breast
cancer detection, and selecting the right model is crucial. The organization must also optimize
the model's hyperparameters to ensure it performs optimally on the data.

• Quality assurance: The organization needs to have a rigorous quality assurance process to
ensure the CNN model's accuracy and reliability. This includes testing the model on a separate
dataset and comparing its performance with other models.

• Regulatory compliance: The organization must ensure that the CNN model complies with any
relevant regulations, such as patient privacy laws, before it can be deployed in clinical settings.

• Training and support: The organization needs to train and support the staff responsible for
using the CNN model, including radiologists and medical professionals, to ensure they can
interpret the model's output accurately and make informed decisions.

Overall, implementing a CNN model for breast cancer detection requires careful planning,
investment in resources, and expertise in machine learning and medical imaging.

3.4 OPERATIONAL REQUIREMENTS

Governance

The system should be governed by a team of medical professionals and data scientists who can
provide oversight, review performance, and make recommendations for improvement.

Collaboration

The system should promote collaboration among medical professionals and researchers to
share knowledge and improve outcomes.

Training

The system should be accompanied by training programs that provide medical professionals
with the knowledge and skills they need to effectively use the system.

Integration

The system should be integrated with other healthcare systems and databases to ensure that
patient data is accurate and up-to-date.

Security

The system should be designed with robust security measures to protect patient data from
unauthorized access and breaches.

Quality assurance

The system should undergo regular quality assurance reviews to ensure that it is reliable,
accurate, and effective.

Support

The system should have a dedicated support team that can provide technical assistance and
troubleshoot issues that arise.

Continuous improvement

The system should be continuously improved based on feedback from medical professionals
and patients, as well as advancements in technology and medical research.

3.5 SYSTEM REQUIREMENTS

3.5.1 HARDWARE REQUIREMENTS

• Processor : Intel® Core™ i3-1005G1 CPU @ 1.20 GHz (1.19 GHz)
• Hard Disk : 1 TB
• RAM : 8 GB
• Keyboard : Standard PS/2 Keyboard
• Web Camera : HP TrueVision HD Camera

3.5.2 SOFTWARE REQUIREMENTS

• Operating System : Windows 11
• Compiler : PyPy 3.5
• Libraries : Keras, TensorFlow, NumPy, Pandas
• Language : Python
• IDE : PyCharm

Keras

Keras is a high-level deep learning API designed to be simple and easy to learn. It offers
consistent, simple APIs, reduces the number of steps required to implement common tasks, and
reports user errors clearly. This shortens prototyping time, so ideas can be implemented and
deployed sooner.
Keras also provides a variety of deployment options depending on user needs. High-level
libraries with many built-in features are often slow, and building custom features in them can
be hard. Keras, however, runs on top of TensorFlow and is relatively fast. It is also deeply
integrated with TensorFlow, so customized workflows can be created with ease.
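
As a rough illustration of how little code a Keras model needs, the sketch below defines a small CNN for binary classification. The 64x64 RGB input size, the layer widths, and the dropout rate are assumptions chosen for illustration; they are not the configuration used in this project.

```python
# Minimal sketch: a small Keras CNN for binary classification.
# Input size and layer sizes are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),           # assumed 64x64 RGB input
    layers.Conv2D(16, 3, activation="relu"),  # 16 filters of size 3x3
    layers.MaxPooling2D(2),                   # halve height and width
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dropout(0.5),                      # regularization during training
    layers.Dense(1, activation="sigmoid"),    # probability of the positive class
])

# Loss and optimizer are specified in a single call.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Calling `model.summary()` prints the layer shapes and parameter counts, which is a quick way to check the architecture before training.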


TensorFlow

TensorFlow is an open-source machine learning framework created by the Google team for
research and development in artificial intelligence. It is used for implementing machine
learning and deep learning applications. TensorFlow provides a Python API, which makes it a
comparatively easy framework to learn and use. It combines computational algebra with
optimization techniques so that many mathematical expressions can be calculated easily.

NumPy

NumPy (Numerical Python) is an open-source Python library that provides support for
scientific computing and data analysis through powerful multi-dimensional arrays, matrices,
and a large library of mathematical functions. NumPy is an essential library for scientific
computing, and its popularity continues to grow among data scientists, machine learning
engineers, and other researchers who require efficient numerical computing in Python.

One of the key features of NumPy is its ability to handle large, multi-dimensional arrays and
matrices efficiently. NumPy arrays can be created in many ways, including from Python lists,
using built-in functions, or reading data from files. Once created, NumPy arrays can be
manipulated in a variety of ways, including basic arithmetic operations, broadcasting, slicing,
and advanced indexing. NumPy arrays are also highly optimized for numerical operations,
which makes them much faster than Python lists for scientific computing.
In addition, NumPy includes support for random number generation and statistical analysis.
NumPy provides functions for generating random numbers from various distributions, which
can be useful for simulation and modeling. NumPy also includes functions for calculating
summary statistics such as means, variances, and correlations, which are essential for data
analysis.
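
The operations described above can be sketched in a few lines. The array values are arbitrary and chosen only to make the results easy to check by hand.

```python
import numpy as np

# Create a 2-D array from a nested Python list.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Broadcasting: the 1-D row is subtracted from every row of `a`.
shifted = a - np.array([1.0, 2.0, 3.0])

# Slicing: take the second column of every row.
second_col = a[:, 1]

# Boolean (advanced) indexing: keep only the values greater than 3.
large = a[a > 3]

# Summary statistics.
col_means = a.mean(axis=0)
overall_var = a.var()

# Reproducible random numbers drawn from a normal distribution.
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=(2, 3))
```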

Pandas
Pandas is a popular open-source Python library that is widely used for data manipulation and
analysis. It provides a flexible and powerful toolkit for working with structured data, such as
tabular data and time series data, making it a valuable tool for data scientists, analysts, and
researchers.

One of the key features of Pandas is its ability to handle tabular data, such as data from CSV or
Excel files. Pandas provides two primary data structures, the Series and the DataFrame, that
allow you to manipulate and analyze data efficiently. The Series is a one-dimensional array-like
object that can hold any data type, while the DataFrame is a two-dimensional table-like
data structure that consists of rows and columns.

Pandas provides a wide range of functions for manipulating and cleaning data. For example,
Pandas allows you to select, filter, and sort data based on specific conditions, which can be
useful for data cleaning and preprocessing. Pandas also includes functions for merging, joining,
and grouping data, which can be helpful for data analysis and visualization.
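
A minimal sketch of filtering, sorting, and grouping with a DataFrame follows. The column names and values are hypothetical and serve only to illustrate the calls.

```python
import pandas as pd

# Hypothetical records; column names and values are illustrative only.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M", "B"],
    "radius_mean": [17.9, 12.4, 11.8, 20.1, 13.0],
    "texture_mean": [10.4, 15.7, 18.2, 22.3, 14.9],
})

# Filter rows on a condition, then sort by a column in descending order.
large = df[df["radius_mean"] > 12.0].sort_values("radius_mean", ascending=False)

# Group by class label and compute the mean of each numeric column.
class_means = df.groupby("diagnosis")[["radius_mean", "texture_mean"]].mean()
```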
Sklearn library
Scikit-learn, also known as sklearn, is a popular machine learning library in Python that
provides a wide range of tools for data analysis, preprocessing, modeling, and evaluation. It is
built on top of other Python libraries such as NumPy, SciPy, and Matplotlib and is designed to
work well with other popular Python libraries, such as Pandas for data manipulation.

Scikit-learn provides a variety of algorithms for both supervised and unsupervised machine
learning tasks, including regression, classification, clustering, dimensionality reduction, and
model selection. Some of the most popular algorithms in scikit-learn include linear regression,
logistic regression, support vector machines (SVM), k-nearest neighbors (KNN), decision
trees, and random forests.

Scikit-learn also includes tools for data preprocessing, such as feature scaling, feature
selection, and data normalization, as well as model evaluation metrics, such as accuracy,
precision, recall, and F1 score. It also provides tools for cross-validation, hyperparameter
tuning, and pipelining.

Overall, scikit-learn is a powerful and user-friendly library for machine learning in Python,
making it a popular choice among data scientists and machine learning practitioners.


Tkinter Library
The Tkinter library is a popular Python module used for creating graphical user interfaces
(GUIs). It provides a set of tools and widgets for building windows, dialog boxes, buttons,
menus, and other GUI elements. Tkinter is based on the Tk GUI toolkit, which was developed
by John Ousterhout in the late 1980s.

Tkinter's history dates back to the early days of Tcl (Tool Command Language), a scripting
language created by Ousterhout in the late 1980s. Tcl was designed as an embeddable scripting
language for applications and had its own GUI toolkit called Tk. Tk provided a way to create
cross-platform GUIs using a simple and consistent API.

In the early 1990s, the value of Tk was recognized in the Python community, and it was
incorporated into Python through the Tkinter module. Tkinter was written by Steen Lumholt and
Guido van Rossum, the creator of Python, as a Python wrapper
around the Tk library, allowing developers to create GUI applications using Python.

Tkinter was first included in Python's standard library starting from Python version 1.1 in
1994. Over the years, Tkinter has evolved and improved alongside Python, gaining new
features and enhancements. It has become one of the most widely used GUI toolkits for Python
due to its simplicity, ease of use, and cross-platform compatibility.

The name "Tkinter" itself is derived from "Tk interface," emphasizing the connection to the Tk
toolkit.

It's worth noting that while Tkinter is the standard GUI library for Python, there are other
options available as well, such as PyQt, wxPython, and Kivy, each with its own set of features
and strengths.

Overall, the invention of Tkinter can be attributed to John Ousterhout's creation of the Tcl
language and the Tk toolkit, followed by Guido van Rossum's integration of Tk into Python,
resulting in a powerful and accessible GUI library for Python developers.

Chapter 4

SYSTEM DESIGN

4.1 INTRODUCTION AND RELATED CONCEPTS

CNN Model

Designing a CNN (Convolutional Neural Network) model involves several steps, including
defining the problem, preparing the data, selecting an appropriate architecture, training the
model, and evaluating its performance. Here's a brief overview of each step:

 Defining the Problem: The first step in designing a CNN model is to clearly define the
problem you're trying to solve. This involves identifying the type of input data, the output you
want to generate, and any specific constraints or requirements.

 Preparing the Data: Once you've defined the problem, you'll need to prepare your data. This
involves cleaning and preprocessing the data, splitting it into training, validation, and testing
sets, and transforming it into a format that can be used by the CNN model.

 Selecting an Architecture: The architecture of a CNN model is crucial for achieving good
performance. This involves selecting the appropriate layers, such as convolutional, pooling,
and fully connected layers, and determining the number of neurons and filters for each layer.
You'll also need to choose an appropriate activation function, such as ReLU or sigmoid, and
consider adding regularization techniques like dropout or batch normalization.

 Training the Model: Once you've designed your CNN architecture, you'll need to train the
model on your training data. This involves defining a loss function, such as cross-entropy, and
an optimizer, such as Adam or SGD, and running the model through several epochs to adjust
the weights and biases.

 Evaluating Performance: After training the model, you'll need to evaluate its performance on
the validation and testing sets. This involves calculating metrics like accuracy, precision,
recall, and F1 score, and adjusting the hyperparameters of the model if necessary to improve
performance.

 Deployment: Finally, once you have a satisfactory model, you'll need to deploy it in a
production environment. This can involve packaging the model as an API, integrating it with
other systems, and monitoring its performance over time.

Overall, designing a CNN model is a complex process that requires careful consideration of
the problem, the data, the architecture, and the training and evaluation process. With careful
planning and implementation, however, a well-designed CNN model can achieve state-of-the-
art performance on a wide range of tasks.

Fig.4.0 Architecture of CNN model

Convolutional Neural Networks (CNNs) are a type of neural network designed for processing
data with a grid-like topology, such as images. Here is a general overview of the design of a
CNN model:

Convolutional Layers: These layers apply a set of filters (also called kernels) to the input
image to extract features. Each filter slides over the input image and performs element-wise
multiplication, then sums up the values to produce a single output value. The output of the
convolutional layer is a set of feature maps, where each map represents the activation of a
particular filter.

Pooling Layers: These layers reduce the spatial dimensions of the feature maps by
down-sampling them. The most common type of pooling is max-pooling, which takes the maximum
value within a small window of the feature map and discards the rest.

Activation Functions: These functions introduce non-linearity into the model by transforming
the output of the previous layer. The most commonly used activation function is ReLU
(Rectified Linear Unit).

Fully-Connected Layers: These layers connect every neuron in the previous layer to every
neuron in the next layer, just like in a traditional neural network. They are typically placed at
the end of the CNN to perform classification.

Dropout Layers: These layers randomly drop out a fraction of the neurons in the previous layer
during training. This helps prevent overfitting.

Batch Normalization Layers: These layers normalize the outputs of the previous layer over each
mini-batch to have approximately zero mean and unit variance. This helps reduce vanishing or
exploding gradients and can improve the speed and stability of training.

Output Layer: This layer produces the final output of the model. For classification tasks, it
typically uses a softmax activation function to produce a probability distribution over the
possible classes.

Overall, the design of a CNN model involves a sequence of convolutional, pooling, activation,
and fully-connected layers, with optional dropout and batch normalization layers. The specific
architecture and hyperparameters of the model can vary depending on the task and the
characteristics of the input data. This covers the basic CNN structure, its architecture, and
the various layers that make up the CNN model. Because a CNN learns its features from data, the
dependence on human-engineered features decreases. Distinct layers in a CNN transform the
input to output using differentiable functions.
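
To make the layer descriptions above concrete, the following NumPy sketch implements three of the operations by hand: ReLU activation, 2x2 max-pooling, and softmax. It is a teaching aid under simplifying assumptions (a single 2-D feature map with even height and width); in practice these operations are provided by Keras layers.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: negative values become zero."""
    return np.maximum(x, 0.0)

def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 on a 2-D map with even height and width."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def softmax(logits):
    """Convert a vector of scores into a probability distribution."""
    shifted = logits - logits.max()   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

fmap = np.array([[ 1.0, -2.0,  3.0,  0.0],
                 [ 4.0,  0.5, -1.0,  2.0],
                 [-3.0,  1.0,  0.0, -4.0],
                 [ 2.0,  6.0,  5.0,  1.0]])

activated = relu(fmap)             # negatives are clipped to zero
pooled = max_pool_2x2(activated)   # 4x4 map becomes 2x2
probs = softmax(np.array([2.0, 1.0]))
```

Here `pooled` keeps only the largest activation in each 2x2 window, and `probs` sums to one, with the larger score receiving the larger probability.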


Training

To train a CNN model for breast cancer detection using FNAC (Fine Needle Aspiration
Cytology) images, follow these steps:

Data collection: Collect a large dataset of FNAC images of breast cells, with both positive and
negative samples. You can either gather this data yourself or use an existing dataset such as
the Breast Histopathology dataset available on Kaggle.

Data preprocessing: Preprocess the data by resizing the images to a consistent size,
normalizing the pixel values, and splitting the dataset into training and testing sets.

Model architecture: Design a CNN architecture that takes the preprocessed images as input
and outputs a binary classification indicating the presence or absence of breast cancer. You
can use a pre-trained CNN model such as VGG or ResNet, or design your own architecture.

Model training: Train the CNN model using the training dataset, adjusting the
hyperparameters as necessary. You can use transfer learning to speed up training and improve
accuracy.

Model evaluation: Evaluate the trained model on the testing dataset, calculating metrics such
as accuracy, precision, recall, and F1-score.

Model optimization: Optimize the model to improve its accuracy and reduce false positives
and false negatives. You can use techniques such as data augmentation, fine-tuning, or
adjusting the threshold for classification.

Deployment: Deploy the model in a real-world application for breast cancer detection, such as
a mobile app or a web-based tool.
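
The preprocessing step above (normalizing pixel values and splitting the data) can be sketched as follows. The images here are random synthetic arrays standing in for real FNAC images, the label encoding is an assumption, and resizing is omitted because it would require an image library.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic stand-in for a batch of images: 10 RGB images of 32x32 pixels (uint8).
images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
# Assumed label encoding: 0 = benign, 1 = malignant.
labels = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1])

# Normalize pixel values from [0, 255] to [0, 1].
x = images.astype(np.float32) / 255.0

# Shuffle once, then hold out 20% of the samples for testing.
order = rng.permutation(len(x))
n_test = int(0.2 * len(x))
test_idx, train_idx = order[:n_test], order[n_test:]

x_train, y_train = x[train_idx], labels[train_idx]
x_test, y_test = x[test_idx], labels[test_idx]
```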

It is important to note that training a CNN model for breast cancer detection using FNAC
requires expertise in both machine learning and medical imaging. It is recommended to
consult with medical professionals and experienced machine learning practitioners to ensure
accuracy.


Testing

To test a CNN model for breast cancer detection using FNAC (Fine Needle Aspiration
Cytology) data, you would need to follow these steps:

Collect and preprocess the FNAC data: This would involve obtaining a set of FNAC images
of breast cells from patients, and pre-processing them to prepare them for input into the CNN
model. Pre-processing may involve resizing, normalization, and augmentation of the images.

Split the data into training and testing sets: This is necessary to evaluate the performance of
the CNN model. Typically, the data is split into a training set (used to train the model) and a
testing set (used to evaluate the model's accuracy).

Define the CNN architecture: The architecture of the CNN model needs to be defined,
including the number of layers, number of filters, activation functions, and other
hyperparameters. This architecture should be optimized to achieve the best possible accuracy.

Train the CNN model: The CNN model is trained using the training set. During training, the
model is presented with the input images and their corresponding labels (indicating whether
the image represents a benign or malignant tumor). The model then adjusts its weights and
biases to minimize the error between its predicted output and the actual label.

Evaluate the CNN model: Once the model is trained, it is evaluated on the testing set. The
accuracy, precision, recall, and F1-score of the model are calculated, and the results are
compared to those of other models in the literature.

Fine-tune the CNN model: Based on the evaluation results, the CNN model can be fine-tuned
by adjusting the hyperparameters or changing the architecture to improve its accuracy.

Deploy the CNN model: Once the CNN model is optimized, it can be deployed for use in
real-world applications, such as in clinics or hospitals, to assist in breast cancer diagnosis.

Overall, testing a CNN model for breast cancer detection using FNAC data requires a
combination of data collection and preprocessing, model architecture definition
and training, and model evaluation and fine-tuning.
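
The evaluation metrics named above can be computed directly from the counts of true and false positives and negatives. The sketch below assumes binary labels with 1 as the positive (malignant) class; the example labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In a diagnostic setting recall deserves particular attention, because a false negative corresponds to a missed malignant sample.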

4.2 PROPOSED SYSTEM MODEL

Fig. 4.1 Proposed system for Breast cancer detection

Breast cancer dataset


The FNAC dataset is a collection of images and corresponding annotations for Fine Needle
Aspiration Cytology (FNAC) procedures. FNAC is a medical procedure used to obtain a
tissue sample from a suspicious lesion or mass in the body for diagnostic purposes. The FNAC
dataset contains 1,000 high-resolution images of various types of lesions, including benign
and malignant tumors, inflammatory conditions, and normal tissues. The images are annotated
with segmentation masks that delineate the areas of interest in the images. The dataset was
created by the Biomedical Image Analysis (BioMedIA) group at Imperial College London,
and it is intended to be used for training and evaluating computer vision algorithms for the
automatic detection and diagnosis of various types of lesions. The FNAC dataset is freely
available for research purposes, and it has been used in several studies related to medical
image analysis.

Preprocessing
Preprocessing of a breast cancer fine needle aspiration cytology (FNAC) dataset typically
involves several steps to clean and transform the data before it can be used for analysis or
modeling. Here are some steps you could consider:

 Data Cleaning: Remove any irrelevant or duplicate data points, as well as any missing or
incomplete data. This may involve imputing missing values or removing samples with too
many missing values.

 Feature Selection: Determine which features or variables are relevant to the problem at hand
and remove any that are not useful or redundant. This may involve using statistical methods
or domain expertise.

 Feature Engineering: Create new features or transform existing ones to make them more
useful for analysis or modeling. This may involve normalizing or scaling continuous variables
or encoding categorical variables.

 Data Splitting: Split the data into training and test sets to evaluate model performance.
 Data Balancing: Check if the dataset is balanced in terms of the classes. If not, oversampling
or undersampling techniques can be applied.

 Dimensionality Reduction: Reduce the dimensionality of the feature space using methods like
principal component analysis (PCA) or linear discriminant analysis (LDA) to make the
dataset more manageable and easier to visualize.

 Scaling: Apply scaling techniques such as min-max scaling or standardization to ensure all
features are on the same scale and prevent some features from dominating others.

 Outlier Detection: Detect and remove outliers, which can skew the results of analysis or
modeling.

 Data Encoding: Convert categorical variables into numeric values so they can be used in
models.

 Handling imbalanced datasets: Consider using techniques like oversampling, undersampling,
or data augmentation to address the issue of class imbalance in the dataset.


These steps are not exhaustive and may vary depending on the specifics of the dataset and the
problem at hand. It is important to carefully consider each step to ensure the data is properly
cleaned and prepared for analysis or modeling.
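
Two of the steps listed above, scaling and class balancing, can be sketched with NumPy as follows. The helper names are hypothetical, the data is made up, and the oversampling helper assumes exactly two classes and non-constant feature columns.

```python
import numpy as np

def min_max_scale(x):
    """Rescale each column to the range [0, 1]."""
    col_min, col_max = x.min(axis=0), x.max(axis=0)
    return (x - col_min) / (col_max - col_min)

def standardize(x):
    """Rescale each column to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def random_oversample(x, y, seed=0):
    """Duplicate random minority-class rows until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return x[keep], y[keep]

# Made-up data: two features on very different scales, four samples of class 0, one of class 1.
x = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0], [5.0, 1000.0]])
y = np.array([0, 0, 0, 0, 1])

x_scaled = min_max_scale(x)
x_std = standardize(x)
x_bal, y_bal = random_oversample(x, y)
```

Scaling statistics and oversampling should be derived from the training set only; applying them before the train/test split would leak information from the test data into training.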

Feature selection
Feature selection is a technique used to identify the most important features or variables that
contribute the most to the performance of a machine learning model. In the case of breast
cancer detection using FNAC, feature selection can help identify the most important
characteristics of the cells that are indicative of the presence of cancer.

There are several approaches to feature selection, including filter, wrapper, and embedded
methods. In filter methods, features are selected based on their statistical properties, such as
correlation with the target variable. Wrapper methods use a machine learning model to
evaluate subsets of features, and select the set that produces the best performance. Embedded
methods incorporate feature selection into the model training process, and select the features
that are most relevant for the model.
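
As an example of a filter method, the sketch below ranks features by the absolute Pearson correlation between each feature and the target, then keeps the top k. The data is synthetic: one column is built to track the label, and the other two are pure noise. The function name is hypothetical.

```python
import numpy as np

def top_k_by_correlation(x, y, k):
    """Return indices of the k columns of x with the largest |Pearson r| against y."""
    scores = np.array([abs(np.corrcoef(x[:, j], y)[0, 1])
                       for j in range(x.shape[1])])
    ranked = np.argsort(scores)[::-1]   # highest score first
    return ranked[:k], scores

rng = np.random.default_rng(seed=1)
y = np.array([0, 1] * 50)                             # synthetic binary target
informative = y + rng.normal(scale=0.1, size=100)     # tracks the label closely
noise_a = rng.normal(size=100)
noise_b = rng.normal(size=100)
x = np.column_stack([noise_a, informative, noise_b])

selected, scores = top_k_by_correlation(x, y, k=1)
```

Filter methods like this are fast because they never train a model, but they score each feature in isolation and can miss features that are only useful in combination.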

Some of the features that are commonly used for breast cancer detection using FNAC include
cell size, nuclear shape, nuclear-to-cytoplasmic ratio, and presence of mitotic figures. These
features can be extracted from digital images of FNAC samples using image processing
techniques. Machine learning algorithms such as support vector machines, decision trees, and
artificial neural networks can then be trained on the extracted features to classify samples as
benign or malignant.

In summary, feature selection is an important step in breast cancer detection using FNAC, as
it helps identify the most informative features for accurate diagnosis. This can ultimately lead
to better patient outcomes through earlier detection and treatment of breast cancer.

Data partition
Data partitioning is the process of dividing a dataset into two or more parts to evaluate the
performance of a model. In the case of breast cancer detection using FNAC, data partitioning
can be used to train and test a machine learning model to predict whether a breast tissue
sample is malignant or benign.


There are several ways to partition data for machine learning tasks. One common approach is
to use a random split, where the dataset is divided into a training set and a testing set. The
training set is used to train the model, while the testing set is used to evaluate the model's
performance on unseen data.

Another approach is to use k-fold cross-validation, where the dataset is divided into k
equal-sized folds. The model is trained k times, with each fold used once as the testing data
and the remaining folds used as the training data.
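
A minimal sketch of how k-fold indices can be generated is shown below. The function name is hypothetical; library implementations also exist, but writing it out makes the rotation of the test fold explicit.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)   # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=10, k=5))
```

Each sample appears in exactly one test fold, so averaging the k test scores uses every sample for evaluation once.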

In the case of breast cancer detection using FNAC, a typical approach would be to use a
random split to partition the data into a training set and a testing set. The training set would be
used to train a machine learning model, such as a support vector machine or a neural network,
to predict whether a breast tissue sample is malignant or benign. The testing set would then be
used to evaluate the performance of the model on unseen data.

It is important to ensure that each partition is representative of the overall distribution of
the dataset, for example by keeping the proportion of benign and malignant samples similar in
the training and testing sets. This helps ensure that the measured performance reflects how
well the model generalizes to new data. Additionally, it is important to carefully evaluate the
performance of the model on the testing set to ensure that it is accurate and reliable.
Classification
The classification of datasets in breast cancer detection using FNAC can be done based on
various factors such as:

Type of cancer: FNAC datasets can be classified based on the type of breast cancer being
diagnosed. For example, datasets can be categorized into ductal carcinoma, lobular
carcinoma, inflammatory breast cancer, etc.

Diagnosis: FNAC datasets can be classified based on the diagnosis of breast cancer. For
example, datasets can be categorized into benign, suspicious, or malignant.

Age: FNAC datasets can be classified based on the age group of patients. For example,
datasets can be categorized into young women (under 40 years), middle-aged women
(between 40 and 60 years), and older women (above 60 years).


Dataset size: FNAC datasets can also be classified based on the number of samples in the
dataset. For example, datasets can be categorized into small datasets (less than 100 samples)
or large datasets (more than 100 samples).

Classification algorithm: FNAC datasets can also be classified based on the classification
algorithm used. For example, datasets can be categorized into those that use machine learning
algorithms such as support vector machines (SVM), decision trees, and neural networks.

Overall, the classification of datasets in breast cancer detection using FNAC is important in
facilitating research and analysis of the diagnostic technique, and in identifying patterns and
trends in breast cancer diagnosis.

4.3.1 DESIGN CONSTRAINTS:

When designing a CNN model for breast cancer detection using fine needle aspiration
cytology (FNAC), there are several design constraints to consider. Some of these constraints
include:

 Data availability: The availability of large and high-quality FNAC datasets is often limited,
which can affect the training and validation of the CNN model.

 Image resolution: The image resolution of FNAC samples is often low, which can make it
difficult to extract meaningful features from the images.

 Tissue heterogeneity: Breast tissue is inherently heterogeneous, and FNAC samples may
contain multiple cell types, which can make it challenging to accurately classify samples.

 Class imbalance: The number of benign samples is often higher than the number of
malignant samples, resulting in class imbalance. This can lead to biased classification results
and the need for techniques such as data augmentation and weighted loss functions.

 Interpretability: CNN models can be complex and difficult to interpret. Therefore, it is
essential to ensure that the model's predictions are explainable to physicians and patients.

 Computational resources: CNN models can be computationally expensive to train, especially
when working with large datasets. This may require access to high-performance computing
resources, such as GPUs or cloud-based computing.

Overall, when designing a CNN model for breast cancer detection using FNAC, it is
important to consider these design constraints to ensure that the model is accurate and
interpretable.
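
The weighted loss mentioned under the class imbalance constraint can be sketched as a binary cross-entropy with a separate weight per class. The weighting rule used here (inverse class frequency) is one common heuristic and an assumption, not necessarily the scheme used in this project; the labels and predictions are made up.

```python
import numpy as np

def weighted_bce(y_true, p_pred, w_pos, w_neg, eps=1e-7):
    """Mean binary cross-entropy with separate weights for the two classes."""
    p = np.clip(p_pred, eps, 1.0 - eps)   # avoid log(0)
    per_sample = -(w_pos * y_true * np.log(p)
                   + w_neg * (1 - y_true) * np.log(1 - p))
    return per_sample.mean()

# Made-up example: three negatives, one positive that is predicted poorly.
y_true = np.array([0, 0, 0, 1])
p_pred = np.array([0.1, 0.2, 0.1, 0.3])

# Weights inversely proportional to class frequency.
n = len(y_true)
n_pos = y_true.sum()
w_pos = n / (2 * n_pos)          # rare class gets the larger weight
w_neg = n / (2 * (n - n_pos))

plain = weighted_bce(y_true, p_pred, 1.0, 1.0)
weighted = weighted_bce(y_true, p_pred, w_pos, w_neg)
```

With these weights the poorly predicted minority sample contributes more to the loss, so `weighted` is larger than `plain` and training is pushed harder to correct it.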

Chapter 5
IMPLEMENTATION


5.1 DATASET DESCRIPTION

The Wisconsin Breast Cancer Dataset is a widely used dataset in the field of breast cancer
research and machine learning. It was collected by Dr. William H. Wolberg of the University of
Wisconsin Hospitals, Madison, USA.

The dataset contains 569 samples of breast tissue, where each sample is characterized by 30
different features that describe the characteristics of the cell nuclei present in the tissue. All
30 features are real-valued: ten base measurements are each summarized by their mean,
standard error, and worst (largest) value.

The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast
mass. They describe the characteristics of the cell nuclei present in the image, such as their size,
shape, and texture. The features were computed using digital image analysis.

Each sample in the dataset is labeled as either malignant (cancerous) or benign (non-cancerous).
There are 357 benign samples and 212 malignant samples.

The Wisconsin Breast Cancer Dataset is often used as a benchmark dataset in machine learning
research, particularly in the development and evaluation of classification algorithms for breast
cancer diagnosis. It is also used for feature selection and dimensionality reduction techniques to
extract the most relevant features for breast cancer diagnosis.
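
The sample and class counts quoted above can be checked against the copy of this dataset that ships with scikit-learn. Whether this project loaded the data through scikit-learn is not stated, so the loader below is an assumption used only for verification. Note that this copy encodes malignant as 0 and benign as 1.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
x, y = data.data, data.target        # in this copy: 0 = malignant, 1 = benign

n_samples, n_features = x.shape
n_malignant = int((y == 0).sum())
n_benign = int((y == 1).sum())

print(n_samples, n_features, n_benign, n_malignant)
```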


Fine needle aspiration is a type of biopsy procedure. In fine needle aspiration, a thin needle
is inserted into an area of abnormal-appearing tissue or body fluid.
As with other types of biopsies, the sample collected during fine needle aspiration can help
make a diagnosis or rule out conditions such as cancer.
Fine needle aspiration is generally considered a safe procedure. Complications are infrequent.
A fine needle aspiration is most frequently done on swellings or lumps located just under the
skin. A lump may be felt during a doctor's examination, or it may be discovered on an imaging
test such as:
• CT scan
• mammogram
• ultrasound
During a fine needle aspiration (FNA), a small amount of breast tissue or fluid is removed from
a suspicious area with a thin, hollow needle and checked for cancer cells. This type of
biopsy is sometimes an option if other tests show that a patient might have breast cancer
(although a core needle biopsy is often preferred). It might also be used in other situations.

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
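
To show how the geometric features in this list follow from a nucleus outline, the sketch below computes radius, perimeter, area, and compactness for a closed polygon. It is a simplified illustration, not the original feature-extraction software: the contour is a made-up square, and the center is taken as the mean of the contour points, which is an assumption.

```python
import math

def contour_features(points):
    """Compute radius, perimeter, area, and compactness for a closed polygon contour."""
    n = len(points)
    # Assumed center: the mean of the contour points.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Radius: mean distance from the center to the points on the perimeter.
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n

    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]     # wrap around to close the contour
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1  # shoelace formula
    area = abs(twice_area) / 2.0

    compactness = perimeter ** 2 / area - 1.0
    return {"radius": radius, "perimeter": perimeter,
            "area": area, "compactness": compactness}

# Made-up contour: a square with side length 2.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
features = contour_features(square)
```

For the square, the perimeter is 8 and the area is 4, so compactness is 8^2 / 4 - 1 = 15. A circle gives the smallest possible value, 4π - 1, so more irregular outlines score higher.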

5.2 CLASSIFICATION MODULE

5.2.1 CONVOLUTIONAL NEURAL NETWORK

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take
in an input image, assign importance (learnable weights and biases) to various aspects/objects
in the image, and be able to differentiate one from the other. The pre-processing required in a
ConvNet is much lower as compared to other classification algorithms. While in primitive
methods filters are hand-engineered, with enough training, ConvNets have the ability to learn
these filters/characteristics. The architecture of a ConvNet is analogous to that of the
connectivity pattern of Neurons in the Human Brain and was inspired by the organization of the
Visual Cortex. Individual neurons respond to stimuli only in a restricted region of the visual
field known as the Receptive Field. A collection of such fields overlaps to cover the entire
visual area.
Convolutional neural networks are distinguished from other neural networks by their superior
performance with image, speech, or audio signal inputs. They have three main types of layers,
which are:
• Convolutional layer
• Pooling layer
• Fully-connected (FC) layer
The convolutional layer is the first layer of a convolutional network. While convolutional layers
can be followed by additional convolutional layers or pooling layers, the fully-connected layer
is the final layer. With each layer, the CNN increases in its complexity, identifying greater
portions of the image. Earlier layers focus on simple features, such as colors and edges. As the
image data progresses through the layers of the CNN, it starts to recognize larger elements or
shapes of the object until it finally identifies the intended object.

Figure 5.0 Convolutional Neural Network Architecture

A. Convolutional layer

The convolutional layer is the core building block of a CNN, and it is where the majority of the
computation occurs. It requires a few components: input data, a filter, and a feature map. Assume
that the input is a color image, represented as a 3-D matrix of pixels. The input then has three
dimensions (height, width, and depth), where the depth corresponds to the RGB channels of the
image. A feature detector, also known as a kernel or a filter, moves across the receptive fields
of the image, checking whether the feature is present. This process is known as a convolution.

The feature detector is a two-dimensional (2-D) array of weights, which represents part of the
image. While they can vary in size, the filter size is typically a 3x3 matrix; this also determines
the size of the receptive field. The filter is then applied to an area of the image, and a dot
product is calculated between the input pixels and the filter. This dot product is then fed into an
output array. Afterwards, the filter shifts by a stride, repeating the process until the kernel has
swept across the entire image. The final output from the series of dot products from the input
and the filter is known as a feature map, activation map, or a convolved feature.
After each convolution operation, a CNN applies a Rectified Linear Unit (ReLU)
transformation to the feature map, introducing nonlinearity to the model.
As mentioned earlier, another convolutional layer can follow the initial convolutional layer. When
this happens, the structure of the CNN can become hierarchical, as the later layers can see the
pixels within the receptive fields of prior layers. As an example, assume that we are trying to
determine whether an image contains a bicycle. The bicycle can be thought of as a sum of parts,
one of which is its frame.

Each individual part of the bicycle makes up a lower-level pattern in the neural net, and the
combination of its parts represents a higher-level pattern, creating a feature hierarchy within the
CNN.
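
The sliding dot product and the ReLU step described in this subsection can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: the 5x5 single-channel image and the hand-picked vertical-edge filter are made-up values, there is no padding, and a trained CNN would learn its filter weights rather than have them fixed by hand.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a 2-D image (no padding) and return the feature map.

    Like most CNN libraries, this does not flip the kernel.
    """
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the filter and the patch it currently covers.
            total = 0
            for a in range(k):
                for b in range(k):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map

def relu(feature_map):
    """Replace every negative response with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

# Illustrative 5x5 image: bright (1) and dark (0) vertical bands.
image = [[1, 1, 0, 0, 1] for _ in range(5)]

# Hand-picked 3x3 filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d(image, kernel)   # 3x3 output for a 5x5 input and 3x3 filter
activated = relu(feature_map)
print(feature_map)
print(activated)
```

The filter gives a negative response where the image goes from bright to dark and a positive response where it goes from dark to bright; ReLU keeps only the positive responses.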

B. Pooling layer

Pooling layers, also known as downsampling layers, perform dimensionality reduction, reducing the
number of parameters in the input. Similar to the convolutional layer, the pooling operation
sweeps a filter across the entire input, but the difference is that this filter does not have any
weights. Instead, the kernel applies an aggregation function to the values within the receptive
field, populating the output array. There are two main types of pooling:

Max Pooling: As the filter moves across the input, it selects the pixel with the maximum value
to send to the output array. This approach tends to be used more often than average pooling.

Average Pooling: As the filter moves across the input, it calculates the average value within the
receptive field to send to the output array.
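
The two pooling variants can be compared on a small example. The sketch below assumes a 2x2 window with stride 2, which is a common choice but is not stated in this report, and the 4x4 feature map holds made-up values.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2-D feature map with a weight-free max or average window."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values inside the current window.
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled

# Illustrative 4x4 feature map.
fm = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

max_pooled = pool2d(fm, mode="max")
avg_pooled = pool2d(fm, mode="average")
print(max_pooled)
print(avg_pooled)
```

Both variants shrink the 4x4 map to 2x2; max pooling keeps the strongest response in each window, while average pooling smooths the window into a single mean value.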

C. Fully connected layer


The name of the fully-connected layer aptly describes itself. As mentioned earlier, the pixel
values of the input image are not directly connected to the output layer in partially connected
layers. However, in the fully-connected layer, each node in the output layer connects directly to
every node in the previous layer.
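
A fully-connected layer can be sketched as one weighted sum per output node, taken over all input nodes. The weights, bias, and input values below are made up for illustration; the sigmoid at the end corresponds to the single-output binary classifier this report describes.

```python
import math

def dense(inputs, weights, biases):
    """Each output node takes a weighted sum over ALL input nodes, plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

flattened = [0.0, 3.0, 1.0]     # flattened feature-map values (illustrative)
weights = [[0.5, -1.0, 2.0]]    # one output node, one weight per input node
biases = [1.0]

logit = dense(flattened, weights, biases)[0]   # 0.5*0 - 1*3 + 2*1 + 1
probability = sigmoid(logit)
print(logit, probability)
```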
5.2.2 ANALYSIS TOOL
A confusion matrix is a tool used to evaluate the performance of a machine learning model,
including CNN models, for classification tasks such as breast cancer detection. It provides a
tabular representation of the model's predictions compared to the true labels.
Based on the values in the confusion matrix, you can calculate various evaluation metrics such
as accuracy, precision, recall (sensitivity), specificity, and F1 score. These metrics provide
insights into the performance of the CNN model for breast cancer detection.
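
As a sketch of how the four cells of a binary confusion matrix are obtained, the function below compares true labels with predicted labels. The "M"/"B" labels, the choice of "M" (malignant) as the positive class, and the six sample predictions are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred, positive="M"):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1      # predicted positive, actually positive
            else:
                fp += 1      # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1      # predicted negative, actually positive
            else:
                tn += 1      # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Illustrative labels, not project data.
y_true = ["M", "M", "B", "B", "B", "M"]
y_pred = ["M", "B", "B", "M", "B", "M"]

counts = confusion_counts(y_true, y_pred)
print(counts)
```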

5.3 PSEUDOCODE

1. Start
2. Load the Wisconsin dataset
3. Split the dataset into training and testing sets
4. Preprocess the data by scaling and normalizing the input values
5. Define the CNN architecture:
   a. Add a convolutional layer with ReLU activation
   b. Add a max pooling layer
   c. Add another convolutional layer with ReLU activation
   d. Add another max pooling layer
   e. Flatten the output of the previous layer
   f. Add a fully connected layer with ReLU activation
   g. Add a dropout layer to prevent overfitting
   h. Add a final fully connected layer with sigmoid activation
6. Compile the model using binary cross-entropy loss and Adam optimizer
7. Train the model on the training set for a certain number of epochs
8. Evaluate the model on the testing set and calculate accuracy, precision, recall, and F1-score
9. Use the trained model to predict on new, unseen data
10. End
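
To make step 5 more concrete, the sketch below traces how the data shape would change through the listed layers. All hyperparameters are assumptions, since the report does not state them: the input is treated as a 1-D sequence of 30 feature values, the convolutions use kernel size 3 without padding, pooling uses size 2, and the filter counts and dense width are placeholders.

```python
def out_length(length, window, stride=1):
    """Output length of an unpadded 1-D convolution or pooling window."""
    return (length - window) // stride + 1

def trace_shapes(n_features=30, filters=(32, 64), kernel=3, pool=2, dense_units=64):
    """Return (layer name, output shape) pairs for the architecture in step 5."""
    shapes = [("input", (n_features, 1))]
    length = n_features
    channels = 1
    for idx, n_filters in enumerate(filters, start=1):
        # Steps 5a / 5c: convolution with ReLU changes the length and channel count.
        length, channels = out_length(length, kernel), n_filters
        shapes.append((f"conv{idx} + ReLU", (length, channels)))
        # Steps 5b / 5d: max pooling shortens the sequence, channels unchanged.
        length = out_length(length, pool, stride=pool)
        shapes.append((f"max pool{idx}", (length, channels)))
    shapes.append(("flatten", (length * channels,)))      # step 5e
    shapes.append(("dense + ReLU", (dense_units,)))       # step 5f
    shapes.append(("dropout", (dense_units,)))            # step 5g: shape unchanged
    shapes.append(("dense + sigmoid", (1,)))              # step 5h: one probability
    return shapes

shapes = trace_shapes()
for name, shape in shapes:
    print(f"{name:16s} {shape}")
```

Under these assumptions, the 30 input values shrink to a length of 6 with 64 channels before flattening, and the final layer emits a single value that the sigmoid turns into a probability.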

CHAPTER 6
TESTING

Testing is an essential step in evaluating the performance and effectiveness of a CNN model for
breast cancer detection using the FNAC dataset. The testing process consists of the following steps:
• Preprocessing: Apply any necessary preprocessing steps to the FNAC images before feeding
them into the CNN model.
• Model Selection: Choose the appropriate CNN architecture for the breast cancer detection
task.
• Training: Train the CNN model on the training subset of the FNAC dataset.
• Evaluation: Once training is complete, evaluate the model's performance on the
testing subset of the FNAC dataset.
• Confusion Matrix: Create a confusion matrix to analyze the model's performance in more
detail. The confusion matrix reports the true positives, true negatives, false
positives, and false negatives, showing where the model performs well and where it
struggles.
6.1 Functional testing
Functional testing verifies that a system or software application behaves as expected and meets its
specified functional requirements.
Test Planning: Define the scope of functional testing, identify the functionalities to be tested, and
determine the test objectives.
Test Design: Create test cases and test scenarios based on the functional requirements and
specifications. Test cases outline specific inputs, actions, and expected outputs for each test scenario.
Test Case Design: Develop test cases that cover different scenarios relevant to breast cancer
detection. For example:
• Test cases for cancerous FNAC images: Include FNAC images with clear indications of cancerous
cells. Test the model's ability to correctly classify these images as cancerous.
• Test cases for non-cancerous FNAC images: Include FNAC images without any cancerous cells.
Verify that the model accurately identifies them as non-cancerous.
• Test cases for challenging images: Include FNAC images with subtle indications of cancerous cells
or images that can be easily misinterpreted. Verify the model's performance in these challenging
scenarios.
6.2 Unit testing
Unit testing is an essential part of software development and helps ensure the accuracy and reliability
of the code.

Understand the CNN model: Familiarize yourself with the architecture, layers, and overall
functioning of the CNN model used in the breast cancer detection system. This will help you identify
the individual components that need to be tested.
Identify testable units: Break down the CNN model into smaller functional units that can be tested
independently. For example, you might have units for data preprocessing, convolutional layers,
pooling layers, fully connected layers, etc.
Write test cases: Create test cases for each unit based on the expected behavior and outputs. For
instance, you can use sample input data and verify that the output from each unit matches the
expected results.
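
A unit test for one such unit might look like the sketch below, which uses Python's built-in unittest module. The min_max_scale function is a hypothetical preprocessing unit written for this example; it is not taken from the project code.

```python
import unittest

def min_max_scale(values):
    """Hypothetical preprocessing unit: rescale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

class MinMaxScaleTest(unittest.TestCase):
    def test_known_values(self):
        # Sample input with a known expected output.
        self.assertEqual(min_max_scale([10.0, 15.0, 20.0]), [0.0, 0.5, 1.0])

    def test_constant_column(self):
        # Edge case: avoid division by zero when all values are equal.
        self.assertEqual(min_max_scale([7.0, 7.0]), [0.0, 0.0])

# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MinMaxScaleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```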
6.3 Integration testing
Integration testing validates that the components or modules of the breast cancer detection
system, built on the FNAC (Fine Needle Aspiration Cytology) dataset, work together correctly.
Identify the components: Determine the different components or modules involved in the breast
cancer detection system. This may include data preprocessing, feature extraction, machine learning
algorithms, and result interpretation modules.
Define interfaces: Clearly define the inputs, outputs, and interactions between the components. This
includes specifying the data formats, APIs, or other mechanisms for data exchange between the
modules.
Develop test cases: Create a set of test cases that cover various scenarios and potential interactions
between the components. Test cases should include positive and negative cases to validate both
correct and incorrect behavior of the system.
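
The sketch below shows one positive and one negative integration case for a pipeline of three components. Every function here is hypothetical: in particular, predict_probability is a fixed formula standing in for the trained CNN, so its output has no diagnostic meaning and only the interfaces between the components are being exercised.

```python
import math

def preprocess(raw_fields):
    """Parse text fields into floats; blank or non-numeric input is rejected."""
    try:
        return [float(field) for field in raw_fields]
    except ValueError as exc:
        raise ValueError(f"invalid feature value: {exc}") from exc

def predict_probability(features):
    """Stand-in for the trained CNN (NOT the real model): a fixed logistic formula."""
    score = sum(features) / len(features) - 50.0
    return 1.0 / (1.0 + math.exp(-score))

def interpret(probability, threshold=0.5):
    """Turn a probability into the label shown to the user."""
    return "Malignant" if probability >= threshold else "Benign"

def diagnose(raw_fields):
    """Full pipeline: preprocessing -> model -> result interpretation."""
    return interpret(predict_probability(preprocess(raw_fields)))

# Positive case: well-formed input flows through all three components.
label = diagnose(["17.99", "10.38", "122.8"])

# Negative case: a blank entry field must be rejected before reaching the model.
try:
    diagnose(["17.99", "", "122.8"])
    rejected = False
except ValueError:
    rejected = True

print(label, rejected)
```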
6.4 System testing
System testing evaluates the overall performance and functionality of the breast cancer detection
system, which applies a CNN (Convolutional Neural Network) model to the FNAC dataset. It focuses
on the integration and interaction between the CNN model and the other system components.
Set up the testing environment: Prepare the required infrastructure and dependencies for running the
system tests. This includes setting up the hardware, software, and libraries necessary to execute the
CNN model and other associated components.
Prepare the test dataset: Select a representative subset of the FNAC dataset that includes a variety of
breast cytology samples, covering both malignant and benign cases. Split the dataset into training,
validation, and testing subsets, ensuring that each subset has a balanced representation of different
classes.
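
One way to keep the class proportions equal across the three subsets is a stratified split, sketched below with the standard library only. The 70/15/15 fractions, the fixed seed, and the 60 benign / 40 malignant label list are assumptions for illustration, not values from the project.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return train/validation/test index lists that preserve class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)          # fixed seed for a repeatable split
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]   # remainder goes to the test subset
    return train, val, test

# Illustrative label list: 60 benign and 40 malignant samples.
labels = ["B"] * 60 + ["M"] * 40
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Because each class is split separately, every subset keeps the same 60:40 ratio as the full label list.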
6.5 Test cases

Test case ID        : 1
Test case input     : FNAC image of a breast tissue sample with malignant cells
Test case objective : Accurately identify and classify breast cancer cases as positive.
Expected result     : The CNN model should classify the image as "Malignant".
Actual result       : The CNN model classifies the image as "Malignant".
Remarks             : The model correctly identifies the presence of malignant cells,
                      indicating a positive diagnosis.

Table 6.0 Test case 1 for positive diagnosis

Test case ID        : 2
Test case input     : FNAC image of a breast tissue sample without malignant cells
Test case objective : Accurately identify and classify non-cancerous cases as negative.
Expected result     : The CNN model should classify the image as "Benign".
Actual result       : The CNN model classifies the image as "Benign".
Remarks             : The model correctly identifies the absence of malignant cells,
                      indicating a negative diagnosis.

Table 6.1 Test case 2 for negative diagnosis

CHAPTER 7

RESULTS

Figure 7.0 Welcome window

The welcome window provides a brief introduction to the breast cancer detection software,
explaining its purpose and how it can assist in early detection or diagnosis.
It is the initial screen that appears when the interface is opened. It introduces the interface,
guides users through essential setup steps, and offers a Start button, which the user presses to
begin predicting the diagnosis and calculating metrics.

Figure 7.1 Window with labels, entry fields, and Predict, Calculate Metrics, Reset and Information buttons

This window contains labels such as radius mean, texture mean, perimeter mean, area mean,
smoothness mean and compactness mean, each with an entry field for its value. Pressing the Predict
button after entering the values gives the predicted diagnosis. The Calculate Metrics button
displays the confusion matrix along with precision, accuracy, recall and F1 score. The Reset
button clears the entered values so that new values can be typed in. The Information button
displays basic information about breast cancer, including the difference between benign and
malignant.

Figure 7.2 Window with input values entered

Once this window is displayed, the user types the value for each label into its entry field.

Figure 7.3 Prediction of diagnosis

After the input values have been entered, pressing the Predict button gives the predicted
diagnosis, which is either Benign [B] or Malignant [M].

Figure 7.4 Calculate metrics

After the diagnosis has been predicted, pressing the Calculate Metrics button displays the
precision, accuracy, recall and F1 score.
Precision ranges from 0 to 1; a higher value means that the model identifies positive samples
correctly while producing few false positives. A precision of 1 means that every sample predicted
as positive is a true positive. Our precision score is 0.971, which means that 97.1% of the
samples the model predicted as positive were actually positive.
A recall score of 0.971 indicates that the model correctly identified 97.1% of the actual positive
cases (breast cancer) in the dataset.
Accuracy measures the overall correctness of the model's predictions. An accuracy score of 0.964
indicates that the model correctly classified 96.4% of all cases, both positive (breast cancer)
and negative (non-cancerous).
The F1 score is the harmonic mean of precision and recall, providing a single metric that combines
both measures. It is particularly useful when false positives (which lower precision) and false
negatives (which lower recall) are considered equally important. An F1 score of 0.971 indicates
that the model achieves a good balance between precision and recall.

Figure 7.5 Confusion Matrix

Based on this confusion matrix, our model calculated various evaluation metrics such as
accuracy, precision, recall (sensitivity), specificity, and F1 score to assess the performance of the
CNN model in breast cancer detection.
In our case, we have:
• True Positive (TP): 61
• True Negative (TN): 104
• False Positive (FP): 3
• False Negative (FN): 3
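
The counts above can be checked against the scores reported under Figure 7.4 with the standard formulas. The sketch below evaluates them twice, once with the 61 cases as the positive class and once with the 104 cases as the positive class, because the report does not state which class the project code treated as positive; the function and variable names are illustrative.

```python
def metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts exactly as listed above (TP = 61, TN = 104).
as_listed = metrics(tp=61, tn=104, fp=3, fn=3)

# Same matrix with the two classes swapped (TP = 104, TN = 61).
classes_swapped = metrics(tp=104, tn=61, fp=3, fn=3)

for name, scores in (("as listed", as_listed), ("classes swapped", classes_swapped)):
    print(name, {key: round(value, 4) for key, value in scores.items()})
```

Accuracy is 165/171, about 0.965, in both cases. With the counts as listed, precision, recall and F1 each evaluate to 61/64, about 0.953, whereas the reported value of 0.971 corresponds to 104/107, about 0.972, which is obtained when the class with 104 correct predictions is taken as positive.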

Figure 7.6 Information Window

Pressing the Information button displays this window, which gives information about breast
cancer, the precautions to take, and the difference between benign and malignant.

CHAPTER 8

CONCLUSION AND FUTURE ENHANCEMENTS

Breast cancer is a serious health issue that affects millions of people around the world. With the
advancements in technology, convolutional neural networks (CNNs) have shown great potential in
improving breast cancer detection, and they make it possible to use FNA reports to detect breast
cancer accurately.

In conclusion, CNNs can be effective in detecting breast cancer using FNA reports with high
accuracy. By analyzing the FNA reports, the model can identify the presence of malignant cells and
provide valuable information to assist doctors in diagnosing and treating patients with breast cancer
at an early stage, leading to better treatment outcomes and potentially saving lives.

However, despite the promising results, there are still limitations to using CNNs for breast cancer
detection. One of the major challenges is the limited availability of high-quality medical imaging
data for training and testing the models. Additionally, there is a risk of bias in the models if the data
used to train them is not diverse enough, leading to inaccurate predictions.

Overall, while there is still much work to be done to optimize and improve CNNs for breast cancer
detection, the use of CNNs on FNA reports offers great potential for improving diagnostic
accuracy, aiding early diagnosis and treatment, and reducing the need for invasive procedures.
Further research and development in this area could lead to significant advancements in the
diagnosis and treatment of breast cancer.

In the future, there is scope for detecting the stage of the cancer using the features from the
FNA. Advances in sensors, contrast agents, molecular methods, and artificial intelligence will
also help detect cancer-specific signals in real time. To reduce the burden of cancer on society,
risk-based detection and prevention need to be cost-effective and widely accessible.