Breast

TABLE OF CONTENTS
CHAPTER TITLE

ABSTRACT
1 INTRODUCTION
1.1 SYSTEM OVERVIEW
1.2 SCOPE OF THE PROJECT
2 LITERATURE SURVEY
2.1 Bacterial colony counting with Convolutional Neural Networks in Digital Microbiology Imaging
2.2 A Dataset for Breast Cancer Histopathological Image Classification
2.3 Breast Cancer Multi-classification from Histopathological Images with Structured Deep Learning Model
2.4 Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis
2.5 Gland segmentation in colon histology images using hand-crafted features and convolutional neural networks
2.6 Context-aware stacked convolutional neural networks for classification of breast carcinomas in whole-slide histopathology images
3 SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.1.1 Disadvantages
3.2 PROPOSED SYSTEM
3.2.1 Advantages
3.3 SYSTEM REQUIREMENTS
3.3.1 Hardware Requirement
3.3.2 Software Requirement
4 SYSTEM ARCHITECTURE
4.1 ARCHITECTURE DESCRIPTION
5 SYSTEM IMPLEMENTATION
5.1 LIST OF MODULES
5.1.1 Dataset
5.1.2 Data preprocessing
5.1.3 Classification
5.1.4 Comparative performance analysis
REFERENCES
ABSTRACT
Cancer has been characterized as a heterogeneous disease consisting of many different
subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer
research, as it can facilitate the subsequent clinical management of patients. The importance of
classifying cancer patients into high or low risk groups has led many research teams, from the
biomedical and the bioinformatics field, to study the application of machine learning (ML)
methods. These techniques have therefore been used to model the progression and
treatment of cancerous conditions. In addition, the ability of ML tools to detect key features from
complex datasets reveals their importance. A variety of these techniques, including Artificial
Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and
Decision Trees (DTs) have been widely applied in cancer research for the development of
predictive models, resulting in effective and accurate decision making. Even though it is evident
that the use of ML methods can improve our understanding of cancer progression, an appropriate
level of validation is needed in order for these methods to be considered in the everyday clinical
practice. In this work, we present a review of recent ML approaches employed in the modeling
of cancer progression. The predictive models discussed here are based on various supervised ML
techniques as well as on different input features and data samples. Given the growing trend in
the application of ML methods in cancer research, we present here the most recent publications
that employ these techniques to model cancer risk or patient outcomes.
CHAPTER 1
INTRODUCTION
1.1 SYSTEM OVERVIEW
ARTIFICIAL INTELLIGENCE
• “Artificial Intelligence (AI) is the part of computer science concerned with
designing intelligent computer systems, that is, systems that exhibit
characteristics we associate with intelligence in human behaviour –
understanding language, learning, reasoning, solving problems, and so on.”
• Scientific Goal: To determine which ideas about knowledge representation,
learning, rule systems, search, and so on, explain various sorts of real
intelligence.
• Engineering Goal: To solve real world problems using AI techniques such as
knowledge representation, learning, rule systems, search, and so on.
• Traditionally, computer scientists and engineers have been more interested
in the engineering goal, while psychologists, philosophers and cognitive
scientists have been more interested in the scientific goal.
• The Roots: Artificial Intelligence has identifiable roots in a number of older
disciplines, particularly
  • Philosophy
  • Logic/Mathematics
  • Computation
  • Psychology/Cognitive Science
  • Biology/Neuroscience
  • Evolution
1.2 MACHINE LEARNING
Machine learning is a subset of artificial intelligence in the field of computer
science that often uses statistical techniques to give computers the ability to "learn"
(i.e., progressively improve performance on a specific task) with data, without
being explicitly programmed.
1.2.1 Machine Learning Approaches
• Supervised Learning: learning with a labelled training set
  • Example: email spam detector with training set of labelled emails
• Unsupervised Learning: discovering patterns in unlabelled data
  • Example: cluster similar documents based on the text content
• Reinforcement Learning: learning based on feedback or reward
  • Example: learn to play chess by winning or losing
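The difference between the first two approaches can be illustrated with a minimal sketch. The one-dimensional feature values below are made up, and the helper names (fit_centroids, predict, two_means) are hypothetical; the sketch only illustrates learning with and without labels and is not the method used in this project.

```python
from statistics import mean

def fit_centroids(values, labels):
    """Supervised: use the labels to compute one centroid per class."""
    classes = sorted(set(labels))
    return {c: mean(v for v, l in zip(values, labels) if l == c) for c in classes}

def predict(centroids, value):
    """Assign a new value to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(centroids[c] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabelled values into two clusters (1-D k-means, k=2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        a = [v for v in values if abs(v - low) <= abs(v - high)]
        b = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(a), mean(b)
    return sorted(a), sorted(b)

# Made-up feature values (for example, a "spamminess" score per email).
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

centroids = fit_centroids(values, labels)   # learning from labelled examples
print(predict(centroids, 4.5))              # class of a new, unseen value
print(two_means(values))                    # groups discovered without labels
```

The supervised part needs the labels to place its class centroids, while the clustering part recovers the same two groups from the values alone but cannot name them.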
1.3 CONVOLUTIONAL NEURAL NETWORK
CNNs use a variation of multilayer perceptrons designed to require minimal
preprocessing. They are also known as Shift Invariant or Space Invariant
Artificial Neural networks (SIANN), based on their shared-weights architecture
and translation invariance characteristics.
Convolutional networks were inspired by biological processes in that the
connectivity pattern between neurons resembles the organization of the animal
visual cortex. Individual cortical neurons respond to stimuli only in a restricted
region of the visual field known as the receptive field. The receptive fields of
different neurons partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other image classification
algorithms. This means that the network learns the filters that in traditional
algorithms were hand-engineered. This independence from prior knowledge and
human effort in feature design is a major advantage.
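The shared-weights idea can be sketched in a few lines of plain Python. The function name conv2d, the hand-set edge filter and the toy images are assumptions made for illustration; in a real CNN the filter weights are learned, and a library such as Keras would be used instead of nested loops.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide one shared kernel over the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hand-set vertical-edge filter; in a CNN these weights would be learned.
kernel = [[-1, 1],
          [-1, 1]]

# A 4x5 image with a vertical edge between columns 1 and 2 ...
image = [[0, 0, 1, 1, 1] for _ in range(4)]
# ... and the same image shifted one pixel to the right.
shifted = [[0, 0, 0, 1, 1] for _ in range(4)]

print(conv2d(image, kernel))    # response peaks at output column 1
print(conv2d(shifted, kernel))  # same response, moved to output column 2
```

Each output value depends only on a 2x2 patch of the input (its receptive field), and because the same kernel is reused at every position, shifting the input simply shifts the response.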
Over the past decades, cancer research has evolved continuously.
Scientists have applied different methods, such as early-stage screening, in
order to find types of cancer before they cause symptoms. Moreover, they have
developed new strategies for the early prediction of cancer treatment outcome.
With the advent of new technologies in the field of medicine, large amounts of
cancer data have been collected and are available to the medical research
community. However, the accurate prediction of a disease outcome is one of the
most interesting and challenging tasks for physicians. As a result, ML methods
have become a popular tool for medical researchers. These techniques can discover
and identify patterns and relationships between them, from complex datasets, while
they are able to effectively predict future outcomes of a cancer type. Given the
significance of personalized medicine and the growing trend on the application of
ML techniques, we here present a review of studies that make use of these methods
regarding the cancer prediction and prognosis. In these studies prognostic and
predictive features are considered which may be independent of a certain treatment
or are integrated in order to guide therapy for cancer patients, respectively [2]. In
addition, we discuss the types of ML methods being used, the types of data they
integrate, the overall performance of each proposed scheme while we also discuss
their pros and cons. An obvious trend in the proposed works includes the
integration of mixed data, such as clinical and genomic. However, a common
problem that we noticed in several works is the lack of external validation or
testing regarding the predictive performance of their models. It is clear that the
application of ML methods could improve the accuracy of cancer susceptibility,
recurrence and survival prediction. Based on [3], the accuracy of cancer outcome
prediction has improved significantly, by 15%–20%, in recent years with the
application of ML techniques. Several studies have been reported in the literature
and are based on different strategies that could enable the early cancer diagnosis
and prognosis [4–7]. Specifically, these studies describe approaches related to the
profiling of circulating miRNAs that have been proven a promising class for
cancer detection and identification. However, these methods suffer from low
sensitivity regarding their use in screening at early stages and their difficulty to
discriminate benign from malignant tumors. Various aspects regarding the
prediction of cancer outcome based on gene expression signatures are discussed in
[8,9]. These studies list the potential as well as the limitations of microarrays for
the prediction of cancer outcome. Even though gene signatures could significantly
improve our ability for prognosis in cancer patients, little progress has been made
in applying them in the clinic. Before gene expression profiling can
be used in clinical practice, studies with larger data samples and more adequate
validation are needed. In the present work only studies that employed ML
techniques for modeling cancer diagnosis and prognosis are presented.
ML TECHNIQUES
ML, a branch of Artificial Intelligence, relates the problem of learning
from data samples to the general concept of inference [10–12]. Every learning
process consists of two phases: (i) estimation of unknown dependencies in a
system from a given dataset and (ii) use of estimated dependencies to predict new
outputs of the system. ML has also been proven an interesting area in biomedical
research with many applications, where an acceptable generalization is obtained by
searching through an n-dimensional space for a given set of biological samples,
using different techniques and algorithms [13]. There are two main common types
of ML methods known as (i) supervised learning and (ii) unsupervised learning. In
supervised learning a labeled set of training data is used to estimate or map the
input data to the desired output. In contrast, under the unsupervised learning
methods no labeled examples are provided and there is no notion of the output
during the learning process. As a result, it is up to the learning scheme/model to
find patterns or discover the groups of the input data. In supervised learning this
procedure can be thought as a classification problem. The task of classification
refers to a learning process that categorizes the data into a set of finite classes. Two
other common ML tasks are regression and clustering. In the case of regression
problems, a learning function maps the data into a real-value variable.
Subsequently, for each new sample the value of a predictive variable can be
estimated, based on this process. Clustering is a common unsupervised task in
which one tries to find the categories or clusters in order to describe the data items.
Based on this process each new sample can be assigned to one of the identified
clusters concerning the similar characteristics that they share. Suppose for example
that we have collected medical records relevant to breast cancer and we try to
predict if a tumor is malignant or benign based on its size. The ML question would
then be to estimate the probability that the tumor is malignant (1
= Yes, 0 = No). Fig. 1 depicts the classification process of a tumor being malignant
or not. The circled records depict any misclassification of the type of a tumor
produced by the procedure. Another type of ML method that has been widely
applied is semi-supervised learning, which is a combination of supervised and
unsupervised learning. It combines labeled and unlabeled data in order to construct
an accurate learning model. Usually, this type of learning is used when there are
more unlabeled data than labeled data. When applying an ML method, data samples
constitute the basic components. Every sample is described with several features
and every feature consists of different types of values. Furthermore, knowing in
advance the specific type of data being used allows the right selection of tools and
techniques that can be used for their analysis. Some data-related issues refer to the
quality of the data and the preprocessing steps to make them more suitable for ML.
Data quality issues include the presence of noise, outliers, missing or duplicate
data and data that is biased or unrepresentative. Improving the data quality
typically also improves the quality of the resulting analysis. In addition, in order
to make the raw data more suitable for further analysis, preprocessing steps should
be applied that focus on the modification of the data. A number of different
techniques and strategies exist, relevant to data preprocessing that focus on
modifying the data for better fitting in a specific ML method. Among these
techniques some of the most important approaches include (i) dimensionality
reduction (ii) feature selection and (iii) feature extraction. There are many benefits
regarding the dimensionality reduction when the datasets have a large number of
features. ML algorithms work better when the dimensionality is lower [14].
Additionally, the reduction of dimensionality can eliminate irrelevant features,
reduce noise and can produce more robust learning models due to the involvement
of fewer features. In general, dimensionality reduction by selecting new
features which are a subset of the old ones is known as feature selection. Three
main approaches exist for feature selection, namely the embedded, filter and wrapper
approaches.
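As an illustration of the filter approach, the following sketch scores each feature independently of any learning model and keeps the best ones. The scoring rule (absolute Pearson correlation with the label), the function names pearson and filter_select, and the sample values are all assumptions chosen for illustration; the text above does not prescribe a particular score.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant feature carries no information; give it a score of 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def filter_select(rows, labels, k):
    """Return the indices of the k features most correlated with the labels."""
    n_features = len(rows[0])
    scores = [
        abs(pearson([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Made-up samples: [tumor size, constant feature, uninformative feature]
rows = [
    [1.0, 7.0, 3.0],
    [1.5, 7.0, 1.0],
    [4.0, 7.0, 2.5],
    [4.5, 7.0, 1.5],
]
labels = [0, 0, 1, 1]   # 0 = benign, 1 = malignant

print(filter_select(rows, labels, k=1))   # keeps only the tumor-size column
```

Because the score is computed per feature before any classifier is trained, this kind of selection is cheap, but it cannot detect features that are only useful in combination.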
1.2 SCOPE OF THE PROJECT
• Accurate prognosis prediction of breast cancer using deep belief
networks on multi-dimensional data
CHAPTER 2
LITERATURE SURVEY
Classification of Breast Cancer Based on Histology Images using
Convolutional Neural Networks
The work is contributed by Dalal Bardou et al. In recent years, the classification of
breast cancer has been the topic of interest in the field of Healthcare informatics,
because it is the second main cause of cancer-related deaths in women. Breast
cancer can be identified using a biopsy where tissue is removed and studied under
microscope. The diagnosis is based on the qualification of the histopathologist,
who will look for abnormal cells. However, if the histopathologist is not well-
trained, this may lead to wrong diagnosis. With the recent advances in image
processing and machine learning, there is an interest in attempting to develop
reliable pattern-recognition-based systems to improve the quality of diagnosis. In
this paper, we compare two machine learning approaches for the automatic
classification of breast cancer histology images into benign and malignant and into
benign and malignant sub-classes. The first approach is based on the extraction of
a set of handcrafted features encoded by two coding models (bag of words and
locality constrained linear coding) and trained by support vector machines, while
the second approach is based on the design of convolutional neural network. We
have also experimentally tested dataset augmentation techniques to enhance the
accuracy of the convolutional neural network, as well as “handcrafted features +
convolutional neural network” and
“convolutional neural network features + classifier” configurations. The results
show convolutional neural networks outperformed the handcrafted feature based
classifier, where we achieved accuracy between 96.15% and 98.33% for the binary
classification and 83.31% and 88.23% for the multi-class classification.
Bacterial colony counting with Convolutional Neural Networks in Digital
Microbiology Imaging
The work is contributed by Alessandro Ferrari et al. With this work
we explore the possibility to find effective
solutions to the above issue by designing and testing two different machine
learning approaches. The first one is based on the extraction of a complete set of
handcrafted morphometric and radiometric features used within a Support Vector
Machines solution. The second one is based on the design and configuration of a
Convolutional Neural Networks deep learning architecture. To validate, in a real
and challenging clinical scenario, the proposed bacterial load estimation
techniques, we built and publicly released a fully labeled large and representative
database of both single and aggregated bacterial colonies extracted from routine
clinical laboratory culture plates. Dataset enhancement approaches have also been
experimentally tested for performance optimization. The adopted deep learning
approach outperformed the handcrafted feature based one, and also a conventional
reference technique, by a large margin, becoming a preferable solution for the
addressed Digital Microbiology Imaging quantification task, especially in the
emerging context of Full Laboratory Automation systems.
A Dataset for Breast Cancer Histopathological Image Classification
The work is contributed by Fabio A. Spanhol et al. Today, medical image analysis
papers require solid experiments to prove the usefulness of proposed methods.
However, experiments are often performed on data selected by the researchers,
which may come from different institutions, scanners, and populations. Different
evaluation measures may be used, making it difficult to compare the methods. In
this paper, we introduce a dataset of 7909 breast cancer histopathology images
acquired from 82 patients, which is now publicly available from
http://web.inf.ufpr.br/vri/breast-cancer-database. The dataset includes both benign
and malignant images. The task associated with this dataset is the automated
classification of these images in two classes, which would be a valuable computer-
aided diagnosis tool for the clinician. In order to assess the difficulty of this task,
we show some preliminary results obtained with state-of-the-art image
classification systems. The accuracy ranges from 80% to 85%, showing that room for
improvement is left. By providing this dataset and a standardized evaluation
protocol to the scientific community, we hope to gather researchers in both the
medical and the machine learning field to advance toward this clinical application.
Breast Cancer Multi-classification from Histopathological Images with
Structured Deep Learning Model
The work is contributed by Zhongyi Han, Benzheng Wei et al. Automated breast
cancer multi-classification from histopathological images plays a key role in
computer-aided breast cancer diagnosis or prognosis. Breast cancer multi-
classification is to identify subordinate classes of breast cancer (Ductal carcinoma,
Fibroadenoma, Lobular carcinoma, etc.). However, breast cancer multi-
classification from histopathological images faces two main challenges: (1)
the greater difficulty of multi-classification methods compared with
the classification of binary classes (benign and malignant), and (2) the subtle
differences in multiple classes due to the broad variability of high-resolution image
appearances, high coherency of cancerous cells, and extensive inhomogeneity of
color distribution. Therefore, automated breast cancer multi-classification from
histopathological images is of great clinical significance yet has never been
explored. Existing works in literature only focus on the binary classification but do
not support further breast cancer quantitative assessment. In this study, we propose
a breast cancer multi-classification method using a newly proposed deep learning
model. The structured deep learning model has achieved remarkable performance
(average 93.2% accuracy) on a large-scale dataset, which demonstrates the strength
of our method in providing an efficient tool for breast cancer multi-classification in
clinical settings.
Deep Convolutional Neural Networks for Breast Cancer Histology Image
Analysis
The work is contributed by Alexander Rakhlin et al. Breast cancer is one
of the main causes of cancer death worldwide. Early diagnostics significantly
increases the chances of correct treatment and survival, but this process is tedious
and often leads to a disagreement between pathologists. Computer-aided diagnosis
systems showed potential for improving the diagnostic accuracy. In this work, we
develop the computational approach based on deep convolution neural networks
for breast cancer histology image classification. Hematoxylin and eosin stained
breast histology microscopy image dataset is provided as a part of the ICIAR 2018
Grand Challenge on Breast Cancer Histology Images. Our approach utilizes
several deep neural network architectures and gradient boosted trees classifier. For
4-class classification task, we report 87.2% accuracy. For 2-class classification task
to detect carcinomas we report 93.8% accuracy, AUC 97.3%, and
sensitivity/specificity 96.5/88.0% at the high-sensitivity operating point. To our
knowledge, this approach outperforms other common methods in automated
histopathological image classification.
Gland segmentation in colon histology images using hand-crafted features and
convolutional neural networks
The work is contributed by Stephen J. McKenna et al. We investigate glandular
structure segmentation in colon histology images as a window-based classification
problem. We compare and combine methods based on fine-tuned convolutional
neural networks (CNN) and hand-crafted features with support vector machines
(HC-SVM). On 85 images of H&E-stained tissue, we find that fine-tuned CNN
outperforms HC-SVM in gland segmentation measured by pixel-wise Jaccard and
Dice indices. For HC-SVM we further observe that training a second-level window
classifier on the posterior probabilities - as an output refinement - can substantially
improve the segmentation performance. The final performance of HC-SVM with
refinement is comparable to that of CNN. Furthermore, we show that by
combining and refining the posterior probability outputs of CNN and HC-SVM
together, a further performance boost is obtained.
Context-aware stacked convolutional neural networks for classification of
breast carcinomas in whole-slide histopathology images
The work is contributed by Babak Ehteshami Bejnordi et al. Currently,
histopathological tissue examination by a pathologist represents the gold standard
for breast lesion diagnostics. Automated classification of histopathological whole-
slide images (WSIs) is challenging owing to the wide range of appearances of
benign lesions and the visual similarity of ductal carcinoma in-situ (DCIS) to
invasive lesions at the cellular level. Consequently, analysis of tissue at high
resolutions with a large contextual area is necessary. We present context-aware
stacked convolutional neural networks (CNN) for classification of breast WSIs into
normal/benign, DCIS, and invasive ductal carcinoma (IDC). We first train a CNN
using high pixel resolution to capture cellular level information. The feature
responses generated by this model are then fed as input to a second CNN, stacked
on top of the first. Training of this stacked architecture with large input patches
enables learning of fine-grained (cellular) details and global tissue structures. Our
system is trained and evaluated on a dataset containing 221 WSIs of hematoxylin
and eosin stained breast tissue specimens. The system achieves an AUC of 0.962
for the binary classification of nonmalignant and malignant slides and obtains a
three-class accuracy of 81.3% for classification of WSIs into normal/benign, DCIS,
and IDC, demonstrating its potential for routine diagnostics.
CHAPTER 3
SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
The existing system refers to work that has already been implemented successfully.
The techniques used in the existing system are described below.
In recent years, the classification of breast cancer has been the topic of interest in
the field of Healthcare informatics, because it is the second main cause of cancer-
related deaths in women. Breast cancer can be identified using a biopsy where
tissue is removed and studied under microscope. The diagnosis is based on the
qualification of the histopathologist, who will look for abnormal cells. However, if
the histopathologist is not well-trained, this may lead to wrong diagnosis. With the
recent advances in image processing and machine learning, there is an interest in
attempting to develop reliable pattern-recognition-based systems to improve the
quality of diagnosis. In this paper, we compare two machine learning approaches
for the automatic classification of breast cancer histology images into benign and
malignant and into benign and malignant sub-classes. The first approach is based
on the extraction of a set of handcrafted features encoded by two coding models
(bag of words and locality constrained linear coding) and trained by support vector
machines, while the second approach is based on the design of convolutional
neural networks. We have also experimentally tested dataset augmentation
techniques to enhance the accuracy of the convolutional neural network as well as
‘‘handcrafted features + convolutional neural network’’ and ‘‘convolutional neural
network features + classifier’’ configurations. The results show convolutional
neural networks outperformed the handcrafted feature based classifier, where we
achieved accuracy between 96.15% and 98.33% for the binary classification and
83.31% and 88.23% for the multi-class classification.
3.1.1 DISADVANTAGES
• The designed CNN topology worked well on both binary and multi-class
classification tasks. However, the performance of the multi-class
classification was lower than that of the binary
classification due to the number of handled classes and also due to the
similarities between the sub-classes.
• The results show convolutional neural networks outperformed the
handcrafted feature based classifier, where they achieved accuracy between
96.15% and 98.33% for the binary classification and 83.31% and 88.23% for
the multi-class classification.
3.2 PROPOSED SYSTEM

The proposed system is developed to overcome the
shortcomings of the existing system by using advanced techniques.
3.2.1 ADVANTAGES
• The proposed work uses a multiple-classifier topology and aims to improve
on the accuracy of the previous work for the multi-class classification task,
which reached between 83.31% and 88.23%.
3.3 SYSTEM REQUIREMENTS
The project requires certain software and hardware resources, which are
mandatory for developing the system accurately. The hardware and software
required to develop the proposed system are listed below:
3.3.1 HARDWARE REQUIREMENTS
Hardware specifications are a technical description of the computer's
components and capabilities, such as processor speed, model and manufacturer.
The hardware components required for the proposed system are:
• Processor : Intel Core i5
• Hard disk : 1 TB
• Speed : 1.80 GHz
• Memory : 4 GB
3.3.2 SOFTWARE REQUIREMENTS

A software requirement specification is a description of a software
system to be developed. It lays out functional and non-functional
requirements, and it also describes the operating system and tools used in the
system, which are:
• Operating system : Ubuntu 16.04 LTS
• Front end : Keras
• Back end : TensorFlow, OpenCV package
CHAPTER 4
SYSTEM ARCHITECTURE
CHAPTER 5
SYSTEM IMPLEMENTATION
5.1 LIST OF MODULES

5.1.1 Dataset
5.1.2 Data Pre-processing
5.1.3 Classification
5.1.4 Comparative Performance Analysis
SYSTEM DESCRIPTION
The proposed system is divided into modules, each of which applies specific
algorithms and techniques. Together, the modules carry out breast cancer
classification based on histological images and present the results as
graphs.
5.1.1 DATASET
A dataset is a collection of data. Most commonly a data set corresponds to
the contents of a single database table, or a single statistical data matrix, where
every column of the table represents a particular variable, and each row
corresponds to a given member of the data set in question. The data set lists values
for each of the variables, such as height and weight of an object, for each member
of the data set. Each value is known as a datum. The data set may comprise data
for one or more members, corresponding to the number of rows. The term data set
may also be used more loosely, to refer to the data in a collection of closely related
tables, corresponding to a particular experiment or event. The collected data is stored
in the data warehouse.
5.1.2 DATA PREPROCESSING
Pre-processing is defined as the removal of errors in the data. It transforms
raw and unstructured data into structured data. Source data is collected and
stored in the data warehouse and is preprocessed to extract consistent data.
TYPES OF PREPROCESSING:
• RGB image
• Gray scale image
• Binary image
RGB COLOR MODEL
The RGB color model is an additive color model in which red, green and
blue light are added together in various ways to reproduce a broad array of colors.
The main purpose of the RGB color model is for the sensing, representation and
display of images in electronic systems, such as televisions and computers, though
it has also been used in conventional photography.
GRAY SCALE MODEL
Gray scale transformations can be performed using look-up tables. Gray
scale transformations are mostly used if the result is viewed by a human. Gray scale
transformations do not depend on the position of the pixel in the image.
BINARY IMAGE MODEL
A binary image is a digital image that has only two possible values for each
pixel. Typically, the two colors used for a binary image are black and white. The
color used for the object in the image is the foreground color while the rest of the
image is the background color. In the document-scanning industry, this is often
referred to as "bi-tonal".
Binary images are also called bi-level or two-level. This means that each pixel is
stored as a single bit (i.e., a 0 or 1). The names black-and-white, B&W,
monochrome or monochromatic are often used for this concept, but may also designate any
images that have only one sample per pixel, such as grayscale images.
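The three image models above can be connected by a short sketch that converts an RGB image to gray scale, applies a look-up table, and thresholds the result to a binary image. The luminance weights (0.299, 0.587, 0.114), the threshold of 128, the choice of 1 for bright pixels, and the function names are assumptions made for illustration; in practice the OpenCV package listed in the software requirements would normally be used for these steps.

```python
def to_grayscale(rgb_image):
    """Map each (R, G, B) pixel to one intensity using assumed luminance weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

def apply_lut(gray_image, lut):
    """Gray scale transformation: look each intensity up in a 256-entry table."""
    return [[lut[p] for p in row] for row in gray_image]

def to_binary(gray_image, threshold=128):
    """Binary image: 1 where intensity >= threshold, else 0 (assumed convention)."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray_image]

# Made-up 2x2 RGB image: white, black, pure red, pure green.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(255, 0, 0), (0, 255, 0)]]

gray = to_grayscale(rgb)
negative = apply_lut(gray, [255 - v for v in range(256)])  # inversion table
binary = to_binary(gray)

print(gray)      # one sample per pixel
print(negative)  # same image after a position-independent look-up
print(binary)    # one bit per pixel
```

The look-up step depends only on each pixel's value and not on its position, which is the property of gray scale transformations described above.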
PROCESSED DATA
The process converts the data from an unstructured format to a
structured form, which supports the classification of breast cancer.
5.1.3 CLASSIFICATION
Classification is a general process related to categorization, the process in
which ideas and objects are recognized, differentiated, and understood. In this
case, the features are collected and classified by two different methods:
• Feature based classification
• Convolutional Neural Networks (CNN)
POOLING LAYERS
Convolutional networks may include local or global pooling layers, which
combine the outputs of neuron clusters at one layer into a single neuron in the next
layer. For example, max pooling uses the maximum value from each of a cluster of
neurons at the prior layer. Another example is average pooling, which uses the
average value from each of a cluster of neurons at the prior layer.
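A minimal sketch of both pooling variants is given below. It assumes non-overlapping square windows and a made-up 4x4 feature map; the function names pool2d and average are hypothetical, and a framework layer would be used in a real network.

```python
def pool2d(feature_map, size, reduce):
    """Apply `reduce` to each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            reduce([feature_map[i + a][j + b]
                    for a in range(size) for b in range(size)])
            for j in range(0, cols - size + 1, size)
        ]
        for i in range(0, rows - size + 1, size)
    ]

def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Made-up activations from a previous layer.
feature_map = [[1, 3, 2, 0],
               [5, 7, 4, 2],
               [0, 2, 8, 6],
               [4, 2, 2, 4]]

print(pool2d(feature_map, 2, max))      # local max pooling
print(pool2d(feature_map, 2, average))  # local average pooling
print(pool2d(feature_map, 4, max))      # global max pooling (one window)
```

Local pooling halves each spatial dimension here, while global pooling reduces the whole map to a single value.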
FULLY CONNECTED LAYERS
Finally, after several convolutional and max pooling layers, the high-level
reasoning in the neural network is done via fully connected layers. Neurons in a
fully connected layer have connections to all activations in the previous layer, as
seen in regular neural networks. Their activations can hence be computed with a
matrix multiplication followed by a bias offset.
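The computation described above, a matrix multiplication followed by a bias offset, can be written out directly. The weights, biases and inputs below are made-up numbers and the function name fully_connected is hypothetical; no activation function is applied in this sketch.

```python
def fully_connected(inputs, weights, biases):
    """Compute weights @ inputs + biases for one fully connected layer."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Three flattened activations feeding two output neurons (made-up numbers).
inputs = [1.0, 2.0, 3.0]
weights = [[1.0, 0.0, -1.0],   # one row of weights per output neuron
           [0.5, 0.5, 0.5]]
biases = [0.5, -1.0]

print(fully_connected(inputs, weights, biases))
```

Every output neuron has its own weight for every input, which is what distinguishes this layer from the shared-weight convolutional layers.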
5.1.4 COMPARATIVE PERFORMANCE ANALYSIS
Precision - Precision is the ratio of correctly predicted positive
observations to the total predicted positive observations. The question that
this metric answers is: of all images labeled as malignant, how many
are actually malignant? High precision relates to a low false positive rate.
Precision = TP / (TP + FP)
where TP is the number of true positives and FP the number of false positives.
Precision is used with recall, the percent of all relevant documents that is
returned by the search. The two measures are sometimes used together in
the F1 Score (or f-measure) to provide a single measurement for a system.
The usage of "precision" in the field of information retrieval differs from the
definition of accuracy and precision within other branches of science and
technology.
Recall (Sensitivity) - Recall is the ratio of correctly predicted positive
observations to all observations that are actually positive (actual class yes).
Recall = TP / (TP + FN)
For example, for a text search on a set of documents, recall is the number of
correct results divided by the number of results that should have been
returned. It can be viewed as the probability that a relevant document is
retrieved by the query.
It is trivial to achieve a recall of 100% by returning all documents in
response to any query. Recall alone is therefore not enough; one must also
measure the number of non-relevant documents returned, for example by
computing the precision.
F1 score - The F1 Score is the harmonic mean of Precision and Recall.
This score therefore takes both false positives and false negatives into
account. Intuitively it is not as easy to understand as accuracy, but F1 is
usually more useful than accuracy, especially with an uneven class
distribution. Accuracy works best if false positives and false negatives have
similar cost. If the costs of false positives and false negatives are very
different, it is better to look at both Precision and Recall.
F1 Score = 2*(Recall * Precision) / (Recall + Precision)
When Precision and Recall are close, this measure is approximately their
average. For two numbers, the harmonic mean coincides with the square of
the geometric mean divided by the arithmetic mean.
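The relation between the harmonic, geometric and arithmetic means can be checked in a short derivation. The symbols P (Precision), R (Recall), G (geometric mean) and A (arithmetic mean) are notation introduced here, and the values 0.8 and 0.6 are hypothetical.

```latex
\begin{align*}
F_1 &= \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P + R}
     = \frac{\left(\sqrt{PR}\right)^2}{\frac{P + R}{2}} = \frac{G^2}{A}, \\
\text{e.g. } P &= 0.8,\ R = 0.6: \quad
F_1 = \frac{2 \cdot 0.8 \cdot 0.6}{0.8 + 0.6} = \frac{0.96}{1.40} \approx 0.686,
\quad A = 0.70 .
\end{align*}
```

The F1 Score (about 0.686) lies slightly below the arithmetic mean (0.70), and the gap widens as Precision and Recall move apart.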
Accuracy - Accuracy is the most intuitive performance measure: it is simply
the ratio of correctly predicted observations to the total observations.
One may think that a model with high accuracy is the best model. Accuracy
is a useful measure, but only for symmetric datasets in which the numbers
of false positives and false negatives are almost the same. Otherwise,
other parameters must be examined to evaluate the performance of the
model.
Accuracy = (TP + TN) / (TP + FP + FN + TN)
(i) True Positives (TP) - These are the correctly predicted positive values:
the actual class is yes and the predicted class is also yes. E.g. taking
malignant as the positive class, the image is actually malignant and the
classifier also predicts malignant.
(ii) True Negatives (TN) - These are the correctly predicted negative values:
the actual class is no and the predicted class is also no. E.g. the image is
actually benign and the classifier also predicts benign.
False positives and false negatives occur when the actual class contradicts
the predicted class.
(iii) False Positives (FP) - The actual class is no but the predicted class is
yes. E.g. the image is actually benign but the classifier predicts malignant.
(iv) False Negatives (FN) - The actual class is yes but the predicted class is
no. E.g. the image is actually malignant but the classifier predicts benign.
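The four counts and the four metrics defined in this section fit together as in the sketch below. It is an illustration, not the project's evaluation code. The label encoding (1 for the positive class, 0 for the negative class), the function names, the sample labels and the choice to return 0.0 when a denominator is zero are all assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted yes, actually yes
            else:
                fp += 1  # predicted yes, actually no
        elif a == positive:
            fn += 1      # predicted no, actually yes
        else:
            tn += 1      # predicted no, actually no
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Compute precision, recall, F1 and accuracy from the four counts."""
    # Zero denominators yield 0.0 here; this convention is an assumption.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels for eight test images (1 = positive, 0 = negative).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 1]
counts = confusion_counts(actual, predicted)  # (3, 2, 2, 1)
metrics = classification_metrics(*counts)
```

For these labels precision is 0.6, recall is 0.75 and accuracy is 0.625, which shows how the measures can diverge on the same predictions.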
Observation of classification
Data Flow Diagram
[Figure (uncaptioned): breast cancer images; analysing, processing; breast cancer classifier; classifying, sensing; classified breast cancer cells.]
DFD 1 :
[Figure: breast cancer images; Preprocessing (acquired data, cleaning, transformation, selection); Feature Extraction (CNN, binary classification, extraction); Classification (binary classification, segregate); Prediction (analyse prediction, find output); classified breast cancer cells.]
DFD 2 :
[Figure: preprocess breast cancer images: load RGB images, convert to gray scale, convert to binary images; classified breast cancer cells.]
DFD 4 :
[Figure: Feature Extraction: select feature, extract nodes, monitoring; classified breast cancer cells.]
DFD 5 :
[Figure: Classification: define CNN layer, set activation class, load image; breast cancer images.]
[Figure (uncaptioned): Prediction: load test images, predict, compare test vs train accuracy.]
DFD 2.1
[Figure: the DFD 1 processes with preprocessing expanded: load images, convert RGB image to grey scale, convert to binary image.]
[Figure (uncaptioned): the DFD 1 processes with preprocessing expanded: load, preprocess data, classify benign or malignant.]
DFD 2.3
[Figure: the DFD 1 processes with feature extraction expanded: select feature, nodes, monitoring.]
[Figure (uncaptioned): the DFD 1 processes with classification expanded: define CNN layer, set activation class, load image.]
DFD 2.5
[Figure: the DFD 1 processes with prediction expanded: load test images, predict, compare test vs train accuracy, find output.]
