Submitted to -
Dr. Ruchika Malhotra
Associate Dean (IRD), DTU
A PROJECT BY
INDEX
I. Timeline and progressive itinerary
II. Introduction
III. Different proposed methods of detection
i. Mammography
ii. Supervised learning techniques
iii. Deep Learning for image classification (CNN)
IV. PHASE 1
i. Mammographic Research
ii. Prior developments in mammography for Breast cancer detection
iii. Why we didn’t proceed with Mammography for Breast Cancer detection
V. PHASE 2
i. Supervised Machine Learning Techniques; A brief overview and
chosen method of progression
ii. Proposed Supervised ML algorithms
iii. Google Colaboratory link of implemented code
VI. PHASE 3
i. Convolutional Neural Network for Breast Cancer Detection
ii. Database Elucidated - BreakHis
iii. Why Histopathological Images are used in place of Mammographic Images
iv. Convolutional Neural Networks
v. Google Colaboratory link of implemented code
vi. Experimental results
vii. Defining parameters for our model
VII. CONCLUSION AND FUTURE SCOPE
TIMELINE AND PROGRESSIVE ITINERARY
A mass can be either benign or malignant. The difference between benign and
malignant tumours is that the benign tumours have round or oval shapes,
while malignant tumours have a partially rounded shape with an irregular
outline. In addition, the malignant mass will appear whiter than any tissue
surrounding it.
With our dataset now available, we used data visualisation techniques to sort and examine the data, then applied classification algorithms to the processed data and computed the accuracy of each.
(The classification algorithms, as well as the corresponding code, are elaborated upon in the sections that follow.)
This led us to a more pragmatic approach, i.e. the use of deep learning to classify our data and hence predict the outcome, i.e. whether the tumour is benign or malignant.
INTRODUCTION
Breast cancer affects one out of eight females worldwide. It is diagnosed by
detecting the malignancy of the cells of breast tissue. Modern medical image
processing techniques work on histopathology images captured by a
microscope, and then analyse them by using different algorithms and
methods. Machine learning algorithms are now being used for processing
medical imagery and pathological tools.
Breast cancer (BC) is one of the most common cancers among women
worldwide, representing the majority of new cancer cases and cancer-related
deaths according to global statistics, making it a significant public health
problem in today’s society.
o The early diagnosis of Breast Cancer can significantly improve the prognosis and chance of survival, as it can promote timely clinical treatment of patients. Further, accurate classification of benign tumours can prevent patients from undergoing unnecessary treatments.
o Thus, the correct diagnosis of Breast Cancer and classification of
patients into malignant or benign groups is the subject of much
research. Because of its unique advantages in critical features detection
from complex BC datasets, machine learning (ML) is widely recognized
as the methodology of choice in BC pattern classification and forecast
modelling.
o The drawback of MRI is that the patient could develop an allergic reaction to the contrast agent, or a skin infection at the site of injection. It may also cause claustrophobia. Masses and microcalcifications (MCs) are two important early signs of the disease.
Classification and data mining methods are an effective way to classify data, especially in the medical field, where they are widely used in diagnosis and analysis to support decision-making.
o In the last few decades, several data mining and machine learning
techniques have been developed for breast cancer detection and
classification, which can be divided into three main stages: pre-
processing, feature extraction, and classification.
intensity distribution, and several methods have been reported to assist
in this process.
Deep Learning to Improve Breast Cancer Detection on
Screening Mammography
Various steps are involved in a Computer aided diagnosis (CAD) system using
a conventional workflow.
Image enhancement
Image enhancement is processing the mammogram images to increase
contrast and suppress noise in order to aid radiologists in detecting the
abnormalities.
THE CLAHE algorithm can be used for image enhancement and can be defined
as follows:
1. Divide the original image into contextual regions of equal size,
2. Apply the histogram equalization on each region,
3. Limit this histogram by the clip level,
4. Redistribute the clipped amount among the histogram, and
5. Obtain the enhanced pixel value by the histogram integration.
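The five steps above can be sketched as follows. This is a simplified single-channel NumPy illustration without the inter-tile interpolation of full CLAHE; the tile grid and clip level are arbitrary choices for illustration, not values from this project.

```python
import numpy as np

def clahe(img, tile=(2, 2), clip_limit=40):
    """Simplified CLAHE: per-tile clipped histogram equalization.
    Assumes an 8-bit image whose sides divide evenly into the tile grid."""
    img = np.asarray(img, dtype=np.uint8)
    out = np.empty_like(img)
    h, w = img.shape
    th, tw = h // tile[0], w // tile[1]
    for i in range(tile[0]):
        for j in range(tile[1]):
            # 1. contextual region of equal size
            region = img[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            # 2. histogram of the region
            hist = np.bincount(region.ravel(), minlength=256)
            # 3. limit the histogram by the clip level
            excess = np.clip(hist - clip_limit, 0, None).sum()
            # 4. redistribute the clipped amount over all bins
            hist = np.minimum(hist, clip_limit) + excess // 256
            # 5. enhanced value via histogram integration (CDF mapping)
            cdf = hist.cumsum()
            cdf = (cdf - cdf.min()) * 255 // max(cdf.max() - cdf.min(), 1)
            out[i * th:(i + 1) * th, j * tw:(j + 1) * tw] = cdf[region]
    return out
```

Production systems would typically use an optimised library implementation (e.g. OpenCV's CLAHE), which also blends neighbouring tiles bilinearly to avoid block artefacts.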
Image segmentation
Image segmentation is used to divide an image into parts having similar features and properties. The main aim of segmentation is to simplify the image by presenting it in an easily analysable way. Some of the most popular image segmentation methodologies are edge-based, fuzzy-theory-based, partial differential equation (PDE), artificial neural network (ANN), threshold-based, and region-based segmentation.
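As one concrete instance of threshold-based segmentation, Otsu's method picks the threshold that maximises the between-class variance of the intensity histogram. The sketch below is a minimal illustration, not code from this project:

```python
import numpy as np

def otsu_threshold(img):
    """Return the intensity threshold maximising between-class variance
    for an 8-bit image (Otsu's method)."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(),
                       minlength=256).astype(float)
    p = hist / hist.sum()                 # normalised histogram
    omega = np.cumsum(p)                  # probability of the "background" class
    mu = np.cumsum(p * np.arange(256))    # cumulative mean
    mu_t = mu[-1]                         # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)      # undefined splits get variance 0
    return int(np.argmax(sigma_b))
```

Pixels at or below the returned threshold form one segment, pixels above it the other.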
Feature extraction
A deep convolutional neural network is used to perform feature extraction. Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing. A characteristic of these large data sets is a large number of variables that require substantial computing resources to process.
Classification
In this step, the ROI is classified as either benign or malignant according to the extracted features. There are many classifier techniques, such as linear discriminant analysis (LDA), artificial neural networks (ANN), binary decision trees, and support vector machines (SVM). Since the problem at hand is a binary classification problem, we chose SVM, which has achieved high classification rates in the breast cancer classification problem.
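A linear SVM of the kind mentioned above can be sketched with a Pegasos-style sub-gradient solver. This is a self-contained toy illustration under the assumption of linearly separable data with labels in {-1, +1}; it is not the project's implementation:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal linear SVM via Pegasos-style stochastic sub-gradient descent.
    X: (n, d) feature matrix, y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)             # decaying learning rate
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                    # hinge loss is active
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                             # only regularisation shrinkage
                w = (1 - eta * lam) * w
    return w, b

def predict(X, w, b):
    """Classify by the sign of the decision function."""
    return np.where(X @ w + b >= 0, 1, -1)
```

In practice a tuned library solver (e.g. scikit-learn's SVC) would be used instead of this sketch.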
Why we didn’t proceed with Mammographic
Images?
PHASE 2
Herein we have applied independent supervised learning algorithms to our (publicly available) dataset and computed the accuracy on the test set for each of the proposed models. The course of our work and the substantiating code are elaborated upon in the section that follows.
CLASSIFICATION USING DIVERSIFIED
SUPERVISED LEARNING ALGORITHMS:
Attribute Information:
1. ID number
2. Diagnosis (M = malignant, B = benign)
Ten real-valued features are computed for each cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension.
OBJECTIVE
1. Data Preparation
2. Encoding Categorical Data
3. Feature Scaling
4. Model Selection
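Steps 2 and 3 can be illustrated on a hypothetical mini-batch of the dataset; the feature values below are made up for illustration, not taken from our runs:

```python
import numpy as np

# Hypothetical mini-batch: diagnosis labels plus two real-valued features.
labels = np.array(["M", "B", "B", "M"])
features = np.array([[17.99, 0.2776],
                     [13.54, 0.0869],
                     [12.45, 0.1098],
                     [20.57, 0.1860]])

# 2. Encoding categorical data: diagnosis -> 1 (malignant) / 0 (benign).
y = (labels == "M").astype(int)

# 3. Feature scaling: standardise each column to zero mean, unit variance,
#    so that no feature dominates purely because of its units.
X = (features - features.mean(axis=0)) / features.std(axis=0)
```

The scaled matrix X and the encoded vector y are then what a model-selection step would consume.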
MEASURING THE ACCURACY
We will use the classification accuracy method to find the accuracy of our models. Classification accuracy is what we usually mean when we use the term accuracy: it is the ratio of the number of correct predictions to the total number of input samples.
CONFUSION MATRIX
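Both measures can be sketched in a few lines; this is a minimal illustration for binary labels, not the project's code:

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for binary labels:
    rows = actual class, columns = predicted class."""
    m = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

def accuracy(y_true, y_pred):
    """Classification accuracy: correct predictions / total samples."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))
```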
THE CODE IS IMPLEMENTED ON GOOGLE
COLABORATORY AND THE LINK FOR THE
SAME IS PROVIDED BELOW:
LINK:
https://colab.research.google.com/drive/1dxNE5P2x79gmpsEsyRaqFOlADP5EvgKo
PHASE 3
HEREIN WE USED HISTOPATHOLOGICAL IMAGES OF THE
BREAST AS THE DATASET TO CLASSIFY TUMOURS AS BENIGN
OR MALIGNANT AND HENCE CONFIRM THE PRESENCE OF
BREAST CANCER.
In deep learning algorithms, a series of tasks are
implemented.
Introduction to the proposed Approach for Image
Classification using CNN
BreakHis Database
o The samples are generated from breast tissue biopsy slides, stained
with haematoxylin and eosin (HE). The samples are collected by
surgical open biopsy (SOB) and prepared for histological study.
o Samples are stained by haematoxylin and eosin and produced by a
standard paraffin process in which specimen infiltration and
embedment are done in paraffin. Images are taken by a Samsung high-
resolution device (SCC-131AN) which is coupled with an Olympus BX-
50 microscopic system equipped with a relay lens with a magnification
of 3.3×. These histopathology images have an RGB (three-channel)
TrueColor (8 bits Red, 8 bits Green, 8 bits Blue) colour coding
scheme.
CONVOLUTION NEURAL NETWORK
A CNN is a modified variety of deep neural network which depends upon the
correlation of neighbouring pixels. It uses randomly initialised patches for
input at the start and modifies them during the training process. Once training
is done, the network uses these modified patches to predict and validate the
result in the testing and validation process.
The CNN architecture has two main types of transformation. The first is
convolution, in which pixels are convolved with a filter or kernel. This step
provides the dot product between image patch and kernel. The width and
height of filters can be set according to the network, and the depth of the
filter is the same as the depth of the input.
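The convolution step described above, i.e. the dot product between an image patch and a kernel, can be sketched as follows. This is a minimal single-channel illustration; like most CNN libraries, it actually computes cross-correlation (the kernel is not flipped):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2-D convolution: slide the kernel over the image and take
    the dot product between each image patch and the kernel."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1   # output height
    ow = (iw - kw) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out
```

For multi-channel input, the kernel would carry a matching depth and the dot product would run over all channels, as noted in the text.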
PROPOSED OUTLINE AND WORKFLOW PROCESS
CNN ARCHITECTURE:
o Input Layer: It loads the input, image data in our case, and produces
the output to feed to the convolutional layers. In our case, the
dimensions of an image are 92x140, with 3 channels (RGB).
Layer Attribute   L1      L2      L3      L4      L5      L6
Type              Conv    Pool    Conv    Pool    Conv    Pool
Channels          32      -       32      -       64      -
Filter Size       3x3     -       3x3     -       3x3     -
Conv. Stride      1x1     -       1x1     -       1x1     -
Pooling Size      -       2x2     -       2x2     -       2x2
Padding Size      same    none    same    none    same    none
Activation        ReLU    -       ReLU    -       ReLU    -
o Flatten Layer: After extracting the pooled feature map and passing
it through dropout layer, flattening is done. It converts the pooled
feature map matrix into a single column which in turn is fed to neural
network for processing.
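Pooling followed by flattening can be illustrated as below; this is a toy single-channel sketch with made-up feature-map values, not output from our network:

```python
import numpy as np

def max_pool2d(x, size=2):
    """2x2 max pooling: keep the largest value in each
    non-overlapping size-by-size window."""
    h, w = x.shape
    h, w = h - h % size, w - w % size          # crop to a whole number of windows
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A hypothetical 4x4 feature map.
fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 5.],
                 [0., 1., 2., 2.],
                 [3., 0., 1., 4.]])

pooled = max_pool2d(fmap)   # -> [[4., 5.], [3., 4.]]
flat = pooled.reshape(-1)   # flattening: [4., 5., 3., 4.]
```

The flattened vector is what gets fed to the dense layers that follow.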
o Dense Layer: A dense layer is also called a fully connected layer; it is
like a hidden layer, except that all the neurons in one layer are fully
connected to the next layer. After the flattening process, the feature map
is passed through the dense layers. Our model makes use of two dense
layers: one after flattening with “ReLU” as the activation function, and
the other as the output layer (where we get the predicted classes) with
the “SoftMax” activation function.
The final figures produced by the neural network should lie between zero
and one, representing the probability of each class. This is achieved by
using the SoftMax activation function in the output layer.
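The dense layers and the SoftMax output described above can be sketched in NumPy; this is a minimal illustration of the operations, not the project's model code:

```python
import numpy as np

def softmax(z):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    e = np.exp(z - z.max())     # subtract max for numerical stability
    return e / e.sum()

def dense(x, W, b, activation=None):
    """Fully connected layer: every input feeds every output neuron."""
    z = x @ W + b
    if activation == "relu":
        return np.maximum(z, 0)
    if activation == "softmax":
        return softmax(z)
    return z
```

A hidden dense layer would use activation="relu"; the output layer uses activation="softmax" so the outputs can be read as class probabilities.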
EXPERIMENTAL RESULTS
THE CODE IS IMPLEMENTED ON GOOGLE
COLABORATORY AND THE LINK FOR THE
SAME IS PROVIDED BELOW:
LINK:
https://colab.research.google.com/drive/1DfcOSUq6qEj5LU2wo9k_SnltP7y_sFaU
DEFINING PARAMETERS FOR OUR MODEL
Precision
Precision is the ratio of correctly predicted positive observations to the
total predicted positive observations. High precision corresponds to a low
false positive rate. Precision is calculated using the following equation:
Precision = TP / (TP + FP)
F1 score
F1 score is the harmonic mean of precision and recall. It is used as a
statistical measure to rate the performance of the classifier; this score
takes both false positives and false negatives into account.
F1 = 2 × (Precision × Recall) / (Precision + Recall)
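These measures can be computed as follows; a minimal sketch for binary labels, assuming at least one positive prediction and one positive sample:

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN),
    F1 = harmonic mean of precision and recall."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```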
CONCLUSION AND FUTURE SCOPE
We are hence compelled to conclude our experimental
investigation at this point.
THANK YOU
A Project By :
o Anisha Gupta
o Diwakar Arora