Project Report
on
Classification of Microcalcifications
Using CNN
Under the guidance of
Dr Satish Kumar Singh
Associate Professor, Department of Information Technology,
Indian Institute of Information Technology (IIIT-Allahabad)
Submitted by
D. Abhishek Reddy (16103109)
The institution was founded in 1999, and in the following year received
university status and the right to award its own degrees. In 2014 the IIIT Act
was passed, under which IIITA and four other Institutes of Information
Technology funded by the Ministry of Human Resource Development were
classed as Institutes of National Importance.
The first director of the institute was M.D. Tiwari, from 1999 to 2013. G.C.
Nandi served as director in-charge for the first four months of 2014, until
Somenath Biswas took over the directorship and served from May 2014 to July
2016. After another stint by Nandi as director that lasted until May 2017,
P. Nagabhushan was appointed director.
I have put considerable effort into this project. However, it would not have
been possible without the kind support and help of many individuals and
organizations. I would like to extend my sincere thanks to all of them.
I would also like to express my special gratitude to the research scholars
who gave me their attention and time.
Abstract
Chapter 1. Introduction
Chapter 3. Design
Chapter 7. Conclusion
Bibliography
ABSTRACT
The project deals with the classification of breast microcalcifications, given multi-view
mammograms, into benign and malignant categories. Early detection matters because many
women fall victim to breast cancer, and outcomes are better when signs of the disease are
caught before it fully develops, so that appropriate precautions can be taken. Here a
pretrained model is taken and fine-tuned on the training set. The dataset used is the
Digital Database for Screening Mammography (DDSM). The model is trained and then tested:
given an input image, it predicts either the benign or the malignant category. One
possible extension, not included in this work, is to make the labels noisy by randomly
shuffling the true labels with some probability and train the model with them, which can
give better results.
1. INTRODUCTION:
BREAST CALCIFICATIONS:
Breast calcifications are small calcium deposits that develop in a woman's
breast tissue. They are very common and are usually benign (noncancerous).
BREAST MICROCALCIFICATIONS:
Microcalcifications are small calcium deposits that look like white spots on
a mammogram.
Microcalcifications are usually not a result of cancer. But if they appear in certain
patterns and are clustered together, they may be a sign of precancerous cells or
early breast cancer.
Why is it important to detect microcalcifications?
Nearly 50% of non-palpable cancers in the breast are detected only by the presence of
microcalcifications on a mammogram.
2. TECHNOLOGY USED:
ARCHITECTURE USED:
The pretrained model is adapted to our dataset: the last layers are removed and a few
new layers are added, fine-tuning it to work on the images that were preprocessed
earlier.
x = mobile.get_layer('conv_pw_13_relu').output
x = Flatten()(x)
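For context, a minimal self-contained sketch of how such a fine-tuned head could be assembled (using tf.keras; weights=None keeps the sketch offline, whereas the project loads weights='imagenet', and the Dense(2) softmax head is an assumption for illustration, not code taken verbatim from the report):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# MobileNet base without its classifier head.
# weights=None avoids the ImageNet download; the report uses 'imagenet'.
base = tf.keras.applications.MobileNet(
    input_shape=(299, 299, 3), include_top=False, weights=None)

# Cut the network at its last pointwise-convolution activation,
# as in the report, then attach a new head.
x = base.get_layer('conv_pw_13_relu').output
x = layers.Flatten()(x)
# Assumed head: 2-way softmax for benign vs. malignant
out = layers.Dense(2, activation='softmax')(x)
model = Model(inputs=base.input, outputs=out)

# Freeze the pretrained layers so only the new head is trained
for layer in base.layers:
    layer.trainable = False
```

Freezing the base layers first and unfreezing some of them later is a common fine-tuning schedule; the report does not state which layers were kept trainable.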
3. DESIGN:
4. RESULTS OBTAINED:
The overall accuracy of the model after the final epoch is 96.8%.
CONFUSION MATRIX AFTER PREDICTION:
OTHER METRICS:
5. PROGRAM DESCRIPTION:
Extract.py: This program extracts images from the Digital Database for
Screening Mammography (DDSM) dataset and saves them to device storage.
Training.py: This performs training on the given dataset. The program was
run on Google Colab. It uses the Adam optimizer for training and also performs
validation. It then tests the model on the test set and draws a confusion matrix
along with other metrics.
6. PROGRAM LISTING:
a) Extract.py:
# Copied from a Kaggle kernel (excerpt; parts of the listing fall on
# pages not reproduced here)
import tensorflow as tf
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

def read_and_decode_single_example(filenames):
    # hold file names in a FIFO queue
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=1)

for i in range(10):
    filename1 = "F:/Extracted Images/training set/0/%d.jpg" % (p + 1)
    filename2 = "F:/Extracted Images/training set/1/%d.jpg" % (p + 1)
    if str(la_b[i]) == '0':
        cv2.imwrite(filename1, im_b[i])  # save benign (class 0) image
    else:
        cv2.imwrite(filename2, im_b[i])  # save malignant (class 1) image
    label.append(la_b[i])
    p = p + 1

coord.request_stop()  # stop the reader threads
b) Training.py:
#For mounting the drive:
from google.colab import drive
drive.mount('/content/drive')
#Import libraries and Mobilenet weights
import numpy as np
import keras
from keras import backend as K
from keras.models import Sequential
from keras.models import Model
from keras.layers import Activation,Input,Dropout
from keras.layers.core import Dense, Flatten
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import *
from keras.models import load_model
import matplotlib.pyplot as plt
from sklearn.metrics import precision_score, recall_score, accuracy_score,f1_score
from sklearn.metrics import confusion_matrix
import itertools
input1 = Input(shape=(299, 299, 3))
mobile = keras.applications.mobilenet.MobileNet(weights='imagenet', include_top=False,
                                                input_tensor=input1, input_shape=None,
                                                pooling=None, classes=2)
# Training the model for 10 epochs
H = model.fit_generator(train_batches, steps_per_epoch=1000,
                        validation_data=valid_batches, validation_steps=800,
                        epochs=10, verbose=1)
"""
This function prints and plots the confusion matrix.
Normalization can be applied by setting `normalize=True`.
"""
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation=45)
plt.yticks(tick_marks, classes)
if normalize:
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
print("Normalized confusion matrix")
else:
print('Confusion matrix, without normalization')
print(cm)
fmt = '.2f' if normalize else 'd'
thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
plt.text(j, i, format(cm[i, j], fmt),
horizontalalignment="center",
color="white" if cm[i, j] > thresh else "black")
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
# plotting the confusion matrix
test_labels = test_batches.classes
cm = confusion_matrix(test_labels, predictions.argmax(axis=1))
test_batches.class_indices  # mapping of class names to label indices
cm_plot_labels = ['0', '1']
plot_confusion_matrix(cm, cm_plot_labels, title='Confusion matrix')
#Showing the metrics
print('precision', precision_score(test_labels, predictions.argmax(axis=1), average='weighted'))
print('recall', recall_score(test_labels, predictions.argmax(axis=1), average='weighted'))
print('accuracy', accuracy_score(test_labels, predictions.argmax(axis=1)))
print('f1-score', f1_score(test_labels, predictions.argmax(axis=1), average='weighted'))
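For intuition, the scores printed by sklearn can be related back to the confusion matrix itself. A small standalone sketch deriving per-class precision and recall with plain numpy (the matrix values below are illustrative, not the project's results):

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class precision/recall and overall accuracy from a
    confusion matrix (rows = true labels, columns = predictions)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correct predictions per class
    precision = tp / cm.sum(axis=0)  # TP / (TP + FP)
    recall = tp / cm.sum(axis=1)     # TP / (TP + FN)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, accuracy

# Illustrative 2x2 matrix for benign (0) vs malignant (1)
cm = [[90, 10],
      [5, 95]]
prec, rec, acc = metrics_from_confusion(cm)
# acc = (90 + 95) / 200 = 0.925, rec[1] = 95 / 100 = 0.95
```

sklearn's `average='weighted'` option additionally weights each class's score by its support (row sum) before averaging.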
FUTURE WORK:
By creating noisy labels: noisy labels are formed by randomly shuffling the original
labels with some probability. Training with noisy labels can improve accuracy, as
shown in the referenced paper.
The size of the dataset can be increased to improve accuracy.
The number of epochs can be increased, taking care that the model does not overfit
the training data.
A good graphical user interface can be built for testing so that the user does not
have to interact with the code.
The model can be adjusted by adding or removing a few layers and re-tested on the
dataset for better results.
The dataset can be tested on various other pretrained models by fine-tuning them for
better accuracy.
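The noisy-label idea can be sketched in a few lines. This version flips each binary label with probability p (a hypothetical helper for illustration; the referenced paper's corruption scheme may differ in detail):

```python
import numpy as np

def add_label_noise(labels, p, seed=0):
    """Flip each 0/1 label with probability p."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    flip = rng.random(labels.shape) < p  # True where the label is corrupted
    return np.where(flip, 1 - labels, labels)

clean = np.array([0, 0, 1, 1, 0, 1])
noisy = add_label_noise(clean, p=0.3)  # train on `noisy`, evaluate on `clean`
```

With p=0 the labels are unchanged and with p=1 every label is flipped; intermediate values simulate annotator unreliability.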
BIBLIOGRAPHY:
1. CNN - https://en.wikipedia.org/wiki/Convolutional_neural_network
2. OpenCV - https://opencv.org
3. Python - https://www.python.org
4. TensorFlow - https://github.com/tensorflow
5. Keras - https://github.com/keras-team, documentation - https://keras.io
6. CNN - cs231n.stanford.edu
7. CNN - https://youtu.be/vT1JzLH4G4 (Stanford University lectures)
8. Referenced paper: "Training a Neural Network Based on Unreliable Human
Annotation of Medical Images" by Yair Dgani, Hayit Greenspan, Jacob Goldberger
9. Dataset: DDSM from www.kaggle.com