Cat and Dog Classification Using CNN Fin

MINOR PROJECT REPORT
ON
CAT AND DOG CLASSIFICATION USING CNN
DEVELOPED BY:
VIKAS ARORA (00255102719)
PRAKHAR GUPTA (03855102719)
Under the Guidance of

Mrs. SHRUTY AHUJA
HOD (CSE)
At
MAHAVIR SWAMI INSTITUTE OF TECHNOLOGY

SONEPAT
AFFILIATED TO GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY, DWARKA NEW
DELHI
2019 - 2023
DECLARATION
We VIKAS ARORA (00255102719), PRAKHAR GUPTA (03855102719), of Fourth Year

B.Tech., in the Department of Computer Science and Engineering from MVSIT hereby declare
that the work presented in this report entitled “Cat and Dog classification using CNN”, in
fulfillment of the requirement for the award of the degree Bachelor of Technology in
Computer Science & Engineering, submitted in CSE Department, Mahaveer Swami Institute
of Technology affiliated to Guru Gobind Singh Indraprastha University, New Delhi, during the
academic year 2019-2023 is an authentic record of our own work carried out during our degree
under the guidance of MS. SHRUTI AHUJA. The work reported here has not been submitted
by us for the award of any other degree or diploma.
VIKAS ARORA (00255102719)
PRAKHAR GUPTA (03855102719)

ACKNOWLEDGEMENT
We would like to express our deep gratitude to our guide Ms. SHRUTI AHUJA, faculty of
computer science and engineering, MVSIT, for her valuable guidance and timely suggestions during the
entire duration of our dissertation work, without which this work would not have been possible.
We would also like to convey our deep regards to all other faculty members of MVSIT, who have
bestowed their great effort and guidance at appropriate times, without which it would have been
very difficult for us to finish this work. Finally, we would also like to thank our friends for
their advice and for pointing out our mistakes, and our parents and classmates for their encouragement
throughout our project period. Last but not least, we thank everyone for supporting us directly or
indirectly in completing this project successfully.
ABSTRACT
Image classification is a fundamental problem in computer vision,
and deep learning provides successful results for such machine
learning problems. Many algorithms, such as the minimum distance
algorithm, the K-Nearest Neighbor algorithm, the Nearest Clustering
algorithm, the Fuzzy C-Means algorithm and the Maximum Likelihood
algorithm, are used for image classification. In this report,
image classification is performed using a convolutional neural
network, an approach that became standard after Alex Krizhevsky,
Geoff Hinton and Ilya Sutskever won ImageNet in 2012. Generally,
convolutional neural networks use GPU technology because of the
huge number of computations, but in the proposed method we
build a very small network that can run on a CPU as well.
The network is trained using a subset of the Kaggle Dog-Cat dataset.
The trained classifier can classify a given image as either a cat
or a dog. The same network can be trained with any other dataset
to classify images into one of two predefined classes.
CONTENTS
Declaration
Acknowledgement
Abstract
CHAPTERS
CHAPTER 1. - INTRODUCTION
1.1 Convolutional Neural Network
1.1.1 Convolutional Layer
1.1.2 Pooling Layer
1.1.3 Fully Connected Layer
1.2 AIM & OBJECTIVE
1.3 Conceptual Framework
1.4 Method
CHAPTER 2. – STUDY AND ANALYSIS

2.1 Problem Statement
2.2 Installing Required Packages for Python
2.2.1 NumPy
2.2.2 TensorFlow
2.2.3 Keras
2.3 Import Libraries
2.4 Convolution
2.5 Activation
2.6 Pooling
2.7 Fully Connected
CHAPTER 3. EXPERIMENTAL ANALYSIS AND RESULTS
3.1 Plot Dog and Cat Photos
3.2 Pre-Process Photos into Standard Directories
3.3 Develop a Baseline CNN Model
3.3.1 One Block VGG Model
3.3.2 Two Block VGG Model
3.3.3 Three Block VGG Model
3.4 Image Data Augmentation
3.5 Prepare Final Dataset
3.6 Save Final Model
3.7 Make Prediction
3.8 Data overview
4 CONCLUSION AND FUTURE WORK
5 BIBLIOGRAPHY
CHAPTER 1. INTRODUCTION
1.1. Convolutional Neural Network
Artificial Intelligence has been witnessing a monumental growth in bridging the gap
between the capabilities of humans and machines. Researchers and enthusiasts alike
work on numerous aspects of the field to make amazing things happen. One of many such
areas is the domain of Computer Vision.
The agenda for this field is to enable machines to view the world as humans do, perceive
it in a similar manner and even use the knowledge for a multitude of tasks such as Image
& Video recognition, Image Analysis & Classification, Media Recreation, Recommendation
Systems, Natural Language Processing, etc. The advancements in Computer Vision with
Deep Learning have been constructed and perfected with time, primarily over one
particular algorithm — a Convolutional Neural Network.
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which
can take in an input image, assign importance (learnable weights and biases) to various
aspects/objects in the image and be able to differentiate one from the other. The pre-
processing required in a ConvNet is much lower as compared to other classification
algorithms. While in primitive methods filters are hand-engineered, with enough training,
ConvNets have the ability to learn these filters/characteristics.
The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons
in the Human Brain and was inspired by the organization of the Visual Cortex. Individual
neurons respond to stimuli only in a restricted region of the visual field known as the
Receptive Field. A collection of such fields overlaps to cover the entire visual area.
Convolutional neural networks are distinguished from other neural networks by their
superior performance with image, speech, or audio signal inputs. They have three main
types of layers, which are:
• Convolutional layer
• Pooling layer
• Fully-connected (FC) layer
The convolutional layer is the first layer of a convolutional network. While convolutional
layers can be followed by additional convolutional layers or pooling layers, the fully-
connected layer is the final layer. With each layer, the CNN increases in its complexity,
identifying greater portions of the image. Earlier layers focus on simple features, such as
colors and edges. As the image data progresses through the layers of the CNN, it starts
to recognize larger elements or shapes of the object until it finally identifies the intended
object.
1.1.1 Convolutional Layer
The convolutional layer is the core building block of a CNN, and it is where the majority
of computation occurs. It requires a few components, which are input data, a filter, and a
feature map. Let's assume that the input will be a color image, which is made up of a
matrix of pixels in 3D. This means that the input will have three dimensions (height,
width, and depth), where the depth corresponds to the three RGB channels of the image.
We also have a feature detector, also known as a kernel or a filter, which moves across
the receptive fields of the image, checking whether the feature is present. This process
is known as a convolution, and its output is the feature map.
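The sliding-filter idea described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in the project: it assumes a single-channel image, no padding and a stride of 1, and the function name convolve2d and the hand-written edge filter are hypothetical. As in most deep learning libraries, the filter is applied without flipping it.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field: the patch of the image currently under the filter
            patch = image[i:i + kh, j:j + kw]
            # Element-wise multiply and sum (a dot product of patch and filter)
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# 4x4 image: dark left half, bright right half
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# Hypothetical hand-written filter that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)  # the middle column is 2, the others are 0
```

The feature map is largest exactly where the filter sits over the edge. In a trained CNN the filter values are learned rather than written by hand.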
1.1.2 Pooling Layer

Pooling layers, also known as downsampling layers, perform dimensionality reduction,
reducing the number of parameters in the input. Similar to the convolutional layer, the
pooling operation sweeps a filter across the entire input, but the difference is that this filter
does not have any weights. Instead, the kernel applies an aggregation function to the
values within the receptive field, populating the output array. There are two main types of
pooling:
• Max pooling: As the filter moves across the input, it selects the pixel with the
maximum value to send to the output array. As an aside, this approach tends to
be used more often compared to average pooling.
• Average pooling: As the filter moves across the input, it calculates the
average value within the receptive field to send to the output array.
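Both pooling types can be illustrated with a small NumPy sketch. It assumes non-overlapping 2x2 windows on a single-channel feature map, and the helper name pool2d is hypothetical.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Pool a 2D feature map with non-overlapping size x size windows."""
    h, w = feature_map.shape
    out_h, out_w = h // size, w // size
    # Crop so the dimensions divide evenly, then group values into windows
    windows = feature_map[:out_h * size, :out_w * size].reshape(out_h, size, out_w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "average":
        return windows.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

feature_map = np.array([[1, 3, 2, 0],
                        [5, 7, 4, 6],
                        [9, 8, 1, 1],
                        [2, 4, 3, 5]], dtype=float)

max_pooled = pool2d(feature_map, mode="max")      # [[7, 6], [9, 5]]
avg_pooled = pool2d(feature_map, mode="average")  # [[4, 3], [5.75, 2.5]]
print(max_pooled)
print(avg_pooled)
```

The 4x4 input becomes a 2x2 output in both cases, and no weights are involved.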
1.1.3 Fully-Connected Layer
The name of the fully-connected layer aptly describes itself. As mentioned earlier, the pixel
values of the input image are not directly connected to the output layer in partially
connected layers. However, in the fully-connected layer, each node in the output layer
connects directly to every node in the previous layer.
1.2 AIM & OBJECTIVE
• The main aim of this project is to help achieve an understanding of data such as
images.
• Many large companies use this kind of deep learning at the core of their
services. Facebook uses neural nets for its automatic tagging algorithms, Google
for its photo search, Amazon for its product recommendations, and Instagram
for its search infrastructure. However, the classic use case of these networks
is image processing.
• To learn multiple levels of representation that correspond to different levels of
abstraction, where the levels form a hierarchy of concepts.
• To learn in a supervised and/or unsupervised manner.
• The image given to the system as input will be analyzed and the predicted
result will be given as output. A machine learning algorithm (a Convolutional Neural
Network) is used to classify the image.
1.3 Conceptual Framework:

The project is entirely implemented using Python3. The Conceptual Framework involved
is mainly:
• Keras – TensorFlow backend
• OpenCV – Used to handle image operations
1.4 Method:
Step 1: Getting the Dataset
Step 2: Installing Required Packages [Python 3.6]

1. OpenCV -> Used to handle image operations such as reading, resizing and
reshaping the image
2. NumPy -> The image that is read is stored in a NumPy array
3. TensorFlow -> TensorFlow is the backend for Keras
4. Keras -> Keras is used to implement the CNN
Step 3: How the Model Works?
The dataset contains a large number of images of cats and dogs. Our aim is to make the
model learn the distinguishing features of cats and dogs. Once the model has been
trained, it will be able to classify an input image as either a cat or a dog.
Features Provided:
• Your own images can be tested to verify the accuracy of the model.
• The code can be integrated directly into an existing project or
extended into a mobile application or a website.
• To extend the project to classify different entities, all you need to do is find
a suitable dataset, change the dataset accordingly and retrain the model.
Data structures and Algorithms used in project

• NumPy Array: This powerful and widely used Python data structure
is used to store the pixel values of images.
Tools Used:
• Python Interpreter
• Anaconda Prompt
• Spyder
Applications:
This project gives a general idea of how image classification can be done efficiently.
The project can be extended to various industries where there is considerable
scope for automation, simply by substituting a dataset that is relevant to the problem.
CHAPTER 2. STUDY AND ANALYSIS
A Convolutional Neural Network (CNN) is an algorithm that takes an image as input,
assigns weights and biases to various aspects of the image, and thus differentiates one
image from another. Neural networks can be trained using batches of images, each
having a label that identifies the true class of the image (cat or dog here). A batch can
contain a few tens to hundreds of images. For every image, the network
prediction is compared with the corresponding label, and the distance between
the network prediction and the truth is evaluated for the whole batch. Then, the network
parameters are modified to reduce this distance, which improves the prediction capability
of the network. The training process continues in the same way for every batch.
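The batch-by-batch procedure can be sketched on a toy problem. The example below is only an illustration of the loop structure, not the CNN itself: it assumes a single-layer logistic model, synthetic random features standing in for images, and hypothetical values for the learning rate, batch size and number of epochs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for image features: 200 samples with 5 features each
X = rng.normal(size=(200, 5))
# Label is 1 when the features sum to a positive number, otherwise 0
y = (X.sum(axis=1) > 0).astype(float)

def predict(X, w, b):
    """Predicted probability of class 1 for each sample."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def batch_loss(p, y):
    """Binary cross-entropy: the 'distance' between predictions and labels."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

w, b = np.zeros(5), 0.0
learning_rate, batch_size = 0.1, 32
initial_loss = batch_loss(predict(X, w, b), y)

for epoch in range(20):
    order = rng.permutation(len(X))          # shuffle the samples every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = predict(X[idx], w, b) - y[idx]   # prediction minus label
        # Move the parameters in the direction that reduces the batch loss
        w -= learning_rate * X[idx].T @ error / len(idx)
        b -= learning_rate * error.mean()

final_loss = batch_loss(predict(X, w, b), y)
print(initial_loss, final_loss)
```

The loss after training is lower than the loss before training, which is the effect the paragraph above describes.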
2.1 Dogs vs. Cats Prediction Problem Statement
The main goal is to develop a system that can identify images of cats and dogs. The input
image will be analyzed and then the output is predicted. The model that is implemented
can be extended to a website or any mobile device as per the need. The Dogs vs Cats
dataset can be downloaded from the Kaggle website. The dataset contains a set of
images of cats and dogs. Our main aim here is for the model to learn the distinctive
features of cats and dogs. Once the model has been trained, it will be able to
differentiate images of cats from images of dogs.
2.2 Installing Required Packages for Python 3.6

2.2.1. NumPy -> [ Image is read and stored in a NumPy array]
2.2.2. TensorFlow -> [ TensorFlow is the backend for Keras]
2.2.3. Keras -> [ Keras is used for implementing the CNN]

2.3 Import Libraries
1. NumPy- For working with arrays, linear algebra.
2. Pandas – For reading/writing data
3. Matplotlib – to display images
4. TensorFlow Keras models – Need a model to predict
5. TensorFlow Keras layers – Every neural network needs layers, and a CNN needs
several kinds of layers.
A CNN processes images with the help of matrices of weights known as filters.
In the early layers these detect low-level features such as vertical and horizontal edges.
With each successive layer, the filters recognize progressively higher-level features.

We first initialize the CNN and then compile it.
For compiling the CNN, we use the Adam optimizer.
Adaptive Moment Estimation (Adam) is a method that computes individual adaptive learning
rates for each parameter. For the loss function, we use binary cross-entropy, which compares
each predicted probability with the actual class label (0 or 1) and calculates a penalty
score that grows with the distance from the expected value.
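The loss and the optimizer step can be written out in NumPy to make the description concrete. This is a sketch under stated assumptions: the function names are hypothetical, and the default values for lr, beta1, beta2 and eps are the commonly cited Adam defaults rather than settings taken from this report. In practice Keras performs both computations internally.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean penalty for predicted probabilities against 0/1 labels."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running averages of grad and grad**2."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own effective step size via sqrt(v_hat)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

y_true = np.array([1.0, 0.0])
good = binary_cross_entropy(y_true, np.array([0.9, 0.1]))  # confident and right
bad = binary_cross_entropy(y_true, np.array([0.1, 0.9]))   # confident and wrong
print(good, bad)

param = np.array([1.0, -2.0])
grad = np.array([0.5, -0.01])
new_param, m, v = adam_step(param, grad, np.zeros(2), np.zeros(2), t=1)
print(new_param)
```

Confidently wrong predictions receive a much larger penalty than confidently right ones. On the first Adam step both parameters move by roughly lr even though their gradients differ by a factor of 50, which shows the per-parameter scaling.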
Image augmentation is a method of applying different kinds of transformation to original
images resulting in multiple transformed copies of the same image. The images are
different from each other in certain aspects because of shifting, rotating, flipping
techniques. So, we are using the Keras ImageDataGenerator class to augment our
images.
2.4 Convolution
Convolution is a linear operation involving the multiplication of a set of weights with the
input. The multiplication is performed between an array of input data and a 2D array of
weights known as a filter or kernel. The filter is smaller than the input data, and a dot
product is computed between the filter and each filter-sized patch of the input.

2.5 Activation
The activation function is added to help the artificial neural network (ANN) learn complex
patterns in the data. Its main purpose is to introduce non-linearity into the neural network.
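A short NumPy sketch shows why the non-linearity matters. ReLU is used here as an assumption, since it is a common choice for hidden layers, and the weight matrices are arbitrary illustrative values. Sigmoid is the activation this report uses for the output node.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Arbitrary illustrative weights for two stacked layers
W1 = np.array([[1.0, -1.0],
               [0.5, 2.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, 2.0])

# Without an activation, two layers collapse into one linear layer
linear_stack = W2 @ (W1 @ x)
single_layer = (W2 @ W1) @ x

# With ReLU in between, the result can no longer be written as one matrix
with_relu = W2 @ relu(W1 @ x)

print(linear_stack, single_layer, with_relu)
print(relu(np.array([-2.0, 0.0, 3.0])), sigmoid(0.0))
```

The first two results are identical, while the third differs because ReLU zeroed the negative intermediate value.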
2.6 Pooling
The pooling operation provides spatial invariance, making the system capable of
recognizing an object even when its appearance varies somewhat. It involves sliding a
2D filter over each channel of the feature map and summarizing the features lying in the
region covered by the filter.
Pooling therefore helps reduce the number of parameters and computations
in the network. It progressively reduces the spatial size of the representation and thus
helps control overfitting. There are two types of operation in this layer: average pooling
and maximum pooling. Here, we use max pooling which, as its name suggests, takes only
the maximum from each pool. The filter slides across the input
and, at each stride, the maximum value is kept and the rest are dropped.
Unlike the convolution layer, the pooling layer does not modify the depth of its input.
2.7 Fully Connected
The output of the final pooling layer is flattened and becomes the input of the fully
connected layer.
The full connection process works in practice as follows:
Each neuron in the fully connected layer detects a certain feature and preserves
its value, then communicates this value to both the dog and cat classes, which
weigh the feature according to how relevant it is to them.
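The flatten-then-connect step can be sketched with plain matrix arithmetic. The layer sizes, the random weights and the helper names below are hypothetical and chosen only to keep the example small. A trained network would have learned weights instead.

```python
import numpy as np

def fully_connected(x, weights, bias):
    """Every output node is a weighted sum over every input value."""
    return weights @ x + bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical output of the final pooling layer: 2 x 2 spatial size, 2 channels
pooled = np.array([[[0.2, 0.8], [0.5, 0.1]],
                   [[0.9, 0.3], [0.4, 0.7]]])

flat = pooled.reshape(-1)                 # flatten to a vector of 8 values

rng = np.random.default_rng(1)            # random stand-ins for learned weights
hidden_w = rng.normal(scale=0.1, size=(4, flat.size))
hidden_b = np.zeros(4)
output_w = rng.normal(scale=0.1, size=(1, 4))
output_b = np.zeros(1)

hidden = np.maximum(0.0, fully_connected(flat, hidden_w, hidden_b))  # ReLU
probability = sigmoid(fully_connected(hidden, output_w, output_b))[0]
print(flat.shape, hidden.shape, probability)
```

The single output value lies between 0 and 1 and can be read as the model's estimate that the image belongs to class 1.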

The Dogs vs. Cats dataset is a standard computer vision dataset that involves
classifying photos as either containing a dog or cat.
Although the problem sounds simple, it was only effectively addressed in the last
few years using deep learning convolutional neural networks. While the dataset is
effectively solved, it can be used as the basis for learning and practicing how to
develop, evaluate, and use convolutional deep learning neural networks for image
classification from scratch.
This includes how to develop a robust test harness for estimating the performance
of the model, how to explore improvements to the model, and how to save the
model and later load it to make predictions on new data.
The dogs vs cats dataset refers to a dataset used for a Kaggle machine learning
competition held in 2013.
The dataset comprises photos of dogs and cats provided as a subset of
photos from a much larger dataset of 3 million manually annotated photos.
The photos are labeled by their filename, with the word “dog” or “cat“. The file
naming convention is as follows:
CHAPTER 3. EXPERIMENTAL ANALYSIS AND RESULTS
3.1 Plot Dog and Cat Photos

Looking at a few random photos in the directory, we can see that the photos are color
and have different shapes and sizes.
For example, let’s load and plot the first nine photos of dogs in a single figure.
The complete example is listed below.
Running the example creates a figure showing the first nine photos of dogs in the dataset.
We can see that some photos are landscape format, some are portrait format, and some
are square.
We can update the example and change it to plot cat photos instead; the complete
example is listed below.
Again, we can see that the photos are all different sizes.
We can also see a photo where the cat is barely visible (bottom left corner) and another
that has two cats (lower right corner). This suggests that any classifier fit on this problem
will have to be robust.
3.2 Pre-Process Photos into Standard Directories

Rather than loading every image into memory at once, we can load the images
progressively using the Keras ImageDataGenerator
class and flow_from_directory() API. This will be slower to execute but will run on more
machines.
This API prefers data to be divided into separate train/ and test/ directories, and under
each directory to have a subdirectory for each class, e.g. train/dog/ and
train/cat/ subdirectories, and the same for test. Images are then organized under the
subdirectories.
We can create directories in Python using the makedirs() function and use a loop to
create the dog/ and cat/ subdirectories for both the train/ and test/ directories.
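The directory layout can be produced with the standard library alone. The sketch below rests on several assumptions: the function name organise_dataset is hypothetical, filenames are assumed to begin with the word "dog" or "cat", and about 20% of the photos are held back for the test set. The demonstration uses empty placeholder files with made-up names in a temporary folder instead of the real photos.

```python
import os
import random
import shutil
import tempfile

def organise_dataset(src_dir, dst_dir, test_ratio=0.2, seed=1):
    """Copy photos into dst_dir/{train,test}/{dog,cat}/ based on their filename."""
    for subset in ("train", "test"):
        for label in ("dog", "cat"):
            os.makedirs(os.path.join(dst_dir, subset, label), exist_ok=True)
    rng = random.Random(seed)              # fixed seed gives a repeatable split
    for name in sorted(os.listdir(src_dir)):
        if name.startswith("dog"):
            label = "dog"
        elif name.startswith("cat"):
            label = "cat"
        else:
            continue                       # skip files that carry no label
        subset = "test" if rng.random() < test_ratio else "train"
        shutil.copyfile(os.path.join(src_dir, name),
                        os.path.join(dst_dir, subset, label, name))

# Demonstration with empty placeholder files (hypothetical names)
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "raw")
    dst = os.path.join(workdir, "dataset")
    os.makedirs(src)
    for label in ("dog", "cat"):
        for i in range(5):
            open(os.path.join(src, f"{label}.{i}.jpg"), "w").close()
    organise_dataset(src, dst)
    counts = {(subset, label): len(os.listdir(os.path.join(dst, subset, label)))
              for subset in ("train", "test") for label in ("dog", "cat")}

print(counts)
```

Every labelled file ends up in exactly one of the four class folders, which is the layout flow_from_directory() expects.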
3.3 Develop a Baseline CNN Model

In this section, we can develop a baseline convolutional neural network model for the
dogs vs. cats dataset.
A baseline model will establish a minimum model performance to which all of our other
models can be compared, as well as a model architecture that we can use as the basis
of study and improvement.
The architecture involves stacking convolutional layers with small 3×3 filters followed by
a max pooling layer. Together, these layers form a block, and these blocks can be
repeated where the number of filters in each block is increased with the depth of the
network such as 32, 64, 128, 256 for the first four blocks of the model. Padding is used
on the convolutional layers to ensure that the height and width of the output feature
maps match those of the inputs.
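The effect of stacking such blocks can be worked out with simple arithmetic. The sketch below assumes a 200x200 RGB input, one 3×3 convolution with "same" padding per block and 2×2 max pooling. The input size and pooling size are assumptions, and the function name is hypothetical.

```python
def vgg_block_summary(input_shape=(200, 200, 3), filters=(32, 64, 128)):
    """Return output shape and convolution parameter count for each block."""
    height, width, channels = input_shape
    rows = []
    for n_filters in filters:
        # 3x3 convolution with 'same' padding: height and width are unchanged.
        # Parameters = 3*3 weights per input channel per filter, plus one bias per filter.
        conv_params = 3 * 3 * channels * n_filters + n_filters
        channels = n_filters
        # 2x2 max pooling halves height and width and adds no parameters.
        height, width = height // 2, width // 2
        rows.append({"filters": n_filters,
                     "output_shape": (height, width, channels),
                     "conv_params": conv_params})
    return rows

summary = vgg_block_summary()
for row in summary:
    print(row)
```

Under these assumptions the feature maps shrink from 200x200 to 100x100, 50x50 and 25x25, while the depth grows to 32, 64 and 128 channels.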
We can explore this architecture on the dogs vs cats problem and compare a model with
this architecture with 1, 2, and 3 blocks.
We can create a function named define_model() that will define a model and return it
ready to be fit on the dataset. This function can then be customized to define different
baseline models, e.g. versions of the model with 1, 2, or 3 VGG style blocks.
The model will be fit with stochastic gradient descent and we will start with a conservative
learning rate of 0.001 and a momentum of 0.9.
The problem is a binary classification task, requiring the prediction of one value of either
0 or 1. An output layer with 1 node and a sigmoid activation will be used and the model
will be optimized using the binary cross-entropy loss function.
Below is an example of the define_model() function for defining a convolutional neural
network model for the dogs vs. cats problem with one VGG-style block.
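A define_model() function along these lines might look as follows. This is a sketch, not the exact code used in the project. The 3×3 filters, "same" padding, filter counts of 32, 64 and 128, the single sigmoid output node, the binary cross-entropy loss and the SGD settings come from the text. The 200x200 input size, the ReLU activations, the 2×2 pooling window, the 128-node hidden layer and the n_blocks parameter are assumptions made for illustration.

```python
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

def define_model(n_blocks=1, input_shape=(200, 200, 3)):
    """Build and compile a CNN with n_blocks VGG-style blocks."""
    model = Sequential()
    model.add(Input(shape=input_shape))
    for i in range(n_blocks):
        # Block i: one 3x3 convolution (32, 64, 128, ... filters) then max pooling
        model.add(Conv2D(32 * 2 ** i, (3, 3), activation="relu", padding="same"))
        model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation="relu"))     # assumed hidden layer size
    model.add(Dense(1, activation="sigmoid"))    # one node: 0 = cat, 1 = dog
    model.compile(optimizer=SGD(learning_rate=0.001, momentum=0.9),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling define_model(n_blocks=2) or define_model(n_blocks=3) would give the two-block and three-block variants discussed in the following sections.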
The complete example of evaluating a one-block baseline model on the dogs and cats
dataset is listed below.
3.3.1 One Block VGG Model
The one-block VGG model has a single convolutional layer with 32 filters followed by a
max pooling layer.
The define_model() function for this model was defined in the previous section but is
provided again below for completeness.
3.3.2 Two Block VGG Model
The two-block VGG model extends the one block model and adds a second block with
64 filters.
The define_model() function for this model is provided below for completeness.
3.3.3 Three Block VGG Model

The three-block VGG model extends the two block model and adds a third block with 128
filters.
The define_model() function for this model is provided below for completeness.
3.4 Image Data Augmentation

Image data augmentation is a technique that can be used to artificially expand the size of
a training dataset by creating modified versions of images in the dataset.
Training deep learning neural network models on more data can result in more skillful
models, and the augmentation techniques can create variations of the images that can
improve the ability of the fit models to generalize what they have learned to new images.
Data augmentation can also act as a regularization technique, adding noise to the training
data, and encouraging the model to learn the same features, invariant to their position in
the input.
Small changes to the input photos of dogs and cats might be useful for this problem, such
as small shifts and horizontal flips. These augmentations can be specified as arguments
to the ImageDataGenerator used for the training dataset. The augmentations should not
be used for the test dataset, as we wish to evaluate the performance of the model on the
unmodified photographs.
This requires separate ImageDataGenerator instances for the train and test
datasets, and then iterators for the train and test sets created from the respective data
generators.
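The two augmentations mentioned above, small shifts and horizontal flips, can be illustrated directly in NumPy. This is a stand-in for what ImageDataGenerator does internally, not its actual implementation: the helper names are hypothetical, the 10% shift limit is an assumption, and vacated pixels are filled with zeros for simplicity.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an H x W x C image left to right."""
    return image[:, ::-1, :]

def shift(image, dy, dx):
    """Shift an image by dy rows and dx columns, filling vacated pixels with 0."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def augment(image, rng, max_shift=0.1):
    """Return a randomly flipped and shifted copy of a training image."""
    h, w = image.shape[:2]
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    dy = int(rng.integers(-int(h * max_shift), int(h * max_shift) + 1))
    dx = int(rng.integers(-int(w * max_shift), int(w * max_shift) + 1))
    return shift(image, dy, dx)

rng = np.random.default_rng(0)
# Small synthetic image standing in for a training photo
image = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
augmented = augment(image, rng)
print(augmented.shape)
```

Only training images would be passed through augment(); test images are left unmodified so that evaluation reflects the original photographs.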
3.5 Prepare Final Dataset

A final model is typically fit on all available data, such as the combination of all train and
test datasets.
In this report, we demonstrate the final model fit only on the training dataset, as we
only have labels for the training dataset.
The first step is to prepare the training dataset so that it can be loaded by
the ImageDataGenerator class via the flow_from_directory() function. Specifically, we need
to create a new directory with all training images organized
into dogs/ and cats/ subdirectories without any separation into train/ or test/ directories.
This can be achieved by updating the script developed earlier in this report.
In this case, we will create a new finalize_dogs_vs_cats/ folder
with dogs/ and cats/ subfolders for the entire training dataset.
The structure will look as follows:
3.6 Save Final Model
We are now ready to fit a final model on the entire training dataset.
The complete example of fitting the final model on the training dataset and saving it to file
is listed below.
3.7 Make Prediction

We can use our saved model to make a prediction on new images.
The model assumes that new images are color and they have been segmented so that
one image contains at least one dog or cat.
Below is an image extracted from the test dataset for the dogs and cats competition. It
has no label, but we can clearly tell it is a photo of a dog. You can save it in your current
working directory with the filename ‘sample_image.jpg‘.
We will pretend this is an entirely new and unseen image, prepared in the required way,
and see how we might use our saved model to predict the integer that the image
represents. For this example, we expect class “1” for “Dog“.
First, we can load the image and force its size to 224×224 pixels. The loaded
image can then be reshaped to represent a single sample in a dataset. The pixel values must
also be centered to match the way that the data was prepared during the training of the
model. The load_image() function implements this and will return the loaded image ready
for classification.
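The three preparation steps (resize, reshape to a single sample, center) can be sketched in NumPy on an image that has already been read into an array. This is not the report's load_image() function. The name prepare_image is hypothetical, nearest-neighbour sampling stands in for a proper image-library resize, and the channel means are placeholder values (the per-channel ImageNet means often used with VGG-style models). The values actually used must match whatever was applied during training.

```python
import numpy as np

def prepare_image(pixels, size=224, channel_means=(123.68, 116.779, 103.939)):
    """Resize an H x W x 3 array to size x size, add a batch axis and center it."""
    h, w = pixels.shape[:2]
    # Nearest-neighbour resize: pick evenly spaced source rows and columns
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = pixels[rows][:, cols].astype("float32")
    # Reshape to a dataset containing a single sample: (1, size, size, 3)
    batch = resized.reshape(1, size, size, 3)
    # Center each channel (placeholder means; must match the training setup)
    return batch - np.array(channel_means, dtype="float32")

# Uniform grey 300 x 400 image standing in for a loaded photo
photo = np.full((300, 400, 3), 128, dtype=np.uint8)
sample = prepare_image(photo)
print(sample.shape, sample.dtype)
```

The result has the four-dimensional shape that a Keras model's predict() function expects for a batch of one image.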
Next, we can load the model as in the previous section and call the predict() function to
predict the content of the image as a number between “0” and “1”, where values near “0”
indicate “cat” and values near “1” indicate “dog”.
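Turning the predicted number into a class label is a simple thresholding step. The sketch below assumes a cut-off of 0.5, and the function name interpret_prediction is hypothetical.

```python
def interpret_prediction(probability, threshold=0.5):
    """Map the sigmoid output to (class index, class name)."""
    if probability >= threshold:
        return 1, "dog"
    return 0, "cat"

print(interpret_prediction(0.93))  # (1, 'dog')
print(interpret_prediction(0.08))  # (0, 'cat')
```

A prediction close to 0.5 means the model is unsure, so the threshold could be adjusted if one kind of mistake matters more than the other.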
The complete example is listed below.

Running the example first loads and prepares the image, loads the model, and then
correctly predicts that the loaded image represents a ‘dog‘ or class ‘1‘.
3.8 Data overview
The data we collected is a subset of the Kaggle dog/cat dataset. In total, there are 10,000
images: 80% for the training set and 20% for the test set. The training set contains 4,000
images of dogs and the test set contains 1,000 images of dogs; the rest are cats.
All images are saved in a dedicated folder structure, making it easy for Keras to identify
the animal category of each image.
CONCLUSION AND FUTURE WORK
This work aims at classifying images using a Convolutional Neural Network (CNN). With
the optimization possible with a CNN, it is easier to classify images than with
traditional image classification algorithms. With further advances in the study of neural
networks, image classification problems will continue to become easier to
solve. With image classification finding applications in various spheres of life, neural
networks have assumed even more significance. In future, this work can be extended to
real-time image processing in various fields, such as the validation and verification of
real-time images and spoofing detection.
BIBLIOGRAPHY
1: analyticsvidhya.com
2: towardsdatascience.com
3: geeksforgeeks.org
4: google.com
5: kaggle.com
