
OBJECT DETECTION

MAJOR PROJECT REPORT

Submitted in partial fulfilment of the requirements for the award of the degree

of

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE & ENGINEERING

by

Adyaksh Sharma Shivank Bansal


Enrollment No: 00415602717 Enrollment No: 36215602717

Yash Vardhan Malpani Vishal Singh Bisht


Enrollment No: 06015602717 Enrollment No: 43415602717

Guided by

Ms. Tanvi Rustagi


Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


Dr. AKHILESH DAS GUPTA INSTITUTE OF TECHNOLOGY & MANAGEMENT
(AFFILIATED TO GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY, DELHI)
NEW DELHI – 110053
JULY 2021
CANDIDATE’S DECLARATION

It is hereby certified that the work which is being presented in the B. Tech Major Project
Report entitled "OBJECT DETECTION" in partial fulfilment of the requirements for the
award of the degree of Bachelor of Technology and submitted in the Department of
Computer Science & Engineering of Dr. Akhilesh Das Gupta Institute of Technology &
Management, New Delhi (Affiliated to Guru Gobind Singh Indraprastha University,
Delhi) is an authentic record of our own work carried out during a period from March 2021 to
July 2021 under the guidance of Ms. Tanvi Rustagi, Assistant Professor.

The matter presented in the B. Tech Major Project Report has not been submitted by us for the
award of any other degree of this or any other Institute.

(Adyaksh Sharma) (Shivank Bansal)


(Enrollment No: 00415602717) (Enrollment No: 36215602717)

(Yash Vardhan Malpani) (Vishal Singh Bisht)


(Enrollment No: 06015602717) (Enrollment No: 43415602717)

This is to certify that the above statement made by the candidates is correct to the best of my
knowledge. They are permitted to appear in the External Major Project Examination.

(Ms. Tanvi Rustagi) Prof. (Dr.) Anupam Kumar Sharma


Assistant Professor Head, Department of CSE
The B. Tech Minor Project Viva-Voce Examination of Shivank Bansal (Enrollment No:
36215602717), has been held on ……………………………….

Ms. Pinky Yadav (Signature of External Examiner)


Project Coordinator
ABSTRACT

This report presents a highly accurate object detection model for classifying food items in a
video. The model uses ResNet50 with transfer learning, which lets it gain the benefits of a
larger dataset and reach high accuracy even when only a minimal dataset is available. The loss
function is optimized with stochastic gradient descent, which requires little computational
power and is space-efficient. To improve accuracy further, we randomly resized and flipped the
sample images to reflect real-world conditions.
ACKNOWLEDGEMENT

We express our deep gratitude to Ms. Tanvi Rustagi, Assistant Professor, Department of
Computer Science & Engineering, for her valuable guidance and suggestions throughout our
project work. We are thankful to Ms. Pinky Yadav, Project Coordinator, for her valuable
guidance.

We would like to extend our sincere thanks to Prof. (Dr.) Anupam Kumar Sharma, Head of
Department, for his suggestions from time to time towards completing our project work. We are
also thankful to Prof. (Dr.) Sanjay Kumar, Director, for providing us the facilities to carry
out our project work.

(Adyaksh Sharma) (Shivank Bansal)


(Enrollment No: 00415602717) (Enrollment No: 36215602717)

(Yash Vardhan Malpani) (Vishal Singh Bisht)


(Enrollment No: 06015602717) (Enrollment No: 43415602717)
TABLE OF CONTENTS

CANDIDATE’S DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS

Chapter 1: Introduction
1.1 Introduction
1.1.1 Ideas Based Upon
1.1.2 Modifications
1.2 Objective
1.3 Summary of the Report

Chapter 2: Project Description
Chapter 3: Technology Used
Chapter 4: Implementation
Chapter 5: Testing
Chapter 6: Results and Snapshots of Project
Chapter 7: Conclusions
Chapter 8: Future Scope

REFERENCES
LIST OF FIGURES

Figure 1.1 Different layers in ResNet


Figure 1.2 Sample images of training dataset
Figure 1.3 Graph depicting training accuracy and validation accuracy against epoch
Figure 1.4 Input samples other than the dataset
LIST OF ABBREVIATIONS

SGD Stochastic Gradient Descent

CNN Convolutional Neural Network

CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION

This project focuses on multiclass object detection in video within a single domain. The
approach classifies multiple food classes with high accuracy while using significantly less
computational power. It operates on video input of 300 x 300 resolution, from which the food
items are detected. In the future, the same approach could be extended to detecting real-world
objects such as weapons, which would be especially useful for surveillance.

1.1.1 Ideas Based Upon

Object detection in our project is based upon two key ideas. Each is introduced briefly
below and explained in detail in Chapter 3.

The first is ResNet50, a CNN that can be trained to classify different objects. We used
a pre-trained network, which can be retrained for a new, similar classification problem
in a few steps. We employ ResNet with transfer learning, which allows us to build
highly accurate models with very little training data.

The second is Stochastic Gradient Descent, an optimization strategy that updates the
model according to the output of the loss function. Stochastic Gradient Descent is
often used in classification-based machine learning, particularly when the data is
sparse.

1.1.2 Modifications

To increase accuracy in the testing phase, we added code that randomly resizes and
flips the images in the dataset to reflect real-world conditions. This improves
accuracy even with small datasets.
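
The random flip and resize described above can be illustrated with a minimal sketch. This is not the project's own code: the function names, the nearest-neighbour resizing, the flip probability of 0.5, and the range of side lengths are all assumptions chosen for illustration, and images are represented as plain lists of rows so that the example has no dependencies.

```python
import random


def random_flip(image, rng):
    """Mirror the image left-to-right with probability 0.5 (assumed probability)."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return image


def resize_nearest(image, new_h, new_w):
    """Resize a 2D image (list of rows) using nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def augment(image, rng, min_side=240, max_side=360):
    """Randomly flip the image, then resize it to a random square size.

    min_side and max_side are hypothetical bounds, not values from the report.
    """
    flipped = random_flip(image, rng)
    side = rng.randint(min_side, max_side)
    return resize_nearest(flipped, side, side)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sample = [[1, 2], [3, 4]]  # tiny stand-in for a 300 x 300 image
    result = augment(sample, rng, min_side=3, max_side=5)
    print(len(result), "x", len(result[0]))
```

In practice an augmented image would be brought back to the 300 x 300 input size before being passed to the network; that step is omitted here.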

1.2 OBJECTIVE

● This project focuses on object detection in video using low computational power.

● The framework classifies multiple food classes.

● Towards this end, an object detection system for classifying food into different
classes is built.

● Its detection and classification are highly accurate.

● It operates on 300 x 300 resolution images from which the food items are
detected.

● The model has been trained to operate with a minimal dataset, resulting in
increased efficiency for small projects that do not have a large amount of data.

1.3 SUMMARY OF THE REPORT

Using ResNet50 and SGD, we created a highly accurate object detection model that
can be trained with a minimal dataset. The model uses very little computing power
and is therefore efficient. It classifies multiple food objects in a given video in a
relatively short time. In the future, the model might serve as a foundation for
detecting weapons in surveillance footage.

CHAPTER 2: PROJECT DESCRIPTION


In this project we implement an object detection model that detects the objects present in a
video. Object detection is the process of finding real-world object instances such as cars,
bikes, TVs, flowers, and humans in slideshows or videos. We have tried to make the model as
efficient as possible while achieving the best accuracy we can. The project consists of two
parts, Object Localization and Image Classification. Image classification involves assigning a
class label to an image (here, a video frame), whereas object localization involves drawing a
bounding box around one or more objects in the video. Object detection is more challenging
because it combines both tasks: it draws a bounding box around each object of interest in the
video and assigns each one a class label.

Project Specifications:

1. Input: A video with one or more objects, such as a slideshow or multiple frames of a video.

2. Output: An image with one or more bounding boxes with detected objects marked on them.

3. Modules:

a. Object Localization: The input video is passed in, and bounding boxes for potentially
classifiable objects are returned.

b. Feature Extraction: The pixel values of the input image that lie within the bounding boxes
are passed to a convolutional network, which extracts features of the potential objects present
in the image.

c. Image Classification: The extracted features are passed to a classifier that predicts the
class of the image together with a confidence score. If the confidence score is above a
threshold, the localized object is marked as that class.
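
The thresholding step in module (c) can be sketched as follows. This is an illustrative example only: the softmax conversion from raw scores to confidences, the threshold value of 0.5, and the class names are assumptions, since the report does not state how confidence scores are computed or which threshold was used.

```python
import math


def softmax(scores):
    """Convert raw classifier scores into confidences that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_region(scores, class_names, threshold=0.5):
    """Return (class_name, confidence) for a localized region.

    Returns None when the best confidence is not above the threshold,
    in which case the region is left unmarked. The default threshold
    is a hypothetical value.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return class_names[best], probs[best]
    return None


if __name__ == "__main__":
    names = ["pizza", "salad", "soup"]  # hypothetical food classes
    print(label_region([5.0, 0.0, 0.0], names))  # confident prediction
    print(label_region([0.1, 0.0, 0.0], names))  # too uncertain to mark
```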

CHAPTER 3: TECHNOLOGY USED


3.1 ResNet 50

ResNet stands for residual network, and 50 denotes the number of layers in the
network. ResNet employs an identity connection, also called a shortcut connection,
which counteracts the vanishing gradient during backpropagation and so prevents the
small gradients that make learning in the initial layers intractable. Another benefit of
identity connections is that deep networks can be trained faster than before [2].
There are also ResNets with more than 50 layers: ResNet101 has 101 layers and
ResNet152 has 152 layers. We used ResNet with transfer learning to build a model from
a minimal dataset. Transfer learning takes a model pre-trained on a large dataset and
applies its learned feature maps to classify a separate but related dataset. ResNet
also uses batch normalization, which improves network performance by standardizing
the inputs of each layer. Batch normalization also reduces internal covariate shift
[4] and allows the use of higher learning rates [5]. ResNet is built from 2D
convolutional layers and can be used effectively for classification tasks, including
image recognition [1, 3].
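
The shortcut connection can be illustrated with a minimal sketch of a single residual block. To keep the example short and dependency-free, it uses small fully connected layers on plain lists instead of the convolutional layers of the real ResNet50, so it is a simplification and not the network used in this project; all function names are hypothetical.

```python
def relu(values):
    """Apply the ReLU activation element-wise."""
    return [max(0.0, v) for v in values]


def linear(weights, bias, values):
    """Compute one fully connected layer: a weighted sum plus bias per output."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(values, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is linear -> relu -> linear.

    The "+ x" term is the identity (shortcut) connection: the input
    bypasses the two layers and is added back to their output.
    """
    hidden = relu(linear(w1, b1, values))
    transformed = linear(w2, b2, hidden)
    return relu([t + v for t, v in zip(transformed, values)])


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights the block still passes its input through.
    print(residual_block([1.0, 2.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0]))
```

Because the input is added back unchanged, the block can fall back to the identity mapping even when its layers contribute nothing, which is what keeps gradients flowing through very deep stacks of such blocks.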

The output of a layer with weights w_i and bias b, where x_i is the input of the
layer, is the weighted sum defined below. The weights and the bias are the values
updated during training:

$$\sum_i w_i x_i + b$$

Where,
w_i = weight of layer
x_i = input of layer
b = bias

3.2 Stochastic Gradient Descent

The loss function is optimized using Stochastic Gradient Descent. It is computationally
efficient because it processes one sample at a time instead of computing the gradient
over the entire dataset at every step. Because samples are processed one at a time,
memory is also used more effectively, since only the current sample has to be loaded.
Stochastic Gradient Descent takes many noisy steps towards the minimum of the loss
function, which helps it avoid local minima. Stochastic gradient descent also
generalizes well: models trained with it for a short period have only a small
generalization error [7].
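
The one-sample-at-a-time update can be sketched as follows for the weighted sum from Section 3.1. This is an illustrative example, not the training code of the project: the squared-error loss, the learning rate, the number of epochs, and the toy data are assumptions.

```python
import random


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sgd_fit(samples, learning_rate=0.05, epochs=500, seed=0):
    """Fit weights and bias with stochastic gradient descent.

    Each update uses a single (inputs, target) pair and the gradient of
    the squared error 0.5 * (prediction - target) ** 2 for that pair.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a new random order
        for inputs, target in data:
            error = predict(weights, bias, inputs) - target
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Toy data generated from target = 2 * x + 1 (hypothetical example).
    toy = [([float(x)], 2.0 * x + 1.0) for x in range(4)]
    fitted_weights, fitted_bias = sgd_fit(toy)
    print(fitted_weights, fitted_bias)
```

Only one sample is needed in memory for each update, which is the source of the space efficiency described above.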

CHAPTER 4: IMPLEMENTATION
The core of the system is ResNet50, where 50 denotes the number of layers. We load a version of
the network that has been pre-trained on a large number of images from the ImageNet database
and apply it to a similar dataset; this process is known as transfer learning. With this
approach we obtained good accuracy even with a small dataset. ResNet was chosen over a plain
convolutional neural network because, in addition to standard deep learning building blocks
such as the Convolutional2D and MaxPooling layers, it uses shortcut connections to overcome the
vanishing gradient problem, which allows the network to have additional layers. Additional
Flatten, Dense, and Dropout layers further improve the model. The resulting model can predict
multiple food classes.

CHAPTER 5: TESTING
5.1 Training Dataset

Our training dataset comprises 750 images of each food class, each with a base
resolution of 300 x 300. The images in the dataset are randomly flipped and/or
resized to match real scenarios and achieve the highest accuracy. The training
accuracy of our model starts at 46.8% and increases to 90%. The model takes
approximately 20 minutes of training to reach its maximum accuracy.

5.2 Test Dataset

Our test dataset, also used as the validation dataset, comprises 250 samples of each
class. Every image has a base resolution of 300 x 300. The testing accuracy of our
model starts at 74.6% and increases to 92%.

CHAPTER 6: RESULT AND SNAPSHOTS OF PROJECT


6.1 Real World Test Cases

Apart from the validation dataset, we also tested the model on some image samples
taken from outside the dataset. The model performed as expected and identified the
food items correctly. Some examples are shown below.

Figure 1.4 Input samples other than the dataset

CHAPTER 7: CONCLUSION
Using ResNet50 and Stochastic Gradient Descent, we created a highly accurate object
detection model that can be trained with a minimal dataset. The model uses very little
computing power and is therefore efficient. It classifies multiple food objects in a given
video in a relatively short time. In the future, the model might serve as a foundation for
detecting weapons in surveillance footage.

CHAPTER 8: FUTURE SCOPE


• Originally, we planned to apply the model to every domain, but currently it is restricted to
a single domain and can only detect objects in already captured images.

• This should not come as a surprise, since detection depends significantly on the
availability of a dataset.

• We aim to apply the model to weapon detection to improve security in public places such as
stores, streets, and subways.

• We aim to make the model more robust so that it can work on multiple datasets.

REFERENCES
[1] Kelek MO, Calik N, Yildirim T. Painter classification over the novel art painting data set
via the latest deep neural networks. Procedia Comput Sci. 2019 Jan 1;154:369-76.
[2] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proceedings
of the IEEE conference on computer vision and pattern recognition 2016 (pp. 770-778).
[3] Ren S, He K, Girshick R, Zhang X, Sun J. Object detection networks on convolutional
feature maps. IEEE transactions on pattern analysis and machine intelligence. 2016 Aug
17;39(7):1476-81.
[4] Santurkar S, Tsipras D, Ilyas A, Madry A. How does batch normalization help
optimization?. Advances in neural information processing systems. 2018;31:2483-93.
[5] Bjorck N, Gomes CP, Selman B, Weinberger KQ. Understanding batch normalization.
In Advances in Neural Information Processing Systems 2018 (pp. 7694-7705).
[6] Bottou L. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes.
1991 Nov;91(8):12.
[7] Hardt M, Recht B, Singer Y. Train faster, generalize better: Stability of stochastic gradient
descent. In International Conference on Machine Learning 2016 Jun 11 (pp. 1225-1234).
PMLR.
[8] Torrey L, Shavlik J. Transfer learning. In Handbook of research on machine learning
applications and trends: algorithms, methods, and techniques 2010 (pp. 242-264). IGI
Global.
[9] opencv-python-tutroals.readthedocs.io
[10] pytorch.org
[11] fritz.ai
[12] towardsdatascience.com/object-detection-simplified-e07aa3830954
[13] tensorflow.org/lite/models/object_detection/overview
[14] tutorialspoint.com/image-processing-in-python
[15] openaccess.thecvf.com/content_cvpr_2018/papers/Chen_Domain_Adaptive_Faster_CVPR_2018_paper.pdf
[16] openaccess.thecvf.com/content_cvpr_2017_workshops/w8/papers/Hung_Applying_Faster_R-CNN_CVPR_2017_paper.pdf
