Professional Documents
Culture Documents
Major Project Report - G-21 Group
Major Project Report - G-21 Group
Submitted in partial fulfilment of the requirements for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
by
Guided by
It is hereby certified that the work which is being presented in the B. Tech Major Project
Report entitled "OBJECT DETECTION" in partial fulfilment of the requirements for the
award of the degree of Bachelor of Technology and submitted in the Department of
Computer Science & Engineering of Dr. Akhilesh Das Gupta Institute of Technology &
Management, New Delhi (Affiliated to Guru Gobind Singh Indraprastha University,
Delhi) is an authentic record of our own work carried out during a period from March 2021 to
July 2021 under the guidance of Ms. Tanvi Rustagi, Assistant Professor.
The matter presented in the B. Tech Major Project Report has not been submitted by me for the
award of any other degree of this or any other Institute.
This is to certify that the above statement made by the candidate is correct to the best of my
knowledge. He/She/They are permitted to appear in the External Major Project Examination
The report presents the study of a highly accurate object detection model for the classification
of food items in a video. The model makes use of ResNet50 with transfer learning to get high
accuracy and gain benefits of larger dataset. It also implements stochastic gradient descent to
optimize the loss function. This in turn uses less computational power and is very space-
efficient. Moreover, transfer learning helps to achieve even higher accuracy with a minimal
dataset. In order to achieve the best accuracy, we randomly resized and flipped the sample
images to match the real-world scenario.
ACKNOWLEDGEMENT
We express our deep gratitude to Ms. Tanvi Rustagi, Assistant Professor, Department of
Computer Science & Engineering for his valuable guidance and suggestion throughout my
project work. We are thankful to Ms. Pinky Yadav, Project Coordinator for their valuable
guidance.
We would like to extend my sincere thanks to Prof. (Dr.) Anupam Kumar Sharma, Head of
Department, for his time to time suggestions to complete my project work. I am also thankful
to Prof. (Dr.) Sanjay Kumar, Director for providing me the facilities to carry out my project
work.
1.1 Introduction 1
1.1.1 Images Based Upon
1.1.2 Modifications
1.2 Objective 1
1.3 Summary of The Report 2
REFERNCES 10
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION
This project focuses on multiclass object detection in a single domain in video. Multiple food
classes may be classified using this methodology. Its ability to detect and classify is highly
accurate and uses significantly less computational power. It operates on 300 x 300 resolution
video input from which the food items are detected. This research will aid in classifying real-
world things as weapons, which would be especially useful for surveillance.
Object detection in our project is based upon two key ideas. Each of these ideas will be
presented briefly below, followed by a detailed explanation.
The first aspect of our project is ResNet50 which is a CNN that can be trained for
classifying different objects. We used the pre-trained network that can easily be
retrained for a new similar classification problem by following a few easy steps.
Furthermore, we employ ResNet using transfer learning, which allows us to build
highly accurate models with the least amount of training data.
The Stochastic Gradient Descent is the other aspect of our project. Its optimization
strategy is to update the model according to the loss function's output. Given sparse
data, Stochastic Gradient Descent is often used in classification-based machine
learning.
1.1.2 Modifications
In order to increase the testing phase's accuracy score, we modified the data using the
addition of a code that resizes as well as flips images in datasets at random to reflect
real world conditions, improving accuracy even with tiny datasets.
1.2 OBJECTIVE
● Towards this end, an object detection system for classifying food into different
classes is achieved.
● Its ability to detect and classify is very accurate.
● It operates on 300 x 300 resolution images from which the food items are
detected.
● This model has been trained to operate with a minimal dataset, resulting in
increased efficiency for small projects that does not have a large amount of data.
Using ResNet50 and the SGD, we created highly accurate object detection model which
can be trained with a minimal dataset. Our model uses very little computing power;
hence it is efficient. In a relatively less amount of time, it classifies multiple food
objects in a given video. In the future, our model might be used as a foundation for
classifying items into surveillance against weapons.
Project Specifications:
1.Input: A video with one or more objects, like, a slideshow or multiple frames of a video.
2.Output: An image with one or more bounding boxes with detected objects marked on them.
3.Modules:
a. Object Localization: Input video is passed and bounding boxes for potential classifiable
objects are returned.
b. Feature Extraction: The pixel values of input image of pixels present within the bounding
boxes is passed to convolution network for feature extraction of the potential objects present in
the image.
c. Image Classification: Feature extracted are passed to a classifier that detects the class of the
image with a confidence score, and if the confidence score is above a threshold score, the
localized object will be marked as that class.
The term ResNet denotes a residual network and 50 denotes the number of layers
present in network. ResNet employs an identity connection which is also called a
shortcut connection that counteracts the vanishing gradient during the backpropagation
preventing the formation of a small gradient that makes initial layer learning
intractable. Another benefit of identity connections is that we can train deep networks
faster than before [2]. There are ResNet consisting of more than 50 layers, ResNet with
101 layers is called ResNet101, and ResNet with 152 layers is called ResNet152. We
used ResNet to implement transfer learning to model with a minimal dataset. Transfer
learning takes a large dataset's pretrained model and applies its learned feature maps to
classify a separate but related dataset. ResNet also uses Batch normalisation. Bach
normalization improves network performance by standardizing input. Internal covariate
shifts are also reduced by using batch normalization [4], and it also allows us to make
use of higher learning rates [5]. ResNet uses a 2D convolutional neural network to
accomplish deep learning. It can be used for classification purposes, including image
recognition, effectively. [1, 3].
The equation of the neural network for updating the weights wi and bias b where xi is
input of layer is defined as below:
∑i wi xi + b
Where,
wi = weight of layer
xi = input of layer
b = bias
CHAPTER 4: IMPLEMENTATION
The system's core is made up of ResNet50, where the number 50 denotes the number of layers.
A pre-trained version of the network is loaded, which has been trained on a large number of
photos from the ImageNet database. We employed that pre-trained model on a similar dataset
—this process is known as transfer learning. Even with a tiny dataset, we were able to obtain
good accuracy with this methodology. ResNet is chosen over Convolutional Neural Network
because it is an ensemble of many mathematical methods used in Deep Learning, such as the
Convolutional2D layer and the MaxPooling Layer. ResNet also uses shortcut connections to
overcome the vanishing gradient problem, allowing our network to have additional layers.
Additional layers like the Flatten, Dense, and Dropout Layers help to enhance the working. In
addition, it can predict a variety of multiple food items.
CHAPTER 5: TESTING
5.1 Training Dataset
Our training dataset comprises 750 images of each food class, where each of them is
300 x 300 base resolution. Further, the images in the dataset are randomly flipped
or/and resized to match the real scenarios and achieve the highest accuracy. The initial
training accuracy of our model is 46.8% which increases to 90% accuracy. The time
taken by the model to train to achieve the maximum accuracy is approximately 20
minutes.
Our test dataset or the validation dataset comprises 250 samples of each class. Every
image is of base resolution 300 x 300. The initial testing accuracy of our model is
74.6% which increases to 92%.
Apart from validation dataset, some image samples were also taken in our study. The
model performed up to the expectations and identified the food items correctly. Below
are some examples.
CHAPTER 7: CONCLUSION
Using ResNet50 and the Stochastic Gradient Descent, we created highly accurate object
detection model which can be trained with a minimal dataset. Our model uses very little
computing power; hence it is efficient. In a relatively less amount of time, it classifies multiple
food objects in a given video. In the future, our model might be used as a foundation for
classifying items into surveillance against weapons.
• This should not come as a surprise since the detection significantly depends on the
availability of dataset.
• We aim to deploy the application on weapons to improve security in public domains, this
mainly includes stores, streets, subways, etc
• Making the model more robust so that we can work on multiple datasets.
REFERENCES
[1] Kelek MO, Calik N, Yildirim T. Painter classification over the novel art painting data set
via the latest deep neural networks. Procedia Comput Sci. 2019 Jan 1;154:369-76.J. Clerk
Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon,
1892, pp.68–73.
[2] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. InProceedings
of the IEEE conference on computer vision and pattern recognition 2016 (pp. 770-778).
[3] Ren S, He K, Girshick R, Zhang X, Sun J. Object detection networks on convolutional
feature maps. IEEE transactions on pattern analysis and machine intelligence. 2016 Aug
17;39(7):1476-81.
[4] Santurkar S, Tsipras D, Ilyas A, Madry A. How does batch normalization help
optimization?. Advances in neural information processing systems. 2018;31:2483-93.
[5] Bjorck N, Gomes CP, Selman B, Weinberger KQ. Understanding batch normalization.
InAdvances in Neural Information Processing Systems 2018 (pp. 7694-7705).
[6] Bottou L. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nımes.
1991 Nov;91(8):12.
[7] Hardt M, Recht B, Singer Y. Train faster, generalize better: Stability of stochastic gradient
descent. InInternational Conference on Machine Learning 2016 Jun 11 (pp. 1225-1234).
PMLR..
[8] Torrey L, Shavlik J. Transfer learning. InHandbook of research on machine learning
applications and trends: algorithms, methods, and techniques 2010 (pp. 242-264). IGI
global.
[9] opencv-python-tutroals.readthedocs.io
[10] pytorch.org
[11] fritz.ai
[12] towardsdatascience.com/object-detection-simplified-e07aa3830954
[13] tensorflow.org/lite/models/object_detection/overview
[14] tutorialspoint.com/image-processing-in-python
[15] openaccess.thecvf.com/content_cvpr_2018/papers/Chen_Domain_Adaptive_Faster_CVPR
_2018_paper.pdf
[16] openaccess.thecvf.com/content_cvpr_2017_workshops/w8/papers/Hung_Applying_Faster_
R-CNN_CVPR_2017_paper.pdf