
OBJECT DETECTION - DEEP LEARNING

A MAJOR-II PROJECT REPORT

Submitted by

PRASHANT SHARMA
ENROLL. NO. 2016-310-073

in partial fulfillment for the award of the degree of

Bachelor of Technology

Under the supervision of

DR. SHERIN ZAFAR

Department of Computer Science & Engineering


School of Engineering Sciences & Technology

JAMIA HAMDARD
New Delhi-110062

(2020)

DECLARATION

I, Mr. PRASHANT SHARMA, a student of Bachelor of Technology
(Computer Science & Engineering) (B.TECH-CSE), (Enrolment No:
2016-310-073), hereby declare that the Project entitled "OBJECT
DETECTION - DEEP LEARNING", which is being submitted by me to the
Department of Computer Science, Jamia Hamdard, New Delhi in partial
fulfillment of the requirement for the award of the degree of Bachelor of
Technology (Computer Science & Engineering) (B.TECH-CSE), is my
original work and has not been submitted anywhere else for the award of any
Degree, Diploma, Associateship, Fellowship or other similar title or recognition.

(PRASHANT SHARMA)

Date: 4th May, 2020

Place: DELHI
ACKNOWLEDGEMENT

I thank the Almighty for giving me the courage and perseverance to
complete this project. The project is itself an acknowledgement to all
those people who gave me their heartfelt co-operation in making it a
grand success.

I have put considerable effort into this project. However, it would not have
been possible without the kind support and help of my mentor, to whom I
extend my sincere thanks.

I am highly indebted to my project guide, DR. SHERIN ZAFAR, for the
guidance and constant supervision, for providing the necessary
information regarding the project, and for the support in completing it.

I would like to express my gratitude towards my parents, teachers and my


friend MAYANK BAGGA for their kind co-operation and encouragement
which helped me in completion of this project.

Last but not least, I would like to express my deep sense of gratitude
and earnest thanks to my dear parents for their moral support and
heartfelt cooperation in doing this project.

(PRASHANT SHARMA)

Date: 4th May, 2020

Place: DELHI
Contents
1. ABSTRACT
2. INTRODUCTION
2.1 Problem Statement
2.2 Applications
2.3 Challenges
2.4 Literature Survey
3. EXPERIMENTAL RESULTS
3.1 Dataset
3.2 Implementation Details
3.2.1 Pre-processing

3.2.2 Network

3.3 Qualitative Analysis

3.4 Quantitative Analysis

4. PROJECT PLAN
4.1 Feasibility study
4.1.1 Operation
4.1.2 Technical
4.1.3 Cost/Benefit Analysis
4.2 System Analysis
4.3 System Requirement

5. CONCLUSION

6. FUTURE ENHANCEMENTS AND LIMITATIONS

7. REFERENCES


1. ABSTRACT

Efficient and accurate object detection has been an important topic in the
advancement of computer vision systems. With the advent of deep learning
techniques, the accuracy for object detection has increased drastically. The
project aims to incorporate state-of-the-art technique for object detection with
the goal of achieving high accuracy with a real-time performance. A major
challenge in many of the object detection systems is the dependency on other
computer vision techniques for helping the deep learning-based approach,
which leads to slow and non-optimal performance.

In this project, we use a completely deep learning-based approach to


solve the problem of object detection in an end-to-end fashion. The network is
trained on the most challenging publicly available dataset on which an object
detection challenge is conducted annually. The resulting system is fast and
accurate, thus aiding those applications which require object detection.
2. INTRODUCTION

2.1 Problem Statement

Accuracy on many computer vision problems had saturated until
about a decade ago. With the rise of deep learning techniques,
however, accuracy on these problems improved drastically. One
of the major problems was image classification, which is
defined as predicting the class of the image. A slightly more
complicated problem is image localization, where the image contains a
single object and the system should predict the class of the object
along with its location in the image (a bounding box around the
object). The still more complicated problem addressed by this project,
object detection, involves both classification and localization. In this
case, the input to the system is an image, and the output is a bounding
box for every object in the image, along with the class of the object in
each box. An overview of these problems is depicted in Fig. 1.

Figure 1: Computer Vision Tasks


2.2 Applications

A well-known application of object detection is face detection, which
is used in almost all mobile cameras. A more generalized (multi-class)
version is used in autonomous driving, where a variety of objects
need to be detected. Object detection also has an important role to
play in surveillance systems. These systems can be integrated with
other tasks such as pose estimation, where the first stage of the
pipeline detects the object and the second stage estimates the pose in
the detected region. It can also be used for tracking objects, and thus
finds use in robotics and medical applications. This problem therefore
serves a multitude of applications.

(a) Surveillance
(b) Autonomous vehicles

Figure 2: Applications of object detection


2.3 Challenges

The major challenge in this problem is the variable dimension of the
output, caused by the variable number of objects that can be present
in any given input image. Any general machine learning task requires
a fixed dimension of input and output for the model to be trained.
Another important obstacle to widespread adoption of object
detection systems is the requirement of real-time performance
(>30 fps) while remaining accurate in detection. The more complex
the model, the more time it requires for inference; the less complex
the model, the lower the accuracy. This trade-off between accuracy
and performance needs to be chosen as per the application. Finally,
the problem involves classification as well as regression, requiring
the model to learn both tasks simultaneously. This adds to the
complexity of the problem.
2.4 Literature Survey

In various fields there is a need to detect a target object and track
it effectively while handling occlusions and other complexities. Many
researchers (Almeida and Guting 2004, Hsiao-Ping Tsai 2011,
Nicolas Papadakis and Aurélie Bugeau 2010) have attempted various
approaches to object tracking. The nature of the techniques depends
largely on the application domain. Some of the research works that
led to the proposed work in the field of object tracking are described
below.

Object detection is an important yet challenging vision task. It
is a critical part of many applications such as image search, image
auto-annotation, scene understanding and object tracking. Tracking
moving objects in video image sequences has been one of the most
important subjects in computer vision. It has already been applied in
many computer vision fields, such as smart video surveillance (Arun
Hampapur 2005), artificial intelligence, military guidance, safety
detection, robot navigation, and medical and biological applications.
In recent years a number of successful single-object tracking systems
have appeared, but in the presence of several objects detection
becomes difficult, and when objects are fully or partially occluded
they are hidden from view, which further complicates detection;
decreasing illumination and changing acquisition angles add to the
difficulty. The proposed MLP-based object tracking system is made
robust by an optimum selection of unique features and by
implementing the AdaBoost strong classification method.
3. EXPERIMENTAL RESULTS

3.1 Dataset

For the purpose of this project, the publicly available
PASCAL VOC dataset is used. It consists of 10k
annotated images covering 20 object classes, with 25k
object annotations (in XML format). The images were
downloaded from Flickr. This dataset is used in the
PASCAL VOC Challenge, which has run every year
since 2006.

Figure 3: Dataset
3.2 Implementation Details

The project is implemented in Python 3.8. TensorFlow
was used for training the deep network, and OpenCV
for image pre-processing. The model was trained and
evaluated on a system with the following specifications:
CPU: Intel(R) Core(TM) i3-3110M @ 2.40 GHz;
RAM: 4 GB.

Pre-processing: The annotated data is provided in XML
format, which is read and stored, along with the images,
in a pickle file so that subsequent reads are faster. The
images are also resized to a fixed size.
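
The pre-processing step above can be sketched as follows. This is an illustrative sketch, not the report's exact code: the helper names and the fixed 300x300 target size are assumptions, and OpenCV is assumed for the resize as stated earlier.

```python
# Parse PASCAL VOC XML annotations, pair each with a resized image,
# and cache everything in a single pickle file for faster reloading.
import pickle
import xml.etree.ElementTree as ET

TARGET_SIZE = (300, 300)  # assumed fixed input size

def parse_voc_annotation(xml_path):
    """Return the object classes and bounding boxes in one VOC XML file."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        objects.append({
            "class": obj.find("name").text,
            # VOC boxes are stored as xmin, ymin, xmax, ymax
            "box": [int(float(bb.find(k).text))
                    for k in ("xmin", "ymin", "xmax", "ymax")],
        })
    return objects

def build_cache(samples, cache_path):
    """samples: list of (image_path, xml_path) pairs -> one pickle file."""
    import cv2  # OpenCV, used in the project for pre-processing

    records = []
    for image_path, xml_path in samples:
        image = cv2.resize(cv2.imread(image_path), TARGET_SIZE)
        records.append({"image": image,
                        "objects": parse_voc_annotation(xml_path)})
    with open(cache_path, "wb") as f:
        pickle.dump(records, f)
```

Caching the parsed annotations in one pickle file avoids re-parsing thousands of small XML files on every training run.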

Network: The network architecture consists of a base
network derived from VGGNet, followed by modified
convolutional layers for fine-tuning, and then the
classifier and localizer networks. This creates a deep
network which is trained end-to-end on the dataset.
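
A minimal sketch of the architecture described above, under stated assumptions: the layer counts, channel widths, and the number of default boxes per feature-map cell are illustrative choices, not the report's exact configuration. It shows a VGG-style convolutional base, an extra fine-tuning convolution, and parallel classifier and localizer heads trained end-to-end.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 20      # PASCAL VOC classes (plus one background class below)
ANCHORS_PER_CELL = 4  # assumed number of default boxes per feature-map cell

def build_detector(input_shape=(300, 300, 3)):
    inputs = tf.keras.Input(shape=input_shape)

    # VGG-style base: stacked 3x3 convolutions with max pooling.
    x = inputs
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)

    # Modified convolutional layer for fine-tuning on detection.
    x = layers.Conv2D(512, 3, padding="same", activation="relu")(x)

    # Parallel heads over every feature-map cell:
    # classifier -> class scores, localizer -> box offsets (dx, dy, dw, dh).
    cls = layers.Conv2D(ANCHORS_PER_CELL * (NUM_CLASSES + 1), 3,
                        padding="same", name="classifier")(x)
    loc = layers.Conv2D(ANCHORS_PER_CELL * 4, 3,
                        padding="same", name="localizer")(x)
    return tf.keras.Model(inputs, [cls, loc])
```

With the assumed 300x300 input and three pooling stages, each head predicts over a 37x37 grid of feature-map cells.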
3.3 Qualitative Analysis

The results on the PASCAL VOC dataset are shown in Table 1.

Table 1: Detection results on PASCAL VOC dataset.


Table 2: Example detections (columns: Input, Ground Truth, Prediction)
3.4 Quantitative Analysis

The evaluation metric used is mean average precision
(mAP). For a given class, a precision-recall curve is computed.
Recall is the proportion of all positive examples
ranked above a given rank; precision is the proportion of all
examples above that rank which belong to the positive class.
The AP summarizes the shape of the precision-recall curve
and is defined as the mean precision at a set of eleven
equally spaced recall levels [0, 0.1, ..., 1]. Thus, to obtain a
high score, high precision is required at all levels of recall.
This measure is preferred to the area under the curve (AUC)
because it rewards high precision across the full range of recall.
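
The eleven-point interpolated AP described above can be sketched as follows; `eleven_point_ap` is a hypothetical helper, not part of the report's code.

```python
def eleven_point_ap(is_true_positive, num_ground_truth):
    """is_true_positive: True/False flags for detections sorted by
    decreasing confidence; num_ground_truth: total positives for the class."""
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)            # precision at this rank
        recalls.append(tp / num_ground_truth)   # recall at this rank

    # Mean of the maximum precision at each recall level 0.0, 0.1, ..., 1.0
    # (0 if no point reaches that recall level).
    ap = 0.0
    for level in [i / 10 for i in range(11)]:
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        ap += max(candidates, default=0.0)
    return ap / 11.0
```

For example, two correct detections covering both ground-truth objects give an AP of 1.0, while missing one of two objects caps the achievable AP well below that.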

Detections were assigned to ground-truth objects
and judged to be true or false positives by measuring
bounding-box overlap. To be considered a correct detection,
the area of overlap between the predicted bounding box and
the ground-truth bounding box must exceed a threshold.
Detections satisfying the overlap criterion were ranked in
order of decreasing confidence. Multiple detections of the
same object in an image were considered false detections;
for example, 5 detections of a single object counted as 1 true
positive and 4 false positives. If no prediction is made for an
object, it is counted as a false negative.
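
The overlap criterion and the duplicate-detection rule above can be sketched as follows, assuming the standard PASCAL VOC intersection-over-union threshold of 0.5; both helpers are illustrative.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [xmin, ymin, xmax, ymax] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def match_detections(detections, ground_truth, threshold=0.5):
    """detections sorted by decreasing confidence -> True/False flag per
    detection; each ground-truth box may be matched at most once."""
    matched = [False] * len(ground_truth)
    flags = []
    for det in detections:
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(ground_truth):
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_iou >= threshold and not matched[best_idx]:
            matched[best_idx] = True
            flags.append(True)   # correct detection
        else:
            flags.append(False)  # poor overlap, or duplicate of a matched object
    return flags
```

Note how a second detection of an already-matched object is flagged False, matching the "1 true positive and 4 false positives" rule stated above.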

The average precision for each object category is
reported in Table 3. The mAP over the PASCAL VOC dataset
was found to be 0.633; the current state-of-the-art mAP
is reported to be 0.739.
Table 3: Average precision for all classes
4. PROJECT PLAN

4.1 Feasibility Study

A feasibility analysis involves a detailed assessment
of the need, value and practicality of a proposed enterprise,
such as systems development. The process of designing and
implementing record-keeping systems has significant
accountability and resource implications for an organization.
Feasibility analysis helps make informed and transparent
decisions at crucial points during the development process,
to determine whether it is operationally, economically and
technically realistic to proceed with a particular course of
action.

Most feasibility studies share two shortcomings for both
users and analysts. First, the study often presupposes that
when the feasibility document is being prepared, the analyst
is in a position to evaluate solutions. Second, most studies
tend to overlook the confusion inherent in system
development: the constraints and the assumed attitudes.

Operational Feasibility: - People are inherently resistant to


change, and computers have been known to facilitate
change. An estimate should be made of how strong a
reaction the user staff is likely to have toward the
development of a computerized system. It is common
knowledge that computer installations have something to do
with turnover, transfers, retraining, and changes in employee
job status. Therefore, it is understood that the introduction of
a candidate system requires special effort to educate, sell and
train the staff on new ways of conducting business.

Technical Feasibility: Technical feasibility centers on
the existing computer system (hardware, software, etc.) and
the extent to which it can support the proposed addition. For
example, if the current computer is operating at 80 percent
capacity (an arbitrary ceiling), then running another
application could overload the system or require additional
hardware. This involves financial considerations to
accommodate technical enhancements. If the budget is a
serious constraint, the project is judged not feasible.
Cost/Benefit Analysis: Economic analysis is the most
frequently used method for evaluating the effectiveness of a
candidate system. More commonly known as cost/benefit
analysis, the procedure is to determine the benefits and
savings that are expected from a candidate system and
compare them with its costs. If the benefits outweigh the
costs, the decision is made to design and implement the
system; otherwise, further justification or alterations to the
proposed system will be needed for it to have a chance of
being approved. This is an ongoing effort that improves in
accuracy at each phase of the system life cycle.
4.2 System Analysis

This is the most creative and challenging phase of the
system life cycle. The analysis phase is used to design the
logical model of the system, whereas the design phase is used
to design the physical model.

Much is done in this phase. We began the design
process by identifying the forms, reports and other outputs
the system will produce; the data required on each was then
pinpointed. We sketched the forms and displays on paper, as
they were expected to appear, so they could serve as a model
for the project; finally, we designed the forms on the
computer display using one of the automated system design
tools.

After the forms were designed, the next step was to
specify the data to be input, calculated and stored. Individual
data items and calculation procedures were written out in
detail. File structures, such as paper files, were selected, and
procedures were written describing how to process the data
and produce the output during the programming phase. The
documents were designed in the form of charts.

Output design concerns the format in which results are
presented; it should be the most convenient and attractive
format for the user. Input design deals with what the input
to the system should be, and thus prepares the input format.
File design deals with how the data is to be stored on
physical devices. Process design includes the description of
the procedures for carrying out operations on the given data.
4.3 System Requirements

The system services and goals are established in
consultation with the system users. They are then defined in
detail and serve as a system specification. System
requirements are those on which the system runs.

Hardware Requirements:
 Computer with Intel processor
 2 GB RAM
 500 GB hard disk drive
Software Requirements:
 Windows 7 operating system
 Python 3
 Jupyter Notebook
Software Development:
 Using Python
Python Requirements:
 OpenCV
 imutils
 dlib
 scikit-learn
 scikit-image
 TensorFlow
 Keras
 MXNet
5. CONCLUSION

An accurate and efficient object detection system has been developed
which achieves metrics comparable to the existing state-of-the-art systems.
This project uses recent techniques from the fields of computer vision and
deep learning. A custom dataset was created using labelImg, and the
evaluation was consistent. The system can be used in real-time applications
which require object detection as a pre-processing step in their pipeline.

An important extension would be to train the system on video sequences
for use in tracking applications. Adding a temporally consistent network
would enable smooth detection that is more optimal than per-frame detection.

Based on the experimental results of this thesis, we are able to
detect objects more precisely and identify them individually, with the
exact location of each object in the picture along the x-y axes. This work
also provides experimental results for different object detection and
identification methods and compares their efficiency.
6. FUTURE ENHANCEMENTS AND LIMITATIONS

The object recognition system can be applied in areas such as
surveillance, face recognition, fault detection and character recognition. The
objective of this thesis is to develop an object recognition system that
recognizes 2D and 3D objects in images. The performance of an object
recognition system depends on the features used and on the classifier
employed for recognition. This research work proposes a novel feature
extraction method for extracting global features and obtaining local features
from the region of interest. It also attempts to hybridize traditional
classifiers to recognize objects. The object recognition system developed in
this research was tested on benchmark datasets such as COIL-100,
Caltech 101, ETH-80 and MNIST, and is implemented in Python 3.8.

It is important to mention the difficulties observed during
experimentation with the object recognition system, owing to the many
features present in an image. The research work suggests that the image be
pre-processed and reduced to a size of 128 x 128. The proposed feature
extraction method helps select the important features; to improve the
efficiency of the classifier, the number of features should be kept small.
Specifically, the contributions of this research work are as follows:

 An object recognition system is developed that recognizes two-dimensional
and three-dimensional objects.

 The features extracted are sufficient for recognizing the object and marking
its location.

 The proposed classifier is able to recognize the object at less computational
cost.

 The proposed global feature extraction requires less time than traditional
feature extraction methods.

 The performance of the One-against-One classifier is efficient.

 Global features are extracted from the local parts of the image.

 Along with the local features, the width and height of the object, computed
through the projection method, are used.
As scope for future enhancement:

 The features, either local or global, used for recognition can be increased
to improve the efficiency of the object recognition system.

 Geometric properties of the image can be included in the feature vector


for recognition.

 An unsupervised classifier could be used instead of a supervised one for
recognition of the object.

 The proposed object recognition system uses grey-scale images and
discards the colour information.

The colour information in the image can be used for recognition of the
object; colour-based object recognition plays a vital role in robotics. Although
the visual tracking algorithm proposed here is robust under many conditions,
it can be made more robust by eliminating some of the limitations listed
below:

 In single visual tracking, the size of the template remains fixed during
tracking. If the size of the object shrinks over time, the background
becomes more dominant than the object being tracked, and the object
may be lost.

 A fully occluded object cannot be tracked, and is considered a new object
in the next frame.

 Foreground object extraction depends on binary segmentation, which is
carried out by applying thresholding techniques; blob extraction and
tracking therefore depend on the threshold value.

 Splitting and merging cannot be handled well in all conditions using a
single camera, due to the loss of information when a 3D object is
projected into 2D images.

 For nighttime visual tracking, a night-vision mode should be available as
an inbuilt feature of the CCTV camera.
To make the system fully automatic and to overcome the above
limitations, multi-view tracking can be implemented in future using multiple
cameras. Multi-view tracking has an obvious advantage over single-view
tracking because of its wide coverage range, with different viewing angles for
the objects to be tracked. In this thesis, an effort has been made to develop an
algorithm that provides a base for future applications such as those listed below.

 In this research work, object identification and visual tracking have
been done using an ordinary camera. The concept extends well to
applications such as intelligent robots, automatic guided vehicles, and
the enhancement of security systems to detect suspicious behaviour,
detect weapons, and identify suspicious movements of enemies on
borders with the help of night-vision cameras, among many other
applications.

 In the proposed method, a background subtraction technique has been
used that is simple and fast. This technique is applicable where there is
no camera movement. For robotic applications or automated vehicle
assistance systems, the camera moves and the background changes
continuously, requiring different segmentation techniques such as
single-Gaussian or Gaussian mixture models.

 Object identification with motion estimation needs to be fast enough
for real-time systems. There is still scope for developing faster object
identification algorithms.
7. REFERENCES

[1] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich
feature hierarchies for accurate object detection and semantic
segmentation. In The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2014.
[2] Ross Girshick. Fast R-CNN. In International Conference on Computer
Vision (ICCV), 2015.
[3] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN:
Towards real-time object detection with region proposal networks. In
Advances in Neural Information Processing Systems (NIPS), 2015.
[4] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You
only look once: Unified, real-time object detection. In The IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[5] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott
Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot
multibox detector. In ECCV, 2016.
[6] Karen Simonyan and Andrew Zisserman. Very deep convolutional
networks for large-scale image recognition. arXiv preprint
arXiv:1409.1556, 2014.
